Why Is Distribution Prioritized Over Combining?

So I've come up with a question that's been bothering me for weeks, and I've been searching and asking everyone and everywhere that I can. I suspect that it may have no answer. The question is this:
Consider the properties of real numbers that we take for granted at the start of an algebra or analysis class (commutativity, associativity, and distribution of multiplication over addition). The last one, distribution (the transformation \(a(b+c) = ab + ac\)), is effectively equivalent to what we might call "combining like terms" (the transformation \(ax + bx = (a+b)x\)). Yet it seems like the latter is more fundamental and easier to intuit as an axiom, since it resembles simple addition of units (e.g., 3 feet + 5 feet = 8 feet). So historically and/or pedagogically, what was the reason for choosing the name and order we have ("distribution", \(a(b+c)=ab+ac\)), instead of the other option ("combining", \(ax+bx = (a+b)x\)) for the starting axiom?
I suspect now that there simply isn't any reason that we can document. Some expansion on the problem:

In dimensional analysis, some call the idea of only adding or comparing like units the "Great Principle of Similitude", which provides some of my motivation for wishing that we would start with this ("combining") and then derive distribution (using commutativity a few times). Note that this phrase is in many places erroneously attributed to Newton; in truth the earliest documented usage of the phrase is by Rayleigh in a letter to Nature (No. 2368, Vol. 95; March 18, 1915). I could probably write a whole post just on the hunt for this quote. Big thanks to Juan Santiago, who teaches a class by that name at Stanford (link), for helping me track down the article.

The Math Forum at Drexel discusses some history of the names of the basic properties. The best that Doctor Peterson can track down is that terms such as "distribution" were first used in the late 1700s to 1800s (starting in French in a memoir by Francois Joseph Servois). There is no commentary on why this was picked over alternative formulations. But perhaps the fact that the original discussion was in terms of functions (not binary operators) provides a clue. (For the full French text, see here and search "commutative and distributive").

Here's me asking the question at the StackExchange Mathematics site. Unfortunately, most commenters considered it to be uninteresting. When it got no responses, I cross-posted to the Mathematics Educators site -- which is apparently a huge faux pas, and immediately got it down-moderated into oblivion. The only relevant answer to date was from Benjamin Dickman, who pointed to a very nice quote from Euclid: when he states a similar property in geometric terms (the area of a rectangle versus the sum of the areas of the slices it is cut into), it happens to be in the same order as we present the distribution property. But still no word on any reason why it should be in that order and not the reverse.

Observations from a few textbooks that I have lying around:
  • Rietz and Crathorne, Introductory College Algebra (1933). Article 4 shows combining like terms, and asserts that's justified by the associative law (which is nonsensical). The distributive property isn't presented until later, in Article 8.
  • Martin-Gay, Prealgebra & Introductory Algebra. In the purely numerical prealgebra section, this first shows up as distribution among numbers (Sec 1.5). But the first time it appears in the algebra section with variables it is in fact written and used for combining like terms (Sec 3.1: \(ac + bc = (a+b)c\), although still called the "distributive property"). Combining like terms is actually done even earlier than that on an intuitive basis (see Sec. 2.6, Example 4). Only later is the property presented and used to remove parentheses from variable expressions.
  • Bittinger's Intermediate Algebra shows standard distribution, followed immediately by use for combining like terms. Sullivan's Algebra & Trigonometry does the same.
So my point with those sources is that even though distribution is usually presented in a removing-parentheses format, in practice many textbook authors find themselves unable to escape the need to use combining like terms at some earlier point in their presentation (Rietz and Crathorne, Martin-Gay). This observation bolsters my growing instinct that it would be more intuitive to present the property in that format in the first place (as Martin-Gay does, the first time it appears with variables), and then derive what we call distribution from that.

Another thought is that while you can point to distribution as justifying the standard long-multiplication process (across decimal place value), the interior additions are implied and not explicit, and so they don't really serve to develop intuition in the same way that simple unit addition does.
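To make that concrete, here is an illustrative sketch with arbitrarily chosen numbers (23 and 47 are my own example, not taken from any of the sources above). Writing out long multiplication with every addition made explicit looks something like this:

\[
\begin{aligned}
23 \times 47 &= 23 \times (40 + 7) \\
             &= 23 \times 40 + 23 \times 7 \\
             &= (20 + 3) \times 40 + (20 + 3) \times 7 \\
             &= 800 + 120 + 140 + 21 \\
             &= 1081
\end{aligned}
\]

In the usual written algorithm, only the partial products 161 (from \(23 \times 7\)) and 920 (from \(23 \times 40\)) ever appear on the page. The sums \(40 + 7\) and \(20 + 3\) are hidden inside the place-value notation.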

Therefore, I find myself fantasizing about the following. Write a slightly nonstandard algebra textbook that starts by assuming commutativity, associativity, and the combining-like-terms property (and shortly afterward derives the distribution property). Perhaps for a better name it could be called "collection of like multiplications inside addition", or something like that.
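As a rough sketch of how that derivation might go, assume only commutativity of multiplication and the combining property \(ax + bx = (a+b)x\) (the letter choices and step labels below are mine, for illustration):

\[
\begin{aligned}
a(b+c) &= (b+c)a && \text{commutativity of multiplication} \\
       &= ba + ca && \text{combining, read right to left, with common factor } a \\
       &= ab + ac && \text{commutativity of multiplication, applied to each term}
\end{aligned}
\]

So distribution in its usual left-hand form follows in three short steps, and the same steps run in reverse recover combining from distribution.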

Do you think this would be a better set of axioms for a basic algebra class? Can you think of a solid historical or pedagogical reason why the name and presentation were not the other way around, like this? Likely some more on this later.


  1. I think the reason for this choice has to do with geometric intuition. It is often helpful to visualize multiplication as computation of area. Being able to combine two rectangles with a common side length into one is just a very special case of combining two arbitrary rectangles, less interesting than splitting one rectangle into two.

    > (the transformation ax+bx=(a+b))

    Missing an x on the rhs.

  2. I don't think I understand your point.

    The distributive law merely asserts that the two sides of the equation are equal. It makes no difference which is on the left and which is on the right.

    Perhaps your difficulty is coming from thinking of the "=" as an operation to perform, rather than as a symmetric statement of equality. (This is a common mistake for students to make.)

    1. I suggest you think more carefully about the naming question. When the transformation is from right-to-left, it doesn't match the natural language definition of "to distribute" ("to divide and give out in shares/to scatter or spread out"). And in fact, we do have a standard existing phrase, "combine like terms", used to describe that distinct transformation (with numerical coefficients). So arguably that distinct usage and name should be presented first.