I suspect now that there simply isn't any reason that we can document. Some expansion on the problem: Consider the properties of real numbers that we take for granted at the start of an algebra or analysis class (commutativity, associativity, and distribution of multiplication over addition). Granted, the last one, distribution (the transformation \(a(b+c) = ab + ac\)), is effectively equivalent to what we might call "combining like terms" (the transformation \(ax + bx = (a+b)x\)). But the latter seems more fundamental and easier to intuit as an axiom, since it resembles simple addition of units (e.g., 3 feet + 5 feet = 8 feet). So historically and/or pedagogically, what was the reason for choosing the name and order we have ("distribution", \(a(b+c)=ab+ac\)), instead of the other option ("combining", \(ax+bx = (a+b)x\)) for the starting axiom?

In dimensional analysis, some call the idea of only adding or comparing like units the "Great Principle of Similitude". This provides some of my motivation for wishing that we would start with this ("combining") and then derive distribution (using commutativity a few times). Note that this phrase is in many places erroneously attributed to Newton; in truth the earliest documented usage of the phrase is by Rayleigh in a letter to *Nature* (No. 2368, Vol. 95; March 18, 1915). I could probably write a whole post just on the hunt for this quote. Big thanks to Juan Santiago, who teaches a class by that name at Stanford (link), for helping me track down the article.

The Math Forum at Drexel discusses some history of the names of the basic properties. The best that Doctor Peterson can track down is that terms such as "distribution" were first used in the late 1700s to 1800s (starting in French in a memoir by Francois Joseph Servois). There is no commentary on a *reason* why this was picked over alternative formulations. But perhaps the fact that the original discussion was in terms of functions (not binary operators) provides a clue. (For the full French text, see here and search "commutative and distributive").

Here's me asking the question at the StackExchange Mathematics site. Unfortunately, most commentators considered it to be uninteresting. When it got no responses, I cross-posted to the Mathematics Educators site, which is apparently a huge *faux pas*, and it immediately got down-moderated into oblivion. The only relevant answer to date was from Benjamin Dickman, who pointed to a very nice quote from Euclid: when he states a similar property in geometric terms (the area of a rectangle versus the sum of the areas of its slices), it happens to be in the same order as we present the distribution property. But there is still no word on any reason *why* it should be in that order and not the reverse.

Observations from a few textbooks that I have lying around:

- Rietz and Crathorne, *Introductory College Algebra* (1933). Article 4 shows combining like terms, and asserts that's justified by the associative law (which is nonsensical). The distributive property isn't presented until later, in Article 8.
- Martin-Gay, *Prealgebra & Introductory Algebra*. In the purely numerical prealgebra section, this first shows up as distribution among numbers (Sec 1.5). But the first time it appears in the algebra section with variables it is in fact written and used for combining like terms (Sec 3.1: \(ac + bc = (a+b)c\), although still called the "distributive property"). Combining like terms is actually done even earlier than that on an intuitive basis (see Sec. 2.6, Example 4). Only later is the property presented and used to remove parentheses from variable expressions.
- Bittinger's *Intermediate Algebra* shows standard distribution, followed immediately by use for combining like terms. Sullivan's *Algebra & Trigonometry* does the same.

Another thought is that while you can point to distribution as justifying the standard long-multiplication process (across decimal place value), the interior additions are implied and not explicit, and so they don't really serve to develop intuition in the same way that simple unit addition does.
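
To make that concrete, here is one illustrative product written out with the place-value additions made explicit. The numbers 23 and 47 are chosen only for the example; they don't come from any of the textbooks above.

\[
\begin{aligned}
23 \times 47 &= 23 \times (40 + 7) \\
&= 23 \times 40 + 23 \times 7 \\
&= 920 + 161 \\
&= 1081
\end{aligned}
\]

In the usual written algorithm, the sum \(40 + 7\) never appears on the page; it is hidden inside the numeral 47, which is the sense in which the interior addition is implied rather than explicit.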

Therefore, I find myself fantasizing about the following: write a slightly nonstandard algebra textbook that starts by assuming commutativity, associativity, and the combining-like-terms property, and shortly afterward derives the distribution property. Perhaps for a better name it could be called "collection of like multiplications inside addition", or something like that.
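
As a rough sketch of how that derivation might go, assume only commutativity of multiplication and the combining property \(ax + bx = (a+b)x\). The step labels are mine, not taken from any of the texts above.

\[
\begin{aligned}
a(b+c) &= (b+c)a && \text{(commutativity)} \\
&= ba + ca && \text{(combining, read right to left, with } x = a\text{)} \\
&= ab + ac && \text{(commutativity, applied to each term)}
\end{aligned}
\]

So commutativity gets used three times in all, and associativity isn't needed for this particular step.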

Do you think this would be a better set of axioms for a basic algebra class? Can you think of a solid historical or pedagogical reason why the name and presentation were not the other way around, like this? Likely some more on this later.