Why Is Distribution Prioritized Over Combining?

So I've come up with this question that's been bothering me for weeks, and I've been searching for an answer and asking everyone and everywhere that I can. I suspect that it may have no answer. The question is this:
Consider the properties of real numbers that we take for granted at the start of an algebra or analysis class (commutativity, associativity, and distribution of multiplication over addition). Granted, the last one, distribution (the transformation \(a(b+c) = ab + ac\)), is effectively equivalent to what we might call "combining like terms" (the transformation \(ax + bx = (a+b)x\)). But the latter seems more fundamental and easier to intuit as an axiom, since it resembles simple addition of units (e.g., 3 feet + 5 feet = 8 feet). So, historically and/or pedagogically, what was the reason for choosing the name and order we have ("distribution", \(a(b+c)=ab+ac\)) instead of the other option ("combining", \(ax+bx = (a+b)x\)) for the starting axiom?
I suspect now that there simply isn't any reason that we can document. Some expansion on the problem:

In dimensional analysis, some call the idea of only adding or comparing like units the "Great Principle of Similitude", which provides some of my motivation for wishing that we would start with this ("combining") and then derive distribution (using commutativity a few times). Note that this phrase is in many places erroneously attributed to Newton; in truth, the earliest documented usage of the phrase is by Rayleigh in a letter to Nature (No. 2368, Vol. 95; March 18, 1915). I could probably write a whole post just on the hunt for this quote. Big thanks to Juan Santiago, who teaches a class by that name at Stanford (link), for helping me track down the article.

The Math Forum at Drexel discusses some history of the names of the basic properties. The best that Doctor Peterson can track down is that terms such as "distribution" were first used in the late 1700's to 1800's (starting in French, in a memoir by François Joseph Servois). There's no commentary on why this formulation was picked over the alternatives. But perhaps the fact that the original discussion was in terms of functions (not binary operators) provides a clue. (For the full French text, see here and search for "commutative and distributive".)

Here's me asking the question at the StackExchange Mathematics site. Unfortunately, most commenters considered it uninteresting. When it got no responses, I cross-posted to the Mathematics Educators site (which is apparently a huge faux pas), and it was immediately down-moderated into oblivion. The only relevant answer to date came from Benjamin Dickman, who pointed to a very nice quote from Euclid: when Euclid states a similar property in geometric terms (the area of a rectangle versus the sum of the areas of the pieces it is sliced into), it happens to be in the same order as our presentation of the distributive property. But there's still no word on any reason why it should be in that order and not the reverse.

Observations from a few textbooks that I have lying around:
  • Rietz and Crathorne, Introductory College Algebra (1933). Article 4 shows combining like terms and asserts that it is justified by the associative law (which is nonsensical). The distributive property isn't presented until later, in Article 8.
  • Martin-Gay, Prealgebra & Introductory Algebra. In the purely numerical prealgebra section, this first shows up as distribution among numbers (Sec. 1.5). But the first time it appears in the algebra section with variables, it is in fact written and used for combining like terms (Sec. 3.1: \(ac + bc = (a+b)c\), although still called the "distributive property"). Combining like terms is actually done even earlier than that on an intuitive basis (see Sec. 2.6, Example 4). Only later is the property presented and used to remove parentheses from variable expressions.
  • Bittinger's Intermediate Algebra shows standard distribution, followed immediately by use for combining like terms. Sullivan's Algebra & Trigonometry does the same.
So my point with those sources is that even though distribution is usually presented in a removing-parentheses format, in practice many textbook authors find themselves unable to escape the need to use combining like terms at some earlier point in their presentation (Rietz and Crathorne, Martin-Gay). This observation bolsters my growing instinct that it would be more intuitive to present the property in that format in the first place (as Martin-Gay does, the first time it appears with variables), and then derive what we call distribution from that.

Another thought is that while you can point to distribution as justifying the standard long-multiplication process (across decimal place value), the interior additions are implied and not explicit, and so they don't really serve to develop intuition in the same way that simple unit addition does.
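To make those hidden additions concrete, here is a minimal Python sketch (the function name `partial_products` is my own, hypothetical choice, and it assumes non-negative integers). It splits the second factor by place value and returns each partial product as its own row; adding the rows is exactly the interior addition that distribution justifies but the usual layout leaves implicit.

```python
def partial_products(a: int, b: int) -> list[int]:
    """Return the rows of the long multiplication a * b, one per digit of b.

    Writing b by place value, b = d0 + d1*10 + d2*100 + ..., distribution gives
    a*b = a*d0 + a*d1*10 + a*d2*100 + ..., and each term is one row of the
    familiar layout. Assumes a and b are non-negative integers.
    """
    rows = []
    place = 1
    while b > 0:
        digit = b % 10                   # current digit of b, least significant first
        rows.append(a * digit * place)   # one partial product, shifted by place value
        b //= 10
        place *= 10
    return rows


rows = partial_products(23, 45)
print(rows)       # [115, 920]
print(sum(rows))  # 1035, i.e. 23 * 45
```

So \(23 \times 45 = 23 \times (5 + 40) = 115 + 920 = 1035\), with the final addition spelled out rather than absorbed into the column layout.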

Therefore, I find myself fantasizing about the following: writing a slightly nonstandard algebra textbook that starts by assuming commutativity, associativity, and the combining-like-terms property (and shortly afterward derives the distribution property). Perhaps a better name for it would be "collection of like multiplications inside addition", or something like that.
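For concreteness, here is one way that derivation might go, taking the combining property \(ax + bx = (a+b)x\) and commutativity of multiplication as the starting axioms (a sketch of my own, not taken from any of the textbooks above):

```latex
\begin{align*}
a(b+c) &= (b+c)a  && \text{commutativity of multiplication} \\
       &= ba + ca && \text{combining like terms, read right to left, with } x = a \\
       &= ab + ac && \text{commutativity of multiplication, applied twice}
\end{align*}
```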

Do you think this would be a better set of axioms for a basic algebra class? Can you think of a solid historical or pedagogical reason why the name and presentation were not the other way around, like this? Likely some more on this later.


Rational Numbers and Randomized Digits

Here's a quick thought experiment to develop intuition about the cardinality of rational versus irrational decimal numbers. We know that any rational number (a/b with integer a, b and b ≠ 0) has a decimal expansion that either terminates or repeats (and terminating is itself equivalent to ending with a repeating block of all 0's).
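The reason a rational number's expansion must terminate or repeat is that long division by b can only leave the remainders 0 through b − 1, so within b steps either the remainder hits 0 or some remainder recurs, and from then on the digits cycle. Here is a minimal Python sketch of that argument (the function `decimal_expansion` is hypothetical, written for illustration, and handles only non-negative a and positive b):

```python
def decimal_expansion(a: int, b: int) -> tuple[str, str, str]:
    """Split a/b into (integer part, non-repeating digits, repeating block).

    Assumes a >= 0 and b > 0. Each step of long division leaves a remainder
    in 0..b-1, so within b steps either a remainder of 0 appears (the decimal
    terminates) or some remainder recurs (and the digits cycle from there).
    """
    integer_part, r = divmod(a, b)
    digits: list[str] = []
    seen: dict[int, int] = {}  # remainder -> position of the digit it produced
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        r *= 10
        digits.append(str(r // b))
        r %= b
    if r == 0:
        # Terminating decimal: equivalently, an endless repeating block of 0's.
        return str(integer_part), "".join(digits), "0"
    start = seen[r]  # the cycle begins where this remainder first appeared
    return str(integer_part), "".join(digits[:start]), "".join(digits[start:])


for num, den in [(1, 4), (1, 6), (22, 7)]:
    print(num, den, decimal_expansion(num, den))
```

For example, 1/4 comes out as 0.25 followed by repeating 0's, 1/6 as 0.1 followed by repeating 6's, and 22/7 as 3 followed by the repeating block 142857.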

Consider randomizing decimal digits in an infinite string (say, by using a standard d10 from a roleplaying game). How likely does it seem that at some point you'll start rolling 0's, and nothing but 0's, until the end of time? It's vanishingly unlikely, so it's effectively impossible that you'll roll a terminating decimal. Likewise, how probable does it seem that you'll roll some particular block of digits, then repeat them in exactly the same order, and keep doing so without fail an infinite number of times? Again, it seems effectively impossible.

So this intuitively shows that if you pick any real number "at random" (in this case, generating random decimal digits one at a time), it's effectively certain that you'll produce an irrational number. The proportion of rational numbers can be seen to be practically negligible compared to the preponderance of irrationals.
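The intuition can be made precise with a short probability argument, idealizing each roll of the d10 as independent and uniform on the digits 0 through 9. An eventually repeating expansion is pinned down by the position \(k\) where the repetition starts and the block \(w = w_1 \cdots w_p\) that repeats, and there are only countably many such choices, each with probability zero:

```latex
\begin{align*}
E_{k,w} &= \{\, D_{k+i} = w_{(i \bmod p)+1} \text{ for all } i \ge 0 \,\}, \\
\Pr(E_{k,w}) &\le \Pr\bigl(D_{k+i} = w_{(i \bmod p)+1} \text{ for } 0 \le i < n\bigr) = 10^{-n}
  \quad \text{for every } n \ge 1, \text{ so } \Pr(E_{k,w}) = 0, \\
\Pr(\text{digits eventually repeat}) &\le \sum_{k \ge 1} \sum_{p \ge 1} \sum_{w \in \{0,\dots,9\}^p} \Pr(E_{k,w}) = 0.
\end{align*}
```

Here \(D_1, D_2, \ldots\) are the rolled digits. So the generated number is irrational with probability 1, which is the precise sense of "effectively certain" above.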


Algebra for Cryptography

Cryptography researcher Victor Shoup recently gave a talk at the Simons Institute at Berkeley. Richard Lipton quotes one of his interesting observations about cryptography:
He also made another point: For the basic type of systems under discussion, he averred that the mathematics needed to describe and understand them was essentially high school algebra. Or as he said, “at least high school algebra outside the US.”
Quoted here.


The MOOC Revolution that Wasn't

Three years ago I wrote a review of "Udacity Statistics 101" that went semi-viral, in which I found the MOOC course to be slapdash, unplanned, and in many cases pure nonsense (link). I wound up personally corresponding with Sebastian Thrun (Stanford professor, founder of Udacity, head of Google's self-driving car project) over it, and came away super skeptical of his work. Today, here's a fantastic article about the fallen hopes for MOOCs, and Thrun's Udacity in particular. Highly recommended; I'm jealous that I didn't write it.
Just a few short years after promising higher education for anyone with an Internet connection, MOOCs have scaled back their ambitions, content to become job training for the tech sector and for students who already have college degrees...

"In 50 years,” Thrun told Wired, “there will be only 10 institutions in the world delivering higher education and Udacity has a shot at being one of them.”

Three years later, Thrun and the other MOOC startup founders are now telling a different story. The latest tagline used by Thrun to describe his company: “Uber for Education.”
I'd like to quote the whole thing here, but it's probably best that you just go and read it. Big kudos to Audrey Watters for writing this (and a tip of the hat to Cathy O'Neil for sharing a link).