Some years ago I found myself puzzling over how hardware multipliers work. Honestly, I’m not sure why! Perhaps for the same reason some people do cryptic crosswords?
I’ve known about half- and full-adder circuits for a long time. I reckon I could draw up the necessary arrangement of logic gates by working backwards from the truth table, but hardware multipliers were a mysterious black box.
I could have just looked up the Wikipedia page, but I thought it would be interesting to try and puzzle it out from first principles while walking the dog. Some people do crosswords…
The naive way of doing multiplication (x × y) is just to add x
to itself y times. That is: 3 × 5 = 15 is
3 + 3 + 3 + 3 + 3 = 15 or literally 5 times 3.
In C code, it looks like this:
uint32_t mul(uint16_t x, uint16_t y) {
uint32_t r = 0;
while (y) {
r += x;
y--;
}
return r;
}This is fine as far as it goes, but the time taken to perform the
multiplication depends on the value of y. Multiplying 10 × 10 takes twice as
long as multiplying 10 × 5. It’s also sensitive to the order of the arguments!
Multiplying 5 × 10 takes twice as long as multiplying 10 × 5.
There must be a better way!
When we do ‘long’ multiplication by hand, we multiply x by the least
significant digit of y, then multiply x again by the next least significant
digit of y, recording the result one column over, and so on. Finally, we add
all the intermediate results to get the answer:
38 ; x
x 52 ; y
----
76 ; = 2 x 38
+ 1900 ; = 5 x 38 x 10
----
1976Something interesting happens if the multiplier y only contains ones
and zeroes:
38 ; x
x 101 ; y
-----
38 ; = 1 x 38
000 ; = 0 x 38 x 10
+ 3800 ; = 1 x 38 x 100
----
3838No multiplication - just shifting to the left and adding!
Now, if we represent x and y in binary - just ones and zeroes - and
use binary addition - voila! Binary multiplication!
This example uses smaller numbers to show the principle without too much clutter.
0011 ; x = 3
x 0101 ; y = 5
-----------
0011 ; = 3
00000
001100 ; = 12 or 0011 << 2
+ 0000000
---------
0001111 ; = 15!This translates naturally(!) into:
uint32_t multiply(uint16_t x, uint16_t y) {
uint32_t r = 0;
uint32_t xx = x; // Promote x to prevent overflow
while (y) {
if (y & 1) r += xx;
y >>= 1;
xx <<= 1;
}
return r;
}A couple of details to note if you’re not entirely fluent in C:
The
xparameter has to be promoted to a larger type so the value can be left-shifted (xx <<= 1) without overflowing.The
if (y & 1) r += xxline tests the least significant bit ofy, and adds the current (shifted) value ofxxto the resultr.y >>= 1shiftsyto the right.
This is much faster than the naive solution, although it’s still sensitive to
the order of the arguments. The time taken to perform the multiplication
depends on the number of bits needed to represent the y value.
Consider multiplying 0x0001 by 0xffff. The naive implementation will iterate over its inner loop 65,535 times. The shift-add implementation will perform its inner loop only 16 times.
In ‘Big O’ notation, that’s ~O(y) vs ~O(log y).
Though I was very pleased to have figured this out, I knew it couldn’t be original work. But it wasn’t until I gave the code to Claude to review that I found out it’s called ‘Egyptian’ or ‘Russian Peasant’ multiplication and long predates the common use of binary!
Actually, that’s probably the coolest thing about this - Russian peasants and Egyptian scribes were using binary multiplication to do arithmetic centuries before the difference engine was a twinkle in Mr Babbage’s eye!
Revised March 24th, 2026 to fix a typo.
Revised March 23rd, 2026 to add the example showing decimal multiplication by 101 and to generally improve the tone and readability.