number-theorynumerical

Created 26 Jul 2023

Bezout's Lemma and the Extended Euclidean Algorithm: Linear Combinations

tl;dr: A journey from the ground up in which we use axioms to build a proof that every positive integer can be uniquely prime factored.

This is part 3, where we discover and prove Bezout's lemma using the division algorithm. In general, the level of rigor will decrease as the parts go on, so that the reader doesn't get bored to death. But it should be obvious how to fill out everything with complete rigor. Here is a glossary of math symbols.

Puzzle: water-jug problem

You might have heard this one before:

There are two water jugs A and B, of size 8 and 5 litres respectively. They have no markings, so it is impossible to tell how much water is in a jug unless it is completely full or completely empty. There is a sink with a water tap and a drain. How can exactly one litre of water be obtained from the tap using the two jugs?

It's a fun puzzle - definitely worth playing around with it before reading the solution below.

Now it's time for every mathematician's favourite question: can we generalize it?

We start in the state (0,0)(0,0) and want to end in the state (1,0)(1,0) (goal state)1. Notice how one jug must always be empty or full, due to the rules of the question ("it is impossible to tell how much water is in a jug unless it is completely full or empty"). Therefore we can deduce that in a valid solution that uses as few operations as possible2, the total amount of water only changes by 5 or 8. This is because:

  • transferring water from one jug to another does not change the total amount
  • filling an empty jug will change the total amount by 5 or 8.
  • filling a non-empty but non-full jug is pointless, because the other jug must be empty or full, thus we would obtain the state of both jugs full, or one jug full and one empty, which is backwards progress because those states can be easily reached from (0,0)(0,0) in 1 or 2 moves.
  • filling a full jug does nothing

So now, we have traction on the problem: the total amount of water is initially 0, finally 1, and only changes by 5 or 8. In the general case of the jug capacities being aa and bb, the total must start at 0, end at 1, and change by aa or bb at each step.

Therefore, if we can solve the problem then we must be able to write 1 as a linear combination of aa and bb, i.e. we must be able to find integers x,yx,y such that ax+by=1ax + by = 1, because multiplication is repeated addition.

In the case of 5 and 8, we can write 1=82531 = 8 \cdot 2 - 5 \cdot 3, i.e. 1=8+85551 = 8 + 8 - 5 - 5 - 5.

And so, we've motivated the main question:

Given integers a,ba,b, do there exist integers x,yx,y such that ax+by=1ax+by=1?

If there do not, then the puzzle can't be solved.3

Numerical Evidence

The first thing we might note is that, ax+byax+by will always by divisible by gcd(a,b)gcd(a,b), no matter what integers xx and yy we choose (see Lemma 5 in part 1). For example if aa is even (divisible by 2) and bb is even, then axax and byby will both be even, so ax+byax+by will be even.

Therefore, if we want ax+byax+by to equal 1, we need gcd(a,b)gcd(a,b) to be 1, i.e. aa and bb share no common factors (are "coprime").

Now what if they are coprime, say, a=5 and b=8?

The key is to write 8 = 5+3. Then a linear combination as 8 and 5 can be rewritten as a linear combination of 3 and 5, and vice versa. Because 8x+5y8\cdot x + 5 \cdot y =(3+5)x+5y= (3 + 5) \cdot x + 5 \cdot y =3x+5(x+y)= 3 \cdot x + 5 \cdot (x+y). Similarly, a linear combination of 3 and 5 can be transformed into a combination of 3 and 2, which can be transformed into a combination of 1 and 2. But we can always write 1 as a linear combination of 1 and 2, i.e. 1=11+201 = 1\cdot 1 + 2\cdot 0. So theoretically, we should be able to "undo" our sequence of transformations to get back to the combination of 5 and 8!

Let's try:

1=11+201 = 1\cdot 1 + 2\cdot 0=(32)1+20 = (3-2)\cdot 1 + 2\cdot 0=31+2(01) = 3\cdot 1 + 2\cdot (0-1)=31+(53)(1) = 3\cdot 1 + (5-3)\cdot (-1)=3(1+1)+5(1) = 3\cdot (1+1) + 5\cdot (-1)=(85)2+5(1) = (8-5)\cdot 2 + 5\cdot (-1)=82+5(12) = 8\cdot 2 + 5\cdot (-1-2)=8253 = 8\cdot 2 - 5\cdot 3

Which is a linear combination of 5 and 8!

Now what about a=155a=155 and b=27b=27?

Let's do the same thing. Note that a linear combination of 155 and 27 is a linear combination of (27*5 + 20) and 27, which is a linear combination of 20 and 27 because 155x+27y155x + 27y =(275+20)x+27y = (27\cdot 5 + 20)x + 27y =27(5x+y)+20x= 27(5x+y) + 20x. And so on, this is a linear combination of 7 and 20, which is a combination of 7 and 6 (because 20 divided by 7 has remainder 6), which is a combination of 7-6=1 and 6. We've hit 1, so we can start building up the desired combination by going backwards.

1=11+601 = 1\cdot 1 + 6\cdot 0=(76)1+60 = (7-6)\cdot 1 + 6\cdot 0=71+6(01) = 7\cdot 1 + 6\cdot (0-1)=71+(2027)(1) = 7\cdot 1 + (20-2\cdot 7)\cdot (-1)=7(1+21)+20(1) = 7\cdot (1+2\cdot 1) + 20\cdot (-1)=(2720)3+20(1) = (27-20)\cdot 3 + 20\cdot (-1)=273+20(13) = 27\cdot 3 + 20\cdot (-1-3)=273+(155527)(4) = 27\cdot 3 + (155-5\cdot 27)\cdot (-4)=27(3+54)+(155(4) = 27\cdot (3+5\cdot 4) + (155\cdot (-4)=27231554 = 27\cdot 23 - 155\cdot 4

Are you starting to get the idea? Let's do one more example (I also strongly recommend trying some on your own). This time, we'll write out the divisions that we're doing at the start, as well as the reconstruction steps.

a=259,b=443a = 259, b = 443

First, our division steps that "reduce" the problem:

443=2591+184, new pair is (259,184)443 = 259 \cdot 1 + 184 \text{, new pair is (259,184)}259=1841+75, new pair is (184,75)259 = 184 \cdot 1 + 75 \text{, new pair is (184,75)}184=752+34, new pair is (75,34)184 = 75 \cdot 2 + 34 \text{, new pair is (75,34)}75=342+7, new pair is (34,7)75 = 34 \cdot 2 + 7 \text{, new pair is (34,7)}34=74+6, new pair is (7,6)34 = 7 \cdot 4 + 6 \text{, new pair is (7,6)}7=61+1, new pair is (6,1)7 = 6 \cdot 1 + 1 \text{, new pair is (6,1)}6=16+0, new pair is (1,0)6 = 1 \cdot 6 + 0 \text{, new pair is (1,0)}

We stop once we hit (1,0). Now, 1=11001 = 1\cdot 1 - 0\cdot 0 so we can start reconstructing, by travelling back up the list of divisions.

1=11001 = 1\cdot 1 - 0 \cdot 0=11(616)0 (see last line of the division list) = 1\cdot 1 - (6 - 1\cdot 6) \cdot 0 \text{ (see last line of the division list)}=1(1+60)60 = 1\cdot (1+6\cdot 0) - 6 \cdot 0=(761)160 (see penultimate line) = (7-6 \cdot 1)\cdot 1 - 6 \cdot 0 \text{ (see penultimate line)}=716(0+11) = 7\cdot 1 - 6 \cdot (0+1\cdot 1)=71(3474)1 (see... etc) = 7\cdot 1 - (34 - 7 \cdot 4) \cdot 1 \text{ (see... etc)}=7(1+41)341 = 7\cdot (1+4\cdot 1) - 34 \cdot 1=(75342)5341 = (75 - 34\cdot 2)\cdot 5 - 34 \cdot 1=75534(1+25) = 75\cdot 5 - 34 \cdot (1+2\cdot 5)=755(184752)11 = 75\cdot 5 - (184 - 75 \cdot 2) \cdot 11=75(5+211)18411 = 75\cdot (5 + 2\cdot 11) - 184 \cdot 11=(2591841)2718411 = (259-184\cdot 1)\cdot 27 - 184 \cdot 11=25927184(11+127) = 259\cdot 27 - 184 \cdot (11 + 1\cdot 27)=25927(4432591)38 = 259\cdot 27 - (443 - 259 \cdot 1) \cdot 38=259(27+138)44338 = 259\cdot (27 + 1\cdot 38) - 443 \cdot 38=2596544338 = 259\cdot 65 - 443 \cdot 38

which is a solution.

Note that I kept the sign in the middle to be negative, which looks like it made all the coefficients positive4.

By the way, this repeated division process (not including the reconstructing) is called Euclid's Algorithm, and is used to find the gcd of two numbers efficiently5. In our case we were finding the gcd of 259 and 443, which is 1, so we ended up with (1,0). Feel free to experiment with what would happen if we had gcd(a,b)>1gcd(a,b) \gt 1. If we include the reconstructing, then the process of finding a solution to ax+by=1ax+by=1 is called the extended Euclidean algorithm (egcd). There are different ways to construct the solution, one of which is exactly as we've done here. (Feel free to try and generalize what we've been doing).

As long as the initial repeated division (Euclid's algorithm) ends with the pair (1,0), we can perform the reconstruction to find a solution to ax+by=1ax + by = 1. In general, Euclid's algorithm ends with the pair being (gcd(a,b),0)(gcd(a,b),0), so if we do the reconstruction process, we will be able to find a solution to ax+by=gcd(a,b)ax + by = gcd(a,b).

Bezout's Lemma: Proof

Let's switch to our axiom world for a second, to formalize the lemma and prove it.

Theorem. (Bezout's Lemma)

For any two positive integers a,ba,b, there exist integers x,yx,y such that ax+by=gcd(a,b)ax + by = gcd(a,b).

Given that we want to use the well-ordering principle as a proof technique (since it's one of our axioms), we could try to consider the set of all possible ax+byax+by (a,b fixed, x,y vary) and take the smallest one, ee. Then, if egcd(a,b)e \neq gcd(a,b) we somehow want to generate an element of the set that is smaller than ee, so that we can show a contradiction.

Let's think about this on the number line. Suppose a,ba,b coprime for simplicity; then we want to show that there is a solution to ax+by=1ax+by = 1. If we start at 0 and are allowed to jump left or right by aa or bb, can we get to 1?

Well, if we can't, i.e. if the smallest positive number we could reach was ee where e>1e\gt 1, then we can essentially think of "going from 0 to e" as one operation. Then we can go from 0 to e, e to 2e, 2e to 3e, etc., until we get close to aa on the number line. By the division algorithm, we can always land in the region between a,a+e1a,a+e-1 inclusive. But then we can travel left by a, and we will be in the region between 0,e10, e-1 inclusive, which is a contradiction because we'd be able to reach a smaller number than ee.

Proof. (of Bezout's Lemma)

Let a,bNa,b \in \mathbb{N}. Consider the set

S={nNn=ax+by,  x,yZ}S = \{n \in \mathbb{N} \mid n = ax + by,\; x,y \in \mathbb{Z}\}

This set is nonempty (since, for example, aSa \in S) and a subset of the naturals by construction. Thus by the well-ordering principle, S has a least element, say e=ax0+by0e = ax_0 + by_0.

Claim.

eae \mid a.

Proof.

By the Divison Algorithm, write a=qe+ra = qe + r with qZq \in \mathbb{Z} and 0r<e0 \leq r \lt e. Then r=aq(ax0+by0)r = a - q(ax_0 + by_0) is a linear combination of aa and bb. But 0r<e0 \leq r \lt e, thus either r=0r=0, or rSr \in S with r<er \lt e. Since the second option contradicts the minimality of ee, we must have r=0r=0, and so a=qe+0a = qe + 0 i.e. eae \mid a.

Now by the claim, eae \mid a. Similarly, repeating the above argument analogously for bb, we have ebe \mid b. Thus ee is a common divisor of aa and bb, so egcd(a,b)e \leq gcd(a,b) by definition of "greatest".

But also, recall that e=ax0+by0e = ax_0 + by_0. Since the gcd of a and b divides the RHS by Lemma 4 of the lemma list from part 1, we have that gcd(a,b)egcd(a,b) \mid e, and so egcd(a,b)e \geq gcd(a,b) by Lemma 19.

Overall, since egcd(a,b)e \leq gcd(a,b) and egcd(a,b)e \geq gcd(a,b), we have e=gcd(a,b)e = gcd(a,b). So gcd(a,b)=e=ax0+by0gcd(a,b) = e = ax_0 + by_0, so gcd(a,b)gcd(a,b) can be written as a linear combination of aa and bb. Done.

Magic box

Let's try another concrete example: finding a solution to 29x+11y=129x + 11y = 1.

We could do what we did before, which was the Euclidean Algorithm and then building a solution in reverse6. But what if we try the same thing but going forwards?

29x+11y=129x + 11y = 1(211+7)x+11y=1(2\cdot 11 + 7)x + 11y = 17x+11(2x+y)=17x + 11(2x+y) = 17x+(71+4)(2x+y)=17x + (7\cdot 1+4)(2x+y) = 17(3x+y)+4(2x+y)=17(3x+y) + 4(2x+y) = 1(4+3)(3x+y)+4(2x+y)=1(4+3)(3x+y) + 4(2x+y) = 13(3x+y)+4(5x+2y)=13(3x+y) + 4(5x+2y) = 13(3x+y)+(3+1)(5x+2y)=13(3x+y) + (3+1)(5x+2y) = 13(8x+3y)+1(5x+2y)=13(8x+3y) + 1(5x+2y) = 1(31+0)(8x+3y)+1(5x+2y)=1(3\cdot 1 + 0)(8x+3y) + 1(5x+2y) = 10(8x+3y)+1(29x+11y)=10(8x+3y) + 1(29x + 11y) = 1

Uh, oh, it looks like we started with 29x+11y=129x + 11y = 1 and ended up with 0+1(29x+11y)=10 + 1(29x+11y) = 1. Did we go in a circle? It certainly looks like it, apart from one thing - why did we get (8x+3y)(8x+3y) in that bracket? Surely there's something special about it.

Recall that, when we built up the solution in reverse, we started with "00+11=10\cdot0 + 1\cdot 1 = 1" then built it up. So here, looking at the last line of the above, why don't we set 8x+3y=08x + 3y = 0?7 This is easy to find a solution to, e.g. x=3,y=8x=-3, y=8.

What is 29x+11y29x+11y when x=3x=3 and y=8y=-8?

It's one!! So could it be that the significance of (8x+3y)(8x+3y) is that it gives a solution?

In fact, it looks like something even better is true: for convenience, I'll write a compressed version of what we did again:

29x+11y=129x + 11y = 17x+11(2x+y)=17x + 11(2x+y) = 17(3x+y)+4(2x+y)=17(3x+y) + 4(2x+y) = 13(3x+y)+4(5x+2y)=13(3x+y) + 4(5x+2y) = 13(8x+3y)+1(5x+2y)=13(8x+3y) + 1(5x+2y) = 10(8x+3y)+1(29x+11y)=10(8x+3y) + 1(29x + 11y) = 1

Look at the coefficients of xx and yy in the last line: it fits the pattern that 118293=111\cdot 8 - 29\cdot 3 = 1

Look at the coefficients in the penultimate line: it fits the pattern that 2853=12\cdot 8 - 5\cdot 3 = 1

Look at the coefficients in the third-to-last line: it fits the pattern that 3251=13\cdot 2 - 5\cdot 1 = 1

Etc.: it's true for all the lines!

If we look at these brackets, and list them out in order:

(0x+1y)(0x+1y), (1x+0y)(1x+0y), (2x+y)(2x+y), (3x+y)(3x+y), (5x+2y)(5x+2y), (8x+3y)(8x+3y), (29x+11y)(29x+11y)

Let's stop writing the x and y, and put these in a table instead, where each column represents a bracket:

01235829
10112311

Then, arranged like this, the determinant of each 2x2 square alternates between 1 and -1.

Now, let's think about how we generated these brackets.

Let's say we have written the line m(ax+by)+n(cx+dy)m(ax+by) + n(cx+dy) , with m>nm \gt n. For example, if m=3m=3, a=8a=8, b=3b=3, n=1n=1, c=5c=5, d=2d=2 then we have the penultimate line of the above.

So what's the next line? Well, we reduce: write m=qn+mm = qn + m', so qq is the next quotient in the Euclidean algorithm, and mm' is the remainder.

m(ax+by)+n(cx+dy)m(ax + by) + n(cx + dy)=(qn+m)(ax+by)+n(cx+dy) = (qn+m')(ax + by) + n(cx + dy)=m(ax+by)+n(q(ax+by)+(cx+dy)) = m'(ax + by) + n(q(ax+by)+(cx + dy))=m(ax+by)+n((qa+c)x+(qb+d)y) = m'(ax + by) + n((qa+c)x+(qb+d)y)

And so, if we have the two brackets (cx+dy)(cx+dy) and (ax+by)(ax+by), then the next bracket is ((qa+c)x+(qb+d)y)((qa+c)x + (qb+d)y).

Writing this in the table form, if we currently have two adjacent columns like this:

...ccaa
...ddbb

Then the next column is like this:

...ccaaqa+cqa+c
...ddbbqb+dqb+d

And so, if we put the quotients in a row on top:

q0q_0...qi2q_{i-2}qi1q_{i-1}qiq_i...
0011q01+0q_0\cdot 1 + 0...ccaaqia+cq_ia+c...
1100q00+1q_0 \cdot 0 + 1...ddbbqib+dq_ib+d...

Wow, so now we have an efficient, convenient way to compute solutions!

It's almost... magic! So magic, it's called the magic box!

Let's do a couple of examples. First, let's summarize how we found a solution to 29x+11y=129x + 11y = 1:

We first do Euclid's Algorithm to find the quotients:

29=29 = 22 11+7\cdot 11 + 7

11=11 = 11 7+4\cdot 7 + 4

7=7 = 11 4+3\cdot 4 + 3

4=4 = 11 3+1\cdot 3 + 1

3=3 = 33 1+0\cdot 1 + 0

And so the quotients are [2,1,1,1,3][2, 1, 1, 1, 3].

Then we draw out the start of the magic box:

2211111133
0011
1100

Now we fill out each row from left to right: each number is equal to the quotient above it in the top row, multiplied by the number to the left of it, plus the number two squares to the left of it.

2211111133
001121+0=22\cdot1 + 0 = 212+1=31\cdot 2 + 1 = 3etc.
110020+1=12\cdot 0 + 1 = 111+0=11 \cdot 1 + 0 = 1etc.

Then, when we get the whole magic box:

2211111133
0011223355882929
1100111122331111

The last 2x2 square gives us a solution to 29x11y=129x - 11y = 1, namely x=3x = 3, y=8y=8.

Now try if yourself! Compute the magic box for 121x+43y=1121x + 43y = 1, and hence find an integer solution. Check against the answer below.

The magic box also relates to continued fractions: we actually computed the convergents of 121/43121/43, i.e. (best possible) rational approximations using smaller integers. Really, the "magic" of it is that it feels inside-out: originally we were using the quotients in reverse order, but now, we're using them in the same order that we compute them8.

I won't prove that the magic box works, because this article is quite long already. But feel free to try yourself (hint: induction).

Finally, it will feel satisfying to actually write the magic box in terms of 2x2 matrices, since we're talking about determinants:

Let's say the list of quotients is t1,t2,,tnt_1, t_2, \cdots, t_n .

Then we build a sequence of matrices (which are the 2x2 squares in the magic box, from left to right): the first is M1=(0110)M_1 = \begin{pmatrix}0 & 1\\1 & 0\end{pmatrix}, and for any matrix MiM_i in the sequence with Mi=(pipi+1qiqi+1)M_i = \begin{pmatrix} p_i & p_{i+1} \\ q_i & q_{i+1} \end{pmatrix}, we have that:Mi+1=(pi+1tipi+1+piqi+1tiqi+1+qi)M_{i+1} = \begin{pmatrix} p_{i+1} & t_i p_{i+1} + p_i \\ q_{i+1} & t_i q_{i+1} + q_i\end{pmatrix}Now we can write this in terms of MiM_i like so:Mi+1=(pi+1piqi+1qi)+(0tipi+10tiqi+1)M_{i+1} = \begin{pmatrix} p_{i+1} & p_i \\ q_{i+1} & q_i\end{pmatrix} + \begin{pmatrix} 0 & t_i p_{i+1} \\ 0 & t_i q_{i+1}\end{pmatrix}=Mi(0110)+Mi(000ti)= M_i \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix} + M_i\begin{pmatrix} 0 & 0 \\ 0 & t_i \end{pmatrix}=Mi(011ti)= M_i \begin{pmatrix} 0 & 1 \\ 1 & t_i \end{pmatrix}

Now, using this recurrence, we have that the last 2x2 square in the magic box, the one that gives the solution to ax+by=±1ax + by = \pm 1, is:(0110)(011t1)(011t2)()(011tn)\begin{pmatrix} 0 & 1 \\ 1 & 0\end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & t_1\end{pmatrix} \begin{pmatrix} 0 & 1 \\ 1 & t_2\end{pmatrix} (\cdots) \begin{pmatrix} 0 & 1 \\ 1 & t_n\end{pmatrix}=0in(011ti) = \prod_{0 \leq i \leq n} \begin{pmatrix} 0 & 1 \\ 1 & t_i\end{pmatrix}Where we extend the definition of the tit_i to include t0=0t_0 = 0.

Note: this recurrence also justifies that the determinant of each 2x2 square in the magic box alternates between 1 and -1, because:det(Mi+1)det(M_{i+1})=det(Mi(011ti))= det\left(M_i \begin{pmatrix} 0 & 1 \\ 1 & t_i \end{pmatrix}\right)=det(Mi)det((011ti))= det(M_i) \, det\left(\begin{pmatrix} 0 & 1 \\ 1 & t_i \end{pmatrix}\right)=det(Mi) = -det(M_i)

Remarks: first unobvious result?

Bezout's lemma is interesting because it is the first thing we've come across that wouldn't be obvious to an average high-school student. Indeed, most people would say, "why so much rigor?" when we're proving things like the division algorithm that just already feel obvious to everyone.

But that's what a lot of maths is about - ensuring we have rigorous foundations to stand on. In this case, we are all convinced that what we are trying to prove is true, and it does indeed turn out to be provably true, but what about when we're trying to prove something we believe, and it's not true? We need to ensure that all proofs are rigorous (enough), otherwise holes could creep in, and one false assumption would render mathematics ostensibly inconsistent.


  1. There are technically two goal states: (0,1)(0,1) and (1,0)(1,0), but if we reach one of these then we can reach the other by transferring the water, so it doesn't really matter. 

  2. This assumption is important, because it lets us deduce that we never fill a non-empty but non-full jug (as that would create unnecessary steps). 

  3. This is the contrapositive statement of "if the puzzle can be solved, then there do exist integers...". 

  4. Why? 

  5. Why does the repeated division process end up finding the gcd? Isn't that magical? Hint: gcd(a,b)=gcd(ab,b)gcd(a,b) = gcd(a-b,b) because something divides both aa and bb if and only if it divides both aba-b and bb

  6. This would be quite inconvenient for a computer algorithm, because we use the quotients in the opposite order to which they are generated, which means we have to store all the quotients rather than working with each quotient as it is generated. 

  7. Actually, we can set it to be anything, because it's multiplied by 0: and we always get an integer solution (x,y)(x,y) from the two simultaneous equations, because the matrix (832911)\begin{pmatrix}8 & 3 \\ 29 & 11\end{pmatrix} has an integral inverse due to its determinant! 

  8. This is good because a computer can fill in the table at the same time as generating the quotients, so it's very memory efficient.