
Math 3070 Course Notes
Rob Noble
October 16, 2014
Contents
1 Mathematical Induction and the Least Integer Principle
2 Integers
3 Unique Factorization
4 Linear Diophantine Equations
5 Congruences
6 Linear Congruences
7 Fermat’s and Wilson’s Theorems
8 The Divisors of an Integer
9 Perfect Numbers
10 Euler’s Theorem and Function
11 Primitive Roots
12 Quadratic Congruences
13 Quadratic Reciprocity
14 Pythagorean Triangles
15 Infinite Descent and Fermat’s Conjecture
16 Sums of Squares
17 $x^2 - Ny^2 = 1$
Chapter 1
Mathematical Induction and the Least Integer Principle
Many times in what follows, we will invoke the principle of mathematical induction or the least
integer principle to prove results. These principles are equivalent, and are used extensively, sometimes implicitly, in mathematics. There are two equivalent forms of the principle of mathematical
induction. These two forms are given below.
Lemma 1 (Principle of Mathematical Induction (First Form)). Let S be a set of integers. If S
contains some integer m and is such that
For all integers n ≥ m, if n ∈ S then n + 1 ∈ S,
then S contains all integers greater than or equal to m.
Lemma 2 (Principle of Mathematical Induction (Second Form)). Let S be a set of integers. If S
contains some integer m and is such that
For all integers n ≥ m, if all of m, m + 1, . . . , n ∈ S then n + 1 ∈ S,
then S contains all integers greater than or equal to m.
Although the principle of mathematical induction is equivalent to the least integer principle,
sometimes it is more natural to apply the latter to prove a particular result. We state the
least integer principle below, along with its dual, the greatest integer principle.
Lemma 3 (Greatest and Least Integer Principles). Let S be a set of integers. If S is nonempty
and bounded above (resp. below) then S has a greatest (resp. least) element.
As stated above, the two forms of the principle of mathematical induction and the least integer
principle are equivalent. This forms the content of the following proposition.
Proposition 1. The following statements are equivalent.
1. If S is a set of integers containing some integer m such that
For all integers n ≥ m, if all of m, m + 1, . . . , n ∈ S then n + 1 ∈ S,
then S contains all integers greater than or equal to m.
2. If S is a nonempty set of integers that is bounded below, then S contains a least element.
We close this section with an example that illustrates the use of these equivalent fundamental
concepts. We will then turn to the development of the core material of the course.
Example 1. Use one of the two forms of mathematical induction or the least integer principle to
prove the following statements.
(a) For all integers n ≥ 1 we have
\[ 1^3 + 2^3 + \cdots + n^3 = \frac{n^2 (n + 1)^2}{4}. \]
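(b) The Fibonacci numbers $\{f_0, f_1, f_2, \dots\}$ are defined recursively by
\[ f_0 = 0, \quad f_1 = 1, \quad f_{n+2} = f_{n+1} + f_n \quad (n \ge 0). \]
For all n ≥ 0 we have
\[ f_n = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n}. \]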
(c) For every positive integer n there exist integers q and r such that 0 ≤ r ≤ 3 and
n = 4q + r.
We will see later that part (c) above is a special case of the division algorithm,
which makes rigorous the well-known process of division with remainder.
Solution. The first form of the principle of mathematical induction will be the most natural to
use for part (a), whereas the second form of the principle of mathematical induction, and the least
integer principle will prove to be the most natural to apply to parts (b) and (c) respectively. We
start with part (a). Let S be the set of all integers greater than or equal to 1 for which the result
holds. That is,
\[ S = \left\{ n \ge 1 \;\middle|\; 1^3 + 2^3 + \cdots + n^3 = \frac{n^2 (n + 1)^2}{4} \right\}. \]
We need to prove that S contains all integers greater than or equal to one, and, by the first form
of the principle of mathematical induction, it will be sufficient to show that 1 ∈ S and that for any
integer n ≥ 1, if n ∈ S then n + 1 ∈ S. Well, 1 is certainly in S since
\[ 1^3 = 1 = \frac{4}{4} = \frac{1^2 (1 + 1)^2}{4}. \]
Suppose then that for some integer n ≥ 1 we have n ∈ S. We need to prove that under this
assumption n + 1 ∈ S as well. To this end, we compute
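\begin{align*}
1^3 + 2^3 + \cdots + n^3 + (n + 1)^3 &= (1^3 + 2^3 + \cdots + n^3) + (n + 1)^3 \\
&= \frac{n^2 (n + 1)^2}{4} + (n + 1)^3 && \text{(since we are assuming } n \in S\text{)} \\
&= (n + 1)^2 \left( \frac{n^2}{4} + n + 1 \right) \\
&= (n + 1)^2 \, \frac{n^2 + 4n + 4}{4} \\
&= (n + 1)^2 \, \frac{(n + 2)^2}{4} \\
&= \frac{(n + 1)^2 ((n + 1) + 1)^2}{4}.
\end{align*}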
We have shown that n + 1 ∈ S so that we can conclude by the first form of the principle of
mathematical induction that S contains all integers greater than or equal to one. Thus, for all
integers n ≥ 1, we have
\[ 1^3 + \cdots + n^3 = \frac{n^2 (n + 1)^2}{4} \]
as claimed.
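The identity in part (a) can also be spot-checked numerically. The following Python sketch is not a substitute for the induction proof; it only compares both sides for a finite range of n. The names `sum_of_cubes`, `closed_form`, and `formula_holds` are our own, chosen for illustration.

```python
def sum_of_cubes(n: int) -> int:
    """Return 1^3 + 2^3 + ... + n^3 by direct summation."""
    return sum(k**3 for k in range(1, n + 1))


def closed_form(n: int) -> int:
    """Return n^2 (n + 1)^2 / 4; the numerator is always divisible by 4."""
    return n**2 * (n + 1) ** 2 // 4


# Compare the two sides of part (a) for n = 1, ..., 300.
formula_holds = all(sum_of_cubes(n) == closed_form(n) for n in range(1, 301))
```

A finite check of this kind can catch an algebra slip, but only the induction argument establishes the result for every n ≥ 1.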
We turn now to the proof of part (b). We will use the second form of the principle of mathematical induction. Suppose then that S is the set of all integers greater than or equal to 0 for which
the result holds. That is
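\[ S = \left\{ n \ge 0 \;\middle|\; f_n = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \right\}. \]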
We need to show that S contains all integers greater than or equal to zero. By the second form of
the principle of mathematical induction it is sufficient to show that 0, 1 ∈ S and that for all n ≥ 1
if 0, 1, . . . , n ∈ S, then n + 1 ∈ S as well. First of all, 0 ∈ S since
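\[ f_0 = 0 = \frac{1}{\sqrt{5}} - \frac{1}{\sqrt{5}} = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{0} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{0}, \]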
and 1 ∈ S since
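\begin{align*}
\frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{1} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{1} &= \frac{1}{\sqrt{5}} \left( \frac{(1 + \sqrt{5}) - (1 - \sqrt{5})}{2} \right) \\
&= \frac{1}{\sqrt{5}} \cdot \frac{2\sqrt{5}}{2} \\
&= 1 \\
&= f_1.
\end{align*}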
Suppose then that we have an integer n ≥ 1 for which 0, 1, . . . , n ∈ S. We need to show that this
assumption implies that n + 1 ∈ S as well. Since n ≥ 1, we are assuming in particular that n and n − 1 are both in S
so that
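\[ f_n = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n}, \]
and
\[ f_{n-1} = \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n-1} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n-1}. \]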
Therefore
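\begin{align*}
f_{n+1} &= f_n + f_{n-1} \\
&= \left[ \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \right] + \left[ \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n-1} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n-1} \right] \\
&= \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} \left( 1 + \frac{2}{1 + \sqrt{5}} \right) - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \left( 1 + \frac{2}{1 - \sqrt{5}} \right).
\end{align*}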
But
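\begin{align*}
1 + \frac{2}{1 + \sqrt{5}} &= 1 + \frac{2(1 - \sqrt{5})}{(1 + \sqrt{5})(1 - \sqrt{5})} \\
&= 1 - \frac{2(1 - \sqrt{5})}{4} \\
&= 1 - \frac{1 - \sqrt{5}}{2} \\
&= \frac{1 + \sqrt{5}}{2}.
\end{align*}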
Similarly, we have
\[ 1 + \frac{2}{1 - \sqrt{5}} = \frac{1 - \sqrt{5}}{2}. \]
We conclude that
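\begin{align*}
f_{n+1} &= \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} \left( 1 + \frac{2}{1 + \sqrt{5}} \right) - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \left( 1 + \frac{2}{1 - \sqrt{5}} \right) \\
&= \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n} \left( \frac{1 + \sqrt{5}}{2} \right) - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n} \left( \frac{1 - \sqrt{5}}{2} \right) \\
&= \frac{1}{\sqrt{5}} \left( \frac{1 + \sqrt{5}}{2} \right)^{n+1} - \frac{1}{\sqrt{5}} \left( \frac{1 - \sqrt{5}}{2} \right)^{n+1}.
\end{align*}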
We have therefore shown that n + 1 ∈ S so that S contains all integers greater than or equal to 0,
as required.
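As a sanity check on part (b), the recursion and the closed form can be compared numerically. The sketch below is illustrative only: the closed form is evaluated in floating point and rounded, which we assume to be accurate for the small range of n tested here, and the names `fib`, `binet`, and `agree` are our own.

```python
from math import sqrt


def fib(n: int) -> int:
    """Return f_n from f_0 = 0, f_1 = 1, f_{n+2} = f_{n+1} + f_n."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def binet(n: int) -> int:
    """Evaluate the closed form in floating point and round to an integer."""
    s = sqrt(5.0)
    return round((((1 + s) / 2) ** n - ((1 - s) / 2) ** n) / s)


# Compare the recursive and closed-form values for n = 0, ..., 40.
agree = all(fib(n) == binet(n) for n in range(0, 41))
```

For much larger n the rounding error of floating point eventually exceeds 1/2, so the comparison would need exact arithmetic; the induction proof has no such limitation.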
Finally, we turn to the proof of part (c). We will apply the least integer principle. Let n ≥ 1 be
an integer. Define
S = {n − 4q | q is an integer and n − 4q ≥ 0}.
Then S is nonempty since it contains
n = n − 4(0).
Also, since every member of S is greater than or equal to zero, we see that S is bounded below. By
the least integer principle we conclude that S has a least element r. But then r = n − 4q for some
q so that n = 4q + r. In order to complete the proof, we need to show that 0 ≤ r ≤ 3. But this must
be the case since r ≥ 0 (since it is a member of S), and if r ≥ 4, then we could write
n = 4(q + 1) + (r − 4),
where 0 ≤ r − 4 < r. This would imply that r − 4 = n − 4(q + 1) ∈ S which would contradict the
minimality of r. By contradiction, we conclude that 0 ≤ r ≤ 3, as required.
We will get plenty of additional practice applying the fundamental principles introduced in this
section in what follows.
Chapter 2
Integers
This chapter is based on [Dud08, §1].
We denote the set of integers by Z, the set of natural numbers, or positive integers, by N, and
the set of nonnegative integers by N0 . That is,
\begin{align*}
\mathbb{Z} &= \{\dots, -3, -2, -1, 0, 1, 2, 3, \dots\}; \tag{2.1} \\
\mathbb{N} &= \{1, 2, 3, \dots\}; \tag{2.2} \\
\mathbb{N}_0 &= \{0, 1, 2, 3, \dots\}. \tag{2.3}
\end{align*}
Definition 1. Let a, b ∈ Z. We say that a divides b, written a | b, if and only if there exists d ∈ Z
for which b = ad. If a does not divide b, we write a ∤ b.
Example 2. We have 2 | 6, 12 | 60, 17 | 17, −5 | 50, and 8 | −24, but 4 ∤ 2 and 3 ∤ 4.
Proposition 2. The relation | satisfies the following properties:
(i) It is transitive. That is, for all a, b, c ∈ Z, if a | b and b | c, then a | c.
(ii) For integers d, a1 , . . . , an , c1 , . . . , cn , if d | a1 , d | a2 , . . . , d | an , then d | (c1 a1 + · · · + cn an ).
Proof. Let a, b, c, d, a1 , . . . , an , c1 , . . . , cn ∈ Z.
(i) Suppose that a | b and b | c. Then, there exist integers e and f such that b = ae and c = bf .
But then, we have
c = bf = (ae)f = a(ef ).
Setting g = ef , we see that c = ag for some integer g so that a | c, as required.
(ii) Suppose that d | a1 , d | a2 , . . . , d | an . Then, there exist integers b1 , . . . , bn such that a1 = db1 ,
a2 = db2 , . . . , an = dbn . But then, we see that
c1 a1 + · · · + cn an = c1 (db1 ) + · · · + cn (dbn )
= d(c1 b1 ) + · · · + d(cn bn )
= d(c1 b1 + · · · + cn bn )
= dh
where we have set h = c1 b1 + · · · + cn bn . Since we have found an integer h such that c1 a1 +
· · · + cn an = dh, we conclude that d | (c1 a1 + · · · + cn an ), as required.
Definition 2. Let a, b ∈ Z where at least one of a, b is nonzero. The unique integer d satisfying
the pair of conditions:
(i) d | a and d | b;
(ii) For any integer c such that c | a and c | b, we have c ≤ d,
is called the greatest common divisor of a and b and is denoted by (a, b) or gcd(a, b).
Implicit in this definition is that such an integer d exists. In order to prove that this is the case,
we need to invoke the greatest integer principle stated along with its dual the least integer principle
in Lemma 3.
Proposition 3. Let a, b ∈ Z not both be zero. Then, the greatest common divisor of a and b is well
defined and at least 1. That is, there exists a unique integer d satisfying the conditions (i) and (ii)
of Definition 2 and this d is greater than or equal to one.
Proof. Let S be the set of all common divisors of a and b. That is,
S = {c ∈ Z | c | a and c | b} ⊆ Z.
Since 1 | a and 1 | b, we see that 1 ∈ S so that S ≠ ∅. Further, since a and b are not both zero,
any common divisor of a and b divides a nonzero integer and is therefore bounded above by
max{|a|, |b|}, so we see that S is bounded above. By the greatest integer principle,
S has a largest element d. But then, since d ∈ S, we see that d | a and d | b, and since it is the
largest element of S, any integer c for which c | a and c | b must satisfy c ≤ d. Further, since S
cannot have two largest elements, we see that d is unique. Finally, since 1 ∈ S, we conclude that
d ≥ 1.
Theorem 1. Let a, b ∈ Z not both be zero, and let d = (a, b). Then (a/d, b/d) = 1.
Proof. Assume the hypotheses. From Proposition 3 we know that (a/d, b/d) ≥ 1. We complete the
proof by showing that (a/d, b/d) ≤ 1. To this end, let g = (a/d, b/d). Since g | a/d and g | b/d,
there exist integers u and v such that
\[ \frac{a}{d} = gu, \qquad \frac{b}{d} = gv. \]
Therefore,
a = (gd)u,
b = (gd)v.
We conclude that gd is a common divisor of a and b. Since d is the greatest common divisor of a
and b, we conclude that gd ≤ d. Finally, since d > 0, we can divide by d to conclude that g ≤ 1, as
required.
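Theorem 1 is easy to illustrate on concrete integers. The sketch below uses Python's built-in `math.gcd` (which returns a nonnegative value) in place of the definition above; the helper name `reduced_pair` is our own and is only meant as an illustration.

```python
from math import gcd


def reduced_pair(a: int, b: int) -> tuple[int, int]:
    """Return (a/d, b/d) where d = (a, b); a and b must not both be zero."""
    d = gcd(a, b)
    if d == 0:
        raise ValueError("a and b must not both be zero")
    return a // d, b // d


# Dividing out d = (343, -280) = 7 leaves a relatively prime pair.
u, v = reduced_pair(343, -280)
```

Here `gcd(u, v)` evaluates to 1, in agreement with the theorem.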
Definition 3 (Relatively Prime). Let a, b ∈ Z not both be zero. We call a and b relatively prime
provided (a, b) = 1.
If a and b are large integers, it is impractical to find their greatest common divisor by trial
division. The Euclidean Algorithm provides us with an efficient, systematic way of determining
greatest common divisors. First, we need to make use of the division algorithm which relies on the
archimedean principle.
Theorem 2 (The Archimedean Principle For Integers). Given any integers a and b with b ≠ 0, there exist
integers u and v such that
a ≤ bu,
a ≥ bv.
The division algorithm referred to above is given by the following theorem.
Theorem 3. Let a, b ∈ Z, with b ≠ 0. There exist unique integers q and r, with 0 ≤ r < |b| such
that a = bq + r.
Proof. Assume the hypotheses and let
S = {a − bq ∈ N0 | q ∈ Z}.
We will see that S is a nonempty set of integers that is bounded below. The least integer principle
will then provide us with a least element r, and we’ll see that this r together with the q for which
r = a − bq satisfy the requirements of the theorem. By the archimedean principle, there exists an
integer q such that a ≥ bq. But then a − bq ≥ 0 so that a − bq ∈ S. S is therefore nonempty.
Since every element of S is nonnegative, we see that S is bounded below by 0. By the least integer
principle, S has a least element r = a − bq. Then a = bq + r. Further, we see that r = a − bq ≥ 0
since r ∈ S. We conclude the proof of the existence of integers q and r satisfying the conditions
of the theorem by proving that r < |b|. But this must be the case since if r ≥ |b|, and |b| = bε for
ε = ±1, we could write
a = b(q + ε) + (r − bε),
where r > r − bε = r − |b| ≥ 0. This would imply that r − bε = a − b(q + ε) ∈ S thereby contradicting
the minimality of r. Therefore, we have 0 ≤ r < |b|. We have proved that there is at least one pair
of integers q, r satisfying the conditions of the theorem. We now complete the proof by showing
that this pair of integers is unique. To this end, suppose that the pairs q1 , r1 and q2 , r2 both satisfy
the conditions of the theorem. Then, we have
\begin{align*}
a &= bq_1 + r_1, & 0 &\le r_1 < |b|; \tag{2.4} \\
a &= bq_2 + r_2, & 0 &\le r_2 < |b|. \tag{2.5}
\end{align*}
Subtracting (2.4) from (2.5) yields
\[ b(q_2 - q_1) = r_1 - r_2. \tag{2.6} \]
Now, (2.6) implies that r1 − r2 is a multiple of b. However, we have −|b| < r1 − r2 < |b|. Since
the only multiple of b in this range is 0, we conclude that r1 − r2 = 0. But then, (2.6) reads
b(q2 − q1 ) = 0, and since b ≠ 0, we can conclude that q2 − q1 = 0. We have therefore shown that
r1 = r2 and q1 = q2 so that the pair of integers referred to in the statement of the theorem is indeed
unique.
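The pair (q, r) of Theorem 3 can be computed directly. The following Python sketch is one possible implementation, not the construction used in the proof: it relies on the built-in `divmod`, whose remainder takes the sign of the divisor, and the function name `division_algorithm` is our own.

```python
def division_algorithm(a: int, b: int) -> tuple[int, int]:
    """Return (q, r) with a == b*q + r and 0 <= r < |b|, for b != 0."""
    if b == 0:
        raise ValueError("b must be nonzero")
    # Dividing by |b| > 0 always yields a remainder in the range 0..|b|-1.
    q, r = divmod(a, abs(b))
    # If b is negative, flip the quotient: a = |b|*q + r = b*(-q) + r.
    if b < 0:
        q = -q
    return q, r
```

For example, `division_algorithm(343, -280)` gives q = −1 and r = 63, since 343 = (−280)(−1) + 63.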
Remark 1. We proved the division algorithm in this fashion since it allowed us to do everything
while speaking only of integers. We didn’t need to discuss real numbers or rational numbers.
However, if we allow ourselves the use of the reals and rationals, one can show that the unique
integers q and r referred to in the statement of Theorem 3 are intimately related to the floor and
fractional part of a/b, respectively:
\[ q = \left\lfloor \frac{a}{b} \right\rfloor, \qquad \frac{r}{b} = \left\{ \frac{a}{b} \right\} \qquad (b > 0). \]
Here, the floor ⌊x⌋ of a real number x is the largest integer less than or equal to x while the
fractional part {x} of a real number x is the difference x − ⌊x⌋. Therefore, the division algorithm
is simply the familiar process of division with remainder.
Theorem 3 together with the following lemma will yield the Euclidean algorithm for computing
greatest common divisors.
Lemma 4. Let a, b ∈ Z not both be zero. If a = bq + r, for integers q and r, then (a, b) = (b, r).
Proof. Firstly, we note that the greatest common divisors in question are well-defined since there
is definitely no problem if b ≠ 0, while if b = 0, then r = a ≠ 0. We will use part (ii) of Proposition
2 that states that a common divisor of integers must divide any linear combination of the integers.
Let $g_{ab}$ and $g_{br}$ denote the greatest common divisor of a, b and b, r, respectively. From a = bq + r
together with the fact that $g_{ab}$ divides both a and b, we conclude that $g_{ab} \mid r$. Consequently, $g_{ab}$
divides both b and r so that $g_{ab} \le g_{br}$. Similarly, since $g_{br}$ divides both b and r, and a = bq + r,
we see that $g_{br}$ also divides a. But then $g_{br}$ divides both a and b so that $g_{br} \le g_{ab}$. Putting these
two inequalities together yields $g_{ab} \le g_{br} \le g_{ab}$ so that $g_{ab} = g_{br}$, as required.
We have arrived at last at the Euclidean algorithm.
Theorem 4 (The Euclidean Algorithm). Let a, b ∈ Z with b ≠ 0. If we define sequences $\{q_0, q_1, q_2, \dots\}$
and $\{r_{-1}, r_0, r_1, r_2, \dots\}$ by letting $r_{-1} = |b|$ and then applying the division algorithm successively to
obtain
\begin{align*}
a &= b q_0 + r_0, & 0 &\le r_0 < |b| \\
b &= r_0 q_1 + r_1, & 0 &\le r_1 < r_0 \\
r_0 &= r_1 q_2 + r_2, & 0 &\le r_2 < r_1 \\
r_1 &= r_2 q_3 + r_3, & 0 &\le r_3 < r_2 \\
r_2 &= r_3 q_4 + r_4, & 0 &\le r_4 < r_3 \\
&\;\;\vdots & &\;\;\vdots
\end{align*}
then there is a first index t ≥ 0 such that $r_t = 0$. The greatest common divisor of a and b is then given
by $(a, b) = r_{t-1}$.
Proof. We have the decreasing sequence |b| > r0 > r1 > r2 > · · · ≥ 0. Therefore, eventually we
obtain a first zero remainder rt . We then have
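\begin{align*}
a &= b q_0 + r_0, & 0 &\le r_0 < |b| \\
b &= r_0 q_1 + r_1, & 0 &\le r_1 < r_0 \\
r_0 &= r_1 q_2 + r_2, & 0 &\le r_2 < r_1 \\
r_1 &= r_2 q_3 + r_3, & 0 &\le r_3 < r_2 \\
r_2 &= r_3 q_4 + r_4, & 0 &\le r_4 < r_3 \\
&\;\;\vdots & &\;\;\vdots \\
r_{t-3} &= r_{t-2} q_{t-1} + r_{t-1}, & 0 &\le r_{t-1} < r_{t-2} \\
r_{t-2} &= r_{t-1} q_t.
\end{align*}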
Successively applying Lemma 4 yields $(a, b) = (b, r_0) = (r_0, r_1) = \cdots = (r_{t-2}, r_{t-1}) = r_{t-1}$, as
required.
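The remainder sequence of Theorem 4 translates directly into a short loop. The Python sketch below is our own illustration (the names `euclid_remainders` and `euclid_gcd` are not from the notes); it uses the `%` operator, which for a positive divisor returns a remainder in the range required by the division algorithm.

```python
def euclid_remainders(a: int, b: int) -> list[int]:
    """Return the remainders r_0, r_1, ..., r_t (with r_t = 0) for b != 0."""
    if b == 0:
        raise ValueError("b must be nonzero")
    remainders = [a % abs(b)]           # r_0, chosen with 0 <= r_0 < |b|
    prev = b
    while remainders[-1] != 0:
        cur = remainders[-1]
        remainders.append(prev % cur)   # cur > 0, so 0 <= remainder < cur
        prev = cur
    return remainders


def euclid_gcd(a: int, b: int) -> int:
    """Return (a, b): the last nonzero remainder, or |b| if r_0 is already 0."""
    rs = euclid_remainders(a, b)
    return rs[-2] if len(rs) >= 2 else abs(b)
```

For a = 343 and b = −280 the remainders are 63, 35, 28, 7, 0, so the greatest common divisor is 7.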
Corollary 1. Let a, b ∈ Z not both be zero and d = (a, b). Then
(i) There exist integers x and y such that d = ax + by.
(ii) Every common divisor of a and b divides d.
Proof. Part (i) follows from running the Euclidean algorithm backwards starting with the second
to last equation. The details are left to the reader. For part (ii), we use part (i) to find integers x
and y such that d = ax + by and then note that any common divisor of a and b must also divide
ax + by = d.
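The integers x and y of part (i) can also be found mechanically. The sketch below does not run the algorithm backwards as described in the proof; instead it uses the common variant that carries the coefficients forward alongside the remainders. The function name `extended_gcd` and the sign normalisation at the end are our own choices, and the coefficients it returns need not match those found by back-substitution, since x and y are not unique.

```python
def extended_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (d, x, y) with d = (a, b) >= 0 and d == a*x + b*y."""
    # Invariants: old_r == a*old_x + b*old_y and r == a*x + b*y.
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:                      # normalise so that d is nonnegative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y
```

Whatever coefficients are returned, the identity d = ax + by can be verified by direct substitution.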
Remark 2. We defined the greatest common divisor as the common divisor that is larger than any
other common divisor. Using part (ii) of Corollary 1, we could have, instead, defined the greatest
common divisor to be the common divisor that is divisible by every common divisor.
Example 3. Use the Euclidean algorithm to calculate (343, −280) and (1578, 442). In each case,
express the greatest common divisor as a linear combination of the given integers.
Solution. We compute
\begin{align*}
343 &= (-280)(-1) + 63; \tag{2.7} \\
-280 &= 63(-5) + 35; \tag{2.8} \\
63 &= 35(1) + 28; \tag{2.9} \\
35 &= 28(1) + 7; \tag{2.10} \\
28 &= 7(4).
\end{align*}
We conclude that (343, −280) = 7. Running equations (2.7)–(2.10) backwards yields
\begin{align*}
7 &= 35 - 28 && \text{from (2.10)} \\
&= 35 - (63 - 35) && \text{from (2.9)} \\
&= -63 + 2(35) \\
&= -63 + 2(-280 + 5(63)) && \text{from (2.8)} \\
&= 2(-280) + 9(63) \\
&= 2(-280) + 9(343 - 280) && \text{from (2.7)} \\
&= 11(-280) + 9(343).
\end{align*}
Similarly, we compute
\begin{align*}
1578 &= (442)(3) + 252; \tag{2.11} \\
442 &= 252(1) + 190; \tag{2.12} \\
252 &= 190(1) + 62; \tag{2.13} \\
190 &= 62(3) + 4; \tag{2.14} \\
62 &= 4(15) + 2; \tag{2.15} \\
4 &= 2(2).
\end{align*}
We conclude that (1578, 442) = 2. Running equations (2.11)–(2.15) backwards yields
\begin{align*}
2 &= 62 - 4(15) && \text{from (2.15)} \\
&= 62 - 15(190 - 62(3)) && \text{from (2.14)} \\
&= -15(190) + 46(62) \\
&= -15(190) + 46(252 - 190) && \text{from (2.13)} \\
&= 46(252) - 61(190) \\
&= 46(252) - 61(442 - 252) && \text{from (2.12)} \\
&= -61(442) + 107(252) \\
&= -61(442) + 107(1578 - 442(3)) && \text{from (2.11)} \\
&= 107(1578) - 382(442).
\end{align*}
We close this section with a couple of properties of divisibility in the presence of relative primality.
Proposition 4. The following two statements hold.
(i) If a, b, d ∈ Z are such that d | ab and (d, a) = 1, then d | b.
(ii) If a, b, m ∈ Z are such that a | m, b | m and (a, b) = 1, then ab | m.
Proof.
(i) Assume the hypotheses. From part (i) of Corollary 1, we can find integers x and y such that
1 = dx + ay.
Multiplying by b yields
b = bxd + yab.
But then, since d divides itself as well as ab, we see that it also divides bxd + yab = b.
(ii) Assume the hypotheses. Since b | m, there exists an integer q such that m = bq. But then,
a | m reads a | bq. Since (a, b) = 1, we can invoke part (i) to conclude that a | q. But then,
there is an integer r such that q = ar. We conclude that
m = bq = bar = (ab)r,
so that ab | m, as required.
Chapter 3
Unique Factorization
This chapter is based on [Dud08, §2].
In this section, we prove that the set Z of integers has unique factorization. That is, we show
that every nonzero integer not equal to 1 or −1 can be factored into a product of prime numbers in
an essentially unique way. This does not hold in general for other sets of numbers as the following
example illustrates. For the purposes of this example, we need to distinguish between primes that
never divide a product without dividing one of the individual factors and irreducibles that cannot
be factored nontrivially. The reason for this distinction is precisely because in the case given in the
example, we do not have unique factorization, as we will see.
Example 4. Let $\mathbb{Z}[\sqrt{-6}] = \{a + b\sqrt{-6} \mid a, b \in \mathbb{Z}\}$. Call elements of $\mathbb{Z}[\sqrt{-6}]$ irreducible if they
cannot be factored nontrivially into the product of two elements of $\mathbb{Z}[\sqrt{-6}]$. Here, by nontrivial
factors, we mean elements of $\mathbb{Z}[\sqrt{-6}]$ that do not have absolute value 1. Show that $\mathbb{Z}[\sqrt{-6}]$ does not
possess unique factorization into irreducibles. Further, defining primes of $\mathbb{Z}[\sqrt{-6}]$ to be elements
of $\mathbb{Z}[\sqrt{-6}]$ that cannot divide a product over $\mathbb{Z}[\sqrt{-6}]$ without dividing one of the individual factors
over $\mathbb{Z}[\sqrt{-6}]$, we have irreducibles in $\mathbb{Z}[\sqrt{-6}]$ that are not prime.
Solution. Consider the following equations
\[ -6 = -2 \times 3 = \sqrt{-6} \times \sqrt{-6}. \tag{3.1} \]
We will finish the solution by showing that $-2$, $3$, and $\sqrt{-6}$ are all irreducible in $\mathbb{Z}[\sqrt{-6}]$. If we
have a factorization of $e + f\sqrt{-6} \in \{-2, 3, \sqrt{-6}\}$ of the form
\[ e + f\sqrt{-6} = (a + b\sqrt{-6})(c + d\sqrt{-6}), \tag{3.2} \]
then, multiplying by conjugates, we obtain
\[ e^2 + 6f^2 = (a^2 + 6b^2)(c^2 + 6d^2). \]
In our cases of interest, we obtain the equations
\begin{align*}
(a^2 + 6b^2)(c^2 + 6d^2) &= 4; \tag{3.3} \\
(a^2 + 6b^2)(c^2 + 6d^2) &= 9; \tag{3.4} \\
(a^2 + 6b^2)(c^2 + 6d^2) &= 6. \tag{3.5}
\end{align*}
Therefore, $a^2 + 6b^2$ must be a positive divisor of 4, 9, or 6. This forces $a^2 + 6b^2 \in \{1, 2, 3, 4, 6, 9\}$.
If |b| ≥ 2, then $a^2 + 6b^2 \ge 6(2)^2 = 24 > 9$ is too big to lie in this set. Therefore, we have
b ∈ {0, 1, −1}. Similarly, we must have a ∈ {0, ±1, ±2, ±3}. Further, if a and b are both nonzero,
then $a^2 + 6b^2 = a^2 + 6 \in \{7, 10, \dots\}$. We conclude that one of a, b is zero (and the other is not).
We conclude that $a + b\sqrt{-6} \in \{\pm\sqrt{-6}, \pm 1, \pm 2, \pm 3\}$. Similarly, $c + d\sqrt{-6} \in \{\pm\sqrt{-6}, \pm 1, \pm 2, \pm 3\}$.
The equations one obtains from (3.2) by substituting the relevant values for e and f become
\begin{align*}
-2 &= \pm 1 \times \mp 2; \\
3 &= \pm 1 \times \pm 3; \\
\sqrt{-6} &= \pm\sqrt{-6} \times \pm 1.
\end{align*}
In any case, we obtain a factor with absolute value 1 so that $-2$, $3$, and $\sqrt{-6}$ are irreducible,
as required. Finally, we note that this lack of unique factorization is the reason for the need to
distinguish between primes and irreducibles. Indeed, we have just shown that $-2$, $3$, and $\sqrt{-6}$ are
all irreducible. However, none of these are prime since (3.1) shows that each divides a product of
elements of $\mathbb{Z}[\sqrt{-6}]$ without dividing any of the individual factors:
\begin{align*}
&-2 \mid (\sqrt{-6} \times \sqrt{-6}) \text{ but } -2 \nmid \sqrt{-6}; \\
&3 \mid (\sqrt{-6} \times \sqrt{-6}) \text{ but } 3 \nmid \sqrt{-6}; \\
&\sqrt{-6} \mid (-2 \times 3) \text{ but } \sqrt{-6} \nmid -2 \text{ and } \sqrt{-6} \nmid 3.
\end{align*}
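Here, we can say that $-2, 3 \nmid \sqrt{-6}$ and $\sqrt{-6} \nmid -2, 3$ since none of
\[ \frac{\sqrt{-6}}{-2} = \frac{3}{\sqrt{-6}}, \qquad \frac{\sqrt{-6}}{3} = \frac{-2}{\sqrt{-6}} \]
lie in $\mathbb{Z}[\sqrt{-6}]$.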
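The key step in the solution is that no element of the ring has a²+6b² equal to 2 or 3, which is what a proper factor of −2, 3, or √−6 would require. The short Python sketch below enumerates the small cases; it is an illustration of the case analysis, not a replacement for it, and the names `norm`, `small_norms`, and `no_proper_factor` are our own.

```python
def norm(a: int, b: int) -> int:
    """Return a^2 + 6 b^2, the product of a + b*sqrt(-6) with its conjugate."""
    return a * a + 6 * b * b


# Any element with a^2 + 6 b^2 <= 9 has |a| <= 3 and |b| <= 1,
# so this window contains every candidate factor.
small_norms = {norm(a, b) for a in range(-3, 4) for b in range(-1, 2)}

# A proper factor of an element of norm 4, 9 or 6 would need norm 2 or 3.
needed = {2, 3}
no_proper_factor = small_norms.isdisjoint(needed)
```

Since neither 2 nor 3 appears among the attainable values, every factorization of these three elements must involve a factor with a²+6b² = 1, that is, ±1.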
The lack of unique factorization seen in Example 4 does not occur in the set Z of integers. This
is reflected in the fact that for integers, primes and irreducibles coincide. Before proving that the
set Z of integers possesses unique factorization, we need some preliminaries to which we now turn.
Definition 4 (Primes and Irreducibles). A positive integer p > 1 is called prime if whenever it
divides a product of integers, it divides one of its factors. That is, p > 1 is prime provided
For all integers a and b, p | ab =⇒ p | a or p | b.
A positive integer p > 1 is called irreducible if its only positive factors are 1 and p.
Proposition 5. An integer is prime if and only if it is irreducible.
Proof. Let p > 1 be an integer.
p prime =⇒ p irreducible:
Suppose first that p is prime. Then, whenever p divides a product of integers, it must divide
one of the factors. We wish to prove that p is irreducible. We therefore assume that p can be
factored as p = ab for positive integers a and b and then prove, assuming this, that one of a, b is
equal to 1 (and the other is equal to p). This will show that the only positive factors of p are 1
and p. Suppose then that p = ab. Then p | ab and so p | a or p | b since p is prime. If p | a, we
can write a = pq for some positive integer q. But then, we see that p = ab = pqb so that qb = 1. It
follows that q = b = 1, and a = p. Similarly, if p | b, we conclude that b = p and a = 1. Therefore
p is irreducible, as required.
p irreducible =⇒ p prime:
Suppose now that p > 1 is irreducible. Then, the only positive divisors of p are 1 and p. It
follows that for any integer a, (p, a) ∈ {1, p}. We need to prove that if p divides a product of
integers, then it must divide one of the factors. Suppose then that for integers a and b, we have
p | ab. We complete the proof by establishing that p | a or p | b. We know that (p, a) is either 1 or
p. If (p, a) = 1, then we can invoke part (i) of Proposition 4 to conclude that p | b. On the other
hand, if (p, a) = p, then p | a. We conclude that p is prime, as required.
The definition of primality implies that whenever a prime divides the product of two integers,
it must divide one of the individual factors. We can extend this to any finite number of factors
using mathematical induction or the least integer principle. This forms the content of the following
proposition.
Proposition 6. Let p be a prime, and suppose that we have integers a1 , . . . , an such that
p | a1 . . . an .
Then, p | ai for some 1 ≤ i ≤ n. In particular, if a prime p divides a product q1 . . . qn of primes
q1 , . . . , qn , then p = qi for some 1 ≤ i ≤ n.
Proof. Let S = {1 ≤ j ≤ n | p | (a1 . . . aj ) but p ∤ ai for any 1 ≤ i ≤ j}. We wish to show that S
is empty. This would imply, in particular, that n ∉ S so that p | (a1 . . . an ) forces p | ai for some
1 ≤ i ≤ n, as required. Towards a contradiction, suppose that S ≠ ∅. Then S is a nonempty set
of integers that is bounded below. By the least integer principle, S has a least element m. Since
m ∈ S, we have p | (a1 . . . am ) but p ∤ ai for any 1 ≤ i ≤ m. Note that m ≥ 2, since m = 1 would
require both p | a1 and p ∤ a1 . From p | (a1 . . . am−1 )am and the
definition of primality, we conclude that p | (a1 . . . am−1 ) or p | am . Since the latter is impossible,
we conclude that p | (a1 . . . am−1 ). But m is the least element of S and so m − 1 ∉ S. Therefore
p | (a1 . . . am−1 ) forces p | ai for some 1 ≤ i ≤ m − 1. This is a contradiction and proves that S is
indeed empty. As remarked above, this completes the proof that whenever a prime divides a finite
product of integers, it must divide one of the individual factors. The second part follows from the
first together with the fact that the only positive divisors of a prime are 1 and the prime itself.
Indeed, assuming that p divides the product q1 . . . qn of primes q1 , . . . , qn , we can conclude from
the first part that p | qi for some 1 ≤ i ≤ n. Since the only positive divisors of qi are 1 and qi and
p ≠ 1, we conclude that p = qi , as required.
Having established that primes and irreducibles coincide for the integers, we now invariably
refer to these fundamental integers as primes. We now proceed to the proof of the Fundamental
Theorem of Arithmetic that expresses the fact that integers possess unique factorization.
Lemma 5. Every integer n > 1 is divisible by a prime.
Proof. Let S = {n > 1 | n is not divisible by a prime}. We wish to show that S = ∅. We will
do this by contradiction. Suppose then that S ≠ ∅. Then S is a nonempty set of integers that is
bounded below (by 1 for example). By the least integer principle, S has a least element m > 1.
Since m ∈ S, we know that m is not divisible by a prime. In particular, it is composite. Therefore,
there exist positive integers a and b, such that 1 < a, b < m and m = ab. But then, since a < m,
the minimality of m implies that a is not in S. We conclude that a is divisible by some prime p.
But this is a contradiction since the transitivity of divisibility implies that p | a | m so that m is
divisible by a prime. This contradiction implies that S is indeed the empty set so that every integer
greater than one is divisible by a prime, as required.
Lemma 6. Every integer n > 1 can be written as a finite product of primes.
Proof. We will prove this by invoking the least integer principle. Define S to be the set of all
integers greater than one that cannot be expressed as a finite product of primes. We wish to show
that S = ∅, and we will do this by contradiction. Suppose then that S ≠ ∅. Then, S is a nonempty
set of integers bounded below (by 1 for example). By the least integer principle, we conclude that S
has a least element m. Since m is in S, it cannot be expressed as a finite product of primes and so,
in particular, cannot itself be prime. Therefore, there exist integers a and b such that 1 < a, b < m
and m = ab. But then, a and b are both less than m and so cannot lie in S since m is the least
element of S. We conclude that each of a and b is a finite product of primes. Say
a = p1 . . . pn ,
b = q1 . . . qk ,
for primes p1 , . . . , pn , q1 , . . . , qk . We then obtain
m = ab = p1 . . . pn q1 . . . qk
is a finite product of primes, thereby contradicting m ∈ S. This contradiction proves that S is
indeed empty so that every integer greater than one can be written as a finite product of primes,
as required.
Theorem 5 (Euclid). There are infinitely many primes.
Proof. Towards a contradiction, suppose this is false. Then, there are only finitely many primes,
say p1 , . . . , pn . Consider the positive integer N given by
N = p1 p2 . . . pn + 1.
Since N > 1, we know from Lemma 5 that N is divisible by a prime q. Since we are assuming
that the only primes that exist are p1 , . . . , pn , we see that q must be one of the pj . But this is
impossible since then q would divide both N and p1 p2 . . . pn and consequently would also divide
N − p1 p2 . . . pn = 1. This contradiction completes the proof.
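Note that the integer N built in the proof need not itself be prime; the argument only needs a prime divisor of N that lies outside the given list. The Python sketch below illustrates this with the first six primes. The helper `smallest_prime_factor` is our own illustrative function, and it finds a divisor by plain trial division.

```python
from math import prod


def smallest_prime_factor(n: int) -> int:
    """Return the least prime dividing n, for n > 1, by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d      # the least divisor greater than 1 is prime
        d += 1
    return n              # no divisor d with d*d <= n, so n itself is prime


primes = [2, 3, 5, 7, 11, 13]
N = prod(primes) + 1              # 30031
q = smallest_prime_factor(N)      # a prime divisor of N, as in Lemma 5
```

Here N = 30031 = 59 × 509 is composite, and its prime divisor q = 59 is not among 2, 3, 5, 7, 11, 13.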
Lemma 7. Every composite integer n has a prime divisor less than or equal to $\sqrt{n}$.
Proof. We will once again prove this using the least integer principle. Let S be the set of all
composite integers that do not have a prime divisor less than or equal to their square root. We
wish to show that S = ∅, and will do this by contradiction. Suppose then that S is nonempty.
Then S is a nonempty set of integers, and since each of its members is greater than one, S is
bounded below. By the least integer principle S has a least element m. Since m is composite, we
can find integers a and b such that 1 < a, b < m and m = ab. If a and b were both prime then,
since m only has prime divisors greater than $\sqrt{m}$, we would have $a, b > \sqrt{m}$ which would force
$m = ab > \sqrt{m}\sqrt{m} = m$. Therefore, at least one of a and b is composite. If a is composite, then
it is smaller than m and so does not lie in S. It therefore has a prime divisor p less than or equal
to its square root. But then p | a | m and $p \le \sqrt{a} \le \sqrt{m}$ thereby contradicting m ∈ S. Similarly,
if b were composite, it would have a prime divisor less than or equal to its square root, and this
prime would be a divisor of m less than or equal to $\sqrt{m}$. This contradiction implies that S is indeed
empty so that every composite integer has a prime divisor less than or equal to its square root, as
required.
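Lemma 7 is what justifies stopping trial division at the square root. The following Python sketch is a simple illustration under that reading; the function name `is_prime` is our own, and `math.isqrt` is used to obtain the integer part of the square root without floating point.

```python
from math import isqrt


def is_prime(n: int) -> bool:
    """Return True when n > 1 has no divisor d with 2 <= d <= floor(sqrt(n))."""
    if n < 2:
        return False
    # Testing every d (not only primes) suffices: if n is composite, its
    # prime divisor p <= sqrt(n) from Lemma 7 is among the values tested.
    return all(n % d != 0 for d in range(2, isqrt(n) + 1))
```

For instance, 91 = 7 × 13 is rejected as soon as d = 7 is tried, while 97 passes because no integer from 2 to 9 divides it.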
We have arrived at the Fundamental Theorem of Arithmetic that expresses the fact that the set
of integers possesses unique factorization.
Theorem 6 (The Fundamental Theorem of Arithmetic). Every positive integer can be written
uniquely as a product of primes.
Proof. Here, we consider two products of primes to be the same provided they differ only in the
ordering of the primes involved in the product. Let n be a positive integer. If n = 1, then we agree
that n is the empty product of primes. On the other hand, if n > 1, we know from Lemma 6 that
n can be written in at least one way as a product of primes. What we need to prove is that this
can be done in only one way. Suppose then that
n = p1 . . . pk = q1 . . . qm
(3.6)
for primes p1 , . . . , pk , q1 , . . . , qm . We complete the proof by showing that {p1 , . . . , pk } = {q1 , . . . , qm }.
This can be done by induction or by the least integer principle, but let’s see why this holds using
a more natural heuristic argument. From (3.6) we conclude that
p1 | (q1 . . . qm ).
From Proposition 6 we conclude that p1 = qi for some 1 ≤ i ≤ m. We can then divide both sides
of (3.6) by p1 = qi to obtain
p2 . . . pk = q1 . . . qi−1 qi+1 . . . qm .
Continuing in this fashion, we can pair off each of the $p_\ell$’s with one of the $q_t$’s until no more primes
appear on the left hand side. We must then have k = m for otherwise we would end up with a
product of primes equal to 1 which is impossible. We conclude that {p1 , . . . , pk } = {q1 , . . . , qm }, as
required.
From the Fundamental Theorem of Arithmetic, we know that every positive integer can be
factored uniquely into a product of primes. Collecting together like primes in this factorization
leads to the prime-power factorization of the integer in question. This is the content of the following
definition.
Definition 5 (Prime-Power Factorization). Let n be a positive integer. Then n can be written
uniquely in the form
\[ n = p_1^{e_1} \cdots p_k^{e_k} \tag{3.7} \]
for distinct primes p1 , . . . , pk and positive integers e1 , . . . , ek . The factorization given by (3.7) is
called the prime-power factorization of n.
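As an illustration of Definition 5, the following minimal sketch computes a prime-power factorization by trial division. The function name `prime_power_factorization` is our own, not from the notes. The loop stops once p² exceeds what remains of n, which relies on the fact proved earlier that a composite integer has a prime divisor at most its square root.

```python
def prime_power_factorization(n):
    """Return {prime: exponent} for a positive integer n (empty dict for n = 1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    factors = {}
    p = 2
    while p * p <= n:
        # Divide out p as often as possible; only primes ever divide here,
        # because all smaller prime factors have already been removed.
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        # Whatever is left has no divisor up to its square root, so it is prime.
        factors[n] = factors.get(n, 0) + 1
    return factors


print(prime_power_factorization(360))  # {2: 3, 3: 2, 5: 1}
```

Trial division is adequate for small inputs only; it is meant to mirror the definition, not to be an efficient factoring method.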
To conclude this section, we note that in the presence of unique factorization, we have another
way of determining the greatest common divisor of two integers. We first state a lemma that
characterizes the positive divisors of an integer in terms of its prime divisors.
Lemma 8. Let n be a positive integer with prime-power factorization given by
\[ n = p_1^{e_1} \cdots p_k^{e_k}, \]
where the pi are distinct primes and the ei are positive integers. Then, the positive divisors of n
are those d of the form
\[ d = p_1^{g_1} \cdots p_k^{g_k}, \]
where, for all i, 0 ≤ gi ≤ ei .
Proof. It is clear that any integer d of the form stated in the lemma is a divisor of n. On the other
hand, if d is any divisor of n, then its prime power factorization cannot have any primes distinct
from the pi , and cannot have corresponding exponents greater than the ei . This completes the
proof.
We have arrived at the method of calculating greatest common divisors from prime-power factorizations.
Theorem 7. Let m and n be positive integers having prime-power factorizations given by
\[ m = p_1^{e_1} \cdots p_k^{e_k}; \qquad n = p_1^{f_1} \cdots p_k^{f_k}, \]
for distinct primes p1 , . . . , pk and nonnegative integers e1 , . . . , ek , f1 , . . . , fk . Then, the greatest
common divisor (m, n) of m and n is given by
\[ (m, n) = p_1^{\min\{e_1,f_1\}} \cdots p_k^{\min\{e_k,f_k\}}. \]
Proof. From Lemma 8 we know that the positive divisors of m are those integers of the form
$p_1^{g_1} \cdots p_k^{g_k}$ where 0 ≤ gi ≤ ei for each i, and the positive divisors of n are those integers of the form
$p_1^{g_1} \cdots p_k^{g_k}$ where 0 ≤ gi ≤ fi for each i. The common divisors of m and n are therefore the integers
of the form $p_1^{g_1} \cdots p_k^{g_k}$ where 0 ≤ gi ≤ min{ei , fi } for each i, and thus the greatest common divisor
is given by
\[ (m, n) = p_1^{\min\{e_1,f_1\}} \cdots p_k^{\min\{e_k,f_k\}} \]
as claimed.
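Theorem 7 can be checked numerically. The sketch below is an illustration under our own naming (`factorize` and `gcd_from_factorizations` are hypothetical helpers): it factors both integers by trial division, takes the minimum exponent of each shared prime, and compares the result with Python's built-in `math.gcd`. A prime missing from one factorization has exponent 0 there, so only shared primes contribute.

```python
from math import gcd


def factorize(n):
    """Return {prime: exponent} for a positive integer n, by trial division."""
    factors = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors


def gcd_from_factorizations(m, n):
    """Greatest common divisor of positive m, n via minimum exponents (Theorem 7)."""
    fm, fn = factorize(m), factorize(n)
    result = 1
    for p in fm.keys() & fn.keys():  # primes dividing both m and n
        result *= p ** min(fm[p], fn[p])
    return result


# Compare with the Euclidean-algorithm answer for a few pairs.
for m, n in [(343, 280), (360, 84), (17, 5), (1, 12)]:
    assert gcd_from_factorizations(m, n) == gcd(m, n)
```

For large integers the Euclidean algorithm is far faster, since it avoids factoring altogether.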
Chapter 4
Linear Diophantine Equations
This chapter is based on [Dud08, §3].
Theorem 8. Let a, b, c ∈ Z, and consider the linear diophantine equation
ax + by = c.
(4.1)
If (a, b) ∤ c, then (4.1) has no solutions in integers. On the other hand, if (a, b) | c, then (4.1) has
infinitely many solutions in integers parametrized as
\[ x = r + \frac{b}{(a,b)}\,t, \qquad y = s - \frac{a}{(a,b)}\,t \qquad (t \in \mathbb{Z}), \tag{4.2} \]
where r, s is any particular solution to (4.1).
Proof. Now, if ax + by = c has a solution, then (a, b) must divide c = ax + by since it divides both
a and b and consequently any linear combination of a and b. We are therefore reduced to proving
that when (a, b) | c the solutions to (4.1) are precisely those pairs x, y of the form given in (4.2).
We split the proof of this into two parts. We first show that any pair x, y of the form given in (4.2)
is a solution to the linear diophantine equation (4.1), and then show that every solution to (4.1)
has the form given by (4.2). The first part is a simple calculation. Indeed, if r, s is some particular
solution to (4.1), and x, y are given by (4.2), then
\[ ax + by = a\left(r + \frac{b}{(a,b)}\,t\right) + b\left(s - \frac{a}{(a,b)}\,t\right) = ar + bs = c. \]
We conclude that any pair x, y of the form given by (4.2) is a solution to the linear diophantine
equation (4.1). Conversely, by the Euclidean Algorithm, we know that there exist integers r0 , s0
such that
ar0 + bs0 = (a, b).
(4.3)
Further, from the assumption that (a, b) | c, we have an integer d such that c = (a, b)d. Multiplying
both sides of (4.3) by d yields
a(r0 d) + b(s0 d) = (a, b)d = c.
This proves that we have at least one solution to the linear diophantine equation in question.
Suppose then that r, s is any particular solution to (4.1). We complete the proof by showing that
for every solution x, y to (4.1) there exists an integer t such that
\[ x = r + \frac{b}{(a,b)}\,t, \qquad y = s - \frac{a}{(a,b)}\,t. \]
Since
ar + bs = c,
and
ax + by = c,
we see that
a(x − r) + b(y − s) = 0.
Dividing by (a, b) yields
\[ \frac{a}{(a,b)}(x - r) = \frac{b}{(a,b)}(s - y). \tag{4.4} \]
But then $\frac{a}{(a,b)} \mid \frac{b}{(a,b)}(s - y)$ while $\left(\frac{a}{(a,b)}, \frac{b}{(a,b)}\right) = 1$. We conclude that
\[ \frac{a}{(a,b)} \mid (s - y). \]
We therefore have an integer t such that
\[ s - y = \frac{a}{(a,b)}\,t. \]
This yields
\[ y = s - \frac{a}{(a,b)}\,t. \]
Substituting this into (4.4), we obtain
\[ x = r + \frac{b}{(a,b)}\,t \]
as required.
Example 5. Find all positive integer solutions to
343x − 280y = 49.
Solution. In Example 3 we found that (343, −280) = 7. Since 7 | 49, Theorem 8 tells us that
343x − 280y = 49 has infinitely many solutions parametrized as
\[ x = r + \frac{-280}{7}\,t, \qquad y = s - \frac{343}{7}\,t \qquad (t \in \mathbb{Z}), \]
where r, s is any particular solution. In Example 3, we found that
343(9) − 280(11) = 7.
Multiplying by 7 yields
343(63) − 280(77) = 49,
so that r = 63, s = 77 is a particular solution. We conclude from Theorem 8 that all integer
solutions to 343x − 280y = 49 are given by
x = 63 − 40t,
y = 77 − 49t,
(t ∈ Z).
Finally, the requirement that x, y be a positive solution is the requirement that
63 − 40t > 0,
77 − 49t > 0.
Equivalently, we require
\[ t < \frac{63}{40} = 1.575, \qquad t < \frac{77}{49} \approx 1.5714. \]
We conclude that the totality of positive solutions to 343x − 280y = 49 is given by
x = 63 − 40t,
y = 77 − 49t,
(t ∈ Z, t ≤ 1).
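The procedure of Theorem 8 can be sketched in code. The following is a minimal illustration, not a definitive implementation; the names `extended_gcd` and `solve_linear_diophantine` are our own. It finds one particular solution with the extended Euclidean algorithm and returns the step sizes b/(a, b) and a/(a, b) from (4.2). The particular solution it returns need not be r = 63, s = 77, but it generates the same family of solutions.

```python
def extended_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g, where g = gcd(a, b) >= 0."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    if old_r < 0:  # normalise the sign so that g is nonnegative
        old_r, old_x, old_y = -old_r, -old_x, -old_y
    return old_r, old_x, old_y


def solve_linear_diophantine(a, b, c):
    """Solve a*x + b*y == c, assuming a and b are not both zero.

    Returns None when gcd(a, b) does not divide c. Otherwise returns
    (r, s, dx, dy), meaning every solution is x = r + dx*t, y = s - dy*t.
    """
    g, x0, y0 = extended_gcd(a, b)
    if c % g != 0:
        return None
    d = c // g
    return x0 * d, y0 * d, b // g, a // g


# Example 5: 343x - 280y = 49, that is a = 343, b = -280, c = 49.
r, s, dx, dy = solve_linear_diophantine(343, -280, 49)
for t in range(-3, 4):
    x, y = r + dx * t, s - dy * t
    assert 343 * x - 280 * y == 49
```

Positive solutions can then be selected by keeping only those t for which both x and y are positive, exactly as in the example.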
Linear diophantine equations can also be disguised in the form of word problems. The following
example illustrates this.
Example 6. Suppose that you have 5 pennies, 5 nickels, 6 dimes and 10 quarters. Find all the
possible ways of making $2.99 in change.
Solution. The equation that needs to be solved is
x + 5y + 10z + 25w = 299.
We break this up into three binary linear diophantine equations as follows:
A + 25w = 299;
(4.5)
B + 10z = A;
(4.6)
x + 5y = B.
(4.7)
We will solve these equations in succession. Dividing 299 by 25 yields
299 = 25(11) + 24.
Therefore, a particular solution to (4.5) is given by A0 = 24 and w0 = 11. We conclude that all
solutions are given by
A = 24 + 25t,
w = 11 − t
(t ∈ Z).
Since the number of quarters used is nonnegative and at most 10, we obtain
0 ≤ w ≤ 10 =⇒ 0 ≤ 11 − t ≤ 10.
Therefore, we must have 1 ≤ t ≤ 11. We now turn to (4.6). This is given by
B + 10z = 24 + 25t,
where 1 ≤ t ≤ 11. We take the particular solution B0 = 24 + 25t, z0 = 0. We then obtain from
Theorem 8 that all solutions are given by
B = 24 + 25t + 10u,
z = −u
(u ∈ Z).
Since z, being the number of dimes used, is nonnegative and at most 6, we see that we require
0 ≤ −u ≤ 6 =⇒ −6 ≤ u ≤ 0.
We finally turn to equation (4.7). This equation is given by
x + 5y = 24 + 25t + 10u,
where we require 1 ≤ t ≤ 11 and −6 ≤ u ≤ 0. We take the particular solution x0 = 24 + 25t + 10u,
y0 = 0, then obtain the totality of solutions given by
y = −v
x = 24 + 25t + 10u + 5v,
(v ∈ Z).
Since 0 ≤ x, y ≤ 5, we conclude that −5 ≤ v ≤ 0 and that
0 ≤ 24 + 25t + 10u + 5v ≤ 5.
All solutions to our problem are then given by
y = −v,
x = 24 + 25t + 10u + 5v,
z = −u,
w = 11 − t,
for integers t, u, v for which 1 ≤ t ≤ 11, −6 ≤ u ≤ 0, −5 ≤ v ≤ 0 and 0 ≤ 24 + 25t + 10u + 5v ≤ 5.
Note that if t ≥ 3, then
x = 24 + 25t + 10u + 5v
≥ 99 + 10u + 5v
≥ 99 − 60 − 25
= 14
> 5.
We must therefore have 1 ≤ t ≤ 2. Substituting these two values in for t and then finding the
corresponding compatible values for u and v yields the solutions given in the following table.
 t    u    v    ⇐⇒    x    y    z    w
 1   −4   −1          4    1    4   10
 1   −3   −3          4    3    3   10
 1   −2   −5          4    5    2   10
 2   −6   −2          4    2    6    9
 2   −5   −4          4    4    5    9
We conclude that, in order to make change for $2.99 using the coins we have on hand, we have to
use x pennies, y nickels, z dimes, and w quarters, where the quadruple x, y, z, w is one of the five
possibilities given in the above table.
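Because the coin supply is so small, the answer to Example 6 can be cross-checked by exhaustive search. The sketch below is only a sanity check of the table, not the method of the notes; the function name `change_combinations` is an assumption of ours. It tries every admissible quadruple (x, y, z, w) and keeps those with x + 5y + 10z + 25w = 299.

```python
from itertools import product


def change_combinations(total_cents, max_pennies, max_nickels, max_dimes, max_quarters):
    """List every (pennies, nickels, dimes, quarters) that adds up to total_cents."""
    return [
        (x, y, z, w)
        for x, y, z, w in product(
            range(max_pennies + 1),
            range(max_nickels + 1),
            range(max_dimes + 1),
            range(max_quarters + 1),
        )
        if x + 5 * y + 10 * z + 25 * w == total_cents
    ]


for combo in change_combinations(299, 5, 5, 6, 10):
    print(combo)
```

The search visits 6 · 6 · 7 · 11 = 2772 quadruples and finds the same five solutions as the parametrization by t, u and v.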
Chapter 5
Congruences
This chapter is based on [Dud08, §4].
Many times in mathematics it is useful to consider different objects as being equivalent. In order
for this notion of equivalence to be reasonable, we usually force the relation to be an equivalence
relation. That is, a reasonable notion of equivalence on a set X should satisfy the following three
properties:
(i) For all x ∈ X, x is equivalent to itself.
(ii) For all x, y ∈ X, if x is equivalent to y then y is equivalent to x.
(iii) For all x, y, z ∈ X, if x is equivalent to y and y is equivalent to z, then x is equivalent to z.
These properties are referred to as reflexivity, symmetry and transitivity respectively. Examples
of equivalence relations that we are already familiar with include equality = on any set X, the
relation given by similarity of matrices on the space $\mathbb{R}^{n \times n}$ of n × n matrices, as well as the relation
given by isomorphism on the set of vector spaces over R. For the purposes of number theory, a very
important equivalence relation on the set Z of integers is obtained by identifying integers that have
the same remainder upon division by a particular positive integer. This is the notion of congruence.
Definition 6 (Congruence). Let a, b ∈ Z and m ∈ N. We say that a is congruent to b modulo m,
written a ≡ b (mod m), provided m | (b − a).
Proposition 7. Let m ∈ N. Then, congruence modulo m is an equivalence relation on the set Z
of integers. That is, for all a, b, c ∈ Z,
(i) (Reflexivity) a ≡ a (mod m)
(ii) (Symmetry) If a ≡ b (mod m) then b ≡ a (mod m).
(iii) (Transitivity) If a ≡ b (mod m) and b ≡ c (mod m), then a ≡ c (mod m).
Proof. If we translate the statements given in (i), (ii) and (iii), they become immediately clear.
Indeed, using the divisibility notation, these statements read, for all a, b, c ∈ Z,
(i) m | (a − a)
(ii) m | (b − a) =⇒ m | (a − b)
(iii) m | (b − a), (c − b) =⇒ m | (c − a).
Statement (i) is clear since every integer divides 0, statement (ii) is clear since any divisor of b − a
is also a divisor of (−1)(b − a) = a − b, and statement (iii) is clear since any divisor of b − a and
c − b must divide the sum (b − a) + (c − b) = c − a.
Notation 3. For integers a and b and positive integer m, we sometimes denote a ≡ b (mod m) using
the shorthand notation a ≡m b.
Since ≡m is an equivalence relation on Z, we know that the corresponding equivalence classes
form a partition of Z. This is the content of the following theorem.
Theorem 9. Let m be a positive integer. Then every integer is congruent to precisely one of
0, 1, . . . , m − 1 modulo m.
Proof. This follows from the division algorithm. Indeed, if a is an integer, we have unique integers
q and r such that 0 ≤ r < m and a = mq + r. But then r − a = −mq so that m | (r − a). We
conclude that a ≡m r, so that every integer is congruent modulo m to its remainder upon division
by m. Since this remainder r lies in {0, 1, . . . , m − 1}, we conclude that every integer is congruent
modulo m to at least one of 0, 1, . . . , m − 1. On the other hand, if a were congruent to two elements
of {0, 1, . . . , m − 1}, say r1 and r2 , then we’d have
r1 ≡m a ≡m r2 ,
by symmetry, so that by transitivity we could conclude that r1 ≡m r2 . But this implies that
m | (r2 − r1 ). Since −m < r2 − r1 < m, we obtain r2 − r1 = 0 so that r1 = r2 . Therefore, every
integer is congruent modulo m to precisely one of 0, 1, . . . , m − 1 as claimed.
Given an integer a and positive integer m, we refer to the set of all integers to which a is
congruent modulo m as the residue class of a modulo m. The least nonnegative element in this
residue class is the remainder a leaves when divided by m. We call this remainder the least residue
of a modulo m. Recall that, by the division algorithm, we can express the least residue of a modulo
m in the form r = a − mq for some integer q. In fact, as the following theorem shows, the residue
class of a modulo m consists precisely of the integers of this form.
Theorem 10. Let a, b ∈ Z and m ∈ N. Then a ≡ b (mod m) if and only if a = b + km for some
integer k.
Proof. We have
a ≡ b (mod m) ⇐⇒ b ≡ a
(mod m)
⇐⇒ m | (a − b)
⇐⇒ a − b = km for some integer k
⇐⇒ a = b + km for some integer k.
Theorem 11. Let a, b ∈ Z and m ∈ N. Then a ≡ b (mod m) if and only if a and b leave the same
remainder when divided by m.
Proof. We know that every integer is congruent modulo m to the remainder it leaves when divided
by m, and so if ra is the remainder left when a is divided by m and rb is the remainder left when b
is divided by m, we have
a ≡m ra ,
b ≡m rb .
We conclude that a ≡m b if and only if ra ≡m rb . However, 0 ≤ ra , rb < m, and so since an integer
can be congruent to only one of 0, 1, . . . , m − 1 modulo m, we see that ra ≡m rb if and only if
ra = rb . All in all, we have shown that a ≡m b if and only if ra = rb , as required.
Summarizing what has been done so far, we have three equivalent ways of expressing that a ≡ b
(mod m). We could say that m divides b − a, or that a = b + km for some integer k, or that a and
b leave the same remainder when divided by m.
We now gather together some properties of congruence modulo m:
Proposition 8. Let a, b, c, d be integers, and m be a positive integer. The following statements
hold:
1. If a ≡m b and c ≡m d then a + c ≡m b + d
2. If a ≡m b and c ≡m d then ac ≡m bd
3. If a ≡m b and d is a positive divisor of m then a ≡d b
4. If a ≡m b and c > 0 then ac ≡mc bc
5. ab ≡m ac if and only if b ≡ c (mod m/(a, m))
6. If ab ≡m ac and (a, m) = 1 then b ≡m c
7. If a ≡m b then (a, m) = (b, m)
Proof. For (1) and (2), assume that a ≡m b and c ≡m d. Then
a = b + mk,
c = d + mℓ
for some integers k and ℓ. Therefore
a + c = b + d + m(k + ℓ),
ac = (b + mk)(d + mℓ) = bd + m(bℓ + kd + mkℓ).
In particular, (a + c) = (b + d) + mu and ac = bd + mv for some integers u and v. We conclude
that a + c ≡m b + d and ac ≡m bd, as required. For (3), we go back to the original definition
of congruence modulo m. If a ≡m b then m | (b − a). But since d | m, we see that d | (b − a).
Consequently a ≡d b. We now turn to (4). Suppose that a ≡m b. Then a = b + km for some integer
k. Multiplying by c yields ac = bc + k(mc) and we conclude accordingly that ac ≡mc bc. For (5),
suppose first that ab ≡m ac. Then m | (ac − ab). Dividing by (a, m) yields
\[ \frac{m}{(a,m)} \;\Big|\; \frac{a}{(a,m)}\,(c - b). \]
But $\left(\frac{m}{(a,m)}, \frac{a}{(a,m)}\right) = 1$ and so we obtain
\[ \frac{m}{(a,m)} \;\Big|\; (c - b) \]
so that b ≡ c (mod m/(a, m)). Conversely, suppose that b ≡ c (mod m/(a, m)). Then
\[ \frac{m}{(a,m)} \;\Big|\; (c - b). \]
Multiplying by a yields
\[ \frac{a}{(a,m)}\,m \;\Big|\; (ac - ab). \]
But $m \mid \frac{a}{(a,m)}\,m$ and so we can conclude from the transitivity of divisibility that m | (ac − ab).
Therefore, ab ≡m ac, as required. Part (6) is an immediate consequence of part (5). Finally, part
(7) is simply a restatement of Lemma 4 using different notation. Indeed, if a ≡m b, then a = mk + b
for some integer k. We can therefore conclude by Lemma 4 that (a, m) = (m, b) = (b, m), as
required.
Remark 4. A special case of part (1) of Proposition 8 provides us with a useful way to switch
between representatives for a particular congruence class. Indeed, if a ≡m b, and k is any integer,
then since km ≡m 0, we see that a+km ≡m b+0 ≡m b. Therefore, if it is convenient, we can always
add or subtract any multiple of m from a without changing its value modulo m. In particular, if
we want to find the least nonnegative integer in the same congruence class as a (which will be the
remainder a leaves when divided by m), we need only continue adding or subtracting m from a
until we obtain an integer between 0 and m − 1.
Proposition 8 tells us that we can treat congruences the same way as equalities, except we need
to be careful with cancellation. We can add, multiply or scale congruences by integers at will, but
need to change the modulus when we cancel. For example, we have
3 · 8 ≡ 3 · 4 (mod 12),
but
8 ≢ 4 (mod 12).
The correct cancellation is given by part (5) of Proposition 8:
3 · 8 ≡ 3 · 4 (mod 12) =⇒ 8 ≡ 4 (mod 12/(12, 3)) =⇒ 8 ≡ 4 (mod 4).
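The cancellation rule is easy to test by machine. The sketch below is an illustration with helper names of our own choosing (`congruent`, `cancellation_rule_holds`). It first replays the example with modulus 12 and then checks part (5) of Proposition 8 exhaustively for all small moduli.

```python
from math import gcd


def congruent(a, b, m):
    """True when a ≡ b (mod m), that is, when m divides b - a."""
    return (b - a) % m == 0


# The example from the text: cancelling 3 modulo 12 forces the modulus down to 4.
assert congruent(3 * 8, 3 * 4, 12)
assert not congruent(8, 4, 12)
assert congruent(8, 4, 12 // gcd(3, 12))


def cancellation_rule_holds(limit):
    """Check part (5) of Proposition 8 for all 1 <= m <= limit and 0 <= a, b, c < m."""
    for m in range(1, limit + 1):
        for a in range(m):
            reduced = m // gcd(a, m)  # the modulus m/(a, m) after cancelling a
            for b in range(m):
                for c in range(m):
                    if congruent(a * b, a * c, m) != congruent(b, c, reduced):
                        return False
    return True


print(cancellation_rule_holds(12))
```

Such a finite check is of course no substitute for the proof; it only guards against misremembering which modulus appears after cancellation.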
Since polynomials with integer coefficients can be built up by successively applying multiplication
and addition, we see that Proposition 8 implies that we can substitute into polynomial congruences.
This is the content of the following result.
Proposition 9. Let f (x) be a polynomial with integer coefficients, a, b be integers and m be a
positive integer. If a ≡m b then f (a) ≡m f (b).
Using this fact together with the fact that the only possible values for integers modulo m are 0,
1, . . . , m − 1 allows for quickly verifying results. Indeed, if we wish to determine when a particular
polynomial expression can take on a particular value modulo m, we need only check each of 0, . . . ,
m − 1 in order to discover the answer. We illustrate this with a couple of examples.
Example 7. Show that an integer of the form 4n + 3 cannot be the sum of two squares of integers.
Solution. Consider a sum of squares x² + y². Since x and y can only take on the values 0, 1, 2, 3
modulo 4, we see that x² and y² must each be congruent to one of 0² ≡4 0, 1² ≡4 1, 2² = 4 ≡4 0,
3² = 9 ≡4 1. Therefore, x² + y² ≡4 0 + 0, 0 + 1, 1 + 0, 1 + 1. That is, x² + y² ≡4 0, 1, 2. We conclude
that x² + y² ≢ 3 (mod 4), as required.
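The case analysis in Example 7 is a finite computation, so it can be replayed directly. In this small sketch (the function name `sums_of_two_squares_mod` is ours) we collect the residues of squares modulo m and then all residues of sums of two of them.

```python
def sums_of_two_squares_mod(m):
    """Set of residues modulo m that x*x + y*y can take for integers x, y."""
    squares = {(x * x) % m for x in range(m)}
    return {(s + t) % m for s in squares for t in squares}


print(sums_of_two_squares_mod(4))  # {0, 1, 2}
assert 3 not in sums_of_two_squares_mod(4)
```

Only residues 0, . . . , m − 1 need to be tried because, by Proposition 9, the value of a polynomial modulo m depends only on the residue class of its argument.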
Example 8. Solve the congruences 3x ≡ 1 (mod 8) and x² ≡ 1 (mod 8) for x (mod 8).
Solution. We could always just plug in each of 0, . . . , 7 into the congruences to see which ones
work and which ones do not, but in order to get some practice using properties of congruences, we
will solve the congruences similarly to how one would solve the analogous equations. We compute
3x ≡ 1 (mod 8) =⇒ 3x ≡ 9 (mod 8)   (Since 1 ≡8 9)
=⇒ x ≡ 3 (mod 8)   (By part (6) of Prop. 8)
For the second congruence, we proceed as follows:
x² ≡ 1 (mod 8) =⇒ x² ≡ 1 (mod 2)
=⇒ 2 | (x² − 1)
=⇒ 2 | (x − 1)(x + 1)
=⇒ 2 | (x − 1) or 2 | (x + 1)
=⇒ x ≡2 1, −1
=⇒ x ≡2 1.
We conclude that any solution must be congruent to 1 modulo 2. That is, any solution must be odd.
Conversely, suppose that x ≡ 1 (mod 2). Then x = 2k + 1 for some integer k so that
x² = (2k + 1)² = 4k² + 4k + 1 = 4k(k + 1) + 1.
Now, one of k, k + 1 is even while the other is odd. In any case, we have k(k + 1) ≡ 0 · 1 = 0
(mod 2). Consequently 4k(k + 1) ≡ 4 · 0 = 0 (mod 8). Finally, we note that this implies that
x² = 4k(k + 1) + 1 ≡8 0 + 1 = 1 so that x is a solution to the congruence in question. Therefore
x² ≡ 1 (mod 8) ⇐⇒ x ≡ 1 (mod 2).
In terms of residues modulo 8, the solutions are x ≡ 1, 3, 5, 7 (mod 8).
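The alternative mentioned at the start of the solution, plugging in each of 0, . . . , 7, takes only a few lines of code. This is a brute-force sketch; `solve_by_search` is a hypothetical helper of ours that returns the residues x modulo m at which a given integer-valued function vanishes modulo m.

```python
def solve_by_search(f, m):
    """Return all x in 0..m-1 with f(x) ≡ 0 (mod m)."""
    return [x for x in range(m) if f(x) % m == 0]


print(solve_by_search(lambda x: 3 * x - 1, 8))  # [3]
print(solve_by_search(lambda x: x * x - 1, 8))  # [1, 3, 5, 7]
```

The second output lists exactly the odd residues modulo 8, in agreement with x² ≡ 1 (mod 8) ⇐⇒ x ≡ 1 (mod 2).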
Chapter 6
Linear Congruences
This chapter is based on [Dud08, §5].
Recall that in Chapter 4 we saw how to solve linear diophantine equations. If we express these
equations in congruence notation, we can simplify the process of solving these equations. In particular, instead of invoking the Euclidean algorithm to find a particular solution to our equation,
by switching to congruence notation we can sometimes find a particular solution by inspection.
We first restate Theorem 8 in terms of congruences. Now, given the linear diophantine equation
ax + by = c, we know that the solutions coincide with the solutions of −ax − by = −c. We can
therefore assume that b ≥ 0. Also, when b = 0, the equation becomes ax = c which fails to be of
much interest. We therefore arrive at the equation ax + by = c for b > 0. We can then rewrite this
equation as ax ≡b c. Further, we know that when there is a solution r, s to the linear diophantine
equation ax + by = c, that there are infinitely many solutions given by
\[ x = r + \frac{b}{(a,b)}\,t, \qquad y = s - \frac{a}{(a,b)}\,t \qquad (t \in \mathbb{Z}). \tag{6.1} \]
Denoting (a, b) by g, we can express (6.1) as
x ≡ r (mod b/g).
Therefore, when there exists a solution, there is precisely one congruence class of solutions for
x modulo b/(a, b). But, with g = (a, b), we have x ≡ r (mod b/g) if and only if x is congruent to
one of r, r + b/g, . . . , r + (g − 1)b/g modulo b. Therefore, when ax ≡ c (mod b) has a solution, it
has precisely g = (a, b) solutions modulo b (which correspond to the unique solution modulo b/g).
Switching to the more familiar notation obtained by using m in place of b and b in place of c, we
obtain the following theorem.
Theorem 12. Consider the linear congruence
ax ≡ b (mod m)
(6.2)
for integers a and b and positive integer m. If (a, m) ∤ b, then (6.2) has no solutions, while if (a, m) | b,
then (6.2) has precisely (a, m) solutions modulo m.
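Theorem 12 translates into a short routine. The sketch below is one possible implementation under our own naming (`solve_linear_congruence`). It divides the congruence through by g = (a, m), solves the reduced congruence with a modular inverse, and lists the g solutions modulo m. It assumes Python 3.8 or later, where the built-in `pow(x, -1, n)` returns an inverse of x modulo n.

```python
from math import gcd


def solve_linear_congruence(a, b, m):
    """Return all x in 0..m-1 with a*x ≡ b (mod m), for a positive modulus m."""
    g = gcd(a, m)
    if b % g != 0:
        return []  # (a, m) does not divide b: no solutions
    a_red, b_red, m_red = a // g, b // g, m // g
    if m_red == 1:
        x0 = 0  # every integer solves a congruence modulo 1
    else:
        # a_red is relatively prime to m_red, so it has an inverse modulo m_red.
        x0 = (pow(a_red % m_red, -1, m_red) * b_red) % m_red
    # The unique solution modulo m/g lifts to g solutions modulo m.
    return [x0 + k * m_red for k in range(g)]


print(solve_linear_congruence(6, 4, 10))  # [4, 9]
print(solve_linear_congruence(2, 1, 4))   # []
```

In the first call (6, 10) = 2 divides 4, so there are two solutions modulo 10; in the second (2, 4) = 2 does not divide 1, so there are none.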
Example 9. Solve the linear diophantine equation
343x − 280y = 49
by converting the equation to a linear congruence.
Solution. We came across this linear diophantine equation in Example 5. There, we found that
the integer solutions were given by
x = 63 − 40t,
y = 77 − 49t
(t ∈ Z).
We now show how to obtain this via congruences. We start by rewriting our linear diophantine
equation as the congruence
−280y ≡ 49 (mod 343).
Dividing through by 7 (remembering to divide the modulus by 7 as well) yields
−40y ≡ 7
(mod 49).
Replacing −40 by 9 to which it is congruent modulo 49 yields
9y ≡ 7
(mod 49).
One can obtain via a quick application of the Euclidean algorithm that
9(11) − 2(49) = 1.
Thus
9(11) ≡ 1
(mod 49).
We then multiply our congruence 9y ≡ 7 (mod 49) by 11 to obtain
9(11)y ≡ 7(11)
(mod 49)
which reduces to
y ≡ 77
(mod 49)
We can therefore write y = 77 + 49s for some integer s. Defining t = −s, we obtain
y = 77 + 49s = 77 − 49t.
Substituting this into our original equation and solving for x yields x = 63 − 40t as expected.
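The computation in Example 9 can be replayed numerically. This is a small sketch under the assumption of Python 3.8 or later (for `pow` with exponent −1); the names `inv9`, `y0` and `x_from_y` are ours. It recovers the inverse of 9 modulo 49, the least residue of y, and the matching value of x.

```python
# Inverse of 9 modulo 49, found in the text from 9(11) - 2(49) = 1.
inv9 = pow(9, -1, 49)
assert inv9 == 11

# Least residue of y modulo 49: y ≡ 7 * 11 = 77 ≡ 28 (mod 49).
y0 = (inv9 * 7) % 49


def x_from_y(y):
    """Solve 343x - 280y = 49 for x, given a y with y ≡ 77 (mod 49)."""
    numerator = 49 + 280 * y
    assert numerator % 343 == 0, "y is not in the solution class"
    return numerator // 343


print(y0, x_from_y(y0))  # 28 23
```

The pair x = 23, y = 28 is the member of the family x = 63 − 40t, y = 77 − 49t with t = 1.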
We close this section with a very important theorem that allows us to solve systems of simultaneous linear congruences. It is the celebrated Chinese Remainder Theorem. First we need a
lemma.
Lemma 9. Let m and n be relatively prime positive integers and a and b be arbitrary integers. If
a ≡m b and a ≡n b then a ≡mn b.
Proof. This is simply a restatement of part (ii) of Proposition 4. Indeed, if a ≡m b and a ≡n b,
then
m | (b − a) and n | (b − a),
and so, since (m, n) = 1, we can conclude that mn | (b − a). That is, a ≡ b (mod mn), as required.
We are now ready to state and prove the Chinese Remainder Theorem.
Theorem 13 (The Chinese Remainder Theorem). Let a1 , . . . , ak be integers and m1 , . . . , mk be
positive integers that are relatively prime in pairs: (mi , mj ) = 1 for i ≠ j. The system of congruences
x ≡ a1 (mod m1 )
x ≡ a2 (mod m2 )
⋮
x ≡ ak (mod mk )
has a unique solution modulo the product m1 m2 . . . mk .
Proof. Let m = m1 . . . mk . For each j, we have (m1 . . . mj−1 mj+1 . . . mk , mj ) = 1 and so we can
express 1 as a linear combination of m1 . . . mj−1 mj+1 . . . mk and mj . If this linear combination is
given by
(m1 . . . mj−1 mj+1 . . . mk )bj + mj cj = 1,
then we have
(m1 . . . mj−1 mj+1 . . . mk )bj ≡ 1 (mod mj ).
For ease of notation, we will write m/mj instead of m1 . . . mj−1 mj+1 . . . mk . Set
\[ x_0 = \sum_{j=1}^{k} \frac{m}{m_j}\, b_j a_j. \tag{6.3} \]
We claim that the residue class of x0 modulo m is the unique solution modulo m we are after. First
of all, for any 1 ≤ i ≤ k, we have
\[ x_0 = \sum_{j=1}^{k} \frac{m}{m_j}\, b_j a_j \equiv \frac{m}{m_i}\, b_i a_i \equiv (1)a_i = a_i \pmod{m_i} \]
since every term in the sum except the i-th term is divisible by mi . We conclude that x0 is indeed
a solution to our system of congruences. On the other hand, if x is any solution to our system of
congruences, then, for any 1 ≤ i ≤ k, we have
x ≡ ai ≡ x0
(mod mi ).
Since (m1 , m2 ) = 1, we can invoke Lemma 9 to conclude that
x ≡ x0
(mod m1 m2 ).
Then, since (m1 m2 , m3 ) = 1, we can invoke Lemma 9 once again to obtain
x ≡ x0
(mod m1 m2 m3 ).
Continuing in this fashion, we eventually obtain
x ≡ x0
(mod m)
as required.
The Chinese Remainder Theorem guarantees that we will always be able to find a solution to a
system of linear congruences modulo relatively prime moduli, and we could use (6.3) to write down
this solution. In practice, however, it is usually easier just to solve the congruences in succession.
We illustrate this with an example.
Example 10. Find the unique solution modulo 60 to the following system of linear congruences:
3x ≡ 2
(mod 4)
(6.4)
2x ≡ 1
(mod 3)
(6.5)
3x ≡ 4
(mod 5).
(6.6)
Solution. We start by rewriting these congruences in the form x ≡ a (mod m) by multiplying by a
suitable integer to eliminate the coefficient of x. Since 3 · 3 = 9 ≡ 1 (mod 4), 2 · 2 = 4 ≡ 1 (mod 3)
and 3 · 2 = 6 ≡ 1 (mod 5), we multiply (6.4) by 3, (6.5) by 2, and (6.6) by 2. We obtain
x≡6
(mod 4)
(6.7)
x≡2
(mod 3)
(6.8)
x≡3
(mod 5).
(6.9)
We now solve the congruences (6.7), (6.8), (6.9) in succession. From (6.7), we find that
x = 6 + 4k
for some integer k. We then substitute this into (6.8) to obtain
6 + 4k ≡ 2
(mod 3).
This simplifies to
k≡2
(mod 3),
since 6 ≡ 0 (mod 3) and 4 ≡ 1 (mod 3). We conclude that k = 2 + 3ℓ for some integer ℓ so that
x = 6 + 4k = 6 + 4(2 + 3ℓ) = 14 + 12ℓ.
We then substitute this into (6.9) to obtain
14 + 12ℓ ≡ 3 (mod 5).
This simplifies to
4 + 2ℓ ≡ 3 (mod 5)
since 14 ≡ 4 (mod 5) and 12 ≡ 2 (mod 5). Thus
2ℓ ≡ −1 ≡ 4 (mod 5).
Dividing by 2 (which is valid since (2, 5) = 1), or equivalently, multiplying by 3, we obtain
ℓ ≡ 2 (mod 5).
We conclude that ℓ = 2 + 5m for some integer m. Finally, this yields
x = 14 + 12ℓ = 14 + 12(2 + 5m) = 38 + 60m.
The unique solution modulo 60 is then given by x ≡ 38 (mod 60).
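For comparison with solving the congruences in succession, formula (6.3) can be evaluated directly. The following sketch is an illustration only; the function name `crt` is ours, and Python 3.8 or later is assumed for `math.prod` and for `pow` with exponent −1. The variable `b_j` plays the role of bj in the proof of Theorem 13, an inverse of m/mj modulo mj.

```python
from math import prod


def crt(residues, moduli):
    """Solve x ≡ a_j (mod m_j) for pairwise relatively prime moduli m_j > 1.

    Returns the least nonnegative solution modulo the product of the moduli.
    """
    m = prod(moduli)
    x0 = 0
    for a_j, m_j in zip(residues, moduli):
        big = m // m_j              # the product of all the other moduli
        b_j = pow(big, -1, m_j)     # big * b_j ≡ 1 (mod m_j)
        x0 += big * b_j * a_j       # one term of the sum in (6.3)
    return x0 % m


# The system (6.7), (6.8), (6.9) from Example 10.
print(crt([6, 2, 3], [4, 3, 5]))  # 38
```

Here the three terms of the sum are 270, 80 and 108, whose total 458 reduces to 38 modulo 60.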
Chapter 7
Fermat’s and Wilson’s Theorems
This chapter is based on [Dud08, §6].
In this section, we prove the following two theorems.
Theorem 14 (Fermat’s Little Theorem). Let a, p ∈ Z with p prime. Then $a^p \equiv a$ (mod p). In
particular, if (a, p) = 1, then $a^{p-1} \equiv 1$ (mod p).
Theorem 15 (Wilson’s Theorem). An integer p > 1 is a prime if and only if (p − 1)! ≡ −1
(mod p).
We start the proof of Fermat’s Little Theorem with the following lemma.
Lemma 10. Let a ∈ Z and m ∈ N be such that (a, m) = 1. Then the least residues of
a, 2a, 3a, . . . , (m − 1)a
(mod m)
are
1, 2, 3, . . . , m − 1
in some order. That is, modulo m, multiplication by an integer a relatively prime to m simply
permutes 1, 2, . . . , m − 1.
Proof. If we can show that none of a, 2a, . . . , (m − 1)a is congruent to 0 modulo m and that no two
of these multiples of a are congruent modulo m, then we will be done. Indeed, this will imply that
a, 2a, . . . , (m − 1)a are m − 1 distinct nonzero residue classes modulo m. Since there are only m − 1
such residue classes, namely 1, 2, . . . , m − 1, we will be able to conclude that
{a, 2a, . . . , (m − 1)a} = {1, 2, . . . , m − 1}
(mod m).
To this end, suppose that ja ≡ 0 (mod m) for some 1 ≤ j ≤ m − 1. Then, since (a, m) = 1,
we would have to conclude that j ≡ 0 (mod m) thereby contradicting 1 ≤ j ≤ m − 1. We have
therefore shown that none of the multiples of a in question is congruent to 0 modulo m. Finally,
if, for some 1 ≤ i, j ≤ m − 1, we had ia ≡ ja (mod m), then using (a, m) = 1, we could cancel a
from both sides to obtain i ≡ j (mod m). Finally, since i and j both lie between 1 and m − 1, we
conclude that i = j. Therefore no two of the multiples of a in question are congruent modulo m.
This completes the proof.
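Lemma 10 is easy to observe experimentally. In the sketch below (the function name `least_residues_of_multiples` is ours) we list the least residues of a, 2a, . . . , (m − 1)a and check that they form a rearrangement of 1, . . . , m − 1 exactly when (a, m) = 1. The lemma itself asserts only the "if" part; when (a, m) > 1 the multiple (m/(a, m))a is congruent to 0, so the list cannot be a rearrangement.

```python
from math import gcd


def least_residues_of_multiples(a, m):
    """Least residues of a, 2a, ..., (m-1)a modulo m, in that order."""
    return [(j * a) % m for j in range(1, m)]


print(least_residues_of_multiples(3, 7))  # [3, 6, 2, 5, 1, 4]

for m in range(2, 40):
    for a in range(1, m):
        residues = least_residues_of_multiples(a, m)
        is_permutation = sorted(residues) == list(range(1, m))
        assert is_permutation == (gcd(a, m) == 1)
```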
We are now prepared to prove Fermat’s Little Theorem:
Proof of Fermat’s Little Theorem. For (a, p) > 1, we have p | a, so $a^p \equiv a$ (mod p) reads 0 ≡ 0 (mod p) which
clearly holds. We can therefore assume that (a, p) = 1. We then need to prove that $a^{p-1} \equiv 1$
(mod p). To this end, we first invoke Lemma 10 to conclude that a, 2a, . . . , (p − 1)a is simply a
reordering of 1, 2, . . . , p − 1 modulo p. We can then multiply together these residues to obtain
a(2a)(3a) . . . [(p − 1)a] ≡ 1(2)(3) . . . (p − 1)
(mod p).
Simplifying yields
$a^{p-1}(p - 1)! \equiv (p - 1)!$ (mod p).
Finally, since (p − 1)! = (p − 1)(p − 2) . . . 2(1) is a product of positive integers less than p, we see
that ((p − 1)!, p) = 1. We can therefore divide each side by (p − 1)! to obtain
$a^{p-1} \equiv 1$ (mod p)
as required.
We turn now to the proof of Wilson’s Theorem. We need a preliminary lemma.
Lemma 11. Let p be a prime. Then, the congruence x² ≡ 1 (mod p) has precisely two solutions:
1 and −1 ≡ p − 1 (mod p) (these coincide when p = 2).
Proof. Indeed, x² ≡ 1 (mod p) is equivalent to
p | (x² − 1) = (x − 1)(x + 1).
Since p is prime, this is equivalent to p | (x − 1) or p | (x + 1). That is, x ≡ 1 (mod p) or
x ≡ −1 ≡ p − 1 (mod p).
From the Euclidean Algorithm, we know that given any two relatively prime integers a and m,
there exist integers x and y such that
ax + my = 1.
In fact, since this equation implies that the greatest common divisor of a and m is a positive divisor
of 1, we see that a and m are relatively prime if and only if ax + my = 1 for some integers x and y. In
turn, for m > 0, this is equivalent to the existence of an integer x such that ax ≡ 1 (mod m). That
is, for m > 0, (a, m) = 1 is equivalent to a having an inverse modulo m. Further, if x and y are
both inverses of a modulo m, then we’d have ax ≡ 1 ≡ ay (mod m) which would imply that x ≡ y
(mod m) since (a, m) = 1 allowing us to divide congruences modulo m by a. We conclude that
the integers relatively prime to m are precisely the ones that have an inverse modulo m, and that
when an inverse exists, it is unique modulo m. We can therefore refer to the inverse of a modulo m
when it exists, and denote it using the familiar notation $a^{-1}$. When m is equal to a prime p, every
integer that is not a multiple of p is relatively prime to p and so has an inverse modulo p. What
Lemma 11 says is that the only residue classes that are their own inverses modulo a prime p are 1
and p − 1. So, out of the residue classes 0, 1, . . . , p − 1, only 0 fails to have an inverse modulo p,
and the only two that are their own inverses are 1 and p − 1. We summarize this in the following
lemma.
Lemma 12. Let m be a positive integer and a be an arbitrary integer. Then, a has an inverse
modulo m if and only if (a, m) = 1. When this is the case, the inverse is uniquely determined
modulo m and denoted by $a^{-1}$. In the special case where m = p is a prime, the residue classes possessing
an inverse modulo p are 1, 2, . . . , p − 1, and among these, only 1 and p − 1 are their own inverse.
We now have all that is required to prove Wilson’s Theorem:
Proof of Wilson’s Theorem. Suppose first that p is prime. If p = 2, then (p − 1)! = 1! = 1 ≡ −1
(mod 2). We can therefore assume that p is odd. Consider the product
1(2) . . . (p − 2)(p − 1)
of all the nonzero residue classes modulo p. By Lemma 12, we know that each of these residue
classes has a unique inverse, and the only two that are equal to their inverse are 1 and p − 1. Each
of 2, 3, . . . , p − 2 therefore gets multiplied by its inverse to yield 1 modulo p reducing the product
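to p − 1 which is −1 modulo p. That is, denoting the inverse of a modulo p by $a^{-1}$, we obtain
\[
\begin{aligned}
(p-1)! &= 1(2) \cdots (p-2)(p-1) \\
&\equiv 1\,(2 \cdot 2^{-1})(3 \cdot 3^{-1}) \cdots \left[\frac{p-1}{2} \cdot \left(\frac{p-1}{2}\right)^{-1}\right](p-1) \\
&\equiv 1(1)(1) \cdots (1)(p-1) \\
&= p - 1 \\
&\equiv -1 \pmod{p}.
\end{aligned}
\]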
This completes the proof of the “only if” direction of Wilson’s Theorem. Conversely, suppose that
m is composite. We need to prove that (m − 1)! ≢ −1 (mod m). But this is easily proved with
the help of Lemma 12. Indeed, the fact that m is composite implies that m has a nontrivial proper
positive divisor d with 1 < d < m. But then d appears in the product that defines (m − 1)! so that d
is a common divisor of (m − 1)! and m. We conclude that (m − 1)! and m fail to be relatively prime
so that (m − 1)! cannot have an inverse modulo m. In particular, (m − 1)! cannot be congruent
to −1 modulo m (or to any other invertible residue class modulo m). This completes the proof of
Wilson’s Theorem.
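Wilson's Theorem can be confirmed for small integers by direct computation. The sketch below uses helper names of our own (`is_prime`, `satisfies_wilson`) and compares the congruence (n − 1)! ≡ −1 (mod n) against a trial-division primality test for 2 ≤ n < 200. Note that −1 has least residue n − 1 modulo n when n ≥ 2.

```python
from math import factorial


def is_prime(n):
    """Trial-division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def satisfies_wilson(n):
    """True when (n-1)! ≡ -1 (mod n); intended for n >= 2."""
    return factorial(n - 1) % n == n - 1


for n in range(2, 200):
    assert satisfies_wilson(n) == is_prime(n)
```

As a primality test this is hopelessly slow, since computing (n − 1)! takes far more work than trial division; its interest is theoretical.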
Fermat’s Little Theorem provides us with an efficient method of finding the least residue of large
powers of integers modulo primes. We illustrate this with the following example.
Example 11. Find the least residue of $5^{5754}$ modulo 17.
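Solution. We compute
\[
\begin{aligned}
5^{5754} &= (5^{16})^{359} \cdot 5^{10} && \text{(since } 5754 = (16)(359) + 10\text{)} \\
&\equiv (1)^{359} \cdot 5^{10} \pmod{17} && \text{(by Fermat's Little Theorem)} \\
&= (5^2)^5 \\
&= (25)^5 \\
&\equiv 8^5 \pmod{17} \\
&= (8^2)^2 \cdot 8 \\
&= (64)^2 \cdot 8 \\
&\equiv (-4)^2 \cdot 8 \pmod{17} \\
&= 128 \\
&\equiv 9 \pmod{17}.
\end{aligned}
\]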
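The exponent reduction used in Example 11 can be reproduced in a few lines. This is a sketch only: the function name `reduce_power_mod_prime` is hypothetical, and it assumes that p is prime and does not divide a, as Fermat's Little Theorem requires.

```python
def reduce_power_mod_prime(a: int, e: int, p: int) -> int:
    """Least residue of a**e modulo a prime p that does not divide a.

    Fermat's Little Theorem gives a**(p - 1) ≡ 1 (mod p), so only the
    remainder of e on division by p - 1 matters.
    """
    assert a % p != 0, "Fermat's Little Theorem needs p not dividing a"
    return pow(a, e % (p - 1), p)

print(5754 % 16)                            # the reduced exponent
print(reduce_power_mod_prime(5, 5754, 17))  # residue after reducing the exponent
print(pow(5, 5754, 17))                     # direct check with built-in modular pow
```

The last two printed values agree with the residue 9 obtained by hand above.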
Chapter 8
The Divisors of an Integer
This chapter is based on [Dud08, §7].
In this section, two important members of the class of multiplicative functions are introduced.
One of these is the number of divisors function d that assigns to a positive integer the number
of its positive divisors. The other is the sum of the positive divisors function σ that assigns to a
positive integer the sum of its positive divisors. We start by defining multiplicative functions and
then proceed to the introduction to these two particular examples.
Definition 7 (Multiplicative Function). A function f defined on the set of positive integers N is
called multiplicative provided
f (mn) = f (m)f (n) for all positive integers m and n with (m, n) = 1.
A multiplicative function f is called totally (or completely) multiplicative provided
f (mn) = f (m)f (n) for all positive integers m and n.
Note that the values of a multiplicative function f are completely determined by its values on
prime powers. Indeed, if $f(p^k)$ is known for all prime powers $p^k$, then, for any n ∈ N, we have a
prime-power factorization
\[ n = p_1^{e_1} \cdots p_r^{e_r} \]
for distinct primes $p_1, \dots, p_r$ and positive integers $e_1, \dots, e_r$. Since the prime powers $p_i^{e_i}$ are relatively prime, we must have
\[ f(n) = f(p_1^{e_1}) \cdots f(p_r^{e_r}). \]
Similarly, if f is completely multiplicative, its values are completely determined by the values it
takes on at primes. Indeed, with n as above, if we know the values of the $f(p_i)$, we must have
\[ f(n) = f(p_1)^{e_1} \cdots f(p_r)^{e_r}. \]
This is similar to the fact that a linear transformation of vector spaces is completely determined by
its values on a basis. Indeed, with respect to multiplication, the set of primes can be considered a
basis for the set of positive integers, and then completely multiplicative functions can be considered
as the “linear transformations” in this context. Indeed, this situation is made rigorous if we consider
scalar multiplication to be given by exponentiation and vector addition to be given by product. We
illustrate the determination of multiplicative (resp. completely multiplicative) functions by their
values on prime powers (resp. primes) in the following example.
Example 12. Let f and g be functions defined on the set N of positive integers. Suppose further
that
\[ f(2^2) = 3, \quad f(7) = -2; \qquad g(2) = -4, \quad g(5) = 7. \]
(i) Assuming that f is multiplicative, find f(28).
(ii) Assuming that g is completely multiplicative, find g(500).
Solution. For part (i), we note that $28 = 2^2 \cdot 7$. Therefore, since f is multiplicative, we have
\[ f(28) = f(2^2 \cdot 7) = f(2^2) f(7) = 3(-2) = -6. \]
For part (ii), we note that $500 = 2^2 \cdot 5^3$. Therefore, since g is completely multiplicative, we have
\[ g(500) = g(2^2 \cdot 5^3) = g(2)^2 g(5)^3 = (-4)^2 (7)^3 = 5488. \]
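The way a multiplicative function is pinned down by its values on prime powers can be mirrored in code. The sketch below is illustrative: `factorize`, `eval_multiplicative` and `eval_completely_multiplicative` are hypothetical helper names, and the dictionaries hold only the values given in Example 12.

```python
def factorize(n: int) -> dict[int, int]:
    """Prime-power factorization of n >= 1 as {prime: exponent}, by trial division."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def eval_multiplicative(n: int, value_on_prime_power: dict[tuple[int, int], int]) -> int:
    """f(n) as the product of the known values f(p**e) over the factorization of n."""
    result = 1
    for p, e in factorize(n).items():
        result *= value_on_prime_power[(p, e)]
    return result

def eval_completely_multiplicative(n: int, value_on_prime: dict[int, int]) -> int:
    """g(n) as the product of g(p)**e over the factorization of n."""
    result = 1
    for p, e in factorize(n).items():
        result *= value_on_prime[p] ** e
    return result

f_values = {(2, 2): 3, (7, 1): -2}   # f(2^2) = 3, f(7) = -2
g_values = {2: -4, 5: 7}             # g(2) = -4, g(5) = 7
print(eval_multiplicative(28, f_values))              # -6
print(eval_completely_multiplicative(500, g_values))  # 5488
```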
We turn now to the two examples of multiplicative functions we will investigate in this section.
We will use the notation $\sum_{d \mid n}$ to denote the sum over the set of all positive divisors of n. With
this notation, we make the following definition.
Definition 8. Let r ∈ N0 . We define the function $\sigma_r$ on N by
\[ \sigma_r(n) = \sum_{d \mid n} d^r. \]
Two particular cases of interest are obtained by taking r = 0 and r = 1. For r = 0 we obtain the
number of positive divisors function d defined on N by
\[ d(n) = \sum_{d \mid n} 1, \]
while for r = 1 we obtain the sum of the positive divisors function σ defined on N by
\[ \sigma(n) = \sum_{d \mid n} d. \]
The main result to be proved in this section is that for all r ≥ 0, the function σr is multiplicative.
Taking r = 0 and r = 1 will prove the multiplicativity of the functions d and σ in particular. We
prove the multiplicativity of the σr by combining prime-power factorizations with induction. The
details are given below.
Theorem 16. Let r ∈ N0 . The function $\sigma_r$ defined on N by
\[ \sigma_r(n) = \sum_{d \mid n} d^r \qquad (n \in \mathbb{N}) \]
is multiplicative.
Proof. Let r ∈ N0 and m, n ∈ N be such that (m, n) = 1. We need to prove that $\sigma_r(mn) = \sigma_r(m)\sigma_r(n)$. Now, we have prime-power factorizations
\[ m = p_1^{e_1} \cdots p_k^{e_k}; \tag{8.1} \]
\[ n = q_1^{f_1} \cdots q_\ell^{f_\ell}, \tag{8.2} \]
where $p_i \neq p_j$ for $i \neq j$, $q_i \neq q_j$ for $i \neq j$, and the $e_i$ and $f_j$ are positive integers. Since
(m, n) = 1, we also have that the $p_i$ are distinct from the $q_j$. Therefore, if we can show that for
any product $P_1^{g_1} \cdots P_t^{g_t}$ of distinct prime powers $P_1^{g_1}, \dots, P_t^{g_t}$, we have
\[ \sigma_r(P_1^{g_1} \cdots P_t^{g_t}) = \sigma_r(P_1^{g_1}) \cdots \sigma_r(P_t^{g_t}), \tag{8.3} \]
we’d be able to conclude that
\[
\begin{aligned}
\sigma_r(mn) &= \sigma_r\left(p_1^{e_1} \cdots p_k^{e_k} q_1^{f_1} \cdots q_\ell^{f_\ell}\right) && \text{(from (8.1) and (8.2))} \\
&= \sigma_r(p_1^{e_1}) \cdots \sigma_r(p_k^{e_k}) \sigma_r(q_1^{f_1}) \cdots \sigma_r(q_\ell^{f_\ell}) && \text{(from (8.3))} \\
&= \sigma_r(p_1^{e_1} \cdots p_k^{e_k}) \sigma_r(q_1^{f_1} \cdots q_\ell^{f_\ell}) && \text{(from (8.3))} \\
&= \sigma_r(m)\sigma_r(n) && \text{(from (8.1) and (8.2))}
\end{aligned}
\]
as required. We have therefore reduced the proof to establishing that for distinct prime powers
$P_1^{g_1}, \dots, P_t^{g_t}$, we have
\[ \sigma_r(P_1^{g_1} \cdots P_t^{g_t}) = \sigma_r(P_1^{g_1}) \cdots \sigma_r(P_t^{g_t}). \]
We will establish this by induction on the number t of prime powers appearing in the product. To
this end, let S be the set of all t ≥ 1 such that $\sigma_r$ is multiplicative for the product of t distinct
prime powers. We show that S contains all of N by induction. For t = 1, there is nothing to show
since we only have one prime power in question. Both sides of (8.3) are therefore equal to $\sigma_r(P_1^{g_1})$.
We conclude that 1 ∈ S. Fix a positive integer t and suppose that t ∈ S. We complete the proof
by showing that t + 1 ∈ S. Consider then a product $P_1^{g_1} \cdots P_t^{g_t} P_{t+1}^{g_{t+1}}$ of distinct prime powers $P_1^{g_1}, \dots, P_{t+1}^{g_{t+1}}$. Define
\[ N = P_1^{g_1} \cdots P_t^{g_t}, \]
so that $(N, P_{t+1}) = 1$, our product is given by $N P_{t+1}^{g_{t+1}}$, and we are assuming as inductive hypothesis
that
\[ \sigma_r(N) = \sigma_r(P_1^{g_1}) \cdots \sigma_r(P_t^{g_t}). \tag{8.4} \]
Let $d_1, \dots, d_s$ be the positive divisors of N other than 1. Since $(N, P_{t+1}) = 1$, all the positive divisors of the
product $N P_{t+1}^{g_{t+1}}$ are given by the following array:
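\[
\begin{array}{ccccc}
1 & d_1 & d_2 & \cdots & d_s \\
P_{t+1} & d_1 P_{t+1} & d_2 P_{t+1} & \cdots & d_s P_{t+1} \\
P_{t+1}^2 & d_1 P_{t+1}^2 & d_2 P_{t+1}^2 & \cdots & d_s P_{t+1}^2 \\
\vdots & \vdots & \vdots & & \vdots \\
P_{t+1}^{g_{t+1}} & d_1 P_{t+1}^{g_{t+1}} & d_2 P_{t+1}^{g_{t+1}} & \cdots & d_s P_{t+1}^{g_{t+1}}
\end{array}
\]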
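In order to compute $\sigma_r(N P_{t+1}^{g_{t+1}})$, we raise each of the positive divisors in the above array to the
r-th power and then sum the resulting numbers. If we set $d_0 = 1$ and sum by rows we obtain
\[
\begin{aligned}
\sigma_r(N P_{t+1}^{g_{t+1}}) &= \sum_{j=0}^{s} d_j^r + \sum_{j=0}^{s} (d_j P_{t+1})^r + \sum_{j=0}^{s} \left(d_j P_{t+1}^2\right)^r + \cdots + \sum_{j=0}^{s} \left(d_j P_{t+1}^{g_{t+1}}\right)^r \\
&= \sum_{j=0}^{s} d_j^r + P_{t+1}^r \sum_{j=0}^{s} d_j^r + \left(P_{t+1}^2\right)^r \sum_{j=0}^{s} d_j^r + \cdots + \left(P_{t+1}^{g_{t+1}}\right)^r \sum_{j=0}^{s} d_j^r \\
&= \left(1 + P_{t+1}^r + \left(P_{t+1}^2\right)^r + \cdots + \left(P_{t+1}^{g_{t+1}}\right)^r\right) \sum_{j=0}^{s} d_j^r \\
&= \sigma_r(P_{t+1}^{g_{t+1}})\,\sigma_r(N) \\
&= \sigma_r(N)\,\sigma_r(P_{t+1}^{g_{t+1}}).
\end{aligned}
\]
We conclude from (8.4) that
\[ \sigma_r\left(P_1^{g_1} \cdots P_{t+1}^{g_{t+1}}\right) = \sigma_r(N P_{t+1}^{g_{t+1}}) = \sigma_r(P_1^{g_1}) \cdots \sigma_r(P_t^{g_t})\,\sigma_r(P_{t+1}^{g_{t+1}}) \]
as required. We conclude that t + 1 ∈ S. By induction the proof is complete.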
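Theorem 16 is easy to test by brute force for small arguments. The following sketch assumes nothing beyond Definition 8; the function name `sigma` and the tested ranges are arbitrary choices made for illustration.

```python
from math import gcd

def sigma(n: int, r: int = 1) -> int:
    """Sum of the r-th powers of the positive divisors of n (brute force)."""
    return sum(d**r for d in range(1, n + 1) if n % d == 0)

# Multiplicativity on relatively prime arguments, for r = 0, 1, 2.
for r in range(3):
    for m in range(1, 40):
        for n in range(1, 40):
            if gcd(m, n) == 1:
                assert sigma(m * n, r) == sigma(m, r) * sigma(n, r)

print(sigma(28, 0), sigma(28, 1))               # number of divisors and divisor sum of 28
print(sigma(4, 1) * sigma(2, 1), sigma(8, 1))   # these differ: (4, 2) = 2, not 1
```

The last line illustrates that the hypothesis (m, n) = 1 cannot be dropped: σ is multiplicative but not completely multiplicative.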
Chapter 9
Perfect Numbers
This chapter is based on [Dud08, §8].
In this section, we introduce perfect numbers. We then give the complete characterization of
the even perfect numbers due to Euclid and Euler.
Definition 9 (Perfect Numbers). A positive integer n is said to be perfect if it is equal to the sum
of its proper positive divisors. That is, n is perfect provided
\[ n = \sum_{d \mid n} d - n \iff \sigma(n) = 2n. \]
Example 13. The first four perfect numbers are 6, 28, 496, and 8128. These numbers are perfect
since they are all equal to the sum of their proper positive divisors:
6 = 1 + 2 + 3;
28 = 1 + 2 + 4 + 7 + 14;
496 = 1 + 2 + 4 + 8 + 16 + 31 + 62 + 124 + 248;
8128 = 1 + 2 + 4 + 8 + 16 + 32 + 64 + 127 + 254 + 508 + 1016 + 2032 + 4064.
We note that
\[ 6 = 2^{2-1}(2^2 - 1); \quad 28 = 2^{3-1}(2^3 - 1); \quad 496 = 2^{5-1}(2^5 - 1); \quad 8128 = 2^{7-1}(2^7 - 1), \]
and that all of $3 = 2^2 - 1$, $7 = 2^3 - 1$, $31 = 2^5 - 1$ and $127 = 2^7 - 1$ are prime numbers. This is a
special case of the main result of this section.
No odd perfect numbers are known, whereas the even perfect numbers have been completely
classified by Euler. In order to state this classification, we need to define Mersenne primes. These
primes are the ones that are one less than a power of two. In searching for such primes, we need
only look at the numbers that are one less than a prime power of 2 as shown by the following
proposition.
Proposition 10. Let m ∈ N. If $2^m - 1$ is prime then m is itself prime.
Proof. We will prove the contrapositive. That is, we will show that if m is not prime then neither
is $2^m - 1$. If m = 1 then $2^m - 1 = 1$ is not prime. Otherwise m is composite, say m = ab for integers a and b with 1 < a, b < m, and we
have the factorization
\[ 2^m - 1 = 2^{ab} - 1 = (2^a - 1)\left(1 + 2^a + 2^{2a} + \cdots + 2^{(b-1)a}\right), \]
where both factors $2^a - 1$ and $1 + 2^a + 2^{2a} + \cdots + 2^{(b-1)a}$ are greater than 1 and less than $2^m - 1$. This shows that $2^m - 1$ is composite, as
required.
This brings us to the definition of Mersenne primes.
Definition 10 (Mersenne Prime). A prime is called a Mersenne prime if it is one less than a power
of 2. By Proposition 10, the Mersenne primes are the prime numbers of the form $2^p - 1$ for p prime.
We have arrived at the characterization of the even perfect numbers due to Euclid and Euler.
Theorem 17 (Euclid, Euler). The even perfect numbers are precisely those numbers n of the form
\[ n = 2^{p-1}(2^p - 1) \tag{9.1} \]
where p is a prime and $2^p - 1$ is a (Mersenne) prime.
Proof. We first show that every integer n of the form (9.1) is a perfect number. This was shown
by Euclid. We then complete the proof by showing that every even perfect number n has the form
given by (9.1). This is the contribution of Euler. The first part is a simple calculation. Indeed,
since $2^{p-1}$ is a power of two and $2^p - 1$ is odd, we see that $(2^{p-1}, 2^p - 1) = 1$. We conclude from
the multiplicativity of σ that for n defined by (9.1),
\[
\begin{aligned}
\sigma(n) &= \sigma\left(2^{p-1}(2^p - 1)\right) \\
&= \sigma(2^{p-1})\,\sigma(2^p - 1) \\
&= (1 + 2 + \cdots + 2^{p-1})(1 + (2^p - 1)) \\
&= \frac{2^p - 1}{2 - 1} \cdot 2^p \\
&= 2\left[2^{p-1}(2^p - 1)\right] \\
&= 2n.
\end{aligned}
\]
We conclude that n is perfect as claimed. Conversely, suppose that n is an even perfect number.
We need to show that there exists a prime p such that $2^p - 1$ is also prime and $n = 2^{p-1}(2^p - 1)$.
Suppose that e is the power of 2 in the prime power factorization of n. Then $n = 2^e m$ where e ≥ 1
and m is odd. Note that m > 1, since otherwise $\sigma(n) = \sigma(2^e) = 2^{e+1} - 1$ would be odd and could not equal 2n. Since m and 1 are therefore distinct positive divisors of m, we have σ(m) ≥ m + 1 > m. We can
therefore write σ(m) = m + s for some positive integer s. But then, since n is perfect, we must
have
\[ 2n = \sigma(n) \iff 2^{e+1} m = \frac{2^{e+1} - 1}{2 - 1}\,(m + s). \]
Therefore, we have
\[ 2^{e+1} m - (2^{e+1} - 1)m = (2^{e+1} - 1)s, \]
or,
\[ m = (2^{e+1} - 1)s. \]
We conclude that s < m and is a positive divisor of m. From σ(m) = m + s we can conclude
that s and m are the only positive divisors of m. We conclude that m is prime and s = 1. Thus
$m = 2^{e+1} - 1$ is a Mersenne prime. From Proposition 10 we conclude that e + 1 = p for some prime
p so that
\[ n = 2^e m = 2^{p-1}(2^p - 1), \]
for primes p and $2^p - 1$, as required.
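Theorem 17 can be compared against a brute-force search in a small range. The sketch below is only an illustration: the helper names and the bounds (exponents below 12, integers below 10000) are arbitrary assumptions chosen to keep the search fast.

```python
def is_prime(n: int) -> bool:
    """Trial division primality test."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def sigma(n: int) -> int:
    """Sum of the positive divisors of n, pairing each divisor d with n // d."""
    total = 0
    d = 1
    while d * d <= n:
        if n % d == 0:
            total += d
            if d != n // d:
                total += n // d
        d += 1
    return total

# Numbers of the form 2^(p-1) (2^p - 1) with p and 2^p - 1 both prime.
from_formula = [
    2 ** (p - 1) * (2**p - 1)
    for p in range(2, 12)
    if is_prime(p) and is_prime(2**p - 1)
]

# Perfect numbers found directly from the definition sigma(n) = 2n.
by_search = [n for n in range(1, 10000) if sigma(n) == 2 * n]

print(from_formula)
print(by_search)
```

Both lists consist of the four perfect numbers of Example 13; the exponent p = 11 contributes nothing because 2^11 − 1 = 2047 = 23 · 89 is composite.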
Chapter 10
Euler’s Theorem and Function
This chapter is based on [Dud08, §9].
Recall Fermat’s little theorem that used the fact that for prime moduli p the invertible residue
classes were the classes 1, 2, . . . , p − 1 to conclude that for (a, p) = 1, $a^{p-1} \equiv 1 \pmod{p}$. If we
reconstruct the same argument using a general modulus m ∈ N, we get Euler’s generalization of
Fermat’s little theorem. First we introduce Euler’s ϕ-function that counts the number of invertible
congruence classes modulo a particular integer.
Definition 11. We define Euler’s ϕ-function on N by
ϕ(n) = #{1 ≤ m ≤ n | (m, n) = 1}
(n ∈ N).
Since we have seen that the invertible classes modulo n are precisely the ones corresponding to
integers relatively prime to n, we see that ϕ(n) is equal to the number of invertible residue classes
modulo n. This observation allows us to generalize the proof of Fermat’s little theorem to obtain
Euler’s generalization below.
Theorem 18 (Euler’s Theorem). Let a ∈ Z and n ∈ N. If (a, n) = 1 we have $a^{\varphi(n)} \equiv 1 \pmod{n}$.
Proof. We take our cue from the proof of Fermat’s little theorem and consider the set S of invertible
elements modulo n. As we have seen, this set S contains ϕ(n) classes and is given by
S = {1 ≤ m ≤ n | (m, n) = 1}.
As in the proof of Fermat’s little theorem, we show that multiplication by a is a permutation of S.
We will then be able to conclude that S = aS (where aS = {ax | x ∈ S}) so that multiplying the
elements of S together yields
\[ \prod_{x \in S} (ax) \equiv \prod_{x \in S} x \pmod{n}. \tag{10.1} \]
Finally, since each x ∈ S is relatively prime to n (and so can be cancelled from (10.1)) and there
are ϕ(n) elements in S, we conclude that
\[ a^{\varphi(n)} \equiv 1 \pmod{n} \]
as required. We conclude the proof by observing that for all x ∈ S, ax ∈ S, and that no two distinct
ax ∈ aS are congruent modulo n. In exactly the same fashion as in the proof of Fermat’s little
theorem, multiplication by a is then a permutation of S, as required.
We now show that Euler’s ϕ-function is another example of a multiplicative function. This will
allow for efficient calculation of its values.
Theorem 19. Euler’s ϕ-function is multiplicative.
Proof. In order to prove the theorem, we need to show that, for positive integers m and n with
(m, n) = 1, we have ϕ(mn) = ϕ(m)ϕ(n). That is, we need to verify that the number of invertible
residue classes modulo mn is equal to the product of the number of invertible residue classes modulo
m and the number of invertible residue classes modulo n. We will do this by way of the Chinese
remainder theorem. First we need some notation. Given an integer r, and a modulus m, we denote
by $r_m$ the least residue of r modulo m. That is, we let $r_m$ denote the remainder left when r is
divided by m. Then, if $S_m$, $S_n$ and $S_{mn}$ denote the sets of invertible residue classes modulo m, n
and mn respectively, we will prove that the map
\[ f : S_{mn} \to S_m \times S_n \]
given by
\[ f(r_{mn}) = (r_m, r_n) \]
is a one-to-one correspondence. This will show that
\[ \varphi(mn) = \#S_{mn} = \#(S_m \times S_n) = (\#S_m)(\#S_n) = \varphi(m)\varphi(n) \]
as required. Here, our function f takes as input some integer less than mn and relatively prime to
mn and reduces it modulo m and n obtaining the two coordinates of the output ordered pair. In
order to complete the proof, we need to show that f is well-defined and that it is one-to-one and
onto. The fact that f is well-defined follows from the observation that, since our input $r_{mn}$ is relatively prime
to mn, it is also relatively prime to both m and n. But then, since
\[ r_{mn} \equiv r_m \pmod{m}, \qquad r_{mn} \equiv r_n \pmod{n}, \]
we see that $(r_m, m) = (r_{mn}, m) = 1$ and $(r_n, n) = (r_{mn}, n) = 1$. Also, since $r_m$ and $r_n$ are least
residues modulo m and n respectively, we have $r_m < m$ and $r_n < n$. It follows that $r_m \in S_m$ and
$r_n \in S_n$. We conclude that $f(r_{mn}) \in S_m \times S_n$, as required. Having established that the definition of
f makes sense, we proceed to showing that it is one-to-one and onto. We’ll see that this is basically
a restatement of the Chinese remainder theorem. Indeed, given any pair $(a_m, a_n) \in S_m \times S_n$, the
Chinese remainder theorem provides us with a solution x to the system of congruences
\[ x \equiv a_m \pmod{m}, \qquad x \equiv a_n \pmod{n}. \]
Here we have used the assumption that (m, n) = 1. But then, we have $x_m = a_m$ and $x_n = a_n$. Moreover, x is relatively prime to both m and n, hence to mn, so that $x_{mn} \in S_{mn}$. It
follows that
\[ f(x_{mn}) = (x_m, x_n) = (a_m, a_n) \]
so that f is onto as claimed. We see that the existence part of the Chinese remainder theorem proved
that f was onto. The uniqueness part will prove that f is one-to-one. Indeed, if $f(r_{mn}) = f(s_{mn})$,
then $r_{mn}$ and $s_{mn}$ are both solutions to the system of congruences
\[ x \equiv r_m \pmod{m}, \qquad x \equiv r_n \pmod{n}. \]
It follows from the Chinese remainder theorem that $r_{mn} \equiv s_{mn} \pmod{mn}$. But this forces $r_{mn} = s_{mn}$ since $r_{mn}$ and $s_{mn}$ both lie between 1 and mn. We conclude that f is one-to-one, as required.
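The correspondence used in the proof of Theorem 19 can be observed directly for small moduli. In the sketch below the names `units` and `crt_map_is_bijection` are hypothetical, and residues are represented by integers in the range 1..m, following Definition 11.

```python
from math import gcd

def units(m: int) -> set[int]:
    """Integers in 1..m that are relatively prime to m."""
    return {r for r in range(1, m + 1) if gcd(r, m) == 1}

def crt_map_is_bijection(m: int, n: int) -> bool:
    """Check that r -> (r mod m, r mod n) sends the units modulo mn
    one-to-one onto the pairs (unit modulo m, unit modulo n)."""
    source = units(m * n)
    images = {(r % m, r % n) for r in source}
    target = {(a % m, b % n) for a in units(m) for b in units(n)}
    return images == target and len(images) == len(source)

print(crt_map_is_bijection(4, 9))   # relatively prime moduli
print(crt_map_is_bijection(4, 6))   # (4, 6) = 2: the map fails to be one-to-one
```

For the pair (4, 6) the map still lands in the right set, but several units modulo 24 share the same image, which is why ϕ(24) = 8 differs from ϕ(4)ϕ(6) = 4.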
Theorem 19 shows that Euler’s ϕ-function is multiplicative. Its values are then completely
determined by its values on prime powers. Since these are easily computed, we obtain a general
formula for computing ϕ(n) in terms of the prime powers appearing in the prime-power factorization
of n. We first state a lemma that gives the values of ϕ on prime powers before stating the result
for general positive integers n.
Lemma 13. Let p be a prime and e be a positive integer. Then
\[ \varphi(p^e) = p^{e-1}(p - 1). \]
Proof. In order to prove that $\varphi(p^e) = p^{e-1}(p - 1)$, we need to count the number of positive integers
not exceeding $p^e$ that are relatively prime to $p^e$. We will do this by subtracting from $p^e$ the number of
positive integers not exceeding $p^e$ that possess a nontrivial common factor with $p^e$. Since the integers
between 1 and $p^e$ that possess a nontrivial common factor with $p^e$ are precisely the multiples of p, namely
\[ p, 2p, 3p, \dots, p^{e-1} p, \]
we see that there are $p^{e-1}$ of these integers. We conclude that
\[ \varphi(p^e) = p^e - p^{e-1} = p^{e-1}(p - 1) \]
as required.
We have arrived at the general formula for the values of ϕ at positive integers n.
Theorem 20. Let n ∈ N have the prime-power factorization
\[ n = p_1^{e_1} \cdots p_r^{e_r}, \]
for distinct primes $p_1, \dots, p_r$ and positive integers $e_1, \dots, e_r$. We have the formula
\[ \varphi(n) = p_1^{e_1 - 1}(p_1 - 1) \cdots p_r^{e_r - 1}(p_r - 1). \]
Proof. This is a simple consequence of Theorem 19 and Lemma 13. Indeed, from Theorem 19 we
conclude that
\[ \varphi(n) = \varphi(p_1^{e_1}) \cdots \varphi(p_r^{e_r}), \tag{10.2} \]
and from Lemma 13 we conclude that for each 1 ≤ i ≤ r we have
\[ \varphi(p_i^{e_i}) = p_i^{e_i - 1}(p_i - 1). \tag{10.3} \]
Putting (10.2) and (10.3) together yields
\[ \varphi(n) = \varphi(p_1^{e_1}) \cdots \varphi(p_r^{e_r}) = p_1^{e_1 - 1}(p_1 - 1) \cdots p_r^{e_r - 1}(p_r - 1) \]
as required.
As a corollary, we note that the formula given in Theorem 20 can be expressed in an alternative
way.
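Corollary 2. Let n be a positive integer and $p_1, \dots, p_r$ be the distinct primes appearing in the
prime-power factorization of n. Then
\[ \varphi(n) = n\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_r}\right). \]
Proof. Indeed, if the prime-power factorization of n is given by
\[ n = p_1^{e_1} \cdots p_r^{e_r}, \]
we can use Theorem 20 to obtain
\[
\begin{aligned}
\varphi(n) &= p_1^{e_1 - 1}(p_1 - 1) \cdots p_r^{e_r - 1}(p_r - 1) \\
&= \left(\frac{p_1^{e_1}}{p_1}\right) \cdots \left(\frac{p_r^{e_r}}{p_r}\right)(p_1 - 1) \cdots (p_r - 1) \\
&= \left(p_1^{e_1} \cdots p_r^{e_r}\right)\left(\frac{p_1 - 1}{p_1}\right) \cdots \left(\frac{p_r - 1}{p_r}\right) \\
&= n\left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_r}\right)
\end{aligned}
\]
as required.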
We now illustrate Theorem 20 by way of an example.
Example 14. Use Theorem 20 to compute ϕ(500) and ϕ(588).
Solution. We start by decomposing 500 and 588 into their prime-power factorizations. This gives
\[ 500 = 2^2 \cdot 5^3, \qquad 588 = 2^2 \cdot 3 \cdot 7^2. \]
Applying Theorem 20 yields
\[ \varphi(500) = 2^{2-1}(2 - 1)\,5^{3-1}(5 - 1) = 2 \cdot 25 \cdot 4 = 200, \]
and
\[ \varphi(588) = 2^{2-1}(2 - 1)\,3^{1-1}(3 - 1)\,7^{2-1}(7 - 1) = 2 \cdot 2 \cdot 7 \cdot 6 = 168. \]
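The formula of Theorem 20 translates directly into a short routine, which can then be compared with the counting definition of ϕ. This is a sketch under simple assumptions: factorization is by trial division, and the names `factorize`, `phi_formula` and `phi_count` are chosen here for illustration.

```python
from math import gcd

def factorize(n: int) -> dict[int, int]:
    """Prime-power factorization of n >= 1 as {prime: exponent}."""
    factors: dict[int, int] = {}
    p = 2
    while p * p <= n:
        while n % p == 0:
            factors[p] = factors.get(p, 0) + 1
            n //= p
        p += 1
    if n > 1:
        factors[n] = factors.get(n, 0) + 1
    return factors

def phi_formula(n: int) -> int:
    """Euler's function via Theorem 20: the product of p**(e-1) * (p-1)."""
    result = 1
    for p, e in factorize(n).items():
        result *= p ** (e - 1) * (p - 1)
    return result

def phi_count(n: int) -> int:
    """Euler's function straight from Definition 11."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)

print(factorize(500), phi_formula(500))
print(factorize(588), phi_formula(588))

# The formula and the definition agree on an initial range of integers.
assert all(phi_formula(n) == phi_count(n) for n in range(1, 600))
```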
We give an example similar to Example 11 that illustrates how one can apply Euler’s Theorem
to compute the least residue of large powers modulo arbitrary positive integers.
Example 15. Find the least residue of $5^{1549}$ modulo 588.
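Solution. In Example 14 we calculated ϕ(588) = 168. Euler’s Theorem then tells us that for
(a, 588) = 1 we have $a^{168} \equiv 1 \pmod{588}$. Since the prime factors of 588 are 2, 3, and 7, we can
apply this result for any integer a that is divisible by none of 2, 3 and 7. Therefore, we find that
\[
\begin{aligned}
5^{1549} &= (5^{168})^9 \cdot 5^{37} && \text{(since } 1549 = 9 \cdot 168 + 37\text{)} \\
&\equiv_{588} 1^9 \cdot 5^{37} && \text{(since } 5^{168} \equiv_{588} 1\text{)} \\
&= (5^4)^9 \cdot 5 && \text{(since } 37 = 4 \cdot 9 + 1\text{)} \\
&= 625^9 \cdot 5 && \text{(since } 5^4 = 625\text{)} \\
&\equiv_{588} 37^9 \cdot 5 && \text{(since } 625 \equiv_{588} 37\text{)} \\
&= (37^2)^4 \cdot 37 \cdot 5 && \text{(since } 9 = 2 \cdot 4 + 1\text{)} \\
&= 1369^4 \cdot 37 \cdot 5 && \text{(since } 37^2 = 1369\text{)} \\
&\equiv_{588} 193^4 \cdot 37 \cdot 5 && \text{(since } 1369 \equiv_{588} 193\text{)} \\
&= (193^2)^2 \cdot 37 \cdot 5 && \text{(since } 4 = 2 \cdot 2\text{)} \\
&= 37249^2 \cdot 37 \cdot 5 && \text{(since } 193^2 = 37249\text{)} \\
&\equiv_{588} 205^2 \cdot 37 \cdot 5 && \text{(since } 37249 \equiv_{588} 205\text{)} \\
&= 42025 \cdot 37 \cdot 5 && \text{(since } 205^2 = 42025\text{)} \\
&\equiv_{588} 277 \cdot 37 \cdot 5 && \text{(since } 42025 \equiv_{588} 277\text{)} \\
&= 10249 \cdot 5 && \text{(since } 277 \cdot 37 = 10249\text{)} \\
&\equiv_{588} 253 \cdot 5 && \text{(since } 10249 \equiv_{588} 253\text{)} \\
&= 1265 && \text{(since } 253 \cdot 5 = 1265\text{)} \\
&\equiv_{588} 89 && \text{(since } 1265 \equiv_{588} 89\text{)}
\end{aligned}
\]
We conclude that the least residue of $5^{1549}$ modulo 588 is equal to 89.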
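The repeated squaring carried out by hand in Example 15 is the same idea as the square-and-multiply method. The sketch below is illustrative only; `power_mod` is a hypothetical name, and Python's built-in three-argument `pow` performs the same computation.

```python
def power_mod(a: int, e: int, n: int) -> int:
    """Square-and-multiply: least residue of a**e modulo n, for e >= 0."""
    result = 1 % n
    base = a % n
    while e > 0:
        if e & 1:                      # current binary digit of e is 1
            result = result * base % n
        base = base * base % n         # square for the next binary digit
        e >>= 1
    return result

phi_588 = 168                          # computed in Example 14
reduced_exponent = 1549 % phi_588      # valid because (5, 588) = 1
print(reduced_exponent)
print(power_mod(5, reduced_exponent, 588))
print(pow(5, 1549, 588))               # direct check without reducing the exponent
```

Both computations return the residue 89 found above.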
We conclude this section with a result that we prove using a clever argument due to Gauss.
Theorem 21. For positive integers n we have
\[ \sum_{d \mid n} \varphi(d) = n. \]
Proof. The idea of the proof is to partition the set Nn of positive integers less than or equal to n
into equivalence classes obtained using the relation defined by considering two positive integers d1
and d2 less than or equal to n to be equivalent if they have the same greatest common divisor with
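n. That is, for an integer d with 1 ≤ d ≤ n, we define $C_d$ by
\[ C_d = \{1 \le g \le n \mid (g, n) = d\}. \]
Note that $C_d$ is empty unless d divides n. Since, for any g and any positive divisor d of n, we have (g, n) = d if and only if d | g and $\left(\frac{g}{d}, \frac{n}{d}\right) = 1$, we conclude that for every positive divisor d of n,
\[
\begin{aligned}
\#C_d &= \#\{1 \le g \le n \mid (g, n) = d\} \\
&= \#\left\{1 \le \frac{g}{d} \le \frac{n}{d} \;\middle|\; \left(\frac{g}{d}, \frac{n}{d}\right) = 1\right\} \\
&= \#\left\{1 \le h \le \frac{n}{d} \;\middle|\; \left(h, \frac{n}{d}\right) = 1\right\} \\
&= \varphi\left(\frac{n}{d}\right).
\end{aligned}
\]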
Now, we introduce a little bit of notation. If S is a collection of sets, we use the notation $\dot{\bigcup}\, S$ to
denote the disjoint union of the sets in S. That is, $\dot{\bigcup}\, S$ is the set consisting of all elements x that
lie in one of the sets in S, and the dot on top of the union symbol is there to remind us that the sets in S
share no elements in common (are disjoint). Using this notation, we can express the fact that the
classes $C_d$ partition the set $N_n = \{1, 2, 3, \dots, n\}$ by
\[ N_n = \dot{\bigcup}\, \{C_d \mid d \ge 1 \text{ and } d \mid n\}. \]
Combining this with $\#C_d = \varphi\left(\frac{n}{d}\right)$ yields
\[
\begin{aligned}
n &= \sum_{j=1}^{n} 1 \\
&= \sum_{j \in N_n} 1 \\
&= \sum_{d \mid n} \sum_{j \in C_d} 1 \\
&= \sum_{d \mid n} \#C_d \\
&= \sum_{d \mid n} \varphi\left(\frac{n}{d}\right) \\
&= \sum_{\frac{n}{d} \mid n} \varphi(d) \\
&= \sum_{d \mid n} \varphi(d),
\end{aligned}
\]
where the last equality follows from the fact that summing over d instead of $\frac{n}{d}$ changes nothing but
the order of the summands. This completes the proof.
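Gauss's partition argument can be watched in action for a specific n. In this sketch the value n = 12 and the helper names `phi` and `gcd_classes` are arbitrary illustrative choices; ϕ is computed by direct counting.

```python
from math import gcd

def phi(n: int) -> int:
    """Euler's function by direct counting (Definition 11)."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)

def gcd_classes(n: int) -> dict[int, list[int]]:
    """Partition {1, ..., n} into the classes C_d = {g : (g, n) = d}."""
    classes: dict[int, list[int]] = {}
    for g in range(1, n + 1):
        classes.setdefault(gcd(g, n), []).append(g)
    return classes

n = 12
classes = gcd_classes(n)
for d, members in sorted(classes.items()):
    # Each class C_d has exactly phi(n / d) members.
    assert len(members) == phi(n // d)
    print(d, members)

# Summing the class sizes recovers n, which is Theorem 21.
assert sum(phi(d) for d in range(1, n + 1) if n % d == 0) == n
```

Only divisors of n occur as keys of `classes`, in line with the fact that the classes C_d for d | n already exhaust {1, . . . , n}.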
Chapter 11
Primitive Roots
This chapter is based on [Dud08, §10].
Given a positive integer m, we will denote the set of congruence classes modulo m by $\mathbb{Z}/m\mathbb{Z}$
and the subset of invertible classes by $(\mathbb{Z}/m\mathbb{Z})^\times$. It is common in Abstract Algebra to denote these
sets by $\mathbb{Z}_m$ and $\mathbb{Z}_m^\times$, respectively, but we will avoid this notation due to the fact that for primes p,
and for the purposes of Number Theory, the notation $\mathbb{Z}_p$ is typically reserved for the p-adic integers
rather than the integers modulo p.
For those familiar with abstract algebra, $\mathbb{Z}$ is an integral domain, $m\mathbb{Z}$ is an ideal of $\mathbb{Z}$ and
$\mathbb{Z}/m\mathbb{Z}$ is the corresponding quotient ring, which explains the use of the symbol “/”, but for our
purposes, we can ignore this inherent algebraic structure and simply consider $\mathbb{Z}/m\mathbb{Z}$ as notation for
the integers modulo m. Similarly, $(\mathbb{Z}/m\mathbb{Z})^\times$ is not just a set but is in fact an abelian group under
multiplication, but this knowledge is not required in what follows; we can again simply consider
$(\mathbb{Z}/m\mathbb{Z})^\times$ as notation, this time for the invertible elements modulo m (those that are relatively
prime to m).
We know that $(\mathbb{Z}/m\mathbb{Z})^\times$ consists precisely of the congruence classes corresponding to integers
that are relatively prime to m. Both $\mathbb{Z}/m\mathbb{Z}$ and $(\mathbb{Z}/m\mathbb{Z})^\times$ are finite sets, where the former contains
m elements and the latter contains ϕ(m) elements. Now, we know that every element of $(\mathbb{Z}/m\mathbb{Z})^\times$
is invertible modulo m. What is shown in this section is that we can obtain the inverse of an
invertible element a by raising it to a suitable power. This leads us to the notion of the order of
elements, and specifically to the study of primitive roots which are the elements of largest possible
order.
Definition 12. Let m ∈ N and $a \in (\mathbb{Z}/m\mathbb{Z})^\times$. The least positive integer k such that $a^k \equiv_m 1$ is
called the order of a modulo m, denoted $\mathrm{ord}_m(a)$.
Proposition 11. Let m ∈ N and $a \in (\mathbb{Z}/m\mathbb{Z})^\times$. The order of a modulo m is well-defined.
Proof. What needs to be shown here is that a least positive k such that $a^k \equiv_m 1$ exists. We will
do this by way of the least integer principle. Suppose then that $S = \{k \in \mathbb{N} \mid a^k \equiv_m 1\}$. Since
the elements in S are all positive, we see that S is bounded below. We complete the proof by
showing that S is nonempty followed by invoking the least-integer principle. To this end, we note
that $(\mathbb{Z}/m\mathbb{Z})^\times$ is closed under powers. Indeed, if $a \in (\mathbb{Z}/m\mathbb{Z})^\times$, then a is invertible modulo m. Say
$ab \equiv_m 1$. If k is any positive integer, it follows that
\[ a^k b^k = (ab)^k \equiv_m 1^k = 1. \]
It follows that $a^k$ is also invertible (with inverse $b^k$). We conclude that
\[ \{a^k \mid k \in \mathbb{N}\} \subseteq (\mathbb{Z}/m\mathbb{Z})^\times. \]
Since $(\mathbb{Z}/m\mathbb{Z})^\times$ contains only ϕ(m) elements, we conclude that the set of powers of a modulo m is
finite. Therefore, there must exist distinct positive integers k < ℓ such that
\[ a^k \equiv_m a^\ell. \]
This implies that $a^{\ell - k} \equiv_m 1$ so that ℓ − k ∈ S. We conclude that S is nonempty and then invoke
the least integer principle to obtain a least element k ∈ S. But then k is the least positive power
of a congruent to 1 modulo m. That is, $\mathrm{ord}_m(a) = k$ exists.
Lemma 14. Let m, k, ℓ ∈ N and $a \in (\mathbb{Z}/m\mathbb{Z})^\times$. We have
\[ a^k \equiv_m a^\ell \iff k \equiv_{\mathrm{ord}_m(a)} \ell. \]
In particular, we have $a^k \equiv_m 1 \iff \mathrm{ord}_m(a) \mid k$.
Proof. This follows readily from the division algorithm. Note that the “⇐” direction also holds if
we replace $\mathrm{ord}_m(a)$ by ϕ(m) and this formed the basis for our method of using Euler’s theorem to
reduce large powers of integers prime to m. To prove the result, we proceed as follows. Since we
are dealing with invertible elements, we are free to use negative exponents. We can also assume,
without loss of generality, that k ≤ ℓ. We then have
\[ a^k \equiv_m a^\ell \iff a^{\ell - k} \equiv_m 1. \]
Since we also have
\[ k \equiv_{\mathrm{ord}_m(a)} \ell \iff \ell - k \equiv_{\mathrm{ord}_m(a)} 0, \]
we are reduced to proving, with n = ℓ − k ≥ 0, that
\[ a^n \equiv_m 1 \iff n \equiv_{\mathrm{ord}_m(a)} 0. \]
We do this by way of the division algorithm. Write $n = \mathrm{ord}_m(a)\,q + r$ for (unique) integers q
and r such that $0 \le r < \mathrm{ord}_m(a)$. Suppose first that $a^n \equiv_m 1$. We have the following chain of
implications:
\[
\begin{aligned}
a^n \equiv_m 1 &\implies a^{\mathrm{ord}_m(a) q + r} \equiv_m 1 \\
&\implies \left(a^{\mathrm{ord}_m(a)}\right)^q a^r \equiv_m 1 \\
&\implies 1^q a^r \equiv_m 1 \\
&\implies a^r \equiv_m 1 \\
&\implies r = 0.
\end{aligned}
\]
Here, the last implication follows from the fact that $\mathrm{ord}_m(a)$ is the smallest positive power of
a congruent to 1 modulo m since we know that $0 \le r < \mathrm{ord}_m(a)$. Conversely, suppose that
$n \equiv_{\mathrm{ord}_m(a)} 0$. We then have an integer q such that $n = \mathrm{ord}_m(a)\,q$. It follows that
\[ a^n = a^{\mathrm{ord}_m(a) q} = \left(a^{\mathrm{ord}_m(a)}\right)^q \equiv_m 1^q = 1 \]
as required.
Corollary 3. Let m ∈ N and $a \in (\mathbb{Z}/m\mathbb{Z})^\times$. Then $\mathrm{ord}_m(a) \mid \varphi(m)$.
Proof. This is a simple consequence of combining Euler’s theorem with Lemma 14. Indeed, we
know from Euler’s theorem that $a^{\varphi(m)} \equiv_m 1$ so that we may invoke Lemma 14 to conclude that
$\mathrm{ord}_m(a) \mid \varphi(m)$, as required.
We now know that modulo m every invertible congruence class has order dividing ϕ(m). It
follows that the maximum possible order for an element of $(\mathbb{Z}/m\mathbb{Z})^\times$ is ϕ(m). This leads us to the
definition of primitive roots.
Definition 13. Let m ∈ N. If there exists $g \in (\mathbb{Z}/m\mathbb{Z})^\times$ of order ϕ(m), then m is said to have a
primitive root. Any such g is called a primitive root modulo m.
Remark 5. There is another way to define primitive roots that is worth mentioning. Given
an invertible congruence class g modulo m, denote by $\langle g \rangle$ the set of powers of g modulo m. That is
\[ \langle g \rangle \equiv_m \{1, g, g^2, \dots, g^{\varphi(m) - 1}\} \subseteq (\mathbb{Z}/m\mathbb{Z})^\times. \tag{11.1} \]
Note that since $g^{\varphi(m)} \equiv_m 1$, this set consists of all of the integral powers of g modulo m. The
primitive roots modulo m are precisely those $g \in (\mathbb{Z}/m\mathbb{Z})^\times$ for which we have equality rather than
simply containment in (11.1). Indeed, since for primitive roots g modulo m, $g^{\varphi(m)}$ is the first positive power
of g congruent to 1 modulo m, we see that the elements of $\langle g \rangle = \{1, g, g^2, \dots, g^{\varphi(m) - 1}\}$ are distinct
modulo m. This set is therefore a subset of $(\mathbb{Z}/m\mathbb{Z})^\times$ having the same number of elements as
$(\mathbb{Z}/m\mathbb{Z})^\times$. It must therefore be equal to the whole of $(\mathbb{Z}/m\mathbb{Z})^\times$. Similarly, for any $a \in (\mathbb{Z}/m\mathbb{Z})^\times$,
the order of a modulo m is equal to $\#\langle a \rangle$. We therefore always have containment in
(11.1) and equality in case of primitive roots.
We now turn to the determination of the moduli possessing primitive roots. The interest in this
classification is that if m possesses a primitive root g, then we can generate all of the invertible
elements modulo m by taking powers of g. We start by showing that every prime possesses a
primitive root.
Lemma 15. Let m ∈ N and $a \in (\mathbb{Z}/m\mathbb{Z})^\times$ have order t modulo m. For any k ∈ Z we have
\[ \mathrm{ord}_m(a^k) = \frac{t}{(t, k)}. \]
In particular, $a^k$ and a have the same order modulo m if and only if (t, k) = 1.
Proof. First of all, since
\[ \left(a^k\right)^{\frac{t}{(t,k)}} = \left(a^t\right)^{\frac{k}{(t,k)}} \equiv_m 1^{\frac{k}{(t,k)}} = 1, \]
we see that
\[ \mathrm{ord}_m(a^k) \;\Big|\; \frac{t}{(t, k)}. \tag{11.2} \]
On the other hand, we have $(a^k)^{\mathrm{ord}_m(a^k)} \equiv_m 1$ so that
\[ a^{k\,\mathrm{ord}_m(a^k)} \equiv_m 1. \]
It follows that $t \mid k\,\mathrm{ord}_m(a^k)$ since t is the order of a modulo m. Dividing by (t, k) yields
\[ \frac{t}{(t, k)} \;\Big|\; \frac{k}{(t, k)}\,\mathrm{ord}_m(a^k). \]
Finally, since $\frac{t}{(t,k)}$ and $\frac{k}{(t,k)}$ are relatively prime, we can conclude that
\[ \frac{t}{(t, k)} \;\Big|\; \mathrm{ord}_m(a^k). \tag{11.3} \]
From (11.2) and (11.3) together with the fact that we are dealing with positive quantities, we can
conclude that
\[ \mathrm{ord}_m(a^k) = \frac{t}{(t, k)} \]
as required.
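Lemma 15 lends itself to a direct numerical check. The sketch below computes orders by brute force; the function name `order` and the choice of modulus 17 are illustrative assumptions, not part of the notes.

```python
from math import gcd

def order(a: int, m: int) -> int:
    """Least k >= 1 with a**k ≡ 1 (mod m); requires (a, m) = 1."""
    if gcd(a, m) != 1:
        raise ValueError("a must be invertible modulo m")
    k, power = 1, a % m
    while power != 1 % m:
        power = power * a % m
        k += 1
    return k

m = 17
for a in range(1, m):
    t = order(a, m)
    for k in range(0, 40):
        # Lemma 15: the order of a^k is t / (t, k).
        assert order(pow(a, k, m), m) == t // gcd(t, k)

print(order(3, 17))                 # 3 has order 16 = phi(17), so it is a primitive root
print(order(pow(3, 4, 17), 17))     # order of 3^4 is 16 / (16, 4) = 4
```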
Lemma 16. Let f(x) be a monic (lead coefficient equal to one) polynomial with integer coefficients
of degree n and p be a prime. Then $f(x) \equiv_p 0$ has at most n solutions.
Proof. Let $f(x) = x^n + \sum_{j=0}^{n-1} c_j x^j$. We start by showing that for a ∈ Z/pZ, $f(a) \equiv_p 0$ if and only
if x − a is a factor of f modulo p. It is clear that x − a being a factor of f implies that $f(a) \equiv_p 0$.
Conversely, suppose that $f(a) \equiv_p 0$. Then
\[ f(x) \equiv_p f(x) - f(a) \equiv_p (x^n - a^n) + \sum_{j=1}^{n-1} c_j (x^j - a^j) \equiv_p (x - a) g(x) \]
for some polynomial g of degree n − 1. This is due to the fact that x − a is a factor of $x^\ell - a^\ell$ for
all ℓ ≥ 1. The result now follows from a simple induction on n. Indeed, if n = 1 then $f(x) = x + c_0$
has a single root, and, if we assume for a given n > 1 that all polynomials such as f of degree at
least 1 and less than n do not have more roots than their degree, and f has degree n, then either
$f(x) \equiv_p 0$ has no solutions or it has a solution a which implies that we can write $f(x) \equiv_p (x - a)g(x)$
for some monic polynomial g of degree n − 1. Since 1 ≤ n − 1 < n we can then invoke the inductive
hypothesis to obtain that $g(x) \equiv_p 0$ has at most n − 1 solutions. Since p is prime, any solution of $f(x) \equiv_p 0$ other than a must be a solution of $g(x) \equiv_p 0$. It follows that $f(x) \equiv_p 0$ has at
most n solutions, as required.
Lemma 17. Let p be prime and d be a positive divisor of p − 1. Then $x^d \equiv_p 1$ has precisely d
solutions modulo p.
Proof. Let r denote the number of solutions to $x^d \equiv_p 1$. By Lemma 16, we know that there are at
most d solutions to $x^d \equiv_p 1$. That is
\[ r \le d. \tag{11.4} \]
On the other hand, since d | p − 1 we can write p − 1 = de for some e ∈ N and obtain the factorization
\[ x^{p-1} - 1 = x^{de} - 1 = (x^d - 1) \sum_{j=0}^{e-1} x^{dj}. \]
By Fermat’s little theorem, there are precisely p − 1 solutions to $x^{p-1} \equiv_p 1$, and by invoking Lemma
16, we see that $\sum_{j=0}^{e-1} x^{dj} \equiv_p 0$ has at most d(e − 1) = p − 1 − d solutions. It follows that the number
of solutions to $x^d \equiv_p 1$ is at least (p − 1) − (p − 1 − d) = d. That is
\[ r \ge d. \tag{11.5} \]
Putting (11.4) and (11.5) together yields r = d, as required.
Theorem 22. Let p be a prime and d be a positive divisor of p − 1. Then there are precisely ϕ(d)
elements of $(\mathbb{Z}/p\mathbb{Z})^\times$ of order d. In particular, there are ϕ(p − 1) primitive roots modulo p.
Proof. Let p be prime and consider the partition of $(\mathbb{Z}/p\mathbb{Z})^\times$ associated to the equivalence relation
defined by identifying elements having the same order modulo p. We then have
\[ (\mathbb{Z}/p\mathbb{Z})^\times = \dot{\bigcup_{d \mid p-1}} \{a \in (\mathbb{Z}/p\mathbb{Z})^\times \mid \mathrm{ord}_p(a) = d\}. \]
Now, for positive divisors d of p − 1, let ψ(d) denote the number of elements in $(\mathbb{Z}/p\mathbb{Z})^\times$ that have
order d modulo p. We then have
\[ p - 1 = \#(\mathbb{Z}/p\mathbb{Z})^\times = \sum_{d \mid p-1} \psi(d). \]
On the other hand, from Theorem 21 we also have
\[ p - 1 = \sum_{d \mid p-1} \varphi(d). \]
We conclude that
\[ \sum_{d \mid p-1} \psi(d) = \sum_{d \mid p-1} \varphi(d). \tag{11.6} \]
If we can show that ψ(d) ≤ ϕ(d) for all d | p − 1, we would then be able to conclude from (11.6)
that ψ(d) = ϕ(d) for all d | p − 1, as required. Suppose then that d is a positive divisor of p − 1. If
ψ(d) = 0 then ψ(d) < ϕ(d). On the other hand, if ψ(d) ≥ 1, then there exists an element a of order
d modulo p. But then, the d integers $1, a, a^2, \dots, a^{d-1}$ are distinct modulo p (lest $a^k \equiv_p 1$ for
some 0 < k < d) and are roots of $x^d \equiv_p 1$. Since this congruence has only d solutions, we conclude that
these powers of a are all of the solutions. But any element of order d must be a root of $x^d \equiv_p 1$
and therefore equal to one of $1, a, \dots, a^{d-1}$. But we know how to pick out the powers of a that
have the same order modulo p as a: they are the ones having exponent prime to d. Since there are
ϕ(d) of these, we conclude that ψ(d) = ϕ(d). In any case, we have shown that ψ(d) ≤ ϕ(d) for all
positive divisors d of p − 1, as required.
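Lemma 17 and Theorem 22 can both be confirmed by exhaustive computation for small primes. The following is a sketch only: the helper names and the list of primes tested are arbitrary, and orders are found by brute force.

```python
from collections import Counter
from math import gcd

def phi(n: int) -> int:
    """Euler's function by direct counting."""
    return sum(1 for m in range(1, n + 1) if gcd(m, n) == 1)

def order(a: int, p: int) -> int:
    """Order of a modulo the prime p, for a not divisible by p."""
    k, power = 1, a % p
    while power != 1:
        power = power * a % p
        k += 1
    return k

def check_prime(p: int) -> bool:
    """True when, for every d | p - 1, x^d ≡ 1 has d solutions (Lemma 17)
    and exactly phi(d) elements have order d (Theorem 22)."""
    counts = Counter(order(a, p) for a in range(1, p))
    for d in range(1, p):
        if (p - 1) % d == 0:
            solutions = sum(1 for x in range(1, p) if pow(x, d, p) == 1)
            if solutions != d or counts[d] != phi(d):
                return False
    return True

assert all(check_prime(p) for p in (2, 3, 5, 7, 11, 13, 17, 31))

# Number of elements of each order modulo 17, as (order, count) pairs.
print(sorted(Counter(order(a, 17) for a in range(1, 17)).items()))
```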
At this point, having proved the existence of primitive roots modulo primes, it is natural to
wonder if other moduli possess primitive roots. The answer is yes, and a complete classification of
such moduli is given by the following theorem.
Theorem 23. Let m ∈ N. Then m possesses a primitive root if and only if m = 1, 2, 4, $p^k$, or
$2p^k$ where p is an odd prime and k ∈ N. In any case, there are ϕ(ϕ(m)) primitive roots when one
exists.
We now look at an example that illustrates the utility of the results of this section.
Example 16. Partition $(\mathbb{Z}/17\mathbb{Z})^\times$ into equivalence classes determined by the identification of
elements of the same order.
Solution. We could proceed simply by raising each of the integers from 1 to 16 to successively
higher powers until we obtain 1 modulo 17 in order to classify the elements of $(\mathbb{Z}/17\mathbb{Z})^\times$ according
to their orders, but in order to get practice with the results of this section, we will go about matters
differently. Since $\varphi(17) = 16 = 2^4$, and Corollary 3 implies that the only possible orders for
elements of $(\mathbb{Z}/17\mathbb{Z})^\times$ are divisors of $\varphi(17)$, we see that the only possible orders for elements of
$(\mathbb{Z}/17\mathbb{Z})^\times$ are 1, 2, 4, 8, 16. At this point we could compute $x^a \pmod{17}$ for all $x \in (\mathbb{Z}/17\mathbb{Z})^\times$ and
$a \in \{1, 2, 4, 8, 16\}$ by using increasing values for $a$ until we first obtain 1 modulo 17 to determine the
orders of the elements in $(\mathbb{Z}/17\mathbb{Z})^\times$. This would reduce the workload a little since we have restricted
the exponents that we need to test, but we will continue examining how to apply the results of this
section. We know, by Theorem 22, that for each divisor $d$ of 16 there are precisely $\varphi(d)$ elements
of $(\mathbb{Z}/17\mathbb{Z})^\times$ of order $d$. The elements of $(\mathbb{Z}/17\mathbb{Z})^\times$ are therefore split up as follows:
Order $d$    Number of elements of $(\mathbb{Z}/17\mathbb{Z})^\times$ of order $d$
1            $\varphi(1) = 1$
2            $\varphi(2) = 1$
4            $\varphi(4) = 2$
8            $\varphi(8) = 4$
16           $\varphi(16) = 8$
We also know from Lemma 15 how to determine the orders of powers of an element once we know
the order of the element itself. In particular, once a primitive root is found, we can apply Lemma
15 to immediately identify all elements of $(\mathbb{Z}/17\mathbb{Z})^\times$ of order $d$ for all $d \mid 16$. In searching for a
primitive root, we need only find an element of $(\mathbb{Z}/17\mathbb{Z})^\times$ whose eighth power is not congruent to
1 modulo 17. This is due to Lemma 14, which tells us that the order of any element $a$ divides every
exponent $n$ for which $a^n \equiv 1 \pmod{17}$. Now, we compute
\[
3^8 = (3^4)^2 = (81)^2 \equiv (-4)^2 = 16 \not\equiv 1 \pmod{17}.
\]
We conclude that 3 is a primitive root modulo 17. We then invoke Lemma 15 to conclude that for
$1 \le a \le 16$, the order of $3^a$ modulo 17 is equal to $\frac{16}{\gcd(a, 16)}$. The characterization is then given by:
Order $d$    Elements of $(\mathbb{Z}/17\mathbb{Z})^\times$ of order $d$
1            $3^{16}$
2            $3^{8}$
4            $3^{4}, 3^{12}$
8            $3^{2}, 3^{6}, 3^{10}, 3^{14}$
16           $3^{1}, 3^{3}, 3^{5}, 3^{7}, 3^{9}, 3^{11}, 3^{13}, 3^{15}$
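Reducing these powers of 3 modulo 17 yields the partition
\[
(\mathbb{Z}/17\mathbb{Z})^\times = \{1\} \mathbin{\dot\cup} \{16\} \mathbin{\dot\cup} \{4, 13\} \mathbin{\dot\cup} \{2, 8, 9, 15\} \mathbin{\dot\cup} \{3, 5, 6, 7, 10, 11, 12, 14\},
\]
where we have written the sets in increasing order of the order of their elements.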
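The partition in Example 16 can be double-checked by brute force. The following is a minimal Python sketch; the function names `multiplicative_order` and `partition_by_order` are illustrative choices and do not come from the notes. It computes each order directly rather than through Lemma 15, so it serves only as an independent check.

```python
from collections import defaultdict
from math import gcd


def multiplicative_order(a, m):
    """Smallest k >= 1 with a**k congruent to 1 mod m (assumes m >= 2 and gcd(a, m) == 1)."""
    k, power = 1, a % m
    while power != 1:
        power = (power * a) % m
        k += 1
    return k


def partition_by_order(m):
    """Group the invertible residues modulo m by their multiplicative order."""
    classes = defaultdict(list)
    for a in range(1, m):
        if gcd(a, m) == 1:
            classes[multiplicative_order(a, m)].append(a)
    return dict(classes)


classes = partition_by_order(17)
for d in sorted(classes):
    # Each class should contain phi(d) elements, as Theorem 22 predicts.
    print(d, classes[d])
```

The class sizes printed for $d = 1, 2, 4, 8, 16$ can be compared against the values $\varphi(d)$ in the first table of the example.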
Recall that in the proof of Fermat’s Little Theorem, we came across the product
\[
\prod_{a \in (\mathbb{Z}/p\mathbb{Z})^\times} a \equiv_p \prod_{j=1}^{p-1} j = (p-1)!
\]
and that we cancelled this factor from both sides of a particular congruence to obtain our desired
result. We subsequently proved Wilson’s Theorem, thereby determining the value of this product
modulo $p$. That is, we proved that
\[
\prod_{a \in (\mathbb{Z}/p\mathbb{Z})^\times} a \equiv_p -1.
\]
In proving Euler’s generalization of Fermat’s Little Theorem, we came across the analogous product
\[
\prod_{a \in (\mathbb{Z}/m\mathbb{Z})^\times} a \equiv_m \prod_{\substack{1 \le j \le m \\ (j, m) = 1}} j
\]
and cancelled this factor from both sides of a particular congruence to obtain our desired result.
The question arises as to the value of this product modulo m. One can show by combining Theorem
23 with The Chinese Remainder Theorem and a “singular” version of Hensel’s Lemma that this
product is always congruent to 1 or −1 modulo m and that we obtain −1 precisely when m possesses
primitive roots. Here we will content ourselves with the partial answer provided by the following
proposition.
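Proposition 12. Let $m \in \mathbb{N}$ possess primitive roots. Then
(a) We have $x^2 \equiv_m 1$ if and only if $x \equiv_m \pm 1$.
(b) If $m \ge 3$ then $-1$ is the unique element of $(\mathbb{Z}/m\mathbb{Z})^\times$ of order 2.
(c) We have $\prod_{a \in (\mathbb{Z}/m\mathbb{Z})^\times} a \equiv_m -1$.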
Proof. Suppose that $m$ possesses a primitive root $g$ so that $(\mathbb{Z}/m\mathbb{Z})^\times = \langle g \rangle = \{1, g, g^2, \dots, g^{\varphi(m)-1}\}$.
(a), (b) To prove (a) and (b) it is enough to verify (b). To this end, we note that the elements of order
2 are the powers $g^a$ ($0 \le a < \varphi(m)$) for which $(a, \varphi(m)) = \frac{\varphi(m)}{2}$. We would then require $a$
to be an odd multiple of $\varphi(m)/2$ lying between 0 and $\varphi(m) - 1$. The only possibility is given
by $a = \varphi(m)/2$. We conclude that there is precisely one element of $(\mathbb{Z}/m\mathbb{Z})^\times$ of order two,
namely $g^{\varphi(m)/2}$. Since $-1$ is clearly of order two, we must have $g^{\varphi(m)/2} \equiv_m -1$.
(c) Using the same argument as was used to prove Wilson’s Theorem, one can show that for any
$m$
\[
\prod_{a \in (\mathbb{Z}/m\mathbb{Z})^\times} a \equiv_m \prod_{\substack{a \in (\mathbb{Z}/m\mathbb{Z})^\times \\ \operatorname{ord}_m(a) = 2}} a.
\]
Indeed, we can pair off each of the invertible elements modulo $m$ with its inverse to obtain a
product of 1 as long as the element in question is not its own inverse. In our particular case,
there is only one element of order two, namely $-1$, and so this product is congruent to $-1$
modulo $m$, as required.
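As a numerical illustration of part (c), and of the stronger statement quoted before Proposition 12 that the product is $-1$ exactly when $m$ has a primitive root, here is a small Python sketch. The helper names (`units`, `unit_product`, `has_primitive_root`) are hypothetical, and the primitive-root test is a naive search for an element of order $\varphi(m)$, not the classification of Theorem 23.

```python
from math import gcd


def units(m):
    """Invertible residues modulo m, listed as integers in 1..m-1 (assumes m >= 2)."""
    return [a for a in range(1, m) if gcd(a, m) == 1]


def order(a, m):
    """Multiplicative order of a modulo m (assumes m >= 2 and gcd(a, m) == 1)."""
    k, power = 1, a % m
    while power != 1:
        power = (power * a) % m
        k += 1
    return k


def unit_product(m):
    """Product of all invertible residues modulo m, reduced modulo m."""
    prod = 1
    for a in units(m):
        prod = (prod * a) % m
    return prod


def has_primitive_root(m):
    """Brute-force test: does some unit have order phi(m)?"""
    phi = len(units(m))
    return any(order(g, m) == phi for g in units(m))


for m in range(3, 21):
    # m - 1 represents -1 modulo m.
    print(m, unit_product(m), has_primitive_root(m))
```

For $m \ge 3$ the second column should read $m - 1$ on exactly the rows where the third column is `True`.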
Chapter 12
Quadratic Congruences
This chapter is based on [Dud08, §11].
In this section we study quadratic congruences modulo odd primes $p$. That is, we study the
solutions to congruences of the form $f(x) \equiv_p 0$ where $p$ is an odd prime and $f$ is a polynomial of
degree two that has integer coefficients. We first reduce our study to the study of congruences of
the form $x^2 \equiv_p a$. Write
\[
f(x) = ax^2 + bx + c \qquad (a, b, c \in \mathbb{Z}).
\]
If $(a, p) \ne 1$, then the congruence $f(x) \equiv_p 0$ reduces to the linear congruence $bx + c \equiv_p 0$ which
we already studied in some depth. We can therefore assume that $(a, p) = 1$ so that $a \in (\mathbb{Z}/p\mathbb{Z})^\times$.
Further, by multiplying by the inverse of $a$ modulo $p$ if necessary, we may suppose that $f$ is monic
(has leading coefficient 1). We have therefore arrived at the study of congruences of the form
\[
x^2 + bx + c \equiv_p 0. \tag{12.1}
\]
The next simplification comes from completing the square in (12.1). If $b$ is odd, we may replace $b$
by $b + p$ which is even and congruent to $b$ modulo $p$. Therefore, we may suppose that $b$ is even so
that $b = 2d$ for some integer $d$. But then, we can rewrite (12.1) as
\[
x^2 + 2dx + c = (x + d)^2 + (c - d^2) \equiv_p 0.
\]
This completes the reduction since being able to solve $x^2 \equiv_p d^2 - c$ is equivalent to being able
to solve $(x + d)^2 + (c - d^2) \equiv_p 0$: the solutions of one are simply translates of the solutions of
the other. We illustrate what has been done so far with an example.
Example 17. Find all solutions to $3x^2 + 4x + 2 \equiv_{11} 0$.
Solution. We start by multiplying by 4 which is the inverse of 3 modulo 11. This gives the congruence
\[
12x^2 + 16x + 8 \equiv_{11} 0 \iff x^2 + 5x + 8 \equiv_{11} 0.
\]
The next step is to prepare for completing the square by replacing 5 with $5 + 11 = 16$ so that the
coefficient of $x$ is even. This gives
\[
x^2 + 16x + 8 \equiv_{11} 0.
\]
We now complete the square to obtain
\[
(x + 8)^2 + (8 - 64) \equiv_{11} 0 \iff (x + 8)^2 \equiv_{11} 1.
\]
We have therefore simplified our congruence to one of the form $y^2 \equiv_{11} 1$. Since this congruence
has solutions $y \equiv_{11} 1$ and $y \equiv_{11} -1$, we obtain two solutions to our congruence determined by
\[
x + 8 \equiv_{11} 1, \qquad x + 8 \equiv_{11} -1.
\]
The two solutions to our congruence $3x^2 + 4x + 2 \equiv_{11} 0$ are then $x \equiv_{11} -7 \equiv_{11} 4$ and $x \equiv_{11} -9 \equiv_{11} 2$.
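The reduction used in Example 17 translates directly into a short routine. The sketch below is one possible Python rendering, assuming $p$ is an odd prime not dividing $a$; the function name `solve_quadratic_mod_p` is illustrative, and the final square roots are found by exhaustive search rather than by any method from the notes. It relies on `pow(a, -1, p)` for modular inverses, which requires Python 3.8 or later.

```python
def solve_quadratic_mod_p(a, b, c, p):
    """Solve a*x^2 + b*x + c congruent to 0 mod p (p an odd prime, p not dividing a)."""
    a_inv = pow(a, -1, p)       # inverse of the leading coefficient
    b1 = (b * a_inv) % p        # monic form: x^2 + b1*x + c1
    c1 = (c * a_inv) % p
    if b1 % 2 == 1:
        b1 += p                 # make the linear coefficient even
    d = b1 // 2                 # now x^2 + 2*d*x + c1 = (x + d)^2 + (c1 - d^2)
    target = (d * d - c1) % p   # solve y^2 congruent to d^2 - c1 with y = x + d
    ys = [y for y in range(p) if (y * y) % p == target]
    return sorted((y - d) % p for y in ys)


print(solve_quadratic_mod_p(3, 4, 2, 11))  # the congruence of Example 17
```

The exhaustive search over $y$ is only sensible for small $p$; it is used here to keep the sketch short.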
We now turn to studying the congruence $x^2 \equiv_p a$ for an odd prime $p$ and arbitrary integer $a$.
We first note that there are at most two solutions since we are dealing with a monic quadratic
polynomial modulo a prime. We can say more, however, as is shown by the following proposition.
Proposition 13. Let $p$ be an odd prime and $a \in \mathbb{Z}$. The congruence
\[
x^2 \equiv_p a
\]
has the unique solution $x \equiv_p 0$ in case $p \mid a$ and has either zero or two solutions otherwise.
Proof. It is clear that if $p \mid a$ then we obtain the unique solution $x \equiv_p 0$. On the other hand, if $p \nmid a$
and $b^2 \equiv_p a$ for some $b$, we also have $(-b)^2 \equiv_p a$ and $b \not\equiv_p -b$. Here we have used the fact that $p$ is
odd and $p \nmid b$. We therefore obtain two distinct solutions modulo $p$ if one exists at all.
The congruence $x^2 \equiv_p a$ has a solution for exactly half of the elements $a \in (\mathbb{Z}/p\mathbb{Z})^\times$. In fact, we
can distinguish the squares from the non-squares by use of Euler’s criterion. Before stating this,
we require the following definition.
Definition 14. Let $m \in \mathbb{N}$ and $a \in \mathbb{Z}$. If $x^2 \equiv_m a$ has a solution then we call $a$ a quadratic residue
modulo $m$. Otherwise, $a$ is referred to as a quadratic non-residue modulo $m$.
We are now ready to state Euler’s criterion.
Theorem 24. Let $p$ be an odd prime.
(a) Exactly half of the invertible elements modulo $p$ are quadratic residues.
(b) For all $a \in (\mathbb{Z}/p\mathbb{Z})^\times$, we have
\[
a^{\frac{p-1}{2}} \equiv_p \pm 1.
\]
(c) (Euler’s criterion) For $a \in (\mathbb{Z}/p\mathbb{Z})^\times$,
\[
a^{\frac{p-1}{2}} \equiv_p 1 \iff a \text{ is a quadratic residue modulo } p.
\]
Proof. Let $g$ be a primitive root modulo $p$ so that
\[
(\mathbb{Z}/p\mathbb{Z})^\times = \{1, g, \dots, g^{p-2}\}.
\]
(a) We show that the powers $1, g^2, g^4, \dots, g^{p-3}$ of $g$ having even exponents are the quadratic
residues modulo $p$ and that the powers $g, g^3, g^5, \dots, g^{p-2}$ of $g$ having odd exponents are the
quadratic non-residues modulo $p$. This implies that precisely half of the invertible elements
modulo $p$ are quadratic residues. Let $a \equiv_p g^k \in (\mathbb{Z}/p\mathbb{Z})^\times$, where $0 \le k < p - 1$. We need to
prove that $a$ is a quadratic residue modulo $p$ if and only if $k$ is even. Suppose first that $a$ is
a quadratic residue. Then, there exists $b \in (\mathbb{Z}/p\mathbb{Z})^\times$ such that $a \equiv_p b^2$. But then, we can write
$b \equiv_p g^\ell$ for some $0 \le \ell < p - 1$ so that
\[
g^k \equiv_p a \equiv_p b^2 \equiv_p (g^\ell)^2 \equiv_p g^{2\ell}.
\]
We conclude that $k \equiv_{p-1} 2\ell$ so that $k \equiv_2 0$, as required. Conversely, if $k = 2\ell$ is even then
\[
a \equiv_p g^k \equiv_p g^{2\ell} \equiv_p (g^\ell)^2
\]
so that $a$ is a quadratic residue, as required.
(b) Let $a \in (\mathbb{Z}/p\mathbb{Z})^\times$. Since
\[
\left(a^{\frac{p-1}{2}}\right)^2 \equiv_p a^{p-1} \equiv_p 1,
\]
we see that $a^{\frac{p-1}{2}}$ satisfies $x^2 \equiv_p 1$. Since the only solutions to this congruence are 1 and $-1$
modulo $p$ we conclude that
\[
a^{\frac{p-1}{2}} \equiv_p \pm 1
\]
as required.
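(c) Let $a \in (\mathbb{Z}/p\mathbb{Z})^\times$, and suppose that $a \equiv_p g^k$ where $0 \le k < p - 1$. From Part (b) we know that
\[
a^{\frac{p-1}{2}} \equiv_p \pm 1.
\]
What we need to prove, taking the proof of Part (a) into consideration, is that
\[
g^{k\left(\frac{p-1}{2}\right)} \equiv_p 1 \iff k \text{ is even}.
\]
We know that $g^{\frac{p-1}{2}} \equiv_p -1$ is the unique element of $(\mathbb{Z}/p\mathbb{Z})^\times$ of order two and that $g^{p-1} \equiv_p 1$
is the unique element of $(\mathbb{Z}/p\mathbb{Z})^\times$ of order one. We are therefore reduced to showing that
\[
\operatorname{ord}_p\left(g^{k\left(\frac{p-1}{2}\right)}\right) =
\begin{cases}
1 & \text{if $k$ is even;} \\
2 & \text{if $k$ is odd.}
\end{cases}
\]
But this follows readily since
\[
\operatorname{ord}_p\left(g^{k\left(\frac{p-1}{2}\right)}\right)
= \frac{p-1}{\left(p-1,\ k\,\frac{p-1}{2}\right)}
= \frac{p-1}{\frac{p-1}{2}\,(2, k)}
= \frac{2}{(2, k)}
= \begin{cases}
1 & \text{if $k$ is even;} \\
2 & \text{if $k$ is odd.}
\end{cases}
\]
We now illustrate what has been done so far with an example.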
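Example 18. Distinguish the quadratic residues modulo 17 from the quadratic non-residues modulo 17 by direct computation of the values modulo 17 taken on by squares. Show that this agrees
with what is obtained by way of Euler’s Criterion.
Solution. The direct computation yields:
\begin{align*}
1^2 &\equiv_{17} 16^2 \equiv_{17} 1 \\
2^2 &\equiv_{17} 15^2 \equiv_{17} 4 \\
3^2 &\equiv_{17} 14^2 \equiv_{17} 9 \\
4^2 &\equiv_{17} 13^2 \equiv_{17} 16 \\
5^2 &\equiv_{17} 12^2 \equiv_{17} 8 \\
6^2 &\equiv_{17} 11^2 \equiv_{17} 2 \\
7^2 &\equiv_{17} 10^2 \equiv_{17} 15 \\
8^2 &\equiv_{17} 9^2 \equiv_{17} 13
\end{align*}
We conclude that the quadratic residues modulo 17 are 1, 2, 4, 8, 9, 13, 15 and 16 and the quadratic
non-residues modulo 17 are 3, 5, 6, 7, 10, 11, 12 and 14. We now turn to Euler’s criterion. In our
case $(p - 1)/2 = 8$ and we compute
\begin{align*}
1^8 &\equiv_{17} 16^8 \equiv_{17} 1 \\
2^8 &\equiv_{17} 15^8 \equiv_{17} 4^4 \equiv_{17} (4^2)^2 \equiv_{17} (16)^2 \equiv_{17} 1 \\
3^8 &\equiv_{17} 14^8 \equiv_{17} 9^4 \equiv_{17} (9^2)^2 \equiv_{17} 13^2 \equiv_{17} 16 \equiv_{17} -1 \\
4^8 &\equiv_{17} 13^8 \equiv_{17} 16^4 \equiv_{17} 1 \\
5^8 &\equiv_{17} 12^8 \equiv_{17} 8^4 \equiv_{17} (8^2)^2 \equiv_{17} 13^2 \equiv_{17} 16 \equiv_{17} -1 \\
6^8 &\equiv_{17} 11^8 \equiv_{17} 2^4 \equiv_{17} 16 \equiv_{17} -1 \\
7^8 &\equiv_{17} 10^8 \equiv_{17} 15^4 \equiv_{17} (15^2)^2 \equiv_{17} 4^2 \equiv_{17} 16 \equiv_{17} -1 \\
8^8 &\equiv_{17} 9^8 \equiv_{17} 13^4 \equiv_{17} (13^2)^2 \equiv_{17} 16^2 \equiv_{17} 1
\end{align*}
We note that this agrees with the answer obtained by direct computation since we obtain 1 for 1, 2,
4, 8, 9, 13, 15 and 16 and $-1$ otherwise.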
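Both halves of Example 18 are easy to reproduce mechanically. The following Python sketch (the function names are illustrative) lists the quadratic residues once by squaring every invertible element and once by Euler’s criterion, so the two lists can be compared.

```python
def residues_by_squaring(p):
    """Quadratic residues modulo an odd prime p, found by squaring 1..p-1."""
    return sorted({(x * x) % p for x in range(1, p)})


def residues_by_euler(p):
    """Quadratic residues modulo an odd prime p, found via a^((p-1)/2) congruent to 1."""
    return [a for a in range(1, p) if pow(a, (p - 1) // 2, p) == 1]


p = 17
print(residues_by_squaring(p))
print(residues_by_euler(p))
```

The three-argument `pow` performs modular exponentiation, so the second function stays fast even for primes far too large for the first.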
We now define the Legendre symbol which provides us with a convenient notation for distinguishing quadratic residues from quadratic non-residues modulo odd primes.
Definition 15 (Legendre symbol). Let $p$ be an odd prime and $a \in \mathbb{Z}$ be relatively prime to $p$. We
define the Legendre symbol $(a/p)$ by
\[
\left(\frac{a}{p}\right) =
\begin{cases}
1 & \text{if $a$ is a quadratic residue modulo $p$;} \\
-1 & \text{if $a$ is a quadratic non-residue modulo $p$.}
\end{cases}
\]
By Euler’s criterion, we know that for $(a, p) = 1$ we have
\[
a^{\frac{p-1}{2}} \equiv_p
\begin{cases}
1 & \text{if $a$ is a quadratic residue modulo $p$;} \\
-1 & \text{if $a$ is a quadratic non-residue modulo $p$.}
\end{cases}
\]
We can therefore re-write Euler’s criterion using the notation of Definition 15 as
\[
a^{\frac{p-1}{2}} \equiv_p \left(\frac{a}{p}\right). \tag{12.2}
\]
The following theorem lists some of the properties of the Legendre symbol.
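Theorem 25. Let $p$ be an odd prime and $a, b \in \mathbb{Z}$ be relatively prime to $p$. Then
(a) if $a \equiv_p b$ then $(a/p) = (b/p)$;
(b) $(a^2/p) = 1$;
(c) $(ab/p) = (a/p)(b/p)$;
(d) $\left(\dfrac{-1}{p}\right) = \begin{cases} 1 & \text{if } p \equiv_4 1; \\ -1 & \text{if } p \equiv_4 3. \end{cases}$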
Proof. Since $p$ is an odd prime, we have $-1 \not\equiv_p 1$. Therefore, in order to verify an equality of
Legendre symbols, it is enough to verify the corresponding congruence modulo $p$. Noting this, each
of (a), (b), (c) and (d) follows readily from re-writing the statement using (12.2). Indeed, the
statements become:
(a) if $a \equiv_p b$ then $a^{\frac{p-1}{2}} \equiv_p b^{\frac{p-1}{2}}$;
(b) $\left(a^2\right)^{\frac{p-1}{2}} \equiv_p 1$;
(c) $(ab)^{\frac{p-1}{2}} \equiv_p a^{\frac{p-1}{2}} b^{\frac{p-1}{2}}$;
(d) $(-1)^{\frac{p-1}{2}} \equiv_p \begin{cases} 1 & \text{if } p \equiv_4 1; \\ -1 & \text{if } p \equiv_4 3. \end{cases}$
Each of these parts is clear except perhaps Part (d). But Part (d) merely expresses the fact that
$\frac{p-1}{2}$ is even if $p \equiv_4 1$ and odd if $p \equiv_4 3$.
Part (d) of this theorem tells us that Z/pZ× contains a square root of −1 if and only if p ≡4 1.
We will see that combining Theorem 25 with the law of quadratic reciprocity allows for efficient
computation of Legendre symbols. The law of quadratic reciprocity relates the Legendre symbols
(p/q) and (q/p) for distinct odd primes p and q. It says that unless both of p and q are congruent
to 3 modulo 4, p is a quadratic residue modulo q if and only if q is a quadratic residue modulo p.
In case p ≡4 q ≡4 3, precisely one of p, q is a quadratic residue modulo the other. The precise
statement of this celebrated theorem is given below.
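Theorem 26 (The Law of Quadratic Reciprocity). Let $p$ and $q$ be distinct odd primes. Then
\[
\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{\frac{(p-1)(q-1)}{4}}.
\]
That is,
\[
\left(\frac{p}{q}\right) = (-1)^{\frac{(p-1)(q-1)}{4}} \left(\frac{q}{p}\right) =
\begin{cases}
-\left(\dfrac{q}{p}\right) & \text{if } p \equiv_4 q \equiv_4 3; \\[1ex]
\left(\dfrac{q}{p}\right) & \text{otherwise.}
\end{cases}
\]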
We will prove this theorem in the next section. We will also prove the supplementary result that
classifies the odd primes p for which 2 is a quadratic residue. The answer is given by the following
theorem.
Theorem 27. Let $p$ be an odd prime. Then
\[
\left(\frac{2}{p}\right) =
\begin{cases}
1 & \text{if } p \equiv_8 \pm 1; \\
-1 & \text{if } p \equiv_8 \pm 3.
\end{cases}
\]
In general, for (a, p) = 1, (a/p) is completely determined by the value of p modulo 4|a|. Theorem
27 illustrates this for the case a = 2. To close this section, we provide some examples that illustrate
the utility of combining Theorems 26 and 27 with Theorem 25 to compute Legendre symbols.
Example 19. Determine whether or not 5335 is a quadratic residue modulo 8209.
Solution. Since 5335 = 5 · 11 · 97 and 8209 is an odd prime not dividing 5335, we can determine
whether or not 5335 is a quadratic residue modulo 8209 by computing the corresponding Legendre
symbol (5335/8209). We note for future reference that 5 ≡4 1, 11 ≡4 3, 97 ≡4 1 and 8209 ≡4 1.
We compute
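\begin{align*}
\left(\frac{5335}{8209}\right)
&= \left(\frac{5 \cdot 11 \cdot 97}{8209}\right) && \text{(Since $5335 = 5 \cdot 11 \cdot 97$)} \\
&= \left(\frac{5}{8209}\right)\left(\frac{11}{8209}\right)\left(\frac{97}{8209}\right) && \text{(By Theorem 25 Part (c))} \\
&= \left(\frac{8209}{5}\right)\left(\frac{8209}{11}\right)\left(\frac{8209}{97}\right) && \text{(By Theorem 26)} \\
&= \left(\frac{4}{5}\right)\left(\frac{3}{11}\right)\left(\frac{61}{97}\right) && \text{(By Theorem 25 Part (a))} \\
&= \left(\frac{2^2}{5}\right)\left(\frac{3}{11}\right)\left(\frac{61}{97}\right) \\
&= (1)\left(\frac{3}{11}\right)\left(\frac{61}{97}\right) && \text{(By Theorem 25 Part (b))} \\
&= \left(\frac{3}{11}\right)\left(\frac{61}{97}\right) \\
&= -\left(\frac{11}{3}\right)\left(\frac{97}{61}\right) && \text{(By Theorem 26)} \\
&= -\left(\frac{2}{3}\right)\left(\frac{36}{61}\right) && \text{(By Theorem 25 Part (a))} \\
&= -(-1)\left(\frac{6^2}{61}\right) && \text{(By Theorem 27 or Theorem 25 Part (d))} \\
&= (1)(1) && \text{(By Theorem 25 Part (b))} \\
&= 1.
\end{align*}
We conclude that 5335 is a quadratic residue modulo 8209. One can verify that in fact $x^2 \equiv_{8209} 5335$
has the solutions $x \equiv_{8209} \pm 1315$.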
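The conclusion of Example 19 can be confirmed numerically. The sketch below evaluates each Legendre symbol through Euler’s criterion (12.2) instead of through quadratic reciprocity, so it is an independent check and not a transcription of the computation above; the function name `legendre` is an illustrative choice.

```python
def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion (p an odd prime, p not dividing a)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


p = 8209
factors = [5, 11, 97]  # 5335 = 5 * 11 * 97

# Multiplicativity (Theorem 25 Part (c)): the symbol of the product is the product of the symbols.
symbols = {f: legendre(f, p) for f in factors}
print(symbols)
print(legendre(5335, p))

# The claimed square roots are x = +-1315.
print(pow(1315, 2, p), pow(p - 1315, 2, p))
```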
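Example 20. Determine the value of $\left(\dfrac{3}{p}\right)$ where $p$ is an odd prime greater than or equal to 5.
Solution. By Theorem 26 we have
\[
\left(\frac{3}{p}\right) = \left(\frac{p}{3}\right)(-1)^{\frac{(3-1)(p-1)}{4}} = \left(\frac{p}{3}\right)(-1)^{\frac{p-1}{2}} = \left(\frac{p}{3}\right)
\begin{cases}
1 & \text{if } p \equiv_4 1; \\
-1 & \text{if } p \equiv_4 -1.
\end{cases}
\]
But also, we have
\[
\left(\frac{p}{3}\right) =
\begin{cases}
1 & \text{if } p \equiv_3 1; \\
-1 & \text{if } p \equiv_3 -1,
\end{cases}
\]
since the only quadratic residue in $(\mathbb{Z}/3\mathbb{Z})^\times$ is 1. Putting these together yields
\begin{align*}
\left(\frac{3}{p}\right)
&= \left(\begin{cases} 1 & \text{if } p \equiv_3 1; \\ -1 & \text{if } p \equiv_3 -1. \end{cases}\right)
   \left(\begin{cases} 1 & \text{if } p \equiv_4 1; \\ -1 & \text{if } p \equiv_4 -1. \end{cases}\right) \\
&= \begin{cases}
1 & \text{if ($p \equiv_3 1$ and $p \equiv_4 1$) or ($p \equiv_3 -1$ and $p \equiv_4 -1$);} \\
-1 & \text{if ($p \equiv_3 1$ and $p \equiv_4 -1$) or ($p \equiv_3 -1$ and $p \equiv_4 1$).}
\end{cases}
\end{align*}
However, an application of the Chinese Remainder Theorem shows that
\begin{align*}
(p \equiv_3 1 \text{ and } p \equiv_4 1) &\iff p \equiv_{12} 1, &
(p \equiv_3 -1 \text{ and } p \equiv_4 -1) &\iff p \equiv_{12} -1, \\
(p \equiv_3 1 \text{ and } p \equiv_4 -1) &\iff p \equiv_{12} -5, &
(p \equiv_3 -1 \text{ and } p \equiv_4 1) &\iff p \equiv_{12} 5.
\end{align*}
We conclude that
\[
\left(\frac{3}{p}\right) =
\begin{cases}
1 & \text{if } p \equiv_{12} \pm 1; \\
-1 & \text{if } p \equiv_{12} \pm 5.
\end{cases}
\]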
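The rule obtained in Example 20 can be tested against Euler’s criterion for small primes. In the Python sketch below, `is_prime`, `legendre` and `predicted_three_over_p` are hypothetical helper names; the primality test is plain trial division, which is adequate for the small range scanned.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def legendre(a, p):
    """Legendre symbol (a/p) via Euler's criterion (p an odd prime, p not dividing a)."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


def predicted_three_over_p(p):
    """Value of (3/p) predicted by Example 20: +1 when p is +-1 mod 12, otherwise -1."""
    return 1 if p % 12 in (1, 11) else -1


# Primes p >= 5 below 500 where the prediction disagrees with Euler's criterion.
mismatches = [p for p in range(5, 500)
              if is_prime(p) and legendre(3, p) != predicted_three_over_p(p)]
print(mismatches)
```

An empty list of mismatches is consistent with the example, although a finite scan is of course no substitute for the proof.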
Example 21. Determine whether or not the congruence
\[
x^2 \equiv_{159} 211
\]
has solutions. If it has solutions, find them all.
Solution. We note that $159 = 3 \cdot 53$ and so in order to apply the law of quadratic reciprocity, we
must answer this question modulo 3 and 53 and then apply the Chinese Remainder Theorem to
complete the solution. We start with $x^2 \equiv_3 211$. We compute
\[
\left(\frac{211}{3}\right) = \left(\frac{1}{3}\right) = 1,
\]
and
\[
\left(\frac{211}{53}\right) = \left(\frac{-1}{53}\right) = 1.
\]
Here we have used the fact that $53 \equiv_4 1$. We conclude that the congruence in question has solutions
modulo both 3 and 53 and therefore has solutions modulo 159. In fact
\[
x^2 \equiv_3 211 \equiv_3 1 \iff x \equiv_3 \pm 1,
\]
and
\[
x^2 \equiv_{53} 211 \equiv_{53} -1 \equiv_{53} 529 = 23^2 \iff x \equiv_{53} \pm 23.
\]
Applying the Chinese Remainder Theorem yields the solutions $x \equiv_{159} \pm 23, \pm 76$.
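The last step of Example 21, gluing the roots modulo 3 and modulo 53 together, can be sketched in Python as follows. The helper `crt_pair` is a hypothetical name for the standard two-modulus Chinese Remainder construction, and the brute-force list at the end is included only as a cross-check. `pow(m1, -1, m2)` requires Python 3.8 or later.

```python
def crt_pair(r1, m1, r2, m2):
    """Solve x = r1 (mod m1), x = r2 (mod m2) for coprime m1, m2; returns x in 0..m1*m2-1."""
    # Write x = r1 + m1 * t and solve m1 * t = r2 - r1 (mod m2) for t.
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return (r1 + m1 * t) % (m1 * m2)


roots_mod_3 = [1, 2]      # x = +-1 (mod 3)
roots_mod_53 = [23, 30]   # x = +-23 (mod 53)

combined = sorted(crt_pair(r3, 3, r53, 53)
                  for r3 in roots_mod_3 for r53 in roots_mod_53)
brute_force = [x for x in range(159) if (x * x - 211) % 159 == 0]

print(combined)
print(brute_force)
```

In least-residue form the four solutions $\pm 23, \pm 76$ appear as 23, 76, 83 and 136.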
Chapter 13
Quadratic Reciprocity
This chapter is based on [Dud08, §12].
In this section we prove Gauss’ law of Quadratic Reciprocity. As of 2013, there are 246 known
proofs of this fundamental result. References to each of these proofs can be found at
http://www.rzuser.uni-heidelberg.de/~hb3/rchrono.html.
We give Gauss’ third proof here, following the exposition given in the textbook. It will be
convenient to set some notation before proceeding to the proof.
Notation 6. Let $p$ be an odd prime. We let $\ell_p$ denote the least residue function defined by
\[
\ell_p(n) = \text{the least residue of $n$ modulo $p$} \qquad (n \in \mathbb{Z}).
\]
Let $a$ be an integer relatively prime to $p$. Throughout this section we use the following notation:
\begin{align*}
L_p(a) &= \{\ell_p(ka) \mid 1 \le k \le (p-1)/2\}, \\
L_p^{>}(a) &= \{x \in L_p(a) \mid x > (p-1)/2\}, \\
L_p^{\le}(a) &= \{x \in L_p(a) \mid x \le (p-1)/2\}, \\
S_p(a) &= \sum_{k=1}^{(p-1)/2} \left\lfloor \frac{ka}{p} \right\rfloor.
\end{align*}
Here, for a real number $x$, $\lfloor x \rfloor$ denotes the floor of $x$, equal to the greatest integer less than or equal
to $x$. We also use the notation $\{x\}$ to denote the fractional part of $x$, equal to $x - \lfloor x \rfloor$. Recall that
the division algorithm for dividing $ka$ by $p$ with remainder can be written as
\[
ka = p \left\lfloor \frac{ka}{p} \right\rfloor + \ell_p(ka).
\]
We then have $\ell_p(ka) = p \left\{ \dfrac{ka}{p} \right\}$.
The first result of this section tells us that when we multiply 1, . . . , (p − 1)/2 by a, the numbers
between 1 and (p − 1)/2 that do not occur as a least residue of one of the multiples in question are
covered by subtracting from p the least residues of the multiples that are greater than (p − 1)/2.
Lemma 18. Let $p$ be an odd prime and $a$ be an integer relatively prime to $p$. Then
\[
\{1, 2, \dots, (p-1)/2\} = L_p^{\le}(a) \mathbin{\dot\cup} (p - L_p^{>}(a)),
\]
where $p - L_p^{>}(a)$ denotes the set of all $p - x$ for $x \in L_p^{>}(a)$.
Proof. We need only show that $L_p^{\le}(a) \cap (p - L_p^{>}(a)) = \emptyset$. Indeed, since multiplication by $a$ is
known to permute the invertible elements modulo $p$, we would then be able to conclude that
$L_p^{\le}(a) \mathbin{\dot\cup} (p - L_p^{>}(a))$ is a subset of $\{1, 2, \dots, (p-1)/2\}$ containing $(p-1)/2$ elements. We would
then obtain the equality we are after. Suppose then that for some $1 \le k, \ell \le (p-1)/2$ we have
\[
ka \equiv_p p - \ell a.
\]
We would then have
\[
(k + \ell)a \equiv_p 0
\]
so that, since $p$ is prime, $k + \ell \equiv_p 0$ or $a \equiv_p 0$. Since we know that $a \not\equiv_p 0$, this implies that
$k + \ell \equiv_p 0$. However, $1 < k + \ell < p$ and so this is impossible. We conclude by contradiction that
the union is disjoint, as required.
Now, Lemma 18 provides us with two distinct representations of the same set of invertible
elements modulo p. We take our cue from the proofs of Fermat’s Little Theorem and Euler’s
Theorem and multiply together the elements of the set in question and cancel a particular factor
to obtain a significant congruence. We obtain Gauss’ Lemma as a result.
Theorem 28 (Gauss’ Lemma). Let $p$ be an odd prime and $a$ be an integer relatively prime to $p$.
Then, we have
\[
\left(\frac{a}{p}\right) = (-1)^{\#L_p^{>}(a)}.
\]
That is, $a$ is a quadratic residue modulo $p$ if and only if $\#L_p^{>}(a)$ is even.
Proof. To prove Gauss’ Lemma, we use a familiar trick: we multiply together invertible elements
using two different characterizations of the elements and then cancel a common factor from both
sides of the resulting congruence. In this case, we invoke Lemma 18 to write
\[
\{1, 2, \dots, (p-1)/2\} = L_p^{\le}(a) \mathbin{\dot\cup} (p - L_p^{>}(a)).
\]
Multiplying together the elements of this set yields
\begin{align*}
\left(\frac{p-1}{2}\right)!
&\equiv_p \Biggl(\prod_{\ell_p(ak) \in L_p^{\le}(a)} (ak)\Biggr) \Biggl(\prod_{\ell_p(ak) \in L_p^{>}(a)} (p - ak)\Biggr) \\
&\equiv_p \Biggl(\prod_{\ell_p(ak) \in L_p^{\le}(a)} (ak)\Biggr) \Biggl(\prod_{\ell_p(ak) \in L_p^{>}(a)} (-ak)\Biggr) \\
&= (-1)^{\#L_p^{>}(a)} \prod_{\ell_p(ak) \in L_p(a)} (ak) \\
&= (-1)^{\#L_p^{>}(a)} \prod_{k=1}^{(p-1)/2} (ak) \\
&= (-1)^{\#L_p^{>}(a)} a^{(p-1)/2} \prod_{k=1}^{(p-1)/2} k \\
&= (-1)^{\#L_p^{>}(a)} a^{(p-1)/2} \left(\frac{p-1}{2}\right)!
\end{align*}
Cancelling the (invertible element) $[(p-1)/2]!$ from both sides yields
\[
a^{(p-1)/2} \equiv_p (-1)^{\#L_p^{>}(a)}.
\]
Finally, we can invoke Euler’s criterion to conclude that
\[
\left(\frac{a}{p}\right) \equiv_p a^{(p-1)/2} \equiv_p (-1)^{\#L_p^{>}(a)},
\]
so that, since $p$ is odd,
\[
\left(\frac{a}{p}\right) = (-1)^{\#L_p^{>}(a)}
\]
as required.
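Gauss’ Lemma lends itself to a direct numerical check. The Python sketch below counts $\#L_p^{>}(a)$ straight from Notation 6 and compares the resulting sign with Euler’s criterion; the function names are illustrative and do not appear in the notes.

```python
def gauss_count(a, p):
    """#L_p^>(a): how many k in 1..(p-1)/2 have least residue of k*a above (p-1)/2."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if (k * a) % p > half)


def legendre_by_gauss(a, p):
    """Legendre symbol (a/p) computed as (-1)^(#L_p^>(a))."""
    return -1 if gauss_count(a, p) % 2 else 1


def legendre_by_euler(a, p):
    """Legendre symbol (a/p) computed via a^((p-1)/2) mod p."""
    return 1 if pow(a, (p - 1) // 2, p) == 1 else -1


p = 17
for a in range(1, p):
    print(a, gauss_count(a, p), legendre_by_gauss(a, p), legendre_by_euler(a, p))
```

For $p = 17$ the last two columns should agree on every row, with $+1$ exactly on the quadratic residues found in Example 18.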
We now have all that is required to prove Theorem 27, which determines the value of $(2/p)$:
Proof of Theorem 27. Recall that Theorem 27 determined the value of $(2/p)$ for an odd prime $p$ as
\[
\left(\frac{2}{p}\right) =
\begin{cases}
1 & \text{if } p \equiv_8 \pm 1; \\
-1 & \text{if } p \equiv_8 \pm 3.
\end{cases}
\]
We now prove this claim by invoking Gauss’ Lemma (Theorem 28). We need to determine the
parity of $\#L_p^{>}(2)$. The multiples of 2 in question are
\[
2, 4, 6, \dots, p - 1.
\]
These are already least residues modulo $p$ and so we need only count how many of $2, 4, \dots, p - 1$
are greater than $\frac{p-1}{2}$. It is clear that the multiples in question that satisfy this condition are
\[
\begin{cases}
\frac{p-1}{2} + 2,\ \frac{p-1}{2} + 4,\ \dots,\ p - 1 & \text{if $\frac{p-1}{2}$ is even;} \\[1ex]
\frac{p-1}{2} + 1,\ \frac{p-1}{2} + 3,\ \dots,\ p - 1 & \text{if $\frac{p-1}{2}$ is odd.}
\end{cases}
\]
Therefore, the number of such integers is
\[
\begin{cases}
\frac{p-1}{4} & \text{if } p \equiv_4 1; \\[1ex]
\frac{p+1}{4} & \text{if } p \equiv_4 3.
\end{cases}
\]
This number is even if $p \equiv_8 \pm 1$ and odd if $p \equiv_8 \pm 3$, as required.
The next result we will need to prove Gauss’ law of Quadratic Reciprocity is the following
lemma.
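Lemma 19. Let $p$ be an odd prime and $a$ be an odd integer relatively prime to $p$. Then
\[
S_p(a) \equiv_2 \#L_p^{>}(a).
\]
Proof. We compute
\begin{align*}
pS_p(a) &= p \sum_{k=1}^{(p-1)/2} \left\lfloor \frac{ka}{p} \right\rfloor \\
&= p \sum_{k=1}^{(p-1)/2} \left( \frac{ka}{p} - \left\{ \frac{ka}{p} \right\} \right) \\
&= \sum_{k=1}^{(p-1)/2} (ka) - \sum_{k=1}^{(p-1)/2} \ell_p(ka) \\
&= a \sum_{k=1}^{(p-1)/2} k - \sum_{\ell_p(ka) \in L_p^{\le}(a)} \ell_p(ka) - \sum_{\ell_p(ka) \in L_p^{>}(a)} \ell_p(ka).
\end{align*}
But, since
\[
\{1, 2, \dots, (p-1)/2\} = L_p^{\le}(a) \mathbin{\dot\cup} (p - L_p^{>}(a)),
\]
we have
\begin{align*}
\sum_{\ell_p(ka) \in L_p^{\le}(a)} \ell_p(ka)
&= \sum_{k=1}^{(p-1)/2} k - \sum_{\ell_p(ka) \in L_p^{>}(a)} (p - \ell_p(ka)) \\
&= \Biggl(\sum_{k=1}^{(p-1)/2} k\Biggr) - p \cdot \#L_p^{>}(a) + \sum_{\ell_p(ka) \in L_p^{>}(a)} \ell_p(ka).
\end{align*}
We conclude that
\begin{align*}
pS_p(a) &= a \sum_{k=1}^{(p-1)/2} k - \sum_{\ell_p(ka) \in L_p^{\le}(a)} \ell_p(ka) - \sum_{\ell_p(ka) \in L_p^{>}(a)} \ell_p(ka) \\
&= a \sum_{k=1}^{(p-1)/2} k - \Biggl(\sum_{k=1}^{(p-1)/2} k\Biggr) + p\,\#L_p^{>}(a) - 2 \sum_{\ell_p(ka) \in L_p^{>}(a)} \ell_p(ka) \\
&= p\,\#L_p^{>}(a) + (a - 1) \sum_{k=1}^{(p-1)/2} k - 2 \sum_{\ell_p(ka) \in L_p^{>}(a)} \ell_p(ka).
\end{align*}
Taking this equation modulo 2 and using the facts that $p \equiv_2 1$ and $a \equiv_2 1$ yields
\[
S_p(a) \equiv_2 \#L_p^{>}(a)
\]
as required.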
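Lemma 19 is a parity statement, so it is cheap to test numerically. The following Python sketch uses illustrative names (`floor_sum` for $S_p(a)$ and `gauss_count` for $\#L_p^{>}(a)$) and compares the two parities for a given odd $a$ and odd prime $p$.

```python
def floor_sum(a, p):
    """S_p(a): the sum of floor(k*a/p) for k = 1..(p-1)/2."""
    return sum((k * a) // p for k in range(1, (p - 1) // 2 + 1))


def gauss_count(a, p):
    """#L_p^>(a): how many k in 1..(p-1)/2 have least residue of k*a above (p-1)/2."""
    half = (p - 1) // 2
    return sum(1 for k in range(1, half + 1) if (k * a) % p > half)


def parities_agree(a, p):
    """True when S_p(a) and #L_p^>(a) have the same parity."""
    return floor_sum(a, p) % 2 == gauss_count(a, p) % 2


print(parities_agree(3, 17))  # odd a: Lemma 19 applies
print(parities_agree(2, 5))   # even a: the hypothesis of Lemma 19 fails
```

The second call shows why the hypothesis that $a$ is odd matters: for $a = 2$, $p = 5$ we get $S_5(2) = 0$ but $\#L_5^{>}(2) = 1$.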
The final result needed in our proof of Gauss’ Law of Quadratic Reciprocity is the following
theorem.
Theorem 29. Let $p$ and $q$ be distinct odd primes. Then
\[
S_p(q) + S_q(p) = \frac{(p-1)(q-1)}{4}.
\]
Proof. Consider the line segment given by
\[
y = \frac{q}{p}x, \qquad 0 < x \le \frac{p-1}{2}.
\]
We know that the total number of points $(x, y)$ for $x, y \in \mathbb{Z}$ and $1 \le x \le (p-1)/2$, $1 \le y \le (q-1)/2$
is equal to
\[
\frac{p-1}{2} \cdot \frac{q-1}{2} = \frac{(p-1)(q-1)}{4}.
\]
We will complete the proof by counting these integer points in a different way and obtaining a total
of Sp (q) + Sq (p). We will split the grid of integer points in question into three classes based on
where they lie with respect to the line y = (q/p)x:
(On the line:) We first note that no integer point under consideration can lie on the line.
Indeed, if x and y are integers and y = (q/p)x, then x would be a multiple of p which is
impossible for 1 ≤ x ≤ (p − 1)/2.
(Below the line:) Here we need to count the number of points $(x, y)$ with integer coordinates
having $1 \le x \le (p-1)/2$ and $1 \le y < (q/p)x$. Since we have seen that no integer point
in question lies on the line, we can replace the condition $1 \le y < (q/p)x$ by the condition
$1 \le y \le \lfloor (q/p)x \rfloor$. Moreover, for $x \le (p-1)/2$ we have $(q/p)x < q/2$, so every such point
automatically satisfies $y \le (q-1)/2$. For each value of $x$ there are exactly $\lfloor (q/p)x \rfloor$ such points. We conclude
that the total number of such points is given by
\[
\sum_{x=1}^{(p-1)/2} \left\lfloor \frac{qx}{p} \right\rfloor = S_p(q).
\]
(Above the line:) Here, similarly to counting the points below the line, we see that we need
to count the number of points $(x, y)$ with integer coordinates having $1 \le y \le (q-1)/2$ and
$1 \le x \le \lfloor (p/q)y \rfloor$. The total number is then given by
\[
\sum_{y=1}^{(q-1)/2} \left\lfloor \frac{py}{q} \right\rfloor = S_q(p).
\]
We have therefore shown that the total number of integer points $(x, y)$ with $1 \le x \le (p-1)/2$ and
$1 \le y \le (q-1)/2$ is equal to $S_p(q) + S_q(p)$, as required.
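The lattice-point count in the proof of Theorem 29 can be replayed literally. The Python sketch below (function names are illustrative) walks over the grid $1 \le x \le (p-1)/2$, $1 \le y \le (q-1)/2$, classifies each point relative to the line $y = (q/p)x$ using the integer comparison $py$ versus $qx$, and compares the tallies with the floor sums $S_p(q)$ and $S_q(p)$.

```python
def floor_sum(a, p):
    """S_p(a): the sum of floor(k*a/p) for k = 1..(p-1)/2."""
    return sum((k * a) // p for k in range(1, (p - 1) // 2 + 1))


def count_lattice_points(p, q):
    """Return (below, above, on_line) counts for the grid used in the proof of Theorem 29."""
    below = above = on_line = 0
    for x in range(1, (p - 1) // 2 + 1):
        for y in range(1, (q - 1) // 2 + 1):
            if p * y < q * x:      # y < (q/p) x
                below += 1
            elif p * y > q * x:    # y > (q/p) x
                above += 1
            else:
                on_line += 1
    return below, above, on_line


p, q = 11, 17
below, above, on_line = count_lattice_points(p, q)
print(below, floor_sum(q, p))
print(above, floor_sum(p, q))
print(on_line, below + above, (p - 1) * (q - 1) // 4)
```

Comparing $py$ with $qx$ avoids floating-point division, so the classification is exact.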
We are now prepared to prove Gauss’ law of Quadratic Reciprocity.
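Proof of Gauss’ law of Quadratic Reciprocity (Theorem 26). We have
\begin{align*}
\left(\frac{p}{q}\right)\left(\frac{q}{p}\right)
&= (-1)^{\#L_q^{>}(p)} (-1)^{\#L_p^{>}(q)} && \text{(By Theorem 28)} \\
&= (-1)^{S_q(p)} (-1)^{S_p(q)} && \text{(By Lemma 19)} \\
&= (-1)^{S_q(p) + S_p(q)} \\
&= (-1)^{\frac{(p-1)(q-1)}{4}} && \text{(By Theorem 29)}
\end{align*}
as required.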
We close this section with a couple of results, the first of which gives sufficient conditions for 2
to be a primitive root modulo a prime, and the second of which provides a generalization of Euler’s
criterion relevant in the search for cubic residues.
Proposition 14. Let p and q be primes such that q = 4p + 1. Then 2 is a primitive root modulo q.
Proof. Assume the hypotheses. We know that the order of 2 modulo $q$ must divide $\varphi(q)$ which is
equal to $q - 1$ since $q$ is prime. Since $q - 1 = 4p$, we obtain
\[
\operatorname{ord}_q(2) \in \{1, 2, 4, p, 2p, 4p\}.
\]
In order to prove that 2 is indeed a primitive root modulo $q$, we need to eliminate the first five
possibilities. Also, since any possibility that is eliminated automatically eliminates all of its divisors,
we are reduced to showing that $2^{2p}$ and $2^4$ are not congruent to 1 modulo $q$. Well, we have
\[
2^{2p} = 2^{\frac{q-1}{2}} \equiv_q \left(\frac{2}{q}\right)
\]
by Euler’s criterion. By Theorem 27, we also know that
\[
\left(\frac{2}{q}\right) =
\begin{cases}
1 & \text{if } q \equiv_8 \pm 1; \\
-1 & \text{if } q \equiv_8 \pm 3.
\end{cases}
\]
Since $p \equiv_2 1$ (the case $p = 2$ would give $q = 9$, which is not prime), we see that $4p \equiv_8 4$ so that
$q = 4p + 1 \equiv_8 5 \equiv_8 -3$. We therefore have $(2/q) = -1$ so that
\[
2^{2p} \equiv_q \left(\frac{2}{q}\right) \equiv_q -1.
\]
We have therefore ruled out the cases 1, 2, $p$ and $2p$ from contention for the order of 2 modulo $q$.
We complete the proof by showing that $2^4 \not\equiv_q 1$, thereby forcing $\operatorname{ord}_q(2) = 4p = \varphi(q)$, as required.
But this is straightforward. Indeed,
\[
2^4 \equiv_q 1 \implies q \mid (2^4 - 1) = 15.
\]
In turn, this forces $q \in \{3, 5\}$ which is impossible since $q = 4p + 1 > 5$. All in all, we have shown
that the least power $a$ for which $2^a \equiv_q 1$ is $a = 4p = \varphi(q)$ so that 2 is a primitive root modulo $q$,
as required.
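Proposition 14 can be spot-checked for small primes. The Python sketch below searches for primes $p$ with $q = 4p + 1$ also prime and records whether the order of 2 modulo $q$ equals $q - 1$. All helper names are hypothetical, and the order is computed by repeated multiplication, which is fine at this scale.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def multiplicative_order(a, m):
    """Smallest k >= 1 with a**k congruent to 1 mod m (assumes m >= 2 and gcd(a, m) == 1)."""
    k, power = 1, a % m
    while power != 1:
        power = (power * a) % m
        k += 1
    return k


def check_proposition(limit):
    """For primes p < limit with q = 4p + 1 prime, record (p, q, whether ord_q(2) == q - 1)."""
    results = []
    for p in range(2, limit):
        q = 4 * p + 1
        if is_prime(p) and is_prime(q):
            results.append((p, q, multiplicative_order(2, q) == q - 1))
    return results


for p, q, ok in check_proposition(100):
    print(p, q, ok)
```

Every row should end in `True`; the first pair found is $p = 3$, $q = 13$.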
We have seen that when p ≡2 1, the quotient (p − 1)/2 can be formed and that this quotient,
due to Euler’s criterion, when employed as an exponent allows us to distinguish quadratic residues
modulo p from quadratic nonresidues modulo p. It might be expected that when p ≡3 1 so that the
quotient (p − 1)/3 can be formed that there is a generalization of Euler’s criterion which will allow
us to use this quotient as an exponent to distinguish cubic residues modulo p from cubic nonresidues
modulo p. Here the term cubic residue is used to describe the invertible elements modulo p that
can be written as a cube modulo p. This is in fact the case, and we will close this section by stating
and proving this generalization of Euler’s criterion. It should be noted that there is nothing special
here about 2 or 3. One can reproduce the arguments used in the proof of Euler’s criterion to obtain
a generalization that can be used to distinguish q-th power residues from q-th power nonresidues
for any prime q. We note first that saying a prime is congruent to 1 modulo 3 is the same as saying
that a prime is congruent to 1 modulo 6 since any such prime must be odd.
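Proposition 15. Let $p$ be prime. If $p \not\equiv_6 1$ then every element of $(\mathbb{Z}/p\mathbb{Z})^\times$ is a cubic residue. On
the other hand, if $p \equiv_6 1$ then $a \in (\mathbb{Z}/p\mathbb{Z})^\times$ is a cubic residue if and only if
\[
a^{\frac{p-1}{3}} \equiv_p 1.
\]
Proof. Let $g$ be a primitive root modulo $p$. Then
\[
\operatorname{ord}_p(g^3) = \frac{\operatorname{ord}_p(g)}{(\operatorname{ord}_p(g), 3)} = \frac{p-1}{(p-1, 3)} =
\begin{cases}
p - 1 & \text{if } 3 \nmid (p-1); \\
\frac{p-1}{3} & \text{if } 3 \mid (p-1).
\end{cases}
\]
That is,
\[
\operatorname{ord}_p(g^3) =
\begin{cases}
p - 1 & \text{if } p \not\equiv_3 1; \\
\frac{p-1}{3} & \text{if } p \equiv_3 1
\end{cases}
=
\begin{cases}
p - 1 & \text{if } p \not\equiv_6 1; \\
\frac{p-1}{3} & \text{if } p \equiv_6 1.
\end{cases}
\]
We conclude that for $p \not\equiv_6 1$, the element $g^3$ is a primitive root modulo $p$ so that every invertible
element can be written as a power of $g^3$. Since every power of $g^3$ is a cube, we see that in this case
every invertible element is a cubic residue. Conversely, suppose that $p \equiv_6 1$. In this case, $g^3$ has
order $\frac{p-1}{3}$. We now complete the proof by showing that for $a \in (\mathbb{Z}/p\mathbb{Z})^\times$, $a$ is a cubic residue modulo
$p$ if and only if $a^{(p-1)/3} \equiv_p 1$. Let $a = g^k$ for some $k$. We have
\begin{align*}
a^{(p-1)/3} \equiv_p 1 &\iff \operatorname{ord}_p\left(a^{(p-1)/3}\right) = 1 \\
&\iff \operatorname{ord}_p\left(g^{k(p-1)/3}\right) = 1 \\
&\iff \frac{\operatorname{ord}_p(g)}{\left(\operatorname{ord}_p(g),\ \frac{k(p-1)}{3}\right)} = 1 \\
&\iff \frac{p-1}{\left(p-1,\ \frac{k(p-1)}{3}\right)} = 1 \\
&\iff \frac{p-1}{\frac{p-1}{3}\,(3, k)} = 1 \\
&\iff \frac{3}{(3, k)} = 1 \\
&\iff (3, k) = 3 \\
&\iff k \equiv_3 0.
\end{align*}
We are therefore reduced to proving that $g^k$ is a cubic residue if and only if $k \equiv_3 0$. But this is
clear since $3 \mid p - 1$ and this allows us to construct the following chain of equivalences:
\begin{align*}
g^k \text{ is a cubic residue modulo } p &\iff g^k \equiv_p (g^\ell)^3 \text{ for some } \ell \\
&\iff k \equiv_{p-1} 3\ell \text{ for some } \ell \\
&\iff k \equiv_3 0.
\end{align*}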
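Both cases of Proposition 15 can be observed on small primes. The Python sketch below (with illustrative function names) lists the cubic residues by cubing every invertible element and, when $p \equiv_6 1$, compares that list with the elements satisfying $a^{(p-1)/3} \equiv_p 1$.

```python
def cubic_residues(p):
    """Cubic residues modulo a prime p, found by cubing 1..p-1."""
    return sorted({pow(x, 3, p) for x in range(1, p)})


def cubic_criterion(p):
    """Elements a with a^((p-1)/3) congruent to 1 mod p; meaningful only when p is 1 mod 6."""
    return [a for a in range(1, p) if pow(a, (p - 1) // 3, p) == 1]


for p in (5, 7, 11, 13, 19):
    cubes = cubic_residues(p)
    if p % 6 == 1:
        # Second case of the proposition: the criterion picks out exactly the cubes.
        print(p, cubes, cubic_criterion(p))
    else:
        # First case: every invertible element is a cube.
        print(p, cubes == list(range(1, p)))
```

For $p \equiv_6 1$ the cubes make up exactly one third of the invertible elements, in line with $g^3$ having order $(p-1)/3$.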
Remark 7. Note that we can re-state Euler’s criterion as follows. Let $p$ be prime. If $p \not\equiv_2 1$ then
every element of $(\mathbb{Z}/p\mathbb{Z})^\times$ is a quadratic residue. On the other hand, if $p \equiv_2 1$ then $a \in (\mathbb{Z}/p\mathbb{Z})^\times$ is a
quadratic residue if and only if
\[
a^{\frac{p-1}{2}} \equiv_p 1.
\]
This makes it reasonable to consider Proposition 15 as a generalization of Euler’s criterion.
We have another result related to cubic residues that is analogous to the result that for primes
$p$, $x^2 \equiv_p 1$ has a nontrivial solution if and only if $p$ is odd.
Proposition 16. Let $p$ be a prime. Then
\[
x^3 \equiv_p 1
\]
has nontrivial solutions if and only if $p \equiv_6 1$.
Proof. Suppose first that $x^3 \equiv_p 1$ has a nontrivial solution $a \not\equiv_p 1$. Then $\operatorname{ord}_p(a) = 3$ so that
$3 \mid \varphi(p) = p - 1$. This shows that $p \equiv_3 1$. However, as $p \ne 2$, we see that $p \equiv_2 1$ as well so that
$p \equiv_6 1$, as required. Conversely, suppose that $p \equiv_6 1$. Consider the factorization
\[
x^3 - 1 = (x - 1)(x^2 + x + 1).
\]
We show that $x^3 \equiv_p 1$ has a nontrivial solution by proving that $x^2 + x + 1 \equiv_p 0$ has a solution.
We do this by completing the square and invoking quadratic reciprocity. We have
\[
4x^2 + 4x + 4 = (2x + 1)^2 + 3.
\]
Therefore, we obtain nontrivial solutions if and only if $(2x + 1)^2 \equiv_p -3$ has a solution. That is, we
obtain nontrivial solutions if and only if $(-3/p) = 1$. We now turn to computing the value of this
Legendre symbol.
\begin{align*}
\left(\frac{-3}{p}\right)
&= \left(\frac{-1}{p}\right)\left(\frac{3}{p}\right) \\
&= \left(\begin{cases} 1 & \text{if } p \equiv_4 1 \\ -1 & \text{if } p \equiv_4 -1 \end{cases}\right)
   \left(\begin{cases} 1 & \text{if } p \equiv_{12} \pm 1 \\ -1 & \text{if } p \equiv_{12} \pm 5 \end{cases}\right) \\
&= \begin{cases} 1 & \text{if } p \equiv_{12} 1 \text{ or } 7 \\ -1 & \text{if } p \equiv_{12} 5 \text{ or } 11 \end{cases} \\
&= \begin{cases} 1 & \text{if } p \equiv_6 1 \\ -1 & \text{if } p \equiv_6 -1 \end{cases}
\end{align*}
We conclude that $x^3 \equiv_p 1$ has nontrivial solutions if and only if $p \equiv_6 1$, as required.
Chapter 14
Pythagorean Triangles
This chapter is based on [Dud08, §16].
The goal of this section is to find all integer solutions to Pythagoras’ quadratic diophantine
equation
\[
x^2 + y^2 = z^2.
\]
We first note that if $d = (x, y)$, then $d \mid z$. We could then divide through by $d^2$ to obtain
\[
\left(\frac{x}{d}\right)^2 + \left(\frac{y}{d}\right)^2 = \left(\frac{z}{d}\right)^2,
\]
where $\left(\frac{x}{d}, \frac{y}{d}\right) = 1$. Therefore, we may suppose that $x$ and $y$ are relatively prime. Indeed, if we
can solve the equation in this case, the general case is obtained by simply multiplying our solution
by the relevant greatest common divisor. We have therefore reduced our problem to finding the
integer solutions to
\[
x^2 + y^2 = z^2, \qquad (x, y) = 1. \tag{14.1}
\]
Next, we note that if any prime divides two of $x, y, z$, then it must also divide the third. In
particular, we see that in the case given by (14.1), we also have $(x, z) = (y, z) = 1$. We have
therefore arrived at the study of Pythagorean triples $x, y, z$ (which are solutions to $x^2 + y^2 = z^2$
in integers) that are relatively prime in pairs. Finally, it is clear that the solutions come in pairs
since for any $w$, $(-w)^2 = w^2$. All in all, we can find all Pythagorean triples as long as we can
find all fundamental Pythagorean triples, where the fundamental Pythagorean triples are defined
as follows:
Definition 16. A triple $(a, b, c)$ of integers is called a fundamental Pythagorean triple if $a, b, c$ are
positive, $a^2 + b^2 = c^2$ and $(a, b) = 1$.
As remarked above, for fundamental Pythagorean triples $(a, b, c)$, the condition $(a, b) = 1$ is
equivalent to the condition that $a$, $b$ and $c$ are relatively prime in pairs. Further, if $S$ denotes the
set of all fundamental Pythagorean triples, the set of all solutions to $x^2 + y^2 = z^2$ in positive integers
is given by
\[
\{(da, db, dc) \mid (a, b, c) \in S \text{ and } d \in \mathbb{N}\}.
\]
We turn now to the determination of all fundamental Pythagorean triples. We will use the
results of the following proposition.
Proposition 17. Let (a, b, c) denote a fundamental Pythagorean triple so that a, b, c ∈ N, (a, b) =
(a, c) = (b, c) = 1 and $a^2 + b^2 = c^2$. Then exactly one of a, b is even while the other is odd and c is
odd.
Proof. First of all, since a, b, c are relatively prime in pairs, we see that at most one of a, b, c is
even. On the other hand, a, b and c cannot all be odd since $c^2 = a^2 + b^2$ and so if a and b were
both odd, c would have to be even. We conclude that exactly one of a, b, c is even while the other
two are odd. We complete the proof by showing that c cannot be the one that is even. We do this
by considering our equation modulo 4, and recalling that the even squares are 0 modulo 4 while
the odd squares are 1 modulo 4. If a and b were odd and c were even, we would have
\[ a^2 + b^2 \equiv_4 1 + 1 \equiv_4 2 \]
while
\[ c^2 \equiv_4 0. \]
This contradiction completes the proof that in any fundamental Pythagorean triple (a, b, c), c is
odd and exactly one of a, b is even.
In light of Proposition 17, and due to symmetry, we may, and do, suppose that a is even and
that b and c are odd for the remainder of this section. Here, as well as elsewhere, the concept of
p-adic valuations will prove useful. In order to define this, we first make some preliminary remarks.
Let
\[ \mathbb{Q} = \left\{ \frac{m}{n} \;\middle|\; m \in \mathbb{Z},\ n \in \mathbb{N},\ (m, n) = 1 \right\} \]
denote the set of all rational numbers. The fundamental theorem of arithmetic can then be seen to
apply to the set of nonzero rational numbers as follows.
Theorem 30 (Fundamental Theorem of Arithmetic for Rationals). Let P denote the set of all
primes. Then every nonzero rational number x can be written uniquely in the form
\[ x = \pm \prod_{p \in P} p^{v_p(x)}, \tag{14.2} \]
for integers $v_2(x), v_3(x), v_5(x), \ldots$ of which only finitely many are nonzero.
Proof. We first note that it is sufficient to prove the result for positive rationals x. Indeed, if we
can prove the result for positive values of x, then we’d obtain the result for negative values of x
simply by introducing a minus sign. Suppose then that $x = \frac{m}{n}$ for positive integers m and n such
that (m, n) = 1. By the fundamental theorem of arithmetic (for integers) we know that m and n
have unique representations of the form given by (14.2) using nonnegative exponents. That is, we
have
\[ m = \prod_{p \in P} p^{v_p(m)}, \qquad n = \prod_{p \in P} p^{v_p(n)} \]
for uniquely determined integers $v_2(m), v_3(m), v_5(m), \ldots, v_2(n), v_3(n), v_5(n), \ldots \geq 0$ of which only
finitely many are nonzero. We then obtain
\[ \frac{m}{n} = \frac{\prod_{p \in P} p^{v_p(m)}}{\prod_{p \in P} p^{v_p(n)}} = \prod_{p \in P} p^{v_p(m) - v_p(n)}. \]
Since only finitely many of the differences $v_p(m) - v_p(n)$ are nonzero, we can set $v_p\left(\frac{m}{n}\right) = v_p(m) - v_p(n)$ to see that
\[ \frac{m}{n} = \prod_{p \in P} p^{v_p(m) - v_p(n)} = \prod_{p \in P} p^{v_p\left(\frac{m}{n}\right)} \]
has at least one representation of the form given by (14.2). To prove uniqueness, we proceed as
usual by assuming that we have two potentially different representations of $\frac{m}{n}$ in the form given by
(14.2) and then prove that they are in fact equal. To do this, we use the fact that every rational
number x can be written uniquely in the form $\frac{m}{n}$ for relatively prime integers m and n with n ≥ 1.
To see this, we simply divide out all common factors from the numerator and denominator of x and
arrange for the minus sign, if it is present, to be attached to the numerator of x. Suppose then that
\[ \frac{m}{n} = \prod_{p \in P} p^{v_p\left(\frac{m}{n}\right)} = \prod_{p \in P} p^{v'_p\left(\frac{m}{n}\right)} \]
for integers $v_2\left(\frac{m}{n}\right), v_3\left(\frac{m}{n}\right), v_5\left(\frac{m}{n}\right), \ldots, v'_2\left(\frac{m}{n}\right), v'_3\left(\frac{m}{n}\right), v'_5\left(\frac{m}{n}\right), \ldots$ of which only finitely many
are nonzero. We need to prove that $v_p\left(\frac{m}{n}\right) = v'_p\left(\frac{m}{n}\right)$ for all primes p. We do this as follows. Let
$P_+$ denote the set of primes for which $v_p\left(\frac{m}{n}\right) \geq 0$, $P_-$ denote the set of primes for which $v_p\left(\frac{m}{n}\right) < 0$
and define $P'_+$, $P'_-$ similarly. We then have
\[ \frac{\prod_{p \in P_+} p^{v_p\left(\frac{m}{n}\right)}}{\prod_{p \in P_-} p^{-v_p\left(\frac{m}{n}\right)}} = \frac{m}{n} = \frac{\prod_{p \in P'_+} p^{v'_p\left(\frac{m}{n}\right)}}{\prod_{p \in P'_-} p^{-v'_p\left(\frac{m}{n}\right)}}. \]
From the uniqueness of the representation of rational numbers into quotients of relatively prime
integers, we conclude that
\[ m = \prod_{p \in P_+} p^{v_p\left(\frac{m}{n}\right)} = \prod_{p \in P'_+} p^{v'_p\left(\frac{m}{n}\right)} \]
and
\[ n = \prod_{p \in P_-} p^{-v_p\left(\frac{m}{n}\right)} = \prod_{p \in P'_-} p^{-v'_p\left(\frac{m}{n}\right)}. \]
We now invoke the uniqueness part of the fundamental theorem of arithmetic to conclude that
\[ v_p\left(\frac{m}{n}\right) = v'_p\left(\frac{m}{n}\right) \]
for all primes p, as required.
We now come to the definition of the p-adic valuation on the set Q of rationals. The p-adic
valuation of nonzero rational numbers x will be defined to be the exponent vp (x) that appears in its
prime-power factorization given by (14.2). Since it will be convenient to have the p-adic valuation
defined for all rationals, including zero, we seek a reasonable definition for vp (0). To this end, we
note that for nonzero rationals x, vp (x) is equal to the largest power of p that divides x. Here, this
largest power is the difference of the largest power appearing in the factorization of the numerator
of x and the largest power appearing in the factorization of the denominator of x. It would then
be reasonable to define vp (0) to be “the largest power of p that divides 0.” Since every power of
p divides 0, it seems reasonable to define vp (0) = ∞. We therefore adopt the conventions that the
values of vp lie in Z ∪ {∞} and ∞ is the maximum element of this set. We also adopt the convention
that ∞ + a = a + ∞ = ∞ for all a ∈ Z ∪ {∞}. This discussion leads us to the formal definition of
p-adic valuation.
Definition 17 (p-adic valuation). Let x ∈ Q. We define the p-adic valuation of x, denoted vp (x),
to be equal to ∞ if x = 0 and equal to the power of p appearing in the prime-power factorization
of x given by (14.2) otherwise.
Before proceeding, we first note an alternative way of defining the p-adic valuation of nonzero
rational numbers x. Given a nonzero rational number x, we note that $v_p(x)$ is the unique integer
for which x can be written in the form
\[ x = p^{v_p(x)} \frac{m'}{n'}, \qquad (p \nmid m', n'). \]
This is saying nothing more than the fact that given any nonzero rational number x, we can factor
the largest power of p appearing in the numerator and denominator and be left with a new numerator
and denominator that are relatively prime to p. Since the factorizations of the numerator $m'$ and
denominator $n'$ that are left over will not contain a power of p, it is clear that they will be relatively
prime to p.
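The alternative description above translates directly into a short computation: strip factors of p from the numerator and from the denominator and count them with opposite signs. The sketch below is illustrative only; the function name `vp` and the use of `math.inf` to stand for the symbol ∞ are assumptions made here, not notation from the notes.

```python
from fractions import Fraction
import math

def vp(x, p):
    """p-adic valuation of a rational number x; math.inf plays the role of infinity for x = 0."""
    x = Fraction(x)          # Fraction stores x in lowest terms with a positive denominator
    if x == 0:
        return math.inf
    num, den, v = abs(x.numerator), x.denominator, 0
    while num % p == 0:      # powers of p in the numerator count positively
        num //= p
        v += 1
    while den % p == 0:      # powers of p in the denominator count negatively
        den //= p
        v -= 1
    return v

x = Fraction(12, 25)         # 12/25 = 2^2 * 3 * 5^(-2)
print(vp(x, 2), vp(x, 3), vp(x, 5), vp(x, 7))
```

Since 12/25 = 2^2 · 3 · 5^(-2), the four valuations are 2, 1, −2 and 0.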
Aside 1. Let p be a prime. By defining $|\cdot|_p$ on the set of rational numbers by
\[ |x|_p = p^{-v_p(x)}, \qquad (x \in \mathbb{Q}), \]
we get the p-adic absolute value which satisfies the same fundamental properties as the usual
absolute value. It takes on only nonnegative values, is only zero when the input is zero, is completely
multiplicative, and satisfies (a stronger version of) the triangle inequality. If we add to the set of
rationals all numbers we can obtain as limits of (Cauchy) sequences of rationals with respect to
the usual absolute value we obtain the field R of real numbers. In exactly the same way, if we
add to the set of rationals all numbers we can obtain as limits of (Cauchy) sequences of rationals
with respect to the p-adic absolute value we obtain the field Qp of p-adic numbers. These fields are
fundamental in the study of more advanced topics in number theory. As a particular instance of
this, we will be able to completely characterize the integers that can be written as the sum of two
or four squares by using methods developed in the text. However, in order to classify those integers
that can be written as the sum of three squares, one needs to use the fields Qp mentioned above.
Further remarks in this direction will be made when we study sums of squares.
We now state a proposition that gives the fundamental properties satisfied by the p-adic valuation (as well as every other “non-archimedean valuation”).
Proposition 18. Let p be a prime. The p-adic valuation vp satisfies the following properties for
all x, y ∈ Q:
(a) vp (x) ≤ ∞.
(b) vp (x) = ∞ if and only if x = 0.
(c) vp (xy) = vp (x) + vp (y).
(d) If $y \neq 0$, $v_p\left(\frac{x}{y}\right) = v_p(x) - v_p(y)$.
(e) $v_p(x^a) = a\,v_p(x)$ for all a ∈ Z for which $x^a$ is defined.
(f) $v_p(x + y) \geq \min\{v_p(x), v_p(y)\}$.
(g) If $v_p(x) \neq v_p(y)$ then $v_p(x + y) = \min\{v_p(x), v_p(y)\}$.
Proof. Let p be a prime and x, y ∈ Q.
(a) This is clear from the definition of the p-adic valuation vp .
(b) This is also clear from the definition of the p-adic valuation vp .
(c) First suppose that at least one of x, y is equal to zero. In this case, since ∞ + a = a + ∞ = ∞
for all a ∈ Z ∪ {∞}, we see that both sides of the proposed equality are equal to ∞. Suppose
then that $x \neq 0$ and $y \neq 0$. We can write
\[ x = p^{v_p(x)} \frac{m}{n}, \qquad y = p^{v_p(y)} \frac{m'}{n'} \qquad (p \nmid m, n, m', n'). \]
But then,
\[ xy = p^{v_p(x) + v_p(y)} \frac{mm'}{nn'}, \qquad (p \nmid mm', nn'). \]
We conclude that $v_p(xy) = v_p(x) + v_p(y)$ as was to be shown.
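(d) Suppose that $y \neq 0$ so that $\frac{1}{y} \in \mathbb{Q}$. We can write
\[ y = p^{v_p(y)} \frac{m}{n}, \qquad (p \nmid m, n). \]
But then,
\[ \frac{1}{y} = p^{-v_p(y)} \frac{n}{m}, \qquad (p \nmid n, m), \]
and so
\[ v_p\left(\frac{1}{y}\right) = -v_p(y). \tag{14.3} \]
Combining this with Part (c) gives us our result:
\[ v_p\left(\frac{x}{y}\right) = v_p\left(x \cdot \frac{1}{y}\right) = v_p(x) + v_p\left(\frac{1}{y}\right) = v_p(x) - v_p(y). \]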
(e) This follows from Parts (c) and (d) by a routine induction. Indeed, the cases a ∈ {0, 1} are
clear, and if we assume it holds for a fixed value of a ≥ 1, then we obtain from Part (c) that
\[ v_p\left(x^{a+1}\right) = v_p(x^a \cdot x) = v_p(x^a) + v_p(x) = a\,v_p(x) + v_p(x) = (a + 1)v_p(x). \]
We conclude by induction that our result holds for all integers a ≥ 0. Finally, if a < 0 and
$x^a$ is defined, we must have $x \neq 0$. We then have −a > 0 so that, by what we just proved
together with (14.3),
\[ v_p(x^a) = v_p\left(\frac{1}{x^{-a}}\right) = -v_p\left(x^{-a}\right) = -(-a)v_p(x) = a\,v_p(x). \]
(f), (g) Suppose first that at least one of x, y is equal to zero. Without loss of generality, we can then
suppose that x = 0. We then see that
\[ v_p(x + y) = v_p(0 + y) = v_p(y) = \min\{\infty, v_p(y)\} = \min\{v_p(x), v_p(y)\}. \]
The inequality in question therefore holds in this case. We are therefore reduced to the case
where neither x nor y is equal to zero. In this case, we can write
\[ x = p^{v_p(x)} \frac{m}{n}, \qquad y = p^{v_p(y)} \frac{m'}{n'} \qquad (p \nmid m, n, m', n'). \]
By symmetry, we may suppose, without loss of generality, that $v_p(x) = \min\{v_p(x), v_p(y)\}$.
Let $a = v_p(y) - v_p(x) \geq 0$. We then have
\[ x + y = p^{v_p(x)} \frac{m}{n} + p^{v_p(y)} \frac{m'}{n'} = p^{v_p(x)} \left( \frac{m}{n} + p^a \frac{m'}{n'} \right) = p^{v_p(x)} \frac{(mn' + p^a m'n)}{nn'}. \]
Now, since $p \nmid nn'$, we have
\[ v_p\left( \frac{mn' + p^a m'n}{nn'} \right) \geq 0. \]
Consequently,
\begin{align*}
v_p(x + y) &= v_p\left( p^{v_p(x)} \frac{(mn' + p^a m'n)}{nn'} \right) \\
&= v_p\left( p^{v_p(x)} \right) + v_p\left( \frac{mn' + p^a m'n}{nn'} \right) && \text{(From Part (c))} \\
&\geq v_p\left( p^{v_p(x)} \right) + 0 \tag{14.4} \\
&= v_p(x) + 0 \\
&= v_p(x) \\
&= \min\{v_p(x), v_p(y)\}.
\end{align*}
Finally, we note that when $v_p(x) \neq v_p(y)$, we have a ≥ 1. We conclude that
\[ p \nmid (mn' + p^a m'n), \qquad p \nmid nn' \]
so that
\[ v_p\left( \frac{mn' + p^a m'n}{nn'} \right) = 0. \]
This allows us to replace the inequality ≥ in (14.4) with equality thereby obtaining
\[ v_p(x + y) = \min\{v_p(x), v_p(y)\} \]
as required.
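The properties of Proposition 18 can be spot-checked numerically. The following Python sketch tests Parts (c), (d), (f) and (g) on a small grid of rationals; the helper `vp`, the sample set `SAMPLES` and the function `check_properties` are illustrative names chosen here, and `math.inf` again stands in for ∞. A finite check of this kind is of course no substitute for the proof above.

```python
from fractions import Fraction
from itertools import product
import math

def vp(x, p):
    """p-adic valuation of a rational x, with math.inf for x = 0."""
    x = Fraction(x)
    if x == 0:
        return math.inf
    num, den, v = abs(x.numerator), x.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v

# A small grid of rationals a/b, including zero and negative values.
SAMPLES = [Fraction(a, b) for a in range(-6, 7) for b in range(1, 7)]

def check_properties(p):
    """Return True if Parts (c), (d), (f), (g) hold for every pair drawn from SAMPLES."""
    for x, y in product(SAMPLES, repeat=2):
        vx, vy = vp(x, p), vp(y, p)
        if vp(x * y, p) != vx + vy:                    # Part (c)
            return False
        if y != 0 and vp(x / y, p) != vx - vy:         # Part (d)
            return False
        if vp(x + y, p) < min(vx, vy):                 # Part (f)
            return False
        if vx != vy and vp(x + y, p) != min(vx, vy):   # Part (g)
            return False
    return True

print(all(check_properties(p) for p in (2, 3, 5)))
```

Note that Part (f) can be strict when the valuations agree: with p = 2, x = y = 1 one has v_2(1) = 0 but v_2(1 + 1) = 1.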
We now illustrate the utility of the p-adic valuation by proving a lemma that provides us with
the final piece needed to obtain our classification of fundamental Pythagorean triples.
Lemma 20. Let s and t be relatively prime positive integers and suppose that
\[ st = r^2 \]
for some positive integer r. Then both s and t are squares. That is, there exist positive integers m
and n such that
\[ s = m^2, \qquad t = n^2. \]
Further, we have (m, n) = 1.
Proof. Assume the hypotheses and let p be a prime. Applying the p-adic valuation to both sides
of $st = r^2$ yields
\[ v_p(s) + v_p(t) = 2v_p(r) \equiv_2 0. \tag{14.5} \]
Since (s, t) = 1 we see that at least one of $v_p(s)$, $v_p(t)$ is equal to zero. Therefore, we can conclude
from (14.5) that both of $v_p(s)$ and $v_p(t)$ are even. Every exponent appearing in the factorizations
of s and t is therefore even so that s and t are both squares. Finally, if $s = m^2$ and $t = n^2$, then m
and n must be relatively prime since s and t are relatively prime.
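A brute-force check of Lemma 20 over a small range is easy to write down. The sketch below uses hypothetical helper names (`is_square`, `check_lemma20`) and only confirms the statement for s, t up to a chosen bound; it also records a small example showing that the coprimality hypothesis cannot be dropped.

```python
from math import gcd, isqrt

def is_square(n):
    """True if the nonnegative integer n is a perfect square."""
    return n >= 0 and isqrt(n) ** 2 == n

def check_lemma20(bound):
    """Check Lemma 20 for all pairs 1 <= s, t <= bound."""
    for s in range(1, bound + 1):
        for t in range(1, bound + 1):
            if gcd(s, t) == 1 and is_square(s * t):
                if not (is_square(s) and is_square(t)):
                    return False
                if gcd(isqrt(s), isqrt(t)) != 1:   # the square roots are coprime too
                    return False
    return True

print(check_lemma20(200))
# Without coprimality the conclusion can fail: 2 * 8 = 16 is a square, but 2 is not.
print(is_square(2 * 8), is_square(2))
```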
We now have all that is required to give the complete classification of all fundamental Pythagorean
triples.
Theorem 31. Let (a, b, c) be a triple of integers with a even. Then (a, b, c) is a fundamental
Pythagorean triple if and only if there exist positive integers m > n with (m, n) = 1 and $m \not\equiv_2 n$
such that
\begin{align}
a &= 2mn, \tag{14.6} \\
b &= m^2 - n^2, \tag{14.7} \\
c &= m^2 + n^2. \tag{14.8}
\end{align}
Proof. First suppose that we have positive integers m > n that are relatively prime and of opposite
parity ($m \not\equiv_2 n$). It is then clear that a, b and c defined by equations (14.6), (14.7) and (14.8) are
positive integers. Further, we see that
\[ a^2 + b^2 = (2mn)^2 + (m^2 - n^2)^2 = m^4 + 2m^2n^2 + n^4 = (m^2 + n^2)^2 = c^2. \]
Finally, we need to verify that (a, b) = 1. Since exactly one of m, n is even while the other is odd
(here is where we use the assumption that $m \not\equiv_2 n$), $b = m^2 - n^2$ is odd. Any common prime
divisor p of a = 2mn and b is therefore odd, and so divides m or n. Since p also divides $m^2 - n^2$,
it would have to divide both m and n, thereby contradicting (m, n) = 1. Conversely, suppose that (a, b, c) is a fundamental
Pythagorean triple. We need to show that there exist positive integers m > n that are relatively
prime and of opposite parity such that (14.6), (14.7) and (14.8) hold. We write a = 2r and then
rewrite $a^2 + b^2 = c^2$ as
\[ 4r^2 = c^2 - b^2 = (c - b)(c + b). \tag{14.9} \]
Since b and c are both odd, we see that c − b and c + b are both even. Write c − b = 2t and c + b = 2s.
Then Equation (14.9) reads
\[ 4r^2 = (2s)(2t) = 4st, \]
which simplifies to
\[ r^2 = st. \]
We now show that (s, t) = 1 in preparation for invoking Lemma 20. First of all, we note that
\[ c = b + 2t = 2s - c + 2t \implies c = s + t, \]
and that similarly,
\[ b = s - t. \]
Suppose then that some prime p divides both s and t. Then p divides both b and c thereby
contradicting the relative primality of b and c. We conclude by contradiction that (s, t) = 1. We
now invoke Lemma 20 to write
\[ s = m^2, \qquad t = n^2 \]
for some relatively prime positive integers m and n. Since $r^2 = m^2n^2$ and each of r, m, n is positive,
we obtain
\[ a = 2r = 2mn. \]
Further,
\[ b = s - t = m^2 - n^2, \qquad c = s + t = m^2 + n^2. \]
Since b is positive, we must have m > n, and since b is odd, we must have $m \not\equiv_2 n$. This completes
the proof.
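The parametrization of Theorem 31 gives an efficient way to list fundamental Pythagorean triples. The following Python sketch enumerates them with hypotenuse up to a bound and compares the result with a direct search; the function names `fundamental_triples` and `brute_force` are illustrative choices, not part of the notes.

```python
from math import gcd, isqrt

def fundamental_triples(limit):
    """Triples (2mn, m^2 - n^2, m^2 + n^2) with m > n >= 1 coprime, of opposite parity, c <= limit."""
    triples = []
    m = 2
    while m * m + 1 <= limit:            # smallest possible hypotenuse for this m is m^2 + 1
        for n in range(1, m):
            c = m * m + n * n
            if gcd(m, n) == 1 and (m - n) % 2 == 1 and c <= limit:
                triples.append((2 * m * n, m * m - n * n, c))
        m += 1
    return sorted(triples)

def brute_force(limit):
    """Direct search for positive (a, b, c) with a even, (a, b) = 1, a^2 + b^2 = c^2, c <= limit."""
    found = []
    for a in range(2, limit, 2):
        for b in range(1, limit, 2):
            c = isqrt(a * a + b * b)
            if c <= limit and c * c == a * a + b * b and gcd(a, b) == 1:
                found.append((a, b, c))
    return sorted(found)

print(fundamental_triples(30))
print(fundamental_triples(200) == brute_force(200))
```

With a bound of 30 on the hypotenuse, the parametrization yields (4, 3, 5), (8, 15, 17), (12, 5, 13), (20, 21, 29) and (24, 7, 25).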
As a corollary to Theorem 31, we get the complete classification of solutions to $x^2 + y^2 = z^2$ in
integers.
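Corollary 4. Let x, y, z ∈ Z. Then $x^2 + y^2 = z^2$ if and only if one of (x, y, z), (y, x, z) can be
written as
\[ \left(2d\varepsilon_1 mn,\ d\varepsilon_2(m^2 - n^2),\ d\varepsilon_3(m^2 + n^2)\right) \]
for $d, m, n \in \mathbb{N}_0$, m > n, (m, n) = 1, $m \not\equiv_2 n$, $\varepsilon_1, \varepsilon_2, \varepsilon_3 \in \{1, -1\}$.
Proof. Let S be the set of all triples of the form
\[ \left(2d\varepsilon_1 mn,\ d\varepsilon_2(m^2 - n^2),\ d\varepsilon_3(m^2 + n^2)\right) \]
for $d, m, n \in \mathbb{N}_0$, m > n, (m, n) = 1, $m \not\equiv_2 n$, $\varepsilon_1, \varepsilon_2, \varepsilon_3 \in \{1, -1\}$. First of all, for $\varepsilon_1, \varepsilon_2, \varepsilon_3 \in \{1, -1\}$, and any integers d, m, n, we have
\begin{align*}
(2d\varepsilon_1 mn)^2 + \left(d\varepsilon_2(m^2 - n^2)\right)^2 &= 4d^2m^2n^2 + d^2(m^2 - n^2)^2 \\
&= 4d^2m^2n^2 + d^2(m^4 - 2m^2n^2 + n^4) \\
&= d^2\left(m^4 + 2m^2n^2 + n^4\right) \\
&= d^2(m^2 + n^2)^2 \\
&= \left(d(m^2 + n^2)\right)^2 \\
&= \left(d\varepsilon_3(m^2 + n^2)\right)^2.
\end{align*}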
Therefore every element of S gives a solution. We are therefore reduced to proving that $x^2 + y^2 = z^2$
implies that one of (x, y, z), (y, x, z) lies in S. We will do this by first looking at some trivial cases
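and then invoking Theorem 31 to deal with the other cases. The trivial cases arising when at least
one of x, y is equal to zero are dealt with as follows:

(x, y, z)                           Values of parameters showing one of (x, y, z), (y, x, z) ∈ S
(0, b, b), (b, 0, b), b ∈ Z         m = 1, n = 0, d = |b|, $\varepsilon_2 = \varepsilon_3 = \frac{b}{|b|}$
(0, b, −b), (b, 0, −b), b ∈ Z       m = 1, n = 0, d = |b|, $\varepsilon_2 = \frac{b}{|b|}$, $\varepsilon_3 = -\frac{b}{|b|}$

(When b = 0 we have d = 0, and the signs may be chosen arbitrarily.)
Now suppose that neither x nor y is zero, and let d = (x, y). Then d = (|x|, |y|) and $|x|^2 + |y|^2 = |z|^2$.
We then obtain that the triple $\left(\frac{|x|}{d}, \frac{|y|}{d}, \frac{|z|}{d}\right)$ is a fundamental Pythagorean triple. By Theorem 31,
there exist positive integers m > n with (m, n) = 1 and $m \not\equiv_2 n$ such that one of $\left(\frac{|x|}{d}, \frac{|y|}{d}, \frac{|z|}{d}\right)$,
$\left(\frac{|y|}{d}, \frac{|x|}{d}, \frac{|z|}{d}\right)$ is equal to $(2mn, m^2 - n^2, m^2 + n^2)$. We therefore have one of (x, y, z), (y, x, z) in S
as is shown by taking
\[ \varepsilon_1 = \frac{x}{|x|}, \qquad \varepsilon_2 = \frac{y}{|y|}, \qquad \varepsilon_3 = \frac{z}{|z|} \]
(with the roles of x and y interchanged when it is (y, x, z) that lies in S).
We end this section with an example.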
We end this section with an example.
Example 22. Determine the right triangles having integer side lengths and area equal to twice
their perimeter.
Solution. Let x, y and z denote the sides of a right triangle, with z being the hypotenuse. Then,
x, y, z are positive integers such that $x^2 + y^2 = z^2$. From Corollary 4, one of (x, y, z), (y, x, z) is
equal to $(2dmn, d(m^2 - n^2), d(m^2 + n^2))$ for positive integers d, m, n satisfying m > n, (m, n) = 1
and $m \not\equiv_2 n$. If the area of our triangle is twice its perimeter, we have
\begin{align*}
\frac{1}{2}xy = 2(x + y + z) &\implies d^2mn(m^2 - n^2) = 2d(2mn + m^2 - n^2 + m^2 + n^2) \\
&\implies d^2mn(m - n)(m + n) = 4dm(m + n) \\
&\implies dn(m - n) = 4.
\end{align*}
We conclude that d, n, and m − n are all positive divisors of 4. Recalling that m and n are relatively
prime and of opposite parity, we obtain the possibilities given in the following table:

d    n    m − n    m    Side lengths of the corresponding right triangle
1    4    1        5    40, 9, 41
2    2    1        3    24, 10, 26
4    1    1        2    16, 12, 20
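The table can be confirmed by a direct search. The sketch below, with the illustrative function name `triangles_area_twice_perimeter`, looks for right triangles with legs x ≤ y up to a chosen bound whose area xy/2 equals twice the perimeter, that is, xy = 4(x + y + z). The bound of 200 is an assumption made for the search; solving xy = 4(x + y + z) together with x^2 + y^2 = z^2 for y gives y = 8 + 32/(x − 8), so no leg exceeds 40 and the bound is more than sufficient.

```python
from math import isqrt

def triangles_area_twice_perimeter(max_leg):
    """Right triangles (x, y, z), x <= y <= max_leg, with area equal to twice the perimeter."""
    found = []
    for x in range(1, max_leg + 1):
        for y in range(x, max_leg + 1):
            z = isqrt(x * x + y * y)
            # z must be an exact integer hypotenuse, and xy/2 = 2(x + y + z).
            if z * z == x * x + y * y and x * y == 4 * (x + y + z):
                found.append((x, y, z))
    return found

print(triangles_area_twice_perimeter(200))
```

The search returns the triangles with sides (9, 40, 41), (10, 24, 26) and (12, 16, 20), in agreement with the table.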
Chapter 15
Infinite Descent and Fermat’s
Conjecture
This chapter is based on [Dud08, §17].
In this section we introduce Fermat’s method of infinite descent that can be used to show that
certain diophantine equations fail to have nontrivial integer solutions. The idea is to proceed by
contradiction by supposing that there exists a nontrivial solution, taking such a solution that is
smallest in some sense and then obtaining a contradiction by deriving an even smaller solution.
This is the description of the “least element” version of the method. The “induction” version of the
method constructs, from some given nontrivial solution, an infinite sequence of positive solutions,
each smaller than its predecessor. This version explains why the method is called infinite descent.
The classic example of using Fermat’s method of infinite descent is the n = 4 case of Fermat’s Last
Theorem. We will prove this case later on in this section, but first we set the stage.
Definition 18. Let f be a polynomial in the variables $x_1, \ldots, x_n$ with integer coefficients. An
integer solution to the diophantine equation $f(x_1, \ldots, x_n) = 0$ is called nontrivial if none of the $x_j$
are equal to zero.
The following theorem is known as Fermat’s Last Theorem:
Theorem 32 (Fermat’s Last Theorem). If n is a positive integer greater than 2 then the diophantine
equation $x^n + y^n = z^n$ has no nontrivial solutions in integers.
Since $x^1 + y^1 = z^1$ clearly has infinitely many nontrivial integer solutions, and the same is
true for $x^2 + y^2 = z^2$ by the previous section, we see that Fermat’s Last Theorem completes the
determination of when a power can be written as the sum of two like powers. Before arriving at the
n = 4 case of Fermat’s Last Theorem, we first state and prove the following lemma that generalizes
Lemma 20.
Lemma 21. Let k, r, s, t, q ∈ N with (s, t) = 1 and q a prime. Suppose that
\[ st = qr^k. \]
Then one of s, t is a k-th power and the other is q times a k-th power.
Proof. Assume the hypotheses and let p be a prime. Applying the p-adic valuation $v_p$ to both sides
of
\[ st = qr^k \]
yields
\[ v_p(s) + v_p(t) = v_p(q) + k\,v_p(r) \equiv_k v_p(q) \equiv_k \begin{cases} 1 & \text{if } p = q; \\ 0 & \text{if } p \neq q. \end{cases} \]
Since (s, t) = 1, we know that at least one of $v_p(s)$, $v_p(t)$ is equal to zero. We conclude that
one of $v_q(s)$, $v_q(t)$ is equal to zero and the other is congruent to 1 modulo k while for $p \neq q$,
$v_p(s) \equiv_k v_p(t) \equiv_k 0$. Since a positive integer is a k-th power if and only if each of the exponents
appearing in its prime-power factorization is a multiple of k, we see that one of s, t is a k-th power
while the other is q times a k-th power.
We now apply Fermat’s method of infinite descent to prove the n = 4 case of Fermat’s Last
Theorem.
Theorem 33. The diophantine equation $x^4 + y^4 = z^2$ has no nontrivial solutions in integers. In
particular, $x^4 + y^4 = z^4$ has no nontrivial solutions in integers.
Proof. Towards a contradiction, suppose that $x^4 + y^4 = z^2$ has a nontrivial solution x, y, z in
integers. Since the powers involved in the diophantine equation we are considering are even, we can
assume that x, y and z are all positive. We show now that there is a solution having least positive
value for z, and then obtain a contradiction by deriving from this solution another solution with
an even smaller positive value for z. Let
\[ S = \{z \in \mathbb{N} \mid x^4 + y^4 = z^2 \text{ for some } x, y \in \mathbb{N}\}. \]
To show that S has a least element, we will show that it is nonempty and bounded below and then
invoke the least integer principle. By hypothesis, $S \neq \emptyset$ since we are assuming the existence of a
nontrivial solution to our diophantine equation, and, as remarked above, this implies the existence
of a solution x, y, z to our diophantine equation having x, y, z ∈ N. Also, since every element of S
is positive, we see that S is bounded below. From the least integer principle, we conclude therefore
that S has a least element $z_0$. Let $x_0, y_0, z_0 \in \mathbb{N}$ be a corresponding solution to our diophantine
equation. We claim that $x_0$ and $y_0$ are relatively prime. Indeed, if p is a prime dividing both $x_0$
and $y_0$, then from
\[ x_0^4 + y_0^4 = z_0^2, \]
we would conclude that $p^2 \mid z_0$. But then,
\[ \left(\frac{x_0}{p}\right)^4 + \left(\frac{y_0}{p}\right)^4 = \left(\frac{z_0}{p^2}\right)^2, \]
yielding a “smaller” nontrivial solution to our diophantine equation. Indeed, we would have $\frac{z_0}{p^2} \in S$
even though $\frac{z_0}{p^2} < z_0$, contradicting the minimality of $z_0$. We can therefore write
\[ \left(x_0^2\right)^2 + \left(y_0^2\right)^2 = z_0^2, \]
for $(x_0, y_0) = 1$. We conclude that $(x_0^2, y_0^2, z_0)$ is a fundamental Pythagorean triple. We can therefore
assume without loss of generality that $x_0 = 2r$ is even, $y_0$, $z_0$ are odd and
\begin{align}
4r^2 = x_0^2 &= 2st, \tag{15.1} \\
y_0^2 &= s^2 - t^2, \tag{15.2} \\
z_0 &= s^2 + t^2, \tag{15.3}
\end{align}
for some positive integers s, t with s > t, (s, t) = 1 and $s \not\equiv_2 t$. We know that one of s, t is even
while the other is odd. To determine which one is even and which one is odd, we look at (15.2)
modulo 4 recalling that even squares are congruent to 0 modulo 4 and odd squares are congruent
to 1 modulo 4. Since $y_0$ is odd we obtain
\[ 1 \equiv_4 y_0^2 \equiv_4 s^2 - t^2 \equiv_4 \begin{cases} 1^2 - 0^2 & \text{if } s \text{ is odd and } t \text{ is even;} \\ 0^2 - 1^2 & \text{if } t \text{ is odd and } s \text{ is even} \end{cases} = \begin{cases} 1 & \text{if } s \text{ is odd and } t \text{ is even;} \\ -1 & \text{if } t \text{ is odd and } s \text{ is even.} \end{cases} \]
From this we conclude that s is odd and t is even. Looking at (15.2) once more, we see that
\[ t^2 + y_0^2 = s^2. \]
Further, it is easy to see that $(t, y_0) = 1$. We therefore have another fundamental Pythagorean
triple which yields relatively prime positive integers m and n with m > n, $m \not\equiv_2 n$ such that
\begin{align}
t &= 2mn; \tag{15.4} \\
y_0 &= m^2 - n^2; \tag{15.5} \\
s &= m^2 + n^2. \tag{15.6}
\end{align}
What we do now is show that all three of m, n and s are squares. Equation (15.6) would then
provide us with a positive integer smaller than $z_0$ with square equal to the sum of two fourth powers.
This is the contradiction we are after. First of all, we see from (15.1) that $st = 2r^2$. By Lemma
21, together with the fact that t is even, we conclude that for some u, v ∈ N, we have $s = u^2$ and
$t = 2v^2$. But then, (15.4) yields $mn = v^2$ so that, by Lemma 20 and the fact that (m, n) = 1, m and n are both squares. Finally, if $m = a^2$ and
$n = b^2$ then (15.6) reads
\[ u^2 = a^4 + b^4. \]
Since
\[ 0 < u \leq u^4 = s^2 < s^2 + t^2 = z_0, \]
we have obtained the contradiction we were after.
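Although a finite search proves nothing, it is reassuring to see that no small solutions of x^4 + y^4 = z^2 exist. The Python sketch below (the function name `fourth_power_solutions` and the search bound are illustrative assumptions) looks for positive solutions with x ≤ y up to a bound, using exact integer square roots.

```python
from math import isqrt

def fourth_power_solutions(bound):
    """All (x, y, z) with 1 <= x <= y <= bound and x^4 + y^4 = z^2 for a positive integer z."""
    solutions = []
    for x in range(1, bound + 1):
        for y in range(x, bound + 1):
            total = x ** 4 + y ** 4
            z = isqrt(total)           # exact integer square root, no floating point
            if z * z == total:
                solutions.append((x, y, z))
    return solutions

print(fourth_power_solutions(200))
```

In accordance with Theorem 33, the list of solutions is empty.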
We close this section by proving that the only integers that have rational square roots are the
perfect squares. We first apply Fermat’s method of Infinite Descent to prove the result for primes,
and then show how the general case follows from the Rational Root Theorem.
Proposition 19. Let p be prime. Then $\sqrt{p}$ is irrational.
Proof. Let p be a prime. We will prove that $\sqrt{p}$ is irrational by employing Fermat’s method of
Infinite Descent. Let
\[ S = \{n \in \mathbb{N} \mid pn^2 = m^2 \text{ for some integer } m\}. \]
Assuming that $\sqrt{p}$ is rational, we can write
\[ \sqrt{p} = \frac{m}{n}, \]
for some positive integers m and n. But then $pn^2 = m^2$ so that n ∈ S. We conclude that S is
nonempty. Since every element of S is positive, we see as well that S is bounded below. Therefore,
by the least integer principle, S has a least element $n_0$. Let $m_0 \in \mathbb{Z}$ be such that
\[ pn_0^2 = m_0^2. \]
Then $p \mid m_0$ so that $m_0 = pm_1$ for some integer $m_1$. Consequently
\[ pn_0^2 = m_0^2 = p^2m_1^2, \]
and so
\[ n_0^2 = pm_1^2. \]
We now see that $p \mid n_0$ so that $n_0 = pn_1$ for some $n_1 \in \mathbb{N}$. This gives
\[ p^2n_1^2 = n_0^2 = pm_1^2 \]
so that
\[ pn_1^2 = m_1^2. \]
But this forces $n_1 \in S$ which is a contradiction since $n_1 < pn_1 = n_0$. By contradiction, we conclude
that $\sqrt{p}$ is irrational, as required.
We have applied Fermat’s method of Infinite Descent to prove the irrationality of $\sqrt{p}$ for primes
p. However, for any non-square d ∈ N, $\sqrt{d}$ is irrational. We now prove this generalization as a
corollary of the Rational Root Theorem.
Theorem 34 (Rational Root Theorem). Let f be a monic polynomial with integer coefficients.
Then every rational root of f is in fact an integer.
Proof. We may suppose that the degree, k, of f is positive. Let $f(x) = x^k + \sum_{j=0}^{k-1} a_j x^j$, for integers
$a_0, \ldots, a_{k-1}$. Suppose that $x_0 = \frac{m}{n}$, with m ∈ Z and n ∈ N, is a rational root of f. By cancelling common factors from the
numerator and denominator of $x_0$ if necessary, we may suppose that (m, n) = 1. Since $x_0$ is a root
of f, we have
\[ f(x_0) = \frac{m^k}{n^k} + \sum_{j=0}^{k-1} a_j \frac{m^j}{n^j} = 0. \]
Multiplying through by $n^k$ yields
\[ m^k + n \sum_{j=0}^{k-1} a_j m^j n^{k-j-1} = 0. \]
Since every exponent of n that appears in the sum is nonnegative, we see that n divides the integer
$n \sum_{j=0}^{k-1} a_j m^j n^{k-j-1}$. Therefore $n \mid m^k$. Since (m, n) = 1, we see that the only way for this to occur
is to have n = 1. We conclude that $x_0 = \frac{m}{n} = \frac{m}{1} = m \in \mathbb{Z}$, as required.
Corollary 5. Let d ∈ N not be a square. Then $\sqrt{d}$ is irrational.
Proof. We need to prove that if $\sqrt{d}$ is rational then d is the square of an integer. Consider the
monic quadratic polynomial f given by
\[ f(x) = x^2 - d. \]
From Theorem 34 we know that every rational root of f is in fact an integer. Since $\sqrt{d}$ is a root
of f, we conclude that if it were rational, it would have to be an integer. But if $\sqrt{d} = n \in \mathbb{Z}$, then
$d = n^2$ would be the square of an integer as was to be shown.
Chapter 16
Sums of Squares
This chapter is based on [Dud08, §18, 19] and [Ser73, Appendix to Ch. 4].
In this section, we classify the integers that can be written as sums of squares. We will give
complete proofs for the two squares and four squares cases, and a very rough outline for the three
squares case. The results are that the only positive integers that cannot be written as a sum of two
squares are the ones in whose prime factorization some prime $p \equiv_4 3$ appears to an odd power, the only positive integers that
cannot be written as a sum of three squares are the ones of the form $4^a(8b - 1)$ for $a \in \mathbb{N}_0$ and
b ∈ N and that every positive integer can be written as the sum of four squares. We start with the
two squares case.
Theorem 35. A positive integer n can be written as the sum of two squares if and only if vp (n) is
even for all primes p ≡4 3.
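Before turning to the proof, the statement of Theorem 35 can be compared with a brute-force search. In the Python sketch below, `is_sum_of_two_squares` searches directly for a representation with nonnegative x and y, while `satisfies_criterion` tests the valuation condition by trial division; both names are illustrative and not part of the notes.

```python
from math import isqrt

def is_sum_of_two_squares(n):
    """Brute force: is n = x^2 + y^2 for some integers x, y >= 0?"""
    return any(isqrt(n - x * x) ** 2 == n - x * x for x in range(isqrt(n) + 1))

def satisfies_criterion(n):
    """Theorem 35's condition: v_p(n) is even for every prime p congruent to 3 modulo 4."""
    p = 2
    while p * p <= n:
        v = 0
        while n % p == 0:              # only primes can divide here: smaller factors are gone
            n //= p
            v += 1
        if p % 4 == 3 and v % 2 == 1:
            return False
        p += 1
    # Whatever remains is 1 or a single prime appearing to the first power.
    return n % 4 != 3

print(all(is_sum_of_two_squares(n) == satisfies_criterion(n) for n in range(1, 2001)))
```

For example, 21 = 3 · 7 and 6 = 2 · 3 fail the criterion and are not sums of two squares, whereas 9 = 3^2 + 0^2 and 45 = 6^2 + 3^2 pass it.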
Proof. Let n ∈ N be the sum of two squares. Say
\[ n = x^2 + y^2 \tag{16.1} \]
for nonnegative integers x, y. Suppose that p is a prime such that $v_p(n)$ is odd. Since n is an integer
we have $v_p(n) \geq 0$ and so we conclude from the assumption that $v_p(n)$ is odd that $v_p(n) \geq 1$. We
summarize
\[ v_p(n) \text{ is an odd positive integer.} \tag{16.2} \]
Let d = (x, y). Then, since d divides both x and y, we see from (16.1) that $d^2$ divides n. Define
$x_1 = x/d$, $y_1 = y/d$ and $n_1 = n/d^2$. Dividing (16.1) by $d^2$ yields
\[ n_1 = x_1^2 + y_1^2. \tag{16.3} \]
Now, we take the p-adic valuation of $n_1$, recalling that the p-adic valuation of integers is nonnegative,
to obtain
\[ 0 \leq v_p(n_1) = v_p\left(\frac{n}{d^2}\right) = v_p(n) - v_p(d^2) = v_p(n) - 2v_p(d). \]
We conclude from this and (16.2) that $v_p(n_1)$ is odd and nonnegative. It is therefore positive so
that $p \mid n_1$. From (16.3), we see that either both of $x_1$, $y_1$ are divisible by p or that neither one
of them is divisible by p. Since $(x_1, y_1) = 1$ we conclude that neither $x_1$ nor $y_1$ is divisible by p so
that they both lie in $(\mathbb{Z}/p\mathbb{Z})^\times$. Considering (16.3) modulo p yields
\[ x_1^2 + y_1^2 \equiv_p n_1 \equiv_p 0, \]
so that
\[ x_1^2 \equiv_p -y_1^2. \]
Since $y_1 \in (\mathbb{Z}/p\mathbb{Z})^\times$, the inverse $y_1^{-1}$ modulo p exists and we can multiply by $y_1^{-2}$ to obtain
\[ \left(x_1 y_1^{-1}\right)^2 \equiv_p -1. \]
But this forces p = 2 or p is odd and (−1/p) = 1. Since the latter forces $p \equiv_4 1$, we see that the
only primes p for which $v_p(n)$ can be odd are p = 2 and $p \equiv_4 1$. Therefore, if n is the sum of two
squares then for all primes $p \equiv_4 3$ we have $v_p(n)$ is even. Conversely, suppose that $v_p(n)$ is even for
all primes $p \equiv_4 3$. We can then write the prime-power factorization of n as
\[ n = 2^{v_2(n)} \left( \prod_{p \equiv_4 1} p^{v_p(n)} \right) \left( \prod_{p \equiv_4 3} p^{v_p(n)/2} \right)^2. \]
Since any square is already a sum of two squares (equal to itself plus $0^2$), in order to prove that n is
a sum of two squares, it suffices to show that 2 and primes $p \equiv_4 1$ are sums of two squares and that
multiplying together integers representable as the sum of two squares yields another integer that
is representable as the sum of two squares. We start by showing that the product of representable
integers is representable. To do this, we need only combine the observation that
\[ (a^2 + b^2)(c^2 + d^2) = |a - bi|^2 |c + di|^2 = |(ac + bd) + (ad - bc)i|^2 = (ac + bd)^2 + (ad - bc)^2 \tag{16.4} \]
with a routine induction. Therefore, since $2 = 1^2 + 1^2$, we are reduced to proving that every prime p
congruent to 1 modulo 4 is representable as the sum of two squares. To this end, we first note that
since $p \equiv_4 1$, we have (−1/p) = 1. Therefore, there exists a positive integer u such that $u^2 \equiv_p -1$.
This implies that
\[ p \mid (u^2 + 1). \tag{16.5} \]
We will now complete the proof in two different ways. One way will be by Descent, and the other
will be algebraic. First we proceed by descent. From (16.5) we see that the set S given by
\[ S = \{k \in \mathbb{N} \mid kp = x^2 + y^2 \text{ for some } x, y \in \mathbb{N}\} \]
is nonempty. Since it is bounded below, we can invoke the least integer principle to obtain a least
element k. Say
\[ x^2 + y^2 = kp. \tag{16.6} \]
If we can show that k = 1, then we will have p written as the sum of two squares, as required. We
now proceed to show this. Let r and s be the representatives modulo k for x and y respectively
having least absolute value. Then
\begin{align}
r \equiv_k x, \qquad -\frac{k}{2} < r \leq \frac{k}{2}; \tag{16.7} \\
s \equiv_k y, \qquad -\frac{k}{2} < s \leq \frac{k}{2}. \tag{16.8}
\end{align}
Then
\[ r^2 + s^2 \equiv_k x^2 + y^2 = kp \equiv_k 0. \]
We can therefore write $r^2 + s^2 = k_1 k$ for some $k_1 \in \mathbb{N}_0$. Now, if $k_1 = 0$ then r = s = 0 so that
$x \equiv_k y \equiv_k 0$. By (16.6) we see that this forces $k^2 \mid kp$ so that $k \mid p$. Therefore k = 1 or k = p. If
k = 1 we’re done. If k = p then $p \mid x, y$ and
\[ \left(\frac{x}{p}\right)^2 + \left(\frac{y}{p}\right)^2 = 1. \]
But this would force one of x, y to be equal to zero (and the other to be equal to p) which is a
contradiction since x, y ∈ N. We can therefore assume that $k_1 \in \mathbb{N}$. We have
\[ (rx + sy)^2 + (ry - sx)^2 = (r^2 + s^2)(x^2 + y^2) = k_1 k^2 p. \tag{16.9} \]
However,
\[ rx + sy \equiv_k r^2 + s^2 \equiv_k 0, \qquad ry - sx \equiv_k rs - sr \equiv_k 0 \]
so that both rx + sy and ry − sx are divisible by k. We can therefore divide (16.9) through by $k^2$
to obtain
\[ \left(\frac{rx + sy}{k}\right)^2 + \left(\frac{ry - sx}{k}\right)^2 = k_1 p. \]
However, we have
\[ k \leq k_1 k = r^2 + s^2 \leq \left(\frac{k}{2}\right)^2 + \left(\frac{k}{2}\right)^2 = \frac{k^2}{2} < k^2, \]
so that
\[ 1 \leq k_1 < k \]
which contradicts the minimality of k. The only case that didn’t lead to a contradiction was the
case r = s = 0 and k = 1. Therefore k = 1 and
\[ p = x^2 + y^2 \]
is representable as the sum of two squares, as required. To provide an alternative proof, we work
in the ring Z[i] of Gaussian integers given by
\[ \mathbb{Z}[i] = \{a + bi \mid a, b \in \mathbb{Z}\}, \]
where i is a chosen square root of −1. It can be shown that Z[i] has unique factorization so that
primes in Z[i] correspond to irreducibles in Z[i]. Recall that an element α is called prime when
\[ \alpha \mid \beta\gamma \implies \alpha \mid \beta \text{ or } \alpha \mid \gamma \]
and an element α is called irreducible if
\[ \alpha = \beta\gamma \implies |\beta| = 1 \text{ or } |\gamma| = 1. \]
Since $p \mid (u^2 + 1)$, we see that
\[ p \mid (u + i)(u - i). \]
However, as neither $\frac{u}{p} + \frac{1}{p}i$ nor $\frac{u}{p} - \frac{1}{p}i$ lies in Z[i] (lest $\frac{1}{p} \in \mathbb{Z}$) we see that p is not prime in Z[i]. It
follows that it is reducible so that we can write
\[ p = \alpha\beta \tag{16.10} \]
for some α, β ∈ Z[i] with $|\alpha| \neq 1$ and $|\beta| \neq 1$. Note that for any δ ∈ Z[i], $|\delta|^2 \in \mathbb{Z}$. Indeed, if
δ = c + di then $|\delta|^2 = c^2 + d^2 \in \mathbb{Z}$. From (16.10) we see that
\[ p^2 = |\alpha|^2 |\beta|^2 \]
so that both of $|\alpha|^2$ and $|\beta|^2$ are integers not equal to 1 that divide $p^2$. The only possibility is to
have
\[ |\alpha|^2 = |\beta|^2 = p. \]
But this completes the proof since if α = a + bi, then $p = |\alpha|^2 = a^2 + b^2$, as required.
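The descent argument in the proof is effectively an algorithm. The Python sketch below follows it step by step for a prime p ≡ 1 modulo 4: it starts from x = u, y = 1 with u^2 ≡ −1 modulo p, and repeatedly replaces (x, y) by ((rx + sy)/k, (ry − sx)/k) until k = 1. Two details are choices made for this illustration rather than steps taken from the notes: u is found by a naive search and is taken to be the smallest square root of −1 (so that the initial k is less than p), and absolute values are taken so that x and y stay nonnegative. The function name `two_squares` is likewise illustrative.

```python
def two_squares(p):
    """Write a prime p = 1 (mod 4) as x^2 + y^2, following the descent in the proof of Theorem 35."""
    # Smallest u >= 1 with u^2 = -1 (mod p); it exists because (-1/p) = 1.
    u = next(u for u in range(1, p) if (u * u + 1) % p == 0)
    x, y = u, 1
    k = (x * x + y * y) // p           # x^2 + y^2 = k p, as in (16.6)
    while k > 1:
        # Representatives of x and y modulo k in the interval (-k/2, k/2], as in (16.7), (16.8).
        r, s = x % k, y % k
        if 2 * r > k:
            r -= k
        if 2 * s > k:
            s -= k
        # Both rx + sy and ry - sx are divisible by k; dividing (16.9) by k^2 lowers k.
        x, y = abs(r * x + s * y) // k, abs(r * y - s * x) // k
        k = (x * x + y * y) // p
    return tuple(sorted((x, y)))

print(two_squares(13), two_squares(29), two_squares(53))
```

For p = 13 the search gives u = 5 and k = 2, and a single descent step produces 13 = 2^2 + 3^2; similarly 29 = 2^2 + 5^2 and 53 = 2^2 + 7^2.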
We now turn to the determination of the positive integers that can be written as the sum of
three squares. The proof of the classification of such integers is difficult and so we can only provide
an outline. We start with the statement of Gauss’ classification before proceeding to an outline of
the proof.
Theorem 36 (Gauss). A positive integer n can be written as the sum of three squares if and only if it is not of the form $4^a(8b - 1)$. In particular, an odd integer n can be written as the sum of three squares if and only if $n \not\equiv_8 -1$.
The proof of Theorem 36 is split into establishing three equivalences:
n is not of the form $4^a(8b - 1)$ ⇐⇒ −n fails to be a square modulo some power of 2;
−n fails to be a square modulo some power of 2 ⇐⇒ $n = x^2 + y^2 + z^2$ for some x, y, z ∈ Q;
$n = x^2 + y^2 + z^2$ for some x, y, z ∈ Q ⇐⇒ $n = x^2 + y^2 + z^2$ for some x, y, z ∈ Z.
We will prove the first and third equivalences, but content ourselves with a very rough sketch
of a proof of the second equivalence. We prove the first equivalence by way of Hensel’s Lemma. By
the Chinese Remainder Theorem and the Fundamental Theorem of Arithmetic, solving polynomial
congruences modulo positive integers is reduced to solving polynomial congruences modulo prime
powers. Hensel’s Lemma allows us, under certain conditions, to further reduce this problem to the
consideration of polynomial congruences modulo primes.
Theorem 37 (Hensel’s Lemma). Let f be a polynomial with integer coefficients and p be a prime. If there exists an integer a such that
$$v_p(f(a)) > 2 v_p(f'(a)) \tag{16.11}$$
then $f(x) \equiv_{p^j} 0$ has solutions for all j.
Sketch of Proof. Assume the hypotheses and define a sequence of rational numbers $\{\alpha_0, \alpha_1, \alpha_2, \dots\}$ recursively by
$$\alpha_0 = a, \qquad \alpha_{i+1} = \alpha_i - \frac{f(\alpha_i)}{f'(\alpha_i)} \quad (i \ge 0).$$
One then shows that $\lim_{i \to \infty} v_p(f(\alpha_i)) = \infty$. This implies that f has roots modulo every power of p since given a particular power $p^k$, we need only choose an index i such that $v_p(f(\alpha_i)) \ge k$. We would then have
$$f(\alpha_i) \equiv_{p^k} 0$$
as required. More comes out of the proof, however. One shows that for all i ≥ 0 we have
$$v_p(\alpha_{i+1} - \alpha_i) \ge 2^i, \qquad v_p(f'(\alpha_i)) = v_p(f'(a)), \qquad v_p(f(\alpha_i)) \ge 2^i.$$
We conclude that for each i,
$$\alpha_{i+1} \equiv_{p^{2^i}} \alpha_i, \qquad f(\alpha_i) \equiv_{p^{2^i}} 0.$$
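The growth of the valuations along the Newton sequence can be observed numerically. The following Python sketch runs the recursion with exact rational arithmetic for the polynomial $x^2 + x + 223$, the prime 3 and the starting value a = 4 (the data of Example 23(b) in these notes); the helper names `vp` and `newton_valuations` are hypothetical and chosen only for this illustration.

```python
from fractions import Fraction


def vp(q, p):
    """p-adic valuation of a nonzero rational number q."""
    q = Fraction(q)
    num, den, v = q.numerator, q.denominator, 0
    while num % p == 0:
        num //= p
        v += 1
    while den % p == 0:
        den //= p
        v -= 1
    return v


def newton_valuations(f, df, a, p, steps):
    """Pairs (v_p(f(alpha_i)), v_p(f'(alpha_i))) for i = 0, ..., steps - 1."""
    alpha = Fraction(a)
    out = []
    for _ in range(steps):
        out.append((vp(f(alpha), p), vp(df(alpha), p)))
        alpha = alpha - f(alpha) / df(alpha)   # Newton step
    return out


f = lambda x: x * x + x + 223
df = lambda x: 2 * x + 1
print(newton_valuations(f, df, 4, 3, 4))  # [(5, 2), (6, 2), (8, 2), (12, 2)]
```

In this run the valuation of $f'(\alpha_i)$ stays equal to 2 while the valuation of $f(\alpha_i)$ keeps increasing, in line with the bounds stated in the sketch of proof.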
Some remarks are in order. First of all, the condition
$$v_p(f(a)) > 2 v_p(f'(a))$$
implies in particular that $v_p(f(a)) > 0$ since $f'(a)$ is an integer and so has nonnegative p-adic valuation. Therefore, the situation in which we apply Hensel’s Lemma is the situation where f has a root modulo p. This is the case since
$$v_p(f(a)) > 0 \iff f(a) \equiv_p 0.$$
Now, since
$$v_p(f'(a)) = 0 \iff f'(a) \not\equiv_p 0,$$
we see that Hensel’s Lemma applies automatically whenever f has a simple root modulo p. Indeed, if
$$f(a) \equiv_p 0$$
but
$$f'(a) \not\equiv_p 0,$$
then
$$v_p(f(a)) > 0 = 2 v_p(f'(a)).$$
The point is that whenever f has a simple root modulo p, this root can be lifted to higher powers of p without bound to obtain roots of f modulo any power of p. This is the nonsingular special case of Hensel’s Lemma given by Corollary 6 below. Hensel’s Lemma can also be applied, however, when $f'(a) \equiv_p 0$ (the singular case) provided f vanishes at a modulo a sufficiently high power of p (with exponent larger than $2 v_p(f'(a))$).
Aside 2. Recall that the p-adic absolute value $|\cdot|_p$ is defined by
$$|x|_p = p^{-v_p(x)},$$
and that we obtain the p-adic numbers by adding to the set of rationals all limits of Cauchy sequences of rationals with respect to $|\cdot|_p$. Thus $v_p(x)$ is large exactly when $|x|_p$ is small. We can re-write (16.11) as
$$|f(a)|_p < |f'(a)|_p^2,$$
and it turns out that $f(x) \equiv_{p^j} 0$ having solutions for all j is equivalent to f(x) = 0 having solutions in the p-adic integers. So, all in all, Hensel’s Lemma can be interpreted as stating that if we can find an integer a such that f(a) is sufficiently close to zero (closer than $f'(a)^2$ in the p-adic absolute value) then we can actually find a p-adic integer b (corresponding to the infinite sequence $\{\alpha_0, \alpha_1, \alpha_2, \dots\}$ constructed via Newton’s method) such that f(b) = 0.
Often, the following “nonsingular” version of Hensel’s Lemma corresponding to when f has a
simple root modulo p is sufficient.
Corollary 6 (Hensel’s Lemma: Nonsingular Case). Let f be a polynomial with integer coefficients and p be a prime. If there exists an integer a such that
$$f(a) \equiv_p 0$$
and
$$f'(a) \not\equiv_p 0,$$
then $f(x) \equiv_{p^j} 0$ has solutions for all j. In words, if f has a simple root modulo p then f has roots modulo every power of p.
Proof. By Theorem 37, it is enough to show that $f(a) \equiv_p 0$ and $f'(a) \not\equiv_p 0$ imply that
$$v_p(f(a)) > 2 v_p(f'(a)).$$
But this is clear since $f(a)$ and $f'(a)$ are integers so that $v_p(f(a))$ and $v_p(f'(a))$ are nonnegative. We can then restate our hypotheses as follows:
$$f(a) \equiv_p 0 \iff v_p(f(a)) \ge 1,$$
$$f'(a) \not\equiv_p 0 \iff v_p(f'(a)) = 0.$$
Therefore, in this case, we have
$$v_p(f(a)) \ge 1 > 0 = 2 \cdot 0 = 2 v_p(f'(a)).$$
We now illustrate the utility of Hensel’s Lemma with an example.
Example 23. Show that f(x) has roots modulo every power of 3 for the following polynomials f(x):
(a) $f(x) = x^3 + x^2 + x + 1$;
(b) $f(x) = x^2 + x + 223$.
Solution.
(a) We consider f(x) modulo 3 to obtain
$$x^3 + x^2 + x + 1 \equiv_3 0.$$
We see that if there is a solution a, it must not be congruent to 0 modulo 3, since f(0) = 1. By Fermat’s Little Theorem, we have $a^3 \equiv_3 a$ and $a^2 \equiv_3 1$. The congruence then becomes
$$a + 1 + a + 1 \equiv_3 0 \iff 2(a + 1) \equiv_3 0 \iff a + 1 \equiv_3 0 \iff a \equiv_3 2.$$
Further,
$$f'(x) = 3x^2 + 2x + 1$$
so that
$$f'(a) \equiv_3 2a + 1.$$
If we choose $a \equiv_3 2$ we then have $f(a) \equiv_3 0$ and $f'(a) \equiv_3 (2)(2) + 1 \not\equiv_3 0$. Therefore, by the “nonsingular” version of Hensel’s Lemma, we see that $f(x) \equiv_{3^j} 0$ has solutions for all j.
(b) Here, the congruences in question are
$$x^2 + x + 223 \equiv_{3^j} 0 \qquad (j \ge 1).$$
We start with the j = 1 case with hopes of being able to apply the nonsingular case of Hensel’s Lemma. Since $223 \equiv_3 2 + 2 + 3 \equiv_3 7 \equiv_3 1$ we see that the congruence in question is given by
$$x^2 + x + 1 \equiv_3 0 \iff x^2 - 2x + 1 \equiv_3 0 \iff (x - 1)^2 \equiv_3 0 \iff x \equiv_3 1.$$
Now we compute $f'(1)$ in hopes that it is not congruent to 0 modulo 3. We are not that lucky in this case since
$$f'(x) = 2x + 1$$
so that
$$f'(1) \equiv_3 2(1) + 1 \equiv_3 0.$$
We now see that we need to apply the “singular” version of Hensel’s Lemma. We therefore take note of the 3-adic valuation of $f'(1)$ and try to find a solution to $f(x) \equiv_{3^j} 0$ for $j = 2v_3(f'(1)) + 1$. Since
$$f'(1) = 2(1) + 1 = 3,$$
we see that
$$v_3(f'(1)) = v_3(3) = 1.$$
We therefore seek a solution a to $f(x) \equiv_{3^3} 0$ for which $v_3(f'(a)) = 1$. Since $223 \equiv_{27} 7$, the congruence in question is given by
$$x^2 + x + 7 \equiv_{27} 0.$$
Now, we know that any solution must be a solution modulo 3 as well and so must be congruent to 1 modulo 3. Since 1 clearly is not a solution modulo 27, the next integer to try is 4. We compute
$$f(4) \equiv_{27} 4^2 + 4 + 7 \equiv_{27} 27 \equiv_{27} 0.$$
However, $f'(4) = 2(4) + 1 = 9$. Therefore $v_3(f'(4)) = v_3(3^2) = 2$. We therefore did not manage to find a root a of f modulo $3^3$ for which $v_3(f'(a)) = 1$. Turning things around, however, since we know that $v_3(f'(4)) = 2$, we will be able to apply Hensel’s Lemma provided $f(4) \equiv_{3^5} 0$. This is in fact the case since
$$f(4) = 4^2 + 4 + 223 = 243 = 3^5 \equiv_{3^5} 0.$$
We conclude that $v_3(f(4)) \ge 5 > 4 = (2)(2) = 2v_3(f'(4))$ so that Hensel’s Lemma applies. We therefore have roots of f modulo every power of 3.
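The conclusions of Example 23 can be sanity-checked by brute force for small exponents. The Python sketch below lists the roots of each polynomial modulo $3^j$ for j up to 6; the function name `roots_mod` and the coefficient-list convention are assumptions made for this illustration, and a finite check of course does not replace the argument via Hensel’s Lemma.

```python
def roots_mod(coeffs, m):
    """All x in range(m) with f(x) = 0 (mod m).

    coeffs lists the coefficients of f from the constant term upwards.
    """
    def f(x):
        return sum(c * pow(x, i, m) for i, c in enumerate(coeffs)) % m
    return [x for x in range(m) if f(x) == 0]


fa = [1, 1, 1, 1]   # x^3 + x^2 + x + 1
fb = [223, 1, 1]    # x^2 + x + 223

for j in range(1, 7):
    m = 3 ** j
    # Print the modulus and how many roots each polynomial has modulo it.
    print(m, len(roots_mod(fa, m)), len(roots_mod(fb, m)))
```

Both counts stay positive for every modulus tested, and 4 appears among the roots of the second polynomial modulo $3^5 = 243$, as computed in the example.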
Hensel’s Lemma comes into our outline of a proof to Theorem 36 by establishing the first
equivalence mentioned above.
Lemma 22. Let n ∈ N. Then n is of the form $4^a(8b - 1)$ if and only if −n is a square modulo every power of 2.
Proof. Let n ∈ N and suppose that $n = 4^a(8b - 1)$ for some a ≥ 0 and b ≥ 1. Define $f(x) = x^2 + n$. We need to prove that
$$f(x) \equiv_{2^j} 0$$
has solutions for all j. By Hensel’s Lemma, it is sufficient to find an integer m such that
$$v_2(f(m)) > 2 v_2(f'(m)).$$
That is, it is sufficient to find an integer m such that
$$f(m) \equiv_{2^e} 0$$
for $e \ge 2v_2(f'(m)) + 1$. We note that
$$n = 4^a(8b - 1) = 2^{2a+3} b - (2^a)^2 \equiv_{2^{2a+3}} -(2^a)^2.$$
Therefore
$$f(2^a) = (2^a)^2 + n \equiv_{2^{2a+3}} (2^a)^2 - (2^a)^2 = 0.$$
Finally, since
$$v_2(f(2^a)) \ge 2a + 3 > 2a + 2 = 2(a + 1) = 2 v_2(2 \cdot 2^a) = 2 v_2(f'(2^a)),$$
we see that Hensel’s Lemma applies and we obtain solutions to $f(x) \equiv_{2^j} 0$ for all j. Consequently, −n is a square modulo every power of 2.
Conversely, suppose that −n is a square modulo every power of 2. Write $n = 2^k m$ for m odd. We know that −n is a square modulo $2^{k+1}$ so that, for some x ∈ N we have
$$-2^k m \equiv_{2^{k+1}} x^2.$$
But then, we can write
$$x^2 = -2^k m + 2^{k+1} \ell$$
for some ℓ which implies that
$$x^2 = 2^k (2\ell - m).$$
But 2ℓ − m is odd since m is odd, and so
$$2 v_2(x) = v_2(x^2) = v_2(2^k (2\ell - m)) = v_2(2^k) + v_2(2\ell - m) = k + 0 = k.$$
We conclude that k = 2a is even. Since this gives $n = 4^a m$, we are reduced to proving that $m \equiv_8 -1$. We will do this by showing that −m is a square modulo 8 which yields the desired result since the only odd square modulo 8 is 1. To this end, we know that −n is a square modulo $2^{2a+3}$. Therefore, for some integer y we have
$$-4^a m \equiv_{2^{2a+3}} y^2, \tag{16.12}$$
which yields an integer j such that
$$-4^a m + 2^{2a+3} j = y^2 \implies 2^{2a}(8j - m) = y^2.$$
We conclude that $2^a \mid y$ so that $y = 2^a z$ for some z. We now see that (16.12) becomes
$$-2^{2a} m \equiv_{2^{2a+3}} 2^{2a} z^2 \implies -m \equiv_{2^3} z^2$$
as was to be shown.
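Lemma 22 lends itself to a small numerical check. The Python sketch below compares, for n up to a chosen limit, the condition “−n is a square modulo $2^j$ for all tested j” with the condition “n has the form $4^a(8b - 1)$”. The function names are hypothetical, and the cut-off on the exponent is an assumption: by the proof above only the moduli $2^{k+1}$ and $2^{2a+3}$ are needed to detect a failure, so exponents up to 10 are enough for n ≤ 200.

```python
def is_square_mod(c, m):
    """True when c is congruent to a square modulo m (brute force)."""
    c %= m
    return any((x * x - c) % m == 0 for x in range(m))


def has_gauss_form(n):
    """True when n = 4^a (8b - 1) for some a >= 0 and b >= 1."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7


def mismatches(limit, max_exp):
    """Values n <= limit where the two conditions of Lemma 22 disagree."""
    bad = []
    for n in range(1, limit + 1):
        everywhere = all(is_square_mod(-n, 2 ** j) for j in range(1, max_exp + 1))
        if everywhere != has_gauss_form(n):
            bad.append(n)
    return bad


print(mismatches(200, 10))  # []
```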
We now proceed to establishing the third equivalence given above by showing that, for a positive
integer n, n is representable as the sum of three rational squares if and only if it is representable
as the sum of three integral squares. This is the content of the following proposition.
Proposition 20. Let $f(X, Y, Z) = X^2 + Y^2 + Z^2$ and n ∈ N. Then f(X, Y, Z) = n has a solution X, Y, Z ∈ Q if and only if f(X, Y, Z) = n has a solution X, Y, Z ∈ Z.
Proof. We will denote $f(x_1, x_2, x_3)$ by f(x) where $x = [x_1, x_2, x_3]^T$ is the associated column vector. We then see that
$$f(x) = \|x\|^2 = x \cdot x,$$
where $\|\cdot\|$ denotes the norm and · denotes the dot product defined on vectors. If f(x) = n has a solution in $\mathbb{Z}^3$, then it is clear that f(x) = n has a solution in $\mathbb{Q}^3$. In fact, we can use the same solution. What needs to be proved here is that the existence of a solution over the rationals implies the existence of a solution over the integers. Suppose then that f(x) = n has a solution $v \in \mathbb{Q}^3$. For 1 ≤ i ≤ 3, write $v_i = r_i/s_i$ for integers $r_i, s_i$ with $s_i > 0$. The equation f(v) = n becomes
$$\frac{r_1^2}{s_1^2} + \frac{r_2^2}{s_2^2} + \frac{r_3^2}{s_3^2} = n.$$
Multiplying through by $(s_1 s_2 s_3)^2$ yields
$$(r_1 s_2 s_3)^2 + (r_2 s_1 s_3)^2 + (r_3 s_1 s_2)^2 = (s_1 s_2 s_3)^2 n.$$
We see, therefore, that there exists a positive integer t such that
$$t^2 n = f(x) \text{ for some } x \in \mathbb{Z}^3.$$
The set
$$S = \{t \in \mathbb{N} \mid t^2 n = f(x) \text{ for some } x \in \mathbb{Z}^3\}$$
is then nonempty and bounded below. By the least integer principle it has a least element t. Say $f(x) = t^2 n$ for $x \in \mathbb{Z}^3$. We aim to prove that t = 1 so that n is represented by f over the integers. Towards a contradiction, suppose that t > 1. For 1 ≤ i ≤ 3, let $y_i$ be the closest integer to $x_i/t$ so that $|y_i - x_i/t| \le \frac{1}{2}$. Define $z = y - \frac{1}{t}x$. Then
$$f(z) = z \cdot z = \sum_{i=1}^{3} \left(y_i - \frac{x_i}{t}\right)^2 \le \sum_{i=1}^{3} \frac{1}{4} = \frac{3}{4} < 1. \tag{16.13}$$
If f(z) = 0, then $\|z\| = 0$ so that z = 0. This forces x = ty so that
$$t^2 n = f(x) = x \cdot x = (ty) \cdot (ty) = t^2 (y \cdot y) = t^2 f(y).$$
But then
$$f(y) = n \qquad (y \in \mathbb{Z}^3)$$
and so we have a representation of n by f over the integers which forces t = 1. On the other hand, if f(z) ≠ 0, then z · z > 0. Define
$$x' = ax + by,$$
for a = f(y) − n and b = 2nt − 2x · y. Then
$$\begin{aligned}
f(x') &= x' \cdot x' \\
&= (ax + by) \cdot (ax + by) \\
&= a^2\, x \cdot x + 2ab\, x \cdot y + b^2\, y \cdot y \\
&= a^2 f(x) + ab(2nt - b) + b^2 f(y) \\
&= a^2 t^2 n + 2abnt - ab^2 + b^2 (a + n) \\
&= a^2 t^2 n + 2abnt - ab^2 + ab^2 + b^2 n \\
&= n(a^2 t^2 + 2abt + b^2) \\
&= n(at + b)^2.
\end{aligned}$$
Thus, with $t' = at + b$ we have $n t'^2 = f(x')$ for $x' \in \mathbb{Z}^3$. However, we have
$$\begin{aligned}
tt' &= at^2 + bt \\
&= t^2\, y \cdot y - t^2 n + 2nt^2 - 2t\, x \cdot y \\
&= t^2\, y \cdot y + t^2 n - 2t\, x \cdot y \\
&= t^2\, y \cdot y - 2t\, x \cdot y + x \cdot x \\
&= (ty - x) \cdot (ty - x) \\
&= t^2 \left(y - \tfrac{1}{t} x\right) \cdot \left(y - \tfrac{1}{t} x\right) \\
&= t^2\, z \cdot z.
\end{aligned}$$
We conclude from (16.13) and our assumption that z · z > 0 that $t' = t\, z \cdot z$ is positive and less than t. But this contradicts the minimality of t.
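Read constructively, the proof of Proposition 20 describes a procedure that turns a rational representation into an integral one: replace (x, t) by (x′, t′) until the denominator reaches 1. The Python sketch below follows the formulas a = f(y) − n, b = 2nt − 2x · y, x′ = ax + by, t′ = at + b from the proof; the function name and the input convention (an integer triple x and a common denominator t with $f(x) = t^2 n$) are assumptions made for this illustration.

```python
def to_integer_solution(n, x, t):
    """Given integers x = (x1, x2, x3) and t >= 1 with x1^2 + x2^2 + x3^2 == t^2 * n,
    return an integer triple whose squares sum to n."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    assert dot(x, x) == t * t * n
    while t > 1:
        # y_i is an integer nearest to x_i / t, so |y_i - x_i / t| <= 1/2.
        y = tuple((2 * xi + t) // (2 * t) for xi in x)
        if all(t * yi == xi for xi, yi in zip(x, y)):
            return y                      # case z = 0: x = t * y
        a = dot(y, y) - n
        b = 2 * n * t - 2 * dot(x, y)
        x = tuple(a * xi + b * yi for xi, yi in zip(x, y))
        t = a * t + b                     # 0 < new t < old t by the proof
    return x


print(to_integer_solution(3, (5, 1, 1), 3))  # (1, 1, 1)
```

Here the input encodes $3 = (5/3)^2 + (1/3)^2 + (1/3)^2$, and a single step of the descent already produces $3 = 1^2 + 1^2 + 1^2$.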
We now provide a rough sketch of a proof of the second equivalence given above and put
everything we have done so far together to obtain an outline of a proof of Theorem 36.
Outline of Proof of Theorem 36. By Lemma 22, we know that n is not of the form $4^a(8b - 1)$ if and only if −n fails to be a square modulo some power of 2. Also, by Proposition 20 we know that n can be written as the sum of three integral squares if and only if it can be written as the sum of three rational squares. All in all, we are reduced to proving that with $f(X, Y, Z) = X^2 + Y^2 + Z^2$, we have
$$f(x) = n \text{ has a solution } x \in \mathbb{Q}^3 \iff -n \text{ fails to be a square modulo some power of 2.} \tag{16.14}$$
Unfortunately, proving this final equivalence lies beyond the scope of these notes. What is required
is the theorem of Hasse-Minkowski that states that nondegenerate quadratic forms have rational
solutions if and only if they have real solutions and p-adic solutions for all primes p. This applies
to f and so f (x) = n has solutions over Q if and only if it has solutions over R and each of the
p-adic fields Qp . The condition f (x) = n having solutions over R forces n > 0 and it turns out that
we automatically obtain solutions over each p-adic field Qp for p odd. It therefore all comes down
to the p = 2 case and it can be shown that having solutions over Q2 is equivalent to −n not being
a square in Q2 . This, in turn, is equivalent to the right hand side of (16.14).
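Although the proof of Theorem 36 is only outlined here, the statement itself is easy to test for small n. The following Python sketch searches for three-square representations by brute force and compares the outcome with the form $4^a(8b - 1)$; the function names are hypothetical and the check is only evidence for small values, not a proof.

```python
from math import isqrt


def is_sum_of_three_squares(n):
    """Search for 0 <= x <= y <= z with x^2 + y^2 + z^2 == n."""
    for x in range(isqrt(n // 3) + 1):
        for y in range(x, isqrt((n - x * x) // 2) + 1):
            rest = n - x * x - y * y
            z = isqrt(rest)
            if z * z == rest:
                return True
    return False


def has_excluded_form(n):
    """True when n = 4^a (8b - 1) for some a >= 0 and b >= 1."""
    while n % 4 == 0:
        n //= 4
    return n % 8 == 7


exceptions = [n for n in range(1, 101) if not is_sum_of_three_squares(n)]
print(exceptions)
# [7, 15, 23, 28, 31, 39, 47, 55, 60, 63, 71, 79, 87, 92, 95]
```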
We now show how Lagrange’s four square theorem follows readily from Theorem 36.
Theorem 38 (Lagrange). Every positive integer can be written as the sum of four squares.
Proof. Let n ∈ N and write $n = 4^k m$ for m not divisible by 4. Since $4^k = (2^k)^2$ is a square, it is sufficient to prove that m is a sum of four squares. Indeed, if $m = a^2 + b^2 + c^2 + d^2$, then $n = (2^k a)^2 + (2^k b)^2 + (2^k c)^2 + (2^k d)^2$. If $m \not\equiv_8 -1$, then, since 4 does not divide m, m is not of the form $4^a(8b - 1)$ and we know from Theorem 36 that m can be written as the sum of three squares. Adding $0^2$ to such an expression shows that every such m can be written as the sum of four squares. On the other hand, if $m \equiv_8 -1$, then Theorem 36 implies that m − 1 can be written as the sum of three squares. Indeed, we’d have
$$m - 1 \equiv_8 -2 \equiv_8 6 \qquad \text{whereas} \qquad 4^a(8b - 1) \equiv_8 \begin{cases} 7 & \text{if } a = 0; \\ -4 \equiv_8 4 & \text{if } a = 1; \\ 0 & \text{if } a \ge 2. \end{cases}$$
Writing $m - 1 = a^2 + b^2 + c^2$ yields $m = a^2 + b^2 + c^2 + 1^2$, which is the sum of four squares.
The following table lists the smallest representations of the positive integers less than or equal
to 100 as the sum of squares. Here, by smallest, we mean that we use the least number of positive
squares necessary, and then pick from all representations using this number of squares the smallest
with respect to the lexicographic ordering.
$1 = 1^2$
$2 = 1^2 + 1^2$
$3 = 1^2 + 1^2 + 1^2$
$4 = 2^2$
$5 = 1^2 + 2^2$
$6 = 1^2 + 1^2 + 2^2$
$7 = 1^2 + 1^2 + 1^2 + 2^2$
$8 = 2^2 + 2^2$
$9 = 3^2$
$10 = 1^2 + 3^2$
$11 = 1^2 + 1^2 + 3^2$
$12 = 2^2 + 2^2 + 2^2$
$13 = 2^2 + 3^2$
$14 = 1^2 + 2^2 + 3^2$
$15 = 1^2 + 1^2 + 2^2 + 3^2$
$16 = 4^2$
$17 = 1^2 + 4^2$
$18 = 3^2 + 3^2$
$19 = 1^2 + 3^2 + 3^2$
$20 = 2^2 + 4^2$
$21 = 1^2 + 2^2 + 4^2$
$22 = 2^2 + 3^2 + 3^2$
$23 = 1^2 + 2^2 + 3^2 + 3^2$
$24 = 2^2 + 2^2 + 4^2$
$25 = 5^2$
$26 = 1^2 + 5^2$
$27 = 1^2 + 1^2 + 5^2$
$28 = 1^2 + 1^2 + 1^2 + 5^2$
$29 = 2^2 + 5^2$
$30 = 1^2 + 2^2 + 5^2$
$31 = 1^2 + 1^2 + 2^2 + 5^2$
$32 = 4^2 + 4^2$
$33 = 1^2 + 4^2 + 4^2$
$34 = 3^2 + 5^2$
$35 = 1^2 + 3^2 + 5^2$
$36 = 6^2$
$37 = 1^2 + 6^2$
$38 = 1^2 + 1^2 + 6^2$
$39 = 1^2 + 1^2 + 1^2 + 6^2$
$40 = 2^2 + 6^2$
$41 = 4^2 + 5^2$
$42 = 1^2 + 4^2 + 5^2$
$43 = 3^2 + 3^2 + 5^2$
$44 = 2^2 + 2^2 + 6^2$
$45 = 3^2 + 6^2$
$46 = 1^2 + 3^2 + 6^2$
$47 = 1^2 + 1^2 + 3^2 + 6^2$
$48 = 4^2 + 4^2 + 4^2$
$49 = 7^2$
$50 = 1^2 + 7^2$
$51 = 1^2 + 1^2 + 7^2$
$52 = 4^2 + 6^2$
$53 = 2^2 + 7^2$
$54 = 1^2 + 2^2 + 7^2$
$55 = 1^2 + 1^2 + 2^2 + 7^2$
$56 = 2^2 + 4^2 + 6^2$
$57 = 2^2 + 2^2 + 7^2$
$58 = 3^2 + 7^2$
$59 = 1^2 + 3^2 + 7^2$
$60 = 1^2 + 1^2 + 3^2 + 7^2$
$61 = 5^2 + 6^2$
$62 = 1^2 + 5^2 + 6^2$
$63 = 1^2 + 1^2 + 5^2 + 6^2$
$64 = 8^2$
$65 = 1^2 + 8^2$
$66 = 1^2 + 1^2 + 8^2$
$67 = 3^2 + 3^2 + 7^2$
$68 = 2^2 + 8^2$
$69 = 1^2 + 2^2 + 8^2$
$70 = 3^2 + 5^2 + 6^2$
$71 = 1^2 + 3^2 + 5^2 + 6^2$
$72 = 6^2 + 6^2$
$73 = 3^2 + 8^2$
$74 = 5^2 + 7^2$
$75 = 1^2 + 5^2 + 7^2$
$76 = 2^2 + 6^2 + 6^2$
$77 = 2^2 + 3^2 + 8^2$
$78 = 2^2 + 5^2 + 7^2$
$79 = 1^2 + 2^2 + 5^2 + 7^2$
$80 = 4^2 + 8^2$
$81 = 9^2$
$82 = 1^2 + 9^2$
$83 = 1^2 + 1^2 + 9^2$
$84 = 2^2 + 4^2 + 8^2$
$85 = 2^2 + 9^2$
$86 = 1^2 + 2^2 + 9^2$
$87 = 1^2 + 1^2 + 2^2 + 9^2$
$88 = 4^2 + 6^2 + 6^2$
$89 = 5^2 + 8^2$
$90 = 3^2 + 9^2$
$91 = 1^2 + 3^2 + 9^2$
$92 = 1^2 + 1^2 + 3^2 + 9^2$
$93 = 2^2 + 5^2 + 8^2$
$94 = 2^2 + 3^2 + 9^2$
$95 = 1^2 + 2^2 + 3^2 + 9^2$
$96 = 4^2 + 4^2 + 8^2$
$97 = 4^2 + 9^2$
$98 = 7^2 + 7^2$
$99 = 1^2 + 7^2 + 7^2$
$100 = 10^2$
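The entries of the table can be regenerated mechanically. The Python sketch below enumerates, for an increasing number k of positive squares, the nondecreasing k-tuples whose squares sum to n in lexicographic order and returns the first one found. This is one way to implement the selection rule described above; the function names are hypothetical.

```python
from math import isqrt


def reps(n, k, lo=1):
    """Yield nondecreasing k-tuples of integers >= lo whose squares sum to n,
    in lexicographic order."""
    if k == 0:
        if n == 0:
            yield ()
        return
    for a in range(lo, isqrt(n) + 1):
        if a * a * k > n:        # the remaining k - 1 entries are all >= a
            break
        for rest in reps(n - a * a, k - 1, a):
            yield (a,) + rest


def smallest_rep(n):
    """Fewest positive squares first, then lexicographically smallest."""
    for k in range(1, 5):        # four squares always suffice (Theorem 38)
        for r in reps(n, k):
            return r


for n in (7, 23, 43, 100):
    print(n, smallest_rep(n))
# 7 (1, 1, 1, 2)
# 23 (1, 2, 3, 3)
# 43 (3, 3, 5)
# 100 (10,)
```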
We have seen that every positive integer can be written as the sum of four squares. However, we have allowed $0^2$ to appear as one of the summands. The question arises: “What is the situation for nonzero squares?” By the following proposition, we see that infinitely many positive integers cannot be written as the sum of four nonzero squares but that every sufficiently large positive integer can be written as the sum of five nonzero squares.
Proposition 21.
(a) No odd power of 2 can be written as the sum of four nonzero squares.
(b) Every integer n > 169 can be written as the sum of five nonzero squares.
Proof.
(a) We prove the result by way of the least integer principle. Let
$$S = \{r \in \mathbb{N}_0 \mid 2^{2r+1} \text{ can be written as the sum of four nonzero squares}\}.$$
We aim to show that S = ∅. Suppose not. Then S is a nonempty set of integers that is bounded below. By the least integer principle it has a least element r. Say
$$2^{2r+1} = x^2 + y^2 + z^2 + w^2 \qquad (1 \le x \le y \le z \le w). \tag{16.15}$$
Since $2^{2r+1} = x^2 + y^2 + z^2 + w^2 \ge 1^2 + 1^2 + 1^2 + 1^2 = 4 = 2^2$, we see that 2r + 1 ≥ 2 so that r ≥ 1. Since this implies that 2r + 1 ≥ 3, we see that
$$x^2 + y^2 + z^2 + w^2 \equiv_8 2^{2r+1} \equiv_8 0.$$
Since every odd square is congruent to 1 modulo 8 and the sum is even, we see that 0, 2 or 4 of x, y, z, w are odd. However, as every even square is congruent to 0 or 4 modulo 8, a calculation shows that
$$x^2 + y^2 + z^2 + w^2 \equiv_8 \begin{cases} 4 & \text{if all of } x, y, z, w \text{ are odd;} \\ 2 \text{ or } 6 & \text{if exactly two of } x, y, z, w \text{ are odd.} \end{cases}$$
Therefore all of x, y, z, w are even. We now divide (16.15) by 4 to obtain
$$2^{2(r-1)+1} = 2^{2r-1} = \left(\frac{x}{2}\right)^2 + \left(\frac{y}{2}\right)^2 + \left(\frac{z}{2}\right)^2 + \left(\frac{w}{2}\right)^2.$$
But r − 1 ≥ 0, and so this last equation implies that r − 1 ∈ S thereby contradicting the minimality of r. We must then have S = ∅ as was to be shown.
(b) Observe that
$$\begin{aligned}
169 &= 13^2 && (16.16) \\
&= 5^2 + 12^2 && (16.17) \\
&= 3^2 + 4^2 + 12^2 && (16.18) \\
&= 1^2 + 2^2 + 8^2 + 10^2. && (16.19)
\end{aligned}$$
Now let n > 169 so that n − 169 is a positive integer. By Lagrange’s four square theorem we know that n − 169 can be written as the sum of four squares. Also, since n − 169 > 0, at least one of the squares involved in such a representation must be nonzero. We can therefore write
$$n - 169 = x^2 + y^2 + z^2 + w^2 \qquad (0 \le x \le y \le z \le w,\ w \ne 0).$$
We then have
$$n = \begin{cases}
13^2 + x^2 + y^2 + z^2 + w^2 & \text{if } x, y, z \ne 0; \\
5^2 + 12^2 + y^2 + z^2 + w^2 & \text{if } x = 0,\ y, z \ne 0; \\
3^2 + 4^2 + 12^2 + z^2 + w^2 & \text{if } x = y = 0,\ z \ne 0; \\
1^2 + 2^2 + 8^2 + 10^2 + w^2 & \text{if } x = y = z = 0.
\end{cases}$$
In any case, we have represented n as the sum of five nonzero squares, as required.
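Both parts of Proposition 21 can be explored numerically. The Python sketch below implements the construction from part (b), using a brute-force four-square search for n − 169 and then the decompositions (16.16) to (16.19) of 169, and also tests part (a) for a few odd powers of 2. All function names are hypothetical, and the brute-force search is an assumption made for simplicity rather than an efficient method.

```python
from math import isqrt


def four_squares(m):
    """Return (x, y, z, w) with 0 <= x <= y <= z <= w and x^2+y^2+z^2+w^2 == m."""
    for x in range(isqrt(m // 4) + 1):
        for y in range(x, isqrt((m - x * x) // 3) + 1):
            for z in range(y, isqrt((m - x * x - y * y) // 2) + 1):
                rest = m - x * x - y * y - z * z
                w = isqrt(rest)
                if w * w == rest:
                    return x, y, z, w
    raise ValueError("no representation found (excluded by Theorem 38)")


# 169 written with 1, 2, 3 or 4 nonzero squares, as in (16.16)-(16.19)
PIECES = {0: (13,), 1: (5, 12), 2: (3, 4, 12), 3: (1, 2, 8, 10)}


def five_nonzero_squares(n):
    """Five positive integers whose squares sum to n, for n > 169."""
    rest = four_squares(n - 169)
    zeros = rest.count(0)            # at most 3, since n - 169 > 0
    return PIECES[zeros] + tuple(v for v in rest if v != 0)


def has_four_nonzero_squares(m):
    """True when m is a sum of four positive squares."""
    for x in range(1, isqrt(m // 4) + 1):
        for y in range(x, isqrt((m - x * x) // 3) + 1):
            for z in range(y, isqrt((m - x * x - y * y) // 2) + 1):
                rest = m - x * x - y * y - z * z
                w = isqrt(rest)
                if w * w == rest:
                    return True
    return False


print(five_nonzero_squares(170))  # (1, 2, 8, 10, 1)
```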
Chapter 17
$x^2 - Ny^2 = 1$
This chapter is based on [Dud08, §20].
We have been interested in this course in the values taken on by polynomials in two variables
having integer coefficients for which every term has the same degree. Such polynomials are called
binary homogeneous forms. We have already studied this question for linear forms as well as a
particular case of a quadratic form. In the linear case, we ask which values c are taken on by the
form ax + by. We found that c is not taken on by this form when (a, b) ∤ c while when (a, b) | c, c is taken on by this form infinitely often. Indeed, we found, in case (a, b) | c, by way of the Euclidean Algorithm that c is taken on by the form ax + by at least once and that if $ax_0 + by_0 = c$ then c is taken on by the form infinitely often as is seen by taking
$$x = x_0 + t\,\frac{b}{(a, b)}, \qquad y = y_0 - t\,\frac{a}{(a, b)} \qquad (t \in \mathbb{Z}).$$
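As a reminder of how this family of solutions arises in practice, the following Python sketch finds one solution $(x_0, y_0)$ with the extended Euclidean algorithm and then applies the displayed formulas for a range of values of t. The function names are hypothetical and a, b are assumed to be positive integers.

```python
def ext_gcd(a, b):
    """Return (g, x, y) with a*x + b*y == g == gcd(a, b), for positive a, b."""
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y


def linear_solutions(a, b, c, ts):
    """Solutions (x, y) of a*x + b*y == c, one for each t in ts; [] if none exist."""
    g, x0, y0 = ext_gcd(a, b)
    if c % g != 0:
        return []                      # (a, b) does not divide c
    x0, y0 = x0 * (c // g), y0 * (c // g)
    return [(x0 + t * (b // g), y0 - t * (a // g)) for t in ts]


print(linear_solutions(6, 10, 8, range(-1, 2)))  # [(3, -1), (8, -4), (13, -7)]
```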
The next level of complexity is given by binary quadratic forms. In this case, we ask which integers c are taken on by the form
$$ax^2 + bxy + dy^2 \qquad (a, b, d \in \mathbb{Z}).$$
The particular case obtained by setting a = d = 1 and b = 0 was studied earlier in this course. This case calls for the classification of the integers c that can be written as the sum of two squares. In this section we consider another binary quadratic form, namely $x^2 - dy^2$ for d ∈ Z. We will show that for d positive and not a square, we obtain infinitely many representations of 1 by this form. So, we can obtain infinitely many representations for a given integer by binary linear forms and binary quadratic forms. This is where it stops, however, as is shown by the following theorem.
Theorem 39. Let $f(x) = \sum_{k=0}^{n} a_k x^k$ be a polynomial with integer coefficients of degree n ≥ 3 with no repeated roots. Then, for any nonzero integer c, the binary form F of degree n given by
$$F(x, y) = \sum_{k=0}^{n} a_k x^k y^{n-k}$$
takes on the value c only finitely often.
We now turn to determining the solutions to Pell’s equation
$$x^2 - dy^2 = 1.$$
That is, we determine when 1 can be represented by the binary quadratic form $x^2 - dy^2$. First of all, since we are dealing with squares, the nontrivial solutions (xy ≠ 0) are determined by the positive solutions (x, y > 0). Also, the equation is not interesting when d = 0. Further, by the following lemma, in searching for positive solutions, we can assume that d is positive and not a square.
Lemma 23. If d < 0 or d is a nonzero square, then there are no nontrivial solutions to $x^2 - dy^2 = 1$.
Proof. If d < 0 and neither x nor y is zero then $x^2, y^2 \ge 1$ and d ≤ −1. Therefore $x^2 - dy^2 \ge 1 + 1 = 2 > 1$. We therefore have no nontrivial solutions. If $d = m^2$ is a square with m ≠ 0, then our equation becomes
$$x^2 - (my)^2 = (x - my)(x + my) = 1.$$
Therefore x − my = x + my = ±1. In particular, we have 2my = 0 so that y = 0. We therefore have only trivial solutions.
We therefore arrive at the determination of the positive solutions to $x^2 - dy^2 = 1$ for d ∈ N not a square. The observation that we can factor our equation as
$$(x + y\sqrt{d})(x - y\sqrt{d}) = 1,$$
leads us to the question of when a real number of the form $x + y\sqrt{d}$ has an inverse of the same form. To answer this question, we develop some properties of the set $\mathbb{Z}[\sqrt{d}]$ of numbers of this form and its subset $\mathbb{Z}[\sqrt{d}]^\times$ consisting of the numbers of this form having inverse also of this form.
Definition 19. Let d ∈ N not be a square. We define
1. $\mathbb{Z}[\sqrt{d}] = \{a + b\sqrt{d} \mid a, b \in \mathbb{Z}\}$
2. $\mathbb{Z}[\sqrt{d}]^\times = \{a + b\sqrt{d} \mid a, b \in \mathbb{Z},\ (a + b\sqrt{d})(r + s\sqrt{d}) = 1 \text{ for some } r, s \in \mathbb{Z}\}$
Remark 8. For d ∈ N not a square, the set
$$\mathbb{Q}(\sqrt{d}) = \{a + b\sqrt{d} \mid a, b \in \mathbb{Q}\}$$
is a real quadratic field and the subset $\mathbb{Z}[\sqrt{d}]$ consists of elements that satisfy a monic quadratic polynomial with integer coefficients. The set of all such elements of $\mathbb{Q}(\sqrt{d})$ is called the ring of integers in the number field $\mathbb{Q}(\sqrt{d})$. In fact, for squarefree d, unless $d \equiv_4 1$, $\mathbb{Z}[\sqrt{d}]$ will actually be equal to the ring of integers in $\mathbb{Q}(\sqrt{d})$. In case $d \equiv_4 1$, we get more elements in the ring of integers, and it can be shown that the entire ring of integers in $\mathbb{Q}(\sqrt{d})$ is given by
$$\mathbb{Z}\left[\frac{1 + \sqrt{d}}{2}\right] = \left\{a + b\,\frac{1 + \sqrt{d}}{2} \;\middle|\; a, b \in \mathbb{Z}\right\}.$$
It will be important for us to note that $\mathbb{Z}[\sqrt{d}]$ is an integral domain and that $\mathbb{Z}[\sqrt{d}]^\times$ is an abelian group. The relevant definitions are given below.
Definition 20.
1. An abelian group is a set G together with a binary operation ∗ : G × G → G such that
(a) (Closure under ∗) For all a, b ∈ G, a ∗ b ∈ G;
(b) (Associativity of ∗) For all a, b, c ∈ G we have (a ∗ b) ∗ c = a ∗ (b ∗ c);
(c) (Existence of Identity) There exists an element e ∈ G such that for all a ∈ G we have
a ∗ e = e ∗ a = a;
(d) (Existence of Inverse) For all a ∈ G there exists b ∈ G such that a ∗ b = b ∗ a = e;
(e) (Commutativity of ∗) For all a, b ∈ G, a ∗ b = b ∗ a.
2. An integral domain is a set D together with binary operations + and · such that
(a) (D, +) is an abelian group (with identity element denoted by 0)
(b) (Closure under ·) For all a, b ∈ D, a · b ∈ D;
(c) (Associativity of ·) For all a, b, c ∈ D we have (a · b) · c = a · (b · c);
(d) (Existence of Identity) There exists an element 1 ∈ D such that for all a ∈ D we have
a · 1 = 1 · a = a;
(e) (Commutativity of ·) For all a, b ∈ D, a · b = b · a.
(f) (D fails to have zero divisors) For all a, b ∈ D, if ab = 0 then a = 0 or b = 0.
(g) (· distributes over +) For all a, b, c ∈ D we have
a · (b + c) = (a · b) + (a · c)
Remark 9. We denote a · b by ab when it proves convenient to do so. We note that when D is an integral domain, the subset $D^\times$ consisting of all invertible elements is an abelian group under ·. Also, fields are precisely the sets of numbers obtained by taking quotients of elements in an integral domain. For example, Z is an integral domain, and Q is its field of quotients. By the proposition below, we obtain another example of this: $\mathbb{Z}[\sqrt{d}]$ is an integral domain and the real quadratic field $\mathbb{Q}(\sqrt{d})$ is its field of quotients.
Proposition 22. Let d ∈ Z. Then $\mathbb{Z}[\sqrt{d}]$ is an integral domain. In particular, $\mathbb{Z}[\sqrt{d}]^\times$ is an abelian group.
Proof. Since $\mathbb{Z}[\sqrt{d}] \subseteq \mathbb{C}$, and we are using the usual operations, the verification that $\mathbb{Z}[\sqrt{d}]$ is an integral domain reduces to the verification of closure under the operations in question. Indeed, any of the axioms defining an integral domain that hold in C will automatically hold in subsets of C as long as we remain inside the subset when we apply the operations in question. In our case, we are reduced to verifying the following:
1. For all $\alpha, \beta \in \mathbb{Z}[\sqrt{d}]$ we have $\alpha + \beta \in \mathbb{Z}[\sqrt{d}]$;
2. $0 \in \mathbb{Z}[\sqrt{d}]$;
3. For all $\alpha \in \mathbb{Z}[\sqrt{d}]$ we have $-\alpha \in \mathbb{Z}[\sqrt{d}]$;
4. For all $\alpha, \beta \in \mathbb{Z}[\sqrt{d}]$ we have $\alpha\beta \in \mathbb{Z}[\sqrt{d}]$;
5. $1 \in \mathbb{Z}[\sqrt{d}]$.
In each case, we simply need to note that the element in question can be written in the form $x + y\sqrt{d}$ for integers x, y. But this follows from a quick calculation together with the fact that Z is closed under the operations in question:
1. If $\alpha = a + b\sqrt{d}$ and $\beta = r + s\sqrt{d}$ for integers a, b, r, s, then $\alpha + \beta = (a + r) + (b + s)\sqrt{d}$ is of the desired form since a + r and b + s are integers;
2. $0 = 0 + 0\sqrt{d}$ is of the desired form since 0 ∈ Z;
3. If $\alpha = a + b\sqrt{d}$ for integers a and b, then $-\alpha = (-a) + (-b)\sqrt{d}$ is of the desired form since −a, −b ∈ Z;
4. If $\alpha = a + b\sqrt{d}$ and $\beta = r + s\sqrt{d}$ for integers a, b, r and s, then $\alpha\beta = (ar + dbs) + (as + br)\sqrt{d}$ is of the desired form since ar + dbs, as + br ∈ Z;
5. $1 = 1 + 0\sqrt{d}$ is of the desired form since 1, 0 ∈ Z.
Aside 3. Any field extension of Q of finite degree (i.e. any number field) can be seen to be a finite-dimensional vector space over Q. In our case of interest, we are dealing with the quadratic field $\mathbb{Q}(\sqrt{d}) = \{a + b\sqrt{d} \mid a, b \in \mathbb{Q}\}$. It follows from the fact that $\sqrt{d}$ is irrational that every element of $\mathbb{Q}(\sqrt{d})$ can be written uniquely as a linear combination of 1 and $\sqrt{d}$ with rational coefficients. Therefore $\mathbb{Q}(\sqrt{d})$ is a vector space over Q of dimension 2 with basis $\{1, \sqrt{d}\}$. A similar argument applies to $\mathbb{Z}[\sqrt{d}]$, but we need to use different language to express this fact. This is due to the fact that vector spaces are only defined when the scalars come from a field like Q, R or C or a number field or a finite field like the integers modulo a prime. In case the scalars come from an integral domain like Z, we use the notion of a module. Since we don’t always obtain bases for modules over an integral domain, we attach the word “free” to the description in case a basis exists. The following proposition can be seen as expressing the fact that $\mathbb{Z}[\sqrt{d}]$ is a free module of rank 2 over Z with basis $\{1, \sqrt{d}\}$.
Proposition 23. Every element $\alpha \in \mathbb{Z}[\sqrt{d}]$ can be written uniquely in the form $\alpha = a + b\sqrt{d}$ for a, b ∈ Z. That is, for a, b, r, s ∈ Z we have
$$a + b\sqrt{d} = r + s\sqrt{d} \iff a = r \text{ and } b = s.$$
Proof. All we need to note here is that $a + b\sqrt{d} = r + s\sqrt{d}$ forces
$$(b - s)\sqrt{d} = r - a.$$
Therefore, b − s ≠ 0 would imply that
$$\sqrt{d} = \frac{r - a}{b - s} \in \mathbb{Q}$$
contrary to our assumption that d is not a square. Therefore b − s = 0 so that b = s. We then have r = a as well.
From now on, when we say that $\alpha = a + b\sqrt{d} \in \mathbb{Z}[\sqrt{d}]$, it will be understood that a, b ∈ Z. In light of the previous result, a and b are uniquely determined by α. We call a and b the components of α. We now define the conjugate and norm of elements of $\mathbb{Z}[\sqrt{d}]$.
Definition 21. Let $\alpha = a + b\sqrt{d} \in \mathbb{Z}[\sqrt{d}]$.
1. The conjugate of α, denoted $\overline{\alpha}$, is defined by $\overline{\alpha} = a - b\sqrt{d} \in \mathbb{Z}[\sqrt{d}]$.
2. The norm of α, denoted N(α), is defined by $N(\alpha) = \alpha\overline{\alpha} = a^2 - db^2 \in \mathbb{Z}$.
We can now rephrase our problem as the search for all $\alpha \in \mathbb{Z}[\sqrt{d}]$ having positive components and norm equal to 1. We will use the following result to show that once a single α having positive components and norm equal to 1 can be found, all positive powers of α are also such elements.
Proposition 24. Let $\alpha, \beta \in \mathbb{Z}[\sqrt{d}]$. Then N(αβ) = N(α)N(β).
Proof. Suppose that $\alpha = a + b\sqrt{d}$ and that $\beta = r + s\sqrt{d}$ for integers a, b, r, s. We compute
$$\begin{aligned}
N(\alpha\beta) &= N\big((ar + dbs) + (as + br)\sqrt{d}\big) \\
&= (ar + dbs)^2 - d(as + br)^2 \\
&= a^2 r^2 + 2dabrs + d^2 b^2 s^2 - da^2 s^2 - 2dabrs - db^2 r^2 \\
&= a^2 r^2 - d(a^2 s^2 + b^2 r^2) + d^2 b^2 s^2 \\
&= (a^2 - db^2)(r^2 - ds^2) \\
&= N(\alpha)N(\beta).
\end{aligned}$$
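The arithmetic of $\mathbb{Z}[\sqrt{d}]$ is straightforward to model with pairs of integer components. The Python sketch below stores an element $a + b\sqrt{d}$ as the triple (a, b, d) and implements the product formula, the conjugate and the norm, so that the multiplicativity of the norm can be spot-checked. The class name `QuadInt` and its methods are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuadInt:
    """The element a + b*sqrt(d) with integer components a and b."""
    a: int
    b: int
    d: int

    def __mul__(self, other):
        assert self.d == other.d
        # (a + b sqrt d)(r + s sqrt d) = (ar + dbs) + (as + br) sqrt d
        return QuadInt(self.a * other.a + self.d * self.b * other.b,
                       self.a * other.b + self.b * other.a,
                       self.d)

    def conj(self):
        return QuadInt(self.a, -self.b, self.d)

    def norm(self):
        return self.a * self.a - self.d * self.b * self.b


alpha = QuadInt(3, 2, 2)   # 3 + 2 sqrt 2, norm 1
beta = QuadInt(1, 1, 2)    # 1 + sqrt 2, norm -1
print((alpha * beta).norm(), alpha.norm() * beta.norm())  # -1 -1
```

Multiplying an element by its conjugate returns the norm in the first component and 0 in the second, for instance `alpha * alpha.conj()` is `QuadInt(1, 0, 2)`.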
We now show that $\mathbb{Z}[\sqrt{d}]^\times$ consists precisely of the elements of $\mathbb{Z}[\sqrt{d}]$ having norm equal to 1 or −1. Our problem is then to determine when we get +1 rather than −1.
Proposition 25. Let d ∈ N not be a square. Then $\mathbb{Z}[\sqrt{d}]^\times = \{\alpha \in \mathbb{Z}[\sqrt{d}] \mid N(\alpha) = \pm 1\}$.
Proof. Let $\alpha \in \mathbb{Z}[\sqrt{d}]^\times$. Then there exists $\beta \in \mathbb{Z}[\sqrt{d}]$ such that αβ = 1. Taking norms of both sides yields
$$N(\alpha)N(\beta) = N(\alpha\beta) = N(1) = 1.$$
Since N(α), N(β) ∈ Z, we conclude that N(α) ∈ {±1}. Conversely, suppose that $\alpha \in \mathbb{Z}[\sqrt{d}]$ satisfies N(α) = ±1. We then have
$$\alpha\overline{\alpha} = \pm 1.$$
We conclude that one of $\overline{\alpha}$, $-\overline{\alpha}$ is the inverse of α. Therefore, α is invertible in $\mathbb{Z}[\sqrt{d}]$ so that $\alpha \in \mathbb{Z}[\sqrt{d}]^\times$, as required.
Having established that $\mathbb{Z}[\sqrt{d}]^\times$ consists of the elements $\alpha \in \mathbb{Z}[\sqrt{d}]$ with norm equal to ±1, we see that the invertible elements are split into two categories. We have the ones of norm 1 that correspond to solutions to $x^2 - dy^2 = 1$ and the ones of norm −1 that correspond to solutions to $x^2 - dy^2 = -1$. Since we are interested here only in the ones giving +1, we introduce the notation $\mathbb{Z}[\sqrt{d}]^\times_+$ for these elements. That is
$$\mathbb{Z}[\sqrt{d}]^\times_+ = \{\alpha \in \mathbb{Z}[\sqrt{d}] \mid N(\alpha) = 1\}.$$
Our problem can now be formulated as determining the elements of $\mathbb{Z}[\sqrt{d}]^\times_+$ having positive components. The outline of this classification is as follows:
1. We show that for $\alpha, \beta \in \mathbb{Z}[\sqrt{d}]^\times_+$ with positive components, we have α < β if and only if their first components satisfy the same inequality.
2. This allows us to order the elements of interest by ordering their first components.
3. We then apply the least integer principle to the set of first components of the elements of interest to obtain a least first component.
4. We show that this corresponds to a least θ > 1, called the generator for $x^2 - dy^2 = 1$, among our elements of interest.
5. We show that every element of interest greater than one is a positive power of θ.
6. We conclude that all positive solutions to $x^2 - dy^2 = 1$ correspond to $x + y\sqrt{d}$ being a positive power of θ.
Among the above steps in the classification of the elements of $\mathbb{Z}[\sqrt{d}]^\times_+$ with positive components, the one that is the most difficult to establish is the fact that the set of first components of the elements of interest is nonempty. This is required in order to apply the least integer principle. Equivalently, it is difficult to prove that there exists at least one positive solution to $x^2 - dy^2 = 1$, but once we know that there is at least one, it isn’t too difficult to describe the rest of the solutions. We now complete the classification of the elements of interest by establishing (1)–(6) above. We start with the verification of (1).
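The outline (1)–(6) has a direct computational counterpart: search for the solution with least first component, then take successive powers of the corresponding element. The Python sketch below does this by brute force. The function names are hypothetical, d is assumed to be a positive non-square (otherwise the search does not terminate), and the naive search can be very slow for values of d whose least solution is large.

```python
from math import isqrt


def pell_generator(d):
    """Least positive solution (a, b) of a^2 - d*b^2 == 1, trying a = 2, 3, ..."""
    a = 2
    while True:
        q, rem = divmod(a * a - 1, d)
        b = isqrt(q)
        if rem == 0 and b * b == q and b > 0:
            return a, b
        a += 1


def pell_solutions(d, count):
    """The first `count` positive solutions, as components of theta, theta^2, ..."""
    a, b = pell_generator(d)
    x, y = a, b
    out = []
    for _ in range(count):
        out.append((x, y))
        # (x + y sqrt d)(a + b sqrt d) = (ax + dby) + (ay + bx) sqrt d
        x, y = a * x + d * b * y, a * y + b * x
    return out


print(pell_solutions(2, 3))  # [(3, 2), (17, 12), (99, 70)]
```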
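Proposition 26. Let $\alpha = a + b\sqrt{d}$ and $\beta = r + s\sqrt{d}$ lie in $\mathbb{Z}[\sqrt{d}]^\times_+$ and have positive components. Then
$$\alpha < \beta \iff a < r.$$
Proof. We have $a^2 - db^2 = r^2 - ds^2 = 1$, and a, b, d, r, s ≥ 1. Therefore
$$\begin{aligned}
a < r &\implies a^2 < r^2 \\
&\implies db^2 + 1 < ds^2 + 1 \\
&\implies db^2 < ds^2 \\
&\implies b^2 < s^2 \\
&\implies b < s \\
&\implies a + b\sqrt{d} < r + s\sqrt{d} \\
&\implies \alpha < \beta.
\end{aligned}$$
Proving the converse is entirely similar:
$$\begin{aligned}
a \ge r &\implies a^2 \ge r^2 \\
&\implies db^2 + 1 \ge ds^2 + 1 \\
&\implies db^2 \ge ds^2 \\
&\implies b^2 \ge s^2 \\
&\implies b \ge s \\
&\implies a + b\sqrt{d} \ge r + s\sqrt{d} \\
&\implies \alpha \ge \beta.
\end{aligned}$$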
As outlined above, we now define
$$S = \{a \in \mathbb{N} \mid a + b\sqrt{d} \in \mathbb{Z}[\sqrt{d}]^\times_+ \text{ for some } b \in \mathbb{N}\};$$
$$S_{\mathbb{Z}[\sqrt{d}]} = \{a + b\sqrt{d} \in \mathbb{Z}[\sqrt{d}] \mid a, b \in \mathbb{N},\ a \in S\} = \{a + b\sqrt{d} \in \mathbb{Z}[\sqrt{d}]^\times_+ \mid a, b \in \mathbb{N}\},$$
invoke the least integer principle to obtain a least element in S and then show that this corresponds to a least element θ of $S_{\mathbb{Z}[\sqrt{d}]}$ whose positive powers provide us with all of the positive solutions to $x^2 - dy^2 = 1$. Since proving that S is nonempty is harder than the rest of the steps involved, we will save the proof of this fact for last.
Proposition 27. With the above notation, we have $S \neq \emptyset$.
Assuming this result for the time being, we see that we can invoke the least integer principle
to obtain a least element a of S since S is bounded below. Since a ∈ S, we then have a positive
integer b such that $a + b\sqrt{d} \in \mathbb{Z}[\sqrt{d}]^{\times}_{+}$. In fact, there is exactly one such b since the components of
our elements of interest are uniquely determined. Define $\theta = a + b\sqrt{d}$. We now show that $S_{\mathbb{Z}[\sqrt{d}]}$
has θ as a minimum.
Lemma 24. With the above notation, we have
(a) $S_{\mathbb{Z}[\sqrt{d}]} = \mathbb{Z}[\sqrt{d}]^{\times}_{+} \cap (1, \infty)$
(b) $\theta = \min S_{\mathbb{Z}[\sqrt{d}]}$.
Proof.
(a) It is clear that if α has positive components, then α > 1. Conversely, suppose that
$\alpha = a + b\sqrt{d} \in \mathbb{Z}[\sqrt{d}]^{\times}_{+}$ is greater than 1. Then, since $\alpha\overline{\alpha} = 1$, we conclude that $0 < \overline{\alpha} < 1$.
We have
$$1 < \alpha = \overline{\alpha} + 2b\sqrt{d} < 1 + 2b\sqrt{d}.$$
Consequently, $2b\sqrt{d} > 0$ so that b > 0. Having established that b is positive, we now obtain
$$\overline{\alpha} > 0 \implies a - b\sqrt{d} > 0 \implies a > b\sqrt{d} > 0.$$
Therefore, α has positive components so that $\alpha \in S_{\mathbb{Z}[\sqrt{d}]}$, as required.
(b) To prove this part, we first note that since $\theta \in S_{\mathbb{Z}[\sqrt{d}]}$ by construction, we are reduced to
proving that for any $\alpha \in S_{\mathbb{Z}[\sqrt{d}]}$, we have θ ≤ α. But this follows readily from Proposition
26. Indeed, if $\alpha = r + s\sqrt{d}$ for positive r and s, then r ∈ S so that a ≤ r since a is the least
element of S. We conclude that θ ≤ α, as required.
We now show that every element of interest is a positive power of θ.
Proposition 28. With the above notation, we have
$$S_{\mathbb{Z}[\sqrt{d}]} = \{\theta^k \mid k \in \mathbb{N}\}.$$
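Proof. Since $\theta \in S_{\mathbb{Z}[\sqrt{d}]}$ and $S_{\mathbb{Z}[\sqrt{d}]}$ is closed under multiplication, a routine induction shows that
$$\{\theta^k \mid k \in \mathbb{N}\} \subseteq S_{\mathbb{Z}[\sqrt{d}]}.$$
Conversely, suppose that $\alpha \in S_{\mathbb{Z}[\sqrt{d}]}$. Then 1 < θ ≤ α so that α lies between two consecutive
positive powers of θ. Say
$$\theta^k \le \alpha < \theta^{k+1} \qquad (k \in \mathbb{N}).$$
We obtain upon dividing by $\theta^k$ that
$$1 \le \theta^{-k}\alpha < \theta.$$
We must then have $\theta^{-k}\alpha = 1$, for otherwise $\theta^{-k}\alpha \in S_{\mathbb{Z}[\sqrt{d}]}$ due to part (a) of Lemma 24. Since
$\theta^{-k}\alpha$ is strictly smaller than θ, this would contradict the minimality of θ. Therefore
$$\theta^{-k}\alpha = 1$$
so that
$$\alpha = \theta^k.$$
Since this implies that
$$S_{\mathbb{Z}[\sqrt{d}]} \subseteq \{\theta^k \mid k \in \mathbb{N}\},$$
and we have already established the reverse containment, we conclude that
$$S_{\mathbb{Z}[\sqrt{d}]} = \{\theta^k \mid k \in \mathbb{N}\}$$
as required.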
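Proposition 28 can be illustrated numerically. The sketch below represents $a + b\sqrt{d}$ as the integer pair `(a, b)`, computes the first few powers of a generator, and compares them with a brute-force list of positive solutions. The helper names `mul`, `powers` and `brute_force`, and the choice d = 3 with generator $2 + \sqrt{3}$, are assumptions made for the illustration.

```python
from math import isqrt

def mul(u, v, d):
    """Multiply u = (a, b) and v = (r, s), read as a + b*sqrt(d) and r + s*sqrt(d)."""
    a, b = u
    r, s = v
    return (a * r + d * b * s, a * s + b * r)

def powers(theta, d, count):
    """Return [theta^1, ..., theta^count] as integer pairs."""
    out, cur = [], theta
    for _ in range(count):
        out.append(cur)
        cur = mul(cur, theta, d)
    return out

def brute_force(d, max_a):
    """Return all (a, b) with a, b >= 1, a <= max_a and a*a - d*b*b == 1."""
    sols = []
    for a in range(2, max_a + 1):
        q, r = divmod(a * a - 1, d)
        b = isqrt(q)
        if r == 0 and b >= 1 and b * b == q:
            sols.append((a, b))
    return sols

d, theta = 3, (2, 1)            # theta = 2 + sqrt(3)
print(powers(theta, d, 5))
print(brute_force(d, 400))
```

Both lists should coincide as long as the brute-force bound lies between the first components of the fifth and sixth powers.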
We now have all that is required to classify the positive solutions to $x^2 - dy^2 = 1$.
Theorem 40. Let d ∈ N not be a square and θ be the generator for $x^2 - dy^2 = 1$. The positive
solutions to $x^2 - dy^2 = 1$ are precisely the components of the positive powers of θ. That is, all
solutions to $x^2 - dy^2 = 1$ in positive integers are given by
$$x = \frac{\theta^k + \overline{\theta}^{\,k}}{2}, \qquad y = \frac{\theta^k - \overline{\theta}^{\,k}}{2\sqrt{d}} \qquad (k \in \mathbb{N}).$$
Proof. We have seen that the positive solutions correspond to the elements of $S_{\mathbb{Z}[\sqrt{d}]}$, all of which
are positive powers of θ. It follows that the positive solutions to $x^2 - dy^2 = 1$ are given
by the components of such elements. The last part follows from the observation that the formulae
given extract the components of $\theta^k$. Indeed, if $\theta^k = a_k + b_k\sqrt{d}$, then
$$\theta^k = a_k + b_k\sqrt{d}, \qquad \overline{\theta^k} = a_k - b_k\sqrt{d}.$$
If we solve these equations for $a_k$ and $b_k$, and use the fact that $\overline{\theta^k} = \overline{\theta}^{\,k}$, we obtain
$$a_k = \frac{\theta^k + \overline{\theta}^{\,k}}{2}, \qquad b_k = \frac{\theta^k - \overline{\theta}^{\,k}}{2\sqrt{d}},$$
as required.
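The closed formulae of Theorem 40 can be compared with exact integer arithmetic for small k. The sketch below is a numerical illustration only: the closed form is evaluated in floating point and rounded, which is reliable for small exponents but not in general. The function names and the choice d = 2 with generator $3 + 2\sqrt{2}$ are assumptions made for the example.

```python
from math import sqrt

def components_closed_form(a, b, d, k):
    """Evaluate x = (theta^k + conj^k)/2 and y = (theta^k - conj^k)/(2*sqrt(d))
    in floating point, where theta = a + b*sqrt(d) and conj = a - b*sqrt(d)."""
    root = sqrt(d)
    theta, conj = a + b * root, a - b * root
    x = (theta ** k + conj ** k) / 2
    y = (theta ** k - conj ** k) / (2 * root)
    return round(x), round(y)

def components_exact(a, b, d, k):
    """Compute the components of (a + b*sqrt(d))^k by repeated integer multiplication."""
    x, y = a, b
    for _ in range(k - 1):
        x, y = x * a + d * y * b, x * b + y * a
    return x, y

for k in range(1, 9):
    print(k, components_exact(3, 2, 2, k), components_closed_form(3, 2, 2, k))
```

For larger k the integer recurrence in `components_exact` is the safer choice, since the floating-point powers eventually lose precision.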
We now turn to the proof that $S \neq \emptyset$. That is, we establish that there exist positive integers a
and b such that $a^2 - db^2 = 1$.
Proof of Proposition 27. We start with a proposition due to Dirichlet regarding approximating
irrational numbers by rational numbers. The proof requires invoking the Pigeonhole Principle.
This principle states that if one has n + 1 pigeons to place in n pigeonholes then at least one of the
pigeonholes contains at least two pigeons.
Proposition 29. Let ξ ∈ R \ Q. Then there exist infinitely many rational numbers x/y with x, y
relatively prime such that
$$\left|\xi - \frac{x}{y}\right| < \frac{1}{y^2}.$$
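Proof. Let n ∈ N and consider the partition of the half-open unit interval given by
$$[0, 1) = \left[0, \frac{1}{n}\right) \cup \left[\frac{1}{n}, \frac{2}{n}\right) \cup \cdots \cup \left[\frac{n-2}{n}, \frac{n-1}{n}\right) \cup \left[\frac{n-1}{n}, 1\right).$$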
Recall that for real numbers α, the floor of α, denoted $\lfloor\alpha\rfloor$, is defined to be the largest integer less
than or equal to α and the fractional part of α, denoted {α}, is defined by $\{\alpha\} = \alpha - \lfloor\alpha\rfloor$. It is
clear that for any α ∈ R, we have {α} ∈ [0, 1). Consider the following list of numbers:
$$\{0\xi\}, \{1\xi\}, \{2\xi\}, \ldots, \{n\xi\} \in [0, 1).$$
These n + 1 numbers (representing pigeons) all lie in one of the n subintervals of [0, 1) listed above
(representing the pigeonholes). By the Pigeonhole Principle, we conclude that at least one of the
subintervals above contains at least two of the numbers listed above. That is, for some 0 ≤ j ≤ n−1,
there exist integers k and ℓ with 0 ≤ k < ℓ ≤ n such that
$$\{k\xi\}, \{\ell\xi\} \in \left[\frac{j}{n}, \frac{j+1}{n}\right).$$
Thus
$$|\{\ell\xi\} - \{k\xi\}| < \frac{1}{n}.$$
Using the floor rather than the fractional part, we obtain
$$|(\ell\xi - \lfloor\ell\xi\rfloor) - (k\xi - \lfloor k\xi\rfloor)| < \frac{1}{n}.$$
We therefore have
$$|(\ell - k)\xi - (\lfloor\ell\xi\rfloor - \lfloor k\xi\rfloor)| < \frac{1}{n}.$$
Let $a = \lfloor\ell\xi\rfloor - \lfloor k\xi\rfloor$, $b = \ell - k$, $g = \gcd(a, b)$ and define x = a/g, y = b/g. We then have (x, y) = 1
and
$$|gy\xi - gx| < \frac{1}{n}.$$
Dividing by gy and noticing that 0 < y ≤ n and g ≥ 1 yields
$$\left|\xi - \frac{x}{y}\right| < \frac{1}{ngy} \le \frac{1}{ny} \le \frac{1}{y^2}.$$
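We have therefore shown that there exists a rational x/y with (x, y) = 1 such that
$$\left|\xi - \frac{x}{y}\right| < \frac{1}{y^2}.$$
Now, since ξ is irrational, we have
$$\left|\xi - \frac{x}{y}\right| > 0.$$
We can then choose an integer m such that
$$m > \frac{1}{\left|\xi - \frac{x}{y}\right|}$$
and apply the above argument with n = m to obtain relatively prime integers $x_1, y_1$ with $0 < y_1 \le m$
such that
$$\left|\xi - \frac{x_1}{y_1}\right| < \frac{1}{y_1^2}$$
and in fact
$$\left|\xi - \frac{x_1}{y_1}\right| < \frac{1}{my_1} < \frac{1}{y_1}\left|\xi - \frac{x}{y}\right| \le \left|\xi - \frac{x}{y}\right|.$$
Repeating the process provides us with relatively prime integers $x_2, y_2$ with $y_2 > 0$ such that
$$\left|\xi - \frac{x_2}{y_2}\right| < \frac{1}{y_2^2}$$
and
$$\left|\xi - \frac{x_2}{y_2}\right| < \left|\xi - \frac{x_1}{y_1}\right| < \left|\xi - \frac{x}{y}\right|.$$
Continuing in this fashion, we inductively obtain an infinite sequence of rationals $x_k/y_k$ (k ≥ 1)
such that
$$\left|\xi - \frac{x}{y}\right| > \left|\xi - \frac{x_1}{y_1}\right| > \left|\xi - \frac{x_2}{y_2}\right| > \left|\xi - \frac{x_3}{y_3}\right| > \cdots > 0$$
and
$$\left|\xi - \frac{x_k}{y_k}\right| < \frac{1}{y_k^2}$$
for all k. Since the distances to ξ are strictly decreasing, these rationals are pairwise distinct, so
there are infinitely many of them.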
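The pigeonhole construction in the proof of Proposition 29 is explicit enough to run. The sketch below follows it step by step for a given ξ and n, using floating-point arithmetic, so it is an illustration under that approximation rather than an exact computation. The function name `dirichlet_approximation` is a hypothetical choice.

```python
from math import floor, gcd, sqrt

def dirichlet_approximation(xi, n):
    """Follow the pigeonhole argument: return coprime (x, y) with 1 <= y <= n
    and |xi - x/y| < 1/(n*y), up to floating-point error."""
    seen = {}  # subinterval index j -> multiplier whose fractional part lies in [j/n, (j+1)/n)
    for l in range(n + 1):
        frac = l * xi - floor(l * xi)          # fractional part {l * xi}
        j = min(floor(frac * n), n - 1)        # index of the subinterval containing it
        if j in seen:
            k = seen[j]                        # k < l share the same pigeonhole
            a = floor(l * xi) - floor(k * xi)
            b = l - k
            g = gcd(a, b)
            return a // g, b // g
        seen[j] = l
    raise AssertionError("unreachable: n + 1 numbers in n subintervals")

x, y = dirichlet_approximation(sqrt(2), 10)
print(x, y, abs(sqrt(2) - x / y) < 1 / y ** 2)
```

With ξ = √2 and n = 10, the fractional parts of 0·ξ and 5·ξ both fall in [0, 1/10), which gives the approximation 7/5.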
We use Proposition 29 to prove the following proposition.
Proposition 30. If d ∈ N is not a square, then the inequality
$$|x^2 - dy^2| < 1 + 2\sqrt{d}$$
has infinitely many integer solutions.
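Proof. Since d is positive and not a square, we know that $\sqrt{d} \in \mathbb{R} \setminus \mathbb{Q}$. By Proposition 29, we
therefore obtain infinitely many rational numbers x/y with x, y relatively prime such that
$$\left|\sqrt{d} - \frac{x}{y}\right| < \frac{1}{y^2}.$$
Multiplying by |y| yields
$$\left|x - y\sqrt{d}\right| < \frac{1}{|y|}.$$
By the triangle inequality, we then have
$$\left|x + y\sqrt{d}\right| = \left|(x - y\sqrt{d}) + 2y\sqrt{d}\right| \le \left|x - y\sqrt{d}\right| + 2|y|\sqrt{d} < \frac{1}{|y|} + 2|y|\sqrt{d}.$$
Thus
$$\left|x^2 - dy^2\right| = \left|x + y\sqrt{d}\right|\left|x - y\sqrt{d}\right| < \frac{1}{y^2} + 2\sqrt{d} \le 1 + 2\sqrt{d}.$$
This proves the claim.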
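To get a feel for Proposition 30, one can list pairs (x, y) for which $x^2 - dy^2$ is small. The sketch below tries, for each y, the two integers nearest to $y\sqrt{d}$ and keeps those satisfying the bound; this search strategy and the names `small_norm_pairs`, `pairs` and `counts` are assumptions for illustration and are not how the proof produces its solutions. The bound is tested exactly in integers: since d is not a square we have |m| ≥ 1, so $|m| < 1 + 2\sqrt{d}$ is equivalent to $(|m| - 1)^2 < 4d$.

```python
from math import isqrt
from collections import Counter

def small_norm_pairs(d, max_y):
    """Return (x, y, m) with 1 <= y <= max_y, m = x*x - d*y*y and |m| < 1 + 2*sqrt(d)."""
    found = []
    for y in range(1, max_y + 1):
        base = isqrt(d * y * y)            # floor(y * sqrt(d))
        for x in (base, base + 1):         # the two integers nearest y * sqrt(d)
            m = x * x - d * y * y
            if (abs(m) - 1) ** 2 < 4 * d:  # exact form of |m| < 1 + 2*sqrt(d)
                found.append((x, y, m))
    return found

pairs = small_norm_pairs(7, 50)
counts = Counter(m for _, _, m in pairs)   # how often each value m occurs
print(pairs[:5])
print(counts)
```

The tally in `counts` previews the next step of the argument: only finitely many values m are possible, so some value must occur repeatedly.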
We now apply Proposition 30 to establish that $x^2 - dy^2 = 1$ has a solution in positive integers
x, y. By Proposition 30, we know that there are infinitely many integers x, y such that
$$\left|x^2 - dy^2\right| < 1 + 2\sqrt{d}.$$
There must then exist an integer m such that $1 \le |m| < 1 + 2\sqrt{d}$ and
$$x^2 - dy^2 = m$$
for infinitely many integers x and y. This can be seen by applying an extended version of the
Pigeonhole Principle. In particular, we can find two solutions $(x_1, y_1)$ and $(x_2, y_2)$ such that
$x_1 \neq \pm x_2$ but $x_1 \equiv_{|m|} x_2$ and $y_1 \equiv_{|m|} y_2$. Again, this can be seen by applying an extended version of
the Pigeonhole Principle and using the fact that there are only finitely many congruence classes
modulo |m|. Now define $\alpha = x_1 - y_1\sqrt{d}$ and $\beta = x_2 - y_2\sqrt{d}$. We then have
$$\alpha\overline{\beta} = (x_1 - y_1\sqrt{d})(x_2 + y_2\sqrt{d}) = (x_1x_2 - dy_1y_2) + (x_1y_2 - x_2y_1)\sqrt{d}.$$
By construction, the components of $\alpha\overline{\beta}$ are congruent to 0 modulo |m|. Indeed,
$x_1x_2 - dy_1y_2 \equiv_{|m|} x_1^2 - dy_1^2 = m$ and $x_1y_2 - x_2y_1 \equiv_{|m|} x_1y_1 - x_1y_1 = 0$. We can therefore write
$$\alpha\overline{\beta} = ma + mb\sqrt{d}$$
for some integers a and b. Taking the norm of both sides yields
$$m^2 = |m||m| = N(\alpha)N(\overline{\beta}) = m^2a^2 - dm^2b^2.$$
Consequently,
$$a^2 - db^2 = 1.$$
To complete the proof, we need only establish that $ab \neq 0$. Now, $a \neq 0$ since otherwise $-db^2 = 1$
and the left hand side is negative whereas the right hand side is positive. Also, if b = 0 then a = ±1
so that $\alpha\overline{\beta} = \pm m$. Multiplying by β and using $\beta\overline{\beta} = N(\beta) = m$ yields
$$m\alpha = \pm m\beta$$
so that α = ±β. But this forces $x_1 = \pm x_2$ which is a contradiction. Therefore, $x^2 - dy^2 = 1$ has a
nontrivial solution (a, b), and (|a|, |b|) is then a solution in positive integers, as required.
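The combination step in this proof can be carried out explicitly once two suitable pairs are found. The sketch below searches for two pairs with the same value $m = x^2 - dy^2$ and the same residues modulo |m|, then forms $\alpha\overline{\beta}/m$ as in the proof. The proof itself is non-constructive; the search strategy (trying the integers nearest to $y\sqrt{d}$), the bound `max_y` and the name `pell_solution_by_pigeonhole` are assumptions made for illustration.

```python
from math import isqrt

def pell_solution_by_pigeonhole(d, max_y=10_000):
    """Return positive (a, b) with a*a - d*b*b == 1, built from two pairs that share
    m = x^2 - d*y^2 and their residues mod |m|; None if the bound is too small."""
    seen = {}  # (m, x mod |m|, y mod |m|) -> first pair (x1, y1) found with that key
    for y in range(1, max_y + 1):
        base = isqrt(d * y * y)                 # floor(y * sqrt(d))
        for x in (base, base + 1):
            m = x * x - d * y * y
            if (abs(m) - 1) ** 2 >= 4 * d:      # keep only |m| < 1 + 2*sqrt(d)
                continue
            key = (m, x % abs(m), y % abs(m))
            if key in seen:
                x1, y1 = seen[key]
                # Components of alpha * conj(beta) / m, with alpha = x1 - y1*sqrt(d)
                # and beta = x - y*sqrt(d); both divisions are exact by the proof.
                a = (x1 * x - d * y1 * y) // m
                b = (x1 * y - x * y1) // m
                if b != 0:
                    return abs(a), abs(b)
            else:
                seen[key] = (x, y)
    return None

for d in (2, 3, 5, 7):
    a, b = pell_solution_by_pigeonhole(d)
    print(d, (a, b), a * a - d * b * b)
```

The solution returned this way need not be the generator in general; it is only guaranteed to be some positive solution.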
We close this section with an example.
Example 24. Find all positive solutions to $x^2 - dy^2 = 1$ for d ∈ {2, 3, 5}.
Solution. For each value of d in question, we need to find the generator $\theta = a + b\sqrt{d}$ for $x^2 - dy^2 = 1$.
We know that it will correspond to the least positive value of x that yields a positive solution. Also,
since the corresponding element $\theta = a + b\sqrt{d}$ of $\mathbb{Z}[\sqrt{d}]^{\times}_{+}$ will be greater than 1, we know that a > 1.
We know that a is the least integer greater than 1 such that
$$a^2 - 1 = db^2$$
for some b ∈ N. In particular, we require $a^2 \equiv_d 1$. This implies that a must be congruent to ±1
modulo the prime divisors of d. In particular, if d = p is a prime, then a ∈ {p − 1, p + 1, 2p − 1, 2p +
1, . . . }. We then look at the numbers
$$\frac{a^2 - 1}{p}, \qquad a \in \{p - 1, p + 1, 2p - 1, 2p + 1, \ldots\}$$
in ascending order until we come across a square. Once a square is found, we have located the
generator whose positive powers yield the positive solutions to $x^2 - py^2 = 1$.
1. For d = 2 we have
$$\frac{3^2 - 1}{2} = 4 = 2^2$$
so we obtain the generator $\theta = 3 + 2\sqrt{2}$.
2. For d = 3 we have
$$\frac{2^2 - 1}{3} = 1 = 1^2$$
so we obtain the generator $\theta = 2 + \sqrt{3}$.
3. For d = 5 we have
$$\frac{4^2 - 1}{5} = 3; \qquad \frac{6^2 - 1}{5} = 7; \qquad \frac{9^2 - 1}{5} = 16 = 4^2,$$
so we obtain the generator $\theta = 9 + 4\sqrt{5}$.
In each case, the positive solutions are given by the components of the powers of θ.
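The search used in this example is easy to automate for prime d = p. The following sketch walks through the candidates a ∈ {p − 1, p + 1, 2p − 1, 2p + 1, . . . } in ascending order, skipping a ≤ 1, and stops at the first one for which $(a^2 - 1)/p$ is a perfect square. The function name `generator_for_prime` and the search bound `max_multiple` are hypothetical.

```python
from math import isqrt

def generator_for_prime(p, max_multiple=10_000):
    """Return (a, b) with a the least integer > 1 such that (a*a - 1)/p = b*b,
    searching a = t*p - 1, t*p + 1 for t = 1, 2, ...; None if not found in range."""
    for t in range(1, max_multiple + 1):
        for a in (t * p - 1, t * p + 1):
            if a <= 1:
                continue
            q = (a * a - 1) // p      # exact division, since a is congruent to ±1 mod p
            b = isqrt(q)
            if b * b == q:
                return a, b
    return None

for p in (2, 3, 5):
    print(p, generator_for_prime(p))
```

For p = 2, 3, 5 this reproduces the generators found above, namely $3 + 2\sqrt{2}$, $2 + \sqrt{3}$ and $9 + 4\sqrt{5}$.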