next up previous


Answers to questions about factoring and primality testing

Carl Kingsford

April 27, 2000

Q: No questions really, just theorems I had to accept with no proofs. Where can I find proofs of those theorems?

A: Most of the material covered is the the textbook. Pollard Rho factorization (day 1) is covered in section 3.5 (156-159). Pseudoprimes and the Miller-Rabin test for primality (day 2) are covered in section 5.2 (192-200). The theorems that I skipped for time sake were theorems 5.9, 5.10, 5.12. There are proofs for these in the book. Theorem 5.11 concerning the effectiveness of Rabin's Probabilistic Primality Test has no formal proof in the book, but the result follows obviously from Theorem 5.10.


Q: In the theorem for testing and constructing Carmichael numbers (THM 5.7), we let $n=q_1q_2q_3\ldots q_s$ where they are all distinct primes. Can the qi's be prime powers? I.e. can a Carmichael number look like $7\cdot 11^2\cdot 19$?

A: The short answer is no. All Carmichael numbers are of the form $n=q_1q_2q_3\ldots q_s$ where the qi's are all distinct primes. For the longer answer, we ask: why is this so?

Well, the proof of that theorem breaks down if the qi's aren't distinct. In that theorem, we construct a set of congruence equations

\begin{displaymath}
b^{n-1} = b^{(q_j-1)t_j} \equiv 1 \pmod{q_j}
 \end{displaymath}

and then apply COR 3.8.1 (page 124) to combine all these congruences into a single congruence

\begin{displaymath}
b^{n-1} \equiv 1 \pmod{n}.
 \end{displaymath}

We can't use COR 3.8.1 unless the moduli of the ``input'' congruences are not relatively prime. Hence they must be distinct.

For example: $3321 = 3^4\cdot 41$. We see that (3-1)|3320 and (41-1)|3320 so except for being distinct the conditions of the theorem are satisfied, but $2^{3320} \equiv 2099 \not\equiv 1
 \pmod{3321}$.


Q: If a number is a pseduoprime with base 2, is it a pseudoprime with a base of any number? I.e. if n is not prime, but $2^{n-1}
\equiv 1 \pmod{n}$ does it follow that $7^{n-1}\equiv 1 \pmod{n}$

A: Not necessarily. For example, $2^{560} \equiv 1 \pmod{561}$ so 561 is a pseudoprime to the base 2, but $3^{560} \equiv 375 \pmod{561}$ so it is not a pseduoprime to the base 3.

Some numbers as pseudoprimes to several bases, some numbers are pseudoprimes to all bases (these are the Carmichael numbers) and some are of course not pseudoprimes at all.


Q: Is there any particular base or group of bases which is more likely to work with ``mostly all of the different theorems'' relating to pseudoprimes? If so, what is it and why? If not, why not?

A: To be a bit more specific, let's consider the question ``is there a base or group of bases for which fewer numbers are pseudoprimes to that base?'' On one level, the answer to this question is no, because as we showed in class, there are an infinite number of pseudoprimes to base 2, and this result generalizes to any base. Hence, if we pick any base, there will be an infinite set of numbers that slip by.

But that doesn't consider density. Suppose that for base 2, pseudoprimes come along about every 500 numbers, but for base 13 we find them only every million numbers. Then base 13 might be a better choice. (These figures have no basis in reality; they are for example only.) As far as I know, no one has investigated the relative densities of pseudoprimes to different bases. I calculated all the psuedoprimes to the base 2,3,5,13,17,19,23,27 below 50,000. The table below summarizes how many pseudoprimes to each base there are below 50,000:

base: 2 3 5 13 17 19 23 27
# 55 53 54 68 63 93 110 119
From this small amount of data, one might suspect that the larger the base, the more pseudoprimes there are. More computation and work would be needed to know for sure. Of course, if this is true, then it is a reason for using small bases. The list of pseudoprimes is available from me upon request.

For Miller's test, we have an upper bound on the density of strong pseudoprimes to the base b. From THM 5.10, we know that n is a strong pseudoprime (that is passes Miller's test) for no more than (n-1)/4 bases b, where $1 \leq b \leq n-1$. So the probability that a number is a strong pseudoprime to the base b is less than 1/4.

The idea of density is really important when you consider that most work with factoring is done on computers with limited precision. That means that they can only represent numbers up to some fixed size.[*] If we know that for some base, due to its low density, there are only, say, 5 pseudoprimes to that base among the integers that we can represent then that base might be very useful. We could even compute and remember those 5 integers and check against them when testing for primality. In fact, as stated in the textbook (p. 197) the only odd integer that passes Miller's test for all bases 2, 3, 4, and 7 less than $25\cdot 10^9$ is 3215031751.

Another practical consideration is the calculation involved. Since to test if something is a pseudoprime (or to apply Miller's test for primality), we have to compute $b^{O(n)}\pmod{n}$ we don't want the base to be too large, since squaring a big base is more work than squaring a little base. Also, since modern computers work in base 2, it is simple to manipulate powers of 2. For example, 2 would be represented in the computer as $\ldots 00000010_2$. To compute 25 simply means shifting the 1 bit over 4 places:

\begin{displaymath}
2 = \ldots 00000010_2 \longrightarrow \ldots 00100000_2 = 32.
 \end{displaymath}

Computers can execute a shift operation like the one above very quickly. These two reasons, and the lack of evidence suggesting any other base is better, is what leads most people to use base 2 to start out with.


Q: Are there fewer Carmichael numbers or strong pseudoprimes?

A: We showed in class there are infinitely many strong pseudoprimes to any base (THM 5.9). THM 5.7 on constructing Carmichael numbers is a fundamental component in the proof that there are infinitely many Carmichael numbers. (That proof by Alford, Granville, and Pomerance in 1992 is very complicated.) Hence both sets are infinite.

Which set is less than the other in cardinality? [The cardinality of both sets is the same; they are countably infinite (cardinality = aleph-null), the same as the integers.]

Given a Carmichael number c, we know that it is a pseudoprime to base 2. In the proof of THM 5.9 (pg. 196-197) we showed that if n is a pseudoprime to the base 2, then N = 2n - 1 is a strong pseudoprime to the base 2. So, define g from the Carmichael numbers to the strong pseudoprimes to the base 2 by g(n) = 2n - 1. Clearly g is one-to-one. Also, 1373653 is a strong pseudoprime to the base 2, but

220 = 1048576 < 1373653 < 209752 = 221

so 1373653 is not of the form 2n - 1 so g is not onto. Hence there are fewer Carmichael numbers than strong pseduoprimes.


Q: Can a number be both a Carmichael number and a strong pseudoprime? I.e. does the set of strong pseudoprimes intersect the set of Carmichael numbers?

A: Yes. In fact $8911 = 7\cdot 19 \cdot 67$ is both a Carmichael number and a strong pseudoprime to the base 3. We can see that it is a Carmichael number because 6|8910, 18|8910, and 66|8910. It is a strong pseudoprime to the base 3 because $8910 =
 2\cdot 3^4 \cdot 5 \cdot 11$. So in Miller's test, $t = 3^4\cdot 5
 \cdot 11 = 4455$ and $3^{4455} \equiv 8910 \equiv -1
 \pmod{8911}$. In fact, the table below lists all the Carmichael numbers less than 200,000 which are also strong pseudoprimes to the bases 2,3,5, and 6:

2 3 5 6
15841 8911 none 46657
29341 10585   115921
52633      


Q: When talking about the Pollard Rho factorization method we were calculating the random number sequence by calculating $x_{k+1} \equiv f(x_k) \pmod{n}$. What is the significance of calculating it $\pmod d$ and finding the loop in the $\rho$?

A: Recall the Pollard Rho algorithm:

1.
Generate x1, x2, ...
2.
Compute (x2k - xk, n) = d for $k = 1,2,3,\ldots$. If $d \not= n$ and $d \not= 1$ then clearly d is an interesting factor of n.
We never calculate $\pmod d$ since we don't know what d is. It might be helpful to refresh our memory about why we compute x2k - xk -- that's were the idea of working $\pmod d$ and the expanding loop come in.

Remember in the picture of the $\rho$ there is a first place were the values repeat (after the ``stem''). Call this place xi. Suppose we (by some magical power) knew d from above. Let j be the next index where $x_i \equiv x_j \pmod{d}$. Then we know that the length of the loop is j-i so if we pick any xq and xr such that $q \equiv r
\pmod{j-i}$ then $x_r \equiv x_q \pmod{d}$

So let s be the smallest multiple of (j-i) that is > i. That means that xs is in the loop (since s > i) and we know that $s
\equiv 2s \equiv 0 \pmod{j-i}$ since s is a multiple of j-i. Then by our discussion above, $x_s \equiv x_{2s} \pmod{d}$.

When we compute (x2k - xk, n) = d and get a non-trivial GCD, we know that d | (x2k - xk) in other words, $x_{2k} \equiv x_k
\pmod{d}$. So when we compute (x2k - xk, n) = d we are looking for a d and that search is motivated by the idea that if we have a loop, then (x2k - xk, n) = d for some x2k - xk.

The idea of the expanding ``lasso'' is simply this search. Since the x2ks and the xks get farther and farther apart as k grows, we are trying to make them equivalent $\pmod d$. For example, suppose the length of the loop were 100. Then when k=3, xk and x2k are very near each other and they could not possibly be equivalent, but as k grows, they will drift farther apart.

Note though, that the method is not perfect, because we don't use a real random sequence, we may not get a loop (or at least it may take us a long time).


Appendix: MATLAB Code

Recursive exponentiation procedure

This function computes $b^n \pmod{m}$ in a way that won't exceed the computers precision (for reasonably sized integers):
function a = recmod(b,n,m)
%
% BASE CASE
%
a = 1;
if( n == 0 )
   return;
end

%
% RECURSIVE CASE
y = recmod(b, floor(n/2), m);
if( mod(n,2) == 0 )
   a = mod( (y^2), m );
else
   a = mod( (y^2)*b, m );
end;
return;

Miller's test routine

This function returns 1 if the integer n passes Miller's test for base b; it returns 0 otherwise. A key point in the implementation is that

b2<<103>>j+1t = (bt)2<<104>>j+1 = ((bt)2j)2

So, we compute value for the next j using the value for the previous j and save us some computation time.
function y = Millers( b, n )

% calculate t and s
fact = sort(factor( n - 1 ));

% We figure out how many 2s there are in the factorization
s = 0;
len = length( fact );
for k=1:len
   if( fact(k) == 2 )
      fact(k) = 1;  % change the 2s to 1s
      s = s + 1;
   else
      break;
   end;
end;

t = prod( fact );  % we compute, since all 2 --> 1

% compute b^t mod n and check to see if it is 1 or -1:
bt = recmod( b, t, n );
if( bt == 1 | bt == n - 1 )
   y = 1;
   return;
end;

% We now check the second case of Miller's test. See the note
% above.
prev = bt;
for j=1:s-1
   r = recmod( prev, 2, n );
   if( r == n - 1 )
      y = 1;
      return;
   end;
   prev = r;
end;

y = 0;  % it failed all tests
return;

About this document ...

Answers to questions about factoring and primality testing

This document was generated using the LaTeX2HTML translator Version 97.1 (release) (July 13th, 1997)

Copyright © 1993, 1994, 1995, 1996, 1997, Nikos Drakos, Computer Based Learning Unit, University of Leeds.

The command line arguments were:
latex2html -split 0 -link 0 mth128s_quest.tex.

The translation was initiated by Joshua Holden on 4/27/2000


Footnotes

...size.
Some may object ``what about `infinite precision' math packages?'' But note that there is a always practical space--memory and disk storage--and time limit on the magnitude of the integers involved, even if there is no arbitrary limit.


next up previous
Joshua Holden
4/27/2000