Factoring numbers using silly methods
Oct. 4th, 2020 08:24 am

An infamously difficult unsolved problem in mathematics is to prove that factoring numbers is hard. We do know that if someone gives you a claimed factorization, it is easy to check. If someone asks you to factor 629, you may not have an easy time. But if someone tells you that 629 is 17 times 37, you can check this by hand (or, even more easily, with a calculator). In general, proving that something is hard to do is very difficult. Even after one gets past all the issues of defining rigorously what one means, one has to somehow prove that *no algorithm* works efficiently, and that's a tall order. But in the case of factoring there are some interesting examples, worth stating explicitly, of just how far we are from actually proving anything.
Here's a naive way of trying to factor a number (discussed in more detail in this excellent Quanta article). The basic idea is that polynomials (assuming their degree and coefficients aren't too large) turn out to be easier to factor than numbers. How can we take advantage of that? Say we wanted to factor the number 15, and we didn't notice that it was just 15 = (3)(5). Instead, we write 15 in base 2, where we get 15 = 1111_2, which is the same as saying 15 = 1(2^3) + 1(2^2) + 1(2) + 1. This is almost a polynomial: it is the value of the polynomial x^3 + x^2 + x + 1 at x = 2. So if we can factor x^3 + x^2 + x + 1, we can use that to factor our original number. Sure enough, x^3 + x^2 + x + 1 = (x+1)(x^2+1), and plugging back in x = 2, we get 15 = (2+1)(2^2+1) = (3)(5). We did this in base 2, but one can use the same trick in other bases for some numbers. (Of course, it could turn out that n is prime, in which case this trick will fail in every base, but we can detect primes by other means, so that's ok.) Maybe this trick will almost always work. We don't get that lucky, but even proving that is tough. Emmanuel Breuillard and Péter Varjú proved (assuming the Riemann Hypothesis) that for any fixed base b, only a very tiny fraction of numbers can be factored using this trick. But wait, you say, that just means we probably can't do this for a specific base. What if we're allowed to pick our base cleverly? Maybe we can always find such a base if we get to choose it. Or maybe we can go through a small list of bases, say trying this trick with every base b that is small compared to n, say not much bigger than log log n. Already we can't prove that this doesn't work, although it does seem plausible that the techniques of Breuillard and Varjú could be extended to this context, and Theorem 2 of their paper does push in that direction.
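The mechanical part of this trick is easy to sketch in a few lines of Python: the base-b digits of n are exactly the coefficients of a polynomial that evaluates to n at x = b. (The polynomial factorization step itself is taken as given here; the factors of x^3 + x^2 + x + 1 are the ones worked out above, and the function names are my own.)

```python
def digits_in_base(n, b):
    """Base-b digits of n, least significant first.
    These are the coefficients of a polynomial p with p(b) = n."""
    digits = []
    while n:
        n, r = divmod(n, b)
        digits.append(r)
    return digits

def eval_poly(coeffs, x):
    """Evaluate a polynomial given by coefficients (lowest degree first) at x."""
    return sum(c * x**k for k, c in enumerate(coeffs))

# 15 in base 2 has digits [1, 1, 1, 1], i.e. p(x) = x^3 + x^2 + x + 1.
coeffs = digits_in_base(15, 2)

# The known factorization p(x) = (x + 1)(x^2 + 1) gives, at x = 2:
f1 = eval_poly([1, 1], 2)     # x + 1    -> 3
f2 = eval_poly([1, 0, 1], 2)  # x^2 + 1  -> 5
```

Multiplying f1 and f2 back together recovers 15, just as in the worked example.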
But if we struggle even to prove that this sort of very basic (if you'll pardon the pun) trick won't work in general, how can we possibly hope to prove that factoring itself is hard? Worse, from a security standpoint, the numbers we really care about not being able to factor efficiently are *semiprimes*, that is, numbers which are a product of exactly two primes. 15 above is an example of a semiprime, but 1001, for example, is not a semiprime because it has three prime factors: 1001 = (7)(11)(13).
In fact, most positive integers are easy to factor; the real difficulty is with numbers which are not prime but have only a few prime factors, each relatively large compared to the number. Here's another example, although in some respects a bit sillier. It turns out that there's a way, given two positive integers x and y, to find their greatest common divisor which is much more efficient than factoring x and y. For example, it may not be that easy to factor 133 or 217, but if you can find their gcd, which is 7, then you've already found a factor of both. The ability to find gcds efficiently comes from the Euclidean Algorithm and has been known for over 2000 years. So maybe, given a number n, we can make a short list of numbers based on n such that the gcd of n and at least one of them is greater than 1, in which case we've found a factor. In that spirit, here's a naive idea which has absolutely no serious hope of actually working, but which I don't know how to prove doesn't work: for a given positive integer we can look at its digital reversal in base b, the number we get by writing its digits in base b in reverse order. For example, in base 10, 132 would become 231. Maybe, for n, we can find a small base b such that n and the digital reversal of n share a common factor. This isn't interesting for our example of 15, since 15 in base 2 is 1111_2, a palindrome, so it is the same as its digital reversal. But this does actually work for some numbers and bases. For example, 21 in base 10 has a digital reversal of 12, and 21 and 12 share a common factor of 3. In fact, a good exercise is to show that if n is divisible by 3, then n shares a factor of 3 with its base 10 digital reversal (what is the correct generalization to an arbitrary base?).
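As a small illustration (a sketch of the idea, not a serious factoring method), the digital-reversal-plus-gcd step is easy to code using Python's built-in gcd; the function names are my own:

```python
from math import gcd

def reverse_in_base(n, b):
    """Digital reversal: write n in base b, reverse the digits, read back."""
    rev = 0
    while n:
        n, r = divmod(n, b)
        rev = rev * b + r
    return rev

def reversal_factor(n, b):
    """Return gcd(n, digital reversal of n in base b) if it is a
    nontrivial factor of n, else None."""
    g = gcd(n, reverse_in_base(n, b))
    return g if 1 < g < n else None
```

On the example above, reverse_in_base(21, 10) gives 12, and reversal_factor(21, 10) finds the common factor 3.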
So here's a statement which must be false but which I don't know how to disprove: given a composite n, there's always a base b, with b small compared to n (say no larger than a constant times log n), such that n and the digital reversal of n with respect to b share a common prime factor. It may be that there's an easy proof that this can't happen, but if so, I don't know it. Now, a professional, whether a number theorist or a computer scientist, might object to some of what we've written above: of course proving these sorts of things is difficult. Proving statements about the behavior of numbers in different bases is known in general to be very hard. (For one semi-famous unsolved problem of this sort: we strongly suspect that the sequence https://oeis.org/A258107 is finite, but cannot prove it.) So why should we be surprised that these things are tough to prove? But this underscores exactly why proving that factoring is hard is itself so hard: if we can't even prove things about these highly restricted algorithms, coming just from writing the number in various bases, how can we hope to prove anything when the space of all possible algorithms is far, far larger?
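For what it's worth, the statement is at least easy to probe empirically. Here's a minimal sketch that, for a given n, searches all bases up to c·log n for one where the digital reversal shares a nontrivial factor with n; the constant c = 10 is an arbitrary choice of mine, not something the statement pins down.

```python
from math import gcd, log

def reverse_in_base(n, b):
    """Digital reversal: write n in base b, reverse the digits, read back."""
    rev = 0
    while n:
        n, r = divmod(n, b)
        rev = rev * b + r
    return rev

def find_reversal_base(n, c=10):
    """Search bases 2 .. c*log(n) for a base b where n and its base-b
    digital reversal share a nontrivial common factor.  Returns (b, factor)
    if one is found, else None.  The bound c*log(n) is a hypothetical choice."""
    limit = max(2, int(c * log(n)))
    for b in range(2, limit + 1):
        g = gcd(n, reverse_in_base(n, b))
        if 1 < g < n:
            return b, g
    return None
```

For example, find_reversal_base(21) succeeds: in base 5, 21 is written 41, whose reversal 14_5 = 9 shares the factor 3 with 21. Running this over many composites would show how often the trick happens to work, though of course no amount of computation settles the statement either way.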