Consider the final k digits of 2^n. (If 2^n has fewer than k digits, pad with zeroes.) Now repeatedly multiply by 2, and then truncate mod 10^k.
For k = 1, you get the repeating sequence "2", "4", "8", "6", "2", ....
For k = 2 you get the repeating sequence "04", "08", "16", "32", "64", "28", "56", "12", "24", "48", "96", "92", "84", "68", "36", "72", "44", "88", "76", "52", "04", ....
2^(n+4) = 2^n mod 10^1
2^(n+20) = 2^n mod 10^2
2^(n+100) = 2^n mod 10^3
2^(n+500) = 2^n mod 10^4
...
Because there are only 10^k possible k-digit strings, we know that the final k digits of successive powers of 2 must ultimately enter a loop.
In practice, that loop seems to grow in size by a factor of 5 each time. (I assume there is a number theoretic reason for this, but I am not sure because I am terrible at number theory.) So let's assume (possibly fallaciously) that the loop contains 4 * 5^(k-1) distinct elements. (Notice how this is substantially less than the 10^k possible elements.)
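The factor-of-5 growth is easy to check numerically. A small Python sketch (mine, not from the post) that finds the eventual cycle length of the last k digits:

```python
def cycle_length(k):
    """Length of the loop that the last k digits of 2^n eventually enter."""
    mod = 10 ** k
    start = pow(2, k, mod)  # by n = k the pre-period is over
    x, length = start, 0
    while True:
        x = x * 2 % mod
        length += 1
        if x == start:
            return length

for k in range(1, 5):
    print(k, cycle_length(k))  # 4, 20, 100, 500 -- matching the congruences above
```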
Assuming that the digits are distributed randomly - which, again, is possibly fallacious - the probability of a random k-digit number containing no zeroes is (9/10)^k.
Therefore, the probability of a k-digit number containing at least one zero is 1-(9/10)^k.
Therefore, the probability of all 4 * 5^(k-1) elements containing at least one zero is [1-(9/10)^k]^[4 * 5^(k-1)].
Which rapidly tends to 0. Hoom. Well, I hoped to show it tended to 1, but never mind.
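For what it's worth, the rapid decay is easy to see numerically (a quick float sketch of the expression above):

```python
# [1 - (9/10)^k] ^ (4 * 5^(k-1)) for small k
for k in range(1, 8):
    p = (1 - 0.9 ** k) ** (4 * 5 ** (k - 1))
    print(k, p)
```

Already at k = 2 the value is around 10^-14, and it collapses toward 0 from there.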
> In practice, that loop seems to grow in size by a factor of 5 each time. (I assume there is a number theoretic reason for this, but I am not sure because I am terrible at number theory.)
I can explain this reason. Consider 2^n mod 10^k. If n > k, then we can rewrite this as:
2^k * (2^(n-k) mod 5^k)
And then the Euler-Fermat theorem states that, if a and m are coprime, then
a^(φ(m)) ≡ 1 (mod m)
where φ(m) is Euler's totient function, the number of integers from 1 to m that are coprime to m. And φ(p^k) = (p-1) * p^(k-1) for any prime p, so we get in particular that φ(5^k) = 4 * 5^(k-1), which gives us the (maximum possible value of the) cycle length.
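Both claims are easy to sanity-check numerically; a small Python sketch (mine):

```python
for k in range(1, 6):
    # claim from above: for n > k, 2^n mod 10^k equals 2^k * (2^(n-k) mod 5^k)
    for n in range(k + 1, k + 50):
        assert pow(2, n, 10 ** k) == 2 ** k * pow(2, n - k, 5 ** k)
    # Euler-Fermat: 2^(phi(5^k)) = 1 (mod 5^k), with phi(5^k) = 4 * 5^(k-1)
    assert pow(2, 4 * 5 ** (k - 1), 5 ** k) == 1
print("all checks pass")
```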
Maybe some thinking about individual digit positions could shed some light. Let's call the last digit d_0, the next to last d_1, and so on.
d_0 follows a simple pattern: 1, 2, 4, 8, 6, 2, ..., and so never has a zero. It ends up in this pattern [2, 4, 8, 6], and throws two carries over to d_1.
d_1 goes 0, 0, 0, 0, 1, 3, 6, 2, 5, 1, 2, 4, 9, 8, 6, 2, 4. It ends up repeating [8, 6, 2, 4], and throwing two carries to d_2.
In general, d_n starts off with a sequence of 0, until it gets the first carry from d_{n-1}, then tries to fall into 1, 2, 4, 8, 6, 2, ..., but the carries from the previous column perturb that. However, the carries come in on a pattern, and so the result will be that d_n ends up in a cycle.
I haven't looked past the first couple of columns, but I'd not be surprised if there is a predictable pattern in these cycles and their phases that would allow someone to work out a considerable amount about the occurrences of any specific digits in powers of two.
EDIT: I missed a carry when doing d_1. d_1 goes 0,0,0,0,1,3,6,2,5,1,2,4,9,9,8,6,3,7,4,8,7,5,0,0 and repeats [1,3,6,2,5,1,2,4,9,9,8,6,3,7,4,8,7,5,0,0]. It throws 10 carries into d_2.
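The corrected d_1 sequence can be double-checked directly (a quick sketch of mine):

```python
# tens digit of 2^n for n = 0..23 -- matches the corrected sequence above
tens = [(2 ** n // 10) % 10 for n in range(24)]
print(tens)

# the cycle has period 20, as expected from 2^(n+20) = 2^n (mod 100):
assert tens[4:] == [(2 ** n // 10) % 10 for n in range(24, 44)]
```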
It's interesting to me that he has an article a few places below this one on "Contrived versus genuine mathematical problems". Anything that looks at string properties of integers' base ten representations as important seems fairly contrived. Project Euler suffers a lot from this type of problem as well.
4 8 16 32 64 28 56 12 24 48 96 92 84 68 36 72 44 88 76 52

That's the cycle of powers of 2 mod 100. (I removed the beginning 1 and 2, which don't repeat.) The first two numbers will show up as 04 and 08 in our number (except when the number is 4 or 8). Thus, there's always a 0 when n is 2 or 3 mod 20. (Note that there are 20 numbers up there.)
That's the cycle of powers of 2 mod 1000, starting with n=3. There are 100 numbers. (Incidentally, the Euler-Fermat theorem and some thinking can tell you that the cycle of powers of 2 mod 10^k will have length that divides phi(5^k), which is 4 * 5^(k-1), which is 4, 20, 100, 500, etc.) I'll make my computer figure out which numbers contain 0.
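In Python terms (my sketch, padding to 3 digits the same way):

```python
# the mod-1000 cycle is 2^n for n = 3..102; pad to 3 digits and count zeros
cycle = [str(pow(2, n, 1000)).zfill(3) for n in range(3, 103)]
with_zero = [s for s in cycle if '0' in s]
print(len(with_zero), len(cycle))  # 19 100
```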
Which is 19 out of the 100 different things mod 1000. Now let's write a function to deal with 10^k.
(def ass (k)
  (withs (pt (expt 10 k)
          u (dedup:mapn [string:num->digs (mod-expt 2 _ pt) 10 k]
                        k (+ k (* 4 (expt 5 dec.k)))))
    (list (count [pos #\0 _] u)
          len.u)))
To explain the above code: I let "pt" = 10^k, then I take the numbers n from k to k + 4 * 5^(k-1), find 2^n (mod pt) for each, write out its digits in base 10 (forcing there to be k digits, by appending 0's to the front if necessary), put the digits together into a string, make a list of all the resulting strings, remove duplicates just to be sure, and give the resulting list the name "u". Then I return a list of two numbers: the number of strings in the list that contain 0, and the total number of strings in the list. From what we've seen above, we'd expect (ass 2) to return (2 20), and (ass 3) to return (19 100).
arc> ass.2
(2 20)
arc> ass.3
(19 100)
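For readers without an Arc environment, here is an equivalent Python sketch of mine (same name and behavior as the Arc version above):

```python
def ass(k):
    pt = 10 ** k
    # last k digits (zero-padded) of 2^n for n = k .. k + 4*5^(k-1), deduplicated
    u = {str(pow(2, n, pt)).zfill(k) for n in range(k, k + 4 * 5 ** (k - 1) + 1)}
    return (sum('0' in s for s in u), len(u))

print(ass(2))  # (2, 20)
print(ass(3))  # (19, 100)
```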
That's a good sign. Now let's proceed to higher numbers. I'll make it print out the resources it's using, too.
Well, we've passed the halfway mark, but my computer is feeling the pressure. If you rewrote this in C or something, you might be able to go up a few numbers higher, but probably not much. (Hmm, actually, I'm kind of repeating a bunch of work: I already know that "2 and 3 mod 20" numbers always have a 0, and similar for "3 4 5 6 10 ... mod 100 [add 3 to my third list of numbers]". However, if I ignored those, that would only cut out half the work for k=8. But maybe a lot more will get cut out in higher numbers, and the number of things to deal with will become manageable?)
Well, in principle at least, this could be solved by computer.
Edit: Hmm, qntm seems to have said much of this already. Well, at least I have done some grunt work to illustrate his arguments.
+1. At first I thought that none of these statements could even remotely be proved, but I've changed my mind. I totally buy this.
In particular, it looks like this would prove "Every sufficiently large power of 2 contains at least k zeroes", for any value of k. The computational time would probably be prohibitive, but this should definitely work in principle.
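As a data point for "sufficiently large": a brute-force scan (my sketch) confirms that 2^86 = 77371252455336267181195264 is the largest power of 2 with exponent up to 10000 containing no zero digit, consistent with the conjecture that it is the largest such power outright.

```python
# largest n <= 10000 such that 2^n has no '0' among its decimal digits
best = max(n for n in range(1, 10001) if '0' not in str(2 ** n))
print(best)  # 86
```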
As a working analytic number theorist, I suspect the asymptotic formula is out of reach, for similar reasons to why you can't count twin primes. I'd be very pleased if someone persuaded me I'm wrong.
But, please, none of this mumbo-jumbo about "higher order Fourier analysis". Gowers et al. developed it to find arithmetic progressions in sets of integers, which is not related to the present problem as far as I can see.
> it looks like this would prove "Every sufficiently large power of 2 contains at least k zeroes", for any value of k. The computational time would probably be prohibitive, but this should definitely work in principle.
Only if you accept constructive proofs that say nothing about their running time and that may not even halt.
> I suspect the asymptotic formula is out of reach, for similar reasons to why you can't count twin primes.
I am not aware of any result on the _inherent_ difficulty of counting twin primes. Are you hinting at one? If so, can you give a reference?
By "in principle" I am excluding any consideration of running time, or computing resources. For example, if k = 10, perhaps you would need a computer with 10^10000 bytes of RAM working for an equal number of years. It's a finite computation. ;)
"may not even halt" corresponds to my "it looks like". Such algorithms not working would be equivalent to me and the parent commenter being wrong -- which is not out of the question. Probabilistically, such an algorithm should work (I think -- have not checked the details), so it ought to work unless there is some unforeseen reason why it wouldn't. A "conspiracy" against this happening, if you will.
As far as the difficulty in counting twin primes -- well, to give an easier example, look at Cojocaru and Murty's book on sieve methods and read about the sieve of Eratosthenes-Legendre. You know that 1/2 of numbers aren't divisible by 2, 2/3 aren't divisible by 3, 4/5 aren't divisible by 5, etc., so the proportion of numbers that are prime should be (1/2)(2/3)(4/5)(6/7)..., which you can show with a little bit of effort is equal to 0.
With some work, you can turn this into a proof. In theory, you can estimate the number of primes < X, as a function of X. This estimate would be roughly along the lines of the original post. But as Cojocaru and Murty (among others) point out, the error terms get bad quickly. This type of phenomenon is extremely common in analytic number theory.
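The decay of that product is real but very slow -- roughly proportional to 1/log X by Mertens' theorem. A quick sketch of mine, using a basic sieve:

```python
def primes_upto(x):
    """All primes <= x, via the sieve of Eratosthenes."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]

# partial products of (1 - 1/p) over primes p <= X
for X in (10, 100, 1000, 10 ** 4, 10 ** 5):
    prod = 1.0
    for p in primes_upto(X):
        prod *= 1 - 1 / p
    print(X, prod)
```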
In the case of prime counting, you can get better results by other methods, most prominently the Riemann zeta function. But even then you can only get the best results if you prove the Riemann Hypothesis. Million dollar bounty out on that one.
2^n has approximately n log 2 ≈ 0.3n digits (log base 10 throughout). Assuming that they are distributed randomly, the probability that none of these digits is 0 is (9/10)^(n log 2).
For n = 50,000, this is already unbelievably tiny. For n = 50,001, it is even tinier. And so on.
In fact, summing all of these tiny probabilities from n = 50,000 to n = infinity is about the same as integrating (9/10)^(x log 2) from x = 50,000 to x = infinity, which is, again, only slightly above zero.
So, statistically at least, there is an excellent chance that every power of 2 above 50,000 has at least one zero.
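Concretely (a sketch of mine; logs base 10): the log of the no-zero probability for 2^n is n · log(2) · log(9/10), so:

```python
import math

def log10_p_no_zero(n):
    # log10 of (9/10)^(n * log10(2)): heuristic no-zero probability for 2^n
    return n * math.log10(2) * math.log10(9 / 10)

print(log10_p_no_zero(50000))  # about -689, i.e. a probability near 10^-689
```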
Well, this is precisely what needs to be proven. You cannot just assume that -- the digits are not distributed randomly in base 2, so why should they be in base 10? They most likely are, but if you assume that, you are assuming the hypothesis.
If we ask instead how many nonzero digits there are in 2^n, there's actually a known lower bound on this, due to Cam Stewart, though it is unfortunately quite tiny - on the order of (log log n)/(log log log n), much smaller than what empirically appears to be the case.
In fact, the statement is much more general: Say you have two bases A and B (such that no power of A is a power of B, to prevent triviality), a base-A digit a, and a base-B digit b. For any n let L_A,a(n) denote the number of non-a digits in the base-A expansion of n, and L_B,b(n) similarly with B and b. Then for any n, L_A,a(n)+L_B,b(n)>(log log n / (log log log n + C)) - 1, where C is a constant depending only on A and B. So while it's certainly impossible to say "large numbers have complicated base-A expansions", it is in some sense possible to say that they can't simultaneously have simple base-A and base-B representations!
In particular you get a lower bound on number of nonzero digits (or more generally non-b digits) of powers of A written to base B by just taking n to be powers of A and taking a=0.
Unfortunately, not only is the lower bound tiny even asymptotically, but the constant involved is absolutely huge. (The paper doesn't explicitly state how to compute this constant, but once I was trying to bound from below the number of nonzero digits in powers of 2 written in base 3, so I traced through all the computations and references, and it's ridiculously large, much too large for what I was trying to do... I forget just how large, except that it was the sort of number most conveniently described as being "about 10^[something]" rather than in words.) So again, not close to what appears to be true empirically.
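To see how tiny the main term is, here is a sketch of mine, using natural logs and ignoring the constant C (which only makes the bound weaker):

```python
import math

n = 2 ** 10 ** 6                   # a number with about a million bits
loglog = math.log(math.log(n))     # log log n
bound = loglog / math.log(loglog) - 1
print(bound)                       # roughly 4 guaranteed nonzero digits
```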
The fact that it's linear hints that the distribution of digits really is random: the probability of a 0 in any position other than the first and the last indeed approaches 1/10 as n goes to infinity.