Consider the final k digits of 2^n. (If 2^n has fewer than k digits, pad with zeroes.) Now repeatedly multiply by 2, and then truncate mod 10^k.
For k = 1, you get the repeating sequence "2", "4", "8", "6", "2", ....
For k = 2 you get the repeating sequence "04", "08", "16", "32", "64", "28", "56", "12", "24", "48", "96", "92", "84", "68", "36", "72", "44", "88", "76", "52", "04", ....
2^(n+4) = 2^n mod 10^1
2^(n+20) = 2^n mod 10^2
2^(n+100) = 2^n mod 10^3
2^(n+500) = 2^n mod 10^4
...
Because there are only 10^k possible k-digit strings, we know that the final k digits of successive powers of 2 must ultimately enter a loop.
In practice, that loop seems to grow in size by a factor of 5 each time. (I assume there is a number theoretic reason for this, but I am not sure because I am terrible at number theory.) So let's assume (possibly fallaciously) that the loop contains 4 * 5^(k-1) distinct elements. (Notice how this is substantially less than the 10^k possible elements.)
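The factor-of-5 growth is easy to check numerically. A small Python sketch (mine, not from the post) that finds the eventual cycle length of the last k digits:

```python
def cycle_length(k):
    """Length of the loop that the last k digits of 2^n eventually enter."""
    mod = 10 ** k
    start = pow(2, k, mod)  # by n = k the pre-period is over
    x, length = start, 0
    while True:
        x = x * 2 % mod
        length += 1
        if x == start:
            return length

for k in range(1, 5):
    print(k, cycle_length(k))  # 4, 20, 100, 500 -- matching the congruences above
```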
Assuming that the digits are distributed randomly - which, again, is possibly fallacious - the probability of a random k-digit number containing no zeroes is (9/10)^k.
Therefore, the probability of a k-digit number containing at least one zero is 1-(9/10)^k.
Therefore, the probability of all 4 * 5^(k-1) elements containing at least one zero is [1-(9/10)^k]^[4 * 5^(k-1)].
Which rapidly tends to 0. Hoom. Well, I hoped to show it tended to 1, but never mind.
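For what it's worth, the rapid decay is easy to see numerically (a quick float sketch of the expression above):

```python
# [1 - (9/10)^k] ^ (4 * 5^(k-1)) for small k
for k in range(1, 8):
    p = (1 - 0.9 ** k) ** (4 * 5 ** (k - 1))
    print(k, p)
```

Already at k = 2 the value is around 10^-14, and it collapses toward 0 from there.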
> In practice, that loop seems to grow in size by a factor of 5 each time. (I assume there is a number theoretic reason for this, but I am not sure because I am terrible at number theory.)
I can explain this reason. Consider 2^n mod 10^k. If n > k, then we can rewrite this as:
2^k * (2^(n-k) mod 5^k)
And then the Euler-Fermat theorem states that, if a and m are coprime, then
a^(φ(m)) ≡ 1 (mod m)
where φ(m) is Euler's totient function, the number of integers from 1 to m that are coprime to m. And φ(p^k) = (p-1) * p^(k-1) for any prime p, so we get in particular that φ(5^k) = 4 * 5^(k-1), which gives us the (maximum possible value of the) cycle length.
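Both claims are easy to sanity-check numerically; a small Python sketch (mine):

```python
for k in range(1, 6):
    # claim from above: for n > k, 2^n mod 10^k equals 2^k * (2^(n-k) mod 5^k)
    for n in range(k + 1, k + 50):
        assert pow(2, n, 10 ** k) == 2 ** k * pow(2, n - k, 5 ** k)
    # Euler-Fermat: 2^(phi(5^k)) = 1 (mod 5^k), with phi(5^k) = 4 * 5^(k-1)
    assert pow(2, 4 * 5 ** (k - 1), 5 ** k) == 1
print("all checks pass")
```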
Maybe some thinking about individual digit positions could shed some light. Let's call the last digit d_0, the next to last d_1, and so on.
d_0 follows a simple pattern: 1, 2, 4, 8, 6, 2, ..., and so never has a zero. It ends up in this pattern [2, 4, 8, 6], and throws two carries over to d_1.
d_1 goes 0, 0, 0, 0, 1, 3, 6, 2, 5, 1, 2, 4, 9, 8, 6, 2, 4. It ends up repeating [8, 6, 2, 4], and throwing two carries to d_2.
In general, d_n starts off with a sequence of 0, until it gets the first carry from d_{n-1}, then tries to fall into 1, 2, 4, 8, 6, 2, ..., but the carries from the previous column perturb that. However, the carries come in on a pattern, and so the result will be that d_n ends up in a cycle.
I haven't looked past the first couple of columns, but I'd not be surprised if there is a predictable pattern in these cycles and their phases that would allow someone to work out a considerable amount about the occurrences of any specific digits in powers of two.
EDIT: I missed a carry when doing d_1. d_1 goes 0,0,0,0,1,3,6,2,5,1,2,4,9,9,8,6,3,7,4,8,7,5,0,0 and repeats [1,3,6,2,5,1,2,4,9,9,8,6,3,7,4,8,7,5,0,0]. It throws 10 carries into d_2.
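The corrected d_1 sequence can be double-checked directly (a quick sketch of mine):

```python
# tens digit of 2^n for n = 0..23 -- matches the corrected sequence above
tens = [(2 ** n // 10) % 10 for n in range(24)]
print(tens)

# the cycle has period 20, as expected from 2^(n+20) = 2^n (mod 100):
assert tens[4:] == [(2 ** n // 10) % 10 for n in range(24, 44)]
```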
It's interesting to me that he has an article a few places below this one on "Contrived versus genuine mathematical problems". Anything that looks at string properties of integers' base ten representations as important seems fairly contrived. Project Euler suffers a lot from this type of problem as well.
4 8 16 32 64 28 56 12 24 48 96 92 84 68 36 72 44 88 76 52

That's the cycle of powers of 2 mod 100. (I removed the beginning 1 and 2, which don't repeat.) The first two numbers will show up as 04 and 08 in our number (except when the number is 4 or 8). Thus, there's always a 0 when n is 2 or 3 mod 20. (Note that there are 20 numbers up there.)
That's the cycle of powers of 2 mod 1000, starting with n=3. There are 100 numbers. (Incidentally, the Euler-Fermat theorem and some thinking can tell you that the cycle of powers of 2 mod 10^k will have length that divides phi(5^k), which is 4 * 5^(k-1), which is 4, 20, 100, 500, etc.) I'll make my computer figure out which numbers contain 0.
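In Python terms (my sketch, padding to 3 digits the same way):

```python
# the mod-1000 cycle is 2^n for n = 3..102; pad to 3 digits and count zeros
cycle = [str(pow(2, n, 1000)).zfill(3) for n in range(3, 103)]
with_zero = [s for s in cycle if '0' in s]
print(len(with_zero), len(cycle))  # 19 100
```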
Which is 19 out of the 100 different things mod 1000. Now let's write a function to deal with 10^k.
(def ass (k)
  (withs (pt (expt 10 k)
          u (dedup:mapn [string:num->digs (mod-expt 2 _ pt) 10 k]
                        k (+ k (* 4 (expt 5 dec.k)))))
    (list (count [pos #\0 _] u)
          len.u)))
To explain the above code: I let "pt" = 10^k, then I take the numbers n from k to k + 4 * 5^(k-1), find 2^n (mod pt) for each, write out its digits in base 10 (forcing there to be k digits, by appending 0's to the front if necessary), put the digits together into a string, make a list of all the resulting strings, remove duplicates just to be sure, and give the resulting list the name "u". Then I return a list of two numbers: the number of strings in the list that contain 0, and the total number of strings in the list. From what we've seen above, we'd expect (ass 2) to return (2 20), and (ass 3) to return (19 100).
arc> ass.2
(2 20)
arc> ass.3
(19 100)
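For readers without an Arc environment, here is an equivalent Python sketch of mine (same name and behavior as the Arc version above):

```python
def ass(k):
    pt = 10 ** k
    # last k digits (zero-padded) of 2^n for n = k .. k + 4*5^(k-1), deduplicated
    u = {str(pow(2, n, pt)).zfill(k) for n in range(k, k + 4 * 5 ** (k - 1) + 1)}
    return (sum('0' in s for s in u), len(u))

print(ass(2))  # (2, 20)
print(ass(3))  # (19, 100)
```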
That's a good sign. Now let's proceed to higher numbers. I'll make it print out the resources it's using, too.
Well, we've passed the halfway mark, but my computer is feeling the pressure. If you rewrote this in C or something, you might be able to go up a few numbers higher, but probably not much. (Hmm, actually, I'm kind of repeating a bunch of work: I already know that "2 and 3 mod 20" numbers always have a 0, and similar for "3 4 5 6 10 ... mod 100 [add 3 to my third list of numbers]". However, if I ignored those, that would only cut out half the work for k=8. But maybe a lot more will get cut out in higher numbers, and the number of things to deal with will become manageable?)
Well, in principle at least, this could be solved by computer.
Edit: Hmm, qntm seems to have said much of this already. Well, at least I have done some grunt work to illustrate his arguments.
+1. At first I thought that none of these statements could even remotely be proved, but I've changed my mind. I totally buy this.
In particular, it looks like this would prove "Every sufficiently large power of 2 contains at least k zeroes", for any value of k. The computational time would probably be prohibitive, but this should definitely work in principle.
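As a data point for "sufficiently large": a brute-force scan (my sketch) confirms that 2^86 = 77371252455336267181195264 is the largest power of 2 with exponent up to 10000 containing no zero digit, consistent with the conjecture that it is the largest such power outright.

```python
# largest n <= 10000 such that 2^n has no '0' among its decimal digits
best = max(n for n in range(1, 10001) if '0' not in str(2 ** n))
print(best)  # 86
```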
As a working analytic number theorist, I suspect the asymptotic formula is out of reach, for similar reasons to why you can't count twin primes. I'd be very pleased if someone persuaded me I'm wrong.
But, please, none of this mumbo-jumbo about "higher order Fourier analysis". Gowers et al. developed it to find arithmetic progressions in sets of integers, which is not related to the present problem as far as I can see.
> it looks like this would prove "Every sufficiently large power of 2 contains at least k zeroes", for any value of k. The computational time would probably be prohibitive, but this should definitely work in principle.
Only if you accept constructive proofs that say nothing about their running time and that may not even halt.
> I suspect the asymptotic formula is out of reach, for similar reasons to why you can't count twin primes.
I am not aware of any result on the _inherent_ difficulty of counting twin primes. Are you hinting at one? If so, can you give a reference?
By "in principle" I am excluding any consideration of running time, or computing resources. For example, if k = 10, perhaps you would need a computer with 10^10000 bytes of RAM working for an equal number of years. It's a finite computation. ;)
"may not even halt" corresponds to my "it looks like". Such algorithms not working would be equivalent to me and the parent commenter being wrong -- which is not out of the question. Probabilistically, such an algorithm should work (I think -- have not checked the details), so it ought to work unless there is some unforeseen reason why it wouldn't. A "conspiracy" against this happening, if you will.
As far as the difficulty in counting twin primes -- well, to give an easier example, look at Cojocaru and Murty's book on sieve methods and read about the sieve of Eratosthenes-Legendre. You know that 1/2 of numbers aren't divisible by 2, 2/3 aren't divisible by 3, 4/5 aren't divisible by 5, etc., so the proportion of numbers that are prime should be (1/2)(2/3)(4/5)(6/7)..., which you can show with a little bit of effort is equal to 0.
With some work, you can turn this into a proof. In theory, you can estimate the number of primes < X, as a function of X. This estimate would be roughly along the lines of the original post. But as Cojocaru and Murty (among others) point out, the error terms get bad quickly. This type of phenomenon is extremely common in analytic number theory.
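The decay of that product is real but very slow -- roughly proportional to 1/log X by Mertens' theorem. A quick sketch of mine, using a basic sieve:

```python
def primes_upto(x):
    """All primes <= x, via the sieve of Eratosthenes."""
    sieve = [True] * (x + 1)
    sieve[0] = sieve[1] = False
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [i for i, is_p in enumerate(sieve) if is_p]

# partial products of (1 - 1/p) over primes p <= X
for X in (10, 100, 1000, 10 ** 4, 10 ** 5):
    prod = 1.0
    for p in primes_upto(X):
        prod *= 1 - 1 / p
    print(X, prod)
```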
In the case of prime counting, you can get better results by other methods, most prominently the Riemann zeta function. But even then you can only get the best results if you prove the Riemann Hypothesis. Million dollar bounty out on that one.
2^n has approximately n log 2 ≈ 0.3n digits (log base 10 throughout). Assuming that they are distributed randomly, the probability that none of these digits is 0 is (9/10)^(n log 2).
For n = 50,000, this is already unbelievably tiny. For n = 50,001, it is even tinier. And so on.
In fact, summing all of these tiny probabilities from n = 50,000 to n = infinity is about the same as integrating (9/10)^(x log 2) from x = 50,000 to x = infinity, which is, again, only slightly above zero.
So, statistically at least, there is an excellent chance that every power of 2 above 50,000 has at least one zero.
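Concretely (a sketch of mine; logs base 10): the log of the no-zero probability for 2^n is n · log(2) · log(9/10), so:

```python
import math

def log10_p_no_zero(n):
    # log10 of (9/10)^(n * log10(2)): heuristic no-zero probability for 2^n
    return n * math.log10(2) * math.log10(9 / 10)

print(log10_p_no_zero(50000))  # about -689, i.e. a probability near 10^-689
```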
Well, this is precisely what needs to be proven. You cannot just assume that -- the digits are not distributed randomly in base 2, so why should they be in base 10? They most likely are, but if you assume that, you are assuming the hypothesis.
If we ask instead how many nonzero digits there are in 2^n, there's actually a known lower bound on this, due to Cam Stewart, though it is unfortunately quite tiny - on the order of (log log n)/(log log log n), much smaller than what empirically appears to be the case.
In fact, the statement is much more general: Say you have two bases A and B (such that no power of A is a power of B, to prevent triviality), a base-A digit a, and a base-B digit b. For any n let L_A,a(n) denote the number of non-a digits in the base-A expansion of n, and L_B,b(n) similarly with B and b. Then for any n, L_A,a(n)+L_B,b(n)>(log log n / (log log log n + C)) - 1, where C is a constant depending only on A and B. So while it's certainly impossible to say "large numbers have complicated base-A expansions", it is in some sense possible to say that they can't simultaneously have simple base-A and base-B representations!
In particular you get a lower bound on number of nonzero digits (or more generally non-b digits) of powers of A written to base B by just taking n to be powers of A and taking a=0.
Unfortunately, not only is the lower bound tiny even asymptotically, but the constant involved is absolutely huge. (The paper doesn't explicitly state how to compute this constant, but once I was trying to bound from below the number of nonzero digits in powers of 2 written in base 3, so I traced through all the computations and references, and it's ridiculously large, much too large for what I was trying to do... I forget just how large, except that it was the sort of number most conveniently described as being "about 10^[something]" rather than in words.) So again, not close to what appears to be true empirically.
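To see how tiny the main term is, here is a sketch of mine, using natural logs and ignoring the constant C (which only makes the bound weaker):

```python
import math

n = 2 ** 10 ** 6                   # a number with about a million bits
loglog = math.log(math.log(n))     # log log n
bound = loglog / math.log(loglog) - 1
print(bound)                       # roughly 4 guaranteed nonzero digits
```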
The fact that it's linear hints that the distribution of digits really is random: the probability of a 0 in any position other than the first and the last indeed approaches 1/10 as n goes to infinity.