I'm also skeptical of the approach here, but for the opposite reason. ML will conquer proofs via an AlphaZero approach that ignores all human training data. You can generate arbitrary problems easily enough, and a GAN-like system that both generates problems that can't yet be solved and trains a solver to solve them seems to obviate the need for human training data. I'd be really hesitant to be a PhD student in the GOFAI or natural-language datamining approaches, since I think this could easily be solved by ML in the next five or ten years (specifically, at least one well-known unsolved research problem being solved by AI). I hope that I'm wrong... I like math as a human endeavor.
The problem with generating arbitrary math problems is that you can generate an infinite number of boring, pointless problems. For example: prove that there is a solution to the system of inequalities x + 2y + 135798230 > 2x + y + 123532 and x - y < 234. Training an AI system to solve these problems doesn't accomplish anything.
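To make that concrete, here's a minimal sketch (plain Python; everything in it is made up for illustration) of what naive problem generation looks like. You can emit millions of these, and none of them would teach a solver anything:

    import random

    def trivial_problem():
        # Sample a random "theorem": show that some linear system of
        # inequalities has a solution. Almost every instance is trivially
        # true and mathematically boring.
        a, b, c = (random.randint(1, 10**9) for _ in range(3))
        return (f"prove there exist x, y with "
                f"x + 2y + {a} > 2x + y + {b} and x - y < {c}")

    for _ in range(3):
        print(trivial_problem())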
I think we are at a stage for mathematics similar to where solving Go was in the '80s. For Go back then we didn't even have the right logical structure for the algorithm; we hadn't invented Monte Carlo tree search yet. Once a good underlying structure was found and AI heuristics were added on top, Go became solvable by AIs. For mathematics I think we also need multiple innovations to get there from here.
Such an "AlphaZero approach" will only knock Gowers's GOFAI approach out of business if the AI can also "justify" its proofs, in the sense that Gowers explains in his blog post.
Do you think that will happen in 5 years?
Google's recently introduced 540-billion-parameter language model PaLM can already explain complex jokes and reason logically by explicitly describing reasoning steps. The results are quite impressive.
https://ai.googleblog.com/2022/04/pathways-language-model-pa...
To be sure, this happened just two years after GPT-3. So five years for something like what Gowers wants, but based on a large language model, seems in fact quite realistic.
I find myself doubting that this goal makes any sense.
One mathematician often cannot "justify" their reasoning process to another if their thinking is even slightly different. For example what is intuitive for one side of the algebra/analysis divide is often not to people on the other side.
It is therefore unlikely that a human-level computer program will think enough like humans that its reasoning can be justified to any human, and vice versa. For it to do so, it would have to be better than the human, also have a model of how the human thinks, and then be able to break down reasoning it arrived at one way into something that makes sense to the human.
If a computer succeeded in this endeavor, it would almost certainly come up with new principles of doing mathematics that would then have to be taught to humans for them to appreciate what it is doing. Something like this has already happened with Go; see https://escholarship.org/uc/item/6q05n7pz for example. Even so, humans are still unable to learn to play the game at the same level as the superhuman program. And the same would be true in mathematics.
I've found that for most substantial proofs I've been exposed to, I understood the author's justification of how they found the proof. That doesn't mean I would have been able to come up with that step on my own, but once it was done I understood why they took it. If you get sufficiently far into the esoteric weeds of either analysis or algebra, the justifications won't be understood outside that field. But they don't have to be.
Theorem provers starting from basic axioms are often looking for things that aren't in the esoteric weeds of a specific subfield, since it takes too long for them to generate a suitable body of theorems that establish those fields.
> And vice versa. For it to do so, it has to be better than the human, also have a model of how the human thinks, and then be able to break down thinking it arrived at one way to something that makes sense to the human.
This is more or less what Gowers is after, and that's exactly why he wants GOFAI instead of ML. He wants to understand how "doing mathematics" works.
I think this approach hinges on how you build and define the network that flags the "most promising search state". I'd argue this is a far harder problem in theorem proving than it is in various games. Before AlphaZero all kinds of position evaluation techniques existed in each of these games and were in widespread use for over 25 years.
Comparatively, in theorem proving it's hard even to define anything resembling a board whose structure is fixed enough to apply something like positional evaluation. What you want is a network that flags next steps in a proof as promising and lets you effectively ignore most of the search space. If you have a good way to do this, I think AlphaZero would be a promising approach.
But even if you get AlphaZero to work, it doesn't solve the problem of understanding. AlphaZero made Go players better by giving them more content to study, but it didn't explicitly teach knowledge of the game. Strategically, in Go it's actually rather unsatisfying: it seems to win most of its games by entirely abandoning influence for territory and then relying on its superior reading skills to pick fights inside the influence, where it often seems to win from what pro players would previously have described as a disadvantageous fight. This means the lessons are limited for players whose reading skills are weaker than AlphaZero's: they can't fight nearly as well, which suggests abandoning the entire dominant strategy it uses.
I think for short proofs that fit in a GPT-style NLP predictor's input, it could produce an initial list of likely "next steps", which could then be scored by the adversarially trained scoring model that has some intuition for which proof-search strategies are likely to work. The combination lets each model have a different area of expertise: the NLP model being technically competent locally and the scorer providing global judgement. The entire proof search would correspond to a single play in AlphaGo, with proof search happening entirely within the Monte Carlo exploration of likely next steps (as opposed to a sequence of plays, as in Go), and a play happening only when an automatic proof checker validates the final proof. Automatic proof checking could also prune syntactically invalid continuations from the Monte Carlo search tree, and probably provide some similar optimizations.
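If I'm reading this proposal right, the skeleton would look roughly like the toy sketch below. The three stand-in functions are made up for illustration: propose_steps plays the GPT-style proposer's role, score_state the adversarially trained scorer's, and checker_accepts the automatic proof checker's.

    import math
    import random

    def propose_steps(proof_state, k=4):
        # Stand-in for the GPT-style proposer: locally plausible next steps.
        return [proof_state + [f"step{random.randint(0, 99)}"] for _ in range(k)]

    def score_state(proof_state):
        # Stand-in for the scorer's global judgement of "promise".
        return random.random()

    def checker_accepts(proof_state):
        # Stand-in for an automatic proof checker validating a finished proof.
        return len(proof_state) >= 6

    class Node:
        def __init__(self, state):
            self.state, self.children = state, []
            self.visits, self.value = 0, 0.0

    def select(node, c=1.4):
        # UCT: balance the scorer's accumulated estimate against exploration.
        return max(node.children,
                   key=lambda ch: ch.value / (ch.visits + 1)
                   + c * math.sqrt(math.log(node.visits + 1) / (ch.visits + 1)))

    def search(root, iterations=500):
        for _ in range(iterations):
            node, path = root, [root]
            while node.children:              # descend to a leaf
                node = select(node)
                path.append(node)
            if checker_accepts(node.state):
                return node.state             # the single "play" happens here
            # Expand with the proposer's candidates; a real checker would
            # prune syntactically invalid continuations at this point.
            node.children = [Node(s) for s in propose_steps(node.state)]
            reward = max(score_state(ch.state) for ch in node.children)
            for n in path:                    # back up the scorer's judgement
                n.visits += 1
                n.value += reward
        return None

    print(search(Node([])))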
My guess is that such a model would learn to compose a collection of lemmas that score as useful, produce sub-proofs for those, and then combine them into the final proof. This still mimics a sequence of plays closely enough that the scoring model could recognize "progress" toward the goal and predict complementary lemmas until a final solution is visible.
It may even work for very long proofs by letting it "play" a portion of its early lemmas: they remain visible to the scorer, the proof sequence is chunked into as many pieces as the NLP model needs to see it all, and backtracking behind what was already output is no longer possible. Once a lemma and its proof are complete, it can be used or ignored in the future proof, but modifying it sounds less useful.
I think the probability of a GPT-style approach learning to generate mathematically interesting next steps is close to zero. The structure GPT captures is a sense of plausibility of a next step, and while you could argue that it's possible, given enough data, to train the network to rate only mathematically interesting steps highly, that training would be extremely slow. GPT-style training is most effective for language, where there is a certain degree of flexibility: it tends to learn plausibility and derives relevance mostly from context. This approach seems akin to searching the language space of English-plus-mathematics over all proofs and hoping to learn mathematically relevant continuations. I think the search space is way too large, and positive feedback far too rare, for the model to converge to anything useful.
I highly doubt you could get such a model to produce a useful collection of lemmas. Even if you could, I don't think a useful collection of lemmas progressing towards a goal has anywhere near the same structure as scoring a position in a game.
Ultimately, for any AlphaZero technique to work, you need a powerful position evaluation network. Almost all the magic happens in that network's pruning. It's quite remarkable that you can get a search of 80,000 nodes to do better than a search of 35,000,000 nodes; all of that is due to efficient pruning by the network. You haven't convinced me of any plausible approach to getting such a network here.
There's a very big difference between solving one problem (if you throw enough compute at it you will obviously succeed) and general theorem proving being solved. The latter essentially means being able to replace all programmers; that would effectively mean AGI in 5 to 10 years.
>> a GAN-like system that both generates problems that can't yet be solved and trains a solver to solve them seems to obviate the need for human training data.
I don't see how this is supposed to work: what if the generator just outputs unsolvable problems?
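A toy version of the loop shows exactly that failure mode: if the generator's only reward signal is "the solver failed", nothing grounds it toward problems that are solvable in principle, so the winning strategy is junk (all names below are made up for illustration).

    import random

    # Toy generator/solver loop. The generator is rewarded whenever the
    # solver fails, so emitting junk is the degenerate optimum.
    POOL = ["easy", "hard", "unsolvable junk"]

    def generate(weights):
        return random.choices(POOL, weights=weights)[0]

    def solve(problem):
        return "junk" not in problem  # stand-in solver: only junk stumps it

    weights = [1.0, 1.0, 1.0]
    for _ in range(1000):
        problem = generate(weights)
        if not solve(problem):                   # generator "wins"...
            weights[POOL.index(problem)] += 0.1  # ...so junk is reinforced

    print(dict(zip(POOL, weights)))  # the junk weight ends up dominating

Presumably you'd have to require the generator to produce a problem together with some evidence that it's solvable, but then it can only pose problems it already knows how to solve.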