Fair enough to be skeptical. Some responses to your points:
> the bar to doing something useful in mathematics is extremely high
Ah, but the bar to doing something interesting in automated theorem proving is much lower. Solving exercises from an advanced undergraduate proof-based course would already be of interest.
> Generating a good dataset of plausible and implausible next steps in mathematical proofs is a much harder problem.
There are thousands of textbooks, monographs, and mathematical research journals, so there really is a gigantic corpus of natural-language mathematical proofs to study.
In graduate school there were plenty of homework proofs the professors would describe as "follow your nose": after you made an initial step in the right direction, the remaining steps followed a kind of pattern that quickly becomes familiar. I think it is very plausible that a GPT3-style system trained on mathematical writing could learn these "follow your nose" patterns.
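As a concrete illustration of a "follow your nose" proof, here is a sketch in Lean 4 (assuming a recent toolchain with the `omega` tactic; `IsEven` is defined locally so the snippet is self-contained rather than using Mathlib's own `Even`). After the one real decision, unpacking the two witnesses, the remaining step is essentially forced:

```lean
-- Local definition so the example stands alone.
def IsEven (n : Int) : Prop := ∃ k, n = k + k

-- "The sum of two even integers is even": once the witnesses a and b
-- are extracted, the combined witness a + b is the obvious next move,
-- and the leftover arithmetic goal is routine.
theorem even_add_even {m n : Int} (hm : IsEven m) (hn : IsEven n) :
    IsEven (m + n) := by
  obtain ⟨a, ha⟩ := hm
  obtain ⟨b, hb⟩ := hn
  exact ⟨a + b, by omega⟩
```

The hope would be that a model trained on enough proofs of this shape learns "given an existential hypothesis, destructure it and combine the witnesses" as a reusable pattern.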
> problems you classify into "formal language translation" aren't even translation problems
Fair. Going from natural-language proofs, like those in a textbook, to a formal language of the kind automated theorem provers use has similarities to a natural-language translation problem, but it would be fair to say this is its own category of problem.
I agree there might be some sort of translation problem that would partially automate converting all the examples in textbooks, monographs, and research journals from the English+Mathematics pseudocode they are written in into correct formal-logic statements. That is an interesting and complex problem, and solving it could make the cost of building such a dataset manageable. One remaining difficulty is that most of these sources start very far past the axioms, so to use them you need formal-language proofs for each of the things they assert without proof.
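To make that last difficulty concrete, here is a Lean 4 sketch (lemma names like `Int.add_mul` assumed from the core library; `IsEven` defined locally for self-containment). A textbook might assert "an even number times anything is even" in passing, but formally that step must itself be a proved lemma before anything downstream can cite it:

```lean
-- Local definition so the example stands alone.
def IsEven (n : Int) : Prop := ∃ k, n = k + k

-- The fact a textbook would assert without proof, now owed formally:
theorem even_mul_left {a : Int} (b : Int) (ha : IsEven a) :
    IsEven (a * b) := by
  obtain ⟨k, hk⟩ := ha
  -- Witness k * b works: (k + k) * b = k * b + k * b by distributivity.
  exact ⟨k * b, by rw [hk, Int.add_mul]⟩
```

Multiply that debt by every "clearly", "it follows that", and uncited prior result in a typical monograph, and the scale of the backfilling problem becomes apparent.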
I question whether you'd get high enough accuracy out of a pattern-matching model like GPT3 that occasionally chooses an unusual or unexpected word. Given how frequently round-tripping A -> B -> A with GPT3 yields A* instead of A, I wonder whether we would actually capture the precise mathematical statements.
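The A -> B -> A concern can be phrased as a simple consistency check. This is a hypothetical sketch: `to_formal` and `to_natural` are stand-ins for whichever models perform each translation direction, not real APIs, and the toy stand-ins below exist only to show a lossless pair passing and a lossy pair failing:

```python
def round_trip_ok(statement, to_formal, to_natural):
    """Return True iff translating a statement into the formal language
    and back reproduces it exactly (after whitespace normalization).
    For mathematics, anything short of exact recovery is suspect."""
    recovered = to_natural(to_formal(statement))
    return " ".join(statement.split()) == " ".join(recovered.split())

# Toy stand-ins (hypothetical): a lossless pair round-trips cleanly,
# while a lossy pair (here, one that destroys capitalization) does not.
lossless = round_trip_ok("N is even", lambda s: ("F", s), lambda f: f[1])
lossy = round_trip_ok("N is even", lambda s: s.lower(), lambda f: f)
```

In practice the harsh part is that "A* instead of A" may differ by a single quantifier or variable name, which normalization like the above would rightly flag but a human skimming the output might miss.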