Basically, transformer models are currently the dominant architecture for NLP. They use attention mechanisms, which allow the model to draw correlations between tokens that are far apart in the text. The issue is that attention is an O(n^2) operation in sequence length. So the model is bounded by its context window, commonly 512 tokens today, and is thus bounded in how much it can understand at once.
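To make the O(n^2) point concrete, here is a minimal sketch of scaled dot-product attention in plain numpy (a toy illustration, not any particular library's implementation; the function and variable names are my own):

```python
import numpy as np

def attention(Q, K, V):
    # Q, K, V: (n, d) arrays, one row of query/key/value vectors per token.
    d = Q.shape[-1]
    # The score matrix is (n, n): every token attends to every other token.
    # This is where the quadratic cost in sequence length comes from.
    scores = Q @ K.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V  # (n, d) contextualized token representations

n, d = 512, 64  # a 512-token context window with 64-dim heads
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
out = attention(Q, K, V)
print(out.shape)  # (512, 64), but the intermediate score matrix was 512x512
```

Doubling the context window to 1024 tokens quadruples the size of that score matrix, which is why the window can't simply be cranked up for free.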
Recent innovations and further study will broaden the context window, and thus unlock better reading comprehension and context understanding.
For instance, the ability to answer a question using a piece of text is mostly stuck at finding a single paragraph. The future will see models that can find multiple different paragraphs, understand how they relate, pull the relevant information, and synthesize it. This sounds like a minor step forward, but it's important.
This will unlock better conversational abilities, but also better ways to understand how different pieces of textual information relate. The scattering of information across the internet can be overcome. Computers will better understand context and act on human intention expressed through language, unlocking the ability to handle ambiguity. This will change the internet.
If this is really where the researchers think these tools are headed (and I don't really doubt you on that point), then this is incredibly dangerous stuff. No matter how good your system is, implicit, unintentional, and non-targeted bias has a huge impact on the sorts of content these systems will produce. But expose them to the levels of intentional manipulation present on the Internet of today, and these models don't stand a chance of producing something that safely does what you claim.
I'm happy to be wrong about this, but I'm not seeing any discussion about the safety and security of using these systems. And if it's not even being discussed, we can be sure nothing's actually being done about it. Selling promises of active-agent computers interpreting human intent and summarizing information from the Internet without addressing this concern is irresponsible at this point.