
I was just watching a science-related video containing math equations. I wondered how soon I will be able to ask the video player "What am I looking at here? Describe the equations" and have it OCR the frames, analyze them, and explain them to me.

It's only a matter of time before "browsing" means navigating HTTP sites via LLM prompts. That said, I think it is critical that LLM input NOT be restricted to verbal cues. Not everyone is an extrovert who longs to hear the sound of their own voice. A lot of human communication is non-verbal.

Once we get over the privacy implications (and I do believe this can only be done by worldwide legislative efforts), I can imagine looking at a "website" or video, and my expressions, mannerisms and gestures will be considered prompts.

At least that is what I imagine the tech would evolve into in 5+ years.



Now? OK, you need to screencap and upload it to an LLM, but that's well-established tech by now. (Where by "well established", I mean at least 9 months old ;)
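For the "screencap and upload" flow, something like the sketch below already works with off-the-shelf pieces: grab the screen, OCR it, and hand the text to whatever chat model you like. This is just an illustration of the pipeline; ask_llm() is a hypothetical placeholder for that last step, not a real API.

    # Hedged sketch: Pillow + pytesseract for the OCR step; ask_llm() is a
    # hypothetical stand-in for whichever LLM API you actually call.
    from PIL import ImageGrab      # pip install pillow
    import pytesseract             # pip install pytesseract (plus the tesseract binary)

    def explain_current_frame() -> str:
        frame = ImageGrab.grab()                    # capture the screen, i.e. the paused video frame
        text = pytesseract.image_to_string(frame)   # OCR the visible equations/text
        prompt = f"Explain what these equations mean:\n\n{text}"
        return ask_llm(prompt)                      # hypothetical: send the prompt to your LLM of choice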

Same goes for "navigating HTTP sites via LLM prompts". Most LLMs have web search integration, and the "Deep Research" variants do more complex navigation.

Video chat is there partially, as well. It doesn't really pay much attention to gestures & expressions, but I'd put the "earliest possible" threshold for that a good chunk closer than 5 years.


Yeah, all these things are possible today, but getting them well polished and integrated is another story. Imagine all this being supported by "HTML6" lol. When Apple gets around to making this part of Safari, then we'll know it's ready.


That's a great upper-bound estimator ;)

But kidding aside - I'm not sure people want this supported by web standards. We could be a huge step closer to that future had we decided to actually take RDF/Dublin Core/Microdata seriously. (LLMs perform a lot better with well-annotated data.)
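To make the "well-annotated data" point concrete, here's a rough sketch (my own illustration, with made-up field values): the kind of schema.org microdata publishers could have shipped, parsed with the extruct library so it comes out as structured items rather than prose to be OCR'd or scraped.

    # Illustrative only: schema.org microdata for a video, parsed with extruct
    # (pip install extruct). The markup and field values are invented for the example.
    import extruct

    html = """
    <div itemscope itemtype="https://schema.org/VideoObject">
      <h1 itemprop="name">Deriving the wave equation</h1>
      <span itemprop="description">Step-by-step derivation with worked examples.</span>
    </div>
    """

    data = extruct.extract(html, syntaxes=["microdata"])
    print(data["microdata"])  # structured items a crawler or LLM can consume directly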

The unanimous verdict across web publishers was "looks like a lot of work, let's not". That is, ultimately, why we need to jump through all the OCR hoops. Not only did the world not annotate the data, it then proceeded to remove as many traces of machine readability as possible.

So the gating factor probably isn't Apple & Safari & "HTML6" (shudder!)

If I venture my best bet on what's preventing polished integration: it's really hard to do with foundation models alone, and the number of people who want deep, well-informed conversations badly enough to pay for a polished app that delivers them is low enough that it's not the hot VC space. (Yet?)

Crystal ball: some OSS project will probably get within spitting distance of something really useful, but also probably flub the UX. Somebody else will pick up the ideas while they're hot and polish them in a startup. So, 18-36 months for an integrated experience from here?


Good lord, I dearly hope not. That sounds like a coddled hellscape world, something you'd see made fun of in Disney's Wall-E.


Hence my comment about privacy and the need for legislation :)

It isn't the tech that's the problem but the people who will abuse it.


While those are concerns, my point was that having everything on the internet navigated to, digested and explained to me sounds unpleasant and overall a drain on my ability to think and reason for myself.

It is specifically the way you describe using the tech that provokes a feeling of revulsion in me.


Then I think you misunderstand. The ML system would know when you want things digested for you and when you don't. Right now companies are assuming this and forcing LLM interaction. But when properly done, the system would know, based on your behavior or explicit prompts, what you want, and provide the service. If you're staring at a paragraph intently and confused, it might start highlighting common phrases or parts of the text/picture that might be hard to grasp, and based on your reaction to that, it might start describing things via audio, tooltips, a side pane, etc.

In other words, if you don't like how and when you're interacting with the LLM ecosystem, then that is an immature and failing ecosystem. In my vision this would be a largely solved problem, like how we interact with keyboards, mice, and touchscreens today.


No, I fully understand.

I am saying that this type of system, that deprives the user of problem solving, is itself a problem. A detriment to the very essence of human intelligence.


I just look at it as allowing the user to focus on problems that aren't already easily solved. Like using a calculator instead of calculating manually on paper.


But the scenario you described is one in which you need an equation explained to you. That is exactly the kind of scenario where it's important to do the calculation yourself to understand it.

If you are expecting problems to be solved for you, you are not learning, you're just consuming content.


explained != solved


> I wondered how soon I will be able to ask the video player "What am I looking at here? Describe the equations" and have it OCR the frames, analyze them, and explain them to me.

Seems like https://aiscreenshot.app might fit the bill.



