Hacker News

I sort of don't understand how it makes video searchable... I guess each frame is extracted from the video and some sort of model is applied to it? And each frame is stored as an embedding? I imagine the storage would become huge if, say, every frame of a video is stored and indexed!


Correct. Videos are broken down into frames, and each frame is indexed with a vision model. The images are scaled down to save space. The more media you index, the larger the index gets, so we're always looking for ways to make the process scale, such as compressing the vectors or partitioning the embeddings so they can be stored elsewhere.

I'm still learning a lot about performance for this type of operation, so it's a work in progress.
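To put rough numbers on the storage concern, here is a minimal sketch. The thread doesn't say which compression is used; this shows one common option, symmetric int8 scalar quantization, plus a back-of-the-envelope size estimate. The 512-dim embedding size (what CLIP ViT-B/32 produces), the 1 frame/s sampling rate, and all function names are assumptions for illustration.

```python
from typing import List, Tuple

FLOAT32_BYTES = 4  # bytes per float32 value


def quantize_int8(vec: List[float]) -> Tuple[bytes, float]:
    """Symmetric scalar quantization: one float scale per vector plus one signed byte per dimension."""
    max_abs = max((abs(x) for x in vec), default=0.0) or 1.0
    scale = max_abs / 127.0
    # round(x / scale) falls in [-127, 127]; "& 0xFF" stores negatives as two's-complement bytes
    codes = bytes(round(x / scale) & 0xFF for x in vec)
    return codes, scale


def dequantize_int8(codes: bytes, scale: float) -> List[float]:
    # Reinterpret each byte as a signed int8, then undo the scaling
    return [(b - 256 if b > 127 else b) * scale for b in codes]


def index_size_bytes(duration_s: float, sample_fps: float, dim: int, quantized: bool) -> int:
    """Rough vector storage for one video; ignores metadata and any ANN index overhead."""
    frames = int(duration_s * sample_fps)
    # int8: one byte per dimension plus a float32 scale; otherwise float32 per dimension
    per_vector = dim + FLOAT32_BYTES if quantized else dim * FLOAT32_BYTES
    return frames * per_vector


if __name__ == "__main__":
    # Assumed: 1 hour of video, sampled at 1 frame/s, 512-dim embeddings
    print(index_size_bytes(3600, 1.0, 512, quantized=False))  # 7372800
    print(index_size_bytes(3600, 1.0, 512, quantized=True))   # 1857600
```

Under those assumptions an hour of video costs about 7.4 MB of raw float32 vectors versus about 1.9 MB after int8 quantization, roughly 4x smaller, at the cost of each value being off by at most half a quantization step.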


Looks like image search using CLIP embeddings, with frames sampled from the video at a low frame rate.
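A minimal sketch of that pipeline shape: keep a few frames per second, store one embedding per kept frame, and rank frames by cosine similarity to a query embedding. In a real system the frame vectors would come from a CLIP image encoder and the query vector from the CLIP text encoder; here toy 3-dim vectors stand in for them, the function names are hypothetical, and a brute-force scan replaces whatever vector index the project actually uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

# (video_id, frame_number) -> embedding vector
Index = Dict[Tuple[str, int], Sequence[float]]


def sample_frame_indices(total_frames: int, native_fps: float, sample_fps: float) -> List[int]:
    """Keep roughly sample_fps frames per second out of a video decoded at native_fps."""
    step = max(1, round(native_fps / sample_fps))
    return list(range(0, total_frames, step))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query: Sequence[float], index: Index, native_fps: float,
           top_k: int = 3) -> List[Tuple[str, float, float]]:
    """Brute-force nearest neighbours: return (video_id, timestamp_seconds, score), best first."""
    scored = [(cosine(query, vec), video, frame) for (video, frame), vec in index.items()]
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(video, frame / native_fps, score) for score, video, frame in scored[:top_k]]


if __name__ == "__main__":
    # A 10 s clip at 30 fps sampled at 1 frame/s keeps frames 0, 30, 60, ..., 270
    print(sample_frame_indices(300, 30.0, 1.0))
    # Toy stand-ins for CLIP embeddings of three sampled frames
    toy_index: Index = {
        ("clip.mp4", 0): [1.0, 0.0, 0.0],
        ("clip.mp4", 30): [0.0, 1.0, 0.0],
        ("clip.mp4", 60): [0.7, 0.7, 0.0],
    }
    print(search([0.0, 1.0, 0.0], toy_index, native_fps=30.0, top_k=2))
```

Returning a timestamp rather than a frame number lets a hit link straight to the matching moment in the video; sampling at a low rate is what keeps the index from growing with every decoded frame.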


Exactly



