I sort of don't understand how it makes video searchable... I guess each fra...

correa_brian · on May 16, 2024

Correct. Videos are broken down into frames, and each frame is indexed with a vision model. The frames are scaled down first to save space. Indexing more media naturally grows the index, so we're always looking for ways to make the process scale better, such as compressing the vectors or partitioning the embeddings so they can be stored elsewhere.
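For anyone curious what "compressing the vectors" can look like, here is a minimal sketch of one common option, symmetric int8 scalar quantization. This is an assumption for illustration, not necessarily what this project does. Each float32 component (4 bytes) becomes one signed byte plus a single shared float32 scale per vector. The 512-dimension figure and the function names `quantize_int8`/`dequantize_int8` are hypothetical.

```python
from array import array


def quantize_int8(vec):
    """Symmetric scalar quantization: floats -> int8 values with one shared scale."""
    # Map the largest magnitude to 127; fall back to 1.0 for an all-zero vector.
    scale = max(abs(x) for x in vec) / 127 or 1.0
    return array("b", (round(x / scale) for x in vec)), scale


def dequantize_int8(q, scale):
    """Approximate reconstruction; per-component error is at most scale / 2."""
    return [x * scale for x in q]


# Toy 4-dim "embedding" standing in for a real vision-model output.
emb = [0.12, -0.50, 0.33, 0.05]
q, scale = quantize_int8(emb)
approx = dequantize_int8(q, scale)

# Storage per vector for a hypothetical 512-dim embedding.
dim = 512
float32_bytes = dim * 4      # 2048
int8_bytes = dim * 1 + 4     # 516 (the +4 stores the float32 scale)
```

The trade-off is roughly 4x smaller vectors in exchange for a small loss of precision in similarity scores.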

I'm still learning a lot about performance for this type of operation, so it's a work in progress.

danbrooks · on May 16, 2024

Looks like image search using CLIP embeddings with frames sampled from video at a low frame rate.
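A rough sketch of that pipeline: sample frames at a low rate, embed each sampled frame, then rank frames by cosine similarity against the query's embedding. CLIP puts text and images in a shared embedding space, so a text query can be compared directly with frame embeddings. The CLIP model itself is not called here. The toy 3-dim vectors are stand-ins for real embeddings, and `sample_frame_indices` and `search` are hypothetical helpers.

```python
import math


def sample_frame_indices(total_frames, video_fps, sample_fps):
    """Indices of frames to keep so roughly `sample_fps` frames/sec are sampled."""
    step = max(video_fps / sample_fps, 1.0)  # source frames between samples
    count = math.ceil(total_frames / step)
    return [int(i * step) for i in range(count)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(query_vec, frame_index, top_k=3):
    """frame_index: list of (timestamp_seconds, embedding). Best matches first."""
    scored = [(cosine(query_vec, emb), ts) for ts, emb in frame_index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


video_fps = 30.0
indices = sample_frame_indices(total_frames=90, video_fps=video_fps, sample_fps=1.0)

# Stand-ins for image embeddings of the sampled frames.
toy_embeddings = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
index = [(i / video_fps, emb) for i, emb in zip(indices, toy_embeddings)]

query = [0.0, 1.0, 0.0]  # stand-in for a text embedding of the search query
results = search(query, index, top_k=2)
```

Each result carries the frame's timestamp, which is what lets an image search double as a way to jump to a point in the video. A real index at scale would swap the linear scan for an approximate nearest-neighbor structure.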

correa_brian · on May 16, 2024

Exactly