
I'm reading the docs and it doesn't appear that Google keeps these embeddings at all. I send them some text, and they return an embedding of that text at the size I specified.

So the flow is something like:

1. Have a text doc (or library of docs)

2. Chunk it into small pieces

3. Send each chunk to <provider> and get an embedding vector of some size back

4. Use the embedding to:

4a. Semantic search / RAG: put the embeddings in a vector DB, embed the query, and run a similarity search against the stored vectors. The ultimate output is the text of the matching source chunks

4b. Run a clustering algorithm on the embeddings to generate some kind of graph representation of my data

4c. Train a classifier on the embeddings so I can classify new data

5. Crucially, the output of every branch in step 4 is text

6. Send that text to an LLM

At no point does the embedding itself end up directly in the model's memory; the LLM only ever sees text. Rough sketches of steps 2-4 follow.
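To make steps 2-3 concrete, here's a minimal sketch using the google-genai Python SDK. The model name ("gemini-embedding-001"), the output_dimensionality config field, and the naive chunking scheme are my assumptions from a reading of the docs, not anything the provider requires:

    # pip install google-genai
    from google import genai
    from google.genai import types

    client = genai.Client()  # picks up the API key from the environment

    def chunk(text: str, size: int = 500, overlap: int = 50) -> list[str]:
        # Step 2: naive fixed-size character chunks with a little overlap.
        step = size - overlap
        return [text[i:i + size] for i in range(0, len(text), step)]

    def embed(texts: list[str], dim: int = 768) -> list[list[float]]:
        # Step 3: send chunks to the provider, get one vector of the requested size per chunk.
        result = client.models.embed_content(
            model="gemini-embedding-001",  # assumed model name
            contents=texts,
            config=types.EmbedContentConfig(output_dimensionality=dim),
        )
        return [e.values for e in result.embeddings]

    doc = open("my_doc.txt").read()   # "my_doc.txt" is a placeholder
    chunks = chunk(doc)
    vectors = embed(chunks)           # one vector per chunk; nothing is retained provider-side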
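Steps 4a, 5 and 6, with a brute-force cosine-similarity search standing in for a real vector DB. Note that what comes out of retrieval and goes into the prompt is the chunk text, never the vectors; the generation model name below is a placeholder:

    import numpy as np

    def top_chunks(query: str, chunks: list[str], vectors: list[list[float]], k: int = 3) -> list[str]:
        # Step 4a: embed the query, rank stored chunk vectors by cosine similarity,
        # and return the *text* of the best-matching chunks.
        q = np.array(embed([query])[0])
        m = np.array(vectors)
        sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
        return [chunks[i] for i in np.argsort(sims)[::-1][:k]]

    # Steps 5-6: the LLM only ever sees text -- the retrieved chunks, not their embeddings.
    question = "What does the document say about data retention?"
    context = "\n\n".join(top_chunks(question, chunks, vectors))
    answer = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder generation model
        contents=f"Answer using only this context:\n{context}\n\nQuestion: {question}",
    )
    print(answer.text)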
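Steps 4b and 4c are ordinary ML on the vector matrix. A sketch with scikit-learn, reusing chunks/vectors/embed from above; KMeans stands in for whatever clustering you'd actually use, and the labels are a toy stand-in purely for illustration:

    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    X = np.array(vectors)

    # 4b: cluster the chunks (assumes more chunks than clusters); what you pass on to the
    # LLM is the chunk text grouped by cluster id, not the vectors.
    cluster_ids = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

    # 4c: train a classifier on embeddings you've labelled yourself (toy labels here,
    # which assume both classes actually occur), then classify new data by embedding it first.
    y = ["retention" if "retention" in c.lower() else "other" for c in chunks]
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    new_vec = np.array(embed(["Some brand new text to classify"]))
    print(clf.predict(new_vec))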


