The Shinkansen is FASCINATING.
I recently went and was amazed by Tokyo's infrastructure and how they have a city under a city.
The fact that there is a bullet train at Tokyo Station every 10 mins or so is mind-blowing.
I went down a YouTube rabbit hole the other night...
Well, of course I'm biased on the answer :). But to give a not-so-biased answer, I would first try to understand what the project is about and whether RAG is a priority in it.
If the project is leveraging agents and LLMs without worrying too much about context/up-to-date data, then Haystack could be a good option.
If the focus is to eventually use RAG then our framework could help.
Additionally, there might be a route where both are used, depending on the use case.
Feel free to DM me if you want to chat further on this!
Relevance calculations are handled by the vector DB, but we try to improve that relevance with metadata: you'll see how our components have "selectors" so that metadata can flow all the way to the vector database at the vector level and influence the results/scores retrieved at search time.
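To make the idea concrete, here is a minimal sketch of how metadata attached to vectors can influence scores at search time. This is purely illustrative (the `search` function, the boost scheme, and the sample docs are made up for the example, not the actual component API); real vector DBs expose this as metadata filters or weighted hybrid scoring.

```python
import numpy as np

def cosine(a, b):
    """Plain cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Each record stores the embedding *and* the metadata that was "selected"
# at ingestion time, so it is available to the scorer at query time.
docs = [
    {"text": "doc A", "vec": np.array([1.0, 0.0]), "meta": {"source": "wiki"}},
    {"text": "doc B", "vec": np.array([0.9, 0.1]), "meta": {"source": "tweet"}},
]

def search(query_vec, docs, preferred_source=None, boost=0.1):
    """Score docs by cosine similarity, nudged by a metadata match."""
    results = []
    for d in docs:
        score = cosine(query_vec, d["vec"])
        # Metadata influences the final ranking, not just the raw vector match.
        if preferred_source and d["meta"].get("source") == preferred_source:
            score += boost
        results.append((score, d["text"]))
    return sorted(results, reverse=True)
```

With `preferred_source="tweet"`, doc B outranks doc A even though doc A is the closer vector match, which is the kind of influence on retrieved scores described above.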
Got it. I'd encourage you to expose more of that functionality at the level of your application if possible. I think there is a lot of potential in using more than just cosine similarity, especially when there are lots of candidates and you really want to sharpen up the top few recommendations to the best ones. You might find this open-source library I made recently useful for that:
I've had good results from starting with cosine similarity (using FAISS) and then "enriching" the top results from that with more sophisticated measures of similarity from my library to get the final ranking.
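The two-stage pattern above can be sketched roughly like this. For a self-contained example, NumPy stands in for FAISS in the first stage, and a toy token-overlap score stands in for the library's more sophisticated second-stage measure; all names and data here are illustrative.

```python
import numpy as np

def top_k_cosine(query_vec, doc_vecs, k):
    """Stage 1: cheap cosine similarity over all docs (FAISS's role)."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q  # dot product of normalized vectors == cosine similarity
    return np.argsort(scores)[::-1][:k]

def rerank(query_text, candidate_ids, texts):
    """Stage 2: re-rank only the top candidates with a richer measure.
    Token overlap is a placeholder for the more sophisticated similarity."""
    q_tokens = set(query_text.lower().split())
    def overlap(i):
        return len(q_tokens & set(texts[i].lower().split()))
    return sorted(candidate_ids, key=overlap, reverse=True)

# Toy corpus
texts = ["fast bullet train schedule", "train station map", "cooking recipes"]
doc_vecs = np.array([[1.0, 0.2], [0.8, 0.6], [0.0, 1.0]])

candidates = top_k_cosine(np.array([1.0, 0.1]), doc_vecs, k=2)
final = rerank("bullet train schedule", list(candidates), texts)
```

The point is that the expensive measure only ever sees the top few candidates, so you get sharper final rankings without paying its cost over the whole corpus.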
Hey! Yes! I am the creator. Anything specific you want to know? We published the tech stack in this tweet https://x.com/kevin_neum/status/1712915693874958604?s=20
but essentially the way it works is:
1. Vercel and Next.js for frontend code and deployment
2. Neum to power the RAG pipelines, so the chatbot can query up-to-date information which we pull from a variety of sources
2.a) the text embeddings are stored in Weaviate (vector db)
3. We then create a prompt/some code with LangChain to query OpenAI and stream the response back!
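The retrieval-and-answer steps (2-3) above look roughly like this in pseudocode form. This is a hypothetical sketch, not the actual pipeline: `retrieve` stands in for the Weaviate vector search that Neum keeps refreshed, and `call_llm` stands in for the LangChain/OpenAI streaming call.

```python
def retrieve(question):
    # Stub for the Weaviate vector search over Neum-ingested embeddings.
    # A real implementation would embed the question and run a similarity query.
    return [{"text": "Candidate X announced policy Y.", "source": "tweet"}]

def build_prompt(question, docs):
    # Ground the model in retrieved context, carrying source labels along
    # so the bot can show users where each piece of info came from.
    context = "\n".join(f"- {d['text']} (source: {d['source']})" for d in docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

def call_llm(prompt):
    # Stub for the streamed OpenAI completion via LangChain.
    return "Candidate X announced policy Y. [source: tweet]"

def answer(question):
    docs = retrieve(question)
    return call_llm(build_prompt(question, docs))
```

The key design choice is that sources ride along with the retrieved text all the way into the prompt, which is what lets the bot cite them back to the user.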
Thanks! Yes, we ingest from a variety of sources. You can check the about section of the page, but essentially we power the RAG for the chatbot with Neum (https://neum.ai) - disclaimer, I'm the co-founder.
The bulk of the data is refreshed from tweets from all of the candidates.
We also pull in data from public sources such as Wikipedia and Ballotpedia (the bot outputs the sources used).
And we also pull in transcripts of interviews the candidates have had. Again, if a piece of info was used from any of these sources, we show it to the user.