Chat with Docs
Build an AI Chatbot with MistralAI + Streamlit
Chat with your docs using RAG + embeddings with Mistral AI.
Want to chat with your docs?
This article is a quick walkthrough of how to build your own chatbot using Streamlit and Mistral AI.
Code is here → Deepnote Notebook and GitHub.
Interested in collaborating? DM on Twitter!
What is Mistral AI
Based in France, Mistral AI is the actual open OpenAI, with a mission to deliver the best open models to the developer community.
They stand out for their commitment to open-source development, offering models under the Apache 2.0 License and providing access to raw model weights for research.
Raw weights
They open-source both pre-trained models and fine-tuned models.
- Mistral-7B-v0.1: Hugging Face // raw_weights (md5sum: 37dab53973db2d56b2da0a033a15307f)
- Mistral-7B-Instruct-v0.2: Hugging Face // raw_weights (md5sum: fbae55bc038f12f010b4251326e73d39)
- Mixtral-8x7B-v0.1: Hugging Face
- Mixtral-8x7B-Instruct-v0.1: Hugging Face // raw_weights (md5sum: 8e2d3930145dc43d3084396f49d38a3f)
If you want to know more about their models, read the blog posts for Mistral 7B and Mixtral 8x7B.
Mistral models
Mistral AI provides three models through their API endpoints: tiny, small, and medium.
They also have an embedding model.
Pricing
The pricing is pay-as-you-go.
They have a nice dashboard that shows you the usage.
Comparison with other models
Mistral-medium is really impressive and sits perfectly sandwiched between GPT-3.5 and GPT-4. In my (limited) experience, it’s a great choice for anyone who isn’t able to get consistency or quality out of GPT-3.5.
Even cheaper options on other hosts
- Anyscale: $0.15/M and $0.50/M for Mistral-tiny (7B) and Mistral-small (8x7B)
- Deepinfra: has Mixtral for $0.27/M input & output
It’s time to Build
A chat-with-resume app
I’m using my resume for this demo. You can use any document you want.
Here’s the resume for reference.
Reading PDF
I have my resume in the data folder.
We use pathlib’s Path to read in the data folder and pypdf to read the PDF file. We store everything in the text variable.
Let’s peek at the first 100 characters.
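The reading step might look like this sketch, assuming the resume lives somewhere under a data/ folder (the exact filename is up to you):

```python
from pathlib import Path

from pypdf import PdfReader

# Read every PDF in the data folder and concatenate the page text
text = ""
for pdf_path in Path("data").glob("*.pdf"):
    reader = PdfReader(pdf_path)
    for page in reader.pages:
        text += page.extract_text()

# Peek at the first 100 characters
print(text[:100])
```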
Chunking
Retrieval augmented generation (RAG) gives LLMs context data to reduce hallucination in the response. It’s like if I asked you a question and also gave you a textbook: you’d retrieve the answer from that textbook.
To perform RAG, we need to split documents into smaller chunks, so it’s more effective to identify and retrieve the most relevant information.
Depending on the use case, a smaller chunk size will be beneficial for RAG to identify and extract relevant information more accurately, as larger text chunks can contain filler text that obscures the semantic representation.
Here, we combine 500 characters into one chunk, which gives us 8 chunks.
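The chunking itself can be a simple slice over the string; a minimal sketch (chunk_text is a hypothetical helper name):

```python
def chunk_text(text: str, chunk_size: int = 500) -> list[str]:
    # Slice the document into fixed-size character chunks
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

# A ~4,000-character resume yields 8 chunks of 500 characters
chunks = chunk_text("a" * 4000)
print(len(chunks))  # → 8
```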
Embedding
We create a text embedding for each text chunk, a numerical representation of text in the vector space. In other words, we’re mapping words to vectors.
In this space, words with similar meanings are closer to each other.
To create embeddings, we use Mistral AI’s embeddings API endpoint.
We create a simple embed function to get embeddings from a single chunk and store all of them in a NumPy array.
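A sketch of that function, assuming the pre-1.0 mistralai Python client (the client class and method names may differ in newer versions):

```python
import os

import numpy as np
from mistralai.client import MistralClient

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

# Example chunks; in the article these come from the chunking step
chunks = ["First 500-character chunk...", "Second 500-character chunk..."]

def embed(chunk: str) -> list[float]:
    # Get the embedding of a single chunk from the mistral-embed model
    response = client.embeddings(model="mistral-embed", input=[chunk])
    return response.data[0].embedding

# Stack all chunk embeddings into one (n_chunks, 1024) NumPy array
embeddings = np.array([embed(chunk) for chunk in chunks])
```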
Here’s what the embeddings look like.
These embeddings have a dimension of 1024, which means the size of the word vector is 1024.
Vector databases
Once we have the embeddings, we store them in a vector database for efficient processing and retrieval.
Here, we use Faiss, an open-source vector db developed by Meta.
We create an index to store our embeddings.
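With Faiss, a flat L2 index takes a few lines; here random vectors stand in for the real embeddings so the snippet is self-contained:

```python
import faiss
import numpy as np

d = 1024  # dimension of the mistral-embed vectors
embeddings = np.random.rand(8, d).astype("float32")  # stand-in for real embeddings

index = faiss.IndexFlatL2(d)  # exact L2 (Euclidean) search, no training needed
index.add(embeddings)
print(index.ntotal)  # → 8
```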
There are other indexes available.
Now that we have an index that stores all our embeddings, we can process the user question.
Query
When a user asks a question, we create embeddings using the same model.
This way, we can compare the user’s question with the embeddings we have stored.
Retrieval
To find the best information to answer the user query, we perform a search on our vector db using index.search. It takes two parameters: the embedding of our question and k, the number of similar vectors to retrieve.
The function returns the distances (D) and indices (I) of the most similar vectors, and based on the indices, we can return the actual text.
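Putting the query and retrieval steps together, reusing the embed function, index, and chunks from the previous steps (k=2 is an assumption):

```python
import numpy as np

question = "What work experience does he have?"

# Embed the question with the same model used for the chunks
question_embedding = np.array([embed(question)]).astype("float32")

# D: distances, I: indices of the k nearest chunks
k = 2
D, I = index.search(question_embedding, k)

# Map the indices back to the actual text chunks
retrieved_chunks = [chunks[i] for i in I[0]]
```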
Prompt
We create a prompt template that combines the retrieved chunk and the question.
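One possible template (the wording and the build_prompt name are illustrative, not the article’s exact prompt):

```python
def build_prompt(retrieved_chunks: list[str], question: str) -> str:
    # Combine the retrieved context and the user question into one prompt
    context = "\n".join(retrieved_chunks)
    return (
        "Context information is below.\n"
        "---------------------\n"
        f"{context}\n"
        "---------------------\n"
        "Given the context information and not prior knowledge, answer the query.\n"
        f"Query: {question}\n"
        "Answer:"
    )

prompt = build_prompt(["Data scientist at bitgrit."], "Where does he work?")
```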
Chat model
Using the Mistral chat completion API with a Mistral model (here, mistral-medium), we generate an answer based on the user question and the context retrieved.
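A sketch of that call, again assuming the pre-1.0 mistralai client interface (reply mirrors the function name used later in the article):

```python
import os

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

def reply(prompt: str) -> str:
    # Send the combined context + question prompt to mistral-medium
    response = client.chat(
        model="mistral-medium",
        messages=[ChatMessage(role="user", content=prompt)],
    )
    return response.choices[0].message.content
```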
I’d say this is a great answer!
Putting everything together
We can put all the pieces together and write a function ask that takes a question and returns a response.
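One way to wire the pieces into ask, reusing the embed, reply, and build_prompt helpers plus the index and chunks sketched above:

```python
import numpy as np

def ask(question: str, k: int = 2) -> str:
    # Embed the question, retrieve the k closest chunks, and generate an answer
    question_embedding = np.array([embed(question)]).astype("float32")
    D, I = index.search(question_embedding, k)
    retrieved_chunks = [chunks[i] for i in I[0]]
    return reply(build_prompt(retrieved_chunks, question))

# ask("What programming languages does he know?")
```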
Let’s test it on a couple more questions.
Now that you know the individual pieces that make up the app, let’s look at the Streamlit app!
Streamlit app
Here’s a demo of the app, with text streaming included.
Building the index
We create a function to build out the index and have it return the index and the chunks. We need the chunks to retrieve them later on a new query.
We cache this index using st.cache_resource so Streamlit doesn’t recreate it on every rerun.
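A sketch of the cached builder, assuming the helpers from the earlier steps (chunk_text and the PDF-reading code are the hypothetical names used above):

```python
import faiss
import numpy as np
import streamlit as st

@st.cache_resource
def build_index(pdf_folder: str = "data"):
    # Read PDFs, chunk, embed, and index — runs once, then Streamlit
    # returns the cached result on every rerun
    text = read_pdfs(pdf_folder)  # hypothetical helper wrapping the pypdf code
    chunks = chunk_text(text)
    embeddings = np.array([embed(c) for c in chunks]).astype("float32")
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)
    return index, chunks

index, chunks = build_index()
```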
Streaming
To simulate streaming, we write a generator stream_response that yields the responses from the AI, and another generator stream_str that yields characters from a string one by one, which we pass to st.write_stream.
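The two generators can be as small as this (the delay value is an assumption); in the app you would render them with st.write_stream(stream_str(answer)):

```python
import time

def stream_str(s: str, speed: float = 0.01):
    # Yield the string one character at a time to simulate typing
    for char in s:
        yield char
        time.sleep(speed)

def stream_response(response: str):
    # Wrap the model's reply so it can be rendered incrementally
    yield from stream_str(response)
```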
Storing messages
To maintain a history of the chat in our app, we store the messages in the session_state variable messages.
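The standard Streamlit pattern for that looks roughly like:

```python
import streamlit as st

# Initialize the chat history once per session
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the stored messages on each rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])
```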
main
In our main, we create a sidebar button to reset the conversation, which clears the session state.
And we put all the pieces together. We first add a welcome message, “Ask me anything!”, for the AI agent, and every query involves adding a human message and passing that query to reply to generate a response.
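A sketch of what main might look like, assuming the reply helper from earlier (titles and labels are placeholders):

```python
import streamlit as st

def main():
    st.title("Chat with my resume")

    # Sidebar button that wipes the conversation
    if st.sidebar.button("Reset conversation"):
        st.session_state.messages = []

    # Start the history with the assistant's welcome message
    if "messages" not in st.session_state or not st.session_state.messages:
        st.session_state.messages = [
            {"role": "assistant", "content": "Ask me anything!"}
        ]

    # Replay the stored messages on each rerun
    for message in st.session_state.messages:
        with st.chat_message(message["role"]):
            st.write(message["content"])

    # On a new query: add the human message, then generate a response
    if question := st.chat_input("Your question"):
        st.session_state.messages.append({"role": "user", "content": question})
        with st.chat_message("user"):
            st.write(question)
        answer = reply(question)
        st.session_state.messages.append({"role": "assistant", "content": answer})
        with st.chat_message("assistant"):
            st.write(answer)

if __name__ == "__main__":
    main()
```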
That’s it for this article!
If you have any questions, don’t hesitate to reach out to me by leaving a comment or on Twitter!
Thanks for reading
Be sure to follow the bitgrit Data Science Publication to keep updated!
Want to discuss the latest developments in Data Science and AI with other data scientists? Join our discord server!
Follow Bitgrit below to stay updated on workshops and upcoming competitions!
Discord | Website | Twitter | LinkedIn | Instagram | Facebook | YouTube