Chat with Docs

Build an AI Chatbot with MistralAI + Streamlit

Chat with your docs: RAG + embeddings with MistralAI.

Benedict Neo
bitgrit Data Science Publication
6 min read · Mar 28, 2024


Want to chat with your docs?

This article is a quick walkthrough of how to build your own chatbot using Streamlit and Mistral AI.

Code is here → Deepnote Notebook and GitHub.

Interested in collaborating? DM on Twitter!

What is Mistral AI

Based in France, Mistral AI positions itself as the truly open alternative to OpenAI, with a mission to deliver the best open models to the developer community.

They stand out for their commitment to open-source development, offering models under the Apache 2.0 License and providing access to raw model weights for research.


They open-source both pre-trained models and fine-tuned models.

If you want to know more about their models, read the blog posts for Mistral 7B and Mixtral 8x7B.

Mistral models

Mistral AI provides three models through their API endpoints: tiny, small, and medium.


They also have an embedding model.

Pricing

The pricing is pay-as-you-go.

They have a nice dashboard that shows you the usage.

Comparison with other models

From a comment on Reddit:

Mistral-medium is really impressive and sits perfectly sandwiched between GPT-3.5 and GPT-4. In my (limited) experience it’s a great choice for anyone that isn’t able to get consistency or quality out of GPT-3.5.

There are even cheaper options on other hosts.

It’s time to Build

A chat with resume app

I’m using my resume for this demo. You can use any document you want.

Here’s the resume for reference.

Reading PDF

I have my resume in the data folder.


We use pathlib's Path to list the files in the data folder and pypdf to read the PDF. We store everything in the text variable.

Let’s peek at the first 100 characters.
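Here's a minimal sketch of that step, assuming the resume PDF sits in a data folder (folder and file layout are from my setup; adjust as needed):

```python
from pathlib import Path
from pypdf import PdfReader

# read every PDF in the data folder and concatenate the page text
text = ""
for pdf_path in Path("data").glob("*.pdf"):
    reader = PdfReader(pdf_path)
    for page in reader.pages:
        text += page.extract_text()

print(text[:100])  # peek at the first 100 characters
```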

Chunking

Retrieval augmented generation (RAG) gives the LLM context data to reduce hallucination in its response. It's like if I asked you a question and also handed you a textbook to find the answer in: you're retrieving information from that textbook.

To perform RAG, we first split the document into smaller chunks, which makes it easier to identify and retrieve the most relevant information.

Depending on the use case, a smaller chunk size will be beneficial for RAG to identify and extract relevant information more accurately, as larger text chunks can contain filler text that obscures the semantic representation.

Here, we combine 500 characters into one chunk and get 8 chunks.
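A simple sketch of fixed-size character chunking (500 characters per chunk, as above):

```python
chunk_size = 500
chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
print(len(chunks))  # 8 chunks for this resume
```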

Embedding

We create a text embedding for each text chunk, a numerical representation of text in the vector space. In other words, we’re mapping words to vectors.

In this space, words with similar meanings are closer to each other.

To create embeddings, we use Mistral AI’s embeddings API endpoint.

We create a simple embed function to get embeddings from a single chunk and store all of them in a NumPy array.

Here's what the embeddings look like: each one has a dimension of 1024, meaning each chunk is represented by a vector of 1024 numbers.
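Here's a sketch of the embed helper, written against the mistralai Python client as it existed when this article was published (MistralClient, v0.x); newer versions of the SDK expose a different interface:

```python
import os
import numpy as np
from mistralai.client import MistralClient

client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

def embed(text_chunk: str) -> list[float]:
    # mistral-embed returns one 1024-dimensional vector per input string
    response = client.embeddings(model="mistral-embed", input=[text_chunk])
    return response.data[0].embedding

text_embeddings = np.array([embed(chunk) for chunk in chunks], dtype="float32")
print(text_embeddings.shape)  # (8, 1024)
```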

Vector databases

Once we have the embeddings, we store them in a vector database for efficient processing and retrieval.

Here, we use Faiss, an open-source vector similarity search library developed by Meta.

We create an index to store our embeddings.
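A minimal sketch using a flat (exact) L2 index; Faiss offers many other index types that trade accuracy for speed:

```python
import faiss

d = text_embeddings.shape[1]      # embedding dimension, 1024
index = faiss.IndexFlatL2(d)      # exact L2 (Euclidean) distance search
index.add(text_embeddings)        # Faiss expects float32 vectors
```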

There are other indexes available.

Now that we have an index that stores all our embeddings, we can process the user question.

Query

When a user asks a question, we create embeddings using the same model.

This way, we can compare the user's question with the embeddings we have stored.
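For example, with a made-up question about the resume (the question itself is just an illustration):

```python
question = "What programming languages does Benedict know?"
question_embedding = np.array([embed(question)], dtype="float32")
```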

Retrieval

To find the best information to answer the user query, we perform a search on our vector database using index.search. It takes two parameters: the embedding of our question and k, the number of similar vectors to retrieve.

The function returns the distances (D) and indices (I) of the most similar vectors, and based on the indices, we can return the actual text chunks.
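A sketch of the retrieval step, using k = 2 as an illustrative value:

```python
k = 2                                         # number of chunks to retrieve
D, I = index.search(question_embedding, k)    # distances and indices
retrieved_chunks = [chunks[i] for i in I[0]]  # map indices back to text
```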

Prompt

We create a prompt template that combines the retrieved chunk and the question.
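The exact wording is up to you; here's a sketch of one such template:

```python
prompt = f"""
Context information is below.
---------------------
{retrieved_chunks}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""
```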

Chat model

Using the Mistral chat completion API with a Mistral model (here we're using mistral-medium), we generate an answer based on the user question and the retrieved context.
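A sketch of the chat call, again against the v0.x client (ChatMessage and client.chat); the helper name run_mistral is mine:

```python
from mistralai.models.chat_completion import ChatMessage

def run_mistral(user_message: str, model: str = "mistral-medium") -> str:
    messages = [ChatMessage(role="user", content=user_message)]
    response = client.chat(model=model, messages=messages)
    return response.choices[0].message.content

print(run_mistral(prompt))
```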

I’d say this is a great answer!

Putting everything together

We can put all the pieces together and write a function ask that takes a question and returns a response.
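Roughly, ask just chains the steps we've already seen (embed the question, search the index, build the prompt, call the chat model):

```python
def ask(question: str, k: int = 2) -> str:
    question_embedding = np.array([embed(question)], dtype="float32")
    D, I = index.search(question_embedding, k)
    retrieved_chunks = [chunks[i] for i in I[0]]
    prompt = f"""
Context information is below.
---------------------
{retrieved_chunks}
---------------------
Given the context information and not prior knowledge, answer the query.
Query: {question}
Answer:
"""
    return run_mistral(prompt)

print(ask("Where did Benedict go to school?"))
```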

Let’s test it on a couple more questions.

Now that you know the individual pieces that make up the app, let's look at the Streamlit app!

Streamlit app

Here’s a demo of the app, with text streaming included.

Building the index

We create a function to build the index and have it return the index and the chunks. We need the chunks to retrieve them later for a new query.

We cache this index using st.cache_resource so Streamlit isn't rebuilding it on every rerun.
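A sketch of what that looks like (the function name build_index is illustrative):

```python
import streamlit as st

@st.cache_resource  # build once, reuse across reruns and sessions
def build_index(pdf_dir: str = "data", chunk_size: int = 500):
    text = ""
    for pdf_path in Path(pdf_dir).glob("*.pdf"):
        reader = PdfReader(pdf_path)
        for page in reader.pages:
            text += page.extract_text()
    chunks = [text[i : i + chunk_size] for i in range(0, len(text), chunk_size)]
    embeddings = np.array([embed(c) for c in chunks], dtype="float32")
    index = faiss.IndexFlatL2(embeddings.shape[1])
    index.add(embeddings)
    return index, chunks

index, chunks = build_index()
```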

Streaming

To simulate streaming, we write a generator stream_response that yields the responses from the AI, and another generator stream_str that yields characters from a string one by one, and pass them to st.write_stream.
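A sketch of the two generators; stream_str fakes streaming for canned text, while stream_response assumes the streaming chat endpoint of the v0.x SDK (client.chat_stream), whose chunks carry a delta.content field:

```python
import time

def stream_str(s: str, speed: float = 0.01):
    # yield a fixed string one character at a time
    for ch in s:
        yield ch
        time.sleep(speed)

def stream_response(response):
    # yield tokens as they arrive from client.chat_stream(...)
    for chunk in response:
        if chunk.choices[0].delta.content is not None:
            yield chunk.choices[0].delta.content

# st.write_stream consumes either generator and renders it incrementally
st.write_stream(stream_str("Ask me anything!"))
```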

Storing messages

To maintain a history of the chat in our app, we store the messages in the session_state variable messages.
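A sketch of the session-state bookkeeping:

```python
# initialise the history once per session
if "messages" not in st.session_state:
    st.session_state.messages = []

# replay the stored history on every rerun
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.write(message["content"])
```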

main

In our main, we create a sidebar button to reset the conversation, which clears the session state.

And we put all the pieces together. We first add a welcome message, "Ask me anything!", from the AI agent, and each user query is added as a human message and passed to reply to generate a response.
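Putting the Streamlit pieces together, a sketch of main (the reply step here is the same RAG pipeline as ask above):

```python
def main():
    st.title("Chat with my resume")

    # sidebar button that wipes the conversation
    if st.sidebar.button("Reset conversation"):
        st.session_state.messages = []

    # seed the chat with a welcome message from the assistant
    if not st.session_state.get("messages"):
        st.session_state.messages = [
            {"role": "assistant", "content": "Ask me anything!"}
        ]

    for msg in st.session_state.messages:
        with st.chat_message(msg["role"]):
            st.write(msg["content"])

    if query := st.chat_input("Ask a question about the resume"):
        st.session_state.messages.append({"role": "user", "content": query})
        with st.chat_message("user"):
            st.write(query)

        answer = ask(query)  # retrieve context and generate the reply
        with st.chat_message("assistant"):
            st.write_stream(stream_str(answer))
        st.session_state.messages.append({"role": "assistant", "content": answer})

if __name__ == "__main__":
    main()
```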

That’s it for this article!

If you have any questions, don't hesitate to reach out to me by leaving a comment or on Twitter!

Thanks for reading

Be sure to follow the bitgrit Data Science Publication to keep updated!

Want to discuss the latest developments in Data Science and AI with other data scientists? Join our discord server!

Follow Bitgrit below to stay updated on workshops and upcoming competitions!

Discord | Website | Twitter | LinkedIn | Instagram | Facebook | YouTube
