AI for productivity

Summarize your Emails with GPT4

Gmail API + OpenAI GPT 4 → better email

Benedict Neo

Published in

bitgrit Data Science Publication

5 min readMar 19, 2024

I read a lot of newsletters; the one I keep up with the most (and actually open) is TLDR.

I find it tedious to open up each individually; I prefer to have them all on one page.

For example, on my website, I have the Hackernews feed fetched from their API, displayed minimally with just the hyperlinked title and a speech balloon emoji that directed me to the comments.

All in one page. Simple and clean.

On a whim, I coded it. Here’s what the email looks like.

Since you clicked on this article, I’m assuming you want to also “summarize” your emails.

This article is a tutorial for you to recreate what I did.

I’ll show you how to read emails from specific Gmail senders, pass it to OpenAI GPT 4 with your custom prompt, and send the response back to your email.

So fire up your IDE (I recommend Cursor) and follow along.

All the code is in this Deepnote notebook.

Setup: Prerequisites

Here are some prerequisites

Python 3.10 or higher
Gmail account
Google Cloud account with Gmail API enabled
OpenAI API key

Once you have these, clone my GitHub

git clone https://github.com/benthecoder/gmail_llm.git
cd gmail_llm

Install some Python packages

pip install -r requirements.txt

Set up Google API credentials. This way, your code has access to your account.

There are many ways; this is one of them:

Create a new OAuth 2.0 Client ID (instructions)
Download the JSON file and rename it to credentials.json
Put credentials.json in the project directory.

Rename .env.local to .env and set your key

OPENAI_API_KEY=<YOUR_API_KEY>

Now, you should be good to go!

Configurations

I’m using the GPT 4 Turbo model with 128k context length, so it can handle more emails; feel free to switch this out.

You can also change the prompt to your liking. Look for system_message in the function summarize_email

You can now run python main.py

It should generate a token file in the background and prompt you with a few questions.

Here’s a sample output.

This code should also be generalizable to other use cases you have. If you want to tweak it or are curious about how it works, keep reading 👇.

Gmail stuff

Creating the Gmail client

Here, the credential file is read to create a token file, which expires after a day. So, remember to refresh this token when it expires.

With a Gmail client, you can perform Gmail operations.

Filtering emails

You can build a query to filter your emails based on sender and dates. More options are in the docs.

Fetching the emails

Emails are fetched with the users.messages.list API.

Parsing the emails

We only want the text from our emails, so it checks for the text/plain mimeType , and encode it in ASCII format so it’s readable.

Here’s what it looks like parsing the emails from TLDR.

And the body of one of the emails.

Sending emails

AI stuff

GPT 4 Turbo (JSON Output)

Some things to note here

I’m joining all the emails into one big string. There could be better ways to first format the data, maybe in a database.
It does a retry on fail using tenacity and stops after three attempts.
It checks for token length using tiktoken
Temperature is set to 0.9 arbitrarily; play around with this value and the prompt.
It enforces JSON output with response_format={“type”: “json_object”}

Output format

If OpenAI is behaving nicely and I get a JSON output from it, I convert it into a markdown format. You could prompt OpenAI to give this to you directly, but having it output JSON first is more reliable. Plus, it lets you utilize the data for other purposes if you decide to.

Next steps

This is an MVP, and there are a lot of improvements that can be made to this project.

Here are a few I can think of now:

Improve Link Extraction: Develop methods for accurately parsing and extracting links from newsletters.
Zapier Integration for Automation: Set up a workflow in Zapier to automate the execution of your Python script, processing emails, and sending summaries.
Database Integration: Store extracted links and relevant information in a database for better organization and long-term data management.
Monitoring and Logging: Implement systems to track the performance of your script and log any errors or issues for easier troubleshooting.
Token Count Optimization: Research and apply techniques to optimize the use of tokens in GPT-4 requests to reduce costs and improve efficiency.
Experiment with Output Formats: Explore different formats for presenting the extracted links, such as article-style summaries or other creative formats.

Thanks for reading!

Be sure to follow the bitgrit Data Science Publication to keep updated!

Want to discuss the latest developments in Data Science and AI with other data scientists? Join our Discord server!

Follow Bitgrit below to stay updated on workshops and upcoming competitions!