Data Science 🤝 Crypto

Predict Cryptocurrency Prices with Data Science

Train multiple models in a few lines of code with AutoGluon

Benedict Neo

Published in

bitgrit Data Science Publication

5 min read · Mar 29, 2024

Interested in Crypto?

Put your data science skills to the test by participating in this challenge!

Problem Statement

(The following information is taken from the bitgrit competition page)

In the ever-evolving world of digital finance, cryptocurrencies have become much more than just an investment opportunity — they’re revolutionizing the way we think about and engage in transactions online. After a rollercoaster year in 2022, marked by market downturns in response to interest rate hikes from the Federal Reserve, it’s been nothing short of thrilling to see cryptos bounce back with vigor at the start of 2023, even close to hitting record highs without any easing of federal funds rates.
Amidst this exciting landscape, bitgrit — a leading company in data science competitions and an emerging force in Web3 technology — is thrilled to announce the upcoming release of its own token, BGR. This new token is set to enrich experiences on our platform, making our community and competitions even more engaging. Now is the best time to celebrate our new endeavor with a brand-new challenge that’s all about forecasting the future of the crypto price. Whether you’re a seasoned trader or simply crypto-curious, we’re inviting you to put your prediction prowess to the test.
“Will the cryptocurrency (ticker is undisclosed) rise or fall within a two-week window?” That’s for you to decide, using a rich tapestry of information ranging from market trends to social media indicators. We believe that diversity fuels innovation, so no matter where you come from or what your background is, if you’ve got a knack for numbers and a passion for crypto, join a warm community of like-minded enthusiasts and excel in a professional yet congenial competition atmosphere.

The goal 🥅

Build a model that predicts whether the price of the cryptocurrency will go up (1) or down (0) two weeks from the day of prediction. The label is stored in the “Target” column, and the model should learn it from the features provided in the dataset.

Let’s look at the data.

The data 💾

Get the data by registering for the competition.

📂 dataset
 ├── test.csv
 ├── train.csv
 └── submission_format.csv

A little info about the data:

train.csv: data to train your machine learning model.
test.csv: unseen data that you run your trained model on to make predictions and create a submission file.
solution_format.csv: example of the format the submission file needs to be in to be properly scored.

More info about the data is in the guidelines section of the competition.

Now that you know the goal and some information about the data given to you, it’s time to get your hands dirty.

The code is on Deepnote.

Load data

Let’s take a peek at the data.

And some information about the train data.

Seems like there’s some missing data.

Let’s look at the distribution of the target column.

Seems like the two classes are fairly balanced!
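The loading and peeking steps above can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the helper name peek and the column feature_1_A are made up for illustration, and the tiny demo frame stands in for pd.read_csv("train.csv").

```python
import pandas as pd


def peek(df: pd.DataFrame, label: str = "Target") -> pd.Series:
    """Print a quick overview and return the class proportions of the label."""
    print(df.head())   # first few rows
    df.info()          # dtypes and non-null counts (reveals missing data)
    return df[label].value_counts(normalize=True)


# Tiny stand-in frame; in the competition you would load train.csv instead.
# "feature_1_A" is a hypothetical column name following the feature_x_y pattern.
demo = pd.DataFrame(
    {"feature_1_A": [0.1, None, 0.3, 0.4], "Target": [1, 0, 1, 0]}
)
balance = peek(demo)
print(balance)
```

If the proportions come out close to 0.5 each, the target is balanced and plain F1 or accuracy are reasonable to look at.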

Missing values

Now, let’s visualize the missing values.

Purple indicates missing values.

Let’s look at more specific numbers using this custom function.

It seems like the TR_x_EventInd columns, which the docs describe as “events that may or may not affect the cryptocurrency’s price,” have a lot of missing values.
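The custom function itself isn't shown here, so below is one way such a helper could look. The name missing_summary and the demo columns are assumptions for illustration, not the notebook's code.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame(
        {
            "n_missing": counts,
            "pct_missing": (counts / len(df) * 100).round(2),
        }
    )
    # Keep only columns that actually have missing values
    summary = summary[summary["n_missing"] > 0]
    return summary.sort_values("pct_missing", ascending=False)


# Hypothetical stand-in for the training data
demo = pd.DataFrame(
    {
        "TR_1_EventInd": [None, None, None, 1.0],
        "feature_1_A": [0.1, None, 0.3, 0.4],
        "Target": [1, 0, 1, 0],
    }
)
report = missing_summary(demo)
print(report)
```

Sorting by percentage puts the sparsest columns, like the event indicators, at the top.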

Correlation heatmap

Some of the features are strongly correlated with each other.

According to the competition page, the feature_x_y columns are independent variables taken from the market data.
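A heatmap is good for a first look, but it can help to list the strongest pairs explicitly. Here is a small sketch that does so with pandas and NumPy. The helper name, the 0.9 threshold, and the demo columns are all assumptions for illustration.

```python
import numpy as np
import pandas as pd


def top_correlated_pairs(df: pd.DataFrame, threshold: float = 0.9) -> pd.Series:
    """Return feature pairs whose absolute Pearson correlation is >= threshold."""
    corr = df.corr(numeric_only=True).abs()
    # Keep only the upper triangle so each pair appears once and the
    # diagonal (a column with itself) is excluded
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(upper).stack()
    return pairs[pairs >= threshold].sort_values(ascending=False)


# Hypothetical features: feature_1_B is exactly twice feature_1_A
demo = pd.DataFrame(
    {
        "feature_1_A": [1.0, 2.0, 3.0, 4.0, 5.0],
        "feature_1_B": [2.0, 4.0, 6.0, 8.0, 10.0],
        "feature_2_A": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
)
strong = top_correlated_pairs(demo)
print(strong)
```

Highly correlated pairs carry overlapping information, which is worth knowing when you later read the feature importances.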

Let’s start modeling!

AutoGluon

What is AutoGluon?

It’s an AutoML solution for Image, Text, Time Series, and Tabular data.

Its four main features are:

Quick Prototyping: Build machine learning solutions on raw data in a few lines of code.
State-of-the-art Techniques: Automatically utilize SOTA models without expert knowledge.
Easy to Deploy: Move from experimentation to production with cloud predictors and pre-built containers.
Customizable: Extensible with custom feature processing, models, and metrics.

Let’s dive into using AutoGluon for our dataset!

Train test split

First, we do a train-test split.

Then we define the label column.

Next, we drop the labels from our test dataset.
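These three steps might look like the sketch below. It assumes scikit-learn's train_test_split, a 20% hold-out, and a stratified split. None of these choices are confirmed by the article, and the demo frame is a stand-in for the training data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

label = "Target"

# Hypothetical stand-in for the data loaded from train.csv
demo = pd.DataFrame({"feature_1_A": range(20), "Target": [0, 1] * 10})

# Hold out 20% of the rows, keeping the class ratio the same in both parts
train_data, test_data = train_test_split(
    demo, test_size=0.2, random_state=42, stratify=demo[label]
)

# Keep the true labels aside, then drop them from the hold-out features
y_test = test_data[label]
test_data_nolabel = test_data.drop(columns=[label])

print(train_data.shape, test_data_nolabel.shape)
```

Note that this hold-out set is carved out of the labeled training data, so it is separate from the competition's test.csv.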

TabularPredictor

Now we initialize AutoGluon’s TabularPredictor and fit it using .fit().

A few notes on our predictor:

We use F1 as the evaluation metric, since that is the metric the competition uses.
We set a save path so we can load the model later on.
num_bag_folds = number of folds used for bagging of models.
num_bag_sets = number of repeats of k-fold bagging to perform.
Total number of models trained during bagging = num_bag_folds * num_bag_sets for each base model, so in our case, 5 models are trained per base model.
num_stack_levels = number of stacking levels to use in the stack ensemble.
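Putting those notes together, the predictor setup could be sketched like this. The values are assumptions: num_bag_folds=5 and num_bag_sets=1 are chosen to match the 5 models mentioned above, while num_stack_levels=1 and the save path are placeholders. The AutoGluon import sits inside the function so the snippet can be read and run without the library installed.

```python
LABEL = "Target"
SAVE_PATH = "agModels-crypto"  # hypothetical folder name for the saved models

# Assumed values: 5 folds x 1 set = 5 bagged models per base model
FIT_KWARGS = {
    "num_bag_folds": 5,
    "num_bag_sets": 1,
    "num_stack_levels": 1,
}


def train_predictor(train_data):
    """Initialize a TabularPredictor scored on F1 and fit it on train_data."""
    # Imported here so the rest of the snippet works without AutoGluon
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=LABEL, eval_metric="f1", path=SAVE_PATH)
    return predictor.fit(train_data, **FIT_KWARGS)


models_per_bag = FIT_KWARGS["num_bag_folds"] * FIT_KWARGS["num_bag_sets"]
print(f"Bagged models per base model: {models_per_bag}")
```

You would then call train_predictor(train_data) with the training split from the previous step.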

AutoGluon prints rich information as the predictor runs.

First, it shows your system info and information about your data.

Then it creates features using the AutoMLPipelineFeatureGenerator, which fills NA values and drops duplicate features.

Finally, it starts training the various models.

Once it’s done training, we can evaluate the models.

Evaluation

We can look into what AutoGluon inferred from the data.

We can make predictions using .predict().

We can also view the probabilities of the class predictions.

To evaluate the predictions, we use the evaluate_predictions function.

Here we plot a confusion matrix and the ROC curve.
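The evaluation steps above can be sketched as follows. The function evaluate_holdout wraps the AutoGluon calls and expects an already fitted predictor. The confusion_table helper is a plain pandas stand-in for the plotted confusion matrix, not the notebook's plotting code. Both helper names are made up for this example.

```python
import pandas as pd


def evaluate_holdout(predictor, test_data_nolabel, y_test):
    """Predict on the hold-out set and score the predictions."""
    # What AutoGluon inferred about the problem and the feature types
    print(predictor.problem_type)
    print(predictor.feature_metadata)

    y_pred = predictor.predict(test_data_nolabel)        # class labels
    y_prob = predictor.predict_proba(test_data_nolabel)  # class probabilities
    scores = predictor.evaluate_predictions(
        y_true=y_test, y_pred=y_pred, auxiliary_metrics=True
    )
    return y_pred, y_prob, scores


def confusion_table(y_true, y_pred) -> pd.DataFrame:
    """Cross-tabulate actual vs. predicted labels (rows = actual)."""
    actual = pd.Series(list(y_true), name="actual")
    predicted = pd.Series(list(y_pred), name="predicted")
    return pd.crosstab(actual, predicted)


# Toy labels to show the table layout
table = confusion_table([1, 0, 1, 1, 0], [1, 0, 0, 1, 1])
print(table)
```

In the table, the diagonal holds the correct predictions and the off-diagonal cells hold the two kinds of mistakes.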

And here’s a summary of all models that were trained.

We can also see the leaderboard.

The best model can be obtained like this.

We can also see which features were more important for predicting the target.
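As a rough sketch, the model inspection steps map onto a handful of predictor calls. This assumes a fitted predictor and a labeled hold-out frame. The helper name is hypothetical, and model_best is the attribute in recent AutoGluon releases (older versions may expose it as a method instead, so check your installed version).

```python
def inspect_models(predictor, test_data):
    """Collect the leaderboard, best model name, and feature importances."""
    predictor.fit_summary()                      # prints a summary of all trained models
    board = predictor.leaderboard(test_data)     # one row per model, scored on test_data
    best = predictor.model_best                  # name of the best model
    importance = predictor.feature_importance(test_data)  # permutation importance
    return board, best, importance
```

Here test_data must still contain the Target column, because both the leaderboard and the feature importance need true labels to score against.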

Once you’re satisfied with the best model, you can use it to predict on the test set.

Predict on test set

We can load the model based on our saved path.

Then call .predict().

We also create the submission file.
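A minimal sketch of these last steps is below. The submission layout (a single Target column, no index) is an assumption, so check the format file from the competition before submitting. The helper names and the output file name are also made up for illustration.

```python
import pandas as pd


def load_predictor(path: str):
    """Load a previously saved predictor from its save path."""
    # Imported here so the rest of the snippet works without AutoGluon
    from autogluon.tabular import TabularPredictor

    return TabularPredictor.load(path)


def make_submission(predictor, test_df: pd.DataFrame, out_path: str = "submission.csv"):
    """Predict on the competition test set and write the predictions to CSV."""
    preds = predictor.predict(test_df)
    # Assumed layout: one "Target" column; adjust to the competition's format file
    submission = pd.DataFrame({"Target": preds.to_numpy()})
    submission.to_csv(out_path, index=False)
    return submission
```

Typical usage would be make_submission(load_predictor(save_path), pd.read_csv("test.csv")), where save_path is the path you set when creating the predictor.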

That’s it!

More resources

There’s much more you can do with AutoGluon.

Check out these articles on their official website.

Essential Functionality
Feature Engineering
In-Depth — Hyperparameter tuning, accelerating inference, etc.
Tabular FAQ

Thanks for reading

Be sure to follow the bitgrit Data Science Publication to keep updated!

Want to discuss the latest developments in Data Science and AI with other data scientists? Join our Discord server!
