An AI that Predicts Video Game ESRB Ratings

A machine learning model created by Jonah Yang and Isabelle Hu. Given a game's year, genre, and platform, it predicts the game's ESRB rating.

Project Sections

Our Vision

Our initial goal was to accurately predict the age range for a given game, time period, genre, or all three combined. However, we did not have enough reliable data, so we had to compromise. We now show ESRB ratings for games and genres instead.

Cleaning and Processing Data

The first step in developing any machine learning model is to get the data and clean it. To do this, we dropped the rows and columns that were irrelevant or empty. Next, we converted Genre and Platform (inputs) and ESRB Rating (the output) from strings into numbers. We then balanced the dataset and split it into training and testing sets. After this process, we were ready to feed the data into our model.
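
As a rough illustration of the encoding and splitting steps, the sketch below maps each string to an integer and holds out part of the data for testing. The sample rows, the `build_codes` helper, the 75/25 split, and the fixed seed are all assumptions made for this example, not the project's actual code or data. Balancing is left out for brevity.

```python
import random

# Hypothetical rows: (genre, platform, ESRB rating). Not the project's data.
rows = [
    ("Sports", "Wii", "E"),
    ("Shooter", "PC", "M"),
    ("Racing", "PS2", "E"),
    ("Action", "X360", "T"),
    ("Shooter", "X360", "M"),
    ("Action", "PC", "T"),
    ("Sports", "PS2", "E"),
    ("Racing", "Wii", "E"),
]


def build_codes(values):
    """Map each distinct string to an integer, assigned in sorted order."""
    return {value: code for code, value in enumerate(sorted(set(values)))}


genre_codes = build_codes(r[0] for r in rows)
platform_codes = build_codes(r[1] for r in rows)
rating_codes = build_codes(r[2] for r in rows)

# Inputs (genre, platform) and output (rating) as integers.
encoded = [
    (genre_codes[g], platform_codes[p], rating_codes[r]) for g, p, r in rows
]

# Shuffle with a fixed seed, then hold out 25% of the rows for testing.
rng = random.Random(0)
shuffled = encoded[:]
rng.shuffle(shuffled)
split = int(len(shuffled) * 0.75)
train, test = shuffled[:split], shuffled[split:]
```

Keeping the code dictionaries around makes it possible to translate a predicted integer back into a rating string later.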

K-Nearest Neighbors

Our machine learning model uses KNN, or K-Nearest Neighbors. KNN classifies a data point by looking at the K closest points in the training data and assigning the most common label among them. For example, if all the nearest neighbors of a data point are E-rated, the data point is very likely to be E-rated as well. K can be set by the user and defaults to 5. For our model, we used K = 7.
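
To make the voting idea concrete, here is a minimal from-scratch sketch of KNN with K = 7. It is not the project's implementation: the `knn_predict` helper, the Euclidean distance measure, and the sample points are assumptions chosen for illustration.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=7):
    """Predict a label by majority vote among the k nearest training points."""
    # Pair each training label with its Euclidean distance to the query.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep the k closest neighbors, sorting by distance only.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical encoded games: (year, genre code, platform code).
points = [
    (2005, 1, 0), (2006, 1, 0), (2005, 3, 1), (2006, 3, 0), (2007, 1, 1),
    (2010, 2, 4), (2011, 2, 5), (2010, 2, 5), (2012, 2, 4),
]
labels = ["E", "E", "E", "E", "E", "M", "M", "M", "M"]

early_game = knn_predict(points, labels, (2006, 2, 0))  # 5 of 7 neighbors are E
late_game = knn_predict(points, labels, (2011, 2, 4))   # 4 of 7 neighbors are M
```

Because the genre and platform codes are arbitrary integers, distances between them carry no real meaning, which is one limitation of this simple setup.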

Visualization

HEATMAPS (jonah)

Our Goal

Processing Data

The first thing we did with the data was clean it. We deleted every column except Name, ESRB Rating, Genre, Developer, Year, and Platform, because many of the other columns were either empty or irrelevant.
Next, we dropped duplicate rows, rows containing N/A or NaN values, and rows whose ESRB rating appeared too rarely to be useful.
Then, we used custom hash functions to convert ESRB Rating, Genre, and Platform from strings to integers.
Finally, we balanced the dataset, because some ESRB ratings were overrepresented, and split it into training and testing sets.
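
The cleaning and balancing steps above might look roughly like the following in pandas. This is a sketch under several assumptions: the sample rows, the extra `Sales` column, the `MIN_COUNT` threshold, and balancing by downsampling to the smallest class are all hypothetical, since the text does not say which library or balancing method was used.

```python
import pandas as pd

# Hypothetical raw data. "Sales" stands in for the columns that were dropped.
raw = pd.DataFrame(
    [
        ("Game A", "E", "Sports", "Dev1", 2005, "Wii", 1.2),
        ("Game A", "E", "Sports", "Dev1", 2005, "Wii", 1.2),  # duplicate
        ("Game B", "E", "Racing", "Dev2", 2006, "PS2", 0.4),
        ("Game C", "E", "Sports", "Dev1", 2007, "Wii", 0.9),
        ("Game D", "M", "Shooter", "Dev3", 2010, "PC", 2.1),
        ("Game E", "M", "Shooter", "Dev3", 2011, "X360", 3.0),
        ("Game F", "T", "Action", None, 2009, "PC", 0.2),  # missing value
        ("Game G", "AO", "Action", "Dev4", 2008, "PC", 0.1),
        ("Game H", "T", "Action", "Dev4", 2012, "PC", 0.5),
    ],
    columns=["Name", "ESRB Rating", "Genre", "Developer", "Year", "Platform", "Sales"],
)

# Keep only the six columns named in the text, then drop duplicates and gaps.
keep = ["Name", "ESRB Rating", "Genre", "Developer", "Year", "Platform"]
df = raw[keep].drop_duplicates().dropna()

# Drop ratings that occur fewer than MIN_COUNT times (threshold is assumed).
MIN_COUNT = 2
counts = df["ESRB Rating"].value_counts()
df = df[df["ESRB Rating"].isin(counts[counts >= MIN_COUNT].index)]

# Balance by downsampling every rating to the size of the smallest class.
smallest = df["ESRB Rating"].value_counts().min()
balanced = df.groupby("ESRB Rating").sample(n=smallest, random_state=0)
```

Downsampling discards data from the larger classes; oversampling the smaller classes is an alternative when the dataset is small.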

A screenshot of our dataset.

Our Work

Assorted heatmaps

This heatmap shows the correlation between the major attributes of a video game: Platform, Year, Genre, and ESRB Rating. We chose this visual because our goal involved examining success rate by genre, age range, and release year.
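
The numbers behind a correlation heatmap like this one can be computed with a pairwise correlation matrix. The sketch below assumes pandas and already-encoded integer columns; the values are made up for illustration.

```python
import pandas as pd

# Hypothetical encoded games. Category columns hold arbitrary integer codes.
games = pd.DataFrame(
    {
        "Platform": [0, 1, 2, 0, 3, 1],
        "Year": [2005, 2006, 2010, 2007, 2011, 2009],
        "Genre": [3, 1, 2, 3, 2, 0],
        "ESRB Rating": [0, 0, 1, 0, 1, 2],
    }
)

# Pearson correlation between every pair of columns (a symmetric 4x4 table).
corr = games.corr()
```

The resulting table can be passed to a plotting function, for example `seaborn.heatmap(corr, annot=True)`, to draw the heatmap. Note that correlations involving category codes depend on how the codes were assigned, so they should be read with caution.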

This heatmap compares the ratings our model predicted with the actual ratings, shown as raw counts. It was generated from a confusion matrix. We chose this visual because it shows how consistent our model's accuracy is across ratings. For example, the top-left cell shows that our model correctly predicted the rating "E" 251 times.
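
For readers unfamiliar with confusion matrices, the following sketch builds one by hand. The `confusion_matrix` helper, the label order, and the sample predictions are assumptions for illustration. Rows are taken to be actual ratings and columns predicted ratings, which is a common convention but not confirmed by the text.

```python
def confusion_matrix(actual, predicted, labels):
    """Count how often each actual label (row) got each predicted label (column)."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Hypothetical test-set results, not the project's real predictions.
labels = ["E", "T", "M"]
actual = ["E", "E", "E", "T", "T", "M", "M", "M"]
predicted = ["E", "E", "T", "T", "M", "M", "M", "E"]

counts = confusion_matrix(actual, predicted, labels)
# counts[0][0] is the number of E-rated games correctly predicted as E.
```

Correct predictions land on the diagonal, so a strong model produces a heatmap with a bright diagonal and dim off-diagonal cells.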

This heatmap shows the same comparison between ratings, but as percentages instead of raw counts. It was also generated from a confusion matrix. Looking at the same top-left cell, the 251 correct "E" predictions translate to a prediction rate of 74%.
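
Converting counts to percentages can be sketched as follows. This example assumes each row is divided by its own total, so every row sums to 100%. The text does not state which normalization was used, and the counts here are made up.

```python
def to_row_percentages(counts):
    """Convert each row of counts into percentages of that row's total."""
    result = []
    for row in counts:
        total = sum(row)
        # Guard against empty rows to avoid dividing by zero.
        result.append([100 * value / total if total else 0.0 for value in row])
    return result


# Hypothetical confusion matrix (rows and columns ordered E, T, M).
counts = [
    [2, 1, 0],
    [0, 1, 1],
    [1, 0, 2],
]

percentages = to_row_percentages(counts)
rounded = [[round(value, 1) for value in row] for row in percentages]
```

With row normalization, each diagonal cell reads as the share of games with that actual rating that the model labeled correctly.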