Developed a personalized game recommendation system combining K-Means clustering and Nearest Neighbors algorithms to suggest games based on user preferences and game attributes. The system mitigates decision paralysis by offering tailored recommendations using game features like price, platform, genres, and release year. We implemented content-base

adsh16/game-recommendation-system

PlaySense - A Game Recommendation System 🎮

A sophisticated game recommendation system that combines K-Means clustering and Nearest Neighbors to provide personalized game recommendations. The system addresses the challenge of decision paralysis faced by players when navigating vast game libraries by offering tailored suggestions based on game attributes and user preferences.

Project Banner

🎯 Motivation

The gaming industry's exponential growth has led to an overwhelming number of choices for players. Our project addresses this challenge by:

  • Helping players navigate extensive game libraries effectively
  • Reducing decision paralysis through personalized recommendations
  • Supporting discovery of both popular and niche titles
  • Leveraging machine learning to understand and match player preferences

Games Growth

📊 Project Structure

game-recommendation-system/
├── data/
│   ├── final_data.csv
│   ├── games_with_reviews.csv
│   ├── games.csv
│   └── processed_game_data.csv
├── indexdir/
│   ├── _MAIN_1.toc
│   ├── MAIN_ez3ijabrp49e5se8.seg
│   └── MAIN_WRITELOCK
├── pickle/
│   └── kmeans_model.pkl
├── src/
│   ├── collaborative_filtering.ipynb
│   ├── data_scraping.py
│   ├── eda.ipynb
│   ├── main.ipynb
│   └── main2.ipynb
├── txt/
│   ├── notes.md
│   ├── output.txt
│   ├── testing.txt
│   └── user_agents.txt
└── documentation/
    ├── ML_Project_proposal.pdf
    ├── Project_Presentation.pdf
    └── Project_Report.pdf

📚 Literature Review

Our approach is informed by several key research papers:

  1. Machine-Learning Item Recommendation System for Video Games

    • Explores ERT and DNN models for personalized recommendations
    • Focuses on real-time user behavior adaptation
    • ERT model showed superior accuracy and scalability
  2. Content-Based and Context-Based Recommendation Systems

    • Reviews various recommendation techniques
    • Addresses challenges like information overload
    • Emphasizes importance of contextual information
  3. STEAM Game Recommendations

    • Investigates recommender systems for the STEAM platform
    • Tests various models including FM, DeepNN, and DeepFM
    • Found DeepNN performs best for accuracy and novelty

Literature Review Summary

🔍 Dataset Description

Content-Based Filtering Dataset

  • Source: Video Games Recommendation System (Kaggle)
  • Features:
    name | release_date | price | dlc_count | detailed_description | about_the_game
    windows | mac | linux | achievements | supported_languages | developers
    publishers | categories | genres | estimated_owners | average_playtime_forever
    

Dataset Features

Collaborative Filtering Dataset

  • Source: Steam Store
  • Contents:
    • 41 million user recommendations
    • Game metadata
    • User profiles
    • Review data

Data Distribution Analysis

EDA

EDA

🛠️ Methodology

Data Preprocessing

Initial Features

  • Numerical Features: Price, Release Year
  • Categories & Genres: A list of categories & genres that a game belongs to.
  • Platform: 0/1 Binary features for the availability of Windows, Mac, Linux.
  • Publishers & Studios: A list of publishers & studios that a game belongs to.
  • PlayTime, Description, Supported Languages: Other features that were either missing for many entries or not relevant.

Data Normalization

To ensure equal contribution of all features during clustering, numerical features like Price and Release Year were normalized. Binary features like Categories, Genres, and platform support were normalized using StandardScaler. This prevented any single feature or group of features from disproportionately influencing the clustering process.
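The normalization step can be sketched as follows. This is a minimal illustration, assuming the processed column names listed in this README (`price`, `release_year`, and the 0/1 platform flags). The toy values and the `features` variable are invented for the example, and the project's notebooks may apply the scaler differently.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy rows standing in for the processed game table (values are invented).
features = pd.DataFrame({
    "price": [0.0, 9.99, 59.99, 19.99],
    "release_year": [2012, 2018, 2023, 2020],
    "windows": [1, 1, 1, 0],
    "mac": [0, 1, 0, 1],
    "linux": [0, 1, 0, 0],
})

# StandardScaler rescales every column to zero mean and unit variance, so a
# wide-range column such as price cannot dominate Euclidean distances.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(features), columns=features.columns)

print(scaled.round(2))
```

Fitting the scaler once and reusing it (for example by pickling it next to the K-Means model) keeps new queries on the same scale as the training data.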


Processed Columns

# | Column Name | Dtype | Description
1 | windows | int64 | Binary feature for Windows
2 | mac | int64 | Binary feature for Mac
3 | release_year | int64 | Year of game release
4 | linux | int64 | Binary feature for Linux
5 | price | float64 | Price of the game
6 | categories | object | Categories list
7 | genres | object | Genres list
8 | game_studios | object | Associated game studios
9 | categories_includes_level_editor | int64 | Level editor feature
10 | categories_<category_name> | int64 | One-hot encoded categories
... | ... | ... | ...
52 | genres_nudity | int64 | Binary for genre: nudity
53 | genres_casual | int64 | Binary for genre: casual
54 | genres_short | int64 | Binary for genre: short
55 | genres_video_production | int64 | Binary for genre: video production
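
The `categories_*` and `genres_*` columns are one-hot encodings of the list-valued `categories` and `genres` columns. One way to produce such columns is sketched below with scikit-learn's `MultiLabelBinarizer`. The rows are invented, and the repository may use a different encoder.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Invented rows: each game carries a list of genre labels.
games = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "genres": [["Casual", "Indie"], ["Action"], ["Action", "Casual"]],
})

# One binary column per distinct label, named like the processed columns.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(games["genres"]),
    columns=[f"genres_{label.lower()}" for label in mlb.classes_],
    index=games.index,
)
games = pd.concat([games, encoded], axis=1)

print(games.drop(columns="genres"))
```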

Content-Based Filtering

  1. Feature Engineering
    • Numerical features: Price, Release Year
    • Binary features: Platform support, categories, genres
    • Studio clustering using K-Means++

Feature Engineering Process

  2. Dimensionality Reduction
    • Applied Truncated SVD
    • Reduced studio data to 60 components
    • The 60 components retain 6.8% of the explained variance
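
A sketch of the reduction step on a synthetic sparse matrix. The matrix shape and density are invented stand-ins for the one-hot studio data, and 60 components mirrors the number quoted in the list. Summing `explained_variance_ratio_` is how a figure such as 6.8% would be read off.

```python
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Synthetic stand-in for a one-hot game-by-studio matrix (mostly zeros).
rng = np.random.RandomState(0)
studio_matrix = sparse_random(500, 300, density=0.01, format="csr", random_state=rng)

# TruncatedSVD works directly on sparse input and does not center the data.
svd = TruncatedSVD(n_components=60, random_state=0)
studio_reduced = svd.fit_transform(studio_matrix)

print(studio_reduced.shape)
print(f"explained variance: {svd.explained_variance_ratio_.sum():.1%}")
```

A low retained variance is common for very sparse one-hot data, because the information is spread thinly over many near-independent columns.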

SVD Analysis Add SVD variance explanation chart

🔄 Iterative Model Development

Initial Challenges & Solutions

1. Random Initialization Problems

Random centroid initialization has several shortcomings:

  1. Reaching the global minimum from a random initialization is not guaranteed and is unlikely in practice.
  2. Silhouette analysis showed that the resulting clusters were poorly separated.
  3. Non-Convex Optimization Problem
    • Multiple local minima exist
    • Final clustering highly dependent on initial centroid positions
    • May lead to:
      • Splitting of a single cluster
      • Merging of two clusters
  4. Solution: The K-Means++ algorithm initializes centroids probabilistically so that they are spread out, which reduces the chance of poor convergence and suboptimal clusters. Compared with random initialization in standard K-Means, it typically converges faster and produces better clusters.

Random Centroid Init Clustering Comparison
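
The effect of the initialization strategy can be explored with a small experiment like the one below. It is a sketch on synthetic blobs rather than the game data. `n_init=1` is set on purpose so that each run reflects a single initialization, and the inertia (within-cluster sum of squares) is compared across seeds.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with five well-defined groups.
X, _ = make_blobs(n_samples=600, centers=5, cluster_std=1.0, random_state=42)

results = {}
for init in ("random", "k-means++"):
    # One initialization per run, repeated over 20 seeds.
    inertias = [
        KMeans(n_clusters=5, init=init, n_init=1, random_state=seed).fit(X).inertia_
        for seed in range(20)
    ]
    results[init] = (float(np.mean(inertias)), float(np.max(inertias)))
    print(f"{init:>10}: mean inertia={results[init][0]:.1f}, worst={results[init][1]:.1f}")
```

A larger gap between the mean and the worst inertia indicates that some seeds landed in a poor local minimum.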

2. Distance Metric Selection

  • Evaluated multiple distance metrics:
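    # Distance metric comparison
    from sklearn.metrics.pairwise import cosine_distances, euclidean_distances, manhattan_distances

    # Map each metric name to the function that computes pairwise distances
    metrics = {
        'euclidean': euclidean_distances,
        'manhattan': manhattan_distances,
        'cosine': cosine_distances
    }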
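
To see how the choice of metric changes the neighbors returned, the same query can be run through `NearestNeighbors` once per metric. The feature vectors below are random placeholders, not game features, and `neighbors_by_metric` is a name introduced only for this sketch.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Invented feature vectors; row 0 plays the role of the queried game.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
query = X[:1]

neighbors_by_metric = {}
for metric in ("euclidean", "manhattan", "cosine"):
    nn = NearestNeighbors(n_neighbors=6, metric=metric).fit(X)
    _, idx = nn.kneighbors(query)
    # Drop the first hit: it is the query row itself at distance 0.
    neighbors_by_metric[metric] = idx[0][1:].tolist()
    print(metric, neighbors_by_metric[metric])
```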

Distance Metrics

Model Comparison with GMM (Gaussian Mixture Models)

comparison with GMM
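
One way to set up such a comparison is to fit both models with the same number of clusters and score the resulting labels with the silhouette coefficient. The sketch below uses synthetic blobs, so the numbers it prints say nothing about the project's actual results.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Synthetic data; the real comparison would use the scaled game features.
X, _ = make_blobs(n_samples=500, centers=4, random_state=7)

kmeans_labels = KMeans(n_clusters=4, init="k-means++", n_init=10, random_state=7).fit_predict(X)
gmm_labels = GaussianMixture(n_components=4, random_state=7).fit_predict(X)

scores = {
    "kmeans": float(silhouette_score(X, kmeans_labels)),
    "gmm": float(silhouette_score(X, gmm_labels)),
}
for name, score in scores.items():
    print(f"{name}: silhouette={score:.3f}")
```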

Performance Analysis

Cluster-wise Silhouette Analysis:
- Cluster 1.0: 0.263 (46,626 games) - Good structure
- Cluster 3.0: 0.108 (16,156 games) - Normal structure
- Cluster 4.0: 0.065 (1,612 games) - Normal structure
- Cluster 2.0: 0.043 (19,331 games) - Weak structure
- Cluster 0.0: -0.010 (13,679 games) - Potential misclassification
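
Per-cluster figures of this kind can be obtained by averaging `silhouette_samples` within each cluster. The sketch below runs on synthetic data, so its output will not match the numbers listed for the game clusters.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples, silhouette_score

# Synthetic data in place of the game feature matrix.
X, _ = make_blobs(n_samples=500, centers=4, random_state=1)
labels = KMeans(n_clusters=4, n_init=10, random_state=1).fit_predict(X)

# One silhouette value per sample, then the mean and size of each cluster.
sample_scores = silhouette_samples(X, labels)
per_cluster = {
    int(c): (float(sample_scores[labels == c].mean()), int((labels == c).sum()))
    for c in np.unique(labels)
}

for c, (score, size) in sorted(per_cluster.items(), key=lambda kv: -kv[1][0]):
    print(f"Cluster {c}: {score:.3f} ({size:,} samples)")
print(f"Overall: {silhouette_score(X, labels):.3f}")
```

A negative cluster mean, as reported for Cluster 0.0, means that on average its members sit closer to another cluster than to their own.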

Silhouette Analysis

🚀 Getting Started

Prerequisites

python>=3.8
numpy
pandas
scikit-learn
scipy
whoosh

Installation

  1. Clone the repository
git clone [your-repository-link]
cd game-recommendation-system
  2. Install required packages
pip install -r requirements.txt

Usage Example

from src.recommender import GameRecommender

# Initialize the recommender
recommender = GameRecommender()

# Get recommendations for a game
recommendations = recommender.get_recommendations("FIFA")

# Example output:
# 1. FIFA 23 (Match Score: 0.89)
# 2. Pro Evolution Soccer 2023 (Match Score: 0.82)
# 3. FIFA 22 (Match Score: 0.81)

👥 Team Members

Aditya Sharma @adsh16

  • Data preprocessing
  • Feature engineering
  • Clustering analysis
  • Model development
  • EDA, SVD, fuzzy search

Kanishk Kumar Meena @KanishkKumarMeena

  • Data cleaning
  • Collaborative Filtering
  • Model evaluation

Vansh Aggarwal @VanshAg283

  • Dataset management
  • Visualization
  • EDA
  • Performance Testing

🔮 Future Work

  1. More robust, real-world testing of model performance, for example by comparing the model's recommendations against available databases of similar or co-purchased games
  2. Integration of user interaction data for hybrid recommendations
  3. Enhancement of clustering algorithms for better game categorization
  4. Implementation of real-time recommendation updates
  5. Addition of more sophisticated feature engineering techniques
  6. Development of a user interface for easier interaction

📊 Sample Run

  1. The model is run with the query "fifa".
  2. It first finds the closest matching game names in the database for the query string.
  3. It then uses the top match to retrieve the top recommendations, which are sorted by critic score.

Sample Run
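
The three steps can be sketched end to end as shown below. This is an illustration only: the catalog, the `f1`/`f2` feature columns, the `critic_score` column name, and the `recommend` function are all hypothetical. The standard library's `difflib` stands in for the project's fuzzy search, which the prerequisites suggest is built on Whoosh.

```python
import difflib

import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Invented catalog: two feature columns plus a critic score.
catalog = pd.DataFrame({
    "name": ["FIFA 22", "FIFA 23", "Football Manager", "Space Shooter", "Puzzle Box"],
    "f1": [0.90, 0.95, 0.80, 0.10, 0.20],
    "f2": [0.10, 0.12, 0.20, 0.90, 0.85],
    "critic_score": [78, 76, 85, 70, 90],
})


def recommend(query, k=2):
    """Return up to k recommended titles for a free-text query."""
    # Step 2: closest title strings to the query (case-insensitive).
    lowered = {name.lower(): name for name in catalog["name"]}
    matches = difflib.get_close_matches(query.lower(), list(lowered), n=3, cutoff=0.3)
    if not matches:
        return []
    top = lowered[matches[0]]

    # Step 3a: nearest neighbors of the top match in feature space.
    X = catalog[["f1", "f2"]].to_numpy()
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    row = catalog.index[catalog["name"] == top][0]
    _, idx = nn.kneighbors(X[[row]])
    hits = catalog.iloc[idx[0]]
    hits = hits[hits["name"] != top]  # exclude the matched game itself

    # Step 3b: order the neighbors by critic score, highest first.
    return hits.sort_values("critic_score", ascending=False)["name"].tolist()


print(recommend("fifa"))
```

Sorting by critic score after the neighbor search means similarity decides which games are shown, while review quality decides their order.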

📚 References

  1. Video Games Recommendation System (Kaggle)
  2. Game Recommendations on Steam (Kaggle)
  3. Paul Bertens, et al. "A Machine-Learning Item Recommendation System for Video Games"
  4. Umair Javed and Kamran Shaukat, "A Review of Content-Based and Context-Based Recommendation Systems"
  5. Germán Cheuque, et al. "Recommender Systems for Online Video Game Platforms: the Case of STEAM"

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • IIIT Delhi for project support and guidance
  • Kaggle and Steam for providing comprehensive datasets
  • The gaming community for inspiration and feedback
