PlaySense - A Game Recommendation System 🎮

A sophisticated game recommendation system that combines K-Means clustering and Nearest Neighbors to provide personalized game recommendations. The system addresses the challenge of decision paralysis faced by players when navigating vast game libraries by offering tailored suggestions based on game attributes and user preferences.

🎯 Motivation

The gaming industry's exponential growth has led to an overwhelming number of choices for players. Our project addresses this challenge by:

Helping players navigate extensive game libraries effectively
Reducing decision paralysis through personalized recommendations
Supporting discovery of both popular and niche titles
Leveraging machine learning to understand and match player preferences

📊 Project Structure

game-recommendation-system/
├── data/
│   ├── final_data.csv
│   ├── games_with_reviews.csv
│   ├── games.csv
│   └── processed_game_data.csv
├── indexdir/
│   ├── _MAIN_1.toc
│   ├── MAIN_ez3ijabrp49e5se8.seg
│   └── MAIN_WRITELOCK
├── pickle/
│   └── kmeans_model.pkl
├── src/
│   ├── collaborative_filtering.ipynb
│   ├── data_scraping.py
│   ├── eda.ipynb
│   ├── main.ipynb
│   └── main2.ipynb
├── txt/
│   ├── notes.md
│   ├── output.txt
│   ├── testing.txt
│   └── user_agents.txt
└── documentation/
    ├── ML_Project_proposal.pdf
    ├── Project_Presentation.pdf
    └── Project_Report.pdf

📚 Literature Review

Our approach is informed by several key research papers:

Machine-Learning Item Recommendation System for Video Games
- Explores extremely randomized trees (ERT) and deep neural network (DNN) models for personalized recommendations
- Focuses on real-time user behavior adaptation
- ERT model showed superior accuracy and scalability
Content-Based and Context-Based Recommendation Systems
- Reviews various recommendation techniques
- Addresses challenges like information overload
- Emphasizes importance of contextual information
STEAM Game Recommendations
- Investigates recommender systems for the STEAM platform
- Tests various models including FM, DeepNN, and DeepFM
- Found DeepNN performs best for accuracy and novelty

🔍 Dataset Description

Content-Based Filtering Dataset

Source: Video Games Recommendation System (Kaggle)

Features:

name | release_date | price | dlc_count | detailed_description | about_the_game
windows | mac | linux | achievements | supported_languages | developers
publishers | categories | genres | estimated_owners | average_playtime_forever

Collaborative Filtering Dataset

Source: Steam Store
Contents:
- 41 million user recommendations
- Game metadata
- User profiles
- Review data

Data Distribution Analysis

🛠️ Methodology

Data Preprocessing

Initial Features

Numerical Features: Price, Release Year
Categories & Genres: A list of categories & genres that a game belongs to.
Platform: 0/1 Binary features for the availability of Windows, Mac, Linux.
Publishers & Studios: A list of publishers & studios that a game belongs to.
PlayTime, Description, Supported Languages: Other features that were either missing for many entries or not relevant.

Data Normalization

To ensure equal contribution of all features during clustering, numerical features like Price and Release Year were normalized. Binary features like Categories, Genres, and platform support were normalized using StandardScaler. This prevented any single feature or group of features from disproportionately influencing the clustering process.
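The effect of this scaling step can be illustrated on a few toy rows. This is a minimal sketch, not the project's preprocessing code: the values are made up, and only the column names `price`, `release_year`, `windows`, and `genres_casual` are taken from the processed table.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Toy rows standing in for the processed game table (values are invented).
games = pd.DataFrame({
    "price": [0.0, 9.99, 59.99, 19.99],
    "release_year": [2012, 2018, 2023, 2020],
    "windows": [1, 1, 1, 0],
    "genres_casual": [1, 0, 0, 1],
})

# StandardScaler rescales every column to zero mean and unit variance,
# so price (tens of units) no longer dwarfs the 0/1 columns in
# distance computations.
scaler = StandardScaler()
scaled = pd.DataFrame(scaler.fit_transform(games), columns=games.columns)

print(scaled.round(2))
```

After scaling, every column has the same spread, which is what lets a distance-based method such as K-Means weigh them equally.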

Processed Columns

| # | Column Name | Dtype | Description |
|---|---|---|---|
| 1 | `windows` | `int64` | Binary feature for Windows |
| 2 | `mac` | `int64` | Binary feature for Mac |
| 3 | `release_year` | `int64` | Year of game release |
| 4 | `linux` | `int64` | Binary feature for Linux |
| 5 | `price` | `float64` | Price of the game |
| 6 | `categories` | `object` | Categories list |
| 7 | `genres` | `object` | Genres list |
| 8 | `game_studios` | `object` | Associated game studios |
| 9 | `categories_includes_level_editor` | `int64` | Level editor feature |
| 10 | `categories_<category_name>` | `int64` | One-hot encoded categories |
| ... | ... | ... | ... |
| 52 | `genres_nudity` | `int64` | Binary for genre: nudity |
| 53 | `genres_casual` | `int64` | Binary for genre: casual |
| 54 | `genres_short` | `int64` | Binary for genre: short |
| 55 | `genres_video_production` | `int64` | Binary for genre: video production |
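Binary columns such as `genres_casual` can be derived from a list-valued column like `genres`. The sketch below shows one way to do this with scikit-learn's `MultiLabelBinarizer`; whether the project used this class or another encoding route is an assumption, and the game names and genre lists are invented for illustration.

```python
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

# Invented rows: each game carries a list of genres.
games = pd.DataFrame({
    "name": ["Game A", "Game B", "Game C"],
    "genres": [["Casual", "Indie"], ["Action"], ["Casual", "Action"]],
})

# One 0/1 column per distinct genre, named in the genres_<name> style
# used by the processed table.
mlb = MultiLabelBinarizer()
encoded = pd.DataFrame(
    mlb.fit_transform(games["genres"]),
    columns=[f"genres_{g.lower()}" for g in mlb.classes_],
    index=games.index,
)

processed = pd.concat([games, encoded], axis=1)
print(processed)
```

The same pattern would apply to `categories`, giving the `categories_<category_name>` columns.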

Content-Based Filtering

Feature Engineering
- Numerical features: Price, Release Year
- Binary features: Platform support, categories, genres
- Studio clustering using K-Means++

Dimensionality Reduction
- Applied Truncated SVD
- Reduced studio data to 60 components
- Achieved 6.8% explained variance
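The reduction step can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the game-by-studio indicator matrix is synthetic, its shape (500 x 300) and density are hypothetical, and only the use of Truncated SVD with 60 components comes from the description above.

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.decomposition import TruncatedSVD

# Synthetic sparse 0/1 matrix: rows are games, columns are studios.
rng = np.random.RandomState(0)
dense = (rng.rand(500, 300) < 0.02).astype(float)
studio_matrix = csr_matrix(dense)

# TruncatedSVD works directly on sparse input and does not center it.
svd = TruncatedSVD(n_components=60, random_state=0)
reduced = svd.fit_transform(studio_matrix)

print(reduced.shape)
# Fraction of the total variance kept by the 60 components.
print(svd.explained_variance_ratio_.sum())
```

For very sparse one-hot data, where each studio appears in only a handful of rows, a low cumulative explained variance is expected because the variance is spread thinly across many columns.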

🔄 Iterative Model Development

Initial Challenges & Solutions

1. Random Initialization Problems

Random centroid initialization has some shortcomings when clustering. It does not guarantee convergence to the global minimum, and in practice reaching it is unlikely.
Silhouette analysis showed poorly separated clusters.
Non-Convex Optimization Problem
- Multiple local minima exist
- Final clustering is highly dependent on initial centroid positions
- May lead to:
  - Splitting of a single cluster
  - Merging of two clusters
Solution: K-Means++ initializes centroids probabilistically so that they are spread out, which reduces the chance of poor convergence and suboptimal clusters. Compared with random initialization in standard K-Means, it typically converges faster and yields better clusters.
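The sensitivity to initialization can be observed with a small experiment. The sketch below is not the project's training code: it uses synthetic blobs from `make_blobs`, a hypothetical choice of 5 clusters, and a single initialization per run (`n_init=1`) so that the effect of the starting centroids is visible.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data with 5 well-defined groups.
X, _ = make_blobs(n_samples=600, centers=5, cluster_std=1.0, random_state=42)

results = {}
for init in ("random", "k-means++"):
    inertias = []
    for seed in range(10):
        # n_init=1: one start per run, so a bad start is not hidden
        # by automatic restarts.
        km = KMeans(n_clusters=5, init=init, n_init=1, random_state=seed).fit(X)
        inertias.append(km.inertia_)
    # Mean and worst-case within-cluster sum of squares over the 10 seeds.
    results[init] = (float(np.mean(inertias)), float(np.max(inertias)))
    print(f"{init:>10}: mean inertia={results[init][0]:.1f}, "
          f"worst={results[init][1]:.1f}")
```

A large gap between the mean and the worst inertia for an initialization scheme indicates that some runs got stuck in a poor local minimum, for example by splitting one blob while merging two others.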

2. Distance Metric Selection

Evaluated multiple distance metrics:

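# Distance metric comparison
from sklearn.metrics.pairwise import cosine_distances, euclidean_distances, manhattan_distances

# Map each metric name to its scikit-learn pairwise distance function
metrics = {
    'euclidean': euclidean_distances,
    'manhattan': manhattan_distances,
    'cosine': cosine_distances
}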
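One way to compare the metrics in practice is to check how the nearest neighbours of a game change under each one. This is a minimal sketch with random feature vectors (an assumption, the real feature matrix is not reproduced here), using scikit-learn's `NearestNeighbors` with the same three metric names.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# Random stand-in for the scaled game feature matrix (50 games, 8 features).
rng = np.random.RandomState(0)
features = rng.rand(50, 8)
query_index = 0

neighbours_by_metric = {}
for metric in ("euclidean", "manhattan", "cosine"):
    # Ask for one extra neighbour because the query game matches itself.
    nn = NearestNeighbors(n_neighbors=6, metric=metric).fit(features)
    _, idx = nn.kneighbors(features[query_index:query_index + 1])
    neighbours_by_metric[metric] = [int(i) for i in idx[0] if i != query_index][:5]
    print(metric, neighbours_by_metric[metric])
```

If the three lists mostly overlap, the choice of metric matters little for that game; large differences suggest the metric should be chosen by inspecting recommendation quality.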

Model Comparison With GMM (Gaussian Mixture Models)

Performance Analysis

Cluster-wise Silhouette Analysis:
- Cluster 1.0: 0.263 (46,626 games) - Good structure
- Cluster 3.0: 0.108 (16,156 games) - Normal structure
- Cluster 4.0: 0.065 (1,612 games) - Normal structure
- Cluster 2.0: 0.043 (19,331 games) - Weak structure
- Cluster 0.0: -0.010 (13,679 games) - Potential misclassification
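Per-cluster scores like those listed above can be obtained by averaging the per-sample silhouette values within each cluster. The sketch below shows the computation on synthetic data; the blob dataset and the choice of 4 clusters are assumptions for illustration and will not reproduce the project's numbers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_samples

# Synthetic stand-in for the game feature matrix.
X, _ = make_blobs(n_samples=500, centers=4, random_state=0)
labels = KMeans(n_clusters=4, init="k-means++", n_init=10,
                random_state=0).fit_predict(X)

# One silhouette value per sample, in [-1, 1].
sample_scores = silhouette_samples(X, labels)

# Mean silhouette and size for each cluster.
per_cluster = {
    int(c): (float(sample_scores[labels == c].mean()), int((labels == c).sum()))
    for c in np.unique(labels)
}

# Report clusters from best to worst structure.
for c, (score, size) in sorted(per_cluster.items(), key=lambda kv: -kv[1][0]):
    print(f"Cluster {c}: {score:.3f} ({size} games)")
```

A negative cluster mean, as reported for Cluster 0.0 above, means that on average its members sit closer to a neighbouring cluster than to their own.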

🚀 Getting Started

Prerequisites

python>=3.8
numpy
pandas
scikit-learn
scipy
whoosh

Installation

Clone the repository

git clone [your-repository-link]
cd game-recommendation-system

Install required packages

pip install -r requirements.txt

Usage Example

from src.recommender import GameRecommender

# Initialize the recommender
recommender = GameRecommender()

# Get recommendations for a game
recommendations = recommender.get_recommendations("FIFA")

# Example output:
# 1. FIFA 23 (Match Score: 0.89)
# 2. Pro Evolution Soccer 2023 (Match Score: 0.82)
# 3. FIFA 22 (Match Score: 0.81)

👥 Team Members

Aditya Sharma @adsh16

Data preprocessing
Feature engineering
Clustering analysis
Model development
EDA, SVD, fuzzy search

Kanishk Kumar Meena @KanishkKumarMeena

Data cleaning
Collaborative Filtering
Model evaluation

Vansh Aggarwal @VanshAg283

Dataset management
Visualization
EDA
Performance Testing

🔮 Future Work

More robust, real-world testing of model performance, using available databases of similar or co-bought games to validate the model's recommendations.
Integration of user interaction data for hybrid recommendations
Enhancement of clustering algorithms for better game categorization
Implementation of real-time recommendation updates
Addition of more sophisticated feature engineering techniques
Development of a user interface for easier interaction

📊 Sample Run

The model is run with the query "fifa".
It first finds the closest matching names in the database for the query string.
It then uses the top match to retrieve the top recommendations, which are sorted by critic score.
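The three steps of the sample run can be sketched end to end. This is an illustration under stated assumptions, not the repository's implementation: the catalog, feature vectors, and critic scores are invented, the `recommend` function is hypothetical, and the standard-library `difflib` is swapped in for the fuzzy search (the prerequisites list `whoosh`, which is not used here).

```python
import difflib
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# Invented catalog with made-up critic scores and 3-d feature vectors.
catalog = pd.DataFrame({
    "name": ["FIFA 22", "FIFA 23", "Football Manager 2023",
             "Portal 2", "Stardew Valley"],
    "critic_score": [78, 76, 85, 95, 89],
})
features = np.array([
    [1.0, 0.0, 0.9],
    [1.0, 0.0, 1.0],
    [0.9, 0.1, 1.0],
    [0.0, 1.0, 0.2],
    [0.1, 0.9, 0.4],
])


def recommend(query, k=2):
    """Fuzzy-match the query, find k nearest games, sort by critic score."""
    lowered = [n.lower() for n in catalog["name"]]
    # Step 1: closest title to the query string.
    matches = difflib.get_close_matches(query.lower(), lowered, n=1, cutoff=0.3)
    if not matches:
        return catalog.iloc[0:0]
    seed = lowered.index(matches[0])
    # Step 2: nearest neighbours of the matched game in feature space.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    _, idx = nn.kneighbors(features[seed:seed + 1])
    neighbours = [int(i) for i in idx[0] if i != seed][:k]
    # Step 3: order the neighbours by critic score, best first.
    return catalog.iloc[neighbours].sort_values("critic_score", ascending=False)


print(recommend("fifa"))
```

Sorting by critic score after the neighbour search means similarity decides which games are shown and the score decides only their order.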

📚 References

Video Games Recommendation System (Kaggle)
Game Recommendations on Steam (Kaggle)
Paul Bertens, et al. "A Machine-Learning Item Recommendation System for Video Games"
Umair Javed and Kamran Shaukat, "A Review of Content-Based and Context-Based Recommendation Systems"
Germán Cheuque, et al. "Recommender Systems for Online Video Game Platforms: the Case of STEAM"

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

IIIT Delhi for project support and guidance
Kaggle and Steam for providing comprehensive datasets
The gaming community for inspiration and feedback
