Data Preparation

This directory contains notebooks that illustrate utility functions for common data preparation tasks in recommendation system development, such as data import and export, data transformation, and data splitting.

Notebook	Description
data_split	Details on splitting data (randomly, chronologically, etc.).
data_transform	Guidance on transforming implicit and explicit feedback data for building collaborative filtering recommenders.
wikidata_knowledge_graph	Details on how to create a knowledge graph using Wikidata.

Data split

Three methods of splitting the data for training and testing are demonstrated in this notebook. Each supports both Spark and pandas DataFrames.

Random Split: the simplest way to split the data. It randomly assigns entries to either the train set or the test set according to the desired allocation ratio.
Chronological Split: accounting for temporal variation when evaluating a model often gives a more realistic measure of performance. This approach splits the data into train and test sets by timestamp, per user or per item.
Stratified Split: it may be preferable to have the same set of users or items appear in both the train and test sets. This method of splitting ensures that is the case.
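The three strategies above can be sketched in plain pandas as follows. This is a minimal illustration, not the notebook's own utility functions: the function names, the column names (`userID`, `itemID`, `rating`, `timestamp`), and the toy data are all assumptions made for the example.

```python
import pandas as pd

# Toy ratings table; the column names are illustrative assumptions.
ratings = pd.DataFrame({
    "userID":    [1, 1, 1, 1, 2, 2, 2, 2],
    "itemID":    [10, 11, 12, 13, 10, 11, 12, 13],
    "rating":    [5, 3, 4, 2, 4, 1, 5, 3],
    "timestamp": [100, 200, 300, 400, 150, 250, 350, 450],
})


def random_split(df, ratio=0.75, seed=42):
    """Assign rows to train or test at random, ignoring users and time."""
    train = df.sample(frac=ratio, random_state=seed)
    test = df.drop(train.index)
    return train, test


def chrono_split(df, ratio=0.75, by="userID", time_col="timestamp"):
    """Per group, keep the earliest `ratio` of rows for train, the rest for test."""
    ordered = df.sort_values([by, time_col])
    rank = ordered.groupby(by).cumcount()              # 0-based position within the group
    size = ordered.groupby(by)[by].transform("size")   # number of rows in the group
    in_train = rank < (size * ratio).round()
    return ordered[in_train], ordered[~in_train]


def stratified_split(df, ratio=0.75, by="userID", seed=42):
    """Per group, sample `ratio` of rows for train.

    Each group lands in both sets only if it has enough rows
    for the chosen ratio to leave at least one row on each side.
    """
    train = df.groupby(by, group_keys=False).sample(frac=ratio, random_state=seed)
    test = df.drop(train.index)
    return train, test


train, test = chrono_split(ratings)
print(test)  # the latest interaction of each user
```

In this sketch the random split gives no guarantee that a user appears in both sets, whereas the chronological and stratified variants apply the ratio within each user, which is the property the descriptions above rely on.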

Data transform

This notebook introduces and reviews data transformation techniques that are commonly used in recommendation scenarios.
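As one possible example of such a transformation, the sketch below aggregates implicit feedback events into a single user-item affinity score that a collaborative filtering model could consume. The event types, the weights, and the `events_to_affinity` helper are hypothetical and chosen only for illustration; the notebook may use different techniques.

```python
import pandas as pd

# Toy implicit-feedback log; the column names and event types are assumptions.
events = pd.DataFrame({
    "userID": [1, 1, 1, 2, 2],
    "itemID": [10, 10, 11, 10, 12],
    "event":  ["click", "purchase", "click", "click", "click"],
})

# Hypothetical weights expressing how strongly each event signals preference.
weights = {"click": 1.0, "purchase": 3.0}


def events_to_affinity(df, weights):
    """Sum weighted events per (user, item) pair into one affinity score."""
    scored = df.assign(weight=df["event"].map(weights))
    return (
        scored.groupby(["userID", "itemID"], as_index=False)["weight"]
        .sum()
        .rename(columns={"weight": "affinity"})
    )


affinity = events_to_affinity(events, weights)
print(affinity)
```

The result has one row per user-item pair, which is the shape most collaborative filtering algorithms expect as input.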
