EmotionalArcMovie is an interactive recommender system that helps users discover unfamiliar movies with a desired sentiment arc, going beyond the static ranked-list paradigm.
Motivated by the research paper "The emotional arcs of stories are dominated by six basic shapes", which used NLP methods to map the emotional journeys of a collection of novels and identified six emotional arcs that describe all of those stories, I decided to analyze 9000 movie scripts and build a movie recommendation engine.
An example of the engine's interactive main interface is shown below:
A detailed introduction to the project is available at https://emotionalarcmovie.herokuapp.com/EmotionalArcMovie
It is recommended to use the Anaconda distribution to install a set of standard required packages. Once Anaconda is installed, type:
conda install numpy pandas matplotlib
The additional required Python packages are listed in the file requirements.txt. In order to install them, please type:
pip install -r requirements.txt
The scripts are obtained by scraping 1100 movies from the IMSDb website and 23576 from the Springfield! Springfield! website. You can download them automatically by running the notebooks src/imsdb_scraping.ipynb and src/scraping_springField_movieScripts.ipynb:
cd src/
jupyter notebook
Then run imsdb_scraping.ipynb and scraping_springField_movieScripts.ipynb. These notebooks create the directories 'data/imsdb_scraping' and 'data/springField_scraping', where they store the movie scripts along with some meta-information.
Movie meta-information such as the YouTube ID, genome vectors, and movie ID can be downloaded from MovieLens.
In the same way as for imsdb_scraping.ipynb, run the following notebooks consecutively:
write_genome_df.ipynb
write_movie_with_youtubeId.ipynb
rating_with_imdbId.ipynb
The emotional content of each script is extracted by looking up each word of a given window in the NRC Word-Emotion Association Lexicon, which associates words with two sentiments (negative and positive). The code that extracts the emotional content, smooths the arc trajectory, and subsamples 100 points for each movie is in src/R_sentiment.r. It can be run from a terminal with the command:
Rscript R_sentiment.r
The code creates a directory "../data/normed_sentiment/", where it stores the datapoints needed to trace the trajectory for each movie.
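The three steps described above (window-level lexicon lookup, smoothing, and subsampling to 100 points) can be sketched in Python as follows. This is only an illustration of the idea, not a port of R_sentiment.r: the toy lexicon, the window size, the moving-average smoother, and the linear interpolation are all assumptions made for the example, and the real NRC lexicon would have to be loaded from its own file.

```python
from statistics import mean

# Toy stand-in for the NRC lexicon: word -> +1 (positive) or -1 (negative).
# These entries are illustrative only, not taken from the real lexicon.
TOY_LEXICON = {"love": 1, "happy": 1, "win": 1, "death": -1, "fear": -1, "lose": -1}


def window_scores(words, window_size, lexicon):
    """Score each consecutive, non-overlapping window as (#positive - #negative) words."""
    scores = []
    for start in range(0, len(words), window_size):
        window = words[start:start + window_size]
        scores.append(sum(lexicon.get(w.lower(), 0) for w in window))
    return scores


def smooth(values, radius):
    """Centered moving average; the averaging window shrinks at both ends."""
    return [mean(values[max(0, i - radius):i + radius + 1]) for i in range(len(values))]


def resample(values, n_points):
    """Linearly interpolate `values` onto n_points evenly spaced positions (n_points >= 2)."""
    if not values:
        raise ValueError("cannot resample an empty arc")
    if len(values) == 1:
        return [float(values[0])] * n_points
    out = []
    for k in range(n_points):
        pos = k * (len(values) - 1) / (n_points - 1)
        lo = int(pos)
        hi = min(lo + 1, len(values) - 1)
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[hi] * frac)
    return out


def emotional_arc(text, window_size=50, radius=2, n_points=100, lexicon=TOY_LEXICON):
    """Return an n_points-long sentiment trajectory for one script."""
    words = text.split()
    return resample(smooth(window_scores(words, window_size, lexicon), radius), n_points)
```

Calling emotional_arc(script_text) on the text of one script returns a list of 100 numbers, which matches the shape of the per-movie vectors that the later steps work with.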
The notebook below fits the 100-dimensional vectors of all movies into 6 clusters with k-means, and also reduces each 100-dimensional emotional arc to 2 dimensions so that it can be visualized on a plane. All k-means results are stored in the directory "../data/k-means-results/".
k_means.ipynb
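For readers unfamiliar with k-means, the clustering step can be sketched with the standard library alone. This is a minimal version of Lloyd's algorithm and is not the code in k_means.ipynb; the function names, the random initialization, and the stopping rule are assumptions made for the example, and the reduction to 2 dimensions is not covered here.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, n_iter=50, seed=0):
    """Lloyd's algorithm on a list of vectors. Returns (centroids, labels)."""
    rng = random.Random(seed)
    # Start from k distinct input vectors chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each arc to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return centroids, labels
```

With the movie arcs stored as a list of 100-element lists, k_means(arcs, 6) would return six centroid arcs and one cluster label per movie.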
With the 100-dimensional arcs and the genome vector of every movie in hand, I calculated the sentiment-arc similarities and the genome similarities between each pair of movies. This is done by the C++ programs calculate_genome_similarities.cpp and calculate_sentiment_similarities.cpp, since the computation would take more than one day in Python but only 4 minutes in C++.
Open a terminal and type:
g++ calculate_genome_similarities.cpp -o calculate_genome_similarities.out
./calculate_genome_similarities.out
g++ calculate_sentiment_similarities.cpp -o calculate_sentiment_similarities.out
./calculate_sentiment_similarities.out
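To make the pairwise computation concrete, here is a small Python sketch. The similarity measure is not stated above, so cosine similarity is used purely as an assumption, and the function names are hypothetical. The sketch also shows why the full job is expensive: n movies produce n(n-1)/2 pairs.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_u * norm_v)


def pairwise_similarities(vectors):
    """vectors: dict mapping movie id -> vector.

    Returns {(id_a, id_b): similarity} for every unordered pair, with id_a < id_b.
    """
    return {
        (a, b): cosine_similarity(vectors[a], vectors[b])
        for a, b in combinations(sorted(vectors), 2)
    }
```

The same function would apply to both inputs: the 100-dimensional arcs for sentiment similarity and the genome vectors for genome similarity.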
To build a recommender system for registered users, I applied the matrix factorization algorithm to the ratings from MovieLens. I used Turi Create to train the model and saved it in the directory "../data/tc_matrix_factorization_model". The code is in:
turi_matrix_factorization.ipynb
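The idea behind matrix factorization can be illustrated without Turi Create. The sketch below learns a small latent vector for every user and movie by stochastic gradient descent so that their dot product approximates the observed rating. It is not the Turi Create implementation or the code in the notebook; the hyperparameters and function names are assumptions chosen for a toy example.

```python
import random


def factorize(ratings, n_factors=2, n_epochs=500, lr=0.01, reg=0.02, seed=0):
    """ratings: list of (user_id, movie_id, rating) triples.

    Returns (user_factors, movie_factors) as dicts of latent vectors.
    """
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in ratings})
    movies = sorted({m for _, m, _ in ratings})
    # Small random starting values for every latent vector.
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {m: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for m in movies}
    for _ in range(n_epochs):
        for u, m, r in ratings:
            err = r - sum(p * q for p, q in zip(P[u], Q[m]))
            for f in range(n_factors):
                p, q = P[u][f], Q[m][f]
                # Gradient step on the squared error with L2 regularization.
                P[u][f] += lr * (err * q - reg * p)
                Q[m][f] += lr * (err * p - reg * q)
    return P, Q


def predict(P, Q, user, movie):
    """Predicted rating: dot product of the user and movie latent vectors."""
    return sum(p * q for p, q in zip(P[user], Q[movie]))
```

After training on (user, movie, rating) triples, predict can score movies a user has not rated yet, and the highest-scoring ones become that user's recommendations.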
This is a hybrid recommender system that integrates content-based and sentiment-based filtering in an interactive main interface. It is based on collaborative filtering (for users who have a rating history) and content-based recommendation (for users who have no rating history). On top of that, the recommender lets the user choose movies with a certain type of emotional arc and adjust how strongly the emotional arc influences the recommendations.
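One simple way to picture the adjustable influence of the emotional arc is a weighted blend of the two similarity scores, combined with an optional filter on the arc cluster. How the engine actually combines them is not described here, so the linear blend, the field names, and the function names below are all assumptions made for illustration.

```python
def blended_score(content_sim, sentiment_sim, arc_weight):
    """Linear blend; arc_weight in [0, 1] is the user-chosen influence of the emotional arc."""
    if not 0.0 <= arc_weight <= 1.0:
        raise ValueError("arc_weight must be between 0 and 1")
    return (1.0 - arc_weight) * content_sim + arc_weight * sentiment_sim


def recommend(candidates, arc_weight, wanted_cluster=None, top_n=10):
    """candidates: list of dicts with the hypothetical keys
    'title', 'content_sim', 'sentiment_sim' and 'arc_cluster'.

    Returns the titles of the top_n candidates by blended score,
    optionally restricted to one emotional-arc cluster.
    """
    pool = [
        c for c in candidates
        if wanted_cluster is None or c["arc_cluster"] == wanted_cluster
    ]
    ranked = sorted(
        pool,
        key=lambda c: blended_score(c["content_sim"], c["sentiment_sim"], arc_weight),
        reverse=True,
    )
    return [c["title"] for c in ranked[:top_n]]
```

With arc_weight set to 0 the ranking depends only on content similarity, and with arc_weight set to 1 it depends only on the similarity of the emotional arcs.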