Our project examines gender dynamics in cinema through the lens of the Bechdel Test, a metric introduced in 1985 by Alison Bechdel to assess female interaction in films. To pass the test, a movie must feature at least two named women who talk to each other about something other than a man. Although it is a simple benchmark, it uncovers significant gender disparities in films. This project aims to go beyond the test by exploring how women are represented through character tropes, roles, and narrative functions. We focus on how women are often sidelined, reduced to their appearance, or defined by their relationships with men. By analyzing plot summaries and character features, this project identifies trends in female representation and tracks changes across genres and historical contexts. Our goal is to uncover patterns of gender representation over time, spark discussions on gender equality, and advocate for more inclusive and balanced storytelling.
"Her Side Story" will explore the following research questions:
- How are women sidelined in movies?
- Does gender parity among actors influence a movie's economic performance, ratings, and global reach?
- How do character tropes related to women affect their narrative function and presence in films over time?
- How does the gender of a movie director influence gender equity in a movie?
- What role does the Bechdel Test play in predicting cinematic gender representation, and how does it correlate with media and financial success?
- Python Libraries:
  - pandas
  - NumPy
  - Matplotlib
  - Seaborn (plotting)
  - json (parsing and clustering movie languages, genres and countries)
  - tqdm (progress bar when running functions)
  - collections (Counter)
  - Hugging Face’s transformers library (sentiment analysis)
- Visualization: Interactive visualization libraries (to be determined)
- CMU Movie Summaries Dataset: contains the following files:
  - characters_metadata.tsv
  - movie_metadata.tsv
  - name_clusters.txt
  - plot_summaries.txt
  - tvtropes.clusters.txt
- IMDb Ratings:
  - provides movie ratings data
  - reduces the initial dataset size, since only movies with available ratings are kept
- TMDB Ratings:
  - provides box office and budget data
- Bechdel Test API:
  - provides the Bechdel Test result ('rating') for a group of movies
  - drastically reduces the primary dataset size
  - essential for assessing gender interaction trends and understanding the accuracy of the Bechdel Test as a predictor of gender equality in film
- Gender by Name (UCI):
  - provides a wide range of first names and their associated gender
  - used to recover missing gender values in the characters_metadata.tsv file
  - indirectly helps to analyze how gender correlates with character types
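
The gender recovery step described for the Gender by Name dataset could look roughly like the sketch below. It is only an illustration: the column names (`name`, `gender`, `first_name`) and the helper `fill_missing_gender` are hypothetical, and the rows are toy data, not taken from the actual files.

```python
import pandas as pd


def fill_missing_gender(characters: pd.DataFrame, name_gender: pd.DataFrame) -> pd.DataFrame:
    """Fill missing `gender` values using a first-name lookup table.

    Column names are assumptions made for this sketch.
    """
    # Build a lookup: lower-cased first name -> gender code.
    lookup = (
        name_gender.assign(first_name=name_gender["first_name"].str.lower())
        .drop_duplicates(subset="first_name")
        .set_index("first_name")["gender"]
    )
    out = characters.copy()
    # First token of the full name, lower-cased; missing names stay missing.
    first_name = out["name"].str.split().str[0].str.lower()
    # Only rows with a missing gender are filled; unknown names remain missing.
    out["gender"] = out["gender"].fillna(first_name.map(lookup))
    return out


# Toy data for illustration only.
characters = pd.DataFrame({
    "name": ["Emma Smith", "John Doe", "Zyx Unknown", None],
    "gender": [None, "M", None, None],
})
name_gender = pd.DataFrame({"first_name": ["Emma", "John"], "gender": ["F", "M"]})

filled = fill_missing_gender(characters, name_gender)
print(filled)
```

Names that do not appear in the lookup table are left missing rather than guessed, so the share of recovered values can be reported separately.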
- Data Wrangling: extraction, cleaning and standardization of the data
  - Alignment of the datasets on key attributes such as character TV tropes, character and actor names and genders, plots, and movie genres
  - Data filtering to match the additional datasets and ensure compatibility across sources, which also reduces the usable data size
  - Data clustering
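
One possible way to align two sources during data wrangling is sketched below. It assumes that movies are matched on a normalized title plus release year; the actual join keys used in the project are not stated here. The helper `normalize_title`, the column names, and the rows are hypothetical toy values.

```python
import pandas as pd


def normalize_title(title: pd.Series) -> pd.Series:
    """Lower-case, drop punctuation and collapse whitespace so titles compare equal across sources."""
    return (
        title.str.lower()
        .str.replace(r"[^a-z0-9 ]", "", regex=True)
        .str.replace(r"\s+", " ", regex=True)
        .str.strip()
    )


# Toy data for illustration only (fictional titles and ratings).
movies = pd.DataFrame({
    "title": ["The Blue Door", "Night Train!", "Lost Harbor"],
    "year": [1979, 1991, 1950],
})
ratings = pd.DataFrame({
    "title": ["the blue door", "Night  Train", "Other Film"],
    "year": [1979, 1991, 2001],
    "rating": [7.1, 6.4, 5.0],
})

movies["key"] = normalize_title(movies["title"])
ratings["key"] = normalize_title(ratings["title"])

# Left join keeps every movie; `indicator=True` adds a `_merge` column that
# shows which rows found a match, and `validate` fails on duplicate rating keys.
merged = movies.merge(
    ratings[["key", "year", "rating"]],
    on=["key", "year"],
    how="left",
    indicator=True,
    validate="many_to_one",
)
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print(f"{len(unmatched)} movie(s) without a rating")
```

Inspecting the unmatched rows before dropping them makes it easier to see how much data each additional source removes and whether the loss is concentrated in particular years or genres.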
- Univariate Analysis: use of data visualisation techniques (histograms, box plots, scatter plots, etc.) to conduct a graphical analysis of the gender distribution of characters and actors.
- Multivariate Analysis: further analysis to identify relationships between various factors (e.g. the presence of female characters, movie ratings, box office performance).
Statistical methods are used to evaluate correlations, distributions, and outliers in the data, with t-tests and chi-square tests to assess the significance of the findings.
Sensitivity analysis is performed to evaluate result uncertainty and assess model feasibility.
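
The significance tests mentioned above could be run as in the following sketch. It assumes SciPy is available, which is not among the listed libraries, and it uses synthetic numbers only. The grouping (ratings of movies that pass or fail the Bechdel Test, director gender against test outcome) is one plausible setup, not a reported result.

```python
import numpy as np
from scipy import stats

# Synthetic data for illustration only; no real ratings or counts are used.
rng = np.random.default_rng(0)
ratings_pass = rng.normal(loc=6.5, scale=1.0, size=200)  # movies passing the test
ratings_fail = rng.normal(loc=6.3, scale=1.0, size=200)  # movies failing the test

# Welch's t-test: compares the two group means without assuming equal variances.
t_stat, t_p = stats.ttest_ind(ratings_pass, ratings_fail, equal_var=False)

# Chi-square test of independence on a 2x2 contingency table.
# Rows: director gender (female, male); columns: Bechdel outcome (pass, fail).
table = np.array([[30, 20],
                  [150, 300]])
chi2, chi_p, dof, expected = stats.chi2_contingency(table)

print(f"t = {t_stat:.3f}, p = {t_p:.3f}")
print(f"chi2 = {chi2:.3f}, p = {chi_p:.3g}, dof = {dof}")
```

The t-test suits a continuous outcome compared across two groups, while the chi-square test suits two categorical variables. When many such tests are run, a correction for multiple comparisons may be worth considering.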
- Predictive Modeling: use of linear regression models to predict gender equality in film based on features such as character roles (TV tropes), genres, and plot summary attributes.
- Machine Learning Techniques: techniques such as Decision Trees and Support Vector Machines (SVM) (to be determined) will be employed to create models for classifying films based on their gender representation and to predict whether a film will pass or fail the Bechdel Test.
The sentiment of character descriptions and plot summaries will be analyzed using pre-trained sentiment models to assess how women’s roles are portrayed physically and emotionally.
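
A Bechdel Test classifier of the kind described above could be prototyped as in the minimal sketch below. It assumes scikit-learn, which is not among the listed libraries, and a decision tree, although the final choice of model is still to be determined. The features (`female_cast_share`, `female_director`, `release_year`) and the labels are synthetic and hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic features for illustration only.
rng = np.random.default_rng(42)
n = 400
female_cast_share = rng.uniform(0, 1, n)   # fraction of female cast members
female_director = rng.integers(0, 2, n)    # 1 if the director is a woman
release_year = rng.integers(1930, 2013, n)

# Synthetic label: passing is made more likely by a higher female cast share.
passes = (
    female_cast_share + 0.2 * female_director + rng.normal(0, 0.15, n) > 0.5
).astype(int)

X = np.column_stack([female_cast_share, female_director, release_year])
X_train, X_test, y_train, y_test = train_test_split(
    X, passes, test_size=0.25, random_state=0, stratify=passes
)

# A shallow tree keeps the learned rules easy to read and limits overfitting.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

On real data, the accuracy should be compared with a majority-class baseline, since pass and fail outcomes may be imbalanced.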
Until week 9:
- Individual exploration and data wrangling
- Preliminary analysis on the CMU Movie Summaries dataset
- Definition of project objectives, allocation of tasks and delineation of additional datasets
Week 10:
- Further data wrangling
- Analysis on the preprocessed data
Week 11:
- Team collaboration in order to refine data handling steps
- Work on initial visualizations and testing basic machine learning models
- Creation of web interface, work on storytelling and interactive features
Week 12:
- Finalization of data analysis and visualizations
- Sentiment analysis on character descriptions and plot summaries
- Further work on web interface structure, improvement of interactivity
Week 13:
- Focus on predictive modeling and refining the analysis based on feedback
- Final touches on interface
Week 14:
- Completion of the final project notebook
- Focus on styling, design, and content proofreading
- Coralie: "movie metadata" analysis
- Juliette: "movie metadata" analysis, project timeline management
- Mahlia: "character metadata" analysis, "transformer" model analysis
- Maximilien: "tvtropes" and "plot_summaries" analysis
- Pernelle: "character metadata" analysis, management of copywriting and visual/graphical web interface
- Are there any known issues with integrating datasets like IMDb ratings with the CMU dataset? If so, how can we address potential discrepancies?