ACM KDD 2024 | Barcelona, Spain | Monday, August 26, 10:00 AM – 1:00 PM
Reddit and other social media platforms have become ubiquitous, providing opportunities for individuals to connect, share, and access information. However, for individuals with disabilities, accessing and interacting with social media content can present significant challenges.
Our mission is to bring community, belonging, and empowerment to everyone in the world. By making content accessible and inclusive for all, we strive to create a space where everyone feels welcome, valued, and represented. We see accessibility (a11y) as a fundamental aspect of inclusivity, so we prioritize providing content and features that are easy for all users to navigate, understand, and enjoy. By removing barriers and ensuring accessibility, we want to empower everyone to fully participate in our community, share their perspectives, and connect with others who share their interests and passions.
Artificial intelligence (AI) offers promising solutions to enhance accessibility and inclusivity, especially with the emergence of Multimodal Large Language Models (LLMs). Multimodal LLMs have advanced rapidly and can now analyze and interpret a range of media formats, including text, images, audio, and video.
This hands-on tutorial explores the potential of AI to improve accessibility to social media content for individuals with different disabilities, including hearing, visual, and cognitive impairments. We will design and implement a variety of approaches built on open-source multimodal LLMs to bridge the gap between research and real-world use cases:
- Providing alternative text descriptions (captions) for images, making them accessible to users with visual impairments.
- Generating transcripts and summaries of audio and video content, enabling hearing-impaired users to access the information without relying on others for assistance.
- Fixing accessibility issues in social media posts, generating adapted versions and/or summaries for long and complex texts, making it easier for users with cognitive disabilities to understand and engage with social media content.
We will analyze and highlight the strengths and limitations of these techniques and discuss the challenges and opportunities for further application to other use cases.
Despite their importance, and because of time constraints, this tutorial excludes text-to-speech conversion (which helps visually impaired people) and content translation (which overcomes language barriers), as these are well-established techniques with ample resources available elsewhere.
The target audience is researchers and practitioners interested in AI-enabled accessibility for social media content, regardless of whether they work in industry. Participants should have a basic understanding of AI, Natural Language Processing (NLP), and LLMs.
The tutorial will use Google Colaboratory, running a different notebook for each use case.
Leveraging advanced AI techniques to enhance social media accessibility holds immense potential for transformative societal impact, fostering a more inclusive, equitable, and accessible society where individuals with disabilities are empowered to actively engage in the digital world, with benefits for their sense of belonging, self-esteem, and overall well-being:
- Inclusion and empowerment: Empowering individuals with disabilities by providing them equal access to social media platforms, enabling them to connect, share experiences, contribute and fully participate in the digital world, fostering inclusivity and equity.
- Reduced isolation: Breaking down barriers to social interaction for people with disabilities, reducing feelings of isolation and fostering a sense of belonging.
- Improved educational outcomes: Enhancing educational opportunities for students with disabilities by providing equitable access to learning resources, assignments, and group discussions.
- Greater civic participation: Enabling individuals with disabilities to fully participate in political and social discussions online, shaping public discourse and advocating for their rights.
- Increased employment opportunities: By improving access to information and communication tools, AI-enabled accessibility can support individuals with disabilities in seeking and securing employment.
- Economic benefits: By increasing the participation of individuals with disabilities in the digital economy, AI-enabled accessibility can contribute to economic growth and innovation.
Moreover, our goal is to use this tutorial to raise awareness of the importance of accessibility and to spread the word about the role that AI can play in making digital content more accessible.
The 3-hour tutorial will take place on Monday, August 26, 2024, from 10:00 AM to 1:00 PM, and will be organized in the following sections.
- Explain the importance of accessibility in digital content.
- Provide an overview of the challenges faced by users with disabilities when accessing Reddit content.
- Discuss accessibility guidelines and best practices to ensure inclusivity.
- Introduce the potential of AI to enhance accessibility.
Discussion: Accessibility
- Walk participants through the process of deploying and prompting different multimodal LLMs such as LLaVA, phi-3, imp-v1-3b, and others to generate short, descriptive captions for social media images.
- Discuss the challenges and limitations of using LLMs for image captioning.
Notebook: Use Case 1. Image Short Captions
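The captioning step can be sketched independently of any particular model. The following is a minimal illustration, not the notebook's code: the prompt wording, the 125-character limit (a commonly cited alt-text guideline), and the `generate(image, prompt)` wrapper are all assumptions, and any multimodal LLM could be plugged in behind that wrapper.

```python
from typing import Callable

# Hypothetical prompt wording; the tutorial notebooks may use different prompts.
CAPTION_PROMPT = (
    "Describe this image in one short sentence suitable as alt text "
    "for a screen reader."
)

# Openers that add no information for screen reader users.
BOILERPLATE_PREFIXES = (
    "an image of ",
    "a picture of ",
    "a photo of ",
    "the image shows ",
)


def clean_caption(raw: str, max_chars: int = 125) -> str:
    """Normalize a model-generated caption into compact alt text."""
    text = " ".join(raw.split())  # collapse newlines and repeated spaces
    for prefix in BOILERPLATE_PREFIXES:
        if text.lower().startswith(prefix):
            text = text[len(prefix):]
            break
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    if len(text) > max_chars:
        # Cut at the last word boundary that fits, then mark the truncation.
        text = text[: max_chars - 3].rsplit(" ", 1)[0].rstrip(",;:") + "..."
    elif text[-1] not in ".!?":
        text += "."
    return text


def caption_image(image: object, generate: Callable[[object, str], str]) -> str:
    """Caption an image with any model wrapped as generate(image, prompt)."""
    return clean_caption(generate(image, CAPTION_PROMPT))
```

Keeping the model behind a plain callable makes it easy to compare several models on the same images while applying identical post-processing.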
- Use open-source speech-to-text models (Whisper) to transcribe video clips to text and produce closed captions.
- Explore transcript translation.
- Discuss techniques for handling multiple speakers (speaker diarization).
Notebook: Use Case 2. Audio Clip Transcripts
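Turning a transcript into closed captions is mostly a formatting task once timed segments are available. The sketch below assumes each segment is a mapping with `start`, `end` (in seconds), and `text` keys, which mirrors the segment layout returned by the open-source `whisper` package's `transcribe` call. The function names are illustrative and are not taken from the notebook.

```python
from typing import Iterable, Mapping


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: Iterable[Mapping]) -> str:
    """Render timed transcript segments as SubRip (.srt) closed captions."""
    blocks = []
    index = 1
    for seg in segments:
        text = seg["text"].strip()
        if not text:
            continue  # skip silent or empty segments
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
        index += 1
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.4, "text": " Hello everyone."},
        {"start": 2.5, "end": 5.0, "text": " Welcome back."},
    ]
    print(segments_to_srt(demo))
```

The resulting string can be saved with an `.srt` extension and loaded by most video players as a caption track.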
- Guide participants through the steps of designing and implementing a pipeline to generate video descriptions, combining keyframe extraction, image captioning, audio transcript and summarization using LLMs.
- Explore the challenges and advantages for different types of video content.
Notebook: Use Case 3. Video Descriptions
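The pipeline described above (keyframe extraction, image captioning, audio transcript, LLM summarization) can be outlined as follows. This is a simplified sketch under stated assumptions: frames are represented as flat sequences of grayscale intensities, keyframes are picked with a basic frame-difference threshold, and `caption_fn` and `summarize_fn` are hypothetical wrappers around whichever captioning model and LLM are used. The notebook may rely on different techniques for each stage.

```python
from typing import Callable, List, Sequence

Frame = Sequence[float]  # one frame flattened to grayscale pixel intensities


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two same-sized frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def select_keyframes(frames: Sequence[Frame], threshold: float = 30.0) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame."""
    if not frames:
        return []
    kept = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        if frame_difference(frames[kept[-1]], frames[i]) >= threshold:
            kept.append(i)
    return kept


def build_description_prompt(captions: Sequence[str], transcript: str) -> str:
    """Combine keyframe captions and the transcript into one LLM prompt."""
    lines = [
        "Write a short description of this video for a user who cannot see or hear it.",
        "Keyframe captions, in order:",
    ]
    lines += [f"{n}. {caption}" for n, caption in enumerate(captions, start=1)]
    lines.append("Audio transcript:")
    lines.append(transcript.strip() or "(no speech detected)")
    return "\n".join(lines)


def describe_video(
    frames: Sequence[Frame],
    transcript: str,
    caption_fn: Callable[[Frame], str],
    summarize_fn: Callable[[str], str],
    threshold: float = 30.0,
) -> str:
    """Run the full sketch: keyframes, captions, prompt, summary."""
    keyframes = select_keyframes(frames, threshold)
    captions = [caption_fn(frames[i]) for i in keyframes]
    return summarize_fn(build_description_prompt(captions, transcript))
```

Captioning only the selected keyframes keeps the number of model calls low, at the cost of possibly missing brief events between scene changes.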
- Combine all previous models to demonstrate how to use AI to summarize lengthy posts for users with cognitive impairments.
- Compare and contrast different summarization techniques.
- Discuss the ethical considerations of using AI for summarization.
Notebook: Use Case 4. Complex Post Summarization
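When comparing summarization techniques, a simple extractive baseline can be a useful point of reference for LLM-generated summaries. The sketch below is such a baseline and is not taken from the notebook: it scores each sentence by the average frequency of its non-stopword terms and keeps the top-scoring sentences in their original order. The stopword list and the sentence splitter are deliberately minimal assumptions.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "for", "in", "is", "it",
    "of", "on", "that", "the", "to", "was", "were", "when", "with", "also",
}


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]


def content_words(text: str) -> List[str]:
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, max_sentences: int = 2) -> str:
    """Keep the sentences whose content words are most frequent overall."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score(sentences[i]),
        reverse=True,
    )
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)
```

Unlike an LLM summary, this baseline never rewrites or simplifies wording, so it cannot introduce unsupported statements, but it also does nothing to reduce the reading complexity of the sentences it keeps.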
- Discuss the challenges and best practices for deploying accessible content solutions.
- Guide participants in developing a plan for implementing accessibility initiatives.
- Summarize the key takeaways and benefits of enhancing social media accessibility.
- Explore several open-source models for speech generation.
- Discuss the implications for accessibility.
Notebook: Bonus Use Case. Text to Speech
To open the notebooks on Colab:
- Open the reddit/kdd2024-tutorial-breaking-barriers project in GitHub.
- Go to each notebook (files with .ipynb extension) and open it in Colab.
- Follow the instructions in the notebook cells.
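As an alternative to navigating through GitHub, Colab can open a notebook hosted in a public GitHub repository directly from a URL. The helper below builds such a URL. The branch name `main` and the notebook filename in the usage example are assumptions for illustration, so check the repository for the actual values.

```python
from urllib.parse import quote


def colab_url(repo: str, notebook_path: str, branch: str = "main") -> str:
    """Build a Colab URL for a notebook in a public GitHub repository."""
    return (
        "https://colab.research.google.com/github/"
        f"{repo}/blob/{quote(branch)}/{quote(notebook_path)}"
    )


if __name__ == "__main__":
    # Hypothetical notebook filename, used only to show the URL shape.
    print(colab_url("reddit/kdd2024-tutorial-breaking-barriers", "Use Case 1.ipynb"))
```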
A dataset featuring diverse multimedia posts with curated images and videos has been created to support the tutorial. It showcases real-world scenarios, allowing participants to gain a practical understanding of how the presented approaches can address the different challenges.
All authors are members of the multidisciplinary ML Understanding team at Reddit, based in the United States, Spain, and Canada.
- Julio Villena <[email protected]>
  - Principal Engineer. Madrid, Spain.
- Rosa Català <[email protected]>
  - Senior Director. San Francisco, CA, USA.
- Janine García <[email protected]>
  - Staff Engineer. Madrid, Spain.
- Concepción Polo <[email protected]>
  - Staff Engineer. Madrid, Spain.
- Yessika Labrador <[email protected]>
  - Staff Engineer. San Francisco, CA, USA.
- Francisco del Valle <[email protected]>
  - Senior Engineer. Madrid, Spain.
- Bhargav Ayyagari <[email protected]>
  - Engineering Manager. Toronto, Canada.
- Audrey Holmes <[email protected]>
  - Staff Engineer. New York, NY, USA.
The notebooks and code snippets are provided for illustrative purposes only and should not be considered production-ready solutions. The code demonstrates various use cases and concepts but may contain bugs or inefficiencies. There might be more optimal ways to achieve the same functionality. This code also incorporates external implementations, all of which are appropriately referenced within the code comments. Please refer to the original sources for detailed documentation and licensing information. Use this code as a starting point for your own implementations and adapt it to your specific needs and requirements.
# Copyright 2024 Reddit, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.