Breaking Barriers: A Hands-On Tutorial on AI-Enabled Accessibility to Social Media Content

ACM KDD 2024 | Barcelona, Spain | Monday, August 26, 10:00 AM – 1:00 PM

Motivation

Reddit and other social media platforms have become ubiquitous, providing opportunities for individuals to connect, share, and access information. However, for individuals with disabilities, accessing and interacting with social media content can present significant challenges.

Our mission is to bring community, belonging, and empowerment to everyone in the world. By making content accessible and inclusive for all, we strive to create a space where everyone feels welcome, valued, and represented. We see accessibility (a11y) as a fundamental aspect of inclusivity, so we prioritize providing content and features that are easy for all users to navigate, understand, and enjoy. By removing barriers and ensuring accessibility, we want to empower everyone to fully participate in our community, share their perspectives, and connect with others who share their interests and passions.

Artificial intelligence (AI) offers promising solutions to enhance accessibility and inclusivity, especially with the emergence of Multimodal Large Language Models (LLMs). Multimodal LLMs have advanced rapidly and can now analyze and understand a wide range of media formats, including text, images, audio, and video.

Objectives

This hands-on tutorial explores the potential of AI to improve accessibility to social media content for individuals with different disabilities, including hearing, visual, and cognitive impairments. We will design and implement several approaches built on open-source multimodal LLMs to bridge the gap between research and real-world use cases:

Providing alternative text descriptions (captions) for images, making them accessible to users with visual impairments.
Generating transcripts and summaries of audio and video content, enabling hearing-impaired users to access the information without relying on others for assistance.
Fixing accessibility issues in social media posts and generating adapted versions or summaries of long and complex texts, making it easier for users with cognitive disabilities to understand and engage with social media content.

We will analyze and highlight the strengths and limitations of these techniques and discuss the challenges and opportunities for further application to other use cases.

Despite their importance, and because of time constraints, this tutorial excludes text-to-speech conversion (to help visually impaired people) and content translation (to overcome language barriers), as these are well-established techniques with ample resources available elsewhere.

The target audience is researchers and practitioners interested in AI-enabled accessibility for social media content, whether or not they work in industry. Participants should have a basic understanding of AI, Natural Language Processing (NLP), and LLMs.

The tutorial will use Google Colaboratory, running a different notebook for each use case.

Impact on Society

Leveraging advanced AI techniques to enhance social media accessibility holds immense potential for transformative societal impact, fostering a more inclusive, equitable, and accessible society where individuals with disabilities are empowered to actively engage in the digital world. This can strengthen their sense of belonging, self-esteem, and overall well-being:

Inclusion and empowerment: Empowering individuals with disabilities by providing them equal access to social media platforms, enabling them to connect, share experiences, contribute and fully participate in the digital world, fostering inclusivity and equity.
Reduced isolation: Breaking down barriers to social interaction for people with disabilities, reducing feelings of isolation and fostering a sense of belonging.
Improved educational outcomes: Enhancing educational opportunities for students with disabilities by providing equitable access to learning resources, assignments, and group discussions.
Greater civic participation: Enabling individuals with disabilities to fully participate in political and social discussions online, shaping public discourse and advocating for their rights.
Increased employment opportunities: By improving access to information and communication tools, AI-enabled accessibility can support individuals with disabilities in seeking and securing employment.
Economic benefits: By increasing the participation of individuals with disabilities in the digital economy, AI-enabled accessibility can contribute to economic growth and innovation.

Moreover, our goal is to use this tutorial to raise awareness of the importance of accessibility and to spread the word about the role that AI can play in making digital content more accessible.

Tutorial Outline

The 3-hour tutorial will take place on Monday, August 26, 2024, from 10:00 AM to 1:00 PM and is organized into the following sections.

Introduction (15 min)

Explain the importance of accessibility in digital content.
Provide an overview of the challenges faced by users with disabilities when accessing Reddit content.
Discuss accessibility guidelines and best practices to ensure inclusivity.
Introduce the potential of AI to enhance accessibility.

Discussion: Accessibility

Use Case 1. Image Short Captions (45 min)

Walk participants through the process of deploying and prompting different multimodal LLMs such as LLaVA, phi-3, imp-v1-3b, and others to generate short, descriptive captions for social media images.
Discuss the challenges and limitations of using LLMs for image captioning.

Notebook: Use Case 1. Image Short Captions
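
Raw captions from a multimodal LLM often start with filler such as "The image shows" or run to several sentences, which works poorly as alt text. The following is a minimal post-processing sketch, not the notebook's implementation. The function name `to_alt_text`, the list of filler openers, and the 125-character limit are all illustrative assumptions.

```python
import re

# Hypothetical list of openers that add no information to alt text.
FILLER_PREFIXES = (
    "the image shows ",
    "this image shows ",
    "the image depicts ",
    "an image of ",
    "a picture of ",
)


def to_alt_text(raw_caption: str, max_chars: int = 125) -> str:
    """Clean a model-generated caption into short alt text."""
    # Collapse runs of whitespace (including newlines) into single spaces.
    text = re.sub(r"\s+", " ", raw_caption).strip()

    # Drop one filler opener, if present (case-insensitive match).
    lowered = text.lower()
    for prefix in FILLER_PREFIXES:
        if lowered.startswith(prefix):
            text = text[len(prefix):]
            break

    # Keep only the first sentence.
    text = re.split(r"(?<=[.!?])\s", text, maxsplit=1)[0]

    # If still too long, cut at the last word boundary within the limit.
    if len(text) > max_chars:
        text = text[:max_chars].rsplit(" ", 1)[0].rstrip(",;:") + "..."

    # Capitalize the first letter without altering the rest.
    return text[:1].upper() + text[1:]
```

The same function can be applied to the output of any of the models, which makes it easier to compare their captions on equal terms.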

Use Case 2. Audio Clip Transcripts (30 min)

Use open-source speech-to-text models (Whisper) to transcribe video clips to text and produce closed captions.
Explore transcript translation.
Discuss techniques for handling multiple speakers (speaker diarization).

Notebook: Use Case 2. Audio Clip Transcripts
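
Turning a transcript into closed captions mostly means formatting timed segments. The sketch below converts a list of segments into SubRip (SRT) text. It assumes each segment is a dictionary with `start` and `end` times in seconds and a `text` string, which is the shape the open-source `whisper` package reports in the `segments` field of its transcription result. The function names are illustrative and not taken from the notebook.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Convert segments with 'start', 'end', and 'text' keys into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each SRT cue is: sequence number, time range, caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}")
    # Cues are separated by a blank line.
    return "\n\n".join(blocks) + "\n"
```

With Whisper, the input would typically be the `segments` list from the dictionary returned by a model's `transcribe` call. Speaker labels from a diarization step could be prepended to each cue's text before formatting.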

Use Case 3. Video Descriptions (30 min)

Guide participants through the steps of designing and implementing a pipeline to generate video descriptions, combining keyframe extraction, image captioning, audio transcription, and summarization using LLMs.
Explore the challenges and advantages for different types of video content.

Notebook: Use Case 3. Video Descriptions
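
The pipeline can be pictured as a few composable steps. The sketch below is only an outline under stated assumptions: evenly spaced sampling stands in for real keyframe extraction, and the captioning and summarization models are passed in as plain callables (`caption_frame` and `summarize`, both hypothetical names) so that any model can be plugged in. It is not the notebook's implementation.

```python
from typing import Callable, List, Sequence


def sample_timestamps(duration_s: float, num_frames: int) -> List[float]:
    """Pick evenly spaced timestamps at the midpoints of equal-length slices.

    This is a simple stand-in for keyframe extraction.
    """
    if duration_s <= 0 or num_frames <= 0:
        return []
    step = duration_s / num_frames
    return [round(step * (i + 0.5), 3) for i in range(num_frames)]


def describe_video(
    frames: Sequence[object],
    caption_frame: Callable[[object], str],
    transcript: str,
    summarize: Callable[[str], str],
) -> str:
    """Combine frame captions and an audio transcript into one description."""
    # Caption each frame, skipping captions identical to the previous one.
    captions: List[str] = []
    for frame in frames:
        caption = caption_frame(frame).strip()
        if caption and (not captions or caption != captions[-1]):
            captions.append(caption)

    # Build a single prompt for the summarization model.
    scene_lines = "\n".join(f"- {c}" for c in captions)
    prompt = (
        "Describe this video for a person who cannot see or hear it.\n"
        f"Scenes, in order:\n{scene_lines}\n"
        f"Transcript:\n{transcript.strip() or '(no speech)'}"
    )
    return summarize(prompt)
```

Keeping the models behind callables makes it easy to swap captioners or summarizers when comparing results across different types of video content.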

Use Case 4. Complex Post Summarization (30 min)

Combine all previous models to demonstrate how to use AI to summarize lengthy posts for users with cognitive impairments.
Compare and contrast different summarization techniques.
Discuss the ethical considerations of using AI for summarization.

Notebook: Use Case 4. Complex Post Summarization
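
One technique worth including in such a comparison is hierarchical (map-reduce) summarization, which helps when a post is too long for a single model call. The sketch below assumes a `summarize` callable that wraps whichever LLM is used. The function names and the 200-word chunk size are illustrative assumptions, and the sketch is not necessarily the approach used in the notebook.

```python
from typing import Callable, List


def split_into_chunks(text: str, max_words: int = 200) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_post(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 200,
) -> str:
    """Summarize each chunk, then summarize the combined partial summaries."""
    chunks = split_into_chunks(text, max_words)
    if not chunks:
        return ""
    if len(chunks) == 1:
        # Short posts need only one model call.
        return summarize(chunks[0])
    # Map step: one summary per chunk.
    partial_summaries = [summarize(chunk) for chunk in chunks]
    # Reduce step: one summary of the partial summaries.
    return summarize("\n".join(partial_summaries))
```

Splitting on word counts is the simplest option. Splitting on paragraph boundaries would likely preserve more context per chunk, at the cost of uneven chunk sizes.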

Discussion (30 min)

Discuss the challenges and best practices for deploying accessible content solutions.
Guide participants in developing a plan for implementing accessibility initiatives.
Summarize the key takeaways and benefits of enhancing social media accessibility.

Bonus Use Case. Text to Speech (for fast runners)

Explore several open-source models for speech generation.
Discuss the implications for accessibility.

Notebook: Bonus Use Case. Text to Speech

Running the Notebooks

To open the notebooks on Colab:

Open the reddit/kdd2024-tutorial-breaking-barriers project in GitHub.
Go to each notebook (files with .ipynb extension) and open it in Colab by clicking on the "Open in Colab" badge.
Follow the instructions in the notebook cells.
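
Colab can also open a notebook hosted on GitHub directly from a URL. The helper below builds such a URL from the repository and notebook names. The function name is hypothetical, and the default branch name `main` is an assumption that should be checked against the repository.

```python
def colab_url(repo: str, notebook: str, branch: str = "main") -> str:
    """Build a Colab URL for a notebook stored in a GitHub repository."""
    # The branch name is an assumption; adjust it to the repository's default.
    return (
        "https://colab.research.google.com/github/"
        f"{repo}/blob/{branch}/{notebook}"
    )


print(colab_url(
    "reddit/kdd2024-tutorial-breaking-barriers",
    "Use_Case_1_Image_Short_Captions.ipynb",
))
```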

Dataset

A dataset of diverse multimedia posts with curated images, videos, and text has been created to support the tutorial. It showcases real-world scenarios, allowing participants to gain a practical understanding of how the presented approaches can address the different challenges.

Subreddit	Title	Attachments
catsplayingvideogames	Pro Gamer here	image1
pics	English football fans leaving Frankfurt in a mess after the match	image2
pics	My oil painting of red wine and dino nuggs	image3
pics	The first set photo of actor David Corenswet in James Gunn’s upcoming ‘Superman’ (2025)	image4
gaming	A bunch of 40 year olds just playing games at my house	image5
auroraborealis	Whistler BC Canada	image6
science	Why can't we walk in a straight line?	video1
2latinoforyou	Está picante la cosa	video2
funny	Rob Brydon and Steve Coogan's duelling Michael Caine impressions	video3
interestingasfuck	Part of the ravens' morning routine at the Tower of London with their Ravenmaster	video4
interestingasfuck	A bear says hi and catches the food like a pro	video5
interestingasfuck	AI learns to see with Wi-Fi routers as its eyes	video6
pettyrevenge	You get me fired, so you can’t work where I care about	text1
pettyrevenge	Keep pestering me to tie into my fence? Say goodbye to the fence!	text2
stories	I found an endless hole on some land I recently bought. It changes anything I send down in bizarre ways.	text3
stories	Wisdom story: A wise elderly man and two travelers	text4

Authors

All authors are members of the multidisciplinary ML Understanding team at Reddit, based in the United States, Spain, and Canada.

Julio Villena <[email protected]> - Principal Engineer. Madrid, Spain.
Rosa Català <[email protected]> - Senior Director. San Francisco, CA, USA.
Janine García <[email protected]> - Staff Engineer. Madrid, Spain.
Concepción Polo <[email protected]> - Staff Engineer. Madrid, Spain.
Yessika Labrador <[email protected]> - Staff Engineer. San Francisco, CA, USA.
Francisco del Valle <[email protected]> - Senior Engineer. Madrid, Spain.
Bhargav Ayyagari <[email protected]> - Engineering Manager. Toronto, Canada.
Audrey Holmes <[email protected]> - Staff Engineer. New York, NY, USA.

Disclaimer

The notebooks and code snippets are provided for illustrative purposes only and should not be considered production-ready solutions. The code demonstrates various use cases and concepts but may contain bugs or inefficiencies. There might be more optimal ways to achieve the same functionality. This code also incorporates external implementations, all of which are appropriately referenced within the code comments. Please refer to the original sources for detailed documentation and licensing information. Use this code as a starting point for your own implementations and adapt it to your specific needs and requirements.

License

# Copyright 2024 Reddit, Inc.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     https://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
