HaoAreYuDong/Large-Language-Models-for-Tabular-Data

Tutorial on Large Language Models for Tabular Data: Progresses and Future Directions

🌟 A tutorial on “Large Language Models for Tabular Data” at the SIGIR’24 conference in Washington, D.C.

Slides

Paper

Paper List

Introduction

  • Binder: Binding Language Models in Symbolic Languages [Paper]
  • TabLLM: Few-shot Classification of Tabular Data with Large Language Models [Paper]
  • Dater: Large Language Models are Versatile Decomposers: Decompose Evidence and Questions for Table-based Reasoning [Paper]
  • DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [Paper]
  • Table Meets LLM: Can Large Language Models Understand Structured Table Data? [Paper]
  • SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
  • Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [Paper]
  • DAIL-SQL: Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
  • Table-GPT: Table-tuned GPT for Diverse Table Tasks [Paper]
  • API-Assisted Code Generation for Question Answering on Varied Table Structures [Paper]
  • InsightPilot: An LLM-Empowered Automated Data Exploration System [Paper]
  • TableLlama: Towards Open Large Generalist Models for Tables [Paper]
  • DBCopilot: Scaling Natural Language Querying to Massive Databases [Paper]
  • TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [Paper]
  • DB-GPT: Empowering Database Interactions with Private Large Language Models [Paper]
  • Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding [Paper]
  • Trove: Inducing verifiable and efficient toolboxes for solving programmatic tasks [Paper]
  • MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
  • StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [Paper]
  • TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [Paper]
  • Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
  • Table-LLaVA: Multimodal Table Understanding [Paper]
  • SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [Paper]
  • SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]

Encoding Tabular Data for LLMs

  • Table Meets LLM: Can Large Language Models Understand Structured Table Data? [Paper]
  • Tabular Representation, Noisy Operators, and Impacts on Table Structure Understanding Tasks in LLMs [Paper]
  • SheetAgent: A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models [Paper]
  • Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
  • Enhancing text-to-SQL capabilities of large language models: A study on prompt design strategies [Paper]
  • Automated Data Visualization from Natural Language via Large Language Models: An Exploratory Study [Paper]
  • TAP4LLM: Table Provider on Sampling, Augmenting, and Packing Semi-structured Data for Large Language Model Reasoning [Paper]
  • DBCopilot: Scaling Natural Language Querying to Massive Databases [Paper]
  • TabLLM: Few-shot Classification of Tabular Data with Large Language Models [Paper]
  • Towards foundation models for learning on tabular data [Paper]
  • Tables as Images? Exploring the Strengths and Limitations of LLMs on Multimodal Representations of Tabular Data [Paper]
  • Multimodal Table Understanding [Paper]
  • Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
  • SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]

Modeling and Training LLMs for Tabular Data

  • TaBERT: Pretraining for Joint Understanding of Textual and Tabular Data [Paper]
  • TaPas: Weakly Supervised Table Parsing via Pre-training [Paper]
  • TURL: Table Understanding through Representation Learning [Paper]
  • TUTA: Tree-based transformers for generally structured table pre-training [Paper]
  • TAPEX: Table Pre-Training via Learning a Neural SQL Executor [Paper]
  • UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models [Paper]
  • Table-GPT: Table-tuned GPT for Diverse Table Tasks [Paper]
  • SpreadsheetBench: Towards Challenging Real World Spreadsheet Manipulation [Paper]
  • TableLlama: Towards Open Large Generalist Models for Tables [Paper]
  • HELLaMA: LLaMA-based Table to Text Generation by Highlighting the Important Evidence [Paper]
  • StructLM: Towards Building Generalist Models for Structured Knowledge Grounding [Paper]
  • TableLLM: Enabling Tabular Data Manipulation by LLMs in Real Office Usage Scenarios [Paper]
  • TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data [Paper]
  • Text-to-SQL Empowered by Large Language Models: A Benchmark Evaluation [Paper]
  • DB-GPT: Empowering Database Interactions with Private Large Language Models [Paper]
  • Towards foundation models for learning on tabular data [Paper]
  • LIFT: Language-Interfaced Fine-Tuning for Non-Language Machine Learning Tasks [Paper]
  • Unleashing the Potential of Large Language Models for Predictive Tabular Tasks in Data Science [Paper]
  • Multimodal Table Understanding [Paper]
  • Effective distillation of table-based reasoning ability from LLMs [Paper]
  • OmChat: A Recipe to Train Multimodal Language Models with Strong Long Context and Video Understanding [Paper]
  • Vision Language Models for Spreadsheet Understanding: Challenges and Opportunities [Paper]
  • SpreadsheetLLM: Encoding Spreadsheets for Large Language Models [Paper]

Tasks and Benchmarks

  • TableSense: Spreadsheet table detection with convolutional neural networks [Paper]
  • Auto-Tables: Synthesizing multi-step transformations to relationalize tables without using examples [Paper]
  • Spreadsheet table transformations from examples [Paper]
  • TUTA: Tree-based transformers for generally structured table pre-training [Paper]
  • FORTAP: Using formulas for numerical-reasoning-aware table pretraining [Paper]
  • Open domain question answering over tables via dense retrieval [Paper]
  • Table Retrieval May Not Necessitate Table-specific Model Design [Paper]
  • Compositional semantic parsing on semi-structured tables [Paper]
  • Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task [Paper]
  • HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation [Paper]
  • Dynamic prompt learning via policy gradient for semi-structured mathematical reasoning [Paper]
  • FeTaQA: Free-form Table Question Answering [Paper]
  • Tab-CQA: A Tabular Conversational Question Answering Dataset on Financial Reports [Paper]
  • TempTabQA: Temporal Question Answering for Semi-Structured Tables [Paper]
  • Open question answering over tables and text [Paper]
  • TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance [Paper]
  • AIT-QA: Question Answering Dataset over Complex Tables in the Airline Industry [Paper]
  • TabFact: A large-scale dataset for table-based fact verification [Paper]
  • ToTTo: A Controlled Table-To-Text Generation Dataset [Paper] [Dataset]
  • Enhanced Chart Understanding via Visual Language Pre-training on Plot Table Pairs [Paper]
  • MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
  • SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
  • Language models enable simple systems for generating structured views of heterogeneous data lakes [Paper]
  • Large language models (LLMs) on tabular data: Prediction, generation, and understanding - a survey [Paper]

LLM-driven Table Agents

  • Large language models are versatile decomposers: Decompose evidence and questions for table-based reasoning [Paper]
  • Exploring chain-of-thought style prompting for text-to-SQL [Paper]
  • Chain-of-Table: Evolving tables in the reasoning chain for table understanding [Paper]
  • DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction [Paper]
  • Tab-CoT: Zero-shot tabular chain of thought [Paper]
  • Selective demonstrations for cross-domain text-to-SQL [Paper]
  • SpreadsheetCoder: Formula prediction from semi-structured context [Paper]
  • SheetCopilot: Bringing Software Productivity to the Next Level through Large Language Models [Paper]
  • ToolQA: A dataset for LLM question answering with external tools [Paper]
  • ReAcTable: Enhancing ReAct for Table Question Answering [Paper]
  • LEVER: Learning to verify language-to-code generation with execution [Paper]
  • MatPlotAgent: Method and Evaluation for LLM-Based Agentic Scientific Data Visualization [Paper]
  • Binding Language Models in Symbolic Languages [Paper]
  • Chameleon: Plug-and-play compositional reasoning with large language models [Paper]
  • API-Assisted Code Generation for Question Answering on Varied Table Structures [Paper]
  • Executable code actions elicit better LLM agents [Paper]
  • Data-Copilot: Bridging Billions of Data and Humans with Autonomous Workflow [Paper]
  • ToolWriter: Question Specific Tool Synthesis for Tabular Data [Paper]
  • CRAFT: Customizing LLMs by creating and retrieving from specialized toolsets [Paper]
  • Trove: Inducing verifiable and efficient toolboxes for solving programmatic tasks [Paper]
  • Cognitive architectures for language agents [Paper]
  • BAGEL: Bootstrapping Agents by Guiding Exploration with Language [Paper]
  • EHRAgent: Code empowers large language models for few-shot complex tabular reasoning on electronic health records [Paper]
  • Towards knowledge-intensive text-to-SQL semantic parsing with formulaic knowledge [Paper]
