RFC: core: 1st party sql parser using winnow combinators #371

Open
wants to merge 18 commits into
base: main
5 changes: 3 additions & 2 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

1 change: 1 addition & 0 deletions core/Cargo.toml
@@ -48,6 +48,7 @@ pest = "2.0"
pest_derive = "2.0"
mockall = "0.13.0"
rand = "0.8.5"
winnow = { version = "0.6.20", features = ["simd"] }

[target.'cfg(not(target_family = "windows"))'.dev-dependencies]
pprof = { version = "0.12.1", features = ["criterion", "flamegraph"] }
44 changes: 39 additions & 5 deletions core/benches/benchmark.rs
@@ -1,18 +1,52 @@
use criterion::{criterion_group, criterion_main, Criterion, Throughput};
use fallible_iterator::FallibleIterator;
use limbo_core::parser::parse_sql_statement;
use limbo_core::{Database, PlatformIO, IO};
use pprof::criterion::{Output, PProfProfiler};
use sqlite3_parser::lexer::sql::Parser;
use std::sync::Arc;

fn bench(c: &mut Criterion) {
    // Each benchmark group runs unless its DISABLE_* variable is set.
    if std::env::var("DISABLE_PARSER_BENCHMARK").is_err() {
        parser_bench(c);
    }
    if std::env::var("DISABLE_LIMBO_BENCHMARK").is_err() {
        limbo_bench(c);
    }
    // https://github.com/penberg/limbo/issues/174
    // The rusqlite benchmark crashes on Mac M1 when using the flamegraph features
    if std::env::var("DISABLE_RUSQLITE_BENCHMARK").is_err() {
        rusqlite_bench(c);
    }
}

fn parser_bench(criterion: &mut Criterion) {
    let mut group = criterion.benchmark_group("parser");
    group.throughput(Throughput::Elements(1));

    let sql_statements = vec![
        ("SELECT_STAR_LIMIT_1".to_string(), "SELECT * FROM users LIMIT 1".to_string()),
        ("MULTIPLE_JOINS".to_string(), "SELECT foo,bar,baz,bad FROM users LEFT OUTER JOIN products INNER JOIN gizmos LIMIT 100".to_string()),
        ("MULTIPLE_JOINS_WITH_WHERE".to_string(), "SELECT foo,bar,baz,bad FROM users LEFT OUTER JOIN products INNER JOIN gizmos WHERE foo = 1 AND bar = 2 OR baz < 3 AND bad = 4 LIMIT 100".to_string()),
        ("MULTIPLE_JOINS_WITH_WHERE_GROUPBY_AND_ORDERBY".to_string(), "SELECT foo,bar,baz,bad FROM users LEFT OUTER JOIN products INNER JOIN gizmos WHERE foo = 1 AND bar = 2 OR baz < 3 AND bad = 4 GROUP BY foo,bar ORDER BY baz DESC, bad ASC LIMIT 100".to_string()),
    ];

    for (i, (test_name, sql)) in sql_statements.into_iter().enumerate() {
        // Baseline: the existing sqlite3_parser crate.
        group.bench_function(format!("Parse SQL, sqlite3_parser: '{}'", test_name), |b| {
            b.iter(|| {
                let mut p = Parser::new(sql.as_bytes());
                p.next().unwrap();
            });
        });
        // Candidate: the new first-party parser.
        let mut sql2 = sql.clone();
        group.bench_function(format!("Parse SQL, limbo: '{}'", test_name), |b| {
            b.iter(|| {
                parse_sql_statement(&mut sql2).unwrap();
            });
        });
    }

    group.finish();
}
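
Review note: the `DISABLE_*_BENCHMARK` checks in `bench` share one pattern, where a group runs unless its opt-out variable is set. Below is a minimal sketch of that check factored into a helper. The name `benchmark_enabled` is hypothetical and not part of this PR; only `std::env::var` is a real API.

```rust
use std::env;

/// Hypothetical helper (not in the PR): returns true unless the opt-out
/// variable `DISABLE_<NAME>_BENCHMARK` is present in the environment.
///
/// `env::var` returns `Err` when the variable is unset (or is not valid
/// Unicode), so `is_err()` is read here as "not disabled".
fn benchmark_enabled(name: &str) -> bool {
    env::var(format!("DISABLE_{}_BENCHMARK", name)).is_err()
}
```

With a helper like this, `bench` could read `if benchmark_enabled("PARSER") { parser_bench(c); }`. Note that the check only tests for presence: setting the variable to any value, including `0`, disables the group.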

fn limbo_bench(criterion: &mut Criterion) {
1 change: 1 addition & 0 deletions core/lib.rs
@@ -2,6 +2,7 @@ mod error;
mod function;
mod io;
mod json;
pub mod parser;
mod pseudo;
mod schema;
mod storage;
77 changes: 77 additions & 0 deletions core/parser/SUPPORTED_FEATURES.MD
@@ -0,0 +1,77 @@
# SQL Parser Supported Features

This document outlines the features currently supported by our work-in-progress SQL parser.

## SELECT Statement

- [x] Basic SELECT ... FROM ... structure
- [x] Column selection
  - [x] Star (*) selection
  - [x] Individual column selection
  - [x] Qualified column names (e.g., table.column)
- [x] Table selection in FROM clause
- [x] Table aliasing
- [x] WHERE clause with conditions
- [x] GROUP BY clause
- [x] ORDER BY clause
  - [x] ASC and DESC directives
  - [x] Multiple column ordering
- [x] LIMIT clause
- [ ] OFFSET clause
- [ ] HAVING clause
- [ ] Subqueries

## Operators and Expressions

- [x] Basic arithmetic operators (+, -, *, /)
- [x] Comparison operators (=, !=, <>, >, <, >=, <=)
- [x] Logical operators (AND, OR, NOT)
- [x] IN operator
- [x] NOT IN operator
- [x] LIKE operator
- [x] BETWEEN operator
- [x] Parenthesized expressions
- [x] Function calls
- [x] CASE expressions (both simple and searched)

## JOINs

- [x] INNER JOIN
- [x] LEFT OUTER JOIN
- [x] Multiple joins in a single query
- [x] Join conditions (ON clause)
- [ ] RIGHT OUTER JOIN
- [ ] FULL OUTER JOIN
- [ ] CROSS JOIN

## Data Types and Literals

- [x] String literals
- [x] Numeric literals (integers and floats)
- [ ] Date and time literals
- [ ] Boolean literals
- [ ] NULL

## Functions

- [x] Basic function calls with arguments
- [x] Aggregate functions (SUM, AVG, etc.)
- [ ] Window functions

## Additional Features

- [x] Case insensitivity for keywords
- [x] Column aliasing
- [ ] Common Table Expressions (CTEs)
- [ ] Set operations (UNION, INTERSECT, EXCEPT)

## Other SQL Statement Types

- [ ] INSERT
- [ ] UPDATE
- [ ] DELETE
- [ ] CREATE TABLE
- [ ] ALTER TABLE
- [ ] DROP TABLE
- [ ] CREATE INDEX
- [ ] CREATE VIEW
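
As an illustration of the "Case insensitivity for keywords" item above, here is a minimal standard-library sketch of matching a keyword regardless of case. It is not the PR's implementation (the PR builds on winnow combinators, and the parser source is not shown here); the helper name `strip_keyword` is hypothetical.

```rust
/// Hypothetical helper: if `input` starts with `keyword` (compared
/// ASCII case-insensitively) and the keyword is not just the prefix of a
/// longer identifier, return the remaining input.
fn strip_keyword<'a>(input: &'a str, keyword: &str) -> Option<&'a str> {
    // `get` returns None if the input is too short or the cut would
    // fall inside a multi-byte character.
    let head = input.get(..keyword.len())?;
    if !head.eq_ignore_ascii_case(keyword) {
        return None;
    }
    let rest = &input[keyword.len()..];
    // Reject matches such as `SELECTED`, where the keyword is only a prefix.
    match rest.chars().next() {
        Some(c) if c.is_ascii_alphanumeric() || c == '_' => None,
        _ => Some(rest),
    }
}
```

For example, `strip_keyword("select * FROM users", "SELECT")` yields `Some(" * FROM users")`, while `strip_keyword("SELECTED", "SELECT")` yields `None`.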
161 changes: 161 additions & 0 deletions core/parser/ast.rs
@@ -0,0 +1,161 @@
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Operator {
    Between,
    Eq,
    NotEq,
    Lt,
    LtEq,
    Gt,
    GtEq,
    And,
    Or,
    Plus,
    Minus,
    Multiply,
    Divide,
    Like,
    NotLike,
    Glob,
    Not,
    In,
    NotIn,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Column {
    pub name: String,
    pub table_name: Option<String>,
    pub alias: Option<String>,
    pub table_no: Option<u64>,
    pub column_no: Option<u64>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Expression {
    Between {
        lhs: Box<Expression>,
        start: Box<Expression>,
        end: Box<Expression>,
    },
    Case {
        base: Option<Box<Expression>>,
        when_then_pairs: Vec<(Expression, Expression)>,
        else_expr: Option<Box<Expression>>,
    },
    Column(Column),
    LiteralString(String),
    LiteralNumber(String),
    LiteralBlob(Vec<u8>),
    Unary {
        op: Operator,
        expr: Box<Expression>,
    },
    Binary {
        lhs: Box<Expression>,
        op: Operator,
        rhs: Box<Expression>,
    },
    Parenthesized(Box<Expression>),
    FunctionCall {
        name: String,
        args: Option<Vec<Expression>>,
    },
    InList {
        expr: Box<Expression>,
        list: Option<Vec<Expression>>,
        not: bool,
    },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ResultColumn {
    Expr {
        expr: Expression,
        alias: Option<String>,
    },
    Star,
    TableStar {
        table: Table,
    },
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Table {
    pub name: String,
    pub alias: Option<String>,
    pub table_no: Option<u64>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum JoinVariant {
    Inner = 0,
    Outer = 1,
    Left = 2,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct JoinType {
    pub(crate) bitmask: u8,
}

impl JoinType {
    pub fn new() -> Self {
        Self {
            bitmask: JoinVariant::Inner as u8,
        }
    }

    pub fn with(mut self, variant: JoinVariant) -> Self {
        // outer needs to clear inner and vice versa
        match variant {
            JoinVariant::Inner => {
                self.bitmask &= !(JoinVariant::Outer as u8);
                self.bitmask |= JoinVariant::Inner as u8;
            }
            JoinVariant::Outer => {
                self.bitmask |= JoinVariant::Outer as u8;
            }
            JoinVariant::Left => {
                // LEFT JOIN is shorthand for LEFT OUTER JOIN, so set both bits.
                self.bitmask |= JoinVariant::Left as u8;
                self.bitmask |= JoinVariant::Outer as u8;
            }
        }
        self
    }
}
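
Review note: because `JoinVariant::Inner` has discriminant 0, OR-ing it into `bitmask` changes nothing, and "is this an inner join" cannot be tested as a bit. One possible alternative, sketched under the assumption that each variant should occupy its own bit, is shown below. The names `JoinFlag`, `JoinFlags`, and `contains` are hypothetical and not part of this PR.

```rust
/// Hypothetical single-bit flags: every variant has a non-zero, distinct bit.
#[repr(u8)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JoinFlag {
    Inner = 0b001,
    Outer = 0b010,
    Left = 0b100,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct JoinFlags(u8);

impl JoinFlags {
    /// A plain `JOIN` is treated as an inner join.
    fn new() -> Self {
        JoinFlags(JoinFlag::Inner as u8)
    }

    fn with(mut self, flag: JoinFlag) -> Self {
        match flag {
            // INNER excludes OUTER (and therefore LEFT).
            JoinFlag::Inner => {
                self.0 &= !(JoinFlag::Outer as u8 | JoinFlag::Left as u8);
                self.0 |= JoinFlag::Inner as u8;
            }
            // OUTER excludes INNER.
            JoinFlag::Outer => {
                self.0 &= !(JoinFlag::Inner as u8);
                self.0 |= JoinFlag::Outer as u8;
            }
            // LEFT JOIN is shorthand for LEFT OUTER JOIN.
            JoinFlag::Left => {
                self.0 &= !(JoinFlag::Inner as u8);
                self.0 |= JoinFlag::Left as u8 | JoinFlag::Outer as u8;
            }
        }
        self
    }

    /// True if the given flag's bit is set.
    fn contains(self, flag: JoinFlag) -> bool {
        (self.0 & (flag as u8)) != 0
    }
}
```

With this layout, `JoinFlags::new().with(JoinFlag::Left)` contains both `Left` and `Outer` but not `Inner`, and applying `JoinFlag::Inner` afterwards returns to the default value.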

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Join {
    pub join_type: JoinType,
    pub table: Table,
    pub on: Option<Expression>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct FromClause {
    pub table: Table,
    pub joins: Option<Vec<Join>>,
}

#[repr(u8)]
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Direction {
    Ascending,
    Descending,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SelectStatement {
    pub columns: Vec<ResultColumn>,
    pub from: Option<FromClause>,
    pub where_clause: Option<Expression>,
    pub group_by: Option<Vec<Expression>>,
    pub order_by: Option<Vec<(Expression, Direction)>>,
    pub limit: Option<u64>,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum SqlStatement {
    Select(SelectStatement),
}
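
Review note: to make the shape of `Expression::Binary` concrete, here is a self-contained sketch that hand-builds the tree for `foo = 1 AND bar = 2 OR baz < 3`. The types are trimmed copies of the ones above (for brevity `Column` holds a plain `String` rather than the `Column` struct), and `binary`, `col`, `num`, `render`, and `example_where` are hypothetical helpers. The grouping follows standard SQL precedence, where AND binds tighter than OR; whether the PR's parser produces exactly this tree is an assumption, since the parser source is not shown here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Operator {
    Eq,
    Lt,
    And,
    Or,
}

#[derive(Debug, Clone, PartialEq, Eq)]
enum Expression {
    Column(String), // simplified: the PR wraps a `Column` struct here
    LiteralNumber(String),
    Binary {
        lhs: Box<Expression>,
        op: Operator,
        rhs: Box<Expression>,
    },
}

fn binary(lhs: Expression, op: Operator, rhs: Expression) -> Expression {
    Expression::Binary { lhs: Box::new(lhs), op, rhs: Box::new(rhs) }
}

fn col(name: &str) -> Expression {
    Expression::Column(name.to_string())
}

fn num(text: &str) -> Expression {
    Expression::LiteralNumber(text.to_string())
}

/// Renders the tree fully parenthesized so the grouping is visible.
fn render(expr: &Expression) -> String {
    match expr {
        Expression::Column(name) => name.clone(),
        Expression::LiteralNumber(text) => text.clone(),
        Expression::Binary { lhs, op, rhs } => {
            let symbol = match op {
                Operator::Eq => "=",
                Operator::Lt => "<",
                Operator::And => "AND",
                Operator::Or => "OR",
            };
            format!("({} {} {})", render(lhs), symbol, render(rhs))
        }
    }
}

/// `foo = 1 AND bar = 2 OR baz < 3`: OR is the root, AND is its left child.
fn example_where() -> Expression {
    let foo_eq = binary(col("foo"), Operator::Eq, num("1"));
    let bar_eq = binary(col("bar"), Operator::Eq, num("2"));
    let baz_lt = binary(col("baz"), Operator::Lt, num("3"));
    binary(binary(foo_eq, Operator::And, bar_eq), Operator::Or, baz_lt)
}
```

Calling `render(&example_where())` gives `(((foo = 1) AND (bar = 2)) OR (baz < 3))`. A round-trip check of this kind could also serve as a parser test for the `MULTIPLE_JOINS_WITH_WHERE` benchmark query.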