Caution
This package is still a work in progress. You can try it out but the API might change in future versions but not drastically.
With cig, you can query a .csv file with sql syntax.
Important considerations:
- Columns to return, columns in where conditions, columns in ORDER BY clause and values must be enclosed in single quotes. For example:
SELECT 's.ColumnOne', 's.ColumnTwo'
FROM path:path_to_csv.csv AS s WHERE 's.ColumnThree' = 'value'
ORDER BY 's.columnFour', 's.ColumnFive' DESC
-
Alias is required. Without the
AS s
part of the above query, the query would not be able to run. -
Path to a file must be relative to the executing binary or an absolute path. Consider always giving absolute path for better portability.
-
This project does not and will not implement the entire SQL syntax. Other than tasks outlined in the Tasks section, nothing else will be developed except making it faster and maintainable.
-
This is not a project that should be used in production. Its only use is for simple lookups and nothing else. In most situations, it is better to import a csv file into a database of your choice. This project is intended as "something interesting to do" for me so do not take it too seriously.
-
This package will be concurrency safe. This means that
Run()
method will be able to be used inside your own concurrency primitives. Although I will try to make it faster using concurrency for very large files, that will not affect using the public API in your code.
go get github.com/MarioLegenda/[email protected]
Below snippet of sql describes almost all current features of this package:
SELECT * FROM path:path_to_file.csv AS g WHERE 'g.columnOne' = 'string_value'
AND 'g.columnTwo'::int != '65' OR 'g.columnThree'::float = '56.3'
OFFSET 34
LIMIT 56
ORDER BY 'g.columnFour', 'g.columnFive' DESC
Instead of *
, you can specify the columns to return like this:
SELECT 'g.columnOne', 'g.columnTwo' /** rest of query goes here */
If you don't specify DESC
or ASC
, ASC
is assumed.
In code, you use it like this:
package main
import (
"fmt"
"github.com/MarioLegenda/cig"
"log"
)
func main() {
c := cig.New()
result := c.Run(`
SELECT * FROM path:path_to_file.csv AS g WHERE 'g.columnOne' = 'string_value'
AND 'g.columnTwo'::int != '65' OR 'g.columnThree'::float = '56.3'
OFFSET 34
LIMIT 56
ORDER BY 'g.columnFour', 'g.columnFive' DESC
`)
if result.Error != nil {
log.Fatalln(result.Error)
}
fmt.Println(result.SelectedColumns)
fmt.Println(result.AllColumns)
fmt.Println(result.Data)
}
Signature of the result is
type Data struct {
SelectedColumns []string
AllColumns []string
Error error
Data []map[string]string
}
You can handle errors with the errors.Is
function if you need fine grained
control of exactly which error happened.
package main
import (
"errors"
"fmt"
"github.com/MarioLegenda/cig"
cigError "github.com/MarioLegenda/cig/pkg"
"log"
)
func main() {
c := cig.New()
result := c.Run(`
SELECT * FROM path:path_to_file.csv AS g WHERE 'g.columnOne' = 'string_value'
AND 'g.columnTwo'::int != '65' OR 'g.columnThree'::float = '56.3'
OFFSET 34
LIMIT 56
ORDER BY 'g.columnFour', 'g.columnFive' DESC
`)
if errors.Is(result.Error, cigError.InvalidAlias) {
log.Fatalln(result.Error)
}
fmt.Println(result.SelectedColumns)
fmt.Println(result.AllColumns)
fmt.Println(result.Data)
}
This is the full list of errors you can use:
var InvalidToken = errors.New("Expected WHERE or LIMIT, OFFSET, ORDER BY, got something else.")
var InvalidSelectToken = errors.New("Expected 'select', got something else.")
var InvalidSelectableColumns = errors.New("Expected selectable column")
var InvalidDuplicatedColumn = errors.New("Duplicated selectable column")
var InvalidFromToken = errors.New("Expected 'FROM', got something else.")
var InvalidFilePathToken = errors.New("Expected 'path:path_to_file' but did not get the path part")
var InvalidAsToken = errors.New("Expected 'as', got something else.")
var InvalidAlias = errors.New("Invalid alias.")
var InvalidColumnAlias = errors.New("Column alias not recognized.")
var InvalidWhereClause = errors.New("Expected WHERE clause, got something else.")
var InvalidConditionColumn = errors.New("Expected condition column.")
var InvalidComparisonOperator = errors.New("Invalid comparison operator")
var InvalidLogicalOperator = errors.New("Invalid logical operator")
var InvalidValueToken = errors.New("Invalid value token.")
var InvalidDataType = errors.New("Invalid data type.")
var InvalidConditionAlias = errors.New("Invalid condition alias.")
var InvalidOrderBy = errors.New("Invalid ORDER BY")
One use could be in an environment where it is not possible to install a database just to lookup some values in a .csv file. This package will provide a command line utility to do so. Other than that, it would be better to import a .csv file into a database of your choice and use it like that.
- Implement logical operators
- Implement all comparison operators (now, only equality works)
- Implement picking columns to return
- Implement OFFSET and LIMIT to implement pagination
- Implement sorting
- Create a command line utility to use it on the command line
- Implement JOIN with multiple files
- Implement options (cache, timeout with context, extremely simple optional indexing on first query execution)
- Implement splitting work into multiple goroutines
- Implement solutions from one billion rows challenge