Datasette Library #417
This would allow Datasette to be easily used as a "data library" (like a data warehouse, but with less expectation of big-data querying technology such as Presto). One of the things I learned at the NICAR CAR 2019 conference in Newport Beach is that there is a very real need for some kind of easily accessible data library at most newsrooms.
A neat ability of Datasette Library would be if it could work against other files that have been dropped into the folder. In particular: if a user drops a CSV file into the folder, how about automatically converting that CSV file to SQLite using sqlite-utils?
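The conversion the comment describes is what sqlite-utils automates. A rough sketch of the idea using only the standard library (the function name and the TEXT-only schema are assumptions for illustration, not Datasette or sqlite-utils APIs):

```python
import csv
import sqlite3

def csv_to_sqlite(csv_path, db_path, table_name):
    """Load a CSV file into a SQLite table, creating the table
    from the CSV's header row. Every column is stored as TEXT,
    sidestepping type detection for this sketch."""
    with open(csv_path, newline="") as f:
        reader = csv.reader(f)
        headers = next(reader)
        rows = list(reader)
    conn = sqlite3.connect(db_path)
    cols = ", ".join(f'"{h}"' for h in headers)
    placeholders = ", ".join("?" for _ in headers)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table_name}" ({cols})')
    conn.executemany(f'INSERT INTO "{table_name}" VALUES ({placeholders})', rows)
    conn.commit()
    conn.close()
```

sqlite-utils goes further than this sketch (type detection, upserts, incremental inserts), which is why it is the natural tool for the job.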
This would be really interesting, but several questions arise in use, I think. For example:
CSV files may also have messy names compared to the table you want. Or an update CSV may have the form
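The "messy names" concern could be handled by normalizing a filename into a table name. A hypothetical sketch (the helper name and normalization rules are assumptions, not part of Datasette or sqlite-utils):

```python
import re
from pathlib import Path

def table_name_for_csv(path):
    """Derive a SQLite-friendly table name from a CSV filename:
    drop the extension, collapse each run of non-alphanumeric
    characters into a single underscore, and lower-case."""
    stem = Path(path).stem
    name = re.sub(r"[^a-zA-Z0-9]+", "_", stem).strip("_").lower()
    return name or "table"
```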
Refs #417. First proof-of-concept for Datasette Library. Run like this:

datasette -d ~/Library

Uses a new plugin hook: available_databases(). BUT... I don't think this is quite the way I want to go.
OK, I have a plan. I'm going to try to implement this as a core Datasette feature (no plugins) with the following design:
To check if a file is valid SQLite, Datasette will first check if the first few bytes of the file are
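The magic header is a documented part of the SQLite file format: every valid database file begins with the 16 bytes "SQLite format 3\x00". A minimal check (the function name is illustrative, not Datasette's actual implementation):

```python
SQLITE_HEADER = b"SQLite format 3\x00"

def is_sqlite(path):
    """Return True if the file starts with the 16-byte SQLite
    magic header. Reading only the header avoids loading the
    whole file just to classify it."""
    with open(path, "rb") as f:
        return f.read(len(SQLITE_HEADER)) == SQLITE_HEADER
```

Note that a brand-new, empty database file has zero bytes (SQLite writes the header when the first page is allocated), so this check treats never-written databases as non-SQLite.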
I'm going to add two methods to the Datasette class to help support this work (and to enable exciting new plugin opportunities in the future):
MVP for this feature: just do it once on startup; don't scan for new files every X seconds.
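A one-off startup scan of that kind could be as simple as walking the directory tree and keeping the files whose header matches. A sketch under the assumptions above (the function name is hypothetical):

```python
from pathlib import Path

def scan_library(directory):
    """One-off startup scan: return paths of files in the
    directory tree that look like SQLite databases, classified
    by the 16-byte magic header rather than by file extension."""
    header = b"SQLite format 3\x00"
    found = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        with open(path, "rb") as f:
            if f.read(16) == header:
                found.append(path)
    return found
```

Checking the header rather than the extension means a dropped file named data.txt that is really a SQLite database would still be picked up.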
I'm going to move this over to a draft pull request.
So could the polling support also allow you to call sqlite-utils to update a database from CSV files? (Though I'm guessing you would only want to handle changed files? Do your scrapers check and cache CSV datestamps/hashes?)
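The datestamp/hash idea the comment raises is a standard change-detection pattern: cache a cheap fingerprint per file and only reprocess when it changes. A sketch with the standard library (all names here are hypothetical, not Datasette APIs):

```python
import hashlib
from pathlib import Path

def file_fingerprint(path):
    """Cheap change fingerprint: (size, mtime). Enough to skip
    unchanged files without reading their contents."""
    stat = Path(path).stat()
    return (stat.st_size, stat.st_mtime)

def content_hash(path):
    """SHA-256 of the file contents, read in chunks, for when
    you need certainty that the bytes actually changed."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def changed(path, seen):
    """Return True (and update the `seen` cache) if the file's
    fingerprint differs from the one recorded for it."""
    fp = file_fingerprint(path)
    if seen.get(str(path)) != fp:
        seen[str(path)] = fp
        return True
    return False
```

A polling loop would call changed() for each file on every pass and hand changed CSVs to the converter.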
Instead of scanning the directory every 10s, have you considered listening for the native system events to notify you of updates? I think Python has a nice module to do this for you, called watchdog.
That's a great idea. I'd ruled that out because working with the different operating system versions of those is tricky, but if
Very much looking forward to seeing this functionality come together. This is probably out of scope for an initial release, but in the future it could be useful to also think about how to run this in a containerized context. For example, an immutable Datasette container that points to an S3 bucket of SQLite DBs or CSVs, or an immutable Datasette container pointing to an NFS volume elsewhere on a Kubernetes cluster.
FWIW, I had a look at Not a production thing, just an experiment trying to explore what might be possible...
The ability to run Datasette in a mode where it automatically picks up new (or modified) files in a directory tree without needing to restart the server.
Suggested command: