
Datasette Library #417

Open

simonw opened this issue Mar 15, 2019 · 12 comments

simonw (Owner) commented Mar 15, 2019

The ability to run Datasette in a mode where it automatically picks up new (or modified) files in a directory tree without needing to restart the server.

Suggested command:

datasette library /path/to/mydbs/
simonw (Owner, Author) commented Mar 15, 2019

This would allow Datasette to be easily used as a "data library" (like a data warehouse, but with less expectation of big-data querying technology such as Presto).

One of the things I learned at the NICAR CAR 2019 conference in Newport Beach is that there is a very real need for some kind of easily accessible data library at most newsrooms.

simonw (Owner, Author) commented Mar 15, 2019

A neat ability for Datasette Library would be working with other kinds of files dropped into the folder. In particular: if a user drops a CSV file into the folder, how about automatically converting that CSV file to SQLite using sqlite-utils?
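A minimal sketch of that conversion using the sqlite-utils Python API (the paths and the file-name-to-table-name rule here are assumptions for illustration):

    import csv
    import sqlite_utils

    # Convert a dropped CSV file into a table in a SQLite database
    db = sqlite_utils.Database("/path/to/mydbs/data.db")
    with open("/path/to/mydbs/legislators.csv", newline="") as f:
        # Name the table after the file that was dropped in
        db["legislators"].insert_all(csv.DictReader(f))

The sqlite-utils CLI should do the equivalent in one line: sqlite-utils insert data.db legislators legislators.csv --csv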

psychemedia (Contributor) commented

This would be really interesting, but several usage possibilities arise, I think.

For example:

  • I put a new CSV file into the import dir and a new table is created from it
  • I put a CSV file into the import dir that replaces a previous file / table of the same name (e.g. files that contain monthly year-to-date data). The data may also patch previous months, so a full replace / DROP of the original table may well be in order
  • I put a CSV file into the import dir that updates a pre-existing table of the same name (e.g. files that contain last month's data)

CSV files may also have messy names compared to the table you want. Or an update CSV may have a form like MYTABLENAME-February2019.csv, etc. (A sketch of these import modes follows below.)
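A rough sketch of those modes using the sqlite-utils Python API; the import_csv helper, its mode flag, and the file names are hypothetical, and deriving a clean table name from a messy file name is left aside:

    import csv
    import sqlite_utils

    db = sqlite_utils.Database("library.db")

    def import_csv(path, table, mode="create"):
        # mode is one of: "create" a new table, "replace" (drop and
        # rebuild the table), or "append" to update an existing table
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        if mode == "replace":
            db[table].drop(ignore=True)  # ignore=True: no error if the table is absent
        db[table].insert_all(rows)

    import_csv("MYTABLENAME-February2019.csv", "MYTABLENAME", mode="append")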

simonw added a commit that referenced this issue Jul 26, 2019
Refs #417

First proof-of-concept for Datasette Library. Run like this:

    datasette -d ~/Library

Uses a new plugin hook - available_databases()

BUT... I don't think this is quite the way I want to go.
simonw (Owner, Author) commented Feb 14, 2020

OK, I have a plan. I'm going to try to implement this as a core Datasette feature (no plugins) with the following design:

  • You can tell Datasette "load any databases you find in this directory" by passing the --dir=path/to/dir option to datasette; any files in that directory that are valid SQLite files will be attached to Datasette
  • Every 10 seconds Datasette will re-scan those directories to see if any new files have been added
  • That 10s will be the default for a new --config directory_scan_s:10 config option. You can set this to 0 to disable scanning entirely, at which point Datasette will only run the scan once on startup.

To check if a file is valid SQLite, Datasette will first check if the first few bytes of the file are b"SQLite format 3\x00". If they are, it will open a connection to the file and attempt to run select * from sqlite_master against it. If that runs without any errors it will assume the file is usable and connect it.
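A sketch of that design (the function names here are hypothetical; the header bytes and the sqlite_master query are as described above):

    import os
    import sqlite3
    import time

    SQLITE_HEADER = b"SQLite format 3\x00"

    def is_usable_sqlite(path):
        # Step 1: the file must start with the 16-byte SQLite magic header
        with open(path, "rb") as f:
            if f.read(16) != SQLITE_HEADER:
                return False
        # Step 2: the file must be queryable without errors
        try:
            conn = sqlite3.connect("file:{}?mode=ro".format(path), uri=True)
            try:
                conn.execute("select * from sqlite_master")
            finally:
                conn.close()
            return True
        except sqlite3.DatabaseError:
            return False

    def scan_forever(directory, interval=10):
        # Re-scan every `interval` seconds; 0 means scan just once on startup
        while True:
            for name in os.listdir(directory):
                path = os.path.join(directory, name)
                if os.path.isfile(path) and is_usable_sqlite(path):
                    print("usable database:", path)
            if not interval:
                break
            time.sleep(interval)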

simonw (Owner, Author) commented Feb 14, 2020

I'm going to add two methods to the Datasette class to help support this work (and to enable exciting new plugin opportunities in the future):

  • datasette.add_database(name, db) - adds a new named database to the list of connected databases. db will be a Database() object, which may prove useful in the future for things like Prototype for Datasette on PostgreSQL #670 and could also allow some plugins to provide in-memory SQLite databases.
  • datasette.remove_database(name)
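A sketch of how a plugin might call these methods, using the signatures proposed above (the attach/detach helpers, database name, and path are hypothetical):

    from datasette.database import Database

    def attach(datasette, name, path):
        # Wrap the file in a Database() object and register it under `name`
        datasette.add_database(name, Database(datasette, path=path))

    def detach(datasette, name):
        datasette.remove_database(name)

    # e.g. attach(datasette, "scraped", "/path/to/mydbs/scraped.db")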

simonw (Owner, Author) commented Feb 14, 2020

MVP for this feature: just do it once on startup, don't scan for new files every X seconds.

simonw (Owner, Author) commented Feb 14, 2020

I'm going to move this over to a draft pull request.

psychemedia (Contributor) commented Feb 15, 2020

So could the polling support also allow you to call sqlite_utils to update a database with csv files? (Though I'm guessing you would only want to handle changed files? Do your scrapers check and cache csv datestamps/hashes?)
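One way to handle "only changed files", sketched here with a content hash cached per path (the has_changed helper is an assumption; comparing datestamps via mtime would be a cheaper alternative):

    import hashlib
    from pathlib import Path

    _seen = {}

    def has_changed(path):
        # Re-import only when the file's content digest differs
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        if _seen.get(path) == digest:
            return False
        _seen[path] = digest
        return True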

dyllan-to-you commented

Instead of scanning the directory every 10s, have you considered listening for the native system events to notify you of updates?

I think Python has a nice module called watchdog that does this for you.
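A minimal watchdog sketch of that approach (the watched patterns and directory are illustrative):

    import time
    from watchdog.observers import Observer
    from watchdog.events import PatternMatchingEventHandler

    class LibraryEventHandler(PatternMatchingEventHandler):
        def __init__(self):
            super().__init__(patterns=["*.db", "*.sqlite", "*.csv"])

        def on_created(self, event):
            print("new file:", event.src_path)

        def on_modified(self, event):
            print("changed file:", event.src_path)

    observer = Observer()
    observer.schedule(LibraryEventHandler(), "/path/to/mydbs/", recursive=True)
    observer.start()
    try:
        while True:
            time.sleep(1)
    finally:
        observer.stop()
        observer.join()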

simonw (Owner, Author) commented Dec 24, 2020

That's a great idea. I'd ruled that out because working with each operating system's different native event APIs is tricky, but if watchdog can handle those differences for me this could be a really good option.

drewda commented Dec 27, 2020

Very much looking forward to seeing this functionality come together. This is probably out of scope for an initial release, but in the future it could be useful to also think about how to run this in a containerized context. For example, an immutable datasette container that points to an S3 bucket of SQLite DBs or CSVs. Or an immutable datasette container pointing to an NFS volume elsewhere on a Kubernetes cluster.

psychemedia (Contributor) commented Dec 29, 2020

FWIW, I had a look at watchdog for a datasette powered Jupyter notebook search tool: https://github.com/ouseful-testing/nbsearch/blob/main/nbsearch/nbwatchdog.py

Not a production thing, just an experiment trying to explore what might be possible...
