Possible solutions to the concurrency problems #1179

alinnert · 2023-07-16T15:44:06Z

I am aware that this library can cause problems when being used in different goroutines. These days I was thinking about how this could be solved. I think a good idea would be to use a dedicated goroutine that handles all database related work. It would be similar to a task queue since you could send a query to execute into it and wait for a result to come back, but with support for a single task runner only.

Since channels have a limited capacity a direct connection wouldn't work because the channel can overflow pretty easily. So the idea is to use a task quere separate from the actual worker. Like this:

sequenceDiagram
    participant Source A
    participant Source B
    Source A->>Task Queue: Send task A
    Source B->>Task Queue: Send task B
    Task Queue->>DB Worker: Send task A
    activate DB Worker
    DB Worker->>Task Queue: Send result A
    deactivate DB Worker
    Task Queue->>Source A: Send result A
    Task Queue->>DB Worker: Send task B
    activate DB Worker
    DB Worker->>Task Queue: Send result B
    deactivate DB Worker
    Task Queue->>Source B: Send result B

I'm currently not very experienced with using goroutines and channels, so I've found a possible implementation here: https://reintech.io/blog/implementing-distributed-task-queue-go

But it contains a very curious piece of code:

If you're interested

for {
    select {
    case task := <-tq.enqueue:
        tq.tasks = append(tq.tasks, task)
    case tq.dequeue <- tq.tasks[0]:
        tq.tasks = tq.tasks[1:]
    }
}

The second case seems to wait until tq.tasks contains an item that can be sent to tq.dequeue. According to a Go developer I know this behavior is not well defined in the language spec and might break or change in a future release. So, I'm keeping my fingers from that for now.

So, my questions are: a) does anybody know how to implement this or how to improve the implementation above? And b) (that's why I'm opening an issue here) where is the best place for this? It could be done in the individual projects. But since this seems to be a big problem amongst many users couldn't it be part of go-sqlite3 itself (maybe as a public wrapper or helper function or as an internal implementation detail)? Or maybe a separate wrapper/helper library? In the case of this being custom code could a solution like this be added to the documentation (or linked to one) so everyone working with concurrency has an easier starting point?

What do you think about this in general? Are there problems to this approach I don't see yet?

The text was updated successfully, but these errors were encountered:

rittneje · 2023-07-17T00:15:50Z

I am aware that this library can cause problems when being used in different goroutines.

Where did you hear that?

alinnert · 2023-07-17T08:39:32Z

I mean when you want to write to the database from more than one goroutine. See docs:

Can I use this in multiple routines concurrently?

Yes for readonly. But not for writable. See #50, #51, #209, #274.

Although, I've seen issues that people sometimes have problems with reading too, like in this comment.

rittneje · 2023-07-17T12:53:08Z

Unfortunately, the README is a little misleading. This library is already safe for concurrent use. However, there are a few caveats that need to be acknowledged.

SQLite will support any number of concurrent reads under any mode. It does not support a concurrent read and write to the same database file under the default rollback journal mode, but this is supported under WAL mode. It does not support concurrent writes to the same database file under any mode. It is generally recommended to use WAL. Please note that "does not support" means that it will perform these operations serially, with a wait/retry period defined by your busy handler (or the busy_timeout pragma). If the wait period expires before the operation can proceed, then you will get SQLITE_BUSY (aka "database is locked").

You may have come across some suggestions to use shared cache mode to avoid SQLITE_BUSY. Do not follow these suggestions. The SQLite authors themselves discourage using this mode under any circumstance. If you use it, it may appear to resolve the issue, but then you can start to get the similar-looking SQLITE_LOCKED (aka "database table is locked").

If your code is going to read and write to the database as part of the same operation, you must use a single transaction for the whole thing. Otherwise, because of how sql.DB works, you can run into issues where the in-progress read on one connection blocks the attempted write on the other connection.

Whenever you start a transaction that might write to the database, always use BEGIN IMMEDIATE, not the default (BEGIN DEFERRED). Otherwise, if another connection also writes to the database in the middle of your transaction, you can get SQLITE_BUSY without your busy handler even being triggered, due to SQLite's need to enforce the ACID requirements of a database engine. If your transaction will definitely only read, then you should continue to use BEGIN DEFERRED in order to allow concurrent reads to function.

My general recommendation is to have two connection pools (i.e., sql.DB instances). One should be read-write, configured to use BEGIN IMMEDIATE, and throttled to a single connection via SetMaxOpenConns, SetMaxIdleConns, SetConnMaxLifetime, and SetConnMaxIdleTime. The other should be read-only, configured to use BEGIN DEFERRED (which is the default), and can be throttled to any number of connections within the limits of your system.

alinnert · 2023-07-18T09:04:58Z

Thanks a lot for this detailed explanation. Currently I don't know much about the technical details of SQLite. But using this I now know what to look for. So, it's very helpful and could actually be part of the documentation, or linked to from the docs or the FAQ I've mentioned. 👍

Please note that "does not support" means that it will perform these operations serially

Yes, this is really a big confusing part because I expect every RDBMS to work that way (at least to some extend) because of ACID.

I will close this issue now because my original intent doesn't really make any sense now.

shivabohemian · 2023-09-18T09:13:35Z

My general recommendation is to have two connection pools (i.e., sql.DB instances). One should be read-write, configured to use BEGIN IMMEDIATE, and throttled to a single connection via SetMaxOpenConns, SetMaxIdleConns, SetConnMaxLifetime, and SetConnMaxIdleTime. The other should be read-only, configured to use BEGIN DEFERRED (which is the default), and can be throttled to any number of connections within the limits of your system.

Thank you so much! It's a very helpful recommendation. Do I also need to set both connection pools to WAL mode? @rittneje

rittneje · 2023-09-18T11:18:58Z

@shivabohemian I would generally recommend using WAL when possible, but you don't have to. Please note that a read-only connection will not be able to convert the database to WAL mode, so be sure to open the read-write connection first. (This also means there is little point in attempting to set the journal mode to WAL via the read-only connections.) Also note that since sql.DB lazily opens connections, you will need to call Ping on the read-write pool to force this to happen.

shivabohemian · 2023-09-18T12:51:16Z

@shivabohemian I would generally recommend using WAL when possible, but you don't have to. Please note that a read-only connection will not be able to convert the database to WAL mode, so be sure to open the read-write connection first. (This also means there is little point in attempting to set the journal mode to WAL via the read-only connections.) Also note that since sql.DB lazily opens connections, you will need to call Ping on the read-write pool to force this to happen.

Got it. Thank you for your notes and it's very useful.

grifx · 2024-01-16T04:49:50Z

Is it possible / recommended to do the following with multiple single-tenant dbs:

0 or 1 read-write connection using the WAL handling serialised reads and writes
0 or many read-only connections leveraging SharedCache

You may have come across some suggestions to use shared cache mode to avoid SQLITE_BUSY. Do not follow these suggestions. The SQLite authors themselves discourage using this mode under any circumstance. If you use it, it may appear to resolve the issue, but then you can start to get the similar-looking SQLITE_LOCKED (aka "database table is locked").

@rittneje Is there a clear explanation why this error is happening? Under which scenarios does it happen? Can the issue be circumvented?

rittneje · 2024-01-16T05:30:38Z

Shared cache mode is not recommended for use at all.

As far as SQLITE_LOCKED goes, see https://www.sqlite.org/unlock_notify.html for details. Note this requires building your application with the sqlite_unlock_notify build tag, which I would recommend against doing.

grifx · 2024-01-16T05:56:33Z

Shared cache mode is not recommended for use at all.

Why would shared cache be an issue with read-only connections?

infogulch · 2024-02-26T22:20:33Z

My general recommendation is to have two connection pools (i.e., sql.DB instances). One should be read-write, configured to use BEGIN IMMEDIATE, and throttled to a single connection via SetMaxOpenConns, SetMaxIdleConns, SetConnMaxLifetime, and SetConnMaxIdleTime. The other should be read-only, configured to use BEGIN DEFERRED (which is the default), and can be throttled to any number of connections within the limits of your system.

Could mattn/go-sqlite3 be wrapped by a new driver.Driver that implements this strategy and exposes it so users can get the benefits of using sqlite correctly without having to change how they manage their sql.DB?

Lets call this hypothetical package wrapped-sqlite3. When you sql.Open("wrapped-sqlite3", "...") then this package opens two sqlite3 sql.DBs, one for shared read-only queries and one for read-write throttled to a single connection as described above. When a BeginTx is called with TxOptions { ReadOnly: true } then the connection is routed to the read-only instance, but if TxOptions { ReadOnly: false } then the connection is routed to the read-write instance.

rittneje · 2024-02-27T00:10:51Z

A driver needs to implement the database/sql/driver APIs, which underpin sql.DB. Thus I think trying to implement a driver on top of sql.DB would be confusing, as you'd end up with two layers of pools.

rittneje · 2024-02-27T00:16:12Z

Another issue with that is there is no way in general to handle any non-transactional access. For example, suppose the application code calls db.Query(...). Should this hypothetical driver route that to the read-only pool or the read-write pool? Keep in mind that not all calls to Query are read-only, especially now that we have RETURNING clauses in SQLite.

infogulch · 2024-03-12T07:58:45Z

Keep in mind that not all calls to Query are read-only, especially now that we have RETURNING clauses in SQLite.

I see, I was not aware that you can use the Query statement to perform an update. This fact seems to preclude any attempt to multiplex multi-reader/single-writer connections automatically behind a single sql.DB, regardless of where you try to fit it.

It seems that optimal usage of SQLite in Go significantly more complicated than the default/typical usage of sql.DB intended by the language. The recommended configuration with separate read/write connection pools seems like a solid solution to the problem, it's unfortunate that this solution can't fit into Go's sql.DB-shaped hole.

alinnert closed this as completed Jul 18, 2023

willemschots mentioned this issue Mar 25, 2024

4: Seperate read and write instances for sqlite willemschots/househunt#19

Merged

darthShadow mentioned this issue Oct 12, 2024

Refactor library.db into jellyfin.db and EFCore jellyfin/jellyfin#12798

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Possible solutions to the concurrency problems #1179

Possible solutions to the concurrency problems #1179

alinnert commented Jul 16, 2023

rittneje commented Jul 17, 2023 •

edited

Loading

alinnert commented Jul 17, 2023

rittneje commented Jul 17, 2023

alinnert commented Jul 18, 2023

shivabohemian commented Sep 18, 2023

rittneje commented Sep 18, 2023

shivabohemian commented Sep 18, 2023

grifx commented Jan 16, 2024

rittneje commented Jan 16, 2024

grifx commented Jan 16, 2024

infogulch commented Feb 26, 2024

rittneje commented Feb 27, 2024

rittneje commented Feb 27, 2024

infogulch commented Mar 12, 2024

Possible solutions to the concurrency problems #1179

Possible solutions to the concurrency problems #1179

Comments

alinnert commented Jul 16, 2023

rittneje commented Jul 17, 2023 • edited Loading

alinnert commented Jul 17, 2023

rittneje commented Jul 17, 2023

alinnert commented Jul 18, 2023

shivabohemian commented Sep 18, 2023

rittneje commented Sep 18, 2023

shivabohemian commented Sep 18, 2023

grifx commented Jan 16, 2024

rittneje commented Jan 16, 2024

grifx commented Jan 16, 2024

infogulch commented Feb 26, 2024

rittneje commented Feb 27, 2024

rittneje commented Feb 27, 2024

infogulch commented Mar 12, 2024

rittneje commented Jul 17, 2023 •

edited

Loading