Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add lock to registering instrumented drivers #722

Merged
merged 1 commit into from
Jun 1, 2022

Conversation

zepatrik
Copy link
Contributor

@zepatrik zepatrik commented May 27, 2022

Fixes

panic: sql: Register called twice for driver instrumented-sql-driver-sqlite3 [recovered]
        panic: sql: Register called twice for driver instrumented-sql-driver-sqlite3

goroutine 15 [running]:
testing.tRunner.func1.2({0x1837d40, 0xc000473b70})
        /usr/lib/go/src/testing/testing.go:1389 +0x366
testing.tRunner.func1()
        /usr/lib/go/src/testing/testing.go:1392 +0x5d2
panic({0x1837d40, 0xc000473b70})
        /usr/lib/go/src/runtime/panic.go:844 +0x258
database/sql.Register({0x1a43574, 0x1f}, {0x1de6f60?, 0xc00046fb00})
        /usr/lib/go/src/database/sql/sql.go:51 +0x1d4
github.com/gobuffalo/pop/v6.instrumentDriver(0xc0002e63c0, {0x1a1f3bd, 0x7})
        /home/patrik/git/go/pkg/mod/github.com/gobuffalo/pop/[email protected]/connection_instrumented.go:68 +0x3e5
github.com/gobuffalo/pop/v6.openPotentiallyInstrumentedConnection({0x1df80b0, 0xc00000f7b8}, {0xc000182780, 0x5b})
        /home/patrik/git/go/pkg/mod/github.com/gobuffalo/pop/[email protected]/connection_instrumented.go:81 +0x9a
github.com/gobuffalo/pop/v6.(*Connection).Open(0xc000455500)
        /home/patrik/git/go/pkg/mod/github.com/gobuffalo/pop/[email protected]/connection.go:112 +0x11a

This problem occurred during parallel tests.

The race condition happened because the reading of the drivers and the registration of the not-found driver can have some delay, during which another go routine could already have registered the driver. Because the registration is happening as a side-effect, this cannot be fixed by the client.

@paganotoni
Copy link
Member

@zepatrik is there any chance we add a test for this one 🥹? Thanks in advance.

@zepatrik
Copy link
Contributor Author

How could we test for such a race condition? 🤔

@paganotoni
Copy link
Member

Good question. Not sure, maybe -race helps? Maybe running parallel tests?

@zepatrik
Copy link
Contributor Author

Hm but I also did not get the panic all the time before the fix, only on some occasions. I guess the only way is to inject some waiting behavior that we could then control in the test. I will try to come up with something in the next days.

@paganotoni
Copy link
Member

paganotoni commented May 27, 2022

Got it. Lets give it a shot, I don't think that's a huge blocker for merging this change but want to at least try this so we leave this covered for any future change. thanks in advance @zepatrik

@zepatrik
Copy link
Contributor Author

zepatrik commented May 30, 2022

I tried for a bit yesterday and could not come up with any useful test. Do you maybe have a proposal? My approach involved channels and stopping execution at the critical point in two go-routines, but as the lock prevented me from getting to that point I could only make it flaky without the lock, or a dead-lock with the lock.

@sio4
Copy link
Member

sio4 commented May 30, 2022

By the way, pop's test workflow already has -race option so the issue could be one that cannot be detected within pop. Plus, the error message sql: Register called twice for driver instrumented-sql-driver-sqlite3 tell us the function should be called just once (maybe per driver?), so I guess locking the function call could be helpful if we have some code to prevent the second call but in my rough looking at the patch, I am not sure this locking could help the situation.

If the issue itself is not clear, the approach for fixing the issue also will not be clear. So I hope we can clarify the issue first before patching it. Actually, I am not familiar enough with pop nowadays but please let me take a look at the situation before moving forward. (Also, it could be fine if you give some more idea on the issue and solution)

connection_instrumented.go Show resolved Hide resolved
@zepatrik
Copy link
Contributor Author

I think the new test failure is because we cannot re-register the instrumented driver with different options... Would you like to have that test or not? I think I will have to make it part of the already existing test to work.

@sio4
Copy link
Member

sio4 commented May 30, 2022

I don't think adding a channel to the function just for testing is a good thing. If the logic is good enough but is hard to test, it could be fine without a test.

I just wonder, in the first place, if we can avoid this situation (calling this call twice from goroutines even though it looks like a kind of initialization).

By the way, I think your previous version (w/o a test) could be acceptable. (I mean that is logically fine)

@zepatrik
Copy link
Contributor Author

I just wonder, in the first place, if we can avoid this situation (calling this call twice from goroutines even though it looks like a kind of initialization).

Unfortunately not. The sql.RegisterDriver is meant to be called in an init func, so it is a side-effect of importing a driver. There is no atomic way to check whether a driver was already registered.

@zepatrik
Copy link
Contributor Author

So @paganotoni, do you want to keep the test or not? I would just remove the commit again and we can merge once that is decided.

@sio4
Copy link
Member

sio4 commented May 31, 2022

First, the test with an additional channel argument (just for testing) should not be added.

By the way, the only meaningful place that calls the function openPotentiallyInstrumentedConnection() is the following code (the rest of the calls are from utility functions like CreateDB()):

pop/connection.go

Lines 103 to 113 in d146f1b

// Open creates a new datasource connection
func (c *Connection) Open() error {
if c.Store != nil {
return nil
}
if c.Dialect == nil {
return errors.New("invalid connection instance")
}
details := c.Dialect.Details()
db, err := openPotentiallyInstrumentedConnection(c.Dialect, c.Dialect.URL())

So it means the Connection.Open() was called twice which is somewhat odd to me. Connection.Open() is only called from pop.Connect(env) at least within the pop library and that is the preferred usage. I wonder if the issue was basically caused by the usage of pop which is not the same as its original design. However, once the Connection.Open() is exposed as a public function, we cannot force users not to use them so I agree we need to fix this.

However, even though we prevented the second call of sql.Register() with the lock, still the code flows to the following lines,

con, err := sql.Open(driverName, dsn)
if err != nil {
return nil, fmt.Errorf("could not open database connection: %w", err)
}
return sqlx.NewDb(con, dialect), nil

and the function openPotentiallyInstrumentedConnection() will call sql.Open() and the rest of code once again for the same connection. So IMO, the situation we need to prevent is calling Connection.Open() twice. Maybe, we need to investigate some more about the situation to make the code solid.

(What I wondered from my previous comment is not about Register but why the function Connection.Open() is called twice. I don't think this issue is related to the init since the code register the driver with a custom name and it could not conflict with the original driver name.)

@zepatrik
Copy link
Contributor Author

zepatrik commented May 31, 2022

The point is, we test our persistence layer against all supported databases (postgres, cockroach, mysql, sqlite-in-memory, sqlite-file):
https://github.com/ory/keto/blob/6b409141ec00f20ab3d5c5cdd2dd03f696751d1b/internal/persistence/sql/relationtuples_test.go#L41-L51

This of course means that we create a new connection per database. As the tests run parallel, it can happen that two Connection.Open calls happen at the same time. However, the underlying connection is not the same, but instead we open four connections to different databases at the same time.

First, the test with an additional channel argument (just for testing) should not be added.

Ok sure, I'll remove it.

@zepatrik zepatrik force-pushed the fix/registering-driver branch from ee31dce to b405df3 Compare May 31, 2022 12:19
@zepatrik
Copy link
Contributor Author

For documentation purposes, I will keep the test on this branch: https://github.com/zepatrik/pop/commits/fix/registering-driver-2

@sio4
Copy link
Member

sio4 commented May 31, 2022

... all supported databases (postgres, cockroach, mysql, sqlite-in-memory, sqlite-file):

This of course means that we create a new connection per database. As the tests run parallel, it can happen that two Connection.Open calls happen at the same time...

Ah, got it. Now the issue is clear. That is the reason of the error message was ...for driver instrumented-sql-driver-sqlite3 since you registered sqlite3 twice one for a file and another for the memory. Then locking per driver as the PR does will break your job since those two will use the same connection. Is it correct?

Then we need to find another approach here. (Maybe using the env as a name? Just a rough idea) Let me take a look at the code once I come back home.

@sio4
Copy link
Member

sio4 commented May 31, 2022

... Then locking per driver as the PR does will break your job since those two will use the same connection. Is it correct?

Maybe not the same connection but just the same driver and instrumented configs. I am AFK now, let me check it soon.

@zepatrik
Copy link
Contributor Author

We don't lock per driver or connection here, but the lock is a global variable. Therefore it will ensure no goroutine reads the list of drivers while another adds to the list of drivers. I searched the whole codebase and this is the only usage of sql.Register and sql.Drivers.

Changing the driver name would also fix it, as that would mean we don't create a conflict anymore.

@sio4
Copy link
Member

sio4 commented May 31, 2022

We don't lock per driver or connection here, but the lock is a global variable. Therefore it will ensure no goroutine reads the list of drivers while another adds to the list of drivers. I searched the whole codebase and this is the only usage of sql.Register and sql.Drivers.

Yeah, my explanation on my mobile keyboard with my poor English was not correct. My bad.

What I wanted to say is that registering an instrumented wrapper per driver safely by (global) locking will not work as intended since the deets.InstrumentedDriverOptions is per connection but the registered wrapper is currently per driver. So regardless of this PR, even if the calls are successful without panic, the options of the connection that successfully registered its driver will be used for the other connection which calls the instrumentDriver() later even though they could have different instrumented driver options. (so this is not an issue of your PR but the existing issue of the function itself when the application has multiple connections with the same driver but different options.)

For example with your explanation, you have five connections for postgres, cockroach, mysql, sqlite-in-memory, and sqlite-file, but the code will register four wrapped drivers for postgres, cockroach, mysql, sqlite even if sqlite-in-memory and sqlite-file has different configurations.

In short, since the function instrumentDriver() works based on the configuration of the ConnectionDetails, the wrapped driver to be registered should be per ConnectionDetails rather than per driver.

However, ConnectionDetails has no information about the connection name so we need to figure out what could be the best solution for this. (Maybe randomizing could be one option, another is adding the connection name to the ConnectionDetails)

By the way, even though this behavior is the root of the real issue, your PR could be fine and it will make the code safer in my opinion. Let's merge it first and find a good solution for the naming.

@paganotoni what do you think? Do you have any concerns about it?

@sio4 sio4 added the bug Something isn't working label May 31, 2022
@zepatrik
Copy link
Contributor Author

the options of the connection that successfully registered its driver will be used for the other connection which calls the instrumentDriver() later even though they could have different instrumented driver options

Correct, I did not bother about that yet. But definitely a valid point.

Copy link
Member

@sio4 sio4 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's merge it for now.

I will take care of the following action items:

@sio4 sio4 assigned sio4 and zepatrik Jun 1, 2022
@sio4 sio4 merged commit 3afc531 into gobuffalo:development Jun 1, 2022
@zepatrik zepatrik deleted the fix/registering-driver branch June 1, 2022 07:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants