fix: add lock to registering instrumented drivers #722

zepatrik · 2022-05-27T14:53:43Z

Fixes

panic: sql: Register called twice for driver instrumented-sql-driver-sqlite3 [recovered]
        panic: sql: Register called twice for driver instrumented-sql-driver-sqlite3

goroutine 15 [running]:
testing.tRunner.func1.2({0x1837d40, 0xc000473b70})
        /usr/lib/go/src/testing/testing.go:1389 +0x366
testing.tRunner.func1()
        /usr/lib/go/src/testing/testing.go:1392 +0x5d2
panic({0x1837d40, 0xc000473b70})
        /usr/lib/go/src/runtime/panic.go:844 +0x258
database/sql.Register({0x1a43574, 0x1f}, {0x1de6f60?, 0xc00046fb00})
        /usr/lib/go/src/database/sql/sql.go:51 +0x1d4
github.com/gobuffalo/pop/v6.instrumentDriver(0xc0002e63c0, {0x1a1f3bd, 0x7})
        /home/patrik/git/go/pkg/mod/github.com/gobuffalo/pop/[email protected]/connection_instrumented.go:68 +0x3e5
github.com/gobuffalo/pop/v6.openPotentiallyInstrumentedConnection({0x1df80b0, 0xc00000f7b8}, {0xc000182780, 0x5b})
        /home/patrik/git/go/pkg/mod/github.com/gobuffalo/pop/[email protected]/connection_instrumented.go:81 +0x9a
github.com/gobuffalo/pop/v6.(*Connection).Open(0xc000455500)
        /home/patrik/git/go/pkg/mod/github.com/gobuffalo/pop/[email protected]/connection.go:112 +0x11a

This problem occurred during parallel tests.

The race condition happened because the reading of the drivers and the registration of the not-found driver can have some delay, during which another go routine could already have registered the driver. Because the registration is happening as a side-effect, this cannot be fixed by the client.

paganotoni · 2022-05-27T15:39:08Z

@zepatrik is there any chance we add a test for this one 🥹? Thanks in advance.

zepatrik · 2022-05-27T16:09:12Z

How could we test for such a race condition? 🤔

paganotoni · 2022-05-27T16:10:30Z

Good question. Not sure, maybe -race helps? Maybe running parallel tests?

zepatrik · 2022-05-27T16:12:23Z

Hm but I also did not get the panic all the time before the fix, only on some occasions. I guess the only way is to inject some waiting behavior that we could then control in the test. I will try to come up with something in the next days.

paganotoni · 2022-05-27T16:14:01Z

Got it. Lets give it a shot, I don't think that's a huge blocker for merging this change but want to at least try this so we leave this covered for any future change. thanks in advance @zepatrik

zepatrik · 2022-05-30T08:21:33Z

I tried for a bit yesterday and could not come up with any useful test. Do you maybe have a proposal? My approach involved channels and stopping execution at the critical point in two go-routines, but as the lock prevented me from getting to that point I could only make it flaky without the lock, or a dead-lock with the lock.

sio4 · 2022-05-30T10:42:23Z

By the way, pop's test workflow already has -race option so the issue could be one that cannot be detected within pop. Plus, the error message sql: Register called twice for driver instrumented-sql-driver-sqlite3 tell us the function should be called just once (maybe per driver?), so I guess locking the function call could be helpful if we have some code to prevent the second call but in my rough looking at the patch, I am not sure this locking could help the situation.

If the issue itself is not clear, the approach for fixing the issue also will not be clear. So I hope we can clarify the issue first before patching it. Actually, I am not familiar enough with pop nowadays but please let me take a look at the situation before moving forward. (Also, it could be fine if you give some more idea on the issue and solution)

connection_instrumented.go

zepatrik · 2022-05-30T11:39:07Z

I think the new test failure is because we cannot re-register the instrumented driver with different options... Would you like to have that test or not? I think I will have to make it part of the already existing test to work.

sio4 · 2022-05-30T11:54:27Z

I don't think adding a channel to the function just for testing is a good thing. If the logic is good enough but is hard to test, it could be fine without a test.

I just wonder, in the first place, if we can avoid this situation (calling this call twice from goroutines even though it looks like a kind of initialization).

By the way, I think your previous version (w/o a test) could be acceptable. (I mean that is logically fine)

zepatrik · 2022-05-30T11:56:47Z

I just wonder, in the first place, if we can avoid this situation (calling this call twice from goroutines even though it looks like a kind of initialization).

Unfortunately not. The sql.RegisterDriver is meant to be called in an init func, so it is a side-effect of importing a driver. There is no atomic way to check whether a driver was already registered.

zepatrik · 2022-05-31T10:03:31Z

So @paganotoni, do you want to keep the test or not? I would just remove the commit again and we can merge once that is decided.

sio4 · 2022-05-31T11:49:52Z

First, the test with an additional channel argument (just for testing) should not be added.

By the way, the only meaningful place that calls the function openPotentiallyInstrumentedConnection() is the following code (the rest of the calls are from utility functions like CreateDB()):

pop/connection.go

Lines 103 to 113 in d146f1b

    
           // Open creates a new datasource connection 
        
           func (c *Connection) Open() error { 
        
           	if c.Store != nil { 
        
           		return nil 
        
           	} 
        
           	if c.Dialect == nil { 
        
           		return errors.New("invalid connection instance") 
        
           	} 
        
           	details := c.Dialect.Details() 
        
           	db, err := openPotentiallyInstrumentedConnection(c.Dialect, c.Dialect.URL())

So it means the Connection.Open() was called twice which is somewhat odd to me. Connection.Open() is only called from pop.Connect(env) at least within the pop library and that is the preferred usage. I wonder if the issue was basically caused by the usage of pop which is not the same as its original design. However, once the Connection.Open() is exposed as a public function, we cannot force users not to use them so I agree we need to fix this.

However, even though we prevented the second call of sql.Register() with the lock, still the code flows to the following lines,

pop/connection_instrumented.go

Lines 86 to 91 in d146f1b

    
           con, err := sql.Open(driverName, dsn) 
        
           if err != nil { 
        
           	return nil, fmt.Errorf("could not open database connection: %w", err) 
        
           } 
        
           return sqlx.NewDb(con, dialect), nil

and the function openPotentiallyInstrumentedConnection() will call sql.Open() and the rest of code once again for the same connection. So IMO, the situation we need to prevent is calling Connection.Open() twice. Maybe, we need to investigate some more about the situation to make the code solid.

(What I wondered from my previous comment is not about Register but why the function Connection.Open() is called twice. I don't think this issue is related to the init since the code register the driver with a custom name and it could not conflict with the original driver name.)

zepatrik · 2022-05-31T12:17:20Z

The point is, we test our persistence layer against all supported databases (postgres, cockroach, mysql, sqlite-in-memory, sqlite-file):
https://github.com/ory/keto/blob/6b409141ec00f20ab3d5c5cdd2dd03f696751d1b/internal/persistence/sql/relationtuples_test.go#L41-L51

This of course means that we create a new connection per database. As the tests run parallel, it can happen that two Connection.Open calls happen at the same time. However, the underlying connection is not the same, but instead we open four connections to different databases at the same time.

First, the test with an additional channel argument (just for testing) should not be added.

Ok sure, I'll remove it.

zepatrik · 2022-05-31T12:20:58Z

For documentation purposes, I will keep the test on this branch: https://github.com/zepatrik/pop/commits/fix/registering-driver-2

sio4 · 2022-05-31T12:53:08Z

... all supported databases (postgres, cockroach, mysql, sqlite-in-memory, sqlite-file):

This of course means that we create a new connection per database. As the tests run parallel, it can happen that two Connection.Open calls happen at the same time...

Ah, got it. Now the issue is clear. That is the reason of the error message was ...for driver instrumented-sql-driver-sqlite3 since you registered sqlite3 twice one for a file and another for the memory. Then locking per driver as the PR does will break your job since those two will use the same connection. Is it correct?

Then we need to find another approach here. (Maybe using the env as a name? Just a rough idea) Let me take a look at the code once I come back home.

sio4 · 2022-05-31T13:01:14Z

... Then locking per driver as the PR does will break your job since those two will use the same connection. Is it correct?

Maybe not the same connection but just the same driver and instrumented configs. I am AFK now, let me check it soon.

zepatrik · 2022-05-31T13:02:49Z

We don't lock per driver or connection here, but the lock is a global variable. Therefore it will ensure no goroutine reads the list of drivers while another adds to the list of drivers. I searched the whole codebase and this is the only usage of sql.Register and sql.Drivers.

Changing the driver name would also fix it, as that would mean we don't create a conflict anymore.

sio4 · 2022-05-31T14:33:00Z

We don't lock per driver or connection here, but the lock is a global variable. Therefore it will ensure no goroutine reads the list of drivers while another adds to the list of drivers. I searched the whole codebase and this is the only usage of sql.Register and sql.Drivers.

Yeah, my explanation on my mobile keyboard with my poor English was not correct. My bad.

What I wanted to say is that registering an instrumented wrapper per driver safely by (global) locking will not work as intended since the deets.InstrumentedDriverOptions is per connection but the registered wrapper is currently per driver. So regardless of this PR, even if the calls are successful without panic, the options of the connection that successfully registered its driver will be used for the other connection which calls the instrumentDriver() later even though they could have different instrumented driver options. (so this is not an issue of your PR but the existing issue of the function itself when the application has multiple connections with the same driver but different options.)

For example with your explanation, you have five connections for postgres, cockroach, mysql, sqlite-in-memory, and sqlite-file, but the code will register four wrapped drivers for postgres, cockroach, mysql, sqlite even if sqlite-in-memory and sqlite-file has different configurations.

In short, since the function instrumentDriver() works based on the configuration of the ConnectionDetails, the wrapped driver to be registered should be per ConnectionDetails rather than per driver.

However, ConnectionDetails has no information about the connection name so we need to figure out what could be the best solution for this. (Maybe randomizing could be one option, another is adding the connection name to the ConnectionDetails)

By the way, even though this behavior is the root of the real issue, your PR could be fine and it will make the code safer in my opinion. Let's merge it first and find a good solution for the naming.

@paganotoni what do you think? Do you have any concerns about it?

zepatrik · 2022-05-31T14:54:33Z

the options of the connection that successfully registered its driver will be used for the other connection which calls the instrumentDriver() later even though they could have different instrumented driver options

Correct, I did not bother about that yet. But definitely a valid point.

sio4

Let's merge it for now.

I will take care of the following action items:

consider how to correct the instrumentDriver() to register custom driver "per connection" if it is really a good thing.
along with Support SQL instrumentation with Open Census, Open Tracing, AWS Xray, Google Stack Driver #599

fix: add lock to registering instrumented drivers

b405df3

sio4 requested changes May 30, 2022

View reviewed changes

connection_instrumented.go Show resolved Hide resolved

zepatrik force-pushed the fix/registering-driver branch from ee31dce to b405df3 Compare May 31, 2022 12:19

sio4 added the bug Something isn't working label May 31, 2022

sio4 approved these changes Jun 1, 2022

View reviewed changes

sio4 assigned sio4 and zepatrik Jun 1, 2022

sio4 merged commit 3afc531 into gobuffalo:development Jun 1, 2022

sio4 mentioned this pull request Jun 1, 2022

instrumentDriver() register wrapped driver per driver instead of per connection #725

Open

zepatrik deleted the fix/registering-driver branch June 1, 2022 07:47

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: add lock to registering instrumented drivers #722

fix: add lock to registering instrumented drivers #722

zepatrik commented May 27, 2022 •

edited

Loading

paganotoni commented May 27, 2022

zepatrik commented May 27, 2022

paganotoni commented May 27, 2022

zepatrik commented May 27, 2022

paganotoni commented May 27, 2022 •

edited

Loading

zepatrik commented May 30, 2022 •

edited

Loading

sio4 commented May 30, 2022

zepatrik commented May 30, 2022

sio4 commented May 30, 2022 •

edited

Loading

zepatrik commented May 30, 2022

zepatrik commented May 31, 2022

sio4 commented May 31, 2022

zepatrik commented May 31, 2022 •

edited

Loading

zepatrik commented May 31, 2022

sio4 commented May 31, 2022

sio4 commented May 31, 2022

zepatrik commented May 31, 2022

sio4 commented May 31, 2022 •

edited

Loading

zepatrik commented May 31, 2022

sio4 left a comment

fix: add lock to registering instrumented drivers #722

fix: add lock to registering instrumented drivers #722

Conversation

zepatrik commented May 27, 2022 • edited Loading

paganotoni commented May 27, 2022

zepatrik commented May 27, 2022

paganotoni commented May 27, 2022

zepatrik commented May 27, 2022

paganotoni commented May 27, 2022 • edited Loading

zepatrik commented May 30, 2022 • edited Loading

sio4 commented May 30, 2022

zepatrik commented May 30, 2022

sio4 commented May 30, 2022 • edited Loading

zepatrik commented May 30, 2022

zepatrik commented May 31, 2022

sio4 commented May 31, 2022

zepatrik commented May 31, 2022 • edited Loading

zepatrik commented May 31, 2022

sio4 commented May 31, 2022

sio4 commented May 31, 2022

zepatrik commented May 31, 2022

sio4 commented May 31, 2022 • edited Loading

zepatrik commented May 31, 2022

sio4 left a comment

Choose a reason for hiding this comment

zepatrik commented May 27, 2022 •

edited

Loading

paganotoni commented May 27, 2022 •

edited

Loading

zepatrik commented May 30, 2022 •

edited

Loading

sio4 commented May 30, 2022 •

edited

Loading

zepatrik commented May 31, 2022 •

edited

Loading

sio4 commented May 31, 2022 •

edited

Loading