fix: add lock to registering instrumented drivers #722
@zepatrik is there any chance we add a test for this one 🥹? Thanks in advance. |
How could we test for such a race condition? 🤔 |
Good question. Not sure, maybe |
Hm but I also did not get the panic all the time before the fix, only on some occasions. I guess the only way is to inject some waiting behavior that we could then control in the test. I will try to come up with something in the next days. |
Got it. Let's give it a shot. I don't think that's a huge blocker for merging this change, but I want to at least try so we leave this covered for any future change. Thanks in advance @zepatrik |
I tried for a bit yesterday and could not come up with any useful test. Do you maybe have a proposal? My approach involved channels and stopping execution at the critical point in two goroutines, but as the lock prevented me from getting to that point, I could only make it flaky without the lock, or deadlock with the lock. |
By the way, pop's test workflow already has If the issue itself is not clear, the approach for fixing the issue also will not be clear. So I hope we can clarify the issue first before patching it. Actually, I am not familiar enough with pop nowadays but please let me take a look at the situation before moving forward. (Also, it could be fine if you give some more idea on the issue and solution) |
I think the new test failure is because we cannot re-register the instrumented driver with different options... Would you like to have that test or not? I think I will have to make it part of the already existing test to work. |
I don't think adding a channel to the function just for testing is a good thing. If the logic is good enough but is hard to test, it could be fine without a test. I just wonder, in the first place, if we can avoid this situation (calling this call twice from goroutines even though it looks like a kind of initialization). By the way, I think your previous version (w/o a test) could be acceptable. (I mean that is logically fine) |
Unfortunately not. The |
So @paganotoni, do you want to keep the test or not? I would just remove the commit again and we can merge once that is decided. |
First, the test with an additional channel argument (just for testing) should not be added. By the way, the only meaningful place that calls the function Lines 103 to 113 in d146f1b
So it means the However, even though we prevented the second call of pop/connection_instrumented.go Lines 86 to 91 in d146f1b
and the function (What I wondered from my previous comment is not about |
The point is, we test our persistence layer against all supported databases (postgres, cockroach, mysql, sqlite-in-memory, sqlite-file): This of course means that we create a new connection per database. As the tests run in parallel, it can happen that two
Ok sure, I'll remove it. |
ee31dce to b405df3
For documentation purposes, I will keep the test on this branch: https://github.com/zepatrik/pop/commits/fix/registering-driver-2 |
Ah, got it. Now the issue is clear. That is the reason of the error message was Then we need to find another approach here. (Maybe using the env as a name? Just a rough idea) Let me take a look at the code once I come back home. |
Maybe not the same connection but just the same driver and instrumented configs. I am AFK now, let me check it soon. |
We don't lock per driver or connection here, but the lock is a global variable. Therefore it will ensure no goroutine reads the list of drivers while another adds to the list of drivers. I searched the whole codebase and this is the only usage of Changing the driver name would also fix it, as that would mean we don't create a conflict anymore. |
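The global lock described above can be sketched as follows. This is a minimal illustration and not pop's actual code: `stubDriver`, `registerMu`, `registerIfMissing`, and the driver names are hypothetical, and only `sql.Drivers` and `sql.Register` come from the standard library. Because the lookup and the registration happen under the same package-level mutex, only one goroutine can ever find the name missing.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync"
	"sync/atomic"
)

// stubDriver is a hypothetical no-op driver used only for illustration.
type stubDriver struct{}

func (stubDriver) Open(string) (driver.Conn, error) {
	return nil, errors.New("stub driver: not implemented")
}

// registerMu is one package-level lock shared by all callers,
// regardless of which driver or connection they are working on.
var registerMu sync.Mutex

// registerIfMissing checks the driver list and registers the driver
// while holding registerMu. It reports whether this call registered it.
func registerIfMissing(name string) bool {
	registerMu.Lock()
	defer registerMu.Unlock()

	for _, n := range sql.Drivers() {
		if n == name {
			return false
		}
	}
	sql.Register(name, stubDriver{})
	return true
}

// registerConcurrently starts the given number of goroutines that all
// try to register the same name and returns how many of them did so.
func registerConcurrently(name string, workers int) int32 {
	var wg sync.WaitGroup
	var registered int32
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if registerIfMissing(name) {
				atomic.AddInt32(&registered, 1)
			}
		}()
	}
	wg.Wait()
	return registered
}

func main() {
	// Exactly one goroutine registers; none of them panics.
	fmt.Println(registerConcurrently("instrumented-stub", 16)) // prints 1
}
```

Running a program like this with the race detector enabled (`go run -race`) is one way to check that the lookup and the registration no longer overlap.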
Yeah, my explanation on my mobile keyboard with my poor English was not correct. My bad. What I wanted to say is that registering an instrumented wrapper per driver safely by (global) locking will not work as intended since the For example with your explanation, you have five connections for In short, since the function However, By the way, even though this behavior is the root of the real issue, your PR could be fine and it will make the code safer in my opinion. Let's merge it first and find a good solution for the naming. @paganotoni what do you think? Do you have any concerns about it? |
Correct, I did not bother about that yet. But definitely a valid point. |
Let's merge it for now.
I will take care of the following action items:
- consider how to correct instrumentDriver() to register a custom driver "per connection", if that is really a good thing.
- handle this along with "Support SQL instrumentation with Open Census, Open Tracing, AWS Xray, Google Stack Driver" (#599).
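To illustrate why the naming matters, here is a rough sketch under stated assumptions. It is not pop's implementation: `labeledDriver`, `ensureRegistered`, `registeredLabel`, and the name scheme with a connection suffix are all hypothetical. The `label` field stands in for instrumentation options. With one name per driver, a second registration with different options is silently skipped; with one name per connection, each connection keeps its own options.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
	"sync"
)

// labeledDriver is a hypothetical stub; label stands in for the
// instrumentation options a real wrapper would carry.
type labeledDriver struct{ label string }

func (labeledDriver) Open(string) (driver.Conn, error) {
	return nil, errors.New("stub driver: not implemented")
}

var registerMu sync.Mutex

// ensureRegistered registers name with the given label unless the
// name is already taken, in which case the new label is ignored.
func ensureRegistered(name, label string) {
	registerMu.Lock()
	defer registerMu.Unlock()
	for _, n := range sql.Drivers() {
		if n == name {
			return
		}
	}
	sql.Register(name, labeledDriver{label: label})
}

// registeredLabel returns the label of the driver stored under name.
// sql.Open only looks the driver up; it does not open a connection.
func registeredLabel(name string) (string, error) {
	db, err := sql.Open(name, "")
	if err != nil {
		return "", err
	}
	defer db.Close()
	d, ok := db.Driver().(labeledDriver)
	if !ok {
		return "", fmt.Errorf("driver %q is not a labeledDriver", name)
	}
	return d.label, nil
}

func main() {
	// One name per driver: the second set of options is dropped.
	ensureRegistered("instrumented-stub", "options-A")
	ensureRegistered("instrumented-stub", "options-B")
	shared, _ := registeredLabel("instrumented-stub")
	fmt.Println(shared) // prints options-A

	// One name per connection (hypothetical scheme): both survive.
	ensureRegistered("instrumented-stub-conn2", "options-B")
	own, _ := registeredLabel("instrumented-stub-conn2")
	fmt.Println(own) // prints options-B
}
```

Whether a per-connection name is the right design is exactly the open question in the action item; the sketch only shows the behavioral difference between the two schemes.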
Fixes
This problem occurred during parallel tests.
The race condition happened because there can be a delay between reading the list of registered drivers and registering a driver that was not found. During that delay, another goroutine could already have registered the same driver. Because the registration happens as a side effect, this cannot be fixed by the client.
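The interleaving described above can be replayed deterministically in a single goroutine. The sketch below is illustrative only: `stubDriver`, `isRegistered`, `tryRegister`, and `replayRace` are hypothetical helpers, not pop code. It relies on the documented behavior of `sql.Register`, which panics when called twice with the same name. Both simulated callers perform their check before either one registers, so both conclude that the driver is missing.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"errors"
	"fmt"
)

// stubDriver is a hypothetical no-op driver used only for illustration.
type stubDriver struct{}

func (stubDriver) Open(string) (driver.Conn, error) {
	return nil, errors.New("stub driver: not implemented")
}

// isRegistered reports whether database/sql knows a driver by this name.
func isRegistered(name string) bool {
	for _, n := range sql.Drivers() {
		if n == name {
			return true
		}
	}
	return false
}

// tryRegister calls sql.Register and converts its panic into an error.
func tryRegister(name string) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("%v", r)
		}
	}()
	sql.Register(name, stubDriver{})
	return nil
}

// replayRace performs the steps of two callers in the unlucky order:
// both check first, then both register based on their stale result.
func replayRace(name string) (first, second error) {
	aMissing := !isRegistered(name) // caller A: driver not found
	bMissing := !isRegistered(name) // caller B: driver not found either

	if aMissing {
		first = tryRegister(name) // succeeds
	}
	if bMissing {
		second = tryRegister(name) // panics inside sql.Register
	}
	return first, second
}

func main() {
	first, second := replayRace("instrumented-stub")
	fmt.Println("first caller error:", first)
	fmt.Println("second caller error:", second)
}
```

In real parallel tests the same ordering only occurs occasionally, which matches the observation in the thread that the panic did not show up on every run.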