Multithreading diff_tables() for Comparing Many Tables #52

Answered by erezsh
alex-mirkin asked this question in Q&A
It looks like it's happening because somewhere the shared connection gets closed.

I'm a little fuzzy on the exact details (I need to re-read the code), but I'll answer what I can.

When you call connect(), thread_count determines how many threads will communicate with the database.

In diff_tables(), the threadpool size determines how many threads will be used to run the algorithm. Each such thread should occupy at most one database thread at a time. That's where the 2x suggestion comes from: the idea is to have one algorithm thread for each database thread, and there are two databases. (Of course, that only applies to hashdiff between two different connections.)
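
To make the 2x reasoning concrete, here is a rough toy model using only the Python standard library. It does not call data-diff at all: `FakeConnection`, `run_segments`, and the sleep-based "query" are hypothetical stand-ins I made up for illustration. Each fake connection owns `thread_count` worker threads, and the algorithm pool gets `2 * thread_count` threads, one per database thread across the two connections.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a database connection with its own worker threads."""

    def __init__(self, thread_count):
        self._pool = ThreadPoolExecutor(max_workers=thread_count)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0  # highest number of queries observed running at once

    def _run(self, value):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # pretend the database is doing some work
        with self._lock:
            self._active -= 1
        return value

    def query(self, value):
        # Blocks the calling (algorithm) thread until a database thread answers,
        # so one algorithm thread occupies at most one database thread at a time.
        return self._pool.submit(self._run, value).result()

    def close(self):
        self._pool.shutdown()


def run_segments(thread_count, segments):
    """Run `segments` fake queries, alternating between two fake connections."""
    db1 = FakeConnection(thread_count)
    db2 = FakeConnection(thread_count)
    algo_threads = 2 * thread_count  # the "2x" sizing: two databases

    def work(i):
        db = db1 if i % 2 == 0 else db2
        return db.query(i)

    with ThreadPoolExecutor(max_workers=algo_threads) as algo_pool:
        results = list(algo_pool.map(work, range(segments)))

    db1.close()
    db2.close()
    return results, db1.peak, db2.peak


if __name__ == "__main__":
    results, peak1, peak2 = run_segments(thread_count=2, segments=20)
    print(len(results), peak1, peak2)
```

In this model neither connection ever runs more than `thread_count` queries at once, no matter how large the algorithm pool is. Algorithm threads beyond 2x would simply wait on a busy connection, assuming the real library behaves similarly.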

But if you use the same connection…

Answer selected by alex-mirkin