-
Notifications
You must be signed in to change notification settings - Fork 20.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core: implement ChainIndexer #14522
core: implement ChainIndexer #14522
Conversation
I would really really like to see thorough tests on any new stuff added to |
@karalabe you are right, although this is not consensus stuff and could be placed in any package but it deserves a test so I added one. Also, it found problems in some corner cases which I have fixed now :) |
Uhm, @zsfelfoldi you did see that all the tests failed, right? |
@karalabe It looks like that was just timeouts? |
core/chain_indexer.go
Outdated
} | ||
|
||
// ChainIndex interface is a backend for the indexer doing the actual post-processing job | ||
type ChainIndex interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it should be a verb rather than a noun. Maybe ChainIndexerBackend
or somesuch?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I already had a round about namings with @fjl and this is what we settled on but you're probably right. I renamed it to ChainIndexerBackend until a better suggestion comes up :)
core/chain_indexer.go
Outdated
func NewChainIndexer(db ethdb.Database, dbKey []byte, backend ChainIndex, sectionSize, confirmReq uint64, procWait time.Duration, stop chan struct{}) *ChainIndexer { | ||
c := &ChainIndexer{ | ||
db: db, | ||
validSectionsKey: append(dbKey, []byte("-count")...), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use the table interface here, to help avoid the spread of prefixes all over the place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure. I also need the unprefixed chain db so I'm passing two Databases (chainDb, indexDb) but it feels nice because now it's clear what those databases are used for.
core/chain_indexer.go
Outdated
lastSectionHead = c.getSectionHead(c.calcIdx - 1) | ||
} | ||
|
||
c.lock.Unlock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This screams race condition to me. Is there any way to avoid this? Are you sure it's safe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is safe but now that I think about it maybe I don't need so many locks any more since I changed the way newHead and CanonicalSections work. I think I can simplify it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On second thought, I'd leave it like this, see above. Do you see any explicit race conditions? Or are you just concerned about temporarily releasing the lock? I considered the possibility that new head/rollback events can happen while processing and I think the code handles this appropriately (see the comments before processSection).
core/chain_indexer.go
Outdated
case <-c.stop: | ||
return | ||
case <-c.tryUpdate: | ||
c.lock.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we avoid this lock entirely by message passing, so each value is only touched by a single process?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given it some thought but I think it's better to keep it like this. The update loop is a slow loop that blocks while processing sections so I don't want to pass all new head events, only send a "wake up" signal to tryUpdate channel if necessary (when updating flag is false and there is something new to update). Also, it's not ideal to process all new head events sequentially because some new sections might have already been rolled back by the time the update loop gets there.
core/chain_indexer.go
Outdated
|
||
if c.targetCount > c.stored { | ||
go func() { | ||
time.Sleep(c.procWait) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of waiting here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It makes sense when generating an index for a large number of sections. It is a simple but effective way of ensuring that it will leave some processing power for the other tasks too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you not do the equivalent of 'yield' here instead? Building in hardcoded waits does not seem like a great pattern to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you mean https://golang.org/pkg/runtime/#Gosched but I'm not sure if that would help. After taking up something like a hundred milliseconds to process a section, it waits a scheduler queue round, which might not be a lot.
I understand that this is not a nice solution but I can't think of an effective, easy and nice solution right now. When updating an entire database in the background, I experienced annoyingly slow console response, adding a sleep was an easy fix to a problem that only occurs during db upgrade.
core/chain_indexer.go
Outdated
|
||
// processSection processes an entire section by calling backend functions while ensuring | ||
// the continuity of the passed headers. Since the chain mutex is not held while processing, | ||
// the continuity can be broken by a long reorg, in which case the function returns with ok == false. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a race condition here where the reorg happens after you check for it, isn't there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's safe (CanonicalSections always checks for the current canonical hashes and rolls back if needed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function can definitely return true
when the chain is actually no longer canonical, though. What are the consequences of that? If they're insignificant, why return a status at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it also returns a section head hash. The "ok" flag means that it has processed a valid, continuous chain of headers (parent hashes are matching), which means that IF the returned section head (hash of the last block in the section) is still canonical then the results are also canonical.
setSectionHead stores the last known section head (known to the indexer), then setValidSections marks that we have a new processed valid, continuous chain section (not necessarily canonical). Later, when someone wants to know the number of currently canonical sections, CanonicalSections checks if the last known valid sections are actually still canonical.
core/chain_indexer.go
Outdated
} | ||
|
||
// getValidSections reads the number of valid sections from the index database | ||
func (c *ChainIndexer) getValidSections() uint64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we not cache these in memory, rather than reading them off disk each time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it makes sense, it is not called so frequently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's only two integers that are usually written from the same process. For that matter, wouldn't it make more sense to write a config struct, rather than a separate entry for each?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not two integers. There's one integer at key "count" and one section head hash entry for each section, at key "shead" + sectionIdx (big endian uint64).
@Arachnid PTAL, we've polished up the PR a bit, it's ready from my perspective. More exhaustive tests would be nice, but let's build on top and test, not hold it off. |
This PR is an alternative version of #14431 that only uses the event system for updating sections and does not use the chain mutex or modify BlockChain/HeaderChain/LightChain at all.
A rebased version of the bloombits filter can be found on the https://github.com/zsfelfoldi/go-ethereum/commits/bloombits2 branch.