
core: implement ChainIndexer #14522

Merged · 2 commits · Aug 8, 2017
Conversation

zsfelfoldi (Contributor):

This PR is an alternative version of #14431 that only uses the event system for updating sections and does not use the chain mutex or modify BlockChain/HeaderChain/LightChain at all.
A rebased version of the bloombits filter can be found on the https://github.com/zsfelfoldi/go-ethereum/commits/bloombits2 branch.
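As a rough illustration of the event-driven design described above, the indexer can be fed head notifications through go-ethereum's event mux instead of holding the chain mutex. This is a sketch only: the `ChainHeadEvent` subscription and the `newHead` callback are assumptions for illustration, not necessarily the PR's exact wiring.

```go
package main

import (
	"github.com/ethereum/go-ethereum/core"
	"github.com/ethereum/go-ethereum/event"
)

// driveIndexer is an illustrative wiring loop (not the PR's code): it feeds
// canonical head numbers from the event mux into a hypothetical newHead
// callback, so the indexer never needs to touch the chain mutex.
func driveIndexer(mux *event.TypeMux, newHead func(num uint64)) {
	sub := mux.Subscribe(core.ChainHeadEvent{})
	defer sub.Unsubscribe()
	for ev := range sub.Chan() {
		if head, ok := ev.Data.(core.ChainHeadEvent); ok {
			newHead(head.Block.NumberU64())
		}
	}
}
```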

zsfelfoldi requested review from fjl and Arachnid May 26, 2017 10:34
karalabe (Member):

I would really really like to see thorough tests on any new stuff added to core.

zsfelfoldi (Contributor, Author):

@karalabe you are right; although this is not consensus code and could be placed in any package, it deserves a test, so I added one. The test also found problems in some corner cases, which I have fixed now :)

zsfelfoldi requested a review from karalabe May 28, 2017 12:35
fjl modified the milestone: 1.6.3 May 31, 2017
karalabe modified the milestones: 1.6.4, 1.6.3 Jun 1, 2017
karalabe (Member):

Uhm, @zsfelfoldi you did see that all the tests failed, right?

Arachnid (Contributor):

@karalabe It looks like that was just timeouts?

```go
}

// ChainIndex interface is a backend for the indexer doing the actual post-processing job
type ChainIndex interface {
```
Arachnid (Contributor):

This seems like it should be a verb rather than a noun. Maybe ChainIndexerBackend or somesuch?

zsfelfoldi (Author):

I already had a round of naming discussion with @fjl and this is what we settled on, but you're probably right. I renamed it to ChainIndexerBackend until a better suggestion comes up :)
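For context, the renamed backend contract looks roughly like this (a sketch of the interface under discussion; the merged method set and exact signatures may differ):

```go
package core

import (
	"github.com/ethereum/go-ethereum/core/types"
	"github.com/ethereum/go-ethereum/ethdb"
)

// ChainIndexerBackend is the post-processing hook behind ChainIndexer
// (sketch of the interface discussed above, not the merged code verbatim).
type ChainIndexerBackend interface {
	// Reset starts the processing of a new chain segment, discarding any
	// partial results (for example after a reorg).
	Reset(section uint64)
	// Process crunches through the next header of the section; the caller
	// guarantees sequential order.
	Process(header *types.Header)
	// Commit finalizes the section results and writes them to the database.
	Commit(db ethdb.Database) error
}
```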

```go
func NewChainIndexer(db ethdb.Database, dbKey []byte, backend ChainIndex, sectionSize, confirmReq uint64, procWait time.Duration, stop chan struct{}) *ChainIndexer {
	c := &ChainIndexer{
		db:               db,
		validSectionsKey: append(dbKey, []byte("-count")...),
```
Arachnid (Contributor):

Could you use the table interface here, to help avoid the spread of prefixes all over the place?

zsfelfoldi (Author):

Sure. I also need the unprefixed chain db, so I'm passing two Databases (chainDb, indexDb), but it feels nice because now it's clear what each of those databases is used for.
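A sketch of the resulting two-database setup, assuming ethdb.NewTable (go-ethereum's key-prefixing wrapper); the prefix string and cache/handle numbers are illustrative:

```go
package main

import (
	"github.com/ethereum/go-ethereum/ethdb"
)

// newDatabases shows the split described above: the chain data stays in the
// raw database, while every indexer key is transparently namespaced behind a
// single prefix. "chtIndexV1-" is a hypothetical prefix, not the PR's.
func newDatabases(datadir string) (chainDb, indexDb ethdb.Database, err error) {
	ldb, err := ethdb.NewLDBDatabase(datadir, 128, 1024)
	if err != nil {
		return nil, nil, err
	}
	return ldb, ethdb.NewTable(ldb, "chtIndexV1-"), nil
}
```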

```go
		lastSectionHead = c.getSectionHead(c.calcIdx - 1)
	}

	c.lock.Unlock()
```
Arachnid (Contributor):

This screams race condition to me. Is there any way to avoid this? Are you sure it's safe?

zsfelfoldi (Author):

I think it is safe, but now that I think about it, maybe I don't need so many locks any more, since I changed the way newHead and CanonicalSections work. I think I can simplify it.

zsfelfoldi (Author):

On second thought, I'd leave it like this; see above. Do you see any explicit race conditions, or are you just concerned about temporarily releasing the lock? I considered the possibility that new head/rollback events can happen while processing, and I think the code handles this appropriately (see the comments before processSection).
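As a self-contained illustration of the pattern being defended here (toy code, not the PR's): the mutex is dropped for the slow work, and the pre-snapshot state is re-checked under the lock before the result is committed.

```go
package main

import "sync"

// indexer is a toy stand-in for ChainIndexer: release the mutex around the
// slow processing step, then re-validate under the lock before committing.
type indexer struct {
	lock   sync.Mutex
	stored uint64 // number of fully processed sections
}

func (c *indexer) updateOne(section uint64, process func(uint64) bool) {
	c.lock.Lock()
	snapshot := c.stored // capture state while holding the lock
	c.lock.Unlock()

	ok := process(section) // long-running work, no lock held

	c.lock.Lock()
	defer c.lock.Unlock()
	if ok && c.stored == snapshot { // nothing rolled back meanwhile
		c.stored = section + 1
	}
}
```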

```go
		case <-c.stop:
			return
		case <-c.tryUpdate:
			c.lock.Lock()
```
Arachnid (Contributor):

Could we avoid this lock entirely by message passing, so each value is only touched by a single process?

zsfelfoldi (Author):

I've given it some thought, but I think it's better to keep it like this. The update loop is a slow loop that blocks while processing sections, so I don't want to pass it every new head event; I only send a "wake up" signal to the tryUpdate channel when necessary (when the updating flag is false and there is something new to update). Also, it's not ideal to process all new head events sequentially, because some new sections might have already been rolled back by the time the update loop gets to them.
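The signalling described here boils down to a standard non-blocking send (illustrative code, not the PR's): rather than queuing every head event, a single token on a buffered channel nudges the slow update loop, and redundant signals are simply dropped.

```go
package main

// wakeUp nudges the update loop through tryUpdate, which is assumed to be a
// buffered channel of capacity 1. If a wake-up is already pending, the send
// falls through to default and the signal is coalesced.
func wakeUp(tryUpdate chan struct{}) {
	select {
	case tryUpdate <- struct{}{}: // update loop will run (again) soon
	default: // a wake-up is already pending; nothing to do
	}
}
```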


```go
	if c.targetCount > c.stored {
		go func() {
			time.Sleep(c.procWait)
```
Arachnid (Contributor):

What's the purpose of waiting here?

zsfelfoldi (Author):

It makes sense when generating an index for a large number of sections. It is a simple but effective way of ensuring that some processing power is left for other tasks too.

Arachnid (Contributor):

Can you not do the equivalent of 'yield' here instead? Building in hardcoded waits does not seem like a great pattern to me.

zsfelfoldi (Author) · Jul 8, 2017:

I think you mean https://golang.org/pkg/runtime/#Gosched, but I'm not sure that would help: after taking something like a hundred milliseconds to process a section, waiting one scheduler round is not much of a pause.
I understand this is not a nice solution, but I can't think of an effective, easy and nice one right now. When updating an entire database in the background, I experienced annoyingly slow console response; adding a sleep was an easy fix for a problem that only occurs during the db upgrade.


```go
// processSection processes an entire section by calling backend functions while ensuring
// the continuity of the passed headers. Since the chain mutex is not held while processing,
// the continuity can be broken by a long reorg, in which case the function returns with ok == false.
```
Arachnid (Contributor):

There's a race condition here where the reorg happens after you check for it, isn't there?

zsfelfoldi (Author):

I think it's safe (CanonicalSections always checks for the current canonical hashes and rolls back if needed).

Arachnid (Contributor):

This function can definitely return true when the chain is actually no longer canonical, though. What are the consequences of that? If they're insignificant, why return a status at all?

zsfelfoldi (Author):

Note that it also returns a section head hash. The "ok" flag means that it has processed a valid, continuous chain of headers (parent hashes match), which means that IF the returned section head (the hash of the last block in the section) is still canonical, then the results are also canonical.
setSectionHead stores the last section head known to the indexer, then setValidSections marks that we have a newly processed, valid, continuous chain section (not necessarily canonical). Later, when someone wants to know the number of currently canonical sections, CanonicalSections checks whether the last known valid sections are actually still canonical.
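The invariant described here can be reconstructed as follows (names and signature are assumptions, not the PR's exact code): a valid section is trusted only while its recorded section head hash is still the canonical hash at that height, walking back one section per mismatch.

```go
package main

import "github.com/ethereum/go-ethereum/common"

// canonicalSections returns how many of the `valid` processed sections are
// still canonical, assuming sectionHead(idx) returns the head hash recorded
// when section idx was processed and canonicalHash(num) returns the current
// canonical hash at block height num.
func canonicalSections(valid, sectionSize uint64,
	sectionHead func(idx uint64) common.Hash,
	canonicalHash func(num uint64) common.Hash) uint64 {
	for ; valid > 0; valid-- {
		headNum := valid*sectionSize - 1 // last block of section valid-1
		if sectionHead(valid-1) == canonicalHash(headNum) {
			break // this section and everything before it is still canonical
		}
	}
	return valid
}
```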

```go
}

// getValidSections reads the number of valid sections from the index database
func (c *ChainIndexer) getValidSections() uint64 {
```
Arachnid (Contributor):

Can we not cache these in memory, rather than reading them off disk each time?

zsfelfoldi (Author):

I don't think it makes sense; it is not called that frequently.

Arachnid (Contributor):

It's only two integers that are usually written from the same process. For that matter, wouldn't it make more sense to write a config struct, rather than a separate entry for each?

zsfelfoldi (Author):

It's not two integers. There's one integer at key "count" and one section head hash entry for each section, at key "shead" + sectionIdx (big endian uint64).
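Key construction matching that layout (the "count" and "shead" prefixes come from the comment above, but the exact stored bytes in the PR may differ, e.g. the constructor shown earlier prepends a configurable dbKey):

```go
package main

import "encoding/binary"

// validSectionsKey holds the single section counter.
func validSectionsKey() []byte {
	return []byte("count")
}

// sectionHeadKey holds one section head hash per section index,
// encoded as "shead" followed by the big-endian uint64 index.
func sectionHeadKey(idx uint64) []byte {
	var data [8]byte
	binary.BigEndian.PutUint64(data[:], idx)
	return append([]byte("shead"), data[:]...)
}
```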

fjl modified the milestones: 1.6.7, 1.7.0 Jul 17, 2017
karalabe (Member) · Aug 3, 2017:

@Arachnid PTAL, we've polished up the PR a bit; it's ready from my perspective. More exhaustive tests would be nice, but let's build on top and test, not hold it off.

ethereum deleted 8 comments from GitCop Aug 5, 2017
fjl merged commit 374c49e into ethereum:master Aug 8, 2017