Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider adding lower-level iterator controls available in LevelDB #17

Closed
tommetge opened this issue Oct 15, 2013 · 17 comments
Closed

Consider adding lower-level iterator controls available in LevelDB #17

tommetge opened this issue Oct 15, 2013 · 17 comments
Milestone

Comments

@tommetge
Copy link

Specifically:

  • seek() should actually perform a seek (this is not the case right now)
  • allow callers to check if the iterator is currently valid (LevelDB::Iterator::Valid())
  • allow callers to manually step the iterator forward and backward
  • allow callers to fetch the key or value for a current iterator

This is a completely different model of iteration from standard python. As mentioned here:

tommetge@25364fc

... we have an existing code base whose logic depends on the outlined features. It may not be appropriate for the general Plyvel consumer.

Let me know what you think.

@jtolio
Copy link

jtolio commented Oct 15, 2013

also, we need to support closing iterators, so that we don't have to wait for garbage collection to kick in to free the leveldb snapshot backing the iterator

@wbolster
Copy link
Owner

The ability to close iterators explicitly is actually an issue on itself. I've just opened #19 for that.

@jtolio
Copy link

jtolio commented Oct 15, 2013

thx

@wbolster
Copy link
Owner

I have a few remarks on the initial report.

seek() should actually perform a seek (this is not the case right now)

That is not correct. Iterator.seek() does seek (by calling LevelDB's Iterator.Seek()); it just does not return anything.

allow callers to check if the iterator is currently valid (LevelDB::Iterator::Valid())

This depends on how the iterator is used: next(...) might be valid, but .prev() might not (or any other combination).

allow callers to manually step the iterator forward and backward

It seems to me the same can be achieved using next(...) or .prev() (and ignoring the result).

allow callers to fetch the key or value for a current iterator

Again, this depends on how the iterator is used (see above).

This is a completely different model of iteration from standard python.

Indeed, Python has an (imho very sophisticated) iteration interface. I've spent quite some time transforming the LevelDB iterator interface to conform to Python's conventions. The result is that the underlying code is a state machine instead of a straight-forward API bridge. Mixing this already quite hairy code with additional Python-level access to the underlying LevelDB concepts is, at best, a daunting task. I'm especially worried about all the corner cases combining things like prefixed databases, the .iterator(prefix=...) arg, and all combinations of start, stop, include_start, include_stop, and prefix. Getting this (hopefully completely) right took quite some effort; see e.g. #4, #9, #15.

@tommetge
Copy link
Author

Thanks for the response. One correction in my original report: it's not seek() that doesn't move the iterator, it's seek_to_start() and seek_to_stop() that do not. These calls defer actual movement of the iterator to real_next() and real_prev().

The fact that seek() operates as expected (and moves the iterator position in LevelDB) makes the case for exposing Valid() a little stronger because, as you noted, next() and prev() can return entirely different things, depending on the database contents. The ambiguity can lead to confusion when implementing seek() (i.e. should i use next or prev here?).

I think that seek() really doesn't fit the Python iterator idiom in general - there isn't such a concept in the iterator protocol. The Python model would accommodate a seek by range-based iterating, which is already implemented.

Manually stepping with next() and prev() is totally fine. We've implemented it precisely that way, wrapping it in a try/except block to emulate an exception-free step.

I think the core concern here is that Plyvel is an excellent Python interface for LevelDB's C library - but our code makes some assumptions about seek(), seek_to_start(), and seek_to_stop() that aren't respected in Plyvel and are incompatible with the Python concept of iteration. Given this, if we can't find a good fit for these features here, I'm totally fine maintaining a fork or branch of Plyvel that implements these lower-level features that never sees use outside of our (rather unique) code base.

@jtolio
Copy link

jtolio commented Oct 17, 2013

the only differences i really am aware of between python iterators and leveldb iterators are that leveldb-next advances the iterator once, whereas python-next saves the current key and value, advances the iterator, and then returns the saved key and value. prev is similar but in the opposite order. basically the difference between i++ and ++i, right?

i feel like it should be fine to let both semantics coexist defaulting to python semantics, provided that there's calls to the lower level leveldb semantics-style calls?

maybe i'm just talking out of my rear end again, as i'm not as familiar with plyvel as either of you two. i'm also aware wouter had some questions about the leveldb-py interface (it seemed weird to him), but i think it's possible for both semantics to coexist.

for someone looking for a fast leveldb library in python, it's possible they'll want to use leveldb semantics, in the case they're transliterating code from a different language or something.

@wbolster
Copy link
Owner

leveldb-next advances the iterator once, whereas python-next saves the current key and value, advances the iterator, and then returns the saved key and value. prev is similar but in the opposite order

Not exactly. next() in Python first calls Next() in C++, then returns the current value. .prev() in Python remembers the current value, calls Prev() in C++, then returns the remembered value.

However, both next() and .prev() have quite some logic to correctly handle all the start, stop, include_start, include_stop variations. Allowing direct calls to Next() and Prev() on the C++ level could easily mess up this carefully crafted logic.

Example: Next() might succeed on the C++ level, but if it's called on a PrefixedDB instance, the iterator could still be invalid, even though the underlying C++ iterator is still valid.

i'm also aware wouter had some questions about the leveldb-py interface (it seemed weird to him)

Well, not exactly 'weird', but mostly not reallly Pythonic. With Plyvel, all common iteration patterns can be expressed using a common for loop with the right combination of keyword args to DB.iterator(), while in leveldb-py calling code needs to interact with the iterator instance instead of just using a for loop.

@wbolster
Copy link
Owner

wbolster commented Nov 3, 2013

Fwiw, I would not mind adding a "low level iterator" API to Plyvel, but I'd like to keep it separate from the Pythonic iterator API with all its quirks and corner cases, as outlined before.

What do you think of a db.raw_iterator() API returning a RawIterator instance that almost completely mimics the C++ API? It would not have support for the all the fancy flags (and would not support the "prefixed database" feature either).

@jtolio
Copy link

jtolio commented Nov 3, 2013

that would be fantastic.

@wbolster
Copy link
Owner

wbolster commented Nov 3, 2013

What is your opinion on .key() and .value() methods versus return values from .next() and .prev() (and maybe others)? And should there be a .valid() method, or should the code handle this inside the other methods?

— Wouter

(Sent from my tablet. Please ignore the typos.)

JT [email protected]:

that would be fantastic.


Reply to this email directly or view it on GitHub.

@jtolio
Copy link

jtolio commented Nov 8, 2013

whoops, i suck at email once again. i'm on freenode as 'jtolds' btw

the benefit to having separate key and value methods is that you don't have to call the underlying c method if you don't need to, reducing memory copies. it also helps makes code more general. you can have some chunk of code decide where to seek the iterator to, and another chunk of code to handle the iterator post seeking.

i also think a valid method is useful, cause try/except blocks around every seek/step seem like worse code to me, but i guess that's more subjective

wbolster added a commit that referenced this issue Nov 10, 2013
Added a .raw_iterator() method to DB and Snapshot instances. The
RawIterator instance returned by .raw_iterator() mimics the C++ iterator
API provided by LevelDB. The API is almost a direct mapping, except for
some added error handling, which is required to avoid segfaults or other
aborts from within LevelDB.

See issue #17.
@wbolster
Copy link
Owner

I've just implemented a RawIterator (published in the raw-iterator feature branch for now). Commit
8d0bfa6 lays the foundation by moving some of the shared iterator logic into a base class, and commit 2cd3101 adds the new RawIterator API on top of that.

@tommetge, @jtolds: please review these changes. The tests provide example API usage; the implementation itself shows the exposed API.

@wbolster
Copy link
Owner

Relevant parts of the commit message:

Implement low-level "raw iterator" API

Added a .raw_iterator() method to DB and Snapshot instances. The
RawIterator instance returned by .raw_iterator() mimics the C++ iterator
API provided by LevelDB. The API is almost a direct mapping, except for
some added error handling, which is required to avoid segfaults or other
aborts from within LevelDB.

@jtolio
Copy link

jtolio commented Nov 13, 2013

looks good to me

@wbolster
Copy link
Owner

Okay, so this just needs docs. I'll look into that later.

— Wouter

(Sent from my phone. Please ignore the typos.)

JT [email protected] schreef:

looks good to me


Reply to this email directly or view it on GitHub:
#17 (comment)

wbolster added a commit that referenced this issue Nov 15, 2013
@wbolster
Copy link
Owner

Just released Plyvel 0.7 with this new API. https://plyvel.readthedocs.org/en/latest/news.html#plyvel-0-7

@jtolio
Copy link

jtolio commented Nov 15, 2013

awesome, thx wouter

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants