LMDB map size - double when full #3731
Conversation
Force-pushed from b5dac14 to b55eeb5
From the error message shown in #2404 (comment), I think this would also help to address the issue with running the Docker images under OS X. See #3518 (comment).
Force-pushed from b55eeb5 to cfdc303
Can I get a review on this PR? The problem keeps coming up (most recently at #4003).
@lukeyeager this looks reasonable, but I'd like to have it tested. Could you (or any volunteer!) confirm that the LeNet MNIST example still produces the same result, and better still, generate and check an ILSVRC LMDB (or any other larger-scale, color image data)?
Instead, double the map size on the MDB_MAP_FULL exception.
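For readers skimming the thread, here is a minimal sketch of the doubling strategy described above. It is illustrative only, not the exact code merged into db_lmdb.cpp: when a put or commit reports MDB_MAP_FULL, query the current map size with mdb_env_info, double it with mdb_env_set_mapsize, and retry. The PutWithGrowingMap helper and the standalone MDB_CHECK error checker are made-up names for this sketch.

#include <cstdio>
#include <cstdlib>
#include <lmdb.h>

// Abort on any LMDB error we do not handle explicitly.
inline void MDB_CHECK(int rc) {
  if (rc != MDB_SUCCESS) {
    std::fprintf(stderr, "lmdb error: %s\n", mdb_strerror(rc));
    std::abort();
  }
}

// Write one key/value pair, doubling the map size and retrying whenever the
// environment reports MDB_MAP_FULL.
void PutWithGrowingMap(MDB_env* env, MDB_val* key, MDB_val* value) {
  while (true) {
    MDB_txn* txn = NULL;
    MDB_CHECK(mdb_txn_begin(env, NULL, 0, &txn));
    MDB_dbi dbi;
    MDB_CHECK(mdb_dbi_open(txn, NULL, 0, &dbi));
    int rc = mdb_put(txn, dbi, key, value, 0);
    if (rc == MDB_SUCCESS) {
      rc = mdb_txn_commit(txn);   // the commit itself can also hit MDB_MAP_FULL
    } else {
      mdb_txn_abort(txn);
    }
    if (rc == MDB_MAP_FULL) {
      // Query the current map size and double it. mdb_env_set_mapsize may only
      // be called while this process has no active transactions, which is true
      // here because the transaction above was committed or aborted.
      MDB_envinfo info;
      MDB_CHECK(mdb_env_info(env, &info));
      MDB_CHECK(mdb_env_set_mapsize(env, info.me_mapsize * 2));
      continue;                   // retry the write with the larger map
    }
    MDB_CHECK(rc);
    return;
  }
}

Growing geometrically keeps the number of retries logarithmic in the final database size, which is why doubling (rather than adding a fixed increment) is the natural choice here.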
Force-pushed from cfdc303 to 74040cb
I added a DLOG message to make it clear what's going on. Here's the output from a debug build for each of the requested tests (the Release build still works fine, just not as informative):
MNIST example:
ImageNet validation set (50K images, not the 1.3M training set):
I verified the pull request on Ubuntu 14.04 32-bit. Here are the results. Without PR-3731: two warnings, two errors (see below).
With PR-3731:
Thanks @lukeyeager! This should settle the platform issues that come up from time to time.
This commit breaks opening large LMDBs.
@olesalscheider can you paste the error? Are you opening the LMDBs to read for training, or are you trying to add data to an existing LMDB?
I'm trying to open them for training. With this patch, I get the following error:
It's working fine for me. Do you get a stack trace with your error? Which MDB command is failing? Does this happen right away when you start training, or after a while? Do you have a huge amount of data in a batch? Big enough to make a single batch bigger than the default 10MB map size?
It fails right away at the first call of mdb_env_open in LMDB::Open. This is the stack trace with debug symbols:
One batch should be around 3 MB.
Hmm. While debugging, I tried this to get some more info:
$ git diff
diff --git a/src/caffe/util/db_lmdb.cpp b/src/caffe/util/db_lmdb.cpp
index df83a52..d0909f3 100644
--- a/src/caffe/util/db_lmdb.cpp
+++ b/src/caffe/util/db_lmdb.cpp
@@ -33,6 +33,9 @@ void LMDB::Open(const string& source, Mode mode) {
   }
 #endif
   LOG(INFO) << "Opened lmdb " << source;
+  struct MDB_envinfo current_info;
+  MDB_CHECK(mdb_env_info(mdb_env_, &current_info));
+  LOG(INFO) << "Map size is " << current_info.me_mapsize;
 }

 LMDBCursor* LMDB::NewCursor() {

When opening LMDBs for writing, the map size is the default 10MB (as expected):
But when opening LMDBs for reading/training, the map size is reported as 1TB:
I have no idea where that 1TB value is coming from. I'm not setting it anywhere in the code. @hyc - does opening LMDBs with MDB_RDONLY set a different default map size somewhere?
@olesalscheider it looks like your system is refusing to allocate this 1TB. You said you had Linux x86_64? Can you be more specific? Which distro? Which version of LMDB do you have?
Aha! https://github.com/LMDB/lmdb/blob/LMDB_0.9.10/libraries/liblmdb/mdb.c#L3458-L3471 When you open a database and don't set a mapsize, it opens with the last mapsize that you used when creating/editing the database (or rounds up to the minimum). So that explains the 1TB map size when opening old databases.
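To make the behaviour linked above concrete, here is a small stand-alone sketch, not from the thread: the path is hypothetical, error checks are omitted for brevity, and a 64-bit system is assumed. An environment created with an explicit map size reports that same size when it is later reopened without calling mdb_env_set_mapsize.

#include <cstdio>
#include <lmdb.h>

int main() {
  const char* path = "/tmp/lmdb_mapsize_demo";  // hypothetical; directory must exist
  const size_t one_tib = 1UL << 40;             // 1 TiB, like the old hard-coded Caffe value

  // Create the environment with an explicit 1 TiB map size.
  MDB_env* env;
  mdb_env_create(&env);
  mdb_env_set_mapsize(env, one_tib);
  mdb_env_open(env, path, 0, 0664);
  mdb_env_close(env);

  // Reopen it read-only without setting a map size: per the mdb.c lines linked
  // above, LMDB falls back to the size recorded in the existing environment,
  // so this reports ~1 TiB rather than the small library default.
  mdb_env_create(&env);
  mdb_env_open(env, path, MDB_RDONLY, 0664);
  MDB_envinfo info;
  mdb_env_info(env, &info);
  std::printf("map size on reopen: %zu\n", info.me_mapsize);
  mdb_env_close(env);
  return 0;
}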
Luke Yeager wrote:
Huh? MDB_RDONLY is for read-only, not for writing.
MDB_RDONLY | MDB_RDONLY - huh, again? No, there is nowhere that sets a different default map size, but if you're
-- Howard Chu
Luke Yeager wrote:
Yes, that's required since otherwise an arbitrary utility program can't know
http://symas.com/mdb/doc/group__mdb.html#gaa2506ec8dab3d969b0e609cd82e619e5
-- Howard Chu
@hyc Ok, that makes sense. Thanks for chiming in! @olesalscheider I'm not seeing a problem with this PR yet. If you could create the LMDB on your system, then you should be able to open it for reading. When you get a chance, will you send me the info I asked for in my last post? Maybe that will reveal something.
@lukeyeager: The system I work on is similar to yours but I have a newer version of liblmdb:
I just noticed that the first LMDB in fact loads fine and reports a map size of 100000000000000 (100 TB).
Woah, yeah, that fails for me too. I can't allocate 200TB of virtual memory. You should update your script to set the limit to 1TB instead of 100TB. If you want to use the same LMDBs without re-creating them (I would), then try this:
$ git diff
diff --git a/src/caffe/util/db_lmdb.cpp b/src/caffe/util/db_lmdb.cpp
index df83a52..23d70c3 100644
--- a/src/caffe/util/db_lmdb.cpp
+++ b/src/caffe/util/db_lmdb.cpp
@@ -33,6 +33,13 @@ void LMDB::Open(const string& source, Mode mode) {
   }
 #endif
   LOG(INFO) << "Opened lmdb " << source;
+  if (mode == READ) {
+    // Set the mapsize to the minimum allowed
+    MDB_CHECK(mdb_env_set_mapsize(mdb_env_, 1));
+  }
+  struct MDB_envinfo current_info;
+  MDB_CHECK(mdb_env_info(mdb_env_, &current_info));
+  LOG(INFO) << "Map size is " << current_info.me_mapsize;
 }

 LMDBCursor* LMDB::NewCursor() {

If that works for you, I'll follow up with another PR to force the mapsize to the minimum allowed when reading databases.
Luke Yeager wrote:
Heh, yeah, AMD64 only has 48 address bits, 256TB max, and half is reserved for the kernel.
-- Howard Chu
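A quick arithmetic check of those numbers: the 48-bit and half-reserved-for-the-kernel figures come from the comment above, and the assumption of two 100 TB maps comes from the 200TB mentioned earlier; the rest is plain arithmetic, not from the thread.

#include <cstdio>
#include <cstdint>

int main() {
  const uint64_t address_space = 1ULL << 48;             // 256 TiB of virtual addresses
  const uint64_t user_half     = address_space / 2;      // ~128 TiB (~140.7 TB) usable by a process
  const uint64_t two_maps      = 2 * 100000000000000ULL; // two 100 TB LMDB maps
  std::printf("usable: %llu bytes, requested: %llu bytes, fits: %s\n",
              (unsigned long long)user_half, (unsigned long long)two_maps,
              two_maps <= user_half ? "yes" : "no");      // prints "no"
  return 0;
}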
Yes, I also just noticed that it was a bit much... |
Are you sure? It doesn't look like finding the minimum size requires any significant computation: I'm noticing an additional ~0.15 seconds for each call to
Yes, the call does take some time. But this is probably not an issue for anyone with a sane map size.
dynamically set LMDB map size (double when full)
Close #2404, close #3644, close #3728, close #3730
Also #1298, #1861, #2293, #2709
This change helps users on Windows and/or 32-bit systems by removing the 1TB hard-coded map size.