LMDB map size - double when full #3731

Merged
merged 3 commits into from
Apr 25, 2016

lukeyeager
Contributor

Close #2404, close #3644, close #3728, close #3730
Also #1298, #1861, #2293, #2709

This change helps users on Windows and/or 32-bit systems by removing the 1TB hard-coded map size.
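For readers unfamiliar with the approach, here is a minimal sketch of the "double when full" idea. It is a toy model, not the code in this PR: `FakeEnv`, `Commit`, and `CommitWithGrowth` are hypothetical names, and the only value borrowed from LMDB is the MDB_MAP_FULL status code (-30792) that appears in the failure logs in this thread.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for an LMDB environment; none of this is the real
// LMDB or Caffe API.
constexpr int kSuccess = 0;
constexpr int kMapFull = -30792;  // MDB_MAP_FULL status value from the logs

class FakeEnv {
 public:
  explicit FakeEnv(std::size_t map_size) : map_size_(map_size) {
    assert(map_size > 0);  // doubling a zero-sized map would never progress
  }

  // Stores `bytes` more data, or reports kMapFull (changing nothing) if the
  // data does not fit in the current map.
  int Commit(std::size_t bytes) {
    if (used_ + bytes > map_size_) return kMapFull;
    used_ += bytes;
    return kSuccess;
  }

  void DoubleMapSize() { map_size_ *= 2; }
  std::size_t map_size() const { return map_size_; }
  std::size_t used() const { return used_; }

 private:
  std::size_t map_size_;
  std::size_t used_ = 0;
};

// Retries the commit, doubling the map after every kMapFull, and returns
// the number of doublings that were needed.
int CommitWithGrowth(FakeEnv& env, std::size_t bytes) {
  int doublings = 0;
  while (env.Commit(bytes) == kMapFull) {
    env.DoubleMapSize();
    ++doublings;
  }
  return doublings;
}
```

In this model, starting from a 1024-byte map and committing 10000 bytes takes four doublings (to 16384 bytes). The map only ever grows as far as the data requires, which is the property that matters on platforms that cannot reserve 1TB up front.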

@lukeyeager force-pushed the lmdb-map-full branch 2 times, most recently from b5dac14 to b55eeb5 (February 26, 2016 04:19)
@lukeyeager changed the title from "MNIST example - double LMDB map size when needed" to "LMDB map size - double when full" (Feb 26, 2016)
@elezar
Contributor

elezar commented Feb 26, 2016

From the error message shown in #2404 (comment), I think this would also help to address the issue with running the docker images under OSX.

See #3518 (comment).

@lukeyeager
Contributor Author

Can I get a review on this PR? The problem keeps coming up (most recently at #4003).

@shelhamer
Member

@lukeyeager this looks reasonable but I'd like to have it tested. Could you (or any volunteer!) confirm that the LeNet MNIST example still produces the same result and, better still, generate and check an ILSVRC lmdb (or one built from any other larger-scale color image data)?

@lukeyeager
Contributor Author

I added a DLOG message to make it clear what's going on. Here's the output from a debug build for each of the requested tests (Release build still works fine, just not as informative):

MNIST example:

$ ./examples/mnist/create_mnist.sh 
Creating lmdb...
I0420 15:50:43.019188 12841 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0420 15:50:43.019433 12841 convert_mnist_data.cpp:88] A total of 60000 items.
I0420 15:50:43.019448 12841 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0420 15:50:43.088729 12841 db_lmdb.cpp:101] Doubling LMDB map size to 2MB ...
I0420 15:50:43.138614 12841 db_lmdb.cpp:101] Doubling LMDB map size to 4MB ...
I0420 15:50:43.263651 12841 db_lmdb.cpp:101] Doubling LMDB map size to 8MB ...
I0420 15:50:45.462596 12841 db_lmdb.cpp:101] Doubling LMDB map size to 16MB ...
I0420 15:50:48.764879 12841 db_lmdb.cpp:101] Doubling LMDB map size to 32MB ...
I0420 15:50:50.015131 12841 db_lmdb.cpp:101] Doubling LMDB map size to 64MB ...
I0420 15:50:52.030293 12841 convert_mnist_data.cpp:108] Processed 60000 files.
I0420 15:50:52.084169 12844 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_test_lmdb
I0420 15:50:52.084355 12844 convert_mnist_data.cpp:88] A total of 10000 items.
I0420 15:50:52.084370 12844 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0420 15:50:52.147625 12844 db_lmdb.cpp:101] Doubling LMDB map size to 2MB ...
I0420 15:50:52.197607 12844 db_lmdb.cpp:101] Doubling LMDB map size to 4MB ...
I0420 15:50:52.314324 12844 db_lmdb.cpp:101] Doubling LMDB map size to 8MB ...
I0420 15:50:52.572700 12844 db_lmdb.cpp:101] Doubling LMDB map size to 16MB ...
I0420 15:50:52.712796 12844 convert_mnist_data.cpp:108] Processed 10000 files.
Done.

ImageNet validation set (50K, not the 1.3M training set):

$ ./build/tools/convert_imageset-d /raid/images/ilsvrc12/val_unsorted/ data/ilsvrc12/val.txt examples/imagenet/ilsvrc12_val_lmdb
I0420 15:46:48.630857 12003 convert_imageset.cpp:86] A total of 50000 images.
I0420 15:46:48.631248 12003 db_lmdb.cpp:35] Opened lmdb examples/imagenet/ilsvrc12_val_lmdb
I0420 15:46:56.756891 12003 db_lmdb.cpp:101] Doubling LMDB map size to 2MB ...
I0420 15:46:56.757127 12003 db_lmdb.cpp:101] Doubling LMDB map size to 4MB ...
I0420 15:46:56.757534 12003 db_lmdb.cpp:101] Doubling LMDB map size to 8MB ...
I0420 15:46:56.758452 12003 db_lmdb.cpp:101] Doubling LMDB map size to 16MB ...
I0420 15:46:56.761261 12003 db_lmdb.cpp:101] Doubling LMDB map size to 32MB ...
I0420 15:46:56.768887 12003 db_lmdb.cpp:101] Doubling LMDB map size to 64MB ...
I0420 15:46:56.786540 12003 db_lmdb.cpp:101] Doubling LMDB map size to 128MB ...
I0420 15:46:56.829578 12003 db_lmdb.cpp:101] Doubling LMDB map size to 256MB ...
I0420 15:46:56.916829 12003 db_lmdb.cpp:101] Doubling LMDB map size to 512MB ...
I0420 15:46:57.085364 12003 db_lmdb.cpp:101] Doubling LMDB map size to 1024MB ...
I0420 15:47:01.480418 12003 convert_imageset.cpp:144] Processed 1000 files.
I0420 15:47:09.345628 12003 db_lmdb.cpp:101] Doubling LMDB map size to 2048MB ...
I0420 15:47:13.841145 12003 convert_imageset.cpp:144] Processed 2000 files.
I0420 15:47:26.619678 12003 convert_imageset.cpp:144] Processed 3000 files.
I0420 15:47:34.023438 12003 db_lmdb.cpp:101] Doubling LMDB map size to 4096MB ...
I0420 15:47:38.012315 12003 convert_imageset.cpp:144] Processed 4000 files.
I0420 15:47:49.782227 12003 convert_imageset.cpp:144] Processed 5000 files.
I0420 15:48:01.866657 12003 convert_imageset.cpp:144] Processed 6000 files.
I0420 15:48:09.747257 12003 db_lmdb.cpp:101] Doubling LMDB map size to 8192MB ...
I0420 15:48:14.420207 12003 convert_imageset.cpp:144] Processed 7000 files.
I0420 15:48:26.005594 12003 convert_imageset.cpp:144] Processed 8000 files.
I0420 15:48:37.898532 12003 convert_imageset.cpp:144] Processed 9000 files.
I0420 15:48:50.917574 12003 convert_imageset.cpp:144] Processed 10000 files.
I0420 15:49:03.204999 12003 convert_imageset.cpp:144] Processed 11000 files.
I0420 15:49:15.131527 12003 convert_imageset.cpp:144] Processed 12000 files.
I0420 15:49:23.746790 12003 db_lmdb.cpp:101] Doubling LMDB map size to 16384MB ...
I0420 15:49:28.817087 12003 convert_imageset.cpp:144] Processed 13000 files.
I0420 15:49:41.960597 12003 convert_imageset.cpp:144] Processed 14000 files.
I0420 15:49:55.471647 12003 convert_imageset.cpp:144] Processed 15000 files.
I0420 15:50:08.232050 12003 convert_imageset.cpp:144] Processed 16000 files.
I0420 15:50:21.176434 12003 convert_imageset.cpp:144] Processed 17000 files.
I0420 15:50:34.679916 12003 convert_imageset.cpp:144] Processed 18000 files.
I0420 15:50:48.524369 12003 convert_imageset.cpp:144] Processed 19000 files.
I0420 15:51:01.250578 12003 convert_imageset.cpp:144] Processed 20000 files.
I0420 15:51:13.327651 12003 convert_imageset.cpp:144] Processed 21000 files.
I0420 15:51:25.104969 12003 convert_imageset.cpp:144] Processed 22000 files.
I0420 15:51:37.382288 12003 convert_imageset.cpp:144] Processed 23000 files.
I0420 15:51:49.226835 12003 convert_imageset.cpp:144] Processed 24000 files.
I0420 15:51:57.359771 12003 db_lmdb.cpp:101] Doubling LMDB map size to 32768MB ...
I0420 15:52:01.687361 12003 convert_imageset.cpp:144] Processed 25000 files.
I0420 15:52:14.073272 12003 convert_imageset.cpp:144] Processed 26000 files.
I0420 15:52:26.425160 12003 convert_imageset.cpp:144] Processed 27000 files.
I0420 15:52:39.002177 12003 convert_imageset.cpp:144] Processed 28000 files.
I0420 15:52:52.671653 12003 convert_imageset.cpp:144] Processed 29000 files.
I0420 15:53:04.965411 12003 convert_imageset.cpp:144] Processed 30000 files.
I0420 15:53:18.126294 12003 convert_imageset.cpp:144] Processed 31000 files.
I0420 15:53:31.020222 12003 convert_imageset.cpp:144] Processed 32000 files.
I0420 15:53:43.357043 12003 convert_imageset.cpp:144] Processed 33000 files.
I0420 15:53:56.091609 12003 convert_imageset.cpp:144] Processed 34000 files.
I0420 15:54:08.452263 12003 convert_imageset.cpp:144] Processed 35000 files.
I0420 15:54:21.596532 12003 convert_imageset.cpp:144] Processed 36000 files.
I0420 15:54:32.781889 12003 convert_imageset.cpp:144] Processed 37000 files.
I0420 15:54:46.468009 12003 convert_imageset.cpp:144] Processed 38000 files.
I0420 15:54:59.287308 12003 convert_imageset.cpp:144] Processed 39000 files.
I0420 15:55:12.398932 12003 convert_imageset.cpp:144] Processed 40000 files.
I0420 15:55:24.591749 12003 convert_imageset.cpp:144] Processed 41000 files.
I0420 15:55:37.352393 12003 convert_imageset.cpp:144] Processed 42000 files.
I0420 15:55:48.896234 12003 convert_imageset.cpp:144] Processed 43000 files.
I0420 15:56:01.307494 12003 convert_imageset.cpp:144] Processed 44000 files.
I0420 15:56:13.575911 12003 convert_imageset.cpp:144] Processed 45000 files.
I0420 15:56:26.319952 12003 convert_imageset.cpp:144] Processed 46000 files.
I0420 15:56:40.033298 12003 convert_imageset.cpp:144] Processed 47000 files.
I0420 15:56:51.899606 12003 convert_imageset.cpp:144] Processed 48000 files.
I0420 15:57:06.169258 12003 convert_imageset.cpp:144] Processed 49000 files.
I0420 15:57:13.917035 12003 db_lmdb.cpp:101] Doubling LMDB map size to 65536MB ...
I0420 15:57:18.296334 12003 convert_imageset.cpp:144] Processed 50000 files.
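The log above shows 16 doublings, from an implied 1 MB starting size (the first message reports 2MB) up to 65536MB. As a rough sketch of that arithmetic, assuming a 1 MB start, the hypothetical helper below (not part of Caffe) counts how many doublings are needed to reach a given size.

```cpp
#include <cstdint>

// Number of doublings needed for a map of start_bytes to reach at least
// required_bytes. Returns -1 for a zero start size, which can never grow.
// Note: this sketch does not guard against overflow for sizes near 2^64.
constexpr int DoublingsNeeded(std::uint64_t start_bytes,
                              std::uint64_t required_bytes) {
  if (start_bytes == 0) return -1;
  int doublings = 0;
  std::uint64_t size = start_bytes;
  while (size < required_bytes) {
    size *= 2;
    ++doublings;
  }
  return doublings;
}
```

With a 1 MB start, anything that needs more than 32768 MB and at most 65536 MB takes 16 doublings, and the 64MB reached for the MNIST training set takes 6. Because the size grows geometrically, the number of resize operations stays small even for large datasets.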

@IsaacYangSLA

IsaacYangSLA commented Apr 25, 2016

I verified the pull request on Ubuntu 14.04 32-bit. Here are the results.

Without PR-3731, two warnings, two errors (see below).

Warning 1:
CXX src/caffe/util/signal_handler.cpp
CXX src/caffe/util/db.cpp
src/caffe/util/db_lmdb.cpp:10:30: warning: large integer implicitly truncated to unsigned type [-Woverflow]
 const size_t LMDB_MAP_SIZE = 1099511627776;  // 1 TB
                              ^
CXX src/caffe/util/benchmark.cpp
CXX src/caffe/util/db_leveldb.cpp
CXX src/caffe/solvers/adam_solver.cpp


Warning 2:
CXX examples/mnist/convert_mnist_data.cpp
CXX .build_release/src/caffe/proto/caffe.pb.cc
In file included from examples/mnist/convert_mnist_data.cpp:10:0:
examples/mnist/convert_mnist_data.cpp: In function ‘void convert_dataset(const char*, const char*, const char*, const string&)’:
examples/mnist/convert_mnist_data.cpp:96:56: warning: large integer implicitly truncated to unsigned type [-Woverflow]
     CHECK_EQ(mdb_env_set_mapsize(mdb_env, 1099511627776), MDB_SUCCESS)  // 1TB
                                                        ^
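Both warnings come from the same arithmetic: 1099511627776 is 2^40, which does not fit in a 32-bit size_t. A small sketch of the truncation, assuming the conversion simply keeps the low 32 bits (`TruncateTo32Bits` is a hypothetical helper used only to model a 32-bit size_t on any machine):

```cpp
#include <cstdint>

// The hard-coded map size from the warnings above: 1 TB = 2^40 bytes.
constexpr std::uint64_t kOneTerabyte = 1099511627776ULL;

// Models the implicit conversion to a 32-bit unsigned type by keeping only
// the low 32 bits of the value.
constexpr std::uint32_t TruncateTo32Bits(std::uint64_t value) {
  return static_cast<std::uint32_t>(value);
}

static_assert(kOneTerabyte == (std::uint64_t{1} << 40), "1 TB is 2^40 bytes");
static_assert(TruncateTo32Bits(kOneTerabyte) == 0, "2^40 mod 2^32 is 0");
```

Under that assumption the requested map size becomes 0 on a 32-bit build rather than 1 TB, which would be consistent with the MDB_MAP_FULL failures reported below, although the exact behavior for a zero map size depends on the LMDB version.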

Error 1 (during make runtest):
[----------] 5 tests from DBTest/1, where TypeParam = caffe::TypeLMDB
[ RUN      ] DBTest/1.TestGetDB
[       OK ] DBTest/1.TestGetDB (40 ms)
[ RUN      ] DBTest/1.TestWrite
F0425 13:28:59.536465 12113 db_lmdb.hpp:14] Check failed: mdb_status == 0 (-30792 vs. 0) MDB_MAP_FULL: Environment mapsize limit reached
*** Check failure stack trace: ***
    @ 0x40041efc  (unknown)
    @ 0x40041e13  (unknown)
    @ 0x4004185f  (unknown)
    @ 0x400448b0  (unknown)
    @ 0x40c52fee  caffe::db::LMDBTransaction::Put()
    @  0x80b6da0  caffe::DBTest_TestWrite_Test<>::TestBody()
    @  0x838caab  testing::internal::HandleExceptionsInMethodIfSupported<>()
    @  0x83835de  testing::Test::Run()
    @  0x8383698  testing::TestInfo::Run()
    @  0x83837cf  testing::TestCase::Run()
    @  0x838616d  testing::internal::UnitTestImpl::RunAllTests()
    @  0x8386442  testing::UnitTest::Run()
    @  0x8089c56  main
    @ 0x40dc7a83  (unknown)
    @  0x809193a  (unknown)
Aborted
make: *** [runtest] Error 134


Error 2 (during create_mnist dataset):
machine32:~/working/caffe$ ./examples/mnist/create_mnist.sh
Creating lmdb...
F0425 13:29:46.991717 12118 convert_mnist_data.cpp:136] Check failed: mdb_put(mdb_txn, mdb_dbi, &mdb_key, &mdb_data, 0) == 0 (-30792 vs. 0) mdb_put failed
*** Check failure stack trace: ***
    @ 0xb7432efc  (unknown)
    @ 0xb7432e13  (unknown)
    @ 0xb743285f  (unknown)
    @ 0xb74358b0  (unknown)
    @  0x804b12f  convert_dataset()
    @  0x804a058  main
    @ 0xb7008a83  (unknown)
    @  0x804a09c  (unknown)
Aborted
F0425 13:29:47.156059 12120 convert_mnist_data.cpp:136] Check failed: mdb_put(mdb_txn, mdb_dbi, &mdb_key, &mdb_data, 0) == 0 (-30792 vs. 0) mdb_put failed
*** Check failure stack trace: ***
    @ 0xb7480efc  (unknown)
    @ 0xb7480e13  (unknown)
    @ 0xb748085f  (unknown)
    @ 0xb74838b0  (unknown)
    @  0x804b12f  convert_dataset()
    @  0x804a058  main
    @ 0xb7056a83  (unknown)
    @  0x804a09c  (unknown)
Aborted
Done.

With PR-3731:

No warning during make all

PASSED in make test
[----------] Global test environment tear-down
[==========] 1056 tests from 146 test cases ran. (42155 ms total)
[  PASSED  ] 1056 tests.

Done w/o problem during create mnist dataset
machine32:~/working/caffe$ ./examples/mnist/create_mnist.sh
Creating lmdb...
I0425 13:33:58.701838 13017 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0425 13:33:58.703167 13017 convert_mnist_data.cpp:88] A total of 60000 items.
I0425 13:33:58.703193 13017 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0425 13:33:59.928076 13017 convert_mnist_data.cpp:108] Processed 60000 files.
I0425 13:34:00.035750 13019 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_test_lmdb
I0425 13:34:00.036919 13019 convert_mnist_data.cpp:88] A total of 10000 items.
I0425 13:34:00.037307 13019 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I0425 13:34:00.162163 13019 convert_mnist_data.cpp:108] Processed 10000 files.
Done.

@shelhamer shelhamer merged commit d8e2f05 into BVLC:master Apr 25, 2016
@shelhamer
Member

Thanks @lukeyeager! This should settle the platform issues that come up from time to time.

@olesalscheider
Contributor

This commit breaks opening large LMDBs:
Now, LMDB::Open does not call mdb_env_set_mapsize anymore; the map size is only doubled when writing data to the LMDB. But when opening an LMDB, the map size must also be large enough for it.

@lukeyeager
Contributor Author

@olesalscheider can you paste the error? Are you opening the LMDBs to read for training or are you trying to add data to an existing LMDB?

@olesalscheider
Contributor

I try to open them for training. With this patch, I get the following error:

F0503 18:08:58.062947 2769 db_lmdb.hpp:15] Check failed: mdb_status == 0 (12 vs. 0) Cannot allocate memory

@lukeyeager
Contributor Author

  1. What system are you on? Linux x86_64?
  2. How big of an LMDB are we talking about?
  3. Can you verify that it works if you go back before this PR was merged?

@olesalscheider
Contributor

  1. Yes, I'm on Linux x86_64
  2. My LMDB is around 2.5 GB
  3. It works fine if I revert 9042664

@lukeyeager
Contributor Author

It's working fine for me. Do you get a stacktrace with your error? Which MDB command is failing? Does this happen right away when you start training or after a while?

Do you have a huge amount of data in a batch? Big enough to make a single batch bigger than the default 10MB map size?

@olesalscheider
Contributor

It fails right away at the first call of mdb_env_open in LMDB::Open. This is the stack trace with debug symbols:

#0  0x00007ffff5b66cc9 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff5b6a0d8 in __GI_abort () at abort.c:89
#2  0x00007ffff70cc099 in google::DumpStackTraceAndExit () at src/utilities.cc:147
#3  0x00007ffff70c2cfd in google::LogMessage::Fail () at src/logging.cc:1458
#4  0x00007ffff70c4bb0 in google::LogMessage::SendToLog (this=0x7fff94d1eb90) at src/logging.cc:1412
#5  0x00007ffff70c28c2 in google::LogMessage::Flush (this=0x20a8, this@entry=0x7fff94d1eb90) at src/logging.cc:1281
#6  0x00007ffff70c55ae in google::LogMessageFatal::~LogMessageFatal (this=0x7fff94d1eb90, __in_chrg=<optimized out>)
    at src/logging.cc:1984
#7  0x00007ffff7821bf7 in caffe::db::MDB_CHECK (mdb_status=12)
    at /home/salscheider/vcs/caffe/include/caffe/util/db_lmdb.hpp:15
#8  0x00007ffff78214fe in caffe::db::LMDB::Open (this=0x250c70003990, 
    source="/storage_local/cs/Train_Label_lmdb", mode=caffe::db::READ)
    at /home/salscheider/vcs/caffe/src/caffe/util/db_lmdb.cpp:21
#9  0x00007ffff788dd0d in caffe::DataReader::Body::InternalThreadEntry (this=0x58733d0)
    at /home/salscheider/vcs/caffe/src/caffe/data_reader.cpp:75
#10 0x00007ffff788aa33 in caffe::InternalThread::entry (this=0x58733d0, device=0, mode=caffe::Caffe::GPU, rand_seed=78095551, 
    solver_count=1, root_solver=true) at /home/salscheider/vcs/caffe/src/caffe/internal_thread.cpp:51
#11 0x00007ffff788d58f in boost::_mfi::mf5<void, caffe::InternalThread, int, caffe::Caffe::Brew, int, int, bool>::operator() (
    this=0x5873e58, p=0x58733d0, a1=0, a2=caffe::Caffe::GPU, a3=78095551, a4=1, a5=true)
    at /usr/include/boost/bind/mem_fn_template.hpp:619
#12 0x00007ffff788d469 in boost::_bi::list6<boost::_bi::value<caffe::InternalThread*>, boost::_bi::value<int>, boost::_bi::value<caffe::Caffe::Brew>, boost::_bi::value<int>, boost::_bi::value<int>, boost::_bi::value<bool> >::operator()<boost::_mfi::mf5<void, caffe::InternalThread, int, caffe::Caffe::Brew, int, int, bool>, boost::_bi::list0> (this=0x5873e68, f=..., a=...)
    at /usr/include/boost/bind/bind.hpp:596
#13 0x00007ffff788d371 in boost::_bi::bind_t<void, boost::_mfi::mf5<void, caffe::InternalThread, int, caffe::Caffe::Brew, int, int, bool>, boost::_bi::list6<boost::_bi::value<caffe::InternalThread*>, boost::_bi::value<int>, boost::_bi::value<caffe::Caffe::Brew>, boost::_bi::value<int>, boost::_bi::value<int>, boost::_bi::value<bool> > >::operator() (this=0x5873e58)
    at /usr/include/boost/bind/bind_template.hpp:20
#14 0x00007ffff788d334 in boost::detail::thread_data<boost::_bi::bind_t<void, boost::_mfi::mf5<void, caffe::InternalThread, int, caffe::Caffe::Brew, int, int, bool>, boost::_bi::list6<boost::_bi::value<caffe::InternalThread*>, boost::_bi::value<int>, boost::_bi::value<caffe::Caffe::Brew>, boost::_bi::value<int>, boost::_bi::value<int>, boost::_bi::value<bool> > > >::run (this=0x5873ca0)
    at /usr/include/boost/thread/detail/thread.hpp:117
#15 0x00007ffff5925a4a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.54.0
#16 0x00007ffff5704182 in start_thread (arg=0x7fff94d1f700) at pthread_create.c:312
#17 0x00007ffff5c2a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

One batch should be around 3 MB.

@lukeyeager
Contributor Author

lukeyeager commented May 3, 2016

Hmm. While debugging, I tried this to get some more info:

$ git diff
diff --git a/src/caffe/util/db_lmdb.cpp b/src/caffe/util/db_lmdb.cpp
index df83a52..d0909f3 100644
--- a/src/caffe/util/db_lmdb.cpp
+++ b/src/caffe/util/db_lmdb.cpp
@@ -33,6 +33,9 @@ void LMDB::Open(const string& source, Mode mode) {
   }
 #endif
   LOG(INFO) << "Opened lmdb " << source;
+  struct MDB_envinfo current_info;
+  MDB_CHECK(mdb_env_info(mdb_env_, &current_info));
+  LOG(INFO) << "Map size is " << current_info.me_mapsize;
 }

 LMDBCursor* LMDB::NewCursor() {

When opening LMDBs for writing, the map size is the default 10MB (as expected):

I0503 09:54:46.593191 11405 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0503 09:54:46.593417 11405 db_lmdb.cpp:38] Map size is 1048576

But when opening LMDBs for reading/training (MDB_RDONLY | MDB_NOTLS), the map size is 1TB (?!?):

I0503 09:54:36.675428 11387 db_lmdb.cpp:35] Opened lmdb /raid/jobs/dev/20150423-182234-386d/train_db
I0503 09:54:36.675446 11387 db_lmdb.cpp:38] Map size is 1000000000000

I have no idea where that 1TB value is coming from. I'm not setting it in the code anywhere.

@hyc - does opening LMDBs with flags = MDB_RDONLY | MDB_NOTLS result in a different default map size?

@lukeyeager
Contributor Author

@olesalscheider it looks like your system is refusing to allocate this 1TB.

You said you had Linux x86_64? Can you be more specific? Which distro?

$ uname -a
Linux lyeager-dt 3.13.0-85-generic #129-Ubuntu SMP Thu Mar 17 20:50:15 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.4 LTS
Release:    14.04
Codename:   trusty

Which version of LMDB do you have?

$ dpkg -l | grep lmdb
ii  liblmdb-dev:amd64                                     0.9.10-1                                            amd64        Lightning Memory-Mapped Database development files
ii  liblmdb0:amd64                                        0.9.10-1                                            amd64        Lightning Memory-Mapped Database shared library
ii  lmdb-doc                                              0.9.10-1                                            all          Lightning Memory-Mapped Database doxygen documentation
ii  python-lmdb                                           0.87-2                                              amd64        Lightning Memory-Mapped Database python bindings

@lukeyeager
Contributor Author

lukeyeager commented May 3, 2016

I have no idea where that 1TB value is coming from. I'm not setting it in the code anywhere.

Aha! https://github.com/LMDB/lmdb/blob/LMDB_0.9.10/libraries/liblmdb/mdb.c#L3458-L3471

When you open a database and don't set a mapsize, it opens with the last mapsize that you used when creating/editing the database (or rounds up to the minimum). So that explains the 1TB map size when opening old databases.
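To make the rule concrete, here is a rough model of it as described in this comment. It is a sketch, not a copy of mdb.c: the struct, its field names, and `EffectiveMapSize` are hypothetical, and a requested size of 0 stands in for "the caller never set a map size".

```cpp
#include <cstdint>

// Hypothetical model of the metadata stored with an existing database.
struct StoredMeta {
  std::uint64_t persisted_map_size;  // map size last saved with the database
  std::uint64_t last_page;           // index of the last page in use
  std::uint64_t page_size;           // bytes per page
};

// Picks the map size used when opening an existing database:
// the caller's value if one was set, otherwise the persisted one,
// and in either case no less than the data already committed.
constexpr std::uint64_t EffectiveMapSize(std::uint64_t requested,
                                         const StoredMeta& meta) {
  std::uint64_t size = requested != 0 ? requested : meta.persisted_map_size;
  const std::uint64_t min_size = (meta.last_page + 1) * meta.page_size;
  return size < min_size ? min_size : size;
}
```

For a database written with a 1000000000000-byte map and holding 640000 pages of 4096 bytes (illustrative numbers), opening without a map size yields 1000000000000, while requesting a tiny size such as 1 rounds up to the 2621440000 bytes actually in use.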

@hyc

hyc commented May 3, 2016

Luke Yeager wrote:

Hmm. While debugging, I tried this to get some more info:

$ git diff
diff --git a/src/caffe/util/db_lmdb.cpp b/src/caffe/util/db_lmdb.cpp
index df83a52..d0909f3 100644
--- a/src/caffe/util/db_lmdb.cpp
+++ b/src/caffe/util/db_lmdb.cpp
@@ -33,6 +33,9 @@ void LMDB::Open(const string& source, Mode mode) {
   }
 #endif
   LOG(INFO) << "Opened lmdb " << source;
+  struct MDB_envinfo current_info;
+  MDB_CHECK(mdb_env_info(mdb_env_, &current_info));
+  LOG(INFO) << "Map size is " << current_info.me_mapsize;
 }

 LMDBCursor* LMDB::NewCursor() {

When opening LMDBs for writing (MDB_RDONLY), the map size is the default 10MB (as expected):

Huh? MDB_RDONLY is for read-only, not for writing.

I0503 09:54:46.593191 11405 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I0503 09:54:46.593417 11405 db_lmdb.cpp:38] Map size is 1048576

But when opening LMDBs for reading/training, the map size is 1TB (?!?):

I0503 09:54:36.675428 11387 db_lmdb.cpp:35] Opened lmdb /raid/jobs/dev/20150423-182234-386d/train_db
I0503 09:54:36.675446 11387 db_lmdb.cpp:38] Map size is 1000000000000

I have no idea where that 1TB value is coming from. I'm not setting it in the
code anywhere.

@hyc https://github.com/hyc - does opening LMDBs with flags = MDB_RDONLY | MDB_RDONLY result in a different default map size?

MDB_RDONLY | MDB_RDONLY - huh, again?

No, there is nowhere that sets a different default map size, but if you're
opening an existing DB it reads the map size that was last used on that DB.

-- Howard Chu
CTO, Symas Corp. http://www.symas.com
Director, Highland Sun http://highlandsun.com/hyc/
Chief Architect, OpenLDAP http://www.openldap.org/project/

@hyc

hyc commented May 3, 2016

Luke Yeager wrote:

Aha!
https://github.com/LMDB/lmdb/blob/LMDB_0.9.10/libraries/liblmdb/mdb.c#L3458-L3471

When you open a database and don't set a mapsize, it opens with the last
mapsize that you used when creating/editing the database (or rounds up to the
minimum). So that explains the 1TB map size when opening old databases.

Yes, that's required since otherwise an arbitrary utility program can't know
what size to use. It's already documented that mdb_env_set_mapsize() will
persist changes into the environment.

http://symas.com/mdb/doc/group__mdb.html#gaa2506ec8dab3d969b0e609cd82e619e5


@lukeyeager
Contributor Author

@hyc Ok, that makes sense. Thanks for chiming in!

@olesalscheider I'm not seeing a problem with this PR yet. If you could create the LMDB on your system, then you should be able to open it for reading. When you get a chance, will you send me the info I asked for in my last post? Maybe that will reveal something.

@olesalscheider
Contributor

@lukeyeager: The system I work on is similar to yours but I have a newer version of liblmdb:

$ uname -a
Linux mrtknecht1 3.13.0-58-generic #97-Ubuntu SMP Wed Jul 8 02:56:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

$ lsb_release -a
No LSB modules are available.                                                                                                              
Distributor ID: Ubuntu
Description:    Ubuntu 14.04.4 LTS
Release:        14.04
Codename:       trusty

$ dpkg -l | grep lmdb
ii  liblmdb-dev:amd64                                     0.9.16-1~ubuntu14.04.1                              amd64        Lightning Memory-Mapped Database development files
ii  liblmdb0:amd64                                        0.9.16-1~ubuntu14.04.1                              amd64        Lightning Memory-Mapped Database shared library
ii  lmdb-doc                                              0.9.10-1                                            all          Lightning Memory-Mapped Database doxygen documentation
ii  python-lmdb                                           0.86-1build1~ubuntu14.04.1                          amd64        Python binding for LMDB Lightning Memory-Mapped Database

@olesalscheider
Contributor

I just noticed that the first LMDB in fact loads fine and reports a map size of 100000000000000.
It is the second LMDB that fails to load. But it was created with the same script and contains the same amount of images (but with 1 instead of 3 channels)...

@lukeyeager
Contributor Author

Woah yeah that fails for me too. I can't allocate 200TB of virtual memory. You should update your script to set the limit to 1TB instead of 100TB.

If you want to use the same LMDBs without re-creating them (I would), then try this:

$ git diff
diff --git a/src/caffe/util/db_lmdb.cpp b/src/caffe/util/db_lmdb.cpp
index df83a52..23d70c3 100644
--- a/src/caffe/util/db_lmdb.cpp
+++ b/src/caffe/util/db_lmdb.cpp
@@ -33,6 +33,13 @@ void LMDB::Open(const string& source, Mode mode) {
   }
 #endif
   LOG(INFO) << "Opened lmdb " << source;
+  if (mode == READ) {
+    // Set the mapsize to the minimum allowed
+    MDB_CHECK(mdb_env_set_mapsize(mdb_env_, 1));
+  }
+  struct MDB_envinfo current_info;
+  MDB_CHECK(mdb_env_info(mdb_env_, &current_info));
+  LOG(INFO) << "Map size is " << current_info.me_mapsize;
 }

 LMDBCursor* LMDB::NewCursor() {

If that works for you, I'll follow up with another PR to force the mapsize to the minimum allowed when reading databases.

@hyc

hyc commented May 3, 2016

Luke Yeager wrote:

Woah yeah that fails for me too. I can't allocate 200TB of virtual memory. You
should update your script to set the limit to 1TB instead of 100TB.

Heh yeah, AMD64 only has 48 address bits, 256TB max and half is reserved for
the OS so 128TB is max size for LMDB there. Not sure what the size limits are
for SPARC or POWER...

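The numbers in this exchange can be checked with a few lines of arithmetic. This is a sketch under two assumptions: that both LMDBs carry the 100000000000000-byte map size reported for the first one, and that the whole user half of a 48-bit address space is available, which a real process will not quite reach.

```cpp
#include <cstdint>

constexpr std::uint64_t kAddressBits = 48;  // AMD64 figure quoted above
constexpr std::uint64_t kVirtualSpace = std::uint64_t{1} << kAddressBits;  // 256 TiB
constexpr std::uint64_t kUserSpace = kVirtualSpace / 2;  // 128 TiB for user code

// Map size reported for the first LMDB; assumed to apply to the second too.
constexpr std::uint64_t kMapSize = 100000000000000ULL;

// True if a mapping (or sum of mappings) of total_bytes could fit at all.
constexpr bool FitsInUserSpace(std::uint64_t total_bytes) {
  return total_bytes <= kUserSpace;
}

static_assert(kUserSpace == 140737488355328ULL, "2^47 bytes");
static_assert(FitsInUserSpace(kMapSize), "one 100 TB map fits");
static_assert(!FitsInUserSpace(2 * kMapSize), "two 100 TB maps do not");
```

This matches what was observed: the first LMDB opens, and the second one fails with "Cannot allocate memory" because the two maps together exceed the 128 TiB available.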

@olesalscheider
Contributor

Yes, I also just noticed that it was a bit much...
@lukeyeager: Your patch works but now it is a bit slow to open the LMDBs. I think I will just create them again.

@lukeyeager
Contributor Author

lukeyeager commented May 3, 2016

Your patch works but now it is a bit slow to open the LMDBs

Are you sure? It doesn't look like finding the minimum size requires any significant computation:
https://github.com/LMDB/lmdb/blob/LMDB_0.9.10/libraries/liblmdb/mdb.c#L3379-L3384

I'm noticing an additional ~0.15 seconds for each call to set_mapsize. And it's no slower than setting the map size to 1TB, which was the previous behavior.

@olesalscheider
Contributor

Yes, the call to mdb_env_set_mapsize takes a bit more than 17 seconds here. Maybe it's spent in some syscall like ftruncate in mdb_env_map or something like that... Probably something that needs O(# of pages).

But this is probably not an issue for anyone with a sane map size.

fxbit pushed a commit to Yodigram/caffe that referenced this pull request Sep 1, 2016
dynamically set LMDB map size (double when full)
jspark1105 pushed a commit to IntelLabs/SkimCaffe that referenced this pull request Oct 6, 2016
@sidnt sidnt mentioned this pull request Jan 6, 2020
Successfully merging this pull request may close these issues.

Grow LMDB map size incrementally
LMDB source doesn't work under valgrind