Note

Materials

Useful materials about HDF5

HDF5 tech note

Compression benchmark

Other materials and discussion

  1. A discussion of industrial large-dataset solutions in PyTorch

  2. H5Record Reddit post

Comparison between LMDB and HDF5

Data obtained from w86763777's script

| Compression Type | Write (s) | Read (s) | Size |
|------------------|-----------|----------|------|
| HDF5             | 4.32      | 1.20     | 496K |
| LMDB             | 1.68      | 0.10     | 224M |
  • Benchmarked on 103 images with a total size of 5.4M; images were resized when stored in the LMDB file

Overall, LMDB provides a 2.6x speedup on writes and a 12x speedup on reads, at the cost of a much larger file (224M vs. 496K for HDF5). Results are averaged over 10 read/write sessions, benchmarked on a 2017 MacBook with an Intel Core i5.
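Each speedup is simply the HDF5 time divided by the LMDB time for the same operation. A minimal sketch of that arithmetic, plus a hypothetical `average_seconds` helper mirroring the 10-session averaging (the original benchmark script is not reproduced here, so this is an assumption about how averaging could be done):

```python
import time
from statistics import mean
from typing import Callable


def average_seconds(fn: Callable[[], object], sessions: int = 10) -> float:
    """Call fn `sessions` times and return the mean wall-clock time in seconds."""
    durations = []
    for _ in range(sessions):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return mean(durations)


def speedup(baseline_secs: float, candidate_secs: float) -> float:
    """Return how many times faster the candidate is than the baseline."""
    return baseline_secs / candidate_secs


# Averaged timings from the table: HDF5 is the baseline, LMDB the candidate.
write_speedup = speedup(4.32, 1.68)
read_speedup = speedup(1.20, 0.10)
print(f"write: {write_speedup:.1f}x, read: {read_speedup:.1f}x")  # write: 2.6x, read: 12.0x
```

To time a backend, pass a zero-argument callable, e.g. `average_seconds(lambda: write_images(paths))`, where `write_images` is a hypothetical function wrapping one backend's write path.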

H5Record could offer LMDB as an additional backend, since LMDB loads binary data significantly faster.
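One way such a backend choice might be structured is a small key/value interface that both an HDF5 and an LMDB backend implement. This is a hypothetical sketch: `StorageBackend`, `InMemoryBackend`, `BACKENDS`, and `open_backend` are illustrative names, not part of H5Record, and a dict stands in for a real store so the example runs without extra dependencies.

```python
from typing import Dict, Protocol


class StorageBackend(Protocol):
    """Hypothetical minimal interface every backend would implement."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryBackend:
    """Dict-backed stand-in; HDF5 or LMDB backends would expose the same methods."""

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = value

    def get(self, key: str) -> bytes:
        return self._store[key]


# Registry mapping a user-facing backend name to its implementation.
BACKENDS = {"memory": InMemoryBackend}


def open_backend(name: str) -> StorageBackend:
    if name not in BACKENDS:
        raise ValueError(f"unknown backend {name!r}; choose from {sorted(BACKENDS)}")
    return BACKENDS[name]()


backend = open_backend("memory")
backend.put("img_0", b"raw image bytes")
print(backend.get("img_0"))  # b'raw image bytes'
```

A real LMDB backend could wrap the `lmdb` package, doing writes inside `env.begin(write=True)` transactions; LMDB keys are bytes, so string keys would need to be encoded first.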

TODO

  • Test combinations of different data modalities

  • Run more tuning experiments with different driver settings

  • Performance benchmark:

    • Compare zip compression performance across multiple workers (I suspect there is room for improvement here)

    • In-memory (dataset[:]) access vs. no compression
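For the multi-worker compression item, a first experiment could time decompression of DEFLATE-compressed chunks serially and with several workers. This sketch assumes "zip" means gzip/DEFLATE (the algorithm behind HDF5's gzip filter), uses synthetic data, and uses threads from `concurrent.futures` as stand-ins for data-loader worker processes. CPython's `zlib` releases the GIL while (de)compressing, so threads may overlap work, but real numbers will depend on chunk size and hardware.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")


def make_chunks(n_chunks: int = 32, chunk_size: int = 256 * 1024) -> List[bytes]:
    """Build compressible synthetic chunks and DEFLATE-compress each one."""
    pattern_len = 256
    raw = [
        bytes((i + j) % 256 for j in range(pattern_len)) * (chunk_size // pattern_len)
        for i in range(n_chunks)
    ]
    return [zlib.compress(chunk, level=6) for chunk in raw]


def decompress_serial(chunks: List[bytes]) -> List[bytes]:
    return [zlib.decompress(c) for c in chunks]


def decompress_parallel(chunks: List[bytes], workers: int = 4) -> List[bytes]:
    # pool.map preserves input order, so the output lines up with decompress_serial.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.decompress, chunks))


def timed(fn: Callable[..., T], *args) -> Tuple[float, T]:
    """Return (elapsed seconds, result) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return time.perf_counter() - start, result


chunks = make_chunks()
t_serial, out_serial = timed(decompress_serial, chunks)
t_parallel, out_parallel = timed(decompress_parallel, chunks)
print(f"serial: {t_serial:.3f}s, 4 threads: {t_parallel:.3f}s, "
      f"identical: {out_serial == out_parallel}")
```

Varying `workers` and the chunk size would show whether the current setup leaves parallelism on the table.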