Support for large databases on 32 bit embedded systems #10486

Closed · fluffynukeit opened this issue Nov 15, 2018 · 3 comments
fluffynukeit commented Nov 15, 2018

Hi, along with this proposal I'm offering notes on a solution I am currently working on but had to temporarily shelve to work on other things. I am hoping this thread can serve as a discussion of my solution-in-progress as well as possible alternatives.

This is my first time working with Go and influxdb's source. I'm very open to suggestions or recommendations if something is not done idiomatically.

Proposal: Enable a database on 32-bit embedded architectures to have a maximum aggregate TSM file size larger than the roughly 3.4 GB ceiling imposed by the 32-bit address space.

Current behavior: Currently, influx memory maps all TSM files into its userland address space. On 64-bit architectures this is not a problem because the userland address space is huge. On 32-bit architectures it can be between 2 and 3.4 GB, depending on the system. When the TSM files approach this limit, Influx fails with "Cannot allocate memory" errors.

Desired behavior: TSM limit of at least 100 GB.

Use case: Data logging locally on a piece of industrial equipment, not in the cloud. This has come up multiple times previously in similar applications: see #10160 and #6171.

Question: Would the following solution be acceptable to the influx team for inclusion into the main line? If not, what alternative solution would be? My organization would prefer not to maintain a separate fork of influx.

My in-progress solution is here: https://github.com/fluffynukeit/influxdb/tree/dual_mmap.

The implemented changes are described below. Most changes are associated with the mmapAccessor in reader.go. If there is a better way to do this, I'm open to working on it. This is what I was able to figure out from the code.

  1. The index and block data regions of the TSM file are memory mapped separately. This allows madvising just the index region (a comment in the code suggests this is better than the current method, which madvises the entire TSM file). To enable mmapping different regions, new functionality is added to mmap_unix and mmap_windows along with a new type. The new mmap type manages a memory mapping to a slice, ensuring that the mapping starts on a page boundary.
  2. The new mmap type can release the mapped region via release() and automatically re-map it (if needed) when bytes() is called. A rough sketch of such a type follows this list.
  3. A new configuration parameter tsm-on-demand-mmap (default false) controls the behavior changes described below. When enabled:
     - The mmap accessor defers a call to free() on entering any function that requires the block data mapping; any such function also increments an access counter.
     - The (typically deferred) free() call releases only the block data mmap, and only when there are no executing accesses. Index data is never released.
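
To make the region-mapping idea concrete, here is a minimal sketch of what a page-aligned, releasable mapping type could look like on a unix target. The type and method names (regionMap, bytes, release) are illustrative only; the actual implementation lives in mmap_unix / mmap_windows on the linked branch.

```go
// Hypothetical sketch only: a lazily re-mappable region of a file,
// assuming a unix target (syscall.Mmap/Munmap). Not the branch's code.
package tsmsketch

import (
	"os"
	"sync"
	"syscall"
)

// regionMap manages a memory mapping of a byte range within a file.
// The underlying mmap always starts on a page boundary, as mmap(2)
// requires, even when the requested offset does not.
type regionMap struct {
	mu     sync.Mutex
	f      *os.File
	off    int64  // requested start of the region within the file
	length int    // requested length of the region
	data   []byte // current mapping; nil when released
	skew   int    // distance from the page-aligned mmap start to off
}

// bytes returns the mapped region, re-mapping it on demand if it was
// previously released.
func (r *regionMap) bytes() ([]byte, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.data != nil {
		return r.data[r.skew : r.skew+r.length], nil
	}
	pageSize := int64(os.Getpagesize())
	alignedOff := r.off &^ (pageSize - 1) // round down to a page boundary
	r.skew = int(r.off - alignedOff)
	m, err := syscall.Mmap(int(r.f.Fd()), alignedOff, r.skew+r.length,
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return nil, err
	}
	r.data = m
	return r.data[r.skew : r.skew+r.length], nil
}

// release unmaps the region, returning its address space to the OS
// until the next call to bytes().
func (r *regionMap) release() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.data == nil {
		return nil
	}
	err := syscall.Munmap(r.data)
	r.data = nil
	return err
}
```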

This solution did a good job of powering through the address space limit. Care needs to be taken to set the shard duration low enough that compaction jobs don't consume too large a chunk of the address space, and that individual TSM files stay well below the 3.4 GB limit. See the monitoring output below, specifically the 6.3 GB aggregate size and the sub-GB VM peak usage. To generate the database, I used influx-stress with a modification to use randomly changing data instead of incrementing data that would otherwise be compacted to almost nothing via RLE.
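
The idea behind that influx-stress modification is simply to write field values that vary randomly rather than monotonically, so the TSM engine cannot run-length-encode them down to almost nothing. A hypothetical illustration of the point generation (not the actual patch):

```go
// Illustration only: emit line-protocol points with randomly varying
// values so they cannot be compressed away by RLE.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	ts := time.Now().UnixNano()
	for i := 0; i < 5; i++ {
		// A random value per point defeats RLE; an incrementing
		// counter would compress to almost nothing.
		fmt.Printf("cpu,host=server%02d value=%f %d\n",
			i%4, rand.Float64()*100, ts+int64(i))
	}
}
```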

Every 20s: du -hs /LASERINFO/influxdb/data/stress/autogen/ && du -hs /LASERINFO/influxdb/data/stress/autogen/* && cat /proc/904/status && tail -n 4 /LASERINFO/dan_influxlo    2018-11-05 15:30:11

6.3G    /LASERINFO/influxdb/data/stress/autogen/
207.4M  /LASERINFO/influxdb/data/stress/autogen/10
207.7M  /LASERINFO/influxdb/data/stress/autogen/11
208.1M  /LASERINFO/influxdb/data/stress/autogen/12
207.7M  /LASERINFO/influxdb/data/stress/autogen/13
208.0M  /LASERINFO/influxdb/data/stress/autogen/14
177.8M  /LASERINFO/influxdb/data/stress/autogen/15
186.2M  /LASERINFO/influxdb/data/stress/autogen/16
183.9M  /LASERINFO/influxdb/data/stress/autogen/17
207.3M  /LASERINFO/influxdb/data/stress/autogen/18
207.4M  /LASERINFO/influxdb/data/stress/autogen/19
207.8M  /LASERINFO/influxdb/data/stress/autogen/20
208.0M  /LASERINFO/influxdb/data/stress/autogen/21
207.9M  /LASERINFO/influxdb/data/stress/autogen/22
208.2M  /LASERINFO/influxdb/data/stress/autogen/23
208.1M  /LASERINFO/influxdb/data/stress/autogen/24
208.1M  /LASERINFO/influxdb/data/stress/autogen/25
177.1M  /LASERINFO/influxdb/data/stress/autogen/26
207.7M  /LASERINFO/influxdb/data/stress/autogen/27
207.4M  /LASERINFO/influxdb/data/stress/autogen/28
207.7M  /LASERINFO/influxdb/data/stress/autogen/29
208.4M  /LASERINFO/influxdb/data/stress/autogen/30
207.7M  /LASERINFO/influxdb/data/stress/autogen/31
208.1M  /LASERINFO/influxdb/data/stress/autogen/32
207.9M  /LASERINFO/influxdb/data/stress/autogen/33
208.0M  /LASERINFO/influxdb/data/stress/autogen/34
207.9M  /LASERINFO/influxdb/data/stress/autogen/35
207.1M  /LASERINFO/influxdb/data/stress/autogen/36
206.8M  /LASERINFO/influxdb/data/stress/autogen/37
204.6M  /LASERINFO/influxdb/data/stress/autogen/38
31.8M   /LASERINFO/influxdb/data/stress/autogen/39
134.3M  /LASERINFO/influxdb/data/stress/autogen/7
160.2M  /LASERINFO/influxdb/data/stress/autogen/8
162.2M  /LASERINFO/influxdb/data/stress/autogen/9
Name:   influxd
Umask:  0022
State:  S (sleeping)
Tgid:   904
Ngid:   0
Pid:    904
PPid:   898
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 256
Groups: 0
VmPeak:   999248 kB
VmSize:   897072 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    215724 kB
VmRSS:    124044 kB
RssAnon:          107592 kB
RssFile:           16452 kB
RssShmem:              0 kB
VmData:   174904 kB
VmStk:       136 kB
VmExe:      9124 kB
VmLib:         4 kB
VmPTE:       264 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        12
SigQ:   0/7913
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe3bfa3a00
SigIgn: 0000000000000000
SigCgt: fffffffe7fc1feff
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
Cpus_allowed:   3
Cpus_allowed_list:      0-1
voluntary_ctxt_switches:        1014
nonvoluntary_ctxt_switches:     605
[httpd] 172.28.169.86 - - [05/Nov/2018:15:30:00 +0000] "POST /write?db=stress HTTP/1.1" 204 0 "-" "fasthttp" a8a0fabb-e10f-11e8-8224-000a35001e53 2286904
[httpd] 172.28.169.86 - - [05/Nov/2018:15:30:02 +0000] "POST /write?db=stress HTTP/1.1" 204 0 "-" "fasthttp" a9ffac5a-e10f-11e8-8225-000a35001e53 2220618
[httpd] 172.28.169.86 - - [05/Nov/2018:15:30:05 +0000] "POST /write?db=stress HTTP/1.1" 204 0 "-" "fasthttp" ab577bfc-e10f-11e8-8226-000a35001e53 2534072
[httpd] 172.28.169.86 - - [05/Nov/2018:15:30:07 +0000] "POST /write?db=stress HTTP/1.1" 204 0 "-" "fasthttp" acdba48f-e10f-11e8-8227-000a35001e53 1914275

While writing data to the database worked great, I had a serious problem with querying the data. I encountered frequent, unpredictable SIGSEGVs when running queries with the influx client. I believe this had something to do with multiple cursors entering the same reader.go functions and a problem with the access counters that allowed the free() function to release an mmap while another cursor still needed it. I'm not super competent with the synchronization methods in Go, so I could have messed this part up.
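
For discussion, one way to make the deferred free() safe against that kind of race (a sketch under my assumptions, not the branch's actual code) is to guard the access counter and the release decision with a single lock, so the block data can never be unmapped while a cursor still holds a reference:

```go
// Hypothetical sketch: reference counting around the block-data mapping
// so release can never race with an in-flight reader. Names are
// illustrative; the real accessor is mmapAccessor in reader.go.
package tsmsketch

import "sync"

// releasable is the minimal behavior the guard needs from the mapping
// (the regionMap sketch above satisfies it).
type releasable interface {
	bytes() ([]byte, error)
	release() error
}

type blockAccessor struct {
	mu       sync.Mutex
	accesses int // number of in-flight readers of the block data
	region   releasable
}

// acquire pins the block-data mapping and returns it. Every successful
// acquire must be paired with exactly one done.
func (a *blockAccessor) acquire() ([]byte, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	b, err := a.region.bytes()
	if err != nil {
		return nil, err
	}
	a.accesses++
	return b, nil
}

// done decrements the reader count and, once nobody is left, releases
// the block-data mmap so its address space can be reused.
func (a *blockAccessor) done() error {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.accesses--
	if a.accesses == 0 {
		return a.region.release()
	}
	return nil
}
```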

fluffynukeit (Author) commented

This issue is superseded by #12362.

stale bot commented Jul 23, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Jul 23, 2019

stale bot commented Jul 30, 2019

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.
