Support for large databases on 32 bit embedded systems #10486

Closed · fluffynukeit opened this issue Nov 15, 2018 · 3 comments
fluffynukeit commented Nov 15, 2018

Hi, along with this proposal I'm offering notes on a solution I am currently working on but had to temporarily shelve to work on other things. I am hoping this thread can serve as a discussion of my solution-in-progress as well as possible alternatives.

This is my first time working with Go and influxdb's source. I'm very open to suggestions or recommendations if something is not done idiomatically.

Proposal: Enable a database on 32-bit embedded architectures to have a maximum aggregate TSM file size larger than the roughly 3.4 GB ceiling imposed by the 32-bit address space.

Current behavior: Currently, influx memory maps all TSM files into its userland address space. On 64-bit architectures this is not a problem because the userland address space is huge. On 32-bit architectures it can be between 2 and 3.4 GB, depending on the system. When the TSM files approach this limit, Influx fails with "Cannot allocate memory" errors.

Desired behavior: TSM limit of at least 100 GB.

Use case: Data logging locally on a piece of industrial equipment, not in the cloud. This has come up multiple times previously in similar applications: see #10160 and #6171.

Question: Would the following solution be acceptable to the influx team for inclusion into the main line? If not, what alternative solution would be? My organization would prefer not to maintain a separate fork of influx.

My in-progress solution is here: https://github.com/fluffynukeit/influxdb/tree/dual_mmap.

The implemented changes are described below. Most changes are associated with the mmapAccessor in reader.go. If there is a better way to do this, I'm open to working on it. This is what I was able to figure out from the code.

  1. The index and block data regions of the TSM file are memory mapped separately. This allows madvising just the index region (a comment in the code suggests this is better than the current method, which madvises the entire TSM file). To enable mmapping different regions, new functionality is added to mmap_unix and mmap_windows along with a new type. The new mmap type manages a memory mapping to a slice, ensuring that the mapping starts on a page boundary.
  2. The new mmap type can release the mapped region via release() and automatically re-map it (if needed) when bytes() is called. A rough sketch of such a type follows this list.
  3. A new configuration parameter tsm-on-demand-mmap (default false) controls the behavior changes described below. When enabled:
     - The mmap accessor defers a call to free() on entering any function that requires the block data mapping; any such function also increments an access counter.
     - The (typically deferred) free() call releases only the block data mmap, and only when there are no executing accesses. Index data is never released.
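
To make the region-mapping idea concrete, here is a minimal sketch of what a page-aligned, releasable mapping type could look like on a unix target. The type and method names (regionMap, bytes, release) are illustrative only; the actual implementation lives in mmap_unix / mmap_windows on the linked branch.

```go
// Hypothetical sketch only: a lazily re-mappable region of a file,
// assuming a unix target (syscall.Mmap/Munmap). Not the branch's code.
package tsmsketch

import (
	"os"
	"sync"
	"syscall"
)

// regionMap manages a memory mapping of a byte range within a file.
// The underlying mmap always starts on a page boundary, as mmap(2)
// requires, even when the requested offset does not.
type regionMap struct {
	mu     sync.Mutex
	f      *os.File
	off    int64  // requested start of the region within the file
	length int    // requested length of the region
	data   []byte // current mapping; nil when released
	skew   int    // distance from the page-aligned mmap start to off
}

// bytes returns the mapped region, re-mapping it on demand if it was
// previously released.
func (r *regionMap) bytes() ([]byte, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.data != nil {
		return r.data[r.skew : r.skew+r.length], nil
	}
	pageSize := int64(os.Getpagesize())
	alignedOff := r.off &^ (pageSize - 1) // round down to a page boundary
	r.skew = int(r.off - alignedOff)
	m, err := syscall.Mmap(int(r.f.Fd()), alignedOff, r.skew+r.length,
		syscall.PROT_READ, syscall.MAP_SHARED)
	if err != nil {
		return nil, err
	}
	r.data = m
	return r.data[r.skew : r.skew+r.length], nil
}

// release unmaps the region, returning its address space to the OS
// until the next call to bytes().
func (r *regionMap) release() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.data == nil {
		return nil
	}
	err := syscall.Munmap(r.data)
	r.data = nil
	return err
}
```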

This solution did a good job of powering through the address space limit. Care needs to be taken to set the shard duration low enough that compaction jobs don't consume too large a chunk of the address space, and that individual TSM files stay well below the 3.4 GB limit. See the monitoring output below, specifically the 6.3 GB aggregate size and the sub-GB VM peak usage. To generate the database, I used influx-stress with a modification to use randomly changing data instead of incrementing data that would otherwise be compacted to almost nothing via RLE.
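
The idea behind that influx-stress modification is simply to write field values that vary randomly rather than monotonically, so the TSM engine cannot run-length-encode them down to almost nothing. A hypothetical illustration of the point generation (not the actual patch):

```go
// Illustration only: emit line-protocol points with randomly varying
// values so they cannot be compressed away by RLE.
package main

import (
	"fmt"
	"math/rand"
	"time"
)

func main() {
	ts := time.Now().UnixNano()
	for i := 0; i < 5; i++ {
		// A random value per point defeats RLE; an incrementing
		// counter would compress to almost nothing.
		fmt.Printf("cpu,host=server%02d value=%f %d\n",
			i%4, rand.Float64()*100, ts+int64(i))
	}
}
```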

Every 20s: du -hs /LASERINFO/influxdb/data/stress/autogen/ && du -hs /LASERINFO/influxdb/data/stress/autogen/* && cat /proc/904/status && tail -n 4 /LASERINFO/dan_influxlo    2018-11-05 15:30:11

6.3G    /LASERINFO/influxdb/data/stress/autogen/
207.4M  /LASERINFO/influxdb/data/stress/autogen/10
207.7M  /LASERINFO/influxdb/data/stress/autogen/11
208.1M  /LASERINFO/influxdb/data/stress/autogen/12
207.7M  /LASERINFO/influxdb/data/stress/autogen/13
208.0M  /LASERINFO/influxdb/data/stress/autogen/14
177.8M  /LASERINFO/influxdb/data/stress/autogen/15
186.2M  /LASERINFO/influxdb/data/stress/autogen/16
183.9M  /LASERINFO/influxdb/data/stress/autogen/17
207.3M  /LASERINFO/influxdb/data/stress/autogen/18
207.4M  /LASERINFO/influxdb/data/stress/autogen/19
207.8M  /LASERINFO/influxdb/data/stress/autogen/20
208.0M  /LASERINFO/influxdb/data/stress/autogen/21
207.9M  /LASERINFO/influxdb/data/stress/autogen/22
208.2M  /LASERINFO/influxdb/data/stress/autogen/23
208.1M  /LASERINFO/influxdb/data/stress/autogen/24
208.1M  /LASERINFO/influxdb/data/stress/autogen/25
177.1M  /LASERINFO/influxdb/data/stress/autogen/26
207.7M  /LASERINFO/influxdb/data/stress/autogen/27
207.4M  /LASERINFO/influxdb/data/stress/autogen/28
207.7M  /LASERINFO/influxdb/data/stress/autogen/29
208.4M  /LASERINFO/influxdb/data/stress/autogen/30
207.7M  /LASERINFO/influxdb/data/stress/autogen/31
208.1M  /LASERINFO/influxdb/data/stress/autogen/32
207.9M  /LASERINFO/influxdb/data/stress/autogen/33
208.0M  /LASERINFO/influxdb/data/stress/autogen/34
207.9M  /LASERINFO/influxdb/data/stress/autogen/35
207.1M  /LASERINFO/influxdb/data/stress/autogen/36
206.8M  /LASERINFO/influxdb/data/stress/autogen/37
204.6M  /LASERINFO/influxdb/data/stress/autogen/38
31.8M   /LASERINFO/influxdb/data/stress/autogen/39
134.3M  /LASERINFO/influxdb/data/stress/autogen/7
160.2M  /LASERINFO/influxdb/data/stress/autogen/8
162.2M  /LASERINFO/influxdb/data/stress/autogen/9
Name:   influxd
Umask:  0022
State:  S (sleeping)
Tgid:   904
Ngid:   0
Pid:    904
PPid:   898
TracerPid:      0
Uid:    0       0       0       0
Gid:    0       0       0       0
FDSize: 256
Groups: 0
VmPeak:   999248 kB
VmSize:   897072 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:    215724 kB
VmRSS:    124044 kB
RssAnon:          107592 kB
RssFile:           16452 kB
RssShmem:              0 kB
VmData:   174904 kB
VmStk:       136 kB
VmExe:      9124 kB
VmLib:         4 kB
VmPTE:       264 kB
VmPMD:         0 kB
VmSwap:        0 kB
Threads:        12
SigQ:   0/7913
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: fffffffe3bfa3a00
SigIgn: 0000000000000000
SigCgt: fffffffe7fc1feff
CapInh: 0000000000000000
CapPrm: 0000003fffffffff
CapEff: 0000003fffffffff
CapBnd: 0000003fffffffff
CapAmb: 0000000000000000
Cpus_allowed:   3
Cpus_allowed_list:      0-1
voluntary_ctxt_switches:        1014
nonvoluntary_ctxt_switches:     605
[httpd] 172.28.169.86 - - [05/Nov/2018:15:30:00 +0000] "POST /write?db=stress HTTP/1.1" 204 0 "-" "fasthttp" a8a0fabb-e10f-11e8-8224-000a35001e53 2286904
[httpd] 172.28.169.86 - - [05/Nov/2018:15:30:02 +0000] "POST /write?db=stress HTTP/1.1" 204 0 "-" "fasthttp" a9ffac5a-e10f-11e8-8225-000a35001e53 2220618
[httpd] 172.28.169.86 - - [05/Nov/2018:15:30:05 +0000] "POST /write?db=stress HTTP/1.1" 204 0 "-" "fasthttp" ab577bfc-e10f-11e8-8226-000a35001e53 2534072
[httpd] 172.28.169.86 - - [05/Nov/2018:15:30:07 +0000] "POST /write?db=stress HTTP/1.1" 204 0 "-" "fasthttp" acdba48f-e10f-11e8-8227-000a35001e53 1914275

While writing data to the database worked great, I had a serious problem with querying the data. I encountered frequent, unpredictable SIGSEGVs when running queries with the influx client. I believe this had something to do with multiple cursors entering the same reader.go functions and a problem with the access counters that allowed the free() function to release an mmap while another cursor still needed it. I'm not super competent with the synchronization methods in Go, so I could have messed this part up.
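
For discussion, one way to make the deferred free() safe against that kind of race (a sketch under my assumptions, not the branch's actual code) is to guard the access counter and the release decision with a single lock, so the block data can never be unmapped while a cursor still holds a reference:

```go
// Hypothetical sketch: reference counting around the block-data mapping
// so release can never race with an in-flight reader. Names are
// illustrative; the real accessor is mmapAccessor in reader.go.
package tsmsketch

import "sync"

// releasable is the minimal behavior the guard needs from the mapping
// (the regionMap sketch above satisfies it).
type releasable interface {
	bytes() ([]byte, error)
	release() error
}

type blockAccessor struct {
	mu       sync.Mutex
	accesses int // number of in-flight readers of the block data
	region   releasable
}

// acquire pins the block-data mapping and returns it. Every successful
// acquire must be paired with exactly one done.
func (a *blockAccessor) acquire() ([]byte, error) {
	a.mu.Lock()
	defer a.mu.Unlock()
	b, err := a.region.bytes()
	if err != nil {
		return nil, err
	}
	a.accesses++
	return b, nil
}

// done decrements the reader count and, once nobody is left, releases
// the block-data mmap so its address space can be reused.
func (a *blockAccessor) done() error {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.accesses--
	if a.accesses == 0 {
		return a.region.release()
	}
	return nil
}
```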

fluffynukeit (Author) commented

This issue is superseded by #12362.

stale bot commented Jul 23, 2019

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the wontfix label Jul 23, 2019

stale bot commented Jul 30, 2019

This issue has been automatically closed because it has not had recent activity. Please reopen if this issue is still important to you. Thank you for your contributions.
