Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Redis profiling experiment #442

Merged
merged 5 commits into from
Dec 8, 2015
Merged

Redis profiling experiment #442

merged 5 commits into from
Dec 8, 2015

Conversation

urso
Copy link

@urso urso commented Dec 4, 2015

Did play with golang pprof tool today + redis pcap files. This PR hugely reduces number of objects allocated by redis-module itself. When replaying redis traffic only object allocations are mostly dominated by the json encoder (the output worker directly calls console outputer doing json encoding):

(pprof) top20 -cum
2809008 of 6861983 total (40.94%)
Dropped 48 nodes (cum <= 34309)
Showing top 20 nodes out of 61 (cum >= 1385433)
    flat  flat%   sum%        cum   cum%
        0     0%     0%    6860618   100%  runtime.goexit
        0     0%     0%    3767472 54.90%  github.com/elastic/beats/libbeat/publisher.(*messageWorker).run
        0     0%     0%    2834909 41.31%  github.com/elastic/beats/libbeat/publisher.(*outputWorker).onBulk
        0     0%     0%    2834909 41.31%  github.com/elastic/beats/libbeat/publisher.(*outputWorker).onMessage
    229379  3.34%  3.34%    2834909 41.31%  github.com/elastic/beats/libbeat/publisher.(*outputWorker).sendBulk
    56608  0.82%  4.17%    2605530 37.97%  encoding/json.Marshal
        0     0%  4.17%    2605530 37.97%  github.com/elastic/beats/libbeat/outputs.(*bulkOutputAdapter).BulkPublish
        0     0%  4.17%    2605530 37.97%  github.com/elastic/beats/libbeat/outputs/console.(*console).PublishEvent
        0     0%  4.17%    2578716 37.58%  encoding/json.(*encodeState).marshal
        0     0%  4.17%    2578716 37.58%  encoding/json.(*encodeState).reflectValue
        0     0%  4.17%    2578716 37.58%  encoding/json.(*mapEncoder).(encoding/json.encode)-fm
    81922  1.19%  5.36%    2578716 37.58%  encoding/json.(*mapEncoder).encode
        0     0%  5.36%    2220027 32.35%  github.com/elastic/beats/packetbeat/beat.(*Packetbeat).Run.func1
    196612  2.87%  8.23%    2220027 32.35%  github.com/elastic/beats/packetbeat/sniffer.(*SnifferSetup).Run
    71010  1.03%  9.26%    1849392 26.95%  github.com/elastic/beats/packetbeat/decoder.(*DecoderStruct).DecodePacketData
    377929  5.51% 14.77%    1774285 25.86%  github.com/elastic/beats/packetbeat/protos/tcp.(*Tcp).Process
1458197 21.25% 36.02%    1458197 21.25%  reflect.unsafe_New
        0     0% 36.02%    1396356 20.35%  github.com/elastic/beats/packetbeat/protos/redis.(*Redis).Parse
        0     0% 36.02%    1396356 20.35%  github.com/elastic/beats/packetbeat/protos/tcp.(*TcpStream).addPacket
    337351  4.92% 40.94%    1385433 20.19%  github.com/elastic/beats/packetbeat/protos/redis.(*Redis).doParse

TODO: some benchmarking on redis module to check if allocations really make a big difference.

changes

  • added commandline flag '-httpprof' to start HTTP pprof endpoint for doing
    some live profiling of beat services (heap profiling, cpu profiling)

  • add streambuf.AppendWithCapLimits which guarantees newly allocated
    buffers have at least configured capacity in case the buffer
    must grow when appending new content (excert more control on buffer sizing)
    => not used, as number of objects allocated was minor in contrast to
    other object types. Plus bad policy can increase overal memory usage
    dramatically due to pointers back into original slice

  • add common.NetString basically being []byte. Difference to []byte is, this
    type implements the encoding.TextMarhsaller interface. NetString objects
    will be handled by JSON encoder like normal strings, thusly no allocation
    and copy is required when extracting strings from parsed messages.
    NetString being []byte will directly point into original message buffer.

    NetString also provides UnsafeString to cast a NetString into a string
    without allocation + copy

    => Using NetString over string reduce allocations + copy operations +
    less object to be GCed

  • remove some unwanted allocation in async publisher client

  • redis 'optimizations':

    • replace all string types with common.NetString to reduce allocations
    • pre-allocate command buffer holding command + 4 arguments by default
    • remove some more allocations + copies by 'low-level code'
    • remove transaction object type (not required anymore due to correlation
      supporting piping), but generate event type directly from correlated
      messages => reduces allocation once more + removes computation of unused
      fields

@urso urso added the discuss Issue needs further discussion. label Dec 4, 2015
func (n NetString) UnsafeString() string {
bh := (*reflect.SliceHeader)(unsafe.Pointer(&n))
sh := reflect.StringHeader{Data: bh.Data, Len: bh.Len}
return *(*string)(unsafe.Pointer(&sh))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Go performs a copy when converting from []byte to string to guarantee that the contents of the string can not be changed by mutating the []byte since the contract in Go is that strings are immutable.

This seems like a good optimization to use when you know for sure that the returned string is not going to be retained (where immutability guarantees might be required). The method should get updated to document when and how to safely use it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure we want to get ourselves in the Unsafe territory. How much do we win in terms of performance? Perhaps we can find another way to avoid the copy by refactoring at an upper layer.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not keen about unsafe territory either. That's why introduced NetString over unsafe copying all []byte into string. Purpose of this method is to be used with map lookups (map keys can not be []byte, but must be string) in cases of compiler not optimizing []byte into string conversion. Optimized version in disassembly contains runtime.slicebytetostringtmp, unoptimized with copy will be runtime.slicebytetostring).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not required anymore. Rather remove it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you intend to remove the method before merging?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh, I thought I removed it. Will be removed.

@andrewkroh
Copy link
Member

I like the -httpprof flag. I am always modifying the source to drop in the HTTP server to enable pprof.

@urso
Copy link
Author

urso commented Dec 4, 2015

yep, also like to have -httpprof. you can run packetbeat with '-httpprof :6060' to have to locally on port 6060 ;)

@urso
Copy link
Author

urso commented Dec 4, 2015

added some very simple/small benchmarks for redis parser only (does not take the removed allocations for removal of redundant transaction object into account):

old parser (using string):

BenchmarkParserNoArgsResult-4    1000000          1414 ns/op         464 B/op          9 allocs/op
BenchmarkParserArrayRequest-4    1000000          2108 ns/op         592 B/op         15 allocs/op
BenchmarkParserArrayResponse-4   1000000          2420 ns/op         688 B/op         17 allocs/op
BenchmarkParserArrayNested-4     1000000          3223 ns/op         896 B/op         26 allocs/op
BenchmarkParserArray2Passes-4     500000          3390 ns/op         832 B/op         24 allocs/op

new parser (using NetString):

BenchmarkParserNoArgsResult-4    1000000          1338 ns/op         448 B/op          6 allocs/op
BenchmarkParserArrayRequest-4    1000000          1551 ns/op         448 B/op          6 allocs/op
BenchmarkParserArrayResponse-4   1000000          1598 ns/op         464 B/op          6 allocs/op
BenchmarkParserArrayNested-4     1000000          2263 ns/op         640 B/op         13 allocs/op
BenchmarkParserArray2Passes-4    1000000          2302 ns/op         576 B/op         10 allocs/op

Allocations and runtime are reduced (some allocations are due to test setup).

@urso urso force-pushed the prof/redis branch 2 times, most recently from 4206e37 to 32c7ea6 Compare December 5, 2015 23:14
@urso urso added the review label Dec 6, 2015
@tsg tsg added the Packetbeat label Dec 7, 2015
@urso urso mentioned this pull request Dec 7, 2015
@tsg
Copy link
Contributor

tsg commented Dec 7, 2015

LGTM! Do you want to cleanup history before merging?

@tsg
Copy link
Contributor

tsg commented Dec 7, 2015

Also a line in the Changelog about the Redis parser being faster would be good.

@urso
Copy link
Author

urso commented Dec 7, 2015

let me cleanup first

@urso
Copy link
Author

urso commented Dec 8, 2015

cleaned up history + guarded all debug statements.

New benchmark results:

BenchmarkParserNoArgsResult-4    1000000          1069 ns/op         368 B/op          3 allocs/op
BenchmarkParserArrayRequest-4    1000000          1292 ns/op         368 B/op          3 allocs/op
BenchmarkParserArrayResponse-4   1000000          1331 ns/op         384 B/op          3 allocs/op
BenchmarkParserArrayNested-4     1000000          1457 ns/op         400 B/op          4 allocs/op
BenchmarkParserArray2Passes-4    1000000          1804 ns/op         416 B/op          4 allocs/op


func init() {
memprofile = flag.String("memprofile", "", "Write memory profile to this file")
cpuprofile = flag.String("cpuprofile", "", "Write cpu profile to file")
httpprof = flag.String("httpprof", "", "Start pprof http server")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This option needs added to the docs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@monicasarbu
Copy link
Contributor

That's incredible. Great work, Steffen!

urso added 3 commits December 8, 2015 14:09
- added commandline flag '-httpprof' to start HTTP pprof endpoint for doing
  some live profiling of beat services (heap profiling, cpu profiling)

- add streambuf.AppendWithCapLimits which guarantees newly allocated
  buffers have at least configured capacity in case the buffer
  must grow when appending new content (excert more control on buffer sizing)
  => not used, as number of objects allocated was minor in contrast to
     other object types. Plus bad policy can increase overal memory usage
     dramatically due to pointers back into original slice

- add common.NetString basically being []byte. Difference to []byte is, this
  type implements the encoding.TextMarhsaller interface. NetString objects
  will be handled by JSON encoder like normal strings, thusly no allocation
  and copy is required when extracting strings from parsed messages.
  NetString being []byte will directly point into original message buffer.

  NetString also provides UnsafeString to cast a NetString into a string
  without allocation + copy

  => Using NetString over string reduce allocations + copy operations +
     less object to be GCed

- remove some unwanted allocation in async publisher client

- redis 'optimizations':
  - replace all string types with common.NetString to reduce allocations
  - pre-allocate command buffer holding command + 4 arguments by default
  - remove some more allocations + copies by 'low-level code'
  - remove transaction object type (not required anymore due to correlation
    supporting piping), but generate event type directly from correlated
    messages => reduces allocation once more + removes computation of unused
    fields
@ruflin
Copy link
Contributor

ruflin commented Dec 8, 2015

@urso As @travis-ci seems to be broken at the moment I suggest to rely on our Jenkins build.

andrewkroh added a commit that referenced this pull request Dec 8, 2015
Redis profiling experiment
@andrewkroh andrewkroh merged commit 8a45017 into elastic:master Dec 8, 2015
@urso urso deleted the prof/redis branch December 8, 2015 18:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discuss Issue needs further discussion. Packetbeat review
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants