-
Notifications
You must be signed in to change notification settings - Fork 7
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rewrite representation to a sorted binary list and embed it #33
Conversation
aef8000
to
4324ff2
Compare
6dace2c
to
6e80636
Compare
This result in a > ls -l old new
-rwxr-xr-x 1 hugo hugo 10398323 Dec 19 02:40 old*
-rwxr-xr-x 1 hugo hugo 6954814 Dec 19 02:39 new* I think the checks keep failing because the source is serving versions of the dataset randomly. Trying with a locally curled file shows no change. |
6e80636
to
323ec8c
Compare
323ec8c
to
5e1ecaf
Compare
The binary file isn't very optimized. it still compress good (~4.5x using
We could solve this with some kind of tree which could dedup leading parts of the network. But this is also significantly more complicated. I think it's fine to belive distributors (dist.ipfs.tech, docker, ...) will want to compress the executable. Then theses savings can be seen at the whole executable compression stage. |
5e1ecaf
to
ba0de38
Compare
@prestonvanloon I would love for you to review this PR 🙏. |
We could also maybe optimize this better by truncating networks to All possible place to win a bit of space if you (future reader) feel this is needed. |
This is also significantly faster, new:
old:
|
6cba0a8
to
5c0269e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- agree this is probably the right level of compression / complexity in generation to aim for here now.
- a test to validate some sanity-checked network lookups on the generated ap would be good.
For reference, in playing with an ipv4 mapping of similar complexity doing some of the tree pruning complexity was able to get things down another ~3x - just doing this level had the v4 table at ~6mb and aggressive tree work gets it down to 2mb before gzip.
5c0269e
to
e458e30
Compare
@willscott there are already tests: https://github.com/libp2p/go-libp2p-asn-util/pull/33/files#diff-94c5a8df386d715f15a01b4c61e58fe44944002da5513c966a602c93aad118c2R11-R32 Is this what you had in mind ? |
I have an other branch which gets it down to the ~500KiB range before zstd but it is extremely significantly more complicated. |
Here are the compression ratios with the various levels of zstd:
I think this is fine and we don't need more aggressive tree packing. |
e458e30
to
f65e57d
Compare
This is now a binary format which is embeded as a string, this remove the need to run code at init or when lazy initializing. It also fixes a bug where we would incorrectly assume the dataset only reports CIDR alligned ranges, however it does not. This result in a `1.56MiB` binary file for the dataset. It is embedded as a string into the `rodata` part of the executable. This remove the runtime memory cost (as this is usually mmaped directly from disk) and all the init execution. It also removes some dependencies, in practice here is the difference (tested with `go test -c`): ```console > ls -l old new -rwxr-xr-x 1 hugo hugo 10398323 Dec 19 02:40 old* -rwxr-xr-x 1 hugo hugo 6954814 Dec 19 02:39 new* ``` This is also significantly faster, new: ``` goos: linux goarch: amd64 pkg: github.com/libp2p/go-libp2p-asn-util cpu: AMD Ryzen 5 3600 6-Core Processor BenchmarkAsnForIPv6-12 21599720 56.85 ns/op ``` old: ``` BenchmarkAsnForIPv6-12 9417902 126.3 ns/op ```
f65e57d
to
c11e9d5
Compare
Suggested version: Changes in diff --git a/go.mod b/go.mod
index eb3df07..7ac3e35 100644
--- a/go.mod
+++ b/go.mod
@@ -17,7 +17,7 @@ require (
github.com/multiformats/go-varint v0.0.5 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/spaolacci/murmur3 v1.1.0 // indirect
- golang.org/x/crypto v0.0.0-20190611184440-5c40567a22f8 // indirect
- golang.org/x/sys v0.0.0-20190412213103-97732733099d // indirect
+ golang.org/x/crypto v0.1.0 // indirect
+ golang.org/x/sys v0.1.0 // indirect
gopkg.in/yaml.v2 v2.2.2 // indirect
)
Cutting a Release (and modifying non-markdown files)This PR is modifying both Automatically created GitHub ReleaseA draft GitHub Release has been created. |
I broke the API eventho I didn't meant to, I thought I was safe because the gorelease bot didn't reported anything but it seems to be wrong: #33 (comment) > gocompat says: > ``` > Your branch is up to date with 'origin/master'. > ```
I broke the API eventho I didn't meant to, I thought I was safe because the gorelease bot didn't reported anything but it seems to be wrong: #33 (comment) > gocompat says: > ``` > Your branch is up to date with 'origin/master'. > ```
func AsnForIPv6Network(network uint64) (asn uint32) { | ||
n := uint(len(dataset)) / entrySize | ||
var i, j uint = 0, n | ||
for i < j { | ||
h := (i + j) / 2 // wont overflow since the list can't be that large | ||
start, end, asn := readEntry(h) | ||
if start <= network { | ||
if network <= end { | ||
return asn | ||
} | ||
i = h + 1 | ||
} else { | ||
j = h | ||
} | ||
} | ||
|
||
return &asnStore{cr}, nil | ||
if i >= n { | ||
return 0 | ||
} | ||
start, end, asn := readEntry(i) | ||
if start <= network && network <= end { | ||
return asn | ||
} | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - Thank you!
Include memory usage fixes from libp2p/go-libp2p-asn-util#33.
Include memory usage fixes from libp2p/go-libp2p-asn-util#33.
Required for libp2p/go-libp2p#2664
Closes #32
Closes #31
Closes #8
Closes #5
Closes #4
This is now a binary format which is embeded as a string, this remove the need to run code at init or when lazy initializing. It also fixes a bug where we would incorrectly assume the dataset only reports CIDR alligned ranges, however it does not.
Embedded size: 1.6MB
In-memory size: 0MB
Load into memory: 0ms
Lookup Performance: ~50ns (2.25x improvement).