tun: use add-with-carry in checksumNoFold() #107

Open: dinhngtu wants to merge 1 commit into master
@dinhngtu dinhngtu commented Jul 13, 2024

Reopened with a fix for initial values.

This PR proposes a speedup for checksumNoFold().

Use parallel summation in native byte order, as described in RFC 1071;
an add-with-carry operation adds 4 words per operation.
A byte swap is performed before and after checksumming for compatibility with the old checksumNoFold().
With this, checksum() runs about 1.3x to 1.95x faster in the benchmarks below, depending on packet size.

Add unit tests that compare the result against a per-word implementation and the old big-endian implementation.

Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz

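| Size | OldTime | NewTime | Speedup |
|------|---------|---------|----------|
| 64 | 12.64 | 9.183 | 1.376456 |
| 128 | 18.52 | 12.72 | 1.455975 |
| 256 | 31.01 | 18.13 | 1.710425 |
| 512 | 54.46 | 29.03 | 1.87599 |
| 1024 | 102 | 52.2 | 1.954023 |
| 1500 | 146.8 | 81.36 | 1.804326 |
| 2048 | 196.9 | 102.5 | 1.920976 |
| 4096 | 389.8 | 200.8 | 1.941235 |
| 8192 | 767.3 | 413.3 | 1.856521 |
| 9000 | 851.7 | 448.8 | 1.897727 |
| 9001 | 854.8 | 451.9 | 1.891569 |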

AMD EPYC 7352 24-Core Processor

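| Size | OldTime | NewTime | Speedup |
|------|---------|---------|----------|
| 64 | 9.159 | 6.949 | 1.318031 |
| 128 | 13.59 | 10.59 | 1.283286 |
| 256 | 22.37 | 14.91 | 1.500335 |
| 512 | 41.42 | 24.22 | 1.710157 |
| 1024 | 81.59 | 45.05 | 1.811099 |
| 1500 | 120.4 | 68.35 | 1.761522 |
| 2048 | 162.8 | 90.14 | 1.806079 |
| 4096 | 321.4 | 180.3 | 1.782585 |
| 8192 | 650.4 | 360.8 | 1.802661 |
| 9000 | 706.3 | 398.1 | 1.774177 |
| 9001 | 712.4 | 398.2 | 1.789051 |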
