-
Notifications
You must be signed in to change notification settings - Fork 840
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use unsafe to speed up string marshaling #6433
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6433 +/- ##
============================================
- Coverage 90.87% 90.83% -0.05%
- Complexity 6169 6197 +28
============================================
Files 678 680 +2
Lines 18507 18568 +61
Branches 1818 1829 +11
============================================
+ Hits 16819 16866 +47
- Misses 1153 1160 +7
- Partials 535 542 +7 ☔ View full report in Codecov by Sentry. |
What will happen with java 22, when Unsafe is deprecated? |
Looking at https://openjdk.org/jeps/471 I'd say nothing will change before jdk 25. As unsafe usage is optional even then not much will change. Eventually users may need to opt in by allowing access to private members of java.lang.String so we could read the underlying byte array. Java agent can probably get access without requiring users to do anything. In coming jdk releases using vector api will allow us to further improve the performance. |
We definitely would not want to force people to do any additional work to use the SDK out of the box. For that reason, I'm a bit nervous about making this optimization. |
I believe the optimization only takes effect if Unsafe is available, @laurit can you clarify? |
Yes the intent is to use this optimization only when unsafe is available and has the required methods, otherwise just fall back to the version that doesn't require unsafe. If we need to further future proof it we could consider limiting it by jvm version. If
Given the way jdk authors have been trying to lock down access to jdk internals since jdk 9 at one point user code won't be able to access private fields in jdk classes unless user explicitly allows it with |
Interesting idea! I'll skip the obvious (Unsafe is unsafe, and has a limited shelf life at this point) and get right into the gritty details: If I've understood it right, there are at least three distinct things here, which are conflated in the presentation of the benchmark data and which may be interesting to consider separately:
|
These changes are already done and merged. I think they are out of the scope of this PR.
They aren't, they were copied from the existing span marshaling benchmark that creates test data with either 16 or 512 spans. This benchmark was added in a previous PR that introduced stateless marshalers. I think I removed 16 just to make the benchmark run faster.
If you look at the jdk code https://github.com/openjdk/jdk17u/blob/68caeca1a5f265ca30a7e6d2a62127d458a3473e/src/java.base/share/classes/java/lang/StringCoding.java#L37 you'll see that the method with the simple loop is using
The 24 bytes should come from |
My understanding is that there are two dimensions. 1. Whether or not the stateless marshaling is enabled, which is triggered by setting
So you only get the 0.032 us/op case if you opt in @laurit any indea why the ascii stateless/safe scenario seems to be so much worse that the latin1 or unicode equivalent? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couple of comments but I'm in favor of this. I get that unsafe is unsafe, but:
- We already have precedent for using it with
BatchSpanProcessor
- This is special case where we can benefit quite a bit from its usage
- The usage is limited to a small area of code which is well defined / contained, with fallback logic when unsafe is not available
...mon/src/test/java/io/opentelemetry/exporter/internal/marshal/StatelessMarshalerUtilTest.java
Show resolved
Hide resolved
return encodedUtf8Length(string); | ||
} | ||
|
||
/** Returns the count of bytes with negative value. */ | ||
private static int countNegative(byte[] bytes) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took me a bit to parse what's going on here but I think I get it:
- We want to use unsafe to read the value of a string's internal
byte[]
directly when a string is composed of all latin1 characters that fit within 7 bits (first 128 code points) - In order to determine if a string meets this condition as fast as possible, we use unsafe and some clever (seriously clever - where did you come up with this??) shifting to read through the string in chunks of 8 bytes, counting the number of times when a byte has a 1 in the most significant bit and therefore doesn't fit in 7 bits. Reading 8 bytes at a time is faster than reading 1 byte at a time, which is what we do currently when trying to compute the size of a string with
reusable_data
memory mode. - Later, when we are serializing the string, we check if the string is latin1 and had a size which was the same as the string length. If true, we can use unsafe to write the string's internal
byte[]
to the output stream. If false, then fallback to the existing serialization logic.
Does this sound like a correct characterization @laurit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this sound like a correct characterization @laurit?
yes
We want to use unsafe to read the value of a string's internal byte[] directly when a string is composed of all latin1 characters that fit within 7 bits (first 128 code points)
when we read they byte array we don't know whether it is going to be 7bit or not
where did you come up with this?
Typically such loops are sped up by using vector instructions that allow processing multiple bytes simultaneously. Processing bytes on long a time is similar to using vector instructions. It would be interesting to see what effect there would be when this method is written using the vector api preview feature in recent jdks.
If true, we can use unsafe to write the string's internal byte[] to the output stream
That part doesn't use unsafe, just plain System.arraycopy
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typically such loops are sped up by using vector instructions that allow processing multiple bytes simultaneously. Processing bytes on long a time is similar to using vector instructions. It would be interesting to see what effect there would be when this method is written using the vector api preview feature in recent jdks.
Fascinating!
.../common/src/main/java/io/opentelemetry/exporter/internal/marshal/StatelessMarshalerUtil.java
Outdated
Show resolved
Hide resolved
.../common/src/main/java/io/opentelemetry/exporter/internal/marshal/StatelessMarshalerUtil.java
Outdated
Show resolved
Hide resolved
@@ -14,6 +14,7 @@ dependencies { | |||
api(project(":sdk-extensions:autoconfigure-spi")) | |||
|
|||
compileOnly(project(":sdk:common")) | |||
compileOnly(project(":exporters:common:compile-stub")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering if you can explain why a separate module is needed for this, even tho its not published.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I try to compile a class that uses Unsafe
with javac from jdk 17 targeting jdk 8 it fails with
javac --release 8 Test.java
Test.java:3: error: package sun.misc does not exist
System.err.println(sun.misc.Unsafe.class.getName());
curiously when targeting jdk 11 it gets a warning instead
javac --release 11 Test.java
Test.java:3: warning: Unsafe is internal proprietary API and may be removed in a future release
System.err.println(sun.misc.Unsafe.class.getName());
Not using --release 8
also results in a warning
javac --source 8 --target 8 Test.java
warning: [options] bootstrap class path not set in conjunction with -source 8
Test.java:3: warning: Unsafe is internal proprietary API and may be removed in a future release
System.err.println(sun.misc.Unsafe.class.getName());
Creating a copy of Unsafe
just for compiling felt like the easiest way to make this work. There is a comment in our version of Unsafe
/**
* sun.misc.Unsafe from the JDK isn't found by the compiler, we provide out own trimmed down version
* that we can compile against.
*/
I could add a similar comment here also. Is this what you were looking for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah that would be helpful.
yes, my concern is the future point where unsafe is retired or discouraged in the JDK and reusable_data is normal. The gc saving is nice if gc is a problem e.g. single core containers, but where gc is low cost e.g. enough spare cores to run it in the background, then the added CPU cost of the stateless serialization on the critical path by default may be unwelcome. As long as stateful is still available as an option it's not a big deal, the default can be tweaked fairly easily. |
perhaps, but I'm not clear what's allowing it to elide the allocation in some cases but not others. If the code was polymorphic on the ascii vs latin1/unicode choice then it could be optimizing the two functions differently, but I don't see where it is. Log the compiler execution trace perhaps? It's not bad, it's just annoyingly inexplicable. |
My understand is that this is because JDK optimizes for the ascii case. See https://github.com/openjdk/jdk17/blob/4afbcaf55383ec2f5da53282a1547bac3d099e9d/src/java.base/share/classes/java/lang/String.java#L1264 if (!StringCoding.hasNegatives(val, 0, val.length))
return Arrays.copyOf(val, val.length); The |
exporters/common/src/main/java/io/opentelemetry/exporter/internal/marshal/UnsafeAccess.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One last comment about providing a property for explicit opt out. Seems like a conservative way to protect against any user concerns or issues.
To marshal strings without allocating extra memory we need to implement our own methods for estimating the byte size of utf-8 encoded string and encoding the string to utf8. Unfortunately jit compiler isn't able to optimize these too well. This PR explores using unsafe to speed up utf8 encoding. The focus is on strings that use only latin1 and are stored inside a byte array in the
java.lang.String
class since jdk9. Unsafe is used to access that byte array. Based on the byte array we determine whether the string is composed only of 7-bit characters, if so we can copy they byte array directly to output as no additional encoding is needed. To determine whether the string only has 7-bit characters we count bytes that have negative value, the count of such bytes corresponds to characters that would be encoded as 2 bytes in utf8. Unsafe is used to let this loop process 8 bytes at a time.