faster `String` allocation #19449

JeffBezanson · 2016-11-29T16:09:03Z

This replaces the Array inside String with a length field, plus directly asking the GC to allocate space for the string data in-line. ~~This is very incomplete so far, and is only meant to demonstrate the basic idea.~~ The following gist has a hacked-up working version that can be used within an existing build: https://gist.github.com/JeffBezanson/6b7f1785bb7f2509cbd5d4ff1380556d

This makes string allocation ~2x faster, uses 1 object instead of 2, and uses around half the space for short strings. It's not pretty, but I think we should go with something like this to fix the immediate performance problem. Also, removing the .data field as soon as possible will make it easier to change the representation in the future if we come up with something nicer.
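To make the layout change concrete, here is a minimal C sketch of the idea described above: a length word followed directly by the bytes, obtained in a single allocation. It uses plain `malloc` rather than the Julia GC, and every name in it (`byte_array_t`, `boxed_string_t`, `inline_string_t`, `inline_string_new`, `inline_string_alloc_size`) is hypothetical. The trailing NUL byte is also an assumption of the sketch, not something stated in this PR.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical "before" layout: the string object points at a separate
   byte-array object, so creating a string needs two objects. */
typedef struct {
    size_t   len;
    uint8_t *bytes;
} byte_array_t;

typedef struct {
    byte_array_t *data;
} boxed_string_t;

/* Hypothetical "after" layout: a length word followed directly by the
   bytes, all inside one allocation. */
typedef struct {
    size_t  len;
    uint8_t data[];   /* flexible array member: bytes stored in-line */
} inline_string_t;

/* Total bytes requested from the allocator for a string of n bytes.
   Assumption: one extra byte is reserved for a trailing NUL. */
size_t inline_string_alloc_size(size_t n)
{
    return sizeof(inline_string_t) + n + 1;
}

/* Allocate an inline string holding a copy of n bytes from src.
   Returns NULL if the allocation fails. */
inline_string_t *inline_string_new(const char *src, size_t n)
{
    assert(src != NULL || n == 0);
    inline_string_t *s = malloc(inline_string_alloc_size(n));
    if (s == NULL)
        return NULL;
    s->len = n;
    if (n > 0)
        memcpy(s->data, src, n);
    s->data[n] = '\0';
    return s;
}
```

Calling `inline_string_new("hello", 5)` yields one heap object whose `len` is 5 and whose bytes sit immediately after the length word; the two-struct layout is shown only for contrast.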

yuyichao · 2016-11-29T16:39:21Z

base/strings/string.jl

-    d = s.data
-    i = length(d)
-    @inbounds while i > 0 && is_valid_continuation(d[i])
+    p = pointer(s)


We need a gcroot intrinsic with semantics similar to ccall to make this usage gc safe.

Yep, that would be good.

Any idea what would be a good api? I'm mainly not sure about how to express the scope a gcroot applies to.....

For now we should make the scope the whole enclosing function.

quinnj · 2016-11-29T16:52:00Z

base/boot.jl

-    data::Array{UInt8,1}
-    # required to make String("foo") work (#15120):
-    String(d::Array{UInt8,1}) = new(d)
+type String <: AbstractString


Why not immutable?

I assume so that the compiler would pass it correctly. (It'll be an isbits type otherwise)

Correct; the presence of the string data is not reflected in the layout of the type, so it would not be passed correctly. Of course that's only the beginning of the problems caused by a hack like this, requiring special cases e.g. in deepcopy.

I assume you addressed the aforementioned "problems" including the special deepcopy implementation?

johnmyleswhite · 2016-11-30T03:50:34Z

base/strings/basic.jl

@@ -57,8 +57,8 @@ function unsafe_string(p::Union{Ptr{UInt8},Ptr{Int8}})
    ccall(:jl_cstr_to_string, Ref{String}, (Ptr{UInt8},), p)
 end

-convert(::Type{Vector{UInt8}}, s::AbstractString) = String(s).data
-convert(::Type{Array{UInt8}}, s::AbstractString) = String(s).data
+#convert(::Type{Vector{UInt8}}, s::AbstractString) = String(s).data


Will there be some mechanism to get a vector of bytes equivalent to a copy of the string after this is done?

Yes, that functionality is still pretty useful.

Yes, we can definitely still have that. It might even be possible for an array to share string data in-place.

yuyichao

I think this also needs a special case in the gc push_root before the pointerfree case, so that gc_setmark will see the right size.

JeffBezanson · 2016-12-19T23:30:18Z

Some interesting issues so far:

- chomp!: changing the size of a string in place is not safe, since that can confuse the GC about how the object was allocated.
- We need non-copying vector-to-string in IOBuffer. Some options are (1) keep 16 bytes available at the start of an IOBuffer's vector so its data can be reinterpreted as a String, or (2) use a String as the storage for an IOBuffer. Both probably have the same problem as chomp!. We might need to use a bit of the length field to mark whether the string was pool allocated.
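The last idea, reserving one bit of the length field as an allocation marker, could look roughly like the following C sketch. The choice of the top bit and all `sketch_` names are assumptions made for illustration; nothing here is taken from the PR's code.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Assumption: the highest bit of the length word is used as the flag. */
#define SKETCH_POOL_FLAG ((size_t)1 << (sizeof(size_t) * CHAR_BIT - 1))

/* Pack a length and a "pool allocated" flag into one word. */
size_t sketch_pack_len(size_t len, int pooled)
{
    /* The length itself must fit below the flag bit. */
    assert((len & SKETCH_POOL_FLAG) == 0);
    return pooled ? (len | SKETCH_POOL_FLAG) : len;
}

/* Recover the length by masking the flag bit off. */
size_t sketch_unpack_len(size_t word)
{
    return word & ~SKETCH_POOL_FLAG;
}

/* Report whether the flag bit is set. */
int sketch_is_pooled(size_t word)
{
    return (word & SKETCH_POOL_FLAG) != 0;
}
```

The cost of such a scheme is that every length read has to mask the flag off, and the maximum representable length is halved.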

JeffBezanson · 2016-12-20T22:03:38Z

Next issue: this representation does not support unsafe_wrap. Wrapping an arbitrary pointer inherently requires an extra indirection, which I suspect is not worth it. We could also potentially add a PointerString type somewhere that does this.

JeffBezanson · 2016-12-21T23:03:39Z

This has passed CI and is ready for review.

The last commit implements the following scheme for copy-avoidance when converting between strings and vectors:

1. A String can be wrapped by a Vector, since Arrays already have the ability to wrap pointers owned by other objects. This is accessed via convert(Vector{UInt8}, str). This way you can still access the data as a vector without copying the whole string, paying only the cost of allocating the Vector when needed.
2. jl_gc_alloc_buf returns objects with a String-compatible layout; there is an extra word to hold the length. If a Vector contains such a buffer, it can be converted in-place to a String. This does not apply to all arrays, so we need some way to decide when to use this mode of allocation. For now, I'm using it for all byte arrays, since they often become Strings. This can be discussed.
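The two directions of this scheme can be sketched in C as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses `malloc` instead of the GC, and the types and functions (`sketch_string_t`, `sketch_vector_t`, `sketch_alloc_buf`, `sketch_vector_to_string`, `sketch_string_as_vector`) are hypothetical. The trailing NUL byte is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string layout: a length word immediately followed by bytes. */
typedef struct {
    size_t  len;
    uint8_t data[];
} sketch_string_t;

/* Hypothetical byte vector that only points at its storage. */
typedef struct {
    uint8_t *data;
    size_t   len;
} sketch_vector_t;

/* Allocate n usable bytes preceded by one reserved length word and
   return a pointer to the usable bytes (NULL on failure). */
uint8_t *sketch_alloc_buf(size_t n)
{
    sketch_string_t *raw = malloc(sizeof(sketch_string_t) + n + 1);
    if (raw == NULL)
        return NULL;
    raw->len = 0;
    return raw->data;
}

/* Direction 1: wrap a string's bytes in a vector without copying. */
sketch_vector_t sketch_string_as_vector(sketch_string_t *s)
{
    assert(s != NULL);
    sketch_vector_t v = { s->data, s->len };
    return v;
}

/* Direction 2: turn a vector whose storage came from sketch_alloc_buf
   into a string in place, by filling in the reserved length word.
   The vector gives up its storage. */
sketch_string_t *sketch_vector_to_string(sketch_vector_t *v)
{
    assert(v != NULL && v->data != NULL);
    sketch_string_t *s =
        (sketch_string_t *)(v->data - offsetof(sketch_string_t, data));
    s->len = v->len;
    s->data[s->len] = '\0';
    v->data = NULL;
    v->len = 0;
    return s;
}
```

The in-place conversion is only valid for vectors whose storage was obtained from the header-reserving allocator, which is why the text above needs a rule for when to allocate arrays this way.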

I'm seeing >=30% speedups for #19141, and the spellcheck benchmark.

We need to decide what to do about unsafe_wrap(String, p). We can remove it, or deprecate it to e.g. unsafe_wrap(ArrayString, p).

stevengj · 2016-12-21T23:17:57Z

Is there a deprecation mechanism for s.data, rather than giving an error? (This would be possible if . could be overloaded, nudge nudge.)

stevengj · 2016-12-21T23:18:46Z

base/LineEdit.jl

@@ -437,7 +437,7 @@ function splice_buffer!{T<:Integer}(buf::IOBuffer, r::UnitRange{T}, ins::Abstrac
    elseif pos > last(r)
        seek(buf, pos - length(r))
    end
-    splice!(buf.data, r + 1, ins.data) # position(), etc, are 0-indexed
+    splice!(buf.data, r + 1, convert(Vector{UInt8},ins)) # position(), etc, are 0-indexed


Would be more concise if we could just do Vector(ins).

stevengj · 2016-12-21T23:20:11Z

base/boot.jl

@@ -368,8 +372,8 @@ unsafe_write(io::IO, x::Ptr{UInt8}, nb::Int) =
 write(io::IO, x::UInt8) =
    (ccall(:jl_uv_putb, Void, (Ptr{Void}, UInt8), io_pointer(io), x); 1)
 function write(io::IO, x::String)
-    nb = sizeof(x.data)
-    unsafe_write(io, ccall(:jl_array_ptr, Ptr{UInt8}, (Any,), x.data), nb)
+    nb = x.len


Presumably sizeof(x) is defined and would be cleaner than accessing the .len field directly?

stevengj · 2016-12-21T23:38:24Z

base/datafmt.jl

    all_ascii = (D <: UInt8) || (isascii(eol) &&
                                 isascii(dlm) &&
                                 (!allow_quote || isascii(qchar)) &&
                                 (!allow_comments || isascii(cchar)))
-    if T === String && all_ascii
-        return dlm_parse(dbuff.data, eol % UInt8, dlm % UInt8, qchar % UInt8, cchar % UInt8,
+    if all_ascii && !(D <: UInt8)


D != UInt8? Why this additional condition?

I don't know. The whole all_ascii business at the top of this function can probably be removed.

stevengj · 2016-12-21T23:46:06Z

base/strings/util.jl

@@ -446,7 +448,7 @@ end
 # check for pure ASCII-ness

 function ascii(s::String)
-    for (i, b) in enumerate(s.data)
+    for (i, b) in enumerate(bytes(s))


What is bytes? I thought we were now using convert(Vector{UInt8}, s) or maybe Vector(s) or Vector{UInt8}(s)?

It's an iterator over bytes that I was trying out. I guess it's not really necessary.

stevengj · 2016-12-21T23:48:05Z

base/util.jl

@@ -576,4 +576,4 @@ a starting `crc` integer to be mixed in with the checksum.
 """
 function crc32c end
 crc32c(a::Array{UInt8}, crc::UInt32=0x00000000) = ccall(:jl_crc32c, UInt32, (UInt32, Ptr{UInt8}, Csize_t), crc, a, sizeof(a))
-crc32c(s::String, crc::UInt32=0x00000000) = crc32c(s.data, crc)
+crc32c(s::String, crc::UInt32=0x00000000) = ccall(:jl_crc32c, UInt32, (UInt32, Ptr{UInt8}, Csize_t), crc, s, sizeof(s))


Maybe just use Union{Array{UInt8},String} for the first argument in a single method?

stevengj · 2016-12-21T23:55:20Z

Needs NEWS and probably doc updates.

yuyichao · 2016-12-22T01:49:18Z

src/array.c

+        tsz = JL_ARRAY_ALIGN(tsz, JL_CACHE_BYTE_ALIGNMENT); // align whole object
+        a = (jl_array_t*)jl_gc_alloc(ptls, tsz, atype);
+        JL_GC_PUSH1(&a);
+        data = jl_gc_alloc_buf(ptls, tot);


I believe the `how` field needs to be initialized to 0 before calling into the GC again, so that the marking code won't try to trace a dangling pointer.

vtjnash · 2016-12-22T04:15:31Z

src/julia_internal.h

 #define jl_buff_tag ((uintptr_t)0x4eade800)
 STATIC_INLINE void *jl_gc_alloc_buf(jl_ptls_t ptls, size_t sz)
 {
-    return jl_gc_alloc(ptls, sz, (void*)jl_buff_tag);
+    void **buf = (void**)jl_gc_alloc(ptls, sz + sizeof(void*), (void*)jl_buff_tag);
+    return (void*)(buf + 1);


isn't this mis-aligning the buf pointer?

Yes, I suppose so. I guess we should only do this for elsz==1.
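A small C sketch of the arithmetic behind this exchange: if the underlying allocation is aligned to a boundary larger than a pointer, stepping past one pointer-sized header word leaves the payload aligned only to the pointer size. The helper names are hypothetical and the 16 and 64 byte boundaries mentioned below are illustrative values, not figures taken from the PR.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return 1 if addr is a multiple of align (align must be a power of two). */
int sketch_is_aligned(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr & (align - 1)) == 0;
}

/* Address of the payload when one pointer-sized header word precedes it,
   mirroring the `buf + 1` step on a `void **` in the diff above. */
uintptr_t sketch_payload_addr(uintptr_t base)
{
    return base + sizeof(void *);
}
```

For a base address such as 4096, the payload address is still aligned for single-byte elements and for pointer-sized elements, but no longer for 16 or 64 byte boundaries, which matches the suggestion to restrict the extra word to elsz==1.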

tkelman · 2016-12-22T05:23:54Z

Should accessing the array be a method of convert, reinterpret, or transcode? There was some resistance to using convert between strings and arrays-of-integers during the last string refactoring.

JeffBezanson · 2016-12-22T05:34:59Z

We've had a conversion from String to Vector{UInt8} that returns .data for quite a while, so I went with that.

JeffBezanson · 2016-12-22T16:26:27Z

Ah, I see; using convert contradicts the idea that Vector(x) should do what collect does. reinterpret seems like the right function but is pretty verbose.

stevengj · 2016-12-22T16:39:53Z

We could define a generic function bytes(x) = reinterpret(UInt8, x), which gives us a concise method for strings and also might be useful elsewhere?

JeffBezanson · 2017-01-05T01:50:09Z

Does anybody understand the OSX failure here?

tkelman · 2017-01-05T07:27:02Z

segfaulting with --precompiled=no

JeffBezanson · 2017-01-05T17:03:19Z

That's weird. I can't reproduce that on linux.

JeffBezanson · 2017-01-06T20:47:34Z

Looks like it was a stack overflow loading the system image. I'll try a different approach to serializing the new String type.

JeffBezanson · 2017-01-07T02:53:28Z

OK, that didn't help.

tkelman · 2017-01-08T00:29:22Z

Should be good to merge once appveyor's happy, but since I think it's idle, let's see if anything recent made much difference: @nanosoldier runbenchmarks(ALL, vs = ":master")

nanosoldier · 2017-01-08T04:32:35Z

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

JeffBezanson · 2017-01-08T06:07:29Z

Looking good. Away we go!

vtjnash · 2017-01-08T14:47:09Z

src/dump.c

+            arraylist_push(&backref_list, v);
+        jl_deserialize_struct(s, v, 1);
+        ((jl_typemap_entry_t*)v)->next = (jl_typemap_entry_t*)jl_nothing;
+        *pn = v;


missing gc_wb

I thought the gc is off here?

You don't need to create a gc stack but it is still necessary to use write barrier since there can still be old and young objects.

Doesn't a collection have to happen for an object to become old?

Yes, so if you are sure that the parent is always allocated in the same "deserialization session" it should be enough with a comment about it. Looks like it should be fine here since *pn is pointing to either the stack or a v allocated in a previous iteration?

JeffBezanson force-pushed the jb/fasterstring branch 2 times, most recently from 9b881ff to 72ffe2d Compare November 29, 2016 16:14

JeffBezanson added the breaking (This change will break code), performance (Must go faster), and strings ("Strings!") labels Nov 29, 2016

yuyichao reviewed Nov 29, 2016

quinnj reviewed Nov 29, 2016

johnmyleswhite reviewed Nov 30, 2016

yuyichao requested changes Nov 30, 2016

JeffBezanson force-pushed the jb/fasterstring branch 2 times, most recently from 133d41d to d72ffe2 Compare December 21, 2016 19:10

JeffBezanson changed the title ~~WIP/RFH: possible simple approach to faster String allocation~~ RFC: faster String allocation Dec 21, 2016

JeffBezanson force-pushed the jb/fasterstring branch from d72ffe2 to 8ae0e85 Compare December 21, 2016 20:29

stevengj reviewed Dec 21, 2016

stevengj added the needs docs (Documentation for this change is required) label Dec 21, 2016

yuyichao requested changes Dec 22, 2016

vtjnash reviewed Dec 22, 2016

JeffBezanson force-pushed the jb/fasterstring branch 2 times, most recently from 81e4251 to 8ea9828 Compare January 4, 2017 19:57

JeffBezanson force-pushed the jb/fasterstring branch from 8ea9828 to 30a7437 Compare January 5, 2017 06:04

JeffBezanson force-pushed the jb/fasterstring branch from 30a7437 to a72588a Compare January 6, 2017 20:47

JeffBezanson added 4 commits January 6, 2017 18:27

faster, smaller String allocation

e62352b

make tests independent of string representation

79bb7be

add the ability to share data between Strings and Vector{UInt8}s

ce24657

deprecate unsafe_wrap for String

c01a2cf

JeffBezanson force-pushed the jb/fasterstring branch from a72588a to c01a2cf Compare January 6, 2017 23:27

(de)serialize TypeMapEntry lists iteratively to save stack space

8c687da

JeffBezanson merged commit cfac61d into master Jan 8, 2017

tkelman deleted the jb/fasterstring branch January 8, 2017 08:14

josefsachsconning mentioned this pull request Jan 8, 2017

ERROR: type String has no field data JuliaIO/JSON.jl#183

Closed

vtjnash reviewed Jan 8, 2017

mpastell mentioned this pull request Jan 9, 2017

RFC: Make EOL handling more general #19877

Closed

5 tasks

stevengj mentioned this pull request Jan 9, 2017

Document/export copy-free string allocation? #19945

Open

Sacha0 added the deprecation (This change introduces or involves a deprecation) and needs news (A NEWS entry is required for this change) labels May 20, 2017

Sacha0 added a commit to Sacha0/julia that referenced this pull request May 20, 2017

Add NEWS.md entry for unsafe_wrap(String, ...) deprecation (JuliaLang…

ea32daa

…#19449).

tkelman pushed a commit that referenced this pull request May 24, 2017

Add NEWS.md entry for unsafe_wrap(String, ...) deprecation (#19449). (#…

a4bafff

…21993)

Sacha0 removed the needs news (A NEWS entry is required for this change) label May 25, 2017

tkelman pushed a commit that referenced this pull request Jun 3, 2017

Add NEWS.md entry for unsafe_wrap(String, ...) deprecation (#19449). (#…

9851e79

…21993) (cherry picked from commit a4bafff)
