Remove the trailing null from strings #7235
Comments
nominating for backwards compatibility
CStrings, by definition, are null-terminated, so I don't see any reason to change the name away from …
@kballard Because Rust strings, being valid UTF-8, can contain NUL. Thus, not every Rust string passed through `as_c_str` yields the C string you would expect. Naming it something other than `as_c_str` emphasizes this.
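A minimal sketch of the interior-NUL problem, written against today's `std::ffi::CString` rather than the 2013 `as_c_str` API discussed in this thread (an assumption made so the code compiles on current Rust). The helper `c_visible_len` is hypothetical; it counts the bytes a C function such as `strlen` would read before stopping at the first NUL.

```rust
use std::ffi::CString;

/// Hypothetical helper: the number of bytes a C function like `strlen`
/// would "see" before stopping at the first NUL byte.
fn c_visible_len(s: &str) -> usize {
    s.bytes().position(|b| b == 0).unwrap_or(s.len())
}

/// True if `s` can be handed to C without silent truncation.
/// `CString::new` returns `Err(NulError)` when the input has an interior NUL.
fn is_c_safe(s: &str) -> bool {
    CString::new(s).is_ok()
}
```

For example, `"hello\0world"` is a valid 11-byte Rust string, but `c_visible_len("hello\0world")` is 5 and `is_c_safe("hello\0world")` is `false`: C would only see `hello`, and `CString::new` reports this as an error instead of truncating.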
I don't think we should have a shortcut for temporarily copying to a nul-terminated string at all. Anything using … The most common use of nul-termination is likely to be paths, but those can't be represented with our string type because they aren't always valid Unicode, so it's not something the string type can deal with. In special cases where you really do want to abandon all hope of correctness, you can just push …
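One way to express an explicit temporary NUL-terminated copy that also works for byte strings such as non-UTF-8 paths, sketched with current `std::ffi` types. `with_nul_terminated` is a hypothetical helper, not an API from this thread; it copies the input, appends the NUL, and refuses input with an interior NUL rather than letting C truncate it.

```rust
use std::ffi::{CStr, CString};

/// Hypothetical helper: copy arbitrary bytes (e.g. a non-UTF-8 path) into a
/// fresh NUL-terminated buffer and lend it to `f` as a `&CStr`.
/// Returns `None` if the bytes contain an interior NUL, since C would
/// otherwise silently stop reading at that point.
fn with_nul_terminated<R>(bytes: &[u8], f: impl FnOnce(&CStr) -> R) -> Option<R> {
    // `CString::new` copies the bytes, checks for interior NULs,
    // and appends the trailing NUL.
    let owned = CString::new(bytes).ok()?;
    Some(f(owned.as_c_str()))
}
```

Because it takes `&[u8]`, a path like `b"/tmp/\xff"` that is not valid UTF-8 still works, while `b"a\0b"` yields `None`. The copy is explicit at the call site, which is the point being argued: callers pay for it only when they actually cross into C.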
@cmr: Naming it something like …
I wrote this under issue #6869, but since there is more discussion here, I'm moving it here:
Summary of the current situation
All this is done for just one reason: so that …
Making …
I really don't think it deserves even one method, because it's completely incorrect to treat UTF-8 this way. We aren't using modified UTF-8, so we have to accept that it's not usable with C string APIs. Can someone point out a situation or use case where this would actually be correct or useful?
@thestinger Good point; maybe the whole API should be based around …
While UTF-8 allows NUL, most UTF-8 strings won't have interior NULs. And the fact of the matter is, it's awfully convenient to be able to construct a NUL-terminated C string when working with FFI functions. It would be nice not to have to copy the string to a … For reference, in Cocoa, …
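The "avoid the copy when possible" idea (borrow when the bytes are already NUL-terminated, allocate otherwise) can be sketched with `Cow`. This assumes current Rust's `CStr::from_bytes_with_nul`, which succeeds only when the slice ends in a NUL and contains no other NUL; `to_c_str` itself is a hypothetical name.

```rust
use std::borrow::Cow;
use std::ffi::{CStr, CString};

/// Hypothetical helper: borrow `bytes` as a `CStr` if they already end in
/// exactly one NUL, otherwise copy them into a new `CString`.
/// Returns `None` if there is an interior NUL.
fn to_c_str(bytes: &[u8]) -> Option<Cow<'_, CStr>> {
    if let Ok(c) = CStr::from_bytes_with_nul(bytes) {
        // Fast path: already NUL-terminated, no interior NUL, no allocation.
        return Some(Cow::Borrowed(c));
    }
    // Slow path: not terminated (or malformed); copy and append a NUL.
    // `CString::new` fails if the input contains any NUL byte.
    CString::new(bytes).ok().map(Cow::Owned)
}
```

`to_c_str(b"abc\0")` borrows without allocating, `to_c_str(b"abc")` allocates a copy, and `to_c_str(b"a\0b")` returns `None`. Note that the fast path only helps when the terminator is already part of the slice, which is exactly what the trailing-null representation provided.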
Perhaps we should just implement …
@kballard: So what's an example FFI API where handling only a subset of strings correctly is still going to be okay? This kind of flaw can easily become a security vulnerability, and I don't see why every string should be a byte longer just to make it easier to write broken code. Encryption, compression, I/O and file paths are all byte-based, so the string representation having a … Nothing stops you from pushing a …
My general feeling here is that once you're dealing with FFI, you're already in unsafe land, so instead of trying to be opinionated about what is safe and what is not, we should just provide the tools to do what's useful. That said, the argument against making every string a byte longer when most of them will never touch FFI code (not that this is precisely what you said) is a good one. I do wonder, though, how much of an impact this will have in practice.
I don't think it's useful, though. I can't think of any case where it would be okay or useful to use strings for data that's not actually Unicode. I was also against having any ASCII-only methods (like the now-removed … If we ever make a small string class, the impact will be that strings under 10 bytes are on the stack instead of strings under 11 bytes on 32-bit, and 22 instead of 23 on 64-bit. The extra byte can also push heap-allocated strings into the next allocator size class.
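A worked version of the size arithmetic, under an assumed layout that the comment does not spell out: a hypothetical small-string type whose inline buffer reuses the three words of a heap string (pointer, length, capacity) and reserves one byte for a tag. Under that assumption, a trailing NUL costs exactly one byte of inline capacity.

```rust
/// Inline capacity of a HYPOTHETICAL small-string layout (an assumption,
/// not a real std type): three machine words, minus one tag byte,
/// minus one more byte if every string must carry a trailing NUL.
fn inline_capacity(word_bytes: usize, trailing_nul: bool) -> usize {
    let total = 3 * word_bytes; // pointer + length + capacity
    let tag = 1; // byte reserved to distinguish inline vs. heap form
    let nul = if trailing_nul { 1 } else { 0 };
    total - tag - nul
}
```

With 4-byte words this gives 11 bytes without the NUL and 10 with it; with 8-byte words, 23 and 22, matching the figures quoted above. `inline_capacity(std::mem::size_of::<usize>(), true)` gives the value for the current target.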
@thestinger suggests that we look at Haskell's …
IMO this relates closely to having to figure out APIs for handling non-UTF-8 encodings. See #6164, #4837, etc. (There should be a metabug for overhauling Unicode / absorbing libICU / something, but there isn't yet.)
It is unsafe to convert every `&[u8]` into a `&str`, because `&str`s need to have a valid byte after the end of the characters. This means that this pattern:
…
is unsafe because `str::as_c_str` dereferences the next byte past the end of the slice. This is safe when working with a `~str` as a `&str`, because it has the trailing null. To do this safely, you need to copy the bytes:
…
While we could work around this by marking `from_bytes_slice` as unsafe and documenting this unsafe behavior, it would be simplest to just remove the trailing null from strings and allow the end user to add the trailing null if they want it. `.as_c_str()` can still take the fast path if the string is null-terminated, or allocate a temporary string otherwise.
Some comments from the IRC discussion:
- @graydon suggests renaming `.as_c_str()` to `.as_null_terminated_c_str()`
- @cmr suggested `.as_null_terminated_str()`
- @bstrie suggested `.null_terminate()`
- @Kimundi mentioned a `.as_c_str()` that takes a `&mut ~str`, adds a NULL, calls the closure, then removes the NULL before exiting.
cc: #6869