Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Indexing StdString returns Int8 which breaks most basic string functions #363

Open
Keluaa opened this issue Jun 22, 2023 · 3 comments
Open

Comments

@Keluaa
Copy link
Contributor

Keluaa commented Jun 22, 2023

When indexing a StdString, a Int8 is returned, which is different from the indexing of Base.String, which returns a Char.

However, as mentioned in the AbstractString documentation, indexing a string should return an AbstractChar.

Therefore most functions operating on strings are broken:

julia> jl_s = "test"; cxx_s = CxxWrap.StdString(jl_s);

julia> jl_s == cxx_s  # comparison is explicitly defined by CxxWrap
true

julia> jl_s[1] == cxx_s[1]  # Internally compares a Char with a Int8
false

julia> occursin(jl_s, cxx_s)
false

julia> isnothing(findfirst('t', cxx_s))
true

I find this quite confusing, since StdString <: AbstractString, Julia happily treats cxx_s as a string yet fails to do any operations on it. Printing gives the correct string representation, == works as expected, but all else breaks.

Furthermore, UTF-8 characters completly break StdString:

julia> jl_s = "∘test"; cxx_s = CxxWrap.StdString(jl_s);

julia> length(jl_s)
5

julia> lengh(cxx_s)
7

julia> foreach(print, jl_s)
test

julia> foreach(print, cxx_s)
â

julia> jl_s[1], cxx_s[1]
('', -30)

julia> cxx_s[2]
-120
Keluaa added a commit to Keluaa/Kokkos.jl that referenced this issue Jun 22, 2023
@fingolfin
Copy link
Contributor

I think the best way to fix this is to change StdString to not inherit from AbstractString. The semantics of the two just don't fit; std::string doesn't support UTF-8 directly in any way; trying to make it work as an AbstractString would require implementing a lot of code, and introduce all sorts of inefficiencies.

@Keluaa
Copy link
Contributor Author

Keluaa commented Jun 22, 2023

I would agree that returning a non-AbstractString would make the most sense.

I should mention that there is ways to work with UTF-8 in C/C++. A familiar person wrote an article about it, but it is still quite a lot of work to correctly support StdString.

IMO, simply returning a Vector{UInt8} would already be enough. Or maybe a subtype AbstractVector{UInt8} which could be printed as a string. std::string is usually used when making copies of the string is not a problem, therefore needing to explicitly convert to String shouldn't be an issue.

barche added a commit that referenced this issue Aug 27, 2023
Issue Indexing `StdString` returns `Int8` which breaks most basic string functions #363
barche added a commit to JuliaInterop/libcxxwrap-julia that referenced this issue Aug 27, 2023
@barche
Copy link
Collaborator

barche commented Aug 27, 2023

I think this is actually fixable, at least the examples shown above pass with the two commits mentioned above. It is unfortunate that at the AbstractString level unicode seems to be specified now, std::string will of course never have that. The objective is not to have StdString behave exactly the same as a Julia String, but keep the ability to pass it around transparently for simple cases, e.g. non-unicode filenames should be OK.

barche added a commit to JuliaInterop/libcxxwrap-julia that referenced this issue Aug 27, 2023
barche added a commit that referenced this issue Sep 3, 2023
Issue Indexing `StdString` returns `Int8` which breaks most basic string functions #363
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants