replace does not handle multiple patterns for String #35327
Comments
I thought of using a recursive solution. Unfortunately, I've run into a problem.

julia> function replace(str::String, old_new::Pair...)
           for pair in old_new
               str = replace(str, pair)
           end
           str
       end

However, it gives this unexpected result:

julia> replace("Keno", 'e'=>"abc", 'f'=>'g', 'a'=>'z')
"Kzbcno"

I would have assumed that the desired output would be "Kabcno", but I'm not sure. It raises the question: what should be the precedence of replacements? I'll keep thinking on it and get back to you. |
For the Char replacement case, it should definitely apply the patterns only in one go. |
Got it. How's this?

julia> function replace(str::String, old_news::Pair...)
           out::String, mapping::Dict{Char,Any} = "", Dict(old_news)
           for c in str
               if c in keys(mapping)
                   out *= mapping[c]
               else
                   out *= c
               end
           end
           out
       end
replace (generic function with 10 methods)

julia> replace("Keno", 'e'=>"abc", 'f'=>'g', 'a'=>'z')
"Kabcno" |
It has the correct interface, probably, but a complete implementation needs to look a bit different, as that implementation has O(n^2) performance characteristics. There's also a question of how exactly it integrates with the rest of the generic API. |
I could speed it up to linear time, if it made a set of the keys in advance, but would that be good enough to solve the integration problem? |
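For reference, the linear-time idea discussed above can be sketched as follows. This is only a minimal illustration for the Char-only case, not the implementation in Base: the name replace_chars is hypothetical (chosen to avoid overloading Base.replace), and it assumes every pattern is a Char. Writing into an IOBuffer avoids the repeated string concatenation that makes the earlier version quadratic.

```julia
# Hypothetical helper, not part of Base: single-pass Char replacement.
function replace_chars(str::AbstractString, old_news::Pair{Char}...)
    # Build the lookup table once; values may be Chars or Strings.
    mapping = Dict{Char,Any}(old_news...)
    # Accumulate output in a buffer instead of using `out *= ...`,
    # which copies the whole string on every iteration.
    io = IOBuffer(sizehint = sizeof(str))
    for c in str
        # Each input character is looked up exactly once, so a replacement
        # is never re-scanned by a later pattern.
        print(io, get(mapping, c, c))
    end
    return String(take!(io))
end

result = replace_chars("Keno", 'e'=>"abc", 'f'=>'g', 'a'=>'z')
```

Because the output of one replacement is never fed back into the lookup, the 'a' produced by 'e'=>"abc" is left alone, which gives "Kabcno" rather than "Kzbcno".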
I don't understand the error message here, either:
Doesn't the signature match the first candidate? |
It turns out that the first candidate only throws a MethodError on its arguments. From base/set.jl:584:

replace(a::AbstractString, b::Pair, c::Pair) = throw(MethodError(replace, (a, b, c))) |
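To make the source of the confusion concrete, here is a small sketch of the same pattern on a made-up function. The name swap and its methods are hypothetical and exist only for illustration; the point is that a method matching the call is defined, yet calling it still raises a MethodError.

```julia
# Hypothetical function, used only to illustrate the stub pattern.
swap(x::AbstractString, p::Pair) = "single pattern"
# Stub method: it matches the call, but its body rethrows a MethodError.
swap(x::AbstractString, p::Pair, q::Pair) = throw(MethodError(swap, (x, p, q)))

# A matching method exists for the three-argument call...
stub_exists = hasmethod(swap, Tuple{String, Pair{Char,Char}, Pair{Char,Char}})

# ...but calling it still ends in a MethodError.
err = try
    swap("Keno", 'e'=>'a', 'n'=>'m')
catch e
    e
end
```

Since the stub itself is a defined method with a matching signature, it can show up among the candidates in the error report, which is what makes the message hard to read.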
Is that a good idea, or a common pattern? It's very confusing, and quite frustrating. |
It doesn't seem like a good idea indeed (at least in general). IIRC in this case this is to handle ambiguities which arise when this method is not defined. Maybe throwing an |
I might be wrong but doesn't the call to |
Edit: removed the old post with the code snippet and turned it into a PR; see #35414. @sudo-rushil yes, this implementation is not the fastest, but it guarantees correct semantics. |
I don't think this is a good first issue. The basic difficulty is that it's quite hard to implement this well — that's why there have been multiple rounds of discussion on this feature. We don't want the Ideally you would encapsulate the search data in a data structure, e.g. Probably this is better implemented first in a package. |
This has been attempted before, sometimes in ways fairly similar to this, but the attempts seemed to be either too simple or too complicated. This aims to be simple, and even beats one of the "handwritten" benchmark cases. Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster, but in my tests, this handily bests even simplified regexes. There can be slow Regex patterns that cause this to exhibit O(n^2) behavior, but only if one of the earlier patterns is a partial match for a later Regex pattern and that Regex always matches O(n) of the input stream. This is a case that is hopefully usually avoidable in practice. fixes JuliaLang#35327 fixes JuliaLang#39061 fixes JuliaLang#35414 fixes JuliaLang#29849 fixes JuliaLang#30457 fixes JuliaLang#25396
This is surprising, because it doesn't match the API of replace on generic collections:

Now, replace on string is a bit special, because it can take Strings and Regexes also, as patterns, and perhaps we should only allow a single pattern for those cases (because otherwise the order of replacements is ambiguous), but I think it should work for Char, and at least give a better error message for multiple patterns of another type.