Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

replace does not handle multiple patterns for String #35327

Closed
Keno opened this issue Mar 31, 2020 · 13 comments
Closed

replace does not handle multiple patterns for String #35327

Keno opened this issue Mar 31, 2020 · 13 comments
Labels
strings "Strings!"

Comments

@Keno
Copy link
Member

Keno commented Mar 31, 2020

julia> replace("Keno", 'e'=>"abc", 'f'=>'g', 'a'=>'z')
ERROR: MethodError: no method matching similar(::String, ::Type{Any})
Closest candidates are:
  similar(::Array{T,1}, ::Type) where T at array.jl:357
  similar(::Array{T,2}, ::Type) where T at array.jl:358
  similar(::Array, ::Type, ::Tuple{Vararg{Int64,N}}) where N at array.jl:360
  ...
Stacktrace:
 [1] _similar_or_copy(::String, ::Type{Any}) at ./set.jl:415
 [2] replace(::String, ::Pair{Char,String}, ::Vararg{Pair,N} where N; count::Nothing) at ./set.jl:530
 [3] replace(::String, ::Pair{Char,String}, ::Pair{Char,Char}, ::Pair{Char,Char}) at ./set.jl:527
 [4] top-level scope at REPL[12]:1

This is surprising, because it doesn't match the API of replace on generic collections:

julia> replace([1,2,3],1=>4,2=>5,3=>6)
3-element Array{Int64,1}:
 4
 5
 6

Now, replace on string is a bit special, because it can take Strings and Regexes also, as patterns, and perhaps we should only allow a single pattern for those cases (because otherwise the order of replacements is ambiguous), but I think it should work for Char, and at least give a better error message for multiple patterns of another type.

@Keno Keno added good first issue Indicates a good issue for first-time contributors to Julia strings "Strings!" labels Mar 31, 2020
@sudo-rushil
Copy link

I thought of using a recursive solution: Unfortunately, I've run into a problem.

julia> function replace(str::String, old_new::Pair...)
              for pair in old_new
                  str = replace(str, pair)
              end
              str
          end

However, it encountered this error.

julia> replace("Keno", 'e'=>"abc", 'f'=>'g', 'a'=>'z')
"Kzbcno"

I would have assumed that the desired output would be "Kabcno", but I'm not sure. It raises the question; what should be the precedence of replacements? I'll keep thinking on it and get back to you.

@Keno
Copy link
Member Author

Keno commented Apr 1, 2020

For the Char replacement case, it should definitely apply the patterns only in one go.

@sudo-rushil
Copy link

Got it. How's this?

julia> function replace(str::String, old_news::Pair...)
           out::String, mapping::Dict{Char,Any} = "", Dict(old_news)
           for c in str
               if c in keys(mapping)
                   out *= mapping[c]
               else
                   out *= c
               end
           end
           out
       end
replace (generic function with 10 methods)

julia> replace("Keno", 'e'=>"abc", 'f'=>'g', 'a'=>'z')
"Kabcno"

@Keno
Copy link
Member Author

Keno commented Apr 1, 2020

It has the correct interface, probably, but a complete implementation needs to look a bit different, as that implementation has O(n^2) performance characteristics. There's also a question of how exactly it integrates with the rest of the generic API.

@sudo-rushil
Copy link

I could speed it up to linear time, if it made a set of the keys in advance, but would that be good enough to solve the integration problem?

@tkf
Copy link
Member

tkf commented Apr 1, 2020

FYI previous discussions:
#30457
#25396

@DNF2
Copy link

DNF2 commented Apr 7, 2020

I don't understand the error message here, either:

julia> replace("hello", ('l'=>'n'), ('o'=>'r'))
ERROR: MethodError: no method matching replace(::String, ::Pair{Char,Char}, ::Pair{Char,Char})
Closest candidates are:
  replace(::AbstractString, ::Pair, ::Pair) at set.jl:583
  replace(::Any, ::Pair...; count) at set.jl:526
  replace(::Union{Function, Type}, ::Pair, ::Pair; count) at set.jl:582

Doesn't the signature match the first candidate?

@sudo-rushil
Copy link

It turns out, that first candidate only throws a method error on the arguments. From base/set.jl:584:

replace(a::AbstractString, b::Pair, c::Pair) = throw(MethodError(replace, (a, b, c)))

@DNF2
Copy link

DNF2 commented Apr 7, 2020

Is that a good idea, or a common pattern? It's very confusing, and quite frustrating.

@rfourquet
Copy link
Member

Is that a good idea, or a common pattern? It's very confusing, and quite frustrating.

It doesn't seem like a good idea indeed (at least in general). IIRC in this case this is to handle ambiguities which arise when this method is not defined. Maybe throwing an ArgumentError or another exception would be better (disclaimer: I believe to be the one to blame for this ugly throw(MethodError(...)) ;-) )

@sudo-rushil
Copy link

I might be wrong but doesn't the call to findfirst have linear complexity?

@francescoalemanno
Copy link
Contributor

francescoalemanno commented Apr 8, 2020

edit: removed old post with code snippet, turned it to a PR see @ #35414

@sudo-rushil yes, this implementation is not the fastest, but it guarantees correct semantics.

@stevengj stevengj removed the good first issue Indicates a good issue for first-time contributors to Julia label Aug 29, 2020
@stevengj
Copy link
Member

stevengj commented Aug 29, 2020

I don't think this is a good first issue. The basic difficulty is that it's quite hard to implement this well — that's why there have been multiple rounds of discussion on this feature. We don't want the Base version to be the one you use if you don't care about performance.

Ideally you would encapsulate the search data in a data structure, e.g. Replacer(pat1=>repl1, pat2=>repl2, ...) so that you can re-use the costly setup (constructing a Regex for the patterns and a Dict for the replacements) across searches.

Probably this is better implemented first in a package.

vtjnash added a commit to vtjnash/julia that referenced this issue Apr 14, 2021
This has been attempted before, sometimes fairly similar to this, but
the attempts seemed to be either too simple or too complicated. This
aims to be simple, and even beats one of the "handwritten" benchmark
cases.

Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster,
but in my tests, this handily bests even simplified regexes. There can
be slow Regexes patterns that can cause this to exhibit O(n^2) behavior,
but only if the one of the earlier patterns is a partial match for a
later pattern Regex and that Regex always matches O(n) of the input
stream. This is a case that is hopefully usually avoidable in practice.

fixes JuliaLang#35327
fixes JuliaLang#39061
fixes JuliaLang#35414
fixes JuliaLang#29849
fixes JuliaLang#30457
fixes JuliaLang#25396
JeffBezanson pushed a commit that referenced this issue Jun 7, 2021
This has been attempted before, sometimes fairly similar to this, but
the attempts seemed to be either too simple or too complicated. This
aims to be simple, and even beats one of the "handwritten" benchmark
cases.

Past issues (e.g. #25396) have proposed that using Regex may be faster,
but in my tests, this handily bests even simplified regexes. There can
be slow Regexes patterns that can cause this to exhibit O(n^2) behavior,
but only if the one of the earlier patterns is a partial match for a
later pattern Regex and that Regex always matches O(n) of the input
stream. This is a case that is hopefully usually avoidable in practice.

fixes #35327
fixes #39061
fixes #35414
fixes #29849
fixes #30457
fixes #25396
shirodkara pushed a commit to shirodkara/julia that referenced this issue Jun 9, 2021
This has been attempted before, sometimes fairly similar to this, but
the attempts seemed to be either too simple or too complicated. This
aims to be simple, and even beats one of the "handwritten" benchmark
cases.

Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster,
but in my tests, this handily bests even simplified regexes. There can
be slow Regexes patterns that can cause this to exhibit O(n^2) behavior,
but only if the one of the earlier patterns is a partial match for a
later pattern Regex and that Regex always matches O(n) of the input
stream. This is a case that is hopefully usually avoidable in practice.

fixes JuliaLang#35327
fixes JuliaLang#39061
fixes JuliaLang#35414
fixes JuliaLang#29849
fixes JuliaLang#30457
fixes JuliaLang#25396
johanmon pushed a commit to johanmon/julia that referenced this issue Jul 5, 2021
This has been attempted before, sometimes fairly similar to this, but
the attempts seemed to be either too simple or too complicated. This
aims to be simple, and even beats one of the "handwritten" benchmark
cases.

Past issues (e.g. JuliaLang#25396) have proposed that using Regex may be faster,
but in my tests, this handily bests even simplified regexes. There can
be slow Regexes patterns that can cause this to exhibit O(n^2) behavior,
but only if the one of the earlier patterns is a partial match for a
later pattern Regex and that Regex always matches O(n) of the input
stream. This is a case that is hopefully usually avoidable in practice.

fixes JuliaLang#35327
fixes JuliaLang#39061
fixes JuliaLang#35414
fixes JuliaLang#29849
fixes JuliaLang#30457
fixes JuliaLang#25396
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
strings "Strings!"
Projects
None yet
Development

No branches or pull requests

7 participants