-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add rtruncate
, ltruncate
, ctruncate
for truncating strings
#55351
add rtruncate
, ltruncate
, ctruncate
for truncating strings
#55351
Conversation
Isn't this font and terminal emulator dependent? (e.g. 🏳️🌈 is 1 wide in a browser but in my terminal emulator, displaying as 🏳️🌈) |
I'm far from an expert on this, but isn't that lack of adherence to the standard by the terminal, not variability that should be handled? |
In any case, |
no. Quoting https://www.unicode.org/emoji/charts/emoji-zwj-sequences.html:
|
These are adjacent functions to |
f13dcb6
to
3f012b2
Compare
(It definitely depends on the terminal; see the classic issue #3721 — but alternatives like |
Do we really need all three of these in the stdlib? |
I see two arguments for this functionality finding its way into
With regards to why While cpad(str, w) = lpad(rpad(str, w ÷ 2), w) and this was given as the reason why Unlike |
Oh also, I've just discovered a generalised truncation function I implemented in function struncate(str::AbstractString, maxwidth::Int, joiner::AbstractString = "…", mode::Symbol = :center)
textwidth(str) <= maxwidth && return str
left, right = firstindex(str), lastindex(str)
width = textwidth(joiner)
while true
if mode ∈ (:right, :center)
(width += textwidth(str[left])) <= maxwidth || break
left = nextind(str, left)
end
if mode ∈ (:left, :center) && width < maxwidth
(width += textwidth(str[right])) <= maxwidth || break
right = prevind(str, right)
end
end
str[begin:prevind(str, left)] * joiner * str[nextind(str, right):end]
end |
A single function like that seems tidier, but is it too atypical to put key functionality on a kwarg? Otherwise we might have had |
We could always use a single-function implementation internally (maybe with a |
3f012b2
to
62bb2d6
Compare
Indeed. Just updated to the single internal implementation |
62bb2d6
to
8436ea3
Compare
base/strings/util.jl
Outdated
ctrunc(str::AbstractString, maxwidth::Int, replacement::Union{AbstractString,Char} = '…'; prefer_left::Bool = true) = | ||
struncate(str, maxwidth, replacement, :center, prefer_left) | ||
|
||
function struncate(str::AbstractString, maxwidth::Int, replacement::Union{AbstractString,Char}, mode::Symbol = :center, prefer_left::Bool = true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
function struncate(str::AbstractString, maxwidth::Int, replacement::Union{AbstractString,Char}, mode::Symbol = :center, prefer_left::Bool = true) | |
function _struncate(str::AbstractString, maxwidth::Int, replacement::Union{AbstractString,Char}, ::Val{mode}, prefer_left::Bool = true) where {mode} |
use an underscore to indicate that this is an internal method, and use Val
so that the mode
checks can be elided by the compiler?
(Especially since the mode
checks occur in the inner loop.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have public
we don't need to mark internals by name convention?
Val sounds good. I didn't appreciate why before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why Val
and not e.g. a Symbol
keyword arg?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we have
public
we don't need to mark internals by name convention?
because the REPL autocompletes and even gives hints for private names: #51327
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That change just needs implementing, right? That can happen in time for 1.12 along with this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to make sure that the mode == ... conditionals are evaluated at compile time.
This should be backed up by some benchmark in my opinion. A single pointer equality check seems to me insignificant from the work of the function.
In principle this could also happen with enum or Symbol arguments if we ensure that the function is inlined.
I don't think constant propagation requires inlining. And there is also @constprop :aggressive
.
Val
is never really a good idea to make an API upon (any longer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be backed up by some benchmark in my opinion. A single pointer equality check seems to me insignificant from the work of the function.
I tried this with
s = randstring(10^5)
@btime struncate($s, 100, '…', :center);
and I got about an 8% speedup by hard-coding the mode
checks. (This is after doing the total_width
optimization noted below — otherwise, it spends all its time on the textwidth(str)
check.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I get another 60% speedup by replacing textwidth(::Char)
with an optimized version that has a fast path for ASCII characters. We might want to do this in a separate PR:
const textwidth_ascii = Cint[textwidth(Char(i)) for i = 0:127]
function textwidth2(c::AbstractChar)
b = bswap(reinterpret(UInt32, c))
b < 0x80 && @inbounds return Int(textwidth_ascii[b+1]) # ASCII fast path
Base.ismalformed(c) && return 1
Int(ccall(:utf8proc_charwidth, Cint, (UInt32,), c))
end
Update: this optimization is #55398
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using Val
I do see reduction from
julia> @b ctrunc("dsadsafsadsadsafdsad", 10)
150.062 ns (4 allocs: 112 bytes)
to
136.986 ns (4 allocs: 112 bytes)
That's without #55398
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After #55398 this is now
julia> @b ctruncate("dsadsafsadsadsafdsad", 10)
91.665 ns (4 allocs: 112 bytes)
Could the exported functions get more descriptive names? E.g., |
@nsajko these are supposed to be the counterparts to to |
IMO, given that So, IMO, in order of preference:
FWIW. |
Ok. Fair.
|
base/exports.jl
Outdated
@@ -632,6 +634,7 @@ export | |||
rpad, | |||
rsplit, | |||
rstrip, | |||
rtrunc, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nsajko Regarding renaming. I'm wavering. We have rpad
rsplit
rstrip
, lpad
lstrip
.
Moving to truncate_right
etc. feels a bit awkward for something so closely tied.
What about rtruncate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, rtruncate
is much better, I vote for rtruncate
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do generally like unabbreviated names (nextind
continues to annoy me).
However... I do wonder if there's a danger in ltruncate
(& co.) being very similar in name to truncate
yet doing something completely different?
Co-Authored-By: Timothy <[email protected]>
Co-Authored-By: Steven G. Johnson <[email protected]> Co-Authored-By: Timothy <[email protected]>
Co-Authored-By: Steven G. Johnson <[email protected]>
Co-Authored-By: Steven G. Johnson <[email protected]>
6c222fe
to
bc9e780
Compare
rtrunc
, ltrunc
, ctrunc
for truncating stringsrtruncate
, ltruncate
, ctruncate
for truncating strings
Not to open a can of worms, but it occurs to me that in cases like: julia> struncate("🍕🍕🍕🍕🍕🍕xxxxxxxxxxx", 9, "…")
"🍕🍕🍕…xx" The "centre" truncate call has 6 cells of (Sorry Ian 😅) |
Just pushed a fix
|
…erms of `textwidth` (JuliaLang#55351) Co-authored-by: Timothy <[email protected]> Co-authored-by: Steven G. Johnson <[email protected]>
Co-authored with @tecosaur
Looking at the implementation of these functions in
Profile
(which are touched in #55335) uncovered issues and it seems to make sense to make and test the proper functions and add them to Base.