Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

We need more unique and related methods #14649

Closed
4 tasks done
gajomi opened this issue Jan 11, 2016 · 14 comments
Closed
4 tasks done

We need more unique and related methods #14649

gajomi opened this issue Jan 11, 2016 · 14 comments
Labels
performance Must go faster

Comments

@gajomi
Copy link
Contributor

gajomi commented Jan 11, 2016

There is quite a bit of room to add optimized unique methods. The only extant single argument unique method unique(C) at set.jl:107 explicitly loops through all the elements of C to build up the unique result, and so has cost proportional to the length C. However, in the case of ranges, something like

unique{T}(x::Range{T}) = x

reduces the cost to O(1), which could improve performance in a number of places (for example see #14004 and the fix at 75336c6).

Some of the optimizations that apply to unique also benefit Set construction (or methods involved therein), and will be noted.

I am planning to make a PR to implement these kinds of optimizations, and would like to use this issue to list out types where this optimization would be helpful. The checked off features in the list below refer to working implementations in this branch (I didn't think I should make a feature request until the branch was feature complete).

Targets:

  • Range types (maximally trivial assuming zero step sizes disallowed) a872404
  • Dict, Set and IntSet types (trivial cases) 990a443
  • AbstractArray{Bool}, AbstractArray{Int8} and other AbstractArray{T} with "small" bounds on total unique elements. Build up of unique list terminates when all possible elements are seen. Also applies to union!(s::Set{T},AbstractArray{T}) methods, which are used in Set construction, among other things. c8b02c4
  • Diagonal,Tridiagonal and other matrices with many repeated structural zeros 5d0e097

...and possibly more? If you have a target in mind please leave a comment.

@gajomi gajomi changed the title We need more unique methods We need more unique methods Jan 11, 2016
@kshyatt kshyatt added the performance Must go faster label Jan 11, 2016
@tkelman
Copy link
Contributor

tkelman commented Jan 11, 2016

Can't ranges possibly have step 0?

@gajomi
Copy link
Contributor Author

gajomi commented Jan 12, 2016

Well I guess you can have

julia> x = 0:0
1-element UnitRange{Int64}:
 0

in which case the unique optimization above gives the correct behavior

julia> unique(x)
1-element Array{Int64,1}:
 0
julia> unique(x) == unique(collect(x))
true

I tried constructing float range with zero step sizes but got error that told me I could not. Curious though, in the above example x = 0:0 I find

julia> step(x)
1

which is possibly a bug?

@gajomi
Copy link
Contributor Author

gajomi commented Jan 12, 2016

OK, I guess I wasn't trying hard enough. If you make a call to the range function you can get

julia> range(1,0.,3)
3-element FloatRange{Float64}:
 1.0,1.0,1.0

such that FloatRanges won't nessecarilly be equal to their uniques. Adding some logic to check for zero valued step sizes would be an easy fix.

In particular, if you do something like this

Base.unique{T}(x::Range{T}) = step(x) == zero(T) ? x[1:1] : x

you get the correct behavior and remain type stable.

@vtjnash
Copy link
Member

vtjnash commented Jan 12, 2016

i think that is a lack of error checking on the part of the float range constructor -- a step of zero isn't logical so it is probably a user bug. the same arguments as Integers throw an argument error:

julia> range(1, 0, 3)
ERROR: ArgumentError: step cannot be zero

@gajomi
Copy link
Contributor Author

gajomi commented Jan 12, 2016

@vtjnash - so are you are saying that range(1,0.,3) should also be an error? I agree that this would seem to be the logical thing to do.

I am interested to know what others actually think should be done though, and have gone on a hunt for relevant discussions. Issues about ranges are myriad. I see a bunch of commits here #3121 referring to zero sized steps. There is an umbrella issue some time later #5585. Perhaps someone knows of an issue specific to step sizes equal to zero?

@nalimilan
Copy link
Member

I agree that range(1,0.,3) should throw an error.

@hayd
Copy link
Member

hayd commented Jan 13, 2016

I think zero step should be allowed. Specifically that'd fix #10391... That issue also mentions:

(1:5) * 0

Edit: zero step ranges were removed in #6326 by @JeffBezanson

@hayd
Copy link
Member

hayd commented Jan 13, 2016

That also allows zero and one to be special cased... and 0(1).

@hayd
Copy link
Member

hayd commented Jan 13, 2016

I think the issue is that SteppedRange doesn't allow a len, whereas FloatRange does. The options IMO:

  1. disallow 0 step ranges consistently
  2. add a length field to SteppedRange
  3. introduce a ConstantRange type (this means that some methods will be type unstable)

Not sure what is the best, although this should probably be it's own issue.

Edit: ConstantRangeis a bit of a oxymoron...

@gajomi
Copy link
Contributor Author

gajomi commented Jan 19, 2016

Well it seems like there is some interest, so I started working on the feature(s). In the process of working on this it seemed reasonable to slightly expand the scope of the issue to other methods that are doing stuff closely related to unique. I have edited the original comment to highlight this fact and provided links to the branch where this work is taking place.

@gajomi gajomi changed the title We need more unique methods We need more unique and related methods Jan 19, 2016
@gajomi
Copy link
Contributor Author

gajomi commented Feb 10, 2016

Pull request with features described above at #15009.

@kshyatt
Copy link
Contributor

kshyatt commented May 10, 2016

@gajomi I see all the boxes are checked. Is it just that open PR that you need to finish before this issue can be closed?

@kshyatt
Copy link
Contributor

kshyatt commented Jan 26, 2017

I'm closing this since it's been nearly a year. Reopen if there's more to add here.

@kshyatt kshyatt closed this as completed Jan 26, 2017
@hayd
Copy link
Member

hayd commented Jan 26, 2017

But #15009 is still open... I don't think this has been resolved yet.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants