Sequence generation linspace vs linrange #9637
Comments
Some previous discussion in #7420, particularly @StefanKarpinski and @JeffBezanson's comments at the end. |
Just realized that Julia allows for testing with rational numbers. Here is what I got:

ref = convert(Vector{Float64},
              [linrange(convert(Rational{BigInt}, 0.1),
                        convert(Rational{BigInt}, 1.1), 6)])
print(convert(Vector{Float64}, [linrange(1//10, 11//10, 6)]) .== [0.1:0.2:1.1])
print(ref - [0.1:0.2:1.1])
print(ref - linspace(0.1, 1.1, 6))

gives

Bool[true,true,true,true,true,true]
[0.0,5.551115123125783e-17,0.0,1.1102230246251565e-16,0.0,0.0]
[0.0,0.0,0.0,0.0,-1.1102230246251565e-16,0.0]

This allows us to see what happens under exact calculations (and the second and third examples take into account that actually |
Here is the proposal how

function linspace2(start, stop, n)
    by = (stop - start) / (n - 1)
    r = start:by:stop
    return stop == last(r) ? [r] : linspace(start, stop, n)
end

Now following the linspace definition agreed in #2575: "Construct a vector of n linearly-spaced elements from start to stop." I compared

srand(1)
function diffrange(x)
dmin, dmax = extrema(diff(x))
dmax - dmin
end
scale1 = 100.0
scale2 = 1000.0
scale3 = 10000
store = Array(Float64, 0)
for i = 1:10000
a = scale1*(rand() - 0.5)
b = scale2*(rand() - 0.5)
n = 3 + int64(rand() * scale3)
l1 = linspace(a, b, n)
l2 = linspace2(a, b, n)
@assert l1[end] == l2[end]
res = diffrange(l2) - diffrange(l1)
push!(store, res)
end
println(mean(store))
println(maximum(store))
println(minimum(store))
println(sum(store .< 0))
println(sum(store .== 0))
println(sum(store .> 0))

which produces:
which simply says that actually So Additionally a question (I did not find it discussed): Would it not be useful to make |
The linspace function produces the straightforward linear intermediate points between its start and stop values. Since that's a pretty obvious, well-defined behavior – although, as it turns out, not always what you want – I'm hesitant to change it to anything else. The fact that our colon is better at producing what you want is due to "lifting" ranges by integer multiples that match the start, step and stop. |
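To make the "lifting" idea concrete, here is a minimal sketch in current Julia (1.x) syntax; the thread itself predates 1.0. The helper `lifted_points` is hypothetical and is not the code Julia's colon ranges actually use. It assumes that `rationalize` recovers small integer ratios for `start` and `step`, and it falls back to naive arithmetic when that assumption fails.

```julia
# Hypothetical illustration of "lifting"; not Julia's actual range implementation.
function lifted_points(start::Float64, step::Float64, n::Int)
    rs = rationalize(Int, start)          # e.g. 0.1 -> 1//10
    rp = rationalize(Int, step)           # e.g. 0.2 -> 1//5
    d = lcm(denominator(rs), denominator(rp))
    a = numerator(rs) * (d ÷ denominator(rs))
    s = numerator(rp) * (d ÷ denominator(rp))
    # Use the integer form only if it reproduces both inputs exactly.
    if a / d == start && s / d == step
        return [(a + k * s) / d for k in 0:n-1]
    end
    # Otherwise fall back to the straightforward computation.
    return [start + k * step for k in 0:n-1]
end

naive  = [0.1 + k * 0.2 for k in 0:5]
lifted = lifted_points(0.1, 0.2, 6)

println(naive[2])    # 0.30000000000000004
println(lifted[2])   # 0.3
```

Each lifted element is a single correctly rounded division of two integers, so rounding error does not accumulate across the sequence the way it can in `start + k * step`.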
Yes, my understanding is that the purpose of @bkamins thanks for the nice writeup on your blog :) |
I am aware of "lifting" and that is why I have proposed the change. But I agree that it is a minor one. However, my understanding is that most often people need to generate sequences from In fact after the discussion I see that the iterator thing is more important than identity between

for x = linspace(a, b, n)
    # do something with x in a loop
end

and in such cases you actually need only an iterator, and it is often easier to specify the required size of the collection than the step size. I assume that Just a simple example:

const n = 10^8+1
function test1()
s = 0.0
for x in 0.0:(1/(n-1)):1.0
s += x
end
end
function test2()
s = 0.0
for x in linspace(0.0, 1.0, n)
s += x
end
end
@time test1()
@time test1()
@time test2()
@time test2()

on my computer gives
|
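The iterator-versus-array point can be sketched in current Julia (1.x), where `range(start, stop=..., length=...)` returns a lazy range object and `collect` turns it into a `Vector`. The function names `sum_lazy` and `sum_eager` are made up for this illustration, and the size is kept smaller than in the benchmark above so that it runs quickly.

```julia
# Sums n evenly spaced points on [0, 1] without materializing them.
function sum_lazy(n::Int)
    s = 0.0
    for x in range(0.0, stop=1.0, length=n)   # lazy range object
        s += x
    end
    return s
end

# Same sum, but `collect` first allocates an n-element Vector{Float64}.
function sum_eager(n::Int)
    xs = collect(range(0.0, stop=1.0, length=n))
    return sum(xs)
end

n = 10^6 + 1
println(sum_lazy(n) ≈ sum_eager(n))
println(sizeof(collect(range(0.0, stop=1.0, length=n))))  # bytes held by the array
```

Both functions compute the same sum; the difference is that the eager version has to hold all `n` values in memory at once.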
Maybe at least add a warning to the |
I took a crack at making |
With this change, we would have

linspace(0.1, 1.1, 6) == [0.1:0.2:1.1]
linspace(0.1, 1.1, 6) != [linrange(0.1, 1.1, 6)]

So now [0.1,0.30000000000000004,0.5,0.7000000000000001,0.9,1.1] instead of |
Well this is a possible approach to modify

function linrange2(a::Real, b::Real, len::Integer)
    if len >= 2
        by = (b - a) / (len - 1)
        r = a:by:b
        return b == last(r) && r.len == len ? r : range(a, by, len)
    end
    len == 1 && a == b ? range(a, zero((b-a)/(len-1)), 1) :
                         error("invalid range length")
end

as it takes advantage of "lifting" in standard colon notation (and we know the endpoint). Probably it is enough to check |
Looks like an improvement to me. It's hard to find good test cases, but to suggest a good metric for the "quality" of linspace, I'd want

julia> var(diff(linspace2(0.1,1.1,11)))
2.139922160430262e-33
julia> var(diff(linspace(0.1,1.1,11)))
4.879022525780997e-33 |
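For readers on current Julia (1.x), the same metric can be sketched as follows. This is an assumption-laden translation rather than the code from the thread: `linspace` no longer exists, so the sketch compares `range(...; length)`, which is documented as trying to correct for floating point error, with `LinRange`, which is documented as not doing so. The helper name `spacing_var` is made up here.

```julia
using Statistics  # provides `var`

# Spread of successive differences: 0 would mean perfectly even spacing.
spacing_var(xs) = var(diff(collect(xs)))

corrected = range(0.1, stop=1.1, length=11)   # tries to correct for rounding error
plain     = LinRange(0.1, 1.1, 11)            # plain interpolation between endpoints

println(spacing_var(corrected))
println(spacing_var(plain))
```

Smaller values mean the steps between neighbouring elements are closer to identical, which is the sense of "quality" used in the comment above.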
I have checked |
Minimizing |
Note that I added tests for this that generate and check a lot of cases, all of which pass. I didn't add a list of specific hard cases since my experience with the range work was that the generated tests are strictly harder to pass than the hand-crafted hard cases. Before merging this, I should probably test those too though. |
Ok, added the "handcrafted" test cases too. They all pass, which is unsurprising. |
Just an example of a handcrafted test case (I used unmodified definition of

m1 = nextfloat(-Inf)
m2 = prevfloat(Inf) # m1 == -m2
linspace(m1, m2, 3) # OK
[linrange(m1, m2, 3)] # wrong
[m1:m2:m2] # error |
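Why the extreme endpoints are hard can be seen from a small sketch in current Julia (1.x). The two helper functions are hypothetical names for this illustration. The weighted form shown is one common way to avoid the overflow and is not necessarily what any Julia version implements.

```julia
m1 = nextfloat(-Inf)   # most negative finite Float64
m2 = prevfloat(Inf)    # largest finite Float64

# Naive form: the difference m2 - m1 overflows to Inf before it is scaled.
naive_lerp(a, b, t) = a + (b - a) * t
# Weighted form: each endpoint is scaled down first, so nothing overflows here.
safe_lerp(a, b, t) = (1 - t) * a + t * b

ts = (0.0, 0.5, 1.0)
println([naive_lerp(m1, m2, t) for t in ts])   # contains NaN and Inf
println([safe_lerp(m1, m2, t) for t in ts])    # m1, 0.0, m2
```

Any formula that first forms `stop - start`, or a step derived from it, hits the same overflow for this pair of endpoints.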
Some of these issues did arise when I wrote Furthermore, in this case

julia> for i in linspace(big(0.1),big(1.1),6)
println(float64(i))
end
0.1
0.30000000000000004
0.5
0.7000000000000001
0.9
1.1
julia> for i in linspace(0.1,1.1,6)
println(i)
end
0.1
0.30000000000000004
0.5
0.7000000000000001
0.9000000000000001
1.1
julia> for i in linrange(0.1,1.1,6)
println(i)
end
0.1
0.30000000000000004
0.5
0.7000000000000001
0.9
1.1 |
@bkamins, man, that's a diabolical example, which we should, of course, still handle correctly. |
@simonbyrne, the new proposed

julia> for i in linspace(0.1,1.1,6)
println(i)
end
0.1
0.3
0.5
0.7
0.9
1.1 |
Sorry, I should have clarified what I meant: since the So for this example, the current |
Yes, that's true, but our current range behavior – which everyone seems to like – is also "wrong" by that standard. |
@StefanKarpinski, you asked for a handcrafted example so I gave one :). It seems that replacing

@simonbyrne, This is exactly how I understood it. That is why in my second post in this issue I used

println([0.1:0.1:0.3])
# [0.1,0.2,0.3]
println(map(float64,[big(0.1):big(0.1):big(0.3)]))
# [0.1,0.2] |
@bkamins, it's a tradeoff. I don't think you can have intuitive float range behavior and not have cases where it also depends on the precision of float type you're working with. Personally, I think that example is fairly acceptable. The real issue is actually that

julia> println(map(float64,[BigFloat("0.1"):BigFloat("0.1"):BigFloat("0.3")]))
[0.1,0.2,0.3] |
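The difference between the two `BigFloat` examples comes down to how the endpoints are constructed, which a short sketch in current Julia (1.x) makes visible. The variable names are illustrative only.

```julia
x_float  = big(0.1)          # exact value of the Float64 nearest to 0.1
x_parsed = BigFloat("0.1")   # decimal 0.1 parsed directly at BigFloat precision

println(x_float == x_parsed)
println(x_float - x_parsed)          # tiny positive gap
println(3 * big(0.1) > big(0.3))     # three steps overshoot the converted endpoint
```

Because `big(0.1)` carries the `Float64` rounding error along exactly, three such steps land above `big(0.3)`, which is consistent with the two-element result shown above for `big(0.1):big(0.1):big(0.3)`.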
@StefanKarpinski, Yes And as for replacement of |
That was my suspicion, although I hadn't gotten a chance to check it. We may be able to work around this somehow and get correct behavior in both cases. |
I think the way to go here is to create a |
How do you see that In general, as I mentioned earlier, I agree that returning generators should be preferable to returning arrays, as you can always run |
It would work by implementing the non-eager version of this: https://github.com/JuliaLang/julia/pull/9666/files#diff-8cc03187983013adb308460f8365e1d0R241 When I've got some time, I'll implement it in that PR. It differs in the non-exact case: https://github.com/JuliaLang/julia/pull/9666/files#diff-8cc03187983013adb308460f8365e1d0R245 |
I've updated my pull request and I could use some opinions. I actually think there may be a way to generally avoid overflow, which I can implement a bit later (basically scaling down and then back up by powers of 2 when values are too large). |
I have written the following simple test procedure

srand(1)
simsize = 10000
maxrange = 10
ref = [0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
counts = zeros(Int, maxrange)
for d = 1:maxrange, i = 1:simsize
x = round(rand(), d)
if linspace(x, round(x+1, d), 11) != round(x + ref, d)
counts[d] += 1
end
end
print(counts / simsize * 100)

The old
And with the new
And I like the improvement 👍. |
I'm not entirely sure what that code is measuring. Is this some statistical count of accuracy? |
Assume we wanted to generate an 11 element sequence starting from some value User can generate this 11 element sequence in two ways:
What I measure is the probability that those two sequences are different when we change You can see that when |
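For anyone wanting to repeat this kind of measurement on current Julia (1.x), the procedure could be sketched roughly as below. This is a hedged port, not the original script: `srand`, two-argument `round` and `linspace` have since been replaced, so the sketch uses `MersenneTwister`, `round(x, digits=d)` and `range(...; length)`, and the function name `mismatch_percent` is made up. Its numbers should not be expected to match the results quoted in this thread.

```julia
using Random

# Percentage of random d-digit starting points x for which an 11-point range
# from x to x + 1 differs from adding 0.0, 0.1, ..., 1.0 and rounding to d digits.
function mismatch_percent(d::Int; trials::Int=1000, rng=MersenneTwister(1))
    ref = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
    misses = 0
    for _ in 1:trials
        x = round(rand(rng), digits=d)
        by_length = collect(range(x, stop=round(x + 1, digits=d), length=11))
        by_rounding = round.(x .+ ref, digits=d)
        misses += by_length != by_rounding
    end
    return 100 * misses / trials
end

for d in 1:10
    println(d, " digits: ", mismatch_percent(d), "%")
end
```

A value of 0 for a given `d` means the length-based sequence always matched the rounded decimal sequence in that run.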
linspace: try to "lift" linspace the float ranges are [close #9637]
I think that Julia has excellent handling of sequences of floats (see my blog for a comparison with R and Python).
The only small issue is the following code:
produces:
So they are not equal and I would expect an identical result (the root cause is the difference of the results between colon notation and linspace). It should be decided if this should be guaranteed.