Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Argument of segment_length in line_segment fun causes issue #552

Closed
wangzhao0217 opened this issue Nov 1, 2023 · 14 comments · Fixed by #556
Closed

Argument of segment_length in line_segment fun causes issue #552

wangzhao0217 opened this issue Nov 1, 2023 · 14 comments · Fixed by #556
Assignees

Comments

@wangzhao0217
Copy link
Collaborator

for this minimal dataset:
rnet_x = sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.0.2/rnet_x_ed.geojson")
rnet_y = sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.0.2/rnet_y_ed.geojson")

segment_length = 15 will cause error, but segment_length =10 is not.

@Robinlovelace
Copy link
Member

That definitely sounds like a bug. Will aim to take a look today. Did it work previously?

@wangzhao0217
Copy link
Collaborator Author

wangzhao0217 commented Nov 2, 2023 via email

@Robinlovelace
Copy link
Member

Robinlovelace commented Nov 2, 2023

I didn't spot the issue before, segment_length was not defined in rnet_merge in the previous test.

Can you provide a full reproducible example?

@Robinlovelace
Copy link
Member

Full reproducible example, showing that this is for sure an issue:

rnet_x = sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.0.2/rnet_x_ed.geojson")
rnet_y = sf::read_sf("https://github.com/ropensci/stplanr/releases/download/v1.0.2/rnet_y_ed.geojson")

segment_length = 15 # will cause error, but segment_length =10 is not
library(stplanr)
# ?rnet_merge

names(rnet_y)
#> [1] "value"     "Quietness" "geometry"
funs = list(vaue = sum, Quietness = mean)

res = rnet_merge(rnet_x, rnet_y)
#> [1] "funs is NULL"
#> Joining with `by = join_by(identifier)`
res2 = rnet_merge(rnet_x, rnet_y, segment_length = segment_length)
#> [1] "funs is NULL"
#> Error in `[[<-`:
#> ! Assigned data `res` must be compatible with existing data.
#> ✖ Existing data has 3859 rows.
#> ✖ Assigned data has 3858 rows.
#> ℹ Only vectors of size 1 are recycled.
#> Caused by error in `vectbl_recycle_rhs_rows()`:
#> ! Can't recycle input of size 3858 to size 3859.
#> Backtrace:
#>      ▆
#>   1. ├─stplanr::rnet_merge(rnet_x, rnet_y, segment_length = segment_length)
#>   2. │ └─stplanr::rnet_join(rnet_x, rnet_y, dist = dist, crs = crs, ...)
#>   3. │   ├─stplanr::line_segment(rnet_y, segment_length = segment_length)
#>   4. │   └─stplanr:::line_segment.sf(rnet_y, segment_length = segment_length)
#>   5. │     └─stplanr:::line_segment_rsgeo(l, n_segments = n_segments)
#>   6. │       ├─base::`[[<-`(`*tmp*`, attr(l, "sf_column"), value = `<LINESTRING [°]>`)
#>   7. │       └─tibble:::`[[<-.tbl_df`(`*tmp*`, attr(l, "sf_column"), value = `<LINESTRING [°]>`)
#>   8. │         └─tibble:::tbl_subassign(...)
#>   9. │           └─tibble:::vectbl_recycle_rhs_rows(value, fast_nrow(xo), i_arg = NULL, value_arg, call)
#>  10. │             ├─base::withCallingHandlers(...)
#>  11. │             └─vctrs::vec_recycle(value[[j]], nrow)
#>  12. └─vctrs:::stop_recycle_incompatible_size(...)
#>  13.   └─vctrs:::stop_vctrs(...)
#>  14.     └─rlang::abort(message, class = c(class, "vctrs_error"), ..., call = call)
res3 = rnet_merge(rnet_x, rnet_y, segment_length = 10)
#> [1] "funs is NULL"
#> Joining with `by = join_by(identifier)`

Created on 2023-11-02 with reprex v2.0.2

@Robinlovelace
Copy link
Member

The line that jumps out to me is this one:

stplanr:::line_segment_rsgeo(l, n_segments = n_segments)

Could it be another issue with the Rust implementation? Heads-up @JosiahParry in case, I'll have a look and try to pull out a minimal example.

Robinlovelace added a commit that referenced this issue Nov 2, 2023
@Robinlovelace
Copy link
Member

Confirmed: seems one out of ~4k expected geometries is missing:

res_tbl[[attr(l, "sf_column")]] <- res
Error in `[[<-`(`*tmp*`, attr(l, "sf_column"), value = list(c(-3.20572,  : 
  Assigned data `res` must be compatible with existing data.
✖ Existing data has 3859 rows.
✖ Assigned data has 3858 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in `vectbl_recycle_rhs_rows()`:
! Can't recycle input of size 3858 to size 3859.

@JosiahParry
Copy link
Contributor

library(rsgeo)

x <- geom_linestring(
  c(-3.19416, -3.19352, -3.19288),
  c(55.95524, 55.95535, 55.95546)
)

line_segmentize(x, 6) |> 
  expand_geoms()

Minimal reproducible example. I think this may be fixed upstream? I'm not sure what the cause is here. Alo as an aside, these are likely somewhat inaccurate segments since its being applied to geodetic coordinates instead of planar ones.

@JosiahParry
Copy link
Contributor

I genuinely think its because the numbers are so small.

library(rsgeo)

x <- geom_linestring(
  c(-3.19416, -3.19352, -3.19288) * 10,
  c(55.95524, 55.95535, 55.95546) * 10
)

line_segmentize(x, 6) |> 
  expand_geoms()

@Robinlovelace
Copy link
Member

I think this may be fixed upstream?

Re-compiling now after entering

remotes::install_dev("rsgeo")

@Robinlovelace
Copy link
Member

Update: same issue after rebuilding.

@Robinlovelace
Copy link
Member

I genuinely think its because the numbers are so small.

Yes that could potentially explain it:

> line_segmentize(x, 6) |> 
+     expand_geoms()
[[1]]
<rs_LINESTRING[5]>
[1] LineString([Coord { x: -3.19416, y: 55.95524 }, Coord { x: -3.194, y: 55.955267500000005 }, Coord { x: -3.193946666666667, y: 55.95527666666... 
[2] LineString([Coord { x: -3.193946666666667, y: 55.95527666666667 }, Coord { x: -3.19384, y: 55.95529500000001 }, Coord { x: -3.19373333333333... 
[3] LineString([Coord { x: -3.1937333333333333, y: 55.955313333333336 }, Coord { x: -3.19368, y: 55.9553225 }, Coord { x: -3.19352, y: 55.95535 }]))
[4] LineString([Coord { x: -3.19352, y: 55.95535 }, Coord { x: -3.19352, y: 55.95535 }, Coord { x: -3.193306666666667, y: 55.95538666666667 }]))    
[5] LineString([Coord { x: -3.193306666666667, y: 55.95538666666667 }, Coord { x: -3.1933066666666665, y: 55.95538666666667 }, Coord { x: -3.193... 

There shouldn't be any tiny linestrings in the example data though, minimum length should be 15m.

@JosiahParry
Copy link
Contributor

This is likely a bug, yes and should be fixed. But more than anything I think this is a misuse of the function. It's running into an error because of floating point precision. You're applying a euclidean algorithm in a geodetic space. In this very specific example we're partitioning 0.001298769 into 6 equal length pieces of size 0.0002164615. We're dealing with tiny units of precision that is getting futzed up. This does not occur if you increase the scale by even 1 decimal place.

I have pushed a HaversineSegmentize trait to georust upstream which would measure this in units of meters instead of euclidean units. Creating a line_segmentize_haversine() would be the real fix here.

@Robinlovelace
Copy link
Member

Yeah. Running it on projected data sounds like a good plan. Can you try that @wangzhao0217 ?

@JosiahParry
Copy link
Contributor

Related georust/geo#1107

Robinlovelace added a commit that referenced this issue Apr 23, 2024
@Robinlovelace Robinlovelace linked a pull request Apr 23, 2024 that will close this issue
Robinlovelace added a commit that referenced this issue Apr 23, 2024
…in-line_segment-fun-causes-issue-2

Use output of rsgeo for n_segments (#552)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment