Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phone freezes when handling blanks #27

Closed
kinto-b opened this issue Aug 9, 2021 · 5 comments
Closed

phone freezes when handling blanks #27

kinto-b opened this issue Aug 9, 2021 · 5 comments

Comments

@kinto-b
Copy link

kinto-b commented Aug 9, 2021

Very hard for me to properly describe this issue, since it doesn't produce an error message and I can't reliably recreate it.

But when you run this code on different machines it just hits a wall at a certain point: CPU usage goes to 99% and the parsing slows to a crawl.

For me it took about 475 iterations to stall, but for others it stalled out after less than 50 iterations

options(dialr.show_progress = FALSE)
for (i in 1:1000) {
  cat(".")
  dialr::phone(rep("+", 1000), "AU")
}
@gorcha
Copy link
Collaborator

gorcha commented Aug 11, 2021

This is a weird one! Couldn't tell you why it's stalling everything, but it's something to do with catching the java exceptions.

I get the same behaviour with "" (although not NA) and also with "+001234", which throws an invalid country code exception in java. Pretty sure I've got a fix that works.

@gorcha gorcha closed this as completed in 6f7b42f Aug 11, 2021
@gorcha
Copy link
Collaborator

gorcha commented Aug 11, 2021

@kinto-b I've pushed a fix - let me know if you still have any issues

@kinto-b
Copy link
Author

kinto-b commented Aug 11, 2021

Super odd.

On a related note:

  • It might be useful to handle blanks separately as they don't require parsing:
phone2 <- function(x, region = "AU") {
    blank <- x %in% c("", NA, "+")
    out <- rep(dialr::phone("", "AU"), length(x)) 
    
    if (sum(!blank, na.rm = TRUE) > 0) {
        non_blank <- dialr::phone(x[!blank], region)
        out[!blank] <- non_blank
    }
    
    out
}

options(dialr.show_progress = FALSE)
bench::mark(
    phone2(rep("", 10000)),
    dialr::phone(rep("", 10000), region = "AU")
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
#> # A tibble: 2 x 6
#>   expression                                       min   median `itr/sec`
#>   <bch:expr>                                  <bch:tm> <bch:tm>     <dbl>
#> 1 phone2(rep("", 10000))                       496.8us  515.3us  1782.   
#> 2 dialr::phone(rep("", 10000), region = "AU")    2.49s    2.49s     0.401
#> # ... with 2 more variables: mem_alloc <bch:byt>, gc/sec <dbl>

Created on 2021-08-11 by the reprex package (v2.0.0)

  • To that end, it would be good to have a method for creating empty phone() vectors like such:
character()
#> character(0)

phone()
#> phone(0)

@kinto-b
Copy link
Author

kinto-b commented Aug 11, 2021

@gorcha haha, you've literally just done what I was suggesting

@gorcha
Copy link
Collaborator

gorcha commented Aug 11, 2021

Yah pretty much, it'll also pick up some other rando invalid numbers like "+", "0" etc.
It's pretty much just pulling the regex that libphonenumber uses out of Java into R to avoid calling up Java when it's going to throw an exception anyway.

Re: the second point, can you please open a separate issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants