Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

read_file() %>% other_stuff() %>% write_file() has unexpected results #1158

Closed
seth127 opened this issue Dec 1, 2020 · 6 comments
Closed

Comments

@seth127
Copy link

seth127 commented Dec 1, 2020

I noticed this when switching from readr 1.3.1 to 1.4.0, but as shown below there is some unexpected behavior in 1.3.1 too. Basically you get weird behavior (usually an empty file) when you have a sequence of functions connected by a %>% that:

  • reads something in with read_file()
  • modifies the resulting string
  • writes something out with write_file()

Reprex 1

This is a version of what my original code was doing: using stringr::str_replace() to modify the contents of a file.

# This works as expected: foo is replaced by bar in the file
> library(readr)
> library(stringr)
> packageVersion("readr")
[1] ‘1.3.1> packageVersion("stringr")
[1] ‘1.4.0> write_file("foo", "foo.txt")
> read_file("foo.txt")
[1] "foo"
> read_file("foo.txt") %>% str_replace("foo", "bar")
[1] "bar"
> read_file("foo.txt") %>% str_replace("foo", "bar") %>% write_file("foo.txt")
> read_file("foo.txt")
[1] "bar"

But when I do this with 1.4.0 I get a blank file:

> library(readr)
> library(stringr)
> packageVersion("readr")
[1] ‘1.4.0> packageVersion("stringr")
[1] ‘1.4.0> write_file("foo", "foo.txt")
> read_file("foo.txt")
[1] "foo"
> read_file("foo.txt") %>% str_replace("foo", "bar")
[1] "bar"
> read_file("foo.txt") %>% str_replace("foo", "bar") %>% write_file("foo.txt")
> read_file("foo.txt")
[1] ""

Reprex 2

I tried to simplify this example by using paste() instead, and now I get unexpected behavior in both 1.3.1 and 1.4.0. Hopefully this helps instead of confusing the issue further. Either way, I would expect the result of read_file("foo.txt") %>% paste("second") to the same thing that ends up in the file when I pipe it to write_file() but that's not the case in either version.

> library(readr)
> packageVersion("readr")
[1] ‘1.3.1> write_file("first", "foo.txt")
> read_file("foo.txt")
[1] "first"
> read_file("foo.txt") %>% paste("second")
[1] "first second"
> read_file("foo.txt") %>% paste("second") %>% write_file("foo.txt")
> read_file("foo.txt")
[1] " second"

And also here

> library(readr)
> packageVersion("readr")
[1] ‘1.4.0> write_file("first", "foo.txt")
> read_file("foo.txt")
[1] "first"
> read_file("foo.txt") %>% paste("second")
[1] "first second"
> read_file("foo.txt") %>% paste("second") %>% write_file("foo.txt")
> read_file("foo.txt")
[1] " second"
@copernican
Copy link

copernican commented Dec 1, 2020

This example removes magrittr as a possible cause, and seems to indicate that the behavior was present in 1.3.1.

library(readr)
packageVersion("readr")
#> [1] '1.3.1'

fn <- "foo.txt"
write_file("foo", fn)
read_file(fn)
#> [1] "foo"

write_file(read_file(fn), fn)
read_file(fn)
#> [1] ""

Created on 2020-12-01 by the reprex package (v0.3.0)

@seth127
Copy link
Author

seth127 commented Dec 1, 2020

well, @copernican 's example seems to say that a similar behavior was happening without the pipes in both 1.3.1 and 1.4.0 so that might point toward magrittr. Indeed that appears to be relevant:

Behaves as I would expect with magrittr 1.5

> library(readr)
> library(stringr)
> library(magrittr)
> packageVersion("readr")
[1] ‘1.3.1> packageVersion("stringr")
[1] ‘1.4.0> packageVersion("magrittr")
[1] ‘1.5> write_file("foo", "foo.txt")
> read_file("foo.txt")
[1] "foo"
> read_file("foo.txt") %>% str_replace("foo", "bar")
[1] "bar"
> read_file("foo.txt") %>% str_replace("foo", "bar") %>% write_file("foo.txt")
> read_file("foo.txt")
[1] "bar"


> library(readr)
> library(magrittr)
> packageVersion("readr")
[1] ‘1.3.1> packageVersion("magrittr")
[1] ‘1.5> write_file("first", "foo.txt")
> read_file("foo.txt")
[1] "first"
> read_file("foo.txt") %>% paste("second")
[1] "first second"
> read_file("foo.txt") %>% paste("second") %>% write_file("foo.txt")
> read_file("foo.txt")
[1] "first second"

Behaves weirdly (to me at least) with magrittr 2.0

> library(readr)
> library(stringr)
> library(magrittr)
> packageVersion("readr")
[1] ‘1.4.0> packageVersion("stringr")
[1] ‘1.4.0> packageVersion("magrittr")
[1] ‘2.0.1> write_file("foo", "foo.txt")
> read_file("foo.txt")
[1] "foo"
> read_file("foo.txt") %>% str_replace("foo", "bar")
[1] "bar"
> read_file("foo.txt") %>% str_replace("foo", "bar") %>% write_file("foo.txt")
> read_file("foo.txt")
[1] ""
> 
> 
> library(readr)
> library(magrittr)
> packageVersion("readr")
[1] ‘1.4.0> packageVersion("magrittr")
[1] ‘2.0.1> write_file("first", "foo.txt")
> read_file("foo.txt")
[1] "first"
> read_file("foo.txt") %>% paste("second")
[1] "first second"
> read_file("foo.txt") %>% paste("second") %>% write_file("foo.txt")
> read_file("foo.txt")
[1] " second"

Although I'm still not entirely clear on what's going on in the example @copernican gives. Furthermore, I'm not clear on whether or not it's related to what I observed, though I think probably not.

@jimhester
Copy link
Collaborator

This is due to R's use of lazy evaluation.

If you look at the code for write_file()

readr/R/file.R

Lines 51 to 70 in 97186a8

write_file <- function(x, file, append = FALSE, path = deprecated()) {
if (is_present(path)) {
deprecate_warn("1.4.0", "write_file(path = )", "write_file(file = )")
file <- path
}
file <- standardise_path(file, input = FALSE)
if (!isOpen(file)) {
on.exit(close(file), add = TRUE)
if (isTRUE(append)) {
open(file, "ab")
} else {
open(file, "wb")
}
}
if (is.raw(x)) {
write_file_raw_(x, file)
} else {
write_file_(x, file)
}

You will see that the file is opened before the argument x is used, which means that in
write_file(read_file(fn), fn) the read_file(fn) call doesn't get evaluated until after the file has been opened for writing.

magrittr 2.0 uses lazy evaluation, so is equivalent to the nested version
write_file(read_file(fn), fn). However prior versions of magrittr were eager, so the read_file() call would have happened first.

To avoid this behavior we could add force(x) at the beginning of write_file(), though I am not sure if we should do so or not.

@seth127
Copy link
Author

seth127 commented Dec 2, 2020

That makes sense. My gut is that adding force(x) is not necessary because that would only solve the nested example (which is probably not a common coding pattern), while the more common %>% pattern would still not work because of magrittr's laziness.

Not sure if this is worth documenting anywhere (probably somewhere in magrittr?) but I'm glad to understand what's actually happening. Thanks for explanation @jimhester . Feel free to close this issue at your discretion.

@jimhester
Copy link
Collaborator

It would actually solve the issue in both cases.

@hongooi73
Copy link

Honestly, reading AND writing to the same file in the one expression seems like a bad idea regardless.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants