Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement fileinput #5202

Closed
wants to merge 3 commits into from
Closed

Implement fileinput #5202

wants to merge 3 commits into from

Conversation

huonw
Copy link
Member

@huonw huonw commented Mar 2, 2013

As per https://github.com/mozilla/rust/wiki/Note-wanted-libraries.

Iterates over lines in a series of files, e.g. a basic cat

use std::fileinput;
fn main() {
    for fileinput::input |line| {
        io::println(line);
    }   
}

The API is essentially a subset of Python's fileinput module, although the lack of default arguments and global mutable state means that there are extra functions to handle a few different cases (files from command line arguments, files from a vector, accessing current filename/line number).

A few points that possibly require adjustment:

  • Most functions take vectors of Path (well, Option<Path>) rather than just ~str, since this seems safer, and allows finer control without the number of different functions/methods increasing exponentially.
  • pathify has a stupid name.
  • I'm not quite sure how to mock tests that require external files: the tests in libcore/io.rs seem to indicate using a tmp subdirectory, so that's what I did, but I can't reliably build rust on this computer to test (sorry! although I have run the tests in just fileinput.rs after creating ./tmp/ manually).
  • The documentation I've written seems pretty crappy and not particularly clear.
  • Only UTF8 files are supported.

@huonw
Copy link
Member Author

huonw commented Mar 4, 2013

@graydon I think I fixed the build failure.

(Edit: I see you're already on top of it.)

@huonw
Copy link
Member Author

huonw commented Mar 4, 2013

Sorry :( (Last attempt before I give up and stop attempting to contribute while I can't test.)

@huonw
Copy link
Member Author

huonw commented Mar 9, 2013

@graydon I've finally got a computer that compiles rust in less than 2 hours, and so I've sorted out all the problems with this patch.

@graydon
Copy link
Contributor

graydon commented Mar 11, 2013

Great! I will approve it for landing, but I did actually start poking at it as promised, just ... I got side tracked and started fussing with the actual implementation. In particular I started trying to make improvements, but each one led to a more-general one:

  • To return str& slices of the input, not ~str slices
  • To implement Reader, actually, and just acquire appropriate methods via ReaderUtil
  • To factor out the functionality of "reading from the concatenation of multiple Readers in sequence" rather than wiring it up to this particular module
  • To improve the io module so that non-boxed Readers are possible. Current design is very old and allocation / object heavy.

Around that point I got sidetracked and didn't finish. Sorry. If you want to make any or all of those changes in followup patches I'd certainly appreciate it. Otherwise it'll just go on the "more refactoring to the io modules" pile-o-bugs.

@huonw
Copy link
Member Author

huonw commented Mar 11, 2013

Sorry for stepping on your toes!

Do you have the partial code in a branch somewhere that I could continue off?

@graydon
Copy link
Contributor

graydon commented Mar 12, 2013

No worries. Unfortunately not, I kept deleting and restarting what I was doing. It wasn't terribly deep; the comment I wrote above is more informative than any of the half-sketch work I had done.

@huonw
Copy link
Member Author

huonw commented Mar 13, 2013

Hm, some quick questions/comments.

To return str& slices of the input, not ~str slices

Do you mean next_line(..) -> &str? And/or lines_each(..., &fn(&str) -> bool)?

To factor out the functionality of "reading from the concatenation of multiple Readers in sequence" rather than wiring it up to this particular module

I see two useful methods: a plain with_list_readers(&[@Reader], fn(@Reader) -> t) -> t which sequences a list of readers into a single one, and another that does the same but generates the Readers as necessary: with_ondemand_readers(&fn() -> Some<@Reader>, &fn(@Reader) -> t) -> t (approximately). However neither offers much option for retrieving useful information that depends on the exact Reader impl being used (e.g. current socket number, filename, line number, byte count, whatever). And seek and tell most likely have very peculiar behaviour, since they can only reasonably act within the current sub-Reader.

In summary: I can't see a good way of designing this that allows it to be reused for fileinput and keep similar functionality.

To implement Reader, actually, and just acquire appropriate methods via ReaderUtil

I thought about doing this before, but discarded it without actually trying (not quite sure why), but it's clearly a better solution than renaming methods; although it likely comes a performance cost due to having to scan for '\n' (not a problem, since this is a convenience library). I'll do this.

@graydon
Copy link
Contributor

graydon commented Mar 15, 2013

I'm not sure why there'd be a performance cost to implementing Reader, can you elaborate?

Yes on returning &str and calling inner &fn(&str)->bool for lines. Less allocation is better.

@huonw
Copy link
Member Author

huonw commented Mar 15, 2013

The performance cost is just having to scan everything for \n to keep the line counts in sync (i.e. when reading via .read).

Hmm, ok, not sure why I didn't follow the convention with the &strs, that will be fixed when I implement Reader after #5398 merges. Anyway, I'm probably incorrect, but I feel like that change actually increases the amount of allocation done, since the user will sometimes want to convert the &str back to a ~str (i.e. an allocation), and the ~str to &str is much cheaper (free?), and the original ~str is always going to be allocated (since the underlying io functions allocate). So it is more efficient to just let the compiler move it into the function. That is to say, the second code has few allocations, since all the uses of the strings are moves:

(with slices)

let mut lines = ~[];
for fileinput::input |slice : &str| {
     lines.push(str::from_slice(slice))
}

vs.

(as it is now)

let mut lines = ~[];
for fileinput::input |owned : ~str| {
     lines.push(owned)
}

(Similar reasoning could be used for many of the io functions: they return a &str even though they had to allocate a ~str anyway.)

@graydon
Copy link
Contributor

graydon commented Mar 18, 2013

The underlying io functions allocating ~str is a bug. I'm trying to prevent it propagating into other modules by committing to such APIs. Ideally we should fix that sometime too.

(In general, any time it seems like we're using ~str or ~[] in library code in a place where you can picture an API designed better that does not necessarily allocate or pass ownership, it's worth checking the history of that API. Most of them are residue from the long-ago state where all vectors and strings were unique. We did a batch conversion when we introduced the differentiated ~[], &[] and @[] types; we did not fix all the associated APIs to make sensible use of &[]. We still have a ton of technical debt in this matter.)

(Also also: sorry for being picky on this and/or trying to use it as some kind of teachable moment. Your patch is good and any time you get frustrated with this conversation just say so and I'm happy to merge it in the state it's in. Any module doing something fileinput-y is better than none and I'm very happy to see this!)

@huonw
Copy link
Member Author

huonw commented Mar 18, 2013

Ok, I understand about &str.

I'm not getting frustrated: it's not my project and I completely understand being picky and trying to get things as you would prefer, and anyway, I really appreciate that you (and the rest of the core team) are so willing to spend the time to help me get my contributions into shape. (And, the longer the conversation goes now, the more I'll learn about the design decisions etc, so (hopefully) the shorter it'll be next time.)

@huonw
Copy link
Member Author

huonw commented Mar 25, 2013

@graydon, I've implemented Reader and done the &str change. I'd like to get this landed, and then I maybe might think about general-concatenation-of-readers and io redesign.

I had an abortive attempt at moving away from mutable fields in io and passing &mut self to the Reader and Writer methods but that seemed to require changes across every crate, and fighting "illegal borrow unless pure"s (isn't the pure/impure distinction removed?), so I'll leave that for a separate PR. However, I have tried to not increase the technical debt, by getting this module into such a state that the change to &mut self basically requires only renaming FileInput_ to FileInput and doing s/&self/&mut self/ and s/self.fi./self./... but I'm sure it won't be that simple.

Also: I've had to add 2 unsafe blocks: I'm fairly sure they're safe and it's rustc being insufficiently smart (or old purity-related code that hasn't been adapted to the new system?), since it's just checking the lists are empty; just drawing your attention to it because there could easily be some subtly that I'm missing?

Lastly, mentioned this in the 0.6 release notes per #5344. (edit 2013-4-3: removed that commit, since .6 was released.)

huonw added 3 commits April 3, 2013 11:36
Iterate over lines in a series of files. API (mostly) adopted from
Python's fileinput module.
bors added a commit that referenced this pull request Apr 3, 2013
As per https://github.com/mozilla/rust/wiki/Note-wanted-libraries.

Iterates over lines in a series of files, e.g. a basic `cat`

```rust
use std::fileinput;
fn main() {
    for fileinput::input |line| {
        io::println(line);
    }   
}
```

The API is essentially a subset of [Python's fileinput module](http://docs.python.org/3.3/library/fileinput.html), although the lack of default arguments and global mutable state means that there are extra functions to handle a few different cases (files from command line arguments, files from a vector, accessing current filename/line number).

A few points that possibly require adjustment:
- Most functions take vectors of `Path` (well, `Option<Path>`) rather than just `~str`, since this seems safer, and allows finer control without the number of different functions/methods increasing exponentially.
- `pathify` has a stupid name.
- I'm not quite sure how to mock tests that require external files: the tests in `libcore/io.rs` seem to indicate using a `tmp` subdirectory, so that's what I did, but I can't reliably build rust on this computer to test (sorry! although I have run the tests in just `fileinput.rs`  after creating `./tmp/` manually).
- The documentation I've written seems pretty crappy and not particularly clear.
- Only UTF8 files are supported.
@bors bors closed this Apr 3, 2013
bors added a commit to rust-lang-ci/rust that referenced this pull request May 2, 2020
…sy-float-literal-restriction, r=flip1995

Move check for lossy whole-number floats out of `excessive_precision`

changelog: Add new lint `lossy_float_literal` to detect lossy whole number float literals and move it out of `excessive_precision` again.

Fixes rust-lang#5201
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants