Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Owning reference to a string with O(1) access to lines #3

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

pacak
Copy link

@pacak pacak commented Mar 3, 2022

This can be useful if you have a string that you don't want to modify but do want to access line by line in a random order.

@pacak
Copy link
Author

pacak commented Jun 12, 2023

Any thoughts?

@vallentin
Copy link
Owner

Hey. Sorry. I completely overlooked this PR. I just randomly did some changes today, and well, your new comment pinged me. I'll just take a look at this PR :)

@vallentin
Copy link
Owner

I have a few questions and thoughts. So feel free to respond to those, before you spend time reworking the PR.

(I'll review the PR now, and add the questions and thoughts as comments)

Copy link
Owner

@vallentin vallentin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again. Sorry for the lack or response until now. I somehow overlooked the original notification for this PR

/// - [`with_ending`](CachedLines::with_ending)
///
///
#[cfg(feature = "alloc")]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be easier to avoid needing all the #[cfg(feature = "alloc")], by having a:

#[cfg(feature = "alloc")]
mod cached;

and then adding this implementation to a cached.rs module.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay either way, was mostly trying to keep things in one file, second alternative is

mod cached {
    ...
}

Will do.

#[cfg(feature = "alloc")]
pub struct CachedLines {
content: alloc::string::String,
splits: alloc::vec::Vec<Range<usize>>,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead we could have a:

pub struct LineSpanWithoutText {
    start: usize,
    end: usize,
    ending: usize,
}

and then:

Suggested change
splits: alloc::vec::Vec<Range<usize>>,
splits: alloc::vec::Vec<LineSpanWithoutText>,

That way we could replace with_ending() and without_ending() with new()

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while but as far as I remember the reasoning was that you are very unlikely to use both representations at once and having just one representation gives Index and iterator goodies.

Comment on lines +928 to +929
pub fn get(&self, index: usize) -> Option<&str> {
self.content.get(self.splits.get(index)?.clone())
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(in extension of the previous comment)

Then have this be something like:

Suggested change
pub fn get(&self, index: usize) -> Option<&str> {
self.content.get(self.splits.get(index)?.clone())
pub fn get(&self, index: usize) -> Option<LineSpan<'_>> {
// i.e. from splits get LineSpanWithoutText + self.contents => LineSpan
...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, for get this works.

Comment on lines +935 to +938
type Output = str;

fn index(&self, index: usize) -> &Self::Output {
&self.content[self.splits[index].clone()]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. I was about to suggest calling .get() and returning the LineSpan, but this needs to return a reference

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index needs to return a reference to so to accommodate both with and without ending - it needs to be decided during construction. Either two different constructors - like I did or using either newtype/type parameter.

#[cfg(test)]
mod tests {
use super::*;

#[test]
fn test_cached_lines_no_endings() {
let text = "\r\nfoo\nbar\r\nbaz\nqux\r\n\r";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not that Rust 1.70 changed the behavior, so trailing \r will appear in the final line. (Just in case this test suddenly)

PR: rust-lang/rust#100311

Comment on lines +880 to +881
pub struct CachedLines {
content: alloc::string::String,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this can use alloc::borrow::Cow instead to allow for borrowing a &str instead of needing to turn it into an owned String

(Not sure if it creates other issues)

Suggested change
pub struct CachedLines {
content: alloc::string::String,
pub struct CachedLines<'a> {
content: alloc::borrow::Cow<'a>,

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay either way, but Cow adds complexity without adding significant benefits - CachedLines will have to allocate a vector of sizes regardless, adding a single allocation seems to be fine compared to the alternative of allocating and storing a whole bunch of lines.

///
///
#[cfg(feature = "alloc")]
pub struct CachedLines {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally, maybe it should be called LineSpanCache/CachedLineSpans. If it's changed to collect LineSpan

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Names are hard, sure.

Comment on lines +863 to +880
#[derive(Debug, Clone)]
/// Owning pointer to a string that provides O(1) access to line slices
///
/// ```
/// use line_span::CachedLines;
/// let cache = CachedLines::without_ending(String::from("hello\nworld"));
/// assert_eq!(&cache[0], "hello");
/// assert_eq!(&cache[1], "world");
/// ```
///
/// Slices might or might not include line breaks depending on a function used to construct it
///
/// - [`without_ending`](CachedLines::without_ending)
/// - [`with_ending`](CachedLines::with_ending)
///
///
#[cfg(feature = "alloc")]
pub struct CachedLines {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please move the attribute, to after the comment

#[derive(Clone, Debug)]
pub struct CachedLines {

Comment on lines +863 to +880
#[derive(Debug, Clone)]
/// Owning pointer to a string that provides O(1) access to line slices
///
/// ```
/// use line_span::CachedLines;
/// let cache = CachedLines::without_ending(String::from("hello\nworld"));
/// assert_eq!(&cache[0], "hello");
/// assert_eq!(&cache[1], "world");
/// ```
///
/// Slices might or might not include line breaks depending on a function used to construct it
///
/// - [`without_ending`](CachedLines::without_ending)
/// - [`with_ending`](CachedLines::with_ending)
///
///
#[cfg(feature = "alloc")]
pub struct CachedLines {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also feel like this should probably be feature gated, with a cache = [] feature

Copy link
Author

@pacak pacak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently I inlined proposed changes into my crate and using Index to access strings, code is here: https://github.com/pacak/cargo-show-asm/blob/master/src/cached_lines.rs

There's no rush to get this included :)

Open questions are:

  • Cow - yes/now. I decided against it since it's complexity without significant gains
  • stored values - Range<usize> or Linespan - former offers uniform interface for get and Index, later seems to be a bit less useful
  • put under separate feature or not. I decided against it, "alloc" limits what platforms you can run on, having "cached" as a separate feature saves a tiny bit of compilation time at expense of user confusion :)

What are your preferences herE?

/// - [`with_ending`](CachedLines::with_ending)
///
///
#[cfg(feature = "alloc")]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay either way, was mostly trying to keep things in one file, second alternative is

mod cached {
    ...
}

Will do.

///
///
#[cfg(feature = "alloc")]
pub struct CachedLines {
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Names are hard, sure.

Comment on lines +880 to +881
pub struct CachedLines {
content: alloc::string::String,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay either way, but Cow adds complexity without adding significant benefits - CachedLines will have to allocate a vector of sizes regardless, adding a single allocation seems to be fine compared to the alternative of allocating and storing a whole bunch of lines.

#[cfg(feature = "alloc")]
pub struct CachedLines {
content: alloc::string::String,
splits: alloc::vec::Vec<Range<usize>>,
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while but as far as I remember the reasoning was that you are very unlikely to use both representations at once and having just one representation gives Index and iterator goodies.

Comment on lines +935 to +938
type Output = str;

fn index(&self, index: usize) -> &Self::Output {
&self.content[self.splits[index].clone()]
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Index needs to return a reference to so to accommodate both with and without ending - it needs to be decided during construction. Either two different constructors - like I did or using either newtype/type parameter.

Comment on lines +928 to +929
pub fn get(&self, index: usize) -> Option<&str> {
self.content.get(self.splits.get(index)?.clone())
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, for get this works.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants