Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Changing Tendril Atomicity with SendTendril produces garbage. #58

Closed
sqwishy opened this issue Jan 25, 2022 · 4 comments · Fixed by #59
Closed

Changing Tendril Atomicity with SendTendril produces garbage. #58

sqwishy opened this issue Jan 25, 2022 · 4 comments · Fixed by #59

Comments

@sqwishy
Copy link

sqwishy commented Jan 25, 2022

My understanding is that this should not modify the text of the tendril and that this is a reasonable thing to do.

#[cfg(test)]
mod tests {
    use tendril::{fmt::UTF8, Atomic, NonAtomic, Tendril};

    #[test]
    fn it_works() {
        let s: Tendril<UTF8, NonAtomic> = "<p><i>Hello!</p>, World!</i>".into();
        dbg!(&s);
        let s: Tendril<UTF8, Atomic> = s.into_send().into();
        dbg!(&s);
    }
}

But the output is something like:

[src/lib.rs:8] &s = Tendril<UTF8>(owned: "<p><i>Hello!</p>, World!</i>")
[src/lib.rs:10] &s = Tendril<UTF8>(owned: "i>Hello!</p>, World!</i>\u{0}\u{0}\u{0}\u{0}")

It turns out the headers are different sizes so the transmute is weird.

std::mem::size_of::<Header<Atomic>>() = 16
std::mem::size_of::<Header<NonAtomic>>() = 12

I don't know if the Header should get #[repr(align(16))] or if there's a less platform-specific thing or something; but it seems like it might be a bug idk. ¯\_(ツ)_/¯

@SimonSapin
Copy link
Member

This is definitely a bug. It looks like this functionality is not tested at all, lovely.

My suspicion is on PackedUsize, which has #[repr(C, packed)] but is only used for NonAtomic. It was introduced in #46 … maybe as an attempt to preserve some of the effect of #[repr(packed)] that was previously on Header? @xfix, is that right?

This in turn was added in b39ee1f which talks about the C API. Since we removed that, we should probably remove PackedUsize as well.

@SimonSapin
Copy link
Member

Removing PackedUsize fixes this immediate issue, but that it happened in the first place only reinforces my feeling that Tendril’s casual and pervasive use of unsafe is dangerous. It might have been sound once, but when all of the library is responsible to maintain invariant that are only known by the original author it’s not surprising that things break is subtle ways when other folks make changer over the years.

I’d recommend not to use Tendril if you can, though presumably you need it in order to use html5ever.

I started a rewrite of Tendril at https://github.com/servo/html5ever/tree/zbuf/ with the idea of having unsafe much more limited, self-contained, and documented. But I never quite finished it or ported html5ever to it. Though maybe making yet another unsafe-using library is not a great plan and html5ever should rather use something better maintained like https://crates.io/crates/bytes instead.

SimonSapin added a commit that referenced this issue Jan 26, 2022
This fixes #58, which shows UB
when converting between non-atomic and atomic `Tendril`.
This conversion is exposed without `unsafe` through `SendTendril`
and implemented with `transmute`, so matching layout is essential.
@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Jan 26, 2022

Oops, didn't realize that into_send is a method. I wonder why did the tests not break...

The intent of PackedUsize was to have an usize whose alignment was 4 to stay compatible with C API. The reason why I did that for non-atomic API only is that packing is incorrect for atomic variables (CPUs generally don't guarantee atomicity without proper alignment).

@KamilaBorowska
Copy link
Contributor

KamilaBorowska commented Jan 26, 2022

But yeah, there is way too much unsafe in this crate. I believe I did write something to that effect in https://lib.rs/crates/tendril/crev: "There is ridiculous amount of unsafe. I have no idea how anything in this crate works." I did try fixing some things (things that were obviously wrong), but to be fair, it's still one of crates I'm least certain about in my program dependencies (and I use this due to using ammonia).

SimonSapin added a commit that referenced this issue Feb 10, 2022
This fixes #58, which shows UB
when converting between non-atomic and atomic `Tendril`.
This conversion is exposed without `unsafe` through `SendTendril`
and implemented with `transmute`, so matching layout is essential.
bors-servo added a commit that referenced this issue Feb 10, 2022
Fix layout differences of Header<Atomic> v.s. Header<NonAtomic>

This fixes #58, which shows UB when converting between non-atomic and atomic `Tendril`. This conversion is exposed without `unsafe` through `SendTendril` and implemented with `transmute`, so matching layout is essential.
sqwishy added a commit to sqwishy/nipper that referenced this issue Feb 5, 2023
I made these changes a long time ago and don't really remember why. Only
that I wanted to use graphql over a nipper `Document` and it wasn't Sync
and graphql_async wants everything to implement to implement Sync (see
traits OutputType and ContainerType). In my case I don't care for Sync
but it's easier to change nipper than to change async_graphql.

It also helped me find a bug in the tendril library. [1]

I think one change was so that if you use the :scope css selector it
matches the node you are trying to match from, kind of like
`querySelector()`. See developer.mozilla.org for more [2]

[1] servo/tendril#58
[2] https://developer.mozilla.org/en-US/docs/Web/CSS/:scope
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants