Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Create API for parsing to AST and NSAttributedString #132

Merged
merged 55 commits into from
Apr 21, 2019

Conversation

johnxnguyen
Copy link
Owner

@johnxnguyen johnxnguyen commented Apr 8, 2019

What's In This PR?

This PR introduces the ability to traverse the abstract syntax tree generated by cmark, which will lead to the construction of an NSAttributedString without the need to render html in a web view.

How Does It Work?

The Node protocol represents a wrapper around the cmark_node type. There is a concrete class for each type of cmark_node, which exposes the relevant properties for that type.

The Visitor protocol describes an object that is able to traverse the syntax tree. This approach is quite flexible, inasmuch that arbitrary data can be generated from the tree using a concrete implementation of this protocol. For example, we could define two different visitors to generate an NSAttributedString: one that renders the styled text, and another that does the same, but also includes the markdown syntax.

AttributedStringVisitor is a concrete visitor implementation that generates an NSMutableAttributedString. As it visits each nodes it adds the necessary line breaks and textual formatting and applies the visual attributes. It does not apply the attributes directly, but delegates this task to a Styler.

The Styler protocol declares a styling method for each type of node. Each method receives a reference to the NSMutableAttributedString as well as potentially additional information which may be needed (header size, urls etc.).

A concrete Styler can choose exactly how the attributes should be applied. For example, one styler could allow headers to contain bold or italic styles, while another styler could prohibit this by overwriting any such attributes that may have been previously applied to the header.

Notes

I haven't provided a concrete styler as this is quite an involved task and I think it would be best to address this in a separate PR, if you are interested in it. From my experience the hardest part to get right is the paragraph styling for nested blocks such as lists and block quotes.

I would like to update the readme and possibly the demo to elucidate this feature (this could also be done in the separate PR).

@mglass
Copy link

mglass commented Apr 9, 2019

@johnxnguyen This looks great. I'd like to start working on the styling piece today. I'll attach a patch with whatever changes I make and if they're in-line with your direction on this you can include however much of it makes sense.

@mglass
Copy link

mglass commented Apr 9, 2019

@johnxnguyen Patch attached. Pretty basic implementation, with a couple things still needing to be done, but maybe it'll help. Will check in again tomorrow!
0001-First-pass-at-rendering-Attributed-Strings.patch.zip

@johnxnguyen
Copy link
Owner Author

@mglass cool, I'll have a look in the morning. I've done some more work regarding the attributed string visitor and a styler, I'll try to push something tomorrow.

@iwasrobbed
Copy link
Collaborator

@johnxnguyen Thanks for this! I'll take an initial look later today 🙏

@johnxnguyen
Copy link
Owner Author

There's still a bit to do but it's a start. I'm out of town at the moment so probably can't get much done this weekend, but next week for sure I'll be finished.

Copy link
Collaborator

@iwasrobbed iwasrobbed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks for the changes! Just one more comment from the last review that might be worthwhile changing in case it's computed often

Source/AST/Nodes/Node.swift Outdated Show resolved Hide resolved
Source/AST/Nodes/Node.swift Show resolved Hide resolved
@iwasrobbed
Copy link
Collaborator

I ran a few basic tests and it seems like this is an order of magnitude faster 🎉

Time: 0.015 secs

    func testPerformanceOfCustomASTRendering() throws {
        measure {
            let _ = try! Down(markdownString: VisitorTests.cmarkMarkdownTestString).toAttributedString(styler: EmptyStyler()).string
        }
    }

Time: 0.183 sec

    func testPerformanceOfWebKitRendering() {
        measure {
            let _ = try! Down(markdownString: VisitorTests.cmarkMarkdownTestString).toAttributedString().string
        }
    }

I'm arbitrarily using https://github.com/thephpleague/commonmark/edit/master/tests/benchmark/sample.md

but there are more specific examples here https://github.com/commonmark/cmark/tree/master/bench/samples

There are a few differences in the actual .string result though which is a little strange. It leaves a few of the HTML tags in place using the AST method whereas WebKit renders them all down to strings

@johnxnguyen
Copy link
Owner Author

johnxnguyen commented Apr 21, 2019

Whoops, forgot that last point about the children nodes. Fixed that, a typo and the warnings.

In regards to the strings produced by AttributedStringVisitor, I've by no means based the result off what WebKit is rendering. What do you mean exactly about the html tags?

I believe we will need to tweak AttributedStringVisitor once we can use and test it more. In this PR I wanted to focus on the tree traversal rather than the rendered output. From experience, there are a lot of things to consider when rendering the output, mostly styling related, it could be that there needs to be some extra line breaks or fewer line breaks in particular cases.

Providing a default styler would be the icing on the cake for this feature, since it's quite a bit of work to make the attributed string look as nice as the render in a web view.

@iwasrobbed
Copy link
Collaborator

Re: HTML tags: I think the AST renderer is actually doing "the right thing"™️ in this case since the Markdown sample has actual HTML embedded within it for a couple of the headers instead of pure Markdown, so it's rendering it as inline HTML which makes sense.

Screen Shot 2019-04-21 at 6 15 41 PM


Providing a default styler would be the icing on the cake for this feature, since it's quite a bit of work to make the attributed string look as nice as the render in a web view.

For sure, this can come later but this is a great abstraction to work from. It'll likely bring the overall performance down a little depending upon the attributed string stylings, but still much better when this can be run on a background thread

@iwasrobbed
Copy link
Collaborator

Thanks again @johnxnguyen ; I tagged all the issues/PRs asking for this over the years so you can see how many people you've helped and will help.

It's a great foundation to build upon and we'll see about adding a default styler to help people who want the performance with some minimal setup next

@johnxnguyen
Copy link
Owner Author

You're most welcome @iwasrobbed , I'm glad I could help!

@kengruven
Copy link
Collaborator

I haven't tried it yet but this looks great. Exactly what I would have done, if I had infinite time. Thank you!

@ykphuah
Copy link
Contributor

ykphuah commented Jun 15, 2019

Not sure if this is the right place or I should create another new issue.

One good thing about the existing renderer is that when I have a bullet item that wraps around, the second line starts at the proper place, aligned to the indent of the first line.

I just tried using the EmptyStyler, it seems like I won't be able to get that same behavior if I use the Styler, is that correct?

@johnxnguyen
Copy link
Owner Author

@ykphuah the empty styler is just for test purposes and doesn't actually style the attributed string in any way. You will need to create your own styler to format the lists. I'm currently working on providing a default styler that will take care of this, amongst other things, but it's not ready yet. If you need some hints of how to implement the list formatting, just let me know.

@ykphuah
Copy link
Contributor

ykphuah commented Jun 16, 2019

@ykphuah the empty styler is just for test purposes and doesn't actually style the attributed string in any way. You will need to create your own styler to format the lists. I'm currently working on providing a default styler that will take care of this, amongst other things, but it's not ready yet. If you need some hints of how to implement the list formatting, just let me know.

Yeah, I need help. I picked MarkdownKit currently, because it have more "sane" defaults for the font sizes etc. But one issue is that if my list item wraps, the second line will not start aligned to the first line after the bullet. It shows like this

* asdfasd sdf asdf asdf
asdf

When I want it to show like this

* asdfasd as dfas dfa sdf a
  asdf as dfsa df asd fa

The "non-AST" version of the toAttributedString does do this correctly. I am trying to figure out how to do this via the AST. If you can give me some hints that will be great. Even if the EmptyStyler doesn't do anything, it added the bullets there though, instead of asterisk, so it seems like it is doing something.

I have tried this:

func style(list str: NSMutableAttributedString) {
       let paragraphStyle = NSMutableParagraphStyle()
       paragraphStyle.firstLineHeadIndent = 20.0
       paragraphStyle.headIndent = 20.0
       str.addAttributes([.paragraphStyle: paragraphStyle])
   }

Still doesn't seems to indent the second line.

@johnxnguyen
Copy link
Owner Author

@ykphuah, the AttributedStringVisitor walks the AST generated by cmark in order to build the attributed string. The AST is a structural representation of the document, therefore the AttributedStringVisitor needs to build up the attributed string using the content stored in the tree as well as inserting the necessary line breaks and numbers and bullets for the lists. Once the visitor has build the substring represented by a specific node in the AST, it will give the styler a chance to add any attributes.

As for the list formatting you want: make use of the tab that is between the list item prefix (number or bullet) and the list item content. Instead of setting the firstLineHeadIndent, create a single NSTextTab with the location equal to the headIndent. You set this tab stop as the only element on the paragraph styles tabStops property. Like this:

let style = NSMutableParagraphStyle()
let indentation: CGFloat = 20
style.headIndent = indentation
style.tabStops = [NSTextTab(textAlignment: .left, location: indentation, options: [:])]

This is just off the top of my head but I think it would work, and I hope it helps.

@ykphuah
Copy link
Contributor

ykphuah commented Jun 17, 2019

let style = NSMutableParagraphStyle()
let indentation: CGFloat = 20
style.headIndent = indentation
style.tabStops = [NSTextTab(textAlignment: .left, location: indentation, options: [:])]

Thanks, this worked very well. How about nested list? :P

@iwasrobbed
Copy link
Collaborator

FYI: this has been further improved with a default Styler via #177

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants