This repository has been archived by the owner on Aug 24, 2019. It is now read-only.

Improved parser and other stuff #18

Open
wants to merge 115 commits into
base: master

@alehed alehed commented Aug 7, 2016

This should fix #5 and make the framework usable for production.
Uses or closes #5, #10, #11, #15 and #16.

Major changes

  • It works: parsing actually produces results that are close to what TextMate does.
  • Supports incremental parsing with an NSOperation subclass.
  • The public interface to get languages and themes has changed.
  • Updated for Swift 3, Xcode 8 and the Swift API Design Guidelines.
  • It copies the Color code from the X dependency (see #11 from alimoeeny, "Copied over color.swift from X since X is not a pod and").
  • Oh, and I changed tabs to spaces (soft tabs).

The last two changes might be a bit polarizing. The rationale behind them is to make it easier for new people to contribute. Setting up the environment for a new project is always a big hurdle for adoption. If you want to invite people to play around with your framework, make it as simple as possible to go from git clone to a project that builds (and where tests pass). This way it builds out of the box with no dependencies. Also, the current default in Xcode is soft tabs (4 spaces), so new users will not mess up the spacing if they use a clean install of Xcode.

Caveats

There are however a few things that still don't work as expected. Feel free to improve upon this.

Due to differences between Oniguruma and NSRegularExpression:

  • \G is not supported (for performance reasons the parser does not guarantee to match the end of a pattern right after begin, and even if it did, \G behaves weirdly with NSRegularExpression). This might be fixable.
  • NSRegularExpression seems to require a few more escape characters in certain places.
  • Backreferences from end to begin like \1 are not supported (I would like to know how TextMate does this).

Not implemented:

  • The contentName property is ignored.
  • Attributes other than foreground color are ignored.
  • A grammar cannot recursively include itself (use $self or $base instead); supporting this would require extra checks in the BundleManager. Fixable with extra logic.

Notes

Since I don't use Carthage or CocoaPods, some work probably still has to be done so that it plays well with them. Any help would be appreciated.

alehed added 30 commits January 1, 2016 19:36
  • Fix range bugs. Add Classes to handle entries in the tmbundles correctly. Extend tests.
  • no more prints and more tight matching
  • Some of the grammar regexes are not valid in CocoaLand.
  • Soft tabs is the way to go (and the default in xcode, afaik).
  • This change adds updatedRangeForChange to the parser and implements a ScopedString class to store all the scope information. Most other changes are tests for the new functionality or minor consistency changes.
  • now recursive structures and the like are read into the datastructure
  • Firstly this should clear up some confusion about scopes. In this project scope means some form of lexical scope (basically what is between the begin and end pattern) while what TextMate calls scope selectors or scope for short is now called patternIdentifier or identifier for short. A scope is now also a type of result.
  • Added test for the range extensions.
  • The parser now returns the range that definitely has to be parsed but parses further if it has to. Turns out the range cannot be determined without actually parsing the new string so this seems to be a good tradeoff.
  • end pattern should always take precedence over body matches and include $self should not override the other patterns in the array.
  • Mid patterns have precedence over end patterns and unhighlighted begin/end parts should still be put into the scopes string
@Eitot

Eitot commented May 25, 2017

It would be awesome if this could be merged.

@alehed
Author

alehed commented May 26, 2017

In the meantime you can just use the fork.

@DivineDominion

@soffes any plans on merging this in?

@Lessica

Lessica commented Aug 14, 2017

Backreferences from end to begin like \1 are not supported (I would like to know how TextMate does this).

Any progress on this?

@alehed
Author

alehed commented Aug 14, 2017

Nope, I decided the current state was good enough (backreferences are usually used to recognize languages embedded within other languages, which usually slows parsing down quite a bit). Also, for this you would have to match the begin and end parts at the same time, which is kind of tricky to implement.

If you want to have a go at this, PRs are always welcome.

@Lessica

Lessica commented Aug 14, 2017

Yes, of course. Lua uses this syntax:

(Multiline Comment)
begin: --\[(=*)\[
end: \]\1\]

(Multiline String)
begin: \[(=*)\[
end: \]\1\]

I am working on an iOS Lua Editor, your work really helped me a lot!

@alehed
Author

alehed commented Aug 14, 2017

You are welcome!

Yes, this is another use of back-references. I see how this can be useful.

If you want to implement this, your best bet is probably to combine the begin and end expressions with a .* and match that instead of just the begin expression. For this you would have to take a look at Pattern.swift and Parser.swift.
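A minimal sketch of that idea, assuming nothing about how Pattern.swift or Parser.swift are actually structured: `firstSpan` is a hypothetical helper, not part of SyntaxKit. It joins the begin and end expressions into one NSRegularExpression so that \1 in the end part can refer to the group captured by the begin part. A lazy `[\s\S]*?` is swapped in for the plain `.*` so the body can span lines and stops at the first matching end.

```swift
import Foundation

// Hypothetical helper, not part of SyntaxKit: matches begin, body and end in
// one go so that a backreference in `end` can see the groups from `begin`.
func firstSpan(in text: String, begin: String, end: String) -> String? {
    // Non-capturing wrappers leave the group numbering of `begin` untouched,
    // so \1 in `end` still refers to the first group of `begin`.
    let pattern = "(?:\(begin))[\\s\\S]*?(?:\(end))"
    guard let expression = try? NSRegularExpression(pattern: pattern, options: []) else {
        return nil
    }
    let string = text as NSString
    let whole = NSRange(location: 0, length: string.length)
    guard let match = expression.firstMatch(in: text, options: [], range: whole) else {
        return nil
    }
    return string.substring(with: match.range)
}

// Lua long-comment delimiters, with the brackets escaped for the regex engine.
let luaBegin = "--\\[(=*)\\["
let luaEnd = "\\]\\1\\]"

let luaSource = "x = 1 --[==[ a ]] b ]==] y = 2"
let span = firstSpan(in: luaSource, begin: luaBegin, end: luaEnd)
print(span ?? "no match")
// prints "--[==[ a ]] b ]==]"
```

The inner `]]` does not close the comment because the end part only matches with the same number of `=` signs that the begin part captured. How such a combined match would interact with the parser's precedence rules for nested patterns is left open here.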

@benstockdesign

@alehed I've got a question (or five), partner!

When using AttributedParsingOperation to perform incremental parsing, it seems as though there are only two options: insertion or deletion. How might one go about handling the “replacement” of text? There are three replacement situations that I continually run into where I'm not sure how best to handle parsing and/or re-parsing.

For instance, in all of the examples below, assume the following is true:

import Foundation

let source = "Hello, World!"
let replacementRange = NSRange(location: 7, length: 6) // "World!"

Given source is the target string and replacementRange is the range to be replaced within the target string, the possible replacement situations are:

  • Exact replacement: The replacement string is exactly the length of the range being replaced. Thus, the change in length between the strings is 0 (i.e. no characters have been “deleted” or “inserted” — in the traditional sense of the words).

    let replacement = "There!"
    // NSRange-based replacement lives on NSString, so bridge before calling it.
    let result = (source as NSString).replacingCharacters(in: replacementRange, with: replacement)
    print("\(source) => \(result) (delta: \(result.utf16.count - source.utf16.count))")
    // ~> Hello, World! => Hello, There! (delta: 0)
    
  • Additive replacement: The replacement string is longer than the range being replaced. Thus, the change in length between the strings is a positive value (i.e. characters have been “inserted”).

    let replacement = "My Dudes and Dudettes!"
    // NSRange-based replacement lives on NSString, so bridge before calling it.
    let result = (source as NSString).replacingCharacters(in: replacementRange, with: replacement)
    print("\(source) => \(result) (delta: \(result.utf16.count - source.utf16.count))")
    // ~> Hello, World! => Hello, My Dudes and Dudettes! (delta: 16)
    
  • Subtractive replacement: The replacement string is shorter than the range being replaced. Thus, the change in length between the strings is a negative value (i.e. characters have been “deleted”).

    let replacement = "Pal!"
    // NSRange-based replacement lives on NSString, so bridge before calling it.
    let result = (source as NSString).replacingCharacters(in: replacementRange, with: replacement)
    print("\(source) => \(result) (delta: \(result.utf16.count - source.utf16.count))")
    // ~> Hello, World! => Hello, Pal! (delta: -2)
    

In these situations, does one "break down" the replacement operation into its sub-operations? What about the first case, where the string has had no change in length? Does one re-parse the string in its entirety?

As far as background information goes, I simply have a custom NSTextView subclass that processes the changes as they happen. In shouldChangeText(in:replacementString:), I record the change (if the change should be allowed), then on didChangeText(), I enqueue the parsing operation in a custom queue like you suggested (e.g. chaining the previous operation to the next, or if no operations have been performed, creating the initial operation with the theme, language, etc.). Everything works great until I want to edit the text storage in some other method (like to implement convertSpacesToTabs() or something similar). I never know which range(s) should be considered “dirty.”
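To make the chaining concrete, here is a rough sketch of the queue setup described above. `SketchParseOperation` and `ParseScheduler` are hypothetical stand-ins written only for illustration; the real AttributedParsingOperation has its own initializers and results, and the word count is just a placeholder for actual parsing.

```swift
import Foundation

// Hypothetical stand-in for a parsing operation; SyntaxKit's real
// AttributedParsingOperation has its own initializers and results.
final class SketchParseOperation: Operation {
    let text: String
    private(set) var wordCount = 0  // placeholder for real parse results

    init(text: String, previous: SketchParseOperation?) {
        self.text = text
        super.init()
        // The dependency guarantees that edits are processed in order.
        if let previous = previous { addDependency(previous) }
    }

    override func main() {
        guard !isCancelled else { return }
        wordCount = text.split(separator: " ").count
    }
}

// Hypothetical scheduler that remembers the last operation so the next
// one can be chained to it.
final class ParseScheduler {
    private let queue = OperationQueue()
    private var last: SketchParseOperation?

    @discardableResult
    func enqueue(_ text: String) -> SketchParseOperation {
        let operation = SketchParseOperation(text: text, previous: last)
        last = operation
        queue.addOperation(operation)
        return operation
    }

    func waitUntilFinished() { queue.waitUntilAllOperationsAreFinished() }
}

let scheduler = ParseScheduler()
let first = scheduler.enqueue("let a = 1")
let second = scheduler.enqueue("let a = 1 + 2")
scheduler.waitUntilFinished()
print(first.wordCount, second.wordCount)
// prints "4 6"
```

In an editor one would call `enqueue` from didChangeText() and read the result in a completion block rather than blocking with `waitUntilFinished()`, which is only here to keep the sketch self-contained.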

If none of this makes sense, please let me know, and I'll try my best to expound on or simplify my question(s). Basically, I just want to know how best to handle the “replacement” aspect of the incremental parsing operation, rather than raw insertion/deletion scenarios.

Thanks!

P.S. Your changes to the base SyntaxKit are badass, and they've been a great help to me!

@alehed
Author

alehed commented Sep 26, 2017

Hi @benstockdesign, glad you like it.

Replacement is quite simple: first you delete, then you insert. Firstly, this model is simple and always works. Secondly, since the library depends on Foundation and UIKit, it is designed to integrate well with the rest of UIKit. For instance, NSTextStorage can notify you every time an update happens. If you do a replacement in a UITextView, it will first send you a deletion and afterwards an insertion notification. So you never have to deal with actual replacements.
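As a small illustration of the delete-then-insert model, the sketch below splits a replacement into its two steps. `SketchEdit` and `decompose` are hypothetical names made up for this example, not SyntaxKit API; each resulting edit would then be handed to its own parsing operation.

```swift
import Foundation

// Hypothetical edit type for illustration; not a SyntaxKit API.
enum SketchEdit: Equatable {
    case deletion(NSRange)   // range in the string before the deletion
    case insertion(NSRange)  // range the new text occupies afterwards
}

// Splits "replace `range` with `replacement`" into a deletion followed by
// an insertion. Empty steps are skipped.
func decompose(replacing range: NSRange, with replacement: String) -> [SketchEdit] {
    var edits: [SketchEdit] = []
    if range.length > 0 {
        edits.append(.deletion(range))
    }
    let insertedLength = (replacement as NSString).length
    if insertedLength > 0 {
        edits.append(.insertion(NSRange(location: range.location, length: insertedLength)))
    }
    return edits
}

// "Hello, World!" -> "Hello, Pal!": delete 6 characters at 7, then insert 4.
let edits = decompose(replacing: NSRange(location: 7, length: 6), with: "Pal!")
print(edits.count)
// prints "2"
```

An exact replacement (same length in and out) still yields two edits here, so the zero-delta case needs no special handling.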

If you observe a different behavior by NSTextStorage, please let me know.

@benstockdesign

@alehed Okay, that clears things up. I was just never sure. Oh, wait, one last question: what about in NSTextStorage when edits are coalesced (i.e. beginEditing(); insert(); delete(); delete(); insert(); endEditing())? Do you just attack the final change or all of the in-betweeners? By the way, I'm working on an editor for the Mac (I honestly think I could be the only one), so maybe there's a difference when it comes to some of the UI_____Class vs. NS_____Class stuff. I don't think there are too many differences with the Cocoa text system, but maybe I've overlooked something. Either way, thanks, again!

@alehed
Author

alehed commented Sep 26, 2017

@benstockdesign in my experience, user edits are never coalesced; correct me if I'm wrong. If you are doing edits programmatically, either do all the changes in succession, or maybe it is faster to just reload the whole thing once at the end. You might want to write a performance test for that.
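For programmatic batches, one option worth testing is to treat the whole batch as a single edit. As far as I know, the NSTextStorageDelegate callback textStorage(_:didProcessEditing:range:changeInLength:) reports one edited range and one length delta for everything between beginEditing() and endEditing(). Under that assumption, the range in the old string can be recovered as sketched below; `preEditRange` is a hypothetical helper, not an existing API.

```swift
import Foundation

// Hypothetical helper: recovers the range an edit covered in the old string
// from the values the text storage reports after the edit.
func preEditRange(editedRange: NSRange, changeInLength delta: Int) -> NSRange {
    // The old span differs in length from the new one by exactly `delta`.
    return NSRange(location: editedRange.location, length: editedRange.length - delta)
}

// "Hello, World!" -> "Hello, Pal!": 4 characters now sit where 6 used to be.
let reported = NSRange(location: 7, length: 4)
let before = preEditRange(editedRange: reported, changeInLength: -2)
print(before.location, before.length)
// prints "7 6"
```

The old range can then be treated as the deletion and the reported range as the insertion, which keeps the batch within the delete-then-insert model.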

The underlying class is NSTextStorage for both UITextView and NSTextView.

@iosdevzone iosdevzone mentioned this pull request Apr 3, 2018
Successfully merging this pull request may close these issues.

Only a subset of syntax is highlighted
10 participants