Improved markup, CSS, JS tokenization #1871

Open
karlhorky opened this issue Apr 26, 2019 · 10 comments

Comments

@karlhorky
Contributor

karlhorky commented Apr 26, 2019

Hi! First of all, thanks for this awesome project! It's affected so many projects around the world. Incredible work!

Motivation

I'm trying to recreate the default VS Code syntax highlighting with Prism.js (via prism-react-renderer) and I'm running into limitations of Prism.js.

Description

I would like to be able to highlight certain tokens that are not granular enough in HTML, CSS and JS.

Would the Prism.js team be interested in more granular tokenization of HTML, CSS and JS? If possible, also with backwards compatibility?

Some first quick examples of things that could be improved across HTML, CSS and JS (compared against VS Code 1.33.1 default dark mode):

HTML: DOCTYPE

Screen Shot 2019-04-26 at 15 58 02

Screen Shot 2019-04-26 at 16 00 27

HTML: Treating quotes and equals differently than normal punctuation

Screen Shot 2019-04-26 at 16 00 18

Screen Shot 2019-04-26 at 16 00 30

CSS: URL values

Screen Shot 2019-04-26 at 16 01 10

Screen Shot 2019-04-26 at 16 01 19

CSS: Arguments in selectors

Screen Shot 2019-04-26 at 16 10 26

Screen Shot 2019-04-26 at 16 10 31

CSS: Keywords - ex. generic font family names, colors (maybe in CSS-Extra?)

Screen Shot 2019-04-26 at 16 07 46

Screen Shot 2019-04-26 at 16 09 36

Screen Shot 2019-04-26 at 16 09 52

Screen Shot 2019-04-26 at 16 20 35

JS: Regex sequences, regex escaping

Screen Shot 2019-04-26 at 16 17 44

Screen Shot 2019-04-26 at 16 00 40

The code used:

<!DOCTYPE html>

<html lang="en">
  <head>
    <script>
      window.console && console.log('foo');
      function initHighlight(block, cls) {
        try {
          if (cls.search(/\bno\-highlight\b/) != -1)
            return process(block, true, 0x0f) + ` class="${cls}"`;
        } catch (e) {
          /* handle exception */
        }
        for (var i = 0 / 2; i < classes.length; i++) {
          if (checkCondition(classes[i]) === undefined)
            console.log('undefined');
        }
      }
    </script>
    <title>Prism</title>
    <style>
      @font-face {
        src: url(https://lea.verou.me/logo.otf);
        font-family: 'LeaVerou';
      }

      #features li:nth-child(odd),
      footer p {
        font: 100% Consolas, 'Andale Mono', monospace;
        background-image: linear-gradient(
          45deg,
          transparent 34%,
          white 34%,
          white 66%,
          transparent 66%
        );
        text-shadow: 0 1px white;
      }
    </style>
  </head>
</html>
@RunDevelopment
Member

RunDevelopment commented Apr 26, 2019

Thanks for taking the time to make this report!

HTML: DOCTYPE

Agreed.

HTML: Treating quotes and equals differently than normal punctuation

I believe that the custom class plugin can be used to add more specific classes to these tokens, which would allow for more granular highlighting.
Edit: The plugin can't be used in that way, so we'll have to find a different solution.

CSS: URL values

Agreed. (#1874)

CSS: Arguments in selectors

If you only need even and odd, I'm on board. (#1872)

CSS: Keywords - ex. generic font family names, colors (maybe in CSS-Extra?)

CSS has quite a few keywords, so I don't know if we want to highlight them all because this will increase the size of the CSS language definition. Also, we highlight everything besides the keywords, so wouldn't it be possible to just assume that everything which isn't highlighted is a keyword?

JS: Regex sequences, regex escaping

The Regex language is what you're looking for. This language will add regex tokenization to JS (similar to how CSS-Extras adds new features to CSS).

@RunDevelopment
Member

Regarding the quotes:

I have two possible solutions. The first is to add a hook which gives the first = a special class. Then change the style of attr-value to look the same as strings, and override the color of the = using the special class.

Prism.hooks.add('wrap', function (env) {
	if (env.type === 'punctuation' && env.content === '=') {
		env.classes.push('special-punctuation-name');
	}
});

Not tested but should work. This will also affect the = signs of all other languages but that shouldn't be too much of a problem.
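
To see what the hook above does without loading Prism, the callback can be pulled out into a named function and called with hand-built objects. This is only a sketch: the name `markEquals` and the sample `env` objects are made up for illustration, and they assume the `wrap` hook passes an object with `type`, `content` and `classes` fields, which is what the snippet above relies on.

```javascript
// The callback from the snippet above, as a standalone function.
// The name markEquals is hypothetical; Prism itself is not loaded here.
function markEquals(env) {
  if (env.type === 'punctuation' && env.content === '=') {
    env.classes.push('special-punctuation-name');
  }
}

// Hand-built stand-ins for what the hook is assumed to receive.
const equalsEnv = { type: 'punctuation', content: '=', classes: ['token', 'punctuation'] };
const braceEnv = { type: 'punctuation', content: '{', classes: ['token', 'punctuation'] };
const operatorEnv = { type: 'operator', content: '=', classes: ['token', 'operator'] };

[equalsEnv, braceEnv, operatorEnv].forEach(markEquals);

console.log(equalsEnv.classes.join(' '));
console.log(braceEnv.classes.join(' '));
console.log(operatorEnv.classes.join(' '));
```

Only the first object gains the extra class. An = that a grammar labels as something other than punctuation is left alone by this check. With Prism loaded, the same function could be registered with `Prism.hooks.add('wrap', markEquals)`.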

The second solution is to modify the markup language definition:

'attr-value': {
	pattern: /=\s*(?:"[^"]*"|'[^']*'|[^\s'">=]+)/i,
	inside: {
		'punctuation': /^=/,
		'string': {
			pattern: /(\s*)[\s\S]+/,
			lookbehind: true
		}
	}
}

I would be careful with modifying the language definition because many other languages rely on markup, so this might cause problems.
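
The effect of the modified definition can be approximated with plain regular expressions. The sketch below is not Prism's tokenizer: `splitAttrValue` is a hypothetical helper that applies the outer pattern and then the two `inside` patterns by hand, assuming `punctuation` is tried first and `string` takes whatever remains.

```javascript
// Patterns copied from the definition above.
const attrValuePattern = /=\s*(?:"[^"]*"|'[^']*'|[^\s'">=]+)/i;
const punctuationPattern = /^=/;
const stringPattern = /(\s*)[\s\S]+/; // group 1 plays the role of the lookbehind

// Hypothetical helper: splits the first attribute value found in `source`
// into a punctuation part and a string part.
function splitAttrValue(source) {
  const outer = attrValuePattern.exec(source);
  if (!outer) return null;

  const text = outer[0];
  const punct = punctuationPattern.exec(text);
  const rest = text.slice(punct[0].length);
  const str = stringPattern.exec(rest);

  // Drop the lookbehind group (leading whitespace) from the string part.
  const value = str[0].slice(str[1].length);

  return [
    { type: 'punctuation', content: punct[0] },
    { type: 'string', content: value },
  ];
}

console.log(splitAttrValue('<html lang="en">'));
console.log(splitAttrValue("<a href = 'x'>"));
console.log(splitAttrValue('<br>'));
```

In this sketch the quotes end up inside the string part, so they would be styled together with the value rather than as separate punctuation.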

@RunDevelopment
Member

Regarding "HTML: Treating quotes and equals differently than normal punctuation":

Another CSS-based solution is to use the fact that the = character will always be the first token inside an attr-value token. So the following selector uniquely identifies every = character:

.token.attr-value > .token.punctuation:first-child {
  /* styles */
}

@karlhorky
Contributor Author

Thanks, these solutions look mostly really good for a start. A couple of points:

CSS has quite a few keywords, so I don't know if we want to highlight them all because this will increase the size of the CSS language definition.

Would it be an acceptable increase to add just the colors and generic font family names for now? If the size is too large, would it be viable to move these out to CSS Extras instead?

Also, we highlight everything besides the keywords, so wouldn't it be possible to just assume that everything which isn't highlighted is a keyword?

If you take a look at the screenshot from VS Code, you'll see that Consolas (and monoospace) are highlighted as plain text, and the other keywords (such as monospace and transparent) are highlighted in the dark orange color. This would not be possible with Consolas, monoospace, monospace and transparent all being treated as plain text.
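
The distinction can be sketched in plain JavaScript. The keyword list below is a small hypothetical sample chosen to cover the words in this example; it is not Prism's or VS Code's actual list, and `classify` is a made-up helper name.

```javascript
// Small, hypothetical sample of CSS keywords (generic font families and
// a couple of color keywords). A real list would be much longer.
const cssKeyword = /^(?:sans-serif|serif|monospace|cursive|fantasy|transparent|white)$/i;

// Returns 'keyword' for words in the sample list and 'plain' for everything else.
function classify(word) {
  return cssKeyword.test(word) ? 'keyword' : 'plain';
}

const words = ['Consolas', 'monoospace', 'monospace', 'transparent'];
for (const word of words) {
  console.log(word, classify(word));
}
```

With an explicit list, Consolas and monoospace fall through as plain text while monospace and transparent are recognized, which is not possible if everything unhighlighted is assumed to be a keyword.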

The Regex language is what you're looking for. This language will add regex tokenization to JS (similar to how CSS-Extras adds new features to CSS).

Ah great, thanks for that, didn't know about that!

@karlhorky
Contributor Author

I'll be sure to file any further issues if they come up! (Keeping an eye on pragmatism of course)

@RunDevelopment
Member

Regarding "HTML: DOCTYPE":

Adding tag-highlighting to the doctype token significantly changes the look of Prism's HTML highlighting. It looks really different, so I don't know if we want that.

Instead of changing the language definition for everyone, it might be the simplest solution to delete this token for just you. VS Code treats the doctype like any other tag, AFAIK.

@RunDevelopment
Member

Regarding CSS keywords:

I see the issue with fonts and the like but thought that these minor false positives were acceptable.
Well, all keywords it is then! 😄

@karlhorky
Contributor Author

karlhorky commented May 8, 2019

Adding tag-highlighting to the doctype token significantly changes the look of Prism's HTML highlighting. It looks really different, so I don't know if we want that.

Hm... can you elaborate? Since the doctype is already being matched, is there no way of doing some kind of sub-matching on the group already?

Instead of changing the language definition for everyone, it might be the simplest solution to delete this token for just you. VS Code treats the doctype like any other tag, AFAIK.

Do you mean that if I create a custom rule, I can stop Prism from tokenizing the doctype and it will get similar highlighting to the other tags? For example, will ! be treated here as punctuation? Here's VS Code's version again:

VS Code doctype

Compared to a regular tag, it appears that the tag name and the attribute are similarly highlighted (the only difference is the !):

VS Code tag

@RunDevelopment
Member

Here's a little snippet which extends the highlighting of doctypes:

Prism.languages.markup.doctype = {
	pattern: Prism.languages.markup.doctype,
	inside: Prism.util.clone(Prism.languages.markup.tag.inside)
};
Prism.languages.markup.doctype.inside.tag = {
	pattern: /^<!DOCTYPE/i,
	inside: {
		'punctuation': /^<!/
	}
};

The doctype token will then have (almost) the same structure as tag.

If you really don't care about the doctype token, you can also use this snippet:

delete Prism.languages.markup['doctype'];
Prism.languages.markup['tag'].inside['tag'].inside['punctuation'] = /^<[!/]?/;

Both snippets handle the ! and should solve your problem.
I hope that helps.
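
As a quick check of the punctuation pattern in the second snippet, the sketch below runs `/^<[!/]?/` against a few strings shaped like the start of a tag. It uses plain JavaScript and does not load Prism; `leadingPunctuation` is a hypothetical helper name, and the sample inputs are assumptions about what the inner tag token would contain.

```javascript
// Pattern copied from the second snippet above.
const tagPunctuation = /^<[!/]?/;

// Hypothetical helper: returns the leading characters that the pattern
// would mark as punctuation, or null if the pattern does not match.
function leadingPunctuation(source) {
  const match = tagPunctuation.exec(source);
  return match ? match[0] : null;
}

console.log(leadingPunctuation('<!DOCTYPE')); // "<!"
console.log(leadingPunctuation('</html'));    // "</"
console.log(leadingPunctuation('<html'));     // "<"
```

The ! is consumed together with the < so the doctype name that follows can be styled like any other tag name.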


And with "Adding tag-highlighting to the doctype token significantly changes the look of Prism's HTML highlighting", I meant really just that. It looks different.

Before:
image

After:
image

@karlhorky
Contributor Author

karlhorky commented May 8, 2019

Ah sorry I think there was a misunderstanding here.

And with Adding tag-highlighting to the doctype token significantly changes the look of Prism's HTML highlighting, I meant really just that. It looks different.

Now I understand! Adding highlighting of the doctype is a big change to the visual appearance of HTML files, and it's uncertain whether this change is aligned with the goals of the Prism project. Got it!

My understanding was that you meant that the internal implementation looked a lot different (significantly changes the look of Prism's HTML highlighting) and the code became more complex, causing worries about adding that complexity.

Here's a little snippet which extends the highlighting of doctypes
The doctype token will then have (almost) the same structure as tag.

Yeah as long as it's possible with some extension, I'm fine with this route as well.
