Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

improve script/style extract #9564

Open
dominikg opened this issue Nov 20, 2023 · 17 comments
Open

improve script/style extract #9564

dominikg opened this issue Nov 20, 2023 · 17 comments
Milestone

Comments

@dominikg
Copy link
Member

Describe the problem

svelte uses regex to extract information about script and style nodes in .svelte files

Recent changes increased the complexity of these regex to improve their accurary and allow for ankle brackets in attributes eg generics="<...>"

/<!--[^]*?-->|<style((?:\s+[^=>'"/]+=(?:"[^"]*"|'[^']*'|[^>\s]+)|\s+[^=>'"/]+)*\s*)(?:\/>|>([\S\s]*?)<\/style>)/g;

These regexes for script and style tags also extract the content to be passed to svelte.preprocess.

Due to their complexity, their performance degrades when processing larger files and they are hard to read/maintain (case in point, #9551 )

They are also replicated in other svelte tooling that also needs access to this information without running a full svelte.parse.

Describe the proposed solution

I'd like to suggest a few improvements.

  1. extract these into a utility either exported from svelte or a new utility library

  2. add more tests that check all possible combinations. Eg the current implementation does not allow whitespaces other than space after <script or any whitespace preceding the closing bracket in </script >

Fixing those would make the regex slower and more complex again, but i think we can make reasonable limitations to what a svelte component should be allowed to use for declaring script and style blocks. Which opens up an entirely different avenue:

  1. replace regex with a string walking extractor that explicitly only seeks top level scripts at the beginning of .svelte and style blocks at the end. This allows us to skip the entire template block in the middle.

The resulting extract code would be a bit larger than a regex, but much safer and with better performance characteristics (speed and memory). A super simple start can be found here https://jsben.ch/ODOHu

For backwards compatibility, we could offer a fallback to either full parse or the previous regex approach. but most projects using prettier-plugin-svelte with script-template-style ordering should have no issue with it.

Alternatives considered

  • rethink parsing entirely and make this extract no longer needed
  • leave as is

Importance

would make my life easier

@benmccann benmccann added this to the 5.0 milestone Nov 24, 2023
@flakolefluk
Copy link

Might be related.
I've found this bug today, where parsing the script section breaks when declaring a string with a closing script tag.
Screenshot 2023-12-06 at 18 19 51

@dominikg
Copy link
Member Author

dominikg commented Dec 6, 2023

no, thats just how html parsing works

@flakolefluk
Copy link

Oh got it, when I saw that extraction in the title and the description I thought that the script was extracted with a regex and then there was JS parsing on the content of the script tag. That would've made the content valid for parsing, but the component still broke. Thanks for clarifying.

@dominikg
Copy link
Member Author

dominikg commented Jan 27, 2024

Here is a basic library that implements what i outlined in the initial post

https://github.com/svitejs/sfcsplit

You can tell it the tags you are interested in (useful for custom tags like template used in svelte-preprocess for multi-file-components) and it will parse from start and end, so parse speed is not affected by the size of the template.

Its algorithm works for valid html syntax, including whitespace in tags, linebreaks between attributes and self-closing tags. It returns the tags, their position and content as well as all parsed attributes while being at least 2x faster than the current regex in a benchmark parsing meltui and carbon components.

One caveat is that blocks parsed at the end of the file cannot contain their own opening tag as comment or string literal, eg

<div>red text<div>
<style>
div { color: red }
/* <style>  this is bad */
</style>

or

<div>{openingScriptTag}<div>
<script>
const openingScriptTag = "<script>"; // this is bad too
</script>

The benefits of not having to look at all the template nodes in the middle of the file outweigh this limitation in my opinion, code like that should be exceedingly rare and it is a smaller limitation than the svelte4 regex approach that has more false positives for nested script blocks.

@vojtechsimetka

This comment was marked as off-topic.

@MotionlessTrain

This comment was marked as off-topic.

@vojtechsimetka

This comment was marked as off-topic.

@MotionlessTrain

This comment was marked as off-topic.

@dominikg

This comment was marked as off-topic.

@Rich-Harris
Copy link
Member

Personally I'm inclined to completely overhaul how preprocessing works — I think it makes a lot more sense to treat transformations of .svelte files the same as transformations of any other file, i.e. as a discrete step in a chain of transformations:

// vite.config.js
import { svelte } from '@sveltejs/vite-plugin-svelte';
import { preprocess } from 'svelte-preprocess/vite';

export default {
  plugins: [
    preprocess(),
    svelte()
  ]
};

Otherwise we're just doing the bundler's job for it, but worse (e.g. preprocessors aren't represented in things like vite-plugin-inspect). Svelte can expose utility functions to make it easier to write preprocessors, if we want, but it's not totally clear that even that is necessary.

In other words I don't think it makes sense to tinker at the margins by replacing the regex extraction technique with something better — I think we'd be better off nuking svelte.preprocess altogether. To me the question is whether we think it's realistic to deprecate it in 5.0 and remove in 6.0, or deprecate in 5.x, or in 6.0. I'd prefer to deprecate in a major but it would be a shame to be stuck with it until 7.0.

Thoughts?

@Rich-Harris
Copy link
Member

(I suppose an argument against this approach is that e.g. editors have no way of knowing how to preprocess .svelte files and will likely report syntax errors all over the place. But come on, people, it's 2024 — TypeScript is natively supported and Sass et al are dead. We don't need to mess around with preprocessors any more.)

@dominikg
Copy link
Member Author

thats going to pose a challenge to markup preprocessors like mdsvex, however one could argue that editor support with svelte compiler related squigglies in that could be hard.

A Typescript preprocessor is still needed for any typescript feature that emits code.

@dominikg
Copy link
Member Author

even without preprocess parse could still benefit from fast split into script/template/style and forwarding to their respective content parsers

@dummdidumm
Copy link
Member

Let's revisit this for Svelte 6. This isn't related to any changes in 5 and would be an unnecessary additional hurdle for upgrading. It will also take time to properly design this, so let's not worry about it for now

@Rich-Harris Rich-Harris modified the milestones: 5.0, 5.x Aug 20, 2024
@Rich-Harris
Copy link
Member

Cool — have moved to the 5.x milestone, so we can at least start thinking about it ahead of that

@arxpoetica
Copy link
Member

arxpoetica commented Aug 28, 2024

If you kill that preprocessor, would there be a way for those of us who are clinging tenaciously to it to still craft our own technique?

SASS may be dead, but I have a PostCSS responsive font resizing algorithm that is the bomb.

@dominikg
Copy link
Member Author

does that algorighm require to be run before the svelte compiler though? regular postcss processing by vite can be done through postcss config and runs on css emitted by the svelte compiler. preprocessing style blocks is only needed when you have syntax that is not parsable by sveltes css parser or when you generate rules that should get scoping applied to them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants