improve script/style extract #9564

dominikg · 2023-11-20T21:38:50Z

Describe the problem

svelte uses regex to extract information about script and style nodes in .svelte files

Recent changes increased the complexity of these regexes to improve their accuracy and allow for angle brackets in attributes, e.g. generics="<...>"

svelte/packages/svelte/src/compiler/preprocess/index.js

Line 257 in 9926347

    
           /<!--[^]*?-->|<style((?:\s+[^=>'"/]+=(?:"[^"]*"|'[^']*'|[^>\s]+)|\s+[^=>'"/]+)*\s*)(?:\/>|>([\S\s]*?)<\/style>)/g;

These regexes for script and style tags also extract the content to be passed to svelte.preprocess.
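
To make the extraction step concrete, here is a minimal sketch that applies the style regex quoted above using `String.prototype.matchAll`. The helper name `extractStyles` and the sample component are assumptions made for illustration; this is not the compiler's actual code path.

```javascript
// The style regex quoted above, copied verbatim.
const styleRegex = /<!--[^]*?-->|<style((?:\s+[^=>'"/]+=(?:"[^"]*"|'[^']*'|[^>\s]+)|\s+[^=>'"/]+)*\s*)(?:\/>|>([\S\s]*?)<\/style>)/g;

// Hypothetical helper: collect the raw attribute string and content of each <style> block.
function extractStyles(source) {
  const blocks = [];
  for (const match of source.matchAll(styleRegex)) {
    // HTML comments hit the first alternative and leave both capture groups undefined.
    if (match[1] === undefined) continue;
    blocks.push({
      attributes: match[1].trim(),
      content: match[2] ?? '', // group 2 is undefined for a self-closing <style />
      start: match.index,
      end: match.index + match[0].length
    });
  }
  return blocks;
}

const styleSample = [
  '<!-- <style>ignored</style> -->',
  '<p>hello</p>',
  '<style lang="scss">p { color: red }</style>'
].join('\n');

console.log(extractStyles(styleSample));
```

With this pattern, a closing tag written as `</style >` produces no match at all, which is the kind of whitespace gap discussed further down.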

Due to their complexity, their performance degrades when processing larger files, and they are hard to read and maintain (case in point: #9551).

They are also replicated in other svelte tooling that also needs access to this information without running a full svelte.parse.

Describe the proposed solution

I'd like to suggest a few improvements.

extract these into a utility either exported from svelte or a new utility library
add more tests that check all possible combinations. E.g. the current implementation does not allow whitespace other than a space after <script, or any whitespace preceding the closing bracket, as in </script >

Fixing those would make the regex slower and more complex again, but I think we can set reasonable limits on what a svelte component is allowed to use for declaring script and style blocks. That opens up an entirely different avenue:

replace regex with a string-walking extractor that explicitly only seeks top-level scripts at the beginning of a .svelte file and style blocks at the end. This allows us to skip the entire template block in the middle.

The resulting extract code would be a bit larger than a regex, but much safer and with better performance characteristics (speed and memory). A super simple start can be found here https://jsben.ch/ODOHu
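
As a rough illustration of the string-walking idea (not the code behind the benchmark link), the sketch below looks for one `<script>` at the very start and one `<style>` at the very end, and never scans the template in between. The names `findTagEnd` and `splitComponent` are hypothetical, and the sketch assumes at most one leading script block.

```javascript
// Find the index of the '>' that closes an opening tag, skipping quoted attribute
// values so that something like generics="<T>" does not end the tag early.
function findTagEnd(source, from) {
  let quote = null;
  for (let i = from; i < source.length; i++) {
    const ch = source[i];
    if (quote) {
      if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === '>') {
      return i;
    }
  }
  return -1;
}

// Hypothetical extractor: walks forward from the start and backward from the end.
function splitComponent(source) {
  const result = { script: null, style: null };

  // Leading script block: the first non-whitespace text must be "<script".
  const start = source.search(/\S/);
  if (start !== -1 && /^<script[\s>]/.test(source.slice(start, start + 8))) {
    const openEnd = findTagEnd(source, start + 7);
    const closeStart = openEnd === -1 ? -1 : source.indexOf('</script', openEnd);
    if (closeStart !== -1) {
      result.script = {
        attributes: source.slice(start + 7, openEnd).trim(),
        content: source.slice(openEnd + 1, closeStart)
      };
    }
  }

  // Trailing style block: walk backward over whitespace, '>', and "</style".
  let end = source.length;
  while (end > 0 && /\s/.test(source[end - 1])) end--;
  if (source[end - 1] === '>') {
    let i = end - 1;
    while (i > 0 && /\s/.test(source[i - 1])) i--; // allows "</style >"
    const closeStart = i - 7; // '</style'.length === 7
    if (closeStart >= 0 && source.startsWith('</style', closeStart)) {
      const openStart = source.lastIndexOf('<style', closeStart);
      if (openStart !== -1 && /[\s>]/.test(source[openStart + 6])) {
        const openEnd = findTagEnd(source, openStart + 6);
        if (openEnd !== -1 && openEnd < closeStart) {
          result.style = {
            attributes: source.slice(openStart + 6, openEnd).trim(),
            content: source.slice(openEnd + 1, closeStart)
          };
        }
      }
    }
  }
  return result;
}

const walkSample = [
  '<script lang="ts" generics="T extends Record<string, unknown>">',
  '  export let item: T;',
  '</script>',
  '',
  '<div>{item}</div>',
  '',
  '<style>',
  '  div { color: red }',
  '</style >',
  ''
].join('\n');

console.log(splitComponent(walkSample));
```

The work done here depends on the size of the script and style blocks rather than on the size of the template, which is the performance argument made above. A component that does not follow the script-template-style ordering simply yields `null` for the missing parts, so a fallback would still be needed for those files.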

For backwards compatibility, we could offer a fallback to either a full parse or the previous regex approach, but most projects using prettier-plugin-svelte with script-template-style ordering should have no issue with it.

Alternatives considered

rethink parsing entirely and make this extract no longer needed
leave as is

Importance

would make my life easier

flakolefluk · 2023-12-06T21:41:11Z

Might be related.
I've found this bug today, where parsing the script section breaks when declaring a string with a closing script tag.

dominikg · 2023-12-06T21:50:54Z

No, that's just how HTML parsing works.

flakolefluk · 2023-12-06T22:35:53Z

Oh got it, when I saw that extraction in the title and the description I thought that the script was extracted with a regex and then there was JS parsing on the content of the script tag. That would've made the content valid for parsing, but the component still broke. Thanks for clarifying.
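
A small sketch of the behaviour described in this exchange: an HTML-style scan ends the script block at the first `</script`, without regard to JavaScript string boundaries. The helper `scriptContent` is hypothetical, and the escaped variant shows the usual plain-HTML workaround of writing `<\/script>` inside the string.

```javascript
// Hypothetical helper mimicking how an HTML-style scanner delimits a script block:
// it stops at the first "</script", even if that text sits inside a JS string.
function scriptContent(source) {
  const openEnd = source.indexOf('>', source.indexOf('<script'));
  const closeStart = source.indexOf('</script', openEnd);
  return source.slice(openEnd + 1, closeStart);
}

// The closing tag inside the string literal ends the block early.
const brokenSource = '<script>const tag = "</script>";</script>';

// Escaping the slash hides the sequence from the scanner; in JavaScript,
// the string "<\/script>" still evaluates to "</script>".
const escapedSource = '<script>const tag = "<\\/script>";</script>';

console.log(scriptContent(brokenSource));
console.log(scriptContent(escapedSource));
```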

dominikg · 2024-01-27T14:50:27Z

Here is a basic library that implements what I outlined in the initial post

https://github.com/svitejs/sfcsplit

You can tell it the tags you are interested in (useful for custom tags like template used in svelte-preprocess for multi-file-components) and it will parse from start and end, so parse speed is not affected by the size of the template.

Its algorithm works for valid html syntax, including whitespace in tags, linebreaks between attributes and self-closing tags. It returns the tags, their position and content as well as all parsed attributes while being at least 2x faster than the current regex in a benchmark parsing meltui and carbon components.

One caveat is that blocks parsed at the end of the file cannot contain their own opening tag as comment or string literal, eg

<div>red text</div>
<style>
div { color: red }
/* <style>  this is bad */
</style>

or

<div>{openingScriptTag}</div>
<script>
const openingScriptTag = "<script>"; // this is bad too
</script>

In my opinion, the benefits of not having to look at all the template nodes in the middle of the file outweigh this limitation. Code like that should be exceedingly rare, and it is a smaller limitation than the svelte4 regex approach, which has more false positives for nested script blocks.
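
The caveat can be reproduced with a few lines, assuming the backward search resolves the opening tag with something like `lastIndexOf`. The helper `lastStyleContent` is hypothetical, and sfcsplit's actual implementation may differ.

```javascript
// Hypothetical backward search: take the last "</style", then the last "<style" before it.
function lastStyleContent(source) {
  const closeStart = source.lastIndexOf('</style');
  if (closeStart === -1) return null;
  const openStart = source.lastIndexOf('<style', closeStart);
  if (openStart === -1) return null;
  const openEnd = source.indexOf('>', openStart);
  return source.slice(openEnd + 1, closeStart);
}

const plainStyle = [
  '<div>red text</div>',
  '<style>',
  'div { color: red }',
  '</style>'
].join('\n');

// Same component, but with the opening tag repeated inside a CSS comment.
const commentedStyle = [
  '<div>red text</div>',
  '<style>',
  'div { color: red }',
  '/* <style>  this is bad */',
  '</style>'
].join('\n');

console.log(lastStyleContent(plainStyle));     // contains the div rule
console.log(lastStyleContent(commentedStyle)); // starts after the "<style>" inside the comment
```

In the second case the search latches onto the `<style>` inside the comment, so the real rule is lost from the extracted content.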

Rich-Harris · 2024-08-20T17:29:01Z

Personally I'm inclined to completely overhaul how preprocessing works — I think it makes a lot more sense to treat transformations of .svelte files the same as transformations of any other file, i.e. as a discrete step in a chain of transformations:

// vite.config.js
import { svelte } from '@sveltejs/vite-plugin-svelte';
import { preprocess } from 'svelte-preprocess/vite';

export default {
  plugins: [
    preprocess(),
    svelte()
  ]
};

Otherwise we're just doing the bundler's job for it, but worse (e.g. preprocessors aren't represented in things like vite-plugin-inspect). Svelte can expose utility functions to make it easier to write preprocessors, if we want, but it's not totally clear that even that is necessary.
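
As a sketch of what such a discrete step could look like, the following uses the `enforce: 'pre'` and `transform` hooks of Vite's plugin API. The plugin name, the `__APP_NAME__` placeholder, and the replacement are all assumptions made for illustration; this is not an existing package or the API proposed above.

```javascript
// Hypothetical Vite plugin that rewrites .svelte sources before the Svelte plugin sees them.
function sveltePreTransform() {
  return {
    name: 'svelte-pre-transform',
    enforce: 'pre', // run ahead of normal plugins such as svelte()
    transform(code, id) {
      // Simplification: real ids may carry query strings that this check ignores.
      if (!id.endsWith('.svelte')) return null; // leave other files untouched
      return {
        code: code.replaceAll('__APP_NAME__', 'demo'),
        map: null // a real plugin should return a source map here
      };
    }
  };
}

// Usage in vite.config.js would look like: plugins: [sveltePreTransform(), svelte()]
const prePlugin = sveltePreTransform();
console.log(prePlugin.transform('<h1>__APP_NAME__</h1>', '/src/App.svelte').code);
```

Because this is an ordinary plugin, it would show up in the plugin chain like any other transformation.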

In other words I don't think it makes sense to tinker at the margins by replacing the regex extraction technique with something better — I think we'd be better off nuking svelte.preprocess altogether. To me the question is whether we think it's realistic to deprecate it in 5.0 and remove in 6.0, or deprecate in 5.x, or in 6.0. I'd prefer to deprecate in a major but it would be a shame to be stuck with it until 7.0.

Thoughts?

Rich-Harris · 2024-08-20T17:31:15Z

(I suppose an argument against this approach is that e.g. editors have no way of knowing how to preprocess .svelte files and will likely report syntax errors all over the place. But come on, people, it's 2024 — TypeScript is natively supported and Sass et al are dead. We don't need to mess around with preprocessors any more.)

dominikg · 2024-08-20T17:36:02Z

That's going to pose a challenge for markup preprocessors like mdsvex. However, one could argue that editor support with svelte-compiler-related squigglies in such files could be hard.

A TypeScript preprocessor is still needed for any TypeScript feature that emits code.

dominikg · 2024-08-20T17:40:26Z

Even without preprocess, parse could still benefit from a fast split into script/template/style, forwarding each part to its respective content parser.

dummdidumm · 2024-08-20T17:57:49Z

Let's revisit this for Svelte 6. This isn't related to any changes in 5 and would be an unnecessary additional hurdle for upgrading. It will also take time to properly design this, so let's not worry about it for now

Rich-Harris · 2024-08-20T18:14:50Z

Cool — have moved to the 5.x milestone, so we can at least start thinking about it ahead of that

arxpoetica · 2024-08-28T13:35:50Z

If you kill that preprocessor, would there be a way for those of us who are clinging tenaciously to it to still craft our own technique?

SASS may be dead, but I have a PostCSS responsive font resizing algorithm that is the bomb.

dominikg · 2024-08-29T12:46:10Z

Does that algorithm need to run before the svelte compiler, though? Regular postcss processing by vite can be done through the postcss config and runs on css emitted by the svelte compiler. Preprocessing style blocks is only needed when you have syntax that is not parsable by svelte's css parser or when you generate rules that should get scoping applied to them.
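
For reference, a minimal sketch of the postcss-through-vite setup described here, using Vite's `css.postcss` option. The choice of `autoprefixer` is only a stand-in for whatever PostCSS plugin a project actually uses.

```javascript
// vite.config.js
import { defineConfig } from 'vite';
import { svelte } from '@sveltejs/vite-plugin-svelte';
import autoprefixer from 'autoprefixer'; // placeholder plugin, assumed for illustration

export default defineConfig({
  plugins: [svelte()],
  css: {
    // Applied by vite to css after the svelte compiler has emitted it,
    // so no svelte preprocessor is involved.
    postcss: {
      plugins: [autoprefixer()]
    }
  }
});
```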

benmccann added this to the 5.0 milestone Nov 24, 2023

jasonlyu123 mentioned this issue May 22, 2024

svelte-check hangs on <script> tag with specific attributes sveltejs/language-tools#2363

Closed

shadow-identity mentioned this issue May 31, 2024

Prettier plugin hangs on <script> with some particular attributes sveltejs/prettier-plugin-svelte#440

Closed

Rich-Harris modified the milestones: 5.0, 5.x Aug 20, 2024
