Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preprocessor output files are deleted when hbs renderer is called (from mdBook > 0.3.1) #1087

Open
sytsereitsma opened this issue Nov 4, 2019 · 13 comments
Labels
A-Preprocessor Area: Preprocessors

Comments

@sytsereitsma
Copy link

Hi,

I have a preprocessor (mdbook-plantuml) which creates files in the book/img directory. The changes in issue #985, remove these files again (the destination.exists() if clause in Renderer::render(...) in src/renderer/html_handlebars/hbs_renderer.rs).

This kind of breaks my preprocessor :-(

Any ideas on how this could/should be fixed?

Sytse

@ehuss
Copy link
Contributor

ehuss commented Nov 4, 2019

@Michael-F-Bryan can you take a look at this?

@sytsereitsma
Copy link
Author

I think a solution could be extending the Renderer trait with a prepare function, which can be called before running the preprocessor.

@Michael-F-Bryan
Copy link
Contributor

I wouldn't expect a preprocessor to be touching anything within another backend's output directory. That could end up being quite brittle because you're making assumptions on how the backend will render things.

Your idea of lifecycle methods (e.g. before_preprocess()) is a good one, although we'd need to update the 3rd party plugin mechanism to take it into account. One way this could be propagated to 3rd party backends would be to append an argument to the executed command, with no arguments meaning "run the normal rendering code". Another idea is to use environment variables (e.g. MDBOOK_PHASE=render or MDBOOK_PHASE=before_preprocess). Both could be implemented in a backwards-compatible way.

A second idea is to add a section to the [build] config to tell mdbook "before running the preprocessors for backend X, delete its output directory":

[build]
delete-output-directory-before-build = ["html"]

The latter would probably be easier to implement, but I'm not sure if it would be scalable/extensible or considered hacky.

Otherwise, instead of generating image files, mdbook-plantuml could use data URIs. That way when you do your transformation, instead of replacing the original text with a a normal <img src="some-file.png"/> you could emit <img src="data:image/png;base64,R0lGXoGAiqh..." />.

@sytsereitsma what are your thoughts? Do any of those solutions sound easier/better to you? I'm happy to make PRs for any mdbook changes.

@sytsereitsma
Copy link
Author

@Michael-F-Bryan I'll change my preprocessor to use data URIs to avoid the issue altogether. You have a nice point where my preprocessor should be independent of the renderer.

I do think the lifecycle methods can avoid the problem (e.g. for other preprocessors). It is very difficult to troubleshoot for a regular user because it just seems as if the preprocessor does not work properly.

Didn't know about data URIs, thanks for the tip 👍

@Michael-F-Bryan
Copy link
Contributor

It is very difficult to troubleshoot for a regular user because it just seems as if the preprocessor does not work properly.

See sytsereitsma/mdbook-plantuml#7 (comment). You can write error messages to STDERR and return a non-zero exit code from the preprocessor to let users know about errors.

I do think the lifecycle methods can avoid the problem (e.g. for other preprocessors).

If we were to go down the lifecycle method path, can you think of any other lifecycle methods we may want to add to either the preprocessor or renderer?

@sytsereitsma
Copy link
Author

As far as my preprocessor is concerned everything is working fine. It's just that the output gets deleted once it is done :-)

Regarding the lifecycle methods I do not see any other sensible method than before_preprocessor when looking at MDBook::execute_build_process (src/book/mod.rs).

I have one more suggestion. Since deleting a directory is a significant operation, I'd add a log message when performing the deletion (maybe even an info message).

Regarding fixing my preprocessor I think I will copy the files to the source dir of the book to be renderer agnostic. The maximum size of data URIs differs between browsers (according to MDN opera only allows 65k). Next to that it is likely that another renderer does not support data URIs.

@dylanowen
Copy link
Contributor

I ran into this issue as well with https://github.com/dylanowen/mdbook-graphviz I "solved" it by putting my output in the src directory which is pretty gross.

As for suggestions of what we could do. I like the model we have now where we're always operating across stdin -> stdout so chaining / modifying markdown works nicely.

What about extending BookItem to include a Resource File or adding Resources to the Book? Preprocessors that wanted to process on resource files would have that option, or they could also generate them and pass them down the line. Then our entire src directory becomes something we're operating on vs just the markdown.

I'm not sure how we'd serialize the resource files across the json boundary, and I'm not sure about backwards compatibility but it would give preprocessors more power to interact with the various files without needing to modify the src or the output directories.

@Michael-F-Bryan
Copy link
Contributor

I like the idea of adding a "resource" section to the Book. In theory this should be backwards compatible because the JSON deserializer would interpret a missing "resource" key as an empty Vec<Resource>.

It wouldn't be backwards compatible if we were to add all non-markdown files as Resources to the Book , though (preprocessors/renderers not aware of Resources would silently drop them). That feels like a more general solution because it means we load all files under src/ into memory so files generated by a preprocessor aren't special.

I agree that writing into the src/ or book/html/ directories feels wrong. It'd be like if a build script wrote to target/debug/ or src/ instead of the directory it's told write to ($OUTPUT_DIR, the equivalent of MDBook::build_dir_for()).

@sytsereitsma
Copy link
Author

Something that came to mind.... Images can get rather large and plentiful, meaning the amount of data to transfer over stdio (and json) may get quite large. How about outputting to a 'preprocessor-resources' directory, which gets copied into the book output dir by the renderer?

@sytsereitsma
Copy link
Author

Hi Guys,

I'd like to see this resolved. The serve command gets into an infinite regeneration loop because the files in the src dir are modified by the preprocessor (with the workaround I created).

I'm more than happy to create a PR for this, but I'd like to have consensus on the path to follow.

As discussed above we have the following options (so far):

  1. The before_preprocessor lifecycle method (quick hacky fix)
  2. Extending the Book with a resources section and serialize the images using the stdin interface between the preprocessor and mdBook (is there a length limit for JSON strings in serde_json?).
    I'd suggest deflated base64 encoded image data,
  3. Use a preprocessor-resources directory for staging (which is ignored by the serve command)

Sytse

@dylanowen
Copy link
Contributor

That's a good point about transferring large amounts of data over stdio, I've been unable to find much info on the limits or performance hits for it though.

My rationale for passing data through stdout/in was that it would give preprocessors more power to interact with each other.

For example we could create a "Thumbnail Preprocessor" that would take in images in the src directory and generate a thumbnails page to reference them. In the current model this is possible, but it'd be great if the output of mdbook-plantuml could feed into this Thumbnail preprocessor, without needing to stage anything in the src directory.

@sytsereitsma
Copy link
Author

I'm not too worried about the stdio overhead compared to the whole sequence of events (i.e. generate a file, load it in memory, zip and encode it transfer it, extract and save it). My main concern is the unneeded data serialization/deserialization, even though that probably is a negligible overhead compared to the graphviz rendering.

So time to put the assumptions to the test and generate some data to base the decisions on. I'll do some spikes on how serde json handles large string values and see if I can collect some metrics on different strategies.

@segfaultsourcery
Copy link
Contributor

How about a slightly different approach, where in the first step of the build process, the src tree gets copied into an intermediary work directory (let's call it prep) that pre-processors can write to. Once they're done, the normal build takes over, but instead of building the files found in src, it builds whatever is found in prep. Once built, prep can safely be destroyed.

sytsereitsma added a commit to sytsereitsma/mdBook that referenced this issue Feb 12, 2020
@ehuss ehuss added the A-Preprocessor Area: Preprocessors label Apr 21, 2020
niko-dunixi pushed a commit to niko-dunixi/mdbook-plantuml that referenced this issue Feb 22, 2022
Should solve the issue for this preprocesssor and create a better
user-experience. Also enables other downstream renderers to work
agnostically and not only the html renderer.

- rust-lang/mdBook#1087
niko-dunixi pushed a commit to niko-dunixi/mdbook-plantuml that referenced this issue Feb 22, 2022
Should solve the issue for this preprocesssor and create a better
user-experience. Also enables other downstream renderers to work
agnostically and not only the html renderer.

- rust-lang/mdBook#1087
niko-dunixi pushed a commit to niko-dunixi/mdbook-plantuml that referenced this issue Feb 22, 2022
Should solve the issue for this preprocesssor and create a better
user-experience. Also enables other downstream renderers to work
agnostically and not only the html renderer. Can convert text files to
data-uri but will still defer to the original text embedding method to
preserve historical functionality

- rust-lang/mdBook#1087
sytsereitsma added a commit to sytsereitsma/mdbook-plantuml that referenced this issue Jul 4, 2022
* Creating DataURI for image inlining

Should solve the issue for this preprocesssor and create a better
user-experience. Also enables other downstream renderers to work
agnostically and not only the html renderer. Can convert text files to
data-uri but will still defer to the original text embedding method to
preserve historical functionality

- rust-lang/mdBook#1087

* Made the use of data URIs configurable

* Cargo fmt and compiler warnings

* Do not create data uri images in book src dir
Create them in the .mdbook-plantuml-cache dir in the book root instead

* Update e2e tests

* Use markdown output instead of <img> tags and allow clicking
Clicking appears to be blocked for data uris on most browsers

Co-authored-by: Paul FREAKN Baker <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Preprocessor Area: Preprocessors
Projects
None yet
Development

No branches or pull requests

5 participants