Streamable templates #211
I think a potential solution would be to map
Ah, I'm not actually using Tera in a webserver :p How big are the templates that are using lots of memory? I would consider it a bug if it's too memory-hungry/slow. Can you add such a template to the benchmarks folder?
It's not so much the templates themselves being too big, but the data fed into the template's context. My specific use case just looks like

I haven't investigated this closely, but I'm unsure why there would be any problem rendering the node itself after the AST is built (i.e. my template is syntactically well-formed) and the values from the context are semantically compatible with my template (e.g. if I said
I beg your pardon, Riamse, but what are you trying to achieve by "streaming" the stuff? I would suggest looking into dynamic loading, i.e. you render a page with no data and then use some asynchronous mechanism to fetch the data on demand. Another thing I can tell you is that serialization is not an instant process either (and you want your data to be serialized in order to be rendered), so load-on-demand will help there as well. It's a more complex design to implement, but a more scalable one.
With all due respect, @mexus, I think you may have misunderstood the issue. I'm not concerned about the speed of rendering, per se, and dynamic loading will not solve the problem. My problem is that the Tera templating engine stores the entire evaluated template as a string on the heap, and then returns the string. This can potentially use a lot of RAM. I propose, in effect, that when you call

Now, I don't know if you have any experience with Python, but here's an analogy to illustrate. What Tera does right now is:

```python
def render(self, *args_idk):
    output = []
    for node in self.ast:
        output.append(node.render())
    return output
```

The feature I propose is roughly equivalent to this:

```python
def render(self, *more_args):
    for node in self.ast:
        yield node.render()
```

The second one only renders nodes as needed, while the former gives me all of the nodes at once. If the web server transmits data to the client like so:

```python
server = the_server
client = my_laptop
for piece in tera.render():
    server.send(client, piece)  # send piece to the specified client
```

with the current implementation, at any given moment, every single node will be stored in RAM at the same time. With my proposed implementation, at any given moment, only one node will be stored in RAM. Here is a concrete example to illustrate what I'm saying, because what I propose is a similar strategy to the one used when serving static files. Ubuntu hosts a 1.5 gigabyte .iso of the operating system on their own server. I am 100% sure that they don't transmit it to my computer by loading all 1.5 GB of data into RAM at once and then iterating over it. What they do, I am sure, is read a small chunk of the file, send it, and repeat until the whole file has been sent.
Thus, at any given time, only one small portion of the file is in RAM throughout the lifetime of the program. In fact, there's even an HTTP header just for this purpose to make it easier. Forgive me if this sounds patronising; I want to ensure my communications are 100% understood. Is my explanation to your satisfaction?
@Riamse I would be open to adding that in Tera but as another method than
That's fair. Looking at the code, it seems difficult to implement using the current structure, due to how
I'm planning to rewrite all the parsing with pest 1.0 for the next version soon, but I'm not entirely sure what a flat AST would look like for the rendering.
Great, sounds like a plan
I took a look but I'm not sure how it works. Could you give me a quick rundown of what happened, how to use it, the differences, etc.?
Sorry I should have explained more:
Other than that, the public API of Tera doesn't change; you can use it as before. I might do a few more changes but nothing dramatic.
Okay, I looked at it and I think I have a better idea of what to do. I think what this issue is really about is how for loops are rendered. Currently

This is a problem and I think I might know how to fix it, but I need to understand how
So the idea of
Got it. I need to make the AST mutable. Specifically in a line like this:

```rust
let ast = match self.template.parents.last() {
    // this unwrap is safe; Tera will have errored already if the template doesn't exist
    Some(parent_tpl_name) => &self.tera.get_template(parent_tpl_name).unwrap().ast,
    None => &self.template.ast,
};
ast.push(Node::Text("hello".to_owned())); // what I want to be able to do
```

I believe I have a solution to this problem, but it requires manipulating the AST while rendering occurs.
I mean to say that I can't make the ast mutable right now - trying to make the match block return
I'm doing a proof of concept
Also, I think the way the for loops work is kind of bad. I think instead of keeping track of the variables in the for loops themselves, it'd make more sense to just have a Context be able to store variables in scopes. Something like

Am I making sense? (https://github.com/Riamse/chom-c/blob/master/grammar.py#L83 and https://github.com/Riamse/chom-c/blob/master/c.py are examples of what I'm talking about)
Oh. Forget I said anything then.
Basically, render_for calls render_body, which might call render_for, etc. That's what I meant by recursion; but fear not, I have an idea in the works. Watch this space.
@Keats please look at the commit I just pushed, because it passes almost all the tests and fixes the thing, and I am proud
What do you think?
I'll have a look at the code tomorrow - any benchmarks for memory usage between render and iter_render?
And for

I'm also curious if you can run the current benchmark to see if there's any speed difference too.
I ran the benchmarks: https://pastebin.com/W6E2NApQ for the current method of rendering things and https://pastebin.com/sEvEQxRH for iter_render.

I'm not sure how to measure memory usage. But basically, if your goal is to print a rendered template and not store it anywhere,
I asked on Twitter (https://twitter.com/20100Prouillet/status/934705940758253568) for the memory usage.
I'm not sure why it's slower. The only overhead should be the cost of allocating a VecDeque and skipping over no-ops. It seems there's also a small improvement from using a Vec instead.
Ah, one more bit of overhead - cloning body nodes (e.g. ForloopPrime). I'm not sure if there's a way around that other than reference-counting. I have a minor issue with "not-really-existing nodes" - they do, in fact, already exist, represented by the nesting relation that comes with nodes having bodies. However, by flattening the nested bodies piece by piece, we need to preserve that relation. Think of X' nodes as various types of closing brackets.
Even if template streaming is marginally slower, from my perspective it's worth the cost, because web application response times are most commonly slowed down by expensive / poorly optimized database queries, not by the web application itself misusing CPU or memory. While template streaming doesn't lower the amount of time it takes for the server to render its response, it does allow the browser to eagerly load render-critical resources, and possibly render the navigation UI. Especially on mobile, having to wait for render-critical assets to load after the HTML can easily add 5 seconds to the total render time (especially if the page requires JS).
That's not the same kind of streaming referenced in this issue though. The PoC in this issue basically makes the loops non-recursive to save some memory, but it's not streaming: you still end up with a String.
No, it is the same kind of streaming. The PoC does in fact return a String because that's what the tests expect as output. If you really wanted, you could duct-tape together something that returns an
That is not specific to your branch though; you could apply this kind of streaming to the current
The current

The AST looks like this:

I can't figure out a way to stream ["hello", "world"] without introducing generators into the current implementation.
The latest release of Tera is much more memory efficient: it's still not streaming, but it shouldn't allocate the memory for the string now - no more cloning around
I took a quick look at the code - the rendering work has been moved to https://github.com/Keats/tera/blob/master/src/renderer/processor.rs, it seems? But I'm not entirely sure exactly what happened, because it seems to me that it's still allocating memory for strings at each nested level of rendering, but now we copy them into the
My last message was confusing even for me now, sorry about that. So it doesn't do anything to help with the string allocations; the main change was using references to the original context instead of
Can we look into adding template streaming to the new renderer architecture?
That would depend on #340 and whether it requires duplicating the codebase or not
Unlikely to happen anytime soon, maybe with async/await in the future.
Currently, `Tera::render` returns a string, which comes from `Renderer::render`, which builds the string by rendering each component and then using `String::push_str`. Unfortunately, for extremely large templates this uses a lot of memory and might take a lot of time, and the biggest problem is that rendering is a blocking operation. It might be smarter to have a different render method that returns a struct that implements `std::io::Read`, so the output can be streamed rather than storing the entire response on the heap before sending it over to the server.

(I'm obviously talking about using Tera in conjunction with a web application framework as opposed to Tera alone, but let's be honest, that's 99% of Tera's use cases)