Memory consumption increases continuously when generating multiple PDFs in a Loop. #2130
Comments
Hi! Thanks for your report. WeasyPrint can take a lot of memory; that’s a known behavior, and we’re open to solutions to improve this. But a memory leak is a different problem.
Many bug reports like this have already been opened, and we have to be sure that it’s a real memory leak. A few (~20) generations are not enough to detect this, because Python’s interpreter can do what it wants with memory. There’s an interesting issue about this, showing that what may appear to be a memory leak is not necessarily one: #1977. So, you can try with 200+ generations and see if you find a "real" memory leak. That being said, your problem seems to be related to fonts, just like #1977. Even if it’s not a memory leak, maybe there’s something we can do about this.
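For anyone who wants to run the 200+ generation check suggested above, here is a minimal sketch. It assumes a Unix system (the `resource` module is not available on Windows), and the helper names `peak_rss_samples` and `render_once` are made up for this example. It records the process's peak resident memory every few iterations; if the numbers stop growing after a warm-up phase, it is probably not a real leak.

```python
import gc
import resource


def peak_rss_samples(generate, iterations=200, sample_every=20):
    """Call generate() repeatedly and record peak RSS at regular intervals.

    Returns a list of (iteration, peak_rss) tuples. ru_maxrss is reported
    in kilobytes on Linux and in bytes on macOS.
    """
    samples = []
    for i in range(1, iterations + 1):
        generate()
        if i % sample_every == 0:
            # Collect garbage first so unreachable objects do not skew the trend.
            gc.collect()
            peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
            samples.append((i, peak))
    return samples


def render_once():
    """Hypothetical workload: render one small PDF with WeasyPrint."""
    # Imported lazily so the measuring helper can be used without WeasyPrint.
    import weasyprint

    html = f"<div>{'<div>あ</div>' * 20}</div>"
    weasyprint.HTML(string=html).write_pdf()
```

Calling `peak_rss_samples(render_once)` and printing the result gives one sample every 20 generations. Since this is a peak value it can never go down, so the thing to look for is whether it flattens out or keeps climbing.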
Thank you for your reply!
I tried generating 200 PDFs, and similar to #1977, the memory usage is stable from around the 80th iteration.
Actually, we encountered memory errors in a container with a memory limit of 1-2GB. Therefore, it might be necessary to increase the memory limit for the container.
Regarding the font-related problem, I tried the suggested method below, but it resulted in an error. It seems that it doesn't work with the type of font we are using.
Code:
import weasyprint

class PdfWriter:
    # Class-level dict shared by every PdfWriter instance
    _fonts = {}

    def write_pdf(self, html_str):
        doc = weasyprint.HTML(string=html_str).render()
        # Replace the document's font dict with the shared one
        doc.fonts = self._fonts
        doc.write_pdf(None)

PdfWriter().write_pdf(f"<div>{'<div>あ</div>' * 20}</div>")
Output:
The snippet is just a hack that could help in specific cases. We have yet to find a reliable way to fix this problem.
OK, I’ve found where the problem comes from: WeasyPrint/weasyprint/pdf/stream.py Lines 329 to 340 in 3a208fe
This method is cached, meaning that the font is stored in memory once for each Let’s just store the (Pango font + key) couple instead!
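To illustrate the general mechanism (this is not WeasyPrint's actual code, and `FontHolder` and `describe` are hypothetical names), here is a small sketch of how a cached method can keep heavy objects in memory. `functools.lru_cache` stores its arguments as cache keys, and for a method that includes `self`, so the whole object stays alive until the cache entry goes away.

```python
import functools
import gc
import weakref


class FontHolder:
    """Hypothetical stand-in for an object that owns a large font payload."""

    def __init__(self):
        self.data = bytearray(1_000_000)  # pretend this is the font file

    @functools.lru_cache(maxsize=None)
    def describe(self, key):
        # The cache key is (self, key), so the cache references self.
        return (key, len(self.data))


def is_alive_after_delete(clear_cache):
    """Return True if a FontHolder survives after its last name is deleted."""
    holder = FontHolder()
    holder.describe("sans")
    ref = weakref.ref(holder)
    if clear_cache:
        # Dropping the cache entries releases the references to self.
        FontHolder.describe.cache_clear()
    del holder
    gc.collect()
    return ref() is not None
```

With the cache left untouched the object survives its own deletion, and after `cache_clear()` it is freed. Caching on a small key rather than on the heavy object itself avoids this kind of retention.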
Before:
After:
This pointed me in the right direction after I had struggled for a whole day with a font-caching issue. When I was generating a PDF from a static Japanese webpage in a loop, the first 2-3 PDFs were generated correctly. After that, the subsequent PDFs contained garbled text, and I couldn't figure out why, because every iteration used the same webpage. Once I started invalidating the cache after every PDF generation, the results were as expected. This is not mentioned in the documentation.
First of all, thank you very much for developing and maintaining this incredibly useful library!
We are encountering the following issues regarding memory consumption when converting HTML to PDF:
We have created a minimal setup to reproduce this problem:
https://github.com/yamap55/weasyprint_memory_check
(using Python==3.9.7 and 3.12.3, WeasyPrint==61.2, memory_profiler==0.61.0)
In the above repository, when running the container and executing
python main.py
:memory_profiler