Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sitemap - memory allocation #16616

Closed
MikeKry opened this issue Aug 26, 2024 · 9 comments · Fixed by #16636
Closed

Sitemap - memory allocation #16616

MikeKry opened this issue Aug 26, 2024 · 9 comments · Fixed by #16636
Labels
Milestone

Comments

@MikeKry
Copy link
Contributor

MikeKry commented Aug 26, 2024

Describe the bug

Sitemaps with over 20.000 items consume too much memory and do not release it. We have currently added few more content types into sitemap and it grew up to 20k items. When we generate sitemap (with cache), it takes about 5 minutes to build and it allocates and holds around 6 GB of memory.

Orchard Core version

1.8.2

Expected behavior

As the generated sitemap itself has only 4 MB, I wouldn't expect that cache/memory usage is 1,500x bigger

Logs and screenshots

After sitemap build:
image

After sitemap cache clean:
image
(seems memory is still not released, not even through tenant restart and it needs hard reset.)

Standard allocation is around 500-800 MB

Even more interesting is, that the sitemap cache is stored at FS, so it should not be in memory. I guess there is a memory leak somewhere. I tried to disable our custom modules but result is the same.

1

image

2

image

@MikeAlhayek
Copy link
Member

@MikeKry it may be helpful to provide the steps to reproduce this issue.

Are you going to submit a PR for fixing this?

@Piedone
Copy link
Member

Piedone commented Aug 26, 2024

I guess the root of the issue is that when building the sitemap (second exception) there will be just too much stuff in memory for 20k items. There's no batching/paging, so in the end, all items are looped over and the resulting XML is built on the fly (see ContentTypesSitemapSourceBuilder). If changing the logic is not feasible, we may look into sitemap index files to build multiple, smaller sitemaps (that's necessary at one point anyway, since sitemaps are limited to 50MB/50k entries).

If you can provide a way to repro this, like with a suitable export file with that many items, or some code that generates suitable items, then indeed we can look into this better.

@gvkries
Copy link
Contributor

gvkries commented Aug 27, 2024

You can already create a sitemap index to split the sitemap into smaller pieces. You can also limit the number of items in each sitemap. See https://docs.orchardcore.net/en/latest/reference/modules/Sitemaps/#sitemap-content-types-source

@sebastienros
Copy link
Member

The two improvements I can see:

  • Page the content items when we load/process them in sitemaps
  • Have a new option to set the max number of items per sitemap to create multiple files (optional/separate PR maybe), with a sensible default

@sebastienros sebastienros added this to the 2.x milestone Aug 29, 2024
Copy link
Contributor

We triaged this issue and set the milestone according to the priority we think is appropriate (see the docs on how we triage and prioritize issues).

This indicates when the core team may start working on it. However, if you'd like to contribute, we'd warmly welcome you to do that anytime. See our guide on contributions here.

@MikeAlhayek
Copy link
Member

@MikeKry if you are able to, please test out PR #16636 as it should fix your problem. You'll need to be on 2.0 previews or main branch to test it out.

@Piedone
Copy link
Member

Piedone commented Aug 29, 2024

And repro information would still be useful.

@MikeKry
Copy link
Contributor Author

MikeKry commented Aug 30, 2024

@MikeAlhayek I will try it today on our dataset.

Sadly I am not allowed to share this specific dataset, but in case it is needed, it should not be a problem to generate a sample dataset. I can provide you with generated one, in case you don't have option to generate.

@MikeKry
Copy link
Contributor Author

MikeKry commented Aug 30, 2024

Just tried that out, it seems fixed.

If there would be still a need for getting a dataset, let me know, but whole usecase is just having a lot of pages included in sitemap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants