-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Achieve the most efficient data import #13071
Comments
related :#10858 |
Hi @sebastienros, I tweaked the code and did new tests, and the root cause of the import performance loss is OrchardCore/src/OrchardCore/OrchardCore.ContentManagement/DefaultContentManager.cs Lines 529 to 530 in 95729da
If I override the ContentManager removed _session.Save(contentItem);
_contentManagerSession.Store(contentItem); The import of 20,000 pieces of data takes less than five minutes. Of course, I disabled SQLIndex Because in In my opinion, the cleaning or updating of history versions should be defined by the user, rather than handled automatically in the import function . |
Importing does validation for instance, so the events are required. You can't just ignore the events. If you think there should be an option to just ignore validation then this could be an option on the UI. WE could also find a way to remove some of the extra queries (e.g. RemoveLatestVersion) without breaking the import, and other optimizations like this. |
I also mentioned that in another issue, we should have a service that stops live indexing of items, and delay it to when the data is imported. |
What about having a scoped service that would allow us to disable specific handler events? With a Flags enumeration for instance (so we can pre-define groups). These would be cross-handler though. But one advantage is that you could just disable any validation when you are storing/importing a content item. However, do you know what part of the code for these handlers was causing a perf issue? I checked their code and they were optimized for batches, like just tracking an object and scheduling a single task for the end of the request. But thinking about it the issue might be that it's tracking all the content items from the import, and batching them is not fixing the problem when you have too many items. Then with 20K items in memory, and to process, it could be a problem. Disabling the event handlers like you did solves the issue because it just doesn't even track these items so nothing is done at the end of the request. The only option might be to be able to delay these tasks in the background, out of the request. Another option could be to be able to do it in smaller batches, but all the items would still be in memory at a point of time, so ideally we should just store the ids if these items and process then asynchronously (backgroun task). Like it's done for lucene indexes. (could be a new feature, but it looks a lot like the indexing module, with a cursor, but per task). |
Hi @sebastienros , I try to modify the LuceneIndexingContentHandler class, and test the delete and update new, seems to work well but the below code will always delete last document : await luceneIndexManager.DeleteDocumentsAsync(indexSettings.IndexName, new string[] { contentItem.ContentItemId }); This means that OC's Lucene module only supports the latest version of the index, right? There seems to be no way to get a specific version of the Lucene document and delete it |
do we no index only published content? |
No by config we allow to index latest version (drafted). |
The current import feature will trigger all the handlers one by one on a per-contentItem basis.
For example, if I have an excel sheet with 20,000 entries, it may take hours to import them all.
But most of the time, we know that the data is up-to-date, and we don't need to consider historical version information.
We can batch any issues before importing, such as duplicate versions.
As mentioned in this PR, #11532
the parameter that the import event fires should be a collection type.
I think it could improve the performance of some custom import event hooks
The text was updated successfully, but these errors were encountered: