[Plone5.2-rc2/Python3.6/c.solr8.0.0a1] Parsing error xmlSAX2Characters: huge text node #239
I was able to solve that problem using the
@NicolasGoeddel thanks for reporting this and providing a fix. This is highly appreciated. I'd be more than happy to review and merge a PR if you would care to open one. :)
I will take a look into how PRs work. I have never done one. It seems like I have to fork first, make a branch, and such things.
@NicolasGoeddel awesome! Yes, you can fork the repo and then do a pull request, or check out the repository from the collective. For the latter option, I would have to add you to the Plone collective. I'd be more than happy to do so if you are ok with it.
There is a problem with parsing huge XML output from Solr's extraction handler.
When I want to index a PDF file with nearly 3000 pages of text, Solr extracts that text and returns an XML response that is handled by collective.solr.indexer.BinaryAdder. The problem here is etree.parse(response), which does not work with big text nodes. I guess it needs to be changed to etree.iterparse(), but that is a bigger change.
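A minimal sketch of the kind of fix this would involve, assuming the response is parsed with lxml (the function names and the `response` argument are illustrative, not collective.solr's actual code): lxml's parser accepts a `huge_tree` flag that lifts libxml2's limit on single text nodes, and `iterparse` accepts the same flag for streaming.

```python
from lxml import etree

def parse_extract_response(response):
    # huge_tree=True lifts libxml2's limit on single text nodes, which
    # is what raises "xmlSAX2Characters: huge text node" here.
    parser = etree.XMLParser(huge_tree=True)
    return etree.parse(response, parser)

def iter_extract_response(response):
    # Alternative: stream the reply instead of building the whole tree
    # at once; lxml's iterparse accepts huge_tree as well.
    for _event, element in etree.iterparse(response, huge_tree=True):
        yield element
        element.clear()  # free each element once it has been handled
```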
It would be nicer if collective.solr extracted and indexed a binary object in one single step. I don't know if this is possible with Solr's API. At the moment collective.solr extracts all the text of a binary blob using Solr, saves that text into a dictionary, and sends it back to Solr for indexing. That does not look very efficient in my opinion. Maybe you know of a simple change to do both things together without that step in between.
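For reference, Solr's ExtractingRequestHandler can index the extracted content directly when extractOnly is not set, which would avoid the round-trip described above. A rough sketch of what a single-step call could look like; the URL, core name, document id, and field names are assumptions, not collective.solr's actual configuration:

```python
import requests

# Hypothetical single-step call: post the binary to /update/extract and
# let Solr run Tika and index the result in one request. Everything
# below (URL, id, field names) is illustrative.
SOLR_EXTRACT_URL = "http://localhost:8983/solr/plone/update/extract"

with open("big-document.pdf", "rb") as pdf:
    reply = requests.post(
        SOLR_EXTRACT_URL,
        params={
            "literal.id": "doc-1",            # literal.* sets regular field values
            "fmap.content": "SearchableText", # map Tika's content field to the schema
            "uprefix": "ignored_",            # prefix unknown Tika fields
            "commit": "true",
        },
        files={"file": pdf},
    )
reply.raise_for_status()
```

Whether this fits collective.solr depends on whether all the other field values of an indexed object can be passed as literal.* parameters alongside the blob.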
For your information, this is the whole warning: