[Bug]: RAPTOR and #5211

Open
1 task done
alexff77 opened this issue Feb 20, 2025 · 1 comment
Labels
bug Something isn't working

Comments

@alexff77

Is there an existing issue for the same bug?

  • I have checked the existing issues.

RAGFlow workspace code commit ID

9298acc full - Nightly Feb 18

RAGFlow image version

9298acc full

Other environment information

Running the nightly build from Feb 18 on Ubuntu

Actual behavior

Hi,
I have several issues that are similar to what others have reported, but not quite the same.
Issues with RAPTOR:
1) On this one doc, which has only 1 chunk, it always errors out:

15:10:00 Task has been received.
15:10:01 Page(12): OCR started
15:10:04 Page(12): OCR finished (3.11s)
15:10:05 Page(12): Layout analysis (0.86s)
15:10:05 Page(12): Table analysis (0.00s)
15:10:05 Page(12): Text merged (0.06s)
15:10:05 Page(12): Page 01: Text merging finished
15:10:05 Page(12): Generate 1 chunks
15:10:05 Page(12): Embedding chunks (0.22s)
15:10:05 Page(12): Done (0.04s)
15:10:08 Start RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
15:10:08 Start RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
15:10:08 Task has been received.
15:10:08 [ERROR]Fail to bind LLM used by RAPTOR: 'NoneType' object is not subscriptable
15:10:08 [ERROR][Exception]: 'NoneType' object is not subscriptable

I can find the chunk in Elasticsearch.
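
For readers unfamiliar with the message: 'NoneType' object is not subscriptable is the Python TypeError raised when code indexes a value that turned out to be None. The sketch below reproduces that class of failure and shows a guarded variant. It is an illustration only, not RAGFlow's code; lookup_llm_config, bind_llm_unsafe, bind_llm_checked and the "api_key" field are hypothetical names, and the actual cause in this issue is unknown without the backend log.

```python
from typing import Optional


def lookup_llm_config(configs: dict, model_name: str) -> Optional[dict]:
    """Hypothetical lookup: returns None when the model is not configured."""
    return configs.get(model_name)


def bind_llm_unsafe(configs: dict, model_name: str) -> str:
    # Indexing the result directly raises TypeError if the lookup returned None.
    return lookup_llm_config(configs, model_name)["api_key"]


def bind_llm_checked(configs: dict, model_name: str) -> str:
    # Checking for None first turns the opaque TypeError into a clear message.
    config = lookup_llm_config(configs, model_name)
    if config is None:
        raise LookupError(f"no LLM configuration found for {model_name!r}")
    return config["api_key"]


if __name__ == "__main__":
    try:
        bind_llm_unsafe({}, "chat-model")
    except TypeError as exc:
        print(exc)  # 'NoneType' object is not subscriptable
```

An error of this shape usually means some lookup earlier in the call chain returned nothing, which is why the full traceback from the server log is needed to locate it.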

2) Another document processed fine at first.
Then I changed a setting on the file to enable Entity resolution, re-ran it, and got an error:

16:01:12 Reused previous task's chunks.
16:01:17 Start RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
16:01:17 Task has been received.
16:01:18 [ERROR]Fail to bind LLM used by RAPTOR: 'NoneType' object is not subscriptable
16:01:18 [ERROR][Exception]: 'NoneType' object is not subscriptable

Then I turned that setting off and re-ran it, and RAPTOR worked fine again. This time it had to regenerate the chunks, since the previous run had errored.
Similarly, for another document I chose not to regenerate the chunks and RAPTOR failed; then I regenerated them and RAPTOR worked.

I also see some errors when connecting to Elasticsearch:

ESConnection.update got exception: BadRequestError(400, 'illegal_argument_exception', 'exceeded max allowed inline script size in bytes [65535] with size [213572] for script [ctx._source.content_with_weight='

Earlier I had errors about the number of scripts that can be run, so I increased that limit to 1000/1m.

Could this be related to how many entities it found and is trying to resolve?
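
To judge how far over the limit the update script is, the two numbers can be pulled out of the error text. This is a minimal sketch that assumes the message keeps the wording shown above; parse_script_size_error is a hypothetical helper, not part of RAGFlow or the Elasticsearch client.

```python
import re
from typing import Optional, Tuple

# Matches the wording of the error quoted in this issue.
_SIZE_PATTERN = re.compile(
    r"exceeded max allowed inline script size in bytes \[(\d+)\] with size \[(\d+)\]"
)


def parse_script_size_error(message: str) -> Optional[Tuple[int, int]]:
    """Return (limit_bytes, actual_bytes), or None if the message does not match."""
    match = _SIZE_PATTERN.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    message = (
        "BadRequestError(400, 'illegal_argument_exception', "
        "'exceeded max allowed inline script size in bytes [65535] "
        "with size [213572] for script [ctx._source.content_with_weight='"
    )
    limit, actual = parse_script_size_error(message)
    print(f"limit={limit} actual={actual} ratio={actual / limit:.2f}")
```

For the error above, the script is 213572 bytes against a 65535-byte limit, so a single content_with_weight value is already more than three times what an inline script may hold.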

3) Tasks seem to get stuck at the very last step, after entity resolution, for a really long time, for example an hour or more:
18:57:39 Entities extraction progress ... 46/47 (8962 tokens)
18:57:39 Entities extraction progress ... 47/47 (9589 tokens)
Then sometimes it fails.

Thank you.

Expected behavior

No response

Steps to reproduce

As described above. Using documents with RAPTOR and entity resolution.

Additional information

No response

@alexff77 alexff77 added the bug Something isn't working label Feb 20, 2025
@KevinHuSh
Collaborator

About 15:10:08 [ERROR][Exception]: 'NoneType' object is not subscriptable: do you have the backend error log? You can get it with:
docker logs -f ragflow-server

For illegal_argument_exception:

curl -X PUT -u elastic:infini_rag_flow -H 'Content-Type: application/json' http://127.0.0.1:1200/_cluster/settings -d '{"transient":{"script.max_size_in_bytes": 10000000}}'
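
If curl is not available, roughly the same request can be built with Python's standard library. This is a sketch under assumptions: the URL, credentials and value are copied from the command above, build_script_size_request is a hypothetical helper, and PUT is used because that is the method the Elasticsearch cluster update settings API documents. The request is only constructed here; sending it is left commented out.

```python
import base64
import json
import urllib.request


def build_script_size_request(
    base_url: str, user: str, password: str, max_bytes: int
) -> urllib.request.Request:
    """Build a PUT request that raises script.max_size_in_bytes transiently."""
    body = json.dumps(
        {"transient": {"script.max_size_in_bytes": max_bytes}}
    ).encode("utf-8")
    # HTTP Basic auth header, equivalent to curl's -u user:password.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    request = build_script_size_request(
        "http://127.0.0.1:1200", "elastic", "infini_rag_flow", 10_000_000
    )
    print(request.get_method(), request.full_url)
    # Uncomment to actually send it to a running Elasticsearch instance:
    # with urllib.request.urlopen(request) as response:
    #     print(response.read().decode("utf-8"))
```

A transient setting does not survive a full cluster restart, so it would need to be reapplied (or set persistently) after the Elasticsearch container is recreated.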
