install playwright for crawler agent working after re-create #4732

Closed
wants to merge 5 commits into from


isthaison
Contributor

crawler agent working after re-create

@KevinHuSh
Collaborator

KevinHuSh commented Feb 6, 2025

Thanks for the contribution!
I could not find where we use 'playwright'. Could you specify which issue you want to resolve?

@isthaison
Contributor Author

isthaison commented Feb 6, 2025

It is needed by the crawl4ai package (https://github.com/unclecode/crawl4ai), which is listed in https://github.com/infiniflow/ragflow/blob/main/pyproject.toml#L29
image
The crawler component in the agent does not work after the initial launch. Thank you for your interest.

@KevinHuSh KevinHuSh added the ci Continue Integration label Feb 6, 2025
@KevinHuSh KevinHuSh requested a review from yuzhichang February 6, 2025 02:47
@KevinHuSh
Collaborator

KevinHuSh commented Feb 6, 2025

CI failed.
I don't think it should be installed with pipx, should it?

@isthaison
Contributor Author

Let me check again

@isthaison
Contributor Author

Hi @KevinHuSh. It seems that CI has been working correctly.

@yuzhichang
Member

What exact issue does this PR fix?
With this change, building the image will download ~300MB from https://playwright.azureedge.net. This slows down the build and enlarges the image.

@isthaison
Contributor Author

image

@isthaison
Contributor Author

test.json
Agent test. With the current image, the crawler does not actually work.

@yuzhichang
Member

yuzhichang commented Feb 6, 2025

The Python package selenium requires Chromium, so we installed one at /usr/local/bin/chrome. See Dockerfile line 124.
It would be better for playwright to share that same Chromium via the env variable PLAYWRIGHT_BROWSERS_PATH.
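
A minimal sketch of one way to point Playwright at an already-installed Chromium, offered under stated assumptions and not as the project's approach. Playwright's Python `launch()` accepts an `executable_path` argument, whereas `PLAYWRIGHT_BROWSERS_PATH` is documented as the directory that holds Playwright's own downloaded browsers, so it may not accept an arbitrary binary. The `CRAWLER_CHROME_PATH` variable and the helper names are hypothetical, and whether crawl4ai lets callers pass these launch options is not confirmed here.

```python
import os
import shutil
from typing import Mapping, Optional

# Path quoted in the comment above.
DEFAULT_CHROME = "/usr/local/bin/chrome"
# Hypothetical override variable; not a Playwright, crawl4ai, or RAGFlow setting.
ENV_VAR = "CRAWLER_CHROME_PATH"


def resolve_chrome(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured browser path if it exists and is executable."""
    candidate = env.get(ENV_VAR, DEFAULT_CHROME)
    # For a path with a directory part, shutil.which only checks that file.
    return shutil.which(candidate)


def launch_kwargs(env: Mapping[str, str] = os.environ) -> dict:
    """Build keyword arguments for playwright's chromium.launch()."""
    kwargs = {"headless": True}
    path = resolve_chrome(env)
    if path is not None:
        # Reuse the system browser instead of a Playwright-managed download.
        kwargs["executable_path"] = path
    return kwargs


def fetch_title(url: str) -> str:
    """Open a page with the shared browser and return its title."""
    # Imported lazily so the helpers above work without playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_kwargs())
        try:
            page = browser.new_page()
            page.goto(url)
            return page.title()
        finally:
            browser.close()
```

If no executable is found, `launch_kwargs()` omits `executable_path` and Playwright falls back to its own browser lookup. A system Chromium whose version differs from the one Playwright expects may not behave identically, so this would need testing against the image.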

@isthaison
Contributor Author

I got it, so I'm going to close this request. Can you help me customize the environment variable so that the component works as soon as it's started?

@isthaison isthaison closed this Feb 6, 2025
@yuzhichang
Member

That's only my theory. You need to research where and how to customize the environment variable.

@yuzhichang yuzhichang reopened this Feb 6, 2025
@KevinHuSh
Collaborator

300MB is too much for one component.
Not everyone needs that extra 300MB.
We're going to figure it out.
We hope you understand.

@isthaison
Contributor Author

I got it, let me see if I can customize the environment to work.

@isthaison
Contributor Author

It seems that @yuzhichang's suggestion does not work.

@KevinHuSh KevinHuSh closed this Feb 7, 2025
@isthaison isthaison deleted the patch-2 branch February 7, 2025 06:17