-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: increase the robustness of megaparse #121
Conversation
jacopo-chevallard
commented
Nov 12, 2024
- passing base url and timeout explicitly, so that they can be configured from the module caling the sdk
- improving the way we check for supported models, which are now defined in a separate module
- using docker-compose instead of a Dockerfile directly
@@ -4,8 +4,8 @@ | |||
|
|||
|
|||
class MegaParseSDK: | |||
def __init__(self, api_key: str): | |||
self.client = MegaParseClient(api_key) | |||
def __init__(self, api_key: str, base_url: str | None = None, timeout: int = 600): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timeout won't affect the http request. By default in production, the timeout is on the laod balancer and not the app. It is at 60 seconds. Adding this here gives a false sense of: you can increase the timeout
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, but this is the reason why the parsing fails, since it takes several minutes for a pdf documents with 30 pages...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(and this is how I fixed it locally, but I understand that this is not a good solution for prod...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @StanGirard 👍🏼 . We should have faster parsing always or send the parser a page by page of the pdf from the workers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR
docker-compose.dev.yml
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need docker compose in the repo?
def __init__( | ||
self, | ||
api_key: str | None = None, | ||
base_url: str | None = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like those in env variables but as you want
|
||
|
||
class SupportedModel(Enum): | ||
GPT_4O = ("gpt-4o", None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats Nice !
Additionnal typeing can be used here (either NamedTuples or typped dict ...)
@@ -4,8 +4,8 @@ | |||
|
|||
|
|||
class MegaParseSDK: | |||
def __init__(self, api_key: str): | |||
self.client = MegaParseClient(api_key) | |||
def __init__(self, api_key: str, base_url: str | None = None, timeout: int = 600): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree with @StanGirard 👍🏼 . We should have faster parsing always or send the parser a page by page of the pdf from the workers
* chore: aligining build commands with what used in quivr-core * chore: adding uvloop dependency * feat: improving the way we check for supported models * feat: passing url and timeout explicitly * chore: restoring launch command in Docker * feat: improving dealing with MegaParse configuration
…#123) * fix: fixing the wrong passing of arguments to the parse_file endpoint * feat: increase the robustness of megaparse (#121) * chore: aligining build commands with what used in quivr-core * chore: adding uvloop dependency * feat: improving the way we check for supported models * feat: passing url and timeout explicitly * chore: restoring launch command in Docker * feat: improving dealing with MegaParse configuration * fix: version requirements (#126) * fix: version requirements * fix: fix versions * fix: uvicorn version * fix: uvicorn version (#127) * chore(main): release megaparse 0.0.43 (#125) * fix: typing and test error * fix: change url config * add: a SupportedModels enum for api * fix: change name UploadFileInput to UploadFileConfig --------- Co-authored-by: Chloé Daems <[email protected]> Co-authored-by: Stan Girard <[email protected]> Co-authored-by: chloedia <[email protected]>