Extend the network client #1269
Conversation
Connections to the client callback URL would be more complicated than in the example @joschrew shared. For the CI/CD pipeline, we need an appropriate host definition and port mapping, just like for the other services in the docker-compose file.
So far so good; I also appreciate the test.
We should really find a different mechanism than a global variable. I have proposed a way to at least scope such a variable to the server instance. Why do we need that mechanism anyway? IIUC, `callback_server.handle_request` will handle exactly one request with no timeout, which is exactly what we want, is it not?
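For reference, here is a minimal sketch of that one-shot pattern, based only on the Python standard library; the handler/function names and the port are made up for illustration, this is not the actual ocrd_network code:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class _CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Store the posted JSON on the server instance instead of a global variable.
        length = int(self.headers.get("Content-Length", 0))
        self.server.callback_result = json.loads(self.rfile.read(length) or b"{}")
        self.send_response(200)
        self.end_headers()


def wait_for_callback(host: str = "0.0.0.0", port: int = 8999) -> dict:
    server = HTTPServer((host, port), _CallbackHandler)
    server.callback_result = None
    # handle_request() blocks until exactly one request arrives (no timeout by
    # default), handles it, and returns - no serve_forever loop needed.
    server.handle_request()
    server.server_close()
    return server.callback_result
```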
But I'm also wondering whether having an alternative based on polling for blocking processing might still be worth it. Because I imagine that a lot of firewalls won't allow the callback to a temporary server run on the user's machine. Polling is less efficient obviously but it does not require juggling IP addresses, ports and servers.
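A hypothetical sketch of what such a polling loop could look like on the client side; the endpoint path, the response field, and the state names are assumptions for illustration, not the real Processing Server API:

```python
import time

import requests


def poll_job_state(server_url: str, job_id: str,
                   sleep_seconds: float = 30, timeout_seconds: float = 1800) -> str:
    """Ask the server for the job state until it reaches a final state or we give up."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        response = requests.get(f"{server_url}/processor/job/{job_id}")
        response.raise_for_status()
        state = response.json().get("state")
        if state in ("SUCCESS", "FAILED", "CANCELLED"):
            return state
        time.sleep(sleep_seconds)
    raise TimeoutError(f"Job {job_id} did not finish within {timeout_seconds}s")
```

No callback host, port, or firewall rule is needed for this; the trade-off is the extra load from repeated status requests.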
I started to review, or rather to test, the client, but I couldn't finish so far. I will continue tomorrow. Here is what is not working for me up until now:
When refactoring to a shorter flag I obviously missed that there are two places to adapt.
I'm not a big fan of copy-pasting the same default everywhere. Instead, I resolved it with 4de1e83. The
Resolved with 8e7ba26.
You have caught something bigger than just that simple error. Resolved with 69808b6 and d1af85b. The HTTP exception was being caught by the general Exception catch, which duplicated the error output and thus obfuscated it.
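For illustration, a minimal sketch of that pattern (not the exact code from 69808b6/d1af85b): re-raise the HTTPException untouched so it is reported exactly once, and only translate genuinely unexpected errors into a 500. The `raise_http_exception` stub below merely mimics the helper used in this PR.

```python
from logging import Logger

from fastapi import HTTPException, status


def raise_http_exception(logger: Logger, status_code: int, message: str) -> None:
    logger.error(message)
    raise HTTPException(status_code=status_code, detail=message)


def run_guarded(logger: Logger, operation) -> None:
    try:
        operation()
    except HTTPException:
        raise  # already a well-formed HTTP error: don't wrap it or log it a second time
    except Exception as error:
        raise_http_exception(logger, status.HTTP_500_INTERNAL_SERVER_ERROR, str(error))
```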
That was not very pleasant to debug:

21:57:40.222 WARNING ocrd_network.processing_server - Job input: processor_name='ocrd-cis-ocropy-binarize' path_to_mets='/home/mm/repos/ocrd_network_tests/ws29/data/mets.xml' workspace_id=None description='OCR-D Network client request' input_file_grps=['DEFAULT'] output_file_grps=['BIN-TEST'] page_id=None parameters={'{': '}'} result_queue_name=None callback_url=None agent_type=<AgentType.PROCESSING_WORKER: 'worker'> job_id='d1e075e3-6cb0-40c7-b854-d2d77322cfbf' depends_on=None
21:57:40.222 ERROR ocrd_network.processing_server - Failed to validate processing job input against the tool json of processor: ocrd-cis-ocropy-binarize
["[] Additional properties are not allowed ('{' was unexpected)"]

Dirty fix: 50f73c5. Not sure how to better handle that.
After the recent changes the client is working for me.
@@ -210,12 +212,12 @@ def validate_job_input(logger: Logger, processor_name: str, ocrd_tool: dict, job
        raise_http_exception(logger, status.HTTP_404_NOT_FOUND, message)
    try:
        report = ParameterValidator(ocrd_tool).validate(dict(job_input.parameters))
        if not report.is_valid:
            message = f"Failed to validate processing job input against the tool json of processor: {processor_name}\n"
            raise_http_exception(logger, status.HTTP_400_BAD_REQUEST, message + report.errors)
    except Exception as error:
When would this exception be raised? Validators should not raise errors but return a report. If you don't have a specific use case, it might be better to just not do `try/except`, so that potential errors are actually raised and fixed.
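As a sketch, the suggested variant would reduce to the report check alone; `ocrd_tool`, `job_input`, `processor_name`, `logger`, and `raise_http_exception` are the names from the diff above, and the assumption is that `ParameterValidator` never raises on user input:

```python
# No try/except: a crash here would indicate a bug in the validator itself,
# which should surface as a real traceback rather than be swallowed.
report = ParameterValidator(ocrd_tool).validate(dict(job_input.parameters))
if not report.is_valid:
    message = f"Failed to validate processing job input against the tool json of processor: {processor_name}\n"
    raise_http_exception(logger, status.HTTP_400_BAD_REQUEST, message + "\n".join(report.errors))
```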
I wanted to be extra secure against unexpected errors. The workers or the server (network agents as a service) should ideally never crash due to some error, especially not with an HTTP 500 error on the client side and the network agent left in an unpredictable state. Exceptions from problematic requests are still logged.
Understood, but errors in `report = ParameterValidator(ocrd_tool).validate(dict(job_input.parameters))` would mean that something is broken in our implementation or in jsonschema. So this would likely not be a fluke that happens once; we would need to stop processing and fix the bug at the source. The only variable factor is the `job_input.parameters` dict, and if user-provided input breaks the validator - which it must absolutely never do - we should fix that in the validator.
> The only variable factor is the `job_input.parameters` dict and if user-provided input breaks the validator - which it must absolutely never do - we should fix that in the validator.
A failing validator would also probably fail most of the processing/workflow jobs. User input bringing down the entire infrastructure is conceptually wrong to me. Even if all processing jobs fail, the Processing Server should still respond to other requests, such as checking the logs of old jobs. We would rather need a mechanism to prevent further submission of processing jobs in case X jobs fail in a row - potentially blocking further requests only to the processing endpoint until the issue is fixed.
There is also currently no graceful shutdown for the Processing Server, i.e., once the server dies, anything inside the internal processing cache of the server (not the RabbitMQ) will be lost.
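As a purely hypothetical sketch of that "X failures in a row" idea (nothing like this exists in ocrd_network yet; the class name and threshold are made up):

```python
class ProcessingEndpointGate:
    """Blocks only the processing endpoint after too many consecutive failures."""

    def __init__(self, max_consecutive_failures: int = 10):
        self.max_consecutive_failures = max_consecutive_failures
        self.consecutive_failures = 0

    def record_result(self, success: bool) -> None:
        # Any success resets the counter; failures accumulate.
        self.consecutive_failures = 0 if success else self.consecutive_failures + 1

    @property
    def accepting_jobs(self) -> bool:
        # Log/status endpoints would stay available regardless of this flag.
        return self.consecutive_failures < self.max_consecutive_failures
```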
The actual parsing of the
ocrd network client: parse parameters and overrides
@MehmedGIT I've created the changelog for this PR, anything essential missing?
# Conflicts:
#   CHANGELOG.md
IMO, good to be merged.
A brief beginning of addressing the points in #1265:
- `network-integration-test` failing due to a bad env variable
- `OCRD_NETWORK_CLIENT_POLLING_SLEEP` and `OCRD_NETWORK_CLIENT_POLLING_TIME` (see the sketch below)
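Illustration only of how a client might pick these up, with fallback values that are assumptions rather than the defaults actually defined in core:

```python
from os import environ

# Hypothetical defaults; the real ones are defined in the core configuration.
polling_sleep = int(environ.get("OCRD_NETWORK_CLIENT_POLLING_SLEEP", "30"))
polling_timeout = int(environ.get("OCRD_NETWORK_CLIENT_POLLING_TIME", "300"))
```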