-
Notifications
You must be signed in to change notification settings - Fork 108
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support to gzip response at the REST layer #11343
Conversation
Jenkins results:
|
@vkuznet Valentin, given that you will be away the next two days, I wanted to trigger this review request now. I still have to work on those unit tests, which seem to be related to these changes. However, I did test these changes with reqmgr2 and microservices, and content is being served according to the request headers provided by the end user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes themselves looks fine to me, but without actual test using non-python client I doubt I can make certain conclusion about this approach. I suggest that you perform the following:
- setup basic WMCore service which should serve list of dicts, e.g.
[{"data":1}, {"data":2}]
- make API which will provide this JSON object
- run curl client to fetch both plain and gzipped content
- provide results of this test in this PR
If curl client using appropriate HTTP headers will be able to decode properly gzipped content I will be convinced with these code changes.
@vkuznet I provided some logs in the initial description. I am now working on the unit tests. |
test this please |
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
@amaltaro , I see now that |
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
After many iterations with the unit tests, also struggling with the automated "Accept-Encoding" libcurl mechanism (see updated PR description), I managed to get all the unit tests working and I also remade all the curl tests.
@vkuznet please let me know if you have any insights. |
Alan, for things to work we need few pieces of information:
The easiest way to test thing is still use local setup with your changes and curl as a client. You can do it on VM or your laptop. But you should provide all details of HTTP data flaw, e.g.
Once you have such test, then you can move towards WMCore server vs WMCore client use-case (e.g. communication between reqmgr2 and wmstats). This case is more complex since WMCore code is used in both ends. If we talk about reqmgr2 talks to wmstats, then reqmgr2 is a client, and it should set |
Valentin, it has all been explained in this PR, including logs of all the tests performed with If you check things on your browser, you can enable debugger to see the headers sent by the browser in the request http header, and what comes back with the response. --> This is the actual problem, Firefox has full support to For WMCore intercommunication, and when reading the response object, data is decompressed at the pycurl_manager layer, before being yield to to the end client: You can test all of this with this API (and any other WMCore API in this node): In case you are missing anything, please refer to the initial PR description. |
@@ -328,14 +318,14 @@ def request(self, url, params, headers=None, verb='GET', | |||
if verbose: | |||
print(verb, url, params, headers) | |||
header = self.parse_header(hbuf.getvalue()) | |||
data = bbuf.getvalue() | |||
data = decompress(data, header.header) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is not correct since you made it default. Here is what will happen on a client side:
- the server provides response HTTP header with
Content-Encoding: gzip
- the server data is already decompressed here
As you can see both items contradict each other. You either should remove Content-Encoding: gzip
header when you decompress by default over here, or you should not decompress here since your server has proper header, i.e. the client should do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line bbuf.getvalue()
simply reads a byte sequence out of the BytesIO
data object, whatever that data is.
Then in the next line at the decompress()
call, body data will ONLY be decompressed if the http response header has a Content-Encoding: gzip
, meaning that the server responded with compressed data and that our client (pycurl_manager, in this case) can decompress the data before passing it over to the upstream modules (note, upstream modules here mean that we are communicating with other services within the WMCore realm).
I see that with curl everything works fine, i.e. server returns |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is actually easier to reproduce the problem, since it does not require any internal HTTP call among the central services:
https://alancc7-cloud1.cern.ch/ms-transferor/data/status
using a web debugger, it shows the following for the response object (2 lines):
{"result": [
which makes me think that the method implemented in this PR _stream_compress_gzip
is actually what causes this problem.
@@ -328,14 +318,14 @@ def request(self, url, params, headers=None, verb='GET', | |||
if verbose: | |||
print(verb, url, params, headers) | |||
header = self.parse_header(hbuf.getvalue()) | |||
data = bbuf.getvalue() | |||
data = decompress(data, header.header) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The line bbuf.getvalue()
simply reads a byte sequence out of the BytesIO
data object, whatever that data is.
Then in the next line at the decompress()
call, body data will ONLY be decompressed if the http response header has a Content-Encoding: gzip
, meaning that the server responded with compressed data and that our client (pycurl_manager, in this case) can decompress the data before passing it over to the upstream modules (note, upstream modules here mean that we are communicating with other services within the WMCore realm).
Alan, what you describe from browser debugging is what I wrote here: #11247 (comment) |
Jenkins results:
|
Jenkins results:
|
remove my debugging remove gzip decorator from _call method join chunks to be gzipped all together
Explicitly set gzip as HTTPHEADER object Deal with None headers Disable automatic data decompression in libcurl - ACCEPT_ENCODING
Jenkins results:
|
Okay, I was missing a join of the records/chunks before compressing it with gzip. It has exactly the same logic as the zlib compression (when passing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to add additional unit tests which will try out gzip with HEAD
HTTP request and HTTP responses which will generate non 200 status since the code in pycurl_manager has moved.
@@ -328,14 +318,14 @@ def request(self, url, params, headers=None, verb='GET', | |||
if verbose: | |||
print(verb, url, params, headers) | |||
header = self.parse_header(hbuf.getvalue()) | |||
data = bbuf.getvalue() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part should be checked if it works with HEAD
HTTP body and different HTTP codes since you moved reading buffer before you check HTTP verb as it was initially implemented in line block below (lines 323-329)
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Jenkins results:
|
Valentin, I created extra unit tests under the pycurl module. I also pointed them to my VM, to make sure they are correctly testing and passing the tests, which you can see from the results above. I am making another push now, but this is just to squash all the unit test changes made today. |
further fixes for MicroService unit tests further microservice unit test fixes further fixes to unit tests new pycurl unit tests wrap unit tests with try/except block fix reference to res variable, in unit tests unit test fix, trap exception fix last unit test remove unneeded unit test code switch urls back to cmsweb
Jenkins results:
|
Thank you for the suggestions and reviews, Valentin. |
Fixes #11082
Superseeds #11247
Status
ready
Description
This PR implements the following changes:
Accept-Encoding: gzip
, WMCore web-services will serve compressed body data, which needs then to be gzip decompressed on the client application)Some tests with curl have been performed - for reqmgr2, wmstatsserver and ms-transferor - and logs can be found here:
https://amaltaro.web.cern.ch/amaltaro/forWMCore/PR_11343/
If anyone wants to run further tests, services in my alancc7-cloud1 are still up.
NOTE-1: This of course adds an extra CPU load on the server application to compress data, but amount of data to be transferred is drastically lower.
NOTE-2: note that CouchDB is not able to serve compressed data, so any intermediate steps (e.g. ReqMgr2 fetching data from CouchDB) will still be dealt with as a python object or json, until it can be served to the end user.
Is it backward compatible (if not, which system it affects?)
YES
Related PRs
None
External dependencies / deployment changes
None
[1]
Example in:
which is basically this error: https://curl.se/libcurl/c/libcurl-errors.html#CURLE_WRITE_ERROR , but there is no extra information to debug it!