call_thumbor error: tornado server unavailable, proceeding with tornado server restart #70

Closed
passledevelopers opened this issue Nov 10, 2018 · 19 comments


@passledevelopers

passledevelopers commented Nov 10, 2018

We are intermittently getting the error below which we are not able to explain.

This URL was previously failing for us, however it now works:
https://d2vjgjl6okzqz8.cloudfront.net/90x278/smart/526535f5afc81614f05b26d2/52652350afc8161e104d5764/2015-03-20-10-35-02-643-01lastarcticice.jpg

We also sometimes get this when using other parameters (e.g. omitting fit-in). It seems pretty random.

  • It happens intermittently. Quite often, retrying the same job (same URL) a few moments later works.
  • Changing the parameters in the URL slightly always seems to make it work.
  • So far we've been unable to identify a pattern in when it happens or with which images.

Looking in the CloudWatch logs I find the following messages:

[ERROR] 2018-11-09T18:14:41.970Z 53ead92f-e44b-11e8-8ccd-eb0ffcfa44ce call_thumbor error: tornado server unavailable, proceeding with tornado server restart
[ERROR] 2018-11-09T18:14:41.970Z 53ead92f-e44b-11e8-8ccd-eb0ffcfa44ce lambda_handler error: too many values to unpack
[ERROR] 2018-11-09T18:14:41.971Z 53ead92f-e44b-11e8-8ccd-eb0ffcfa44ce lambda_handler trace: Traceback (most recent call last):
  File "/var/task/image_handler/lambda_function.py", line 353, in lambda_handler
    result = call_thumbor(event)
  File "/var/task/image_handler/lambda_function.py", line 328, in call_thumbor
    thumbor_down, session = is_thumbor_down()
ValueError: too many values to unpack

The stack was built using the following version: serverless-image-handler/v3.0.0/serverless-image-handler.zip
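
For readers following the trace above: the ValueError is what Python raises when a call that is unpacked into two names returns more than two items. The sketch below reproduces only that failure mode. The body of `is_thumbor_down` here is hypothetical (the real implementation is not shown in this thread), and the assumption is that its failure path returns something other than a 2-tuple.

```python
def is_thumbor_down(healthy):
    """Hypothetical stand-in, NOT the project's real function."""
    if healthy:
        # Success path: a 2-tuple, matching `thumbor_down, session = ...`.
        return False, "session-object"
    # Assumed failure path: an error response with more than two entries.
    return {"statusCode": "502", "body": "", "headers": {}}


def call_thumbor(healthy):
    """Unpack the result the same way the traceback shows."""
    try:
        thumbor_down, session = is_thumbor_down(healthy)
    except ValueError as error:
        # Iterating the 3-key dict yields three items for two names.
        return "lambda_handler error: %s" % error
    return "ok"
```

If the real function behaves like this, the unpack error would be a side effect of the "tornado server unavailable" path rather than a separate bug, but that is an inference from the log order, not something confirmed here.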

@passledevelopers
Author

OK, I have now found that a lot of the issues we were having occurred when using the smart option. If I turn the smart option off, we get far fewer problems. Even without the smart option, however, I still have the odd instance where the image fails to process on the first attempt. Some examples of files that failed:

This jpeg:
https://d2hqbf0e2abk5h.cloudfront.net/271x50/526535f5afc81614f05b26d2/52652350afc8161e104d5764/2017-05-03-3-34-59-562-_95764102_hi009419132.jpg

And this animated gif:
https://d2hqbf0e2abk5h.cloudfront.net/307x456/52d018c5afc81639b436c532/52652350afc8161e104d5764/2016-10-03-3-30-33-890-HIv45C.gif

Both failed with the error mentioned in my original post.

@astuyve

astuyve commented Nov 13, 2018

I've also run into this issue several times:
[image]

@pbassut

pbassut commented Jan 4, 2019

Did anyone figure this out? I'm also having the same issue.

@J-Nagle

J-Nagle commented Jan 10, 2019

We've encountered this error with output logs similar to the initial report, but under different circumstances. It occurred for an image that met all of these conditions:

  • Source was PNG
  • Source was roughly 6K × 5K (about 5-7 MB or more)
  • Outputting to 0x1800 (with no filters)
  • Or using a JPEG version of this source and filtering it as PNG

For that case, no failure happened with any one of these changes:

  • Changing the source to JPEG without modifying its original resolution
  • Reducing the PNG source's dimensions by half (although generation can, peculiarly, take over ten seconds)
  • Setting a lower output resolution of 0x1000
  • Filtering the final format to JPEG

Sometimes it happens randomly, as originally described, but we haven't looked deeply into that. This probably does not help with the other scenarios in this issue.

@matt-bradley

@J-Nagle I am seeing the same thing.

However, we are getting some PNGs through okay, and have narrowed it down to images that are >9 MB.

@J-Nagle

J-Nagle commented Jan 24, 2019

@matt-bradley I edited my post to clarify that our failure matched every factor in the first list. We, too, have examples of working PNGs that are smaller than the one that caused issues. Our conditions are only for a single test case, and we haven't looked into exactly at what point images start breaking. Thanks for the info.

@jbenezech

Having the same issue. It seems to affect PNGs only, but small ones as well. It might be related to concurrent access, but I'm unsure. Failing images convert fine on a second try.
Commenting out the pngquant optimizer seems to do some good, but it's too early to tell.
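
Since several reports in this thread say a failed image converts fine on a second try, a client-side retry is one possible stopgap. The following is a minimal sketch, not a fix for the underlying problem. `fetch` is a hypothetical caller-supplied callable that returns the image bytes or raises on failure, and the attempt count and delays are arbitrary assumptions.

```python
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on any exception.

    `fetch` is a hypothetical callable supplied by the caller; `sleep` is
    injectable so the delays can be observed or skipped in tests.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch(url)
        except Exception as error:
            last_error = error
            # Back off before the next attempt, but not after the last one.
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))
    raise last_error
```

Retrying hides the symptom from end users but still pays for the failed invocation, so it only makes sense until the handler itself is fixed.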

@hayesry
Member

hayesry commented Feb 26, 2019

Hi all. We're in the process of updating the solution and have identified several ways to mitigate this issue in the next version. Stay tuned and thanks for your feedback/patience!

@rayhaanq

rayhaanq commented Mar 1, 2019

I'm having the same issue with jpegs when adding my base64 encoded signature to the beginning of the url as documented here: https://github.com/thumbor/thumbor/wiki/Security
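
For context, this is a rough sketch of how a signed Thumbor path is usually built, as far as I understand the scheme on that wiki page: an HMAC-SHA1 of the image path, URL-safe base64 encoded, prepended to the path. The key and path below are made-up placeholders, and the function name is my own.

```python
import base64
import hashlib
import hmac


def sign_thumbor_path(security_key, path):
    """Return '/<signature>/<path>' for a path such as '300x200/smart/a.jpg'.

    Assumes the HMAC-SHA1 + URL-safe base64 scheme; `security_key` is the
    server's secret and is a placeholder here.
    """
    digest = hmac.new(
        security_key.encode("utf-8"), path.encode("utf-8"), hashlib.sha1
    ).digest()
    signature = base64.urlsafe_b64encode(digest).decode("ascii")
    return "/%s/%s" % (signature, path)


# Placeholder key and path, for illustration only.
signed = sign_thumbor_path("MY_SECURE_KEY", "300x200/smart/example.jpg")
```

Whether the handler in this project accepts such a prefix is not established in this thread, so treat this only as a way to check that the signature itself is well formed.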

@peterkuiper

peterkuiper commented Mar 26, 2019

@hayesry Any update on this? We are having the same problem. Can you elaborate on the ways to mitigate this? In our case we have quite a high concurrency (200-400 invocations).

After doing some testing, things seem to go awry when the task times out. Right after that, I am seeing the errors mentioned.

@cpaul747

cpaul747 commented Apr 1, 2019

The PNG issue for me was happening when using Smart Crop on PNG files utilizing an alpha channel. The Rekognition support attempts to convert these to jpg to send to Rekognition, and since jpg doesn't support alpha channels, it fails.

To get around this limitation, I'm trying to redeploy without Rekognition enabled, using only the built-in feature detectors, but I'm having issues with a mismatch between libcurl 7.53 in Lambda and the 7.55 or 7.61 versions available via yum in the amazon redhat docker image used for building.
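
To illustrate the alpha-channel point: JPEG stores only opaque pixels, so an RGBA image has to be composited onto a solid background before it can be encoded. Below is a pure-Python sketch of that compositing step on raw pixel tuples. It is not what the solution's Rekognition code does; the function names and the white background are my assumptions.

```python
def flatten_pixel(rgba, background=(255, 255, 255)):
    """Composite one 8-bit RGBA pixel over an opaque RGB background."""
    red, green, blue, alpha = rgba
    # Standard "over" blend in integer math, rounded to nearest.
    return tuple(
        (channel * alpha + back * (255 - alpha) + 127) // 255
        for channel, back in zip((red, green, blue), background)
    )


def flatten_image(pixels, background=(255, 255, 255)):
    """Apply flatten_pixel to a flat list of RGBA tuples."""
    return [flatten_pixel(pixel, background) for pixel in pixels]
```

An imaging library would normally do this in one call; the point is only that the conversion needs an explicit background choice, which a plain "save as JPEG" of an RGBA image does not make.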

@arichter83

I am also getting "Future exception was never retrieved" followed by a traceback. It occurs for PNG:

11:09:53 START RequestId: 5ebe7755-a92b-4132-a6ca-164712fe3ca3 Version: $LATEST
11:09:53 [INFO]	2019-05-06T11:09:53.812Z	5ebe7755-a92b-4132-a6ca-164712fe3ca3	200 GET /healthcheck (0.0.0.0) 2.29ms
11:09:53 [ERROR]	2019-05-06T11:09:53.883Z	5ebe7755-a92b-4132-a6ca-164712fe3ca3	Future exception was never retrieved: Traceback (most recent call last): File "/var/task/tornado/gen.py", line 1107, in run yielded = self.gen.throw(*exc_info) File "/var/task/thumbor/transformer.py", line 152, in smart_detect yield self.do_smart_detection() File "/var/task/tornado/gen.py", line 1099, in run va
11:10:03 END RequestId: 5ebe7755-a92b-4132-a6ca-164712fe3ca3
11:10:03 REPORT RequestId: 5ebe7755-a92b-4132-a6ca-164712fe3ca3	Duration: 10010.49 ms	Billed Duration: 10000 ms Memory Size: 1536 MB	Max Memory Used: 119 MB
11:10:03 2019-05-06T11:10:03.798Z 5ebe7755-a92b-4132-a6ca-164712fe3ca3 Task timed out after 10.01 seconds
11:10:05 START RequestId: 8d465828-7082-4b5a-983a-f659f083aea5 Version: $LATEST
11:10:05 [ERROR]	2019-05-06T11:10:05.885Z	8d465828-7082-4b5a-983a-f659f083aea5	call_thumbor error: tornado server unavailable, proceeding with tornado server restart
11:10:05 [ERROR]	2019-05-06T11:10:05.885Z	8d465828-7082-4b5a-983a-f659f083aea5	lambda_handler error: too many values to unpack
11:10:05 [ERROR]	2019-05-06T11:10:05.886Z	8d465828-7082-4b5a-983a-f659f083aea5	lambda_handler trace: Traceback (most recent call last): File "/var/task/image_handler/lambda_function.py", line 338, in lambda_handler result = call_thumbor(event) File "/var/task/image_handler/lambda_function.py", line 313, in call_thumbor thumbor_down, session = is_thumbor_down() ValueError: too many values to unp
11:10:05 END RequestId: 8d465828-7082-4b5a-983a-f659f083aea5
11:10:05 REPORT RequestId: 8d465828-7082-4b5a-983a-f659f083aea5	Duration: 317.85 ms	Billed Duration: 400 ms Memory Size: 1536 MB	Max Memory Used: 41 MB
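
The log above shows a timeout on one request followed by the "tornado server unavailable" error on the next. A small script like the following could check whether that ordering holds across a larger log export. It is a sketch that assumes each relevant line carries the request ID as a lowercase UUID, as in the excerpt; the function name is made up.

```python
import re

# Lambda request IDs in the excerpt are lowercase UUIDs.
REQUEST_ID = re.compile(r"[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12}")


def restarts_after_timeout(lines):
    """Return (timed_out_id, failing_id) pairs where a 'tornado server
    unavailable' error is the next such error after a timeout."""
    pairs = []
    timed_out_id = None
    for line in lines:
        match = REQUEST_ID.search(line)
        if match is None:
            continue
        request_id = match.group(0)
        if "Task timed out" in line:
            timed_out_id = request_id
        elif "tornado server unavailable" in line and timed_out_id is not None:
            if request_id != timed_out_id:
                pairs.append((timed_out_id, request_id))
            timed_out_id = None
    return pairs


# Two lines copied from the excerpt above (tabs replaced by spaces).
SAMPLE = [
    "11:10:03 2019-05-06T11:10:03.798Z 5ebe7755-a92b-4132-a6ca-164712fe3ca3 Task timed out after 10.01 seconds",
    "11:10:05 [ERROR] 2019-05-06T11:10:05.885Z 8d465828-7082-4b5a-983a-f659f083aea5 call_thumbor error: tornado server unavailable, proceeding with tornado server restart",
]
pairs = restarts_after_timeout(SAMPLE)
```

If most restart errors pair up with a preceding timeout, that would support the observation that the errors follow a timed-out task.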

@hayesry
Member

hayesry commented May 7, 2019

@peterkuiper Thanks for checking in on this, we're currently working through the testing stages of the new version and hope to have it out shortly. We've replaced the Thumbor image handler library with SharpJS to provide for a truly serverless implementation and reduce the technical debt associated with starting up and running the tornado server within the Lambda function. We are also providing built-in mappings to promote compatibility with existing Thumbor URLs and minimize/prevent any rework from needing to be done on the front-end when upgrading to the new version.

@peterkuiper

@hayesry Thanks for the update! Looking forward to testing the new version.

@AdrienBigot

Great news! Also looking forward to testing your new solution!

@erodrig

erodrig commented May 12, 2019

New version please!

@milon619

@passledevelopers We were facing the same issue you reported, and also the following error for some requests:
[image]

We found that the PNG shared libraries are not included in the Lambda zip file. On checking the build script https://github.com/awslabs/serverless-image-handler/blob/master/deployment/build-s3-dist.sh we confirmed that they are indeed missing from the build process. We therefore added a step to the build script that copies the libpng* files into the 'lib' folder inside the zip artifact. We deployed this custom artifact to Lambda and are no longer seeing the same errors. We're not sure whether this is the actual solution to the problem discussed here, but you may want to try it. @hayesry could you confirm?
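
To check whether a given artifact already contains the PNG libraries, the zip can be inspected before deploying. This is a minimal sketch using the standard library. The `lib/libpng*` pattern follows the folder named above, while the entry names in the usage example are invented for illustration.

```python
import fnmatch
import io
import zipfile


def find_shared_libs(zip_file, pattern="lib/libpng*"):
    """Return the sorted archive entries whose names match `pattern`.

    `zip_file` may be a path or a file-like object.
    """
    with zipfile.ZipFile(zip_file) as archive:
        return sorted(
            name
            for name in archive.namelist()
            if fnmatch.fnmatchcase(name, pattern)
        )


# Usage against an in-memory archive with hypothetical entry names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("lib/libpng15.so.15", b"")
    archive.writestr("image_handler/lambda_function.py", b"")
buffer.seek(0)
found = find_shared_libs(buffer)
```

An empty result for the deployed `serverless-image-handler.zip` would be consistent with the missing-library explanation, though it would not by itself prove that this causes the errors in this issue.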

@Bogatinovski

When are you going to release the fix?

@shsenior
Contributor

Resolved with v4.0.0
