Per the S3 docs:
> The s3 key provides information about the bucket and object involved in the event. The object key name value is URL encoded. For example, "red flower.jpg" becomes "red+flower.jpg" (Amazon S3 returns "application/x-www-form-urlencoded" as the content type in the response).
The current extract-Enhancer-TextractAsyncJobSubmitFunction Lambda does not URL-decode the S3 key it receives in the event JSON, so I get errors when processing objects whose names contain spaces or other characters that S3 URL-encodes.
For example, if I upload a document named my test.pdf, the S3 event sent to the extract-Enhancer-TextractAsyncJobSubmitFunction function contains Records[0].s3.object.key = my+test.pdf.
The Textract API calls textract.start_document_analysis() and textract.start_document_text_detection() then fail because the DocumentLocation parameter has a value of my+test.pdf when it should instead be my test.pdf.
Can you add URL decoding to the S3 key name in the received events?
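Something along these lines might be enough. This is only a minimal sketch, assuming the handler is Python and reads the key directly from the first record of the event; `extract_s3_location` is a hypothetical helper name for illustration, not a function that exists in the repo.

```python
from urllib.parse import unquote_plus


def extract_s3_location(event):
    """Return (bucket, decoded_key) for the first record of an S3 event.

    Hypothetical helper: the name and shape are assumptions for illustration.
    """
    record = event["Records"][0]
    bucket = record["s3"]["bucket"]["name"]
    # S3 event notifications URL-encode the object key
    # ("my test.pdf" arrives as "my+test.pdf"), so decode it
    # before handing it to Textract.
    key = unquote_plus(record["s3"]["object"]["key"])
    return bucket, key


# Trimmed-down example event containing only the fields used above.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-bucket"}, "object": {"key": "my+test.pdf"}}}
    ]
}

bucket, key = extract_s3_location(sample_event)
print(bucket, repr(key))
```

The decoded key could then be passed as DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}} in the two Textract calls. `unquote_plus` is used rather than `unquote` because it also turns `+` back into a space, which is the case that fails here.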
matwerber1 changed the title from "S3 event replaces spaces in object name with plus (+), causing Textract API to fail" to "S3 Event sends URL encoded key names, causing Lambda handler to fail on Textract API calls" on Aug 31, 2019