Get error running initial job for S3 logs #22
Comments
I had the same problem. Changing the runtime to
seemed to fix it.
I'm experiencing this same issue, and I've set the job to use
@ryanrf-ac Hey there, thanks for chiming in - based on the error message from the original report, it looks like no data got converted in the initial job.
Did you see that error message as well? Can you provide some info on how the job was configured? Specifically:
Thanks!
Hi @dacort. Thanks for getting back to me.
As for the arguments (I've removed the bucket name, but it uses the same bucket as the source location and converted target):
And here are some samples from the S3 access logs:
@ryanrf-ac Hm, ok everything looks fine there. That message indicates it's not seeing any data in the source table. A couple followup questions:
@dacort The access logs are using another prefix. Yes, I can query the
Using latest from this repo ...
The databases get created, but just after the access_optimized table is created, we receive this error:
glue_service_logs/utils.py", line 128, in _get_first_key_in_prefix
TypeError: 'NoneType' object has no attribute '__getitem__'
End of LogType:stdout
Full trace:
`20/07/21 09:47:30 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/07/21 09:47:30 INFO MemoryStore: MemoryStore cleared
20/07/21 09:47:30 INFO BlockManager: BlockManager stopped
20/07/21 09:47:30 INFO BlockManagerMaster: BlockManagerMaster stopped
20/07/21 09:47:30 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/07/21 09:47:30 INFO SparkContext: Successfully stopped SparkContext
20/07/21 09:47:30 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User application exited with status 1)
20/07/21 09:47:30 INFO AMRMClientImpl: Waiting for application to be successfully unregistered.
20/07/21 09:47:30 INFO ApplicationMaster: Deleting staging directory hdfs://ip-172-32-99-27.ec2.internal:8020/user/root/.sparkStaging/application_1595322435097_0001
20/07/21 09:47:30 INFO ShutdownHookManager: Shutdown hook called
20/07/21 09:47:30 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/root/appcache/application_1595322435097_0001/spark-1ca0f7e1-0bc4-47d6-ab91-12879d18055f
20/07/21 09:47:30 INFO ShutdownHookManager: Deleting directory /mnt/yarn/usercache/root/appcache/application_1595322435097_0001/spark-1ca0f7e1-0bc4-47d6-ab91-12879d18055f/pyspark-11265b43-46c7-46a1-a043-8f182446bdf0
End of LogType:stderr
LogType:stdout
Log Upload Time:Tue Jul 21 09:47:32 +0000 2020
LogLength:2765
Log Contents:
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 169.254.169.254
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTP connection (1): 169.254.169.254
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): glue.us-east-1.amazonaws.com
INFO:athena_glue_service_logs.job:Initial run, scanning S3 for partitions.
INFO:athena_glue_service_logs.catalog_manager:Creating database aws_service_logs
INFO:athena_glue_service_logs.catalog_manager:Creating database table s3_access_raw
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): glue.us-east-1.amazonaws.com
INFO:athena_glue_service_logs.catalog_manager:Creating database table s3_access_optimized
null_fields []
INFO:athena_glue_service_logs.converter:No data returned, skipping conversion.
INFO:athena_glue_service_logs.job:Initial run with source NullPartitioner, adding all partitions from S3.
INFO:botocore.vendored.requests.packages.urllib3.connectionpool:Starting new HTTPS connection (1): rodin-glue.s3.amazonaws.com
Parse yarn logs get error message: TypeError: 'NoneType' object has no attribute '__getitem__'
Traceback (most recent call last):
File "script_2020-07-21-09-45-54.py", line 4, in
job_run.convert_and_partition()
File "/mnt/yarn/usercache/root/appcache/application_1595322435097_0001/container_1595322435097_0001_01_000001/athena_glue_converter_latest.zip/athena_glue_service_logs/job.py", line 156, in convert_and_partition
File "/mnt/yarn/usercache/root/appcache/application_1595322435097_0001/container_1595322435097_0001_01_000001/athena_glue_converter_latest.zip/athena_glue_service_logs/job.py", line 137, in add_new_optimized_partitions
File "/mnt/yarn/usercache/root/appcache/application_1595322435097_0001/container_1595322435097_0001_01_000001/athena_glue_converter_latest.zip/athena_glue_service_logs/catalog_manager.py", line 89, in get_and_create_partitions
File "/mnt/yarn/usercache/root/appcache/application_1595322435097_0001/container_1595322435097_0001_01_000001/athena_glue_converter_latest.zip/athena_glue_service_logs/partitioners/date_partitioner.py", line 34, in build_partitions_from_s3
File "/mnt/yarn/usercache/root/appcache/application_1595322435097_0001/container_1595322435097_0001_01_000001/athena_glue_converter_latest.zip/athena_glue_service_logs/utils.py", line 80, in get_first_hivecompatible_date_in_prefix
File "/mnt/yarn/usercache/root/appcache/application_1595322435097_0001/container_1595322435097_0001_01_000001/athena_glue_converter_latest.zip/athena_`
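A note on reading this trace: the log shows "No data returned, skipping conversion." right before the partitioner scans S3, and the failure lands in `_get_first_key_in_prefix`. One plausible explanation (an assumption, since the body of that function is not shown here) is that the code indexes into an S3 listing for a prefix that has no objects. S3 list responses leave out the `Contents` entry entirely when nothing matches, so looking it up yields `None`, and subscripting `None` produces exactly this `TypeError`. The sketch below shows a guarded lookup on a response dict. The helper name `first_key_in_listing` and the sample keys are hypothetical and are not taken from the repo.

```python
from typing import Optional


def first_key_in_listing(response: dict) -> Optional[str]:
    """Return the first object key in an S3 list response, or None if empty.

    Hypothetical helper for illustration; not the repo's implementation.
    """
    # S3 omits "Contents" when no objects match the prefix, so
    # response.get("Contents")[0] would raise TypeError on an empty prefix.
    contents = response.get("Contents") or []
    if not contents:
        return None
    return contents[0]["Key"]


# Dicts shaped like list_objects_v2 responses; the key name is made up.
empty = {"KeyCount": 0, "IsTruncated": False}
populated = {
    "KeyCount": 1,
    "Contents": [{"Key": "logs/2020-07-21-09-00-00-EXAMPLE"}],
}

print(first_key_in_listing(empty))
print(first_key_in_listing(populated))
```

With a guard like this, the caller can report "no objects found under the prefix" instead of crashing, which would point more directly at a source or converted location that contains no data.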