Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Harvester is failing to parse harvest object id strings correctly #405

Closed
bonnland opened this issue May 22, 2020 · 3 comments · Fixed by #412
Closed

Harvester is failing to parse harvest object id strings correctly #405

bonnland opened this issue May 22, 2020 · 3 comments · Fixed by #412

Comments

@bonnland
Copy link
Contributor

bonnland commented May 22, 2020

**EDIT: The error described in this comment has been fixed, but has led to different errors in the harvesting code (version 1.3.0) described in my next comment. **

I recently upgraded ckanext-harvest to use redis 3, but I am now seeing this:

$ paster --plugin=ckanext-harvest harvester run --config=/etc/ckan/default/development.ini
/usr/lib/ckan/default/lib/python2.7/site-packages/pkg_resources/py2_warn.py:21: UserWarning: Setuptools will stop working on Python 2
************************************************************
You are running Setuptools on Python 2, which is no longer
supported and
>>> SETUPTOOLS WILL STOP WORKING <<<
in a subsequent release (no sooner than 2020-04-20).
Please ensure you are installing
Setuptools using pip 9.x or later or pin to `setuptools<45`
in your environment.
If you have done those things and are still encountering
this message, please follow up at
https://bit.ly/setuptools-py2-warning.
************************************************************
  sys.version_info < (3,) and warnings.warn(pre + "*" * 60 + msg + "*" * 60)
2020-05-22 14:19:31,514 DEBUG [ckanext.harvest.model] Harvest tables defined in memory
2020-05-22 14:19:31,765 DEBUG [ckanext.harvest.model] Harvest tables already exist
2020-05-22 14:19:31,907 DEBUG [ckanext.spatial.plugin] Setting up the spatial model
2020-05-22 14:19:31,972 DEBUG [ckanext.spatial.model.package_extent] Spatial tables defined in memory
2020-05-22 14:19:32,177 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist
2020-05-22 14:19:33,865 DEBUG [ckanext.harvest.model] Harvest tables already exist
2020-05-22 14:19:33,953 DEBUG [ckanext.spatial.plugin] Setting up the spatial model
2020-05-22 14:19:34,113 DEBUG [ckanext.spatial.model.package_extent] Spatial tables already exist

2020-05-22 14:19:35,064 INFO  [ckanext.harvest.logic.action.update] Harvest job run: {}
2020-05-22 14:19:35,435 DEBUG [ckanext.harvest.logic.action.update] Ongoing job:31235401-2b63-4480-9380-37c0538b5e75 source:82dd7f15-54e5-41ed-998d-cf2b5435ce93
2020-05-22 14:19:35,435 DEBUG [ckanext.harvest.logic.action.update] No jobs to send to the gather queue
Traceback (most recent call last):
  File "/usr/lib/ckan/default/bin/paster", line 9, in <module>
    load_entry_point('PasteScript==2.0.2', 'console_scripts', 'paster')()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
    result = self.command()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 203, in command
    self.run_harvester()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 348, in run_harvester
    utils.run_harvester()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/utils.py", line 346, in run_harvester
    tk.get_action("harvest_jobs_run")(context, {})
  File "/usr/lib/ckan/default/src/ckan/ckan/logic/__init__.py", line 464, in wrapped
    result = _action(context, data_dict, **kw)
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/logic/action/update.py", line 582, in harvest_jobs_run
    resubmit_objects()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/queue.py", line 171, in resubmit_objects
    if not redis.get(object_id):
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/redis/client.py", line 1606, in get
    return self.execute_command('GET', name)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/redis/client.py", line 900, in execute_command
    conn.send_command(*args)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/redis/connection.py", line 716, in send_command
    self.send_packed_command(self.pack_command(*args),
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/redis/connection.py", line 766, in pack_command
    for arg in imap(self.encoder.encode, args):
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/redis/connection.py", line 120, in encode
    "bytes, string, int or float first." % typename)
redis.exceptions.DataError: Invalid input of type: 'result'. Convert to a bytes, string, int or float first.

This might be a related discussion: redis/redis-py#1071

Or it is possible that the older version of setuptools is to blame; not sure.

I'm happy to provide additional information if needed. I'm running CKAN 2.8.3, with the ckanext-harvest plugin updated to the commit on master from April 21, 2020.

@bonnland
Copy link
Contributor Author

bonnland commented Jun 17, 2020

Our CKAN instance relies heavily on harvesting, so running into harvest jobs that have to be manually aborted is troubling.

After upgrading to version 1.3.0, I am seeing fetch_err.log with ERROR messages like the following:

2020-06-17 09:43:19,161 INFO  [ckanext.harvest.queue] Received harvest object id: 97661975-fdfd-4b08-ac7f-b4cf1352814e
2020-06-17 09:43:19,443 DEBUG [ckanext.spatial.harvesters.base.import] Import stage for harvest object: 97661975-fdfd-4b08-ac7f-b4cf1352814e
2020-06-17 09:43:19,443 DEBUG [ckanext.spatial.harvesters.base] Using config: {u'read_only': True, u'user': u'harvest'}
2020-06-17 09:43:19,469 DEBUG [ckanext.spatial.validation.validation] Starting validation against profile(s) iso19139,dset-minimum-fields-production,geographic-extent-validator
2020-06-17 09:43:19,510 DEBUG [ckanext.spatial.validation.validation] Validated against "ISO19139 XSD Schema"
2020-06-17 09:43:19,512 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Minimum Fields Validation - Production"
2020-06-17 09:43:19,512 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Geographic Extent Validator"
2020-06-17 09:43:19,512 INFO  [ckanext.spatial.validation.validation] Validation passed
2020-06-17 09:43:20,910 DEBUG [ckanext.spatial.plugin] Received: u'{"type": "Polygon", "coordinates": [[[-179.998, -90.0], [180.0, -90.0], [180.0, 90.0], [-179.998, 90.0], [-179.998, -90.0]]]}'
2020-06-17 09:43:20,932 DEBUG [ckanext.spatial.lib] Extent for package c43058f0-a56d-406b-a729-9fe7212fbbd2 unchanged
2020-06-17 09:43:21,412 INFO  [ckanext.spatial.harvesters.base.import] Updated package c43058f0-a56d-406b-a729-9fe7212fbbd2 with guid edu.ucar.rda::ds337.0
2020-06-17 09:43:21,487 INFO  [ckanext.harvest.queue] Received harvest object id: 4
2020-06-17 09:43:21,569 ERROR [ckanext.harvest.queue] Harvest object does not exist: 4
2020-06-17 09:43:21,622 INFO  [ckanext.harvest.queue] Received harvest object id: 6
2020-06-17 09:43:21,702 ERROR [ckanext.harvest.queue] Harvest object does not exist: 6
2020-06-17 09:43:21,754 INFO  [ckanext.harvest.queue] Received harvest object id: f
2020-06-17 09:43:21,832 ERROR [ckanext.harvest.queue] Harvest object does not exist: f
2020-06-17 09:43:21,882 INFO  [ckanext.harvest.queue] Received harvest object id: b
2020-06-17 09:43:21,960 ERROR [ckanext.harvest.queue] Harvest object does not exist: b
2020-06-17 09:43:22,023 INFO  [ckanext.harvest.queue] Received harvest object id: d
2020-06-17 09:43:22,100 ERROR [ckanext.harvest.queue] Harvest object does not exist: d
2020-06-17 09:43:22,153 INFO  [ckanext.harvest.queue] Received harvest object id: 5
2020-06-17 09:43:22,232 ERROR [ckanext.harvest.queue] Harvest object does not exist: 5
2020-06-17 09:43:22,284 INFO  [ckanext.harvest.queue] Received harvest object id: 9
2020-06-17 09:43:22,360 ERROR [ckanext.harvest.queue] Harvest object does not exist: 9
2020-06-17 09:43:22,408 INFO  [ckanext.harvest.queue] Received harvest object id: 3
2020-06-17 09:43:22,484 ERROR [ckanext.harvest.queue] Harvest object does not exist: 3
2020-06-17 09:43:22,534 INFO  [ckanext.harvest.queue] Received harvest object id: 4
2020-06-17 09:43:22,614 ERROR [ckanext.harvest.queue] Harvest object does not exist: 4
2020-06-17 09:43:22,667 INFO  [ckanext.harvest.queue] Received harvest object id: b
2020-06-17 09:43:22,748 ERROR [ckanext.harvest.queue] Harvest object does not exist: b
2020-06-17 09:43:22,799 INFO  [ckanext.harvest.queue] Received harvest object id: a
2020-06-17 09:43:22,880 ERROR [ckanext.harvest.queue] Harvest object does not exist: a
2020-06-17 09:43:22,931 INFO  [ckanext.harvest.queue] Received harvest object id: 7
2020-06-17 09:43:23,010 ERROR [ckanext.harvest.queue] Harvest object does not exist: 7
2020-06-17 09:43:23,063 INFO  [ckanext.harvest.queue] Received harvest object id: 4
2020-06-17 09:43:23,143 ERROR [ckanext.harvest.queue] Harvest object does not exist: 4
2020-06-17 09:43:23,194 INFO  [ckanext.harvest.queue] Received harvest object id: d
2020-06-17 09:43:23,272 ERROR [ckanext.harvest.queue] Harvest object does not exist: d
2020-06-17 09:43:23,322 INFO  [ckanext.harvest.queue] Received harvest object id: 9
2020-06-17 09:43:23,398 ERROR [ckanext.harvest.queue] Harvest object does not exist: 9

Then after I manually abort the job, because giving a harvester run command does not change the job status from Running to Finished, I see this:

2020-06-17 09:48:14,271 INFO  [ckanext.harvest.logic.action.update] Harvest object not changed from "COMPLETE": f5f4947c-9f72-49c3-bf4e-91219702e0e9
2020-06-17 09:48:14,389 INFO  [ckanext.harvest.logic.action.update] Harvest object changed state from "WAITING" to "ERROR": 4fcb3b67-d069-48bc-820f-50ad908a3851

So I would guess that somewhere in the code, a string is being passed when a list is expected. Or something. Am I really the only person who has run into this issue?

@bonnland bonnland changed the title "Harvester Run" command is crashing with Redis 3.5.0 Harvester is failing to parse harvest object id strings correctly Jun 17, 2020
@mozgovipc
Copy link

I have the exact same issue, tested on master and v1.2.0 with CKAN 2.8.4.

We have a custom harvester, on every run it creates 2 HarvestObject, the harvester is set to run manually, sometimes after running it from the web UI or with paster --plugin=ckanext-harvest harvester run_test the last lines in the log are:

INFO  [ckanext.harvest.logic.action.update] Harvest job run: {}
DEBUG [ckanext.harvest.logic.action.update] Ongoing job:fb365470-1153-4df7-96f1-77602f229e0d source:b28d8028-4e14-4665-8df7-ef85fe735322
DEBUG [ckanext.harvest.logic.action.update] No jobs to send to the gather queue

Notice the Ongoing job: instead of the expected Marking job as finished URL ID. After that, if I'm trying to re-run it from the command line with paster --plugin=ckanext-harvest harvester run_test I get a prompt Source "name" apparently has a "Running" job ... Abort it? (y/n). If I (or the cron job) run paster --plugin=ckanext-harvest harvester run the output is:

INFO  [ckanext.harvest.logic.action.update] Harvest job run: {}
INFO  [ckanext.harvest.logic.action.update] Marking job as finished URL fb365470-1153-4df7-96f1-77602f229e0d
DEBUG [ckanext.harvest.logic.action.update] Updating search index for harvest source: name
DEBUG [ckanext.harvest.logic.action.update] No jobs to send to the gather queue

So harvester run marked the manually run harvester job as finished.

@bonnland
Copy link
Contributor Author

bonnland commented Jun 19, 2020

I am going to keep adding comments, I hope that someone finds them useful. I can certainly try to answer any questions that developers may have.

The error pattern seems somewhat random and interleaved with successful harvesting of certain records. Here is an example from today:

2020-06-19 01:30:19,138 DEBUG [ckanext.spatial.lib] Extent for package 133beec1-36fb-4fc0-9192-1b8a410bd503 unchanged
2020-06-19 01:30:19,679 INFO  [ckanext.spatial.harvesters.base.import] Updated package 133beec1-36fb-4fc0-9192-1b8a410bd503 with guid edu.ucar.rda::ds609.2
2020-06-19 01:30:19,769 INFO  [ckanext.harvest.queue] Received harvest object id: c941c1d3-2856-4831-b0a1-d61dc0ea801a
2020-06-19 01:30:20,046 DEBUG [ckanext.spatial.harvesters.base.import] Import stage for harvest object: c941c1d3-2856-4831-b0a1-d61dc0ea801a
2020-06-19 01:30:20,047 DEBUG [ckanext.spatial.harvesters.base] Using config: {u'read_only': True, u'user': u'harvest'}
2020-06-19 01:30:20,072 DEBUG [ckanext.spatial.validation.validation] Starting validation against profile(s) iso19139,dset-minimum-fields-production,geographic-extent-validator
2020-06-19 01:30:20,118 DEBUG [ckanext.spatial.validation.validation] Validated against "ISO19139 XSD Schema"
2020-06-19 01:30:20,119 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Minimum Fields Validation - Production"
2020-06-19 01:30:20,119 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Geographic Extent Validator"
2020-06-19 01:30:20,119 INFO  [ckanext.spatial.validation.validation] Validation passed
2020-06-19 01:30:20,738 DEBUG [ckanext.spatial.plugin] Received: u'{"type": "Polygon", "coordinates": [[[-180.0, 15.0], [180.0, 15.0], [180.0, 90.0], [-180.0, 90.0], [-180.0, 15.0]]]}'
2020-06-19 01:30:20,761 DEBUG [ckanext.spatial.lib] Extent for package 2049d6ee-6419-4bcc-aa41-c2691eaebcab unchanged
2020-06-19 01:30:21,252 INFO  [ckanext.spatial.harvesters.base.import] Updated package 2049d6ee-6419-4bcc-aa41-c2691eaebcab with guid edu.ucar.rda::ds010.0
2020-06-19 01:30:21,324 INFO  [ckanext.harvest.queue] Received harvest object id: 2ab67627-59b1-455f-949c-1cc6e3b559cf
2020-06-19 01:30:21,606 DEBUG [ckanext.spatial.harvesters.base.import] Import stage for harvest object: 2ab67627-59b1-455f-949c-1cc6e3b559cf
2020-06-19 01:30:21,607 DEBUG [ckanext.spatial.harvesters.base] Using config: {u'read_only': True, u'user': u'harvest'}
2020-06-19 01:30:21,634 DEBUG [ckanext.spatial.validation.validation] Starting validation against profile(s) iso19139,dset-minimum-fields-production,geographic-extent-validator
2020-06-19 01:30:21,677 DEBUG [ckanext.spatial.validation.validation] Validated against "ISO19139 XSD Schema"
2020-06-19 01:30:21,678 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Minimum Fields Validation - Production"
2020-06-19 01:30:21,679 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Geographic Extent Validator"
2020-06-19 01:30:21,679 INFO  [ckanext.spatial.validation.validation] Validation passed
2020-06-19 01:30:24,247 DEBUG [ckanext.spatial.plugin] Received: u'{"type": "Polygon", "coordinates": [[[-180.0, -90.0], [180.0, -90.0], [180.0, 90.0], [-180.0, 90.0], [-180.0, -90.0]]]}'
2020-06-19 01:30:24,270 DEBUG [ckanext.spatial.lib] Extent for package af470463-d696-4a73-b10a-abae5d169e00 unchanged
2020-06-19 01:30:24,787 INFO  [ckanext.spatial.harvesters.base.import] Updated package af470463-d696-4a73-b10a-abae5d169e00 with guid edu.ucar.rda::ds094.0
2020-06-19 01:30:24,869 INFO  [ckanext.harvest.queue] Received harvest object id: d79c74b6-cc7e-4741-83f4-b18adf95cdfe
2020-06-19 01:30:25,139 DEBUG [ckanext.spatial.harvesters.base.import] Import stage for harvest object: d79c74b6-cc7e-4741-83f4-b18adf95cdfe
2020-06-19 01:30:25,139 DEBUG [ckanext.spatial.harvesters.base] Using config: {u'read_only': True, u'user': u'harvest'}
2020-06-19 01:30:25,161 DEBUG [ckanext.spatial.validation.validation] Starting validation against profile(s) iso19139,dset-minimum-fields-production,geographic-extent-validator
2020-06-19 01:30:25,193 DEBUG [ckanext.spatial.validation.validation] Validated against "ISO19139 XSD Schema"
2020-06-19 01:30:25,194 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Minimum Fields Validation - Production"
2020-06-19 01:30:25,194 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Geographic Extent Validator"
2020-06-19 01:30:25,195 INFO  [ckanext.spatial.validation.validation] Validation passed
2020-06-19 01:30:27,048 DEBUG [ckanext.spatial.plugin] Received: u'{"type": "Polygon", "coordinates": [[[-180.0, -90.0], [180.0, -90.0], [180.0, 90.0], [-180.0, 90.0], [-180.0, -90.0]]]}'
2020-06-19 01:30:27,071 DEBUG [ckanext.spatial.lib] Extent for package d8959b4d-9a24-486c-9851-05e03c610a04 unchanged
2020-06-19 01:30:27,587 INFO  [ckanext.spatial.harvesters.base.import] Updated package d8959b4d-9a24-486c-9851-05e03c610a04 with guid edu.ucar.rda::ds083.2
2020-06-19 01:30:27,670 INFO  [ckanext.harvest.queue] Received harvest object id: a4081a7f-d0fc-42d8-9caa-9ce9c75b51d2
2020-06-19 01:30:27,953 DEBUG [ckanext.spatial.harvesters.base.import] Import stage for harvest object: a4081a7f-d0fc-42d8-9caa-9ce9c75b51d2
2020-06-19 01:30:27,953 DEBUG [ckanext.spatial.harvesters.base] Using config: {u'read_only': True, u'user': u'harvest'}
2020-06-19 01:30:27,979 DEBUG [ckanext.spatial.validation.validation] Starting validation against profile(s) iso19139,dset-minimum-fields-production,geographic-extent-validator
2020-06-19 01:30:28,024 DEBUG [ckanext.spatial.validation.validation] Validated against "ISO19139 XSD Schema"
2020-06-19 01:30:28,025 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Minimum Fields Validation - Production"
2020-06-19 01:30:28,025 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Geographic Extent Validator"
2020-06-19 01:30:28,025 INFO  [ckanext.spatial.validation.validation] Validation passed
2020-06-19 01:30:29,427 DEBUG [ckanext.spatial.plugin] Received: u'{"type": "Polygon", "coordinates": [[[-179.998, -90.0], [180.0, -90.0], [180.0, 90.0], [-179.998, 90.0], [-179.998, -90.0]]]}'
2020-06-19 01:30:29,449 DEBUG [ckanext.spatial.lib] Extent for package c43058f0-a56d-406b-a729-9fe7212fbbd2 unchanged
2020-06-19 01:30:29,883 INFO  [ckanext.spatial.harvesters.base.import] Updated package c43058f0-a56d-406b-a729-9fe7212fbbd2 with guid edu.ucar.rda::ds337.0
2020-06-19 01:30:29,964 INFO  [ckanext.harvest.queue] Received harvest object id: c
2020-06-19 01:30:30,098 INFO  [ckanext.harvest.queue] Received harvest object id: d
2020-06-19 01:30:30,178 ERROR [ckanext.harvest.queue] Harvest object does not exist: d
2020-06-19 01:30:30,230 INFO  [ckanext.harvest.queue] Received harvest object id: 2
2020-06-19 01:30:30,310 ERROR [ckanext.harvest.queue] Harvest object does not exist: 2
2020-06-19 01:30:30,362 INFO  [ckanext.harvest.queue] Received harvest object id: a
2020-06-19 01:30:30,442 ERROR [ckanext.harvest.queue] Harvest object does not exist: a
2020-06-19 01:30:55,934 INFO  [ckanext.harvest.queue] Received harvest object id: c
2020-06-19 01:30:56,017 ERROR [ckanext.harvest.queue] Harvest object does not exist: c
2020-06-19 01:31:31,501 INFO  [ckanext.harvest.queue] Received harvest object id: c
2020-06-19 01:31:31,584 ERROR [ckanext.harvest.queue] Harvest object does not exist: c
2020-06-19 01:32:07,228 INFO  [ckanext.harvest.queue] Received harvest object id: c
2020-06-19 01:32:07,309 ERROR [ckanext.harvest.queue] Harvest object does not exist: c
2020-06-19 01:32:42,861 INFO  [ckanext.harvest.queue] Received harvest object id: c
2020-06-19 01:32:42,944 ERROR [ckanext.harvest.queue] Harvest object does not exist: c
2020-06-19 01:33:18,563 INFO  [ckanext.harvest.queue] Received harvest object id: cfef7790-6e2e-46ff-bfa1-265246025856
2020-06-19 01:33:18,852 DEBUG [ckanext.spatial.harvesters.base.import] Import stage for harvest object: cfef7790-6e2e-46ff-bfa1-265246025856
2020-06-19 01:33:18,852 DEBUG [ckanext.spatial.harvesters.base] Using config: {u'read_only': True, u'user': u'harvest'}
2020-06-19 01:33:18,887 DEBUG [ckanext.spatial.validation.validation] Starting validation against profile(s) iso19139,dset-minimum-fields-production,geographic-extent-validator
2020-06-19 01:33:18,929 DEBUG [ckanext.spatial.validation.validation] Validated against "ISO19139 XSD Schema"
2020-06-19 01:33:18,930 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Minimum Fields Validation - Production"
2020-06-19 01:33:18,931 DEBUG [ckanext.spatial.validation.validation] Validated against "DSET Geographic Extent Validator"
2020-06-19 01:33:18,931 INFO  [ckanext.spatial.validation.validation] Validation passed
2020-06-19 01:33:19,929 DEBUG [ckanext.spatial.plugin] Received: u'{"type": "Polygon", "coordinates": [[[-180.0, -59.875], [180.0, -59.875], [180.0, 59.875], [-180.0, 59.875], [-180.0, -59.875]]]}'
2020-06-19 01:33:19,952 DEBUG [ckanext.spatial.lib] Extent for package 8621a0f9-60c4-4a08-8ad0-4a46c07bdc88 unchanged
2020-06-19 01:33:20,399 INFO  [ckanext.spatial.harvesters.base.import] Updated package 8621a0f9-60c4-4a08-8ad0-4a46c07bdc88 with guid edu.ucar.rda::ds502.1
2020-06-19 01:33:20,481 INFO  [ckanext.harvest.queue] Received harvest object id: c
2020-06-19 01:33:20,563 ERROR [ckanext.harvest.queue] Harvest object does not exist: c
2020-06-19 05:30:22,028 INFO  [ckanext.harvest.queue] Received harvest object id: 7169c6b5-28f9-4e22-9026-94abca547897
Traceback (most recent call last):
  File "/usr/lib/ckan/default/bin/paster", line 9, in <module>
    load_entry_point('PasteScript==2.0.2', 'console_scripts', 'paster')()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 102, in run
    invoke(command, command_name, options, args[1:])
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 141, in invoke
    exit_code = runner.run(args)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/paste/script/command.py", line 236, in run
    result = self.command()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/commands/harvester.py", line 209, in command
    utils.fetch_consumer()
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/utils.py", line 337, in fetch_consumer
    fetch_callback(consumer, method, header, body)
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/queue.py", line 437, in fetch_callback
    obj = HarvestObject.get(id)
  File "/usr/lib/ckan/default/src/ckanext-harvest/ckanext/harvest/model/__init__.py", line 99, in get
    o = cls.filter(**kwds).first()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2755, in first
    ret = list(self[0:1])
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2547, in __getitem__
    return list(res)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2855, in __iter__
    return self._execute_and_instances(context)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2876, in _execute_and_instances
    close_with_result=True)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2885, in _get_bind_args
    **kw
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2867, in _connection_from_session
    conn = self.session.connection(**kw)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 998, in connection
    execution_options=execution_options)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 1003, in _connection_for_bind
    engine, execution_options)
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 382, in _connection_for_bind
    self._assert_active()
  File "/usr/lib/ckan/default/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 276, in _assert_active
    % self._rollback_exception
sqlalchemy.exc.InvalidRequestError: This Session's transaction has been rolled back due to a previous exception during flush. To begin a new transaction with this Session, first issue Session.rollback(). Original exception was: (psycopg2.OperationalError) server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
 [SQL: 'INSERT INTO harvest_log (id, content, level, created) VALUES (%(id)s, %(content)s, %(level)s, %(created)s)'] [parameters: {'content': u'Received harvest object id: 7169c6b5-28f9-4e22-9026-94abca547897', 'created': datetime.datetime(2020, 6, 19, 11, 30, 22, 31121), 'id': u'5305d289-1c09-41d9-9256-5eee09278d9b', 'level': 'INFO'}]

At this point the harvesting job is stuck, and no further harvesting jobs appear to start because giving the "harvester run" command does not put this job in a finished state.

When this happens, I have begun performing the ritual of going through the following manual steps:

  1. abort the job
  2. purge the harvest queues
  3. stop the harvester processes
  4. restart the redis service
  5. restart the harvester
  6. create a new harvest job

I don't know if all of these steps are necessary. If someone knows of a better way to handle the harvest errors, please let me know.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants