Updating signatures fails when the URL of the reference file identifier can't be found #201

Closed
replaceafill opened this issue Apr 16, 2021 · 2 comments · Fixed by #197
Labels: bug (A product defect that needs fixing), P2 (Medium priority issues to be scheduled in a future release)

Comments

@replaceafill
Contributor

What version of FIDO are you using?

opf-fido 1.4.1 from PyPI and also confirmed in commit 6211d66 of the rc/1.6 branch.

How was FIDO installed?

On Ubuntu 18.04, with pip install opf-fido in a Python 2.7 virtual environment.

What did you do to cause this bug to happen?

Ran the fido-update-signatures command.

What did you expect to happen?

A file formats-v97.xml generated in the conf directory with the latest PRONOM file format definitions.

What did you see instead?

This error in the Preparing to convert PRONOM formats to FIDO signatures... step:

Traceback (most recent call last):
  File "/tmp/fido-venv/bin/fido-update-signatures", line 8, in <module>
    sys.exit(main())
  File "/tmp/fido-venv/local/lib/python2.7/site-packages/fido/update_signatures.py", line 194, in main
    run(opts)
  File "/tmp/fido-venv/local/lib/python2.7/site-packages/fido/update_signatures.py", line 113, in run
    prepare_pronom_to_fido()
  File "/tmp/fido-venv/local/lib/python2.7/site-packages/fido/prepare.py", line 697, in run
    info.load_pronom_xml(puid)
  File "/tmp/fido-venv/local/lib/python2.7/site-packages/fido/prepare.py", line 129, in load_pronom_xml
    format_ = self.parse_pronom_xml(stream, puid_filter)
  File "/tmp/fido-venv/local/lib/python2.7/site-packages/fido/prepare.py", line 278, in parse_pronom_xml
    sock = urlopen(url)
  File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 467, in error
    result = self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 654, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/usr/lib/python2.7/urllib2.py", line 435, in open
    response = meth(req, response)
  File "/usr/lib/python2.7/urllib2.py", line 548, in http_response
    'http', request, response, code, msg, hdrs)
  File "/usr/lib/python2.7/urllib2.py", line 473, in error
    return self._call_chain(*args)
  File "/usr/lib/python2.7/urllib2.py", line 407, in _call_chain
    result = func(*args)
  File "/usr/lib/python2.7/urllib2.py", line 556, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

Can you reproduce this reliably?

Yes.

Additional notes

As far as I understand, if a signature file contains a ReferenceFile/ReferenceFileIdentifier/IdentifierType element with the value URL, fido downloads the referenced file to compute a checksum for it.

Currently there are three such references, spread across two files:

  • puid.fmt.11.xml
      ...
      <ReferenceFile>
        <ReferenceFileID>1</ReferenceFileID>
        <ReferenceFileName>nurbcup2si.png</ReferenceFileName>
        <ReferenceFileDescription>W3C PNG 1.0 reference file: Indexed color (palette) image. It is interlaced, so suitable software can give a progressive display.</ReferenceFileDescription>
        <ReferenceFileDocumentation>
        </ReferenceFileDocumentation>
        <ReferenceFileIPR>
        </ReferenceFileIPR>
        <ReferenceFileNote>
        </ReferenceFileNote>
        <ReferenceFileIdentifier>
          <Identifier>www.w3.org/Graphics/PNG/nurbcup2si.png</Identifier>
          <IdentifierType>URL</IdentifierType>
        </ReferenceFileIdentifier>
      </ReferenceFile>
      <ReferenceFile>
        <ReferenceFileID>2</ReferenceFileID>
        <ReferenceFileName>666.png</ReferenceFileName>
        <ReferenceFileDescription>W3C PNG 1.0 reference file: Large truecolor image generated by a raytracer - a visualisation of a 6 by 6 by 6 color cube in CIE LUV color space.</ReferenceFileDescription>
        <ReferenceFileDocumentation>
        </ReferenceFileDocumentation>
        <ReferenceFileIPR>
        </ReferenceFileIPR>
        <ReferenceFileNote>
        </ReferenceFileNote>
        <ReferenceFileIdentifier>
          <Identifier>www.w3.org/Graphics/PNG/666.png</Identifier>
          <IdentifierType>URL</IdentifierType>
        </ReferenceFileIdentifier>
      </ReferenceFile>
      ...
  • puid.fmt.569.xml
      ...
      <ReferenceFile>
        <ReferenceFileID>3</ReferenceFileID>
        <ReferenceFileName>Matroska Test Suite - Wave 1</ReferenceFileName>
        <ReferenceFileDescription>A set of 8 files meant to cover the basic features a player should support to be considered a good Matroska player.</ReferenceFileDescription>
        <ReferenceFileDocumentation>
        </ReferenceFileDocumentation>
        <ReferenceFileIPR>
        </ReferenceFileIPR>
        <ReferenceFileNote>
        </ReferenceFileNote>
        <ReferenceFileIdentifier>
          <Identifier>http://www.matroska.org/downloads/test_w1.html</Identifier>
          <IdentifierType>URL</IdentifierType>
        </ReferenceFileIdentifier>
      </ReferenceFile>
      ...
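
To make the pattern above concrete, here is a minimal sketch that lists the URL-type reference file identifiers in a PRONOM-style record. It is not fido's own code: the `reference_file_urls` helper, the `Report` wrapper element, and the trimmed `SAMPLE` record are all made up for illustration, and the sample omits any XML namespace that real PRONOM exports may carry.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free sample based on the records quoted in this issue.
# The <Report> wrapper is a hypothetical root element added for illustration.
SAMPLE = """<Report>
  <ReferenceFile>
    <ReferenceFileID>1</ReferenceFileID>
    <ReferenceFileName>nurbcup2si.png</ReferenceFileName>
    <ReferenceFileIdentifier>
      <Identifier>www.w3.org/Graphics/PNG/nurbcup2si.png</Identifier>
      <IdentifierType>URL</IdentifierType>
    </ReferenceFileIdentifier>
  </ReferenceFile>
  <ReferenceFile>
    <ReferenceFileID>3</ReferenceFileID>
    <ReferenceFileName>Matroska Test Suite - Wave 1</ReferenceFileName>
    <ReferenceFileIdentifier>
      <Identifier>http://www.matroska.org/downloads/test_w1.html</Identifier>
      <IdentifierType>URL</IdentifierType>
    </ReferenceFileIdentifier>
  </ReferenceFile>
</Report>"""


def reference_file_urls(xml_text):
    """Return the Identifier text of every ReferenceFileIdentifier of type URL."""
    root = ET.fromstring(xml_text)
    urls = []
    for ident in root.iter("ReferenceFileIdentifier"):
        id_type = (ident.findtext("IdentifierType") or "").strip()
        value = (ident.findtext("Identifier") or "").strip()
        if id_type == "URL" and value:
            urls.append(value)
    return urls


if __name__ == "__main__":
    for url in reference_file_urls(SAMPLE):
        print(url)
```

Note that the two PNG identifiers have no scheme (`www.w3.org/...`), so any code that fetches them has to decide how to normalise such values first.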

The problem seems to be that the page at http://www.matroska.org/downloads/test_w1.html has moved to https://www.matroska.org/downloads/test_suite.html, so a request to the old URL is redirected and then ends in the 404 shown in the traceback above.

Arguably this needs to be fixed in the PRONOM database, but maybe fido should also handle the exception in parse_pronom_xml to guard against future cases.
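
As a rough illustration of that suggestion, the sketch below treats a 404 on a reference file as "no checksum available" instead of aborting the whole update. It is an assumption-laden example, not the actual change made in fido: `checksum_of_url` and its `opener` parameter are hypothetical, SHA-256 is chosen only for illustration, and it uses Python 3's `urllib.request` and `urllib.error`, whereas the traceback above comes from `urllib2` on Python 2.7.

```python
import hashlib
from urllib.error import HTTPError
from urllib.request import urlopen


def checksum_of_url(url, opener=urlopen):
    """Return a hex digest of the resource at url, or None if it is missing.

    Hypothetical helper: `opener` defaults to urlopen and can be swapped
    for a stub in tests. The hash algorithm is an illustrative choice.
    """
    try:
        sock = opener(url)
    except HTTPError as err:
        if err.code == 404:
            # The reference file is gone; skip it rather than fail the run.
            return None
        raise  # Other HTTP errors are still reported to the caller.
    try:
        return hashlib.sha256(sock.read()).hexdigest()
    finally:
        sock.close()
```

Only 404 is swallowed here; other failures (for example a 500 or a network error) still propagate, which keeps genuine outages visible. Whether that is the right trade-off for fido is a design decision for the maintainers.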

@carlwilson
Member

Hi @replaceafill, I actually hit this issue while refactoring and updating the signature generation/update code. I've now added a 404 catch for missing test resources, as these aren't essential to the functioning of FIDO. The upcoming release should fix this issue, but it will also change how signatures are updated and provide extra options, including downloading pre-compiled signatures from a central site.

@carlwilson self-assigned this on Apr 20, 2021
@carlwilson added the labels bug (A product defect that needs fixing) and P2 (Medium priority issues to be scheduled in a future release) on Apr 20, 2021
@carlwilson added this to the v1.6 milestone on Apr 20, 2021
@carlwilson linked a pull request on Apr 20, 2021 that will close this issue
@carlwilson
Member

Closed in v1.6
