Filter work summary to remove non-xml safe characters (PP-1969) #2198

Open: wants to merge 2 commits into base: main

jonathangreen (Member) commented Nov 27, 2024

Description

Add filtering to our set_summary function to strip XML-unsafe characters, and add a DB migration to remove these characters from our existing database.
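As a rough illustration of the kind of filtering described here (not the code in this PR), the sketch below strips every character outside the XML 1.0 Char production. The helper name remove_xml_unsafe and the pattern name _XML_UNSAFE are hypothetical, and the real set_summary change may be implemented differently.

```python
import re
from typing import Optional

# Matches any character NOT permitted by the XML 1.0 "Char" production:
# tab, LF, CR, U+0020-U+D7FF, U+E000-U+FFFD and U+10000-U+10FFFF are allowed.
# Hypothetical name; not taken from the PR.
_XML_UNSAFE = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def remove_xml_unsafe(text: Optional[str]) -> Optional[str]:
    """Return text with XML-unsafe characters removed; None passes through."""
    if text is None:
        return None
    return _XML_UNSAFE.sub("", text)


# NULL bytes and control characters are dropped; ordinary whitespace is kept.
cleaned = remove_xml_unsafe("Chapter\x00 one\x0b\n")
print(repr(cleaned))
```

Dropping the characters rather than replacing them keeps the summary readable, at the cost of silently changing the stored text.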

Motivation and Context

Seeing the following in our logs: Exception in web app: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Traceback (most recent call last):
  File "src/lxml/builder.py", line 161, in lxml.builder.ElementMaker.__init__.add_text
  File "src/lxml/etree.pyx", line 1202, in lxml.etree._Element.__getitem__
IndexError: list index out of range
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/var/www/circulation/env/lib/python3.10/site-packages/flask/app.py", line 880, in full_dispatch_request
    rv = self.dispatch_request()
  File "/var/www/circulation/env/lib/python3.10/site-packages/flask/app.py", line 865, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/var/www/circulation/src/palace/manager/api/routes.py", line 120, in decorated
    return f(*args, **kwargs)
  File "/var/www/circulation/src/palace/manager/api/routes.py", line 93, in wrapped_function
    resp = make_response(f(*args, **kwargs))
  File "/var/www/circulation/src/palace/manager/core/app_server.py", line 87, in decorated
    v = f(*args, **kwargs)
  File "/var/www/circulation/src/palace/manager/core/app_server.py", line 163, in compressor
    return f(*args, **kwargs)
  File "/var/www/circulation/src/palace/manager/api/routes.py", line 239, in acquisition_groups
    return app.manager.opds_feeds.groups(lane_identifier)
  File "/var/www/circulation/src/palace/manager/api/controller/opds_feed.py", line 100, in groups
    return feed_class.groups(
  File "/var/www/circulation/src/palace/manager/feed/opds.py", line 64, in as_response
    serializer.serialize_feed(
  File "/var/www/circulation/src/palace/manager/feed/serializer/opds.py", line 102, in serialize_feed
    element = self.serialize_work_entry(entry.computed)
  File "/var/www/circulation/src/palace/manager/feed/serializer/opds.py", line 180, in serialize_work_entry
    entry.append(OPDSFeed.E("summary", feed_entry.summary.text))
  File "src/lxml/builder.py", line 221, in lxml.builder.ElementMaker.__call__
  File "src/lxml/builder.py", line 163, in lxml.builder.ElementMaker.__init__.add_text
  File "src/lxml/etree.pyx", line 1065, in lxml.etree._Element.text.__set__
  File "src/lxml/apihelpers.pxi", line 749, in lxml.etree._setNodeText
  File "src/lxml/apihelpers.pxi", line 737, in lxml.etree._createTextNode
  File "src/lxml/apihelpers.pxi", line 1530, in lxml.etree._utf8
ValueError: All strings must be XML compatible: Unicode or ASCII, no NULL bytes or control characters

Looking at the data in our database, these characters are only coming in via the Enki integration, but it seemed like a good idea to make sure they can't make it in through any integration.
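For rows that already contain such characters, a cleanup pass could look roughly like the sketch below. It is only an illustration: it uses an in-memory SQLite database as a stand-in, and the table name works, the column name summary_text, and the function clean_summaries are assumptions made for the example, not the migration included in this PR.

```python
import re
import sqlite3

# Same XML 1.0 "Char" rule as the filter: anything outside it is removed.
_XML_UNSAFE = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def clean_summaries(conn: sqlite3.Connection) -> int:
    """Rewrite summaries containing XML-unsafe characters; return rows changed."""
    rows = conn.execute(
        "SELECT id, summary_text FROM works WHERE summary_text IS NOT NULL"
    ).fetchall()
    changed = 0
    for work_id, summary in rows:
        cleaned = _XML_UNSAFE.sub("", summary)
        if cleaned != summary:
            conn.execute(
                "UPDATE works SET summary_text = ? WHERE id = ?",
                (cleaned, work_id),
            )
            changed += 1
    conn.commit()
    return changed


# Stand-in data: one clean row, one with a form feed, one with no summary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works (id INTEGER PRIMARY KEY, summary_text TEXT)")
conn.executemany(
    "INSERT INTO works (summary_text) VALUES (?)",
    [("clean",), ("bad\x0csummary",), (None,)],
)
changed = clean_summaries(conn)
summaries = [r[0] for r in conn.execute("SELECT summary_text FROM works ORDER BY id")]
```

Only rows whose text actually changes are updated, which keeps the write volume proportional to the number of affected summaries.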

How Has This Been Tested?

  • Tested locally
  • Running unit tests

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

jonathangreen added the labels bug (Something isn't working) and DB migration (This PR contains a DB migration) Nov 27, 2024
jonathangreen requested a review from a team November 27, 2024 20:51

codecov bot commented Nov 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.09%. Comparing base (102e9f6) to head (5f2130c).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2198   +/-   ##
=======================================
  Coverage   91.09%   91.09%           
=======================================
  Files         363      363           
  Lines       41248    41254    +6     
  Branches     8839     8842    +3     
=======================================
+ Hits        37575    37581    +6     
  Misses       2406     2406           
  Partials     1267     1267           
