As of wpull 2.0.3, data: and mailto: URIs get added to the database, although neither serves any purpose. Not only are these schemes unsupported, there's also nothing to be retrieved for them anyway. tel: URIs (currently entirely unsupported and treated as relative paths instead) should likely also be treated the same.
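To illustrate the kind of check being asked for, here is a minimal sketch of a scheme filter that could run before a URL is written to the database. It is not wpull's code: the names `NON_FETCHABLE_SCHEMES` and `should_queue` are hypothetical, and where such a check would hook into wpull is left open.

```python
from urllib.parse import urlsplit

# Hypothetical skip list for illustration; not taken from wpull's source.
NON_FETCHABLE_SCHEMES = frozenset({"data", "mailto", "tel"})


def should_queue(url: str) -> bool:
    """Return False for URIs whose scheme leaves nothing to retrieve."""
    # urlsplit() lowercases the scheme; relative references have an empty one.
    scheme = urlsplit(url.strip()).scheme.lower()
    return scheme not in NON_FETCHABLE_SCHEMES
```

With this sketch, `should_queue("data:image/png;base64,AAAA")` is false, while `should_queue("https://example.com/")` and relative references such as `"page.html"` remain true.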
As an extreme real-world example of the impact: an ArchiveBot job's database grew to 106 GB over the past couple of days because of data: URIs embedded in every page. After purging these URIs with the following commands (likely not the most efficient approach):
sqlite3 wpull.db 'SELECT id FROM url_strings WHERE url LIKE "data:%"' | sed 's,^.*$,UPDATE url_strings SET url = "data:<removed-&>" WHERE id = &\;,' >cmds
sqlite3 wpull.db <cmds
sqlite3 wpull.db VACUUM
the database size dropped to 860 MB.
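For comparison, the same purge can probably be done in a single statement instead of generating one UPDATE per row. The sketch below uses Python's standard `sqlite3` module; the table and column names (`url_strings`, `id`, `url`) are taken from the commands above, while the function name `purge_data_uris` is hypothetical and the approach has not been tested against a real wpull database.

```python
import sqlite3


def purge_data_uris(db_path: str) -> int:
    """Replace every data: URI with a short per-row placeholder, then VACUUM.

    Returns the number of rows rewritten.
    """
    conn = sqlite3.connect(db_path)
    try:
        # The with-block commits the UPDATE (or rolls it back on error).
        with conn:
            cur = conn.execute(
                "UPDATE url_strings "
                "SET url = 'data:<removed-' || id || '>' "
                "WHERE url LIKE 'data:%'"
            )
            rewritten = cur.rowcount
        # VACUUM must run outside a transaction, hence after the commit.
        conn.execute("VACUUM")
        return rewritten
    finally:
        conn.close()
```

As in the shell version, each placeholder embeds the row id so the rewritten values stay distinct from one another.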