hash-http-content

This is a Python library to retrieve the contents of a given URL via HTTP (or HTTPS) and hash the processed contents.
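As a rough illustration of the retrieve-then-hash flow described above, the following sketch uses only the Python standard library. The function names `hash_bytes` and `fetch_and_hash`, the SHA-256 default, and the timeout value are assumptions made for illustration. They are not this package's API, and the sketch skips the content processing described in the next section.

```python
import hashlib
import urllib.request


def hash_bytes(content: bytes, algorithm: str = "sha256") -> str:
    """Return the hex digest of content using the named hashlib algorithm."""
    return hashlib.new(algorithm, content).hexdigest()


def fetch_and_hash(url: str, algorithm: str = "sha256", timeout: float = 10.0) -> str:
    """Fetch url over HTTP(S) and hash the raw response body.

    Hypothetical helper: the real package processes the content
    before hashing, which this sketch does not do.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return hash_bytes(response.read(), algorithm)
```

Keeping the hashing step separate from the network call makes it possible to hash processed content without fetching it again.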

Content processing

If an encoding is detected, this package converts the content to UTF-8 before any further processing.
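A minimal sketch of that conversion step, assuming the source encoding has already been detected (for example from a response header). The helper name `to_utf8` is hypothetical, and how this package actually detects the encoding is not shown here.

```python
from typing import Optional


def to_utf8(content: bytes, encoding: Optional[str]) -> bytes:
    """Re-encode content as UTF-8 when a source encoding is known.

    Hypothetical helper: if no encoding was detected, the bytes are
    returned unchanged rather than guessed at.
    """
    if encoding is None:
        return content
    return content.decode(encoding).encode("utf-8")
```

Normalizing to a single encoding first means the same text served as Latin-1 or as UTF-8 produces the same bytes for the later steps.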

Additional content processing is currently implemented for the following types of content:

HTML
JSON

HTML

HTML content is processed by using the pyppeteer package to execute any JavaScript on a retrieved page. The result is then parsed by Beautiful Soup to reduce the content to the human-visible portions of the page.
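The idea of reducing a page to its visible text can be sketched with the standard library alone. This example deliberately swaps in `html.parser` as a stand-in for Beautiful Soup and does not execute JavaScript, so it is not what this package does internally. The class name, the function name, and the set of tags treated as hidden are all assumptions.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes that sit outside non-rendered elements."""

    # Assumed set of elements whose text a reader never sees.
    HIDDEN_TAGS = {"script", "style", "head", "title", "noscript", "template"}

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside any hidden element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.HIDDEN_TAGS:
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in self.HIDDEN_TAGS and self._hidden_depth > 0:
            self._hidden_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._hidden_depth == 0:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the visible text of an HTML document, joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Hashing only the visible text means that changes to scripts or styles that do not alter what a reader sees would leave the hash unchanged in this sketch.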

JSON

JSON content is processed with the json module from the Python standard library. The content is parsed and then re-serialized in a deterministic manner, so that styling differences between otherwise identical documents do not affect the result.
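One way to get deterministic output from the json module is to sort keys and fix the separators before hashing. The sketch below assumes those particular settings and a SHA-256 default. The function name `hash_json` is hypothetical, and the exact options this package passes to `json.dumps` are not confirmed here.

```python
import hashlib
import json


def hash_json(content: bytes, algorithm: str = "sha256") -> str:
    """Hash JSON content after re-serializing it in a canonical form.

    Assumed canonical form: sorted keys and compact separators, so
    whitespace and key order in the input do not change the digest.
    """
    parsed = json.loads(content)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.new(algorithm, canonical.encode("utf-8")).hexdigest()
```

With this approach, `{"a": 1, "b": 2}` and a reindented copy with the keys reversed hash to the same value, while any change to a key or value produces a different digest.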

Contributing

We welcome contributions! Please see CONTRIBUTING.md for details.

License

This project is in the worldwide public domain.

This project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.
