Internal Analysis Robot #8

Open
Mischback opened this issue Nov 30, 2022 · 0 comments
Labels
area/ci Affects the CI (e.g. GitHub Actions) area/repository Affects the repository structure lang/python type/feature New feature / feature request
@Mischback
Owner

Idea

Place the (generated) website in a container (e.g. Docker) and have a robot / spider analyze it.

Check every document and track the following information:

  • internal links incoming (must be derived from analyzing other documents!)
  • internal links outgoing (tracked by document, including counts, disregarding #targets)
  • external links outgoing (tracked by URI, including counts, disregarding #targets)
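
A minimal sketch of how the per-document tracking listed above might look, using only the standard library. The names `LinkCollector`, `classify_links` and `derive_incoming` are hypothetical, and the sketch assumes that "internal" means "same host as the site under analysis" and that only `<a href>` links are of interest:

```python
from collections import Counter, defaultdict
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the raw href values of all <a> elements in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def classify_links(doc_url, html, site_netloc):
    """Return (internal, external) Counters of outgoing links of one document."""
    collector = LinkCollector()
    collector.feed(html)
    internal, external = Counter(), Counter()
    for href in collector.hrefs:
        # Resolve relative links against the document, then drop the #target.
        target, _fragment = urldefrag(urljoin(doc_url, href))
        parts = urlsplit(target)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, ...
        # Assumption: a link is internal if it points to the site's own host.
        if parts.netloc == site_netloc:
            internal[target] += 1
        else:
            external[target] += 1
    return internal, external


def derive_incoming(outgoing_by_doc):
    """Invert per-document outgoing counts into per-document incoming counts."""
    incoming = defaultdict(Counter)
    for source, targets in outgoing_by_doc.items():
        for target, count in targets.items():
            incoming[target][source] += count
    return incoming
```

Incoming links cannot be read from a document itself, so `derive_incoming` would only be run once all documents have been analyzed and their internal outgoing counts are known.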

Implementation in Python, most likely multithreaded, with a configurable number of worker threads alongside the main management thread.
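
One possible shape for the threading model, as a rough sketch rather than a settled design: the main thread starts a configurable number of workers, seeds a shared queue and waits until it drains. The function `crawl` and the `fetch_links` callable (URL in, iterable of internal URLs out) are assumptions made for illustration; real fetching and parsing are left out.

```python
import queue
import threading


def crawl(start_url, fetch_links, num_workers=4):
    """Visit every document reachable from start_url using worker threads.

    fetch_links is a hypothetical callable: url -> iterable of internal URLs.
    Returns (results, errors), both keyed by document URL.
    """
    tasks = queue.Queue()
    lock = threading.Lock()  # guards seen, results and errors
    seen = {start_url}
    results, errors = {}, {}

    def process(url):
        try:
            links = list(fetch_links(url))
        except Exception as exc:  # keep the worker alive on a bad document
            with lock:
                errors[url] = exc
            return
        new = []
        with lock:
            results[url] = links
            for link in links:
                if link not in seen:
                    seen.add(link)
                    new.append(link)
        # Queue newly discovered documents before this task is marked done,
        # so the queue never looks finished while work is still pending.
        for link in new:
            tasks.put(link)

    def worker():
        while True:
            url = tasks.get()
            try:
                if url is None:  # sentinel from the management thread
                    return
                process(url)
            finally:
                tasks.task_done()

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    tasks.put(start_url)
    tasks.join()  # management thread waits until all documents are processed
    for _ in threads:
        tasks.put(None)
    for thread in threads:
        thread.join()
    return results, errors
```

Since the work is mostly waiting on HTTP responses, threads should be a reasonable fit despite the GIL; `num_workers` is the configurable knob mentioned above.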

@Mischback Mischback added area/ci Affects the CI (e.g. GitHub Actions) area/repository Affects the repository structure lang/python type/feature New feature / feature request labels Nov 30, 2022
@Mischback Mischback added this to the Crawl milestone Nov 30, 2022
@Mischback Mischback modified the milestones: Crawl, Walk Mar 20, 2024