Increase crawling content by optimizing regular expressions #238
Labels
Status: Completed
Type: Enhancement
In katana-main/katana-main/pkg/utils/regex.go, links in web pages are extracted with regular expressions.
This is implemented by pageBodyRegex and relativeEndpointsRegex.
However, these expressions miss some endpoints,
for example:
http://www.google.com:8080
https://www.google.com/images/%E4%BA%A7%E5%93%81%E4%B8%AD%E5%BF%83/
/1.php
The following regular expressions should help.
Reference link: https://github.com/yuzhe-Mortal/tool/blob/main/Reptile.py