Commit

Fixed domain parsing if domain has a '?' but not '/'
quintindunn committed Jun 28, 2024
1 parent 1cb99c8 commit 35b1319
Showing 1 changed file with 3 additions and 1 deletion.
src/crawler/urls.py (3 additions, 1 deletion)
@@ -15,10 +15,12 @@ def get_protocol_and_domain_from_url(url: str):
     if "//" not in url:
         raise InvalidURLException(f"{logger_url_str} is not a supported url.")

-    protocol, _url = url.split("//", 1)
+    protocol, _url = url.split("//", 1) # https:, example.com/test

     if "?" in _url and "/" in _url and _url.index("?") < _url.index("/"):
         domain = _url.split("?", 1)[0]
+    elif "?" in url and "/" not in url:
+        domain = _url.split("?", 1)[0]
     elif "/" in _url:
         domain = _url.split("/", 1)[0]
     else:
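The hunk splits the scheme off at the first `//` and then cuts the domain at whichever of `?` or `/` comes first. Note that the added `elif` tests `url` rather than `_url`. Because the function has already checked that `url` contains `//`, the condition `"/" not in url` is always false, so the new branch can never run for an input such as `https://example.com?q=1`. The following is a minimal sketch of the intended behavior, assuming the goal is "the domain ends at the first delimiter after `//`". It is not the commit's code: `InvalidURLException` is a local stand-in for the project's class, the truncated `else` branch is assumed to return the whole remainder, and treating `#` as a terminator is an addition based on RFC 3986, which ends the authority at the next `/`, `?`, or `#`.

```python
from urllib.parse import urlsplit


class InvalidURLException(Exception):
    """Local stand-in for the crawler's exception class (assumed)."""


def get_protocol_and_domain_from_url(url: str) -> tuple[str, str]:
    """Hypothetical sketch: return (protocol, domain) for a URL.

    The domain is everything after "//" up to the first '/', '?' or '#'.
    """
    if "//" not in url:
        raise InvalidURLException(f"{url} is not a supported url.")

    protocol, _url = url.split("//", 1)  # "https:", "example.com/test"

    # Check delimiters against _url (the part after "//"), not url, and
    # keep the earliest one found; with none present the whole remainder
    # is the domain.
    end = len(_url)
    for delimiter in ("/", "?", "#"):
        index = _url.find(delimiter)
        if index != -1 and index < end:
            end = index
    return protocol, _url[:end]


if __name__ == "__main__":
    samples = (
        "https://example.com/test",
        "https://example.com?q=1",      # '?' but no '/': the case the commit targets
        "https://example.com?next=/a",  # '?' before '/'
        "https://example.com",
    )
    for sample in samples:
        protocol, domain = get_protocol_and_domain_from_url(sample)
        # Cross-check against the standard library's netloc.
        assert domain == urlsplit(sample).netloc
        print(protocol, domain)
```

Where the crawler can depend on the standard library, `urllib.parse.urlsplit(url).netloc` gives the same result for these samples; like the sketch, it keeps any `user@` prefix and `:port` suffix in the returned value.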