Merge pull request #30 from hueristiq/dev
Development version 0.2.0
enenumxela authored Jun 25, 2023
2 parents 58d906e + 565b39a commit 93a23b0
Showing 16 changed files with 491 additions and 255 deletions.
102 changes: 66 additions & 36 deletions README.md
@@ -2,7 +2,7 @@

![made with go](https://img.shields.io/badge/made%20with-Go-0000FF.svg) [![release](https://img.shields.io/github/release/hueristiq/xurlfind3r?style=flat&color=0000FF)](https://github.com/hueristiq/xurlfind3r/releases) [![license](https://img.shields.io/badge/license-MIT-gray.svg?color=0000FF)](https://github.com/hueristiq/xurlfind3r/blob/master/LICENSE) ![maintenance](https://img.shields.io/badge/maintained%3F-yes-0000FF.svg) [![open issues](https://img.shields.io/github/issues-raw/hueristiq/xurlfind3r.svg?style=flat&color=0000FF)](https://github.com/hueristiq/xurlfind3r/issues?q=is:issue+is:open) [![closed issues](https://img.shields.io/github/issues-closed-raw/hueristiq/xurlfind3r.svg?style=flat&color=0000FF)](https://github.com/hueristiq/xurlfind3r/issues?q=is:issue+is:closed) [![contribution](https://img.shields.io/badge/contributions-welcome-0000FF.svg)](https://github.com/hueristiq/xurlfind3r/blob/master/CONTRIBUTING.md)

`xurlfind3r` is a command-line interface (CLI) utility to find domain's known URLs from **[AlienVault's Open Threat Exchange](https://otx.alienvault.com/)**, **[Common Crawl](https://commoncrawl.org/)**, **[Github](https://github.com)**, **[Intelligence X](https://intelx.io)**, **[URLScan](https://urlscan.io/)**, and the **[Wayback Machine](https://archive.org/web/)**.
`xurlfind3r` is a command-line interface (CLI) utility to find domain's known URLs from curated passive online sources.

## Resource

@@ -14,16 +14,26 @@
* [`go build ...` the development Version](#go-build--the-development-version)
* [Post Installation](#post-installation)
* [Usage](#usage)
* [Contribution](#contribution)
* [Basic](#basic)
* [Filter Regex](#filter-regex)
* [Match Regex](#match-regex)
* [Contributing](#contributing)
* [Licensing](#licensing)
* [Credits](#credits)

## Features

* Fetches URLs from **[AlienVault's OTX](https://otx.alienvault.com/)**, **[Common Crawl](https://commoncrawl.org/)**, **[URLScan](https://urlscan.io/)**, **[Github](https://github.com)**, **[Intelligence X](https://intelx.io)** and the **[Wayback Machine](https://archive.org/web/)**.
* Parses URLs from `robots.txt` snapshots on the Wayback Machine.
* Parses URLs from webpages snapshots on the Wayback Machine.
* Fetches URLs from curated passive sources to maximize results:
* **[AlienVault's OTX](https://otx.alienvault.com/)**
* **[BeVigil](https://bevigil.com)**
* **[Common Crawl](https://commoncrawl.org/)**
* **[URLScan](https://urlscan.io/)**
* **[Github](https://github.com)**
* **[Intelligence X](https://intelx.io)**
* **[Wayback Machine](https://archive.org/web/)**
* With Wayback Machine, Parses URLs from `robots.txt` snapshots.
* With Wayback Machine, Parses URLs from webpages' snapshots.
* Cross-Platform (Windows, Linux & macOS)
* Supports URLs match and filter

## Installation

@@ -103,25 +113,30 @@ go install -v github.com/hueristiq/xurlfind3r/cmd/xurlfind3r@latest

## Post Installation

`xurlfind3r` will work right after [installation](#installation). However, **[Github](https://github.com)** and **[Intelligence X](https://intelx.io)** require API keys to work. The API keys are stored in the `$HOME/.hueristiq/xurlfind3r/config.yaml` file - created upon first run - and uses the YAML format.
`xurlfind3r` will work right after [installation](#installation). However, **[BeVigil](https://bevigil.com)**, **[Github](https://github.com)** and **[Intelligence X](https://intelx.io)** require API keys to work, **[URLScan](https://urlscan.io)** supports API key but not required. The API keys are stored in the `$HOME/.hueristiq/xurlfind3r/config.yaml` file - created upon first run - and uses the YAML format. Multiple API keys can be specified for each of these source from which one of them will be used.

Example:
Example `config.yaml`:

```yaml
version: 0.1.0
version: 0.2.0
sources:
- bevigil
- commoncrawl
- github
- intelx
- otx
- urlscan
- wayback
keys:
bevigil:
- awA5nvpKU3N8ygkZ
github:
- d23a554bbc1aabb208c9acfbd2dd41ce7fc9db39
- asdsd54bbc1aabb208c9acfbd2dd41ce7fc9db39
intelx:
- 2.intelx.io:00000000-0000-0000-0000-000000000000
urlscan:
- d4c85d34-e425-446e-d4ab-f5a3412acbe8
```
## Usage
@@ -139,48 +154,63 @@ help message:
__ ___ _ _ __| |/ _(_)_ __ __| |___ / _ __
\ \/ / | | | '__| | |_| | '_ \ / _` | |_ \| '__|
> <| |_| | | | | _| | | | | (_| |___) | |
/_/\_\\__,_|_| |_|_| |_|_| |_|\__,_|____/|_| v0.1.0

A CLI utility to find domain's known URLs.
/_/\_\\__,_|_| |_|_| |_|_| |_|\__,_|____/|_| v0.2.0

USAGE:
xurlfind3r [OPTIONS]

TARGET:
-d, --domain string target domain
--include-subdomains bool include domain's subdomains
-d, --domain string (sub)domain to match URLs

SCOPE:
--include-subdomains bool match subdomain's URLs

SOURCES:
--list-sources bool list available sources
-s --sources strings comma(,) separated sources to use (default: commoncrawl,github,intelx,otx,urlscan,wayback)
-s, --sources bool list sources
-u, --use-sources string sources to use (default: bevigil,commoncrawl,github,intelx,otx,urlscan,wayback)
--skip-wayback-robots bool with wayback, skip parsing robots.txt snapshots
--skip-wayback-source bool with wayback, skip parsing source code snapshots

CONFIGURATION:
--skip-wayback-robots bool skip parsing wayback robots.txt snapshots
--skip-wayback-source bool skip parsing wayback source code snapshots
FILTER & MATCH:
-f, --filter string regex to filter URLs
-m, --match string regex to match URLs

OUTPUT:
-m, --monochrome no colored output mode
-o, --output string output file to write found URLs
-v, --verbosity debug, info, warning, error, fatal or silent (default: info)
--no-color bool no color mode
-o, --output string output URLs file path
-v, --verbosity string debug, info, warning, error, fatal or silent (default: info)

CONFIGURATION:
-c, --configuration string configuration file path (default: ~/.hueristiq/xurlfind3r/config.yaml)

```

## Contribution
### Examples

[Issues](https://github.com/hueristiq/xurlfind3r/issues) and [Pull Requests](https://github.com/hueristiq/xurlfind3r/pulls) are welcome! **Check out the [contribution guidelines](./CONTRIBUTING.md).**
#### Basic

## Licensing
```bash
xurlfind3r -d hackerone.com --include-subdomains
```

This utility is distributed under the [MIT license](./LICENSE).
#### Filter Regex

## Credits
```bash
# filter images
xurlfind3r -d hackerone.com --include-subdomains -f '`^https?://[^/]*?/.*\.(jpg|jpeg|png|gif|bmp)(\?[^\s]*)?$`'
```

#### Match Regex

```bash
# match js URLs
xurlfind3r -d hackerone.com --include-subdomains -m '^https?://[^/]*?/.*\.js(\?[^\s]*)?$'
```

## Contributing

* Sources - Thanks to below platforms (Used as data sources in this project):
* Alien Vault OTX (otx.alienvault.com)
* Common Crawl (index.commoncrawl.org) - [Donate to CommonCrawl](https://commoncrawl.org/donate/)
* Github (github.com)
* Intelligence X (intelx.io)
* URLScan (urlscan.io)
* Wayback Machine (web.archive.org) - [Donate to InternetArchive](https://archive.org/donate)
* Alternatives - Check out projects below, that may fit in your workflow:
[Issues](https://github.com/hueristiq/xurlfind3r/issues) and [Pull Requests](https://github.com/hueristiq/xurlfind3r/pulls) are welcome! **Check out the [contribution guidelines](./CONTRIBUTING.md).**

## Licensing

[gau](https://github.com/tomnomnom/waybackurls) ◇ [waybackurls](https://github.com/tomnomnom/waybackurls) ◇ [waymore](https://github.com/xnl-h4ck3r/waymore)
This utility is distributed under the [MIT license](./LICENSE).
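The README's filter and match examples pass regular expressions through `-f` and `-m`. As a rough illustration of how those two patterns behave on sample URLs, the Go sketch below compiles them with the standard `regexp` package. It assumes `--filter` drops URLs that match and `--match` keeps only URLs that match, which is how the flag descriptions read; `selectURLs` is a hypothetical helper, not xurlfind3r's own code. The backticks that sit inside the single quotes of the filter example are left out here, since inside single quotes a shell passes them through as literal characters in the pattern.

```go
package main

import (
	"fmt"
	"regexp"
)

// Patterns from the README's "Filter Regex" and "Match Regex" examples
// (without the backticks found inside the quotes of the filter example).
var (
	filterImages = regexp.MustCompile(`^https?://[^/]*?/.*\.(jpg|jpeg|png|gif|bmp)(\?[^\s]*)?$`)
	matchJS      = regexp.MustCompile(`^https?://[^/]*?/.*\.js(\?[^\s]*)?$`)
)

// selectURLs drops URLs matched by filter and, if match is non-nil, keeps
// only URLs matched by match. Either regex may be nil to skip that step.
func selectURLs(urls []string, filter, match *regexp.Regexp) []string {
	var kept []string
	for _, u := range urls {
		if filter != nil && filter.MatchString(u) {
			continue // filtered out
		}
		if match != nil && !match.MatchString(u) {
			continue // not matched
		}
		kept = append(kept, u)
	}
	return kept
}

func main() {
	urls := []string{
		"https://hackerone.com/assets/logo.png",
		"https://docs.hackerone.com/app.js?v=2",
		"https://hackerone.com/reports",
	}

	fmt.Println(selectURLs(urls, filterImages, nil)) // [https://docs.hackerone.com/app.js?v=2 https://hackerone.com/reports]
	fmt.Println(selectURLs(urls, nil, matchJS))      // [https://docs.hackerone.com/app.js?v=2]
}
```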
174 changes: 94 additions & 80 deletions cmd/xurlfind3r/main.go
@@ -9,43 +9,61 @@ import (
"strconv"
"strings"

hqlog "github.com/hueristiq/hqgoutils/log"
"github.com/hueristiq/hqgoutils/log/formatter"
"github.com/hueristiq/hqgoutils/log/levels"
"github.com/hueristiq/hqgolog"
"github.com/hueristiq/hqgolog/formatter"
"github.com/hueristiq/hqgolog/levels"
"github.com/hueristiq/xurlfind3r/internal/configuration"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources"
"github.com/imdario/mergo"
"github.com/logrusorgru/aurora/v3"
"github.com/spf13/pflag"
)

var (
au aurora.Aurora

domain string
domain string

includeSubdomains bool

listSources bool
sourcesToUse []string
skipWaybackRobots bool
skipWaybackSource bool
monochrome bool
output string
verbosity string

filterPattern string
matchPattern string

monochrome bool
output string
verbosity string

YAMLConfigFile string
)

func init() {
// Handle command line arguments & flags
// defaults
defaultYAMLConfigFile := "~/.hueristiq/xurlfind3r/config.yaml"

// Handle CLI arguments, flags & help message (pflag)
pflag.StringVarP(&domain, "domain", "d", "", "")

pflag.BoolVar(&includeSubdomains, "include-subdomains", false, "")
pflag.BoolVar(&listSources, "list-sources", false, "")
pflag.StringSliceVarP(&sourcesToUse, "sources", "s", sources.List, "")

pflag.BoolVarP(&listSources, "sources", "s", false, "")
pflag.StringSliceVarP(&sourcesToUse, "use-sources", "u", sources.List, "")
pflag.BoolVar(&skipWaybackRobots, "skip-wayback-robots", false, "")
pflag.BoolVar(&skipWaybackSource, "skip-wayback-source", false, "")
pflag.BoolVarP(&monochrome, "monochrome", "m", false, "")

pflag.StringVarP(&filterPattern, "filter", "f", "", "")
pflag.StringVarP(&matchPattern, "match", "m", "", "")

pflag.BoolVar(&monochrome, "no-color", false, "")
pflag.StringVarP(&output, "output", "o", "", "")
pflag.StringVarP(&verbosity, "verbosity", "v", string(levels.LevelInfo), "")

pflag.StringVarP(&YAMLConfigFile, "configuration", "c", defaultYAMLConfigFile, "")

pflag.CommandLine.SortFlags = false
pflag.Usage = func() {
fmt.Fprintln(os.Stderr, configuration.BANNER)
@@ -54,169 +72,165 @@ func init() {
h += " xurlfind3r [OPTIONS]\n"

h += "\nTARGET:\n"
h += " -d, --domain string target domain\n"
h += " --include-subdomains bool include domain's subdomains\n"
h += " -d, --domain string (sub)domain to match URLs\n"

h += "\nSCOPE:\n"
h += " --include-subdomains bool match subdomain's URLs\n"

h += "\nSOURCES:\n"
h += " --list-sources bool list available sources\n"
h += " -s --sources strings comma(,) separated sources to use (default: commoncrawl,github,intelx,otx,urlscan,wayback)\n"
h += " -s, --sources bool list sources\n"
h += fmt.Sprintf(" -u, --use-sources string sources to use (default: %s)\n", strings.Join(sources.List, ","))
h += " --skip-wayback-robots bool with wayback, skip parsing robots.txt snapshots\n"
h += " --skip-wayback-source bool with wayback, skip parsing source code snapshots\n"

h += "\nCONFIGURATION:\n"
h += " --skip-wayback-robots bool skip parsing wayback robots.txt snapshots\n"
h += " --skip-wayback-source bool skip parsing wayback source code snapshots\n"
h += "\nFILTER & MATCH:\n"
h += " -f, --filter string regex to filter URLs\n"
h += " -m, --match string regex to match URLs\n"

h += "\nOUTPUT:\n"
h += " -m, --monochrome no colored output mode\n"
h += " -o, --output string output file to write found URLs\n"
h += fmt.Sprintf(" -v, --verbosity debug, info, warning, error, fatal or silent (default: %s)\n\n", string(levels.LevelInfo))
h += " --no-color bool no color mode\n"
h += " -o, --output string output URLs file path\n"
h += fmt.Sprintf(" -v, --verbosity string debug, info, warning, error, fatal or silent (default: %s)\n", string(levels.LevelInfo))

h += "\nCONFIGURATION:\n"
h += fmt.Sprintf(" -c, --configuration string configuration file path (default: %s)\n", defaultYAMLConfigFile)

fmt.Fprintln(os.Stderr, h)
}

pflag.Parse()

// Initialize logger
hqlog.DefaultLogger.SetMaxLevel(levels.LevelStr(verbosity))
hqlog.DefaultLogger.SetFormatter(formatter.NewCLI(&formatter.CLIOptions{
// Initialize logger (hqgolog)
hqgolog.DefaultLogger.SetMaxLevel(levels.LevelStr(verbosity))
hqgolog.DefaultLogger.SetFormatter(formatter.NewCLI(&formatter.CLIOptions{
Colorize: !monochrome,
}))

// Handle configuration on initial run
var (
err error
config configuration.Configuration
)

_, err = os.Stat(configuration.ConfigurationFilePath)
if err != nil {
if os.IsNotExist(err) {
config = configuration.Default

if err = configuration.Write(&config); err != nil {
hqlog.Fatal().Msg(err.Error())
}
} else {
hqlog.Fatal().Msg(err.Error())
}
} else {
config, err = configuration.Read()
// Create | Update configuration
if strings.HasPrefix(YAMLConfigFile, "~") {
home, err := os.UserHomeDir()
if err != nil {
hqlog.Fatal().Msg(err.Error())
hqgolog.Fatal().Msg(err.Error())
}

if config.Version != configuration.VERSION {
if err = mergo.Merge(&config, configuration.Default); err != nil {
hqlog.Fatal().Msg(err.Error())
}

config.Version = configuration.VERSION
YAMLConfigFile = strings.Replace(YAMLConfigFile, "~", home, 1)
}

if err = configuration.Write(&config); err != nil {
hqlog.Fatal().Msg(err.Error())
}
}
if err := configuration.CreateUpdate(YAMLConfigFile); err != nil {
hqgolog.Fatal().Msg(err.Error())
}

au = aurora.NewAurora(!monochrome)
}

func main() {
// Print Banner
if verbosity != string(levels.LevelSilent) {
fmt.Fprintln(os.Stderr, configuration.BANNER)
}

config, err := configuration.Read()
// Read in configuration
config, err := configuration.Read(YAMLConfigFile)
if err != nil {
hqlog.Fatal().Msg(err.Error())
hqgolog.Fatal().Msg(err.Error())
}

keys := config.GetKeys()

// Handle sources listing
// List suported sources
if listSources {
hqlog.Info().Msgf("current list of the available %v sources", au.Underline(strconv.Itoa(len(config.Sources))).Bold())
hqlog.Info().Msg("sources marked with an * needs key or token")
hqlog.Print().Msg("")
hqgolog.Info().Msgf("listing %v current supported sources", au.Underline(strconv.Itoa(len(config.Sources))).Bold())
hqgolog.Info().Msgf("sources with %v needs a key or token", au.Underline("*").Bold())
hqgolog.Print().Msg("")

needsKey := make(map[string]interface{})
keysElem := reflect.ValueOf(&keys).Elem()
keysElem := reflect.ValueOf(&config.Keys).Elem()

for i := 0; i < keysElem.NumField(); i++ {
needsKey[strings.ToLower(keysElem.Type().Field(i).Name)] = keysElem.Field(i).Interface()
}

for _, source := range config.Sources {
if _, ok := needsKey[source]; ok {
hqlog.Print().Msgf("> %s *", source)
hqgolog.Print().Msgf("> %s *", source)
} else {
hqlog.Print().Msgf("> %s", source)
hqgolog.Print().Msgf("> %s", source)
}
}

hqlog.Print().Msg("")
hqgolog.Print().Msg("")

os.Exit(0)
}

// Handle URLs finding
// Find URLs
if verbosity != string(levels.LevelSilent) {
hqlog.Info().Msgf("finding URLs for %v.", au.Underline(domain).Bold())
hqgolog.Info().Msgf("finding URLs for %v.", au.Underline(domain).Bold())

if includeSubdomains {
hqlog.Info().Msg("`--include-subdomains` used: includes subdomains' URLs.")
hqgolog.Info().Msg("`--include-subdomains` used: includes subdomains' URLs.")
}

hqlog.Print().Msg("")
hqgolog.Print().Msg("")
}

options := &xurlfind3r.Options{
Domain: domain,
IncludeSubdomains: includeSubdomains,
Sources: sourcesToUse,
Keys: keys,
Keys: config.Keys,
ParseWaybackRobots: !skipWaybackRobots,
ParseWaybackSource: !skipWaybackSource,
FilterPattern: filterPattern,
Matchattern: matchPattern,
}

finder, err := xurlfind3r.New(options)
if err != nil {
hqgolog.Fatal().Msg(err.Error())
}

finder := xurlfind3r.New(options)
URLs := finder.Find()

if output != "" {
// Create output file path directory
directory := filepath.Dir(output)

if _, err := os.Stat(directory); os.IsNotExist(err) {
if err = os.MkdirAll(directory, os.ModePerm); err != nil {
hqlog.Fatal().Msg(err.Error())
hqgolog.Fatal().Msg(err.Error())
}
}

// Create output file
file, err := os.OpenFile(output, os.O_APPEND|os.O_CREATE|os.O_WRONLY, 0644)
if err != nil {
hqlog.Fatal().Msg(err.Error())
hqgolog.Fatal().Msg(err.Error())
}

defer file.Close()

// Write URLs output file and print on screen
writer := bufio.NewWriter(file)

for URL := range URLs {
if verbosity == string(levels.LevelSilent) {
hqlog.Print().Msg(URL.Value)
hqgolog.Print().Msg(URL.Value)
} else {
hqlog.Print().Msgf("[%s] %s", au.BrightBlue(URL.Source), URL.Value)
hqgolog.Print().Msgf("[%s] %s", au.BrightBlue(URL.Source), URL.Value)
}

fmt.Fprintln(writer, URL.Value)
}

if err = writer.Flush(); err != nil {
hqlog.Fatal().Msg(err.Error())
hqgolog.Fatal().Msg(err.Error())
}
} else {
// Print URLs on screen
for URL := range URLs {
if verbosity == string(levels.LevelSilent) {
hqlog.Print().Msg(URL.Value)
hqgolog.Print().Msg(URL.Value)
} else {
hqlog.Print().Msgf("[%s] %s", au.BrightBlue(URL.Source), URL.Value)
hqgolog.Print().Msgf("[%s] %s", au.BrightBlue(URL.Source), URL.Value)
}
}
}
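`main.go` now expands a leading `~` in the `--configuration` path with `os.UserHomeDir` and `strings.Replace` before calling `configuration.CreateUpdate`. A minimal standalone sketch of that step follows. `expandHome` is a hypothetical helper, and unlike the `strings.HasPrefix(YAMLConfigFile, "~")` check in the diff it only expands `~` on its own or followed by a separator, leaving forms such as `~user/...` untouched.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// expandHome replaces a leading "~" (alone or followed by a path separator)
// with home. Any other path, including "~user/...", is returned unchanged.
func expandHome(path, home string) string {
	if path == "~" {
		return home
	}
	if strings.HasPrefix(path, "~/") || strings.HasPrefix(path, "~"+string(filepath.Separator)) {
		return filepath.Join(home, path[2:])
	}
	return path
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}

	// Default configuration path used by the CLI.
	fmt.Println(expandHome("~/.hueristiq/xurlfind3r/config.yaml", home))
}
```

Passing `home` in as a parameter keeps the helper easy to test without touching the real home directory.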
12 changes: 7 additions & 5 deletions go.mod
@@ -3,9 +3,11 @@ module github.com/hueristiq/xurlfind3r
go 1.20

require (
dario.cat/mergo v1.0.0
github.com/corpix/uarand v0.2.0
github.com/hueristiq/hqgoutils v0.0.0-20230313202103-053eefb3d3ec
github.com/imdario/mergo v0.3.16
github.com/hueristiq/hqgolimit v0.0.0-20230623113203-3e14552a97f8
github.com/hueristiq/hqgolog v0.0.0-20230623101640-92de7a10a4bb
github.com/hueristiq/hqgourl v0.0.0-20230623095947-4dee5ebb9a96
github.com/logrusorgru/aurora/v3 v3.0.0
github.com/spf13/pflag v1.0.5
github.com/tomnomnom/linkheader v0.0.0-20180905144013-02ca5825eb80
@@ -17,7 +19,7 @@ require (
github.com/andybalholm/brotli v1.0.5 // indirect
github.com/klauspost/compress v1.16.3 // indirect
github.com/valyala/bytebufferpool v1.0.0 // indirect
golang.org/x/net v0.8.0 // indirect
golang.org/x/sys v0.6.0 // indirect
golang.org/x/term v0.6.0 // indirect
golang.org/x/net v0.11.0 // indirect
golang.org/x/sys v0.9.0 // indirect
golang.org/x/term v0.9.0 // indirect
)
24 changes: 14 additions & 10 deletions go.sum
@@ -1,12 +1,16 @@
dario.cat/mergo v1.0.0 h1:AGCNq9Evsj31mOgNPcLyXc+4PNABt905YmuqPYYpBWk=
dario.cat/mergo v1.0.0/go.mod h1:uNxQE+84aUszobStD9th8a29P2fMDhsBdgRYvZOxGmk=
github.com/andybalholm/brotli v1.0.5 h1:8uQZIdzKmjc/iuPu7O2ioW48L81FgatrcpfFmiq/cCs=
github.com/andybalholm/brotli v1.0.5/go.mod h1:fO7iG3H7G2nSZ7m0zPUDn85XEX2GTukHGRSepvi9Eig=
github.com/corpix/uarand v0.2.0 h1:U98xXwud/AVuCpkpgfPF7J5TQgr7R5tqT8VZP5KWbzE=
github.com/corpix/uarand v0.2.0/go.mod h1:/3Z1QIqWkDIhf6XWn/08/uMHoQ8JUoTIKc2iPchBOmM=
github.com/davecgh/go-spew v1.1.0 h1:ZDRjVQ15GmhC3fiQ8ni8+OwkZQO4DARzQgrnXU1Liz8=
github.com/hueristiq/hqgoutils v0.0.0-20230313202103-053eefb3d3ec h1:rK/Tm5QTcb1hXThoQOQ+KnO6cmPDgrSBi5rPObo1OUM=
github.com/hueristiq/hqgoutils v0.0.0-20230313202103-053eefb3d3ec/go.mod h1:owPR/GbpoJMI3Bvx2NMXqLcwb4p8pd6x2qzuebsgbHk=
github.com/imdario/mergo v0.3.16 h1:wwQJbIsHYGMUyLSPrEq1CT16AhnhNJQ51+4fdHUnCl4=
github.com/imdario/mergo v0.3.16/go.mod h1:WBLT9ZmE3lPoWsEzCh9LPo3TiwVN+ZKEjmz+hD27ysY=
github.com/hueristiq/hqgolimit v0.0.0-20230623113203-3e14552a97f8 h1:r4ze6pX8H//X4SJEIcn8wHPgAhaGKEaa44lyHh1epXY=
github.com/hueristiq/hqgolimit v0.0.0-20230623113203-3e14552a97f8/go.mod h1:CzhJzxz2rv/NMKNz5b4eKFh1epdcED05YTHT32NFyrI=
github.com/hueristiq/hqgolog v0.0.0-20230623101640-92de7a10a4bb h1:DQUVIiWnrTDQ4MP6UJw7/fMkySN+PYonDhlgBh31DDI=
github.com/hueristiq/hqgolog v0.0.0-20230623101640-92de7a10a4bb/go.mod h1:S5J3E3Azva5+JKv67uc+Hh3XwLDvkVYDGjEaMTFrIqg=
github.com/hueristiq/hqgourl v0.0.0-20230623095947-4dee5ebb9a96 h1:oQsID2S7L6dhNVbwkStxesXOMbn7LWfDSyohVbuKJe8=
github.com/hueristiq/hqgourl v0.0.0-20230623095947-4dee5ebb9a96/go.mod h1:8NAT2ECb69qzGf2d/ty0PVE3M3HK/+fXLtri2c47wQE=
github.com/klauspost/compress v1.16.3 h1:XuJt9zzcnaz6a16/OU53ZjWp/v7/42WcR5t2a0PcNQY=
github.com/klauspost/compress v1.16.3/go.mod h1:ntbaceVETuRiXiv4DpjP66DpAtAGkEQskQzEyD//IeE=
github.com/logrusorgru/aurora/v3 v3.0.0 h1:R6zcoZZbvVcGMvDCKo45A9U/lzYyzl5NfYIvznmDfE4=
@@ -21,12 +25,12 @@ github.com/valyala/bytebufferpool v1.0.0 h1:GqA5TC/0021Y/b9FG4Oi9Mr3q7XYx6Kllzaw
github.com/valyala/bytebufferpool v1.0.0/go.mod h1:6bBcMArwyJ5K/AmCkWv1jt77kVWyCJ6HpOuEn7z0Csc=
github.com/valyala/fasthttp v1.47.0 h1:y7moDoxYzMooFpT5aHgNgVOQDrS3qlkfiP9mDtGGK9c=
github.com/valyala/fasthttp v1.47.0/go.mod h1:k2zXd82h/7UZc3VOdJ2WaUqt1uZ/XpXAfE9i+HBC3lA=
golang.org/x/net v0.8.0 h1:Zrh2ngAOFYneWTAIAPethzeaQLuHwhuBkuV6ZiRnUaQ=
golang.org/x/net v0.8.0/go.mod h1:QVkue5JL9kW//ek3r6jTKnTFis1tRmNAW2P1shuFdJc=
golang.org/x/sys v0.6.0 h1:MVltZSvRTcU2ljQOhs94SXPftV6DCNnZViHeQps87pQ=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/term v0.6.0 h1:clScbb1cHjoCkyRbWwBEUZ5H/tIFu5TAXIqaZD0Gcjw=
golang.org/x/term v0.6.0/go.mod h1:m6U89DPEgQRMq3DNkDClhWw02AUbt2daBVO4cn4Hv9U=
golang.org/x/net v0.11.0 h1:Gi2tvZIJyBtO9SDr1q9h5hEQCp/4L2RQ+ar0qjx2oNU=
golang.org/x/net v0.11.0/go.mod h1:2L/ixqYpgIVXmeoSA/4Lu7BzTG4KIyPIryS4IsOd1oQ=
golang.org/x/sys v0.9.0 h1:KS/R3tvhPqvJvwcKfnBHJwwthS11LRhmM5D59eEXa0s=
golang.org/x/sys v0.9.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/term v0.9.0 h1:GRRCnKYhdQrD8kfRAdQ6Zcw1P0OcELxGLKJvtjVMZ28=
golang.org/x/term v0.9.0/go.mod h1:M6DEAAIenWoTxdKrOltXcmDY3rSplQUkrvaDU5FcQyo=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.1 h1:fxVm/GzAzEWqLHuvctI91KS9hhNmmWOoWu0XTYJS7CA=
159 changes: 80 additions & 79 deletions internal/configuration/configuration.go
@@ -1,32 +1,54 @@
package configuration

import (
"math/rand"
"os"
"path/filepath"
"strings"

hqlog "github.com/hueristiq/hqgoutils/log"
"dario.cat/mergo"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources"
"github.com/logrusorgru/aurora/v3"
"gopkg.in/yaml.v3"
)

type Keys struct {
Github []string `yaml:"github"`
Intelx []string `yaml:"intelx"`
type Configuration struct {
Version string `yaml:"version"`
Sources []string `yaml:"sources"`
Keys sources.Keys `yaml:"keys"`
}

type Configuration struct {
Version string `yaml:"version"`
Sources []string `yaml:"sources"`
Keys Keys `yaml:"keys"`
func (configuration *Configuration) Write(path string) (err error) {
var (
file *os.File
)

directory := filepath.Dir(path)
identation := 4

if _, err = os.Stat(directory); os.IsNotExist(err) {
if directory != "" {
if err = os.MkdirAll(directory, os.ModePerm); err != nil {
return
}
}
}

file, err = os.OpenFile(path, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0755)
if err != nil {
return
}

defer file.Close()

enc := yaml.NewEncoder(file)
enc.SetIndent(identation)
err = enc.Encode(&configuration)

return
}

const (
NAME string = "xurlfind3r"
VERSION string = "0.1.0"
DESCRIPTION string = "A CLI utility to find domain's known URLs."
NAME string = "xurlfind3r"
VERSION string = "0.2.0"
)

var (
@@ -37,98 +59,77 @@ __ ___ _ _ __| |/ _(_)_ __ __| |___ / _ __
\ \/ / | | | '__| | |_| | '_ \ / _`+"`"+` | |_ \| '__|
> <| |_| | | | | _| | | | | (_| |___) | |
/_/\_\\__,_|_| |_|_| |_|_| |_|\__,_|____/|_| %s
%s
`).Bold(),
aurora.BrightYellow("v"+VERSION).Bold(),
aurora.BrightGreen(DESCRIPTION).Italic(),
)
rootDirectoryName = ".hueristiq"
projectRootDirectoryName = NAME
ProjectRootDirectoryPath = func(rootDirectoryName, projectRootDirectoryName string) string {
home, err := os.UserHomeDir()
if err != nil {
hqlog.Fatal().Msg(err.Error())
}

return filepath.Join(home, rootDirectoryName, projectRootDirectoryName)
}(rootDirectoryName, projectRootDirectoryName)
configurationFileName = "config.yaml"
ConfigurationFilePath = filepath.Join(ProjectRootDirectoryPath, configurationFileName)
Default = Configuration{
Version: VERSION,
Sources: sources.List,
Keys: Keys{
Github: []string{},
Intelx: []string{},
},
}
)

func Read() (configuration Configuration, err error) {
func CreateUpdate(path string) (err error) {
var (
file *os.File
config Configuration
)

file, err = os.Open(ConfigurationFilePath)
if err != nil {
return
defaultConfig := Configuration{
Version: VERSION,
Sources: sources.List,
Keys: sources.Keys{
Bevigil: []string{},
GitHub: []string{},
Intelx: []string{},
URLScan: []string{},
},
}

defer file.Close()

err = yaml.NewDecoder(file).Decode(&configuration)
_, err = os.Stat(path)
if err != nil {
if os.IsNotExist(err) {
config = defaultConfig

return
}
if err = config.Write(path); err != nil {
return
}
} else {
return
}
} else {
config, err = Read(path)
if err != nil {
return
}

func Write(configuration *Configuration) (err error) {
var (
file *os.File
identation = 4
)
if config.Version != VERSION ||
len(config.Sources) != len(sources.List) {
if err = mergo.Merge(&config, defaultConfig); err != nil {
return
}

directory := filepath.Dir(ConfigurationFilePath)
config.Version = VERSION
config.Sources = sources.List

if _, err = os.Stat(directory); os.IsNotExist(err) {
if directory != "" {
if err = os.MkdirAll(directory, os.ModePerm); err != nil {
if err = config.Write(path); err != nil {
return
}
}
}

file, err = os.OpenFile(ConfigurationFilePath, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0755)
if err != nil {
return
}

defer file.Close()

enc := yaml.NewEncoder(file)
enc.SetIndent(identation)
err = enc.Encode(&configuration)

return
}

func (configuration *Configuration) GetKeys() sources.Keys {
keys := sources.Keys{}
func Read(path string) (configuration Configuration, err error) {
var (
file *os.File
)

if len(configuration.Keys.Github) > 0 {
keys.GitHub = configuration.Keys.Github
file, err = os.Open(path)
if err != nil {
return
}

intelxKeysCount := len(configuration.Keys.Intelx)
if intelxKeysCount > 0 {
intelxKeys := configuration.Keys.Intelx[rand.Intn(intelxKeysCount)] //nolint:gosec // Works perfectly
parts := strings.Split(intelxKeys, ":")
defer file.Close()

if len(parts) == 2 {
keys.IntelXHost = parts[0]
keys.IntelXKey = parts[1]
}
if err = yaml.NewDecoder(file).Decode(&configuration); err != nil {
return
}

return keys
return
}
File renamed without changes.
66 changes: 66 additions & 0 deletions pkg/xurlfind3r/sources/bevigil/bevigil.go
@@ -0,0 +1,66 @@
package bevigil

import (
"encoding/json"
"fmt"

"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/httpclient"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources"
"github.com/valyala/fasthttp"
)

type response struct {
Domain string `json:"domain"`
URLs []string `json:"urls"`
}

type Source struct{}

func (source *Source) Run(config *sources.Configuration) (URLsChannel chan sources.URL) {
URLsChannel = make(chan sources.URL)

go func() {
defer close(URLsChannel)

var (
key string
err error
res *fasthttp.Response
headers = map[string]string{}
)

key, err = sources.PickRandom(config.Keys.Bevigil)
if key == "" || err != nil {
return
}

if len(config.Keys.Bevigil) > 0 {
headers["X-Access-Token"] = key
}

reqURL := fmt.Sprintf("https://osint.bevigil.com/api/%s/urls/", config.Domain)

res, err = httpclient.Request(fasthttp.MethodGet, reqURL, "", headers, nil)
if err != nil {
return
}

body := res.Body()

var results response

if err = json.Unmarshal(body, &results); err != nil {
return
}

for _, i := range results.URLs {
URLsChannel <- sources.URL{Source: source.Name(), Value: i}
}
}()

return
}

func (source *Source) Name() string {
return "bevigil"
}
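The new BeVigil source sends a GET to `https://osint.bevigil.com/api/<domain>/urls/` with an `X-Access-Token` header and decodes the JSON body into the `response` struct above. The sketch below reproduces those two steps with the standard `net/http` and `encoding/json` packages in place of the project's fasthttp-based `httpclient`. It only builds the request and parses a sample body; the payload is hypothetical, shaped after the struct's `json` tags rather than captured from the API, and `newBevigilRequest` and `parseURLs` are illustrative names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// response mirrors the struct declared in bevigil.go.
type response struct {
	Domain string   `json:"domain"`
	URLs   []string `json:"urls"`
}

// newBevigilRequest builds (but does not send) the request the source makes.
func newBevigilRequest(domain, key string) (*http.Request, error) {
	reqURL := fmt.Sprintf("https://osint.bevigil.com/api/%s/urls/", domain)
	req, err := http.NewRequest(http.MethodGet, reqURL, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("X-Access-Token", key)
	return req, nil
}

// parseURLs decodes a response body and returns its URL list.
func parseURLs(body []byte) ([]string, error) {
	var r response
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	return r.URLs, nil
}

func main() {
	req, err := newBevigilRequest("example.com", "YOUR_BEVIGIL_KEY")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)

	// Hypothetical body for illustration only.
	body := []byte(`{"domain":"example.com","urls":["https://example.com/login","https://example.com/app.js"]}`)
	urls, err := parseURLs(body)
	if err != nil {
		panic(err)
	}
	for _, u := range urls {
		fmt.Println(u)
	}
}
```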
16 changes: 8 additions & 8 deletions pkg/xurlfind3r/sources/configuration.go
@@ -4,20 +4,20 @@ import (
"regexp"
)

type Keys struct {
GitHub []string `json:"github"`
Intelx string `json:"intelx"` // unused, add for the purpose of adding an asterisk `*` on listing sources
IntelXHost string `json:"intelXHost"`
IntelXKey string `json:"intelXKey"`
}

type Configuration struct {
Keys Keys
Domain string
IncludeSubdomains bool
Keys Keys
ParseWaybackRobots bool
ParseWaybackSource bool
URLsRegex *regexp.Regexp
MediaURLsRegex *regexp.Regexp
RobotsURLsRegex *regexp.Regexp
}

type Keys struct {
Bevigil []string `yaml:"bevigil"`
GitHub []string `yaml:"github"`
Intelx []string `yaml:"intelx"`
URLScan []string `yaml:"urlscan"`
}
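With `Keys` moved into the `sources` package, the source listing in `main.go` marks a source with `*` when a field of the same name, lowercased, exists on the keys struct; it walks `config.Keys` with `reflect`. The stdlib sketch below mirrors that lookup on a copy of the struct. `keyedSources` is a hypothetical helper, and the source names come from the README's `config.yaml` example.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// Keys copies the struct from pkg/xurlfind3r/sources/configuration.go.
type Keys struct {
	Bevigil []string `yaml:"bevigil"`
	GitHub  []string `yaml:"github"`
	Intelx  []string `yaml:"intelx"`
	URLScan []string `yaml:"urlscan"`
}

// keyedSources returns the lowercased field names of Keys, i.e. the
// sources the listing marks with an asterisk.
func keyedSources() map[string]bool {
	t := reflect.TypeOf(Keys{})
	names := make(map[string]bool, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		names[strings.ToLower(t.Field(i).Name)] = true
	}
	return names
}

func main() {
	needsKey := keyedSources()
	for _, source := range []string{"bevigil", "commoncrawl", "github", "intelx", "otx", "urlscan", "wayback"} {
		if needsKey[source] {
			fmt.Printf("> %s *\n", source)
		} else {
			fmt.Printf("> %s\n", source)
		}
	}
}
```

Because the match relies on Go field names, `GitHub` only lines up with the `github` source after lowercasing; reading the `yaml` tag with `t.Field(i).Tag.Get("yaml")` would avoid depending on field naming. Note that this lookup also marks `urlscan`, whose key the README describes as optional.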
27 changes: 19 additions & 8 deletions pkg/xurlfind3r/sources/intelx/intelx.go
@@ -4,6 +4,7 @@ package intelx
import (
"encoding/json"
"fmt"
"strings"
"time"

"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/httpclient"
@@ -16,15 +17,15 @@ type searchResponseType struct {
Status int `json:"status"`
}

type selectorType struct {
Selectvalue string `json:"selectorvalue"`
}

type searchResultType struct {
Selectors []selectorType `json:"selectors"`
Status int `json:"status"`
}

type selectorType struct {
Selectvalue string `json:"selectorvalue"`
}

type requestBody struct {
Term string `json:"term"`
Timeout time.Duration `json:"timeout"`
@@ -41,16 +42,26 @@ func (source *Source) Run(config *sources.Configuration) (URLsChannel chan sourc
defer close(URLsChannel)

var (
key string
err error
body []byte
res *fasthttp.Response
body []byte
)

if config.Keys.IntelXKey == "" || config.Keys.IntelXHost == "" {
key, err = sources.PickRandom(config.Keys.Intelx)
if key == "" || err != nil {
return
}

parts := strings.Split(key, ":")
intelXHost := parts[0]
intelXKey := parts[1]

if intelXKey == "" || intelXHost == "" {
return
}

searchURL := fmt.Sprintf("https://%s/phonebook/search?k=%s", config.Keys.IntelXHost, config.Keys.IntelXKey)
searchURL := fmt.Sprintf("https://%s/phonebook/search?k=%s", intelXHost, intelXKey)
reqBody := requestBody{
Term: config.Domain,
MaxResults: 100000,
@@ -74,7 +85,7 @@ func (source *Source) Run(config *sources.Configuration) (URLsChannel chan sourc
return
}

resultsURL := fmt.Sprintf("https://%s/phonebook/search/result?k=%s&id=%s&limit=10000", config.Keys.IntelXHost, config.Keys.IntelXKey, response.ID)
resultsURL := fmt.Sprintf("https://%s/phonebook/search/result?k=%s&id=%s&limit=10000", intelXHost, intelXKey, response.ID)
status := 0

for status == 0 || status == 3 {
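IntelX keys are now stored as one `host:key` string (for example `2.intelx.io:00000000-0000-0000-0000-000000000000` in the README's `config.yaml`) and split with `strings.Split(key, ":")`. Indexing `parts[1]` directly panics if a configured value contains no colon. A defensive variant, sketched below as a hypothetical `parseIntelXKey` helper, uses `strings.Cut` (available since Go 1.18; `go.mod` targets 1.20) and reports malformed values instead of panicking.

```go
package main

import (
	"fmt"
	"strings"
)

// parseIntelXKey splits a "host:key" value at the first colon and returns
// ok=false when the colon, the host, or the key is missing.
func parseIntelXKey(value string) (string, string, bool) {
	host, key, found := strings.Cut(value, ":")
	if !found || host == "" || key == "" {
		return "", "", false
	}
	return host, key, true
}

func main() {
	host, key, ok := parseIntelXKey("2.intelx.io:00000000-0000-0000-0000-000000000000")
	if !ok {
		fmt.Println("malformed IntelX key")
		return
	}
	// Same search endpoint format as in the diff.
	fmt.Printf("https://%s/phonebook/search?k=%s\n", host, key)

	_, _, ok = parseIntelXKey("missing-colon")
	fmt.Println(ok) // false
}
```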
@@ -1,6 +1,7 @@
package sources

var List = []string{
"bevigil",
"commoncrawl",
"github",
"intelx",
80 changes: 63 additions & 17 deletions pkg/xurlfind3r/sources/urlscan/urlscan.go
@@ -3,7 +3,7 @@ package urlscan

import (
"encoding/json"
"fmt"
"net/url"

"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/httpclient"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources"
@@ -13,9 +13,17 @@ import (
type response struct {
Results []struct {
Page struct {
URL string `json:"url"`
Domain string `json:"domain"`
MimeType string `json:"mimeType"`
URL string `json:"url"`
Status string `json:"status"`
} `json:"page"`
Sort []interface{} `json:"sort"`
} `json:"results"`
Status int `json:"status"`
Total int `json:"total"`
Took int `json:"took"`
HasMore bool `json:"has_more"`
}

type Source struct{}
@@ -27,35 +35,73 @@ func (source *Source) Run(config *sources.Configuration) (URLsChannel chan sourc
defer close(URLsChannel)

var (
err error
res *fasthttp.Response
key string
err error
res *fasthttp.Response
searchAfter []interface{}
headers = map[string]string{
"Content-Type": "application/json",
}
)

res, err = httpclient.SimpleGet(fmt.Sprintf("https://urlscan.io/api/v1/search/?q=domain:%s", config.Domain))
if err != nil {
key, err = sources.PickRandom(config.Keys.URLScan)
if key == "" || err != nil {
return
}

body := res.Body()
if len(config.Keys.URLScan) > 0 {
headers["API-Key"] = key
}

var results response
for {
baseURL := "https://urlscan.io/api/v1/search/"
params := url.Values{}
params.Set("q", config.Domain)

if err = json.Unmarshal(body, &results); err != nil {
return
}
if searchAfter != nil {
searchAfterJSON, _ := json.Marshal(searchAfter)
params.Set("search_after", string(searchAfterJSON))
}

for _, i := range results.Results {
URL := i.Page.URL
reqURL := baseURL + "?" + params.Encode()

if !sources.IsValid(URL) {
continue
res, err = httpclient.Request(fasthttp.MethodGet, reqURL, "", headers, nil)
if err != nil {
return
}

if !sources.IsInScope(URL, config.Domain, config.IncludeSubdomains) {
body := res.Body()

var results response

if err = json.Unmarshal(body, &results); err != nil {
return
}

URLsChannel <- sources.URL{Source: source.Name(), Value: URL}
if results.Status == 429 {
break
}

for _, i := range results.Results {
URL := i.Page.URL

if !sources.IsValid(URL) {
continue
}

if !sources.IsInScope(URL, config.Domain, config.IncludeSubdomains) {
return
}

URLsChannel <- sources.URL{Source: source.Name(), Value: URL}
}

if !results.HasMore {
break
}

lastResult := results.Results[len(results.Results)-1]
searchAfter = lastResult.Sort
}
}()
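The reworked URLScan source now pages through results with a cursor. It takes the `sort` values of the last result on a page and sends them back, JSON-encoded, as `search_after`, and it keeps going until `has_more` is false. The sketch below pulls that loop out so it can run without network access. `searchResponse`, `buildSearchURL`, `collect`, and the `fetch`/`inScope` callbacks are hypothetical names, not xurlfind3r APIs, and the response fields are assumed to match the `response` struct in this diff.

The sketch also differs from the committed code in two deliberate ways. It skips an out-of-scope URL with `continue`, where the loop above uses `return`. It also stops on an empty page before indexing `Results[len(Results)-1]`, which would otherwise panic.

```go
package main

import (
	"encoding/json"
	"net/url"
)

// searchResponse keeps only the fields the pagination loop needs
// (assumed to mirror the response struct in the diff above).
type searchResponse struct {
	Results []struct {
		Page struct {
			URL string `json:"url"`
		} `json:"page"`
		Sort []interface{} `json:"sort"`
	} `json:"results"`
	Status  int  `json:"status"`
	HasMore bool `json:"has_more"`
}

// buildSearchURL encodes the query and, when present, the search_after cursor.
func buildSearchURL(query string, searchAfter []interface{}) (string, error) {
	params := url.Values{}
	params.Set("q", query)
	if searchAfter != nil {
		cursor, err := json.Marshal(searchAfter)
		if err != nil {
			return "", err
		}
		params.Set("search_after", string(cursor))
	}
	return "https://urlscan.io/api/v1/search/?" + params.Encode(), nil
}

// collect walks every page. fetch is a hypothetical stand-in for the HTTP GET,
// inScope for the domain/subdomain check.
func collect(query string, fetch func(string) ([]byte, error), inScope func(string) bool) ([]string, error) {
	var (
		found       []string
		searchAfter []interface{}
	)
	for {
		reqURL, err := buildSearchURL(query, searchAfter)
		if err != nil {
			return found, err
		}
		body, err := fetch(reqURL)
		if err != nil {
			return found, err
		}
		var page searchResponse
		if err := json.Unmarshal(body, &page); err != nil {
			return found, err
		}
		if page.Status == 429 {
			return found, nil // rate limited: stop and keep what was collected
		}
		for _, r := range page.Results {
			if !inScope(r.Page.URL) {
				continue // skip this URL but keep reading the page
			}
			found = append(found, r.Page.URL)
		}
		// Stop on the last page, and never index into an empty page.
		if !page.HasMore || len(page.Results) == 0 {
			return found, nil
		}
		searchAfter = page.Results[len(page.Results)-1].Sort
	}
}
```

Because `fetch` is injected, the loop can be exercised with canned JSON pages before it is wired to a real HTTP client.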

36 changes: 33 additions & 3 deletions pkg/xurlfind3r/sources/utils.go
@@ -1,15 +1,45 @@
package sources

import (
"crypto/rand"
"fmt"
"math/big"
"net/mail"

hqurl "github.com/hueristiq/hqgoutils/url"
"github.com/hueristiq/hqgourl"
)

func PickRandom[T any](v []T) (picked T, err error) {
length := len(v)

if length == 0 {
return
}

// Generate a cryptographically secure random index
max := big.NewInt(int64(length))

var indexBig *big.Int

indexBig, err = rand.Int(rand.Reader, max)
if err != nil {
err = fmt.Errorf("failed to generate random index: %v", err)

return
}

index := indexBig.Int64()

// Return the element at the random index
picked = v[index]

return
}

func IsValid(URL string) (isValid bool) {
var err error

_, err = hqurl.Parse(URL)
_, err = hqgourl.Parse(URL)
if err != nil {
return
}
@@ -25,7 +55,7 @@ func IsValid(URL string) (isValid bool) {
}

func IsInScope(URL, domain string, includeSubdomains bool) (isInScope bool) {
parsedURL, err := hqurl.Parse(URL)
parsedURL, err := hqgourl.Parse(URL)
if err != nil {
return
}
4 changes: 2 additions & 2 deletions pkg/xurlfind3r/sources/wayback/wayback.go
@@ -9,7 +9,7 @@ import (
"strings"
"sync"

hqratelimiter "github.com/hueristiq/hqgoutils/ratelimiter"
"github.com/hueristiq/hqgolimit"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/httpclient"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources"
"github.com/valyala/fasthttp"
@@ -18,7 +18,7 @@ import (
type Source struct{}

var (
limiter = hqratelimiter.New(&hqratelimiter.Options{
limiter = hqgolimit.New(&hqgolimit.Options{
RequestsPerMinute: 40,
})
)
4 changes: 2 additions & 2 deletions pkg/xurlfind3r/sources/wayback/waybackrobots.go
@@ -6,7 +6,7 @@ import (
"strings"
"sync"

hqurl "github.com/hueristiq/hqgoutils/url"
"github.com/hueristiq/hqgourl"
)

func parseWaybackRobots(URL string) (URLs chan string) {
@@ -84,7 +84,7 @@ func parseWaybackRobots(URL string) (URLs chan string) {
}
}

parsedURL, err := hqurl.Parse(row[1])
parsedURL, err := hqgourl.Parse(row[1])
if err != nil {
continue
}
6 changes: 3 additions & 3 deletions pkg/xurlfind3r/sources/wayback/waybacksource.go
@@ -7,13 +7,13 @@ import (
"strings"
"sync"

hqurl "github.com/hueristiq/hqgoutils/url"
"github.com/hueristiq/hqgourl"
)

func parseWaybackSource(URL string, URLsRegex *regexp.Regexp) (URLs chan string) {
URLs = make(chan string)

parsedURL, err := hqurl.Parse(URL)
parsedURL, err := hqgourl.Parse(URL)
if err != nil {
return
}
@@ -68,7 +68,7 @@ func parseWaybackSource(URL string, URLsRegex *regexp.Regexp) (URLs chan string)
sourceURL = "https:" + sourceURL
}

parsedSourceURL, err := hqurl.Parse(sourceURL)
parsedSourceURL, err := hqgourl.Parse(sourceURL)
if err != nil {
continue
}
35 changes: 33 additions & 2 deletions pkg/xurlfind3r/xurlfind3r.go
@@ -5,6 +5,7 @@ import (
"sync"

"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources/bevigil"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources/commoncrawl"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources/github"
"github.com/hueristiq/xurlfind3r/pkg/xurlfind3r/sources/intelx"
@@ -20,20 +21,24 @@ type Options struct {
Keys sources.Keys
ParseWaybackRobots bool
ParseWaybackSource bool
FilterPattern string
Matchattern string
}

type Finder struct {
Sources map[string]sources.Source
SourcesConfiguration *sources.Configuration
FilterRegex *regexp.Regexp
MatchRegex *regexp.Regexp
}

func New(options *Options) (finder *Finder) {
func New(options *Options) (finder *Finder, err error) {
finder = &Finder{
Sources: map[string]sources.Source{},
SourcesConfiguration: &sources.Configuration{
Keys: options.Keys,
Domain: options.Domain,
IncludeSubdomains: options.IncludeSubdomains,
Keys: options.Keys,
ParseWaybackRobots: options.ParseWaybackRobots,
ParseWaybackSource: options.ParseWaybackSource,
URLsRegex: regexp.MustCompile(`(?:"|')(((?:[a-zA-Z]{1,10}://|//)[^"'/]{1,}\.[a-zA-Z]{2,}[^"']{0,})|((?:/|\.\./|\./)[^"'><,;| *()(%%$^/\\\[\]][^"'><,;|()]{1,})|([a-zA-Z0-9_\-/]{1,}/[a-zA-Z0-9_\-/]{1,}\.(?:[a-zA-Z]{1,4}|action)(?:[\?|#][^"|']{0,}|))|([a-zA-Z0-9_\-/]{1,}/[a-zA-Z0-9_\-/]{3,}(?:[\?|#][^"|']{0,}|))|([a-zA-Z0-9_\-]{1,}\.(?:php|asp|aspx|jsp|json|action|html|js|txt|xml)(?:[\?|#][^"|']{0,}|)))(?:"|')`), //nolint:gocritic // Works so far
@@ -42,10 +47,26 @@ func New(options *Options) (finder *Finder) {
},
}

if options.FilterPattern != "" {
finder.FilterRegex, err = regexp.Compile(options.FilterPattern)
if err != nil {
return
}
}

if options.Matchattern != "" {
finder.MatchRegex, err = regexp.Compile(options.Matchattern)
if err != nil {
return
}
}

for index := range options.Sources {
source := options.Sources[index]

switch source {
case "bevigil":
finder.Sources[source] = &bevigil.Source{}
case "commoncrawl":
finder.Sources[source] = &commoncrawl.Source{}
case "github":
@@ -87,6 +108,16 @@ func (finder *Finder) Find() (URLs chan sources.URL) {
continue
}

if finder.MatchRegex != nil {
if !finder.MatchRegex.MatchString(URL.Value) {
continue
}
} else if finder.FilterRegex != nil && finder.MatchRegex == nil {
if finder.FilterRegex.MatchString(URL.Value) {
continue
}
}

URLs <- URL
}
}(finder.Sources[name])
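In `Find`, a compiled `MatchRegex` takes precedence. When it is set, only URLs it matches are emitted and `FilterRegex` is never consulted. Otherwise `FilterRegex` drops the URLs it matches. `New` now also returns an error, so an invalid pattern reaches the caller instead of panicking as `regexp.MustCompile` would. The sketch below restates both steps using only the standard library. `compileOptional` and `keep` are hypothetical helpers, not functions in the package.

```go
package main

import "regexp"

// compileOptional compiles pattern, treating "" as "no regex configured".
func compileOptional(pattern string) (*regexp.Regexp, error) {
	if pattern == "" {
		return nil, nil
	}
	return regexp.Compile(pattern)
}

// keep reports whether a URL should be emitted. A match regex, when present,
// wins; the filter regex is consulted only when no match regex is set.
func keep(u string, match, filter *regexp.Regexp) bool {
	if match != nil {
		return match.MatchString(u)
	}
	if filter != nil {
		return !filter.MatchString(u)
	}
	return true
}
```

Supplying both a match and a filter pattern therefore behaves the same as supplying only the match pattern.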
