Add DSL expression based matcher and filter flags #537

Merged: 7 commits, Jul 31, 2023
2 changes: 1 addition & 1 deletion .github/workflows/build-test.yml
@@ -15,7 +15,7 @@ jobs:
strategy:
matrix:
os: [ubuntu-latest-16-cores, windows-latest-8-cores, macOS-12]
- go-version: [1.19.x, 1.20.x]
+ go-version: [1.20.x]
steps:
- name: Set up Go
uses: actions/setup-go@v4
2 changes: 1 addition & 1 deletion .github/workflows/functional-test.yml
@@ -18,7 +18,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v4
with:
- go-version: 1.19
+ go-version: 1.20.x

- name: Check out code
uses: actions/checkout@v3
2 changes: 1 addition & 1 deletion .github/workflows/lint-test.yml
@@ -19,7 +19,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v4
with:
- go-version: 1.19
+ go-version: 1.20.x
cache: true

- name: Run golangci-lint
2 changes: 1 addition & 1 deletion .github/workflows/release-binary.yml
@@ -18,7 +18,7 @@ jobs:
- name: "Set up Go"
uses: actions/setup-go@v4
with:
- go-version: 1.19
+ go-version: 1.20.x
cache: true

- name: "Create release on GitHub"
2 changes: 1 addition & 1 deletion .github/workflows/security-crawl-maze-score.yaml
@@ -11,7 +11,7 @@ jobs:
- name: Set up Go
uses: actions/setup-go@v4
with:
- go-version: 1.19
+ go-version: 1.20.x

- name: Check out code
uses: actions/checkout@v3
28 changes: 28 additions & 0 deletions README.md
@@ -166,6 +166,8 @@ FILTER:
-sf, -store-field string field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
-em, -extension-match string[] match output for given extension (eg, -em php,html,js)
-ef, -extension-filter string[] filter output for given extension (eg, -ef png,css)
-mdc, -match-condition string match response with dsl based condition
-fdc, -filter-condition string filter response with dsl based condition

RATE-LIMIT:
-c, -concurrency int number of concurrent fetchers to use (default 10)
@@ -698,6 +700,7 @@ The `-store-field` option can be useful for collecting information to build a ta
- Finding commonly used files
- Identifying related or unknown subdomains

### Katana Filters

*`-extension-match`*
---
@@ -732,6 +735,28 @@ The `-filter-regex` or `-fr` flag allows you to filter output URLs using regular
katana -u https://tesla.com -fr 'https://www\.tesla\.com/*' -silent
```

### Advanced Filtering

Katana supports DSL-based expressions for advanced matching and filtering capabilities:

- To match endpoints with a 200 status code:
```shell
katana -u https://www.hackerone.com -mdc 'status_code == 200'
```
- To match endpoints that contain "default" and have a status code other than 403:
```shell
katana -u https://www.hackerone.com -mdc 'contains(endpoint, "default") && status_code != 403'
```
- To match endpoints with PHP technologies:
```shell
katana -u https://www.hackerone.com -mdc 'contains(to_lower(technologies), "php")'
```
- To filter out endpoints running on Cloudflare:
```shell
katana -u https://www.hackerone.com -fdc 'contains(to_lower(technologies), "cloudflare")'
```
DSL functions can be applied to any key in the JSONL output. For more information on available DSL functions, see the [dsl project](https://github.com/projectdiscovery/dsl).
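
The difference between matching (`-mdc`) and filtering (`-fdc`) can be sketched in plain Go. This is only an illustration under stated assumptions, not katana's implementation: `Result`, `Condition`, `Select`, and `hasTech` are hypothetical names, the conditions are ordinary Go closures standing in for parsed DSL expressions, and the sketch assumes that when both flags are given a result is kept only if the match condition is true and the filter condition is false.

```go
package main

import (
	"fmt"
	"strings"
)

// Result is a hypothetical, trimmed-down stand-in for one line of JSONL
// output; the fields mirror the DSL keys used in the examples above.
type Result struct {
	Endpoint     string
	StatusCode   int
	Technologies []string
}

// Condition is a plain Go predicate standing in for a parsed DSL expression.
type Condition func(Result) bool

// Select keeps a result when the match condition (if any) is true and the
// filter condition (if any) is false. A nil condition is treated as "not set".
func Select(results []Result, match, filter Condition) []Result {
	var kept []Result
	for _, r := range results {
		if match != nil && !match(r) {
			continue // match condition set but not satisfied
		}
		if filter != nil && filter(r) {
			continue // filter condition satisfied, so drop the result
		}
		kept = append(kept, r)
	}
	return kept
}

// hasTech reports whether any detected technology contains name, ignoring case.
func hasTech(r Result, name string) bool {
	for _, t := range r.Technologies {
		if strings.Contains(strings.ToLower(t), strings.ToLower(name)) {
			return true
		}
	}
	return false
}

func main() {
	results := []Result{
		{Endpoint: "https://example.com/default.php", StatusCode: 200, Technologies: []string{"PHP"}},
		{Endpoint: "https://example.com/admin", StatusCode: 403, Technologies: []string{"Cloudflare"}},
		{Endpoint: "https://example.com/about", StatusCode: 200, Technologies: []string{"Cloudflare"}},
	}

	// Rough equivalent of: -mdc 'status_code == 200'
	ok := func(r Result) bool { return r.StatusCode == 200 }
	// Rough equivalent of: -fdc 'contains(to_lower(technologies), "cloudflare")'
	cloudflare := func(r Result) bool { return hasTech(r, "cloudflare") }

	for _, r := range Select(results, ok, cloudflare) {
		fmt.Println(r.Endpoint) // prints only https://example.com/default.php
	}
}
```

Passing `nil` for either condition models running with only one of the two flags set.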

Here are additional filter options -

```console
@@ -745,8 +770,11 @@ FILTER:
-sf, -store-field string field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
-em, -extension-match string[] match output for given extension (eg, -em php,html,js)
-ef, -extension-filter string[] filter output for given extension (eg, -ef png,css)
-mdc, -match-condition string match response with dsl based condition
-fdc, -filter-condition string filter response with dsl based condition
```


## Rate Limit

It's easy to get blocked or banned while crawling if you don't respect the target website's limits. Katana comes with multiple options to tune the crawl to go as fast or as slow as you want.
77 changes: 77 additions & 0 deletions cmd/integration-test/filters.go
@@ -0,0 +1,77 @@
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

var filtersTestcases = map[string]TestCase{
	"match condition":  &matchConditionIntegrationTest{},
	"filter condition": &filterConditionIntegrationTest{},
}

type matchConditionIntegrationTest struct{}

// Execute runs katana with a match condition and expects exactly one result.
// Update the docs at ../README.md if this integration test stops working.
func (h *matchConditionIntegrationTest) Execute() error {
	results, err := RunKatanaAndGetResults(false,
		"-u", "scanme.sh",
		"-match-condition", "status_code == 200 && contains(body, 'ok')",
	)
	if err != nil {
		return err
	}

	if len(results) != 1 {
		return fmt.Errorf("match condition failed: expected 1 result, got %d", len(results))
	}
	return nil
}

type filterConditionIntegrationTest struct{}

// Execute runs katana with a filter condition and expects no results.
// Update the docs at ../README.md if this integration test stops working.
func (h *filterConditionIntegrationTest) Execute() error {
	results, err := RunKatanaAndGetResults(false,
		"-u", "scanme.sh",
		"-filter-condition", "status_code == 200 && contains(body, 'ok')",
	)
	if err != nil {
		return err
	}

	if len(results) != 0 {
		return fmt.Errorf("filter condition failed: expected 0 results, got %d", len(results))
	}
	return nil
}

// ExtraDebugArgs holds extra arguments appended to every katana invocation.
var ExtraDebugArgs = []string{}

// RunKatanaAndGetResults runs the ./katana binary with the given arguments
// and returns its non-empty output lines.
func RunKatanaAndGetResults(debug bool, extra ...string) ([]string, error) {
	cmd := exec.Command("./katana")
	extra = append(extra, ExtraDebugArgs...)
	cmd.Args = append(cmd.Args, extra...)
	cmd.Args = append(cmd.Args, "-duc") // disable auto updates
	if debug {
		cmd.Args = append(cmd.Args, "-debug")
		cmd.Stderr = os.Stderr
		fmt.Println(cmd.String())
	} else {
		cmd.Args = append(cmd.Args, "-silent")
	}
	data, err := cmd.Output()
	if debug {
		fmt.Println(string(data))
	}
	// A failed run only counts as an error when it produced no output.
	if len(data) < 1 && err != nil {
		return nil, fmt.Errorf("%v: %v", err.Error(), string(data))
	}
	// Split the output into lines, skipping empty ones.
	var parts []string
	items := strings.Split(string(data), "\n")
	for _, i := range items {
		if i != "" {
			parts = append(parts, i)
		}
	}
	return parts, nil
}
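
The output handling at the end of `RunKatanaAndGetResults` can be isolated into a small helper, which makes the result counting that both integration tests rely on easy to check without running the binary. A minimal sketch, assuming a hypothetical helper name `nonEmptyLines`; trimming whitespace around each line is an addition that the code above does not make, included here because the CI matrix also runs on Windows, where lines may end in `\r\n`.

```go
package main

import (
	"fmt"
	"strings"
)

// nonEmptyLines splits raw command output into lines and drops blank ones.
// Unlike the helper in the diff, it also trims surrounding whitespace so a
// trailing "\r" does not produce spurious or malformed entries.
func nonEmptyLines(data string) []string {
	var lines []string
	for _, line := range strings.Split(data, "\n") {
		line = strings.TrimSpace(line)
		if line != "" {
			lines = append(lines, line)
		}
	}
	return lines
}

func main() {
	// Sample output with a CRLF line ending, a blank line, and a trailing newline.
	out := "https://example.com\r\n\nhttps://example.com/about\n"
	fmt.Println(len(nonEmptyLines(out))) // prints 2
}
```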
3 changes: 2 additions & 1 deletion cmd/integration-test/integration-test.go
@@ -22,7 +22,8 @@ var (
failed = aurora.Red("[✘]").String()

tests = map[string]map[string]TestCase{
- "code": libraryTestcases,
+ "code":    libraryTestcases,
+ "filters": filtersTestcases,
}
)

2 changes: 2 additions & 0 deletions cmd/katana/main.go
@@ -122,6 +122,8 @@ pipelines offering both headless and non-headless crawling.`)
flagSet.StringVarP(&options.StoreFields, "store-field", "sf", "", fmt.Sprintf("field to store in per-host output (%s)", availableFields)),
flagSet.StringSliceVarP(&options.ExtensionsMatch, "extension-match", "em", nil, "match output for given extension (eg, -em php,html,js)", goflags.CommaSeparatedStringSliceOptions),
flagSet.StringSliceVarP(&options.ExtensionFilter, "extension-filter", "ef", nil, "filter output for given extension (eg, -ef png,css)", goflags.CommaSeparatedStringSliceOptions),
flagSet.StringVarP(&options.OutputMatchCondition, "match-condition", "mdc", "", "match response with dsl based condition"),
flagSet.StringVarP(&options.OutputFilterCondition, "filter-condition", "fdc", "", "filter response with dsl based condition"),
)

flagSet.CreateGroup("ratelimit", "Rate-Limit",
22 changes: 16 additions & 6 deletions go.mod
@@ -1,6 +1,6 @@
module github.com/projectdiscovery/katana

- go 1.19
+ go 1.20

require (
github.com/BishopFox/jsluice v0.0.0-20230623145428-f10429e1016a
@@ -9,14 +9,16 @@ require (
github.com/json-iterator/go v1.1.12
github.com/logrusorgru/aurora v2.0.3+incompatible
github.com/lukasbob/srcset v0.0.0-20190730101422-86b742e617f3
github.com/mitchellh/mapstructure v1.5.0
github.com/pkg/errors v0.9.1
github.com/projectdiscovery/dsl v0.0.16
github.com/projectdiscovery/fastdialer v0.0.35
github.com/projectdiscovery/goflags v0.1.11
github.com/projectdiscovery/gologger v1.1.11
github.com/projectdiscovery/hmap v0.0.13
github.com/projectdiscovery/ratelimit v0.0.9
github.com/projectdiscovery/retryablehttp-go v1.0.19
github.com/projectdiscovery/utils v0.0.43
github.com/projectdiscovery/utils v0.0.44
github.com/projectdiscovery/wappalyzergo v0.0.105
github.com/remeh/sizedwaitgroup v1.0.0
github.com/rs/xid v1.5.0
@@ -29,7 +31,9 @@ require (

require (
aead.dev/minisign v0.2.0 // indirect
github.com/Knetic/govaluate v3.0.0+incompatible // indirect
github.com/Masterminds/semver/v3 v3.2.1 // indirect
github.com/Mzack9999/gostruct v0.0.0-20230415193108-30b70932da81 // indirect
github.com/VividCortex/ewma v1.2.0 // indirect
github.com/alecthomas/chroma v0.10.0 // indirect
github.com/andybalholm/brotli v1.0.4 // indirect
@@ -38,11 +42,13 @@ require (
github.com/cheggaaa/pb/v3 v3.1.4 // indirect
github.com/dlclark/regexp2 v1.8.1 // indirect
github.com/fatih/color v1.15.0 // indirect
github.com/fsnotify/fsnotify v1.4.9 // indirect
github.com/gaukas/godicttls v0.0.3 // indirect
github.com/golang/protobuf v1.5.3 // indirect
github.com/google/go-github/v30 v30.1.0 // indirect
github.com/google/go-querystring v1.1.0 // indirect
github.com/hashicorp/go-version v1.6.0 // indirect
github.com/hdm/jarm-go v0.0.7 // indirect
github.com/kataras/jwt v0.1.8 // indirect
github.com/klauspost/compress v1.15.15 // indirect
github.com/lucasb-eyer/go-colorful v1.2.0 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
@@ -53,16 +59,20 @@ require (
github.com/muesli/termenv v0.15.1 // indirect
github.com/olekukonko/tablewriter v0.0.5 // indirect
github.com/projectdiscovery/blackrock v0.0.1 // indirect
github.com/projectdiscovery/mapcidr v1.1.2 // indirect
github.com/refraction-networking/utls v1.3.2 // indirect
github.com/rivo/uniseg v0.4.4 // indirect
github.com/rogpeppe/go-internal v1.11.0 // indirect
github.com/sashabaranov/go-openai v1.14.1 // indirect
github.com/shoenig/go-m1cpu v0.1.6 // indirect
github.com/smacker/go-tree-sitter v0.0.0-20220628134258-ac06e95cfa11 // indirect
github.com/tidwall/btree v1.4.3 // indirect
github.com/spaolacci/murmur3 v1.1.0 // indirect
github.com/tidwall/btree v1.6.0 // indirect
github.com/tidwall/buntdb v1.3.0 // indirect
github.com/tidwall/gjson v1.14.3 // indirect
github.com/tidwall/gjson v1.14.4 // indirect
github.com/tidwall/grect v0.1.4 // indirect
github.com/tidwall/match v1.1.1 // indirect
github.com/tidwall/pretty v1.2.0 // indirect
github.com/tidwall/pretty v1.2.1 // indirect
github.com/tidwall/rtred v0.1.2 // indirect
github.com/tidwall/tinyqueue v0.1.1 // indirect
github.com/ysmood/fetchup v0.2.3 // indirect