Commit

Merge pull request #363 from projectdiscovery/dev
katana v1.0.0
ehsandeep authored Mar 20, 2023
2 parents db629bb + 6966f18 commit 8b074a9
Showing 47 changed files with 1,353 additions and 1,109 deletions.
8 changes: 5 additions & 3 deletions .github/workflows/build-test.yml
@@ -5,15 +5,17 @@ on:
pull_request:
branches:
- dev

paths:
- '**.go'
- '**.mod'
jobs:
build:
name: Test Builds
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest, windows-latest, macOS-12]
go-version: [1.18.x, 1.19.x]
os: [ubuntu-latest-16-cores, windows-latest-8-cores, macOS-12]
go-version: [1.19.x, 1.20.x]
steps:
- name: Set up Go
uses: actions/setup-go@v3
5 changes: 4 additions & 1 deletion .github/workflows/codeql-analysis.yml
@@ -5,11 +5,14 @@ on:
pull_request:
branches:
- dev
paths:
- '**.go'
- '**.mod'

jobs:
analyze:
name: Analyze
runs-on: ubuntu-latest
runs-on: ubuntu-latest-16-cores
permissions:
actions: read
contents: read
4 changes: 2 additions & 2 deletions .github/workflows/dockerhub-push.yml
@@ -9,7 +9,7 @@ on:

jobs:
docker:
runs-on: ubuntu-latest
runs-on: ubuntu-latest-16-cores
steps:
- name: Checkout
uses: actions/checkout@v3
@@ -32,7 +32,7 @@ jobs:
password: ${{ secrets.DOCKER_TOKEN }}

- name: Build and push
uses: docker/build-push-action@v3
uses: docker/build-push-action@v4
with:
context: .
platforms: linux/amd64,linux/arm64
19 changes: 12 additions & 7 deletions .github/workflows/lint-test.yml
@@ -5,21 +5,26 @@ on:
pull_request:
branches:
- dev

paths:
- '**.go'
- '**.mod'
jobs:
lint:
name: Lint Test
runs-on: ubuntu-latest
runs-on: ubuntu-latest-16-cores
steps:
- name: Checkout code
uses: actions/checkout@v3

- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.18
- name: Checkout code
uses: actions/checkout@v3
go-version: 1.19
cache: true

- name: Run golangci-lint
uses: golangci/golangci-lint-action@v3.3.1
uses: golangci/golangci-lint-action@v3.4.0
with:
version: latest
args: --timeout 5m
working-directory: .
working-directory: .
5 changes: 3 additions & 2 deletions .github/workflows/release-binary.yml
@@ -8,7 +8,7 @@ on:

jobs:
release:
runs-on: ubuntu-latest
runs-on: ubuntu-latest-16-cores
steps:
- name: "Check out code"
uses: actions/checkout@v3
@@ -18,7 +18,8 @@ jobs:
- name: "Set up Go"
uses: actions/setup-go@v3
with:
go-version: 1.18
go-version: 1.19
cache: true

- name: "Create release on GitHub"
uses: goreleaser/goreleaser-action@v4
4 changes: 2 additions & 2 deletions .github/workflows/security-crawl-maze-score.yaml
@@ -6,12 +6,12 @@ on:
jobs:
build:
name: Run Scoring
runs-on: ubuntu-latest
runs-on: ubuntu-latest-16-cores
steps:
- name: Set up Go
uses: actions/setup-go@v3
with:
go-version: 1.18
go-version: 1.19

- name: Check out code
uses: actions/checkout@v3
4 changes: 2 additions & 2 deletions Dockerfile
@@ -1,11 +1,11 @@
FROM golang:1.19.4-alpine AS builder
FROM golang:1.20.2-alpine AS builder
RUN apk add --no-cache git
WORKDIR /app
COPY . /app
RUN go mod download
RUN go build ./cmd/katana

FROM alpine:3.17.0
FROM alpine:3.17.2
RUN apk -U upgrade --no-cache \
&& apk add --no-cache bind-tools ca-certificates chromium
COPY --from=builder /app/katana /usr/local/bin/
68 changes: 50 additions & 18 deletions README.md
@@ -109,36 +109,43 @@ This will display help for the tool. Here are all the switches it supports.
Usage:
./katana [flags]

Katana is a fast crawler focused on execution in automation
pipelines offering both headless and non-headless crawling.

Flags:
INPUT:
-u, -list string[] target url / list to crawl

CONFIGURATION:
-d, -depth int maximum depth to crawl (default 2)
-r, -resolvers string[] list of custom resolver (file or comma separated)
-d, -depth int maximum depth to crawl (default 3)
-jc, -js-crawl enable endpoint parsing / crawling in javascript file
-ct, -crawl-duration int maximum duration to crawl the target for
-kf, -known-files string enable crawling of known files (all,robotstxt,sitemapxml)
-mrs, -max-response-size int maximum response size to read (default 2097152)
-mrs, -max-response-size int maximum response size to read (default 9223372036854775807)
-timeout int time to wait for request in seconds (default 10)
-aff, -automatic-form-fill enable optional automatic form filling (experimental)
-aff, -automatic-form-fill enable automatic form filling (experimental)
-retry int number of times to retry the request (default 1)
-proxy string http/socks5 proxy to use
-H, -headers string[] custom header/cookie to include in request
-config string path to the katana configuration file
-fc, -form-config string path to custom form configuration file
-flc, -field-config string path to custom field configuration file
-s, -strategy string Visit strategy (depth-first, breadth-first) (default "depth-first")

DEBUG:
-health-check, -hc run diagnostic check up
-elog, -error-log string file to write sent requests error log

HEADLESS:
-hl, -headless enable headless hybrid crawling (experimental)
-sc, -system-chrome use local installed chrome browser instead of katana installed
-sb, -show-browser show the browser on the screen with headless mode
-ho, -headless-options string[] start headless chrome with additional options
-nos, -no-sandbox start headless chrome in --no-sandbox mode
-scp, -system-chrome-path string use specified chrome binary path for headless crawling
-noi, -no-incognito start headless chrome without incognito mode
-hl, -headless enable headless hybrid crawling (experimental)
-sc, -system-chrome use local installed chrome browser instead of katana installed
-sb, -show-browser show the browser on the screen with headless mode
-ho, -headless-options string[] start headless chrome with additional options
-nos, -no-sandbox start headless chrome in --no-sandbox mode
-cdd, -chrome-data-dir string path to store chrome browser data
-scp, -system-chrome-path string use specified chrome browser for headless crawling
-noi, -no-incognito start headless chrome without incognito mode

SCOPE:
-cs, -crawl-scope string[] in scope url regex to be followed by crawler
@@ -148,8 +155,10 @@ SCOPE:
-do, -display-out-scope display external endpoint from scoped crawling

FILTER:
-f, -field string field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,key,value,kv,dir,udir)
-sf, -store-field string field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,key,value,kv,dir,udir)
-mr, -match-regex string[] regex or list of regex to match on output url (cli, file)
-fr, -filter-regex string[] regex or list of regex to filter on output url (cli, file)
-f, -field string field to display in output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
-sf, -store-field string field to store in per-host output (url,path,fqdn,rdn,rurl,qurl,qpath,file,ufile,key,value,kv,dir,udir)
-em, -extension-match string[] match output for given extension (eg, -em php,html,js)
-ef, -extension-filter string[] filter output for given extension (eg, -ef png,css)

@@ -160,13 +169,19 @@ RATE-LIMIT:
-rl, -rate-limit int maximum requests to send per second (default 150)
-rlm, -rate-limit-minute int maximum number of requests to send per minute

UPDATE:
-up, -update update katana to latest version
-duc, -disable-update-check disable automatic katana update check

OUTPUT:
-o, -output string file to write output to
-j, -json write output in JSONL(ines) format
-nc, -no-color disable output content coloring (ANSI escape codes)
-silent display output only
-v, -verbose display verbose output
-version display project version
-o, -output string file to write output to
-sr, -store-response store http requests/responses
-srd, -store-response-dir string store http requests/responses to custom directory
-j, -json write output in JSONL(ines) format
-nc, -no-color disable output content coloring (ANSI escape codes)
-silent display output only
-v, -verbose display verbose output
-version display project version
```

## Running Katana
@@ -485,6 +500,8 @@ CONFIGURATION:
-H, -headers string[] custom header/cookie to include in request
-config string path to the katana configuration file
-fc, -form-config string path to custom form configuration file
-flc, -field-config string path to custom field configuration file
-s, -strategy string Visit strategy (depth-first, breadth-first) (default "depth-first")
```

## Filters
@@ -627,6 +644,21 @@ Here are additional filter options -
```


*`-match-regex`*
---
The `-match-regex` or `-mr` flag allows you to filter output URLs using regular expressions. When using this flag, only URLs that match the specified regular expression will be printed in the output.

```
katana -u https://tesla.com -mr 'https://shop\.tesla\.com/*' -silent
```
*`-filter-regex`*
---
The `-filter-regex` or `-fr` flag allows you to filter output URLs using regular expressions. When using this flag, URLs that match the specified regular expression are skipped.

```
katana -u https://tesla.com -fr 'https://www\.tesla\.com/*' -silent
```

## Rate Limit

It's easy to get blocked or banned while crawling if you do not respect the target website's limits. Katana comes with multiple options to tune the crawl to go as fast or as slow as needed.
6 changes: 5 additions & 1 deletion cmd/integration-test/library.go
@@ -1,8 +1,11 @@
package main

import (
"math"

"github.com/projectdiscovery/katana/pkg/engine/standard"
"github.com/projectdiscovery/katana/pkg/types"
"github.com/projectdiscovery/katana/pkg/utils/queue"
)

var libraryTestcases = map[string]TestCase{
@@ -17,9 +20,10 @@ func (h *goIntegrationTest) Execute() error {
options := &types.Options{
MaxDepth: 1,
FieldScope: "rdn",
BodyReadSize: 2 * 1024 * 1024,
BodyReadSize: math.MaxInt,
RateLimit: 150,
Verbose: debug,
Strategy: queue.DepthFirst.String(),
}
crawlerOptions, err := types.NewCrawlerOptions(options)
if err != nil {