Bug: Missing content_length in JSON output if server response lacks Content-Length header #1032

Open
Sab0tag3d opened this issue Sep 17, 2024 · 1 comment · Fixed by #1033
Labels: Status: Completed (Nothing further to be done with this issue. Awaiting to be closed.) · Type: Bug (Inconsistencies or issues which will cause an issue or problem for users or implementors.)

Comments

@Sab0tag3d

Katana Version: v1.1.0 (latest)

Current Behavior:

When using Katana with JSON output mode, the content_length field is not present if the server response does not explicitly include the Content-Length header.

Example:

Command:

katana -u https://gmail.com -fs fqdn -j | grep content_length

Output:

   __        __                
  / /_____ _/ /____ ____  ___ _
 /  '_/ _  / __/ _  / _ \/ _  /
/_/\_\\_,_/\__/\_,_/_//_/\_,_/                  

                projectdiscovery.io

[INF] Current katana version v1.1.0 (latest)
[INF] Started standard crawling for => https://gmail.com
➜  ~ 

However, when piping the same URLs through HTTPX, the content_length field is populated even if the Content-Length header is absent.

Example with HTTPX:

Command:

katana -u https://gmail.com | httpx -j | grep content_length

Output:

{"timestamp":"2024-09-17T12:59:02.204112+02:00","cdn_name":"google","cdn_type":"cdn","url":"https://gmail.com","status_code":301,"content_length":230}

Expected Behavior:

HTTPX handles missing Content-Length headers by calculating the content length from the response body, ensuring the content_length field is present in the JSON output. This behavior is achieved through logic like this:

// if content length is not defined
if resp.ContentLength <= 0 {
	// check if it's in the header and convert to int
	// (strconv.Atoi returns a value and an error, so both must be handled)
	if contentLength, ok := resp.Headers["Content-Length"]; ok && len(contentLength) > 0 {
		if n, err := strconv.Atoi(contentLength[0]); err == nil {
			resp.ContentLength = n
		}
	}

	// use response body length if the content length is still zero
	if resp.ContentLength <= 0 && len(respbody) > 0 {
		resp.ContentLength = len(respbody)
	}
}

In contrast, Katana does not appear to include a similar mechanism for ensuring content_length is always provided.

Katana Code Snippets:

In pkg/engine/standard/crawl.go, Katana seems to set resp.ContentLength based solely on the response data length:

resp.ContentLength = int64(len(data))

And in pkg/navigation/response.go, the Response struct doesn't include a content_length field:

type Response struct {
	// Other fields
	StatusCode         int               `json:"status_code,omitempty"`
	Headers            Headers           `json:"headers,omitempty"`
	Body               string            `json:"body,omitempty"`
}
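
As a rough illustration of what exposing the field could look like, the sketch below uses a hypothetical `exampleResponse` struct (not Katana's real `Response` type) with a `content_length` JSON tag. It assumes the tag is written without `omitempty`, so that a length of 0 is still emitted; that matters if the goal is to detect empty bodies.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// exampleResponse is a hypothetical stand-in for a crawler response record.
// content_length deliberately has no omitempty, so 0 is serialized too.
type exampleResponse struct {
	StatusCode    int    `json:"status_code,omitempty"`
	ContentLength int64  `json:"content_length"`
	Body          string `json:"body,omitempty"`
}

// marshalWithLength fills ContentLength from the body size and returns the JSON.
func marshalWithLength(statusCode int, body string) (string, error) {
	resp := exampleResponse{
		StatusCode:    statusCode,
		ContentLength: int64(len(body)),
		Body:          body,
	}
	out, err := json.Marshal(resp)
	return string(out), err
}

func main() {
	out, err := marshalWithLength(301, "")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"status_code":301,"content_length":0}
}
```

If `omitempty` were added to the `content_length` tag, the empty-body case would drop the field again, which is the behavior this issue reports.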

Why This Matters:

The content_length field is essential for understanding the structure of a website’s response, particularly to identify whether the response body is empty. Including this data in the output helps users assess the content and size of a response even when the server omits the Content-Length header.

Request:

Could you please add logic to Katana to calculate and include the content_length in the JSON output, similar to how HTTPX handles it? This would greatly improve the utility of Katana’s output when crawling websites without Content-Length headers.

@Sab0tag3d Sab0tag3d added the Type: Bug Inconsistencies or issues which will cause an issue or problem for users or implementors. label Sep 17, 2024
@dogancanbakir dogancanbakir self-assigned this Sep 17, 2024
@Sab0tag3d
Author

Here is just an example of how it could be fixed:
[image]

@dogancanbakir dogancanbakir linked a pull request Sep 18, 2024 that will close this issue
@ehsandeep ehsandeep added the Status: Completed Nothing further to be done with this issue. Awaiting to be closed. label Oct 31, 2024