Pings getting 403 responses in "interesting" ways #11753

Open
ianw opened this issue Nov 8, 2024 · 9 comments
Labels
Needed: design decision A core team decision is required

Comments

@ianw

ianw commented Nov 8, 2024

Details

This is a follow-up to #11733 I guess, but it is driving me a bit nuts :)

I think this is Cloudflare doing something, but the way you send the headers, and the User-Agent, appears to matter to whether a ping to the webhooks succeeds. I have removed the auth token below, but in all cases it is exactly the same.

If you test with the following

import base64
import http.client
import urllib.request

# Print the request and response lines that http.client sends and receives.
http.client.HTTPConnection.debuglevel = 1

url = 'https://readthedocs.org/api/v2/webhook/gerrit-dash-creator/43048/'

auth_user = 'openstackci'
auth_passwd = '<thepassword>'

req = urllib.request.Request(url, method='POST')
base64string = base64.b64encode(bytes(f'{auth_user}:{auth_passwd}', 'ascii'))
req.add_header("Authorization", f'Basic {base64string.decode()}')
# Uncomment to replace the default Python-urllib User-Agent.
#req.add_header("User-Agent", 'curl/8.6.0')
# urlopen raises urllib.error.HTTPError when the server answers 403.
with urllib.request.urlopen(req) as response:
    print(response.read())

If you use the default Python UA, the request gets a 403:

send: b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: readthedocs.org\r\nUser-Agent: Python-urllib/3.12\r\nAuthorization: Basic BLAH==\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden\r\n'

but if you fake the UA as curl/8.6.0, it works:

send: b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: readthedocs.org\r\nAuthorization: Basic BLAH==\r\nUser-Agent: curl/8.6.0\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'

I can only assume cloudflare is filtering this for some reason? I thought that the UA filtering might be the cause of our Ansible pings that stopped working sometime between 2024-09-19T11:08:21Z (our last successful build) and 2024-09-23T21:32:29Z (the first observed failure for a previously working project), but it seems not quite that simple...
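The User-Agent comparison above can be condensed into a small helper. This is only a sketch under assumptions: `build_ping`, `ping_status` and `compare_user_agents` are hypothetical names made up for illustration, and the credentials are placeholders. It catches `urllib.error.HTTPError` because `urlopen` raises on a 403 instead of returning the response.

```python
import base64
import urllib.error
import urllib.request


def build_ping(url, user, password, user_agent=None):
    """Build the authenticated POST, optionally overriding the User-Agent."""
    token = base64.b64encode(f"{user}:{password}".encode("ascii")).decode("ascii")
    req = urllib.request.Request(url, method="POST")
    req.add_header("Authorization", f"Basic {token}")
    if user_agent is not None:
        # Without this header urllib sends its default "Python-urllib/X.Y".
        req.add_header("User-Agent", user_agent)
    return req


def ping_status(req, opener=urllib.request.urlopen):
    """Return the HTTP status code, treating 4xx/5xx as data, not exceptions."""
    try:
        with opener(req) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def compare_user_agents(url, user, password, agents, opener=urllib.request.urlopen):
    """Send one ping per User-Agent (None = urllib default); map UA -> status."""
    return {
        ua: ping_status(build_ping(url, user, password, ua), opener)
        for ua in agents
    }
```

Calling `compare_user_agents(url, 'openstackci', '<thepassword>', [None, 'curl/8.6.0'])` with real credentials would repeat the two requests shown above. The `opener` parameter exists only so the helper can be exercised without touching the network.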

As I mentioned in the prior issue, we use the Ansible uri module to ping (https://docs.ansible.com/ansible/latest/collections/ansible/builtin/uri_module.html) where there is something even more bizarre going on.

I have instrumented the URI call to dump what it is sending/getting back. I have only removed the "set-cookie" values below in case they give something away.

When I run from a Debian bookworm container, it fails with a 403

('send:', "b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\\r\\nAccept-Encoding: identity\\r\\nContent-Length: 0\\r\\nHost: readthedocs.org\\r\\nUser-Agent: ansible-httpget\\r\\nAuthorization: Basic BLAH\\r\\nConnection: close\\r\\n\\r\\n'")
('reply:', "'HTTP/1.1 403 Forbidden\\r\\n'")
('header:', 'Date:', 'Fri, 08 Nov 2024 10:20:41 GMT')
('header:', 'Content-Type:', 'text/html; charset=UTF-8')
('header:', 'Content-Length:', '4518')
('header:', 'Connection:', 'close')
('header:', 'X-Frame-Options:', 'SAMEORIGIN')
('header:', 'Referrer-Policy:', 'same-origin')
('header:', 'Cache-Control:', 'max-age=15')
('header:', 'Expires:', 'Fri, 08 Nov 2024 10:20:56 GMT')
('header:', 'Set-Cookie:', ...
('header:', 'Vary:', 'Accept-Encoding')
('header:', 'Set-Cookie:', ...
('header:', 'Server:', 'cloudflare')
('header:', 'CF-RAY:', '8df4d4f6cf17e69b-MEL')
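The two header dumps differ in a way a script can check: the blocked response is an HTML page from the edge, while the successful one is JSON from the backend. The sketch below is a heuristic based only on the headers seen in this issue, not a documented Cloudflare interface. `describe_response` is a hypothetical helper name, and the heuristic cannot tell which Cloudflare rule produced the block.

```python
def describe_response(status, headers):
    """Guess whether a 403 came from the Cloudflare edge rather than the app.

    Assumption: an edge block is a 403 with "Server: cloudflare" and an HTML
    body, whereas the webhook itself answers with application/json.
    """
    normalized = {key.lower(): value for key, value in headers.items()}
    edge_block = (
        status == 403
        and normalized.get("server", "").lower() == "cloudflare"
        and normalized.get("content-type", "").lower().startswith("text/html")
    )
    # The Ray ID is what the site operators need to look the request up.
    return {"edge_block": edge_block, "cf_ray": normalized.get("cf-ray")}
```

With `urllib`, the status and headers of a failed request are available on the caught `urllib.error.HTTPError` as `err.code` and `err.headers`.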

When I run from a Fedora container, AFAICT at the last point I can trace it in Python, it sends exactly the same thing, but the call succeeds.

('send:', "b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\\r\\nAccept-Encoding: identity\\r\\nContent-Length: 0\\r\\nHost: readthedocs.org\\r\\nUser-Agent: ansible-httpget\\r\\nAuthorization: Basic BLAH\\r\\nConnection: close\\r\\n\\r\\n'")
('reply:', "'HTTP/1.1 200 OK\\r\\n'")
('header:', 'Date:', 'Fri, 08 Nov 2024 10:39:04 GMT')
('header:', 'Content-Type:', 'application/json')
('header:', 'Content-Length:', '78')
('header:', 'Connection:', 'close')
('header:', 'allow:', 'POST, OPTIONS')
('header:', 'vary:', 'Accept, Accept-Language, Cookie')
('header:', 'content-security-policy:', "object-src 'none'; frame-ancestors 'none'")
('header:', 'x-frame-options:', 'DENY')
('header:', 'x-content-type-options:', 'nosniff')
('header:', 'referrer-policy:', 'strict-origin-when-cross-origin')
('header:', 'cross-origin-opener-policy:', 'same-origin')
('header:', 'content-language:', 'en')
('header:', 'strict-transport-security:', 'max-age=31536000;')
('header:', 'x-backend:', 'web-i-01af423c9fbfa39d9')
('header:', 'CF-Cache-Status:', 'DYNAMIC')
('header:', 'Set-Cookie:', ...
('header:', 'Set-Cookie:', ...
('header:', 'Server:', 'cloudflare')
('header:', 'CF-RAY:', '8df4efe00d993056-MEL')

If you'd like to replicate this, put the following into a file /tmp/test.yaml (substituting a project and auth details that work, obviously):

- hosts: localhost
  connection: local
  tasks:
    - name: Upload to RTD
      block:
        - name: Trigger readthedocs build webhook via authentication
          uri:
            method: POST
            url: 'https://readthedocs.org/api/v2/webhook/gerrit-dash-creator/43048/'
            user: 'openstackci'
            password: '<password>'
            force_basic_auth: yes

Then, for each of fedora:latest, ubuntu:noble and debian:bookworm in place of CONTAINER, run

sudo podman run --rm -v /tmp/test.yaml:/tmp/test.yaml:Z -it CONTAINER /bin/bash

... install python3/python3-venv, depending on the distro, and run

$ python3 -m venv /tmp/venv
$ /tmp/venv/bin/pip install ansible
$ /tmp/venv/bin/ansible-playbook -i localhost, /tmp/test.yaml

It feels like CF must be somehow fingerprinting ... something ... even more than the UA or header order, etc?

Is there any way to tell what this is filtering on? I included the "RAY-ID" ... is it possible to tell why this is being filtered from that?

@humitos
Member

humitos commented Nov 11, 2024

> It feels like CF must be somehow fingerprinting

IIRC there is some configuration on CF around fingerprinting, mainly to keep out spam and AI crawlers. We did some of this work around August: https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/

@humitos humitos added the Needed: design decision A core team decision is required label Nov 12, 2024
@agjohnson
Contributor

These requests were not blocked by one of our own explicit rules; the block comes from Cloudflare's browser integrity check. All of the requests failed for this same reason. I'm not familiar with what this check looks at to determine "integrity", though.

@ianw
Author

ianw commented Nov 28, 2024

> browser integrity check.

Thanks for confirming! https://developers.cloudflare.com/waf/tools/browser-integrity-check/ is predictably vague :)

One other thing I don't think I mentioned is that when I put mitmproxy in between to try to trace what was on the wire, the request didn't fail. This is why the send in Python is about the last place I can practically see it before it becomes TLS. I think it must be looking at more than UA and headers ... but who knows.

BTW we switched from using Ansible's built-in uri: module to making an external call to curl with https://review.opendev.org/c/zuul/zuul-jobs/+/934243 ...
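For anyone scripting the same workaround outside Ansible, shelling out to curl might look roughly like this. It is a sketch, not the content of the zuul-jobs change linked above: `curl_ping_argv` and `curl_ping` are hypothetical names, and the only curl options used are the standard `--silent`, `--show-error`, `--fail`, `--request` and `--user`.

```python
import subprocess


def curl_ping_argv(url, user, password):
    """Build the curl command line for an authenticated, body-less POST."""
    return [
        "curl",
        "--silent", "--show-error",  # no progress meter, but keep error text
        "--fail",                    # non-zero exit status on HTTP errors such as 403
        "--request", "POST",
        # Caveat: credentials passed this way are visible in the process list.
        "--user", f"{user}:{password}",
        url,
    ]


def curl_ping(url, user, password):
    """Run curl and return the CompletedProcess; returncode 0 means success."""
    return subprocess.run(
        curl_ping_argv(url, user, password),
        capture_output=True,
        text=True,
        check=False,
    )
```

This moves the TLS handshake from Python's `ssl` module to whatever TLS library the installed curl is built against, which is presumably why it behaves differently.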

@tsibley

tsibley commented Dec 4, 2024

I dug into this a bit, and it seems Cloudflare is assessing (among other things, I'm sure) the TLS ciphers offered in the client hello during the initial TLS exchange. When it determines they're not up to snuff (i.e. old/insecure), Cloudflare is serving a 403 with an HTML body that requires JS to execute in a browser to pass a "browser challenge".
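To compare what different containers would offer without capturing packets, Python can report the cipher list of its default TLS context. This is a sketch assuming the client uses `ssl.create_default_context()`; Ansible may configure its own context, and the real ClientHello also carries extensions that this does not show. `tls_fingerprint_inputs` is a hypothetical helper name.

```python
import ssl


def tls_fingerprint_inputs(context=None):
    """Summarise the TLS settings a default Python client context would use."""
    ctx = context if context is not None else ssl.create_default_context()
    return {
        # Library version Python is linked against; may differ per distro.
        "openssl": ssl.OPENSSL_VERSION,
        "min_version": ctx.minimum_version.name,
        "max_version": ctx.maximum_version.name,
        # Enabled cipher suites, in the context's priority order.
        "ciphers": [cipher["name"] for cipher in ctx.get_ciphers()],
    }
```

Running this in the debian:bookworm and fedora:latest containers and diffing the two dictionaries would show whether the enabled cipher suites differ between them.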

@ianw
Author

ianw commented Dec 4, 2024

> I dug into this a bit, and it seems Cloudflare is assessing (among other things, I'm sure) the TLS ciphers offered in the client hello during the initial TLS exchange. When it determines they're not up to snuff (i.e. old/insecure), Cloudflare is serving a 403 with an HTML body that requires JS to execute in a browser to pass a "browser challenge".

That's interesting! Just out of interest, how did you get those handshake dumps? One problem I had was that when I put mitmproxy in the middle it started working -- which makes some sense in a hand-wavy way as it's then terminating to a different SSL implementation from the venv I installed mitmproxy in, rather than the Ansible that's on the "other" side of it (although I feel like they'd be very similar, I didn't really check to the level of what openssl or cryptography wheel it was linked to...)

It's interesting just for the sake of being interesting and the RE challenge, but ultimately it seems unlikely that cloudflare are going to give us a clear list of instructions on how to essentially defeat their checks 😄

It seems unlikely that the cause of problems for RTD is bots hitting these endpoints; from the blog post it seemed to be about general scraping. Perhaps there's some way to make the checks on the webhook endpoints a little less restrictive seeing as they are hit from such a wide range of varying automation things?

@agjohnson
Contributor

> Perhaps there's some way to make the checks on the webhook endpoints a little less restrictive seeing as they are hit from such a wide range of varying automation things?

This is normally a domain level configuration, but I just tried adding a configuration rule disabling browser integrity checks for requests to our APIs. Does this help the requests?

@ianw
Author

ianw commented Dec 5, 2024

> This is normally a domain level configuration, but I just tried adding a configuration rule disabling browser integrity checks for requests to our APIs. Does this help the requests?

I just reran the test from above: the ping from Ansible's uri: module in a debian:bookworm container got a 403, with "cf_ray": "8ecfbd67eaeaf0d0-MEL". The same thing passed in a fedora:latest container (f41).

@agjohnson
Contributor

Well, the good news is that the request did avoid the browser integrity check, but the bad news is that now the requests are just being flagged and blocked as AI bot traffic. This is by Cloudflare's managed rules for bot detection, which is the configuration we've enabled to combat abusive LLM bots/companies and to a lesser extent, API scraping.

While the browser integrity check can easily be disabled, the AI bot detection can't be disabled without opening up some holes for harmful bot traffic to get through.

I'd have to think more about a potential workaround here; I don't have any great answers at the moment.

@tsibley

tsibley commented Dec 5, 2024

@ianw

> Just out of interest, how did you get those handshake dumps?

I used tcpdump and then opened up the pcap files in Wireshark. I also set SSLKEYLOGFILE when making the requests so I could decrypt the TLS application traffic in Wireshark without interposing something like mitmproxy.
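For requests made from Python, the same key logging can also be switched on per context instead of through the environment. This is a sketch under assumptions: `opener_with_keylog` is a hypothetical helper, and `SSLContext.keylog_filename` needs Python 3.8+ built against an OpenSSL with key-log support.

```python
import ssl
import urllib.request


def opener_with_keylog(keylog_path):
    """Return (opener, context) whose TLS session secrets go to keylog_path.

    The file uses the same format as SSLKEYLOGFILE, so Wireshark can use it
    to decrypt a capture taken while the request runs, for example one made
    with: tcpdump -w ping.pcap host readthedocs.org and port 443
    (an assumed invocation, not the one used in this thread).
    """
    ctx = ssl.create_default_context()
    ctx.keylog_filename = keylog_path
    handler = urllib.request.HTTPSHandler(context=ctx)
    return urllib.request.build_opener(handler), ctx
```

Requests sent with `opener.open(req)` keep Python's own TLS stack on the wire, so the handshake Cloudflare sees should be the one under investigation rather than a proxy's.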

I use mitmproxy all the time, but it does add another TLS/network stack to the mix and for this kind of thing that can change behavior as you saw.
