Pings getting 403 responses in "interesting" ways #11753

ianw · 2024-11-08T11:07:25Z

Details

This is a follow-up to #11733 I guess, but it is driving me a bit nuts :)

I think this is Cloudflare doing something, but the way you send the headers, and the User-Agent, appear to matter to whether the webhook ping succeeds. I have removed the auth token below, but in all cases it is exactly the same.

If you test with the following:

import base64
import http.client
import urllib.request

# Print the raw request/response lines that http.client sends and receives
http.client.HTTPConnection.debuglevel = 1

url = 'https://readthedocs.org/api/v2/webhook/gerrit-dash-creator/43048/'

auth_user = 'openstackci'
auth_passwd = '<thepassword>'

# An empty POST with HTTP basic auth is all the webhook needs
req = urllib.request.Request(url, method='POST')
base64string = base64.b64encode(bytes(f'{auth_user}:{auth_passwd}', 'ascii'))
req.add_header("Authorization", f'Basic {base64string.decode()}')
# Uncomment to replace the default Python-urllib User-Agent with curl's
#req.add_header("User-Agent", 'curl/8.6.0')
with urllib.request.urlopen(req) as response:
    print(response.read())

If you use the default Python User-Agent, it gets a 403:

send: b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: readthedocs.org\r\nUser-Agent: Python-urllib/3.12\r\nAuthorization: Basic BLAH==\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 403 Forbidden\r\n'

but if you fake it as curl/8.6.0, it works:

send: b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: readthedocs.org\r\nAuthorization: Basic BLAH==\r\nUser-Agent: curl/8.6.0\r\nConnection: close\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'

I can only assume Cloudflare is filtering this for some reason. I thought UA filtering might explain why our Ansible pings stopped working sometime between 2024-09-19T11:08:21Z (our last successful build) and 2024-09-23T21:32:29Z (the first observed failure for a previously working project), but it seems it is not quite that simple...

As I mentioned in the prior issue, we use the Ansible uri module to ping (https://docs.ansible.com/ansible/latest/collections/ansible/builtin/uri_module.html), and there something even more bizarre is going on.

I have instrumented the URI call to dump what it is sending/getting back. I have only removed the "set-cookie" values below in case they give something away.

When I run from a Debian bookworm container, it fails with a 403:

('send:', "b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\\r\\nAccept-Encoding: identity\\r\\nContent-Length: 0\\r\\nHost: readthedocs.org\\r\\nUser-Agent: ansible-httpget\\r\\nAuthorization: Basic BLAH\\r\\nConnection: close\\r\\n\\r\\n'")
('reply:', "'HTTP/1.1 403 Forbidden\\r\\n'")
('header:', 'Date:', 'Fri, 08 Nov 2024 10:20:41 GMT')
('header:', 'Content-Type:', 'text/html; charset=UTF-8')
('header:', 'Content-Length:', '4518')
('header:', 'Connection:', 'close')
('header:', 'X-Frame-Options:', 'SAMEORIGIN')
('header:', 'Referrer-Policy:', 'same-origin')
('header:', 'Cache-Control:', 'max-age=15')
('header:', 'Expires:', 'Fri, 08 Nov 2024 10:20:56 GMT')
('header:', 'Set-Cookie:', ...
('header:', 'Vary:', 'Accept-Encoding')
('header:', 'Set-Cookie:', ...
('header:', 'Server:', 'cloudflare')
('header:', 'CF-RAY:', '8df4d4f6cf17e69b-MEL')

When I run from a Fedora container, it sends exactly the same thing as far as I can tell (at the last point I can trace it in Python), but the call succeeds:

('send:', "b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\\r\\nAccept-Encoding: identity\\r\\nContent-Length: 0\\r\\nHost: readthedocs.org\\r\\nUser-Agent: ansible-httpget\\r\\nAuthorization: Basic BLAH\\r\\nConnection: close\\r\\n\\r\\n'")
('reply:', "'HTTP/1.1 200 OK\\r\\n'")
('header:', 'Date:', 'Fri, 08 Nov 2024 10:39:04 GMT')
('header:', 'Content-Type:', 'application/json')
('header:', 'Content-Length:', '78')
('header:', 'Connection:', 'close')
('header:', 'allow:', 'POST, OPTIONS')
('header:', 'vary:', 'Accept, Accept-Language, Cookie')
('header:', 'content-security-policy:', "object-src 'none'; frame-ancestors 'none'")
('header:', 'x-frame-options:', 'DENY')
('header:', 'x-content-type-options:', 'nosniff')
('header:', 'referrer-policy:', 'strict-origin-when-cross-origin')
('header:', 'cross-origin-opener-policy:', 'same-origin')
('header:', 'content-language:', 'en')
('header:', 'strict-transport-security:', 'max-age=31536000;')
('header:', 'x-backend:', 'web-i-01af423c9fbfa39d9')
('header:', 'CF-Cache-Status:', 'DYNAMIC')
('header:', 'Set-Cookie:', ...
('header:', 'Set-Cookie:', ...
('header:', 'Server:', 'cloudflare')
('header:', 'CF-RAY:', '8df4efe00d993056-MEL')
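Eyeballing long debug lines is error-prone, so here is a minimal sketch that parses the "send:" dumps above and reports what actually differs between two requests. The payloads are copied from the traces in this issue (the Ansible ones unescaped from their tuple repr); the helper names parse_head and diff_heads are hypothetical. It shows that the urllib-vs-curl pair differs only in the User-Agent value and header order, while the Debian and Fedora Ansible requests are identical at the HTTP layer, which points at something below HTTP.

```python
import ast

# Request heads copied from the debug traces in this issue
# (Authorization values were already redacted there).
PY_DEFAULT = r"b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: readthedocs.org\r\nUser-Agent: Python-urllib/3.12\r\nAuthorization: Basic BLAH==\r\nConnection: close\r\n\r\n'"
PY_CURL_UA = r"b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: readthedocs.org\r\nAuthorization: Basic BLAH==\r\nUser-Agent: curl/8.6.0\r\nConnection: close\r\n\r\n'"
ANSIBLE_DEBIAN = r"b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: readthedocs.org\r\nUser-Agent: ansible-httpget\r\nAuthorization: Basic BLAH\r\nConnection: close\r\n\r\n'"
ANSIBLE_FEDORA = r"b'POST /api/v2/webhook/gerrit-dash-creator/43048/ HTTP/1.1\r\nAccept-Encoding: identity\r\nContent-Length: 0\r\nHost: readthedocs.org\r\nUser-Agent: ansible-httpget\r\nAuthorization: Basic BLAH\r\nConnection: close\r\n\r\n'"


def parse_head(dump: str) -> tuple[str, list[tuple[str, str]]]:
    """Turn a debug 'send:' dump into (request line, ordered header list)."""
    raw: bytes = ast.literal_eval(dump)  # the dump is a bytes literal
    head = raw.decode("latin-1").split("\r\n\r\n", 1)[0]
    request_line, *lines = head.split("\r\n")
    headers = []
    for line in lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return request_line, headers


def diff_heads(a: str, b: str) -> list[str]:
    """Human-readable differences between two request heads."""
    line_a, hdrs_a = parse_head(a)
    line_b, hdrs_b = parse_head(b)
    diffs = []
    if line_a != line_b:
        diffs.append(f"request line: {line_a!r} != {line_b!r}")
    vals_a, vals_b = dict(hdrs_a), dict(hdrs_b)
    # Compare header values, ignoring order
    for name in sorted(vals_a.keys() | vals_b.keys()):
        if vals_a.get(name) != vals_b.get(name):
            diffs.append(f"{name}: {vals_a.get(name)!r} != {vals_b.get(name)!r}")
    # Then compare header order separately
    order_a = [n for n, _ in hdrs_a]
    order_b = [n for n, _ in hdrs_b]
    if order_a != order_b:
        diffs.append(f"header order: {order_a} != {order_b}")
    return diffs


print(diff_heads(PY_DEFAULT, PY_CURL_UA))
print(diff_heads(ANSIBLE_DEBIAN, ANSIBLE_FEDORA))
```

An empty list for the Ansible pair means the request line, header values and header order all match, so whatever Cloudflare keys on here is not visible in these dumps.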

If you'd like to replicate this, put the following into /tmp/test.yaml (substituting a project and auth details that actually work, obviously):

- hosts: localhost
  connection: local
  tasks:
    - name: Upload to RTD
      block:
        - name: Trigger readthedocs build webhook via authentication
          uri:
            method: POST
            url: 'https://readthedocs.org/api/v2/webhook/gerrit-dash-creator/43048/'
            user: 'openstackci'
            password: '<password>'
            force_basic_auth: yes

Then run it in each of fedora:latest, ubuntu:noble and debian:bookworm:

sudo podman run --rm -v /tmp/test.yaml:/tmp/test.yaml:Z -it CONTAINER /bin/bash

Inside the container, install python3/python3-venv (depending on the distro) and run:

$ python3 -m venv /tmp/venv
$ /tmp/venv/bin/pip install ansible
$ /tmp/venv/bin/ansible-playbook -i localhost, /tmp/test.yaml

It feels like CF must somehow be fingerprinting ... something ... beyond the UA, header order, etc.?

Is there any way to tell what this is filtering on? I included the CF-RAY ID ... is it possible to tell from that why this is being filtered?
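The CF-RAY value identifies a request on Cloudflare's side, so it is worth recording with every failed ping; only the site operators can look up which rule fired. A minimal sketch of a helper that pulls the relevant bits out of a response's headers, using trimmed header lists from the two traces above. The function name summarize_response is hypothetical, and treating the presence of x-backend as "the request reached Read the Docs' backend" is an assumption read off those two traces (only the 200 response carries it), not documented behaviour.

```python
def summarize_response(status: int, headers) -> dict:
    """Summarize the Cloudflare-relevant bits of a response for a bug report.

    `headers` is any iterable of (name, value) pairs, e.g.
    HTTPResponse.getheaders() or HTTPError.headers.items().
    """
    lower = {name.lower(): value for name, value in headers}
    server = lower.get("server", "")
    # Assumption: only responses from the origin carry x-backend
    reached_origin = "x-backend" in lower
    return {
        "status": status,
        "cf_ray": lower.get("cf-ray"),
        "server": server,
        "reached_origin": reached_origin,
        # A 403 from Cloudflare that never reached the origin was blocked at the edge
        "likely_edge_block": status == 403
        and server.lower() == "cloudflare"
        and not reached_origin,
    }


# Headers trimmed from the Debian (403) and Fedora (200) traces above
BLOCKED = [
    ("Content-Type", "text/html; charset=UTF-8"),
    ("Server", "cloudflare"),
    ("CF-RAY", "8df4d4f6cf17e69b-MEL"),
]
PASSED = [
    ("Content-Type", "application/json"),
    ("x-backend", "web-i-01af423c9fbfa39d9"),
    ("Server", "cloudflare"),
    ("CF-RAY", "8df4efe00d993056-MEL"),
]

print(summarize_response(403, BLOCKED))
print(summarize_response(200, PASSED))
```

With urllib, a 403 raises urllib.error.HTTPError, so the blocked case would be summarize_response(err.code, err.headers.items()). The helper can only say that the edge blocked the request, not which Cloudflare check did it.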


humitos · 2024-11-11T11:54:12Z

It feels like CF must be somehow fingerprinting

IIRC there is some configuration on CF for fingerprinting, mainly to avoid spam and AI crawlers. We did some of this work around August: https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/

agjohnson · 2024-11-27T23:35:31Z

These requests were not blocked by one of our own explicit rules; the block comes from Cloudflare's browser integrity check. All of the requests failed for this same reason. I'm not familiar with what this check looks at to determine "integrity", though.

ianw · 2024-11-28T00:11:42Z

browser integrity check.

Thanks for confirming! https://developers.cloudflare.com/waf/tools/browser-integrity-check/ is predictably vague :)

One other thing I don't think I mentioned: when I put mitmproxy in between to try to trace what was on the wire, the request didn't fail. That's why the send in Python is about the last place I can practically see the request before it becomes TLS. I think it must be looking at more than the UA and headers ... but who knows.

BTW we switched from Ansible's built-in uri: module to an external call to curl with https://review.opendev.org/c/zuul/zuul-jobs/+/934243 ...

tsibley · 2024-12-04T19:21:19Z

I dug into this a bit, and it seems Cloudflare is assessing (among other things, I'm sure) the TLS ciphers offered in the client hello during the initial TLS exchange. When it determines they're not up to snuff (i.e. old/insecure), Cloudflare is serving a 403 with an HTML body that requires JS to execute in a browser to pass a "browser challenge".
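The cipher suites in the client hello come from the TLS library Python is linked against, not from Ansible, which would explain how the same HTTP request behaves differently in a Debian and a Fedora container. A minimal sketch for comparing containers: print the OpenSSL version and the suites the default client context offers, plus a short digest of the ordered list. The helper names are hypothetical, and the digest is NOT a JA3/JA4 fingerprint; those also cover TLS versions, extensions and groups from the real ClientHello, which get_ciphers() does not expose.

```python
import hashlib
import ssl
from typing import Optional


def offered_cipher_names(ctx: Optional[ssl.SSLContext] = None) -> list[str]:
    """Cipher suites this context is configured to offer, in order."""
    if ctx is None:
        ctx = ssl.create_default_context()
    return [c["name"] for c in ctx.get_ciphers()]


def cipher_list_digest(names: list[str]) -> str:
    """Short digest of the ordered suite list, for comparing machines at a glance.

    Not a JA3/JA4 fingerprint: those also include extensions, groups, etc.
    """
    return hashlib.sha256(",".join(names).encode()).hexdigest()[:16]


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    names = offered_cipher_names()
    print(len(names), "suites, digest", cipher_list_digest(names))
    for name in names:
        print("  ", name)
```

Running this with the same interpreter Ansible uses in each container and comparing the digests is a quick first check; if they match, the difference lies elsewhere in the handshake and needs a packet capture.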

ianw · 2024-12-04T23:12:32Z

I dug into this a bit, and it seems Cloudflare is assessing (among other things, I'm sure) the TLS ciphers offered in the client hello during the initial TLS exchange. When it determines they're not up to snuff (i.e. old/insecure), Cloudflare is serving a 403 with an HTML body that requires JS to execute in a browser to pass a "browser challenge".

That's interesting! Just out of interest, how did you get those handshake dumps? One problem I had was that when I put mitmproxy in the middle it started working -- which makes some sense in a hand-wavy way as it's then terminating to a different SSL implementation from the venv I installed mitmproxy in, rather than the Ansible that's on the "other" side of it (although I feel like they'd be very similar, I didn't really check to the level of what openssl or cryptography wheel it was linked to...)

It's interesting just for the sake of being interesting and the RE challenge, but ultimately it seems unlikely that cloudflare are going to give us a clear list of instructions on how to essentially defeat their checks 😄

It seems unlikely that RTD's problems come from bots hitting these endpoints; the blog post seemed to be about general scraping. Perhaps there's some way to make the checks on the webhook endpoints a little less restrictive, seeing as they are hit from such a wide variety of automation tools?

agjohnson · 2024-12-04T23:53:25Z

Perhaps there's some way to make the checks on the webhook endpoints a little less restrictive seeing as they are hit from such a wide range of varying automation things?

This is normally a domain level configuration, but I just tried adding a configuration rule disabling browser integrity checks for requests to our APIs. Does this help the requests?

ianw · 2024-12-05T00:02:15Z

This is normally a domain level configuration, but I just tried adding a configuration rule disabling browser integrity checks for requests to our APIs. Does this help the requests?

I just tried the test from above: running the ping from Ansible's uri: module in a debian:bookworm container gave me a 403, with "cf_ray": "8ecfbd67eaeaf0d0-MEL". The same thing passed in a fedora:latest (f41) container.

agjohnson · 2024-12-05T00:42:10Z

Well, the good news is that the request did avoid the browser integrity check, but the bad news is that now the requests are just being flagged and blocked as AI bot traffic. This is by Cloudflare's managed rules for bot detection, which is the configuration we've enabled to combat abusive LLM bots/companies and to a lesser extent, API scraping.

While the browser integrity check can easily be disabled, the AI bot detection can't be without letting some harmful bot traffic through.

I'd have to think more about a potential workaround here; I don't have any great answers at the moment.

tsibley · 2024-12-05T21:51:45Z

@ianw

Just out of interest, how did you get those handshake dumps?

I used tcpdump and then opened up the pcap files in Wireshark. I also set SSLKEYLOGFILE when making the requests so I could decrypt the TLS application traffic in Wireshark without interposing something like mitmproxy.

I use mitmproxy all the time, but it does add another TLS/network stack to the mix and for this kind of thing that can change behavior as you saw.
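The SSLKEYLOGFILE approach can also be done from Python directly, keeping the client's own TLS stack in the path (unlike mitmproxy). A minimal sketch, assuming Python 3.8+ built against an OpenSSL with key logging support: it sets SSLContext.keylog_filename on a default client context and posts through urllib. The default log path and the helper names are hypothetical.

```python
import os
import ssl
import tempfile
import urllib.error
import urllib.request

# Hypothetical default location; reuse SSLKEYLOGFILE if it is already set
DEFAULT_KEYLOG = os.environ.get("SSLKEYLOGFILE") or os.path.join(
    tempfile.gettempdir(), "tls-keys.log"
)


def keylogging_context(path: str = DEFAULT_KEYLOG) -> ssl.SSLContext:
    """Default client context that appends TLS session secrets to `path`
    in NSS key log format, which Wireshark can use to decrypt a capture."""
    ctx = ssl.create_default_context()
    ctx.keylog_filename = path
    return ctx


def post_and_log_keys(url: str, path: str = DEFAULT_KEYLOG) -> int:
    """POST to `url` with no body and return the HTTP status, 403s included."""
    req = urllib.request.Request(url, method="POST")
    try:
        with urllib.request.urlopen(req, context=keylogging_context(path)) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code
```

Run tcpdump while calling post_and_log_keys(), then point Wireshark's TLS "(Pre)-Master-Secret log filename" preference at the log file. The ClientHello itself is sent in plaintext and is readable in the capture without keys; the key log is only needed to see the HTTP exchange inside the tunnel. Exporting SSLKEYLOGFILE before running a script has the same effect for contexts created by ssl.create_default_context().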

humitos added the Needed: design decision A core team decision is required label Nov 12, 2024

agjohnson mentioned this issue Dec 4, 2024

Requests 403 Client Error #11763

Open

dmundra mentioned this issue Dec 4, 2024

403 Client Error: Forbidden for url: https://readthedocs.org/api/v3/projects/?limit=100 nextstrain/readthedocs-cli#7

Closed
