Pings getting 403 responses in "interesting" ways #11753
Comments
IIRC there is some configuration on CF (Cloudflare) around fingerprinting, mainly to block spam and AI crawlers. We did some of this work around August: https://about.readthedocs.com/blog/2024/07/ai-crawlers-abuse/
These requests did not fail one of our own explicit rules; the block comes from Cloudflare's browser integrity check. All of the requests failed for this same reason. I'm not familiar with what this check looks at to determine "integrity", though.
Thanks for confirming! https://developers.cloudflare.com/waf/tools/browser-integrity-check/ is predictably vague :) One other thing I don't think I mentioned was that when I ... BTW, we switched from using Ansible's built-in ...
I dug into this a bit, and it seems Cloudflare is assessing (among other things, I'm sure) the TLS cipher suites offered in the ClientHello during the initial TLS handshake. When it decides they're not up to snuff (i.e. old/insecure), Cloudflare serves a 403 with an HTML body that requires JS to execute in a browser to pass a "browser challenge".
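If the cipher list in the ClientHello is part of what trips the check, one experiment is to narrow what a Python client offers. A minimal sketch, assuming a plain `urllib`/`ssl` client rather than Ansible itself: it builds an `ssl.SSLContext` that offers only TLS 1.2+ with ECDHE key exchange and AEAD ciphers. The cipher string is an illustrative choice, not a list known to satisfy Cloudflare, and TLS fingerprints such as JA3 also cover extensions and curves, so restricting ciphers alone may not change the outcome.

```python
import ssl


def modern_client_context() -> ssl.SSLContext:
    """Client context offering only TLS 1.2+ with ECDHE key exchange and AEAD ciphers."""
    ctx = ssl.create_default_context()
    # Refuse to negotiate anything older than TLS 1.2.
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    # OpenSSL cipher-string syntax (illustrative choice). This only governs
    # TLS 1.2 suites; OpenSSL configures TLS 1.3 suites separately, so those
    # remain enabled.
    ctx.set_ciphers("ECDHE+AESGCM:ECDHE+CHACHA20")
    return ctx


if __name__ == "__main__":
    # Print every suite this context would offer, with its protocol version.
    for suite in modern_client_context().get_ciphers():
        print(suite["protocol"], suite["name"])
```

The context can be passed to `urllib.request.urlopen(url, context=ctx)` to test a ping with the narrowed offer.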
That's interesting! Just out of interest, how did you get those handshake dumps? One problem I had was that when I put ... It's interesting just for the sake of being interesting and for the RE challenge, but ultimately it seems unlikely that Cloudflare is going to give us a clear list of instructions on how to essentially defeat their checks 😄 It also seems unlikely that RTD's problems come from bots hitting these endpoints; from the blog post it seemed to be about general scraping. Perhaps there's some way to make the checks on the webhook endpoints a little less restrictive, seeing as they are hit from such a wide variety of automation tools?
This is normally a domain-level configuration, but I just tried adding a configuration rule that disables browser integrity checks for requests to our APIs. Does this help the requests?
I just tried the test from above, running the ping from Ansible's ...
Well, the good news is that the request did avoid the browser integrity check, but the bad news is that the requests are now being flagged and blocked as AI bot traffic. This is done by Cloudflare's managed rules for bot detection, which is the configuration we enabled to combat abusive LLM bots/companies and, to a lesser extent, API scraping. While the browser integrity check can easily be disabled, the AI bot detection can't be disabled without opening holes for harmful bot traffic. I'd have to think more about a potential workaround here; I don't have any great answers at the moment.
I used ... I use ...
Details
This is a follow-up to #11733 I guess, but it is driving me a bit nuts :)
I think this is Cloudflare doing something, but the way you send the headers, and the User-Agent, appear to matter to whether pinging the webhooks succeeds. I have removed the auth token below, but in all cases it is exactly the same.
If you test with the following
If you use the default Python UA, it will 403, but if you fake that as `curl/8.6.0`, it will work.
I can only assume Cloudflare is filtering this for some reason? I thought that the UA filtering might be the cause of our Ansible pings that stopped working sometime between 2024-09-19T11:08:21Z (our last successful build) and 2024-09-23T21:32:29Z (the first observed failure for a previously working project), but it seems not quite that simple...
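To reproduce the User-Agent difference without Ansible, the sketch below builds the same POST twice with Python's standard `urllib`: once with urllib's default User-Agent (`Python-urllib/3.x`, added by the opener at send time) and once spoofed as `curl/8.6.0`. The `WEBHOOK_URL` constant and the `token` form field are hypothetical placeholders, not Read the Docs' documented webhook parameters; substitute your project's real webhook details.

```python
import urllib.parse
import urllib.request
from typing import Optional

# Hypothetical placeholder: replace with your project's real webhook URL.
WEBHOOK_URL = "https://example.invalid/webhook/"


def build_ping(url: str, token: str, user_agent: Optional[str] = None) -> urllib.request.Request:
    """Build a form-encoded POST; with user_agent=None, urllib's default UA is used."""
    # The "token" field name is an assumption for illustration.
    body = urllib.parse.urlencode({"token": token}).encode("ascii")
    req = urllib.request.Request(url, data=body, method="POST")
    if user_agent is not None:
        # An explicit header stops the opener from adding "Python-urllib/3.x".
        req.add_header("User-Agent", user_agent)
    return req


def default_user_agent() -> str:
    """The User-Agent urllib's default opener adds when a request sets none."""
    return dict(urllib.request.build_opener().addheaders)["User-agent"]


if __name__ == "__main__":
    for ua in (None, "curl/8.6.0"):
        req = build_ping(WEBHOOK_URL, "SECRET", ua)
        # urllib stores header names capitalized, hence "User-agent".
        print(req.get_header("User-agent") or default_user_agent())
        # urllib.request.urlopen(req) would send it; a 403 surfaces as HTTPError.
```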
As I mentioned in the prior issue, we use the Ansible `uri` module to ping (https://docs.ansible.com/ansible/latest/collections/ansible/builtin/uri_module.html), and there something even more bizarre is going on. I have instrumented the `uri` call to dump what it sends and gets back. I have only removed the "set-cookie" values below in case they give something away.
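For a plain Python client, one generic way to see what goes over the wire at the HTTP layer is `http.client`'s debug output, switched on through urllib's handlers. This is a sketch, not necessarily how the Ansible call was instrumented, and it only shows the HTTP request line, headers and response headers; the TLS handshake itself (cipher suites, extensions) is not visible this way.

```python
import sys
import urllib.request


def build_debug_opener() -> urllib.request.OpenerDirector:
    """Return an opener whose HTTP(S) handlers echo the raw exchange to stdout."""
    # debuglevel=1 makes http.client print lines prefixed with "send:",
    # "reply:" and "header:" for each request made through this opener.
    return urllib.request.build_opener(
        urllib.request.HTTPHandler(debuglevel=1),
        urllib.request.HTTPSHandler(debuglevel=1),
    )


if __name__ == "__main__":
    opener = build_debug_opener()
    if len(sys.argv) > 1:
        try:
            # Usage: python debug_fetch.py https://host/path
            opener.open(sys.argv[1], timeout=10)
        except OSError as exc:
            # HTTPError (e.g. a 403) and URLError are both OSError subclasses.
            print("request failed:", exc)
```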
When I run from a Debian bookworm container, it fails with a 403.
When I run from a Fedora container, AFAICT it sends exactly the same thing at the last point I can trace it in Python, but the call succeeds.
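Since the HTTP-level dumps match, comparing the TLS defaults of the two environments is a natural next step. A hedged sketch: run the script inside each container to dump a JSON snapshot of the Python and OpenSSL versions plus the cipher suites a default `ssl` context enables, then run it once more with two snapshot files to diff them. The file names are hypothetical, and a cipher-list diff is only a proxy for the full ClientHello (extensions and supported groups are not captured).

```python
import json
import ssl
import sys


def tls_snapshot() -> dict:
    """Capture the TLS client defaults this interpreter would use."""
    ctx = ssl.create_default_context()
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        # Enabled cipher suites, in the context's preference order.
        "ciphers": [c["name"] for c in ctx.get_ciphers()],
    }


def diff_snapshots(a: dict, b: dict) -> dict:
    """Report suites present in only one snapshot, ordering, and library versions."""
    ca, cb = set(a["ciphers"]), set(b["ciphers"])
    return {
        "only_in_a": sorted(ca - cb),
        "only_in_b": sorted(cb - ca),
        # The same set in a different preference order still changes the ClientHello.
        "same_order": a["ciphers"] == b["ciphers"],
        "openssl": [a["openssl"], b["openssl"]],
    }


if __name__ == "__main__":
    if len(sys.argv) == 3:
        # Usage: python tls_diff.py bookworm.json fedora.json
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            print(json.dumps(diff_snapshots(json.load(fa), json.load(fb)), indent=2))
    else:
        # Run in each container and redirect, e.g. > bookworm.json
        print(json.dumps(tls_snapshot(), indent=2))
```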
If you'd like to replicate this, you can put ... into a file `/tmp/test.yaml` (modulo project/auth details that actually work, obviously), then run the following for each of `fedora:latest`, `ubuntu:noble` and `debian:bookworm`:
sudo podman run --rm -v /tmp/test.yaml:/tmp/test.yaml:Z -it CONTAINER /bin/bash
... install `python3`/`python3-venv` depending on the distro and run ...
It feels like CF must be somehow fingerprinting ... something ... even more than the UA or header order, etc.?
Is there any way to tell what this is filtering on? I included the "RAY-ID"; is it possible to tell from that why this is being filtered?