Failure to connect to server with Let's Encrypt certificate #5
Comments
(wound up here from Twitter) Certificate validation is a fairly tricky subject and is ideally left up to the TLS library. In particular, I think the issue you're asking about is the difference between the chain (what the server sends you) and the path (what your certificate validator builds based on the leaf certificate, the untrusted contents of the chain, and your root store). Here are some blogs about that: https://medium.com/@sleevi_/path-building-vs-path-verifying-implementation-showdown-39a9272b2820

So I can get some context on the question: are you doing validation by hand as a way to work around a verification failure? Or were you already doing validation for another reason, and are trying to debug why this specific case fails?
That's because it's a cross-signed root, and the root it's signed by (DST Root CA X3) is expired, and macOS enforces that expiration. Well-behaved validation code should ignore the fact that A doesn't validate, so long as it can build a valid path to a trusted root (even if that path doesn't use any certificates from the chain).
@jsha I am using code from this library in I am not an expert in this stuff and my priority is to make sure the library I'm shipping is safe. If it turns out Ruby's |
I would note that
require 'uri'
require 'net/http'
uri = URI.parse('https://faye.jcoglan.com/')
puts Net::HTTP.get_response(uri)
I'm not sure what |
@jsha Your article reflects roughly my understanding of what's going on, without being an expert -- I assumed that the client is given a set of certificates, and as long as it can find some path from a cert it trusts (something in the X509 default store) to the server's cert, things are ok.

Does that mean my script is doing something valid when it sees that it can verify cert 2, and use that to verify cert 3, and therefore cert 3 is trustworthy, even though cert 1 is untrusted? Or have I misunderstood?
My Ruby is a bit rusty and in particular I'm not familiar with EventMachine, but I suspect I see you have this call:
Is that not enough to turn on certificate validation? Why do you also need to implement
This is how things are supposed to work, but per the article they don't quite work that way. And the default behavior differs based on the version of OpenSSL. What version are you running?
It's an implementation decision by EventMachine not to fully implement certificate verification itself, but to provide a callback for the client to do it, because different protocols that run atop TLS have different requirements here, I believe. Faraday originally spotted this and implemented the verification code I'm referring to in this issue.
I'm not actually sure which version Ruby is linked against. I'm on macOS 11.6 and installed ruby using |
One thing I notice is missing from the call to verify() is the chain (see https://ruby-doc.org/stdlib-2.5.1/libdoc/openssl/rdoc/OpenSSL/X509/Store.html). That will be required for this to work right; but as I said before, if you can avoid overriding ssl_verify_peer, and just rely on |
I have just checked this and to confirm: passing |
Ah, and now that I'm coming to understand this code a bit better:
def store_cert(cert)
  @cert_store.add_cert(cert)
rescue OpenSSL::X509::StoreError => error
  raise error unless error.message == 'cert already in hash table'
end
I'm pretty sure this is incorrect. This adds the peer's certificate to your store of trusted roots. In general you should not need to modify the Store after you have initialized it with a set of trusted roots.
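To make the concern about modifying the Store concrete, here is a minimal sketch using a throwaway self-signed certificate. The helper names `self_signed_cert` and `trust_before_and_after_add` are made up for this illustration and are not part of Faraday, faye-websocket or EventMachine. It only shows that `add_cert` turns whatever it is given into a trust anchor. Note that the code in this thread only adds a certificate after it has verified, so this is an illustration of why the store is sensitive, not a claim that the library is exploitable.

```ruby
require 'openssl'

# Hypothetical helper: builds a short-lived self-signed certificate
# that no root store has ever heard of.
def self_signed_cert(common_name)
  key = OpenSSL::PKey::RSA.new(2048)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = 1
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{common_name}")
  cert.issuer = cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  cert.sign(key, OpenSSL::Digest.new('SHA256'))
  cert
end

# Hypothetical helper: verifies the certificate against an empty store,
# adds it to the store, then verifies it again.
def trust_before_and_after_add(cert)
  store = OpenSSL::X509::Store.new # deliberately empty: no trusted roots
  before = store.verify(cert)
  store.add_cert(cert)             # the same call store_cert makes
  after = store.verify(cert)
  [before, after]
end

before, after = trust_before_and_after_add(self_signed_cert('untrusted.example'))
puts "verified before add_cert: #{before}, after add_cert: #{after}"
```

Anything passed to `add_cert` is treated like a root from then on, which is why the store is best populated once from trusted sources and left alone.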
I think the chain is built internally by adding certs to the store via |
Ah ok, let me try passing the chain as you suggest and see what happens.
This produces the same result: cert 1 is untrusted, the other two are trusted:
require 'bundler/setup'
require 'eventmachine'
require 'openssl'

HOST = 'faye.jcoglan.com'

module Connection
  def connection_completed
    @cert_store = OpenSSL::X509::Store.new
    @last_cert = nil
    @last_verify = false
    @cert_store.set_default_paths
    start_tls(sni_hostname: HOST, verify_peer: true)
  end

  def ssl_verify_peer(cert_text)
    certificate = OpenSSL::X509::Certificate.new(cert_text)
    @last_cert = certificate
    @last_verify = @cert_store.verify(certificate, @cert_store.chain)
    p certificate
    if @last_verify
      puts "\e[37;42m[ VERIFIED ]\e[0m"
    else
      puts "\e[37;41m[ UNVERIFIED ]\e[0m"
    end
    true
  end

  def ssl_handshake_completed
    unless @last_verify and OpenSSL::SSL.verify_certificate_identity(@last_cert, HOST)
      raise OpenSSL::SSL::SSLError,
            %(host "#{HOST}" does not match the server certificate)
    end
  end
end

EM.run { EM.connect(HOST, 443, Connection) }
Looking at the EventMachine code, I start to see the problem. It has a different meaning for verify_peer than OpenSSL (including Ruby's OpenSSL bindings). In OpenSSL when you turn on "verify_peer", it means "verify peer using the default validator and default roots." However, in EventMachine "verify_peer" means "verify peer using a custom callback." This is a bad API shape, since custom certificate validation is very hard to get right, as you're seeing.

EventMachine makes a callback in C++, which AFAICT calls out to the Ruby callback: https://github.com/eventmachine/eventmachine/blob/8e1d6b11fd8400593af035a7a0d203d24c10c9b0/ext/ssl.cpp#L675-L698

Ideally that Ruby callback should take a cert and a chain. But it seems to only take a cert.

Also, if you look at the second chunk of C code, it gets the cert using X509_STORE_CTX_get_current_cert. I'm not that familiar with OpenSSL's API here, but those docs say "X509_STORE_CTX_get_current_cert() returns the certificate which caused the error or NULL if no certificate is relevant to the error."

So it seems like there are two API bugs in EventMachine:
And possibly one implementation bug:
From your code:
This seems incorrect. The docs for Store::chain say:
In other words, it's not the chain sent by the peer. It's not immediately clear to me that you can fix this without an upstream fix in EventMachine.

By the way, thanks for working on this! Certificate validation code can be a real slog to work with, but I'm glad you're helping make the ecosystem more secure.
It looks like EventMachine does expose an accessor get_peer_cert: https://www.rubydoc.info/github/eventmachine/eventmachine/EventMachine%2FConnection:get_peer_cert. But it doesn't expose the corresponding get_peer_chain it would need in order to make it feasible to write your own certificate validation. (And again, it's unreasonable for EventMachine to say users must implement their own certificate validation; it should allow them to delegate that thankless task to OpenSSL.)
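For comparison, this is roughly what delegating to OpenSSL looks like with Ruby's own bindings, outside of EventMachine. It is a sketch, not something EventMachine offers: `verifying_context` and `peer_chain` are hypothetical helper names. `SSLContext#set_params` applies Ruby's defaults, which turn on `VERIFY_PEER` and fall back to the default certificate store, so no custom callback is involved.

```ruby
require 'openssl'
require 'socket'

# Hypothetical helper: a client context that leaves validation to OpenSSL.
def verifying_context
  ctx = OpenSSL::SSL::SSLContext.new
  ctx.set_params # Ruby's defaults: VERIFY_PEER against the default certificate store
  ctx
end

# Hypothetical helper: connects, lets OpenSSL validate the peer, and returns
# the chain the server sent. Needs network access, so it is not called here.
def peer_chain(host, port = 443)
  tcp = TCPSocket.new(host, port)
  ssl = OpenSSL::SSL::SSLSocket.new(tcp, verifying_context)
  ssl.sync_close = true # closing the TLS socket also closes the TCP socket
  ssl.hostname = host   # sends SNI; with verify_hostname on, the name is checked too
  ssl.connect           # raises OpenSSL::SSL::SSLError if validation fails
  ssl.peer_cert_chain
ensure
  ssl ? ssl.close : tcp&.close
end

ctx = verifying_context
puts "verify_mode is VERIFY_PEER: #{ctx.verify_mode == OpenSSL::SSL::VERIFY_PEER}"
# Usage (requires network): peer_chain('faye.jcoglan.com').each { |c| puts c.subject }
```

The point of the comparison is that the whole decision lives inside `connect`; application code never sees individual certificates unless it asks for them afterwards.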
Yeah, this reminds me of a lot of what you'll find if you dig in the issue threads linked in https://blog.jcoglan.com/2020/07/31/missing-tls-verification-in-faye/ that go back years. Long story short: this isn't getting fixed in EventMachine. It's possible the ideal solution is to migrate off of EM entirely, but that would break everything depending on my libraries. My priority is to make this as safe as reasonably possible given EM is probably not going to change its behaviour.
Do you know any way I could make this safer? Is it fundamentally bad to ignore the verification failure on cert 1? Or am I fine letting this connection through because I managed some string of successful |
The explanation from Let's Encrypt about what these certs are for seems to imply clients are expected to accept either the first or second cert in the trace above, and so failing on one of them is normal?
I would ignore the Let's Encrypt blog post, and indeed my early speculation on this thread. The problem isn't really about the cross-signed root expiration.

The fundamental problem is this: in ssl_verify_peer you have a leaf certificate (aka end-entity certificate). In the default root store you have a set of roots. In between there is a whole universe of intermediates that are trusted only by virtue of being signed by a root. In order to validate a peer certificate, you need to build a path from the leaf certificate, through some number of intermediates, to some root certificate. But where do you get those intermediates? Normally they are in the cert chain provided during the TLS handshake, but EventMachine neglected to allow you access to those.

In theory you could bundle a list of common intermediates and give those to OpenSSL as untrusted inputs to the verification process. But that's very fragile - you'd need to update the intermediates all the time. CAs change their intermediates regularly and without advance notice, and expect clients to still be able to verify the new intermediate (via the TLS chain mechanism).

You could provide a way for your users to input a list of regularly-updated intermediates. Still fragile but at least provides a way to fix things when they break.

You could offer your users an interface that says "verify the peer certificate by checking its bytes match this hardcoded certificate we expect." But that's also fragile - it would break every time the peer rotated its certificate, which should be often.

Here's a really hacky idea: you could make your own connection to the peer, collect the chain, and provide that as untrusted input to the certificate verification.
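Here is a small offline sketch of what "intermediates as untrusted inputs" means, using a throwaway root, intermediate and leaf. The helpers `issue_cert` and `demo_pki` and all the certificate names are invented for the example. The second argument to `OpenSSL::X509::Store#verify` is the list of untrusted certificates that may be used to build the path; only the root is in the store.

```ruby
require 'openssl'

# Hypothetical helper: issues a short-lived certificate. Self-signed when no
# issuer is given, otherwise signed by the issuer's key.
def issue_cert(name, key, issuer: nil, issuer_key: nil, ca: false)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = rand(1..1_000_000)
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{name}")
  cert.issuer = issuer ? issuer.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  ef = OpenSSL::X509::ExtensionFactory.new
  ef.subject_certificate = cert
  ef.issuer_certificate = issuer || cert
  cert.add_extension(ef.create_extension('basicConstraints', ca ? 'CA:TRUE' : 'CA:FALSE', true))
  cert.sign(issuer_key || key, OpenSSL::Digest.new('SHA256'))
  cert
end

# Hypothetical helper: root -> intermediate -> leaf, all throwaway.
def demo_pki
  root_key, inter_key, leaf_key = Array.new(3) { OpenSSL::PKey::RSA.new(2048) }
  root = issue_cert('Example Root', root_key, ca: true)
  inter = issue_cert('Example Intermediate', inter_key, issuer: root, issuer_key: root_key, ca: true)
  leaf = issue_cert('leaf.example', leaf_key, issuer: inter, issuer_key: inter_key)
  { root: root, intermediate: inter, leaf: leaf }
end

pki = demo_pki
store = OpenSSL::X509::Store.new
store.add_cert(pki[:root]) # only the root is trusted

leaf_alone = store.verify(pki[:leaf])                        # no intermediates available
reason = store.error_string                                  # why the path could not be built
with_chain = store.verify(pki[:leaf], [pki[:intermediate]])  # intermediate supplied as untrusted input

puts "leaf alone: #{leaf_alone} (#{reason})"
puts "leaf with untrusted intermediate: #{with_chain}"
```

The intermediate never enters the store. It is only offered as a candidate for path building, and it counts for something only because the trusted root signed it.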
Isn't this what EM is providing when it repeatedly invokes Have I misunderstood something here? In the case of my library there is also an escape hatch in the form of an option letting the caller provide their own root CA file, and we fall back to |
Aha, that makes sense. I was misunderstanding the API. So, it may be possible to do something reasonable here. I can't say exactly what the right thing is at the moment, but it should be possible. I can take a closer look later this week.
Thanks @jsha, this was really helpful. Absolutely no obligation to dig into this any further :)
The rough idea I'm thinking of is: accumulate the chain as you receive each subsequent call to ssl_verify_peer, and then verify the whole thing in ssl_handshake_completed. But note that I'm pretty unfamiliar with all these APIs, so please confirm with the documentation and so on. :-)
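A rough sketch of that idea, runnable offline with throwaway certificates. It is written as a plain class (the hypothetical `ChainVerifier`) whose methods mirror the two EventMachine callbacks by name, so it can be exercised without EventMachine. It assumes the callbacks deliver the leaf certificate last, as in the trace earlier in this thread; that ordering should be confirmed against EventMachine before relying on it. `issue_cert` and `demo_chain` are test helpers invented for the example.

```ruby
require 'openssl'

# Hypothetical test helper: issues a short-lived certificate.
def issue_cert(name, key, issuer: nil, issuer_key: nil, ca: false, san: nil)
  cert = OpenSSL::X509::Certificate.new
  cert.version = 2 # X.509 v3
  cert.serial = rand(1..1_000_000)
  cert.subject = OpenSSL::X509::Name.parse("/CN=#{name}")
  cert.issuer = issuer ? issuer.subject : cert.subject
  cert.public_key = key.public_key
  cert.not_before = Time.now - 60
  cert.not_after = Time.now + 3600
  ef = OpenSSL::X509::ExtensionFactory.new
  ef.subject_certificate = cert
  ef.issuer_certificate = issuer || cert
  cert.add_extension(ef.create_extension('basicConstraints', ca ? 'CA:TRUE' : 'CA:FALSE', true))
  cert.add_extension(ef.create_extension('subjectAltName', san, false)) if san
  cert.sign(issuer_key || key, OpenSSL::Digest.new('SHA256'))
  cert
end

# Hypothetical test helper: returns [root, intermediate, leaf].
def demo_chain(hostname)
  root_key, inter_key, leaf_key = Array.new(3) { OpenSSL::PKey::RSA.new(2048) }
  root = issue_cert('Example Root', root_key, ca: true)
  inter = issue_cert('Example Intermediate', inter_key, issuer: root, issuer_key: root_key, ca: true)
  leaf = issue_cert(hostname, leaf_key, issuer: inter, issuer_key: inter_key, san: "DNS:#{hostname}")
  [root, inter, leaf]
end

# Hypothetical class: collects certificates, then decides once at the end.
class ChainVerifier
  def initialize(hostname, store)
    @hostname = hostname
    @store = store # trusted roots only; never modified here
    @certs = []
  end

  # Called once per certificate: remember it and defer the decision.
  def ssl_verify_peer(cert_text)
    @certs << OpenSSL::X509::Certificate.new(cert_text)
    true
  end

  # Called after the handshake: build a path for the last certificate,
  # offering the earlier ones as untrusted input, then check the hostname.
  def ssl_handshake_completed
    leaf = @certs.last
    raise OpenSSL::SSL::SSLError, 'no peer certificate received' unless leaf
    unless @store.verify(leaf, @certs[0...-1])
      raise OpenSSL::SSL::SSLError, "certificate verification failed: #{@store.error_string}"
    end
    unless OpenSSL::SSL.verify_certificate_identity(leaf, @hostname)
      raise OpenSSL::SSL::SSLError, %(host "#{@hostname}" does not match the server certificate)
    end
    true
  end
end

root, inter, leaf = demo_chain('leaf.example')
store = OpenSSL::X509::Store.new
store.add_cert(root)
verifier = ChainVerifier.new('leaf.example', store)
[root, inter, leaf].each { |cert| verifier.ssl_verify_peer(cert.to_pem) }
puts "handshake accepted: #{verifier.ssl_handshake_completed}"
```

In a real connection the store would come from `set_default_paths` or a caller-supplied CA file, and a failure would need to close the connection rather than just raise.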
Hey @jcoglan, hello from 2024 - have you managed to figure this one out? 😅 I'm using EventMachine, faye-websocket and now also em-http-request in my project, and I've come across an issue that I suspect might be the same thing as you've discussed here…

The specific URL I'm trying to load is this one: https://genco.me/.well-known/did.json

open-uri / Net::HTTP loads it just fine. But the em-http-request code (copied from faraday apparently) throws:

I don't know that much about SSL certificate verification and to be honest I'd prefer not to, since I think this is something that's best not to do unless you're sure you're doing it right - I don't understand why EventMachine can't just delegate to the same verification code that Net::HTTP is using… or maybe it can be told to?
@mackuba It looks like that domain uses Let's Encrypt certificates which was also the reason that This is the change we made in faye/faye-websocket-ruby@d9428fa: rather than verify every certificate, we check that there is some chain of certificates from the client's trusted root CAs to the domain certificate, and that the final certificate matches the hostname for the request. The current implementation of our verifier is here: https://github.com/faye/faye-websocket-ruby/blob/0.11.3/lib/faye/websocket/ssl_verifier.rb I'm not sure why this is not baked into EventMachine but there's not much we can do about this. This does something very similar to what's inside |
@jcoglan ok, thanks! Tbh, I've since then rewritten this code again and moved back to sync HTTP with Net::HTTP, because I've found a way to do it this way and it makes the code simpler 😉 But this should be helpful if I decide to use em-http again 👍
Hi there 👋 This might be a weird issue. I'm opening it because I use the SSL verification code from this library in faye-websocket and just discovered an issue with it.

The Faye website is https://faye.jcoglan.com/ and its TLS certificates are provided by Let's Encrypt's certbot. I found that the verification code rejected it even though my browsers think the site is fine, so I wanted to run this past someone else familiar with this code to see what you think.
If you run this code, you get a TLS failure:
To debug this, I made some adjustments to my copy of the TLS verification code and ended up with this demo that connects successfully:
The main way this differs from the implementation in Faraday is that it's ok for @cert_store.verify(certificate) to fail. If it fails, the certificate is not added to the X509::Store, but we don't raise an exception immediately. Instead we just require that the last certificate in the chain is verified, and that it passed hostname checking.

When I run this script, I see:
We have 3 certs here:
A: CN=ISRG Root X1,O=Internet Security Research Group,C=US
B: CN=R3,O=Let's Encrypt,C=US
C: CN=faye.jcoglan.com
The first cert, A, is not verified given the default paths for X509::Store. However, cert B is verified, and then cert C is also verified if cert B has been added to the store.

If A is added to the store, this causes cert C to fail verification. Likewise if cert B is not added, cert C fails to verify.
What I'm wondering is: is the implementation above valid? Is it ok for cert A to fail, if cert B passes verification and can then go on to verify C? Does this mean there is some chain of certs from my machine's trusted certs to the site's own cert, and that means it can be trusted? Or should the failure of cert A cause the connection to fail?