Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ACME certificate for wildcard and root domains #3675

Merged
merged 2 commits into from
Jul 31, 2018

Conversation

nmengin
Copy link
Contributor

@nmengin nmengin commented Jul 25, 2018

What does this PR do?

This Pull Request introduces a mechanims which allow retrying to generate a ACME certificate when a DNSChallenge failed to obtain a certificate for both wildcard and root domain.

Motivation

Since the feature which allows Træfik users to generate ACME certificates for both wildcard domain (as CN) and root domain (as SAN), a flaky behavior is noted.

Sometimes everything works well, sometines challenges cannot be solved with different errors (TXT record missing or incorrect).
These errors exist for almost all the DNS providers.

Fixes #3468 #3445.

More

  • Added/updated tests

Additional Notes

A first analysis showed that the problem is due to the name of the TXT records needed by Let's Encrypt: a wildcard domain and its root domain are challenged thanks to a TXT record with the same name. In function of the TTL given to the TXT records, it was unable to Let's Encrypt the get all the values asked in the allowed time.

After analyzed deeply the problem (and made some tests), it appears that the problem is due to the time to propagate a TXT record creation/deletion/update in the DNS provider infrastructure.

Indeed, LEGO makes a pre-check on all the TXT records before to ask Let's Ecrypt to do the challenge. But, sometimes, the result caught by LEGO and the other one caught by Let's Encrypt are differents (it's maybe due to the time to propagate the modifications into all the DNS provider infrastructure).

The modification introduced by this PR allows Træfik to retry to obtain a ACME certificate for a wildcard and its root domain when errors happens.

The retry works because Let's Encrypt does not ask to challenge twice the same domain.
Thus, if during the first iteration the wildcard is successfully checked, during the second iteration only the root domain will be checked.

@traefiker traefiker added this to the 1.7 milestone Jul 25, 2018
@nmengin nmengin force-pushed the hotfix/acme-wildcard-root-domain branch 2 times, most recently from 115fce0 to d34b5f9 Compare July 26, 2018 07:58
Copy link
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👏 👏

// Search a wildcard domain if not already found
if len(rootDomain) == 0 && strings.HasPrefix(domains[i], "*.") {
rootDomain = strings.TrimPrefix(domains[i], "*.")
// Restart to the first indfex to check id the root domain of the wildcard domain exists
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indfex/index

Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this one @nmengin 👏
Few comments though :)

@@ -246,6 +249,17 @@ func (p *Provider) getClient() (*acme.Client, error) {
if err != nil {
return nil, err
}

// Set the precheck timeout into the DNSChallenge provider
switch provider := provider.(type) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using switch type with only one case is probably overkill, how about using a simpler if else here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emilevauge

I have to use a switch statement as mentionned here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this working if cpt, ok := provider.(acme.ChallengeProviderTimeout); ok { instead ?

// Check if we can use a backoff only if we use the DNS Challenge and if is there are at least 2 domains to check
if p.DNSChallenge != nil && len(domains) > 1 {
rootDomain := ""
for i := 0; i < len(domains); i++ {
Copy link
Member

@emilevauge emilevauge Jul 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for i := range domains instead? Modifying the index in a loop leads to bugs (infinite loop, out of range, etc).

download

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@emilevauge I used the old way to iterate on a slice in the way to use only one loop statement for the two operations I have to to on the slice.

I was not aware about the possible issues you mentionned, I am going to change the code.


ebo := backoff.NewExponentialBackOff()
// Give the time to LEGO to try to solve the challenge twice
ebo.MaxElapsedTime = 2 * timeout
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about this, maybe use WithMaxRetries instead ?

@nmengin nmengin force-pushed the hotfix/acme-wildcard-root-domain branch from d34b5f9 to 1ee5355 Compare July 31, 2018 08:14
@nmengin nmengin force-pushed the hotfix/acme-wildcard-root-domain branch from 1ee5355 to 63af610 Compare July 31, 2018 09:01
Copy link
Member

@juliens juliens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@mmatur mmatur left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@emilevauge emilevauge left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job @nmengin
LGTM

@mmatur mmatur force-pushed the hotfix/acme-wildcard-root-domain branch from 63af610 to 78d9652 Compare July 31, 2018 10:12
@A-Shleifman
Copy link
Contributor

Hi @nmengin,

I'm afraid the problem is not fixed yet. I'm running traefik in a docker container based on traefik:1.7-alpine. The actual version is v1.7.0-rc3. Tried to generate wildcard certificates for two domains and got the same incorrect TXT record error:

traefik.log

I'm using cloudflare.

.txt as github doesn't allow .toml files
traefik.toml.txt

Any ideas what could be causing this?

@A-Shleifman
Copy link
Contributor

Update: Certificates are generated for domains using route53 provider, but the error appears when using cloudflare

@BirkhoffLee
Copy link

BirkhoffLee commented Aug 15, 2018 via email

@nmengin
Copy link
Contributor Author

nmengin commented Aug 15, 2018

Hello @A-Shleifman ,

Many thanks fro your feedback.

I guess the problem comes from the paramtrization of the cloudflare provider into LEGO 👍
The TTL and the check timeout are equals (120s) that's why even if we try twice the problem persists.

I'll try to find another solution when I'll be back at the office (monday).

Can you open a new issue as @BirkhoffLee suggested please?

@A-Shleifman
Copy link
Contributor

Done => #3761 :)

@pwaldon
Copy link

pwaldon commented Aug 19, 2018

Hello,

with the latest build (codename: maroilles) and NS1 provider I started to get new errors. Is this somehow related?

config:

[[acme.domains]]
  main = "*.abc.com"
  sans = ["abc.com"]
traefik          | time="2018-08-19T05:43:50Z" level=info msg="legolog: [INFO] acme: Registering account for [email protected]"
traefik          | time="2018-08-19T05:43:51Z" level=info msg="legolog: [INFO] [*.abc.com] acme: Obtaining bundled SAN certificate"
traefik          | time="2018-08-19T05:43:51Z" level=info msg="legolog: [INFO] [*.abc.com] AuthURL: https://acme-v02.api.letsencrypt.org/acme/authz/..."
traefik          | time="2018-08-19T05:43:51Z" level=info msg="legolog: [INFO] [abc.com] acme: Trying to solve DNS-01"
traefik          | time="2018-08-19T05:43:51Z" level=error msg="Unable to obtain ACME certificate for domains \"*.abc.com\" : unable to generate a certificate for the domains [*.abc.com]: acme: Error -> One or more domains had a problem:\n[abc.com] error presenting token: dns: domain must be fully qualified\n"

@nmengin
Copy link
Contributor Author

nmengin commented Aug 20, 2018

Hello @pwaldon ,

Many thanks for your interest in the project.

Your problem does not seem to be related to this closed PR.

If you need more help, please join our Slack workspace for more community #support.

@chris-aeviator
Copy link

without setting timeout to >90 sec, I'm getting this behavior with Gandi V5 trying to issue a wildcard cert with traefik

traefik:

Gandi DNS: DNS Record Created
Gandi DNS: Zone record deleted
Gandi DNS: DNS Record Created
Gandi DNS: Zone record deleted
time="2018-09-18T21:47:29Z" level=error msg="Unable to obtain ACME certificate for domains \"**.domain.com, domain.com\" : cannot obtain certificates: acme: Error -> One or more domains had a problem:\n[domaincom] acme: Error 403 - urn:ietf:params:acme:error:unauthorized - Incorrect TXT record \"QEpHmVXXXXXXXcz0Au3ZrFphtXFMKZrkdCM\" found at _acme-challenge.domain.com\n"
Gandi DNS: DNS Record Created

traefik.toml

defaultEntryPoints = ["http", "https"]

[web]
# Port for the status page
address = ":8080"

# Entrypoints, http and https
[entryPoints]
  # http should be redirected to https
  [entryPoints.http]
  address = ":80"
    #[entryPoints.http.redirect]
    #entryPoint = "https"
  # https is the default
  [entryPoints.https]
  address = ":443"
    [entryPoints.https.tls]

[retry]

# Enable ACME (Let's Encrypt): automatic SSL
[acme]
# Email address used for registration
email = "[email protected]"
storageFile = "acme.json"
entryPoint = "https"
onDemand = false
onHostRule = false
# Use a HTTP-01 acme challenge rather than TLS-SNI-01 challenge
#[acme.httpChallenge]
#entryPoint = "http"

caServer = "https://acme-v02.api.letsencrypt.org/directory"

# optional but recommend
#[acme.tlsChallenge]

  [acme.dnsChallenge]
    provider = "gandiv5"
    delayBeforeCheck = 91

  [[acme.domains]]
    main = "*.domain.com"
    sans = ["domain.com"]
# Enable Docker configuration backend
[docker]
endpoint = "unix:///var/run/docker.sock"
domain = "domain.com"
watch = true
exposedbydefault = false
[api]
  entryPoint = "traefik"
  dashboard = true
  [api.statistics]
    recentErrors = 10

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants