This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

figure out DNS WTF #1512

Closed
chadwhitacre opened this issue Sep 26, 2013 · 39 comments

Comments

@chadwhitacre
Contributor

Apparently www.gittip.com is resolving to 23.21.209.136 for some people! WTF?!?!? 😡

https://botbot.me/freenode/gittip/msg/6374300/
https://twitter.com/zyegfryed/status/383319201953243136

@chadwhitacre
Contributor Author

[Image: 13 8-16 am]

@chadwhitacre
Contributor Author

Support ticket at Heroku (login required):

https://help.heroku.com/tickets/99245

@chadwhitacre
Contributor Author

I've also filled out http://support.dnsimple.com/contact.

@chadwhitacre
Contributor Author

From Heroku:

Interesting, this might be an issue with your DNS provider. I have seen cases with certain DNS providers in the past where they had some leaked configuration, but I have never seen this happen with DNSimple. Have you contacted them already?

Also, do you know whether the customers reporting this all come from a specific region? Have you been able to reproduce this yourself?

My reply:

Thanks Brett. I've just emailed DNSimple, yes. I'll post back here when I have a reply from them.

I've received two reports of this so far. The first was in Denver, the second, Switzerland.

I have not been able to reproduce this myself.

@chadwhitacre
Contributor Author

Is it possible to determine whether the IP address in question, 23.21.209.136, was ever owned by Heroku?

@chadwhitacre
Contributor Author

@dnsimple Hey there. :-) I'm having a DNS WTF. :-( #1512 … Any insight?

https://twitter.com/whit537/status/383389415206178816

@chadwhitacre
Contributor Author

Could be Amazon?

i saw this once at readability, ELB randomly routes wrong

IRC

@chadwhitacre
Contributor Author

Here's what I'm getting right now (looks right to me):

$ dig www.gittip.com
<snip>
www.gittip.com.         22      IN      CNAME   nara-9076.herokussl.com.
nara-9076.herokussl.com. 150    IN      CNAME   elb002959-2580971.us-east-1.elb.amazonaws.com.
elb002959-2580971.us-east-1.elb.amazonaws.com. 60 IN A 50.17.213.114
elb002959-2580971.us-east-1.elb.amazonaws.com. 60 IN A 23.23.105.130
elb002959-2580971.us-east-1.elb.amazonaws.com. 60 IN A 50.16.193.209

Could it be that Amazon is improperly routing elb002959-2580971.us-east-1.elb.amazonaws.com. on occasion? So far that seems like the most plausible explanation to me. I guess it'd be up to Heroku to track that down for us, though.

@chadwhitacre
Contributor Author

(That's cross-posted to the Heroku ticket.)

@aeden

aeden commented Sep 27, 2013

I just checked all 4 name servers, and all are resolving with the same CNAME record:

; <<>> DiG 9.8.3-P1 <<>> @ns4.dnsimple.com www.gittip.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10430
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.gittip.com.            IN  A

;; ANSWER SECTION:
www.gittip.com.     3600    IN  CNAME   nara-9076.herokussl.com.

;; Query time: 324 msec
;; SERVER: 50.112.128.56#53(50.112.128.56)
;; WHEN: Fri Sep 27 07:15:34 2013
;; MSG SIZE  rcvd: 66

This is as I would expect, so at the authoritative level it seems fine.

I suppose it's possible that someone has a poisoned cache (http://en.wikipedia.org/wiki/DNS_spoofing). Can you get a DNS lookup result from the person or people that are seeing that address using something like dig www.gittip.com?
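For anyone who hits this again and doesn't have dig handy, here is a minimal sketch of the kind of check being asked for: resolve the name through the local resolver and see whether the suspect address shows up. This is not something anyone in this thread ran. The function names (`resolved_addresses`, `report`) and the injectable `lookup` parameter are assumptions made for illustration; the only real API used is `socket.getaddrinfo` from the Python standard library.

```python
import socket

# The address reported at the top of this issue.
SUSPECT_IP = "23.21.209.136"


def resolved_addresses(hostname, lookup=socket.getaddrinfo):
    """Return the sorted, de-duplicated IPv4 addresses the resolver gives for hostname.

    `lookup` defaults to the system resolver; it is a parameter only so the
    function can be exercised without network access.
    """
    infos = lookup(hostname, 443, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def report(hostname, suspect=SUSPECT_IP, lookup=socket.getaddrinfo):
    """Summarise what the resolver returned and whether the suspect IP is present."""
    addresses = resolved_addresses(hostname, lookup)
    return {
        "hostname": hostname,
        "addresses": addresses,
        "suspect_seen": suspect in addresses,
    }
```

Calling `report("www.gittip.com")` on an affected machine and pasting the result here would show what the operating system's resolver is handing out. Note that this goes through the OS resolver, not a browser's internal DNS cache, so a clean result here does not rule out a browser-side problem.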

@chadwhitacre
Contributor Author

Thanks @aeden. It sounds like when @greggles first saw this he did an nslookup on www.gittip.com from two locations and got the same three IPs from both places (IRC). What those IPs were we don't know, but rereading the IRC logs it sounds like 23.21.209.136 was not one of them. @greggles can you confirm?

@greggles also reported, "and, oddly enough, if I used chrome I got the right page and firefox gave the wrong page."

@greggles
Contributor

I'll try to find the values I got in my terminal backscroll. Did anyone else report this issue?

@chadwhitacre
Contributor Author

@greggles Thanks. Yes, @zyegfryed reported the same issue from Switzerland.

@chadwhitacre
Contributor Author

From @greggles [IRC]:

https://gist.github.com/greggles/2c12a5c3ac43de30fe7e

nslookup www.gittip.com
Server:     10.0.1.1
Address:    10.0.1.1#53

Non-authoritative answer:
www.gittip.com  canonical name = nara-9076.herokussl.com.
nara-9076.herokussl.com canonical name = elb002959-2580971.us-east-1.elb.amazonaws.com.
Name:   elb002959-2580971.us-east-1.elb.amazonaws.com
Address: 50.17.213.114
Name:   elb002959-2580971.us-east-1.elb.amazonaws.com
Address: 50.16.193.209
Name:   elb002959-2580971.us-east-1.elb.amazonaws.com
Address: 23.23.105.130

greggles@Gregs-MacBook-Pro-2 ~/workspace/cap (chp-working)[@cap.l]$ nslookup reaction.streamweaver.com
Server:     10.0.1.1
Address:    10.0.1.1#53

Non-authoritative answer:
Name:   reaction.streamweaver.com
Address: 23.21.209.136

greggles@Gregs-MacBook-Pro-2 ~/workspace/cap (chp-working)[@cap.l]$ dig www.gittip.com

; <<>> DiG 9.8.3-P1 <<>> www.gittip.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 7915
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.gittip.com.            IN  A

;; ANSWER SECTION:
www.gittip.com.     2487    IN  CNAME   nara-9076.herokussl.com.
nara-9076.herokussl.com. 150    IN  CNAME   elb002959-2580971.us-east-1.elb.amazonaws.com.
elb002959-2580971.us-east-1.elb.amazonaws.com. 60 IN A 50.16.193.209
elb002959-2580971.us-east-1.elb.amazonaws.com. 60 IN A 23.23.105.130
elb002959-2580971.us-east-1.elb.amazonaws.com. 60 IN A 50.17.213.114

;; Query time: 100 msec
;; SERVER: 10.0.1.1#53(10.0.1.1)
;; WHEN: Thu Sep 26 08:31:39 2013
;; MSG SIZE  rcvd: 170

@greggles
Contributor

https://gist.github.com/greggles/2c12a5c3ac43de30fe7e is some nslookup and dig action on both hostnames. I don't see anything obvious. I remember reviewing the IPs that nslookup showed me for gittip and not seeing the one from streamweaver.com. The problem was happening in Firefox 23, which had been updated using Firefox's built-in updater but which I hadn't yet restarted. When I restarted, the problem went away. Looks like @zyegfryed was also using Firefox, but that is not too surprising and could be a coincidence.
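Rather than eyeballing the output, the same review can be done mechanically. The following is a rough sketch that pulls the resource records out of pasted dig output, follows the CNAME chain, and lists the final A records. It was not part of the gist; `parse_answer_records` and `follow_chain` are hypothetical helper names, and the parser assumes the `name ttl IN type value` line shape that dig prints in its answer section.

```python
def parse_answer_records(dig_output):
    """Extract (name, ttl, rtype, value) tuples from dig's resource-record lines."""
    records = []
    for line in dig_output.splitlines():
        line = line.strip()
        # Skip blanks and dig's own commentary, which starts with ';'.
        if not line or line.startswith(";"):
            continue
        fields = line.split()
        # Assumed shape: name ttl class type value
        if len(fields) >= 5 and fields[1].isdigit() and fields[2] == "IN":
            records.append((fields[0], int(fields[1]), fields[3], fields[4]))
    return records


def follow_chain(records, hostname):
    """Follow CNAMEs starting at hostname; return (chain, sorted A-record addresses)."""
    name = hostname if hostname.endswith(".") else hostname + "."
    cnames = {n: v for n, _, rtype, v in records if rtype == "CNAME"}
    chain = [name]
    # Stop if a name repeats, so a CNAME loop cannot spin forever.
    while name in cnames and cnames[name] not in chain:
        name = cnames[name]
        chain.append(name)
    addresses = sorted(v for n, _, rtype, v in records if rtype == "A" and n == name)
    return chain, addresses
```

Feeding it the answer section from the dig run above gives the chain www.gittip.com. → nara-9076.herokussl.com. → elb002959-2580971.us-east-1.elb.amazonaws.com. and three A records, none of which is 23.21.209.136.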

@zyegfryed

@greggles Yes, I'm using Firefox, but version 24.
FWIW, I'm no longer experiencing the DNS issue with either Firefox 24 or Safari 6.0.5, here in Switzerland. And I'm getting the same results as you for both nslookup and dig.

@chadwhitacre
Contributor Author

Latest from Heroku over the weekend:

Hi,

I did a trace with the dnstrace tool and everyone's responding as expected. I have opened a case with AWS to see if they can offer any insight, will report back when I hear from them. In the meantime, please pass on any more reports of your customers seeing this issue.

Thanks!


AWS reports that IP has not been part of your ELB for at least the last 28 days, but it is part of another ELB. I've run dnstrace a few more times and have not seen that IP show up or anything else weird. I also don't see any more updates on the GitHub issue. Have you received reports of this still happening to anyone?

Thanks!


Doesn't look like this has happened to anyone else recently, going to close this ticket. Please let us know if there are any new reports.

@chadwhitacre
Contributor Author

Unfortunately when I changed the email address for our Heroku account as part of #1516, I lost access to all of my support tickets at Heroku, including the one for this issue. I've opened a new support request with them about that, asking if I can get those linked over.

In the meantime it sounds like there's nothing left to be done here. If this recurs let's try to capture more info and reopen.

@chadwhitacre
Contributor Author

Heroku support requests relinked to new account.

@chadwhitacre
Contributor Author

Another possibly related case reported via Twitter. @huxi was delivered a cert that wasn't ours. O.O

@aeden

aeden commented Apr 10, 2014

Delivered from us at DNSimple or from someone else?

Also, was the cert for your domain but not requested by you, or was it actually for another domain you do not manage?

If you want to take this up directly with me just email me at my DNSimple account.

@huxi

huxi commented Apr 10, 2014

I got a browser warning that the certificate was actually for stitchfix.com. I didn't accept the cert and instead alerted the gittip Twitter account about it immediately. So I can't say for sure, but my guess is that gittip.com was indeed mapped to stitchfix.com at the DNS level for a short time.
I use 8.8.8.8 as my DNS server.

@greggles
Contributor

@aeden could you provide some troubleshooting steps someone should take next time this happens to help identify where the problem is?

One thing I notice now is that stitchfix seems to also be using Heroku and while the domain is registered at GoDaddy it uses dnsimple nameservers. So, both Heroku and dnsimple are used by both domains - doesn't really seem to help identify where the problem is.

nslookup www.stitchfix.com
Server: 10.0.1.1
Address: 10.0.1.1#53

Non-authoritative answer:
www.stitchfix.com canonical name = fukui-8799.herokussl.com.
fukui-8799.herokussl.com canonical name = elb015700-1149408683.us-east-1.elb.amazonaws.com.
Name: elb015700-1149408683.us-east-1.elb.amazonaws.com
Address: 54.225.190.147
Name: elb015700-1149408683.us-east-1.elb.amazonaws.com
Address: 184.72.243.222
Name: elb015700-1149408683.us-east-1.elb.amazonaws.com
Address: 174.129.211.118

@seanlinsley
Contributor

Someone in IRC mentioned this same problem, except for www.teespring.com. They said the problem went away after a force-refresh.

@xnyhps
Contributor

xnyhps commented Apr 10, 2014

That was me. I tried dig when it happened, but I can't tell whether this response is correct:

$ dig www.gittip.com
; <<>> DiG 9.8.3-P1 <<>> www.gittip.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 34889
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;www.gittip.com.            IN  A

;; ANSWER SECTION:
www.gittip.com.     3424    IN  CNAME   nara-9076.herokussl.com.
nara-9076.herokussl.com. 1548   IN  CNAME   elb002959-2580971.us-east-1.elb.amazonaws.com.
elb002959-2580971.us-east-1.elb.amazonaws.com. 48 IN A 50.17.238.165
elb002959-2580971.us-east-1.elb.amazonaws.com. 48 IN A 50.16.250.190
elb002959-2580971.us-east-1.elb.amazonaws.com. 48 IN A 107.20.208.114

;; Query time: 14 msec
;; SERVER: 192.168.1.1#53(192.168.1.1)
;; WHEN: Thu Apr 10 18:08:26 2014
;; MSG SIZE  rcvd: 170

I saw this certificate: https://gist.github.com/xnyhps/4874584cb9d2b837d972

@aeden

aeden commented Apr 10, 2014

nara-9076.herokussl.com is the correct hostname at Heroku as far as I can tell.

; <<>> DiG 9.8.3-P1 <<>> @ns1.dnsimple.com www.gittip.com
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53793
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;www.gittip.com.            IN  A

;; ANSWER SECTION:
www.gittip.com.     3600    IN  CNAME   nara-9076.herokussl.com.

;; Query time: 77 msec
;; SERVER: 198.241.10.53#53(198.241.10.53)
;; WHEN: Thu Apr 10 18:33:17 2014
;; MSG SIZE  rcvd: 66

I have no idea how Heroku does their SSL host routing, but it does not seem to be a DNS issue AFAICT.

@chadwhitacre
Contributor Author

@calvinhp is also seeing teespring.com. Reopening here and filing a support ticket with Heroku (login required).

@chadwhitacre chadwhitacre reopened this Apr 11, 2014
@chadwhitacre
Contributor Author

Debugging with @calvinhp and @cyberdelia at Heroku booth at Pycon.

@chadwhitacre
Contributor Author

Thinking it might be Firefox related?

@chadwhitacre
Contributor Author

@calvinhp shift-refreshed and can't repro anymore. :-( But he was using Firefox.

@chadwhitacre
Contributor Author

@craigkerstiens "Sounds like an intermittent ELB routing issue."

@chadwhitacre
Contributor Author

@craigkerstiens "Ping me in a couple days if you don't hear anything else and I'll escalate with Amazon. If you're seeing this then other people are too and we can press them for more details."

@chadwhitacre
Contributor Author

@jacobian "I'm almost positive it's an ELB bug but I need more evidence to take to Amazon."

@jacobian

So I think this is a different issue this time: last time it looks like this was a DNS issue (gittip.com resolving to the wrong SSL endpoint). This time, it seems like what's happening is that the SSL endpoint is serving the wrong certificate. I've seen this a couple of times before, but I don't know exactly what the deal is or how to track it down.

One other idea a co-worker had: it's possible that this is a Firefox bug: https://bugzilla.mozilla.org/show_bug.cgi?id=151929

@jacobian

@whit537 @calvinhp can either of you validate or contradict the hypothesis that this is a Firefox bug? The symptoms described in that bug seem like they could account for this problem, but if you've seen it in other browsers then it's definitely something else. I threw an absurd amount of requests at some SSL endpoints overnight and was unable to get the wrong certificate - not that that proves anything, but it does suggest that the endpoints are OK and that it's a client issue.
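For anyone who wants to repeat that kind of endpoint hammering, here is a rough sketch of one way to do it with Python's standard `ssl` module. It is not the script used for the test described above, which isn't shown in this thread. All function names are hypothetical, and `name_matches` is a deliberately simplified stand-in for real hostname verification (exact match or a single leading `*.` wildcard only).

```python
import socket
import ssl


def cert_dns_names(cert):
    """Collect DNS names from a certificate dict as returned by SSLSocket.getpeercert()."""
    names = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    if not names:
        # No SAN extension: fall back to the subject's commonName entries.
        for rdn in cert.get("subject", ()):
            for key, value in rdn:
                if key == "commonName":
                    names.append(value)
    return names


def name_matches(pattern, hostname):
    """Simplified match: exact name, or one leading '*.' wildcard label."""
    pattern, hostname = pattern.lower(), hostname.lower()
    if pattern.startswith("*."):
        return "." in hostname and hostname.split(".", 1)[1] == pattern[2:]
    return pattern == hostname


def cert_covers(cert, hostname):
    """True if any name in the certificate matches hostname."""
    return any(name_matches(name, hostname) for name in cert_dns_names(cert))


def fetch_cert(hostname, port=443, timeout=10):
    """Fetch the served certificate without aborting on a hostname mismatch."""
    context = ssl.create_default_context()
    # Chain validation stays on; only the hostname check is disabled so that
    # a certificate for the wrong site can be inspected instead of raising.
    context.check_hostname = False
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()


def count_mismatches(hostname, attempts=100):
    """Count handshakes whose certificate does not cover hostname."""
    return sum(not cert_covers(fetch_cert(hostname), hostname) for _ in range(attempts))
```

A nonzero result from `count_mismatches("www.gittip.com")` would point at the endpoint rather than the client. A zero result, as noted above, doesn't prove anything on its own.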

@xnyhps
Contributor

xnyhps commented Apr 12, 2014

FWIW, I was using Firefox.

@chadwhitacre
Contributor Author

@jacobian I haven't seen it myself so I can neither confirm nor deny (as they say). @calvinhp Were you using Firefox?

@chadwhitacre
Contributor Author

Closing. Consider reticketing if it happens again?

@chadwhitacre
Contributor Author

@clone1018 Let's use #2586 and keep this closed. I think we decided above that the SSL misfire is different from the DNS issue that started this ticket.
