This repository has been archived by the owner on Feb 8, 2018. It is now read-only.

Twitter user avatars are broken #1936

Closed
seanlinsley opened this issue Jan 22, 2014 · 21 comments · Fixed by #1989

Comments

@seanlinsley
Contributor

[image: Screen Shot 2014-01-22 at 4.48.16 PM]

This is currently being discussed in IRC.

@chadwhitacre
Contributor

Appears to be a Twitter issue: https://dev.twitter.com/discussions/25385.

@chadwhitacre
Contributor

Here's a script to convert si0 to pbs if we need to go that route (take out the rollback):

BEGIN;

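    -- Point every si0.twimg.com avatar at pbs.twimg.com instead.
    -- 'https://si0.' is 12 characters, so the rest of the URL starts at 13.
    UPDATE elsewhere SET user_info=user_info || hstore(
        'profile_image_url_https',
        'https://pbs.' || substring(user_info->'profile_image_url_https' from 13)
    ) WHERE platform='twitter'
        AND user_info->'profile_image_url_https' LIKE 'https://si0.%';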

    SELECT user_info->'profile_image_url_https' FROM elsewhere WHERE platform='twitter';

    ROLLBACK;

END;
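
The same host swap can also be done outside the database. What follows is a minimal sketch, assuming the stored URLs have the form https://si0.twimg.com/...; the helper name `to_pbs` is hypothetical and is not part of Gittip.

```python
from urllib.parse import urlsplit, urlunsplit


def to_pbs(url):
    """Return url with an si0.twimg.com host swapped for pbs.twimg.com.

    URLs on any other host are returned unchanged (hypothetical helper).
    """
    parts = urlsplit(url)
    if parts.netloc != 'si0.twimg.com':
        return url
    return urlunsplit(parts._replace(netloc='pbs.twimg.com'))
```

Matching on the parsed host rather than on a character offset means URLs on other subdomains (such as a0) are left alone instead of being mangled.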

@chadwhitacre
Contributor

Still borken.

@seanlinsley
Contributor Author

@whit537 do all URLs in the database use si0 currently?

@clone1018
Contributor

@seanlinsley
Contributor Author

Is there any correlation between the 4431 users with a pbs URL? Are they new?

Are there any other subdomains in use?

@galuszkak
Contributor

@seanlinsley I really don't know. But I know that there are more subdomains on twimg (like a0).

I suggest turning @whit537's SQL into a script that does this (pseudocode):

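import requests

def fix_avatar(url):
    # Check whether the stored image still loads.
    if requests.get(url).status_code in (403, 404):
        # Check whether the same path on pbs works.
        pbs_url = url.replace('https://si0.', 'https://pbs.', 1)
        if requests.get(pbs_url).status_code in (200, 301):
            # If it does, store the new URL (update_url is a placeholder).
            update_url(pbs_url)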

@chadwhitacre
Contributor

Some of these are coming back, others aren't. MaxCDN and Bountysource are back, UkuleleRod isn't. Could be because the first two have logged in since this started.

@chadwhitacre
Contributor

Confirmed: MaxCDN and Bountysource are now on pbs, while UkuleleRod is still on si0. I checked a backup from last week and all three were on si0 last week.

@chadwhitacre
Contributor

What's the harm in switching everyone who is si0 to pbs, per #1936 (comment)? I suppose we're assuming that all si0s are busted and all pbss are good. We could/should verify that assumption before pulling the trigger.

@chadwhitacre
Contributor

#!/usr/bin/env python3
import sys

import requests

# Request every URL in twimg.csv (one per line) and print the failures.
with open('twimg.csv') as urls:
    for line in urls:
        url = line.strip()
        response = requests.get(url)
        if response.status_code != 200:
            print(response.status_code, url)
        sys.stdout.flush()

I'm running that script against 18,960 URLs. Will report back ...
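
A run over this many URLs can be cut short by a single unhandled network error. Below is a minimal sketch of a more defensive variant. It uses the standard library's `urllib` instead of `requests` (a swapped-in choice, not what the script above does), and the names `fetch_status` and `check_urls` are hypothetical.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or the exception name on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as exc:
        return exc.code  # 4xx/5xx responses arrive as exceptions
    except OSError as exc:
        return type(exc).__name__  # timeouts, DNS failures, resets


def check_urls(urls, fetch=fetch_status):
    """Yield (status, url) for every URL that does not come back as 200."""
    for line in urls:
        url = line.strip()
        status = fetch(url)
        if status != 200:
            yield status, url
```

Passing the fetch function in as a parameter keeps the loop testable without touching the network; something like `for status, url in check_urls(open('twimg.csv')): print(status, url)` would mirror the original output format.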

@clone1018
Contributor

Just don't do it from production :D

@chadwhitacre
Contributor

:-)

[gittip] $ grep "403 " twimg.log | wc -l
   13123
[gittip] $ grep "404 " twimg.log | wc -l
     534
[gittip] $ grep "si0" twimg.log | wc -l
   13123
[gittip] $ grep "pbs" twimg.log | wc -l
     534
[gittip] $ wc -l twimg.log
   13657 twimg.log
[gittip] $ echo 13123 534 + p | dc
13657
[gittip] $

The script died before reaching 18,960, not sure why. Also, why are the pbs ones 404 instead of 200?
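
The matching grep counts suggest that every 403 is an si0 URL and every 404 is a pbs URL, but they do not show the pairing directly. A cross-tabulation would; here is a minimal sketch, assuming each log line has the shape "STATUS URL" as printed by the script. The function name `tally` is hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def tally(log_lines):
    """Count (status, subdomain) pairs from lines like '403 https://si0.twimg.com/...'."""
    counts = Counter()
    for line in log_lines:
        if not line.strip():
            continue  # skip blank lines
        status, url = line.split(None, 1)
        subdomain = urlsplit(url.strip()).netloc.split('.')[0]
        counts[(status, subdomain)] += 1
    return counts
```

Running `tally(open('twimg.log'))` would then list each status/subdomain combination with its count.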

@chadwhitacre
Contributor

Blech. This sucks.

@chadwhitacre
Contributor

The right ways to fix this are:

Neither of those is trivial.

@chadwhitacre
Contributor

There's a script in #1989 to fix this as a one-off. Spinning up a DO VPS to run it (using the payday image) ...

@chadwhitacre
Contributor

The script died mysteriously (forgot to redirect stderr :/ ) after processing 4036 accounts. Before rerunning it's probably worth rewriting to use users/lookup (100 at a time) instead of users/show (one at a time), per #1989 (comment).
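
Switching to users/lookup mostly comes down to grouping accounts into batches of 100 and sending each batch as one comma-separated parameter. This is a minimal sketch of the batching only, assuming accounts are identified by numeric user IDs; the helper names are hypothetical and the script in #1989 may well do it differently.

```python
def batches(items, size=100):
    """Split items into consecutive lists of at most size elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def lookup_params(user_ids, size=100):
    """Build one users/lookup parameter dict per batch of user IDs."""
    return [{'user_id': ','.join(str(uid) for uid in batch)}
            for batch in batches(user_ids, size)]
```

Each returned dict would be passed as the query parameters of a single users/lookup request.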

@chadwhitacre
Contributor

Rewrote the script to use lookup and am rerunning it now. It still has a 5-second sleep between hits. If we were under 18,000 we could fit inside one 15-minute window, but we're at ~19,000.
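
The arithmetic behind those numbers can be sketched as follows. The limit of 180 requests per 15-minute window is an assumption inferred from the 18,000 figure (180 requests of 100 users each); the batch size and the sleep come from the comments above.

```python
import math

USERS_PER_REQUEST = 100    # users/lookup batch size mentioned in the thread
REQUESTS_PER_WINDOW = 180  # assumption: implied by the 18,000 figure
SLEEP_SECONDS = 5          # pause between requests, from the comment


def plan(total_users):
    """Return (requests needed, rate-limit windows needed, minutes spent sleeping)."""
    requests_needed = math.ceil(total_users / USERS_PER_REQUEST)
    windows = math.ceil(requests_needed / REQUESTS_PER_WINDOW)
    minutes = requests_needed * SLEEP_SECONDS / 60
    return requests_needed, windows, minutes
```

Under these assumptions, 18,000 accounts need 180 requests and fit in one window, while 19,000 need 190 requests and spill into a second one. A 5-second sleep also spaces 180 requests across exactly 15 minutes, so the pause alone keeps the script at the assumed limit.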

@chadwhitacre
Contributor

This should be done in 15-20 minutes.

@chadwhitacre
Contributor

Done! 💃

@chadwhitacre
Contributor

[image: Screen Shot 2014-02-07 at 1.35.37 PM]
