Public scraping methods fail due to login screen on Instagram on production builds #24

Closed
wjx0820 opened this issue May 6, 2019 · 157 comments
Labels
bug Something isn't working help wanted Extra attention is needed

Comments

@wjx0820

wjx0820 commented May 6, 2019

I use this plugin in my demo and it does not work: it says it could not fetch instagram posts and no Gatsby nodes are generated (I didn't use any token and just want public scraping for posts). So I cloned your repo, cd'd into /example, ran yarn install and then 'npm run develop', and the same problem seems to happen. Am I missing something? Thanks!

@oorestisime
Owner

This is odd.
I pushed a new yarn lock file in the example; I had forgotten to do so. Can you pull, run yarn install, and try again?

@wjx0820
Author

wjx0820 commented May 6, 2019

Thanks for the quick reply, but it still does not work...
It waits a long time and then says it could not fetch instagram posts...
Is it a temporary situation?

@oorestisime
Owner

It works for me. I am not sure what is happening locally for you. Are you able to reach Instagram? Was it working for you and just broke, or has it never worked?

@wjx0820
Author

wjx0820 commented May 6, 2019

I can go to instagram in the browser, but it still does not work.

The error messages look like this:
Could not fetch instagram posts. Error status Error: write EPROTO 4472038848:error:1408F10B:SSL routines:ssl3_get_record:wrong version number:../deps/openssl/openssl/ssl/record/ssl3_record.c:252:

warning The gatsby-source-instagram plugin has generated no Gatsby nodes. Do you need it?

My location is China, so it must be an issue with the GFW... I tried using a proxy in the terminal and then ran npm run develop, but it still waits for a long time and then fails...😭

@oorestisime
Owner

:( I am really sorry but i am not sure how i can help here :( You will need to try with a vpn or something.

@calpa

calpa commented May 6, 2019

@Jexxie Instagram may be blocked in China according to the "law", so you may need to find another way.

This should not be a problem with this plugin.

@oorestisime
Owner

Closing this! feel free to reopen if you think there's something to be done in the scope of the plugin!

@tinoguti

I am having trouble with this same description when I try to build my Gatsby app on AWS but when I try locally somehow it works. I thought I solved the problem by updating npm packages and I had 2 successful builds but today it is not building. Any help on how to solve this problem?
Here's a copy of my log with the part where everything seems to start failing:

2020-05-10T16:23:33.187Z [INFO]: success createSchemaCustomization - 0.073s
2020-05-10T16:23:33.560Z [WARNING]: warning
                                    Could not fetch instagram posts. Error status TypeError: Cannot read property '0' of undefined
2020-05-10T16:23:33.708Z [WARNING]: warning The gatsby-source-instagram plugin has generated no Gatsby nodes. Do you need it?
2020-05-10T16:23:33.708Z [INFO]: success source and transform nodes - 0.521s
2020-05-10T16:23:33.966Z [INFO]: success building schema - 0.257s
2020-05-10T16:23:34.017Z [INFO]: success createPages - 0.049s
2020-05-10T16:23:34.082Z [INFO]: success createPagesStatefully - 0.065s
2020-05-10T16:23:34.082Z [INFO]: success onPreExtractQueries - 0.000s
2020-05-10T16:23:34.111Z [INFO]: success update schema - 0.028s
2020-05-10T16:23:34.446Z [WARNING]: error There was an error in your GraphQL query:
                                    Cannot query field "allInstaNode" on type "Query".
                                    If you don't expect "allInstaNode" to exist on the type "Query" it is most likely a typo.
                                    However, if you expect "allInstaNode" to exist there are a couple of solutions to common problems:
                                    - If you added a new data source and/or changed something inside gatsby-node.js/gatsby-config.js, please try a restart of your development server
                                    - The field might be accessible in another subfield, please try your query in GraphiQL and use the GraphiQL explorer to see which fields you can query and what shape they have
                                    - You want to optionally use your field "allInstaNode" and right now it is not used anywhere. Therefore Gatsby can't infer the type and add it to the GraphQL schema. A quick fix is to add a least one entry with that field ("dummy content")
                                    It is recommended to explicitly type your GraphQL schema if you want to use optional fields. This way you don't have to add the mentioned "dummy content". Visit our docs to learn how you can define the schema for "Query":
                                    https://www.gatsbyjs.org/docs/schema-customization/#creating-type-definitions
2020-05-10T16:23:34.449Z [INFO]: failed extract queries from components - 0.337s
2020-05-10T16:23:34.514Z [WARNING]: npm
2020-05-10T16:23:34.515Z [WARNING]: ERR! code ELIFECYCLE
                                    npm ERR! errno 1
2020-05-10T16:23:34.515Z [WARNING]: npm
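
The build aborts at query extraction because no `InstaNode` type exists once scraping has failed. One possible mitigation, following the log's own advice to type the schema explicitly, is sketched below. It assumes Gatsby's `createSchemaCustomization` API in the site's own `gatsby-node.js`; the field list and the `localFile___NODE` link are assumptions based on the fields people query in this thread, not the plugin's full schema.

```javascript
// gatsby-node.js (site level). A sketch, not part of gatsby-source-instagram.
// Declaring InstaNode up front should keep `allInstaNode` in the schema even
// when scraping produced no nodes, so the page query can return an empty list
// instead of failing the build.
const instaNodeTypeDefs = `
  type InstaNode implements Node {
    username: String
    caption: String
    # ASSUMPTION: the plugin stores the downloaded image under
    # localFile___NODE; adjust if your version differs.
    localFile: File @link(from: "localFile___NODE")
  }
`;

const createSchemaCustomization = ({ actions }) => {
  actions.createTypes(instaNodeTypeDefs);
};

exports.createSchemaCustomization = createSchemaCustomization;
```

This only stops the build from breaking; the page would still render without posts until a later build manages to fetch them.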

@tinoguti

Now I am intrigued. I've just redeployed my app and it worked. Any idea why this seems to randomly fail? I wouldn't like to have a few failed builds hoping the next one will be the one every time I need to deploy.

@oorestisime
Owner

Hey there, kind of hard to know without reproduction :/

Is it with public scraping or with the graph api?
Where does this run? amplify?

@tinoguti

tinoguti commented May 10, 2020

Hi. Yes, it's running on Amplify with public scraping, with just a username added to the plugin config parameters.

GraphQL query looks like this:
```
allInstaNode(limit: 8) {
  edges {
    node {
      id
      username
      caption
      localFile {
        childImageSharp {
          fixed(width: 500, height: 500) {
            ...GatsbyImageSharpFixed
          }
        }
      }
    }
  }
}
```
It works well with gatsby develop and gatsby build locally, but I'm getting "random" build failures on Amplify.

@oorestisime
Owner

Yeah, the query isn't the issue. The error you showed above appears when the plugin couldn't get the Instagram posts. Maybe something was wrong with Instagram at that particular time? Is it still happening?

I just triggered another rebuild on netlify for the example app and seems to be working fine

@LarsBehrenberg

I seem to have the same issue. Just a week ago it seemed to have worked fine.
I am deploying my site with Netlify. With the gatsby develop or build command locally I don't run into issues and even with the NetlifyCLI running the local netlify build command everything is alright. But as soon as I push to Netlify the build fails.

7:18:17 PM: $ yarn build
7:18:17 PM: yarn run v1.22.4
7:18:17 PM: $ gatsby clean && gatsby build
7:18:18 PM: 
7:18:18 PM: info Deleting .cache, public
7:18:18 PM: info Successfully deleted directories
7:18:20 PM: 
7:18:20 PM: success open and validate gatsby-configs - 0.067s
7:18:22 PM: 
7:18:22 PM: success load plugins - 1.694s
7:18:22 PM: 
7:18:22 PM: success onPreInit - 0.016s
7:18:22 PM: success delete html and css files from previous builds - 0.018s
7:18:22 PM: 
7:18:22 PM: success initialize cache - 0.012s
7:18:22 PM: 
7:18:22 PM: success copy gatsby files - 0.047s
7:18:22 PM: 
7:18:22 PM: success onPreBootstrap - 0.010s
7:18:22 PM: 
7:18:22 PM: success createSchemaCustomization - 0.012s
7:18:23 PM: 
7:18:23 PM: warning
7:18:23 PM: Could not fetch instagram posts. Error status TypeError: Cannot read property '0' of undefined
7:18:23 PM: success source and transform nodes - 1.476s
7:18:24 PM: 
7:18:24 PM: success building schema - 0.578s
7:18:24 PM: 
7:18:24 PM: success createPages - 0.240s
7:18:24 PM: success createPagesStatefully - 0.162s
7:18:24 PM: 
7:18:24 PM: success onPreExtractQueries - 0.001s
7:18:24 PM: success update schema - 0.084s
7:18:25 PM: error There was an error in your GraphQL query:
7:18:25 PM: Cannot query field "allInstaNode" on type "Query".
7:18:25 PM: If you don't expect "allInstaNode" to exist on the type "Query" it is most likely a typo.
7:18:25 PM: However, if you expect "allInstaNode" to exist there are a couple of solutions to common problems:
7:18:25 PM: - If you added a new data source and/or changed something inside gatsby-node.js/gatsby-config.js, please try a restart of your development server
7:18:25 PM: - The field might be accessible in another subfield, please try your query in GraphiQL and use the GraphiQL explorer to see which fields you can query and what shape they have
7:18:25 PM: - You want to optionally use your field "allInstaNode" and right now it is not used anywhere. Therefore Gatsby can't infer the type and add it to the GraphQL schema. A quick fix is to add a least one entry with that field ("dummy content")
7:18:25 PM: It is recommended to explicitly type your GraphQL schema if you want to use optional fields. This way you don't have to add the mentioned "dummy content". Visit our docs to learn how you can define the schema for "Query":
7:18:25 PM: https://www.gatsbyjs.org/docs/schema-customization/#creating-type-definitions
7:18:25 PM: not finished Generating image thumbnails - 1.265s
7:18:25 PM: failed extract queries from components - 0.823s

@oorestisime
Owner

I am seeing this now on netlify as well. locally works fine. I am investigating right now

@oorestisime
Owner

Well now it went through on netlify. I think maybe network issue? I can't find any way to reproduce locally so not sure what i can do here :/

@LarsBehrenberg

I am really not sure what is happening here, but it seems like this is not the plugin's fault? I deployed the same build a couple of times on Netlify, clearing the cache before each build. The first two times everything worked fine, and then the third time it broke.

warning
11:15:23 AM: Could not fetch instagram user. Error status TypeError: Cannot read property '0' of undefined
11:15:23 AM: 
11:15:23 AM: error "gatsby-source-instagram" threw an error while running the sourceNodes lifecycle:
11:15:23 AM: Cannot read property 'id' of null
11:15:23 AM:   74 |   return {
11:15:23 AM:   75 |     type: params.type,
11:15:23 AM: > 76 |     id: datum.id,
11:15:23 AM:      |               ^
11:15:23 AM:   77 |     full_name: datum.full_name,
11:15:23 AM:   78 |     biography: datum.biography,
11:15:23 AM:   79 |     edge_followed_by: datum.edge_followed_by,
11:15:23 AM: 
11:15:23 AM: 
11:15:23 AM: 
11:15:23 AM:   TypeError: Cannot read property 'id' of null
11:15:23 AM:   
11:15:23 AM:   - gatsby-node.js:76 createUserNode
11:15:23 AM:     [repo]/[gatsby-source-instagram]/gatsby-node.js:76:15
11:15:23 AM:   
11:15:23 AM:   - gatsby-node.js:91 processDatum
11:15:23 AM:     [repo]/[gatsby-source-instagram]/gatsby-node.js:91:49
11:15:23 AM:   
11:15:23 AM:   - gatsby-node.js:125 
11:15:23 AM:     [repo]/[gatsby-source-instagram]/gatsby-node.js:125:16
11:15:23 AM:   
11:15:23 AM:   - Array.map
11:15:23 AM:   
11:15:23 AM:   - gatsby-node.js:123 Object.exports.sourceNodes
11:15:23 AM:     [repo]/[gatsby-source-instagram]/gatsby-node.js:123:29
11:15:23 AM:   
11:15:23 AM:   - task_queues.js:97 processTicksAndRejections
11:15:23 AM:     internal/process/task_queues.js:97:5
11:15:23 AM:   
11:15:23 AM: 
11:15:23 AM: not finished source and transform nodes - 1.120s

Btw, I am not using an Instagram API, maybe this makes a difference?

@pul87

pul87 commented May 31, 2020

I'm having the same problem on Netlify, seems an issue related to the platform, on development it works.

@LarsBehrenberg

Anything we can do about it? Some workaround?

@oorestisime
Owner

Well, the issue is not with the plugin afaict. It happens when scraping the page doesn't work. But I can't find out whether this is an issue on Netlify itself (much more likely, since it never happens in dev) or whether Instagram is testing things out. After all, the public scraping is extremely dependent on their raw HTML :)

The only thing I can suggest is to use the API. It worked the whole time I was testing.
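
For anyone switching from public scraping to the API, a config sketch follows. The option names `username`, `access_token` and `instagram_id` are as I understand them from the plugin README and should be checked against the version you install; the environment variable names are made up for illustration.

```javascript
// gatsby-config.js. A sketch; option names assumed from the plugin README.
const config = {
  plugins: [
    {
      resolve: "gatsby-source-instagram",
      options: {
        username: process.env.INSTAGRAM_USERNAME, // hypothetical env var
        access_token: process.env.INSTAGRAM_ACCESS_TOKEN, // hypothetical env var
        instagram_id: process.env.INSTAGRAM_BUSINESS_ID, // hypothetical env var
      },
    },
  ],
};

module.exports = config;
```

Reading the token from the environment keeps it out of the repository, which matters because it has to be regenerated from time to time.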

Has anybody tried to contact Netlify support to see if anything is off in the network?

@oorestisime
Owner

I am re-opening this for visibility among folks checking the issues.

@oorestisime reopened this on Jun 1, 2020
@oorestisime changed the title from "The gatsby-source-instagram plugin has generated no Gatsby nodes" to "Sometimes builds on Netlify fail with inability to fetch nodes" on Jun 1, 2020
@dmcreis

dmcreis commented Jun 1, 2020

Same problem happening here. I tried a couple of times today and the build is failing. is there any workaround for this? @oorestisime thanks for all your replies :)

@tonilaukka

tonilaukka commented Jun 1, 2020

I'm having the same issue although my builds fail randomly and hitting redeploy helps.

This is the error message:

11:55:31 PM: error There was an error in your GraphQL query:
11:55:31 PM: - Unknown field 'allInstaNode' on type 'Query'.
11:55:31 PM: failed extract queries from components - 0.464s

@xanderjl

xanderjl commented Jun 2, 2020

I've also been running into this issue. Noticed a string of failed builds on the deploy dashboard with the same error. I hope your Netlify post brings attention to the problem!

@LarsBehrenberg

If you could join the netlify post and let people know you have had the same issue, maybe it will gain more attention. Would really appreciate it! Thanks so much here already for all the help!

@dbertella

dbertella commented Jan 21, 2021

@samason can you share some code example on what you've done? Can you fetch instagram as not authenticated user?
Ok I see this right?

https://instagram.com/graphql/query/?query_id=17888483320059182&variables={"id":"${username}","first":100,"after":null}

I think I'll have a go. Today my builds started to fail again even though I was using the id method.

@dbertella

dbertella commented Jan 21, 2021

OK, just in case someone wants to follow this path, this is my attempt to fetch all the Instagram posts from the client:

// https://github.com/oorestisime/gatsby-source-instagram/blob/master/src/instagram.js
const igUrl = (userId) =>
  `https://instagram.com/graphql/query/?query_id=17888483320059182&variables={"id":"${userId}","first":12,"after":null}`;

...

  const [allInsta, setAllInsta] = useState([]);
  useEffect(() => {
    fetch(igUrl(IG_ID))
      .then((res) => res.json())
      .then(({ data }) => {
        // Keep only the fields needed for display; the response contains more.
        const edges = data?.user?.edge_owner_to_timeline_media?.edges ?? [];
        const photos = edges
          .filter((edge) => edge.node)
          .map(({ node }) => ({
            id: node.id,
            thumbnail: node.thumbnail_resources[2]?.src,
            // The caption edges can be empty when a post has no caption.
            caption: node.edge_media_to_caption.edges[0]?.node.text ?? "",
          }));
        setAllInsta(photos);
      })
      // If Instagram answers with its login page instead of JSON,
      // res.json() rejects and we end up here.
      .catch((err) => console.error("Could not fetch Instagram posts", err));
  }, []);

@LarsBehrenberg

I encountered this issue as well last year. And I don't think it's this plugin's fault, but rather down to Instagram doing weird stuff. In the meantime, I also went with fetching the posts manually and wrote a blog post about how to do so.
Here are some links to explanation in the blog post, code, and working demo.

@owenhoskins

owenhoskins commented Jan 26, 2021

> Here are some links to explanation in the blog post, code, and working demo.

Hey @LarsBehrenberg, thanks for sharing! The method you describe uses the same endpoint as the current public scraping for posts:

https://www.instagram.com/graphql/query?query_id=17888483320059182&variables={"id":"${INSTAGRAM_ID}","first":${PHOTO_COUNT},"after":null}

I see that it is working in the demo. But I wonder if it would also suffer from the same "rate-limiting" login wall that we've been hitting lately.

In my use case, for an artist representation agency, I am scraping the posts of 60+ Instagram accounts, and after a few builds within a short time frame we hit the login wall. I wonder if the dynamic component would suffer a similar fate if the page had many visitors and thus many requests to the endpoint?
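
Since this URL keeps being pasted with raw JSON in the query string, here is a small sketch of building it with proper encoding. The `query_id` value and the `id`/`first`/`after` variables are copied from the URL above; the helper name `buildPostsUrl` and the sample id are my own, and nothing here gets around the login wall.

```javascript
// Builds the public GraphQL URL quoted in this thread, with the variables
// serialized as JSON and percent-escaped by URLSearchParams.
// `buildPostsUrl` is a hypothetical helper name.
const QUERY_ID = "17888483320059182"; // value taken from the URL in this thread

function buildPostsUrl(instagramId, photoCount, after = null) {
  const url = new URL("https://www.instagram.com/graphql/query/");
  url.searchParams.set("query_id", QUERY_ID);
  url.searchParams.set(
    "variables",
    JSON.stringify({ id: String(instagramId), first: photoCount, after })
  );
  return url.toString();
}

// "123456" is a placeholder id, not a real account.
console.log(buildPostsUrl("123456", 12));
```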

@LarsBehrenberg

@owenhoskins Good question! That might well be the case. I haven't run into any issues using this method yet, and I have been using it for the last 8 months or so. So I guess you'll just have to try it out and see what happens... Sorry for not being more helpful :S

@radscheit

@LarsBehrenberg Thanks for your effort and I like the approach. But currently, your demo isn't working any longer – at least for me:

Screenshot 2021-02-16 at 08 47 20

@LarsBehrenberg

> @LarsBehrenberg Thanks for your effort and I like the approach. But currently, your demo isn't working any longer – at least for me:
>
> Screenshot 2021-02-16 at 08 47 20

Not sure, just checked and it's working fine for me.

@radscheit

radscheit commented Feb 16, 2021

> > @LarsBehrenberg Thanks for your effort and I like the approach. But currently, your demo isn't working any longer – at least for me:
> > Screenshot 2021-02-16 at 08 47 20
>
> Not sure, just checked and it's working fine for me.

I can reproduce these errors when using the same IP address, but it works after changing my IP address via VPN. So does consumption of that API depend on the end user's rate limit?

@VT-Web-Development

[image]
I followed a lot of guides and even did this
and used the end id, but I still get the same error. I just want to show my posts on my website, nothing more.
I spent my whole day looking into this but still can't figure it out. My version is 0.8.0.

That's exactly what happened when I deployed to Netlify.

@tijsluitse

Same here @VT-Web-Development! were you able to fix it? Having this issue on multiple websites now..

@VT-Web-Development

> Same here @VT-Web-Development! were you able to fix it? Having this issue on multiple websites now..

No, I am not using it for now. I will try other solutions.

@joshua-isaac

Has anyone found a solution to this or a solution in general on how to get Instagram data in a react/gatsby app?

@beamercola

@joshua-isaac The only thing that works for me is Zapier > Airtable

@VT-Web-Development

> @joshua-isaac The only thing that works for me is Zapier > Airtable

But you have to pay for it.

@Aarekaz

Aarekaz commented Mar 26, 2021

Any updates? I have been having this issue and it's preventing me from deploying.
[image]

@oorestisime
Owner

Hi, just to give a heads up.

There is nothing to do here for the plugin. It's on Instagram to stop their login walls; I can't do anything :(
The reason I am keeping this ticket open is that it has a lot of information for anyone who wishes to read it.

@SignetOHara

Hi everyone, thanks for this thread.

So just to clarify, if we use the Graph API method rather than public scraping are there still issues? If not, does anyone know of any up to date tutorial/guides on how to set up? Instagram/FB seem to want to make it as convoluted as possible...

I'm using public scraping during development and it usually works as long as the dev server isn't restarted numerous times (which fits with what everyone is saying). Haven't yet deployed to Netlify though.

@mcljs

mcljs commented Apr 13, 2021

[image]
Hello everyone. This morning I installed the gatsby-source-instagram plugin and it worked for me. Now, at night, when I go to make other changes, it no longer fetches anything: I get this, and it has not succeeded in fetching my nodes. Could somebody help me?

@tijsluitse

Hey there, I found out that my access token was no longer valid. After creating a new version via the steps explained here: https://www.gatsbyjs.com/plugins/gatsby-source-instagram/#instagram-graph-api-token, the public scraping worked again. You can check out your token here: https://developers.facebook.com/tools/debug/accesstoken/.

@matteocarpi

hey @tijsluitse thanks for the links!

I was facing this issue, followed that procedure, and was able to get an access token. Unfortunately I wasn't able to get a token that never expires :/. When I click "debug" the second time, I am told it still expires in three months.

The only thing I had to do differently was use v10.0 of the Graph API, as the Explorer gave me no other option.
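
Because a token that quietly expires would break builds again in three months, it may be worth checking the expiry during the build. The sketch below assumes the token debug output exposes an `expires_at` value as a Unix timestamp in seconds, with `0` meaning no expiry; that shape and the helper name `daysUntilExpiry` are assumptions to verify against what the Access Token Debugger actually shows.

```javascript
// Returns the whole number of days left before a token expires, or Infinity
// for a non-expiring token. ASSUMPTION: `expiresAt` is a Unix timestamp in
// seconds and 0 means "never expires".
function daysUntilExpiry(expiresAt, nowMs = Date.now()) {
  if (expiresAt === 0) return Infinity;
  const msLeft = expiresAt * 1000 - nowMs;
  return Math.floor(msLeft / (24 * 60 * 60 * 1000));
}

// Example with made-up dates: warn when fewer than 14 days remain
// (the threshold is arbitrary).
const now = Date.UTC(2021, 4, 25); // 25 May 2021
const expiresAt = Date.UTC(2021, 5, 1) / 1000; // 1 June 2021, in seconds
if (daysUntilExpiry(expiresAt, now) < 14) {
  console.warn("Instagram access token expires soon; generate a new one.");
}
```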

@kanlanc

kanlanc commented May 25, 2021

@matteocarpi I have the same thing: I cannot access any other API version. But I can't find the permissions mentioned. Were they available for you?

@matteocarpi

hey @kanlanc yes, but it was tricky.

When you click on "Generate Access Token" and a new window pops up, you should be able to click on "Edit settings" or something similar, and allow the app to access the Facebook page connected to the Instagram account (if that makes sense). At that point, once you confirm and go back to the API Explorer, you should be able to see all the permissions you need.

Hope that helps :)

@knhn1004

knhn1004 commented Jul 1, 2021

same, my clients bothered me recently and I still cannot resolve the issue
I would have to remove the "Posts" area for their website temporarily
seems like an issue with the connection between netlify and instagram

@kekshibata

So, public scraping is no longer available...?
It works for me on local development, but on gatsby cloud build it fails

@reaganchisholm
Contributor

> So, public scraping is no longer available...?
> It works for me on local development, but on gatsby cloud build it fails

Public scraping will fail more frequently on hosting sites like Netlify and Gatsby Cloud because there are likely many sites doing public scraping on Instagram via those hosts. Instagram then rate limits those host servers by returning a login page, which causes most builds using public scraping to fail on those hosts.
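
In code terms, the failure looks like an HTML login page arriving where JSON was expected. Below is a minimal sketch of detecting that case so a build can fail with a clearer message. The heuristics (a redirect to a URL containing `/accounts/login`, or a non-JSON body) are assumptions about how the login wall presents itself, not documented Instagram behaviour, and `looksLikeLoginWall` is a hypothetical helper name.

```javascript
// Heuristic check for Instagram's login wall. ASSUMPTIONS: the wall shows up
// either as a redirect to a URL containing "/accounts/login" or as a non-JSON
// body where the GraphQL endpoint would normally return JSON.
function looksLikeLoginWall({ finalUrl, contentType, body }) {
  if (finalUrl.includes("/accounts/login")) return true;
  if (!contentType.includes("application/json")) return true;
  try {
    JSON.parse(body);
    return false;
  } catch (err) {
    return true; // declared as JSON, but the body does not parse
  }
}

// Usage with hand-written sample responses (not real Instagram output).
console.log(
  looksLikeLoginWall({
    finalUrl: "https://www.instagram.com/accounts/login/",
    contentType: "text/html; charset=utf-8",
    body: "<!DOCTYPE html><html>...</html>",
  })
); // true
console.log(
  looksLikeLoginWall({
    finalUrl: "https://www.instagram.com/graphql/query/",
    contentType: "application/json; charset=utf-8",
    body: '{"data":{"user":null}}',
  })
); // false
```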

@oorestisime
Owner

Closing this as i dont think we can do anything more here!

@oorestisime unpinned this issue on Nov 7, 2022