CAPTCHA does not go away #9

Closed
GiulioGiorcelli opened this issue Mar 7, 2018 · 2 comments

@GiulioGiorcelli

Hi there!

I'm using your software for a personal project, and when Zillow throws up a CAPTCHA it takes a really long time and dozens of iterations to get rid of it. I complete the CAPTCHA and the page just reloads with a new one. This goes on for about 10 to 15 minutes no matter how many times I solve it. Do you know why this is happening? Is there a workaround for this issue?

Thanks,
Giulio

@wwetzel

wwetzel commented Mar 7, 2018

Hi Giulio,

I was using this code to access Zillow for a while and would run into a similar issue. As ChrisMuir points out - scraping is against Zillow's ToS, so they are throwing a CAPTCHA to prevent bots like this one from scraping content. I haven't tried to defeat a CAPTCHA yet - the whole point is to not be beatable by bots.

Basically, you look suspicious because of how much searching you're doing and the way the bot interacts with the web page: it's very un-human. I don't know how Zillow tracks this, but some googling would give you an idea. One way around it is to use multiple computers, for example by spinning up a bunch of Linux virtual machines.

Easy solutions:

  1. You can try manually monitoring the machine and interceding when a CAPTCHA appears - manually click around for a while and the site will figure out you are a person. You'd probably have to add code to track how far the bot got in its search before getting stuck.

  2. Use multiple computers and/or IP addresses to try to fool Zillow.
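
For the progress tracking mentioned in option 1, here is a minimal sketch of one way it could work. It is not code from this repo: `run_searches`, `scrape_one`, and the `progress.json` file name are all hypothetical, and it assumes each search term (e.g. a zip code) can be scraped independently.

```python
import json
import os
import tempfile


def load_progress(path):
    """Return the set of search terms already completed, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return set(json.load(f))


def save_progress(path, done):
    """Write completed terms via a temp file so a crash can't corrupt the checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run_searches(terms, scrape_one, path="progress.json"):
    """Call scrape_one(term) for every term not yet recorded as done."""
    done = load_progress(path)
    for term in terms:
        if term in done:
            continue  # finished in an earlier run, skip it
        scrape_one(term)  # hypothetical callback; may raise if stuck on a CAPTCHA
        done.add(term)
        save_progress(path, done)  # checkpoint after each successful term
    return done
```

If the bot gets stuck and you kill it, rerunning `run_searches` with the same list and the same checkpoint file should pick up at the first term that never finished.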

@ChrisMuir
Owner

Hi @GiulioGiorcelli, I don't have any good answers for you on this. I honestly haven't had much interest in this project/repo for a while now, so when I added the CAPTCHA code I didn't test it much. I think I recall what you described happening to me once, and I didn't investigate it at the time. For me, almost all of the instances of CAPTCHA were easy to handle manually (code pauses, I beat the CAPTCHA once, it goes away, code resumes).

The short answer is that once the CAPTCHA appears, it's out of my hands. I have no interest in developing the current CAPTCHA code beyond what it currently is, which is simply to pause code execution indefinitely until the CAPTCHA has been manually handled.
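
To illustrate the "pause until handled" behavior, here is a rough sketch of the general pattern. It is not the actual code in this repo: `wait_for_manual_captcha` and the `captcha_present` callable are hypothetical names, and how you detect the CAPTCHA on the page is left to the caller.

```python
import time


def wait_for_manual_captcha(captcha_present, poll_seconds=5.0, sleep=time.sleep):
    """Block until captcha_present() returns False.

    captcha_present is a caller-supplied (hypothetical) function that checks
    the current page. Returns how many times the loop had to wait.
    """
    polls = 0
    while captcha_present():
        polls += 1
        sleep(poll_seconds)  # give the human time to solve the CAPTCHA
    return polls
```

There is deliberately no timeout here, matching the "indefinitely" part. If the site keeps serving new CAPTCHAs, as in the original report, a loop like this will simply keep waiting.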

Hi @wwetzel, thanks for jumping in with your input and info!
