Googler no results on 3.9 #306
Comments
@zmwangx I've been noticing this today too. |
I'm having the same results. |
This is in fact the same problem as #299, and it's getting a bit ridiculous. The markup is pretty damn hard to parse as discussed before. Again, we wait for maybe 48hrs. If things don't go back to normal by then, we move to a modern UA, and update the parser. Until then, here's a patch (with modern UA) that works:
diff --git a/googler b/googler
index 460350e..20698c7 100755
--- a/googler
+++ b/googler
@@ -102,7 +102,7 @@ COLORMAP = {k: '\x1b[%sm' % v for k, v in {
     'x': '0', 'X': '1', 'y': '7', 'Y': '7;1',
 }.items()}
 
-USER_AGENT = 'googler/%s (like MSIE)' % _VERSION_
+USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
 
 text_browsers = ['elinks', 'links', 'lynx', 'w3m', 'www-browser']
 
@@ -2192,13 +2192,18 @@ class GoogleParser(object):
                 # Skip smart cards.
                 continue
             try:
-                h3 = div_g.select('h3.r')
-                a = h3.select('a')
-                title = a.text
-                mime = div_g.select('.mime')
-                if mime:
-                    title = mime.text + ' ' + title
-                url = self.unwrap_link(a.attr('href'))
+                h3 = div_g.select('div.r h3')
+                if h3:
+                    title = h3.text
+                    url = self.unwrap_link(h3.parent.attr('href'))
+                else:
+                    h3 = div_g.select('h3.r')
+                    a = h3.select('a')
+                    title = a.text
+                    mime = div_g.select('.mime')
+                    if mime:
+                        title = mime.text + ' ' + title
+                    url = self.unwrap_link(a.attr('href'))
                 matched_keywords = []
                 abstract = ''
                 for childnode in div_g.select('.st').children:
@@ -2233,10 +2238,12 @@ class GoogleParser(object):
         # Search instead for ...
         spell_orig = tree.select("span.spell_orig")
         if spell_orig:
-            self.autocorrected = True
-            self.showing_results_for = next(
+            showing_results_for_link = next(
                 filter(lambda el: el.tag == "a", spell_orig.previous_siblings()), None
-            ).text
+            )
+            if showing_results_for_link:
+                self.autocorrected = True
+                self.showing_results_for = showing_results_for_link.text
 
         # No results found for ...
         # Results for ...:
@@ -2252,14 +2259,14 @@ class GoogleParser(object):
         self.filtered = tree.select('p#ofr') is not None
 
     # Unwraps /url?q=http://...&sa=...
-    # May raise ValueError.
+    # TODO: don't unwrap if URL isn't in this form.
     @staticmethod
     def unwrap_link(link):
         qs = urllib.parse.urlparse(link).query
         try:
             url = urllib.parse.parse_qs(qs)['q'][0]
         except KeyError:
-            raise ValueError(link)
+            return link
         else:
             if "://" in url:
                 return url
If it doesn't work, show me the markup and I'll fix it. |
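For anyone reading along, here is a minimal standalone sketch of how unwrap_link behaves after the patch: a link without a q parameter is now returned unchanged instead of raising ValueError. The diff context stops right after `return url`, so the final fallback (returning the original link when the q value has no scheme) is an assumption, not something the patch shows.

```python
import urllib.parse


def unwrap_link(link):
    """Return the target of a /url?q=...&sa=... link, else the link itself."""
    qs = urllib.parse.urlparse(link).query
    try:
        url = urllib.parse.parse_qs(qs)['q'][0]
    except KeyError:
        # Patched behavior: no q parameter means the link is not wrapped.
        return link
    # Assumption: fall back to the original link when q is not a full URL.
    return url if "://" in url else link


print(unwrap_link('/url?q=https://example.com/page&sa=U'))
print(unwrap_link('https://example.com/direct'))
```

This is why the new `div.r h3` branch can pass a direct href through the same function without special handling.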
The patch works fine for me. Is there a way to auto-detect which markup the results are in? What if we use the FF user agent together with this patch? It looks like the patch already detects whether the results are in the new markup or the earlier one. |
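On the auto-detection question, a rough sketch of the idea using only the standard library, assuming the two layouts can be told apart by whether a result carries a div with class r (new) or an h3 with class r (old). This is a simplification of the patch's `div.r h3` versus `h3.r` selectors; LayoutDetector and detect_layout are hypothetical names and not part of googler.

```python
from html.parser import HTMLParser


class LayoutDetector(HTMLParser):
    """Record which result layout appears first in the page."""

    def __init__(self):
        super().__init__()
        self.layout = None

    def handle_starttag(self, tag, attrs):
        if self.layout is not None:
            return  # Already decided; ignore the rest of the page.
        classes = (dict(attrs).get('class') or '').split()
        if 'r' in classes:
            if tag == 'div':
                self.layout = 'new'
            elif tag == 'h3':
                self.layout = 'old'


def detect_layout(html):
    """Return 'new', 'old', or None if neither marker is found."""
    detector = LayoutDetector()
    detector.feed(html)
    return detector.layout


new_html = '<div class="g"><div class="r"><a href="https://example.com"><h3>Title</h3></a></div></div>'
old_html = '<div class="g"><h3 class="r"><a href="/url?q=https://example.com">Title</a></h3></div>'
print(detect_layout(new_html), detect_layout(old_html))
```

The patch itself does the equivalent per result: it tries the new selector first and falls back to the old one.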
The patch provided by @zmwangx works for vanilla searches. Can you also please provide the patch for retrieving news (-N argument) results? It gives the same "No results" error. Thank you. |
Problem still not resolved. I'll turn the patch into a PR soonish and we'll probably need to cut a release.
If you're talking about the Yeah, we can possibly maintain compatibility with the older layout we were targeting, but since the older layout appears to be gone, there's no point. Note that we used this
I propose we use a Chrome UA. It is said that FF is more likely to be reCAPTCHA'ed than Chrome (although it's not clear whether that's based on UA detection). @ajithkumar-natarajan I did test my patch with |
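To make the UA proposal concrete, here is a small sketch of attaching the Chrome UA string from the patch to a request. It uses urllib.request for brevity and only builds the request without sending it; build_search_request is a hypothetical helper, and the search URL is an assumption for illustration rather than how googler constructs its requests.

```python
import urllib.parse
import urllib.request

# Chrome UA string copied from the patch in this thread.
USER_AGENT = ('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/78.0.3904.108 Safari/537.36')


def build_search_request(query):
    """Hypothetical helper: build (but do not send) a search request."""
    url = 'https://www.google.com/search?' + urllib.parse.urlencode({'q': query})
    return urllib.request.Request(url, headers={'User-Agent': USER_AGENT})


req = build_search_request('hello world')
print(req.full_url)
# urllib stores header names capitalized, hence 'User-agent'.
print(req.get_header('User-agent'))
```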
Please go ahead. The Chrome UA sounds good. |
I'll make a release this evening if things are good. |
Tracking update: the patch works for me so far. |
Hi, I know I opened this ticket but I will not have access to my affected workstation until late in the week. Just want to make sure you don't wait on me for testing! :-D |
No problem! Looks like it's reproducible globally. Just came across a post on HN saying that Google is no longer working on Lynx. |
Just FYI, but the patch works for me too running OSX. |
The patch works here on Ubuntu 18.04 as well. |
Fixes jarun#306, hopefully. Not refined (even left a TODO), not extensively tested against edge cases.
Turns out I have more pressing matters and didn't have time to refine and test the patch... Instead of delaying the fix further, I just pushed the patch to #307. I'll refine it and rewrite our currently useless testing system later, but let's have a working release first... |
I'll make a release today. |
Output of googler -d:
Link to the response body: https://gist.github.com/amitai/c840955133e1938d4369eafdbd1232a7
Details of operating system, Python version used, terminal emulator and shell:
Python 3.6.8, ubuntu 18.04.3, bash 4.4.20(1)