ProactorEventLoop on Windows #36

Closed
josalhor opened this issue Dec 10, 2017 · 6 comments

Comments

@josalhor

josalhor commented Dec 10, 2017

Hi,

I've had a few problems using asyncio and aiohttp in my script: it runs out of sockets when making connections with SelectorEventLoop. I then tried ProactorEventLoop on Windows, which doesn't seem to have this limitation. However, when I try:

import asyncio
import aiohttp


async def getHeaders(url, session, sema):
    async with session:
        async with sema:
            try:
                async with session.head(url) as response:
                    try:
                        if "html" in response.headers["Content-Type"]:
                            return url, True
                        else:
                            return url, False
                    except:
                        return url, False
            except:
                return url, False


def removeUrlsWithoutHtml(setOfUrls, MAXitems):
    listOfUrls = list(setOfUrls)
    while(len(listOfUrls) != 0):
        blockurls = []
        print("URLS left to process: " + str(len(listOfUrls)))
        items = 0
        for num in range(0, len(listOfUrls)):
            if num < MAXitems:
                blockurls.append(listOfUrls[num - items])
                listOfUrls.remove(listOfUrls[num - items])
                items += 1
        loop = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(loop)
        semaphoreHeaders = asyncio.Semaphore(50)
        session = aiohttp.ClientSession()
        data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, session, semaphoreHeaders) for url in blockurls)))
        for header in data:
            if False == header[1]:
                setOfUrls.remove(header[0])

MAXitems = 10
setOfUrls = {'http://www.google.com', 'http://www.reddit.com'}
removeUrlsWithoutHtml(setOfUrls, MAXitems)

for link in list(setOfUrls):
    print(link)

Note the use of a semaphore and chunking to try to get around the selector limit that I hit if I replace

        loop = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(loop)

with:
loop = asyncio.get_event_loop()

With the current configuration it raises:

Exception ignored in: <bound method DNSResolver._sock_state_cb of <aiodns.DNSResolver object at 0x0616F830>>
Traceback (most recent call last):
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\site-packages\aiodns\__init__.py", line 85, in _sock_state_cb
    self.loop.add_reader(fd, self._handle_event, fd, READ)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 453, in add_reader
    raise NotImplementedError
NotImplementedError:

Note that my directory path has been manually replaced with USER.

The Python documentation says: https://docs.python.org/3/library/asyncio-eventloops.html#asyncio.ProactorEventLoop

add_reader() and add_writer() only accept file descriptors of sockets

Is aiodns not supported with ProactorEventLoop? Is this some type of weird bug? Is aiodns fully supported on Windows?

I can provide more info. For a bit more background, I was referred here by @asvetlov in the following Stack Overflow question: https://stackoverflow.com/questions/47675410/python-asyncio-aiohttp-valueerror-too-many-file-descriptors-in-select-on-win
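
For reference, the semaphore-plus-chunking idea described earlier in this post can be written more compactly. This is only a minimal sketch under stated assumptions: `chunked`, `check`, `run_chunk` and `process` are hypothetical names, the HEAD request is replaced by a placeholder so the snippet runs without network access, and `asyncio.run` requires Python 3.7+ (the traceback above comes from Python 3.6).

```python
import asyncio


def chunked(items, size):
    """Yield successive lists of at most `size` items."""
    items = list(items)
    for start in range(0, len(items), size):
        yield items[start:start + size]


async def check(url, sema):
    # Placeholder for the real HEAD request: only the throttling is shown.
    async with sema:
        await asyncio.sleep(0)
        return url, url.startswith("http")


async def run_chunk(urls, limit):
    # Create the semaphore inside the running loop that will use it.
    sema = asyncio.Semaphore(limit)
    return await asyncio.gather(*(check(url, sema) for url in urls))


def process(urls, chunk_size=10, limit=50):
    """Process URLs chunk by chunk, with at most `limit` checks in flight."""
    results = []
    for block in chunked(urls, chunk_size):
        results.extend(asyncio.run(run_chunk(block, limit)))
    return results


if __name__ == "__main__":
    for url, ok in process(["http://www.google.com", "http://www.reddit.com"]):
        print(url, ok)
```

Each chunk gets its own event loop run, which mirrors the structure of the script above; whether that avoids the selector limit depends on the loop implementation in use.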

@asvetlov
Member

I'm sorry, but the latest aiohttp doesn't use aiodns by default. You have to create an async DNS resolver yourself before creating the client session, so your snippet looks incomplete.

@josalhor
Author

josalhor commented Dec 10, 2017

You're right; for whatever reason I was running it on a machine with aiohttp 1.0.whatever. I still think this problem can be reproduced with the latest version. I'll try later.

@asvetlov
Member

The async DNS resolver was disabled by default in aiohttp 1.1: aio-libs/aiohttp#559

It is not 100% compatible with the standard threaded one.

To reproduce the behavior in newer versions, explicitly install the async resolver: https://docs.aiohttp.org/en/stable/client.html#resolving-using-custom-nameservers

@josalhor
Copy link
Author

josalhor commented Dec 10, 2017

Code:

import asyncio
import aiohttp
from aiohttp.resolver import AsyncResolver

async def getHeaders(url, sema):
    async with aiohttp.ClientSession(headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'}, connector=aiohttp.TCPConnector(verify_ssl=False, resolver= AsyncResolver(nameservers=["8.8.8.8", "8.8.4.4"]))) as session:
        async with sema:
            try:
                async with session.head(url) as response:
                    try:
                        if "html" in response.headers["Content-Type"]:
                            return url, True
                        else:
                            return url, False
                    except:
                        return url, False
            except:
                return url, False


def removeUrlsWithoutHtml(setOfUrls, MAXitems):
    listOfUrls = list(setOfUrls)
    while(len(listOfUrls) != 0):
        blockurls = []
        print("URLS left to process: " + str(len(listOfUrls)))
        items = 0
        for num in range(0, len(listOfUrls)):
            if num < MAXitems:
                blockurls.append(listOfUrls[num - items])
                listOfUrls.remove(listOfUrls[num - items])
                items += 1
        loop = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(loop)
        semaphoreHeaders = asyncio.Semaphore(50)
        data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, semaphoreHeaders) for url in blockurls)))
        for header in data:
            if False == header[1]:
                setOfUrls.remove(header[0])

MAXitems = 10
setOfUrls = {'http://www.google.com', 'http://www.reddit.com'}
removeUrlsWithoutHtml(setOfUrls, MAXitems)

for link in list(setOfUrls):
    print(link)

Compared with the original code, I added a proper user agent; the AsyncResolver setup is basically a copy-paste from the documentation.

Got error:

Exception ignored in: <bound method DNSResolver._sock_state_cb of <aiodns.DNSResolver object at 0x06C17CF0>>
Traceback (most recent call last):
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\site-packages\aiodns\__init__.py", line 85, in _sock_state_cb
    self.loop.add_reader(fd, self._handle_event, fd, READ)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\events.py", line 453, in add_reader
    raise NotImplementedError
NotImplementedError:

Edit: Compared with the original code, I also create the session inside the coroutine.

@josalhor
Author

@asvetlov If I try to input a lot of URLs, I also get another error.
Code with the input in setOfUrls:
Note that I've disabled the async resolver.

import asyncio
import aiohttp
from aiohttp.resolver import AsyncResolver

async def getHeaders(url, sema):#resolver= AsyncResolver(nameservers=["8.8.8.8", "8.8.4.4"])
    async with aiohttp.ClientSession(headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:57.0) Gecko/20100101 Firefox/57.0'}, connector=aiohttp.TCPConnector(verify_ssl=False)) as session:
        async with sema:
            try:
                async with session.head(url) as response:
                    try:
                        if "html" in response.headers["Content-Type"]:
                            return url, True
                        else:
                            return url, False
                    except:
                        return url, False
            except:
                return url, False


def removeUrlsWithoutHtml(setOfUrls, MAXitems):
    listOfUrls = list(setOfUrls)
    while(len(listOfUrls) != 0):
        blockurls = []
        print("URLS left to process: " + str(len(listOfUrls)))
        items = 0
        for num in range(0, len(listOfUrls)):
            if num < MAXitems:
                blockurls.append(listOfUrls[num - items])
                listOfUrls.remove(listOfUrls[num - items])
                items += 1
        loop = asyncio.ProactorEventLoop()
        asyncio.set_event_loop(loop)
        semaphoreHeaders = asyncio.Semaphore(50)
        data = loop.run_until_complete(asyncio.gather(*(getHeaders(url, semaphoreHeaders) for url in blockurls)))
        for header in data:
            if False == header[1]:
                setOfUrls.remove(header[0])

MAXitems = 10
setOfUrls = {'https://apis.google.com', 'https://www.google.com/calendar?tab=wc', 'https://accounts.google.com/ServiceLogin?hl=es&amp;passive=true&amp;continue=https://www.google.es/%3Fgfe_rd%3Dcr%26dcr%3D0%26ei%3DzBwtWszREYGZX47QkKgI%26gws_rd%3Dssl', 'https://www.google.com/gen_204?', 'https://www.google.es/webhp?hl=es&amp;dcr=0&amp;sa=X&amp;ved=0ahUKEwiprMzlrf_XAhXJaxQKHffvC9UQPAgD', 'https://play.google.com/?hl=es&amp;tab=w8', 'https://www.google.es/setprefs?sig=0_fm9MOZRAXNmSEF8OkKdOwopqi2M%3D&amp;hl=eu&amp;source=homepage&amp;sa=X&amp;ved=0ahUKEwiprMzlrf_XAhXJaxQKHffvC9UQ2ZgBCAo', 'https://books.google.es/bkshp?hl=es&amp;tab=wp', 'https://www.youtube.com/?gl=ES', 'https://adservice.google.es/adsid/google/ui', 'https://play.google.com/log?format=json', 'https://keep.google.com/', 'https://www.google.es/intl/es/options/', 'https://www.google.es/setprefs?sig=0_fm9MOZRAXNmSEF8OkKdOwopqi2M%3D&amp;hl=gl&amp;source=homepage&amp;sa=X&amp;ved=0ahUKEwiprMzlrf_XAhXJaxQKHffvC9UQ2ZgBCAk', 'https://translate.google.es/?hl=es&amp;tab=wT', 'https://consent.google.com?hl\\u003des\\u0026origin\\u003dhttps://www.google.es\\u0026continue\\u003dhttps://www.google.es/?gfe_rd%3Dcr%26dcr%3D0%26ei%3DzBwtWszREYGZX47QkKgI%26gws_rd%3Dssl\\u0026if\\u003d1\\u0026l\\u003d0\\u0026m\\u003d0\\u0026pc\\u003ds\\u0026wp\\u003d71', 'https://www.google.es/services/?subid=ww-ww-et-g-awa-a-g_hpbfoot1_1!o2&amp;utm_source=google.com&amp;utm_medium=referral&amp;utm_campaign=google_hpbfooter&amp;fg=1', 'https://www.google.com/?gfe_rd=cr&amp;dcr=0&amp;ei=zBwtWszREYGZX47QkKgI&amp;gws_rd=ssl,cr&amp;fg=1', 'https://www.blogger.com/?tab=wj', 'https://www.google.es/webhp?tab=ww', 'https://www.google.es/preferences?hl=es', 'https://www.google.es/preferences?hl=es&amp;fg=1', 'https://mail.google.com/mail/?tab=wm', 
'https://consent.google.es?hl\\u003des\\u0026origin\\u003dhttps://www.google.es\\u0026continue\\u003dhttps://www.google.es/?gfe_rd%3Dcr%26dcr%3D0%26ei%3DzBwtWszREYGZX47QkKgI%26gws_rd%3Dssl\\u0026if\\u003d1\\u0026l\\u003d0\\u0026m\\u003d0\\u0026pc\\u003ds\\u0026wp\\u003d71', 'https://consent.google.com/status?continue=https://www.google.es&amp;pc=s&amp;timestamp=1512905932', 'https://www.google.es/setprefs?sig=0_fm9MOZRAXNmSEF8OkKdOwopqi2M%3D&amp;hl=ca&amp;source=homepage&amp;sa=X&amp;ved=0ahUKEwiprMzlrf_XAhXJaxQKHffvC9UQ2ZgBCAg', 'https://www.google.es/intl/es_es/about/?utm_source=google.com&amp;utm_medium=referral&amp;utm_campaign=hp-footer&amp;fg=1', 'https://consent.google.com?hl\\\\u003des\\\\u0026origin\\\\u003dhttps://www.google.es\\\\u0026continue\\\\u003dhttps://www.google.es/?gfe_rd%3Dcr%26dcr%3D0%26ei%3DzBwtWszREYGZX47QkKgI%26gws_rd%3Dssl\\\\u0026if\\\\u003d1\\\\u0026l\\\\u003d0\\\\u0026m\\\\u003d0\\\\u0026pc\\\\u003ds\\\\u0026wp\\\\u003d71\\', 'https://www.google.com/contacts/?hl=es&amp;tab=wC', 'https://maps.google.es/maps?hl=es&amp;tab=wl', 'http://schema.org/WebPage', 'https://docs.google.com/document/?usp=docs_alc', 'https://hangouts.google.com/', 'https://www.google.es/imghp?hl=es&amp;tab=wi', 'https://jmt17.google.com/log', 'http://www.google.es/shopping?hl=es&amp;tab=wf', 'https://www.google.es/intl/es_es/ads/?subid=ww-ww-et-g-awa-a-g_hpafoot1_1!o2&amp;utm_source=google.com&amp;utm_medium=referral&amp;utm_campaign=google_hpafooter&amp;fg=1'}
removeUrlsWithoutHtml(setOfUrls, MAXitems)

for link in list(setOfUrls):
    print(link)

Error:

Exception ignored in: <bound method _ProactorBasePipeTransport.__del__ of <_ProactorSocketTransport closing fd=-1 read=<_OverlappedFuture cancelled>>>
Traceback (most recent call last):
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\proactor_events.py", line 97, in __del__
    self.close()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\proactor_events.py", line 84, in close
    self._loop.call_soon(self._call_connection_lost, None)
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 574, in call_soon
    self._check_closed()
  File "USER\AppData\Local\Programs\Python\Python36-32\lib\asyncio\base_events.py", line 357, in _check_closed
    raise RuntimeError('Event loop is closed')
RuntimeError: Event loop is closed

I'm well aware this is a separate error that should be discussed elsewhere because it's outside the scope of aiodns. I'm just pointing it out here in case both errors are on my end and are somehow (although it's unlikely) correlated.

@saghul
Contributor

saghul commented Dec 11, 2017

The API that c-ares provides deals with low-level fds, and aiodns in turn relies on it to function. If a given event loop implementation doesn't support the methods needed to watch those fds (such as add_reader(), as in the traceback above), then aiodns cannot work.
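
A quick way to check whether a given loop can watch fds in this way is to try registering a socket with it. The sketch below is illustrative only: `supports_add_reader` is a hypothetical helper, not part of aiodns or asyncio, and the ProactorEventLoop branch only runs on Windows, where that class exists.

```python
import asyncio
import socket


def supports_add_reader(loop):
    """Return True if `loop` implements add_reader() for socket fds."""
    rsock, wsock = socket.socketpair()
    try:
        # The callback is never invoked because the loop is not run.
        loop.add_reader(rsock.fileno(), lambda: None)
    except NotImplementedError:
        return False
    else:
        loop.remove_reader(rsock.fileno())
        return True
    finally:
        rsock.close()
        wsock.close()


selector_loop = asyncio.SelectorEventLoop()
print("SelectorEventLoop:", supports_add_reader(selector_loop))
selector_loop.close()

# ProactorEventLoop is only defined on Windows.
if hasattr(asyncio, "ProactorEventLoop"):
    proactor_loop = asyncio.ProactorEventLoop()
    print("ProactorEventLoop:", supports_add_reader(proactor_loop))
    proactor_loop.close()
```

A loop for which this returns False raises the same NotImplementedError shown in the tracebacks above as soon as aiodns tries to register its sockets.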

@saghul saghul closed this as completed Dec 11, 2017

3 participants