Threadpool worker threads exit on socket errors #358

cameronbrunner · 2021-01-15T22:20:23Z

❓ I'm submitting a ...

🐞 bug report
🐣 feature request
❓ question about the decisions made in the repository

🐞 Describe the bug. What is the current behavior?
When cheroot threads handle a conn that throws a socket error the thread exits. The easiest way to get this to occur is to create an HTTPS cherrypy server with a cheroot backend and then issue HTTP requests. SSL errors will be detected for each connection and a worker thread will exit. Once all worker threads have exited cherrypy becomes unresponsive.

The issue appears to be due to this commit for v8.1.0...

6f9a8f9

when breaking out the code from 'tick' into 'from_server_socket'. Previously any socket error or other error now handled by 'from_server_socket' used to prevent the conn being put into the queue. Now these events cause a None to be put on the queue. The None is equivalent to a threadpool shutdown event thus leading to the exit of the processing worker thread. In master the code the lines of note are:

Returning none on an SSL error (many other ways this routine can return None...)
https://github.com/cherrypy/cheroot/blob/master/cheroot/connections.py#L291
Allowing a None to be assigned to a conn
https://github.com/cherrypy/cheroot/blob/master/cheroot/connections.py#L218
Adding the passed in conn on the queue
https://github.com/cherrypy/cheroot/blob/master/cheroot/server.py#L2057
Shutdown Requests defined as None
https://github.com/cherrypy/cheroot/blob/master/cheroot/workers/threadpool.py#L38
Thread exiting if a Shutdown Request is received:
https://github.com/cherrypy/cheroot/blob/master/cheroot/workers/threadpool.py#L117

❓ What is the motivation / use case for changing the behavior?

This issue leads to many scenarios where a server using cheroot will have less than 'min' worker threads running and very easily could have 0 worker threads running. In general ThreadPools should provide some level of guarantee that the number of running / active threads lies somewhere between the min and max configured values.

💡 To Reproduce

Steps to reproduce the behavior:

Setup an HTTPS server with a thread pool of 10 listening on port 8443.
Validate functionality with a curl https://$(hostname -f)
Make 10 HTTP requests to the server curl http://$(hostname-f)
Validate that curl https://$(hostname -f) hangs.

Example... Note the 10 second delay on curl prior to breaking out of the hang.

[1:47 PM] Cameron Brunner
    
[root@ip-10-1-0-64 ~]# curl https://$(hostname -f):8443/ > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   513  100   513    0     0   3841      0 --:--:-- --:--:-- --:--:--  3857
[root@ip-10-1-0-64 ~]# for i in $(seq 1 10); do curl http://$(hostname -f):8443; done
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
curl: (56) Recv failure: Connection reset by peer
[root@ip-10-1-0-64 ~]# curl https://$(hostname -f):8443/ > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0

💡 Expected behavior
Threads should never exit or threads should be recreated after they exit.

📋 Details

📋 Environment

Cheroot version: 8.5.0
CherryPy version: 18.6.0
Python version: 3.6.12
OS: Centos 7
Browser: [all ]

📋 Additional context
I think an appropriate simple fix would be to check the return value from _from_server_socket and not call process_conn if it is None.

Changed showed below:
https://github.com/cherrypy/cheroot/blob/master/cheroot/connections.py#L216

                if conn is self.server:
                    # New connection
                    new_conn = self._from_server_socket(self.server.socket)
                    if new_conn is not None:
                        self.server.process_conn(new_conn)

The text was updated successfully, but these errors were encountered:

cameronbrunner · 2021-01-15T22:21:18Z

@dtougas

cameronbrunner · 2021-01-15T22:23:16Z

It would be a much larger change, but might be worth it, to have some thread maintenance to check threads that might have exited. Wrapping the worker run code to catch all exceptions would be another option to consider.

The `_from_server_socket routine` may return a None which should not be sent to workers to process. for cherrypy#358

This change makes sure that only valid connections are processed. The `_from_server_socket` routine may return a `None` when the attempt to establish the connection goes wrong. Like if a client tries to speak plain text HTTP to HTTPS socket. Earlier, this was causing `None` to be forwarded to workers to processes making them to exit. Eventually, there would be no workers left in the pool and the server would reset any attempts to establish new connections. Ref #358

pavelbrych · 2021-04-26T21:37:28Z

Hi, we just run into similar issue in our project, see the traceback bellow. Check for not-empty connection does not help here.

This SSLError in parse_request() will trigger _conditional_error(), but subsequent exception in simple_response() is not handled and results in interruption of WorkerThread.run() method.

When all worker-threads are stopped, the server became unresponsive.

I see no harm in the proposed solution to catch all exceptions related to connection inside WorkerThread.run() method — it should not be worse than leaving WorkerThread instance in unusable state, what do you think?

2021-04-26 20:37:13 WARNING  [26/Apr/2021:20:37:13] ENGINE socket.error 1
Traceback (most recent call last):
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1277, in communicate
    req.parse_request()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 706, in parse_request
    success = self.read_request_line()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 747, in read_request_line
    request_line = self.rfile.readline()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 304, in readline
    data = self.rfile.readline(256)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 557, in readline
    b = self.read(nreadahead())
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 536, in nreadahead
    readahead = self.peek(1)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 1129, in peek
    return self._peek_unlocked(size)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 1136, in _peek_unlocked
    current = self.raw.read(to_read)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL] internal error (_ssl.c:2623)

Exception in thread CP Server Thread-17:
Traceback (most recent call last):
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1277, in communicate
    req.parse_request()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 706, in parse_request
    success = self.read_request_line()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 747, in read_request_line
    request_line = self.rfile.readline()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 304, in readline
    data = self.rfile.readline(256)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 557, in readline
    b = self.read(nreadahead())
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 536, in nreadahead
    readahead = self.peek(1)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 1129, in peek
    return self._peek_unlocked(size)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\_pyio.py", line 1136, in _peek_unlocked
    current = self.raw.read(to_read)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1241, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1099, in read
    return self._sslobj.read(len, buffer)
ssl.SSLError: [SSL] internal error (_ssl.c:2623)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\threading.py", line 932, in _bootstrap_inner
    self.run()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\workers\threadpool.py", line 125, in run
    keep_conn_open = conn.communicate()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1306, in communicate
    self._conditional_error(req, '500 Internal Server Error')
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1349, in _conditional_error
    req.simple_response(response)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\server.py", line 1115, in simple_response
    self.conn.wfile.write(EMPTY.join(buf))
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\makefile.py", line 438, in write
    res = super().write(val, *args, **kwargs)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\makefile.py", line 36, in write
    self._flush_unlocked()
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\site-packages\cheroot\makefile.py", line 45, in _flush_unlocked
    n = self.raw.write(bytes(self._write_buf))
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\socket.py", line 687, in write
    return self._sock.send(b)
  File "C:\Users\Pavel Brych\AppData\Local\Programs\Python\Python38-32\lib\ssl.py", line 1173, in send
    return self._sslobj.write(data)
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:2473)

thezoggy · 2021-12-08T23:49:33Z

Not we see this with our app when older ssl requests come in from random scanners or local apps that still tried ssl rather than tls. After like 5 requests in a row cherrypy crashes

webknjaz · 2024-03-12T18:56:56Z

I see no harm in the proposed solution to catch all exceptions related to connection inside WorkerThread.run() method — it should not be worse than leaving WorkerThread instance in unusable state, what do you think?

Thanks for the suggestion! I missed this but also came to the same conclusion while looking into it more.

There are exceptions to handle better on other levels too, though.

This patch aims to prevent a situation when the working threads are killed by arbitrary exceptions that bubble up to their entry point layers that aren't handled properly or at all. This was causing DoS in many situations, including TLS errors and attempts to close the underlying sockets erroring out. Fixes cherrypy#358 Ref cherrypy#375 Resolves cherrypy#365

A DoS would happen in many situations, including TLS errors and attempts to close the underlying sockets erroring out. This patch aims to prevent a situation when the worker threads are killed by arbitrary exceptions that bubble up to their entry point layers that aren't handled properly or at all. PR #649 Fixes #358 Fixes #354 Ref #310 Ref #346 Ref #375 Ref #599 Ref #641 Resolves #365

cameronbrunner added bug Something is broken triage labels Jan 15, 2021

cameronbrunner added a commit to cameronbrunner/cheroot that referenced this issue Jan 15, 2021

Only process valid connections

6325dd5

The `_from_server_socket routine` may return a None which should not be sent to workers to process. for cherrypy#358

cameronbrunner mentioned this issue Jan 15, 2021

Address #358: Only process valid connections #359

Merged

15 tasks

webknjaz added reproducer: missing This PR or issue lacks code, which reproduce the problem described or clearly understandable STR and removed triage labels Jan 17, 2021

webknjaz pushed a commit to cameronbrunner/cheroot that referenced this issue Jan 18, 2021

Only process valid connections

5923b44

The `_from_server_socket routine` may return a None which should not be sent to workers to process. for cherrypy#358

AdamaTheHutt mentioned this issue Jul 14, 2021

Windows error 10022 kills threads #375

Open

3 tasks

siris mentioned this issue Apr 6, 2022

Qualys security scanner kills all cheroot worker threads, hangs server, eats all available file descriptors #365

Closed

15 tasks

webknjaz mentioned this issue Mar 12, 2024

🐛 Handle stray exceptions in worker threads #649

Merged

16 tasks

webknjaz closed this as completed in 7b34907 Mar 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Threadpool worker threads exit on socket errors #358

Threadpool worker threads exit on socket errors #358

cameronbrunner commented Jan 15, 2021 •

edited by webknjaz

Loading

cameronbrunner commented Jan 15, 2021

cameronbrunner commented Jan 15, 2021

pavelbrych commented Apr 26, 2021 •

edited by webknjaz

Loading

thezoggy commented Dec 8, 2021

webknjaz commented Mar 12, 2024

Threadpool worker threads exit on socket errors #358

Threadpool worker threads exit on socket errors #358

Comments

cameronbrunner commented Jan 15, 2021 • edited by webknjaz Loading

cameronbrunner commented Jan 15, 2021

cameronbrunner commented Jan 15, 2021

pavelbrych commented Apr 26, 2021 • edited by webknjaz Loading

thezoggy commented Dec 8, 2021

webknjaz commented Mar 12, 2024

cameronbrunner commented Jan 15, 2021 •

edited by webknjaz

Loading

pavelbrych commented Apr 26, 2021 •

edited by webknjaz

Loading