
Pulsar Python client leaks filehandles after destruction if a consumer is created on a partitioned topic #14588

Closed
zbentley opened this issue Mar 7, 2022 · 5 comments
Labels
help wanted lifecycle/stale type/bug The PR fixed a bug or issue reported a bug


@zbentley
Contributor

zbentley commented Mar 7, 2022

Describe the bug
In Python, I have to reconnect Client objects often, usually due to bugs. Calling client.close() and client.shutdown() is sufficient to disconnect a Client object. However, a disconnected client cannot be reconnected, so instead I instantiate a brand-new Client object to replace it.

While doing this, some of my production services started crashing due to file descriptor exhaustion.

It turns out that, if .subscribe has been called at least once, closing the consumer and then closing the client does not close all of the file handles the client opened. Those handles leak (a fair number of them; it looks like one or two per worker thread per partition), consuming resources and pushing the process closer to exhaustion.

This seems to only happen if:

  • The topic is partitioned.
  • The topic already exists, rather than being created automatically by the subscription.

To Reproduce

  1. Create a partitioned, persistent topic with at least 2 partitions.
  2. Update the code below to use the name of the topic you created.
  3. Run the code below.
  4. Observe the printed file handle diff, and note that the number of open file handles on the system grows over time.
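Step 1 can also be scripted. The sketch below is a minimal example, assuming a standalone broker whose admin REST API listens on http://localhost:8080 and the public/default namespace; it sends the PUT .../partitions request with the partition count as a JSON body, which is the call that pulsar-admin topics create-partitioned-topic wraps. The helper names and the topic name leak-test are hypothetical.

```python
import urllib.request

# Assumption: standalone broker with its admin REST API on port 8080.
ADMIN_URL = "http://localhost:8080"


def partitioned_topic_request(tenant, namespace, topic, partitions, admin_url=ADMIN_URL):
    """Build the PUT request that creates a partitioned persistent topic."""
    url = f"{admin_url}/admin/v2/persistent/{tenant}/{namespace}/{topic}/partitions"
    # The request body is the partition count encoded as JSON (a bare integer).
    return urllib.request.Request(
        url,
        data=str(partitions).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def create_partitioned_topic(tenant, namespace, topic, partitions=2):
    """Send the request; raises urllib.error.HTTPError if the broker rejects it."""
    req = partitioned_topic_request(tenant, namespace, topic, partitions)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For example, create_partitioned_topic("public", "default", "leak-test") followed by TOPIC_NAME = "persistent://public/default/leak-test" in the script below. The equivalent CLI call is pulsar-admin topics create-partitioned-topic persistent://public/default/leak-test --partitions 2.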

Expected behavior

  • Creating and then calling .close() on a Consumer object should leave the number of open file handles on the system unchanged.
  • Creating and then calling .close() on a Producer object should leave the number of open file handles on the system unchanged.
  • Creating and then calling .close() on a Client object should leave the number of open file handles on the system unchanged.
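These expectations can be expressed as an automated check. The sketch below is a minimal example, assuming a POSIX system where /dev/fd lists the calling process's open descriptors (macOS, and Linux where it points at /proc/self/fd); count_open_fds and fd_deltas are hypothetical helpers, and the workload would be something like consume_and_toss from the reproduction script.

```python
import os


def count_open_fds():
    """Count descriptors open in this process.

    os.listdir opens one descriptor for the directory itself, so the count is
    off by a constant; that offset cancels out when comparing before and after.
    """
    return len(os.listdir("/dev/fd"))


def fd_deltas(workload, iterations=4):
    """Run `workload` repeatedly and return the net descriptor change per run.

    A resource-clean workload yields all zeros; a consistently positive list
    points to a leak like the one described in this issue.
    """
    workload()  # Warm-up run, analogous to "prime caches, load dylibs".
    deltas = []
    for _ in range(iterations):
        before = count_open_fds()
        workload()
        deltas.append(count_open_fds() - before)
    return deltas
```

Against a live broker, fd_deltas(consume_and_toss) should return [0, 0, 0, 0] if the client releases everything it opens.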

Code to reproduce

import logging
import os
import subprocess
from contextlib import contextmanager

from pulsar import Client, ConsumerType

TOPIC_NAME = "persistent://your topic name here"

def get_open_filehandles():
    # One lsof output line per file handle held by this process.
    result = subprocess.run(['lsof', '-p', str(os.getpid())], capture_output=True)
    return sorted(result.stdout.decode().split('\n'))


@contextmanager
def diff_file_handles():
    # Snapshot this process's open file handles before and after the block.
    initial = set(get_open_filehandles())
    try:
        yield
    finally:
        final = set(get_open_filehandles())
        for fh in final - initial:
            print("NEW FILEHANDLE", fh)
        for fh in initial - final:
            print("CLOSED FILEHANDLE", fh)
        print("Handles before:", len(initial), "Handles after:", len(final))


def consume_and_toss():
    client = Client(
        service_url='pulsar://localhost',
        logger=logging.getLogger(),
        io_threads=1,
        message_listener_threads=1,
    )
    sub = client.subscribe(
        topic=TOPIC_NAME,
        subscription_name='testsub',
        receiver_queue_size=1,
        max_total_receiver_queue_size_across_partitions=1,
        consumer_type=ConsumerType.KeyShared,
        replicate_subscription_state_enabled=False,
    )
    sub.close()
    del sub
    client.shutdown()
    client.close()
    del client


def main():
    # Prime caches, load dylibs:
    consume_and_toss()

    for _ in range(4):
        print("ITERATING")
        with diff_file_handles():
            consume_and_toss()

if __name__ == '__main__':
    main()
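To see which kind of descriptor leaks, the raw lsof lines returned by get_open_filehandles can be summarized by their TYPE column. The sketch below is a minimal example, assuming default lsof output where TYPE is the fifth whitespace-separated field (COMMAND, PID, USER, FD, TYPE, ...) and the COMMAND field contains no spaces; count_by_type and type_diff are hypothetical helpers.

```python
from collections import Counter


def count_by_type(lsof_lines):
    """Tally lsof output lines by their TYPE column (e.g. REG, PIPE, KQUEUE, IPv4)."""
    counts = Counter()
    for line in lsof_lines:
        fields = line.split()
        # Skip blank lines and the header row ("COMMAND PID USER FD TYPE ...").
        if len(fields) < 5 or fields[0] == "COMMAND":
            continue
        counts[fields[4]] += 1
    return counts


def type_diff(before_lines, after_lines):
    """Per-type change in descriptor count between two lsof snapshots."""
    before = count_by_type(before_lines)
    after = count_by_type(after_lines)
    # Counter returns 0 for missing keys, so types present on only one side work.
    return {
        t: after[t] - before[t]
        for t in set(before) | set(after)
        if after[t] != before[t]
    }
```

For example, printing type_diff(initial, final) at the end of diff_file_handles shows at a glance whether the leaked handles are sockets, pipes, kqueues, or regular files.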

Desktop:

  • OS: macOS Monterey.
  • Pulsar: 2.9.1 via Homebrew.
  • Client: 2.9.1.
@zbentley
Contributor Author

#14585 partially addresses this (fewer handles are leaked), but does not fully fix it; leaks still occur.

@github-actions

The issue had no activity for 30 days, mark with Stale label.

@Jaudouard

We observe the same thing with the Python client: when we call consumer.unsubscribe(), several file descriptors are left open.

It looks like an issue in the C++ client, and it is a major blocker for us.

@github-actions

The issue had no activity for 30 days, mark with Stale label.

@tisonkun
Member

Closed as stale. The development of the Python client has been permanently moved to http://github.com/apache/pulsar-client-python. Please open an issue there if it's still relevant.

@tisonkun closed this as not planned on Dec 11, 2022.

4 participants