Feature Request: Add Symbol Filtering for Spot Market in BinanceBulkDownloader #21

Open
LenWilliamson opened this issue Jul 26, 2024 · 0 comments

Hi there,

First off, thank you for maintaining such a helpful and well-constructed repository. I have a feature request that I believe could enhance the utility of the library for users who do not require data for all symbols from Binance.

Feature Request

I propose adding the ability to filter for specific symbols (such as BTCUSDT or ETHUSDT) in the spot market. This would allow users to download data only for the symbols they are interested in, rather than fetching data for all available symbols.

Proposed Implementation

I have extended the BinanceBulkDownloader class by adding a symbols parameter to the __init__ method, applying a symbol filter in run_download, and adding a small helper method. Here is the modified code:

class BinanceBulkDownloader:
    def __init__(
        self,
        destination_dir=".",
        data_type="klines",
        data_frequency="1m",
        asset="um",
        timeperiod_per_file="daily",
        symbols=None
    ) -> None:
        """
        :param destination_dir: Destination directory for downloaded files
        :param data_type: Type of data to download (klines, aggTrades, etc.)
        :param data_frequency: Frequency of data to download (1m, 1h, 1d, etc.)
        :param asset: Type of asset to download (um, cm, spot, option)
        :param timeperiod_per_file: Time period per file (daily, monthly)
        :param symbols: List of symbols to download (e.g., ['BTCUSDT', 'ETHUSDT'])
        """
        self._destination_dir = destination_dir
        self._data_type = data_type
        self._data_frequency = data_frequency
        self._asset = asset
        self._timeperiod_per_file = timeperiod_per_file
        self._symbols = symbols if symbols is not None else []
        self.marker = None
        self.is_truncated = True
        self.downloaded_list = []

    def run_download(self):
        """
        Download concurrently
        :return: None
        """
        print(f"[bold blue]Downloading {self._data_type}[/bold blue]")

        while self.is_truncated:
            file_list_generator = self._get_file_list_from_s3_bucket(
                self._build_prefix(), self.marker, self.is_truncated
            )
            if self._data_type in self._DATA_FREQUENCY_REQUIRED_BY_DATA_TYPE:
                file_list_generator = [
                    prefix
                    for prefix in file_list_generator
                    if prefix.count(self._data_frequency) == 2
                    and self._is_prefix_in_set_of_requested_symbols(prefix)
                ]
            else:
                file_list_generator = [
                    prefix
                    for prefix in file_list_generator
                    if self._is_prefix_in_set_of_requested_symbols(prefix)
                ]
            for prefix_chunk in track(
                self.make_chunks(file_list_generator, self._CHUNK_SIZE),
                description="Downloading",
            ):
                with ThreadPoolExecutor() as executor:
                    executor.map(self._download, prefix_chunk)
                self.downloaded_list.extend(prefix_chunk)

    def _is_prefix_in_set_of_requested_symbols(self, prefix: str) -> bool:
        """
        Check if the prefix matches any of the requested symbols.
        :param prefix: The prefix string to check
        :return: True if the prefix matches any symbol in the list, otherwise False
        """
        if not self._symbols:
            return True
        return any(symbol in prefix for symbol in self._symbols)
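
One caveat with the substring check above: `symbol in prefix` is true whenever the symbol appears anywhere in the key, so a request for SHIBUSDT would also match a longer name such as 1000SHIBUSDT. A stricter variant could compare whole path segments instead. The sketch below is standalone and only illustrative: the function names `normalize_symbols` and `key_matches_symbols` are hypothetical, and the key layout (`.../<SYMBOL>/<frequency>/<file>.zip`) is an assumption about the bucket structure, not something confirmed by the repository.

```python
from typing import Iterable, Optional, Set


def normalize_symbols(symbols: Optional[Iterable[str]]) -> Set[str]:
    """Return an upper-cased set of symbols (hypothetical helper)."""
    if symbols is None:
        return set()
    if isinstance(symbols, str):
        # A bare string would otherwise be iterated character by character.
        symbols = [symbols]
    return {symbol.upper() for symbol in symbols}


def key_matches_symbols(key: str, symbols: Set[str]) -> bool:
    """True if no filter is set, or a whole path segment equals a symbol.

    Assumes the symbol appears as its own '/'-separated segment in the key.
    """
    if not symbols:
        return True
    return any(segment in symbols for segment in key.split("/"))


if __name__ == "__main__":
    requested = normalize_symbols(["shibusdt"])
    # Illustrative keys only; the real bucket layout may differ.
    keys = [
        "data/futures/um/daily/klines/SHIBUSDT/1m/SHIBUSDT-1m-2024-07-25.zip",
        "data/futures/um/daily/klines/1000SHIBUSDT/1m/1000SHIBUSDT-1m-2024-07-25.zip",
    ]
    for key in keys:
        print(key_matches_symbols(key, requested), key)
```

Storing the symbols as a set keeps each lookup cheap, and normalizing once in `__init__` would also guard against a caller passing a single string instead of a list.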

Summary

  • New parameter: symbols added to the __init__ method.
  • Helper method: _is_prefix_in_set_of_requested_symbols filters the file list in run_download based on the specified symbols. If symbols is omitted or empty, all symbols are downloaded as before.

Next Steps

Please let me know if this feature aligns with the direction of your project. I am happy to implement a draft and submit a pull request if this proposal is accepted.

Thank you for considering this enhancement.

Best regards
