A high-performance blockchain event indexing and data processing pipeline that uses Hypersync to efficiently process and store Ethereum event data.
cherry/
├── src/ # Source code
│ ├── config/ # Configuration parsing
│ │ ├── __init__.py
│ │ └── parser.py # Config file parser
│ ├── ingesters/ # Data ingestion
│ │ ├── __init__.py
│ │ ├── base.py # Base ingester class
│ │ ├── factory.py # Ingester factory
│ │ └── providers/ # Data source providers
│ │ └── hypersync.py # Hypersync ingester
│ ├── processors/ # Data processing
│ │ ├── __init__.py
│ │ └── hypersync.py # Hypersync data processor
│ ├── schemas/ # Data schemas
│ │ ├── __init__.py
│ │ ├── base.py # Schema converter
│ │ └── blockchain_schemas.py
│ ├── types/ # Type definitions
│ │ ├── __init__.py
│ │ ├── data.py # Data container
│ │ └── hypersync.py # Hypersync types
│ ├── utils/ # Utilities
│ │ ├── __init__.py
│ │ ├── logging_setup.py
│ │ ├── schema_converter.py
│ │ └── generate_hypersync_query.py
│ └── writers/ # Data writers
│ ├── __init__.py
│ ├── base.py # Base writer class
│ ├── parquet.py # Parquet writer
│ ├── postgres.py # PostgreSQL writer
│ ├── s3.py # S3/MinIO writer
│ └── writer.py # Writer manager
├── data/ # Output data directory
├── logs/ # Application logs
├── state/ # Stream state files
├── docker-compose/ # Docker configurations
├── config.yaml # Main configuration
├── main.py # Application entry point
├── requirements.txt # Python dependencies
└── README.md # Documentation
- Python 3.10 or higher
- Docker and Docker Compose
- MinIO (for local S3-compatible storage)
- Clone the repository and go to the project root:
git clone https://github.com/steelcake/cherry.git
cd cherry
- Create and activate a virtual environment:
# Create virtual environment (all platforms)
python -m venv .venv
# Activate virtual environment
# For Windows with Git Bash:
source .venv/Scripts/activate
# For macOS/Linux:
source .venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- Set up environment variables:
  - Create a .env file in the project root
  - Add your Hypersync API token:
HYPERSYNC_API_TOKEN=your_token_here
- Start MinIO server (for local S3 storage):
# Navigate to docker-compose directory
cd docker-compose
# Start MinIO using docker-compose
docker-compose up -d
# Return to project root
cd ..
Default credentials:
- Access Key: minioadmin
- Secret Key: minioadmin
- Console URL: http://localhost:9001
Note: The docker-compose.yml file defines the ports and volumes for the MinIO service, so no manual port or volume setup is needed.
- Configure event streams:
  - Open config.yaml
  - Adjust block ranges, event filters, and batch sizes as needed
  - Configure output settings (S3/local parquet)
- Run the indexer:
python main.py
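Before running the indexer, it can help to confirm that the token from the .env step is actually set. The following is a minimal sketch using only the standard library; the helper names `load_env_file` and `get_hypersync_token` are hypothetical and not part of this project, and the project itself may load .env differently.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def get_hypersync_token(path: str = ".env") -> str:
    """Return the token from the environment, falling back to the .env file."""
    token = os.environ.get("HYPERSYNC_API_TOKEN") or load_env_file(path).get(
        "HYPERSYNC_API_TOKEN", ""
    )
    # Reject the placeholder shown in the setup instructions.
    if not token or token == "your_token_here":
        raise RuntimeError("HYPERSYNC_API_TOKEN is missing or still a placeholder")
    return token
```

Calling `get_hypersync_token()` from the project root raises a `RuntimeError` early if the token is absent, which is usually easier to diagnose than a failed request later on.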
data/
├── events/
│ ├── approval/
│ │ ├── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ │ ├── ...
│ │ └── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ └── transfer/
│ ├── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ ├── ...
│ └── YYYYMMDD_HHMMSS_startblock_endblock.parquet
├── blocks/
│ ├── approval/
│ │ ├── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ │ ├── ...
│ │ └── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ └── transfer/
│ ├── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ ├── ...
│ └── YYYYMMDD_HHMMSS_startblock_endblock.parquet
blockchain-data/
├── events/
│ ├── approval/
│ │ ├── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ │ ├── ...
│ │ └── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ └── transfer/
│ ├── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ ├── ...
│ └── YYYYMMDD_HHMMSS_startblock_endblock.parquet
├── blocks/
│ ├── approval/
│ │ ├── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ │ ├── ...
│ │ └── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ └── transfer/
│ ├── YYYYMMDD_HHMMSS_startblock_endblock.parquet
│ ├── ...
│ └── YYYYMMDD_HHMMSS_startblock_endblock.parquet
Access via:
- MinIO Console: http://localhost:9001
- S3 Endpoint: http://localhost:9000
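The file names in both layouts above follow the pattern YYYYMMDD_HHMMSS_startblock_endblock.parquet, so the timestamp and block range can be recovered from the name alone. A small sketch of such a parser follows; `ParquetFileInfo` and `parse_output_filename` are illustrative names, and the code assumes that both block fields are plain decimal integers.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Assumed pattern: YYYYMMDD_HHMMSS_startblock_endblock.parquet
FILENAME_RE = re.compile(r"^(\d{8})_(\d{6})_(\d+)_(\d+)\.parquet$")


class ParquetFileInfo(NamedTuple):
    timestamp: datetime
    start_block: int
    end_block: int


def parse_output_filename(name: str) -> ParquetFileInfo:
    """Split an output file name into its timestamp and block range."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    date_part, time_part, start, end = match.groups()
    timestamp = datetime.strptime(date_part + time_part, "%Y%m%d%H%M%S")
    return ParquetFileInfo(timestamp, int(start), int(end))
```

For example, `parse_output_filename("20240131_235959_100_200.parquet")` yields a timestamp of 2024-01-31 23:59:59 with `start_block=100` and `end_block=200`. This can be useful for finding the file that covers a given block without opening any Parquet data.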
- Check application logs:
  - Located in the logs/ directory
  - Format: blockchain_etl_YYYYMMDD_HHMMSS.log
- Monitor processing progress:
  - Console output shows real-time processing stats
  - Log files contain detailed processing information
- View processed data:
  - Local: Check the data/ directory
  - S3: Access MinIO console at http://localhost:9001
  - Data is organized by event type and timestamp
  - Each file contains events from a specific block range
The project maintains processing state in the state/ directory:
state/
├── approval_stream.json
├── transfer_stream.json
└── block_stream.json
These files track the last processed block for each stream and enable resume functionality.
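To illustrate how resume could work, the sketch below reads a state file and computes the next block to process. The JSON layout of the state files is not documented here, so the `last_block` key is purely an assumption for illustration, as is the `resume_block` helper; check an actual state file before relying on either.

```python
import json
from pathlib import Path


def resume_block(state_file: str, default_start: int = 0, key: str = "last_block") -> int:
    """Return the block to resume from, or default_start if no state exists.

    NOTE: the "last_block" key is a hypothetical field name, not the
    project's documented schema.
    """
    path = Path(state_file)
    if not path.is_file():
        return default_start
    state = json.loads(path.read_text(encoding="utf-8"))
    last = state.get(key)
    # Resume one past the last block that was fully processed.
    return default_start if last is None else int(last) + 1
```

With a state file containing `{"last_block": 250}`, `resume_block("state/transfer_stream.json")` would return 251; with no state file it falls back to `default_start`.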
- If no data is being processed:
  - Verify your Hypersync API token
  - Check block range configuration
  - Ensure event filters are correctly set
- If MinIO connection fails:
  - Verify MinIO is running (docker ps)
  - Check credentials in config.yaml
  - Ensure ports 9000 and 9001 are available
- For other issues:
  - Check the latest log file in the logs/ directory
  - Verify configuration in config.yaml
  - Ensure all requirements are installed correctly
  - Check Docker logs: docker-compose logs minio
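For the MinIO connection case, a quick way to see whether anything is listening on ports 9000 and 9001 is a plain TCP connection attempt. This is a rough sketch with hypothetical helper names (`port_is_open`, `check_minio`); an open port only shows that some process is listening, not that it is MinIO or that the credentials are valid.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_minio(host: str = "localhost", ports: tuple[int, ...] = (9000, 9001)) -> dict[int, bool]:
    """Map each port (S3 endpoint and console by default) to its reachability."""
    return {port: port_is_open(host, port) for port in ports}


if __name__ == "__main__":
    for port, is_open in check_minio().items():
        print(f"port {port}: {'listening' if is_open else 'not reachable'}")
```

If both ports report as not reachable while `docker ps` shows the container running, the port mapping in docker-compose.yml is a reasonable next thing to inspect.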