Commit
Added Workers system for standalone tasks
### Added
- New Workers system for task-specific data extraction
  - Base Worker class in core module
  - PricingResearchWorker implementation
  - Plugin integration (Serper, Jina, LiteLLM)
- Automated pricing data extraction capabilities
  - Plan detection
  - Feature extraction
  - Price analysis
  - Subscriber limit detection

### Changed
- Enhanced LiteLLM integration for structured data
- Improved content extraction accuracy
- Standardized worker output format

### Documentation
- Added workers.md documentation
- Updated plugin integration guides
- Added pricing research examples

This commit introduces a new Workers system that simplifies complex data extraction tasks by combining multiple plugins into focused, single-purpose executors. The initial implementation includes a PricingResearchWorker that can extract structured pricing data from any SaaS website.
Showing 7 changed files with 412 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,103 @@ | ||
--- | ||
title: "Workers" | ||
publishedAt: "2024-11-10" | ||
updatedAt: "2024-11-10" | ||
summary: "Standalone task executors that combine multiple plugins for specific data extraction needs." | ||
kind: "detailed" | ||
--- | ||
|
||
## Overview | ||
Workers are specialized task executors that combine multiple plugins to perform specific data extraction and analysis tasks. Unlike workflows that chain multiple steps together, workers are focused on single, well-defined tasks that require coordination between multiple plugins. | ||
|
||
## Features | ||
- 🎯 Task-specific implementations | ||
- 🔄 Automated data extraction | ||
- 📊 Structured output | ||
- 🛠 Plugin integration | ||
- ⚡ Efficient processing | ||
|
||
## Available Workers | ||
|
||
### PricingResearchWorker | ||
Extracts structured pricing data from any SaaS website by combining: | ||
1. **Serper Web Search**: Finds pricing pages | ||
2. **Jina AI Reader**: Extracts clean content | ||
3. **LiteLLM**: Analyzes and structures pricing data | ||
|
||
#### Usage | ||
|
||
```python | ||
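import asyncio | ||
import json | ||
from pynions.workers import PricingResearchWorker | ||
async def analyze_pricing(): | ||
    worker = PricingResearchWorker() | ||
    result = await worker.execute({"domain": "example.com"}) | ||
    # execute() returns None when no pricing page or content is found | ||
    print(json.dumps(result, indent=2)) | ||
asyncio.run(analyze_pricing()) | ||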
``` | ||
|
||
|
||
#### Output Structure | ||
|
||
```json | ||
{ | ||
  "domain": "example.com", | ||
  "source": "https://example.com/pricing", | ||
  "pricing": { | ||
    "plans": ["plan names"], | ||
    "pricing": { | ||
      "plan_name": { | ||
        "monthly_price": 0.0, | ||
        "annual_price": 0.0, | ||
        "features": ["feature list"], | ||
        "limits": {"limit_type": "limit value"} | ||
      } | ||
    }, | ||
    "currency": "USD" | ||
  } | ||
} | ||
``` | ||
|
||
|
||
## Creating Custom Workers | ||
|
||
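1. Inherit from base Worker class | ||
```python | ||
from typing import Any, Dict | ||
from pynions.core import Worker | ||
class CustomWorker(Worker): | ||
    def __init__(self, config: Dict[str, Any] = None): | ||
        super().__init__(config)  # sets self.config and self.logger | ||
        # Initialize required plugins (Plugin1 and Plugin2 are placeholders) | ||
        self.plugin1 = Plugin1() | ||
        self.plugin2 = Plugin2() | ||
    async def execute(self, input_data: Dict[str, Any]) -> Dict[str, Any]: | ||
        # Implement your worker logic | ||
        pass | ||
``` | ||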
|
||
|
||
## Best Practices | ||
|
||
1. **Plugin Integration** | ||
- Initialize plugins in constructor | ||
- Handle plugin errors gracefully | ||
- Validate plugin responses | ||
|
||
2. **Data Processing** | ||
- Use structured input/output | ||
- Validate extracted data | ||
- Clean and normalize output | ||
|
||
3. **Error Handling** | ||
- Handle network timeouts | ||
- Validate input parameters | ||
- Provide meaningful error messages | ||
|
||
4. **Performance** | ||
- Minimize API calls | ||
- Process only required data | ||
- Use efficient data structures | ||
|
||
## Common Issues | ||
- API rate limits | ||
- Content extraction failures | ||
- Data validation errors | ||
- Network timeouts | ||
|
||
Need help? Check our [Debugging Guide](debugging.md) for solutions. |
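The documentation above lists network timeouts and API rate limits as common issues but does not show how a worker might guard against them. The following is a minimal sketch, not part of this commit: `call_with_retry` is a hypothetical helper, and its `timeout`, `retries`, and `backoff` parameters are assumptions chosen for illustration. It wraps any awaitable plugin call in `asyncio.wait_for` and retries with exponential backoff.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def call_with_retry(
    make_call: Callable[[], Awaitable[Dict[str, Any]]],
    timeout: float = 10.0,
    retries: int = 2,
    backoff: float = 0.5,
) -> Dict[str, Any]:
    """Await make_call() with a timeout, retrying on timeouts and connection errors.

    Hypothetical helper: the name and parameters are not part of Pynions.
    """
    last_error: Exception = RuntimeError("no attempts made")
    for attempt in range(retries + 1):
        try:
            # A fresh coroutine is created on every attempt
            return await asyncio.wait_for(make_call(), timeout=timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
            if attempt < retries:
                # Exponential backoff: backoff, 2 * backoff, 4 * backoff, ...
                await asyncio.sleep(backoff * (2 ** attempt))
    raise RuntimeError(f"call failed after {retries + 1} attempts") from last_error
```

Inside a worker this could be used as `await call_with_retry(lambda: self.jina.execute({"url": url}))`, assuming the plugin call is safe to repeat.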
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,24 @@ | ||
from abc import ABC, abstractmethod | ||
from typing import Dict, Any | ||
import logging | ||
|
||
|
||
class Worker(ABC): | ||
    """Base class for all Pynions workers""" | ||
|
||
    def __init__(self, config: Dict[str, Any] = None): | ||
        self.config = config or {} | ||
        self.logger = logging.getLogger(self.__class__.__name__) | ||
|
||
    @abstractmethod | ||
    async def execute(self, input_data: Dict[str, Any]) -> Dict[str, Any]: | ||
        """Execute the worker's task""" | ||
        pass | ||
|
||
    def validate_input(self, input_data: Dict[str, Any]) -> bool: | ||
        """Validate input data""" | ||
        return True # Override in subclasses | ||
|
||
    def validate_output(self, output: Dict[str, Any]) -> bool: | ||
        """Validate output data""" | ||
        return True # Override in subclasses |
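Note that the base class never calls `validate_input` or `validate_output` itself, so a subclass has to invoke them from `execute`. The sketch below shows one way this might look. `DomainEchoWorker` is a hypothetical worker written only for illustration, and the `Worker` class is a local stand-in that mirrors the one in this diff (logging omitted) so the example runs without Pynions installed.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict


class Worker(ABC):
    """Local stand-in mirroring the base class in this diff (logging omitted)."""

    def __init__(self, config: Dict[str, Any] = None):
        self.config = config or {}

    @abstractmethod
    async def execute(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        ...

    def validate_input(self, input_data: Dict[str, Any]) -> bool:
        return True

    def validate_output(self, output: Dict[str, Any]) -> bool:
        return True


class DomainEchoWorker(Worker):
    """Hypothetical worker that only demonstrates the validation hooks."""

    def validate_input(self, input_data: Dict[str, Any]) -> bool:
        domain = input_data.get("domain")
        return isinstance(domain, str) and "." in domain

    def validate_output(self, output: Dict[str, Any]) -> bool:
        # Both keys must be present in the result
        return {"domain", "source"} <= output.keys()

    async def execute(self, input_data: Dict[str, Any]) -> Dict[str, Any]:
        if not self.validate_input(input_data):
            raise ValueError("input_data must contain a 'domain' such as 'example.com'")
        domain = input_data["domain"]
        output = {"domain": domain, "source": f"https://{domain}/pricing"}
        if not self.validate_output(output):
            raise ValueError("worker produced an incomplete result")
        return output
```

Raising on invalid input, rather than returning `None`, is a design choice of this sketch; it lets callers tell a bad request apart from an empty result.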
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
import asyncio | ||
import json | ||
from typing import Dict, Any | ||
from pynions.core import Worker | ||
from pynions.plugins.serper import SerperWebSearch | ||
from pynions.plugins.jina import JinaAIReader | ||
from pynions.plugins.litellm_plugin import LiteLLM | ||
|
||
|
||
class PricingResearchWorker(Worker): | ||
    """Worker for extracting pricing data from a website""" | ||
|
||
    def __init__(self, config: Dict[str, Any] = None): | ||
        # Call the base constructor so self.config and self.logger are set | ||
        super().__init__(config) | ||
        self.serper = SerperWebSearch({"max_results": 1}) | ||
        self.jina = JinaAIReader() | ||
        self.llm = LiteLLM( | ||
            { | ||
                "model": "gpt-4o-mini", | ||
                "temperature": 0.1, | ||
                "max_tokens": 1000, | ||
            } | ||
        ) | ||
|
||
    async def execute(self, input_data: Dict[str, Any]) -> Dict[str, Any]: | ||
        """Extract and structure pricing data from a domain""" | ||
        domain = input_data["domain"] | ||
        print(f"\n🔍 Analyzing pricing for {domain}") | ||
|
||
        try: | ||
            # Get pricing page URL | ||
            search_result = await self.serper.execute( | ||
                {"query": f"site:{domain} pricing"} | ||
            ) | ||
            if not search_result.get("organic"): | ||
                return None | ||
|
||
            url = search_result["organic"][0]["link"] | ||
            print(f"📄 Found pricing page: {url}") | ||
|
||
            # Extract content | ||
            content = await self.jina.execute({"url": url}) | ||
            if not content or not content.get("content"): | ||
                return None | ||
|
||
            print(f"✅ Extracted {len(content['content'])} characters") | ||
|
||
            # Analyze with LLM - using full content | ||
            response = await self.llm.execute( | ||
                { | ||
                    "messages": [ | ||
                        { | ||
                            "role": "system", | ||
                            "content": """You are a precise pricing data extractor. Your task is to extract EXACT pricing information from websites. | ||
Instructions: | ||
1. Only include information that is explicitly stated in the content | ||
2. Use exact prices, features, and limits as shown | ||
3. Do not make assumptions or fill in missing data | ||
4. If a value is not found, exclude it from the output | ||
Output format: | ||
{ | ||
"plans": ["exact plan names found"], | ||
"pricing": { | ||
"plan_name": { | ||
"monthly_price": exact_number_from_content, | ||
"annual_price": exact_number_from_content, | ||
"features": ["exact feature text"], | ||
"limits": {"exact limit name": "exact limit value"} | ||
} | ||
}, | ||
"currency": "exact currency code found" | ||
}""", | ||
                        }, | ||
                        { | ||
                            "role": "user", | ||
                            "content": f"Extract the pricing structure from this content. Only include explicitly stated information:\n\n{content['content']}", | ||
                        }, | ||
                    ] | ||
                } | ||
            ) | ||
|
||
            # Parse response | ||
            pricing_data = json.loads(response["content"]) | ||
            return {"domain": domain, "source": url, "pricing": pricing_data} | ||
|
||
        except Exception as e: | ||
            print(f"❌ Error: {str(e)}") | ||
            return None | ||
|
||
|
||
# Test | ||
if __name__ == "__main__": | ||
|
||
    async def test(): | ||
        worker = PricingResearchWorker() | ||
        result = await worker.execute({"domain": "rewardful.com"}) | ||
        if result: | ||
            print("\nPricing Data:") | ||
            print(json.dumps(result, indent=2)) | ||
|
||
    asyncio.run(test()) |
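The worker passes `response["content"]` straight to `json.loads`, so any reply that is not bare JSON ends up in the generic `except` branch. Chat models sometimes wrap JSON in a Markdown code fence, which may be worth tolerating. The sketch below is one possible approach, not part of this commit: `parse_llm_json` is a hypothetical helper, and the assumption that replies may arrive fenced is not confirmed by the source.

```python
import json
import re
from typing import Any, Dict, Optional

# Three backticks, built programmatically so this example stays easy to embed
_TICKS = "`" * 3
# Matches a reply that is entirely one fenced block, with an optional "json" tag
_FENCED = re.compile(rf"^{_TICKS}(?:json)?\s*\n(.*?)\n?{_TICKS}\s*$", re.DOTALL)


def parse_llm_json(text: str) -> Optional[Dict[str, Any]]:
    """Return the JSON object contained in an LLM reply, or None if there is none.

    Hypothetical helper: accepts bare JSON or JSON inside a single code fence.
    """
    cleaned = text.strip()
    match = _FENCED.match(cleaned)
    if match:
        cleaned = match.group(1)
    try:
        data = json.loads(cleaned)
    except json.JSONDecodeError:
        return None
    # The worker expects an object with "plans", "pricing", and "currency"
    return data if isinstance(data, dict) else None
```

With such a helper, the parse step could read `pricing_data = parse_llm_json(response["content"])`, followed by an explicit `None` check before building the result.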