Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
reorganize project configuration and data structure
- Move config files to /pynions/config/
  - Consolidate settings.json
  - Move .env to config directory
  - Add workflow status types config
- Improve data organization
  - Add raw/ and output/ directories
  - Implement project-based content structure
  - Add configurable workflow status types
- Update utils.py
  - Add project-aware save functions
  - Add status validation
  - Improve file naming consistency
- Add documentation
  - Add data-organization.md
  - Update changelog.md

Breaking Changes:
- Configuration files moved to /pynions/config/
- New data directory structure required
- Updated save_result() function signature
Showing 12 changed files with 381 additions and 32 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -99,4 +99,8 @@ site/ | |
*.ipynb | ||
|
||
# Package specific | ||
pynions-*/ | ||
pynions-*/ | ||
|
||
# Config files | ||
pynions/config/.env | ||
.env |
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
# Data Organization | ||
|
||
Pynions uses a structured approach to organize data and content workflows. | ||
|
||
## Directory Structure | ||
|
||
``` | ||
data/ | ||
├── raw/ # Original, unmodified data | ||
│ ├── scraped_data/ # Raw scraped content | ||
│ └── logs/ # Application logs | ||
└── output/ # All workflow outputs | ||
└── [project]/ # Project-specific folders | ||
└── [project]_[status]_[date].[ext] | ||
``` | ||
|
||
## Workflow Status Types | ||
|
||
Content goes through several stages in a typical workflow: | ||
|
||
- `research`: Initial research and data gathering | ||
- `brief`: Content brief or outline | ||
- `outline`: Detailed content structure | ||
- `draft`: First version of content | ||
- `review`: Content under review | ||
- `final`: Final approved version | ||
- `data`: Processed data files | ||
- `assets`: Related assets and resources | ||
|
||
## File Naming Convention | ||
|
||
Files are automatically named using the following pattern: | ||
`[project]_[status]_[YYYYMMDD].[extension]` | ||
|
||
Examples: | ||
``` | ||
best_mailchimp_alternatives_research_20240309.md | ||
best_mailchimp_alternatives_brief_20240309.md | ||
best_mailchimp_alternatives_draft_20240309.md | ||
``` | ||
|
||
## Usage | ||
|
||
Save content at different stages of your workflow: | ||
|
||
```python | ||
from pynions.core.utils import save_result | ||
|
||
# Save research content | ||
save_result( | ||
content="Research findings...", | ||
project_name="best-mailchimp-alternatives", | ||
status="research" | ||
) | ||
|
||
# Save draft content | ||
save_result( | ||
content="Draft content...", | ||
project_name="best-mailchimp-alternatives", | ||
status="draft" | ||
) | ||
|
||
# Save related data | ||
save_result( | ||
content='{"data": "metrics"}', | ||
project_name="best-mailchimp-alternatives", | ||
status="data", | ||
extension="json" | ||
) | ||
``` | ||
|
||
## Raw Data Storage | ||
|
||
For storing raw data from various sources: | ||
|
||
```python | ||
from pynions.core.utils import save_raw_data | ||
|
||
# Save scraped content | ||
save_raw_data( | ||
content="Raw scraped content...", | ||
source="serper", | ||
data_type="scraped_data" | ||
) | ||
|
||
# Save log data | ||
save_raw_data( | ||
content="Log entry...", | ||
source="workflow", | ||
data_type="logs" | ||
) | ||
``` | ||
|
||
## Configuration | ||
|
||
Status types and their properties are configured in `settings.json`: | ||
|
||
```json | ||
{ | ||
"workflow": { | ||
"status_types": { | ||
"research": { | ||
"description": "Initial research and data gathering", | ||
"extensions": ["md", "txt"] | ||
}, | ||
"brief": { | ||
"description": "Content brief or outline", | ||
"extensions": ["md"] | ||
}, | ||
"draft": { | ||
"description": "First version of content", | ||
"extensions": ["md"] | ||
} | ||
// ... other status types | ||
} | ||
} | ||
} | ||
``` | ||
|
||
## Best Practices | ||
|
||
1. **Project Names** | ||
- Use descriptive, hyphen-separated names | ||
- Keep names consistent across related content | ||
- Example: "best-mailchimp-alternatives" | ||
|
||
2. **Content Organization** | ||
- Create a new project folder for each content initiative | ||
- Keep all related files within the project folder | ||
- Use appropriate status types to track progress | ||
|
||
3. **Raw Data** | ||
- Always save original, unmodified data in the raw directory | ||
- Use descriptive source names | ||
- Include timestamps for tracking | ||
|
||
4. **File Extensions** | ||
- Use `.md` for content files (research, briefs, drafts) | ||
- Use `.json` for structured data | ||
- Use `.txt` for plain text and logs | ||
|
||
## Data Lifecycle | ||
|
||
1. **Creation** | ||
- Raw data is saved in appropriate raw/ subdirectories | ||
- New project folders are created as needed | ||
|
||
2. **Processing** | ||
- Content moves through various status types | ||
- Each stage saved with appropriate status | ||
|
||
3. **Completion** | ||
- Final content marked with 'final' status | ||
- Raw data retained for reference | ||
|
||
4. **Maintenance** | ||
- Regular cleanup of old raw data | ||
- Archive completed projects as needed |
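The naming convention documented above, `[project]_[status]_[YYYYMMDD].[extension]`, can be illustrated with a minimal sketch. The updated `utils.py` is not shown in this view, so `build_output_path` is a hypothetical helper, not the commit's implementation. The hyphen-to-underscore normalization is an assumption inferred from the documented examples, which use underscores in file names while the usage snippets pass hyphenated project names; using the same normalized name for the project folder is also an assumption.

```python
from datetime import date
from pathlib import Path


def build_output_path(project_name, status, extension="md", on=None,
                      output_dir="data/output"):
    """Hypothetical helper: build [project]_[status]_[YYYYMMDD].[ext] in the project folder."""
    # Assumption: hyphens and spaces become underscores, as the documented examples suggest.
    slug = project_name.strip().lower().replace("-", "_").replace(" ", "_")
    # Use the supplied date if given, otherwise today's date.
    stamp = (on or date.today()).strftime("%Y%m%d")
    return Path(output_dir) / slug / f"{slug}_{status}_{stamp}.{extension}"


path = build_output_path("best-mailchimp-alternatives", "research", on=date(2024, 3, 9))
print(path.name)  # best_mailchimp_alternatives_research_20240309.md
```

Passing the date explicitly keeps the helper deterministic in tests; the real `save_result()` may well derive it internally.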
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,82 @@ | ||
{ | ||
"model": { | ||
"name": "gpt-4o-mini", | ||
"temperature": 0.7, | ||
"max_tokens": 500 | ||
}, | ||
"output": { | ||
"stream": true, | ||
"save_results": true, | ||
"path": "data" | ||
}, | ||
"plugins": { | ||
"stats": { | ||
"enabled": true, | ||
"show_model": true | ||
}, | ||
"serper": { | ||
"max_results": 10, | ||
"country": "us", | ||
"language": "en" | ||
}, | ||
"litellm": { | ||
"default_model": "gpt-4", | ||
"temperature": 0.7, | ||
"max_tokens": 2000, | ||
"retry_attempts": 3 | ||
}, | ||
"playwright": { | ||
"headless": true, | ||
"timeout": 30000, | ||
"screenshot": false | ||
}, | ||
"jina": {} | ||
}, | ||
"storage": { | ||
"data_dir": "data", | ||
"raw_dir": "data/raw", | ||
"output_dir": "data/output", | ||
"max_file_size_mb": 100 | ||
}, | ||
"logging": { | ||
"level": "INFO", | ||
"format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s", | ||
"date_format": "%Y-%m-%d %H:%M:%S" | ||
}, | ||
"workflow": { | ||
"status_types": { | ||
"research": { | ||
"description": "Initial research and data gathering", | ||
"extensions": ["md", "txt"] | ||
}, | ||
"brief": { | ||
"description": "Content brief or outline", | ||
"extensions": ["md"] | ||
}, | ||
"outline": { | ||
"description": "Detailed content structure", | ||
"extensions": ["md"] | ||
}, | ||
"draft": { | ||
"description": "First version of content", | ||
"extensions": ["md"] | ||
}, | ||
"review": { | ||
"description": "Content under review", | ||
"extensions": ["md"] | ||
}, | ||
"final": { | ||
"description": "Final approved version", | ||
"extensions": ["md"] | ||
}, | ||
"data": { | ||
"description": "Processed data files", | ||
"extensions": ["json", "csv"] | ||
}, | ||
"assets": { | ||
"description": "Related assets and resources", | ||
"extensions": ["png", "jpg", "pdf"] | ||
} | ||
} | ||
} | ||
} |
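The commit message mentions status validation, and the `workflow.status_types` section above lists the allowed extensions for each status. A minimal sketch of how such a check could work follows. `validate_status` and `STATUS_TYPES` are hypothetical names for illustration; the actual validation in `utils.py` is not shown here and may differ.

```python
# Excerpt mirroring part of the "workflow.status_types" section of settings.json.
STATUS_TYPES = {
    "research": {"extensions": ["md", "txt"]},
    "draft": {"extensions": ["md"]},
    "data": {"extensions": ["json", "csv"]},
}


def validate_status(status, extension, status_types=STATUS_TYPES):
    """Hypothetical check: raise ValueError for an unknown status or a disallowed extension."""
    if status not in status_types:
        raise ValueError(
            f"Unknown status {status!r}; expected one of {sorted(status_types)}"
        )
    allowed = status_types[status]["extensions"]
    # Accept "json", ".json", or ".JSON" and normalize to the bare lowercase form.
    ext = extension.lstrip(".").lower()
    if ext not in allowed:
        raise ValueError(
            f"Extension {ext!r} is not allowed for status {status!r}; allowed: {allowed}"
        )
    return ext


print(validate_status("data", ".JSON"))  # json
```

Failing early with a `ValueError` keeps misnamed files out of the output directory, at the cost of requiring every new status to be added to the configuration first.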
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,23 @@ | ||
from pathlib import Path | ||
import json | ||
from dotenv import load_dotenv | ||
|
||
CONFIG_DIR = Path(__file__).parent | ||
DEFAULT_CONFIG_PATH = CONFIG_DIR / "settings.json" | ||
ENV_PATH = CONFIG_DIR / ".env" | ||
|
||
|
||
def load_config(custom_config=None): | ||
"""Load configuration from files and merge with custom config""" | ||
# Load environment variables | ||
load_dotenv(ENV_PATH) | ||
|
||
# Load default settings | ||
with open(DEFAULT_CONFIG_PATH) as f: | ||
config = json.load(f) | ||
|
||
# Merge with custom config | ||
if custom_config: | ||
config.update(custom_config) | ||
|
||
return config |
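Note that `config.update(custom_config)` is a shallow merge: a custom config that overrides one key inside a nested section such as `"model"` replaces the whole section, so the other defaults in it are lost. If per-key overrides are intended, a recursive merge is one option. The sketch below is an assumption about the desired behavior, not part of this commit, and `deep_merge` is a hypothetical helper.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which nested dicts from override are merged into base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are dicts: merge key by key instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"model": {"name": "gpt-4o-mini", "temperature": 0.7, "max_tokens": 500}}
custom = {"model": {"temperature": 0.2}}

shallow = dict(defaults)
shallow.update(custom)  # the whole "model" section is replaced
merged = deep_merge(defaults, custom)  # only "temperature" changes

print(shallow["model"])  # {'temperature': 0.2}
print(merged["model"])   # {'name': 'gpt-4o-mini', 'temperature': 0.2, 'max_tokens': 500}
```

In `load_config`, this would amount to replacing the `config.update(custom_config)` call with `config = deep_merge(config, custom_config)`, assuming callers expect partial overrides.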