reorganize project configuration and data structure
- Move config files to /pynions/config/
  - Consolidate settings.json
  - Move .env to config directory
  - Add workflow status types config

- Improve data organization
  - Add raw/ and output/ directories
  - Implement project-based content structure
  - Add configurable workflow status types

- Update utils.py
  - Add project-aware save functions
  - Add status validation
  - Improve file naming consistency

- Add documentation
  - Add data-organization.md
  - Update changelog.md

Breaking Changes:
- Configuration files moved to /pynions/config/
- New data directory structure required
- Updated save_result() function signature
tomaslau committed Nov 9, 2024
1 parent f65234b commit d37bacf
Showing 12 changed files with 381 additions and 32 deletions.
6 changes: 5 additions & 1 deletion .gitignore
@@ -99,4 +99,8 @@ site/
*.ipynb

# Package specific
pynions-*/
pynions-*/

# Config files
pynions/config/.env
.env
30 changes: 0 additions & 30 deletions config.example.json

This file was deleted.

30 changes: 29 additions & 1 deletion docs/changelog.md
@@ -1,11 +1,39 @@
---
title: "Changelog"
publishedAt: "2024-11-03"
updatedAt: "2024-11-08"
updatedAt: "2024-11-09"
summary: "Updates, bug fixes and improvements."
kind: "detailed"
---

## v0.2.18 - Nov 9, 2024

### Changed

- Reorganized project configuration structure
  - Moved all config files to `/pynions/config/` directory
  - Consolidated settings into single `settings.json`
  - Moved `.env` to config directory
- Improved data directory organization
  - Separated raw and output data
  - Added structured workflow status types
  - Implemented project-based output organization
- Enhanced utils.py with new file management functions
  - Added project-aware save functions
  - Improved status type validation
  - Added configurable file extensions

### Added

- New configuration management system
  - Added `settings.py` for centralized config loading
  - Added workflow status types configuration
  - Added file extension preferences per status
- New utility functions for content workflow
  - `save_result()` with project and status support
  - `save_raw_data()` for structured data storage
  - `slugify()` for consistent file naming

## v0.2.17 - Nov 9, 2024

### Changed
158 changes: 158 additions & 0 deletions docs/data-organization.md
@@ -0,0 +1,158 @@
# Data Organization

Pynions uses a structured approach to organize data and content workflows.

## Directory Structure

```
data/
├── raw/                  # Original, unmodified data
│   ├── scraped_data/     # Raw scraped content
│   └── logs/             # Application logs
└── output/               # All workflow outputs
    └── [project]/        # Project-specific folders
        └── [project]_[status]_[date].[ext]
```

## Workflow Status Types

Content goes through several stages in a typical workflow:

- `research`: Initial research and data gathering
- `brief`: Content brief or outline
- `outline`: Detailed content structure
- `draft`: First version of content
- `review`: Content under review
- `final`: Final approved version
- `data`: Processed data files
- `assets`: Related assets and resources

## File Naming Convention

Files are automatically named using the following pattern:
`[project]_[status]_[YYYYMMDD].[extension]`

Examples:
```
best_mailchimp_alternatives_research_20240309.md
best_mailchimp_alternatives_brief_20240309.md
best_mailchimp_alternatives_draft_20240309.md
```
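
The naming pattern above can be sketched as follows. This is a minimal illustration, not the code from `utils.py`: the `slugify()` shown here is a hypothetical stand-in that assumes hyphens and other separators become underscores (which is what the example filenames suggest), and `build_filename()` is a name invented for this sketch.

```python
import re
from datetime import date
from typing import Optional


def slugify(text: str) -> str:
    """Hypothetical stand-in: lowercase, collapse non-alphanumeric runs into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def build_filename(
    project_name: str,
    status: str,
    extension: str = "md",
    on: Optional[date] = None,
) -> str:
    """Assemble [project]_[status]_[YYYYMMDD].[extension] (illustrative only)."""
    day = (on or date.today()).strftime("%Y%m%d")
    return f"{slugify(project_name)}_{status}_{day}.{extension}"


print(build_filename("best-mailchimp-alternatives", "research", on=date(2024, 3, 9)))
# best_mailchimp_alternatives_research_20240309.md
```

Passing the date explicitly keeps the function easy to test; when `on` is omitted the sketch falls back to today's date.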

## Usage

Save content at different stages of your workflow:

```python
from pynions.core.utils import save_result

# Save research content
save_result(
    content="Research findings...",
    project_name="best-mailchimp-alternatives",
    status="research"
)

# Save draft content
save_result(
    content="Draft content...",
    project_name="best-mailchimp-alternatives",
    status="draft"
)

# Save related data
save_result(
    content='{"data": "metrics"}',
    project_name="best-mailchimp-alternatives",
    status="data",
    extension="json"
)
```

## Raw Data Storage

For storing raw data from various sources:

```python
from pynions.core.utils import save_raw_data

# Save scraped content
save_raw_data(
    content="Raw scraped content...",
    source="serper",
    data_type="scraped_data"
)

# Save log data
save_raw_data(
    content="Log entry...",
    source="workflow",
    data_type="logs"
)
```
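
This document does not spell out how raw files are named on disk. As a rough sketch only, one plausible layout places each file under `data/raw/<data_type>/` with the source name and a timestamp. The helper name `raw_data_path()`, the `.txt` extension, and the timestamp format are all assumptions made for illustration, not the behavior of `save_raw_data()`.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def raw_data_path(
    source: str,
    data_type: str,
    raw_dir: str = "data/raw",
    now: Optional[datetime] = None,
) -> Path:
    """Hypothetical helper: build data/raw/<data_type>/<source>_<timestamp>.txt."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(raw_dir) / data_type / f"{source}_{stamp}.txt"


path = raw_data_path("serper", "scraped_data", now=datetime(2024, 11, 9, 12, 0, 0))
print(path.as_posix())
# data/raw/scraped_data/serper_20241109_120000.txt
```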

## Configuration

Status types and their properties are configured in `settings.json`:

```json
{
  "workflow": {
    "status_types": {
      "research": {
        "description": "Initial research and data gathering",
        "extensions": ["md", "txt"]
      },
      "brief": {
        "description": "Content brief or outline",
        "extensions": ["md"]
      },
      "draft": {
        "description": "First version of content",
        "extensions": ["md"]
      }
      // ... other status types
    }
  }
}
```
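
A status check against this configuration might look like the sketch below. It is an assumption about how validation could work, not the validation code in `utils.py`: the function name `resolve_extension()` and the rule that the first listed extension acts as the default are both hypothetical.

```python
from typing import Optional

# Trimmed copy of the "status_types" mapping shown above.
STATUS_TYPES = {
    "research": {"extensions": ["md", "txt"]},
    "brief": {"extensions": ["md"]},
    "draft": {"extensions": ["md"]},
}


def resolve_extension(status_types: dict, status: str, extension: Optional[str] = None) -> str:
    """Validate status and extension; default to the first allowed extension."""
    if status not in status_types:
        valid = ", ".join(sorted(status_types))
        raise ValueError(f"Unknown status '{status}'. Valid types: {valid}")
    allowed = status_types[status]["extensions"]
    if extension is None:
        return allowed[0]  # assumption: first entry is the preferred extension
    if extension not in allowed:
        raise ValueError(f"Extension '{extension}' not allowed for status '{status}'")
    return extension


print(resolve_extension(STATUS_TYPES, "research"))         # md
print(resolve_extension(STATUS_TYPES, "research", "txt"))  # txt
```

Raising `ValueError` early keeps misnamed statuses from silently creating files outside the expected naming scheme.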

## Best Practices

1. **Project Names**
   - Use descriptive, hyphen-separated names
   - Keep names consistent across related content
   - Example: "best-mailchimp-alternatives"

2. **Content Organization**
   - Create a new project folder for each content initiative
   - Keep all related files within the project folder
   - Use appropriate status types to track progress

3. **Raw Data**
   - Always save original, unmodified data in the raw directory
   - Use descriptive source names
   - Include timestamps for tracking

4. **File Extensions**
   - Use `.md` for content files (research, briefs, drafts)
   - Use `.json` for structured data
   - Use `.txt` for plain text and logs

## Data Lifecycle

1. **Creation**
   - Raw data is saved in appropriate raw/ subdirectories
   - New project folders are created as needed

2. **Processing**
   - Content moves through various status types
   - Each stage is saved with the appropriate status

3. **Completion**
   - Final content is marked with the 'final' status
   - Raw data is retained for reference

4. **Maintenance**
   - Regular cleanup of old raw data
   - Archive completed projects as needed
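
The maintenance step can be approximated with a small script. The sketch below only lists raw files older than a cutoff so they can be reviewed before anything is deleted; `find_stale_files()` and the 90-day threshold in the usage comment are illustrative choices, not part of Pynions.

```python
import time
from pathlib import Path
from typing import List, Optional


def find_stale_files(raw_dir: str, max_age_days: float, now: Optional[float] = None) -> List[Path]:
    """Return files under raw_dir whose modification time is older than max_age_days."""
    reference = time.time() if now is None else now
    cutoff = reference - max_age_days * 86400  # seconds per day
    return sorted(
        path
        for path in Path(raw_dir).rglob("*")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


# Example usage (assumes the data/raw directory from the structure above):
# for stale in find_stale_files("data/raw", max_age_days=90):
#     print(stale)
```

Listing before deleting is a deliberate choice here, since raw data is meant to be retained for reference until someone decides it is safe to remove.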
File renamed without changes.
82 changes: 82 additions & 0 deletions pynions/config/settings.json
@@ -0,0 +1,82 @@
{
  "model": {
    "name": "gpt-4o-mini",
    "temperature": 0.7,
    "max_tokens": 500
  },
  "output": {
    "stream": true,
    "save_results": true,
    "path": "data"
  },
  "plugins": {
    "stats": {
      "enabled": true,
      "show_model": true
    },
    "serper": {
      "max_results": 10,
      "country": "us",
      "language": "en"
    },
    "litellm": {
      "default_model": "gpt-4",
      "temperature": 0.7,
      "max_tokens": 2000,
      "retry_attempts": 3
    },
    "playwright": {
      "headless": true,
      "timeout": 30000,
      "screenshot": false
    },
    "jina": {}
  },
  "storage": {
    "data_dir": "data",
    "raw_dir": "data/raw",
    "output_dir": "data/output",
    "max_file_size_mb": 100
  },
  "logging": {
    "level": "INFO",
    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
    "date_format": "%Y-%m-%d %H:%M:%S"
  },
  "workflow": {
    "status_types": {
      "research": {
        "description": "Initial research and data gathering",
        "extensions": ["md", "txt"]
      },
      "brief": {
        "description": "Content brief or outline",
        "extensions": ["md"]
      },
      "outline": {
        "description": "Detailed content structure",
        "extensions": ["md"]
      },
      "draft": {
        "description": "First version of content",
        "extensions": ["md"]
      },
      "review": {
        "description": "Content under review",
        "extensions": ["md"]
      },
      "final": {
        "description": "Final approved version",
        "extensions": ["md"]
      },
      "data": {
        "description": "Processed data files",
        "extensions": ["json", "csv"]
      },
      "assets": {
        "description": "Related assets and resources",
        "extensions": ["png", "jpg", "pdf"]
      }
    }
  }
}
23 changes: 23 additions & 0 deletions pynions/config/settings.py
@@ -0,0 +1,23 @@
from pathlib import Path
import json
from dotenv import load_dotenv

CONFIG_DIR = Path(__file__).parent
DEFAULT_CONFIG_PATH = CONFIG_DIR / "settings.json"
ENV_PATH = CONFIG_DIR / ".env"


def load_config(custom_config=None):
    """Load configuration from files and merge with custom config"""
    # Load environment variables
    load_dotenv(ENV_PATH)

    # Load default settings
    with open(DEFAULT_CONFIG_PATH) as f:
        config = json.load(f)

    # Merge with custom config
    if custom_config:
        config.update(custom_config)

    return config
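
Note that `dict.update()` merges only at the top level, so a custom config such as `{"model": {"temperature": 0.2}}` would replace the whole `model` section rather than override one key. If per-key overrides are wanted, a recursive merge is one option. The `deep_merge()` helper below is a hypothetical sketch and is not part of this commit.

```python
import copy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict where nested dicts in override are merged into base."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"model": {"name": "gpt-4o-mini", "temperature": 0.7, "max_tokens": 500}}
custom = {"model": {"temperature": 0.2}}

# Shallow merge: the nested "model" dict is replaced wholesale.
shallow = dict(defaults)
shallow.update(custom)
print(shallow["model"])
# {'temperature': 0.2}

# Deep merge: only the overridden key changes.
print(deep_merge(defaults, custom)["model"])
# {'name': 'gpt-4o-mini', 'temperature': 0.2, 'max_tokens': 500}
```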