reorganize project configuration and data structure

- Move config files to /pynions/config/
  - Consolidate settings.json
  - Move .env to config directory
  - Add workflow status types config
- Improve data organization
  - Add raw/ and output/ directories
  - Implement project-based content structure
  - Add configurable workflow status types
- Update utils.py
  - Add project-aware save functions
  - Add status validation
  - Improve file naming consistency
- Add documentation
  - Add data-organization.md
  - Update changelog.md

Breaking Changes:
- Configuration files moved to /pynions/config/
- New data directory structure required
- Updated save_result() function signature
craftled · Nov 9, 2024 · d37bacf
1 parent f65234b
commit d37bacf

Showing 12 changed files with 381 additions and 32 deletions.
diff --git a/.gitignore b/.gitignore
@@ -99,4 +99,8 @@ site/
 *.ipynb
 
 # Package specific
-pynions-*/
+pynions-*/
+
+# Config files
+pynions/config/.env
+.env
diff --git a/config.example.json b/config.example.json
diff --git a/docs/changelog.md b/docs/changelog.md
@@ -1,11 +1,39 @@
 ---
 title: "Changelog"
 publishedAt: "2024-11-03"
-updatedAt: "2024-11-08"
+updatedAt: "2024-11-09"
 summary: "Updates, bug fixes and improvements."
 kind: "detailed"
 ---
 
+## v0.2.18 - Nov 9, 2024
+
+### Changed
+
+- Reorganized project configuration structure
+  - Moved all config files to `/pynions/config/` directory
+  - Consolidated settings into single `settings.json`
+  - Moved `.env` to config directory
+- Improved data directory organization
+  - Separated raw and output data
+  - Added structured workflow status types
+  - Implemented project-based output organization
+- Enhanced utils.py with new file management functions
+  - Added project-aware save functions
+  - Improved status type validation
+  - Added configurable file extensions
+
+### Added
+
+- New configuration management system
+  - Added `settings.py` for centralized config loading
+  - Added workflow status types configuration
+  - Added file extension preferences per status
+- New utility functions for content workflow
+  - `save_result()` with project and status support
+  - `save_raw_data()` for structured data storage
+  - `slugify()` for consistent file naming
+
 ## v0.2.17 - Nov 9, 2024
 
 ### Changed

diff --git a/docs/data-organization.md b/docs/data-organization.md
@@ -0,0 +1,158 @@
+# Data Organization
+
+Pynions uses a structured approach to organize data and content workflows.
+
+## Directory Structure
+
+```
+data/
+├── raw/              # Original, unmodified data
+│   ├── scraped_data/ # Raw scraped content
+│   └── logs/         # Application logs
+└── output/           # All workflow outputs
+    └── [project]/    # Project-specific folders
+        └── [project]_[status]_[date].[ext]
+```
+
+## Workflow Status Types
+
+Content goes through several stages in a typical workflow:
+
+- `research`: Initial research and data gathering
+- `brief`: Content brief or outline
+- `outline`: Detailed content structure
+- `draft`: First version of content
+- `review`: Content under review
+- `final`: Final approved version
+- `data`: Processed data files
+- `assets`: Related assets and resources
+
+## File Naming Convention
+
+Files are automatically named using the following pattern:
+`[project]_[status]_[YYYYMMDD].[extension]`
+
+Examples:
+```
+best_mailchimp_alternatives_research_20240309.md
+best_mailchimp_alternatives_brief_20240309.md
+best_mailchimp_alternatives_draft_20240309.md
+```
+
+## Usage
+
+Save content at different stages of your workflow:
+
+```python
+from pynions.core.utils import save_result
+
+# Save research content
+save_result(
+    content="Research findings...",
+    project_name="best-mailchimp-alternatives",
+    status="research"
+)
+
+# Save draft content
+save_result(
+    content="Draft content...",
+    project_name="best-mailchimp-alternatives",
+    status="draft"
+)
+
+# Save related data
+save_result(
+    content='{"data": "metrics"}',
+    project_name="best-mailchimp-alternatives",
+    status="data",
+    extension="json"
+)
+```
+
+## Raw Data Storage
+
+For storing raw data from various sources:
+
+```python
+from pynions.core.utils import save_raw_data
+
+# Save scraped content
+save_raw_data(
+    content="Raw scraped content...",
+    source="serper",
+    data_type="scraped_data"
+)
+
+# Save log data
+save_raw_data(
+    content="Log entry...",
+    source="workflow",
+    data_type="logs"
+)
+```
+
+## Configuration
+
+Status types and their properties are configured in `settings.json`:
+
+```json
+{
+  "workflow": {
+    "status_types": {
+      "research": {
+        "description": "Initial research and data gathering",
+        "extensions": ["md", "txt"]
+      },
+      "brief": {
+        "description": "Content brief or outline",
+        "extensions": ["md"]
+      },
+      "draft": {
+        "description": "First version of content",
+        "extensions": ["md"]
+      }
+      // ... other status types
+    }
+  }
+}
+```
+
+## Best Practices
+
+1. **Project Names**
+   - Use descriptive, hyphen-separated names
+   - Keep names consistent across related content
+   - Example: "best-mailchimp-alternatives"
+
+2. **Content Organization**
+   - Create a new project folder for each content initiative
+   - Keep all related files within the project folder
+   - Use appropriate status types to track progress
+
+3. **Raw Data**
+   - Always save original, unmodified data in the raw directory
+   - Use descriptive source names
+   - Include timestamps for tracking
+
+4. **File Extensions**
+   - Use `.md` for content files (research, briefs, drafts)
+   - Use `.json` for structured data
+   - Use `.txt` for plain text and logs
+
+## Data Lifecycle
+
+1. **Creation**
+   - Raw data is saved in appropriate raw/ subdirectories
+   - New project folders are created as needed
+
+2. **Processing**
+   - Content moves through various status types
+   - Each stage is saved with the appropriate status
+
+3. **Completion**
+   - Final content marked with 'final' status
+   - Raw data retained for reference
+
+4. **Maintenance**
+   - Regular cleanup of old raw data
+   - Archive completed projects as needed
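
Note on `docs/data-organization.md`: the naming pattern `[project]_[status]_[YYYYMMDD].[extension]` described in that file can be sketched roughly as follows. This is a minimal illustration and not the code in `utils.py`, which this diff does not show. The names `slugify_name` and `build_output_path` are hypothetical, and the slug rule (lowercase, runs of non-alphanumeric characters become underscores) is an assumption inferred from the example filenames in the document.

```python
import re
from datetime import date
from pathlib import Path


def slugify_name(text: str) -> str:
    """Hypothetical stand-in for slugify(): lowercase the text and
    collapse every run of non-alphanumeric characters into one underscore."""
    return re.sub(r"[^a-z0-9]+", "_", text.lower()).strip("_")


def build_output_path(project_name, status, extension="md",
                      output_dir="data/output", on=None):
    """Build data/output/[project]/[project]_[status]_[YYYYMMDD].[ext].

    `on` lets the caller pin the date; it defaults to today.
    Using the slug for the folder name as well is an assumption.
    """
    slug = slugify_name(project_name)
    stamp = (on or date.today()).strftime("%Y%m%d")
    return Path(output_dir) / slug / f"{slug}_{status}_{stamp}.{extension}"


path = build_output_path("best-mailchimp-alternatives", "research",
                         on=date(2024, 3, 9))
print(path.as_posix())
# data/output/best_mailchimp_alternatives/best_mailchimp_alternatives_research_20240309.md
```

Pinning the date through a parameter keeps the function deterministic, which makes the naming rule easy to test.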
diff --git a/.env.example → pynions/config/.env.example b/.env.example → pynions/config/.env.example
diff --git a/pynions/config/settings.json b/pynions/config/settings.json
@@ -0,0 +1,82 @@
+{
+  "model": {
+    "name": "gpt-4o-mini",
+    "temperature": 0.7,
+    "max_tokens": 500
+  },
+  "output": {
+    "stream": true,
+    "save_results": true,
+    "path": "data"
+  },
+  "plugins": {
+    "stats": {
+      "enabled": true,
+      "show_model": true
+    },
+    "serper": {
+      "max_results": 10,
+      "country": "us",
+      "language": "en"
+    },
+    "litellm": {
+      "default_model": "gpt-4",
+      "temperature": 0.7,
+      "max_tokens": 2000,
+      "retry_attempts": 3
+    },
+    "playwright": {
+      "headless": true,
+      "timeout": 30000,
+      "screenshot": false
+    },
+    "jina": {}
+  },
+  "storage": {
+    "data_dir": "data",
+    "raw_dir": "data/raw",
+    "output_dir": "data/output",
+    "max_file_size_mb": 100
+  },
+  "logging": {
+    "level": "INFO",
+    "format": "%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+    "date_format": "%Y-%m-%d %H:%M:%S"
+  },
+  "workflow": {
+    "status_types": {
+      "research": {
+        "description": "Initial research and data gathering",
+        "extensions": ["md", "txt"]
+      },
+      "brief": {
+        "description": "Content brief or outline",
+        "extensions": ["md"]
+      },
+      "outline": {
+        "description": "Detailed content structure",
+        "extensions": ["md"]
+      },
+      "draft": {
+        "description": "First version of content",
+        "extensions": ["md"]
+      },
+      "review": {
+        "description": "Content under review",
+        "extensions": ["md"]
+      },
+      "final": {
+        "description": "Final approved version",
+        "extensions": ["md"]
+      },
+      "data": {
+        "description": "Processed data files",
+        "extensions": ["json", "csv"]
+      },
+      "assets": {
+        "description": "Related assets and resources",
+        "extensions": ["png", "jpg", "pdf"]
+      }
+    }
+  }
+}
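
Note on `pynions/config/settings.json`: the `workflow.status_types` block lends itself to a simple validation step before saving. The sketch below shows one way this could work, assuming the first listed extension serves as the default for a status. The function name `resolve_extension` is hypothetical, and the actual validation in `utils.py` is not part of this diff and may differ.

```python
# Excerpt of the workflow section from settings.json (descriptions omitted).
SETTINGS_EXCERPT = {
    "workflow": {
        "status_types": {
            "research": {"extensions": ["md", "txt"]},
            "draft": {"extensions": ["md"]},
            "data": {"extensions": ["json", "csv"]},
        }
    }
}


def resolve_extension(status, extension=None, settings=SETTINGS_EXCERPT):
    """Check `status` against the configured status types and return a
    permitted extension. Hypothetical helper, not the pynions API."""
    status_types = settings["workflow"]["status_types"]
    if status not in status_types:
        valid = ", ".join(sorted(status_types))
        raise ValueError(f"Unknown status {status!r}; expected one of: {valid}")

    allowed = status_types[status]["extensions"]
    if extension is None:
        # Assumption: the first listed extension is the default.
        return allowed[0]

    ext = extension.lstrip(".").lower()
    if ext not in allowed:
        raise ValueError(
            f"Extension {ext!r} is not allowed for status {status!r}; "
            f"allowed: {', '.join(allowed)}"
        )
    return ext


print(resolve_extension("research"))      # md
print(resolve_extension("data", ".CSV"))  # csv
```

Failing early with a message that lists the valid options keeps typos in status names from silently creating stray files.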
diff --git a/pynions/config/settings.py b/pynions/config/settings.py
@@ -0,0 +1,23 @@
+from pathlib import Path
+import json
+from dotenv import load_dotenv
+
+CONFIG_DIR = Path(__file__).parent
+DEFAULT_CONFIG_PATH = CONFIG_DIR / "settings.json"
+ENV_PATH = CONFIG_DIR / ".env"
+
+
+def load_config(custom_config=None):
+    """Load configuration from files and merge with custom config"""
+    # Load environment variables
+    load_dotenv(ENV_PATH)
+
+    # Load default settings
+    with open(DEFAULT_CONFIG_PATH) as f:
+        config = json.load(f)
+
+    # Merge with custom config (shallow: a top-level key in custom_config replaces the default value wholesale)
+    if custom_config:
+        config.update(custom_config)
+
+    return config
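
Note on `load_config()`: `dict.update` merges only at the top level, so passing `{"model": {"temperature": 0.2}}` would drop the default `name` and `max_tokens` under `model`. If per-key overrides are wanted, a recursive merge is one option. The sketch below is an assumption about the desired behavior, not part of this commit, and `deep_merge` is a hypothetical helper.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which nested dicts from `override` are merged
    into `base` key by key, leaving both inputs unmodified."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


defaults = {"model": {"name": "gpt-4o-mini", "temperature": 0.7, "max_tokens": 500}}
custom = {"model": {"temperature": 0.2}}

# Shallow merge, as in load_config(): the whole "model" dict is replaced.
shallow = dict(defaults)
shallow.update(custom)
print(shallow["model"])
# {'temperature': 0.2}

# Recursive merge: only the overridden key changes.
print(deep_merge(defaults, custom)["model"])
# {'name': 'gpt-4o-mini', 'temperature': 0.2, 'max_tokens': 500}
```

Whether the shallow behavior is intended depends on how callers pass `custom_config`, so this is worth confirming before changing it.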