docs: Add complete documentation structure

tonykipkemboi · Jan 7, 2025 · commit 1514087 · 1 parent 67c3abb
Showing 8 changed files with 602 additions and 0 deletions.
diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml
@@ -0,0 +1,32 @@
+name: Deploy Documentation
+
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+
+permissions:
+  contents: write
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: '3.x'
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
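+          pip install mkdocs-material "mkdocstrings[python]"
+
+      # Pull requests only check that the site builds; pushes to main deploy it.
+      - name: Build documentation
+        if: github.event_name == 'pull_request'
+        run: mkdocs build
+
+      - name: Deploy documentation
+        if: github.event_name == 'push'
+        run: |
+          git config user.name github-actions[bot]
+          git config user.email 41898282+github-actions[bot]@users.noreply.github.com
+          mkdocs gh-deploy --force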
diff --git a/docs/api/document.md b/docs/api/document.md
@@ -0,0 +1,84 @@
+# Document Processing API
+
+This page documents the document processing components of Ollama PDF RAG.
+
+## DocumentProcessor
+
+```python
+class DocumentProcessor:
+    """Handles PDF document loading and processing."""
+
+    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
+        """Initialize document processor with chunking parameters."""
+```
+
+### Methods
+
+#### load_document
+```python
+def load_document(self, file_path: str) -> List[Document]:
+    """Load a PDF document and return list of Document objects."""
+```
+
+Parameters:
+- `file_path`: Path to the PDF file
+
+Returns:
+- List of Document objects
+
+#### split_documents
+```python
+def split_documents(self, documents: List[Document]) -> List[Document]:
+    """Split documents into chunks with overlap."""
+```
+
+Parameters:
+- `documents`: List of Document objects
+
+Returns:
+- List of chunked Document objects
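+
+To make `chunk_size` and `chunk_overlap` concrete, here is a minimal character-based sketch of overlapping chunking. It is an illustration under assumptions, not the project's splitter: `split_text` is a hypothetical helper that works on plain strings, and the real `split_documents` may instead prefer paragraph or sentence boundaries.
+
+```python
+from typing import List
+
+
+def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
+    """Split text into fixed-size character windows that overlap.
+
+    Hypothetical helper for illustration; not part of the DocumentProcessor API.
+    """
+    if chunk_size <= 0:
+        raise ValueError("chunk_size must be positive")
+    if not 0 <= chunk_overlap < chunk_size:
+        raise ValueError("chunk_overlap must be at least 0 and less than chunk_size")
+
+    step = chunk_size - chunk_overlap  # how far each window advances
+    chunks: List[str] = []
+    for start in range(0, len(text), step):
+        chunks.append(text[start:start + chunk_size])
+        # Stop once a window reaches the end of the text so the tail
+        # is not emitted again as extra, fully overlapped chunks.
+        if start + chunk_size >= len(text):
+            break
+    return chunks
+
+
+print(split_text("abcdefghij", chunk_size=4, chunk_overlap=2))
+# ['abcd', 'cdef', 'efgh', 'ghij']
+```
+
+Each chunk repeats the last `chunk_overlap` characters of the previous one, so text that straddles a boundary still appears intact in at least one chunk as long as it is no longer than the overlap.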
+
+#### process_pdf
+```python
+def process_pdf(self, file_path: str) -> List[Document]:
+    """Load and process a PDF file."""
+```
+
+Parameters:
+- `file_path`: Path to the PDF file
+
+Returns:
+- List of processed Document chunks
+
+## Usage Example
+
+```python
+# Initialize processor
+processor = DocumentProcessor(chunk_size=1000, chunk_overlap=200)
+
+# Process a PDF file
+documents = processor.process_pdf("path/to/document.pdf")
+
+# Access document content
+for doc in documents:
+    print(doc.page_content)
+    print(doc.metadata)
+```
+
+## Configuration
+
+The document processor can be configured with:
+
+- `chunk_size`: Maximum number of characters per chunk (default `1000`)
+- `chunk_overlap`: Number of characters shared by consecutive chunks (default `200`)
+- `pdf_parser`: PDF parsing backend
+- `encoding`: Text encoding
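+
+`chunk_size` and `chunk_overlap` constrain each other: if the overlap is not smaller than the chunk size, a fixed-window splitter never advances. A sketch of validating them up front follows; `ChunkConfig` is a hypothetical name, and it covers only these two settings, not `pdf_parser` or `encoding`.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class ChunkConfig:
+    """Hypothetical container for the chunking settings of DocumentProcessor."""
+
+    chunk_size: int = 1000
+    chunk_overlap: int = 200
+
+    def __post_init__(self) -> None:
+        # Reject settings that would produce empty or non-advancing windows.
+        if self.chunk_size <= 0:
+            raise ValueError(f"chunk_size must be positive, got {self.chunk_size}")
+        if not 0 <= self.chunk_overlap < self.chunk_size:
+            raise ValueError(
+                f"chunk_overlap must be in [0, {self.chunk_size}), got {self.chunk_overlap}"
+            )
+
+    @property
+    def step(self) -> int:
+        """Characters each new chunk advances past the start of the previous one."""
+        return self.chunk_size - self.chunk_overlap
+
+
+config = ChunkConfig()
+print(config.step)  # 800
+```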
+
+## Error Handling
+
+The processor handles common errors:
+
+- File not found
+- Invalid PDF format
+- Encoding issues
+- Memory constraints 
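+
+The first two error cases can be caught before any parsing starts. The sketch below is a hypothetical pre-check, not the processor's actual code. It relies on the fact that a well-formed PDF begins with the `%PDF-` header; some readers tolerate bytes before that header, so this check is stricter than a full parser.
+
+```python
+from pathlib import Path
+
+
+def check_pdf_path(file_path: str) -> Path:
+    """Hypothetical pre-check run before handing a file to the PDF parser."""
+    path = Path(file_path)
+    if not path.is_file():
+        raise FileNotFoundError(f"No such PDF file: {file_path}")
+    with path.open("rb") as handle:
+        header = handle.read(5)
+    # Well-formed PDF files start with the "%PDF-" signature.
+    if header != b"%PDF-":
+        raise ValueError(f"Not a PDF file (missing %PDF- header): {file_path}")
+    return path
+```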
diff --git a/docs/api/embeddings.md b/docs/api/embeddings.md
@@ -0,0 +1,90 @@
+# Embeddings API
+
+This page documents the text embedding components used for semantic search.
+
+## NomicEmbeddings
+
+```python
+class NomicEmbeddings:
+    """Manages text embeddings using Nomic's embedding model."""
+
+    def __init__(self, model_name: str = "nomic-embed-text"):
+        """Initialize embeddings with model name."""
+```
+
+### Methods
+
+#### embed_documents
+```python
+def embed_documents(self, texts: List[str]) -> List[List[float]]:
+    """Generate embeddings for a list of texts."""
+```
+
+Parameters:
+- `texts`: List of text strings
+
+Returns:
+- List of embedding vectors
+
+#### embed_query
+```python
+def embed_query(self, text: str) -> List[float]:
+    """Generate embedding for a single query text."""
+```
+
+Parameters:
+- `text`: Query text
+
+Returns:
+- Embedding vector
+
+## Usage Example
+
+```python
+# Initialize embeddings
+embeddings = NomicEmbeddings()
+
+# Embed documents
+docs = ["First document", "Second document"]
+doc_embeddings = embeddings.embed_documents(docs)
+
+# Embed query
+query = "Sample query"
+query_embedding = embeddings.embed_query(query)
+```
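+
+Semantic search compares the query vector against each document vector. A minimal sketch of that ranking step using cosine similarity follows. It assumes both methods return equal-length lists of floats, as their signatures suggest, and uses small hand-written vectors in place of real `nomic-embed-text` output.
+
+```python
+import math
+from typing import List, Tuple
+
+
+def cosine_similarity(a: List[float], b: List[float]) -> float:
+    """Cosine of the angle between two vectors of equal length."""
+    if len(a) != len(b):
+        raise ValueError("vectors must have the same dimension")
+    dot = sum(x * y for x, y in zip(a, b))
+    norm_a = math.sqrt(sum(x * x for x in a))
+    norm_b = math.sqrt(sum(y * y for y in b))
+    if norm_a == 0 or norm_b == 0:
+        return 0.0  # treat zero vectors as unrelated
+    return dot / (norm_a * norm_b)
+
+
+def rank_documents(
+    query_embedding: List[float], doc_embeddings: List[List[float]]
+) -> List[Tuple[int, float]]:
+    """Return (document index, score) pairs, most similar first."""
+    scores = [
+        (i, cosine_similarity(query_embedding, emb))
+        for i, emb in enumerate(doc_embeddings)
+    ]
+    return sorted(scores, key=lambda pair: pair[1], reverse=True)
+
+
+# Toy 3-dimensional vectors standing in for real embeddings.
+doc_vectors = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]
+query_vector = [0.0, 1.0, 0.0]
+print(rank_documents(query_vector, doc_vectors))
+```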
+
+## Configuration
+
+Configure embeddings with:
+
+- Model selection (`model_name`, default `"nomic-embed-text"`)
+- Batch size
+- Normalization
+- Caching options
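+
+Normalization usually means scaling each embedding to unit length (an L2 norm of 1), after which cosine similarity equals a plain dot product. Whether `NomicEmbeddings` exposes this as a setting is not specified on this page, so the helpers below are a standalone, hypothetical sketch.
+
+```python
+import math
+from typing import List
+
+
+def l2_normalize(vector: List[float]) -> List[float]:
+    """Scale a vector to unit length; zero vectors are returned unchanged."""
+    norm = math.sqrt(sum(x * x for x in vector))
+    if norm == 0:
+        return list(vector)
+    return [x / norm for x in vector]
+
+
+def dot(a: List[float], b: List[float]) -> float:
+    """Dot product; equals cosine similarity when both inputs are unit length."""
+    return sum(x * y for x, y in zip(a, b))
+
+
+print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
+```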
+
+## Performance
+
+Optimization options:
+
+- Batch processing
+- GPU acceleration
+- Caching
+- Dimensionality
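+
+Batch processing means sending texts to the embedding model in groups rather than one request per text or a single huge request. The sketch below shows only the grouping; `embed_batch` is a hypothetical stand-in for a call such as `embed_documents`, and the default batch size of 32 is an arbitrary example value.
+
+```python
+from typing import Callable, Iterator, List, Sequence
+
+
+def batched(texts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
+    """Yield consecutive slices of at most batch_size texts."""
+    if batch_size <= 0:
+        raise ValueError("batch_size must be positive")
+    for start in range(0, len(texts), batch_size):
+        yield list(texts[start:start + batch_size])
+
+
+def embed_in_batches(
+    texts: Sequence[str],
+    embed_batch: Callable[[List[str]], List[List[float]]],
+    batch_size: int = 32,
+) -> List[List[float]]:
+    """Embed texts batch by batch, keeping results in input order."""
+    vectors: List[List[float]] = []
+    for batch in batched(texts, batch_size):
+        vectors.extend(embed_batch(batch))
+    return vectors
+
+
+def fake_embed(batch: List[str]) -> List[List[float]]:
+    """Stand-in embedder: a one-dimensional "vector" holding each text's length."""
+    return [[float(len(text))] for text in batch]
+
+
+print(embed_in_batches(["a", "bb", "ccc"], fake_embed, batch_size=2))  # [[1.0], [2.0], [3.0]]
+```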
+
+## Best Practices
+
+1. **Text Preparation**
+   - Clean input text
+   - Handle special characters
+   - Normalize length
+
+2. **Resource Management**
+   - Batch similar lengths
+   - Monitor memory usage
+   - Cache frequent queries
+
+3. **Quality Control**
+   - Validate embeddings
+   - Check dimensions
+   - Monitor similarity scores
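+
+Caching frequent queries and checking dimensions can be combined in a small wrapper. This is a hypothetical sketch: `CachedQueryEmbedder` is not part of the documented API, and it accepts any function shaped like `embed_query`. A plain dict grows without bound, so a long-running service would want an eviction policy such as LRU.
+
+```python
+from typing import Callable, Dict, List
+
+
+class CachedQueryEmbedder:
+    """Hypothetical wrapper that memoizes query embeddings and checks their size."""
+
+    def __init__(self, embed_query: Callable[[str], List[float]], expected_dim: int):
+        self._embed_query = embed_query
+        self._expected_dim = expected_dim
+        self._cache: Dict[str, List[float]] = {}
+        self.calls = 0  # number of times the underlying model was invoked
+
+    def embed_query(self, text: str) -> List[float]:
+        if text not in self._cache:
+            vector = self._embed_query(text)
+            self.calls += 1
+            # Catch model mix-ups early: every vector must have the same dimension.
+            if len(vector) != self._expected_dim:
+                raise ValueError(
+                    f"expected {self._expected_dim} dimensions, got {len(vector)}"
+                )
+            self._cache[text] = vector
+        return list(self._cache[text])  # copy so callers cannot mutate the cache
+
+
+def fake_embed_query(text: str) -> List[float]:
+    """Stand-in model returning a 2-dimensional vector."""
+    return [float(len(text)), 1.0]
+
+
+embedder = CachedQueryEmbedder(fake_embed_query, expected_dim=2)
+embedder.embed_query("What is RAG?")
+embedder.embed_query("What is RAG?")
+print(embedder.calls)  # 1
+```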
diff --git a/docs/api/llm.md b/docs/api/llm.md
@@ -0,0 +1,97 @@
+# LLM Manager API
+
+This page documents the language model (LLM) management components.
+
+## LLMManager
+
+```python
+class LLMManager:
+    """Manages Ollama language model interactions."""
+
+    def __init__(self, model_name: str = "llama2"):
+        """Initialize LLM manager with model name."""
+```
+
+### Methods
+
+#### list_models
+```python
+def list_models(self) -> List[str]:
+    """List available Ollama models."""
+```
+
+Returns:
+- List of model names
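+
+Ollama reports locally installed models through its REST API at `GET /api/tags` (by default on `http://localhost:11434`), which returns a JSON object with a `models` array. The sketch below shows one way a `list_models` implementation could read it with the standard library; whether `LLMManager` calls this endpoint directly or goes through a client library is an assumption.
+
+```python
+import json
+import urllib.request
+from typing import List
+
+OLLAMA_URL = "http://localhost:11434"  # Ollama's default address; adjust if needed
+
+
+def parse_model_names(payload: str) -> List[str]:
+    """Extract model names from an /api/tags JSON response body."""
+    data = json.loads(payload)
+    return [model["name"] for model in data.get("models", [])]
+
+
+def list_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> List[str]:
+    """Query a running Ollama server for its locally available models."""
+    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
+        return parse_model_names(response.read().decode("utf-8"))
+
+
+sample = '{"models": [{"name": "llama2:latest"}, {"name": "nomic-embed-text:latest"}]}'
+print(parse_model_names(sample))  # ['llama2:latest', 'nomic-embed-text:latest']
+```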
+
+#### get_model
+```python
+def get_model(self, model_name: str) -> LLM:
+    """Get an instance of the specified model."""
+```
+
+Parameters:
+- `model_name`: Name of the Ollama model
+
+Returns:
+- LLM instance
+
+#### generate
+```python
+def generate(self, prompt: str, **kwargs) -> str:
+    """Generate text using the current model."""
+```
+
+Parameters:
+- `prompt`: Input text
+- `**kwargs`: Additional generation parameters
+
+Returns:
+- Generated text
+
+## Usage Example
+
+```python
+# Initialize manager
+manager = LLMManager(model_name="llama2")
+
+# List available models
+models = manager.list_models()
+
+# Generate text
+response = manager.generate(
+    prompt="Explain RAG in simple terms",
+    temperature=0.7,
+    max_tokens=500
+)
+```
+
+## Model Parameters
+
+Configure model behavior with:
+
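+- `temperature`: Sampling randomness; lower values give more focused, deterministic output
+- `max_tokens`: Maximum length of the generated response, in tokens
+- `top_p`: Nucleus sampling threshold; only the most probable tokens whose cumulative probability reaches `top_p` are considered
+- `frequency_penalty`: Penalizes tokens in proportion to how often they have already appeared, reducing repetition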
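+
+These names follow widespread LLM API conventions. Ollama's own `/api/generate` endpoint expects sampling settings inside an `options` object and names the response-length limit `num_predict`. How `generate` maps its `**kwargs` onto Ollama is not documented here, so the translation table below is a hypothetical sketch.
+
+```python
+import json
+from typing import Any, Dict
+
+# Hypothetical mapping from the documented parameter names to Ollama option keys.
+PARAM_TO_OLLAMA_OPTION = {
+    "temperature": "temperature",
+    "max_tokens": "num_predict",
+    "top_p": "top_p",
+    "frequency_penalty": "frequency_penalty",
+}
+
+
+def build_generate_request(model: str, prompt: str, **kwargs: Any) -> Dict[str, Any]:
+    """Build a non-streaming request body for Ollama's /api/generate endpoint."""
+    unknown = set(kwargs) - set(PARAM_TO_OLLAMA_OPTION)
+    if unknown:
+        raise ValueError(f"unsupported parameters: {sorted(unknown)}")
+    options = {PARAM_TO_OLLAMA_OPTION[key]: value for key, value in kwargs.items()}
+    return {"model": model, "prompt": prompt, "stream": False, "options": options}
+
+
+body = build_generate_request(
+    "llama2", "Explain RAG in simple terms", temperature=0.7, max_tokens=500
+)
+print(json.dumps(body["options"]))  # {"temperature": 0.7, "num_predict": 500}
+```
+
+Sent as a `POST` to `http://localhost:11434/api/generate`, a body like this returns a JSON object whose `response` field holds the generated text.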
+
+## Error Handling
+
+The manager handles:
+
+- Model loading errors
+- Generation timeouts
+- Resource constraints
+- API communication issues
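+
+Timeouts and communication errors are often transient, for example while the Ollama server is still starting or loading a model, so a common pattern is to retry with exponential backoff. The wrapper below is a generic, hypothetical sketch rather than the manager's actual error handling; by default it retries only on `ConnectionError` and `TimeoutError`.
+
+```python
+import time
+from typing import Callable, Tuple, Type, TypeVar
+
+T = TypeVar("T")
+
+
+def with_retries(
+    call: Callable[[], T],
+    attempts: int = 3,
+    base_delay: float = 0.5,
+    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
+) -> T:
+    """Run call(), retrying transient failures with exponential backoff."""
+    if attempts < 1:
+        raise ValueError("attempts must be at least 1")
+    for attempt in range(attempts):
+        try:
+            return call()
+        except retry_on:
+            if attempt == attempts - 1:
+                raise  # out of attempts: surface the last error
+            time.sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
+    raise AssertionError("unreachable")
+
+
+# Simulated generation call that fails once, then succeeds.
+outcomes = [ConnectionError("server not ready"), "Generated text"]
+
+
+def flaky_generate() -> str:
+    result = outcomes.pop(0)
+    if isinstance(result, Exception):
+        raise result
+    return result
+
+
+print(with_retries(flaky_generate, base_delay=0.0))  # Generated text
+```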
+
+## Best Practices
+
+1. **Model Selection**
+   - Match model to task
+   - Consider resource usage
+   - Test performance
+
+2. **Parameter Tuning**
+   - Adjust temperature
+   - Control response length
+   - Balance quality/speed