docs: Add complete documentation structure
tonykipkemboi committed Jan 7, 2025
1 parent 67c3abb commit 1514087
Showing 8 changed files with 602 additions and 0 deletions.
32 changes: 32 additions & 0 deletions .github/workflows/docs.yml
@@ -0,0 +1,32 @@
name: Deploy Documentation

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install mkdocs-material "mkdocstrings[python]"

      - name: Build and deploy documentation
        # Only publish from pushes to main; PR builds should not deploy
        if: github.event_name == 'push'
        run: |
          mkdocs gh-deploy --force
84 changes: 84 additions & 0 deletions docs/api/document.md
@@ -0,0 +1,84 @@
# Document Processing API

This page documents the document processing components of Ollama PDF RAG.

## DocumentProcessor

```python
class DocumentProcessor:
    """Handles PDF document loading and processing."""

    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
        """Initialize document processor with chunking parameters."""
```

### Methods

#### load_document
```python
def load_document(self, file_path: str) -> List[Document]:
    """Load a PDF document and return list of Document objects."""
```

Parameters:
- `file_path`: Path to the PDF file

Returns:
- List of Document objects

#### split_documents
```python
def split_documents(self, documents: List[Document]) -> List[Document]:
    """Split documents into chunks with overlap."""
```

Parameters:
- `documents`: List of Document objects

Returns:
- List of chunked Document objects

#### process_pdf
```python
def process_pdf(self, file_path: str) -> List[Document]:
    """Load and process a PDF file."""
```

Parameters:
- `file_path`: Path to the PDF file

Returns:
- List of processed Document chunks

## Usage Example

```python
# Initialize processor
processor = DocumentProcessor(chunk_size=1000, chunk_overlap=200)

# Process a PDF file
documents = processor.process_pdf("path/to/document.pdf")

# Access document content
for doc in documents:
    print(doc.page_content)
    print(doc.metadata)
```

## Configuration

The document processor can be configured with:

- `chunk_size`: Number of characters per chunk
- `chunk_overlap`: Number of overlapping characters
- `pdf_parser`: PDF parsing backend
- `encoding`: Text encoding

## Error Handling

The processor handles common errors:

- File not found
- Invalid PDF format
- Encoding issues
- Memory constraints
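
The checks above can be surfaced before parsing begins. A minimal sketch of pre-flight validation — `DocumentProcessingError` and `validate_pdf` are illustrative names for this example, not part of the documented API:

```python
from pathlib import Path

class DocumentProcessingError(Exception):
    """Raised when a PDF cannot be loaded or parsed."""

def validate_pdf(file_path: str) -> Path:
    """Check that a path exists and looks like a PDF before processing."""
    path = Path(file_path)
    if not path.is_file():
        raise DocumentProcessingError(f"File not found: {file_path}")
    # Well-formed PDF files begin with the magic bytes %PDF-
    with path.open("rb") as f:
        if f.read(5) != b"%PDF-":
            raise DocumentProcessingError(f"Not a valid PDF: {file_path}")
    return path
```

A caller could run this check and then pass the validated path to `process_pdf`.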
90 changes: 90 additions & 0 deletions docs/api/embeddings.md
@@ -0,0 +1,90 @@
# Embeddings API

This page documents the text embedding components used for semantic search.

## NomicEmbeddings

```python
class NomicEmbeddings:
    """Manages text embeddings using Nomic's embedding model."""

    def __init__(self, model_name: str = "nomic-embed-text"):
        """Initialize embeddings with model name."""
```

### Methods

#### embed_documents
```python
def embed_documents(self, texts: List[str]) -> List[List[float]]:
    """Generate embeddings for a list of texts."""
```

Parameters:
- `texts`: List of text strings

Returns:
- List of embedding vectors

#### embed_query
```python
def embed_query(self, text: str) -> List[float]:
    """Generate embedding for a single query text."""
```

Parameters:
- `text`: Query text

Returns:
- Embedding vector

## Usage Example

```python
# Initialize embeddings
embeddings = NomicEmbeddings()

# Embed documents
docs = ["First document", "Second document"]
doc_embeddings = embeddings.embed_documents(docs)

# Embed query
query = "Sample query"
query_embedding = embeddings.embed_query(query)
```

## Configuration

Configure embeddings with:

- Model selection
- Batch size
- Normalization
- Caching options

## Performance

Optimization options:

- Batch processing
- GPU acceleration
- Caching
- Reduced embedding dimensionality

## Best Practices

1. **Text Preparation**
    - Clean input text
    - Handle special characters
    - Normalize length

2. **Resource Management**
    - Batch similar lengths
    - Monitor memory usage
    - Cache frequent queries

3. **Quality Control**
    - Validate embeddings
    - Check dimensions
    - Monitor similarity scores
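
The batch processing mentioned under Performance can be sketched roughly as follows — `embed_in_batches` is an illustrative helper under assumed semantics, not part of the documented API:

```python
from typing import Callable, List

def embed_in_batches(
    texts: List[str],
    embed_fn: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> List[List[float]]:
    """Embed texts in fixed-size batches to bound peak memory use."""
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        # Each slice is at most batch_size texts
        batch = texts[start:start + batch_size]
        vectors.extend(embed_fn(batch))
    return vectors
```

In practice `embed_fn` would be something like `embeddings.embed_documents`, so the batched result preserves input order.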
97 changes: 97 additions & 0 deletions docs/api/llm.md
@@ -0,0 +1,97 @@
# LLM Manager API

This page documents the Language Model management components.

## LLMManager

```python
class LLMManager:
    """Manages Ollama language model interactions."""

    def __init__(self, model_name: str = "llama2"):
        """Initialize LLM manager with model name."""
```

### Methods

#### list_models
```python
def list_models(self) -> List[str]:
    """List available Ollama models."""
```

Returns:
- List of model names

#### get_model
```python
def get_model(self, model_name: str) -> LLM:
    """Get an instance of the specified model."""
```

Parameters:
- `model_name`: Name of the Ollama model

Returns:
- LLM instance

#### generate
```python
def generate(self, prompt: str, **kwargs) -> str:
    """Generate text using the current model."""
```

Parameters:
- `prompt`: Input text
- `**kwargs`: Additional generation parameters

Returns:
- Generated text

## Usage Example

```python
# Initialize manager
manager = LLMManager(model_name="llama2")

# List available models
models = manager.list_models()

# Generate text
response = manager.generate(
    prompt="Explain RAG in simple terms",
    temperature=0.7,
    max_tokens=500
)
```

## Model Parameters

Configure model behavior with:

- `temperature`: Sampling randomness (0.0-1.0); higher values are more creative
- `max_tokens`: Response length
- `top_p`: Nucleus sampling
- `frequency_penalty`: Repetition control

## Error Handling

The manager handles:

- Model loading errors
- Generation timeouts
- Resource constraints
- API communication issues

## Best Practices

1. **Model Selection**
    - Match model to task
    - Consider resource usage
    - Test performance

2. **Parameter Tuning**
    - Adjust temperature
    - Control response length
    - Balance quality/speed
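
Generation timeouts and transient API errors listed under Error Handling can be handled with a simple retry loop. A sketch — the retry policy and `generate_with_retry` helper are assumptions for illustration, not part of the documented API:

```python
import time
from typing import Callable

def generate_with_retry(
    generate_fn: Callable[[str], str],
    prompt: str,
    retries: int = 3,
    backoff_s: float = 1.0,
) -> str:
    """Call generate_fn, retrying failures with exponential backoff."""
    for attempt in range(retries):
        try:
            return generate_fn(prompt)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            # Wait backoff_s, 2*backoff_s, 4*backoff_s, ... between attempts
            time.sleep(backoff_s * 2 ** attempt)
    raise RuntimeError("unreachable")
```

Here `generate_fn` would wrap a call such as `lambda p: manager.generate(prompt=p)`.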