Commit 1514087 (parent 67c3abb)

docs: Add complete documentation structure

Showing 8 changed files with 602 additions and 0 deletions.
@@ -0,0 +1,32 @@

```yaml
name: Deploy Documentation

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

permissions:
  contents: write

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.x'

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install mkdocs-material "mkdocstrings[python]"

      # Pull requests only check that the site builds;
      # publishing to gh-pages happens only on pushes to main.
      - name: Build documentation
        if: github.event_name == 'pull_request'
        run: mkdocs build

      - name: Deploy documentation
        if: github.event_name == 'push'
        run: mkdocs gh-deploy --force
```
@@ -0,0 +1,84 @@

# Document Processing API

This page documents the document processing components of Ollama PDF RAG.

## DocumentProcessor

```python
from typing import List

from langchain_core.documents import Document  # assumed: LangChain's Document type


class DocumentProcessor:
    """Handles PDF document loading and processing."""

    def __init__(self, chunk_size: int = 1000, chunk_overlap: int = 200):
        """Initialize the document processor with chunking parameters."""
```

### Methods

#### load_document
```python
def load_document(self, file_path: str) -> List[Document]:
    """Load a PDF document and return a list of Document objects."""
```

Parameters:
- `file_path`: Path to the PDF file

Returns:
- List of Document objects

#### split_documents
```python
def split_documents(self, documents: List[Document]) -> List[Document]:
    """Split documents into overlapping chunks."""
```

Parameters:
- `documents`: List of Document objects

Returns:
- List of chunked Document objects
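To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of splitting with overlap. It works on plain strings with a fixed-width character window, and `split_text` is a hypothetical helper; it is not the project's implementation, which may split on separators (for example with a LangChain text splitter) instead.

```python
from typing import List


def split_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> List[str]:
    """Hypothetical helper: fixed-width character chunks that overlap.

    Each chunk starts (chunk_size - chunk_overlap) characters after the
    previous one, so consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end of the text; later windows
        # would only repeat characters that are already covered.
        if start + chunk_size >= len(text):
            break
    return chunks


# With chunk_size=10 and chunk_overlap=4, each chunk starts 6 characters later.
print(split_text("abcdefghijklmnopqrst", chunk_size=10, chunk_overlap=4))
# ['abcdefghij', 'ghijklmnop', 'mnopqrst']
```

A separator-aware splitter produces chunks of varying length, but the idea is the same: `chunk_size` bounds each chunk, and `chunk_overlap` repeats some context across chunk boundaries so a sentence cut at one boundary still appears whole in a neighboring chunk.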

#### process_pdf
```python
def process_pdf(self, file_path: str) -> List[Document]:
    """Load a PDF file and split it into chunks."""
```

Parameters:
- `file_path`: Path to the PDF file

Returns:
- List of processed Document chunks

## Usage Example

```python
# Initialize the processor
processor = DocumentProcessor(chunk_size=1000, chunk_overlap=200)

# Process a PDF file
documents = processor.process_pdf("path/to/document.pdf")

# Access each chunk's text and metadata
for doc in documents:
    print(doc.page_content)
    print(doc.metadata)
```

## Configuration

The document processor can be configured with:

- `chunk_size`: Target number of characters per chunk (default: 1000)
- `chunk_overlap`: Number of characters shared by consecutive chunks (default: 200)
- `pdf_parser`: PDF parsing backend
- `encoding`: Text encoding

## Error Handling

The processor handles these common errors:

- File not found
- Invalid PDF format
- Encoding issues
- Memory constraints
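The first two error cases can be caught before any parsing starts. A minimal sketch, assuming a hypothetical `validate_pdf_path` pre-check (not part of the documented API) that rejects missing files and files without the PDF header:

```python
from pathlib import Path


def validate_pdf_path(file_path: str) -> Path:
    """Hypothetical pre-check: fail early on a missing file or a non-PDF file.

    Raises FileNotFoundError if the path is not a regular file and ValueError
    if the file does not start with the PDF header.
    """
    path = Path(file_path)
    if not path.is_file():
        raise FileNotFoundError(f"PDF not found: {file_path}")
    # Well-formed PDFs begin with "%PDF-" (e.g. "%PDF-1.7"). Some readers
    # tolerate leading junk bytes, so this check is deliberately strict.
    with path.open("rb") as fh:
        header = fh.read(5)
    if header != b"%PDF-":
        raise ValueError(f"not a PDF (missing %PDF- header): {file_path}")
    return path


try:
    validate_pdf_path("path/to/document.pdf")
except (FileNotFoundError, ValueError) as err:
    print(f"Skipping file: {err}")
```

Checking only the first five bytes keeps the test cheap; encoding and memory problems only show up during parsing, so they still need handling around the actual load.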
@@ -0,0 +1,90 @@

# Embeddings API

This page documents the text embedding components used for semantic search.

## NomicEmbeddings

```python
class NomicEmbeddings:
    """Manages text embeddings using Nomic's embedding model."""

    def __init__(self, model_name: str = "nomic-embed-text"):
        """Initialize embeddings with the given model name."""
```

### Methods

#### embed_documents
```python
def embed_documents(self, texts: List[str]) -> List[List[float]]:
    """Generate embeddings for a list of texts."""
```

Parameters:
- `texts`: List of text strings

Returns:
- List of embedding vectors

#### embed_query
```python
def embed_query(self, text: str) -> List[float]:
    """Generate an embedding for a single query text."""
```

Parameters:
- `text`: Query text

Returns:
- Embedding vector

## Usage Example

```python
# Initialize embeddings
embeddings = NomicEmbeddings()

# Embed documents
docs = ["First document", "Second document"]
doc_embeddings = embeddings.embed_documents(docs)

# Embed a query
query = "Sample query"
query_embedding = embeddings.embed_query(query)
```
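Semantic search compares the query embedding against each document embedding. The sketch below ranks documents by cosine similarity in pure Python; the short hand-written vectors stand in for real `embed_documents`/`embed_query` output (real embeddings are much longer), and `rank_by_similarity` is a hypothetical helper, not part of the documented API.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_embedding: Sequence[float],
                       doc_embeddings: List[Sequence[float]],
                       docs: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical helper: pair each document with its score, best match first."""
    scored = [(doc, cosine_similarity(query_embedding, emb))
              for doc, emb in zip(docs, doc_embeddings)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = ["First document", "Second document"]
doc_embeddings = [[1.0, 0.0, 0.0], [0.6, 0.8, 0.0]]  # stand-ins for embed_documents(docs)
query_embedding = [0.0, 1.0, 0.0]                    # stand-in for embed_query(query)
print(rank_by_similarity(query_embedding, doc_embeddings, docs))
```

In practice a vector store performs this comparison, but the scoring rule is the same idea: the smaller the angle between the query and a chunk, the more relevant the chunk.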

## Configuration

Configure embeddings with:

- Model selection (`model_name`, default `"nomic-embed-text"`)
- Batch size
- Normalization
- Caching options
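Batch size and normalization can be illustrated with a small wrapper around any function that embeds a list of texts, such as `embed_documents`. This is a minimal sketch; `embed_in_batches` and `l2_normalize` are hypothetical helpers, and the default batch size of 32 is an arbitrary assumption.

```python
import math
from typing import Callable, List

# Any callable shaped like embed_documents: list of texts -> list of vectors.
EmbedFn = Callable[[List[str]], List[List[float]]]


def l2_normalize(vector: List[float]) -> List[float]:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def embed_in_batches(texts: List[str], embed_fn: EmbedFn,
                     batch_size: int = 32, normalize: bool = True) -> List[List[float]]:
    """Hypothetical wrapper: embed texts batch by batch, optionally L2-normalizing."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    vectors: List[List[float]] = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        vectors.extend(embed_fn(batch))
    return [l2_normalize(v) for v in vectors] if normalize else vectors
```

For example, `embed_in_batches(chunks, NomicEmbeddings().embed_documents, batch_size=16)`. Normalizing to unit length means a plain dot product equals cosine similarity, which lets a vector store use the cheaper comparison.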

## Performance

Optimization options:

- Batch processing
- GPU acceleration
- Caching
- Embedding dimensionality
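Caching pays off when the same query text is embedded repeatedly. A minimal sketch using the standard library's `functools.lru_cache`; `make_cached_query_embedder` is a hypothetical helper that wraps any `embed_query`-style function.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def make_cached_query_embedder(embed_query: Callable[[str], List[float]],
                               maxsize: int = 1024) -> Callable[[str], Tuple[float, ...]]:
    """Hypothetical helper: memoize query embeddings so repeated queries skip the model."""

    @lru_cache(maxsize=maxsize)
    def cached(text: str) -> Tuple[float, ...]:
        # Return a tuple so callers cannot mutate the cached vector in place.
        return tuple(embed_query(text))

    return cached
```

For example, `embed = make_cached_query_embedder(NomicEmbeddings().embed_query)`; `embed.cache_info()` then reports hits and misses. The cache keys on the exact text, so any cleanup (trimming, lowercasing) should happen before the lookup.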

## Best Practices

1. **Text Preparation**
   - Clean input text
   - Handle special characters
   - Normalize length

2. **Resource Management**
   - Batch similar lengths
   - Monitor memory usage
   - Cache frequent queries

3. **Quality Control**
   - Validate embeddings
   - Check dimensions
   - Monitor similarity scores
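The quality-control checks above can be sketched as a single validation pass. `validate_embeddings` is a hypothetical helper; which conditions count as invalid (here: mismatched dimension, NaN or infinite values, all-zero vectors) is an assumption to adjust per project.

```python
import math
from typing import List, Optional


def validate_embeddings(vectors: List[List[float]], expected_dim: Optional[int] = None) -> int:
    """Hypothetical check: same dimension everywhere, finite values, no zero vectors.

    Returns the common dimension so callers can compare it with their vector store.
    """
    if not vectors:
        raise ValueError("no embeddings to validate")
    dim = expected_dim if expected_dim is not None else len(vectors[0])
    if dim <= 0:
        raise ValueError("embedding dimension must be positive")
    for i, vec in enumerate(vectors):
        if len(vec) != dim:
            raise ValueError(f"vector {i} has dimension {len(vec)}, expected {dim}")
        if not all(math.isfinite(x) for x in vec):
            raise ValueError(f"vector {i} contains NaN or infinite values")
        if all(x == 0 for x in vec):
            raise ValueError(f"vector {i} is all zeros")
    return dim
```

A dimension mismatch usually means the model changed between indexing and querying, so checking the returned dimension against the stored index catches that early.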
@@ -0,0 +1,97 @@

# LLM Manager API

This page documents the language model (LLM) management components.

## LLMManager

```python
class LLMManager:
    """Manages Ollama language model interactions."""

    def __init__(self, model_name: str = "llama2"):
        """Initialize the LLM manager with a model name."""
```

### Methods

#### list_models
```python
def list_models(self) -> List[str]:
    """List available Ollama models."""
```

Returns:
- List of model names
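One way to list models is Ollama's REST endpoint `GET /api/tags`, whose JSON body has a `models` array with a `name` per model. The sketch below assumes a local Ollama server on its default port 11434 and is written as a hypothetical free function; the documented `list_models` method may instead use the `ollama` Python client.

```python
import json
from typing import List
from urllib.request import urlopen

OLLAMA_HOST = "http://localhost:11434"  # assumed: Ollama's default local address


def parse_model_names(payload: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in payload.get("models", [])]


def list_models(host: str = OLLAMA_HOST, timeout: float = 5.0) -> List[str]:
    """Hypothetical stand-in for LLMManager.list_models using Ollama's REST API."""
    with urlopen(f"{host}/api/tags", timeout=timeout) as response:
        return parse_model_names(json.load(response))
```

Keeping the parsing separate from the HTTP call makes it testable without a running server.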

#### get_model
```python
def get_model(self, model_name: str) -> LLM:
    """Get an instance of the specified model."""
```

Parameters:
- `model_name`: Name of the Ollama model

Returns:
- LLM instance

#### generate
```python
def generate(self, prompt: str, **kwargs) -> str:
    """Generate text using the current model."""
```

Parameters:
- `prompt`: Input text
- `**kwargs`: Additional generation parameters

Returns:
- Generated text
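A sketch of how `generate` could forward `prompt` and `**kwargs` to Ollama's `POST /api/generate` endpoint, which accepts `model`, `prompt`, `stream`, and an `options` object and returns the text in a `response` field. Ollama names the length limit `num_predict`, so this sketch renames `max_tokens`; the host, the alias table, and the free-function form are assumptions, not the project's code.

```python
import json
from typing import Any, Dict
from urllib.request import Request, urlopen

OLLAMA_HOST = "http://localhost:11434"  # assumed: Ollama's default local address

# Ollama calls the response-length limit "num_predict"; map the name used on this page.
OPTION_ALIASES = {"max_tokens": "num_predict"}


def build_generate_payload(model: str, prompt: str, **kwargs: Any) -> Dict[str, Any]:
    """Build a non-streaming /api/generate request body; kwargs become model options."""
    options = {OPTION_ALIASES.get(key, key): value for key, value in kwargs.items()}
    return {"model": model, "prompt": prompt, "stream": False, "options": options}


def generate(model: str, prompt: str, host: str = OLLAMA_HOST,
             timeout: float = 120.0, **kwargs: Any) -> str:
    """Hypothetical stand-in for LLMManager.generate: POST to Ollama, return the text."""
    body = json.dumps(build_generate_payload(model, prompt, **kwargs)).encode("utf-8")
    request = Request(f"{host}/api/generate", data=body,
                      headers={"Content-Type": "application/json"}, method="POST")
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]
```

Setting `"stream": False` makes Ollama return one JSON object instead of a stream of partial responses, which keeps the return type a plain string.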

## Usage Example

```python
# Initialize the manager
manager = LLMManager(model_name="llama2")

# List available models
models = manager.list_models()

# Generate text
response = manager.generate(
    prompt="Explain RAG in simple terms",
    temperature=0.7,
    max_tokens=500,
)
```

## Model Parameters

Configure model behavior with:

- `temperature`: Sampling randomness; higher values give more varied output (0.0-1.0 is a common range)
- `max_tokens`: Maximum response length, in tokens
- `top_p`: Nucleus sampling threshold
- `frequency_penalty`: Penalty that discourages repeated tokens

## Error Handling

The manager handles:

- Model loading errors
- Generation timeouts
- Resource constraints
- API communication issues
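Timeouts and communication failures are often transient, so retrying with backoff is a common response. A minimal sketch; `call_with_retries` is a hypothetical helper, and treating the built-in `TimeoutError` and `ConnectionError` as retryable is an assumption (the exceptions an HTTP client actually raises vary by library).

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(fn: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                      retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError)) -> T:
    """Hypothetical helper: retry transient failures with exponential backoff.

    Sleeps base_delay, 2 * base_delay, 4 * base_delay, ... between attempts
    and re-raises the last error once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("unreachable")  # the loop always returns or re-raises
```

For example, `call_with_retries(lambda: manager.generate(prompt="Explain RAG in simple terms"))`. Errors outside `retry_on`, such as a missing model, propagate immediately because retrying them would not help.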

## Best Practices

1. **Model Selection**
   - Match the model to the task
   - Consider resource usage
   - Test performance

2. **Parameter Tuning**
   - Adjust temperature
   - Control response length
   - Balance quality against speed