Initial commit

2025-12-11 14:43:16 +01:00
commit 7ef62972f3
36 changed files with 23641 additions and 0 deletions
--- a/docs/Library/retrievers.md
+++ b/docs/Library/retrievers.md
@@ -0,0 +1,185 @@
+---
+id: retrievers
+title: Understanding Retrievers
+description: Learn how retrievers find and extract relevant information from your documents
+sidebar_label: Retrievers
+sidebar_position: 3
+---
+
+# Understanding Retrievers
+
+## Overview
+
+Retrievers are the components in Evie's Library that find and extract relevant information from your documents.
+Think of a retriever as a search engine that works on the meaning of your question and returns the most
+relevant content from your stored documents.
+
+```mermaid
+classDiagram
+    class Catalog {
+        +id: Integer
+        +name: String
+        +description: Text
+        +type: String
+        +min_chunk_size: Integer
+        +max_chunk_size: Integer
+        +user_metadata: JSON
+    }
+    
+    class Retriever {
+        +id: Integer
+        +name: String
+        +description: Text
+        +catalog_id: Integer
+        +type: String
+        +tuning: Boolean
+        +configuration: JSON
+        +arguments: JSON
+    }
+    
+    class StandardRAGRetriever {
+        +configuration
+        es_k: Integer
+        es_similarity_threshold: Float
+        +arguments
+        query: String
+    }
+    
+    class DossierRetriever {
+        +configuration
+        es_k: Integer
+        es_similarity_threshold: Float
+        tag_conditions: JSON
+        +arguments
+        query: String
+    }
+    
+    Catalog "1" -- "*" Retriever : has
+    Retriever <|-- StandardRAGRetriever
+    Retriever <|-- DossierRetriever
+    
+    note for StandardRAGRetriever "Default similarity threshold: 0.3<br>Default es_k: 8"
+    note for DossierRetriever "Coming soon<br>Specialized for Dossier catalogs"
+```
+
+## Key Concepts
+
+### What is a Retriever?
+
+A retriever is responsible for:
+- Understanding the meaning of your questions
+- Searching through document chunks in your catalog
+- Finding the most relevant information based on semantic similarity
+- Providing context for Evie's responses
+
+```mermaid
+flowchart LR
+    A[User Question] --> B[Retriever]
+    B --> C[Document Chunks]
+    C --> D[Most Relevant Information]
+    D --> E[Evie's Response]
+    
+    style A fill:#9c2d66,stroke:#333,stroke-width:2px
+    style B fill:#423372,stroke:#333,stroke-width:2px
+    style C fill:#423372,stroke:#333,stroke-width:2px
+    style D fill:#423372,stroke:#333,stroke-width:2px
+    style E fill:#9c2d66,stroke:#333,stroke-width:2px
+```
+
+### How Retrievers Work
+
+When you ask Evie a question, the retriever:
+1. Analyzes your question to understand its meaning
+2. Compares it with stored document chunks
+3. Assigns similarity scores to each chunk
+4. Returns the most relevant chunks based on configuration settings
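+
+As a rough illustration, the question reaches the retriever through its `arguments`, which the class diagram above
+lists as a single `query` string. The exact payload shape is an assumption, and the question text is made up:
+
+```json
+{
+  "arguments": {
+    "query": "What is our refund policy for annual subscriptions?"
+  }
+}
+```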
+
+## Types of Retrievers
+
+### Standard RAG Retriever
+
+The Standard RAG (Retrieval-Augmented Generation) Retriever is the default option and suits most use cases. It
+searches through all documents in a catalog to find relevant information.
+
+Configuration options include:
+- **Maximum Results (es_k)**: Controls how many document chunks to retrieve (default: 8)
+- **Similarity Threshold (es_similarity_threshold)**: Determines how closely chunks must match your question (default: 0.3)
+  - Lower threshold = stricter matching
+  - Higher threshold = more permissive matching
+
+### Dossier Retriever (Coming Soon)
+
+A specialized retriever for Dossier catalogs that will allow:
+- Filtering by document tags
+- Creating specific "viewpoints" based on tag combinations
+- Combining semantic search with tag-based filtering
+
+## Setting Up Retrievers
+
+### Creating a New Retriever
+
+To create a retriever:
+1. Enter standard values such as name and description
+2. Select the target catalog
+3. Choose the retriever type
+4. Save the retriever, then set the type-specific configuration
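+
+The values entered in steps 1 to 3 correspond to the retriever fields shown in the class diagram. The sketch below
+shows what a newly saved retriever might look like before any type-specific configuration is set. All values,
+including the catalog ID, are placeholders:
+
+```json
+{
+  "name": "Support Articles Retriever",
+  "description": "Finds relevant passages in the customer support catalog",
+  "catalog_id": 1,
+  "type": "STANDARD_RAG"
+}
+```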
+
+### Configuration Best Practices
+
+1. **Similarity Threshold Tuning**:
+   - Start with the default 0.3 threshold
+   - If receiving too much information: Lower the threshold
+   - If receiving too little information: Raise the threshold
+
+2. **Multiple Retrievers**:
+   You can create multiple retrievers for the same catalog to serve different purposes. For example:
+   - A broad retriever with a higher threshold for general questions
+   - A strict retriever with a lower threshold for specific queries
+   - Different retrievers for different document subsets (in Dossier catalogs)
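+
+A sketch of the broad and strict pairing on one catalog, following the threshold convention this page describes
+(a higher threshold is more permissive). The names and the values 0.5 and 0.2 are illustrative only, not
+recommended settings:
+
+```json
+[
+  {
+    "name": "Handbook - Broad",
+    "type": "STANDARD_RAG",
+    "configuration": {
+      "es_k": 8,
+      "es_similarity_threshold": 0.5
+    }
+  },
+  {
+    "name": "Handbook - Strict",
+    "type": "STANDARD_RAG",
+    "configuration": {
+      "es_k": 8,
+      "es_similarity_threshold": 0.2
+    }
+  }
+]
+```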
+
+## Practical Examples
+
+### Standard RAG Retriever Example
+
+```json
+{
+  "name": "General Knowledge Retriever",
+  "type": "STANDARD_RAG",
+  "configuration": {
+    "es_k": 8,
+    "es_similarity_threshold": 0.3
+  }
+}
+```
+
+### Future Dossier Retriever Example
+
+```json
+{
+  "name": "Quarterly Reports 2024",
+  "type": "DOSSIER_RAG",
+  "configuration": {
+    "es_k": 8,
+    "es_similarity_threshold": 0.3,
+    "tag_conditions": {
+      "document_type": "quarterly_report",
+      "year": 2024
+    }
+  }
+}
+```
+
+## Tips for Optimal Retrieval
+
+1. **Name Retrievers Clearly**:
+   Use descriptive names that indicate their purpose and configuration
+
+2. **Monitor Performance**:
+   - If answers are missing important information, consider:
+     - Increasing the similarity threshold
+     - Increasing the maximum results (es_k)
+   - If answers contain irrelevant information, consider:
+     - Decreasing the similarity threshold
+     - Decreasing the maximum results (es_k)
+
+3. **Use Multiple Retrievers**:
+   Create specialized retrievers for different use cases within the same catalog
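+
+As an illustration of the first two tips, a retriever adjusted because answers were missing information might be
+renamed to reflect its new settings. The name and the raised values below are illustrative, not recommended
+defaults:
+
+```json
+{
+  "name": "Support Articles - Broad (es_k 12, threshold 0.4)",
+  "type": "STANDARD_RAG",
+  "configuration": {
+    "es_k": 12,
+    "es_similarity_threshold": 0.4
+  }
+}
+```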