Initial commit
185
docs/Library/retrievers.md
Normal file
@@ -0,0 +1,185 @@
---
id: retrievers
title: Understanding Retrievers
description: Learn how retrievers find and extract relevant information from your documents
sidebar_label: Retrievers
sidebar_position: 3
---

# Understanding Retrievers

## Overview

Retrievers are essential components in Evie's Library that help find and extract relevant information from your documents.
Think of retrievers as intelligent search engines that understand the meaning behind your questions and find the most
relevant content from your stored documents.

```mermaid
classDiagram
    class Catalog {
        +id: Integer
        +name: String
        +description: Text
        +type: String
        +min_chunk_size: Integer
        +max_chunk_size: Integer
        +user_metadata: JSON
    }

    class Retriever {
        +id: Integer
        +name: String
        +description: Text
        +catalog_id: Integer
        +type: String
        +tuning: Boolean
        +configuration: JSON
        +arguments: JSON
    }

    class StandardRAGRetriever {
        +configuration
        es_k: Integer
        es_similarity_threshold: Float
        +arguments
        query: String
    }

    class DossierRetriever {
        +configuration
        es_k: Integer
        es_similarity_threshold: Float
        tag_conditions: JSON
        +arguments
        query: String
    }

    Catalog "1" -- "*" Retriever : has
    Retriever <|-- StandardRAGRetriever
    Retriever <|-- DossierRetriever

    note for StandardRAGRetriever "Default similarity threshold: 0.3<br>Default es_k: 8"
    note for DossierRetriever "Coming soon<br>Specialized for Dossier catalogs"
```
## Key Concepts

### What is a Retriever?

A retriever is responsible for:
- Understanding the meaning of your questions
- Searching through document chunks in your catalog
- Finding the most relevant information based on semantic similarity
- Providing context for Evie's responses

```mermaid
flowchart LR
    A[User Question] --> B[Retriever]
    B --> C[Document Chunks]
    C --> D[Most Relevant Information]
    D --> E[Evie's Response]

    style A fill:#9c2d66,stroke:#333,stroke-width:2px
    style B fill:#423372,stroke:#333,stroke-width:2px
    style C fill:#423372,stroke:#333,stroke-width:2px
    style D fill:#423372,stroke:#333,stroke-width:2px
    style E fill:#9c2d66,stroke:#333,stroke-width:2px
```

### How Retrievers Work

When you ask Evie a question, the retriever:
1. Analyzes your question to understand its meaning
2. Compares it with stored document chunks
3. Assigns similarity scores to each chunk
4. Returns the most relevant chunks based on configuration settings
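
The four steps above can be sketched in a few lines of Python. This is an illustration only, not Evie's implementation: the `embed` function is a toy bag-of-words stand-in for a real embedding model, and `retrieve` is a hypothetical name. Because this page describes lower thresholds as stricter, the sketch assumes that `es_similarity_threshold` acts as an upper bound on cosine distance (1 minus cosine similarity); the source does not confirm how the score is computed.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine_distance(a: Counter, b: Counter) -> float:
    """Return 1 - cosine similarity (about 0.0 = same direction, 1.0 = no overlap)."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def retrieve(query: str, chunks: list[str], es_k: int = 8,
             es_similarity_threshold: float = 0.3) -> list[tuple[float, str]]:
    """Hypothetical retrieval loop following the four steps in the text."""
    query_vector = embed(query)                                      # 1. analyze the question
    scored = [(cosine_distance(query_vector, embed(chunk)), chunk)   # 2. compare with chunks
              for chunk in chunks]                                   # 3. assign a score to each
    # Assumption: a chunk passes when its distance is at or below the threshold.
    kept = [pair for pair in scored if pair[0] <= es_similarity_threshold]
    return sorted(kept, key=lambda pair: pair[0])[:es_k]             # 4. return the closest es_k
```

Calling `retrieve("invoices are due", chunks)` on a small list of strings returns `(distance, chunk)` pairs ordered from closest to farthest, capped at `es_k` entries.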

## Types of Retrievers

### Standard RAG Retriever

The Standard RAG (Retrieval-Augmented Generation) Retriever is the default option and suits most use cases. It
searches through all documents in a catalog to find relevant information.

Configuration options include:
- **Maximum Results (es_k)**: Controls how many document chunks to retrieve (default: 8)
- **Similarity Threshold (es_similarity_threshold)**: Determines how closely chunks must match your question (default: 0.3)
   - Lower threshold = stricter matching (fewer chunks returned)
   - Higher threshold = more permissive matching (more chunks returned)
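
To make the effect of the two options concrete, the sketch below applies different thresholds to a fixed set of scores. The chunk ids and scores are made up, `apply_threshold` is a hypothetical helper, and the scores are treated as distances (lower means a closer match), which is an assumption chosen to match the "lower threshold = stricter" behavior described above.

```python
def apply_threshold(distances: dict[str, float],
                    es_similarity_threshold: float = 0.3,
                    es_k: int = 8) -> list[str]:
    """Return the ids of chunks that pass the cutoff, closest first, at most es_k."""
    passing = [chunk_id for chunk_id, distance in distances.items()
               if distance <= es_similarity_threshold]
    return sorted(passing, key=lambda chunk_id: distances[chunk_id])[:es_k]


# Hypothetical scores for four chunks (assumed to be distances).
distances = {"chunk-a": 0.12, "chunk-b": 0.27, "chunk-c": 0.41, "chunk-d": 0.68}

for threshold in (0.2, 0.3, 0.5):
    print(threshold, apply_threshold(distances, threshold))
# 0.2 ['chunk-a']
# 0.3 ['chunk-a', 'chunk-b']
# 0.5 ['chunk-a', 'chunk-b', 'chunk-c']
```

Raising the threshold lets more chunks through, while `es_k` caps the total regardless of how many pass.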

### Dossier Retriever (Coming Soon)

A specialized retriever for Dossier catalogs that will allow:
- Filtering by document tags
- Creating specific "viewpoints" based on tag combinations
- Combining semantic search with tag-based filtering

## Setting Up Retrievers

### Creating a New Retriever

To create a retriever:
1. Enter standard values such as name and description
2. Select the target catalog
3. Choose the retriever type
4. Save the retriever, then set the type-specific configuration

### Configuration Best Practices

1. **Similarity Threshold Tuning**:
   - Start with the default threshold of 0.3
   - If you receive too much information, lower the threshold
   - If you receive too little information, raise the threshold

2. **Multiple Retrievers**:
   You can create multiple retrievers for the same catalog to serve different purposes. For example:
   - A broad retriever with a higher threshold for general questions
   - A strict retriever with a lower threshold for specific queries
   - Different retrievers for different document subsets (in Dossier catalogs)

## Practical Examples

### Standard RAG Retriever Example

```json
{
  "name": "General Knowledge Retriever",
  "type": "STANDARD_RAG",
  "configuration": {
    "es_k": 8,
    "es_similarity_threshold": 0.3
  }
}
```
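
A definition like the one above could be read into a typed object that falls back to the documented defaults when a setting is omitted. The sketch below is one possible approach, not Evie's loader: `StandardRAGConfig` and `load_standard_rag` are hypothetical names, and the `es_k >= 1` check is an assumed sanity rule rather than a documented constraint.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardRAGConfig:
    """Defaults taken from this page: es_k = 8, threshold = 0.3."""
    es_k: int = 8
    es_similarity_threshold: float = 0.3


def load_standard_rag(definition: str) -> tuple[str, StandardRAGConfig]:
    """Parse a retriever definition and fill in defaults for missing settings."""
    data = json.loads(definition)
    if data.get("type") != "STANDARD_RAG":
        raise ValueError(f"unexpected retriever type: {data.get('type')!r}")
    # Unknown configuration keys raise TypeError, which surfaces typos early.
    config = StandardRAGConfig(**data.get("configuration", {}))
    if config.es_k < 1:  # assumed sanity check, not a documented rule
        raise ValueError("es_k must be at least 1")
    return data["name"], config
```

With this sketch, a definition that only sets `es_k` still ends up with `es_similarity_threshold` at 0.3.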

### Future Dossier Retriever Example

```json
{
  "name": "Quarterly Reports 2024",
  "type": "DOSSIER_RAG",
  "configuration": {
    "es_k": 8,
    "es_similarity_threshold": 0.3,
    "tag_conditions": {
      "document_type": "quarterly_report",
      "year": 2024
    }
  }
}
```
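
Since the Dossier Retriever is not yet available, how `tag_conditions` will be evaluated is unknown. One plausible reading is that a document qualifies only when every listed tag matches exactly. The sketch below illustrates that assumption; `matches_tags`, `filter_documents`, and the `tags` field on each document are hypothetical.

```python
from typing import Any


def matches_tags(document_tags: dict[str, Any], tag_conditions: dict[str, Any]) -> bool:
    """Assumed semantics: every condition must equal the document's tag value (logical AND)."""
    return all(document_tags.get(key) == value for key, value in tag_conditions.items())


def filter_documents(documents: list[dict[str, Any]],
                     tag_conditions: dict[str, Any]) -> list[dict[str, Any]]:
    """Keep only documents whose tags satisfy all conditions."""
    return [doc for doc in documents if matches_tags(doc.get("tags", {}), tag_conditions)]


# Made-up documents for illustration.
documents = [
    {"id": 1, "tags": {"document_type": "quarterly_report", "year": 2024}},
    {"id": 2, "tags": {"document_type": "quarterly_report", "year": 2023}},
    {"id": 3, "tags": {"document_type": "memo", "year": 2024}},
]
selected = filter_documents(documents, {"document_type": "quarterly_report", "year": 2024})
print([doc["id"] for doc in selected])  # [1]
```

Under this reading, the tag filter narrows the candidate documents first and semantic search then runs on what remains.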

## Tips for Optimal Retrieval

1. **Name Retrievers Clearly**:
   Use descriptive names that indicate their purpose and configuration

2. **Monitor Performance**:
   - If answers are missing important information, consider:
     - Increasing the similarity threshold
     - Increasing the maximum results (es_k)
   - If answers contain irrelevant information, consider:
     - Decreasing the similarity threshold
     - Decreasing the maximum results (es_k)

3. **Use Multiple Retrievers**:
   Create specialized retrievers for different use cases within the same catalog
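
The monitoring advice in tip 2 can be written down as a small adjustment rule. The sketch below is only an illustration: `Tuning` and `adjust` are hypothetical names, and the step sizes (0.05 for the threshold, 2 for `es_k`) are arbitrary assumptions, not recommended values.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Tuning:
    es_k: int = 8
    es_similarity_threshold: float = 0.3


def adjust(config: Tuning, symptom: str,
           threshold_step: float = 0.05, k_step: int = 2) -> Tuning:
    """Return a new configuration nudged in the direction the tips suggest."""
    if symptom == "missing":      # answers lack important information
        return replace(
            config,
            es_k=config.es_k + k_step,
            es_similarity_threshold=round(config.es_similarity_threshold + threshold_step, 2),
        )
    if symptom == "irrelevant":   # answers contain irrelevant information
        return replace(
            config,
            es_k=max(1, config.es_k - k_step),
            es_similarity_threshold=max(0.0, round(config.es_similarity_threshold - threshold_step, 2)),
        )
    raise ValueError(f"unknown symptom: {symptom!r}")
```

In practice it is usually easier to change one setting at a time and re-test, so that the effect of each change can be observed separately.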