| id | title | description | sidebar_label | sidebar_position |
|---|---|---|---|---|
| retrievers | Understanding Retrievers | Learn how retrievers find and extract relevant information from your documents | Retrievers | 3 |
Understanding Retrievers
Overview
Retrievers are essential components in Evie's Library that help find and extract relevant information from your documents. Think of retrievers as intelligent search engines that understand the meaning behind your questions and find the most relevant content from your stored documents.
classDiagram
class Catalog {
+id: Integer
+name: String
+description: Text
+type: String
+min_chunk_size: Integer
+max_chunk_size: Integer
+user_metadata: JSON
}
class Retriever {
+id: Integer
+name: String
+description: Text
+catalog_id: Integer
+type: String
+tuning: Boolean
+configuration: JSON
+arguments: JSON
}
class StandardRAGRetriever {
+configuration
es_k: Integer
es_similarity_threshold: Float
+arguments
query: String
}
class DossierRetriever {
+configuration
es_k: Integer
es_similarity_threshold: Float
tag_conditions: JSON
+arguments
query: String
}
Catalog "1" -- "*" Retriever : has
Retriever <|-- StandardRAGRetriever
Retriever <|-- DossierRetriever
note for StandardRAGRetriever "Default similarity threshold: 0.3<br>Default es_k: 8"
note for DossierRetriever "Coming soon<br>Specialized for Dossier catalogs"
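The retriever model in the diagram can be mirrored in code. The following is a minimal sketch, assuming plain Python dataclasses as a stand-in for whatever model layer Evie's Library actually uses. The field names come from the diagram; the Python types, the helper `default_standard_rag_configuration`, and the choice of `"STANDARD_RAG"` as the default type are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


def default_standard_rag_configuration() -> Dict[str, Any]:
    # Defaults taken from the diagram note: es_k = 8, threshold = 0.3.
    # (Hypothetical helper, not part of Evie's Library.)
    return {"es_k": 8, "es_similarity_threshold": 0.3}


@dataclass
class Retriever:
    # Field names follow the class diagram; the Python types are assumptions.
    id: int
    name: str
    description: str
    catalog_id: int
    type: str = "STANDARD_RAG"
    tuning: bool = False
    configuration: Dict[str, Any] = field(
        default_factory=default_standard_rag_configuration
    )
    arguments: Dict[str, Any] = field(default_factory=dict)


retriever = Retriever(
    id=1,
    name="General Knowledge Retriever",
    description="Searches the whole catalog",
    catalog_id=42,
)
print(retriever.configuration)
```

Using `default_factory` gives each retriever its own configuration dictionary, so changing one retriever's settings does not affect another.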
Key Concepts
What is a Retriever?
A retriever is responsible for:
- Understanding the meaning of your questions
- Searching through document chunks in your catalog
- Finding the most relevant information based on semantic similarity
- Providing context for Evie's responses
flowchart LR
A[User Question] --> B[Retriever]
B --> C[Document Chunks]
C --> D[Most Relevant Information]
D --> E[Evie's Response]
style A fill:#9c2d66,stroke:#333,stroke-width:2px
style B fill:#423372,stroke:#333,stroke-width:2px
style C fill:#423372,stroke:#333,stroke-width:2px
style D fill:#423372,stroke:#333,stroke-width:2px
style E fill:#9c2d66,stroke:#333,stroke-width:2px
How Retrievers Work
When you ask Evie a question, the retriever:
- Analyzes your question to understand its meaning
- Compares it with stored document chunks
- Assigns similarity scores to each chunk
- Returns the most relevant chunks based on configuration settings
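The scoring and ranking steps above can be sketched in a few lines. This is an illustration only: it assumes chunks and questions are represented as embedding vectors and compared with cosine similarity, which is a common choice but is not confirmed by this page. The function names and the toy three-dimensional vectors are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def rank_chunks(
    question_vec: List[float], chunk_vecs: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """Score every chunk against the question, most similar first."""
    scored = [
        (chunk_id, cosine_similarity(question_vec, vec))
        for chunk_id, vec in chunk_vecs.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy "embeddings" (assumption: real ones have hundreds of dimensions).
question = [1.0, 0.0, 0.0]
chunks = {
    "chunk-a": [0.9, 0.1, 0.0],
    "chunk-b": [0.0, 1.0, 0.0],
    "chunk-c": [0.5, 0.5, 0.0],
}
ranked = rank_chunks(question, chunks)
for chunk_id, score in ranked:
    print(chunk_id, round(score, 3))
```

Here `chunk-a` points in almost the same direction as the question and ranks first, while `chunk-b` is orthogonal to it and ranks last.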
Types of Retrievers
Standard RAG Retriever
The Standard RAG (Retrieval-Augmented Generation) Retriever is the default option and suits most use cases. It searches through all documents in a catalog to find relevant information.
Configuration options include:
- Maximum Results (es_k): Controls how many document chunks to retrieve (default: 8)
- Similarity Threshold (es_similarity_threshold): Determines how closely chunks must match your question (default: 0.3)
  - Lower threshold = stricter matching (fewer chunks returned)
  - Higher threshold = more permissive matching (more chunks returned)
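To see how the two options interact, consider the sketch below. This page describes a lower threshold as stricter, which is the behavior you would get if the threshold were the largest distance a chunk may have from the question. That reading is an assumption made for this example, not a confirmed detail of how Evie scores chunks. The function `select_chunks` and the distance values are hypothetical.

```python
from typing import List, Tuple


def select_chunks(
    scored: List[Tuple[str, float]],
    es_k: int = 8,
    es_similarity_threshold: float = 0.3,
) -> List[Tuple[str, float]]:
    """Keep at most es_k chunks whose distance is within the threshold.

    ASSUMPTION: each score is a distance (0.0 = identical meaning), so a
    lower threshold admits fewer chunks, matching "lower = stricter".
    """
    kept = [(cid, dist) for cid, dist in scored if dist <= es_similarity_threshold]
    kept.sort(key=lambda pair: pair[1])  # closest chunks first
    return kept[:es_k]


distances = [
    ("chunk-a", 0.05),
    ("chunk-b", 0.25),
    ("chunk-c", 0.40),
    ("chunk-d", 0.28),
]

default_hits = select_chunks(distances)  # defaults: es_k=8, threshold=0.3
strict_hits = select_chunks(distances, es_similarity_threshold=0.1)
loose_hits = select_chunks(distances, es_k=2, es_similarity_threshold=0.5)

print([cid for cid, _ in default_hits])
print([cid for cid, _ in strict_hits])
print([cid for cid, _ in loose_hits])
```

With the defaults, three of the four chunks pass. Lowering the threshold to 0.1 leaves only the closest chunk, and raising it to 0.5 admits all four, after which `es_k=2` caps the result at the two closest.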
Dossier Retriever (Coming Soon)
A specialized retriever for Dossier catalogs that will allow:
- Filtering by document tags
- Creating specific "viewpoints" based on tag combinations
- Combining semantic search with tag-based filtering
Setting Up Retrievers
Creating a New Retriever
To create a retriever:
- Enter standard values such as name and description
- Select the target catalog
- Choose the retriever type
- Save the retriever, then set the type-specific configuration
Configuration Best Practices
- Similarity Threshold Tuning:
  - Start with the default 0.3 threshold
  - If receiving too much information: Lower the threshold
  - If receiving too little information: Raise the threshold
- Multiple Retrievers: You can create multiple retrievers for the same catalog to serve different purposes. For example:
  - A broad retriever with a higher threshold for general questions
  - A strict retriever with a lower threshold for specific queries
  - Different retrievers for different document subsets (in Dossier catalogs)
Practical Examples
Standard RAG Retriever Example
{
"name": "General Knowledge Retriever",
"type": "STANDARD_RAG",
"configuration": {
"es_k": 8,
"es_similarity_threshold": 0.3
}
}
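A definition like the one above can be checked before it is used. The sketch below, written under stated assumptions, parses the JSON, fills in the documented defaults, and rejects obviously invalid values. The function `load_standard_rag` is hypothetical, and the accepted ranges (a positive `es_k`, a threshold between 0 and 1) are assumptions rather than documented limits.

```python
import json
from typing import Any, Dict

# Documented defaults for the Standard RAG Retriever.
DEFAULTS: Dict[str, Any] = {"es_k": 8, "es_similarity_threshold": 0.3}


def load_standard_rag(raw: str) -> Dict[str, Any]:
    """Parse a retriever definition and apply defaults (hypothetical helper)."""
    data = json.loads(raw)
    if data.get("type") != "STANDARD_RAG":
        raise ValueError(f"unexpected retriever type: {data.get('type')!r}")

    # Values given in the definition override the defaults.
    config = {**DEFAULTS, **data.get("configuration", {})}

    # ASSUMPTION: these ranges are plausible, not documented, limits.
    if not isinstance(config["es_k"], int) or config["es_k"] < 1:
        raise ValueError("es_k must be a positive integer")
    if not 0.0 <= config["es_similarity_threshold"] <= 1.0:
        raise ValueError("es_similarity_threshold must be between 0 and 1")

    data["configuration"] = config
    return data


definition = """
{
  "name": "General Knowledge Retriever",
  "type": "STANDARD_RAG",
  "configuration": {"es_k": 5}
}
"""
loaded = load_standard_rag(definition)
print(loaded["configuration"])
```

Because the definition sets only `es_k`, the similarity threshold falls back to the default of 0.3.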
Future Dossier Retriever Example
{
"name": "Quarterly Reports 2024",
"type": "DOSSIER_RAG",
"configuration": {
"es_k": 8,
"es_similarity_threshold": 0.3,
"tag_conditions": {
"document_type": "quarterly_report",
"year": 2024
}
}
}
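Since the Dossier Retriever is not yet available, its exact matching rules are unknown. As a rough illustration of what tag-based filtering could look like, the sketch below assumes that every entry in `tag_conditions` must equal the corresponding document tag (AND semantics). The function `matches_tags` and the sample documents are hypothetical.

```python
from typing import Any, Dict, List


def matches_tags(document_tags: Dict[str, Any], tag_conditions: Dict[str, Any]) -> bool:
    """True when every condition equals the document's tag value.

    ASSUMPTION: conditions are combined with AND and compared by equality.
    """
    return all(
        document_tags.get(key) == expected
        for key, expected in tag_conditions.items()
    )


documents: List[Dict[str, Any]] = [
    {"id": "q1-2024", "tags": {"document_type": "quarterly_report", "year": 2024}},
    {"id": "q4-2023", "tags": {"document_type": "quarterly_report", "year": 2023}},
    {"id": "memo-2024", "tags": {"document_type": "memo", "year": 2024}},
]

# Same conditions as in the example configuration above.
conditions = {"document_type": "quarterly_report", "year": 2024}

selected = [doc["id"] for doc in documents if matches_tags(doc["tags"], conditions)]
print(selected)
```

Only documents that pass this filter would then go on to the semantic search step, which is how tag filtering and similarity search could be combined.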
Tips for Optimal Retrieval
- Name Retrievers Clearly: Use descriptive names that indicate their purpose and configuration
- Monitor Performance:
  - If answers are missing important information, consider:
    - Increasing the similarity threshold
    - Increasing the maximum results (es_k)
  - If answers contain irrelevant information, consider:
    - Decreasing the similarity threshold
    - Decreasing the maximum results (es_k)
- Use Multiple Retrievers: Create specialized retrievers for different use cases within the same catalog
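The monitoring advice above can be expressed as a small helper that proposes a new configuration for a given symptom. This is a sketch only: the function `suggest_adjustment`, the symptom labels, and the step sizes (0.05 for the threshold, 2 for `es_k`) are assumptions chosen for illustration.

```python
from typing import Dict, Union

Config = Dict[str, Union[int, float]]


def suggest_adjustment(
    config: Config,
    symptom: str,
    threshold_step: float = 0.05,
    k_step: int = 2,
) -> Config:
    """Return a copy of config nudged in the direction the tips recommend."""
    adjusted = dict(config)
    threshold = config["es_similarity_threshold"]
    if symptom == "missing_information":
        # Too little context: raise the threshold and retrieve more chunks.
        adjusted["es_similarity_threshold"] = round(threshold + threshold_step, 2)
        adjusted["es_k"] = config["es_k"] + k_step
    elif symptom == "irrelevant_information":
        # Too much noise: lower the threshold and retrieve fewer chunks.
        adjusted["es_similarity_threshold"] = round(max(0.0, threshold - threshold_step), 2)
        adjusted["es_k"] = max(1, config["es_k"] - k_step)
    else:
        raise ValueError(f"unknown symptom: {symptom!r}")
    return adjusted


current = {"es_k": 8, "es_similarity_threshold": 0.3}
broader = suggest_adjustment(current, "missing_information")
stricter = suggest_adjustment(current, "irrelevant_information")
print(broader)
print(stricter)
```

The helper changes both settings at once for brevity. In practice, changing one setting at a time makes it easier to see which adjustment improved the answers.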