---
id: retrievers
title: Understanding Retrievers
description: Learn how retrievers find and extract relevant information from your documents
sidebar_label: Retrievers
sidebar_position: 3
---

# Understanding Retrievers

## Overview

Retrievers are essential components in Evie's Library that help find and extract relevant information from your documents.
Think of retrievers as intelligent search engines that understand the meaning behind your questions and find the most
relevant content from your stored documents.

```mermaid
classDiagram
    class Catalog {
        +id: Integer
        +name: String
        +description: Text
        +type: String
        +min_chunk_size: Integer
        +max_chunk_size: Integer
        +user_metadata: JSON
    }

    class Retriever {
        +id: Integer
        +name: String
        +description: Text
        +catalog_id: Integer
        +type: String
        +tuning: Boolean
        +configuration: JSON
        +arguments: JSON
    }

    class StandardRAGRetriever {
        +configuration
        es_k: Integer
        es_similarity_threshold: Float
        +arguments
        query: String
    }

    class DossierRetriever {
        +configuration
        es_k: Integer
        es_similarity_threshold: Float
        tag_conditions: JSON
        +arguments
        query: String
    }

    Catalog "1" -- "*" Retriever : has
    Retriever <|-- StandardRAGRetriever
    Retriever <|-- DossierRetriever

    note for StandardRAGRetriever "Default similarity threshold: 0.3<br>Default es_k: 8"
    note for DossierRetriever "Coming soon<br>Specialized for Dossier catalogs"
```

## Key Concepts

### What is a Retriever?

A retriever is responsible for:
- Understanding the meaning of your questions
- Searching through document chunks in your catalog
- Finding the most relevant information based on semantic similarity
- Providing context for Evie's responses

```mermaid
flowchart LR
    A[User Question] --> B[Retriever]
    B --> C[Document Chunks]
    C --> D[Most Relevant Information]
    D --> E[Evie's Response]

    style A fill:#9c2d66,stroke:#333,stroke-width:2px
    style B fill:#423372,stroke:#333,stroke-width:2px
    style C fill:#423372,stroke:#333,stroke-width:2px
    style D fill:#423372,stroke:#333,stroke-width:2px
    style E fill:#9c2d66,stroke:#333,stroke-width:2px
```

### How Retrievers Work

When you ask Evie a question, the retriever:
1. Analyzes your question to understand its meaning
2. Compares it with stored document chunks
3. Assigns similarity scores to each chunk
4. Returns the most relevant chunks based on configuration settings
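
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Evie's actual implementation: the `Chunk` class, the `retrieve` function, and the toy two-dimensional embeddings are all hypothetical, and step 1 (turning the question into an embedding) is assumed to have happened already. The sketch also assumes that the threshold is an upper bound on cosine distance (1 minus cosine similarity), which is consistent with the "lower threshold = stricter matching" guidance on this page but is not confirmed by it.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for a stored document chunk."""
    text: str
    embedding: list[float]  # assumed to be pre-computed by an embedding model


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding: list[float], chunks: list[Chunk],
             es_k: int = 8, es_similarity_threshold: float = 0.3) -> list[tuple[float, Chunk]]:
    # Steps 2 and 3: compare the question with every chunk and score it.
    scored = [(cosine_similarity(query_embedding, c.embedding), c) for c in chunks]
    # Step 4: keep chunks within the assumed distance threshold ...
    kept = [(s, c) for s, c in scored if 1.0 - s <= es_similarity_threshold]
    # ... and return the closest ones first, capped at es_k results.
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:es_k]


# Toy data: three chunks with made-up two-dimensional embeddings.
chunks = [
    Chunk("Refund policy", [1.0, 0.0]),
    Chunk("Shipping times", [0.8, 0.6]),
    Chunk("Office address", [0.0, 1.0]),
]
for score, chunk in retrieve([1.0, 0.0], chunks):
    print(f"{score:.2f}  {chunk.text}")
```

With these toy vectors, the first two chunks fall within the default threshold and the third is dropped.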

## Types of Retrievers

### Standard RAG Retriever

The Standard RAG (Retrieval-Augmented Generation) Retriever is the default option and suits most use cases. It
searches through all documents in a catalog to find relevant information.

Configuration options include:
- **Maximum Results (es_k)**: Controls how many document chunks to retrieve (default: 8)
- **Similarity Threshold (es_similarity_threshold)**: Determines how closely chunks must match your question (default: 0.3)
    - Lower threshold = stricter matching
    - Higher threshold = more permissive matching

### Dossier Retriever (Coming Soon)

A specialized retriever for Dossier catalogs that will allow:
- Filtering by document tags
- Creating specific "viewpoints" based on tag combinations
- Combining semantic search with tag-based filtering

## Setting Up Retrievers

### Creating a New Retriever

To create a retriever:
1. Enter standard values such as name and description
2. Select the target catalog
3. Choose the retriever type
4. Save the retriever; the type-specific configuration becomes available once it has been saved

### Configuration Best Practices

1. **Similarity Threshold Tuning**:
    - Start with the default 0.3 threshold
    - If you receive too much information, lower the threshold
    - If you receive too little information, raise the threshold

2. **Multiple Retrievers**:
    You can create multiple retrievers for the same catalog to serve different purposes. For example:
    - A broad retriever with a higher threshold for general questions
    - A strict retriever with a lower threshold for specific queries
    - Different retrievers for different document subsets (in Dossier catalogs)
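
To make the tuning advice concrete, the sketch below applies three threshold settings to the same set of candidate chunks. It is illustrative only: the `select` helper, the chunk names, and the distance values are invented, and it assumes the threshold is an upper bound on a distance measure (smaller = closer), which is how this page describes its effect.

```python
def select(distances: dict[str, float], es_k: int, es_similarity_threshold: float) -> list[str]:
    """Return up to es_k chunk ids whose distance is within the threshold, closest first."""
    within = [(d, cid) for cid, d in distances.items() if d <= es_similarity_threshold]
    return [cid for d, cid in sorted(within)[:es_k]]


# Hypothetical distances between one question and four chunks (smaller = closer).
distances = {"intro": 0.10, "pricing": 0.25, "faq": 0.40, "legal": 0.70}

strict = select(distances, es_k=8, es_similarity_threshold=0.2)   # strict retriever
default = select(distances, es_k=8, es_similarity_threshold=0.3)  # default settings
broad = select(distances, es_k=8, es_similarity_threshold=0.5)    # broad retriever

print("strict: ", strict)
print("default:", default)
print("broad:  ", broad)
```

Under these assumptions the strict retriever returns one chunk, the default two, and the broad one three, which is the trade-off to weigh when a retriever returns too much or too little context.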

## Practical Examples

### Standard RAG Retriever Example

```json
{
  "name": "General Knowledge Retriever",
  "type": "STANDARD_RAG",
  "configuration": {
    "es_k": 8,
    "es_similarity_threshold": 0.3
  }
}
```
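
If you keep retriever definitions like the one above in files or scripts, a small typed wrapper can catch mistakes before they reach Evie. The following is a sketch under stated assumptions: `StandardRAGConfig` and `parse_retriever` are hypothetical names, and the validation limits are guesses rather than rules enforced by the product. Only the field names and defaults come from this page.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class StandardRAGConfig:
    es_k: int = 8                         # default documented on this page
    es_similarity_threshold: float = 0.3  # default documented on this page

    def __post_init__(self):
        # Hypothetical sanity checks; the real service may enforce different limits.
        if self.es_k < 1:
            raise ValueError("es_k must be at least 1")
        if self.es_similarity_threshold < 0:
            raise ValueError("es_similarity_threshold must not be negative")


def parse_retriever(payload: str) -> tuple[str, StandardRAGConfig]:
    """Parse a retriever definition and return its name and typed configuration."""
    data = json.loads(payload)
    if data["type"] != "STANDARD_RAG":
        raise ValueError(f"unsupported retriever type: {data['type']}")
    # Missing configuration keys fall back to the documented defaults.
    return data["name"], StandardRAGConfig(**data.get("configuration", {}))


payload = """
{
  "name": "General Knowledge Retriever",
  "type": "STANDARD_RAG",
  "configuration": {"es_k": 8, "es_similarity_threshold": 0.3}
}
"""
name, config = parse_retriever(payload)
print(name, config)
```

Freezing the dataclass keeps a parsed configuration from being changed by accident after validation.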

### Future Dossier Retriever Example

```json
{
  "name": "Quarterly Reports 2024",
  "type": "DOSSIER_RAG",
  "configuration": {
    "es_k": 8,
    "es_similarity_threshold": 0.3,
    "tag_conditions": {
      "document_type": "quarterly_report",
      "year": 2024
    }
  }
}
```
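
Because the Dossier Retriever is not yet available, its matching rules are unknown. The sketch below shows one plausible reading of `tag_conditions`, in which a chunk qualifies only if every listed tag is present with exactly the given value. The `matches_tags` and `filter_by_tags` helpers and the shape of the chunk records are hypothetical.

```python
def matches_tags(chunk_tags: dict, tag_conditions: dict) -> bool:
    """True when the chunk carries every tag in tag_conditions with the same value."""
    return all(chunk_tags.get(key) == value for key, value in tag_conditions.items())


def filter_by_tags(chunks: list[dict], tag_conditions: dict) -> list[dict]:
    """Narrow the candidate chunks; semantic search would then run on the result."""
    return [c for c in chunks if matches_tags(c["tags"], tag_conditions)]


# Conditions taken from the example configuration above.
tag_conditions = {"document_type": "quarterly_report", "year": 2024}

# Made-up chunk records with made-up tags.
chunks = [
    {"id": "q1-2024", "tags": {"document_type": "quarterly_report", "year": 2024}},
    {"id": "q4-2023", "tags": {"document_type": "quarterly_report", "year": 2023}},
    {"id": "memo-2024", "tags": {"document_type": "memo", "year": 2024}},
]
selected = [c["id"] for c in filter_by_tags(chunks, tag_conditions)]
print(selected)
```

Only the chunk tagged as a 2024 quarterly report passes both conditions in this reading; the released feature may support richer conditions.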

## Tips for Optimal Retrieval

1. **Name Retrievers Clearly**:
    Use descriptive names that indicate their purpose and configuration

2. **Monitor Performance**:
    - If answers are missing important information, consider:
        - Increasing the similarity threshold
        - Increasing the maximum results (es_k)
    - If answers contain irrelevant information, consider:
        - Decreasing the similarity threshold
        - Decreasing the maximum results

3. **Use Multiple Retrievers**:
    Create specialized retrievers for different use cases within the same catalog