Josako 7ef62972f3 Initial commit

2025-12-11 14:43:16 +01:00

5.3 KiB

id	title	description	sidebar_label	sidebar_position
retrievers	Understanding Retrievers	Learn how retrievers find and extract relevant information from your documents	Retrievers	3

Understanding Retrievers

Overview

Retrievers are essential components in Evie's Library that help find and extract relevant information from your documents. Think of retrievers as intelligent search engines that understand the meaning behind your questions and find the most relevant content from your stored documents.

classDiagram
    class Catalog {
        +id: Integer
        +name: String
        +description: Text
        +type: String
        +min_chunk_size: Integer
        +max_chunk_size: Integer
        +user_metadata: JSON
    }
    
    class Retriever {
        +id: Integer
        +name: String
        +description: Text
        +catalog_id: Integer
        +type: String
        +tuning: Boolean
        +configuration: JSON
        +arguments: JSON
    }
    
    class StandardRAGRetriever {
        +configuration
        es_k: Integer
        es_similarity_threshold: Float
        +arguments
        query: String
    }
    
    class DossierRetriever {
        +configuration
        es_k: Integer
        es_similarity_threshold: Float
        tag_conditions: JSON
        +arguments
        query: String
    }
    
    Catalog "1" -- "*" Retriever : has
    Retriever <|-- StandardRAGRetriever
    Retriever <|-- DossierRetriever
    
    note for StandardRAGRetriever "Default similarity threshold: 0.3<br>Default es_k: 8"
    note for DossierRetriever "Coming soon<br>Specialized for Dossier catalogs"

Key Concepts

What is a Retriever?

A retriever is responsible for:

- Understanding the meaning of your questions
- Searching through document chunks in your catalog
- Finding the most relevant information based on semantic similarity
- Providing context for Evie's responses

flowchart LR
    A[User Question] --> B[Retriever]
    B --> C[Document Chunks]
    C --> D[Most Relevant Information]
    D --> E[Evie's Response]
    
    style A fill:#9c2d66,stroke:#333,stroke-width:2px
    style B fill:#423372,stroke:#333,stroke-width:2px
    style C fill:#423372,stroke:#333,stroke-width:2px
    style D fill:#423372,stroke:#333,stroke-width:2px
    style E fill:#9c2d66,stroke:#333,stroke-width:2px

How Retrievers Work

When you ask Evie a question, the retriever:

1. Analyzes your question to understand its meaning
2. Compares it with stored document chunks
3. Assigns similarity scores to each chunk
4. Returns the most relevant chunks based on configuration settings
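The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not Evie's actual implementation: the function names, the use of plain lists as embedding vectors, and the choice of cosine distance are all assumptions. In particular, the sketch treats the threshold as an upper bound on a distance-like score (1 minus cosine similarity), which is one reading consistent with "lower threshold = stricter matching" in this document; the source does not state the exact comparison used.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query_vec, chunks, es_k=8, es_similarity_threshold=0.3):
    """Hypothetical retrieval loop: score every chunk, filter, return the best es_k.

    `chunks` is a list of (text, embedding) pairs. The question is assumed to be
    embedded already (step 1); the embedding model itself is out of scope here.
    """
    scored = []
    for text, vec in chunks:
        # Steps 2-3: compare the question with the chunk and assign a score.
        distance = 1.0 - cosine_similarity(query_vec, vec)
        # Assumption: a chunk qualifies when its distance is within the threshold.
        if distance <= es_similarity_threshold:
            scored.append((distance, text))
    # Step 4: closest chunks first, capped at es_k results.
    scored.sort(key=lambda pair: pair[0])
    return [text for _, text in scored[:es_k]]

chunks = [
    ("refund policy", [1.0, 0.0]),
    ("shipping times", [0.0, 1.0]),
    ("returns and refunds", [0.9, 0.1]),
]
results = retrieve([1.0, 0.0], chunks)
```

With these toy vectors, the unrelated "shipping times" chunk falls outside the threshold and is dropped, while the two refund-related chunks are returned in order of closeness.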

Types of Retrievers

Standard RAG Retriever

The Standard RAG (Retrieval-Augmented Generation) Retriever is the default option suitable for most use cases. It searches through all documents in a catalog to find relevant information.

Configuration options include:

- Maximum Results (es_k): Controls how many document chunks to retrieve (default: 8)
- Similarity Threshold (es_similarity_threshold): Determines how closely chunks must match your question (default: 0.3)
  - Lower threshold = stricter matching, so fewer chunks are returned
  - Higher threshold = more permissive matching, so more chunks are returned
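As an illustration, the two options and their documented defaults could be modeled as a small Python dataclass. The class name `StandardRAGConfig` and the validation rules (a positive `es_k`, a non-negative threshold) are assumptions made for this sketch; the source only specifies the field names and defaults.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StandardRAGConfig:
    """Hypothetical container for the Standard RAG Retriever options."""
    es_k: int = 8                          # documented default
    es_similarity_threshold: float = 0.3   # documented default

    def __post_init__(self):
        # Assumed sanity checks; the real product may enforce different limits.
        if self.es_k < 1:
            raise ValueError("es_k must be at least 1")
        if self.es_similarity_threshold < 0:
            raise ValueError("es_similarity_threshold must not be negative")

default_config = StandardRAGConfig()
strict_config = StandardRAGConfig(es_similarity_threshold=0.2)
```

Keeping the defaults in one place makes it easy to create variants that override a single option, as `strict_config` does.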

Dossier Retriever (Coming Soon)

A specialized retriever for Dossier catalogs that will allow:

- Filtering by document tags
- Creating specific "viewpoints" based on tag combinations
- Combining semantic search with tag-based filtering
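Because the Dossier Retriever is not yet available, its behavior can only be guessed at. The following Python sketch shows one plausible way to combine tag filtering with semantic ranking. Everything here is hypothetical: the function name, the chunk layout (a dict with `text`, `tags`, and a precomputed `distance`), exact-match semantics where every tag condition must hold, and the treatment of the threshold as an upper bound on distance.

```python
def dossier_retrieve(chunks, tag_conditions, es_k=8, es_similarity_threshold=0.3):
    """Hypothetical Dossier-style retrieval: filter by tags, then rank by distance."""
    # Keep only chunks whose tags satisfy every condition (assumed AND semantics).
    candidates = [
        c for c in chunks
        if all(c["tags"].get(key) == value for key, value in tag_conditions.items())
    ]
    # Apply the same threshold and result cap as in a semantic-only search.
    within = [c for c in candidates if c["distance"] <= es_similarity_threshold]
    within.sort(key=lambda c: c["distance"])
    return [c["text"] for c in within[:es_k]]

chunks = [
    {"text": "Q1 2024 revenue", "distance": 0.10,
     "tags": {"document_type": "quarterly_report", "year": 2024}},
    {"text": "Q1 2023 revenue", "distance": 0.05,
     "tags": {"document_type": "quarterly_report", "year": 2023}},
    {"text": "2024 press release", "distance": 0.20,
     "tags": {"document_type": "press_release", "year": 2024}},
    {"text": "Q2 2024 outlook", "distance": 0.60,
     "tags": {"document_type": "quarterly_report", "year": 2024}},
]
viewpoint = {"document_type": "quarterly_report", "year": 2024}
results = dossier_retrieve(chunks, viewpoint)
```

In this toy data, the 2023 report is the closest match semantically but is excluded by the tag conditions, which is the kind of "viewpoint" the list above describes.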

Setting Up Retrievers

Creating a New Retriever

To create a retriever:

1. Enter the standard values, such as name and description
2. Select the target catalog
3. Choose the retriever type
4. Save the retriever; you can then set the type-specific configuration

Configuration Best Practices

- Similarity Threshold Tuning:
  - Start with the default threshold of 0.3
  - If you receive too much information, lower the threshold
  - If you receive too little information, raise the threshold
- Multiple Retrievers: You can create multiple retrievers for the same catalog to serve different purposes. For example:
  - A broad retriever with a higher threshold for general questions
  - A strict retriever with a lower threshold for specific queries
  - Different retrievers for different document subsets (in Dossier catalogs)
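To make the tuning advice concrete, the Python sketch below applies a "strict" and a "broad" configuration to the same set of chunk scores. The threshold values 0.2 and 0.4, the score list, and the helper name `select` are invented for illustration, and the scores are assumed to be distance-like (smaller means a closer match), in line with the tuning direction described above.

```python
# Hypothetical pair of retrievers on one catalog, differing only in threshold.
RETRIEVERS = {
    "strict": {"es_k": 8, "es_similarity_threshold": 0.2},
    "broad": {"es_k": 8, "es_similarity_threshold": 0.4},
}

def select(distances, config):
    """Return the scores that pass the threshold, closest first, capped at es_k."""
    kept = sorted(d for d in distances if d <= config["es_similarity_threshold"])
    return kept[: config["es_k"]]

# Assumed distance-like scores for five chunks against one question.
distances = [0.05, 0.12, 0.28, 0.35, 0.50]

strict_hits = select(distances, RETRIEVERS["strict"])
broad_hits = select(distances, RETRIEVERS["broad"])
```

Here the strict retriever keeps two chunks and the broad one keeps four, so raising the threshold is the lever to pull when answers lack context.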

Practical Examples

Standard RAG Retriever Example

{
  "name": "General Knowledge Retriever",
  "type": "STANDARD_RAG",
  "configuration": {
    "es_k": 8,
    "es_similarity_threshold": 0.3
  }
}
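A definition like the one above could be read and completed with defaults as in the following Python sketch. The loader name `load_standard_rag` and its checks are assumptions for illustration; the source does not describe how Evie parses or validates retriever definitions.

```python
import json

# Documented defaults for the Standard RAG Retriever.
DEFAULTS = {"es_k": 8, "es_similarity_threshold": 0.3}

def load_standard_rag(raw):
    """Parse a retriever definition and fill in any missing configuration values."""
    data = json.loads(raw)
    if data.get("type") != "STANDARD_RAG":
        raise ValueError("expected a STANDARD_RAG retriever definition")
    # Values given in the definition override the defaults.
    config = {**DEFAULTS, **data.get("configuration", {})}
    unknown = set(config) - set(DEFAULTS)
    if unknown:
        # Assumed behavior: reject keys this retriever type does not define.
        raise ValueError(f"unknown configuration keys: {sorted(unknown)}")
    return {"name": data["name"], "type": data["type"], "configuration": config}

raw = '{"name": "General Knowledge Retriever", "type": "STANDARD_RAG", "configuration": {"es_k": 5}}'
retriever = load_standard_rag(raw)
```

In this usage, only `es_k` is overridden, so the similarity threshold falls back to its default of 0.3.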

Future Dossier Retriever Example

{
  "name": "Quarterly Reports 2024",
  "type": "DOSSIER_RAG",
  "configuration": {
    "es_k": 8,
    "es_similarity_threshold": 0.3,
    "tag_conditions": {
      "document_type": "quarterly_report",
      "year": 2024
    }
  }
}

Tips for Optimal Retrieval

- Name Retrievers Clearly: Use descriptive names that indicate their purpose and configuration
- Monitor Performance:
  - If answers are missing important information, consider:
    - Increasing the similarity threshold
    - Increasing the maximum results (es_k)
  - If answers contain irrelevant information, consider:
    - Decreasing the similarity threshold
    - Decreasing the maximum results (es_k)
- Use Multiple Retrievers: Create specialized retrievers for different use cases within the same catalog
