---
id: retrievers
title: Understanding Retrievers
description: Learn how retrievers find and extract relevant information from your documents
sidebar_label: Retrievers
sidebar_position: 3
---

# Understanding Retrievers

## Overview

Retrievers are essential components in Evie's Library that help find and extract relevant information from your documents.
Think of retrievers as intelligent search engines that understand the meaning behind your questions and find the most
relevant content from your stored documents.

```mermaid
classDiagram
    class Catalog {
        +id: Integer
        +name: String
        +description: Text
        +type: String
        +min_chunk_size: Integer
        +max_chunk_size: Integer
        +user_metadata: JSON
    }

    class Retriever {
        +id: Integer
        +name: String
        +description: Text
        +catalog_id: Integer
        +type: String
        +tuning: Boolean
        +configuration: JSON
        +arguments: JSON
    }

    class StandardRAGRetriever {
        +configuration
        es_k: Integer
        es_similarity_threshold: Float
        +arguments
        query: String
    }

    class DossierRetriever {
        +configuration
        es_k: Integer
        es_similarity_threshold: Float
        tag_conditions: JSON
        +arguments
        query: String
    }

    Catalog "1" -- "*" Retriever : has
    Retriever <|-- StandardRAGRetriever
    Retriever <|-- DossierRetriever

    note for StandardRAGRetriever "Default similarity threshold: 0.3<br>Default es_k: 8"
    note for DossierRetriever "Coming soon<br>Specialized for Dossier catalogs"
```
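To make the class diagram concrete, here is a minimal sketch of the same entities as plain Python dataclasses. It is illustrative only and is not Evie's actual data model. The field names come from the diagram. The helper `retrievers_for` is hypothetical and only expresses the "one catalog has many retrievers" relation.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Catalog:
    id: int
    name: str
    description: str
    type: str
    min_chunk_size: int
    max_chunk_size: int
    user_metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Retriever:
    id: int
    name: str
    description: str
    catalog_id: int  # every retriever belongs to exactly one catalog
    type: str  # e.g. "STANDARD_RAG" or "DOSSIER_RAG"
    tuning: bool = False  # flag from the diagram; its effect is not described on this page
    configuration: dict[str, Any] = field(default_factory=dict)  # fixed settings, e.g. es_k
    arguments: dict[str, Any] = field(default_factory=dict)  # per-call input, e.g. {"query": ...}


def retrievers_for(catalog: Catalog, retrievers: list[Retriever]) -> list[Retriever]:
    """Hypothetical helper: all retrievers attached to a catalog (the 1-to-many 'has' relation)."""
    return [r for r in retrievers if r.catalog_id == catalog.id]
```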

## Key Concepts

### What is a Retriever?

A retriever is responsible for:
- Understanding the meaning of your questions
- Searching through document chunks in your catalog
- Finding the most relevant information based on semantic similarity
- Providing context for Evie's responses

```mermaid
flowchart LR
    A[User Question] --> B[Retriever]
    B --> C[Document Chunks]
    C --> D[Most Relevant Information]
    D --> E[Evie's Response]

    style A fill:#9c2d66,stroke:#333,stroke-width:2px
    style B fill:#423372,stroke:#333,stroke-width:2px
    style C fill:#423372,stroke:#333,stroke-width:2px
    style D fill:#423372,stroke:#333,stroke-width:2px
    style E fill:#9c2d66,stroke:#333,stroke-width:2px
```

### How Retrievers Work

When you ask Evie a question, the retriever:
1. Analyzes your question to understand its meaning
2. Compares it with stored document chunks
3. Assigns similarity scores to each chunk
4. Returns up to `es_k` of the best-matching chunks whose scores pass the similarity threshold
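The four steps above can be sketched in a few lines of Python. This is a simplified illustration, not Evie's implementation. Step 1 (turning the question into an embedding vector) is assumed to have happened already, so the sketch starts from pre-computed vectors. It also assumes that the score is a cosine *distance* and that `es_similarity_threshold` acts as a maximum distance, which matches the behavior described on this page (a lower threshold is stricter).

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """1 - cosine similarity: 0.0 for vectors pointing the same way, up to 2.0 for opposite ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms  # assumes neither vector is all zeros


def retrieve(
    query_vec: list[float],
    chunks: dict[str, list[float]],
    es_k: int = 8,
    es_similarity_threshold: float = 0.3,
) -> list[tuple[str, float]]:
    """Return up to es_k (chunk_id, distance) pairs, closest first."""
    # Steps 2-3: compare the question with every stored chunk and score it.
    scored = [(cid, cosine_distance(query_vec, vec)) for cid, vec in chunks.items()]
    # Step 4a: keep only chunks within the threshold (assumed to be a maximum distance).
    kept = [(cid, d) for cid, d in scored if d <= es_similarity_threshold]
    # Step 4b: closest first, capped at es_k results.
    kept.sort(key=lambda pair: pair[1])
    return kept[:es_k]
```

For example, with chunks `{"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}` and the query vector `[1.0, 0.0]`, the default settings return `a` (distance 0.0) and `c` (distance about 0.29), while `b` (distance 1.0) falls outside the threshold.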

## Types of Retrievers

### Standard RAG Retriever

The Standard RAG (Retrieval-Augmented Generation) Retriever is the default option suitable for most use cases. It
searches through all documents in a catalog to find relevant information.

Configuration options include:
- **Maximum Results (`es_k`)**: The maximum number of document chunks to retrieve (default: 8)
- **Similarity Threshold (`es_similarity_threshold`)**: Determines how closely a chunk must match your question to be
  returned (default: 0.3)
  - Lower threshold = stricter matching (fewer, closer matches)
  - Higher threshold = more permissive matching (more, looser matches)
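The following sketch shows how a Standard RAG configuration could be read with these defaults filled in. The class name `StandardRAGConfig` and its validation rules are assumptions for illustration, not part of Evie. The only values taken from this page are the two keys and their defaults.

```python
from dataclasses import dataclass
from typing import Any

DEFAULT_ES_K = 8
DEFAULT_THRESHOLD = 0.3


@dataclass(frozen=True)
class StandardRAGConfig:
    """Hypothetical typed view of a Standard RAG retriever's configuration."""

    es_k: int = DEFAULT_ES_K
    es_similarity_threshold: float = DEFAULT_THRESHOLD

    def __post_init__(self) -> None:
        # Assumed sanity checks: at least one result and a non-negative threshold.
        if self.es_k < 1:
            raise ValueError("es_k must be at least 1")
        if self.es_similarity_threshold < 0:
            raise ValueError("es_similarity_threshold must be non-negative")

    @classmethod
    def from_json(cls, configuration: dict[str, Any]) -> "StandardRAGConfig":
        """Build a config from stored JSON, falling back to the defaults for missing keys."""
        return cls(
            es_k=int(configuration.get("es_k", DEFAULT_ES_K)),
            es_similarity_threshold=float(
                configuration.get("es_similarity_threshold", DEFAULT_THRESHOLD)
            ),
        )
```

Filling in missing keys means that a retriever saved with only `{"es_k": 5}` still gets the 0.3 threshold.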

### Dossier Retriever (Coming Soon)

A specialized retriever for Dossier catalogs that will allow:
- Filtering by document tags
- Creating specific "viewpoints" based on tag combinations
- Combining semantic search with tag-based filtering
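The Dossier Retriever is not available yet, so the following is only a hypothetical sketch of how tag filtering could be combined with semantic search. It assumes the following:

- `tag_conditions` is an exact-match dictionary, as in the example configuration later on this page.
- Each chunk carries a `tags` dictionary.
- Each chunk's distance to the query has already been computed.

The functions `matches_tags` and `dossier_retrieve` are illustrative names, not a real API.

```python
from typing import Any


def matches_tags(tags: dict[str, Any], tag_conditions: dict[str, Any]) -> bool:
    """True if every condition key is present in the chunk's tags with exactly that value."""
    return all(key in tags and tags[key] == value for key, value in tag_conditions.items())


def dossier_retrieve(
    chunks: list[dict[str, Any]],
    tag_conditions: dict[str, Any],
    es_k: int = 8,
    es_similarity_threshold: float = 0.3,
) -> list[str]:
    """chunks: dicts with 'id', 'tags' and a pre-computed 'distance' to the query."""
    # 1. The tag filter narrows the search to one "viewpoint" of the dossier.
    in_view = [c for c in chunks if matches_tags(c["tags"], tag_conditions)]
    # 2. Semantic filter and ranking (threshold assumed to be a maximum distance).
    close = [c for c in in_view if c["distance"] <= es_similarity_threshold]
    close.sort(key=lambda c: c["distance"])
    return [c["id"] for c in close[:es_k]]
```

Filtering by tags before ranking means a chunk with a very close match but the wrong tags (for example, a 2023 report in a 2024 viewpoint) is never returned.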

## Setting Up Retrievers

### Creating a New Retriever

To create a retriever:
1. Enter the general details, such as the name and description
2. Select the target catalog
3. Choose the retriever type
4. Save the retriever. You can then set its type-specific configuration
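The two-phase flow above (create first, configure after saving) can be modeled as in the following sketch. It is hypothetical and is not the Evie interface. The per-type key lists are taken from the JSON examples on this page, and the Dossier keys are provisional. `create_retriever`, `configure` and `ALLOWED_KEYS` are illustrative names.

```python
from typing import Any

# Hypothetical mapping of retriever type -> configuration keys it accepts.
ALLOWED_KEYS: dict[str, set[str]] = {
    "STANDARD_RAG": {"es_k", "es_similarity_threshold"},
    "DOSSIER_RAG": {"es_k", "es_similarity_threshold", "tag_conditions"},
}


def create_retriever(name: str, description: str, catalog_id: int, type_: str) -> dict[str, Any]:
    """Steps 1-3: general details, target catalog and type. No configuration yet."""
    if type_ not in ALLOWED_KEYS:
        raise ValueError(f"unknown retriever type: {type_}")
    return {
        "name": name,
        "description": description,
        "catalog_id": catalog_id,
        "type": type_,
        "configuration": {},
    }


def configure(retriever: dict[str, Any], configuration: dict[str, Any]) -> None:
    """Step 4: after saving, set the configuration that the chosen type accepts."""
    unknown = set(configuration) - ALLOWED_KEYS[retriever["type"]]
    if unknown:
        raise ValueError(f"not valid for {retriever['type']}: {sorted(unknown)}")
    retriever["configuration"] = dict(configuration)
```

The type is fixed when the retriever is created, so `configure` can reject keys that do not apply, such as `tag_conditions` on a Standard RAG retriever.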

### Configuration Best Practices

1. **Similarity Threshold Tuning**:
   - Start with the default threshold of 0.3
   - If you receive too much (or irrelevant) information, lower the threshold
   - If you receive too little information, raise the threshold

2. **Multiple Retrievers**:
   You can create multiple retrievers for the same catalog to serve different purposes. For example:
   - A broad retriever with a higher threshold for general questions
   - A strict retriever with a lower threshold for specific queries
   - Different retrievers for different document subsets (in Dossier catalogs)
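To see how two retrievers on the same catalog can behave differently, the sketch below applies a broad and a strict profile to the same set of chunk distances. The `RetrieverProfile` class and the values 12/0.4 and 4/0.2 are illustrative assumptions, not recommendations. As elsewhere in these sketches, the threshold is treated as a maximum distance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetrieverProfile:
    """Hypothetical stand-in for one retriever's settings."""

    name: str
    es_k: int
    es_similarity_threshold: float

    def select(self, distances: dict[str, float]) -> list[str]:
        """Chunk ids within the threshold, closest first, capped at es_k."""
        close = sorted(
            (d, cid) for cid, d in distances.items() if d <= self.es_similarity_threshold
        )
        return [cid for _, cid in close[: self.es_k]]


# Two retrievers over the same catalog, tuned for different purposes.
broad = RetrieverProfile("Broad: general questions", es_k=12, es_similarity_threshold=0.4)
strict = RetrieverProfile("Strict: specific queries", es_k=4, es_similarity_threshold=0.2)
```

Given distances `{"a": 0.05, "b": 0.15, "c": 0.25, "d": 0.35, "e": 0.45}`, the broad profile returns `a` through `d`, while the strict profile returns only `a` and `b`.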

## Practical Examples

### Standard RAG Retriever Example

```json
{
  "name": "General Knowledge Retriever",
  "type": "STANDARD_RAG",
  "configuration": {
    "es_k": 8,
    "es_similarity_threshold": 0.3
  }
}
```

### Future Dossier Retriever Example

```json
{
  "name": "Quarterly Reports 2024",
  "type": "DOSSIER_RAG",
  "configuration": {
    "es_k": 8,
    "es_similarity_threshold": 0.3,
    "tag_conditions": {
      "document_type": "quarterly_report",
      "year": 2024
    }
  }
}
```

## Tips for Optimal Retrieval

1. **Name Retrievers Clearly**:
   Use descriptive names that indicate their purpose and configuration

2. **Monitor Performance**:
   - If answers are missing important information, consider:
     - Increasing the similarity threshold
     - Increasing the maximum results (es_k)
   - If answers contain irrelevant information, consider:
     - Decreasing the similarity threshold
     - Decreasing the maximum results (es_k)

3. **Use Multiple Retrievers**:
   Create specialized retrievers for different use cases within the same catalog
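When applying the **Monitor Performance** tip above, it helps to know *which* setting excluded a chunk you expected to see. The following hypothetical helper, `why_missing`, works from pre-computed chunk distances. It assumes the threshold is a maximum distance, consistent with this page's advice to raise it when information is missing. It only covers the "missing information" case.

```python
def why_missing(
    distances: dict[str, float],
    chunk_id: str,
    es_k: int = 8,
    es_similarity_threshold: float = 0.3,
) -> str:
    """Explain whether a chunk would be returned and, if not, which setting excluded it."""
    d = distances[chunk_id]
    if d > es_similarity_threshold:
        return (
            f"excluded by threshold: distance {d:.2f} > {es_similarity_threshold}; "
            "consider raising the threshold"
        )
    # Rank among chunks that pass the threshold (1 = closest; ties counted in this chunk's favor).
    rank = 1 + sum(
        1
        for other, od in distances.items()
        if other != chunk_id and od <= es_similarity_threshold and od < d
    )
    if rank > es_k:
        return f"excluded by es_k: ranked {rank} but only {es_k} returned; consider raising es_k"
    return f"returned at rank {rank}"
```

This check separates the two remedies the tip suggests: a chunk rejected by the threshold will not appear no matter how large `es_k` is, and a chunk that passes the threshold but ranks below `es_k` needs a larger `es_k`, not a looser threshold.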