25 Commits

Author SHA1 Message Date
Josako
344ea26ecc - Security improvements to Docker images (Docker Scout advise) 2024-11-27 12:27:28 +01:00
Josako
98cb4e4f2f - Created a new eveai_chat plugin to support the new dynamic possibilities of the Specialists. Currently only supports standard Rag retrievers (i.e. no extra arguments). 2024-11-27 12:26:49 +01:00
Josako
07d89d204f - Created a new eveai_chat plugin to support the new dynamic possibilities of the Specialists. Currently only supports standard Rag retrievers (i.e. no extra arguments). 2024-11-26 13:35:29 +01:00
Josako
7702a6dfcc - Modernized authentication with the introduction of TenantProject
- Created a base mail template
- Adapt and improve document API to usage of catalogs and processors
- Adapt eveai_sync to new authentication mechanism and usage of catalogs and processors
2024-11-21 17:24:33 +01:00
Josako
4c009949b3 - Changes to support SpecialistID being passed iso CatalogID
- Removed error that stopped sync
2024-11-15 13:13:45 +01:00
Josako
aa4ac3ec7c - Changes to support SpecialistID being passed iso CatalogID
- Removed error that stopped sync
2024-11-15 13:13:33 +01:00
Josako
1807435339 - Introduction of dynamic Retrievers & Specialists
- Introduction of dynamic Processors
- Introduction of caching system
- Introduction of a better template manager
- Adaptation of ModelVariables to support dynamic Processors / Retrievers / Specialists
- Start adaptation of chat client
2024-11-15 10:00:53 +01:00
Josako
55a8a95f79 - Finalisation of the Specialist model, forms and views 2024-11-04 11:22:40 +01:00
Josako
503ea7965d - Temporary checkin to branch for the rest of the introduction of experts 2024-11-03 16:18:14 +01:00
Josako
88f4db1178 - Organise retrievers 2024-11-01 11:19:55 +01:00
Josako
2df291ea91 - Organise retrievers 2024-11-01 11:19:34 +01:00
Josako
5841525b4c - When no explicit path is given in the browser, we automatically get redirected to the admin interface (eveai_app)
- Tuning moved to Retriever iso in the configuration, as this is an attribute that should be available for all types of Retrievers
2024-10-31 08:32:02 +01:00
Josako
532073d38e - Add dynamic fields to DocumentVersion in case the Catalog requires it. 2024-10-30 13:52:18 +01:00
Josako
43547287b1 - Refining & Enhancing dynamic fields
- Creating a specialized Form class for handling dynamic fields
- Refinement of HTML-macros to handle dynamic fields
- Introduction of dynamic fields for Catalogs
2024-10-29 09:17:44 +01:00
Josako
aa358df28e - Allowing for multiple types of Catalogs
- Introduction of retrievers
- Ensuring processing information is collected from Catalog iso Tenant
- Introduction of a generic Form class to enable dynamic fields based on a configuration
- Realisation of Retriever functionality to support dynamic fields
2024-10-25 14:11:47 +02:00
Josako
30fec27488 - Release script added to tag in both git and docker 2024-10-21 07:45:06 +02:00
Josako
5e77b478dd - Release script added to tag in both git and docker 2024-10-17 11:22:18 +02:00
Josako
6f71259822 - Changelog update 2024-10-17 10:35:51 +02:00
Josako
74cc7ae95e - Adapt Sync Wordpress Component to Catalog introduction
- Small bug fixes
2024-10-17 10:31:13 +02:00
Josako
7f12c8b355 - Remove obsolete fields from Tenant model (Catalog introduction) 2024-10-16 13:59:57 +02:00
Josako
6069f5f7e5 - Catalog functionality integrated into document and document_version views
- small bugfixes and improvements
2024-10-16 13:09:19 +02:00
Josako
3e644f1652 - Add Catalog Functionality 2024-10-15 18:14:57 +02:00
Josako
3316a8bc47 - Small changes to show when upgrades are finished 2024-10-14 16:40:56 +02:00
Josako
270479c77d - Add Catalog Concept to Document Domain
- Create Catalog views
- Modify document stack creation
2024-10-14 13:56:23 +02:00
Josako
0f4558d775 - Small fix in interaction view, as it still refered to file_name 2024-10-11 18:14:35 +02:00
198 changed files with 10634 additions and 3026 deletions

2
.gitignore vendored
View File

@@ -43,3 +43,5 @@ scripts/.DS_Store
scripts/__pycache__/run_eveai_app.cpython-312.pyc
/eveai_repo.txt
*repo.txt
/docker/eveai_logs/
/integrations/Wordpress/eveai_sync.zip

View File

@@ -4,8 +4,7 @@ eveai_beat/
eveai_chat/
eveai_chat_workers/
eveai_entitlements/
eveai_workers/
instance/
integrations/
integrations/Wordpress/eveai-chat
nginx/
scripts/

View File

@@ -1,11 +1,10 @@
docker/
eveai_api/
eveai_app/
eveai_beat/
eveai_chat_workers/
eveai_entitlements/
eveai_workers/
instance/
integrations/
integrations/Wordpress/eveai_sync
nginx/
scripts/

View File

@@ -25,6 +25,87 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
### Security
- In case of vulnerabilities.
## [2.0.0-alfa]
### Added
- Introduction of dynamic Retrievers & Specialists
- Introduction of dynamic Processors
- Introduction of caching system
- Introduction of a better template manager
### Changed
- For changes in existing functionality.
### Deprecated
- For soon-to-be removed features.
### Removed
- For now removed features.
### Fixed
- Set default language when registering Documents or URLs.
### Security
- In case of vulnerabilities.
## [1.0.14-alfa]
### Added
- New release script added to tag images with release number
- Allow the addition of multiple types of Catalogs
- Generic functionality to enable dynamic fields
- Addition of Retrievers to allow for smart collection of information in Catalogs
- Add dynamic fields to Catalog / Retriever / DocumentVersion
### Changed
- Processing parameters defined at Catalog level iso Tenant level
- Reroute 'blank' paths to 'admin'
### Deprecated
- For soon-to-be removed features.
### Removed
- For now removed features.
### Fixed
- Set default language when registering Documents or URLs.
### Security
- In case of vulnerabilities.
## [1.0.13-alfa]
### Added
- Finished Catalog introduction
- Reinitialization of WordPress site for syncing
### Changed
- Modification of WordPress Sync Component
- Cleanup of attributes in Tenant
### Fixed
- Overall bugfixes as result from the Catalog introduction
## [1.0.12-alfa]
### Added
- Added Catalog functionality
### Changed
- For changes in existing functionality.
### Deprecated
- For soon-to-be removed features.
### Removed
- For now removed features.
### Fixed
- Set default language when registering Documents or URLs.
### Security
- In case of vulnerabilities.
## [1.0.11-alfa]
### Added
@@ -37,7 +118,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- repopack can now split for different components
### Fixed
- Various fixes as consequece of changing file_location / file_name ==> bucket_name / object_name
- Various fixes as consequence of changing file_location / file_name ==> bucket_name / object_name
- Celery Routing / Queuing updated
## [1.0.10-alfa]

View File

@@ -12,6 +12,8 @@ from flask_wtf import CSRFProtect
from flask_restx import Api
from prometheus_flask_exporter import PrometheusMetrics
from .langchain.templates.template_manager import TemplateManager
from .utils.cache.eveai_cache_manager import EveAICacheManager
from .utils.simple_encryption import SimpleEncryption
from .utils.minio_utils import MinioClient
@@ -32,3 +34,5 @@ api_rest = Api()
simple_encryption = SimpleEncryption()
minio_client = MinioClient()
metrics = PrometheusMetrics.for_app_factory()
template_manager = TemplateManager()
cache_manager = EveAICacheManager()

View File

View File

@@ -1,52 +0,0 @@
from langchain_core.retrievers import BaseRetriever
from sqlalchemy import asc
from sqlalchemy.exc import SQLAlchemyError
from pydantic import Field, BaseModel, PrivateAttr
from typing import Any, Dict
from flask import current_app
from common.extensions import db
from common.models.interaction import ChatSession, Interaction
from common.utils.model_utils import ModelVariables
class EveAIHistoryRetriever(BaseRetriever, BaseModel):
_model_variables: ModelVariables = PrivateAttr()
_session_id: str = PrivateAttr()
def __init__(self, model_variables: ModelVariables, session_id: str):
super().__init__()
self._model_variables = model_variables
self._session_id = session_id
@property
def model_variables(self) -> ModelVariables:
return self._model_variables
@property
def session_id(self) -> str:
return self._session_id
def _get_relevant_documents(self, query: str):
current_app.logger.debug(f'Retrieving history of interactions for query: {query}')
try:
query_obj = (
db.session.query(Interaction)
.join(ChatSession, Interaction.chat_session_id == ChatSession.id)
.filter(ChatSession.session_id == self.session_id)
.order_by(asc(Interaction.id))
)
interactions = query_obj.all()
result = []
for interaction in interactions:
result.append(f'HUMAN:\n{interaction.detailed_question}\n\nAI: \n{interaction.answer}\n\n')
except SQLAlchemyError as e:
current_app.logger.error(f'Error retrieving history of interactions: {e}')
db.session.rollback()
return []
return result

View File

@@ -1,138 +0,0 @@
from langchain_core.retrievers import BaseRetriever
from sqlalchemy import func, and_, or_, desc
from sqlalchemy.exc import SQLAlchemyError
from pydantic import BaseModel, Field, PrivateAttr
from typing import Any, Dict
from flask import current_app
from common.extensions import db
from common.models.document import Document, DocumentVersion
from common.utils.datetime_utils import get_date_in_timezone
from common.utils.model_utils import ModelVariables
class EveAIRetriever(BaseRetriever, BaseModel):
_model_variables: ModelVariables = PrivateAttr()
_tenant_info: Dict[str, Any] = PrivateAttr()
def __init__(self, model_variables: ModelVariables, tenant_info: Dict[str, Any]):
super().__init__()
current_app.logger.debug(f'Model variables type: {type(model_variables)}')
self._model_variables = model_variables
self._tenant_info = tenant_info
@property
def model_variables(self) -> ModelVariables:
return self._model_variables
@property
def tenant_info(self) -> Dict[str, Any]:
return self._tenant_info
def _get_relevant_documents(self, query: str):
current_app.logger.debug(f'Retrieving relevant documents for query: {query}')
query_embedding = self._get_query_embedding(query)
current_app.logger.debug(f'Model Variables Private: {type(self._model_variables)}')
current_app.logger.debug(f'Model Variables Property: {type(self.model_variables)}')
db_class = self.model_variables['embedding_db_model']
similarity_threshold = self.model_variables['similarity_threshold']
k = self.model_variables['k']
if self.tenant_info['rag_tuning']:
try:
current_date = get_date_in_timezone(self.tenant_info['timezone'])
current_app.rag_tuning_logger.debug(f'Current date: {current_date}\n')
# Debug query to show similarity for all valid documents (without chunk text)
debug_query = (
db.session.query(
Document.id.label('document_id'),
DocumentVersion.id.label('version_id'),
db_class.id.label('embedding_id'),
(1 - db_class.embedding.cosine_distance(query_embedding)).label('similarity')
)
.join(DocumentVersion, db_class.doc_vers_id == DocumentVersion.id)
.join(Document, DocumentVersion.doc_id == Document.id)
.filter(
or_(Document.valid_from.is_(None), func.date(Document.valid_from) <= current_date),
or_(Document.valid_to.is_(None), func.date(Document.valid_to) >= current_date)
)
.order_by(desc('similarity'))
)
debug_results = debug_query.all()
current_app.logger.debug("Debug: Similarity for all valid documents:")
for row in debug_results:
current_app.rag_tuning_logger.debug(f"Doc ID: {row.document_id}, "
f"Version ID: {row.version_id}, "
f"Embedding ID: {row.embedding_id}, "
f"Similarity: {row.similarity}")
current_app.rag_tuning_logger.debug(f'---------------------------------------\n')
except SQLAlchemyError as e:
current_app.logger.error(f'Error generating overview: {e}')
db.session.rollback()
if self.tenant_info['rag_tuning']:
current_app.rag_tuning_logger.debug(f'Parameters for Retrieval of documents: \n')
current_app.rag_tuning_logger.debug(f'Similarity Threshold: {similarity_threshold}\n')
current_app.rag_tuning_logger.debug(f'K: {k}\n')
current_app.rag_tuning_logger.debug(f'---------------------------------------\n')
try:
current_date = get_date_in_timezone(self.tenant_info['timezone'])
# Subquery to find the latest version of each document
subquery = (
db.session.query(
DocumentVersion.doc_id,
func.max(DocumentVersion.id).label('latest_version_id')
)
.group_by(DocumentVersion.doc_id)
.subquery()
)
# Main query to filter embeddings
query_obj = (
db.session.query(db_class,
(1 - db_class.embedding.cosine_distance(query_embedding)).label('similarity'))
.join(DocumentVersion, db_class.doc_vers_id == DocumentVersion.id)
.join(Document, DocumentVersion.doc_id == Document.id)
.join(subquery, DocumentVersion.id == subquery.c.latest_version_id)
.filter(
or_(Document.valid_from.is_(None), func.date(Document.valid_from) <= current_date),
or_(Document.valid_to.is_(None), func.date(Document.valid_to) >= current_date),
(1 - db_class.embedding.cosine_distance(query_embedding)) > similarity_threshold
)
.order_by(desc('similarity'))
.limit(k)
)
if self.tenant_info['rag_tuning']:
current_app.rag_tuning_logger.debug(f'Query executed for Retrieval of documents: \n')
current_app.rag_tuning_logger.debug(f'{query_obj.statement}\n')
current_app.rag_tuning_logger.debug(f'---------------------------------------\n')
res = query_obj.all()
if self.tenant_info['rag_tuning']:
current_app.rag_tuning_logger.debug(f'Retrieved {len(res)} relevant documents \n')
current_app.rag_tuning_logger.debug(f'Data retrieved: \n')
current_app.rag_tuning_logger.debug(f'{res}\n')
current_app.rag_tuning_logger.debug(f'---------------------------------------\n')
result = []
for doc in res:
if self.tenant_info['rag_tuning']:
current_app.rag_tuning_logger.debug(f'Document ID: {doc[0].id} - Distance: {doc[1]}\n')
current_app.rag_tuning_logger.debug(f'Chunk: \n {doc[0].chunk}\n\n')
result.append(f'SOURCE: {doc[0].id}\n\n{doc[0].chunk}\n\n')
except SQLAlchemyError as e:
current_app.logger.error(f'Error retrieving relevant documents: {e}')
db.session.rollback()
return []
return result
def _get_query_embedding(self, query: str):
embedding_model = self.model_variables['embedding_model']
query_embedding = embedding_model.embed_query(query)
return query_embedding

View File

@@ -0,0 +1,23 @@
# Output Schema Management - common/langchain/outputs/base.py
from typing import Dict, Type, Any
from pydantic import BaseModel
class BaseSpecialistOutput(BaseModel):
"""Base class for all specialist outputs"""
pass
class OutputRegistry:
"""Registry for specialist output schemas"""
_schemas: Dict[str, Type[BaseSpecialistOutput]] = {}
@classmethod
def register(cls, specialist_type: str, schema_class: Type[BaseSpecialistOutput]):
cls._schemas[specialist_type] = schema_class
@classmethod
def get_schema(cls, specialist_type: str) -> Type[BaseSpecialistOutput]:
if specialist_type not in cls._schemas:
raise ValueError(f"No output schema registered for {specialist_type}")
return cls._schemas[specialist_type]

View File

@@ -0,0 +1,22 @@
# RAG Specialist Output - common/langchain/outputs/rag.py
from typing import List
from pydantic import Field
from .base import BaseSpecialistOutput
class RAGOutput(BaseSpecialistOutput):
"""Output schema for RAG specialist"""
"""Default docstring - to be replaced with actual prompt"""
answer: str = Field(
...,
description="The answer to the user question, based on the given sources",
)
citations: List[int] = Field(
...,
description="The integer IDs of the SPECIFIC sources that were used to generate the answer"
)
insufficient_info: bool = Field(
False, # Default value is set to False
description="A boolean indicating whether given sources were sufficient or not to generate the answer"
)

View File

@@ -0,0 +1,154 @@
import os
import yaml
from typing import Dict, Optional, Any
from packaging import version
from dataclasses import dataclass
from flask import current_app, Flask
from common.utils.os_utils import get_project_root
@dataclass
class PromptTemplate:
"""Represents a versioned prompt template"""
content: str
version: str
metadata: Dict[str, Any]
class TemplateManager:
"""Manages versioned prompt templates"""
def __init__(self):
self.templates_dir = None
self._templates = None
self.app = None
def init_app(self, app: Flask) -> None:
# Initialize template manager
base_dir = "/app"
self.templates_dir = os.path.join(base_dir, 'config', 'prompts')
app.logger.debug(f'Loading templates from {self.templates_dir}')
self.app = app
self._templates = self._load_templates()
# Log available templates for each supported model
for llm in app.config['SUPPORTED_LLMS']:
try:
available_templates = self.list_templates(llm)
app.logger.info(f"Loaded templates for {llm}: {available_templates}")
except ValueError:
app.logger.warning(f"No templates found for {llm}")
def _load_templates(self) -> Dict[str, Dict[str, Dict[str, PromptTemplate]]]:
"""
Load all template versions from the templates directory.
Structure: {provider.model -> {template_name -> {version -> template}}}
Directory structure:
prompts/
├── provider/
│ └── model/
│ └── template_name/
│ └── version.yaml
"""
templates = {}
# Iterate through providers (anthropic, openai)
for provider in os.listdir(self.templates_dir):
provider_path = os.path.join(self.templates_dir, provider)
if not os.path.isdir(provider_path):
continue
# Iterate through models (claude-3, gpt-4o)
for model in os.listdir(provider_path):
model_path = os.path.join(provider_path, model)
if not os.path.isdir(model_path):
continue
provider_model = f"{provider}.{model}"
templates[provider_model] = {}
# Iterate through template types (rag, summary, etc.)
for template_name in os.listdir(model_path):
template_path = os.path.join(model_path, template_name)
if not os.path.isdir(template_path):
continue
template_versions = {}
# Load all version files for this template
for version_file in os.listdir(template_path):
if not version_file.endswith('.yaml'):
continue
version_str = version_file[:-5] # Remove .yaml
if not self._is_valid_version(version_str):
current_app.logger.warning(
f"Invalid version format for {template_name}: {version_str}")
continue
try:
with open(os.path.join(template_path, version_file)) as f:
template_data = yaml.safe_load(f)
# Verify required fields
if not template_data.get('content'):
raise ValueError("Template content is required")
template_versions[version_str] = PromptTemplate(
content=template_data['content'],
version=version_str,
metadata=template_data.get('metadata', {})
)
except Exception as e:
current_app.logger.error(
f"Error loading template {template_name} version {version_str}: {e}")
continue
if template_versions:
templates[provider_model][template_name] = template_versions
return templates
def _is_valid_version(self, version_str: str) -> bool:
"""Validate semantic versioning string"""
try:
version.parse(version_str)
return True
except version.InvalidVersion:
return False
def get_template(self,
provider_model: str,
template_name: str,
template_version: Optional[str] = None) -> PromptTemplate:
"""
Get a specific template version. If version not specified,
returns the latest version.
"""
if provider_model not in self._templates:
raise ValueError(f"Unknown provider.model: {provider_model}")
if template_name not in self._templates[provider_model]:
raise ValueError(f"Unknown template: {template_name}")
versions = self._templates[provider_model][template_name]
if template_version:
if template_version not in versions:
raise ValueError(f"Template version {template_version} not found")
return versions[template_version]
# Return latest version
latest = max(versions.keys(), key=version.parse)
return versions[latest]
def list_templates(self, provider_model: str) -> Dict[str, list]:
"""
List all available templates and their versions for a provider.model
Returns: {template_name: [version1, version2, ...]}
"""
if provider_model not in self._templates:
raise ValueError(f"Unknown provider.model: {provider_model}")
return {
template_name: sorted(versions.keys(), key=version.parse)
for template_name, versions in self._templates[provider_model].items()
}

View File

@@ -1,27 +0,0 @@
import time
from common.utils.business_event_context import current_event
def tracked_transcribe(client, *args, **kwargs):
start_time = time.time()
# Extract the file and model from kwargs if present, otherwise use defaults
file = kwargs.get('file')
model = kwargs.get('model', 'whisper-1')
duration = kwargs.pop('duration', 600)
result = client.audio.transcriptions.create(*args, **kwargs)
end_time = time.time()
# Token usage for transcriptions is actually the duration in seconds we pass, as the whisper model is priced per second transcribed
metrics = {
'total_tokens': duration,
'prompt_tokens': 0, # For transcriptions, all tokens are considered "completion"
'completion_tokens': duration,
'time_elapsed': end_time - start_time,
'interaction_type': 'ASR',
}
current_event.log_llm_metrics(metrics)
return result

View File

@@ -0,0 +1,77 @@
# common/langchain/tracked_transcription.py
from typing import Any, Optional, Dict
import time
from openai import OpenAI
from common.utils.business_event_context import current_event
class TrackedOpenAITranscription:
"""Wrapper for OpenAI transcription with metric tracking"""
def __init__(self, api_key: str, **kwargs: Any):
"""Initialize with OpenAI client settings"""
self.client = OpenAI(api_key=api_key)
self.model = kwargs.get('model', 'whisper-1')
def transcribe(self,
file: Any,
model: Optional[str] = None,
language: Optional[str] = None,
prompt: Optional[str] = None,
response_format: Optional[str] = None,
temperature: Optional[float] = None,
duration: Optional[int] = None) -> str:
"""
Transcribe audio with metrics tracking
Args:
file: Audio file to transcribe
model: Model to use (defaults to whisper-1)
language: Optional language of the audio
prompt: Optional prompt to guide transcription
response_format: Response format (json, text, etc)
temperature: Sampling temperature
duration: Duration of audio in seconds for metrics
Returns:
Transcription text
"""
start_time = time.time()
try:
# Create transcription options
options = {
"file": file,
"model": model or self.model,
}
if language:
options["language"] = language
if prompt:
options["prompt"] = prompt
if response_format:
options["response_format"] = response_format
if temperature:
options["temperature"] = temperature
response = self.client.audio.transcriptions.create(**options)
# Calculate metrics
end_time = time.time()
# Token usage for transcriptions is based on audio duration
metrics = {
'total_tokens': duration or 600, # Default to 10 minutes if duration not provided
'prompt_tokens': 0, # For transcriptions, all tokens are completion
'completion_tokens': duration or 600,
'time_elapsed': end_time - start_time,
'interaction_type': 'ASR',
}
current_event.log_llm_metrics(metrics)
# Return text from response
if isinstance(response, str):
return response
return response.text
except Exception as e:
raise Exception(f"Transcription failed: {str(e)}")

View File

@@ -2,12 +2,80 @@ from common.extensions import db
from .user import User, Tenant
from pgvector.sqlalchemy import Vector
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.dialects.postgresql import ARRAY
import sqlalchemy as sa
class Catalog(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(50), nullable=False)
description = db.Column(db.Text, nullable=True)
type = db.Column(db.String(50), nullable=False, default="STANDARD_CATALOG")
min_chunk_size = db.Column(db.Integer, nullable=True, default=2000)
max_chunk_size = db.Column(db.Integer, nullable=True, default=3000)
# Meta Data
user_metadata = db.Column(JSONB, nullable=True)
system_metadata = db.Column(JSONB, nullable=True)
configuration = db.Column(JSONB, nullable=True)
# Versioning Information
created_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now())
created_by = db.Column(db.Integer, db.ForeignKey(User.id), nullable=True)
updated_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now(), onupdate=db.func.now())
updated_by = db.Column(db.Integer, db.ForeignKey(User.id))
class Processor(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(50), nullable=False)
description = db.Column(db.Text, nullable=True)
catalog_id = db.Column(db.Integer, db.ForeignKey('catalog.id'), nullable=True)
type = db.Column(db.String(50), nullable=False)
sub_file_type = db.Column(db.String(50), nullable=True)
# Tuning enablers
tuning = db.Column(db.Boolean, nullable=True, default=False)
# Meta Data
user_metadata = db.Column(JSONB, nullable=True)
system_metadata = db.Column(JSONB, nullable=True)
configuration = db.Column(JSONB, nullable=True)
# Versioning Information
created_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now())
created_by = db.Column(db.Integer, db.ForeignKey(User.id), nullable=True)
updated_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now(), onupdate=db.func.now())
updated_by = db.Column(db.Integer, db.ForeignKey(User.id))
class Retriever(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(50), nullable=False)
description = db.Column(db.Text, nullable=True)
catalog_id = db.Column(db.Integer, db.ForeignKey('catalog.id'), nullable=True)
type = db.Column(db.String(50), nullable=False, default="STANDARD_RAG")
tuning = db.Column(db.Boolean, nullable=True, default=False)
# Meta Data
user_metadata = db.Column(JSONB, nullable=True)
system_metadata = db.Column(JSONB, nullable=True)
configuration = db.Column(JSONB, nullable=True)
arguments = db.Column(JSONB, nullable=True)
# Versioning Information
created_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now())
created_by = db.Column(db.Integer, db.ForeignKey(User.id), nullable=True)
updated_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now(), onupdate=db.func.now())
updated_by = db.Column(db.Integer, db.ForeignKey(User.id))
class Document(db.Model):
id = db.Column(db.Integer, primary_key=True)
# tenant_id = db.Column(db.Integer, db.ForeignKey(Tenant.id), nullable=False)
catalog_id = db.Column(db.Integer, db.ForeignKey(Catalog.id), nullable=True)
name = db.Column(db.String(100), nullable=False)
tenant_id = db.Column(db.Integer, db.ForeignKey(Tenant.id), nullable=False)
valid_from = db.Column(db.DateTime, nullable=True)
valid_to = db.Column(db.DateTime, nullable=True)
@@ -31,12 +99,14 @@ class DocumentVersion(db.Model):
bucket_name = db.Column(db.String(255), nullable=True)
object_name = db.Column(db.String(200), nullable=True)
file_type = db.Column(db.String(20), nullable=True)
sub_file_type = db.Column(db.String(50), nullable=True)
file_size = db.Column(db.Float, nullable=True)
language = db.Column(db.String(2), nullable=False)
user_context = db.Column(db.Text, nullable=True)
system_context = db.Column(db.Text, nullable=True)
user_metadata = db.Column(JSONB, nullable=True)
system_metadata = db.Column(JSONB, nullable=True)
catalog_properties = db.Column(JSONB, nullable=True)
# Versioning Information
created_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now())

View File

@@ -1,6 +1,8 @@
from sqlalchemy.dialects.postgresql import JSONB
from ..extensions import db
from .user import User, Tenant
from .document import Embedding
from .document import Embedding, Retriever
class ChatSession(db.Model):
@@ -18,14 +20,32 @@ class ChatSession(db.Model):
return f"<ChatSession {self.id} by {self.user_id}>"
class Specialist(db.Model):
id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String(50), nullable=False)
description = db.Column(db.Text, nullable=True)
type = db.Column(db.String(50), nullable=False, default="STANDARD_RAG")
tuning = db.Column(db.Boolean, nullable=True, default=False)
configuration = db.Column(JSONB, nullable=True)
arguments = db.Column(JSONB, nullable=True)
# Relationship to retrievers through the association table
retrievers = db.relationship('SpecialistRetriever', backref='specialist', lazy=True,
cascade="all, delete-orphan")
# Versioning Information
created_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now())
created_by = db.Column(db.Integer, db.ForeignKey(User.id), nullable=True)
updated_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now(), onupdate=db.func.now())
updated_by = db.Column(db.Integer, db.ForeignKey(User.id))
class Interaction(db.Model):
id = db.Column(db.Integer, primary_key=True)
chat_session_id = db.Column(db.Integer, db.ForeignKey(ChatSession.id), nullable=False)
question = db.Column(db.Text, nullable=False)
detailed_question = db.Column(db.Text, nullable=True)
answer = db.Column(db.Text, nullable=True)
algorithm_used = db.Column(db.String(20), nullable=True)
language = db.Column(db.String(2), nullable=False)
specialist_id = db.Column(db.Integer, db.ForeignKey(Specialist.id), nullable=True)
specialist_arguments = db.Column(JSONB, nullable=True)
specialist_results = db.Column(JSONB, nullable=True)
timezone = db.Column(db.String(30), nullable=True)
appreciation = db.Column(db.Integer, nullable=True)
@@ -44,3 +64,10 @@ class Interaction(db.Model):
class InteractionEmbedding(db.Model):
interaction_id = db.Column(db.Integer, db.ForeignKey(Interaction.id, ondelete='CASCADE'), primary_key=True)
embedding_id = db.Column(db.Integer, db.ForeignKey(Embedding.id, ondelete='CASCADE'), primary_key=True)
class SpecialistRetriever(db.Model):
specialist_id = db.Column(db.Integer, db.ForeignKey(Specialist.id, ondelete='CASCADE'), primary_key=True)
retriever_id = db.Column(db.Integer, db.ForeignKey(Retriever.id, ondelete='CASCADE'), primary_key=True)
retriever = db.relationship("Retriever", backref="specialist_retrievers")

View File

@@ -34,36 +34,8 @@ class Tenant(db.Model):
embedding_model = db.Column(db.String(50), nullable=True)
llm_model = db.Column(db.String(50), nullable=True)
# Embedding variables
html_tags = db.Column(ARRAY(sa.String(10)), nullable=True, default=['p', 'h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'li'])
html_end_tags = db.Column(ARRAY(sa.String(10)), nullable=True, default=['p', 'li'])
html_included_elements = db.Column(ARRAY(sa.String(50)), nullable=True)
html_excluded_elements = db.Column(ARRAY(sa.String(50)), nullable=True)
html_excluded_classes = db.Column(ARRAY(sa.String(200)), nullable=True)
min_chunk_size = db.Column(db.Integer, nullable=True, default=2000)
max_chunk_size = db.Column(db.Integer, nullable=True, default=3000)
# Embedding search variables
es_k = db.Column(db.Integer, nullable=True, default=5)
es_similarity_threshold = db.Column(db.Float, nullable=True, default=0.7)
# Chat variables
chat_RAG_temperature = db.Column(db.Float, nullable=True, default=0.3)
chat_no_RAG_temperature = db.Column(db.Float, nullable=True, default=0.5)
fallback_algorithms = db.Column(ARRAY(sa.String(50)), nullable=True)
# Licensing Information
encrypted_chat_api_key = db.Column(db.String(500), nullable=True)
encrypted_api_key = db.Column(db.String(500), nullable=True)
# Tuning enablers
embed_tuning = db.Column(db.Boolean, nullable=True, default=False)
rag_tuning = db.Column(db.Boolean, nullable=True, default=False)
# Entitlements
currency = db.Column(db.String(20), nullable=True)
usage_email = db.Column(db.String(255), nullable=True)
storage_dirty = db.Column(db.Boolean, nullable=True, default=False)
# Relations
@@ -96,22 +68,7 @@ class Tenant(db.Model):
'allowed_languages': self.allowed_languages,
'embedding_model': self.embedding_model,
'llm_model': self.llm_model,
'html_tags': self.html_tags,
'html_end_tags': self.html_end_tags,
'html_included_elements': self.html_included_elements,
'html_excluded_elements': self.html_excluded_elements,
'html_excluded_classes': self.html_excluded_classes,
'min_chunk_size': self.min_chunk_size,
'max_chunk_size': self.max_chunk_size,
'es_k': self.es_k,
'es_similarity_threshold': self.es_similarity_threshold,
'chat_RAG_temperature': self.chat_RAG_temperature,
'chat_no_RAG_temperature': self.chat_no_RAG_temperature,
'fallback_algorithms': self.fallback_algorithms,
'embed_tuning': self.embed_tuning,
'rag_tuning': self.rag_tuning,
'currency': self.currency,
'usage_email': self.usage_email,
}
@@ -153,6 +110,8 @@ class User(db.Model, UserMixin):
fs_uniquifier = db.Column(db.String(255), unique=True, nullable=False)
confirmed_at = db.Column(db.DateTime, nullable=True)
valid_to = db.Column(db.Date, nullable=True)
is_primary_contact = db.Column(db.Boolean, nullable=True, default=False)
is_financial_contact = db.Column(db.Boolean, nullable=True, default=False)
# Security Trackable Information
last_login_at = db.Column(db.DateTime, nullable=True)
@@ -193,3 +152,29 @@ class TenantDomain(db.Model):
def __repr__(self):
return f"<TenantDomain {self.id}: {self.domain}>"
class TenantProject(db.Model):
__bind_key__ = 'public'
__table_args__ = {'schema': 'public'}
id = db.Column(db.Integer, primary_key=True)
tenant_id = db.Column(db.Integer, db.ForeignKey('public.tenant.id'), nullable=False)
name = db.Column(db.String(50), nullable=False)
description = db.Column(db.Text, nullable=True)
services = db.Column(ARRAY(sa.String(50)), nullable=False)
encrypted_api_key = db.Column(db.String(500), nullable=True)
visual_api_key = db.Column(db.String(20), nullable=True)
active = db.Column(db.Boolean, nullable=False, default=True)
responsible_email = db.Column(db.String(255), nullable=True)
# Versioning Information
created_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now())
created_by = db.Column(db.Integer, db.ForeignKey('public.user.id'), nullable=True)
updated_at = db.Column(db.DateTime, nullable=False, server_default=db.func.now(), onupdate=db.func.now())
updated_by = db.Column(db.Integer, db.ForeignKey('public.user.id'))
# Relations
tenant = db.relationship('Tenant', backref='projects')
def __repr__(self):
return f"<TenantProject {self.id}: {self.name}>"

View File

@@ -4,7 +4,6 @@ from contextlib import contextmanager
from datetime import datetime
from typing import Dict, Any, Optional
from datetime import datetime as dt, timezone as tz
from portkey_ai import Portkey, Config
import logging
from .business_event_context import BusinessEventContext

0
common/utils/cache/__init__old.py vendored Normal file
View File

89
common/utils/cache/base.py vendored Normal file
View File

@@ -0,0 +1,89 @@
# common/utils/cache/base.py
from typing import Any, Dict, List, Optional, TypeVar, Generic, Type
from dataclasses import dataclass
from flask import Flask
from dogpile.cache import CacheRegion
T = TypeVar('T')
@dataclass
class CacheKey:
"""Represents a cache key with multiple components"""
components: Dict[str, Any]
def __str__(self) -> str:
return ":".join(f"{k}={v}" for k, v in sorted(self.components.items()))
class CacheInvalidationManager:
"""Manages cache invalidation subscriptions"""
def __init__(self):
self._subscribers = {}
def subscribe(self, model: str, handler: 'CacheHandler', key_fields: List[str]):
if model not in self._subscribers:
self._subscribers[model] = []
self._subscribers[model].append((handler, key_fields))
def notify_change(self, model: str, **identifiers):
if model in self._subscribers:
for handler, key_fields in self._subscribers[model]:
if all(field in identifiers for field in key_fields):
handler.invalidate_by_model(model, **identifiers)
class CacheHandler(Generic[T]):
"""Base cache handler implementation"""
def __init__(self, region: CacheRegion, prefix: str):
self.region = region
self.prefix = prefix
self._key_components = []
def configure_keys(self, *components: str):
self._key_components = components
return self
def subscribe_to_model(self, model: str, key_fields: List[str]):
invalidation_manager.subscribe(model, self, key_fields)
return self
def generate_key(self, **identifiers) -> str:
missing = set(self._key_components) - set(identifiers.keys())
if missing:
raise ValueError(f"Missing key components: {missing}")
key = CacheKey({k: identifiers[k] for k in self._key_components})
return f"{self.prefix}:{str(key)}"
def get(self, creator_func, **identifiers) -> T:
cache_key = self.generate_key(**identifiers)
def creator():
instance = creator_func(**identifiers)
return self.to_cache_data(instance)
cached_data = self.region.get_or_create(
cache_key,
creator,
should_cache_fn=self.should_cache
)
return self.from_cache_data(cached_data, **identifiers)
def invalidate(self, **identifiers):
cache_key = self.generate_key(**identifiers)
self.region.delete(cache_key)
def invalidate_by_model(self, model: str, **identifiers):
try:
self.invalidate(**identifiers)
except ValueError:
pass
# Create global invalidation manager
invalidation_manager = CacheInvalidationManager()

View File

@@ -0,0 +1,39 @@
from typing import Type
from flask import Flask
from common.utils.cache.base import CacheHandler
class EveAICacheManager:
"""Cache manager with registration capabilities"""
def __init__(self):
self._regions = {}
self._handlers = {}
def init_app(self, app: Flask):
"""Initialize cache regions"""
from common.utils.cache.regions import create_cache_regions
self._regions = create_cache_regions(app)
# Store regions in instance
for region_name, region in self._regions.items():
setattr(self, f"{region_name}_region", region)
# Initialize all registered handlers with their regions
for handler_class, region_name in self._handlers.items():
region = self._regions[region_name]
handler_instance = handler_class(region)
handler_name = getattr(handler_class, 'handler_name', None)
if handler_name:
app.logger.debug(f"{handler_name} is registered")
setattr(self, handler_name, handler_instance)
app.logger.info('Cache regions initialized: ' + ', '.join(self._regions.keys()))
def register_handler(self, handler_class: Type[CacheHandler], region: str):
"""Register a cache handler class with its region"""
if not hasattr(handler_class, 'handler_name'):
raise ValueError("Cache handler must define handler_name class attribute")
self._handlers[handler_class] = region

65
common/utils/cache/regions.py vendored Normal file
View File

@@ -0,0 +1,65 @@
# common/utils/cache/regions.py
from dogpile.cache import make_region
from urllib.parse import urlparse
import os
def get_redis_config(app):
"""
Create Redis configuration dict based on app config
Handles both authenticated and non-authenticated setups
"""
# Parse the REDIS_BASE_URI to get all components
redis_uri = urlparse(app.config['REDIS_BASE_URI'])
config = {
'host': redis_uri.hostname,
'port': int(redis_uri.port or 6379),
'db': 4, # Keep this for later use
'redis_expiration_time': 3600,
'distributed_lock': True,
'thread_local_lock': False,
}
# Add authentication if provided
if redis_uri.username and redis_uri.password:
config.update({
'username': redis_uri.username,
'password': redis_uri.password
})
return config
def create_cache_regions(app):
"""Initialize all cache regions with app config"""
redis_config = get_redis_config(app)
regions = {}
# Region for model-related caching (ModelVariables etc)
model_region = make_region(name='eveai_model').configure(
'dogpile.cache.redis',
arguments=redis_config,
replace_existing_backend=True
)
regions['eveai_model'] = model_region
# Region for eveai_chat_workers components (Specialists, Retrievers, ...)
eveai_chat_workers_region = make_region(name='eveai_chat_workers').configure(
'dogpile.cache.redis',
arguments=redis_config, # arguments={**redis_config, 'db': 4}, # Different DB
replace_existing_backend=True
)
regions['eveai_chat_workers'] = eveai_chat_workers_region
# Region for eveai_workers components (Processors, ...)
eveai_workers_region = make_region(name='eveai_workers').configure(
'dogpile.cache.redis',
arguments=redis_config, # Same config for now
replace_existing_backend=True
)
regions['eveai_workers'] = eveai_workers_region
return regions

View File

@@ -8,8 +8,6 @@ celery_app = Celery()
def init_celery(celery, app, is_beat=False):
celery_app.main = app.name
app.logger.debug(f'CELERY_BROKER_URL: {app.config["CELERY_BROKER_URL"]}')
app.logger.debug(f'CELERY_RESULT_BACKEND: {app.config["CELERY_RESULT_BACKEND"]}')
celery_config = {
'broker_url': app.config.get('CELERY_BROKER_URL', 'redis://localhost:6379/0'),

View File

@@ -0,0 +1,613 @@
from typing import Optional, List, Union, Dict, Any, Pattern
from pydantic import BaseModel, field_validator, model_validator
from typing_extensions import Annotated
import re
from datetime import datetime
import json
from textwrap import dedent
import yaml
from dataclasses import dataclass
class TaggingField(BaseModel):
"""Represents a single tagging field configuration"""
type: str
required: bool = False
description: Optional[str] = None
allowed_values: Optional[List[Any]] = None # for enum type
min_value: Optional[Union[int, float]] = None # for numeric types
max_value: Optional[Union[int, float]] = None # for numeric types
@field_validator('type', mode='before')
@classmethod
def validate_type(cls, v: str) -> str:
valid_types = ['string', 'integer', 'float', 'date', 'enum']
if v not in valid_types:
raise ValueError(f'type must be one of {valid_types}')
return v
@model_validator(mode='after')
def validate_field_constraints(self) -> 'TaggingField':
# Validate enum constraints
if self.type == 'enum':
if not self.allowed_values:
raise ValueError('allowed_values must be provided for enum type')
elif self.allowed_values is not None:
raise ValueError('allowed_values only valid for enum type')
# Validate numeric constraints
if self.type not in ('integer', 'float'):
if self.min_value is not None or self.max_value is not None:
raise ValueError('min_value/max_value only valid for numeric types')
else:
if self.min_value is not None and self.max_value is not None and self.min_value >= self.max_value:
raise ValueError('min_value must be less than max_value')
return self
class TaggingFields(BaseModel):
"""Represents a collection of tagging fields, mapped by their names"""
fields: Dict[str, TaggingField]
@classmethod
def from_dict(cls, data: Dict[str, Dict[str, Any]]) -> 'TaggingFields':
return cls(fields={
field_name: TaggingField(**field_config)
for field_name, field_config in data.items()
})
def to_dict(self) -> Dict[str, Dict[str, Any]]:
return {
field_name: field.model_dump(exclude_none=True)
for field_name, field in self.fields.items()
}
class ArgumentConstraint(BaseModel):
"""Base class for all argument constraints"""
description: Optional[str] = None
error_message: Optional[str] = None
class NumericConstraint(ArgumentConstraint):
"""Constraints for numeric values (int/float)"""
min_value: Optional[float] = None
max_value: Optional[float] = None
include_min: bool = True # True for >= min_value, False for > min_value
include_max: bool = True # True for <= max_value, False for < max_value
@model_validator(mode='after')
def validate_ranges(self) -> 'NumericConstraint':
if self.min_value is not None and self.max_value is not None:
if self.min_value > self.max_value:
raise ValueError("min_value must be less than or equal to max_value")
return self
def validate(self, value: Union[int, float]) -> bool:
if self.min_value is not None:
if self.include_min and value < self.min_value:
return False
if not self.include_min and value <= self.min_value:
return False
if self.max_value is not None:
if self.include_max and value > self.max_value:
return False
if not self.include_max and value >= self.max_value:
return False
return True
class StringConstraint(ArgumentConstraint):
"""Constraints for string values"""
min_length: Optional[int] = None
max_length: Optional[int] = None
patterns: Optional[List[str]] = None # List of regex patterns to match
pattern_match_all: bool = False # If True, string must match all patterns
forbidden_patterns: Optional[List[str]] = None # List of regex patterns that must not match
allow_empty: bool = False
@field_validator('patterns', 'forbidden_patterns')
@classmethod
def validate_patterns(cls, v: Optional[List[str]]) -> Optional[List[str]]:
if v is not None:
# Validate each pattern compiles
for pattern in v:
try:
re.compile(pattern)
except re.error as e:
raise ValueError(f"Invalid regex pattern '{pattern}': {str(e)}")
return v
def validate(self, value: str) -> bool:
if not self.allow_empty and not value:
return False
if self.min_length is not None and len(value) < self.min_length:
return False
if self.max_length is not None and len(value) > self.max_length:
return False
if self.patterns:
matches = [bool(re.search(pattern, value)) for pattern in self.patterns]
if self.pattern_match_all and not all(matches):
return False
if not self.pattern_match_all and not any(matches):
return False
if self.forbidden_patterns:
for pattern in self.forbidden_patterns:
if re.search(pattern, value):
return False
return True
class DateConstraint(ArgumentConstraint):
"""Constraints for date values"""
min_date: Optional[datetime] = None
max_date: Optional[datetime] = None
include_min: bool = True
include_max: bool = True
allowed_formats: Optional[List[str]] = None # List of allowed date formats
@model_validator(mode='after')
def validate_ranges(self) -> 'DateConstraint':
if self.min_date and self.max_date and self.min_date > self.max_date:
raise ValueError("min_date must be less than or equal to max_date")
return self
def validate(self, value: datetime) -> bool:
if self.min_date is not None:
if self.include_min and value < self.min_date:
return False
if not self.include_min and value <= self.min_date:
return False
if self.max_date is not None:
if self.include_max and value > self.max_date:
return False
if not self.include_max and value >= self.max_date:
return False
return True
class EnumConstraint(ArgumentConstraint):
"""Constraints for enum values"""
allowed_values: List[Any]
case_sensitive: bool = True # For string enums
allow_multiple: bool = False # If True, value can be a list of allowed values
min_selections: Optional[int] = None # When allow_multiple is True
max_selections: Optional[int] = None # When allow_multiple is True
@model_validator(mode='after')
def validate_selections(self) -> 'EnumConstraint':
if self.allow_multiple:
if self.min_selections is not None and self.max_selections is not None:
if self.min_selections > self.max_selections:
raise ValueError("min_selections must be less than or equal to max_selections")
if self.max_selections > len(self.allowed_values):
raise ValueError("max_selections cannot be greater than number of allowed values")
return self
def validate(self, value: Union[Any, List[Any]]) -> bool:
if self.allow_multiple:
if not isinstance(value, list):
return False
if self.min_selections is not None and len(value) < self.min_selections:
return False
if self.max_selections is not None and len(value) > self.max_selections:
return False
for v in value:
if not self._validate_single_value(v):
return False
else:
return self._validate_single_value(value)
return True
def _validate_single_value(self, value: Any) -> bool:
if isinstance(value, str) and not self.case_sensitive:
return any(str(value).lower() == str(v).lower() for v in self.allowed_values)
return value in self.allowed_values
class ArgumentDefinition(BaseModel):
"""Defines an argument with its type and constraints"""
name: str
type: str
description: Optional[str] = None
required: bool = False
default: Optional[Any] = None
constraints: Optional[Union[NumericConstraint, StringConstraint, DateConstraint, EnumConstraint]] = None
@field_validator('type')
@classmethod
def validate_type(cls, v: str) -> str:
valid_types = ['string', 'integer', 'float', 'date', 'enum']
if v not in valid_types:
raise ValueError(f'type must be one of {valid_types}')
return v
@model_validator(mode='after')
def validate_constraints(self) -> 'ArgumentDefinition':
if self.constraints:
expected_constraint_types = {
'string': StringConstraint,
'integer': NumericConstraint,
'float': NumericConstraint,
'date': DateConstraint,
'enum': EnumConstraint
}
expected_type = expected_constraint_types.get(self.type)
if not isinstance(self.constraints, expected_type):
raise ValueError(f'Constraints for type {self.type} must be of type {expected_type.__name__}')
if self.default is not None:
if not self.constraints.validate(self.default):
raise ValueError(f'Default value does not satisfy constraints for {self.name}')
return self
class ArgumentDefinitions(BaseModel):
"""Collection of argument definitions"""
arguments: Dict[str, ArgumentDefinition]
@classmethod
def from_dict(cls, data: Dict[str, Dict[str, Any]]) -> 'ArgumentDefinitions':
return cls(arguments={
arg_name: ArgumentDefinition(**arg_config)
for arg_name, arg_config in data.items()
})
def to_dict(self) -> Dict[str, Dict[str, Any]]:
return {
arg_name: arg.model_dump(exclude_none=True)
for arg_name, arg in self.arguments.items()
}
def validate_argument_values(self, values: Dict[str, Any]) -> Dict[str, str]:
"""
Validate a set of argument values against their definitions
Returns a dictionary of error messages for invalid arguments
"""
errors = {}
# Check for required arguments
for name, arg_def in self.arguments.items():
if arg_def.required and name not in values:
errors[name] = "Required argument missing"
continue
if name in values:
value = values[name]
# Validate type
try:
if arg_def.type == 'integer':
value = int(value)
elif arg_def.type == 'float':
value = float(value)
elif arg_def.type == 'date' and isinstance(value, str):
if arg_def.constraints and arg_def.constraints.allowed_formats:
for fmt in arg_def.constraints.allowed_formats:
try:
value = datetime.strptime(value, fmt)
break
except ValueError:
continue
else:
errors[
name] = f"Invalid date format. Allowed formats: {arg_def.constraints.allowed_formats}"
continue
except (ValueError, TypeError):
errors[name] = f"Invalid type. Expected {arg_def.type}"
continue
# Validate constraints
if arg_def.constraints and not arg_def.constraints.validate(value):
errors[name] = arg_def.constraints.error_message or "Value does not satisfy constraints"
return errors
@dataclass
class DocumentationFormat:
"""Constants for documentation formats"""
MARKDOWN = "markdown"
JSON = "json"
YAML = "yaml"
@dataclass
class DocumentationVersion:
"""Constants for documentation versions"""
BASIC = "basic" # Original documentation without retriever info
EXTENDED = "extended" # Including retriever documentation
def _generate_argument_constraints(field_config: Dict[str, Any]) -> List[Dict[str, Any]]:
"""Generate possible argument constraints based on field type"""
constraints = []
base_constraint = {
"description": f"Constraint for {field_config.get('description', 'field')}",
"error_message": "Optional custom error message"
}
if field_config["type"] == "integer" or field_config["type"] == "float":
constraints.append({
**base_constraint,
"type": "NumericConstraint",
"possible_constraints": {
"min_value": "number",
"max_value": "number",
"include_min": "boolean",
"include_max": "boolean"
},
"example": {
"min_value": field_config.get("min_value", 0),
"max_value": field_config.get("max_value", 100),
"include_min": True,
"include_max": True
}
})
elif field_config["type"] == "string":
constraints.append({
**base_constraint,
"type": "StringConstraint",
"possible_constraints": {
"min_length": "integer",
"max_length": "integer",
"patterns": "list[str]",
"pattern_match_all": "boolean",
"forbidden_patterns": "list[str]",
"allow_empty": "boolean"
},
"example": {
"min_length": 1,
"max_length": 100,
"patterns": ["^[A-Za-z0-9]+$"],
"pattern_match_all": False,
"forbidden_patterns": ["^test_", "_temp$"],
"allow_empty": False
}
})
elif field_config["type"] == "enum":
constraints.append({
**base_constraint,
"type": "EnumConstraint",
"possible_constraints": {
"allowed_values": f"list[{field_config.get('allowed_values', ['value1', 'value2'])}]",
"case_sensitive": "boolean",
"allow_multiple": "boolean",
"min_selections": "integer",
"max_selections": "integer"
},
"example": {
"allowed_values": field_config.get("allowed_values", ["value1", "value2"]),
"case_sensitive": True,
"allow_multiple": True,
"min_selections": 1,
"max_selections": 2
}
})
elif field_config["type"] == "date":
constraints.append({
**base_constraint,
"type": "DateConstraint",
"possible_constraints": {
"min_date": "datetime",
"max_date": "datetime",
"include_min": "boolean",
"include_max": "boolean",
"allowed_formats": "list[str]"
},
"example": {
"min_date": "2024-01-01T00:00:00",
"max_date": "2024-12-31T23:59:59",
"include_min": True,
"include_max": True,
"allowed_formats": ["%Y-%m-%d", "%Y/%m/%d"]
}
})
return constraints
def generate_field_documentation(
tagging_fields: Dict[str, Any],
format: str = "markdown",
version: str = "basic"
) -> str:
"""
Generate documentation for tagging fields configuration.
Args:
tagging_fields: Dictionary containing tagging fields configuration
format: Output format ("markdown", "json", or "yaml")
version: Documentation version ("basic" or "extended")
Returns:
str: Formatted documentation
"""
if version not in [DocumentationVersion.BASIC, DocumentationVersion.EXTENDED]:
raise ValueError(f"Unsupported documentation version: {version}")
# Normalize fields configuration
normalized_fields = {}
for field_name, field_config in tagging_fields.items():
field_doc = {
"name": field_name,
"type": field_config["type"],
"required": field_config.get("required", False),
"description": field_config.get("description", "No description provided"),
"constraints": []
}
# Only include possible arguments in extended version
if version == DocumentationVersion.EXTENDED:
field_doc["possible_arguments"] = _generate_argument_constraints(field_config)
# Add type-specific constraints
if field_config["type"] == "integer" or field_config["type"] == "float":
if "min_value" in field_config:
field_doc["constraints"].append(
f"Minimum value: {field_config['min_value']}")
if "max_value" in field_config:
field_doc["constraints"].append(
f"Maximum value: {field_config['max_value']}")
elif field_config["type"] == "string":
if "min_length" in field_config:
field_doc["constraints"].append(
f"Minimum length: {field_config['min_length']}")
if "max_length" in field_config:
field_doc["constraints"].append(
f"Maximum length: {field_config['max_length']}")
if "patterns" in field_config:
field_doc["constraints"].append(
f"Must match patterns: {', '.join(field_config['patterns'])}")
elif field_config["type"] == "enum":
if "allowed_values" in field_config:
field_doc["constraints"].append(
f"Allowed values: {', '.join(str(v) for v in field_config['allowed_values'])}")
elif field_config["type"] == "date":
if "min_date" in field_config:
field_doc["constraints"].append(
f"Minimum date: {field_config['min_date']}")
if "max_date" in field_config:
field_doc["constraints"].append(
f"Maximum date: {field_config['max_date']}")
if "allowed_formats" in field_config:
field_doc["constraints"].append(
f"Allowed formats: {', '.join(field_config['allowed_formats'])}")
normalized_fields[field_name] = field_doc
# Generate documentation in requested format
if format == DocumentationFormat.MARKDOWN:
return _generate_markdown_docs(normalized_fields, version)
elif format == DocumentationFormat.JSON:
return _generate_json_docs(normalized_fields, version)
elif format == DocumentationFormat.YAML:
return _generate_yaml_docs(normalized_fields, version)
else:
raise ValueError(f"Unsupported documentation format: {format}")
def _generate_markdown_docs(fields: Dict[str, Any], version: str) -> str:
"""Generate markdown documentation"""
docs = ["# Tagging Fields Documentation\n"]
# Add overview table
docs.append("## Fields Overview\n")
docs.append("| Field Name | Type | Required | Description |")
docs.append("|------------|------|----------|-------------|")
for field_name, field in fields.items():
docs.append(
f"| {field_name} | {field['type']} | "
f"{'Yes' if field['required'] else 'No'} | {field['description']} |"
)
# Add detailed field specifications
docs.append("\n## Detailed Field Specifications\n")
for field_name, field in fields.items():
docs.append(f"### {field_name}\n")
docs.append(f"**Type:** {field['type']}")
docs.append(f"**Required:** {'Yes' if field['required'] else 'No'}")
docs.append(f"**Description:** {field['description']}\n")
if field["constraints"]:
docs.append("**Field Constraints:**")
for constraint in field["constraints"]:
docs.append(f"- {constraint}")
docs.append("")
# Add retriever argument documentation only in extended version
if version == DocumentationVersion.EXTENDED and "possible_arguments" in field:
docs.append("**Possible Retriever Arguments:**")
for arg_constraint in field["possible_arguments"]:
docs.append(f"\n*{arg_constraint['type']}*")
docs.append(f"Description: {arg_constraint['description']}")
docs.append("\nPossible constraints:")
for const_name, const_type in arg_constraint["possible_constraints"].items():
docs.append(f"- `{const_name}`: {const_type}")
docs.append("\nExample:")
docs.append("```python")
docs.append(json.dumps(arg_constraint["example"], indent=2))
docs.append("```\n")
# Add example retriever configuration only in extended version
if version == DocumentationVersion.EXTENDED:
docs.append("\n## Example Retriever Configuration\n")
docs.append("```python")
example_config = {
"metadata_filters": {
field_name: field["possible_arguments"][0]["example"]
for field_name, field in fields.items()
if "possible_arguments" in field
}
}
docs.append(json.dumps(example_config, indent=2))
docs.append("```")
return "\n".join(docs)
def _generate_json_docs(fields: Dict[str, Any], version: str) -> str:
"""Generate JSON documentation"""
doc = {
"tagging_fields_documentation": {
"version": version,
"fields": fields
}
}
if version == DocumentationVersion.EXTENDED:
doc["tagging_fields_documentation"]["example_retriever_config"] = {
"metadata_filters": {
field_name: field["possible_arguments"][0]["example"]
for field_name, field in fields.items()
if "possible_arguments" in field
}
}
return json.dumps(doc, indent=2)
def _generate_yaml_docs(fields: Dict[str, Any], version: str) -> str:
"""Generate YAML documentation"""
doc = {
"tagging_fields_documentation": {
"version": version,
"fields": fields
}
}
if version == DocumentationVersion.EXTENDED:
doc["tagging_fields_documentation"]["example_retriever_config"] = {
"metadata_filters": {
field_name: field["possible_arguments"][0]["example"]
for field_name, field in fields.items()
if "possible_arguments" in field
}
}
return yaml.dump(doc, sort_keys=False, default_flow_style=False)

View File

@@ -1,14 +1,14 @@
from flask import request, current_app, session
from flask_jwt_extended import decode_token, verify_jwt_in_request, get_jwt_identity
from common.models.user import Tenant, TenantDomain
def get_allowed_origins(tenant_id):
session_key = f"allowed_origins_{tenant_id}"
if session_key in session:
current_app.logger.debug(f"Fetching allowed origins for tenant {tenant_id} from session")
return session[session_key]
current_app.logger.debug(f"Fetching allowed origins for tenant {tenant_id} from database")
tenant_domains = TenantDomain.query.filter_by(tenant_id=int(tenant_id)).all()
allowed_origins = [domain.domain for domain in tenant_domains]
@@ -18,51 +18,52 @@ def get_allowed_origins(tenant_id):
def cors_after_request(response, prefix):
current_app.logger.debug(f'CORS after request: {request.path}, prefix: {prefix}')
current_app.logger.debug(f'request.headers: {request.headers}')
current_app.logger.debug(f'request.args: {request.args}')
current_app.logger.debug(f'request is json?: {request.is_json}')
# Exclude health checks from checks
if request.path.startswith('/healthz') or request.path.startswith('/_healthz'):
current_app.logger.debug('Skipping CORS headers for health checks')
response.headers.add('Access-Control-Allow-Origin', '*')
response.headers.add('Access-Control-Allow-Headers', '*')
response.headers.add('Access-Control-Allow-Methods', '*')
return response
# Handle OPTIONS preflight requests
if request.method == 'OPTIONS':
response.headers.add('Access-Control-Allow-Origin', '*')
response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization,X-Tenant-ID')
response.headers.add('Access-Control-Allow-Methods', 'GET,POST,PUT,DELETE,OPTIONS')
response.headers.add('Access-Control-Allow-Credentials', 'true')
return response
tenant_id = None
allowed_origins = []
# Try to get tenant_id from JSON payload
json_data = request.get_json(silent=True)
current_app.logger.debug(f'request.get_json(silent=True): {json_data}')
if json_data and 'tenant_id' in json_data:
tenant_id = json_data['tenant_id']
# Check Socket.IO connection
if 'socket.io' in request.path:
token = request.args.get('token')
if token:
try:
decoded = decode_token(token)
tenant_id = decoded['sub']
except Exception as e:
current_app.logger.error(f'Error decoding token: {e}')
return response
else:
# Fallback to get tenant_id from query parameters or headers if JSON is not available
tenant_id = request.args.get('tenant_id') or request.args.get('tenantId') or request.headers.get('X-Tenant-ID')
current_app.logger.debug(f'Identified tenant_id: {tenant_id}')
# Regular API requests
try:
if verify_jwt_in_request(optional=True):
tenant_id = get_jwt_identity()
except Exception as e:
current_app.logger.error(f'Error verifying JWT: {e}')
return response
if tenant_id:
origin = request.headers.get('Origin')
allowed_origins = get_allowed_origins(tenant_id)
current_app.logger.debug(f'Allowed origins for tenant {tenant_id}: {allowed_origins}')
else:
current_app.logger.warning('tenant_id not found in request')
origin = request.headers.get('Origin')
current_app.logger.debug(f'Origin: {origin}')
if origin in allowed_origins:
response.headers.add('Access-Control-Allow-Origin', origin)
response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization')
response.headers.add('Access-Control-Allow-Methods', 'GET,POST,PUT,DELETE,OPTIONS')
response.headers.add('Access-Control-Allow-Credentials', 'true')
current_app.logger.debug(f'CORS headers set for origin: {origin}')
else:
current_app.logger.warning(f'Origin {origin} not allowed')
if origin in allowed_origins:
response.headers.add('Access-Control-Allow-Origin', origin)
response.headers.add('Access-Control-Allow-Headers', 'Content-Type,Authorization')
response.headers.add('Access-Control-Allow-Methods', 'GET,POST,PUT,DELETE,OPTIONS')
response.headers.add('Access-Control-Allow-Credentials', 'true')
return response

View File

@@ -36,7 +36,7 @@ def log_request_middleware(app):
@app.before_request
def log_session_state_before():
app.logger.debug(f'Session state before request: {session.items()}')
pass
# @app.after_request
# def log_response_info(response):
@@ -58,5 +58,4 @@ def log_request_middleware(app):
@app.after_request
def log_session_state_after(response):
app.logger.debug(f'Session state after request: {session.items()}')
return response

View File

@@ -3,29 +3,40 @@ from datetime import datetime as dt, timezone as tz
from sqlalchemy import desc
from sqlalchemy.exc import SQLAlchemyError
from werkzeug.utils import secure_filename
from common.models.document import Document, DocumentVersion
from common.models.document import Document, DocumentVersion, Catalog
from common.extensions import db, minio_client
from common.utils.celery_utils import current_celery
from flask import current_app
from flask_security import current_user
import requests
from urllib.parse import urlparse, unquote
from urllib.parse import urlparse, unquote, urlunparse
import os
from .eveai_exceptions import EveAIInvalidLanguageException, EveAIDoubleURLException, EveAIUnsupportedFileType
from .eveai_exceptions import (EveAIInvalidLanguageException, EveAIDoubleURLException, EveAIUnsupportedFileType,
EveAIInvalidCatalog, EveAIInvalidDocument, EveAIInvalidDocumentVersion)
from ..models.user import Tenant
def create_document_stack(api_input, file, filename, extension, tenant_id):
# Create the Document
new_doc = create_document(api_input, filename, tenant_id)
catalog_id = int(api_input.get('catalog_id'))
catalog = Catalog.query.get(catalog_id)
if not catalog:
raise EveAIInvalidCatalog(tenant_id, catalog_id)
new_doc = create_document(api_input, filename, catalog_id)
db.session.add(new_doc)
url = api_input.get('url', '')
if url != '':
url = cope_with_local_url(api_input.get('url', ''))
# Create the DocumentVersion
new_doc_vers = create_version_for_document(new_doc,
api_input.get('url', ''),
new_doc_vers = create_version_for_document(new_doc, tenant_id,
url,
api_input.get('sub_file_type', ''),
api_input.get('language', 'en'),
api_input.get('user_context', ''),
api_input.get('user_metadata'),
api_input.get('catalog_properties')
)
db.session.add(new_doc_vers)
@@ -45,7 +56,7 @@ def create_document_stack(api_input, file, filename, extension, tenant_id):
return new_doc, new_doc_vers
def create_document(form, filename, tenant_id):
def create_document(form, filename, catalog_id):
new_doc = Document()
if form['name'] == '':
new_doc.name = filename.rsplit('.', 1)[0]
@@ -56,13 +67,14 @@ def create_document(form, filename, tenant_id):
new_doc.valid_from = form['valid_from']
else:
new_doc.valid_from = dt.now(tz.utc)
new_doc.tenant_id = tenant_id
new_doc.catalog_id = catalog_id
set_logging_information(new_doc, dt.now(tz.utc))
return new_doc
def create_version_for_document(document, url, language, user_context, user_metadata):
def create_version_for_document(document, tenant_id, url, sub_file_type, language, user_context, user_metadata,
catalog_properties):
new_doc_vers = DocumentVersion()
if url != '':
new_doc_vers.url = url
@@ -78,11 +90,17 @@ def create_version_for_document(document, url, language, user_context, user_meta
if user_metadata != '' and user_metadata is not None:
new_doc_vers.user_metadata = user_metadata
if catalog_properties != '' and catalog_properties is not None:
new_doc_vers.catalog_properties = catalog_properties
if sub_file_type != '':
new_doc_vers.sub_file_type = sub_file_type
new_doc_vers.document = document
set_logging_information(new_doc_vers, dt.now(tz.utc))
mark_tenant_storage_dirty(document.tenant_id)
mark_tenant_storage_dirty(tenant_id)
return new_doc_vers
@@ -158,6 +176,8 @@ def get_extension_from_content_type(content_type):
def process_url(url, tenant_id):
url = cope_with_local_url(url)
response = requests.head(url, allow_redirects=True)
content_type = response.headers.get('Content-Type', '').split(';')[0]
@@ -189,38 +209,6 @@ def process_url(url, tenant_id):
return file_content, filename, extension
def process_multiple_urls(urls, tenant_id, api_input):
results = []
for url in urls:
try:
file_content, filename, extension = process_url(url, tenant_id)
url_input = api_input.copy()
url_input.update({
'url': url,
'name': f"{api_input['name']}-{filename}" if api_input['name'] else filename
})
new_doc, new_doc_vers = create_document_stack(url_input, file_content, filename, extension, tenant_id)
task_id = start_embedding_task(tenant_id, new_doc_vers.id)
results.append({
'url': url,
'document_id': new_doc.id,
'document_version_id': new_doc_vers.id,
'task_id': task_id,
'status': 'success'
})
except Exception as e:
current_app.logger.error(f"Error processing URL {url}: {str(e)}")
results.append({
'url': url,
'status': 'error',
'message': str(e)
})
return results
def start_embedding_task(tenant_id, doc_vers_id):
task = current_celery.send_task('create_embeddings',
args=[tenant_id, doc_vers_id,],
@@ -232,8 +220,6 @@ def start_embedding_task(tenant_id, doc_vers_id):
def validate_file_type(extension):
current_app.logger.debug(f'Validating file type {extension}')
current_app.logger.debug(f'Supported file types: {current_app.config["SUPPORTED_FILE_TYPES"]}')
if extension not in current_app.config['SUPPORTED_FILE_TYPES']:
raise EveAIUnsupportedFileType(f"Filetype {extension} is currently not supported. "
f"Supported filetypes: {', '.join(current_app.config['SUPPORTED_FILE_TYPES'])}")
@@ -256,11 +242,16 @@ def get_documents_list(page, per_page):
return pagination
def edit_document(document_id, name, valid_from, valid_to):
doc = Document.query.get_or_404(document_id)
doc.name = name
doc.valid_from = valid_from
doc.valid_to = valid_to
def edit_document(tenant_id, document_id, name, valid_from, valid_to):
doc = Document.query.get(document_id)
if not doc:
raise EveAIInvalidDocument(tenant_id, document_id)
if name:
doc.name = name
if valid_from:
doc.valid_from = valid_from
if valid_to:
doc.valid_to = valid_to
update_logging_information(doc, dt.now(tz.utc))
try:
@@ -272,9 +263,12 @@ def edit_document(document_id, name, valid_from, valid_to):
return None, str(e)
def edit_document_version(version_id, user_context):
doc_vers = DocumentVersion.query.get_or_404(version_id)
def edit_document_version(tenant_id, version_id, user_context, catalog_properties):
doc_vers = DocumentVersion.query.get(version_id)
if not doc_vers:
raise EveAIInvalidDocumentVersion(tenant_id, version_id)
doc_vers.user_context = user_context
doc_vers.catalog_properties = catalog_properties
update_logging_information(doc_vers, dt.now(tz.utc))
try:
@@ -286,19 +280,22 @@ def edit_document_version(version_id, user_context):
return None, str(e)
def refresh_document_with_info(doc_id, api_input):
doc = Document.query.get_or_404(doc_id)
def refresh_document_with_info(doc_id, tenant_id, api_input):
doc = Document.query.get(doc_id)
if not doc:
raise EveAIInvalidDocument(tenant_id, doc_id)
old_doc_vers = DocumentVersion.query.filter_by(doc_id=doc_id).order_by(desc(DocumentVersion.id)).first()
if not old_doc_vers.url:
return None, "This document has no URL. Only documents with a URL can be refreshed."
new_doc_vers = create_version_for_document(
doc,
doc, tenant_id,
old_doc_vers.url,
old_doc_vers.sub_file_type,
api_input.get('language', old_doc_vers.language),
api_input.get('user_context', old_doc_vers.user_context),
api_input.get('user_metadata', old_doc_vers.user_metadata)
api_input.get('user_metadata', old_doc_vers.user_metadata),
api_input.get('catalog_properties', old_doc_vers.catalog_properties),
)
set_logging_information(new_doc_vers, dt.now(tz.utc))
@@ -310,17 +307,18 @@ def refresh_document_with_info(doc_id, api_input):
db.session.rollback()
return None, str(e)
response = requests.head(old_doc_vers.url, allow_redirects=True)
url = cope_with_local_url(old_doc_vers.url)
response = requests.head(url, allow_redirects=True)
content_type = response.headers.get('Content-Type', '').split(';')[0]
extension = get_extension_from_content_type(content_type)
response = requests.get(old_doc_vers.url)
response = requests.get(url)
response.raise_for_status()
file_content = response.content
upload_file_for_version(new_doc_vers, file_content, extension, doc.tenant_id)
upload_file_for_version(new_doc_vers, file_content, extension, tenant_id)
task = current_celery.send_task('create_embeddings', args=[doc.tenant_id, new_doc_vers.id,], queue='embeddings')
task = current_celery.send_task('create_embeddings', args=[tenant_id, new_doc_vers.id,], queue='embeddings')
current_app.logger.info(f'Embedding creation started for document {doc_id} on version {new_doc_vers.id} '
f'with task id: {task.id}.')
@@ -328,7 +326,7 @@ def refresh_document_with_info(doc_id, api_input):
# Update the existing refresh_document function to use the new refresh_document_with_info
def refresh_document(doc_id):
def refresh_document(doc_id, tenant_id):
current_app.logger.info(f'Refreshing document {doc_id}')
doc = Document.query.get_or_404(doc_id)
old_doc_vers = DocumentVersion.query.filter_by(doc_id=doc_id).order_by(desc(DocumentVersion.id)).first()
@@ -336,14 +334,32 @@ def refresh_document(doc_id):
api_input = {
'language': old_doc_vers.language,
'user_context': old_doc_vers.user_context,
'user_metadata': old_doc_vers.user_metadata
'user_metadata': old_doc_vers.user_metadata,
'catalog_properties': old_doc_vers.catalog_properties,
}
return refresh_document_with_info(doc_id, api_input)
return refresh_document_with_info(doc_id, tenant_id, api_input)
# Function triggered when a document_version is created or updated
def mark_tenant_storage_dirty(tenant_id):
tenant = db.session.query(Tenant).filter_by(id=tenant_id).first()
tenant = db.session.query(Tenant).filter_by(id=int(tenant_id)).first()
tenant.storage_dirty = True
db.session.commit()
def cope_with_local_url(url):
current_app.logger.debug(f'Incomming URL: {url}')
parsed_url = urlparse(url)
# Check if this is an internal WordPress URL (TESTING) and rewrite it
if parsed_url.netloc in [current_app.config['EXTERNAL_WORDPRESS_BASE_URL']]:
parsed_url = parsed_url._replace(
scheme=current_app.config['WORDPRESS_PROTOCOL'],
netloc=f"{current_app.config['WORDPRESS_HOST']}:{current_app.config['WORDPRESS_PORT']}"
)
url = urlunparse(parsed_url)
current_app.logger.debug(f'Translated Wordpress URL to: {url}')
return url

View File

@@ -10,8 +10,12 @@ class EveAIException(Exception):
def to_dict(self):
rv = dict(self.payload or ())
rv['message'] = self.message
rv['error'] = self.__class__.__name__
return rv
def __str__(self):
return self.message # Return the message when the exception is converted to a string
class EveAIInvalidLanguageException(EveAIException):
"""Raised when an invalid language is provided"""
@@ -41,3 +45,83 @@ class EveAINoLicenseForTenant(EveAIException):
super().__init__(message, status_code, payload)
class EveAITenantNotFound(EveAIException):
"""Raised when a tenant is not found"""
def __init__(self, tenant_id, status_code=400, payload=None):
self.tenant_id = tenant_id
message = f"Tenant {tenant_id} not found"
super().__init__(message, status_code, payload)
class EveAITenantInvalid(EveAIException):
"""Raised when a tenant is invalid"""
def __init__(self, tenant_id, status_code=400, payload=None):
self.tenant_id = tenant_id
# Construct the message dynamically
message = f"Tenant with ID '{tenant_id}' is not valid. Please contact the System Administrator."
super().__init__(message, status_code, payload)
class EveAINoActiveLicense(EveAIException):
"""Raised when a tenant has no active licenses"""
def __init__(self, tenant_id, status_code=400, payload=None):
self.tenant_id = tenant_id
# Construct the message dynamically
message = f"Tenant with ID '{tenant_id}' has no active licenses. Please contact the System Administrator."
super().__init__(message, status_code, payload)
class EveAIInvalidCatalog(EveAIException):
"""Raised when a catalog cannot be found"""
def __init__(self, tenant_id, catalog_id, status_code=400, payload=None):
self.tenant_id = tenant_id
self.catalog_id = catalog_id
# Construct the message dynamically
message = f"Tenant with ID '{tenant_id}' has no valid catalog with ID {catalog_id}. Please contact the System Administrator."
super().__init__(message, status_code, payload)
class EveAIInvalidProcessor(EveAIException):
"""Raised when no valid processor can be found for a given Catalog ID"""
def __init__(self, tenant_id, catalog_id, file_type, status_code=400, payload=None):
self.tenant_id = tenant_id
self.catalog_id = catalog_id
self.file_type = file_type
# Construct the message dynamically
message = (f"Tenant with ID '{tenant_id}' has no valid {file_type} processor for catalog with ID {catalog_id}. "
f"Please contact the System Administrator.")
super().__init__(message, status_code, payload)
class EveAIInvalidDocument(EveAIException):
"""Raised when a tenant has no document with given ID"""
def __init__(self, tenant_id, document_id, status_code=400, payload=None):
self.tenant_id = tenant_id
self.document_id = document_id
# Construct the message dynamically
message = f"Tenant with ID '{tenant_id}' has no document with ID {document_id}."
super().__init__(message, status_code, payload)
class EveAIInvalidDocumentVersion(EveAIException):
"""Raised when a tenant has no document version with given ID"""
def __init__(self, tenant_id, document_version_id, status_code=400, payload=None):
self.tenant_id = tenant_id
self.document_version_id = document_version_id
# Construct the message dynamically
message = f"Tenant with ID '{tenant_id}' has no document version with ID {document_version_id}."
super().__init__(message, status_code, payload)
class EveAISocketInputException(EveAIException):
"""Raised when a socket call receives an invalid payload"""
def __init__(self, message, status_code=400, payload=None):
super.__init__(message, status_code, payload)

View File

@@ -24,9 +24,6 @@ def mw_before_request():
if not tenant_id:
raise Exception('Cannot switch schema for tenant: no tenant defined in session')
for role in current_user.roles:
current_app.logger.debug(f'In middleware: User {current_user.email} has role {role.name}')
# user = User.query.get(current_user.id)
if current_user.has_role('Super User') or current_user.tenant_id == tenant_id:
Database(tenant_id).switch_schema()

View File

@@ -1,236 +1,36 @@
import os
from typing import Dict, Any, Optional
import langcodes
from flask import current_app
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_core.pydantic_v1 import BaseModel, Field
from typing import List, Any, Iterator
from collections.abc import MutableMapping
from openai import OpenAI
from portkey_ai import createHeaders, PORTKEY_GATEWAY_URL
from portkey_ai.langchain.portkey_langchain_callback_handler import LangchainCallbackHandler
from common.langchain.llm_metrics_handler import LLMMetricsHandler
from common.langchain.templates.template_manager import TemplateManager
from langchain_openai import OpenAIEmbeddings, ChatOpenAI, OpenAI
from langchain_anthropic import ChatAnthropic
from flask import current_app
from datetime import datetime as dt, timezone as tz
from common.langchain.tracked_openai_embeddings import TrackedOpenAIEmbeddings
from common.langchain.tracked_transcribe import tracked_transcribe
from common.models.document import EmbeddingSmallOpenAI, EmbeddingLargeOpenAI
from common.langchain.tracked_transcription import TrackedOpenAITranscription
from common.models.user import Tenant
from common.utils.cache.base import CacheHandler
from config.model_config import MODEL_CONFIG
from common.utils.business_event_context import current_event
from common.extensions import template_manager, cache_manager
from common.models.document import EmbeddingLargeOpenAI, EmbeddingSmallOpenAI
from common.utils.eveai_exceptions import EveAITenantNotFound
class CitedAnswer(BaseModel):
"""Default docstring - to be replaced with actual prompt"""
def create_language_template(template: str, language: str) -> str:
"""
Replace language placeholder in template with specified language
answer: str = Field(
...,
description="The answer to the user question, based on the given sources",
)
citations: List[int] = Field(
...,
description="The integer IDs of the SPECIFIC sources that were used to generate the answer"
)
insufficient_info: bool = Field(
False, # Default value is set to False
description="A boolean indicating wether given sources were sufficient or not to generate the answer"
)
Args:
template: Template string with {language} placeholder
language: Language code to insert
def set_language_prompt_template(cls, language_prompt):
cls.__doc__ = language_prompt
class ModelVariables(MutableMapping):
def __init__(self, tenant: Tenant):
self.tenant = tenant
self._variables = self._initialize_variables()
self._embedding_model = None
self._llm = None
self._llm_no_rag = None
self._transcription_client = None
self._prompt_templates = {}
self._embedding_db_model = None
self.llm_metrics_handler = LLMMetricsHandler()
self._transcription_client = None
def _initialize_variables(self):
variables = {}
# We initialize the variables that are available knowing the tenant. For the other, we will apply 'lazy loading'
variables['k'] = self.tenant.es_k or 5
variables['similarity_threshold'] = self.tenant.es_similarity_threshold or 0.7
variables['RAG_temperature'] = self.tenant.chat_RAG_temperature or 0.3
variables['no_RAG_temperature'] = self.tenant.chat_no_RAG_temperature or 0.5
variables['embed_tuning'] = self.tenant.embed_tuning or False
variables['rag_tuning'] = self.tenant.rag_tuning or False
variables['rag_context'] = self.tenant.rag_context or " "
# Set HTML Chunking Variables
variables['html_tags'] = self.tenant.html_tags
variables['html_end_tags'] = self.tenant.html_end_tags
variables['html_included_elements'] = self.tenant.html_included_elements
variables['html_excluded_elements'] = self.tenant.html_excluded_elements
variables['html_excluded_classes'] = self.tenant.html_excluded_classes
# Set Chunk Size variables
variables['min_chunk_size'] = self.tenant.min_chunk_size
variables['max_chunk_size'] = self.tenant.max_chunk_size
# Set model providers
variables['embedding_provider'], variables['embedding_model'] = self.tenant.embedding_model.rsplit('.', 1)
variables['llm_provider'], variables['llm_model'] = self.tenant.llm_model.rsplit('.', 1)
variables["templates"] = current_app.config['PROMPT_TEMPLATES'][(f"{variables['llm_provider']}."
f"{variables['llm_model']}")]
current_app.logger.info(f"Loaded prompt templates: \n")
current_app.logger.info(f"{variables['templates']}")
# Set model-specific configurations
model_config = MODEL_CONFIG.get(variables['llm_provider'], {}).get(variables['llm_model'], {})
variables.update(model_config)
variables['annotation_chunk_length'] = current_app.config['ANNOTATION_TEXT_CHUNK_LENGTH'][self.tenant.llm_model]
if variables['tool_calling_supported']:
variables['cited_answer_cls'] = CitedAnswer
variables['max_compression_duration'] = current_app.config['MAX_COMPRESSION_DURATION']
variables['max_transcription_duration'] = current_app.config['MAX_TRANSCRIPTION_DURATION']
variables['compression_cpu_limit'] = current_app.config['COMPRESSION_CPU_LIMIT']
variables['compression_process_delay'] = current_app.config['COMPRESSION_PROCESS_DELAY']
return variables
@property
def embedding_model(self):
api_key = os.getenv('OPENAI_API_KEY')
model = self._variables['embedding_model']
self._embedding_model = TrackedOpenAIEmbeddings(api_key=api_key,
model=model,
)
self._embedding_db_model = EmbeddingSmallOpenAI \
if model == 'text-embedding-3-small' \
else EmbeddingLargeOpenAI
return self._embedding_model
@property
def llm(self):
api_key = self.get_api_key_for_llm()
self._llm = ChatOpenAI(api_key=api_key,
model=self._variables['llm_model'],
temperature=self._variables['RAG_temperature'],
callbacks=[self.llm_metrics_handler])
return self._llm
@property
def llm_no_rag(self):
api_key = self.get_api_key_for_llm()
self._llm_no_rag = ChatOpenAI(api_key=api_key,
model=self._variables['llm_model'],
temperature=self._variables['RAG_temperature'],
callbacks=[self.llm_metrics_handler])
return self._llm_no_rag
def get_api_key_for_llm(self):
if self._variables['llm_provider'] == 'openai':
api_key = os.getenv('OPENAI_API_KEY')
else: # self._variables['llm_provider'] == 'anthropic'
api_key = os.getenv('ANTHROPIC_API_KEY')
return api_key
@property
def transcription_client(self):
api_key = os.getenv('OPENAI_API_KEY')
self._transcription_client = OpenAI(api_key=api_key, )
self._variables['transcription_model'] = 'whisper-1'
return self._transcription_client
def transcribe(self, *args, **kwargs):
return tracked_transcribe(self._transcription_client, *args, **kwargs)
@property
def embedding_db_model(self):
if self._embedding_db_model is None:
self._embedding_db_model = self.get_embedding_db_model()
return self._embedding_db_model
def get_embedding_db_model(self):
current_app.logger.debug("In get_embedding_db_model")
if self._embedding_db_model is None:
self._embedding_db_model = EmbeddingSmallOpenAI \
if self._variables['embedding_model'] == 'text-embedding-3-small' \
else EmbeddingLargeOpenAI
current_app.logger.debug(f"Embedding DB Model: {self._embedding_db_model}")
return self._embedding_db_model
def get_prompt_template(self, template_name: str) -> str:
current_app.logger.info(f"Getting prompt template for {template_name}")
if template_name not in self._prompt_templates:
self._prompt_templates[template_name] = self._load_prompt_template(template_name)
return self._prompt_templates[template_name]
def _load_prompt_template(self, template_name: str) -> str:
# In the future, this method will make an API call to Portkey
# For now, we'll simulate it with a placeholder implementation
# You can replace this with your current prompt loading logic
return self._variables['templates'][template_name]
def __getitem__(self, key: str) -> Any:
current_app.logger.debug(f"ModelVariables: Getting {key}")
# Support older template names (suffix = _template)
if key.endswith('_template'):
key = key[:-len('_template')]
current_app.logger.debug(f"ModelVariables: Getting modified {key}")
if key == 'embedding_model':
return self.embedding_model
elif key == 'embedding_db_model':
return self.embedding_db_model
elif key == 'llm':
return self.llm
elif key == 'llm_no_rag':
return self.llm_no_rag
elif key == 'transcription_client':
return self.transcription_client
elif key in self._variables.get('prompt_templates', []):
return self.get_prompt_template(key)
return self._variables.get(key)
def __setitem__(self, key: str, value: Any) -> None:
self._variables[key] = value
def __delitem__(self, key: str) -> None:
del self._variables[key]
def __iter__(self) -> Iterator[str]:
return iter(self._variables)
def __len__(self):
return len(self._variables)
def get(self, key: str, default: Any = None) -> Any:
return self.__getitem__(key) or default
def update(self, **kwargs) -> None:
self._variables.update(kwargs)
def items(self):
return self._variables.items()
def keys(self):
return self._variables.keys()
def values(self):
return self._variables.values()
def select_model_variables(tenant):
model_variables = ModelVariables(tenant=tenant)
return model_variables
def create_language_template(template, language):
Returns:
str: Template with language placeholder replaced
"""
try:
full_language = langcodes.Language.make(language=language)
language_template = template.replace('{language}', full_language.display_name())
@@ -240,5 +40,249 @@ def create_language_template(template, language):
return language_template
def replace_variable_in_template(template, variable, value):
return template.replace(variable, value)
def replace_variable_in_template(template: str, variable: str, value: str) -> str:
"""
Replace a variable placeholder in template with specified value
Args:
template: Template string with variable placeholder
variable: Variable placeholder to replace (e.g. "{tenant_context}")
value: Value to insert
Returns:
str: Template with variable placeholder replaced
"""
return template.replace(variable, value or "")
class ModelVariables:
"""Manages model-related variables and configurations"""
def __init__(self, tenant_id: int, variables: Dict[str, Any] = None):
"""
Initialize ModelVariables with tenant and optional template manager
Args:
tenant: Tenant instance
template_manager: Optional TemplateManager instance
"""
current_app.logger.info(f'Model variables initialized with tenant {tenant_id} and variables \n{variables}')
self.tenant_id = tenant_id
self._variables = variables if variables is not None else self._initialize_variables()
current_app.logger.info(f'Model _variables initialized to {self._variables}')
self._embedding_model = None
self._embedding_model_class = None
self._llm_instances = {}
self.llm_metrics_handler = LLMMetricsHandler()
self._transcription_model = None
def _initialize_variables(self) -> Dict[str, Any]:
"""Initialize the variables dictionary"""
variables = {}
tenant = Tenant.query.get(self.tenant_id)
if not tenant:
raise EveAITenantNotFound(self.tenant_id)
# Set model providers
variables['embedding_provider'], variables['embedding_model'] = tenant.embedding_model.split('.')
variables['llm_provider'], variables['llm_model'] = tenant.llm_model.split('.')
variables['llm_full_model'] = tenant.llm_model
# Set model-specific configurations
model_config = MODEL_CONFIG.get(variables['llm_provider'], {}).get(variables['llm_model'], {})
variables.update(model_config)
# Additional configurations
variables['annotation_chunk_length'] = current_app.config['ANNOTATION_TEXT_CHUNK_LENGTH'][tenant.llm_model]
variables['max_compression_duration'] = current_app.config['MAX_COMPRESSION_DURATION']
variables['max_transcription_duration'] = current_app.config['MAX_TRANSCRIPTION_DURATION']
variables['compression_cpu_limit'] = current_app.config['COMPRESSION_CPU_LIMIT']
variables['compression_process_delay'] = current_app.config['COMPRESSION_PROCESS_DELAY']
return variables
@property
def embedding_model(self):
"""Get the embedding model instance"""
if self._embedding_model is None:
api_key = os.getenv('OPENAI_API_KEY')
self._embedding_model = TrackedOpenAIEmbeddings(
api_key=api_key,
model=self._variables['embedding_model']
)
return self._embedding_model
@property
def embedding_model_class(self):
"""Get the embedding model class"""
if self._embedding_model_class is None:
if self._variables['embedding_model'] == 'text-embedding-3-large':
self._embedding_model_class = EmbeddingLargeOpenAI
else: # text-embedding-3-small
self._embedding_model_class = EmbeddingSmallOpenAI
return self._embedding_model_class
@property
def annotation_chunk_length(self):
return self._variables['annotation_chunk_length']
@property
def max_compression_duration(self):
return self._variables['max_compression_duration']
@property
def max_transcription_duration(self):
return self._variables['max_transcription_duration']
@property
def compression_cpu_limit(self):
return self._variables['compression_cpu_limit']
@property
def compression_process_delay(self):
return self._variables['compression_process_delay']
def get_llm(self, temperature: float = 0.3, **kwargs) -> Any:
"""
Get an LLM instance with specific configuration
Args:
temperature: The temperature for the LLM
**kwargs: Additional configuration parameters
Returns:
An instance of the configured LLM
"""
cache_key = f"{temperature}_{hash(frozenset(kwargs.items()))}"
if cache_key not in self._llm_instances:
provider = self._variables['llm_provider']
model = self._variables['llm_model']
if provider == 'openai':
self._llm_instances[cache_key] = ChatOpenAI(
api_key=os.getenv('OPENAI_API_KEY'),
model=model,
temperature=temperature,
callbacks=[self.llm_metrics_handler],
**kwargs
)
elif provider == 'anthropic':
self._llm_instances[cache_key] = ChatAnthropic(
api_key=os.getenv('ANTHROPIC_API_KEY'),
model=current_app.config['ANTHROPIC_LLM_VERSIONS'][model],
temperature=temperature,
callbacks=[self.llm_metrics_handler],
**kwargs
)
else:
raise ValueError(f"Unsupported LLM provider: {provider}")
return self._llm_instances[cache_key]
@property
def transcription_model(self) -> TrackedOpenAITranscription:
"""Get the transcription model instance"""
if self._transcription_model is None:
api_key = os.getenv('OPENAI_API_KEY')
self._transcription_model = TrackedOpenAITranscription(
api_key=api_key,
model='whisper-1'
)
return self._transcription_model
# Remove the old transcription-related methods since they're now handled by TrackedOpenAITranscription
@property
def transcription_client(self):
raise DeprecationWarning("Use transcription_model instead")
def transcribe(self, *args, **kwargs):
raise DeprecationWarning("Use transcription_model.transcribe() instead")
def get_template(self, template_name: str, version: Optional[str] = None) -> str:
"""
Get a template for the tenant's configured LLM
Args:
template_name: Name of the template to retrieve
version: Optional specific version to retrieve
Returns:
The template content
"""
try:
template = template_manager.get_template(
self._variables['llm_full_model'],
template_name,
version
)
return template.content
except Exception as e:
current_app.logger.error(f"Error getting template {template_name}: {str(e)}")
# Fall back to old template loading if template_manager fails
if template_name in self._variables.get('templates', {}):
return self._variables['templates'][template_name]
raise
class ModelVariablesCacheHandler(CacheHandler[ModelVariables]):
handler_name = 'model_vars_cache' # Used to access handler instance from cache_manager
def __init__(self, region):
super().__init__(region, 'model_variables')
self.configure_keys('tenant_id')
self.subscribe_to_model('Tenant', ['tenant_id'])
def to_cache_data(self, instance: ModelVariables) -> Dict[str, Any]:
return {
'tenant_id': instance.tenant_id,
'variables': instance._variables,
'last_updated': dt.now(tz=tz.utc).isoformat()
}
def from_cache_data(self, data: Dict[str, Any], tenant_id: int, **kwargs) -> ModelVariables:
instance = ModelVariables(tenant_id, data.get('variables'))
return instance
def should_cache(self, value: Dict[str, Any]) -> bool:
required_fields = {'tenant_id', 'variables'}
return all(field in value for field in required_fields)
# Register the handler with the cache manager
cache_manager.register_handler(ModelVariablesCacheHandler, 'eveai_model')
# Helper function to get cached model variables
def get_model_variables(tenant_id: int) -> ModelVariables:
return cache_manager.model_vars_cache.get(
lambda tenant_id: ModelVariables(tenant_id), # function to create ModelVariables if required
tenant_id=tenant_id
)
# Written in a long format, without lambda
# def get_model_variables(tenant_id: int) -> ModelVariables:
# """
# Get ModelVariables instance, either from cache or newly created
#
# Args:
# tenant_id: The tenant's ID
#
# Returns:
# ModelVariables: Instance with either cached or fresh data
#
# Raises:
# TenantNotFoundError: If tenant doesn't exist
# CacheStateError: If cached data is invalid
# """
#
# def create_new_instance(tenant_id: int) -> ModelVariables:
# """Creator function that's called when cache miss occurs"""
# return ModelVariables(tenant_id) # This will initialize fresh variables
#
# return cache_manager.model_vars_cache.get(
# create_new_instance, # Function to create new instance if needed
# tenant_id=tenant_id # Parameters passed to both get() and create_new_instance
# )

View File

@@ -1,4 +1,6 @@
import os
import sys
import gevent
import time
from flask import current_app
@@ -28,3 +30,17 @@ def sync_folder(file_path):
dir_fd = os.open(file_path, os.O_RDONLY)
os.fsync(dir_fd)
os.close(dir_fd)
def get_project_root():
"""Get the root directory of the project."""
# Use the module that's actually running (not this file)
module = sys.modules['__main__']
if hasattr(module, '__file__'):
# Get the path to the main module
main_path = os.path.abspath(module.__file__)
# Get the root directory (where the main module is located)
return os.path.dirname(main_path)
else:
# Fallback: use current working directory
return os.getcwd()

View File

@@ -1,10 +1,15 @@
from flask import session, current_app
from sqlalchemy import and_
from common.models.user import Tenant
from common.models.entitlements import License
from common.utils.database import Database
from common.utils.eveai_exceptions import EveAITenantNotFound, EveAITenantInvalid, EveAINoActiveLicense
from datetime import datetime as dt, timezone as tz
# Definition of Trigger Handlers
def set_tenant_session_data(sender, user, **kwargs):
current_app.logger.debug(f"Setting tenant session data for user {user.id}")
tenant = Tenant.query.filter_by(id=user.tenant_id).first()
session['tenant'] = tenant.to_dict()
session['default_language'] = tenant.default_language
@@ -16,4 +21,25 @@ def clear_tenant_session_data(sender, user, **kwargs):
session.pop('tenant', None)
session.pop('default_language', None)
session.pop('default_embedding_model', None)
session.pop('default_llm_model', None)
session.pop('default_llm_model', None)
def is_valid_tenant(tenant_id):
if tenant_id == 1: # The 'root' tenant, is always valid
return True
tenant = Tenant.query.get(tenant_id)
Database(tenant).switch_schema()
if tenant is None:
raise EveAITenantNotFound()
elif tenant.type == 'Inactive':
raise EveAITenantInvalid(tenant_id)
else:
current_date = dt.now(tz=tz.utc).date()
active_license = (License.query.filter_by(tenant_id=tenant_id)
.filter(and_(License.start_date <= current_date,
License.end_date >= current_date))
.one_or_none())
if not active_license:
raise EveAINoActiveLicense(tenant_id)
return True

View File

@@ -11,7 +11,7 @@ def confirm_token(token, expiration=3600):
try:
email = serializer.loads(token, salt=current_app.config['SECURITY_PASSWORD_SALT'], max_age=expiration)
except Exception as e:
current_app.logger.debug(f'Error confirming token: {e}')
current_app.logger.error(f'Error confirming token: {e}')
raise
return email
@@ -35,14 +35,11 @@ def generate_confirmation_token(email):
def send_confirmation_email(user):
current_app.logger.debug(f'Sending confirmation email to {user.email}')
if not test_smtp_connection():
raise Exception("Failed to connect to SMTP server")
token = generate_confirmation_token(user.email)
confirm_url = prefixed_url_for('security_bp.confirm_email', token=token, _external=True)
current_app.logger.debug(f'Confirmation URL: {confirm_url}')
html = render_template('email/activate.html', confirm_url=confirm_url)
subject = "Please confirm your email"
@@ -56,10 +53,8 @@ def send_confirmation_email(user):
def send_reset_email(user):
current_app.logger.debug(f'Sending reset email to {user.email}')
token = generate_reset_token(user.email)
reset_url = prefixed_url_for('security_bp.reset_password', token=token, _external=True)
current_app.logger.debug(f'Reset URL: {reset_url}')
html = render_template('email/reset_password.html', reset_url=reset_url)
subject = "Reset Your Password"
@@ -98,4 +93,3 @@ def test_smtp_connection():
except Exception as e:
current_app.logger.error(f"Failed to connect to SMTP server: {str(e)}")
return False

View File

@@ -4,7 +4,7 @@ from flask import Flask
def generate_api_key(prefix="EveAI-Chat"):
parts = [str(random.randint(1000, 9999)) for _ in range(5)]
parts = [str(random.randint(1000, 9999)) for _ in range(8)]
return f"{prefix}-{'-'.join(parts)}"

View File

@@ -0,0 +1,112 @@
from typing import List, Union
import re
class StringListConverter:
"""Utility class for converting between comma-separated strings and lists"""
@staticmethod
def string_to_list(input_string: Union[str, None], allow_empty: bool = True) -> List[str]:
"""
Convert a comma-separated string to a list of strings.
Args:
input_string: Comma-separated string to convert
allow_empty: If True, returns empty list for None/empty input
If False, raises ValueError for None/empty input
Returns:
List of stripped strings
Raises:
ValueError: If input is None/empty and allow_empty is False
"""
if not input_string:
if allow_empty:
return []
raise ValueError("Input string cannot be None or empty")
return [item.strip() for item in input_string.split(',') if item.strip()]
@staticmethod
def list_to_string(input_list: Union[List[str], None], allow_empty: bool = True) -> str:
"""
Convert a list of strings to a comma-separated string.
Args:
input_list: List of strings to convert
allow_empty: If True, returns empty string for None/empty input
If False, raises ValueError for None/empty input
Returns:
Comma-separated string
Raises:
ValueError: If input is None/empty and allow_empty is False
"""
if not input_list:
if allow_empty:
return ''
raise ValueError("Input list cannot be None or empty")
return ', '.join(str(item).strip() for item in input_list)
@staticmethod
def validate_format(input_string: str,
allowed_chars: str = r'a-zA-Z0-9_\-',
min_length: int = 1,
max_length: int = 50) -> bool:
"""
Validate the format of items in a comma-separated string.
Args:
input_string: String to validate
allowed_chars: String of allowed characters (for regex pattern)
min_length: Minimum length for each item
max_length: Maximum length for each item
Returns:
bool: True if format is valid, False otherwise
"""
if not input_string:
return False
# Create regex pattern for individual items
pattern = f'^[{allowed_chars}]{{{min_length},{max_length}}}$'
try:
# Convert to list and check each item
items = StringListConverter.string_to_list(input_string)
return all(bool(re.match(pattern, item)) for item in items)
except Exception:
return False
@staticmethod
def validate_and_convert(input_string: str,
allowed_chars: str = r'a-zA-Z0-9_\-',
min_length: int = 1,
max_length: int = 50) -> List[str]:
"""
Validate and convert a comma-separated string to a list.
Args:
input_string: String to validate and convert
allowed_chars: String of allowed characters (for regex pattern)
min_length: Minimum length for each item
max_length: Maximum length for each item
Returns:
List of validated and converted strings
Raises:
ValueError: If input string format is invalid
"""
if not StringListConverter.validate_format(
input_string, allowed_chars, min_length, max_length
):
raise ValueError(
f"Invalid format. Items must be {min_length}-{max_length} characters "
f"long and contain only these characters: {allowed_chars}"
)
return StringListConverter.string_to_list(input_string)

View File

@@ -0,0 +1,60 @@
from dataclasses import dataclass
from typing import Optional
from datetime import datetime
from flask_jwt_extended import decode_token, verify_jwt_in_request
from flask import current_app
@dataclass
class TokenValidationResult:
"""Clean, simple validation result"""
is_valid: bool
tenant_id: Optional[int] = None
error_message: Optional[str] = None
class TokenValidator:
"""Simplified token validator focused on JWT validation"""
def validate_token(self, token: str) -> TokenValidationResult:
"""
Validate JWT token
Args:
token: The JWT token to validate
Returns:
TokenValidationResult with validation status and tenant_id if valid
"""
try:
# Decode and validate token
decoded_token = decode_token(token)
# Extract tenant_id from token subject
tenant_id = decoded_token.get('sub')
if not tenant_id:
return TokenValidationResult(
is_valid=False,
error_message="Missing tenant ID in token"
)
# Verify token timestamps
now = datetime.utcnow().timestamp()
if not (decoded_token.get('exp', 0) > now >= decoded_token.get('nbf', 0)):
return TokenValidationResult(
is_valid=False,
error_message="Token expired or not yet valid"
)
# Token is valid
return TokenValidationResult(
is_valid=True,
tenant_id=tenant_id
)
except Exception as e:
current_app.logger.error(f"Token validation error: {str(e)}")
return TokenValidationResult(
is_valid=False,
error_message=str(e)
)

View File

@@ -44,7 +44,7 @@ def form_validation_failed(request, form):
for fieldName, errorMessages in form.errors.items():
for err in errorMessages:
flash(f"Error in {fieldName}: {err}", 'danger')
current_app.logger.debug(f"Error in {fieldName}: {err}")
current_app.logger.error(f"Error in {fieldName}: {err}")
def form_to_dict(form):

View File

@@ -1,3 +1,4 @@
import os
from os import environ, path
from datetime import timedelta
import redis
@@ -68,9 +69,6 @@ class Config(object):
ANTHROPIC_LLM_VERSIONS = {'claude-3-5-sonnet': 'claude-3-5-sonnet-20240620', }
# Load prompt templates dynamically
PROMPT_TEMPLATES = {model: load_prompt_templates(model) for model in SUPPORTED_LLMS}
# Annotation text chunk length
ANNOTATION_TEXT_CHUNK_LENGTH = {
'openai.gpt-4o': 10000,
@@ -87,9 +85,6 @@ class Config(object):
# Anthropic API Keys
ANTHROPIC_API_KEY = environ.get('ANTHROPIC_API_KEY')
# Portkey API Keys
PORTKEY_API_KEY = environ.get('PORTKEY_API_KEY')
# Celery settings
CELERY_TASK_SERIALIZER = 'json'
CELERY_RESULT_SERIALIZER = 'json'
@@ -138,7 +133,10 @@ class Config(object):
MAIL_USE_SSL = True
MAIL_USERNAME = environ.get('MAIL_USERNAME')
MAIL_PASSWORD = environ.get('MAIL_PASSWORD')
MAIL_DEFAULT_SENDER = ('eveAI Admin', MAIL_USERNAME)
MAIL_DEFAULT_SENDER = ('Evie', MAIL_USERNAME)
# Email settings for API key notifications
PROMOTIONAL_IMAGE_URL = 'https://askeveai.com/wp-content/uploads/2024/07/Evie-Call-scaled.jpg' # Replace with your actual URL
# Langsmith settings
LANGCHAIN_TRACING_V2 = True
@@ -148,7 +146,7 @@ class Config(object):
SUPPORTED_FILE_TYPES = ['pdf', 'html', 'md', 'txt', 'mp3', 'mp4', 'ogg', 'srt']
TENANT_TYPES = ['Active', 'Demo', 'Inactive', 'Test']
TENANT_TYPES = ['Active', 'Demo', 'Inactive', 'Test', 'Wordpress Starter']
# The maximum number of seconds allowed for audio compression (to save resources)
MAX_COMPRESSION_DURATION = 60*10 # 10 minutes
@@ -159,6 +157,13 @@ class Config(object):
# Delay between compressing chunks in seconds
COMPRESSION_PROCESS_DELAY = 1
# WordPress Integration Settings
WORDPRESS_PROTOCOL = os.environ.get('WORDPRESS_PROTOCOL', 'http')
WORDPRESS_HOST = os.environ.get('WORDPRESS_HOST', 'host.docker.internal')
WORDPRESS_PORT = os.environ.get('WORDPRESS_PORT', '10003')
WORDPRESS_BASE_URL = f"{WORDPRESS_PROTOCOL}://{WORDPRESS_HOST}:{WORDPRESS_PORT}"
EXTERNAL_WORDPRESS_BASE_URL = 'localhost:10003'
class DevConfig(Config):
DEVELOPMENT = True
@@ -181,13 +186,21 @@ class DevConfig(Config):
# file upload settings
# UPLOAD_FOLDER = '/app/tenant_files'
# Redis Settings
REDIS_URL = 'redis'
REDIS_PORT = '6379'
REDIS_BASE_URI = f'redis://{REDIS_URL}:{REDIS_PORT}'
# Celery settings
# eveai_app Redis Settings
CELERY_BROKER_URL = 'redis://redis:6379/0'
CELERY_RESULT_BACKEND = 'redis://redis:6379/0'
CELERY_BROKER_URL = f'{REDIS_BASE_URI}/0'
CELERY_RESULT_BACKEND = f'{REDIS_BASE_URI}/0'
# eveai_chat Redis Settings
CELERY_BROKER_URL_CHAT = 'redis://redis:6379/3'
CELERY_RESULT_BACKEND_CHAT = 'redis://redis:6379/3'
CELERY_BROKER_URL_CHAT = f'{REDIS_BASE_URI}/3'
CELERY_RESULT_BACKEND_CHAT = f'{REDIS_BASE_URI}/3'
# eveai_chat_workers cache Redis Settings
CHAT_WORKER_CACHE_URL = f'{REDIS_BASE_URI}/4'
# Unstructured settings
# UNSTRUCTURED_API_KEY = 'pDgCrXumYhM3CNvjvwV8msMldXC3uw'
@@ -195,7 +208,7 @@ class DevConfig(Config):
# UNSTRUCTURED_FULL_URL = 'https://flowitbv-16c4us0m.api.unstructuredapp.io/general/v0/general'
# SocketIO settings
SOCKETIO_MESSAGE_QUEUE = 'redis://redis:6379/1'
SOCKETIO_MESSAGE_QUEUE = f'{REDIS_BASE_URI}/1'
SOCKETIO_CORS_ALLOWED_ORIGINS = '*'
SOCKETIO_LOGGER = True
SOCKETIO_ENGINEIO_LOGGER = True
@@ -211,7 +224,7 @@ class DevConfig(Config):
GC_CRYPTO_KEY = 'envelope-encryption-key'
# Session settings
SESSION_REDIS = redis.from_url('redis://redis:6379/2')
SESSION_REDIS = redis.from_url(f'{REDIS_BASE_URI}/2')
# PATH settings
ffmpeg_path = '/usr/bin/ffmpeg'
@@ -278,6 +291,8 @@ class ProdConfig(Config):
# eveai_chat Redis Settings
CELERY_BROKER_URL_CHAT = f'{REDIS_BASE_URI}/3'
CELERY_RESULT_BACKEND_CHAT = f'{REDIS_BASE_URI}/3'
# eveai_chat_workers cache Redis Settings
CHAT_WORKER_CACHE_URL = f'{REDIS_BASE_URI}/4'
# Session settings
SESSION_REDIS = redis.from_url(f'{REDIS_BASE_URI}/2')

View File

@@ -1,4 +1,8 @@
import json
import os
from datetime import datetime as dt, timezone as tz
from flask import current_app
from graypy import GELFUDPHandler
import logging
import logging.config
@@ -9,24 +13,173 @@ GRAYLOG_PORT = int(os.environ.get('GRAYLOG_PORT', 12201))
env = os.environ.get('FLASK_ENV', 'development')
class CustomLogRecord(logging.LogRecord):
class TuningLogRecord(logging.LogRecord):
"""Extended LogRecord that handles both tuning and business event logging"""
def __init__(self, *args, **kwargs):
super().__init__(*args, **kwargs)
# Initialize extra fields after parent initialization
self._extra_fields = {}
self._is_tuning_log = False
self._tuning_type = None
self._tuning_tenant_id = None
self._tuning_catalog_id = None
self._tuning_specialist_id = None
self._tuning_retriever_id = None
self._tuning_processor_id = None
self.component = os.environ.get('COMPONENT_NAME', 'eveai_app')
def __setattr__(self, name, value):
if name not in {'event_type', 'tenant_id', 'trace_id', 'span_id', 'span_name', 'parent_span_id',
'document_version_id', 'chat_session_id', 'interaction_id', 'environment'}:
super().__setattr__(name, value)
def getMessage(self):
"""
Override getMessage to handle both string and dict messages
"""
msg = self.msg
if self.args:
msg = msg % self.args
return msg
@property
def is_tuning_log(self):
return self._is_tuning_log
@is_tuning_log.setter
def is_tuning_log(self, value):
object.__setattr__(self, '_is_tuning_log', value)
@property
def tuning_type(self):
return self._tuning_type
@tuning_type.setter
def tuning_type(self, value):
object.__setattr__(self, '_tuning_type', value)
def get_tuning_data(self):
"""Get all tuning-related data if this is a tuning log"""
if not self._is_tuning_log:
return {}
return {
'is_tuning_log': self._is_tuning_log,
'tuning_type': self._tuning_type,
'tuning_tenant_id': self._tuning_tenant_id,
'tuning_catalog_id': self._tuning_catalog_id,
'tuning_specialist_id': self._tuning_specialist_id,
'tuning_retriever_id': self._tuning_retriever_id,
'tuning_processor_id': self._tuning_processor_id,
}
def set_tuning_data(self, tenant_id=None, catalog_id=None, specialist_id=None,
retriever_id=None, processor_id=None):
"""Set tuning-specific data"""
object.__setattr__(self, '_tuning_tenant_id', tenant_id)
object.__setattr__(self, '_tuning_catalog_id', catalog_id)
object.__setattr__(self, '_tuning_specialist_id', specialist_id)
object.__setattr__(self, '_tuning_retriever_id', retriever_id)
object.__setattr__(self, '_tuning_processor_id', processor_id)
def custom_log_record_factory(*args, **kwargs):
record = CustomLogRecord(*args, **kwargs)
return record
class TuningFormatter(logging.Formatter):
"""Universal formatter for all tuning logs"""
def __init__(self, fmt=None, datefmt=None):
super().__init__(fmt or '%(asctime)s [%(levelname)s] %(name)s: %(message)s',
datefmt or '%Y-%m-%d %H:%M:%S')
def format(self, record):
# First format with the default formatter to handle basic fields
formatted_msg = super().format(record)
# If this is a tuning log, add the additional context
if getattr(record, 'is_tuning_log', False):
try:
identifiers = []
if hasattr(record, 'tenant_id') and record.tenant_id:
identifiers.append(f"Tenant: {record.tenant_id}")
if hasattr(record, 'catalog_id') and record.catalog_id:
identifiers.append(f"Catalog: {record.catalog_id}")
if hasattr(record, 'processor_id') and record.processor_id:
identifiers.append(f"Processor: {record.processor_id}")
formatted_msg = (
f"{formatted_msg}\n"
f"[TUNING {record.tuning_type}] [{' | '.join(identifiers)}]"
)
if hasattr(record, 'tuning_data') and record.tuning_data:
formatted_msg += f"\nData: {json.dumps(record.tuning_data, indent=2)}"
except Exception as e:
return f"{formatted_msg} (Error formatting tuning data: {str(e)})"
return formatted_msg
class GraylogFormatter(logging.Formatter):
"""Maintains existing Graylog formatting while adding tuning fields"""
def format(self, record):
if getattr(record, 'is_tuning_log', False):
# Add tuning-specific fields to Graylog
record.tuning_fields = {
'is_tuning_log': True,
'tuning_type': record.tuning_type,
'tenant_id': record.tenant_id,
'catalog_id': record.catalog_id,
'specialist_id': record.specialist_id,
'retriever_id': record.retriever_id,
'processor_id': record.processor_id,
}
return super().format(record)
class TuningLogger:
"""Helper class to manage tuning logs with consistent structure"""
def __init__(self, logger_name, tenant_id=None, catalog_id=None, specialist_id=None, retriever_id=None, processor_id=None):
self.logger = logging.getLogger(logger_name)
self.tenant_id = tenant_id
self.catalog_id = catalog_id
self.specialist_id = specialist_id
self.retriever_id = retriever_id
self.processor_id = processor_id
def log_tuning(self, tuning_type: str, message: str, data=None, level=logging.DEBUG):
"""Log a tuning event with structured data"""
try:
# Create a standard LogRecord for tuning
record = logging.LogRecord(
name=self.logger.name,
level=level,
pathname='',
lineno=0,
msg=message,
args=(),
exc_info=None
)
# Add tuning-specific attributes
record.is_tuning_log = True
record.tuning_type = tuning_type
record.tenant_id = self.tenant_id
record.catalog_id = self.catalog_id
record.specialist_id = self.specialist_id
record.retriever_id = self.retriever_id
record.processor_id = self.processor_id
if data:
record.tuning_data = data
# Process the record
self.logger.handle(record)
except Exception as e:
fallback_logger = logging.getLogger('eveai_workers')
fallback_logger.exception(f"Failed to log tuning message: {str(e)}")
# Set the custom log record factory
logging.setLogRecordFactory(custom_log_record_factory)
logging.setLogRecordFactory(TuningLogRecord)
LOGGING = {
@@ -37,104 +190,104 @@ LOGGING = {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/eveai_app.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_workers': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/eveai_workers.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_chat': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/eveai_chat.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_chat_workers': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/eveai_chat_workers.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_api': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/eveai_api.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_beat': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/eveai_beat.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_entitlements': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/eveai_entitlements.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_sqlalchemy': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/sqlalchemy.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_mailman': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/mailman.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_security': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/security.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_rag_tuning': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/rag_tuning.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_embed_tuning': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/embed_tuning.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'file_business_events': {
'level': 'INFO',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/business_events.log',
'maxBytes': 1024 * 1024 * 5, # 5MB
'backupCount': 10,
'maxBytes': 1024 * 1024 * 1, # 1MB
'backupCount': 2,
'formatter': 'standard',
},
'console': {
@@ -142,25 +295,38 @@ LOGGING = {
'level': 'DEBUG',
'formatter': 'standard',
},
'tuning_file': {
'level': 'DEBUG',
'class': 'logging.handlers.RotatingFileHandler',
'filename': 'logs/tuning.log',
'maxBytes': 1024 * 1024 * 3, # 3MB
'backupCount': 3,
'formatter': 'tuning',
},
'graylog': {
'level': 'DEBUG',
'class': 'graypy.GELFUDPHandler',
'host': GRAYLOG_HOST,
'port': GRAYLOG_PORT,
'debugging_fields': True, # Set to True if you want to include debugging fields
'extra_fields': True, # Set to True if you want to include extra fields
'debugging_fields': True,
'formatter': 'graylog'
},
},
'formatters': {
'standard': {
'format': '%(asctime)s [%(levelname)s] %(name)s (%(component)s) [%(module)s:%(lineno)d in %(funcName)s] '
'[Thread: %(threadName)s]: %(message)s'
'format': '%(asctime)s [%(levelname)s] %(name)s (%(component)s) [%(module)s:%(lineno)d]: %(message)s',
'datefmt': '%Y-%m-%d %H:%M:%S'
},
'graylog': {
'format': '[%(levelname)s] %(name)s (%(component)s) [%(module)s:%(lineno)d in %(funcName)s] '
'[Thread: %(threadName)s]: %(message)s',
'datefmt': '%Y-%m-%d %H:%M:%S',
'()': GraylogFormatter
},
'tuning': {
'()': TuningFormatter,
'datefmt': '%Y-%m-%d %H:%M:%S UTC'
}
},
'loggers': {
'eveai_app': { # logger for the eveai_app
@@ -213,21 +379,17 @@ LOGGING = {
'level': 'DEBUG',
'propagate': False
},
'rag_tuning': { # logger for the rag_tuning
'handlers': ['file_rag_tuning', 'graylog', ] if env == 'production' else ['file_rag_tuning', ],
'level': 'DEBUG',
'propagate': False
},
'embed_tuning': { # logger for the embed_tuning
'handlers': ['file_embed_tuning', 'graylog', ] if env == 'production' else ['file_embed_tuning', ],
'level': 'DEBUG',
'propagate': False
},
'business_events': {
'handlers': ['file_business_events', 'graylog'],
'level': 'DEBUG',
'propagate': False
},
# Single tuning logger
'tuning': {
'handlers': ['tuning_file', 'graylog'] if env == 'production' else ['tuning_file'],
'level': 'DEBUG',
'propagate': False,
},
'': { # root logger
'handlers': ['console'],
'level': 'WARNING', # Set higher level for root to minimize noise

View File

@@ -1,88 +0,0 @@
html_parse: |
You are a top administrative assistant specialized in transforming given HTML into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
# Best practices are:
- Respect wordings and language(s) used in the HTML.
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
- Sub-headers can be used as lists. This is true when a header is followed by a series of sub-headers without content (paragraphs or listed items). Present those sub-headers as a list.
- Be careful of encoding of the text. Everything needs to be human readable.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input html file. Answer with the pure markdown, without any other text.
HTML is between triple backticks.
```{html}```
pdf_parse: |
You are a top administrative aid specialized in transforming given PDF-files into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
# Best practices are:
- Respect wordings and language(s) used in the PDF.
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
- When headings are numbered, show the numbering and define the header level.
- A new item is started when a <return> is found before a full line is reached. In order to know the number of characters in a line, please check the document and the context within the document (e.g. an image could limit the number of characters temporarily).
- Paragraphs are to be stripped of newlines so they become easily readable.
- Be careful of encoding of the text. Everything needs to be human readable.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input pdf content. Answer with the pure markdown, without any other text.
PDF content is between triple backticks.
```{pdf_content}```
summary: |
Write a concise summary of the text in {language}. The text is delimited between triple backticks.
```{text}```
rag: |
Answer the question based on the following context, delimited between triple backticks.
{tenant_context}
Use the following {language} in your communication, and cite the sources used.
If the question cannot be answered using the given context, say "I have insufficient information to answer this question."
Context:
```{context}```
Question:
{question}
history: |
You are a helpful assistant that details a question based on a previous context,
in such a way that the question is understandable without the previous context.
The context is a conversation history, with the HUMAN asking questions, the AI answering questions.
The history is delimited between triple backticks.
You answer by stating the question in {language}.
History:
```{history}```
Question to be detailed:
{question}
encyclopedia: |
You have a lot of background knowledge, and as such you are some kind of
'encyclopedia' to explain general terminology. Only answer if you have a clear understanding of the question.
If not, say you do not have sufficient information to answer the question. Use the {language} in your communication.
Question:
{question}
transcript: |
"""You are a top administrative assistant specialized in transforming given transcriptions into markdown formatted files. Your task is to process and improve the given transcript, not to summarize it.
IMPORTANT INSTRUCTIONS:
1. DO NOT summarize the transcript and don't make your own interpretations. Return the FULL, COMPLETE transcript with improvements.
2. Improve any errors in the transcript based on context.
3. Respect the original wording and language(s) used in the transcription. Main Language used is {language}.
4. Divide the transcript into paragraphs for better readability. Each paragraph ONLY contains ORIGINAL TEXT.
5. Group related paragraphs into logical sections.
6. Add appropriate headers (using markdown syntax) to each section in {language}.
7. We do not need an overall title. Just add logical headers
8. Ensure that the entire transcript is included in your response, from start to finish.
REMEMBER:
- Your output should be the complete transcript in markdown format, NOT A SUMMARY OR ANALYSIS.
- Include EVERYTHING from the original transcript, just organized and formatted better.
- Just return the markdown version of the transcript, without any other text such as an introduction or a summary.
Here is the transcript to process (between triple backticks):
```{transcript}```
Process this transcript according to the instructions above and return the full, formatted markdown version.
"""

View File

@@ -1,79 +0,0 @@
html_parse: |
You are a top administrative assistant specialized in transforming given HTML into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
# Best practices are:
- Respect wordings and language(s) used in the HTML.
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
- Sub-headers can be used as lists. This is true when a header is followed by a series of sub-headers without content (paragraphs or listed items). Present those sub-headers as a list.
- Be careful of encoding of the text. Everything needs to be human readable.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input html file. Answer with the pure markdown, without any other text.
HTML is between triple backquotes.
```{html}```
pdf_parse: |
You are a top administrative aid specialized in transforming given PDF-files into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
# Best practices are:
- Respect wordings and language(s) used in the PDF.
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
- When headings are numbered, show the numbering and define the header level.
- A new item is started when a <return> is found before a full line is reached. In order to know the number of characters in a line, please check the document and the context within the document (e.g. an image could limit the number of characters temporarily).
- Paragraphs are to be stripped of newlines so they become easily readable.
- Be careful of encoding of the text. Everything needs to be human readable.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input pdf content. Answer with the pure markdown, without any other text.
PDF content is between triple backquotes.
```{pdf_content}```
summary: |
Write a concise summary of the text in {language}. The text is delimited between triple backquotes.
```{text}```
rag: |
Answer the question based on the following context, delimited between triple backquotes.
{tenant_context}
Use the following {language} in your communication, and cite the sources used.
If the question cannot be answered using the given context, say "I have insufficient information to answer this question."
Context:
```{context}```
Question:
{question}
history: |
You are a helpful assistant that details a question based on a previous context,
in such a way that the question is understandable without the previous context.
The context is a conversation history, with the HUMAN asking questions, the AI answering questions.
The history is delimited between triple backquotes.
You answer by stating the question in {language}.
History:
```{history}```
Question to be detailed:
{question}
encyclopedia: |
You have a lot of background knowledge, and as such you are some kind of
'encyclopedia' to explain general terminology. Only answer if you have a clear understanding of the question.
If not, say you do not have sufficient information to answer the question. Use the {language} in your communication.
Question:
{question}
transcript: |
You are a top administrative assistant specialized in transforming given transcriptions into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system. The transcriptions originate from podcast, videos and similar material.
# Best practices and steps are:
- Respect wordings and language(s) used in the transcription. Main language is {language}.
- Sometimes, the transcript contains speech of several people participating in a conversation. Although these are not obvious from reading the file, try to detect when other people are speaking.
- Divide the transcript into several logical parts. Ensure questions and their answers are in the same logical part.
- annotate the text to identify these logical parts using headings in {language}.
- improve errors in the transcript given the context, but do not change the meaning and intentions of the transcription.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of processing the complete input transcription. Answer with the pure markdown, without any other text.
The transcript is between triple backquotes.
```{transcript}```

View File

@@ -1,84 +0,0 @@
html_parse: |
You are a top administrative assistant specialized in transforming given HTML into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
# Best practices are:
- Respect wordings and language(s) used in the HTML.
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
- Sub-headers can be used as lists. This is true when a header is followed by a series of sub-headers without content (paragraphs or listed items). Present those sub-headers as a list.
- Be careful of encoding of the text. Everything needs to be human readable.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input html file. Answer with the pure markdown, without any other text.
HTML is between triple backquotes.
```{html}```
pdf_parse: |
You are a top administrative aid specialized in transforming given PDF-files into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
The content you get is already processed (some markdown already generated), but needs to be corrected. For large files, you may receive only portions of the full file. Consider this when processing the content.
# Best practices are:
- Respect wordings and language(s) used in the provided content.
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
- When headings are numbered, show the numbering and define the header level. You may have to correct current header levels, as preprocessing is known to make errors.
- A new item is started when a <return> is found before a full line is reached. In order to know the number of characters in a line, please check the document and the context within the document (e.g. an image could limit the number of characters temporarily).
- Paragraphs are to be stripped of newlines so they become easily readable.
- Be careful of encoding of the text. Everything needs to be human readable.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input pdf content. Answer with the pure markdown, without any other text.
PDF content is between triple backquotes.
```{pdf_content}```
summary: |
Write a concise summary of the text in {language}. The text is delimited between triple backquotes.
```{text}```
rag: |
Answer the question based on the following context, delimited between triple backquotes.
{tenant_context}
Use the following {language} in your communication, and cite the sources used.
If the question cannot be answered using the given context, say "I have insufficient information to answer this question."
Context:
```{context}```
Question:
{question}
history: |
You are a helpful assistant that details a question based on a previous context,
in such a way that the question is understandable without the previous context.
The context is a conversation history, with the HUMAN asking questions, the AI answering questions.
The history is delimited between triple backquotes.
You answer by stating the question in {language}.
History:
```{history}```
Question to be detailed:
{question}
encyclopedia: |
You have a lot of background knowledge, and as such you are some kind of
'encyclopedia' to explain general terminology. Only answer if you have a clear understanding of the question.
If not, say you do not have sufficient information to answer the question. Use the {language} in your communication.
Question:
{question}
transcript: |
You are a top administrative assistant specialized in transforming given transcriptions into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system. The transcriptions originate from podcast, videos and similar material.
You may receive information in different chunks. If you're not receiving the first chunk, you'll get the last part of the previous chunk, including it's title in between triple $. Consider this last part and the title as the start of the new chunk.
# Best practices and steps are:
- Respect wordings and language(s) used in the transcription. Main language is {language}.
- Sometimes, the transcript contains speech of several people participating in a conversation. Although these are not obvious from reading the file, try to detect when other people are speaking.
- Divide the transcript into several logical parts. Ensure questions and their answers are in the same logical part. Don't make logical parts too small. They should contain at least 7 or 8 sentences.
- annotate the text to identify these logical parts using headings in {language}.
- improve errors in the transcript given the context, but do not change the meaning and intentions of the transcription.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of processing the complete input transcription. Answer with the pure markdown, without any other text.
The transcript is between triple backquotes.
$$${previous_part}$$$
```{transcript}```

View File

@@ -0,0 +1,12 @@
version: "1.0.0"
content: |
You have a lot of background knowledge, and as such you are some kind of
'encyclopedia' to explain general terminology. Only answer if you have a clear understanding of the question.
If not, say you do not have sufficient information to answer the question. Use the {language} in your communication.
Question:
{question}
metadata:
author: "Josako"
date_added: "2024-11-10"
description: "A background information retriever for Evie"
changes: "Initial version migrated from flat file structure"

View File

@@ -0,0 +1,16 @@
version: "1.0.0"
content: |
You are a helpful assistant that details a question based on a previous context,
in such a way that the question is understandable without the previous context.
The context is a conversation history, with the HUMAN asking questions, the AI answering questions.
The history is delimited between triple backquotes.
You answer by stating the question in {language}.
History:
```{history}```
Question to be detailed:
{question}
metadata:
author: "Josako"
date_added: "2024-11-10"
description: "Prompt to further detail a question based on the previous conversation"
changes: "Initial version migrated from flat file structure"

View File

@@ -0,0 +1,20 @@
version: "1.0.0"
content: |
You are a top administrative assistant specialized in transforming given HTML into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
# Best practices are:
- Respect wordings and language(s) used in the HTML.
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
- Sub-headers can be used as lists. This is true when a header is followed by a series of sub-headers without content (paragraphs or listed items). Present those sub-headers as a list.
- Be careful of encoding of the text. Everything needs to be human readable.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input html file. Answer with the pure markdown, without any other text.
HTML is between triple backquotes.
```{html}```
metadata:
author: "Josako"
date_added: "2024-11-10"
description: "An aid in transforming HTML-based inputs to markdown"
changes: "Initial version migrated from flat file structure"

View File

@@ -0,0 +1,23 @@
version: "1.0.0"
content: |
You are a top administrative aid specialized in transforming given PDF-files into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
The content you get is already processed (some markdown already generated), but needs to be corrected. For large files, you may receive only portions of the full file. Consider this when processing the content.
# Best practices are:
- Respect wordings and language(s) used in the provided content.
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
- When headings are numbered, show the numbering and define the header level. You may have to correct current header levels, as preprocessing is known to make errors.
- A new item is started when a <return> is found before a full line is reached. In order to know the number of characters in a line, please check the document and the context within the document (e.g. an image could limit the number of characters temporarily).
- Paragraphs are to be stripped of newlines so they become easily readable.
- Be careful of encoding of the text. Everything needs to be human readable.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input pdf content. Answer with the pure markdown, without any other text.
PDF content is between triple backquotes.
```{pdf_content}```
metadata:
author: "Josako"
date_added: "2024-11-10"
description: "A assistant to parse PDF-content into markdown"
changes: "Initial version migrated from flat file structure"

View File

@@ -0,0 +1,15 @@
version: "1.0.0"
content: |
Answer the question based on the following context, delimited between triple backquotes.
{tenant_context}
Use the following {language} in your communication, and cite the sources used at the end of the full conversation.
If the question cannot be answered using the given context, say "I have insufficient information to answer this question."
Context:
```{context}```
Question:
{question}
metadata:
author: "Josako"
date_added: "2024-11-10"
description: "The Main RAG retriever"
changes: "Initial version migrated from flat file structure"

View File

@@ -0,0 +1,9 @@
version: "1.0.0"
content: |
Write a concise summary of the text in {language}. The text is delimited between triple backquotes.
```{text}```
metadata:
author: "Josako"
date_added: "2024-11-10"
description: "An assistant to create a summary when multiple chunks are required for 1 file"
changes: "Initial version migrated from flat file structure"

View File

@@ -0,0 +1,25 @@
version: "1.0.0"
content: |
You are a top administrative assistant specialized in transforming given transcriptions into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system. The transcriptions originate from podcast, videos and similar material.
You may receive information in different chunks. If you're not receiving the first chunk, you'll get the last part of the previous chunk, including it's title in between triple $. Consider this last part and the title as the start of the new chunk.
# Best practices and steps are:
- Respect wordings and language(s) used in the transcription. Main language is {language}.
- Sometimes, the transcript contains speech of several people participating in a conversation. Although these are not obvious from reading the file, try to detect when other people are speaking.
- Divide the transcript into several logical parts. Ensure questions and their answers are in the same logical part. Don't make logical parts too small. They should contain at least 7 or 8 sentences.
- annotate the text to identify these logical parts using headings in {language}.
- improve errors in the transcript given the context, but do not change the meaning and intentions of the transcription.
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of processing the complete input transcription. Answer with the pure markdown, without any other text.
The transcript is between triple backquotes.
$$${previous_part}$$$
```{transcript}```
metadata:
author: "Josako"
date_added: "2024-11-10"
description: "An assistant to transform a transcript to markdown."
changes: "Initial version migrated from flat file structure"

View File

View File

@@ -0,0 +1,29 @@
# Catalog Types
CATALOG_TYPES = {
"STANDARD_CATALOG": {
"name": "Standard Catalog",
"Description": "A Catalog with information in Evie's Library, to be considered as a whole",
"configuration": {},
"document_version_configurations": []
},
"DOSSIER": {
"name": "Dossier Catalog",
"Description": "A Catalog with information in Evie's Library in which several Dossiers can be stored",
"configuration": {
"tagging_fields": {
"name": "Tagging Fields",
"type": "tagging_fields",
"description": """Define the metadata fields that will be used for tagging documents.
Each field must have:
- type: one of 'string', 'integer', 'float', 'date', 'enum'
- required: boolean indicating if the field is mandatory
- description: field description
- allowed_values: list of values (for enum type only)
- min_value/max_value: range limits (for numeric types only)""",
"required": True,
"default": {},
}
},
"document_version_configurations": ["tagging_fields"]
},
}

View File

@@ -0,0 +1,56 @@
# Catalog Types
PROCESSOR_TYPES = {
"HTML_PROCESSOR": {
"name": "HTML Processor",
"file_types": "html",
"Description": "A processor for HTML files",
"configuration": {
"html_tags": {
"name": "HTML Tags",
"type": "string",
"description": "A comma-separated list of HTML tags",
"required": True,
"default": "p, h1, h2, h3, h4, h5, h6, li, table, thead, tbody, tr, td"
},
"html_end_tags": {
"name": "HTML End Tags",
"type": "string",
"description": "A comma-separated list of HTML end tags (where can the chunk end)",
"required": True,
"default": "p, li, table"
},
"html_included_elements": {
"name": "HTML Included Elements",
"type": "string",
"description": "A comma-separated list of elements to be included",
"required": True,
"default": "article, main"
},
"html_excluded_elements": {
"name": "HTML Excluded Elements",
"type": "string",
"description": "A comma-separated list of elements to be excluded",
"required": False,
"default": "header, footer, nav, script"
},
"html_excluded_classes": {
"name": "HTML Excluded Classes",
"type": "string",
"description": "A comma-separated list of classes to be excluded",
"required": False,
},
},
},
"PDF_PROCESSOR": {
"name": "PDF Processor",
"file_types": "pdf",
"Description": "A Processor for PDF files",
"configuration": {}
},
"AUDIO_PROCESSOR": {
"name": "AUDIO Processor",
"file_types": "mp3, mp4, ogg",
"Description": "A Processor for audio files",
"configuration": {}
},
}

View File

@@ -0,0 +1,31 @@
# Retriever Types
RETRIEVER_TYPES = {
"STANDARD_RAG": {
"name": "Standard RAG Retriever",
"description": "Retrieving all embeddings conform the query",
"configuration": {
"es_k": {
"name": "es_k",
"type": "int",
"description": "K-value to retrieve embeddings (max embeddings retrieved)",
"required": True,
"default": 8,
},
"es_similarity_threshold": {
"name": "es_similarity_threshold",
"type": "float",
"description": "Similarity threshold for retrieving embeddings",
"required": True,
"default": 0.3,
},
},
"arguments": {
"query": {
"name": "query",
"type": "str",
"description": "Query to retrieve embeddings",
"required": True,
},
}
}
}

View File

@@ -0,0 +1,11 @@
# Specialist Types
SERVICE_TYPES = {
"CHAT": {
"name": "CHAT",
"description": "Service allows to use CHAT functionality.",
},
"DOCAPI": {
"name": "DOCAPI",
"description": "Service allows to use document API functionality.",
},
}

View File

@@ -0,0 +1,62 @@
# Specialist Types
SPECIALIST_TYPES = {
"STANDARD_RAG": {
"name": "Q&A RAG Specialist",
"description": "Standard Q&A through RAG Specialist",
"configuration": {
"specialist_context": {
"name": "Specialist Context",
"type": "text",
"description": "The context to be used by the specialist.",
"required": False,
},
"temperature": {
"name": "Temperature",
"type": "number",
"description": "The inference temperature to be used by the specialist.",
"required": False,
"default": 0.3
}
},
"arguments": {
"language": {
"name": "Language",
"type": "str",
"description": "Language code to be used for receiving questions and giving answers",
"required": True,
},
"query": {
"name": "query",
"type": "str",
"description": "Query to answer",
"required": True,
}
},
"results": {
"detailed_query": {
"name": "detailed_query",
"type": "str",
"description": "The query detailed with the Chat Session History.",
"required": True,
},
"answer": {
"name": "answer",
"type": "str",
"description": "Answer to the query",
"required": True,
},
"citations": {
"name": "citations",
"type": "List[str]",
"description": "List of citations",
"required": False,
},
"insufficient_info": {
"name": "insufficient_info",
"type": "bool",
"description": "Whether or not the query is insufficient info",
"required": True,
},
}
}
}

View File

@@ -169,4 +169,5 @@ for SERVICE in "${SERVICES[@]}"; do
fi
done
echo "All specified services processed."
echo -e "\033[35mAll specified services processed.\033[0m"
echo -e "\033[35mFinished at $(date +"%d/%m/%Y %H:%M:%S")\033[0m"

View File

@@ -18,8 +18,8 @@ x-common-variables: &common-variables
FLASK_DEBUG: true
SECRET_KEY: '97867c1491bea5ee6a8e8436eb11bf2ba6a69ff53ab1b17ecba450d0f2e572e1'
SECURITY_PASSWORD_SALT: '228614859439123264035565568761433607235'
MAIL_USERNAME: eveai_super@flow-it.net
MAIL_PASSWORD: '$$6xsWGbNtx$$CFMQZqc*'
MAIL_USERNAME: evie@askeveai.com
MAIL_PASSWORD: 'D**0z@UGfJOI@yv3eC5'
MAIL_SERVER: mail.flow-it.net
MAIL_PORT: 465
REDIS_URL: redis
@@ -27,7 +27,6 @@ x-common-variables: &common-variables
OPENAI_API_KEY: 'sk-proj-8R0jWzwjL7PeoPyMhJTZT3BlbkFJLb6HfRB2Hr9cEVFWEhU7'
GROQ_API_KEY: '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'
ANTHROPIC_API_KEY: 'sk-ant-api03-c2TmkzbReeGhXBO5JxNH6BJNylRDonc9GmZd0eRbrvyekec2'
PORTKEY_API_KEY: 'T2Dt4QTpgCvWxa1OftYCJtj7NcDZ'
JWT_SECRET_KEY: 'bsdMkmQ8ObfMD52yAFg4trrvjgjMhuIqg2fjDpD/JqvgY0ccCcmlsEnVFmR79WPiLKEA3i8a5zmejwLZKl4v9Q=='
API_ENCRYPTION_KEY: 'xfF5369IsredSrlrYZqkM9ZNrfUASYYS6TCcAR9UKj4='
MINIO_ENDPOINT: minio:9000
@@ -36,11 +35,6 @@ x-common-variables: &common-variables
NGINX_SERVER_NAME: 'localhost http://macstudio.ask-eve-ai-local.com/'
LANGCHAIN_API_KEY: "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"
networks:
eveai-network:
driver: bridge
services:
nginx:
image: josakola/nginx:latest
@@ -60,9 +54,10 @@ services:
- ../nginx/sites-enabled:/etc/nginx/sites-enabled
- ../nginx/static:/etc/nginx/static
- ../nginx/public:/etc/nginx/public
- ../integrations/Wordpress/eveai-chat-widget/css/eveai-chat-style.css:/etc/nginx/static/css/eveai-chat-style.css
- ../integrations/Wordpress/eveai-chat-widget/js/eveai-chat-widget.js:/etc/nginx/static/js/eveai-chat-widget.js
- ../integrations/Wordpress/eveai-chat-widget/js/eveai-sdk.js:/etc/nginx/static/js/eveai-sdk.js
- ../integrations/Wordpress/eveai-chat/assets/css/eveai-chat-style.css:/etc/nginx/static/css/eveai-chat-style.css
- ../integrations/Wordpress/eveai-chat/assets/js/eveai-chat-widget.js:/etc/nginx/static/js/eveai-chat-widget.js
- ../integrations/Wordpress/eveai-chat/assets/js/eveai-chat-widget.js:/etc/nginx/static/js/eveai-token-manager.js
- ../integrations/Wordpress/eveai-chat/assets/js/eveai-sdk.js:/etc/nginx/static/js/eveai-sdk.js
- ./logs/nginx:/var/log/nginx
depends_on:
- eveai_app
@@ -90,7 +85,7 @@ services:
- ../migrations:/app/migrations
- ../scripts:/app/scripts
- ../patched_packages:/app/patched_packages
- eveai_logs:/app/logs
- ./eveai_logs:/app/logs
depends_on:
db:
condition: service_healthy
@@ -124,7 +119,7 @@ services:
- ../config:/app/config
- ../scripts:/app/scripts
- ../patched_packages:/app/patched_packages
- eveai_logs:/app/logs
- ./eveai_logs:/app/logs
depends_on:
db:
condition: service_healthy
@@ -154,7 +149,7 @@ services:
- ../config:/app/config
- ../scripts:/app/scripts
- ../patched_packages:/app/patched_packages
- eveai_logs:/app/logs
- ./eveai_logs:/app/logs
depends_on:
db:
condition: service_healthy
@@ -186,7 +181,7 @@ services:
- ../config:/app/config
- ../scripts:/app/scripts
- ../patched_packages:/app/patched_packages
- eveai_logs:/app/logs
- ./eveai_logs:/app/logs
depends_on:
db:
condition: service_healthy
@@ -208,13 +203,16 @@ services:
environment:
<<: *common-variables
COMPONENT_NAME: eveai_api
WORDPRESS_HOST: host.docker.internal
WORDPRESS_PORT: 10003
WORDPRESS_PROTOCOL: http
volumes:
- ../eveai_api:/app/eveai_api
- ../common:/app/common
- ../config:/app/config
- ../scripts:/app/scripts
- ../patched_packages:/app/patched_packages
- eveai_logs:/app/logs
- ./eveai_logs:/app/logs
depends_on:
db:
condition: service_healthy
@@ -248,7 +246,7 @@ services:
- ../config:/app/config
- ../scripts:/app/scripts
- ../patched_packages:/app/patched_packages
- eveai_logs:/app/logs
- ./eveai_logs:/app/logs
depends_on:
redis:
condition: service_healthy
@@ -272,7 +270,7 @@ services:
- ../config:/app/config
- ../scripts:/app/scripts
- ../patched_packages:/app/patched_packages
- eveai_logs:/app/logs
- ./eveai_logs:/app/logs
depends_on:
db:
condition: service_healthy
@@ -283,7 +281,6 @@ services:
networks:
- eveai-network
db:
hostname: db
image: ankane/pgvector
@@ -308,8 +305,8 @@ services:
redis:
image: redis:7.2.5
restart: always
expose:
- 6379
ports:
- "6379:6379"
volumes:
- ./db/redis:/data
healthcheck:
@@ -359,6 +356,13 @@ services:
networks:
- eveai-network
networks:
eveai-network:
driver: bridge
# This enables the containers to access the host network
driver_opts:
com.docker.network.bridge.host_ipc: "true"
volumes:
minio_data:
eveai_logs:

View File

@@ -31,7 +31,6 @@ x-common-variables: &common-variables
OPENAI_API_KEY: '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'
GROQ_API_KEY: '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'
ANTHROPIC_API_KEY: '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'
PORTKEY_API_KEY: 'XvmvBFIVbm76opUxA7MNP14QmdQj'
JWT_SECRET_KEY: '0d99e810e686ea567ef305d8e9b06195c4db482952e19276590a726cde60a408'
API_ENCRYPTION_KEY: 'Ly5XYWwEKiasfAwEqdEMdwR-k0vhrq6QPYd4whEROB0='
GRAYLOG_HOST: de4zvu.stackhero-network.com

View File

@@ -1,4 +1,4 @@
ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.12.7
FROM python:${PYTHON_VERSION}-slim as base
# Prevents Python from writing pyc files.

View File

@@ -1,4 +1,4 @@
ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.12.7
FROM python:${PYTHON_VERSION}-slim as base
# Prevents Python from writing pyc files.

View File

@@ -1,4 +1,4 @@
ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.12.7
FROM python:${PYTHON_VERSION}-slim as base
# Prevents Python from writing pyc files.

View File

@@ -1,4 +1,4 @@
ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.12.7
FROM python:${PYTHON_VERSION}-slim as base
# Prevents Python from writing pyc files.

View File

@@ -1,4 +1,4 @@
ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.12.7
FROM python:${PYTHON_VERSION}-slim as base
# Prevents Python from writing pyc files.

View File

@@ -1,4 +1,4 @@
ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.12.7
FROM python:${PYTHON_VERSION}-slim as base
# Prevents Python from writing pyc files.

View File

@@ -1,4 +1,4 @@
ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.12.7
FROM python:${PYTHON_VERSION}-slim as base
# Prevents Python from writing pyc files.

View File

@@ -1,4 +1,4 @@
ARG PYTHON_VERSION=3.12.3
ARG PYTHON_VERSION=3.12.7
FROM python:${PYTHON_VERSION}-slim as base
ENV PYTHONDONTWRITEBYTECODE=1

View File

@@ -10,9 +10,10 @@ COPY ../../nginx/mime.types /etc/nginx/mime.types
# Copy static & public files
RUN mkdir -p /etc/nginx/static /etc/nginx/public
COPY ../../nginx/static /etc/nginx/static
COPY ../../integrations/Wordpress/eveai-chat-widget/css/eveai-chat-style.css /etc/nginx/static/css/
COPY ../../integrations/Wordpress/eveai-chat-widget/js/eveai-chat-widget.js /etc/nginx/static/js/
COPY ../../integrations/Wordpress/eveai-chat-widget/js/eveai-sdk.js /etc/nginx/static/js
COPY ../../integrations/Wordpress/eveai-chat/assets/css/eveai-chat-style.css /etc/nginx/static/css/
COPY ../../integrations/Wordpress/eveai-chat/assets/js/eveai-chat-widget.js /etc/nginx/static/js/
COPY ../../integrations/Wordpress/eveai-chat/assets/js/eveai-token-manager.js /etc/nginx/static/js/
COPY ../../integrations/Wordpress/eveai-chat/assets/js/eveai-sdk.js /etc/nginx/static/js
COPY ../../nginx/public /etc/nginx/public
# Copy site-specific configurations

62
docker/release_and_tag_eveai.sh Executable file
View File

@@ -0,0 +1,62 @@
#!/bin/bash
# Initialize variables
RELEASE_VERSION=""
RELEASE_MESSAGE=""
DOCKER_ACCOUNT="josakola" # Your Docker account name
# Parse input arguments
while getopts r:m: flag
do
case "${flag}" in
r) RELEASE_VERSION=${OPTARG};;
m) RELEASE_MESSAGE=${OPTARG};;
*)
echo "Usage: $0 -r <release_version> -m <release_message>"
exit 1 ;;
esac
done
# Ensure both version and message are provided
if [ -z "$RELEASE_VERSION" ]; then
echo "Error: Release version not provided. Use -r <release_version>"
exit 1
fi
if [ -z "$RELEASE_MESSAGE" ]; then
echo "Error: Release message not provided. Use -m <release_message>"
exit 1
fi
# Path to your docker-compose file
DOCKER_COMPOSE_FILE="compose_dev.yaml"
# Get all the images defined in docker-compose
IMAGES=$(docker compose -f $DOCKER_COMPOSE_FILE config | grep 'image:' | awk '{ print $2 }')
# Start tagging only relevant images
for DOCKER_IMAGE in $IMAGES; do
# Check if the image belongs to your Docker account and ends with :latest
if [[ $DOCKER_IMAGE == $DOCKER_ACCOUNT* && $DOCKER_IMAGE == *:latest ]]; then
# Remove the ":latest" tag to use the base image name
BASE_IMAGE=${DOCKER_IMAGE%:latest}
echo "Tagging Docker image: $BASE_IMAGE with version: $RELEASE_VERSION"
# Tag the 'latest' image with the new release version
docker tag $DOCKER_IMAGE $BASE_IMAGE:$RELEASE_VERSION
# Push the newly tagged image to Docker Hub
docker push $BASE_IMAGE:$RELEASE_VERSION
else
echo "Skipping image: $DOCKER_IMAGE (not part of $DOCKER_ACCOUNT or not tagged as latest)"
fi
done
# Step 3: Tag the Git repository with the release version
echo "Tagging Git repository with version: $RELEASE_VERSION"
git tag -a v$RELEASE_VERSION -m "Release $RELEASE_VERSION: $RELEASE_MESSAGE"
git push origin v$RELEASE_VERSION
echo -e "\033[35mRelease process completed for version: $RELEASE_VERSION \033[0m"
echo -e "\033[35mFinished at $(date +"%d/%m/%Y %H:%M:%S")\033[0m"

View File

@@ -1,9 +1,16 @@
import traceback
from flask import Flask, jsonify, request
from flask_jwt_extended import get_jwt_identity, verify_jwt_in_request
from common.extensions import db, api_rest, jwt, minio_client, simple_encryption
from sqlalchemy.exc import SQLAlchemyError
from werkzeug.exceptions import HTTPException
from common.extensions import db, api_rest, jwt, minio_client, simple_encryption, cors
import os
import logging.config
from common.models.user import TenantDomain
from common.utils.cors_utils import get_allowed_origins
from common.utils.database import Database
from config.logging_config import LOGGING
from .api.document_api import document_ns
@@ -45,56 +52,55 @@ def create_app(config_file=None):
# Register Blueprints
register_blueprints(app)
# Error handler for the API
@app.errorhandler(EveAIException)
def handle_eveai_exception(error):
return {'message': str(error)}, error.status_code
# Register Error Handlers
register_error_handlers(app)
@app.before_request
def before_request():
app.logger.debug(f'Before request: {request.method} {request.path}')
app.logger.debug(f'Request URL: {request.url}')
app.logger.debug(f'Request headers: {dict(request.headers)}')
def check_cors():
if request.method == 'OPTIONS':
app.logger.debug("Handling OPTIONS request")
return '', 200 # Allow OPTIONS to pass through
# Log request arguments
app.logger.debug(f'Request args: {request.args}')
origin = request.headers.get('Origin')
if not origin:
return # Not a CORS request
# Log form data if it's a POST request
if request.method == 'POST':
app.logger.debug(f'Form data: {request.form}')
# Log JSON data if the content type is application/json
if request.is_json:
app.logger.debug(f'JSON data: {request.json}')
# Log raw data for other content types
if request.data:
app.logger.debug(f'Raw data: {request.data}')
# Check if this is a request to the token endpoint
if request.path == '/api/v1/auth/token' and request.method == 'POST':
app.logger.debug('Token request detected, skipping JWT verification')
# Get tenant ID from request
if verify_jwt_in_request():
tenant_id = get_jwt_identity()
if not tenant_id:
return
else:
return
# Check if origin is allowed for this tenant
allowed_origins = get_allowed_origins(tenant_id)
if origin not in allowed_origins:
app.logger.warning(f'Origin {origin} not allowed for tenant {tenant_id}')
return {'error': 'Origin not allowed'}, 403
@app.before_request
def set_tenant_schema():
# Check if this a health check request
if request.path.startswith('/_healthz') or request.path.startswith('/healthz'):
app.logger.debug('Health check request detected, skipping JWT verification')
pass
else:
try:
verify_jwt_in_request(optional=True)
tenant_id = get_jwt_identity()
app.logger.debug(f'Tenant ID from JWT: {tenant_id}')
if tenant_id:
Database(tenant_id).switch_schema()
app.logger.debug(f'Switched to schema for tenant {tenant_id}')
else:
app.logger.debug('No tenant ID found in JWT')
except Exception as e:
app.logger.error(f'Error in before_request: {str(e)}')
# Don't raise the exception here, let the request continue
# The appropriate error handling will be done in the specific endpoints
@app.route('/api/v1')
def swagger():
return api_rest.render_doc()
return app
@@ -104,6 +110,17 @@ def register_extensions(app):
jwt.init_app(app)
minio_client.init_app(app)
simple_encryption.init_app(app)
cors.init_app(app, resources={
r"/api/v1/*": {
"origins": "*",
"methods": ["GET", "POST", "PUT", "OPTIONS"],
"allow_headers": ["Content-Type", "Authorization", "X-Requested-With"],
"expose_headers": ["Content-Length", "Content-Range"],
"supports_credentials": True,
"max_age": 1728000, # 20 days
"allow_credentials": True
}
})
def register_namespaces(app):
@@ -115,3 +132,61 @@ def register_blueprints(app):
from .views.healthz_views import healthz_bp
app.register_blueprint(healthz_bp)
def register_error_handlers(app):
@app.errorhandler(Exception)
def handle_exception(e):
"""Handle all unhandled exceptions with detailed error responses"""
# Get the current exception info
exc_info = traceback.format_exc()
# Log the full exception details
app.logger.error(f"Unhandled exception: {str(e)}\n{exc_info}")
# Start with a default error response
response = {
"error": "Internal Server Error",
"message": str(e),
"type": e.__class__.__name__
}
status_code = 500
# Handle specific types of exceptions
if isinstance(e, HTTPException):
status_code = e.code
response["error"] = e.name
elif isinstance(e, SQLAlchemyError):
response["error"] = "Database Error"
response["details"] = str(e.__cause__ or e)
elif isinstance(e, ValueError):
status_code = 400
response["error"] = "Invalid Input"
# In development, include additional debug information
if app.debug:
response["debug"] = {
"exception": exc_info,
"class": e.__class__.__name__,
"module": e.__class__.__module__
}
return jsonify(response), status_code
@app.errorhandler(404)
def not_found_error(e):
return jsonify({
"error": "Not Found",
"message": str(e),
"type": "NotFoundError"
}), 404
@app.errorhandler(400)
def bad_request_error(e):
return jsonify({
"error": "Bad Request",
"message": str(e),
"type": "BadRequestError"
}), 400

View File

@@ -1,8 +1,8 @@
from datetime import timedelta
from datetime import timedelta, datetime as dt, timezone as tz
from flask_restx import Namespace, Resource, fields
from flask_jwt_extended import create_access_token
from common.models.user import Tenant
from flask_jwt_extended import create_access_token, verify_jwt_in_request, get_jwt
from common.models.user import Tenant, TenantProject
from common.extensions import simple_encryption
from flask import current_app, request
@@ -18,6 +18,12 @@ token_response = auth_ns.model('TokenResponse', {
'expires_in': fields.Integer(description='Token expiration time in seconds')
})
token_verification = auth_ns.model('TokenVerification', {
'is_valid': fields.Boolean(description='Token validity status'),
'expires_in': fields.Integer(description='Seconds until token expiration'),
'tenant_id': fields.Integer(description='Tenant ID from token')
})
@auth_ns.route('/token')
class Token(Resource):
@@ -30,42 +36,51 @@ class Token(Resource):
"""
Get JWT token
"""
current_app.logger.debug(f"Token endpoint called with data: {request.json}")
current_app.logger.debug(f'Token Requested {auth_ns.payload}')
try:
tenant_id = auth_ns.payload['tenant_id']
tenant_id = int(auth_ns.payload['tenant_id'])
api_key = auth_ns.payload['api_key']
except KeyError as e:
current_app.logger.error(f"Missing required field: {e}")
return {'message': f"Missing required field: {e}"}, 400
current_app.logger.debug(f"Querying database for tenant: {tenant_id}")
tenant = Tenant.query.get(tenant_id)
if not tenant:
current_app.logger.error(f"Tenant not found: {tenant_id}")
return {'message': "Tenant not found"}, 404
return {'message': f"Authentication invalid for tenant {tenant_id}"}, 404
current_app.logger.debug(f"Tenant found: {tenant.id}")
projects = TenantProject.query.filter_by(
tenant_id=tenant_id,
active=True
).all()
try:
current_app.logger.debug("Attempting to decrypt API key")
decrypted_api_key = simple_encryption.decrypt_api_key(tenant.encrypted_api_key)
except Exception as e:
current_app.logger.error(f"Error decrypting API key: {e}")
return {'message': "Internal server error"}, 500
# Find project with matching API key
matching_project = None
for project in projects:
try:
decrypted_key = simple_encryption.decrypt_api_key(project.encrypted_api_key)
if decrypted_key == api_key:
matching_project = project
break
except Exception as e:
current_app.logger.error(f"Error decrypting API key for project {project.id}: {e}")
continue
if api_key != decrypted_api_key:
current_app.logger.error(f"Invalid API key for tenant: {tenant_id}")
if not matching_project:
current_app.logger.error(f"Project for given API key not found for Tenant: {tenant_id}")
return {'message': "Invalid API key"}, 401
if "DOCAPI" not in matching_project.services:
current_app.logger.error(f"Service DOCAPI not authorized for Project {matching_project.name} "
f"for Tenant: {tenant_id}")
return {'message': f"Service DOCAPI not authorized for Project {matching_project.name}"}, 403
# Get the JWT_ACCESS_TOKEN_EXPIRES setting from the app config
expires_delta = current_app.config.get('JWT_ACCESS_TOKEN_EXPIRES', timedelta(minutes=15))
try:
current_app.logger.debug(f"Creating access token for tenant: {tenant_id}")
access_token = create_access_token(identity=tenant_id, expires_delta=expires_delta)
current_app.logger.debug("Access token created successfully")
return {
'access_token': access_token,
'expires_in': expires_delta.total_seconds()
@@ -73,3 +88,61 @@ class Token(Resource):
except Exception as e:
current_app.logger.error(f"Error creating access token: {e}")
return {'message': "Internal server error"}, 500
@auth_ns.route('/verify')
class TokenVerification(Resource):
@auth_ns.doc('verify_token')
@auth_ns.response(200, 'Token verification result', token_verification)
@auth_ns.response(401, 'Invalid token')
def get(self):
"""Verify a token's validity and get expiration information"""
try:
verify_jwt_in_request()
jwt_data = get_jwt()
# Get expiration timestamp from token
exp_timestamp = jwt_data['exp']
current_timestamp = dt.now().timestamp()
return {
'is_valid': True,
'expires_in': int(exp_timestamp - current_timestamp),
'tenant_id': jwt_data['sub'] # tenant_id is stored in 'sub' claim
}, 200
except Exception as e:
current_app.logger.error(f"Token verification failed: {str(e)}")
return {
'is_valid': False,
'message': 'Invalid token'
}, 401
@auth_ns.route('/refresh')
class TokenRefresh(Resource):
@auth_ns.doc('refresh_token')
@auth_ns.response(200, 'New token', token_response)
@auth_ns.response(401, 'Invalid token')
def post(self):
"""Get a new token before the current one expires"""
try:
verify_jwt_in_request()
jwt_data = get_jwt()
tenant_id = jwt_data['sub']
# Optional: Add additional verification here if needed
# Create new token
expires_delta = current_app.config.get('JWT_ACCESS_TOKEN_EXPIRES', timedelta(minutes=15))
new_token = create_access_token(
identity=tenant_id,
expires_delta=expires_delta
)
return {
'access_token': new_token,
'expires_in': int(expires_delta.total_seconds())
}, 200
except Exception as e:
current_app.logger.error(f"Token refresh failed: {str(e)}")
return {'message': 'Token refresh failed'}, 401

View File

@@ -10,9 +10,10 @@ from werkzeug.utils import secure_filename
from common.utils.document_utils import (
create_document_stack, process_url, start_embedding_task,
validate_file_type, EveAIInvalidLanguageException, EveAIDoubleURLException, EveAIUnsupportedFileType,
process_multiple_urls, get_documents_list, edit_document, refresh_document, edit_document_version,
get_documents_list, edit_document, refresh_document, edit_document_version,
refresh_document_with_info
)
from common.utils.eveai_exceptions import EveAIException
def validate_date(date_str):
@@ -33,6 +34,7 @@ document_ns = Namespace('documents', description='Document related operations')
# Define models for request parsing and response serialization
upload_parser = reqparse.RequestParser()
upload_parser.add_argument('catalog_id', location='form', type=int, required=True, help='The catalog to add the file to')
upload_parser.add_argument('file', location='files', type=FileStorage, required=True, help='The file to upload')
upload_parser.add_argument('name', location='form', type=str, required=False, help='Name of the document')
upload_parser.add_argument('language', location='form', type=str, required=True, help='Language of the document')
@@ -42,6 +44,9 @@ upload_parser.add_argument('valid_from', location='form', type=validate_date, re
help='Valid from date for the document (ISO format)')
upload_parser.add_argument('user_metadata', location='form', type=validate_json, required=False,
help='User metadata for the document (JSON format)')
upload_parser.add_argument('catalog_properties', location='form', type=validate_json, required=False,
help='The catalog configuration to be passed along (JSON format). Validity is against catalog requirements '
'is not checked, and is the responsibility of the calling client.')
add_document_response = document_ns.model('AddDocumentResponse', {
'message': fields.String(description='Status message'),
@@ -75,11 +80,13 @@ class AddDocument(Resource):
validate_file_type(extension)
api_input = {
'catalog_id': args.get('catalog_id'),
'name': args.get('name') or filename,
'language': args.get('language'),
'user_context': args.get('user_context'),
'valid_from': args.get('valid_from'),
'user_metadata': args.get('user_metadata'),
'catalog_properties': args.get('catalog_properties'),
}
new_doc, new_doc_vers = create_document_stack(api_input, file, filename, extension, tenant_id)
@@ -102,13 +109,18 @@ class AddDocument(Resource):
# Models for AddURL
add_url_model = document_ns.model('AddURL', {
'catalog_id': fields.Integer(required='True', description='ID of the catalog the URL needs to be added to'),
'url': fields.String(required=True, description='URL of the document to add'),
'name': fields.String(required=False, description='Name of the document'),
'language': fields.String(required=True, description='Language of the document'),
'user_context': fields.String(required=False, description='User context for the document'),
'valid_from': fields.String(required=False, description='Valid from date for the document'),
'user_metadata': fields.String(required=False, description='User metadata for the document'),
'system_metadata': fields.String(required=False, description='System metadata for the document')
'system_metadata': fields.String(required=False, description='System metadata for the document'),
'catalog_properties': fields.String(required=False, description='The catalog configuration to be passed along (JSON '
'format). Validity is against catalog requirements '
'is not checked, and is the responsibility of the '
'calling client.'),
})
add_url_response = document_ns.model('AddURLResponse', {
@@ -138,12 +150,14 @@ class AddURL(Resource):
file_content, filename, extension = process_url(args['url'], tenant_id)
api_input = {
'catalog_id': args['catalog_id'],
'url': args['url'],
'name': args.get('name') or filename,
'language': args['language'],
'user_context': args.get('user_context'),
'valid_from': args.get('valid_from'),
'user_metadata': args.get('user_metadata'),
'catalog_properties': args.get('catalog_properties'),
}
new_doc, new_doc_vers = create_document_stack(api_input, file_content, filename, extension, tenant_id)
@@ -199,21 +213,31 @@ class DocumentResource(Resource):
@document_ns.doc('edit_document')
@document_ns.expect(edit_document_model)
@document_ns.response(200, 'Document updated successfully')
@document_ns.response(400, 'Validation Error')
@document_ns.response(404, 'Document not found')
@document_ns.response(500, 'Internal Server Error')
def put(self, document_id):
"""Edit a document"""
data = request.json
updated_doc, error = edit_document(document_id, data['name'], data.get('valid_from'), data.get('valid_to'))
if updated_doc:
return {'message': f'Document {updated_doc.id} updated successfully'}, 200
else:
return {'message': f'Error updating document: {error}'}, 400
try:
current_app.logger.debug(f'Editing document {document_id}')
data = request.json
tenant_id = get_jwt_identity()
updated_doc, error = edit_document(tenant_id, document_id, data.get('name', None),
data.get('valid_from', None), data.get('valid_to', None))
if updated_doc:
return {'message': f'Document {updated_doc.id} updated successfully'}, 200
else:
return {'message': f'Error updating document: {error}'}, 400
except EveAIException as e:
return e.to_dict(), e.status_code
@jwt_required()
@document_ns.doc('refresh_document')
@document_ns.response(200, 'Document refreshed successfully')
def post(self, document_id):
"""Refresh a document"""
new_version, result = refresh_document(document_id)
tenant_id = get_jwt_identity()
new_version, result = refresh_document(document_id, tenant_id)
if new_version:
return {'message': f'Document refreshed. New version: {new_version.id}. Task ID: {result}'}, 200
else:
@@ -222,6 +246,7 @@ class DocumentResource(Resource):
edit_document_version_model = document_ns.model('EditDocumentVersion', {
'user_context': fields.String(required=True, description='New user context for the document version'),
'catalog_properties': fields.String(required=True, description='New catalog properties for the document version'),
})
@@ -234,7 +259,8 @@ class DocumentVersionResource(Resource):
def put(self, version_id):
"""Edit a document version"""
data = request.json
updated_version, error = edit_document_version(version_id, data['user_context'])
tenant_id = get_jwt_identity()
updated_version, error = edit_document_version(tenant_id, version_id, data['user_context'], data.get('catalog_properties'))
if updated_version:
return {'message': f'Document Version {updated_version.id} updated successfully'}, 200
else:
@@ -246,7 +272,8 @@ refresh_document_model = document_ns.model('RefreshDocument', {
'name': fields.String(required=False, description='New name for the document'),
'language': fields.String(required=False, description='Language of the document'),
'user_context': fields.String(required=False, description='User context for the document'),
'user_metadata': fields.Raw(required=False, description='User metadata for the document')
'user_metadata': fields.Raw(required=False, description='User metadata for the document'),
'catalog_properties': fields.Raw(required=False, description='Catalog properties for the document'),
})
@@ -263,7 +290,7 @@ class RefreshDocument(Resource):
current_app.logger.info(f'Refreshing document {document_id} for tenant {tenant_id}')
try:
new_version, result = refresh_document(document_id)
new_version, result = refresh_document(document_id, tenant_id)
if new_version:
return {
@@ -296,7 +323,7 @@ class RefreshDocumentWithInfo(Resource):
try:
api_input = request.json
new_version, result = refresh_document_with_info(document_id, api_input)
new_version, result = refresh_document_with_info(document_id, tenant_id, api_input)
if new_version:
return {

View File

@@ -7,7 +7,7 @@ from werkzeug.middleware.proxy_fix import ProxyFix
import logging.config
from common.extensions import (db, migrate, bootstrap, security, mail, login_manager, cors, csrf, session,
minio_client, simple_encryption, metrics)
minio_client, simple_encryption, metrics, cache_manager)
from common.models.user import User, Role, Tenant, TenantDomain
import common.models.interaction
import common.models.entitlements
@@ -119,6 +119,7 @@ def register_extensions(app):
simple_encryption.init_app(app)
session.init_app(app)
minio_client.init_app(app)
cache_manager.init_app(app)
metrics.init_app(app)

View File

@@ -1,4 +1,4 @@
from flask import render_template, request, jsonify, redirect
from flask import render_template, request, jsonify, redirect, current_app
from flask_login import current_user
from common.utils.nginx_utils import prefixed_url_for
@@ -6,24 +6,28 @@ from common.utils.nginx_utils import prefixed_url_for
def not_found_error(error):
if not current_user.is_authenticated:
return redirect(prefixed_url_for('security.login'))
current_app.logger.error(f"Not Found Error: {error}")
return render_template('error/404.html'), 404
def internal_server_error(error):
if not current_user.is_authenticated:
return redirect(prefixed_url_for('security.login'))
current_app.logger.error(f"Internal Server Error: {error}")
return render_template('error/500.html'), 500
def not_authorised_error(error):
if not current_user.is_authenticated:
return redirect(prefixed_url_for('security.login'))
current_app.logger.error(f"Not Authorised Error: {error}")
return render_template('error/401.html')
def access_forbidden(error):
if not current_user.is_authenticated:
return redirect(prefixed_url_for('security.login'))
current_app.logger.error(f"Access Forbidden: {error}")
return render_template('error/403.html')
@@ -32,6 +36,7 @@ def key_error_handler(error):
if str(error) == "'tenant'":
return redirect(prefixed_url_for('security.login'))
# For other KeyErrors, you might want to log the error and return a generic error page
current_app.logger.error(f"Key Error: {error}")
return render_template('error/generic.html', error_message="An unexpected error occurred"), 500

View File

@@ -11,9 +11,17 @@
{{ form.hidden_tag() }}
{% set disabled_fields = [] %}
{% set exclude_fields = [] %}
{% for field in form %}
{% for field in form.get_static_fields() %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
{% for collection_name, fields in form.get_dynamic_fields().items() %}
{% if fields|length > 0 %}
<h4 class="mt-4">{{ collection_name }}</h4>
{% endif %}
{% for field in fields %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
{% endfor %}
<button type="submit" class="btn btn-primary">Add Document</button>
</form>
{% endblock %}

View File

@@ -0,0 +1,23 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Catalog Registration{% endblock %}
{% block content_title %}Register Catalog{% endblock %}
{% block content_description %}Define a new catalog of documents in Evie's Library{% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = [] %}
{% set exclude_fields = [] %}
{% for field in form %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<button type="submit" class="btn btn-primary">Register Catalog</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

View File

@@ -0,0 +1,24 @@
{% extends 'base.html' %}
{% from 'macros.html' import render_selectable_table, render_pagination %}
{% block title %}Documents{% endblock %}
{% block content_title %}Catalogs{% endblock %}
{% block content_description %}View Catalogs for Tenant{% endblock %}
{% block content_class %}<div class="col-xl-12 col-lg-5 col-md-7 mx-auto"></div>{% endblock %}
{% block content %}
<div class="container">
<form method="POST" action="{{ url_for('document_bp.handle_catalog_selection') }}">
{{ render_selectable_table(headers=["Catalog ID", "Name", "Type"], rows=rows, selectable=True, id="catalogsTable") }}
<div class="form-group mt-3">
<button type="submit" name="action" value="set_session_catalog" class="btn btn-primary">Set Session Catalog</button>
<button type="submit" name="action" value="edit_catalog" class="btn btn-primary">Edit Catalog</button>
</div>
</form>
</div>
{% endblock %}
{% block content_footer %}
{{ render_pagination(pagination, 'document_bp.catalogs') }}
{% endblock %}

View File

@@ -23,15 +23,23 @@
{{ render_collapsible_section('Filter', 'Filter Options', filter_form) }}
<!-- Document Versions Table -->
{{ render_selectable_sortable_table(
headers=["ID", "File Type", "Processing", "Processing Start", "Processing Finish", "Processing Error"],
rows=rows,
selectable=True,
id="documentVersionsTable",
sort_by=sort_by,
sort_order=sort_order
) }}
<div class="form-group mt-3">
<form method="POST" action="{{ url_for('document_bp.handle_document_version_selection') }}">
<!-- Document Versions Table -->
{{ render_selectable_sortable_table(
headers=["ID", "File Type", "Processing", "Processing Start", "Processing Finish", "Processing Error"],
rows=rows,
selectable=True,
id="documentVersionsTable",
sort_by=sort_by,
sort_order=sort_order
) }}
<div class="form-group mt-4">
<button type="submit" name="action" value="edit_document_version" class="btn btn-primary">Edit Document Version</button>
<button type="submit" name="action" value="process_document_version" class="btn btn-danger">Process Document Version</button>
</div>
</form>
</div>
{% endblock %}
{% block content_footer %}

View File

@@ -1,5 +1,5 @@
{% extends 'base.html' %}
{% from 'macros.html' import render_selectable_table, render_pagination %}
{% from 'macros.html' import render_selectable_table, render_pagination, render_filter_field, render_date_filter_field, render_collapsible_section, render_selectable_sortable_table_with_dict_headers %}
{% block title %}Documents{% endblock %}
@@ -8,18 +8,88 @@
{% block content_class %}<div class="col-xl-12 col-lg-5 col-md-7 mx-auto"></div>{% endblock %}
{% block content %}
<div class="container">
<form method="POST" action="{{ url_for('document_bp.handle_document_selection') }}">
{{ render_selectable_table(headers=["Document ID", "Name", "Valid From", "Valid To"], rows=rows, selectable=True, id="documentsTable") }}
<div class="form-group mt-3">
<button type="submit" name="action" value="edit_document" class="btn btn-primary">Edit Document</button>
<button type="submit" name="action" value="document_versions" class="btn btn-secondary">Show Document Versions</button>
<button type="submit" name="action" value="refresh_document" class="btn btn-secondary">Refresh Document (new version)</button>
</div>
</form>
</div>
<!-- Filter Form -->
{% set filter_form %}
<form method="GET" action="{{ url_for('document_bp.documents') }}">
{{ render_filter_field('catalog_id', 'Catalog', filter_options['catalog_id'], filters.get('catalog_id', [])) }}
{{ render_filter_field('validity', 'Validity', filter_options['validity'], filters.get('validity', [])) }}
<button type="submit" class="btn btn-primary">Apply Filters</button>
</form>
{% endset %}
{{ render_collapsible_section('Filter', 'Filter Options', filter_form) }}
<div class="form-group mt-3">
<form method="POST" action="{{ url_for('document_bp.handle_document_selection') }}">
<!-- Documents Table -->
{{ render_selectable_sortable_table_with_dict_headers(
headers=[
{"text": "ID", "sort": "id"},
{"text": "Name", "sort": "name"},
{"text": "Catalog", "sort": "catalog_name"},
{"text": "Valid From", "sort": "valid_from"},
{"text": "Valid To", "sort": "valid_to"}
],
rows=rows,
selectable=True,
id="documentsTable",
sort_by=sort_by,
sort_order=sort_order
) }}
<div class="form-group mt-4">
<button type="submit" name="action" value="edit_document" class="btn btn-primary">Edit Document</button>
<button type="submit" name="action" value="document_versions" class="btn btn-secondary">Show Document Versions</button>
<button type="submit" name="action" value="refresh_document" class="btn btn-secondary">Refresh Document (new version)</button>
</div>
</form>
</div>
{% endblock %}
{% block content_footer %}
{{ render_pagination(pagination, 'document_bp.documents') }}
{% endblock %}
{% block scripts %}
<script>
document.addEventListener('DOMContentLoaded', function() {
const table = document.getElementById('documentsTable');
const headers = table.querySelectorAll('th.sortable');
headers.forEach(header => {
header.addEventListener('click', function() {
const sortBy = this.dataset.sort;
let sortOrder = 'asc';
if (this.querySelector('.fa-sort-up')) {
sortOrder = 'desc';
} else if (this.querySelector('.fa-sort-down')) {
sortOrder = 'none';
}
window.location.href = updateQueryStringParameter(window.location.href, 'sort_by', sortBy);
window.location.href = updateQueryStringParameter(window.location.href, 'sort_order', sortOrder);
});
});
function updateQueryStringParameter(uri, key, value) {
var re = new RegExp("([?&])" + key + "=.*?(&|$)", "i");
var separator = uri.indexOf('?') !== -1 ? "&" : "?";
if (uri.match(re)) {
return uri.replace(re, '$1' + key + "=" + value + '$2');
}
else {
return uri + separator + key + "=" + value;
}
}
table.addEventListener('change', function(event) {
if (event.target.type === 'radio') {
var selectedRow = event.target.closest('tr');
var documentId = selectedRow.cells[1].textContent;
console.log('Selected Document ID:', documentId);
}
});
});
</script>
{% endblock %}

View File

@@ -0,0 +1,35 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Edit Catalog{% endblock %}
{% block content_title %}Edit Catalog{% endblock %}
{% block content_description %}Edit a catalog of documents in Evie's Library.
When you change chunking of embedding information, you'll need to manually refresh the library if you want immediate impact.
{% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = ['type'] %}
{% set exclude_fields = [] %}
<!-- Render Static Fields -->
{% for field in form.get_static_fields() %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<!-- Render Dynamic Fields -->
{% for collection_name, fields in form.get_dynamic_fields().items() %}
{% if fields|length > 0 %}
<h4 class="mt-4">{{ collection_name }}</h4>
{% endif %}
{% for field in fields %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
{% endfor %}
<button type="submit" class="btn btn-primary">Save Retriever</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

View File

@@ -8,11 +8,17 @@
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = [] %}
{% set exclude_fields = [] %}
{% for field in form %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
{% set disabled_fields = [] %}
{% set exclude_fields = [] %}
{{ render_field(form.name, disabled_fields, exclude_fields) }}
{{ render_field(form.valid_from, disabled_fields, exclude_fields) }}
{{ render_field(form.valid_to, disabled_fields, exclude_fields) }}
<div class="form-group">
<label for="catalog_name">Catalog</label>
<input type="text" class="form-control" id="catalog_name" value="{{ catalog_name }}" readonly>
</div>
<button type="submit" class="btn btn-primary">Update Document</button>
</form>
{% endblock %}
{% endblock %}

View File

@@ -0,0 +1,33 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Edit Processor{% endblock %}
{% block content_title %}Edit Processor{% endblock %}
{% block content_description %}Edit a Processor (for a Catalog){% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = ['type'] %}
{% set exclude_fields = [] %}
<!-- Render Static Fields -->
{% for field in form.get_static_fields() %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<!-- Render Dynamic Fields -->
{% for collection_name, fields in form.get_dynamic_fields().items() %}
{% if fields|length > 0 %}
<h4 class="mt-4">{{ collection_name }}</h4>
{% endif %}
{% for field in fields %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
{% endfor %}
<button type="submit" class="btn btn-primary">Save Processor</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

View File

@@ -0,0 +1,33 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Edit Retriever{% endblock %}
{% block content_title %}Edit Retriever{% endblock %}
{% block content_description %}Edit a Retriever (for a Catalog){% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = ['type'] %}
{% set exclude_fields = [] %}
<!-- Render Static Fields -->
{% for field in form.get_static_fields() %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<!-- Render Dynamic Fields -->
{% for collection_name, fields in form.get_dynamic_fields().items() %}
{% if fields|length > 0 %}
<h4 class="mt-4">{{ collection_name }}</h4>
{% endif %}
{% for field in fields %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
{% endfor %}
<button type="submit" class="btn btn-primary">Save Retriever</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

View File

@@ -10,6 +10,17 @@
<div class="container">
<form method="POST" action="{{ url_for('document_bp.handle_library_selection') }}">
<div class="form-group mt-3">
<h2>Create Default RAG Library</h2>
<p>This function will create a default library setup for RAG purposes. More specifically, it will create:</p>
<ul>
<li>A default RAG Catalog</li>
<li>A Default HTML Processor</li>
<li>A default RAG Retriever</li>
<li>A default RAG Specialist</li>
</ul>
<p>This enables a quick start-up for standard Ask Eve AI functionality. All elements can be changed later on an individual basis.</p>
<button type="submit" name="action" value="create_default_rag_library" class="btn btn-danger">Create Default RAG Library</button>
<h2>Re-Embed Latest Versions</h2>
<p>This functionality will re-apply embeddings on the latest versions of all documents in the library.
This is useful only while tuning the embedding parameters, or when changing embedding algorithms.
@@ -17,6 +28,7 @@
use it with caution!
</p>
<button type="submit" name="action" value="re_embed_latest_versions" class="btn btn-danger">Re-embed Latest Versions (expensive)</button>
<h2>Refresh all documents</h2>
<p>This operation will create new versions of all documents in the library with a URL. Documents that were uploaded directly,
cannot be automatically refreshed. This is an expensive operation, and impacts the performance of the system in future use.

View File

@@ -0,0 +1,23 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Processor Registration{% endblock %}
{% block content_title %}Register Processor{% endblock %}
{% block content_description %}Define a new processor (for a catalog){% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = [] %}
{% set exclude_fields = [] %}
{% for field in form %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<button type="submit" class="btn btn-primary">Register Processor</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

View File

@@ -0,0 +1,23 @@
{% extends 'base.html' %}
{% from 'macros.html' import render_selectable_table, render_pagination %}
{% block title %}Processors{% endblock %}
{% block content_title %}Processors{% endblock %}
{% block content_description %}View Processors for Tenant{% endblock %}
{% block content_class %}<div class="col-xl-12 col-lg-5 col-md-7 mx-auto"></div>{% endblock %}
{% block content %}
<div class="container">
<form method="POST" action="{{ url_for('document_bp.handle_processor_selection') }}">
{{ render_selectable_table(headers=["Processor ID", "Name", "Type", "Catalog ID"], rows=rows, selectable=True, id="retrieversTable") }}
<div class="form-group mt-3">
<button type="submit" name="action" value="edit_processor" class="btn btn-primary">Edit Processor</button>
</div>
</form>
</div>
{% endblock %}
{% block content_footer %}
{{ render_pagination(pagination, 'document_bp.processors') }}
{% endblock %}

View File

@@ -0,0 +1,23 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Retriever Registration{% endblock %}
{% block content_title %}Register Retriever{% endblock %}
{% block content_description %}Define a new retriever (for a catalog){% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = [] %}
{% set exclude_fields = [] %}
{% for field in form %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<button type="submit" class="btn btn-primary">Register Retriever</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

View File

@@ -0,0 +1,23 @@
{% extends 'base.html' %}
{% from 'macros.html' import render_selectable_table, render_pagination %}
{% block title %}Retrievers{% endblock %}
{% block content_title %}Retrievers{% endblock %}
{% block content_description %}View Retrievers for Tenant{% endblock %}
{% block content_class %}<div class="col-xl-12 col-lg-5 col-md-7 mx-auto"></div>{% endblock %}
{% block content %}
<div class="container">
<form method="POST" action="{{ url_for('document_bp.handle_retriever_selection') }}">
{{ render_selectable_table(headers=["Retriever ID", "Name", "Type", "Catalog ID"], rows=rows, selectable=True, id="retrieversTable") }}
<div class="form-group mt-3">
<button type="submit" name="action" value="edit_retriever" class="btn btn-primary">Edit Retriever</button>
</div>
</form>
</div>
{% endblock %}
{% block content_footer %}
{{ render_pagination(pagination, 'document_bp.retrievers') }}
{% endblock %}

View File

@@ -0,0 +1,28 @@
{% extends "email/base.html" %}
{% block content %}
<p>Hello,</p>
<p>A new API project has been created for your Ask Eve AI tenant. Here are the details:</p>
<div class="info-box">
<p><strong>Tenant ID:</strong> {{ tenant_id }}</p>
<p><strong>Tenant Name:</strong> {{ tenant_name }}</p>
<p><strong>Project Name:</strong> {{ project_name }}</p>
<p><strong>API Key:</strong> <span style="font-family: monospace; background-color: #f0f0f0; padding: 5px;">{{ api_key }}</span></p>
<div style="margin-top: 15px;">
<p><strong>Enabled Services:</strong></p>
<ul style="list-style-type: none; padding-left: 0;">
{% for service in services %}
<li>✓ {{ service }}</li>
{% endfor %}
</ul>
</div>
</div>
<div class="warning-box">
<strong>Important:</strong> Please store this API key securely. It cannot be retrieved once this email is gone.
</div>
<p>You can start using this API key right away to interact with our services. For documentation and usage examples, please visit our <a href="https://docs.askeveai.com">documentation</a>.</p>
{% endblock %}

View File

@@ -0,0 +1,106 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>{{ subject|default('Message from Ask Eve AI') }}</title>
<style>
.email-container {
font-family: Tahoma, Geneva, sans-serif;
max-width: 600px;
margin: 0 auto;
}
.header {
text-align: center;
padding: 20px;
}
.header img {
max-width: 200px;
}
.footer {
text-align: center;
padding: 20px;
background-color: #f8f9fa;
}
.signature {
font-style: italic;
margin: 20px 0;
}
.footer-text {
font-size: 12px;
color: #666;
}
.footer img {
max-width: 100%;
height: auto;
width: 600px; /* Match the container width */
display: block;
margin: 20px auto;
}
@media only screen and (max-width: 600px) {
.footer img {
width: 100%;
}
}
.social-links {
margin: 20px 0;
}
.social-links a {
margin: 0 10px;
color: #0066cc;
text-decoration: none;
}
.info-box {
background-color: #f8f9fa;
border-left: 4px solid #0066cc;
padding: 15px;
margin: 20px 0;
}
.warning-box {
background-color: #fff3cd;
border-left: 4px solid #ffc107;
padding: 15px;
margin: 20px 0;
}
</style>
</head>
<body>
<div class="email-container">
<div class="header">
<img src="https://askeveai.com/wp-content/uploads/2024/07/Logo-Square-small.png" alt="Ask Eve AI Logo">
</div>
<div class="content-wrapper">
{% block content %}{% endblock %}
</div>
<div class="footer">
<div class="signature">
Best regards,<br>
Evie
</div>
{% if promo_image_url %}
<a href="https://www.askeveai.com">
<img src="{{ promo_image_url }}" alt="Ask Eve AI Promotion">
</a>
{% endif %}
<div class="social-links">
<a href="https://twitter.com/askeveai">Twitter</a>
<a href="https://linkedin.com/company/ask-eve-ai">LinkedIn</a>
</div>
<div class="footer-text">
© {{ year }} Ask Eve AI. All rights reserved.<br>
<a href="https://www.askeveai.com/privacy">Privacy Policy</a> |
<a href="https://www.askeveai.com/terms">Terms of Service</a>
{% if unsubscribe_url %}
| <a href="{{ unsubscribe_url }}">Unsubscribe</a>
{% endif %}
</div>
</div>
</div>
</body>
</html>

View File

@@ -0,0 +1,25 @@
{% extends 'base.html' %}
{% from "macros.html" import render_selectable_table, render_pagination %}
{% block title %}View Licenses{% endblock %}
{% block content_title %}View Licenses{% endblock %}
{% block content_description %}View Licenses{% endblock %}
{% block content %}
<form action="{{ url_for('entitlements_bp.handle_license_selection') }}" method="POST">
{{ render_selectable_table(headers=["License ID", "Name", "Start Date", "End Date", "Active"], rows=rows, selectable=True, id="licensesTable") }}
<div class="form-group mt-3">
<button type="submit" name="action" value="edit_license" class="btn btn-primary">Edit License</button>
<!-- Additional buttons can be added here for other actions -->
</div>
</form>
{% endblock %}
{% block content_footer %}
{{ render_pagination(pagination, 'entitlements_bp.view_licenses') }}
{% endblock %}
{% block scripts %}
{% endblock %}

View File

@@ -7,7 +7,7 @@
{% block content_description %}View License Usage{% endblock %}
{% block content %}
<form action="{{ url_for('user_bp.handle_user_action') }}" method="POST">
<form action="{{ url_for('entitlements_bp.handle_usage_selection') }}" method="POST">
{{ render_selectable_table(headers=["Usage ID", "Start Date", "End Date", "Storage (MiB)", "Embedding (MiB)", "Interaction (tokens)"], rows=rows, selectable=False, id="usagesTable") }}
<!-- <div class="form-group mt-3">-->
<!-- <button type="submit" name="action" value="edit_user" class="btn btn-primary">Edit Selected User</button>-->
@@ -20,7 +20,7 @@
{% endblock %}
{% block content_footer %}
{{ render_pagination(pagination, 'user_bp.select_tenant') }}
{{ render_pagination(pagination, 'entitlements_bp.view_usages') }}
{% endblock %}
{% block scripts %}

View File

@@ -1,5 +1,5 @@
<header class="header-2">
<div class="page-header min-vh-25" style="background-image: url({{url_for('static', filename='/assets/img/EveAI_bg.jpg')}})" loading="lazy">
<div class="page-header min-vh-25" style="background-image: url({{url_for('static', filename='/assets/img/EveAI_bg.jpg')}}); background-position: top left; background-repeat: no-repeat; background-size: cover;" loading="lazy">
<span class="mask bg-gradient-primary opacity-4"></span>
<div class="container">
<div class="row">
@@ -10,4 +10,4 @@
</div>
</div>
</div>
</header>
</header>

View File

@@ -0,0 +1,33 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Edit Specialist{% endblock %}
{% block content_title %}Edit Specialist{% endblock %}
{% block content_description %}Edit a Specialist{% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = ['type'] %}
{% set exclude_fields = [] %}
<!-- Render Static Fields -->
{% for field in form.get_static_fields() %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<!-- Render Dynamic Fields -->
{% for collection_name, fields in form.get_dynamic_fields().items() %}
{% if fields|length > 0 %}
<h4 class="mt-4">{{ collection_name }}</h4>
{% endif %}
{% for field in fields %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
{% endfor %}
<button type="submit" class="btn btn-primary">Save Specialist</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

View File

@@ -0,0 +1,23 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Specialist Registration{% endblock %}
{% block content_title %}Register Specialist{% endblock %}
{% block content_description %}Define a new specialist{% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = [] %}
{% set exclude_fields = [] %}
{% for field in form %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<button type="submit" class="btn btn-primary">Register Specialist</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

View File

@@ -0,0 +1,23 @@
{% extends 'base.html' %}
{% from 'macros.html' import render_selectable_table, render_pagination %}
{% block title %}Retrievers{% endblock %}
{% block content_title %}Specialists{% endblock %}
{% block content_description %}View Specialists for Tenant{% endblock %}
{% block content_class %}<div class="col-xl-12 col-lg-5 col-md-7 mx-auto"></div>{% endblock %}
{% block content %}
<div class="container">
<form method="POST" action="{{ url_for('interaction_bp.handle_specialist_selection') }}">
{{ render_selectable_table(headers=["Specialist ID", "Name", "Type"], rows=rows, selectable=True, id="specialistsTable") }}
<div class="form-group mt-3">
<button type="submit" name="action" value="edit_specialist" class="btn btn-primary">Edit Specialist</button>
</div>
</form>
</div>
{% endblock %}
{% block content_footer %}
{{ render_pagination(pagination, 'document_bp.retrievers') }}
{% endblock %}

View File

@@ -1,6 +1,4 @@
{% extends "base.html" %}
{% from "macros.html" import render_field %}
{% block title %}Session Overview{% endblock %}
{% block content_title %}Session Overview{% endblock %}
@@ -8,7 +6,7 @@
{% block content %}
<div class="container mt-5">
<h2>Chat Session Details</h2>
<h4>Chat Session Details</h4>
<div class="card mb-4">
<div class="card-header">
<h5>Session Information</h5>
@@ -21,44 +19,73 @@
</div>
</div>
<h3>Interactions</h3>
<h5>Interactions</h5>
<div class="accordion" id="interactionsAccordion">
{% for interaction in interactions %}
{% for interaction, id, question_at, specialist_arguments, specialist_results, specialist_name, specialist_type in interactions %}
<div class="accordion-item">
<h2 class="accordion-header" id="heading{{ loop.index }}">
<p class="accordion-header" id="heading{{ loop.index }}">
<button class="accordion-button collapsed" type="button" data-bs-toggle="collapse"
data-bs-target="#collapse{{ loop.index }}" aria-expanded="false"
aria-controls="collapse{{ loop.index }}">
<div class="d-flex justify-content-between align-items-center w-100">
<span class="interaction-question">{{ interaction.question | truncate(50) }}</span>
<span class="interaction-icons">
<i class="material-icons algorithm-icon {{ interaction.algorithm_used | lower }}">fingerprint</i>
<i class="material-icons thumb-icon {% if interaction.appreciation == 100 %}filled{% else %}outlined{% endif %}">thumb_up</i>
<i class="material-icons thumb-icon {% if interaction.appreciation == 0 %}filled{% else %}outlined{% endif %}">thumb_down</i>
</span>
<div class="interaction-header">
<div class="interaction-metadata">
<div class="interaction-time text-muted">
{{ question_at | to_local_time(chat_session.timezone) }}
</div>
<div class="specialist-info">
<span class="badge bg-primary">{{ specialist_name if specialist_name else 'No Specialist' }}</span>
<span class="badge bg-secondary">{{ specialist_type if specialist_type else '' }}</span>
</div>
</div>
<div class="interaction-question">
{{ specialist_results.detailed_query if specialist_results and specialist_results.detailed_query else specialist_arguments.query }}
</div>
</div>
</button>
</h2>
</p>
<div id="collapse{{ loop.index }}" class="accordion-collapse collapse" aria-labelledby="heading{{ loop.index }}"
data-bs-parent="#interactionsAccordion">
<div class="accordion-body">
<h6>Detailed Question:</h6>
<p>{{ interaction.detailed_question }}</p>
<h6>Answer:</h6>
<div class="markdown-content">{{ interaction.answer | safe }}</div>
{% if embeddings_dict.get(interaction.id) %}
<h6>Related Documents:</h6>
<ul>
{% for embedding in embeddings_dict[interaction.id] %}
<li>
{% if embedding.url %}
<a href="{{ embedding.url }}" target="_blank">{{ embedding.url }}</a>
{% else %}
{{ embedding.file_name }}
{% endif %}
</li>
{% endfor %}
</ul>
<!-- Arguments Section -->
{% if specialist_arguments %}
<div class="mb-4">
<h6 class="mb-3">Specialist Arguments:</h6>
<div class="code-wrapper">
<pre><code class="language-json" style="width: 100%;">{{ specialist_arguments | tojson(indent=2) }}</code></pre>
</div>
</div>
{% endif %}
<!-- Results Section -->
{% if specialist_results %}
<div class="mb-4">
<h6 class="mb-3">Specialist Results:</h6>
<div class="code-wrapper">
<pre><code class="language-json" style="width: 100%;">{{ specialist_results | tojson(indent=2) }}</code></pre>
</div>
</div>
{% endif %}
<!-- Related Documents Section -->
{% if embeddings_dict.get(id) %}
<div class="mt-4">
<h6>Related Documents:</h6>
<ul class="list-group">
{% for embedding in embeddings_dict[id] %}
<li class="list-group-item">
{% if embedding.url %}
<a href="{{ embedding.url }}" target="_blank" class="text-decoration-none">
<i class="material-icons align-middle me-2">link</i>
{{ embedding.url }}
</a>
{% else %}
<i class="material-icons align-middle me-2">description</i>
{{ embedding.object_name }}
{% endif %}
</li>
{% endfor %}
</ul>
</div>
{% endif %}
</div>
</div>
@@ -68,14 +95,166 @@
</div>
{% endblock %}
{% block styles %}
{{ super() }}
<style>
.interaction-header {
font-size: 0.9rem;
display: flex;
flex-direction: column;
width: 100%;
padding: 0.5rem 0;
}
.interaction-metadata {
display: flex;
gap: 1rem;
align-items: center;
margin-bottom: 0.5rem;
}
.interaction-time {
font-size: 0.9rem;
}
.specialist-info {
display: flex;
gap: 0.5rem;
align-items: center;
}
.interaction-question {
font-size: 0.9rem;
font-weight: bold;
line-height: 1.4;
}
.badge {
font-size: 0.9rem;
padding: 0.35em 0.65em;
white-space: nowrap;
}
.accordion-button {
padding: 0.5rem 1rem;
}
.accordion-button::after {
margin-left: 1rem;
}
.json-display {
background-color: #f8f9fa;
border-radius: 4px;
padding: 15px;
margin: 0;
white-space: pre-wrap;
word-wrap: break-word;
font-family: monospace;
font-size: 0.85rem;
line-height: 1.5;
max-width: 100%;
overflow-x: auto;
}
.list-group-item {
font-size: 0.9rem;
}
.material-icons {
font-size: 1.1rem;
}
pre {
margin: 0;
padding: 0;
white-space: pre-wrap !important; /* Force wrapping */
word-wrap: break-word !important; /* Break long words if necessary */
max-width: 100%; /* Ensure container doesn't overflow */
}
pre, code {
margin: 0;
padding: 0;
white-space: pre-wrap !important; /* Force wrapping */
word-wrap: break-word !important; /* Break long words if necessary */
max-width: 100%; /* Ensure container doesn't overflow */
}
pre code {
padding: 1rem !important;
border-radius: 4px;
font-size: 0.75rem;
line-height: 1.5;
white-space: pre-wrap !important; /* Force wrapping in code block */
}
.code-wrapper {
position: relative;
width: 100%;
}
/* Override all possible highlight.js white-space settings */
.code-wrapper pre,
.code-wrapper pre code,
.code-wrapper pre code.hljs,
.code-wrapper .hljs {
white-space: pre-wrap !important;
overflow-wrap: break-word !important;
word-wrap: break-word !important;
word-break: break-word !important;
max-width: 100% !important;
overflow-x: hidden !important;
}
.code-wrapper pre {
margin: 0;
background: #f8f9fa;
border-radius: 4px;
}
.code-wrapper pre code {
padding: 1rem !important;
font-family: monospace;
font-size: 0.9rem;
line-height: 1.5;
display: block;
}
/* Override highlight.js default nowrap behavior */
.hljs {
background: #f8f9fa !important;
white-space: pre-wrap !important;
word-wrap: break-word !important;
}
/* Color theme */
.hljs-string {
color: #0a3069 !important;
}
.hljs-attr {
color: #953800 !important;
}
.hljs-number {
color: #116329 !important;
}
.hljs-boolean {
color: #0550ae !important;
}
</style>
{% endblock %}
{% block scripts %}
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
{{ super() }}
<script>
document.addEventListener('DOMContentLoaded', function() {
var markdownElements = document.querySelectorAll('.markdown-content');
markdownElements.forEach(function(el) {
el.innerHTML = marked.parse(el.textContent);
});
document.addEventListener('DOMContentLoaded', function() {
// Initialize syntax highlighting
document.querySelectorAll('pre code').forEach((block) => {
hljs.highlightElement(block);
});
});
</script>
{% endblock %}

View File

@@ -1,49 +1,96 @@
{% macro render_field(field, disabled_fields=[], exclude_fields=[], class='') %}
{% set disabled = field.name in disabled_fields %}
{% set exclude_fields = exclude_fields + ['csrf_token', 'submit'] %}
{% if field.name not in exclude_fields %}
{% if field.type == 'BooleanField' %}
<div class="form-check">
{{ field(class="form-check-input " + class, type="checkbox", id="flexSwitchCheckDefault") }}
{{ field.label(class="form-check-label", for="flexSwitchCheckDefault", disabled=disabled) }}
</div>
{% else %}
<div class="form-group">
{{ field.label(class="form-label") }}
{{ field(class="form-control " + class, disabled=disabled) }}
{% if field.errors %}
<div class="invalid-feedback">
{% for error in field.errors %}
{{ error }}
{% endfor %}
</div>
{% macro render_field_content(field, disabled=False, class='') %}
{% if field.type == 'BooleanField' %}
<div class="form-group">
<div class="form-check form-switch">
{{ field(class="form-check-input " + class, disabled=disabled) }}
{% if field.description %}
{{ field.label(class="form-check-label",
**{'data-bs-toggle': 'tooltip',
'data-bs-placement': 'right',
'title': field.description}) }}
{% if field.flags.required %}
<span class="required-field-indicator" aria-hidden="true">
<i class="material-symbols-outlined required-icon">check_circle</i>
</span>
<span class="visually-hidden">Required field</span>
{% endif %}
{% else %}
{{ field.label(class="form-check-label") }}
{% if field.flags.required %}
<span class="required-field-indicator" aria-hidden="true">
<i class="material-symbols-outlined required-icon">check_circle</i>
</span>
<span class="visually-hidden">Required field</span>
{% endif %}
{% endif %}
</div>
{% endif %}
{% if field.errors %}
<div class="invalid-feedback d-block">
{% for error in field.errors %}
{{ error }}
{% endfor %}
</div>
{% endif %}
</div>
{% else %}
<div class="form-group">
{% if field.description %}
{{ field.label(class="form-label",
**{'data-bs-toggle': 'tooltip',
'data-bs-placement': 'right',
'title': field.description}) }}
{% if field.flags.required %}
<span class="required-field-indicator" aria-hidden="true">
<i class="material-symbols-outlined required-icon">check_circle</i>
</span>
<span class="visually-hidden">Required field</span>
{% endif %}
{% else %}
{{ field.label(class="form-label") }}
{% if field.flags.required %}
<span class="required-field-indicator" aria-hidden="true">
<i class="material-symbols-outlined required-icon">check_circle</i>
</span>
<span class="visually-hidden">Required field</span>
{% endif %}
{% endif %}
{% if field.type == 'TextAreaField' and 'json-editor' in class %}
<div id="{{ field.id }}-editor" class="json-editor-container"></div>
{{ field(class="form-control d-none " + class, disabled=disabled) }}
{% elif field.type == 'SelectField' %}
{{ field(class="form-control form-select " + class, disabled=disabled) }}
{% else %}
{{ field(class="form-control " + class, disabled=disabled) }}
{% endif %}
{% if field.errors %}
<div class="invalid-feedback d-block">
{% for error in field.errors %}
{{ error }}
{% endfor %}
</div>
{% endif %}
</div>
{% endif %}
{% endmacro %}
{% macro render_included_field(field, disabled_fields=[], include_fields=[]) %}
{% macro render_field(field, disabled_fields=[], exclude_fields=[], class='') %}
<!-- Debug info -->
<!-- Field name: {{ field.name }}, Field type: {{ field.__class__.__name__ }} -->
{% set disabled = field.name in disabled_fields %}
{% set exclude_fields = exclude_fields + ['csrf_token', 'submit'] %}
{% if field.name not in exclude_fields %}
{{ render_field_content(field, disabled, class) }}
{% endif %}
{% endmacro %}
{% macro render_included_field(field, disabled_fields=[], include_fields=[], class='') %}
{% set disabled = field.name in disabled_fields %}
{% if field.name in include_fields %}
{% if field.type == 'BooleanField' %}
<div class="form-check">
{{ field(class="form-check-input", type="checkbox", id="flexSwitchCheckDefault") }}
{{ field.label(class="form-check-label", for="flexSwitchCheckDefault", disabled=disabled) }}
</div>
{% else %}
<div class="form-group">
{{ field.label(class="form-label") }}
{{ field(class="form-control", disabled=disabled) }}
{% if field.errors %}
<div class="invalid-feedback">
{% for error in field.errors %}
{{ error }}
{% endfor %}
</div>
{% endif %}
</div>
{% endif %}
{{ render_field_content(field, disabled, class) }}
{% endif %}
{% endmacro %}
@@ -177,6 +224,48 @@
</div>
{% endmacro %}
{% macro render_selectable_sortable_table_with_dict_headers(headers, rows, selectable, id, sort_by, sort_order) %}
<div class="card">
<div class="table-responsive">
<table class="table align-items-center mb-0" id="{{ id }}">
<thead>
<tr>
{% if selectable %}
<th class="text-uppercase text-secondary text-xxs font-weight-bolder opacity-7">Select</th>
{% endif %}
{% for header in headers %}
<th class="text-uppercase text-secondary text-xxs font-weight-bolder opacity-7 sortable" data-sort="{{ header['sort'] }}">
{{ header['text'] }}
{% if sort_by == header['sort'] %}
{% if sort_order == 'asc' %}
<i class="fas fa-sort-up"></i>
{% elif sort_order == 'desc' %}
<i class="fas fa-sort-down"></i>
{% endif %}
{% else %}
<i class="fas fa-sort"></i>
{% endif %}
</th>
{% endfor %}
</tr>
</thead>
<tbody>
{% for row in rows %}
<tr>
{% if selectable %}
<td><input type="radio" name="selected_row" value="{{ row[0].value }}"></td>
{% endif %}
{% for cell in row %}
<td>{{ cell.value }}</td>
{% endfor %}
</tr>
{% endfor %}
</tbody>
</table>
</div>
</div>
{% endmacro %}
{% macro render_accordion(accordion_id, accordion_items, header_title, header_description) %}
<div class="accordion-1">
<div class="container">

View File

@@ -75,12 +75,20 @@
{'name': 'Edit Tenant', 'url': '/user/tenant/' ~ session['tenant'].get('id'), 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Tenant Domains', 'url': '/user/view_tenant_domains', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Tenant Domain Registration', 'url': '/user/tenant_domain', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Tenant Projects', 'url': '/user/tenant_projects', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Tenant Project Registration', 'url': '/user/tenant_project', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'User List', 'url': '/user/view_users', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'User Registration', 'url': '/user/user', 'roles': ['Super User', 'Tenant Admin']},
]) }}
{% endif %}
{% if current_user.is_authenticated %}
{{ dropdown('Document Mgmt', 'note_stack', [
{'name': 'Add Catalog', 'url': '/document/catalog', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'All Catalogs', 'url': '/document/catalogs', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Add Processor', 'url': '/document/processor', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'All Processors', 'url': '/document/processors', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Add Retriever', 'url': '/document/retriever', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'All Retrievers', 'url': '/document/retrievers', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Add Document', 'url': '/document/add_document', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Add URL', 'url': '/document/add_url', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Add a list of URLs', 'url': '/document/add_urls', 'roles': ['Super User', 'Tenant Admin']},
@@ -91,6 +99,8 @@
{% endif %}
{% if current_user.is_authenticated %}
{{ dropdown('Interactions', 'hub', [
{'name': 'Add Specialist', 'url': '/interaction/specialist', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'All Specialists', 'url': '/interaction/specialists', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Chat Sessions', 'url': '/interaction/chat_sessions', 'roles': ['Super User', 'Tenant Admin']},
]) }}
{% endif %}
@@ -99,6 +109,7 @@
{'name': 'License Tier Registration', 'url': '/entitlements/license_tier', 'roles': ['Super User']},
{'name': 'All License Tiers', 'url': '/entitlements/view_license_tiers', 'roles': ['Super User']},
{'name': 'Trigger Actions', 'url': '/administration/trigger_actions', 'roles': ['Super User']},
{'name': 'All Licenses', 'url': '/entitlements/view_licenses', 'roles': ['Super User', 'Tenant Admin']},
{'name': 'Usage', 'url': '/entitlements/view_usages', 'roles': ['Super User', 'Tenant Admin']},
]) }}
{% endif %}

View File

@@ -14,4 +14,130 @@
<script src="{{url_for('static', filename='assets/js/material-kit-pro.min.js')}}?v=3.0.4 type="text/javascript"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.3.3/js/bootstrap.bundle.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/select2/4.0.13/js/select2.min.js"></script>
<link href="https://cdnjs.cloudflare.com/ajax/libs/jsoneditor/10.1.0/jsoneditor.min.css" rel="stylesheet" type="text/css">
<script src="https://cdnjs.cloudflare.com/ajax/libs/jsoneditor/10.1.0/jsoneditor.min.js"></script>
<script>
document.addEventListener('DOMContentLoaded', function() {
// Initialize tooltips
var tooltipTriggerList = [].slice.call(document.querySelectorAll('[data-bs-toggle="tooltip"]'))
var tooltipList = tooltipTriggerList.map(function (tooltipTriggerEl) {
return new bootstrap.Tooltip(tooltipTriggerEl)
});
// Initialize JSON editors
document.querySelectorAll('.json-editor').forEach(function(textarea) {
// Create container for editor
var container = document.getElementById(textarea.id + '-editor');
// Initialize the editor
var editor = new JSONEditor(container, {
mode: 'code',
modes: ['code', 'tree'],
onChangeText: function(jsonString) {
textarea.value = jsonString;
}
});
// Set initial value
try {
const initialValue = textarea.value ? JSON.parse(textarea.value) : {};
editor.set(initialValue);
} catch (e) {
console.error('Error parsing initial JSON:', e);
editor.set({});
}
// Add validation indicator
editor.validate().then(function(errors) {
if (errors.length) {
container.style.border = '2px solid red';
} else {
container.style.border = '1px solid #ccc';
}
});
});
});
</script>
<script>
document.addEventListener('DOMContentLoaded', function() {
// Get all forms with tabs
const formsWithTabs = document.querySelectorAll('form');
formsWithTabs.forEach(form => {
// Handle the form's submit event
form.addEventListener('submit', function(event) {
const invalidFields = form.querySelectorAll(':invalid');
if (invalidFields.length > 0) {
// Prevent form submission
event.preventDefault();
// Find which tab contains the first invalid field
const firstInvalidField = invalidFields[0];
const tabPane = firstInvalidField.closest('.tab-pane');
if (tabPane) {
// Get the tab ID
const tabId = tabPane.id;
// Find and click the corresponding tab button
const tabButton = document.querySelector(`[data-bs-toggle="tab"][data-bs-target="#${tabId}"]`);
if (tabButton) {
const tab = new bootstrap.Tab(tabButton);
tab.show();
}
// Scroll the invalid field into view and focus it
firstInvalidField.scrollIntoView({ behavior: 'smooth', block: 'center' });
firstInvalidField.focus();
}
// Optional: Show a message about validation errors
const errorCount = invalidFields.length;
const message = `Please fill in all required fields (${errorCount} ${errorCount === 1 ? 'error' : 'errors'} found)`;
if (typeof Swal !== 'undefined') {
// If SweetAlert2 is available
Swal.fire({
title: 'Validation Error',
text: message,
icon: 'error',
confirmButtonText: 'OK'
});
} else {
// Fallback to browser alert
alert(message);
}
}
});
// Optional: Real-time validation as user switches tabs
const tabButtons = document.querySelectorAll('[data-bs-toggle="tab"]');
tabButtons.forEach(button => {
button.addEventListener('shown.bs.tab', function() {
const previousTabPane = document.querySelector(button.getAttribute('data-bs-target'));
if (previousTabPane) {
const invalidFields = previousTabPane.querySelectorAll(':invalid');
if (invalidFields.length > 0) {
// Add visual indicator to tab
button.classList.add('has-error');
} else {
button.classList.remove('has-error');
}
}
});
});
});
});
</script>
<style>
.json-editor-container {
height: 400px;
margin-bottom: 1rem;
}
.tooltip {
position: fixed;
}
</style>

View File

@@ -0,0 +1,28 @@
{% extends 'base.html' %}
{% block title %}Delete Tenant Project{% endblock %}
{% block content_title %}Delete Tenant Project{% endblock %}
{% block content_description %}Are you sure you want to delete this tenant project?{% endblock %}
{% block content %}
<div class="container">
<div class="alert alert-warning">
<p><strong>Warning:</strong> You are about to delete the following tenant project:</p>
<ul>
<li><strong>Name:</strong> {{ tenant_project.name }}</li>
<li><strong>API Key:</strong> {{ tenant_project.visual_api_key }}</li>
<li><strong>Responsible:</strong> {{ tenant_project.responsible_email or 'Not specified' }}</li>
</ul>
<p>This action cannot be undone.</p>
</div>
<form method="POST">
{{ form.csrf_token if form }}
<div class="form-group mt-3">
<a href="{{ url_for('user_bp.tenant_projects') }}" class="btn btn-secondary">Cancel</a>
<button type="submit" class="btn btn-danger">Confirm Delete</button>
</div>
</form>
</div>
{% endblock %}

View File

@@ -0,0 +1,26 @@
{% extends 'base.html' %}
{% from "macros.html" import render_field %}
{% block title %}Edit Tenant Project{% endblock %}
{% block content_title %}Edit Tenant Project{% endblock %}
{% block content_description %}Edit a Tenant Project. It is impossible to view of renew the existing API key.
You need to invalidate the current project, and create a new one.
{% endblock %}
{% block content %}
<form method="post">
{{ form.hidden_tag() }}
{% set disabled_fields = [] %}
{% set exclude_fields = [] %}
<!-- Render Static Fields -->
{% for field in form %}
{{ render_field(field, disabled_fields, exclude_fields) }}
{% endfor %}
<button type="submit" class="btn btn-primary">Save Tenant Project</button>
</form>
{% endblock %}
{% block content_footer %}
{% endblock %}

Some files were not shown because too many files have changed in this diff Show More