Commit Graph

19 Commits

Author SHA1 Message Date
Josako
aa358df28e - Allowing for multiple types of Catalogs
- Introduction of retrievers
- Ensuring processing information is collected from Catalog instead of Tenant
- Introduction of a generic Form class to enable dynamic fields based on a configuration
- Realisation of Retriever functionality to support dynamic fields
2024-10-25 14:11:47 +02:00
Josako
270479c77d - Add Catalog Concept to Document Domain
- Create Catalog views
- Modify document stack creation
2024-10-14 13:56:23 +02:00
Josako
6cf660e622 - Adding a Tenant Type
- Allow filtering on Tenant Types & searching for parts of Tenant names
- Implement health checks
- Start Prometheus monitoring (needs to be finalized)
- Refine audio_processor and srt_processor to reduce duplicate code and support larger files
- Introduce repopack so that LLMs can reason about the code
2024-09-13 15:43:40 +02:00
Josako
76cb825660 - Full API application, streamlined, de-duplication of document handling code into document_utils.py
- Added meta-data fields to DocumentVersion
- Docker container to support API
2024-09-09 16:11:42 +02:00
Josako
ae7bf3dbae - Correct default language when adding Documents and URLs 2024-09-02 14:04:22 +02:00
Josako
914c265afe - Improvements on document uploads (accept files other than HTML files when entering a URL)
- Introduction of API functionality (to be continued). Deduplication of document and URL uploads between views and API.
- Improvements on document processing: introduction of processor classes to streamline document inputs
- Removed pure YouTube functionality, as YouTube retrieval of documents continuously changes, but added upload of srt, mp3, ogg and mp4 files
2024-09-02 12:37:44 +02:00
Josako
8f08d6e1ae Allow for a list of URLs to be entered into the system. 2024-07-08 15:17:10 +02:00
Josako
8e1dac0233 YouTube added - further checking required 2024-07-04 08:11:31 +02:00
Josako
27b6de8734 Removing DocumentLanguage, as both System Context and User Context are to be defined on DocumentVersion level.
Fine-tuning of embedding workers.
2024-06-06 15:26:49 +02:00
Josako
61e1372dc8 Improvements to Document Interface and correcting embedding workers 2024-06-04 14:59:38 +02:00
Josako
6c2e99f467 Realise processing of HTML and improve both HTML & PDF processing given new tenant information. 2024-05-13 17:18:38 +02:00
Josako
adee283d7a Simplify model selection for both embeddings and LLM. Editing capabilities for new tenant columns... 2024-05-13 14:58:21 +02:00
Josako
699de951e8 Add functionality to add a URL to the system. 2024-05-10 22:44:53 +02:00
Josako
bf6d91527b Add extra chunking information in Tenant schema
Add extra scripts for flask-migrate to support refactoring
2024-05-08 17:40:42 +02:00
Josako
d925477e68 Setup of documents view 2024-05-05 20:21:44 +02:00
Josako
31250443c2 Add the preferred embedding model to the add_document interface 2024-05-02 10:19:15 +02:00
Josako
659588deab Changes to Documents - LLM and language fields on tenant, processing on documents
First version of Adding Documents (excl. embeddings)
2024-05-02 00:12:27 +02:00
Josako
9f350b5ea0 Realise document upload - Part 1 2024-04-30 22:47:44 +02:00
Josako
ffa60b4616 Update Document domain models to use pgvector (extension of PostgreSQL) 2024-04-30 15:09:32 +02:00