- Introduction of the Automatic HTML Processor

- Translation Service improvement
- Enable activation / deactivation of Processors
- Renew API keys for Mistral (leading to workspaces)
- Align all Document views to use a session catalog
- Allow different processors for the same file type
Josako
2025-06-26 14:38:40 +02:00
parent f5c9542a49
commit fda267b479
35 changed files with 551 additions and 356 deletions
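Two points in the commit message, enabling activation/deactivation of processors and allowing several processors for the same file type, amount to a selection rule: for a given file type, pick among the processors that are currently active. The following is a minimal, hypothetical Python sketch of that rule. `Processor`, `ProcessorRegistry` and the processor names are illustrative assumptions, not the project's actual classes or identifiers.

```python
# Hypothetical sketch: several processors may handle the same file type, and
# each one can be switched on or off. Names are illustrative, not the project's API.
from dataclasses import dataclass, field


@dataclass
class Processor:
    name: str
    file_types: set[str]
    active: bool = True


@dataclass
class ProcessorRegistry:
    processors: list[Processor] = field(default_factory=list)

    def register(self, processor: Processor) -> None:
        """Add a processor; duplicates per file type are allowed on purpose."""
        self.processors.append(processor)

    def set_active(self, name: str, active: bool) -> None:
        """Activate or deactivate a processor by name."""
        for p in self.processors:
            if p.name == name:
                p.active = active
                return
        raise KeyError(name)

    def candidates(self, file_type: str) -> list[Processor]:
        """All active processors for a file type, in registration order."""
        return [p for p in self.processors if p.active and file_type in p.file_types]


if __name__ == "__main__":
    registry = ProcessorRegistry()
    registry.register(Processor("html_processor", {"html"}))
    registry.register(Processor("automatic_html_processor", {"html"}))
    registry.set_active("html_processor", False)
    print([p.name for p in registry.candidates("html")])
```

Keeping a list rather than a one-to-one map from file type to processor is what makes "different processors for the same file type" possible; the `active` flag then filters the candidates without deleting any configuration.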

@@ -0,0 +1,30 @@
version: "1.0.0"
content: |
  You are a top administrative assistant specialized in transforming given HTML into markdown formatted files. The
  generated files will be used to generate embeddings in a RAG-system.
  # Best practices are:
  - Respect wordings and language(s) used in the HTML.
  - The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
  - Sub-headers can be used as lists. This is true when a header is followed by a series of sub-headers without content (paragraphs or listed items). Present those sub-headers as a list.
  - Be careful of encoding of the text. Everything needs to be human readable.
  You only return relevant information, and filter out non-relevant information, such as:
  - information found in menu bars, sidebars, footers or headers
  - information in forms, buttons
  Process the file or text carefully, and take a stepped approach. The resulting markdown should be the result of the
  processing of the complete input html file. Answer with the pure markdown, without any other text.
  {custom_instructions}
  HTML to be processed is in between triple backquotes.
  ```{html}```
llm_model: "mistral.mistral-small-latest"
metadata:
  author: "Josako"
  date_added: "2025-06-25"
  description: "An aid in transforming HTML-based inputs to markdown, fully automatic"
  changes: "Initial version"
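The new prompt file is a template with two placeholders, `{custom_instructions}` and `{html}`, plus a model identifier of the form `mistral.mistral-small-latest`. Below is a minimal Python sketch of how such a template could be filled in. It assumes `str.format`-style substitution and assumes that the part before the first dot names the provider. The function names are hypothetical, and the template is a shortened excerpt of the prompt.

```python
# Hypothetical sketch: fill the placeholders of the HTML-processor prompt.
# Only a shortened excerpt of the template text is used here.
from string import Formatter

TEMPLATE = (
    "You are a top administrative assistant specialized in transforming given HTML "
    "into markdown formatted files.\n"
    "{custom_instructions}\n"
    "HTML to be processed is in between triple backquotes.\n"
    "```{html}```"
)

LLM_MODEL = "mistral.mistral-small-latest"


def placeholders(template: str) -> set[str]:
    """Return the names of all {placeholders} in a str.format template."""
    return {name for _, name, _, _ in Formatter().parse(template) if name}


def render_prompt(template: str, html: str, custom_instructions: str = "") -> str:
    """Substitute both placeholders; a KeyError signals an unexpected placeholder."""
    return template.format(html=html, custom_instructions=custom_instructions)


def split_model_id(model_id: str) -> tuple[str, str]:
    """Assumed convention: '<provider>.<model name>', split on the first dot."""
    provider, _, model = model_id.partition(".")
    return provider, model


if __name__ == "__main__":
    print(placeholders(TEMPLATE))
    print(split_model_id(LLM_MODEL))
    print(render_prompt(TEMPLATE, "<h1>Title</h1>", "Answer in English."))
```

`str.format` parses only the template, not the substituted values, so braces that occur inside the HTML being processed are passed through unchanged.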

@@ -7,7 +7,7 @@ content: >
  I only want you to return the translation. No explanation, no options. I need to be able to directly use your answer
  without further interpretation. If more than one option is available, present me with the most probable one.
llm_model: "mistral.ministral-8b-latest"
metadata:
  author: "Josako"
  date_added: "2025-06-23"

@@ -4,7 +4,7 @@ content: >
  I only want you to return the translation. No explanation, no options. I need to be able to directly use your answer
  without further interpretation. If more than one option is available, present me with the most probable one.
llm_model: "mistral.ministral-8b-latest"
metadata:
  author: "Josako"
  date_added: "2025-06-23"