- Introduction of the Automatic HTML Processor
- Translation Service improvement
- Enable activation / deactivation of Processors
- Renew API-keys for Mistral (leading to workspaces)
- Align all Document views to use a session catalog
- Allow for different processors for the same file type
@@ -0,0 +1,14 @@
version: "1.0.0"
name: "HTML Processor"
file_types: "html"
description: "A processor for HTML files, driven by AI"
configuration:
  custom_instructions:
    name: "Custom Instructions"
    description: "Some custom instruction to guide our AI agent in parsing your HTML file"
    type: "text"
    required: false
metadata:
  author: "Josako"
  date_added: "2025-06-25"
  description: "A processor for HTML files, driven by AI"
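The `configuration` section above declares a single optional `custom_instructions` field of type `text`. A minimal sketch of how user-supplied values might be checked against such a declaration follows. `CONFIG_SCHEMA`, `TYPE_MAP` and `validate_config` are hypothetical names chosen for illustration, and the mapping of `"text"` to `str` is an assumption; the commit does not show how the application validates processor configuration.

```python
# Field definitions mirroring the `configuration` section of the processor config above.
CONFIG_SCHEMA = {
    "custom_instructions": {
        "name": "Custom Instructions",
        "type": "text",
        "required": False,
    },
}

# Hypothetical mapping from config "type" strings to Python types (an assumption).
TYPE_MAP = {"text": str}


def validate_config(values: dict, schema: dict = CONFIG_SCHEMA) -> dict:
    """Return a cleaned copy of `values`; raise ValueError on schema violations."""
    unknown = set(values) - set(schema)
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")

    cleaned = {}
    for key, field in schema.items():
        value = values.get(key)
        if value is None:
            # Absent optional fields are skipped; absent required fields are an error.
            if field.get("required", False):
                raise ValueError(f"Missing required field: {key}")
            continue
        if not isinstance(value, TYPE_MAP[field["type"]]):
            raise ValueError(f"Field {key} must be of type {field['type']}")
        cleaned[key] = value
    return cleaned
```

Because `required` is `false`, an empty configuration passes this check, and the processor would then run without custom instructions.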
30
config/prompts/globals/automagic_html_parse/1.0.0.yaml
Normal file
@@ -0,0 +1,30 @@
version: "1.0.0"
content: |
  You are a top administrative assistant specialized in transforming given HTML into markdown formatted files. The
  generated files will be used to generate embeddings in a RAG-system.

  # Best practices are:
  - Respect wordings and language(s) used in the HTML.
  - The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
  - Sub-headers can be used as lists. This is true when a header is followed by a series of sub-headers without content (paragraphs or listed items). Present those sub-headers as a list.
  - Be careful of encoding of the text. Everything needs to be human readable.

  You only return relevant information, and filter out non-relevant information, such as:
  - information found in menu bars, sidebars, footers or headers
  - information in forms, buttons

  Process the file or text carefully, and take a stepped approach. The resulting markdown should be the result of the
  processing of the complete input html file. Answer with the pure markdown, without any other text.

  {custom_instructions}

  HTML to be processed is in between triple backquotes.

  ```{html}```

llm_model: "mistral.mistral-small-latest"
metadata:
  author: "Josako"
  date_added: "2025-06-25"
  description: "An aid in transforming HTML-based inputs to markdown, fully automatic"
  changes: "Initial version"
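The prompt above carries two placeholders, `{custom_instructions}` and `{html}`. A minimal sketch of how they could be filled is shown below, assuming plain `str.format` substitution; the commit does not show the actual templating mechanism. `PROMPT_TAIL` and `render_prompt` are hypothetical names, and only the tail of the prompt is reproduced.

```python
# Triple backquotes, built this way so the example's own code fence stays intact.
FENCE = "`" * 3

# Tail of the prompt from the diff above; the instruction text before it is omitted.
PROMPT_TAIL = (
    "{custom_instructions}\n\n"
    "HTML to be processed is in between triple backquotes.\n\n"
    + FENCE + "{html}" + FENCE
)


def render_prompt(html: str, custom_instructions: str = "") -> str:
    """Fill both placeholders of the prompt tail.

    str.format only interprets braces in the template itself, so braces
    inside the HTML or the instructions are passed through unchanged.
    """
    return PROMPT_TAIL.format(
        custom_instructions=custom_instructions.strip(),
        html=html,
    )


if __name__ == "__main__":
    print(render_prompt("<p>Hello</p>", "Keep all tables."))
```

One limitation of this delimiter scheme: HTML that itself contains triple backquotes would end the delimited region early, so a caller may want to check for that before rendering.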
@@ -7,7 +7,7 @@ content: >
  I only want you to return the translation. No explanation, no options. I need to be able to directly use your answer
  without further interpretation. If more than one option is available, present me with the most probable one.

llm_model: "mistral.ministral-8b-latest"
metadata:
  author: "Josako"
  date_added: "2025-06-23"
@@ -4,7 +4,7 @@ content: >
  I only want you to return the translation. No explanation, no options. I need to be able to directly use your answer
  without further interpretation. If more than one option is available, present me with the most probable one.

llm_model: "mistral.ministral-8b-latest"
metadata:
  author: "Josako"
  date_added: "2025-06-23"
@@ -24,5 +24,10 @@ PROCESSOR_TYPES = {
        "name": "DOCX Processor",
        "description": "A processor for DOCX files",
        "file_types": "docx",
    }
    },
    "AUTOMAGIC_HTML_PROCESSOR": {
        "name": "AutoMagic HTML Processor",
        "description": "A processor for HTML files, driven by AI",
        "file_types": "html, htm",
    },
}
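Since `file_types` is a comma-separated string and the commit allows several processors for the same file type, a lookup by extension has to return a list rather than a single processor. The sketch below illustrates one way to do that. The `DOCX_PROCESSOR` key, the `HTML_PROCESSOR` entry and the `processors_for` helper are assumptions made for illustration; only the `AUTOMAGIC_HTML_PROCESSOR` entry and the DOCX field values come from the diff.

```python
PROCESSOR_TYPES = {
    # Key name assumed; the diff only shows this entry's fields.
    "DOCX_PROCESSOR": {
        "name": "DOCX Processor",
        "description": "A processor for DOCX files",
        "file_types": "docx",
    },
    # Hypothetical second HTML processor, added to show overlapping file types.
    "HTML_PROCESSOR": {
        "name": "HTML Processor",
        "description": "A processor for HTML files",
        "file_types": "html",
    },
    "AUTOMAGIC_HTML_PROCESSOR": {
        "name": "AutoMagic HTML Processor",
        "description": "A processor for HTML files, driven by AI",
        "file_types": "html, htm",
    },
}


def processors_for(extension: str, processor_types: dict = PROCESSOR_TYPES) -> list[str]:
    """Return the keys of all processor types that accept the given file extension."""
    ext = extension.lower().lstrip(".")
    matches = []
    for key, spec in processor_types.items():
        # "html, htm" -> {"html", "htm"}
        supported = {t.strip().lower() for t in spec["file_types"].split(",")}
        if ext in supported:
            matches.append(key)
    return matches


if __name__ == "__main__":
    print(processors_for("html"))
    print(processors_for(".HTM"))
```

Returning every match leaves the choice between competing processors (for example, by which ones are activated) to the caller.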