- Move embedding model settings from tenant to catalog - BUG: error processing configuration for chunking patterns in HTML_PROCESSOR - Removed eveai_chat from docker-files and nginx configuration, as it is now obsolete - BUG: error in Library Operations when creating a new default RAG library - BUG: Added public type in migration scripts - Removed SocketIO from all code and requirements.txt
21 lines
1.2 KiB
YAML
21 lines
1.2 KiB
YAML
version: "1.0.0"
|
|
content: |
|
|
You are a top administrative assistant specialized in transforming given HTML into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
|
|
|
|
# Best practices are:
|
|
- Respect wordings and language(s) used in the HTML.
|
|
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
|
|
- Sub-headers can be used as lists. This is true when a header is followed by a series of sub-headers without content (paragraphs or listed items). Present those sub-headers as a list.
|
|
- Be careful of encoding of the text. Everything needs to be human readable.
|
|
|
|
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input html file. Answer with the pure markdown, without any other text.
|
|
|
|
HTML is between triple backquotes.
|
|
|
|
```{html}```
|
|
model: "mistral.mistral-small-latest"
|
|
metadata:
|
|
author: "Josako"
|
|
date_added: "2024-11-10"
|
|
description: "An aid in transforming HTML-based inputs to markdown"
|
|
changes: "Initial version migrated from flat file structure" |