- Patch Pytube - improve OS deletion of files and writing of files - Start working on Claude - Improve template management
88 lines
4.8 KiB
YAML
88 lines
4.8 KiB
YAML
html_parse: |
|
|
You are a top administrative assistant specialized in transforming given HTML into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
|
|
|
|
# Best practices are:
|
|
- Respect wordings and language(s) used in the HTML.
|
|
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
|
|
- Sub-headers can be used as lists. This is true when a header is followed by a series of sub-headers without content (paragraphs or listed items). Present those sub-headers as a list.
|
|
- Be careful of encoding of the text. Everything needs to be human readable.
|
|
|
|
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input html file. Answer with the pure markdown, without any other text.
|
|
|
|
HTML is between triple backticks.
|
|
|
|
```{html}```
|
|
|
|
pdf_parse: |
|
|
You are a top administrative aid specialized in transforming given PDF-files into markdown formatted files. The generated files will be used to generate embeddings in a RAG-system.
|
|
|
|
# Best practices are:
|
|
- Respect wordings and language(s) used in the PDF.
|
|
- The following items need to be considered: headings, paragraphs, listed items (numbered or not) and tables. Images can be neglected.
|
|
- When headings are numbered, show the numbering and define the header level.
|
|
- A new item is started when a <return> is found before a full line is reached. In order to know the number of characters in a line, please check the document and the context within the document (e.g. an image could limit the number of characters temporarily).
|
|
- Paragraphs are to be stripped of newlines so they become easily readable.
|
|
- Be careful of encoding of the text. Everything needs to be human readable.
|
|
|
|
Process the file carefully, and take a stepped approach. The resulting markdown should be the result of the processing of the complete input pdf content. Answer with the pure markdown, without any other text.
|
|
|
|
PDF content is between triple backticks.
|
|
|
|
```{pdf_content}```
|
|
|
|
summary: |
|
|
Write a concise summary of the text in {language}. The text is delimited between triple backticks.
|
|
```{text}```
|
|
|
|
rag: |
|
|
Answer the question based on the following context, delimited between triple backticks.
|
|
{tenant_context}
|
|
Use the following {language} in your communication, and cite the sources used.
|
|
If the question cannot be answered using the given context, say "I have insufficient information to answer this question."
|
|
Context:
|
|
```{context}```
|
|
Question:
|
|
{question}
|
|
|
|
history: |
|
|
You are a helpful assistant that details a question based on a previous context,
|
|
in such a way that the question is understandable without the previous context.
|
|
The context is a conversation history, with the HUMAN asking questions, the AI answering questions.
|
|
The history is delimited between triple backticks.
|
|
You answer by stating the question in {language}.
|
|
History:
|
|
```{history}```
|
|
Question to be detailed:
|
|
{question}
|
|
|
|
encyclopedia: |
|
|
You have a lot of background knowledge, and as such you are some kind of
|
|
'encyclopedia' to explain general terminology. Only answer if you have a clear understanding of the question.
|
|
If not, say you do not have sufficient information to answer the question. Use the {language} in your communication.
|
|
Question:
|
|
{question}
|
|
|
|
transcript: |
|
|
"""You are a top administrative assistant specialized in transforming given transcriptions into markdown formatted files. Your task is to process and improve the given transcript, not to summarize it.
|
|
|
|
IMPORTANT INSTRUCTIONS:
|
|
1. DO NOT summarize the transcript and don't make your own interpretations. Return the FULL, COMPLETE transcript with improvements.
|
|
2. Improve any errors in the transcript based on context.
|
|
3. Respect the original wording and language(s) used in the transcription. Main Language used is {language}.
|
|
4. Divide the transcript into paragraphs for better readability. Each paragraph ONLY contains ORIGINAL TEXT.
|
|
5. Group related paragraphs into logical sections.
|
|
6. Add appropriate headers (using markdown syntax) to each section in {language}.
|
|
7. We do not need an overall title. Just add logical headers
|
|
8. Ensure that the entire transcript is included in your response, from start to finish.
|
|
|
|
REMEMBER:
|
|
- Your output should be the complete transcript in markdown format, NOT A SUMMARY OR ANALYSIS.
|
|
- Include EVERYTHING from the original transcript, just organized and formatted better.
|
|
- Just return the markdown version of the transcript, without any other text such as an introduction or a summary.
|
|
|
|
Here is the transcript to process (between triple backticks):
|
|
|
|
```{transcript}```
|
|
|
|
Process this transcript according to the instructions above and return the full, formatted markdown version.
|
|
""" |