AI Parser Processor
The AI Parser Processor in Gathr extracts and organizes information from PDF documents, using OCR or vision AI to produce structured output for downstream data processing.
Input Configurations
Input configurations for the AI Powered PDF Parser are explained below.
Input Column
Provide the column containing the PDF data to be parsed.
Base64 Encoded
Select if the input PDF data is encoded in Base64 format.
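As an illustration of what Base64-encoded PDF data looks like, the following is a minimal Python sketch using only the standard library. The helper names (encode_pdf_bytes, decode_pdf_column, looks_like_pdf) and the placeholder bytes are hypothetical and are not part of Gathr; the sketch only shows the round trip between raw PDF bytes and the text form an input column might hold.

```python
import base64


def encode_pdf_bytes(pdf_bytes: bytes) -> str:
    """Return the Base64 text form of raw PDF bytes."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def decode_pdf_column(value: str) -> bytes:
    """Recover raw PDF bytes from a Base64 string; raises on invalid input."""
    return base64.b64decode(value, validate=True)


def looks_like_pdf(pdf_bytes: bytes) -> bool:
    """PDF files start with the '%PDF-' header."""
    return pdf_bytes.startswith(b"%PDF-")


# Placeholder bytes for illustration only, not a complete PDF document.
sample = b"%PDF-1.7\n% minimal placeholder, not a full document\n"
encoded = encode_pdf_bytes(sample)
decoded = decode_pdf_column(encoded)
```

Because every PDF begins with the header %PDF-, its Base64 form begins with JVBERi0, which can serve as a quick check on whether a column holds Base64 data or raw bytes.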
Password-Protected
Select if the PDF is password-protected.
Fetch Password
Choose whether to fetch the PDF password from an existing column or to provide it inline.
Column
Fetch the password for each PDF from the corresponding row of an existing column.
Enter Password
For the Column option, specify the column containing the password for the password-protected PDF. For the Inline option, provide a hardcoded password for the file.
Split PDF into Pages
Check to process each page separately and produce page-wise output.
AI Provider
Select the AI provider for document parsing. Available options are: Gathr Managed and User Provided.
If User Provided is selected as the AI Provider, provide the details below.
LLM Provider
Select the LLM provider for text processing. Available options are MistralAI and OpenAI.
Connection Name
Select a connection from the list of saved connections, or create one with the Add New Connection option. The selected connection will be used to connect with the source.
For details on creating connections, refer to Mistral AI Connection and OpenAI Connection.
Model
Select the AI model for text extraction.
Prompt for Extraction
Enter the prompt for text extraction.
Max Tokens
Maximum tokens for model output.
Temperature
Controls the randomness of the model output (0-2).
Top P
Controls diversity via nucleus sampling (0-1).
Frequency Penalty
Penalizes repeated tokens (0-2).
Presence Penalty
Encourages new topics (0-2).
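To make the Temperature and Top P settings above more concrete, here is a small Python sketch of how these two parameters are commonly applied to a model's token scores. It is a generic illustration under stated assumptions, not the code used by Gathr, MistralAI, or OpenAI: the function names and the example logits are hypothetical, and treating a temperature of 0 as always picking the top token is a common convention assumed here.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    if temperature < 0:
        raise ValueError("temperature must not be negative")
    if temperature == 0:
        # Assumed convention: temperature 0 always picks the top token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the peak for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def nucleus_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose total
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break
    filtered = [0.0] * len(probs)
    for i in kept:
        filtered[i] = probs[i] / mass
    return filtered


logits = [2.0, 1.0, 0.1]  # hypothetical scores for three candidate tokens
sharp = softmax_with_temperature(logits, 0.5)  # more deterministic
flat = softmax_with_temperature(logits, 2.0)   # more random
nucleus = nucleus_filter(softmax_with_temperature(logits, 1.0), 0.9)
```

In this sketch a low temperature concentrates probability on the most likely token, while a Top P of 0.9 removes the least likely token from consideration entirely. For extraction tasks, lower values of both settings generally give more repeatable output.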
Document Parsing
Parse the PDF document using one of the options below, depending on your needs.
Available options are:
Optimize for Speed
Generates a fast parsing response using OCR-based processing. Text is extracted to generate Markdown, including tables, headings, and paragraphs.
Optimize for Quality
Produces enhanced output for the entire PDF by parsing the file(s) with vision AI.
Optimize for Speed (Include Images)
Combines OCR-based parsing with vision AI to produce enhanced Markdown output from both text and embedded image content.
Output Configuration
The options for configuring the output are described below.
Output Column
Specify the column where the parsed output will be stored.
Status Column
Specify the column where the status/error message will be stored.
Metadata Column
Specify the column where metadata (for example, page number and filename) will be stored.
Drop Input Column
Check to drop the input column from the output after parsing.
Rate Limiting
Options for controlling the rate of requests.
Rate Control
Specify the maximum number of concurrent requests allowed within the provided time frame (in seconds).
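One common way to enforce a limit of this kind is a sliding-window counter. The Python sketch below illustrates the idea only; Gathr's internal rate-control mechanism is not documented here, so the class name SlidingWindowLimiter, its parameters, and the injectable clock are all assumptions made for illustration.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_requests within any window_seconds span."""

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._stamps = deque()  # start times of recently allowed requests

    def try_acquire(self):
        """Return True if a request may start now, otherwise False."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_seconds:
            self._stamps.popleft()
        if len(self._stamps) < self.max_requests:
            self._stamps.append(now)
            return True
        return False


# Example: at most 2 requests per 10 seconds.
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=10)
first_allowed = limiter.try_acquire()
```

A caller that receives False would wait and try again, which keeps the number of requests sent to the AI provider within the configured limit.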
Retries
Options for retrying failed API calls.
Enable Retries
Check to enable retries for failed API calls.
Max Retries
Maximum number of retry attempts for failed API calls.
Min Delay(s)
Minimum delay between retry attempts in seconds.
Max Delay(s)
Maximum delay between retry attempts in seconds.
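The way Max Retries, Min Delay, and Max Delay might interact can be sketched as a retry loop with a capped, growing delay. This Python example is an assumption-laden illustration: the doubling schedule and the names backoff_delay and call_with_retries are hypothetical, and the exact delay strategy Gathr applies between the minimum and maximum values is not stated in this document.

```python
import time


def backoff_delay(attempt, min_delay, max_delay):
    """Delay before retry number `attempt` (1-based): starts at
    min_delay, doubles each time, and never exceeds max_delay."""
    return min(max_delay, min_delay * (2 ** (attempt - 1)))


def call_with_retries(func, max_retries, min_delay, max_delay, sleep=time.sleep):
    """Call func, retrying up to max_retries times after failures."""
    attempt = 0
    while True:
        try:
            return func()
        except Exception:
            if attempt >= max_retries:
                raise  # retries exhausted; surface the last error
            attempt += 1
            sleep(backoff_delay(attempt, min_delay, max_delay))


# Example: a call that fails twice, then succeeds. Delays are recorded
# in a list instead of actually sleeping.
recorded_delays = []
state = {"calls": 0}


def flaky_call():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = call_with_retries(flaky_call, max_retries=3, min_delay=1,
                           max_delay=10, sleep=recorded_delays.append)
```

With a minimum delay of 1 second and a maximum of 10 seconds, this schedule waits 1, 2, 4, and 8 seconds on successive retries and then stays at 10 seconds.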
ADD CONFIGURATION
Click to add additional configuration as key-value pairs.