AI Parser Processor

The AI Parser Processor in Gathr extracts and organizes information from PDF documents so that it can be used in downstream data processing.

Input Configurations

The input configurations for the AI Powered PDF Parser are explained below.

Input Column

Provide the column containing the PDF data to be parsed.

Base64 Encoded

Select if the input PDF data is encoded in Base64 format.
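If the input column holds Base64 text rather than raw bytes, the PDF was encoded before it reached the pipeline. The following Python sketch uses only the standard `base64` module to show that round trip, plus a simple check for the `%PDF-` header that PDF files begin with. The helper names are hypothetical and only illustrate the kind of value the Base64 Encoded option refers to.

```python
import base64


def pdf_bytes_to_base64(pdf_bytes: bytes) -> str:
    """Encode raw PDF bytes as a Base64 ASCII string (hypothetical helper)."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def base64_to_pdf_bytes(encoded: str) -> bytes:
    """Decode Base64 text back into raw PDF bytes; validate=True rejects stray characters."""
    return base64.b64decode(encoded, validate=True)


def looks_like_pdf(data: bytes) -> bool:
    """PDF files start with the '%PDF-' header."""
    return data.startswith(b"%PDF-")


if __name__ == "__main__":
    sample = b"%PDF-1.7\n% placeholder bytes, not a complete PDF\n"
    encoded = pdf_bytes_to_base64(sample)
    print(encoded[:8], looks_like_pdf(base64_to_pdf_bytes(encoded)))
```

Base64-encoded PDFs therefore typically begin with `JVBERi0`, the encoding of `%PDF-`, which is a quick way to spot whether a column holds Base64 data.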

Password-Protected

Select if the PDF is password-protected.

Fetch Password

Choose whether to fetch the PDF password from an existing column or to provide it inline.

Column

Fetch the password from the corresponding row of an existing column, so each PDF can have its own password.

Enter Password

Specify the column containing the password for the password-protected PDF, or provide a hardcoded password for the file.
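The difference between the two Fetch Password modes can be pictured with the following Python sketch: in column mode each row supplies its own password, while in inline mode one hardcoded password applies to every PDF. The function, its parameters, and the mode strings `"column"` and `"inline"` are hypothetical and do not describe Gathr internals.

```python
from typing import Mapping, Optional


def resolve_pdf_password(
    row: Mapping[str, Optional[str]],
    fetch_mode: str,
    password_column: Optional[str] = None,
    inline_password: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical helper: pick the password for one row based on the fetch mode."""
    if fetch_mode == "column":
        if not password_column:
            raise ValueError("password_column is required in 'column' mode")
        # Per-row password: different PDFs may use different passwords.
        return row.get(password_column)
    if fetch_mode == "inline":
        # One hardcoded password is applied to every PDF.
        return inline_password
    raise ValueError(f"unknown fetch_mode: {fetch_mode!r}")


if __name__ == "__main__":
    row = {"pdf_data": "JVBERi0...", "pdf_password": "s3cret"}
    print(resolve_pdf_password(row, "column", password_column="pdf_password"))
    print(resolve_pdf_password(row, "inline", inline_password="shared-pass"))
```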

Split PDF into Pages

Check to process each page separately (page-wise output).
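To illustrate what page-wise output means, the following Python sketch shapes already-extracted page text into output rows: one row per page when splitting is enabled, or a single row for the whole document otherwise. The row layout (`output` and `metadata` keys) and the helper name are hypothetical and are not Gathr's actual output schema.

```python
from typing import Dict, List


def to_output_rows(filename: str, page_texts: List[str], split_pages: bool) -> List[Dict[str, object]]:
    """Hypothetical illustration of page-wise versus document-wise output rows."""
    if split_pages:
        # One row per page; the 1-based page number is kept in the metadata.
        return [
            {"output": text, "metadata": {"filename": filename, "page_number": number}}
            for number, text in enumerate(page_texts, start=1)
        ]
    # One row for the whole document; pages are joined with blank lines.
    return [{"output": "\n\n".join(page_texts), "metadata": {"filename": filename}}]


if __name__ == "__main__":
    for row in to_output_rows("invoice.pdf", ["Page one text", "Page two text"], split_pages=True):
        print(row["metadata"], row["output"])
```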

AI Provider

Select the AI provider for document parsing. Available options are: Gathr Managed and User Provided.

If you select User Provided as the AI Provider, provide the details below.

LLM Provider

Select the LLM provider for text processing. Available options are MistralAI and OpenAI.

Connection Name

Select a connection from the list of saved connections, or create one with the Add New Connection option. The selected connection is used to connect to the chosen LLM provider.

For details on creating these connections, refer to Mistral AI Connection and OpenAI Connection.

Model

Select the AI model for text extraction.

Prompt for Extraction

Enter the prompt for text extraction.

Max Tokens

Maximum number of tokens the model can generate in its output.

Temperature

Controls the randomness of the model output (0-2).

Top P

Controls diversity via nucleus sampling (0-1).

Frequency Penalty

Penalizes repeated tokens (0-2).

Presence Penalty

Encourages new topics (0-2).
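The generation settings above are bounded by the ranges listed on this page. The following Python sketch validates a set of values against those ranges before they would be sent to a model. The dataclass, its default values, and the `as_request_params` helper are hypothetical and not part of Gathr; the dictionary keys simply mirror the parameter names used by the OpenAI Chat Completions API.

```python
from dataclasses import asdict, dataclass

# Inclusive ranges as listed on this page.
ALLOWED_RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "frequency_penalty": (0.0, 2.0),
    "presence_penalty": (0.0, 2.0),
}


@dataclass
class GenerationSettings:
    """Hypothetical container for the model settings; defaults are illustrative only."""
    max_tokens: int = 1024
    temperature: float = 0.0
    top_p: float = 1.0
    frequency_penalty: float = 0.0
    presence_penalty: float = 0.0

    def validate(self) -> None:
        """Raise ValueError if any setting falls outside its documented range."""
        if self.max_tokens <= 0:
            raise ValueError("max_tokens must be positive")
        for name, (low, high) in ALLOWED_RANGES.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside [{low}, {high}]")

    def as_request_params(self) -> dict:
        """Validate, then return the settings as a plain dictionary of request parameters."""
        self.validate()
        return asdict(self)


if __name__ == "__main__":
    print(GenerationSettings(temperature=0.7, top_p=0.9).as_request_params())
```

Lower temperature and top_p values make extraction output more deterministic, which is usually what a parsing task wants; the sketch does not assume any particular recommended values.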

Document Parsing

Choose how the PDF document is parsed, depending on your needs.

Available options are:

Optimize for Speed

Generates a fast parsing response using OCR-based processing, extracting text to produce Markdown that includes tables, headings, and paragraphs.

Optimize for Quality

Produces enhanced output for the entire PDF by parsing the file(s) with vision AI.

Optimize for Speed (Include Images)

Combines OCR-based parsing with vision AI to produce enhanced Markdown output, processing both text and embedded image content.

Output Configuration

The following options are available to configure the output.

Output Column

Specify the column where the parsed output will be stored.

Status Column

Specify the column where the status/error message will be stored.

Metadata Column

Specify the column where metadata (for example, page number and filename) will be stored.

Drop Input Column

Check to drop the input column from the output after parsing.

Rate Limiting

Option available to configure rate control.

Rate Control

Specify the maximum number of concurrent requests allowed within the given time frame (in seconds).
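One way to read the Rate Control setting is "at most N requests started within any window of T seconds". The following Python sketch implements that reading as a sliding-window limiter. It is an illustrative assumption about the behavior, not Gathr's implementation; the class name and the injectable `clock` parameter (which lets the example be tested without real waiting) are hypothetical.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowRateLimiter:
    """Permits at most `max_requests` request starts within any `window_seconds` span."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        if max_requests <= 0 or window_seconds <= 0:
            raise ValueError("max_requests and window_seconds must be positive")
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._starts: Deque[float] = deque()

    def _evict_expired(self, now: float) -> None:
        # Forget request start times that are no longer inside the window.
        while self._starts and now - self._starts[0] >= self.window_seconds:
            self._starts.popleft()

    def try_acquire(self) -> bool:
        """Record a request and return True if the window has room, else return False."""
        now = self._clock()
        self._evict_expired(now)
        if len(self._starts) < self.max_requests:
            self._starts.append(now)
            return True
        return False

    def seconds_until_available(self) -> float:
        """How long a caller would need to wait before try_acquire can succeed."""
        now = self._clock()
        self._evict_expired(now)
        if len(self._starts) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._starts[0])


if __name__ == "__main__":
    fake_now = [0.0]
    limiter = SlidingWindowRateLimiter(2, 10.0, clock=lambda: fake_now[0])
    print([limiter.try_acquire() for _ in range(3)], limiter.seconds_until_available())
```

A caller would loop on `try_acquire()` and sleep for `seconds_until_available()` whenever it returns False, so bursts are smoothed to the configured rate instead of failing at the provider.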

Retries

Option available to enable retries.

Enable Retries

Check to enable retries for failed API calls.

Max Retries

Maximum number of retry attempts for failed API calls.

Min Delay(s)

Minimum delay between retry attempts, in seconds.

Max Delay(s)

Maximum delay between retry attempts in seconds.
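The retry settings can be pictured as exponential backoff: the first retry waits Min Delay, each later retry doubles the wait, and no wait exceeds Max Delay. The following Python sketch assumes that doubling policy and omits random jitter; Gathr's actual backoff formula is not documented here, and the function names are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, min_delay: float, max_delay: float) -> float:
    """Delay before retry number `attempt` (1-based): doubles each time, capped at max_delay."""
    return min(max_delay, min_delay * (2 ** (attempt - 1)))


def call_with_retries(fn: Callable[[], T], max_retries: int, min_delay: float,
                      max_delay: float, sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying up to max_retries times after a failure (hypothetical helper)."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            # A real client would retry only transient errors (timeouts, HTTP 429/5xx).
            if attempt >= max_retries:
                raise
            attempt += 1
            sleep(backoff_delay(attempt, min_delay, max_delay))


if __name__ == "__main__":
    print([backoff_delay(n, 1.0, 10.0) for n in range(1, 6)])
```

Capping the wait at Max Delay keeps a long run of failures from stalling the pipeline indefinitely, while Max Retries bounds the total number of extra calls.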

ADD CONFIGURATION

Click to add additional configuration as key-value pairs.
