Atri AI Documentation

Overview

The Documents Upload connector lets you create conversational AI experiences from your existing documents. Uploading files directly to the platform establishes a knowledge base that powers conversations without complex integrations or technical setup. It is the most direct way to build on your existing documentation, manuals, reports, and other text-based content.

Input Requirements & Specifications

Supported File Formats

Our document processing pipeline supports the most commonly used business document formats for broad compatibility with your existing content library. Each format is processed with extraction techniques specific to that file type to preserve content fidelity and search effectiveness.

PDF

Portable Document Format files with text extraction capabilities

DOCX

Microsoft Word documents with full formatting preservation

TXT

Plain text files for simple, unformatted content

File Size & Processing Limits

To ensure optimal performance and processing efficiency, the following limits apply to document uploads:

Maximum File Size: 25 MB per file
Individual file size limit to ensure efficient processing

Total Upload Capacity: Varies by plan
Aggregate storage limits based on your subscription tier

Processing Time: 1-5 minutes
Average time for document indexing and embedding generation

Concurrent Uploads: Multiple supported
Upload multiple documents simultaneously for faster setup
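
The format and size limits above can be checked locally before uploading. The following is a minimal sketch, not part of the platform: the function names are hypothetical, and it assumes "25MB" means 25 * 1024 * 1024 bytes, which the platform may count differently.

```python
from pathlib import Path

# Formats listed in this documentation.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".txt"}

# Assumption: "25MB" is read as 25 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def check_upload(name: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    suffix = Path(name).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file exceeds 25 MB: {size_bytes} bytes")
    return problems


def check_path(path: Path) -> list[str]:
    """Convenience wrapper that reads the file size from disk."""
    return check_upload(path.name, path.stat().st_size)
```

Running check_path over a folder before a batch upload surfaces unsupported or oversized files early, instead of after a failed upload.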

Information Extraction & Processing Pipeline

Our document processing pipeline transforms static documents into searchable knowledge bases in four stages: document ingestion, content extraction, semantic analysis, and vector embedding generation.

Document Ingestion

The initial stage handles secure document upload and format validation, so that every supported file type is properly received and prepared for processing.

It also extracts metadata, including file properties and document structure.

Content Extraction

Text extraction techniques preserve document structure while identifying key information elements that enhance search and conversation capabilities.

In this stage, documents are converted to text-friendly formats, and their text is extracted with formatting and structure preserved.

Semantic Analysis

The extracted content undergoes semantic analysis to understand context, relationships, and meaning, enabling more intelligent conversational responses.

The main goal of this stage is to map, transform, and chunk the content so that it is ready for embedding generation.

Vector Embedding Generation

The final processing stage creates high-dimensional vector representations that capture semantic meaning and enable intelligent retrieval during conversations.

Embedding models produce the semantic representation, and the embedded content is indexed by similarity for efficient retrieval.
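
To illustrate how similarity-based retrieval over embeddings can work, here is a minimal sketch using cosine similarity. The platform's actual embedding model and index are not described in this documentation, so the function names, the chunk identifiers, and the 3-dimensional toy vectors are all hypothetical stand-ins.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k chunks whose vectors are most similar to the query."""
    ranked = sorted(
        index,
        key=lambda chunk_id: cosine_similarity(query, index[chunk_id]),
        reverse=True,
    )
    return ranked[:k]


# Toy vectors standing in for real embeddings (hypothetical values).
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.0],
    "api-reference": [0.0, 0.1, 0.9],
}

# A query vector close to the "refund-policy" chunk ranks that chunk first.
best = top_k([1.0, 0.0, 0.0], index, k=1)
```

A production index would typically use an approximate nearest-neighbor structure rather than sorting every vector, but the ranking principle is the same.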

Key Features & Capabilities

Intelligent Content Chunking

Documents are automatically segmented into meaningful chunks that preserve context while optimizing for search and retrieval performance.

Key benefits include:

Maintains document structure and flow
Optimizes chunk sizes for embedding generation
Preserves cross-references and relationships
Enhances retrieval accuracy and relevance
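
As an illustration of structure-aware chunking, the sketch below packs whole paragraphs into size-limited chunks. This is one simple strategy chosen for the example, not the platform's actual algorithm; the function name and the 500-character default are assumptions.

```python
def chunk_paragraphs(text: str, max_chars: int = 500) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars becomes its own oversized
    chunk rather than being cut mid-sentence.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars or not current:
            # Either the paragraph fits, or it starts a new (possibly oversized) chunk.
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Keeping paragraph boundaries intact is what lets a chunk preserve local context; splitting at a fixed character offset would be simpler but can separate a sentence from the one that explains it.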

Multi-Format Processing

Comprehensive support for business document formats ensures broad compatibility with your existing content ecosystem.

Key benefits include:

Unified processing pipeline for all supported formats
Consistent extraction quality across file types
Preservation of document-specific formatting
Seamless integration of mixed-format document libraries

Metadata Enrichment

Documents are enhanced with extracted metadata that improves search accuracy and provides additional context for conversations.

Key benefits include:

Automatic extraction of document properties
Easy referencing and citations
Relationship mapping between documents
Enhanced search and filtering capabilities
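
The kind of file properties that metadata extraction can draw on may be sketched as follows. The record fields and function name are hypothetical; the metadata the platform actually extracts is not specified in this documentation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from pathlib import Path


@dataclass
class DocumentMetadata:
    """Hypothetical metadata record built from basic file properties."""
    name: str
    extension: str
    size_bytes: int
    modified_utc: str


def read_file_properties(path: Path) -> DocumentMetadata:
    """Collect name, extension, size, and last-modified time for one file."""
    stat = path.stat()
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return DocumentMetadata(
        name=path.name,
        extension=path.suffix.lower(),
        size_bytes=stat.st_size,
        modified_utc=modified.isoformat(timespec="seconds"),
    )
```

Properties such as the name and modification time are what make citations ("from handbook.pdf, last updated ...") and filtering possible later.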

Future Multi-Modal Support

The connector currently focuses on text-based document processing. Our roadmap includes multi-modal capabilities that will expand the types of content that can be processed and enrich conversational experiences.

Image and Chart Processing (Coming Soon)

Advanced OCR and image analysis will extract information from charts, graphs, and diagrams embedded within documents.

Audio Transcript Integration (Under Development)

Support for audio files with automatic transcription and integration into the knowledge base for voice-based content.

Video Content Analysis (Planned)

Video file processing with transcript extraction and visual content analysis for comprehensive multimedia support.

Current Limitations & Considerations

Auto-Refresh Not Supported

Unlike some other connectors, the Documents Upload connector does not support automatic synchronization. Document updates require manual re-upload and processing.
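
Because updates must be re-uploaded manually, it can help to track which local files have changed since the last upload. The sketch below compares SHA-256 digests against a manifest you maintain yourself; the manifest format and function names are assumptions, not a platform feature.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 digest of a file, read in blocks to keep memory use low."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(65536), b""):
            h.update(block)
    return h.hexdigest()


def files_needing_reupload(folder: Path, manifest: dict[str, str]) -> list[str]:
    """Return names of files that are new or changed since the manifest was recorded.

    manifest maps file name -> digest recorded at the last upload (hypothetical format).
    """
    changed = []
    for path in sorted(folder.iterdir()):
        if path.is_file() and manifest.get(path.name) != file_digest(path):
            changed.append(path.name)
    return changed
```

After each upload, record the current digests in the manifest; on the next review, only the names returned by files_needing_reupload need attention.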

Static Content Processing

The connector processes documents as static content snapshots, without dynamic linking or real-time updates from source systems.

Text-Based Processing

Current processing focuses primarily on text content, with limited support for complex visual elements or multimedia components.

Best Practices for Document Upload

Document Preparation

Ensure documents are text-searchable rather than image-only scans
Use consistent naming conventions for easy identification
Remove or redact sensitive information before upload

Content Optimization

Include comprehensive metadata in document properties
Give each document a unique name
Avoid excessive formatting that might interfere with text extraction
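
Name collisions in a document collection can be spotted before upload with a short check. This sketch treats names as duplicates when they match after ignoring case, folder, and extension; that matching rule and the function name are assumptions made for the example.

```python
from collections import defaultdict
from pathlib import Path


def find_duplicate_names(paths: list[str]) -> dict[str, list[str]]:
    """Group paths whose file names match, ignoring case, folder, and extension."""
    groups: dict[str, list[str]] = defaultdict(list)
    for p in paths:
        groups[Path(p).stem.lower()].append(p)
    # Keep only the names that occur more than once.
    return {stem: files for stem, files in groups.items() if len(files) > 1}
```

For example, "hr/Policy.pdf" and "legal/policy.docx" would be reported together under "policy", which is a cue to rename one of them before uploading.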

Knowledge Base Management

Regularly review and update document collections
Remove outdated or redundant documents
Monitor conversation quality to identify content gaps
