Website Domain Connector

Transform your entire website into an intelligent conversational AI by connecting your domain URL. Our web crawler processes your site content and maintains fresh information through automated daily updates.

Input Requirements

The Website Domain connector requires minimal setup: provide your website's base domain URL and the system handles the rest.

Base Domain URL (Required)

Provide the root domain of your website (e.g., https://yourcompany.com). The system will automatically discover and crawl all accessible pages from this starting point.

https://yourcompany.com

Website Accessibility (Required)

Ensure your website is publicly accessible and not behind authentication walls. The crawler needs to access content without login credentials.

Content Structure

Websites with clear navigation structure, proper HTML semantics, and well-organized content produce better AI chat experiences.

How Website Crawling Works

Our intelligent web crawling system processes your website content through a multi-stage pipeline designed to capture and understand your site's knowledge.

1. Initial Discovery

The system starts with your provided domain URL and discovers all accessible pages through link analysis and sitemap processing.

  • Reads and processes your sitemap.xml file
  • Follows internal links to discover additional pages
  • Respects robots.txt crawling guidelines
  • Identifies content-rich pages for prioritization
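
As a rough illustration of the link-following step, the sketch below collects same-host links from one page using only the Python standard library. The names LinkCollector and internal_links are hypothetical, and the connector's actual discovery logic is not documented here, so treat this as one plausible approach rather than the real implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def internal_links(page_url, html):
    """Return the sorted, de-duplicated same-host URLs linked from a page."""
    collector = LinkCollector()
    collector.feed(html)
    host = urlparse(page_url).netloc
    found = set()
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so each page appears once.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parsed = urlparse(absolute)
        if parsed.scheme in ("http", "https") and parsed.netloc == host:
            found.add(absolute)
    return sorted(found)
```

Applied repeatedly to each newly found page, a function like this yields a breadth-first crawl that stays within the starting domain; external links and mailto: links are dropped.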

2. Content Extraction

Each discovered page undergoes intelligent content extraction that focuses on meaningful text while filtering out navigation and promotional elements.

  • Extracts main content areas using advanced HTML parsing
  • Filters out navigation menus, footers, and sidebars
  • Processes headings, paragraphs, and structured content
  • Maintains content hierarchy and relationships
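
The filtering described above could look roughly like the following sketch, which keeps text outside nav, footer, and aside elements. The tag list and the names MainTextExtractor and extract_main_text are assumptions made for illustration; the connector's real extraction rules may differ.

```python
from html.parser import HTMLParser

# Assumed set of elements treated as non-content.
SKIPPED_TAGS = {"nav", "footer", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Keeps text that is not nested inside a skipped element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside any skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html):
    """Return the page's main text, one text node per line."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

This is why semantic HTML matters: content wrapped in generic div elements gives a filter like this nothing to distinguish menus from body text.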

3. Content Processing

Extracted content is processed and optimized for conversational AI through cleaning, conversion to markdown, and semantic analysis.

  • Cleans the crawled HTML content
  • Converts HTML to Markdown format with emphasis on text content
  • Creates semantic embeddings for AI understanding
  • Optimizes content for retrieval and generation
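
To make the HTML-to-Markdown step concrete, here is a deliberately small sketch that handles only headings, paragraphs, and list items. MarkdownConverter and html_to_markdown are hypothetical names, and a production converter would cover far more markup (links, tables, nested lists); the embedding step is not shown.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Converts a few block-level tags to Markdown lines (illustrative only)."""

    PREFIXES = {"h1": "# ", "h2": "## ", "h3": "### ", "li": "- ", "p": ""}

    def __init__(self):
        super().__init__()
        self.lines = []
        self.current_tag = None  # block element currently being read
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in self.PREFIXES:
            self.current_tag = tag
            self.buffer = []

    def handle_data(self, data):
        if self.current_tag is not None:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self.current_tag:
            # Collapse runs of whitespace left over from the HTML source.
            text = " ".join("".join(self.buffer).split())
            if text:
                self.lines.append(self.PREFIXES[tag] + text)
            self.current_tag = None


def html_to_markdown(html):
    """Return Markdown with one blank line between blocks."""
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    return "\n\n".join(converter.lines)
```

Inline tags such as b are dropped while their text is kept, which matches the stated emphasis on text content.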

4. Index Integration

Processed content is integrated into your AI chat's knowledge base, making it immediately available for conversational interactions.

  • Adds content to the conversational knowledge base
  • Enables source attribution for transparency
  • Integrates with existing knowledge sources
  • Activates real-time chat capabilities

Setting Up Your Website Connector

Get started with the Website Domain connector in just a few simple steps.

1. Enter your website's base domain URL.
2. Configure crawling preferences in your robots.txt.
3. For best results, make sure a sitemap.xml file is available.
4. Start the crawling process.
5. Monitor indexing progress and wait for it to complete.
6. Begin conversational interactions.

Daily Auto-Refresh System

Your AI chat stays current with your website through our automated daily refresh system that monitors and updates content changes.
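
The refresh mechanism is not described in detail here. One common way to detect changes between daily crawls is to compare content fingerprints, as in the sketch below. The functions content_fingerprint and pages_to_reindex are hypothetical, and hashing is an assumed technique, not a documented feature of the connector.

```python
import hashlib


def content_fingerprint(text):
    """Hash the text after collapsing whitespace, so reformatting alone is ignored."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def pages_to_reindex(previous, current_pages):
    """Compare yesterday's fingerprints with today's crawl.

    previous: dict mapping URL -> fingerprint from the last crawl
    current_pages: dict mapping URL -> extracted text from today's crawl
    Returns (changed_or_new_urls, removed_urls), both sorted.
    """
    changed = [
        url
        for url, text in current_pages.items()
        if previous.get(url) != content_fingerprint(text)
    ]
    removed = [url for url in previous if url not in current_pages]
    return sorted(changed), sorted(removed)
```

Under this scheme only changed or new pages need to be re-processed, and removed pages can be dropped from the knowledge base.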

Website Configuration Best Practices

Following these best practices ensures optimal content discovery and processing for your AI chat system.

Robots.txt Configuration

Proper robots.txt configuration guides our crawler to the most relevant content while respecting your crawling preferences.

Best Practices:

  • Allow crawler access to important content sections
  • Include sitemap location in robots.txt

Example:

User-agent: *
Allow: /
Disallow: /admin/
Disallow: /login/
Disallow: /private/
Sitemap: https://yoursite.com/sitemap.xml
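
To check how rules like these are likely to be interpreted, the sketch below evaluates them with the longest-match rule from RFC 9309, under which the most specific matching path wins. It assumes a single "User-agent: *" group and plain path prefixes (no wildcards). parse_rules and is_allowed are hypothetical helpers, not part of the connector. Some parsers evaluate rules in file order instead, in which case the leading "Allow: /" would match first, so placing the Disallow lines before it is the safer ordering.

```python
ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /login/
Disallow: /private/
Sitemap: https://yoursite.com/sitemap.xml
"""


def parse_rules(robots_txt):
    """Return (rules, sitemaps); rules are (is_allow, path_prefix) for the '*' group."""
    rules, sitemaps = [], []
    in_star_group = False
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            in_star_group = value == "*"
        elif field == "sitemap":
            sitemaps.append(value)
        elif field in ("allow", "disallow") and in_star_group and value:
            rules.append((field == "allow", value))
    return rules, sitemaps


def is_allowed(rules, path):
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best = (0, True)
    for allow, prefix in rules:
        if path.startswith(prefix):
            best = max(best, (len(prefix), allow))
    return best[1]
```

With the example file, is_allowed reports /products/ as crawlable and /admin/users as blocked, and parse_rules returns the sitemap URL for discovery.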

XML Sitemap Optimization

A comprehensive XML sitemap helps our crawler discover all important pages and understand your site structure.

Best Practices:

  • Include all important content pages
  • Keep sitemap updated with new content

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://yoursite.com/products/</loc>
    <lastmod>2024-01-10</lastmod>
    <priority>0.8</priority>
  </url>
</urlset>
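
A sitemap like the one above can be read with Python's standard XML parser, as sketched below. The function name sitemap_entries is hypothetical, and ordering by priority is an assumption about how a crawler might use the file; the default of 0.5 for a missing priority comes from the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <urlset> root element.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://yoursite.com/products/</loc>
    <lastmod>2024-01-10</lastmod>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://yoursite.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <priority>1.0</priority>
  </url>
</urlset>
"""


def sitemap_entries(xml_text):
    """Return (loc, lastmod, priority) tuples, highest priority first."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=NS)
        priority = float(url.findtext("sm:priority", default="0.5", namespaces=NS))
        if loc:
            entries.append((loc, lastmod, priority))
    return sorted(entries, key=lambda entry: entry[2], reverse=True)
```

Accurate lastmod values are what make a sitemap useful for refreshes, since they let a crawler skip pages that have not changed.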

Content Structure Guidelines

Well-structured pages with sufficient text improve AI understanding and response quality.

Best Practices:

  • Prefer written descriptions over illustrations, since processing emphasizes text content
  • Use alt text for images and meaningful link text