Knowledge Index & File Storage Guide

How Velaro organizes your knowledge

Velaro uses Azure AI Search to power bot answers, Agent Assist suggestions, and auto-learned Q&A. Every site gets an index automatically when you connect a data source (scraper, file upload, KB article, or conversation training). You do not need to create or configure an index — it is provisioned the first time data is ingested.

Shared vs. dedicated physical index

| Plan | Index type | What it means |
| --- | --- | --- |
| Starter / Professional | Shared | Your data lives in a multi-tenant physical index, isolated by your site ID. Fast setup, no cost overhead. |
| Enterprise | Dedicated | Your own physical Azure AI Search index. Fully isolated, independently scalable, can use a different embedding model. |

Shared indexes are segmented so no site can read another site's data — isolation is enforced at query time by a mandatory site_id filter. If you upgrade from Professional to Enterprise, Velaro migrates your data to a dedicated index in the background and reconnects all your resources (scraper, files, KB articles, conversation training) automatically.
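To illustrate what a mandatory site filter looks like, here is a minimal sketch that builds an Azure AI Search OData filter string. The function name `site_filter` and the assumption that `site_id` is stored as a string field are illustrative only; Velaro applies this filter for you, and its internal implementation is not documented here.

```python
from typing import Optional


def site_filter(site_id: str, extra_filter: Optional[str] = None) -> str:
    """Build an OData filter string that always pins results to one site."""
    if not site_id:
        raise ValueError("site_id is mandatory; refusing to build an unfiltered query")
    # OData string literals escape a single quote by doubling it.
    escaped = site_id.replace("'", "''")
    clause = f"site_id eq '{escaped}'"
    if extra_filter:
        # Parenthesize the extra filter so an "or" inside it cannot widen
        # the query beyond this site.
        clause = f"{clause} and ({extra_filter})"
    return clause


print(site_filter("site-123"))
print(site_filter("site-123", "index_name eq 'kb_articles'"))
```

Because the site clause is always added first and any additional condition is wrapped in parentheses, no caller-supplied condition can return documents belonging to another site.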

Virtual namespaces within your index

Within your index (shared or dedicated), Velaro separates data by source type using an internal index_name field:

| Namespace | What it contains |
| --- | --- |
| scraper_pages | Web pages scraped from your site; uploaded documents (AI type) |
| kb_articles | Knowledge base articles you publish in Velaro |
| conv_training | High-quality Q&A pairs extracted from past conversations |
| youtube_content | YouTube transcript content (if connected) |

The bot queries across all namespaces that are enabled for your subscription, ranking results by relevance. You do not need to manage these namespaces directly.
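As a rough sketch of how a query could be limited to the enabled namespaces, the snippet below builds an OData `search.in` clause on the `index_name` field. The helper `namespace_filter` and the idea of passing in a set of enabled namespaces are assumptions made for illustration; how Velaro actually tracks which namespaces your subscription enables is not described in this guide.

```python
from typing import Iterable

# Namespace names as listed in this guide.
ALL_NAMESPACES = ("scraper_pages", "kb_articles", "conv_training", "youtube_content")


def namespace_filter(enabled: Iterable[str]) -> str:
    """Return an OData clause matching only the enabled namespaces."""
    enabled_set = set(enabled)
    # Keep a stable order and silently ignore names this guide does not list.
    selected = [ns for ns in ALL_NAMESPACES if ns in enabled_set]
    if not selected:
        raise ValueError("at least one known namespace must be enabled")
    # search.in(field, 'a,b,c', ',') matches any value in the delimited list.
    return f"search.in(index_name, '{','.join(selected)}', ',')"


print(namespace_filter({"kb_articles", "scraper_pages"}))
```

In a real query this clause would be combined with the site filter using `and`, so that results are restricted both to your site and to the namespaces you have enabled.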

When to use a separate index for different content areas

Most sites use one index for all their content. Consider requesting a separate index if:

  • You have two fundamentally different business lines with completely separate audiences (example: personal auto insurance vs. commercial liability policies) and you want to guarantee no cross-contamination in bot answers.
  • You have a large Enterprise deployment where one product line's data should never surface in another product line's chat widget.

For most segmentation needs — like separating support content from sales content — the bot's workflow system and channel-specific prompts handle this without a separate index. Talk to your account manager if you think separate indexes would help your situation.

File storage: AI files vs. general files

When you upload a file in Velaro, you choose its purpose:

AI files (uploaded as "AI" type)

  • Immediately indexed into your knowledge base vector index.
  • The bot can retrieve excerpts from these files to answer visitor questions.
  • The bot uses the content of the file to generate answers — it does not send the file as an attachment.
  • Use for: product manuals, policy documents, FAQ sheets, training materials you want the bot to draw on.
  • Limit: 25 MB per file. Plan quota applies (Starter: 50 files, Professional: 500, Enterprise: 5,000).

General files (uploaded as "General" type)

  • Stored in Velaro's file library but NOT indexed for AI retrieval.
  • Agents can manually attach or share a link to these files in any conversation.
  • The bot does not automatically surface general files — an agent selects and sends them.
  • Use for: contracts, price sheets, signed forms, any document that should only be shared intentionally by a human agent.
  • Limit: 20 MB per file.
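The limits above can be checked before an upload with a small helper like the following sketch. The function `check_upload` is hypothetical, and treating 1 MB as 1,048,576 bytes is an assumption, since this guide does not say whether limits are measured in decimal or binary megabytes. The file-count quota is applied only to AI files because that is the only place this guide states one.

```python
from typing import List

MB = 1024 * 1024  # assumption: binary megabytes

SIZE_LIMIT_BYTES = {"AI": 25 * MB, "General": 20 * MB}
AI_FILE_QUOTA = {"Starter": 50, "Professional": 500, "Enterprise": 5000}


def check_upload(file_type: str, size_bytes: int, plan: str, ai_files_used: int = 0) -> List[str]:
    """Return a list of problems; an empty list means the upload fits the limits."""
    if file_type not in SIZE_LIMIT_BYTES:
        raise ValueError(f"unknown file type: {file_type!r}")
    if plan not in AI_FILE_QUOTA:
        raise ValueError(f"unknown plan: {plan!r}")

    problems = []
    limit = SIZE_LIMIT_BYTES[file_type]
    if size_bytes > limit:
        problems.append(f"{file_type} files are limited to {limit // MB} MB")
    # Only AI files count against the plan quota described above.
    if file_type == "AI" and ai_files_used >= AI_FILE_QUOTA[plan]:
        problems.append(f"{plan} plan quota of {AI_FILE_QUOTA[plan]} AI files reached")
    return problems


print(check_upload("AI", 10 * MB, "Starter", ai_files_used=12))
print(check_upload("General", 21 * MB, "Professional"))
```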

How to decide: AI file vs. general file

| Scenario | Use |
| --- | --- |
| Product spec sheet you want the bot to quote from | AI file |
| Contract template an agent sends after a sale | General file |
| FAQ document with common support answers | AI file |
| Price list that changes weekly and must be reviewed before sharing | General file |
| Installation manual the bot should summarize on request | AI file |
| Signed agreement the customer requested a copy of | General file |

A file cannot be both types at once. If you need the bot to answer questions from a document AND agents to send it as a link, upload it twice — once as AI and once as General.

Conversation training (auto-learn)

When conversation training is enabled, Velaro automatically extracts Q&A pairs from resolved conversations that meet a quality bar (minimum 4 messages, agent participated, CSAT ≥ 4 when collected). These pairs are indexed under the conv_training namespace and used to improve bot answers over time.
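The quality bar can be read as a simple predicate. The sketch below is an illustration under stated assumptions: the `Conversation` fields and the function `meets_quality_bar` are hypothetical names, and a missing CSAT score is treated as passing because the rule only applies "when collected".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    resolved: bool
    message_count: int
    agent_participated: bool
    csat: Optional[int] = None  # None means no CSAT score was collected


def meets_quality_bar(conv: Conversation) -> bool:
    """Apply the criteria from this guide to one conversation."""
    if not conv.resolved:
        return False
    if conv.message_count < 4:
        return False
    if not conv.agent_participated:
        return False
    # The CSAT rule only applies when a score exists.
    if conv.csat is not None and conv.csat < 4:
        return False
    return True


print(meets_quality_bar(Conversation(True, 6, True, csat=5)))
print(meets_quality_bar(Conversation(True, 6, True, csat=3)))
```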

Conversation training data is:

  • Isolated to your site — no cross-tenant sharing.
  • Weighted alongside your other knowledge sources at query time.
  • Automatically deduplicated — the same conversation is never re-indexed if it has not changed.
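One common way to skip unchanged items is to keep a content fingerprint per conversation and re-index only when it changes. This guide does not say how Velaro implements deduplication, so the following is only a sketch of that general technique; the class `TrainingIndex` and its method are hypothetical.

```python
import hashlib
from typing import Dict


class TrainingIndex:
    """Tracks a SHA-256 fingerprint per conversation to skip unchanged ones."""

    def __init__(self) -> None:
        self._fingerprints: Dict[str, str] = {}

    def should_index(self, conversation_id: str, transcript: str) -> bool:
        """Return True if the conversation is new or its content changed."""
        digest = hashlib.sha256(transcript.encode("utf-8")).hexdigest()
        if self._fingerprints.get(conversation_id) == digest:
            return False  # same content as last time, nothing to do
        self._fingerprints[conversation_id] = digest
        return True


index = TrainingIndex()
print(index.should_index("conv-1", "Q: How do I reset my password? A: Use the reset link."))
print(index.should_index("conv-1", "Q: How do I reset my password? A: Use the reset link."))
```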

This feature requires the Conversation Training subscription add-on.

Reindexing and data freshness

| Source | Re-indexing frequency |
| --- | --- |
| Scraper (web pages) | Per your plan: Starter monthly, Professional weekly, Enterprise daily |
| Uploaded AI files | Immediately on upload; no automatic re-indexing afterward |
| KB articles | Immediately on publish |
| Conversation training | Within minutes of the conversation being resolved |

If you unpublish a KB article, it is removed from the index immediately. If you delete a file, it is removed at the next scheduled scrape.
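To estimate when scraped content will next be refreshed, the plan schedule can be expressed as an interval added to the last scrape time. This is a sketch only: the function `next_scrape` is hypothetical, "monthly" is approximated as 30 days, and the actual time of day at which Velaro runs scheduled scrapes is not stated in this guide.

```python
from datetime import datetime, timedelta

# Assumption: "monthly" is approximated as 30 days.
SCRAPE_INTERVAL = {
    "Starter": timedelta(days=30),
    "Professional": timedelta(days=7),
    "Enterprise": timedelta(days=1),
}


def next_scrape(plan: str, last_scrape: datetime) -> datetime:
    """Return the estimated time of the next scheduled scrape for a plan."""
    if plan not in SCRAPE_INTERVAL:
        raise ValueError(f"unknown plan: {plan!r}")
    return last_scrape + SCRAPE_INTERVAL[plan]


print(next_scrape("Professional", datetime(2024, 1, 1)))
```

On a Starter plan, a change made just after a scrape may therefore take roughly a month to appear in bot answers.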
