Knowledge Index & File Storage Guide

Last updated May 7, 2026

How Velaro organizes your knowledge

Velaro uses Azure AI Search to power bot answers, Agent Assist suggestions, and auto-learned Q&A. Every site gets an index automatically when you connect a data source (scraper, file upload, KB article, or conversation training). You do not need to create or configure an index — it is provisioned the first time data is ingested.

Shared vs. dedicated physical index

Plan	Index type	What it means
Starter / Professional	Shared	Your data lives in a multi-tenant physical index, isolated by your site ID. Fast setup, no cost overhead.
Enterprise	Dedicated	Your own physical Azure AI Search index. Fully isolated, independently scalable, can use a different embedding model.

Shared indexes are segmented so no site can read another site's data — isolation is enforced at query time by a mandatory site_id filter. If you upgrade from Professional to Enterprise, Velaro migrates your data to a dedicated index in the background and reconnects all your resources (scraper, files, KB articles, conversation training) automatically.
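
As a rough illustration of what a mandatory query-time filter looks like, the sketch below builds an OData filter expression of the kind Azure AI Search accepts. The field name site_id comes from this guide; the helper name and the exact expression Velaro sends are assumptions, not the actual implementation.

```python
def build_site_filter(site_id: str) -> str:
    """Return an OData filter expression restricting results to one site.

    Hypothetical helper: the function name and the exact filter Velaro
    sends are assumptions made for illustration.
    """
    if not site_id:
        # Refuse to build an unfiltered query against a shared index.
        raise ValueError("site_id is required for shared-index queries")
    # OData string literals escape a single quote by doubling it.
    escaped = site_id.replace("'", "''")
    return f"site_id eq '{escaped}'"


print(build_site_filter("site-1042"))  # site_id eq 'site-1042'
```

Because the filter is attached to every request, a query from one site never matches documents stored under another site's ID, even though both live in the same physical index.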

Virtual namespaces within your index

Within your index (shared or dedicated), Velaro separates data by source type using an internal index_name field:

Namespace	What it contains
`scraper_pages`	Web pages scraped from your site; uploaded documents (AI type)
`kb_articles`	Knowledge base articles you publish in Velaro
`conv_training`	High-quality Q&A pairs extracted from past conversations
`youtube_content`	YouTube transcript content (if connected)

The bot queries across all namespaces that are enabled for your subscription, ranking results by relevance. You do not need to manage these namespaces directly.
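
The sketch below shows one way to picture this behavior: hits from namespaces that are not enabled are dropped, and the rest are ordered by relevance. The Hit class, the rank_hits function, and the "higher score is better" convention are assumptions for illustration; only the namespace names come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    index_name: str  # namespace the chunk belongs to
    text: str
    score: float     # relevance score; higher is better (assumed)


def rank_hits(hits: list[Hit], enabled: set[str]) -> list[Hit]:
    """Keep hits from enabled namespaces and order them by relevance."""
    allowed = [h for h in hits if h.index_name in enabled]
    return sorted(allowed, key=lambda h: h.score, reverse=True)


hits = [
    Hit("kb_articles", "Reset steps", 0.82),
    Hit("conv_training", "Agent answer", 0.91),
    Hit("scraper_pages", "Pricing page", 0.88),
]
# A subscription without conversation training enabled (hypothetical).
ranked = rank_hits(hits, enabled={"scraper_pages", "kb_articles"})
print([h.index_name for h in ranked])  # ['scraper_pages', 'kb_articles']
```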

Named knowledge indexes — one or many?

You can create multiple named knowledge indexes (your plan limit is shown in the top-right of the Knowledge Index page). By default, bots search all indexes simultaneously. If you assign a specific index to a bot, that bot searches only its assigned index.

Use one index for most situations. A single index handles multiple bots, multiple topics, and large content libraries without any extra configuration. All your bots can query it at the same time.

Use a separate index when content from two areas could contaminate each other's answers. The classic case is serving multiple organizations whose documents are similar in structure but must never mix — for example, two insurance companies whose liability policy language overlaps, or a staffing agency managing separate client handbooks. In these cases, a dedicated index per organization guarantees that a question asked by one organization's visitors can only surface that organization's documents.

Separate indexes are not needed to separate support content from sales content, or to give different bots different focus areas — use the bot's Knowledge Sources panel and workflow system for that instead.

A bot pinned to one index cannot query any other index: if two pinned bots ever need to combine or compare answers, they must share one index.

How bots are routed to indexes

By default, a bot queries all indexes. To pin a bot to a specific index, go to the bot settings → Knowledge tab → select the index from the Knowledge Index dropdown. Once pinned, the bot ignores all other indexes regardless of what is configured at the account level.
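
The routing rule can be summarized in a few lines. This is a minimal sketch with hypothetical names (indexes_for_bot, pinned_index); it models the behavior described above rather than Velaro's actual code.

```python
from typing import Optional


def indexes_for_bot(account_indexes: list[str], pinned_index: Optional[str]) -> list[str]:
    """Return the indexes a bot searches.

    An unpinned bot searches every index on the account; a pinned bot
    searches only its assigned index.
    """
    if pinned_index is None:
        return list(account_indexes)
    if pinned_index not in account_indexes:
        raise ValueError(f"unknown index: {pinned_index}")
    return [pinned_index]


indexes = ["acme-insurance", "borealis-insurance"]  # hypothetical index names
print(indexes_for_bot(indexes, None))              # ['acme-insurance', 'borealis-insurance']
print(indexes_for_bot(indexes, "acme-insurance"))  # ['acme-insurance']
```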

File storage: AI files vs. general files

When you upload a file in Velaro, you choose its purpose:

AI files (uploaded as "AI" type)

Immediately indexed into your knowledge base vector index.
The bot can retrieve excerpts from these files to answer visitor questions.
The bot uses the content of the file to generate answers — it does not send the file as an attachment.
Use for: product manuals, policy documents, FAQ sheets, training materials you want the bot to draw on.
File size limit: 25 MB per file. Page and file count limits vary by plan (see your plan's Usage page).

General files (uploaded as "General" type)

Stored in Velaro's file library but NOT indexed for AI retrieval.
Agents can manually attach or share a link to these files in any conversation.
The bot does not automatically surface general files — an agent selects and sends them.
Use for: contracts, price sheets, signed forms, any document that should only be shared intentionally by a human agent.
File size limit: 20 MB per file.
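
If you upload files from a script or an internal tool, a pre-flight check along these lines can catch oversized files before they are rejected. The limits are the ones stated above; the function names are hypothetical, and treating 1 MB as 1,048,576 bytes is an assumption, since this guide does not say which definition applies.

```python
MB = 1024 * 1024  # assumption: binary megabytes

# Per-file size limits by upload type, as stated in this guide.
SIZE_LIMITS = {"AI": 25 * MB, "General": 20 * MB}


def within_size_limit(file_type: str, size_bytes: int) -> bool:
    """Return True if a file of this type and size fits the stated limit."""
    if file_type not in SIZE_LIMITS:
        raise ValueError(f"unknown file type: {file_type}")
    return size_bytes <= SIZE_LIMITS[file_type]


def is_bot_searchable(file_type: str) -> bool:
    """Only AI files are indexed for retrieval; General files are not."""
    return file_type == "AI"


print(within_size_limit("AI", 22 * MB))       # True
print(within_size_limit("General", 22 * MB))  # False
```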

How Velaro meters knowledge ingestion

Velaro tracks ingestion by pages processed (across uploaded files and web scraper) and total indexed chunks — not by raw file count alone. Your plan includes a monthly page budget and a total storage ceiling. When you approach a limit, the admin dashboard shows your current usage on the Knowledge → Usage tab. If your use case requires significantly more capacity (for example, a large document corpus across thousands of files), contact your account manager to discuss a custom arrangement.

What counts as a page:

Uploaded files (PDF, DOCX, XLSX, etc.): 1 page = 1 document page as reported by the file's page count
Web scraper: 1 page = 1 URL crawled
Both count against the same monthly page budget

Per-file page cap: Your plan limits how many pages can be indexed from a single file. Pages beyond that cap are not indexed — the first N pages are ingested and the rest are skipped. This prevents a single large document from consuming your entire monthly budget in one upload. If you regularly work with very large documents, contact your account manager.
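
To estimate how an upload batch affects your monthly page budget, the arithmetic can be sketched as follows. The cap of 200 pages and the sample counts are made-up numbers for illustration; your real limits are shown on your plan's Usage page.

```python
def pages_counted(file_pages: int, per_file_cap: int) -> int:
    """Pages indexed from one file: the first N pages up to the cap."""
    return min(file_pages, per_file_cap)


def budget_used(file_page_counts: list[int], urls_crawled: int, per_file_cap: int) -> int:
    """Total pages charged to the monthly budget.

    Uploaded files and scraped URLs draw from the same budget;
    each crawled URL counts as one page.
    """
    from_files = sum(pages_counted(p, per_file_cap) for p in file_page_counts)
    return from_files + urls_crawled


# Three files (12, 480, and 35 pages), 150 crawled URLs, hypothetical cap of 200.
print(budget_used([12, 480, 35], urls_crawled=150, per_file_cap=200))  # 397
```

In this example the 480-page file contributes only 200 pages; the remaining 280 are skipped and not indexed.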

Controlling what each bot searches (Knowledge Sources)

By default, a bot searches all knowledge sources that are enabled on your plan — website content, Knowledge Base articles, product catalogs, conversation learning, and any files you've uploaded. You can narrow this per bot.

Example: Support bot — check only "Your Website & Files" and "Knowledge Base Articles." Uncheck "Shopify Products" so the bot doesn't surface product listings when customers ask support questions.

Example: Sales bot — check only "Shopify Products" and "Your Website & Files." The bot focuses on your catalog and site content rather than support articles.

To configure knowledge sources: go to your bot → Training tab → Knowledge Sources panel. Each source shows whether it's available on your plan. Grayed-out sources require a plan upgrade. Your bot's own training data (anything you add under the Training Data section) is always searched regardless of these settings.
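
The selection logic can be pictured as a set operation: a source is searched only if it is both checked and available on your plan, and the bot's own training data is always included. The function name and the "Training Data" label used as a set member are assumptions for illustration.

```python
ALWAYS_SEARCHED = {"Training Data"}  # the bot's own training data


def sources_to_search(checked: set[str], available_on_plan: set[str]) -> set[str]:
    """Return the knowledge sources a bot will actually search."""
    return (checked & available_on_plan) | ALWAYS_SEARCHED


# Sales bot from the example above, on a plan without Shopify (hypothetical).
checked = {"Your Website & Files", "Shopify Products"}
available = {"Your Website & Files", "Knowledge Base Articles"}
print(sorted(sources_to_search(checked, available)))
# ['Training Data', 'Your Website & Files']
```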

How to decide: AI file vs. general file

Scenario	Use
Product spec sheet you want the bot to quote from	AI file
Contract template an agent sends after a sale	General file
FAQ document with common support answers	AI file
Price list that changes weekly and must be reviewed before sharing	General file
Installation manual the bot should summarize on request	AI file
Signed agreement the customer requested a copy of	General file

A file cannot be both types at once. If you need the bot to answer questions from a document AND agents to send it as a link, upload it twice — once as AI and once as General.

Conversation training (auto-learn)

When conversation training is enabled, Velaro automatically extracts Q&A pairs from resolved conversations that meet a quality bar (minimum 4 messages, agent participated, CSAT ≥ 4 when collected). These pairs are indexed under the conv_training namespace and used to improve bot answers over time.
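
The quality bar can be expressed as a simple predicate. This is a sketch of the criteria listed above, not Velaro's extraction code; the Conversation class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Conversation:
    resolved: bool
    message_count: int
    agent_participated: bool
    csat: Optional[int] = None  # None when no rating was collected


def qualifies_for_training(c: Conversation) -> bool:
    """Apply the quality bar: resolved, 4+ messages, agent involved,
    and CSAT of at least 4 whenever a rating exists."""
    if not c.resolved or c.message_count < 4 or not c.agent_participated:
        return False
    # A missing rating does not disqualify; a low rating does.
    return c.csat is None or c.csat >= 4


print(qualifies_for_training(Conversation(True, 6, True)))          # True
print(qualifies_for_training(Conversation(True, 6, True, csat=3)))  # False
```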

Conversation training data is:

Isolated to your site — no cross-tenant sharing.
Weighted alongside your other knowledge sources at query time.
Automatically deduplicated — the same conversation is never re-indexed if it has not changed.

This feature requires the Conversation Training subscription add-on.

Reindexing and data freshness

Source	How often re-indexed
Scraper (web pages)	Per your plan: Starter monthly, Professional weekly, Enterprise daily
Uploaded AI files	Immediately on upload; not re-indexed automatically afterward
KB articles	Immediately on publish
Conversation training	Within minutes of conversation resolve

If you unpublish a KB article, it is removed from the index immediately. If you delete a file, it is removed from the index at the next scheduled scrape.
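
To estimate when scraped content will next be refreshed, you can apply the plan cadence from the table above. This is a rough sketch: the function name is hypothetical, and treating "monthly" as 30 days is an assumption made only to keep the arithmetic simple.

```python
from datetime import datetime, timedelta

# Scraper cadence by plan, from the table above.
# "Monthly" is approximated as 30 days (assumption).
SCRAPE_INTERVAL = {
    "Starter": timedelta(days=30),
    "Professional": timedelta(days=7),
    "Enterprise": timedelta(days=1),
}


def next_scrape(plan: str, last_scrape: datetime) -> datetime:
    """Estimate the next scheduled scrape from the previous one."""
    return last_scrape + SCRAPE_INTERVAL[plan]


print(next_scrape("Professional", datetime(2026, 5, 7)))  # 2026-05-14 00:00:00
```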

Product comparison and quoting from your knowledge index

Once your product catalog, pricing sheets, or spec documents are indexed, your bot can use them for common commerce tasks without any additional configuration:

Product comparison — the bot can compare 2–4 products side by side, pulling specs, pricing, and features from whatever you've indexed (uploaded PDFs, scraped catalog pages, Shopify/BigCommerce products). A visitor saying "what's the difference between Model A and Model B?" triggers a formatted comparison table.

Pricing lookup — the bot retrieves pricing details from your indexed documents, including tiered pricing, bundle pricing, and discounts if those details are in your content.

Quote assistance — for connected ecommerce platforms (BigCommerce, Magento), the bot can initiate a quote directly in the platform. For businesses without a platform connection, the bot can summarize available options and hand off to an agent to finalize.

These capabilities are available under the Quote Maker add-on. No custom development is required — index your content and the bot handles the rest. If your pricing is complex (configurable options, volume tiers, negotiated contracts), upload a structured pricing sheet as an AI file and the bot will reference it.
