Product overview
The Data Collection Engine (DCE)
A stand-alone, cloud-native Azure service for high-scale data processing, designed for high throughput with a minimal compute footprint. Configure and orchestrate data collection workloads through a documented REST API, and extend the service with a modular connector framework.
What it does
Extract data from Microsoft 365 (and, over time, other platforms) for collection, archiving, indexing, or streaming into your own systems.
Collection / Archive
Content is extracted to long-term storage such as Azure Blob Storage (S3-compatible storage is planned). Associated metadata is captured in Azure Cosmos DB for structured querying.
Indexing
Collected content is indexed for full-text search while metadata stays directly queryable in Cosmos DB for fast lookup and filtering.
Streaming
A streaming interface lets you ingest collected data in real time for downstream processing — indexing, metadata extraction, ZIP/PST extraction, decryption and more.
Restore
Browse collection/archive content and restore it directly back into OneDrive, SharePoint, Teams and Exchange Online — individual items through to entire mailboxes or sites.
Capabilities
Everything is exposed through a well-documented RESTful API.
API Orchestrator
Configure and monitor workflows: run full scans or incremental collections, target all custodians or a subset, and move data in both directions (collect and restore).
Connectors
Outlook/Exchange, Teams, SharePoint and OneDrive out of the box; an extensible framework for new sources (Google Workspace, Slack planned).
Data storage
Durable low-cost storage with normalized metadata mirrored to Cosmos DB as a rebuildable cache.
Search & query
Full-text and metadata queries over indexed data using Lucene syntax, with paginated responses.
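As a rough illustration of how a client might consume such a query, the sketch below builds a Lucene-style query string and walks a paginated response. The field names (`custodian`, `subject`), the `items` and `nextPageToken` response keys, and the `fetch_page` callable are assumptions made for this example, not the DCE's documented contract; the API reference defines the real shapes.

```python
from typing import Callable, Iterator, Optional


def lucene_and(**terms: str) -> str:
    """Join field:"value" clauses with AND, escaping backslashes and quotes."""
    clauses = []
    for field, value in terms.items():
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        clauses.append(f'{field}:"{escaped}"')
    return " AND ".join(clauses)


def iter_results(
    fetch_page: Callable[[str, Optional[str]], dict],
    query: str,
) -> Iterator[dict]:
    """Yield every item across pages until no continuation token remains."""
    token: Optional[str] = None
    while True:
        page = fetch_page(query, token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")  # hypothetical key name
        if not token:
            break


def fake_fetch(query: str, token: Optional[str]) -> dict:
    """Stand-in for the HTTP call so the sketch runs offline."""
    pages = {
        None: {"items": [{"id": 1}, {"id": 2}], "nextPageToken": "p2"},
        "p2": {"items": [{"id": 3}]},
    }
    return pages[token]


query = lucene_and(custodian="alice@contoso.com", subject="Q3 report")
ids = [item["id"] for item in iter_results(fake_fetch, query)]
```

In a real client, `fake_fetch` would be replaced by a function that sends the query and token to the search endpoint; keeping the paging loop separate from the transport makes it easy to test.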
Monitoring & reporting
Live job status (custodians, items, errors) and end-of-run reports retained and consolidated over time.
White-label UI
A themeable reference UI to manage connectors, jobs and policies — embed it in your own administration console.
Architecture & security
Azure-first, multi-tenant, and built for strict isolation.
Per-tenant isolation
All runtime and static resources — container apps, Cosmos DB, Blob Storage and Azure Functions — are provisioned in a dedicated resource group per tenant. The only shared component is the API layer.
API-key authorization
Every endpoint is secured by API keys. Keys are issued only by the SysAdmin; a tenant-scoped key can access only its own tenant's data. Revoking a key disables access immediately.
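A minimal sketch of attaching a tenant-scoped key to a request, using only the Python standard library. The base URL, the `/jobs` path, and the `X-Api-Key` header name are placeholders assumed for illustration; the actual header and routes are defined in the API reference.

```python
import urllib.request

BASE_URL = "https://dce.example.invalid/api"  # placeholder host, not a real endpoint
API_KEY_HEADER = "X-Api-Key"                  # assumed header name


def build_request(path: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request carrying the API key."""
    if not api_key:
        raise ValueError("an API key is required for every endpoint")
    return urllib.request.Request(
        url=f"{BASE_URL}/{path.lstrip('/')}",
        headers={API_KEY_HEADER: api_key, "Accept": "application/json"},
        method="GET",
    )


req = build_request("/jobs", "tenant-scoped-key")
# urllib normalizes header capitalization, so compare names case-insensitively.
sent_headers = {name.lower(): value for name, value in req.header_items()}
```

Passing `req` to `urllib.request.urlopen` would send it. Because a revoked key stops working immediately, a client should treat an authorization failure as a signal to stop rather than retry.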
App-only M365 access
Certificate-based, app-only OAuth grants tenant-wide access to mailboxes, drives and sites. Secrets are stored in Azure Key Vault.
Auto-scaling & cost control
Pre-emptive back-off avoids upstream throttling; runtime components are deprovisioned at the end of each job to minimize compute spend.
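The DCE's own back-off algorithm is not described here. As one common way to slow down before a limit is hit, the sketch below paces calls evenly across the quota left in the current rate-limit window. The inputs `remaining` and `reset_in` are assumed to come from upstream rate-limit information or the caller's own accounting.

```python
def preemptive_delay(remaining: int, reset_in: float, min_delay: float = 0.0) -> float:
    """Return seconds to wait before the next call so the quota lasts the window.

    remaining: calls still allowed in the current window (assumed input)
    reset_in:  seconds until the window resets (assumed input)
    min_delay: lower bound applied to every result
    """
    if reset_in <= 0:
        # The window has already reset; only the floor applies.
        return min_delay
    if remaining <= 0:
        # Quota is spent; wait out the rest of the window.
        return max(reset_in, min_delay)
    # Spread the remaining calls evenly over the remaining time.
    return max(reset_in / remaining, min_delay)
```

For example, with 10 calls left and 5 seconds until reset, the function returns 0.5, so the caller spaces requests half a second apart instead of bursting and then being throttled.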
Read the docs
Start with the developer guides, or jump straight into the full REST API reference.
Developer documentation
API docs
Ready to try it?
Get a dedicated trial tenant and API key, then call the endpoints directly from the Swagger UI.
Start a free trial