Product overview

The Data Collection Engine (DCE)

A stand-alone, cloud-native Azure service for high-scale data processing, with high throughput and a minimal compute footprint. Configure and orchestrate data collection workloads through a documented REST API, and extend the service with a modular connector framework.

What it does

Extract data from Microsoft 365 (and, over time, other platforms) for collection, archiving, indexing, or streaming into your own systems.

Collection / Archive

Content is extracted to long-term storage such as Azure Blob Storage (S3-compatible storage is planned). Associated metadata is captured in Azure Cosmos DB for structured querying.

Indexing

Collected content is indexed for full-text search while metadata stays directly queryable in Cosmos DB for fast lookup and filtering.

Streaming

A streaming interface lets you ingest collected data in real time for downstream processing — indexing, metadata extraction, ZIP/PST extraction, decryption and more.

Restore

Browse collection/archive content and restore it directly back into OneDrive, SharePoint, Teams and Exchange Online — individual items through to entire mailboxes or sites.

Capabilities

Everything is exposed through a well-documented RESTful API.

API Orchestrator

Configure and monitor workflows. Run full scans or incremental updates, target all custodians or a subset, and move data in both directions (collect and restore).
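
As a rough illustration, a job request covering these options could be assembled like this in Python. The field names (connector, mode, direction, custodians) and their allowed values are assumptions made for this sketch, not the documented DCE schema; the REST API reference defines the real payload.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

# All field names and allowed values below are hypothetical,
# chosen only to mirror the options described above.
MODES = ("full", "incremental")
DIRECTIONS = ("collect", "restore")


@dataclass
class CollectionJob:
    connector: str                          # e.g. "exchange" (assumed identifier)
    mode: str = "incremental"               # full scan or incremental
    direction: str = "collect"              # collect or restore
    custodians: Optional[List[str]] = None  # None means "all custodians"

    def validate(self) -> None:
        if self.mode not in MODES:
            raise ValueError(f"mode must be one of {MODES}")
        if self.direction not in DIRECTIONS:
            raise ValueError(f"direction must be one of {DIRECTIONS}")
        if self.custodians is not None and not self.custodians:
            raise ValueError("custodians must be None (all) or a non-empty list")

    def to_json(self) -> str:
        """Validate the job and serialize it as a JSON request body."""
        self.validate()
        return json.dumps(asdict(self), sort_keys=True)


job = CollectionJob(connector="exchange", custodians=["alice@contoso.com"])
print(job.to_json())
```

Validating on the client before sending keeps obviously malformed jobs from ever reaching the orchestrator.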

Connectors

Outlook/Exchange, Teams, SharePoint and OneDrive out of the box; an extensible framework for new sources (Google Workspace, Slack planned).

Data storage

Durable low-cost storage with normalized metadata mirrored to Cosmos DB as a rebuildable cache.

Search & query

Full-text and metadata queries over indexed data using Lucene syntax, with paginated responses.
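
A minimal Python sketch of what a client-side query helper might look like. It builds a Lucene phrase query with field filters and walks a paginated result set. The continuation-token style of pagination, the fetch_page callable and the field names are assumptions for illustration; the page above only states that responses are paginated.

```python
from typing import Callable, Dict, Iterator, List, Optional, Tuple

Page = Tuple[List[dict], Optional[str]]  # (items, continuation token or None)


def quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_query(text: str, filters: Dict[str, str]) -> str:
    """Combine a phrase search with field:value filters using AND.

    Field names are assumed to be simple identifiers that need no escaping.
    """
    clauses = [quote(text)]
    clauses += [f"{name}:{quote(value)}" for name, value in sorted(filters.items())]
    return " AND ".join(clauses)


def iter_results(fetch_page: Callable[[str, Optional[str]], Page], query: str) -> Iterator[dict]:
    """Yield every item, following continuation tokens until none is returned."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(query, token)
        yield from items
        if token is None:
            break


def fake_fetch(query: str, token: Optional[str]) -> Page:
    """Stand-in for a real HTTP call: two pages of canned results."""
    pages = {None: ([{"id": 1}, {"id": 2}], "p2"), "p2": ([{"id": 3}], None)}
    return pages[token]


query = build_query("quarterly report", {"custodian": "alice@contoso.com"})
print(query)
print([item["id"] for item in iter_results(fake_fetch, query)])
```

In real use, fake_fetch would be replaced by a function that calls the search endpoint and returns one page plus whatever paging marker the API provides.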

Monitoring & reporting

Live job status (custodians, items, errors), plus end-of-run reports that are retained and consolidated over time.

White-label UI

A themeable reference UI to manage connectors, jobs and policies — embed it in your own administration console.

Architecture & security

Azure-first, multi-tenant, and built for strict isolation.

Per-tenant isolation

All runtime and static resources — container apps, Cosmos DB, Blob Storage and Azure Functions — are provisioned in a dedicated resource group per tenant. The only shared component is the API layer.

API-key authorization

Every endpoint is secured by API keys. Keys are issued only by the SysAdmin; a tenant-scoped key can access only its own tenant's data. Revoking a key disables access immediately.
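
The tenant-scoping and revocation rules can be pictured with a small in-memory model. This is a sketch of the behavior described above, not the service's implementation; the KeyStore class and its method names are hypothetical.

```python
import secrets
from typing import Dict


class KeyStore:
    """In-memory stand-in for a key service (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._tenant_by_key: Dict[str, str] = {}

    def issue(self, tenant_id: str) -> str:
        """Issue a new random key bound to exactly one tenant."""
        key = secrets.token_urlsafe(32)
        self._tenant_by_key[key] = tenant_id
        return key

    def revoke(self, key: str) -> None:
        """Remove the key; later authorization checks fail at once."""
        self._tenant_by_key.pop(key, None)

    def authorize(self, key: str, tenant_id: str) -> bool:
        """Allow access only when the key exists and matches the tenant."""
        return self._tenant_by_key.get(key) == tenant_id


store = KeyStore()
key = store.issue("tenant-a")
print(store.authorize(key, "tenant-a"))  # own tenant
print(store.authorize(key, "tenant-b"))  # another tenant
store.revoke(key)
print(store.authorize(key, "tenant-a"))  # after revocation
```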

App-only M365 access

Certificate-based, app-only OAuth grants tenant-wide access to mailboxes, drives and sites. Secrets are stored in Azure Key Vault.

Auto-scaling & cost control

Pre-emptive back-off avoids upstream throttling; runtime components are deprovisioned at the end of each job to minimize compute spend.
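
The page does not say how the back-off is implemented. One common way to slow down before an upstream service starts throttling is client-side pacing with a token bucket, sketched below in Python. The class name, the rate and burst parameters, and the injectable clock and sleep functions are all assumptions for illustration.

```python
import time
from typing import Callable


class PreemptivePacer:
    """Token bucket: waits *before* sending once the request budget is spent."""

    def __init__(
        self,
        rate_per_sec: float,
        burst: int,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.rate = rate_per_sec
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.sleep = sleep
        self.last = clock()

    def acquire(self) -> float:
        """Block until one request may be sent; return the seconds waited."""
        now = self.clock()
        # Refill tokens for the time elapsed since the last call.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        waited = 0.0
        if self.tokens < 1.0:
            # Not enough budget: wait just long enough to earn one token.
            waited = (1.0 - self.tokens) / self.rate
            self.sleep(waited)
            self.last = self.clock()
            self.tokens = 1.0
        self.tokens -= 1.0
        return waited


pacer = PreemptivePacer(rate_per_sec=5.0, burst=2)
for _ in range(3):
    pacer.acquire()  # the third call pauses briefly instead of risking a throttle
```

Passing in the clock and sleep functions keeps the pacer easy to test without real delays.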

Read the docs

Start with the developer guides, or jump straight into the full REST API reference.

Developer documentation

API docs

Ready to try it?

Get a dedicated trial tenant and API key, then call the endpoints directly from Swagger.

Start a free trial