Architecture · Build vs Buy

Graph API hands you an endpoint.
AI gives you a prototype.
We deliver the platform that production workloads demand.

A Graph call that reads one mailbox is a weekend. A multi-tenant, throttling-aware, auto-scaling, restore-capable collection platform is a roadmap. This is the system behind DataTapStream — and the equivalent system you would otherwise design, run and staff yourself.

Developer documentation · API docs

The system, end to end

Five Azure-native services communicate exclusively over durable queues — no service-to-service HTTP. Each worker container is scoped to a single tenant + job and lives only as long as the work.

API orchestrator

A stateless, multi-tenant REST API enqueues typed work (job / custodian / batch) and exposes discovery, monitoring and restore. Routes are tenant-scoped; OData + OpenAPI throughout.
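As a rough illustration of what "typed work" could look like on the wire, the sketch below models the three kinds named above as a small, JSON-serializable message. The class name, field names and payload shape are assumptions made for this example, not the product's actual contract.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict

# The three work kinds named in the text; everything else here is hypothetical.
WORK_KINDS = ("job", "custodian", "batch")


@dataclass(frozen=True)
class WorkMessage:
    kind: str                 # one of WORK_KINDS
    tenant_id: str            # every message is scoped to one tenant
    job_id: str
    payload: Dict[str, Any]   # kind-specific details (illustrative only)

    def __post_init__(self) -> None:
        if self.kind not in WORK_KINDS:
            raise ValueError(f"unknown work kind: {self.kind!r}")

    def to_json(self) -> str:
        # sort_keys keeps the serialized form stable across runs
        return json.dumps(asdict(self), sort_keys=True)

    @staticmethod
    def from_json(raw: str) -> "WorkMessage":
        # Validation in __post_init__ rejects unknown kinds on the way in
        return WorkMessage(**json.loads(raw))


msg = WorkMessage("custodian", "tenant-a", "job-1", {"account": "user@example.com"})
roundtrip = WorkMessage.from_json(msg.to_json())
```

Validating the kind on both construction and deserialization keeps malformed messages from reaching a worker.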

Job & scanner workers

Per-job orchestration runs a reactive pipeline; per-account scanners crawl mailboxes, drives and sites item by item. Fan-out width is the job's parallel-account count.

Queue backbone

Azure Storage Queues decouple every stage and provide natural backpressure. Messages are pulled in batches of up to 32 with resilient, backoff-based retries.
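The batch-pull and backoff behaviour can be sketched without any cloud dependency. The example below uses an in-memory stand-in for the queue; `InMemoryQueue`, `receive_with_retry` and the delay parameters are hypothetical names and values chosen for illustration. Only the 32-message ceiling comes from the text.

```python
import time
from collections import deque
from typing import Callable, Iterable, List

MAX_BATCH = 32  # upper bound on messages per receive call, as stated above


class InMemoryQueue:
    """Stand-in for a real queue client (illustrative only)."""

    def __init__(self, items: Iterable[str]) -> None:
        self._items = deque(items)

    def receive(self, max_messages: int = MAX_BATCH) -> List[str]:
        if not 1 <= max_messages <= MAX_BATCH:
            raise ValueError("max_messages must be between 1 and 32")
        batch: List[str] = []
        while self._items and len(batch) < max_messages:
            batch.append(self._items.popleft())
        return batch


def backoff_delays(base: float, cap: float, attempts: int) -> List[float]:
    """Exponential delays: base, 2*base, 4*base, ... clamped at cap."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]


def receive_with_retry(
    receive: Callable[[int], List[str]],
    attempts: int = 5,
    base: float = 0.5,
    cap: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Retry transient failures, waiting longer after each one."""
    last_error: Exception = RuntimeError("no attempts made")
    for delay in backoff_delays(base, cap, attempts):
        try:
            return receive(MAX_BATCH)
        except ConnectionError as exc:  # treated as transient in this sketch
            last_error = exc
            sleep(delay)
    raise RuntimeError("queue unavailable after retries") from last_error


queue = InMemoryQueue(f"msg-{i}" for i in range(70))
batch_sizes = [len(receive_with_retry(queue.receive)) for _ in range(3)]
```

A production version would normally add jitter to the delays so that many workers do not retry in lockstep.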

High throughput, low footprint

Concrete defaults, not adjectives.

Reactive streaming pipeline

Backup/restore workflows are System.Reactive observables: discover → validate → schedule runs as a non-blocking stream, so hundreds of custodians flow concurrently instead of marching through a serial loop. I/O is decoupled from queueing via the queue backbone.
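To make the discover → validate → schedule shape concrete, here is a minimal sketch using Python's asyncio as a stand-in for System.Reactive. It is not the product's pipeline: the function names, the validation rule and the concurrency limit are all assumptions made for this example.

```python
import asyncio
from typing import AsyncIterator, Iterable, List


async def discover(custodians: Iterable[str]) -> AsyncIterator[str]:
    """Emit custodians one at a time, yielding control between items."""
    for name in custodians:
        await asyncio.sleep(0)  # placeholder for a real discovery call
        yield name


async def validate(name: str) -> bool:
    """Hypothetical rule: a custodian must look like a mailbox address."""
    await asyncio.sleep(0)
    return "@" in name


async def schedule(name: str, queue: "asyncio.Queue[str]") -> None:
    """Hand validated work to a queue instead of processing it inline."""
    await queue.put(name)


async def run_pipeline(custodians: Iterable[str], max_concurrency: int = 100) -> List[str]:
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    gate = asyncio.Semaphore(max_concurrency)  # bounds in-flight custodians

    async def handle(name: str) -> None:
        async with gate:
            if await validate(name):
                await schedule(name, queue)

    # Each discovered custodian becomes its own task, so none waits on another.
    tasks = [asyncio.create_task(handle(c)) async for c in discover(custodians)]
    await asyncio.gather(*tasks)
    return sorted(queue.get_nowait() for _ in range(queue.qsize()))


scheduled = asyncio.run(run_pipeline(["a@example.com", "not-a-mailbox", "b@example.com"]))
```

The point of the shape is that validation and scheduling for one custodian never block discovery of the next.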

Lean workers

Worker containers default to 0.25 vCPU / 0.5 GiB on runtime-only, multi-stage images (no SDK layer) with invariant globalization. Batch-heavy roles size up explicitly; everything else stays small.

Scale-to-zero by default

KEDA Azure-queue scaling drives replica count from queue depth: a replica per ~2 queued messages, activation from zero at ~10 messages, and back to zero when the queue drains. You pay for work, not for idle capacity.
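The scaling rule above reduces to a small piece of arithmetic. The function below is a simplified model of the stated defaults, not KEDA's actual algorithm: it ignores cooldown periods and autoscaler stabilization, assumes activation happens once depth exceeds the threshold, and uses a `max_replicas` ceiling that is a hypothetical value.

```python
import math


def desired_replicas(
    queue_depth: int,
    current_replicas: int,
    messages_per_replica: int = 2,   # "a replica per ~2 queued messages"
    activation_depth: int = 10,      # "activation from zero at ~10 messages"
    max_replicas: int = 10,          # hypothetical ceiling, not from the text
) -> int:
    """Approximate replica count for a given queue depth."""
    if queue_depth <= 0:
        return 0  # queue drained: scale back to zero
    if current_replicas == 0 and queue_depth <= activation_depth:
        return 0  # not enough backlog yet to wake from zero
    return min(max_replicas, math.ceil(queue_depth / messages_per_replica))


# A few sample points along a job's lifetime
samples = [
    desired_replicas(8, current_replicas=0),    # below activation: stay at zero
    desired_replicas(11, current_replicas=0),   # wakes up with ceil(11 / 2)
    desired_replicas(4, current_replicas=6),    # already running: follow depth
    desired_replicas(0, current_replicas=2),    # drained
]
```

Under these assumptions, an 11-message backlog starts 6 replicas and the count follows the queue down as it drains.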

Ephemeral per-job compute

Each job provisions its own container app and scanner queue, then tears them down on completion. No long-lived worker fleet to babysit, right-size, or pay for between runs.
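One way to picture the provision-then-tear-down lifecycle is as a scoped resource: create the queue and the container app on entry, delete whatever was created on exit, even if the job fails. The sketch below uses a fake provisioner; `FakeProvisioner`, `ephemeral_job` and the resource naming scheme are hypothetical.

```python
from contextlib import contextmanager
from typing import Iterator, List, Set, Tuple


class FakeProvisioner:
    """Records create/delete calls instead of talking to a cloud API."""

    def __init__(self) -> None:
        self.live: Set[str] = set()
        self.log: List[Tuple[str, str]] = []

    def create(self, name: str) -> None:
        self.live.add(name)
        self.log.append(("create", name))

    def delete(self, name: str) -> None:
        self.live.discard(name)
        self.log.append(("delete", name))


@contextmanager
def ephemeral_job(provisioner: FakeProvisioner, tenant_id: str, job_id: str) -> Iterator[Tuple[str, str]]:
    app = f"app-{tenant_id}-{job_id}"      # illustrative naming only
    queue = f"scan-{tenant_id}-{job_id}"
    created: List[str] = []
    try:
        for name in (queue, app):          # the queue exists before the worker
            provisioner.create(name)
            created.append(name)
        yield app, queue
    finally:
        for name in reversed(created):     # tear down in reverse order
            provisioner.delete(name)


provisioner = FakeProvisioner()
with ephemeral_job(provisioner, "tenant-a", "job-1") as (app_name, queue_name):
    live_during_job = set(provisioner.live)
live_after_job = set(provisioner.live)
```

Tracking `created` separately means a failure halfway through provisioning still cleans up the resources that did get created.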

Dynamic provisioning on Azure Resource Manager

Tenants and jobs are infrastructure events, created and destroyed programmatically.

Per-tenant topology

Provisioning spins up a dedicated resource group, serverless Cosmos DB, storage account and Container App environment via ARM — and, on the indexing tier, an Azure AI Search service and a function app.

Full lifecycle

Provision, deprovision and mode-switch (collection ↔ collection+indexing) are first-class, idempotent operations with tracked state — not manual runbooks.
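"Idempotent with tracked state" can be illustrated with a tiny state tracker: each operation names a target state, and repeating it once the tenant is already there does nothing. The state names and the `TenantLifecycle` class below are assumptions for this sketch, not the platform's data model.

```python
from enum import Enum
from typing import Dict, List, Tuple


class TenantState(Enum):
    ABSENT = "absent"
    COLLECTION = "collection"
    COLLECTION_INDEXING = "collection+indexing"


class TenantLifecycle:
    """Tracks each tenant's state and records only real transitions."""

    def __init__(self) -> None:
        self._state: Dict[str, TenantState] = {}
        self.transitions: List[Tuple[str, str, str]] = []

    def state(self, tenant_id: str) -> TenantState:
        return self._state.get(tenant_id, TenantState.ABSENT)

    def ensure(self, tenant_id: str, target: TenantState) -> bool:
        """Move the tenant to `target`; return False if already there."""
        current = self.state(tenant_id)
        if current is target:
            return False  # idempotent: repeating a request is a no-op
        self.transitions.append((tenant_id, current.value, target.value))
        if target is TenantState.ABSENT:
            self._state.pop(tenant_id, None)   # deprovision
        else:
            self._state[tenant_id] = target    # provision or mode-switch
        return True


lifecycle = TenantLifecycle()
results = [
    lifecycle.ensure("tenant-a", TenantState.COLLECTION),           # provision
    lifecycle.ensure("tenant-a", TenantState.COLLECTION),           # repeat
    lifecycle.ensure("tenant-a", TenantState.COLLECTION_INDEXING),  # mode-switch
    lifecycle.ensure("tenant-a", TenantState.ABSENT),               # deprovision
]
```

Expressing every operation as "ensure this target state" is what makes a retried request safe.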

Optimal resource use

Serverless Cosmos bills per request unit, queue operations cost a fraction of a cent per ten thousand, and normalization runs on consumption-plan functions. Managed identity removes secrets from config entirely.

Subsystem-by-subsystem

What you would build and operate, versus what ships.

Subsystem | Build on Graph yourself | DataTapStream
Graph throttling | You own per-app + per-tenant back-off, 429/Retry-After handling, adaptive pacing, and re-tuning as limits change. | Pre-emptive back-off and throttling mitigation built into the connector layer; resilient retry with exponential backoff on every Graph + queue call.
Orchestration | Design a job model, work distribution, fan-out/fan-in, and failure recovery from scratch. | Reactive (System.Reactive) pipeline: sources are discovered, validated and scheduled as an async observable, decoupled onto durable queues.
Scaling | Stand up autoscalers, capacity planning, and idle-cost controls; keep workers warm or eat cold-start. | KEDA queue-depth scaling on Azure Container Apps: replicas track queue length, wake from zero on demand, and scale back to zero when drained.
Provisioning | Write and maintain IaC, per-customer resource topology, and teardown logic. | Per-tenant resource group, serverless Cosmos, storage, Container App environment and functions created on demand via Azure Resource Manager; full provision / deprovision / mode-switch lifecycle.
Isolation | Architect multi-tenant data isolation, secret management, and blast-radius containment, and defend it in audits. | Dedicated resource group per tenant; cert-based app-only OAuth with secrets in Key Vault; revocable, tenant-scoped API keys.
Restore | Graph has no bulk-restore primitive; re-injecting items, folders, mailboxes or sites is a separate project. | Bidirectional engine restores content back into OneDrive, SharePoint, Teams and Exchange, from a single item to a whole mailbox or site.
Observability | Build job status, per-custodian metrics, error capture and reporting yourself. | Live job status (custodians / items / errors) and retained, consolidated end-of-run reports out of the box.
Maintenance | A standing team owns Graph API drift, throttling changes, and auth deprecations indefinitely. | Connectors, scaling and compliance posture are maintained for you behind a stable REST API.
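For a sense of what the Graph throttling row involves on the build-it-yourself side, here is a minimal sketch of 429/Retry-After handling with an exponential fallback. It takes any callable that returns a status, headers and body, so it runs without network access. The function names, the retry limits and the decision to also retry on 503 are assumptions for this example, and it only handles Retry-After expressed in seconds.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status, headers, body)

RETRYABLE = (429, 503)  # retrying 503 as well is an assumption of this sketch


def retry_delay(headers: Dict[str, str], attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Prefer the server's Retry-After hint; otherwise back off exponentially."""
    raw = headers.get("Retry-After", "").strip()
    if raw.isdigit():
        return float(raw)  # honour the server-provided wait in seconds
    return min(cap, base * 2 ** attempt)


def call_with_throttle_handling(
    send: Callable[[], Response],
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, headers, body
        if attempt < max_attempts - 1:
            sleep(retry_delay(headers, attempt))
    raise RuntimeError(f"still throttled after {max_attempts} attempts")


# Simulated endpoint: throttled twice, then succeeds
_responses = iter([
    (429, {"Retry-After": "3"}, ""),
    (429, {}, ""),
    (200, {}, "ok"),
])
waits = []
final = call_with_throttle_handling(lambda: next(_responses), sleep=waits.append)
```

This covers only the reactive half of the problem; pre-emptive pacing and per-tenant budgets sit on top of it.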

Security & tenancy

Enterprise-grade isolation is structural, not bolted on.

App-only, cert-based OAuth

Tenant-wide access to mailboxes, drives and sites via certificate credentials — no user impersonation, no refresh-token juggling. Certificates load from Key Vault at runtime.

Dedicated blast radius

One resource group per tenant means a provisioning failure or compromise is contained to a single tenant. The only shared component is the stateless API layer.

Revocable API keys

Keys are admin-issued and tenant-scoped; a key reaches only its own tenant's data, and revocation cuts access immediately.
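The tenant-scoping and revocation rules can be sketched as a small registry that stores only key digests and maps each one to a single tenant. `ApiKeyRegistry` and its methods are hypothetical names for illustration, not the service's implementation.

```python
import hashlib
import secrets
from typing import Dict


class ApiKeyRegistry:
    """Maps hashed API keys to the one tenant each key may reach."""

    def __init__(self) -> None:
        self._tenant_by_digest: Dict[str, str] = {}

    @staticmethod
    def _digest(key: str) -> str:
        # Store a digest rather than the key itself
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def issue(self, tenant_id: str) -> str:
        key = secrets.token_urlsafe(32)
        self._tenant_by_digest[self._digest(key)] = tenant_id
        return key

    def revoke(self, key: str) -> bool:
        """Remove the key; later authorize() calls fail straight away."""
        return self._tenant_by_digest.pop(self._digest(key), None) is not None

    def authorize(self, key: str, requested_tenant: str) -> bool:
        owner = self._tenant_by_digest.get(self._digest(key))
        return owner is not None and owner == requested_tenant


registry = ApiKeyRegistry()
key_a = registry.issue("tenant-a")
own_tenant = registry.authorize(key_a, "tenant-a")      # allowed
other_tenant = registry.authorize(key_a, "tenant-b")    # denied: wrong tenant
registry.revoke(key_a)
after_revoke = registry.authorize(key_a, "tenant-a")    # denied: revoked
```

Because authorization is a lookup at request time, revocation takes effect on the very next call.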

Read the architecture, then call it

The developer guides cover the job model, connectors and policies; the API reference is live. Spin up a trial tenant and exercise the endpoints against real data.

Developer documentation · Start a free trial