Architecture · Build vs Buy
Graph API hands you an endpoint.
AI gives you a prototype.
We deliver the platform that production workloads demand.
A Graph call that reads one mailbox is a weekend. A multi-tenant, throttling-aware, auto-scaling, restore-capable collection platform is a roadmap. This is the system behind DataTapStream — and the equivalent system you would otherwise design, run and staff yourself.
The system, end to end
Five Azure-native services communicate exclusively over durable queues — no service-to-service HTTP. Each worker container is scoped to a single tenant + job and lives only as long as the work.
API orchestrator
A stateless, multi-tenant REST API enqueues typed work (job / custodian / batch) and exposes discovery, monitoring and restore. Routes are tenant-scoped; OData + OpenAPI throughout.
Job & scanner workers
Per-job orchestration runs a reactive pipeline; per-account scanners crawl mailboxes, drives and sites item by item. Fan-out width is the job's parallel-account count.
Queue backbone
Azure Storage Queues decouple every stage and provide natural backpressure. Messages are pulled in batches of up to 32 with resilient, backoff-based retries.
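As a rough illustration of batched pulling, the sketch below drains an in-memory queue in batches capped at 32, the ceiling quoted above. The names `pull_batch` and `drain` are hypothetical, and a `deque` stands in for the real queue. It is a toy model of the consumer side, not the platform's implementation.

```python
from collections import deque

MAX_BATCH = 32  # batch ceiling quoted above


def pull_batch(queue: deque, max_messages: int = MAX_BATCH) -> list:
    """Remove and return up to max_messages items from the front of the queue."""
    if not 1 <= max_messages <= MAX_BATCH:
        raise ValueError(f"max_messages must be between 1 and {MAX_BATCH}")
    batch = []
    while queue and len(batch) < max_messages:
        batch.append(queue.popleft())
    return batch


def drain(queue: deque, handle) -> int:
    """Pull batches until the queue is empty; return the number of pulls made."""
    pulls = 0
    while queue:
        for message in pull_batch(queue):
            handle(message)
        pulls += 1
    return pulls
```

Because the consumer asks for the next batch only after it has handled the previous one, a slow worker simply leaves messages waiting in the queue, which is the backpressure effect described above.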
High throughput, low footprint
Concrete defaults, not adjectives.
Reactive streaming pipeline
Backup/restore workflows are System.Reactive observables: discover → validate → schedule runs as a non-blocking stream, so hundreds of custodians flow concurrently instead of marching through a serial loop. Collection I/O is decoupled from scheduling by the queue backbone.
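The shape of that pipeline can be sketched in a few lines. The example uses Python's `asyncio` as a stand-in for System.Reactive, so it shows the concurrency pattern rather than the actual observable code. The stage bodies, the `sources` field and the `width` parameter are assumptions made for illustration.

```python
import asyncio


async def discover(custodian: str) -> dict:
    await asyncio.sleep(0)  # stand-in for a non-blocking directory lookup
    return {"custodian": custodian, "sources": ["mailbox", "drive"]}


async def validate(item: dict) -> dict:
    await asyncio.sleep(0)  # stand-in for a non-blocking access check
    if not item["sources"]:
        raise ValueError(f"no sources for {item['custodian']}")
    return item


async def schedule(item: dict, outbox: asyncio.Queue) -> None:
    # Scheduling only enqueues work; a separate worker would do the actual I/O.
    for source in item["sources"]:
        await outbox.put((item["custodian"], source))


async def run_pipeline(custodians: list, width: int) -> list:
    outbox: asyncio.Queue = asyncio.Queue()
    gate = asyncio.Semaphore(width)  # hypothetical fan-out limit

    async def one(custodian: str) -> None:
        async with gate:
            await schedule(await validate(await discover(custodian)), outbox)

    # All custodians progress concurrently instead of in a serial loop.
    await asyncio.gather(*(one(c) for c in custodians))
    return sorted(outbox.get_nowait() for _ in range(outbox.qsize()))
```

Calling `asyncio.run(run_pipeline(["ann", "bob"], width=2))` returns the scheduled `(custodian, source)` pairs. The pipeline's only output is queue messages, which is what keeps scheduling separate from the work itself.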
Lean workers
Worker containers default to 0.25 vCPU / 0.5 GiB on runtime-only, multi-stage images (no SDK layer) with invariant globalization. Batch-heavy roles size up explicitly; everything else stays small.
Scale-to-zero by default
KEDA Azure-queue scaling drives replica count from queue depth: a replica per ~2 queued messages, activation from zero at ~10 messages, and back to zero when the queue drains. You pay for work, not for idle capacity.
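To make the arithmetic concrete, here is a simplified model of the scaling rule as described above: one replica per 2 queued messages, wake from zero at 10, and return to zero on an empty queue. It models this page's description, not KEDA's internal algorithm. The function name and the `max_replicas` cap are hypothetical.

```python
import math

TARGET_PER_REPLICA = 2     # "a replica per ~2 queued messages"
ACTIVATION_THRESHOLD = 10  # "activation from zero at ~10 messages"


def desired_replicas(queue_length: int, current: int, max_replicas: int = 10) -> int:
    """Return the replica count this simplified model would aim for."""
    if queue_length <= 0:
        return 0  # queue drained: scale back to zero
    if current == 0 and queue_length < ACTIVATION_THRESHOLD:
        return 0  # still asleep: not enough work to wake up
    # Awake: one replica per TARGET_PER_REPLICA messages, up to the assumed cap.
    return min(max_replicas, math.ceil(queue_length / TARGET_PER_REPLICA))
```

Under this model, 5 queued messages leave a sleeping job at zero replicas, while the same 5 messages keep an already running job at 3.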
Ephemeral per-job compute
Each job provisions its own container app and scanner queue, then tears them down on completion. No long-lived worker fleet to babysit, right-size, or pay for between runs.
Dynamic provisioning on Azure Resource Manager
Tenants and jobs are infrastructure events, created and destroyed programmatically.
Per-tenant topology
Provisioning spins up a dedicated resource group, serverless Cosmos DB, storage account and Container App environment via ARM — and, on the indexing tier, an Azure AI Search service and a function app.
Full lifecycle
Provision, deprovision and mode-switch (collection ↔ collection+indexing) are first-class, idempotent operations with tracked state — not manual runbooks.
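One way to picture "idempotent with tracked state" is a converge-to-target operation: the caller names the mode a tenant should be in, and repeating the call changes nothing. The sketch below is a hypothetical in-memory model. `TenantLifecycle`, `Mode` and `ensure` are illustrative names, and the real operations would act on Azure resources rather than a dictionary.

```python
from enum import Enum


class Mode(Enum):
    NONE = "none"  # not provisioned
    COLLECTION = "collection"
    COLLECTION_INDEXING = "collection+indexing"


class TenantLifecycle:
    def __init__(self):
        self.state = {}    # tenant_id -> Mode (the tracked state)
        self.actions = []  # transitions that actually did work

    def ensure(self, tenant_id: str, target: Mode) -> bool:
        """Converge the tenant to target; return True only if work was done."""
        current = self.state.get(tenant_id, Mode.NONE)
        if current is target:
            return False  # already in the target mode: the repeat call is a no-op
        self.actions.append((tenant_id, current.value, target.value))
        if target is Mode.NONE:
            self.state.pop(tenant_id, None)  # deprovision
        else:
            self.state[tenant_id] = target   # provision or mode-switch
        return True
```

Provision, mode-switch and deprovision all become the same call with a different target, so a retried request after a timeout is safe by construction.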
Optimal resource use
Serverless Cosmos bills per request unit, queue operations cost fractions of a cent per ten thousand, and normalization runs on consumption-plan functions. Managed identity removes secrets from config entirely.
Subsystem-by-subsystem
What you would build and operate, versus what ships.
| Subsystem | Build on Graph yourself | DataTapStream |
|---|---|---|
| Graph throttling | You own per-app + per-tenant back-off, 429/Retry-After handling, adaptive pacing, and re-tuning as limits change. | Pre-emptive back-off and throttling mitigation built into the connector layer; resilient retry with exponential backoff on every Graph + queue call. |
| Orchestration | Design a job model, work distribution, fan-out/fan-in, and failure recovery from scratch. | Reactive (System.Reactive) pipeline: sources are discovered, validated and scheduled as an async observable, decoupled onto durable queues. |
| Scaling | Stand up autoscalers, capacity planning, and idle-cost controls; keep workers warm or eat cold-start. | KEDA queue-depth scaling on Azure Container Apps — replicas track queue length, wake from zero on demand, and scale back to zero when drained. |
| Provisioning | Write and maintain IaC, per-customer resource topology, and teardown logic. | Per-tenant resource group, serverless Cosmos, storage, Container App environment and functions created on demand via Azure Resource Manager; full provision / deprovision / mode-switch lifecycle. |
| Isolation | Architect multi-tenant data isolation, secret management, and blast-radius containment — and defend it in audits. | Dedicated resource group per tenant; cert-based app-only OAuth with secrets in Key Vault; revocable, tenant-scoped API keys. |
| Restore | Graph has no bulk-restore primitive — re-injecting items, folders, mailboxes or sites is a separate project. | Bidirectional engine restores content back into OneDrive, SharePoint, Teams and Exchange, from a single item to a whole mailbox or site. |
| Observability | Build job status, per-custodian metrics, error capture and reporting yourself. | Live job status (custodians / items / errors) and retained, consolidated end-of-run reports out of the box. |
| Maintenance | A standing team owns Graph API drift, throttling changes, and auth deprecations indefinitely. | Connectors, scaling and compliance posture are maintained for you behind a stable REST API. |
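For a sense of what the throttling row involves on the build-it-yourself side, here is a minimal retry sketch: honor the server's Retry-After value when one is given, otherwise fall back to capped exponential backoff with jitter. The `Throttled` exception, the function names and the default values are all assumptions. The sketch covers reactive retry only, not the pre-emptive pacing the table attributes to the connector layer.

```python
import random
import time


class Throttled(Exception):
    """Hypothetical error raised when a call is rejected with HTTP 429."""

    def __init__(self, retry_after=None):
        super().__init__("throttled")
        self.retry_after = retry_after  # seconds from Retry-After, if present


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0, rng=random.random):
    """Seconds to wait before the next try (attempt counts from 0)."""
    if retry_after is not None:
        return float(retry_after)  # the server's hint takes priority
    ceiling = min(cap, base * (2 ** attempt))
    return ceiling * rng()  # jitter: a random point below the ceiling


def call_with_retry(operation, max_attempts=5, sleep=time.sleep):
    """Run operation, retrying on Throttled until max_attempts is exhausted."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Throttled as exc:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            sleep(backoff_delay(attempt, exc.retry_after))
```

Even this small version leaves open questions a production system has to answer, such as sharing a backoff budget across workers that hit the same tenant.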
Security & tenancy
Enterprise-grade isolation is structural, not bolted on.
App-only, cert-based OAuth
Tenant-wide access to mailboxes, drives and sites via certificate credentials — no user impersonation, no refresh-token juggling. Certificates load from Key Vault at runtime.
Dedicated blast radius
One resource group per tenant means a provisioning failure or compromise is contained to a single tenant. The only shared component is the stateless API layer.
Revocable API keys
Keys are admin-issued and tenant-scoped; a key reaches only its own tenant's data, and revocation cuts access immediately.
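The two properties above, tenant scoping and immediate revocation, can be modeled with a small key store. This is a sketch under assumptions, not the product's key service: `KeyStore` and its methods are hypothetical, and storing only a SHA-256 digest of each key is a design choice made for the example.

```python
import hashlib
import secrets


class KeyStore:
    def __init__(self):
        self._tenant_by_digest = {}  # key digest -> owning tenant

    @staticmethod
    def _digest(key: str) -> str:
        # Keep only a digest so the raw key is never stored.
        return hashlib.sha256(key.encode()).hexdigest()

    def issue(self, tenant_id: str) -> str:
        """Create a new random key bound to exactly one tenant."""
        key = secrets.token_urlsafe(32)
        self._tenant_by_digest[self._digest(key)] = tenant_id
        return key

    def revoke(self, key: str) -> None:
        """Forget the key; later authorize calls fail immediately."""
        self._tenant_by_digest.pop(self._digest(key), None)

    def authorize(self, key: str, tenant_id: str) -> bool:
        """Allow access only to the tenant the key was issued for."""
        owner = self._tenant_by_digest.get(self._digest(key))
        return owner is not None and owner == tenant_id
```

A key issued for one tenant fails `authorize` for any other tenant, and fails for its own tenant as soon as `revoke` has run.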
Read the architecture, then call it
The developer guides cover the job model, connectors and policies; the API reference is live. Spin up a trial tenant and exercise the endpoints against real data.