Marcus Chen, YuSMP Group
Marcus Chen, Staff Engineer (Backend & Cloud), YuSMP Group · multi-tenant SaaS architecture on AWS and GCP

The 60-second answer

Three credible isolation models in 2026:

  1. Shared database, shared schema, tenant_id column + Postgres RLS. Default. Cheapest to operate. Supports 10,000+ tenants on a single Aurora or Neon cluster. Migration to schema-per-tenant later is painful but possible.
  2. Schema-per-tenant (separate Postgres schemas in the same database). The enterprise upgrade. Useful for “logical isolation” conversations with auditors without doubling infra cost. Supports 200–2,000 tenants per cluster.
  3. Database-per-tenant. Regulated workloads, single-tenant deployments, customers paying $100k+/year. Expensive to operate, easiest to delete cleanly, easiest to argue for in regulated audits.

Pick (1) for any new B2B SaaS. Add (2) at the business tier when you sign your first deal that asks “is my data isolated from other tenants?” Add (3) only for regulated verticals or VIP deals over $100k ARR.

Three isolation models — tradeoffs

Model                | Cost/tenant    | Ops complexity | Tenants/cluster | Audit story
Shared schema + RLS  | $0.05–0.50/mo  | Low            | 10,000+         | Logical isolation via RLS
Schema-per-tenant    | $2–8/mo        | Medium         | 200–2,000       | Schema-level isolation
Database-per-tenant  | $25–150/mo     | High           | 1 per DB        | Strong physical isolation

Shared schema with RLS — the right default

The shared-schema-with-RLS pattern has become the default that senior engineers reach for on new B2B SaaS in 2026. It works like this:

  1. Every domain table has a non-null tenant_id uuid references tenants(id) column, and that column is part of the indexes serving every common query.
  2. Postgres Row-Level Security is enabled on every domain table.
  3. Each table has an RLS policy: USING (tenant_id = current_setting('app.tenant_id')::uuid).
  4. Every request handler runs SET LOCAL app.tenant_id = '...' at the start of its transaction, derived from the verified auth token.
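  5. Every domain table also has FORCE ROW LEVEL SECURITY set (it is a table-level option, applied with ALTER TABLE), and the application's database role is neither a superuser nor has BYPASSRLS, so even the table owner cannot bypass the policy without changing role.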

The result: a query bug in your codebase cannot leak tenant data. The database refuses to return rows from other tenants regardless of what the query asks for. We have caught real bugs this way in customer audits: queries that would have returned other tenants’ rows returned none. Note that RLS filters reads silently; only writes that violate the policy, such as an INSERT with the wrong tenant_id, raise an error that appears in the logs.
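As a minimal sketch of the steps above, the helper below only builds the SQL that a migration might run for one table; it does not connect to a database. The function name rls_statements, the policy name tenant_isolation and the %s placeholder style (as used by psycopg) are illustrative assumptions, not part of any library.

```python
import re

# Only plain lowercase identifiers are accepted, because table names are
# interpolated into DDL and cannot be passed as bind parameters.
_IDENT = re.compile(r"^[a-z_][a-z0-9_]*$")


def rls_statements(table: str) -> list[str]:
    """Return the DDL that puts one domain table behind tenant RLS."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe table name: {table!r}")
    return [
        # Turn RLS on for the table.
        f"ALTER TABLE {table} ENABLE ROW LEVEL SECURITY",
        # Apply the policies to the table owner as well.
        f"ALTER TABLE {table} FORCE ROW LEVEL SECURITY",
        # Rows are visible only when tenant_id matches the
        # transaction-scoped setting.
        f"CREATE POLICY tenant_isolation ON {table} "
        "USING (tenant_id = current_setting('app.tenant_id')::uuid)",
    ]


# Bind-parameter form of SET LOCAL: the third argument (true) makes the
# value local to the current transaction.
SET_TENANT_SQL = "SELECT set_config('app.tenant_id', %s, true)"
```

Using set_config instead of a literal SET LOCAL lets the tenant ID travel as a bind parameter rather than being spliced into the SQL text.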

Production tips from our delivery work:

  • Use a connection pooler (PgBouncer in transaction-pooling mode). SET LOCAL resets at the end of each transaction, which is the correct behaviour: a session-level SET would stay on the pooled server connection and leak into the next client’s transaction.
  • For background jobs, set app.tenant_id in the job handler; never trust the job payload alone.
  • Add a smoke test in CI that intentionally tries to read another tenant’s row and asserts that 0 rows return.
  • Composite indexes should lead with tenant_id — e.g., (tenant_id, created_at) — for query plan stability.
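The first two tips can be sketched as a small wrapper around any DB-API 2.0 connection. This is a sketch under assumptions: the connection uses the %s placeholder style (as psycopg does) and starts a transaction implicitly on the first statement; tenant_transaction, run_job and lookup_job_tenant are hypothetical names.

```python
import uuid
from contextlib import contextmanager


@contextmanager
def tenant_transaction(conn, tenant_id: str):
    """Run one transaction with app.tenant_id set for its duration only."""
    tenant = str(uuid.UUID(tenant_id))  # raises ValueError if malformed
    cur = conn.cursor()
    try:
        # Equivalent to SET LOCAL: the value disappears at COMMIT/ROLLBACK,
        # which is what makes transaction-mode pooling safe.
        cur.execute("SELECT set_config('app.tenant_id', %s, true)", (tenant,))
        yield cur
        conn.commit()
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()


def run_job(conn, payload: dict, lookup_job_tenant, handler):
    """Resolve the tenant from a trusted record, not from the payload alone."""
    # lookup_job_tenant is a hypothetical callable that reads the tenant
    # recorded when the job was enqueued.
    trusted = lookup_job_tenant(payload["job_id"])
    if payload.get("tenant_id") != trusted:
        raise PermissionError("job payload tenant does not match job record")
    with tenant_transaction(conn, trusted) as cur:
        return handler(cur, payload)
```

A CI smoke test can reuse tenant_transaction: open it as tenant A, select a row known to belong to tenant B, and assert that no rows come back.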

Schema-per-tenant — the enterprise upgrade

When you start losing enterprise deals to “is my data separated from everyone else’s?”, schema-per-tenant is the next step. Each tenant gets its own Postgres schema (for example, tenant_a1b2c3) with the full table set, still inside the same database. The application sets search_path at the start of the transaction.
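A minimal sketch of that per-transaction switch follows. It only builds the statement and its parameter; the helper names and the rule for deriving the schema name from the tenant's UUID are assumptions made to mirror the tenant_a1b2c3 example.

```python
import uuid


def tenant_schema(tenant_id: uuid.UUID) -> str:
    """Derive a schema name such as tenant_a1b2c3 from the tenant's UUID."""
    # Assumption: the first 6 hex characters are used only to mirror the
    # example name; a real system would store the schema name in the
    # tenants table to rule out collisions.
    return f"tenant_{tenant_id.hex[:6]}"


def search_path_call(tenant_id: uuid.UUID) -> tuple[str, tuple[str]]:
    """Return (sql, params) that scopes the current transaction to one schema."""
    # set_config(..., true) is the bind-parameter form of SET LOCAL, so the
    # search_path reverts when the transaction ends.
    sql = "SELECT set_config('search_path', %s, true)"
    return sql, (tenant_schema(tenant_id),)
```

Whether to append a shared schema (for extensions or reference tables) to the path is a design choice this sketch leaves out.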

Benefits over shared schema:

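  • Schema-level isolation. A query that forgets a tenant filter cannot return another tenant’s rows, provided search_path is set correctly and queries do not schema-qualify table names.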
  • Easier per-tenant backup / restore (single schema dump).
  • Easier per-tenant deletion (single DROP SCHEMA CASCADE).
  • Auditor-friendly answer: “each tenant has a logically separated schema.”

Costs:

  • Migrations have to run across N schemas, in parallel, with proper failure handling. Tools: Sqitch, Flyway, or a custom orchestrator.
  • Cross-tenant analytics queries become awkward (you cannot easily “SUM across all tenants”). Solve with a separate analytics database fed from CDC.
  • Postgres catalog size grows with the number of tenants × the number of tables per tenant. Past ~2,000 schemas, you start seeing pg_class bloat affect planner time.
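For the migration cost in particular, a custom orchestrator can start out very small. The sketch below assumes a caller-supplied apply_migration(schema) callable (hypothetical) that runs one migration inside one schema and raises on failure; it runs schemas in parallel and records failures instead of aborting the whole run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def migrate_all(
    schemas: Iterable[str],
    apply_migration: Callable[[str], None],
    max_workers: int = 8,
) -> tuple[list[str], dict[str, str]]:
    """Apply one migration to every schema; return (succeeded, failed)."""
    succeeded: list[str] = []
    failed: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(apply_migration, s): s for s in schemas}
        for fut in as_completed(futures):
            schema = futures[fut]
            try:
                fut.result()
                succeeded.append(schema)
            except Exception as exc:  # one bad schema must not stop the rest
                failed[schema] = repr(exc)
    return sorted(succeeded), failed
```

The failed map gives you a retry list; a production version would also persist per-schema migration state so that a rerun skips schemas that already succeeded.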

Database-per-tenant — regulated and isolated

For regulated verticals (HealthTech, GovTech, defence, certain financial workloads) and for whale customers paying $100k+/year, give each tenant their own database. Often their own VPC and their own region.

This is expensive: $25–150/month per tenant on AWS RDS or equivalent. The justification is that the audit story becomes trivial, the blast radius of any bug is one customer, and customers signing six-figure contracts will pay for the isolation.

Operationally you need:

  • An IaC pipeline (Terraform, Pulumi) that provisions a new database, runs migrations, configures backups and connects observability for each new tenant.
  • A tenant routing layer (typically a Cloudflare Worker, an API gateway or an in-app subdomain router) that maps tenant ID to database connection string.
  • A connection pool per tenant, with sensible idle-eviction.
  • A robust deprovisioning flow — deleting a database, archiving its backup, and removing its IaC state cleanly.
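The routing layer and the per-tenant pools can be sketched together. This is an illustration under assumptions: dsn_for and make_pool are hypothetical callables you would back with your registry and your driver's pool constructor, and the class is not thread-safe as written.

```python
import time
from typing import Callable, Dict, Tuple


class TenantPools:
    """Maps tenant ID -> connection pool, evicting pools that sit idle."""

    def __init__(
        self,
        dsn_for: Callable[[str], str],
        make_pool: Callable[[str], object],
        idle_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._dsn_for = dsn_for      # routing: tenant ID -> connection string
        self._make_pool = make_pool  # e.g. a driver's pool constructor
        self._idle = idle_seconds
        self._clock = clock
        self._pools: Dict[str, Tuple[object, float]] = {}

    def get(self, tenant_id: str):
        now = self._clock()
        self._evict_idle(now)
        entry = self._pools.get(tenant_id)
        pool = entry[0] if entry else self._make_pool(self._dsn_for(tenant_id))
        self._pools[tenant_id] = (pool, now)  # refresh last-used time
        return pool

    def _evict_idle(self, now: float) -> None:
        for tenant_id, (pool, last_used) in list(self._pools.items()):
            if now - last_used > self._idle:
                close = getattr(pool, "close", None)
                if callable(close):
                    close()  # release server connections if the pool can
                del self._pools[tenant_id]
```

Injecting the clock keeps eviction testable without sleeping; in a threaded server you would guard get with a lock.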

Auth and identity at scale

In 2026 there is no good reason to roll your own multi-tenant auth. The managed options cover the same core feature set and differ mainly at the edges:

Provider               | Strengths                                                | Watch out for
Clerk                  | Best DX, organisations primitive, SDKs for Next/Remix/Expo | Pricing scales hard past 10k MAU
WorkOS                 | Enterprise SSO (SAML/OIDC), SCIM, audit logs             | Less polished consumer auth UI
Auth0                  | Mature, every integration exists                         | Costly at scale; complex tenant primitive
Supabase Auth          | Free with Supabase; pairs with Postgres RLS              | Org/tenant model is DIY
FusionAuth / Keycloak  | Self-host, GDPR-friendly                                 | You operate it

Common pattern in 2026: Clerk or Supabase for the early stage; add WorkOS at the business tier for enterprise SSO/SCIM. Token verification at the API layer extracts tenant ID and user role; the request handler sets the Postgres GUC (the app.tenant_id setting that the RLS policy reads).
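The extraction step might look like the sketch below. It assumes the provider's SDK has already verified the token's signature and expiry and handed back a claims dictionary; the claim names (org_id, sub, org_role) and the role set are illustrative, since each provider uses its own.

```python
import uuid
from dataclasses import dataclass

ALLOWED_ROLES = {"owner", "admin", "member"}  # hypothetical role names


@dataclass(frozen=True)
class TenantContext:
    tenant_id: uuid.UUID
    user_id: str
    role: str


def tenant_context(claims: dict) -> TenantContext:
    """Build the per-request context from already-verified token claims."""
    try:
        tenant_id = uuid.UUID(str(claims["org_id"]))
        user_id = str(claims["sub"])
    except (KeyError, ValueError) as exc:
        # Fail closed: a request without a valid tenant never reaches SQL.
        raise PermissionError("token lacks a valid tenant or user ID") from exc
    role = claims.get("org_role")
    if role not in ALLOWED_ROLES:
        raise PermissionError(f"unknown role: {role!r}")
    return TenantContext(tenant_id=tenant_id, user_id=user_id, role=role)
```

The handler then passes str(ctx.tenant_id) to whatever sets app.tenant_id for the transaction, so the tenant always comes from the token and never from request parameters.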

Billing, metering and reconciliation

One Stripe Customer per tenant. Tenants link to Stripe customers via a stored stripe_customer_id. Subscriptions are managed in Stripe Billing. For usage-based components:

  • Emit usage events from the application to an event bus (SNS, Kafka, NATS) tagged with tenant ID, meter name, quantity, idempotency key.
  • A worker aggregates per-tenant usage into your internal ledger (Postgres table).
  • A reconciliation job runs nightly, pushes deltas to Stripe Meters (or Orb/Metronome/Lago) and writes a daily reconciliation report.
  • Surface usage to customers in-product — live, with at most 60s lag. Customers will not trust an opaque bill.
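The aggregation and reconciliation steps reduce to two small functions. The sketch below keeps everything in memory for clarity; in production the deduplication would more likely be a unique constraint on the idempotency key in the Postgres ledger table. All names here are illustrative.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class UsageEvent(NamedTuple):
    tenant_id: str
    meter: str
    quantity: int
    idempotency_key: str


def aggregate(events: Iterable[UsageEvent]) -> dict[tuple[str, str], int]:
    """Sum usage per (tenant, meter), counting each idempotency key once."""
    seen: set[str] = set()
    ledger: dict[tuple[str, str], int] = defaultdict(int)
    for ev in events:
        if ev.idempotency_key in seen:  # redelivered by the event bus
            continue
        seen.add(ev.idempotency_key)
        ledger[(ev.tenant_id, ev.meter)] += ev.quantity
    return dict(ledger)


def deltas(ledger: dict, already_reported: dict) -> dict:
    """Return what still has to be pushed to the billing provider."""
    out = {}
    for key, total in ledger.items():
        diff = total - already_reported.get(key, 0)
        if diff != 0:
            out[key] = diff
    return out
```

For example, if the ledger holds 8 api_calls for a tenant and 5 were already reported, the nightly job pushes a delta of 3 and logs it in the reconciliation report.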

We covered the strategic side of pricing in SaaS pricing models in 2026; this is the engineering side. Plan 6–10 weeks for production-grade metered billing — not two.

EU data residency the right way

If you sell into the EU (especially regulated industries or public sector), “data stays in the EU” is a procurement checkbox, not a marketing claim. The right architecture:

  1. Two separate clusters: us-east-1 and eu-central-1 (or eu-west-1 for Ireland, eu-north-1 for Sweden).
  2. Each cluster has its own Postgres, blob storage, queues, search, observability stack.
  3. Tenant-to-cluster mapping in a small global registry. Login flow redirects to the correct regional API.
  4. Auth provider must support EU residency (Clerk EU, WorkOS EU, Auth0 EU tenant).
  5. Observability vendor must support EU (Datadog EU, Sentry EU, Grafana Cloud EU).
  6. Do not split rows in a single database by region — auditors do not accept it.
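The global registry in step 3 can be as small as a tenant-to-region lookup. In this sketch the hostnames are placeholders on example.com and the function name is hypothetical; the point is that an unknown tenant fails closed rather than defaulting to a region.

```python
from typing import Mapping

# Hypothetical regional API hosts; example.com is a placeholder domain.
REGION_API = {
    "us-east-1": "https://api.us.example.com",
    "eu-central-1": "https://api.eu.example.com",
}


def api_base_for(tenant_id: str, registry: Mapping[str, str]) -> str:
    """Look up the tenant's home region and return that region's API base URL."""
    try:
        region = registry[tenant_id]
    except KeyError:
        # Fail closed: never guess a region, or EU data could land in the US.
        raise LookupError(f"tenant {tenant_id!r} has no region assigned") from None
    return REGION_API[region]
```

The login flow would call api_base_for once, redirect the client to the returned host, and keep every later request inside that region.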

If you build with AI features, also consider the LLM provider region: Anthropic offers EU endpoints; Bedrock supports eu-central-1 for Claude; Azure OpenAI offers EU regions for GPT-4o. See EU AI Act compliance for the documentation side.

Observability and per-tenant SLOs

In multi-tenant SaaS, “the API is slow” is meaningless without per-tenant slicing. Add tenant ID as a first-class dimension to every metric, log and trace:

  • OpenTelemetry traces with tenant.id attribute on every span.
  • Sentry tag tenant_id on every error.
  • Datadog/Grafana dashboards that slice p50/p95/p99 latency and error rate by tenant.
  • Top-10 noisy tenants report — daily, automated.
  • Per-tenant SLOs for your top 20 accounts. When their SLO budget is at risk, alert their customer success manager (CSM), not just engineering.

Without per-tenant observability, a single noisy enterprise tenant degrading p95 looks like a global incident. With it, you can call the customer first.
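To make the slicing concrete, here is a minimal sketch of a noisy-tenants report computed from raw (tenant_id, latency_ms) samples. A metrics backend would normally do this for you; the function names and the nearest-rank percentile definition are choices made for this example.

```python
from collections import defaultdict
from typing import Iterable


def percentile(samples: list[float], p: int) -> float:
    """Nearest-rank percentile; p is a whole number from 1 to 100."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # integer ceil(p * n / 100)
    return ordered[max(rank, 1) - 1]


def p95_by_tenant(requests: Iterable[tuple[str, float]]) -> dict[str, float]:
    """Group (tenant_id, latency_ms) pairs and compute p95 per tenant."""
    by_tenant: dict[str, list[float]] = defaultdict(list)
    for tenant_id, latency_ms in requests:
        by_tenant[tenant_id].append(latency_ms)
    return {t: percentile(v, 95) for t, v in by_tenant.items()}


def noisiest(requests: Iterable[tuple[str, float]], n: int = 10) -> list[tuple[str, float]]:
    """Return the n tenants with the highest p95 latency, worst first."""
    ranked = sorted(p95_by_tenant(requests).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]
```

A global p95 over the same samples would blend every tenant together; ranking per tenant is what lets you name the one account behind a regression.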

Figure: Per-tenant observability dashboard. Per-tenant slicing turns “the API is slow” into “tenant X is hitting a query plan regression on the orders index since Tuesday.”

Migration paths and gotchas

You will eventually migrate. Two real-world paths we have shipped:

  1. Shared schema → schema-per-tenant. Doable but painful. New tenants land in schema-per-tenant; existing tenants migrate in batches via dual-write + verification + cutover. Expect 6–14 weeks of engineering for a 500-tenant migration.
  2. Single region → multi-region (EU residency). Build the EU cluster first, route new EU customers to it, migrate existing EU customers via export + import + cutover with a maintenance window. 8–16 weeks.
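The verification step before cutover can be sketched as a row-set comparison between source and target. This is one possible check, not the only one: it assumes rows are fetched as sequences of plain values (ints, strings) whose repr is stable, and the function names are hypothetical.

```python
import hashlib
from typing import Iterable, Sequence


def table_digest(rows: Iterable[Sequence[object]]) -> tuple[int, str]:
    """Return (row_count, order-independent SHA-256 digest) for a set of rows."""
    # Hash each row, then sort the hashes so row order does not matter.
    hashes = sorted(
        hashlib.sha256(repr(tuple(r)).encode()).hexdigest() for r in rows
    )
    combined = hashlib.sha256("".join(hashes).encode()).hexdigest()
    return len(hashes), combined


def safe_to_cut_over(source_rows, target_rows) -> bool:
    """True only when both sides hold exactly the same rows."""
    return table_digest(source_rows) == table_digest(target_rows)
```

Running this per tenant and per table during the dual-write phase gives a simple gate: a batch cuts over only when every table matches.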

Gotchas we have hit repeatedly:

  • UUIDs without v7. Random UUIDs (v4) thrash B-tree indexes. Use UUID v7 (time-ordered) for all new primary keys.
  • Missing tenant_id on side tables. Audit log, attachments, webhook delivery records — all need tenant_id and all need RLS.
  • Long-lived tenants with outsized data. One tenant with 10M rows in a table where the median tenant has 5k slows everyone by crowding shared buffers. Move them to schema-per-tenant or DB-per-tenant.
  • Stripe customer drift. A tenant deletion that does not cancel Stripe subscriptions leaves zombie revenue and accounting headaches.
  • Background job leakage. A job triggered for tenant A that queries tenant B because the job runner forgot to set the GUC. Always set the GUC inside the job handler, never trust the payload.
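On the first gotcha, the layout that makes UUID v7 index-friendly is easy to see in code. The sketch below hand-builds a version-7 UUID from the standard library only; it is for illustration, and where your runtime or database already ships a v7 generator you should prefer that.

```python
import os
import time
import uuid
from typing import Optional


def uuid7(unix_ms: Optional[int] = None) -> uuid.UUID:
    """Build a version-7 UUID: 48-bit millisecond timestamp, then random bits."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    rand_a = rand >> 68                           # top 12 bits
    rand_b = rand & ((1 << 62) - 1)               # low 62 bits
    value = (
        ((unix_ms & ((1 << 48) - 1)) << 80)  # timestamp in the top 48 bits
        | (0x7 << 76)                        # version field = 7
        | (rand_a << 64)
        | (0b10 << 62)                       # RFC variant bits
        | rand_b
    )
    return uuid.UUID(int=value)
```

Because the timestamp occupies the most significant bits, IDs created later sort later, so new rows land at the right-hand edge of the B-tree instead of at random pages.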

FAQ

What is the best tenant isolation strategy for a new SaaS?

Shared schema + Postgres RLS, with tenant_id on every table and the GUC set per request. Easiest to operate, cheapest to scale, and still possible (if painful) to migrate to schema-per-tenant later.

How do I handle auth in a multi-tenant SaaS?

Use a managed provider (Clerk, Auth0, WorkOS, Supabase Auth). Add WorkOS for enterprise SSO/SCIM at the business tier.

Where should tenant_id live in the data model?

Every table, indexed, foreign-keyed, enforced by RLS. Always.

How do I handle EU data residency?

Separate EU and US clusters with their own databases, storage and observability. Route by tenant. Do not split rows within a single database.

Serverless or container runtime?

Containers (Fargate, GKE Autopilot, Fly.io) for the main API; serverless for edges like webhooks and image processing.

How do I bill across tenants cleanly?

One Stripe Customer per tenant, Subscriptions linked to internal tenant_id, metered events via Stripe Meters or Orb/Metronome. Reconcile nightly.

Build it right the first time

Tell us about your tenancy model, isolation needs, and audit story. We will tell you the cheapest architecture that survives 100× growth.

Last updated 26 May 2026.