TL;DR — what migration actually involves
Migrating an enterprise monolith to microservices is not primarily a coding exercise. It is an organisational, data and operational transformation that takes 18–36 months for a large system. Here is what you are actually signing up for:
- Service extraction: identifying bounded contexts and extracting them one at a time using the strangler fig pattern
- Data decomposition: breaking the shared monolithic database into service-owned databases — the hardest technical problem in any migration
- Org alignment: restructuring teams around services (Conway's Law); if you skip this, your code structure will revert to match your org chart
- Operational maturity: distributed tracing, service mesh, independent CI/CD pipelines and on-call runbooks — all prerequisites, not afterthoughts
- Cost: typically 15–25% of the original system's build cost per year of migration for a large enterprise monolith
When to migrate (and when not to)
The single most important architectural question is not how to migrate, but whether to. Many enterprise monoliths should not be migrated — at least not fully, and not now.
Strong signals that migration is justified
- Deployment coupling blocks velocity: teams wait for each other to release because everything deploys as one artefact. A single team's bug delays five other teams' features.
- Scaling is all-or-nothing: a peak in one subsystem (payment processing, recommendation engine) forces the entire application to scale, at unnecessary infrastructure cost.
- Regulatory isolation is required: GDPR data residency, PCI-DSS cardholder data isolation, or SOC 2 scope reduction mandate that certain data flows are physically separated.
- Technology lock-in blocks capability: a critical new feature requires a runtime or language incompatible with the monolith's stack, and re-platforming the whole application is disproportionate.
Strong signals migration should wait
- Your team has fewer than 30 engineers — the coordination overhead of microservices outweighs the deployment benefits at this scale.
- You lack mature CI/CD pipelines, automated test coverage above 60%, or centralised observability. Without these you will create a distributed mess, not a microservices platform.
- No one owns bounded context mapping — without a clear domain model, you will extract wrong boundaries and spend years re-merging what you incorrectly split.
- The monolith works and the business pain is modest — if your current delivery pace is acceptable, incremental legacy modernisation may deliver more value per dollar than a full decomposition.
The distributed spaghetti trap
The most common failure mode in enterprise microservices migrations is creating a distributed monolith: a system that has been split into separate deployable units but retains all the coupling of the original monolith, while adding all the operational complexity of distributed systems.
Symptoms of a distributed monolith include:
- Services share a single database or schema — changes to one table require coordinating releases across five teams
- Synchronous REST chains span 4–8 services per user request — a timeout in service D cascades back to service A
- A single shared library contains the domain model, business logic or configuration — every service must release together when it changes
- Services are deployed together on the same release schedule, defeating the independent-deployment benefit
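To make the cost of long synchronous chains concrete: if a user request succeeds only when every call in the chain succeeds, the per-service availabilities multiply. The sketch below assumes independent failures and no retries or fallbacks, which is a simplification; the function name and the 99.9% figure are illustrative, not measurements.

```python
def chain_availability(per_service_availability: float, hops: int) -> float:
    """Availability of a request that needs all `hops` synchronous calls to succeed.

    Assumes failures are independent and there are no retries or fallbacks.
    """
    if not 0.0 <= per_service_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if hops < 1:
        raise ValueError("hops must be at least 1")
    return per_service_availability ** hops


if __name__ == "__main__":
    # Compare a single call with chains of 4 and 8 services.
    for hops in (1, 4, 8):
        print(hops, round(chain_availability(0.999, hops), 4))
```

Under these assumptions, eight services at 99.9% each give the chain roughly 99.2% availability, which is why shortening chains or making calls asynchronous matters more than tuning any single service.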
The strangler fig pattern: incremental extraction
Martin Fowler's strangler fig pattern (named after the strangler fig, a plant that grows around its host tree and eventually replaces it) is the industry-standard approach for zero-downtime migration of large production systems. The key insight: you never rewrite the whole system; you redirect specific request paths to new services one at a time.
The mechanics involve three elements:
- Facade: an API gateway, reverse proxy (Nginx, Envoy) or BFF layer sits in front of both the monolith and the emerging services, routing requests based on path, headers or feature flags.
- Extract: one bounded context is extracted into a standalone service with its own codebase, deployment pipeline and (eventually) its own database. The monolith's code for that context is left in place but traffic is rerouted through the facade.
- Verify and strangle: once the new service handles 100% of traffic for that context without regression, the corresponding monolith code is removed. The "strangling" is complete for that context.
Practical rules for the strangler fig extraction sequence:
- Start with read-heavy, low-write services — reporting, catalogue browsing, notification preferences. They have fewer transactional dependencies and are safer to extract first.
- Extract leaf contexts first — services that do not call other internal services. Working inward toward the core reduces the blast radius of early extractions.
- Avoid extracting the payment or auth context first — these are high-criticality, high-coupling domains. Extract them after your team has built confidence with simpler contexts.
- Keep the facade thin — avoid putting business logic in the router. It becomes a new monolith.
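The routing decision at the facade can be sketched as follows. This is a minimal illustration, not a gateway implementation: in practice the logic would live in Nginx, Envoy or an API gateway config. The URLs, the `Route` type and the percentage rollout are assumptions introduced for the example.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    path_prefix: str      # request paths under this prefix are candidates for the new service
    service_url: str      # hypothetical upstream of the extracted service
    rollout_percent: int  # 0 = all traffic stays on the monolith, 100 = fully cut over


MONOLITH_URL = "http://monolith.internal"  # hypothetical

ROUTES = [
    Route("/catalogue", "http://catalogue.internal", 100),
    Route("/notifications/preferences", "http://notif-prefs.internal", 10),
]


def bucket(user_id: str) -> int:
    """Map a user to a stable bucket in 0..99 so the same user always takes the same path."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 100


def choose_upstream(path: str, user_id: str, routes=ROUTES) -> str:
    """Return the upstream for a request. Routing only, no business logic."""
    for route in routes:
        if path == route.path_prefix or path.startswith(route.path_prefix + "/"):
            if bucket(user_id) < route.rollout_percent:
                return route.service_url
            break  # prefix matched but the user is outside the rollout
    return MONOLITH_URL
```

Setting `rollout_percent` to 0 for a route sends that context back to the monolith, which is the property the pattern depends on. Hashing the user ID keeps each user on one side of the split during a partial rollout.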
Decomposing the shared database
If the strangler fig pattern is the most discussed part of a microservices migration, data decomposition is the hardest part in practice. Most enterprise monoliths share a single relational database with hundreds of tables, foreign-key relationships across domain boundaries and stored procedures that embed business logic.
The decomposition proceeds in stages:
- Identify ownership: for every table and column, assign a single owning service. This is a domain modelling exercise, not a technical one. Use event storming sessions with business stakeholders to surface true ownership.
- Abstract access: before physically separating databases, introduce service-level read/write paths through the new service's API. Other services stop querying the shared table directly and call the owning service instead. This reveals hidden coupling.
- Dual-write transition: during the cutover window, write to both the monolith's shared table and the new service's private schema simultaneously. Verify consistency before committing to the new schema as canonical.
- Migrate and cut over: once dual-write is stable and read queries are routed through the new service, promote the private schema to canonical and remove the dual-write. The shared table becomes read-only, then is eventually dropped.
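The dual-write and verification steps can be sketched with in-memory dictionaries standing in for the shared table and the new private schema. The class and function names are hypothetical. A real implementation would also have to handle partial failures between the two writes (for example with an outbox table or change data capture), which this sketch only records.

```python
from typing import Dict, List, Optional


class DualWriteRepository:
    """Writes go to the canonical store first, then are mirrored to the shadow store."""

    def __init__(self, canonical: Dict[str, dict], shadow: Dict[str, dict]) -> None:
        self.canonical = canonical  # stand-in for the monolith's shared table
        self.shadow = shadow        # stand-in for the new service's private schema
        self.failed_shadow_writes: List[str] = []

    def save(self, key: str, row: dict) -> None:
        self.canonical[key] = dict(row)
        try:
            self._write_shadow(key, row)
        except Exception:
            # A shadow failure must not fail the user request; record it for later repair.
            self.failed_shadow_writes.append(key)

    def _write_shadow(self, key: str, row: dict) -> None:
        self.shadow[key] = dict(row)

    def get(self, key: str) -> Optional[dict]:
        # Reads stay on the canonical store until cutover.
        return self.canonical.get(key)


def find_mismatches(canonical: Dict[str, dict], shadow: Dict[str, dict]) -> List[str]:
    """Keys whose rows differ, or that exist in only one of the two stores."""
    keys = set(canonical) | set(shadow)
    return sorted(k for k in keys if canonical.get(k) != shadow.get(k))
```

Running `find_mismatches` on a schedule during the validation window gives a concrete signal for the cutover decision: promote the private schema only after the mismatch list has stayed empty for an agreed period.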
| Coupling type | Recommended approach | Risk level |
|---|---|---|
| Simple table ownership (no cross-domain FKs) | Direct extraction with dual-write window | Low |
| Cross-domain foreign keys | Replace FKs with domain events; accept eventual consistency | Medium |
| Shared joins across domain boundaries | Introduce read models (CQRS) per consuming service | Medium–High |
| Stored procedures with cross-domain logic | Extract business logic to application layer first, then decompose | High |
| Distributed transactions (saga required) | Design saga choreography with compensating events | High |
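As an illustration of the second and third rows, the sketch below replaces a cross-domain join with a local read model kept up to date by domain events. The `EventBus` is an in-process stand-in for a message broker, and the service and event names are hypothetical. Delivery here is synchronous, whereas a real broker would make the read model eventually consistent.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

Event = Dict[str, str]


class EventBus:
    """In-process stand-in for a message broker; delivery here is synchronous."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, event: Event) -> None:
        for handler in self._handlers[event_type]:
            handler(event)


class CustomerService:
    """Owns customer data and announces changes as events."""

    def __init__(self, bus: EventBus) -> None:
        self._bus = bus
        self._customers: Dict[str, Event] = {}

    def rename(self, customer_id: str, name: str) -> None:
        self._customers[customer_id] = {"customer_id": customer_id, "name": name}
        self._bus.publish("CustomerRenamed", self._customers[customer_id])


class OrderService:
    """Keeps its own copy of the customer fields it needs instead of joining across schemas."""

    def __init__(self, bus: EventBus) -> None:
        self.customer_names: Dict[str, str] = {}
        bus.subscribe("CustomerRenamed", self._on_customer_renamed)

    def _on_customer_renamed(self, event: Event) -> None:
        self.customer_names[event["customer_id"]] = event["name"]

    def order_summary(self, order_id: str, customer_id: str) -> str:
        # The read model may lag behind the owner, so handle the missing case explicitly.
        name = self.customer_names.get(customer_id, "(unknown customer)")
        return f"order {order_id} for {name}"
```

The order service never queries customer tables. It tolerates a missing or stale name, which is the practical meaning of accepting eventual consistency.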
Org structure and Conway's Law
Conway's Law states that organisations design systems that mirror their own communication structure. The practical implication for migration: if you extract microservices without changing how your teams are organised, the services will gradually evolve back toward the shape of your org chart — and you will rebuild the monolith in distributed form.
Effective migration requires the inverse Conway manoeuvre: intentionally restructuring teams to match the desired service boundaries, so that team ownership, communication patterns and service ownership align.
This means:
- Each service has a single owning team — not a "platform team" that owns everything
- Teams are sized to own 1–3 services end-to-end (development, deployment, on-call)
- Inter-service communication becomes a formal API contract owned by the producing team, not an informal database join
- Platform teams own infrastructure primitives (service mesh, CI templates, observability) but not domain services
This is also where enterprise system integration strategy matters: the contracts between services become the long-term integration surface, not the database schema.
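One lightweight way to make such a contract explicit is a check that consumers run against the producer's responses in CI. The sketch below is an assumption-laden illustration: the contract, its field names and the checking function are hypothetical, and dedicated contract-testing or schema tools would normally replace it.

```python
from typing import Any, Dict, List

# Hypothetical contract published by the team that owns the customer service.
CUSTOMER_V1_CONTRACT: Dict[str, type] = {
    "customer_id": str,
    "email": str,
    "marketing_opt_in": bool,
}


def contract_violations(payload: Dict[str, Any], contract: Dict[str, type]) -> List[str]:
    """List missing fields and type mismatches.

    Extra fields are allowed, so the producer can add fields without breaking consumers.
    """
    problems = []
    for field, expected in contract.items():
        if field not in payload:
            problems.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            problems.append(f"wrong type for {field}: expected {expected.__name__}")
    return problems
```

A consumer's test suite would assert that `contract_violations(response, CUSTOMER_V1_CONTRACT)` is empty, so a breaking change by the producing team fails a build instead of failing in production.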
Cost and timeline benchmarks
Realistic cost benchmarks for enterprise monolith migrations vary widely based on system size and coupling, but the following ranges are consistent with industry data and our own engagement experience:
| Monolith size | Indicative migration cost | Timeline to first independent service | Full decomposition timeline |
|---|---|---|---|
| Mid-size (5–15 engineers, 200k–500k LoC) | $200k–$500k | 3–5 months | 12–18 months |
| Large (15–50 engineers, 500k–2M LoC) | $500k–$2M | 4–8 months | 18–30 months |
| Very large (50+ engineers, 2M+ LoC) | $2M–$8M+ | 6–12 months | 24–48 months |
These figures assume the migration is staffed partly with the existing team (domain knowledge) and partly with an external specialist partner (architectural patterns, tooling). Note that "full decomposition" is rarely the goal — most enterprises aim for selective decomposition of high-value bounded contexts and leave stable, low-change subsystems in the monolith indefinitely.
Infrastructure costs also increase significantly during migration: running both monolith and services in parallel, plus new tooling for service mesh, distributed tracing and secret management, typically adds 30–50% to operational infrastructure spend for the duration of the transition.
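As a worked example of the 30–50% uplift, the sketch below applies that range to a monthly infrastructure bill. The function names are illustrative and any input figures are hypothetical, not benchmarks.

```python
from typing import Tuple


def pct_range(amount: int, low_pct: int, high_pct: int) -> Tuple[int, int]:
    """Apply a percentage range to an amount using integer arithmetic."""
    return amount * low_pct // 100, amount * high_pct // 100


def transition_infra_uplift(monthly_infra_spend: int, months: int) -> Tuple[int, int]:
    """Total extra infrastructure spend over the transition, using the 30-50% range."""
    low, high = pct_range(monthly_infra_spend, 30, 50)
    return low * months, high * months
```

For a hypothetical bill of 40,000 per month over a 24-month transition, `transition_infra_uplift(40_000, 24)` gives an extra 288,000 to 480,000 on top of the baseline, which belongs in the migration budget alongside engineering cost.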
Observability and rollback strategy
You cannot safely operate a distributed system you cannot observe. Before extracting the first service, the following must be in place:
- Distributed tracing: OpenTelemetry with a backend (Jaeger, Tempo, Datadog APM) so you can trace a single user request across service boundaries. This is non-negotiable.
- Centralised log aggregation: structured JSON logs from all services forwarded to a single queryable store (Loki, Elasticsearch, CloudWatch Logs).
- Service-level SLIs/SLOs: define error rate, p95 latency and availability targets per service before it receives production traffic. Alerts should fire on error-budget burn rate, not on raw metric thresholds.
- Feature flags at the facade: maintain the ability to instantly redirect traffic back to the monolith for any extracted context. The strangler fig pattern only works if you can reverse it quickly.
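Burn rate, as used in the SLO item above, is the observed error ratio divided by the error budget the SLO allows. The sketch below shows the arithmetic and a two-window paging check. The function names are illustrative and the threshold is a parameter you would tune, not a recommended value.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """How fast the error budget is being consumed; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target
    if budget <= 0.0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(short_window_errors: float, long_window_errors: float,
                slo_target: float, threshold: float) -> bool:
    """Page only when both a short and a long window exceed the threshold.

    Requiring both windows filters out brief spikes that have already ended.
    """
    return (burn_rate(short_window_errors, slo_target) >= threshold
            and burn_rate(long_window_errors, slo_target) >= threshold)
```

With a 99.9% availability SLO, a sustained error ratio of 0.2% is a burn rate of about 2: the service is spending its error budget twice as fast as the SLO permits.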
Rollback strategy must be explicit for each extraction phase. For every service extraction, document:
- The specific facade routing rule to revert (one-line change in API gateway config)
- The dual-write status — if data has been written to both schemas, which is canonical
- The acceptable data staleness window if the new service's database has diverged
- The on-call engineer who owns the rollback decision, and the time threshold after which rollback is triggered automatically
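The four items can be captured as a small structured record per extraction, so the rollback plan is reviewed like code and not buried in a wiki page. The field names and the `should_auto_rollback` helper below are hypothetical, a sketch of one possible shape.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class Canonical(Enum):
    MONOLITH_TABLE = "monolith_table"
    SERVICE_SCHEMA = "service_schema"


@dataclass(frozen=True)
class RollbackPlan:
    context: str                    # bounded context being extracted
    facade_rule: str                # identifier of the routing rule to revert
    canonical_store: Canonical      # which side wins if the two schemas disagree
    max_staleness: timedelta        # acceptable data staleness after reverting
    decision_owner: str             # on-call role that owns the rollback decision
    auto_rollback_after: timedelta  # breach duration that triggers automatic rollback


def should_auto_rollback(plan: RollbackPlan, breach_started: datetime, now: datetime) -> bool:
    """True once the SLO breach has lasted at least the plan's threshold."""
    return now - breach_started >= plan.auto_rollback_after
```

A monitoring job could call `should_auto_rollback` on each evaluation and, when it returns true, revert the named facade rule without waiting for a human decision.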
Phased migration roadmap
A structured four-phase roadmap reduces risk and ensures each phase delivers measurable value before the next begins.
- Phase 1 — Foundation (months 1–3): Do not extract any services yet. Invest in prerequisites: centralised observability (tracing, logging, alerting), independent CI/CD pipeline templates, a container platform (Kubernetes or ECS), domain event catalogue, and boundary mapping workshops with business and engineering stakeholders. Deliverable: a prioritised extraction backlog with bounded context map.
- Phase 2 — First extraction (months 3–6): Extract one low-risk, high-impact bounded context using the strangler fig pattern. Validate the extraction playbook (facade routing, dual-write, data migration, SLO monitoring, rollback). This phase is as much a process dry-run as a technical delivery. Deliverable: one independently deployed service in production, documented extraction runbook.
- Phase 3 — Systematic decomposition (months 6–24+): Apply the validated playbook iteratively to the remaining priority bounded contexts. Each extraction team follows the same pattern. Data decomposition accelerates as the shared database ownership becomes clearer. Org restructuring (inverse Conway manoeuvre) happens in parallel. Deliverable: 60–80% of high-change business logic running as independent services.
- Phase 4 — Monolith residualisation: The remaining monolith contains stable, low-change subsystems that are rarely touched. Decide explicitly whether each residual context justifies further extraction cost. For most enterprises, leaving 20–30% of the original system in a "residual monolith" is the right economic decision. Deliverable: documented residual monolith policy, decommission plan for shared database tables no longer in use.
FAQ
Should we migrate our monolith to microservices?
Not automatically. Migrate when you have concrete organisational pain: teams blocked by deployment coupling, services that need to scale independently, or regulatory isolation requirements. If your team is smaller than 30 engineers, your CI/CD is immature, or your monolith works well and the business pain is modest, migration will cost more than it saves. Start with an honest bounded context audit and a clear list of the specific problems you are trying to solve.
What is the strangler fig pattern?
The strangler fig pattern involves placing a routing facade in front of your monolith and incrementally redirecting specific request paths to new microservices, while the monolith continues to handle everything else. Over time you extract context after context until the monolith handles so little traffic that it can be decommissioned. It is the safest approach to migrating a live production system because it is incremental, and each extraction stays reversible until the corresponding monolith code is removed.
What is a distributed monolith?
A distributed monolith is the worst-case outcome of a poorly planned migration: multiple services that still share a single database, synchronous call chains that span the entire system, or a shared library that forces all services to deploy together. You get all the operational complexity of distributed systems with none of the deployment independence. The root cause is almost always that data decomposition was skipped or deferred indefinitely.
How long does a migration take?
Meaningful value — an independently deployed, independently scalable service — can be delivered in 3–6 months with the strangler fig pattern. Full decomposition of a large enterprise monolith takes 18–36 months. The timeline is driven more by data coupling complexity and organisational change management than by the coding work itself. Plan for the org change to take as long as the technical work.
How do we avoid downtime during migration?
The strangler fig pattern's routing facade is your primary mechanism: traffic can be switched back to the monolith in seconds if the new service fails. Supplement this with dual-write during database cutover phases, blue-green deployments at the service level, feature flags at the facade, and comprehensive distributed tracing to catch regressions before they affect all users. Never do a big-bang database cutover on a live system — always run dual-write with a validation window.
Last updated 6 June 2026. Cost benchmarks reflect YuSMP Group engagement data and publicly available industry reports (Gartner, CNCF, ThoughtWorks Technology Radar). Individual migration costs vary based on monolith size, coupling, team structure and observability maturity.


