TL;DR — default to a modular monolith
The architecture debate of 2016–2020 — "monoliths are dead, microservices are the future" — has largely been corrected by hard operational experience. Here is the short version:
- Default to a modular monolith. Clean modules, enforced domain boundaries, a single deploy pipeline. This is the right starting point for the vast majority of web applications, including B2B SaaS, marketplaces and enterprise portals.
- Extract services when you have a concrete reason. Different scaling profiles, independent deployment cadences, or team-ownership boundaries that a monolith cannot express cleanly. Not before.
- Full microservices make sense at scale. Think 20+ engineers, 10M+ requests per day, and genuinely divergent service SLAs. Below that threshold the overhead is pure tax on your team's velocity.
- A plain monolith is not shameful. Shopify, Stack Overflow and Basecamp all grew to serve very large audiences on well-tuned monoliths. The architecture is only wrong if it is stopping you from doing something you need to do.
The three options defined
Traditional monolith
A single deployable application where all business logic, data access and presentation live in one codebase and process. Every feature ships together; the database is shared; scaling means running more copies of the whole application. The classic Rails app, Django project or Spring Boot service is a monolith.
Modular monolith
Still a single deployable unit, but the code is partitioned into well-defined modules. Each module owns its domain logic, its database schema namespace and its public interface. Modules communicate through in-process APIs or message contracts, not HTTP calls. The result is a codebase that deploys simply but is structured to allow future extraction. This is the architecture we reach for first in our web application development engagements for clients in the US and EU.
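Here is a minimal sketch of those boundaries. It assumes Python and in-memory state. Each module exposes one public class, and modules talk through a small in-process event bus rather than reaching into each other's internals. `EventBus`, `OrdersAPI`, `BillingAPI` and the `order_placed` event are hypothetical names chosen for illustration. They are not a framework API.

```python
from dataclasses import dataclass, field
from typing import Callable


class EventBus:
    """In-process pub/sub (hypothetical): modules publish domain events
    instead of calling each other's internals."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[dict], None]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Handlers run synchronously, in the same process as the publisher.
        for handler in self._handlers.get(event_type, []):
            handler(payload)


# --- billing module: owns its own state; only BillingAPI is public ---
@dataclass
class BillingAPI:
    _invoices: dict[str, float] = field(default_factory=dict)

    def on_order_placed(self, payload: dict) -> None:
        # Reacts to another module's event without importing its code.
        self._invoices[payload["order_id"]] = payload["total"]

    def invoice_total(self, order_id: str) -> float:
        return self._invoices[order_id]


# --- orders module: depends only on the bus, never on billing ---
@dataclass
class OrdersAPI:
    bus: EventBus
    _orders: dict[str, float] = field(default_factory=dict)

    def place_order(self, order_id: str, total: float) -> None:
        self._orders[order_id] = total
        self.bus.publish("order_placed", {"order_id": order_id, "total": total})


# Composition root: the one place where modules are wired together.
bus = EventBus()
billing = BillingAPI()
orders = OrdersAPI(bus)
bus.subscribe("order_placed", billing.on_order_placed)

orders.place_order("o-1", 49.0)
print(billing.invoice_total("o-1"))
```

`OrdersAPI` never imports billing code. Moving billing into its own service later therefore means replacing the in-process `publish` with a publish to a message broker, while the publishing call site in the orders module stays as it is.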
Microservices
A collection of independently deployable services, each responsible for a narrow domain, communicating over HTTP or a message broker (Kafka, RabbitMQ, SQS). Each service has its own database, its own CI/CD pipeline and its own operational footprint. Done well, it lets large organisations ship fast without stepping on each other. Done prematurely, it is a distributed monolith with all the drawbacks of both worlds and none of the advantages of either.
What a modern monolith can actually do
The narrative that "monolith equals legacy" is a marketing artefact of the Kubernetes era. Let us be precise about what modern monoliths can and cannot do.
They scale horizontally
Running four copies of a Rails or Django app behind a load balancer is horizontal scaling. This works until your database becomes the bottleneck. At that point you add read replicas and connection pooling, not necessarily a new service. For years, Stack Overflow served more than a billion pageviews a month from its monolithic application, which ran on nine web servers in its own data centre. When a monolith hits a wall, the monolith itself is rarely the bottleneck. The cause is usually missing caching, indexing and query discipline.
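Read-replica routing usually lives in the ORM or in a database proxy. Django's database routers and Rails' multiple-database support both handle it. The underlying idea still fits in a few lines. In the hypothetical sketch below, writes go to the primary and plain `SELECT` statements rotate across the replicas. The host names are placeholders.

```python
import itertools


class ReplicaRouter:
    """Hypothetical router: writes hit the primary, reads rotate across replicas."""

    def __init__(self, primary: str, replicas: list[str]) -> None:
        self.primary = primary
        # Fall back to the primary when no replicas are configured.
        self._reads = itertools.cycle(replicas or [primary])

    def route(self, sql: str) -> str:
        # Naive classification: only statements starting with SELECT are
        # treated as reads; everything else is sent to the primary.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._reads) if is_read else self.primary


router = ReplicaRouter("db-primary:5432", ["db-replica-1:5432", "db-replica-2:5432"])
print(router.route("UPDATE users SET name = 'x' WHERE id = 1"))
print(router.route("SELECT * FROM users WHERE id = 1"))
```

Replicas lag behind the primary. A read that must see a write the same request just committed (read-your-own-writes) should therefore be pinned to the primary. This sketch does not handle that case.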
They support fast iteration
A monolith has one deploy pipeline. One set of integration tests. One place to grep for a bug. For a team of 3–15 engineers, this is a massive coordination advantage. Every distributed system you add multiplies the failure modes you must monitor and the rollback scenarios you must practise.
They support compliance boundaries
GDPR data residency, HIPAA audit logs and SOC 2 controls are easier to implement inside one application than to enforce across a mesh of services. One application gives you a single audit trail, a single secrets store and a single TLS termination point. See the patterns we use in our guide on how to build a multi-tenant SaaS: multi-tenancy is fully achievable inside a modular monolith.
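A single application makes this simpler because every module can write to the same audit sink through one shared helper. The hypothetical sketch below assumes an in-memory list stands in for an append-only audit table. `audited`, `AUDIT_LOG` and the action names are illustrative only.

```python
import functools
import json
from datetime import datetime, timezone
from typing import Any, Callable

# Hypothetical single audit sink shared by every module in the process.
# In production this would be an append-only table or log stream.
AUDIT_LOG: list[str] = []


def audited(action: str) -> Callable:
    """Record who did what, in which tenant, before running the handler."""
    def decorator(fn: Callable) -> Callable:
        @functools.wraps(fn)
        def wrapper(actor: str, tenant_id: str, *args: Any, **kwargs: Any) -> Any:
            entry = {
                "at": datetime.now(timezone.utc).isoformat(),
                "actor": actor,
                "tenant": tenant_id,
                "action": action,
            }
            AUDIT_LOG.append(json.dumps(entry))
            return fn(actor, tenant_id, *args, **kwargs)
        return wrapper
    return decorator


@audited("patient_record.read")
def read_record(actor: str, tenant_id: str, record_id: str) -> dict:
    return {"id": record_id, "tenant": tenant_id}


@audited("invoice.export")
def export_invoices(actor: str, tenant_id: str) -> int:
    return 0
```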
Where monoliths genuinely struggle
There are real limitations. Suppose your checkout module needs to handle 50x the traffic of your reporting module. Scaling the whole app to meet checkout's peak then wastes resources on everything else. Suppose instead your ML inference pipeline has completely different runtime requirements (GPU, Python, a different deploy cadence). Keeping it in the same process is then awkward. These are the right reasons to extract a service, not engineering fashion.
When microservices genuinely pay off
There are scenarios where microservices are the right answer. Here is how to recognise them.
Genuinely different scaling profiles
If your product recommendation engine receives 1,000 requests per second during peak while your user-settings API receives 5, it makes no sense to scale both together. Once you can quantify that divergence in production — not in a whiteboard session — service extraction pays for itself. Related: see the 2026 enterprise AI-agent stack for how inference-heavy workloads often justify isolation for exactly this reason.
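You can start quantifying that divergence from access logs you already have. The sketch below assumes two things about your logging, neither of which is a standard format: you can export `(unix_timestamp, path)` pairs, and the first path segment identifies the domain.

```python
from collections import Counter, defaultdict


def peak_rps_by_domain(requests: list[tuple[float, str]]) -> dict[str, int]:
    """Peak requests-per-second per domain from (unix_ts, path) samples.

    Assumes the first path segment names the domain, e.g. /recommendations/...
    """
    per_second: dict[str, Counter] = defaultdict(Counter)
    for ts, path in requests:
        domain = path.strip("/").split("/")[0] or "root"
        per_second[domain][int(ts)] += 1  # bucket by whole second
    return {domain: max(buckets.values()) for domain, buckets in per_second.items()}


def divergence_ratio(peaks: dict[str, int]) -> float:
    """Ratio between the busiest and the quietest domain's peak RPS."""
    return max(peaks.values()) / min(peaks.values())
```

One spike is not evidence. A large ratio that holds across weeks of production traffic is the kind of measurement the paragraph above describes.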
Independent deployment cadences
When separate product teams need to ship multiple times per day without coordination, a single deploy pipeline becomes a bottleneck. This is Amazon's original motivation for microservices — not performance, but organisational velocity. If your analytics team and your billing team are waiting for each other's code to be stable before a release, the architecture is creating a human coordination problem that a service boundary would solve.
Isolated fault domains
A catastrophic bug in a monolith can bring down the whole application. In a microservices architecture, a bug in your notification service does not have to bring down checkout. Circuit breakers, bulkheads and graceful degradation are most effective when you have service boundaries to enforce them at. For high-revenue flows (payments, core API, authentication), the isolation guarantee is worth the operational cost.
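Circuit breakers are easy to sketch. The version below is a minimal, single-threaded illustration. After `failure_threshold` consecutive failures it rejects calls for `reset_after_s` seconds, then lets a trial call through. The class and parameter names are hypothetical. Production services typically use a maintained resilience library or a service-mesh policy instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a dependency that is currently failing."""


class CircuitBreaker:
    """Closed -> open after N consecutive failures; after the cool-down,
    one trial call decides whether to close again or re-open."""

    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                # Fail fast instead of piling more load on a sick dependency.
                raise CircuitOpenError("circuit open")
            # Cool-down elapsed: fall through and allow a trial call.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open, or re-open after a failed trial
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A checkout flow could wrap its notification call in `breaker.call(...)`. On `CircuitOpenError`, it would queue the notification for later rather than fail the order.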
Regulatory and data-residency boundaries
A US company serving EU customers under GDPR may genuinely need EU-resident data processed in a separately deployed and audited service, not just a config flag in a shared app. Ditto for PCI DSS scope isolation — cardholder data handled in a separate service with its own network boundary is architecturally cleaner than whittling down a monolith's surface area.
Cost, team size and operational burden
Microservices are not free. Every service you add multiplies the infrastructure and staffing cost. Here is an honest accounting.
Infrastructure cost
Each service needs its own compute (container, Lambda or VM), database or database namespace, secrets manager integration, load balancer or API gateway routing rule, log aggregation pipeline entry and health-check monitor. For a ten-service architecture you are running ten of each. On AWS or GCP, a modular monolith often costs 60–80% less in monthly infrastructure than an equivalent microservices mesh handling the same traffic.
Observability cost
A monolith fails with a stack trace in one log stream. A distributed system fails with partial traces spread across five services, two async queues and a caching layer. Distributed tracing (Jaeger, Tempo, AWS X-Ray), structured logging aggregation (Loki, Datadog, CloudWatch) and service health dashboards are mandatory, not optional. Budget a dedicated platform engineer plus $2,000–8,000/month in SaaS tooling for a team running 10+ services.
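At the heart of cross-service tracing is a correlation ID that every service logs and forwards. The minimal sketch below uses Python's standard `logging` and `contextvars`. The `X-Request-ID` header name is a common convention, but it is an assumption here. A real deployment would more likely use OpenTelemetry's trace-context propagation.

```python
import contextvars
import json
import logging
import uuid

# Correlation ID for the current request. Web frameworks typically run each
# request in its own context, so values do not leak between requests.
request_id = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per line so a log aggregator can index fields."""

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
            "request_id": request_id.get(),
        })


def handle_incoming(headers: dict[str, str]) -> dict[str, str]:
    """Reuse the caller's ID if present, else mint one; return the headers
    to attach to downstream calls. The header name is an assumption."""
    rid = headers.get("X-Request-ID") or uuid.uuid4().hex
    request_id.set(rid)
    logging.getLogger("checkout").info("order received")
    return {"X-Request-ID": rid}


handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logging.getLogger().addHandler(handler)
logging.getLogger().setLevel(logging.INFO)
```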
Team size requirements
Amazon's two-pizza rule says a team should be small enough to be fed by two pizzas, roughly 6–10 engineers. That is a useful benchmark for the team that owns a service and can maintain it without constant context switching. Below 20–30 engineers total, microservices mean every engineer owns multiple services. That creates exactly the coordination overhead the architecture was supposed to remove. Also see the SaaS churn-reduction playbook for how architectural decisions upstream affect the product reliability that drives retention.
| Dimension | Monolith | Modular Monolith | Microservices |
|---|---|---|---|
| Deploy complexity | Low | Low | High |
| Infra cost (same traffic) | Lowest | Lowest | 2–5x higher |
| Observability overhead | Minimal | Minimal | Significant |
| Horizontal scalability | Coarse-grained | Coarse-grained | Fine-grained |
| Independent team deploys | No | Partial | Yes |
| Min. team to run well | 3–5 engineers | 5–15 engineers | 20+ engineers |
| Time to first production deploy | 1–2 weeks | 1–3 weeks | 4–8 weeks |
Scaling a web app the right way
Before choosing an architecture based on future scale you do not yet have, apply these scaling levers in order — most applications never need to go further than step three.
1. Vertical scaling and query optimisation
Double your database instance size. Add an index to the slow query. Enable connection pooling (PgBouncer, RDS Proxy). These are cheap, low-risk changes compared with a re-architecture, and they commonly buy 5–10x capacity headroom. Most applications that "need microservices for scale" actually need a database index and a Redis cache.
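You can see what an index buys before touching production by reading the query plan. The sketch below uses SQLite from Python's standard library as a stand-in. PostgreSQL's `EXPLAIN` output looks different, but the distinction between a full scan and an index search is the same. The `orders` table and the index name are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for your production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, i * 1.5) for i in range(10_000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = ?"


def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, (42,)).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(QUERY)  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer_id ON orders (customer_id)")
after = plan(QUERY)   # with the index: index search
print(before)
print(after)
```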
2. Horizontal application scaling
Run multiple instances of your monolith behind a load balancer. Add read replicas for read-heavy workloads. Use a CDN to offload static assets and cached API responses. A single 4-vCPU instance of a typical Rails or Node.js app handles on the order of 500 requests/second, though this varies widely with workload. Eight such instances behind a load balancer handle roughly 4,000. Most applications reach that ceiling slowly.
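The arithmetic behind these numbers is worth writing down. The sketch below assumes the ~500 requests/second per-instance figure above. It also assumes a 70% target utilisation to leave headroom. Both are assumptions, so tune them against your own load tests.

```python
import math

SECONDS_PER_DAY = 86_400


def average_rps(requests_per_day: float) -> float:
    """Convert a daily request count into an average requests/second."""
    return requests_per_day / SECONDS_PER_DAY


def instances_needed(peak_rps: float, per_instance_rps: float,
                     target_utilisation: float = 0.7) -> int:
    """Instances required to serve peak_rps while keeping each instance at
    or below target_utilisation of its measured capacity."""
    usable = per_instance_rps * target_utilisation
    return math.ceil(peak_rps / usable)
```

For example, 10 million requests a day averages about 116 requests/second. Even with an assumed 5x peak factor, `instances_needed(average_rps(10_000_000) * 5, 500)` comes out at 2 instances.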
3. Caching and async queuing
Put Redis or Memcached in front of your hot read paths. Use a background job queue (Sidekiq, Celery, Bull) for anything that does not need to be synchronous: emails, webhooks, report generation, third-party API calls. Offloading async work removes a whole class of slow requests that inflate your p95 latency, with no change to the overall architecture.
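Both techniques fit in a short sketch. A plain dict stands in for Redis or Memcached, and `queue.Queue` with a worker thread stands in for Sidekiq, Celery or Bull. All names are hypothetical. These in-process versions also lose their contents on restart, which is exactly what the real tools exist to prevent.

```python
import queue
import threading
import time
from typing import Any, Callable

# Cache-aside: key -> (expiry timestamp, value). A dict stands in for Redis.
_cache: dict[str, tuple[float, Any]] = {}


def cached(key: str, ttl_s: float, load: Callable[[], Any]) -> Any:
    """Return a fresh cached value, otherwise load it and store it."""
    hit = _cache.get(key)
    if hit is not None and hit[0] > time.monotonic():
        return hit[1]
    value = load()
    _cache[key] = (time.monotonic() + ttl_s, value)
    return value


# Background jobs: queue.Queue stands in for a real job system.
jobs: queue.Queue = queue.Queue()


def worker() -> None:
    # Runs jobs off the request path, so the HTTP response never waits on them.
    while True:
        job = jobs.get()
        try:
            job()
        except Exception:
            # A real job system would log, retry with backoff or dead-letter.
            pass
        finally:
            jobs.task_done()


threading.Thread(target=worker, daemon=True).start()
```

In a request handler you would call `cached("report:42", 60, build_report)` for the hot read, and `jobs.put(send_receipt_email)` instead of sending the email inline. Both `build_report` and `send_receipt_email` are hypothetical placeholders.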
4. Extract one service at a time
When a specific hot path is genuinely bottlenecked and the above steps are exhausted, extract that service using the strangler fig pattern. Route new traffic to the service while the monolith handles the rest. Migrate incrementally. Do not attempt a big-bang rewrite from monolith to microservices — the failure rate is high and the cost is enormous.
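The strangler-fig router makes one decision per request: has this path been migrated, and is this user in the rollout? The sketch below is hypothetical. The prefix, service name and percentage are illustrative. In practice this logic usually lives in a reverse proxy or API gateway rather than in application code.

```python
import hashlib

# Hypothetical rollout table: path prefix -> (new backend, % of users routed to it).
ROLLOUT: dict[str, tuple[str, int]] = {
    "/recommendations": ("recommendation-service", 25),
}


def bucket(user_id: str) -> int:
    """Stable 0-99 bucket, so a given user always lands on the same backend."""
    digest = hashlib.sha256(user_id.encode()).digest()
    return int.from_bytes(digest[:2], "big") % 100


def choose_backend(path: str, user_id: str,
                   rollout: dict[str, tuple[str, int]] = ROLLOUT) -> str:
    for prefix, (service, percent) in rollout.items():
        if path.startswith(prefix) and bucket(user_id) < percent:
            return service
    # Paths not yet migrated, and users outside the rollout, stay on the monolith.
    return "monolith"
```

Hashing the user ID keeps each user on one backend as the percentage rises. That makes it straightforward to compare error rates between the old and new paths before moving to 100%.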
Decision matrix
Use this matrix when you are making the architecture call. Score each row for your current situation — not for the company you aspire to be in three years.
| Criterion | Choose modular monolith if… | Consider microservices if… |
|---|---|---|
| Team size | Under 20 engineers | 20+ engineers across multiple product squads |
| Traffic pattern | Uniform or low-volume today | Measured, divergent peak loads per domain |
| Deploy cadence | One or two teams, coordinated releases | Multiple teams, independent cadences required |
| Compliance isolation | Standard GDPR/SOC 2 manageable in one app | PCI DSS cardholder scope or strict data residency |
| DevOps maturity | No dedicated platform engineer | Platform/SRE team already in place |
| Technology mix | One primary language/runtime | Genuine need for mixed runtimes (e.g. ML GPU service) |
| Fault tolerance requirement | Full downtime acceptable for rare incidents | Partial degradation required (checkout must survive analytics outage) |
FAQ
Should a startup use microservices?
Almost never at the beginning. Microservices multiply operational surface area — service discovery, distributed tracing, independent CI/CD pipelines and a team large enough to own each service. A startup's constraint is shipping product and learning quickly, not infrastructure flexibility. Start with a modular monolith; extract services when a concrete scaling or ownership boundary forces you to.
What is a modular monolith?
A modular monolith is a single deployable application internally divided into well-defined, loosely coupled modules — each owning its own domain logic, database schema namespace and public API surface. It deploys as one process but is structured so that individual modules can be extracted into separate services later with minimal refactoring. It is the sweet spot for most teams under 50 engineers.
When do microservices pay off?
Microservices pay off when: (1) different parts of the system have genuinely different scaling profiles — checkout handles 10x the load of admin; (2) independent teams need to deploy without coordinating releases; (3) a specific service requires a different technology stack, SLA or compliance boundary. If none of those are true today, the overhead of microservices is pure cost.
Are microservices more scalable than a monolith?
Not automatically. A well-tuned monolith on modern cloud infrastructure handles millions of requests per day with horizontal scaling, connection pooling and read replicas. Microservices allow fine-grained scaling of hot paths — but that benefit only materialises when different services genuinely have different traffic shapes. Many teams add microservices complexity before they have the traffic that would justify it.
How big should the team be to run microservices?
Amazon's two-pizza rule is a useful guide: each service should be ownable by a team of 6–10 engineers. In practice you need at least one dedicated DevOps or platform engineer, SRE coverage, observability tooling and enough developers to maintain each service without constant context switching. Below 20–30 engineers total, microservices usually create more coordination overhead than they remove.
Can I migrate from a monolith to microservices later?
Yes, and this is the recommended path. If you build a modular monolith with clean domain boundaries from the start, extracting a service later is a contained effort: move a module's code, split its database tables and add an API contract. The strangler fig pattern is battle-tested for this migration: it routes traffic incrementally to a new service while the monolith handles the rest. Do not design for microservices on day one unless you already have the team and traffic that require it.
Last updated 4 June 2026. Architecture guidance based on production engagements delivered for US and EU clients between 2022 and 2026. Cost figures are estimates; actual infrastructure spend varies by cloud provider, region and traffic shape.


