Yury Pukhov, YuSMP Group
Yury Pukhov, CEO & Web Engineering Lead, YuSMP Group · Web platform architecture since 2011

TL;DR — default to a modular monolith

The architecture debate of 2016–2020 — "monoliths are dead, microservices are the future" — has largely been corrected by hard operational experience. Here is the short version:

  • Default to a modular monolith. Clean modules, enforced domain boundaries, a single deploy pipeline. This is the right starting point for the vast majority of web applications, including B2B SaaS, marketplaces and enterprise portals.
  • Extract services when you have a concrete reason. Different scaling profiles, independent deployment cadences, or team-ownership boundaries that a monolith cannot express cleanly. Not before.
  • Full microservices make sense at scale. Think 20+ engineers, 10M+ requests per day, and genuinely divergent service SLAs. Below that threshold the overhead is pure tax on your team's velocity.
  • A plain monolith is not shameful. Shopify, Stack Overflow and Basecamp have built large, successful businesses on well-tuned monoliths. The architecture is only wrong if it is stopping you from doing something you need to do.

The three options defined

Traditional monolith

A single deployable application where all business logic, data access and presentation live in one codebase and process. Every feature ships together; the database is shared; scaling means running more copies of the whole application. The classic Rails app, Django project or Spring Boot service is a monolith.

Modular monolith

Still a single deployable unit, but the code is partitioned into well-defined modules, each owning its domain logic, its database schema namespace and its public interface. Modules communicate through in-process APIs or message contracts, not HTTP calls. The result is a codebase that deploys simply but is structured to allow future extraction. This is the architecture we reach for first in our web application development engagements for clients in the US and EU.

Microservices

A collection of independently deployable services, each responsible for a narrow domain, communicating over HTTP or a message broker (Kafka, RabbitMQ, SQS). Each service has its own database, its own CI/CD pipeline and its own operational footprint. Done well, it lets large organisations ship fast without stepping on each other. Done prematurely, it is a distributed monolith with all the drawbacks of both worlds and none of the advantages of either.

Clean domain boundaries drawn at the design stage make the difference between a modular monolith that evolves gracefully and a big ball of mud that resists change. Good architecture starts on a whiteboard, not in a Kubernetes cluster.

What a modern monolith can actually do

The narrative that "monolith equals legacy" is a marketing artefact of the Kubernetes era. Let us be precise about what modern monoliths can and cannot do.

They scale horizontally

Running four copies of a Rails or Django app behind a load balancer is horizontal scaling. It works until your database becomes the bottleneck, at which point you add read replicas and connection pooling, not necessarily a new service. Stack Overflow has served billions of pageviews a month from a small on-premises fleet, famously around nine web servers. The bottleneck was never the monolith; it was the lack of caching, indexing and query discipline.
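A common way to add read replicas without touching business code is to route queries in the data-access layer. The sketch below is deliberately naive and rests on assumptions: it treats any statement starting with `SELECT` as a read and ignores replication lag, so a read issued right after a write may not see it yet. The hostnames are placeholders.

```python
import itertools
from dataclasses import dataclass


@dataclass
class DbRouter:
    """Sends writes to the primary and spreads reads across replicas."""

    primary: str
    replicas: list[str]

    def __post_init__(self) -> None:
        # Round-robin over replicas; fall back to the primary if there are none.
        self._cycle = itertools.cycle(self.replicas or [self.primary])

    def target_for(self, sql: str) -> str:
        # Naive: every SELECT counts as a read. SELECT ... FOR UPDATE and
        # read-your-own-writes flows should be pinned to the primary instead.
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._cycle) if is_read else self.primary


router = DbRouter(primary="db-primary", replicas=["db-replica-1", "db-replica-2"])
print(router.target_for("SELECT * FROM orders WHERE id = 1"))  # db-replica-1
print(router.target_for("SELECT * FROM orders WHERE id = 2"))  # db-replica-2
print(router.target_for("UPDATE orders SET status = 'paid'"))  # db-primary
```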

They support fast iteration

A monolith has one deploy pipeline. One set of integration tests. One place to grep for a bug. For a team of 3–15 engineers, this is a massive coordination advantage. Every distributed system you add multiplies the failure modes you must monitor and the rollback scenarios you must practise.

They support compliance boundaries

GDPR data residency, HIPAA audit logs and SOC 2 controls are easier to implement inside one application than to enforce across a mesh of services: a single audit trail, a single secrets store, a single TLS termination point. Multi-tenancy is fully achievable inside a modular monolith too; see the patterns we use in our guide on how to build a multi-tenant SaaS.

Where monoliths genuinely struggle

There are real limitations. If your checkout service needs to handle 50x the traffic of your reporting module, scaling the whole app wastes resources. If your ML inference pipeline has completely different runtime requirements (GPU, Python, different deploy cadence), keeping it in the same process is awkward. These are the right reasons to extract a service — not engineering fashion.

When microservices genuinely pay off

There are scenarios where microservices are the right answer. Here is how to recognise them.

Genuinely different scaling profiles

If your product recommendation engine receives 1,000 requests per second during peak while your user-settings API receives 5, it makes no sense to scale both together. Once you can quantify that divergence in production, not in a whiteboard session, service extraction pays for itself. Related: see the 2026 enterprise AI-agent stack guide for how inference-heavy workloads often justify isolation for exactly this reason.
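Quantifying the divergence can be as simple as counting requests per second per domain from your access logs. This sketch assumes the logs have already been parsed into hypothetical `(unix_timestamp, domain)` pairs; it reports each domain's busiest one-second bucket and the ratio between the busiest and quietest domain.

```python
from collections import Counter, defaultdict


def peak_rps_by_domain(events: list[tuple[float, str]]) -> dict[str, int]:
    """Busiest one-second bucket per domain.

    `events` holds (unix_timestamp, domain) pairs, e.g. parsed from access logs.
    """
    per_second = defaultdict(Counter)
    for ts, domain in events:
        per_second[domain][int(ts)] += 1  # bucket by whole second
    return {domain: max(buckets.values()) for domain, buckets in per_second.items()}


def divergence(peaks: dict[str, int]) -> float:
    """How many times busier the busiest domain is than the quietest, at peak."""
    return max(peaks.values()) / min(peaks.values())


events = [(0.1, "recs"), (0.5, "recs"), (0.9, "recs"), (1.2, "recs"), (0.3, "settings")]
peaks = peak_rps_by_domain(events)
print(peaks)              # {'recs': 3, 'settings': 1}
print(divergence(peaks))  # 3.0
```

With the figures in the paragraph above, `divergence({"recommendations": 1000, "user_settings": 5})` returns `200.0`.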

Independent deployment cadences

When separate product teams need to ship multiple times per day without coordination, a single deploy pipeline becomes a bottleneck. This is Amazon's original motivation for microservices — not performance, but organisational velocity. If your analytics team and your billing team are waiting for each other's code to be stable before a release, the architecture is creating a human coordination problem that a service boundary would solve.

Isolated fault domains

A catastrophic bug in a monolith can bring down the whole application. In a microservices architecture, a bug in your notification service does not have to bring down checkout. Circuit breakers, bulkheads and graceful degradation are most effective when a service boundary exists to enforce them. For high-revenue flows (payments, core API, authentication) the isolation guarantee is worth the operational cost.
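For reference, a circuit breaker fits in a few dozen lines of Python. This is a minimal, single-threaded illustration, not a production library: after `max_failures` consecutive failures it fails fast for `reset_after` seconds, then lets calls through again and reopens on the next failure. The `clock` parameter exists only to make it testable, and `send_notification` is a hypothetical stand-in for a call to a failing downstream service.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a dependency that is known to be failing."""


class CircuitBreaker:
    def __init__(self, max_failures: int = 3, reset_after: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_failures = max_failures
        self.reset_after = reset_after
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, fn: Callable[[], T]) -> T:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after:
                # Open: fail fast rather than piling load onto a sick dependency.
                raise CircuitOpenError("circuit open")
            # Cool-down elapsed (half-open): let a trial call through.
        try:
            result = fn()
        except Exception:
            self._failures += 1
            if self._opened_at is not None or self._failures >= self.max_failures:
                self._opened_at = self._clock()  # open, or reopen after a failed trial
            raise
        self._failures = 0
        self._opened_at = None  # any success closes the circuit
        return result


breaker = CircuitBreaker(max_failures=2, reset_after=10.0)


def send_notification() -> str:
    raise ConnectionError("notification service down")


def place_order() -> str:
    try:
        breaker.call(send_notification)
    except (ConnectionError, CircuitOpenError):
        pass  # graceful degradation: checkout succeeds without the notification
    return "order placed"
```

Whether the notification service errors or the breaker is open, `place_order()` still returns `"order placed"`; once the circuit opens, checkout stops waiting on the failing dependency at all.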

Regulatory and data-residency boundaries

A US company serving EU customers under GDPR may genuinely need EU-resident data processed in a separately deployed and audited service, not just a config flag in a shared app. Ditto for PCI DSS scope isolation — cardholder data handled in a separate service with its own network boundary is architecturally cleaner than whittling down a monolith's surface area.

Cost, team size and operational burden

Microservices are not free. Every service you add multiplies the infrastructure and staffing cost. Here is an honest accounting.

Running microservices at scale means running a platform engineering function in parallel with product development. Observability, service mesh, on-call rotations and incident runbooks for each service add up quickly — factor this into your architecture decision before you split the first service.

Infrastructure cost

Each service needs its own compute (container, Lambda or VM), database or database namespace, secrets manager integration, load balancer or API gateway routing rule, log aggregation pipeline entry and health-check monitor. For a ten-service architecture you are running ten of each. On AWS or GCP, a modular monolith often costs 60–80% less in monthly infrastructure than an equivalent microservices mesh handling the same traffic.
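The two figures in this paragraph can be cross-checked with a line of arithmetic. If a monolith costs a fraction s less than the microservices setup, the microservices setup costs 1 / (1 − s) times as much, so the 60–80% range corresponds to roughly 2.5x to 5x, in line with the 2–5x row of the comparison table below. The component list is taken from the paragraph above; the ten-service count is only an example.

```python
# Per-service components listed above; each is provisioned once per service.
PER_SERVICE_COMPONENTS = [
    "compute", "database or namespace", "secrets integration",
    "gateway route", "log pipeline entry", "health-check monitor",
]


def components_to_run(services: int) -> int:
    return services * len(PER_SERVICE_COMPONENTS)


def cost_multiplier(monolith_saving: float) -> float:
    """Microservices cost relative to a monolith that is `monolith_saving` cheaper."""
    return 1 / (1 - monolith_saving)


print(components_to_run(10))  # 60
for saving in (0.6, 0.8):
    print(f"{saving:.0%} cheaper -> microservices cost {cost_multiplier(saving):.1f}x")
```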

Observability cost

A monolith fails with a stack trace in one log stream. A distributed system fails with partial traces spread across five services, two async queues and a caching layer. Distributed tracing (Jaeger, Tempo, AWS X-Ray), structured logging aggregation (Loki, Datadog, CloudWatch) and service health dashboards are mandatory, not optional. Budget a dedicated platform engineer plus $2,000–8,000/month in SaaS tooling for a team running 10+ services.
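The core of cross-service debugging is that every log line carries an ID shared by all services that touched the same request. In practice, OpenTelemetry instrumentation propagates a W3C `traceparent` header for you; this hand-rolled sketch uses an assumed `X-Request-ID` header and Python's standard `logging` and `contextvars` modules just to show the mechanism: structured JSON lines plus ID propagation.

```python
import contextvars
import json
import logging
import uuid

# Correlation ID for the request currently being handled.
request_id = contextvars.ContextVar("request_id", default="-")


class JsonFormatter(logging.Formatter):
    """One JSON object per line so a log aggregator can index each field."""

    def __init__(self, service: str) -> None:
        super().__init__()
        self.service = service

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "service": self.service,
            "level": record.levelname,
            "request_id": request_id.get(),
            "message": record.getMessage(),
        })


log = logging.getLogger("checkout")
_handler = logging.StreamHandler()
_handler.setFormatter(JsonFormatter(service="checkout"))
log.addHandler(_handler)
log.setLevel(logging.INFO)


def handle_request(incoming_headers: dict[str, str]) -> None:
    # Reuse the caller's ID if one was sent, otherwise start a new one.
    request_id.set(incoming_headers.get("X-Request-ID") or str(uuid.uuid4()))
    log.info("checkout started")


def outgoing_headers() -> dict[str, str]:
    # Forward the same ID so the next service's logs can be joined to ours.
    return {"X-Request-ID": request_id.get()}
```

Calling `handle_request({"X-Request-ID": "abc123"})` emits `{"service": "checkout", "level": "INFO", "request_id": "abc123", "message": "checkout started"}`, and any downstream call made with `outgoing_headers()` carries the same ID.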

Team size requirements

Amazon's two-pizza rule (teams of roughly 6–10 engineers) is a useful benchmark for the team that owns a service: small enough to coordinate easily, large enough to maintain the service without constant context switching. Below 20–30 engineers total, microservices usually mean every engineer owns multiple services, which reintroduces exactly the coordination overhead the architecture was supposed to remove. Also see the SaaS churn-reduction playbook for how upstream architectural decisions affect the product reliability that drives retention.

Dimension | Monolith | Modular Monolith | Microservices
Deploy complexity | Low | Low | High
Infra cost (same traffic) | Lowest | Lowest | 2–5x higher
Observability overhead | Minimal | Minimal | Significant
Horizontal scalability | Coarse-grained | Coarse-grained | Fine-grained
Independent team deploys | No | Partial | Yes
Min. team to run well | 3–5 engineers | 5–15 engineers | 20+ engineers
Time to first production deploy | 1–2 weeks | 1–3 weeks | 4–8 weeks

Scaling a web app the right way

Before choosing an architecture based on future scale you do not yet have, apply these scaling levers in order — most applications never need to go further than step three.

1. Vertical scaling and query optimisation

Double your database instance size. Add an index to the slow query. Enable connection pooling (PgBouncer, RDS Proxy). This is free or near-free engineering and commonly buys 5–10x capacity headroom. Most applications that "need microservices for scale" actually need a database index and a Redis cache.

2. Horizontal application scaling

Run multiple instances of your monolith behind a load balancer. Add read replicas for read-heavy workloads. Use a CDN to offload static assets and cached API responses. As a rough order of magnitude, a single Rails or Node.js instance on 4 vCPU handles around 500 requests/second, so eight instances behind a load balancer handle about 4,000, as long as the database keeps up. Most applications reach this ceiling slowly.
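The instance arithmetic generalises to a one-line capacity estimate. Both defaults below are assumptions: 500 requests/second is the rough per-instance figure from the paragraph above, and the 70% target utilisation is an illustrative headroom choice, not a rule. Measure your own per-instance throughput before relying on the result.

```python
import math


def instances_needed(peak_rps: float, rps_per_instance: float = 500,
                     target_utilisation: float = 0.7) -> int:
    """Instances needed so that none runs above `target_utilisation` of capacity.

    The 0.7 default leaves 30% headroom for traffic spikes or losing an instance.
    """
    return math.ceil(peak_rps / (rps_per_instance * target_utilisation))


print(instances_needed(4_000, target_utilisation=1.0))  # 8: no headroom at all
print(instances_needed(4_000))                          # 12 with 30% headroom
```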

3. Caching and async queuing

Redis or Memcached in front of your hot read paths. A background job queue (Sidekiq, Celery, Bull) for anything that does not need to be synchronous: emails, webhooks, report generation, third-party API calls. Moving that work off the request path removes a whole class of slow requests that inflate your p95 latency, without any change to the overall architecture.
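Both halves of this step can be sketched in-process. Assumptions: `TtlCache` stands in for Redis or Memcached used in the cache-aside pattern, and the thread-backed `jobs` queue stands in for Sidekiq, Celery or Bull. Unlike those tools, an in-memory queue loses pending jobs if the process dies, so treat this as an illustration of splitting work off the request path, not a replacement.

```python
import logging
import queue
import threading
import time
from typing import Any, Callable


class TtlCache:
    """In-process stand-in for Redis/Memcached, used cache-aside style."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self._data: dict = {}  # key -> (expires_at, value)

    def get_or_load(self, key: Any, loader: Callable[[], Any]) -> Any:
        entry = self._data.get(key)
        if entry is not None and entry[0] > time.monotonic():
            return entry[1]  # fresh hit: the database is not touched
        value = loader()  # miss or expired: load from the source of truth
        self._data[key] = (time.monotonic() + self.ttl, value)
        return value


# Work that does not need to finish before the HTTP response is sent.
jobs: queue.Queue = queue.Queue()


def _worker() -> None:
    # Plays the role of a Sidekiq/Celery/Bull worker process.
    while True:
        job = jobs.get()
        try:
            job()
        except Exception:
            logging.exception("background job failed")  # real queues also retry
        finally:
            jobs.task_done()


threading.Thread(target=_worker, daemon=True).start()
```

A request handler then calls `jobs.put(...)` with the slow work, for example a hypothetical `send_receipt_email` function, and returns immediately while the worker does the rest.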

4. Extract one service at a time

When a specific hot path is genuinely bottlenecked and the above steps are exhausted, extract that service using the strangler fig pattern. Route new traffic to the service while the monolith handles the rest. Migrate incrementally. Do not attempt a big-bang rewrite from monolith to microservices — the failure rate is high and the cost is enormous.
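The routing half of the strangler fig pattern usually lives in a reverse proxy or API gateway, but the logic is small enough to sketch. In this hypothetical example, `/api/invoices` is the path being extracted, the upstream URLs are placeholders, and a stable hash of the user ID decides which side each user lands on. Raising the percentage moves traffic gradually; setting it to 0 is an instant rollback.

```python
import hashlib

# Placeholder upstreams; in production these would be proxy/gateway targets.
MONOLITH = "http://monolith.internal"
BILLING_SERVICE = "http://billing.internal"

# Path prefixes being strangled out -> percentage of users already moved.
MIGRATED = {"/api/invoices": 25}


def bucket(user_id: str) -> int:
    """Stable 0-99 bucket, so a given user always lands on the same side."""
    return int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100


def upstream_for(path: str, user_id: str) -> str:
    for prefix, percent in MIGRATED.items():
        if path.startswith(prefix) and bucket(user_id) < percent:
            return BILLING_SERVICE
    return MONOLITH  # everything not yet migrated stays on the monolith
```

Hashing the user rather than picking randomly per request keeps each user's experience consistent while both implementations are live.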

Decision matrix

Use this matrix when you are making the architecture call. Score each row for your current situation — not for the company you aspire to be in three years.

Criterion | Choose modular monolith if… | Consider microservices if…
Team size | Under 20 engineers | 20+ engineers across multiple product squads
Traffic pattern | Uniform or low-volume today | Measured, divergent peak loads per domain
Deploy cadence | One or two teams, coordinated releases | Multiple teams, independent cadences required
Compliance isolation | Standard GDPR/SOC 2 manageable in one app | PCI DSS cardholder scope or strict data residency
DevOps maturity | No dedicated platform engineer | Platform/SRE team already in place
Technology mix | One primary language/runtime | Genuine need for mixed runtimes (e.g. ML GPU service)
Fault tolerance requirement | Full downtime acceptable for rare incidents | Partial degradation required (checkout must survive analytics outage)
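If you want to record the matrix alongside an architecture decision record, it reduces to a list of yes/no questions. This sketch uses hypothetical field names, one per row, each set to True when the "Consider microservices if…" column describes you today. It deliberately applies no numeric threshold: zero signals points to a modular monolith, and each positive signal names a concrete candidate for extraction rather than a mandate to split everything.

```python
from dataclasses import dataclass, fields


@dataclass
class Situation:
    """One flag per matrix row: True if the right-hand column describes you today."""

    team_20_plus_across_squads: bool
    measured_divergent_peaks: bool
    independent_deploy_cadences: bool
    pci_or_strict_residency: bool
    platform_team_in_place: bool
    mixed_runtimes_needed: bool
    partial_degradation_required: bool


def microservice_signals(s: Situation) -> list[str]:
    return [f.name for f in fields(s) if getattr(s, f.name)]


def recommendation(s: Situation) -> str:
    signals = microservice_signals(s)
    if not signals:
        return "modular monolith"
    return "consider extracting services for: " + ", ".join(signals)


today = Situation(False, True, False, False, False, True, False)
print(recommendation(today))
```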

FAQ

Should a startup use microservices?

Almost never at the beginning. Microservices multiply operational surface area — service discovery, distributed tracing, independent CI/CD pipelines and a team large enough to own each service. A startup's constraint is shipping product and learning quickly, not infrastructure flexibility. Start with a modular monolith; extract services when a concrete scaling or ownership boundary forces you to.

What is a modular monolith?

A modular monolith is a single deployable application internally divided into well-defined, loosely coupled modules — each owning its own domain logic, database schema namespace and public API surface. It deploys as one process but is structured so that individual modules can be extracted into separate services later with minimal refactoring. It is the sweet spot for most teams under 50 engineers.

When do microservices pay off?

Microservices pay off when: (1) different parts of the system have genuinely different scaling profiles — checkout handles 10x the load of admin; (2) independent teams need to deploy without coordinating releases; (3) a specific service requires a different technology stack, SLA or compliance boundary. If none of those are true today, the overhead of microservices is pure cost.

Are microservices more scalable than a monolith?

Not automatically. A well-tuned monolith on modern cloud infrastructure handles millions of requests per day with horizontal scaling, connection pooling and read replicas. Microservices allow fine-grained scaling of hot paths — but that benefit only materialises when different services genuinely have different traffic shapes. Many teams add microservices complexity before they have the traffic that would justify it.

How big should the team be to run microservices?

Amazon's two-pizza rule is a useful guide: each service should be ownable by a team of 6–10 engineers. In practice you need at least one dedicated DevOps or platform engineer, SRE coverage, observability tooling and enough developers to maintain each service without constant context switching. Below 20–30 engineers total, microservices usually create more coordination overhead than they remove.

Can I migrate from a monolith to microservices later?

Yes — and this is the recommended path. Build a modular monolith with clean domain boundaries from the start, and extracting a service later is a contained effort: move a module's code, split its database tables and add an API contract. The strangler fig pattern — routing traffic incrementally to a new service while the monolith handles the rest — is battle-tested for this migration. Do not design for microservices on day one unless you already have the team and traffic that requires it.
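Here is a minimal sketch of why clean boundaries keep extraction contained: the caller depends on a contract, and extraction swaps the in-process implementation for a remote one. `BillingApi`, the `/invoices/{id}/total` path and the `transport` callable (standing in for whatever HTTP client you use) are all hypothetical. Note that the contract stays the same but the failure modes change: the remote version can time out, which is where retries and circuit breakers come in.

```python
import json
from typing import Callable, Protocol


class BillingApi(Protocol):
    def invoice_total(self, order_id: str) -> int: ...


class InProcessBilling:
    """Before extraction: the billing module lives inside the monolith."""

    def __init__(self, totals: dict[str, int]) -> None:
        self._totals = totals

    def invoice_total(self, order_id: str) -> int:
        return self._totals[order_id]


class RemoteBilling:
    """After extraction: same contract, now backed by a network call.

    `transport` is a placeholder for a real HTTP client: it takes a path and
    returns the response body as a string.
    """

    def __init__(self, transport: Callable[[str], str]) -> None:
        self._transport = transport

    def invoice_total(self, order_id: str) -> int:
        body = self._transport(f"/invoices/{order_id}/total")
        return int(json.loads(body)["total_cents"])


def checkout_summary(billing: BillingApi, order_id: str) -> str:
    # Caller code is identical whichever implementation is injected.
    return f"order {order_id}: {billing.invoice_total(order_id)} cents"
```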

Last updated 4 June 2026. Architecture guidance based on production engagements delivered for US and EU clients between 2022 and 2026. Cost figures are estimates; actual infrastructure spend varies by cloud provider, region and traffic shape.