The SaaS Architecture Mistakes That Will Cost You at 10,000 Users
Your SaaS is growing. Users are signing up, metrics are climbing, and investors are interested. Everything looks great — until it doesn't.
At 10,000 users, things break in ways that weren't visible at 100 or even 1,000. The database query that took 50ms now takes 8 seconds. The background job that processed in 2 minutes now takes 45 minutes and occasionally crashes. The API that served 50 requests per second now gets 500 and starts dropping connections.
These aren't scaling problems. They're architecture problems that were invisible at small scale.
We've audited the architecture of over 30 SaaS platforms preparing for their next growth stage. Here are the seven mistakes we find in almost every one.
Mistake 1: The God Database
What It Looks Like
One PostgreSQL or MySQL instance handles everything — user data, application data, analytics events, session storage, job queues, and full-text search. It's the single point of truth, the single point of query, and the single point of failure.
Why It Breaks at Scale
Databases aren't one-size-fits-all. A query optimized for transactional writes (INSERT a new order) conflicts with a query optimized for analytical reads (aggregate revenue by month for the last 2 years). When both hit the same database, they compete for CPU, memory, I/O, and connections — and the long-running analytical query usually wins, starving the fast transactional queries your users are waiting on.
The Fix
Separate your data by access pattern:
- Transactional data → PostgreSQL/MySQL (row-oriented, optimized for many small reads and writes)
- Analytics and reporting → A read replica or a dedicated analytical store (ClickHouse, BigQuery)
- Session storage → Redis (in-memory, fast expiration)
- Job queues → Redis or a dedicated queue (SQS, BullMQ)
- Full-text search → Elasticsearch or Meilisearch
You don't need all of these from day one. But you need the architecture to support adding them without rewriting your application.
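One way to set this up so the stores stay swappable, sketched in TypeScript under the assumption of a Node stack with the pg and ioredis packages (client names, table names, and the analytics interface are illustrative): each concern gets its own client behind a narrow interface, so moving analytics to ClickHouse or the queue to SQS later is a local change rather than a rewrite.

```typescript
// data-sources.ts: each access pattern gets its own client behind a narrow interface.
// Assumes Node.js with the pg and ioredis packages; names are illustrative.
import { Pool } from "pg";
import Redis from "ioredis";

// Transactional data: orders, users, billing.
export const db = new Pool({ connectionString: process.env.DATABASE_URL });

// Sessions and other hot, expiring data: in-memory store with TTLs.
export const sessions = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Analytics events: today this writes to Postgres, but callers only see this
// narrow interface, so moving to ClickHouse or BigQuery later is a local change.
export interface AnalyticsStore {
  record(event: string, payload: Record<string, unknown>): Promise<void>;
}

export const analytics: AnalyticsStore = {
  async record(event, payload) {
    await db.query(
      "INSERT INTO analytics_events (name, payload) VALUES ($1, $2)",
      [event, JSON.stringify(payload)]
    );
  },
};
```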
Mistake 2: No Multi-Tenancy Strategy
What It Looks Like
Tenant isolation is implemented with a WHERE tenant_id = ? clause on every query. There's no row-level security, no schema separation, and no query-level enforcement.
Why It Breaks at Scale
One missed WHERE clause in one query is a data breach. One tenant running an expensive report slows down every other tenant. One tenant's data growth makes backups and migrations slower for everyone.
The Fix
Choose a multi-tenancy model and enforce it at the infrastructure level:
- Shared database, shared schema: Simplest. Use row-level security (PostgreSQL RLS) to enforce tenant isolation at the database layer, not the application layer.
- Shared database, separate schemas: Each tenant gets their own schema. Better isolation, slightly more operational complexity.
- Separate databases: Maximum isolation. Best for regulated industries (healthcare, finance) where data residency matters.
The right choice depends on your compliance requirements, performance needs, and operational capacity. But "WHERE tenant_id = ?" as your sole isolation mechanism is a ticking time bomb.
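As a concrete illustration of the shared-schema option, here is a minimal sketch of PostgreSQL row-level security with the tenant bound per transaction. It assumes a tenant_id column on each tenant-owned table and the pg package; the table, column, and setting names are illustrative.

```typescript
// Shared database, shared schema, with isolation enforced by PostgreSQL RLS.
// Assumes a tenant_id column on each tenant-owned table; names are illustrative.
import { Pool, PoolClient } from "pg";

const pool = new Pool({ connectionString: process.env.DATABASE_URL });

// One-time migration: the database, not the application, enforces isolation.
export const enableRls = `
  ALTER TABLE orders ENABLE ROW LEVEL SECURITY;
  CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.tenant_id')::uuid);
`;

// Per request: bind the tenant to the transaction, then query normally.
// A forgotten WHERE clause now returns nothing instead of leaking other tenants' rows.
export async function withTenant<T>(
  tenantId: string,
  fn: (client: PoolClient) => Promise<T>
): Promise<T> {
  const client = await pool.connect();
  try {
    await client.query("BEGIN");
    // set_config(..., true) scopes the setting to this transaction only.
    await client.query("SELECT set_config('app.tenant_id', $1, true)", [tenantId]);
    const result = await fn(client);
    await client.query("COMMIT");
    return result;
  } catch (err) {
    await client.query("ROLLBACK");
    throw err;
  } finally {
    client.release();
  }
}
```

One caveat: RLS policies do not apply to the table's owner by default, so the application should connect as a separate, non-owner role (or the table should be marked FORCE ROW LEVEL SECURITY).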
Mistake 3: Synchronous Everything
What It Looks Like
When a user creates an order, the API handler: validates the input → writes to the database → sends a confirmation email → updates the inventory → notifies the warehouse → logs the analytics event → returns the response. All synchronously. All in one HTTP request.
Why It Breaks at Scale
The more steps in a synchronous chain, the higher the failure probability. If the email service is slow (it will be), the entire request is slow. If the inventory service is down (it will be), the entire request fails — even though the order itself was valid.
The Fix
Follow the rule: the API handler should do the minimum work necessary to accept the request, then delegate everything else to background jobs.
The order creation flow becomes:
- Validate input → Write to database → Return 201 Created (this takes 50ms)
- Emit an OrderCreated event
- Background workers handle email, inventory, warehouse notification, and analytics independently
If the email service is down, the order still succeeds. The email gets retried later. Users get a fast response, and the system is resilient to partial failures.
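A minimal sketch of this split using BullMQ (one of the queue options mentioned earlier). The handler and helper functions are illustrative stand-ins, not a prescribed API: the point is that the HTTP request ends after the database write and the enqueue, and everything else retries on its own schedule.

```typescript
// Order creation split into "accept the request" and "everything else", using BullMQ.
// saveOrder and the notification helpers are illustrative stand-ins.
import { Queue, Worker } from "bullmq";
import type { Request, Response } from "express";

const connection = { host: "localhost", port: 6379 };
const orderEvents = new Queue("order-events", { connection });

// API handler: validate, persist, enqueue, respond. Nothing else.
export async function createOrder(req: Request, res: Response) {
  const order = await saveOrder(req.body); // validate + INSERT
  await orderEvents.add(
    "OrderCreated",
    { orderId: order.id },
    { attempts: 5, backoff: { type: "exponential", delay: 1000 } } // retried on failure
  );
  res.status(201).json(order); // fast response, independent of downstream services
}

// Background worker: each side effect runs (and fails) outside the request cycle.
new Worker(
  "order-events",
  async (job) => {
    if (job.name === "OrderCreated") {
      await sendConfirmationEmail(job.data.orderId);
      await updateInventory(job.data.orderId);
      await notifyWarehouse(job.data.orderId);
    }
  },
  { connection, concurrency: 10 }
);

// Stubs standing in for real implementations.
async function saveOrder(input: unknown) { return { id: "ord_123" }; }
async function sendConfirmationEmail(orderId: string) {}
async function updateInventory(orderId: string) {}
async function notifyWarehouse(orderId: string) {}
```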
Mistake 4: No Caching Strategy
What It Looks Like
Every API request hits the database. Even for data that changes once a day (plan limits, feature flags, tenant configuration), the database serves every read.
Why It Breaks at Scale
Databases are fast, but they have finite connection limits and IOPS. At 10,000 users making 10 requests each per session, you're looking at 100,000 database queries per session cycle. If 70% of those queries return the same data they returned 5 minutes ago, you're wasting 70,000 queries.
The Fix
Implement caching in layers:
- Application-level caching: In-memory cache (Redis) for frequently accessed, slowly changing data. User profiles, tenant settings, permission sets — cache these for 5–15 minutes.
- API-level caching: HTTP response caching with proper Cache-Control headers. CDN-cached responses for public endpoints (pricing pages, documentation, marketing content).
- Query-level caching: For expensive database queries (dashboards, reports, aggregations), cache the result set with a TTL and invalidate on write.
The target: 80% of read requests should be served from cache. The database should only handle writes and cache misses.
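A minimal cache-aside sketch for the application-level layer, assuming the ioredis package; the key format, TTL, and helper functions are illustrative.

```typescript
// Cache-aside for slowly changing data, assuming ioredis; keys and TTLs are illustrative.
import Redis from "ioredis";

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

// Read path: a cache hit avoids the database entirely; a miss populates the cache.
export async function getTenantSettings(tenantId: string) {
  const key = `tenant:${tenantId}:settings`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const settings = await loadTenantSettingsFromDb(tenantId);
  await redis.set(key, JSON.stringify(settings), "EX", 600); // 10-minute TTL
  return settings;
}

// Write path: invalidate so changes show up immediately, not after the TTL expires.
export async function updateTenantSettings(tenantId: string, patch: object) {
  await saveTenantSettingsToDb(tenantId, patch);
  await redis.del(`tenant:${tenantId}:settings`);
}

// Stubs standing in for real database access.
async function loadTenantSettingsFromDb(tenantId: string) { return { plan: "pro" }; }
async function saveTenantSettingsToDb(tenantId: string, patch: object) {}
```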
Mistake 5: Authentication as an Afterthought
What It Looks Like
Sessions are stored in the application's memory (not a shared store), JWTs are issued with no expiration or 30-day expiration, there's no token refresh mechanism, and API keys are stored in plaintext in the database.
Why It Breaks at Scale
Memory-based sessions don't work when you have multiple application servers (which you will at 10K users). Long-lived JWTs mean you can't revoke access when a user changes their password or gets deactivated. Plaintext API keys are a breach waiting to happen.
The Fix
- Store sessions in Redis with configurable TTL
- Issue short-lived access tokens (15 minutes) with long-lived refresh tokens (7–30 days)
- Implement token rotation — when a refresh token is used, issue a new one and invalidate the old
- Hash API keys (like passwords) and only display them once at creation time
- Implement rate limiting per user and per API key
Authentication is infrastructure. It should be rock-solid before you have 10,000 users relying on it.
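A sketch of the API-key guidance using Node's built-in crypto module. SHA-256 is enough here because generated keys are high-entropy random strings (unlike passwords, which need a slow hash such as bcrypt); the sk_ prefix and function names are illustrative.

```typescript
// API keys: show the raw key once at creation, store only a hash.
// Uses Node's built-in crypto; the sk_ prefix is illustrative.
import { createHash, randomBytes, timingSafeEqual } from "crypto";

// At creation: return rawKey to the user exactly once, persist only keyHash.
export function generateApiKey() {
  const rawKey = `sk_${randomBytes(32).toString("hex")}`;
  const keyHash = createHash("sha256").update(rawKey).digest("hex");
  return { rawKey, keyHash };
}

// On every request: hash the presented key and compare in constant time.
export function verifyApiKey(presentedKey: string, storedHash: string): boolean {
  const presentedHash = createHash("sha256").update(presentedKey).digest("hex");
  return timingSafeEqual(Buffer.from(presentedHash), Buffer.from(storedHash));
}
```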
Mistake 6: Monolith Without Module Boundaries
What It Looks Like
The codebase is a single deployment unit (fine for this stage), but internally, every module imports from every other module. The billing code imports the notification code, which imports the user code, which imports the billing code. Circular dependencies are everywhere.
Why It Breaks at Scale
When everything depends on everything, you can't change anything safely. A "small" change to the notification system breaks billing because of an implicit dependency. Testing is slow because running one test requires initializing the entire application. And when you eventually need to extract a service (billing is often first), the extraction is a multi-month project because the boundaries don't exist.
The Fix
Keep the monolith, but enforce internal module boundaries:
- Each module exposes a public interface (API) and hides its implementation
- Cross-module communication goes through the public interface, never through direct database queries
- No circular dependencies — use dependency inversion or event-based communication
- Each module owns its own database tables — no shared tables across modules
This is called a "modular monolith," and it gives you the simplicity of a single deployment with the architectural cleanliness needed for future extraction.
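What a module boundary can look like in TypeScript, using billing as the example; the interface and event names are illustrative. Other modules depend on this file only, never on billing's internals or its tables.

```typescript
// billing/index.ts: the module's only public surface.
// Other modules import from here, never from billing's internals or its tables.
// Names are illustrative.
export interface BillingApi {
  getSubscription(tenantId: string): Promise<{ plan: string; status: string }>;
  recordUsage(tenantId: string, units: number): Promise<void>;
}

// Events replace imports in the other direction: notifications subscribes to
// billing events instead of billing importing the notification module,
// which is how the circular dependency gets broken.
export type BillingEvent =
  | { type: "subscription.upgraded"; tenantId: string; plan: string }
  | { type: "payment.failed"; tenantId: string; invoiceId: string };
```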
Mistake 7: No Observability
What It Looks Like
Logging is console.log statements. Monitoring is "check if the website loads." Alerting is "a customer reports a problem."
Why It Breaks at Scale
At 10,000 users, you can't wait for customer reports to discover issues. By the time a customer complains, 100 others have already experienced the problem and silently churned.
The Fix
Implement the three pillars of observability:
- Structured Logging: JSON-formatted logs with request ID, user ID, and trace ID. Every log entry should be searchable and correlatable.
- Metrics: Response times (p50, p95, p99), error rates, queue depths, database connection pool utilization, cache hit rates. Dashboarded and alertable.
- Distributed Tracing: Follow a request through every service it touches. When a user reports "the page is slow," you should be able to find the exact trace and see which service or query caused the latency.
Set up alerts for: error rate > 1%, p99 latency > 2s, database connection pool > 80%, queue depth growing for > 5 minutes. These are the early warning signals that prevent outages.
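A minimal structured-logging sketch, assuming pino and an Express-style middleware (both are assumptions, not part of the original stack); the correlation field names are illustrative.

```typescript
// Structured, correlatable logs with pino in Express-style middleware.
// Both libraries and the field names are assumptions; adapt to your framework.
import pino from "pino";
import { randomUUID } from "crypto";
import type { Request, Response, NextFunction } from "express";

const logger = pino({ level: "info" });

// Attach a request-scoped child logger once; every log line in the request
// then carries requestId/userId/tenantId and can be searched and correlated.
export function requestLogger(req: Request, _res: Response, next: NextFunction) {
  (req as any).log = logger.child({
    requestId: req.headers["x-request-id"] ?? randomUUID(),
    userId: (req as any).user?.id,
    tenantId: (req as any).user?.tenantId,
  });
  next();
}

// In a handler: req.log.info({ orderId, durationMs }, "order created")
// emits one JSON line with all of the correlation fields attached.
```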
The Devoax Approach to SaaS Architecture
When we build SaaS platforms, we design for 10x the current target. Not because you need 10x infrastructure on day one — but because the architecture decisions need to support 10x without a rewrite.
We start with a modular monolith, implement proper multi-tenancy from day one, build with asynchronous processing as the default, and set up observability before the first user signs up.
The cheapest time to make these decisions is before you write the first line of code. The most expensive time is when you're at 10,000 users and your system is falling apart.
Scaling isn't about adding more servers. It's about removing the architectural bottlenecks that prevent servers from doing their job. Fix the architecture, and scaling becomes a configuration change, not a rewrite.