Backend Engineering

Load Testing and Performance Optimization: How We Prepared Our Backend for 100K Concurrent Users

Load testing with k6, identifying bottlenecks like N+1 queries and connection pool exhaustion, and optimizing for high traffic.

By SouvenirList

Three weeks before our biggest product launch, the CEO asked a question that nobody on the engineering team could answer: “How many users can we handle at the same time?” We had been building features for months, but nobody had tested the system under realistic load. So we ran our first load test. The results were sobering — our backend collapsed at 2,000 concurrent users, far below the 50,000 we expected on launch day.

What followed was the most intense three weeks of optimization I have ever done. We found N+1 queries that generated 400 database calls per page load, a connection pool configured for 10 connections serving an application that needed 200, a memory leak in our session middleware that consumed 500MB per hour, and an unindexed database column that turned a 5ms query into a 12-second full table scan under load.

Every one of these issues was invisible during normal development. Our test suite passed. The application worked perfectly with 10 users. Only load testing revealed what would break at scale. This guide covers the load testing methodology and performance optimization techniques that got us from 2,000 to 120,000 concurrent users — and the process I now follow before every major launch.


TL;DR — Performance Optimization Checklist

| Bottleneck | Symptom Under Load | Fix | Impact |
| --- | --- | --- | --- |
| N+1 queries | Database CPU spikes, slow responses | Eager loading, batch queries | 10-100x improvement |
| Missing indexes | Specific queries slow down exponentially | Add targeted indexes | 100-1000x improvement |
| Connection pool exhaustion | Timeouts, “too many connections” errors | Increase pool size, add PgBouncer | Removes ceiling |
| Memory leaks | Gradual slowdown, OOM crashes | Profile and fix allocations | Prevents crashes |
| No caching | Database overloaded on repeated reads | Redis cache layer | 5-50x improvement |
| Synchronous I/O | Thread blocking, low throughput | Async operations, queues | 3-10x improvement |
| Large payloads | High bandwidth, slow transfers | Pagination, compression, field selection | 2-5x improvement |

Setting Up Load Testing with k6

I use k6 for load testing because it is scriptable in JavaScript, produces clear metrics, and handles complex scenarios like authenticated user flows. Here is a basic load test that simulates users browsing a product catalog:

// load-test.js
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate, Trend } from 'k6/metrics';

const errorRate = new Rate('errors');
const responseTime = new Trend('response_time');

export const options = {
  stages: [
    { duration: '2m', target: 100 },   // Ramp up to 100 users
    { duration: '5m', target: 100 },   // Stay at 100 users
    { duration: '2m', target: 500 },   // Ramp up to 500 users
    { duration: '5m', target: 500 },   // Stay at 500 users
    { duration: '2m', target: 1000 },  // Ramp up to 1000 users
    { duration: '5m', target: 1000 },  // Stay at 1000 users
    { duration: '3m', target: 0 },     // Ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500', 'p(99)<1000'],
    errors: ['rate<0.01'],
  },
};

export default function () {
  // Browse product listing
  const listRes = http.get('https://api.example.com/api/v1/products?page=1&limit=20');
  check(listRes, {
    'list status 200': (r) => r.status === 200,
    'list response time < 500ms': (r) => r.timings.duration < 500,
  });
  errorRate.add(listRes.status !== 200);
  responseTime.add(listRes.timings.duration);

  sleep(Math.random() * 3 + 1); // Think time: 1-4 seconds

  // View product detail (guard against non-200 or unparseable bodies)
  const products = listRes.status === 200 ? JSON.parse(listRes.body).data : [];
  if (products && products.length > 0) {
    const randomProduct = products[Math.floor(Math.random() * products.length)];
    const detailRes = http.get(`https://api.example.com/api/v1/products/${randomProduct.id}`);
    check(detailRes, {
      'detail status 200': (r) => r.status === 200,
      'detail response time < 300ms': (r) => r.timings.duration < 300,
    });
    errorRate.add(detailRes.status !== 200);
    responseTime.add(detailRes.timings.duration);
  }

  sleep(Math.random() * 2 + 1);
}

Run with: k6 run load-test.js

Key Metrics to Watch

| Metric | What It Tells You | Healthy Range |
| --- | --- | --- |
| p95 response time | 95% of requests complete within this time | < 500ms |
| p99 response time | Worst-case experience for 1% of users | < 1000ms |
| Error rate | Percentage of failed requests | < 1% |
| Throughput (RPS) | Requests per second the system handles | Depends on use case |
| Active connections | Concurrent connections to the server | Below pool limits |

I focus on p95 and p99 rather than average response time. Averages hide problems — a system with 50ms average might have a p99 of 10 seconds, meaning 1 in 100 users waits 10 seconds. That is a terrible experience that averages mask completely.
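To make the averages-versus-percentiles point concrete, here is a small standalone sketch using the nearest-rank method. (k6 computes percentiles for you; this is purely illustrative.)

```javascript
// Nearest-rank percentile over a sample of response times in ms.
function percentile(samples, p) {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(0, rank - 1)];
}

// 98 fast requests plus two 10-second outliers
const timings = Array.from({ length: 98 }, () => 50).concat([10000, 10000]);
const avg = timings.reduce((a, b) => a + b, 0) / timings.length;

console.log(avg);                      // 249 — the average looks healthy
console.log(percentile(timings, 99));  // 10000 — the p99 tells the truth
```

Two ten-second responses out of a hundred barely move the average, but they dominate the p99 — which is exactly why the tail percentiles are the metrics worth alerting on.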


Testing Authenticated User Flows

Real load tests must simulate authenticated users. Here is how I test an authenticated flow with login, browsing, and checkout:

import http from 'k6/http';
import { check, group, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '5m', target: 500 },
    { duration: '10m', target: 500 },
    { duration: '5m', target: 0 },
  ],
};

// Pre-generate test users
const users = JSON.parse(open('./test-users.json'));

export default function () {
  const user = users[__VU % users.length];
  let authHeaders;

  group('Login', () => {
    const loginRes = http.post(
      'https://api.example.com/api/v1/auth/login',
      JSON.stringify({ email: user.email, password: user.password }),
      { headers: { 'Content-Type': 'application/json' } }
    );
    check(loginRes, { 'login success': (r) => r.status === 200 });

    // k6 has no global "default headers" — build a headers object once
    // and pass it explicitly to every subsequent request
    const token = JSON.parse(loginRes.body).accessToken;
    authHeaders = {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${token}`,
    };
  });

  sleep(2);

  group('Browse Products', () => {
    const res = http.get('https://api.example.com/api/v1/products?limit=20', {
      headers: authHeaders,
    });
    check(res, { 'browse success': (r) => r.status === 200 });
  });

  sleep(3);

  group('Add to Cart', () => {
    const res = http.post(
      'https://api.example.com/api/v1/cart/items',
      JSON.stringify({ productId: 'prod_001', quantity: 1 }),
      { headers: authHeaders }
    );
    check(res, { 'add to cart success': (r) => r.status === 201 });
  });

  sleep(2);

  group('Checkout', () => {
    const res = http.post(
      'https://api.example.com/api/v1/orders',
      JSON.stringify({ paymentMethod: 'test_card' }),
      { headers: authHeaders }
    );
    check(res, { 'checkout success': (r) => r.status === 201 });
  });

  sleep(5);
}

The Bottlenecks We Found (And How We Fixed Them)

Bottleneck 1: N+1 Queries

This was the biggest performance killer. Our product listing endpoint loaded 20 products, then made a separate database query for each product’s category, images, and reviews. That is 1 + 20 + 20 + 20 = 61 queries per page load.

// Before: N+1 queries (61 queries for 20 products)
async function getProducts(page, limit) {
  const products = await db.query(
    'SELECT * FROM products LIMIT $1 OFFSET $2',
    [limit, (page - 1) * limit]
  );

  // N+1: one query per product for each relationship
  for (const product of products) {
    product.category = await db.query(
      'SELECT * FROM categories WHERE id = $1',
      [product.category_id]
    );
    product.images = await db.query(
      'SELECT * FROM product_images WHERE product_id = $1',
      [product.id]
    );
    product.reviews = await db.query(
      'SELECT * FROM reviews WHERE product_id = $1 LIMIT 5',
      [product.id]
    );
  }

  return products;
}

// After: 1 query with JOINs and aggregation
async function getProducts(page, limit) {
  return db.query(`
    SELECT 
      p.*,
      c.name as category_name,
      COALESCE(json_agg(DISTINCT pi.*) FILTER (WHERE pi.id IS NOT NULL), '[]') as images,
      COALESCE(
        json_agg(DISTINCT jsonb_build_object(
          'id', r.id, 'rating', r.rating, 'text', r.text
        )) FILTER (WHERE r.id IS NOT NULL), '[]'
      ) as reviews
    FROM products p
    LEFT JOIN categories c ON c.id = p.category_id
    LEFT JOIN product_images pi ON pi.product_id = p.id
    LEFT JOIN LATERAL (
      SELECT * FROM reviews WHERE product_id = p.id 
      ORDER BY created_at DESC LIMIT 5
    ) r ON true
    GROUP BY p.id, c.name
    ORDER BY p.created_at DESC
    LIMIT $1 OFFSET $2
  `, [limit, (page - 1) * limit]);
}

This single change reduced the product listing response time from 800ms to 45ms under load. From 61 queries to 1.
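A giant JOIN with aggregation is not the only fix. When the SQL gets unwieldy, batching each relationship into a single `ANY($1)` query works almost as well — 4 queries instead of 61. A sketch for the images relationship, assuming the same `db.query` helper that returns rows:

```javascript
// Batched version: 1 query for the products plus 1 query per
// relationship, instead of 1 query per product per relationship.
async function getProductsBatched(db, page, limit) {
  const products = await db.query(
    'SELECT * FROM products LIMIT $1 OFFSET $2',
    [limit, (page - 1) * limit]
  );
  const ids = products.map((p) => p.id);

  // One round trip fetches the images for every product on the page
  const images = await db.query(
    'SELECT * FROM product_images WHERE product_id = ANY($1)',
    [ids]
  );

  // Group child rows by parent id in memory
  const byProduct = new Map();
  for (const img of images) {
    if (!byProduct.has(img.product_id)) byProduct.set(img.product_id, []);
    byProduct.get(img.product_id).push(img);
  }
  for (const p of products) {
    p.images = byProduct.get(p.id) ?? [];
  }
  return products;
}
```

This is also the strategy ORM "eager loading" features (and dataloader-style batching) use under the hood.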

Bottleneck 2: Connection Pool Exhaustion

Our Node.js application used the default pg pool configuration: 10 connections. With 500 concurrent users, every request waited for a free connection. The wait time was the primary cause of our timeout errors.

// Before: default pool (10 connections)
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
});

// After: properly sized pool with PgBouncer
const pool = new Pool({
  connectionString: process.env.DATABASE_URL,
  max: 20,               // Connections per Node.js process
  idleTimeoutMillis: 30000,
  connectionTimeoutMillis: 5000,
});

The rule of thumb for pool sizing: connections = (2 * CPU cores) + disk spindles. For a server with 4 cores and SSD storage, that works out to roughly 9-10 connections per process; we sized ours at 20 to leave headroom for traffic spikes. With 4 Node.js processes behind a load balancer, that is 40-80 total connections to PostgreSQL.

For higher concurrency, I add PgBouncer as a connection pooler between the application and PostgreSQL. PgBouncer can handle thousands of client connections with only 50-100 actual PostgreSQL connections, multiplexing them in transaction mode.
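A minimal pgbouncer.ini for transaction pooling might look like this (hostnames and sizes are illustrative, and a real deployment also needs auth configuration):

```ini
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
pool_mode = transaction     ; multiplex many clients over few server conns
max_client_conn = 5000      ; ceiling on application-side connections
default_pool_size = 50      ; actual PostgreSQL connections per db/user pair
```

One caveat with transaction mode: session-level features like prepared statements and advisory locks do not work across pooled transactions, so check your driver settings before switching it on.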

Bottleneck 3: Missing Database Indexes

Our search endpoint used a LIKE query on the product name column — without an index. At 1,000 products, the query took 5ms. At 100,000 products, it took 12 seconds under concurrent load because every search triggered a full table scan.

-- The problematic query
SELECT * FROM products WHERE name ILIKE '%wireless headphones%';

-- Adding a trigram index for pattern matching
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_products_name_trgm ON products USING gin (name gin_trgm_ops);

-- Query time: 12 seconds → 8 milliseconds

I now run EXPLAIN ANALYZE on every query that appears in a load test’s slow query log. The PostgreSQL query planner tells you exactly whether a query uses an index or does a sequential scan:

EXPLAIN ANALYZE SELECT * FROM products WHERE name ILIKE '%wireless%';

-- Bad output (sequential scan):
-- Seq Scan on products  (cost=0.00..2847.00 rows=50 width=256) (actual time=0.041..12340.123 rows=50 loops=1)

-- Good output (index scan):
-- Bitmap Index Scan on idx_products_name_trgm  (cost=0.00..12.00 rows=50 width=256) (actual time=0.018..8.234 rows=50 loops=1)

Bottleneck 4: Memory Leak

Our response time slowly degraded over hours of sustained load. The server started at 200MB memory usage and climbed to 2GB before crashing with an OOM (Out of Memory) error. The cause: a middleware that stored every request’s body in an array for debugging purposes — and never cleared it.

// The leak (simplified)
const requestLog = [];

app.use((req, res, next) => {
  requestLog.push({
    method: req.method,
    path: req.path,
    body: req.body,
    timestamp: Date.now(),
  });
  next();
});
// requestLog grows forever — classic memory leak

I found this using Node.js heap snapshots:

// Take heap snapshots for comparison
const v8 = require('v8');

app.get('/debug/heap', (req, res) => {
  // writeHeapSnapshot returns the path of the file it wrote
  const snapshotPath = v8.writeHeapSnapshot(`/tmp/heap-${Date.now()}.heapsnapshot`);
  res.json({ path: snapshotPath });
});

Taking two snapshots 10 minutes apart and comparing them in Chrome DevTools immediately showed the growing array. After removing the debug middleware, memory usage stayed flat at 250MB regardless of how long the server ran.
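If you genuinely need an in-process request log, the fix is to bound it. A minimal sketch:

```javascript
// Bounded request log: keeps only the most recent N entries, so memory
// usage stays constant no matter how long the process runs.
const MAX_LOG_ENTRIES = 1000;
const requestLog = [];

function logRequest(entry) {
  requestLog.push(entry);
  if (requestLog.length > MAX_LOG_ENTRIES) {
    requestLog.shift(); // drop the oldest entry
  }
}
```

In our case the better fix was deleting the middleware entirely and shipping request logs to stdout, where the log infrastructure handles retention.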


Performance Optimization Techniques

Response Compression

Enabling gzip compression reduced our average response payload by 70%, which directly improved response times for clients on slow connections.

const compression = require('compression');

app.use(compression({
  threshold: 1024,  // Only compress responses > 1KB
  level: 6,         // Balanced compression level
}));

Field Selection

Let clients request only the fields they need. Our mobile app only displayed product name, price, and thumbnail — but the API returned the full product object including descriptions, specifications, and reviews (8KB per product).

// GET /api/v1/products?fields=id,name,price,thumbnail
app.get('/api/v1/products', async (req, res) => {
  const allowedFields = ['id', 'name', 'price', 'thumbnail', 'category', 'rating'];
  const requestedFields = req.query.fields?.split(',').filter(f => allowedFields.includes(f));

  // Safe to interpolate: only allowlisted column names can reach the query
  const selectClause = requestedFields?.length
    ? requestedFields.join(', ')
    : '*';

  const products = await db.query(`SELECT ${selectClause} FROM products LIMIT 20`);
  res.json({ data: products });
});

This reduced the product listing payload from 160KB to 12KB — a 92% reduction.

Database Query Optimization

Beyond indexes, these query optimizations made significant differences:

-- Use EXISTS for existence checks
-- Bad: fetches every matching row just to check whether any exist
SELECT * FROM orders WHERE user_id = $1 AND status = 'pending';

-- Good: the planner can stop at the first match
SELECT EXISTS(SELECT 1 FROM orders WHERE user_id = $1 AND status = 'pending');

-- Use covering indexes for common queries
-- The index itself contains all needed data — no table lookup required
CREATE INDEX idx_products_list ON products(category_id, created_at DESC) 
  INCLUDE (id, name, price, thumbnail);

-- Avoid SELECT * in production code
-- Bad: fetches all columns including large text fields
SELECT * FROM products WHERE id = $1;

-- Good: fetch only what you need
SELECT id, name, price, thumbnail, category_id FROM products WHERE id = $1;
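Caching Hot Reads

The checklist at the top lists a Redis cache layer as a 5-50x win for repeated reads. The usual shape is cache-aside: check Redis, fall back to the database on a miss, then populate the cache with a TTL. A sketch — `redis` is assumed to be an ioredis-style client and `loadProduct` a hypothetical database helper:

```javascript
// Cache-aside read: Redis first, database on a miss, write-back with TTL.
async function getProductCached(redis, loadProduct, productId, ttlSeconds = 60) {
  const key = `product:${productId}`;

  const cached = await redis.get(key);
  if (cached) return JSON.parse(cached);

  const product = await loadProduct(productId);
  if (product) {
    // 'EX' sets the expiry in seconds (ioredis-style SET arguments)
    await redis.set(key, JSON.stringify(product), 'EX', ttlSeconds);
  }
  return product;
}
```

The TTL matters more than it looks: expiry is what keeps the cache from serving stale data forever, and a short TTL (30-60 seconds) is usually enough to absorb a traffic spike on a hot product page.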

Load Testing Best Practices

Test Against Production-Like Data

Our initial load tests used a database with 100 products. Production had 500,000. The performance characteristics were completely different — queries that were fast on 100 rows became bottlenecks at 500,000. I now seed load test environments with production-scale data (anonymized) to catch these issues early.
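When anonymized production data is not available, synthetic volume still beats a near-empty table. A quick PostgreSQL seed using generate_series (column names are illustrative — adapt to your schema):

```sql
-- Seed 500,000 synthetic products so queries hit realistic row counts
INSERT INTO products (name, price, category_id, created_at)
SELECT
  'Product ' || n,
  (random() * 500)::numeric(10, 2),
  1 + (n % 20),                        -- spread across 20 categories
  now() - (n || ' seconds')::interval  -- spread created_at over time
FROM generate_series(1, 500000) AS n;
```

Synthetic data will not reproduce production's value distribution (skewed categories, long-tail search terms), so treat it as a floor, not a substitute for anonymized real data.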

Simulate Realistic User Behavior

Real users do not hammer endpoints as fast as possible. They browse, read, think, and then act. Include think time (random delays between requests) and realistic user flows (browse → view details → add to cart → checkout) in your load tests. Without think time, you test the system under unrealistic worst-case conditions.

Run Soak Tests

A spike test might pass, but will the system survive sustained load for hours? Memory leaks, connection leaks, and file descriptor exhaustion only appear over time. I run soak tests (constant moderate load for 4-8 hours) before every major launch.

export const options = {
  stages: [
    { duration: '5m', target: 200 },    // Ramp up
    { duration: '8h', target: 200 },    // Sustained load
    { duration: '5m', target: 0 },       // Ramp down
  ],
};

Monitor Everything During the Test

During load tests, I monitor: application response times and error rates (k6 output), database query performance (pg_stat_statements), CPU, memory, and network usage (system metrics), connection pool utilization, and Redis hit/miss ratios. The load test itself tells you what is slow. The monitoring tells you why.
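For the database side, pg_stat_statements (it must be enabled in shared_preload_libraries) shows where query time actually goes during a test:

```sql
-- Top 10 queries by total time spent, with per-call average.
-- Note: the column is total_exec_time on PostgreSQL 13+,
-- total_time on older versions.
SELECT
  calls,
  round(total_exec_time::numeric, 1) AS total_ms,
  round((total_exec_time / calls)::numeric, 2) AS avg_ms,
  left(query, 80) AS query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```

Sorting by total time rather than average surfaces the cheap-but-constant queries that dominate load, not just the occasional slow outlier.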


Frequently Asked Questions

How Many Concurrent Users Should I Test For?

Test for 2-3x your expected peak traffic. If you expect 10,000 concurrent users at launch, test for 25,000-30,000. This margin accounts for unexpected viral traffic, bot activity, and the general principle that launch-day traffic is always higher than projections. If your system handles 3x expected load comfortably, you can launch with confidence.

What Is a Good Response Time for an API?

For user-facing APIs: p95 under 500ms and p99 under 1 second. For internal service-to-service APIs: p95 under 100ms. These are general guidelines — latency-sensitive applications (real-time gaming, financial trading) need tighter targets, while batch-processing APIs can tolerate more. The key metric is the p95/p99, not the average.

Should I Load Test in Production?

Yes, but carefully. Load testing in a staging environment catches most issues, but subtle differences (different hardware, different data volume, different network topology) mean staging results do not perfectly predict production behavior. I run production load tests during low-traffic windows with careful monitoring and an abort threshold. Start at 10% of expected load and increase gradually.
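k6 can enforce the abort threshold for you: a threshold object with `abortOnFail` stops the whole test as soon as the error rate crosses the line, which is exactly the safety net you want when the target is production. A sketch:

```javascript
export const options = {
  stages: [
    { duration: '5m', target: 100 },    // start around 10% of expected load
    { duration: '10m', target: 1000 },  // then ramp up gradually
  ],
  thresholds: {
    // Abort the entire run if more than 5% of requests fail
    http_req_failed: [{ threshold: 'rate<0.05', abortOnFail: true }],
  },
};
```

Pair this with an on-call engineer watching dashboards — the automated abort is a backstop, not a replacement for a human who can kill the test early.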

How Do I Find Memory Leaks in Node.js?

Take two heap snapshots 10 minutes apart using v8.writeHeapSnapshot(), load them in Chrome DevTools (Memory tab), and compare. Objects that grew significantly between snapshots are your leak candidates. Common Node.js leaks include: unbounded arrays or maps used for logging, event listeners that are added but never removed, and closures that capture large objects.

What Is the Difference Between Load Testing and Stress Testing?

Load testing verifies that your system performs acceptably under expected traffic levels. Stress testing pushes beyond expected levels to find the breaking point and verify that the system degrades gracefully rather than catastrophically. I always do both — load testing confirms we can handle launch day, and stress testing tells us what happens if we get 5x more traffic than expected.


The Bottom Line

Load testing is the only reliable way to predict how your backend will behave under real-world traffic. Every performance issue we found during our three-week optimization sprint was invisible during normal development and testing. N+1 queries, connection pool exhaustion, missing indexes, and memory leaks all performed fine with a handful of users but collapsed under load.

Start load testing early — not three weeks before launch. Include it in your CI/CD pipeline as a periodic check. Use production-scale data, simulate realistic user behavior, and monitor your system from every angle during the test. The goal is not just to find the breaking point, but to understand why it breaks and fix it before your users find it for you.


Tags: load testing, performance optimization, k6, backend, scalability, system design
