Backend Engineering

Monolith to Microservices: My Real Migration Story and What I Would Do Differently

Practical guide to migrating from monolith to microservices with strangler fig pattern, domain boundaries, and real lessons.

By SouvenirList

Two years ago, I led a migration from a Django monolith to microservices. The monolith had served us well for three years, handling everything from user authentication to payment processing to inventory management. But as the team grew from 4 to 15 developers, deployments became a bottleneck. A one-line CSS fix required deploying the entire application, including the payment processing code. Every deployment was a 45-minute process with a mandatory rollback plan.

The migration took eleven months — five months longer than we estimated. We made mistakes that cost us weeks of rework. We also made decisions that saved us months of pain. This is the honest account of what happened, what worked, what did not, and what I would do differently if I started over today.


TL;DR — Migration Decisions at a Glance

Decision           | What We Did                   | What I Would Do Now
Migration strategy | Big bang rewrite              | Strangler fig (gradual)
Service boundaries | Based on database tables      | Based on business domains
Communication      | REST everywhere               | REST for sync, events for async
Data management    | Shared database initially     | Database per service from day one
Timeline estimate  | 6 months                      | Add 80% buffer to any estimate
Team structure     | Feature teams across services | One team per service

When a Monolith Is Actually Fine

Before I talk about migration, I need to say something that took me too long to learn: most applications should stay as monoliths. The pain points that drove our migration were real, but they were also solvable without microservices. We could have modularized the monolith, separated the deployment pipeline, or adopted a modular monolith architecture.

Signs You Actually Need Microservices

  • Team scaling problems: Multiple teams cannot deploy independently because they share a codebase
  • Scaling mismatches: One part of your system needs 10x more compute than the rest, and you are scaling everything together
  • Technology constraints: A specific component would benefit from a different language or runtime
  • Deployment frequency: You need to deploy one component hourly but can only deploy the monolith weekly

Signs You Should Stay with a Monolith

  • Your team has fewer than 8-10 developers
  • Your deployment pipeline takes less than 15 minutes
  • Your scaling needs are uniform across the application
  • You do not have operational experience with distributed systems

We had legitimate reasons to migrate. But I have since seen teams migrate to microservices because it seemed modern, only to spend the next two years rebuilding the operational capabilities they lost — things like transactions, debugging, and deployment simplicity that monoliths give you for free.


Phase 1: Understanding What You Have

The first thing I did — and the most valuable — was map the monolith. Not the code structure, but the business domains and their relationships. I spent two weeks drawing diagrams, interviewing team members, and tracing request flows through the codebase.

Domain Mapping

Our monolith had these major domains:

┌──────────────────────────────────────────────────┐
│                     Monolith                     │
│                                                  │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  │
│  │    User    │──│   Order    │──│ Inventory  │  │
│  │    Mgmt    │  │ Processing │  │            │  │
│  └────────────┘  └────────────┘  └────────────┘  │
│         │               │               │        │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  │
│  │  Payment   │──│  Shipping  │──│Notification│  │
│  │            │  │            │  │            │  │
│  └────────────┘  └────────────┘  └────────────┘  │
└──────────────────────────────────────────────────┘

Finding the Boundaries

The hardest part was not identifying the domains — it was finding where they actually separated. In the codebase, order processing directly queried the user table, payment processing updated the order status, and notification logic was scattered across every module.

I used a technique I call dependency counting: for every module, I counted how many other modules it directly called or was called by. Modules with few dependencies were candidates for early extraction. Modules entangled with everything else needed to be refactored before extraction.

Module            | Dependencies In | Dependencies Out | Extraction Difficulty
User Management   | 5               | 1                | Hard (many depend on it)
Notifications     | 0               | 4                | Easy (depends on others, nothing depends on it)
Inventory         | 2               | 1                | Medium
Payment           | 1               | 2                | Medium
Order Processing  | 2               | 4                | Hard (core orchestrator)
Shipping          | 1               | 2                | Medium

Notifications had zero inbound dependencies — nothing else in the system called notification functions directly. It was the obvious first extraction candidate.
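The dependency count is easy to compute once you have a call graph. Here is a minimal sketch using a small hypothetical module graph, not our actual codebase, where each key lists the modules it calls directly:

```javascript
// Illustrative call graph (not our real modules): key -> modules it calls.
const calls = {
  notifications: ['users', 'orders'],
  users: [],
  orders: ['users', 'inventory'],
  inventory: ['users'],
};

// Compute in-degree and out-degree for every module.
function dependencyCounts(graph) {
  const counts = {};
  for (const mod of Object.keys(graph)) {
    counts[mod] = { out: graph[mod].length, in: 0 };
  }
  for (const targets of Object.values(graph)) {
    for (const t of targets) counts[t].in += 1;
  }
  return counts;
}

// Modules with zero inbound dependencies are the easiest extraction candidates.
const counts = dependencyCounts(calls);
const easiest = Object.keys(counts).filter((m) => counts[m].in === 0);
```

In this toy graph, `notifications` is the only module nothing else calls, which mirrors why it was our first extraction.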


Phase 2: The Strangler Fig Pattern

After our initial (failed) attempt at a big bang rewrite — where we tried to rebuild everything from scratch in parallel — we switched to the strangler fig pattern. Named after the fig vines that gradually envelop and replace a host tree, this approach extracts one capability at a time from the monolith.

How It Works

Step 1: Route all traffic through an API gateway
Step 2: Extract one service from the monolith
Step 3: Route that service's traffic to the new service
Step 4: Remove the old code from the monolith
Step 5: Repeat for the next service

The API gateway was critical. It gave us a single place to control routing, so we could shift traffic to new services gradually — starting with 5% of requests to verify correctness before ramping to 100%.

Our Extraction Order

  1. Notifications (weeks 1-3) — No inbound dependencies, low risk
  2. User Management (weeks 4-8) — High dependency count, but well-defined API surface
  3. Inventory (weeks 9-12) — Medium complexity, clear domain boundary
  4. Payment (weeks 13-18) — High risk, required careful transaction handling
  5. Shipping (weeks 19-22) — Medium complexity, dependent on order data
  6. Order Processing (weeks 23-30) — The core, extracted last after everything else was stable

Each extraction followed the same process:

1. Define the service API (OpenAPI spec)
2. Build the new service
3. Run both old and new code in parallel (shadow mode)
4. Compare outputs for correctness
5. Gradually shift traffic
6. Remove old code from monolith
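Steps 3 and 4 (shadow mode plus output comparison) can be sketched as a wrapper around hypothetical oldImpl/newImpl functions. The caller always receives the old result; the new implementation is only compared and reported on:

```javascript
// Shadow-mode wrapper: run both implementations, report any divergence,
// and always return the old result so callers are unaffected.
async function shadowCall(oldImpl, newImpl, args, report) {
  const oldResult = await oldImpl(...args);
  try {
    const newResult = await newImpl(...args);
    if (JSON.stringify(oldResult) !== JSON.stringify(newResult)) {
      report({ args, oldResult, newResult }); // mismatch: investigate before shifting traffic
    }
  } catch (err) {
    report({ args, error: err.message }); // new service failed; old path unaffected
  }
  return oldResult; // callers never see the new implementation's output
}
```

In practice you would also sample (shadowing every request doubles load) and strip non-deterministic fields like timestamps before comparing.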

Phase 3: The Database Problem

This is where we made our biggest mistake. We initially kept all microservices pointing at the same PostgreSQL database. It was the path of least resistance — no data migration needed, all existing queries still worked, and we could ship faster.

It was also a ticking time bomb.

Why Shared Databases Are Dangerous

With a shared database, services are coupled at the data level even if they are separated at the code level. When the inventory team wanted to change the products table schema, they had to coordinate with the order team, the shipping team, and anyone else who queried that table. We had recreated the deployment coupling we were trying to escape, just at a different layer.

# What we had (bad)
┌───────────┐  ┌───────────┐  ┌───────────┐
│  Orders   │  │ Inventory │  │  Payment  │
└─────┬─────┘  └─────┬─────┘  └─────┬─────┘
      │              │              │
      └──────────────┼──────────────┘

            ┌────────┴────────┐
            │   Shared DB     │
            └─────────────────┘

# What we migrated to (good)
┌───────────┐  ┌───────────┐  ┌───────────┐
│  Orders   │  │ Inventory │  │  Payment  │
└─────┬─────┘  └─────┬─────┘  └─────┬─────┘
      │              │              │
┌─────┴─────┐  ┌─────┴──────┐  ┌────┴──────┐
│ Orders DB │  │Inventory DB│  │Payment DB │
└───────────┘  └────────────┘  └───────────┘

How We Split the Database

We used the database-per-service pattern, migrating one service’s data at a time:

  1. Create the new database for the service
  2. Set up data synchronization (CDC with Debezium)
  3. Migrate reads to the new database
  4. Migrate writes to the new database
  5. Remove synchronization and drop old tables

The data synchronization phase was nerve-wracking. For two weeks, every change landed in both places — the monolith wrote to the old database while Debezium replicated each change to the new one. We ran continuous comparison queries to verify data consistency. When we finally cut over, the difference was zero rows. I have never been more relieved.
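A comparison check of this kind boils down to diffing two result sets by primary key. Our real checks ran as SQL against both databases; this is a simplified in-memory version of the same logic:

```javascript
// Row-level consistency check between old and new databases. Rows are
// assumed to be plain objects already fetched from each side; `id` is
// the primary key (field name is illustrative).
function diffRows(oldRows, newRows) {
  const byId = new Map(newRows.map((r) => [r.id, r]));
  const diff = { missing: [], mismatched: [], extra: 0 };
  for (const row of oldRows) {
    const other = byId.get(row.id);
    if (!other) diff.missing.push(row.id);
    else if (JSON.stringify(row) !== JSON.stringify(other)) diff.mismatched.push(row.id);
    byId.delete(row.id);
  }
  diff.extra = byId.size; // rows present only in the new database
  return diff;
}
```

A cutover is safe when `missing`, `mismatched`, and `extra` are all empty for several consecutive runs while writes continue.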


Phase 4: Inter-Service Communication

Our initial approach was REST for everything. Service A needed data from Service B? HTTP GET request. Service A needed to trigger an action in Service B? HTTP POST request. This worked until we had a chain of synchronous calls five services deep, and a timeout in the last service cascaded failures all the way back to the user.

Synchronous vs. Asynchronous Communication

Pattern               | Use When                                                              | Example
REST (sync)           | Client needs an immediate response                                    | GET /users/123
Events (async)        | Action triggers downstream work that does not need immediate feedback | OrderPlaced → Send email, Update inventory
Request/Reply (async) | Need a response but can tolerate latency                              | Process payment, get result via callback

The rule I follow now: if the caller does not need the result to complete its response, use asynchronous communication. When a user places an order, they need confirmation that the order was received. They do not need to wait for the email to send, the inventory to update, or the shipping label to generate.

// Bad — synchronous chain
app.post('/api/orders', async (req, res) => {
  const order = await orderService.create(req.body);
  await inventoryService.reserve(order.items);     // sync call
  await paymentService.charge(order.total);         // sync call
  await notificationService.sendConfirmation(order); // sync call
  await shippingService.createLabel(order);          // sync call
  res.json(order);
});

// Good — async events for non-critical path
app.post('/api/orders', async (req, res) => {
  const order = await orderService.create(req.body);
  await inventoryService.reserve(order.items);  // sync — must succeed
  await paymentService.charge(order.total);     // sync — must succeed

  // Async — happens in background
  await eventBus.publish('order.created', { orderId: order.id });
  // notification and shipping services react to this event independently

  res.status(201).json(order);
});

Event-Driven Architecture

We adopted RabbitMQ for event-based communication between services. Each service publishes events when something important happens in its domain, and other services subscribe to the events they care about.

// Order service — publishes events
async function createOrder(data) {
  const order = await db.query(
    'INSERT INTO orders (...) VALUES (...) RETURNING *',
    [...]
  );

  await rabbitMQ.publish('order.events', 'order.created', {
    orderId: order.id,
    userId: order.userId,
    items: order.items,
    total: order.total,
    createdAt: new Date().toISOString(),
  });

  return order;
}

// Notification service — subscribes to events
rabbitMQ.subscribe('order.events', 'order.created', async (event) => {
  const user = await userService.getById(event.userId);
  await emailService.send({
    to: user.email,
    template: 'order-confirmation',
    data: { orderId: event.orderId, total: event.total },
  });
});
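The rabbitMQ object in these snippets is a thin wrapper around the client library. A minimal in-memory stand-in makes the assumed contract concrete; a real broker adds durable queues, acknowledgements, retries, and asynchronous delivery, and routing keys support wildcards rather than the exact-match shown here:

```javascript
// In-memory stand-in for the rabbitMQ wrapper used above (illustrative).
// Only the publish/subscribe contract is modeled, with exact-match keys.
function createEventBus() {
  const handlers = new Map(); // "exchange/routingKey" -> [handler, ...]
  return {
    subscribe(exchange, routingKey, handler) {
      const key = `${exchange}/${routingKey}`;
      if (!handlers.has(key)) handlers.set(key, []);
      handlers.get(key).push(handler);
    },
    async publish(exchange, routingKey, event) {
      const key = `${exchange}/${routingKey}`;
      for (const handler of handlers.get(key) ?? []) {
        await handler(event); // a real broker delivers asynchronously
      }
    },
  };
}
```

A stand-in like this is also handy in unit tests: services can be exercised against it without a running broker.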

What Went Wrong (And What I Learned)

Mistake 1: No API Gateway from Day One

We initially had services calling each other directly. When we needed to add authentication, rate limiting, or request logging, we had to add it to every single service. An API gateway would have given us a single place for cross-cutting concerns.

Mistake 2: Inconsistent Error Handling

Each team built their service’s error handling independently. One service returned { error: "message" }, another returned { errors: [{ code: "...", detail: "..." }] }, and a third returned plain text. Client code needed special handling for each service. We eventually standardized, but it cost us weeks of refactoring.
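Standardizing is cheap once you commit to it: one envelope-building function plus a single Express-style error middleware covers every service. The field names below are illustrative, not the exact format we shipped:

```javascript
// One possible standardized error envelope. The exact shape matters less
// than every service using the same one.
function errorResponse(code, detail, meta = {}) {
  return {
    errors: [{ code, detail, ...meta }],
    timestamp: new Date().toISOString(),
  };
}

// Express-style error middleware (4-argument signature) applying the
// envelope everywhere. `err.status` and `err.code` are team conventions,
// not Express built-ins.
function errorHandler(err, req, res, next) {
  const status = err.status ?? 500;
  res.status(status).json(errorResponse(err.code ?? 'INTERNAL', err.message));
}
```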

Mistake 3: Underestimating Operational Complexity

In a monolith, “debugging” means reading one log file. In microservices, a single user request can touch six services, and the bug might be in the interaction between services, not in any single one. We needed distributed tracing (OpenTelemetry), centralized logging (ELK stack), and service-level dashboards — none of which we had at launch.

Mistake 4: Splitting Too Small

We initially created a “user preferences” microservice separate from the “user profile” microservice. They were so tightly coupled that every feature required changes to both services, coordinated deployments, and cross-service API calls for simple operations. We merged them back within three months.


The Final Architecture

After eleven months, our architecture looked like this:

                    ┌──────────────┐
                    │  API Gateway │
                    │  (Kong)      │
                    └──────┬───────┘

        ┌──────────────────┼──────────────────┐
        │                  │                  │
┌───────┴──────┐  ┌────────┴───────┐  ┌───────┴──────┐
│    User      │  │    Order       │  │   Inventory  │
│   Service    │  │   Service      │  │   Service    │
│  (Node.js)   │  │   (Node.js)    │  │   (Python)   │
└───────┬──────┘  └────────┬───────┘  └───────┬──────┘
        │                  │                  │
┌───────┴──────┐  ┌────────┴───────┐  ┌───────┴──────┐
│  PostgreSQL  │  │  PostgreSQL    │  │  PostgreSQL  │
└──────────────┘  └────────────────┘  └──────────────┘
        │                  │                  │
        └──────────────────┼──────────────────┘

                    ┌──────┴───────┐
                    │  RabbitMQ    │
                    │  (Events)    │
                    └──────────────┘

Each service owned its data, communicated through events for async workflows, and used REST for synchronous queries. The API gateway handled authentication, rate limiting, and request routing.


Frequently Asked Questions

How Long Does a Monolith to Microservices Migration Take?

Based on my experience and conversations with other teams, expect 6-18 months for a medium-sized application (10-20 database tables, 4-8 major business domains). The actual coding is usually 40% of the effort. The remaining 60% is data migration, testing, operational setup (monitoring, logging, deployment pipelines), and fixing unexpected integration issues. Whatever your initial estimate, add at least 50%.

Should I Rewrite from Scratch or Migrate Incrementally?

Always migrate incrementally using the strangler fig pattern. We tried the rewrite approach first and abandoned it after two months. The rewrite was perpetually “almost done” while the monolith kept receiving features that the rewrite had to catch up with. The strangler fig approach let us extract one service at a time while the monolith continued to serve production traffic.

What Is the Biggest Risk in Microservices Migration?

Data consistency across services. In a monolith, a database transaction guarantees that an order is created and inventory is decremented atomically. In microservices, you need distributed transactions or eventual consistency, both of which are significantly more complex. If your business requires strong consistency (financial transactions, inventory management), plan your data strategy carefully before splitting services.
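Eventual consistency across services is commonly implemented as a saga: run the steps in order and, on failure, run compensating actions for the completed steps in reverse. A minimal runner, with illustrative step names in the usage below:

```javascript
// Minimal saga runner: each step has a run() and a compensate() action.
// On failure, completed steps are undone in reverse order.
async function runSaga(steps) {
  const completed = [];
  try {
    for (const step of steps) {
      await step.run();
      completed.push(step);
    }
    return { ok: true };
  } catch (err) {
    for (const step of completed.reverse()) {
      await step.compensate(); // undo in reverse order
    }
    return { ok: false, error: err.message };
  }
}
```

For an order flow this might be reserve-inventory then charge-payment: if the charge fails, the reservation is released. Real sagas also need durable state so compensation survives a crash mid-flow.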

Do I Need Kubernetes for Microservices?

No, not initially. We ran our first microservices on simple VM instances with Docker Compose and a load balancer. Kubernetes adds operational complexity that is not justified until you have at least 5-10 services with independent scaling needs. Start simple — you can always migrate to Kubernetes later.

How Do I Handle Shared Data Between Services?

Each service should own its data and expose it through APIs. If the order service needs user information, it calls the user service API — it does not query the user database directly. For data that is read frequently, use event-driven synchronization: the user service publishes “user.updated” events, and the order service maintains a local cache of the user data it needs. This eliminates synchronous API calls for common queries while keeping each service’s data autonomous.
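The local cache pattern is a small amount of code: an event handler that upserts the slice of user data the consuming service needs, and a read path that falls back to the API on a miss. Field names here are illustrative:

```javascript
// Local read cache in the order service, kept fresh by "user.updated"
// events instead of synchronous calls to the user service.
function createUserCache() {
  const users = new Map();
  return {
    // Event handler: store only the slice of user data this service needs.
    onUserUpdated(event) {
      users.set(event.userId, { email: event.email, name: event.name });
    },
    get(userId) {
      return users.get(userId) ?? null; // miss: fall back to the user service API
    },
  };
}
```

The trade-off is staleness: between the update and the event's arrival, reads serve the old value, which is acceptable for display data but not for authorization decisions.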


The Bottom Line

Migrating from a monolith to microservices is one of the most impactful — and risky — architectural changes you can make. The strangler fig pattern, domain-driven service boundaries, and event-driven communication are the decisions that made our migration successful. The shared database, synchronous call chains, and premature service splitting are the decisions that cost us months of rework.

If I were starting this migration today, I would spend more time on the domain mapping phase, insist on database-per-service from day one, and set up distributed tracing before extracting the first service. Most importantly, I would challenge whether microservices were truly necessary — because a well-structured monolith with clear module boundaries solves many of the same problems at a fraction of the operational cost.


Tags: microservices, monolith, system design, migration, architecture, backend
