
Database Design Patterns: How I Choose Between SQL and NoSQL for Every Project

A practical decision framework for choosing between SQL and NoSQL databases. Covers PostgreSQL, MongoDB, DynamoDB, and Redis with real-world tradeoffs.

By SouvenirList

The most expensive technical decision I have ever made was choosing MongoDB for a financial reporting system. At the time, it seemed logical — our data had nested structures, the schema was evolving rapidly, and MongoDB’s flexible documents felt like the right fit. Six months later, we needed to run complex aggregation queries across millions of records with transactional guarantees, and we were fighting the database at every turn. The migration to PostgreSQL took three months and required rewriting almost every data access function in the application.

That experience did not make me anti-NoSQL. It made me anti-default. The “just use MongoDB” or “just use PostgreSQL” mindset ignores the fact that different data patterns demand different databases. The right choice depends on your access patterns, consistency requirements, scaling needs, and operational capabilities. This guide is the decision framework I use for every new project — the framework I wish I had before that MongoDB incident.


TL;DR — Quick Decision Guide

| Your Data Pattern                              | Best Fit               | Why                                           |
|------------------------------------------------|------------------------|-----------------------------------------------|
| Structured, relational data with transactions  | PostgreSQL             | ACID, joins, mature ecosystem                 |
| Flexible documents, rapid prototyping          | MongoDB                | Schema flexibility, developer velocity        |
| Key-value lookups at massive scale             | DynamoDB               | Predictable performance, serverless scaling   |
| Caching, sessions, real-time data              | Redis                  | Sub-millisecond latency, rich data structures |
| Time-series metrics and logs                   | TimescaleDB / InfluxDB | Optimized for time-based queries              |
| Full-text search                               | Elasticsearch          | Inverted index, relevance scoring             |
| Graph relationships                            | Neo4j                  | Traversal queries, relationship-first model   |

If you are unsure, start with PostgreSQL. It handles 90% of use cases well, and you can add specialized databases later when you hit specific limitations.


Understanding the Fundamental Tradeoff

Every database makes tradeoffs along three axes, captured by the CAP theorem: Consistency (every read returns the latest write), Availability (every request receives a response), and Partition Tolerance (the system continues operating despite network failures).

In practice, every distributed system must tolerate partitions, so the real tradeoff is between consistency and availability. SQL databases typically prioritize consistency. NoSQL databases often prioritize availability and partition tolerance, offering eventual consistency instead of strong consistency.

But the CAP theorem is just the starting point. The day-to-day tradeoffs that actually affect your development experience are:

| SQL Databases                            | NoSQL Databases                        |
|------------------------------------------|----------------------------------------|
| Fixed schema (migrations required)       | Flexible schema (evolve freely)        |
| Powerful joins across tables             | Denormalized data, embed related data  |
| ACID transactions                        | Eventual consistency (usually)         |
| Vertical scaling (bigger server)         | Horizontal scaling (more servers)      |
| Mature tooling, universal SQL knowledge  | Varied query languages per database    |
| Slower writes at extreme scale           | Optimized for write-heavy workloads    |

PostgreSQL: The Default Choice

PostgreSQL is my default database for any new project, and it has been for the last five years. Not because it is the best at everything, but because it is good enough at almost everything and excellent at many things. It handles JSON documents (JSONB), full-text search, geospatial queries, and time-series data reasonably well — often well enough that you do not need a separate database for those use cases.

When PostgreSQL Shines

  • Transactional workloads: E-commerce, financial systems, anything where data integrity is non-negotiable
  • Complex queries: Reporting, analytics, data that needs joins across multiple tables
  • Mixed workloads: Applications that need both OLTP (transactions) and light OLAP (analytics)

Schema Design Example

CREATE TABLE users (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    email VARCHAR(255) UNIQUE NOT NULL,
    name VARCHAR(100) NOT NULL,
    metadata JSONB DEFAULT '{}',
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);

CREATE TABLE orders (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id UUID NOT NULL REFERENCES users(id),
    status VARCHAR(20) NOT NULL DEFAULT 'pending',
    total DECIMAL(10, 2) NOT NULL,
    items JSONB NOT NULL,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    
    CONSTRAINT valid_status CHECK (status IN ('pending', 'processing', 'shipped', 'delivered', 'cancelled'))
);

CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_orders_status ON orders(status);
CREATE INDEX idx_orders_created_at ON orders(created_at DESC);

Notice the JSONB column for items and metadata. This is PostgreSQL’s secret weapon — you get the relational model for structured data and JSON flexibility for semi-structured data, all in one database. I store user preferences, order line items, and configuration data in JSONB columns while keeping the core schema relational.

PostgreSQL Performance Tips I Have Learned the Hard Way

Index strategically, not aggressively. Early in my career, I added indexes to every column that appeared in a WHERE clause. This slowed down writes significantly and wasted storage. Now I only add indexes after analyzing actual query patterns with EXPLAIN ANALYZE.

-- See what the query planner is doing
EXPLAIN ANALYZE
SELECT * FROM orders
WHERE user_id = '...' AND status = 'pending'
ORDER BY created_at DESC
LIMIT 20;

-- Composite index for this specific query pattern
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at DESC);

Use connection pooling. PostgreSQL creates a new process for each connection, and each process consumes about 5-10 MB of memory. Without connection pooling, 500 concurrent users means 500 PostgreSQL processes and 2.5-5 GB of memory just for connections. I use PgBouncer in transaction mode for every production PostgreSQL deployment.
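To make the pooling idea concrete, here is a toy pool in plain JavaScript. This is an illustration only: in production you would use PgBouncer or a client-side pool such as node-postgres's Pool class. The ToyPool name and the fake connection objects are invented for this sketch.

```javascript
// Minimal illustration of pooling: reuse a small, fixed set of connection
// objects instead of opening one per request.
class ToyPool {
  constructor(size, createConn) {
    this.idle = Array.from({ length: size }, createConn);
    this.waiters = [];
  }
  acquire() {
    if (this.idle.length > 0) return Promise.resolve(this.idle.pop());
    // No idle connection: queue until release() hands one over
    return new Promise((resolve) => this.waiters.push(resolve));
  }
  release(conn) {
    const waiter = this.waiters.shift();
    if (waiter) waiter(conn);
    else this.idle.push(conn);
  }
}

// 500 concurrent requests share 10 connections instead of opening 500
const pool = new ToyPool(10, () => ({ query: async (sql) => `ran: ${sql}` }));

async function handleRequest(i) {
  const conn = await pool.acquire();
  try {
    return await conn.query(`SELECT ${i}`);
  } finally {
    pool.release(conn); // always return the connection, even on error
  }
}
```

The point of the sketch is the shape of the fix: the request count no longer determines the connection count, so memory stays flat no matter how many users arrive at once.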


MongoDB: When Schema Flexibility Matters

MongoDB is the right choice when your data structure is genuinely uncertain or changes frequently. I use it for content management systems, product catalogs with varying attributes, and rapid prototypes where I need to iterate on the data model without running migrations.

When MongoDB Shines

  • Content management: Blog posts, articles, CMS data with varying fields
  • Product catalogs: Products with different attributes per category
  • Event storage: Log data, analytics events, IoT sensor data
  • Rapid prototyping: When you need to iterate quickly without schema migrations

Document Design Example

// Product document — different categories have different attributes
{
  _id: ObjectId("..."),
  name: "MacBook Pro 16-inch",
  category: "laptops",
  price: 2499.99,
  specs: {
    cpu: "M4 Pro",
    ram: "36GB",
    storage: "1TB SSD",
    display: "16.2-inch Liquid Retina XDR"
  },
  reviews: [
    { userId: "...", rating: 5, text: "Best laptop I have ever owned", date: "2026-03-01" },
    { userId: "...", rating: 4, text: "Great but expensive", date: "2026-03-15" }
  ],
  inventory: {
    warehouse_a: 45,
    warehouse_b: 12,
    reserved: 3
  }
}

In a relational database, this would require a products table, a product_specs table (or an EAV pattern), a reviews table, and an inventory table — plus joins to reconstruct the complete product. In MongoDB, it is a single document read.

The Embedding vs. Referencing Decision

This is the most important design decision in MongoDB. Embed data that is always accessed together and has a one-to-few relationship. Reference data that is accessed independently, changes frequently, or has a one-to-many/many-to-many relationship.

// Embed — reviews belong to the product, accessed together
{
  name: "Product A",
  reviews: [{ rating: 5, text: "..." }, { rating: 4, text: "..." }]
}

// Reference — users exist independently, many-to-many with orders
{
  name: "Order #123",
  userId: ObjectId("..."),  // reference, not embedded
  items: [
    { productId: ObjectId("..."), quantity: 2 }
  ]
}

I made the mistake of embedding order history inside user documents in an early project. Users with hundreds of orders had documents exceeding MongoDB’s 16MB limit, and every user query pulled back megabytes of order data that the application did not need. Now I follow a strict rule: if the embedded array can grow unbounded, use references instead.


DynamoDB: Predictable Performance at Any Scale

DynamoDB is Amazon’s fully managed NoSQL database, and it excels in a very specific niche: key-value and simple query patterns at massive scale with predictable single-digit millisecond latency. I use it for session storage, user profiles, feature flags, and any workload where I know the access patterns in advance and need guaranteed performance.

When DynamoDB Shines

  • Session storage: Fast key-value lookups
  • User profiles: Read-heavy, simple access patterns
  • Real-time gaming: Leaderboards, player state
  • IoT data: High write throughput, time-partitioned reads

The Single-Table Design Pattern

DynamoDB has no joins, and read/write capacity is provisioned per table, so the community has developed the single-table design pattern — fitting multiple entity types into one table using clever partition and sort key design so that related items can be fetched in a single request.

| PK (Partition Key)  | SK (Sort Key)           | Data                    |
|---------------------|-------------------------|-------------------------|
| USER#123            | PROFILE                 | { name, email, ... }    |
| USER#123            | ORDER#2026-03-15#001    | { total, status, ... }  |
| USER#123            | ORDER#2026-03-10#002    | { total, status, ... }  |
| ORDER#001           | METADATA                | { userId, total, ... }  |
| ORDER#001           | ITEM#SKU-456            | { quantity, price, ... }|

This design lets you fetch a user’s profile and recent orders in a single query, without joins and with consistent single-digit millisecond latency regardless of table size.

The tradeoff is that you must know your access patterns before designing the table. Adding a new query pattern to DynamoDB often requires adding a Global Secondary Index (GSI) or restructuring the entire table. This is the opposite of SQL, where you can write ad-hoc queries and add indexes later.
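The key scheme in the table above can be captured in a few small helpers. This is a sketch with invented names (userPk, orderSk, userItemsQuery); none of these are part of the AWS SDK. It only shows how entity types map onto partition and sort keys, and what a query parameter object for the "profile plus orders" pattern might look like.

```javascript
// Hypothetical key builders for the single-table layout shown above
const userPk = (userId) => `USER#${userId}`;
const orderSk = (date, seq) => `ORDER#${date}#${seq}`;

// Query shape for "everything under one user": one partition, all sort keys.
// Because PROFILE and ORDER#... share the partition key, a single query
// returns the profile and the orders together.
function userItemsQuery(userId) {
  return {
    KeyConditionExpression: 'PK = :pk',
    ExpressionAttributeValues: { ':pk': userPk(userId) },
  };
}
```

Sorting is free here: ORDER# sort keys embed the date, so orders come back in date order without any extra index.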


Redis: Beyond Caching

Most developers know Redis as a cache, but its data structures make it useful for far more than key-value caching. I use Redis for session management, rate limiting, real-time leaderboards, pub/sub messaging, and distributed locks.

Redis Data Structures by Use Case

| Data Structure | Use Case                   | Example                         |
|----------------|----------------------------|---------------------------------|
| String         | Caching, counters, sessions| Page cache, rate limit counters |
| Hash           | Object storage             | User sessions, feature flags    |
| List           | Queues, recent items       | Job queues, recent activity feeds |
| Set            | Unique collections, tags   | Online users, tag systems       |
| Sorted Set     | Rankings, time-based data  | Leaderboards, scheduled tasks   |
| Stream         | Event streaming, logs      | Activity streams, audit logs    |

const Redis = require('ioredis');
const redis = new Redis();

// Leaderboard with sorted sets
await redis.zadd('leaderboard:weekly', 1500, 'user:alice');
await redis.zadd('leaderboard:weekly', 2300, 'user:bob');
await redis.zadd('leaderboard:weekly', 1800, 'user:charlie');

// Get top 10 players
const top10 = await redis.zrevrange('leaderboard:weekly', 0, 9, 'WITHSCORES');

// Rate limiting with strings
const key = `ratelimit:${userId}`;
const count = await redis.incr(key);
if (count === 1) await redis.expire(key, 60);
if (count > 100) throw new Error('Rate limit exceeded');

// Distributed lock
const lockKey = `lock:order:${orderId}`;
const acquired = await redis.set(lockKey, 'locked', 'NX', 'EX', 30);
if (!acquired) throw new Error('Resource is locked');

Redis Is Not a Primary Database

This is a mistake I see regularly: using Redis as the primary data store. Redis stores data in memory, and while it supports persistence (RDB snapshots, AOF logs), it is not designed for durability in the way PostgreSQL or MongoDB are. I use Redis for data that can be regenerated (caches, sessions, counters) and always keep the source of truth in a durable database.
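The pattern that follows from this rule is cache-aside: read from Redis first, fall back to the durable store on a miss, and repopulate the cache. A minimal sketch, with plain Maps standing in for Redis and PostgreSQL (swap in ioredis and a real query in production, and add a TTL on the cache write):

```javascript
const cache = new Map(); // stands in for Redis
const db = new Map([['user:1', { name: 'Alice' }]]); // stands in for the durable store

async function getUser(id) {
  const key = `user:${id}`;
  if (cache.has(key)) return cache.get(key); // cache hit
  const row = db.get(key); // miss: read the source of truth
  if (row !== undefined) cache.set(key, row); // repopulate (real code sets a TTL)
  return row;
}
```

If Redis loses its data, nothing breaks: the next read misses, hits the durable store, and refills the cache. That recoverability is exactly what makes Redis safe in this role and unsafe as the only store.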


My Decision Framework

When I start a new project or add a new data requirement, I walk through these questions:

1. Do I need ACID transactions?
   → Yes: PostgreSQL
   → No: Continue

2. Is the schema well-defined and stable?
   → Yes: PostgreSQL
   → No: MongoDB or DynamoDB

3. Do I need complex queries (joins, aggregations)?
   → Yes: PostgreSQL
   → No: Continue

4. Do I know all access patterns in advance?
   → Yes, and I need massive scale: DynamoDB
   → No, or I need flexibility: MongoDB

5. Is this caching, sessions, or real-time data?
   → Yes: Redis (with a durable primary database)

6. Am I unsure?
   → PostgreSQL. You can always add specialized databases later.
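The walkthrough above can be encoded as a function. A sketch only: the field names are invented, and I check the caching question before the scale question here since caching is usually a separate workload alongside a primary store rather than a competitor to it.

```javascript
// The decision framework as a first-match-wins chain of checks
function pickDatabase(req) {
  if (req.needsTransactions) return 'PostgreSQL';    // Q1: ACID required
  if (req.stableSchema) return 'PostgreSQL';         // Q2: well-defined schema
  if (req.needsComplexQueries) return 'PostgreSQL';  // Q3: joins, aggregations
  if (req.cachingOrRealtime) return 'Redis (with a durable primary)'; // Q5
  if (req.knownAccessPatterns && req.massiveScale) return 'DynamoDB'; // Q4
  if (req.needsFlexibility) return 'MongoDB';        // Q4, flexibility branch
  return 'PostgreSQL';                               // Q6: unsure, safe default
}
```

Real decisions weigh more than six booleans (team experience, hosting, cost), but the ordering is the point: transactional and relational needs rule out the NoSQL options before scale or flexibility ever enter the picture.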

The Polyglot Persistence Reality

Most production systems I have worked on use multiple databases. A typical e-commerce backend might use:

  • PostgreSQL for orders, users, and payments (transactional data)
  • Redis for sessions, caching, and rate limiting
  • Elasticsearch for product search
  • S3 for file storage

The key is making each database responsible for what it does best, rather than forcing one database to handle everything.


Frequently Asked Questions

Is MongoDB Good for Financial Data?

Generally no. Financial data typically requires ACID transactions, complex aggregation queries, and strict data integrity guarantees — all areas where PostgreSQL excels. MongoDB added multi-document transactions in version 4.0, but they come with performance penalties and are less mature than PostgreSQL’s transaction support. I use PostgreSQL for any data where correctness is more important than flexibility.

Should I Start with PostgreSQL or MongoDB for a New Project?

Start with PostgreSQL unless you have a specific reason to choose MongoDB. PostgreSQL’s JSONB support gives you document-like flexibility when you need it, while providing relational capabilities that are difficult to add to a document database later. The scenarios where MongoDB is clearly better (rapidly evolving schemas, document-centric access patterns) are the minority of projects.

How Do I Migrate from MongoDB to PostgreSQL?

The migration process I followed: (1) design the PostgreSQL schema based on your query patterns, not your MongoDB document structure; (2) write a migration script that transforms documents into relational rows; (3) run both databases in parallel with dual writes; (4) verify data consistency with comparison queries; (5) switch reads to PostgreSQL; (6) stop writes to MongoDB. Plan for 2-4 weeks for a medium-sized dataset.
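Steps 3 and 4 are where migrations most often go wrong, so here is a minimal sketch of dual writes plus a consistency check. The mongo and postgres names are plain Maps standing in for real clients, and saveUser and countMismatches are invented for this illustration.

```javascript
const mongo = new Map();    // stands in for the old store (source of truth)
const postgres = new Map(); // stands in for the new store

async function saveUser(user) {
  mongo.set(user.id, user); // the old store keeps serving live traffic...
  try {
    postgres.set(user.id, user); // ...while the new store gets a dual write
  } catch (err) {
    // Log and continue: a failed dual write must never break the live path
    console.error('dual write failed', err);
  }
}

// Step 4: verify consistency with a comparison pass over the old store
function countMismatches() {
  let mismatches = 0;
  for (const [id, doc] of mongo) {
    if (JSON.stringify(postgres.get(id)) !== JSON.stringify(doc)) mismatches++;
  }
  return mismatches;
}
```

Only when the comparison pass reports zero mismatches over a sustained period is it safe to move on to step 5 and switch reads.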

When Is DynamoDB Better Than PostgreSQL?

When you need guaranteed single-digit millisecond latency at any scale and your access patterns are well-defined. DynamoDB removes all operational overhead (no server management, no connection pooling, no vacuuming) and scales automatically. The tradeoff is query flexibility — if you cannot predict your access patterns or need ad-hoc queries, PostgreSQL is a better fit.

Can I Use Redis as My Only Database?

No. Redis is designed for speed, not durability. While it supports persistence, data loss is possible during crashes or failovers. Use Redis for caching, sessions, rate limiting, and real-time features, but always back it with a durable database (PostgreSQL, MongoDB, DynamoDB) as the source of truth.


The Bottom Line

Database selection is one of the few technical decisions that is genuinely difficult to reverse. A wrong choice early on can cost months of migration work later — I learned this firsthand with the MongoDB-to-PostgreSQL migration that could have been avoided with a better initial analysis.

My rule of thumb: start with PostgreSQL unless you have a clear, documented reason to choose something else. Add Redis for caching and real-time features. Consider MongoDB or DynamoDB only when your data patterns genuinely benefit from their specific strengths. And always design your application layer with database portability in mind — abstract your data access behind repository interfaces so that when (not if) your needs change, the migration is contained.
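A minimal sketch of that repository abstraction, assuming an invented UserRepository shape with save and findById methods. The application depends only on this interface; the class backed by a real driver is swappable.

```javascript
// One interchangeable implementation; a PostgresUserRepository or
// MongoUserRepository would expose the same two methods
class InMemoryUserRepository {
  constructor() { this.rows = new Map(); }
  async save(user) { this.rows.set(user.id, user); return user; }
  async findById(id) { return this.rows.get(id) ?? null; }
}

// Application code sees the repository, never the database driver
async function registerUser(repo, user) {
  const existing = await repo.findById(user.id);
  if (existing) throw new Error('duplicate user');
  return repo.save(user);
}
```

When the database changes, registerUser and every function like it stays untouched; only the repository class is rewritten, which is what keeps a migration contained.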


Tags: database sql nosql postgresql mongodb system design backend
