Replication

Database replication is the practice of keeping multiple copies of the same data on different nodes to improve availability, durability, and read scaling. Almost every production database runs with at least one replica; the main design choices concern topology, synchronisation, and failover.

Common topologies

  • Primary-replica (master-slave). One node accepts writes, replicas follow. Standard in…
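The primary-replica flow (one writer, followers that catch up later) can be illustrated with a small in-memory simulation. This is a minimal sketch under stated assumptions: `Node`, `PrimaryReplicaCluster`, `sync`, and `lag` are hypothetical names, data is a plain key-value dict, and replication is a simple replay of an ordered write log rather than any real database's protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node holding its own key-value copy of the data (illustrative only)."""
    name: str
    data: dict = field(default_factory=dict)
    applied: int = 0  # number of log entries this node has applied so far


class PrimaryReplicaCluster:
    """Hypothetical single-primary cluster: writes go to the primary's log,
    and replicas catch up asynchronously by replaying entries they lack."""

    def __init__(self, replica_names):
        self.primary = Node("primary")
        self.replicas = [Node(n) for n in replica_names]
        self.log = []  # ordered list of (key, value) writes

    def write(self, key, value):
        # Only the primary accepts writes: it appends to the log and applies locally.
        self.log.append((key, value))
        self.primary.data[key] = value
        self.primary.applied = len(self.log)

    def sync(self, replica):
        # Catch-up step: replay every log entry this replica has not applied yet.
        for key, value in self.log[replica.applied:]:
            replica.data[key] = value
        replica.applied = len(self.log)

    def lag(self, replica):
        # How many writes the replica is behind the primary.
        return len(self.log) - replica.applied


cluster = PrimaryReplicaCluster(["r1", "r2"])
cluster.write("x", 1)
r1 = cluster.replicas[0]
print(r1.data.get("x"))  # None: not synced yet, so a read here is stale
cluster.sync(r1)
print(r1.data.get("x"))  # 1
```

The gap between `write` and `sync` is what makes reads from a replica potentially stale; real systems narrow or bound that window rather than eliminate it.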

Sharding

Sharding is the practice of horizontally partitioning a dataset across multiple database instances so each shard holds a subset of the data and serves a subset of the load. Sharding is the standard answer when a single database server cannot keep up with storage, read throughput, or write throughput.

How it works

A shard key (one or more fields) determines which shard each row or document…
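A minimal sketch of how a shard key can route documents, assuming a single shard-key field, a fixed number of shards, and simple hash-modulo placement. `shard_for` and `route` are hypothetical helpers. Modulo routing is a simplification: MongoDB's hashed sharding instead assigns ranges (chunks) of hash values to shards, so adding a shard moves some chunks rather than remapping most keys, as modulo placement would.

```python
import hashlib


def shard_for(shard_key_value: str, num_shards: int) -> int:
    """Map a shard-key value to a shard index using a stable hash.

    Python's built-in hash() is randomised per process for str, so a
    cryptographic digest is used here purely as a stable, well-spread hash."""
    digest = hashlib.sha256(shard_key_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def route(docs, key, num_shards):
    """Group documents by the shard their shard-key value hashes to."""
    shards = {i: [] for i in range(num_shards)}
    for doc in docs:
        shards[shard_for(str(doc[key]), num_shards)].append(doc)
    return shards


users = [{"user_id": i, "name": f"user{i}"} for i in range(12)]
placement = route(users, "user_id", 3)
print({shard: len(docs) for shard, docs in placement.items()})
```

Because every document with the same shard-key value lands on the same shard, a query that includes the shard key can be sent to one shard instead of all of them.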

Replica Set

A replica set is MongoDB's name for a group of database nodes that maintain the same data and provide automatic failover. One node is the primary (accepts writes); the rest are secondaries that asynchronously apply the primary's operation log. If the primary becomes unreachable, the secondaries elect a new primary among themselves.
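The election step can be sketched as a majority vote with a freshness check. This is a simplified illustration under stated assumptions, not MongoDB's actual protocol (which also involves terms, heartbeats, member priorities, and arbiters): `elect` is a hypothetical function, each member's progress is a single integer oplog position, and a voter only supports a candidate whose log is at least as far along as its own.

```python
from __future__ import annotations


def elect(members: dict[str, int], reachable: set[str]) -> str | None:
    """Pick a new primary, or return None if no majority can be formed.

    members   -- every voting member mapped to its last applied oplog position
    reachable -- members that are up and can talk to each other
    """
    majority = len(members) // 2 + 1  # a strict majority of ALL voting members
    # Try the most up-to-date reachable members first; ties break by name.
    for candidate in sorted(reachable, key=lambda n: (-members[n], n)):
        # A voter grants its vote only if the candidate is not behind it.
        votes = sum(1 for voter in reachable if members[candidate] >= members[voter])
        if votes >= majority:
            return candidate
    return None


# Old primary "a" is unreachable; "b" is the freshest survivor and wins 2 of 3 votes.
print(elect({"a": 10, "b": 9, "c": 7}, {"b", "c"}))  # b
# A lone survivor cannot reach a majority, so no primary is elected.
print(elect({"a": 10, "b": 9, "c": 7}, {"c"}))  # None
```

Requiring a majority of all members, not just the reachable ones, is what prevents two sides of a network partition from each electing their own primary.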

How it works

Writes go to the primary, which records them in…

Aggregation Pipeline

The Aggregation Pipeline is MongoDB's framework for transforming and combining documents through a sequence of stages, conceptually similar to a SQL SELECT with GROUP BY, JOIN, and window functions. A pipeline is an array of stage operators applied in order; the output of one stage feeds the next.

Common stages

  • $match: filter documents (equivalent to WHERE)
  • $project: reshape and select…
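To show how the output of one stage feeds the next, here is a tiny pure-Python emulation. It is a sketch with assumptions: `match`, `project`, `group_sum`, and `run_pipeline` are hypothetical helpers supporting only equality matching, field inclusion, and a per-key sum, not MongoDB's implementation. In MongoDB itself the same query would be written as `[{"$match": {"status": "A"}}, {"$group": {"_id": "$cust_id", "total": {"$sum": "$amount"}}}]` and passed to the driver's aggregate method.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Doc = Dict[str, Any]
Stage = Callable[[Iterable[Doc]], Iterator[Doc]]


def match(criteria: Doc) -> Stage:
    # $match-like: keep documents whose fields equal every value in criteria.
    def stage(docs):
        for d in docs:
            if all(d.get(k) == v for k, v in criteria.items()):
                yield d
    return stage


def project(fields: List[str]) -> Stage:
    # $project-like inclusion: keep only the listed fields that are present.
    def stage(docs):
        for d in docs:
            yield {k: d[k] for k in fields if k in d}
    return stage


def group_sum(by: str, total_field: str, out: str) -> Stage:
    # $group-like: sum total_field per distinct value of `by` (assumed present).
    def stage(docs):
        totals: Dict[Any, Any] = {}
        for d in docs:
            totals[d[by]] = totals.get(d[by], 0) + d.get(total_field, 0)
        for key, value in totals.items():
            yield {"_id": key, out: value}
    return stage


def run_pipeline(docs: Iterable[Doc], stages: List[Stage]) -> List[Doc]:
    # Apply stages in order; each one consumes the previous stage's output.
    for stage in stages:
        docs = stage(docs)
    return list(docs)


orders = [
    {"cust_id": "c1", "amount": 50, "status": "A"},
    {"cust_id": "c2", "amount": 30, "status": "A"},
    {"cust_id": "c1", "amount": 20, "status": "A"},
    {"cust_id": "c1", "amount": 99, "status": "D"},
]
print(run_pipeline(orders, [match({"status": "A"}), group_sum("cust_id", "amount", "total")]))
# [{'_id': 'c1', 'total': 70}, {'_id': 'c2', 'total': 30}]
```

Placing `match` first mirrors common pipeline advice: filtering early shrinks the input every later stage has to process.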

BSON

BSON (Binary JSON) is the binary-encoded serialization format MongoDB uses to store and transmit documents. It extends JSON with additional data types (date, ObjectId, decimal128, binary data, regex), preserves field ordering, and is faster to parse and traverse than JSON text.

Extensions beyond JSON

  • ObjectId. 12-byte unique identifier made of a 4-byte timestamp, a 5-byte random value (older versions used machine and process identifiers here), and a 3-byte incrementing counter.
    *…
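Because the first four bytes of an ObjectId are a big-endian count of seconds since the Unix epoch, its creation time can be read back without any database call. A minimal sketch, assuming the ObjectId arrives as a 24-character hex string; `objectid_timestamp` is a hypothetical helper (the Python driver's `bson.ObjectId` exposes the same information as `generation_time`).

```python
from datetime import datetime, timezone


def objectid_timestamp(oid_hex: str) -> datetime:
    """Return the creation time embedded in a 24-hex-character ObjectId."""
    raw = bytes.fromhex(oid_hex)  # raises ValueError on non-hex input
    if len(raw) != 12:
        raise ValueError("ObjectId must be exactly 12 bytes (24 hex characters)")
    # Bytes 0-3: seconds since the Unix epoch, big-endian.
    seconds = int.from_bytes(raw[:4], "big")
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


print(objectid_timestamp("000000000000000000000000"))  # 1970-01-01 00:00:00+00:00
```

The timestamp has one-second resolution, so it roughly orders ObjectIds by creation time but cannot order two IDs created within the same second.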

Document Database

A document database stores data as self-contained documents (typically JSON or BSON) grouped into collections, instead of as rows in tables with strict schemas. Each document carries its own structure, allowing fields to vary across documents in the same collection.

Why document storage

  • Schema flexibility. Fields can be added without migrations; documents in the same collection can have…
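Schema flexibility means documents in one collection can carry different fields, and the application, not the database, decides how to handle a field that is absent. A minimal sketch, assuming a collection modelled as a plain Python list of dicts; `products` and `find` are hypothetical and stand in for a real collection and query API. In MongoDB's query language, the closest analogue to the filter below is `{"sizes": {"$exists": true}}`.

```python
# A "collection" modelled as a list of dicts; each document has its own shape.
products = [
    {"_id": 1, "name": "T-shirt", "sizes": ["S", "M", "L"]},
    {"_id": 2, "name": "E-book", "download_url": "https://example.com/book.pdf"},
    {"_id": 3, "name": "Mug", "dimensions": {"height_cm": 10, "diameter_cm": 8}},
]


def find(collection, predicate):
    """Return every document for which predicate(doc) is true."""
    return [doc for doc in collection if predicate(doc)]


# No schema guarantees "sizes" exists, so the query must tolerate its absence.
with_sizes = find(products, lambda d: "sizes" in d)
print([d["name"] for d in with_sizes])  # ['T-shirt']
```

The trade-off is that validation moves into application code (or into optional schema validation rules), since nothing forces every document to share a structure.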

NoSQL

NoSQL is an umbrella term for databases that depart from the relational model of fixed-schema tables and rows queried with SQL. The category emerged in the late 2000s, when web-scale applications needed horizontal scaling and flexible schemas that traditional relational systems struggled to provide. NoSQL is not a single technology; it spans four broadly recognised families.

The four families

  • Document. JSON-like…

OLTP

OLTP (Online Transaction Processing) describes the class of database workloads characterised by many short-lived, latency-sensitive transactions: row-level reads and writes that back interactive applications. OLTP contrasts with OLAP (Online Analytical Processing), where queries scan large fractions of historical data for analytics.

OLTP characteristics

  • Short transactions. A few row reads…
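A typical OLTP unit of work touches a handful of rows and either commits entirely or not at all. The sketch below uses Python's built-in `sqlite3` module as a stand-in engine; the `accounts` table and the `transfer`, `make_demo_db`, and `balances` helpers are hypothetical. Folding the balance check into the `UPDATE ... WHERE balance >= ?` keeps check and debit in a single statement, and the connection's context manager rolls back if either update fails.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: int, dst: int, amount: int) -> None:
    """Move `amount` between two accounts as one short transaction."""
    with conn:  # commits on normal exit, rolls back if an exception escapes
        debit = conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ? AND balance >= ?",
            (amount, src, amount),
        )
        if debit.rowcount != 1:
            raise ValueError("unknown source account or insufficient funds")
        credit = conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
        )
        if credit.rowcount != 1:
            raise ValueError("unknown destination account")  # undoes the debit


def make_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    db = make_demo_db()
    transfer(db, 1, 2, 30)
    print(balances(db))  # {1: 70, 2: 80}
```

Keeping the transaction this small is what lets an OLTP system serve many concurrent users: locks are held briefly and each request finishes quickly.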

MongoDB Data Modeling: How to Design Schemas for Real-World Applications

A fast MongoDB system comes from modeling data around how your application reads and writes it. This guide breaks down how to structure documents, when to embed or reference, the patterns used in real production systems, and the indexing strategies that keep performance predictable as data grows.
