
Data & Databases · 11 min read · ~26 min study · intermediate

Database Design for Trading Systems

Schema design, normalization, indexing, and the tradeoffs that matter when latency and correctness collide.

How to structure databases for trading platforms — normalization, schema design, indexing strategies, and the tradeoffs that matter in financial systems.

Design Matters More Than You Think

Writing SQL queries is one skill. Designing the database those queries run against is another — and arguably more impactful. A well-designed schema makes queries simple, fast, and correct. A poorly designed one means you are forever writing workarounds and wondering why everything is slow.

In trading systems, the stakes are higher than most applications. Data volumes are large, queries are complex, and errors in data integrity can have direct financial consequences.


Normalization: Organizing Data to Reduce Redundancy

Normalization means structuring your data so that each piece of information is stored exactly once. Instead of writing "Apple Inc." on every one of a million trade records, you store it once in a products table and reference it by symbol.

CREATE TABLE products (
 symbol VARCHAR(10) PRIMARY KEY,
 name VARCHAR(100) NOT NULL,
 sector VARCHAR(50),
 exchange VARCHAR(20),
 currency VARCHAR(3) DEFAULT 'USD'
);

CREATE TABLE accounts (
 account_id SERIAL PRIMARY KEY,
 account_type VARCHAR(20) CHECK (account_type IN ('TRADING', 'CUSTODY', 'MARGIN')),
 created_at TIMESTAMP DEFAULT NOW()
);

CREATE TABLE trades (
 trade_id SERIAL PRIMARY KEY,
 symbol VARCHAR(10) NOT NULL REFERENCES products(symbol),
 account_id INTEGER NOT NULL REFERENCES accounts(account_id),
 quantity INTEGER NOT NULL,
 price DECIMAL(12,4) NOT NULL,
 side VARCHAR(4) CHECK (side IN ('BUY', 'SELL')),
 trade_date TIMESTAMP NOT NULL,
 settlement_date DATE,
 status VARCHAR(10) DEFAULT 'PENDING'
);

The REFERENCES keyword creates foreign keys — the database will reject any trade with a symbol that does not exist in products or an account that does not exist in accounts. This is data integrity enforced at the database level, not in your application code where it can be bypassed.
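As a quick check, the same behavior can be reproduced from Python with SQLite (a hypothetical in-memory sketch; note that SQLite only enforces foreign keys once the `foreign_keys` pragma is switched on):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only with this pragma
conn.execute("CREATE TABLE products (symbol TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE trades (
        trade_id INTEGER PRIMARY KEY,
        symbol   TEXT NOT NULL REFERENCES products(symbol),
        quantity INTEGER NOT NULL
    )""")

conn.execute("INSERT INTO products VALUES ('AAPL', 'Apple Inc.')")
conn.execute("INSERT INTO trades (symbol, quantity) VALUES ('AAPL', 100)")  # known symbol: accepted

rejected = False
try:
    # Unknown symbol: the database itself rejects the row, no application check needed
    conn.execute("INSERT INTO trades (symbol, quantity) VALUES ('ZZZZ', 100)")
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```

The bad row never reaches the table, regardless of which code path tried to write it.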

When to Denormalize

Pure normalization is not always practical. For read-heavy analytics workloads, some denormalization — storing pre-computed or duplicated data — can dramatically improve query performance. The key is to denormalize intentionally, not accidentally.
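One hedged sketch of intentional denormalization: an illustrative `daily_summary` table pre-aggregated from a normalized trades table, so reads hit one row per (symbol, day) instead of re-scanning every trade (table and column names are assumptions for this example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, quantity INTEGER, price REAL, trade_date TEXT)")
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?, ?)",
    [("AAPL", 100, 190.0, "2024-01-02"),
     ("AAPL", 50, 191.0, "2024-01-02"),
     ("MSFT", 10, 370.0, "2024-01-02")],
)
# Denormalized read model: pre-computed totals and VWAP per symbol and day.
conn.execute("""
    CREATE TABLE daily_summary AS
    SELECT symbol, trade_date,
           SUM(quantity)                          AS total_qty,
           SUM(quantity * price) / SUM(quantity)  AS vwap
    FROM trades
    GROUP BY symbol, trade_date
""")
row = conn.execute(
    "SELECT total_qty, vwap FROM daily_summary WHERE symbol = 'AAPL'"
).fetchone()
print(row)  # (150, 190.33...)
```

The cost is the "accidental" part the text warns about: the summary goes stale unless you rebuild or incrementally update it on every write.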


Indexing Strategy

Indexes are like a book's index — they let the database jump directly to matching rows without scanning the entire table.

-- The most common query pattern: symbol + date range
CREATE INDEX idx_trades_symbol_date ON trades (symbol, trade_date);

-- Account-based lookups
CREATE INDEX idx_trades_account ON trades (account_id, trade_date);

-- Partial composite index for settlement queries on unsettled trades
CREATE INDEX idx_trades_settlement ON trades (status, settlement_date) WHERE status != 'SETTLED';

Column order in composite indexes matters. An index on (symbol, trade_date) supports queries filtering by symbol alone or by symbol + date, but not by date alone. Design your indexes around your actual query patterns.
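SQLite's EXPLAIN QUERY PLAN makes this column-order rule visible. In this sketch (illustrative table and index names), a symbol filter uses the composite index while a date-only filter cannot, because the leftmost indexed column is unconstrained:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, trade_date TEXT, price REAL)")
conn.execute("CREATE INDEX idx_trades_symbol_date ON trades (symbol, trade_date)")

def plan(sql: str) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string.
    return " ".join(r[-1] for r in conn.execute("EXPLAIN QUERY PLAN " + sql))

by_symbol = plan("SELECT price FROM trades WHERE symbol = 'AAPL'")
by_date   = plan("SELECT price FROM trades WHERE trade_date = '2024-01-02'")
print(by_symbol)  # SEARCH ... USING INDEX idx_trades_symbol_date ...
print(by_date)    # SCAN trades — leftmost column not constrained
```

The same experiment in Postgres uses EXPLAIN; the principle carries over to any B-tree composite index.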

The tradeoff: every index speeds up reads but slows down writes, because the database must update every relevant index on each INSERT or UPDATE. For trading systems processing thousands of trades per second, this balance requires careful thought. These performance considerations are also important when thinking about data pipeline design.


Partitioning for Scale

When your trades table reaches hundreds of millions of rows, even well-indexed queries slow down. Partitioning splits a table into smaller physical chunks — almost always by date in finance:

CREATE TABLE trades (
 trade_id SERIAL,
 symbol VARCHAR(10),
 quantity INTEGER,
 price DECIMAL(12,4),
 trade_date DATE NOT NULL
) PARTITION BY RANGE (trade_date);

CREATE TABLE trades_2024_q1 PARTITION OF trades FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE trades_2024_q2 PARTITION OF trades FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
CREATE TABLE trades_2024_q3 PARTITION OF trades FOR VALUES FROM ('2024-07-01') TO ('2024-10-01');
CREATE TABLE trades_2024_q4 PARTITION OF trades FOR VALUES FROM ('2024-10-01') TO ('2025-01-01');

Now a query for January 2024 only scans the Q1 partition — ignoring the other 75% of the year's data. For multi-year tables, this can turn a 30-second query into a sub-second one.
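The routing idea behind PARTITION BY RANGE can be sketched in a few lines. This is a hedged illustration, not how Postgres implements it: a hypothetical helper that maps a trade date to the quarterly partition the planner would prune down to:

```python
from datetime import date

def partition_for(trade_date: date) -> str:
    """Map a trade date to its quarterly partition name (illustrative naming)."""
    quarter = (trade_date.month - 1) // 3 + 1
    return f"trades_{trade_date.year}_q{quarter}"

print(partition_for(date(2024, 1, 15)))  # trades_2024_q1
print(partition_for(date(2024, 11, 3)))  # trades_2024_q4
```

In a real Postgres setup the planner does this pruning automatically from the WHERE clause; the point is that a date predicate deterministically selects a small subset of physical storage.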


Constraints and Data Quality

Financial data needs to be correct. Database constraints are your first line of defense:

-- Prices must be positive
ALTER TABLE trades ADD CONSTRAINT chk_positive_price CHECK (price > 0);

-- Quantity must be non-zero
ALTER TABLE trades ADD CONSTRAINT chk_nonzero_qty CHECK (quantity != 0);

-- No duplicate trade IDs from external systems
ALTER TABLE trades ADD CONSTRAINT uq_external_id UNIQUE (external_trade_id);

-- Enum-like constraints
ALTER TABLE trades ADD CONSTRAINT chk_status CHECK (status IN ('PENDING', 'FILLED', 'CANCELED', 'SETTLED'));

These constraints catch bad data at the database level, before it propagates through your APIs and reporting systems.
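The same CHECK constraints can be smoke-tested in SQLite (an illustrative sketch with the column list trimmed for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE trades (
        trade_id INTEGER PRIMARY KEY,
        price    REAL NOT NULL CHECK (price > 0),
        quantity INTEGER NOT NULL CHECK (quantity != 0)
    )""")

conn.execute("INSERT INTO trades (price, quantity) VALUES (190.5, 100)")  # valid row: accepted

rejected = 0
for bad in [(-1.0, 100), (190.5, 0)]:  # negative price, zero quantity
    try:
        conn.execute("INSERT INTO trades (price, quantity) VALUES (?, ?)", bad)
    except sqlite3.IntegrityError:
        rejected += 1
print(rejected)  # 2 — both bad rows bounced at the database level
```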


When Relational Is Not Enough

Relational databases are the default, but they are not ideal for every workload:

  • Time series databases handle high-frequency market data more efficiently
  • Document databases (MongoDB) work well for semi-structured data like trade confirmations
  • Columnar databases (ClickHouse, Redshift) excel at analytical queries over billions of rows

A well-designed production system often uses multiple database technologies — relational for transactional data, TSDB for market data, and a data warehouse for analytics. Understanding cloud database offerings helps you choose the right tool for each workload.

Good database design is foundational. Get it right and everything built on top — your SQL queries, your APIs, your reporting — works smoothly. Get it wrong and you will be fighting the schema for years.

Want to go deeper on Database Design for Trading Systems?

This article covers the essentials, but there's a lot more to learn. Inside , you'll find hands-on coding exercises, interactive quizzes, and structured lessons that take you from fundamentals to production-ready skills — across 50+ courses in technology, finance, and mathematics.

Free to get started · No credit card required

Keep Reading

  • SQL for Financial Data: A Developer's Starting Point (/quant-knowledge/data/sql-for-financial-data) — the SQL fundamentals that matter for finance: querying trade data, aggregating positions, joining reference data.
  • Advanced SQL Techniques for Financial Systems (/quant-knowledge/data/advanced-sql-techniques-for-financial-systems) — CTEs, window functions, query optimization, and advanced SQL patterns for trading platforms.
  • Time Series Databases in Finance: When Relational Is Not Enough (/quant-knowledge/data/time-series-databases-in-finance) — why financial firms use specialized time series databases for market data, tick storage, and monitoring.
  • APIs and REST for Financial Data (/quant-knowledge/software-engineering/apis-and-rest-for-financial-data) — how APIs work, RESTful design principles, and practical patterns for financial data APIs.

What You Will Learn

  • Explain why schema design matters more than individual queries.
  • Apply normalization to reduce redundancy and enforce integrity.
  • Choose an indexing strategy based on real query patterns.
  • Use partitioning to keep large tables fast at scale.
  • Enforce data quality with database constraints.
  • Recognize when relational databases are not enough.

Prerequisites

  • Basic SQL.
  • Python data tooling.
  • Comfort reading code and basic statistical notation.
  • Curiosity about how the topic shows up in a US trading firm.

Mental Model

Financial data is heavy, append-only, and queried at every horizon — milliseconds for risk, hours for research, years for compliance. The right database is the one that aligns its access pattern with the dominant query, not the one that scales the highest in a vendor pitch. Frame Database Design for Trading Systems as the layer where schema design, normalization, indexing, and the latency-versus-correctness tradeoffs come together, and ask what would break if you removed it from the workflow.

Why This Matters in US Markets

US trading firms run on KDB+, ClickHouse, Snowflake, Postgres, and Parquet on S3. Tick stores at Citadel and Jane Street hold petabytes of CME and US equities data going back decades; risk groups at GS and JPM run intraday VaR off Snowflake or Vertica. Knowing how to model and query this data is the line between research and ops.

In US markets, Database Design for Trading Systems tends to surface during onboarding, code review, and the first incident a junior quant gets pulled into. Questions on this material recur in interviews at Citadel, Two Sigma, Jane Street, HRT, Jump, DRW, IMC, Optiver, and the major bulge-bracket banks.

Common Mistakes

  • Not partitioning by (symbol, date) and watching every query do a full scan.
  • Treating timestamps without timezones — US markets are America/New_York, not UTC, on the wire.
  • Joining on price columns subject to floating-point drift instead of integer ticks.
  • Treating Database Design for Trading Systems as a one-off topic rather than the foundation it becomes once you ship code.
  • Skipping the US-market context — copying European or Asian conventions and getting bitten by US tick sizes, settlement, or regulator expectations.
  • Optimizing for elegance instead of auditability; trading regulators care about reproducibility, not cleverness.
  • Confusing model output with reality — the tape is the source of truth, the model is a hypothesis.
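The timezone mistake above is easy to demonstrate with Python's standard zoneinfo module (the timestamp is illustrative): a 9:30 market open stored as a naive wall-clock time is silently wrong by four or five hours depending on daylight saving.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Market open as an exchange-local, timezone-aware timestamp
open_et = datetime(2024, 7, 1, 9, 30, tzinfo=ZoneInfo("America/New_York"))

# Normalize to UTC for storage; the zone information keeps the conversion exact
open_utc = open_et.astimezone(ZoneInfo("UTC"))
print(open_utc)  # 13:30 UTC — EDT is UTC-4 in July
```

Storing the UTC instant plus an explicit exchange timezone avoids both the naive-timestamp bug and daylight-saving ambiguity.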

Practice Questions

  1. Design a SQL query that returns the rolling 20-day VWAP for SPY by trade date.
  2. Why do most US tick stores partition by date and symbol rather than purely by date?
  3. Compare row-oriented Postgres and column-oriented ClickHouse for storing 5 years of NBBO quotes.
  4. Define an idempotent UPSERT in Postgres for a daily P&L table keyed by (account, trade_date).
  5. What is the difference between a CTE and a subquery from a query-planner perspective?

Answers and Explanations

  1. Aggregate to one row per day first, then use a window function: WITH daily AS (SELECT trade_date, SUM(price*volume) AS pv, SUM(volume) AS v FROM trades WHERE symbol='SPY' GROUP BY trade_date) SELECT trade_date, SUM(pv) OVER w / SUM(v) OVER w AS vwap_20 FROM daily WINDOW w AS (ORDER BY trade_date ROWS 19 PRECEDING). With one row per day, ROWS 19 PRECEDING covers exactly 20 trading days; the window function avoids self-joins and runs in a single pass.
  2. Because the dominant query is WHERE symbol = ? AND trade_date BETWEEN ? AND ?; double partitioning prunes most files at the planner stage. Pure date partitioning forces every per-symbol query to scan every file in the date range.
  3. Postgres handles the workload but is 10-100× slower on aggregations because every row must be read even when only bid and ask are projected. ClickHouse projects only the needed columns and applies block-level compression, so a sum-of-spread over a year completes in seconds rather than minutes.
  4. INSERT INTO pnl(account, trade_date, gross, fees, net) VALUES (...) ON CONFLICT (account, trade_date) DO UPDATE SET gross=EXCLUDED.gross, fees=EXCLUDED.fees, net=EXCLUDED.net; — running it twice produces the same final state.
  5. A CTE is sometimes materialized (as in older Postgres) and sometimes inlined (modern Postgres, Snowflake, BigQuery), while a correlated subquery is typically inlined and may execute once per outer row. The semantic effect is the same; the performance profile is not.
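The idempotent UPSERT from answer 4 can be exercised end-to-end in SQLite, whose ON CONFLICT clause mirrors the Postgres syntax (the table is trimmed to a single value column for brevity):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE pnl (
        account    TEXT,
        trade_date TEXT,
        net        REAL,
        PRIMARY KEY (account, trade_date)
    )""")

upsert = """
    INSERT INTO pnl (account, trade_date, net) VALUES (?, ?, ?)
    ON CONFLICT (account, trade_date) DO UPDATE SET net = excluded.net
"""

conn.execute(upsert, ("ACC1", "2024-01-02", 1500.0))
conn.execute(upsert, ("ACC1", "2024-01-02", 1525.0))  # re-run with a corrected figure
rows = conn.execute("SELECT account, net FROM pnl").fetchall()
print(rows)  # [('ACC1', 1525.0)] — one row, latest value; re-running changes nothing
```

Idempotency is what lets a failed daily batch be safely replayed from the top.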

Glossary

  • OLTP — Online Transaction Processing; row-oriented, optimized for many small writes (order books, ledgers).
  • OLAP — Online Analytical Processing; column-oriented, optimized for big aggregate queries (P&L, risk).
  • Index — an auxiliary data structure (B-tree, hash, bitmap) that speeds up lookups at the cost of write throughput.
  • Normalization — splitting data across tables to remove redundancy; trades disk for joins.
  • Denormalization — collapsing tables for read speed; standard for time-series tick stores.
  • CTE — Common Table Expression; a named subquery defined with WITH, often used for readability and recursive logic.
  • Window function — SQL function that computes over a sliding subset (rolling means, lagged returns).
  • Parquet — a columnar file format with compression and predicate pushdown; the de-facto research format on S3.

Further Study Path

Key Learning Outcomes

  • Explain why schema design matters more than individual queries.
  • Apply normalization and foreign keys to enforce data integrity.
  • Design composite and partial indexes around real query patterns.
  • Use range partitioning to keep multi-year tables fast.
  • Enforce data quality with CHECK, UNIQUE, and NOT NULL constraints.
  • Recognize when time-series, document, or columnar databases fit the workload better, and what to use instead.
  • Pinpoint how this material surfaces in interviews and onboarding at firms like Citadel, Two Sigma, Jane Street, and HRT.
  • Explain the US regulatory framing (SEC, CFTC, FINRA) around trade data retention and auditability.
  • Give a one-paragraph elevator pitch of database design for trading systems suitable for an interviewer.
  • Recognize common production failure modes — full scans, naive timestamps, floating-point joins — and the senior's fix for each.