Networking · 11 min read · ~26 min study · intermediate
Network Speeds and Latency in Trading
Why latency matters, how to measure it, where bottlenecks hide — from co-location to kernel bypass.
Why Nanoseconds Matter
In most software, a few milliseconds of latency is invisible. In trading, it can be the difference between profit and loss. If your system receives a market data update 1 millisecond after a competitor and both systems want to trade on it, you lose.
This creates an arms race where trading firms invest heavily in reducing latency at every level — network, hardware, software, even physical location. Understanding where latency comes from helps you make informed decisions about where to optimize and when the effort is (or is not) worth it.
Where Latency Lives
Speed of Light
The absolute physical limit. Light travels about 200 km per millisecond in a fiber optic cable. New York to London is roughly 5,500 km, about 27 ms one way through undersea cables. Nothing can make this faster except a shorter physical path.
| Route | Distance | One-Way Latency |
|---|---|---|
| New York ↔ London | ~5,500 km | ~27 ms |
| Chicago ↔ New York | ~1,200 km | ~6 ms |
| Across a data center | ~100 m | ~500 ns |
| Same rack | ~2 m | ~10 ns |
This is why co-location exists: trading firms place their servers in the same data center as the exchange. At the speed of light, being 1km closer saves 5 microseconds. Over millions of trades, that adds up.
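The arithmetic behind these numbers is easy to sanity-check. A minimal sketch, assuming only that light in fiber covers about 200,000 km/s:

```python
# Propagation delay through fiber: light in glass (refractive index ~1.5)
# covers ~200 km per millisecond, i.e. roughly 5 ns per metre.
FIBER_KM_PER_MS = 200.0

def one_way_ms(distance_km: float) -> float:
    """One-way propagation delay in milliseconds, fiber only."""
    return distance_km / FIBER_KM_PER_MS

print(one_way_ms(5500))          # New York -> London: ~27.5 ms
print(one_way_ms(1200))          # Chicago -> New York: ~6 ms
print(one_way_ms(0.001) * 1e6)   # 1 metre, converted to ns: ~5 ns
```

This is propagation only; real routes add distance because cables follow roads, railways, and seabeds rather than great circles.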
Network Stack
Even within a data center, the network stack adds latency:
| Component | Typical Latency |
|---|---|
| Application code | ~100 ns - 10 us |
| System call overhead | ~1-5 us |
| Kernel network stack | ~5-15 us |
| Network interface card | ~1-5 us |
| Switch/router | ~1-5 us |
| Cable propagation | ~5 ns per meter |
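Summing rough midpoints of these ranges gives a feel for the budget. The component values below are illustrative assumptions, not measurements:

```python
# Rough one-way latency budget for a single in-datacenter hop,
# using illustrative midpoints of the ranges above (microseconds).
budget_us = {
    "application code":     1.0,
    "system call overhead": 3.0,
    "kernel network stack": 10.0,
    "NIC":                  3.0,
    "switch":               3.0,
    "cable, 30 m @ 5 ns/m": 0.15,
}

one_way = sum(budget_us.values())
print(f"one-way ~{one_way:.2f} us, round trip ~{2 * one_way:.2f} us")
```

A full request-response also traverses the receiving host's stack, which is why measured round trips land in the tens to hundreds of microseconds.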
For a standard TCP request within a data center, total round-trip latency is typically 50-200 microseconds. For low-latency trading, even this is too much, which is why firms invest in:
Kernel Bypass
The OS kernel network stack is generic — it handles every type of traffic equally. Kernel bypass techniques (DPDK, Solarflare OpenOnload, Mellanox VMA) let applications read network data directly from the network card, skipping the kernel entirely. This can cut latency from ~50us to ~5us.
FPGA Network Cards
Taking it further, some firms use FPGAs built into the network card itself to parse market data messages before they even reach the CPU. The hardware acceleration guide covers this in more detail.
Measuring Latency
You cannot improve what you do not measure. The key metrics:
Median latency (p50) — typical performance. Important but not sufficient.
Tail latency (p99, p99.9) — worst-case performance. This is where problems hide. A system with 100us median but 50ms p99 has occasional catastrophic slowdowns.
Jitter — variance in latency. High jitter means unpredictable performance, which is worse than consistently high latency in many trading strategies.
```python
import numpy as np

# Stand-in for a real measurement harness: simulate 10,000 round-trip
# samples with a heavy right tail, as real latency distributions have.
def measure_latencies(n_samples):
    rng = np.random.default_rng(seed=42)
    return rng.lognormal(mean=4.6, sigma=0.5, size=n_samples)  # median ~100 us

latencies = measure_latencies(n_samples=10000)
print(f"Median (p50): {np.percentile(latencies, 50):.1f} us")
print(f"p90:          {np.percentile(latencies, 90):.1f} us")
print(f"p99:          {np.percentile(latencies, 99):.1f} us")
print(f"p99.9:        {np.percentile(latencies, 99.9):.1f} us")
print(f"Max:          {np.max(latencies):.1f} us")
print(f"Std dev:      {np.std(latencies):.1f} us")
```
Software Latency Optimization
Before investing in exotic hardware, software optimizations often provide the biggest gains:
Memory Allocation
Dynamic memory allocation (malloc/new) is slow and unpredictable. Low-latency systems pre-allocate all memory at startup:
```cpp
// Bad: allocating on the hot path
void process_order(const Message& msg) {
    auto order = new Order(msg);  // Heap allocation — unpredictable latency
    // ...
    delete order;
}

// Good: use a pre-allocated pool
// (simplified: assumes acquire/release happen in LIFO order)
class OrderPool {
    std::array<Order, 1024> pool_;  // all memory reserved up front
    size_t next_ = 0;
public:
    Order* acquire() { return &pool_[next_++]; }
    void release()   { --next_; }
};
```
Lock-Free Data Structures
Mutex locks cause threads to sleep and wake — adding microseconds of latency. Lock-free queues using atomic operations avoid this entirely.
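A real lock-free queue relies on atomic operations with explicit memory ordering (e.g. C++ `std::atomic`). As an illustrative sketch of the same non-blocking structure, here is a single-producer/single-consumer ring buffer in Python, safe here only because CPython's GIL makes the index updates atomic:

```python
class SPSCQueue:
    """Single-producer, single-consumer ring buffer (illustrative sketch).

    Producer writes only _tail; consumer writes only _head. Neither
    operation blocks: a full push or empty pop simply fails, and the
    caller decides whether to retry or drop.
    """
    def __init__(self, capacity: int):
        self._buf = [None] * capacity
        self._cap = capacity
        self._head = 0  # next slot to read (consumer-owned)
        self._tail = 0  # next slot to write (producer-owned)

    def push(self, item) -> bool:
        nxt = (self._tail + 1) % self._cap
        if nxt == self._head:        # full: one slot kept empty as sentinel
            return False
        self._buf[self._tail] = item
        self._tail = nxt
        return True

    def pop(self):
        if self._head == self._tail:  # empty
            return None
        item = self._buf[self._head]
        self._head = (self._head + 1) % self._cap
        return item
```

The key design point carries over to the real thing: each index has exactly one writer, so no lock is ever needed.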
Avoid System Calls
Every system call (file I/O, network I/O, memory mapping) involves a context switch between user space and kernel space. On the hot path, minimize or eliminate them.
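The cost is easiest to see with buffered I/O, whose whole purpose is to amortize system calls. A small sketch, using an invented `CountingRaw` sink to stand in for the kernel:

```python
import io

class CountingRaw(io.RawIOBase):
    """Raw sink that counts write() calls — each one stands in for a syscall."""
    def __init__(self):
        self.calls = 0
    def writable(self):
        return True
    def write(self, b):
        self.calls += 1
        return len(b)

raw = CountingRaw()
buffered = io.BufferedWriter(raw, buffer_size=4096)
for _ in range(1000):
    buffered.write(b"x" * 16)   # 1000 logical writes of 16 bytes each
buffered.flush()
print(raw.calls)  # a handful of large writes instead of 1000 small ones
```

The same batching idea shows up everywhere on the hot path: coalesce work in user space, cross into the kernel as rarely as possible.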
CPU Pinning and Isolation
Dedicate specific CPU cores to latency-critical threads. Prevent the OS from scheduling other work on those cores:
```shell
# Isolate CPU cores 2 and 3 from the OS scheduler
# (kernel boot parameter)
isolcpus=2,3

# Pin a process to specific cores
taskset -c 2 ./trading_engine
```
The Latency Hierarchy in Practice
For most financial applications (not HFT), the priorities are:
- Architecture — are you making unnecessary network calls? Can you cache data locally?
- Database queries — are your queries optimized with proper indexes?
- Serialization — are you using efficient data formats? JSON is slower than binary formats.
- Connection management — are you reusing connections or opening new ones for each request?
- Code efficiency — are your algorithms appropriate? Are hot paths written in a suitably fast language?
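As an illustration of the serialization point, compare a JSON encoding of a hypothetical tick against a fixed-width binary layout, using only the standard library (the field names and `'<qdi'` layout — int64 timestamp, float64 price, int32 size — are assumptions for the example):

```python
import json
import struct

# Hypothetical tick: (timestamp_ns, price, size)
tick = (1_700_000_000_000_000_000, 101.25, 300)

# JSON: human-readable, self-describing, but large and slow to parse.
as_json = json.dumps({"ts": tick[0], "px": tick[1], "sz": tick[2]}).encode()

# Fixed-width binary: '<qdi' = little-endian int64 + float64 + int32 = 20 bytes.
as_bin = struct.pack("<qdi", *tick)

print(len(as_json), len(as_bin))                  # binary is several times smaller
assert struct.unpack("<qdi", as_bin) == tick      # round-trips exactly
```

Binary formats also decode in constant time at fixed offsets, whereas JSON must be lexed character by character — which is why market data protocols use fixed binary layouts.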
Only HFT firms need to worry about kernel bypass, FPGA, and CPU cache optimization. But understanding the full picture helps everyone make better design decisions. Even reducing API latency from 200ms to 20ms by adding a cache can transform the user experience of a trading application.
The difference between "fast enough" and "needs hardware optimization" depends entirely on your use case. Understanding networking fundamentals helps you identify where the bottleneck actually is before investing in solutions.
Keep Reading
- [Networking Fundamentals Every Developer Should Understand](/quant-knowledge/networking/networking-fundamentals) (Networking) — How the internet works under the hood — DNS, TCP/IP, HTTP, firewalls, and the networking concepts that matter for building financial applications.
- [Rust for Low-Latency Trading Systems](/quant-knowledge/systems/rust-for-low-latency-trading-systems) (Systems Programming) — Why Rust is gaining traction in finance — memory safety without garbage collection, zero-cost abstractions, and the performance characteristics that matter for trading.
- [Hardware Acceleration for Quantitative Finance](/quant-knowledge/systems/hardware-acceleration-for-quantitative-finance) (Systems Programming) — JIT compilation, SIMD instructions, GPU computing with CUDA, and FPGAs — the hardware acceleration techniques used in high-performance financial systems.
- [C++ in Quantitative Finance](/quant-knowledge/systems/cpp-in-quantitative-finance) (Systems Programming) — Why C++ remains the language of choice for performance-critical finance — low-latency trading, derivatives pricing, and the modern C++ features that matter.
What You Will Learn
- Explain why nanoseconds matter.
- Map where latency lives, from the speed of light to the kernel.
- Measure latency with percentiles and jitter.
- Apply software latency optimizations.
- Prioritize work using the latency hierarchy in practice.
Prerequisites
- Networking fundamentals; see the Networking Fundamentals article.
- Comfort reading code and basic statistical notation.
- Curiosity about how the topic shows up in a US trading firm.
Mental Model
In US equities, the speed of light dictates the playing field. Every microsecond of switch fabric, fiber length, and protocol overhead is either an opportunity or a tax. Networking is where physics meets P&L. Frame this topic as the answer to three questions — why latency matters, how to measure it, and where bottlenecks hide, from co-location to kernel bypass — and ask what would break if you removed it from the workflow.
Why This Matters in US Markets
The colocation market in US equities and futures is its own ecosystem. NY4, NY5, CH2, and CME Aurora all sell rack space, cross-connects, and microwave/laser links. Wireless carriers like McKay Brothers and Anova run point-to-point links between Aurora and Carteret because fiber is too slow.
In US markets, Network Speeds and Latency in Trading tends to surface during onboarding, code review, and the first incident a junior quant gets pulled into. Questions on this material recur in interviews at Citadel, Two Sigma, Jane Street, HRT, Jump, DRW, IMC, Optiver, and the major bulge-bracket banks.
Common Mistakes
- Tuning mean latency while ignoring the 99.9th percentile that actually drives risk.
- Letting NIC interrupt coalescing run with default settings on a low-latency host.
- Forgetting that microwave links degrade in heavy precipitation.
- Treating Network Speeds and Latency in Trading as a one-off topic rather than the foundation it becomes once you ship code.
- Skipping the US-market context — copying European or Asian conventions and getting bitten by US tick sizes, settlement, or regulator expectations.
- Optimizing for elegance instead of auditability; trading regulators care about reproducibility, not cleverness.
- Confusing model output with reality — the tape is the source of truth, the model is a hypothesis.
Practice Questions
- Why does a US futures shop pay six figures a year for a microwave link between Aurora and Carteret?
- When is TCP unsuitable for market data and why?
- What does jitter mean in trading, and why is it sometimes more important than mean latency?
- Describe a multicast feed handler's failure mode when a switch drops a sequence number.
- Why does NIC interrupt coalescing matter for a co-located trading process?
Answers and Explanations
- Because microwave travels in a nearly straight line through the air, where light moves about 50% faster than in glass, while fiber follows roads and rights-of-way; the few milliseconds saved between Aurora and Carteret determine who quotes the spread first.
- When data volume and latency requirements outpace TCP's congestion control and retransmit semantics; UDP multicast is preferred because the application can drop, replay, or interpolate without head-of-line blocking.
- Jitter is the variance of latency; tail latency drives risk because a single 99th-percentile spike crosses an exchange deadline and produces a stale quote. Many strategies optimize tail more than mean.
- The handler detects the gap, requests a retransmission from a TCP recovery channel, and either re-orders or marks the stream stale; the strategy must decide whether to keep quoting or to step out.
- Coalescing batches interrupts to save CPU but adds latency and jitter; HFT firms turn coalescing off and pin the NIC interrupt to a dedicated core, trading CPU for predictable response times.
Glossary
- Latency — the time for data to travel from source to destination (one-way) or there and back (round trip); milliseconds between cities, microseconds within a data center, nanoseconds within a rack.
- Throughput — bandwidth, in bits per second.
- Jitter — variance in latency; often more harmful than absolute latency for trading.
- Microwave / mmWave — wireless point-to-point links faster than fiber over the same path.
- Cross-connect — a physical patch cable inside a colo facility.
- Multicast — one-to-many delivery used by SIP, OPRA, CME ESS.
- TCP — connection-oriented, reliable, in-order; used for orders.
- UDP — connectionless, no retransmit; used for market data feeds.
Further Study Path
- Networking Fundamentals — DNS, TCP/IP, HTTP, firewalls — how the internet works and why every finance dev should understand it.
- Security and Authentication for Fintech — Auth, encryption, common vulnerabilities — the security mindset every financial app developer needs.
- Python for Quant Finance: Fundamentals — Variables, functions, data structures, classes, and error handling — the core Python every quant role expects.
- Advanced Python for Financial Applications — Decorators, generators, and context managers — the patterns that separate beginner Python from production quant code.
- NumPy for Quantitative Finance — Why array operations power everything from portfolio risk to Monte Carlo — and why they outpace plain Python.
Key Learning Outcomes
- Explain why nanoseconds matter and where latency lives, from the speed of light to the kernel network stack.
- Measure latency correctly: median, tail percentiles (p99, p99.9), and jitter.
- Describe the main software latency optimizations and apply the latency hierarchy in practice.
- Give a one-paragraph elevator pitch on network latency in trading suitable for a phone screen or onsite at a US quant shop (Citadel, Two Sigma, Jane Street, HRT).
- Recognize common production failure modes — interrupt coalescing left at defaults, dropped multicast sequence numbers — and know when low-latency engineering is the wrong tool.
- Describe how network latency interacts with order management and risk gates in a US trading stack, and the US regulatory framing (SEC, CFTC, FINRA) around it.
- Run a back-of-the-envelope sanity check (speed of light, component budget) on any claimed latency figure.
- Defend a latency-related design choice in code review, and spot the day-one mistake a junior would make along with the senior's fix.