Debugging API Latency: A Step-by-Step Approach

Your p99 response time just doubled and someone's pinging you about it. Before you start randomly restarting services, here's a systematic way to narrow down the problem.

Step 1: Confirm the problem exists outside your monitoring

First, rule out a monitoring artifact. Run a quick check from your own machine and from a different network. Use curl with timing output to get a real measurement.

~ timing breakdown
$ curl -o /dev/null -s -w "dns: %{time_namelookup}s
connect: %{time_connect}s
tls: %{time_appconnect}s
ttfb: %{time_starttransfer}s
total: %{time_total}s\n" https://api.example.com/health

dns: 0.024s
connect: 0.051s
tls: 0.128s
ttfb: 0.847s
total: 0.852s

This immediately tells you where the time is going. In this example, the gap between TLS handshake completion (128ms) and time to first byte (847ms) is 719ms — that's your server processing time. DNS and connection overhead are negligible.

Step 2: Isolate the layer

API latency breaks down into a few layers, and the fix depends entirely on which one is slow:

  • DNS resolution (typically 5-50ms): Slow DNS usually points to misconfigured resolvers or a DNS provider having issues. Also check whether very low TTLs are forcing a fresh lookup on every request instead of hitting the cache.
  • TCP + TLS handshake (typically 20-100ms): High connect times often mean geographic distance between client and server, or the server is slow to accept connections (overloaded, too many connections).
  • Server processing / TTFB (variable): This is where most problems live. The server received the request and is taking too long to generate a response.
  • Transfer time (depends on payload size): If total time minus TTFB is high, you're sending too much data. Check response sizes.
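
To make the triage mechanical, the four layers can be computed directly from curl's timing variables. A minimal sketch in Python (the dictionary keys mirror curl's `-w` variables; the numbers are the ones from the example above):

```python
# Split a curl timing breakdown into the four layers and report the
# dominant one. Keys mirror curl's -w write-out variables.
def dominant_layer(t: dict) -> str:
    phases = {
        "dns": t["time_namelookup"],
        "tcp+tls": t["time_appconnect"] - t["time_namelookup"],
        "server (ttfb)": t["time_starttransfer"] - t["time_appconnect"],
        "transfer": t["time_total"] - t["time_starttransfer"],
    }
    return max(phases, key=phases.get)

# The measurements from the health-check example above:
timings = {
    "time_namelookup": 0.024,
    "time_appconnect": 0.128,
    "time_starttransfer": 0.847,
    "time_total": 0.852,
}
print(dominant_layer(timings))  # server (ttfb)
```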

Step 3: Narrow down server-side causes

If TTFB is the problem (it usually is), you need to figure out what the server is waiting on. Common culprits, roughly in order of how often we see them:

Database queries

The single most common cause of API latency. Check your slow query log. Look for queries that are doing full table scans, missing indexes, or joining too many tables. A query that takes 5ms on a small dataset can take 500ms when the table grows.

Quick diagnostic: if latency scales with data volume or specific query parameters, it's almost certainly a database issue.
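
One way to confirm a missing index without touching production data is to ask the database for its query plan. A sketch using SQLite as a stand-in (the table and column names are invented; on PostgreSQL you'd reach for EXPLAIN ANALYZE instead):

```python
import sqlite3

# Demonstrate the difference an index makes to the query plan.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER)")

query = "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?"
plan_before = db.execute(query, (42,)).fetchone()[-1]
print(plan_before)  # a full table scan, e.g. "SCAN orders"

db.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = db.execute(query, (42,)).fetchone()[-1]
print(plan_after)   # an index search, e.g. "SEARCH orders USING INDEX ..."
```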

N+1 queries

A special case of database problems common enough to deserve its own section. You're fetching a list of items, then making a separate query for each item's related data. 50 items becomes 51 queries. Your ORM is probably hiding this from you.

Enable query logging for a single request and count the queries. If you see the same query template repeated dozens of times with different IDs, that's your N+1.
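
The counting trick is easy to demonstrate. A self-contained sketch using SQLite's statement trace hook as the query log (any query logger works; the schema is invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY);
    CREATE TABLE details (item_id INTEGER, note TEXT);
""")
db.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(50)])

queries = []
db.set_trace_callback(queries.append)  # log every statement executed

# The N+1 pattern: one list query, then one query per item.
for (item_id,) in db.execute("SELECT id FROM items").fetchall():
    db.execute("SELECT note FROM details WHERE item_id = ?", (item_id,))
n_plus_one = len(queries)
print(n_plus_one)  # 51 -- the telltale count

# The fix: fetch the related data in a single JOIN.
queries.clear()
db.execute("SELECT i.id, d.note FROM items i "
           "LEFT JOIN details d ON d.item_id = i.id").fetchall()
print(len(queries))  # 1
```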

External API calls

Your API might be calling another API that's slow. This is common in microservice architectures where a single request fans out to multiple services. Add timeouts to all external calls (you'd be surprised how many HTTP clients default to no timeout), and consider whether any of those calls can be made in parallel or cached.
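
The serial-vs-parallel difference is easy to see in a simulation. A sketch where `time.sleep` stands in for the downstream HTTP calls (the service names are invented):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def call_service(name: str) -> str:
    time.sleep(0.1)  # stand-in for a 100ms downstream HTTP call
    return f"{name}: ok"

services = ["auth", "profile", "billing"]

# Serial fan-out: latencies add up (~300ms here).
start = time.perf_counter()
serial = [call_service(s) for s in services]
serial_time = time.perf_counter() - start

# Parallel fan-out: total is roughly the slowest single call (~100ms).
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(services)) as pool:
    parallel = list(pool.map(call_service, services))
parallel_time = time.perf_counter() - start

print(f"serial: {serial_time:.2f}s, parallel: {parallel_time:.2f}s")
```

Parallelizing only helps when the calls are independent; real HTTP clients also let you pass an explicit per-request timeout, which is worth setting regardless.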

Connection pool exhaustion

If latency spikes are intermittent and correlate with traffic, check your connection pools. Database connection pools, HTTP client pools, and Redis connection pools all have limits. When they're exhausted, new requests queue up waiting for a connection.

~ check connection pool
# PostgreSQL: check active connections vs limit
$ psql -c "SELECT count(*) as active,
  (SELECT setting FROM pg_settings
   WHERE name='max_connections') as max
  FROM pg_stat_activity
  WHERE state = 'active';"

 active | max
--------+-----
     94 | 100

94 out of 100 connections active. That's your problem.
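
You can reproduce the queueing effect in a few lines. A toy simulation with a semaphore standing in for the pool (pool size and timings are arbitrary):

```python
import threading
import time

POOL_SIZE = 2
pool = threading.BoundedSemaphore(POOL_SIZE)  # stand-in for a connection pool
wait_times = []

def handle_request():
    start = time.perf_counter()
    with pool:  # block until a "connection" frees up
        wait_times.append(time.perf_counter() - start)
        time.sleep(0.1)  # the query itself is fast and constant

threads = [threading.Thread(target=handle_request) for _ in range(6)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The first two requests wait ~0ms; later ones queue for 100-200ms
# even though every query costs the same 100ms.
print(f"max wait: {max(wait_times)*1000:.0f}ms")
```

This is why pool exhaustion looks like slow queries in your metrics: the wait happens before the query even starts.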

Garbage collection pauses

If you're running on a garbage-collected runtime (Node.js, JVM, Go, Python), GC pauses can cause latency spikes that are hard to correlate with anything else. They tend to show up as periodic p99 spikes that don't correlate with traffic patterns. Check GC logs and consider tuning heap sizes or GC algorithms.
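
On CPython you can measure pause lengths directly with `gc.callbacks` (CPython-specific; the JVM, Go, and Node.js expose GC logs or flags instead). A minimal sketch:

```python
import gc
import time

pauses = []
_started = [0.0]

def gc_timer(phase, info):
    # CPython invokes callbacks with phase "start" and "stop".
    if phase == "start":
        _started[0] = time.perf_counter()
    else:
        pauses.append(time.perf_counter() - _started[0])

gc.callbacks.append(gc_timer)
gc.collect()  # force one collection so the demo has something to show
gc.callbacks.remove(gc_timer)

print(f"{len(pauses)} collection(s), longest {max(pauses)*1000:.3f}ms")
```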

Step 4: Measure, don't guess

Resist the urge to add caching and call it done. Caching hides latency problems; it doesn't fix them. Before you reach for Redis, make sure you understand what's actually slow.

Add structured logging with timing for each phase of request processing: authentication, authorization, data fetching, serialization. Most frameworks support middleware-level timing that's easy to add and immediately reveals where requests spend their time.
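
A minimal framework-agnostic version of that phase timing (the phase names and sleeps below are stand-ins, not a real handler):

```python
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(phase: str):
    # Record wall time for one phase of request processing.
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[phase] = time.perf_counter() - start

def handle_request():
    with timed("auth"):
        time.sleep(0.01)   # stand-in for token validation
    with timed("db"):
        time.sleep(0.05)   # stand-in for data fetching
    with timed("serialize"):
        time.sleep(0.005)  # stand-in for building the response

handle_request()
for phase, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{phase}: {seconds*1000:.1f}ms")
```

Emit these timings as structured log fields rather than free text, so you can aggregate them per endpoint later.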

Step 5: Validate the fix

After making changes, compare the same percentile metrics (p50, p95, p99) over the same time window. A fix that improves p50 but doesn't touch p99 probably only fixed the easy case and left the real problem in place.
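
To compare fairly, compute the percentiles the same way on both windows. A sketch using Python's statistics module (the sample latencies are made up to show exactly this p50-only "fix"):

```python
import statistics

def pct(samples, p):
    # quantiles(n=100) returns the 99 cut points p1..p99
    return statistics.quantiles(samples, n=100)[p - 1]

before = [120] * 90 + [900] * 10  # ms: fast median, slow tail
after = [80] * 90 + [850] * 10    # p50 improved, tail barely moved

for p in (50, 95, 99):
    print(f"p{p}: {pct(before, p):.0f}ms -> {pct(after, p):.0f}ms")
```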

Run your monitoring checks from multiple regions to make sure the fix isn't region-dependent. And set up a latency alert for the endpoint so you'll know quickly if the problem comes back.

The best time to set up latency monitoring is before you have a latency problem. The second best time is right after you fix one.
