Your p99 response time just doubled and someone's pinging you about it. Before you start randomly restarting services, here's a systematic way to narrow down the problem.
Step 1: Confirm the problem exists outside your monitoring
First, rule out a monitoring artifact. Run a quick check from your own machine and from a different network. Use curl with timing output to get a real measurement.
```
$ curl -o /dev/null -s -w "dns: %{time_namelookup}s connect: %{time_connect}s tls: %{time_appconnect}s ttfb: %{time_starttransfer}s total: %{time_total}s\n" https://api.example.com/health
dns: 0.024s connect: 0.051s tls: 0.128s ttfb: 0.847s total: 0.852s
```
This immediately tells you where the time is going. In this example, the gap between TLS handshake completion (128ms) and time to first byte (847ms) is 719ms — that's your server processing time. DNS and connection overhead are negligible.
Step 2: Isolate the layer
API latency breaks down into a few layers, and the fix depends entirely on which one is slow:
- DNS resolution (typically 5-50ms): Slow DNS usually points to misconfigured resolvers or a DNS provider having issues. Check if you're seeing TTL-based caching problems.
- TCP + TLS handshake (typically 20-100ms): High connect times usually mean geographic distance between client and server, or a server that is slow to accept connections (overloaded, or a full listen backlog).
- Server processing / TTFB (variable): This is where most problems live. The server received the request and is taking too long to generate a response.
- Transfer time (depends on payload size): If total time minus TTFB is high, you're sending too much data. Check response sizes.
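The curl `%{time_*}` variables are cumulative from the start of the request, so each layer's cost is the difference between successive milestones. A quick sketch of that arithmetic (the numbers are the example measurement from Step 1):

```python
# Break cumulative curl timings into per-layer durations.
def layer_breakdown(t):
    return {
        "dns": t["time_namelookup"],
        "tcp_connect": t["time_connect"] - t["time_namelookup"],
        "tls": t["time_appconnect"] - t["time_connect"],
        "server_ttfb": t["time_starttransfer"] - t["time_appconnect"],
        "transfer": t["time_total"] - t["time_starttransfer"],
    }

# The example measurement from Step 1 (seconds)
timings = {
    "time_namelookup": 0.024,
    "time_connect": 0.051,
    "time_appconnect": 0.128,
    "time_starttransfer": 0.847,
    "time_total": 0.852,
}

b = layer_breakdown(timings)
print(max(b, key=b.get))  # server_ttfb -- the 719ms gap is server processing
```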
Step 3: Narrow down server-side causes
If TTFB is the problem (it usually is), you need to figure out what the server is waiting on. Common culprits, roughly in order of how often we see them:
Database queries
The single most common cause of API latency. Check your slow query log. Look for queries that are doing full table scans, missing indexes, or joining too many tables. A query that takes 5ms on a small dataset can take 500ms when the table grows.
Quick diagnostic: if latency scales with data volume or specific query parameters, it's almost certainly a database issue.
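You can see the scan-vs-index difference directly in a query plan. A small illustration with SQLite (any SQL database behaves similarly; the table and column names here are made up):

```python
# Same query, with and without an index: the plan goes from a full table
# scan to an index lookup. (Hypothetical schema, for illustration only.)
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

plan_before = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 42"
).fetchone()[-1]
print(plan_before)  # a SCAN of the whole table

db.execute("CREATE INDEX idx_orders_user ON orders (user_id)")
plan_after = db.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE user_id = 42"
).fetchone()[-1]
print(plan_after)   # a SEARCH using idx_orders_user
```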
N+1 queries
The special case of database problems that deserves its own section. You're fetching a list of items, then making a separate query for each item's related data. 50 items becomes 51 queries. Your ORM is probably hiding this from you.
Enable query logging for a single request and count the queries. If you see the same query template repeated dozens of times with different IDs, that's your N+1.
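The counting itself is mechanical: normalize each logged query by replacing literal values with a placeholder, then count repeats of each template. A minimal sketch (log format, table names, and the threshold are all illustrative):

```python
# Detect a likely N+1 by counting repeated query templates in one
# request's query log.
import re
from collections import Counter

def normalize(sql):
    sql = re.sub(r"'[^']*'", "?", sql)   # string literals -> ?
    sql = re.sub(r"\b\d+\b", "?", sql)   # numeric literals -> ?
    return sql

# Simulated log: one list query, then one lookup per item (the N+1 shape)
query_log = ["SELECT * FROM posts LIMIT 50"] + [
    f"SELECT * FROM authors WHERE id = {i}" for i in range(50)
]

templates = Counter(normalize(q) for q in query_log)
template, count = templates.most_common(1)[0]
if count > 10:  # arbitrary threshold
    print(f"possible N+1: {count}x {template}")
# possible N+1: 50x SELECT * FROM authors WHERE id = ?
```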
External API calls
Your API might be calling another API that's slow. This is common in microservice architectures where a single request fans out to multiple services. Add timeouts to all external calls (you'd be surprised how many HTTP clients default to no timeout), and consider whether any of those calls can be made in parallel or cached.
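The parallel-fan-out idea can be sketched with a thread pool. The two fetch functions below are stand-ins for real HTTP calls (with a client like `requests` you would also pass a `timeout=` on each call); the point is that independent downstream calls overlap instead of adding up:

```python
# Fan out independent external calls in parallel and bound the wait for
# each with a timeout. fetch_profile/fetch_orders simulate slow services.
import time
from concurrent.futures import ThreadPoolExecutor

def fetch_profile(user_id):
    time.sleep(0.2)  # simulated slow downstream service
    return {"user": user_id}

def fetch_orders(user_id):
    time.sleep(0.2)  # simulated slow downstream service
    return [{"order": 1}]

start = time.monotonic()
with ThreadPoolExecutor() as pool:
    profile_future = pool.submit(fetch_profile, 7)
    orders_future = pool.submit(fetch_orders, 7)
    # result(timeout=...) caps how long we wait for each downstream call
    profile = profile_future.result(timeout=1.0)
    orders = orders_future.result(timeout=1.0)
elapsed = time.monotonic() - start
print(f"{elapsed:.2f}s")  # ~0.2s in parallel instead of ~0.4s sequential
```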
Connection pool exhaustion
If latency spikes are intermittent and correlate with traffic, check your connection pools. Database connection pools, HTTP client pools, and Redis connection pools all have limits. When they're exhausted, new requests queue up waiting for a connection.
```
# PostgreSQL: check active connections vs limit
$ psql -c "SELECT count(*) as active, (SELECT setting FROM pg_settings WHERE name='max_connections') as max FROM pg_stat_activity WHERE state = 'active';"
 active | max
--------+-----
     94 | 100
```
94 out of 100 connections active. That's your problem.
Garbage collection pauses
If you're running on a garbage-collected runtime (Node.js, the JVM, Go, Python), GC pauses can cause latency spikes that are hard to attribute to anything else. They tend to show up as periodic p99 spikes that don't track traffic patterns. Check GC logs and consider tuning heap sizes or GC settings.
Step 4: Measure, don't guess
Resist the urge to add caching and call it done. Caching hides latency problems; it doesn't fix them. Before you reach for Redis, make sure you understand what's actually slow.
Add structured logging with timing for each phase of request processing: authentication, authorization, data fetching, serialization. Most frameworks support middleware-level timing that's easy to add and immediately reveals where requests spend their time.
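A context manager is enough to get per-phase timing even without framework support. A minimal sketch (the phase names and the `handle_request` handler are illustrative; the sleeps stand in for real work):

```python
# Time each phase of request handling and emit one structured record.
import time
from contextlib import contextmanager

phases = {}

@contextmanager
def timed(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        phases[name] = time.perf_counter() - start

def handle_request():
    with timed("auth"):
        time.sleep(0.005)   # stand-in for token validation
    with timed("db"):
        time.sleep(0.050)   # stand-in for data fetching
    with timed("serialize"):
        time.sleep(0.002)   # stand-in for response encoding

handle_request()
# Log one line per request; the slowest phase is the lead suspect.
print({k: round(v * 1000, 1) for k, v in phases.items()})
```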
Step 5: Validate the fix
After making changes, compare the same percentile metrics (p50, p95, p99) over the same time window. A fix that improves p50 but doesn't touch p99 probably only fixed the easy case and left the real problem in place.
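To make that comparison concrete, a nearest-rank percentile over a list of response times is all you need. A sketch with made-up data showing exactly the failure mode above: p50 improves while the tail stays put.

```python
# Nearest-rank percentile over response times (milliseconds, made-up data).
def percentile(samples, p):
    ordered = sorted(samples)
    idx = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[idx]

before = [80] * 90 + [400] * 10   # slow median, bad tail
after  = [40] * 90 + [400] * 10   # "fix" halved p50 but left the tail

for name, samples in [("before", before), ("after", after)]:
    print(name, {p: percentile(samples, p) for p in (50, 95, 99)})
# p50 dropped from 80 to 40, but p99 is still 400: the real problem remains
```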
Run your monitoring checks from multiple regions to make sure the fix isn't region-dependent. And set up a latency alert for the endpoint so you'll know quickly if the problem comes back.
The best time to set up latency monitoring is before you have a latency problem. The second best time is right after you fix one.