I/O overhead occurs when an application spends too much time reading or writing data instead of completing useful work. In database-driven systems, the database is often called “slow” when it is actually executing inefficient access patterns created by the application.
The first goal is not to redesign the database. It is to prove that storage waits exist, identify the query or write path producing them, and reduce unnecessary movement of data.
What I/O Overhead Looks Like
A database-related I/O bottleneck often appears through several signals at the same time:
Slow APIs, Normal App CPU
Requests slow down even though the application process is not close to CPU saturation.
High Storage Wait
The server reports elevated disk latency, large I/O queues, or high %iowait.
Heavy Database Activity
PostgreSQL performs large reads or writes while the application process performs relatively little disk traffic.
Growth Makes It Worse
Query time increases as tables, logs, history records, or related objects become larger.
Read overhead commonly comes from table scans, weak indexes, oversized results, repeated relationship queries, and analytics traffic sharing resources with user requests.
Write overhead commonly comes from unnecessary updates, excessive indexes, large transaction batches, append-only tables growing without control, checkpoints, vacuum work, or replication pressure.
Step 1: Confirm Storage Pressure
Run this on the affected Linux server to measure device latency, I/O queueing, and whether storage is remaining busy during the slowdown:
iostat -xz 1 5
Example output:
Device r/s w/s r_await w_await aqu-sz %util
nvme0n1 1410.0 722.0 38.20 52.70 22.14 99.40
| Signal | Interpretation |
|---|---|
High r_await or w_await | Reads or writes are taking longer |
Growing aqu-sz | I/O requests are queueing |
High %util plus high latency | Storage is a likely constraint |
High %util alone | Not enough evidence |
Run this on the same server to identify which process is generating the storage traffic:
pidstat -d 1 5
Example output:
UID PID kB_rd/s kB_wr/s Command
999 1732 78120.00 22740.00 postgres
1001 28417 320.00 680.00 node
This result points the investigation toward PostgreSQL queries and write behaviour rather than application-side CPU optimization.
Step 2: Find the Expensive Queries
PostgreSQL's pg_stat_statements extension records execution statistics for normalized SQL statements. Enable it through the normal database change process before relying on its statistics.
Run this in PostgreSQL to enable query-level execution tracking when it is not already available:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Run this in PostgreSQL to identify queries that consume the most total execution time and read many shared blocks:
SELECT
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS avg_ms,
shared_blks_read,
shared_blks_hit,
rows,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
Interpret the result in two directions:
| Pattern | Why It Matters |
|---|---|
| High calls and high total time | Frequent query is continuously taxing the system |
| Low calls but very high average time | Rare query may create latency spikes or reporting pressure |
High shared_blks_read | Query repeatedly needs blocks from storage |
| High hits with acceptable latency | Data is often served from memory instead of disk |
Do not optimize only the slowest single query. A moderately slow query called thousands of times may consume more system capacity than an occasional report query.
Step 3: Inspect the Query Plan
After locating an expensive query, inspect how PostgreSQL reaches the result. On production systems, use EXPLAIN (ANALYZE, BUFFERS) carefully because it executes the query.
Run this in PostgreSQL for a confirmed expensive user-facing query to see whether rows are found through an index or scanned from a larger table:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status, total_amount, created_at
FROM orders
WHERE customer_id = 8821
ORDER BY created_at DESC
LIMIT 20;
Example plan before a suitable index exists:
Limit (actual time=184.101..184.118 rows=20 loops=1)
Buffers: shared hit=92 read=18342
-> Sort (actual time=184.099..184.103 rows=20 loops=1)
Sort Key: created_at DESC
-> Seq Scan on orders
Filter: (customer_id = 8821)
Rows Removed by Filter: 1924830
Execution Time: 184.166 ms
| Plan Signal | Meaning |
|---|---|
Seq Scan | PostgreSQL is examining the table instead of directly finding matching rows |
Large Rows Removed by Filter | Much of the read work produces no returned value |
High shared read | Many blocks are loaded from storage |
Sort before a small LIMIT | Index order may not match the requested result order |
For this access pattern, an index that supports both filtering and ordering can reduce reads sharply.
Run this in PostgreSQL through the approved migration process to create an index for recent orders by customer without blocking ordinary writes during creation:
CREATE INDEX CONCURRENTLY idx_orders_customer_created_at
ON orders (customer_id, created_at DESC);
A good index matches a frequent, valuable query path. It should not be added merely because a column appears in a filter.
Strategies That Reduce I/O
1. Index Real Access Paths
Support selective filters, frequent joins, and common filter-plus-sort patterns. Prefer indexes justified by measured queries.
2. Read Less Data
Select only required columns, paginate list endpoints, require bounded history ranges, and avoid returning large JSON or text fields unnecessarily.
3. Remove Repeated Reads
Replace N+1 query patterns with bounded joins, batched WHERE id IN (...) reads, ORM preloading, or caching for stable reference data.
4. Reduce Write Amplification
Avoid updates that do not change values, review unused indexes on write-heavy tables, and process background writes in controlled batches.
5. Separate Read Workloads
Route reporting, analytics, exports, or delay-tolerant dashboards to read replicas when primary read load becomes significant.
6. Manage Large Historical Data
For logs, events, or time-series tables, use retention, summaries, or partitioning when queries naturally operate over time ranges.
These strategies are not interchangeable. Indexing helps a query find rows efficiently; pagination reduces the amount requested; replicas redistribute eligible reads; write control prevents the primary database from spending excessive effort maintaining changes.
Important Trade-Offs
More Indexes Can Increase Write Cost
An index may make an important read query much faster, but each insert, update, or delete may also need to maintain that index. Large, unused indexes on frequently written tables should be reviewed.
Run this in PostgreSQL to identify indexes with low recorded scan usage and significant size for further investigation:
SELECT
relname AS table_name,
indexrelname AS index_name,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
A low idx_scan value makes an index a candidate for review, not automatic deletion. It may still support rare operational tasks, constraints, failover workloads, or periodic reports.
Read Replicas Can Return Stale Data
Read replicas can protect the primary database from delay-tolerant read traffic, but they may lag behind recent writes.
User changes profile -> primary commits update -> replica has not replayed it yet -> user reads old profile
Use the primary for flows requiring immediate read-after-write consistency. Route tolerant reads, such as historical reports or dashboards, to replicas only when slight delay is acceptable.
Partitioning Is Not a Replacement for Query Design
Partitioning can help very large, time-bounded datasets such as audit logs or events. It does not fix an application that selects unnecessary columns, requests unbounded rows, or lacks useful indexes.
A Practical Resolution Workflow
1. Confirm
Use Linux metrics to verify that storage waiting, rather than application CPU, is affecting latency.
2. Locate
Confirm that PostgreSQL is generating the reads or writes, then find costly statements with pg_stat_statements.
3. Explain
Use query plans and buffer statistics to understand whether the cost comes from scans, sorting, repeated reads, or write work.
4. Reduce
Add justified indexes, return fewer rows and columns, eliminate N+1 reads, and avoid unnecessary updates.
5. Separate
Move delay-tolerant reads or analytical workloads away from critical transactional traffic when required.
6. Measure Again
Compare query time, blocks read, storage latency, API percentiles, write cost, and replica lag after each change.
The Main Principle
I/O overhead usually means the system is moving more data than the product behaviour requires.
Reduce the amount of data read, reduce unnecessary writes, index the access paths that matter, and isolate heavy workloads only after measurement proves they are responsible. A database becomes faster not only when storage improves, but when the application asks it to do less unnecessary work.
I/O overhead,也就是读写开销过高,代表应用程序花了太多时间从储存设备读取数据,或把变更写入储存设备,而不是推进真正对用户有价值的工作。
在数据库驱动的系统中,人们经常直接说“数据库太慢”。但很多时候,数据库只是忠实地执行了应用程序设计出来的低效率读取方式,例如扫过大量无关资料、反复读取关联对象,或者写入了不必要的更新。
I/O 瓶颈如何出现
数据库相关的 I/O 问题,通常不会只表现为一个指标异常,而是多个现象同时出现:
API 变慢,但应用 CPU 不高
应用程序并没有忙于计算,请求却持续延迟,说明它可能正在等待外部读写完成。
储存等待明显增加
磁盘延迟提高、I/O 队列变长,或者服务器上的 %iowait 显著上升。
数据库产生大量读写
PostgreSQL 持续读取或写入大量资料,而 Node.js 等应用进程本身磁盘活动较低。
资料越多,查询越慢
随着订单、日志、事件或历史记录增长,原本可接受的查询开始明显拖慢。
读取压力通常来自全表扫描、索引不适合、查询返回太多资料、N+1 查询,或分析型任务和正常用户请求争用同一个数据库。
写入压力通常来自没有实际变化的 update、过多索引、大型 transaction、不断增长的日志表,以及 checkpoint、vacuum 或 replication 带来的额外工作。
步骤一:确认问题来自储存等待
在受影响的 Linux 服务器上执行以下命令,用来检查设备读写延迟、I/O 排队情况,以及储存设备在 API 变慢期间是否持续忙碌:
iostat -xz 1 5
输出范例:
Device r/s w/s r_await w_await aqu-sz %util
nvme0n1 1410.0 722.0 38.20 52.70 22.14 99.40
| 观察指标 | 代表意义 |
|---|---|
r_await 或 w_await 偏高 | 读取或写入完成速度变慢 |
aqu-sz 持续上升 | 更多 I/O 请求正在排队 |
%util 高,同时延迟也高 | 储存可能已经成为限制因素 |
只有 %util 高 | 证据仍然不足 |
在同一台服务器上执行以下命令,用来找出是哪一个进程产生了主要磁盘流量:
pidstat -d 1 5
输出范例:
UID PID kB_rd/s kB_wr/s Command
999 1732 78120.00 22740.00 postgres
1001 28417 320.00 680.00 node
这个结果表示主要 I/O 来自 PostgreSQL。此时排查方向应该转向 SQL 查询、索引与写入行为,而不是优先优化 Node.js 的 CPU 运算。
步骤二:找出昂贵查询
PostgreSQL 的 pg_stat_statements extension 可以记录标准化 SQL 语句的执行统计,用来找出哪些查询长期消耗最多资源。
当环境尚未启用该 extension 时,应通过正式的数据库变更流程执行以下 SQL,以提供查询层级的统计依据:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
在 PostgreSQL 中执行以下查询,用来找出总执行时间高、平均执行时间高,或者大量读取共享 block 的 SQL:
SELECT
calls,
round(total_exec_time::numeric, 2) AS total_ms,
round(mean_exec_time::numeric, 2) AS avg_ms,
shared_blks_read,
shared_blks_hit,
rows,
query
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
结果应从以下方向解释:
| 查询表现 | 为什么重要 |
|---|---|
| 调用很多次,总耗时很高 | 它正在持续消耗系统容量 |
| 调用不多,但平均耗时非常高 | 它可能制造延迟尖峰或拖慢报表任务 |
shared_blks_read 很高 | 查询需要频繁从储存读取资料页 |
| block hit 高,且延迟可接受 | 很多资料已经由内存提供 |
不要只优化单次执行最慢的查询。一条每次只慢一点、但被调用数千次的 SQL,可能比偶尔发生一次的重型报表更值得优先处理。
步骤三:理解查询如何读取资料
找出昂贵 SQL 后,需要确认 PostgreSQL 是如何访问资料的。EXPLAIN (ANALYZE, BUFFERS) 会实际执行查询,因此在生产环境中应谨慎使用。
在 PostgreSQL 中对已确认昂贵的用户查询执行以下计划分析,用来判断它是否通过索引取得资料,还是正在扫描大量无关记录:
EXPLAIN (ANALYZE, BUFFERS)
SELECT id, status, total_amount, created_at
FROM orders
WHERE customer_id = 8821
ORDER BY created_at DESC
LIMIT 20;
缺少合适索引时,可能得到类似结果:
Limit (actual time=184.101..184.118 rows=20 loops=1)
Buffers: shared hit=92 read=18342
-> Sort (actual time=184.099..184.103 rows=20 loops=1)
Sort Key: created_at DESC
-> Seq Scan on orders
Filter: (customer_id = 8821)
Rows Removed by Filter: 1924830
Execution Time: 184.166 ms
| Plan 信号 | 说明 |
|---|---|
Seq Scan | PostgreSQL 正在扫表,而不是直接定位目标资料 |
大量 Rows Removed by Filter | 读取了许多最终不会返回的记录 |
shared read 很高 | 查询必须从储存设备载入大量 block |
| 只返回少量结果却需要排序 | 当前索引顺序可能无法支援查询需求 |
对于这个查询模式,适合同时支持 customer_id 过滤与 created_at 排序的索引,可以大幅减少读取量。
通过正式 migration 流程在 PostgreSQL 中执行以下 SQL,用来建立符合客户近期订单查询路径的索引,并在建立期间减少对一般写入的阻塞:
CREATE INDEX CONCURRENTLY idx_orders_customer_created_at
ON orders (customer_id, created_at DESC);
索引的目标不是为每一个被查询过的栏位建立结构,而是支援真正高频且具有业务价值的访问路径。
降低 I/O 的策略
1. 为真实查询建立索引
支援选择性高的筛选、常用 join,以及频繁出现的筛选加排序组合。索引应由测量结果支持。
2. 少读取资料
只选择必要栏位,为清单接口加入分页,限制历史查询范围,并避免不必要地返回大型 JSON 或文字内容。
3. 消除重复读取
将 N+1 查询改成受控的 join、批次 WHERE id IN (...)、ORM preload,或对稳定参考资料使用缓存。
4. 减少写入放大
不要写回没有改变的值;检查写入频繁资料表上的多余索引;后台写入应采用受控批次。
5. 分离读取型负载
当 primary 的读取压力已经明显时,将报表、分析、导出或允许轻微延迟的 dashboard 导向 read replica。
6. 管理大型历史资料
对日志、事件或时间序列资料,在查询天然依赖时间范围时考虑 retention、摘要表或 partitioning。
这些策略处理的是不同问题。索引让查询更快找到资料;分页与栏位选择让系统少传输资料;replica 负责分散可容忍延迟的读取;写入控制则减少 primary 必须维护的变更量。
必须理解的取舍
索引越多,写入成本可能越高
索引可以明显改善重要读取查询,但每一次 insert、update 或 delete,都可能需要同步维护更多索引。对于写入频繁的资料表,大型但几乎没有被使用的索引必须经过检查。
在 PostgreSQL 中执行以下查询,用来找出记录中扫描次数很低、同时占用空间较大的索引,以便进一步调查:
SELECT
relname AS table_name,
indexrelname AS index_name,
idx_scan,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC, pg_relation_size(indexrelid) DESC;
idx_scan 很低,只代表索引值得审查,并不代表它应该立刻删除。它仍然可能支援低频但重要的报告、约束验证、故障切换流量或季节性任务。
Read Replica 可能返回旧资料
Read replica 可以把允许延迟的查询从 primary 转移出去,但刚写入 primary 的资料,不一定马上出现在 replica 中。
用户修改个人资料 -> primary 确认成功 -> replica 尚未同步完成 -> 用户看到旧资料
需要 read-after-write consistency 的流程,应继续读取 primary。历史报表、非关键 dashboard 等可以接受短暂延迟的读取,才适合导向 replica。
Partitioning 不能取代正确查询设计
对于大型日志表、事件表或时间序列资料,partitioning 可以帮助限制查询需要考虑的数据范围。但如果应用程序仍然读取不必要的栏位、请求没有边界的结果,或缺少关键索引,partitioning 本身不会解决根本问题。
实际处理流程
1. 确认
使用 Linux 指标验证响应延迟确实来自储存等待,而不是应用程序 CPU 运算。
2. 定位
确认 PostgreSQL 正在制造主要读写流量,再通过 pg_stat_statements 找出昂贵 SQL。
3. 解释
使用 query plan 与 buffer 统计理解成本来自扫表、排序、重复读取,还是写入工作。
4. 减少
建立有依据的索引,减少返回行数与栏位,消除 N+1,并停止不必要的 update。
5. 分离
当业务需要时,把允许延迟的读取或分析型负载移出关键交易流量路径。
6. 再次测量
每次变更后重新比较查询耗时、block 读取量、储存延迟、API latency、写入成本与 replica lag。
核心原则
I/O overhead 通常表示系统移动了超过产品行为实际需要的数据量。
减少读取的行数与栏位,停止不必要的写入,为重要访问路径建立合适索引,并且只在测量证明有需要时分离重型负载。数据库变快,不只是因为硬盘更快,也因为应用程序不再要求它执行无价值的资料搬运工作。