I want to give a brief tutorial on understanding CPU utilization, since it is a common source of confusion during performance analysis.
CPU time is the time a processor spends executing instructions for a task, usually counted in clock ticks. This is opposed to wall-clock time, which is the total elapsed time the whole computer takes to perform an operation, including any time spent waiting. The “load” on a system is the fraction of clock ticks spent performing operations, as opposed to clock ticks spent waiting for instructions, over a given time period. Thus, load only makes sense as an average over a particular time period. During any single clock tick, the CPU is either processing an instruction or sitting idle after a HLT.
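To make the CPU time versus wall-clock time distinction concrete, here is a minimal sketch using Python's standard `time` module. The helper names `measure`, `busy_work` and `idle_wait` are my own, made up for illustration; `time.process_time()` reports the CPU time consumed by the current process and `time.perf_counter()` reports elapsed wall-clock time.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) for a single call of func."""
    wall_start = time.perf_counter()   # wall-clock reference point
    cpu_start = time.process_time()    # CPU time used by this process so far
    func()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return wall_used, cpu_used


def busy_work():
    # Pure computation: the process is executing instructions throughout.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def idle_wait():
    # Sleeping: the process is waiting, so it accumulates almost no CPU time.
    time.sleep(0.2)


if __name__ == "__main__":
    for name, func in [("busy_work", busy_work), ("idle_wait", idle_wait)]:
        wall, cpu = measure(func)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```

On an otherwise quiet machine I would expect the two numbers to be close for `busy_work` and far apart for `idle_wait`, but the exact values depend on the hardware and on whatever else is running.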
The CPU processes a stream of instructions fed from memory. Each instruction consists of an opcode (command), which tells the CPU which operation to perform, and the memory locations of the data to operate on. The rate of clock ticks is constant, several gigahertz in a modern CPU, and during each clock tick the CPU is either processing an instruction to do some work or idling after a HLT, an opcode which tells the CPU to keep its components idle until it is woken by the next interrupt.
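As a toy illustration of load as an average over ticks, the sketch below treats a window of time as a list with one entry per clock tick, where each entry is either a work opcode or "HLT". The trace and the `load` function are hypothetical, and the one-opcode-per-tick model is a deliberate simplification (real instructions can take several cycles).

```python
from typing import Sequence


def load(trace: Sequence[str]) -> float:
    """Fraction of ticks in the window that did work (anything but 'HLT')."""
    if not trace:
        raise ValueError("load is undefined over an empty window")
    busy = sum(1 for op in trace if op != "HLT")
    return busy / len(trace)


# Made-up 10-tick trace: 4 ticks of work, 6 ticks halted.
trace = ["MOV", "ADD", "HLT", "HLT", "MUL", "HLT", "HLT", "HLT", "ADD", "HLT"]

print(load(trace))       # 0.4 over the whole window
print(load(trace[:2]))   # 1.0 over the first two ticks
print(load(trace[2:4]))  # 0.0 over the next two ticks
```

The same trace gives 40%, 100% or 0% depending on which window is averaged, which is why a load figure means little without the time period it was measured over.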
Monitoring CPU time
So if we were to “observe” a CPU at the clock-tick level of detail, we would see that it’s always either working or waiting. But we can’t actually do that, since the more closely we monitor clock cycles, the more clock cycles are needed to do the monitoring. So to get an accurate picture of activity, we have to step back to a level where the monitoring tool does not interfere too much with the process being observed.
Any given computer task has several components: CPU instructions, memory IO, disk IO, network IO, and many others. Only extremely simple tasks (like calculating π) can fit in the small (but very fast) memory buffers on the CPU itself. Real-world tasks will almost always require the CPU to wait while other components finish shuffling data back and forth into a state where it is ready for the CPU to work on. The ideal scenario is for the CPU to be kept as busy as possible (constant 100% load) until the task is completed, since that gives the minimum possible time in which the task can be completed. If we were to monitor CPU activity during such a task, we’d see the load jump to 100% during the task, then drop back to the baseline when it’s done. But if the task is shorter than the sampling interval of the CPU monitoring tool, its load would be averaged in with the idle time around it, possibly split across two sampling periods, or it might not be detected at all.
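The effect of the sampling interval can be sketched with a small model. This assumes a monitoring tool that reports the average load over consecutive fixed-length windows, and a CPU that is 100% busy during the task and completely idle otherwise; the function name `sampled_load` and all of the numbers are invented for illustration.

```python
from typing import List


def sampled_load(busy_start: float, busy_end: float,
                 interval: float, n_samples: int) -> List[float]:
    """Average load reported for each of n_samples consecutive windows.

    Assumes the CPU is 100% busy in [busy_start, busy_end) and idle otherwise.
    """
    readings = []
    for k in range(n_samples):
        win_start = k * interval
        win_end = win_start + interval
        # Length of time the busy period overlaps this sampling window.
        overlap = max(0.0, min(busy_end, win_end) - max(busy_start, win_start))
        readings.append(overlap / interval)
    return readings


# A 2 s task aligned with 1 s windows shows up as a clean jump to 100%.
print(sampled_load(1.0, 3.0, 1.0, 4))    # [0.0, 1.0, 1.0, 0.0]

# A 0.5 s task straddling a window boundary is smeared over two readings.
print(sampled_load(0.75, 1.25, 1.0, 3))  # [0.25, 0.25, 0.0]
```

In the second case the CPU really was fully busy for half a second, but the tool reports two readings of 25%. A much shorter burst would produce readings small enough to be lost in the baseline.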
The situation is more complicated when there are multiple tasks to run, each of which requires some fraction of CPU time. The operating system will run the tasks concurrently. Both the processing time and the IO of the tasks will overlap, but because the operating system takes turns running each task, the total CPU load will be some combination which may be higher or lower than the sum of the individual tasks’ loads, depending on the other competing resources the tasks use. For example, two disk-intensive tasks will use less CPU than the sum of both running individually, because the disk IO will be the bottleneck. But two CPU-intensive tasks would more than double the total load, because the CPU has to run both tasks and also handle the context switching between them.