Interpreting machine load averages

2012-02-22

A few nights back, some people on the #crunchbang IRC channel on Freenode seemed confused about what the load average values their scripts gave them actually mean.

Programs like w, top and uptime report the load averages as three floating point values.

$ w
 22:26:10 up 14 days,  3:34,  2 users,  load average: 0.12, 0.06, 0.07

The values represent the system load averaged over the last 1, 5 and 15 minutes. Each value tells how many processes, on average, were in either the runnable or the uninterruptible state during the observation period. Runnable means a process is currently being executed by the processor or is waiting to be executed, and uninterruptible most often means it is waiting for some resource, typically an I/O device (fetching data from the hard drive is slow).
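On Linux the kernel exposes the same numbers through /proc/loadavg; the first three fields on that line are the 1, 5 and 15 minute averages that w and uptime print. A minimal Python sketch for reading them directly might look like this:

# Read the load averages straight from the kernel.
# The first three fields of /proc/loadavg are the 1, 5 and
# 15 minute averages; the remaining fields are ignored here.
with open("/proc/loadavg") as f:
    one_min, five_min, fifteen_min = map(float, f.read().split()[:3])

print(f"load averages: {one_min:.2f}, {five_min:.2f}, {fifteen_min:.2f}")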

On a single-CPU machine 1.00 would mean a full load for the CPU, and 0.25 would mean it was occupied for 25% of the time. However, things get a bit more curious when the computer has multiple CPUs or CPU cores.

On a dual-core machine like my desktop computer, a load average of 2.00 means both cores were, on average, fully occupied (the value can climb above 2.00 when more processes are waiting than there are cores to run them). A load of 1.00 would then mean the computation capacity of my CPU was 50% utilized. Similarly, a 6-core processor with a load average of 2.50 would have had roughly 40% of its full CPU capacity in use.
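To turn the raw figure into that rough per-core utilization number, divide the load average by the number of CPUs the system reports. A small Python sketch of the normalization, using the standard os.getloadavg() and os.cpu_count() calls, could look like this:

import os

# Three load averages (1, 5 and 15 minute windows) and the CPU count.
load1, load5, load15 = os.getloadavg()
cpus = os.cpu_count()

# 1.0 here would mean every core was busy for the whole last minute.
utilization = load1 / cpus
print(f"1-minute load {load1:.2f} across {cpus} CPUs "
      f"~ {utilization:.0%} of total capacity")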

Unfortunately this reporting scheme doesn't distinguish whether all of the CPUs were evenly utilized for half of the reporting period, or whether one or two did all of the work while the rest sat idle.