What is this?
Every hour (at 00:10, 01:10, etc.), we read the load on each node with the
`uptime` command.
This gives a number that is indicative of the processor load (see
ProcessorLoadAverage for a discussion of what processor load means). From the loads of all the processors we take the maximum, the minimum and the average.
Of course, these data may not be representative if the processor load changes significantly between samples.
Ideally we would monitor the load at a finer granularity (say, every minute) and take the average, but this would put an unnecessary additional load on the processors.
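The sampling and aggregation step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the script that actually produces the chart: it assumes passwordless `ssh` to each node (the node names are hypothetical), a C locale so that `uptime` prints `.` as the decimal separator, and that the 1-minute load average is the value used, which the text above does not specify.

```python
import re
import subprocess
from statistics import mean

# Matches "load average: 0.52, 0.58, 0.59" (Linux) and
# "load averages: 0.52 0.58 0.59" (BSD/macOS). Assumes '.' as decimal separator.
_LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")

# Hypothetical node names; replace with the real cluster nodes.
NODES = ["node1", "node2", "node3"]


def parse_uptime(line, field=0):
    """Return one load average from an `uptime` output line.

    field=0 is the 1-minute, 1 the 5-minute and 2 the 15-minute average.
    Using the 1-minute value by default is an assumption.
    """
    m = _LOAD_RE.search(line)
    if m is None:
        raise ValueError("no load average in %r" % line)
    return float(m.group(field + 1))


def collect_loads(nodes=NODES):
    """Run `uptime` on each node over ssh (assumed); nodes that fail are skipped."""
    loads = {}
    for node in nodes:
        try:
            out = subprocess.run(["ssh", node, "uptime"], capture_output=True,
                                 text=True, timeout=30, check=True).stdout
            loads[node] = parse_uptime(out)
        except (subprocess.SubprocessError, OSError, ValueError):
            continue  # node down or unreachable: leave it out of this sample
    return loads


def summarize(loads):
    """Maximum, minimum and average of the per-node loads for one sample."""
    values = list(loads.values())
    return max(values), min(values), mean(values)
```

For example, loads of 1.0, 0.5 and 1.5 on three nodes give a maximum of 1.5, a minimum of 0.5 and an average of 1.0.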
- Perhaps the most important value is the average (green) column. A smooth curve with a value near 1 is desirable, since it indicates full use of the cluster. The max and min values give an idea of the spread in the load. However, due to the coarse time granularity, these values can be affected by short-lived processes, such as compilations on the server node. (This will be fixed later.)
- The server (node1) is included in the statistics, even though it normally shouldn't be used in parallel runs.
- If the cluster is down (more precisely, if the server node1 is down) at a given time, the column for that time is missing from the chart, and all earlier columns are shifted by one time period. This means the time axis may be wrong if the server has been down for a while. After some time, this corrects itself automatically. This should be fixed in the future.
- The chart is generated automatically using the gd graphics library (called from the Perl modules GD and GD::Graph).
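One way to avoid the shifted time axis described above would be to key each sample by its hour rather than by its position in a list, so that an hour with no data becomes an explicit empty slot. The Python sketch below illustrates the idea; the data layout (a dict from hour to a `(max, min, avg)` tuple) and the function names are hypothetical and are not taken from the existing Perl script.

```python
from datetime import datetime, timedelta


def hour_key(t):
    """Truncate a sample time (e.g. 01:10) to the start of its hour (01:00)."""
    return t.replace(minute=0, second=0, microsecond=0)


def hourly_axis(samples, start, hours):
    """Return one (hour, stats) pair per hour, beginning at `start`.

    `samples` maps hour_key(sample_time) -> (max, min, avg). Hours with no
    sample (e.g. because node1 was down) get None instead of being dropped,
    so the remaining columns keep their correct position on the time axis.
    """
    start = hour_key(start)
    axis = []
    for i in range(hours):
        t = start + timedelta(hours=i)
        axis.append((t, samples.get(t)))
    return axis
```

With samples taken at 00:10 and 02:10 but none at 01:10, a three-hour axis starting at 00:00 has `None` in its middle slot, which the plotting step could draw as an empty column.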
--
MarioStorti - 26 Sep 2002