This article looks at analyzing a NetMaster region when it has high CPU consumption.
A NetMaster region includes the following products:
- Unicenter NetMaster Network Management for TCP/IP
- Unicenter SOLVE:Access
- Unicenter SOLVE:Operations (all variants)
Sometimes these products may start to consume an inordinate amount of CPU resources. This usage may be transient or continuous.
The NetMaster products have some facilities that can help customers and support personnel to diagnose the cause of high resource consumption.
The steps below can help with this diagnosis.
Busy or looping?
The first step is to determine whether the region is very busy, or if in fact the region is stuck in a loop.
With NetMaster products, this is relatively easy:
- If you can log on to the region, albeit with slow response, then it is not looping.
- If you can enter commands from a console (using MODIFY (F)), and receive responses, then it is not looping.
- If, after entering a few MODIFY commands (normally 5), you receive a 'Task Busy' message from MVS (IEE342I), then it is looping.
- If the region is constantly consuming the equivalent of 100% of one CPU in the LPAR, and appears to be unresponsive, then it is probably looping.
If the region is looping, then the action to take is simple:
- Cancel the region. Normally, NetMaster products request an SDUMP. Send the SDUMP data set, the formatted dump, and the NetMaster log to customer support for resolution.
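The checks above can be summarised as a small decision routine. The following Python sketch is illustrative only: the field names and the order in which the checks are applied are assumptions made for this example, not part of any NetMaster product. The observations themselves still have to be gathered manually at the terminal and console.

```python
from dataclasses import dataclass


@dataclass
class RegionObservations:
    """Manual observations about a region (hypothetical field names)."""
    can_log_on: bool            # logon works, even if response is slow
    modify_gets_response: bool  # MODIFY (F) commands return responses
    task_busy_iee342i: bool     # IEE342I 'Task Busy' after a few MODIFY commands
    pegged_one_cpu: bool        # constantly about 100% of one CPU in the LPAR
    unresponsive: bool          # region appears not to respond at all


def triage(obs: RegionObservations) -> str:
    """Classify a region from the checks listed in this article."""
    # Assumption: evidence of responsiveness is evaluated first.
    if obs.can_log_on or obs.modify_gets_response:
        return "not looping"
    if obs.task_busy_iee342i:
        return "looping"
    if obs.pegged_one_cpu and obs.unresponsive:
        return "probably looping"
    return "undetermined"


# Example: slow logon still works, so the region is busy rather than looping.
print(triage(RegionObservations(True, False, False, True, False)))
```

A result of "undetermined" simply means that none of the listed checks applied, and more observation is needed.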
The rest of this article discusses regions that are not looping. That is, they appear to be working, but the CPU resource consumption is regarded as excessive.
It can be useful to have some external statistics about the CPU consumption of the region. These statistics can come from:
- Other monitors
These figures can be useful when determining just where the CPU consumption occurs within the region.
One of the most useful diagnostic tools for a NetMaster region is the inbuilt performance monitor. This monitor samples activity, and produces a database. This database can be reported on in many ways.
There are two commands that control the performance monitor: NCLMON and ##PMON.
Both these commands use the same monitoring engine. The NCLMON command provides facilities that can be used in-house to analyse and tune locally written NCL. The ##PMON command provides more facilities that can be exploited by support personnel.
By default, the performance monitor only examines the NetMaster region Maintask. With the exception of Unicenter SOLVE:Access, this is where most of the activity occurs. (See below for special considerations for a Unicenter SOLVE:Access region.)
Using the Performance Monitor
The best way to use the performance monitor is to start it and collect samples for a short period (say about two minutes) when the region is busy. A database of samples collected when the region is not busy, or a long sample run that includes only a brief period when the region is busy, will not provide a useful picture.
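The effect of the collection window can be shown with simple arithmetic. The Python sketch below assumes that samples are taken at a steady rate, so the share of samples that describe the busy period equals the share of the window during which the region was busy. The function name and the figures are illustrative assumptions.

```python
def busy_sample_fraction(busy_minutes: float, window_minutes: float) -> float:
    """Fraction of collected samples that fall inside the busy period.

    Assumes a steady sampling rate across the whole collection window.
    """
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    if not 0 <= busy_minutes <= window_minutes:
        raise ValueError("busy_minutes must lie between 0 and window_minutes")
    return busy_minutes / window_minutes


# A two-minute run taken entirely during the peak: every sample is relevant.
print(busy_sample_fraction(2, 2))

# A one-hour run that includes a five-minute peak: most samples are noise.
print(round(busy_sample_fraction(5, 60), 3))
```

In the second case fewer than one sample in ten describes the problem period, which is why a long run gives a poor picture.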
If, for example, the region uses excessive CPU every hour, on the hour, for five minutes, then you could set up a timer command (the AT command) to execute the monitor for two minutes:
AT 13:00 CMD=NCLMON START PERIOD=2
After the monitor has stopped, you could perform a basic analysis with this command:
NCLMON REPORT LEVEL=PROC LIMIT=20
This command would list the top 20 NCL procedures, in terms of CPU utilisation, in descending usage order.
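Conceptually, a report of this kind ranks procedures by how often they were seen in the samples. The Python sketch below shows that idea in general terms. It is not the NCLMON implementation, and the procedure names and sample data are hypothetical.

```python
from collections import Counter


def top_procedures(samples, limit=20):
    """Rank procedure names by how often they appear in the samples.

    Returns (name, sample_count, percent_of_samples) tuples, highest first.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    return [
        (name, n, 100.0 * n / total)
        for name, n in counts.most_common(limit)
    ]


# Hypothetical samples: each entry is the procedure active when sampled.
samples = ["PROCA"] * 6 + ["PROCB"] * 3 + ["PROCC"]
for name, n, pct in top_procedures(samples, limit=2):
    print(f"{name:8} {n:4} {pct:5.1f}%")
```

A procedure that accounts for a large share of the samples during a short, busy collection run is the natural place to start looking.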
Much more analysis can be performed. Remember that it is best to keep the collection period short, and within a peak CPU utilisation period, to get some meaningful statistics.
NOTE: Starting with Release 11.5, the collected samples can be written to a data set that can be sent to CA. This allows further analysis to be performed after the region has terminated, which is not possible in earlier releases.
The following are a few other NetMaster commands that can be used to see what is happening in the region:
- SHOW NCL, which provides a list of executing NCL Processes. Specifically, SHOW NCL=ALL will list all processes executing in the region. The P-UNITS column of the output can be used to see which processes seem to be consuming significant resources. However, note that a long-running process such as LOGPROC may have a high value because it has been active for several weeks, but is not actually using much at any time.
- SHOW SYSWAIT, which provides statistics on the apparent amount of time that the NetMaster maintask has been waiting (the AWAIT% column). High values here indicate that the maintask is spending most of its time waiting for work.
- ##FDUMP, which can be used to take a formatted dump of a running region without abending the region. Optionally, an SDUMP can also be taken. These dumps can be very useful as aids to support. It can be very difficult to take down a region in peak times. With this command, you can produce a dump of a running region while it is using excessive resources, but leave it active (and thus not impact users, automation, or monitoring).
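One way to allow for the P-UNITS caveat noted under SHOW NCL is to record the values from two displays taken a short time apart and compare the differences, instead of reading the totals directly. The Python sketch below assumes that P-UNITS is a cumulative figure, as the LOGPROC remark suggests, and that the values have been copied by hand from the displays. The process names and numbers are hypothetical.

```python
def p_units_delta(before, after):
    """Return (process, increase) pairs, largest increase first.

    'before' and 'after' map a process name to the P-UNITS value noted
    from two displays. Processes missing from either display are skipped.
    """
    deltas = {
        name: after[name] - before[name]
        for name in after
        if name in before
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values noted from two displays taken a minute apart.
before = {"LOGPROC": 900000, "PROCX": 100}
after = {"LOGPROC": 900050, "PROCX": 4100, "NEWPROC": 10}
print(p_units_delta(before, after))
```

Here LOGPROC has by far the largest total, but PROCX shows the largest increase over the interval, so PROCX is the better candidate for further analysis.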
Unlike the other NetMaster products, a SOLVE:Access region uses several tasks to process work. This means that analysing a SOLVE:Access region that is consuming excessive resources requires a different approach.
The NCLMON command is not useful in this case. The ##PMON command must be used.
Before using the performance monitor, you need to execute the following command:
##PMON SET SUBTASK=YES
Analysis is at the assembler level. Technical Support will advise which specific commands to use.