How to serialize the access to Global Variables in CA OPS/MVS?

Document ID : KB000028111
Last Modified Date : 14/02/2018



You can use numerous AOF rule types to monitor the various types of events that occur on the system. The need for serialization is not unique to any particular type of event that AOF can monitor; it can arise with any of them. This article limits the discussion to message events and )MSG rules, which are probably the most common, but the coding techniques discussed apply to all types of events.

For example, suppose you want to count the number of times a particular message is issued on the system using a )MSG rule. You can easily save the count in a CA OPS/MVS permanent global variable such as GLOBAL.#message. Doing so preserves the count across CA OPS/MVS restarts and system IPLs. The )MSG rule starts with the following code in its )INIT section. This code initializes the counter (if necessary) when the rule is enabled:

 )MSG message
 )INIT
  IF ^OPSVALUE('GLOBAL.#message','E') THEN /* If no system msg count */
    GLOBAL.#message = 0                    /* Then init sys msg count*/

The OPSVALUE function used in the above example tests for the existence of the global variable GLOBAL.#message. If the global variable does not exist, the code creates it and initializes it to 0. This happens only once, when the rule is enabled for the first time. Unless the global variable is subsequently deleted, it persists in the CA OPS/MVS global variable database, which is preserved in the SYSCHK1 data set across restarts and IPLs of the system.
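The "create only if absent" logic can be modeled outside OPS/REXX. The following Python sketch is only an analogy: the dict stands in for the global variable database, and the helper name init_counter is hypothetical, not part of CA OPS/MVS.

```python
def init_counter(store, name):
    """Create the counter with 0 only when it does not already exist."""
    if name not in store:
        store[name] = 0
    return store[name]


# Hypothetical stand-in for the permanent global variable database.
database = {}

init_counter(database, "GLOBAL.#message")   # first enablement creates the counter
database["GLOBAL.#message"] += 3            # three messages are counted
init_counter(database, "GLOBAL.#message")   # re-enabling leaves the count alone

print(database["GLOBAL.#message"])  # 3
```

Because the existence test guards the assignment, enabling the rule again does not reset a count that has already accumulated.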

The )MSG rule increments the count with code in its )PROC section, which is called each time the event occurs (that is, each time the message is issued). Such code may seem trivial and harmless. For example, the following code would seem to bring about the desired results:

  GLOBAL.#message = GLOBAL.#message + 1   /* Increment sys msg count */

However, the above example may be insufficient if the message being counted can be issued concurrently by more than one task active on the same system. If that can happen, and if the system is running on hardware with multiple CPs (Central Processors), then two updates to the same global variable, GLOBAL.#message, can be made at the same time. When that occurs, one of the updates can be "lost."

This concept of multi-processing may be new to some readers. Others may recognize it as a familiar problem. For those less experienced with the problem of concurrent updates, consider a scenario in which two different tasks (or processes) that are active on the same system issue the same message at the same time. Each process is running on a different CP (or processor). Each processor is simultaneously executing the same code in the same rule, specifically the instruction:

  GLOBAL.#message = GLOBAL.#message + 1   /* Increment sys msg count */

Each processor simultaneously fetches the same value of the variable, increments it, and stores it. As a result, the stored value increases by only one, even though two increments were executed. Each of the two increments stored the same value in the variable, because each one fetched the same value from the variable. Thus, only one message event has been counted, even though two message events occurred; they just happened to occur simultaneously. The automation you've written has just "missed" counting an event.
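The interleaving described above can be replayed step by step. The following Python sketch is an illustrative model only: the dict stands in for the global variable, and the two "processors" are simply two local copies of the fetched value.

```python
# Illustrative model only: a plain dict stands in for the global variable.
variables = {"GLOBAL.#message": 0}


def lost_update(variables, name):
    """Replay two processors that both fetch before either one stores."""
    fetched_by_cp1 = variables[name]        # CP 1 fetches the current count
    fetched_by_cp2 = variables[name]        # CP 2 fetches the same count
    variables[name] = fetched_by_cp1 + 1    # CP 1 stores its incremented copy
    variables[name] = fetched_by_cp2 + 1    # CP 2 overwrites it with the same value
    return variables[name]


result = lost_update(variables, "GLOBAL.#message")
print(result)  # 1, although two message events occurred
```

The second store does not add to the first; it replaces it, which is why one event goes uncounted.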

You may ask "What are the chances of that happening?" Slim, perhaps. And if it does, what is one missed message count? It may not seem like much, but once created, code takes on a life of its own. Who knows where your message rule of today may end up tomorrow, or what events it may be counting? Perhaps counting the number of bolts screwed into an engine block? Perhaps the number of airplanes on a radar screen? Although it may not be possible to write code that is 100% accurate, it is important to try for just this reason.

To solve this problem of concurrent update, you must serialize (or synchronize) the update to the common global variable, so that only one process can execute the update at any one time. Experienced readers will recognize this situation as similar to having multiple copies of re-entrant code that must update a common variable while executing at the same time. In Assembler language, the CDS (Compare Double and Swap) instruction would solve the problem if the count were stored in a doubleword of storage. CDS is serialized at the hardware level: it stores the new value only if the doubleword still contains the value that was originally fetched, so a program that retries on a mismatch never loses an update. Even if multiple processors execute a CDS on the same doubleword of storage at the same time, the hardware guarantees that each compare and store completes without interference from the others. Standard REXX has no equivalent function that performs serialized updates to variable data. OPS/REXX, however, does: OPSVALUE.
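As an aside, the compare-and-swap pattern mentioned above can be sketched in a few lines. The following Python model is an assumption-laden illustration, not Assembler and not OPS/REXX: the Cell class is hypothetical, and a lock emulates the serialization that the hardware provides for CDS.

```python
import threading


class Cell:
    """Hypothetical stand-in for a storage location with compare-and-swap."""

    def __init__(self, value=0):
        self._value = value
        self._guard = threading.Lock()   # emulates hardware-level serialization

    def load(self):
        with self._guard:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store new only if the cell still holds expected; report success."""
        with self._guard:
            if self._value != expected:
                return False
            self._value = new
            return True


def increment(cell):
    """Retry until no other thread changed the value between fetch and store."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + 1):
            return old + 1


def run(threads=4, per_thread=1000):
    cell = Cell()

    def worker():
        for _ in range(per_thread):
            increment(cell)

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return cell.load()


total = run()
print(total)  # 4000
```

An increment that loses the race is not dropped; its compare fails, and the loop fetches the fresh value and tries again.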

OPSVALUE is capable of performing numerous serialized functions on OPS/REXX compound variables. One such function, Add, is designed specifically to solve our example problem. The following code in the )PROC section of the )MSG rule safely adds one to the value of global variable GLOBAL.#message, in a serialized (or synchronized) manner:

  temp = OPSVALUE('GLOBAL.#message','A',1) /* Increment sys msg count*/
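The effect of a serialized add can be modeled outside OPS/REXX. The following Python sketch is an analogy under stated assumptions: GlobalStore and its add method are hypothetical names, the lock stands in for whatever serialization the product performs internally, and the value that add returns is a choice of this sketch rather than a statement about what OPSVALUE returns.

```python
import threading


class GlobalStore:
    """Hypothetical in-memory model of a shared variable store."""

    def __init__(self):
        self._values = {}
        self._lock = threading.Lock()

    def add(self, name, amount):
        """Fetch, add, and store as one serialized step."""
        with self._lock:
            new_value = self._values.get(name, 0) + amount
            self._values[name] = new_value
            return new_value   # return value is an assumption of this sketch

    def get(self, name):
        with self._lock:
            return self._values.get(name)


def count_events(store, name, tasks=8, events_per_task=500):
    """Let several tasks report events concurrently, then read the total."""

    def task():
        for _ in range(events_per_task):
            store.add(name, 1)

    threads = [threading.Thread(target=task) for _ in range(tasks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return store.get(name)


counted = count_events(GlobalStore(), "GLOBAL.#message")
print(counted)  # 4000
```

Because the fetch and the store happen inside one serialized step, no task can observe a value that another task is in the middle of updating, so every event is counted.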

How do you know when serialization is necessary? How do you know if an event can happen concurrently? These questions can be difficult to answer. For message events, OPSLOG can be a valuable tool in helping you to find the answer. For instance, turning on the ASID field in the OPSLOG DISPLAY column selection table causes the generating ASID to be displayed next to every message in OPSLOG. However, the problem of concurrent update is not limited to multiple address spaces. It can also occur between multiple tasks running in a single address space.

Additional Information:


Reference information for the OPSVALUE function can be found in the link below: