How can we prevent DC015007 SOS abends or over-use of resources by OLQ?

Document ID : KB000024946
Last Modified Date : 14/02/2018
Show Technical Document Details

Question:

How can we prevent DC015007 SOS abends or over-use of resources by OLQ?

Answer:

OLQ satisfies logical data requests by performing physical I/Os. The amount of resources used by any task (including OLQ) can be controlled by limits specified in the sysgen for that task definition. OLQ in addition has an interrupt count which can be specified to control the frequency with which rows are returned, at which point the user has the option of continuing or ending the transaction.

In some situations, OLQ can use up very many resources before hitting an interrupt count, causing it to reach DBIO or DCCALLS limits. This can occur because the limits & the interrupt count track very different things: the I/O counter (used to check the DBIO limits) tracks the number of I/Os performed, and the Interrupt Count tracks the number of rows returned to OLQ. The relationship between these is most easily explained with an example. Let's say someone wants to find all the clients within a certain zip code, and then to find out which offices exist within that state to satisfy an order. That would be easy to set up on OLQ, particularly menu mode; you just pick the client and office records, specify a WHERE criteria on the client zip, and a join clause between the client state and the state in the office record. That would generate a query that follows this general form:

  SELECT * FROM CLIENT, OFFICE 
  WHERE CLIENT_ZIP = nnnnn and CLIENT.STATE = OFFICE.STATE; 

Now let's assume that you have 1,000,000 clients, and 5 offices in each state. In this query, zip and state are unlikely to be key fields, so we have no indexes or calc keys to aid us in this retrieval. That means we'll be doing area sweeps on two records, and joining them in a nested loop. The optimizer will determine the most efficient way to retrieve these records. For the sake of discussion, let's assume that our path will be to sweep the Client area for clients within this zip code, and the walk the office area for offices that satisfy the join criteria on state. That means we will:

  1. Walk the Client area until we find a client with the specified zip code then we'll walk the Office area until we find the first office within the client's state. (Let's say we find the first client record with the desired zip code about 1/2 way through the area);
  2. Walk the office area until we find an office whose state matches the state in the selected client record. (Let's say we have to read most of the office records before we find the first one within the client's state.)
  3. At this point. the I/O counter is set to some-where around 500,200 because that tis the total number of records we have read; however, the interrupt count counter is set to 1 because we have returned only 1 occurrence to OLQ.
  4. After returning this first complete row, OLQ continues to walk the office area, and it returns one record to OLQ for every office record with a match on that state. It will increment the I/O counter for every record it reads, and it will increment the interrupt count counter for every row it returns to OLQ.
  5. When we reach the end of the office area, we return to walking the client area, resuming with the next client record.
  6. When we find the next client in this zip code, we start over doing an area sweep on the office records to find one within that state.
  7. Because we are starting a new walk of the office area for every client we find (nested loop), and because we're walking a large area to find the clients, we could end up with a very large number of I/Os (and calls to the I/O routines) while the interrupt count counter remains relatively low.

The interrupt count provides an opportunity for the end user to terminate the task and thus avoid consuming too many resources, but often the user is completely unaware of the resources required to satisfy the logical request. That's because the amount of resources required to satisfy the user query depends completely on the query that's coded, the database statistics available, and the database design. So the interrupt count is not a reliable method to manage resource utilization; instead, to avoid high resource utilization by OLQ, most sites establish limits for OLQ as they would for other user applications. The only way to ensure that long-running OLQ queries do not consume substantial resources at run-time is to establish limits, either at the system or task level; and to establish an interrupt count that does not interfere with these limits being reached (see more on this below). The same approach is often used for Server tasks and tasks that process generalized SQL queries, since these requests are all logical and do not specify any actual physical I/O DML commands. Often users are unaware of what resources are required to satisfy such a query, so setting limits is the only way to avoid tasks that use too many resources.

The unique factor which differentiates OLQ in this regard is that both the call limits and the interrupt count can have an impact on this process. That's because when the interrupt count is reached, the resource counters for the task (calls, I/Os, etc) are reset to zero. Thus if the interrupt count is set too low, it is possible for the resource counters to be continually reinitialized, so that the specified limits are never reached. A few actions will help prevent this from happening:

  1. In the sysgen OLQ statement, specify an moderately high interrupt count; this will help increase the odds that for resource-intensive tasks, the task or system limits are reached first;
  2. Do not use exits to extend the limits for OLQ when they are reached (this is sometimes done for system tasks, but OLQ functions as a user application, so it should not be done here);
  3. Specify limits specific to OLQ on the task statement, to avoid using the more generic system-level limits.
  4. Monitor the success of how the interrupt count and OLQ task limits interact. For some sites, depending on database designs and the types of queries that are issued, the interrupt count will need to be adjusted so that the limit counters are not reset before the limits are reached by resource-intensive tasks.

Additional Information:

More details about setting limits for the OLQ task in the System definition can be found in the "CA IDMS System Generation Guide".

More details about how to set the Interrupt Count for OLQ can be found in the manual "CA OLQ Online Query for CA IDMS: OLQ Online Query Reference Guide".