Verifying Statistics for SQL and NonSQL Databases

Document ID : KB000054623
Last Modified Date : 14/02/2018
Show Technical Document Details

Description:

In order to ensure that SQL DML statements are optimized to use the most efficient access method that will satisfy the request, it is imperative that UPDATE STATISTICS be run against databases that will be accessed by SQL DML. This is true whether the database is originally defined in SQL or nonSQL DDL. Statistics for SQL-defined databases are stored in the catalog, as part of the entity definitions, and can easily be seen and verified there. Statistics for nonSQL databases are not found in the catalog and thus extra steps are required if there is a desire to verify that accurate statistics are in place.

Solution:

When UPDATE STATISTICS is run against an SQL schema, area, or table, the statistics are stored in the DDLCAT area. When running UPDATE STATISTICS for a NonSQL schema, the DDLDML area containing the NonSQL schema is updated. Therefore, the DDLDML area needs to be available in update mode when the UPDATE STATISTICS is run for nonSQL.

Below are the SQL tables where statistics are stored, and the corresponding nonSQL records.

Tables NonSQL Records Statistics stored for these entities
SYSTEM.AREASA-018Physical database area
SYSTEM.COLUMNSCR-054Columns / Sortkey elements for sorted sets and indexes
SYSTEM.CONSTRAINTn/aConstraints
SYSTEM.INDEXSMR-052Indexes / Sets including the Calc set & Calckey elements
SYSTEM.TABLESRCD-113Table / Schema record

Specific columns in each table and record are updated to contain statistics; these columns and elements can be seen in the individual queries in each of the following sections.

Statistics for nonSQL databases

Users who are familiar with OLQ, Culprit, or other reporting tools can use them to look at the values in the DDLDML records using the records and field names noted above, and specifying subschema IDMSNWKA. The above records are obtained via the S-010 record by using the schema name as CALC key (S-NAM-010), and walking the network sets to the other records. Users who prefer to use SQL to view and/or verify these values can do that using the following steps:

  1. - Create a schema to access the dictionary records, using this syntax:
    CREATE SCHEMA <your-db-name>
    FOR NONSQL SCHEMA SYSDIRL.IDMSNTWK VERSION 1 ;
    It's possible that you will need to use a name other than 'SYSDIRL' to qualify the IDMSNTWK schema. If the above syntax gives an error, check with your DBA to see what dbname IDMSDIRL was run against, and use that name instead of SYSDIRL in this syntax. We're using your DBname for the schema name also, for ease of use; it's also a requirement for some IDMS components, so it may be useful later (or may already be defined).

  2. -Using this schema, you can access the columns with statistics in the DDLDML area with queries like the
    following.

    1. - To see the statistics for the physical areas, use
      SELECT SA.SA_NAM_018, SA.STAT_AREA_NUM_PAGES_018, SA.STAT_AREA_PAGE_SIZE_018,
      SA.STAT_AREA_NUM_PAGES_USED_018, SA.STAT_AREA_NUM_ROWS_018,
      SA.STAT_AREA_PCT_USED_018
      FROM SYSDICT."SA-018" SA, SYSDICT."S-010"
      WHERE "S-SA"
      AND S_NAM_010 = 'EMPSCHM'
      AND S_SER_010 = 100 ;

    2. - To see statistics for the records within each area, use
      SELECT SR.SR_ID_113, SR.STAT_REC_NUM_PAGES_113, SR.STAT_REC_NUM_PAGES_USED_113,
      SR.STAT_REC_NUM_ROWS_113, SR.STAT_REC_AVG_LENGTH_113,
      SR.STAT_REC_PCT_USED_113, SR.STAT_REC_PCT_FRAG_113
      FROM SYSDICT."SRCD-113" SR, SYSDICT."S-010" S
      WHERE "S-SRCD"
      AND S_NAM_010 = 'EMPSCHM'
      AND S_SER_010 = 100 ;

    3. - To see statistics for the sets in the schema (including the CALC set and statistics for the calckey), use
      SELECT SMR.SET_NAM_052, SMR.STAT_MEM_NUM_DISTINCT_KEYS_052,
      SMR.STAT_MEM_AVG_ROWS_SET_052, SMR.STAT_MEM_AVG_PAGES_KEY_052,
      SMR.STAT_MEM_AVG_CLUSTER_COUNT_052
      FROM SYSDICT."SMR-052" SMR, SYSDICT."S-010" S, SYSDICT."SRCD-113"
      WHERE "S-SRCD" AND "SRCD-SMR"
      AND S_NAM_010 = 'EMPSCHM'
      AND S_SER_010 = 100 ;

    4. - Finally, to see statistics for sortkey elements, use
      SELECT SCR.SCR_NAM_054, SCR.STAT_COL_NUM_DISTINCT_VALS_054, SCR.STAT_COL_LOW_VALUE_054,
      SCR.STAT_COL_HIGH_VALUE_054
      FROM SYSDICT."SCR-054" SCR, SYSDICT."S-010" S, SYSDICT."SRCD-113", SYSDICT."SMR-052" SMR
      WHERE "S-SRCD" AND "SRCD-SMR" AND "SMR-SCR"
      AND S_NAM_010 = 'EMPSCHM'
      AND S_SER_010 = 100 ;

  3. - Note that in all of the above queries, there are substitutions you will have to make:

    1. - In the FROM clause, instead of SYSDICT you should specify the schema you defined in step 1
      (with name = your dbname).

    2. - In the Boolean expression naming S_NAM_010 you should specify the name of your nonSQL schema instead
      of 'EMPSCHM'.

    3. - In the Boolean expression naming S_SER_010 you should specify the version of your nonSQL schema
      instead of 100.

  4. - Also, in reviewing the results, you may see some values that look very large, such as 1077952576. This is the numeric representation of x'4040404040' for some fields, and it means the field has not been updated. Other numeric values are valid and should be verified to ensure that the statistics are updated whenever the characteristics of the database change, including the quantity of stored records.

Statistics for SQL Databases

Simple SQL queries against the "SYSTEM" schema can be used to view the statistics stored for an SQL database.

  1. - Using this schema, you can access the columns with statistics with queries like the following.

    1. - To see the statistics for the physical areas, use
      SELECT SEGMENT, NAME, NUMPAGES, PAGESIZE, NUMPAGESUSED, PCTSPACEUSED
      FROM SYSTEM.AREA
      WHERE SEGMENT = 'SQLDEMO' AND NAME = 'EMPLAREA'

    2. - To see statistics for the records within a schema, use
      SELECT NAME, NUMPAGES, NUMROWS, AVGROWLENGTH, PCTSPACEUSED, PCTFRAGROWS
      FROM SYSTEM.TABLE
      WHERE SCHEMA = 'DEMOEMPL';

    3. - To see statistics for the indexes defined on tables in the schema, use
      SELECT * FROM SYSTEM.INDEX
      WHERE SCHEMA = 'DEMOEMPL';

    4. - To see statistics for constraints in the schema use
      SELECT * FROM SYSTEM.CONSTRAINT
      WHERE SCHEMA='DEMOEMPL';

  2. - Note that in all of these queries, there are substitutions you will have to make:

    1. - In the Boolean expression naming SEGMENT and AREA you should specify the name of the area where the tables in the schema reside;

    2. - In the Boolean expressions naming SCHEMA you should specify the name of the SQL schema you have updated statistics for instead of 'DEMOEMPL'.