Advanced custom certifications cause Data Aggregator synchronization failures after r2.6 upgrade

Document ID : KB000032118
Last Modified Date : 14/02/2018

Problem

A known issue can cause Data Aggregator (DA) synchronization failures after upgrading to the r2.6 release of CA Performance Management (CAPM). It occurs only when specific advanced custom Vendor Certification (VC) edits are present: edits that modify the Indexes attribute of the interface component element so that it contains multiple indexes for polling.

 

If no changes have been made to any Interface VCs, the problem will not occur and the r2.6 upgrade can proceed without further concern.

If any Interface VCs have been changed in the past, the problem could occur after the upgrade.

 

Impact

Although the problem causes DA synchronization failures, no data loss has been observed when it occurs. Despite the synchronization failure, Data Collector (DC) polling continues and the polled data is still stored for insertion into the Data Repository (DR) system.

 

Symptom

The problem appears as a Synchronization Failure for the Data Aggregator Data Source (DS) in the CA Performance Management UI.

If the DA DS shows a synchronization failure after the r2.6 upgrade, check the DA karaf.log file for the specific exception stack trace shown below, which confirms the problem.

This log file can be found in the directory:

<DA_HOME>/IMDataAggregator/apache-karaf-2.3.0/data/log

The default location if using /opt as home would be:

/opt/IMDataAggregator/apache-karaf-2.3.0/data/log

Within the karaf.log file, look for this particular WARN entry and the exception that follows it:

 

WARN  | ... | PhaseInterceptorChain | ache.cxf.common.logging.LogUtils  452 | org.apache.cxf.cxf-api |
| Application {http://netqos.com/ProductSync2WS}IProductSync2WSService#{http://netqos.com/ProductSync2WS}PullRequest has thrown exception, unwinding now
org.apache.cxf.interceptor.Fault
...
Caused by: java.lang.NullPointerException
        at com.ca.im.connector.productsync2.PullInterfaceItemsHandler.calcIfIndex(PullInterfaceItemsHandler.java:276)[238:com.ca.im.NPCConnector.bundle:2.6.0.RELEASE-104]
        at com.ca.im.connector.productsync2.PullInterfaceItemsHandler.processBatchedItem(PullInterfaceItemsHandler.java:110)[238:com.ca.im.NPCConnector.bundle:2.6.0.RELEASE-104]
        at com.ca.im.connector.productsync2.PullInterfaceItemsHandler.processBatchedItem(PullInterfaceItemsHandler.java:36)[238:com.ca.im.NPCConnector.bundle:2.6.0.RELEASE-104]
        at com.ca.im.connector.productsync2.PullStageHandlerBase.getNextBatch(PullStageHandlerBase.java:188)[238:com.ca.im.NPCConnector.bundle:2.6.0.RELEASE-104]
        at com.ca.im.connector.productsync2.PullSyncRequest.handleStage(PullSyncRequest.java:162)[238:com.ca.im.NPCConnector.bundle:2.6.0.RELEASE-104]
        at com.ca.im.connector.productsync2.PullSyncRequest.execute(PullSyncRequest.java:122)[238:com.ca.im.NPCConnector.bundle:2.6.0.RELEASE-104]
        at com.ca.im.connector.productsync2.ProductSync2WS.pullRequest(ProductSync2WS.java:593)[238:com.ca.im.NPCConnector.bundle:2.6.0.RELEASE-104]
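
For environments with large log files, the check for this stack trace can be scripted. The following is a minimal Python sketch, not a CA-supplied tool. It assumes the signature of interest is a java.lang.NullPointerException line followed within a few non-blank lines by the PullInterfaceItemsHandler.calcIfIndex frame, as in the trace above. The function name has_sync_failure_signature and the window parameter are hypothetical names introduced for illustration.

```python
from pathlib import Path

# Marker strings copied from the stack trace in this article.
NPE_MARKER = "java.lang.NullPointerException"
FRAME_MARKER = "PullInterfaceItemsHandler.calcIfIndex"


def has_sync_failure_signature(log_text: str, window: int = 5) -> bool:
    """Return True if an NPE line is followed, within `window` non-blank
    lines, by the calcIfIndex stack frame."""
    lines = [ln for ln in log_text.splitlines() if ln.strip()]
    for i, line in enumerate(lines):
        if NPE_MARKER in line:
            following = lines[i + 1 : i + 1 + window]
            if any(FRAME_MARKER in nxt for nxt in following):
                return True
    return False


if __name__ == "__main__":
    # Default install path from this article; adjust for a different <DA_HOME>.
    log_path = Path("/opt/IMDataAggregator/apache-karaf-2.3.0/data/log/karaf.log")
    if log_path.is_file():
        text = log_path.read_text(errors="replace")
        found = has_sync_failure_signature(text)
        print("signature found" if found else "signature not found")
    else:
        print(f"log file not found: {log_path}")
```

A match only indicates that the log contains this trace; the synchronization status in the UI and the query described later in this article remain the primary checks.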

 

Determine If Your Environment Will Be Impacted

This section describes how to detect whether a CA Performance Management environment upgrading to r2.6 will be affected.

The simplest recommended method is to run a VSql query against the Data Repository DB.

Run the VSql query:

If you know, or even suspect, that your environment has customized Interface Vendor Certifications, run the following SQL query before upgrading to the r2.6 release. It checks whether the environment will encounter this issue.

If you have already upgraded to r2.6 and have seen Synchronization Failures for the DA DS, check the karaf.log file noted above for the specific error tied to this problem. The SQL query below can also be run after the r2.6 upgrade to confirm the cause.

Complete the following:

NOTE: The VSql query below requires values for the -U (user) and -w (password) flags. These are the credentials of the database user that the DA uses to insert data into the Vertica DB. To find them, open the dbconnection.cfg file on the DA host, located by default in /opt/IMDataAggregator/apache-karaf-2.3.0/etc, and note the "dbUser=<userName>" and "dbPassword=<passwd>" values. Pass the dbUser value to the -U flag and the dbPassword value to the -w flag.

NOTE: Ensure the DB is up and accessible before running the query. If it is not for some reason, please start the DB or contact support for assistance as needed.
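
When the check is scripted, the two values can be read from the file rather than copied by hand. The sketch below is illustrative only and assumes dbconnection.cfg contains plain key=value lines, with optional lines starting with # treated as comments, and that the stored values are usable as-is. The function name read_db_credentials is hypothetical.

```python
from typing import Dict, Tuple


def read_db_credentials(cfg_text: str) -> Tuple[str, str]:
    """Return (dbUser, dbPassword) from the text of dbconnection.cfg.

    Assumes simple key=value lines; raises KeyError if either key is absent.
    """
    values: Dict[str, str] = {}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, comment lines (assumed '#'), and lines without '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values["dbUser"], values["dbPassword"]
```

For example, passing the contents of /opt/IMDataAggregator/apache-karaf-2.3.0/etc/dbconnection.cfg returns the pair of values to use with the -U and -w flags.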

  • Log in to the Data Repository DB server as dradmin or the equivalent OS DB admin user that owns the DR install.
  • Run the following on the DR node, replacing <userName> and <passwd> with the DA database user values found in the dbconnection.cfg file.

/opt/vertica/bin/vsql -U <userName> -w <passwd> -c "select count(distinct item_id) from v_list_attribute_instance where item_id in (select distinct item_id from v_item_facet where facet_qname ='{http://im.ca.com/inventory}Port') and attr_qname = '{http://im.ca.com/inventory}DeviceComponent.IndexList' and byte_array_value is null and rank > 1;"

An example run of the query, showing what the output looks like:

[dradmin@DR_HOST bin]$ /opt/vertica/bin/vsql -U daadmin -w dapass -c "select count(distinct item_id) from v_list_attribute_instance where item_id in (select distinct item_id from v_item_facet where facet_qname ='{http://im.ca.com/inventory}Port') and attr_qname = '{http://im.ca.com/inventory}DeviceComponent.IndexList' and byte_array_value is null and rank > 1;"

count
-------
    0
(1 row)
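
For repeated or automated checks, the same command can be invoked from a script. The sketch below is a hedged illustration that passes the query as a single argument to avoid shell quoting problems with the braces and quotes. It uses only the vsql flags already shown in this article (-U, -w, -c) and the default /opt/vertica/bin/vsql path. The names build_vsql_command and run_check are hypothetical.

```python
import subprocess
from typing import List

VSQL = "/opt/vertica/bin/vsql"  # default path used in this article

# The query text from this article, unchanged.
QUERY = (
    "select count(distinct item_id) from v_list_attribute_instance "
    "where item_id in (select distinct item_id from v_item_facet "
    "where facet_qname ='{http://im.ca.com/inventory}Port') "
    "and attr_qname = '{http://im.ca.com/inventory}DeviceComponent.IndexList' "
    "and byte_array_value is null and rank > 1;"
)


def build_vsql_command(user: str, password: str) -> List[str]:
    """Build the argument list equivalent to the command line shown above."""
    return [VSQL, "-U", user, "-w", password, "-c", QUERY]


def run_check(user: str, password: str) -> str:
    """Run the query on the DR node and return vsql's standard output.

    Raises subprocess.CalledProcessError if vsql exits with an error.
    """
    result = subprocess.run(
        build_vsql_command(user, password),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

As with the manual command, the password is visible in the process arguments while the query runs, so this should be run only by the DB admin user on the DR node.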

If the VSql query returns a count greater than 0 (zero), whether 1 or 100+, the environment has ports affected by this situation. The environment is then likely to run into this problem some time after upgrading to r2.6, when either of the following occurs:

  • Change Detection runs against a previously discovered device, finds a new port or updates an existing port that uses the involved VC customizations, and the DA DS attempts to sync the change to the CA PC server
  • A new device is discovered that has ports using the involved custom VC
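
The rule above (any count greater than zero means the environment is affected) can be applied to captured query output as follows. This is a minimal sketch that assumes the default vsql output layout shown in the example run: a header line, a separator, the count on its own line, and a row-count footer. The names parse_count and is_impacted are hypothetical.

```python
import re


def parse_count(vsql_output: str) -> int:
    """Return the first line of the output that consists only of digits."""
    for line in vsql_output.splitlines():
        stripped = line.strip()
        if re.fullmatch(r"\d+", stripped):
            return int(stripped)
    raise ValueError("no count value found in vsql output")


def is_impacted(vsql_output: str) -> bool:
    """Apply the article's rule: a count greater than 0 means affected ports exist."""
    return parse_count(vsql_output) > 0
```

With the example output shown earlier, which reports a count of 0, is_impacted returns False.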

 

Resolution

This problem can be resolved with one of the following two options:

  • Upgrade using the upcoming r2.6 October Monthly Update Kit.
  • Upgrade to r2.6, then after installation deploy a patch on the Data Aggregator (without a DA restart) that avoids the synchronization failure.

 

Option 1: Run the VSql query above. If it shows that this problem is likely to occur in your environment, do not upgrade to the GA r2.6 release. Instead, wait for the r2.6 October Monthly Update Kit to be released and upgrade using that install, which puts in place the fixed code that prevents this problem.

 

Option 2: Upgrade to the GA r2.6 release, then after installation deploy PTF patch DA_2.6.0_01, which places code on the system that prevents the problem. This patch is available only on request from CA Support through a support case. If you choose this route, include the following when opening the case:

  1. CA Remote Engineer (CARE) diagnostics data set output from the DA host.
  2. Output of the VSql query showing a result greater than 0 (zero).