Scaling Client Automation (ITCM): How to improve collect task and replication task performance by limiting the number of hardware and software scans sent by agents.

Document ID : KB000046211
Last Modified Date : 02/03/2018
Issue:

Client Automation (ITCM) engines running collect tasks for scalability servers are struggling to keep up with the volume of files arriving from the agents.  Collect tasks may run for several hours, processing tens of thousands of files reported by agents in the environment.  As a result, the engine processing upload replication to the enterprise manager may also become backlogged as it tries to keep the enterprise manager updated with all these changes.

The performance of DSM Explorer may also be adversely affected, as may the performance and memory consumption of the CA Messaging (CAM) service.

Environment:
Client Automation (ITCM) -- any version.
Cause:

There are various contributing factors to this problem:

1- Collecting hardware inventory too often from the agents.
2- Too many overall agents reporting to a single domain manager.
3- Too many agents reporting to a single scalability server.
4- Using the domain manager as a scalability server.
5- Poor distribution of tasks to engines.
6- Too many engines running in parallel, writing to the database.
7- Dynamic groups set to evaluate too frequently.

Resolution:
Recommendation: Reduce the frequency of hardware inventory scans.
Out of the box, configuration policy allows agents to report an unrestricted number of hardware scans per day.
 
The asset management (AM) agent performs both scheduled and event-based scans on the agent computer.  Scheduled scans include the "daily registration refresh" and "Run the UAM Agent" policies from within the CAF Scheduler policy.  Event-based scans are triggered by user logon, system reboot, or a network address change.
 
Each of these events triggers the agent to rescan the computer and report the results to its scalability server.
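To get a feel for how these triggers add up, the following Python sketch estimates the number of inventory files reaching the scalability servers each day. It assumes each scan produces one file, and the per-agent scan counts are hypothetical inputs rather than ITCM defaults; substitute figures observed in your own environment.

```python
from typing import Optional


def daily_scan_files(agents: int, scheduled_scans: int, event_scans: int,
                     max_scans_per_day: Optional[int] = None) -> int:
    """Estimate the inventory files reported per day across all agents.

    Assumption: each scan produces one file for the scalability server.
    scheduled_scans / event_scans: hypothetical per-agent daily counts
    (e.g. registration refresh, logons, reboots, address changes).
    max_scans_per_day: a schedule cap such as "once per day" (1);
    None models the unrestricted out-of-the-box behavior.
    """
    per_agent = scheduled_scans + event_scans
    if max_scans_per_day is not None:
        per_agent = min(per_agent, max_scans_per_day)
    return agents * per_agent
```

For example, `daily_scan_files(5000, 2, 4)` returns 30,000 files per day, while `daily_scan_files(5000, 2, 4, max_scans_per_day=1)` returns 5,000.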
 
To reduce the amount of data arriving daily at the scalability servers and the domain manager, start with the collect tasks in DSM Explorer:
DSM Explorer -> Control Panel -> Configuration -> Collect Tasks -> All Collect Tasks
 
[Screenshot: Collect Tasks]
 
The general recommendation is to schedule the "Inventory Configuration" task (the hardware inventory configuration) to run once per day.  To set the scheduling, right-click the task and select "Scheduling...".
 
[Screenshot: Scheduling context menu]
 
[Screenshots: Always Run Job, Run Once per Day]
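Conceptually, the scheduling choices act as a gate on how often a scan may be reported. The sketch below is a hypothetical model of that gate, not ITCM's implementation; the policy names are illustrative, and "once per day" is approximated as a minimum 24-hour gap.

```python
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical policy names, each mapped to the minimum gap between reported scans.
# "Once per day" is approximated as a 24-hour gap, which may differ from the
# product's own semantics (for example, calendar days).
MIN_GAP = {
    "always": timedelta(0),
    "once_per_day": timedelta(days=1),
    "once_per_week": timedelta(weeks=1),
}


def should_report(policy: str, last_reported: Optional[datetime], now: datetime) -> bool:
    """Return True if a new scan may be reported under the given policy."""
    if last_reported is None:  # nothing reported yet, so the first scan goes through
        return True
    return now - last_reported >= MIN_GAP[policy]
```

Under "always", every triggering event is reported; under the once-per-day or once-per-week policies, events that fall inside the gap are not.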
 
Note 1: Depending on business requirements and environment size, it may be preferable to choose "Run only once a week" for the inventory configuration.

Note 2: By default, the "Software Inventory Configuration" task is already set to run once per day.  It is wise to double-check that this default has not been changed.  Depending on business requirements and environment size, it may also be preferable to choose "Run only once a week" for this scan.

Note 3: If the domain manager is linked to an enterprise manager, as in the first screenshot, you will see two instances of each built-in task.  Because the agent considers the scheduling settings from both the domain and enterprise managers, ensure the settings are mirrored on both managers.  To change the values for the replicated (enterprise-created) tasks, you must use DSM Explorer on the enterprise manager; the configuration seen on the domain manager is replicated from the enterprise manager.
 
Note 4: If you have additional custom inventory tasks, ask yourself how often you realistically need the agents to report this data, and what the trade-offs of requesting inventory too often may be.  Ideally, the answer is not more often than once per day; in most cases, once per week or once per month may be sufficient.
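For the mirroring check in Note 3, a small comparison can help when many built-in and custom tasks are involved. This is a hypothetical helper: it assumes you have transcribed each task's schedule from DSM Explorer on both managers into a task-name-to-schedule mapping.

```python
from typing import Dict, List


def schedule_mismatches(domain: Dict[str, str], enterprise: Dict[str, str]) -> List[str]:
    """List task names whose scheduling differs between the two managers.

    Both arguments map a task name to a schedule label (hypothetical input
    format, e.g. transcribed by hand). Tasks present on only one manager
    are reported as well, because get() returns None for the missing side.
    """
    names = sorted(set(domain) | set(enterprise))
    return [name for name in names if domain.get(name) != enterprise.get(name)]
```

An empty result means every task listed on either manager carries the same schedule on both.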

Recommendation: Overall number of agents per domain manager.
While there is no published limit on the number of agents that can report to a single domain manager, a good practice is to limit each domain manager to no more than 10,000 active registered agents.

Could a single domain manager manage more agents?  Sure, but at what trade-offs?  Are your ITCM architecture and available resources capable of, and positioned for, handling the higher capacity?
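The 10,000-agent guideline translates into a minimum number of domain managers through simple ceiling division. A minimal sketch; the default limit comes from the guideline above, not from a product-enforced maximum.

```python
def domain_managers_needed(active_agents: int, per_manager: int = 10_000) -> int:
    """Smallest number of domain managers keeping each at or under per_manager agents."""
    if active_agents < 0 or per_manager <= 0:
        raise ValueError("active_agents must be >= 0 and per_manager > 0")
    return -(-active_agents // per_manager)  # integer ceiling division
```

For example, 25,000 active agents call for at least 3 domain managers under this guideline.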

Recommendation: Overall number of agents per scalability server.
While there is no published limit on the number of agents that can report to each scalability server, a good practice is to limit each scalability server to no more than 1,000 active registered agents.

Could a single scalability server handle more agents?  Sure, but at what trade-offs (again)?  Consider your internal service level objectives (SLOs) for asset jobs, software jobs and reporting.
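A quick way to apply the 1,000-agent guideline is to compare per-server agent counts against it. The sketch assumes you already have those counts (for example, from your own reporting); the server names are hypothetical.

```python
from typing import Dict


def overloaded_servers(agents_per_server: Dict[str, int], limit: int = 1_000) -> Dict[str, int]:
    """Map each scalability server above the limit to its number of excess agents."""
    return {name: count - limit
            for name, count in agents_per_server.items()
            if count > limit}
```

A server exactly at the limit is not flagged; the excess figure shows how many agents would need to move to bring a server back within the guideline.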

Recommendation: Don't use the domain manager as a scalability server.
Although the domain manager contains a built-in scalability server, avoid registering any agents with the domain manager itself.  Using the domain manager as a scalability server should be a rare exception; the one agent that should report to it is the domain manager's own agent, which is required to report locally.

The reason for this has to do with the dual nature of the CAM service.  CAM carries local/internal application traffic as well as remote traffic across the network.  Therefore, the more agents that report directly to the domain manager, the busier CAM is handling all of this communication.  The more remote traffic CAM on the domain manager has to process, the less time it has for internal application traffic, which results in degraded application performance.

It is always a good practice to ensure all your agents report to external scalability servers that are locally or regionally available to serve those agents.  A scalability server is not really serving its purpose if it is not physically located reasonably near its registered agents.  Physical location and network response times should always be considered.
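The placement advice can be read as a selection rule: among scalability servers with spare capacity, prefer the one with the lowest measured response time from the agent's location. This is a hypothetical illustration of the rule, not an ITCM feature; the round-trip times and server names are assumed inputs from your own measurements.

```python
from typing import Dict, Optional


def pick_scalability_server(rtt_ms: Dict[str, float], registered: Dict[str, int],
                            limit: int = 1_000) -> Optional[str]:
    """Pick the lowest-latency scalability server that still has capacity.

    rtt_ms: measured round-trip time from the agent's site to each server.
    registered: current number of agents registered with each server.
    Returns None if every server is already at or above the limit.
    """
    candidates = [server for server in rtt_ms if registered.get(server, 0) < limit]
    if not candidates:
        return None
    return min(candidates, key=lambda server: rtt_ms[server])
```

Checking capacity before latency keeps the nearest server from being loaded past the per-server guideline.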

Recommendation: Overall number of engines per domain manager.
While there is no published limit on the number of engine instances per domain manager, take two factors into consideration:

1- Each engine is a portal into the SQL database for concurrently processing information.  In particular, the engines processing collect or replication tasks typically generate a high volume of transactions against the SQL database.

2- The SQL database on the back end remains a single, constant resource, regardless of how many engines are added.

Therefore, while increasing the number of engines can introduce a degree of parallelism for processing data, running too many engine processes in parallel can also begin to work against the system.  Unfortunately, ITCM is unaware of the performance of the underlying database, so the system performs no automatic load balancing.  It is up to the ITCM and SQL administrators to monitor database performance and make decisions to improve performance and throughput as necessary.

Considering the factors above, a typical best practice is to create no more than 8 to 10 engine instances in addition to the built-in SystemEngine.
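The engine guideline can be checked mechanically. A minimal sketch, assuming you have a list of engine names; it treats the upper end of the 8 to 10 range as the default limit.

```python
from typing import List, Optional


def engine_count_warning(engines: List[str], max_additional: int = 10) -> Optional[str]:
    """Return a warning if too many engines exist besides the built-in SystemEngine.

    engines: engine names (e.g. as listed in DSM Explorer).
    Returns None when the count is within the guideline.
    """
    additional = [name for name in engines if name != "SystemEngine"]
    if len(additional) > max_additional:
        return (f"{len(additional)} engines besides SystemEngine; "
                f"guideline is at most {max_additional}")
    return None
```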

Recommendation: Advanced engine settings.
By default, each engine instance is configured to collect 10,000 files from a scalability server when processing a collect task.  They are also configured for a default rest interval of 60 seconds between tasks.

A good practice is to reduce the number of files collected per cycle to 500 or 1,000.  This lets the engine cycle through a collect task (or list of collect tasks) in a shorter interval, which allows computer registration messages to be processed more frequently and the scalability servers to be "validated" more often.  During the "validation" phase of a collect task, a series of synchronizations is performed, including: 1- the software signature file, 2- inventory module configuration and scheduling, and 3- asset job configuration and scheduling.
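The effect of a smaller batch can be illustrated with a simplified model in which registration processing and validation only happen between collection cycles. The per-file processing cost is a hypothetical parameter, not a measured ITCM figure.

```python
from typing import Tuple


def collect_cycle_stats(backlog_files: int, batch_size: int,
                        seconds_per_file: float) -> Tuple[int, float]:
    """Return (cycles to drain the backlog, seconds spent per full cycle).

    Simplified model: the engine collects up to batch_size files per cycle,
    and registration processing / validation happen between cycles.
    seconds_per_file is a hypothetical average cost; measure your own.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    cycles = -(-backlog_files // batch_size)  # integer ceiling division
    return cycles, min(batch_size, backlog_files) * seconds_per_file
```

With a 30,000-file backlog at 0.25 seconds per file, `collect_cycle_stats(30_000, 10_000, 0.25)` returns `(3, 2500.0)` and `collect_cycle_stats(30_000, 1_000, 0.25)` returns `(30, 250.0)`: the total work is the same, but in this model the gap between validation points shrinks from roughly 42 minutes to roughly 4 minutes.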

Another good practice is to reduce the interval between engine tasks to between 20 and 30 seconds.  If an engine has a lengthy list of collect tasks (or many tasks in general), a shorter interval reduces the engine's idle time, so it can cycle through more tasks in a shorter period of time.
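The same kind of back-of-the-envelope model shows why the rest interval matters. It assumes the engine rests once after every task, and the average task duration is a hypothetical input.

```python
from typing import Tuple


def pass_timing(task_count: int, rest_interval_s: float,
                avg_task_s: float) -> Tuple[float, float]:
    """Return (idle seconds, total seconds) for one pass over an engine's tasks.

    Simplified model: one rest period after every task; avg_task_s is a
    hypothetical average processing time per task.
    """
    idle = task_count * rest_interval_s
    return idle, idle + task_count * avg_task_s
```

For 20 tasks averaging 90 seconds each, `pass_timing(20, 60, 90)` returns `(1200, 3000)`, while `pass_timing(20, 25, 90)` returns `(500, 2300)`.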

Recommendation: Tasks linked to the SystemEngine
As the "SystemEngine" is the default/built-in engine for ITCM, it performs additional background tasks that other engine instances do not.  For this reason, it is recommended not to link any time-consuming task to the SystemEngine.
 
Remember that the SystemEngine is also the default engine to which collect tasks are linked when new scalability servers are deployed.  Check the SystemEngine, then reassign these tasks and balance them across other engine instances.  Again, when reassigning a collect task, keep in mind the number of agents reporting to that scalability server and how long the collection might take.
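When redistributing collect tasks away from the SystemEngine, one simple approach is the greedy longest-processing-time heuristic: sort tasks by agent count, largest first, and give each to the currently least-loaded engine. This sketch assumes agent count is a rough proxy for collect time; it is a general scheduling heuristic, not something ITCM performs, and the task and engine names are hypothetical.

```python
import heapq
from typing import Dict, List


def balance_collect_tasks(task_agents: Dict[str, int],
                          engines: List[str]) -> Dict[str, List[str]]:
    """Assign collect tasks to engines, largest first, to the least-loaded engine.

    task_agents: collect task name -> number of agents behind it.
    engines: engine instances to spread the tasks across.
    """
    if not engines:
        raise ValueError("at least one engine is required")
    heap = [(0, name) for name in engines]  # (current agent load, engine name)
    heapq.heapify(heap)
    assignment: Dict[str, List[str]] = {name: [] for name in engines}
    # Largest tasks first; ties broken by task name for a stable result.
    for task, agents in sorted(task_agents.items(), key=lambda kv: (-kv[1], kv[0])):
        load, engine = heapq.heappop(heap)
        assignment[engine].append(task)
        heapq.heappush(heap, (load + agents, engine))
    return assignment
```

Pass only the additional engine instances in `engines`, leaving SystemEngine out, so that no collect task is reassigned to it.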
 
Recommendation: Dynamic group evaluations
When a dynamic group (or query-based group) is created, DSM Explorer allows you to configure an "evaluation engine" and an "evaluation interval".  Always specify a particular engine to evaluate the query.  Never use "All Engines", and never set the evaluation interval to less than 60 minutes.  Always ask yourself how often the members of this group need to be updated.  In most cases, once or twice per day is sufficient.
 
It is never recommended to set any dynamic group to be evaluated by "All Engines".  With this setting, the dynamic group is evaluated every time any engine completes any task.  Depending on the complexity of the query being evaluated, this can decrease the performance of the engines and degrade the performance of DSM Explorer.
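The two rules for dynamic groups are easy to express as a configuration check. A hypothetical sketch: it represents the evaluation engine by its display label and the interval in minutes, which may not match how ITCM stores these values.

```python
from typing import List


def dynamic_group_issues(eval_engine: str, interval_minutes: int) -> List[str]:
    """Return rule violations for one dynamic group (empty list if none)."""
    issues = []
    if eval_engine == "All Engines":
        issues.append('evaluated by "All Engines"; assign a specific engine')
    if interval_minutes < 60:
        issues.append(f"interval of {interval_minutes} min is below 60 min")
    return issues


def evaluations_per_day(interval_minutes: int) -> int:
    """Scheduled evaluations in 24 hours for a given interval.

    Does not apply to "All Engines", where evaluation follows task completion.
    """
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    return (24 * 60) // interval_minutes
```

A 15-minute interval means `evaluations_per_day(15)` = 96 evaluations a day, compared with 2 for a 720-minute (twice daily) interval.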
 
The following SQL query is useful for listing all the dynamic groups, the query on which they are based, the creation date of the query, the specified interval at which the engine evaluates the query, which engine is evaluating the query and the last evaluation date of the dynamic group.
 
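-- Lists dynamic groups, their queries, evaluation interval, evaluation engine and last evaluation time.
-- with (nolock) reads without taking shared locks, so results may include uncommitted changes.
select gd.label as 'Dynamic Group',
       qd.label as 'Query Name',
       -- creation_date holds seconds since 1970-01-01
       dateadd(s, qd.creation_date, '19700101') as 'Query Creation Date',
       gd.eval_freq as 'Eval Interval',
       isnull(eng.label, '(no matching engine)') as 'Eval Engine',
       -- epoch seconds converted to a datetime, then shifted by the SQL Server's current UTC offset
       dateadd(s, datediff(s, getutcdate(), getdate()),
               dateadd(s, gd.last_eval_date_time, '19700101')) as 'Last Evaluated'
from ca_group_def gd with (nolock)
inner join ca_query_def qd with (nolock) on gd.query_uuid = qd.query_uuid
-- left join keeps groups whose evaluation_uuid matches no engine row, instead of dropping them
left join ca_engine eng with (nolock) on gd.evaluation_uuid = eng.engine_uuid
order by gd.last_eval_date_time desc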
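Once the query results are exported (for example, to CSV), a short script can show how dynamic group evaluations are distributed across engines. This sketch assumes the export keeps the query's column aliases, such as 'Eval Engine', in its header row.

```python
import csv
from collections import Counter
from typing import Dict, Iterable


def groups_per_engine(csv_lines: Iterable[str]) -> Dict[str, int]:
    """Count dynamic groups per evaluation engine in a CSV export of the query.

    Assumes the header row keeps the query's column aliases
    (for example 'Dynamic Group' and 'Eval Engine').
    """
    reader = csv.DictReader(csv_lines)
    return dict(Counter(row["Eval Engine"] for row in reader))
```

A large share of groups evaluated by SystemEngine may be worth rebalancing, for the reasons given in the SystemEngine recommendation.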