This article describes a proven method for capturing the log files needed to analyze customer problems that are very difficult to catch when they occur. The method uses a nas Lua script to automatically save (dump) specified log files when an alarm occurs, triggered by an Auto-Operator (AO) profile that filters on the alarm message or on any other filter you choose to define in the profile. You can edit the script to capture whichever log files are appropriate for your problem.
It is especially useful when the problem is infrequent or inconsistent, occurs randomly, or follows a pattern, such as occurring at a given time on the weekend or during evening hours.
In short, the script automates capturing the logs at the exact time the issue occurs, which makes the event easier to analyze.
Listed below are a few scenarios in which this method has been used successfully; it can be applied to many other troubleshooting situations.
- A customer using Oracle as the backend database received alarms from the data_engine at 11 PM every Saturday evening.
- A customer needed to capture hub logs when their remote tunnel dropped randomly.
- A customer received random occurrences of an alarm from their data_engine.
- A customer was erroneously receiving alarms from a probe during a maintenance schedule.
Note that this sample script can be used to automate the capture of log files for any probe and for any reason. Since the trigger is the alarm message text in this case, defining an accurate message filter (regex) is key.
How to set up the nas to dump log files when an error or event occurs.
Setup tips: (note that this is just an example; the probe whose log files you dump can of course be any probe.)
- Set the probe's loglevel to 5 and its logsize to at least 100000. You can set the loglevel with the slider in the probe GUI, or set both values with Raw Configure.
- The log 'archive' folder where you plan to dump the files MUST EXIST. You can use any folder name you like, but create the folder first.
- Increase the logsize further if that is needed to catch the relevant data.
In Infrastructure Manager, select the probe, hold down the SHIFT key, and right-click to enter Raw Configure mode, then add a key named logsize under the setup section if it does not already exist. You can also open the probe's Raw Configure from the Admin Console.
- Open the nas probe, add the script below under the Auto-Operator -> Scripts tab, and save it. Optionally, create a new script folder and save it there to keep your scripts organized.
------------------ cut here -----------------------
-- List of probe log file paths
from_file_path_nas = "E:\\Program Files (x86)\\Nimsoft\\probes\\service\\nas\\"
from_file_path_mm = "E:\\Program Files (x86)\\Nimsoft\\probes\\service\\maintenance_mode\\"
from_file_path_co = "E:\\Program Files (x86)\\Nimsoft\\robot\\"
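-- List of probe log file names (without extensions)
file1 = "nas"
file2 = "_nas"
-- file3 and file4 are used by the maintenance_mode copies below; these names
-- are assumed from the probe folder name, so adjust them to match your system
file3 = "maintenance_mode"
file4 = "_maintenance_mode"
file5 = "controller"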
-- Location of where to store the files (you must create this folder)
to_file_path = "E:\\Log Archive\\"
-- timestamp to be used as part of the log file extension
ts = timestamp.now()
-- Copy the log files to the folder you created
rc = file.copy (from_file_path_nas..file1..".log",to_file_path..file1.."-"..ts..".log")
rc = file.copy (from_file_path_nas..file2..".log",to_file_path..file2.."-"..ts..".log")
rc = file.copy (from_file_path_mm..file3..".log",to_file_path..file3.."-"..ts..".log")
rc = file.copy (from_file_path_mm..file4..".log",to_file_path..file4.."-"..ts..".log")
rc = file.copy (from_file_path_co..file5..".log",to_file_path..file5.."-"..ts..".log")
------------------ cut here -----------------------
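As the list of logs grows, the copy lines in the script above can be driven from a table instead of one variable per file. The following is a minimal sketch of that idea, not part of the original script: `build_copy_plan` and the `sources` table are hypothetical names introduced here for illustration, the paths are the example paths from the script above, and only the nas and controller logs are listed. It relies on the same `file.copy` and `timestamp.now` calls the script above uses and prints the return code without interpreting it.

```lua
-- Sketch only: a table-driven variant of the log dump script.
-- build_copy_plan and sources are hypothetical names, not part of the nas API.

-- Probe log folders and the log file names (without extensions) to save from each.
local sources = {
   { dir = "E:\\Program Files (x86)\\Nimsoft\\probes\\service\\nas\\", names = { "nas", "_nas" } },
   { dir = "E:\\Program Files (x86)\\Nimsoft\\robot\\", names = { "controller" } },
}

-- Location of where to store the files (you must create this folder).
local to_file_path = "E:\\Log Archive\\"

-- Build a list of {from, to} pairs, one per log file,
-- with the stamp inserted into the target file name.
local function build_copy_plan(srcs, dest_dir, stamp)
   local plan = {}
   for _, src in ipairs(srcs) do
      for _, name in ipairs(src.names) do
         plan[#plan + 1] = {
            from = src.dir .. name .. ".log",
            to = dest_dir .. name .. "-" .. stamp .. ".log",
         }
      end
   end
   return plan
end

-- Run the copies only when the nas-provided file and timestamp tables exist,
-- so the file also loads without error in a plain Lua interpreter.
if file and timestamp then
   for _, job in ipairs(build_copy_plan(sources, to_file_path, timestamp.now())) do
      local rc = file.copy(job.from, job.to)
      print(job.from .. " -> " .. job.to .. " (rc=" .. tostring(rc) .. ")")
   end
end
```

To capture another probe's logs with this variant, add one more entry to `sources` with that probe's folder and log file names; the copy loop itself does not change.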
- Define a nas Auto-Operator profile that filters on the message text, uses the script action, and is set to run 'On message arrival'. An example of a nas AO profile used to test the script is attached to this article. Once the profile is activated, it executes the script whenever an alarm arrives that matches the regex pattern in the message filter.
- You can test the script and its results (logs written to the directory) before relying on it: open the nas Status tab, right-click, and choose "Send Test Alarm" with a message such as:
"Test message for Oracle error ORA-00604 testing 123" or whatever alarm text is generated when the particular event or problem occurs.
- Check the alarm sub-console to make sure your test alarm was generated. Make sure no filter is enabled on the alarm sub-console view that would hide the alarm when it is generated; you can click Reset to clear the filter.
- Then check the log archive folder (or whatever folder name you used in the script) to make sure the saved logs were stored in the expected location.
If you run the script from within the nas scripts window, you can verify that it executed successfully. Here is an example of such output:
Here is a screenshot of a Log Archive location containing the dumped log files from a script execution:
Note that if you need to save these log files on a remote hub/relay, you need to:
- Create the directory where the files will be saved
- Install the nas probe on the remote hub
- Enable the nas queue so that the alarm messages are visible locally in the Status window
- Create the AO profile with a proper message filter to capture the logs when the relevant alarm occurs, e.g.,
/.*Tunnel from NMS-Server \(20x.xx.xx.xxx\) has been disconnected. Reason: n/a.*/
- Test to make sure the log files are created using a test alarm
Example AO profile (screenshot below) for the above scenario:
For whatever alarm(s) accompany the reported problem, filter on a single alarm if that makes sense, OR on multiple alarms if the alarms generated by the problem tend to vary, e.g.,
But ALWAYS test the regex by using the nas to send a test alarm message and make sure the logs are then dumped as a result.