Basic Issue Troubleshooting Guide

Document ID : KB000025373
Last Modified Date : 14/02/2018

Description:

A basic troubleshooting guide for several types of Access Control issues. It also covers the information needed when opening an Access Control support issue.

Solution:

IMPORTANT: This article contains information about modifying the registry.
Before you modify the registry, make sure to create a backup of the registry and that you understand how to restore it if a problem occurs.
For more information about how to back up, restore, and edit the registry, please review the relevant Microsoft Knowledge Base articles on support.microsoft.com.

This document aims to help CA's clients shorten the time to resolution of issues by providing a step-by-step guide on how to describe problems and deliver the information CA Technical Support may need. Provide this information immediately when opening an issue with CA Technical Support.

General Information

Basic questions:

  1. OS version and Service Pack/patch level (see Appendix A).

  2. eAC version, Service Pack, and patch level:

    1. On UNIX, run the command: /opt/CA/eTrustAccessControl/issec
    2. On Windows, run Access Control Policy Manager and click Help -> About.
    3. On Windows, open the Properties of an Access Control binary (e.g. seosagent.exe) and click the Version tab.

  3. Problem description. Send a step-by-step description of how to recreate the problem.

    Example: a terminal access problem from one server to another.

    1. Log in to terminal A with telnet as userA.
    2. SSH to terminal B with this command: ssh -l userB terminalB
    3. The login to terminal B fails with the message "Connection Closed" at step 2.

  4. Has the problem occurred since installation, or did it appear after some time? If after some time, how long had AC been installed?

  5. Has anything changed on the system recently, at or near the time the behavior started?

FILE access problems:

Example of a FILE access problem

I am trying to protect against unauthorized access to files within /abc/*. I have written the rule: nr FILE /abc/* defacc(n) audit(all)

However, the root user can still access the files. Why is this happening if the default access is none?

Questions:

  1. For the USER you are testing with, set the audit mode to all.
    Document Reference:
    Access Control Command Reference: chusr/editusr
    e.g. chusr userA audit(all)

  2. For the resource (rule) you are accessing, set the audit mode to all.
    Document Reference:
    Access Control Command Reference: chres/editres
    e.g. chres FILE /tmp/abc audit(all)

  3. Start a TRACE and recreate the problem.

    1. Start the trace: secons -tc -t+
    2. Recreate the problem.
    3. Stop the trace: secons -t-
    4. Send the trace file $SEOSDIR/log/seosd.trace,
      where $SEOSDIR is your Access Control directory on the system.

  4. Send all files in the $SEOSDIR/log directory.

  5. Export your audit records to a text file before sending them, so that Support can review the audit and trace files immediately. To export the audit records, run:
    seaudit -a > audit.txt

  6. Send the output of "so list": selang -c "so list" > c:\so_list.txt

  7. On Windows, use regedit.exe to export the HKLM\software\ComputerAssociates registry key to a file.

  8. Send the database rules in a text file (rules.txt) by running:
    dbmgr -e -r -f rules.txt
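The items above produce several files. As a convenience, the following Python sketch bundles everything under $SEOSDIR/log together with the text exports (audit.txt, so_list.txt, rules.txt) into one compressed archive for the support case. It is a hypothetical helper, not a CA-supplied tool; the function name and archive name are illustrative, and it assumes you have already run the trace, seaudit, selang and dbmgr commands above.

```python
# Hypothetical helper (not a CA tool): bundle support files into a .tar.gz.
import sys
import tarfile
from pathlib import Path


def bundle_support_files(seosdir, extra_files, archive_path):
    """Archive every file under <seosdir>/log plus any extra files that exist.

    Returns the archive member names in the order they were added.
    """
    log_dir = Path(seosdir) / "log"
    added = []
    with tarfile.open(archive_path, "w:gz") as tar:
        # Everything in $SEOSDIR/log (seosd.trace, seos.audit, error logs, ...)
        if log_dir.is_dir():
            for path in sorted(log_dir.rglob("*")):
                if path.is_file():
                    member = (Path("log") / path.relative_to(log_dir)).as_posix()
                    tar.add(str(path), arcname=member)
                    added.append(member)
        # Text exports such as audit.txt, so_list.txt and rules.txt; missing ones are skipped
        for name in extra_files:
            path = Path(name)
            if path.is_file():
                tar.add(str(path), arcname=path.name)
                added.append(path.name)
    return added


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python bundle_support.py <SEOSDIR> [extra files ...]
    for member in bundle_support_files(sys.argv[1], sys.argv[2:], "support_case.tar.gz"):
        print(member)
```

For example: python bundle_support.py /opt/CA/eTrustAccessControl audit.txt so_list.txt rules.txt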

What to look for in the trace and audit files when encountering a FILE rule issue:

  1. Denial and untrusted records. These appear in audit.txt with a code of D or U. A reason is normally given as a numeric code on the same line. To get the meaning of a numeric code, run: seaudit -t | grep <number>

  2. Repeated records for a single file. These can indicate the cause of a performance slowdown or too high a level of auditing (e.g. audit(all) instead of audit(failure)).
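Both checks can be scripted against the exported audit.txt. The Python sketch below is an illustration under assumptions that do not come from the product documentation: one record per line, the result code appearing as a standalone D or U token, and file names appearing as tokens that start with "/". Adjust the parsing to the actual layout of your audit.txt.

```python
# Sketch: flag D/U records and repeated file paths in an exported audit.txt.
# ASSUMED layout (adjust to your file): one record per line, the result code
# is a standalone token "D" or "U", and file names are tokens starting with "/".
import sys
from collections import Counter

PROBLEM_CODES = {"D", "U"}  # Denied, Untrusted


def problem_records(lines):
    """Return the lines that contain a standalone D or U token."""
    return [line.rstrip("\n") for line in lines
            if PROBLEM_CODES.intersection(line.split())]


def repeated_files(lines, threshold=2):
    """Return {path: count} for path-like tokens seen at least `threshold` times."""
    counts = Counter(token for line in lines for token in line.split()
                     if token.startswith("/"))
    return {path: n for path, n in counts.most_common() if n >= threshold}


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as fh:
        records = fh.readlines()
    for record in problem_records(records):
        print(record)
    for path, n in repeated_files(records, threshold=50).items():
        print(f"{n:8d}  {path}")
```

The threshold of 50 in the command-line path is arbitrary; pick a value that stands out against normal activity on your system.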

PERFORMANCE problems:

Example of a Performance problem
The server runs an Oracle 10.0.2.3 database in a standalone configuration.
Before Access Control was installed, sqlplus started in 3 seconds.
The command line used was: sqlplus -a
After Access Control was installed, sqlplus took 18 seconds to start. When Access Control was shut down, sqlplus again started in 3 seconds.

Questions:

Analyzing a performance problem requires information about system performance before AC is installed or started, followed by information about any degradation during AC startup and while AC is running. Provide answers to the basic questions and:

  1. Did the problem start immediately after installation?

  2. Does performance degrade at startup of the operating system, at startup of Access Control, or after some time?

  3. Are any processes scheduled to run at specific times?

  4. Does performance degrade around those times?

  5. Does the problem occur only when the AC kernel extension is loaded?

    1. To determine whether the AC kernel extension is loaded, run 'issec' and look for this line in the output: eTrustAC kernel extension is loaded.
    2. If the AC daemons are running, stop them with: secons -sk
    3. Unload the kernel extension with: SEOS_load -u

  6. Does the problem occur when the AC daemons are running?

  7. Are any third-party filter drivers installed on the system? For example: Network Associates antivirus (McAfee), Sophos, etc.

  8. Is a system monitoring application installed? For example: TNG, Tivoli, BMC, Insight Manager.

  9. Check the database with dbmgr -u -all (after stopping all Access Control services).
    You can optionally rebuild the database with:

    1. cd /opt/CA/eTrustAccessControl/seosdb
    2. /opt/CA/eTrustAccessControl/bin/dbmgr -u -build seos_cdf.dat
    3. /opt/CA/eTrustAccessControl/bin/dbmgr -u -build seos_odf.dat
    4. /opt/CA/eTrustAccessControl/bin/dbmgr -u -build seos_pdf.dat
    5. /opt/CA/eTrustAccessControl/bin/dbmgr -u -build seos_pvf.dat

  10. Which processes use the most resources when performance degrades? Are they AC processes such as seosd, seagent, or seoswd, or other processes?

    1. On Windows, use Task Manager. On the Processes tab, click the CPU column header to sort the processes with the heaviest CPU users at the top. Do the same for memory. Take a screenshot (Ctrl + Prt Scr).
    2. On UNIX, depending on the OS, use top, prstat, vmstat, etc.

  11. Do the eAC processes keep the same PID from the time the services are started until they are stopped? A changing PID means the process has been restarted.

    Data:

  12. Gather the operating system logs or event files.

    1. For Windows
      Export the Application and System event logs to .evt files and send them. Note the date and time eAC was started, when the problem occurred, and when eAC was stopped.
    2. For UNIX: send /var/adm/syslog and/or /var/adm/messages. The location of the system logs varies with the type of UNIX being used.

  13. Run a trace from the time the Access Control daemons start. To do this:

    1. Edit the seos.ini file and set: trace_to=file
    2. Restart Access Control.
    3. Reproduce the problem.
    4. Stop the trace by running: secons -t-
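Question 11 (whether the eAC processes keep the same PID) can be answered by sampling the process table at intervals. The Python sketch below assumes a POSIX-style ps that accepts "ps -e -o pid= -o comm=" (on HP-UX this form may require UNIX95=1 in the environment); the daemon names are the ones listed in question 10, and the sampling interval is your choice.

```python
# Sketch: report eAC daemons whose PID changed between two samples.
# ASSUMES a POSIX-style "ps -e -o pid= -o comm="; on HP-UX this form may
# need UNIX95=1 set in the environment.
import subprocess
import sys
import time

DAEMONS = ("seosd", "seagent", "seoswd")


def parse_ps(output, names=DAEMONS):
    """Map each watched daemon name to the set of PIDs found in ps output."""
    pids = {name: set() for name in names}
    for line in output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        base = parts[1].strip().rsplit("/", 1)[-1]  # some systems print a full path
        if base in pids:
            pids[base].add(int(parts[0]))
    return pids


def pid_changes(before, after):
    """Names whose set of PIDs differs between two snapshots."""
    return sorted(name for name in before if before[name] != after.get(name, set()))


def snapshot():
    out = subprocess.run(["ps", "-e", "-o", "pid=", "-o", "comm="],
                         capture_output=True, text=True, check=True).stdout
    return parse_ps(out)


if __name__ == "__main__" and len(sys.argv) > 1:
    interval = int(sys.argv[1])  # seconds between samples; stop with Ctrl+C
    previous = snapshot()
    while True:
        time.sleep(interval)
        current = snapshot()
        for name in pid_changes(previous, current):
            print(time.strftime("%H:%M:%S"), name,
                  sorted(previous[name]), "->", sorted(current[name]))
        previous = current
```

Any line printed with a timestamp is a candidate restart to correlate with the system logs gathered in item 12.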

What to look for in the trace:

  1. Repeated events within a short period of time (e.g. many file accesses or connections).
  2. Processes that are being killed.
  3. ACEEH=-1 or U=<negative number>. These indicate that AC is unable to resolve a user name or assign a value to a resource, which can mean that the AC caches need to be updated or that a login rule is incorrect.
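A large trace file is easier to review when the lines with unresolved accessors are pulled out first. This Python sketch searches only for the two patterns named above, ACEEH=-1 and U=<negative number>, and assumes nothing else about the trace format.

```python
# Sketch: list trace lines containing ACEEH=-1 or U=<negative number>.
# Only these two patterns come from this article; the rest of the trace
# format is not assumed.
import re
import sys

UNRESOLVED = re.compile(r"ACEEH=-1(?!\d)|(?<![A-Za-z0-9_])U=-\d+")


def unresolved_lines(lines):
    """Yield (line number, text) for each line matching either pattern."""
    for number, line in enumerate(lines, start=1):
        if UNRESOLVED.search(line):
            yield number, line.rstrip("\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g. python scan_trace.py $SEOSDIR/log/seosd.trace
    with open(sys.argv[1], errors="replace") as fh:
        for number, text in unresolved_lines(fh):
            print(f"{number}: {text}")
```

The lookarounds keep the match to exactly ACEEH=-1 and to a standalone U= field, so values such as ACEEH=-12 or XU=-3 are not reported.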

Troubleshooting:

Windows

  1. Disable the eAC filter driver:

    1. Stop the eAC services.
    2. Use REGEDT32.exe to set this registry value to zero:
      HKLM\software\memco\seos\seos\UseFsiDrv=0
    3. Restart the eAC services.
    4. Check whether the problem still occurs.

REMOTE CONNECTION problems:

Example of a connection problem
Automated FTP transfers run every night via cron. After Access Control was installed, the FTP transfers began timing out. When we ran the FTP command manually, the connection was closed automatically. We then turned off Access Control and ran the FTP command manually again. This time the connection was not killed.

Questions:
Provide answers to the basic questions and:

  1. Has the encryption key been changed on any of the eAC machines (using the sechkey utility)?

  2. Is any machine using an encryption method other than the default? (The default is scramble; other methods include DES and 3DES.)

  3. Has the TCP/UDP port been changed on any of the machines?

  4. Are there any TCP, CONNECT, HOSTNET, or HOST class rules that may be causing the problem? (Currently these classes are active only in eAC for UNIX.)

    1. Run a trace while AC has this connection problem:

      1. Start the trace: secons -tc -t+
      2. Recreate the connection problem.
      3. Stop the trace: secons -t-
      4. Send the file $SEOSDIR/log/seosd.trace,
        where $SEOSDIR is your Access Control directory on the system.

What to look for:
In the trace file, look for connections that are blocked by TCP rules or other rules.
In the AC audit file, look for D records indicating that a rule was blocking the port.
Search for the port number and check whether any code other than P (Permitted) appears.
Find out whether the machine has a firewall or closed ports. Use UNIX commands such as netstat -an to list the open ports.
Check your syslog for any ports that fail to bind.
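To separate a firewall or closed-port problem from an Access Control rule, it can help to test from the client side whether a TCP connection to the remote machine can be opened at all. The Python sketch below does only that; the host name and port are placeholders, so use the port your eAC installation is configured for.

```python
# Sketch: check whether a TCP port on a remote machine accepts connections.
# Host and port are placeholders supplied on the command line.
import socket
import sys


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name not resolved
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python port_check.py <host> <port>
    host, port = sys.argv[1], int(sys.argv[2])
    state = "open" if port_open(host, port) else "closed or filtered"
    print(f"{host}:{port} is {state}")
```

If the port is reachable with AC stopped but connections are still closed with AC running, the audit and trace checks above are the next place to look.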

PMDB problems:

Example of a PMDB problem
The PMDB setup is as follows:
Server A has the master PMDB called masterpm, which has three subscribers: PM1@aix, PM2@sun, and PM3@hp.
I am pushing rules from masterpm to the subscribers.
PM1 and PM2 receive the update, but PM3 never receives the update.
When I look at the error log of masterpm, I see the following message: cannot receive update from non-parent PMDB.

Questions:
Provide answers to the basic questions and:

  1. Describe the PMDB scheme: which machines run UNIX and which run Windows NT. Use hostnames, and state which machines subscribe to which in the PMDB hierarchy.
  2. Explain what action was being performed when the PMDB problem occurred.

Data:

  1. Send the output of the following commands (note the upper- and lower-case option letters):

    1. sepmd -L <pmdb name>
    2. sepmd -C <pmdb name>
    3. sepmd -e <pmdb name>
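The three outputs can be captured into separate files in one step. This Python sketch assumes sepmd is on the PATH; the output file naming is an illustrative choice, and the runner is injectable so the logic can be checked without a PMDB.

```python
# Sketch: capture sepmd -L, -C and -e output for one PMDB into text files.
# ASSUMES sepmd is on the PATH; the output file names are illustrative.
import subprocess
import sys
from pathlib import Path


def sepmd_commands(pmdb):
    """The three commands requested above (note the option letter case)."""
    return [["sepmd", option, pmdb] for option in ("-L", "-C", "-e")]


def save_outputs(pmdb, outdir, run=subprocess.run):
    """Run each command and write stdout+stderr to <outdir>/sepmd<opt>_<pmdb>.txt."""
    outdir = Path(outdir)
    outdir.mkdir(parents=True, exist_ok=True)
    written = []
    for cmd in sepmd_commands(pmdb):
        result = run(cmd, capture_output=True, text=True)
        target = outdir / f"sepmd{cmd[1]}_{pmdb}.txt"
        target.write_text(result.stdout + result.stderr)
        written.append(target.name)
    return written


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python save_sepmd.py <pmdb name> <output directory>
    print(save_outputs(sys.argv[1], sys.argv[2]))
```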

What to look for:
In the sepmd -L output, look for subscribers that are unavailable. Then look in the sepmd -e output for that hostname to find out which errors are being encountered.
Put the subscriber back as available with the command: sepmd -r <pmdb name> <subscriber> (e.g. sepmd -r pmdb1 client.ca.com)
Then re-run sepmd -L and sepmd -e to review the latest errors and status of the PMDB.
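When several subscribers are unavailable, the re-enable and review steps can be assembled as one command list. In this Python sketch you still identify the unavailable subscribers yourself from the sepmd -L output; the PMDB and subscriber names are examples, and by default the commands are only printed.

```python
# Sketch: assemble the sepmd -r calls for subscribers you identified as
# unavailable in the sepmd -L output, then review with sepmd -L and -e.
# The PMDB and subscriber names are examples only.
import shlex
import subprocess
import sys


def reenable_commands(pmdb, subscribers):
    """sepmd -r for each subscriber, followed by sepmd -L and sepmd -e."""
    commands = [["sepmd", "-r", pmdb, sub] for sub in subscribers]
    commands += [["sepmd", "-L", pmdb], ["sepmd", "-e", pmdb]]
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=False)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python reenable.py <pmdb name> <subscriber> [...]   (prints only)
    run_all(reenable_commands(sys.argv[1], sys.argv[2:]), dry_run=True)
```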

Database maintenance:

The Access Control database is kept in a set of flat files in the seosdb directory under the Access Control install directory. Periodic database maintenance keeps these files optimized for speed and reliability. As many updates are made to the files, they can become fragmented, which slows down database lookups by the Access Control daemons.

Re-index the database when you suspect a performance problem. If possible, also rebuild the database during a regular maintenance period (e.g. every 3 or 6 months).

Perform all of these steps while logged in as the root user.

  1. Indexing and rebuilding

    1. Back up the database files in $SEOSDIR/seosdb (where $SEOSDIR is your Access Control directory, e.g. /opt/CA/eTrustAccessControl).
    2. Stop the AC daemons: secons -s
    3. Index and rebuild with these commands:

      1. cd /opt/CA/eTrustAccessControl/seosdb
      2. /opt/CA/eTrustAccessControl/bin/dbmgr -u -build seos_cdf.dat
      3. /opt/CA/eTrustAccessControl/bin/dbmgr -u -build seos_odf.dat
      4. /opt/CA/eTrustAccessControl/bin/dbmgr -u -build seos_pdf.dat
      5. /opt/CA/eTrustAccessControl/bin/dbmgr -u -build seos_pvf.dat

  2. Recreating the database

    1. Back up the database files in $SEOSDIR/seosdb (where $SEOSDIR is your Access Control directory, e.g. /opt/CA/eTrustAccessControl).
    2. Export the rules to a text file.

      If the AC daemons are running:

      i. cd /opt/CA/eTrustAccessControl/seosdb
      ii. /opt/CA/eTrustAccessControl/bin/dbmgr -e -r -f /tmp/rules.txt

      If the AC daemons are not running:

      i. cd /opt/CA/eTrustAccessControl/seosdb
      ii. /opt/CA/eTrustAccessControl/bin/dbmgr -e -l -f /tmp/rules.txt

    3. Stop the AC daemons: secons -s

    4. Make a new seosdb directory

      1. cd /opt/CA/eTrustAccessControl
      2. mv ./seosdb ./seosdb.bak
      3. mkdir seosdb

    5. Make a clean AC database

      1. cd /opt/CA/eTrustAccessControl/seosdb
      2. /opt/CA/eTrustAccessControl/bin/dbmgr -c -cq

    6. Import the rules into the new database.

      1. cd /opt/CA/eTrustAccessControl/seosdb
      2. /opt/CA/eTrustAccessControl/bin/selang -l -f /tmp/rules.txt

    7. Start the AC daemons: seload
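The indexing and rebuilding procedure is repetitive enough to script. The Python sketch below assumes the default install path and that the AC daemons are already stopped (secons -s). It copies seosdb to a timestamped backup (an illustrative naming choice), then runs dbmgr -u -build on each of the four files from inside seosdb. By default it only prints what it would do.

```python
# Sketch of the "Indexing and rebuilding" steps with a dry-run default.
# ASSUMES the default install path and that the AC daemons are already
# stopped; the timestamped backup name is an illustrative choice.
import shlex
import shutil
import subprocess
import sys
import time
from pathlib import Path

DB_FILES = ("seos_cdf.dat", "seos_odf.dat", "seos_pdf.dat", "seos_pvf.dat")


def rebuild_commands(seosdir):
    """One 'dbmgr -u -build <file>' command per database file."""
    dbmgr = str(Path(seosdir) / "bin" / "dbmgr")
    return [[dbmgr, "-u", "-build", name] for name in DB_FILES]


def rebuild(seosdir, dry_run=True):
    seosdb = Path(seosdir) / "seosdb"
    backup = seosdb.with_name("seosdb.bak." + time.strftime("%Y%m%d%H%M%S"))
    print(f"back up {seosdb} -> {backup}")
    if not dry_run:
        shutil.copytree(seosdb, backup)  # copy, so seosdb itself stays in place
    for cmd in rebuild_commands(seosdir):
        print(f"(cd {seosdb} && {shlex.join(cmd)})")
        if not dry_run:
            subprocess.run(cmd, cwd=seosdb, check=True)  # stop at the first failure


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python rebuild_db.py /opt/CA/eTrustAccessControl [--run]
    rebuild(sys.argv[1], dry_run="--run" not in sys.argv)
```

Running it once without --run shows the exact commands, which you can compare against the steps above before executing them.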

LOG ROUTING problems:

Example of a log routing problem
Forty servers send their local audit records to a central log collector. All the records are received; however, some of them show an IP address instead of a hostname. Why are they shown like this?

Questions:
Provide answers to the basic questions.

Data:

  1. A copy of the selogrd.cfg file.
  2. Output of the command selogrd -d while the problem is occurring.
  3. Details of the log routing problem (e.g. if some logs are missing, which ones?).
  4. The seos.audit file that contains the logs that are not being routed.
  5. These records exported to a text file, e.g. seaudit -a > audit.txt
  6. Output of seos.collect.audit from the log collector, so that the routed logs can be compared with the original logs. In the collector's log directory, run: seaudit -a -fn seos.collect.audit > collector.txt

What to look for:
Identify the exact times of the missing logs.
Compare audit.txt and collector.txt to determine whether the missing records are in the original audit.txt file. If they are not there, they were never routed in the first place, so they cannot be missing from collector.txt.
Look for error messages in the selogrd -d output. Normally selogrd only reports that it is periodically checking for new audit records to route; any other messages can indicate a problem.
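The audit.txt and collector.txt comparison can be automated as a set difference. This Python sketch assumes that a routed record keeps the same text as the original apart from whitespace; if the collector adds or reorders fields (for example the source host), adapt normalize() before relying on the result.

```python
# Sketch: records in audit.txt that never show up in collector.txt.
# ASSUMES a routed record keeps the same text as the original apart from
# whitespace; if the collector adds or reorders fields, adapt normalize().
import sys


def normalize(line):
    """Collapse runs of whitespace so spacing differences do not matter."""
    return " ".join(line.split())


def missing_on_collector(original_lines, collected_lines):
    """Original records (in order) whose normalized text is not on the collector."""
    collected = {normalize(line) for line in collected_lines if line.strip()}
    return [normalize(line) for line in original_lines
            if line.strip() and normalize(line) not in collected]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python compare_logs.py audit.txt collector.txt
    with open(sys.argv[1], errors="replace") as a, open(sys.argv[2], errors="replace") as c:
        for record in missing_on_collector(a.readlines(), c.readlines()):
            print(record)
```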

Appendix A:

Finding out your exact OS and patch level.

SUN

Output of the command uname -a: 5.10 Generic_120011-14 sun4v sparc
Contents of the file /etc/release:
Solaris 10 8/07 s10s_u4wos_12b SPARC
Means: Solaris 10 Update 4

Linux

Output of the command uname -a:
2.6.24-7-generic #1 SMP Thu Feb 7 00:56:31 UTC 2008 x86_64
Distribution name:
Contents of /boot/grub/grub.conf (if using the GRUB boot loader), e.g. RHELinux AS 4.0 Update 4
Means: Red Hat Linux AS 4.0 Update 4 (kernel 2.6.24-7) on x86_64 hardware

AIX

Output of the command oslevel -r: 5300-04
Means: AIX 5.3, maintenance level 04
Output of the commands bootinfo -y and bootinfo -K: 32 or 64
bootinfo -y returns the bit width the hardware supports, and bootinfo -K returns the bit width of the kernel the OS is currently running.

HP-UX

Output of uname -a: B.11.11
Output of getconf KERNEL_BITS: 32
Means: HP-UX 11.11 - 32 bit kernel

Windows
Method 1:

Right-click My Computer.
Click Properties.
On the General tab, under System, you will see the version (e.g. Windows XP Professional Service Pack 2).

Method 2:

Run winmsd from a command line or from Start > Run.
Export the resulting system report as a text file and send it to CA Support.
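On UNIX systems the Appendix A details can be gathered in one report. The Python sketch below uses the standard platform module and reads /etc/release (Solaris) when present; /etc/redhat-release is an added assumption for Red Hat systems. The oslevel, bootinfo and getconf commands from this appendix are run only if they exist on the PATH, and failed commands are skipped.

```python
# Sketch: gather the Appendix A details in one place.
# platform is the Python standard library; /etc/redhat-release is an
# added assumption for Red Hat systems. The UNIX commands are run only if
# they exist on the PATH, and commands that fail are skipped.
import platform
import shutil
import subprocess
from pathlib import Path

RELEASE_FILES = ("/etc/release", "/etc/redhat-release")
EXTRA_COMMANDS = (("oslevel", "-r"), ("bootinfo", "-y"), ("bootinfo", "-K"),
                  ("getconf", "KERNEL_BITS"))


def os_report():
    """Return a dict of OS details; keys depend on what this system offers."""
    report = {
        "system": platform.system(),
        "release": platform.release(),
        "version": platform.version(),
        "machine": platform.machine(),
    }
    for name in RELEASE_FILES:
        path = Path(name)
        if path.is_file():
            report[name] = path.read_text(errors="replace").strip()
    for cmd in EXTRA_COMMANDS:
        if shutil.which(cmd[0]):
            result = subprocess.run(list(cmd), capture_output=True, text=True)
            if result.returncode == 0:
                report[" ".join(cmd)] = result.stdout.strip()
    return report


if __name__ == "__main__":
    for key, value in os_report().items():
        print(f"{key}: {value}")
```

The printed report can be attached to the support case alongside the uname -a output described above.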