How do I delete duplicate records from the Ingres database?

Document ID : KB000054656
Last Modified Date : 14/02/2018
Show Technical Document Details

Description:

Sometimes you may have duplicate entries in your database. This could have been caused by a number of reasons. (Schema inconsistencies, Applications added duplicate entries, directory not configured to use unique-attrs, etc).

Solution:

There are two ways of removing the duplicate entries from the Ingres Database.

  1. DXloaddb (recommended)

    1. Take a LDIF dump from the database
    2. Delete the duplicate entry from the LDIF file
    3. Use ldifsort to sort the file
    4. Use dxloaddb to load the data from the LDIF file into the database

  2. Ingres (For super users only)
    The following procedure will assist you in deleting the duplicate entry directly from the Ingres database.
    Please be aware that any mistakes made during the procedure may corrupt the database.
    Before performing the following steps please take a backup of your database and test the procedure on a test system.

Eg. Your database has 2 entries of John and you want to delete one of them.
NOTE: Perform the following steps as user dsa on Unix and the user who has permissions to access the database on Windows

  • Run the following command on the command prompt.
    dxserver stop <dsaname>
  • Create a text file(findentry.sql) with the following content.
    SELECT eid,rdn FROM dit
      WHERE rdn LIKE '%JOHN%' \p\g\q
    The above sql query will give us all entries that have *JOHN* (Word JOHN is Case-sensitive), The "%" near string JOHN means wildcard.
    Note: The RDN should consist *JOHN*. You can replace this if the naming attribute is different, eg. if it was an ID=112233, you need to replace %JOHN% with %112233%.

  • Run the following command
    sql <dbname> <findentry.sql> findentry.log
    findentry.log will have the following entries Eg. 
        EID                   RDN
        2222             JOHN SMITH         14         
        4444             JOHN SMITH         14
  • You may want to delete a specific duplicate entry. To find out the attributes which are in the entry do the following
    Create a text file(entrydata.sql) with the following content.
    SELECT * FROM entry e,dit d
         WHERE e.eid = d.eid
          AND (d.eid=2222 OR d.eid=4444) \p\g\q
  • Run the following command
    sql <dbname> <entrydata.sql> entrydata.log
    View the entrydata.log file to see what information both the entries consisted of. Decide which entry you need to delete. You need to note the eid value of the entry. eg. you decide to delete eid 2222
    If you need to delete all the references of the duplicate entry (eid=2222) from the database. The following will help you achieve this.

  • Create a file del-eid.sql with the following contents and replace <eid> with the real EID (eg 2222 like in the above example)
    delete from entry where eid = <EID>;
    delete from tree where eid = <EID>;
    delete from dit where eid = <EID>;
    delete from name where eid = <EID>;
    delete from search where eid = <EID>;
    delete from subsearch where eid = <EID>;
    delete from blob where eid = <EID>;
    delete from alias where eid = <EID>;
    commit; 
    \p\g\q
  • Run the following command
    sql <dbname> <del-eid.sql> del-eid.log
  • To confirm you have deleted all references of eid=2222 you can do the following
    Create a text file(check-del-eid.sql) with the following content and replace <eid> with the real EID you want to delete (eg. 2222)
    SELECT * FROM entry e,dit d
      WHERE e.eid = d.eid
      AND d.eid=<eid> \p\g\q
  • Run the following command
    sql <dbname> <check-del-eid.sql> check-del-eid.log
  • View the contents of check-del-eid.log. The file should report (0) rows.

  • You may want to take a backup of your database at this stage, in case you want to restore to this point then you will not have to go through this procedure again.
    You can do one of the following for taking a backup

    • Take a LDIF dump of the database using DXdumpdb tool
    • Take a checkpoint of the database using the DXbackupdb tool

  • Run the following command to start the dsa
    dxserver start <dsaname>

APPENDIX

You can find out if any duplicate entries exist by running the following sql query on your database.

SELECT a.rdn, a.eid, b.eid FROM dit a, dit b
    WHERE (a.eid > b.eid AND a.rdn = b.rdn) \p\g