Using VISION:Builder to Identify Records with Duplicate Data.

Document ID : KB000023691
Last Modified Date : 14/02/2018
Show Technical Document Details

Description:

How can you identify which records have duplicate keys or other data that is duplicated?

Solution:

If a file has records with duplicate data, there is frequently a need to identify which records contain the duplicates. As long as the data is sorted, this is easily resolved. Use a temporary field to store the value of the prior record, and compare the value in the record to the temporary field.

We could use this technique to identify keys, but let's say there is another field we want to analyze.

Let's take a simple case where we have these records where the key is bytes 1-8 and the field of interest is the 1 byte field that follows:

  RECORD 1   A
  RECORD 2   A
  RECORD 3   B
  RECORD 4   C
  RECORD 5   D
  RECORD 6   D
  RECORD 7   D
  RECORD 8   E
  RECORD 9   E

To get this report:

  --------------------------------------------------
      KEY     DATA             DUPFLAG
  --------------------------------------------------
    RECORD 1   A
    RECORD 2   A    DATA DUPLICATES RECORD 1
    RECORD 3   B
    RECORD 4   C
    RECORD 5   D
    RECORD 6   D    DATA DUPLICATES RECORD 5
    RECORD 7   D    DATA DUPLICATES RECORD 6
    RECORD 8   E
    RECORD 9   E    DATA DUPLICATES RECORD 8
  ** MK4FT03  TYPE 0  END OF REPORT.

We would code this in the ASL freeform syntax:

  CONTROL
  FILE MASTER  INPUT  NAME MYDATA
  FILE REPORT
  ;
  LOGIC: PROC
  ;
  DUPFLAG:  FIELD TYPE C  LENGTH 30   INIT ' '
  PRIORKEY: FIELD TYPE C  LENGTH 10   INIT ' '
  TEMP:     FIELD TYPE C  LENGTH  1   INIT ' '
  ; 
    IF DATA = T.TEMP    THEN
        COMBINE 'DATA DUPLICATES' T.PRIORKEY  STORE T.DUPFLAG  BLANKS 1
    END
  ;
    REPORT  KEY, DATA, T.DUPFLAG
            FORMAT HEADINGS NAME
    END
  ;
    LET T.TEMP = DATA
    LET T.PRIORKEY = KEY
    LET T.DUPFLAG = ' '
  ;
  END

This is the code in fixed format that is generated as a result of the ASL code above:

--------------------------------------------------------------------
   RUN     STMT    FILE    O N S D U T R A B S D S L  RB  R S M R S
   NAME    TYPE    NAME    L E E I P R E U U R L C S  PL  J O O F S
                           D W Q R D N P D F T M N T  TK  T P P O R
--------------------------------------------------------------------
(ASLRC   ) (RC) (MYDATA  )(S)         (S)                      (R)
 
     *****************************
     *  PROC NAME - LOGIC        *
     *****************************
---------------------------------------------------------------
STMT   FIELD    FIELD FLD DEC OUTPT EDIT      INITIAL
TYPE   NAME     LNGTH TYP PLC EDIT  LGTH       VALUE
---------------------------------------------------------------
(TF) (DUPFLAG ) ( 30) (C)
(TF) (PRIORKEY) ( 10) (C)
(TF) (TEMP    ) (  1) (C)
(TF) (TEMP___1) ( 15) (C)                (DATA DUPLICATES )
 
------------------------------------------------------------------  ------------------
STMT SEQ  LOG CON .....OPERAND-A.....  OPER  ............OPERAND-B    ......RESULT....
TYPE NO.  LEV CTR QLF FIELD   SEG-LVL  ATN   QLF FIELD                 QLF FIELD
------------------------------------------------------------------  ------------------
(PR)(100)           (DATA    ) 001-1  (EQ)   (T,TEMP        )
(PR)(101)                             (NS)     (103         )
(PR)(102)         (T,TEMP___1)        (C1)   (T,PRIORKEY    )          (T,DUPFLAG )
(PR)(103)                             (GO)     (SUB REPT___2)
(PR)(104)                             (R )     (DATA        )          (T,TEMP    )
(PR)(105)                             (R )     (KEY         )          (T,PRIORKEY)
(PR)(106)                             (R )   (C             )          (T,DUPFLAG )
 
     *****************************
     *  REQUEST NAME - REPT___2  *
     *****************************
------------------------------------------------------------------------------ --------
STMT REPORT                                 MAX  SEL SUM VERT FORMS PAGE PAGE LINE REQ
TYPE  DATE           REQUESTOR ID          ITEMS CTL RPT  SP  CNTRL WDTH HGHT NOS? TYP
--------------------------------------------------------------------------------------
(ER)(TODAY )                                                                       (S)
 
------------------------------------------------
 ST SUM V  8  PG  PG   I SP  MAXIMUM TB COLUMN
 TY RPT S LPI WID HGT  M FRM LPP PGS TO HEADING
 MP     P              G             LT TYP
------------------------------------------------
(E1)                                    (F)
 
--------------------------------
          SP  Q
STMT SEQ  BF  L  FIELD     SEG-
TYPE NO.  COL F  NAME      LVL
--------------------------------
(R1)           (KEY     ) 001-1
(R1)           (DATA    ) 001-1
(R1)         (T,DUPFLAG )

Note that similar logic can be used to find redundant data in dependent segments as long as the data is presented in sequence.