Using XREF with Fast Data Masker

Document ID : KB000014859
Last Modified Date : 14/02/2018
Show Technical Document Details
Question:

I need to use the XREF masking in a project. I was looking for documentation on the subject, but I can't find any. Do you have any document and script you could send me? I need to replace a user code by a user code that would be generated. 

I need to mask a username, first name and last name in multiple tables in different databases. The masked username, first name and last name are stored in a different database for now but I could bring it back in our scramble database.  I would also generate the new masked data in TDM and they are some rule I need to follow. Also the username is synchronized with our Windows AD groups so what we have must stay the same. 

Example: 
Username: alvert
Generated username: pusertft 

First name: Leo 
Generated first name: puse 

Last name: Robert
Generated last name: rtft 

Now every time I find « alvert » in a field I need to change it with « pusertft » and I also need to change the first and last name with the generated value. I think the XREF should do what I need to do. 

What is the Connectscramble.txt format and what are the values inside the file and where should the file be? Once the connection is made and that I have setup the XREF table what is the mask type. I didn’t find any XREF function but I found something in HASHLOV. 

I understand that is not recommended but I have no choice because the value that we now have in our system it’s the value that is needed to be put out. 

Example: 
PROD: Username ALVERT 
DEV: Masked username : DKW5123 

The DKW5123 exists in our AD group so when a user goes to test he does not have to login we SSO. This is why we need the XREF. If we use FORMATENCRYPT the user can’t connect or we would have to recreate the AD group and it’s not part of our project. The only part missing in your explanation is how to use the XREF in FDM. In the doc it says to use for Param3 : XREF,PERSON_SSN(PERSON_ID). When I replicate it, I get an error. This is the only part missing on how to use it on a field that I need.

Environment:
CA Test Data Manager (TDM)- Fast Data Masker (FDM)
Answer:

The documentation in TDM 4.1 had an error in it. The cross reference identifier does not go in parm3. The cross reference identifier goes in the cross reference column in the masking CSV. If the table is pre populated and you just want to do a lookup then use the IGNORE function together with the cross reference identifier. 

In FDM when masking CSV, the cross reference identifier MUST be in the Cross Reference column. Typically you would not use a cross reference with HASHLOV, as it gives consistent masking anyway. Parm3 is used to specify the column to hash on, if this is left blank, then the current value (in the example ADDR_LINE1) is used for the hash. 


The connectscramble.txt is just the connection to your scramble database. The XREF details mentioned in the documentation were under masking rules and options available. There were some gaps in our documentation that have been fixed since TDM 4.1.

Using cross reference in FDM: 
One way of guaranteeing consistent masking across different tables or databases is to use values from a cross reference table. The structure of the table is as follows: 

CREATE TABLE gtsrc_xref ( 
rx_ref_id varchar (254) NOT NULL, 
rx_old_value varchar (254) NOT NULL, 
rx_new_value varchar (254) ); 

ALTER TABLE gtsrc_xref 
ADD CONSTRAINT gtsrc_xref_pk PRIMARY KEY ( 
rx_ref_id , 
rx_old_value ); 

CREATE INDEX gtsrc_xref_nu ON scramble.gtsrc_xref ( 
rx_ref_id , 
UPPER("RX_OLD_VALUE") ); 

This table can exist in the same schema as the one you are masking, but more typically exists in a different one. If scramble components for Subset have been installed, the table will exist in the scramble schema for Oracle or the scramble database for SQL Server. 


When masking 2 different columns the same way in FDM, if the SAME cross reference identifier is used in both masks, then the data will be masked in the same way. The following is an example of how the cross reference table is populated when two credit card numbers are masked.

cross reference fdm.png

When a cross reference identifier is supplied in FDM, as masking proceeds, FDM checks to see if the value to be masked exists in the rx_old_value column in the cross reference table for the identifier supplied. If it does exist, the rx_new_value is used, otherwise a new row is inserted into the table with both the old and new (masked) value. In this way, consistency is achieved. 


It should be noted that this approach is not the recommended method for consistent masking in FDM as it is not very performant for larger data sets. A better way to achieve consistent masking is to use one of the deterministic functions in FDM, so if masking names or addresses use HASHLOV, or if masking an id column use FORMATENCRYPT. 


If using cross reference the following need also be supplied in the options:

CROSSREFCONNECT - this is the name of the connection file used to connect to the cross reference table. Use the mapper to create a connection profile to the schema containing the cross reference table. A drop down list in the options show the connection profiles available, choose the one connecting to gtsrc_xref 
CROSSREFTABLE – this is the name of the cross reference table, usually gtsrc_xref. 

Additional Information:

If you experience any further issues, please open a support case by going to https://www.ca.com/us/services-support/ca-support/contact-support.html.