The DataSet Mask
The DataSet mask reads replacement values for sensitive data from a CSV file.
Before you can use a DataSet mask, you must first define the DataSet CSV file from which data will be read (please refer to the DataSets section for details). You can then use DataSet masks as described below.
For example, suppose we want to mask a sensitive column called FamilyName in the table dbo.Customer using family names from a CSV file that has been defined in the DataSets window as family_names_dataset.
[Note: Although DataVeil provides a Person Last Name mask, if you are in a region other than North America you may wish to use names better suited to your country or language. You can do this by creating a CSV file containing such names and using a DataSet mask to apply them.]
To do this, we create a DataSet mask, select the name of the DataSet in the mask's 'Using DataSet' combo box, and then use the 'DataSet Column' combo box to select the required column from the CSV file.
Deterministic and Non-Deterministic Modes
If the DataSet Mask is configured as deterministic then the 'Non-Deterministic parameters' section shall be disabled (as shown above).
You can set the determinism in the 'Determinism' tab:
Determinism is explained in more detail in The Deterministic Option.
In Deterministic mode, DataVeil always replaces a given distinct value with the same DataSet value, even across repeated masking executions. This assumes, of course, that neither the DataSet nor the Deterministic Seed value (if any) has changed.
DataVeil implements determinism as follows:
* Using the Seed value, DataVeil calculates a hash value for the original sensitive value.
* DataVeil then uses this hash value as an index into the DataSet to select a replacement value.
Therefore, if the number of data rows in a DataSet CSV file changes (adding or removing a heading row does not count), the selected indices, and hence the replacement values, will generally change. You can, however, add or remove unused columns in the DataSet without affecting the deterministic result on subsequent masking executions.
Deterministic mode is particularly useful when you want to ensure consistent replacement across databases. It also effectively preserves duplicates, because identical original values always receive the same replacement. DataVeil additionally provides a separate Duplicates capability that always synchronizes duplicates, regardless of the Determinism setting.
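The two steps above, and the effect of changing the row count, can be illustrated with a minimal Python sketch. It is not DataVeil's implementation: the choice of SHA-256, the way the seed is combined with the value, and the names `pick_index` and `deterministic_replace` are all assumptions made for illustration. The rows passed in are assumed to exclude the CSV heading row.

```python
import hashlib


def pick_index(value: str, seed: str, row_count: int) -> int:
    """Map an original value to a DataSet row index (hypothetical scheme)."""
    # Assumption: SHA-256 over seed + separator + value. DataVeil's actual
    # hash function is not documented here.
    digest = hashlib.sha256((seed + "\x00" + value).encode("utf-8")).digest()
    # The hash is reduced modulo the number of data rows, which is why
    # adding or removing rows changes which row is selected.
    return int.from_bytes(digest[:8], "big") % row_count


def deterministic_replace(
    value: str, seed: str, dataset_rows: list[dict[str, str]], column: str
) -> str:
    """Return the replacement for `value` from the chosen DataSet column."""
    row = dataset_rows[pick_index(value, seed, len(dataset_rows))]
    # Only the selected column is read; any other columns are ignored.
    return row[column]
```

Because only the row count feeds into the index, extra or removed columns leave the result unchanged, while adding or deleting a data row shifts most selections.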
When 'Not Deterministic' is selected, the 'Non-Deterministic parameters' in the DataSet tab shall be enabled.
This lets you select rows from the DataSet in either random or sequential order.
If sequential access is specified and the DataSet has fewer rows than the target table, DataVeil cycles back to the start of the DataSet.
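A minimal sketch of the two non-deterministic access orders, assuming the DataSet column has already been loaded into a Python list. The function names are hypothetical, and DataVeil's own random source is not documented here; `random.Random` is used purely for illustration.

```python
import random
from typing import Iterator


def sequential_values(dataset: list[str], row_count: int) -> Iterator[str]:
    """Yield one DataSet value per target row, in order."""
    for i in range(row_count):
        # Wrap around when the DataSet is shorter than the target table.
        yield dataset[i % len(dataset)]


def random_values(
    dataset: list[str], row_count: int, rng: random.Random
) -> Iterator[str]:
    """Yield an independently chosen random DataSet value per target row."""
    for _ in range(row_count):
        yield rng.choice(dataset)
```

For example, a three-row DataSet applied sequentially to a seven-row table yields A, B, C, A, B, C, A.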
Using Column Groups
The DataSet mask lets you select multiple columns from a single DataSet row to be used as replacement data for multiple columns on a single row in a table.
Use this when multiple columns must be masked with values that are consistent within each row.
For example, suppose we need to mask two columns: City and Zip. It would be unacceptable to replace these with mismatched values, such as a city name paired with a zip code that belongs to a different city.
To do this, you must first define a Column Group containing the City and Zip columns.
Next, select any column that belongs to the Column Group (e.g. 'City') and create a DataSet mask for it. The DataSet mask will contain all columns that belong to the Column Group.
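The key idea of a Column Group is that a single DataSet row supplies all of the grouped values for a table row. The sketch below assumes a random row choice and a hypothetical mapping from table columns (City, Zip) to DataSet columns (city, zip); it does not reflect how DataVeil configures groups internally.

```python
import random


def mask_row_group(
    target_row: dict, group: dict[str, str], dataset_row: dict[str, str]
) -> dict:
    """Overwrite every table column in the group from the same DataSet row."""
    masked = dict(target_row)  # leave the caller's row untouched
    for table_col, dataset_col in group.items():
        masked[table_col] = dataset_row[dataset_col]
    return masked


def mask_table(
    rows: list[dict],
    group: dict[str, str],
    dataset: list[dict[str, str]],
    rng: random.Random,
) -> list[dict]:
    # One DataSet row is chosen per table row (hypothetical random choice),
    # so City and Zip always come from the same CSV row and stay paired.
    return [mask_row_group(r, group, rng.choice(dataset)) for r in rows]
```

Columns outside the group (such as an id column) pass through unchanged.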
Null and Empty Value Handling
The DataSet mask shall always replace all original values, including nulls and empty strings, with a value from the defined DataSet.
If you need to preserve nulls when using the DataSet mask then add a Preserve mask, with a Where condition of IS NULL, in the column's mask sequence prior to the DataSet mask.
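The effect of placing a Preserve mask with an IS NULL condition ahead of the DataSet mask can be modelled as follows. This is a hypothetical simplification: `mask_column` and `is_null` are illustrative names, the DataSet is read sequentially, and a preserved value is assumed to be skipped by all later masks in the sequence.

```python
from typing import Callable, Optional


def is_null(v: Optional[str]) -> bool:
    """Equivalent of a Where condition of IS NULL (matches None only)."""
    return v is None


def mask_column(
    values: list[Optional[str]],
    dataset: list[str],
    preserve_where: Optional[Callable[[Optional[str]], bool]] = None,
) -> list[Optional[str]]:
    """Apply an optional Preserve step, then a DataSet replacement step."""
    out: list[Optional[str]] = []
    for i, v in enumerate(values):
        if preserve_where is not None and preserve_where(v):
            # Preserve mask matched: the original value is kept and
            # (by assumption) later masks skip it.
            out.append(v)
        else:
            # DataSet mask replaces everything else, including empty strings.
            out.append(dataset[i % len(dataset)])
    return out
```

The empty string is still replaced in both cases, because an IS NULL condition does not match empty strings.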
The DataSet mask is not available on fields defined as binary objects or within a JSON mask.
The DataSet Mask within an XML Mask
The DataSet mask can be used within an XML mask. In this case, a simplified version of the DataSet mask is presented, as shown in the example below:
As with all child masks of a parent XML mask, the Where and Join tabs are not available. These appear in the parent XML mask instead, because those conditions operate at the SQL row level.
Similarly, Column Groups are not available, and the Non-Deterministic Sequential option does not offer ordering based on table column values.
It is important to understand how some of these simplifications affect the DataSet mask's operation within an XML mask:
Deterministic mode:
The DataSet mask operates normally. Every individual value within an XML record that is matched by a single XPath expression is masked individually. That is, each original value within the XML record may receive a different masked value.
Non-Deterministic mode - Random order:
The DataSet mask operates normally. Every individual value within an XML record that is matched by a single XPath expression is masked individually. That is, each original value within the XML record may receive a different masked value.
Non-Deterministic mode - Sequential order:
The DataSet mask shall use the same masked value for every individual value within an XML record that is matched by a single XPath expression. This is because the mask relies on the SQL row number of the XML record as an index to retrieve a masked value from the DataSet.
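The difference between the random and sequential orders inside an XML mask can be sketched with Python's `xml.etree.ElementTree`. The function names, the 1-based SQL row number, and the use of ElementTree's limited XPath subset (rather than whatever XPath engine DataVeil uses) are all assumptions.

```python
import random
import xml.etree.ElementTree as ET


def mask_xml_sequential(
    xml_text: str, xpath: str, dataset: list[str], row_number: int
) -> str:
    """Sequential order: one DataSet value per SQL row, reused for every match."""
    root = ET.fromstring(xml_text)
    # The index depends only on the row number (assumed 1-based), so every
    # node matched by the XPath expression receives the same value.
    replacement = dataset[(row_number - 1) % len(dataset)]
    for node in root.findall(xpath):
        node.text = replacement
    return ET.tostring(root, encoding="unicode")


def mask_xml_random(
    xml_text: str, xpath: str, dataset: list[str], rng: random.Random
) -> str:
    """Random order: each matched node is drawn independently."""
    root = ET.fromstring(xml_text)
    for node in root.findall(xpath):
        node.text = rng.choice(dataset)
    return ET.tostring(root, encoding="unicode")
```

With row number 2 and the DataSet Rossi, Tanaka, Okafor, every matched name in that record becomes Tanaka under sequential order.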