The DataSet Mask


The DataSet Mask reads data from a CSV file that is to be used as replacement data for sensitive data.

In order to use a DataSet mask you must first define the DataSet CSV file from which data shall be read. Please refer to the DataSets section for details. After this is done you can use DataSet masks as described below.

For example, suppose we want to replace a sensitive column called FamilyName in the table dbo.Customer with replacement family names that we have in a CSV file defined in the DataSets window as family_names_dataset.

[Note: Although DataVeil provides a Person Last Name mask, if you are in a region other than North America then you may wish to use names more suited to your country or language. You can do this by creating a CSV file containing such names and using the DataSet mask to use those names.]

Therefore, we create a DataSet mask and select the name of the DataSet in the mask's 'Using DataSet' combo box, and also select the 'DataSet Column' combo box to reference the required column from the CSV file.

CSV data mask


Deterministic and Non-Deterministic Modes


If the DataSet Mask is configured as deterministic then the 'Non-Deterministic parameters' section shall be disabled (as shown above).

You can set the determinism in the 'Determinism' tab:

Dataset mask determinism

Further details on Determinism are explained in The Deterministic Option.


Deterministic Mode

In Deterministic mode, DataVeil shall always replace the a distinct value with the same DataSet value, even on repeated masking executions. Of course, this assumes that the DataSet has not changed and that the Deterministic Seed value (if any) has not changed.

DataVeil performs this determinism as follows:

* Using the Seed value (if any) DataVeil calculates a hash value for the target (sensitive) value.

* DataVeil then uses this hash value as an index into the DataSet to select a replacement value.

Therefore, if the number of rows in a DataSet CSV file changes (apart from adding or removing a Headings row in the file) then the selected index will change. You can add or remove unused columns in the DataSet without affecting the deterministic result on subsequent masking executions.

Deterministic mode is very useful, particularly when you want to ensure consistent replacement across databases. This will also effectively maintain duplicates, although DataVeil also supports a separate Duplicates capability which will always synchronize duplicates, regardless of the Determinism setting.


Non-Deterministic Mode

When 'Not Deterministic' is selected, the 'Non-Deterministic parameters' in the DataSet tab shall be enabled.

This provides you with the capability to select rows from the DataSet in a random or sequential order.

If sequential access is specified and there are fewer DataSet rows than in the target table, then DataVeil shall cycle through the DataSet.


Using Column Groups

The DataSet mask lets you select multiple columns from a single DataSet row to be used as replacement data for multiple columns on a single row in a table.

This can be used when it is essential that multiple columns are masked with consistent values per row.

For example, suppose we need to mask two columns: City and Zip. It would be unacceptable to replace these values with inconsistent values such as a city name with an incorrect zip code.

In order to do this you must first define a Column Group containing the City and Zip columns.

Next, select any column that belongs to the Column Group (e.g. 'City') and create a DataSet mask for it. The DataSet mask will contain all columns that belong to the Column Group.

CSV dataset masking multiple columns


Null and Empty Value Handling

The DataSet mask shall always replace all original values, including nulls and empty strings, with a value from the defined DataSet.

If you need to preserve nulls when using the DataSet mask then add a Preserve mask, with a Where condition of IS NULL, in the column's mask sequence prior to the DataSet mask.



 The DataSet mask is not available on fields defined as binary objects.