DataSet Mask

The DataSet mask uses a DataSet definition to read replacement values from a CSV file and uses those values to mask sensitive data.

For example, let's mask a sensitive column called postalcode in the table Address with replacement data from a CSV file.

First, select the DataSet mask to add to the column.

In the DataSet Mask form, specify the DataSet definition and select the column from the DataSet definition that is to be used to mask the table's column.
 

 

Mapping

Using DataSet from

This combo box specifies whether DataVeil should use a DataSet defined in the project's DataSets tab or one defined as a DataSet Component.

Project

DataVeil shall populate the adjacent combo box with the DataSet definition names that are defined under the DataSets tab.

Component

DataVeil shall populate the adjacent combo box with the DataSet definition names that are defined as DataSet Components.
 

DataSet Column

After you specify 'Using DataSet from', the DataSet Column combo box is populated with the available column names from the CSV file of the corresponding DataSet definition. The example above shows that the 'post-code' column has been selected.

 

Determinism

If the DataSet Mask is configured as deterministic then the 'Non-Deterministic parameters' section shall be disabled (as shown above).

You can set the determinism in the 'Determinism' tab:
 


 

For further information on determinism please see Determinism Option.

 

Deterministic

In Deterministic mode, DataVeil shall always replace a distinct value with the same DataSet value, even on repeated masking executions. This holds only if neither the DataSet nor the Deterministic Seed value has changed.

DataVeil performs this determinism as follows:

* Using the Seed value, DataVeil calculates a hash value for the original sensitive value.

* DataVeil then uses this hash value as an index into the DataSet to select a replacement value.
  

Therefore, if the number of rows in a DataSet CSV file changes (other than adding or removing a Heading row in the file) then the selected index will change. You can add or remove unused columns in the DataSet without affecting the deterministic result.
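The two steps above can be sketched in Python. This is only an illustration under stated assumptions: DataVeil's actual hash function and index calculation are not described here, so SHA-256 and a modulo over the row count stand in for them, and the CSV contents and the names load_rows and pick_replacement are hypothetical.

```python
import csv
import hashlib
import io

# Hypothetical DataSet CSV with a heading row.
CSV_TEXT = """city,post-code
Springfield,11111
Riverton,22222
Lakeside,33333
"""

# Same rows with an extra, unused column added.
WIDER_CSV = """city,post-code,region
Springfield,11111,North
Riverton,22222,South
Lakeside,33333,East
"""


def load_rows(csv_text):
    """Parse the CSV text into a list of row dictionaries (heading row excluded)."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def pick_replacement(original, seed, rows, column):
    """Hash the original value with the seed, then use the hash as a row index.

    SHA-256 and the modulo step are assumptions made for this sketch.
    """
    digest = hashlib.sha256(f"{seed}:{original}".encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(rows)
    return rows[index][column]


rows = load_rows(CSV_TEXT)
first = pick_replacement("90210", 42, rows, "post-code")
again = pick_replacement("90210", 42, rows, "post-code")
wider = pick_replacement("90210", 42, load_rows(WIDER_CSV), "post-code")

print(first == again == wider)  # True
```

In this sketch the index depends only on the seed, the original value and the number of data rows, which is why an extra unused column leaves the result unchanged while a different row count or seed can select a different row.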

Deterministic mode is particularly useful when consistent replacement across databases is required. It also effectively preserves duplicates, although DataVeil supports a separate Duplicates capability that always synchronizes duplicates, regardless of the Determinism setting.
 

Non-Deterministic

In non-deterministic mode, the 'Non-Deterministic parameters' in the DataSet mask's DataSet tab shall be enabled.

This provides the capability to mask with rows from the DataSet that are selected in a random or sequential order.

If sequential access is specified and there are fewer DataSet rows than in the target table, then DataVeil shall cycle through the DataSet.
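As a rough illustration of the two access orders, the following Python sketch fills a target table that has more rows than the DataSet. The values, the row count and the fixed random seed are made up for the example, and the sketch does not claim to reproduce how DataVeil chooses rows internally.

```python
import itertools
import random

# Hypothetical DataSet column values and target table size.
dataset_values = ["11111", "22222", "33333"]
table_row_count = 7

# Sequential order: walk the DataSet in order and start again at the top
# when the DataSet runs out before the table does.
sequential = list(itertools.islice(itertools.cycle(dataset_values), table_row_count))

# Random order: each table row draws a DataSet value independently.
# The seed is fixed here only so that the sketch is repeatable.
rng = random.Random(0)
randomised = [rng.choice(dataset_values) for _ in range(table_row_count)]

print(sequential)
# ['11111', '22222', '33333', '11111', '22222', '33333', '11111']
```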

 

Column Groups

The DataSet mask enables multiple columns from a single DataSet row to be used as replacement data for multiple columns on a single row in a table.

For example, suppose it is required to mask two columns: 'city' and 'postalcode'. It would be unacceptable to replace these values with inconsistent values such as a city name with an incorrect postal code.

To keep the replacement values consistent, you must first define a Column Group containing the 'city' and 'postalcode' columns.

Next, select any column that belongs to the Column Group (e.g. 'city') and create a DataSet mask for it. The DataSet mask will contain all columns that belong to the Column Group.
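The effect of a Column Group can be pictured with a small Python sketch: one DataSet row is chosen, and every table column in the group takes its replacement from that same row. The CSV contents, the mapping of table columns to DataSet columns and the function name mask_group are hypothetical and serve only to illustrate the idea.

```python
import csv
import io

# Hypothetical DataSet CSV in which each row holds a consistent city/post-code pair.
CSV_TEXT = """city,post-code
Springfield,11111
Riverton,22222
Lakeside,33333
"""
dataset_rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Hypothetical mapping: table column name -> DataSet column name.
group_mapping = {"city": "city", "postalcode": "post-code"}


def mask_group(table_row, dataset_row, mapping):
    """Replace every grouped column using values from one single DataSet row."""
    masked = dict(table_row)
    for table_column, dataset_column in mapping.items():
        masked[table_column] = dataset_row[dataset_column]
    return masked


address = {"id": 7, "city": "Boston", "postalcode": "02108"}
masked = mask_group(address, dataset_rows[1], group_mapping)

print(masked)  # {'id': 7, 'city': 'Riverton', 'postalcode': '22222'}
```

Because both replacements come from the same DataSet row, the masked city and postal code always belong together. Columns outside the group, such as 'id' here, are left as they were.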

  

 

 

Null and Empty Value Handling

The DataSet mask shall always replace all original values, including nulls and empty strings, with a value from the defined DataSet.

If you need to preserve nulls when using the DataSet mask then add a Preserve mask, with a Where condition of IS NULL, in the column's mask sequence prior to the DataSet mask.
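The difference between the two configurations can be sketched as follows. This is a simplified Python model, not DataVeil's implementation: the preserve_nulls flag stands in for a Preserve mask with an IS NULL condition placed before the DataSet mask, and replacement values are simply taken in order.

```python
def mask_column(values, replacements, preserve_nulls):
    """Mask a column; optionally leave nulls (None) untouched.

    preserve_nulls models a Preserve mask with a Where condition of IS NULL
    that runs before the DataSet mask in the column's mask sequence.
    """
    masked = []
    for position, value in enumerate(values):
        if preserve_nulls and value is None:
            masked.append(None)  # preserved, so the DataSet mask never sees it
        else:
            # The DataSet mask replaces everything else, including empty strings.
            masked.append(replacements[position % len(replacements)])
    return masked


original = ["90210", None, "", "10001"]
replacements = ["11111", "22222", "33333"]

without_preserve = mask_column(original, replacements, preserve_nulls=False)
with_preserve = mask_column(original, replacements, preserve_nulls=True)

print(without_preserve)  # ['11111', '22222', '33333', '11111']
print(with_preserve)     # ['11111', None, '33333', '11111']
```

Note that in this model an IS NULL condition preserves only nulls. The empty string is still replaced in both cases.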

 

The DataSet Mask within an XML Mask

The DataSet mask can be used within an XML mask. In this case, a simplified version of the DataSet mask is presented, as shown in the example below:
  


 

As with all child masks of a parent XML mask, the Where and Join tabs are not available. They appear in the parent XML mask instead, because those conditions operate at the SQL row level.

Similarly, Column Groups are not available, and the Sequential Non-Deterministic option does not offer a sequential ordering based on table column value ordering.

It is important to understand how some of these simplifications affect the DataSet mask's operation within an XML mask:
  

Deterministic mode:

The DataSet mask shall operate normally. Every individual value within an XML record that is matched by a single XPath expression shall be masked individually, i.e. each original value within the XML record may receive a different masked value.
 

Non-Deterministic mode - Random order:

The DataSet mask shall operate normally. Every individual value within an XML record that is matched by a single XPath expression shall be masked individually, i.e. each original value within the XML record may receive a different masked value.
 

Non-Deterministic mode - Sequential order:

The DataSet mask shall use the same masked value for every individual value within an XML record that is matched by a single XPath expression. This is because the mask relies on the SQL row number of the XML record as an index to retrieve a masked value from the DataSet.
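The sequential case can be illustrated with a short Python sketch that uses the standard xml.etree.ElementTree module. The XML record, the path expression, the zero-based row number and the modulo indexing are all assumptions made for the example. The point is only that the replacement is looked up once per SQL row, so every matched element in that record receives the same value.

```python
import xml.etree.ElementTree as ET

# Hypothetical DataSet column values.
dataset_values = ["11111", "22222", "33333"]


def mask_sequential(xml_text, path, row_number):
    """Mask all elements matched by `path` using one value chosen by row number."""
    root = ET.fromstring(xml_text)
    # One lookup per SQL row (assumed zero-based, wrapping around the DataSet).
    replacement = dataset_values[row_number % len(dataset_values)]
    for element in root.findall(path):
        element.text = replacement
    return ET.tostring(root, encoding="unicode")


record = (
    "<addresses>"
    "<address><postalcode>90210</postalcode></address>"
    "<address><postalcode>10001</postalcode></address>"
    "</addresses>"
)

masked = mask_sequential(record, ".//postalcode", row_number=4)
print(masked)
```

With row_number=4 and three DataSet values, the sketch selects index 1, so both postalcode elements in the record become 22222.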

 

Limitations

The DataSet mask is not available on fields defined as binary objects or within a JSON mask.