The Randomize Mask

 

The Randomize mask replaces alphanumeric characters with randomized alphanumeric characters.

It will only replace alphabetic characters with random alphabetic characters (always preserving case), and numeric digits with random numeric digits.

All other characters (eg. spaces, special symbols, etc) are preserved.

This mask is primarily intended to work with text data types. Decimal numeric types can also be masked although care should be taken if distinct values are required as discussed for the "Distinct" parameters below.

Using this mask on binary numeric types may yield unexpected values.

This mask is particularly useful for fields that may contain sensitive information and you want to preserve as many lexical characteristics as possible. For example, you want to ensure that the masked value is exactly the same length as the original value and that the case is preserved. Also, you want to be sure that the punctuation, white space, etc appears exactly as it did in the original. This may help add to the degree of confidence in an application being tested because specific field characteristics, such as the actual lengths or appearance of special characters in your production database will be replicated in the masked database.

The potential disadvantage is that fields which ordinarily consist of obvious patterns, such as sentences in a client's notes, will appear as gibberish text. For example, "Client 2441234, is in debt!" could be masked with something like "Rsoewn 8710283, rm ap wopl!". This may not be an issue for your purposes; however, if you want to have readable sentences then consider the Sentences mask.

Note: An empty string as input to the Randomize mask in SQL Server will return an empty string, whereas in Oracle NULL shall be returned (as is the convention.) A NULL input shall always return NULL for SQL Server and Oracle.

 

 

 

Description: See The Mask Description Field

 

Randomize

Alphabetic characters:

If this box is selected then original alphabetic characters shall be replaced with random alphabetic characters. Case is always preserved.

If this box is not selected then all alphabetic characters shall be preserved.

Numeric characters:

If this box is selected then each original numeric digit shall be replaced with a random numeric digit.

This shall work correctly with text or decimal numeric data types. If you try to use this mask on other types, such as a binary numeric data type, then a warning shall be displayed and the mask may yield unexpected values.

If this box is not selected then all numeric digits shall be preserved.

If the data type is numeric and the number is negative then the minus sign may be counted as one character position. This is significant if using the Partial Masking option described below.

For binary numeric data types please consider using the Random Number mask.

 

Distinct Deterministic

Generate distinct masked values when in deterministic mode for distinct original values (no collisions):

This section shall only be enabled if the mask is configured for deterministic mode otherwise it shall be disabled and the settings shall be ignored at runtime.

Select this box if all instances of a distinct original value are to be masked with the same distinct masked value. i.e. a one-to-one relationship for original-to-masked values.

Example:

If this box is selected and the input value is 'Daytime' and the masked value is 'Kwucomp', then the masked value of 'Kwucomp' will be generated for only the input value 'Daytime'.

If this box is not selected then it is possible that multiple different input values could produce the same masked value of 'Kwucomp'.

Note: If you want the same set of masked values generated for the same set of original values on every Run then you must ensure that Deterministic mode is used.

 

Consider number type field as n digits wide:

This parameter field shall only be enabled when masking a numeric data type.

The Randomize mask generates distinct values for all possible values of a given character length

However, numeric data types do not consider leading zeros as significant and therefore although the Randomize mask shall produce distinct textual results, it could result in duplicates (collisions) when converted back to numeric types due to the non-significance of leading zeroes.

For example, consider the NUMERIC data type:
    Original NUMERIC value = 1      -->  Input to Randomize = "1"      -->  Masked value = "5"      -->  Converted back to NUMERIC = 5
    Original NUMERIC value = 368  -->  Input to Randomize = "368"  -->  Masked value = "005"  -->  Converted back to NUMERIC = 5

Note that as far as character data is concerned, "005" is distinct to "5"; however, when converted back to NUMERIC they shall both be 5 and therefore a collision would occur.

Therefore, this parameter effectively specifies a fixed width for which all numeric values shall be zero-padded on the left as input to the Randomize function. In the above example, if this parameter was specified as 4 digits wide then:
    Original NUMERIC value = 1      -->  Input to Randomize = "0001"  -->  Masked value = "0359"  -->  Converted back to NUMERIC = 359
    Original NUMERIC value = 368  -->  Input to Randomize = "0368"  -->  Masked value = "8276"  -->  Converted back to NUMERIC = 8276

Note that this means that the Randomize mask shall effectively generate values in the range from 0 to (10**n)-1 which in this example would be 0 to 9999 inclusive.

 

Masking Range

This panel allows you to specify a partial range of an original value to be selected for masking. The default, as shown in the mask above, shall select the entire value.

See the example below for a screen capture that shows a partial masking range selection.
  

Select Range

This defines what part of each value shall be selected for masking (subject further to the First and Last parameters below). Everything outside of this range shall be preserved.

Entire Field - The entire value is selected.

Before Substring - Only that part of the value from the appears before the specified substring in the value shall be selected for masking. The substring and remainder of the value shall be preserved.

After Substring - Only that part of the value from the appears after the specified substring in the value shall be selected for masking. The substring and part of the value that appears before the substring shall be preserved.

Substring

The substring that shall be considered the delimiter of the masking range.

Every character in this field is significant including quotes and spaces. Therefore, unless you want to search the field for a quotes and spaces do not include them in this field.

 

Preserve or Mask:

If Preserve is selected from the combo box then the First and Last parameters shall describe how many characters from the original value shall be preserved in the masked value.

If Mask is selected from the combo box then the First and Last parameters shall describe how many characters in the original value shall be masked. All other characters shall be preserved.

First

This specifies how many of the first count of characters (of the adjacent character unit) shall be preserved or masked (as specified by "Preserve or Mask" described above).

The character unit combo box offers the selection:

All                    - Every character is counted, including non-alphanumerics. ie. This yields a fixed offset.
 Alphabetic        - Only alphabetic characters are counted
 Numeric           - Only numeric characters are counted
 AlphaNumeric   - Only alphanumeric characters are counted

Example:

Suppose you have a telephone number field that contains a 3 digit area code followed by some other digits and formatting. However, the formatting is variable, such as in the following sample data:
     604-123-1234
     (415) 123-1234

Therefore, if you wanted to preserve the area codes and mask the remaining digits while preserving the original formatting, you could specify the Randomize parameters:
 Preserve, First 3 Numeric, which could yield masked values something like:
     604-817-9553
     (415) 261-8971

Last

This specifies how many of the last count of characters (of the adjacent character unit) shall be preserved or masked (as specified by "Preserve or Mask" described above).

 

 Example

Suppose that you wanted to mask only that part of email addresses before the '@' character. This would automatically preserve the domain names because they appear outside of this range.

Furthermore, suppose that you wanted to preserve the first 2 characters and last 1 character of this selected masking range. You probably wouldn't really want to preserve these 3 characters; it's being done in this example for illustrative purposes only.

The following settings would achieve this result:

Here is the corresponding sample output:

Consider the first line in the sample output above. First, the masking range in selected. This is the substring before the first occurrence, from the left, of the '@' character. Therefore, the range selected is 'Roseann.Cheek'. Next, we specified that the first 2 characters and the last 1 character of this range are to be preserved. The remainder of the range shall be masked. Therefore the masked result of the selected range is 'Rolcecr.Bhsvk' (the preserved characters are shown in bold). The full masked result for this field is 'Rolcecr.Bhsvk@ourtestdomain.com'.

 

Size Limitations:

The following are the maximum character lengths per value to be masked:

MySQL: 65,535
 Oracle: 4,000
 SQL Server: 2GB

Size and Performance Limitations:

Although the Randomize mask can accept very long values to be masked, this mask is computationally intensive and is therefore one of the slowest built-in masks. 

Please consider installing DataVeil Native Library for much faster performance of this mask. The estimates below refer to SQL performance (not native library).

It is slowest when Deterministic with the Distinct option is used.

Non-deterministic mode is faster and can typically mask 350,000 16 byte values per minute except MySQL where it would be approximately 160,000 per minute.

Performance is linear so if a value is twice as long then it shall take twice as long to mask.

The table below shows maximum field lengths accepted and approximate performance masking 16 character values on typical server configurations:

DBMS

Type/Maximum Length

Deterministic Distinct/min

Deterministic Non-Distinct/min

Not Deterministic/min

Azure SQL DB:

NVARCHAR(MAX)

200,000

300,000

350,000

MySQL:

TEXT CHARSET utf8mb4

50,000

130,000

160,000

Oracle:

NVARCHAR2(2,000)

200,000

300,000

350,000

SQL Server: 

NVARCHAR(MAX)

200,000

300,000

350,000