The Sentences Mask

 

The Sentences mask creates simple random sentences.

It is intended primarily to overwrite text fields that would normally contain some form of descriptive text such as comment fields.

The generated sentences are nonsensical, but they consist of real words arranged as sentences. This may be preferable to a mask such as Randomize, because Randomize produces seemingly random alphanumeric text (not words), which could give the end user reason to doubt the integrity of the database. The presence of actual sentences, by contrast, indicates an intended replacement.

Consider the following original sample data:

 

When you create a Sentences mask, all of the mask parameters are optional. The default mask is shown below:

 

A sample result from applying this Sentences mask on the data above is shown below.

 

Same total length as original

The generated text shall be exactly the same length as the original text. This means that the last word may be truncated.

For example, if the original sentence was "A short story." (length = 14) then the masked value would also be 14 characters even if it means that the last word must be truncated, such as "A quick proce.".
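The exact-length rule can be illustrated with a minimal Python sketch. It is not DataVeil's implementation: the function name `fit_exact` and the generated sentence "A quick process runs." are hypothetical, and the sketch assumes the closing full stop is kept so that the cut falls inside the last word, as in the example above.

```python
def fit_exact(sentence: str, target_len: int) -> str:
    """Cut a generated sentence to exactly target_len characters.

    Assumption: the closing full stop is preserved, so the last word
    absorbs the truncation.
    """
    if target_len <= 0:
        return ""
    body = sentence.rstrip(".")
    if len(body) + 1 < target_len:
        raise ValueError("generated sentence is shorter than the target length")
    # Keep target_len - 1 characters of text and re-append the full stop.
    return body[: target_len - 1] + "."


original = "A short story."                 # length 14
generated = "A quick process runs."         # hypothetical generated sentence
print(fit_exact(generated, len(original)))  # A quick proce.
```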

 

Randomized total length in the range

This specifies a randomized length range for the masked value.

Minimum

This specifies the minimum length of the masked sentence.

Original length and N percent/characters

This value specifies the minimum length of the sentence relative to the original length. Note that this value can be negative, as with the default of -25 percent shown in the panel above. Therefore, if an original value's length was 200 characters then the above panel specifies a randomized minimum length of 150 characters.

DataVeil shall always ensure that the calculated minimum is at least 1.

Maximum

This specifies the maximum length of the masked sentence.

Original length and N percent/characters

This value specifies the maximum length of the sentence. Therefore, if an original value's length was 200 characters then the above panel specifies a randomized maximum length of 250 characters.

DataVeil shall always ensure that the calculated maximum does not exceed the field size.
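The arithmetic for the minimum and maximum can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `length_range`, its parameters, the rounding of fractional percentages (floor division) and the handling of a minimum that ends up above the maximum are not taken from DataVeil.

```python
import random


def length_range(original_len, min_value, max_value, field_size, unit="percent"):
    """Return (minimum, maximum) target lengths relative to the original length."""
    if unit == "percent":
        low = original_len + original_len * min_value // 100
        high = original_len + original_len * max_value // 100
    else:  # unit == "characters"
        low = original_len + min_value
        high = original_len + max_value
    low = max(1, low)             # the calculated minimum is at least 1
    high = min(field_size, high)  # the calculated maximum fits the field
    low = min(low, high)          # assumption: keep the range well-formed
    return low, high


print(length_range(200, -25, 25, field_size=4000))  # (150, 250)
print(length_range(200, -25, 25, field_size=220))   # (150, 220)

# A randomized target length is then drawn from the inclusive range.
low, high = length_range(200, -25, 25, field_size=4000)
target_len = random.randint(low, high)
```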
  

Include whole words if possible

If this option is selected then the last word generated shall be a whole word and not a truncated word.

An exception is that the last word may be truncated if the length specification is too short to contain a whole word.

When this option is selected and the last generated word exceeds the mask's target length then the sentence shall be shortened to include up to the last whole word that will fit within the target length.

For example, suppose the mask's length setting 'Same total text length as original' is selected, the original value length is 30 characters, and DataVeil generates the sentence 'The quick brown fox acts suspiciously.'. The generated sentence is 38 characters long, which exceeds the target length of 30. Therefore, DataVeil would remove trailing words until the length falls within 30 characters. In this case that would result in 'The quick brown fox acts.', which is only 25 characters long but is the closest to the target length using whole words.

If you want exact length replacements then do not select this whole words option. If the whole words option were not selected in the example above then the generated sentence would be 'The quick brown fox acts susp.', which matches the exact target length of 30 characters, with the last word truncated accordingly.
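The whole-word behaviour described in this section can be sketched in Python. This is an illustrative sketch rather than DataVeil's actual algorithm: the function name `fit_whole_words` is hypothetical, and it assumes the sentence always ends with a full stop that counts towards the length.

```python
def fit_whole_words(sentence: str, target_len: int) -> str:
    """Drop trailing words until the sentence fits within target_len."""
    if target_len <= 0:
        return ""
    body = sentence.rstrip(".")
    words = body.split()
    # Remove whole words from the end while the text plus "." is too long.
    while len(words) > 1 and len(" ".join(words)) + 1 > target_len:
        words.pop()
    kept = " ".join(words)
    if len(kept) + 1 > target_len:
        # Exception: even one whole word does not fit, so truncate it.
        kept = kept[: target_len - 1]
    return kept + "."


generated = "The quick brown fox acts suspiciously."  # 38 characters
print(fit_whole_words(generated, 30))                 # The quick brown fox acts.
print(len(fit_whole_words(generated, 30)))            # 25
```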

The output in the 'comments' column shown above is an example of when this whole words option is selected.

The output shown below contains the same sentences as they would be generated if the whole words option were not selected. Note that the last words of the sentences in the rows corresponding to NoteID 43 and 44 have been truncated.

 

Deterministic: Calculate determinism based on the first N characters

If this mask is in deterministic mode then this parameter specifies the maximum number of characters, taken from the start of the original value, that shall be used to calculate the masked value. For example, if this parameter is 100 and there are two different original values that are each 200 characters long but identical in their first 100 characters, then the masked value generated shall be identical for both.

This parameter is provided to improve masking performance by avoiding calculations over long original values. For example, if the original values are each over 1,000 characters long but all of them differ within the first 50 characters then you could reduce this parameter to 50.

The valid range is from 10 to 4,000. The default is 100.
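The effect of this parameter can be sketched in Python. How DataVeil derives deterministic values internally is not documented here, so everything in the sketch is an assumption made for illustration: the word list, the use of a SHA-256 digest of the prefix as a random seed, and the names `WORDS` and `masked_sentence`.

```python
import hashlib
import random

# Hypothetical vocabulary; the real mask has its own word source.
WORDS = ["the", "quick", "brown", "fox", "acts", "near", "a", "quiet", "river"]


def masked_sentence(original: str, first_n: int = 100, word_count: int = 6) -> str:
    """Generate a sentence that depends only on the first first_n characters."""
    if not 10 <= first_n <= 4000:
        raise ValueError("first_n must be between 10 and 4,000")
    prefix = original[:first_n]
    # Assumption: seed a private generator from a hash of the prefix.
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    words = [rng.choice(WORDS) for _ in range(word_count)]
    return " ".join(words).capitalize() + "."


# Two 200-character values that share their first 100 characters.
a = "x" * 100 + "first tail" + "y" * 90
b = "x" * 100 + "other tail" + "z" * 90
print(masked_sentence(a) == masked_sentence(b))  # True
```

Because only the prefix is hashed, the cost of the calculation no longer grows with the full length of the original value.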

Case Sensitivity

If this mask is in deterministic mode then the generated sentences shall vary according to the case sensitivity setting in the mask's Determinism tab.

Please note that sentences generated when the 'Ignore case' checkbox is selected may be different from sentences generated for exactly the same original sentence when the 'Ignore case' checkbox is not selected. This is because, when this mask is not case sensitive, it calculates determinism based on the original sentence having first been converted to a single case. The converted text may differ from the actual original value, which is what is used when the mask is case sensitive.
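A short Python sketch shows why the two settings can disagree for the same original sentence. It is a hedged illustration only: the function `determinism_key`, the use of SHA-256 and the choice of upper case as the single case are assumptions, not DataVeil's documented behaviour.

```python
import hashlib


def determinism_key(original: str, ignore_case: bool) -> str:
    """Return the value that a deterministic mask would be derived from."""
    # Assumption: 'Ignore case' folds the text to upper case first.
    text = original.upper() if ignore_case else original
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# With 'Ignore case' selected, differently cased originals share one key.
print(determinism_key("Call Back", True) == determinism_key("call back", True))    # True
# Without it, they do not.
print(determinism_key("Call Back", False) == determinism_key("call back", False))  # False
# The same original yields different keys under the two settings.
print(determinism_key("Call Back", True) == determinism_key("Call Back", False))   # False
```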

Null and Empty Strings

If the mask specifies any length parameters that reference the original value's length then:

    * If the original value is NULL then NULL shall be returned.

    * If the original value is an empty string then an empty string shall be returned.

Therefore, original NULLs and empty strings are overwritten with generated sentences only if explicit character lengths are specified for both the Minimum and Maximum lengths, such as in the example shown below, which uses an explicit range of between 50 and 200 characters.
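These rules amount to a small decision procedure, sketched below in Python. The function `apply_mask`, its parameters and the stand-in generator are hypothetical names used only for illustration; `None` stands in for a database NULL.

```python
from typing import Callable, Optional


def apply_mask(
    original: Optional[str],
    uses_original_length: bool,
    generate: Callable[[], str],
) -> Optional[str]:
    """Apply the NULL and empty-string rules before generating a sentence."""
    if uses_original_length:
        if original is None:
            return None  # NULL stays NULL
        if original == "":
            return ""    # an empty string stays empty
    # Explicit Minimum and Maximum lengths: every value is overwritten.
    return generate()


def make_sentence() -> str:
    return "A generated sentence."  # stand-in for the real generator


print(apply_mask(None, True, make_sentence))   # None
print(apply_mask("", True, make_sentence))     # prints an empty line
print(apply_mask(None, False, make_sentence))  # A generated sentence.
```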

 

Size Limitations

The following are the maximum character lengths per value to be masked:

MySQL: 65,535
Oracle: 4,000
SQL Server: 2 GB
  

Large Objects

When masking Oracle large object types (e.g. CLOB) where a value may be longer than 4,000 characters, use only 'Randomized total length in the range' with explicit Minimum and Maximum length values, as in the example above that uses a range of 50 to 200 characters. Do not use 'Original length and..' because the resulting lengths could exceed 4,000.