A DataVeil DataSet is a CSV ("Comma Separated Value") file that contains replacement data to be used by DataSet Masks to overwrite sensitive fields.
Therefore, there are two facets to DataSets in DataVeil:
1. The DataSet File Definition
This is a specification of the format of a CSV file and where to find it. This is defined in the "DataSets" Panel as described below.
2. The DataSet Mask(s)
The DataSet File Definition
A DataSet File Definition (or just "DataSet") defines the location of a CSV file and how to parse it, such as whether the file contains a heading row (CSV column names), Separator character, etc.
All DataSets used by the current DataVeil project are defined in the DataSet tab. To define a DataSet click on the 'Add...' button. After selecting a CSV file, a dialog similar to that shown below shall appear. Here you can preview how the file shall be parsed using the current parameters and adjust if necessary.
When a DataSet mask is used, DataVeil copies the required data to the target DBMS. Whether the DataSet is deleted from the DBMS or retained after the masking execution is completed is known as the Lifetime.
The default is Temporary Lifetime.
After a masking execution is completed, the DataSet is automatically deleted from the DBMS.
Although the DataSet upload process is relatively fast, if you have many DataSets that are much larger than 100,000 rows and you intend to repeat many masking executions using the same DataSets then it may be worthwhile, in the interests of efficiency, to specify that these DataSets should be retained on the DBMS after a masking execution.
You can do this is from the Lifetime dialog. Click on the "Lifetime..." button and a similar dialog as shown below appears:
You can now select the "Keep as Table" radio button and specify a DBMS table name where this DataSet is to be saved. This table should not initially exist.
If your DBMS requires a schema location (such as on SQL Server) then the schema must already exist prior to execution. For this purpose it is convenient to create a schema where all permanent datasets shall be kept so that it is easy to drop them all later. In this example the schema shown is "MaskingDataSets".
On repeated masking executions, during initialization, DataVeil shall see that the table exists and will therefore not attempt to upload the DataSet. If the table already exists then DataVeil will not attempt to validate that the contents are of the specified DataSet.
Note: If you use Permanent Lifetime then please be sure to manually DROP these DataSet tables from your DBMS after you complete your masking executions
A DataSet can contain a logically unlimited number of columns (3 shown in the above example).
Note: You do not need to delete unused columns from CSV files because the DataSet Mask allows you to specify precisely which columns you wish to use from a CSV file. This makes it very convenient to use CSV files that you already have or source from other locations.
Encode as Unicode
If this option is enabled then the data from the CSV file shall be used with Unicode encoding on the database, otherwise the default character set on the database shall be assumed.
The default is that 'Encode as Unicode' is enabled and this is generally the recommended setting.
The following are the maximum character lengths per CSV value according to how this parameter is set:
|As Unicode||Default Character Set||As Unicode||Default Character Set|
Column names on row number
If the CSV file has a row that lists the column names of the data in the file then this should specify that row number where the first row in the file is numbered from 1.
If the file does not contain a row with column names then this value must be set to 0.
This value must be less than the value for the 'Data starts on row number' parameter.
Data starts on row number
This is the row number in the CVS file where the first line of data appears where the first row in the file is numbered from 1.
For example, if the first row in the CSV file is a column names row and the next row is where the data starts, then this value would be 2.
This value must be greater than the value for the 'Column names on row number' parameter.
Ignore leading white space
If this value is 'Y' then any white space at the start of a line, or immediately following a Separator, until the next non-white space character shall be ignored.
If this value is 'N' then white space shall be considered as part of the current column value.
This is the character that terminates a column value and starts the next column value. The last column value in a row is terminated by a new line.
Printable characters (such as a comma ',') can be entered as a normal character.
Unprintable characters should be entered with a '\' followed by the decimal ASCII code. For example, the ASCII Group Separator character would be entered as '\29'.
The Tab character can be entered as '\t' or '\9'.
Quotes that enclose data values are optional.
If quotes are used to enclose data values then this parameter should specify the Quote character used. Example the double quote (") or single quote (').
If quotes are not used to enclose data values in the CSV file then this parameter has no effect.
Quotes that are to be considered as part of a data value can either be stuffed with an additional quote or escaped using the 'Escape character'.
Example: The value Sammy's could be represented as "Sammy""s" or "Sammy\"s" (assuming the Escape character is \ ).
This defines the character to 'escape' characters that would ordinarily be considered CSV control characters but are to be considered as plain text data instead. Please refer to the example under "Quote character" above.
The CSV Preview Tab
The CVS Preview Tab shows you how DataVeil shall parse the CSV file.
It is advisable to prepare CSV files to have a uniform number of columns. In other words, the Column Names row (if included) and every Data row should contain the same number of columns. Therefore, every such row should contain the same number of Separators.
If there are two consecutive Separators (i.e. not escaped and therefore not part of a column value) then the missing value (between the consecutive Separators) shall default to an empty string. Note: Oracle shall convert empty strings to NULL, whereas SQL Server preserves empty string values.
If a data row has fewer values than the number of CSV columns then the remaining columns for that row shall receive NULL values.
The number of CSV columns for a table shall be assumed to be the number of columns found in the Column Names row or the first Data row if a Column Names row was not included.