Getting Started with DataVeil Data Masking

Note: Video demonstrations of DataVeil are available at www.dataveil.com/demo

 

After you start DataVeil, you will be presented with an initial screen, similar to that shown below.
 

 

You may be wondering... "OK, so now what?"

In answer to that question, the general work-flow is as follows:

Step 1 - Connect to the Database

Step 2 - Add relevant Tables & Columns into your Project

Step 3 - Identify the Sensitive Columns

Step 4 - Configure Masks for the Sensitive Columns

Step 5 - Run the Masking Project
 

Each of these steps is described below.

 

Step 1 - Connect to the Database

The first thing you will need to do is to retrieve the schema (i.e. the database layout details) of the database that you wish to mask.

In order to do this you will need to define a connection to the database.

You can do this by either clicking on the  icon in the toolbar or from the main menu Database -> Add Database Connection...

 

This will bring up the Add Database Connection dialog as shown below. Select the database type and enter the name of the database as configured on the server. In the example below, we will use a database called "kona" which is an SQL Server type of database.

 

After you click 'OK' you shall be presented with the Connection view where you must specify the connection parameters to your database. In our example the connection parameters are shown below. If the database is on the same computer as DataVeil then you could also leave the 'Hostname or IP' field blank (SQL Server connection panel is shown).


 

Note: The User that you login with shall require broad database administrative privileges when a masking project is executed (Step 5 below.) This is because DataVeil shall perform actions such as: disabling/enabling constraints and triggers, creating/dropping tables and (depending on the DBMS) creating/dropping schemas.

After the connection parameters have been specified, click on the 'Get Schema' button. This shall retrieve the schema information such as table names, column names and dependency details.
 

The DataVeil Workspace

Please note that in the Explorer Window there are three tabbed views: Project, DBMS and Excluded.

Project View

This will show all tables and columns that you have selected as relevant to your masking project and shall be saved in a DataVeil project file (.dvp) whenever you perform a Save action. Upon initial connection to a database the Project view shall be empty and you shall be presented with the DBMS view instead.

DBMS View

This view shows all tables and columns that are available on your DBMS (Database Management System) that are not part of Project and are not Excluded.

It is intended that you use the DBMS view after initially connecting to a database so that you can select which tables and columns are to be moved into Project (using right-click action "Add to Project"). After you finish selecting the tables and columns you should switch to the Project view to develop your masking project. You can return later to the DBMS view if you decide that there are additional tables and columns that you wish to add to Project.

Excluded View

This view shows tables and columns that you have excluded using the 'Exclude' action. It is not necessary to exclude anything from a project. It has been provided so that you can exclude columns that you are certain are not sensitive and do not wish to see again. This is easily reversible from the Excluded view so don't worry if make a mistake or need to change this later.

 

 

Step 2 - Add relevant Tables & Columns into your Project

After successfully connecting to the database, the DBMS view shall list the available tables and columns in the database. An empty project Diagram shall also be displayed.

 

You now select the tables and/or columns that are to be moved into the Project (you can make multiple selections while holding down the Ctrl key) and then right-click and select the "Add to Project" action, as shown below. As you move tables and columns into the project, the project Diagram shall be updated accordingly.

You can filter the contents of the DBMS view by typing-in the filter text in the Filter text box.

Note: If your schema is relatively small (perhaps 20 tables or so) and you simply wish to move the entire schema into Project, then you can right-click on the database node (shown as "kona" in the example above) and choose "Add to Project".
 

After you have have finished moving tables and columns into your project you should switch to the Project view by clicking on the "Project" tab.

 

 

Step 3 - Identify the Sensitive Columns

DataVeil requires that you tag each column that you wish to mask as 'Sensitive' so that DataVeil can track the status of columns that need to be masked. For example, sensitive columns typically include person given names, family names, birthdays, account numbers, addresses, phone numbers, email addresses and so on.

If you already know which columns are sensitive then you can proceed to tag them in one of two simple ways (described as Options below). If you do not know where the sensitive data is located in your database then please refer to the Discovery section.

You can also examine a sample of the actual data using the Data Browser window.

 

Shown below is an example of how to tag the column 'lastName' as 'Sensitive'.

Using the Project view's explorer tree, review the list of columns and tag those which are sensitive by right-clicking on the column and selecting 'Sensitive' from the pop-up menu. You can also mark all columns of a table as Sensitive by right-clicking on the table name and selecting 'Sensitive' from the pop-up menu.

As you may notice, the pop-up menu shows an 'Add Mask...' option. Using this option you could add a mask and mark the column as sensitive in one step. You could even select many columns and entire tables and add a mask to all of these in one step. This is described below under the heading "One Step and Bulk Mask Creation". For now, however, we will take a few minutes to illustrate the basic concepts of tagging sensitive columns and adding masks individually.
 

 

  

Depending on your work-flow preference, you can either just tag all of the sensitive columns first and then later define masks for them, or you can tag a column as 'Sensitive' and immediately configure a mask for it, or you can select one or more columns (and/or tables) and add a mask to all of them in one step as described below under "One Step and Bulk Mask Creation."
 

You can always review a summary of which columns have been tagged as Sensitive and their currently configured masks from the Masks Summary view.
 

 

 

 

Step 4 - Configure Masks for the Sensitive Columns

After you have tagged a column as 'Sensitive' you will need to configure a mask for it. 

Note: You can actually define multiple conditional masks for a column. Each mask's condition is defined by its Where condition.

You can easily navigate to the column's Masking view directly from the Masks Summary view by right-clicking the column name and selecting 'Open Column Masks' from the pop-up menu.
 

 

You can also just open the Masking tab and select the column in the Project Explorer tree.
 

 

After a column has been tagged as Sensitive the Add... button becomes enabled.

Click on the Add... button to select the mask that you wish to configure. In this example we are demonstrating the Shuffle mask to keep this introduction simple; however, please be aware that there is also a Person Family Name mask that could have been used.

Note: You can view a sample of the actual data using the Data Browser window. Understanding the nature and format of the data that you are trying to mask can sometimes help you choose an appropriate mask and masking parameters.

 

 

Note: Sometimes, some of the mask choices will be disabled such as the DateTime and Randomize DateTime masks as shown above. This is because the data type of the target column ('lastName') is not compatible with those masks.

Now click on the 'OK' button to open the selected mask's configuration panel. Most of the masks have default values. In this example we shall accept the default parameters of a simple shuffle for a single column, so just click 'OK' to add the mask.
 

 

It is worthwhile pointing out another versatile mask called the Randomize mask. It can mask a field while preserving its original format. In other words, it will replace alphabetic characters (A-Z, a-z) with randomized alphabetic characters (preserving case), digits with randomized digits, and preserve all other characters such as whitespace and special symbols. It can also perform partial masking of a column. For example, suppose you wanted to mask telephone numbers. You could specify that only the last four digits are to be masked. This means that any country code, area code and formatting information of every single value would be preserved.

Another very useful mask is the DataSet mask. It will mask columns using replacement data from a CSV file that you supply.

You can review all available masks in the Masks Reference section to determine which masks would be best suited for your purposes. 

 

Non-English Languages

Several masks are designed to work optimally with the English language. For example, Person Given Name and Family Name masks will generate names commonly used in the USA, Canada, UK and Australia. This is also true of the Company Name and Sentences masks. The Randomize mask also relies on the English alphabet. However, there are other masks that are useful for non-English languages, such as DataSet, Shuffle, Redact, DateTime, Random Number and others.

For a complete list please refer to Unicode & Non-English Languages.

 

One Step and Bulk Mask Creation

Now that you have an understanding of the basic process and significance of identifying sensitive columns and configuring masks for them, it is time to introduce you to a faster and more convenient way to work that lets you mark columns as Sensitive and create masks for them in a single step.

You simply right-click on a column name and select 'Add Mask...'. You will then be presented with a list of available masks. After selecting a mask and entering any required parameters, the column shall be automatically marked as Sensitive (if it is not already) and the mask shall be added.

You can also choose multiple columns across multiple tables and add a mask to all of them in a single step. For example, suppose that you want to assign individual Shuffle masks to each of 9 columns across 3 tables. You would simply hold down the Ctrl key while selecting the 9 columns in the Project explorer tree, then right-click on any one of the selected column names and choose 'Add Masks to Selection...'.
 

 

You would then select the mask, then enter any required parameters, and click OK. A copy of that mask and its parameters shall be created for each of the selected columns.
 

 

 

Configure the Determinism and Seed

An important consideration in configuring masks is determinism.

DataVeil can perform masks in a deterministic or non-deterministic manner.
   

Deterministic Masking

Deterministic means that every occurrence of an original value shall be masked consistently to the same masked value. For example, if "Smith" generates the masked value of "Baker" then every occurrence of "Smith" shall get masked with "Baker". It also means that DataVeil shall generate the same masked values for the same set of original values every time the masking project is executed regardless of whether it is today, tomorrow or next year. These masked values shall also be consistent across different databases and even across databases on different DBMS's such as SQL Server and Oracle.

The actual set of masked values generated shall depend on the deterministic seed that you choose.

By default, DataVeil projects are set up to be deterministic and each mask shall use the project's global Default Seed; however, there is no default value for the Default Seed so you must specify one.

If you tried to run the masking project configured so far, which uses the project Determinism setting of 'Deterministic' and the Default Seed hasn't yet been specified, then DataVeil shall automatically prompt you for it as shown below.
 


 

The seed can be any combination of printable characters. Every character is significant except that leading and trailing spaces are trimmed. It is best to choose seeds with longer and unusual text as this tends to improve variability in generated values and makes it less likely for someone to guess the original seed.

If you change the seed value then the deterministic masks that rely on this seed shall produce a different set of masked values.

For more information on determinism please refer to The Determinism Option.
   

Non-Deterministic Masking

Alternatively, if you configure a project to be not deterministic then the masked values generated shall be unpredictable. For example, the first instance of "Smith" could be masked with "Franklin", and the next instance of "Smith" could be masked with "Ford", and the next with yet a different value. Furthermore, the next execution of the masking project could have an entirely different set of masked values.

 

 

Step 5 - Run the Masking Project

DataVeil provides two types of masking execution: a Preview Run and a Masking Run. Each of these is described below.

Before proceeding, we need to introduce one final DataVeil security requirement - the Project Key.

You will be prompted automatically by DataVeil if either the Deterministic Seed or Project Key are required and not yet specified.
 

Project Key

If a project contains sensitive elements (database passwords or seeds) then DataVeil shall prompt you to create a Project Key whenever required such as when attempting to save a project file or run a masking project. You can also think of the Project Key as a project password.

A Default Seed was defined a the preceding step, therefore when we attempt to run the masking project we shall first be prompted to create a Project Key. The dialog is shown below:
 


 

Choose a combination of any printable characters and click 'Yes'.

Do not forget this key, you shall need it to edit and run the project in future. Please also refer to Project Security.
 

The optional Compile step

You may have noticed a separate 'Compile' function. Whenever a Run is performed (either Preview or Masking) then a Compile is automatically performed. The main purpose of providing a separate Compile step is so a final validation of a project can be confirmed that the project is indeed ready to execute at a later time without having to actually attempt an actual execution.

To perform a Compile you can:

Click on the Compile button  in the toolbar,

or; click on the 'Compile' button in the 'Execution' view.

 
 

Compile is a quick function that usually takes only a matter of seconds.
 

Preview Run

Before you run a masking project that will overwrite your database with masked values, you may be interested in previewing what these masked values will actually look like.

DataVeil lets you preview the generated masked values in the Data Browser - showing both 'before' and 'after' values. This is an optional step. To preview these 'before' and 'after' values, click on the Preview Run button  in the main toolbar or in the Data Browser toolbar.

It is important to note that DataVeil will likely make temporary changes to the database even during a Preview Run, such as temporarily disabling triggers and constraints. These will all be restored upon the completion of the Preview Run. Therefore, never perform a Preview Run on a database that is currently in use by other users. You should have exclusive access to the database.

A confirmation box shall be displayed:
 

 

Masked Value Generation Scope

Quick

Only a small subset of rows shall be processed. This is ideal for quickly generating a preview of what masked values shall look like. Note: Dependant values (from automatically generated 'Inherit' masks) shall not be generated in Quick mode and shall be shown as "NA (Inherit)". This is because Quick mode may not generate all required parent masked values for its dependencies.

Complete

This shall generate the full range of masked values. This will enable you to also preview masked dependent (foreign key) values that are unavailable in the Quick preview. It shall also perform unique constraint checks as required.

This option can be used on any size database using any DataVeil license type. This means that you can evaluate DataVeil's performance of generating an unlimited number of masked values using just a free Community license.

 

After the Preview Run completes, the Data Browser shall display a preview of what the 'before' and 'after' masking values would look like. The default display format is shown below. You can sort the rows by clicking on the column name headings.

You can also customize which columns are to be displayed and their appearance order in the Data Browser Settings panel that you can access by clicking on the  settings icon. Please refer to the Data Browser section for more details.
 


 

You can read more about Preview Run in the Executing Projects section.

 

Masking Run

When you are ready to actual overwrite the original data with masked data you can run the masking project.

To run the masking project you can:

Click on the Run button  in the toolbar,

or; click on the 'Run...' button in the 'Execution' view.
 


 

Note: The user specified under Username in the Connection view shall require broad privileges as described in Step 1. Otherwise, the masking Run will likely fail.

 

A warning shall always be displayed to caution you that the target database shall be overwritten with masked values.
 

 

Select the checkbox next to "Proceed to overwrite the database location?" and then click on 'OK' to confirm that you have understood the cautions and to proceed to mask the target database. The Execution and Output Logger views shall be opened (if they aren't already) and the progress of the masking project shall be displayed.
 

 

You can access Logger display options by right-clicking on the Logger view. This allows you to control line-wrap, search the logger, filtering and saving a copy of the Logger output to a text file ('Save As...' ).

 

Execution from Command Line

It is also possible to execute a DataVeil project from the Windows command line. This gives you great flexibility because, using the operating system environment, you can schedule a DataVeil masking execution and even incorporate it into larger batch jobs to achieve automation.

A sample Windows batch file "dataveil_cmd_win.bat" has been provided in the installation folder "dataveil\batch". A sample Linux/Unix shell script has also been provided in file "dataveil_cmd_nix". These files contains a sample command to execute a DataVeil project and how to verify whether the execution outcome was successful or not.

For further details please refer to the Command Line Execution section.

 

Masking Execution Report

A masking execution PDF report is automatically created after each masking run and can be viewed in the PDF viewer window by clicking on the Reports tab as shown below.

By default, a copy of the actual masking project is also saved in the report folder. This can help with compliance requirements because the report folder will have an exact copy of the masking project that was executed together with a detailed PDF report of the masking details and outcome. This also serves as a useful automatic backup of masking projects if you should ever need to retrieve a copy of an older masking project.

An XML version of the masking run report is also automatically saved in the report folder.
 

 

 

You can customize report settings by clicking on the  button. You can specify options such as:

* Automatically save a copy of the masking project file with the report (default is Yes).

* Automatically delete older reports. Note: You can keep important results permanently by right-clicking on the result node and choosing 'Move to Archive'.

* Specify custom report headings such as your Company Name, Analyst Name, Contact Details, etc.

* Send email notifications of completed masking jobs with the option to include the PDF report as an attachment.
 

 

If you would like to view the folder contents directly then you can right-click on the result node and choosing 'Open in OS Application'. This will effectively open the result folder in File Explorer.
 

 

 

SQL Scripts

You can also define SQL commands that are to be performed before the masking functions are performed, such as for environment setup, and SQL commands that are to be performed after masking has been completed. You can find out more in the Scripting SQL Commands section.

 

Performance Boost

To get the best masking speed performance please consider installing the supplied DataVeil Native Library on your DBMS.

These are provided in the software delivery folder dataveil/native.

The increase in masked value generation speed varies considerably among masks but in general the speed increase is approximately 2x for Oracle and 10x for SQL Server.

Please refer to Native Libraries.
 

Components and Macros

Components and Macros provide very powerful ways to define flexible and reusable masking definitions that can be shared across masking projects.

 

Light and Dark UI Themes

The default UI theme is FlatLaf Light as shown in the screen captures above. Users who prefer a dark theme can easily switch to FlatLaf Dark.

 

Summary

This has been a simple introduction into the basic operation of DataVeil to perform static data masking.

Although DataVeil has other more advanced concepts and features, the basic process described in this section is fundamentally the same.

You are encouraged to review all of the topics in this Help file, particularly each of the available masks in the Masks Reference section, and look for other tutorials and videos at our web site at www.dataveil.com.

This DataVeil User Guide is available online at www.dataveil.com/user-guide and local HTML, PDF and CHM formats are delivered in the dataveil/help folder together with the software.