The Discovery Result Summary Tab
This tab displays the completion status of a discovery search.
The Discovery Result Classification Tab
The Classification tab categorizes all positive results ("hits") by classification.
You can expand a classification node (e.g. "Date" below) to review further details, such as which patterns within the classification had hits.
You can further investigate the exact location of these hits by expanding the pattern node (e.g. "Birth Date" below) to see the columns where those hits occurred. If you click on a column name, you can immediately view its data in the Data Browser window, as shown for the column Employee.BirthDate below.
The counts shown for patterns and columns may at first appear confusing.
In the above example, the total Tables count for the Date classification is 19, whereas the counts for its child patterns Birth Date (1), Expiry Date (9), Service Date (12) and Start Date (8) add up to 30. This can happen because multiple patterns can have hits in the same table. The Date classification total therefore says that 19 distinct tables had hits, whereas each underlying pattern shows how many of those 19 tables had hits for that pattern.
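The difference between the classification total and the sum of the pattern counts can be illustrated with set arithmetic. The following is a minimal Python sketch and not DataVeil code; the table names and the `hits_by_pattern` mapping are made up for illustration.

```python
# Hypothetical data: the set of tables in which each pattern had hits.
hits_by_pattern = {
    "Birth Date": {"Employee"},
    "Expiry Date": {"CreditCard", "Document", "Employee"},
    "Start Date": {"Employee", "Document"},
}


def pattern_table_counts(hits):
    """Number of tables with hits, counted separately for each pattern."""
    return {pattern: len(tables) for pattern, tables in hits.items()}


def classification_table_count(hits):
    """Number of distinct tables with hits for any pattern."""
    distinct = set()
    for tables in hits.values():
        distinct |= tables
    return len(distinct)


per_pattern = pattern_table_counts(hits_by_pattern)
total = classification_table_count(hits_by_pattern)

# The per-pattern counts sum to 6, but only 3 distinct tables are involved,
# because Employee and Document are matched by more than one pattern.
print(sum(per_pattern.values()), total)
```

A table matched by several patterns contributes once to the classification total but once per pattern to the pattern counts, which is why the sum can exceed the total.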
The same applies to the other count columns. For example, if a table has 100,000 rows and 2 columns are detected as sensitive, then Rows would show 100,000 and Values would show 200,000.
Note: The Rows and Values columns report the total number of rows and values in each column that had hits, not just the rows and values that matched. For example, if a table has 100,000 rows, your sample size was 1,000 rows, and a search pattern detected 800 hits in a column, then the entire column is regarded as sensitive (not just the 800 sampled rows that had hits). Therefore, 100,000 would be reported for both Rows and Values.
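The arithmetic described above can be sketched in a few lines of Python. This is an illustration of the rule as stated in this section, not DataVeil's implementation; the function name `reported_counts` and its parameters are hypothetical.

```python
def reported_counts(table_row_count, sensitive_column_count):
    """Return (rows, values) for a table with at least one detected column.

    Rows is the full row count of the table. Values is one value per row
    for each column detected as sensitive. The sample size and the number
    of hits within the sample do not enter the calculation.
    """
    rows = table_row_count
    values = table_row_count * sensitive_column_count
    return rows, values


# Two sensitive columns in a 100,000-row table.
print(reported_counts(100_000, 2))

# One sensitive column: the result is the same whether the sample
# produced 800 hits or 1,000 hits.
print(reported_counts(100_000, 1))
```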
Let's consider another example.
We can see that PersonPhone.PhoneNumber has been detected within the Telephone classification and specifically by the Telephone International pattern. Yet, when we look at the sample displayed in the Data Browser the telephone numbers do not appear to be in international format at all.
For an explanation of why a DBMS column has been reported in the manner it has been, you can right-click on the result row and select "View in Performance". This will immediately open the Performance tab and highlight the discovery search performance statistics for that DBMS column, as explained in "The Discovery Result Performance Tab" section below.
The Discovery Result Performance Tab
The Performance tab provides information specifically to explain why data has been detected and categorized in the way it has. On the basis of this information, a Confidence level is also shown.
Following on from the example above, we are presented with the following view in the Performance tab.
You can see that although the Data Browser (which shows a random sample of data) shows only phone numbers in the North American telephone number format, the Performance tab's Data Hit Samples column shows sample values that specifically generated hits for the pattern. You can click on the "..." button in the Data Hit Samples column to view the details of those hit samples, as shown below.
As you can see, the PersonPhone.PhoneNumber DBMS column does indeed contain international format telephone numbers. In fact, the Data Hit% column shows that 92% of the search sample (which happened to be 10,000 rows as set in the Search Options panel) contained international format telephone numbers.
The Data Neg Samples column shows samples of data that were rejected by the discovery pattern. In this case, the Telephone International pattern rejected 8% of the sampled data, and by clicking on the corresponding "..." button in the Data Neg Samples column you can view some of these rejected values, as shown below. As you can see, these values are not international format telephone numbers.
In this example, we can therefore conclude that PersonPhone.PhoneNumber contains a mix of differently formatted telephone numbers - some local and some international.
The DBMS column name.
Sensitivity status as already configured in the project:
S = User has already marked this column as Sensitive
N = User has already marked this column as Not Sensitive
E = User has already Excluded this column
Blank = Sensitivity status has not yet been assigned
The discovery classification name.
The discovery pattern name.
System = DataVeil built-in pattern
Custom = User-defined regular expression
The degree of confidence, as a percentage, that DataVeil has classified this data correctly.
Y = The DBMS column name was examined by the pattern and caused a hit.
- = The pattern did not define a search pattern for the column name.
Blank = The DBMS column name was examined by the pattern but did not cause a hit.
The count of rows in the sample that satisfied the discovery pattern (had 'hits').
The percentage of non-null rows in the sample that satisfied the discovery pattern.
The count of rows in the sample that contained NULL.
Data Hit Samples
A sample of data values that satisfied the discovery pattern.
The count of rows in the sample that the discovery pattern had rejected (a 'negative' hit).
Note: A negative hit represents a data value that a system discovery pattern has determined to be significantly unlikely to satisfy the intended pattern. Ambiguous data values are reported as neither hits nor negs. Therefore, you may sometimes see that the Data Hits count and Data Negs count do not add up to the sample size. This means that either there were ambiguous values or the total number of rows was less than the sample size.
The percentage of non-null rows in the sample that the discovery pattern had rejected.
Data Neg Samples
A sample of data values that the discovery pattern had rejected.
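The relationship between the hit, neg and NULL statistics described above can be sketched as follows. This is a minimal Python illustration under stated assumptions: `summarize` and `toy_classify` are hypothetical names, and the toy classifier is a stand-in that does not reflect how DataVeil's patterns actually evaluate data.

```python
from collections import Counter


def toy_classify(value):
    """Hypothetical stand-in for a discovery pattern (not DataVeil logic)."""
    if value.startswith("+"):
        return "hit"        # looks like an international number
    if value.replace("-", "").isdigit():
        return "neg"        # clearly a number, but not international
    return "ambiguous"      # reported as neither a hit nor a neg


def summarize(sample, classify):
    """Count hits, negs and NULLs; percentages are over non-null rows."""
    counts = Counter()
    for value in sample:
        if value is None:
            counts["null"] += 1
        else:
            counts[classify(value)] += 1
    non_null = len(sample) - counts["null"]

    def pct(n):
        return round(100 * n / non_null, 1) if non_null else 0.0

    return {
        "hits": counts["hit"],
        "hit_pct": pct(counts["hit"]),
        "nulls": counts["null"],
        "negs": counts["neg"],
        "neg_pct": pct(counts["neg"]),
    }


sample = ["+61 2 5550 1234", "+44 20 7946 0000", "555-0100", None, "n/a"]
stats = summarize(sample, toy_classify)
print(stats)
```

In this made-up sample of five rows there are 2 hits and 1 neg, so the two counts do not add up to the sample size: one row is NULL and one value is ambiguous.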
You may notice that some DBMS column locations are in a color other than black. The significance of these other colors is:
Blue - A Column that has already been marked as Sensitive (and is therefore already in the Project)
Red - A Column that has been Excluded
Note: You can filter out some of these results using the Search Options in the Search tab. For example, it may be convenient to ignore those Columns that have already been categorized as Sensitive or Excluded.
The Discovery Result Masking Tab
The Masking tab provides information that is oriented toward assisting you in selecting and assigning masks to the discovered DBMS columns if you would like to do so.
This tab shows some of the same information as the Performance tab, such as Location, Confidence and Data Samples, and adds other information such as the DBMS Data Type, the Proposed Mask and some additional Notes about the data that the discovery search engine may have detected.
You can view the available actions by selecting one or more rows and right-clicking. A popup menu will appear, as shown above. Note that some actions in the menu may be disabled according to the context of your project.
As you can see in the example, DataVeil has proposed a Person Full Name mask. You can select "Edit Proposed Mask..." from the popup menu to view or customize the mask. The mask in this example is shown below. DataVeil had detected whether the Family name appeared at the start or end of a value (as shown in the Notes column above) and has correctly configured the proposed mask.
Move into Project
If you are satisfied that a DBMS column listed in the Masking tab is a sensitive column and you wish to move it into your masking project, you can right-click on its row and choose the "Move into Project" action.
This action will:
* Ensure that the DBMS column is marked as Sensitive
* Assign the proposed mask
* Move the DBMS column from the discovery result to the masking project. Its row will then disappear from the discovery result display.
Move All into Project...
The 'Move All into Project...' button at the top of the Masking tab provides a bulk action version of the "Move into Project" action described above.
If you use this action, it is suggested that you first carefully review all of the DBMS columns shown in the result to confirm that the result contains only sensitive columns that you wish to mask. If there are columns that you do not wish to mask, such as false positives, you should first remove them using the "Remove from Result" action in the popup menu.
After you click 'OK', all of the DBMS columns shown in the Masking tab will be marked as Sensitive (and will be assigned their proposed masks if the "Apply Proposed Masks" checkbox in the dialog is selected) and will be moved to the masking project.
After this action is performed, the discovery result will be empty (because all results have been moved into the masking project).
Original Complete Result
If you would like to refer to the original discovery search result, you can consult the Discovery Report that is always generated when a discovery search completes.