Discovery Result
The Discovery Result Summary Tab
This tab displays the completion status of a discovery search.
The Discovery Result Classification Tab
The Classification tab categorizes all positive results ("hits") by classification.
You can expand a classification node (e.g. "Person Name" below) to review further details, such as which patterns within the classification had hits.
You can further investigate the exact location of these hits by expanding the pattern node (e.g. "Person Family Name" below) to see the columns where those hits occurred. If you click on a column name, the data is immediately displayed in the Data Browser window, as shown for the column Customer.lastName below.
The counts shown for patterns and columns may appear confusing at first.
In the above example, the total Tables count for the Person Name classification is 2, whereas the counts for its child patterns Person Family Name (1), Person Full Name (1) and Person Given Name (1) add up to 3. This can happen because multiple patterns can have hits in the same table. Therefore, the Person Name classification total simply says that 2 distinct tables had hits, whereas each underlying pattern shows how many of those 2 distinct tables had hits for each pattern.
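The counting rule can be illustrated with a minimal Python sketch. This is not DataVeil code: the hit records, the "Employee" table name and both helper functions are hypothetical and exist only to show how distinct-table counting makes the parent total smaller than the sum of its children.

```python
# Hypothetical hit records as (pattern, table) pairs. The pattern names mirror
# the example above; the table assignments are assumptions for illustration.
hits = [
    ("Person Family Name", "Customer"),
    ("Person Full Name", "Employee"),
    ("Person Given Name", "Customer"),
]


def tables_per_pattern(records):
    """Count the distinct tables that had hits for each pattern."""
    tables = {}
    for pattern, table in records:
        tables.setdefault(pattern, set()).add(table)
    return {pattern: len(names) for pattern, names in tables.items()}


def tables_for_classification(records):
    """Count the distinct tables that had hits for any pattern."""
    return len({table for _, table in records})


pattern_counts = tables_per_pattern(hits)
classification_count = tables_for_classification(hits)

print(pattern_counts)        # one table per pattern
print(classification_count)  # two distinct tables overall
```

Two patterns hit the same table here, so the three per-pattern counts of 1 sum to 3 while the classification reports only 2 distinct tables.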
The same applies to the other count columns. For example, if a table has 100,000 rows and 2 columns are detected as sensitive, then Rows would show 100,000 and Values would show 200,000.
Note: The Rows and Values columns represent the total count of all rows and values in a table column that had hits, not just the rows and values that had hits. For example, if a table has 100,000 rows, your sample size was 1,000 rows and a search pattern detected 800 hits in a column, then the entire column is regarded as sensitive (not just the 800 sampled rows that had hits). Therefore 100,000 would be reported for both Rows and Values.
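The arithmetic behind the two examples above can be written out as a short Python sketch. The function name and parameters are hypothetical; the sketch simply assumes, as described above, that a whole column is counted once any hit is found, so the sample size and the hit count do not affect the result.

```python
def reported_counts(table_rows, sensitive_columns):
    """Return (Rows, Values) for one table, as described in the text above.

    Assumption: every row of the table is counted once if at least one
    column is sensitive, and every value of each sensitive column is counted.
    """
    rows = table_rows if sensitive_columns > 0 else 0
    values = table_rows * sensitive_columns
    return rows, values


# 100,000 rows, 2 sensitive columns.
two_columns = reported_counts(100_000, 2)

# 100,000 rows, 1 sensitive column (800 hits in a 1,000 row sample).
one_column = reported_counts(100_000, 1)

print(two_columns)
print(one_column)
```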
Let's consider another example.
We can see that Customer.phone has been detected within the Telephone classification and specifically by the Telephone US pattern. Yet, when we look at the sample displayed in the Data Browser many of the telephone numbers do not appear to be US telephone numbers.
To find out why a DBMS column has been reported in a particular way, right-click on the result row and select "View in Performance". This will immediately open the Performance tab and highlight the discovery search performance statistics for that DBMS column, as explained in "The Discovery Result Performance Tab" section below.
The Discovery Result Performance Tab
The Performance tab provides information specifically to explain why data has been detected and categorized in the way it has. On the basis of this information, a Confidence level is also shown.
Following on from the example above, we are presented with the following view in the Performance tab.
DataVeil uses many criteria to determine a Confidence level. To get an idea of what caused a pattern to be successful, you can click on the "..." button in the Data Hit Samples column to view some samples of data that matched the pattern. As shown in the example below, these all look as though they could reasonably be considered US telephone numbers. The Data Hit% column shows that 99% of rows in the sample (not necessarily all rows in the table) contained similar data and so the confidence level is a high 99%.
If you click on the "..." button in the Data Neg Samples column, DataVeil will show negative samples, i.e. data that was rejected by the pattern matching algorithm. As shown in the example below, these phone numbers are clearly not US format phone numbers. The Data Neg% column shows that 1% of rows in the sample were rejected by the pattern. This is very useful information because it tells us that our masking strategy may need to be adapted for this column, such as using a conditional mask to mask US (or North American) phone numbers and a default catch-all mask to mask the other non-US format phone numbers.
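The idea of a conditional mask with a default catch-all can be sketched in Python as follows. This is only an illustration of the routing logic, not DataVeil's mask configuration or its Telephone US pattern: the regular expression, the function name and the mask labels are all assumptions made for this example.

```python
import re

# Assumed, simplified pattern for US (North American) format numbers:
# optional "+1" or "1", a 3-digit area code (optionally in parentheses),
# then 3 digits and 4 digits, separated by optional space, dot or hyphen.
US_PHONE = re.compile(
    r"^(?:\+?1[\s.-]?)?(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}$"
)


def choose_mask(value):
    """Return a hypothetical mask label for one phone value."""
    if value is None:
        return "no_mask"  # NULL values are left as they are
    if US_PHONE.match(value.strip()):
        return "us_phone_mask"  # condition: value looks like a US number
    return "default_mask"  # catch-all for every other format


samples = ["(415) 555-0132", "+1 212-555-0187", "+44 20 7946 0958", None]
for sample in samples:
    print(sample, "->", choose_mask(sample))
```

The first two samples satisfy the condition and the third falls through to the catch-all, which is the split suggested by the 99% and 1% figures above.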
Columns Displayed
Location
The DBMS column name.
S
Sensitivity status as already configured in the project:
S = User has already marked this column as Sensitive
N = User has already marked this column as Not Sensitive
E = User has already Excluded this column
Blank = Sensitivity status has not yet been assigned.
Classification
The discovery classification name.
Pattern
The discovery pattern name.
Type
Pattern origin.
System = DataVeil built-in pattern
Custom = User-defined regular expression
Confidence
The degree of confidence, as a percentage, that DataVeil has classified this data correctly.
Name Hit
Y = The DBMS column name was examined by the pattern and caused a hit.
- = The pattern did not define a search pattern for the column name.
Blank = The DBMS column name was examined by the pattern but did not cause a hit.
Data Hits
The count of rows in the sample that satisfied the discovery pattern (had 'hits').
Data Hit%
The percentage of non-null rows in the sample that satisfied the discovery pattern.
Nulls
The count of rows in the sample that contained NULL.
Data Hit Samples
A sample of data values that satisfied the discovery pattern.
Data Negs
The count of rows in the sample that the discovery pattern had rejected (a 'negative' hit).
Note: A negative hit represents a data value that a system discovery pattern has determined to be significantly unlikely to satisfy the intended pattern. It does not include ambiguous data values, which DataVeil reports as neither hits nor negs. Therefore, you may sometimes see that the Data Hits count and Data Negs count do not add up to the sample size. This means that either there were ambiguous values or that the total number of rows was less than the sample size.
Data Neg%
The percentage of non-null rows in the sample that the discovery pattern had rejected.
Data Neg Samples
A sample of data values that the discovery pattern had rejected.
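The relationship between the count and percentage columns above can be sketched in Python. The function and its field names are hypothetical, and the sketch assumes that every non-null sampled row is exactly one of a hit, a neg or an ambiguous value; the percentages use non-null rows as the denominator, as stated for Data Hit% and Data Neg%.

```python
def sample_stats(rows_sampled, nulls, hits, negs):
    """Derive Data Hit%, Data Neg% and the ambiguous count for one column.

    rows_sampled is the number of rows actually read, which may be less
    than the configured sample size if the table is small.
    """
    non_null = rows_sampled - nulls
    ambiguous = non_null - hits - negs  # neither a hit nor a neg
    hit_pct = 100.0 * hits / non_null if non_null else 0.0
    neg_pct = 100.0 * negs / non_null if non_null else 0.0
    return {
        "non_null": non_null,
        "ambiguous": ambiguous,
        "hit_pct": hit_pct,
        "neg_pct": neg_pct,
    }


# Illustrative numbers only: 1,000 rows sampled, 100 NULLs, 810 hits, 45 negs.
stats = sample_stats(1000, 100, 810, 45)
print(stats)
```

In this illustration hits and negs add up to 855, not 1,000, because 100 rows were NULL and 45 values were ambiguous.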
Colors
You may notice that some DBMS column locations are in a color other than black. The significance of these other colors is:
Blue - A Column that has already been marked as Sensitive (and is therefore already in the Project)
Red - A Column that has been Excluded
Note: You can filter out some of these results using the Search Options in the Search tab. For example, it may be convenient to ignore those Columns that have already been categorized as Sensitive or Excluded.
The Discovery Result Masking Tab
The Masking tab provides information to assist you in selecting and assigning masks to the discovered DBMS columns, if you would like to do so.
This tab shows some of the same information as the Performance tab, such as Location, Confidence and Data Samples, but also shows other information such as the DBMS Data Type, the Proposed Mask and some additional Notes about the data that the discovery search engine may have detected.
You can view the available actions by selecting one or more rows and right-clicking. A popup menu will appear, as shown above. Note that some actions in the menu may be disabled according to the context of your project.
As you can see in the example, DataVeil has proposed a Person Full Name mask. You can select "Edit Proposed Mask..." from the popup menu to view or customize the mask. The mask in this example is shown below. DataVeil has detected whether the Family name appeared at the start or end of a value field (as shown in the Notes column above) and has correctly configured the proposed mask.
Move into Project
If you are satisfied that a DBMS column listed in the Masking tab is a sensitive column and you wish to move it into your masking project, right-click on its row and choose the "Move into Project" action.
This action will:
•Ensure that the DBMS column is marked as Sensitive
•Assign the proposed mask
•Move the DBMS column from the discovery result to the masking project. Its row will therefore disappear from the discovery result display.
The 'Move All into Project...' button at the top of the Masking tab provides a bulk action version of the "Move into Project" action described above.
If you use this action, it is suggested that you first carefully review all of the DBMS columns shown in the result to confirm that the result contains only sensitive columns that you wish to mask. If there are columns that you do not wish to mask, such as false positives, then you should first remove them using the "Remove from Result" action in the popup menu.
After you click 'OK', all of the DBMS columns shown in the Masking tab will be marked as Sensitive (and will be assigned their proposed masks if the "Apply Proposed Masks" checkbox in the dialog is selected) and will be moved to the masking project.
After this action is performed, the discovery result will be empty (because all results have been moved into the masking project).
Original Complete Result
If you would like to refer to the original discovery search result, see the Discovery Report, which is always generated when a discovery search completes.