FileMasker as an AWS Lambda Function
FileMasker can be run as an AWS Lambda function.
Some of the many benefits of using Lambda functions to perform on-demand masking of files are:
* Very easy to configure and use
* No further provisioning, tuning or reconfiguration required
* Automatic scaling through concurrency
* Very high throughput is possible (many terabytes per hour)
* AWS costs are charged only while Lambdas are actually executing
The steps below describe in detail how to configure a Lambda with a FileMasker executable and how to configure and test request messages that are sent to Lambda functions on startup.
Create an AWS Lambda Function
From the AWS Management Console, select Lambda under the Compute category.
Next, click on the 'Create Function' button.
Enter a Name for the Lambda function. In this example we shall call it fmLambda.
For Runtime select Java 8.
For Role/Existing role choose the settings that are appropriate for your organization.
Then click on the 'Create Function' button.
Configuring the Lambda Function
Runtime: Java 8
Handler: com.filemasker.lib.main.FileMasker (this is case sensitive)
Function package Upload: select the file from the unzipped FileMasker software location filemasker/modules/ext/FileMaskerLib.jar
Note: Do not delete, move or rename the FileMaskerLib.jar file because the FileMasker GUI also uses it and shall expect to find it there.
Environment variables / KMS key
FileMasker will automatically use the same SSE encryption scheme & key as the original file that is being masked.
Configure this according to your environment.
Memory: 2048 MB is recommended.
Increasing the memory setting beyond 2048 MB does not improve throughput. This observation was correct as of October 2019 but could change if AWS changes the Lambda implementation in future. See the topic "Lambda Limitations" below for more information.
2048 MB is adequate to process files up to the current maximum file size of 5 GB.
Timeout: 5 minutes is recommended.
The maximum execution time of a Lambda function is 5 minutes. Depending on which masks are configured and how many, this should be enough time for FileMasker to process files of up to approximately 4 to 6 GB. It is best to run a test on your maximum anticipated file sizes and note the actual execution time. If the time taken is very close to 5 minutes, consider allocating more memory and verify whether the execution time is reduced. Also consider whether the input files can be made smaller so that the execution time is always less than 5 minutes. See the topic "Lambda Limitations" below for more information.
Review all other settings and adjust as required for your environment.
Click on the Save button. This may take a few minutes to save because the FileMasker JAR file shall begin uploading to AWS at this time.
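For readers who script their deployments instead of using the console, the settings above can be gathered into the keyword arguments that boto3's Lambda `create_function` call accepts. This is a minimal sketch under stated assumptions and is not part of the FileMasker distribution: the helper name `build_function_settings` is hypothetical, the role ARN is a placeholder you must supply, and `java8` is assumed to be the runtime identifier that corresponds to the console's "Java 8" choice.

```python
from pathlib import Path

# Values taken from the configuration steps above.
FUNCTION_NAME = "fmLambda"
HANDLER = "com.filemasker.lib.main.FileMasker"  # case sensitive
MEMORY_MB = 2048
TIMEOUT_SECONDS = 5 * 60


def build_function_settings(role_arn, jar_path=None):
    """Return keyword arguments shaped for boto3's Lambda create_function.

    role_arn is a placeholder supplied by the caller; choose a role that is
    appropriate for your organization. The JAR bytes are attached only when
    jar_path is given, so the settings can be reviewed without the file.
    """
    settings = {
        "FunctionName": FUNCTION_NAME,
        "Runtime": "java8",  # assumed identifier for the Java 8 runtime
        "Role": role_arn,
        "Handler": HANDLER,
        "MemorySize": MEMORY_MB,
        "Timeout": TIMEOUT_SECONDS,
    }
    if jar_path is not None:
        # Read FileMaskerLib.jar in place; do not move or rename it.
        settings["Code"] = {"ZipFile": Path(jar_path).read_bytes()}
    return settings
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("lambda").create_function(**settings)`, with `jar_path` pointing at filemasker/modules/ext/FileMaskerLib.jar.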
Create a Test Request Message
When a Lambda function starts, it receives a request message from AWS. FileMasker relies on this request message to contain details such as the location of the FileMasker project file, the input file (S3 object) and output location.
To configure the test request event, click on the arrow in the 'Select a test event..' combo box and choose 'Configure test events'.
A form shall appear with a dummy JSON record. Replace this with a FileMasker request JSON and give the request event a name, such as 'MyFmRequest'. The request contains the following fields:
* inPath: the S3 path to the original file to be masked.
* outFolderPath: the S3 folder where the masked version of the original file shall be written. If the folder does not exist then it shall be created automatically. If the file already exists in the output folder then it shall be replaced. The masked file shall have the same name as the original file.
* projectPath: the S3 path to the FileMasker project file (.fmp) that defines the masking to be performed.
* projectKey: the FileMasker project key. If your FileMasker project does not use a key then you can omit this field.
* licensePath: the S3 path to your FileMasker license file (.fml). Your license file can be downloaded from your user account at www.dataveil.com. You will then need to upload your license file into an S3 bucket.
* region: the AWS region. This should match the region where your S3 files are located and where your KMS keys are configured.
A sample request message has been provided with the FileMasker software in the file aws/request.json.
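As an illustration of the fields listed above, the following sketch assembles a request and checks that nothing required is missing before it is pasted into the test event form. It is a sketch only: the helper names are hypothetical, the bucket names and the `s3://` path syntax are assumptions, and the supplied aws/request.json remains the authoritative example of the format. Every field except projectKey is treated as required, which is inferred from the descriptions above.

```python
import json

# Fields treated as mandatory; projectKey may be omitted per the text above.
REQUIRED_FIELDS = ("inPath", "outFolderPath", "projectPath", "licensePath", "region")


def build_request(in_path, out_folder_path, project_path, license_path,
                  region, project_key=None):
    """Build a FileMasker request as a dict using the documented field names."""
    request = {
        "inPath": in_path,
        "outFolderPath": out_folder_path,
        "projectPath": project_path,
        "licensePath": license_path,
        "region": region,
    }
    if project_key is not None:
        request["projectKey"] = project_key
    return request


def missing_fields(request):
    """Return the required field names that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not request.get(name)]


# Hypothetical values; replace them with your own S3 locations and region.
example = build_request(
    in_path="s3://example-bucket/in/customers.csv",
    out_folder_path="s3://example-bucket/masked",
    project_path="s3://example-bucket/config/project.fmp",
    license_path="s3://example-bucket/config/license.fml",
    region="us-east-1",
)
request_json = json.dumps(example, indent=2)
```

The text in `request_json` can be pasted into the test event form once `missing_fields(example)` returns an empty list.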
Now click the Create button.
You are now ready to test the execution of the FileMasker Lambda function. Click on the Test button.
Note: It is normal for AWS Lambda functions to run faster on repeated invocations if the interval between invocations is relatively short. For example, if you run a Lambda for the first time and it takes 20 seconds and then you immediately run it again then it may run in less than 15 seconds on the second invocation. This is because AWS attempts to cache the Lambda's environment for rapid re-invocations. After an unspecified time, AWS will clear the environment and so the next invocation will take several seconds longer to execute again as the Lambda's environment is reinitialized.
Lambda Test Output
After pressing the 'Test' button above, the Lambda function shall execute.
Upon completion, an AWS 'Execution result' box is shown. Click on the 'Details' label to expand it. This will show the FileMasker response message and log messages.
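The same test can in principle be run outside the console by sending the request message to the function through a boto3 Lambda client. The sketch below is an assumption-laden illustration and not a documented FileMasker workflow: the helper name `invoke_filemasker` is hypothetical, and because the structure of the FileMasker response message is not described here, the response is returned as raw text without parsing.

```python
import json


def invoke_filemasker(lambda_client, request, function_name="fmLambda"):
    """Invoke the Lambda synchronously and return its response as text.

    lambda_client is expected to behave like boto3.client("lambda").
    The response body is returned undecoded beyond UTF-8 because its
    structure is not specified in this document.
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function to finish
        Payload=json.dumps(request).encode("utf-8"),
    )
    return response["Payload"].read().decode("utf-8")
```

A real client could be created with `boto3.client("lambda", region_name=...)`. For runs that take several minutes, the client's read timeout may need to be raised (for example through `botocore.config.Config(read_timeout=...)`), since the default is shorter than the recommended 5 minute Lambda timeout.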
Lambda Limitations
Two basic limits of AWS Lambda shall effectively limit the size of files (S3 objects) that you can mask. Specifically, these are Memory and Timeout.
AWS Lambda allocates parts of configured Memory for different purposes, such as for heap, meta and cache. The specific breakdown is not clear from AWS documentation.
The critical memory component required by FileMasker is the heap. Our observation in testing is that allocating Memory beyond 2048 MB does not appear to make any significant difference in heap allocation. Therefore, as of October 2019, the 2048 MB setting appears to be optimal for FileMasker because it provides the maximum CPU performance with the maximum usable heap space. Allocating beyond this will not increase performance or throughput but will increase the Lambda charges billed by AWS.
The Timeout is the maximum time that an AWS Lambda can run before AWS terminates the Lambda. Setting this to the maximum of 5 minutes is recommended.
In general, FileMasker can process files up to between 4 and 6 GB. The actual size shall depend on how many masks are configured and the type of masks selected. Some masks, such as the Randomize mask, are more computationally intensive and are therefore slower. If a large number of masks is configured then the upper limit of file size may be closer to 4 GB because the total processing time may approach the AWS Lambda hard limit of 5 minutes. If fewer or faster masks are used then the upper limit may be closer to 6 GB. These are general estimates only. Finally, please be aware that the AWS Lambda environment can fluctuate considerably in performance from time to time. For example, a Lambda that sometimes runs in 200 seconds may run closer to 220 seconds or more on other occasions.
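When judging whether a measured test run leaves enough margin before the Timeout, the arithmetic can be sketched as follows. The 10% allowance is only an assumption derived from the 200 to 220 second example above, and actual fluctuation may be larger; the function name is hypothetical.

```python
TIMEOUT_SECONDS = 5 * 60  # the Timeout setting recommended above


def timeout_headroom(measured_seconds, timeout_seconds=TIMEOUT_SECONDS,
                     fluctuation=0.10):
    """Return the seconds left before the timeout in a slower-than-measured run.

    fluctuation is the assumed fractional slowdown (0.10 means 10% slower).
    A negative result suggests the file is too large for the current masks.
    """
    worst_case = measured_seconds * (1.0 + fluctuation)
    return timeout_seconds - worst_case
```

Under these assumptions, a run measured at 200 seconds still has a comfortable margin, while a run measured at 280 seconds yields a negative result and indicates that the input files should be made smaller.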