Protect and Convert Documents in Real-Time with S3 Object Lambda

Sep 13, 2025

By Ananta Cloud Engineering Team

In today's data-driven world, enterprises are continuously handling vast volumes of documents—PDFs, Word files, presentations, and more. Often, there’s a need to dynamically convert document formats, apply watermarks, or sanitize content before it’s consumed by end users.

Traditionally, this required preprocessing, maintaining separate versions of documents, or running costly compute services to process documents on demand. However, Amazon S3 Object Lambda, combined with Ananta Cloud's automation capabilities, provides a powerful, serverless way to handle document transformation on the fly.

What is Amazon S3 Object Lambda?

Amazon S3 Object Lambda enables you to add your own code to data retrieval requests from S3, allowing for on-the-fly transformation of data. This means you no longer need to create and store multiple derivative copies of your data.

You simply configure a Lambda function to transform the object as it’s being retrieved—such as converting a DOCX file to PDF or applying a watermark to a PDF.

Use Case Scenarios

1. Legal & Compliance

Automatically watermark all documents downloaded by internal or external users to ensure audit traceability and discourage data leakage. Example: Add a username, timestamp, or confidentiality label to each page of a PDF.

2. Document Format Conversion

Serve the correct file format dynamically depending on the client request (e.g., convert .docx to .pdf, or .pptx to .jpg for thumbnails). Example: Convert office files to PDFs on-the-fly for browser rendering or sharing.
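One way to honor the requested format is a small lookup from source extension to allowed targets. The Python sketch below is illustrative. The SUPPORTED_CONVERSIONS table and the choose_target_format helper are hypothetical. How the client states its preference (query parameter, header, or API route) is left to your design.

```python
import posixpath
from typing import Optional

# Hypothetical: which target formats this service offers for each source extension.
SUPPORTED_CONVERSIONS = {
    ".docx": {"pdf"},
    ".pptx": {"pdf", "jpg"},
    ".pdf": {"pdf"},
}


def choose_target_format(key: str, requested: Optional[str] = None) -> str:
    """Pick the output format for an S3 object key, defaulting to PDF."""
    ext = posixpath.splitext(key)[1].lower()
    allowed = SUPPORTED_CONVERSIONS.get(ext)
    if allowed is None:
        raise ValueError(f"unsupported source type: {ext or '(no extension)'}")
    target = (requested or "pdf").lower()
    if target not in allowed:
        raise ValueError(f"cannot convert {ext} to {target}")
    return target


print(choose_target_format("reports/q3.docx"))          # pdf
print(choose_target_format("decks/launch.PPTX", "JPG"))  # jpg
```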

3. Secure Document Delivery

Serve only watermarked, read-only versions of sensitive files to specific IAM roles, so those roles never receive the original objects. Example: Generate view-only PDFs with redacted content for guest users.
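S3 Object Lambda passes the transformation function a userIdentity object that describes the principal that called GetObject. The function can therefore pick a treatment per role. The minimal sketch below assumes assumed-role ARNs of the form arn:aws:sts::<account>:assumed-role/<role>/<session>. The role names and treatments are hypothetical, and unknown callers are denied by default. When a backend service fetches objects on a user's behalf, userIdentity describes that service's role rather than the end user.

```python
from typing import Optional

# Hypothetical role names and treatments; adjust to your own IAM design.
ROLE_TREATMENTS = {
    "LegalReviewer": "watermark",
    "GuestViewer": "watermark+redact",
}


def role_name(user_identity: dict) -> Optional[str]:
    """Extract the IAM role name from an assumed-role ARN, if there is one."""
    arn = user_identity.get("arn", "")
    parts = arn.split(":", 5)
    if len(parts) != 6:
        return None
    # The resource part looks like "assumed-role/<role-name>/<session-name>"
    resource = parts[5].split("/")
    if len(resource) >= 2 and resource[0] == "assumed-role":
        return resource[1]
    return None


def treatment_for(user_identity: dict) -> str:
    """Map the caller's role to a treatment; anything unrecognized is denied."""
    return ROLE_TREATMENTS.get(role_name(user_identity), "deny")
```

A "deny" result would then be reported back to the caller with an error status, for example 403, instead of returning the object.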

4. SaaS and Multi-Tenant Applications

Build scalable SaaS apps that allow users to preview, download, or convert documents without duplicating files. Example: In a document management system (DMS), each user can get a personalized watermark and file format based on their profile.

How It Works

Step 1: Store Raw Files in Amazon S3

Your original files—.docx, .pptx, .pdf—are securely stored in a standard S3 bucket.

Step 2: Configure Lambda Function

Create an AWS Lambda function that:

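Is invoked by S3 Object Lambda when a client sends a GET request through an Object Lambda Access Point
Fetches the original object from the presigned URL (inputS3Url) included in the event
Converts the file type (e.g., DOCX → PDF)
Adds a watermark (text, image, timestamp, or user ID)
Returns the modified content to the caller through the WriteGetObjectResponse API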

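Python libraries can cover parts of this work. python-docx reads DOCX content, reportlab draws watermark overlays, and PyPDF2 (now maintained as pypdf) merges those overlays into existing PDF pages. None of these libraries renders DOCX to PDF by itself, which is why full format conversion typically relies on LibreOffice, as described later in this post.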
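The following Python sketch shows a skeleton of such a function. It reads the getObjectContext fields from the S3 Object Lambda event and downloads the original through the presigned inputS3Url. It then applies a transformation and hands the result back with WriteGetObjectResponse. Several details are illustrative assumptions: the make_handler and passthrough names, the TransformationFailed error code, and the fixed application/pdf content type. The write function is injected so the logic can be exercised without AWS.

```python
import urllib.request
from typing import Callable


def fetch_original(input_s3_url: str) -> bytes:
    """Download the untransformed object; inputS3Url is already presigned by S3."""
    with urllib.request.urlopen(input_s3_url, timeout=30) as resp:
        return resp.read()


def passthrough(data: bytes) -> bytes:
    """Placeholder transformation; replace with conversion and watermarking."""
    return data


def make_handler(write_response: Callable[..., object],
                 fetch: Callable[[str], bytes] = fetch_original,
                 transform: Callable[[bytes], bytes] = passthrough):
    """Build a Lambda handler for S3 Object Lambda GetObject events.

    write_response is expected to be boto3's s3.write_get_object_response.
    """
    def handler(event, context):
        ctx = event["getObjectContext"]
        route, token = ctx["outputRoute"], ctx["outputToken"]
        try:
            body = transform(fetch(ctx["inputS3Url"]))
        except Exception as exc:
            # Report the failure to the caller instead of letting the request time out
            write_response(RequestRoute=route, RequestToken=token, StatusCode=500,
                           ErrorCode="TransformationFailed", ErrorMessage=str(exc))
            return {"status_code": 500}
        # Assumes the transformation always produces a PDF
        write_response(RequestRoute=route, RequestToken=token, Body=body,
                       ContentType="application/pdf")
        return {"status_code": 200}

    return handler
```

In the deployed function, you would pass boto3's bound method, for example make_handler(boto3.client("s3").write_get_object_response, transform=your_converter). Here your_converter is a hypothetical stand-in for the conversion and watermarking code.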

Step 3: Set up S3 Object Lambda Access Point

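Create a standard S3 access point for the source bucket. Then create an S3 Object Lambda Access Point that uses it as its supporting access point and invokes your Lambda function. The Object Lambda Access Point becomes the endpoint for fetching transformed objects.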
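With the AWS SDK for Python, the Object Lambda Access Point is created through the S3 Control API's create_access_point_for_object_lambda call. The helper below only builds that call's Configuration argument; its name and any example ARNs are hypothetical. The Lambda function's execution role also needs permission for s3-object-lambda:WriteGetObjectResponse.

```python
from typing import Optional


def object_lambda_ap_configuration(supporting_access_point_arn: str,
                                   function_arn: str,
                                   payload: Optional[str] = None) -> dict:
    """Build the Configuration argument for CreateAccessPointForObjectLambda."""
    aws_lambda = {"FunctionArn": function_arn}
    if payload is not None:
        # Optional static string delivered to every invocation in event["configuration"]["payload"]
        aws_lambda["FunctionPayload"] = payload
    return {
        "SupportingAccessPoint": supporting_access_point_arn,
        "CloudWatchMetricsEnabled": True,
        "TransformationConfigurations": [
            {
                "Actions": ["GetObject"],  # only GET requests are transformed
                "ContentTransformation": {"AwsLambda": aws_lambda},
            }
        ],
    }
```

The result would then be passed as Configuration to boto3.client("s3control").create_access_point_for_object_lambda(AccountId=..., Name=..., Configuration=...).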

Step 4: Retrieve Transformed Object via S3 URL

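Clients request documents through the Object Lambda Access Point by passing its ARN (or alias) where they would normally pass the bucket name. The GetObject call itself is unchanged, so existing S3 client code needs only a different target.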
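In Python, for example, the only difference from a normal download is the Bucket value. This sketch assumes the caller has a boto3 S3 client. The helper names are hypothetical, and the access point name, region, and account ID are placeholders.

```python
def object_lambda_ap_arn(region: str, account_id: str, name: str) -> str:
    """ARN of an Object Lambda Access Point, usable as the Bucket parameter."""
    return f"arn:aws:s3-object-lambda:{region}:{account_id}:accesspoint/{name}"


def get_transformed(s3_client, region: str, account_id: str, ap_name: str, key: str) -> bytes:
    # Same GetObject call as for a bucket; only the Bucket value differs
    response = s3_client.get_object(Bucket=object_lambda_ap_arn(region, account_id, ap_name), Key=key)
    return response["Body"].read()
```

A call such as get_transformed(boto3.client("s3"), "us-east-1", "123456789012", "docs-olap", "contracts/nda.docx") would trigger the transformation function and return its output.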

Security and Cost Benefits

Security: No need to expose raw files or store derivative copies. All transformations happen at runtime and can be personalized per user.
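Cost Optimization: Save on storage by eliminating pre-generated variations. In exchange, you pay for Lambda compute and S3 Object Lambda data charges on each request.
Scalability: Lambda scales automatically with request volume (within your account's concurrency quota), which suits high-volume workloads.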

Sample Architecture Diagram

Document processing architectural diagram — Source: AWS Technical Blogs

User Authentication with Amazon Cognito

In this architecture, we expose a RESTful API to client applications—such as mobile or web apps—which first need to authenticate users before accessing protected resources. This is achieved using Amazon Cognito, which acts as an identity provider (IdP) to handle user sign-in and token issuance.

Cognito can serve as the primary IdP or integrate with third-party identity providers that support OpenID Connect (OIDC) or SAML, allowing for a flexible and federated authentication approach.

Token Validation with Amazon API Gateway

Once the user successfully authenticates via Amazon Cognito, the application receives a JSON Web Token (JWT) containing user identity and claims. This token must be included in subsequent API requests.

When the client calls the API (served through Amazon API Gateway), a Cognito user pool authorizer on the gateway validates the JWT's signature and expiration. If the token is valid, the request proceeds to the backend Lambda function. Otherwise, API Gateway rejects it, typically with 401 Unauthorized, so only authenticated users reach the service.
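Because API Gateway has already validated the token, the backend function can read the verified claims from the request context. This is an alternative to parsing the raw token a second time. The minimal Python sketch below covers both a REST API with a Cognito user pool authorizer and an HTTP API with a JWT authorizer; the helper names are hypothetical.

```python
def verified_claims(event: dict) -> dict:
    """Return JWT claims that API Gateway's authorizer has already validated."""
    authorizer = (event.get("requestContext") or {}).get("authorizer") or {}
    if "claims" in authorizer:  # REST API with a Cognito user pool authorizer
        return authorizer["claims"]
    jwt = authorizer.get("jwt") or {}  # HTTP API with a JWT authorizer
    if "claims" in jwt:
        return jwt["claims"]
    # No authorizer context means the route is not protected; fail closed
    raise PermissionError("request carries no verified claims")


def user_id(event: dict) -> str:
    # "sub" is the stable, unique user identifier in Cognito tokens
    return verified_claims(event)["sub"]
```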

User Data Logging with Amazon DynamoDB

After a request is authenticated and reaches the Lambda API handler, the function parses the JWT to identify the requesting user. It logs relevant details such as:

User identity
File requested
Timestamp of access

This metadata is stored in Amazon DynamoDB, providing an auditable trail of file access. Optionally, the table can also store an encoded watermark string. The watermark printed on the document then carries an opaque code instead of plaintext identifiers such as usernames or emails, and the DynamoDB record maps that code back to the user.
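The following Python sketch shows one way to build the access-log record. The encoded watermark string is derived as an HMAC of the user, file, and timestamp, so it reveals nothing without both the secret and the DynamoDB record. Several choices here are assumptions rather than details from the original architecture: the attribute names, the key schema (user_id as partition key, accessed_at as sort key), and the use of HMAC-SHA256.

```python
import hashlib
import hmac
from datetime import datetime, timezone
from typing import Optional


def watermark_code(secret: bytes, user_id: str, file_key: str, accessed_at: str) -> str:
    """Opaque 16-character code for the watermark; meaningless without the log record."""
    message = f"{user_id}|{file_key}|{accessed_at}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16].upper()


def access_log_item(user_id: str, file_key: str, secret: bytes,
                    when: Optional[datetime] = None) -> dict:
    """Build a DynamoDB item (low-level attribute-value format) for one download."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    accessed_at = when.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "user_id": {"S": user_id},          # hypothetical partition key
        "accessed_at": {"S": accessed_at},  # hypothetical sort key
        "file_key": {"S": file_key},
        "watermark_code": {"S": watermark_code(secret, user_id, file_key, accessed_at)},
    }
```

The item could then be written with boto3.client("dynamodb").put_item(TableName=..., Item=item). A lookup from a code found on a leaked document would need a secondary index on watermark_code. The HMAC secret belongs outside the code, for example in AWS Secrets Manager.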

Dynamic PDF Generation and Watermarking

The Lambda API handler then initiates a file retrieval operation. Instead of accessing the S3 object directly, it routes the request through an S3 Object Lambda Access Point. This enables the dynamic transformation of files during retrieval.

The request includes:

The original file name
The watermark string (customized per user or session)
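How the watermark string travels to the Object Lambda function is an implementation choice. The Python sketch below assumes the caller appends a hypothetical watermark query parameter to its request. It reads both the object key and that parameter from the userRequest.url field of the S3 Object Lambda event. Confirm that your client can forward such a parameter before relying on this approach.

```python
from urllib.parse import parse_qs, unquote, urlparse


def parse_user_request(event: dict) -> tuple:
    """Return (object_key, watermark) from an S3 Object Lambda event."""
    url = urlparse(event["userRequest"]["url"])
    key = unquote(url.path.lstrip("/"))
    # "watermark" is a hypothetical query parameter name chosen by the caller
    watermark = parse_qs(url.query).get("watermark", [""])[0]
    return key, watermark
```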

Inside the S3 Object Lambda function, the original document is fetched from the source S3 bucket and transformed as follows:

Conversion to PDF: Using LibreOffice (via a prebuilt AWS Lambda Layer), the function converts the source document—such as .docx, .pptx, etc.—into a PDF format.
Watermark Injection: The resulting PDF is processed using a JavaScript library like PDF-Lib to embed the watermark text (e.g., "Confidential - User123") dynamically.
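The conversion step usually shells out to LibreOffice in headless mode. This Python sketch builds the soffice command line and runs it with subprocess. The binary path (read from a hypothetical SOFFICE_PATH variable) depends on the layer you deploy. The pdf-lib watermarking step is not shown.

```python
import os
import subprocess
from pathlib import Path
from typing import List

# Hypothetical default path; the real location depends on the LibreOffice layer
SOFFICE = os.environ.get("SOFFICE_PATH", "/opt/libreoffice/program/soffice")


def build_convert_command(src: Path, outdir: Path, soffice: str = SOFFICE) -> List[str]:
    """LibreOffice command line for a headless conversion to PDF."""
    return [
        soffice,
        "--headless",
        # Lambda's filesystem is read-only except /tmp, so keep the profile there
        "-env:UserInstallation=file:///tmp/lo-profile",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(src),
    ]


def convert_to_pdf(data: bytes, filename: str, workdir: Path = Path("/tmp/convert")) -> bytes:
    """Write the source to disk, run LibreOffice, and return the PDF bytes."""
    workdir.mkdir(parents=True, exist_ok=True)
    src = workdir / Path(filename).name  # drop any S3 "folder" prefix
    src.write_bytes(data)
    subprocess.run(build_convert_command(src, workdir), check=True, timeout=120)
    # LibreOffice names the output after the input, with a .pdf extension
    return (workdir / (src.stem + ".pdf")).read_bytes()
```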

Serving the Final Document to Clients

Once the converted and watermarked document comes back through the Object Lambda Access Point, the Lambda API handler:

Uploads the processed file to a temporary Amazon S3 bucket
Generates a presigned S3 URL for secure, time-limited access
Returns a 302 redirect response to the client with the presigned URL

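The client then downloads the final, transformed document directly from the presigned S3 link. Redirecting to S3 instead of returning the file in the API response also keeps large documents clear of Lambda's 6 MB synchronous response limit and API Gateway's 10 MB payload limit.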
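The following Python sketch outlines the API handler's side of this flow. It assumes an injected boto3 S3 client, a hypothetical downloads/ prefix in the temporary bucket, and a 300-second link lifetime. Passing the per-user watermark string to the Object Lambda function is omitted.

```python
import uuid


def temp_object_key(prefix: str = "downloads/") -> str:
    """Random, unguessable key for the temporary copy."""
    return f"{prefix}{uuid.uuid4().hex}.pdf"


def redirect_response(location: str) -> dict:
    """API Gateway Lambda proxy response that redirects the client."""
    return {"statusCode": 302, "headers": {"Location": location}, "body": ""}


def serve_document(s3_client, olap_arn: str, temp_bucket: str, key: str,
                   expires_in: int = 300) -> dict:
    # Reading through the Object Lambda Access Point runs the conversion/watermark function
    pdf = s3_client.get_object(Bucket=olap_arn, Key=key)["Body"].read()
    temp_key = temp_object_key()
    s3_client.put_object(Bucket=temp_bucket, Key=temp_key, Body=pdf,
                         ContentType="application/pdf")
    url = s3_client.generate_presigned_url(
        "get_object",
        Params={"Bucket": temp_bucket, "Key": temp_key},
        ExpiresIn=expires_in,  # seconds the link stays valid
    )
    return redirect_response(url)
```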

Automatic Cleanup of Temporary Files

To manage storage efficiently and ensure temporary files don't accumulate indefinitely, an S3 Lifecycle Policy is applied to the temporary bucket. This configuration automatically deletes files after a set expiration period, keeping your storage environment clean and cost-effective.
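A matching lifecycle rule can be built as shown below, in the form boto3's put_bucket_lifecycle_configuration expects. The rule ID and the downloads/ prefix are hypothetical.

```python
def temp_bucket_lifecycle(days: int = 1, prefix: str = "downloads/") -> dict:
    """Lifecycle configuration that expires temporary download objects after `days` days."""
    if days < 1:
        raise ValueError("S3 lifecycle expiration is specified in whole days (>= 1)")
    return {
        "Rules": [
            {
                "ID": "expire-temporary-downloads",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }
```

It would be applied with s3.put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=temp_bucket_lifecycle()). Lifecycle expiration is counted in whole days, and S3 removes expired objects asynchronously, so a temporary file can outlive its presigned URL. The URL's expiry, not the lifecycle rule, is what limits access.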

Process workflow for document transformation — Source: AWS Technical Blogs

Why Not Use Lambda@Edge? (An Alternative Approach Explained)

Before Amazon S3 Object Lambda became available, developers often used Lambda@Edge for similar document processing workflows. However, it presents a few limitations in this specific use case:

Proximity vs. Processing Power - Lambda@Edge is optimized for reducing latency by executing code closer to end users via CloudFront edge locations. However, in this architecture, low latency isn't the primary goal—server-side transformation and flexibility are more important.
CloudFront Dependency - Using Lambda@Edge requires an Amazon CloudFront distribution, which doesn’t align well with the single-download model used here. Furthermore, the benefits of CloudFront’s caching are largely irrelevant in this use case.
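Resource Constraints - Lambda@Edge functions run under tighter quotas than regional Lambda functions. These include a 5-second timeout for viewer triggers and 30 seconds for origin triggers (versus up to 15 minutes), 128 MB of memory for viewer triggers, and small caps on the response bodies they generate. That makes them unsuitable for resource-heavy processes like document conversion using LibreOffice or handling large binaries.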

Extending the Architecture

The current implementation covers the core workflow—authenticating users, transforming documents, and delivering secure downloads. However, this foundation can be extended in several valuable ways:

1. Image Processing with S3 Object Lambda

You can add new API endpoints to handle image-related transformations—such as resizing, format conversion, and watermarking. Simply create an additional S3 Object Lambda function for image processing and configure a dedicated Access Point to trigger it based on API route or user action.

2. Enhancing API Security

While Amazon API Gateway offers built-in security features (like throttling and IAM authentication), you can further harden your RESTful API by:

Integrating AWS WAF for custom security rules and bot protection
Federating your existing Identity Provider (IdP) with Amazon Cognito, creating a centralized identity management system

3. Observability & Monitoring

Operational insight is key for troubleshooting and optimizing serverless applications. You can enable:

AWS X-Ray to trace requests and pinpoint bottlenecks across services
Amazon CloudWatch Lambda Insights for function-level metrics like memory usage, invocation count, and performance breakdowns

4. Adhering to Best Practices

If you plan to scale or adapt this architecture, it's essential to follow the AWS Well-Architected Framework, specifically the Serverless Application Lens, which offers design principles around:

Reliability
Operational excellence
Cost optimization
Performance efficiency
Security

Example expanded document processing architecture — Source: AWS Technical Blogs

Final Thoughts

There are multiple ways to build document processing pipelines, but Amazon S3 Object Lambda offers an efficient, streamlined approach. It transforms files on the fly without storing pre-generated derivative copies; the temporary bucket holds only short-lived download files. It also decouples file transformation logic from the rest of your application stack, making your architecture more modular and maintainable.

By adopting a serverless architecture on AWS, as outlined in this post, you can minimize infrastructure management, reduce operational costs, and scale effortlessly with demand.

Furthermore, the extensibility of this solution means you can easily integrate new functionality—such as additional file types, watermarking styles, or content filtering—as your organizational requirements evolve.

Ready to automate document delivery at scale?

Contact Ananta Cloud to deploy your first S3 Object Lambda-powered workflow today!

Email: hello@anantacloud.com | LinkedIn: @anantacloud