HIPAA on AWS: S3 buckets

Storing healthcare data safely is a big and relevant topic, and HIPAA (the Health Insurance Portability and Accountability Act) has some strict rules about it. Nowadays, many healthcare organizations are turning to Amazon S3 buckets in the cloud to keep patient records and other health-related info secure. Let’s break down how you can use Amazon S3 buckets to follow HIPAA standards and keep your healthcare data safe.

Disable Access Control Lists (ACLs)

When ACLs are disabled, access control for your data is based on policies such as the following:

  • AWS Identity and Access Management (IAM) user policies
  • S3 bucket policies
  • Virtual private cloud (VPC) endpoint policies
  • AWS Organizations service control policies
resource "aws_s3_bucket" "example_bucket" {
  bucket = "your-bucket-name"
  acl    = "private"
}

Use IAM roles for accessing S3 buckets

Avoid directly using AWS credentials (access key and secret); instead, use IAM roles to access S3 buckets:

  • Fine-grained access control: IAM roles allow us to precisely define which roles can access which specific resources
  • Least privilege principle: a new role has no permissions until you grant them, making it easier to avoid granting more than what is needed
  • Temporary credentials: by using a role instead of a pair of long-lived keys, we minimize the risk of long-term credential exposure

Example using an EC2 instance that accesses an S3 bucket through an IAM role:

# Create an S3 bucket
resource "aws_s3_bucket" "example_bucket" {
  bucket = "your-s3-bucket-name"
  acl    = "private"
}

# Create an IAM role for EC2 instances
resource "aws_iam_role" "s3_access_role" {
  name = "EC2S3AccessRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

# Attach an S3 access policy to the IAM role
data "aws_iam_policy_document" "s3_access_policy" {
  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
    resources = ["arn:aws:s3:::your-s3-bucket-name/*"]
  }
}

resource "aws_iam_policy" "s3_access_policy" {
  name        = "S3AccessPolicy"
  description = "Policy for S3 access"
  policy      = data.aws_iam_policy_document.s3_access_policy.json
}

resource "aws_iam_policy_attachment" "s3_access_policy_attachment" {
  name       = "s3_access_policy_attachment"
  policy_arn = aws_iam_policy.s3_access_policy.arn
  roles      = [aws_iam_role.s3_access_role.name]
}

# An instance profile is required to attach the role to an EC2 instance
resource "aws_iam_instance_profile" "s3_access_profile" {
  name = "EC2S3AccessProfile"
  role = aws_iam_role.s3_access_role.name
}

# Launch an EC2 instance with the IAM role
resource "aws_instance" "example_instance" {
  ami           = "ami-0c55b159cbfafe1f0" # Specify a valid AMI ID
  instance_type = "t2.micro"
  subnet_id     = "subnet-0123456789abcdef0" # Specify a valid subnet ID
  key_name      = "your-key-name"           # Specify your key pair

  iam_instance_profile = aws_iam_instance_profile.s3_access_profile.name

  tags = {
    Name = "ExampleInstance"
  }
}

Using Object Lock

  • Data retention and preservation: ensures that objects cannot be deleted or modified until the configured retention period has elapsed
  • Data immutability: prevents unauthorized changes to data, enhancing data integrity and compliance
  • Accidental deletion prevention: makes sure that not even administrators can accidentally remove sensitive data, reducing the risk of data loss
resource "aws_s3_bucket" "hipaa_bucket" {
  bucket = "your-hipaa-compliant-bucket"
  acl    = "private"

  versioning {
    enabled = true
  }

  object_lock_configuration {
    object_lock_enabled = "Enabled"
  }
}
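
The configuration above only turns Object Lock on at the bucket level. As a minimal sketch of adding a default retention rule with the standalone aws_s3_bucket_object_lock_configuration resource (the COMPLIANCE mode and the 365-day period are illustrative assumptions, not values HIPAA prescribes):

resource "aws_s3_bucket_object_lock_configuration" "hipaa_bucket_lock" {
  bucket = aws_s3_bucket.hipaa_bucket.id

  rule {
    default_retention {
      # COMPLIANCE mode prevents any user, including the root account,
      # from deleting or overwriting locked object versions
      mode = "COMPLIANCE"
      days = 365 # example value; set this to your own retention policy
    }
  }
}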

Data Encryption

  • In-transit: as it travels to and from AWS S3
    • SSL/TLS or client-side encryption
  • At rest: while stored in S3 data centers
    • Server-side encryption: S3 encrypts your objects before saving them to disk and decrypts them when you download them
    • All S3 buckets have encryption configured by default (using SSE-S3)

Customizing server-side encryption:

  • Server-side encryption with Amazon S3 managed keys (SSE-S3)
    • Open the S3 bucket in the console
    • Go to Properties
    • Scroll down to the server-side encryption settings
    • Pick “Override bucket default encryption settings”
    • Under “Encryption type”, choose Amazon S3 managed keys (SSE-S3)
  • Server-side encryption with AWS KMS keys (SSE-KMS)
    • Same as the first (SSE-S3), but under “Encryption type”, choose AWS Key Management Service keys (SSE-KMS)
  • Dual-layer server-side encryption with AWS KMS keys (DSSE-KMS)
    • Same as the first one, but under “Encryption type”, choose dual-layer server-side encryption with AWS KMS keys (DSSE-KMS)
    • Under “AWS KMS key”, select an existing KMS key from the available keys (or create one if none exists)
  • Server-side encryption with customer-provided keys (SSE-C)
    • At object creation time (e.g. via the REST API), specify the encryption key you want to use with the following HTTP headers:
      • x-amz-server-side-encryption-customer-algorithm: specifies the encryption algorithm. Must be AES256
      • x-amz-server-side-encryption-customer-key: specifies the 256-bit, base64-encoded encryption key for S3 to use to encrypt/decrypt the data
      • x-amz-server-side-encryption-customer-key-MD5: specifies the base64-encoded 128-bit MD5 digest of the encryption key. This header is used to verify that the key was transmitted without error

Client-side encryption:

  • You take care of encrypting your data before sending it to S3, and of decrypting it after downloading it back
  • You can use the Amazon S3 Encryption Client, which performs client-side encryption before uploading data to S3 and decryption after retrieving it. More info here: https://docs.aws.amazon.com/amazon-s3-encryption-client/latest/developerguide/what-is-s3-encryption-client.html

Example of server-side encryption using SSE-S3:

resource "aws_s3_bucket" "hipaa_bucket" {
  bucket = "your-hipaa-bucket-name"
  
  # Enable server-side encryption with Amazon S3-managed keys (SSE-S3)
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
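
If you instead want SSE-KMS (described above), a similar sketch follows; the aws_kms_key resource and its settings are assumptions added to keep the example self-contained:

# Customer-managed KMS key for the bucket (example settings)
resource "aws_kms_key" "hipaa_key" {
  description             = "KMS key for HIPAA S3 bucket"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

resource "aws_s3_bucket" "hipaa_kms_bucket" {
  bucket = "your-hipaa-kms-bucket-name"

  # Enable server-side encryption with AWS KMS keys (SSE-KMS)
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm     = "aws:kms"
        kms_master_key_id = aws_kms_key.hipaa_key.arn
      }
    }
  }
}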

AWS Service Endpoints

  • Connections to S3 buckets containing PHI (Protected Health Information) must use encrypted transport (HTTPS), which can be enforced with a bucket policy as shown below.
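
A common way to enforce this is a bucket policy that denies any request arriving without TLS. A minimal sketch (the bucket name is a placeholder):

resource "aws_s3_bucket_policy" "require_tls" {
  bucket = "your-bucket-name"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Sid       = "DenyInsecureTransport",
        Effect    = "Deny",
        Principal = "*",
        Action    = "s3:*",
        Resource = [
          "arn:aws:s3:::your-bucket-name",
          "arn:aws:s3:::your-bucket-name/*"
        ],
        # Deny any request made over plain HTTP
        Condition = {
          Bool = {
            "aws:SecureTransport" = "false"
          }
        }
      }
    ]
  })
}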

Naming conventions

Do not store PHI in:

  • Bucket names
  • Object names
  • Metadata

Why? Because these fields are not protected by server-side encryption, and they are usually not encrypted client-side either.

Versioning

Why enable versioning on S3 buckets?

  • Data integrity: protects against accidental or malicious deletion or modification of data
  • Audit trail: provides an audit trail of changes made to objects in the bucket
  • Retention requirements: HIPAA, like other regulations, mandates specific data retention periods; versioning allows you to retain all object versions
  • Recovery and rollback: allows you to roll back to previous versions in case of corruption or unintended changes

Doing it with Terraform:

// inside your TF bucket definition
versioning {
  enabled = true
}

Access Logging

  • Audit trails: record who accessed what and when, aiding in auditing and compliance
  • Security monitoring: helps identify suspicious or unauthorized access attempts
  • Compliance requirements: needed for regulatory compliance standards (HIPAA, GDPR)
  • Incident response: in case of a data breach, access logs can be essential to understand the scope of the breach
  • Policy enforcement: makes sure access policies are properly configured and enforced

Terraform code:

# Create the source S3 bucket
resource "aws_s3_bucket" "source_bucket" {
  bucket = "your-source-bucket-name"
}

# Create the target S3 bucket for access logs
resource "aws_s3_bucket" "access_logging_bucket" {
  bucket = "your-access-logs-bucket-name"
}

# Enable access logging for the source bucket
resource "aws_s3_bucket_logging" "access_logging" {
  bucket = aws_s3_bucket.source_bucket.id

  target_bucket = aws_s3_bucket.access_logging_bucket.id
  target_prefix = "access-logs/"
}

Cross-Region Replication

  • Data redundancy: storing data in multiple geographic regions reduces the risk of data loss due to disasters or hardware failures
  • Data availability: downtime is reduced when one region fails, ensuring patient data remains accessible when needed
  • Disaster recovery: in case of accidental deletion or malicious activity in one region, data can still be recovered from the replica
  • Integrity: lets you verify the integrity and consistency of the data across regions
provider "aws" {
  region = "us-east-1" # The source region
}

# Create the source S3 bucket
resource "aws_s3_bucket" "source_bucket" {
  bucket = "your-source-bucket-name"
}

provider "aws" {
  alias  = "us-west-2" # The target region
  region = "us-west-2"
}

# Create the target S3 bucket in a different region
resource "aws_s3_bucket" "target_bucket" {
  provider = aws.us-west-2
  bucket   = "your-target-bucket-name"
}

# Enable versioning in both source and target buckets (required for replication)
resource "aws_s3_bucket_versioning" "source_versioning" {
  bucket = aws_s3_bucket.source_bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_versioning" "target_versioning" {
  provider = aws.us-west-2
  bucket   = aws_s3_bucket.target_bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Configure cross-region replication for the source bucket
resource "aws_s3_bucket_replication_configuration" "replication_config" {
  # Versioning must be enabled on the source bucket before replication works
  depends_on = [aws_s3_bucket_versioning.source_versioning]

  bucket = aws_s3_bucket.source_bucket.id
  role   = "arn:aws:iam::YOUR_ACCOUNT_ID:role/your-replication-role"

  rule {
    id     = "rule-1"
    status = "Enabled"
    prefix = "" # replicate all objects

    destination {
      bucket = aws_s3_bucket.target_bucket.arn
    }
  }
}

Using VPC endpoints for S3 access

  • Network isolation: with a VPC endpoint, traffic to S3 stays within the AWS network and is not exposed to the public internet
  • Data control: allows organizations to maintain full control over who and what can reach the data
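
A minimal sketch of a gateway VPC endpoint for S3 (the VPC ID, route table ID, and the region in the service name are placeholders):

resource "aws_vpc_endpoint" "s3_endpoint" {
  vpc_id            = "vpc-0123456789abcdef0"      # your VPC
  service_name      = "com.amazonaws.us-east-1.s3" # S3 in your region
  vpc_endpoint_type = "Gateway"
  route_table_ids   = ["rtb-0123456789abcdef0"]    # route tables that need S3 access
}

The bucket policy can then be restricted to this endpoint with an aws:SourceVpce condition.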

Leverage Amazon Macie to discover sensitive data

Amazon Macie is a security service that discovers sensitive data by using machine learning and pattern matching.

More info here: https://docs.aws.amazon.com/macie/latest/user/what-is-macie.html
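
As a rough sketch (the job name, account ID, and bucket below are placeholder assumptions), Macie can be enabled and pointed at a bucket with Terraform:

# Enable Macie for the account
resource "aws_macie2_account" "this" {}

# One-time classification job that scans the bucket for sensitive data
resource "aws_macie2_classification_job" "phi_scan" {
  name     = "phi-discovery-job"
  job_type = "ONE_TIME"

  s3_job_definition {
    bucket_definitions {
      account_id = "YOUR_ACCOUNT_ID"
      buckets    = ["your-hipaa-bucket-name"]
    }
  }

  depends_on = [aws_macie2_account.this]
}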
