HIPAA on AWS: S3 buckets

Storing healthcare data safely is a big and relevant topic, and HIPAA (the Health Insurance Portability and Accountability Act) has some strict rules about it. Nowadays, many healthcare organizations are turning to Amazon S3 buckets in the cloud to keep patient records and other health-related info secure. Let’s break down how you can use Amazon S3 buckets to follow HIPAA standards and keep your healthcare data safe.

Disable Access Control Lists (ACLs)

When ACLs are disabled, access control for your data is based on policies such as the following:

  • AWS Identity and Access Management (IAM) user policies
  • S3 bucket policies
  • Virtual private cloud (VPC) endpoint policies
  • AWS Organizations service control policies
resource "aws_s3_bucket" "example_bucket" {
  bucket = "your-bucket-name"
  acl    = "private"
}

Use IAM roles for accessing S3 buckets

Avoid directly using AWS credentials (access key and secret); instead, use IAM roles to access S3 buckets:

  • Fine-grained access control: IAM roles allow us to precisely define which roles can access which specific resources
  • Least privilege principle: a new role has no permissions until you grant them, making it easier to avoid granting more than what is needed
  • Temporary credentials: by using a role instead of a pair of long-lived keys, we minimize the risk of long-term credential exposure

Example using an EC2 instance that accesses an S3 bucket through an IAM role:

# Create an S3 bucket
resource "aws_s3_bucket" "example_bucket" {
  bucket = "your-s3-bucket-name"
  acl    = "private"
}

# Create an IAM role for EC2 instances
resource "aws_iam_role" "s3_access_role" {
  name = "EC2S3AccessRole"

  assume_role_policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Action = "sts:AssumeRole",
        Effect = "Allow",
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

# Attach an S3 access policy to the IAM role
data "aws_iam_policy_document" "s3_access_policy" {
  statement {
    actions   = ["s3:GetObject", "s3:PutObject", "s3:ListBucket"]
    resources = ["arn:aws:s3:::your-s3-bucket-name/*"]
  }
}

resource "aws_iam_policy" "s3_access_policy" {
  name        = "S3AccessPolicy"
  description = "Policy for S3 access"
  policy      = data.aws_iam_policy_document.s3_access_policy.json
}

resource "aws_iam_policy_attachment" "s3_access_policy_attachment" {
  name       = "s3_access_policy_attachment"
  policy_arn = aws_iam_policy.s3_access_policy.arn
  roles      = [aws_iam_role.s3_access_role.name]
}

# An instance profile is required to attach the role to an EC2 instance
resource "aws_iam_instance_profile" "s3_access_profile" {
  name = "EC2S3AccessProfile"
  role = aws_iam_role.s3_access_role.name
}

# Launch an EC2 instance with the IAM role
resource "aws_instance" "example_instance" {
  ami           = "ami-0c55b159cbfafe1f0" # Specify a valid AMI ID
  instance_type = "t2.micro"
  subnet_id     = "subnet-0123456789abcdef0" # Specify a valid subnet ID
  key_name      = "your-key-name"           # Specify your key pair

  iam_instance_profile = aws_iam_instance_profile.s3_access_profile.name

  tags = {
    Name = "ExampleInstance"
  }
}

Using Object Lock

  • Data retention and preservation: ensures that objects cannot be deleted or modified until the configured retention period has elapsed
  • Data immutability: prevents unauthorized changes to data, enhancing data integrity and compliance
  • Accidental deletion prevention: makes sure that not even administrators can accidentally remove sensitive data, reducing the risk of data loss
resource "aws_s3_bucket" "hipaa_bucket" {
  bucket = "your-hipaa-compliant-bucket"
  acl    = "private"

  versioning {
    enabled = true
  }

  object_lock_configuration {
    object_lock_enabled = "Enabled"
  }
}
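
The configuration above only turns Object Lock on at the bucket level. As a minimal sketch of adding a default retention rule with the standalone aws_s3_bucket_object_lock_configuration resource (the COMPLIANCE mode and the 365-day period are illustrative assumptions, not values HIPAA prescribes):

resource "aws_s3_bucket_object_lock_configuration" "hipaa_bucket_lock" {
  bucket = aws_s3_bucket.hipaa_bucket.id

  rule {
    default_retention {
      # COMPLIANCE mode prevents any user, including the root account,
      # from deleting or overwriting locked object versions
      mode = "COMPLIANCE"
      days = 365 # example value; set this to your own retention policy
    }
  }
}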

Data Encryption

  • In-transit: as it travels to and from AWS S3
    • SSL/TLS or client-side encryption
  • At rest: while stored in S3 data centers
    • Server-side encryption: S3 encrypts your objects before saving them to disk and decrypts them when you download them
    • All S3 buckets have encryption configured by default (using SSE-S3)

Customizing server-side encryption:

  • Server-side encryption with Amazon S3 managed keys (SSE-S3)
    • Open the S3 bucket in the console
    • Go to Properties
    • Scroll down to the server-side encryption settings
    • Pick “Override bucket default encryption settings”
    • Under “Encryption type”, choose Amazon S3 managed keys (SSE-S3)
  • Server-side encryption with AWS KMS keys (SSE-KMS)
    • Same as the first (SSE-S3), but under “Encryption type”, choose AWS Key Management Service keys (SSE-KMS)
  • Dual-layer server-side encryption with AWS KMS keys (DSSE-KMS)
    • Same as the first one, but under “Encryption type”, choose dual-layer server-side encryption with AWS KMS keys (DSSE-KMS)
    • Under “AWS KMS key”, select an existing KMS key from the available keys (or create one if none exists)
  • Server-side encryption with customer-provided keys (SSE-C)
    • At object creation time (e.g. via the REST API), specify the encryption key you want to use with the following HTTP headers:
      • x-amz-server-side-encryption-customer-algorithm: specifies the encryption algorithm. Must be AES256
      • x-amz-server-side-encryption-customer-key: specifies the 256-bit, base64-encoded encryption key for S3 to use to encrypt/decrypt the data
      • x-amz-server-side-encryption-customer-key-MD5: specifies the base64-encoded 128-bit MD5 digest of the encryption key. This header is used to verify that the key was transmitted without error

Client-side encryption:

  • You take care of encrypting your data before sending it to S3, and of decrypting it after downloading it back
  • You can use the Amazon S3 Encryption Client, which performs client-side encryption before uploading data to S3 and decryption after retrieving it. More info here: https://docs.aws.amazon.com/amazon-s3-encryption-client/latest/developerguide/what-is-s3-encryption-client.html

Example of server-side encryption using SSE-S3:

resource "aws_s3_bucket" "hipaa_bucket" {
  bucket = "your-hipaa-bucket-name"
  
  # Enable server-side encryption with Amazon S3-managed keys (SSE-S3)
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
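
If you instead want SSE-KMS (described above), a similar sketch follows; the aws_kms_key resource and its settings are assumptions added to keep the example self-contained:

# Customer-managed KMS key for the bucket (example settings)
resource "aws_kms_key" "hipaa_key" {
  description             = "KMS key for HIPAA S3 bucket"
  deletion_window_in_days = 30
  enable_key_rotation     = true
}

resource "aws_s3_bucket" "hipaa_kms_bucket" {
  bucket = "your-hipaa-kms-bucket-name"

  # Enable server-side encryption with AWS KMS keys (SSE-KMS)
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm     = "aws:kms"
        kms_master_key_id = aws_kms_key.hipaa_key.arn
      }
    }
  }
}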

AWS Service Endpoints

  • Connections to S3 buckets containing PHI (Protected Health Information) must use encrypted transport (HTTPS), which can be enforced with a bucket policy as shown below.
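
A common way to enforce this is a bucket policy that denies any request arriving without TLS. A minimal sketch (the bucket name is a placeholder):

resource "aws_s3_bucket_policy" "require_tls" {
  bucket = "your-bucket-name"

  policy = jsonencode({
    Version = "2012-10-17",
    Statement = [
      {
        Sid       = "DenyInsecureTransport",
        Effect    = "Deny",
        Principal = "*",
        Action    = "s3:*",
        Resource = [
          "arn:aws:s3:::your-bucket-name",
          "arn:aws:s3:::your-bucket-name/*"
        ],
        # Deny any request made over plain HTTP
        Condition = {
          Bool = {
            "aws:SecureTransport" = "false"
          }
        }
      }
    ]
  })
}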

Naming conventions

Do not store PHI in:

  • Bucket names
  • Object names
  • Metadata

Why? Because these fields are not protected by server-side encryption, and they are usually not encrypted client-side either.

Versioning

Why enable versioning on S3 buckets?

  • Data integrity: protects against accidental or malicious deletion or modification of data
  • Audit trail: provides an audit trail of changes made to objects in the bucket
  • Retention requirements: HIPAA, like other regulations, mandates specific data retention periods; versioning allows you to retain all object versions
  • Recovery and rollback: allows you to roll back to previous versions in case of corruption or unintended changes

Doing it with Terraform:

// inside your TF bucket definition
versioning {
  enabled = true
}

Access Logging

  • Audit trails: record who accessed what and when, aiding in auditing and compliance
  • Security monitoring: helps identify suspicious or unauthorized access attempts
  • Compliance requirements: needed for regulatory compliance standards (HIPAA, GDPR)
  • Incident response: in case of a data breach, access logs can be essential to understand the scope of the breach
  • Policy enforcement: makes sure access policies are properly configured and enforced

Terraform code:

# Create the source S3 bucket
resource "aws_s3_bucket" "source_bucket" {
  bucket = "your-source-bucket-name"
}

# Create the target S3 bucket for access logs
resource "aws_s3_bucket" "access_logging_bucket" {
  bucket = "your-access-logs-bucket-name"
}

# Enable access logging for the source bucket
resource "aws_s3_bucket_logging" "access_logging" {
  bucket = aws_s3_bucket.source_bucket.id

  target_bucket = aws_s3_bucket.access_logging_bucket.id
  target_prefix = "access-logs/"
}

Cross-Region Replication

  • Data redundancy: storing data in multiple geographic regions reduces the risk of data loss due to disasters or hardware failures
  • Data availability: downtime is reduced when one region fails, ensuring patient data remains accessible when needed
  • Disaster recovery: in case of accidental deletion or malicious activity in one region, data can still be recovered from the replica
  • Integrity: lets you verify the integrity and consistency of the data across regions
provider "aws" {
  region = "us-east-1" # The source region
}

# Create the source S3 bucket
resource "aws_s3_bucket" "source_bucket" {
  bucket = "your-source-bucket-name"
}

provider "aws" {
  alias  = "us-west-2" # The target region
  region = "us-west-2"
}

# Create the target S3 bucket in a different region
resource "aws_s3_bucket" "target_bucket" {
  provider = aws.us-west-2
  bucket   = "your-target-bucket-name"
}

# Enable versioning in both source and target buckets (required for replication)
resource "aws_s3_bucket_versioning" "source_versioning" {
  bucket = aws_s3_bucket.source_bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_versioning" "target_versioning" {
  provider = aws.us-west-2
  bucket   = aws_s3_bucket.target_bucket.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Configure cross-region replication for the source bucket
resource "aws_s3_bucket_replication_configuration" "replication_config" {
  # Versioning must be enabled on the source bucket before replication works
  depends_on = [aws_s3_bucket_versioning.source_versioning]

  bucket = aws_s3_bucket.source_bucket.id
  role   = "arn:aws:iam::YOUR_ACCOUNT_ID:role/your-replication-role"

  rule {
    id     = "rule-1"
    status = "Enabled"
    prefix = "" # replicate all objects

    destination {
      bucket = aws_s3_bucket.target_bucket.arn
    }
  }
}

Using VPC endpoints for S3 access

  • Network isolation: with a VPC endpoint, traffic to S3 stays within the AWS network and is not exposed to the public internet
  • Data control: allows organizations to maintain full control over who and what can reach the data
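
A minimal sketch of a gateway VPC endpoint for S3 (the VPC ID, route table ID, and the region in the service name are placeholders):

resource "aws_vpc_endpoint" "s3_endpoint" {
  vpc_id            = "vpc-0123456789abcdef0"      # your VPC
  service_name      = "com.amazonaws.us-east-1.s3" # S3 in your region
  vpc_endpoint_type = "Gateway"
  route_table_ids   = ["rtb-0123456789abcdef0"]    # route tables that need S3 access
}

The bucket policy can then be restricted to this endpoint with an aws:SourceVpce condition.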

Leverage Amazon Macie to discover sensitive data

Amazon Macie is a security service that discovers sensitive data by using machine learning and pattern matching.

More info here: https://docs.aws.amazon.com/macie/latest/user/what-is-macie.html
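
As a rough sketch (the job name, account ID, and bucket below are placeholder assumptions), Macie can be enabled and pointed at a bucket with Terraform:

# Enable Macie for the account
resource "aws_macie2_account" "this" {}

# One-time classification job that scans the bucket for sensitive data
resource "aws_macie2_classification_job" "phi_scan" {
  name     = "phi-discovery-job"
  job_type = "ONE_TIME"

  s3_job_definition {
    bucket_definitions {
      account_id = "YOUR_ACCOUNT_ID"
      buckets    = ["your-hipaa-bucket-name"]
    }
  }

  depends_on = [aws_macie2_account.this]
}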
