This checklist summarizes best practices for drafting and implementing architectures for AWS.
It covers the basics and helps you make sure that you do not overlook a single critical point.
Sr Number | Checklist | Questionnaire | Additional Information | Recommended Solution |
1 | Multi-AZ | Is every component distributed among at least two Availability Zones? Make sure you have specified Multi-AZ as a requirement for services that come with a built-in option for Multi-AZ deployments. Check whether your application is running on multiple EC2 instances, and make sure those instances are distributed among at least two AZs. | AWS partitions its regions into so-called Availability Zones (AZs). An Availability Zone consists of one or more isolated data centers. There are three types of AWS services: (1) services that operate among multiple AZs by default, (2) services that come with a built-in option for Multi-AZ deployments, and (3) services that you need to deploy among multiple AZs yourself. No action is needed for services that operate among multiple AZs by default. Multi-AZ by default: Route 53, CloudFront, S3, SQS. Multi-AZ optional: ALB, RDS Aurora, EFS, Auto Scaling. Action required: EC2 instances. AWS does not even offer an SLA for EC2 instances running in a single AZ only. | Use Auto Scaling Groups to make sure EC2 instances are distributed among at least two Availability Zones. |
2 | Multi-Region | Is it necessary to deploy your application to multiple regions? | Availability comes at a cost. Going down the Multi-Region path is complicated and expensive. Multi-Region deployments are the exception, not the norm. | |
3 | Stateless Server | Is your application persisting data on the virtual machines, either in memory or on disk? | Avoiding local state enables you to add or remove EC2 instances on demand, roll out new versions of your application without any service interruption, and replace failed EC2 instances automatically without losing any data. | Aim to implement the concept of a stateless server. Instead of persisting data on a single EC2 instance, outsource data storage to a database (for example, RDS Aurora), an object store (for example, S3), or a network file system (EFS for Linux or FSx for Windows File Server). |
4 | Auto Recovery | Do all components recover from failure automatically? | The Application Load Balancer uses a DNS name to be able to shift traffic away from an AZ affected by an outage. RDS Aurora fails over to a replica instance automatically within minutes after the primary instance becomes unavailable. S3 is fault-tolerant by design and routes incoming requests to healthy nodes automatically. EC2 does not recover from failure automatically. | There are two options to recover failed instances automatically: (1) An Auto Scaling Group uses the built-in EC2 health checks or the health check status from a linked load balancer to detect failed instances, terminates them, and launches new instances as replacements. (2) EC2 Auto Recovery allows you to recover a single EC2 instance automatically; it uses the built-in EC2 health checks to detect failures. |
5 | Decouple Client-Server-Communication | Are clients communicating with EC2 instances directly? | Try to decouple the client from the server by introducing a load balancer or a queue. AWS provides three types of load balancers: Classic Load Balancer (CLB), a legacy load balancer that you should no longer use for new architectures; Application Load Balancer (ALB), a layer-7 load balancer that you should use for HTTP/HTTPS-based communication; and Network Load Balancer (NLB), a load balancer that you should use for all communication that does not use HTTP/HTTPS. | Use an Application Load Balancer to decouple the HTTPS communication between client and server. The browser does not need to know which server is available to answer requests. Instead, the browser sends requests to the ALB, which is responsible for forwarding each request to a healthy server. |
6 | Auto Scaling | Are you using Auto Scaling Groups to adjust the number of EC2 instances automatically? | Make sure your architecture leverages Auto Scaling Groups to adjust the number of running EC2 instances. | There are two options to do so: based on a schedule, or based on a utilization metric. For example, with a schedule you could set the number of EC2 instances to 4 during working hours and to 2 during the night and on weekends. |
7 | Backup and Restore | Is there a way to back up all persistent data? | Use AWS Backup to create snapshots of your data periodically. Currently, AWS Backup supports the following services: EC2 instances, EBS volumes, RDS, DynamoDB, EFS, and AWS Storage Gateway. Make sure the RPO (Recovery Point Objective) is defined and the AWS Backup plans are configured accordingly. | Decide whether your architecture will be able to fulfill your RTO (Recovery Time Objective). Unfortunately, AWS typically does not offer any guarantees for the duration of a restore. |
8 | Encrypt Data-In-Transit | Encrypt all data in transit. | Do so regardless of whether the data traverses the public Internet or only your private network (VPC). | Use AWS Certificate Manager (ACM) to manage TLS/SSL certificates, and make sure your load balancers accept only HTTPS/TLS connections for incoming traffic. Also make use of the TLS/SSL-encrypted endpoints that database services like RDS offer. Communication with AWS APIs (e.g., S3 or SQS) uses HTTPS and is encrypted by default. |
9 | Encrypt Data-At-Rest | Verify that all components also encrypt data at rest. | Most AWS services allow you to specify a key managed by the AWS Key Management Service (KMS) to encrypt and decrypt your data. | AWS managed CMKs (customer master keys) are easy to use but do not allow fine-grained access control. Customer managed CMKs are more complicated to use but allow you to control access and to use keys across multiple AWS accounts. Use key material generated by AWS or bring your own key material. Depending on your requirements, it might be necessary to use AWS CloudHSM to protect your master keys. |
10 | Minimize Network I/O | Your architecture should minimize network I/O. | AWS charges for network traffic. First of all, you have to pay for traffic from AWS to the Internet. On top of that, you also pay for traffic between Availability Zones. | Do so by compressing data, reducing communication overhead, and disabling cross-region load balancing. |
11 | Maximum I/O Throughput | Verify the maximum I/O throughput for every network connection in your architecture. | The EC2 instance type defines the maximum network throughput between the ALB and your EC2 instance. The EC2 instance type defines the maximum network throughput between the EC2 instance and SQS/RDS/EFS. The RDS instance type defines the maximum network throughput between the RDS instance and the EC2 instances. | |
12 | Caching | Add caching layers to reduce the number of read requests. | Introducing caching decreases the load on all underlying parts of your architecture. But caching comes with a downside: you have to find a way to invalidate the cache. | AWS offers a few services that allow you to introduce caching to your architecture: CloudFront, the content delivery network, caches responses to HTTP requests. ElastiCache provides in-memory databases (Redis or Memcached) that you can integrate into your applications, for example, to cache responses from the database or to store pre-calculated results. DynamoDB DAX is a caching layer for Amazon's NoSQL database. |
13 | Protect Network Boundaries | Make sure you allow network access only to a minimum of endpoints. | From a network security perspective, it is crucial to reduce the attack surface exposed to the Internet to a minimum. | Divide your VPC into public and private subnets. EC2 instances and database services should be placed into private subnets to prevent any incoming traffic from the Internet. |
14 | Least Privilege Principle for IAM | Make sure your architecture follows the Least Privilege Principle when it comes to accessing AWS APIs. | Identity and Access Management (IAM) controls who can access or administer your cloud resources. | EC2 instances, ECS tasks, and Lambda functions should make use of IAM roles to authenticate for AWS API calls. |
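
The schedule-based option from item 6 (Auto Scaling) can be illustrated with a minimal Python sketch. It only computes the instance count a schedule would ask for and does not call any AWS API. The counts of 4 and 2 come from the checklist example; the working-hours window (Monday to Friday, 08:00 to 18:00) and the function name `desired_capacity` are assumptions made for illustration.

```python
from datetime import datetime

# The 4/2 split comes from the checklist example.
WORKING_HOURS_CAPACITY = 4
OFF_HOURS_CAPACITY = 2

# Assumption: "working hours" means Monday to Friday, 08:00 to 18:00.
WORK_START_HOUR = 8
WORK_END_HOUR = 18


def desired_capacity(now: datetime) -> int:
    """Return the number of EC2 instances the schedule asks for at `now`."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    in_working_hours = WORK_START_HOUR <= now.hour < WORK_END_HOUR
    if is_weekday and in_working_hours:
        return WORKING_HOURS_CAPACITY
    return OFF_HOURS_CAPACITY
```

In a real setup, the same rule would more likely be expressed as scheduled actions on the Auto Scaling Group itself rather than evaluated in application code; the sketch is only meant to make the rule explicit.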