How to Choose the Right Database for Your Application on AWS
What is ACID Compliance in a Database?
Atomicity – Either the transaction as a whole is successfully executed, or, if any part of it fails, the entire transaction is rolled back.
Consistency – Transactions move the database only between valid states, which preserves data integrity and prevents data corruption.
Isolation – Concurrent transactions are independent of each other and can be executed in parallel without interfering.
Durability – Once a transaction is committed, the data is persisted and is not lost due to a power outage or system failure.
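The four properties can be seen concretely in any ACID-compliant engine. As a minimal sketch, the snippet below uses SQLite (chosen only because it ships with Python) to demonstrate atomicity: a simulated failure mid-transaction rolls back every statement in that transaction.

```python
import sqlite3

# Minimal sketch of atomicity: if any statement in the transaction
# fails, the whole transaction is rolled back.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 30 WHERE name = 'alice'")
        raise RuntimeError("simulated failure mid-transaction")
except RuntimeError:
    pass

# Alice's debit was rolled back along with the rest of the transaction.
balance = conn.execute("SELECT balance FROM accounts WHERE name = 'alice'").fetchone()[0]
print(balance)  # 100
```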
What is BASE Compliance in a Database?
Basically Available – Basic read and write functionality with no consistency guarantee; some reads may not return the latest data, and writes may not be reflected immediately.
Soft State – The state of the system may change over time and is reflected only after some delay, without consistency guarantees.
Eventually Consistent – Reads reflect a write only after some time; read and write operations become consistent eventually.
BASE properties allow these databases to be distributed in nature and provide high scalability, fast read/write performance, easy replication, and big-data analytics capability.
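A toy model makes the BASE behavior tangible: a write lands on the primary first, and a read served by a replica can miss it until replication catches up. Everything below is an illustrative in-memory simulation, not a real database API.

```python
# Toy illustration of eventual consistency: a write reaches the primary
# immediately but only reaches the replica once replication is applied.
primary, replica = {}, {}
pending = []  # replication log not yet applied to the replica

def write(key, value):
    primary[key] = value
    pending.append((key, value))  # replicated asynchronously

def read(key):
    # A read served by the replica may miss recent writes (BASE).
    return replica.get(key)

def apply_replication():
    while pending:
        k, v = pending.pop(0)
        replica[k] = v

write("user:1", "online")
stale = read("user:1")   # None: the replica has not caught up yet
apply_replication()
fresh = read("user:1")   # "online": eventually consistent
```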
Amazon RDS Explained
SQL systems are vertically scalable. Storage can scale up and down without downtime.
SQL-based systems have a static, pre-defined schema.
Data is divided among tables.
Relationships are established via keys enforced by the system.
Data accuracy and consistency are maintained.
Data is accessed via SQL (Structured Query Language).
Relational databases use schemas to structure data. Data within the database is often accessed using SQL (Structured Query Language). Amazon RDS was created to minimize the effort involved in managing a relational database.
The service automates time-consuming administration tasks such as hardware provisioning, operating system and database setup, patching, and backups, while providing cost-efficient and resizable capacity.
The basic building block of Amazon RDS is the database instance. When you create a database instance, you choose a database engine to run—like PostgreSQL or Amazon Aurora. The database engine manages and runs all database operations.
Another important consideration is the instance class, which determines how much memory, CPU, and I/O capabilities, in terms of network and storage throughput, will be available to the engine.
Amazon RDS provides enhanced availability and durability through the use of Multi-AZ deployments. This means that Amazon RDS creates multiple instances of the databases in different Availability Zones.
In case of an infrastructure failure, Amazon RDS performs an automatic failover to the standby in another Availability Zone. Database operations resume as soon as the failover is complete. You don’t have to update connection strings, because Amazon RDS uses a DNS service to point to the new master instance.
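Because the endpoint's DNS name is simply repointed at the new primary, applications only need to reconnect and retry during a Multi-AZ failover. The sketch below shows a generic retry wrapper around a database operation; it is an illustrative pattern (the stub `query` function stands in for a real driver call against the RDS endpoint), not an AWS API.

```python
import time

def with_failover_retry(operation, retries=5, delay=1.0):
    """Retry a database operation across an RDS Multi-AZ failover.

    RDS repoints the same DNS endpoint at the new primary, so the
    application only needs to reconnect and retry. `operation` is any
    callable that raises on a broken connection. (Illustrative sketch.)
    """
    for attempt in range(retries):
        try:
            return operation()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(delay)  # wait for failover/DNS to settle

# Stub standing in for a query against the RDS endpoint: it fails twice
# (mid-failover), then succeeds once the standby is promoted.
attempts = {"n": 0}
def query():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("endpoint not reachable during failover")
    return "ok"

result = with_failover_retry(query, delay=0.01)
print(result)  # ok
```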
When you build your first Amazon RDS database, you have to make a few key decisions. First is the database instance type that determines the resources your database will have. Next is the type of database engine you want to run. You can choose from Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, and SQL Server. Each database engine has its own unique characteristics and features.
Use cases of RDS: Relational databases are commonly used for storing transactional data, like data from a shopping website or security records from a metal detector.
One of the biggest benefits of Amazon RDS is that you pay as you go.
First, you pay for the instance hosting the databases.
Second, you pay for the storage and I/O consumed by your database. Storage is billed per gigabyte per month, and I/O is billed per million requests.
Third, you pay for the amount of data transferred to or from the internet and other AWS Regions.
PostgreSQL is an enterprise class, open source relational database management system.
Amazon Aurora is a MySQL and PostgreSQL-compatible relational database built for the cloud that combines the performance and availability of traditional enterprise databases with the simplicity and cost-effectiveness of open source databases.
Amazon Aurora is up to five times faster than standard MySQL databases and three times faster than standard PostgreSQL databases. It provides the security, availability, and reliability of commercial databases at 1/10th the cost.
Amazon Aurora is fully managed by Amazon Relational Database Service (RDS), which automates time-consuming administration tasks like hardware provisioning, database setup, patching, and backups.
Amazon Aurora features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 128 TB per database instance.
Amazon Aurora automatically increases the size of your database volume as your storage needs grow. Your volume expands in increments of 10 GB up to a maximum of 128 TB. You don’t need to provision excess storage for your database to handle future growth.
It delivers high performance and availability with up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones. You can increase read throughput to support high-volume application requests by creating up to 15 Amazon Aurora Replicas. Aurora Replicas share the same underlying storage as the source instance, lowering costs and avoiding the need to perform writes at the replica nodes.
Aurora provides a reader endpoint so the application can connect without having to keep track of replicas as they are added and removed.
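A common application pattern is to route read-only statements to the reader endpoint and everything else to the cluster (writer) endpoint. The hostnames below are hypothetical examples of the two endpoint shapes; the routing helper is an illustrative sketch, not an AWS SDK call.

```python
# Sketch of routing reads to the Aurora reader endpoint and writes to the
# cluster (writer) endpoint. Hostnames are hypothetical examples.
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def endpoint_for(sql: str) -> str:
    """Send SELECTs to the reader endpoint, everything else to the writer.

    The reader endpoint load-balances across replicas, so the application
    never has to track individual replica hostnames as they come and go.
    """
    is_read = sql.lstrip().lower().startswith("select")
    return READER_ENDPOINT if is_read else WRITER_ENDPOINT

print(endpoint_for("SELECT * FROM orders"))    # reader endpoint
print(endpoint_for("INSERT INTO orders ..."))  # writer endpoint
```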
It also supports auto-scaling, automatically adding and removing replicas in response to changes in performance metrics that you specify.
Aurora supports cross-region read replicas. These provide fast local reads to your users, and each region can have an additional 15 Aurora Replicas to further scale local reads.
On instance failure, Amazon Aurora uses Amazon RDS Multi-AZ technology to automate failover to one of up to 15 Amazon Aurora Replicas you have created in any of three Availability Zones. If no Amazon Aurora Replicas have been provisioned, in the case of a failure, Amazon RDS will attempt to create a new Amazon Aurora DB instance for you automatically.
Amazon Aurora is designed to offer 99.99% availability, replicating 6 copies of your data across 3 Availability Zones and backing up your data continuously to Amazon S3.
It transparently recovers from physical storage failures; instance failover typically takes less than 30 seconds.
Amazon Aurora storage is fault-tolerant, transparently handling the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability.
Amazon Aurora storage is also self-healing; data blocks and disks are continuously scanned for errors and replaced automatically.
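These fault-tolerance numbers follow from Aurora's quorum design: six copies of each data segment across three Availability Zones, with a write quorum of 4 of 6 and a read quorum of 3 of 6. The arithmetic below just checks that the quorums reproduce the "lose 2 for writes, lose 3 for reads" behavior described above.

```python
# Aurora keeps 6 copies of each data segment across 3 AZs and uses
# quorums: writes need 4 of 6 copies, reads need 3 of 6. That is why
# losing 2 copies leaves writes available and losing 3 still allows reads.
COPIES = 6
WRITE_QUORUM = 4
READ_QUORUM = 3

def writes_available(lost: int) -> bool:
    return COPIES - lost >= WRITE_QUORUM

def reads_available(lost: int) -> bool:
    return COPIES - lost >= READ_QUORUM

print(writes_available(2), writes_available(3))  # True False
print(reads_available(3), reads_available(4))    # True False
```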
You can also backtrack within seconds to a previous point in time to recover from user errors.
With Global Database, a single Aurora database can span multiple AWS Regions to enable fast local reads and quick disaster recovery. Global Database uses storage-based replication to replicate a database across multiple AWS Regions, with typical latency of less than one second. You can use a secondary region as a backup option in case you need to recover quickly from a regional degradation or outage. A database in a secondary region can be promoted to full read/write capabilities in less than 1 minute.
On an encrypted Amazon Aurora instance, data in the underlying storage is encrypted, as are the automated backups, snapshots, and replicas in the same cluster.
Use case of Amazon Aurora– Web and mobile games that are built to operate at very large scale need a database with high throughput, massive storage scalability, and high availability. Amazon Aurora fulfills the needs of such highly demanding applications with enough room for future growth.
Amazon Aurora Serverless is an on-demand, auto-scaling configuration that automatically adjusts database capacity based on application needs. With Aurora Serverless, you only pay for the database capacity, storage, and I/O your database consumes when it is active. Your database capacity automatically scales up or down to meet your application workload needs and shuts down during periods of inactivity, saving you money and administration time.
Aurora Serverless measures database capacity in Aurora Capacity Units (ACUs) billed per second. 1 ACU has approximately 2 GiB of memory with corresponding CPU and networking, similar to what is used in Aurora provisioned instances.
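Since ACUs are billed per second only while the database is active, a bursty workload's cost is the sum of its ACU-seconds. The ACU price below is a placeholder, not an actual AWS rate.

```python
# Hypothetical Aurora Serverless cost for a bursty day: capacity is
# billed per second only while the database is active. The ACU price
# is a placeholder, not an actual AWS rate.
acu_price_per_hour = 0.06          # $/ACU-hour (placeholder)
usage = [                          # (seconds active, ACUs in use)
    (3 * 3600, 2),                 # 3 busy hours at 2 ACUs
    (1 * 3600, 8),                 # 1 peak hour scaled up to 8 ACUs
]                                  # idle time: paused, so no capacity charge

acu_seconds = sum(seconds * acus for seconds, acus in usage)
cost = acu_seconds / 3600 * acu_price_per_hour
print(round(cost, 2))  # 0.84
```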
DynamoDB is commonly used for storing session state (key-value pairs).
Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale. DynamoDB offers built-in security, continuous backups, automated multi-region replication, in-memory caching, and data export tools.
Use Cases-Scale throughput and concurrency for media and entertainment workloads such as real-time video streaming and interactive content, and deliver lower latency with multi-region replication across AWS Regions.
DynamoDB supports high-traffic, extreme-scale events and can handle millions of queries per second.
DynamoDB automatically scales throughput capacity to meet workload demands, and partitions and repartitions your data as your table size grows.
DynamoDB synchronously replicates data across three facilities in an AWS Region, giving you high availability and data durability.
When reading data from DynamoDB, users can specify whether they want the read to be eventually consistent or strongly consistent.
DynamoDB supports GET/PUT operations by using a user-defined primary key. The primary key is the only required attribute for items in a table. You specify the primary key when you create a table, and it uniquely identifies each item.
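The key-value access pattern is easy to model: every item is addressed by its primary key, and a PUT replaces the item as a whole. The class below is a toy in-memory stand-in used only to illustrate these semantics; it is not the boto3 API.

```python
# Toy model of DynamoDB's key-value access pattern: items are addressed
# by a user-defined primary key, and PUT overwrites the whole item.
# (Illustrative stand-in, not the boto3 API.)
class KeyValueTable:
    def __init__(self, key_attr):
        self.key_attr = key_attr  # primary key attribute, chosen at creation
        self.items = {}

    def put_item(self, item):
        # The primary key attribute is the only required attribute.
        key = item[self.key_attr]
        self.items[key] = item

    def get_item(self, key):
        return self.items.get(key)

sessions = KeyValueTable(key_attr="session_id")
sessions.put_item({"session_id": "abc", "user": "alice", "ttl": 3600})
item = sessions.get_item("abc")
print(item["user"])  # alice
```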
DynamoDB is a fully managed cloud service that you access via API. Applications running on any operating system (such as Linux, Windows, iOS, Android, Solaris, AIX, and HP-UX) can use DynamoDB.
How does billing happen in DynamoDB?
Each DynamoDB table has provisioned read throughput and write throughput associated with it. You are billed by the hour for that throughput capacity, whether or not you are sending requests to your table.
Maximum throughput per DynamoDB table is practically unlimited.
The smallest provisioned throughput you can request is 1 write capacity unit and 1 read capacity unit, for both auto scaling and manual throughput provisioning. Such provisioning falls within the free tier, which allows for 25 units of write capacity and 25 units of read capacity. The free tier applies at the account level, not the table level.
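Sizing those capacity units follows the published DynamoDB model: one write capacity unit covers one write per second for an item up to 1 KB, and one read capacity unit covers one strongly consistent read per second up to 4 KB (an eventually consistent read consumes half). A quick calculator:

```python
import math

# Capacity-unit arithmetic from the DynamoDB provisioned-throughput model:
# 1 WCU = one write/second up to 1 KB; 1 RCU = one strongly consistent
# read/second up to 4 KB (eventually consistent reads use half).
def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    return math.ceil(item_kb / 1) * writes_per_sec

def read_capacity_units(item_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    units = math.ceil(item_kb / 4) * reads_per_sec
    return units if strongly_consistent else math.ceil(units / 2)

print(write_capacity_units(2.5, 10))                          # 30
print(read_capacity_units(6, 10))                             # 20
print(read_capacity_units(6, 10, strongly_consistent=False))  # 10
```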
DynamoDB table classes
DynamoDB offers two table classes designed to help you optimize for cost. The DynamoDB Standard table class is the default, and recommended for the vast majority of workloads. The DynamoDB Standard-Infrequent Access (DynamoDB Standard-IA) table class is optimized for tables that store data that is accessed infrequently, where storage is the dominant cost.
In DynamoDB, there is no limit to the number of items you can store in a table.
Each item in the table has a unique identifier, or primary key, that distinguishes the item from all of the others in the table. When you create a table, in addition to the table name, you must specify the primary key of the table. The primary key uniquely identifies each item in the table, so that no two items can have the same key.
You can create one or more secondary indexes on a table. A secondary index lets you query the data in the table using an alternate key, in addition to queries against the primary key.
DynamoDB supports two kinds of indexes:
- Global secondary index – An index with a partition key and sort key that can be different from those on the table.
- Local secondary index – An index that has the same partition key as the table, but a different sort key.
Each table in DynamoDB has a quota of 20 global secondary indexes (default quota) and 5 local secondary indexes.
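Conceptually, a secondary index is a second lookup structure that DynamoDB maintains for you so you can query by an attribute other than the table's primary key. The snippet below builds that structure by hand for a hypothetical orders table, purely to illustrate what a global secondary index on `customer` provides.

```python
# Toy sketch of what a global secondary index buys you: querying by an
# attribute other than the table's primary key. (Conceptual model only;
# DynamoDB maintains the index automatically as the table changes.)
table = {  # primary key: order_id
    "o1": {"order_id": "o1", "customer": "alice", "total": 30},
    "o2": {"order_id": "o2", "customer": "bob", "total": 45},
    "o3": {"order_id": "o3", "customer": "alice", "total": 12},
}

# A GSI on `customer` maps the alternate key to the matching items.
gsi_by_customer = {}
for item in table.values():
    gsi_by_customer.setdefault(item["customer"], []).append(item["order_id"])

print(sorted(gsi_by_customer["alice"]))  # ['o1', 'o3']
```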
DynamoDB Streams is an optional feature that captures data modification events in DynamoDB tables. The data about these events appear in the stream in near-real time, and in the order that the events occurred.
Each stream record also contains the name of the table, the event timestamp, and other metadata. Stream records have a lifetime of 24 hours; after that, they are automatically removed from the stream.
DynamoDB auto scaling enables a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic, without throttling. When the workload decreases, Application Auto Scaling decreases the throughput so that you don’t pay for unused provisioned capacity. You can modify your auto scaling settings at any time.
Application Auto Scaling uses a target tracking algorithm to adjust the provisioned throughput of the table (or index) upward or downward in response to actual workloads, so that the actual capacity utilization remains at or near your target utilization.
You can set the auto scaling target utilization values between 20 and 90 percent for your read and write capacity.
Amazon DynamoDB Global Tables- With global tables, you can give massively scaled, global applications local access to an Amazon DynamoDB table for fast read and write performance. You also can use global tables to replicate DynamoDB table data to additional AWS Regions for higher availability.
Amazon DocumentDB use cases: patient data, product catalogs. It can be accessed through APIs and SDKs.
Ledger database – an append-only database whose records, once written, cannot be changed.
Volume Types for Databases:
Database Migration Explained:
Amazon WQF–Workload Qualification Framework Tool
Schema Conversion Tool-
If you are changing from one engine to another, you can use the AWS Schema Conversion Tool to convert schemas, for example from Oracle to MySQL.
It converts database objects from the source database to the target database (e.g., Oracle to Aurora).
AWS Database Migration Service (AWS DMS)
AWS DMS supports migration between similar (homogeneous) or dissimilar (heterogeneous) databases.
Note that only proprietary-to-open-source conversion is supported by DMS.
Post Migration Checks
Parameter groups in databases:
The default parameter group and default option group cannot be edited.
Version upgrades – for a major version upgrade, a new parameter group is needed for the new version.
How are Replicas Upgraded:
You can upgrade the engine version of a snapshot to a higher supported version.
Copies of a snapshot can be upgraded while keeping the original intact.
Copying and Sharing Snapshots.