What is Data: Data is information related to any object.
What is a Database: A database is a systematic collection of data. Databases support storage and manipulation of data.
Relational Database (RDBMS): A database in which tables are interrelated. Each table has its own primary key, and that key can be referenced from other tables to link data together. E.g., universities and banks typically use relational databases. Relational databases are comparatively slower than NoSQL (non-relational) databases. Every table follows a fixed, predefined set of fields. Best suited for OLTP (Online Transaction Processing).
- A schema defines the design of a database.
- For web applications, MySQL is commonly used.
- Common applications for MySQL include PHP- and Java-based web applications that require a database storage backend, e.g. Joomla.
Relational databases are difficult to scale out horizontally (adding new servers); they typically scale vertically instead.
Relational databases use SQL as their query language.
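The primary-key relationship described above can be sketched with SQLite (the table names and data here are made up for illustration):

```python
import sqlite3

# In-memory database; each table has its own primary key,
# and accounts references customers through a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE accounts (
        account_id  INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        balance     REAL
    );
""")
conn.execute("INSERT INTO customers VALUES (1, 'Asha')")
conn.execute("INSERT INTO accounts VALUES (100, 1, 2500.0)")

# A JOIN follows the shared key across tables.
row = conn.execute("""
    SELECT c.name, a.balance
    FROM accounts a JOIN customers c ON a.customer_id = c.customer_id
""").fetchone()
print(row)  # ('Asha', 2500.0)
```

The JOIN is the defining relational operation: both tables are linked through the `customer_id` key.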
Non-Relational Database / NoSQL features:
- Non-relational databases store data without a structured mechanism linking data in different tables to one another.
- Can run on low-cost hardware.
- Much faster read/write performance compared to relational databases.
- Horizontal scaling is possible (adding new servers to the existing cluster).
- Does not force data into flat, fixed-column table records.
- Auto scaling is possible in non-relational databases but not in traditional relational databases.
- Best suited for Online Analytical Processing (OLAP).
- E.g. MongoDB, DynamoDB, Cassandra.
Types of NoSQL database:
Columnar DB: Examples: Cassandra, HBase.
Information is stored in columns instead of rows, which gives faster processing for analytical queries and uses less disk space.
Document DB: Examples: MongoDB, CouchDB, RavenDB.
Data is stored as documents, typically in JSON format. Synchronization is easy, it is efficient for storing catalogues, and it is useful for blogs and video platforms.
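A minimal sketch of the document model in plain Python (no real MongoDB client; the records are invented for illustration). Documents in the same collection serialize naturally to JSON and need not share a schema:

```python
import json

# Two "documents" in the same collection with different fields.
catalogue = [
    {"_id": 1, "title": "Blog post", "tags": ["aws", "rds"]},
    {"_id": 2, "title": "Video", "duration_sec": 300},
]

# JSON round-trip: this is what makes sync and export easy.
payload = json.dumps(catalogue)
restored = json.loads(payload)
print(restored[1]["duration_sec"])  # 300
```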
Key-Value Database: Examples: Redis, DynamoDB, Tokyo Cabinet.
Each record is related to one key; a unique key is defined per item, and frequently accessed keys are served from cache memory.
Used for session-style workloads such as shopping-cart state; Facebook and Twitter store sessions this way.
Amazon ElastiCache is an example of a key-value store.
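A toy key-value session store in plain Python (a stand-in for Redis or ElastiCache; class and key names are illustrative) shows the one-unique-key-per-session pattern:

```python
# Each session gets one unique key; the value is an opaque blob,
# mirroring how cart state is kept in Redis/ElastiCache.
class SessionStore:
    def __init__(self):
        self._data = {}

    def put(self, session_id, value):
        self._data[session_id] = value

    def get(self, session_id, default=None):
        return self._data.get(session_id, default)

store = SessionStore()
store.put("sess-42", {"cart": ["book", "pen"], "user": "asha"})
print(store.get("sess-42")["cart"])  # ['book', 'pen']
```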
Graph-based DB: Examples: Neo4j, FlockDB.
A graph DB is basically a collection of nodes and edges.
Each node represents an entity and each edge represents a connection or relationship between two nodes.
Used in research organisations.
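The nodes-and-edges model can be sketched in plain Python (the entities and relationships here are invented):

```python
# Nodes are entities; edges are (source, relationship, destination) triples.
nodes = {"alice", "bob", "carol"}
edges = [("alice", "follows", "bob"), ("bob", "follows", "carol")]

def neighbours(node):
    """Entities directly connected to `node` by an outgoing edge."""
    return [dst for src, rel, dst in edges if src == node]

print(neighbours("alice"))  # ['bob']
```

Graph databases like Neo4j optimise exactly this kind of traversal, following edges instead of joining tables.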
In AWS's fully managed relational DB engine service (Amazon RDS), AWS is responsible for:
- Security and patching
- Automated backups
- Software updates for the DB engine
- If selected, Multi-AZ deployment with synchronous replication between the active and standby DB instances
Settings managed by the user:
1. Managing DB settings
2. Creating the relational DB schema
3. DB performance tuning
Relational Database Engine Options:
- MS SQL Server
- MySQL
- AWS Aurora: best throughput
- PostgreSQL: open source, highly reliable and stable
- MariaDB
Two types of licensing options:
- BYOL: Bring Your Own License
- License included: licensed from AWS and charged on an hourly basis
- Up to 40 DB instances per account can be created.
- 10 of the 40 can be Oracle or MS SQL Server under the license-included model.
- Under the BYOL model, all 40 DB instances can be Oracle or MS SQL Server.
Please note: EBS is used for database storage.
AWS RDS uses EBS volumes (not instance store) for DB and log storage.
2 types of storage available for RDS:
1. General Purpose: used for DB workloads with moderate I/O requirements.
Storage limits: min 20 GB, max 16 TB.
2. Provisioned IOPS RDS storage: used for high-performance, I/O-intensive OLTP workloads.
Storage limits: min 100 GB.
Provisioned IOPS is the storage type used where a Multi-AZ scenario is followed.
Templates available in RDS:
- Free Tier: the Multi-AZ option is not available.
EC2 instance classes for the DB engine:
1. Standard Class: M series instances. Max capacity: 96 vCPU / max throughput: 14,000 Mbps / max RAM: 384 GB
2. Memory Optimised Class: R or X series. Max capacity: 96 vCPU / max throughput: 14,000 Mbps / max RAM: 768 GB
3. Burstable Class (for normal workloads): max capacity: 8 vCPU / max throughput: 1,550 Mbps / max RAM: 32 GB
Important notes about purchases:
- Reserved Instance prices cover instance costs only. Storage and I/O are still billed separately.
- Region, DB engine, DB instance class, deployment type and term length must be chosen at purchase, and cannot be changed later.
- You can purchase up to 40 Reserved Instances. If you need additional Reserved Instances, complete the request form AWS provides.
- Reserved Instances may not be transferred, sold, or cancelled, and the one-time fee is non-refundable.
Concept of Multi-AZ in RDS:
- You can select the Multi-AZ option during RDS DB instance launch (it can also be enabled later by modifying the instance).
- Provides high availability and failover support.
- The RDS service creates a standby instance in a different AZ in the same Region and configures synchronous replication between the primary and the standby.
- You cannot read from or write to the standby instance.
- You cannot select which Availability Zone the standby instance is placed in.
- You can view which AZ the standby was created in.
- Depending on the instance class, failover may take up to 2 minutes to switch over to the standby.
- After switchover, the standby becomes the primary.
- While using Multi-AZ, the recommended storage type is Provisioned IOPS.
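Assuming the boto3 SDK, a Multi-AZ launch might look like the sketch below (the identifier, instance class and credentials are placeholders, not values from these notes):

```python
def launch_multi_az_instance(identifier="demo-db"):
    """Launch a Multi-AZ MySQL instance; RDS chooses the standby's AZ itself."""
    import boto3  # deferred import; requires AWS credentials to actually run
    rds = boto3.client("rds")
    return rds.create_db_instance(
        DBInstanceIdentifier=identifier,     # placeholder name
        Engine="mysql",
        DBInstanceClass="db.m5.large",       # illustrative instance class
        AllocatedStorage=100,
        StorageType="io1",                   # Provisioned IOPS, as recommended for Multi-AZ
        Iops=1000,
        MasterUsername="admin",
        MasterUserPassword="change-me",      # placeholder credential
        MultiAZ=True,                        # standby in another AZ, synchronous replication
    )
```

Note that no AZ is specified for the standby: as stated above, RDS does not let you choose it.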
Scenarios in which failover will occur:
1. The primary DB instance fails.
2. Network connectivity issues.
3. An Availability Zone failure.
4. Loss of the EC2 instance backing the primary.
5. EBS failure on the primary DB instance.
6. The primary DB instance class is changed.
7. Patching of the DB instance's OS.
8. Manual failover.
Multi-AZ RDS failover consequences:
1. During failover, the CNAME record behind the RDS DB instance endpoint is updated to map to the standby's IP address.
2. It is recommended to reference your DB instance by its endpoint, not by its IP address.
3. The RDS endpoint itself does not change when you select the Multi-AZ option; however, the primary and standby instances have different IP addresses, as they are in different AZs.
What is the difference between Amazon RDS Multi-AZ and Read Replicas? A Multi-AZ deployment is used for high availability (failover), whereas Read Replicas are used for read scalability.
When do we do a manual failover?
1. In case of rebooting: this is done by selecting the "reboot with failover" option on the primary RDS DB instance.
2. A DB instance reboot is required for changes to take effect when you change the DB parameter group or when you change a static DB parameter.
AWS RDS Backup and Retention Period:
- Whenever failover occurs, AWS RDS sends an SNS notification.
- You can use API/CLI calls to retrieve the event history of the last 14 days; with the console, only the last day's events can be checked.
- For OS patching, system upgrades and DB scaling, the changes are applied on the standby first, then on the primary.
- In Multi-AZ, snapshots and automated backups are taken from the standby instance to avoid I/O suspension on the primary.
- Maintenance is performed on the standby first; the standby is then promoted to primary so that maintenance can be done on the old primary.
- In Multi-AZ, engine version upgrades are applied to both primary and standby at the same time and must be done in the maintenance window. The maintenance window is not counted in the SLA.
- There are 2 methods of taking backups: automated backups (taken during a defined time window) and manual snapshots (taken whenever required).
- You can take a backup or snapshot of either the instance or the database.
- Backups are stored in Amazon S3, i.e. outside the DB instance's AZ.
- In Multi-AZ, the backup is taken from the standby instance.
- The DB instance must be in the Active state for automated backups to run.
- RDS automatically backs up the DB instance daily, retains the backup for 7 days by default, and then deletes it. The retention period can be increased to 35 days.
- There is no additional charge for backing up a DB instance, but you pay for the backup storage.
- Automated backups are deleted when the RDS DB instance is deleted.
- An outage occurs if you change the backup retention period from zero to a non-zero value, or the other way round.
- The default retention period is 7 days when set through the console, or 1 day when set through the CLI/API.
- If backups are not required, set the retention period to zero.
- For replication to operate effectively, each read replica should have the same amount of compute and storage resources as the source DB instance.
RDS Encryption, Manual Backup and Billing:
- With a manual snapshot, point-in-time recovery is not possible.
- Manual snapshots are also stored in S3.
- Manual snapshots are not deleted automatically when the RDS instance is deleted.
- You can take a final snapshot before deleting an RDS instance.
- You can share a manual snapshot directly with other AWS accounts.
- When you restore a DB instance, it launches with the default DB parameter group.
- You cannot restore a DB snapshot into an existing DB instance; a new DB instance is created.
- Restoring from a backup or a DB snapshot changes the RDS endpoint. At the time of restoring, you can change the storage type (General Purpose or Provisioned IOPS).
You cannot encrypt an existing unencrypted database in place. To do that you need to:
- Create a new encrypted instance and migrate the data to it, or
- Copy a snapshot of the database with encryption enabled and restore it into a new, encrypted RDS instance.
RDS supports encryption at rest for all DB engines.
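Assuming boto3, the snapshot-copy route to encryption could be sketched as follows (the snapshot, key and instance identifiers are placeholders):

```python
def encrypt_via_snapshot(source_snapshot, kms_key_id):
    """Copy an unencrypted snapshot with a KMS key, then restore it
    into a new (encrypted) instance. All identifiers are placeholders."""
    import boto3  # deferred import; requires AWS credentials to actually run
    rds = boto3.client("rds")
    rds.copy_db_snapshot(
        SourceDBSnapshotIdentifier=source_snapshot,
        TargetDBSnapshotIdentifier=source_snapshot + "-encrypted",
        KmsKeyId=kms_key_id,  # encryption is applied during the copy
    )
    # Restoring the encrypted copy always creates a NEW instance,
    # consistent with the restore rules listed above.
    return rds.restore_db_instance_from_db_snapshot(
        DBInstanceIdentifier="mydb-encrypted",
        DBSnapshotIdentifier=source_snapshot + "-encrypted",
    )
```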
What data is encrypted with encryption at rest:
1. All snapshots
2. Backups of the DB (S3 storage)
3. Data on the EBS volumes
4. Read replicas created from snapshots
Billing of the database:
- No upfront cost.
- You pay only for:
1. DB instance size (charged per hour)
2. Storage on the EBS volume (charged monthly)
3. Data transferred over the Internet (charged per GB)
4. Backup storage, i.e. S3 storage charges, depending on the retention period
Multi-AZ billing (payment is roughly doubled):
- Multi-AZ DB instance
- Provisioned storage
- Double write I/O
- You are not charged for data transfer during replication from primary to standby.
The best options to enhance RDS performance are Read Replicas plus larger instance types.
Amazon RDS Read Replicas provide enhanced performance and durability for RDS database (DB) instances. They make it easy to elastically scale out beyond the capacity constraints of a single DB instance for read-heavy database workloads. You can create one or more replicas of a given source DB Instance and serve high-volume application read traffic from multiple copies of your data, thereby increasing aggregate read throughput. Read replicas can also be promoted when needed to become standalone DB instances. Read replicas are available in Amazon RDS for MySQL, MariaDB, PostgreSQL, Oracle, and SQL Server as well as Amazon Aurora.
Aurora Auto Scaling enables your Aurora DB cluster to handle sudden increases in connectivity or workload. When the connectivity or workload decreases, Aurora Auto Scaling removes unnecessary Aurora Replicas so that you don’t pay for unused provisioned DB instances. You define and apply a scaling policy to an Aurora DB cluster. The scaling policy defines the minimum and maximum number of Aurora Replicas that Aurora Auto Scaling can manage. Based on the policy, Aurora Auto Scaling adjusts the number of Aurora Replicas up or down in response to actual workloads, determined by using Amazon CloudWatch metrics and target values.
Amazon EC2 provides a wide selection of instance types optimized to fit different use cases. Instance types comprise varying combinations of CPU, memory, storage, and networking capacity and give you the flexibility to choose the appropriate mix of resources for your applications. Each instance type includes one or more instance sizes, allowing you to scale your resources to the requirements of your target workload.
DynamoDB handles unstructured data. DynamoDB is highly available, as three copies of the data are maintained across separate facilities.
A DynamoDB table holds items, and each item contains attributes. There must be at least one primary key attribute with a unique value.
A table is a collection of data items; DynamoDB stores all of its data in tables.
Items: Each table contains multiple items. An item is a group of attributes that is uniquely identifiable among all of the other items.
An item consists of a primary (or composite) key and a flexible number of attributes.
Items in DynamoDB are similar to rows in a relational table.
Each item consists of one or more attributes.
An attribute consists of the attribute name and a value or set of values.
An attribute is a fundamental data element, something that does not need to be broken down any further.
The aggregate size of an item cannot be more than 400 KB.
DynamoDB allows low-latency read/write access to items ranging from 1 byte to 400 KB.
DynamoDB can be used to store pointers to objects in S3 when item sizes larger than 400 KB are needed.
DynamoDB stores data indexed by a primary key; you specify the primary key when you create the table.
Each item in the table has a unique identifier, or primary key, that distinguishes it from all of the other items in the table.
The primary key is the only attribute required for items in the table.
DynamoDB tables are schemaless.
Each item can have its own distinct attributes.
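Assuming boto3, the schemaless-items point can be sketched like this (the table name and attributes are invented; only the primary key is shared between the items):

```python
def put_schemaless_items(table_name="Users"):
    """Write two items with different attribute sets to one table;
    only the primary key ('user_id' here) is required on every item."""
    import boto3  # deferred import; requires AWS credentials to actually run
    table = boto3.resource("dynamodb").Table(table_name)
    table.put_item(Item={"user_id": "u1", "name": "Asha", "age": 30})
    table.put_item(Item={"user_id": "u2", "email": "b@example.com"})  # no 'name' or 'age'
```

In a relational table the second item would need NULL columns; in DynamoDB the attributes simply differ per item.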
DynamoDB Read Capacity Units (RCU):
1 RCU = one strongly consistent read of up to 4 KB per second (the read waits until the last update is applied, so there can be a slight delay).
1 RCU = two eventually consistent reads of up to 4 KB each per second (8 KB/sec); whatever data is available is returned instantly, so it is faster and cheaper compared to strongly consistent reads.
One read capacity unit therefore represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
If you need to read an item that is larger than 4 KB, DynamoDB will need to consume additional read capacity units.
The total number of read capacity units depends on the item size and on whether you want strongly consistent or eventually consistent reads.
DynamoDB Write Capacity Units (WCU):
One write capacity unit represents one write per second for an item up to 1 KB in size.
If you need to write an item that is larger than 1 KB, DynamoDB will need to consume additional write capacity units.
The total number of write capacity units depends on the item size.
1 WCU = one 1 KB write per second.
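The RCU/WCU arithmetic above can be captured in a small helper (a sketch of the billing rules as described here, not an official AWS utility):

```python
import math

def rcus(item_kb, strongly_consistent=True):
    """Read capacity units for one read per second of an item of item_kb KB."""
    chunks = math.ceil(item_kb / 4)  # reads are billed in 4 KB chunks
    # Eventually consistent reads cost half as much.
    return chunks if strongly_consistent else math.ceil(chunks / 2)

def wcus(item_kb):
    """Write capacity units: writes are billed in 1 KB chunks."""
    return math.ceil(item_kb / 1)

print(rcus(8))         # 2  (strongly consistent read of an 8 KB item)
print(rcus(8, False))  # 1  (eventually consistent read costs half)
print(wcus(3))         # 3  (3 KB write)
```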
DynamoDB is cost-effective for applications where reads dominate writes, since reads are cheaper than writes.
Data is stored across 3 geographically separate data centres within the same Region.
- Reads are cheaper than writes when using DynamoDB.
- You pay for each table's provisioned read/write throughput, charged hourly.
- You are charged for provisioned throughput regardless of whether you use it or not.
- Indexed data storage is charged (for large items, store a pointer such as an S3 URL instead).
- Internet data transfer out of the region is charged.
- Free tier: 25 read and 25 write capacity units per month; above that is chargeable.
- 256 tables per account per region (default limit).
- No limit on the size of any table.
- DynamoDB can serve up to 10,000 write capacity units and 10,000 read capacity units per second per table.
- Amazon Aurora is a MySQL- and PostgreSQL-compatible relational database built for the cloud: simple and cost-effective.
- Up to 5 times faster than the standard MySQL database and 3 times faster than the standard PostgreSQL database.
- Amazon Aurora is AWS proprietary.
- Provides security, availability and reliability at 1/10th of the cost of commercial databases.
- Amazon Aurora is fully managed by Amazon RDS, which automates time-consuming administration tasks like hardware provisioning, database setup, patching and backups.
- Fault-tolerant, self-healing storage system that can scale up to 128 TB per DB instance.
- High performance and availability: up to 15 low-latency read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across three Availability Zones.
- Storage scales in 10 GB increments.
- Automatic failover is available to Aurora Replicas only.
- Amazon Aurora pricing: pay as you go with no upfront cost.
- Amazon Aurora use cases: enterprise applications, SaaS applications, web and mobile gaming.
- Also refer to: ElastiCache notes on Memcached, Redis and exam tips.
To balance requests over multiple Aurora read replicas, use the Aurora reader endpoint.
Explanation-A reader endpoint for an Aurora DB cluster provides load-balancing support for read-only connections to the DB cluster. Use the reader endpoint for read operations, such as queries. By processing those statements on the read-only Aurora Replicas, this endpoint reduces the overhead on the primary instance. It also helps the cluster to scale the capacity to handle simultaneous SELECT queries, proportional to the number of Aurora Replicas in the cluster. Each Aurora DB cluster has one reader endpoint.
If the cluster contains one or more Aurora Replicas, the reader endpoint load-balances each connection request among the Aurora Replicas. In that case, you can only perform read-only statements such as SELECT in that session. If the cluster only contains a primary instance and no Aurora Replicas, the reader endpoint connects to the primary instance. In that case, you can perform write operations through the endpoint.
Amazon Aurora Global database:Amazon Aurora Global Database is designed for globally distributed applications, allowing a single Amazon Aurora database to span multiple AWS regions. It replicates your data with no impact on database performance, enables fast local reads with low latency in each region, and provides disaster recovery from region-wide outages.
As an alternative to cross-Region read replicas, you can scale read operations with minimal lag time by using an Aurora global database. An Aurora global database has a primary Aurora DB cluster in one AWS Region and up to five secondary read-only DB clusters in different Regions. Each secondary DB cluster can include up to 16 (rather than 15) Aurora Replicas. Replication from the primary DB cluster to all secondaries is handled by the Aurora storage layer rather than by the database engine, so the lag time for replicating changes is minimal: typically less than 1 second.
Aurora Global Database replicates writes in the primary region with a typical latency of <1 second to secondary regions, for low latency global reads. In disaster recovery situations, you can promote a secondary region to take full read-write responsibilities in under a minute.
If your primary region suffers a performance degradation or outage, you can promote one of the secondary regions to take read/write responsibilities. An Aurora cluster can recover in less than 1 minute even in the event of a complete regional outage. This provides your application with an effective Recovery Point Objective (RPO) of 1 second and a Recovery Time Objective (RTO) of less than 1 minute, providing a strong foundation for a global business continuity plan.
A DynamoDB stream captures a time-ordered sequence of item-level modifications in a DynamoDB table.
Amazon DynamoDB auto scaling uses the AWS Application Auto Scaling service to dynamically adjust provisioned throughput capacity on your behalf, in response to actual traffic patterns. This enables a table or a global secondary index to increase its provisioned read and write capacity to handle sudden increases in traffic, without throttling. When the workload decreases, Application Auto Scaling decreases the throughput so that you don’t pay for unused provisioned capacity.
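Assuming boto3, wiring a table's read capacity into Application Auto Scaling might look like this sketch (the table name, capacity limits and target value are placeholders):

```python
def enable_read_autoscaling(table="MyTable"):
    """Register the table's read capacity with Application Auto Scaling and
    attach a target-tracking policy at 70% utilization."""
    import boto3  # deferred import; requires AWS credentials to actually run
    aas = boto3.client("application-autoscaling")
    resource = f"table/{table}"
    aas.register_scalable_target(
        ServiceNamespace="dynamodb",
        ResourceId=resource,
        ScalableDimension="dynamodb:table:ReadCapacityUnits",
        MinCapacity=5,      # floor so the table is never throttled to zero
        MaxCapacity=500,    # ceiling caps the cost of a traffic spike
    )
    aas.put_scaling_policy(
        PolicyName="read-target-tracking",
        ServiceNamespace="dynamodb",
        ResourceId=resource,
        ScalableDimension="dynamodb:table:ReadCapacityUnits",
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": 70.0,  # keep consumed/provisioned RCU near 70%
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "DynamoDBReadCapacityUtilization"
            },
        },
    )
```

The same pattern applies to `WriteCapacityUnits` and to global secondary indexes, as the paragraph above describes.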