- S3 provides object storage, which manages data as objects rather than as files or blocks.
- Durable, highly available, and virtually unlimited data storage infrastructure at very low cost.
- Highly available: objects are stored across multiple devices in at least 3 Availability Zones.
- Persistent: data survives power loss and restarts.
- Unlimited Storage available.
- S3 is a simple key-based object store.
- Objects can range from 0 bytes to 5 TB in size.
- The largest object that can be uploaded in a single PUT is 5 GB.
- For objects larger than 100 MB, multipart upload is recommended.
- Read-after-write consistency for PUTs of new objects.
- Eventual consistency for overwrite PUTs and DELETEs (changes take time to propagate).
- An HTTP 200 status code indicates a successful write to S3.
- An S3 object consists of a key, a value (the data), metadata, a version ID, and an ACL.
- Read and write performance scales by parallelizing requests across key prefixes; there is no limit to the number of prefixes in a bucket.
- For read intensive requests, you can also use CloudFront edge locations to offload from S3.
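The per-prefix scaling above can be sketched with a small helper that spreads keys across hash-derived prefixes. The two-digit prefix scheme and key names here are illustrative assumptions, not an AWS API:

```python
import hashlib

def prefixed_key(key: str, num_prefixes: int = 16) -> str:
    """Map a key to one of `num_prefixes` hash-based prefixes so that
    requests can be parallelized per prefix (scheme is illustrative)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    index = int(digest, 16) % num_prefixes
    return f"{index:02d}/{key}"

print(prefixed_key("logs/2021-01-01.json"))
```

Because the prefix is derived from the key itself, the mapping is stable, and readers can issue requests against all prefixes in parallel.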
- Buckets are flat containers for objects.
- By default, 100 buckets can be created per account (a soft limit that can be raised).
- Unlimited objects can be uploaded in bucket.
- Can create folders in a bucket.
- Cannot create nested buckets.
- Bucket names cannot be changed after creation, and bucket ownership cannot be transferred.
- Bucket names are part of URL– URL is in this format: https://s3-eu-west-1.amazonaws.com/<bucketname>
- Bucket name must be unique.
- S3 buckets are region-specific.
- A bucket can be backed up to another bucket in another account.
- By default, a bucket, its objects, and related sub-resources are all private.
- By default only a resource owner can access a bucket.
- The resource owner refers to the AWS account that creates the resource.
- When you create a bucket or an object, S3 creates a default ACL that grants the resource owner full control over the resource.
- Bucket policies are limited to 20 KB in size
- Object ACLs are limited to 100 granted permissions per ACL
- The only recommended use case for the bucket ACL is to grant write permissions to the S3 Log Delivery group.
- ACL permissions cannot be granted to individual IAM users.
- ACLs cannot explicitly deny access (unlike bucket policies).
- Each Object is stored and retrieved by a unique key.
- An object in S3 is identified by the service endpoint, bucket name, object key, and (optionally) object version.
- Permissions on objects can be set at any time using the AWS Management Console.
S3 Storage classes:
Standard (default): low latency, 99.99 % availability, 11 9s (99.999999999 %) durability. Replicated across at least 3 Availability Zones.
Intelligent-Tiering: monitors access patterns and automatically moves objects to the most cost-effective access tier.
Standard-Infrequent Access (IA): still fast, and about 50 % cheaper than Standard for objects accessed infrequently; a per-GB retrieval fee applies.
One Zone-IA: objects are stored redundantly within a single Availability Zone in the AWS Region you select. Availability is 99.5 %, and it is about 20 % cheaper than Standard-IA. Retrieval fees apply.
Glacier: For long term cold storage. Retrieval of data takes from minutes to hours. Very cheap storage.
Glacier Deep Archive: Lowest cost storage class, Data retrieval time is 12 hours.
For S3 Standard, S3 Standard-IA, and Amazon Glacier storage classes- objects are automatically stored across multiple devices spanning a minimum of three Availability Zones.
With IAM, the account owner, rather than the IAM user, is the owner of the resource.
Within an IAM policy you can grant either programmatic access or AWS Management Console access to Amazon S3 resources.
Amazon Resource Names (ARN) are used for specifying resources in a policy.
The general ARN format for any resource on AWS is: arn:partition:service:region:account-id:resource
The format for S3 resources omits the region and account ID: arn:aws:s3:::bucket_name/key_name
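As a sketch, the two S3 ARN shapes (bucket vs. object) can be produced with a small helper; the bucket and key names are hypothetical:

```python
from typing import Optional

def s3_arn(bucket: str, key: Optional[str] = None) -> str:
    """Build an S3 ARN; S3 ARNs omit the region and account-ID fields."""
    base = f"arn:aws:s3:::{bucket}"
    return f"{base}/{key}" if key else base

print(s3_arn("examplebucket"))                    # bucket ARN
print(s3_arn("examplebucket", "photos/cat.jpg"))  # object ARN
```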
A bucket owner can grant cross-account permissions to another AWS account (or users in an account) to upload objects.
The AWS account that uploads the objects owns them.
The bucket owner does not have permissions on objects that other accounts own. However:
- The bucket owner pays the charges.
- The bucket owner can deny access to any objects regardless of ownership.
- The bucket owner can archive any objects or restore archived objects regardless of ownership.
Authenticated Users group: granting access to this group allows any AWS account to access the resource. All requests must be signed (authenticated).
All Users group:
- Anyone in the world can access the resource.
- Requests can be signed (authenticated) or unsigned (anonymous).
- Unsigned requests omit the authentication header in the request.
Log Delivery group: granting WRITE permission to this group on a bucket enables S3 to write server access logs. Not applicable to objects.
| Bucket policy | User policy |
| --- | --- |
| Granting users permissions to a bucket owned by your account | Granting permissions for all Amazon S3 operations |
| Managing object permissions (where the object owner is the same account as the bucket owner) | Managing permissions for users in your account |
| Managing cross-account permissions for all Amazon S3 permissions | Granting object permissions to users within the account |
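A minimal bucket policy of the first kind, granting another account read access to objects, might look like this; the account ID and bucket name are placeholders:

```python
import json

# Account ID and bucket name below are placeholders for illustration.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "CrossAccountRead",
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": ["s3:GetObject"],
        "Resource": "arn:aws:s3:::examplebucket/*",
    }],
}
# Bucket policies are limited to 20 KB, so this is comfortably small.
assert len(json.dumps(bucket_policy)) < 20 * 1024
print(json.dumps(bucket_policy, indent=2))
```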
For an IAM user to access resources in another account the following must be provided:
- Permission from the parent account through a user policy.
- Permission from the resource owner to the IAM user through a bucket policy, or to the parent account through a bucket policy, bucket ACL, or object ACL.
An account that receives permissions from another account cannot delegate those permissions cross-account to a third AWS account.
Charges in S3:
- There is no charge for data transferred between EC2 and S3 in the same region.
- Data transferred into S3 is free of charge.
- Data transferred to other regions is charged.
- Data retrieval fees apply (to S3 Standard-IA and S3 One Zone-IA).
- A per-GB/month storage fee.
- Data transfer out of S3.
- Request charges (PUT, GET, etc.).
- Retrieval requests (S3-IA or Glacier).
Requester Pays:
- The bucket owner pays only for object storage.
- The requester pays for requests (uploads/downloads) and data transfers.
- Can only be enabled at the bucket level.
Multipart upload: recommended for objects of 100 MB or larger; can be used for objects from 5 MB up to 5 TB; must be used for objects larger than 5 GB.
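The multipart limits can be turned into a small part-size planner. The doubling strategy below is an illustrative choice, not an AWS-mandated algorithm; the 10,000-part cap is S3's documented limit:

```python
import math

MIN_PART = 5 * 1024**2    # 5 MB minimum part size (except the last part)
MAX_PART = 5 * 1024**3    # 5 GB maximum part size (the single-PUT limit)
MAX_PARTS = 10_000        # S3 allows at most 10,000 parts per upload

def plan_parts(object_size: int, part_size: int = 100 * 1024**2):
    """Pick a (part_size, part_count) pair for a multipart upload,
    doubling the part size until the object fits in 10,000 parts."""
    part_size = max(part_size, MIN_PART)
    while math.ceil(object_size / part_size) > MAX_PARTS:
        part_size *= 2
    if part_size > MAX_PART:
        raise ValueError("object too large for a single multipart upload")
    return part_size, math.ceil(object_size / part_size)

print(plan_parts(5 * 1024**4))  # plan for a 5 TB object
```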
Transfer Acceleration: Amazon S3 Transfer Acceleration enables fast, easy, and secure transfers of files over long distances between your client and your S3 bucket.
It uses Amazon CloudFront's globally distributed edge locations to accelerate object uploads to S3 over long distances (reducing latency).
You are charged only if there was a benefit in transfer time.
Need to enable transfer acceleration on the S3 bucket.
Cannot be disabled, can only be suspended.
The endpoint is: <bucketname>.s3-accelerate.amazonaws.com
S3 static website hosting: does not support HTTPS/SSL; scales automatically.
Pre-signed URLs: can be used to provide temporary access to a specific object to those who do not have AWS credentials. An expiration date and time must be configured.
Lifecycle management: used to optimize storage costs, adhere to data retention policies, and keep S3 buckets well maintained.
- An object must be in S3 Standard for at least 30 days before it can be transitioned to S3 Standard-IA.
- Objects must be stored in S3 Standard-IA for at least 30 days before a further transition.
- You cannot use a lifecycle policy to move an object from Glacier back to S3 Standard or S3 Standard-IA (instead, restore the archived object and copy it).
- Cannot be used to change a storage class to S3 One Zone-IA.
- Objects smaller than 128 KB will not be transitioned to S3 Standard-IA.
- Can be used in conjunction with versioning or independently.
- Can be applied to current and previous versions.
- Can be applied to specific objects within a bucket: objects with a specific tag or objects with a specific prefix.
- Configured at the bucket level. The following actions can be performed:
- Transition to S3-IA (objects of at least 128 KB, 30 days after the creation date).
- Archive to Glacier (30 days after IA, if applicable).
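A lifecycle configuration implementing those transitions could be sketched as JSON; the rule ID, prefix, and 365-day expiration are illustrative assumptions:

```python
import json

# Rule ID, prefix, and the 365-day expiration are illustrative only.
lifecycle = {
    "Rules": [{
        "ID": "archive-logs",
        "Filter": {"Prefix": "logs/"},
        "Status": "Enabled",
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},  # after 30 days
            {"Days": 60, "StorageClass": "GLACIER"},      # 30 days after IA
        ],
        "Expiration": {"Days": 365},
    }]
}
print(json.dumps(lifecycle, indent=2))
```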
- SSL endpoints can be used to securely upload/download data to S3 over HTTPS (encryption in transit via SSL/TLS).
- Encryption options:
- SSE-S3: server-side encryption with S3-managed keys; uses AES-256.
- SSE-KMS: server-side encryption with AWS KMS keys; KMS key usage is chargeable.
- SSE-C: server-side encryption with customer-provided keys. The client manages the keys and S3 manages the encryption; if the keys are lost, the data cannot be decrypted.
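For SSE-C, the client must send the key and its MD5 digest as request headers on every call; here is a sketch of that client-side bookkeeping, using the header names from the S3 REST API:

```python
import base64
import hashlib
import os

def sse_c_headers(key: bytes) -> dict:
    """Build the request headers SSE-C requires; S3 uses the key to
    encrypt the object but never stores it, so losing it loses the data."""
    if len(key) != 32:  # AES-256 means a 256-bit (32-byte) key
        raise ValueError("SSE-C requires a 256-bit key")
    return {
        "x-amz-server-side-encryption-customer-algorithm": "AES256",
        "x-amz-server-side-encryption-customer-key":
            base64.b64encode(key).decode(),
        "x-amz-server-side-encryption-customer-key-MD5":
            base64.b64encode(hashlib.md5(key).digest()).decode(),
    }

headers = sse_c_headers(os.urandom(32))
```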
Object tags: S3 object tags are key-value pairs applied to S3 objects; they can be created, updated, or deleted at any time during the lifetime of the object. Up to ten tags can be added to each S3 object, using the AWS Management Console, the REST API, the AWS CLI, or the AWS SDKs.
Tags allow you to create Identity and Access Management (IAM) policies, set up S3 Lifecycle policies, and customize storage metrics.
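The tag limits (10 tags per object, with documented key and value length caps) can be checked client-side before a request is made; this validator is an illustrative sketch:

```python
def validate_tags(tags: dict) -> None:
    """Enforce S3 object-tag limits client-side: at most 10 tags,
    keys up to 128 characters, values up to 256 characters."""
    if len(tags) > 10:
        raise ValueError("an object can carry at most 10 tags")
    for key, value in tags.items():
        if len(key) > 128 or len(value) > 256:
            raise ValueError(f"tag {key!r} exceeds S3 length limits")

validate_tags({"project": "phoenix", "classification": "internal"})
```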
S3 Cloudwatch metrics:
CloudWatch Request Metrics will be available in CloudWatch within 15 minutes after they are enabled.
CloudWatch Storage Metrics are enabled by default for all buckets, and reported once per day.
The S3 metrics that can be monitored include:
- S3 requests
- Bucket storage
- Bucket size
- All requests
- HTTP 4XX/5XX errors
Cross Region Replication: Automatically replicates data across AWS Regions.
CRR is configured at the S3 bucket level. You can use either the AWS Management Console, the REST API, the AWS CLI, or the AWS SDKs to enable CRR.
Replication is 1:1 (one source bucket to one destination bucket).
You can configure separate S3 Lifecycle rules on the source and destination buckets.
The replicas will be exact replicas and share the same key names and metadata.
AWS S3 will encrypt data in-transit with SSL.
AWS S3 must have permission to replicate objects.
Bucket owners must have permission to read the object and object ACL.
You can specify a different storage class (by default the source storage class will be used).
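A minimal CRR configuration reflecting the points above might look like this; the role ARN, bucket names, and the Standard-IA override are placeholders:

```python
import json

# Role ARN, bucket names, and the storage-class override are placeholders;
# versioning must be enabled on both buckets for replication to work.
replication = {
    "Role": "arn:aws:iam::111122223333:role/replication-role",
    "Rules": [{
        "ID": "replicate-all",
        "Status": "Enabled",
        "Prefix": "",  # an empty prefix replicates the whole bucket
        "Destination": {
            "Bucket": "arn:aws:s3:::destination-bucket",
            "StorageClass": "STANDARD_IA",  # optional class override
        },
    }],
}
print(json.dumps(replication, indent=2))
```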
Triggers for replication are:
Uploading objects to the source bucket.
DELETE of objects in the source bucket.
Changes to the object, its metadata, or ACL.
What is replicated:
New objects created after enabling replication.
Changes to objects.
Objects encrypted with SSE-S3 using the Amazon S3-managed key.
Object ACL updates.
What isn’t replicated:
Objects that existed before enabling replication (can use the copy API).
Objects created with SSE-C and SSE-KMS.
Objects to which the bucket owner does not have permissions.
Updates to bucket-level sub-resources.
Actions from lifecycle rules are not replicated.
Objects in the source bucket that are replicated from another region are not replicated.
AWS Glacier: requested archival data is copied to S3 One Zone-IA. Following retrieval, you have 24 hours to download your data. You cannot specify Glacier as the storage class at the time you create an object. There is no retrieval SLA. Glacier is designed to sustain the loss of two facilities.
Glacier automatically encrypts data at rest using AES 256 symmetric keys and supports secure transfer of data over SSL.
Glacier does not archive object metadata; you need to maintain a client-side database to track this information.
Archives can be from 1 byte up to 40 TB. Archives of 1 byte to 4 GB can be uploaded in a single operation.
The contents of an archive that has been uploaded cannot be modified.
You can upload data to Glacier using the CLI, SDKs or APIs – you cannot use the AWS Console.
There is no charge for data transfer between EC2 and Glacier in the same region.
There is a charge if you delete data within 90 days.
When you restore, you pay for:
- The Glacier archive.
- The requests.
- The restored data on S3.
AWS Elastic Beanstalk:
Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.
You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same time, you retain full control over the AWS resources powering your application and can access the underlying resources at any time.
There is no additional charge for Elastic Beanstalk – you pay only for the AWS resources needed to store and run your applications.
To increase performance when reading a huge number of files in an S3 bucket:
a – add randomness to key-name prefixes instead of sequential date-based naming (the old method, no longer required)
b – horizontally scale parallel requests to the Amazon S3 service endpoints
Horizontal Scaling and Request Parallelization for High Throughput
Amazon S3 is a very large distributed system. To help you take advantage of its scale, we encourage you to horizontally scale parallel requests to the Amazon S3 service endpoints. In addition to distributing the requests within Amazon S3, this type of scaling approach helps distribute the load over multiple paths through the network.
For high-throughput transfers, Amazon S3 advises using applications that use multiple connections to GET or PUT data in parallel. For example, this is supported by Amazon S3 Transfer Manager in the AWS Java SDK, and most of the other AWS SDKs provide similar constructs. For some applications, you can achieve parallel connections by launching multiple requests concurrently in different application threads, or in different application instances. The best approach to take depends on your application and the structure of the objects that you are accessing.
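The parallel-connection approach above can be sketched with byte-range splitting and a thread pool; `fetch_range` below is a stub standing in for a real ranged HTTP GET (a `Range: bytes=start-end` request):

```python
from concurrent.futures import ThreadPoolExecutor

def byte_ranges(size: int, chunk: int):
    """Split an object of `size` bytes into inclusive (start, end) ranges."""
    return [(i, min(i + chunk, size) - 1) for i in range(0, size, chunk)]

def fetch_range(rng):
    # Stub for a ranged GET: a real client would send
    # "Range: bytes=start-end" and return the response body.
    start, end = rng
    return b"x" * (end - start + 1)

def parallel_get(size: int, chunk: int, workers: int = 8) -> bytes:
    """Fetch all ranges concurrently and reassemble them in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(fetch_range, byte_ranges(size, chunk))
    return b"".join(parts)

data = parallel_get(1_000_000, 128 * 1024)
```

Because `pool.map` yields results in input order, the parts can be concatenated directly without re-sorting.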