Important to know
| Concept | Description |
| --- | --- |
| HA - High Availability | Running the application in 2 different AZs. Goal is to survive a data center loss |
| Static IP vs static DNS | Static IP --> NLB, static DNS --> ALB |
| CloudFormation | Beanstalk and AWS SAM rely on this |
| AWS Lambda | The amount of memory also determines the amount of virtual CPU available to a function |
| Security groups and Network ACLs | Security groups are stateful, so allowing inbound traffic to the necessary ports enables the connection. Network ACLs are stateless, so you must allow both inbound and outbound traffic |
| Auto Scaling groups in a VPC | The EC2 instances are launched in subnets |
| AWS Certificate Manager and IAM | IAM is used as a certificate manager only when you must support HTTPS connections in a Region that is not supported by ACM. IAM securely encrypts your private keys and stores the encrypted version in IAM SSL certificate storage. IAM supports deploying server certificates in all Regions, but you must obtain your certificate from an external provider for use with AWS. You cannot upload an ACM certificate to IAM. Additionally, you cannot manage your certificates from the IAM console |
| Beanstalk and resources in .ebextensions | Any resources created as part of your .ebextensions are part of your Elastic Beanstalk template and will get deleted if the environment is terminated |
| EC2 user data | By default, user data runs only during the boot cycle when you first launch an instance. Can be configured to run on each restart |
| Resource policy | Can grant API access in one AWS account to users in a different AWS account by using Signature Version 4 (SigV4) protocols |
| Which security credential can only be created by the AWS account root user? | CloudFront key pairs; you can have only 2 per account. Used for signed URLs |
| Which solution will guarantee that any upload request without the mandated encryption (SSE-S3) is not processed? | Invoke the PutObject API operation and set the x-amz-server-side-encryption header to AES256. Use an S3 bucket policy to deny permission to upload an object unless the request has this header |
| Which solution will guarantee that any upload request without the mandated encryption (SSE-KMS) is not processed? | Invoke the PutObject API operation and set the x-amz-server-side-encryption header to aws:kms. Use an S3 bucket policy to deny permission to upload an object unless the request has this header |
| Beanstalk custom configuration files location | .ebextensions/mysettings.config |
| CloudFormation exported Output Values | Must have unique names within a single Region |
| Which is the only resource-based policy that the IAM service supports? | Trust policies, which define which principal entities (accounts, users, roles, and federated users) can assume the role |
| Users are unable to see the AWS Billing and Cost Management service in the AWS console | You need to activate IAM user access to the Billing and Cost Management console for all the users who need access |
| SAM template - mandatory sections | Transform and Resources |
| To which CloudFormation template section can conditions NOT be applied? | Parameters |
| At which gp2 volume size will their test environment hit the max IOPS? | ~5.3 TiB (gp2 gives 3 IOPS/GiB, capped at 16,000 IOPS) |
| EBS | Volumes are locked to a single AZ |
| Step Functions - required in template | Resource |
| Max IOPS for EBS io1 | The maximum ratio of provisioned IOPS to requested volume size (in GiB) is 50:1. So, for a 200 GiB volume size, max IOPS possible is 200*50 = 10,000 IOPS |
| S3 bucket replication | Same-Region Replication (SRR) and Cross-Region Replication (CRR) can be configured at the S3 bucket level, a shared prefix level, or an object level using S3 object tags. S3 lifecycle actions are not replicated with S3 replication |
| AWS Glue | A serverless data integration service that simplifies the process of discovering, preparing, and combining data for analytics, machine learning, and application development |
| ECS cluster does the updates in maincluster, but the updates were intended for secondcluster. What's missing? | In the ecs.config file you have to configure the parameter ECS_CLUSTER='your_cluster_name' to register the container instance with a cluster named 'your_cluster_name' |
| Immutable deployment type | Only available for Beanstalk, NOT for EC2/ECS |
| ECS - terminated container instance stays in the cluster as a resource | If you terminated the container instance while it was in STOPPED state, that leads to these synchronization issues |
| ALB - you need to capture the client's IP address | Use the header X-Forwarded-For |
| aws ec2 monitor-instances --instance-ids i-1234567890abcdef0 | Enables detailed monitoring for a running instance |
| This S3 bucket uses server-side encryption with AWS KMS managed keys (SSE-KMS) as the default encryption. Using the access key ID and the secret access key of the IAM user, the application received an access denied error when calling the PutObject API | Correct the policy of the IAM user to allow the kms:GenerateDataKey action |
| S3 - Query String Authentication | Customers can create a URL to an Amazon S3 object which is only valid for a limited time. Using query parameters to authenticate requests is useful when you want to express a request entirely in a URL. This method is also referred to as presigning a URL |
| You are not authorized to perform this operation. Encoded authorization failure message: 6h34GtpmGjJJUm946eDVBfzWQJk6z5GePbbGDs9Z2T8xZj9EZtEduSnTbmrR7pMqpJrVYJCew2m8YBZQf4HRWEtrpncANrZMsnzk. Which of the following actions will help developers decode the message? | aws sts decode-authorization-message |
| Sizes to know | Lambda environment variables: max 4 KB. SQS message: max 256 KB. KDS: max throughput 10 MB/second. KMS Encrypt payload: max 4 KB |
| Correct CloudFormation commands | aws cloudformation package and aws cloudformation deploy |
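The SSE deny-policy answers in the table above can be made concrete. A minimal sketch of such a bucket policy, built as a plain Python dict (the bucket name is a made-up placeholder; for SSE-KMS the expected header value would be `aws:kms` instead of `AES256`):

```python
import json

BUCKET = "my-example-bucket"  # hypothetical bucket name

# Deny any PutObject request whose x-amz-server-side-encryption header
# is missing or not AES256 (SSE-S3 enforcement).
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyUnencryptedUploads",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:PutObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            "Condition": {
                "StringNotEquals": {
                    "s3:x-amz-server-side-encryption": "AES256"
                }
            },
        }
    ],
}

print(json.dumps(policy, indent=2))
```

Because the statement is a Deny with `StringNotEquals`, any upload that omits the header is rejected even if another policy allows `s3:PutObject`.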
IAM - Identity and Access Management
IAM - users and groups
- multiple users; a user can belong to multiple groups
IAM - cross-account permissions
Account A has a service that account B would like to access:
To grant cross-account permissions, you need to attach an identity-based permissions policy to an IAM role. For example, the AWS account A administrator can create a role to grant cross-account permissions to AWS account B as follows:
- The account A administrator creates an IAM role and attaches a permissions policy—that grants permissions on resources in account A—to the role.
- The account A administrator attaches a trust policy to the role that identifies account B as the principal who can assume the role.
- The account B administrator delegates the permission to assume the role to any users in account B. This allows users in account B to create or access queues in account A.
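The steps above boil down to two JSON documents attached to the role in account A. A sketch with hypothetical account IDs (111111111111 for A, 222222222222 for B) and an SQS queue as the shared resource:

```python
# Permissions policy: attached to the role in account A, grants access
# to a resource in account A (hypothetical queue ARN).
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["sqs:SendMessage", "sqs:ReceiveMessage"],
        "Resource": "arn:aws:sqs:us-east-1:111111111111:shared-queue",
    }],
}

# Trust policy: identifies account B as the principal that may assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::222222222222:root"},
        "Action": "sts:AssumeRole",
    }],
}
```

The account B administrator then delegates `sts:AssumeRole` on this role's ARN to the users in account B.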
IAM - access keys
- Access keys are generated through the AWS console (access key ID & secret access key).
- Users can generate them: IAM —> user —> security credentials
IAM - resource policy
- can grant API access in one AWS account to users in a different AWS account by using Signature Version 4 (SigV4) protocols.
IAM - Security Groups
- control how traffic is allowed into or out of EC2.
- Can only contain ALLOW rules. Can reference by IP or by other security groups.
- All inbound traffic is blocked and all outbound traffic is allowed by default.
If a timeout occurs: SG issue! If connection refused: application problem!
- can be attached to multiple EC2 instances
- locked down to a region/VPC
Regulates:
- access to ports
- authorised IP ranges
- inbound and outbound network traffic
IAM - Policies
- A policy is an object in AWS that, when associated with an identity or resource, defines their permissions.
- can be attached to users, groups or resources/AWS services
- used for managing permissions, can be applied to users (inline), groups, roles
```mermaid
classDiagram
    Policy : **Version**, 2012-10-17
    Policy : **Id**
    Statement : **Sid**, Opt.
    Statement : **Effect**, Allows or denies
    Statement : **Principal**, required in some instances, account/user/role
    Statement : **Action**, list of actions
    Statement : **Resource**, to which the action applies
    Statement : **Condition**, Opt., when this policy is in effect
    Policy --> Statement
```
```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "FirstStatement",
      "Effect": "Allow",
      "Action": ["iam:ChangePassword"],
      "Resource": "*"
    },
    {
      "Sid": "SecondStatement",
      "Effect": "Allow",
      "Action": "s3:ListAllMyBuckets",
      "Resource": "*"
    },
    {
      "Sid": "ThirdStatement",
      "Effect": "Allow",
      "Action": [
        "s3:List*",
        "s3:Get*"
      ],
      "Resource": [
        "arn:aws:s3:::amzn-s3-demo-bucket-confidential-data",
        "arn:aws:s3:::amzn-s3-demo-bucket-confidential-data/*"
      ],
      "Condition": {"Bool": {"aws:MultiFactorAuthPresent": "true"}}
    }
  ]
}
```
source: https://docs.aws.amazon.com/IAM/latest/UserGuide/access_policies.html
IAM - Roles
- Some AWS services need to perform actions on your behalf. To do so, we assign permissions to AWS services using IAM roles.
IAM - Access Analyzer
- helps you identify the resources in your organization and accounts, such as Amazon S3 buckets or IAM roles, that are shared with an external entity. This lets you identify unintended access to your resources and data, which is a security risk.
- USER LVL - shows permissions granted to a user & when those services were last accessed.
- simplifies inspecting unused access to guide you toward least privilege. Security teams can use IAM Access Analyzer to gain visibility into unused access across their AWS organization and automate how they rightsize permissions. When the unused access analyzer is enabled, IAM Access Analyzer continuously analyzes your accounts to identify unused access and creates a centralized dashboard with findings. The findings highlight unused roles, unused access keys for IAM users, and unused passwords for IAM users. For active IAM roles and users, the findings provide visibility into unused services and actions.
IAM - credentials report
- ACCOUNT LVL - a report that lists all your account's users & the status of their various credentials
IAM - access advisor
- USER LVL - shows the service permissions granted to a user and when those services were last accessed, so you can tighten policies toward least privilege.
IAM - shared responsibility model
- AWS is responsible for the infrastructure. The user is responsible for how the infra is used (enable MFA, rotate keys, ...)
EC2 - Elastic Compute Cloud
- Infrastructure as a Service
- Network
- Computing
- Memory
- Storage
!If you stop and start your EC2 instance, its public IP address will change (a reboot keeps it)!
m5.2xlarge
- (m) - instance class
- (5) - generation
- (2xlarge) - size within instance class
EC2 - User Data
- Runs only during boot cycle when you first launch an instance
- can also be enabled to be run on every restart
- bootstraps instance using EC2 user data script
- Runs with root user
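A sketch of the "run on every restart" variant mentioned above: cloud-init normally runs user data scripts only once, but a MIME multi-part user data with the `[scripts-user, always]` cloud-config directive makes the shell part run on every boot (assumes an AMI with cloud-init, e.g. Amazon Linux; the log path is a made-up example):

```text
Content-Type: multipart/mixed; boundary="//"
MIME-Version: 1.0

--//
Content-Type: text/cloud-config; charset="us-ascii"

#cloud-config
cloud_final_modules:
- [scripts-user, always]

--//
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
echo "ran at $(date)" >> /tmp/every-boot.log
--//--
```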
EC2 - on stop/start
- the EC2 public IP address will change
EC2 - burstable instances
- Traditional Amazon EC2 instance types provide fixed CPU resources, while burstable performance instances provide a baseline level of CPU utilization with the ability to burst CPU utilization above the baseline level.
- Some deployment policies replace all instances during the deployment or update. This causes all accumulated Amazon EC2 burst balances to be lost. It happens in the following cases:
  - Managed platform updates with instance replacement enabled
  - Immutable updates
  - Deployments with immutable updates or traffic splitting enabled
EC2 - AMI - Amazon Machine Image
- customization of an EC2 instance.
- Add your own software, config, OS, …
- faster boot/config time because all your software is prepackaged.
- Built for a specific Region; can be copied across Regions
types:
- public: AWS-owned
- own: you maintain it yourself
- AWS Marketplace: someone else made it and sells it
EC2 - Instance types
- General purpose - balance between everything
- Compute optimized - great for compute-intensive tasks. E.g. batch processing, HPC - High Performance Computing
- Memory optimized - fast performance for workloads that process large data sets. E.g. RDBMS, cache stores
- Storage optimized - great for storage-intensive tasks. E.g. OLTP
EC2 - purchase options
- On-demand instances —> short workloads, predictable pricing. Pay by the hour/second. Highest cost; good for short-term & uninterrupted workloads
- Reserved (1 & 3 years):
  - reserved instances - long workloads.
  - convertible reserved instances - long workloads with flexible instance types.
- Savings plan (1 & 3 years) - commitment to an amount of usage, long workloads
- Spot instances - short workloads, cheap, can lose the instance (less reliable)
- Capacity reservations - reserve capacity in a specific AZ, for any duration. Guarantees availability for a specific instance type and Availability Zone, but doesn't guarantee dedicated hardware.
- Dedicated instances - no other customer shares your hardware, but the hardware is not locked to you: if you stop/start the instance it may move to a different physical server (possibly older, newer, or with slightly different specs). The hardware is "yours" only while the instance is running.
- Dedicated hosts - book an entire physical server and control instance placement. The physical server is yours and stays the same machine for as long as you are paying.
EC2 - storage
EC2 - EBS - Elastic Block Store
- network drive attached to your instance
- uses the network to communicate with the instance —> latency!
- allows the instance to persist data, even after termination
- can only be mounted to one instance at a time
- bound to a specific AZ; can be moved to another AZ via a snapshot
- can be quickly detached & reattached (to some other instance)
- has provisioned capacity that you pay for
EC2 - EBS - DeleteOnTermination
- controls whether the volume is deleted when the instance terminates. Enabled by default for the root volume, disabled for additional volumes.
EC2 - EBS - volume types
- gp2/gp3 (SSD) - general purpose -> application storage
- io1/io2 (SSD) - highest-performance SSD volumes -> for mission-critical stuff, e.g. RDBMS
- st1 (HDD) - low-cost HDD for frequently accessed workloads. E.g. big data
- sc1 (HDD) - lowest-cost HDD for less frequently accessed workloads. E.g. cold storage
only gp and io can be used as boot volumes
io supports EBS multi-attach
- attach a single io volume to multiple EC2 instances in the same AZ.
- each instance has full RW access to the volume.
- MAX 16 EC2 instances at a time.
- must use a file system that is cluster-aware
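The gp2 and io1 numbers quoted in these notes (16,000 IOPS cap at ~5.3 TiB, 50:1 ratio) follow from simple arithmetic. A quick check, assuming the commonly documented limits of 3 IOPS/GiB for gp2 and 50 IOPS/GiB for io1:

```python
def gp2_baseline_iops(size_gib: int) -> int:
    # gp2 baseline is 3 IOPS/GiB, with a floor of 100 and a cap of 16,000
    return min(max(size_gib * 3, 100), 16_000)

def io1_max_iops(size_gib: int) -> int:
    # io1 allows provisioning up to 50 IOPS per GiB (hard cap 64,000)
    return min(size_gib * 50, 64_000)

# gp2 hits its 16,000 IOPS cap around 16000/3 ≈ 5,334 GiB ≈ 5.3 TiB
print(gp2_baseline_iops(5_334))   # 16000
# a 200 GiB io1 volume can be provisioned with at most 200 * 50 = 10,000 IOPS
print(io1_max_iops(200))          # 10000
```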
EC2 - EBS - Snapshots
- make a backup (snapshot) of your EBS volume at a point in time
- best practice to detach the volume first, but not required
- can copy snapshots across AZs/Regions
EC2 - EBS - Snapshot archive
- moves a snapshot to an archive tier that is cheaper but takes time to restore (24 to 72 hours).
EC2 - EBS - recycle bin for EBS snapshots
- set up rules to retain deleted snapshots so you can recover them.
EC2 - EBS - Fast snapshot restore
- force full initialization of the snapshot so there is no latency on first use. $$$
EC2 - Instance store
- if you need a high-performance hardware disk. Better I/O performance
- is ephemeral: EC2 instances lose this storage when they are stopped.
- good for buffer/cache/scratch data/temp content.
- risk of data loss if hardware fails.
- backups and replication are your responsibility.
EC2 - EFS - Elastic File System
- managed NFS (network file system)
- can be mounted on many EC2 instances
- works with EC2 in multi-AZ; can also be set to single AZ (less $)
- HA, scalable, expensive, pay per use.
- compatible with Linux-based AMIs only
- enable encryption at rest using KMS
- FS scales automatically, pay-per-use, no capacity planning.
EC2 - EFS - modes
- Performance modes
- Throughput modes
  - Bursting: looks at the size of data stored and scales accordingly.
  - Provisioned: set throughput regardless of storage size.
  - Elastic: auto scales up or down based on workload.
EC2 - EFS - storage classes
- standard: for frequent access.
- infrequent access (IA): cost to retrieve files $, lower cost to store $$
- archive: cost to retrieve $$$, cost to store $
**make use of lifecycle policies**
The Standard-IA storage class reduces storage costs for files that are not accessed every day. It does this without sacrificing the high availability, high durability, elasticity, and POSIX file system access that Amazon EFS provides. AWS recommends Standard-IA storage if you need your full dataset to be readily accessible and want to automatically save on storage costs for files that are less frequently accessed.
EC2 - instance metadata (IMDS)
- allows EC2 instances "to learn about themselves" without using an IAM role
- http://169.254.169.254/latest/meta-data/
- retrieve:
  - IAM role name, but not the policy
  - metadata: info about the EC2 instance
  - user data: launch script of the EC2 instance
- V2 is more secure: you need to get a session token first before accessing the API
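The IMDSv2 flow mentioned above, as a command sketch (these only work from inside an EC2 instance; the 21600-second TTL is an arbitrary choice up to the 6-hour maximum):

```shell
# Step 1: obtain a session token
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")

# Step 2: pass the token with every metadata request
curl -s -H "X-aws-ec2-metadata-token: $TOKEN" \
  http://169.254.169.254/latest/meta-data/instance-id
```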
EC2 - ASG - Auto Scaling Groups
- scale out/in to match increased/decreased load.
- ensure a min and max number of instances registered to a LB.
- re-create EC2 instances in case a previous one died.
ASG - scaling policies
- Dynamic scaling
  - target tracking scaling
    e.g. avg CPU needs to stay around 40%
  - simple/step scaling - depending on a CloudWatch trigger. If triggered, add 2 units.
- Scheduled scaling
  - anticipate scaling based on known usage patterns
- Predictive scaling
  - forecast load and schedule scaling ahead
Good metrics
- CPU
- request count/target
- AVG network in/out
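The target tracking example above (keep average CPU around 40%) is expressed as a configuration document. A sketch of the shape passed to the Auto Scaling `PutScalingPolicy` API; values are illustrative:

```python
# Target-tracking configuration: the ASG adds/removes instances to keep
# the predefined metric near TargetValue.
target_tracking_config = {
    "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ASGAverageCPUUtilization"
    },
    "TargetValue": 40.0,       # keep average CPU around 40%
    "DisableScaleIn": False,   # allow the ASG to scale in as well
}
```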
ASG - scaling cooldown
- after a scaling activity, there is a cooldown period (default 300 secs — 5 min)
- during this period the ASG will not launch or terminate instances
- to allow the metrics to stabilize —> use a ready-to-use AMI to reduce configuration time
ASG - instance refresh
- goal: update the launch template and then start an instance refresh
- rolling update to the new version
- specify warm-up times
ELB
ELB - Elastic Load Balancer
- managed by AWS
- integrated with many AWS offerings/services
- does health checks on EC2 instances & can reroute traffic.
- can be set up as internal or external.
- can be multi-AZ
- Classic ELB —> deprecated!!
- separates public traffic from private traffic
- if the LB has no registered targets —> 503 Service Unavailable
ELB - Overview
- Application LB - HTTP(S), WebSockets - L7
- Network LB - TCP, TLS, UDP - L4
- Gateway LB - IP protocol - L3
ELB - ALB - Application LB
- LB to multiple HTTP applications across machines —> target groups!
- LB to multiple containers on the same machine
- support for HTTP/2 and WebSockets.
- supports redirects HTTP —> HTTPS
- routes to different target groups (based on path, query string params, source IP via X-Forwarded-For)
ELB - ALB - Access logs
- provides access logs that capture detailed information about requests sent to your load balancer.
- Each log contains information such as the time the request was received, the client’s IP address, latencies, request paths, and server responses.
- You can use these access logs to analyze traffic patterns and troubleshoot issues.
- Access logging is an optional feature of Elastic Load Balancing that is disabled by default.
ELB - NLB - Network LB
- forwards TCP and UDP traffic
- handles millions of requests — highly scalable
- ultra-low latency
- supports one static IP per AZ and supports assigning Elastic IPs.
- health checks support TCP, HTTP and HTTPS
- Static IP —> NLB —> ALB (so you have all the goodies of the ALB)
ELB - Target Groups
- targets are EC2 instances or IP addresses
- IP addresses must be private IPs
Check that the SG allows traffic from the LB
ELB - Gateway LB
- deploy, scale and manage a fleet of 3rd-party virtual network appliances in AWS
  e.g. firewalls, intrusion detection, deep packet inspection
- uses the GENEVE protocol on port 6081
- combines the following functions:
  - Transparent network gateway: single entry/exit for all traffic
  - Load balancer: distributes traffic
```mermaid
graph LR
    U[User] --1--> GLB
    GLB --4--> U
    GLB --3--> Application
    GLB --2 check if traffic is allowed--> 3rd[3rd party security appliances]
```
ELB - Sticky sessions
- same client is always connected to the same instance
- uses a cookie with an expiration date.
- may bring imbalance
ELB - Cross-zone LB
- distributes traffic evenly across all instances in all AZs
- ALB: on by default, no charge
- NLB and GLB: off by default, charges apply when enabled
```mermaid
graph TD
    LBR["LBR - With CrossZone"] --50--> LBA
    LBR --50--> LBB
    subgraph AZ A
    LBA --10--> Apl1
    LBA --10--> Apl2
    end
    subgraph AZ B
    LBB --10--> Apl3
    LBB --10--> Apl4
    LBB --10--> Apl5
    LBB --10--> Apl6
    LBB --10--> Apl7
    LBB --10--> Apl8
    LBB --10--> Apl9
    LBB --10--> Apl10
    end
```
```mermaid
graph TD
    LBR["LBR - without CrossZone"] --50--> LBA
    LBR --50--> LBB
    subgraph AZ A
    LBA --25--> Apl1
    LBA --25--> Apl2
    end
    subgraph AZ B
    LBB --6.25--> Apl3
    LBB --6.25--> Apl4
    LBB --6.25--> Apl5
    LBB --6.25--> Apl6
    LBB --6.25--> Apl7
    LBB --6.25--> Apl8
    LBB --6.25--> Apl9
    LBB --6.25--> Apl10
    end
```
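The per-target percentages in the two diagrams above can be derived directly:

```python
# 2 AZs: AZ A has 2 targets, AZ B has 8. Each AZ's LB node receives 50%
# of the total traffic from DNS.
targets_a, targets_b = 2, 8

# With cross-zone load balancing: traffic spreads evenly over ALL 10 targets.
with_cross_zone = 100 / (targets_a + targets_b)  # 10.0 % per target

# Without cross-zone: each LB node splits its 50% among its own AZ's targets.
without_a = 50 / targets_a                       # 25.0 % per target in AZ A
without_b = 50 / targets_b                       # 6.25 % per target in AZ B

print(with_cross_zone, without_a, without_b)     # 10.0 25.0 6.25
```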
ELB - Connection draining
- time to complete "in-flight requests" while the instance is de-registering or unhealthy
- stops sending new requests to the EC2 instance
- between 1 and 3600 seconds — default is 300 seconds (5 minutes)
- can be disabled
RDS - Relational Database Service
- managed database service
- has auto provisioning, OS patching, continuous backups and restore to specific timestamps.
- monitoring dashboards and read replicas.
- multi-AZ
- maintenance windows for upgrades
- scaling capabilities (vertical and horizontal)
- flavours: PostgreSQL, MySQL, MariaDB, Oracle, Microsoft SQL Server, Aurora (AWS specific)
No SSH except on RDS Custom
RDS - storage autoscaling
- increases storage dynamically
- when almost out of storage, it auto-scales
- set a maximum threshold
Auto-modifies if:
- free storage is less than 10%
- low storage lasts 5 minutes
- and 6 hours have passed since the last modification
RDS - Read replicas
- up to 15 RRs
- within AZ, cross-AZ or cross-Region
- can be promoted to its own DB
- the application must update its connection string to leverage read replicas
- no cost within the same region; data-transfer cost across regions.
RDS - multi AZ
- Sync replication
- one DNS name - automatic failover (when AZ is down) to standby (that is in diff AZ)
- increases HA
- Not used for scaling
- RR can also be setup as multi AZ for disaster recovery
- applies OS updates by performing maintenance on the standby, then promoting the standby to primary, and finally performing maintenance on the old primary, which becomes the new standby
```mermaid
graph TD
    subgraph AZ A
    MDB[Master DB]
    end
    subgraph AZ B
    SDB[Standby DB]
    end
    APL --> DNS
    DNS <--> MDB
    DNS <--> SDB
    MDB --Sync--> SDB
```
How to go from single-AZ to multi-AZ?
- zero-downtime operation
- modify the DB
- a snapshot is taken
- a new DB is restored from the snapshot in a new AZ
- sync is established between the DBs
RDS - Aurora
- self-healing + replication + auto-expanding + autoscaling
- isolation and security; backtrack restores data to a point in time (without using a backup)
- cloud optimized
- storage auto-grows in increments of 10 GB, max 128 TB
- 15 RRs, and replication is faster than MySQL
- failover is instant and it's natively HA
- has support for cross-region replication
- auto-patching with zero downtime
- audit logs can be enabled and optionally sent to CW Logs for longer retention
Has 6 copies across 3 AZs
- 4/6 needed for writes.
- 3/6 needed for reads.
- self-healing with P2P replication.
- storage striped across 100s of volumes.
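The 4/6 write and 3/6 read quorums above can be sanity-checked with a few lines:

```python
# Aurora quorum arithmetic: 6 storage copies across 3 AZs,
# writes need 4/6 available, reads need 3/6.
COPIES, WRITE_QUORUM, READ_QUORUM = 6, 4, 3

def can_write(available_copies: int) -> bool:
    return available_copies >= WRITE_QUORUM

def can_read(available_copies: int) -> bool:
    return available_copies >= READ_QUORUM

# Losing one full AZ (2 copies) leaves 4: both reads and writes survive.
print(can_write(COPIES - 2), can_read(COPIES - 2))  # True True
# Losing 3 copies blocks writes, but reads still work.
print(can_write(COPIES - 3), can_read(COPIES - 3))  # False True
```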
RDS - Aurora - endpoints
- Writer endpoint
  - always points to the master
- Reader endpoint
  - load-balanced, always connects to the RRs.
RDS and Aurora - security
No SSH except on RDS Custom
RDS - proxy
- managed DB proxy
- allows apps to pool and share DB connections —> improves DB efficiency
- serverless, autoscaling, HA (multi-AZ)
- supports all flavors
- no code changes required
- enforce IAM authentication for the DB and securely store credentials in AWS Secrets Manager
e.g. use proxies for Lambda functions, which scale up very fast
is never publicly accessible, must be accessed from the VPC
ElastiCache
- RR max 5 in non-cluster mode
- managed Redis or Memcached
- helps make the application stateless
- good for read-heavy and compute-heavy applications
EC - Redis
- multi-AZ with auto-failover
- RR scales reads and HA
- Data durability
- backup and restore feature
EC - Memcached
- multi-node for partitioning of data (sharding)
- NO HA
- non persistent
- backup and restore (serverless)
- multi-threaded architecture
- designed for simplicity; it does not offer support for advanced data structures and operations such as sort or rank, while Redis does.
EC - Cache strategies
- lazy loading and write-through; both can be combined
EC - Cache evictions and TTL
- cache eviction - an item is evicted (deleted) if it has not been used recently (LRU, least recently used).
- cache invalidation - TTL - you set a TTL to auto-delete an item after that time.
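To illustrate the two mechanisms above, here is a toy cache (not an ElastiCache client) combining LRU eviction with lazy TTL invalidation:

```python
import time
from collections import OrderedDict

class TinyCache:
    """Toy cache: evicts the least recently used item when full,
    and lazily invalidates items whose TTL has expired."""

    def __init__(self, max_items: int, ttl_seconds: float):
        self.max_items = max_items
        self.ttl = ttl_seconds
        self._data = OrderedDict()  # key -> (value, expires_at)

    def put(self, key, value):
        if key in self._data:
            self._data.pop(key)
        elif len(self._data) >= self.max_items:
            self._data.popitem(last=False)       # evict least recently used
        self._data[key] = (value, time.monotonic() + self.ttl)

    def get(self, key):
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if time.monotonic() > expires_at:        # TTL expired: invalidate lazily
            del self._data[key]
            return None
        self._data.move_to_end(key)              # mark as recently used
        return value

cache = TinyCache(max_items=2, ttl_seconds=60)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")          # touch "a" so "b" becomes least recently used
cache.put("c", 3)       # evicts "b" (LRU)
print(cache.get("b"))   # None
print(cache.get("a"))   # 1
```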
Amazon MemoryDB for Redis
- Redis-compatible, durable, in-memory database service
- ultra fast
- durable in-memory data storage with a multi-AZ transactional log.
- scales storage
e.g. gaming, web and mobile apps
Amazon Route 53
- HA, scalable, managed domain registrar
- authoritative DNS
- you can update DNS records yourself
- ability to check the health of your resources
- only AWS service which provides a 100% availability SLA
Alias records (for load balancing)
- AWS-specific: have health evaluation and auto-recognize IP address changes
- point a hostname to an AWS resource
- work for the root domain
- only A/AAAA; cannot target an EC2 DNS name
Route 53 - Record types
- A —> IPv4
- AAAA —> IPv6
- CNAME - maps a hostname to another hostname (A or AAAA)
  - only for non-root domains!
  - can't be created for the top node of a DNS namespace!
- NS - name servers for the hosted zone
Route 53 - Hosted Zones
- container for records that define how to route traffic to a domain and its subdomains
- can be public or private hosted zones.
Route 53 - routing policies
- Simple
  - single resource; can respond with multiple A records
  - no health checks
- Weighted
  - control the % of requests that go to each resource
  - health checks: yes; assign weights to each resource (LB-like)
- Latency
  - redirect to the resource with the least latency
  - health checks: yes; has failover capabilities
  - you need to manually specify where your IPs come from for it to work
  - e.g.: IPA is from America, IPB is from Asia
- Failover
  - if the health check fails, route to the secondary
  - health check is mandatory
- Geolocation
  - routing based on user location; you should create a "default" record
  - e.g. Asia users go to IPA
  - health checks: yes
- Geoproximity
  - route traffic based on the geographic location of users and resources
- IP-based
  - provide CIDR ranges —> locations/endpoints
  - health checks: yes
- Multi-value
  - when routing to multiple resources, max 8
  - no substitute for a LB!
  - health checks: yes
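For the weighted policy above, each record receives weight / sum(all weights) of the traffic. A quick illustration with made-up record names:

```python
# Weighted routing: the share per record is weight / sum(all weights).
records = {"instance-a": 70, "instance-b": 20, "instance-c": 10}

total = sum(records.values())
shares = {name: w / total for name, w in records.items()}
print(shares)  # {'instance-a': 0.7, 'instance-b': 0.2, 'instance-c': 0.1}
```

Note the weights don't have to sum to 100; only the ratios matter.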
Route 53 - health check types
- for automated DNS failover
- monitor endpoints
- monitor other health checks (calculated health checks)
  - combine up to 256 child health checks
- monitor CloudWatch alarms - for private VPCs
VPC - Virtual Private Cloud - regional
- private network to deploy your resources (regional resource)
VPC - subnets
- allow you to partition your network inside your VPC
- AZ resource
- can be public or private
VPC - VPC endpoints
- You can create an interface VPC endpoint to connect to services powered by AWS PrivateLink, including many AWS services
VPC - route tables
- define access to the internet and between subnets
VPC - Internet gateway - IGW
- helps connect to the internet
- subnets need a route to the internet gateway
- your services need a public IP address to access the internet!
- works both ways: external services can also reach in
- an Internet Gateway is not Availability Zone specific.
VPC - NAT gateway
- AWS-managed, vs NAT instances (self-managed)
- allows instances in your private subnets to access the internet while remaining private
- works one way: only internal services can reach out to the internet
- each NAT gateway is created in a specific Availability Zone and implemented with redundancy in that zone.
```mermaid
graph TD
    subgraph VPC
    subgraph AZ A
    subgraph public subnet
    NATG["NAT Gateway"]
    end
    subgraph private subnet
    EC2-->NATG
    end
    end
    subgraph AZ B
    subgraph public subnet
    NATGB["NAT Gateway"]
    end
    subgraph private subnet
    EC2B["EC2"]-->NATGB
    end
    end
    IGW["Internet Gateway"]
    NATG --> IGW
    NATGB --> IGW
    end
    PI["Public Internet"]
    PI <--> IGW
```
VPC - Direct Connect
- direct private connection to AWS
VPC - network security
- NACL - Network Access Control List
  - a firewall for subnets
  - can have ALLOW and DENY rules
  - attached at the subnet lvl
  - rules only include IP addresses
  - the default NACL allows all traffic
Network ACLs are stateless, so you must allow both inbound and outbound traffic.
To enable the connection to a service running on an instance, the associated network ACL must allow both inbound traffic on the port that the service is listening on as well as allow outbound traffic from ephemeral ports. When a client connects to a server, a random port from the ephemeral port range (1024-65535) becomes the client’s source port.
The designated ephemeral port then becomes the destination port for return traffic from the service, so outbound traffic from the ephemeral port must be allowed in the network ACL.
By default, network ACLs allow all inbound and outbound traffic. If your network ACL is more restrictive, then you need to explicitly allow traffic from the ephemeral port range.
If you accept traffic from the internet, then you also must establish a route through an internet gateway. If you accept traffic over VPN or AWS Direct Connect, then you must establish a route through a virtual private gateway.
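The ephemeral-port rule above is the part people forget. A toy model of a stateless NACL check (rule format heavily simplified to (from_port, to_port, action) tuples):

```python
# Stateless NACL: inbound and outbound are evaluated independently, so a
# client hitting port 443 needs BOTH an inbound rule for 443 AND an
# outbound rule covering the ephemeral range (1024-65535) for the reply.
inbound_rules = [(443, 443, "allow")]
outbound_rules = [(1024, 65535, "allow")]

def allowed(rules, port):
    # First matching rule wins; no match means implicit deny.
    for lo, hi, action in rules:
        if lo <= port <= hi:
            return action == "allow"
    return False

client_ephemeral_port = 50000  # random source port chosen by the client
ok = allowed(inbound_rules, 443) and allowed(outbound_rules, client_ephemeral_port)
print(ok)  # True
```

Drop the outbound rule and the request arrives but the response never leaves, which is exactly the failure mode a stateful security group would not have.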
VPC - endpoints
- allow you to connect to AWS services using a private network instead of the public internet
- enhances security
VPC - endpoints - types
- interface endpoints
  - an elastic network interface with a private IP address from the IP address range of your subnet, serving as an entry point for traffic destined to a supported service.
- gateway endpoints
  - a gateway that you specify as a target for a route in your route table for traffic destined to a supported AWS service.
  - for gateway endpoints, don't forget to also add the target service to the routing table of the VPC!
  - only supports:
    - S3 (supports both gateway and interface endpoints)
    - DynamoDB
VPC - flow logs
- capture information about IP traffic going to your interfaces (VPC, subnet or ENI level)
VPC - peering
- connect 2 VPCs privately using the AWS network
- they behave as if they are in the same network
- must not have overlapping CIDR blocks (IP ranges)!
- is not transitive
  - A <—> B and A <—> C does not give B <—> C —> does not work!!
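The CIDR-overlap requirement above can be checked with Python's stdlib (the CIDR blocks here are made-up examples):

```python
import ipaddress

# VPC peering requires non-overlapping CIDR blocks.
vpc_a = ipaddress.ip_network("10.0.0.0/16")
vpc_b = ipaddress.ip_network("10.0.128.0/17")  # inside 10.0.0.0/16 -> overlap
vpc_c = ipaddress.ip_network("10.1.0.0/16")

print(vpc_a.overlaps(vpc_b))  # True  -> peering A<->B would be rejected
print(vpc_a.overlaps(vpc_c))  # False -> A<->C can be peered
```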
S3 - Simple Storage Service
- user-defined objects
- metadata - KV - user-defined
  - begins with x-amz-meta-…
- object tags
  - KV, used for fine-grained permissions or analytics
  Cannot be searched — use an RDB for that!
- buckets must have a globally unique name
- buckets are defined at the region lvl
Objects
- have a key
  - full path: s3://my-bucket/prefix-1/prefix-2/my-object-name
- max object size is 5 TB
- if uploading more than 5 GB, use multi-part upload —> recommended from 100 MB
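Multi-part upload arithmetic, using the commonly documented limits (5 MiB minimum part size for all parts except the last, 10,000 parts max per upload):

```python
import math

MiB = 1024 ** 2

def part_count(object_size: int, part_size: int) -> int:
    """Number of parts a multi-part upload would need."""
    if part_size < 5 * MiB:
        raise ValueError("parts (except the last) must be at least 5 MiB")
    parts = math.ceil(object_size / part_size)
    if parts > 10_000:
        raise ValueError("too many parts: increase part_size")
    return parts

# A 1 GiB upload with 100 MiB parts needs ceil(1024/100) = 11 parts,
# which can be uploaded in parallel.
print(part_count(1024 * MiB, 100 * MiB))  # 11
```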
S3 - Security
- user-based
- resource-based
  - bucket policies
  - object access control list (ACL)
    - finer-grained - can be disabled
  - bucket access control list (ACL)
    - less common - can be disabled
An IAM principal can access an S3 object if
- the user's IAM permissions allow it or the resource policy allows it
- and there is NO explicit DENY
Block Public Access prevents data leaks
- can be set at the account lvl
- overrides all public access rules
S3 - bucket policy
- JSON-based policies
  - resources: buckets and objects
  - effect: allow or deny
  - actions: set of APIs to allow or deny
  - principal: the account or user to apply the policy to
- used to:
  - grant public access
  - force objects to be encrypted at upload
  - grant access to another account (cross-account)
S3 - versioning
- enabled at the bucket lvl
- a key will get versions: 1, 2, 3, …
Best practice to version buckets
- protects against unintended deletes
- easy rollback to previous versions
Note:
- any file that is not versioned prior to enabling will have version NULL
- suspending versioning does not delete previous versions
- deleting an object only soft-deletes it (delete marker); you need to delete the version explicitly to remove it
S3 - replication
- must enable versioning! And buckets can be in different AWS accounts
- copying is ASYNC
- Cross region replicaiton (CRR) e.g. compliance, lower latency
- Same-region replication (SRR) e.g. log agg., live replication, ..
Must give proper IAM permissions to S3
Only new objects are replicated
- can be fixed with S3 Batch Replication, which will also retry failed replications
there is no chaining of replication
- e.g. if B1 replicates to B2 and B2 replicates to B3, objects from B1 do not end up in B3
DELETE version
- can replicate delete markers from source to target - optional, needs to be enabled
- deletions with a version ID are not replicated (to avoid malicious deletes)
S3 - select
S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions
S3 - Storage classes
- durability: how unlikely it is to lose an object; 99.999999999% (11 nines) for all classes
- availability varies per class
- standard - general purpose
- 99.99% availability
- freq. accessed data
- low latency & high throughput
- infrequent access
- 99.9% availability
- lower cost but cost on retrieval
- One Zone-IA
- high durability in a single AZ, but data is lost when the AZ is destroyed
- glacier
- low-cost object storage for archiving and backup
- glacier instant retrieval
- min storage duration: 90 days
- great for data accessed once a quarter
- glacier flexible retrieval
- min storage duration: 90 days
- expedited (1 to 5 minutes)
- standard (3 to 5 hours)
- bulk (5 to 12 hours) - free
- glacier deep archive
- long-term storage; min storage duration: 180 days
- standard (12h)
- bulk (48h)
- Intelligent-tiering
- small monthly monitoring and auto-tiering fee
- moves objects automatically between access tiers based on usage
- no retrieval charges
- freq accessed tiers (auto): default tier
- infreq accessed tiers (auto): obj not access for 30 days
- archive instant access (auto): object not accessed for 90 days
- archive access (optional): obj not accessed for 90 up to 730 days
- deep archive access (optional): obj not accessed for 180 up to 730 days
S3 - lifecycle rules
- transition actions - move object to storage after x days
- Expiration - config obj to delete after x days
- used for old versions, incomplete multipart uploads
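A rule combining both action types might look like this; the dict mirrors the shape accepted by S3's put-bucket-lifecycle-configuration API, and the prefix and day counts are illustrative only:

```python
# Transition logs/ objects to cheaper storage over time, expire them
# after a year, and clean up incomplete multipart uploads.
lifecycle_config = {
    "Rules": [{
        "ID": "archive-then-expire",
        "Status": "Enabled",
        "Filter": {"Prefix": "logs/"},
        "Transitions": [
            {"Days": 30, "StorageClass": "STANDARD_IA"},
            {"Days": 90, "StorageClass": "GLACIER"},
        ],
        "Expiration": {"Days": 365},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
    }]
}
```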
S3 - event notifications
- send events to SNS, SQS, Lambda or EventBridge (EventBridge can then route them to many more services)
S3 - performance
- multipart upload can be parallelized
- S3 Transfer Acceleration
- increases speed during upload:
- upload goes to a nearby edge location instead of directly to the bucket, then travels over the AWS backbone
- large file -uploads-> edge loc -uploads-> S3 bucket
S3 byte range fetch
- request only a specific byte range of an object
- speed up download
S3 - object encryption
- SSE - server side encryption - default
- encrypts S3 objects using keys handled, managed and owned by AWS
- SSE-S3
- encrypts S3 object using keys handled by S3.
- header: "x-amz-server-side-encryption": "AES256"
- does not support asymmetric keys
- SSE-KMS - SSE with KMS keys
- user controls + audits key
- logged in Cloudtrail
- can hit KMS limits!
- header: "x-amz-server-side-encryption": "aws:kms"
- AWS manages the data key, but you manage the KMS key (CMK) in AWS KMS
- does not support asymmetric keys
- SSE-C - SSE with customer provided keys
- the client sends the encryption key in the request headers; AWS uses it but never stores it
- must use HTTPS
- does not support asymmetric keys
- Client side encryption
- fully managed by client
- the only option that supports asymmetric keys; the SSE options are symmetric only
Encryption in transit
- use HTTPS - mandatory for SSE-C
- you can force HTTPS via bucket policy using the aws:SecureTransport condition key
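A minimal sketch of such a deny-plain-HTTP policy, with a placeholder bucket name:

```python
def require_https_policy(bucket):
    """Deny any request made over plain HTTP, i.e. when the
    aws:SecureTransport condition key evaluates to false."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "ForceHTTPS",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                f"arn:aws:s3:::{bucket}",
                f"arn:aws:s3:::{bucket}/*",
            ],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }]
    }
```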
S3 - MFADelete
- MFA confirmation is required to permanently delete an object version or suspend versioning
- versioning must be enabled on the bucket
S3 - Access logs
- all requests are logged to another bucket in the same Region (never the source bucket itself, or you create a logging loop)
S3 - presigned URLs
- generated by S3 console, CLI or SDK
- have a URL expiration (console up to 12 hours; CLI/SDK up to 168 hours)
- inherit the permissions of the user that generated them
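As an illustration of how expiration works: a SigV4 presigned URL carries its signing time (X-Amz-Date) and lifetime in seconds (X-Amz-Expires) as query parameters, so expiry can be checked locally. This is a toy checker with a fabricated URL, not the AWS SDK:

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlparse, parse_qs

def presigned_url_expired(url, now):
    """True if `now` (an aware datetime) is past the URL's
    X-Amz-Date + X-Amz-Expires window."""
    qs = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(
        qs["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(qs["X-Amz-Expires"][0]))
    return now > signed_at + lifetime

# fabricated example: signed at midnight UTC, valid for one hour
url = ("https://my-bucket.s3.amazonaws.com/my-object"
       "?X-Amz-Date=20240101T000000Z&X-Amz-Expires=3600&X-Amz-Signature=abc")
```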
S3 - access points policy
- simplify security management for S3 buckets
- has own DNS name and policies to manage at scale
- VPC origin - only accessible from VPC
- VPC Endpoint (policy)
S3 - object lambda
- use lambda before retrieving
- use S3 access point and S3 object lambda access point
S3 - events rule
- a given event type/prefix combination can only have one rule
- use fan-out to scale (SNS with SQS queues behind it, or Kinesis Data Firehose)
CloudFront
- content delivery network (CDN)
- DDOS protection shield, AWS web application firewall
S3 Cross-Region Replication —> good for dynamic content
CF —> good for static content, has TTL
CloudFront - origins
- CF has different backends to connect to. e.g.:
- S3 bucket - only available via CF using Origin Access control (OAC, edits bucket policy)
- VPC origin - ALB/NLB/EC2 instance
- custom origin - any HTTP service, S3 website
CloudFront - caching
- cache lives at CF edge
- can control TTL by using Cache-Control headers
- CF cache key —> hostname + resource path of the URL (/home/test)
- Can add other elements: headers, cookies, query strings by using CF cache policies.
- Can add values to include in origin request but without caching it by using Origin request policies.
- client —> CF Edge —> EC2 (origin)
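The cache-key idea above (hostname + path by default, plus any headers whitelisted via a cache policy) can be modeled with a toy function; this is an illustration of the concept, not CloudFront's actual implementation:

```python
def cache_key(hostname, path, headers=None, whitelist=()):
    """Toy CloudFront cache key: two requests map to the same cached
    object unless a whitelisted header differs between them."""
    headers = headers or {}
    extra = tuple(sorted(
        (name.lower(), value)
        for name, value in headers.items()
        if name.lower() in whitelist
    ))
    return (hostname, path) + extra
```

With no cache policy, a request with `Accept-Language: de` and one with `Accept-Language: en` hit the same cached object; whitelisting that header splits the cache per language.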
CloudFront - invalidation
- Can perform partial or full invalidation
CloudFront - cache behaviors
CloudFront - geographic restriction
- can have an allowlist or blocklist; the viewer's country is determined by a 3rd-party GeoIP database
CloudFront - signed URL / signed cookies
CloudFront - pricing
- pricing varies per location
- can reduce cost by serving from fewer edge locations (price classes)
- PriceClass_All: all edge locations ($$$)
- PriceClass_200: most edge locations, excluding only the most expensive ones ($$)
- PriceClass_100: least expensive regions only ($)
CloudFront - field level encryption
- protect user sensitive info
- adds additional layer to HTTPS
- uses asymmetric encryption (public/private key pair)
- specify which fields to encrypt, max 10 fields per POST request
graph LR
CFE[CloudFront Edge Location
-Encrypt PII with public key-]
Server[Server
-Decrypt PII with private key-]
client --HTTPS--> CFE --HTTPS--> ALB --> Server
CloudFront - realtime logs
- Monitor, analyze and take actions based on delivery performance.
Amazon ECS - Elastic Container Service
- launch Docker containers on AWS: run ECS tasks on an ECS cluster
note
If you terminate a container instance while it is in the STOPPED state, that container instance isn’t automatically removed from the cluster. You will need to deregister your container instance in the STOPPED state by using the Amazon ECS console or AWS Command Line Interface. Once deregistered, the container instance will no longer appear as a resource in your Amazon ECS cluster.
ECS - Launch types
ECS - volumes
- use EFS
- works for EC2 and Fargate
- Tasks running in multiple AZs will share same data
ECS - IAM roles
- For EC2 type only
- EC2 instance profile (EC2 launch type only)
- used by ECS agent
- makes API calls to ECS service
- send container logs to CW logs
- pull docker image from ECR
- pulls from secrets manager or SSM parameter store
- ECS task role
- allow each task to have specific role
- to allow access to different AWS services
- task definition
- use only one role per task!
ECS - service autoscaling
- uses AWS Application Auto Scaling
- ECS service avg
- CPU
- Memory
- ALB request count
- target tracking
- target value for specific CW metric
- step scaling
- scheduled scaling
- scale on predefined timestamps/intervals
ECS service scaling =/= EC2 auto scaling
- Auto Scaling group scaling
- scale your ASG based on CPU
- ECS cluster capacity provider
- automatically provisions & scales the infra
- a capacity provider is paired with an ASG
- adds EC2 instances when short on CPU, RAM, …
ECS - rolling update
- controlled by min/max healthy percent, e.g. with 4 instances:
- min 50% / max 100%: update in place, up to half the tasks down at a time
- min 100% / max 150%: start new tasks first, then stop old ones
ECS - invoked by Eventbridge
ECS - Tasks
- task definitions are JSON, like a Kubernetes manifest
- max 10 containers per task definition; useful for sidecar containers or to share some ephemeral storage
- you will get a dynamic/random host port if you only define the container port, NOT the host port
- Fargate
- each task has a unique private IP
- you only define the container port
ECS - task placement strategies
- ECS places tasks wherever possible, as long as placement strategies and constraints are satisfied
- strategies:
- binpack
- Tasks are placed on container instances so as to leave the least amount of unused CPU or memory. This strategy minimizes the number of container instances in use.
- random
- spread
- spread evenly over a specified value, e.g. instanceId or AZ
ECS - task constraints
- distinct instance
- place each task on different instance
- member of
- place tasks on instances that satisfy an expression (Cluster Query Language)
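The strategies and constraints above map onto the placementStrategy and placementConstraints fields of the ECS run_task / create_service APIs; the instance-type expression below is just an example:

```python
# Pack by memory first, then spread the result across AZs.
placement_strategy = [
    {"type": "binpack", "field": "memory"},
    {"type": "spread", "field": "attribute:ecs.availability-zone"},
]

# One task per instance, and only on t3-family instances (example
# expression in Cluster Query Language).
placement_constraints = [
    {"type": "distinctInstance"},
    {"type": "memberOf",
     "expression": "attribute:ecs.instance-type =~ t3.*"},
]
```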
Copilot
- CLI to build, release and operate production ready container apps
- Runs on AppRunner, ECS and Fargate
- manages infra
- Auto deploys with CodePipeline
- deploy to multiple envs
- has troubleshooting, logs, health status
ECR - Elastic Container Registry
- can be private or public
- access controlled through IAM
EKS
- managed Kubernetes; nodes deploy on EC2 or Fargate
- alternative to ECS
EKS - nodetypes
- managed node groups
- self-managed node groups
- Fargate
EKS - datavolumes
- Specify storage class
- leverages a Container Storage Interface (CSI) compliant driver
- supports EBS and EFS (EFS is the only option that works with Fargate)
Beanstalk
- manages the underlying AWS infrastructure for you
- uses CloudFormation under the hood
- helps deploy code via a zip file containing the code
- the zip is uploaded to an S3 bucket first
Beanstalk - components
- application
- collection of Beanstalk components (environments, versions, configurations)
- application version
- env.
- collection of AWS resources running an application version
Beanstalk - services
- web service
- Sync
- works with ALB
- standard / default
- worker service
- SQS queue
- scales based on the number of SQS messages
Beanstalk - deployment modes
- single instance
- HA with LB
Beanstalk - deployment options
- all at once
- rolling
- rolling with additional batches
- To maintain full capacity during deployments, you can configure your environment to launch a new batch of instances before taking any instances out of service.
- immutable
- deploy a new ASG and swap it with the old ASG when ready
- blue/green
- create new env and switch when ready
- In-place Deployment
- The application on each instance in the deployment group is stopped, the latest application revision is installed, and the new version of the application is started and validated. You can use a load balancer so that each instance is deregistered during its deployment and then restored to service after the deployment is complete.
- lifecycle
- lifecycle hooks: ApplicationStop -> BeforeInstall -> AfterInstall -> ApplicationStart -> ValidateService
- traffic splitting / canary testing
- send x % to new deploy
- let you perform canary testing as part of your application deployment.
- launches a full set of new instances just like during an immutable deployment.
Beanstalk - lifecycle policy
- max 1000 application versions
- if old versions are not removed, new deployments will fail
- use a lifecycle policy (based on time or total count) to remove old versions
- versions currently in use won't be deleted
- option to not delete source bundle in S3
Beanstalk - extensions
- all params can be configured via code using files in the zip
- requirements: files live in the .ebextensions/ folder in your root folder, are YAML or JSON, and end in .config (to modify settings and resources)
- can use CF resources in here
Beanstalk - cloning
- Clone a Beanstalk with exact same config
Beanstalk - migration
- the load balancer type cannot be changed after the environment is created
- to change it, you need to migrate:
- create a new Beanstalk env with the desired LB type and deploy the application to it
- use Route 53 to point traffic to the new env
- to decouple RDS, keep the RDS instance and reuse the same connection string in the new Beanstalk env
CloudFormation - CF
- declarative way of defining infra and resources: a template
- upload template to S3 <-references- CF -creates-> CF Stack (has a unique name) -creates-> AWS resources
- update —> reupload a new template to S3
- deploy
- manually via the web console
- automated way
- editing YAML templates
- CLI
CloudFormation - template components
- AWSTemplateFormatVersion
- Resources - mandatory
- This section specifies the stack resources and their properties, such as an Amazon EC2 instance or an Amazon S3 bucket. Each resource is defined with a unique logical ID, type, and specific configuration details.
- Parameters
- dynamic inputs for your template
- pseudo params
- accountId, Region, StackId
- Mappings
- static variables e.g. env specific vars
- Outputs
- refs to what has been created, values you can import in other Stacks
- use !ImportValue to use them in other templates
- You can use the Export Output Values to export the name of the resource output for a cross-stack reference
- For each AWS account, export names must be unique within a region.
- Conditionals
- contains statements that define the circumstances under which entities are created or configured.
- Conditions cannot be used within the Parameters section. After you define all your conditions, you can associate them with resources and resource properties only in the Resources and Outputs sections of a template.
AWSTemplateFormatVersion: "2010-09-09"
Mappings:
  RegionMap:
    us-east-1:
      AMI: "ami-0ff8a91507f77f867"
    us-west-1:
      AMI: "ami-0bdb828fd58c52235"
    us-west-2:
      AMI: "ami-a0cfeed8"
    eu-west-1:
      AMI: "ami-047bb4163c506cd98"
    sa-east-1:
      AMI: "ami-07b14488da8ea02a0"
    ap-southeast-1:
      AMI: "ami-08569b978cc4dfa10"
    ap-southeast-2:
      AMI: "ami-09b42976632b27e9b"
    ap-northeast-1:
      AMI: "ami-06cd52961ce9f0d85"
Parameters:
  EnvType:
    Description: Environment type.
    Default: test
    Type: String
    AllowedValues: [prod, dev, test]
    ConstraintDescription: must specify prod, dev, or test.
Conditions:
  CreateProdResources: !Equals [!Ref EnvType, prod]
  CreateDevResources: !Equals [!Ref EnvType, "dev"]
Resources:
  EC2Instance:
    Type: "AWS::EC2::Instance"
    Properties:
      ImageId: !FindInMap [RegionMap, !Ref "AWS::Region", AMI]
      InstanceType: !If [CreateProdResources, c1.xlarge, !If [CreateDevResources, m1.large, m1.small]]
  MountPoint:
    Type: "AWS::EC2::VolumeAttachment"
    Condition: CreateProdResources
    Properties:
      InstanceId: !Ref EC2Instance
      VolumeId: !Ref NewVolume
      Device: /dev/sdh
  NewVolume:
    Type: "AWS::EC2::Volume"
    Condition: CreateProdResources
    Properties:
      Size: 100
      AvailabilityZone: !GetAtt EC2Instance.AvailabilityZone
Outputs:
  OutputLogicalID:
    Description: Information about the value
    Value: Value to return
    Export:
      Name: Name of resource to export
CloudFormation - failures
- stack creation fails
- default: rollback everything and have a look at the logs
- optional: you can disable the rollback, troubleshoot what happened, then delete the stack and try again
- stack update fails
- automatic rollback to the previous working state
- rollback failure
- fix the resources manually, then use the ContinueUpdateRollback API call
CloudFormation - service role
- IAM role that lets CF manage resources on your behalf
- gives users the ability to CRUD stack resources even if they don't have the permissions themselves
CloudFormation - capabilities
- CAPABILITY_IAM / CAPABILITY_NAMED_IAM must be acknowledged when a template creates or updates IAM resources
CloudFormation - DeletionPolicy
- default: Delete
- Retain
- Snapshot: e.g. EBS volume, RDS, ElastiCache
CloudFormation - stack policy
- JSON doc that defines which update actions are allowed on specific resources during a stack update
- when enabled, all resources are protected by default
CloudFormation - termination protection
- prevents deletion of your stack
CloudFormation - custom resources
- define resources not yet supported by CF
- e.g. run a Lambda to empty an S3 bucket before deletion
CloudFormation - StackSets
- create, update or delete stacks across multiple accounts and regions with a single operation/template
- only admin accounts can create them
SQS - Simple Queue Service
- managed queue service
- you have producers (P) and consumers (C)
- uses polling mechanism
graph LR
P1 --> SQS
P2 --> SQS
P3 --> SQS
SQS --Polling--> C1
SQS --> C2
SQS --> C3
SQS - standard queue
- default retention 4 days, max 14 days
- message size limit of 256 KB; use the SQS Extended Client (S3-backed) for bigger payloads
- default message visibility timeout is 30 seconds
- default retention of DLQ is 14 days
- can have duplicates, delivery at least once
- can have out of order messages, best effort delivery
SQS - security
- inflight - https
- at rest - KMS
- you can enable KMS for encrypted SQS
- SSE lets you store sensitive data in encrypted queues
- client side encryption, if want to perform encryption/decryption
SQS - access control
- IAM policies
- SQS access policies
- cross account
- for other services to use the sendMessage/ReceiveMessage API
SQS - message visibility timeout
- after message is polled by consumer, it becomes invisible to other consumers
- default is 30 seconds
- consumer can ask for more time, changeMessageVisibility API call
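The mechanism can be illustrated with a toy in-memory queue (not the real SQS service): a received message becomes invisible to other consumers until the timeout elapses or the message is deleted:

```python
class ToyQueue:
    """Minimal model of SQS visibility timeout, using integer seconds."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self.messages = {}  # msg_id -> (body, invisible_until)

    def send(self, msg_id, body):
        self.messages[msg_id] = (body, 0)

    def receive(self, now):
        # deliver the first message that is currently visible
        for msg_id, (body, invisible_until) in self.messages.items():
            if now >= invisible_until:
                self.messages[msg_id] = (body, now + self.visibility_timeout)
                return (msg_id, body)
        return None

    def delete(self, msg_id):
        self.messages.pop(msg_id, None)

q = ToyQueue(visibility_timeout=30)
q.send("m1", "hello")
first = q.receive(now=0)    # delivered; invisible until t=30
second = q.receive(now=10)  # still invisible: nothing to deliver
third = q.receive(now=31)   # timeout elapsed: redelivered
```

This also shows why a too-short timeout causes duplicate processing: the message at t=31 is the same one delivered at t=0, not a new message.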
SQS - DLQ
- after a message has been received more than the configured maximum number of times, it is moved to the DLQ
- for a FIFO queue, the DLQ must also be FIFO
- set a retention of 14 days on the DLQ
SQS - DLQ - redrive to source
- after fixing the application, redrive the DLQ messages back into the source queue
SQS - delay queue
- delay messages up to 15 minutes before consumers can see them
- can be overridden per message with the DelaySeconds parameter
SQS - short polling
- enabled by default
- Amazon SQS sends the response right away, and only from a subset of servers (based on weighted random distribution), even if the query found no messages.
- You end up paying more because of the increased number of empty receives.
- short polling is used if the WaitTimeSeconds param is set to 0
SQS - long polling
- waits for messages (1 to 20 seconds) if there are none in the queue
- fewer API calls, lower cost and latency
SQS - extended client
- works around the 256 KB max message size
- the producer (via SDK) automatically stores the large payload in S3 and sends a small metadata message; the consumer (via SDK) automatically retrieves the payload from S3
graph LR
P --1/ send--> S3
P --2/ small metadata message--> Q
Q[Queue] --3/ polls--> C
C --4/ retrieves from--> S3
SQS - FIFO
- First In First Out
- guarantee of ordering of messages
- exactly once send capabilities - by using deduplication IDs
- are processed in order
- ordering by MessageGroupId: all messages in the same group are ordered; not mandatory
SQS - FIFO - advanced topics
- the deduplication interval is 5 min; after that, duplicates are accepted again
- duplicates are detected by
- a SHA-256 hash of the message content (content-based deduplication)
- an explicit message deduplication ID
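Content-based deduplication derives the deduplication ID from a SHA-256 hash of the message body, so identical bodies within the 5-minute interval collapse to one message:

```python
import hashlib

def content_dedup_id(body):
    """Deduplication ID as used by SQS content-based deduplication:
    a SHA-256 hash of the message body."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()
```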
SNS - Simple Notification Service
- one message many receivers - Pub/Sub
- has topics
- event producers only send message to one SNS topic
- event receivers - subscription to one topic
- each subscription gets all message from the topic
SNS - security
SNS - access policies
SNS - fan out
graph LR
S[publisher] --> SNS[SNS topic]
SNS --> SQSA[SQS Q]
SNS --> SQSB[SQS Q]
SQSA --> FS
SQSB --> SS
SNS - FIFO topic
- used for fan-out, ordering and deduplication
- can also use message filtering for subscribers
KDS - Kinesis Data Streams
- collect and store streaming data in real time
- data is split into shards, the capacity units of a stream
- retention from 1 up to 365 days; data can't be deleted manually, it expires
- useful for lots of small real-time data
- at-rest encryption and HTTPS in transit
- data ordering guaranteed for data with the same partition key
- enables real-time processing of streaming big data
- provides ordering of records, as well as the ability to read and/or replay records in the same order to multiple Amazon Kinesis applications
- The Amazon Kinesis Client Library (KCL) delivers all records for a given partition key to the same record processor, making it easier to build multiple applications reading from the same Amazon Kinesis data stream (for example, to perform counting, aggregation, and filtering).
- The KCL helps consume and process data from a Kinesis data stream by handling shard-to-worker relationships. However, the KCL does not address producer issues, and it does not have the ability to put records into the stream.
- Amazon Kinesis Data Streams is recommended when you need the ability to consume records in the same order a few hours later. For example, you have a billing application and an audit application that runs a few hours behind the billing application. By default, records of a stream are accessible for up to 24 hours from the time they are added to the stream. You can raise this limit to a maximum of 365 days. For the given use case, Kinesis Data Streams can be configured to store data for up to 7 days and you can run the audit application up to 7 days behind the billing application.
graph LR
P[Producers] --real-time--> KDS
KDS --real-time--> C[Consumers
e.g. Data Firehose]
KDS - capacity modes
KDA - Kinesis Data Analytics
- the easiest way to analyze streaming data in real time. You can quickly build SQL queries and sophisticated Java applications using built-in templates and operators for common processing functions to organize, transform, aggregate, and analyze data at any scale. Three simple steps: set up your streaming data sources, write your queries or streaming applications, and set up the destination for processed data.
KDA - Kinesis Agent
Kinesis Agent is a stand-alone Java software application that offers an easy way to collect and send data to Kinesis Data Streams. The agent continuously monitors a set of files and sends new data to your stream. The agent handles file rotation, checkpointing, and retry upon failures. It delivers all of your data in a reliable, timely, and simple manner. It also emits Amazon CloudWatch metrics to help you better monitor and troubleshoot the streaming process.
The agent can also pre-process the records parsed from monitored files before sending them to your stream. You can enable this feature by adding the dataProcessingOptions configuration setting to your file flow. One or more processing options can be added and they will be performed in the specified order.
Data Firehose - DF
- fully managed
- near real time - with buffer which can be flushed and can also be disabled
- is the easiest way to load streaming data into data stores and analytics tools. It can capture, transform, and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service, and Splunk, enabling near real-time analytics with existing business intelligence tools and dashboards you’re already using today. It is a fully managed service that automatically scales to match the throughput of your data and requires no ongoing administration. It can also batch, compress, and encrypt the data before loading it, minimizing the amount of storage used at the destination and increasing security. Kinesis Data Firehose is used to load streaming data into data stores.
graph LR
P[Producers] --max 1mb--> DF
DF --batch writer--> C[Consumers
e.g. S3, redshift analytics, opensearch]
DF --> lambda[process data by lambda
optional]
KDS vs DF
- KDS
- streaming data collection
- realtime
- data storage up to 365 days
- you write the code for consumers (C) and producers (P)
- DF
- autoscaling
- no data storage
- save data to 3rd parties, S3, …
- near real-time
SQS vs SNS vs KDS
- SQS: consumers pull data
- SNS: pushes data to many subscribers
- KDS: standard consumers pull data, enhanced fan-out pushes data; can replay data; ETL and real-time big data; custom processing possible with Lambda
Apache Flink
- managed service (Amazon Managed Service for Apache Flink)
- real-time processing of data streams
- automatic backups (checkpoints and snapshots)
- reads from Kinesis Data Streams or Amazon MSK, NOT from Data Firehose
CW - CloudWatch
- provides metric and logging for every service
- a metric is a variable to monitor (CPU utilization, network, …)
- belongs to namespace
- dimension, is an attribute of a metric, max 30
- have timestamps
- can be filtered (e.g. count of 4xx)
- can be used for alarms
- default EC2 metrics rate is metric every 5 minutes
CW - detailed monitoring
- gives you a metric sample every minute
- costs extra $
CW - custom metrics
- define custom metrics with command PutMetricData
- can be added with dimensions
resolution
- standard 1 min
- high - 1/5/10/30 seconds - $$$
CW - logs
- log groups
- can be random or can be name of application
- log stream
- instances within application/container/log files
graph LR
CWL1[CloudWatch logs] --> SF[subscription filter]
CWL2[CloudWatch logs] --> AF[accounting filter]
CWL3[CloudWatch logs] --> UF[users filter]
SF --> KDS[Kinesis Data Stream]
AF --> KDS
UF --> KDS
KDS --> DF[Data Firehose]
DF --> S3
CW - log insights
- can query logs and use filters
CW - log retention policy
- logs never expire by default; retention is configurable per log group
- can be sent to S3, Kinesis Data Streams, Data Firehose, Lambda, OpenSearch
- encrypted by default, or set up your own keys via KMS
CW - sources
You can export log data from your CloudWatch log groups to an Amazon S3 bucket and use this data in custom processing and analysis, or to load onto other systems.
CW - cross-account subscription
- send log events to a resource in different AWS account
- can also be:
- multi account
- multi region
CW - agent
- collect metrics, logs, and traces with the CloudWatch agent
CW - agent - types
- CW log agent
- is old, only sends logs to CW
- unified agent
- sends metrics and logs to CW
- centralized config using SSM param store
- EC2 by default sends disk, CPU and network metrics without an agent; RAM requires the agent
CW - alarms
- trigger notifications for any metric
- targets
- CRUD EC2
- upscale/downscale ASG
- send notifications
CW - composite alarms
- monitor multiple alarms
- e.g. if alarm A AND alarm B are in ALARM, set the composite alarm to ALARM
CW - synthetic monitoring (canaries)
- configurable scripts that monitor your APIs, URLs, websites
- check availability and latency of your endpoints and can store load-time data and screenshots of your UI
- integrate with CW alarms
- written in Node.js or Python, with access to a headless Chrome browser
- run once or on a schedule
- blueprints available for commonly used tasks
Eventbridge - EB
- event pattern: event rules react to a service doing something, e.g. sign-in event —> SNS topic —> email notification
- can be scheduled: e.g. every hour execute a Lambda function
- can be accessed by other AWS accounts using resource-based policies
- default event bus —> AWS services
- partner event bus —> react to 3rd-party events
- custom event bus —> create your own
- can archive events and replay them
EB - schema registry
- EB can analyze events and infer the schema
- the schema registry generates code bindings for your application, so the application knows in advance how the data is structured
- can be versioned
EB - permissions
- manage permissions with resource-based policies
- e.g. aggregate events from multiple accounts into one account
X-ray
- visual analysis of applications
- troubleshoot performance
- understand dependencies
- leverages tracing —> segments + subsegments
AWS X-Ray helps developers analyze and debug production, distributed applications, such as those built using a microservices architecture. With X-Ray, you can understand how your application and its underlying services are performing to identify and troubleshoot the root cause of performance issues and errors. X-Ray provides an end-to-end view of requests as they travel through your application, and shows a map of your application’s underlying components.
You can use X-Ray to collect data across AWS Accounts. The X-Ray agent can assume a role to publish data into an account different from the one in which it is running. This enables you to publish data from various components of your application into a central account.
X-ray - installation
- use the X-Ray SDK (Java, JS, …), which captures
- calls to AWS services
- HTTP(S) requests
- database calls
- queue calls
- install the X-Ray daemon
- give it the right IAM roles!
- ECS
- EC2 launch type: run one X-Ray daemon container per EC2 instance
- Fargate launch type: run the daemon as a sidecar container
X-ray - instrumentation
- measures application performance, diagnoses errors and writes trace info
- trace —> segments —> subsegments
- sampling reduces the number of traced requests (default: first request each second plus 5% of the rest)
- can use annotations (KV pairs) to index and search traces
X-ray - sampling
- To ensure efficient tracing and provide a representative sample of the requests that your application serves, the X-Ray SDK applies a sampling algorithm to determine which requests get traced. By default, the X-Ray SDK records the first request each second, and five percent of any additional requests. X-Ray sampling is enabled directly from the AWS console, hence your application code does not need to change.
- reservoir = 1
- at least one request is sampled every second
- rate = 0.05
- the five percent rate at which additional requests are sampled
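The default rule (reservoir of 1 request per second plus a 5% fixed rate on the rest) can be simulated with a toy sampler; this sketches the idea, not the X-Ray SDK's actual implementation:

```python
import random

class Sampler:
    """Toy X-Ray-style sampler: trace up to `reservoir` requests per
    second, then trace the remainder at the fixed `rate`."""

    def __init__(self, reservoir=1, rate=0.05, rng=None):
        self.reservoir = reservoir
        self.rate = rate
        self.rng = rng or random.Random()
        self.current_second = None
        self.taken = 0

    def should_sample(self, now_second):
        if now_second != self.current_second:
            self.current_second = now_second  # new second: reset reservoir
            self.taken = 0
        if self.taken < self.reservoir:
            self.taken += 1
            return True
        return self.rng.random() < self.rate

s = Sampler(rng=random.Random(0))
# 100 requests arriving in the same second: 1 from the reservoir,
# roughly 5% of the other 99 from the rate
decisions = [s.should_sample(now_second=0) for _ in range(100)]
```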
X-ray - annotations
Annotations are simple key-value pairs that are indexed for use with filter expressions. Use annotations to record data that you want to use to group traces in the console, or when calling the GetTraceSummaries API.
X-Ray indexes up to 50 annotations per trace.
Cloudtrail
- provides governance, compliance and audit for AWS accounts
- get a history of events and API calls made within the account by:
- console
- SDK
- CLI
- AWS services
- put logs from CloudTrail into CloudWatch Logs or S3
- CloudTrail can be applied to all regions (default) or a single region
Cloudtrail - events
- management events
- data events
- e.g. S3 object level activity
- not enabled by default
- Cloudtrail insight events - $
- to detect unusual activity
Lambda - FaaS
- serverless: no servers to manage
- limited execution time (max 15 minutes)
- runs on-demand
- scaling is automatic
- container images are possible
- but containers run better in ECS or Fargate
- if you increase RAM you also increase vCPU
To expose a Lambda, use an ALB (with a target group) or API Gateway: sync invocation
Lambda - authorizer
- Amazon API Gateway Lambda authorizer
- Lambda function that you provide to control access to your API.
- A Lambda authorizer uses bearer token authentication strategies, such as OAuth or SAML.
- Before creating an API Gateway Lambda authorizer, you must first create the AWS Lambda function that implements the logic to authorize and, if necessary, to authenticate the caller
Lambda - async invocations
- S3
- SNS
- CodeCommit
- CodePipeline
- CloudWatch Events / EventBridge
Lambda - event source mapping
- a Lambda resource that reads items from stream- and queue-based services and invokes your function with batches
- is a sync invocation
Lambda - event source mapping - streams
- an iterator per shard processes items in order
- you can start with new items, from the beginning, or from a certain timestamp
- processed items aren't removed from the stream (other consumers can still read them)
- low traffic: use a batch window to accumulate records before invoking
- can process multiple batches in parallel for high traffic
- on error, the whole batch is retried by default
- further processing on the shard is stopped until the error is resolved
configure the event source mapping to:
- discard events
- discarded events can go to a destination
- restrict the number of retries
- split the batch on error (bisect)
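The batching, retry, bisect and destination settings above correspond to fields of the Lambda CreateEventSourceMapping API; a sketch for a Kinesis source, with placeholder ARNs and function name:

```python
# Parameters as accepted by Lambda's CreateEventSourceMapping API
# for a stream source; values here are illustrative.
event_source_mapping = {
    "EventSourceArn": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream",
    "FunctionName": "my-function",
    "StartingPosition": "LATEST",          # or TRIM_HORIZON / AT_TIMESTAMP
    "BatchSize": 100,
    "MaximumBatchingWindowInSeconds": 5,   # batch window for low traffic
    "MaximumRetryAttempts": 3,             # restrict the number of retries
    "BisectBatchOnFunctionError": True,    # split the batch on error
    "DestinationConfig": {                 # where discarded events go
        "OnFailure": {
            "Destination": "arn:aws:sqs:us-east-1:123456789012:my-dlq"
        }
    },
}
```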
Lambda - event source mapping - queue
- does long polling
- set up the DLQ on the SQS queue itself, not on the event source mapping
- optionally set up a destination for discarded events
- supports FIFO queues; Lambda scales up to the number of active message groups
Lambda - event and context objects
- event object: the input data from the invoking service
- context object
- metadata about the invocation, e.g. function name, request ID
Lambda - destinations
- send the invocation result to another service
- for async invocations, destinations can be set for both failed and successful events
Lambda - edge functions - Lambda@edge
- is a compute service that lets you execute functions that customize the content that Amazon CloudFront delivers.
- change viewer/origin requests and responses
Lambda - edge functions - triggers
A Lambda@Edge trigger is one combination of a CloudFront distribution, cache behavior, and event that causes a function to execute. For example, you can create a trigger that causes the function to execute when CloudFront receives a request from a viewer for a specific cache behavior you set up for your distribution. You can specify one or more CloudFront triggers.
Viewer request
The function executes when CloudFront receives a request from a viewer, before it checks to see whether the requested object is in the CloudFront cache.
The function doesn’t execute in the following cases:
When fetching a custom error page.
When CloudFront automatically redirects an HTTP request to HTTPS (when the value of the Viewer protocol policy is Redirect HTTP to HTTPS).
Origin request
The function executes only when CloudFront forwards a request to your origin. When the requested object is in the CloudFront cache, the function doesn’t execute.
Origin response
The function executes after CloudFront receives a response from the origin and before it caches the object in the response. Note that the function executes even if an error is returned from the origin.
The function doesn’t execute in the following cases:
When the requested file is in the CloudFront cache and is not expired.
When the response is generated from a function that was triggered by an origin request event.
Viewer response
The function executes before returning the requested file to the viewer. Note that the function executes regardless of whether the file is already in the CloudFront cache.
The function doesn’t execute in the following cases:
When the origin returns an HTTP status code of 400 or higher.
When a custom error page is returned.
When the response is generated from a function that was triggered by a viewer request event.
When CloudFront automatically redirects an HTTP request to HTTPS (when the value of the Viewer protocol policy is Redirect HTTP to HTTPS).
When you add multiple triggers to the same cache behavior, you can use them to run the same function or run different functions for each trigger. You can also associate the same function with more than one distribution.
Lambda VPC
- lambda launches by default in an AWS managed VPC
- for lambda in a VPC
- define VPCId, Subnets and security groups
- lambda will create an Elastic Network Interface (ENI) in your subnets
- does not have access to public internet
- use VPC endpoints to access internal services
Lambda - execution context
- temporary runtime environment that initializes any external dependencies
- is reused for subsequent Lambda invocations
- to reuse the execution context between invocations, move initialization (e.g. of a database connection) outside of the function handler
- so subsequent invocations can reuse the database connection
- /tmp persists between invocations
- max size 10 GB
- directory contents remain while the execution context is frozen
- great for troubleshooting
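The handler-vs-init split can be sketched as follows. This is a minimal illustration, not AWS code: the init counter and names are made up, and a real function would create e.g. a boto3 client or database connection at module level instead.

```python
# Sketch of Lambda execution-context reuse. Module-level code runs once
# per execution context (cold start); the handler runs on every invocation.

INIT_COUNT = 0

def get_db_connection():
    """Placeholder for an expensive init, e.g. opening a DB connection."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"connected": True}

# Runs once per cold start, NOT on every invocation.
db = get_db_connection()

def handler(event, context):
    # Warm invocations reuse the module-level connection.
    return {"init_count": INIT_COUNT, "db": db["connected"]}
```

Calling the handler repeatedly in the same context leaves the init count at 1, which is exactly the saving the notes describe.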
Lambda - layers
- create custom runtimes e.g. C++, Rust
- Externalize deps to re-use them
Lambda - FS mounting
- configure Lambda to mount an EFS file system to local storage during init
Lambda - env vars
Lambda - concurrency and throttling
- reserved concurrency
- max number of concurrent executions for a function
- account-level concurrency limit of 1000
- shared over ALL functions
- want more? open a support ticket
- will throttle if over the limit
- sync - throttle error - 429
- async - retries automatically, then goes to a DLQ
Lambda - provisioned concurrency
- Cold start —> Init can take time
- concurrency is allocated before function is invoked
- can be managed by Application Auto Scaling
Lambda - dependencies and containers
- add dependencies to code in zip uploaded to S3
- container images up to 10 GB, stored in ECR
- must implement Lambda runtime API
Lambda - versions
- default $LATEST (mutable)
- can publish a version e.g. V1
- immutable
- increasing numbers
- get their own ARN (amazon resource name)
- version = config + code
Lambda - aliases
- points to a Lambda function version; has its own ARN (unique name)
- can have names like “dev”, “test”, “prod”
- can enable canary deployments with this
- You can use routing configuration on an alias to send a portion of traffic to a Lambda function version
- can only alias other versions NOT other aliases!
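The weighted routing an alias applies can be modeled as a tiny function. This is a sketch, not the Lambda service's implementation: the version names and the 10% weight are arbitrary, and in practice the split is configured via the alias's `RoutingConfig` (`AdditionalVersionWeights`).

```python
# Toy model of alias traffic shifting: an alias sends a weighted fraction
# of invocations to additional versions, the rest to the primary version.

def route(primary_version: str, additional_weights: dict, r: float) -> str:
    """Pick a version for one invocation, given r drawn uniformly from [0, 1)."""
    threshold = 0.0
    for version, weight in additional_weights.items():
        threshold += weight
        if r < threshold:
            return version
    return primary_version

# 10% canary to version "2", 90% stays on version "1":
print(route("1", {"2": 0.1}, r=0.05))  # lands in the canary slice
print(route("1", {"2": 0.1}, r=0.50))  # stays on the primary version
```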
Lambda - function URL
- unique and dedicated URL for Lambda function
- never changes
- accessible via the public internet only (no PrivateLink)
- supports resource based policies
- supports CORS config
- can be applied to aliases
- throttled by reserved concurrency
Lambda - function URL security
- grant lambda:InvokeFunctionUrl to allow cross-account traffic
- same account: identity-based policy OR resource-based policy has an allow
- cross account: identity-based policy AND resource-based policy must both allow
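A sketch of the resource-based policy statement that grants another account the `lambda:InvokeFunctionUrl` permission. The account IDs, region, and function name are placeholders, and the exact statement AWS generates may differ in detail; `lambda:FunctionUrlAuthType` is the real condition key for function URLs.

```python
import json

def function_url_permission(function_arn: str, caller_account: str) -> dict:
    """Build a resource-based policy statement allowing a cross-account
    principal to invoke a function URL with IAM auth."""
    return {
        "Effect": "Allow",
        "Principal": {"AWS": f"arn:aws:iam::{caller_account}:root"},
        "Action": "lambda:InvokeFunctionUrl",
        "Resource": function_arn,
        "Condition": {"StringEquals": {"lambda:FunctionUrlAuthType": "AWS_IAM"}},
    }

stmt = function_url_permission(
    "arn:aws:lambda:eu-west-1:111111111111:function:demo", "222222222222")
print(json.dumps(stmt, indent=2))
```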
Lambda - limits
- memory from 128 MB up to 10,240 MB; more memory means more CPU
- increase in 1 MB increments
- env vars max 4 KB
- /tmp max 10 GB
- concurrent executions 1000 (can be increased with a support ticket)
- deploy:
- max 50 MB compressed zip
- 250 MB uncompressed (code + deps)
- env vars 4 KB
CodeDeploy
- fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Fargate, AWS Lambda, and your on-premises servers.
- easier for you to rapidly release new features, helps you avoid downtime during application deployment, and handles the complexity of updating your applications.
- It can deploy an application to an instance but it cannot provision the instance.
- automate traffic shift for Lambda aliases
- integrated within SAM framework
- uses appspec.yml which contains:
- name
- alias
- current version
- target version
- lifecycle hooks:
- order (EC2/On-Premises): DownloadBundle => BeforeInstall => ApplicationStart => ValidateService
- hooks for ECS blue/green deployments ("task set" below refers to ECS):
- BeforeInstall – Use to run tasks before the replacement task set is created. One target group is associated with the original task set. If an optional test listener is specified, it is associated with the original task set. A rollback is not possible at this point.
- AfterInstall – Use to run tasks after the replacement task set is created and one of the target groups is associated with it. If an optional test listener is specified, it is associated with the original task set. The results of a hook function at this lifecycle event can trigger a rollback.
- AfterAllowTestTraffic – Use to run tasks after the test listener serves traffic to the replacement task set. The results of a hook function at this point can trigger a rollback.
- BeforeAllowTraffic – Use to run tasks after the second target group is associated with the replacement task set, but before traffic is shifted to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
- AfterAllowTraffic – Use to run tasks after the second target group serves traffic to the replacement task set. The results of a hook function at this lifecycle event can trigger a rollback.
- ValidateService – This is the last deployment lifecycle event. It is used to verify the deployment was completed successfully.
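For a Lambda deployment, the appspec fields listed earlier (name, alias, current/target version) plus the traffic hooks come together roughly like this, shown here as a Python dict mirroring the YAML. The function, alias, versions, and hook function names are placeholders.

```python
# Rough shape of a CodeDeploy appspec for a Lambda traffic shift.
appspec = {
    "version": 0.0,
    "Resources": [{
        "myFunction": {
            "Type": "AWS::Lambda::Function",
            "Properties": {
                "Name": "myFunction",
                "Alias": "prod",          # the alias whose traffic shifts
                "CurrentVersion": "1",
                "TargetVersion": "2",
            },
        },
    }],
    # Pre- and post-traffic hooks that can validate health and trigger rollback.
    "Hooks": [
        {"BeforeAllowTraffic": "validateBeforeTrafficShift"},
        {"AfterAllowTraffic": "validateAfterTrafficShift"},
    ],
}
```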
CodeDeploy - deploy types
- all at once
- linear
- canary
- pre and post traffic hooks to check the health
CodeDeploy - CodeDeploy Agent
The CodeDeploy agent is a software package that, when installed and configured on an instance, makes it possible for that instance to be used in CodeDeploy deployments. The agent archives revisions and log files on instances and cleans up these artifacts to conserve disk space. You can use the :max_revisions: option in the agent configuration file to specify the number of application revisions to archive by entering any positive integer. CodeDeploy also archives the log files for those revisions. All others are deleted, except for the log file of the last successful deployment.
CodeDeploy - Deployment Groups
You can specify one or more deployment groups for a CodeDeploy application. The deployment group contains settings and configurations used during the deployment. Most deployment group settings depend on the compute platform used by your application. Some settings, such as rollbacks, triggers, and alarms can be configured for deployment groups for any compute platform.
In an EC2/On-Premises deployment, a deployment group is a set of individual instances targeted for deployment. A deployment group contains individually tagged instances, Amazon EC2 instances in Amazon EC2 Auto Scaling groups, or both.
DynamoDB - DDB
- max item size of 400 KB
DDB - API
DDB - primary key options
- partition key (HASH)
- unique for each item
- diverse enough so that data is distributed
- DynamoDB uses the partition key’s value as input to an internal hash function, to determine the partition
- partition key + sort key (HASH + RANGE)
- composite primary key
- unique
- data is grouped by partition key and sorted by sort key
- All items with the same partition key value are stored together, in sorted order by sort key value.
- can have the same partition key but a different sort key
- In a table that has a partition key and a sort key, it’s possible for multiple items to have the same partition key value. However, those items must have different sort key values.
- source: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html
DDB - R/W capacity modes
- provisioned
- on-demand
- you can only switch modes once every 24 hours
DDB - capacity units
- can be exceeded temporarily by using burst capacity
- if that’s not enough you get a ProvisionedThroughputExceededException (PTEE)
DDB - capacity units - write
- Write Capacity Unit - WCU
- 1 WCU = 1 write/second for an item up to 1 KB
- WCUs = items/second * ceil(item size in KB / 1)
- always round up to the next full KB
DDB - capacity units - read
- Read Capacity Unit - RCU
- always round up to the next full 4 KB
- eventually consistent read (default)
- RCUs = (items/second / 2) * ceil(item size in KB / 4)
- strongly consistent read
- get correct data immediately after a write
- RCUs = items/second * ceil(item size in KB / 4)
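The read/write formulas above, written as code. A minimal sketch: the function names are mine, but the math matches the 1 KB write / 4 KB read rounding rules.

```python
import math

# 1 WCU = one 1 KB write/sec; 1 RCU = one strongly consistent 4 KB
# read/sec (an eventually consistent read costs half).

def wcu(items_per_sec: int, item_kb: float) -> int:
    """Write capacity units: round each item up to the next full KB."""
    return items_per_sec * math.ceil(item_kb / 1)

def rcu(items_per_sec: int, item_kb: float, strongly_consistent: bool = False) -> float:
    """Read capacity units: round each item up to the next full 4 KB."""
    units = items_per_sec * math.ceil(item_kb / 4)
    return units if strongly_consistent else units / 2

# 10 writes/sec of 2.5 KB items -> 10 * ceil(2.5) = 30 WCU
print(wcu(10, 2.5))
# 10 strongly consistent reads/sec of 6 KB items -> 10 * 2 = 20 RCU
print(rcu(10, 6, strongly_consistent=True))
```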
DDB - partition internals
- partition keys go through a hashing algorithm to determine which partition to write to
- WCUs and RCUs are spread evenly across partitions
- 10 partitions | 10 WCU and RCU = 1 CU/partition
DDB - partition internals - hot keys
- if you get a PTEE you may have hot keys
- one partition receives too many reads
- or very large items
- mitigations:
- exponential backoff
- distribute partition keys
- use DAX, the DDB accelerator
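The exponential-backoff mitigation mentioned above, sketched as a delay generator. The base and cap values are arbitrary choices for illustration, not AWS SDK defaults (the SDKs implement their own retry policies).

```python
import random

# Exponential backoff with optional jitter: the usual client-side response
# to a ProvisionedThroughputExceededException (or any 429/throttle error).

def backoff_delays(retries: int, base: float = 0.05, cap: float = 5.0,
                   jitter: bool = True):
    """Yield one sleep duration per retry: base * 2^attempt, capped,
    optionally scaled by random jitter to avoid synchronized retries."""
    for attempt in range(retries):
        delay = min(cap, base * (2 ** attempt))
        yield delay * random.random() if jitter else delay

# Deterministic view of the schedule (no jitter): 1, 2, 4, then capped at 5.
print(list(backoff_delays(4, base=1.0, cap=5.0, jitter=False)))
```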
DDB - PartiQL
- SQL compatible
- CRUD compatible
- no joins
DDB - indexes
- Local Secondary Index (LSI) —> uses the RCUs and WCUs of the main table
- an alternative sort key —> one scalar attribute
- max 5/table
- defined at table creation time
- can contain some or all attributes
- Global Secondary Index (GSI)
- alternative primary key —> only scalar attributes
- speed-up queries
- can contain some or all attributes
- must provision WCU and RCU
- if this index is throttled the main table/primary key will also be throttled
- The key values in a global secondary index do not need to be unique
DDB - DAX
- managed, HA, seamless in-memory cache for DDB.
- saves fetched items
- solves hot key problem
- 5 min TTL for cache
- max 10 nodes
- multi-AZ
- secure via KMS
- can be used together with ElastiCache
- ElastiCache stores aggregation results: sum, min, max, …
DDB - streams
- ordered stream of item-level modifications (create/update/delete)
- send to
- KDS
- Lambda - sync - use event source mapping
- Kinesis Client Library (KCL)
- data retention max 24 hours
- use cases: feed OpenSearch, send a welcome mail on user creation
- managed by AWS, shards scale automatically
DDB - transactions
- like RDBMS, to be ACID
- consume double capacity: items * ceil(item size in KB / 1) * 2
DDB - write-sharding
- add a suffix to the partition key to spread writes when a single key is too hot
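One common sharding scheme hashes a second attribute to pick the suffix deterministically, so readers can recompute which shard an item went to. A sketch under that assumption; the key format, shard count, and attribute names are made up.

```python
import hashlib

# Write sharding: spread writes for a hot partition key across N suffixed
# keys, e.g. "2024-06-01" -> "2024-06-01#0" .. "2024-06-01#9".

def sharded_key(partition_key: str, spread_attr: str, shards: int = 10) -> str:
    """Derive a shard suffix from another attribute (e.g. an order id),
    so the same item always lands on the same shard."""
    suffix = int(hashlib.md5(spread_attr.encode()).hexdigest(), 16) % shards
    return f"{partition_key}#{suffix}"

print(sharded_key("2024-06-01", "order-123"))
```

Queries then fan out over all N suffixed keys and merge the results client-side; that is the price paid for the even write distribution.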
DDB - global tables
If you have globally dispersed users, consider using global tables. With global tables, you can specify the AWS Regions where you want the table to be available. This can significantly reduce latency for your users. So, reducing the distance between the client and the DynamoDB endpoint is an important performance fix to be considered.
API gateway
API gateway - API caching
- API Gateway provides a few strategies for optimizing your API to improve responsiveness:
- response caching
- payload compression.
- results in:
- reduction in the number of calls made to endpoint
- improve the latency of requests to your API.
API Gateway - endpoint types
- edge-optimized (default) - routed through CloudFront edge locations
- regional
- private VPC
API Gateway - security
- IAM roles
- cognito
- custom authorizers (Lambda)
API Gateway - deployment stages
- is a logical reference to a lifecycle state of your API (for example, ‘dev’, ‘prod’, ‘beta’, ‘v2’). API stages are identified by API ID and stage name. You use a stage to manage and optimize a particular deployment.
- For example, you can configure stage settings to enable caching, customize request throttling, configure logging, define stage variables, or attach a canary release for testing. After the initial deployment, you can add more stages and associate them with existing deployments. You can use the API Gateway console to create a new stage, or you can choose an existing stage while deploying an API. In general, you can add a new stage to an API deployment before redeploying the API.
- changes are not live until you make a deployment
- changes are deployed to stages, can have many
- history is kept so rollback can be done
API Gateway - stage variables
- like env variables but for API Gateway
- passed to the Lambda function (in the event for proxy integrations, via mapping templates otherwise)
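With a Lambda proxy integration, stage variables arrive in the event payload, so a single function can switch behavior per stage. A sketch: the `tableName` variable and default value are made up; `stageVariables` is the real field name in the proxy-integration event.

```python
# Hypothetical handler reading an API Gateway stage variable to pick a
# backend resource per stage (dev/test/prod).

def handler(event, context):
    # API Gateway sends null when no stage variables are set, hence `or {}`.
    stage_vars = event.get("stageVariables") or {}
    table = stage_vars.get("tableName", "app-table-dev")
    return {"statusCode": 200, "body": f"using table {table}"}

print(handler({"stageVariables": {"tableName": "app-table-prod"}}, None))
```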
API Gateway - integration types
- mock
- HTTP/AWS
- config request and response
- setup data mapping between request and response
- AWS proxy / lambda proxy
- the client request is passed through to Lambda; the function is responsible for the response format
- HTTP proxy
- request is passed to backends
- can add API key
API Gateway - mapping templates
- modify request / response
- modify query string, body, headers, …
- uses VTL, Velocity Template Language
- content can be marshalled to JSON/XML
API Gateway - open API spec
request validation
- can perform basic request validation with or without an OpenAPI definition
API Gateway - usage plan and API keys
- usage plan
- a client can only access certain APIs
- can be used in conjunction with API keys
API Gateway - HTTP API
- low latency, cost-effective
- supports OIDC and OAuth 2.0, but no usage plans or API keys
API Gateway - REST API
- all the same features, except native OpenID Connect / OAuth 2.0 (JWT) authorizers
CICD
- CodeCommit
- storing code, similar to GitHub
- IAM username and password credentials cannot be used to access CodeCommit
- CodePipeline
- CICD
- automates the build, test, and deploy phases of your release process every time there is a code change.
- CodeBuild
- building and testing code
- fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy
- CodeBuild timeouts
- by setting the timeout configuration, the build process automatically terminates once the configured timeout expires
- CodeDeploy
- deploys to EC2 (unlike Beanstalk, it does not provision instances)
- has an agent for EC2
- deploy types:
- all at once
- half at a time
- 1 at a time
- blue/green
- CodeStar
- manage software development activities in one place
- CodeArtifact
- store, publish and share software packages
CodeBuild - encrypted artifacts
- specify KMS to encrypt the artifacts with
- to encrypt its build output artifacts, it needs access to an AWS KMS customer master key (CMK).
- CODEBUILD_KMS_KEY_ID
CodeGuru
- automated code reviews on merge/pull requests using machine learning
- static code analyzer
- profiler
Serverless Application Model - SAM
- framework for developing and deploying serverless applications
- YAML that is transformed into a CloudFormation template
- uses CodeDeploy, lambda, API Gateway, DynamoDB
```mermaid
graph LR
    SAM[SAM + application code] --transforms to--> CF[CF template + application code]
    CF --> S3
    S3 --> CFS[CF Service]
```
```yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: |
  An example RESTful service
Resources:
  ExampleFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: nodejs6.10
      Handler: index.handler
      Events:
        ListCustomers:
          Type: Api
          Properties:
            Path: /
            Method: any
```
SAM - accelerate
- reduces deployment time while iterating on a resource
SAM - sync
- the sam sync command deploys code changes without a full CloudFormation deployment
SAR - Serverless Application Repository
- a managed repository for serverless applications
- enables teams, organizations, and individual developers to store and share reusable applications, and easily assemble and deploy serverless architectures in powerful new ways.
- you don’t need to clone, build, package, or publish source code to AWS before deploying it; instead, you can use pre-built applications
AWS Cloud Development Kit - CDK
- define cloud infra using programming Language
- contains high level components called constructs
- will be compiled to CloudFormation template
- steps to take
- Create the app from a template provided by AWS CDK -> Add code to the app to create resources within stacks -> Build the app (optional) -> Synthesize one or more stacks in the app -> Deploy stack(s) to your AWS account
CDK - constructs
- L1
- CFN resources
- all resources available in CF
- L2
- as L1 but with convenient defaults and less boilerplate
- L3
- patterns
- multiple related resources that complete common tasks
Cognito
- way to give users identity to interact with applications
Cognito - sync
- an AWS service and client library that enables cross-device syncing of application-related user data.
- You can use it to synchronize user profile data across mobile devices and the web without requiring your own backend.
Cognito - User Pools - CUP
- serverless database of users for your apps and APIs
- simple login, password reset, email and phone verification
- federated IDs: Facebook, Google, …
- can invoke lambda function on triggers/hooks
- authentication events
- sign-up, pre-sign up
- token creation
Cognito - Identity Pools
- temporary AWS credentials
- obtained through STS
- can access AWS services directly or through API Gateway
- the IAM policies applied are defined in Cognito
- use policy variables to partition your users’ access
- default IAM roles for authenticated and guest users
- define rules to choose the role based on the user ID
- can verify via:
- public providers, Facebook, Google, …
- cognito pool (CUP)
- openID and SAML
- custom login server
- allows unauthenticated access / guest access
Step functions
- model workflows as state machines
- e.g.
- invoke AWS services
- run an activity
Step functions - states
- choice
- fail || succeed
- pass: pass input to output || inject fixed data
- wait
- map
- parallel
Step functions - error handling
- Retry
- Catch
- predefined error states:
- States.ALL, matches any error
- States.Timeout, no heartbeat received
- States.TaskFailed, execution failed
- States.Permissions, insufficient permissions
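The Retry/Catch fields take this shape in Amazon States Language, built here as a Python dict. The state name, function ARN, and retry values are placeholders; `States.Timeout`, `States.ALL`, and the field names are the real ASL identifiers.

```python
import json

# A Task state with a Retry policy (exponential backoff) and a Catch-all
# that routes any remaining error to a failure-handling state.
task_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:eu-west-1:111111111111:function:DoWork",
    "Retry": [{
        "ErrorEquals": ["States.Timeout"],
        "IntervalSeconds": 2,
        "MaxAttempts": 3,
        "BackoffRate": 2.0,    # each retry waits twice as long
    }],
    "Catch": [{
        "ErrorEquals": ["States.ALL"],   # matches any error
        "Next": "HandleFailure",
    }],
    "End": True,
}
print(json.dumps(task_state, indent=2))
```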
Step functions - wait for task token - push based
- execution pauses until SendTaskSuccess or SendTaskFailure is called with the task token, either manually or by an automated process (e.g. on EC2)
- PUSH based
Step functions - activity tasks - pull/poll based
- enable tasks to be executed by an activity worker
- workers can be apps running on EC2 or Lambda
- after completion the worker sends SendTaskSuccess || SendTaskFailure
- liveness is checked with a heartbeat
Step functions - standard vs express
- express
- async —> at-least-once execution
- sync —> at-most-once execution
- up to 5 minutes runtime, but a much higher execution rate (100,000/s); logging only via CloudWatch
- Standard Workflows are more suitable for long-running, durable, and auditable workflows where repeating workflow steps is expensive
- use Express Workflows for workloads with high event rates and short durations; they support event rates of more than 100,000 per second
Amplify
- create mobile and web applications
- Amplify Studio, CLI, libraries, hosting
Appsync
- managed GraphQL
- can include multiple services
- retrieve data in real time with WebSockets or MQTT over WebSockets
- auth IAM, API keys, OIDC, User Pools
- HTTPS CF
STS - Security Token Service
- grant limited & temporary access to AWS resources (max 1h)
- AssumeRole (also AssumeRoleWithSAML, AssumeRoleWithWebIdentity)
- GetSessionToken
- GetFederationToken
STS - steps
- define IAM role within ACC or cross account
- define which principals can access this IAM role
- use AWS STS to retrieve credentials and impersonate the IAM role you have access to
- receive temporary credentials
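The AssumeRole request from those steps looks roughly like this, built as a parameter dict. The account ID, role name, and session name are placeholders; with boto3 this dict would be passed as `sts_client.assume_role(**params)`, and the response carries the temporary AccessKeyId / SecretAccessKey / SessionToken.

```python
# Hedged sketch of AssumeRole request parameters.

def assume_role_params(account_id: str, role_name: str, session: str) -> dict:
    return {
        "RoleArn": f"arn:aws:iam::{account_id}:role/{role_name}",
        "RoleSessionName": session,
        "DurationSeconds": 3600,  # per the notes above: max 1 hour
    }

print(assume_role_params("111111111111", "ReadOnlyRole", "audit-session"))
```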
```mermaid
graph LR
    IAM --permissions--> STS[AWS STS]
    user --assume role--> STS
    STS --temporary security credentials--> user
    user --> role
```
STS - steps with MFA
- GetSessionToken
- add an IAM policy using IAM conditions
- aws:MultiFactorAuthPresent: true
- only users with MFA enabled can use the role
IAM policies
- IAM policies and resource policies are both evaluated
```mermaid
graph LR
    start[decision starts at DENY] --> eval[evaluate all policies] --> explicitdeny{explicit deny?}
    explicitdeny --N--> allow{allow?} --N--> denyNo[DENY]
    explicitdeny --Y--> denyYes[DENY]
    allow --Y--> allowYes[ALLOW]
```
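The evaluation flow above, reduced to code: start at implicit deny, an explicit deny in any matching statement wins, otherwise any allow grants access. A simplified sketch that only matches on `Action` and ignores resources, conditions, and wildcard patterns beyond a bare `*`.

```python
# Minimal model of IAM policy evaluation logic.

def evaluate(statements: list, action: str) -> str:
    """Each statement: {"Effect": "Allow"|"Deny", "Action": str}."""
    matching = [s for s in statements if s["Action"] in (action, "*")]
    if any(s["Effect"] == "Deny" for s in matching):
        return "DENY"          # explicit deny always wins
    if any(s["Effect"] == "Allow" for s in matching):
        return "ALLOW"
    return "DENY"              # implicit (default) deny

print(evaluate([{"Effect": "Allow", "Action": "s3:GetObject"}], "s3:GetObject"))
```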
IAM policies - dynamic policies with IAM
- leverage policy variables
IAM policies - MAD - Microsoft active directory
- DB of objects
- users, accounts, computers
- objects are organized in trees; a group of trees is a forest
Key Management Service - KMS
- manages encryption keys for you
- integrated with IAM for authorization
- audit KMS with Cloudtrail
- integrates into most AWS services
- only scoped per region
KMS - CMK - Customer Master Key
- a logical representation of a master key.
- The CMK includes metadata, such as the key ID, creation date, description, and key state. The CMK also contains the key material used to encrypt and decrypt data.
- You can generate CMKs in KMS, in an AWS CloudHSM cluster, or import them from your key management infrastructure.
- supports symmetric and asymmetric CMKs
- supports 3 types:
- customer-managed CMKs
- AWS managed CMKs
- AWS owned CMKs.
KMS - key policies
- default
- gives the root user (and thus the whole account) complete access to the key
- custom
- define users, roles, … that can access the KMS keys
KMS - key types
KMS - envelope encryption and rotation of keys
A secure method of protecting sensitive data by using two encryption keys:
- a data encryption key (DEK)
- a key encryption key (KEK)
The data is first encrypted with the DEK, and the DEK itself is then encrypted with the KEK.
Keys can be rotated easily with the envelope method: you only decrypt the DEKs and re-encrypt them with the new KEK.
That keeps the load low when rotating keys and makes rotation fast.
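The key-wrapping structure can be demonstrated with a toy cipher. XOR stands in for a real cipher here purely to keep the example self-contained; only the envelope pattern itself (bulk data under a DEK, DEK wrapped by a KEK, rotation touching only the wrapped DEK) is the point.

```python
import os

# Toy envelope encryption -- NOT real cryptography.

def xor(data: bytes, key: bytes) -> bytes:
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def encrypt(plaintext: bytes, kek: bytes):
    """Encrypt data with a fresh DEK; return ciphertext + wrapped DEK."""
    dek = os.urandom(16)                        # data encryption key
    return xor(plaintext, dek), xor(dek, kek)   # (ciphertext, wrapped DEK)

def rotate_kek(wrapped_dek: bytes, old_kek: bytes, new_kek: bytes) -> bytes:
    # Rotation only re-wraps the small DEK; the bulk ciphertext is untouched.
    return xor(xor(wrapped_dek, old_kek), new_kek)
```

Notice that `rotate_kek` never sees the ciphertext, which is why envelope rotation stays cheap even for large data sets.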
KMS - snapshot copy
- KMS is only scoped per region
- you need to do the following to copy EBS volumes encrypted with KMS
```mermaid
graph LR
    subgraph RA[region A]
        direction TB
        EBSA[EBS protected with KMS KeyA] --> EBSSA[EBS snapshot KeyA]
    end
    subgraph RB[region B]
        direction TB
        EBSB[EBS protected with KMS KeyB] --> EBSSB[EBS snapshot KeyB]
    end
    RA --KMS reencrypt with KeyB--> RB
```
KMS - throttle
- the request quota is shared across all cryptographic operations (per region)
- use key caching
- increase quota via API or support ticket
KMS - S3 - SSE-KMS
- decreases cost
- data keys
- are used to encrypt S3 objects
- you can easily rotate keys by re-encrypting the data keys
- use S3 Bucket Keys to generate the data keys
- so fewer KMS API calls are made —> less cost and load
KMS - cloud HSM
KMS - SSM parameter store
- SSM - Systems Manager
- secure storage for config and secrets
- can be optionally encrypted with KMS
- via the SecureString type
- which are parameters that have a plaintext parameter name and an encrypted parameter value. Parameter Store uses AWS KMS to encrypt and decrypt the parameter values of Secure String parameters.
- Also, if you are using customer-managed CMKs, you can use IAM policies and key policies to manage to encrypt and decrypt permissions. To retrieve the decrypted value you only need to do one API call.
- has version tracking
- can have TTL
- can assign multiple policies
KMS - Secrets Manager
- newer service - meant for storing secrets
- force rotation of secrets every X days
- auto-generation of secrets on rotation uses Lambda
- uses KMS for encryption
- mainly meant for RDS integration
- can be replicated cross-region
- HA - replica secrets can be promoted to a standalone primary secret in case of failure
- To grant permission to retrieve secret values, you can attach policies to secrets or identities.
KMS - CloudWatch Logs
- encryption is enabled per log group
AWS nitro enclave
- process highly sensitive data in isolated compute environments
AWS - lesser-known services
AWS Glue
- a serverless data integration service that simplifies the process of discovering, preparing, and combining data for analytics, machine learning, and application development
AWS Macie
Amazon Macie is a data security service that discovers sensitive data by using machine learning and pattern matching, provides visibility into data security risks, and enables automated protection against those risks. To help you manage the security posture of your organization’s Amazon Simple Storage Service (Amazon S3) data estate, Macie provides you with an inventory of your S3 buckets, and automatically evaluates and monitors the buckets for security and access control. If Macie detects a potential issue with the security or privacy of your data, such as a bucket that becomes publicly accessible, Macie generates a finding for you to review and remediate as necessary.
Macie also automates the discovery and reporting of sensitive data to provide you with a better understanding of the data that your organization stores in Amazon S3. To detect sensitive data, you can use built-in criteria and techniques that Macie provides, custom criteria that you define, or a combination of the two. If Macie detects sensitive data in an S3 object, Macie generates a finding to notify you of the sensitive data that Macie found.
Macie generates a sensitive data finding when it detects sensitive data in an S3 object that it analyzes to discover sensitive data. This includes analysis that Macie performs when you run a sensitive data discovery job and when it performs automated sensitive data discovery.
Example: you can use Macie to analyze batch-job output in S3 and look for sensitive data findings of type SensitiveData:S3Object/Financial, which implies that the S3 object contains financial information, such as bank account numbers or credit card numbers.
AWS CodeStar
AWS CodeStar enables you to quickly develop, build, and deploy applications on AWS. AWS CodeStar provides a unified user interface, enabling you to easily manage your software development activities in one place. With AWS CodeStar, you can set up your entire continuous delivery toolchain in minutes, allowing you to start releasing code faster. AWS CodeStar makes it easy for your whole team to work together securely, allowing you to easily manage access and add owners, contributors, and viewers to your projects. Each AWS CodeStar project comes with a project management dashboard, including an integrated issue tracking capability powered by Atlassian JIRA Software. With the AWS CodeStar project dashboard, you can easily track progress across your entire software development process, from your backlog of work items to teams’ recent code deployments.