From Zero to Cloud: My Personal Journey into AWS (2026) - A path I am following
Introduction
If you are reading this, you are probably feeling overwhelmed.
You've heard that "The Cloud" is the future. You've opened the AWS console, seen 300+ services, and felt that sinking feeling.
Where do I even start?
This is my personal notebook. I documented every single lesson, mistake, and "Aha!" moment from my journey. I realized that to be a cloud engineer, I couldn't just click buttons—I needed to understand the Hidden Details.
This is the path I am following. I hope it helps you too.
Note for the Community: I wrote this guide to give back to the AWS Community that helped me. My goal is to simplify the complex and help others cross the "Chasm of Confusion".
The Roadmap

I broke my journey down into 30 Learning Modules.
- Module 1: Linux Kernel Internals
- Module 2: Advanced Shell Scripting
- Module 3: Git Internals
- Module 4: Networking Physics
- Module 5: DNS & Traffic Flow
- Module 6: IAM Basics
- Module 7: EC2 Deep Dive
- Module 8: Storage Basics
- Module 9: VPC Networking
- Module 10: Database Engines
- Module 11: Terraform Internals
- Module 12: Serverless Architecture
- Module 13: Docker Internals
- Module 14: EKS Internals
- Module 15: CI/CD Pipelines
- Module 16: Networking at Scale
- Module 17: Security & Encryption
- Module 18: Observability
- Module 19: Cost Optimization
- Module 20: Career Strategy
- Module 21: Transit Gateway & PrivateLink
- Module 22: Kinesis Data Streams
- Module 23: Serverless Data
- Module 24: SageMaker AI
- Module 25: Disaster Recovery
- Module 26: Chaos Engineering
- Module 27: Hybrid Networking
- Module 28: Identity Federation
- Module 29: Generative AI & Bedrock
- Module 30: CI/CD & DevOps Strategy
I had a lot of work to do.
Phase 1: The Operating System (Weeks 1-3)
Module 1: Linux Kernel Internals
You cannot build a cloud if you don't understand the server it runs on.
1. Inode Basics
Every file is just a number to the computer.
- ls -i: See the Inode number (the file's true ID).
- df -i: Check Inode usage.
- Note: You can run out of inodes even if you have 50GB of disk space left.
- Hard Link vs Symlink:
- ln source dest: Hard Link. Shares the same Inode (same data). If you delete the source, the link still works.
- ln -s source dest: Soft Link. A pointer to a path. If you move the source, the link breaks.
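To convince myself of the hard link vs symlink difference, I wrote a small sketch. It assumes a Linux or macOS filesystem, and the file names (source.txt, hard.txt, soft.txt) are made up for the demo.

```python
import os
import tempfile


def link_demo() -> dict:
    """Create a file, a hard link and a symlink, then move the original."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source.txt")
        hard = os.path.join(tmp, "hard.txt")
        soft = os.path.join(tmp, "soft.txt")

        with open(source, "w") as f:
            f.write("hello")

        os.link(source, hard)      # like: ln source hard
        os.symlink(source, soft)   # like: ln -s source soft

        # A hard link is just a second name for the same inode.
        same_inode = os.stat(source).st_ino == os.stat(hard).st_ino

        # Move the original away: the hard link still reaches the data,
        # but the symlink now points at a path that no longer exists.
        os.rename(source, os.path.join(tmp, "moved.txt"))

        with open(hard) as f:
            hard_content = f.read()

        return {
            "same_inode": same_inode,
            "hard_link_content": hard_content,
            "symlink_target_exists": os.path.exists(soft),
        }


if __name__ == "__main__":
    print(link_demo())
```

The hard link keeps working after the move because it never depended on the original path, only on the inode.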
2. Process Basics
- /proc: The virtual filesystem. cat /proc/meminfo reads kernel memory stats directly.
- Zombies: Processes marked Z in top. They are dead, but the parent hasn't read their exit code yet.
- Load Average: Seen in uptime. It is the average number of processes running or waiting for CPU time (on Linux, it also counts processes stuck in uninterruptible I/O), not a CPU percentage.
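Since /proc files are plain text, reading them from code is mostly string parsing. Here is a minimal sketch; the SAMPLE values are invented so the parser can be tried on any machine, and parse_meminfo is my own helper name, not a standard function.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field: number} (kB for most fields)."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        key, rest = line.split(":", 1)
        parts = rest.split()
        if parts and parts[0].isdigit():
            stats[key.strip()] = int(parts[0])
    return stats


# Made-up sample in the same layout as /proc/meminfo.
SAMPLE = """MemTotal:       16384000 kB
MemFree:         2048000 kB
MemAvailable:    8192000 kB
"""

if __name__ == "__main__":
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except FileNotFoundError:  # not on Linux: fall back to the sample
        info = parse_meminfo(SAMPLE)
    print("MemAvailable (kB):", info.get("MemAvailable"))
    if hasattr(os, "getloadavg"):
        # Same three numbers that uptime prints: 1, 5 and 15 minute averages.
        print("Load average:", os.getloadavg())
```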
3. Signal Basics
- kill -15 (SIGTERM): "Please stop nicely." Allows the app to save data before quitting.
- kill -9 (SIGKILL): "Die immediately." The kernel removes the process instantly. Can cause data corruption.
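The difference between the two signals shows up clearly in code: a program can register a handler for SIGTERM, but the OS refuses a handler for SIGKILL. This is a rough sketch assuming Linux and the main thread; the state dictionary and function names are my own.

```python
import signal

# Shared flag the main loop of a real app would check between units of work.
state = {"shutdown_requested": False}


def handle_sigterm(signum, frame):
    """Record the request so the app can save its data and exit cleanly."""
    state["shutdown_requested"] = True


# SIGTERM can be caught, which is what makes a graceful shutdown possible.
signal.signal(signal.SIGTERM, handle_sigterm)


def sigkill_is_catchable() -> bool:
    """Try to register a handler for SIGKILL; the OS rejects the attempt."""
    try:
        signal.signal(signal.SIGKILL, handle_sigterm)
        return True
    except (OSError, ValueError):
        return False
```

A long-running worker would loop until state["shutdown_requested"] becomes true, then flush its work and exit with code 0.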
Module 2: Advanced Shell Scripting
Automate everything.
1. Stream Basics
- STDOUT (1) vs STDERR (2): Learn to use command > file 2>&1 to catch error messages in your log file.
- | (Pipe): Takes the text output of the left command and pushes it as input to the right command.
- exit: Every script should end with exit 0 (success) or a non-zero code such as exit 1 (failure) so other scripts know what happened.
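The same ideas carry over when a script launches other programs. Below is a small sketch of the 2>&1 redirect and exit code check using Python's subprocess module. The CHILD one-liner is a hypothetical command I made up so the example runs anywhere Python is installed.

```python
import subprocess
import sys


def run_and_capture(cmd: list) -> tuple:
    """Run cmd with stderr merged into stdout, like `cmd > file 2>&1`."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # the 2>&1 part
        text=True,
    )
    return result.returncode, result.stdout


# Hypothetical child process: writes to both streams and exits with code 1.
CHILD = (
    "import sys; print('to stdout'); "
    "print('to stderr', file=sys.stderr); sys.exit(1)"
)

if __name__ == "__main__":
    code, output = run_and_capture([sys.executable, "-c", CHILD])
    print("exit code:", code)  # non-zero tells the caller the command failed
    print(output)
```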
2. Text Processing Tools
- awk '{print $1}': Prints the first column. Essential for extracting Process IDs.
- sed -i 's/foo/bar/g' file: Replaces text inside a file instantly.
- grep -r "error" /var/log: Searches for text recursively through folders.
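When a one-liner grows into a real script, I find it useful to know the rough equivalents in a general-purpose language. These are approximations, not full reimplementations of awk, sed, or grep, and the function names are my own.

```python
import re


def first_column(lines):
    """Like awk '{print $1}': first whitespace-separated field, skipping blank lines."""
    return [line.split()[0] for line in lines if line.split()]


def replace_all(text, pattern, replacement):
    """Like sed 's/pattern/replacement/g': replace every match in the text."""
    return re.sub(pattern, replacement, text)


def grep(lines, pattern):
    """Like grep: keep only the lines that contain a match for the pattern."""
    return [line for line in lines if re.search(pattern, line)]


if __name__ == "__main__":
    # Made-up log lines for illustration.
    log = ["1234 nginx running", "5678 sshd error: timeout"]
    print(first_column(log))
    print(replace_all("foo and foo", "foo", "bar"))
    print(grep(log, "error"))
```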
3. Logic Basics
- && (AND): Run the next command only if the first one succeeded.
- || (OR): Run the next command only if the first one failed.
Phase 2: Version Control (Week 4)
Module 3: Git Internals
The "Undo Button" for your infrastructure.
1. Object Basics
- The SHA-1: The 40-character unique ID for every commit.
- .git/HEAD: A file that contains the pointer to your current branch.
- .gitignore: The file that tells Git to ignore sensitive files (like .env or AWS keys).
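The 40-character ID stopped feeling magical once I computed one by hand. Git hashes a short header plus the content. The sketch below does this for a file (a "blob"); commits are hashed the same way with a "commit" header. It assumes a repository using the default SHA-1 object format.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a file with this exact content."""
    # Git prepends "blob <size in bytes>" and a NUL byte before hashing.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # Should match: echo 'hello world' | git hash-object --stdin
    print(git_blob_id(b"hello world\n"))
```

Because the ID depends only on the content, two identical files anywhere in history share one stored object.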
2. History Basics
- git reflog: The "Time Machine." It tracks every movement of HEAD, allowing you to recover deleted commits.
- git commit --amend: Modify the previous commit message or add forgotten files without adding an extra commit to the history. It replaces the last commit, so its SHA changes.
3. Workflow Basics
- git checkout -b feature/name: Create and switch to a new branch.
- git merge --squash: Combine all your messy "work in progress" commits into one clean commit before merging to Main.
Phase 3: Networking Fundamentals (Weeks 5-6)
Module 4: Networking Physics
If the network is slow, it's usually one of these things.
1. Packet Basics
- MTU (1500 bytes): The maximum packet size. If a packet is too big and the "Don't Fragment" flag is set, it gets dropped.
- The Tuple: The 5 things that identify a connection: Source IP, Source Port, Dest IP, Dest Port, Protocol.
2. Analysis Tools
- tcpdump: Capturing raw packets to see what is actually happening on the wire.
- netstat -tulpn: Shows which ports are listening (l) and the Process ID (p) attached to them.
- nc -zv 1.2.3.4 80: Netcat. The fastest way to check if a TCP port is open.
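When netcat is not installed, the same port check is a few lines of code. A minimal sketch, assuming plain TCP; port_is_open is my own helper name and the 2-second timeout is an arbitrary choice.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds (like nc -zv)."""
    try:
        # create_connection performs the TCP handshake, then we close at once.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    print(port_is_open("127.0.0.1", 22))
```

A False result does not say why the check failed. A Security Group that silently drops packets shows up as a timeout, while a closed port on a reachable host is refused immediately.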
Module 5: DNS & Traffic Flow
The phonebook of the internet.
1. Resolution Basics
- /etc/hosts: The file your computer checks before asking a DNS server. Useful for overriding domains locally.
- /etc/resolv.conf: The configuration file that tells Linux which DNS server IP to use (e.g., 8.8.8.8).
2. Record Basics
- A Record: Maps Name to IPv4 (google.com -> 142.250.x.x).
- CNAME: Alias (www.google.com -> google.com).
- Note: You cannot put a CNAME at the "root" domain (the zone apex, such as google.com itself).
- TTL (Time To Live): How long a resolver caches the answer. If TTL is 24 hours, a changed IP can take up to 24 hours to reach everyone.
Phase 4: AWS Core Services (Weeks 7-10)
Module 6: IAM Basics
1. Identity Basics
- Root User: The email you signed up with. It has unlimited power. Lock it away.
- IAM Role: Temporary credentials. EC2 instances assume roles; they do not have passwords.
2. Policy Basics
- Effect: Allow or Deny. (Explicit Deny always wins).
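- Action: s3:ListBucket.
- Resource: arn:aws:s3:::my-bucket/* (the objects in the bucket). The bucket itself is arn:aws:s3:::my-bucket, which is what s3:ListBucket applies to.
- Principal: The entity (User/Service) the statement applies to. It appears in resource-based policies and role trust policies.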
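"Explicit Deny always wins" made more sense to me after writing a toy evaluator. This is a heavily simplified sketch, not the real IAM evaluation logic: it ignores conditions, principals, and case-insensitive action matching, and the policy below (including the secret/ prefix) is a made-up example.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: allow reading every object, but deny one prefix.
POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/*",
        },
        {
            "Effect": "Deny",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::my-bucket/secret/*",
        },
    ],
}


def is_allowed(policy: dict, action: str, resource: str) -> bool:
    """Toy evaluation: default deny, Allow grants, explicit Deny overrides."""
    allowed = False
    for stmt in policy["Statement"]:
        matches = fnmatchcase(action, stmt["Action"]) and fnmatchcase(
            resource, stmt["Resource"]
        )
        if not matches:
            continue
        if stmt["Effect"] == "Deny":
            return False  # an explicit Deny ends the evaluation
        allowed = True
    return allowed
```

Note that a request matching no statement at all is also denied. That is the implicit deny, and it is why new IAM identities start with no permissions.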
Module 7: EC2 Deep Dive
1. Instance Basics
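- User Data: A script that by default runs only on the very first boot. Logs stored in /var/log/cloud-init-output.log.
- Metadata Service: curl http://169.254.169.254/latest/meta-data/. The instance asks itself for its public IP and Role. If the instance enforces IMDSv2, you must first request a session token and send it in the X-aws-ec2-metadata-token header.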
2. SSH Basics
- chmod 400 key.pem: If your private key is readable by others, SSH will refuse to use it.
- ~/.ssh/authorized_keys: The file on the Linux server that holds the Public Key.
Module 8: Storage Basics
1. EBS (Block Storage) Details
- IOPS: Input/Output Operations Per Second. Speed.
- Throughput: Megabytes per second. Volume.
- AZ Lock: An EBS volume in us-east-1a cannot be attached to a server in us-east-1b.
2. S3 (Object Storage) Details
- Bucket Policy: A resource-based policy attached directly to the bucket (unlike identity-based policies, which attach to IAM users and roles).
- Consistency: S3 is now Strongly Consistent. If you write a file and immediately read it, you get the new file.
Module 9: VPC Networking
1. Subnet Basics
- Public Subnet: Has a Route Table entry 0.0.0.0/0 -> Internet Gateway.
- Private Subnet: Has no route to the Internet Gateway. For outbound-only internet access, its Route Table entry is 0.0.0.0/0 -> NAT Gateway.
2. Security Basics
- Security Group: Stateful. Allow Inbound Port 80, the Outbound reply is automatic.
- NACL (Network ACL): Stateless. You must allow Inbound Port 80 and allow the Outbound reply (Ephemeral Ports 1024-65535).
🔬 Hands-On Lab: IOPS Benchmarking
- Launch an EC2.
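- Install fio.
- Run a 4K random write test.
- Observe IOPS plateau at the volume's limit: 3,000 baseline for gp3 (up to 16,000 if you provision more), or 3 IOPS per GiB for gp2 (small volumes can burst to 3,000).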
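Before running the benchmark, it helps to know what number to expect. As far as I understand the published limits, gp2 baseline performance scales with volume size (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000), while gp3 starts at a flat 3,000 regardless of size. The helper below is my own arithmetic sketch of that rule, not an AWS API.

```python
GP3_BASELINE_IOPS = 3000   # flat baseline, independent of volume size
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16000
GP2_IOPS_PER_GIB = 3


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 per GiB, clamped to [100, 16000]."""
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


if __name__ == "__main__":
    for size in (10, 100, 1000, 6000):
        print(f"gp2 {size} GiB -> {gp2_baseline_iops(size)} baseline IOPS")
    print(f"gp3 any size -> {GP3_BASELINE_IOPS} baseline IOPS")
```

If fio reports a higher number than the gp2 baseline on a small volume, that is most likely burst credit being spent, so run the test long enough to see the plateau.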
Phase 5: Databases (Weeks 11-12)
Module 10: Database Engines
1. RDS Basics (SQL)
- Multi-AZ: Synchronous standby in a different zone. Use for high availability (automatic failover within the region).
- Read Replica: Asynchronous copy. Use for scaling Reads.
2. DynamoDB Basics (NoSQL)
- Partition Key: The hash that determines physical storage location. (e.g., UserID).
- Sort Key: Orders items within the partition. (e.g., Timestamp).
- Consistency: Eventual (Default, fast) vs Strong (Slower, guaranteed).
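To picture how the two keys work together, I built a toy table. This is only a mental model: the MD5 hash and the fixed NUM_PARTITIONS are my assumptions, and DynamoDB's real hashing and partition management are internal to the service.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; the real service manages this for you


def partition_for(partition_key: str) -> int:
    """Hash the partition key to pick where the item physically lives."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS


class ToyTable:
    def __init__(self):
        # One dict per partition: partition key -> rows sorted by sort key.
        self.partitions = [defaultdict(list) for _ in range(NUM_PARTITIONS)]

    def put(self, pk: str, sk: str, item: str) -> None:
        rows = self.partitions[partition_for(pk)][pk]
        rows.append((sk, item))
        rows.sort(key=lambda row: row[0])  # keep items ordered by sort key

    def query(self, pk: str) -> list:
        """Touch exactly one partition and return its rows in sort-key order."""
        return list(self.partitions[partition_for(pk)].get(pk, []))
```

The takeaway for me was that a query with a known partition key touches one partition, which is why choosing a key with many distinct values (like UserID) spreads the load evenly.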
Phase 6: Infrastructure as Code (Weeks 13-15)
Module 11: Terraform Internals
1. State Basics
- terraform.tfstate: The heart of Terraform. Maps code to real world IDs.
- State Locking: Use DynamoDB to prevent concurrent apply runs.
2. Lifecycle Helpers
- create_before_destroy: Create the new server before deleting the old one (Zero Downtime).
- ignore_changes: Tell Terraform to ignore manual changes to specific settings (like Auto Scaling counts).
Phase 7: Serverless (Weeks 16-17)
Module 12: Serverless Architecture
1. Execution Basics
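- Cold Start: The latency when AWS spins up a new execution environment (a lightweight microVM) for your function. Interpreted runtimes like Python usually start in milliseconds; JVM-based runtimes like Java can take a second or more.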
- Timeout: Max 15 minutes. Design for short, bursty tasks.
2. Trigger Basics
- Event Source Mapping: The internal poller (e.g., for Kinesis/SQS) that pulls data and invokes Lambda.
- Direct Invocation: Synchronous calls (e.g., from API Gateway).
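The cold start idea maps directly onto how a handler file is laid out: code at module level runs once per execution environment, while the handler function runs on every invocation. A minimal sketch follows; the returned fields are my own choice, not something Lambda requires.

```python
import time

# Module-level code runs once, when a new execution environment starts.
# Expensive setup (SDK clients, config loading) belongs here.
INIT_TIME = time.time()
invocation_count = 0


def handler(event, context):
    """Runs on every invocation; warm invocations reuse the state above."""
    global invocation_count
    invocation_count += 1
    return {
        "cold_start": invocation_count == 1,
        "environment_age_seconds": round(time.time() - INIT_TIME, 3),
        "echo": event,
    }
```

Only the first call in a given environment reports cold_start as true. Later calls skip the setup cost, which is why moving initialization out of the handler tends to pay off.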
Phase 8: Containers (Weeks 18-19)
Module 13: Docker Internals
1. Image Layers
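- Layer Caching: Docker caches every build step. Copy package.json, then RUN npm install, and only then COPY . ., so a source code change does not invalidate the cached dependency layer.
- docker exec: The modern SSH. Runs a command inside the running container's namespaces.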
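The caching rule clicked for me once I modeled it: each layer's cache key depends on its parent layer, the instruction, and the files it copies in. The sketch below is a simplified model of that chain, not Docker's actual algorithm, and the fingerprints like "pkg-v1" stand in for real file checksums.

```python
import hashlib


def layer_keys(steps):
    """steps: list of (instruction, input_fingerprint). One cache key per step."""
    keys = []
    parent = ""
    for instruction, inputs in steps:
        raw = f"{parent}|{instruction}|{inputs}".encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()[:12]
        keys.append(key)
        parent = key  # a changed layer changes every layer after it
    return keys


def cached_steps(previous, current) -> int:
    """Count how many leading steps of `current` can reuse layers of `previous`."""
    count = 0
    for old, new in zip(layer_keys(previous), layer_keys(current)):
        if old != new:
            break
        count += 1
    return count


def good_order(pkg, src):
    # Dependencies first, source code last.
    return [("COPY package.json .", pkg), ("RUN npm install", ""), ("COPY . .", src)]


def bad_order(pkg, src):
    # Everything copied up front, so any edit invalidates npm install too.
    return [("COPY . .", pkg + src), ("RUN npm install", "")]
```

With the good ordering, a source-only change reuses the first two layers and npm install is skipped. With the bad ordering, nothing is reused.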
Module 14: EKS Internals
1. Pods vs Nodes
- Pod: The smallest unit of Kubernetes. One or more containers sharing an IP.
- Node: The EC2 instance running the pods.
2. Sidecars
- Concept: A secondary container in the same Pod (e.g., a logging agent or proxy). They share the Pod's network and can share volumes.
Phase 9: CI/CD (Week 20)
Module 15: CI/CD Pipelines
1. The Stages
- Build: Compile code, run unit tests, build Docker image.
- Test: Deploy to Staging, run Integration tests.
- Deploy: Production rollout (Terraform Apply, Helm Upgrade).
2. Deployment Strategies
- Blue/Green: Two full environments. Switch traffic at the load balancer or via DNS (a DNS switch is only as fast as the TTL allows). Quick rollback.
- Rolling: Replace 10% of instances at a time. No downtime, but slow.
Phase 10: Advanced Engineering (Weeks 21-22)
Module 16: Networking at Scale
1. Transit Gateway Basics
- Hub and Spoke: Connect 100 VPCs to one Router.
- Route Propagation: Automatically learning routes from attached VPCs.
2. PrivateLink Basics
- Interface Endpoint: An Elastic Network Interface (ENI) in your VPC that connects privately to AWS services (like S3) without using the public internet.
Module 17: Security & Encryption
1. KMS Basics
- Envelope Encryption: KMS encrypts a "Data Key." The Data Key encrypts your file.
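- Key Rotation: AWS managed keys rotate their backing key material automatically every year. For customer managed keys, you have to enable automatic rotation, which then also defaults to once a year.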
Module 18: Observability
1. CloudWatch Basics
- Namespace: The container for your metrics (e.g., "MyApp").
- Dimensions: Filtering metrics (e.g., by InstanceId).
2. Tracing Basics
- X-Ray Trace ID: A unique ID passed in HTTP headers to track a request across Microservices.
Module 19: Cost Optimization
1. Spot Basics
- Spot Instance: Up to 90% off, but AWS can reclaim it with a 2-minute warning.
- Savings Plans: Committing to $X/hour for 1 or 3 years. Compute Savings Plans are flexible across regions and instance families.
Module 20: Career Strategy
1. Resume Basics
- Keywords: ATS scanners look for "Terraform," "Kubernetes," "CI/CD," "Python."
- Impact: Don't say "Used AWS." Say "Reduced hosting costs by 20% using Spot Instances."
Phase 11: Networking at Scale (Week 21)
Module 21: Transit Gateway & PrivateLink
1. Transit Gateway (TGW) Details
- Hub & Spoke: TGW acts as a cloud router. You attach VPCs to it.
- Note: TGW has its own Route Table. If you attach a VPC but don't update the TGW Route Table, traffic goes nowhere.
- Route Propagation: TGW can automatically "learn" CIDRs from attached VPCs.
2. PrivateLink Details
- Consumer vs. Provider: The "Provider" (Service Owner) creates a Network Load Balancer (NLB). The "Consumer" (User) creates an Interface Endpoint.
- DNS Names: AWS creates a private DNS name (e.g., vpce-123...amazonaws.com) that resolves to a private IP inside your VPC.
Phase 12: Data Streaming & Analytics (Week 22)
Module 22: Kinesis Data Streams
1. The Shard Physics
- Shard: The unit of throughput. User writes -> Shard -> User reads.
- IteratorAge: The most important metric. If IteratorAge keeps growing, your consumers are too slow. They are "falling behind" the stream.
- Resharding: Merging two cold shards or splitting one hot shard.
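Shards and resharding are easier to reason about once you see how a record finds its shard: Kinesis takes the MD5 hash of the partition key as a 128-bit number, and each shard owns a range of that number space. The sketch below models this with evenly sized ranges, which is an assumption for the demo. The function names are mine.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 produces a 128-bit value


def shard_ranges(num_shards: int) -> list:
    """Split the hash space into contiguous, evenly sized (start, end) ranges."""
    size = HASH_SPACE // num_shards
    ranges = []
    for i in range(num_shards):
        start = i * size
        end = HASH_SPACE - 1 if i == num_shards - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("ranges do not cover the hash space")


def split_shard(ranges: list, index: int) -> list:
    """Resharding: split one hot shard's range into two halves."""
    start, end = ranges[index]
    mid = (start + end) // 2
    return ranges[:index] + [(start, mid), (mid + 1, end)] + ranges[index + 1:]
```

One consequence is that a single very popular partition key always lands on the same shard, so splitting shards does not help if all the traffic shares one key.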
2. Kinesis Firehose
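- The "Fire Hose": Connects Kinesis Streams to S3, Redshift, or OpenSearch (formerly Elasticsearch).
- Buffer: It buffers records until a size or time threshold is hit, for example 5MB of data OR 60 seconds, whichever comes first. Both thresholds are configurable.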
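The "size OR time" rule is a common batching pattern, so here is a small sketch of it. This is my own toy class, not Firehose code. The default thresholds mirror the 5MB and 60 seconds example above, and the caller passes in the current time so the behavior is easy to follow.

```python
class Buffer:
    """Collect records and flush when either the size or the age limit is hit."""

    def __init__(self, max_bytes: int = 5 * 1024 * 1024, max_seconds: float = 60.0):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.started_at = None  # time the oldest buffered record arrived
        self.flushed = []       # batches "delivered" so far

    def add(self, record: bytes, now: float) -> None:
        if self.started_at is None:
            self.started_at = now
        self.records.append(record)
        self.size += len(record)
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> bool:
        """Flush if the buffer is big enough OR old enough. Returns True if flushed."""
        if not self.records:
            return False
        too_big = self.size >= self.max_bytes
        too_old = now - self.started_at >= self.max_seconds
        if too_big or too_old:
            self.flushed.append(self.records)
            self.records, self.size, self.started_at = [], 0, None
            return True
        return False
```

The trade-off is latency versus file count: bigger buffers mean fewer, larger objects in S3, but data takes longer to show up.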
Module 23: Serverless Data
1. Glue (The Crawler)
- Crawler: It looks at your S3 bucket (/data/year=2023/month=01/) and figures out the schema (CSV, JSON, Parquet).
- Data Catalog: The resulting "table definition."
2. Athena (The Query Engine)
- Serverless SQL: You write SQL against S3 files.
- Partitioning: ALWAYS partition by date. WHERE date = '2023-01-01' scans 1 folder. Without partitioning, it scans Petabytes ($$$).
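Partition pruning is really just prefix matching on the folder names. The sketch below shows the idea with Hive-style key=value paths. The day= level and the object names are my own assumptions for the demo, and a real engine does this through the Data Catalog rather than by listing keys.

```python
from datetime import date


def partition_prefix(day: date) -> str:
    """Build the Hive-style folder path for one day of data."""
    return f"data/year={day.year}/month={day.month:02d}/day={day.day:02d}/"


def prune(keys: list, day: date) -> list:
    """Keep only the objects a query for this day would need to scan."""
    prefix = partition_prefix(day)
    return [key for key in keys if key.startswith(prefix)]


if __name__ == "__main__":
    # Made-up object keys for illustration.
    objects = [
        "data/year=2023/month=01/day=01/a.parquet",
        "data/year=2023/month=01/day=02/b.parquet",
        "data/year=2023/month=02/day=01/c.parquet",
    ]
    print(prune(objects, date(2023, 1, 1)))
```

Since Athena bills by data scanned, skipping two of three files here is the same effect that saves real money at scale.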
Phase 13: ML Ops (Week 23)
Module 24: SageMaker AI
1. Training vs. Inference
- Training: Teaching the model (Requires huge GPU for hours/days).
- Inference: Asking the model a question (Requires small CPU/GPU for milliseconds).
- Endpoints: You deploy the model to a SageMaker Endpoint (an HTTPS URL).
2. Cost Optimization
- Spot Training: Use Spot Instances for training jobs. If they are interrupted, SageMaker can resume from the last "Checkpoint" in S3, provided your training script writes checkpoints. Savings can be around 70% or more.
Phase 14: Resilience & Chaos (Week 24)
Module 25: Disaster Recovery
1. RTO vs. RPO
- RPO (Recovery Point Objective): How much data can you lose? (e.g., "Max 15 minutes").
- RTO (Recovery Time Objective): How long until the site is back up? (e.g., "Max 1 hour").
2. The Strategies
- Backup & Restore: Slowest, cheapest. (RTO: Hours).
- Pilot Light: Database is running, Web Servers are off. (RTO: Tens of minutes).
- Warm Standby: Scaled down version running always. (RTO: Minutes).
- Multi-Region Active/Active: Pure gold standard. (RTO: Near Zero).
Module 26: Chaos Engineering
1. AWS Fault Injection Service (FIS)
- The Experiment: "Terminate 20% of EC2 instances in us-east-1."
- Stop Condition: If "OrderSuccessRate" drops below 95%, STOP the experiment automatically. This is the Safety Button.
Phase 15: Hybrid & Enterprise (Week 25)
Module 27: Hybrid Networking
1. VPN vs. Direct Connect
- Site-to-Site VPN: Encrypted tunnel over the public internet. Fast to set up, but unstable latency.
- Direct Connect (DX): Dedicated private connection from your data center (usually through a colocation facility) to AWS. Consistent latency, high bandwidth, takes weeks to install.
Module 28: Identity Federation
1. SAML 2.0
- Concept: Don't create IAM Users. Use your corporate Active Directory (AD).
- Flow: User logs into AD -> AD gives "SAML Assertion" -> User trades Assertion for AWS STS Temp Credentials.
2. Cognito
- User Pools: Sign up/Sign in (a directory of users).
- Identity Pools: Trade a token (Google, FB, User Pool) for temporary IAM creds to access S3 directly.
Phase 17: Generative AI (Week 26)
Module 29: Generative AI & Bedrock
1. Bedrock API
- Unified API: Access Claude, Jurassic, Llama 2 via one API. No servers to manage.
- Tokens: The currency of LLMs. 1000 tokens ≈ 750 words.
2. RAG (Retrieval Augmented Generation)
- The Problem: LLMs can hallucinate, especially about private or recent data they were never trained on.
- The Fix (RAG):
- User asks question.
- Code searches YOUR company PDF docs (Vector DB).
- Code sends "Question + PDF Content" to the LLM.
- LLM answers using only that context.
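The retrieval half of those steps can be sketched without any AI service at all. The toy below ranks documents by word overlap (cosine similarity on word counts) instead of real embeddings in a Vector DB, and it stops at building the prompt rather than calling a model. The documents and the prompt wording are invented for the example.

```python
import math
import re
from collections import Counter

# Made-up "company docs" for illustration.
DOCS = [
    "Employees get 25 days of vacation per year.",
    "The office wifi password rotates every month.",
    "Expense reports are due on the 5th.",
]


def tokenize(text: str) -> list:
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, documents: list, top_k: int = 1) -> list:
    """Step 2: find the documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(
        documents, key=lambda d: cosine(q, Counter(tokenize(d))), reverse=True
    )
    return ranked[:top_k]


def build_prompt(question: str, documents: list) -> str:
    """Step 3: combine the question with the retrieved context."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

A real pipeline would swap the word counts for embeddings and send the prompt to a model, but the flow of question, retrieval, and augmented prompt stays the same.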
Phase 18: The Finale (Week 27)
Module 30: CI/CD & DevOps Strategy
1. The Pipeline
- Commit: Developer pushes code.
- Build: Docker build, Unit Tests.
- Beta: Deploy to Staging, Integration Tests.
- Approval: Manual or Automated check.
- Prod: Blue/Green deployment to Production.
2. What this journey taught me
It wasn't about memorizing 200 services. It was about seeing the pattern:
- Everything is an API.
- Everything fails eventually (Design for failure).
- Security is job zero.
- Cost is a first-class citizen.
I started this journey wanting to master AWS. I ended up realizing that "mastery" isn't a destination. It's just the ability to figure things out one error message at a time.
If you found this roadmap helpful, I write about my ongoing learning in the AWS Community Builders program. Connect with me to share your own journey!