From Zero to Cloud: My Personal Journey into AWS (2026) - A path I am following

Rantideb Howlader · 15 min read

Introduction

If you are reading this, you are probably feeling overwhelmed.

You've heard that "The Cloud" is the future. You've opened the AWS console, seen 300+ services, and felt that sinking feeling.

Where do I even start?

This is my personal notebook. I documented every single lesson, mistake, and "Aha!" moment from my journey. I realized that to be a cloud engineer, I couldn't just click buttons—I needed to understand the Hidden Details.

This is the path I am following. I hope it helps you too.

Note for the Community: I wrote this guide to give back to the AWS Community that helped me. My goal is to simplify the complex and help others cross the "Chasm of Confusion".

The Roadmap

I broke my journey down into 30 Learning Modules:

  1. Module 1: Linux Kernel Internals
  2. Module 2: Advanced Shell Scripting
  3. Module 3: Git Internals
  4. Module 4: Networking Physics
  5. Module 5: DNS & Traffic Flow
  6. Module 6: IAM Basics
  7. Module 7: EC2 Deep Dive
  8. Module 8: Storage Basics
  9. Module 9: VPC Networking
  10. Module 10: Database Engines
  11. Module 11: Terraform Internals
  12. Module 12: Serverless Architecture
  13. Module 13: Docker Internals
  14. Module 14: EKS Internals
  15. Module 15: CI/CD Pipelines
  16. Module 16: Networking at Scale
  17. Module 17: Security & Encryption
  18. Module 18: Observability
  19. Module 19: Cost Optimization
  20. Module 20: Career Strategy
  21. Module 21: Transit Gateway & PrivateLink
  22. Module 22: Kinesis Data Streams
  23. Module 23: Serverless Data
  24. Module 24: SageMaker AI
  25. Module 25: Disaster Recovery
  26. Module 26: Chaos Engineering
  27. Module 27: Hybrid Networking
  28. Module 28: Identity Federation
  29. Module 29: Generative AI & Bedrock
  30. Module 30: CI/CD & DevOps Strategy

I had a lot of work to do.

Phase 1: The Operating System (Weeks 1-3)

Module 1: Linux Kernel Internals

You cannot build a cloud if you don't understand the server it runs on.

1. Inode Basics

Every file is just a number to the computer.

  • ls -i: See the Inode number (the file's true ID).
  • df -i: Check Inode usage.
    • Note: You can run out of inodes even if you have 50GB of disk space left.
  • Hard Link vs Symlink:
    • ln source dest: Hard Link. Shares the same Inode (same data). If you delete the source, the link still works.
    • ln -s source dest: Soft Link. A pointer to a path. If you move the source, the link breaks.
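To convince myself of the hard link vs. symlink difference, I find it helps to run it. This is a minimal sketch, assuming a Linux-style filesystem that supports both link types; the file names are made up for the demo, and `link_demo` is just my own helper name.

```python
import os
import tempfile

def link_demo():
    """Create a file, a hard link and a symlink, then delete the original name."""
    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "source.txt")
        hard = os.path.join(tmp, "hard.txt")
        soft = os.path.join(tmp, "soft.txt")

        with open(source, "w") as f:
            f.write("hello")

        os.link(source, hard)       # same idea as: ln source hard
        os.symlink(source, soft)    # same idea as: ln -s source soft

        same_inode = os.stat(source).st_ino == os.stat(hard).st_ino
        link_count = os.stat(source).st_nlink   # two names, one inode

        os.remove(source)           # delete the original name

        with open(hard) as f:
            hard_still_readable = f.read() == "hello"
        # exists() follows the symlink, which now points at nothing
        soft_still_resolves = os.path.exists(soft)

        return same_inode, link_count, hard_still_readable, soft_still_resolves

print(link_demo())
```

The hard link keeps working because the data lives on the inode, not on the name. The symlink only stored a path, so it dangles once that path is gone.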

2. Process Basics

  • /proc: The virtual filesystem. cat /proc/meminfo reads kernel memory stats directly.
  • Zombies: Processes marked Z in top. They are dead, but the parent hasn't read their exit code yet.
  • Load Average: Seen in uptime. It is the average number of processes running or waiting to run (on Linux, also those stuck in uninterruptible I/O wait), not a CPU percentage. Compare it to your core count.
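Since /proc is just text, you can parse it yourself. Here is a small sketch that reads meminfo-style text into a dictionary. The sample numbers are invented so the example runs anywhere; on a real Linux box you would pass in `open("/proc/meminfo").read()` instead. Most fields are reported in kB, but a few (like the HugePages counters) are plain counts.

```python
def parse_meminfo(text):
    """Turn /proc/meminfo-style text into a dict of {field: integer value}."""
    stats = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, value = line.split(":", 1)
        parts = value.split()
        if parts:
            stats[name.strip()] = int(parts[0])   # drop the trailing "kB" unit
    return stats

# Invented sample values, laid out the way the kernel prints them.
SAMPLE = """MemTotal:       16384000 kB
MemFree:         2048000 kB
MemAvailable:    8192000 kB
"""

stats = parse_meminfo(SAMPLE)
used_pct = 100 * (1 - stats["MemAvailable"] / stats["MemTotal"])
print(f"{used_pct:.0f}% of memory in use")
```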

3. Signal Basics

  • kill -15 (SIGTERM): "Please stop nicely." Allows the app to save data before quitting.
  • kill -9 (SIGKILL): "Die immediately." The kernel removes the process instantly. Can cause data corruption.

Module 2: Advanced Shell Scripting

Automate everything.

1. Stream Basics

  • STDOUT (1) vs STDERR (2): Learn to use command > file 2>&1 to catch error messages in your log file.
  • | (Pipe): Takes the text output of the left command and pushes it as input to the right command.
  • exit: A script should end with exit 0 (success) or a non-zero code like exit 1 (failure) so other scripts know what happened. Without an explicit exit, the script returns the status of its last command.
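The same three ideas (two output streams, merging them, and an exit code) show up whenever you launch a command from code. A small sketch using Python's subprocess module; the child program is a throwaway one-liner I made up so the example is self-contained.

```python
import subprocess
import sys

# A tiny child program that writes to both streams and exits with code 3.
CHILD = (
    "import sys;"
    "print('normal output');"
    "print('something broke', file=sys.stderr);"
    "sys.exit(3)"
)

# Keep the streams separate (like: cmd > out.txt 2> err.txt).
separate = subprocess.run([sys.executable, "-c", CHILD],
                          capture_output=True, text=True)

# Merge STDERR into STDOUT (like: cmd > file 2>&1).
merged = subprocess.run([sys.executable, "-c", CHILD],
                        stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                        text=True)

print("exit code:", separate.returncode)
print("stdout only:", separate.stdout.strip())
print("stderr only:", separate.stderr.strip())
print("merged:", merged.stdout.split())
```

In the merged case the order of the two lines is not guaranteed, because the two streams are buffered differently. That is worth remembering when you read combined log files.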

2. Text Processing Tools

  • awk '{print $1}': Prints the first column. Essential for extracting fields like Process IDs (the PID is $1 in ps -e output, $2 in ps aux).
  • sed -i 's/foo/bar/g' file: Replaces text inside a file instantly.
  • grep -r "error" /var/log: Searches for text recursively through folders.
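When a one-liner grows into a real script, I often end up rewriting it in Python. Here is a sketch of rough equivalents for the three tools, run against an invented log snippet. These mimic the basic behavior only, not every flag.

```python
import re

# Invented sample: a PID, a service name, and a status message per line.
LOG = """\
1042 nginx ok
2077 app error: timeout
3110 app ok
4521 worker error: disk full
"""

lines = LOG.splitlines()

# awk '{print $1}': first whitespace-separated column of every line.
first_column = [line.split()[0] for line in lines]

# grep "error": keep only the lines that contain the pattern.
error_lines = [line for line in lines if re.search(r"error", line)]

# sed 's/error/ERROR/g': replace every match on every line.
replaced = [re.sub(r"error", "ERROR", line) for line in lines]

print(first_column)
print(error_lines)
print(replaced[1])
```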

3. Logic Basics

  • && (AND): Run the next command only if the first one succeeded.
  • || (OR): Run the next command only if the first one failed.

Phase 2: Version Control (Week 4)

Module 3: Git Internals

The "Undo Button" for your infrastructure.

1. Object Basics

  • The SHA-1: The 40-character unique ID for every commit.
  • .git/HEAD: A file that contains the pointer to your current branch.
  • .gitignore: The file that tells Git to ignore sensitive files (like .env or AWS keys).
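Those 40 characters are not random. For a file (a "blob" in Git terms), the ID is the SHA-1 of a short header plus the content. Commits are hashed the same way with a different header. A minimal sketch; `git_blob_id` is my own helper name, not a Git API.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Compute the object ID Git assigns to a file with this content."""
    # Git prepends "blob <size in bytes>" and a NUL byte before hashing.
    header = b"blob " + str(len(content)).encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello world\n"))
print(len(git_blob_id(b"hello world\n")))   # 40 hex characters
```

You can compare the result against `git hash-object` on a file with the same bytes. Same content always gives the same ID, which is why Git can deduplicate files across commits.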

2. History Basics

  • git reflog: The "Time Machine." It tracks every movement of HEAD, allowing you to recover deleted commits.
  • git commit --amend: Modify the previous commit message or add forgotten files without creating a new commit.

3. Workflow Basics

  • git checkout -b feature/name: Create and switch to a new branch.
  • git merge --squash: Combine all your messy "work in progress" commits into one clean commit before merging to Main.

Phase 3: Networking Fundamentals (Weeks 5-6)

Module 4: Networking Physics

If the network is slow, it's usually one of these things.

1. Packet Basics

  • MTU (1500 bytes): The maximum packet size. If a packet is too big and the "Don't Fragment" flag is set, it gets dropped.
  • The Tuple: The 5 things that identify a connection: Source IP, Source Port, Dest IP, Dest Port, Protocol.
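Both ideas are easy to put into numbers. The sketch below assumes plain IPv4 and TCP headers of 20 bytes each (no options), and the IP addresses and ports are made up. It shows how much payload fits in one 1500-byte packet and why two connections to the same server are still distinct.

```python
import math
from collections import namedtuple

FiveTuple = namedtuple("FiveTuple", "src_ip src_port dst_ip dst_port protocol")

MTU = 1500
IPV4_HEADER = 20   # bytes, assuming no IP options
TCP_HEADER = 20    # bytes, assuming no TCP options

def max_segment_size(mtu=MTU):
    """Payload bytes that fit in one packet after the headers."""
    return mtu - IPV4_HEADER - TCP_HEADER

def packets_needed(payload_bytes, mtu=MTU):
    """How many full-size packets it takes to carry a payload."""
    return math.ceil(payload_bytes / max_segment_size(mtu))

# Two browser tabs talking to the same web server: only the source port differs,
# but that is enough to make them separate connections.
connections = {
    FiveTuple("10.0.0.5", 50000, "1.2.3.4", 80, "tcp"): "ESTABLISHED",
    FiveTuple("10.0.0.5", 50001, "1.2.3.4", 80, "tcp"): "ESTABLISHED",
}

print(max_segment_size())          # 1460
print(packets_needed(1_000_000))   # 685
print(len(connections))            # 2
```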

2. Analysis Tools

  • tcpdump: Capturing raw packets to see what is actually happening on the wire.
  • netstat -tulpn: Shows which ports are listening (l) and the Process ID (p) attached to them.
  • nc -zv 1.2.3.4 80: Netcat. The fastest way to check if a TCP port is open.

Module 5: DNS & Traffic Flow

The phonebook of the internet.

1. Resolution Basics

  • /etc/hosts: The file your computer checks before asking a DNS server. Useful for overriding domains locally.
  • /etc/resolv.conf: The configuration file that tells Linux which DNS server IP to use (e.g., 8.8.8.8).

2. Record Basics

  • A Record: Maps Name to IPv4 (google.com -> 142.250.x.x).
  • CNAME: Alias (www.google.com -> google.com).
    • Note: You cannot put a CNAME at the "root" domain.
  • TTL (Time To Live): How long a resolver caches the answer. If TTL is 24 hours, a change to the IP can take up to 24 hours to reach everyone. Lower the TTL before a planned migration.
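The TTL effect is easier to see in a toy resolver. This is a sketch, not how any real resolver is implemented: the `TtlCache` class, the `AUTHORITATIVE` dictionary and the hostname are all made up, and time is passed in by hand so the result is predictable.

```python
class TtlCache:
    """A tiny DNS-style cache: answers expire after their TTL."""

    def __init__(self):
        self._entries = {}   # name -> (ip, expires_at)

    def put(self, name, ip, ttl, now):
        self._entries[name] = (ip, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None:
            return None
        ip, expires_at = entry
        if now >= expires_at:
            del self._entries[name]   # expired, forget it
            return None
        return ip

# Stand-in for the authoritative DNS server (hypothetical record).
AUTHORITATIVE = {"app.example.com": "10.0.0.1"}

def resolve(cache, name, now, ttl=86400):
    cached = cache.get(name, now)
    if cached is not None:
        return cached                 # served from cache, may be stale
    ip = AUTHORITATIVE[name]
    cache.put(name, ip, ttl, now)
    return ip

cache = TtlCache()
first = resolve(cache, "app.example.com", now=0)
AUTHORITATIVE["app.example.com"] = "10.0.0.2"           # the record changes
stale = resolve(cache, "app.example.com", now=3600)     # 1 hour later
fresh = resolve(cache, "app.example.com", now=86400)    # 24 hours later
print(first, stale, fresh)
```

One hour after the change the cache still hands out the old IP. Only once the 24-hour TTL has run out does the resolver ask again and pick up the new one.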

Phase 4: AWS Core Services (Weeks 7-10)

Module 6: IAM Basics

1. Identity Basics

  • Root User: The email you signed up with. It has unlimited power. Lock it away.
  • IAM Role: Temporary credentials. EC2 instances assume roles; they do not have passwords.

2. Policy Basics

  • Effect: Allow or Deny. (Explicit Deny always wins).
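  • Action: The API call, e.g., s3:ListBucket or s3:GetObject.
  • Resource: The ARN the action applies to. s3:ListBucket targets the bucket itself (arn:aws:s3:::my-bucket); object actions like s3:GetObject target arn:aws:s3:::my-bucket/*.
  • Principal: Who the statement applies to (User/Role/Service). Only used in resource-based policies (like bucket policies) and role trust policies.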
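"Explicit Deny always wins" clicked for me once I wrote the rule down as code. This is a heavily simplified sketch: real IAM evaluation also considers conditions, permission boundaries, SCPs, and more, and IAM's wildcard matching is not identical to Python's `fnmatchcase`. The `evaluate` function and the sample statements are my own illustration.

```python
from fnmatch import fnmatchcase

def matches(patterns, value):
    """True if value matches any wildcard pattern (single string or list)."""
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(value, pattern) for pattern in patterns)

def evaluate(statements, action, resource):
    """Simplified decision: explicit Deny > Allow > implicit deny."""
    decision = "ImplicitDeny"             # nothing matched yet
    for statement in statements:
        if matches(statement["Action"], action) and matches(statement["Resource"], resource):
            if statement["Effect"] == "Deny":
                return "ExplicitDeny"     # a Deny ends the discussion
            decision = "Allow"
    return decision

STATEMENTS = [
    {"Effect": "Allow", "Action": "s3:*", "Resource": "arn:aws:s3:::my-bucket/*"},
    {"Effect": "Deny", "Action": "s3:DeleteObject", "Resource": "arn:aws:s3:::my-bucket/*"},
]

print(evaluate(STATEMENTS, "s3:GetObject", "arn:aws:s3:::my-bucket/report.csv"))
print(evaluate(STATEMENTS, "s3:DeleteObject", "arn:aws:s3:::my-bucket/report.csv"))
print(evaluate(STATEMENTS, "s3:GetObject", "arn:aws:s3:::other-bucket/report.csv"))
```

The second call is denied even though the broad Allow also matches, and the third is denied simply because nothing allows it.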

Module 7: EC2 Deep Dive

1. Instance Basics

  • User Data: A script that runs only on the very first boot. Logs stored in /var/log/cloud-init-output.log.
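  • Metadata Service: curl http://169.254.169.254/latest/meta-data/. The instance asks itself for its public IP and Role. With IMDSv2 (required by default on newer instances), you first PUT to /latest/api/token to get a session token, then send it in the X-aws-ec2-metadata-token header.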

2. SSH Basics

  • chmod 400 key.pem: If your private key is readable by others, SSH will refuse to use it.
  • ~/.ssh/authorized_keys: The file on the Linux server that holds the Public Key.

Module 8: Storage Basics

1. EBS (Block Storage) Details

  • IOPS: Input/Output Operations Per Second. Speed.
  • Throughput: Megabytes per second. Volume.
  • AZ Lock: An EBS volume in us-east-1a cannot be attached to a server in us-east-1b.
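IOPS and throughput are linked by the block size, which is why a volume can look "fast" on one number and slow on the other. A quick arithmetic sketch; the 3,000 IOPS and 125 MiB/s figures are just example inputs, not a statement about any particular volume.

```python
def throughput_mib_per_s(iops, block_size_kib):
    """Throughput you get if every operation moves block_size_kib of data."""
    return iops * block_size_kib / 1024

def iops_needed(throughput_mib, block_size_kib):
    """Operations per second needed to sustain a throughput at a block size."""
    return throughput_mib * 1024 / block_size_kib

# 3,000 operations per second with tiny 4 KiB blocks is not much data...
print(throughput_mib_per_s(3000, 4))     # 11.71875 MiB/s

# ...while the same IOPS with 256 KiB blocks would be a lot.
print(throughput_mib_per_s(3000, 256))   # 750.0 MiB/s

# To push 125 MiB/s with 16 KiB blocks you need this many IOPS.
print(iops_needed(125, 16))              # 8000.0
```

Small random I/O (databases) tends to hit the IOPS limit first. Large sequential I/O (backups, logs) tends to hit the throughput limit first.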

2. S3 (Object Storage) Details

  • Bucket Policy: A resource-based policy attached directly to the bucket (unlike identity-based policies, which attach to IAM users, groups, and roles).
  • Consistency: S3 is now Strongly Consistent. If you write a file and immediately read it, you get the new file.

Module 9: VPC Networking

1. Subnet Basics

  • Public Subnet: Has a Route Table entry 0.0.0.0/0 -> Internet Gateway.
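  • Private Subnet: Has no route to an Internet Gateway. For outbound-only internet access, it gets a Route Table entry 0.0.0.0/0 -> NAT Gateway (the NAT Gateway itself sits in a public subnet).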
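A route table picks the most specific route that matches the destination. Here is a small sketch of that lookup using Python's ipaddress module. The CIDR ranges and the target names (`igw-example`, `nat-example`) are placeholders, not real resource IDs.

```python
import ipaddress

def pick_route(route_table, destination):
    """Return the target of the most specific route that contains destination."""
    dest = ipaddress.ip_address(destination)
    candidates = []
    for cidr, target in route_table.items():
        network = ipaddress.ip_network(cidr)
        if dest in network:
            candidates.append((network.prefixlen, target))
    if not candidates:
        return None
    # Longest prefix wins: /16 beats /0.
    return max(candidates)[1]

PUBLIC_SUBNET_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "igw-example"}
PRIVATE_SUBNET_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-example"}

print(pick_route(PUBLIC_SUBNET_ROUTES, "10.0.1.25"))   # stays inside the VPC
print(pick_route(PUBLIC_SUBNET_ROUTES, "8.8.8.8"))     # out via the Internet Gateway
print(pick_route(PRIVATE_SUBNET_ROUTES, "8.8.8.8"))    # out via the NAT Gateway
```

The only thing that makes a subnet "public" or "private" here is where its default route points.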

2. Security Basics

  • Security Group: Stateful. Allow Inbound Port 80, the Outbound reply is automatic.
  • NACL (Network ACL): Stateless. You must allow Inbound Port 80 and allow the Outbound reply (Ephemeral Ports 1024-65535).

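🔬 Hands-On Lab: IOPS Benchmarking (EBS, from Module 8)

  1. Launch an EC2.
  2. Install fio.
  3. Run a 4K random write test.
  4. Observe where IOPS plateau. A default gp3 volume sits at its 3,000 IOPS baseline unless you provision more. A gp2 volume scales with size (3 IOPS per GiB, bursting to 3,000 on small volumes, capped at 16,000). The instance type has its own EBS limit too.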

Phase 5: Databases (Weeks 11-12)

Module 10: Database Engines

1. RDS Basics (SQL)

  • Multi-AZ: Synchronous standby in a different zone with automatic failover. Use for High Availability (it protects against an AZ failure, not a Region failure).
  • Read Replica: Asynchronous copy. Use for scaling Reads.

2. DynamoDB Basics (NoSQL)

  • Partition Key: The hash that determines physical storage location. (e.g., UserID).
  • Sort Key: Orders items within the partition. (e.g., Timestamp).
  • Consistency: Eventual (Default, fast) vs Strong (Slower, guaranteed).
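Here is how I picture the two keys working together. This is a toy model only: DynamoDB's real hash function and partition count are internal, so the MD5 hash, the four partitions, and the `ToyTable` class are all assumptions I made for illustration.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4   # arbitrary for the demo

def partition_for(partition_key):
    """Hash the partition key to choose a storage partition."""
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest, "big") % NUM_PARTITIONS

class ToyTable:
    def __init__(self):
        # partition number -> {partition key -> {sort key -> item}}
        self._partitions = defaultdict(dict)

    def put(self, pk, sk, item):
        self._partitions[partition_for(pk)].setdefault(pk, {})[sk] = item

    def query(self, pk, sk_at_least=""):
        """All items for one partition key, ordered by sort key."""
        rows = self._partitions[partition_for(pk)].get(pk, {})
        return [rows[sk] for sk in sorted(rows) if sk >= sk_at_least]

table = ToyTable()
table.put("user-1", "2024-03-01T10:00", "third login")
table.put("user-1", "2024-01-01T09:00", "first login")
table.put("user-1", "2024-02-01T08:00", "second login")
table.put("user-2", "2024-01-15T12:00", "other user")

print(table.query("user-1"))
print(table.query("user-1", sk_at_least="2024-02"))
```

A query only touches the one partition that the hash points to, and the sort key gives you cheap range reads inside it. That is why "get all events for this user since February" is fast and "get all events in February for all users" is not.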

Phase 6: Infrastructure as Code (Weeks 13-15)

Module 11: Terraform Internals

1. State Basics

  • terraform.tfstate: The heart of Terraform. Maps code to real world IDs.
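  • State Locking: Prevents two apply runs from writing state at the same time. With the S3 backend this has traditionally been a DynamoDB table; recent Terraform versions can also lock with a lock file in S3 itself.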

2. Lifecycle Helpers

  • create_before_destroy: Create the new server before deleting the old one (Zero Downtime).
  • ignore_changes: Tell Terraform to ignore manual changes to specific settings (like Auto Scaling counts).

Phase 7: Serverless (Weeks 16-17)

Module 12: Serverless Architecture

1. Execution Basics

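  • Cold Start: The extra latency when AWS has to create a new execution environment for your function. Lightweight runtimes like Python typically start in milliseconds; the JVM often takes a second or more without tuning.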
  • Timeout: Max 15 minutes. Design for short, bursty tasks.
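One practical consequence of cold starts: code at module level runs once per execution environment, while the handler runs on every invocation. A minimal sketch of that pattern, using the standard `handler(event, context)` signature for Python Lambdas; the config contents and the `INIT_COUNT` counter are made up so the effect is visible locally.

```python
import time

INIT_COUNT = 0   # only here to make the "runs once" behavior visible

def _load_config():
    """Stand-in for slow setup work: SDK clients, DB connections, config."""
    global INIT_COUNT
    INIT_COUNT += 1
    return {"table": "example-table", "loaded_at": time.time()}

# Module level: paid once per cold start, reused by warm invocations.
CONFIG = _load_config()

def handler(event, context):
    """Per-request work only; the expensive setup is already done."""
    name = event.get("name", "world")
    return {"statusCode": 200, "body": f"hello {name} from {CONFIG['table']}"}

# Simulate two warm invocations in the same environment.
print(handler({"name": "Ranti"}, None))
print(handler({}, None))
print("init ran", INIT_COUNT, "time(s)")
```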

2. Trigger Basics

  • Event Source Mapping: The internal poller (e.g., for Kinesis/SQS) that pulls data and invokes Lambda.
  • Direct Invocation: Synchronous calls (e.g., from API Gateway).

Phase 8: Containers (Weeks 18-19)

Module 13: Docker Internals

1. Image Layers

  • Layer Caching: Docker caches every build step. COPY package*.json first, then RUN npm install, and only then COPY . . so that code changes don't invalidate the cached dependency layer.
  • docker exec: The modern SSH. Enters the running container's namespace.

Module 14: EKS Internals

1. Pods vs Nodes

  • Pod: The smallest unit of Kubernetes. One or more containers sharing an IP.
  • Node: The EC2 instance running the pods.

2. Sidecars

  • Concept: A secondary container in the same Pod (e.g., a logging agent or proxy). Containers in a Pod share the network namespace (they can talk over localhost) and can share volumes.

Phase 9: CI/CD (Week 20)

Module 15: CI/CD Pipelines

1. The Stages

  • Build: Compile code, run unit tests, build Docker image.
  • Test: Deploy to Staging, run Integration tests.
  • Deploy: Production rollout (Terraform Apply, Helm Upgrade).

2. Deployment Strategies

  • Blue/Green: Two full environments. Switch traffic at the load balancer or via DNS (a DNS cutover is only as fast as the TTL). Quick rollback.
  • Rolling: Replace 10% of instances at a time. No downtime, but slow.

Phase 10: Advanced Engineering (Weeks 21-22)

Module 16: Networking at Scale

1. Transit Gateway Basics

  • Hub and Spoke: Connect 100 VPCs to one Router.
  • Route Propagation: Automatically learning routes from attached VPCs.
  • Interface Endpoint: An Elastic Network Interface (ENI) in your VPC that connects privately to AWS services (like S3) without using the public internet.

Module 17: Security & Encryption

1. KMS Basics

  • Envelope Encryption: KMS encrypts a "Data Key." The Data Key encrypts your file.
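  • Key Rotation: AWS managed keys rotate their backing key material automatically every year. For customer managed keys, automatic rotation is optional and you have to turn it on.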

Module 18: Observability

1. CloudWatch Basics

  • Namespace: The container for your metrics (e.g., "MyApp").
  • Dimensions: Filtering metrics (e.g., by InstanceId).

2. Tracing Basics

  • X-Ray Trace ID: A unique ID passed in HTTP headers to track a request across Microservices.

Module 19: Cost Optimization

1. Spot Basics

  • Spot Instance: Up to 90% off, but AWS can reclaim it with a 2-minute warning.
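  • Savings Plans: Committing to $X/hour for 1 or 3 years. Compute Savings Plans apply across Regions and instance families; EC2 Instance Savings Plans are tied to one Region and family.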

Module 20: Career Strategy

1. Resume Basics

  • Keywords: ATS scanners look for "Terraform," "Kubernetes," "CI/CD," "Python."
  • Impact: Don't say "Used AWS." Say "Reduced hosting costs by 20% using Spot Instances."

Phase 11: Networking at Scale (Week 21)

Module 21: Transit Gateway & PrivateLink

1. Transit Gateway (TGW) Details

  • Hub & Spoke: TGW acts as a cloud router. You attach VPCs to it.
    • Note: TGW has its own Route Table. If you attach a VPC but don't update the TGW Route Table, traffic goes nowhere.
  • Route Propagation: TGW can automatically "learn" CIDRs from attached VPCs.

2. PrivateLink Details

  • Consumer vs. Provider: The "Provider" (Service Owner) puts the service behind a Network Load Balancer (NLB) and publishes it as an endpoint service. The "Consumer" (User) creates an Interface Endpoint in their own VPC to reach it.
  • DNS Names: AWS creates a private DNS name (e.g., vpce-123...amazonaws.com) that resolves to a private IP inside your VPC.

Phase 12: Data Streaming & Analytics (Week 22)

Module 22: Kinesis Data Streams

1. The Shard Physics

  • Shard: The unit of throughput. User writes -> Shard -> User reads.
  • IteratorAge: The most important metric. It measures how old records are by the time your consumer reads them. If it keeps growing, your consumers are too slow and are "falling behind" the stream.
  • Resharding: Merging two cold shards or splitting one hot shard.
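What finally made shards click for me: Kinesis takes the MD5 hash of your partition key, treats it as a 128-bit number, and sends the record to whichever shard owns that slice of the number line. The sketch below models that with evenly split ranges. The helper names are mine, and a real stream reports its actual ranges per shard.

```python
import hashlib

HASH_SPACE = 2 ** 128   # MD5 produces 128-bit values

def even_shard_ranges(shard_count):
    """Split the hash space into contiguous (start, end) ranges."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        end = HASH_SPACE - 1 if i == shard_count - 1 else (i + 1) * size - 1
        ranges.append((start, end))
    return ranges

def shard_for(partition_key, ranges):
    """Index of the shard whose range contains MD5(partition_key)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key outside every range")

def split_shard(ranges, index):
    """Resharding: cut one hot shard's range into two halves."""
    start, end = ranges[index]
    middle = (start + end) // 2
    return ranges[:index] + [(start, middle), (middle + 1, end)] + ranges[index + 1:]

ranges = even_shard_ranges(4)
hot = shard_for("user-42", ranges)
print("user-42 lands on shard", hot)
print("shards after splitting it:", len(split_shard(ranges, hot)))
```

This also explains hot shards. If most records share one partition key, they all hash to the same shard no matter how many shards you add.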

2. Kinesis Firehose

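  • The "Fire Hose": Now called Amazon Data Firehose. Connects Kinesis Streams to S3, Redshift, or OpenSearch (formerly ElasticSearch).
  • Buffer: It batches records and flushes when a size OR a time threshold is hit, whichever comes first (e.g., 5MB OR 60 seconds; both are configurable).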
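The "size OR time" rule is a pattern worth knowing beyond Firehose. Here is a simplified sketch of it: the `Buffer` class is my own, time is passed in by hand so the behavior is predictable, and unlike the real service it only checks the clock when a new record arrives.

```python
class Buffer:
    """Collect records and flush when either threshold is reached."""

    def __init__(self, max_bytes=5 * 1024 * 1024, max_seconds=60):
        self.max_bytes = max_bytes
        self.max_seconds = max_seconds
        self.records = []
        self.size = 0
        self.opened_at = None   # time the first record of this batch arrived

    def add(self, record: bytes, now: float):
        """Add a record; return the flushed batch if a threshold was hit, else None."""
        if self.opened_at is None:
            self.opened_at = now
        self.records.append(record)
        self.size += len(record)
        too_big = self.size >= self.max_bytes
        too_old = now - self.opened_at >= self.max_seconds
        if too_big or too_old:
            return self.flush()
        return None

    def flush(self):
        batch = self.records
        self.records, self.size, self.opened_at = [], 0, None
        return batch

# Tiny thresholds so the demo is easy to follow.
buffer = Buffer(max_bytes=10, max_seconds=60)
print(buffer.add(b"12345", now=0))     # None: 5 bytes, 0 seconds
print(buffer.add(b"67890", now=1))     # flushed by size (10 bytes)
print(buffer.add(b"a", now=100))       # None: new batch starts
print(buffer.add(b"b", now=161))       # flushed by time (61 seconds old)
```

The trade-off is latency against the number of output files: bigger buffers mean fewer, larger objects in S3 but older data.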

Module 23: Serverless Data

1. Glue (The Crawler)

  • Crawler: It looks at your S3 bucket (/data/year=2023/month=01/) and figures out the schema (CSV, JSON, Parquet).
  • Data Catalog: The resulting "table definition."

2. Athena (The Query Engine)

  • Serverless SQL: You write SQL against S3 files.
  • Partitioning: ALWAYS partition by date. WHERE date = '2023-01-01' scans 1 folder. Without partitioning, it scans Petabytes ($$$).
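Athena bills by data scanned, so partition pruning is really a cost calculation. This sketch fakes an S3 listing with made-up object keys and sizes (in MB) and compares a full scan with a query that can skip straight to one date folder.

```python
# Hypothetical bucket listing: object key -> size in MB.
OBJECTS = {
    "data/date=2023-01-01/part-0.parquet": 120,
    "data/date=2023-01-01/part-1.parquet": 80,
    "data/date=2023-01-02/part-0.parquet": 150,
    "data/date=2023-01-03/part-0.parquet": 200,
}

def megabytes_scanned(objects, date=None):
    """MB read for a query, with or without a filter on the partition column."""
    if date is None:
        return sum(objects.values())          # no filter: read everything
    prefix = f"data/date={date}/"
    return sum(size for key, size in objects.items() if key.startswith(prefix))

print(megabytes_scanned(OBJECTS))                      # 550
print(megabytes_scanned(OBJECTS, date="2023-01-01"))   # 200
```

With three days of data the saving is small. With three years of data, the filtered query still reads one folder while the unfiltered one reads all of them.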

Phase 13: ML Ops (Week 23)

Module 24: SageMaker AI

1. Training vs. Inference

  • Training: Teaching the model (Requires huge GPU for hours/days).
  • Inference: Asking the model a question (Requires small CPU/GPU for milliseconds).
  • Endpoints: You deploy the model to a SageMaker Endpoint (an HTTPS URL).

2. Cost Optimization

  • Spot Training: Use Spot Instances for training jobs. If they get interrupted and you have checkpointing set up, SageMaker resumes from the last "Checkpoint" in S3. AWS advertises savings of up to 90% over on-demand.

Phase 14: Resilience & Chaos (Week 24)

Module 25: Disaster Recovery

1. RTO vs. RPO

  • RPO (Recovery Point Objective): How much data can you lose? (e.g., "Max 15 minutes").
  • RTO (Recovery Time Objective): How long until the site is back up? (e.g., "Max 1 hour").

2. The Strategies

  • Backup & Restore: Slowest, cheapest. (RTO: Hours).
  • Pilot Light: Data is replicated and the database is running, Web Servers are off until needed. (RTO: Tens of minutes).
  • Warm Standby: Scaled down version running always. (RTO: Minutes).
  • Multi-Region Active/Active: Pure gold standard. (RTO: Near Zero).

Module 26: Chaos Engineering

1. AWS Fault Injection Service (FIS)

  • The Experiment: "Terminate 20% of EC2 instances in us-east-1."
  • Stop Condition: If "OrderSuccessRate" drops below 95%, STOP the experiment automatically. This is the Safety Button.

Phase 15: Hybrid & Enterprise (Week 25)

Module 27: Hybrid Networking

1. VPN vs. Direct Connect

  • Site-to-Site VPN: Encrypted tunnel over the public internet. Fast to set up, but unstable latency.
  • Direct Connect (DX): A dedicated physical link from your data center (through a Direct Connect location) to AWS. Consistent latency, high bandwidth, takes weeks to install. Traffic is not encrypted by default.

Module 28: Identity Federation

1. SAML 2.0

  • Concept: Don't create IAM Users. Use your corporate Active Directory (AD).
  • Flow: User logs into AD -> AD gives "SAML Assertion" -> User trades Assertion for AWS STS Temp Credentials.

2. Cognito

  • User Pools: Sign up/Sign in (Db of users).
  • Identity Pools: Trade a token (Google, FB, User Pool) for temporary IAM creds to access S3 directly.

Phase 17: Generative AI (Week 26)

Module 29: Generative AI & Bedrock

1. Bedrock API

  • Unified API: Access Claude, Jurassic, Llama 2 via one API. No servers to manage.
  • Tokens: The currency of LLMs. 1000 tokens ≈ 750 words.

2. RAG (Retrieval Augmented Generation)

  • The Problem: LLMs can confidently make things up (hallucinate), and they know nothing about your private documents.
  • The Fix (RAG):
    1. User asks question.
    2. Code searches YOUR company PDF docs (Vector DB).
    3. Code sends "Question + PDF Content" to the LLM.
    4. LLM answers using only that context.
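The four steps above can be sketched without any AI service at all. In this toy version, word counts with cosine similarity stand in for a real embedding model and vector database, the documents are invented, and the final LLM call is left out: the code stops at building the prompt you would send.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, documents, top_k=1):
    """Step 2: find the documents most similar to the question."""
    q = Counter(tokenize(question))
    ranked = sorted(documents,
                    key=lambda doc: cosine(q, Counter(tokenize(doc))),
                    reverse=True)
    return ranked[:top_k]

def build_prompt(question, documents):
    """Step 3: combine the question with the retrieved context."""
    context = "\n".join(retrieve(question, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Invented "company docs".
DOCS = [
    "Refunds are processed within 14 days of the return request.",
    "The office is closed on public holidays.",
    "VPN access requires a hardware security key.",
]

print(build_prompt("When are refunds processed?", DOCS))
```

Word overlap is a crude stand-in: it would miss a question that says "money back" instead of "refunds". That gap is exactly what embeddings are meant to close.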

Phase 18: The Finale (Week 27)

Module 30: CI/CD & DevOps Strategy

1. The Pipeline

  • Commit: Developer pushes code.
  • Build: Docker build, Unit Tests.
  • Beta: Deploy to Staging, Integration Tests.
  • Approval: Manual or Automated check.
  • Prod: Blue/Green deployment to Production.

2. What this journey taught me

It wasn't about memorizing 200 services. It was about seeing the pattern:

  • Everything is an API.
  • Everything fails eventually (Design for failure).
  • Security is job zero.
  • Cost is a first-class citizen.

I started this journey wanting to master AWS. I ended up realizing that "mastery" isn't a destination. It's just the ability to figure things out one error message at a time.

If you found this roadmap helpful, I write about my ongoing learning in the AWS Community Builders program. Connect with me to share your own journey!

