Introduction

In the world of Site Reliability Engineering (SRE), we often talk about complex things like Kubernetes clusters or high-performance databases. It is easy to think that basic tasks - like creating a user - are not important anymore.

However, complexity is often where bugs hide, while simplicity is where security fails.

I decided to start the KodeKloud 100 Days of Cloud challenge to check my knowledge of these fundamentals. My goal is to look at every task like an architect. I want to understand why it matters, how it works in a large system, and what happens if we get it wrong.

Here is a look at the first three days. These tasks cover the foundation of system security: Identity and Access Management (IAM).

Day 1: The Service Account Pattern (Non-Interactive Shells)

The Task

Objective: Create a user named mark on App Server 3 with a non-interactive shell.

The Deep Dive: Why "Login" is a Risk

To a beginner, a "user" means a human with a keyboard. But in a modern data center, most "users" are actually Service Accounts. These are identities created for software - like web servers, databases, or monitoring tools - to run processes.

A dangerous security hole opens up if these service accounts have the same rights as humans. If a hacker finds a way to trick your web server into running their code, and that web server runs as a user with a valid login shell (like /bin/bash), the hacker effectively has a command terminal on your server. They can look around your files and attack other parts of your network.

This is where the Non-Interactive Shell helps us. It is a simple way to add defense in depth.

Execution & Analysis

The command I ran looks simple, but it does something important inside the system:

sudo useradd -s /sbin/nologin mark

By using /sbin/nologin (instead of the standard /bin/bash), we point the user's login to a program that does exactly one thing: it refuses access.

I verified this by checking the /etc/passwd file, which is the system's database of users:

grep '^mark:' /etc/passwd
# Output: mark:x:1001:1001::/home/mark:/sbin/nologin

Then I tried to act like an attacker and switch to this user:

sudo su - mark

Result: This account is currently not available.

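The system stopped me immediately. The user mark can still own files and run processes, but interactive logins as mark (over SSH, on the console, or through su) are refused. Root can still start processes as mark, for example from a systemd unit, which is exactly what a service account needs. In a Zero Trust setup, this is the standard way to create service accounts.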
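
One way to check this pattern on a server is to list which accounts still have an interactive shell. The sketch below is only an illustration under my own assumptions: the function name list_login_accounts is hypothetical, and the set of "refuse login" shells it recognizes may need adjusting for your distribution. It reads any file in /etc/passwd format and prints the accounts that could still get a shell.

```shell
#!/bin/sh
# list_login_accounts FILE
# Print "name shell" for every account in a passwd-format FILE whose
# shell is not one of the common "refuse login" programs.
# (Hypothetical helper; the shell list is an assumption, extend as needed.)
list_login_accounts() {
    awk -F: '
        # Field 1 is the user name, field 7 is the login shell.
        $7 != "/sbin/nologin" &&
        $7 != "/usr/sbin/nologin" &&
        $7 != "/bin/false" &&
        $7 != "/usr/bin/false" { print $1, $7 }
    ' "$1"
}

# Usage on a real host (read-only, no sudo needed):
#   list_login_accounts /etc/passwd
```

Any service account that shows up in this list is a candidate for the /sbin/nologin treatment.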

Day 2: Temporary Identities (User Expiry)

The Task

Objective: Create a user named jim on App Server 1 with an expiry date.

The Deep Dive: The "Zombie Account" Risk

A big risk in long-running servers is the "Orphan Account" (or Zombie Account). This happens when we create an account for a contractor or temporary support staff. When their project ends, we often forget to delete the account.

Months later, that forgotten account is still there. Hackers love these because nobody is watching them.

The solution is not to use a spreadsheet to remember who to delete. The solution is Compliance as Code. We set the account to expire automatically when we create it.

Execution & Analysis

I used the -e (expire) flag when creating the user. This tells the system the exact date on which to disable the account.

sudo useradd -e 2024-01-28 jim

To confirm this worked, I used the chage (Change Age) tool. This tool reads the secure /etc/shadow file where detailed password info is kept.

sudo chage -l jim

The output confirms the rule:

Account expires : Jan 28, 2024

This is a failsafe. Even if we forget to remove access, the operating system itself will refuse logins once the expiry date of January 28th is reached. It does not matter if they have the correct password. The account itself is dead. This kind of automatic control helps when preparing for security audits like SOC 2 and ISO 27001.
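
Under the hood, the expiry date lives in the eighth field of /etc/shadow as a count of days since 1970-01-01. The sketch below shows the comparison as I understand it; the helper name account_expired and the sample line are my own illustrations, not output captured from the server.

```shell
#!/bin/sh
# account_expired SHADOW_LINE TODAY
# SHADOW_LINE is one line in /etc/shadow format; TODAY is the current
# date as days since 1970-01-01. Returns 0 (true) if the account has
# an expiry date and that date has been reached.
# (Hypothetical helper for illustration, not part of shadow-utils.)
account_expired() {
    # Field 8 holds the account expiry date in days since the epoch;
    # an empty field means the account never expires.
    expiry=$(printf '%s\n' "$1" | cut -d: -f8)
    [ -n "$expiry" ] && [ "$2" -ge "$expiry" ]
}

# Today's date as days since the epoch (86400 seconds per day).
# "date +%s" is supported by GNU and BSD date.
today=$(( $(date +%s) / 86400 ))

# Illustrative sample line: 19750 corresponds to 2024-01-28.
sample='jim:!!:19700:0:99999:7::19750:'
if account_expired "$sample" "$today"; then
    echo "jim: expired"
else
    echo "jim: active"
fi
```

On a real host, chage -l remains the simpler way to read this value; the sketch is only meant to show what the stored number means.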

Day 3: Locking the Door (Secure Root SSH)

The Task

Objective: Secure the root SSH access on App Server 1. Date: Today, December 23rd.

The Deep Dive: Why Root Login is Dangerous

Today's task fixed the most common mistake I see on servers: Permissive Root Access.

Allowing the root user to log in directly over SSH is very risky for three reasons:

Attackers know the name: Every Linux system has a user named root. Hackers don't need to guess the username. They can spend all their time guessing the password.
No Accountability: If three engineers log in as root to fix something, and one makes a mistake, the logs just say "root did it." We have no way to know who actually typed the command.
Spread of Attack: If a hacker takes over root on one machine, they can often use those credentials to take over other machines easily.

The standard engineering rule is Non-Repudiation. This simply means: every action must be tied to a specific human user. You log in as yourself, then use sudo to do admin tasks.

Execution & Analysis

I logged into the server (stapp01) and opened the SSH server configuration file.

sudo vi /etc/ssh/sshd_config

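I found the line PermitRootLogin. It was commented out, which means it was using the default setting. In recent OpenSSH versions that default is prohibit-password, which still lets root log in with a key. In security, explicit (clearly stated) rules are always better than implicit (assumed) ones.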

I changed it to:

PermitRootLogin no

Pro Tip: While looking at this file, I recommend checking other safety settings too. A strong production server should also have:

PermitEmptyPasswords no: Never allow accounts without passwords.
MaxAuthTries 3: Slow down brute-force scripts.
AllowGroups ssh-users: Only allow specific groups to log in.
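
To see which of these directives are set explicitly, a small reader for sshd_config can help. This is a rough sketch with a hypothetical function name (sshd_setting). It relies on two facts about sshd: keywords are case-insensitive, and the first value found for a keyword wins. It deliberately ignores Include files and Match blocks, so for the full effective configuration sudo sshd -T is the better source.

```shell
#!/bin/sh
# sshd_setting FILE KEYWORD
# Print the value of the first uncommented KEYWORD line in FILE.
# Prints nothing if the keyword is not set explicitly.
# (Hypothetical helper; it ignores Include files and Match blocks.)
sshd_setting() {
    awk -v key="$2" '
        # Skip comment lines; compare the keyword case-insensitively.
        $1 !~ /^#/ && tolower($1) == tolower(key) {
            # Drop the keyword, trim leading whitespace, print, stop.
            $1 = ""; sub(/^[ \t]+/, ""); print; exit
        }
    ' "$1"
}

# Usage on a real server (the file may need sudo to read):
#   sshd_setting /etc/ssh/sshd_config PermitRootLogin
#   sshd_setting /etc/ssh/sshd_config MaxAuthTries
```

An empty result means the directive is commented out or missing, so the built-in default applies.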

After saving the file, I restarted the service to apply the changes:

sudo systemctl restart sshd
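
A typo in sshd_config can stop the daemon from coming back up, so it is worth validating the file before the restart. sshd has a test mode (sshd -t) that only checks the configuration. The wrapper below is a minimal sketch with a hypothetical name (restart_if_valid); both commands are passed in as strings so the logic can be tried with harmless stand-ins first.

```shell
#!/bin/sh
# restart_if_valid CHECK_CMD RESTART_CMD
# Run CHECK_CMD; only if it succeeds, run RESTART_CMD.
# (Hypothetical wrapper, shown for illustration.)
restart_if_valid() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "config check failed, service left untouched" >&2
        return 1
    fi
}

# On a real server this would look like:
#   restart_if_valid "sudo sshd -t" "sudo systemctl restart sshd"
```

If the check fails, the running daemon keeps its old, working configuration and nothing is restarted.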

The Verification

The golden rule of remote work applies here: Never close your current window until you test the new one. If I made a mistake, my current window is still open so I can fix it. If I verify it works, I am safe.

I opened a new terminal tab and tried to break in:

ssh root@stapp01

Result: Permission denied (publickey,password).

The server rejected the connection immediately. Now, the root account requires a real user to log in first and escalate privileges. This makes the system much harder to attack.

Frequently Asked Questions (FAQ)

Here are the most common questions engineers ask about these topics.

What is the difference between /sbin/nologin and /bin/false?

Both options stop a user from getting a shell. However, /sbin/nologin is more descriptive. It returns a specific message ('This account is currently not available') to the user, and some implementations also log the attempt, which helps with auditing and debugging. /bin/false simply exits silently with a non-zero status.

How do I set an expiry date for an existing Linux user?

Use the usermod command with the -e flag. For example, sudo usermod -e 2024-12-31 username sets the expiry date. Verify it with chage -l username.

How to fix Linux server lockout after disabling root SSH login?

This happens if you disable root login but forget to add a sudo user first. This is why the "Don't Lock Yourself Out" Protocol is so important. Always keep your current session open while you test in a new window. If you do get locked out, you will need to use the web console provided by your cloud provider (like AWS or Azure) to fix it.

Is SSH key authentication more secure than passwords?

Yes, significantly. Passwords are vulnerable to brute-force guessing and theft. SSH keys rely on public-key cryptography, and guessing a properly generated private key by brute force is not practical with modern computers. For the best security, you should disable passwords entirely in your config (PasswordAuthentication no) and only use keys.

When should I use useradd vs adduser in Linux?

useradd is a basic, low-level command. It does exactly what you tell it to do and nothing more; on some distributions it will not even create a home directory unless you pass -m. On Debian-based systems, adduser is a friendly script that asks you questions (like "What is the password?") and sets things up for you. On Red Hat-based systems, adduser is just a link to useradd. For scripts and automation, use useradd. For manual work on Debian or Ubuntu, adduser is easier.

Where are Linux user passwords stored safely?

They are stored as hashes in the /etc/shadow file. This file is readable only by the root user. The /etc/passwd file contains user info but not the password hash. This separation keeps passwords secure even if someone can read the user list.

How to force a user to change their password in Linux?

To force a change at the next login, run sudo chage -d 0 username. This is different from account expiry: account expiry disables the whole account, while password expiry only forces the user to choose a new password. You can also set recurring password expiry rules (like "change every 90 days") with the chage command, using flags like -M (max days).

What acts as a non-repudiation control in Linux servers?

Named user accounts combined with sudo. Non-repudiation means nobody can deny their actions. If everyone logs in as "root", anyone can say "It wasn't me!" If everyone logs in with their own name, the logs prove exactly who ran the command. It creates accountability for every change on the server.

Why is an SSH service restart required after config changes?

sshd reads sshd_config when it starts, so the running daemon does not know about your edit until it is told to read the file again. Restarting the service does that. A reload (sudo systemctl reload sshd) also works and is gentler. In either case, SSH sessions that are already open normally keep running, which is why you can safely test from a second window.

Conclusion

These first three days - managing service accounts, setting time limits on users, and locking down root access - are not just "Linux tasks." They are the practical way we apply Least Privilege and Defense in Depth.

Advancing in DevOps isn't just about using new tools. It is about applying architectural thinking to the basic building blocks of the system.

Next up: Day 4: Script Execution Permissions.
