Build Your Own HTTP Server
Most developers live in the application layer. We import libraries like Flask, Express, or Django and trust that "something below" handles the messy details of bytes, wires, and protocols.
Today, we are going to be that "something below."
We're going to build a functional HTTP/1.1 server from scratch using Python's standard socket library. No frameworks. No shortcuts. Just you and the protocol specifications.
Note on HTTP/3: This guide focuses on HTTP/1.1 because it is a text-based protocol that is perfect for learning how the web works. Modern HTTP/3 uses UDP and QUIC, which are much more complex and require specialized libraries. We will cover HTTP/3 in a separate guide.
This guide is designed for beginners to network programming but covers enough detail to be useful for anyone wanting to understand the core mechanics of the web.
The Learning Path
We have broken this journey down into 3 Phases and 18 distinct Levels.
The Raw Socket
- Bind to a Port
- Read Connection Data

Building the Protocol
- Respond with 200 OK
- Return 404 Not Found
- Extract the URL Path
- Implement Basic Routing
- Respond with Content
- Parse Request Headers
- Send Response Headers
Advanced Engineering
- Concurrent Connections
- Parse Query Parameters
- Serve Files from a Directory
- Detect Content Types
- Handle POST Requests
- Connection Keep-Alive
- Support HTTP Compression
- HTTP Pipelining
- Chunked Transfer Encoding
Prerequisites
Before we write a single line of code, you need three things:
- Python 3.x installed: Type `python3 --version` in your terminal. If you see numbers, you are good.
- Terminal Access: You need comfort using a command line interface.
- Telnet or Netcat: Tools to manually talk to our server.
  - Mac/Linux: `nc` (Netcat) is usually installed.
  - Windows: Enable the "Telnet Client" feature or use PowerShell.
Create a file named server.py in an empty folder. This will be our workspace.
The Raw Socket
In this phase, we are not building a web server yet. We are building a TCP Server. HTTP is just a language spoken over TCP. First, we must establish the connection.
1. Bind to a Port
Deep Dive: Sockets & Kernels
A Socket is the fundamental building block of network communication. It is an endpoint. When you want to talk to another computer, you create a socket.
To receive calls, a server needs:
- IP Address: "Address 123 Main Street". We will use `127.0.0.1` (Localhost), which means "this computer".
- Port: "Apartment 8080". An IP address gets you to the machine; the Port gets you to the specific application.
The Operating System (Kernel) manages these ports. When we "Bind", we ask the Kernel: "Please reserve Port 8080 for me. Send any data arriving there to my program."
How the OS Handles Connections
When you call bind(), the OS checks if any other process is using that port. If not, it marks it as "owned" by your process ID (PID).
When you call listen(), the OS creates a Queue for that port. When a client connects, the OS completes the "3-Way Handshake" (SYN, SYN-ACK, ACK) completely independently of your application. The connection sits in the queue until you call accept().
Writing the Server
We will write our first lines of Python. Open server.py.
```python
import socket

# 1. Create a socket
# AF_INET = IPv4 (Internet Protocol v4)
# SOCK_STREAM = TCP (Transmission Control Protocol)
# This creates the endpoint inside the kernel.
server_socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Optional: Avoid "Address already in use" errors during restart
# This sets a flag (SOL_SOCKET level) to allow reusing the address.
# Without this, if you crash and restart, the OS keeps the port reserved for ~60s.
server_socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# 2. Bind to localhost:8080
# We use a tuple (HOST, PORT).
# 127.0.0.1 means "loopback" - only accessible from this computer.
# 0.0.0.0 would mean "all interfaces" - accessible from the network.
HOST = '127.0.0.1'
PORT = 8080
server_socket.bind((HOST, PORT))

# 3. Listen for connections
# The argument '5' is the "backlog".
# It means: "If 5 people are waiting to connect, reject the 6th person."
server_socket.listen(5)
print(f"[*] Server listening on {HOST}:{PORT}")

while True:
    # 4. Accept a connection
    # This line BLOCKS. The program pauses here until someone connects.
    # It returns a NEW socket specifically for this client, and their address.
    # The 'server_socket' remains open to accept MORE people later.
    client_socket, client_address = server_socket.accept()
    print(f"[+] Accepted connection from {client_address}")

    # 5. Hang up immediately
    # We are just testing the connection.
    client_socket.close()
```

Testing the Connection
Let's prove it works.
- Run the script: `python3 server.py`. You should see: `[*] Server listening on 127.0.0.1:8080`
- Connect with Netcat: Open a new terminal window and run `nc localhost 8080`. (On Windows, use `telnet localhost 8080` or simply open your browser to `http://localhost:8080`.)
- Check the Server Output: You should see: `[+] Accepted connection from ('127.0.0.1', 54321)`. (The port `54321` will be random - that is the client's ephemeral port.)
- Check the Client Output: The client will immediately disconnect (return to the command prompt) because our code called `client_socket.close()`.
Common Pitfalls
- Permission Denied: You cannot bind to ports below 1024 (like 80 or 443) without Administrator/Root privileges. Stick to 8080, 3000, or 5000 for development.
- Address Already in Use: You ran the server, stopped it, and tried to run it again immediately. Closed TCP connections linger in a `TIME_WAIT` state to ensure all packets are flushed, which keeps the port reserved. The `setsockopt` line in our code fixes this.
2. Read Connection Data
Understanding Data Streams
When a client connects to a web server, they don't remain silent. They immediately send a request.
If we connect and just close() (like in Step 1), the browser considers the connection "Reset by Peer" and shows an error.
Data comes in as a Stream of Bytes. TCP is a streaming protocol, not a message protocol. This means sending "Hello" might arrive as one chunk "Hello", or two chunks "He" + "llo". However, for this simple server, we will assume the browser sends the small HTTP request in one go.
We use recv(buffer_size) to read data. 1024 is a common buffer size (1 Kilobyte).
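For reference, a more robust server would not trust a single `recv()` call: it would keep reading until it sees the end-of-headers marker (`\r\n\r\n`). Here is a minimal sketch of that pattern — the `recv` parameter is any recv-like callable, so we can exercise the logic without a live socket:

```python
def read_until_headers_end(recv, bufsize=1024):
    # Accumulate bytes until the blank line that ends the HTTP
    # header section arrives, or the peer closes the connection.
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = recv(bufsize)
        if not chunk:  # b'' means the peer closed
            break
        data += chunk
    return data

# Simulate a request arriving in two TCP segments
segments = iter([b"GET / HT", b"TP/1.1\r\nHost: x\r\n\r\n", b""])
print(read_until_headers_end(lambda n: next(segments)))
# b'GET / HTTP/1.1\r\nHost: x\r\n\r\n'
```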
Reading from the Socket
We will modify our loop to read data before closing.
```python
while True:
    client_socket, client_address = server_socket.accept()
    print(f"[+] Accepted connection from {client_address}")

    # Read 1024 bytes from the socket
    # This call also BLOCKS until data arrives.
    try:
        request_data = client_socket.recv(1024)

        # IMPORTANT: If recv returns empty bytes (b''),
        # it means the client closed the connection gracefully.
        if not request_data:
            print("[-] Client closed connection")
            client_socket.close()
            continue

        print("--- Received Raw Request ---")
        # Decode raw bytes (b'GET...') into a UTF-8 string
        # We assume UTF-8 because HTTP is text.
        print(request_data.decode('utf-8'))
        print("----------------------------")
    except Exception as e:
        print(f"Error reading request: {e}")

    # Still just hanging up for now
    client_socket.close()
```

Verification
Run the server again.
Open your browser (Chrome/Safari) and go to http://localhost:8080.
Your terminal will explode with text. This is the HTTP Request:
```
GET / HTTP/1.1
Host: localhost:8080
Connection: keep-alive
Cache-Control: max-age=0
sec-ch-ua: "Not_A Brand";v="8", "Chromium";v="120"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp...
Sec-Fetch-Site: none
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Accept-Language: en-US,en;q=0.9
```
This text is what your browser sends every time you visit a website. It tells the server what file it wants (GET /), what browser it is (User-Agent), and what formats it accepts (Accept).
Speaking Is Hard (The Protocol)
Now that we can hear the client, we must respond. If we send garbage, the browser will show an ERR_INVALID_RESPONSE. We must speak the HTTP Protocol strictly.
3. Respond with 200 OK
HTTP Response Structure
Every HTTP Response must start with a Status Line.
Format: PROTOCOL STATUS_CODE STATUS_TEXT
Example: HTTP/1.1 200 OK
After the status line come the headers, followed by a mandatory blank line. Even if you send zero headers, the blank line must still be there.
The most important rule in HTTP parsing is CRLF (Carriage Return Line Feed).
In Python strings, this is \r\n.
- A line ends with `\r\n`.
- The header section ends with a double `\r\n\r\n` (a blank line).
Wait, why \r\n?
In the old days of typewriters, \r (Carriage Return) moved the print head to the left, and \n (Line Feed) scrolled the paper up. Network protocols kept this tradition. If you just send \n, strict clients might reject it.
Sending the Response
```python
request_data = client_socket.recv(1024)
print(request_data.decode('utf-8'))

# Constructing a valid HTTP Response
# 1. Status Line: HTTP/1.1 200 OK
# 2. Separator: \r\n
# 3. Headers: (None for now)
# 4. End of Headers: \r\n
# 5. Body: Hello World!
response_body = "Hello World!"

# Check the format carefully:
response = f"HTTP/1.1 200 OK\r\n\r\n{response_body}"

# Send it back. Note: We must ENCODE string to BYTES.
client_socket.sendall(response.encode('utf-8'))

# Close the interaction
client_socket.close()
```

Verification
Visit http://localhost:8080.
The browser will finally stop spinning! It will display "Hello World!".
Inspect Element:
- Right-Click -> Inspect.
- Go to the "Network" tab.
- Refresh the page.
- Click the first request (`localhost`).
- Look at the "Headers" section. You will see `Status Code: 200 OK`.
Congratulations! You have successfully spoken HTTP.
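If you prefer to verify without a browser, you can play the client side in Python too — essentially netcat with the request baked in. This is a sketch, assuming the server from this step is running on `localhost:8080`:

```python
import socket

def fetch(host="127.0.0.1", port=8080):
    # Open a TCP connection, send a minimal valid HTTP/1.1 request
    # (note the blank line terminating the headers), read the reply.
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.connect((host, port))
    s.sendall(b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n")
    response = s.recv(4096)
    s.close()
    return response.decode('utf-8')

# With the server running:
#   print(fetch())  # Should start with "HTTP/1.1 200 OK"
```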
4. Return 404 Not Found
Why 404 Matters
A server that always says "OK" is useless. If I ask for /images/cat.png, and you send me "Hello World", my browser will try to display "Hello World" as an image, fail, and show a broken icon.
We need logic. We need to handle the case where a user asks for a resource that does not exist.
The standard code for this is 404 Not Found.
Handling Unknown Paths
We need to look at what the user asked for.
Recall the request line: GET / HTTP/1.1.
We need to parse this string to find the path (/ or /about).
```python
# Decode the request
request_text = request_data.decode('utf-8')

# Split into lines to find the first line
request_lines = request_text.split('\r\n')

if len(request_lines) > 0:
    # Get "GET / HTTP/1.1"
    request_line = request_lines[0]
    parts = request_line.split(' ')

    # Safety check: ensure we have 3 parts
    if len(parts) == 3:
        method = parts[0]    # GET
        path = parts[1]      # / or /about
        protocol = parts[2]  # HTTP/1.1

        print(f"DEBUG: Client requested {path}")

        if path == '/':
            content = "<html><h1>Welcome Home</h1></html>"
            response = f"HTTP/1.1 200 OK\r\n\r\n{content}"
        else:
            content = "<html><h1>404 Page Not Found</h1></html>"
            response = f"HTTP/1.1 404 Not Found\r\n\r\n{content}"

        client_socket.sendall(response.encode('utf-8'))
```

Verification

- Visit `http://localhost:8080/`. You see "Welcome Home".
- Visit `http://localhost:8080/random-junk`. You see "404 Page Not Found".
- Visit `http://localhost:8080/admin`. You see "404 Page Not Found".
Common Pitfalls
- Favicon: You will often see `DEBUG: Client requested /favicon.ico` in your logs. Browsers request this automatically to show the little icon in the tab. Since we don't have that file, we correctly return 404.
5. Extract the URL Path
Handling Complex URLs
URLs are not always simple paths. They can be messy.
Consider: GET /users/profiles/edit?id=123#top HTTP/1.1
- Path: `/users/profiles/edit`
- Query: `?id=123`
- Fragment: `#top` (Fragments are usually handled by the browser and not sent to the server, but we should be aware.)
We also need to handle URL Encoding.
If a folder has a space, like My Files, the URL will be /My%20Files.
If we try to open the file named /My%20Files, our OS will say "File not found". We must decode %20 back to a space.
Decoding the Path
We will use Python's urllib library to help us. It is part of the standard library.
```python
from urllib.parse import unquote

# ... inside the loop ...
path = parts[1]

# Decode URL (turn %20 into space, %2F into /, etc.)
clean_path = unquote(path)
print(f"Requested Path: {clean_path}")
```

Testing URL Decoding

- Restart the server.
- Visit `http://localhost:8080/hello%20world`.
- Your terminal should print: `Requested Path: /hello world`.
If we didn't use unquote, it would have printed /hello%20world.
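You can see `unquote` in action directly in a Python shell:

```python
from urllib.parse import unquote

print(unquote('/hello%20world'))  # /hello world
print(unquote('/My%20Files'))     # /My Files
print(unquote('/caf%C3%A9'))      # /café  (UTF-8 decoding by default)
```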
6. Implement Basic Routing
Why We Need a Router
Writing if path == ... else if path == ... is bad practice. It creates "Spaghetti Code".
Imagine if we had 100 pages. That if statement would be 500 lines long.
Modern frameworks use Routes. We map a specific path string (the key) to a function (the value). When a request comes in, we look up the path in our "Router" (dictionary) and call the matching function.
Building a Routing Dictionary
Let's refactor our code completely to use a routing dictionary.
```python
# Define handler functions first
def index_handler():
    return "HTTP/1.1 200 OK\r\n\r\nWelcome to the Index!"

def about_handler():
    return "HTTP/1.1 200 OK\r\n\r\nThis is a Python Server."

def blog_handler():
    description = "We are learning about sockets."
    return f"HTTP/1.1 200 OK\r\n\r\n<h1>Blog</h1><p>{description}</p>"

# Map strings to functions
routes = {
    '/': index_handler,
    '/about': about_handler,
    '/blog': blog_handler
}

# In the request loop:
if clean_path in routes:
    # Look up the function and call it with ()
    handler_function = routes[clean_path]
    response = handler_function()
else:
    response = "HTTP/1.1 404 Not Found\r\n\r\nResource not found."

client_socket.sendall(response.encode('utf-8'))
```

This is the primitive ancestor of Flask's `@app.route()` or Express's `app.get()`.
Common Pitfalls
- Trailing Slashes: `/blog` and `/blog/` are different strings. Standard behavior is to either treat them as the same, or redirect one to the other (301 Redirect). For now, our dictionary is strict: `/blog/` will 404.
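If you want the lenient behavior, one option — not wired into our server, just a sketch — is to normalize the path before the dictionary lookup:

```python
def normalize_path(path):
    # Treat '/blog/' and '/blog' as the same route,
    # but leave the root path '/' untouched.
    if len(path) > 1 and path.endswith('/'):
        return path.rstrip('/')
    return path

print(normalize_path('/blog/'))  # /blog
print(normalize_path('/blog'))   # /blog
print(normalize_path('/'))       # /
```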
7. Respond with Content
Avoiding Hardcoded Headers
So far we have hardcoded headers. This is risky.
If you send a 5MB image but your manual header says Content-Length: 10, the browser will truncate the image after the first 10 bytes.
If you send HTML but say it is text/plain, the browser will show raw tags <h1> instead of big bold text.
We need a standardized helper function to build responses. This function should calculate the length automatically.
Automating Response Generation
```python
def build_response(body, status="200 OK", content_type="text/html"):
    # Ensure body is bytes (images are already bytes, strings need encoding)
    if isinstance(body, str):
        body_bytes = body.encode('utf-8')
    else:
        body_bytes = body

    headers = [
        f"HTTP/1.1 {status}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body_bytes)}",
        "Server: PythonRaw/1.0",
        "\r\n"  # After joining, this produces the blank line that ends the headers
    ]

    # Join with \r\n
    header_str = "\r\n".join(headers)

    # Return header (bytes) + body (bytes)
    return header_str.encode('utf-8') + body_bytes
```

Now we can use this helper in our handlers:

```python
return build_response("<h1>Hi</h1>")
```
8. Parse Request Headers
Extracting Metadata
The browser sends useful metadata below the request line.
- `User-Agent`: What browser/OS is this?
- `Host`: Which domain name did the user type? (Critical for virtual hosting.)
- `Cookie`: Is the user logged in?
- `Accept-Language`: Does the user prefer English or Spanish?
This data is formatted as Key: Value, separated by newlines. We should parse this into a Python Dictionary so we can look up values easily.
Parsing to a Dictionary
```python
def parse_headers(request_text):
    headers = {}
    lines = request_text.split('\r\n')

    # Start from index 1 (skip Request Line)
    for line in lines[1:]:
        if line == "":
            break  # End of headers
        try:
            # Split only on the FIRST colon
            # "User-Agent: Mozilla" -> ["User-Agent", " Mozilla"]
            key, value = line.split(':', 1)
            headers[key.strip()] = value.strip()
        except ValueError:
            # Handle malformed lines gracefully
            continue
    return headers

# Usage inside the loop:
headers = parse_headers(request_text)
print(f"User is using: {headers.get('User-Agent', 'Unknown')}")
```

Checking Browser Data
Run the server.
- Visit with Chrome. Terminal says: `User is using: Mozilla/5.0...`
- Run `curl localhost:8080`. Terminal says: `User is using: curl/7.64.1`
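You can also sanity-check the parser without any network at all, by feeding it a hand-written request string (the function is repeated here so the snippet runs standalone):

```python
def parse_headers(request_text):
    headers = {}
    for line in request_text.split('\r\n')[1:]:  # skip the request line
        if line == "":
            break  # blank line = end of headers
        try:
            key, value = line.split(':', 1)
            headers[key.strip()] = value.strip()
        except ValueError:
            continue  # tolerate malformed lines
    return headers

sample = "GET / HTTP/1.1\r\nHost: example.com\r\nUser-Agent: curl/7.64.1\r\n\r\n"
h = parse_headers(sample)
print(h['Host'])        # example.com
print(h['User-Agent'])  # curl/7.64.1
```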
9. Send Response Headers
Compliant Response Headers
We covered this partially in Step 7, but let's strictly formalize the Required headers for a compliant HTTP/1.1 server.
- Date: When was this response generated? (MUST be in GMT format.)
- Server: What software is running? (e.g. `Apache`, `Nginx`... or `PythonRaw`.)
- Content-Length: The exact size of the body in bytes.
- Connection: Should we keep talking or hang up? (`close` or `keep-alive`.)
The Date header is tricky. It must follow a specific text format: Tue, 15 Nov 1994 08:12:31 GMT.
Adding Date and Server Headers
Let's update our build_response to include the Date.
```python
import time
from email.utils import formatdate

def get_current_date():
    # Returns standardized HTTP Date format
    # usegmt=True is critical. HTTP speaks GMT, not your local timezone.
    return formatdate(time.time(), usegmt=True)

# Inside build_response headers list:
headers.append(f"Date: {get_current_date()}")
headers.append("Connection: close")  # Being explicit
```

Advanced Protocol Engineering
We have a working web server. But it has flaws. It is Blocking: If one user is slow, everyone waits. It is Dynamic only: It can't serve files like images. It is Insecure: It has no protections.
Time to make it production-grade.
10. Handle Concurrent Connections
The Concurrency Problem
Currently, our server is Single Threaded.
accept() waits for a client. recv() waits for data.
If User A connects and takes 10 seconds to send their request (a slow mobile network), User B tries to connect but gets stuck in the OS backlog queue. User B thinks the site is down.
To fix this, we need Concurrency. We will use Threads. A thread is a lightweight "worker" that runs inside your process and shares its memory.
- The Main thread waits for connections (`accept`).
- When a client arrives, the Main thread creates a new Worker thread.
- The Worker thread handles that client.
- The Main thread immediately goes back to waiting for the next client.
Threading the Server
We will modify the main loop structure completely.
```python
import threading

def handle_client_connection(client_socket):
    # This function contains the logic we wrote in Phase 1 & 2:
    # Recv -> Parse -> Route -> Respond -> Close
    try:
        request_data = client_socket.recv(1024)
        if request_data:
            # ... (Insert Parsing Logic Here) ...
            client_socket.sendall(response.encode('utf-8'))
    finally:
        # Always close the socket, even if code crashes
        client_socket.close()

# The Main Loop
while True:
    # 1. Accept
    client_socket, addr = server_socket.accept()

    # 2. Spawn Thread
    # target=function, args=(arguments,)
    thread = threading.Thread(target=handle_client_connection, args=(client_socket,))

    # Daemon means "kill this thread if the main program stops"
    thread.daemon = True

    # 3. Start it
    thread.start()

    # 4. Loop immediately back to accept()
```

Testing Parallel Connections
- Add `import time` and `time.sleep(5)` to one of your route handlers.
- Open two browser tabs.
- Load the slow page in Tab 1.
- Immediately load the normal page in Tab 2.
- Result: Tab 2 should load instantly, even though Tab 1 is still spinning.
If you were single-threaded, Tab 2 would wait 5 seconds.
11. Parse Query Parameters
Anatomy of a Query String
Paths often contain data: /search?q=pizza&sort=price.
The ? marks the start of the Query String.
The & separates pairs.
The = separates keys and values.
We need to extract q=pizza and sort=price so our application logic can use them.
Parsing with urllib
We assume path is /search?q=pizza.
```python
from urllib.parse import urlparse, parse_qs

# Inside route handler
# urlparse breaks the string into components
parsed_url = urlparse(path)
real_path = parsed_url.path      # "/search"
query_string = parsed_url.query  # "q=pizza&sort=price"

# parse_qs checks the string and returns a dictionary of LISTS
# Why lists? Because ?filter=red&filter=blue is valid.
params = parse_qs(query_string)
# Result: {'q': ['pizza'], 'sort': ['price']}

# Get the first value or a default
search_term = params.get('q', [''])[0]
```

12. Serve Files from a Directory
Serving Static Assets
Serving hardcoded strings (return "<h1>Hit</h1>") is tedious. We want to serve HTML files, images, and CSS from a folder on our hard drive.
SECURITY ALERT: This is the most dangerous part of writing a server.
A hacker will request: GET /../../../../etc/passwd.
If you blindly join this path to your web root, your server will resolve ../../ and serve sensitive system files (passwords, ssh keys).
This attack is called Directory Traversal or Path Traversal.
We must Sanitize the path.
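You can demonstrate the attack (and the defense) in isolation before touching the server — `os.path.abspath` resolves the `..` segments, and a `startswith` check catches the escape:

```python
import os

WEB_ROOT = os.path.abspath("./public")

# A hypothetical malicious request path
attack = "/../../etc/passwd"
resolved = os.path.abspath(os.path.join(WEB_ROOT, attack.lstrip('/')))

# The resolved path has escaped the web root, so the check fails
print(resolved.startswith(WEB_ROOT))  # False

# A legitimate path stays inside the root
good = os.path.abspath(os.path.join(WEB_ROOT, "index.html"))
print(good.startswith(WEB_ROOT))  # True
```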
Safe File Serving
```python
import os

WEB_ROOT = "./public"  # Create this folder

def serve_file(client_socket, path):
    if path == "/":
        path = "/index.html"

    # 1. Decode URL (%20 -> Space)
    path = unquote(path)

    # 2. Secure Path Joining
    # Remove leading slashes to prevent absolute path confusion
    relative_path = path.lstrip('/')

    # Join with our web root
    file_path = os.path.join(WEB_ROOT, relative_path)

    # 3. Resolve absolute path (handles ./ and ../)
    absolute_path = os.path.abspath(file_path)
    expected_root = os.path.abspath(WEB_ROOT)

    # 4. SECURITY CHECK
    # Does the resolved path start with our expected root?
    if not absolute_path.startswith(expected_root):
        # The user is trying to escape the root!
        send_response(client_socket, "Access Denied", "403 Forbidden")
        return

    # 5. Check existence
    if os.path.exists(absolute_path) and os.path.isfile(absolute_path):
        # Open in BINARY mode ('rb') for images
        with open(absolute_path, 'rb') as f:
            content = f.read()
        send_response(client_socket, content, "200 OK")
    else:
        send_response(client_socket, "Not Found", "404 Not Found")
```

Verifying Security
- Create a folder `public`.
- Create `public/index.html`.
- Request `/index.html`. It works.
- Request `/../server.py`. It should return 403 or 404, NOT your source code.
13. Detect Content Types
Why Content-Type Matters
If you serve a .jpg image but don't set Content-Type: image/jpeg, the browser will try to read it as text. It will look like binary garbage characters.
We must guess the type based on the file extension.
Guessing MIME Types
```python
import mimetypes

# Initialize system mime database
mimetypes.init()

# Inside serve_file function, before opening the file:
# guess_type returns ('image/jpeg', encoding)
mime_type, _ = mimetypes.guess_type(absolute_path)

if mime_type is None:
    # Fallback for unknown files
    mime_type = 'application/octet-stream'

# Pass to build_response
send_response(client_socket, content, "200 OK", content_type=mime_type)
```

Now you can serve CSS, JS, PNG, and PDF files correctly.
14. Handle POST Requests
GET vs POST
GET requests have the data in the URL.
POST requests (like Login forms) put data in the Body, after the headers.
The critical difference: We cannot just data = recv(1024).
Why? The header might say Content-Length: 5000. Our recv(1024) only got the first 1kb.
We must loop and keep receiving until we have all 5000 bytes.
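Once the full body has been collected, a standard HTML form submission (`Content-Type: application/x-www-form-urlencoded`) can be decoded with the same `parse_qs` we used for query strings. A sketch with a hypothetical payload:

```python
from urllib.parse import parse_qs

# Hypothetical body of a form POST, already fully received
full_body = b"username=alice&color=blue&color=red"

form = parse_qs(full_body.decode('utf-8'))
print(form['username'][0])  # alice
print(form['color'])        # ['blue', 'red'] - repeated keys become lists
```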
Reading the Body
```python
# Separate headers from body (if we over-read in the first recv)
# We split on the first double-newline
if b'\r\n\r\n' in request_data:
    header_part, body_part = request_data.split(b'\r\n\r\n', 1)

    # Parse headers to find length
    header_text = header_part.decode('utf-8')
    headers = parse_headers(header_text)

    # Get Content-Length (default to 0)
    content_length = int(headers.get('Content-Length', 0))

    # We might have already read some of the body in the first packet
    body_received = len(body_part)
    remaining = content_length - body_received
    full_body = body_part

    # Keep reading until we have the full body
    while remaining > 0:
        # Read a chunk
        chunk = client_socket.recv(1024)
        if not chunk:
            break
        full_body += chunk
        remaining -= len(chunk)

    print(f"POST Body: {full_body.decode('utf-8')}")
```

15. Connection Keep-Alive
The Cost of Handshakes
Opening a TCP connection involves a "3-Way Handshake". This implies round-trip latency.
Loading a modern webpage requires 50+ files (CSS, JS, Images).
If we close() after every file, we do 50 handshakes. The site will be slow.
Keep-Alive allows the client to reuse the same TCP socket for multiple HTTP requests.
- Browser sends Request 1.
- Server sends Response 1.
- Server Does not close.
- Browser sends Request 2 on same socket.
Looping the Connection
We need to change our thread logic.
Old: Recv -> Respond -> Close
New: Loop (Recv -> Respond)
```python
def handle_client_connection(client_socket):
    # Set a timeout. If client is silent for 5s, hang up.
    client_socket.settimeout(5.0)

    while True:
        try:
            request_data = client_socket.recv(1024)
            if not request_data:
                break

            # ... process and send response ...

            # Check if client ASKED to close
            if headers.get('Connection') == 'close':
                break
        except socket.timeout:
            # Client didn't send another request in time
            break

    client_socket.close()
```

16. Support HTTP Compression
Why Compress?
Text files (HTML/CSS/JS) contain a lot of repeated patterns (spaces, tags). Gzip encoding can reduce file size by 70-90%, which can make your site load several times faster on slow networks.
- Client sends `Accept-Encoding: gzip, deflate`.
- Server sees `gzip`.
- Server compresses the body.
- Server adds header `Content-Encoding: gzip`.
- Browser receives compressed binary and unzips it.
Implementing Gzip
```python
import gzip

def compress_if_possible(body_bytes, headers_dict, client_headers):
    # What does the client accept?
    accepts = client_headers.get('Accept-Encoding', '')

    if 'gzip' in accepts:
        # Compress
        compressed_body = gzip.compress(body_bytes)

        # Add header so browser knows to unzip
        headers_dict['Content-Encoding'] = 'gzip'
        return compressed_body, headers_dict

    return body_bytes, headers_dict
```

17. HTTP Pipelining
What is Pipelining?
Pipelining is an advanced optimization where a client fires 3 requests (A, B, C) down the wire without waiting for the response to A.
Our recv buffer might look like: GET /A... GET /B... GET /C... all mashed together.
Handling this correctly involves sophisticated buffer management. We must parse one request, remove it from the start of the buffer, process it, and then look at the buffer again.
Most simple servers (like ours) don't support this and just process the first request. That is valid, as pipelining is optional in HTTP/1.1 (HTTP/2 replaced it with true multiplexing).
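If you did want to support it, the core of the buffer management can be sketched as a function that repeatedly peels one complete request off the front of the buffer (simplified: header-only requests such as GET, no bodies):

```python
def split_pipelined(buffer):
    # Peel complete requests (terminated by the blank line) off the
    # front of the buffer; return them plus any incomplete leftover.
    requests = []
    while b"\r\n\r\n" in buffer:
        request, buffer = buffer.split(b"\r\n\r\n", 1)
        requests.append(request + b"\r\n\r\n")
    return requests, buffer

wire = b"GET /A HTTP/1.1\r\nHost: x\r\n\r\nGET /B HTTP/1.1\r\nHost: x\r\n\r\nGET /C"
reqs, leftover = split_pipelined(wire)
print(len(reqs))  # 2
print(leftover)   # b'GET /C' - request C hasn't fully arrived yet
```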
18. Chunked Transfer Encoding
Streaming Data
Normally, we send Content-Length. But what if we are generating data on the fly (like a live video stream) and don't know the size yet?
We use Transfer-Encoding: chunked.
We send data in pieces. Each piece has a size header (in Hexadecimal).
Format:

```
4\r\n       (4 bytes coming)
Wiki\r\n    (Data)
5\r\n       (5 bytes coming)
pedia\r\n   (Data)
0\r\n       (0 means End of Stream)
\r\n        (End)
```
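To make the framing concrete, here is how the "Wikipedia" example above is assembled into raw bytes (a standalone sketch):

```python
chunks = [b"Wiki", b"pedia"]

wire = b""
for chunk in chunks:
    # Hex size line, then the data, each terminated by CRLF
    wire += f"{len(chunk):X}\r\n".encode('utf-8') + chunk + b"\r\n"
wire += b"0\r\n\r\n"  # zero-length chunk = end of stream

print(wire)  # b'4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n'
```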
Sending Chunks
```python
def send_chunk(client_socket, data):
    # Size in hexadecimal + CRLF
    size_line = f"{len(data):X}\r\n".encode('utf-8')
    client_socket.sendall(size_line)

    # Data + CRLF
    client_socket.sendall(data)
    client_socket.sendall(b"\r\n")

def end_chunks(client_socket):
    # The zero-length chunk signals end of stream
    client_socket.sendall(b"0\r\n\r\n")
```

Putting It All Together
We have learned 18 concepts. We have built a router, a static file server, a concurrent engine, and a security system.
Now, here is the final, copy-pasteable artifact.
The Full Code (server.py)
This is the production-ready(ish) code.
```python
import socket
import threading
import os
import mimetypes
import gzip
from urllib.parse import parse_qs, urlparse, unquote
from email.utils import formatdate
import time

# --- Configuration ---
HOST = '127.0.0.1'
PORT = 8080
WEB_ROOT = './public'
DEBUG = True

def log(message):
    if DEBUG:
        print(f"[{time.strftime('%H:%M:%S')}] {message}")

def get_response_headers(status, content_type, length, extra_headers=None):
    headers = [
        f"HTTP/1.1 {status}",
        f"Content-Type: {content_type}",
        f"Content-Length: {length}",
        f"Date: {formatdate(time.time(), usegmt=True)}",
        "Server: PythonRaw/1.0",
        "Connection: keep-alive"
    ]
    if extra_headers:
        for k, v in extra_headers.items():
            headers.append(f"{k}: {v}")
    return "\r\n".join(headers).encode('utf-8') + b"\r\n\r\n"

def handle_get(client_socket, path, request_headers):
    # 1. Parse URL
    parsed = urlparse(path)
    clean_path = unquote(parsed.path)

    # 2. Router
    if clean_path == "/test":
        body = b"<h1>Dynamic Page</h1><p>This path is handled by code, not files.</p>"
        head = get_response_headers("200 OK", "text/html", len(body))
        client_socket.sendall(head + body)
        return

    # 3. Static File Server
    if clean_path == '/':
        clean_path = '/index.html'

    # Security: Prevent Directory Traversal
    safe_path = os.path.abspath(os.path.join(WEB_ROOT, clean_path.lstrip('/')))
    if not safe_path.startswith(os.path.abspath(WEB_ROOT)):
        body = b"<h1>403 Forbidden</h1>"
        client_socket.sendall(get_response_headers("403 Forbidden", "text/html", len(body)) + body)
        return

    if os.path.exists(safe_path) and os.path.isfile(safe_path):
        mime_type, _ = mimetypes.guess_type(safe_path)
        content_type = mime_type or 'application/octet-stream'
        with open(safe_path, 'rb') as f:
            content = f.read()

        # Compression
        extra_headers = {}
        if 'gzip' in request_headers.get('Accept-Encoding', ''):
            content = gzip.compress(content)
            extra_headers['Content-Encoding'] = 'gzip'

        client_socket.sendall(get_response_headers("200 OK", content_type, len(content), extra_headers) + content)
    else:
        body = b"<h1>404 Not Found</h1>"
        client_socket.sendall(get_response_headers("404 Not Found", "text/html", len(body)) + body)

def handle_client(client_socket, addr):
    client_socket.settimeout(10.0)  # Keep-Alive timeout
    try:
        while True:
            try:
                # Read Request Line (Naive implementation)
                data = client_socket.recv(4096)
                if not data:
                    break

                request_text = data.decode('utf-8')
                lines = request_text.split('\r\n')
                request_line = lines[0]
                if not request_line:
                    break

                parts = request_line.split(' ')
                method, path = parts[0], parts[1]

                # Parse Headers
                headers = {}
                for line in lines[1:]:
                    if line == '':
                        break
                    parts_h = line.split(':', 1)
                    if len(parts_h) == 2:
                        headers[parts_h[0].strip()] = parts_h[1].strip()

                log(f"{method} {path} from {addr}")

                if method == 'GET':
                    handle_get(client_socket, path, headers)
                else:
                    body = b"Method Not Allowed"
                    client_socket.sendall(get_response_headers("405 Method Not Allowed", "text/plain", len(body)) + body)

                # Simple Keep-Alive check (real implementations are more complex)
                if headers.get('Connection') == 'close':
                    break
            except socket.timeout:
                break
    except Exception as e:
        log(f"Error: {e}")
    finally:
        client_socket.close()

def start_server():
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server.bind((HOST, PORT))
    server.listen(5)
    print(f"🚀 Server listening on http://{HOST}:{PORT}")
    print(f"📂 Serving files from: {os.path.abspath(WEB_ROOT)}")

    if not os.path.exists(WEB_ROOT):
        os.makedirs(WEB_ROOT)
        with open(os.path.join(WEB_ROOT, "index.html"), "w") as f:
            f.write("<h1>It Works!</h1><p>You built this server from scratch.</p>")

    while True:
        client, addr = server.accept()
        t = threading.Thread(target=handle_client, args=(client, addr))
        t.daemon = True
        t.start()

if __name__ == "__main__":
    try:
        start_server()
    except KeyboardInterrupt:
        print("\nStopping...")
```

How to Run It
- Open your Terminal (Command Prompt or Terminal app).
- Create the file:
  ```
  # Create the python file
  touch server.py
  # (Open it in your editor and paste the code above)
  ```
- Run the server:
  ```
  python3 server.py
  ```
- Test it in your browser: Open http://localhost:8080. You should see: "It Works! You built this server from scratch."
- Stop the server: Press `Ctrl + C` in your terminal to quit.
Conclusion
Congratulations. If you followed every step, you didn't just write Python code - you implemented a protocol. You rebuilt the engine that powers the World Wide Web.
Next time you use a framework and see app.run(), you'll know exactly what magic it is performing for you.