Networking Fundamentals

Master the networking concepts every cloud engineer needs, from TCP/IP and DNS to VPCs, load balancers, and troubleshooting connectivity issues in production.

Cloud infrastructure is built on networks. Computing, at its core, is about moving data between servers, databases, and users, and all of that data moves over networks. Going on YouTube from your laptop uses the same general networking technology as running a web server in the cloud. This guide covers the networking fundamentals every cloud engineer, DevOps engineer, and SRE needs to know. You'll learn core protocols, cloud networking concepts, common tools, and practical troubleshooting techniques. There's a lot of information here, so don't worry about retaining it all; learning networking takes months of study (if not years).


Why Cloud Engineers Need Networking Skills

Cloud platforms abstract away physical hardware, but they don't eliminate networking. You still need to:

  • Design secure networks: Configure virtual networks, subnets, security groups, and firewalls to isolate workloads and control traffic flow.
  • Optimize performance: Use CDNs, load balancers, and DNS routing to reduce latency and improve availability. Data doesn't move instantly; geographic proximity really does make a difference.
  • Troubleshoot connectivity: Diagnose firewall rules, DNS failures, SSL certificate issues, and network misconfigurations. If you're running a Windows web server in the cloud, you'll still need to set up IIS and treat it like a normal server.
  • Implement hybrid architectures: Connect on-premises data centers to cloud environments via VPNs, Direct Connect, or ExpressRoute. There are tons of options, and they all have tradeoffs.

Let's learn how packets flow, how DNS works, and how to debug connection failures.


Core Networking Concepts

The OSI Model and TCP/IP Stack

The OSI model is a 7-layer framework for understanding network communication:

  1. Physical: Cables, signals, hardware.
  2. Data Link: MAC addresses, switches, Ethernet.
  3. Network: IP addresses, routing, routers.
  4. Transport: TCP/UDP, ports, reliability.
  5. Session: Connection management.
  6. Presentation: Encryption, data formatting.
  7. Application: HTTP, DNS, SMTP, APIs.

In practice, most cloud engineers focus on layers 3–7:

  • Layer 3 (Network): IP addresses, subnets, routing tables, VPNs.
  • Layer 4 (Transport): TCP (reliable, connection-based) vs UDP (fast, connectionless).
  • Layer 7 (Application): HTTP/HTTPS, DNS, load balancing, API gateways.

You can see this in load balancers on cloud platforms-- they're usually layer 4 or 7. No such thing as a "data link" or "presentation" load balancer!

IP Addresses and Subnets

Every device on a network has an IP address. Cloud engineers work with two types:

  • IPv4: 32-bit addresses (e.g., 192.168.1.10). Most common, but address space is limited.
  • IPv6: 128-bit addresses (e.g., 2001:0db8::1). Growing adoption in cloud environments, but still less common than IPv4; make sure you recognize shorthand like ::1 (the IPv6 loopback) or :: (all zeros) when you see them.
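
If you want to see IPv6 in action on your own machine, these commands are a quick, low-stakes way to do it (assuming a Linux box with the usual iproute2 and iputils tools; older systems may ship ping6 instead of ping -6):

# Show IPv6 addresses on your interfaces
ip -6 addr show

# Ping the IPv6 loopback address (::1 is IPv6's equivalent of 127.0.0.1)
ping -6 ::1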

CIDR notation (pronounced "cider") defines IP ranges using subnet masks. For example:

  • 10.0.0.0/16 = 65,536 IPs (10.0.0.0 to 10.0.255.255)
  • 10.0.1.0/24 = 256 IPs (10.0.1.0 to 10.0.1.255)
  • 10.0.1.0/28 = 16 IPs (10.0.1.0 to 10.0.1.15)

The number of IPs in a subnet is determined by the subnet mask. A quick way of knowing how many IP addresses are available is to do 2^(32 - subnet mask). For example, 10.0.1.0/24 has 2^(32 - 24) = 2^8, or 256, IP addresses. Then subtract 2 for the network and broadcast addresses, leaving you with 254 usable IP addresses.
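
You can sanity-check that math with plain shell arithmetic; here's a minimal sketch (the mask value is just an example):

# Total and usable addresses for a /24
MASK=24
echo $(( 2 ** (32 - MASK) ))       # 256 total addresses
echo $(( 2 ** (32 - MASK) - 2 ))   # 254 usable (minus network and broadcast)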

Cloud VPCs (Virtual Private Clouds) use the RFC 1918 private IP ranges, all of which work on any major provider:

  • 10.0.0.0/8 (the most common choice for VPC CIDR blocks)
  • 172.16.0.0/12 (AWS uses this range for default VPCs)
  • 192.168.0.0/16 (common in home networks)

When designing a VPC, you partition it into subnets-- typically public subnets (internet-facing) and private subnets (internal services).

DNS (Domain Name System)

DNS translates domain names (like cloudjobs.com) into IP addresses. Key DNS record types:

  • A: Maps a domain to an IPv4 address.
  • AAAA: Maps a domain to an IPv6 address.
  • CNAME: Alias for another domain (e.g., www.example.com -> example.com).
  • MX: Mail server records.
  • TXT: Text records (used for domain verification, SPF, DKIM).

These are hard to remember until you've used them a few times. If you set up a cheap domain to work on, use it to practice creating DNS records, then verify them with the lookups below.
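
dig can query each record type directly; the lookups against example.com here are just illustrations:

# Query specific DNS record types
dig example.com A +short
dig example.com AAAA +short
dig example.com MX +short
dig example.com TXT +short
dig www.example.com CNAME +short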

Cloud engineers use DNS for:

  • Load balancing: Route53 (AWS), Azure DNS, Cloud DNS (GCP) distribute traffic across regions.
  • Service discovery: Internal DNS in Kubernetes, Consul, or AWS Cloud Map.
  • Failover: Health checks automatically reroute traffic if a service goes down.

Common DNS troubleshooting:

# Check DNS resolution
nslookup cloudjobs.com
dig cloudjobs.com
host cloudjobs.com

# Trace DNS query path
dig +trace cloudjobs.com

Ports and Protocols

Services listen on specific ports. Common ports cloud engineers encounter:

  • 22: SSH (remote server access), SFTP (file transfer)
  • 25: SMTP (email)
  • 80: HTTP (unencrypted web traffic)
  • 443: HTTPS (encrypted web traffic)

Some less common (but still frequently encountered) ports:

  • 3306: MySQL
  • 5432: PostgreSQL
  • 6379: Redis
  • 27017: MongoDB
  • 8080: Alternative HTTP port

TCP vs UDP:

  • TCP: Reliable, ordered delivery. Used for HTTP, SSH, databases. Slower due to handshakes.
  • UDP: Fast, no guarantees. Used for DNS, video streaming, VoIP. Lower latency.
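
A few quick ways to poke at TCP and UDP ports from the command line (hostnames here are placeholders, and nc flags can vary slightly between netcat variants):

# Test a TCP port (e.g., PostgreSQL)
nc -zv db.internal 5432

# Test a UDP port (DNS); -u switches nc to UDP
nc -zvu 8.8.8.8 53

# DNS normally uses UDP, but you can force TCP
dig example.com
dig +tcp example.com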

Firewalls and Security Groups

Firewalls control traffic flow. In cloud environments, you configure:

  • Security Groups (AWS) / VPC firewall rules (GCP): Virtual firewalls for instances. These are stateful, so return traffic is automatically allowed.
  • Network ACLs (AWS): Subnet-level rules. Stateless (must explicitly allow inbound and outbound).
  • NSGs (Azure): Network Security Groups for VMs and subnets.

Example security group rule:

Type: HTTP
Protocol: TCP
Port: 80
Source: 0.0.0.0/0 (allow from anywhere)
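
If you prefer the CLI, the same rule can be added with the AWS CLI; this is a sketch and the security group ID is a placeholder:

# Allow inbound HTTP from anywhere on an existing security group
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 80 \
  --cidr 0.0.0.0/0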

Best practices:

  • Least privilege: Only open necessary ports. If you're only serving a web app, you only need ports 80 and 443. If you also open ports 22 and 3306 to the world, you could inadvertently open yourself up to attacks.
  • Restrict SSH: Allow SSH (port 22) only from your IP or a bastion host; on most cloud providers, a security group rule scoped to your IP address handles this.
  • Separate tiers: Put databases in private subnets with no internet access. Your webapp and APIs need access to the database, but the public internet definitely doesn't.

Cloud Networking Concepts

VPCs (Virtual Private Clouds)

A VPC is an isolated virtual network in the cloud. You define:

  • IP range (e.g., 10.0.0.0/16)
  • Subnets (public and private)
  • Route tables (how traffic flows)
  • Internet gateway (connects public subnets to the internet)
  • NAT gateway (allows private subnets to access the internet without exposing them)

Example AWS VPC architecture:

VPC: 10.0.0.0/16
├── Public Subnet: 10.0.1.0/24 (web servers, load balancers)
│   └── Route: 0.0.0.0/0 -> Internet Gateway
├── Private Subnet: 10.0.2.0/24 (app servers)
│   └── Route: 0.0.0.0/0 -> NAT Gateway
└── Private Subnet: 10.0.3.0/24 (databases)
    └── No internet route

This isn't the only layout that works, but it's the most common. Don't get fancy-- following best practices and common patterns will prevent you from making mistakes and help your teammates get up to speed.
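
As a rough idea of what the first few steps look like outside the console, here's a hedged AWS CLI sketch (the IDs are placeholders, and in practice you'd usually manage this with Terraform):

# Create the VPC and a public subnet
aws ec2 create-vpc --cidr-block 10.0.0.0/16
aws ec2 create-subnet --vpc-id vpc-0123456789abcdef0 --cidr-block 10.0.1.0/24

# Attach an internet gateway so the public subnet can reach the internet
aws ec2 create-internet-gateway
aws ec2 attach-internet-gateway --internet-gateway-id igw-0123456789abcdef0 --vpc-id vpc-0123456789abcdef0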

Load Balancers

Load balancers distribute traffic across multiple servers. Cloud providers offer:

  • Application Load Balancer (ALB): Layer 7 (HTTP/HTTPS). Routes based on URL paths, headers, or hostnames.
  • Network Load Balancer (NLB): Layer 4 (TCP/UDP). Ultra-low latency, handles millions of requests per second.
  • Classic Load Balancer: Legacy AWS product that's being phased out.

Use cases:

  • High availability: If one server fails, traffic routes to healthy instances.
  • Scalability: Add/remove instances based on demand.
  • SSL termination: Load balancer handles HTTPS encryption, backends use HTTP.

Cloud load balancers have a common suite of features like health checks, sticky sessions, connection draining, and SSL termination.
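
A quick way to sanity-check a load balancer from your terminal, assuming a hypothetical app.example.com with a /health endpoint:

# Check that the load balancer answers and terminates TLS
curl -sI https://app.example.com/health

# Hit it a few times; with DNS-based balancing you may see different frontend IPs
for i in 1 2 3; do curl -s -o /dev/null -w "%{http_code} %{remote_ip}\n" https://app.example.com/health; done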

VPNs and Direct Connect

Connecting on-premises infrastructure to the cloud:

  • VPN (Virtual Private Network): Encrypted tunnel over public internet. Lower cost, moderate latency.
  • Direct Connect (AWS) / ExpressRoute (Azure): Dedicated private connection. Higher cost, lower latency, more reliable. Seriously, much more expensive.

Service Meshes and Ingress Controllers

In Kubernetes environments:

  • Ingress Controller: Routes external HTTP/HTTPS traffic to services inside the cluster. Common controllers include NGINX Ingress and Traefik (pronounced "traffic").
  • Service Mesh: Manages service-to-service communication (e.g., Istio, Linkerd). Provides load balancing, encryption, and observability.
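
In a Kubernetes cluster, you can see how external traffic is routed by inspecting the Ingress resources (the resource name and namespace below are hypothetical):

# List ingress resources across all namespaces
kubectl get ingress -A

# Show the routing rules, backends, and events for one ingress
kubectl describe ingress my-app -n production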

Essential Networking Commands

Connectivity Testing

# Ping a server (ICMP)
ping google.com

# Check if a port is open
telnet example.com 80
nc -zv example.com 443

# Trace network route
traceroute google.com
mtr google.com  # continuous traceroute

# Check open ports on localhost
netstat -tuln
ss -tuln

DNS Troubleshooting

# Basic DNS lookup
nslookup cloudjobs.com

# Detailed DNS query
dig cloudjobs.com

# Query specific DNS server
dig @8.8.8.8 cloudjobs.com

# Reverse DNS lookup
dig -x 104.26.10.123

Network Interface Management

# Show network interfaces and IPs
ip addr show
ifconfig  # older command

# Show routing table
ip route show
route -n

# Display ARP table (IP to MAC address mapping)
arp -a
ip neigh show

Firewall and Port Management

# Check firewall status (Ubuntu/Debian)
sudo ufw status

# Allow a port
sudo ufw allow 443/tcp

# Check open connections
sudo netstat -tunlp
sudo ss -tunlp

# Show listening services
sudo lsof -i -P -n | grep LISTEN

Packet Capture

# Capture network traffic
sudo tcpdump -i eth0 port 80

# Save to file
sudo tcpdump -i eth0 -w capture.pcap

# Read saved capture
tcpdump -r capture.pcap

# Wireshark (GUI alternative for deeper analysis)

Common Networking Issues and How to Fix Them

Issue 1: "Connection Timed Out"

Possible causes:

  • Firewall/security group blocking traffic
  • Service not running on target port
  • Network route misconfigured

Debugging steps:

  1. Check if the service is running: sudo systemctl status nginx
  2. Verify the service is listening: sudo netstat -tuln | grep 80
  3. Test connectivity: telnet <IP> <port>
  4. Check security group rules (AWS Console, CLI, or Terraform)
  5. Check route tables and network ACLs
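
For step 4, you can inspect security group rules from the CLI instead of the console; the group ID is a placeholder:

# Show inbound/outbound rules for a security group
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0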

Issue 2: "DNS Resolution Failed"

Possible causes:

  • DNS server misconfigured
  • DNS record not propagated
  • Firewall blocking DNS (port 53)

Debugging steps:

# Check DNS servers
cat /etc/resolv.conf

# Test with different DNS server
dig @8.8.8.8 example.com

# Check DNS propagation
dig +trace example.com

# Flush the local DNS cache (systemd-resolved systems)
sudo resolvectl flush-caches   # older systems: sudo systemd-resolve --flush-caches

Issue 3: "SSL Certificate Error"

Possible causes:

  • Expired certificate
  • Certificate not trusted
  • Hostname mismatch

If this is a project you're working on, you can use a self-signed certificate. If this is a production environment, you should use a certificate from a trusted Certificate Authority (CA).

Debugging steps:

# Check certificate details (add -servername example.com if the host serves multiple certs via SNI)
openssl s_client -connect example.com:443

# Verify certificate expiration
echo | openssl s_client -connect example.com:443 2>/dev/null | openssl x509 -noout -dates

# Test with curl
curl -vI https://example.com

Cloud-Specific Networking Services

AWS

  • VPC Flow Logs: Capture IP traffic going to/from network interfaces.
  • Route53: Managed DNS service with health checks and failover.
  • CloudFront: CDN for content delivery.
  • Transit Gateway: Connect multiple VPCs and on-premises networks.

Azure

  • Virtual Network (VNet): Azure's VPC equivalent.
  • Azure DNS: Managed DNS hosting.
  • Azure Load Balancer: Layer 4 load balancing.
  • Application Gateway: Layer 7 load balancing with WAF.

GCP

  • VPC: Global virtual network.
  • Cloud Load Balancing: Global load balancing with auto-scaling.
  • Cloud CDN: Content delivery network.
  • Cloud Interconnect: Dedicated connection to on-premises.

Learning Path: From Basics to Production

  1. First: Fully understand IP addressing, subnets, and CIDR notation to the point that it's second nature.
  2. Second: Learn DNS-- set up a domain, configure A/CNAME records, and test with dig and nslookup.
  3. Third: Understand TCP/UDP, ports, and firewall rules. Deploy a web server and configure security groups.
  4. Fourth: Set up a VPC in AWS, Azure, or GCP. Create public and private subnets, configure NAT gateways.
  5. Fifth: Deploy a load balancer and route traffic to multiple backend instances.
  6. Sixth: Practice troubleshooting with ping, traceroute, telnet, tcpdump, and curl.

Practical Exercises

  1. Build a 3-tier network: Create a VPC with public (web), private (app), and isolated (database) subnets.
  2. Set up a VPN: Connect your local machine to a cloud VPC using OpenVPN or WireGuard.
  3. Configure DNS failover: Use Route53 or Azure DNS to automatically route traffic to a backup region if the primary fails.
  4. Troubleshoot a broken app: Deploy a misconfigured app (wrong security group, DNS issue, expired SSL cert) and fix it.
  5. Capture and analyze traffic: Use tcpdump or Wireshark to inspect HTTP requests and diagnose issues.

The Bottom Line: Everything Runs on Networking

Start with the basics. Spin up cloud VPCs, break things, and fix them. Create a new web server, enable traffic to it, disable traffic to it. You'll never understand Kubernetes networking, service meshes, or Terraform-managed infrastructure unless you understand the networking underneath.

For more cloud engineering guidance, check out Linux fundamentals and how to get a cloud engineering job with no experience.
