How Containers Isolate Workloads
Linux containers achieve isolation through several kernel features working together. Unlike virtual machines, containers share the host kernel — which makes them lightweight but also means isolation is process-level rather than hardware-level.
Namespaces: What Each Container Gets Its Own
Namespaces are the foundation of container isolation. Each namespace type provides a separate view of a specific system resource:
Mount Namespace
Each container gets an isolated filesystem view. Changes to mounts inside the container don't affect the host or other containers.
Network Namespace
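Containers get their own network stack: interfaces, IP addresses, routing tables, and firewall rules. This is why several containers can each bind to port 80 without conflicts; each binds inside its own namespace, and a clash only occurs if two of them are published on the same host port.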
PID Namespace
A separate process ID tree. The main process inside a container sees itself as PID 1, even though the host sees it as a regular process with a different PID.
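Both views of the same process can be read from the `NSpid` line of `/proc/<pid>/status` (present on Linux 4.1 and later), which lists the PID in each nested PID namespace, outermost first. The following is a minimal sketch; the helper name `ns_pids` is illustrative and not a standard tool.

```shell
# ns_pids STATUS_FILE
# Print the PIDs from the NSpid line of a /proc/<pid>/status file,
# one per line, outermost PID namespace first.
ns_pids() {
    awk '$1 == "NSpid:" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Demo on the current shell, if procfs is available.
if [ -r "/proc/$$/status" ]; then
    ns_pids "/proc/$$/status"
fi
```

Run on the host against a containerized process, this would typically print two numbers: the host PID, then the PID the process has inside its container.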
UTS Namespace
Own hostname and domain name. Containers can set their hostname independently.
User Namespace
Separate user/group IDs. Root inside the container (UID 0) can map to an unprivileged user on the host — a key security feature.
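The mapping itself is exposed in `/proc/<pid>/uid_map`, where each line reads "first ID inside, first ID outside, range length". As a rough sketch of how that translation works (the function name `map_uid` and the example range are assumptions for illustration):

```shell
# map_uid INSIDE_UID UID_MAP_FILE
# Translate a UID inside a user namespace to the UID in the parent
# namespace, using the "inside-start outside-start length" lines of a
# uid_map file. Prints nothing and returns 1 if the UID is unmapped.
map_uid() {
    awk -v uid="$1" '
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { exit !found }
    ' "$2"
}

# Demo: where UID 0 of this process lands in the parent namespace.
if [ -r /proc/self/uid_map ]; then
    map_uid 0 /proc/self/uid_map || echo "UID 0 is not mapped"
fi
```

With a map line such as `0 100000 65536`, container root (UID 0) would translate to the unprivileged host UID 100000.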
IPC Namespace
Isolated inter-process communication: shared memory, semaphores, message queues.
Cgroup Namespace
Isolated view of the cgroup hierarchy, hiding the host's cgroup structure.
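Every namespace a process belongs to appears as a symbolic link under `/proc/<pid>/ns`, and the link target carries an identifier. The sketch below lists them; `list_namespaces` is a hypothetical helper name, and reading another user's process may require elevated privileges.

```shell
# list_namespaces [PID]
# Print each namespace type and its identifier for a process
# (defaults to the current shell). Two processes share a namespace
# when the identifiers for that type match.
list_namespaces() {
    pid="${1:-$$}"
    for link in /proc/"$pid"/ns/*; do
        [ -L "$link" ] || continue
        printf '%s\t%s\n' "$(basename "$link")" "$(readlink "$link")"
    done
}

list_namespaces
```

Comparing the output for a host process and a containerized process shows which namespace types the container runtime actually separated.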
Cgroups: Resource Limits
Namespaces control what a container can see; cgroups (control groups) limit how much of each resource it can use:
- CPU — time slices, core pinning
- Memory — RAM and swap limits
- Block I/O — disk bandwidth throttling
- Network — bandwidth limits (via tc)
- Devices — which devices the container can access
# Example: Limit container to 2 CPUs and 1GB RAM
docker run --cpus=2 --memory=1g nginx
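Limits like these end up as plain files in the cgroup filesystem. The sketch below interprets two of them, assuming a cgroup v2 host where the container sees its own limits at `/sys/fs/cgroup/cpu.max` and `/sys/fs/cgroup/memory.max` (cgroup v1 uses different file names, and the helper names are illustrative).

```shell
# cpu_limit FILE
# cpu.max holds "<quota> <period>" in microseconds, or "max <period>".
cpu_limit() {
    awk '$1 == "max" { print "unlimited"; exit }
         { printf "%.2f CPUs\n", $1 / $2; exit }' "$1"
}

# memory_limit FILE
# memory.max holds a byte count, or the word "max".
memory_limit() {
    awk '$1 == "max" { print "unlimited"; exit }
         { printf "%d MiB\n", $1 / (1024 * 1024); exit }' "$1"
}

# Demo: report this cgroup's limits when the files are present.
if [ -r /sys/fs/cgroup/cpu.max ]; then
    cpu_limit /sys/fs/cgroup/cpu.max
fi
if [ -r /sys/fs/cgroup/memory.max ]; then
    memory_limit /sys/fs/cgroup/memory.max
fi
```

A `cpu.max` of `200000 100000` corresponds to two CPUs' worth of time per period, and a `memory.max` of `1073741824` to 1 GiB.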
Other Isolation Mechanisms
Seccomp
Restricts which system calls a container can make. Docker's default profile blocks around 44 of the 300+ available syscalls, including reboot and mount; ptrace is blocked only on kernels older than 4.8.
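Whether a seccomp filter is active for a process can be checked through the `Seccomp` field in `/proc/<pid>/status`. A small sketch, with `seccomp_mode` as an illustrative helper name:

```shell
# seccomp_mode STATUS_FILE
# Translate the "Seccomp:" field of a /proc/<pid>/status file into words.
# 0 = disabled, 1 = strict, 2 = filter (the mode used for syscall profiles).
seccomp_mode() {
    mode=$(awk '$1 == "Seccomp:" { print $2; exit }' "$1")
    case "$mode" in
        0) echo "disabled" ;;
        1) echo "strict" ;;
        2) echo "filter" ;;
        *) echo "unknown" ;;
    esac
}

# Demo on the current shell, if procfs is available.
if [ -r "/proc/$$/status" ]; then
    seccomp_mode "/proc/$$/status"
fi
```

Inside a container started with a seccomp profile this would be expected to report "filter"; on a plain host shell it usually reports "disabled".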
Capabilities
Fine-grained root privileges. Instead of all-or-nothing root, containers can drop specific capabilities:
- CAP_NET_ADMIN: network configuration
- CAP_SYS_ADMIN: broad system administration
- CAP_CHOWN: changing file ownership
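The kernel reports a process's capability sets as hexadecimal bitmasks (for example the `CapEff` line in `/proc/<pid>/status`), one bit per capability. The sketch below tests a single bit; `has_cap` is a hypothetical helper, and the bit numbers are taken from `<linux/capability.h>`.

```shell
# has_cap HEX_MASK BIT
# Print "yes" if capability number BIT is set in a hex mask such as the
# CapEff value from /proc/<pid>/status, otherwise "no".
# Bit numbers: CAP_CHOWN=0, CAP_NET_BIND_SERVICE=10,
#              CAP_NET_ADMIN=12, CAP_SYS_ADMIN=21
has_cap() {
    if [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; then
        echo "yes"
    else
        echo "no"
    fi
}

# Demo: check two capabilities of the current shell.
if [ -r "/proc/$$/status" ]; then
    mask=$(awk '$1 == "CapEff:" { print $2 }' "/proc/$$/status")
    if [ -n "$mask" ]; then
        echo "CAP_NET_ADMIN: $(has_cap "$mask" 12)"
        echo "CAP_SYS_ADMIN: $(has_cap "$mask" 21)"
    fi
fi
```

As an illustration, the mask `a80425fb`, which is commonly seen for a root process under Docker's default capability set, has CAP_CHOWN set but neither CAP_NET_ADMIN nor CAP_SYS_ADMIN.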
AppArmor/SELinux
Mandatory access control profiles that restrict file access, network operations, and other actions beyond what standard Linux permissions allow.
Read-only Root Filesystem
Running containers with immutable root filesystems prevents runtime modification of binaries.
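One way to confirm that the root filesystem really is read-only is to look at its mount options. This is a minimal sketch that parses a `/proc/mounts`-style file; the function name `mount_is_readonly` is an assumption for illustration.

```shell
# mount_is_readonly MOUNTPOINT [MOUNTS_FILE]
# Succeed if the last entry for MOUNTPOINT in a /proc/mounts-style file
# lists "ro" among its mount options (field 4).
mount_is_readonly() {
    awk -v mp="$1" '
        $2 == mp { opts = $4 }
        END { if (("," opts ",") ~ /,ro,/) exit 0; exit 1 }
    ' "${2:-/proc/mounts}"
}

# Demo: report the state of the root filesystem.
if [ -r /proc/mounts ]; then
    if mount_is_readonly /; then
        echo "/ is read-only"
    else
        echo "/ is writable"
    fi
fi
```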
Overlay Filesystem
Layered images with copy-on-write. Base layers are shared and immutable; containers write to their own layer.
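The lookup order behind copy-on-write can be modeled with ordinary directories: the container's writable layer is consulted first, then the image layers from top to bottom. The sketch below is a simplified model, not the kernel's overlay implementation; `overlay_lookup` is a hypothetical helper and deletion markers (whiteouts) are ignored.

```shell
# overlay_lookup RELATIVE_PATH UPPER_DIR LOWER_DIR...
# Print the first layer that provides the file, checking the writable
# upper layer first and then the lower layers in the order given.
# Returns 1 if no layer has the file.
overlay_lookup() {
    rel="$1"; shift
    for layer in "$@"; do
        if [ -e "$layer/$rel" ]; then
            echo "$layer"
            return 0
        fi
    done
    return 1
}

# Demo with throwaway directories standing in for layers.
demo=$(mktemp -d)
mkdir -p "$demo/base/etc" "$demo/upper/etc"
echo "from image" > "$demo/base/etc/app.conf"
overlay_lookup etc/app.conf "$demo/upper" "$demo/base"  # resolved from the base layer
echo "edited" > "$demo/upper/etc/app.conf"              # simulate a write (copy-up)
overlay_lookup etc/app.conf "$demo/upper" "$demo/base"  # now resolved from the upper layer
rm -rf "$demo"
```

The base layer's copy is never modified, which is what allows many containers to share the same image layers.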
The Key Distinction from VMs
| Aspect | Containers | Virtual Machines |
|---|---|---|
| Kernel | Shared with host | Own kernel |
| Isolation level | Process-level | Hardware-level |
| Overhead | Minimal | Hypervisor + full OS |
| Boot time | Milliseconds | Seconds to minutes |
| Security boundary | Kernel features | Hypervisor |
Because containers share the host kernel, a kernel vulnerability can potentially allow container escape. To reduce that risk:
- Keep host kernels patched
- Use user namespaces so that root inside the container is not root on the host
- Apply seccomp and capability restrictions
- Consider gVisor or Kata Containers for stronger isolation
Practical Security Recommendations
# Run as non-root user
docker run --user 1000:1000 myapp
# Drop all capabilities, add only what's needed
docker run --cap-drop=ALL --cap-add=NET_BIND_SERVICE myapp
# Read-only root filesystem
docker run --read-only myapp
# No new privileges
docker run --security-opt=no-new-privileges myapp
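These options are usually combined in a single invocation. The sketch below only prints the resulting command line so it can be reviewed before use; `hardened_run` is a hypothetical wrapper, and the UID/GID and the added capability are placeholders taken from the examples above.

```shell
# hardened_run IMAGE
# Print (not execute) a docker run command line that combines the four
# hardening flags. 1000:1000 and NET_BIND_SERVICE are placeholders;
# choose values that match the image.
hardened_run() {
    echo docker run \
        --user 1000:1000 \
        --cap-drop=ALL --cap-add=NET_BIND_SERVICE \
        --read-only \
        --security-opt=no-new-privileges \
        "$1"
}

hardened_run myapp
```

Applications that write temporary files typically also need a writable mount alongside `--read-only`, for example `--tmpfs /tmp`.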
Conclusion
Container isolation is powerful but fundamentally different from VM isolation. Understanding these mechanisms (namespaces for visibility, cgroups for resource limits, seccomp for syscall filtering, and capabilities for privilege restriction) helps you make informed decisions about where containers are appropriate and how to harden them.
