“This is supposed to work!” we tell ourselves as we attempt to integrate multiple systems.
In deployment environments, connections often fail silently as we bring together complex systems and integrations across multiple network zones and dependencies. Several tools can help you identify the critical point(s) of failure.
| Tool | Answers | OSI / Stack Layers |
|---|---|---|
| tcpdump | What packets are actually on the wire? | L2–L7 (Packet-level visibility) |
| ping | Can I reach the host? Latency? Packet loss? | L3 (Network / ICMP) |
| traceroute / tracert | Where along the path does it fail? | L3 (Network) |
| nc | Can I connect to a raw TCP/UDP socket? | L4 (Transport) |
| openssl | Does the TLS session handshake work correctly? | L5–L6 (Session & Presentation – TLS) |
| dig | Is DNS resolving correctly? | L7 (Application – DNS) |
| curl | Does the application respond correctly? | L7 (Application – HTTP/TLS) |
| nmap* | What ports/services are exposed? | L3–L7 (Discovery & probing) |
I was hesitant to include nmap in this list because tools like it are often quickly flagged by IDS/IPS systems. However, if your network or systems allow it, nmap can be very powerful, extending beyond connectivity debugging to advanced functions such as service and OS fingerprinting.
Together, these tools let you debug connectivity issues across (almost) all layers of the OSI model and network stack. Treat them as a recommended minimum. Most application-level protocols (e.g., LDAP or MySQL) cannot be meaningfully debugged without protocol-aware clients; the tools above will at best tell you whether a service is reachable, not why it is failing.
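Even without protocol-specific tools, you can still extract a lot of information from error responses. For example, what happens when you curl http://postgres:5432? Naturally, the request fails because PostgreSQL does not speak HTTP. However, the difference between the curl errors "Empty reply from server" and "Connection refused" is revealing. The first means the TCP connection succeeded and something is listening on the port, but it closed the connection without sending an HTTP response. The second means the host was reached but nothing accepted the connection on that port (or a firewall actively rejected it).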
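The same distinction can be reproduced without curl. The following is a minimal Python sketch, not a description of how curl works internally. The function name `classify_tcp_probe` and its outcome labels are made up for this illustration; it simply sends a bare HTTP request to a TCP port and reports what came back.

```python
import socket


def classify_tcp_probe(host, port, timeout=3.0):
    """Send a bare HTTP request to host:port and classify the outcome.

    The returned labels are illustrative names, not curl's own messages.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except ConnectionRefusedError:
        # The host answered, but nothing is listening on this port
        # (or a firewall actively rejected the connection).
        return "refused"
    except socket.timeout:
        # No answer at all: often a firewall silently dropping packets.
        return "connect-timeout"
    except OSError as exc:
        # DNS failures, unreachable networks, and similar errors.
        return f"error: {exc}"

    with sock:
        try:
            sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
            data = sock.recv(1024)
        except socket.timeout:
            # Connected, but the peer never answered within the timeout.
            return "no-reply-timeout"
        except ConnectionResetError:
            # The peer aborted the connection instead of closing it cleanly.
            return "reset"
        except OSError as exc:
            return f"error: {exc}"

    # An empty read means the peer closed the connection without sending
    # anything, which roughly corresponds to curl's "Empty reply from server".
    return "empty-reply" if not data else "reply"
```

Calling `classify_tcp_probe("postgres", 5432)` against the hypothetical host from the example above would distinguish "the port is closed" from "something is listening but does not speak HTTP", which is the piece of information the two curl errors carry.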
Knowing your tools
Depending on how you use them, these tools can go beyond simple connectivity checks and measure latency, packet loss, or network jitter:
- ping can track round-trip time and packet loss over multiple probes.
- traceroute can show where latency accumulates along the path.
- curl or nc can measure service response times.
By creatively utilizing these tools, you can turn basic troubleshooting utilities into lightweight network performance tests.
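As a rough illustration of the idea, the sketch below times repeated TCP connections (similar in spirit to timing nc in a loop) and summarizes the results. The function names are hypothetical, and jitter is taken here as the mean absolute difference between consecutive successful samples, which is one simple definition among several.

```python
import socket
import statistics
import time


def tcp_connect_times(host, port, count=5, timeout=2.0):
    """Time `count` TCP connects in milliseconds; None marks a failed probe."""
    samples = []
    for _ in range(count):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                samples.append((time.perf_counter() - start) * 1000.0)
        except OSError:
            # Refused, timed out, or unreachable: count it as a lost probe.
            samples.append(None)
    return samples


def summarize(samples):
    """Compute loss percentage, average latency, and a simple jitter value."""
    ok = [s for s in samples if s is not None]
    loss_pct = 100.0 * (len(samples) - len(ok)) / len(samples) if samples else 0.0
    # Assumption: jitter = mean absolute difference between consecutive
    # successful samples (failed probes are skipped, not interpolated).
    deltas = [abs(b - a) for a, b in zip(ok, ok[1:])]
    return {
        "loss_pct": loss_pct,
        "avg_ms": statistics.mean(ok) if ok else None,
        "jitter_ms": statistics.mean(deltas) if deltas else 0.0,
    }
```

For example, `summarize(tcp_connect_times("example.com", 443))` would give a quick picture of connect latency to a placeholder host. Note that this measures TCP connection setup rather than ICMP round-trip time, so the numbers are not directly comparable to ping.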
Tradeoffs
Any tool you install for debugging is also available to an attacker who gains access to the host. For example, attackers can use nc to open listeners or exfiltrate data, while nmap can be used to probe the rest of your network.
One way to slightly reduce this attack surface while still retaining debugging capabilities is to run these tools inside a container (e.g., Kubernetes debug pods) and only spin them up when necessary. Running them inside a container is also helpful for situations where the host is immutable.
However, the tradeoff here is that debugging is performed from the container’s perspective rather than the host’s. As a result, you must account for container networking abstractions such as network namespaces, NAT, service meshes, and CNI behavior, which can affect visibility and interpretation of network traffic.
Ultimately, though, traffic still egresses from the node. Understanding where container-level abstractions end and host-level networking begins can be the difference between spending six hours debugging and just one.
Conclusion
In my experience, these tools provide a baseline for network and integration troubleshooting, but they are by no means the only options. Depending on your environment and constraints, such as firewalls blocking ICMP or restricted protocols, you may need to curate your own set of debugging utilities. When you do, I hope this post helps.