April 9, 2020
The availability of your network and IT services can make or break business performance. A full-blown outage costs 86% of companies more than $300,000 per hour, and 34% say the costs would exceed $1 million per hour. But it’s not just outages that come with high costs. Slow performance and frequent brownouts also cause significant damage.
While IT decision makers say that 51% of outages are avoidable with proper monitoring, we have to recognize that troubleshooting is a fact of life. Troubleshooting effectively and efficiently will keep your business running smoothly even when issues arise.
This is Part 1 in a two-part series explaining how to troubleshoot your network. Here, we’ll talk about how to troubleshoot the most common network issues and the tools needed to do so. In Part 2, we’ll talk about how to troubleshoot network forensics issues.
There’s no one-size-fits-all answer to network troubleshooting. The most challenging issues will require deep investigations and an ability to quickly identify root causes. However, there are a few common network problems that have much more straightforward solutions.
Effective and efficient network troubleshooting starts with mastering the three most common problems: connectivity, performance, and latency.
The first step to troubleshooting network connectivity is to try the easiest solutions. Check to see if all hardware is connected properly and that cables aren’t loose or damaged. Identify whether or not the problem is with your network or with external services you’re trying to connect to. And, when you know the problem is internal, you can try the most cliché solution—reboot the equipment that’s having the problem.
If restarting the network component doesn’t solve your connection issues, it’s time to dig a bit deeper. A few network troubleshooting commands, such as ping and tracert, let you check whether a host is reachable and where along the path the connection fails.
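The internal-versus-external check described above can also be scripted. The following is a minimal Python sketch, not a replacement for the command-line tools: it tests whether a TCP connection can be opened to each target. The function names `can_connect` and `check_targets` are illustrative, and the targets you pass in (for example, an internal gateway and an external service) are assumptions you would replace with hosts from your own environment.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS lookup failures.
        return False


def check_targets(targets: dict) -> dict:
    """Map each named target, given as (host, port), to its reachability."""
    return {name: can_connect(host, port) for name, (host, port) in targets.items()}
```

If an internal target answers while an external one does not, the problem is more likely to sit with the external service or the uplink than with the local network. A successful TCP connection only shows that one port is reachable, so treat the result as a first hint rather than a diagnosis.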
Slow network performance is perhaps the most common workforce complaint for IT teams. Even though the problem often lies with an application or website, you still have to prove that the network isn’t a root cause, which can be easier said than done as you try to sift through thousands of log files for issues.
The key to troubleshooting network performance is anomaly detection. This means that creating a baseline of normal network performance is critical to the troubleshooting process.
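As a rough illustration of baseline-driven anomaly detection, the Python sketch below summarizes historical measurements as a mean and standard deviation and flags values that fall far outside that range. The three-standard-deviation threshold and the function names are assumptions chosen for the example; commercial monitoring tools use more sophisticated models.

```python
from statistics import mean, stdev


def build_baseline(samples):
    """Summarize normal behaviour as (mean, sample standard deviation)."""
    return mean(samples), stdev(samples)


def is_anomalous(value, baseline, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the mean."""
    average, spread = baseline
    if spread == 0:
        # A perfectly flat baseline: any different value is unusual.
        return value != average
    return abs(value - average) / spread > threshold
```

The samples could be throughput in Mbps, response times, or any other metric, as long as the baseline and the new measurement use the same unit and are collected under comparable conditions (for example, the same time of day).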
Assuming you have that baseline for comparison, you can start running basic troubleshooting processes. First, check bandwidth utilization across all necessary links. If traffic volumes are spiking, you might be experiencing a DDoS attack or overuse of bandwidth-intensive services. From there, you can look at application performance and see which services are using the most bandwidth. If you can identify that, for example, large data replications are hurting performance, you can start scheduling those processes outside of business hours.
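The two checks above, link utilization and the services using the most bandwidth, reduce to simple arithmetic once you have the raw counters. This Python sketch assumes you already collect interface byte counters and per-service byte totals from your monitoring system; the function names and input shapes are hypothetical, and counter wrap-around is not handled.

```python
def utilization_percent(bytes_start, bytes_end, interval_seconds, link_speed_bps):
    """Percentage of link capacity used between two byte-counter readings."""
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    bits_transferred = (bytes_end - bytes_start) * 8
    return 100.0 * bits_transferred / (interval_seconds * link_speed_bps)


def top_talkers(bytes_by_service, count=3):
    """Return (service, bytes) pairs for the heaviest users, largest first."""
    ranked = sorted(bytes_by_service.items(), key=lambda item: item[1], reverse=True)
    return ranked[:count]
```

For example, 75,000,000 bytes transferred in 60 seconds on a 100 Mbps link works out to 10% utilization. Comparing that figure with the baseline for the same link shows whether the traffic volume is actually unusual.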
Manually troubleshooting network performance is tedious and loaded with the potential for human error. Deploying a network performance monitoring tool from vendors like Flowmon Networks or Riverbed will help you scan for performance issues automatically, detecting errors and pointing you toward the root causes. Wireshark, the free open-source network protocol analyzer, lets you perform live protocol capture and offline analysis.
Latency, the delay packets experience as they cross the network, is incredibly important to the performance of high-bandwidth applications like voice and video calling or data streaming. While something like email delivery can tolerate higher latency, anything that requires real-time or near real-time data transfer relies heavily on packets arriving quickly.
The simplest way to troubleshoot latency issues is to use the ping command. In addition to confirming that a host is reachable, ping measures the round-trip time between a requesting host and a destination host. Another option is the tracert command (traceroute on Linux and macOS), which maps the hops a packet takes between a requesting host and a destination host. This helps you understand how packets move across your network and spot opportunities to optimize for latency.
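When you want to log latency from a script rather than read ping output by hand, one option is to time a TCP handshake. The Python sketch below does that; note that it measures TCP connection setup time rather than the ICMP echo that ping uses, so the numbers are comparable in spirit but not identical. The function names are illustrative, and the host and port are whatever service you choose to probe.

```python
import socket
import time
from statistics import median


def tcp_connect_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP handshake to host:port, in milliseconds."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # The connection is closed as soon as it is established.
    return (time.perf_counter() - start) * 1000.0


def median_latency_ms(host: str, port: int, attempts: int = 5) -> float:
    """Median of several handshake timings, which dampens one-off spikes."""
    return median(tcp_connect_ms(host, port) for _ in range(attempts))
```

Taking the median of several attempts keeps a single slow handshake from skewing the result. An unreachable host raises `OSError`, which the caller can treat as a connectivity problem rather than a latency one.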
The ping and tracert commands aren’t the most comprehensive options for troubleshooting latency, though. To go deeper than measuring the round-trip time between two points, you need a fully functional network performance monitoring tool.
You can resolve latency issues by optimizing network traffic flows through Quality of Service (QoS) hierarchies. Setting QoS priorities for time-sensitive traffic like video playback and VoIP communications ensures bandwidth is reserved to minimize latency for those services. In a perfect world you could continuously redesign your network and provision more bandwidth to minimize latency for all services, but this is rarely realistic. Taking advantage of the right tools and processes to quickly resolve latency issues will be far more efficient over the long term.
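QoS policies are configured on network devices, but those devices often classify traffic by the DSCP value carried in each IP packet header. As a hedged illustration of one piece of that picture, the Python sketch below marks a UDP socket with DSCP 46 (Expedited Forwarding, commonly used for voice). It assumes an operating system that honours the `IP_TOS` socket option (Linux and macOS generally do) and a network configured to trust the marking; neither is guaranteed, and the function names are hypothetical.

```python
import socket

DSCP_EF = 46  # Expedited Forwarding, commonly used for voice traffic


def dscp_to_tos(dscp: int) -> int:
    """Place a 6-bit DSCP value in the upper six bits of the IP TOS byte."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be between 0 and 63")
    return dscp << 2


def open_marked_udp_socket(dscp: int = DSCP_EF) -> socket.socket:
    """Create a UDP socket whose outgoing packets carry the given DSCP value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(dscp))
    return sock
```

Marking alone does not reserve bandwidth. It only gives switches and routers a label that their QoS policy can act on.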
Being able to troubleshoot even these three common network problems as quickly as possible starts with one thing above all else—packet-level visibility. If you can’t see what’s going on in your network, you won’t be able to identify and mitigate the root cause of a particular issue.
When you’re out in the field trying to troubleshoot an issue, there are a number of essential tools and pieces of equipment you need to maximize visibility. First and foremost, you need access to power outlets, Ethernet ports, and the proper cables. But beyond those basics, having a portable network TAP on hand to easily access a link and gain visibility into every data packet is critical.
Our new Field TAPs are ideal for 10M/100M or 10M/100M/1G field test monitoring and troubleshooting because they provide full copies of traffic data without interrupting links. These highly efficient TAPs are a must-have item for any troubleshooting toolkit for easily checking network connections and maximizing the effectiveness of tools like Wireshark and network performance monitoring solutions.
Troubleshooting network connectivity, performance, and latency issues is just the start, though. The next post in this series will focus on solving issues in your network forensics so you can maintain your ability to efficiently run root cause analysis for troubleshooting situations.
Looking to add a visibility solution to better baseline your traffic, but not sure where to start? Join us for a brief network Design-IT consultation or demo. No obligation - it’s what we love to do.
If the inline security tool goes off-line, the TAP will bypass the tool and automatically keep the link flowing. The Bypass TAP does this by sending heartbeat packets to the inline security tool. As long as the inline security tool is on-line, the heartbeat packets will be returned to the TAP, and the link traffic will continue to flow through the inline security tool.
If the heartbeat packets are not returned to the TAP (indicating that the inline security tool has gone off-line), the TAP will automatically 'bypass' the inline security tool and keep the link traffic flowing. The TAP also removes the heartbeat packets before sending the network traffic back onto the critical link.
While the TAP is in bypass mode, it continues to send heartbeat packets out to the inline security tool. Once the tool is back on-line, it begins returning the heartbeat packets to the TAP, indicating that it is ready to go back to work. The TAP then directs the network traffic, along with the heartbeat packets, back through the inline security tool, placing the tool back inline.
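The heartbeat behavior described above can be modeled as a small state machine. The Python sketch below is a toy model for illustration only, not the TAP’s actual firmware logic: the class name `BypassTap` and the `missed_limit` threshold (how many heartbeats may go unanswered before switching to bypass) are assumptions that do not come from the product documentation.

```python
class BypassTap:
    """Toy model of heartbeat-driven bypass switching."""

    def __init__(self, missed_limit: int = 3):
        self.missed_limit = missed_limit  # assumed threshold, for illustration
        self.missed = 0
        self.bypassed = False

    def heartbeat_result(self, returned: bool) -> str:
        """Record whether the latest heartbeat came back; return the traffic path."""
        if returned:
            # The tool answered: reset the counter and route traffic through it.
            self.missed = 0
            self.bypassed = False
        else:
            self.missed += 1
            if self.missed >= self.missed_limit:
                # The tool looks off-line: keep the link up by going around it.
                self.bypassed = True
        return "bypass" if self.bypassed else "inline"
```

Because the model keeps evaluating heartbeats while in bypass, a single returned heartbeat is enough to place the tool back inline, mirroring the recovery step described above.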
Some of you may have noticed a flaw in the logic behind this solution! You say, “What if the TAP should fail because it is also in-line? Then the link will also fail!” The TAP would now be considered a point of failure. That is a good catch – but in our blog on Bypass vs. Failsafe, I explained that if a TAP were to fail or lose power, it must provide failsafe protection to the link it is attached to. So our network TAP will go into Failsafe mode keeping the link flowing.
Single point of failure: a risk to an IT network in which the failure of one part of the system brings down a larger part of the system, or all of it.
Heartbeat packet: a soft detection technology that monitors the health of inline appliances.
Critical link: a connection between two or more network devices or appliances whose failure disrupts the network.