- What is packet loss?
- What is latency?
- How does traceroute work?
- What can traceroute tell me?
- What can't traceroute tell me?
- I have a traceroute that has ten hops that have excellent latency, but a couple of hops in the middle that are running at high latency. Is this bad?
- I have a traceroute that has one hop in the middle that is dropping packets. I'm seeing no packet loss to the eventual destination. Is this bad?
- I have a traceroute that shows ten hops from me to my destination. The first 6 hops are great, but hops 7–10 have much higher latency. Does this mean hop 7 is congested?
- OK, so we've established that traceroute isn't useful for diagnosing throughput problems. What can I use instead?
- OK, so we've established that traceroute isn't useful for diagnosing latency problems. What can I use instead?
- OK, so we've established that traceroute isn't useful for giving me the full bidirectional path. What can I use instead?
- I think I've found a routing problem using traceroute. What can I do about it?
- I think I've found a firewall problem using traceroute. What can I do about it?
- Traceroute doesn't work at all when I turn on my ADSL modem/router's firewall or NAT gateway. What gives?
- I'm using pathping instead of traceroute.
What is packet loss?
Packet loss occurs when a packet you send doesn't reach its destination. It might have been corrupted in transit, dropped from a full queue on a congested link, or discarded by a router that was too busy to forward it.
What is latency?
Latency is the period of time that elapses between the transmission of a packet and the reception of the packet at its destination.
Sometimes the term is used to refer to the time that elapses between the transmission of a packet and the reception of the response to that packet. Whirlpool users tend to use the word "ping" to describe this sense ("that link has a high ping").
Total latency is the sum of all the individual link latencies between source and destination, plus the latency caused by processing the packet on each router, plus the latency caused when a packet sits in each interface queue waiting for transmission.
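As a rough illustration of that sum, here is a minimal Python sketch. The `Hop` class, its field names, and the millisecond figures are all made up for the example; they aren't measurements from any real path.

```python
from dataclasses import dataclass


@dataclass
class Hop:
    link_ms: float        # delay on the link leading into this hop (hypothetical)
    processing_ms: float  # time the router spends handling the packet
    queue_ms: float       # time spent waiting in the interface queue


def total_latency_ms(hops):
    """Add up link, processing and queueing delay for every hop on the path."""
    return sum(h.link_ms + h.processing_ms + h.queue_ms for h in hops)


# Illustrative path: a local hop, a busy metro hop, and a long-haul hop.
path = [
    Hop(link_ms=0.5, processing_ms=0.1, queue_ms=0.0),
    Hop(link_ms=12.0, processing_ms=0.1, queue_ms=3.0),
    Hop(link_ms=90.0, processing_ms=0.25, queue_ms=0.0),
]
print(f"{total_latency_ms(path):.2f} ms")
```

Note that only the queueing term changes much with load; the link term is mostly fixed by distance.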
How does traceroute work?
IP packets have a field in the header called the "TTL", or "Time To Live". Despite the name, it doesn't actually control the time the packet is allowed to stay on the network; it controls the number of router hops the packet is allowed to go through. Every time a packet goes through a router, the router decrements the packet's TTL field before forwarding it.
When a router decrements a packet's TTL field to 0, the router throws the packet away. This prevents packets from bouncing between two routers on the network forever in the event of a routing loop. Eventually packets involved in the routing loop will expire because their TTLs decrement closer to zero each time they bounce.
When a router throws a packet away because its TTL is zero, it inspects the packet's IP header and sends a "Time Exceeded" ICMP message (ICMP_TIMXCEED) back to the packet's source address. The source of this ICMP message is the IP address of the router that dropped the original packet.
Traceroute works by sending a packet to the nominated destination with the TTL field deliberately set to 1. This causes the first router the packet reaches to decrement the TTL to zero, then drop the packet, and then send an ICMP message back to you with the source address of that router. Traceroute displays that IP address and the number of milliseconds that have elapsed between the transmission of the original packet and the reception of the ICMP_TIMXCEED message.
Then traceroute repeats this process with additional packets with the TTL set to 2, 3, 4, etc. It continues until the TTL reaches some predefined maximum (usually 30) or until it gets a reply from the address you've nominated as the traceroute destination. That final reply isn't an ICMP_TIMXCEED message: the destination answers with an ICMP Port Unreachable message (for UDP probes) or an ICMP Echo Reply (for ICMP probes).
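The TTL mechanism can be sketched as a small simulation in Python. This is not how a real traceroute is implemented (that needs raw sockets and elevated privileges); the function names, the `max_ttl=30` default, and the addresses (taken from the 192.0.2.0/24 and 198.51.100.0/24 documentation ranges) are assumptions made for the example.

```python
def send_probe(path, destination, ttl):
    """Simulate one probe and return (responder, reply_type).

    `path` is the ordered list of router addresses between us and
    `destination`. Each router decrements the TTL; the router that
    takes it to zero drops the probe and answers with Time Exceeded.
    """
    for router in path:
        ttl -= 1
        if ttl == 0:
            return router, "time-exceeded"
    # The probe survived every router, so the destination itself answers.
    return destination, "destination-reply"


def traceroute(path, destination, max_ttl=30):
    """Probe with TTL 1, 2, 3, ... until the destination answers."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        responder, reply = send_probe(path, destination, ttl)
        hops.append((ttl, responder))
        if reply == "destination-reply":
            break
    return hops


routers = ["192.0.2.1", "192.0.2.2", "192.0.2.3"]  # hypothetical path
for ttl, responder in traceroute(routers, "198.51.100.10"):
    print(ttl, responder)
```

Each TTL value reveals exactly one address, which is why the output is a list of hops rather than a measurement of the path as a whole.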
What can traceroute tell me?
Traceroute gives you the NETWORK-LAYER UNIDIRECTIONAL PATH between yourself and the endpoint you're trying to probe. Its main purpose is to examine the path taken by packets with a view to discovering and diagnosing routing faults.
It also tells you something (but not everything) about the latency introduced by each hop. There are important caveats about that data, discussed below.
Depending on what the traceroute probe packets encounter on their journey, it might also tell you whether there's a firewall or packet filter in the way between you and the destination.
It can't tell you a lot about packet loss. If you want to know about that, you'll need to use a ping instead of a traceroute.
Generally speaking, making sense of traceroute output is more difficult and more nuanced than most people seem to expect. You can't just look at a traceroute and point your finger at a fault or a congested path: it doesn't work like that. The traceroute output can help you to determine what other bits of information you need to diagnose a fault, but it can rarely diagnose the fault all by itself.
If you don't have other sources of information (such as link utilisation graphs) available to you, drawing conclusions based on traceroutes is somewhat dubious.
What can't traceroute tell me?
Traceroute can tell you almost nothing about performance. People who complain about performance and who justify their complaints by posting traceroutes usually have a fault that is different to the one they think they have. Often they have no fault at all, and have been misled into thinking there's something wrong because they've misinterpreted a completely normal traceroute.
Traceroute can't tell you about the RETURN PATH between yourself and the endpoint you're trying to probe. If there is any asymmetric routing between you and the endpoint, that'll be totally invisible to you unless you can get a traceroute initiated from the other end back to you.
Traceroute can't tell you about LINK-LAYER PATHS between yourself and the destination. Just because a traceroute shows a hop that goes from, say, Melbourne to Sydney, that doesn't mean that the underlying Layer-2 transport goes direct from Melbourne to Sydney. Perhaps it goes via Adelaide or Canberra. Or maybe it takes "the long way 'round" via Adelaide-Perth-Darwin-Brisbane-Sydney. Traceroute has no way of finding that out, and is the wrong tool to use to diagnose those kinds of problems.
Traceroute can't tell you about achievable speeds between yourself and the destination. It is entirely possible (and often likely) that you can have a high-latency path that still allows you to reach the full speed of your tail circuit (that is, just because traceroute tells you it's taking 600 mSec to reach something in China, that doesn't mean you can't achieve 1.5 Mbps throughput rates to it). Conversely, there are lots of reasons why a low-latency path might also deliver low throughput (for example, you might have 1 kilobyte per sec transfer rates to something that exhibits 1 mSec latency in a traceroute).
Traceroute can't tell you about congestion. Even congested paths can deliver good latency (depending on the nature of the congestion, queuing policies at each end of the congested link, the link's interface type, and various other factors). Congested paths will often drop packets, but there are lots of other completely innocuous reasons why a traceroute might show dropped packets, so a traceroute alone is not useful for performing that kind of diagnosis (although it can help to narrow down the cause).
I have a traceroute that has ten hops that have excellent latency, but a couple of hops in the middle that are running at high latency. Is this bad?
Probably not. Routers aren't designed to produce a network that efficiently and reliably reacts to traceroute packets. They're designed to forward real data, delivering good end-to-end throughput and latency.
This means a high-latency hop in the middle of a traceroute is not important if the next hop returns to low latency. It just means that particular high-latency router has better things to do with its time than respond to your traceroutes (for example, it might be calculating BGP tables).
You need to understand that in the grand scheme of things, responding to traceroutes and pings is a task that is somewhere near the bottom of a router's priority list. If it has ANYTHING AT ALL to do that is more important, it'll either delay its response to your traceroutes (leading to traceroute telling you about a high-latency hop), or ignore them altogether (leading traceroute to tell you about imaginary packet loss). This is perfectly normal, and nothing to be concerned about.
The other thing that can cause this is an ISP in the middle of your traceroute using a higher-latency backchannel path to get its ICMP_TIMXCEED messages back to you. So even though the data you're sending has low outbound latency, the responses that come back to say your traceroute packets have been dropped could have high latency. The effect of this on your end-to-end latency is zero, which is why the last hop doesn't look like a high-latency one.
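The rule of thumb above (a slow or silent hop only matters if the slowness carries through to the end of the path) can be sketched in Python. The function name and the threshold are hypothetical, and the result is a hint about where to look next, not a diagnosis.

```python
def hops_worth_a_closer_look(hop_rtts_ms, threshold_ms):
    """Return the 1-based hop numbers whose slowness persists to the last hop.

    `hop_rtts_ms[i]` is the round-trip time reported for hop i+1, or None
    if that hop didn't answer. Any slow or silent hop that is followed by
    a later hop at or under `threshold_ms` is ignored: the later hop shows
    that traffic is getting through the earlier one without the delay.
    """
    last_good = -1
    for index, rtt in enumerate(hop_rtts_ms):
        if rtt is not None and rtt <= threshold_ms:
            last_good = index
    # Everything after the last good hop is slow or silent all the way out.
    return list(range(last_good + 2, len(hop_rtts_ms) + 1))


# Hops 3 and 4 misbehave, but hops 5 and 6 are fine: nothing to chase.
print(hops_worth_a_closer_look([1, 2, 250, None, 12, 14], threshold_ms=50))

# Latency rises at hop 4 and stays high to the destination.
print(hops_worth_a_closer_look([1, 2, 12, 200, 210, 205], threshold_ms=50))
```

Even when hops are flagged, a sustained rise can have ordinary explanations such as a long-haul link or an asymmetric return path.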
I have a traceroute that has one hop in the middle that is dropping packets. I'm seeing no packet loss to the eventual destination. Is this bad?
No. See the discussion for the previous question.
I have a traceroute that shows ten hops from me to my destination. The first 6 hops are great, but hops 7–10 have much higher latency. Does this mean hop 7 is congested?
Perhaps, but not necessarily.
Let's look at the hops, and what can cause latency.
Firstly, are hops 7–10 in the same country as the others? Crossing the Pacific Ocean to the US can take between 180 and 220 mSec, so it's normal to see a big latency jump on international hops.
Secondly, are hops 7–10 from the same ISP as the others? Asymmetric routing can mean that the backchannel path is "longer" than the path outwards from you. Often this starts happening on the boundaries between ISPs. Traceroute can't tell you about the backchannel path, but you might be able to ask someone at the other end to perform a traceroute back to you and email you the results.
Asymmetric routing can sometimes mean that the backchannel path crosses a long-haul transoceanic link even though the outward path doesn't. For example, the path from Sydney to Tokyo might go direct to Japan, whereas the return path from Tokyo to Sydney might go via Guam, Los Angeles, San Jose, and Hawaii. You have no way of knowing this from your unidirectional traceroute; you'll just see that all the Japanese hops seem to exhibit higher latency than you'd expect. If your eventual destination isn't actually in Japan, and isn't exhibiting higher latency than you'd expect, you have no reason to care about this.
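A tiny Python sketch shows why the sending side can't tell these cases apart: the figure traceroute prints is the outward time plus the return time, with no way to split the two. The millisecond values are invented for the example.

```python
def reported_rtt_ms(outbound_ms, return_ms):
    """The figure traceroute shows: time out to the hop plus time for the reply to get back."""
    return outbound_ms + return_ms


# Hypothetical figures for a hop in Tokyo probed from Sydney.
symmetric = reported_rtt_ms(outbound_ms=100, return_ms=100)
asymmetric = reported_rtt_ms(outbound_ms=60, return_ms=140)

# Both values are 200, so the two cases look identical from Sydney.
print(symmetric, asymmetric)
```

Only a traceroute run from the far end back towards you exposes the return leg.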
OK, so we've established that traceroute isn't useful for diagnosing throughput problems. What can I use instead?
FTP is generally pretty good: at the end of each transfer it tells you how many kilobytes per second you achieved. That is, it measures your throughput.
HTTP is generally pretty bad, because it can be (and often is) cached. Instead of measuring the performance of the network, you'll measure the performance of the cache.
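If you'd rather script the FTP measurement than read it off a client, something like the following Python sketch would do it using the standard `ftplib` module. The host and file path are placeholders you'd supply yourself, anonymous login is an assumption, and the function names are hypothetical.

```python
import time
from ftplib import FTP


def throughput_kb_per_sec(byte_count, elapsed_seconds):
    """Convert a byte count and a duration into kilobytes per second."""
    return byte_count / 1024 / elapsed_seconds


def measure_ftp_download(host, remote_path, user="anonymous", password=""):
    """Download one file over FTP, discard the data, and return KB/s."""
    received = 0

    def count(chunk):
        # Only the size matters here; the data itself is thrown away.
        nonlocal received
        received += len(chunk)

    with FTP(host) as ftp:
        ftp.login(user, password)
        start = time.monotonic()
        ftp.retrbinary(f"RETR {remote_path}", count)
        elapsed = time.monotonic() - start
    return throughput_kb_per_sec(received, elapsed)


# A 1 MiB transfer that took 8 seconds works out to 128 KB/s.
print(throughput_kb_per_sec(1024 * 1024, 8))
```

Pick a file large enough that the transfer runs for at least several seconds, otherwise connection setup dominates the result.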
OK, so we've established that traceroute isn't useful for diagnosing latency problems. What can I use instead?
Ping. If you have a latency issue between you and some destination, you don't care about the individual latencies between each router in the path, do you? You really care about that particular destination! So why use a tool that gives you the entire path?
As you've probably worked out from the preceding discussion, there are lots of reasons why latency measurements from traceroutes can be misleading if you don't have extra sources of information at your disposal. If you think there's a latency problem, use "ping" to confirm it and log a fault to get it investigated.
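When you log the fault, a summary of a run of pings is more useful than a single reading. Here's a minimal Python sketch of that summary; it assumes you've already collected the round-trip times yourself (with `None` standing in for each lost reply), and the function name and sample values are made up.

```python
def summarise_pings(rtts_ms):
    """Summarise ping results: one entry per request, None for a lost reply."""
    if not rtts_ms:
        raise ValueError("need at least one ping result")
    answered = [rtt for rtt in rtts_ms if rtt is not None]
    loss_pct = 100.0 * (len(rtts_ms) - len(answered)) / len(rtts_ms)
    if not answered:
        # Every request was lost, so there is no latency to report.
        return {"loss_pct": loss_pct, "min_ms": None, "avg_ms": None, "max_ms": None}
    return {
        "loss_pct": loss_pct,
        "min_ms": min(answered),
        "avg_ms": sum(answered) / len(answered),
        "max_ms": max(answered),
    }


# Four requests to the destination, one of which went unanswered.
print(summarise_pings([20.0, 22.0, None, 24.0]))
```

Because every sample is taken against the destination itself, both the loss figure and the latency figures describe the end-to-end path.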
OK, so we've established that traceroute isn't useful for giving me the full bidirectional path. What can I use instead?
Go to http://www.traceroute.org - You'll see web-enabled traceroute gateways from all over the planet. If one of those happens to be close to your destination (especially if it's on the same ISP as your destination) you might be able to get a traceroute back to your own IP address.
You can also contact someone on the destination network and ask them to send you a traceroute back to your IP address.
Finally, you can use a ping with the RECORD_ROUTE option set ("ping -R" on most Unix-like systems, "ping -r 9" on Windows). This doesn't always work across autonomous-system boundaries (the option is often filtered), and some equipment ignores it (so hops through those devices won't be recorded), but it can be useful when other efforts fail. Note that the path displayed might be truncated if it's too long: there's only room for 9 hops in the IP packet header, and once those slots fill up, additional hops won't be recorded.
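The 9-hop limit falls out of the IPv4 header layout, and the arithmetic is short enough to sketch in Python. The constant and function names are just labels chosen for the example.

```python
MAX_IPV4_HEADER_BYTES = 15 * 4    # 4-bit header length field, counted in 32-bit words
FIXED_IPV4_HEADER_BYTES = 20      # the mandatory part of every IPv4 header
RECORD_ROUTE_OVERHEAD_BYTES = 3   # option type, option length, and pointer
IPV4_ADDRESS_BYTES = 4


def max_recorded_hops():
    """How many IPv4 addresses fit in a Record Route option."""
    option_space = MAX_IPV4_HEADER_BYTES - FIXED_IPV4_HEADER_BYTES
    address_space = option_space - RECORD_ROUTE_OVERHEAD_BYTES
    return address_space // IPV4_ADDRESS_BYTES


print(max_recorded_hops())  # prints 9
```

The 40 bytes of option space have to cover the whole round trip, so on a long path the slots can be used up before the reply gets back to you.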
I think I've found a routing problem using traceroute. What can I do about it?
Log a fault. Note that sometimes our ability to resolve those problems will be somewhat limited (because the routing fault might actually be in someone else's network half way across the globe, and we can only influence those bits of the network we own and operate ourselves). But nevertheless, we'll always try.
I think I've found a firewall problem using traceroute. What can I do about it?
Probably best to contact the owner/operator of the firewall directly, explain what you're trying to do, and explain why you think the firewall is preventing it.
Firewall operators are resistant to ISPs contacting them about reachability problems: usually the entire point of a firewall is to keep people out, so people who use them don't tend to be sympathetic to ISPs who complain that their customers can't get through the firewall.
Traceroute doesn't work at all when I turn on my ADSL modem/router's firewall or NAT gateway. What gives?
Your firewall is probably blocking ICMP, so traceroute never sees the ICMP_TIMXCEED messages from the routers along the path.
This is almost certainly a bug: ICMP is not an optional part of the TCP/IP specification, and blocking it is likely to cause a lot of other subtle faults. Breaking traceroute is probably the least of your problems. Perhaps a firmware upgrade will fix the bug. It's probably best to open a ticket with us to see if we've seen this problem before with your hardware/software combination, because we might be able to make some useful recommendations.
I'm using pathping instead of traceroute.
Please stop. Pathping seems to be the result of a merger between ping and traceroute, and it tends to deliver the worst features of both tools.
In particular, pathping tends to show packet loss where none really exists. We've had lots of conversations with customers who have been led by pathping's output to believe that our network is broken, and in-depth investigation has always revealed that they've found a bug in pathping rather than a bug in our network.
Monitoring tools that produce untrustworthy output are worse than useless.