Tuesday, September 1, 2015

Troubleshooting with ping6

At work today, I helped someone use ping6 to figure out what to fix on a server that wasn't reachable over IPv6. I have a pretty basic formula that I thought I would share.

I'll use this example problem host, named fixme, throughout:

nethope@fixme$ ip -6 address show
1: lo:  mtu 65536 
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qlen 1000
    inet6 2001:db8:2015:3000::5/64 scope global 
       valid_lft forever preferred_lft forever
    inet6 fe80::201:2ff:fe03:405/64 scope link 
       valid_lft forever preferred_lft forever
nethope@fixme$

When I troubleshoot, I like to start small and expand to ever-wider scopes. One of my college professors said troubleshooting should start with a simple question to which you know the answer, and take small steps from there.

The smallest scope possible is loopback. Since I usually have multiple network interfaces, I tell ping6 which one to use. The syntax varies between operating systems: I used 'ping6 -c3 -I lo ::1' on my Linux machine and 'ping6 -c3 ::1%lo0' on my Mac.

ping6 -c3 -I lo ::1

That should work; it's the question to which you know the answer. If it doesn't work, IPv6 probably isn't configured yet. The command should also work without specifying the interface; if you have to specify the loopback interface, route selection is broken.
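The loopback outcomes described above can be told apart mechanically: ping ::1 without an interface, then with one, and interpret the combination. Here is a minimal sketch that assumes Linux iputils-style '-I' syntax. The function name check_loopback and the PING override are illustrative choices rather than anything ping6 provides. Setting PING="ping -6" covers systems without a separate ping6 binary.

```shell
#!/usr/bin/env bash
# Sketch: ping ::1 with and without naming the loopback interface and
# interpret the result as the post describes.
# Assumptions: Linux iputils-style "-I <iface>" syntax. PING may be
# overridden; it is left unquoted on purpose so PING="ping -6" splits
# into a command plus its option.
PING=${PING:-ping6}

check_loopback() {
    local lo_if=${1:-lo}    # loopback interface name on Linux

    if $PING -c3 ::1 >/dev/null 2>&1; then
        echo "ok: ::1 answers without naming an interface"
    elif $PING -c3 -I "$lo_if" ::1 >/dev/null 2>&1; then
        echo "suspect: ::1 answers only with -I $lo_if (route selection is broken)"
    else
        echo "fail: ::1 does not answer (IPv6 probably isn't configured)"
    fi
}

# Usage: check_loopback      (or: check_loopback lo)
```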

Then I expand the scope ever so slightly, all the way up to another easy one, just so I don’t miss some strange condition. Try the link-local address (HINT: it starts with fe80) for that interface. This traffic also shouldn't hit the network, just this computer and its network interface card.

ping6 -c3 -I eth0 fe80::201:2ff:fe03:405
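You can also pull the fe80 address out of the 'ip -6 address show' output instead of retyping it, which avoids typos in long addresses. Here is a minimal sketch that assumes Linux iproute2 output formatted like the example at the top of this post. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: find an interface's link-local address in "ip -6 address show"
# output and ping it through that interface.
# Assumptions: Linux iproute2 output format; PING may be overridden.
PING=${PING:-ping6}

# Read "ip -6 address show" output on stdin; print each scope-link
# address without its prefix length.
link_local_from_ip_output() {
    awk '$1 == "inet6" && $3 == "scope" && $4 == "link" {
        sub(/\/.*/, "", $2)    # drop the "/64"
        print $2
    }'
}

ping_link_local() {
    local ifname=${1:-eth0}
    local addr
    addr=$(ip -6 address show dev "$ifname" | link_local_from_ip_output | head -n 1)
    if [ -z "$addr" ]; then
        echo "no link-local address on $ifname" >&2
        return 1
    fi
    # A link-local address only means something on one link, so -I is required.
    $PING -c3 -I "$ifname" "$addr"
}

# Usage: ping_link_local eth0
```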

If that fails, first bounce the interface (ifdown, then ifup, if that's an acceptable outage). If that doesn't help, remove and reconfigure IPv6 on that interface. Again, the exact procedure is specific to your operating system. Reconfiguring the interface was the solution today!

The next small step: Try the global address for that interface (still on the same machine). Since it’s a global address, you shouldn’t have to specify the interface, but if you get unexpected results, drop it back in to be sure.

ping6 -c3 -I eth0 2001:db8:2015:3000::5

This should work if the previous two steps worked, but if it doesn't, try the same repair procedure as the previous step.
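The global-address step follows the same pattern. Ping each scope-global address without an interface first, and fall back to '-I' only if that fails, flagging the difference as an oddity worth chasing. This is again a sketch that assumes Linux iproute2 output. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: ping each global address on an interface, first without -I,
# then with it, and report which form worked.
# Assumptions: Linux iproute2 output format; PING may be overridden.
PING=${PING:-ping6}

# Read "ip -6 address show" output on stdin; print scope-global addresses.
global_from_ip_output() {
    awk '$1 == "inet6" && $3 == "scope" && $4 == "global" {
        sub(/\/.*/, "", $2)    # drop the prefix length
        print $2
    }'
}

check_global_addr() {
    local ifname=$1 addr=$2
    if $PING -c3 "$addr" >/dev/null 2>&1; then
        echo "ok: $addr"
    elif $PING -c3 -I "$ifname" "$addr" >/dev/null 2>&1; then
        echo "odd: $addr answers only with -I $ifname"
    else
        echo "fail: $addr does not answer"
    fi
}

ping_globals() {
    local ifname=${1:-eth0} addr
    for addr in $(ip -6 address show dev "$ifname" | global_from_ip_output); do
        check_global_addr "$ifname" "$addr"
    done
}

# Usage: ping_globals eth0
```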

Now I'm finally ready to move beyond the scope of the NIC, all the way out to the router interface, still within the same VLAN. Again, you shouldn't have to specify which interface to use for the router's global address, but we're troubleshooting, so watch out for oddities. If you know it, start with the router's link-local address. Link-local addresses are only meaningful on a particular link, so that one always needs the interface.

ping6 -c3 -I eth0 fe80::a8bb:ccff:fedd:eeff
ping6 -c3 -I eth0 2001:db8:2015:3000::1

If that fails, look for a problem within the VLAN. Find other IPv6-enabled hosts in the same VLAN (all of them, right? *grin*) and see if they can reach the default router. If no hosts can reach the router, the router needs to be fixed. (I've had to remove and then reconfigure IPv6 on a VLAN on a Cisco router, and magically the very same configuration started working again. Thankfully, it hasn't happened in ages.)

If most hosts can reach the router, look for reasons why this one has problems. Perhaps it isn't receiving IPv6 router advertisements, so it can't determine its default router, or it thinks it can't reach it. That would be a problem with ICMPv6 or multicast, because router advertisements and neighbor discovery are ICMPv6 messages, largely sent to link-local multicast addresses. Since link-local multicast is generally reliable (multicast querier on the router aside), the likeliest culprit is the host-based firewall.
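To find the router's link-local address when you don't know it, read the host's routing table. The same check shows whether the host learned a default route at all. Here is a minimal sketch that assumes Linux iproute2, which marks a route learned from a router advertisement with "proto ra". The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list IPv6 default routers from the routing table and ping each
# through the interface it was learned on.
# Assumptions: Linux iproute2 output; PING may be overridden
# (e.g. PING="ping -6").
PING=${PING:-ping6}

# Read "ip -6 route show default" output on stdin and print
# "<router> <interface> <proto>" for every route that has a next hop.
# "proto ra" means the route came from a router advertisement.
default_routers() {
    awk '{
        via = ""; dev = ""; proto = "-"
        for (i = 1; i < NF; i++) {
            if ($i == "via")   via   = $(i + 1)
            if ($i == "dev")   dev   = $(i + 1)
            if ($i == "proto") proto = $(i + 1)
        }
        if (via != "") print via, dev, proto
    }'
}

ping_default_routers() {
    local routes
    routes=$(ip -6 route show default | default_routers)
    if [ -z "$routes" ]; then
        # No default route at all: suspect missing router advertisements
        # (ICMPv6 or multicast, often a host-based firewall).
        echo "no IPv6 default route" >&2
        return 1
    fi
    printf '%s\n' "$routes" | while read -r router dev proto; do
        echo "router $router on $dev (proto $proto)"
        $PING -c3 -I "$dev" "$router" </dev/null
    done
}

# Usage: ping_default_routers
```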

Next, if you have more than one VLAN, look outside your own, but stay on your network. Pick any host you know is up.

ping6 -c3 -I eth0 2001:db8:2015:3001::101

If that works, your network is probably fine. If it fails, make sure the router has IPv6 routes. Are all VLANs affected, or just some? I usually test other VLANs at this step, too. Then try a host on your network by name:

ping6 -c3 -I eth0 www.yoursite.edu

If you want to be sure that DNS isn't the problem, either avoid hostnames or first check with host or dig that they resolve to IPv6 (AAAA) records. However, we're entering the territory where I always use hostnames because I don't remember the numbers.
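Checking the name before pinging it separates DNS trouble from network trouble. Here is a minimal sketch that assumes dig is installed ('host -t AAAA' would work just as well). The names aaaa_records and ping_by_name are illustrative, and DIG and PING are overridable only so the sketch can be tested without a network.

```shell
#!/usr/bin/env bash
# Sketch: confirm a hostname has an AAAA record before blaming the network.
# Assumptions: dig is installed; DIG and PING may be overridden.
DIG=${DIG:-dig}
PING=${PING:-ping6}

# Print the IPv6 addresses a name resolves to. "dig +short" can also print
# CNAME targets, so keep only lines containing a colon.
aaaa_records() {
    $DIG +short AAAA "$1" | grep ':'
}

ping_by_name() {
    local name=$1
    if [ -z "$(aaaa_records "$name")" ]; then
        echo "no AAAA record for $name: a DNS problem, not (yet) a network one" >&2
        return 1
    fi
    $PING -c3 "$name"
}

# Usage: ping_by_name www.yoursite.edu
```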

If the problem shows up at this scope, poke around. Pick two hosts in two VLANs and run traceroute6 in both directions, from each host to the other (that's how I once detected an OSPFv3 problem). Our F5 load balancers make traceroute output look odd, so at first pay more attention to the simple "good, yes, connected" versus "bad, no, couldn't connect" results. Try mtr (a version new enough to support IPv6) and let it run for a while to look for path instability or intermittent packet loss (that's how I caught an intermediate router losing 60% of IPv6 packets one weekend).
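Running the traceroutes as a pair makes it harder to forget the return direction. Here is a minimal sketch that assumes you can ssh to both hosts and that they have traceroute6 and mtr installed. The host names are placeholders, and the 300-cycle mtr run is an arbitrary choice. In mtr, '-6' forces IPv6, '-r' selects report mode, and '-c' sets the number of cycles.

```shell
#!/usr/bin/env bash
# Sketch: traceroute6 in both directions between two hosts, plus a longer
# mtr run to catch intermittent loss.
# Assumptions: ssh access to both hosts; traceroute6 and mtr installed;
# host names are placeholders. SSH may be overridden.
SSH=${SSH:-ssh}

trace_both_ways() {
    local a=$1 b=$2
    echo "=== $a -> $b"
    $SSH "$a" traceroute6 "$b"
    echo "=== $b -> $a"
    $SSH "$b" traceroute6 "$a"
}

# Report mode over 300 cycles (an arbitrary count), IPv6 only.
watch_path() {
    mtr -6 -r -c 300 "$1"
}

# Usage: trace_both_ways hostA.example.edu hostB.example.edu
#        watch_path hostB.example.edu
```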

And then try hosts that aren't on your network, like:
ipv6.google.com
ipv6.he.net
ip6.me

At what step in this process does ping6 fail? There’s the scope of your problem, and where to concentrate your analysis.
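The whole procedure can be written down as an ordered list of steps, smallest scope first, that stops at the first failure. Here is a minimal sketch that assumes Linux '-I' syntax and uses this post's example addresses and hostnames, which you should replace with your own. Following the text, only the loopback and link-local steps name an interface. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: walk the steps from the smallest scope outward and stop at the
# first one that fails; that step names the scope of the problem.
# Assumptions: Linux "-I" syntax; example addresses from this post;
# PING may be overridden.
PING=${PING:-ping6}

# $1: interface (may be empty, meaning "let routing choose"), $2: target
ping_step() {
    if [ -n "$1" ]; then
        $PING -c3 -I "$1" "$2"
    else
        $PING -c3 "$2"
    fi
}

first_failing_scope() {
    local scope iface target
    # Each line: scope|interface|target
    while IFS='|' read -r scope iface target; do
        if ! ping_step "$iface" "$target" >/dev/null 2>&1 </dev/null; then
            echo "first failure: $scope ($target)"
            return 1
        fi
    done <<'EOF'
loopback|lo|::1
own link-local|eth0|fe80::201:2ff:fe03:405
own global address||2001:db8:2015:3000::5
router link-local|eth0|fe80::a8bb:ccff:fedd:eeff
router global address||2001:db8:2015:3000::1
another VLAN||2001:db8:2015:3001::101
local name||www.yoursite.edu
outside world||ipv6.google.com
EOF
    echo "all steps passed"
}

# Usage: first_failing_scope
```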
