Deploying IPv6 – The Residential ISP’s Challenges

There’s a lot of support for deploying IPv6, and no one is really saying that you shouldn’t, but the lackluster uptake from large eyeball ISPs tends to grate on the most vocal IPv6 evangelists. Rest assured, however, that most of the larger ones have been quietly working away on this for some time, making their large-scale deployments to the mass market as seamless as possible. Smaller ISPs that lack budget, or that are flush with IPv4 space, are happy to ride it out and let others deal with the 0-day vendor bugs. That’s fine: with the already tight margins in this market, why should they spend money when they don’t need to?

Anywho, I decided to document some of the issues we’ve faced and have had to overcome in the past few years as we start to descend over the precipice of our residential mass market IPv6 rollout. This is by no means a complete list, just a few of the interesting ones that spring to mind.

Authentication

PPPoE with CHAP for authentication will continue to be fine. Your PPP session establishes as normal, IPv6CP negotiates link-local addressing, and then DHCPv6 hands out an IA_NA and/or IA_PD over the top.

With IPoE, on the other hand, authentication is typically done when the BNG receives a DHCPv4 DISCOVER or DHCPv6 SOLICIT. It’s done with what’s colloquially referred to as “port-based authentication”, using either the Circuit-ID or Remote-ID that’s inserted by the DHCP Relay Agent on the Access Node, i.e. the DSLAM or OLT. In DHCPv4 land this is often referred to simply as “Option 82”, with Circuit-ID being sub-option 1 and Remote-ID sub-option 2.
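To make the Option 82 layout concrete, here is a minimal Python sketch of pulling the Circuit-ID and Remote-ID out of a relay agent payload. It assumes the usual code/length/value sub-option encoding; the function names and the sample payload are made up for illustration and aren't taken from any particular BNG or DHCP server.

```python
CIRCUIT_ID = 1  # Option 82 sub-option 1
REMOTE_ID = 2   # Option 82 sub-option 2


def parse_option82(payload: bytes) -> dict:
    """Split an Option 82 payload into {sub-option code: value bytes}."""
    subopts = {}
    i = 0
    while i < len(payload):
        if i + 2 > len(payload):
            raise ValueError("truncated sub-option header")
        code, length = payload[i], payload[i + 1]
        value = payload[i + 2 : i + 2 + length]
        if len(value) != length:
            raise ValueError("truncated sub-option value")
        subopts[code] = value
        i += 2 + length
    return subopts


def port_identity(payload: bytes):
    """Return (circuit_id, remote_id); either may be None if not inserted."""
    subopts = parse_option82(payload)
    return subopts.get(CIRCUIT_ID), subopts.get(REMOTE_ID)


# Hypothetical payload: Circuit-ID "eth0", Remote-ID "abc".
sample = bytes([1, 4]) + b"eth0" + bytes([2, 3]) + b"abc"
circuit_id, remote_id = port_identity(sample)
```

Whichever of the two values your RADIUS policy keys on, the point is the same: the subscriber is identified by the port the request arrived on, not by credentials.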

First issue: not all Access Nodes, or their service providers, will support DHCPv6 LDRA insertion of Remote-ID or Interface-ID (the v6 equivalent of Circuit-ID). Openreach is our primary example here in the UK: although their Huawei MSANs do actually support it, Openreach don’t. The impact is that you’re reliant on DHCPv4 for gleaning this information, and can’t go single-stacked native IPv6 just yet. Doesn’t seem like that big a deal, right? Which brings us to the second issue:

In lieu of having the native Remote-ID/Interface-ID, some BNGs can attribute the v6 session to the same subscriber as the v4 one if the DHCPv6 SOLICIT is received within a few seconds of the DISCOVER. It’s a nice wee kludge that works when a CPE is freshly booted, but it can fall out of sync depending on timers, if v6 is enabled after the CPE is already online, or because of other CPE quirks. If this happens, the CPE’s existing PD may become non-routable, causing the end user’s traffic to be blackholed, or the CPE just won’t get a new PD.
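The timing kludge can be sketched roughly as below. This is only an illustration of the idea, not how any vendor implements it: keying on the CPE’s MAC address, the class name and the window length are all assumptions on my part.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class SessionCorrelator:
    """Attribute a DHCPv6 SOLICIT to the subscriber seen in a recent DISCOVER."""

    window_seconds: float  # assumed to be configurable; real BNGs may hard-code it
    _last_discover: Dict[str, Tuple[float, str]] = field(default_factory=dict)

    def on_discover(self, mac: str, remote_id: str, now: float) -> None:
        # Remember when we last saw a DISCOVER from this MAC and who it was.
        self._last_discover[mac] = (now, remote_id)

    def on_solicit(self, mac: str, now: float) -> Optional[str]:
        # Return the subscriber identity only if the DISCOVER is recent enough.
        entry = self._last_discover.get(mac)
        if entry is None:
            return None
        seen_at, remote_id = entry
        if now - seen_at <= self.window_seconds:
            return remote_id
        return None  # too late: the v6 session can't be attributed


correlator = SessionCorrelator(window_seconds=5.0)
correlator.on_discover("00:11:22:33:44:55", "line-0042", now=100.0)
fresh_boot = correlator.on_solicit("00:11:22:33:44:55", now=102.0)
enabled_later = correlator.on_solicit("00:11:22:33:44:55", now=900.0)
```

The second lookup shows the failure mode: a SOLICIT that turns up long after the DISCOVER has nothing to be matched against.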

OK, so you own your own access network and your DSLAM/OLT supports Lightweight DHCPv6 Relay Agent to insert a Remote-ID. Great, we now know who sent us that DHCPv6 SOLICIT. Except that the DHCPv4 Option 82.2 Remote-ID is different to the DHCPv6 Option 37 Remote-ID. RFC 4649 prepends an extra 4 bytes to the front of the Remote-ID, to include the IANA-registered enterprise number of the relay agent vendor. Now you’re going to need extra RADIUS logic to strip off those first 4 bytes. To do that, it needs to be able to reliably identify an IPv6-triggered ACCESS-REQUEST, which can be a challenge in itself, as the differently formatted Remote-IDs get inserted into the same RADIUS attribute, Broadband Forum’s “Agent-Remote-ID”.
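The stripping itself is the easy part. As a rough Python sketch, assuming the RADIUS logic has already worked out that the request was DHCPv6-triggered (the `from_dhcpv6` flag stands in for that, and the function names and enterprise number are placeholders):

```python
import struct

ENTERPRISE_NUMBER_LEN = 4  # RFC 4649: 4-byte enterprise number, then the remote-id


def split_dhcpv6_remote_id(option_data: bytes):
    """Split a DHCPv6 Option 37 payload into (enterprise_number, remote_id)."""
    if len(option_data) < ENTERPRISE_NUMBER_LEN:
        raise ValueError("Option 37 payload shorter than the enterprise-number field")
    (enterprise_number,) = struct.unpack("!I", option_data[:ENTERPRISE_NUMBER_LEN])
    return enterprise_number, option_data[ENTERPRISE_NUMBER_LEN:]


def normalise_remote_id(value: bytes, from_dhcpv6: bool) -> bytes:
    """Return a Remote-ID that compares equal for v4 and v6 requests."""
    if from_dhcpv6:
        _, remote_id = split_dhcpv6_remote_id(value)
        return remote_id
    return value  # the DHCPv4 Remote-ID carries no enterprise-number prefix


# 99999 is a placeholder enterprise number, not a real vendor's.
v6_value = struct.pack("!I", 99999) + b"line-0042"
v4_value = b"line-0042"
same_line = normalise_remote_id(v6_value, True) == normalise_remote_id(v4_value, False)
```

Knowing when to set that flag is the real problem, since both formats arrive in the same attribute.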

Resourcing

On a BNG, like other routers, you have to keep a close eye on the resources being used. On the Alcatel-Lucent 7750-SR, each subscriber connected chews up what they call a “host resource”. As soon as you dual stack a subscriber, that takes up another host resource, thus halving the total number of subscribers you can host on that one BNG.

If you hand out an IA_NA as well as an IA_PD, that uses up yet another host resource. To avoid this waste of resources, you don’t have to assign public IPv6 point-to-point addressing for use on the CPE’s WAN interface (IA_NA); instead the CPE can just use link-local addressing for BNG<->CPE communication.

Side note: In a later R12 release of SR OS it no longer uses a host resource each for IA_NA and IA_PD.
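To put numbers on that, here’s a quick Python sketch of the arithmetic under the older accounting, where the v4 host, the IA_PD and the IA_NA each cost one host resource. The capacity figure is invented purely for illustration and is not a real 7750-SR limit.

```python
def host_resources_per_subscriber(ipv4: bool, ia_pd: bool, ia_na: bool) -> int:
    """One host resource for each of: the v4 host, the IA_PD, the IA_NA."""
    return int(ipv4) + int(ia_pd) + int(ia_na)


def max_subscribers(host_capacity: int, ipv4: bool, ia_pd: bool, ia_na: bool) -> int:
    """How many subscribers fit on a BNG with the given host-resource capacity."""
    per_subscriber = host_resources_per_subscriber(ipv4, ia_pd, ia_na)
    if per_subscriber == 0:
        raise ValueError("subscriber has no hosts at all")
    return host_capacity // per_subscriber


CAPACITY = 120_000  # hypothetical figure, for illustration only

v4_only = max_subscribers(CAPACITY, ipv4=True, ia_pd=False, ia_na=False)
dual_stack_pd = max_subscribers(CAPACITY, ipv4=True, ia_pd=True, ia_na=False)
dual_stack_pd_na = max_subscribers(CAPACITY, ipv4=True, ia_pd=True, ia_na=True)
```

With that made-up capacity, dual stack with a PD only halves the subscriber count, and adding an IA_NA takes it down to a third.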

CPE

As we’re not allocating an IA_NA address for use on the WAN interface but rather using link-local addressing, the CPE no longer has a public IP address. That’s not really a major issue, apart from perhaps confusing some helpdesk agents or end users who can no longer ping their CPE as proof their connection is up. We mitigate this with a small custom tweak on the CPE: it claims the first ::1 address from the PD and uses it as a loopback of sorts.
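Working out that address is simple enough. A minimal Python sketch, using a documentation prefix and a made-up function name (this is not the actual CPE firmware code):

```python
import ipaddress


def cpe_loopback_from_pd(delegated_prefix: str) -> ipaddress.IPv6Address:
    """Return the first ::1 address inside the delegated prefix."""
    network = ipaddress.IPv6Network(delegated_prefix)
    return network.network_address + 1


# 2001:db8::/32 is the documentation range, used here as a stand-in PD.
loopback = cpe_loopback_from_pd("2001:db8:1234:5600::/56")
```

That gives helpdesk agents and end users a stable, globally routable address on the CPE to ping.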

For firewalling, we’ve chosen to follow the RFC 6092 recommendations on CPE IPv6 security, which means that by default we’ll be allowing all inbound IPsec but blocking all other unsolicited inbound traffic.
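As a very rough Python sketch of that default policy: the version below treats “inbound IPsec” as ESP, AH and IKE on UDP port 500, which is my simplification. RFC 6092 contains a lot more recommendations than this, and the packet model here is invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# IANA protocol numbers
UDP = 17
ESP = 50
AH = 51
IKE_PORT = 500


@dataclass(frozen=True)
class InboundPacket:
    protocol: int
    dst_port: Optional[int] = None
    matches_existing_flow: bool = False  # reply to something a LAN host started


def allow_inbound(pkt: InboundPacket) -> bool:
    """Default-deny inbound, except solicited traffic and IPsec."""
    if pkt.matches_existing_flow:
        return True
    if pkt.protocol in (ESP, AH):
        return True
    if pkt.protocol == UDP and pkt.dst_port == IKE_PORT:
        return True
    return False


unsolicited_tcp = allow_inbound(InboundPacket(protocol=6, dst_port=443))
unsolicited_esp = allow_inbound(InboundPacket(protocol=ESP))
reply_traffic = allow_inbound(InboundPacket(protocol=6, dst_port=50000, matches_existing_flow=True))
```

Anything that falls through to the final `return False` is what an application would need a dynamically created firewall rule for.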

That poses an issue that we haven’t really resolved, and I’m unsure as to what the exact impact will be. As a result of NAT on IPv4, a lot of applications use UPnP to open up inbound ports for connectivity; this means not just a DNAT entry, but also an inbound firewall rule. Whilst we no longer need NAT with IPv6, that inbound firewall rule will still be required on CPEs that have a default-deny policy.

As mentioned in a previous post, there are new UPnP functions which allow for the dynamic creation of IPv6 firewall rules and are actually mandated in the IGD:2 specs. Sadly, not many CPEs meet these IGD:2 specifications yet, and even then it will require application developers to update their applications to make use of these new functions.

Another potential issue I foresee with firewalling is more of an end-user training one than a technical one. Most modern OSes will make use of privacy addressing. This is a method whereby an end-host pseudo-randomly assigns itself temporary addresses to use for outbound connections, then deprecates them after a while and replaces them with new ones. The end result is that an end-host will have a multitude of IPv6 addresses on an interface, including:

Link-local addressing, which will start with fe80::
Unique Local Address (ULA), which starts with fd
A static EUI-64 address based on the interface’s MAC address
Several of the aforementioned Privacy Extension addresses (only 1 being used for new outbound connections, but possibly multiple deprecated addresses that were used for older flows)

Hopefully people will realise pretty quickly that the first two aren’t globally routable and aren’t to be used for inbound firewall rules. The issue is that the last two types of addresses are both assigned out of the same prefix handed out via RAs from the CPE and aren’t instantly recognisable by their format. Thankfully most OSes will include the word “temporary” next to the privacy addresses, which will hopefully steer end-users to use the EUI-64 address for any IPv6 firewalling rules they decide to manually enter.
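For the curious, the address types in the list above can be roughly told apart in Python as below. Spotting EUI-64 by the ff:fe bytes in the middle of the interface identifier is only a heuristic, since a privacy address could contain the same bytes by chance; the function name and labels are my own.

```python
import ipaddress

ULA_RANGE = ipaddress.IPv6Network("fc00::/7")  # in practice ULAs start with fd


def classify(address: str) -> str:
    """Roughly label an IPv6 address the way an end user would need to."""
    ip = ipaddress.IPv6Address(address)
    if ip.is_link_local:
        return "link-local"
    if ip in ULA_RANGE:
        return "ULA"
    interface_id = ip.packed[8:]  # lower 64 bits
    if interface_id[3:5] == b"\xff\xfe":
        return "global, looks like EUI-64"  # heuristic only
    return "global, not EUI-64 (possibly temporary)"


# Example addresses use the documentation prefix 2001:db8::/32.
labels = [
    classify("fe80::211:22ff:fe33:4455"),
    classify("fd12:3456:789a::1"),
    classify("2001:db8::211:22ff:fe33:4455"),
    classify("2001:db8::5c1e:93a2:77b0:1d4f"),
]
```

The last two share a prefix, and only the interface identifier gives any hint as to which one is stable, which is why the OS labelling an address “temporary” matters.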

Right, that’s enough for now, and that’s just a small snippet focusing on the very end of the Internet chain. Hopefully it helps some people, or gives others an idea of what kind of things they should be looking at when doing their own IPv6 deployment.

4 thoughts on “Deploying IPv6 – The Residential ISP’s Challenges”

Nathan Ward says:

July 13, 2015 at 1:08 pm

Worth noting, on a provider I observed recently, only about 1% of CPE even asked for IA_NA, they used a variety of methods for getting an address to send ICMP traffic on, most from the EUI-64 generated address on their LAN I/F.

Also, ASR9000 can take IPv6 attributes in a response to an IPv4 access request, and any subsequent IPv6 on that subscriber will use those attributes, not just within a few seconds like you suggest here. Is that 7750SRs where that happens?

I haven’t looked in to it, but perhaps OS socket APIs can handle the port opening business, so applications can leverage that without having to have their own implementations.

- Richard Patterson says:
  
  July 13, 2015 at 1:35 pm
  
  Yes this is specifically with the 7750SR although I’m sure other vendors have a similar issue and/or method of resolving it.
  
  It’s not the access-request timing, it’s more the trigger of those access-requests that’s the issue. If the SOLICIT comes in within 10sec of the DISCOVER, the 7750 won’t even send a second access-request (so yes it would expect the v6 attributes to be included in the v4 triggered access-accept as well).
  I guess it’d be nice if they could make this length configurable, or set to infinity like you suggest the ASR9K does.
  
  - Nathan Ward says:
    
    July 13, 2015 at 2:03 pm
    
    So, it forgets any IPv6 attributes after 10s if not used? That’s odd.
    
    I wonder how much of a problem that is in practice.. It would only cause a problem if someone turned on v6 a while after their v4 was up, right? I guess if they upgrade the firmware or whatever this would happen, and because the DHCPv4 matches an existing lease it wouldn’t do an Access-Request?
    
Ross Chandler says:

July 13, 2015 at 6:31 pm

Richard, good article. I am also using the ALU SRs and Huawei access nodes but in Ireland (so not Openreach). In my case the Huawei does LDRA for DHCPv6. We also only support IA_PD and not IA_NA. The modem takes an address from the first /64 but TR69 management of the modem is still by IPv4 for the moment. Good luck with your deployment.
