This page contains my ideas on how to design and implement a framework for configuring (embedded) Ethernet switch controllers from FreeBSD.
As of May 16th, this is only a slightly dated brain dump of various ideas and questions that I had during the past couple of months. I'm planning on updating this page shortly with information on the implementation that Adrian committed to -head.
In the meantime, please refer to my talk at BSDCan 2012; the video of the presentation should be posted to YouTube shortly. The talk is listed at http://www.bsdcan.org/2012/schedule/events/330.en.html
Many wireless routers contain multiple Ethernet ports that are implemented by an Ethernet switch chip. These chips can often be configured in various ways. Interesting aspects include the port link parameters, VLAN settings, and per-port and per-switch statistics. Newer switch chips also contain hardware support for NAT, PPPoE processing, and similar advanced features.
Design Goals
The framework should:
- Allow an administrator to configure common aspects of the switch
- Provide a generic administration interface (not specific to a switch model)
- Allow switch-model-specific configuration extensions
- Enable identification of ports by their label on the device's case
Abstract Switch
In order to have a generic configuration interface, the framework has the concept of an abstract switch that models the various features and configuration settings of the concrete switch chips. This abstract switch can have these features:
- 2 to 32 physical ports, some of which might be internal to the device (CPU attachment)
- external ports can have PHYs for link control
- maximum frame length (standard Ethernet, .1Q tagged, Q-in-Q tagged, jumbo frames)
- .1Q VLAN support with a limited number of VLANs (including tagged/untagged ports)
- .1Q VLAN support with all possible VID values
- .1Q priority tag support and frame classification
- IPv4/IPv6 DSCP priority
- .1D Spanning Tree management functions
- outsource .1D processing to host CPU (punt frames, adjust forwarding and learning status per port)
- IGMP snooping
- port mirroring
- ACL with forward/drop/punt to CPU actions, filtering on SA, DA, IPv4, IPv6 addresses
- .1X support
Per-port features:
- port-based VLAN support
- per-port settings for ingress and egress handling of tagged and untagged frames (strip/drop/add)
- add VLAN tag to untagged frames with the VID indicating the ingress port of the frame
- QoS queue handling, per-port priority
- forwarding and learning enable/disable (.1D)
- statistics counters
- LED configuration
Open Issues
The following questions likely should be answered:
- How are statistics going to be exposed? sysctl, SNMP/MIB framework, userland utility? How are the various counters offered and discovered, since the switch controllers differ widely in this aspect?
- How can more advanced aspects be modeled in a generic way, and how would other system components (natd, etc.) tie into them?
- Some of the switches are attached to the CPU via the MII SMI (signals MDC/MDIO). The current PHY code assumes a one-to-one relationship between MII and MAC with one or more PHYs connected to it; these switches use a common SMI, but separate MAC and PHY pairs for the switch ports.
- Many of these controllers allow the CPU to control spanning tree functions; can we reuse the STP code from if_bridge?
Switch Controllers
RTL8366RB
This Realtek Ethernet switch is used in the TP-LINK TL-WR1043ND. It has five ports with built-in PHYs and one MII port for connection to an Ethernet MAC. It can be configured through a register interface that is based on I2C (but not I2C protocol compliant). The built-in PHYs are controlled through this register interface, instead of an MDC/MDIO interface through the CPU MAC.
Accessing PHY registers through the switch controller over the bit-banged I2C bus is slow. The media status update routine therefore uses the switch port status registers, which takes a total of 3 register accesses instead of the 5 * 3 * 3 register accesses that standard PHY polling would need. This makes the 1 Hz update poll feasible.
AR8x16
This family of Atheros switches is used in a number of devices, including the TP-LINK TL-MR3420 and TL-MR3220 (built into the SoC).
VLAN configuration on this switch differs significantly from the RTL836x family. Although it also has 16 VLAN table entries, the switch itself manages the table entries, so you can just say "add a VLAN for VID 27, and add ports 2 and 3 to it". On the Realtek switch, the driver has to manage the table entries itself.
Current Implementation
I've developed some code to control an RTL8366RB. The patch is at http://www.lassitu.de/freebsd/etherswitch-rtl8366rb.patch.
It consists of the sys/dev/etherswitch/etherswitch.c driver, which provides a cdev to userland, and can be attached to individual switch drivers through the sys/dev/etherswitch/etherswitch_if.m interface. The sbin/etherswitchcfg utility talks to the etherswitch driver to query and change the switch settings.
sys/dev/etherswitch/rtl8366rb.c implements the etherswitch interface and attaches the etherswitch driver as a child. It talks to the hardware by attaching to an I2C bus. On the TL-WR1043ND, this is provided via ar71xx_gpio<-gpiobus<-gpioiic<-iicbus.
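The split between the generic etherswitch driver and a chip driver such as rtl8366rb.c can be pictured as a table of methods supplied by the chip driver. FreeBSD declares such interfaces in kobj .m files; the sketch below flattens that idea into a plain C struct of function pointers so it runs in userland. struct switch_ops, its methods, and the in-memory toy chip are hypothetical and do not mirror the actual methods of etherswitch_if.m.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

struct switch_info {
    char name[32];
    int  nports;
};

/* Methods a chip driver supplies (hypothetical names). */
struct switch_ops {
    void (*getinfo)(void *sc, struct switch_info *info);
    int  (*readreg)(void *sc, int reg, uint16_t *val);
    int  (*writereg)(void *sc, int reg, uint16_t val);
};

/* Generic layer: knows nothing about the chip except its ops table. */
struct etherswitch {
    const struct switch_ops *ops;
    void                    *sc;   /* chip driver's private state */
};

static int es_readreg(struct etherswitch *es, int reg, uint16_t *val)
{
    assert(es->ops != NULL && es->ops->readreg != NULL);
    return es->ops->readreg(es->sc, reg, val);
}

static int es_writereg(struct etherswitch *es, int reg, uint16_t val)
{
    assert(es->ops != NULL && es->ops->writereg != NULL);
    return es->ops->writereg(es->sc, reg, val);
}

/* Toy chip driver that keeps its registers in memory. */
#define TOY_NREGS 64
struct toy_softc { uint16_t regs[TOY_NREGS]; };

static void toy_getinfo(void *sc, struct switch_info *info)
{
    (void)sc;
    snprintf(info->name, sizeof(info->name), "%s", "toyswitch");
    info->nports = 6;
}

static int toy_readreg(void *sc, int reg, uint16_t *val)
{
    struct toy_softc *t = sc;

    if (reg < 0 || reg >= TOY_NREGS)
        return EINVAL;
    *val = t->regs[reg];
    return 0;
}

static int toy_writereg(void *sc, int reg, uint16_t val)
{
    struct toy_softc *t = sc;

    if (reg < 0 || reg >= TOY_NREGS)
        return EINVAL;
    t->regs[reg] = val;
    return 0;
}

static const struct switch_ops toy_ops = {
    .getinfo  = toy_getinfo,
    .readreg  = toy_readreg,
    .writereg = toy_writereg,
};
```

With this split, supporting another chip means writing another ops table; the cdev and ioctl code in the generic layer stay unchanged.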
Ioctl Interface
The cdev allows a userland program to issue a number of ioctls to query and configure the switch. A forthcoming man page will explain the ioctls in detail.
Modeling the Switch
One challenge in modeling the switch was to find a way to reuse the existing MII and PHY code (sys/dev/mii). The miibus code assumes that an Ethernet interface has a MAC interface that is connected to one or more PHYs through MII, and that these PHYs can be controlled via SMI (serial management interface) with MDC and MDIO. Since there is only a single MAC, only one of the PHYs can be active at any given time; all others must have the BMCR_ISO (isolate) bit and possibly the BMCR_POFF bit set.
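A minimal sketch of the isolation rule, operating on shadow copies of each PHY's Basic Mode Control Register (register 0). The bit values are the IEEE 802.3 clause 22 definitions (isolate is bit 10, power down is bit 11); the macro names follow the text above, and select_active_phy() is a hypothetical helper. A real driver would write the registers over MDC/MDIO instead of into an array.

```c
#include <assert.h>
#include <stdint.h>

/* IEEE 802.3 clause 22 BMCR bits. */
#define BMCR_POFF 0x0800  /* bit 11: power down */
#define BMCR_ISO  0x0400  /* bit 10: electrically isolate from the MII */

#define NPHYS 5

/* Make 'active' the only PHY driving the shared MII: clear its isolate
   and power-down bits, set isolate (and optionally power down) on all
   other PHYs. */
static void select_active_phy(uint16_t bmcr[NPHYS], int active, int power_off_rest)
{
    assert(active >= 0 && active < NPHYS);
    for (int phy = 0; phy < NPHYS; phy++) {
        if (phy == active) {
            bmcr[phy] &= (uint16_t)~(BMCR_ISO | BMCR_POFF);
        } else {
            bmcr[phy] |= BMCR_ISO;
            if (power_off_rest)
                bmcr[phy] |= BMCR_POFF;
        }
    }
}
```

This is exactly the model that does not fit a switch: there, every PHY has its own switch MAC behind it, so all of them need to be active at the same time.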
Implementation Issues
Register access over the software-implemented I2C bus is slow, and accessing PHY registers is even slower. In one iteration of the kernel driver, a callout ran every second to call the miitick and miipoll functions for each of the five PHYs. This took about 100 milliseconds per PHY per function, or about a second in total. Because the I2C access functions busy-wait using DELAY(9), most of this time is wasted, and the machine becomes very unresponsive. I haven't found a way to deal with this; the driver may end up simply ignoring status changes on the PHYs.
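The arithmetic behind the unresponsiveness, using the figures above (five PHYs, two functions per PHY, about 100 ms each, run from a callout with a period of 1 s):

```latex
t_{\text{tick}} \approx 5 \times 2 \times 100\,\text{ms} = 1000\,\text{ms},
\qquad
\frac{t_{\text{tick}}}{T_{\text{callout}}} \approx \frac{1000\,\text{ms}}{1000\,\text{ms}} \approx 1
```

Since DELAY(9) spins rather than sleeps, a ratio near 1 means the CPU spends nearly all of its time inside the callout.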
MII, MDIO, and miibus
In a typical Ethernet controller, there is one MAC, one MDIO master, and one or more PHYs (with MII slave and MDIO slave interfaces). The current miibus code (sys/dev/mii) relies on this model. The following diagram shows this typical setup.
For embedded systems with an Ethernet switch, the picture becomes more complicated. The diagram shows a typical setup (for example, AR7241), where the two Ethernet interfaces are connected to the switch function, which in turn connects to PHYs.
arge0 has its MAC connected to PHY4; its MDIO master is not connected.
arge1 has its MAC connected to a MAC of the switch core (back-to-back, no PHY involved). The switch controller is connected to arge1's MDIO master.
The switch controller has an internal MDIO master, which is used to communicate with the PHYs. PHY0 to PHY3 are each connected to their respective switch MAC via an MII.
miibus Challenge
Since arge1 has a fixed connection to the switch core, a hard-coded setup (via hints) is sufficient. However, arge0 is connected to a PHY and needs to reconfigure its MAC according to the PHY4 settings. To do so, the arge0 driver has to go through the switch to read and write the PHY registers.
With the current miibus code, this poses a number of challenges:
- Ethernet switches that are attached via MDIO may not look like PHYs; in particular, they might not have a BMSR or ID1 and ID2 registers. This is the case with the Atheros AR8x16 series. The miibus code assumes that all possible children of a miibus are PHYs, and will not easily accept non-PHY children.
- attach hierarchy and sequence, e.g. for the PHY in the switch chip: the switch chip is attached to arge1, but the PHY needs to be attached under arge0.
- creating miibus instances that are not associated with an interface. The miibus code assumes that any PHY is attached to an MII connected to an Ethernet interface's MAC. The switch ports are not part of the Ethernet interfaces, and their MAC configuration doesn't change when one of the switch ports negotiates a different media setting.
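The first challenge comes from how PHYs are detected. The sketch below is a simplified presence test in the spirit of a miibus probe, not a copy of FreeBSD's logic: it reads BMSR (register 1) and the identifier registers ID1/ID2 (registers 2 and 3) and rejects an address where BMSR, or both ID registers, read as all zeros or all ones. looks_like_phy() and the toy bus are hypothetical, and the register values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MII_BMSR   1  /* basic mode status register */
#define MII_PHYID1 2  /* PHY identifier 1 */
#define MII_PHYID2 3  /* PHY identifier 2 */

typedef uint16_t (*mdio_read_fn)(void *ctx, int addr, int reg);

/* All zeros or all ones usually means "nothing answered". */
static bool reg_is_blank(uint16_t v)
{
    return v == 0x0000 || v == 0xffff;
}

static bool looks_like_phy(mdio_read_fn rd, void *ctx, int addr)
{
    assert(rd != NULL);
    if (reg_is_blank(rd(ctx, addr, MII_BMSR)))
        return false;
    if (reg_is_blank(rd(ctx, addr, MII_PHYID1)) &&
        reg_is_blank(rd(ctx, addr, MII_PHYID2)))
        return false;
    return true;
}

/* Toy MDIO bus: address 0 answers like a PHY; every other address,
   which here includes the switch, reads back zero at these offsets. */
static uint16_t toy_mdio_read(void *ctx, int addr, int reg)
{
    (void)ctx;
    if (addr == 0) {
        switch (reg) {
        case MII_BMSR:   return 0x7849;
        case MII_PHYID1: return 0x1234;
        case MII_PHYID2: return 0x5678;
        default:         return 0x0000;
        }
    }
    return 0x0000;
}
```

A switch that is perfectly reachable over MDIO but uses these offsets for something else is thus filtered out before any driver gets a chance to claim it.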