Ethernet has recently experienced enormous capacity-driven growth, from 10 Gbps up to 100 Gbps. The advantages of Ethernet are threefold: 1) the low cost of equipment, 2) its scalability, and 3) the ease of operations, administration and maintenance (OAM). These features make Ethernet the best candidate to provide the transport network for many applications, e.g., Data Center Networks (DCN), converged Storage Area Networks (SAN), High Performance Computing (HPC), cloud computing and Fibre Channel over Ethernet (FCoE). In this research, we explore the possibility of achieving a lossless, or more precisely, drop-free Ethernet. Further, we study the effect of this lossless Ethernet on several applications, namely i) switch fabrics in routers, ii) data center networks, iii) Remote Direct Memory Access (RDMA), and iv) Common Public Radio Interface (CPRI) over Ethernet.
Switch fabrics in routers require very tight characteristics in terms of packet loss, fairness in bandwidth allocation, low latency and freedom from head-of-line (HOL) blocking. Such requirements are traditionally met using specialized and expensive switch devices. With the enhancements introduced by IEEE Data Center Bridging (DCB) (802.1, 2013) for Ethernet networks, we explore the possibility of using commodity Ethernet switches to achieve a scalable, flexible, and more cost-efficient switch fabric solution, while still guaranteeing router characteristics.
In addition, the rise of DCN facilitates new applications such as SAN and automated Virtual Machine (VM) deployment and migration, which require high data rates, ultra-low latency and minimal packet loss. Additionally, the DCN is required to support layer-two applications such as Virtual Local Area Networks (VLAN) and Virtual eXtensible Local Area Networks (VXLAN), which provide flexible workload placement and layer-2 segmentation. Because Ethernet is the most widely used transport network in data center fabrics, we study the possibility of achieving a lossless transport layer to support these applications.
Due to the widespread adoption of Ethernet, other technologies are migrating to it, such as RDMA. RDMA technology offers high throughput, low latency, and low Central Processing Unit (CPU) overhead by allowing network interface cards (NICs) to transfer data into and out of the host's memory directly. Originally, RDMA required the InfiniBand (IB) network protocol/infrastructure to operate. RDMA over IB requires adopting a new network infrastructure, which has experienced limited success in enterprise data centers. RDMA over Converged Ethernet (RoCE) v1 (Association et al., 2010) and v2 (Association et al., 2014) were introduced as new network protocols that permit performing RDMA over an Ethernet network. RoCE presents an intermediate layer with IB as an upper interface to support RDMA and Ethernet as a lower interface. This allows using RDMA over standard Ethernet infrastructure with specific NICs that support RoCE. Such an application requires a robust and reliable Ethernet network, which raises the need for an Ethernet congestion control protocol.
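To make this layering concrete, the following C sketch outlines how a RoCE frame encapsulates the IB transport headers. It is an illustrative simplification, not a driver implementation, though the EtherType 0x8915 (RoCE v1) and the UDP destination port 4791 (RoCE v2) are the standardized values.

```c
/* A minimal sketch of the RoCE header layering, not a driver
 * implementation; byte order and packing are ignored for readability. */
#include <stdint.h>

#define ETH_P_ROCE_V1    0x8915  /* RoCE v1: IB GRH follows the Ethernet header */
#define ROCE_V2_UDP_PORT 4791    /* RoCE v2: BTH carried over UDP/IP            */

/* Simplified IB Base Transport Header (BTH, 12 bytes on the wire). */
struct ib_bth {
    uint8_t  opcode;   /* transport operation, e.g. RDMA WRITE         */
    uint8_t  flags;    /* SE | M | PadCnt | TVer                       */
    uint16_t pkey;     /* partition key                                */
    uint32_t dest_qp;  /* 8 reserved bits + 24-bit destination QP      */
    uint32_t psn;      /* ack-request bit + 24-bit packet sequence no. */
};

/* RoCE v1 frame: Ethernet | IB GRH | BTH | payload | ICRC
 * RoCE v2 frame: Ethernet | IP | UDP (dport 4791) | BTH | payload | ICRC
 * The IB transport (BTH and above) is unchanged in both versions, which
 * is what lets RDMA applications run over plain Ethernet hardware. */
```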
Ethernet layer congestion control protocols
Because of the widespread use of Ethernet, it has become the primary network protocol considered to support both DCN and SAN. Ethernet was originally designed as a best-effort communication protocol, and it does not guarantee frame delivery. Many providers believe that TCP can perform well in case of network congestion. However, a TCP sender detects congestion and reacts by reducing its transmission rate only when segment loss occurs. To avoid this conservative TCP reaction to segment loss, one should minimize packet dropping at layer 2. In this context, IEEE has defined a group of technologies, named DCB (802.1, 2013) and also known as Converged Enhanced Ethernet (CEE), to enhance Ethernet into a lossless fabric. These technologies aim to create a robust and reliable bridge between data center components over an Ethernet network. DCB comprises Ethernet PAUSE (IEEE 802.3x), Priority-based Flow Control (PFC, IEEE 802.1Qbb) (IEEE Standard Association, 2011) and Quantized Congestion Notification (QCN, IEEE 802.1Qau) (IEEE 802.1Qau, 2010; Alizadeh et al., 2008).
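As an illustration of the QCN control loop, the sketch below shows how a congestion point computes its feedback value on each sampled frame, following the formulation in Alizadeh et al. (2008). The weight w = 2 and the 6-bit quantization of |Fb| are commonly cited defaults and are assumptions here, used only for illustration.

```c
/* A minimal sketch of the QCN congestion-point feedback computation. */
#include <stdint.h>

#define QCN_W      2.0  /* weight on the queue-growth term (assumed) */
#define QCN_FB_MAX 63   /* |Fb| quantized to 6 bits                  */

/* Invoked when an arriving frame is sampled at the congestion point.
 * q     : current queue occupancy
 * q_old : occupancy at the previous sample
 * q_eq  : equilibrium setpoint the queue is regulated around
 * Returns quantized |Fb| (0 means no congestion, no message sent). */
static uint8_t qcn_feedback(double q, double q_old, double q_eq)
{
    double q_off   = q - q_eq;   /* offset from the setpoint */
    double q_delta = q - q_old;  /* rate of queue growth     */
    double fb      = -(q_off + QCN_W * q_delta);

    if (fb >= 0.0)
        return 0;                /* queue healthy: no CNM generated */

    double mag = -fb;            /* congestion severity |Fb| */
    if (mag > QCN_FB_MAX)
        mag = QCN_FB_MAX;
    return (uint8_t)mag;         /* sent back to the flow source in a
                                    congestion notification message */
}
```

The key end-to-end property is visible in the return path: the feedback is addressed to the source of the sampled frame, so only that source slows down, and intermediate nodes keep no per-flow state.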
These technologies can be classified, based on the reaction point, into two categories: i) hop-by-hop or ii) end-to-end. In hop-by-hop flow control mechanisms, control messages are forwarded from node to node in a store-and-forward manner. Hop-by-hop transport involves the source, the destination node, and some or all of the intermediate nodes. Hop-by-hop mechanisms react faster than end-to-end ones. However, they propagate the congestion backward from the congested point toward the source, causing what is known in the literature as congestion spreading or the tree saturation effect (Hanawa et al., 1996). Consequently, they cause HOL blocking. In addition, hop-by-hop mechanisms face scalability issues because they need to keep per-flow state information at intermediate nodes.
Conversely, end-to-end mechanisms notify the source responsible for the congestion directly when congestion occurs. This involves a relatively high delay until the source responds. Due to this delay, hop-by-hop transport achieves a considerably faster reaction time for short-lived flows. However, due to the limitations of hop-by-hop techniques, namely scalability and HOL blocking, end-to-end mechanisms are preferable for controlling long-lived flows.
Ethernet PAUSE is a hop-by-hop congestion control mechanism. It was introduced to address congestion by sending a PAUSE request to the sender when the receiver's buffer reaches a specific threshold. The sender stops sending any new frames until a resume notification is received or a local timer expires. Some data flows, such as FCoE and iSCSI, are very sensitive to frame loss, while others depend on higher-layer traffic control. In addition, Ethernet PAUSE is a coarse-grained protocol because it reacts per port, which causes HOL blocking.
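The sketch below shows the on-wire layout of an 802.3x PAUSE frame. The constants (EtherType 0x8808, opcode 0x0001, 512-bit-time quanta, and the reserved multicast destination 01-80-C2-00-00-01) come from the standard; the struct itself is only a schematic view.

```c
/* Schematic layout of an IEEE 802.3x PAUSE frame (a MAC control frame);
 * packing and byte order are ignored for readability. */
#include <stdint.h>

#define ETH_P_MAC_CTRL 0x8808  /* MAC control EtherType */
#define MAC_CTRL_PAUSE 0x0001  /* PAUSE opcode          */

struct pause_frame {
    uint8_t  dst[6];      /* 01-80-C2-00-00-01: link-local, never forwarded */
    uint8_t  src[6];      /* MAC address of the pausing port                */
    uint16_t ethertype;   /* 0x8808                                         */
    uint16_t opcode;      /* 0x0001 = PAUSE                                 */
    uint16_t pause_time;  /* quanta of 512 bit times; 0 = resume            */
    /* zero padding up to the minimum frame size, then the FCS */
};
```

In a typical deployment, a receiver emits a PAUSE when its buffer crosses a high watermark and a zero-quanta resume once it drains below a low one; the watermark values are implementation-specific. Note that pause_time governs the whole port, which is exactly why the mechanism is coarse-grained.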
PFC was introduced as a fine-grained protocol to mitigate HOL blocking by enabling the operator to discriminate flows based on the traffic classes defined by the IEEE 802.1p task group (Ek, 1999). PFC divides the data path into eight traffic classes, each of which can be controlled individually. Yet PFC is still limited because it operates at the port plus traffic class (priority) level, which can cause tree saturation (Hanawa et al., 1996) and HOL blocking (Stephens et al., 2014).
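For comparison with plain PAUSE, the following schematic shows the PFC frame layout defined by 802.1Qbb: a priority-enable vector plus eight independent timers, so one traffic class can be paused while the other seven keep flowing. As before, the struct is illustrative rather than a driver definition.

```c
/* Schematic layout of an IEEE 802.1Qbb PFC frame;
 * packing and byte order are ignored for readability. */
#include <stdint.h>

#define MAC_CTRL_PFC 0x0101  /* PFC opcode; the EtherType is still 0x8808 */

struct pfc_frame {
    uint8_t  dst[6];        /* 01-80-C2-00-00-01, as for PAUSE           */
    uint8_t  src[6];
    uint16_t ethertype;     /* 0x8808                                    */
    uint16_t opcode;        /* 0x0101 = PFC                              */
    uint16_t class_enable;  /* bit n of the low octet: time[n] is valid  */
    uint16_t time[8];       /* per-priority pause, 512-bit-time quanta   */
};
```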
