## Analysis of load-balanced switch with finite buffers

Yury Audzevich<sup>\*</sup>, Yoram Ofek<sup>\*</sup>, Miklos Telek<sup>†</sup>, Bülent Yener<sup>‡</sup>

\*Department of Information and Communication Technology,

University of Trento, Trento, Italy

Email: {audzevi,ofek}@dit.unitn.it

<sup>†</sup>Department of Telecommunications,

Budapest University of Technology and Economics, Budapest, Hungary

Email: telek@hit.bme.hu

<sup>‡</sup> Department of Computer Science, Rensselaer Polytechnic Institute, Troy, New York, USA

Email: yener@cs.rpi.edu

Abstract—Recently the Birkhoff-von Neumann load-balanced (LB) switch has become a promising switch design due to its high scalability properties and simple control. The performance of the LB switch was studied under strong assumptions such as infinite buffers and admissible traffic conditions. However, both such assumptions may be violated in multi-hop networks since admissibility requirement cannot be maintained unless some inter-switch feedback mechanism is implemented, and infinite buffers are not feasible either.

This paper considers the performance of the LB switch with finite central stage buffers under both (i) admissible and (ii) inadmissible input traffic conditions. Its contributions are two folds: firstly, by means of mathematical model we demonstrate that the load-balanced switch has a non-zero cell dropping probability due to buffer overflow even under the admissible input traffic assumptions. Secondly, cell loss probabilities are even higher and large buffers are required under inadmissible traffic conditions to cope with such behavior.

#### I. INTRODUCTION

As the traffic volume on the Internet increases exponentially [4], so does the demand for fast switching of packets between asynchronous high speed routers. The packet switching time may run up to tens of nanoseconds in such routers with more than thousands ports, each processing at 10 GB/s. Switching such high volume of traffic from input to output requires large buffers and fast processors to perform the switching and forwarding functions. The load-balancing switching approach is simple, and therefore, may be capable to perform the switching and forwarding from all inputs to all outputs simultaneously with low complexity and high scalability. This simple approach is distributed and does not require fast switch control units [2], primarily because each stage is independent and it makes its own distributed switching and forwarding calculations.

In this paper, we focus on the load-balanced switching architecture (LB switch) introduced in [1], [2], [3], [5], which was aimed to be highly scalable switch architecture. Our efforts are focus on issues of finite buffer in load-balanced switch and its performance.

Finite buffer leads to the possibility of *cell drop in the central stage* of the switch even under admissible input traffic conditions. We also present simulation results for inadmissible

input traffic. Such kind of traffic patterns can appear in the multi-hop network where inter-switch feedback links are not implemented.

Some of the existing load-balanced switch designs with more complex control (Mailbox switch [9], Contention and Reservation switch [12]) are implementing interstage communication links by means of symmetric interconnection [9]. Some other schemes use stable matching algorithms for cells transmission (Concurrent Matching switch [11]) and extended set of Virtual Output Queues (VOQs) in the inputs and outputs (Byte-Focal switch [10]). However, transmission of occupancy data related to central buffers will add to the communication and computational overheads and, it may affect the system scalability. On the other hand, the feedback coming from central stage buffers can guarantee that none of cells are transmitted while central buffers experience overflow. Even with infinite buffering and fairly complex control some structures [9] cannot provide throughput guarantees.

In this paper we analyze the cell loss probability in the central stage buffers while considering single-stage switch [2] with *finite amount of buffering*. In contrast to the previously considered systems (where a set of strong assumptions were used), LB switch with *finite buffers* always has a possibility to drop a cell during operation, which will degrade system throughput and create significant problems back to resequencer in the output. The analysis of load-balanced switch behavior under mentioned conditions has significant practical importance. Since load-balanced switch obey the non-work conserving policy, even negligible overload on the output could lead to high internal cell loss. We assume the case when *no feedback information paths between stages are implemented*.

The initial part of the paper shows an adopted *batch* - Geo/D/1/K queueing model [7], [8], used for derivation of cell loss probability in the  $N \ge N$  load-balanced switch. As an additional tool, the simulator of the load-balanced switch was written. In following, the matching between the mathematical model and simulations is shown. Moreover, the behavior of the system under non-admissible input traffic was examined only by means of simulations.

The paper organization is as follows. Section II presents

a short overview of the operational principles related the to load-balanced switch as well as assumptions used for theoretical model and simulations. In section III, we proceed with the description of the model used for the analysis of loss probability in the  $N \ge N$  switch. Then in section IV some numerical analytical results and simulations are presented. Finally, some conclusions and future works are discussed in Section V.

# II. PRINCIPLES AND ANALYSIS OF LOAD-BALANCED SWITCH

## A. Load-balancing switch architecture

The basic load-balanced switch architecture shown in Figure 1 presents a stage where buffers positioned in-between two identical crossbar switches [1], [2], [4]. Each buffer at each intermediate input (central stage) is partitioned into N separate Virtual Output Queues, one for each output in order to avoid Head-of-Line blocking and reduce losses.



Fig. 1. Presentation of single-stage buffering load-balanced switch

The operating idea behind the N x N size single-stage buffering architecture [2] is to load-balance the packets (cells) from inputs along the VOQs of central buffering stage. Then, the packets are sent to the related output. Each crossbar set up a periodic connection pattern connecting the input to the central stage. The connection time for each packet is predefined and the rate is equal to 1/N. Both crossbars operate identically and walk through a fixed sequence of interconnections according to the rule j = (i + t)modN. Arriving packets are switched instantly and there are no buffers inside the crossbars [2], [3]. Some packets can checkout from the system in a disorganized fashion because of the FIFO policy. Researchers have attempted to solve this problem by proposing a multi-stage buffering design [3]. In contrast with the single-stage scheme a set of buffers are also available in the input and output stages. Consequently, solutions to resolve cells that are out-of-order were presented in [4], [5].

### B. Assumption and traffic model

Authors of [1], [2], [3], [4], [5] have assumed that all of the load-balancing switch packets are of the same size and they simply call them packets. Although in the real Internet world packets have variable length, yet these assumptions have been idealized and several studies were built upon. Within this paper, we will be calling *packets of the same size as cells*  and, we assume them to be integer divide of the variable size packets.

Another significant assumption is that each line card uses the same common slotted time. Indeed, this assumption implies that only one cell can arrive at an input and depart from an output of the switch during a time slot. Each line card is assumed to support equal rate flows. As one of the major assumptions to permit achievement of high throughput [5], [13] for the initial single-stage load-balanced switch is traffic admissibility. Precise definition of this traffic model can be found below. Identical to mentioned above assumptions were used for the description of mathematical model in Section III.

We describe the traffic load of the switch with an arrival probability matrix of dimension N x N:

$$A = \begin{pmatrix} a_{0,0} & a_{0,1} & \dots & a_{0,N-1} \\ a_{1,0} & a_{1,1} & \dots & a_{1,N-1} \\ \vdots & \vdots & \ddots & \vdots \\ a_{N-1,0} & a_{N-1,1} & \dots & a_{N-1,N-1} \end{pmatrix}$$

The i, j element of the arrival probability matrix  $a_{i,j}$  is the probability that a cell arrives to input i which is destined to output j in a given time slot. Indeed, we assume that the number of cells (and their input output port assignment) arrives to the switch in consecutive time slots are independent identically distributed random variable. This way matrix A completely characterizes the traffic process. Since no more than one cell can arrive to a given input we have

$$\sum_{j=0}^{N-1} a_{i,j} < 1, \quad i = 0 \dots N - 1 \tag{1}$$

and we say that output j is not overloaded if

$$\sum_{i=0}^{N-1} a_{i,j} < 1.$$
 (2)

When (2) does not hold, the traffic load is referred to as inadmissible. Inadmissible traffic load results in infinite queue length and delay in case of infinite buffer switches, but it can be analyzed in the same way as admissible in case of finite buffer switches. In the section IV the simulation results for such kind of traffic will be presented. In following the amount of traffic sent to each input will not exceed the unity. On the contrary, some particular outputs will be overloaded.

### III. THE MATHEMATICAL MODEL

In this section we present the mathematical model used for cell loss probability analysis of N x N load-balanced switch ( $N \ge 2$ ) with finite buffers. As already mentioned, the following analysis is based on the *batch-Geo/D/1/K* queueing model [7], [8]. We consider the system as a time homogeneous discrete-time, discrete-state Markov chain. The model [8] is modified in correspondence to the load-balanced switch operation principles. We do not present any complete derivation of a queue length and unfinished work distributions. Figure 2 illustrates the architecture of the N x N load-balanced switch.



Fig. 2. N x N load-balanced switch with finite buffers

The goal of our analysis is to describe the occupancy of the virtual output queue (VOQ) in the central stage of the switch in time. In particular, we consider  $V_{0,0}$  for our study. Let  $vq_{i,j}(n)$  be the occupancy (the number of cells) in the VOQ from input port *i* to output port *j* at time slot *n*.  $vq_{i,j}(n)$  can change in every time slot, from 1 to *n*. As we are considering the case of  $N \ge N$  switch we describe the evolution of the occupancy function during N time slots, because  $V_{0,0}$  has a kind of periodic behavior with a N time slots period.

The evolution of vq(n) is characterized by the following equation:

$$vq_{0,0}(n+1) =$$

$$max[vq_{0,0}(n) + I_{(n \ mod \ N=0)}\alpha_{0,0} + I_{(n \ mod \ N=1)}\alpha_{1,0}$$

$$+ \dots + I_{(n \ mod \ N=N-1)}\alpha_{N-1,0} - I_{(n \ mod \ N=0)}, 0],$$
(3)

where  $\alpha_{i,j}$  is the binary random variable representing the number of cells arrived to input *i* and destined to output *j* in the given time slot, and *I* is the indicator operator with two possible values:

$$I_{condition} = \begin{cases} 1, & \text{if condition is true;} \\ 0, & \text{otherwise.} \end{cases}$$

(3) represents the following two main cases:

if  $n \mod N = 0$ , then

$$vq_{0,0}(n+1) = max[vq_{0,0}(n) + \alpha_{0,0} - 1, 0]$$
  
if  $n \mod N > 0$ , then

$$vq_{0,0}(n+1) = vq_{0,0}(n) + \alpha_{i,0}.$$

According to (3) an N time slots long interval of  $vq_{0,0}(n)$ , when  $n \mod N = 0$ , is characterized by

$$vq_{0,0}(n+N) = max[vq_{0,0}(n) + \sum_{i=0}^{N-1} \alpha_{i,0} - 1, 0]$$
 (4)

Indeed  $vq_{0,0}(n)$  is non-decreasing during the first N-1 time slots and non-increasing during the last time slot.

The process can have N possible transitions from state



Fig. 3. Possible transitions of the  $N \times N$  LB switch when B > i - 1 + N

to the others(state represents number of cells in the buffer). Transition state probabilities could be represented by the following notations:

$$p_i = Pr\left(\sum_{i=0}^{N-1} \alpha_{i,0} - 1 = i\right), \quad \forall i \in \{-1, 0, 1, \dots, N-1\},$$

where  $a_{i,0} = Pr(\alpha_{i,0} = 1)$ .

In case of inhomogeneous traffic matrix we have,  $p_{-1} = \prod (1-a_{i,0}),$ 

$$p_{0} = \frac{1}{1!} \sum_{i}^{i} \left( a_{i,0} \prod_{j,j\neq i} (1-a_{j,0}) \right) = p_{-1} \frac{1}{1!} \sum_{i} \frac{a_{i,0}}{1-a_{i,0}},$$

$$p_{1} = \frac{1}{2!} \sum_{i} \left( a_{i,0} \sum_{j,j\neq i} \left( a_{j,0} \prod_{k,k\neq i,j} (1-a_{k,0}) \right) \right)$$

$$= \frac{1}{2!} p_{-1} \sum_{i} \left( \frac{a_{i,0}}{1-a_{i,0}} \sum_{j,j\neq i} \frac{a_{j,0}}{1-a_{j,0}} \right),$$

$$p_{2} = \frac{1}{3!} p_{-1} \sum_{i} \left( \frac{a_{i,0}}{1-a_{i,0}} \sum_{j,j\neq i} \left( \frac{a_{j,0}}{1-a_{j,0}} \sum_{k,k\neq i,j} \frac{a_{k,0}}{1-a_{k,0}} \right) \right),$$

$$\dots$$

$$p_{N-1} = \frac{1}{2!} \prod_{i=0}^{n} a_{i,0},$$

 $p_{N-1} = \frac{1}{N!} \prod_{i} a_{i,0},$ 

where the limits of the summations and products are 0 and N-1.

In case of a homogeneous traffic matrix, i.e., when  $a_{i,0} = a \quad \forall i \in \{-1, 0, 1, \dots, N-1\}$ , these transition probabilities simplify to

$$p_{-1} = (1-a)^{N},$$

$$p_{0} = \binom{N}{1} a^{1} (1-a)^{N-1},$$

$$p_{1} = \binom{N}{2} a^{2} (1-a)^{N-2},$$
...
$$p_{i} = \binom{N}{i+1} a^{i+1} (1-a)^{N-i-1}.$$

(4) represents a DTMC, whose transition structure is depicted in Figure 3 (only for the case when B > i - 1 + N). Figure 3 does not take into account cell loss. On the contrary, we present the transition graph for N = 3 and B = 5, where cell loss appears in the final states (Figure 4).

The transition probability matrix of the process is presented in Figure 5 and composed of N+1 non-zero diagonals. As the finite buffer case is considered, the dimensions of the transition probability matrix  $((B+1) \times (B+1))$  are directly related to buffer size B. The last column of matrix P is composed of irregular elements as the structure of transitions is changing when the buffer gets full (Figure 4).



Fig. 4. The transition graph of the LB switch when N = 3 and B = 5.

$$P = \begin{pmatrix} p_0 + p_{-1} & p_1 & p_2 & \dots & p_{N-1} & 0 & \dots & 0 \\ p_{-1} & p_0 & p_1 & \dots & p_{N-2} & 0 & \dots & 0 \\ 0 & p_{-1} & p_0 & \dots & p_{N-3} & 0 & \dots & 0 \\ 0 & 0 & p_{-1} & \dots & p_{N-4} & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \dots & \vdots \\ 0 & 0 & \dots & 0 & \dots & p_{-1} & p_0 & \sum_{\substack{i=1\\N-1}}^{N-1} p_i \\ 0 & 0 & \dots & 0 & \dots & 0 & p_{-1} & \sum_{i=0}^{N-1} p_i \end{pmatrix}$$

Fig. 5. Transition probability matrix P

Let  $V_i$  denote the steady state probability that *i* cells are present in the VOQ and *V* the row vector composed by the stationary state probabilities, i.e.,  $V = \{V_0, V_1, \ldots, V_B\}$ . The stationary distribution of the DTMC can be obtained as the solution of the linear system ([6])

$$V P = V, VH = 1, (5)$$

where H is the column vector whose elements are equal to one.

In contrast with the case when N = 2, the state transition graph of the  $N \times N$  switch (when N > 2) does not exhibit a simple birth-death structure and, consequently, the stationary probability vector does not have a simple closed form solution.

In order to compute the loss probability we create column vector L, whose *i*th element,  $L_i$ , defines the mean number of lost cells in a cycle when the buffer occupancy is *i* at the beginning of the cycle. In state B the mean number of lost cells is  $\sum_{j=1}^{N-1} jp_j$ . In general,

$$L_{i} = \begin{cases} 0 & \text{if } i \leq B - N + 1\\ \sum_{j=1}^{i-B+N-1} jp_{B-i+j} & \text{if } i > B - N + 1 \end{cases}$$
(6)

From which the structure of vector L is as follows.

$$L = \begin{bmatrix} 0 \\ \vdots \\ p_{N-1} \\ 2p_{N-1} + p_{N-2} \\ 3p_{N-1} + 2p_{N-2} + p_{N-3} \\ \vdots \\ \sum_{j=1}^{N-1} jp_j \end{bmatrix},$$
(7)

Having vector V and vector L the loss probability is obtained as

$$P(loss) = V L. \tag{8}$$

The simulation and analytical results for  $N \ge N$  load-balanced switch are presented in section IV.

#### IV. COMPUTATIONAL STUDY AND SIMULATIONS

In this section, we present a computational study of a single-stage buffering, load-balanced switch of size  $N \ge 2$  under various finite buffer sizes  $B \ge N$  and traffic loads. In particular we consider both (1) admissible and (2) overloading inadmissible incoming traffic conditions. There are two objectives for this computational study. First, we aim to compare the accuracy of analysis and simulation results under the admissibility assumption. Second, we calculate by means of simulations the cell loss probabilities under inadmissible traffic.

### A. Numerical and simulation model

Next we define the arrival rate matrices A used for traffic scenarios as follows:

Uniform i.i.d:  $a_{i,j} = \rho/N$ , Unbalanced i.i.d:

$$a_{i,j} = \begin{cases} \rho(W + \frac{1-W}{N}), & \text{if } i = j, \\ \rho\frac{(1-W)}{N}, & \text{otherwise,} \end{cases}$$

where W is an unbalanced probability and  $\rho$  is a system load. When W = 0 the traffic is uniform, but in case when W = 1 the system will have "uniform" hotspot.

**Hotspot to single output:** The system has a hotspot to the output K ("many-to-one" hotspot) if

$$a_{i,j} = \begin{cases} \rho(W + \frac{1-W}{N}), & \text{if } j = K, \, i = \{0 \dots N - 1\}, \\ \rho\frac{(1-W)}{N}, & \text{otherwise.} \end{cases}$$

We also compared the behavior of the scheme under admissible uniform i.i.d, unbalanced i.i.d and "many-to-one" hotspot traffic matrices. Finally, we proceed with simulations for uniform inadmissible traffic (Figure 7).

#### B. Results and interpretations

In our initial experiments all the arrival rate matrix elements were set to  $\frac{\rho}{N}$  (Figures 6 and 7). First experiment (Figure 6) shows the intensive increase of the cell loss when switch size verge towards the buffer size. While examining our results we found out that the values of cell loss probability for admissible uniform i.i.d traffic and "many-to-one" hotspot model are



Fig. 6. Cell Loss versus Load ( $\rho = 0.4...0.99$ ) for N x N LB switch, buffer=70 cells, *admissible uniform i.i.d traffic*.

similar. Note, that our analysis is based on the cells occupancy of central stage VOQs with respect to some output, so it is obvious that in case of uniform i.i.d traffic and considered "many-to-one" hotspot model, the loading of corresponding column in the arrival rate matrix will be similar. However, the complete match in the results will be valid only for the case of admissible traffic matrix (for inadmissible traffic the loadings of specific columns of matrix A will be different).



Fig. 7. System loss versus Load for N x N LB switch, overall load range  $\rho = 0.9...1.99$ , Buffer = 70 cells, *inadmissible traffic*.

The simulations related to the  $N \ge N$  switch loss probability under inadmissible traffic model is depicted in Figure 7. We consider the load to be in range of  $0.9 \dots 1.99$ . As each stage of the switch obey non-work conserving policy, the amount of cells sent to the specific output (under overload) will remain in the central stage buffers during undefined period of time. In other words arrival service rate of the system is becoming greater than departure service rate (Figure 7 - doubling the system load will lead to 50% internal loss probability). As a result finite buffers will experience fast queue build up process and congestion inside.

If we assume that the switch is operating with variable length packets, preliminarily segmented (in the inputs) into fixed size cells, the drop of a single cell belonging to some



Fig. 8. Cell Loss versus Buffer Size, overall load 0.99, *admissible uniform i.i.d traffic*.

packet can lead to the drop of the whole packet. The reassembly unit will continuously exclude the remaining cells of the "broken" packet from the system degrading also the throughput. We suspect that in this case the packet loss probability can have a multiplicative effect related to cell loss probability. In the paper we assume that there is no feedback between the stages is implemented, so inputs do not have any information about occupancy of the central stage queues and traffic arriving to all the other inputs.

Figure 8 shows dependence of internal cell loss regarding to a buffer size under the maximum load. As it is expected, the loss probability is decreasing with increase of buffer size, and press towards zero (and throughput will reach 100% [2], [4]) in ideal infinity buffer size case.



Fig. 9. Cell Loss versus Switch Size, overall load 0.99, Buffer Size = 25 cells, *admissible uniform i.i.d traffic*; Analytical and simulation results.

Graph 9 shows dependence of central stage buffers cell loss probability versus size of the switch under unchanged buffer size. In this case LB switch presents behavior similar to input buffer switch under uniform traffic. As the switch size is increasing, the probability that additional cells will arrive to the specific queue during a time cycle is also increasing. This fact is important to take into account while adding new



Fig. 10. Cell Loss versus Load,  $16 \times 16$  LB switch, Buffer Size = 20 cells, *admissible unbalanced i.i.d traffic, for different W*.



Fig. 11. Buffer Size versus Switch Size with Loss Probability 1E - 12 and 1E - 09, for overall load of 0.99 (*admissible uniform i.i.d traffic*).

line cards to scale up system performance. While increasing operating performance one can also increase the internal cell loss probability if the appropriate buffering is not used.

For the final experiment (Figure 10), we choose uniform i.i.d (W = 0.2), unbalanced i.i.d (W = 0.5) and 'uniform' hotspot traffic models (W = 0.8) with overall load in range  $\rho = 0.4 \dots 0.99$  and fixed buffering of 20 cells. With increase of unbalanced probability W, the diagonal values of arrival rate matrix are considerably higher than all other elements, i.e., each input has different hotspot. This implies that each input is sending high portion of traffic to the specific output while transmitting negligible amount to all the others. However, even in this case, input traffic remains perfectly balanced between the outputs because of the round-robin cell spreading in the first stage crossbar. As a result, the loss probability results for "uniform" hotspot are appearing to be the smallest among other admissible traffic matrices (Figure 10). Each of the graphs was compared also with the simulation results, the match in the results was presented in Figure 9.

#### V. CONCLUSION

This paper introduces the new problem of the cell loss in the single-stage load-balancing switch with finite buffers. The short analysis and simulations show that load-balancing switch with finite buffers has always a non-zero probability (formula 8) of dropping cells due to internal congestion. Multiple cell drops considerably degrades overall system performance especially when cells are distributed among different line cards making it difficult to re-sequence the cells.

The results demonstrate that the loss probability can be small for admissible input traffic. However, as shown in Figure 11, when the admissible traffic is near full load, say 0.99, there can be significant loss if we would like to maintain the packet loss probability with similar values optical fiber transmission, say, 1E-12 and 1E-09. This may be significant for streaming media (primarily video) applications that are expected to occupy more than 90% of the Internet traffic and may be transmitted using the unreliable UDP (user datagram protocol). Consequently, in the case of multi-hop networks the traffic may become inadmissible, even for a short period of time, will require large number of buffers in order to maintain cell loss in an acceptable level. Moreover, if load-balancing switches are implemented in the all-optical domain, where buffers are costly and complex, and therefore, may be limited to a very small number the cell loss probability may become a major issue. This may become even worse when variable size packets are transmitted, since a single cell loss will cause a whole packet (of multiple cells) loss. The details of such traffic scenarios will be studied in future works.

#### REFERENCES

- C.S. Chang, D.S. Lee and Y.S. Jou, Load-Balanced Birkhoff-von Neumann Switches, IEEE HPSR'01, pp. 276 - 280, Dallas, May 2001.
- [2] C.S. Chang, D.S. Lee and Y.S. Jou, *Load-Balanced Birkhoff-von Neumann switches, Part I: One-Stage Buffering,* Computer Communications, Vol. 25, pp. 611-622, 2002.
- [3] C.S. Chang, D.S. Lee and C.M. Lien, Load-Balanced Birkhof-von Neumann switches, Part II: Multi-Stage Buffering, Computer Communications, Vol. 25, pp. 623-634, 2002.
- [4] I.Keslassy, S.T. Chuang, K.Yu, D. Miller, M. Horowitz, O. Solgaad and N. McKeown, *Scaling Internet Routers Using Optics*, ACM SIGCOMM'03, Karlsruhe, Germany, 2003.
- [5] I.Keslassy, *The Load-Balanced Router*, Ph.D. dissertation, Stanford University, Stanford, 2004.
- [6] Leonard Kleinrock, Queueing Systems: Volume 1 Theory Wiley, New York, 1975.
- [7] A. Gravey, J.-R. Louvion, and P. Boyer, On the Geo/D/1 and Geo/D/1/n queues, Performance Evaluation, vol. 11, pp. 117 - 125, 1990.
- [8] Hideaki Takagi, Queueing Analysis: Volume 3 Discrete Time Systems, North Holland, Amsterdam, 1993.
- [9] C.S. Chang, D. Lee, Y.J. Shih, Mailbox Switch: A Scalable Two-Stage Switch Architecture for Conflict Resolution of Ordered Packets, in Proceedings of IEEE INFOCOM'04, vol.3, pp. 1995 - 2006, Hong Kong, March 2004.
- [10] Y. Shen, S. Jiang, S.S. Panwar, H.J. Chao, Byte-Focal: A Practical Load-Balanced Switch IEEE HPSR'05, pp. 6 - 12, Hong Kong, May 2005.
- [11] Bill Lin and Isaac Keslassy, *The Concurrent Matching Switch Architecture*, IEEE INFOCOM'06, pp. 1 - 12, Barcelona, Spain, April 2006.
- [12] C. L. Yu, C.S. Chang, D.S. Lee, *CR Switch: A Load-Balanced Switch with Contention and Reservation*, IEEE INFOCOM'07, pp. 1361-1369, Anchorage, Alaska, May 2007.
- [13] C.-Y. Tu, C.-S. Chang, D.-S. Lee, C.-T. Chiu Design a Simple and High Performance Switch Using a Two-Stage Architecture, IEEE GLOBE-COM'05, vol. 2, pp. 6 - 11, St. Louis, USA, November 2005.