Let's talk about a topic that, at least for me, was intimidating at first, but now that I know how it all works is much less scary :).
Buffer to Buffer Credits
So in FC, drops are obviously a huge issue, and one of the causes of drops is buffers running out on receive ports. To prevent this, FC created a concept called Buffer to Buffer Credits. The two ends of a link tell each other how much buffer space they have for receiving traffic, and the sender keeps a running count. When all the available credits have been used, the sender pauses until the receiver empties a buffer and returns an R_RDY, which says hey, let's keep transmitting :).
From Wikipedia:
"Each time a port transmits a frame that port's BB Credit is decremented by one; for each R RDY received, that port's BB Credit is incremented by one "
Buffer to buffer credits are affected by distance. The problem is that while traffic is on the wire, it takes a while for the other end to receive it. If we had only 1 buffer credit, we would only be able to send one frame at a time and would have to wait for an R_RDY back before we could transmit the next one! This would obviously negatively impact our SAN link performance.
So, Buffer to Buffer credits allow us to keep a certain amount of traffic in flight that we know the other end will be able to cope with, because it has told us how many buffers it has available. We keep track of how much we have sent, so we always know how much of its buffer space is in use.
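To make that concrete, here is a minimal Python sketch of the credit accounting (purely my own toy model for illustration, not anything that runs on a switch):

# Toy model of buffer-to-buffer credit accounting (illustration only).
# The sender may only transmit while it holds credits; every R_RDY
# returned by the receiver restores exactly one credit.
class SenderPort:
    def __init__(self, bb_credit):
        self.bb_credit = bb_credit   # credits granted by the peer at login

    def can_transmit(self):
        return self.bb_credit > 0

    def transmit_frame(self):
        assert self.can_transmit(), "must wait for an R_RDY"
        self.bb_credit -= 1          # one receive buffer consumed at the far end

    def receive_r_rdy(self):
        self.bb_credit += 1          # the far end freed a buffer

port = SenderPort(bb_credit=3)
for _ in range(3):
    port.transmit_frame()            # uses up all 3 credits
print(port.can_transmit())           # False - we must pause
port.receive_r_rdy()                 # an R_RDY arrives
print(port.can_transmit())           # True - let's keep transmitting :)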
So, back to distance: the longer the fibre link, the more buffer to buffer credits we need.
The rough formula is:
BB_Credit = [port speed] x [round trip time] / [frame size]
Another formula, more practical at least as far as I'm concerned, is 2 buffer to buffer credits per km.
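If you want to sanity-check that, here is a quick back-of-the-envelope version in Python (a sketch under my own assumptions: roughly 5 microseconds of propagation per km of fibre and a full-size FC frame of about 2148 bytes; size real links according to your vendor's guidance):

# Rough BB_Credit estimate for a long-distance FC link (illustration only).
def bb_credits_needed(speed_gbps, distance_km, frame_bytes=2148):
    # Light in fibre covers roughly 200,000 km/s, i.e. ~5 us per km,
    # so the round trip time is about 10 us per km of link length.
    rtt_seconds = distance_km * 2 * 5e-6
    bits_in_flight = speed_gbps * 1e9 * rtt_seconds
    return bits_in_flight / (frame_bytes * 8)   # frames we must keep in flight

# A 4 Gbps link over 10 km needs roughly this many credits to stay full:
print(round(bb_credits_needed(4, 10)))          # 23 - close to the 2-per-km rule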
OK now let's look at how they fit in with our MDS
So you may know that the internal structure of the MDS basically allocates each set of ports to a port-group. This is very similar to most modern-day switches (but modern-day switches hide a lot of the buffer allocation magic).
Let's look at the port resource command and we can get a bit of information and explain along the way:
MDS1# show port-resources module 2
Module 2
Available dedicated buffers are 4656
Port-Group 1
Total bandwidth is 12.8 Gbps
Total shared bandwidth is 12.8 Gbps
Allocated dedicated bandwidth is 0.0 Gbps
--------------------------------------------------------------------
Interfaces in the Port-Group B2B Credit Bandwidth Rate Mode
Buffers (Gbps)
--------------------------------------------------------------------
fc2/1 16 4.0 shared
fc2/2 16 4.0 shared
fc2/3 16 4.0 shared
fc2/4 16 4.0 shared
fc2/5 16 4.0 shared
fc2/6 16 4.0 shared
fc2/7 16 4.0 shared
fc2/8 16 4.0 shared
OK, let's look at the basic output first. Notice that each interface has the default allocation of 16 buffers. The exact number of buffers allocated varies by interface and by what mode you set the interface to; E ports tend to get a lot more buffers because they normally have a lot more traffic going over them.
So, when we have a port with, let's say, 16 buffers, that means this interface is GUARANTEED at least 16 buffers, no matter what situation the rest of the switch is in. Even if it's smashing away doing a thousand gigabits per second, we guarantee those buffers to our interface.
This value is adjustable for each port, as we will see later, and as mentioned it's actually changed by the switch automatically depending on what kind of port you configure and what speed that port is. It will also change depending on the hardware of the module.
From the Cisco Documentation:
http://www.cisco.com/en/US/docs/switches/datacenter/mds9000/sw/5_0/configuration/guides/int/nxos/buffers.html
"
The receive BB_credit values depend on the module type and the port mode, as follows:
•For
16-port switching modules and full rate ports, the default value is 16
for Fx mode and 255 for E or TE modes. The maximum value is 255 in all
modes. This value can be changed as required.
•For
32-port switching modules and host-optimized ports, the default value
is 12 for Fx, E, and TE modes. These values cannot be changed.
•For Generation 2 and Generation 3 switching modules, see the "Buffer Pools" section."
If we take a look at interface fc2/2 for example, which is configured as an F Port, we see the following:
MDS1# show int fc2/2 bbcredit
fc2/2 is up
Transmit B2B Credit is 3
Receive B2B Credit is 16
16 receive B2B credit remaining
3 transmit B2B credit remaining
3 low priority transmit B2B credit remaining
The credits allocated to this interface are shown. As you can see, there is a term here, transmit B2B credit; why is that so low? The reason is that this port is connected to an HBA, and the HBA on this particular server does not have all that much buffer space, so it communicates a message saying hey, I only have 3 B2B credits spare.
So the FC ports communicate this with each other as you can see above :)
Let's look at how we can manually change the buffer to buffer credits on an interface:
MDS1(config-if)# switchport fcrxbbcredit 24 ?
mode Configure receive BB_credit for specific mode
Here you can see that we could just specify the amount of BB credits, but we have more interesting options available to us too!
MDS1(config-if)# switchport fcrxbbcredit 500 mode ?
E Configure receive BB_credit for E or TE mode
Fx Configure receive BB_credit for F or FL mode
We can pre-configure a port to say: OK, if it's in E mode, use these buffer credits; if it's in F mode, use these buffer credits.
MDS1(config-if)# switchport fcrxbbcredit ?
<1-500> Enter receive BB_credit
default Default receive BB_credit
extended Configure extended BB_credit for the port
performance-buffers Configure performance buffers for receive BB_credit
We can specify both performance and extended buffer credits too. Performance buffer credits are additional buffer credits on top of the already allocated buffer credits, but are only available on some line cards and modules:
MDS1(config-if)# switchport fcrxbbcredit performance-buffers ?
<1-145> Enter performance buffers for receive BB_credit
default Default performance buffers for receive BB_credit
MDS1(config-if)# switchport fcrxbbcredit performance-buffers 145 ?
MDS1(config-if)# switchport fcrxbbcredit performance-buffers 145
fc2/1: (error) requested config change not allowed
MDS1(config-if)# switchport fcrxbbcredit performance-buffers 1
fc2/1: (error) requested config change not allowed
As you can see from the above, I don't have a single performance buffer available on this module :( No performance buffers for me!
However, by enabling the feature:
feature fcrxbbcredit extended
I now have additional buffer to buffer credits I can allocate:
MDS1(config-if)# switchport fcrxbbcredit extended ?
<256-4095> Enter extended credit receive BB_credit
MDS1(config-if)# switchport fcrxbbcredit extended 256
So! This shows the basics of buffer to buffer credits, but a few questions remain.
First of all, what is this line of output at the top of our show port-resources?
MDS1# show port-resources module 2
Module 2
Available dedicated buffers are 4656
So we have a pool of buffers that is available to all the ports. This is the "common unallocated buffer pool for BB_Credits", as per a very useful diagram in the Cisco documentation.
That diagram shows how the buffers stack up: the allocated buffers per port, then the common unallocated buffer pool for BB_credits, then the performance buffers, which we have IF we have a hardware module that supports them, and finally the reserved internal buffers, which we as the users cannot modify.
Now, if we look at a port that we configure the buffer manually on:
MDS1(config-if)# show run int fc2/2
interface fc2/2
switchport speed 4000
switchport rate-mode dedicated
switchport fcrxbbcredit 128
Notice our "Shared buffer pool" has shrunk:
MDS1(config-if)# show port-resources module 2
Module 2
Available dedicated buffers are 4556
On the other hand, if we take a port out of service (or in this case, a whole bunch of ports)
MDS1(config)# int fc2/10 - 20
MDS1(config-if)# out-of-service
Putting an interface out-of-service will cause its shared resource configuration to revert to default
Do you wish to continue(y/n)? [n] y
MDS1(config-if)#
Suddenly our available buffers increase:
MDS1(config-if)# show port-resources module 2
Module 2
Available dedicated buffers are 4875
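You can think of the module as doing simple pool accounting along these lines (a toy sketch of the behaviour we just observed; the real numbers on a module won't match exactly, because the switch adjusts some allocations internally):

# Toy accounting for a module's common buffer pool (illustration only).
class ModuleBufferPool:
    def __init__(self, available, default_per_port=16):
        self.available = available     # the "Available dedicated buffers" figure
        self.default = default_per_port
        self.allocated = {}            # port -> credits currently held

    def set_credits(self, port, credits):
        # Raising a port's credits draws the difference from the pool.
        current = self.allocated.get(port, self.default)
        self.available -= credits - current
        self.allocated[port] = credits

    def out_of_service(self, port):
        # An out-of-service port returns its allocation to the pool.
        self.available += self.allocated.pop(port, self.default)

pool = ModuleBufferPool(available=4656)
pool.set_credits("fc2/2", 128)
print(pool.available)                  # 4544 - the pool shrank
pool.out_of_service("fc2/10")
print(pool.available)                  # 4560 - buffers came back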
The last thing to worry about, in my opinion, for buffer to buffer credits is described succinctly in the Cisco documentation:
Enabling Buffer-to-Buffer Credit Recovery
Although the Fibre Channel standards require low bit error rates, bit
errors do occur. Over time, the corruption of receiver-ready messages,
known as R_RDY primitives, can lead to a loss of credits, which can
eventually cause a link to stop transmitting in one direction. The Fibre
Channel standards provide a feature for two attached ports to detect
and correct this situation. This feature is called buffer-to-buffer
credit recovery.
Buffer-to-buffer credit recovery functions as follows: the sender and
the receiver agree to send checkpoint primitives to each other, starting
from the time that the link comes up. The sender sends a checkpoint
every time it has sent the specified number of frames, and the receiver
sends a checkpoint every time it has sent the specified number of R_RDY
primitives. If the receiver detects lost credits, it can retransmit them
and restore the credit count on the sender.
The buffer-to-buffer credit recovery feature can be used on any
nonarbitrated loop link. This feature is most useful on unreliable
links, such as MANs or WANs, but can also help on shorter, high-loss
links, such as a link with a faulty fiber connection.
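As a rough mental model of what recovery buys the sender, consider this sketch (heavily simplified and of my own making; the real feature uses BB_SC checkpoint primitives, and the checkpoint interval is negotiated when the link comes up):

# Simplified model of buffer-to-buffer credit recovery (illustration only).
BB_SC_INTERVAL = 8   # assume a checkpoint after every 8 R_RDYs

class CreditRecoveringSender:
    def __init__(self, bb_credit):
        self.bb_credit = bb_credit
        self.r_rdy_since_checkpoint = 0

    def receive_r_rdy(self):
        self.bb_credit += 1
        self.r_rdy_since_checkpoint += 1

    def receive_checkpoint(self):
        # The peer sent BB_SC_INTERVAL R_RDYs since the last checkpoint.
        # Any shortfall on our side means R_RDYs were corrupted in transit,
        # so we restore the lost credits instead of slowly starving the link.
        lost = BB_SC_INTERVAL - self.r_rdy_since_checkpoint
        self.bb_credit += lost
        self.r_rdy_since_checkpoint = 0

# For simplicity we skip the frame transmissions that spent these credits.
s = CreditRecoveringSender(bb_credit=16)
for _ in range(6):           # two of the peer's eight R_RDYs never arrived
    s.receive_r_rdy()
s.receive_checkpoint()
print(s.bb_credit)           # 24 - the same as if all eight had arrived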
Configuring:
MDS1(config-if)# int fc2/2
MDS1(config-if)# switchport fcbbscn
That should just about cover buffer to buffer credits; next, let's look at interfaces and shared bandwidth.
Back to our favorite command, show port-resources:
MDS1(config-role)# show port-resource module 2
Module 2
Available dedicated buffers are 4875
Port-Group 1
Total bandwidth is 12.8 Gbps
Total shared bandwidth is 8.8 Gbps
Allocated dedicated bandwidth is 4.0 Gbps
--------------------------------------------------------------------
Interfaces in the Port-Group B2B Credit Bandwidth Rate Mode
Buffers (Gbps)
--------------------------------------------------------------------
fc2/1 16 4.0 shared
fc2/2 128 4.0 dedicated
fc2/3 16 4.0 shared
fc2/4 16 4.0 shared
fc2/5 16 4.0 shared
fc2/6 16 4.0 shared
fc2/7 16 4.0 shared
fc2/8 16 4.0 shared
fc2/9 16 4.0 shared
fc2/10 (out-of-service)
fc2/11 (out-of-service)
fc2/12 (out-of-service)
The sections we care about are the bandwidth totals at the top of the output.
In this module I have 12.8 gig available to share amongst 12 ports. The total bandwidth that is shareable across all the ports is 8.8 gig, BECAUSE I have specifically allocated 4 gig of dedicated bandwidth to one port. In order to get more dedicated ports, I would need to take some ports out of service; let's explore that a bit more...
If I were to leave all ports at the default, the 12.8 gig would simply be divided up amongst the ports as a shared pool of bandwidth:
MDS1(config-if)# show port-resource module 2
Module 2
Available dedicated buffers are 4888
Port-Group 1
Total bandwidth is 12.8 Gbps
Total shared bandwidth is 12.8 Gbps
Allocated dedicated bandwidth is 0.0 Gbps
--------------------------------------------------------------------
Interfaces in the Port-Group B2B Credit Bandwidth Rate Mode
Buffers (Gbps)
--------------------------------------------------------------------
fc2/1 16 4.0 shared
fc2/2 16 4.0 shared
fc2/3 16 4.0 shared
fc2/4 16 4.0 shared
fc2/5 16 4.0 shared
fc2/6 16 4.0 shared
fc2/7 16 4.0 shared
fc2/8 16 4.0 shared
fc2/9 16 4.0 shared
fc2/10 16 4.0 shared
fc2/11 16 4.0 shared
fc2/12 16 4.0 shared
So here you can see we have 12.8 gig and we have 12 ports. Now, it doesn't take a maths genius (maths was my worst subject in school, ha ha) to work out that 12.8 gig divided by 12 ports does NOT give us 4.0 gig per port; every port can burst to 4 gig, but together they share, and oversubscribe, the 12.8 gig pool.
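In plain numbers (just quick arithmetic, nothing switch-specific):

# 12 ports contending for a 12.8 Gbps shared pool (illustration only).
ports, port_speed, pool_gbps = 12, 4.0, 12.8
print(round(pool_gbps / ports, 2))       # 1.07 Gbps each if all talk at once
print(ports * port_speed / pool_gbps)    # 3.75 - the oversubscription ratio

We could take one of those interfaces and make it dedicated like so: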
MDS1(config)# int fc2/1
MDS1(config-if)# switchport rate-mode dedicated
MDS1(config-if)# show port-resources module 2
Module 2
Available dedicated buffers are 4885
Port-Group 1
Total bandwidth is 12.8 Gbps
Total shared bandwidth is 8.8 Gbps
Allocated dedicated bandwidth is 4.0 Gbps
--------------------------------------------------------------------
Interfaces in the Port-Group B2B Credit Bandwidth Rate Mode
Buffers (Gbps)
--------------------------------------------------------------------
fc2/1 16 4.0 dedicated
fc2/2 16 4.0 shared
fc2/3 16 4.0 shared
fc2/4 16 4.0 shared
- Output Omitted -
So now we have 8.8 gig of shared bandwidth free and 4 gig of dedicated bandwidth. But let's say I need another dedicated interface in this port group. Well, I have 8.8 gig of shared bandwidth free, right, so can't I just carve out another 4 gig of that for my dedicated interface?
MDS1(config-if)# int fc2/2
MDS1(config-if)# switchport rate-mode dedicated
fc2/2: (error) Bandwidth not available
Huh? What's going on?
Check out the reserved-bandwidth table in the Cisco documentation:
Every port has a particular amount of reserved bandwidth; in our case it's 0.8 gig per port. So let's do some math: 0.8 x 11 = 8.8 (remember, we have 12 ports, one of which is dedicated, so the 8.8 gig of shared bandwidth is split between 11 ports). The per-port reservations alone consume the entire shared pool, so there is nothing left over to dedicate to another port.
So, if we want another dedicated interface, we need to increase that shared pool, how do we do it?
By taking ports out of service:
MDS1(config-if)# int fc2/12
MDS1(config-if)# out-of-service
MDS1(config-if)# show port-resources module 2
Module 2
Available dedicated buffers are 4943
Port-Group 1
Total bandwidth is 12.8 Gbps
Total shared bandwidth is 8.8 Gbps
Allocated dedicated bandwidth is 4.0 Gbps
--------------------------------------------------------------------
Interfaces in the Port-Group B2B Credit Bandwidth Rate Mode
Buffers (Gbps)
--------------------------------------------------------------------
fc2/1 16 4.0 dedicated
fc2/2 16 4.0 shared
fc2/3 16 4.0 shared
fc2/4 16 4.0 shared
fc2/5 16 4.0 shared
fc2/6 16 4.0 shared
fc2/7 16 4.0 shared
fc2/8 16 4.0 shared
fc2/9 16 4.0 shared
fc2/10 16 4.0 shared
fc2/11 (out-of-service)
fc2/12 (out-of-service)
Now that we have taken the ports out of service, we have 8.8 gig to divide between not 11 ports but 9 ports, giving us 0.97 each. But if we were to put fc2/2 into dedicated mode, that would be 8.8 - 4, leaving us with 4.8, which divided by 8 ports (since we would not be counting fc2/2 anymore) gives us 0.6, which still does not meet our minimum reserved bandwidth. So we need to put more ports out of service.
4.8 / 6 ports gives us the number we are looking for, 0.8, so two more ports have to go out of service.
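The check the switch is effectively doing can be sketched like this (my own reconstruction of the arithmetic in this example, not actual NX-OS logic; the 0.8 Gbps reservation is this module's figure):

# Can one more shared port become dedicated? (illustration only)
RESERVED_PER_PORT = 0.8   # minimum reserved bandwidth per shared port (Gbps)

def can_dedicate(shared_pool_gbps, shared_ports, port_speed=4.0):
    # Dedicating a port removes its full speed from the shared pool
    # and removes the port itself from the shared count.
    remaining_pool = shared_pool_gbps - port_speed
    remaining_ports = shared_ports - 1
    return remaining_pool / remaining_ports >= RESERVED_PER_PORT

print(can_dedicate(8.8, 9))   # False: 4.8 / 8 = 0.6, below the 0.8 minimum
print(can_dedicate(8.8, 7))   # True:  4.8 / 6 = 0.8, exactly the minimum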
So we take a few more ports out of service:
MDS1(config-if)# int fc2/2
MDS1(config-if)# switchport rate-mode dedicated
fc2/2: (error) Bandwidth not available
MDS1(config-if)# int fc2/9 - 10
MDS1(config-if)# out-of-service
Putting an interface out-of-service will cause its shared resource configuration to revert to default
Do you wish to continue(y/n)? [n] y
MDS1(config-if)# int fc2/2
MDS1(config-if)# switchport rate-mode dedicated
Now we have the bandwidth to do it.
Obviously, all of these calculations change if, when you set up the dedicated rate-mode, you change the amount of bandwidth the dedicated interface is allowed, with something like:
MDS1(config-if)# switchport speed auto max 2000
Now, you CAN change this behaviour
MDS1(config)# no rate-mode oversubscription-limit module 2
This significantly reduces the amount of reserved bandwidth per port:
MDS1(config-if)# show port-resource module 2
Module 2
Available dedicated buffers are 4879
Port-Group 1
Total bandwidth is 12.8 Gbps
Total shared bandwidth is 0.8 Gbps
Allocated dedicated bandwidth is 12.0 Gbps
--------------------------------------------------------------------
Interfaces in the Port-Group B2B Credit Bandwidth Rate Mode
Buffers (Gbps)
--------------------------------------------------------------------
fc2/1 16 4.0 dedicated
fc2/2 16 4.0 dedicated
fc2/3 16 4.0 dedicated
fc2/4 16 4.0 shared
fc2/5 16 4.0 shared
fc2/6 16 4.0 shared
fc2/7 16 4.0 shared
fc2/8 16 4.0 shared
fc2/9 16 4.0 shared
fc2/10 16 4.0 shared
fc2/11 16 4.0 shared
fc2/12 16 4.0 shared
As you can see, here we have now allocated three interfaces as dedicated, which is the maximum we can do because 4 x 3 = 12, leaving us with 0.8 of a gig shared between the other ports.
Alright! Last thing, is to do with FCoE
So, a quick review. When FCoE came out, they said oh dear, we don't have a lossless mechanism in FCoE; we don't have anything like R_RDY, and we certainly don't have anything like buffer to buffer credits. Ethernet did have something called pause frames, which had been around for quite a while, so they said to themselves: let's build on pause frames and create per-priority flow control, so that you can send pause frames for particular CoS values only! Champagne for everyone!
But they still had a problem, the same one faced by fibre channel: as the distance of a link increases, you can send so much traffic that before the other end can tell you "Hey, my buffers are full, don't send me any more please!" (i.e. a pause frame), you have already sent too much, and it will have to drop some. The way FC combats this we have already covered: buffer to buffer credits keep track of how much traffic the other end can handle, while we wait for the R_RDY. Unfortunately no such mechanism exists in FCoE, so they had to come up with a slightly different solution. Check out the Cisco documentation below:
http://www.cisco.com/en/US/docs/switches/datacenter/nexus5000/sw/qos/513_n1_1/b_cisco_nexus_5000_qos_config_gd_513_n1_1_chapter_011.html
Configuring No-Drop Buffer Thresholds
Beginning with Cisco NX-OS Release 5.0(2)N1(1), you can configure the no-drop buffer threshold settings for 3000m lossless Ethernet.
Note: To achieve lossless Ethernet for both directions, the devices connected to the Cisco Nexus 5548 switch must have the similar capability. The default buffer and threshold value for the no-drop can ensure lossless Ethernet for up to 300 meters.
They very helpfully also show you the values to support the maximum distance, which is 3000m (3 km):
switch(config)# policy-map type network-qos nqos_policy
switch(config-pmap-nq)# class type network-qos nqos_class
switch(config-pmap-nq-c)# pause no-drop buffer-size 152000 pause-threshold 103360 resume-threshold 83520
switch(config-pmap-nq-c)# exit
switch(config-pmap-nq)# exit
switch(config)# exit
switch#
So if we take the values they show there (152000, 103360, and 83520), we can, for example, divide them by 3 to get 1 km values.
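For example, scaling the documented 3000 m values down linearly (a sketch of that suggestion only; double-check the QoS configuration guide for your platform before configuring real values):

# Scale the documented 3000 m no-drop thresholds to a shorter link
# (simple linear scaling, illustration only).
DOC_3000M = {"buffer-size": 152000,
             "pause-threshold": 103360,
             "resume-threshold": 83520}

def scale_for_distance(metres):
    factor = metres / 3000.0
    return {name: round(value * factor) for name, value in DOC_3000M.items()}

print(scale_for_distance(1000))
# {'buffer-size': 50667, 'pause-threshold': 34453, 'resume-threshold': 27840}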
This is all found in the NX-OS QoS Configuration Guide.
I hope this was interesting or helped someone out there.