CCIE DC: Buffer to Buffer Credits in FC and Dedicated Vs Shared Mode, then finally FCoE Distances

Hi Guys!


Let's talk about a topic that, at least for me, used to be scary, but is much less intimidating now that I know how it all works :).

Buffer to Buffer Credits

So in FC, drops are obviously a huge issue, and one cause of drops is buffers running out on receive ports. To deal with this, FC created a concept called Buffer to Buffer Credits. These let two connected ports tell each other how much buffer space they have for receiving traffic. When a sender has used up all the credits it was granted, it must stop transmitting; each time the receiver empties a buffer, it sends back an R_RDY to say hey, let's keep transmitting :).

From Wikipedia:

"Each time a port transmits a frame that port's BB Credit is decremented by one; for each R RDY received, that port's BB Credit is incremented by one "

Buffer to buffer credits are affected by distance. The problem is that once traffic is on the wire, it takes a while to reach the other end. If we had only 1 buffer credit, we could only send one frame at a time, and would have to wait for an R_RDY back before we could transmit the next one! This would obviously hurt our SAN link performance.

So, buffer to buffer credits allow us to keep a certain amount of traffic on the wire that we know the other end can cope with: it has told us how many buffers it has available, and we keep track of how much we have sent, so we always know how much of its buffer space we are using up.
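To make the bookkeeping concrete, here is a toy Python model of one direction of the credit flow (the class and its names are purely my own illustration, not anything running on a switch):

    class FcPort:
        """Toy model of one direction of BB credit flow control."""
        def __init__(self, credits):
            self.bb_credit = credits   # credits granted by the remote port at link-up

        def send_frame(self):
            if self.bb_credit == 0:
                return False           # out of credits: must wait for an R_RDY
            self.bb_credit -= 1        # one receive buffer consumed at the far end
            return True

        def receive_r_rdy(self):
            self.bb_credit += 1        # the far end has freed a buffer

    port = FcPort(credits=3)
    print([port.send_frame() for _ in range(4)])   # [True, True, True, False]
    port.receive_r_rdy()
    print(port.send_frame())                       # True: one credit came back

That's really all there is to it: a counter that goes down per frame sent and up per R_RDY received.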

And that is why distance matters: the longer the fibre link, the more buffer to buffer credits we need.

The rough formula is:


BB_Credit = [port speed] x [round trip time] / [frame size]

Another, more practical, rule of thumb (at least as far as I'm concerned) is 2 buffer to buffer credits per km.
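If you want to sanity-check that rule of thumb, here is a quick back-of-the-envelope calculation in Python (the propagation speed of roughly 200,000 km/s in fibre and the 2148-byte full-size frame are my assumptions):

    import math

    FIBER_KM_PER_S = 200_000   # light in glass is roughly 2/3 of c (assumption)

    def bb_credits_needed(speed_gbps, distance_km, frame_bytes=2148):
        # Bits in flight over one round trip, divided by bits per full-size frame
        rtt_s = 2 * distance_km / FIBER_KM_PER_S
        bits_in_flight = speed_gbps * 1e9 * rtt_s
        return math.ceil(bits_in_flight / (frame_bytes * 8))

    print(bb_credits_needed(4, 10))   # 24 credits for 10 km at 4 Gbps: roughly 2 per km

So at 4 Gbps with full-size frames the 2-credits-per-km rule holds up nicely; smaller frames or faster links need more.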

OK, now let's look at how they fit in with our MDS.


So, you may know that internally the MDS allocates each set of ports to a port-group. This is very similar to most modern-day switches (though modern switches hide a lot of the buffer allocation magic).

Let's look at the show port-resources command, which gives us a fair bit of information to explain along the way:


MDS1# show port-resources module 2
Module 2
  Available dedicated buffers are 4656

 Port-Group 1
  Total bandwidth is 12.8 Gbps
  Total shared bandwidth is 12.8 Gbps
  Allocated dedicated bandwidth is 0.0 Gbps
  --------------------------------------------------------------------
  Interfaces in the Port-Group       B2B Credit  Bandwidth  Rate Mode
                                        Buffers     (Gbps)           
  --------------------------------------------------------------------
  fc2/1                                      16        4.0  shared   
  fc2/2                                      16        4.0  shared   
  fc2/3                                      16        4.0  shared   
  fc2/4                                      16        4.0  shared   
  fc2/5                                      16        4.0  shared   
  fc2/6                                      16        4.0  shared   
  fc2/7                                      16        4.0  shared   
  fc2/8                                      16        4.0  shared   


OK, let's look at the basic output first. Notice that each interface has the default allocation of 16 buffers. The exact number of buffers allocated varies by interface and by what mode you set the interface to; E ports tend to get a lot more buffers because they normally have a lot more traffic going over them.

So, when a port has, let's say, 16 buffers, that interface is GUARANTEED at least 16 buffers no matter what state the rest of the switch is in. Even if it's smashing away at a thousand gigabits per second, we guarantee those buffers to our interface.

This value is adjustable per port, as we will see later. As mentioned, the switch also changes it automatically depending on what kind of port you configure and what speed that port runs at, and it varies with the hardware of the module.

From the Cisco Documentation:
http://www.cisco.com/en/US/docs/switches/datacenter/mds9000/sw/5_0/configuration/guides/int/nxos/buffers.html
"
The receive BB_credit values depend on the module type and the port mode, as follows:

For 16-port switching modules and full rate ports, the default value is 16 for Fx mode and 255 for E or TE modes. The maximum value is 255 in all modes. This value can be changed as required.

For 32-port switching modules and host-optimized ports, the default value is 12 for Fx, E, and TE modes. These values cannot be changed.

For Generation 2 and Generation 3 switching modules, see the "Buffer Pools" section."



If we take a look at interface fc2/2, for example, which is configured as an F Port, we see the following:

MDS1# show int fc2/2 bbcredit
fc2/2 is up
    Transmit B2B Credit is 3
 
   Receive B2B Credit is 16
      16 receive B2B credit remaining
      3 transmit B2B credit remaining
 
     3 low priority transmit B2B credit remaining


The credits allocated to this interface are shown above. Notice the term transmit B2B credit: why is that so low? The reason is that this port is connected to an HBA, and the HBA on this particular server does not have much buffer space, so at login it told the switch "hey, I only have 3 B2B credits spare."


So the FC ports communicate this to each other, as you can see above :)


Let's look at how we can manually change the buffer to buffer credits on an interface:


MDS1(config-if)# switchport fcrxbbcredit 24 ?
  mode  Configure receive BB_credit for specific mode





Here you can see that we could just specify the number of BB credits, but we have some more interesting options available to us too!

MDS1(config-if)# switchport fcrxbbcredit 500 mode ?
  E   Configure receive BB_credit for E or TE mode
  Fx  Configure receive BB_credit for F or FL mode


We can pre-configure a port to say: OK, if it's in E mode, use these buffer credits; if it's in F mode, use these buffer credits.

MDS1(config-if)# switchport fcrxbbcredit ?
  <1-500>              Enter receive BB_credit
  default              Default receive BB_credit
  extended             Configure extended BB_credit for the port
  performance-buffers  Configure performance buffers for receive BB_credit




We can specify both performance and extended buffer credits too. Performance buffers are additional buffer credits on top of the already allocated ones, but they are only available on some line cards and modules:


MDS1(config-if)# switchport fcrxbbcredit performance-buffers ?
  <1-145>  Enter performance buffers for receive BB_credit
  default  Default performance buffers for receive BB_credit

MDS1(config-if)# switchport fcrxbbcredit performance-buffers 145 ?
  

MDS1(config-if)# switchport fcrxbbcredit performance-buffers 145
fc2/1: (error) requested config change not allowed
MDS1(config-if)# switchport fcrxbbcredit performance-buffers 1
fc2/1: (error) requested config change not allowed
As you can see from the above, this module won't let me allocate any performance buffers at all :( No performance buffers for me!

However, by enabling the feature:

feature fcrxbbcredit extended

I now have additional buffer to buffer credits I can allocate:

MDS1(config-if)# switchport fcrxbbcredit extended ?
  <256-4095>  Enter extended credit receive BB_credit

MDS1(config-if)# switchport fcrxbbcredit extended 256


So! This shows the basics of buffer to buffer credits, but a few questions remain.


First of all, what is this line of output at the top of our show port-resources?


MDS1# show port-resources module 2
Module 2
  Available dedicated buffers are 4656


So we have a pool of buffers that is available to all the ports; this is the "common unallocated buffer pool for BB_Credits", as per the very useful diagram from Cisco below:

[Diagram from the Cisco documentation: per-port allocated BB_credit buffers, the common unallocated buffer pool, performance buffers, and reserved internal buffers]
So, as the diagram shows, the buffers stack up: the allocated buffers per port, then the common unallocated buffer pool for BB_credits, then the performance buffers (which we only have IF our hardware module supports them), and finally the reserved internal buffers, which we as users cannot modify.

Now, let's look at a port where we configure the buffers manually:


MDS1(config-if)# show run int fc2/2

interface fc2/2
  switchport speed 4000
  switchport rate-mode dedicated
  switchport fcrxbbcredit 128


Notice that our pool of available dedicated buffers has shrunk:


MDS1(config-if)# show port-resources module 2
Module 2
  Available dedicated buffers are 4556


On the other hand, if we take a port out of service (or in this case, a whole bunch of ports):


MDS1(config)# int fc2/10 - 20
MDS1(config-if)# out-of-service
Putting an interface out-of-service will cause its shared resource configuration to revert to default
Do you wish to continue(y/n)? [n] y
MDS1(config-if)#


Suddenly our available buffer count increases:

MDS1(config-if)# show port-resources module 2
Module 2
  Available dedicated buffers are 4875



The last thing to worry about for buffer to buffer credits, in my opinion, is described succinctly in the Cisco documentation:


Enabling Buffer-to-Buffer Credit Recovery


Although the Fibre Channel standards require low bit error rates, bit errors do occur. Over time, the corruption of receiver-ready messages, known as R_RDY primitives, can lead to a loss of credits, which can eventually cause a link to stop transmitting in one direction. The Fibre Channel standards provide a feature for two attached ports to detect and correct this situation. This feature is called buffer-to-buffer credit recovery.

Buffer-to-buffer credit recovery functions as follows: the sender and the receiver agree to send checkpoint primitives to each other, starting from the time that the link comes up. The sender sends a checkpoint every time it has sent the specified number of frames, and the receiver sends a checkpoint every time it has sent the specified number of R_RDY primitives. If the receiver detects lost credits, it can retransmit them and restore the credit count on the sender.

The buffer-to-buffer credit recovery feature can be used on any nonarbitrated loop link. This feature is most useful on unreliable links, such as MANs or WANs, but can also help on shorter, high-loss links, such as a link with a faulty fiber connection. 
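Here is a toy sketch of the R_RDY half of that mechanism, just to picture it (BB_SC_N, the class, and the scenario are all my own illustration; real ports exchange BB_SC checkpoint primitives in both directions):

    BB_SC_N = 3
    INTERVAL = 2 ** BB_SC_N        # R_RDYs between checkpoints (agreed at link-up)

    class Sender:
        def __init__(self, credits):
            self.bb_credit = credits
            self.r_rdys_since_checkpoint = 0

        def on_r_rdy(self):
            self.bb_credit += 1
            self.r_rdys_since_checkpoint += 1

        def on_checkpoint(self):
            # The receiver sent INTERVAL R_RDYs since the last checkpoint;
            # any we never counted were corrupted, so claw those credits back.
            lost = INTERVAL - self.r_rdys_since_checkpoint
            self.bb_credit += lost
            self.r_rdys_since_checkpoint = 0

    s = Sender(credits=16)
    s.bb_credit -= INTERVAL           # pretend we transmitted 8 frames...
    for _ in range(INTERVAL - 2):     # ...but 2 of the returning R_RDYs got corrupted
        s.on_r_rdy()
    s.on_checkpoint()
    print(s.bb_credit)                # 16: the two lost credits are restored

Without the checkpoint, those two credits would be gone forever and the link would slowly choke.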

Configuring:

MDS1(config-if)# int fc2/2
MDS1(config-if)# switchport fcbbscn 


That should just about cover buffer to buffer credits. Next, let's look at interfaces and shared bandwidth.


Back to our favorite command, show port-resources:

MDS1(config-role)# show port-resource module 2
Module 2
  Available dedicated buffers are 4875

 Port-Group 1
  Total bandwidth is 12.8 Gbps

  Total shared bandwidth is 8.8 Gbps
  Allocated dedicated bandwidth is 4.0 Gbps
  --------------------------------------------------------------------
  Interfaces in the Port-Group       B2B Credit  Bandwidth  Rate Mode
                                        Buffers     (Gbps)           
  --------------------------------------------------------------------
  fc2/1                                      16        4.0  shared   
  fc2/2                                     128        4.0  dedicated
  fc2/3                                      16        4.0  shared   
  fc2/4                                      16        4.0  shared   
  fc2/5                                      16        4.0  shared   
  fc2/6                                      16        4.0  shared   
  fc2/7                                      16        4.0  shared   
  fc2/8                                      16        4.0  shared   
  fc2/9                                      16        4.0  shared  
  fc2/10 (out-of-service)
  fc2/11 (out-of-service)
  fc2/12 (out-of-service)


The lines we care about are the total/shared/dedicated bandwidth summary at the top and the Rate Mode column.

In this module I have 12.8 Gbps available to share amongst 12 ports. The total bandwidth shareable across all the ports is 8.8 Gbps, BECAUSE I have specifically allocated 4 Gbps of dedicated bandwidth to one port. But in order to get more dedicated ports, I would need to take some ports out of service. Let's explore that a bit more...

If I were to leave all ports at the default, the 12.8 Gbps would simply be divided up amongst the ports as a shared pool of bandwidth:



MDS1(config-if)# show port-resource module 2
Module 2
  Available dedicated buffers are 4888

 Port-Group 1
  Total bandwidth is 12.8 Gbps
  Total shared bandwidth is 12.8 Gbps
  Allocated dedicated bandwidth is 0.0 Gbps
  --------------------------------------------------------------------
  Interfaces in the Port-Group       B2B Credit  Bandwidth  Rate Mode
                                        Buffers     (Gbps)           
  --------------------------------------------------------------------
  fc2/1                                      16        4.0  shared   
  fc2/2                                      16        4.0  shared   
  fc2/3                                      16        4.0  shared   
  fc2/4                                      16        4.0  shared   
  fc2/5                                      16        4.0  shared   
  fc2/6                                      16        4.0  shared   
  fc2/7                                      16        4.0  shared   
  fc2/8                                      16        4.0  shared   
  fc2/9                                      16        4.0  shared   
  fc2/10                                     16        4.0  shared   
  fc2/11                                     16        4.0  shared   
  fc2/12                                     16        4.0  shared


So here you can see we have 12.8 Gbps and 12 ports. Now, it doesn't take a maths genius (maths was my worst subject at school, ha ha) to work out that 12.8 Gbps divided by 12 ports does NOT give us 4.0 Gbps per port: each port can run at up to 4 Gbps, but all of them share the 12.8 Gbps pool. We could take one of those interfaces and make it dedicated, like so:

MDS1(config)# int fc2/1
MDS1(config-if)# switchport rate-mode dedicated 

MDS1(config-if)# show port-resources module 2
Module 2
  Available dedicated buffers are 4885

 Port-Group 1
  Total bandwidth is 12.8 Gbps
  Total shared bandwidth is 8.8 Gbps
  Allocated dedicated bandwidth is 4.0 Gbps
  --------------------------------------------------------------------
  Interfaces in the Port-Group       B2B Credit  Bandwidth  Rate Mode
                                        Buffers     (Gbps)           
  --------------------------------------------------------------------
  fc2/1                                      16        4.0  dedicated
  fc2/2                                      16        4.0  shared   
  fc2/3                                      16        4.0  shared   
  fc2/4                                      16        4.0  shared   

- Output Omitted -

So now we have 8.8 Gbps of shared bandwidth free and 4 Gbps of dedicated bandwidth. But let's say I need another dedicated interface in this port group. I have 8.8 Gbps of shared bandwidth free, right? So can't I just carve another 4 Gbps out of that for my dedicated interface?

MDS1(config-if)# int fc2/2
MDS1(config-if)# switchport rate-mode dedicated
fc2/2: (error) Bandwidth not available


Huh? What's going on?

Check out this table below from Cisco:

[Table from the Cisco documentation: reserved bandwidth per shared port, by module type and port speed]
Every port has a particular amount of reserved bandwidth; in our case it's 0.8 Gbps per port. So let's do some maths: 0.8 x 11 = 8.8 (remember, we have 12 ports, one of which is dedicated, so the 8.8 Gbps of shared bandwidth is split between 11 ports, and every last bit of it is already reserved)!
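In other words, the shared pool is already fully spoken for. A quick sanity check in Python (the 0.8 Gbps figure comes from Cisco's table for this module; it varies by hardware):

    TOTAL_BW  = 12.8   # Gbps in the port-group
    DEDICATED = 4.0    # Gbps carved out for fc2/1
    RESERVED  = 0.8    # Gbps held back per shared port on this module

    shared_pool  = TOTAL_BW - DEDICATED    # 8.8 Gbps
    shared_ports = 12 - 1                  # 11 ports still in shared mode
    # Note the oversubscription too: 12 ports x 4.0 Gbps = 48 Gbps of
    # potential demand against a 12.8 Gbps pool, i.e. 3.75:1.
    print(shared_pool, round(shared_ports * RESERVED, 1))   # 8.8 vs 8.8: nothing spare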

So, if we want another dedicated interface, we need to increase that shared pool, how do we do it?

By taking ports out of service:
MDS1(config-if)# int fc2/12
MDS1(config-if)# out-of-service


MDS1(config-if)# show port-resources module 2
Module 2
  Available dedicated buffers are 4943

 Port-Group 1
  Total bandwidth is 12.8 Gbps
  Total shared bandwidth is 8.8 Gbps
  Allocated dedicated bandwidth is 4.0 Gbps
  --------------------------------------------------------------------
  Interfaces in the Port-Group       B2B Credit  Bandwidth  Rate Mode
                                        Buffers     (Gbps)           
  --------------------------------------------------------------------
  fc2/1                                      16        4.0  dedicated
  fc2/2                                      16        4.0  shared   
  fc2/3                                      16        4.0  shared   
  fc2/4                                      16        4.0  shared   
  fc2/5                                      16        4.0  shared   
  fc2/6                                      16        4.0  shared   
  fc2/7                                      16        4.0  shared   
  fc2/8                                      16        4.0  shared   
  fc2/9                                      16        4.0  shared   
  fc2/10                                     16        4.0  shared   
  fc2/11 (out-of-service)
  fc2/12 (out-of-service)



Now that we have taken those ports out of service, we have 8.8 Gbps to divide between not 11 ports but 9, giving us 0.97 Gbps each. But if we were to put fc2/2 into dedicated mode, that would leave 8.8 - 4 = 4.8 Gbps, which divided by 8 ports (since we would not be counting fc2/2 anymore) gives us 0.6 Gbps. That still does not meet our minimum reserved bandwidth, so we need to put more ports out of service.

4.8 / 6 ports gives us exactly the figure we are looking for: 0.8.
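You can wrap that arithmetic up in a little helper to see why two more ports have to go (my own throwaway function, with this module's 0.8 Gbps reservation assumed):

    def can_dedicate(shared_pool, shared_ports, wanted, reserved=0.8):
        # After carving `wanted` Gbps out of the pool, is there still
        # `reserved` Gbps left for every remaining shared port?
        return shared_pool - wanted >= (shared_ports - 1) * reserved

    print(can_dedicate(8.8, 9, 4.0))   # False: 4.8 left, but 8 ports need 6.4
    print(can_dedicate(8.8, 7, 4.0))   # True: 4.8 left, 6 ports need exactly 4.8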

So we take a few more ports out of service:
MDS1(config-if)# int fc2/2
MDS1(config-if)# switchport rate-mode dedicated
fc2/2: (error) Bandwidth not available
MDS1(config-if)# int fc2/9 - 10
MDS1(config-if)# out-of-service
Putting an interface out-of-service will cause its shared resource configuration to revert to default
Do you wish to continue(y/n)? [n] y
MDS1(config-if)# int fc2/2
MDS1(config-if)# switchport rate-mode dedicated 



Now we have the bandwidth to do it.

Obviously, all of these calculations change if, when you set up the dedicated rate-mode, you change the amount of bandwidth the dedicated interface is allowed, with something like:

MDS1(config-if)# switchport speed auto max 2000

Now, you CAN change this behaviour:

MDS1(config)# no rate-mode oversubscription-limit module 2 

This significantly reduces the amount of reserved bandwidth per port:



MDS1(config-if)# show port-resource module 2
Module 2
  Available dedicated buffers are 4879

 Port-Group 1
  Total bandwidth is 12.8 Gbps
  Total shared bandwidth is 0.8 Gbps
  Allocated dedicated bandwidth is 12.0 Gbps
  --------------------------------------------------------------------
  Interfaces in the Port-Group       B2B Credit  Bandwidth  Rate Mode
                                        Buffers     (Gbps)           
  --------------------------------------------------------------------
  fc2/1                                      16        4.0  dedicated
  fc2/2                                      16        4.0  dedicated
  fc2/3                                      16        4.0  dedicated
  fc2/4                                      16        4.0  shared   
  fc2/5                                      16        4.0  shared   
  fc2/6                                      16        4.0  shared   
  fc2/7                                      16        4.0  shared   
  fc2/8                                      16        4.0  shared   
  fc2/9                                      16        4.0  shared   
  fc2/10                                     16        4.0  shared   
  fc2/11                                     16        4.0  shared   
  fc2/12                                     16        4.0  shared   



As you can see, we have now allocated three interfaces as dedicated, which is the maximum we can do, because 4 x 3 = 12, leaving us with 0.8 Gbps shared between the other ports.


Alright! The last thing to cover is FCoE.

So, a quick review. When FCoE came out, they said: oh dear, we don't have a lossless mechanism in FCoE; we have nothing like R_RDY, and we certainly have nothing like buffer to buffer credits. Ethernet did have something called pause frames, which had been around for quite a while as standard flow control, so they said to themselves: let's work on the pause frames and create per-priority flow control, so that you can send pause frames for particular CoS values only! Champagne for everyone!

But they still had a problem, the same one faced by Fibre Channel: as the distance of a link increases, you can end up sending so much traffic that, before the other end can tell you "hey, my buffers are full, don't send me any more please!" (i.e. a pause frame), too much is already on the wire and some of it will have to be dropped. We have already covered how FC combats this: buffer to buffer credits track how much traffic the other end can handle, and the sender waits for R_RDYs. Unfortunately, no such mechanism exists in FCoE, so a somewhat different solution was needed. Check out the below from the Cisco documentation:
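The physics is the same as the BB credit distance problem. Here is a rough feel for the numbers (the propagation speed is my assumption, same as the earlier BB credit calculation):

    FIBER_KM_PER_S = 200_000   # light in glass is roughly 2/3 of c (assumption)

    def pause_headroom_bytes(speed_gbps, distance_km):
        # Worst case: our pause frame takes one trip to reach the sender, and
        # everything already on the wire takes another trip to drain, so we
        # need a full round trip's worth of buffer kept free.
        rtt_s = 2 * distance_km / FIBER_KM_PER_S
        return speed_gbps * 1e9 * rtt_s / 8

    print(pause_headroom_bytes(10, 3))   # ~37500 bytes in flight on a 10G, 3 km link

So the longer the link, the more headroom the no-drop buffer thresholds have to leave, which is exactly what Cisco's knobs below adjust.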

http://www.cisco.com/en/US/docs/switches/datacenter/nexus5000/sw/qos/513_n1_1/b_cisco_nexus_5000_qos_config_gd_513_n1_1_chapter_011.html

Configuring No-Drop Buffer Thresholds

Beginning with Cisco NX-OS Release 5.0(2)N1(1), you can configure the no-drop buffer threshold settings for 3000m lossless Ethernet.

Note: To achieve lossless Ethernet for both directions, the devices connected to the Cisco Nexus 5548 switch must have the similar capability. The default buffer and threshold value for the no-drop can ensure lossless Ethernet for up to 300 meters.



They very helpfully also show you the values needed to support the maximum distance, which is 3000 m (3 km):



switch(config)# policy-map type network-qos nqos_policy
switch(config-pmap-nq)# class type network-qos nqos_class
switch(config-pmap-nq-c)# pause no-drop buffer-size 152000 pause-threshold 103360 resume-threshold 83520
switch(config-pmap-nq-c)# exit
switch(config-pmap-nq)# exit
switch(config)# exit
switch#




So if we take the values they show there (152000, 103360, and 83520), we can, for example, divide them by 3 to get 1 km values.
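Something like this (the linear scaling is just my rule of thumb, not an official Cisco formula, so treat the output as a starting point):

    # Cisco's documented values for 3000 m lossless Ethernet (from the quote above)
    BUFFER, PAUSE, RESUME = 152000, 103360, 83520   # bytes

    distance_km = 1
    scale = distance_km / 3
    print(int(BUFFER * scale), int(PAUSE * scale), int(RESUME * scale))
    # -> 50666 34453 27840, roughly; round to sensible values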

This is all found in the NX-OS QoS Configuration Guide.

I hope this was interesting or helped someone out there.

