vPC, the gotchas you need to know

Hi Guys

Having spent a lot of time with customers working on vPC deployments, I have found quite a few vPC gotchas that I want to share with you now. There are plenty of guides out there on the internet, including from Cisco themselves, but I have found a lot of them to be dated, as improvements are constantly made to vPC.

This blog post addresses vPC considerations for the following version:

NX-OS 6.0(2) on Nexus 7000 Hardware


Now we have that out of the way :)

So, if you don't know what vPC is and have never even looked at the basics of how to configure it, this is not the blog post for you. This blog post assumes you have vPC enabled and are maybe experiencing strange behavior, or that you have been through the basics of vPC and are about to deploy but just want to know the gotchas.

Let's talk about one vPC design caveat, addressed very well by Brad Hedlund in his blog post.

Layer 3 Considerations

This particular vPC design caveat could end up causing you lots of grief if you are unaware of it.

To understand this caveat, you must understand the following rule:

vPC will not allow traffic that was RECEIVED over a vPC peer-link to be sent out a vPC member port.

This is a loop prevention method; keep that in the back of your head as you read this.


So, let's say you have two Nexus 7Ks. To make things really simple, say you have two VLANs: VLAN 99 is your server/router VLAN, and VLAN 100 is your user VLAN.

So, you have a router connected to the first Nexus; from a routing point of view it peers with both Nexus switches over VLAN 99.
Your router is not EtherChanneled to the Nexus, it's just connected via a normal access port.
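To make the topology concrete, here is a minimal sketch of the router-facing side of the first Nexus; the interface number and addressing are my own hypothetical values, chosen to match the IPs used later in this post:

feature interface-vlan
!
interface Ethernet1/10
 description Router uplink (normal access port, no port channel)
 switchport
 switchport mode access
 switchport access vlan 99
!
interface Vlan99
 ip address 99.1.1.1/24
! the second Nexus would have 99.1.1.2/24, and the router is 99.1.1.254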


You then have a server which is on a vPC port channel, vPC 1, which has a configuration like so:

int po1
 switchport access vlan 100
 switchport mode access
!
Pretty simple config, but it will do for what we are trying to show. It is connected to both Nexus switches.

Now, for some reason your router, even though it is physically connected to the primary Nexus, decides to use the secondary Nexus as the next hop for the VLAN 100 subnet. Maybe something happened with the routing protocol on the first Nexus, or it was simply misconfigured from the start. Whatever the case, you have now broken the golden rule for vPC loop prevention I mentioned above.

Think about it: you have a router (let's say 99.1.1.254) trying to get to a server (let's say 10.1.1.2), but its next hop is the Nexus connected OVER the vPC peer-link, so the second Nexus would need to route the traffic down a vPC MEMBER PORT.


The traffic will be dropped by the loop prevention technology.


There are several solutions to this, most of which are well addressed in Brad Hedlund's document:
  • Create a VLAN for the router and the two Nexus switches to establish their peering relationship on, and make sure that VLAN is not trunked to any vPC member ports
  • Create an entirely separate link between the two Nexus switches to carry the Layer 3 traffic (see the sketch below)
  • Run the router into both chassis and use Layer 3 ports
Lots of options. But if you ever have problems and the routing is not working, go back to that golden rule: am I coming in over the vPC peer-link and then trying to go out a vPC member port?
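As an example of the second option, here is a rough sketch. I am assuming a spare port (Eth1/48 in this hypothetical example) connecting the two chassis directly, and a made-up routing-only VLAN 98; the important part is that this VLAN stays off the vPC peer-link and off every vPC member port:

vlan 98
 name L3-BACKUP-ROUTING
!
interface Ethernet1/48
 description Dedicated L3 link to the other N7K (NOT the vPC peer-link)
 switchport
 switchport mode trunk
 switchport trunk allowed vlan 98
!
interface Vlan98
 ip address 98.1.1.1/24
! 98.1.1.2/24 on the peer; run your IGP adjacency over this VLAN instead

With this in place, even if the router picks the "wrong" Nexus as its next hop, the traffic between the two chassis rides a normal link rather than the peer-link, so the loop prevention rule never kicks in.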

The next Layer 3 caveat is an odd one, but worth talking about. Apparently some SANs out there from EMC and NetApp implement something they call "fast routing", which basically means that whenever they receive a packet from an IP address, they store the MAC address and IP address combination in their ARP table, so by the end of it their ARP table would look something like this:

9.1.1.1 aaaa.bbbb.cccc
9.2.2.2 aaaa.bbbb.cccc
3.3.3.3 aaaa.bbbb.cccc

Where aaaa.bbbb.cccc is the MAC address of their default gateway. The idea behind this is that the SAN does not have to perform a route lookup/ARP request, which should save it some time. In my humble opinion it would shave maybe a fraction of a millisecond on most modern SAN CPUs, and in return it horribly breaks the RFC. (Is it acceptable as part of the RFC? Am I dead wrong? It would not be the first time; leave a reply below or ping me on Twitter @ccierants.)

Anyway, regardless of the merits, this causes problems for the Nexus when used in combination with VRRP. The problem is that with VRRP, the default gateway has a VRRP-defined MAC, but the reply that comes back to the NetApp will actually be sourced from the burnt-in MAC address of whichever Nexus routed it. This can cause problems! Because now when the NetApp does its lookup in its ARP table, it will send the traffic to that burnt-in MAC; if for some reason this is the non-active neighbor (the non-VRRP-master) and the frame is destined for a vPC member port... guess what, we just broke the golden rule again.

So in order to fix this, Cisco implemented the peer-gateway command. Peer-gateway tells each Nexus 7K to locally route frames destined to its vPC peer's MAC address, rather than forwarding them over the peer-link. Easy peasy!

Here is how to configure it. I can't see a single downside to configuring peer-gateway, so I recommend you always turn this on :)


Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# peer-gateway

Easy :)


OK, on to a few more caveats.

Making changes to your vPCs
This is not strictly an issue with the version of NX-OS we are running in our example, as the feature that stops this causing problems is turned on by default; however, it is included here in case someone turned it off :)

Let's say you had a simple vPC that looked like this on both switches:

int po1
 mtu 9216
 switchport mode access 
 switchport access vlan 50
!

Simple, easy. But say for some reason you want to change the MTU: this would be considered a type 1 mismatch, and as soon as you changed it on one side, the vPC would be brought down across BOTH Nexus 7Ks!!!

"What the hell just happened? I was careful and I only changed one port, now my server has gone offline, since it was etherchannel'd I should have been fine!" < - this is what you would have been saying to yourself prior to NX-OS 5.2, as a feature called "Graceful consistency check" did not exist, to see if you have graceful consistency check enabled:


Nexus# show vpc
Legend:
                (*) - local vPC is down, forwarding via vPC peer-link

vPC domain id                     : 1  
Peer status                       : peer adjacency formed ok     

... ...
Graceful Consistency Check        : Enabled

Auto-recovery status              : Enabled (timeout = 240 seconds)



If this is not showing as Enabled... trust me, enable it:


Nexus# conf t
Enter configuration commands, one per line.  End with CNTL/Z.

Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# graceful consistency-check

OK excellent let's keep going :)

The next thing to talk about quickly is the difference between the peer-link and the peer-keepalive link.

The peer-link is an important part of the vPC puzzle; the peer-keepalive link is actually not so important. You could actually unplug the peer-keepalive link and your vPC peers would continue to function quite happily. You would get messages that the peer keepalive had failed, but you would be able to continue working. In previous NX-OS releases you would have been unable to make configuration changes while it was down, but this is not the case anymore.
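For reference, here is roughly what the two links look like in config, with the keepalive running over mgmt0 in the management VRF and made-up addressing:

vpc domain 1
 peer-keepalive destination 192.168.0.2 source 192.168.0.1 vrf management
!
interface port-channel10
 switchport
 switchport mode trunk
 vpc peer-link

Two totally separate paths: the peer-link carries data-plane traffic and the CFS control traffic between the peers, while the keepalive is just a little heartbeat, which is why losing the keepalive alone is not a big deal.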

What the peer-keepalive does do, however, is prevent a split-brain scenario in the event your peer-link fails. If your peer-link dies but the peer chassis itself remains up, you will get a message like so:

 Nexus %ETHPORT-3-IF_ERROR_VLANS_SUSPENDED: VLANs 110, 99,on Interface p
ort-channel10 are being suspended. (Reason: vPC peer is not reachable over cfs
)


This is to prevent loops: any vPC member ports are shut down on the secondary vPC peer.

OK, next it is time to talk about the auto-recovery command. This is NOT SET BY DEFAULT in this NX-OS release, although I would argue strongly that it should be.

Let's say for some reason you are in a situation where both your Nexus switches have been turned off, and you can only bring back one of them. Maybe you had a power outage and only have enough power for one (a UPS on a particular feed has died), or maybe a power spike blew up one chassis and you're waiting for Cisco to deliver the spares. Whatever the situation may be, if the end result is that you are turning on one chassis but not the other, you need the auto-recovery command. This command is NOT relevant if you had two Nexus switches up and the power failed to one of them: if you restored the power to that Nexus, the two would see each other and restore their relationship, and while one of them was offline, the vPC would have kept working.

By default, if a Nexus boots up with vPC configuration and vPC port channels configured but cannot see its vPC peer, it will not bring the vPC port channels up!

Auto-recovery tells the Nexus, upon bootup, to wait a certain amount of time before deciding that hey, the other Nexus involved in my vPC is not coming back any time soon, he is on an extended lunch break or something, so let's get those vPCs up so we can start forwarding traffic.

Here is how to turn it on:

Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# auto-recovery
Warning:
 Enables restoring of vPCs in a peer-detached state after reload, will wait for 240 seconds to determine if peer is un-reachable

 

As per the warning, the default time to wait before bringing up the vPCs when you can't see a peer is 240 seconds. This timer can be adjusted as a parameter to the auto-recovery command, as shown below.
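For example (the 300 seconds here is just an illustration; the configurable range is 240 to 3600 seconds):

Nexus(config)# vpc domain 1
Nexus(config-vpc-domain)# auto-recovery reload-delay 300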


I will now bring you to one final caveat:

Misconfigured port-channel on the end device.

So you're probably used to the fact that if you enable two interfaces for a port channel using LACP and the other end doesn't have port-channel turned on, or there is some other problem, it's no worries, right? LACP will just place the port(s) into standalone mode and spanning tree will choose an active path.

Unfortunately with the Nexus there is no such thing as standalone: a port is either part of the vPC or it will be suspended, as the following output shows:



Nexus# show port-channel sum
Flags:  D - Down        P - Up in port-channel (members)
        I - Individual  H - Hot-standby (LACP only)
        s - Suspended   r - Module-removed
        S - Switched    R - Routed
        U - Up (port-channel)
        M - Not in use. Min-links not met
--------------------------------------------------------------------------------
Group Port-       Type     Protocol  Member Ports
      Channel
--------------------------------------------------------------------------------
6     Po6(SD)     Eth      LACP      Eth1/1(s)  


Nexus# show int eth1/1
Ethernet1/1 is down (suspended(no LACP PDUs))

Easy to fix:

Nexus(config)# int po6
Nexus(config-if)# lacp ?
  max-bundle            Configure the port-channel max-bundle
  min-links             Configure the port-channel min-links
  suspend-individual    Configure lacp port-channel state. Disabling this will
                        cause lacp to put the port to individual state and not
                        suspend the port in case it does not get LACP BPDU from
                        the peer ports in the port-channel 


so if we enter:


Nexus(config-if)# shut
Nexus(config-if)# no lacp suspend-individual
Warning: !! Disable lacp suspend-individual only on port-channel with edge ports. Disabling this on network port port-channel could lead to loops.! 

Nexus(config-if)# no shut

As per the warning guys, this could cause you HUGE problems if you enable it on a port channel that is part of a vPC, so I would only use no lacp suspend-individual on edge port channels that are not part of a vPC port channel. (In which case, why are you port-channeling at all? Why not just fix the fact that the other end is not doing port-channel, or simply remove the port-channel config from the Nexus?)

I hope this helps someone out there!






ACL Capture feature

Hi Guys, I found this very interesting as a feature and thought I would quickly share it on my blog. I haven't labbed it yet, so no word on how well it works, but the description below (and a rough config sketch at the end) should give you the idea :)


 

ACL Capture

You can configure ACL capture in order to selectively monitor traffic on an interface or VLAN.
When you enable the capture option for an ACL rule, packets that match this rule are either forwarded or dropped based on the specified permit or deny action and may also be copied to an alternate destination port for further analysis.
An ACL rule with the capture option can be applied as follows:
  • On a VLAN
  • In the ingress direction on all interfaces
  • In the egress direction on all Layer 3 interfaces
ACL capture can be used in a variety of scenarios. For example, ACL capture can use ACL rules to identify packets belonging to a tunnel and to send a copy (or capture) of the tunnel packets to a specific destination. ACL capture can also be used to monitor all HTTP traffic on a particular VLAN.
Finally, you can also configure the capture session for the whole ACL rather than configuring it per ACL rule. This configuration applies the capture session to all of the ACL rules.
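I haven't labbed this myself yet, but going off the NX-OS 6.x security configuration guide, the configuration should look something like the sketch below (the session number, interfaces and ACL name are all made up); treat it as a starting point rather than a verified recipe:

hardware access-list capture
!
monitor session 1 type acl-capture
 destination interface ethernet 2/1
 no shut
!
ip access-list CAPTURE-HTTP
 permit tcp any any eq 80 capture session 1
 permit ip any any
!
interface Ethernet1/1
 no switchport
 ip access-group CAPTURE-HTTP in

The first command enables the capture feature in hardware, the monitor session defines where the copied packets get sent, and the capture session keyword on the ACE marks which matches get copied there.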




CCIE DC: Yet more information

Hi Guys

So, yet more information on the CCIE DC, as per my @ccierants Twitter (please follow! I have like 5 followers, I need more :p)

  • Rack Rentals for all this equipment straight from Cisco should be available by September
  • NO VMWARE QUESTIONS: all the VMware stuff will be taken care of by Cisco, and you will not be expected to know anything about VMware. Think of it like when Cisco do the DNS server for you in the CCIE Routing and Switching, or the certificate authority in the CCIE Security
  • Written beta May/June
  • NEW SPECIALIST EXAMS FOR UCS DO NOT REQUIRE YOU TO HAVE A VCP ANYMORE yay yay
  • The labs will be hosted in any location that has CCIE Storage, so for me that means Sydney yay
I hope this helps guys!

CCIE DC: More information!

Hi Guys

I am at Cisco Live Melbourne, as per my previous blog posts, and have found out more about the CCIE Data Center for you. You have probably already heard that the first written exam will be available in May, with the lab to follow in September.

I have also now confirmed the following:
  • For the short term (1 to 2 years) this will NOT replace the SAN CCIE. Repeat: the SAN CCIE will not be replaced
  • There will not be a HUGE emphasis on fibre channel, because as mentioned, this is NOT the SAN CCIE
  • Cisco understand the rack rental issue and that it will cause major problems for quite a few people; however, they have committed to having rack rentals available for this (presumably through the Cisco 360 program). (I was thinking about this too: if I were IPexpert or another vendor of training materials, I would be looking at using VDCs to my advantage in terms of offering multiple racks)
  • CCNP DC and CCNA DC will be following soon
  • Troubleshooting will be a big part of this exam (as can be deduced from the blueprint). I couldn't get a commitment on whether it will have troubleshooting tickets like the CCIE R&S, but the implication was that yes, it would. 
I have been promised a date for the lab rentals by Friday when we do the breakout session for the CCIE DC, so I will try and keep you all informed :)

How to copy files using SCP on Cisco Nexus OS

Hi Guys

Just a quick follow-up to one of my favorite blog posts, about how to copy files onto Cisco devices using SCP; this can also be done on NX-OS.

You might want to check out the following blog post for a quick review of how to copy using SCP:
http://www.ccierants.com/2011/06/great-way-to-copy-files-on-cisco.html

To enable this exact same functionality on a Nexus:

Nexus(config)#feature scp-server

Done! You can now copy using SCP :)
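Then, from your workstation, assuming the Nexus has a management IP of 10.1.1.1 and a user called admin (both made up for this example), copying an image on or a file off is as simple as:

$ scp n7000-s1-dk9.6.0.2.bin admin@10.1.1.1:n7000-s1-dk9.6.0.2.bin
$ scp admin@10.1.1.1:somefile.cfg .

Files land in (and are served from) bootflash: by default.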

I hope this helps someone out there
 

UPDATE: Cisco CCIE DC availability dates

Awesome news Guys

CCIE DC Written will be available in May!

Lab exam will be available in September 2012 (Right near my birthday, early birthday present maybe? ;))

Beta Test Availability

The beta version of the CCIE Data Center Written Exam v3.0 (351-080) will be available for scheduling and testing at all worldwide Cisco-authorized Pearson VUE testing centers beginning May 1 through June 15, 2012. The beta test will also be offered during Cisco Live San Diego event from June 10-14, 2012.

Candidates may schedule and take the exam on the same day.  The beta exam will be offered at a discounted price of US$50, with full recertification or lab qualification credit granted to all passing candidates.

Candidates preparing for this exam should refer to CCIE Data Center Exam Topics on the Cisco Learning Network for a detailed outline of the topics covered



CCIE Data Center at Cisco Live Melbourne

Hi Guys

So you may have heard that Cisco have announced the new CCIE Data Center at Cisco Live Melbourne:

https://learningnetwork.cisco.com/community/certifications/ccie_data_center

Finally, a more detailed blueprint is now available (reproduced at the end of this post). I am on site at Cisco Live Melbourne and will attempt to get as much information for you as I can. So far I have the following questions in mind; if you have any good ideas for questions, please post below.


1. How far away is the CCIE DC? When can we expect the written/lab exams to be available?
Answered: the lab will be available in September; the written exam (beta) in May.

2. Given the very large requirements for a lab with the equipment listed on the CCIE DC, what plans do Cisco have to make it possible to access labs of this equipment?
3. Will the Cisco 360 program be available for CCIE DC?
4. Do Cisco expect the level of difficulty in regard to SAN to be the same as the actual CCIE SAN, i.e. is there as much emphasis on FC?

In terms of my study plans, I have ordered a Cisco UCS C200 to use for the Nexus 1000v work.

I want a Fibre Channel switch, but I have no idea how to get a very cheap Fibre Channel SAN of some description. Does anyone have any ideas?

I will keep you all informed of how I intend to study for my CCIE DC!

Let the fun begin!


Hardware Blueprint:
Cisco Catalyst Switch 3750
Cisco 2511 Terminal Server
MDS 9222i
Nexus 7009
- (1) Sup
- (1) 32 Port 10Gb (F1 Module)
- (1) 32 Port 10Gb (M1 Module)
Nexus 5548
Nexus 2232
Nexus 1000v
UCS C200 Series Server
– vic card for c-series
UCS-6248 Fabric Interconnects
UCS-5108 Blade Chassis
– B200 M2 Blade Servers
– Palo mezzanine card
– Emulex mezzanine card
Cisco Application Control Engine Appliance – ACE4710
Dual attached JBODs
Software Versions
NXOS v6.0(2) on Nexus 7000 Switches
NXOS v5.1(3) on Nexus 5000 Switches
NXOS v4.2(1) on Nexus 1000v
NXOS v5.2(2) on MDS 9222i Switches
UCS Software release 2.0(1x) for UCS-6248 Fabric Interconnect
Software Release A5(1.0) for ACE 4710
Cisco Data Center Manager software v5.2(2)
Lab Blueprint:
Cisco Data Center Infrastructure – NXOS
Implement NXOS L2 functionality
Implement VLANs and PVLANs
Implement Spanning-Tree Protocols
Implement Port-Channels
Implement Unidirectional Link Detection (UDLD)
Implement Fabric Extension via the Nexus family
Implement NXOS L3 functionality
Implement Basic EIGRP in Data Center Environment
Implement Basic OSPF in Data Center Environment
Implement BFD for Dynamic Routing protocols
Implement ECMP
Implement FabricPath
Implement Basic NXOS Security Features
Implement AAA Services
Implement SNMPv3
Configure IP ACLs, MAC ACLs and VLAN ACLs
Configure Port Security
Configure DHCP Snooping
Configure Dynamic ARP Inspection
Configure IP Source Guard
Configure Cisco TrustSec
Implement NXOS High Availability Features
Implement First-Hop Routing Protocols
Implement Graceful Restart
Implement nonstop forwarding
Implement Port-channels
Implement vPC and VPC+
Implement Overlay Transport Protocol (OTV)
Implement NXOS Management
Implement SPAN and ERSPAN
Implement NetFlow
Implement Smart Call Home
Manage System Files
Implement NTP, PTP
Configure and Verify DCNM Functionality
NXOS Troubleshooting
Utilize SPAN, ERSPAN and EthAnalyzer to troubleshoot a Cisco Nexus problem
Utilize NetFlow to troubleshoot a Cisco Nexus problem
Given an OTV problem, identify the problem and potential fix
Given a VDC problem, identify the problem and potential fix
Given a vPC problem, identify the problem and potential fix
Given a Layer 2 problem, identify the problem and potential fix
Given a Layer 3 problem, identify the problem and potential fix
Given a multicast problem, identify the problem and potential fix
Given a FabricPath problem, identify the problem and potential fix
Given a Unified Fabric problem, identify the problem and potential fix
Cisco Storage Networking
Implement Fiber Channel Protocols Features
Implement Port Channel, ISL and Trunking
Implement VSANs
Implement Basic and Enhanced Zoning
Implement FC Domain Parameters
Implement Fiber Channel Security Features
Implement Proper Oversubscription in an FC environment
Implement IP Storage Based Solution
Implement IP Features including high availability
Implement iSCSI including advanced features
Implement SAN Extension tuner
Implement FCIP and Security Features
Implement iSCSI security features
Validate proper configuration of IP Storage based solutions
Implement NXOS Unified Fabric Features
Implement basic FC in NXOS environment
Implement Fiber channel over Ethernet (FCoE)
Implement NPV and NPIV features
Implement Unified Fabric Switch different modes of operation
Implement QoS Features
Implement FCoE NPV features
Implement multihop FCoE
Validate Configurations and Troubleshoot problems and failures using Command Line, show and debug commands.
Cisco Data Center Virtualization
Manage Data Center Virtualization with Nexus1000v
Implement QoS, Traffic Flow and IGMP Snooping
Implement Network monitoring on Nexus 1000v
Implement n1kv portchannels
Troubleshoot Nexus 1000V in a virtual environment
Configure VLANs
Configure PortProfiles
Implement Nexus1000v Security Features
DHCP Snooping
Dynamic ARP Inspection
IP Source Guard
Port Security
Access Control Lists
Private VLANs
Configuring Private VLANs
Cisco Unified Computing
Implement LAN Connectivity in a Unified Computing Environment
Configure different Port types
Implement Ethernet end Host Mode
Implement VLANs and Port Channels.
Implement Pinning and PIN Groups
Implement Disjoint Layer 2
Implement SAN Connectivity in a Unified Computing Environment
Implement FC ports for SAN Connectivity
Implement VSANs
Implement FC Port Channels
Implement FC Trunking and SAN pinning
Implement Unified Computing Server Resources
Create and Implement Service Profiles
Create and Implement Policies
Create and Implement Server Resource Pools
Implement Updating and Initial Templates
Implement Boot From remote storage
Implement Fabric Failover
Implement UCS Management tasks
Implement Unified Computing Management Hierarchy using ORG and RBAC
Configure RBAC Groups
Configure Remote RBAC Configuration
Configure Roles and Privileges
Create and Configure Users
Implement Backup and restore procedures in a unified computing environment
Implement system wide policies
Unified Computing Troubleshooting and Maintenance
Manage High Availability in a Unified Computing environment
Configure Monitoring and analysis of system events
Implement External Management Protocols
Collect Statistical Information
Firmware management
Collect TAC specific information
Implement Server recovery tasks
Cisco Application Networking Services – ANS
Implement Data Center application high availability and load balancing
Implement standard ACE features for load balancing
Configuring Server Load Balancing Algorithm
Configure different SLB deployment modes
Implement Health Monitoring
Configure Sticky Connections
Implement Server load balancing in HA mode



Boot from SAN iSCSI with Cisco UCS 2.0

Update:

Here are a couple of tips for all of you. If you see an error message about an invalid iSCSI configuration when trying to apply a service profile, first, to help troubleshoot, try removing the iSCSI boot parameters so that you just have the actual iSCSI NICs but no iSCSI boot policy; that way you know whether the error is related to your iSCSI NICs or your iSCSI boot policy. 


One particular error for me was that I had defined a MAC address pool for my iSCSI adapter (the actual vNIC itself under iSCSI NICs). For M81KR adapters you should set this to derived, or not select a pool at all.


 

 Hi Guys!

Hot on the heels of my previous post about port channels between the IOMs and the Fabric Interconnects (a new feature in UCS 2.0) comes another blog post about another great new feature in UCS 2.0.

I wrote this article because, although there are a few articles already out there talking about boot from SAN with iSCSI, the feature is very new and troubleshooting it is a bit difficult. I also want to show you how to make sure it uses jumbo frames :)


Sections of Document
Boot from SAN iSCSI
Troubleshooting boot from SAN iSCSI



Boot from SAN iSCSI


In this blog post, I will explain how to configure boot from SAN with iSCSI, and a great way to TEST it yourself, with some very handy FREE software, without having to involve any SAN guys until you're certain your end is all working great :)

Finally, I will take you through some great ways to troubleshoot using some very cool and useful commands available on your mezzanine adapter (a CLI in my mezzanine adapter? What a cool concept! And extremely useful!)

Let's get started

First of all, here is the environment used, just so we can make sure we are all on the same page :)

We are using UCS 2.0(x), Cisco M81KR mezzanine cards (other cards SHOULD be fine, but you may need to adjust your iSCSI adapter policies) and some B2XX series blades.

So, the first step is to log in to UCS Manager. I am going to assume you already know how to create a service profile, so I have just created one and we will be modifying it to suit boot from iSCSI :)

The first step is to go to the LAN tab and create a VLAN for our iSCSI traffic. I try not to route iSCSI traffic (unless I can't avoid it), as routing can mess around with the MTU etc., but of course that doesn't fit every situation.

Go to the LAN tab -> LAN Cloud -> VLANs






Create a new VLAN with the plus button on the right-hand side and give it a suitable name; more than likely you are just going to set it as common/global.

Give it an ID too :)





Now we need to create some QoS policies for our vNICs.

Under the same LAN tab, under LAN Cloud again, go to "QoS System Class". Pick a class you are not using for anything else (I picked Gold), set the class to enabled, and change the MTU to 9000. I leave packet drop for this class turned ON. I know some people think this is not a good idea, but my personal feeling is that iSCSI, unlike Fibre Channel, has mechanisms in place to deal gracefully with packet drop, and that no-drop treatment should be reserved for traffic that REALLY needs it. But anyway.

Here is a screenshot of the Gold class configured for this:




Next, we need to assign this class of traffic to a QoS policy. Go to Policies -> Root (or your own sub-organization if you're using them, and if you're not, why not?), then click on QoS Policies and create a new QoS policy from there. Change the priority to whichever class of traffic you chose in the previous step.



Next, let's create some vNIC templates :)

In the LAN tab, go to vNIC Templates; we will be adding two new vNIC templates.

Set the name as you like. For the fabric ID, choose fabric A for this one; for the next vNIC we create, we will choose fabric B. I don't personally enable failover, as I like to treat it like Fibre Channel where you have two separate storage networks, but ticking or unticking this box will depend on your own iSCSI topology :)
(Got an opinion as to why one or the other is better? Leave a comment!)

I personally set the template type to updating, so that if I have made a mistake, all my service profiles are updated along with it.

Select the iSCSI VLAN you created previously, and set it as the native VLAN.

Change the MTU to 9000

For the MAC pool, select a MAC pool that you have created previously (or create a whole new one if you prefer :))

The QOS policy should be the same QoS policy we created in the previous step.






Now it's time to associate these two vNICs with your service profile. The next few steps may differ if you're setting up a service profile from scratch; for the sake of brevity I am going to assume you're modifying an existing service profile.

Go to the Servers tab and find your service profile, click on vNICs and click Add.

Name your iSCSI adapter and click "Use LAN Connectivity Template", then select the vNIC template you created previously. Repeat this for your two vNICs, one on each fabric.




Now, click on your service profile and click on iSCSI vNICs, then click Add. Give your iSCSI vNIC a name (in this case I chose iSCSIHBA1), and for the overlay vNIC choose the vNIC we assigned to the service profile previously; for the VLAN, select the iSCSI VLAN we created previously.




Make SURE you don't specify a MAC pool here; choose none instead. I had a major problem where I set a MAC pool: the service profile reported that the iSCSI configuration was incomplete, and configuring this became quite difficult. This might be an issue with 2.0.1(w), but if you're experiencing this, that is likely to be your problem.

Ok! We are getting close now!

So just to review, you should now have a QoS system class, a QoS policy, some vNIC templates, two vNICs created from those templates and assigned to your service profile, and finally two iSCSI NICs showing under your service profile.

*Phew*


OK, the next step is to modify the boot order so that it will boot from iSCSI. Click on the top branch of your service profile tree (the service profile itself) and select Boot Order.

Click "Modify Boot Policy", select "specific boot policy" (You could also go and create a boot policy for this under your organization)

First, add a CD-ROM drive, as you will need to be able to actually install your OS; I am assuming you will use the KVM with virtual media to do this :)
Next, click on the iSCSI interfaces you created previously and assign them into the boot policy, then click OK.

Next, we need to define the boot parameters.


You will be taken back to the boot order tab when you click OK. Click on "Set Boot Parameters" under the iSCSI vNICs on the left-hand side; we will need to set parameters for both.

The first step is to choose an initiator name; I chose to use the following format:

iqn.1992-08.com.cisco:2500
and
iqn.1992-08.com.cisco:2501

one for each of my vNICs; they must be unique per vNIC.

The initiator IP address policy is where you actually set the IP address of the iSCSI adapter itself, so this should be an address in your iSCSI VLAN that can reach the target.






Next, at the bottom of this screen is where we specify the target: click Add and enter the name and IP address of your target.




Repeat these steps, using a different initiator name and IP address, for the other iSCSI vNIC's boot parameters.

Now, _if_ you're confident that your iSCSI target is correct, set up, ready to go and supports boot from SAN, at this point you're finished! You can go ahead and boot the system, install your OS to the remote LUN over iSCSI, and you should be good to go.

Let's assume for a minute that it's not quite that simple ;) and that you come across issues. Let's start talking about troubleshooting!

Troubleshooting Boot from SAN iSCSI

So of course, before you can even begin to install the operating system, you actually have to be able to see the LUN. Let's talk about how to make sure that is happening.

First of all, when you boot your freshly configured service profile, you should notice that the vNIC comes up with a message during boot saying whether it was able to see the target. If you're all good, it will show the LUN available via your target; if you're not, it will show something such as Initiator Error 1 (which generally means it couldn't find the target) or Initiator Error 4.

But sometimes those messages scroll by so fast, and it takes a while for the server to reboot! So sometimes you just want to be able to find out if it worked some other way, and hell, get a bit more information too!

So, inspired by another blogger's post I found, where he talks about how to see what LUNs are available over an HBA before and after the server has booted, I wondered if there was a way to do the same thing with iSCSI. There is!


First, log in to the UCS CLI, then connect to the adapter of the server that is running the service profile with the iSCSI boot (in my example, the first 1 is the chassis ID, the second is the server, and the third is the mezzanine adapter number):

UCSHOSTNAME# connect adapter 1/1/1
UCSHOSTNAME(adapter)# connect
UCSHOSTNAME(adapter)# attach-mcp

adapter 1/1/1 (mcp):20# iscsi_get_config
vnic iSCSI Configuration:
----------------------------
 
vnic_id: 6
          link_state: Up

       Initiator Cfg:
     initiator_state: ISCSI_INITIATOR_READY
initiator_error_code: ISCSI_BOOT_NIC_NO_ERROR
                vlan: 0
         dhcp status: false
                 IQN: iqn.1992-08.com.cisco:2500
             IP Addr: 192.168.227.10
         Subnet Mask: 255.255.255.0
             Gateway: 192.168.227.1

          Target Cfg:
          Target Idx: 0
               State: ISCSI_TARGET_READY
          Prev State: ISCSI_TARGET_DISABLED
        Target Error: ISCSI_TARGET_NO_ERROR
                 IQN: iqn.2008-08.com.starwindsoftware:192.168.227.10-test
             IP Addr:192.168.227.10
                Port: 3260
            Boot Lun: 0
          Ping Stats: Success (9.982ms)

        Session Info:
          session_id: 0
         host_number: 0
          bus_number: 0
           target_id: 0



You can tell from the very helpful output above that there was no issue connecting to the iSCSI target! This will also give you useful information about whether the target failed for some reason, and why. You can even run iscsi_ping to make sure you can actually reach the target over the network.

So all in all a very cool way to do troubleshooting.

And Another Thing...

So maybe you don't have access to the SAN storage yet and you're trying to test iSCSI boot from SAN, or maybe you're trying to prove the problem is not on the UCS side but possibly with the SAN. How can you quickly and easily run up a SAN? I mean, they cost tons of money!!


Enter the StarWind Software iSCSI software SAN: a _FREE_ version is available on their website, it runs on ANY Windows platform (not just the server platforms), and it allows you to quickly and very easily create an iSCSI SAN. You don't even need raw disks!!! You can just tell it to create a file on the computer and treat that as a volume/LUN to be exported via iSCSI (very cool!)


When you download and install it, just so you know, the default username and password (which are not documented very well) are root and starwind.


As you can see, it was super easy for me to create an iSCSI target; I just clicked Next a few times and away I went. Best of all, this software is FREE (some features are paid). Believe me when I say I am not being paid to push this product (no one pays me anything :p); I am just so impressed that someone could offer software for free when it works so incredibly well.


It is my sincere hope that armed with this knowledge you will be able to go out there and prove that iSCSI boot from SAN works perfectly on Cisco UCS (I wonder when Juniper's server products are coming? ;))

I hope this helps someone out there!