CCIE DC: Nexus 1000V troubleshooting

Hi Guys!!

If you're anything like me, one of the most nerve-racking things when installing the Nexus 1000V is the big step where you move the uplinks of your ESX host to the Nexus 1000V. This can be quite daunting, but it is a perfectly safe operation that you can always back out of. In this post I will take you through the steps of troubleshooting and resolving issues with this connection.


Incidentally, one of the main hints in this article, how to add a system VLAN using the command line on the ESX server, was pointed out by one of the many very smart people in our CCIE DC Facebook group. If you're serious about CCIE DC, I strongly recommend that you join this group; we have lots of really smart people contributing some great ideas.

So full credit to my fellow members of this group!


Here are some general recommendations:

- Do one uplink at a time if you can, keeping your host available indirectly if you get absolutely stuck

- Be sure to have organised KVM or iLO access to the host so that you can get onto it in a hurry if you need to

- Ensure you have enabled SSH and direct console access to the host before you move the uplinks so you can log in to troubleshoot

- Of course, put the host in maintenance mode before you perform this operation :)

OK so let's talk about the System VLAN.


The System VLAN command you define under your port profile is the single most important command for ensuring a safe change of your uplinks. The VLANs you specify as System VLANs bypass the distributed switch and are allowed to communicate with no restrictions. Your System VLAN should match the VLAN used by your VMKernel port.

Here is an example of one done correctly:

port-profile type ethernet n1kv-eth-2
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1
  switchport trunk native vlan 1
  channel-group auto mode on sub-group cdp
  no shutdown
  system vlan 1
  state enabled
 
In this example, I have specified that VLAN 1 is my system VLAN. This is the VLAN that my VMKernel management port uses:

port-profile type vethernet n1kv-veth-vlan-1-l3
  vmware port-group
  port-binding static auto
  switchport mode access
  switchport access vlan 1
  no shutdown
  system vlan 1
  max-ports 256
  min-ports 16
  state enabled





You need to specify the system VLAN both on the uplink and on the vethernet that you will be assigning to the host VMKernel itself.
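Since a missing system VLAN is so hard to recover from, it can be worth checking your profiles before you move anything. The following is only a minimal sketch, not a Cisco tool: it assumes you have pasted the relevant port-profile sections of your running config into a string, and the function name `profiles_missing_system_vlan` is my own invention for illustration.

```python
import re

# Two uplink profiles from this post, pasted as they would appear on the VSM.
EXAMPLE_CONFIG = """
port-profile type ethernet n1kv-eth-2
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1
  switchport trunk native vlan 1
  channel-group auto mode on sub-group cdp
  no shutdown
  system vlan 1
  state enabled
port-profile type ethernet n1kv-eth-2-no-sysvlan
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1
  switchport trunk native vlan 1
  channel-group auto mode on mac-pinning
  no shutdown
  state enabled
"""

PROFILE_RE = re.compile(r"port-profile type (?:ethernet|vethernet) (\S+)$")
SYSTEM_VLAN_RE = re.compile(r"system vlan \S")


def profiles_missing_system_vlan(config):
    """Return the names of port profiles that contain no 'system vlan' line."""
    has_system_vlan = {}  # profile name -> True once a system vlan line is seen
    current = None
    for raw in config.splitlines():
        line = raw.strip()
        header = PROFILE_RE.match(line)
        if header:
            # A new profile starts; assume no system VLAN until we see one.
            current = header.group(1)
            has_system_vlan[current] = False
        elif current is not None and SYSTEM_VLAN_RE.match(line):
            has_system_vlan[current] = True
    return [name for name, found in has_system_vlan.items() if not found]


print(profiles_missing_system_vlan(EXAMPLE_CONFIG))
```

Run against the two profiles above, this flags only n1kv-eth-2-no-sysvlan. Keep in mind that not every profile needs a system VLAN (ordinary VM data profiles don't), so only feed it the uplink and VMKernel profiles you are about to use.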

OK, let's look at an example where I was naughty and didn't assign it correctly:

Here is my ethernet port profile that doesn't have a system VLAN assigned:

port-profile type ethernet n1kv-eth-2-no-sysvlan
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1
  switchport trunk native vlan 1
  channel-group auto mode on mac-pinning
  no shutdown
  state enabled





I assign this to my host and, oh dear, I can no longer access the host:




To be honest, if you have done this and you've forgotten to specify a system VLAN, you're pretty much hosed and will need to restore the standard vSwitch. Let's take a look at why:





In the above example, you can see that the admin state for the physical interface (vmnic0) is down. This happens because you have not specified a system VLAN for the uplink. You will notice that vmk0 is showing as up/up because this port HAS been assigned (correctly) a system VLAN. Ports will remain down until they can talk to the VSM, and they can't talk to the VSM if they don't have a System VLAN...


So in this case your only option is to restore the standard vSwitch.

Let's take a look at what happens if we specify an incorrect System VLAN on both our vethernet and ethernet profiles, and we also configure completely the wrong trunk:

port-profile type ethernet n1kv-eth-2-wrong-sysvlan
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 1-2
  switchport trunk native vlan 2
  no shutdown
  system vlan 2
  state enabled

port-profile type vethernet n1kv-veth-vlan-1-l3-wrongsysvlan
  capability l3control
  vmware port-group
  port-binding static auto
  switchport mode access
  switchport access vlan 2
  no shutdown
  system vlan 2
  max-ports 256
  min-ports 16
  state enabled

!
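A mismatch like the one above can be caught before the move if you know which VLAN your VMKernel management port really lives in (VLAN 1 in my lab). Here is a rough sketch of that comparison. The names `expand_vlans` and `find_vlan_mismatches` are made up for this example, the config is trimmed to the lines the check actually reads, and it only understands simple VLAN lists such as 1,5-7.

```python
import re

# The deliberately wrong profiles from this post, trimmed to the relevant lines.
WRONG_CONFIG = """
port-profile type ethernet n1kv-eth-2-wrong-sysvlan
  switchport mode trunk
  switchport trunk allowed vlan 1-2
  switchport trunk native vlan 2
  system vlan 2
port-profile type vethernet n1kv-veth-vlan-1-l3-wrongsysvlan
  switchport mode access
  switchport access vlan 2
  system vlan 2
"""


def expand_vlans(spec):
    """Turn a VLAN list such as '1,5-7' into the set {1, 5, 6, 7}."""
    vlans = set()
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        vlans.update(range(int(low), int(high or low) + 1))
    return vlans


def find_vlan_mismatches(config, mgmt_vlan):
    """List profiles whose system or access VLAN disagrees with mgmt_vlan."""
    problems = []
    system_vlans = {}  # profile name -> set of system VLANs seen so far
    name = None
    for raw in config.splitlines():
        line = raw.strip()
        header = re.match(r"port-profile type (?:ethernet|vethernet) (\S+)$", line)
        if header:
            name = header.group(1)
            system_vlans[name] = set()
            continue
        if name is None:
            continue
        sysvlan = re.match(r"system vlan ([\d,\- ]+)$", line)
        if sysvlan:
            system_vlans[name] |= expand_vlans(sysvlan.group(1))
            continue
        access = re.match(r"switchport access vlan (\d+)$", line)
        if access and int(access.group(1)) != mgmt_vlan:
            problems.append(f"{name}: access vlan {access.group(1)}, expected {mgmt_vlan}")
    for profile, vlans in system_vlans.items():
        if mgmt_vlan not in vlans:
            problems.append(f"{profile}: system vlan does not include {mgmt_vlan}")
    return problems


for problem in find_vlan_mismatches(WRONG_CONFIG, mgmt_vlan=1):
    print(problem)
```

With mgmt_vlan=1 this reports three problems: the wrong access VLAN on the vethernet profile, plus the wrong system VLAN on each profile. It deliberately says nothing about the native VLAN, because whether your management traffic is tagged depends on your upstream switch.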


Let's look at our helpful show port output now:



So the first step is to restore the trunk so that the correct VLAN is the native VLAN:






We are now in the correct VLAN for our trunk, but a few more steps remain:
 

Let's see what we can see under show bd now, which is a very useful command for determining which interfaces are using the VLAN:




This looks pretty promising: we can even see the multicast group. Let's check to see if we now have MAC address table entries:





Success! So finally, our module should appear in show module:

FakeNexus1000V# show module
Mod  Ports  Module-Type                       Model               Status
---  -----  --------------------------------  ------------------  ------------
1    0      Virtual Supervisor Module         Nexus1000V          active *
3    248    Virtual Ethernet Module           NA                  ok
4    248    Virtual Ethernet Module           NA                  ok

Mod  Sw                  Hw     
---  ------------------  ------------------------------------------------ 
1    4.2(1)SV2(1.1a)     0.0                                             
3    4.2(1)SV2(1.1a)     VMware ESXi 5.0.0 Releasebuild-469512 (3.0)     
4    4.2(1)SV2(1.1a)     VMware ESXi 5.0.0 Releasebuild-469512 (3.0)     

Mod  MAC-Address(es)                         Serial-Num
---  --------------------------------------  ----------
1    00-19-07-6c-5a-a8 to 00-19-07-6c-62-a8  NA
3    02-00-0c-00-03-00 to 02-00-0c-00-03-80  NA
4    02-00-0c-00-04-00 to 02-00-0c-00-04-80  NA

Mod  Server-IP        Server-UUID                           Server-Name
---  ---------------  ------------------------------------  --------------------
1    172.21.1.11      NA                                    NA
3    172.21.1.103     564d22bf-1f05-4824-33e3-d3da94c45c17  NA
4    172.21.1.105     564d0f85-3b7e-313e-d774-1124ea22cf10  172.21.1.105

We do! Great Success!


So, as long as your uplink (in our case, LTL 17), your VMKernel (LTL 49), control (always LTL 10) and packet (always LTL 12) have the system VLAN specified (which you can check using show bd), you will be able to recover from an incorrect System VLAN.
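If you like checklists, that rule boils down to a simple set comparison. The snippet below is only an illustration: it assumes you have read the LTL numbers for your system VLAN off the show bd output by hand and typed them in, it does not parse any real output, and the name `ltls_missing_from_bd` is hypothetical.

```python
# LTLs from this post: control and packet are fixed, uplink and VMKernel
# are the values seen on my host and will differ on yours.
REQUIRED_LTLS = {"control": 10, "packet": 12, "uplink": 17, "vmkernel": 49}


def ltls_missing_from_bd(bd_ltls, required=REQUIRED_LTLS):
    """Return the required roles whose LTL is not a member of the system VLAN's BD."""
    return {role: ltl for role, ltl in required.items() if ltl not in bd_ltls}


# Example: LTLs typed in by hand, with the uplink not yet in the system VLAN.
print(ltls_missing_from_bd({10, 12, 49}))
```

An empty result means all four ports are in the system VLAN; anything else tells you which port still needs fixing.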


Two other great troubleshooting commands are vemcmd show port-old:


and vemcmd show card, which can be used to determine if you have an issue with duplicate VSMs.





I hope this helps someone out there!

















1 comment:

  1. What if you want to modify the current System VLAN? At this point the port-profile is in use by a veth, so the system won't allow you.
    I am looking for a way to change it on the VSM CLI without needing to access vCenter.

    Thanks

