This blog post is about Graceful Restart, also known as Non-Stop Forwarding (NSF), a totally open standard that can be extremely useful when performing maintenance on devices that separate the control plane from the data plane, such as those with distributed linecards.
ISSU (In-Service Software Upgrade) is a Cisco feature that works in conjunction with Graceful Restart to let you upgrade devices without interrupting network traffic, even on single-supervisor platforms. (Upgrades on dual-supervisor platforms are somewhat easy and have been for a very long time.)
OK, first let's make sure we all agree on our terminology and quickly talk about the different options.
Graceful Restart
Straight from the horse's mouth:
( http://www.cisco.com/en/US/docs/switches/datacenter/sw/5_x/nx-os/high_availability/configuration/guide/ha_network.html )
OSPFv2 Graceful Restart on an OSPFv2 Process Failure
OSPFv2 automatically restarts if the process experiences problems. After
the restart, OSPFv2 initiates a graceful restart so that the platform
is not taken out of the network topology. If you manually restart OSPF,
it performs a graceful restart, which is similar to a stateful
switchover. The running configuration is applied in both cases. The
graceful restart allows OSPFv2 to remain in the data forwarding path
through the process restart.
Note: If the restarting OSPFv2 interface does not come back up before the end of
the grace period, or if the network experiences a topology change, the
OSPFv2 neighbors tear down adjacency with the restarting OSPFv2 and
treat it as a normal OSPFv2 restart.
So as we can see from the above, graceful restart, at least in OSPF, is implemented using the actual routing protocol itself.
Non-Stop Routing (NSR) is not a protocol-level idea but a process-level one: if the OSPF process on NX-OS itself has an issue, the process will attempt to resume from its previous runtime state. This is completely transparent to the other routers and is used when doing things like switching supervisor modules.
Non-Stop Forwarding (NSF) refers to Graceful Restart.
Graceful Restart, or Non-Stop Forwarding, is available in OSPF, EIGRP, IS-IS and BGP on the NX-OS platform.
Here is how it works for each of the protocols
OSPF
When OSPFv2 needs to do a graceful restart, it first sends a link-local opaque (type 9) LSA, called a grace LSA.
So this is actually something built into the protocol itself! Great!
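To make the grace LSA a little more concrete, here is a minimal Python sketch of how its body is laid out as type/length/value (TLV) fields, based on my reading of RFC 3623 (the OSPFv2 graceful restart RFC). The function names and the 10.5.5.1 interface address are made up for illustration, and the sketch builds only the TLV body, not the LSA header or a real OSPF packet.

```python
import ipaddress
import struct

# TLV type codes for the grace LSA body, as I read them from RFC 3623.
GRACE_PERIOD = 1       # 4-byte value: seconds the neighbors should keep helping
RESTART_REASON = 2     # 1-byte value: 0 unknown, 1 software restart,
                       # 2 software reload/upgrade, 3 switch to redundant control processor
INTERFACE_ADDRESS = 3  # 4-byte value: IP address of the sending interface


def encode_tlv(tlv_type: int, value: bytes) -> bytes:
    """Pack one TLV: 2-byte type, 2-byte length, value padded to a 4-byte boundary."""
    padding = (-len(value)) % 4
    return struct.pack("!HH", tlv_type, len(value)) + value + b"\x00" * padding


def build_grace_lsa_body(grace_period: int, reason: int, interface_ip: str) -> bytes:
    """Build the TLV body only (hypothetical helper, no LSA header)."""
    return b"".join([
        encode_tlv(GRACE_PERIOD, struct.pack("!I", grace_period)),
        encode_tlv(RESTART_REASON, struct.pack("!B", reason)),
        encode_tlv(INTERFACE_ADDRESS, ipaddress.IPv4Address(interface_ip).packed),
    ])


def decode_tlvs(body: bytes) -> dict:
    """Walk the TLVs and return {type: raw value bytes}."""
    fields = {}
    offset = 0
    while offset < len(body):
        tlv_type, length = struct.unpack_from("!HH", body, offset)
        offset += 4
        fields[tlv_type] = body[offset:offset + length]
        offset += length + (-length) % 4  # skip the value and its padding
    return fields


if __name__ == "__main__":
    # 60 seconds matches the default grace period on this switch;
    # the interface address is an assumption for the example.
    body = build_grace_lsa_body(60, 1, "10.5.5.1")
    print(decode_tlvs(body))
```

The point is simply that the grace period travels inside the LSA, so the neighbors learn how long to wait from the restarting router itself.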
OK, let's look at how we would configure it:
First we would configure OSPF:
router ospf GRACEFULRESTART
......
We are done!
No seriously, graceful restart is the default configuration:
switch(config)# show ip ospf
Routing Process GRACEFULRESTART with ID 0.0.0.0 VRF default
Stateful High Availability enabled
Graceful-restart is configured Grace period: 60 state: Inactive
Let's see it in action. I am going to skip the boring part of my normal OSPF config and go straight to showing you what happens when you issue the graceful restart command:
switch# show ip ospf neighbor
OSPF Process ID GRACEFULRESTART VRF default
Total number of neighbors: 1
Neighbor ID Pri State Up Time Address Interface
2.2.2.2 1 FULL/DR 00:01:13 10.5.5.2 Vlan1
switch# restart ospf GRACEFULRESTART
switch# 2013 Jun 18 14:28:31.256961 ospf: GRACEFULRESTART [9566] BFD send auto expiry failed - status Broken pipe
2013 Jun 18 14:28:31.257193 ospf: GRACEFULRESTART [9566] (default) Coming up in graceful planned restart
2013 Jun 18 14:28:31.257229 ospf: GRACEFULRESTART [9566] (default) Enabling flooding on all the active physical interfaces.
2013 Jun 18 14:28:31.289319 ospf: GRACEFULRESTART [9566] (default) Scheduling grace LSAs for round round 2
2013 Jun 18 14:28:32.298621 ospf: GRACEFULRESTART [9566] (default) Putting all the active interfaces on the flooding list again.
2013 Jun 18 14:28:32.299151 ospf: GRACEFULRESTART [9566] (default) Scheduling grace LSAs for round round 3
2013 Jun 18 14:28:33.308720 ospf: GRACEFULRESTART [9566] (default) Putting all the active interfaces on the flooding list again.
2013 Jun 18 14:28:33.309434 ospf: GRACEFULRESTART [9566] (default) Disabling flooding on all the active physical interfaces...
2013 Jun 18 14:28:37.374766 ospf: GRACEFULRESTART [9566] (default) Determining exit after a new adj is formed..
2013 Jun 18 14:28:37.374813 ospf: GRACEFULRESTART [9566] (default) total: pre-restart lsas recd?: NO, reestablished adjs: 0, pre-restart adjs: 1
2013 Jun 18 14:28:37.374834 ospf: GRACEFULRESTART [9566] (default) discover_vlink_adj: NO, total_prerestart_transit_areas: 0, total_active_transit_areas: 0
2013 Jun 18 14:28:37.380476 ospf: GRACEFULRESTART [9566] (default) Determining exit after a new adj is formed..
2013 Jun 18 14:28:37.380500 ospf: GRACEFULRESTART [9566] (default) total: pre-restart lsas recd?: NO, reestablished adjs: 1, pre-restart adjs: 1
2013 Jun 18 14:28:37.380517 ospf: GRACEFULRESTART [9566] (default) discover_vlink_adj: NO, total_prerestart_transit_areas: 0, total_active_transit_areas: 0
switch# show ip ospf neighbor
OSPF Process ID GRACEFULRESTART VRF default
Total number of neighbors: 1
Neighbor ID Pri State Up Time Address Interface
2.2.2.2 1 FULL/DR 00:00:10 10.5.5.2 Vlan1
switch#
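As a quick sanity check on the output above, here is a small Python sketch that measures how long the restart took by subtracting two of the log timestamps, then compares that with the 60 second grace period from "show ip ospf". The function names are my own, and the timestamp format string assumes English month abbreviations as printed by the switch.

```python
from datetime import datetime

# Two lines copied from the debug output above.
LOG_START = ("2013 Jun 18 14:28:31.257193 ospf: GRACEFULRESTART [9566] "
             "(default) Coming up in graceful planned restart")
LOG_END = ("2013 Jun 18 14:28:37.380500 ospf: GRACEFULRESTART [9566] (default) "
           "total: pre-restart lsas recd?: NO, reestablished adjs: 1, pre-restart adjs: 1")

GRACE_PERIOD_SECONDS = 60  # from "Grace period: 60" in show ip ospf


def parse_timestamp(line: str) -> datetime:
    """Parse the leading 'YYYY Mon DD HH:MM:SS.ffffff' stamp of a log line."""
    stamp = " ".join(line.split()[:4])
    return datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f")


def restart_duration(start_line: str, end_line: str) -> float:
    """Seconds between two log lines."""
    return (parse_timestamp(end_line) - parse_timestamp(start_line)).total_seconds()


if __name__ == "__main__":
    took = restart_duration(LOG_START, LOG_END)
    print(f"adjacency re-established after {took:.3f}s "
          f"(grace period {GRACE_PERIOD_SECONDS}s)")
```

In this capture the adjacency is back in a little over six seconds, comfortably inside the grace period.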
Unfortunately I can't show this 100 percent live, but rest assured that when I restarted the OSPF process, the other neighbor never saw the relationship go down. As you can see, once the process had finished restarting, the neighbors never went through the "TWOWAY/DR" negotiation phase.
There are not that many options for graceful restart. The major thing you can change in OSPF is the grace period timer: if the neighbor has not heard from the restarting router within that period, it assumes something went wrong with the graceful restart, treats the peer as actually dead, and a full restart is required.
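The neighbor's side of this can be sketched in a few lines of Python. This is only a toy model of the decision described in the Cisco note quoted earlier (give up when the grace period runs out or the topology changes), not how NX-OS actually implements it; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HelperSession:
    """What a helping neighbor tracks for one restarting router (toy model)."""
    grace_period: float  # seconds, as advertised by the restarting router
    started_at: float    # time the grace LSA was received

    def evaluate(self, now: float, adjacency_reestablished: bool,
                 topology_changed: bool) -> str:
        """Decide what the helper does at time `now`."""
        if adjacency_reestablished:
            return "graceful restart completed"
        if topology_changed:
            # A topology change ends helping early.
            return "abort: treat as normal restart"
        if now - self.started_at >= self.grace_period:
            # Grace period ran out before the router came back.
            return "abort: treat as normal restart"
        return "keep helping"


if __name__ == "__main__":
    session = HelperSession(grace_period=60, started_at=0)
    print(session.evaluate(now=30, adjacency_reestablished=False, topology_changed=False))
    print(session.evaluate(now=60, adjacency_reestablished=False, topology_changed=False))
```

A longer grace period gives a slow restart more room to finish, at the cost of the neighbors forwarding towards a router that might really be dead for longer.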
EIGRP
In EIGRP we have the same concept as graceful restart, but when configuring it, EIGRP uses the term "non stop forwarding" (NSF).
Because EIGRP is not an SPF routing protocol, the way graceful restart works is slightly more complicated.
First, if EIGRP needs to perform a graceful restart, it sends an indication in its hello messages to its neighbors. If the neighbor supports graceful restart, the first thing the two routers do is exchange a full topology table.
Next, the graceful-restart-aware router (i.e. the router that is a neighbor of the one performing a graceful restart) expires the hold timer so that hello messages can be sent more quickly. This actually helps because it means that when the restarting router comes back up, adjacencies can be established more quickly.
Next, the graceful-restart-aware router (remember, the one that is NOT restarting) waits a hold period for the restarting router to send it a hello. If it doesn't receive a hello within this time frame, it assumes the neighbor has actually died during the graceful restart and drops all the routes for that neighbor. While waiting for that period of time, though, the router WILL hold all the routes in its routing table until the neighbor finishes restarting.
Like OSPF, EIGRP has graceful restart enabled by default. There are some timers that can be modified, however:
switch-sw1-3(config)# show ip eigrp
IP-EIGRP AS 1 ID 2.2.2.2 VRF default
Process-tag: 1
Status: running
Authentication mode: none
Authentication key-chain: none
Metric weights: K1=1 K2=0 K3=1 K4=0 K5=0
IP proto: 88 Multicast group: 224.0.0.10
Int distance: 90 Ext distance: 170
Max paths: 8
Number of EIGRP interfaces: 1 (0 loopbacks)
Number of EIGRP passive interfaces: 0
Number of EIGRP peers: 1
Graceful-Restart: Enabled
Stub-Routing: Disabled
NSF converge time limit/expiries: 120/0
NSF route-hold time limit/expiries: 240/0
NSF signal time limit/expiries: 20/0
Let's look at these values. As I understand it, the signal time is the time limit for signaling a graceful restart to the neighbors. The converge time is how long the restarting router is willing to wait for convergence, that is, for the EOT (End of Table) message that tells it the "assisting" router has finished sending its topology table. Finally, the route-hold time is how long this particular router will keep the OTHER ROUTER'S routes in its routing table during the graceful restart.
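If you want to check these timers from a script rather than by eye, a rough Python sketch like the one below can pull them out of the "show ip eigrp" text. The sample is trimmed from the output above, the function name is my own, and reading "expiries" as the number of times each timer has run out is my assumption from the field name.

```python
import re

# Trimmed copy of the "show ip eigrp" output shown earlier.
SHOW_IP_EIGRP = """\
Graceful-Restart: Enabled
Stub-Routing: Disabled
NSF converge time limit/expiries: 120/0
NSF route-hold time limit/expiries: 240/0
NSF signal time limit/expiries: 20/0
"""

# Matches lines such as "NSF route-hold time limit/expiries: 240/0".
NSF_LINE = re.compile(
    r"^\s*NSF (?P<name>[\w-]+) time limit/expiries: (?P<limit>\d+)/(?P<expiries>\d+)\s*$",
    re.MULTILINE,
)


def parse_nsf_timers(text: str) -> dict:
    """Return {timer name: (limit in seconds, expiry count)}."""
    return {
        m["name"]: (int(m["limit"]), int(m["expiries"]))
        for m in NSF_LINE.finditer(text)
    }


if __name__ == "__main__":
    for name, (limit, expiries) in parse_nsf_timers(SHOW_IP_EIGRP).items():
        print(f"{name}: limit {limit}s, expired {expiries} times")
```

On this switch every expiry counter is 0, which is what you would hope to see if no graceful restart has ever timed out.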
Hello, Graceful restart/NSF is a great feature but I think it works in a bit different way than explained. First, I work in the SP area, so correct me if it's different in DC. During NSF the neighbor relationship actually drops, but because of the type 9 LSA, the neighbor keeps the routes in the routing table and continues to forward the traffic (of course within the grace time). On the other hand, NSR doesn't drop the neighborship during the RP failure (or planned switchover); in that case the box doesn't send any IS-IS/OSPF grace LSA and the neighbor doesn't even notice the failure.
Hi, nice post. Could you show BGP's one as well?
Hi, I have a question: how do I perform a graceful restart test? I mean, do Cisco devices include admin commands such as "ospf nsf restart" or "ospf graceful restart" to enter GR mode? I'm using a C3560X layer 3 switch.