Saturday, 1 December 2012

Exchange 2010 DAG failover

I currently support a three-node mailbox (MBX) DAG where two nodes are in one data centre (I’ll call it the primary) and one is in a second data centre (I’ll call it the secondary). The primary also has a pair of load-balanced CAS/HUB servers, and the secondary has a single CAS/HUB server. Clients connect through a DNS alias for the CAS. All servers run Windows Server 2008 R2.

Occasionally I have planned downtime in the primary DC and want to move all services to the secondary. Because the majority of the DAG nodes are in the primary DC, Exchange dismounts all databases once the two nodes there are shut down: the surviving node no longer has cluster quorum. I haven’t found much help on the web for this specific scenario, so here is my version. It does require some knowledge of Exchange DAGs and Microsoft clustering, especially as it isn’t always plain sailing!

NOTE: I find it more reliable to use the Exchange Management Shell (PowerShell) for the following Exchange tasks, as the GUI does not always refresh consistently or promptly.

-------- move CAS--------------------------------------------------------------
Change the CAS DNS alias to the secondary CAS/HUB server
I have the TTL for my alias set to 5 minutes, so all clients should be connecting to the secondary within 5 minutes at most.
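If you would rather script the alias change, here is a minimal sketch using dnscmd, assuming the alias is a CNAME in a zone hosted on a Windows DNS server. Set-CasAlias is a hypothetical helper and every name passed to it is a placeholder; it deletes the existing CNAME and re-adds it pointing at the secondary CAS/HUB server with the same 300-second (5-minute) TTL. Treat it as untested against your own DNS setup.

```powershell
# Hypothetical helper: repoint the CAS alias (assumed to be a CNAME) using dnscmd.
# All parameter values are placeholders for your own environment.
function Set-CasAlias {
    param(
        [string]$DnsServer,         # DNS server that hosts the zone
        [string]$Zone,              # zone containing the alias
        [string]$Alias,             # record name clients use for the CAS
        [string]$OldTarget,         # FQDN the alias points at now (primary CAS)
        [string]$NewTarget,         # FQDN of the secondary CAS/HUB server
        [int]$TtlSeconds = 300      # 5 minutes, as in the post
    )
    # Remove the current CNAME; /f skips the confirmation prompt
    dnscmd $DnsServer /RecordDelete $Zone $Alias CNAME $OldTarget /f
    if ($LASTEXITCODE -ne 0) { throw "dnscmd /RecordDelete failed ($LASTEXITCODE)" }
    # Re-create it pointing at the secondary server with the same short TTL
    dnscmd $DnsServer /RecordAdd $Zone $Alias $TtlSeconds CNAME $NewTarget
    if ($LASTEXITCODE -ne 0) { throw "dnscmd /RecordAdd failed ($LASTEXITCODE)" }
}
```

Deleting and re-adding keeps the TTL explicit; clients that have already cached the old answer still only pick up the change once that TTL expires.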

-------- move mailbox databases to the secondary MBX server---------
Move all active mailbox databases to the secondary MBX server (using the lossless option if possible), repeating for each database:
Move-ActiveMailboxDatabase <db identity> -ActivateOnServer <secondary MBX server> -MountDialOverride Lossless

Suspend all database copies on the primary MBX servers:
Suspend-MailboxDatabaseCopy <db identity>\<primary MBX server>
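The two steps above can be looped over every database in the Exchange Management Shell. This is a minimal sketch: Move-DatabasesToSecondary is a hypothetical wrapper, the server names are placeholders, and it treats a copy whose Status is Mounted as the active one. It collects each result into a variable before acting on it, because Exchange remote sessions can refuse to run one Exchange cmdlet inside another's pipeline.

```powershell
# Hypothetical wrapper for the Exchange Management Shell; server names are placeholders.
function Move-DatabasesToSecondary {
    param(
        [string[]]$PrimaryServers,   # the MBX servers in the primary DC
        [string]$SecondaryServer     # the MBX server in the secondary DC
    )
    foreach ($server in $PrimaryServers) {
        # Collect first: avoid nesting Exchange cmdlets in one pipeline
        $mounted = @(Get-MailboxDatabaseCopyStatus -Server $server |
            Where-Object { $_.Status -eq 'Mounted' })
        foreach ($copy in $mounted) {
            # Switch over each database whose active copy is on this primary server
            Move-ActiveMailboxDatabase -Identity $copy.DatabaseName -ActivateOnServer $SecondaryServer `
                -MountDialOverride Lossless -Confirm:$false
        }
    }
    foreach ($server in $PrimaryServers) {
        # Every copy left on the primary servers is now passive; suspend them all
        $copies = @(Get-MailboxDatabaseCopyStatus -Server $server)
        foreach ($copy in $copies) {
            Suspend-MailboxDatabaseCopy -Identity "$($copy.DatabaseName)\$server" `
                -SuspendComment 'Primary DC maintenance' -Confirm:$false
        }
    }
}
# Usage (hypothetical names):
#   Move-DatabasesToSecondary -PrimaryServers 'MBX1','MBX2' -SecondaryServer 'MBX3'
```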

-------- move witness to the secondary DC--------------------------------
The DAG has a file share witness configured; I can’t see a way to fail over to a single server without it, so it has to move to the secondary DC.
Open "Failover Cluster Manager" on the secondary MBX server
Right-click the DAG > More Actions > Configure Cluster Quorum Settings
Change the witness file share to \\<secondary CAS/HUB server>\<DAGwitnesspath>\
Close "Failover Cluster Manager"
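The same quorum change can probably be made from PowerShell with the FailoverClusters module that comes with the Failover Clustering feature on 2008 R2. This is a sketch under that assumption: Set-DagWitnessShare is a hypothetical wrapper, and the DAG name and share path are placeholders. Note that Exchange also keeps its own witness settings on the DAG object (WitnessServer and WitnessDirectory, changed with Set-DatabaseAvailabilityGroup), which this cluster-level change does not update.

```powershell
# Hypothetical wrapper; assumes the FailoverClusters module is available.
function Set-DagWitnessShare {
    param(
        [string]$DagName,        # cluster name of the DAG
        [string]$WitnessShare    # e.g. \\<secondary CAS/HUB server>\<share>
    )
    Import-Module FailoverClusters
    # Same change as the wizard: Node and File Share Majority with the new share as witness
    Set-ClusterQuorum -Cluster $DagName -NodeAndFileShareMajority $WitnessShare
    # Return the resulting quorum configuration so it can be checked
    Get-ClusterQuorum -Cluster $DagName
}
# Usage (hypothetical names):
#   Set-DagWitnessShare -DagName 'DAG1' -WitnessShare '\\CASHUB3\DAG1Witness'
```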

-------- Move DAG owner to secondary-------------------------------------
The DAG has its own IP address, and one MBX server owns the cluster core group; ownership must be moved to the secondary MBX server. From a command prompt (the core group is normally named "Cluster Group"):
cluster group "<cluster group name>" /moveto:<secondary MBX server>
At this point, check that the DAG IP address has moved to the secondary DC network, is online and responds to ping.
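A PowerShell version of the move and the checks, as a sketch assuming the FailoverClusters module and the default core group name "Cluster Group". Move-DagCoreGroup is a hypothetical wrapper and the names and IP are placeholders. It reports which IP Address resources in the core group are online rather than requiring all of them, since a DAG spanning two subnets normally has one IP resource per subnet and only one is online at a time.

```powershell
# Hypothetical wrapper; assumes the FailoverClusters module. Names and IP are placeholders.
function Move-DagCoreGroup {
    param(
        [string]$DagName,        # cluster name of the DAG
        [string]$TargetNode,     # the secondary MBX server
        [string]$DagIpAddress    # the DAG's IP address on the secondary DC subnet
    )
    Import-Module FailoverClusters
    # "Cluster Group" is the default name of the cluster core group
    Move-ClusterGroup -Cluster $DagName -Name 'Cluster Group' -Node $TargetNode | Out-Null
    # IP Address resources in the core group that are now online
    $online = @(Get-ClusterResource -Cluster $DagName | Where-Object {
        $_.OwnerGroup.Name -eq 'Cluster Group' -and
        $_.ResourceType.Name -eq 'IP Address' -and
        $_.State -eq 'Online' })
    New-Object PSObject -Property @{
        OnlineIpResources = @($online | ForEach-Object { $_.Name })
        Pingable          = [bool](Test-Connection -ComputerName $DagIpAddress -Count 2 -Quiet)
    }
}
```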

-------- Shut down primary DC servers-----------------------------------------
Shut down the CAS/HUB servers in the primary DC
Once they are down, shut down the MBX servers in the primary DC
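The shutdown order can be scripted as well. A rough sketch: Stop-ServersAndWait is a hypothetical helper, it assumes you have rights to shut the servers down remotely, and it treats "no longer answers ping" as "shut down", which is only an approximation of a clean shutdown.

```powershell
# Hypothetical helper; server names are placeholders and remote shutdown rights are assumed.
function Stop-ServersAndWait {
    param(
        [string[]]$ComputerName,
        [int]$TimeoutSeconds = 900,
        [int]$PollSeconds = 10
    )
    Stop-Computer -ComputerName $ComputerName -Force
    $deadline = (Get-Date).AddSeconds($TimeoutSeconds)
    # Poll until none of the servers answers ping (a rough check only)
    while (@($ComputerName | Where-Object { Test-Connection -ComputerName $_ -Count 1 -Quiet }).Count -gt 0) {
        if ((Get-Date) -gt $deadline) { throw "Timed out waiting for $($ComputerName -join ', ') to shut down" }
        Start-Sleep -Seconds $PollSeconds
    }
}
# Usage, CAS/HUB first and then MBX (hypothetical names):
#   Stop-ServersAndWait -ComputerName 'CASHUB1','CASHUB2'
#   Stop-ServersAndWait -ComputerName 'MBX1','MBX2'
```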

-------- Check database status------------------------------------------------
The databases on the secondary MBX server will now be dismounted, because with the two primary nodes shut down the remaining node has lost cluster quorum.
Check with Get-MailboxDatabaseCopyStatus -Server <secondary MBX server>
Open "Failover Cluster Manager" on the secondary MBX server
Force the cluster back online: right-click the DAG, choose "Force Cluster Start" and accept the warning
Now check again that all databases are mounted on the secondary MBX server.
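As a command-line alternative to the "Force Cluster Start" option, the cluster service can be restarted with the /forcequorum switch on the secondary MBX server and the databases checked afterwards. This is a sketch only: Start-DagForceQuorum is a hypothetical wrapper, the 60-second wait is an arbitrary assumption, and it reports (but does not mount) any database copy that is still not Mounted.

```powershell
# Hypothetical wrapper; run ON the secondary MBX server. $Server is a placeholder.
function Start-DagForceQuorum {
    param(
        [string]$Server,          # the secondary MBX server
        [int]$WaitSeconds = 60    # arbitrary settle time before checking databases
    )
    # Restart the local cluster service, forcing quorum on this single node
    net stop clussvc | Out-Null          # may report an error if the service is already stopped
    net start clussvc /forcequorum | Out-Null
    if ($LASTEXITCODE -ne 0) { throw "net start clussvc /forcequorum failed ($LASTEXITCODE)" }
    Start-Sleep -Seconds $WaitSeconds
    # Return the names of any database copies on this server that are not mounted
    $copies = @(Get-MailboxDatabaseCopyStatus -Server $Server)
    @($copies | Where-Object { $_.Status -ne 'Mounted' } | ForEach-Object { $_.DatabaseName })
}
```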

-------- Failback-----------------------------------------------------------------
Once the primary data centre is back online and all its servers are available, resume the database copies on the primary MBX servers:
Resume-MailboxDatabaseCopy <db identity>\<MBX server>

You can check the health status (copy and replay queues plus content index state) of all database copies with
Get-MailboxServer | Get-MailboxDatabaseCopyStatus
Depending on the length of the downtime and the activity on your Exchange system, some of these queues may grow large, and it can take some time to replicate all changes back before the databases can be moved back to the primary.
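Resuming the copies and waiting for them to catch up can be wrapped up as follows. This is a minimal sketch: both functions are hypothetical helpers, server names are placeholders, and "caught up" is assumed to mean a Healthy copy with empty copy and replay queues and a Healthy content index, i.e. the checks mentioned above.

```powershell
# Hypothetical helpers for the Exchange Management Shell; server names are placeholders.
function Resume-PrimaryCopies {
    param([string[]]$PrimaryServers)
    foreach ($server in $PrimaryServers) {
        # Collect first, then act, to avoid nesting Exchange cmdlets in one pipeline
        $suspended = @(Get-MailboxDatabaseCopyStatus -Server $server |
            Where-Object { "$($_.Status)" -like '*Suspended' })
        foreach ($copy in $suspended) {
            Resume-MailboxDatabaseCopy -Identity "$($copy.DatabaseName)\$server" -Confirm:$false
        }
    }
}

# True when every copy on the given servers is Healthy, has empty copy and replay
# queues, and a Healthy content index
function Test-CopiesCaughtUp {
    param([string[]]$PrimaryServers)
    $lagging = @(foreach ($server in $PrimaryServers) {
        Get-MailboxDatabaseCopyStatus -Server $server | Where-Object {
            $_.Status -ne 'Healthy' -or $_.CopyQueueLength -gt 0 -or
            $_.ReplayQueueLength -gt 0 -or $_.ContentIndexState -ne 'Healthy' }
    })
    $lagging.Count -eq 0
}
# Usage (hypothetical names):
#   Resume-PrimaryCopies -PrimaryServers 'MBX1','MBX2'
#   while (-not (Test-CopiesCaughtUp -PrimaryServers 'MBX1','MBX2')) { Start-Sleep -Seconds 300 }
```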

Once all database copies are healthy and their queues have drained, the databases can be moved back to the primary MBX servers (using the lossless option if possible).
The CAS DNS alias can be moved back to the primary CAS cluster.
The file share witness can be moved back to one of the primary CAS/HUB servers.
The failover cluster group should be moved back to one of the primary DC MBX servers; check that the DAG IP address is online afterwards.
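Moving the databases back can reuse the same cmdlet as the initial switchover. A minimal sketch: Move-DatabasesToPrimary is a hypothetical helper, and $Placement is an assumed map of database name to the primary MBX server that should host its active copy, so you decide the distribution explicitly.

```powershell
# Hypothetical helper for the Exchange Management Shell. $Placement maps
# database name -> primary MBX server that should host the active copy.
function Move-DatabasesToPrimary {
    param([hashtable]$Placement)
    foreach ($db in @($Placement.Keys)) {
        # Lossless: only activate the target copy if no log data would be lost
        Move-ActiveMailboxDatabase -Identity $db -ActivateOnServer $Placement[$db] `
            -MountDialOverride Lossless -Confirm:$false
    }
}
# Usage (hypothetical names):
#   Move-DatabasesToPrimary -Placement @{ 'DB1' = 'MBX1'; 'DB2' = 'MBX2' }
```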
