Monday, 17 December 2012

VMware host alert won't clear

Had a little issue with several ESXi hosts after they were shut down for a power down. When powered back up, several (not all) had a connection and power state alarm which wouldn't go away, even though the cluster was all good.

To clear the alarm I had to go to the parent object where the alarm was defined, edit its settings, disable it and save. Then I edited it again and re-enabled it. The alarm then cleared.

Saturday, 1 December 2012

Exchange 2010 DAG failover

I currently support a 3 node mailbox (MBX) DAG where 2 nodes are in one data centre (I’ll call it the primary) and 1 is in a second data centre (I’ll call it the secondary.)  Also in the primary is a pair of load balanced CAS/HUB servers and in the secondary a single CAS/HUB server.  There is a DNS alias for the CAS that clients use for access.  All servers are Windows 2008 R2.

Occasionally I have had planned downtime in the primary DC where I want to move all services to the secondary, but with the majority of nodes being in the primary DC, Exchange dismounts all databases once the two nodes there are shut down. I haven’t found much help on the web for this specific scenario, so here is my version.  It does require some knowledge of Exchange DAGs and Microsoft clustering, especially as sometimes it isn’t completely plain sailing!
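The reason the databases dismount comes down to simple vote counting. Here is a minimal Python sketch of the majority rule, assuming one vote per DAG member plus an optional vote for the file share witness (the function name and the vote counts are mine, for illustration only):

```python
def has_quorum(votes_online: int, total_votes: int) -> bool:
    """A cluster keeps quorum only while a strict majority of votes is online."""
    return 2 * votes_online > total_votes


# Three DAG members with one vote each: 2 in the primary DC, 1 in the secondary.
TOTAL_NODE_VOTES = 3
print(has_quorum(3, TOTAL_NODE_VOTES))  # all three nodes up
print(has_quorum(1, TOTAL_NODE_VOTES))  # only the secondary node left

# Assumption: a witness vote in the secondary DC makes 4 votes in total.
# The secondary node plus the witness is 2 of 4, which is still not a majority.
TOTAL_WITH_WITNESS = 4
print(has_quorum(2, TOTAL_WITH_WITNESS))
```

Under these assumptions the single secondary node never holds a majority on its own, which fits with having to force the cluster online later in the procedure.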

**NOTE I find it more reliable to use the Exchange Management Shell (PowerShell) to perform the following Exchange tasks, as the GUI does not always refresh consistently and promptly.

-------- move CAS--------------------------------------------------------------
Change the CAS DNS alias to the secondary CAS/HUB server
I have the TTL for my alias set to 5 minutes, so all clients should be connected to the secondary after 5 minutes at most.

-------- move mailbox databases to the secondary MBX server---------
Move all active mailbox databases to the secondary MBX server (using the lossless option if possible), once per database
Move-ActiveMailboxDatabase <db identity> -ActivateOnServer <secondary MBX server> -MountDialOverride Lossless

Suspend all database copies on primary MBX servers
Suspend-MailboxDatabaseCopy <db identity>\<MBX server>

-------- move witness to the secondary DC--------------------------------
The DAG has a file share witness configured; I can’t see a way of failing over to a single server without it
Open "Failover Cluster Manager" on the secondary MBX server
Right-click the DAG > More Actions > Configure Cluster Quorum Settings
Change the witness file share to \\<secondary CAS/HUBserver>\<DAGwitnesspath>\
Close "Failover Cluster Manager"

-------- Move DAG owner to secondary-------------------------------------
The DAG has an IP address and one MBX server will be its owner; ownership must be moved to the secondary
cluster group "<cluster group name>" /MoveTo:<secondary MBX server>
At this point check that the DAG IP address has moved to the secondary DC network and is online and pingable

-------- Shut down primary DC servers---------------------------------------
Shut down the CAS servers in the primary DC
Once complete, shut down the MBX servers in the primary DC

-------- Check database status------------------------------------------------
Databases will now be dismounted on the secondary MBX server due to the primary two nodes shutting down
Check with Get-MailboxDatabaseCopyStatus -Server <secondary MBX server>
Open "Failover Cluster Manager" on secondary MBX server
Force the cluster back online by right-clicking the DAG and using the "force cluster start" option, then accept the warning
Now check again that all databases are mounted on secondary MBX.

-------- Failback-----------------------------------------------------------------
Once the primary datacentre is online again and all the servers are available then resume database copies on the primary MBX servers
Resume-MailboxDatabaseCopy <db identity>\<MBX server>

You can check the health status (copy and replay queues plus content index state) of all database copies with
Get-MailboxServer | Get-MailboxDatabaseCopyStatus
Depending on the amount of downtime and activity on your Exchange system some of these queues may get large and require some time to replicate all changes back before the databases can be moved back to the primary
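As a rough way of deciding when the copies are ready, here is a small Python sketch of the check I do by eye. It assumes the copy status values have already been exported somehow into dictionaries; the property names match the columns reported by Get-MailboxDatabaseCopyStatus, but the database and server names, the sample numbers and the zero-queue threshold are all hypothetical:

```python
def ready_to_move_back(copy: dict, max_queue: int = 0) -> bool:
    """True when a passive copy looks caught up enough to be activated."""
    status_ok = copy["Status"] in ("Healthy", "Mounted")
    queues_drained = (
        copy["CopyQueueLength"] <= max_queue
        and copy["ReplayQueueLength"] <= max_queue
    )
    index_ok = copy["ContentIndexState"] == "Healthy"
    return status_ok and queues_drained and index_ok


# Hypothetical sample data, not output captured from a real system.
copies = [
    {"Name": "DB01\\MBX-PRI-1", "Status": "Healthy",
     "CopyQueueLength": 0, "ReplayQueueLength": 0,
     "ContentIndexState": "Healthy"},
    {"Name": "DB02\\MBX-PRI-2", "Status": "Healthy",
     "CopyQueueLength": 0, "ReplayQueueLength": 412,
     "ContentIndexState": "Crawling"},
]

for copy in copies:
    print(copy["Name"], "ready" if ready_to_move_back(copy) else "wait")
```

A stricter or looser threshold can be passed as max_queue; I would rather wait for the queues to drain fully before moving anything back.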

Once all database copies are complete then they can be moved back to the primary MBX servers (using the lossless option if possible)
The CAS DNS alias can be moved back to primary CAS cluster
The file share witness can be moved back to one of the primary CAS/HUB servers
The failover cluster group should be moved back to one of the primary DC MBX servers, checking the IP address is online afterwards

Sunday, 21 October 2012

Short filenames can be painful

So I migrated data on a Windows 2008 file server this weekend and it seemed pretty straightforward until I checked services and found the patching proxy service wouldn't start. The service uses the Squid proxy, and checking the log file showed that it uses the 8.3 DOS short filenames for the path to its config file. I did dir /x and found two program file paths, Program Files (x86) - PROGRA~1 and Program Files - PROGRA~2. They must have switched short names when copying; I can only assume the x86 folder was originally created first on the server. It was pretty annoying but thankfully pretty easy to switch them around using "fsutil file setshortname". I can now enjoy the rest of my Sunday!
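To show why the order of creation matters, here is a deliberately simplified Python model of short name assignment. It is not the real NTFS algorithm (which has extra rules for special characters and for many collisions), just a sketch assuming "first six usable characters plus ~N, lowest free N wins"; the function name is made up:

```python
import re


def assign_short_names(folders_in_creation_order):
    """Simplified model: first six alphanumerics, upper-cased, plus ~N."""
    used = set()
    result = {}
    for name in folders_in_creation_order:
        stem = re.sub(r"[^A-Za-z0-9]", "", name)[:6].upper()
        n = 1
        # The first folder to arrive gets ~1, the next colliding one gets ~2.
        while f"{stem}~{n}" in used:
            n += 1
        short = f"{stem}~{n}"
        used.add(short)
        result[name] = short
    return result


print(assign_short_names(["Program Files", "Program Files (x86)"]))
print(assign_short_names(["Program Files (x86)", "Program Files"]))
```

The same two folders end up with swapped short names depending purely on which one is created (or copied) first, so anything that hard-codes PROGRA~1 or PROGRA~2 can break after a copy.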

Monday, 9 July 2012

HP network config utility issue

Bit of a strange one configuring NIC teams on four DL380s. Two worked fine. Two kept giving me an error "error occured when making a call into the operating system." I tried uninstalling then reinstalling the NCU software but that didn't fix it. Finally I got it working by unticking all the NIC services and protocols except the HP teaming service, which then allowed me to create the teams.  I didn't find out what the actual problem was, but this seemed a fairly straightforward workaround as the teaming software does this anyway when creating the team.

Tuesday, 15 May 2012

VM Templates and Windows Licensing

I recently had a frustrating problem cloning a VM. It just wouldn't sysprep using either the customisation tool in vSphere or manually in Windows. Doing a little digging I discovered the rearm count was down to zero. Presumably the VM was a clone of a clone of a clone! (This VM has been around longer than I have here!)  There's a useful command to check the rearm count if you are planning on cloning a machine and aren't sure of its history:

Cscript c:\windows\system32\slmgr.vbs /dlv
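If you want to check a batch of candidate templates, the count can be pulled out of the saved /dlv text. A minimal Python sketch, assuming the output contains a line of the form "Remaining Windows rearm count: N" (the sample text below is illustrative, not captured from a real machine, and the function name is mine):

```python
import re
from typing import Optional


def remaining_rearm_count(dlv_output: str) -> Optional[int]:
    """Return the rearm count from slmgr /dlv text, or None if not found."""
    match = re.search(r"Remaining Windows rearm count:\s*(\d+)", dlv_output)
    return int(match.group(1)) if match else None


# Illustrative sample only; real output has many more lines.
sample = """License Status: Licensed
Remaining Windows rearm count: 0
"""

count = remaining_rearm_count(sample)
if count == 0:
    print("Rearm count exhausted, sysprep is likely to fail on this VM")
```

Returning None when the line is missing keeps "no count found" separate from a genuine zero.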

I did find this blog, which is useful:

http://option9.blogspot.co.uk/2009/06/getting-around-windows-rearm-limit-with.html

However, I couldn't get the SkipRearm registry key or the XML update to work. Luckily I had an alternative machine I could clone this time...

Friday, 11 May 2012

Windows 7 VMs on ESXi

Something to remember: if you have any Windows 7 VMs running on ESXi, they run a whole load smoother if you turn the Aero themes off!  Especially when connecting via RDP.