We have a stretched DAG across two sites, with one multi-role server in each site running Exchange 2010 SP2 UR6 on Windows Server 2008 R2 SP1. The witness server is in the primary site, the alternate witness server is in the secondary site, and a firewall sits between the two sites.
All appeared to be well with our DAG until our lovely firewall people decided to remove the ANY/ANY rules that existed between the Exchange servers, and between the Exchange servers and the DCs, to see if we could run Exchange with restrictive firewall rules in place. During this period the servers were rebooted while traffic was being blocked by the firewall, and I now believe that we have a split-brain scenario! The ANY/ANY rules are back in place, but we are faced with this situation:
The mail database has a copy status of MOUNTED on the primary Exchange server but FAILED on the secondary server.
If I try to manage the DAG through EMC from either server, I receive error messages along the lines of ‘GetDAGNetworkConfig Failed. The NetworkManager has not yet been initialized.’ If I run Get-DatabaseAvailabilityGroup | fl in EMS, the OperationalServers, PrimaryActiveManager and Networks settings are empty.
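For anyone wanting to compare output, here is a minimal sketch of the EMS status checks relevant to the symptoms above. DAG1 and EXCH02 are placeholder names (assumptions), not our real ones. As far as I understand, OperationalServers and PrimaryActiveManager are only populated when the -Status switch is used, so the plain | fl output may show them empty even on a healthy DAG.

```powershell
# Placeholder names (assumptions): replace with your own DAG and server names.
$dagName = "DAG1"
$secondaryServer = "EXCH02"

# -Status requests real-time values; without it, OperationalServers and
# PrimaryActiveManager are returned empty.
Get-DatabaseAvailabilityGroup -Identity $dagName -Status |
    Format-List Name, Servers, OperationalServers, PrimaryActiveManager, WitnessServer, AlternateWitnessServer

# Copy status of each database copy hosted on the secondary server.
Get-MailboxDatabaseCopyStatus -Server $secondaryServer |
    Format-List Name, Status, CopyQueueLength, ReplayQueueLength, ContentIndexState

# Built-in replication health checks for the same server.
Test-ReplicationHealth -Identity $secondaryServer
```

If the -Status query itself errors out, that would point at Active Manager or the cluster service rather than at a display quirk.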
If I run Failover Cluster Manager on the secondary server, the Secondary Node is marked as unavailable. Cluster Events are riddled with the following events:
1196/1579 (events for both primary and secondary servers) – DNS operation refused (I think I’ve resolved this by deleting and recreating the DNS record for the DAG and enabling the option to allow any authenticated user to update DNS records with the same owner name).
1069 – Cluster resource ‘IPV4 Static Addresses 2 (cluster group)’ in clustered service or application cluster group failed.
1564 – The file share witness resource failed to arbitrate the file share…
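The cluster side can also be checked from PowerShell rather than Failover Cluster Manager. This is only a sketch, assuming the FailoverClusters module that ships with the failover clustering feature on Windows Server 2008 R2, run on one of the DAG members:

```powershell
# Load the failover clustering cmdlets.
Import-Module FailoverClusters

# State of each cluster node (for example Up or Down).
Get-ClusterNode | Format-Table Name, State -AutoSize

# State of the cluster resources, including the IP address and witness resources.
Get-ClusterResource | Format-Table Name, State, OwnerGroup, ResourceType -AutoSize

# Current quorum configuration and the witness resource, if one is in use.
Get-ClusterQuorum | Format-List *
```

Comparing the output from both nodes should show whether each side has a different view of the cluster membership.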
Windows Event logs include:
Primary Server Application Log 4123 – Failed to get the boot time of witness server. The RPC server is unavailable.
Primary Server Application Log 4082 – Active Manager Failed.
Secondary Server Application Log 4113 – Database redundancy health check failed.
Secondary Server Application Log 2060 – Exchange replication service encountered a transient error, NetworkManager has not been initialized.
There are only a few mailboxes on the database as the service is not live at the moment, so this is a positive. Would it be best to remove the database copy from the secondary server, destroy the DAG and recreate it?
Any tips would be REALLY appreciated!
Andrew