Whenever you develop a disaster recovery plan, make SURE you keep 3 backups, especially if you are not experienced with developing (not just maintaining) an enterprise-level backup system. I got bitten by this in the past week. It has been an experience well earned, but at a price.
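The post doesn't spell out what "3 backups" looks like day to day. One common reading is a simple rotation that always retains the three newest backup sets and flags when you have fewer than that. Here is a minimal Python sketch under that assumption. The function name `plan_retention` and the dates are hypothetical, and this is an illustrative policy, not a feature of any backup product.

```python
from datetime import date

# Assumption: "keep 3 backups" is modelled as "always retain the three
# newest dated backup sets". Illustrative policy only.
KEEP = 3


def plan_retention(backup_dates, keep=KEEP):
    """Return (kept, pruned, short) for a list of backup dates.

    kept:   the `keep` newest distinct dates, newest first
    pruned: older dates that could be rotated out
    short:  True when fewer than `keep` backups exist (the risky case)
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(set(backup_dates), reverse=True)  # newest first
    kept, pruned = ordered[:keep], ordered[keep:]
    return kept, pruned, len(kept) < keep


if __name__ == "__main__":
    # Made-up dates for illustration only.
    history = [date(2024, 1, 1), date(2024, 1, 8), date(2024, 1, 15), date(2024, 1, 22)]
    kept, pruned, short = plan_retention(history)
    print("keep:", [d.isoformat() for d in kept])
    print("prune:", [d.isoformat() for d in pruned])
    print("short of backups:", short)

    # A single backup, the situation described in this post, is flagged.
    print(plan_retention([date(2024, 1, 1)])[2])  # True
```

The `short` flag is the point of the exercise: a rotation that silently runs with only one good copy is exactly how you end up restoring from a single two-week-old backup.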
The main network I support has no test network, nor do I have one (yet) at home. I consider having one extremely important for the rest of my career. As soon as I get the chance, I'm going to look through the classifieds and pick up three cheap Pentium III boxes. That will let me see most of the Active Directory replication behavior I need as I work through my book kit.
Here is what happened. I have a two-server domain setup (plus our Linux web server). I have been performing disaster recovery/backup testing to make sure we don't go out of service because of a duff hard drive (the drives are now mirrored). The domain is running in Windows 2000 native mode, but the first domain controller still holds the "master" roles (the operations master, or FSMO, roles). I took that controller offline while I brought up a different box, on different hardware, from the backup.
One of the mistakes I made was joining the new box to the domain and then performing the restore. Don't do this. Boot directly into Directory Services Restore Mode and restore from there, nowhere else. After the restore I realized that things were not working as they should. The machine name was there and working, BUT nothing that depends on Active Directory (roaming profiles, for example) was accessible. Nor would the antivirus server software start its MS SQL engine; I haven't spent enough time on that to figure out why.
After some digging, I realized that removing the master DC's computer object from the domain might get things going. I did not want to do that with a semi-functioning backup, so I took the backup machine offline. When I brought the original master DC back online, the exact same login problem appeared. The antivirus was fine, DNS was resolving, and DHCP was back in action after a reauthorization. I removed the computer object from the Active Directory store, created a new one with the same permissions, and voila, everything is fully functional from the users' side.
The remaining problem is Active Directory itself, which is unhealthy at the moment. From box 1 (the master DC) I can no longer manage its own AD, although I can manage box 2's. A message appears about switching from domain "." to domain contoso.com, which then lets me manage AD through box 2. From box 2 I can manage both domain controllers.
In a real-world disaster recovery, I would have forced box 2 to become the master by seizing the operations master roles. Doing that, though, means box 1 could never be brought back onto the network. That is not what I want, because I was merely "testing" disaster recovery. To the best of my knowledge, the proper steps for testing recovery from complete hardware failure are here:
Some other useful links I have come across: here, and here.
At this point I will be seeking help either from Microsoft or from forums to get my AD back up to snuff.
It's all a work in progress...
Here is what I've figured out: my most recent (and only) backup of our primary DC, from two weeks ago, also holds a full backup of our secondary DC, so I can perform a non-authoritative restore of the system state on each DC. Hopefully that will fix the AD store. I am going to do that this evening and post my results afterwards.
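A non-authoritative restore puts a DC back to the state in its backup and then lets normal replication from its partners bring it forward. The reason the partners' newer changes win is Active Directory's per-attribute conflict rule: higher version number wins, then the later timestamp, then the higher originating DC GUID. The toy Python model below illustrates only that rule. It deliberately leaves out USNs, up-to-dateness vectors and invocation IDs, compares GUIDs as plain strings, and uses a hypothetical attribute and values. The "authoritative" branch simply bumps the version number by an illustrative amount to show why the restored value would win in that case instead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stamp:
    """Simplified per-attribute replication metadata (toy model)."""
    version: int      # incremented on every originating write
    timestamp: int    # time of the originating write (illustrative units)
    origin_guid: str  # originating DC; a plain string here, not a real GUID

    def key(self):
        # Conflict order: version, then timestamp, then originating DC.
        return (self.version, self.timestamp, self.origin_guid)


def resolve(local, incoming):
    """Return whichever (value, Stamp) pair wins conflict resolution."""
    return incoming if incoming[1].key() > local[1].key() else local


def replicate(restored, partner):
    """Merge a partner's attributes into a restored DC's copy (toy model)."""
    merged = dict(restored)
    for attr, pair in partner.items():
        merged[attr] = resolve(merged[attr], pair) if attr in merged else pair
    return merged


if __name__ == "__main__":
    # Backup taken while the hypothetical attribute was at version 3.
    backup = {"description": ("old value", Stamp(3, 1000, "dc1"))}
    # After the backup, box 2 recorded a newer change.
    partner = {"description": ("new value", Stamp(4, 2000, "dc2"))}

    # Non-authoritative restore: the restored copy keeps its old stamp,
    # so the partner's newer change replicates in.
    print(replicate(backup, partner)["description"][0])  # new value

    # Authoritative restore (modelled): version bumped well above the
    # partner's (bump size illustrative), so the restored value wins.
    bumped = {k: (v, Stamp(s.version + 100000, s.timestamp, s.origin_guid))
              for k, (v, s) in backup.items()}
    print(replicate(bumped, partner)["description"][0])  # old value
```

In this model, restoring both DCs non-authoritatively from the same two-week-old backup just rolls both back to that point; any change made since then exists only if some other copy still holds it.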