A Lesson in Business Continuity
 
 
	
	
		On Friday, 6/29/2012, a derecho, a thunderstorm with sustained high 
		straight line winds, hit Virginia, Maryland, DC, Ohio, and some other 
		states.  Large trees and limbs were toppled, taking power and 
		communication lines down with them.  Millions were without power and 
		numerous deaths were attributed to the storm.  Few businesses were 
		*directly* impacted, though their electric, Internet, and phone 
		providers were.
	
 
		I am not a fan of the term disaster recovery.  When you hear that, you 
		think about a meteor taking out the whole building, not a key person 
		having appendicitis or a derecho.  I like to think in terms of business 
		continuity.  Business continuity is all about being able to operate, 
		perhaps in a degraded fashion, when things go wrong.
	
 
		We did not have a whole lot of warning.  The timing for many businesses 
		in the Washington, DC area was "good."  The storm hit my home and office 
		at 10:30PM on Friday 6/29 and was gone by 11:30PM.  The next week 
		included the July 4 holiday on a Wednesday, so lots of people were 
		taking time off that week anyway, though Iron Horse was supposed to be 
		open and all its personnel were in town.
	
 
		This taught me the following lessons:
	
 
		(1) Even what you think is a well-laid plan will not work if something 
		else it depends on is not available.  The basement of my home flooded 
		because the torrential rains filled up the basement stairwell. 
		Unfortunately, the sump pump at the bottom could not pump the water out 
		because it had no power.  I also could not see what was happening 
		because my six year old had had fun with some of our flashlights and 
		they had run down, the bulbs had burnt out, or he forgot where he put 
		them after taking them from where they were supposed to be.
	
 
		(2) Even if you test your plan, and everyone knows the plan, some of it 
		might fail.  We had those flashlights working a couple of months 
		before.  Though my six year old knew about the plan, those flashlights 
		were just an irresistible draw.
	
 
		(3)  Assessment is a necessary first step before you attempt to remedy a 
		problem, but remedies and even attempts at assessment can create further 
		problems...  When I first opened the stairwell door to check on the 
		water, the power was operating and I could see everything was OK.  The 
		second time, the power was out and I could not see clearly.  I opened 
		the door and a wall of water came in.  Oops!
	
 
		(4) Be safe.  Take care of your people.  I wanted to see how bad stuff 
		was and maybe bail out the stairwell, but my wife (wisely) did not let 
		me go out in the storm.  I was upset at having to mop up the basement 
		and did not want any more flooding, but going out in the storm would not 
		have been one of my brighter moves.
	
 
		(5)  Think creatively.  I had two uninterruptible power supplies in my 
		house.  One was connected to my TV, cable, and Internet.  They continued 
		working when the lights went out so we found out what was happening. But 
		that one ran down quickly.  The other one was connected to a computer 
		that was powered off.  I picked up that UPS and put it on a stable stool 
		next to the stairwell door and plugged it into the GFI circuit (no, I do 
		not have a death wish and am very careful with electricity!) on the wall 
		that had the sump pump plugged in.  Then I plugged the sump pump into 
		it.  Yeah!  The water immediately pumped out of the stairwell!
	
 
		(6)  Give up and take care of yourself.  It was late at night.  The 
		storm had passed.  I had sucked the water out of the stairwell.  We had 
		no power and I could not see whether other damage had been done.  So the 
		wife, Fluppy the Puppy, my six year old, and I went to sleep in one 
		bed.  Maybe sleep is not the right word to use with a six year old and a 
		dog in your bed....
	
 
		(7)  Use what works.  I could have used my watch or my phone to work as 
		an alarm clock, but six year olds tend to get up early anyway....
	
 
		(8)  Luck does not hurt.  Our power was on again by 7AM, but many people 
		in our area did not have power for a week.  It was also the weekend, so 
		work was not an issue.
	
 
		(9)  Gather information and reassess as necessary.  I found out the area 
		around my office had been especially hard hit by searching the 
		Internet.  One man was killed by a falling tree on a major road nearby.  
		I knew the entire area would be a mess.  It being Saturday at that 
		point, I resolved not to even try to get to work even though my web site 
		and e mail were down.  Getting in the way of the work and emergency 
		crews was a bad idea.  By Sunday morning, the connection to the office 
		was back up and so were the web site and e mail.  Reports from the area 
		were still bad, so my staff and I stayed away on Monday and worked from 
		home.
	
 
		(10)  Fixes can cause their own problems or require new plans.  When our 
		street got its power I started hearing a loud hum like that of a high 
		power piece of equipment.  It was not in my house.  At the end of my 
		street, a downed wire was arcing and sending flames 20 or more feet into 
		the air.  The fire department came and closed off the street, but the 
		power crews were not able to cut the power for over an hour.  Cutting 
		that power blacked out part of my neighborhood for many days.
	
 
		(11)  Pool your resources.  One of our friends was in an area without 
		power for days.  The following week was unbearably hot, so we invited 
		her family over and they charged up their cell phones and tablets, slept 
		in our air conditioning, and used our Internet connection (she often 
		works from home, but it had no power).
	
 
		(12)  Travel may not be an option, so teleworking can save the day. 
		Trees were down everywhere.  Power lines were down.  Stoplights were 
		dark.  Travel was iffy, especially in the area near the Iron Horse 
		offices.
	
 
		(13)  Other people can make their problems yours.  When I finally got 
		back in to work, I found a very large tree had snapped off about 20 feet 
		up and fallen on the roof of the business located directly above Iron 
		Horse.  We had to move our cars out of the way of the crane coming in 
		and prepare for the possibility of a flood upstairs flooding us as well, 
		or of the tree or part of it falling onto our brand new heat pump when 
		they tried to remove it.  Fortunately none of that happened, but we 
		prepared for it.
	
 
		(14)  It may not be over when it is over.  Power blinked on and off at 
		the office multiple times like it did at my house as the storm went 
		through.  There were also overvoltages which my uninterruptible power 
		supplies (UPSs) handled.  Unlike my house, the power never failed for an 
		extended period, so my UPSs kept all the equipment working.  However, 
		the power did blink on and off for days after the storm.  I 
		observed it happening at my desk, but at least the UPSs kept us 
		working.  There were brownout (low voltage) events as well.  These mini 
		outages and brownouts probably occurred as crews in the area powered on 
		parts of nearby grids.  Brownouts, blackouts, and overvoltages can cause 
		hardware damage or data corruption.  Just a few days ago, the UPS that 
		had protected the machine at my desk registered a battery failure, so 
		now I need to replace its batteries.  Still, it did its job.  On 
		occasion, I have seen UPSs fail because, though the electronics read the 
		batteries as being OK, the batteries were not and failed.  [If you are wondering 
		at this point whether your UPS is up to the task, ask us.]
	
 
		(15)  Backup and restore is not just about computers; it is about 
		people, communications, places to work, and other resources as well. 
		When our Internet link failed our phone lines went with it, but they 
		were automatically redirected to cell phones and we were able to do much 
		of our work at home.
	
 
		(16)  Emergency services may not be available.  When an entire area gets 
		clobbered, you cannot count on emergency services being available to 
		you.  They may be otherwise engaged, be working a larger or more urgent 
		issue, or may have their own issues.  Fairfax County citizens not only 
		lost their dial tone, but the 911 service and its backup also got 
		knocked out.  Yes, it was not supposed to happen....  This is a very 
		good reason for intentionally delaying recovery efforts as part of a 
		continuity plan.  If you get in the way of emergency crews or you need 
		their assistance, you have a big problem.  Sometimes just sitting back 
		and saying, "We're hosed.  Everyone's hosed.  Let's relax." is best.
	
 
		(17)  Sometimes issues cascade and you must deal with those.  Right 
		after the power went down the temperatures jumped up.  This not only 
		hampered emergency crews, but also made it impossible to safely use 
		many electronic devices.  It was simply too hot and humid for them to 
		operate.  Computers do not behave well in hot, humid, un-air conditioned 
		environments.  The heat was so high it bent some train tracks and took 
		out some electrical switch gear.  High heat and lack of air conditioning 
		made it imperative that many people find cool shelter.
	
 
		There were some other failures of equipment.  A local movie theater with 
		power had a lot of disgruntled customers after the storm because they 
		could not show some of their movies.  That was because movie theaters 
		now get digital copies of the movies they show and their servers and 
		networks were "acting up."  In other words, they did not have proper UPS 
		protection or equipment.  Even after they got power back, recovery 
		procedures and smaller power glitches kept them from being able to do 
		business.
	
 
		(18)  Help your neighbors.  Later, they might be able to help you. After 
		checking out Iron Horse, I got a frantic call from a neighboring 
		business.  Their computers were "making terrible noises."  Turns out 
		they had failed during the power outage and the terrible noises were 
		coming from their speakers.  Turning those computers off solved the 
		problem until the user could return.
	
 
		I then decided to check out other businesses in my complex.  One 
		doctor's office was completely out of business.  Though they had power, 
		they had Verizon DSL Internet access and that was out.  Since they had 
		converted to electronic record keeping and the records were centralized 
		in another office, they could not treat any patients.  Furthermore, they 
		could not call many of their patients because many phone lines were also 
		down.
	
 
		DSL typically costs less than other connections, but the phone 
		companies do not promise the reliability that they do with other 
		connections.  Consumer grade DSL connections, like those you might have 
		at your house, have even less of a reliability promise than business 
		grade DSL connections.  None of the people on Verizon DSL in that complex 
		had Internet access for days, and when it came back up, it 
		flickered up and down due to both power and connectivity issues. 
		Millions of Verizon FiOS and traditional land line customers had no dial 
		tone either.  [If you want to talk about ways to keep your Internet and 
		phones up and running, just ask.]
	
 
		I offered what help I could (for free) to get my neighbors back on 
		track.
	
 
		(19)  Your business continuity plan might have to take into account that 
		you might have more business than usual.  Many businesses had to shut 
		down due to a lack of power, and some restaurants and gas stations had 
		to stay closed because their registers or credit card machines would not 
		work.  Those restaurants and gas stations that could stay open had 
		tremendous amounts of business because people could not cook.  A couple 
		of nearby restaurants ran out of food and had to close!
	
 
		(20)  Sometimes alternate procedures are simple.  For those businesses 
		in my complex trying to reach clients, I told them that Verizon land 
		line numbers might not work.  Even if a call did ring on their end, it might 
		not ring on the other end.  If they were able to leave a message, it was 
		fairly likely that that person might not be aware they had a message for 
		days.  If they could not get through on the land lines, I advised them 
		to call the cell phone numbers they had on file.
	
 
		I also explained to them that e mail works like picking up and sending 
		mail at the post office.  Your e mail client on your machine posts a 
		message to your e mail server.  That "post office" then contacts other 
		post offices down the line until it can deliver it to the destination 
		post office of your recipient.  At that point, the mail is considered 
		"delivered," though the recipient still has to pick up the mail. 
		However, if one of those handoffs between "post office" servers cannot 
		be made because the connection is broken, the sending server just waits 
		and tries again.  It keeps extending the wait period between retries 
		until it finally decides (usually after 5 or more days) that it cannot 
		get through and sends a message back to the sender.  Many clients will 
		keep trying to send to a server (deliver a message to the post office) 
		until they succeed or they time out (admit failure) as well.
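
		The retry behavior described above can be sketched in a few lines of 
		Python.  This is only an illustration, not how any particular mail 
		server is written.  The function names and the wait times (15 minutes 
		to start, doubling up to a 4 hour cap, giving up after 5 days) are my 
		own assumptions chosen to make the idea visible.

```python
from datetime import timedelta


def retry_schedule(first_wait=timedelta(minutes=15),
                   max_wait=timedelta(hours=4),
                   give_up_after=timedelta(days=5)):
    """Return the elapsed times at which a sending server would retry.

    All three defaults are illustrative assumptions, not settings taken
    from any real mail server.
    """
    attempts = []
    elapsed = timedelta(0)
    wait = first_wait
    while elapsed + wait <= give_up_after:
        elapsed += wait
        attempts.append(elapsed)
        wait = min(wait * 2, max_wait)  # lengthen the wait, up to a cap
    return attempts


def deliver(link_is_up, schedule):
    """Simulate one handoff between two "post office" servers.

    link_is_up(elapsed) is a hypothetical callback that says whether the
    connection works at that point in time.
    """
    if link_is_up(timedelta(0)):
        return ("delivered", timedelta(0))
    for elapsed in schedule:
        if link_is_up(elapsed):
            return ("delivered", elapsed)
    # Every retry failed, so the message goes back to the sender.
    return ("returned to sender", schedule[-1] if schedule else timedelta(0))


if __name__ == "__main__":
    schedule = retry_schedule()
    # Pretend the link comes back two days after the message was sent.
    status, when = deliver(lambda t: t >= timedelta(days=2), schedule)
    print(status, "after", when)
```

		Notice that with these assumed numbers a message sent just before a two 
		day outage is not delivered the moment the link returns.  It waits for 
		the next scheduled retry.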
	
 
		While most people think that e mail is instantaneous and an assured 
		delivery mechanism, it is not.  E mail may take days to deliver.  My 
		record is having an e mail returned as being undeliverable 35 days after 
		I sent it.  If something happens in transit an e mail may get corrupted 
		or, more likely, disappear entirely.  Anti-spam measures often make 
		valid messages disappear entirely with neither the sender nor recipient 
		knowing those messages did not get through.  Even if you send a message 
		and it gets all the way through to the recipient, it still does not mean 
		that they have actually seen it.
	
 
		This last point was especially important to the businesses I talked to.  
		Because messages may have been sent days ago and were being retried at 
		varying intervals, once a downed link comes back you are almost certain to 
		receive messages out of order.  So, a message sent 5 days ago might 
		arrive today along with something sent 5 minutes ago.  But, you would be 
		more likely to notice the 5 minute message because almost everyone looks 
		at their inbox in terms of time and checks their most recent messages. 
		The time stamp on a message is not the time of reception, but the time 
		the message was first sent.  I had to warn these businesses that they 
		needed to check for "new" messages that had been sent days in the past.
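
		For anyone comfortable with a little scripting, here is a rough sketch 
		of that check in Python.  It assumes you already have the raw text of 
		the messages that just arrived, and it flags the ones whose Date header 
		(the time the message was sent) is much older than the present moment.  
		The function name and the 12 hour threshold are my own assumptions, and 
		how you get the raw messages out of your mail program is not covered.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def find_late_arrivals(raw_messages, now, threshold=timedelta(hours=12)):
    """Return (age, subject) pairs for newly arrived messages that were
    actually sent more than `threshold` ago, oldest first.

    `now` must be a timezone-aware datetime.  The threshold is an
    arbitrary illustrative value.
    """
    late = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        if msg["Date"] is None:
            continue  # no Date header, so nothing to compare against
        sent = parsedate_to_datetime(msg["Date"])
        if sent.tzinfo is None:
            continue  # header carried no usable time zone; skip it
        age = now - sent
        if age > threshold:
            late.append((age, msg["Subject"]))
    late.sort(key=lambda item: item[0], reverse=True)
    return late


if __name__ == "__main__":
    # Two made-up messages that both showed up in the inbox this morning.
    old = ("Date: Sat, 30 Jun 2012 09:00:00 -0400\n"
           "Subject: Appointment change\n\nPlease call us.\n")
    new = ("Date: Thu, 05 Jul 2012 10:55:00 -0400\n"
           "Subject: Quick question\n\nAre you open?\n")
    now = datetime(2012, 7, 5, 11, 0, tzinfo=timezone(timedelta(hours=-4)))
    for age, subject in find_late_arrivals([new, old], now):
        print(subject, "was sent", age, "ago")
```

		Sorting the inbox by date received rather than date sent, where your 
		mail program offers it, gets you much the same result without any code.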
	
 
		(21)  Recovery is not really possible.  You will never really fully make 
		up for the time lost and the extra pain and suffering you had to go 
		through.  That is life.
	
 
		(22)  Learn something from the experience.  In writing this e mail, I 
		have considered adjusting some of my plans.  For example, I need to buy 
		more flashlights and batteries and hide them.  I have yet to have one of 
		my neighbors ask me about their issues.  I offered to help, but I cannot 
		help anyone who does not believe they need to do something. 
		[Fortunately, most of my regular clients seemed to have weathered the 
		storm nicely.]
 
		If you have to implement your business 
		continuity plan, you are going to be in some sort of pain.  A key to 
		alleviating that pain is to think ahead and plan for contingencies.  
		Iron Horse can help if you call on us.
	
 
		And, do not think that you are immune.  You are not.  I recently advised 
		a federal government client of mine whose budget had been unexpectedly 
		cut to implement his business continuity plans and declare a 
		"disaster."  Such a declaration would allow him the option of taking 
		"extraordinary" measures like curtailing services to business units 
		because they could no longer be paid for with a decreased budget.  I 
		advised another client whose key IT person got sick to implement their 
		business continuity plan as well.
	
 
		You do not know what is coming, but you do know that life is always a 
		bumpy ride.  Be ready and get your "shock absorbers" in place.  If you 
		have not done any business continuity planning or built reliability 
		features into your workplace, maybe we need to talk.
	
 
		If you have any pointers you would like to 
		share, please e mail us back!
	
	©2012 Tony Stirk, Iron Horse tstirk@ih-online.com