Earlier in the day, when I left Orlando, the winds were stiff, and I had a feeling that they might be too high for a launch. Out on the causeway, though, the winds had lessened and the skies were clear. Over the loudspeaker, as the updates came in, the chances of launch kept dropping: first 80%, then 60%, then 30%. All the while, clear skies and moderate to light winds were all the frozen thousands could see.
The forecasters would be right. About an hour before launch, low clouds filtered in. Occasionally they were scattered and broken, giving hope that the launch would go on, but just 9 minutes before launch it was scrubbed. The low cloud deck and the thickness of the clouds were right on the edge of the margin of safety should the Shuttle have an abort event during liftoff and have to glide back to the runway at the Kennedy Space Center. During such an event, the Shuttle Commander and Pilot would not have the benefit of preprogrammed guidance in their computer; it would be pure piloting, in the darkness, with very little moonlight to help visibility.
The old NASA, prior to Challenger and Columbia, might have launched. The margins were close. The current NASA couldn’t chance it. Such is life for NASA. After the loss of two Shuttles in accidents that could have been prevented, there can be no risk taking, even if the odds are in your favor. Flying into space is risky enough; adding even the slightest risk is out of the question. I got to thinking about the other well-known accidents and whether scrubbing this launch was evidence of NASA at its best, watching out for safety. Interestingly enough, when the poll was taken to get a “go for launch”, the Range Safety Officer and Weather Officer gave a final “no go”, and you could hear in the launch director’s voice what sounded like a little hesitancy to call a scrub. NASA had a 10-minute window, and the scrub came 9 minutes before the planned launch time, which was in the middle of the window, so NASA had about 15 minutes for the weather to change. The launch director asked if they should continue to wait it out in hopes that conditions would change before the launch window expired. The call came back: “no”. It wasn’t going to change, and they wouldn’t launch. We all wanted it to go. I wondered: if they had asked me, what would I have said?
There’s a difference between making a call with all the facts in hand and making a call by rationalizing past history. In the absence of facts, people tend to rationalize history. For example: there has never been an abort back to the landing strip; therefore, it’s an unlikely event, and if that were the only risk holding up the launch, assume it won’t happen and launch. This is the type of thinking that led to Challenger. This launch, though, was cause for additional caution. It was a cold, breezy, cloudy night in early February, and the Shuttle had been on the pad for weeks through some unusually cold weather for Florida (although it was protected and heated during that time). The three loss-of-crew events all occurred at the end of January or the beginning of February. It’s a statistical anomaly that must lurk in the back of NASA’s mind whenever a launch occurs around that time. All of those events, and the serious in-flight accident of Apollo 13, share a common thread.
The link between Apollo 1, Apollo 13, Challenger, and Columbia is that they were failures of systems and process, not of the crew. What is most striking about the four is that they were all preventable. Ultimately, NASA is responsible for the oversight and the process that protect the crew and prevent, or greatly reduce the likelihood of, an accident, rather than adding to the risk. These four examples are the most striking, but the process has failed at other times without tragedy, only because luck was on the side of the crew. In some instances the safety nets kicked in and prevented tragedy; in others, the human element of the process failed.
When NASA launched Apollo 12 through rain clouds, the vehicle was struck by lightning and the systems went offline. The rocket could have lost control or not regained its bearings had the computer been damaged or the electronics disabled from the strike. The response: no one knew that could happen. NASA launched the first shuttle manned, even though the vehicle was little understood. NASA continued launching despite issues with the thermal protection system and foam loss early in the program. There were other issues that received little public attention:
- While Columbia was being transported from the factory where it was built in Palmdale, California, 5000 tiles fell off the vehicle. This included 4800 temporary tiles and 200 critical heat resistant tiles.
- In a January 1990 mission, as the crew slept, the Space Shuttle Columbia tumbled out of control for nearly twenty minutes.
- On a Columbia mission in 1993, an Auxiliary Power Unit leaked fuel and caused a minor fire while the Shuttle was descending. This was discovered as technicians found a burned section of the vehicle during inspection.
- In January 1986, the launch of Columbia was stopped with 14 seconds to go when sensor readings caused a launch abort. It was discovered that a technician had accidentally drained the liquid oxygen from the fuel tank. The fuel-starved Columbia would not have reached orbit. The shortage would certainly have led to the first ascent abort in the program’s history. It would have taken flawless computer and sensor performance to shut down the engines before the fuel ran out and avoid a catastrophic failure before the vehicle could attempt a high-risk abort-to-landing-site maneuver. It would have made for an unlikely survival scenario.
- In yet another close call, during a 1999 mission by Columbia, its main engines leaked hydrogen from launch all the way to orbit. This resulted in abnormally high consumption of liquid oxygen and a premature engine shutdown. It was a stroke of luck that the leaking fuel did not explode and cause the loss of the vehicle.
There were also signs of o-ring erosion that were not addressed. The response: the Shuttle has to meet aggressive launch schedules to meet its cost and performance objectives. The Shuttle has had one ATO (abort to orbit), which occurred on the 19th flight when a main engine failed; the mission otherwise proceeded normally. Six missions later, Challenger exploded. Then, on STS-95, the cover for the deceleration chute came off on the pad during launch and struck the vehicle without causing damage. There have been numerous launch pad aborts, an early cutoff of the Space Shuttle main engines due to a 4,000 lb shortage in the external tank, and a failure of the primary system to detonate an explosive bolt on the SRBs to release the Shuttle stack from the pad; luckily, the backup system worked. In total, there were 16 abort events in the final 60 seconds before SSME ignition or after SRB ignition. There were numerous instances of foam coming off the vehicle, but it was considered a “maintenance” issue and not a “safety” issue.
After the Challenger accident and the subsequent review, numerous issues were identified with Challenger, including the brakes, steering, landing gear, and tires; these were unrelated to the disaster but were known weaknesses that had been overlooked. After Columbia, there was discussion about the “bolt catchers”, a primitive system designed to “catch” the explosive 80 lb bolts from the SRBs. If one failed, the bolt could strike the orbiter. Even with modifications, there is no guarantee that fragments couldn’t still escape into the airstream and hit the orbiter. On a sophisticated spacecraft, the difference between life and death for a crew could have come down to a bucket catching a bolt.
To look at the four accidents outside the context of every mission and every near miss would lead one to believe they are anomalies. They occurred far enough apart to be considered a statistical risk of flight. Viewed in another light, though, one could conclude that a) human spaceflight is risky, but loss is preventable; b) to meet its objectives, NASA must trade off time, safety, and cost, and can only have two of the three; c) NASA is operating vehicles that it does not fully understand in environments that it does not understand; and d) NASA’s training and culture do not adequately support its mission objectives.
It appears that a mix of all four has been at play in NASA’s organization over the years. NASA grew up in an era when it had unlimited funding and an ambitious and exciting goal, but little time to execute. That was also the time when it was building its organization, its culture, and the systems and processes that would last long beyond the moon landings. In the full scope of NASA’s history, the race for the moon left the deepest impression, but it was just a small part of the total. It left a “muscle memory” that lingered long after and through many changes. NASA had to adapt to substantially lower funding and take more risks to accomplish its goals. NASA didn’t knowingly put lives at risk, but it did so unknowingly, with the naiveté of an organization that didn’t know how to adapt. The organization never had a chance during its formative years to pace itself and learn all it needed to know; instead, it was first under a deadline to land on the moon and then under the gun to deliver at the least expense. Both philosophies would prove tragic.
In Apollo 1, the NASA contractor was guilty of poor workmanship, but there were also design flaws: the inward-opening hatch, the use of pure oxygen, and the failure to recognize the flammability of oxygen-saturated items such as Velcro, paper, and fabric. There were process failures in oversight of the vendor and a lack of quality control checks. In Apollo 13, the tank that exploded had come from the Apollo 10 Service Module and was damaged during removal for modification. The tank was repaired, but an underrated component was not replaced during the upgrade. Subsequent testing showed an anomaly in the tank. After a conference with the contractor, a process was used to complete the test and approve the tank. That process overheated and damaged the underrated component, which in turn damaged the internal heating element. The oversight would prove disastrous and nearly fatal. The conference with the manufacturer, and the subsequent decision to approve and use a critical component that was not performing as planned, has a familiar ring: it foreshadowed NASA's next accident, Challenger.
In Challenger, NASA overrode the recommendation of the manufacturer of the SRBs and launched in weather far colder than for any previous Shuttle launch, and well below the 52-degree weather of the previous near-miss launch. Little known is that Challenger might have survived had it not been buffeted by wind shear prior to the explosion, which dislodged the debris blocking the o-ring leak, or that a misread of the weather had scrubbed a launch on what turned out to be a perfect day the Sunday before the disaster. Either circumstance might have prevented the disaster, but would not necessarily have solved the root problem. Even the disaster itself didn’t fix the process for reviewing all aspects of flight safety and operational anomalies. Seventeen years later, another event caused the loss of Columbia. In that event, despite rather ominous visual evidence of a strike on the left wing, NASA failed to order additional checks on the condition of the orbiter from ground- and space-based assets, and dismissed the impact that a suitcase-sized object with the consistency of Styrofoam could have on the orbiter’s wing.
Ultimately, NASA owns these issues. These were systemic failures of process and organization. I previously stated that NASA has had shifting sand beneath its feet since its inception and has had to deal with numerous administrations and numerous administrators; but it never instituted, from its inception, a risk management or oversight process that could have avoided accidents and tragedies: a process that would permeate the agency and become part of its DNA and culture. Through 50 years of operation, that aspect has proven hard to change. After Ares I had already been designed, there was debate over whether harmonic oscillation would cause enough vibration to damage electronics and harm the crew. Rather than redesign the vehicle, the plan was to put dampers in the booster to absorb the vibration. The modifications left the vehicle with no margin for future increases in power, and already there have been compromises in vehicle weight. It appeared that once again NASA’s culture was to work within the budget and accept risk rather than demand the funding necessary to build and fly vehicles with the greatest amount of safety money can buy.
But on this chilly, somewhat cloudy night in early February, the process worked. NASA did not push the margin further into the risk category to get the launch off, but waited for better conditions. Thousands like me were disappointed, but the crew and vehicle launched safely the next night, proving that NASA has learned. Had they launched safely that first night, I would have been ecstatic. Had this been the time that another tragedy befell the space program, with the weather making the difference, it likely would have brought an end to America’s manned spaceflight program or, at a minimum, an even longer hiatus than is already planned. I’m sorry I missed the launch, but glad NASA made the right call.