Working at the Rocket Factory for 30 Years

With a BA in math and an unfinished master's in Computer Science, I got a job in June 1982 at the Rocketdyne facility in Canoga Park, CA.

(Above is my pic from the 1987 Rocketdyne open house. Wikipedia’s pic below is cleaner, and follows a remodel of the front of the building.)

This was the main Rocketdyne site, noted for the F-1 engine (like those that powered the Saturn V rockets that launched the Apollo missions) sitting prominently before the main entrance on Canoga Avenue.

Rocketdyne was then part of Rockwell International, itself derived from an earlier company called North American Rockwell. In the mid-1990s  Rockwell sold the division (the Canoga site and four others) to Boeing. Boeing had no previous involvement in building rockets, though by the ‘90s Rocketdyne did considerably more than actual rocket work. (It designed and built the electrical power system for the International Space Station (ISS), for example.) Boeing kept Rocketdyne for about a decade, then sold it to Pratt & Whitney, known for making jet engines for aircraft, and itself a division of United Technologies.

My starting salary was $25,488/year. Despite being told, when I began work, that the only way to “get ahead,” i.e. to get promotions and salary increases, was to move around among different companies for the first few years, I never did that. I stayed with Rocketdyne for just over 30 years, even as the corporate owners changed above us. In the early years I did very well, getting 8-10% raises every year (in addition to the regular adjustments for inflation, then very high, that everyone got), until leveling off after 5 or 7 years, with my salary then gradually creeping to just barely over six figures at the end. I stayed a worker-bee; I never went into any kind of management. In the year after I was laid off, at the end of 2012, UTC sold Rocketdyne to Aerojet, a much smaller company based in Sacramento.

June 1982 was the very end of the keypunch age (I had been a keypunch operator at my college library job at CSUN) but not yet the age of the desktop computer terminal. My first desk, left, had no keyboard of any type. We would fill in documents or coding sheets by hand, and hand them over to a typist or keypunch operator for computer input. (We also used enormous FAX machines, the size of dishwashers.) The first computer terminal system we had was, IIRC, one designed strictly for word processing, made by Wang. Using it entailed walking over to a dedicated workstation with a Wang terminal. Obviously you didn’t use it very often, because others needed their turns.

Over the decades, the Wang gave way to a central computer, a VAX, with terminals on everyone’s desks (as in the photo here), and eventually to Windows PCs connected to a network for file sharing, e-mail exchange, and in time the internet. I’m likely forgetting several intermediate steps. By the mid-1990s, if not before, the Microsoft Office suite was the standard toolset on everyone’s PC, including Word, Excel, PowerPoint, Access, and Outlook.

For about the first third of my career, I supported specific projects, first the Space Shuttle, then the ISS. For the remaining two-thirds, I moved into process management and process improvement. Both activities were fascinating, in different ways, and interesting to summarize as basic principles.

SSME Controller

The project I was hired to work on was SSME controller software maintenance. SSME is Space Shuttle Main Engine. Recall that the space shuttle was a hybrid vehicle. The plane-like shuttle had three rocket engines affixed to its rear end, and for launch it was attached to a large fuel tank to feed those three engines, plus two solid rocket boosters to give the vehicle sufficient initial boost to get into orbit. The boosters dropped away after a few minutes; the fuel tank stayed attached for the full 8 minutes that the SSMEs ran, after which it too dropped away.

The three engines, later called RS-25 engines, each had a “controller,” a microwave-sized computer strapped to its side, with all sorts of cables attached. The controller took commands from the outside, responded to them by adjusting valves to start, throttle, and shut down the engine, and continuously (60 times a second) monitored temperature and pressure sensors for signs of any problem that might warrant an emergency engine shutdown. (The photo here shows a controller on display at a Rocketdyne open house in 1987.)

The shuttle program had been under development since the 1970s. By the time I joined Rocketdyne in 1982, the first three orbital missions had already flown (and the fourth would fly four days later). The controllers had been built and programmed originally by Honeywell, in Florida; once development was complete, maintenance of them was turned over to Rocketdyne, both in California and at a facility in Huntsville, AL.

It’s critical to appreciate how tiny these computers were, not in physical size but in capacity! The memory capacity was 16K words. It was thus extremely important to code as efficiently as possible. Yet while the code had been completed sufficiently for the early shuttles to fly, change orders from the customer (NASA) came in regularly, and (rarely) a bug might be found that required immediate repair. So there was work for a small team of software engineers (10 or 12 of us) to process these changes and manage their deployment into regular new versions of the software.

You referred to the software by consulting print-outs kept in big blue binders. The “actual” software was stored on magnetic tapes. There was no “online” in those days.

Nor was the software “compiled” in those days. The code was written in assembly language, with each word assigned a particular location in memory. There was a requirement for maintaining a certain proportion of unused memory. Some requirements or design changes entailed simply changing out some code words for others (e.g. the value of a fuel flow sensor qualification limit).

Other changes, to add functionality, required using some of that unused memory. But you didn’t rearrange the existing code to make room for the new. No, you “patched” the old code. You removed one word of old code and replaced it with a “jump” command (JMP) to an area of blank memory. You put the new code there (beginning with the word you had replaced with the jump), ending with a “jump” command back to the old code just after the place you jumped out. This was because the software was only considered tested and verified in the fixed locations where it was originally placed. To “move around” old code to fit in new code would require retesting and verifying all that moved code. With patching, you only had to retest the one routine with the patch.

The photo here shows the “test lab” onsite at Canoga where our team ran preliminary test verification of the patched software. (A controller box is visible at the back.) Official verification testing was done at the site in Huntsville AL.

At some point the controller itself was modified to allow for expanded memory, and a new version of the controller software, “Block II,” was written in C, much easier to maintain, even if modified code, being compiled anew every time, required more testing. Still, some concerns remained, especially the limitation of memory space. Repeated patching of certain areas (sensor processing was an area continually refined by the experts at NASA) made the code less and less efficient. It became my specialty, of sorts, to tackle large redesign projects to improve that efficiency. The biggest one I did was on that very module for sensor processing, which made up some 25% of the total code; the redesign squeezed out 10% or 20% of its memory space.

My Experience of the Shuttle Program

Shuttle landings

The shuttle program had been underway, as I’ve said, for several years before I started my job supporting it. And while the shuttles launched from the opposite side of the country from southern California, they landed relatively nearby, in the Mojave Desert on a dry lake bed at Edwards Air Force Base, a two-hour drive from LA. NASA made these landings open to the public. There had been a prototype shuttle, named Enterprise by popular demand, that was lifted into the air on top of a 747 and then released for practice landings at Edwards, several times before the first flight shuttle actually launched.

These landings were day-and-a-half affairs. Two or three times, some friends and I drove out to Edwards the afternoon before such a test flight, or (later) before an orbital flight was scheduled to land. The viewing area was a section of the enormous dry lake bed a couple of miles away from the landing strip. (There were porta-potties but little else; you camped on the ground or in your car.)

Thousands gathered. The actual landings occurred fairly early in the morning, happened pretty quickly (from first sighting of the shuttle as a tiny dot way up in the sky to touchdown across the lake bed took about five minutes), and were utterly silent. You saw the plume of dust when the wheels hit, and the roll-out of the orbiter as it coasted for a minute or so before it came to a stop.

And then everyone got back in their cars and took hours creeping out the two-lane road that led back to the interstate.

Test Stands

Perhaps my earliest exposure to the bigger picture of the SSME program was a visit (with a group of other new employees) to the Santa Susana Field Lab (SSFL), up in the Santa Susana Mountains at the west end of the San Fernando Valley, west of the Rocketdyne plant in Canoga Park. (This rough map shows approximate locations.) The area had several large “test stands,” multi-story structures where an actual rocket engine could be mounted and fired. This LA Times article shows a photo of the test stand we visited, I think. The engine, at that time an SSME, was mounted in the middle, and when fired its fiery exhaust would spew out horizontally through the big gap at the bottom. (Another pic here, showing the fiery plume of an engine test.)

The group of us stood in a bunker a few hundred feet away from the test stand (just left out of frame of the first pic above), behind walls of concrete. Still, it was very loud, and went on for several minutes.

I should note that the SSFL became controversial in later years, especially as the populated area of the San Fernando Valley expanded toward it. There were issues of ground contamination by chemicals used for the rocket tests, and even a nuclear event that had left some residue. The site had been built, of course, way back in the 1950s, long before residential areas encroached. It was completely shut down by the early 2000s.

Shuttle launches

I never saw a shuttle launch; the opportunity never arose. The launches were across the country, as I’ve said, at Kennedy Space Center in Florida. Rocketdyne did have a program to send a couple employees to each launch, based on some kind of lottery or for meritorious service, but I never applied or was chosen.

The practical difficulty of attending launches was that scheduled launches were often delayed by weather, sometimes for days, so you couldn’t plan a single trip to last a couple of nights; you’d have to extend your stay, or give up and come home.

On the Launch Pad

However, I did snag a trip to KSC, on my own time, a decade later, by which time I was no longer working on the program. The occasion was the 1992 World Science Fiction Convention, held that year in Orlando. A coworker from Rocketdyne in Canoga Park had moved back east and gotten a job at KSC, so I contacted him to see if he wanted to meet. He got me a pass and took me on a private tour. We stepped briefly into the famous launch control room, and then went up onto the actual launch pad where an actual space shuttle sat ready to launch, something most visitors, even authorized guests, never have a chance to see, I imagine. (This was 2 Sep 1992, so it was Endeavour, STS-47, on the pad.) We took an elevator up to the level of the base of the shuttle, with the three main engines to one side and the tail of the shuttle directly above us. (You can see where we would have stood in the opening shot of this video, https://www.youtube.com/watch?v=GREwspcOspM) I could have reached up and touched the tail. I was told not to. I didn’t. And then we took the elevator up further, to the level of the beanie cap at the very top, then back down to the astronaut level where the escape baskets awaited. And then a walk-through of the enormous VAB, the Vehicle Assembly Building.

SSME Business Trips

Similar to the test stand visit described above, I and two other new employees were sent on an orientation trip to Stennis Space Center (SSC) in Mississippi, where another Rocketdyne facility oversaw test firings on much larger test stands than those at SSFL. On that trip, ironically, we didn’t see an actual engine test, but we did see the facility — a huge area out in the middle of the wilderness of the state, not far across the border from New Orleans, where we’d flown in. Passing through New Orleans was as close to a glamorous business trip destination as I ever managed, at least while working SSME, and I only got a trip there that once.

Much more frequently the Controller team took trips to Huntsville, Alabama, to the NASA Marshall Space Flight Center, where yet another Rocketdyne facility had a software team that did the formal testing of the SSME Controller software. Sometimes we went to help oversee formal testing, other times to take classes from the more senior staff there. But never any test firings.

Vandenberg and Slick Six.

I was able to attend a field trip for Rocketdyne employees, a long day by bus, to a new, under-construction shuttle launch site at Vandenberg Air Force Base, on the coast northwest of Santa Barbara. This would have been the mid- or late 1980s. The site was called Space Launch Complex 6, SLC-6 (https://en.wikipedia.org/wiki/Vandenberg_Space_Launch_Complex_6), pronounced “slick six.” It was planned to be a second site to launch space shuttles, in addition to Kennedy, but for various reasons was never completed. I recall this visit especially because I had my camera with me and took a bunch of photos (I’ll post some). We spent a couple of hours there, getting a tour and walking around, then riding the bus back to Canoga Park.

Challenger.

The first space shuttle disaster (https://en.wikipedia.org/wiki/Space_Shuttle_Challenger_disaster) happened in January 1986; I’d been working at Rocketdyne for less than four years. It happened about 8:30 in the morning west coast time, and fortunately or unfortunately, I was at home sick with a cold, watching the launch on TV. No doubt everyone at work was watching it too, and I can only imagine the collective reaction of everyone there seeing the shuttle explode on live TV. (Though at the time there were a few moments, even minutes, of befuddlement on the part of witnesses and even Mission Control about precisely what had just happened. “Major malfunction” I believe was the earliest description.)

What followed was months of analysis and investigation to understand the cause of the explosion, which turned out to be rubber O-rings sealing joints in the SRBs, the solid rocket boosters, that had gone brittle in the chilly morning air, letting the burning fuel in the booster escape out through the side. Rocketdyne was relieved to find itself innocent of any role in the disaster—but if NASA was by nature risk-averse, it then became even more so, and every contractor for every component of the shuttle assembly spent months doing what was called FMEA, Failure Mode and Effects Analysis (https://www.wikipedia.org/wiki/Failure_mode_and_effects_analysis), an intensive examination of every component looking for any possible failure scenario. Particular emphasis was placed on single-point failures, where a catastrophe would result if a single component, say a sensor, failed. (This kind of single-point sensor failure is what brought down two 737-MAX passenger jets in 2018 and 2019.) The SSMEs were full of redundancies: two command buses, and two of most sensors in the engine (four, in the case of fuel flow). Much of the function of our software was to constantly (the program cycled 60 times a second) evaluate these sensor readings against qualification limits and then against each other, with various command reactions should a sensor seem to have gone wrong. This involved overtime work, sometimes late in the evening after dinner, over many weeks.

Columbia.

The second space shuttle disaster (https://en.wikipedia.org/wiki/Space_Shuttle_Columbia_disaster) occurred in 2003, on the mission’s re-entry rather than take-off, and again I saw it from home on TV. It was a Saturday morning, and the catastrophe happened around 9am eastern, so had already happened by the time I turned on the TV news. I followed the investigation and resolution of the incident over the next months, but was no longer working on the shuttle program at that time.

Space Station Support

By the early 1990s the Space Shuttle program had matured and required less and less maintenance. Meanwhile Rocketdyne had taken on another large NASA contract, building the electrical power distribution system for the International Space Station, ISS. At some point they needed extra help to complete the local testing of the software, and brought several of the SSMEC staff over to support this aspect of the program.

I had a much more limited role on this program. Part of my assignment was to run particular test cases in the test lab, a lab analogous to the one shown above for SSMEC, but because the lab was in demand, this usually meant I’d have to go in to work in the evening after dinner for a couple of hours as necessary.

During the day my job was to convert a set of Excel spreadsheets, containing records of various components and appropriate command responses, into a Microsoft Access Database that the testing team could more easily consult and analyze. This is how I learned Access, which I later parlayed into the building of Locus Online and my science fiction awards database, sfadb.com.

I don’t think this period of program support lasted more than a year or two. Eventually I and two others, my coworker Alan P and our immediate manager Jere B, accepted a completely new assignment: process engineering and improvement.

This suited me because I was as fascinated by the processes of doing software engineering, all the conceptual steps that go into creating a complex product for delivery to a spacecraft, as by the details of any particular program.

So I’ll take a long aside to describe that, before discussing how and why our job became process improvement.


Software Engineering

Software engineering is, in a sense, the bureaucratic overhead of computer programming, without the negative connotation of bureaucracy. It is the engineering discipline that includes the management structure, the coordination of individuals and teams, the development phases, and the controls necessary to get the customer’s desired computer program into the target product, and to be sure it works correctly and as the customer intended.

At the core are several development phases. First of these is system requirements. These are statements by the customer (in these cases NASA) about what they want the software to do. These statements are general, in terms of the entire “system” (e.g., the SSMEs, the Space Shuttle Main Engines), and not in software terms. An example might be: the software will monitor the temperature sensors and invoke engine shutdown should three of the four sensors fail.

The next phase is software requirements. This is where software engineers translate the system requirements into a set of very specific statements about what the software should do. These statements are numbered and typically use the word “shall” to indicate a testable requirement. Examples might be: The software shall, in each major cycle, compare the temperature sensor reading to a set of qualification limits. If the reading exceeds these limits for three major cycles, the sensor shall be disqualified. If three sensors become disqualified, the software shall invoke Emergency Shutdown.

These requirements entail identifying the platform where the software will run; the size of the memory; the specific inputs (sensor data, external commands) and outputs (commands to the hardware, warnings to the astronauts), each by name; and so on.
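
To make the flow from “shall” statements to code concrete, here is a minimal sketch of how the sample requirement above might be implemented. It is in Python rather than the controller’s actual assembly or C, and all of the names and limit values are invented.

# Minimal sketch of the sample requirement above. All names and limits are hypothetical.
MAX_QUAL_LIMIT = 2500.0        # hypothetical upper qualification limit
CYCLES_TO_DISQUALIFY = 3       # out-of-limits major cycles before disqualification
SENSORS_FOR_SHUTDOWN = 3       # disqualified sensors that trigger emergency shutdown

class TempSensor:
    def __init__(self, name):
        self.name = name
        self.cycles_out_of_limits = 0
        self.disqualified = False

    def check(self, reading):
        """Compare one reading to the qualification limit, once per major cycle."""
        if reading > MAX_QUAL_LIMIT:
            self.cycles_out_of_limits += 1
        else:
            self.cycles_out_of_limits = 0
        if self.cycles_out_of_limits >= CYCLES_TO_DISQUALIFY:
            self.disqualified = True

def major_cycle(sensors, readings):
    """One pass of the requirement: qualify each sensor, then decide on shutdown."""
    for sensor, reading in zip(sensors, readings):
        sensor.check(reading)
    if sum(s.disqualified for s in sensors) >= SENSORS_FOR_SHUTDOWN:
        return "EMERGENCY_SHUTDOWN"    # stand-in for the real shutdown command
    return "CONTINUE"

sensors = [TempSensor(f"T{i}") for i in range(4)]
print(major_cycle(sensors, [2600.0, 2600.0, 2600.0, 1200.0]))   # first cycle: CONTINUE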

The next phase is design. Design is essentially everything that has to happen, given all the specific inputs, to produce the required outputs. The traditional method for documenting design was flowcharts (https://en.wikipedia.org/wiki/Flowchart), with various shapes of boxes to indicate steps, decisions, inputs, outputs, and so on.

Next was code. When I began we were still writing in assembly language! That was the language of the particular computer we were writing for, and consisted of various three-letter abbreviations for each command, where some commands were instructions to move the flow of execution to some position above or below the current one. Within a couple of years after I started, the SSME software transitioned to “Block II,” in which the software was rewritten in a higher-level language, C, much easier to write and maintain.

The final phase was test. The code was run in a lab where the target platform was simulated inside a hardware framework that faked commands and sensor inputs. Each set of fake inputs was a test case, and each test case was designed to test and verify a particular item back in the software requirements.

The key to all this was traceability. The software requirements were numbered; the design and then the code documented, at each step, the s/w requirement(s) being implemented. The test phase was conducted without knowledge of the design and code; the testers looked only at the requirements, and created test cases to verify every single one.
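
For illustration, here is a hedged sketch of what a requirements-driven test case might look like: it exercises only the “shall” statements from the example above (three out-of-limits cycles disqualify a sensor), with no knowledge of the design or code. The qualify() function is just a stand-in for the delivered software, and the values are invented.

import unittest

def qualify(readings, limit=2500.0, cycles=3):
    """Stand-in for the software under test: True if the sensor ends up disqualified."""
    consecutive = 0
    for r in readings:
        consecutive = consecutive + 1 if r > limit else 0
        if consecutive >= cycles:
            return True
    return False

class TestSensorDisqualification(unittest.TestCase):
    # Each test traces back to a numbered software requirement.

    def test_three_consecutive_exceedances_disqualify(self):
        self.assertTrue(qualify([2600.0, 2600.0, 2600.0]))

    def test_two_exceedances_do_not_disqualify(self):
        self.assertFalse(qualify([2600.0, 2600.0, 1200.0]))

if __name__ == "__main__":
    unittest.main()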

This was the core sequence of developing software. There were two other attendant aspects.

One was quality assurance, QA; the other, configuration management, CM. QA people are charged with monitoring all the development phases and assuring that the steps for doing them are followed and complete; they’re monitoring the process, essentially, without needing to know much about the product being developed. CM folks keep track of all the versions of the outputs of each development phase, to make sure that consistency and correctness are maintained. You might not think this is a significant task, but it is. As development continues, there are new versions of requirements, of design and code, of test procedures, all the time, and all these versions need to be kept track of and coordinated, especially when being released to the customer!

An attendant task of CM is to keep track, after a release of the software to the customer, of changes to be made for the next deliverable version. Change requests can come in from anyone—the customer especially — for requirements changes, but also from any software developer who spots an error or simply has an improvement to suggest (a clarification in the requirements; a simpler implementation in code). And so there is an infrastructure of databases and CM folk to keep track of change requests, compile them for periodic reviews, record decisions on whether to implement each one or not, and track them to conclusion and verification.
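
As a hedged illustration of the kind of record such a CM infrastructure keeps, here is a minimal sketch in Python; the fields, statuses, and sample values are all invented, not Rocketdyne’s actual database schema.

from dataclasses import dataclass, field
from enum import Enum, auto

class CRStatus(Enum):
    SUBMITTED = auto()
    APPROVED = auto()
    REJECTED = auto()
    IMPLEMENTED = auto()
    VERIFIED = auto()

@dataclass
class ChangeRequest:
    cr_id: str
    originator: str                  # customer, developer, tester, ...
    description: str
    target_version: str = "TBD"
    status: CRStatus = CRStatus.SUBMITTED
    history: list = field(default_factory=list)

    def move_to(self, new_status, note=""):
        # Record every disposition so the change can be tracked to conclusion.
        self.history.append((self.status, new_status, note))
        self.status = new_status

# Invented example: a change request reviewed by a board and tracked to verification.
cr = ChangeRequest("CR-123", "customer", "Clarify a sensor qualification limit")
cr.move_to(CRStatus.APPROVED, "periodic review board")
cr.move_to(CRStatus.IMPLEMENTED, "included in next deliverable version")
cr.move_to(CRStatus.VERIFIED, "regression test complete")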

A supporting process for all these phases of software development was peer review, which became something of my specialty (I maintained the training course materials for the subject, and taught it a bunch of times, both onsite and at other sites). While these days “peer review” is tossed around as an issue of the credibility of scientific papers, our process had a very specific definition and implementation for software development. The context is that there’s a team of software engineers all working parallel changes to the same master product. When I started, working Block I, within a whole team of 10 or 12, one particular team member would work all phases of each particular change: changes to requirements, to design, to code, to test plans. Later, Block II was large enough to allow specific engineers to specialize in one of those phases; some would do only design work, some only test work.

In either case, a change would be drafted as markups to existing documentation, and these markups were distributed to several other team members — “peers” — for review. After a couple of days, a formal meeting would be held at which each reviewer would bring their comments, including errors found or suggestions for improvement. This meeting was conducted by someone other than the author of the changes. A member of the quality team attended, but management was specifically not invited — the intent of the meeting was to get honest feedback without fear of reprisal. The meeting was not a presentation of the material; the reviewers were expected to have become familiar with it in advance. And so the meeting consisted of paging through the changed documents. Anyone have comments on page 1? No? Page 4? (If no changes were made on pages 2 and 3.) OK, what is it, let’s discuss. And so on. The meeting participants would arrive at a consensus about whether each issue needed to be addressed by the change author. The number of such issues was recorded, the change author was sent off to address them, and there was a follow-up after a week or so by the meeting coordinator, and QA, to assure all the issues were addressed.

We made a crucial distinction between what were called errors and defects. The worst news possible was an “external defect,” a flaw found by the customer in a delivered product. Such problems were tracked at the highest levels by NASA review boards. The whole point of peer reviews was to identify flaws as early as possible in the development process. Within the context of a peer review, a problem made by the change author, spotted by a peer reviewer so it could be fixed before the change products were forwarded to the next phase of development, was an “error.” A problem carried over from a previous phase of development, say a design error found during a code review, was a defect (an internal one, since it was caught before it reached the customer); such a defect meant that an earlier peer review had not caught the problem.

Counts of errors and defects, per peer review and per product, were ruthlessly documented and analyzed, at least in later years when process management and improvement took hold (more about that below). It was all about finding problems as early as possible, to avoid later rework and expense.
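
Here is a hedged sketch of the kind of tally this implies, with invented finding records; the real numbers lived in our process databases and metrics reports.

from collections import Counter

# Each peer-review finding notes the phase under review and the phase where the
# problem was introduced. Same phase: an "error" (caught in-phase). Earlier
# phase: an internal "defect" that escaped a previous review.

def classify(finding):
    return "error" if finding["found_in"] == finding["introduced_in"] else "defect"

def tally(findings):
    counts = Counter(classify(f) for f in findings)
    return counts["error"], counts["defect"]

# Invented findings from one code review:
findings = [
    {"found_in": "code", "introduced_in": "code"},    # author's own slip: error
    {"found_in": "code", "introduced_in": "design"},  # escaped the design review: defect
    {"found_in": "code", "introduced_in": "code"},    # error
]
errors, defects = tally(findings)
print(f"errors={errors}, defects={defects}")          # errors=2, defects=1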

This all may seem incredibly complex and perhaps overly bureaucratic – but modern computer systems are complex, and all of them, from the basic software in the Saturn V and the Space Shuttle to the decades-later iPhones (whose functionality is likely a million times the Shuttle’s), depend on similar or analogous practices for developing software.

Aside: Coding

Every phase of the software development process can be done haphazardly, with poorly written requirements, design flowcharts with arrows going every which way and crossing over one another, and spaghetti code with equivalent jumps from one statement to another, up or down the sequence of statements. Or it can be done elegantly and precisely, with clean, exact wording for requirements (much as the CMMI itself has continually refined; more below), structured flowcharts, and structured, well-documented code. (With code always commented – i.e., with inserted lines of textual explanation in between the code statements, delimited by special symbols so the compiler would not try to execute them, explaining what each group of code statements was intended to do. This helps the next guy, who might not come along to revise the code for years; even if you’re that guy, years later.)

But for code, more than the other phases, there is a certain utter certainty to its execution; it is deterministic. It’s digital, unlike the analog processes of virtually every other aspect of life, where problems can be attributed to the messiness of perception and analog sensory readings. So if there’s a problem, if running the code doesn’t produce the correct results, or if running it hangs in mid-execution, you can *always* trace the execution of one statement after the next all the way through the program, find the problem, and fix it. Always. (Except when you can’t, below.)

I keep this in mind especially since, outside industry work, I’ve done programming on my own, for my website and database projects, since the mid-1990s, at first writing Microsoft Word “macros” (to generate an Awards Index in page-perfect format for book publication… which never happened) and then moving on to writing Microsoft Access “macros,” to take sets of data from tables or queries and build web pages, for my online Awards Indexes (which did happen). (Also, to compile the annual Locus poll and survey, and similar side tasks.)

With highly refined code used over and over for years (as in my databases), when running a step hangs in mid-execution, it is always a problem with the data. The code expects a certain set of possible values; some field of data wasn’t set correctly, didn’t match the set of expected values; you find it and fix the data. But again, you always find the problem and fix it.

There’s a proviso, and an exception, to this thesis.

The proviso is that it can be very difficult to trace a problem when running a piece of code hangs. Sophisticated compilers give error warnings, and will bring up and highlight the line of code where the program stopped. But these error warnings are rarely helpful, and are often misleading, even in the best software. The problem turns out to be one of data, or of a step upstream that executed without complaint but produced incorrect results. And so you have to trace the path of execution and follow every piece of data used in the execution of the code. This can be difficult, and yet – it always gets figured out.

Interrupts

The exception that I know of to this perfect traceability, likely one of a class of exceptions, is when the software is running in a live environment (real-time software) and is subject to interruptions from new inputs (a sensor, a command), which can interrupt the regular execution of the software at any instant. My experience of this is from the Space Shuttle controller software, which cycled at something like 60 times per second during the first few minutes of launch, while the main engines were running. If an astronaut abruptly hit an ‘abort’ button, say, or if another component of the shuttle blew up (as happened once), an ‘interrupt’ signal would be sent to the controller software to transfer regular execution to a special response module, usually to shut the engine down as quickly as possible. Whatever internal software settings existed up to that point might be erased, or not erased but no longer valid. These are unpredictable situations that might never be fully resolved. There is always an element of indeterminacy in real-time software.
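
As a toy illustration only (a Python thread standing in for a hardware interrupt, which is not how the controller actually worked), here is a sketch of how an asynchronous event can catch related pieces of state half-updated:

import threading
import time

# Two values the main cycle always updates together; an "interrupt" that can fire
# at any instant may observe the pair when only one of them has been changed.
state = {"samples": 0, "running_total": 0.0}
interrupted = threading.Event()

def main_cycle():
    while not interrupted.is_set():
        state["samples"] += 1
        time.sleep(0.001)            # an interrupt arriving here sees inconsistent state
        state["running_total"] += 1.0

def interrupt():
    time.sleep(0.05)                 # fire at some arbitrary moment
    interrupted.set()
    print("interrupt sees:", state)  # samples may be one ahead of running_total

worker = threading.Thread(target=main_cycle)
worker.start()
interrupt()
worker.join()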

Still, after the explosion of the Challenger in 1986, everyone worked overtime for months conducting an FMEA, Failure Mode and Effects Analysis, which for our software meant combing the code in fine detail to identify anywhere an interrupt might cause catastrophic results. Fortunately, in that case, we never found any.

Data vs. Algorithm

I have one other comment to make about coding. This was especially important back in the SSME Block I days when memory space was so limited, but it also still informs my current database development. Which is: a code implementation is a tradeoff, and an interplay, between data and logic. When there is fixed data to draw upon, the way the data is structured (in arrays or tables, say) greatly affects the code steps that process it. You can save lots of code steps if you structure your sets of data appropriately at the start. Similarly, when I rebuilt the sensor processing module, writing a large section of the code from scratch to replace earlier versions that had been “patched” (explained below), the savings in memory came partly from avoiding the overhead of patched software, but also from rebuilding data tables (of, for example, minimum and maximum qualification limits for sensors) in ways that made the writing of the code more efficient.
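
Here is a hedged illustration of that data-versus-logic tradeoff, with invented sensor names and limits: the same qualification check written first as per-sensor logic, then as one piece of code driven by a limits table.

# Logic-heavy version: one branch per sensor, and more code every time a sensor is added.
def within_limits_hardcoded(name, value):
    if name == "fuel_flow_A":
        return 100.0 <= value <= 900.0
    elif name == "fuel_flow_B":
        return 100.0 <= value <= 900.0
    elif name == "turbine_temp":
        return 0.0 <= value <= 1600.0
    return False

# Data-heavy version: structure the limits as a table, and the code shrinks to one path.
QUAL_LIMITS = {
    "fuel_flow_A": (100.0, 900.0),
    "fuel_flow_B": (100.0, 900.0),
    "turbine_temp": (0.0, 1600.0),
}

def within_limits_table(name, value):
    low, high = QUAL_LIMITS[name]
    return low <= value <= high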

Patching

 

This will seem extremely primitive by modern standards, but that’s how it was done in the ‘80s. I’ll invent a simple example.

Suppose you’re asked to modify the code for a simple comparison of a sensor reading with its qualification limits. The original code ran like this (not real code, a mock-code example):

If current_sensor_reading > max_qual_limit then
    Increment disqual_count by 1
    If disqual_count > 2 then
        Set sensor_disqualification tag
    Endif
Endif

Now suppose a new requirement came along to, in addition to incrementing the disqual count by 1, also set an astronaut_warning_flag. Now the point here is that, in the earliest, Block I software, these instructions were coded in assembly code, with every code step loaded into a specific location of memory. The code was not “compiled” in the later sense every time it was run, or modified, because the qualification of the code applied only to the original coding in those particular locations of memory. Thus, to make this change, you would affect as few stable pieces of code as possible, add the new steps in some previously unused section of memory, and use “jumps” to implement the new sequence of steps, like this:

If current_sensor_reading > max_qual_limit then
    LABEL1: Jump to INSERT1        (replaces the old line: Increment disqual_count by 1)
    If disqual_count > 2 then
        Set sensor_disqualification tag
    Endif
Endif

(down in previously empty memory):

INSERT1: Increment disqual_count by 1
         Set astronaut_warning_flag to yes
         Jump to LABEL1 + 1

So to add, in effect, one line of code, you had to spend two lines of code to jump out of and back into the existing execution flow. You can see how repeated patching of different areas of the software made the aggregate less and less efficient, in terms of memory locations used.

Object oriented

One more principle that we gradually employed for SSME, and which I later employed in my database designs, was the idea of object-oriented design. This was a generalization of the idea of subroutines, or functions. Super-simple example:

Input next name from input list
Perform steps to capitalize every letter
Input next name from input list
Perform steps to capitalize every letter
(and so on)

Actually this can be optimized thusly:

Do until end-of-list:
    Input next name from input list
    Perform steps to capitalize every letter
    Move to next position in input list
Loop

But suppose you need to do the capitalization from many different places in a large program? Instead of repeating the several steps to capitalize every letter, you isolate those steps in a separate subroutine, or function, that can be invoked from anywhere, not just the one Do-loop:

 

Do until end-of-list:
    Input next name from input list
    Call Subroutine Cap_all()
    Move to next position in input list
Loop

 

Once Cap_all() is written, it can be used from anywhere else in the entire program.

And the extension of this, object-oriented programming, is to divide the entire program into separate, self-sufficient modules that call each other as needed, and to make every one of them independent, with its own inputs and outputs that don’t depend on the sequence of execution of any other modules. In my database development for my online awards sites (there was an earlier one on the locusmag.com site before I created sfadb.com), I took the database I’d developed for the earlier site, which had many repeated sections of almost-identical code (e.g. to format a title or a byline, from the base set of data, for different output pages), and rewrote it from scratch for sfadb.com, using object-oriented techniques, to format titles and bylines in one module called “assemble” before a later module was executed to “build” the various webpages.
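
A minimal sketch of that assemble-then-build split, in Python rather than Access, with invented record fields: formatting happens once, in one place, and the page builders only consume the result.

def assemble(record):
    """Format a title and byline once, for every page that will need them."""
    title = record["title"].strip()
    byline = " and ".join(record["authors"])
    return {"title": title, "byline": byline, "year": record["year"]}

def build_index_page(assembled):
    """Build one kind of output page from already-assembled records."""
    lines = ["<ul>"]
    for rec in sorted(assembled, key=lambda r: r["title"]):
        lines.append(f"  <li>{rec['title']}, by {rec['byline']} ({rec['year']})</li>")
    lines.append("</ul>")
    return "\n".join(lines)

# Invented data: every output page calls assemble() the same way, so a change to
# byline formatting is made in exactly one module.
records = [{"title": "Example Novel", "authors": ["A. Author", "B. Author"], "year": 1995}]
print(build_index_page([assemble(r) for r in records]))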

These software examples are extremely basic and even then I am probably oversimplifying them. But perhaps they provide a taste of the kind of conceptual thinking that goes into software engineering. Rigorous, logical, remorseless. But once they work – this is the kind of engineering that has built our modern world.

My Experience

When I started at Rocketdyne, the first three Space Shuttle missions had already flown. So the software for the SSME “controllers” (pic) had already been written (initially by Honeywell, in Florida) and installed. The software having been turned over to Rocketdyne for maintenance, it was my group’s job to process changes and updates. I did well, and became an advocate of cleaning up code, and documentation, that had suffered too many haphazard updates and had thus become inefficient.

It’s critical to remember that in this era, we were writing code for a *very small*, by modern standards, computer—it had 16K words of memory, as I noted above. So it was extremely important to code efficiently. But the accumulated result of individual changes and updates to that code had used up much of the available margin. So the biggest project I did, in my earliest years at Rocketdyne, was to redesign and recode the entire module for sensor processing – some 25% of the total code — making the result more efficient and saving 10 or 20% of memory space.

[aside with photos about details of that project]

Yet I learned some lessons in those early years—mainly, that even intelligent engineers can become accustomed to tradition and resistant to change. One case was a proposal, by our counterparts in Huntsville, to transition to “structured” flowcharts, rather than flowcharts that merely captured the “spaghetti” code being written. (The advantage of structured flowcharts, aside from being more understandable, is that they corresponded to the kinds of logical proofs that a program accomplishes what it was designed to do.) There was resistance among the older staff, to my consternation; still, the reform was implemented.

A second case was when I tried to reformat a chart in the requirements document that I thought was messy; this was a chart of FID (Failure Identification) codes and responses. Again, it had been amended and revised over the years, and had become hard to follow. I drafted a revised form and sent it out, and got pushback from the senior system engineer, simply because he was used to the current chart and didn’t want to deal with a change, even if it was an improvement…

Potted History

The lead-up history to the early 1980s, when I began working for Rocketdyne supporting the Space Shuttle, might be prehistory to those of you, if anyone, reading this account. As concisely as possible: rockets were conceived centuries ago (initially by the Chinese, I believe) but not implemented until the 1940s, when Germany used V-2 rockets to bomb London; those rockets traveled in arcs of a few hundred miles. After World War II, the Soviet Union and the US competed to build rockets that could achieve orbit. Throughout this period, futurists (like Willy Ley) and science fiction authors (like Arthur C. Clarke) imagined the use of rockets to place satellites in orbit, or to send men to the moon or other planets. (It was a commonplace assumption in science fiction, from the 1940s and beyond, that human exploration of the planets and even the galaxy was inevitable—a sort of projection into the far future of the Manifest Destiny that informed American history.) The Soviets won the first round, launching Sputnik in 1957, the first man-made object to orbit the Earth. The following decade saw a competition between the two countries to send men into space. The US launched Mercury flights (one man per capsule), Gemini flights (two men), and finally Apollo flights, with three men each, designed ultimately to reach the moon. After several preliminary flights, Apollo 11 landed on the moon in July 1969 (my family and I watched the live feed from the spacecraft on grainy black-and-white TV). Several more Apollo missions landed at other spots on the moon.

So the US won the competition with the USSR – they seem to have given up around the mid-1960s, though of course they didn’t admit it. What next? Well, there was the first attempt at a space station, called Skylab (https://en.wikipedia.org/wiki/Skylab), occupied for a year or so beginning in 1973. Then, greatly collapsing the following decades, two big US projects: the Space Shuttle, intended as a re-usable method of getting into orbit, which first launched in 1981, and the International Space Station, which launched in 1998, and which is still going.

All the components of the Mercury, Gemini, and Apollo missions were used once and then lost (burned up in the atmosphere, sunk into the sea, or sent to museums). The Space Shuttle was an odd hybrid, the result of numerous compromises, but it entailed re-usable components: a central plane-like shuttle with three rocket engines at its base that were reusable, and two solid-fuel boosters to lift the ensemble for the first few minutes, which then fell away and landed in the sea to be recovered.

My job at Rocketdyne was maintaining the software that monitored and controlled the SSMEs, the Space Shuttle Main Engines. The engines themselves were reused as often as possible; I’m thinking there were two or three dozen engines, each used multiple times, installed across the 135 missions on the five orbiters (Columbia, Challenger, Discovery, Atlantis, Endeavour).

The last shuttle flight occurred in 2011, long after I’d left the program.

Process Management and Improvement: CMMI

  • –add somewhere here: How with cmmi I specialized in metrics, and peer review… process performance baseline, etc.

In the early 1990s NASA and the DoD adopted a newly developed standard for assessing potential software contractors. This standard was called the Capability Maturity Model, CMM, and it was developed by the Software Engineering Institute (SEI) at Carnegie Mellon University in Pittsburgh. The CMM was an attempt to capture, in abstract terms, the best practices of successful organizations in the past.

The context is that software projects had a history of coming in late and over budget. (Perhaps more so than other kinds of engineering projects, like building bridges.) If there were root causes for that history, they may have lain in the tendency for the occasional software genius to do everything by himself, or at least take charge and tell everyone else what to do. The problem then would be what the team would do when this “hero” left, or retired. All that expertise existed only in his head, and went with him. Or there was a tendency to apply the methods of the previous project to a new project, no matter how different.

In any case, the CMM established a series of best practices for software development, arranged in five “maturity levels,” to be used both as a guide for companies to manage their projects, and also as a standard whereby external assessors would assess a company for consideration when applying for government contracts.

The five levels, I now realize, are analogous to the various hierarchies I’ve identified as themes for thinking about knowledge and awareness of the world, from the simplest and most intuitive to the more sophisticated and disciplined.

  1. Level 1, Initial, is the default, where projects are managed from experience and by intuition.
  2. Level 2, Managed, requires that each project’s processes be documented and followed.
  3. Level 3, Defined, requires that the organization have a single set of standard processes that are in turn adapted for each project’s use (rather than each project creating new processes from scratch).
  4. Level 4, Quantitatively Managed, requires that each project, and the organization collectively, collect data on process performance and use it to manage the projects. (Trivial example: keep track of how many widgets are finished each month and thereby estimate when they will all be done; see the sketch after this list.)
  5. Level 5, Optimizing, requires that the process performance data be analyzed and used to steadily implement process improvements.
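
Here is a minimal sketch of the trivial Level 4 example mentioned in the list above, with invented numbers: measure throughput, then use it to project the remaining work.

widgets_total = 120
widgets_done_per_month = [8, 11, 9, 12]      # measured process performance to date

widgets_done = sum(widgets_done_per_month)
rate = widgets_done / len(widgets_done_per_month)          # widgets per month
months_remaining = (widgets_total - widgets_done) / rate
print(f"{widgets_done} done, about {months_remaining:.1f} months of work remaining")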

Boiled even further down: processes are documented and reliably followed; data is collected on how the processes are executed, and then used to improve them, steadily, forever.

Examples of “improvements” might be the addition of a checklist for peer reviews, to reduce the number of errors and defects, or the acquisition of a new software tool to automate what had been a manual procedure. They are almost always incremental, not revolutionary.

The directions of those improvements can change, depending on changing business goals. For example, for products like the space shuttle, aerospace companies like Rocketdyne placed the highest premium on quality—there must be no defects that might cause a launch to fail, because astronauts’ lives are at stake. But software for an expendable booster might relax this priority in favor of, say, project completion time.

And software companies with different kinds of products, like Apple and Microsoft, place higher premiums on time-to-market and customer appeal, which is why initial releases of their products are often buggy, and don’t get fixed until a version or three later. But both domains could, in principle, use the same framework for process management and improvement.

Again, projects are run by processes, and in principle all the people executing those processes are interchangeable and replaceable. That’s not to say especially brilliant engineers won’t have a chance to perform, but it has to be done in a context in which their work can be taken over by others if necessary.

So…. In the early 1990s, while Rocketdyne was still part of Rockwell International, Rocketdyne and the several other divisions of Rockwell in southern California formed a consortium of sorts, which we called the “Software Center of Excellence” (SCOE, pronounced Skoe-ee) for the group effort of writing a set of standard processes that would satisfy the CMM, at least through Level 3. If I recall correctly, NASA had given its contractors a deadline for demonstrating compliance to Level 3, a deadline that was a few years out.

So I left the SSME Controller Software group and joined two others, Jere B and Alan P, as Rocketdyne’s process improvement group. The work of writing 15 or 20 standard processes was divvied up among the divisions, and in a year or two we put out a “Software Process Manual,” in 1994.

The task of writing “standard processes” was pretty vague at first. What is a process? What do you base it on? At its most basic, a “process” identifies a set of inputs (e.g. sensor readings, commands from the astronauts), performs a series of steps on them, and results in some number of outputs (e.g. commands to the engine to start, to throttle up, to throttle down, to shut down). But how do you write up a standard process for your organization about, say, configuration management? What elements of CM (e.g. version management, audits, etc.) were required to be included? The task was to combine the guidance from the CMM, with the reality of how the different divisions of Rockwell actually did such work, and try to integrate them into some general whole.

One perk of this era, in the early/mid 1990s, was that meetings among representatives from these various sites were held. The other sites included Downey, Seal Beach, El Segundo, and one or two others I’m not remembering. At the time, Rockwell owned company helicopters! They were used to fly senior management back and forth among these sites, but if they were otherwise not reserved, lowly software engineers like Alan and me could book them, and get a half-hour flight from Canoga Park to Downey, some 40 miles, avoiding an hour-and-a-half drive on the freeways. It was cool: the helicopter would land in a corner of the parking lot at the Canoga facility, we would walk toward it, ducking our heads under the spinning helicopter blades, and get a fantastic ride. What I remember especially is how the populated hills between the San Fernando Valley and west LA, crossing over the Encino hills and Bel Air, were immense – nearly as wide as the entire San Fernando Valley. All those properties, so many with pools.

We didn’t always use the copters; I remember having to drive to the Seal Beach facility once (a 55-mile trip), and on the way home, as I got on the 405 freeway, it was so empty – because of some accident behind where I’d entered – that my speed crept up, I was pulled over, and I got my first ever traffic ticket.

But another copter trip was memorable. Coming back from Downey, I suppose, the weather was bad and the copter was forced to land at LAX. To approach LAX, a major airport with big planes landing and taking off, always from the east and west respectively, the copter would fly at a rather high altitude toward the airport from the south, and then spiral down to its target, a rooftop on a building in El Segundo on the south side of the airport. On that occasion we had to take a taxi back to the San Fernando Valley, as the rain came in.

The software CMM was successful from both the government’s and industry’s points of view, in the sense that its basic structure made sense in so many other domains. And so CMMs were written for other contexts: systems engineering, acquisition (of contractors and tools), and others. After some years the wise folks at Carnegie Mellon abstracted even further and consolidated all these models into an integrated CMM: CMMI (https://en.wikipedia.org/wiki/Capability_Maturity_Model_Integration). And so my company’s goal became satisfying this model.

Conforming to the CMMI, for our customer NASA, entailed periodic “assessments,” where independent auditors would visit our site for some 3 or 5 days, in order to assess the extent to which our organization met the standards of the CMMI. The assessment included both a close examination of our documented standard processes, and interviews with the various software managers and software engineers to see if they could “speak” the processes they used day to day. Assessments were required every 3 years.

Rocketdyne’s acquisition by Boeing, in 1996, did not change the assessment requirements from our customer, NASA. Boeing supported the CMMI model. In fact it established a goal of “Level 5 by 2005.” The advance from Level 3 to Level 5 was problematic for many engineering areas: the collecting and analyzing of data for Levels 4 and 5 was seen as an expensive overhead that might not actually pay off. Rocketdyne, under Boeing, managed to do it anyway, using a few carefully selected cases of projects that had used data to improve a couple of specific processes. And so we achieved Level 5 ahead of schedule, in 2004. (In fact, I blogged about it at the time: http://www.markrkelly.com/Views/?p=130. )

Time went on, and the SEI kept refining and improving the CMMI, both the model and the assessment criteria; Rocketdyne’s later CMMI assessments would not get by on the bare-bones examples for Level 5 that we used in 2004. I’ve been impressed by the revisions of the CMMI over the years: a version 1.1, then 1.2, then 1.3, each time refining terminology and examples and sometimes revising complete process areas, merging some and eliminating others. They did this, of course, by inviting feedback from the entire affected industry, and holding colloquia to discuss potential changes. The resulting models were written in straightforward language, as precise as any legal document but without the obfuscation. This process of steadily refining and revising the model is analogous to science at its best: all conclusions are provisional and subject to refinement based on evidence. (A long-awaited version 2.0 of CMMI has apparently been released in the past year, but I haven’t seen it.)

CMMI Highlights

  • Business trips: There were lots of reasons for business trips in these years, and the trips were more interesting because they were in many more interesting places than Huntsville or Stennis. A key element of CMMI is training: all managers and team members are trained in the processes they are using. At a meta-level, this included people doing process management taking courses in the CMMI itself, and in subjects like process definition (the various ways to capture and document a process). The CMMI training was often held in Pittsburgh, at the SEI facility, but in later years I also recall trips to both Arlington and Alexandria, Virginia, just outside Washington DC, interesting trips even though, because they were during the work week, there was no time for sight-seeing.
  • Conferences. Other trips were to attend professional conferences. Since dozens or hundreds of corporations across the country were using CMMI to improve their processes or use the model to assess their performance, these conferences were occasions for these companies to exchange information and experience (sometimes guardedly). Much like a science fiction convention, there were speakers talking to large audiences, and groups of panelists speaking and taking questions from the audience; a few dozen presenters and hundreds or thousands of attendees. Furthermore these conferences were not tied to any particular city, and so (like science fiction conventions) moved around: I attended conferences in Salt Lake City (about three times), Denver, Pittsburgh, and San Jose, and I’m probably forgetting some others.
  • Assessments. Then there were the occasional trips to other Rockwell or Boeing sites, for us from Rocketdyne to consult with the process people there, or even to perform informal assessments of their sites (since Rocketdyne was relatively ahead of the curve). I did two such trips by myself, one to Cleveland, one to some small town (name forgotten) northeast of Atlanta.
  • Maui. But the best assessment trip was one Alan P and I did in 1999, in Maui. The reason was that Rocketdyne (or was it through some other Boeing division?) had a contract to maintain the software for some of the super-secret spy telescopes on top of Haleakala (https://en.wikipedia.org/wiki/Haleakala_Observatory). There’s a cluster of small ‘scopes there, including top secret ones; we didn’t have to know anything specific about them in order to assess the processes of the support staff, who worked in an ordinary office building down near the coast in Kihei. Our connection was that a manager, Mike B, who’d worked at Rocketdyne had moved to Maui to head the facility there, and thought of us when needing an informal assessment. So Alan and his wife, and I, flew in early on a Saturday to have most of a weekend to ourselves, before meeting the local staff in their offices for the rest of the week. Meanwhile, we did get a tour of the observatory, if only a partial one, one evening after dinner, a long drive up the mountain and back in the dark. (The one hint I got about the secret scopes was that one of them was capable of tracking foreign satellites overhead, as they crossed the sky in 10 or 15 minutes, during daylight.)
  • HTML. In the mid-1990s the world wide web was becoming a thing, and one application of web technology was for companies to build internal websites, for display of information, email, and access to online documents. (Past a certain point, everything was online and no one printed out documents, especially big ones like our process manuals.) With more foresight, I think, than I’d had when learning Access for ISS support, I volunteered to learn HTML and set up webpages for our process organization, the SEPG (Software Engineering Process Group). I did so over the course of a few months, and shortly I parlayed those skills into my side-career working for Locus magazine—I volunteered to set up its webpage. Charles Brown had thought ahead at least to secure the locusmag.com domain name (presumably locus.com was already taken), but hadn’t found anyone to set up a site. So he took me up on my offer. The rest is history, as I recounted in 2017 here: http://locusmag.com/20Years/.

Reflections

Looking back at these engineering activities, it now occurs to me there’s a strong correlation between them and both science and critical thinking. When beginning a new engineering project, you use the best possible practices available, the result of years of refinement and practice. You don’t rely on the guy who led the last project because you trust him. The processes are independent of the individuals using them; there is no dependence on “heroes” or “authorities.” There is no deference to ancient wisdom, there is no avoiding conclusions because someone’s feelings might be hurt or their vanity offended. Things never go perfectly, but you evaluate your progress and adjust your methods and conclusions as you go. That’s engineering, and that’s also science.

Things never go perfectly… because you can’t predict the future, and because engineers are still human. Even with the best management estimates and tracking of progress, it’s rare for any large project to finish on time and on budget. But you do the best you can, and you try to do it better than your competitors. This is a core reason why most conspiracy theories are bunk: for them to have been executed, everything would have had to have been planned and executed perfectly, and without any of the many people involved leaking the scheme. Such perfection never happens in the real world.

UTC, P&W, ACE

For whatever reason, after a decade Boeing decided Rocketdyne was not a good fit for its long-term business plans, and sold the division to Pratt & Whitney, an east coast manufacturer of passenger jet engines. (An early Twilight Zone episode from 1961, “The Odyssey of Flight 33,” https://en.wikipedia.org/wiki/The_Odyssey_of_Flight_33, mentioned Pratt & Whitney engines, so I was familiar with the name.) Pratt & Whitney was in turn owned by United Technologies Corporation, UTC, whose other companies included Otis Elevator. Whereas Boeing, a laid-back West Coast company, was hands-off with Rocketdyne, letting it establish its own standards and procedures, UTC, an East Coast company, was relatively uptight and authoritarian. This was nowhere more visible than in its “operating system,” a company-wide set of tools and standards called ACE, for “Achieving Competitive Excellence.” ACE was homegrown by UTC and stood independent of industry or government standards. Furthermore, it was optimized for high-volume manufacturing, and was designed for implementation on factory floors. That didn’t stop UTC from imposing the totality of ACE on our very low-volume manufacturing site (one or two rocket engines a year) where most employees sat in cubicles and worked on PCs.

It’s notable too that while all sorts of information can be found on CMMI through Google searching, almost no details of ACE can be found that way; it’s UTC proprietary. I did finally find a PDF presentation (https://pdf4pro.com/view/acts-system-management-ace-caa-gov-tw-2c4364.html) that lists (on slide 7) the 12 ACE “tools,” from which I will describe just a couple of examples. Most notorious was what P&W called “6S,” its version of UTC’s “5S,” which was all about workplace cleanliness and organization. The five Ss were Sort, Straighten, Shine, Standardize, and Sustain; the sixth one was, inconsistently, called Safety. While it may make sense to keep a manufacturing environment spic and span, when applied to cubicle workplaces it became an obsession with tying up and hiding any visible computer cables, keeping the literal desktop as empty as possible, and so on. Many engineers resented it.

Another example: in the problem-solving “DIVE” process, each ACE “cell” (each business area at a site) was obliged to collect “turnbacks,” which were any examples of inefficiency or rework. It didn’t matter to the ACE folk that in software we had a highly mature process for identifying “errors” and “defects”; we were required to double-book these for ACE as “turnbacks.” Furthermore, each cell was required to find a certain number of turnbacks each month, and show progress in addressing them. You can see how this would encourage a certain amount of make-work.

To avoid duplicate work, at least, I and others who maintained the software processes spent some time trying to resolve double-booking issues, even introducing ACE terminology into the processes we maintained to satisfy CMMI. (P&W didn’t care about CMMI, but our customers did.)

So the last few years at Rocketdyne were my least pleasant. They ended on a further sour note as I was pulled away from process management and put onto a P&W project based back east that needed more workers, even remote ones. This was NGPF, for “next generation product family,” that became the PW1000G (https://en.wikipedia.org/wiki/Pratt_%26_Whitney_PW1000G), a geared turbofan jet engine for medium-sized passenger jets. A couple dozen of us at Rocketdyne were assigned to NGPF, but I was pulled in later than the initial group and got virtually no training in the computer-based design tools they used or background in the concept of the product. So my assignments were relatively menial, and frustrating because I had to figure things out as I went along, without proper peer reviews or the other processes we used for CMMI-compliant software projects.

NGPF was winding down, and I had gone back to working a final pass on a new set of process documents, when a bunch of us were laid off in November 2012. Fortunately, since I’d worked for the same company for 30½ years, and was old enough to have been grandfathered into pension eligibility from Boeing, I did get a pension, as well as a severance. And I had two different 401K accounts that had accumulated over the years.