09/24/03 04:42
#55301 - A few techniques...
The following is an article published by Jack Ganssle... I thought it would help many out there to write better code...
Software WILL Fail
------------------

I had the honor of giving the keynote speech at last week's Embedded Systems Conference in Boston. The talk covered software disasters and the lessons we can learn. But while working on the presentation I wondered why we refuse to acknowledge that software is indeed prone to failure. Developers generally have a subconscious attitude that the software will work, and that there will be no unexpected inputs or odd operating conditions. Witness the plethora of buffer overrun conditions that plague many OSes, email and browser programs.

Software is funny stuff. Software engineering is even funnier. No other branch of engineering is so intolerant of failure. We can beef up a strut to handle unexpected loads, or use a bigger conductor to handle possible electrical surges. But a single software error brings down the entire application. There are no fuses or weak links we can install to protect a system under stress. Or are there?

A watchdog timer is a hardware kick-start in case the software crashes. Like a fuse, it's extra hardware we add to the system (unless it's part of the processor's silicon), which sits outside the app, ready to intervene if the worst happens. WDTs are a well-known and essential ingredient of reliable embedded systems. Yet an astonishing number of systems don't have one. The Clementine spacecraft was lost when the code crashed and dumped all of the satellite's fuel. The team didn't "have time" to write code to use the WDT hardware that existed in the system.

Exception handlers are another tool we can and must use to capture unexpected problems. Yet unimplemented handlers are a common theme in the lore of disaster stories. And even when we've bothered to write the handler, in many, many cases that code is poorly tested. The system throws an exception, and the minimally tested handler, meant to save the device, causes a crash.

Memory management units are one of the best defenses against crashes. Run each task in its own partition; the MMU hardware traps errant accesses. The system can process the error and recover... or at least leave debugging breadcrumbs behind. If your system doesn't have an MMU, you can program spare chip selects to trap code that wanders into unused code regions. Yet we don't.

We think of software as being inherently brittle. Maybe it's not; maybe the code is fragile only because we refuse to install these sorts of fuses and weak links, as is done in every other branch of engineering. While writing this a good friend called to discuss some C code, and he very casually mentioned that all of his apps rewrite the output ports every second or so in case ESD flips their state. His code always includes a stack monitor to flag excessive pushing. Why aren't we all doing this?

There are two reasons: laziness/schedule pressure, and lack of knowledge about techniques we can use. The former is problematic and possibly not solvable. We can deal with the latter by inventing cool tricks, and sharing those tricks widely.

A few techniques:
- Add an MMU, as discussed
- Always build in an awesome WDT
- Capture every possible exception - and TEST each handler thoroughly
- Monitor stack growth
- Rewrite output ports often
- Use ASSERTs to throw exceptions in any case where an error can possibly occur
- Apply sanity checks to all inputs, whether from a user or from hardware (like an A/D)

Rgds
Raj Shetgar
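To make the watchdog idea concrete, here's a minimal C sketch. Real WDT hardware varies by part, so the timeout value is arbitrary and the counter is simulated in plain variables rather than the chip's actual reload register; on a real target, `wdt_kick()` would write the device-specific register, ideally only after the main loop's critical tasks have all checked in.

```c
#include <stdint.h>
#include <stdbool.h>

#define WDT_TIMEOUT_TICKS 1000u  /* arbitrary timeout for illustration */

/* Simulated watchdog state; on real hardware this is a free-running
   counter in silicon that the CPU cannot stop, only reload. */
static uint32_t wdt_counter = 0;
static bool wdt_fired = false;

/* Advances the watchdog; called from a (simulated) timer tick.
   When the count expires, a real WDT would assert the reset line. */
void wdt_tick(void)
{
    if (++wdt_counter >= WDT_TIMEOUT_TICKS)
        wdt_fired = true;   /* real hardware would reset the CPU here */
}

/* The application "kicks" the dog from its main loop. Never kick
   from a timer interrupt -- an ISR keeps running even after the
   main loop has crashed, which defeats the whole point. */
void wdt_kick(void)
{
    wdt_counter = 0;
}
```

If the main loop hangs and stops kicking, the counter runs out and the hardware forces a clean restart instead of leaving the system wedged.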
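The stack monitor the article mentions can be as simple as painting the stack with a known pattern at startup and periodically measuring how much of the pattern survives. This host-runnable sketch uses a plain array to stand in for the linker-defined stack region; the size and fill byte are arbitrary choices for illustration.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_SIZE   256u   /* stand-in for the real stack's size */
#define FILL_PATTERN 0xAA   /* any value the app is unlikely to push */

/* Simulated stack region; on a real target this would be the
   linker-defined stack area, painted before main() runs. */
static uint8_t stack_area[STACK_SIZE];

/* Paint the whole stack with the known pattern at startup. */
void stack_paint(void)
{
    for (size_t i = 0; i < STACK_SIZE; i++)
        stack_area[i] = FILL_PATTERN;
}

/* Return the bytes never overwritten -- the remaining headroom.
   Assuming the stack grows downward from the top of the array,
   untouched pattern bytes accumulate at the low end, so we scan
   from index 0 until the pattern is broken. */
size_t stack_headroom(void)
{
    size_t unused = 0;
    while (unused < STACK_SIZE && stack_area[unused] == FILL_PATTERN)
        unused++;
    return unused;
}
```

Call `stack_headroom()` periodically; if it ever drops near zero, flag the condition (or throw an exception) before an actual overflow corrupts memory.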
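Rewriting output ports against ESD upsets is easiest with a shadow register: every write goes through the shadow, so a periodic refresh routine always knows what the port should hold. In this sketch `PORT_REG` is an ordinary variable standing in for a memory-mapped output latch (on real hardware it would be a `volatile` pointer to the port address).

```c
#include <stdint.h>

/* Shadow copy of what the port SHOULD contain. */
static uint8_t port_shadow;

/* Stand-in for the hardware output latch; on a real target:
   #define PORT_REG (*(volatile uint8_t *)PORT_ADDRESS) */
static uint8_t PORT_REG;

/* All port writes funnel through here so the shadow stays current. */
void port_write(uint8_t value)
{
    port_shadow = value;
    PORT_REG = value;
}

/* Called every second or so from the main loop: rewrite the latch
   from the shadow in case ESD flipped its bits. */
void port_refresh(void)
{
    PORT_REG = port_shadow;
}
```

The refresh costs almost nothing, and a zapped output comes back within one refresh period instead of staying wrong until the next deliberate write.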
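The last two bullets -- ASSERTs and input sanity checks -- go together. Here's one way to sketch them: a lightweight embedded ASSERT that records where a failure happened instead of silently continuing, applied to a raw A/D reading. The 10-bit range and the 5%-95% plausibility window are made-up values for illustration; a real system would log to nonvolatile memory and restart rather than merely remember the location.

```c
#include <stdbool.h>
#include <stdint.h>

/* Where an assertion failure gets recorded for later inspection. */
static const char *assert_file = 0;
static int assert_line = 0;

static void assert_failed(const char *file, int line)
{
    assert_file = file;
    assert_line = line;
    /* On a real target: log the location, then reset or halt. */
}

/* Lightweight embedded ASSERT: capture file and line on failure. */
#define ASSERT(cond) \
    do { if (!(cond)) assert_failed(__FILE__, __LINE__); } while (0)

/* Plausible window for a 10-bit A/D reading from a sensor expected
   to sit between roughly 5% and 95% of full scale. */
#define ADC_MIN 51u
#define ADC_MAX 973u

/* Sanity-check a raw reading before the rest of the code trusts it. */
bool adc_reading_sane(uint16_t raw)
{
    return raw >= ADC_MIN && raw <= ADC_MAX;
}

/* Example use: flag a wild reading and fall back to the last good
   value rather than acting on garbage. */
uint16_t adc_filtered(uint16_t raw, uint16_t last_good)
{
    ASSERT(adc_reading_sane(raw));
    return adc_reading_sane(raw) ? raw : last_good;
}
```

A stuck sensor, a broken wire, or a floating input all produce readings outside the plausible window; checking at the boundary keeps the error from propagating through the rest of the application.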



