07/14/06 04:24 Modified: 07/14/06 04:37
#120242 - Software hang-up, hardware lock-up
Harshada said:
My project involves a fault called "processor fault". How do we detect this fault?

This is a very interesting question! Two major mechanisms are known that result in a malfunction of a micro: software hang-up and hardware lock-up. Don't take these phrases too literally, English is not my mother tongue.

1. Software hang-up: By this I mean the wrong execution of the program due to external influences. In such situations it can often be observed that the program counter of the micro erroneously skips one or more additional steps. This can be provoked, for instance, by an ESD event hitting a pin of the micro, by a short overvoltage spike coming from the PSU (power supply unit), by a cosmic particle hitting the die, etc. In such a case the micro suddenly interprets bytes of its code memory as instructions which are not the intended instructions, and runs astray. This need not result in a total disaster if the skip of the program counter leads to another regular instruction, but at least one or more instructions are skipped or executed twice, so the program makes a mistake! In most cases, though, the micro runs completely astray and no longer does any useful work, sometimes even hanging in an infinite loop.

From the outside it cannot easily be detected that the micro hangs, because the micro is not in a certain, well-defined "state" that marks it as being in a hang-up condition. The program just does not seem to do what it is intended to do...

Important: A micro in a software hang-up can easily be reset by a reset signal at its reset pin. After the reset signal is removed, the micro starts again as if nothing wrong had ever happened.

2. Hardware lock-up: Sometimes the micro responds to such an "external influence" not simply with a skip of the program counter, but with a much more serious malfunction, namely a complete hardware lock-up. In such a case the micro no longer executes any instructions, whether correct or wrong ones, but stops all its activity and, yes, becomes dangerously hot. The cause of such a hardware lock-up can be an overvoltage spike at a pin of the micro so high that a latch-up occurs, meaning that an internal parasitic thyristor is fired and short-circuits certain internal supply lines to ground. Or it can be the result of a heavy overvoltage spike driving some internal logic into a metastable state. Etc.

Important: A micro in a hardware lock-up is in a dangerous condition, which can result in the total destruction of the micro! And a hardware lock-up cannot always be ended by a reset! Mostly, only by powering down the micro!!

To derive a "micro fault" signal you can use the fact that in both cases the micro stops working as intended. So you could use a MAX1232 reset chip containing a watchdog, which is to be strobed by the micro every, let's say, 10 ms. If the micro stops working as intended, the micro (hopefully) also stops strobing the MAX1232, which forces the MAX1232 to reset the micro after a defined delay. This reset signal could then also be used as your "micro fault" signal.

Not only the software hang-up but also the hardware lock-up could be detected by this scheme. In the latter case, a never-ending series of periodic reset pulses would occur, because the micro never starts to work properly again until it is powered down and powered up again.

But take care, there's no guarantee that the MAX1232 always resets the micro when a software hang-up occurs! It does so only if the micro, as a consequence of the hang-up, also stops emitting the strobe pulses. So using a watchdog is never a guarantee to cure all imaginable malfunctions of a micro!

Kai
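P.S. To make the difference between the two fault signatures more concrete, here is a small host-side simulation in C. It is only a sketch under stated assumptions, not firmware for a real board and not a model of the real MAX1232 timing: the type `watchdog_t`, the functions `wdt_strobe`, `wdt_tick_1ms` and `simulate`, and the 150 ms timeout are all made up for illustration (check the datasheet for the actual timeout), and the micro is assumed to recover instantly when a reset clears a software hang-up. Only the 10 ms strobe period is taken from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STROBE_PERIOD_MS 10u         /* strobe interval suggested in the post */
#define WDT_TIMEOUT_MS   150u        /* ASSUMED timeout, illustration only */
#define NEVER_HANGS      UINT32_MAX  /* pass as hang_at_ms for a healthy run */

/* Hypothetical software model of an external watchdog chip. */
typedef struct {
    uint32_t ms_since_strobe;  /* time since the last strobe or reset pulse */
    uint32_t reset_count;      /* each reset pulse is one "micro fault" event */
} watchdog_t;

/* The micro calls this to tell the watchdog it is still alive. */
void wdt_strobe(watchdog_t *w)
{
    assert(w != NULL);
    w->ms_since_strobe = 0;
}

/* Advance the watchdog by 1 ms. Returns true if it issues a reset pulse. */
bool wdt_tick_1ms(watchdog_t *w)
{
    assert(w != NULL);
    if (++w->ms_since_strobe >= WDT_TIMEOUT_MS) {
        w->ms_since_strobe = 0;  /* timeout restarts after the reset pulse */
        w->reset_count++;
        return true;
    }
    return false;
}

/*
 * Simulate total_ms of operation. The micro strobes every STROBE_PERIOD_MS
 * while healthy and stops strobing from hang_at_ms onwards.
 * recovers_on_reset = true  models a software hang-up (reset cures it),
 * recovers_on_reset = false models a hardware lock-up (reset does not help).
 * Returns the number of reset pulses, i.e. "micro fault" events.
 */
uint32_t simulate(uint32_t total_ms, uint32_t hang_at_ms, bool recovers_on_reset)
{
    watchdog_t w = {0u, 0u};
    bool healthy = true;

    for (uint32_t t = 0; t < total_ms; t++) {
        if (t == hang_at_ms) {
            healthy = false;  /* the fault strikes: strobing stops */
        }
        if (healthy && (t % STROBE_PERIOD_MS == 0u)) {
            wdt_strobe(&w);
        }
        if (wdt_tick_1ms(&w) && recovers_on_reset) {
            healthy = true;   /* reset pulse restarts the program */
        }
    }
    return w.reset_count;
}
```

With these assumed numbers, `simulate(1000, NEVER_HANGS, true)` gives 0 resets, `simulate(1000, 205, true)` gives exactly 1 reset (software hang-up, cured by the reset), and `simulate(1000, 205, false)` gives 5 resets (hardware lock-up, one pulse per timeout period until power is cycled). So a supervisor that counts reset pulses could, in principle, tell a single fault event from a persistent one. The simulation cannot show the caveat above: a program that runs astray but still happens to strobe the watchdog would never be caught.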