Email: Password: Remember Me | Create Account (Free)

Back to Subject List

Old thread has been locked -- no new posts accepted in this thread
???
04/20/07 22:16
Read: times


 
Msg Score: +2
 +1 Informative
 +1 Good Question
#137677 - lesson learned
Yesterday, I got bit by something very weird. Maybe this can help someone else.

I'm using an SiLabs micro in a design. The UART receives packets from a host computer and the micro processes the commands and data in these packets.

The UART ISR, when handling receive interrupts, simply writes the character in SBUF to a FIFO (some XRAM) and updates the FIFO write pointer. My packets are ASCII characters, terminated by '\n'. If the received character is \n, the ISR sets a flag (gotEOL) which has file scope in the uart source file (not visible to any other source file).

The main program loop calls a function isUartEOL() once every time through the loop to see if a complete data packet is sitting in the receive FIFO. If so, it calls a function to read the FIFO and parse the packet. To prevent the main program loop from continually trying to parse the same packet, isUartEOL() clears the gotEOL flag. After a command is handled, or if an invalid packet is received, the micro sends a response back to the host.

Here is the isUartEOL() function:
bit isUartEOL(void) {
    bit rval = gotEOL;
    gotEOL = 0;
    return rval;
} // isUartEOL()
The problem was that "occasionally" the firmware would miss a command. I'd send a command and get no response. When running in the SiLabs debugger, I noticed that the new command (which never gets acknowledged) was still sitting in the receive FIFO and the FIFO pointers indicated that it was never read. If I sent another command, I'd get a response indicating that the previous command got handled.

Weird ... so it seems like the main program loop was never seeing gotEOL set true. I then made gotEOL global and put an extern reference to it in the main.c source, and changed the main loop so it tested gotEOL directly instead of calling isUartEOL(). Here's the main loop skeleton when testing gotEOL directly:
    while (1) {
        if (gotEOL) {
            gotEOL = 0; // clear so we don't parse same again
            if (ReadPacket()) {
                 // 
            } // if ReadPacket()
        } // if gotEOL
    } // while
That worked fine and I never missed a command. But I like the idea of encapsulating all of the UART stuff and truth be told, I don't like globals and externs and keeping track of all of that, and in this application, the cost of the function call was minimal. So the question: why does my program work when I read the flag directly instead of calling the function?

(scroll down for answer)























Look carefully at the isUartEOL() function.

First, I copy the flag to a temp so I can then clear the flag. Then I return the temp. Again, the idea is to make sure that if a packet had been received, the main program loop tries to process it only once.

Soooo ... what happens if the UART ISR is called after the line "rval = gotEOL;" is executed but before the next line, "gotEOL = 0;" is executed? In most cases, nothing.

But here's the failure. We might have just copied 0 from gotEOL to rval when we're interrupted, and then the ISR sees the '\n' and sets gotEOL before returning. But we return right before the line where gotEOL is cleared! So even though the ISR has just set gotEOL, we neatly clear it and isUartEOL() always returns false. This means the FIFO never gets read and the command just sits there until another end-of-line is received.

This took a couple of annoying hours to sort out, and like most things, once you realize what the problem is, you see that it's pretty simple. It was a race between reading a value and then clearing it with the possibility of an ISR changing it in between.

Simple problems require only simple solutions, so the trick here was to simply test the state of gotEOL right away, and only if it was already set, then clear it and return true. If it wasn't set, then return false:
bit isUartEOL(void) {
    if (gotEOL) {
        gotEOL = 0;
        return 1;
    }
    else
        return 0;
} // isUartEOL()
This works because while the ISR can set gotEOL at any time, we only clear it once it's been set.

Of course now I'm concerned that two end-of-lines in a row will potentially screw things up but I can't force it, and I can always write my host software to ensure that never happens.

-a

List of 12 messages in thread
TopicAuthorDate
lesson learned            01/01/70 00:00      
   "atomic"            01/01/70 00:00      
      atomicity            01/01/70 00:00      
      if thatb is so            01/01/70 00:00      
         Indeed            01/01/70 00:00      
   Bit test and clear            01/01/70 00:00      
      JBC            01/01/70 00:00      
         favourite atomic test-and-set instructions            01/01/70 00:00      
      SDCC also            01/01/70 00:00      
         Classic atomic problem            01/01/70 00:00      
            and/or ...            01/01/70 00:00      
   another example            01/01/70 00:00      

Back to Subject List