Friday, July 23, 2010

Debugging m68k in MAME

MAME has again proven to be an invaluable debugging tool on my embedded software project. The problem began to rear its ugly head after I moved a subroutine from one file to another. However, I found that this subroutine wasn't even being called when I ran the software, which at first glance seemed to make no sense at all.

I traced the origin of the exception flow into the code that executes the RS232 ISR. I observed the state of the 68000 CPU registers in the MAME debugger, and compared them to a build of the software in which I had made no changes to this particular subroutine. I noticed that in the "correctly" executing software, the SR (status register) was left with the value 0x2700 after returning from the ISR to the (OS) interrupt dispatcher. The 2 indicates that the CPU is in supervisor mode, which is what it should be on the 68000 when executing any privileged operation, such as an ISR.

In the broken software, the 2 was not set after leaving the ISR. Consequently, upon returning from the dispatcher, the SP (A7) was incorrectly loaded with 0x3FFFF0 - the memory at this address is uninitialized (0), so the return caused a return address of 0 to be popped from the stack. The 68000 detects this as an invalid condition and throws an exception (address 0 is the first entry in the vector table, which is not a vector but rather contains the initial value to be loaded into SP at reset). I didn't notice until later, but 0x3FFFF0 happens to be the value in USP (the user stack pointer register; eventually I would remember that the 68k uses the USP as its stack pointer when not in supervisor mode).

The question now is, why is the SR being clobbered when returning from the ISR? Note that the MAME debugger has no means for source-level debugging: you have to follow along in the ASM listing when using the MAME debugger - a minor inconvenience most of the time. In this case, I finally noticed that the code in question was running into some additional instructions which were not to be found in the listing for the ISR. The code was executing these additional instructions and eventually encountering an RTE (return from exception).

The RTE pops a return address from the stack, as might be expected. However, RTE also pops a word into the SR, which is not what we want here. In this case, the SR was being loaded with whatever happened to be on the stack at that location. In the working code, the value loaded into SR happened to coincide with the return address that had been stacked, and that value just happened to have the 0x2000 bit set, as needed to keep the SR in supervisor mode. In the broken code, I had moved a subroutine around, and the address it ended up linking to did not allow the 0x2000 bit to be set.

Eventually it became obvious that the ISR (written in assembly) was missing an RTS instruction at the end, and the execution was falling through to the next section of code that had been linked in (link order determined by a linker configuration script). This turned out to be some unused legacy code (we seem to have a lot of that) whose purpose was to ...change between supervisor and user mode and reload the status register!

Tuesday, July 6, 2010

Kick the dog

The problem I was working on for the last week is finally seeing some progress. Actually, it turned out to be the manifestation of two different problems, which can make debugging embedded software awfully difficult.

First, there is the matter of adjusting the timeout for the watchdog. It turns out that the software is very sensitive to how much data it receives over the RS232 port: heavy traffic makes processing take longer, stretching the periods between "pettings" of the watchdog. The right answer is somewhere between "not too short" and "not too long". The timeout must have enough wiggle room that we don't get a reset just because a task takes a little bit longer when we've bombarded it with serial port messages. On the other hand, we still need to catch a real error where a task totally goes out in the weeds.

The side effect of the watchdog timeout that has caused me so much grief lately is that the software apparently doesn't cleanly reset everything on a "warm" reboot... particularly, it seems to leave interrupts enabled. The result is a race condition after a watchdog reset, where the interrupt fires before the software has a chance to do some other initialization, resulting in a really bizarre error in a part of the software that shouldn't even be executing, all because of wrongly initialized or uninitialized data following the reset.

I have given the MAME debugger quite the workout lately: breakpoints, data watchpoints, execution traces - all these features are indispensable.

Saturday, June 19, 2010

ICE ICE Baby

Today I think I will make a few introductory comments to get things started. For now, the topic will be the work I am doing in embedded software development.

The project I have been working on for the last few months has proven to be interesting and challenging. The customer has a "legacy" product line built around older microcontroller technology: the motor controller circuit board appears to have been designed around 1990 or the late 80's, and is based around the Motorola 68000 (68k) CPU. When I first saw the board I was nearly in a state of shock: I don't recall the last time I was involved in a project where the circuit board did not utilize surface mount components! And EPROMs! I mean the ones that you have to erase with the UV lamp thingy!

If you know anything at all about the Motorola processors, you might realize that the 68k processor was cutting edge technology back in the Reagan years. This was a powerful micro for its time: it's the CPU that found its way into the early Macintosh, many arcade video games, a few UNIX workstations, and even embedded applications.

The 68k has been obsolete (but not forgotten) for a long time now, and as a microcontroller for embedded applications, the 68k is - not really a microcontroller in the sense that I typically think of one. The 68k, being "only" a CPU, still requires a fair number of other integrated circuits in order to make it useful. Nowadays, it's all about integration, and even the lowliest Microchip PIC has a good number of peripheral devices (timers, IO ports, etc.), not to mention RAM and Flash ROM, integrated into the package. The 68k was nonetheless a popular choice for embedded apps in its time. So much so that Motorola developed an extensive product line specific to embedded applications, giving rise to the CPU32 family of controllers (but even those have been obsoleted in favor of newer technology such as the PowerPC architecture).

Often one of the first tasks on these kinds of jobs is to get the software development environment and tools working. On this project we already have some "working" code (more on that later), so hopefully I can compile the previously released configuration of their code, and then (can you believe this part?): load the code to EPROMs! I mean real EPROMs with the little glass window thingy on the top where you can see through to the chip! Anyway, if the compiler is set up and working properly, then the code will compile and link, and I will have a fresh firmware image available to load. After programming a new pair of EPROMs (which takes about 10 seconds using a chip programmer connected to the USB port on my computer), we grab an IC-desocketer-grabber-thingy (or a pliers or a hammer or whatever else is needed) and proceed to swap the new EPROMs onto the board. If all has gone according to plan, then the board will do whatever it is supposed to do when it is powered up. The developers of this board were wise enough to install some LEDs, and the software will blink an LED as long as the code is still running properly.

Prying the EPROMs off of the board every time you load and test code is of course a pain in the behind - and for a while anyway there was an entire thriving cottage industry built around selling very expensive dongles and gadgets and doodads to people like me at (seemingly) ridiculous prices. The newer microcontrollers now almost invariably feature not only on-chip RAM and flash memory, but also dedicated IO pins and single-step modes of operation for debugging.

So now I have found myself warped back to the ICE age (the in-circuit-emulator age, that is!). The previous development team had actually been using a PROMice ROM emulator + resident ROM monitor in order to debug the software with the Xray debugger. On this board the CPU is not socketed, but the ROMs can be replaced to do software upgrades (imagine a time before every stinking gadget imaginable was connected to the internet - maybe even before... the existence of the internet itself?)

For those who are not in the know, the PROMice, at the risk of gross oversimplification, is basically a black box containing some RAM which is connected into the ROM sockets of your board (the emulator replaces the ROMs on the board with its own specially fabricated ribbon cable adapter, which matches the pinout of the ROM chip being emulated, i.e. a 28-pin device requires a 28-pin adapter, and likewise for 24-pin or 32-pin devices). The emulator RAM is controlled from the host workstation or laptop: you run a utility program that sets up the memory size and some other properties, choose a firmware image to load (binary or hex format), and then download the image to the emulator memory. Once you power or reset the board, it runs from the emulator RAM as long as the emulator is powered.

So ... after extricating the PROMice from a cabinet where it was languishing beneath a bunch of old cables and circuit boards and nameless other technological artifacts, the first order of business is to dig the two EPROMs out of the board without buggering up those chips or the sockets. Following the instructions in the very clear and coherent end-user documentation, I proceeded to plug everything in. Did I mention that the PROMice works over parallel or serial (RS232) ports? Try finding a laptop with those anymore. The PROMice software is very cryptic, like it was developed in the Windows 95 era. Well, I'm not surprised at all that the thing is not working.

Before I wasted a bunch of time debugging this antiquated piece of equipment, I contacted the emulator vendor (Grammar Engine)... and I found out that the company was no longer in the development tools business - Grammar Engine had discontinued operations and essentially opened a new business developing some other kind of electronic product. Anyway, the fellow who had been responsible for Grammar Engine product support had continued in a product engineering role at the new company. This fellow, Arvind, was very courteous in handling my call and offered what assistance he could. He mentioned that they had actually continued to support the Grammar Engine products until just a couple of years ago, but hadn't taken any support calls for some time. I ended up getting a PDF of the emulator schematics for troubleshooting, but never did get the thing working. Well, I didn't try very hard... one can waste a bunch of time getting obsolete equipment to work, or... alternatively, go buy something else to replace it! And who doesn't like to buy new things...

Thursday, June 17, 2010

Hit Any Key to Begin

Well here it is, the first post at Assignment Earth.