A Bit of C Debugging

    October 25, 2005

I have a customer who has a hodge-podge of machines running various software programs. It’s all a mess: low on disk space, not every machine that should be backed up is backed up, and so on. I’m slowly working toward a better world on some of those things, but we still have to deal with the day-to-day problems.

One of those problems is their scheduling program. They schedule orders onto presses, track production, typical stuff. It’s driven by an HTML front end and a bunch of shell, Perl, awk and C programs behind it. Kind of a mess. My intention is to rewrite the whole darn thing, but that is some time in the future. For now we have to live with what we have.

I’d never looked at the code until yesterday, when all orders disappeared from the presses. The screen would come up, but no orders were assigned to anything. Not very useful, so I dug in and found the section of code that read the orders in. I added some debugging, and that showed me that it was successfully reading the orders. It then read a “press” file, which assigned orders to presses, and that was all fine too, or at least the various presses had orders assigned. The HTML wasn’t showing them, but I could see that they were there.

The section of code responsible looks like this:

    while(fgets(buff,BUFF,fp)) {
        buff[strlen(buff)-1] = '\0';
        for(i=0; orders[i].name[0]; i++) {
            printf("debug %d\n",i); /* added */
            if(strcmp(orders[i].name,buff) == 0) {
                orders[i].displayed = 1;
                /* ... */
            }
        }
    }

It was the “for(i=0; orders[i].name[0]; i++)” loop that wasn’t being executed, as though there were no orders. My “debug” printf never saw the light of day. But I knew from the read_order function that 488 orders had been read. So what gives?

The immediately suspicious part is that “orders[i].name[0]” in the middle. That’s going to stop the loop at the first null string. Since I could see it never got into the loop at all, that would mean the very first member of the array was at fault. But it wasn’t: I could see that it contained a valid string when read.

Ahh, yes, “when read”. But years of hacking at my own and other people’s code lead to the next question: what happens to the array after that? So I put this in at various places:

    printf("in display %s\n",orders[0].name);

That produced

    in display 42688

until just after load_flag_delete() was called. After that, it showed:

    in display 0

There’s my null.

I looked at “load_flag_delete” and it had absolutely no reference to the orders[] array. Since it obviously WAS writing to that array, it had to be overstepping bounds somewhere: it thinks it’s still writing to a buffer of deleted items, but that buffer is allocated too small, so it writes past the end and munges the orders array.

OK. This should be easy to fix. The code is littered with arrays like “char buff[BUFF];”, so obviously BUFF is too small. Let’s check what it is: yep, it’s in a “.h” file and it’s defined as 256. What’s the current count of deleted items being read? Why, it’s 261. There we are. Fix the .h, change BUFF to 1024 for now, recompile and...

Same problem. Huh? It has to be that. OK, yes, it is 4:30 AM and I haven’t had my coffee yet, but I’m not that groggy. Something in load_flag_delete is overwriting the orders array. What the heck?

Back to the code for a closer look. Aaargh! The furshluginner person who wrote this defined the buffer in “load_flag_delete” as “char buff[256]”, not “char buff[BUFF]”, and I hadn’t noticed. I changed that, recompiled, and the HTML now shows orders assigned to presses.

There’s more to be done, of course. This is a horrid mix of junk. It must have been worked on by several people, because why would someone who knew Perl and C write an awk script that takes the absolute worst path through the orders to find matches for completed orders? They wouldn’t, which means someone worked on that later, and all they knew was awk.

But we play the cards we’re dealt. I’ll neaten this all up eventually, or try to convince the owners to replace it with one of several commercial apps that could do a far better job. I’d rather the latter, honestly. They should have a totally integrated system rather than this conglomeration.

Originally published at APLawrence.com

A.P. Lawrence provides SCO Unix and Linux consulting services http://www.pcunix.com