2012-08-11

Re-inventing wheels

Quite often I end up writing some sort of utility function, normally in C. I tend not to write full-blown applications that often (well, not for the PC environment), as apps these days tend to be web based, with back-end utilities of various sorts. Today I was coding something which had previously been done in various scripts using various existing utilities. Basically, it is a tool to handle JPEG images from my camera(s) and populate a MySQL database entry with a load of data from each image (width, height, ISO, aperture, exposure, camera, lens, etc). The database is used to make the photo site work.

So there are a number of stages to the process of making a new utility like this.

1. Has someone else already written it?

Often this is the case, as there are so many people and so many computers out there. There are two snags you run into. Sometimes someone has written a big, all-powerful system that does everything, including what you want. This is great, except such things can be slow and big, and take a lot of RTFM and config, but sometimes it is the way to go. The other issue is that there may be a small, quick utility that does not quite do what you want, so it needs wrapping up in a script with other stuff, or requires some minor changes. One can change an existing app, and even send the changes back to its authors. Sometimes this too is the right way to go. However, all too often there is not an exact fit for what you need.

2. Can you script it?

The other approach, especially when there are various utilities and tools that each do some of the job, is to write a script. Code it using some scripting language (yes, I know, I still use csh, and should be shot). A script can be easy to hack about and change, but tends to be slower and messier than C code. In this instance, that is what I had done, but I could not find a way to add an extra detail (the lens name) using existing tools.

3. Is there a library?

Of course, if you are going to code something, then there may well be an existing library. Rarely is the library perfectly suited; like existing utilities, it can be bloated or broken. Sometimes there is a perfect library for the job. A good example of a library that just works is the curl library: whilst it is complex, it has good documentation and examples. I have, in the past, hand-coded HTTP requests, but these days I just use libcurl.

For the image processing task, there are both a JPEG library and an EXIF library. I was already using libexif in a simpler utility called by my scripts. It was working fine but only did part of the job.

4. Make a library?

One thing I have done a few times now is make my own library. I have libraries for working with MySQL, for XML file handling, and for generating PNG images. All of these came about because existing libraries were massive bloat or were broken in some way. Using my own library means I have to update and maintain it, but only as and when I need new features. It means it does just what I want, and I understand it. Several people are using my libraries already, so there is a tad more maintenance from time to time. This sort of thing works well when not working against a moving target, where specifications and current practice are constantly changing.

5. Writing it from scratch?

Sometimes I do end up properly reinventing the wheel, as I did today. It means reading RFCs and specifications carefully, coding, and testing. It means lots of error traps and test cases.

So what was I doing and why did I re-invent the wheel, yet again?

The problem is that the JPEG/EXIF file format is complex. Well, not so much complex, but it involves lots of different bits. JPEG is the compression codec, and a JPEG file is a sequence of marker segments. EXIF data lives in one of those segments (APP1). EXIF uses TIFF format tags. These are in IFD (Image File Directory) blocks, which reference each other. Then there are camera-specific uses of the format which are not standard but follow the standard to some extent. So some of the work is trial and error as well.
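
To make the TIFF part concrete, here is a minimal C sketch (not the code from this tool) of validating the 8-byte TIFF header at the start of the EXIF data and looking up one SHORT tag in an IFD. It assumes the TIFF block is already in memory; the names tiff_t, tiff_get, tiff_open and ifd_get_short are hypothetical. The key points are that the header declares the byte order ("II" little-endian or "MM" big-endian), every offset is relative to the start of that header, and each IFD is a 2-byte entry count followed by 12-byte entries.

```c
#include <stddef.h>
#include <stdint.h>

/* A TIFF structure as embedded in EXIF; offsets are relative to base. */
typedef struct {
    const uint8_t *base;
    size_t len;
    int little;          /* 1 for "II" (little-endian), 0 for "MM" (big-endian) */
} tiff_t;

/* Read an unsigned 1..4 byte value at off using the file's byte order. */
static uint32_t tiff_get(const tiff_t *t, size_t off, int nbytes) {
    uint32_t v = 0;
    for (int i = 0; i < nbytes; i++) {
        int shift = 8 * (t->little ? i : nbytes - 1 - i);
        v |= (uint32_t)t->base[off + i] << shift;
    }
    return v;
}

/* Validate the header ("II"/"MM", magic 42, IFD0 offset). Returns IFD0 offset or 0. */
static uint32_t tiff_open(tiff_t *t, const uint8_t *buf, size_t len) {
    if (len < 8) return 0;
    t->base = buf;
    t->len = len;
    if (buf[0] == 'I' && buf[1] == 'I') t->little = 1;
    else if (buf[0] == 'M' && buf[1] == 'M') t->little = 0;
    else return 0;
    if (tiff_get(t, 2, 2) != 42) return 0;
    uint32_t ifd0 = tiff_get(t, 4, 4);
    return (ifd0 >= 8 && (size_t)ifd0 + 2 <= len) ? ifd0 : 0;
}

/* Look up a SHORT (type 3) tag in one IFD. Entries are tag(2) type(2) count(4)
   value-or-offset(4); a single SHORT sits in the first two bytes of the value
   field. Returns 1 and sets *value if found. */
static int ifd_get_short(const tiff_t *t, uint32_t ifd, uint16_t tag, uint16_t *value) {
    if ((size_t)ifd + 2 > t->len) return 0;
    uint32_t n = tiff_get(t, ifd, 2);
    for (uint32_t i = 0; i < n; i++) {
        size_t e = (size_t)ifd + 2 + 12 * (size_t)i;
        if (e + 12 > t->len) return 0;              /* truncated IFD */
        if (tiff_get(t, e, 2) != tag) continue;
        if (tiff_get(t, e + 2, 2) != 3 || tiff_get(t, e + 4, 4) < 1) return 0;
        *value = (uint16_t)tiff_get(t, e + 8, 2);
        return 1;
    }
    return 0;
}
```

For example, after tiff_open() returns the IFD0 offset, ifd_get_short(&t, ifd0, 0x0112, &v) would fetch the Orientation tag, whichever byte order the camera used.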

I want to extract a few key parameters from the JPEG file quickly. I don't want to load the whole file, just the necessary headers. Then I update the MySQL database. I can use my own library for the MySQL stuff, which also avoids any SQL injection attacks by someone sending me a JPEG with SQL statements embedded in it.
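
The MySQL library itself is not shown here, so as a hypothetical sketch of the escaping idea only: sql_escape() below escapes a string for use inside a quoted MySQL literal, covering the same characters that MySQL's own mysql_real_escape_string() escapes (backslash, both quote characters, NUL, newline, carriage return and Control-Z). It assumes a connection character set such as UTF-8, where no multi-byte character contains a backslash byte; in real code, mysql_real_escape_string() or prepared statements are the safer route.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd copy of in[0..len) escaped for a
   single- or double-quoted MySQL string literal. Caller frees. NULL on OOM. */
static char *sql_escape(const char *in, size_t len) {
    char *out = malloc(2 * len + 1), *o = out;   /* worst case: every byte doubles */
    if (!out) return NULL;
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        switch (c) {
        case '\0':   *o++ = '\\'; *o++ = '0';  break;
        case '\n':   *o++ = '\\'; *o++ = 'n';  break;
        case '\r':   *o++ = '\\'; *o++ = 'r';  break;
        case '\\':   *o++ = '\\'; *o++ = '\\'; break;
        case '\'':   *o++ = '\\'; *o++ = '\''; break;
        case '"':    *o++ = '\\'; *o++ = '"';  break;
        case '\x1a': *o++ = '\\'; *o++ = 'Z';  break;   /* Control-Z */
        default:     *o++ = c;
        }
    }
    *o = '\0';
    return out;
}
```

Taking an explicit length rather than relying on strlen() matters here, because text pulled out of an EXIF field can legitimately contain NUL bytes.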

I managed to get all but one of the data fields using libexif. The one I wanted was the lens name, which is in the Maker Note field in the EXIF sub-IFD referenced from IFD0. Now, libexif is clever, in that it does not just grab the tagged fields, but understands many of them from the EXIF specification. It has the names of the fields, and knows how to display them. It knows Maker Note, and understands the formats used by many camera makes, including Canon. However, the lens name is not in its list, and so it does not extract it. There seems to be no way to tell it that I know what it is and to just give me the damn string.
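
The route to the lens name can be sketched as a chain of IFD lookups: find the Exif sub-IFD pointer (tag 0x8769) in IFD0, then the MakerNote (tag 0x927C) in that sub-IFD, then a string tag inside the maker note. This is a minimal sketch, not libexif's API or the tool described here, and makernote_string() and its helpers are hypothetical names. It assumes the maker note is itself a plain IFD whose offsets are relative to the TIFF header, which is believed to hold for Canon but not for every make.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const uint8_t *base;   /* first byte of the TIFF header; offsets are relative to it */
    size_t len;
    int little;            /* 1 for "II", 0 for "MM" */
} tiff_t;

static uint32_t tiff_get(const tiff_t *t, size_t off, int nbytes) {
    uint32_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v |= (uint32_t)t->base[off + i] << (8 * (t->little ? i : nbytes - 1 - i));
    return v;
}

/* Return the offset of the 12-byte entry for tag in the IFD at ifd, or 0 if absent. */
static size_t ifd_entry(const tiff_t *t, size_t ifd, uint16_t tag) {
    if (ifd + 2 > t->len) return 0;
    uint32_t n = tiff_get(t, ifd, 2);
    for (uint32_t i = 0; i < n; i++) {
        size_t e = ifd + 2 + 12 * (size_t)i;
        if (e + 12 > t->len) return 0;
        if (tiff_get(t, e, 2) == tag) return e;
    }
    return 0;
}

/* IFD0 -> Exif sub-IFD (0x8769) -> MakerNote (0x927C) -> ASCII (type 2) tag.
   ASSUMPTION: the maker note is a bare IFD (believed true for Canon).
   Copies at most outlen-1 bytes into out. Returns 1 on success. */
static int makernote_string(const uint8_t *buf, size_t len, uint16_t tag,
                            char *out, size_t outlen) {
    tiff_t t = { buf, len, 0 };
    if (len < 8 || outlen == 0) return 0;
    if (buf[0] == 'I' && buf[1] == 'I') t.little = 1;
    else if (buf[0] != 'M' || buf[1] != 'M') return 0;
    if (tiff_get(&t, 2, 2) != 42) return 0;

    size_t e = ifd_entry(&t, tiff_get(&t, 4, 4), 0x8769);      /* ExifIFDPointer */
    if (!e) return 0;
    e = ifd_entry(&t, tiff_get(&t, e + 8, 4), 0x927C);         /* MakerNote */
    if (!e) return 0;
    e = ifd_entry(&t, tiff_get(&t, e + 8, 4), tag);            /* tag inside maker note */
    if (!e || tiff_get(&t, e + 2, 2) != 2) return 0;

    uint32_t count = tiff_get(&t, e + 4, 4);
    size_t src = count <= 4 ? e + 8 : tiff_get(&t, e + 8, 4);  /* <= 4 bytes stored inline */
    if (src > len || count > len - src) return 0;
    size_t n = count < outlen - 1 ? count : outlen - 1;
    memcpy(out, buf + src, n);
    out[n] = '\0';
    return 1;
}
```

For Canon files, a call such as makernote_string(tiff, len, 0x0095, lens, sizeof lens) would then fetch the lens name, on the assumption that 0x0095 is Canon's lens model tag (it is the number commonly documented for it, but treat it as something to verify against real files).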

Well, this was a new thing, so I was prepared to leave it out for now, and see later whether there is an updated libexif that knows about it.

But then I found some images were not working. Basically, two of the really simple parameters were not always right: the width and height. This is because I was getting these from the EXIF tags. But one of the cameras seems to rotate the image for portrait use without changing the EXIF tags (which, to be fair, are X and Y, not width and height). So I was seeing width and height swapped on portraits, yet the JPEG itself is right. The problem is that the image width and height are not an EXIF field; they are in the JPEG SOF0 header. Previously I used an ImageMagick function to get this in a script, but I wanted to do it all in one quick and efficient C tool.
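
Reading the size from the frame header is simple once the segment is in hand. A minimal sketch, assuming seg points at the 0xFF of the marker and at least the first nine bytes of the segment have been read; sof_dimensions() is a hypothetical name. Note that the height comes before the width. It also accepts SOF1 and SOF2 (extended sequential and progressive), which share the same header layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Frame header layout (big-endian): FF Cn, length(2), precision(1),
   height(2), width(2), component count(1), ...
   Returns 1 and sets *width / *height on success, 0 if not SOF0..SOF2. */
static int sof_dimensions(const uint8_t *seg, size_t len,
                          unsigned *width, unsigned *height) {
    if (len < 9 || seg[0] != 0xFF) return 0;
    if (seg[1] < 0xC0 || seg[1] > 0xC2) return 0;   /* SOF0, SOF1, SOF2 only */
    *height = (unsigned)seg[5] << 8 | seg[6];
    *width  = (unsigned)seg[7] << 8 | seg[8];
    return 1;
}
```

Because these numbers describe the pixels actually stored, they stay right for a rotated portrait even when the EXIF X and Y dimension tags do not.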

So, re-inventing the wheel, I load the headers, including the first bytes of the SOF0 header, to get the width and height. I was able to scan the Maker Note for the lens name with no problem. It was not that hard to code. In fact, I think it took no longer than using libexif did, and I now understand EXIF and TIFF a lot better as a result. I wish I had done it this way the first time.
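
Loading only the headers amounts to walking the JPEG marker segments from the start of the file, skipping each one by its length field, and stopping at SOS (start of scan), after which the compressed image data begins. Here is a minimal sketch of that walk, using a hypothetical jpeg_read_exif() that returns just the EXIF TIFF block from the APP1 segment; an SOF0 segment could be caught in the same loop. It assumes no standalone markers (such as RSTn) appear before SOS, which is normally the case.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd copy of the EXIF TIFF block (APP1 payload after "Exif\0\0"),
   setting *len, without reading the compressed image data. Returns NULL if no
   EXIF segment appears before SOS. The caller frees the result. */
static uint8_t *jpeg_read_exif(FILE *f, size_t *len) {
    if (fgetc(f) != 0xFF || fgetc(f) != 0xD8) return NULL;   /* SOI */
    for (;;) {
        if (fgetc(f) != 0xFF) return NULL;                    /* marker prefix */
        int m;
        do m = fgetc(f); while (m == 0xFF);                   /* skip fill bytes */
        if (m == EOF || m == 0xD9 || m == 0xDA) return NULL;   /* EOI, or SOS: image data follows */
        int hi = fgetc(f), lo = fgetc(f);
        if (hi == EOF || lo == EOF) return NULL;
        long body = ((long)hi << 8 | lo) - 2;                 /* length field counts itself */
        if (body < 0) return NULL;
        if (m == 0xE1 && body >= 6) {                          /* APP1: EXIF or e.g. XMP */
            char id[6];
            if (fread(id, 1, 6, f) != 6) return NULL;
            body -= 6;
            if (memcmp(id, "Exif\0\0", 6) == 0) {
                if (body < 8) return NULL;                     /* too short for a TIFF header */
                uint8_t *buf = malloc((size_t)body);
                if (!buf) return NULL;
                if (fread(buf, 1, (size_t)body, f) != (size_t)body) { free(buf); return NULL; }
                *len = (size_t)body;
                return buf;
            }
        }
        if (fseek(f, body, SEEK_CUR) != 0) return NULL;      /* skip rest of segment */
    }
}
```

Using fseek() to hop over segments means only a few kilobytes of a multi-megabyte file are ever read.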
