Deskew Tool Updated

A new version of the Deskew command line tool, introduced in the post Deskewing Scanned Documents, is available. Looks like quite a few people found it useful 🙂

What’s new in the latest version?

  • Background color can be defined (empty space around the original page after the rotation is filled with this color)
  • “Area of interest” rectangle that restricts skew detection to a selected part of the page (useful when e.g. noisy page borders or images confuse skew detection on the entire page)
  • 64-bit and Mac OS X support
  • PSD and TIFF file format support (TIFF only in Win32 for now, sorry)
  • Display of skew detection stats and program parameters

Download

  Deskew 1.20
» 4.1 MiB - 7,206 hits - January 5, 2011 (last update November 1, 2016)
Command line tool for deskewing scanned documents. Binaries for several platforms, test images, and Object Pascal source code included.

Source Code Repository

The public Mercurial source repository of Deskew is now hosted at Bitbucket: https://bitbucket.org/galfar/app-deskew.

Deskewing Scanned Documents

Check out updates and new versions of Deskew tool.

Some time ago I wrote a simple command line tool for deskewing scanned documents called Deskew. Technically, the correction is a rotation: a rotation preserves angles, whereas a skew (shear) transformation does not. However, “deskewing” is the commonly used term in this context.

Deskewing some smart paper

My approach is fairly common for this problem – the rotation angle is first determined using the Hough transform and then the image is rotated accordingly. The classical Hough transform is able to identify lines in an image, and it was later extended to detect arbitrary shapes.

Lines of text can be thought of as horizontal lines in the image. In a skewed scanned document all the lines are rotated by some small angle. We can start with the equation of a line, y = k · x + q. Since we’re interested in the angle, we can write the slope as k = sin(α) / cos(α), which gives y = (sin(α) / cos(α)) · x + q. Multiplying by cos(α) and rearranging, we get y · cos(α) − x · sin(α) = d, where d = q · cos(α). An infinite number of lines pass through every point [x, y] in the image, and each of them is defined by two parameters: the angle α and the distance from the origin d.

We want to consider lines only for certain points of the input image. Ideally, these would be points on the base lines on which the “text is sitting”. A simple way of finding these points is to look for black pixels that have white pixels just below them. For each of the classified points, we then determine the parameters α and d of the lines that go through it. To get a finite number of lines, we calculate d for angles α from a certain range (I use an angle step of 0.1 degrees). We want to find a line that intersects as many classified points as possible, so an accumulator is used to store “votes” for each calculated line. For each point that is believed to be on a text base line, we add one vote for each line that intersects it. At the end, we find the top lines that have the most votes. Ideally, these are the base lines of all lines of text in the document. Finally, we get the rotation angle by averaging the angle α of the top lines and rotate the whole image accordingly.
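The voting scheme described above can be sketched in a few lines of Python. This is only an illustration under assumptions, not Deskew's actual Object Pascal code: the function name detect_skew, the image representation (a list of rows of 0..255 gray values), the ±10 degree search range, and the choice of averaging the five best lines are all made up for the example. Only the 0.1 degree step and the "dark pixel with a light pixel below" rule come from the text.

```python
import math

def detect_skew(pixels, threshold=128, max_angle=10.0, step=0.1, top_lines=5):
    """Estimate the skew angle (degrees) of a grayscale image.

    pixels is a list of rows with values 0..255 (hypothetical format).
    Uses the line form y*cos(a) - x*sin(a) = d from the text.
    """
    height, width = len(pixels), len(pixels[0])
    # Baseline candidates: a dark pixel with a light pixel directly below it.
    points = [(x, y)
              for y in range(height - 1)
              for x in range(width)
              if pixels[y][x] < threshold and pixels[y + 1][x] >= threshold]
    if not points:
        return 0.0

    count = int(round(2 * max_angle / step)) + 1
    angles = [-max_angle + i * step for i in range(count)]

    # Accumulator: (angle index, rounded distance d) -> number of votes.
    votes = {}
    for ai, angle in enumerate(angles):
        s = math.sin(math.radians(angle))
        c = math.cos(math.radians(angle))
        for x, y in points:
            key = (ai, int(round(y * c - x * s)))
            votes[key] = votes.get(key, 0) + 1

    # Average the angles of the lines that collected the most votes.
    best = sorted(votes.items(), key=lambda kv: kv[1], reverse=True)[:top_lines]
    return sum(angles[ai] for (ai, _d), _n in best) / len(best)
```

The returned angle follows the sign convention of the line equation in image coordinates (y grows downwards). The image would then be rotated by that angle in the opposite direction to straighten the text.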

The important part is this one: “check for black pixels which have white pixels just below”. What’s black and what’s white is determined by comparing the value of the current pixel against a given threshold. For images where the background is plain white and the text is black, it’s easy to just use 0.5 as the threshold. But when the background/foreground distinction is not so sharp, calculating the threshold adaptively from the current image can be very useful. Deskew supports both adaptive threshold calculation and a constant threshold specified as a command line parameter.
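The post does not say how Deskew computes its adaptive threshold, so the sketch below swaps in a well-known technique, Otsu's method, purely as an illustration of what "adaptive" can mean here. The function name otsu_threshold and the 0..255 integer pixel format are assumptions for the example.

```python
def otsu_threshold(pixels):
    """Pick a threshold t (0..255) with Otsu's method.

    pixels is a list of rows of integer gray values 0..255.
    Values <= t form the dark class, values > t the light class.
    """
    hist = [0] * 256
    for row in pixels:
        for v in row:
            hist[v] += 1

    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray values in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance (up to a constant factor).
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t
```

Dividing the result by 255 gives a value on the same 0..1 scale as the constant 0.5 threshold mentioned above.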

Deskewing some math exercise

The implementation is written in Object Pascal and uses the Imaging library for reading and writing various image file formats. There are precompiled binaries for a few platforms; others can be built from the sources using the Free Pascal compiler. The archive also contains a few test images.

Imaging in C++ Builder

A few years ago I tried compiling Imaging in C++ Builder (it uses the Delphi compiler to generate an .obj file that the C++ linker can link, and it also generates a C++ header for the Pascal unit). It didn’t work – there was an internal compiler error, I think right in the ImagingTypes unit.

A few days ago I tried C++ Builder 2010 and was pleasantly surprised. It worked! I tried just the library core for now (ImagingTypes, Imaging, ImagingFormats, Pascal-only file handlers, etc.) and it works without problems. I’m not sure which C++ Builder version is required for a successful compile though. Versions 6 and 2006 stopped with an internal error, 2010 worked, and there are two other versions in between.

Anyway, I’ll try to check out most of the library before the 2010 trial expires for me. Hmm, I’m wondering how many people use C++ Builder for C++ development – I’ve never done anything serious in it, I basically just used it to get object files usable by Delphi – so I have no idea if it’s ok.

PS: Another piece of C++ Builder related news – a patch for the OpenJpeg library to get it compiling in BCB is posted here.

PasJpeg2000 News

The JPEG 2000 for Pascal project is based on the OpenJpeg library. For a very long time there was a bug that caused the alpha channel (the fourth and any subsequent image channels) to be saved with all samples having the value 0.5 (128 for 8-bit channels). This buggy behavior also depended on compiler settings – the optimization level in the case of GCC. You could use at most O1 in Windows and Linux, and only O0 in Mac OS X. The bug was also present when compiling with C++ Builder (to get object files usable in Delphi), but only when the irreversible DWT transformation was enabled in OpenJpeg during encoding (it wasn’t before, but the working versions of both PasJpeg2000 and Imaging now use it when the user selects lossy compression, to get smaller files).
You can read more about it in this news group post. Basically, it was all fixed by changing the condition in one if statement to prevent access to the fourth element of a three-element array.

So what can you expect in the next version of PasJpeg2000 library?
Higher GCC optimization levels should make it a lot faster when using Free Pascal (particularly in Mac OS X, where O0 was used). The irreversible DWT transformation produces smaller lossy files than the current PasJpeg2000 version, and with the optional MCT (multicomponent transform – basically RGB to YCbCr) you get even smaller ones. There’s now also a patch that enables OpenJpeg to read palettes from JP2 files, so indexed JPEG 2000 images could be supported too. And finally, there are some bug fixes (wrong reconstruction of subsampled files, …).
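To give an idea of what the RGB to YCbCr step does, here is a small floating-point sketch of the irreversible color transform as it is usually written for JPEG 2000. It is not taken from OpenJpeg or PasJpeg2000 (whose code may use fixed-point arithmetic and level shifting), and the function names are made up for the example.

```python
def rgb_to_ycbcr(r, g, b):
    """Forward irreversible color transform (floating point sketch)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = -0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Inverse transform; recovers RGB up to rounding error."""
    r = y + 1.402 * cr
    g = y - 0.344136 * cb - 0.714136 * cr
    b = y + 1.772 * cb
    return r, g, b
```

The transform concentrates most of the image energy in the Y channel, which is a large part of why the lossy files come out smaller when it is enabled.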

Block Compression, DXTC, And Imaging

Imaging has supported DXT image/texture compression since one of the earliest releases. The quality of the compressed images isn’t very high though (at least the compression isn’t too slow). For future Imaging versions I plan to ditch the current compression code and add a new one. To be precise, two new ones: a fast, lower quality mode (still probably better than Imaging’s current compression) and a slow, higher quality mode. The fast one will be based on Real-Time DXT Compression by id Software. I haven’t decided on the high quality one yet, but it will probably be something like the cluster fit algorithm from the Squish library.

Mainly for testing purposes during the implementation of these new methods, I want to create an extension for Imaging that compares two images (the original and the one reconstructed from the compressed original) and measures PSNR and some other quality metrics.
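The two basic metrics are easy to state. The sketch below is not the planned Imaging extension, just a minimal Python illustration that assumes both images are given as flat, equally long sequences of 8-bit sample values; the function names are chosen for the example.

```python
import math

def mse(original, reconstructed):
    """Mean square error between two equally sized sample sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("images must have the same number of samples")
    total = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    return total / len(original)

def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    error = mse(original, reconstructed)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)
```

A lower MSE and a higher PSNR both mean the reconstructed image is closer to the original.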

I’m also thinking of implementing a DXT5-based format using the YCoCg colorspace, and PVRTC (the texture compression currently used in the iPhone).

Few links if you’re interested:

Here’s a quick comparison of DXT compressors. The value in brackets is the MSE (mean square error) – a lower number means the compressed image is more similar to the original.

DXT compressors comparison

APNG Update

Implementing APNG loading and animating for Imaging wasn’t very hard work. However, there are not that many test images to be found on the Internet, and most of the available ones are very simple. They usually don’t use disposal methods and are basically just a collection of independent images. That’s a big difference compared to GIF files, some of which were quite difficult to animate right. I even got most of my nontrivial APNG test files by converting some more complex animated GIF files. Tools for creating APNG images are not yet as sophisticated as some GIF animation tools – there’s a GIF to APNG converter, APNG Anime Maker, and some web based tools that assemble a simple APNG file from a bunch of uploaded single PNG frames.

Now I’m gonna extend the PNG saver to allow saving multi-image data to simple APNG files, much like MNG saving works now. There’s really no good unified way to pass additional information to image file savers in the current library design – that’s one of the TODOs for the new Imaging architecture.

APNG Support for Imaging

I started working on support for the APNG format in the Imaging library. APNG is an unofficial extension of the PNG image file format created by two guys from Mozilla Corporation. The point of APNG is to allow storing simple animations in PNG files (hence the “A” for “Animated”).
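Since APNG reuses the PNG chunk container, telling an animated file from a plain one comes down to walking the chunks and looking for the animation control chunk (acTL), which the APNG specification places before the first IDAT chunk. The following Python sketch shows the idea; it is not Imaging's loader, the function names are made up, and CRC checking is left out for brevity.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def iter_chunks(data):
    """Yield (type, payload) for each chunk in a PNG byte string."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG stream")
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, payload, 4-byte CRC.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        yield chunk_type, data[pos + 8:pos + 8 + length]
        pos += 12 + length

def is_apng(data):
    """Return True if an acTL chunk appears before the first IDAT chunk."""
    for chunk_type, _payload in iter_chunks(data):
        if chunk_type == b"acTL":
            return True
        if chunk_type == b"IDAT":
            return False
    return False
```

A decoder that does not know APNG simply skips the unknown chunks and shows the default image, which is what keeps the extension backward compatible.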

There is already a PNG-like chunk based format for animations called MNG (already supported by Imaging, at least its basic features). However, MNG is quite a complex format and its support among browsers and image viewers/editors is lacking. A code library supporting all MNG features is huge.

APNG, on the other hand, is just an extension of PNG and its implementation is not so complex. I’m going to load only the raw frames from files at first and then see what has to be done to support animating the frames. The Canvas class will have to be used here for alpha blending subsequent frames onto the previous ones. I’ll add an option to turn the animating on/off, just like the one available for animated GIF files.
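Alpha blending a new frame onto the previous one is the standard "source over" compositing operation. The Python sketch below shows it for a single pixel with straight (non-premultiplied) 8-bit RGBA values; it is an illustration only, not the code of Imaging's Canvas class, and the function name blend_over is made up.

```python
def blend_over(src, dst):
    """Composite one RGBA pixel (src) over another (dst).

    Both pixels are (r, g, b, a) tuples with 8-bit, non-premultiplied values.
    """
    sr, sg, sb, sa = src
    dr, dg, db, da = dst
    a_src, a_dst = sa / 255, da / 255
    out_a = a_src + a_dst * (1 - a_src)
    if out_a == 0:
        return (0, 0, 0, 0)  # both pixels fully transparent

    def mix(s, d):
        # Weight each color by its effective coverage, then un-premultiply.
        return round((s * a_src + d * a_dst * (1 - a_src)) / out_a)

    return (mix(sr, dr), mix(sg, dg), mix(sb, db), round(out_a * 255))
```

An opaque source pixel replaces the destination and a fully transparent one leaves it unchanged; everything in between is a weighted mix.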

More info about APNG: http://www.animatedpng.com and https://wiki.mozilla.org/APNG_Specification.

Imaging in Mac OS X

Until a few weeks ago I wasn’t sure whether Vampyre Imaging Library works right in Mac OS X. One poster in Imaging’s forum wrote about scrambled images produced by the library on Mac OS X. Fortunately, the problem was related only to Lazarus LCL support – all other functionality worked fine.

After a not so straightforward installation of Mac OS X in VMWare, I fixed the issue just by changing the number “24” to “32” in the code (the TRawImage.Description.Depth field, in the LCL raw image to TBitmap conversion). Apparently, Carbon created a bitmap with 6 bits per color channel. Now I just need to check that the 24->32 change doesn’t break anything when using other LCL widget sets (I’m sure there was a reason for 24 bits, since I vaguely remember that 32 bits were there a few years ago) – so maybe conditional compilation will be needed here.

Another issue I noticed is that the LCL Imager demo couldn’t load the default image (Tigers.jpg) that is displayed when it is started without parameters. The demo uses a relative path to the image (from the Demos/Bin to the Demos/Data directory). The Mac OS X application LCLImager.app is placed in the Demos/Bin directory by Lazarus, but it is not a simple single file. It’s a directory itself and the actual demo executable is located somewhere inside. I haven’t really decided on a solution yet. Maybe embed the image in the executable as a resource?

See the difference?

Imaging 0.26.2 released!

My Vampyre Imaging Library was updated to version 0.26.2 a few days ago. This was mostly a fix/patch/update release with no significant new features.

I decided to remove Kylix support (CLX graphic classes, project files, build scripts; the core library still compiles). Kylix doesn’t work properly on many (all?) current Linux distros (so I can’t test it) and it was abandoned by Borland/Codegear quite some time ago. It was nice to have the DCC compiler in Linux, and it also made Borland turn the Delphi RTL crossplatform. There are rumors about crosscompiling features in upcoming Delphi releases (in 2010?), so maybe we’ll see DCC in Linux again.

Instead of the Kylix project files there are new ones for Delphi 2009. Imaging itself didn’t require many fixes to compile and work with Delphi 2009; most of them were related to the text-based file format loaders (XPM, PNM) and external libraries (JpegLib, ZLib).

More info and downloads at Imaging’s homepage.