Digitizing books with a camera and open source software

Last year I needed to help digitize an old book, so that its content could be updated, expanded and ultimately reprinted. The current copy was written directly on a typewriter and ran to almost 250 pages of very dense text. Transcribing that much content manually was not an enticing prospect, so I started investigating the options for automation. I quickly found the open source Tesseract OCR software, which runs on Linux, Windows and OS X. Tesseract dates back to the mid-80s and was open sourced by HP in 2005. It focuses on the core OCR task, leaving image acquisition, and likewise post-recognition processing, to other tools.

Reading about how Tesseract works, it becomes clear that the biggest factor in accuracy is the quality of the input images. It converts the input image to monochrome, approximately speaking, by applying a threshold algorithm. For this to work effectively the image has to be evenly illuminated; any kind of gradient across the background will confound the monochrome conversion, leading to large blocks of text getting lost.
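The thresholding idea can be illustrated with a toy sketch. This is a deliberate simplification (Tesseract's actual binarization is more sophisticated): a fixed global threshold separates text from paper cleanly on an evenly lit page, but a brightness gradient drags background pixels below the threshold.

```python
# Illustrative sketch only, not Tesseract's real algorithm: a simple global
# threshold maps grey pixel values to text (1) or background (0).
THRESHOLD = 128

def binarize(row):
    """Treat every 8-bit grey value below the threshold as text."""
    return [1 if v < THRESHOLD else 0 for v in row]

# Evenly lit page: text pixels (~40) and paper pixels (~220) separate cleanly.
even = [220, 40, 40, 220, 220, 40, 220]
print(binarize(even))

# The same page with a strong shadow on the left: paper pixels on the dark
# side now fall below the threshold and whole regions classify as "text".
shadow = [max(v - g, 0) for v, g in zip(even, [120, 120, 110, 100, 0, 0, 0])]
print(binarize(shadow))
```

The second result shows the shadowed paper pixels wrongly classified as text, which is exactly why even illumination matters so much.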

A flatbed scanner is not a satisfactory way of capturing the pages of the book, because it is impossible to get the pages flat without damaging the spine. Instead a digital SLR camera is the preferred tool, with flashgun(s) to provide illumination. Even when using a camera, the spine of the book is still a problem: if you simply open the book on a flat surface, the pages will curve near the spine, leading to uneven illumination & distortion of the text which will ruin OCR accuracy. The solution is to construct some kind of book scanning rig that supports the book such that it opens to an angle somewhere in the region of 110-140 degrees. This might sound like overkill for a single book, but it is well worth the effort.

The simple book scanning rig, constructed from MDF, large enough to hold a book up to approximately 20x30cm in size.

The rig is very simple to construct, requiring little more than a sheet of MDF, wood screws, a jigsaw, drill and screwdriver. The exact dimensions are not important; this one was sized to fit the book being digitized, using off-cuts of MDF left over from a previous job. Two rectangular pieces of MDF were used to form the long sides of the rig, with a v-shape cut out from them to form an angle of approximately 120 degrees. Two more pieces of MDF were cut to form the short ends of the rig. The four pieces were screwed together to give the basic box form. A final two pieces of MDF were cut to cover the v-shape depression, and screwed to the sides. The screws were all countersunk and covered with wood filler. The final task was giving the whole thing a coat of white paint. The construction didn't take more than an hour, plus a few hours between the two coats of paint. It was left to fully dry overnight.

The v-shaped depression cradles the spine of the book, but the pages are still a little curved. The trick to dealing with this is to place a sheet of glass on top of the page. The glass needs to be thick enough to have sufficient weight to hold the pages completely flat. A salvaged window pane in the workshop happened to be the right size and a suitable weight. If buying glass new, it is preferable to have the edges rounded off smooth. The salvaged glass had rough edges, so some red electrical tape was stuck over them to protect the book. Bear in mind that the tape must not cover any text that needs scanning – generally there are sufficient margins in books that this won't be a problem.

Book on the scanning rig with glass weight holding the page flat

With construction of the rig to support the book complete, it was time to begin the image capture process. To avoid distortion of the text, the camera needs to be perfectly aligned such that the lens is perpendicular to the page. Needless to say, the camera needs to be placed on a tripod and positioned over the pages of the book. The parallel pencil lines drawn on the side of the book scanner rig are there to assist in aligning the camera. The OCR process works best with images that have a resolution of at least 300 DPI. To get near this kind of resolution, the camera lens needs to be chosen to ensure the page will fill the image. Zoom lenses in particular suffer from geometric distortion at their extremes of focal length and aperture. Picking a middling aperture in the region of f/8 will minimize the distortion & thus improve OCR accuracy.

To avoid a gradient across the background of the page the lighting setup during image capture is important. It is unlikely that either normal room lighting or natural lighting from windows will give even illumination. It is better to take full control by using camera flash guns. Ideally a pair of flash guns would be positioned either side of the book, their combined beams giving the desired result. I only had a single flash gun available, but the room was blessed with large white walls and ceilings. The flash gun was thus pointed at an angle towards the wall on a wide spread. This required a high power level on the flash, but the resulting reflected light gave excellent results.

It is possible to do everything in camera, but since I wrote the Entangle Photo software for controlling digital SLRs, I naturally used a laptop to control the camera. This allowed the images to be reviewed on a large display on the fly. It was now simply a matter of running through the book, turning the page and pressing the shutter button ~250 times. It was quicker to capture all the odd pages first, and then flip the book around and capture all the even pages. If capturing lots of books a second camera would be desirable allowing odd and even pages to be captured at the same time.

Camera mounted on tripod above the book scanning rig. Flash gun is pointed at a white wall for reflected light. The laptop controls the camera and displays captured images.

With this rig in operation it was possible to easily capture about 6-8 pages a minute, allowing the whole book to be captured in less than 40 minutes. Due to the positioning of the camera rig, the pages were all perfectly square with respect to the image, but it is still necessary to crop the images to eliminate borders which can confuse the OCR process. The Darktable application makes it easy to process large numbers of raw images, cropping them all to the same extents.

The cropped images can now be run through the Tesseract command line program to perform the OCR process. The results, for the most part, were really very impressive in terms of accuracy. Where it had problems in particular was with some pages typed on very thin paper, such that text from the reverse would bleed through. Those pages had to be thrown away and transcribed by hand, but this was only 6 pages out of 250. Tesseract uses a language dictionary to analyse the recognised text and resolve ambiguity. This can lead to similar looking words being substituted, and it notably falls down on place and person names, which are largely absent from the dictionary.
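A batch run over the cropped images can be scripted along these lines. The page-NNN.png naming and the pages/ directory are assumptions for illustration, not the actual filenames used here; the `tesseract image outputbase -l lang` invocation is the tool's standard form.

```python
# Sketch of batch OCR over the cropped page images.
import subprocess
from pathlib import Path

def tesseract_command(image, lang="eng"):
    """Build the Tesseract CLI invocation for one page image.
    The output basename matches the image, so page-001.png -> page-001.txt."""
    out_base = str(Path(image).with_suffix(""))
    return ["tesseract", str(image), out_base, "-l", lang]

# To actually run it over a directory of captures (hypothetical layout):
# for img in sorted(Path("pages").glob("page-*.png")):
#     subprocess.run(tesseract_command(img), check=True)
```

Running with the right `-l` language dictionary matters, since as noted above the dictionary drives how ambiguous characters are resolved.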

The results of OCR on each image were saved to a separate plain text file. These were loaded into gedit, and the input images into gthumb. Their respective windows were tiled to fill the screen side-by-side. gedit annoyingly doesn't have a way to turn on the spell checker by default, but a one line change in the source code & a rebuild was sufficient to fix this. Each text file had to be read through and the highlighted spelling mistakes corrected. This was by far the most time consuming part, given the large amount of text per page, taking 1-3 minutes per page depending on the quantity of corrections needed, adding up to many hours' work in total across the full 250 pages.

The OCR process aims to preserve the page layout so will put hard line breaks after every line of text. The new digital edition of the book is expected to be printed at a different size than the original, so the layout was inevitably going to change. Thus a second pass through every page is made to remove the hard line breaks, leaving just the paragraph breaks.
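The unwrapping pass lends itself to a small script. A minimal sketch, assuming paragraph breaks appear as blank lines in the OCR output:

```python
# Remove hard line breaks inside paragraphs while preserving the blank
# lines that separate paragraphs.
def unwrap(text):
    """Join wrapped lines within each paragraph with spaces."""
    paragraphs = text.split("\n\n")
    return "\n\n".join(" ".join(p.split("\n")) for p in paragraphs)

page = "First line of a\nparagraph continues.\n\nSecond paragraph\nhere."
print(unwrap(page))
```

Real OCR output may need extra handling (e.g. hyphenated words split across lines), so treat this as a starting point rather than a complete solution.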

Each plain text file still represents a single page, so they are now concatenated with a form feed character inserted between each file's contents. The digitizing of the book is now essentially complete, and the result is ready to be loaded into LibreOffice for the interesting editing work to start. It is hard to say accurately how long the whole digitization process took, since the effort was spread, sporadically, a little at a time over a large number of weeks. It was, however, definitely faster and less tedious than transcribing the entire text of the book manually. With good quality input images, and the right language dictionary, the accuracy of Tesseract OCR is very impressive and it is well worth using.
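The concatenation step can be sketched as follows; the page-*.txt naming in the commented usage is an assumption.

```python
# Join per-page OCR text with a form feed (\f) as the page separator.
from pathlib import Path

def concatenate_pages(page_texts):
    """Join page bodies with a form feed character between each."""
    return "\f".join(page_texts)

# Hypothetical usage over the per-page text files:
# pages = sorted(Path(".").glob("page-*.txt"))
# Path("book.txt").write_text(concatenate_pages(p.read_text() for p in pages))
```

The form feed is what lets LibreOffice (and other tools) recognise the original page boundaries in the combined file.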

Polar alignment of the SkyWatcher Star Adventurer

The SkyWatcher Star Adventurer mount is a good quality equatorial tracking mount for DSLR based astrophotography. It is reasonably portable, so combined with a sturdy tripod, it is well suited for travel when weight & space are at a premium. It is not restricted to night time usage either, providing an option for tracking the movement of The Sun, making it suitable for general solar imaging / eclipse chasing too.

The key to getting good results from any tracking mount is to take care when doing the initial setup and alignment. The Star Adventurer comes with an illuminated polar scope to make this process easier. The simple way to use this is to rotate it so that the clock positions (3, 6, 9, 12) have their normal orientation, and then use a smart phone application to determine where Polaris should appear on the clock face. The alternative way is to use the date / time graduation circles to calculate the positioning from the date and time. Learning this process is helpful if your phone batteries die, or you simply want to avoid bright screens at night time.

The explanation of how to use the graduation circles in the manual is not as clear as it should be though, so this post attempts to walk through the process with some pictures along the way.

Observing location properties

The first thing to determine is the longitude & latitude of the observing location, by typing “coordinates <your town name>” into Google. In the case of Minneapolis it replies with

44.9778° N, 93.2650° W
The second piece of information required is the difference between the observing location longitude and the longitude associated with the timezone. Minneapolis is in the USA Central timezone, which has a central meridian of 90° W, so the offset is:
93.2650° W - 90° W == 3.2650° W
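The same arithmetic can be written out as a small helper. The 15°-per-hour relationship between a timezone's UTC offset and its central meridian is standard (US Central is UTC-6, and 6 × 15 = 90° W); the function name itself is just for illustration.

```python
# Compute the offset between an observing longitude and its timezone's
# central meridian, as needed for the time meridian circle on the mount.
def meridian_offset(longitude_w, utc_offset_hours):
    """Degrees west (+) or east (-) of the timezone's central meridian."""
    central_meridian_w = abs(utc_offset_hours) * 15
    return longitude_w - central_meridian_w

# Minneapolis: 93.2650 W, US Central (UTC-6)
print(meridian_offset(93.2650, -6))
```

A positive result means the observing site is west of the meridian, matching the 3.2650° W figure above.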

Rough tripod alignment & mount assembly

Even though the Star Adventurer is a small portable mount, the combination of the mount, one or more cameras, and lens / short tube telescopes will have considerable weight. With this in mind, don’t try to get away with a light or compact tripod, use the strongest and heaviest tripod that you have available to support it well. When travelling, a trade off may have to be made to cope with luggage restrictions, which in turn can limit the length of exposures you can acquire and/or make it more susceptible to wind and vibrations. To increase the rigidity of any tripod, avoid fully extending the legs and keep them widely spaced. If the tripod has bracing between the legs use that, and if possible hang a heavy object beneath the tripod to damp any vibrations.

On this compact tripod the legs are only extended by 1/4 the normal length to maximize stability.

With the tripod erected, the first step is to attach the equatorial wedge. The tripod should be oriented so that the main latitude adjustment knob on the wedge is pointing approximately north. Either locate Polaris in the sky, or use a cheap hand held compass, or even a GPS app on a smart phone to determine north.

At this time also make sure that the two horizontal adjustment knobs are set to leave an equal amount of slack available in both directions. This will be needed when we come to fine tune the polar alignment later.

Now adjust the tripod legs to make the base of the wedge level, using the built-in omnidirectional spirit level to gauge it.

The spirit level on the wedge should have its bubble centred to ensure the tripod is level

The final part of the approximate alignment process is to use the altitude adjustment knob on the wedge to set the angle to match the current observing location latitude. As noted earlier, the latitude of Minneapolis is 44.9778° N, so the altitude should be set to 45°. Each major tick on the altitude scale covers 15°, and is subdivided into 5 minor ticks each covering 3°.

Latitude adjustment set to 45 corresponding to the latitude of Minneapolis

At this point the main axis of the mount should be pointing near to the celestial north pole, but this is nowhere near good enough to avoid star trailing. The next step is the fine polar alignment.

Checking polar scope pattern calibration

For a mount that has not yet been used, it is advisable to check the calibration of the polar scope pattern, as it may not be correct upon delivery, especially if the unit has been used for demo purposes by the vendor or was a previous customer return. Once calibrated, it should stay correct for the lifetime of the product, so this won’t need repeating every time. Skip to the next heading if you know the pattern is correctly oriented already.

The rear of the main body has two graduated and numbered circles tracking time and date. The outer circle is fixed against the body and marked with numbers 0-23. Each of the large graduation marks represents 1 hour, while the small graduation marks represent 10 minutes each. The inner circle rotates freely and is marked with numbers 1 through 12. Each of the large graduation marks represents 1 month, while the small graduation marks represent approximately 2 days each. The inner circle has a second scale marked on it, with numbers 20, 10, 0, 10, 20 representing the time meridian offset in degrees. The eyepiece has a single white line painted on it which is the time meridian indicator.

To check calibration the inner circle needs to be rotated so that the time meridian circle zero position aligns with the time meridian indicator on the eyepiece.

The zero position on the time meridian circle is aligned with the time meridian indicator mark on the eyepiece.

Now, while being careful not to move the inner circle again, the mount axis / eyepiece needs to be rotated so that the zero mark on the outer time graduation circle aligns with the Oct 31st mark on the date graduation circle (the big graduation between the 10 and 11 numbers).

While not moving the inner circle, the mount axis / eyepiece is rotated so that the number zero on the time graduation circle lines up with the large graduation between the 10 and 11 marks on the date graduation circle.

These two movements have set the mount to the date and time at which Polaris sits directly below the north celestial pole. Thus when looking through the eyepiece, the polar alignment pattern should appear with normal orientation: 6 at the bottom, 9 to the left, 3 to the right and 0 at the top. If this is not the case, then a tiny allen key needs to be used to loosen the screws holding the pattern, which can then be rotated to the correct orientation.

As mentioned above this process only needs to be done once when first acquiring the mount. Perhaps check it every 6-12 months, but it is very unlikely to have moved unless the screws holding the pattern were not tightened correctly.

Polar alignment procedure

After getting the tripod set up with the wedge attached and main body mounted, the process of polar alignment can almost begin. First it is recommended to attach the dovetail bar and any cameras to the mount body. It is possible to attach these after polar alignment, but there is a risk of moving the mount, which can ruin the alignment. The only caveat with doing this is that on many versions of the mount it is impossible to attach the LED polar illuminator once the dovetail is attached. Current generations of the product ship a shim to solve this problem, while for older generations an equivalent adapter can be created with a 3D printer, and can often be found pre-printed on ebay.

Earlier the difference in longitude between the timezone meridian and the current observing location was determined to be 3.2650° W. The inner graduated disc on the mount needs to be rotated so that the time meridian indicator on the eyepiece points to the time meridian circle position corresponding to 3.2650° W

The time meridian indicator is aligned with the time meridian circle position corresponding to 3 W, which is the offset between the current observing location and the timezone meridian.

Now, without moving the inner dial, the main mount axis / eyepiece needs to be rotated to align the time graduation circle with the date graduation circle to match the current date and time. It is important to use the time without daylight saving applied. For example, if observing on May 28th at 10pm, the time graduation circle marking for 21 needs to be used, not 22. May is the 5th month, and with each small graduation corresponding to 2 days, the date graduation circle needs to be aligned on the graduation just before the big marker indicating June 1st.
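Removing the daylight saving offset can be double-checked with Python's zoneinfo module; the year and timezone here are an arbitrary example for the Minneapolis observer.

```python
# Strip any daylight saving offset, giving the standard wall-clock time
# that the graduation circles expect.
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo

def standard_time(local):
    """Return the local time with the DST component removed."""
    return local - (local.dst() or timedelta(0))

# 10pm DST on May 28th in US Central becomes 9pm standard time,
# so the 21 marking on the time graduation circle is used.
observing = datetime(2023, 5, 28, 22, 0, tzinfo=ZoneInfo("America/Chicago"))
print(standard_time(observing).strftime("%H:%M"))
```

In winter, when DST is not in effect, `dst()` returns zero and the time is used unchanged.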

The time graduation circle marking for 21 is aligned with the date graduation circle marking for May 28th.

The effect of these two movements is to rotate the polar scope pattern so that the 6 o’clock position is pointing to where Polaris is supposed to lie. Hopefully Polaris is visible through the polar scope at this point, but it is very unlikely to be at the right position. The task is now to use the latitude adjustment knob and two horizontal adjustment knobs to fine tune the mount until Polaris is exactly at the 6 o’clock position on the pattern.

View of pattern through polar scope when set for 10pm on May 31st in Minneapolis, which is almost completely upside down. Polaris should be placed at the 6 o’clock position on the pattern.

Notice that the polar scope pattern has three concentric circles and off to the side of the pattern there are some year markings. Polaris gradually shifts from year to year, so check which of the concentric rings needs to be used for the current observing year.

The mount is now correctly aligned with the north celestial pole and should accurately track the rotation of the Earth, allowing exposures several minutes long without stars trailing. All that remains is to turn the power dial to activate tracking. One nice aspect of equatorial mounts compared to alt-az mounts is that they can be turned off/on at will with no need to redo the alignment. When adding or removing equipment, however, it is advisable to recheck the polar scope to ensure the mount hasn't shifted its pointing.

The power dial set to normal speed tracking for stars.

Processing workflow for lunar surface images using GIMP and G’MIC

This post is going to illustrate a post-processing workflow for lunar surface images using the open source tools GIMP and G’MIC. GIMP has been largely ignored by astrophotographers in the past since it only supported 8-bit colour channels. The long awaited GIMP 2.10 release in April 2018, introduced 16-bit and 32-bit colour channel support, along with many other important improvements that enable high quality post-processing.

Astrophotographers seeking to present high detail images of The Moon, have long recognised that capturing a single still image is not sufficient. Instead normal practice is to capture a high definition video at a high frame rate lasting for a minute or more, by attaching a webcam to a telescope in place of the eyepiece. A program such as AutoStakkert2 will then process the video, analysing the quality of each video frame, selecting the “best” frames, and then merging them to produce a single frame with less noise and more detail. The output of AutoStakkert2 though is not a finished product and requires further post-processing to correct various image artefacts and pull out the inherent detail. A common tool used for this is Registax which particularly found popularity because of its wavelet sharpening feature.

Use of AutoStakkert2 can be a blog post in its own right, so won’t be covered here. What follows will pick up immediately after stacking has produced a merged image, and show how GIMP and G’MIC can replace use of the closed source, Windows based Registax tool.

The source material for the blog post is a 40 second long video captured with a modified Microsoft Lifecam HD paired with a Celestron Nexstar 4GT telescope. Most astrophotographers will spend £100 or more on CCD cameras directly designed for use with telescopes, so this modded Lifecam is very much at the low end of what can be used. This presents some extra challenges, but as can be seen, still allows for great results to be obtained with minimal expense.

The first noticeable characteristic of the video is a strong pink/purple colour cast at the edges of the frame. This is caused by unwanted infrared light reaching the webcam sensor. An IR cut filter is attached to the webcam, but it is positioned too far away from the CCD chip to be fully effective. A look at a single video frame at 100% magnification shows a high level of speckled chromatic noise across the frame. Finally, the image slowly drifts due to inaccurate tracking of The Moon's movement by the telescope mount, and features are stretched and squashed due to atmospheric distortion.

100% magnification crop of a single still video frame before any processing

After the video frames are stacked using AutoStakkert2, the resulting merged frame shows significant improvements. The speckled noise has been completely eliminated by the stacking process, which effectively averages out the noise across 100s (even 1000s) of frames. The image, however, appears very soft, lacking any fine detail, and there is chromatic aberration present on the red and blue channels.

100% magnification crop after stacking top (50% by quality) video frames in AutoStakkert2

AutoStakkert2 will save the merged image as a 16-bit PNG file, and GIMP 2.10 will honour this bit-depth when loading the image. It is possible to then convert it to 32-bit per channel before processing, but for lunar images this is probably overkill. The first task is to get rid of the chromatic aberration, since that has the effect of making the image even softer. With this particular webcam and telescope combination it is apparent that the blue channel is shifted 2 or 3 pixels up relative to the green, while the red is shifted 2 or 3 pixels down. It is possible to fix this in GIMP alone by decomposing the image, creating a layer for each colour channel, then moving the x,y offset of the blue and red layers until they line up with green, and finally recomposing the layers to produce a new colour image.
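The principle of the re-alignment can be sketched on toy data. This is not GIMP's or G'MIC's implementation, just an illustration of shifting one channel's rows back into register with the green channel.

```python
# Each colour channel is a list of pixel rows. Shifting rows down (positive
# offset) or up (negative) re-registers a misaligned channel, padding the
# vacated rows with zeros.
def shift_rows(channel, offset):
    """Shift a channel vertically by `offset` rows (positive = down)."""
    h = len(channel)
    blank = [0] * len(channel[0])
    if offset >= 0:
        return [blank] * offset + channel[: h - offset]
    return channel[-offset:] + [blank] * (-offset)

green = [[1], [2], [3], [4]]
blue = [[2], [3], [4], [0]]      # every row sits 1 px too high
print(shift_rows(blue, 1))       # pushed back down into register with green
```

In practice the same shift is applied to every column as well if there is a horizontal offset, which is what the x,y controls in the filters below expose.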

This is a rather long winded process that is best automated, which is where G’MIC comes into play. It is a general purpose image processing tool which has the ability to run as a GIMP plugin, providing more than 450 image filters. The relevant filter for our purpose is called “Degradations -> Chromatic Aberrations”. It allows you to simply enter the desired x,y offset for the red and blue channels and will re-align them in one go, avoiding the multi-step decompose process in GIMP.

G’MIC Chromatic Aberration filter. The secondary colour defaults to green, but it is simpler if it is changed to blue, since that is the second fringe colour we’re looking to eliminate. The preview should be zoomed in to about 400% to allow alignment to be clearly viewed when adjusting x,y offsets.

100% magnification crop after aligning the RGB colour components to correct chromatic aberration.

With the chromatic aberration removed, the next step is to get rid of the colour cast. The Moon is not a purely monochrome object: different areas of its surface have distinct colours, which would ideally be preserved in any processed images. Due to the limitations of the camera being used, however, the IR wavelength pollution makes that largely impossible/impractical. The only real option is to desaturate the image to create a uniformly monochrome image. If a slightly off-grey colour tint is desired in the end result, it could be added by colourizing the final image.

100% magnification crop after desaturating to remove colour cast due to IR wavelengths

The image that we have at this stage is still very soft, lacking any fine detail. One of the most popular features in Registax is its wavelet based sharpening facility. Fortunately there are now a number of options available in GIMP that can achieve comparable results. GIMP 2.10 comes with a “Filters -> Enhance -> Wavelet decompose” operation, while G’MIC has “Details -> Split Details (wavelets)”, both of which can get results comparable to Registax wavelets operating in linear mode. The preferred Registax approach, though, is to use gaussian wavelets, and this has an equivalent in G’MIC available as “Details -> Split Details (gaussian)”. The way the G’MIC filter is used, however, is rather different, so it needs some explaining.

Split details (gaussian) filter. The image will be split into 6 layers by default: 5 layers of detail and a final background residual layer. Together the layers are identical to the original image. The number of layers, together with the two scale settings, determines the granularity of detail in each layer. The defaults are reasonable, but there’s scope to experiment if desired.

Describing the real mathematical principles behind gaussian wavelets is beyond the scope of this post, but those interested can learn more from the AviStack implementation. Sticking to the high level: when the plugin is run it will split the visible layer into a sequence of layers. There is a base “residual” layer and then multiple layers of increasingly fine detail, applied with “Grain Merge” mode. Taken together these new layers are precisely equivalent to the original image.
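The split-and-reconstruct property can be demonstrated on a toy 1-D signal. This sketch uses a moving average as a stand-in for the gaussian blur, but the layering principle is the same: each detail layer is the difference between successive blur scales, and the detail layers plus the residual sum back to the original exactly.

```python
# Toy illustration of splitting a signal into detail layers plus a residual.
def blur(signal, radius):
    """Moving-average blur, standing in for a gaussian blur."""
    n = len(signal)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def split_details(signal, radii=(1, 2, 4)):
    """Return detail layers (fine to coarse) followed by the residual."""
    layers, current = [], signal
    for r in radii:
        blurred = blur(current, r)
        layers.append([a - b for a, b in zip(current, blurred)])
        current = blurred
    layers.append(current)  # residual
    return layers

sig = [0.0, 1.0, 5.0, 1.0, 0.0, 3.0]
layers = split_details(sig)
recon = [sum(vals) for vals in zip(*layers)]  # sums back to sig
```

Duplicating a detail layer, as described below, simply adds that layer's differences into the sum a second time, which is what boosts contrast at that scale.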

The task now is to work on the individual detail layers to emphasize the details that are desired in the image, and (much less frequently) to de-emphasize details that are not. To increase the emphasis of details at a particular level, all that is required is to duplicate the appropriate layer. The finest detail layer may be duplicated many, many times, while coarse detail layers may be duplicated only once, or not at all. If even one duplication is too strong, the duplicated layer’s opacity can be reduced to control its impact.

GIMP layers. The default G’MIC split details filter settings created 6 layers. The layer labelled “Scale #5” holds the fine details and has been duplicated 5 times to enhance fine details. The “Scale #4” and “Scale #3” layers have both been duplicated once, and opacity reduced on the “Scale #3” duplicate.

It is recommended to work in order from the coarsest (“Scale #1”) to the finest (“Scale #5”) detail layers, and typically the first two or three levels of detail would be left unchanged to avoid unnatural looking images with too much contrast. There is no perfect set of wavelet adjustments that will provide the right amount of sharpening. It will vary depending on the camera, the telescope, the subject, the seeing conditions, the quality of stacking and more. Some experimentation will be required to find the right balance, but fortunately this is easy with layers, since changes can be easily rolled back. After working on an image, ensure it is saved in GIMP’s native XCF format, leaving all layers intact. It can then be revisited the following day with a fresh eye, whereupon the sharpening may be further fine tuned with the benefit of hindsight.

100% magnification crop after sharpening using G’MIC gaussian wavelets filter and GIMP layer blending

As the image below shows, even with a modified webcam costing < £20 on a popular auction site, it is possible to produce high detail images of The Moon’s surface, using open source tools for all post-processing after the initial video stacking process.

Complete final image after all post-processing