Image processing for molecular biologists

This document aims to explain some of the basic principles of digital imaging that are of relevance to molecular biologists who need to make pictures of their molecules for presentations and publications. It is accompanied by a series of slides that summarise the main points, and these can be used as an index to this document. If you don't use this index, essential diagrams are linked to this page at appropriate points in the text.

The slide show can be found at: http://xray.bmc.uu.se/markh/notes/improc

Vectors, Rasters and Antialiasing

Vector Graphics

In vector graphics, simple images are built up by drawing a series of lines joining the vertices of the object. This is very efficient and fast for simple objects, but becomes very slow as the complexity of the object increases, and surfaces cannot be represented, only wire-mesh models at best. However, the images are easily scaled, and lose no resolution when enlarged to any degree, plus background colours cost very little. They are not much used in molecular graphics, but you may find them in PostScript and PDF files.

Raster graphics

Most images you will use will be rasterised, meaning that they are built up of a fixed number of dots (pixels) on a regular grid. This means that very complex images cost no more than simple ones in terms of storage space and display time, but that cost is higher than for simple vector graphics. Surfaces and backgrounds can be represented satisfactorily at little extra cost.

Antialiasing

Raster images cannot be scaled up without loss of resolution, so they must be stored at the maximum resolution that will ever be needed, which may result in a huge file. When they are scaled too much, jaggy lines become apparent along the junctions of the pixels. This can be made less objectionable by antialiasing procedures, whereby pixels on the border between differently-coloured areas are drawn in an intermediate colour to smooth the transition. Note that the antialiasing colours are dependent on the colours on both sides of the boundary, so if you change the background colour in an image, the antialiasing must also be recalculated.
(Slide1c, antialiasing at work)
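
As a minimal sketch of the idea, an edge pixel that is only partly covered by the foreground can be drawn in a weighted mix of the two colours. The function name `blend` and the coverage value of 0.25 are illustrative assumptions, not taken from any particular program.

```python
def blend(foreground, background, coverage):
    """Mix two RGB colours.

    coverage is the fraction of the pixel covered by the foreground,
    from 0.0 (all background) to 1.0 (all foreground).
    """
    return tuple(
        round(coverage * f + (1.0 - coverage) * b)
        for f, b in zip(foreground, background)
    )


red = (255, 0, 0)
white = (255, 255, 255)
black = (0, 0, 0)

# The same quarter-covered edge pixel needs a different colour
# depending on what lies behind it.
print(blend(red, white, 0.25))  # (255, 191, 191)
print(blend(red, black, 0.25))  # (64, 0, 0)
```

The two results differ, which is why an image antialiased against white shows a pale fringe if it is later pasted onto a dark background.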

Resolution

Resolution measures the amount of detail in a raster image, usually expressed in terms of dots-per-inch (DPI). This means that for an image containing a particular number of pixels, the resolution depends upon the actual measurable size of the image when printed out or displayed on a screen. It is misleading to talk about the resolution of an image file sitting on a disk, since the only meaning this can have is that the file has been tagged with some text that represents the resolution at which someone has decided it should be printed. Changing this number makes no difference to the quality or amount of information in the file.

Computer screens usually have a resolution of 72 DPI, so if you display an image 400 pixels wide, it will have a width of about 5.5 inches (14cm), but if you print the same image on a good-quality printer with a resolution of 300 DPI, it will appear only just over an inch wide (3cm). You can still produce a picture 5 inches or so wide (by scaling up in an image-manipulation program, or letting the printer driver software do it for you), but then the resolution will fall to around 72 DPI, even though the printer is still drawing 300 dots in every inch (most of those dots will now be duplicates of their neighbours). To get a 5-inch printout of your picture at the maximum resolution of the printer, you will need an image file 1500 pixels across, but if you tried to look at this file on a screen, it would be around 20 inches (50cm) wide.
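
The arithmetic here is just pixels divided by DPI, or inches multiplied by DPI. A small sketch in Python (the function names are made up for illustration) lets you check the figures for your own images:

```python
def printed_width_inches(pixels, dpi):
    """Physical width of an image shown at the given dots-per-inch."""
    return pixels / dpi


def pixels_needed(width_inches, dpi):
    """Number of pixels required to fill a width at the given DPI."""
    return round(width_inches * dpi)


print(printed_width_inches(400, 72))    # 400 pixels on a 72 DPI screen
print(printed_width_inches(400, 300))   # the same 400 pixels on a 300 DPI printer
print(pixels_needed(5, 300))            # 1500 pixels for 5 inches at 300 DPI
print(printed_width_inches(1500, 72))   # that 1500-pixel file on a 72 DPI screen
```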

So as you can see, it is much simpler to talk (and think) about the number of pixels in the image file, rather than the resolution at a particular size, or the size at a particular resolution.

If the size of an image is expressed in megapixels, then it refers to the product of the width and height in pixels. Assuming the image is not extremely elongated, this gives a useful but rough guide to the resolution of the image.

You may also hear people expressing resolution in megabytes, in which case they are hopefully referring to an uncompressed TIFF file, which needs 3 megabytes for every megapixel of image. If they are not referring to an uncompressed TIFF, then the number is rather meaningless.
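
The figure of 3 megabytes per megapixel follows from storing one byte each for red, green and blue. A rough sketch, assuming 8 bits per channel, no alpha channel, and megabytes counted as a million bytes (all assumptions of this example):

```python
def megapixels(width, height):
    """Image size as the product of width and height, in millions of pixels."""
    return width * height / 1_000_000


def uncompressed_megabytes(width, height, bytes_per_pixel=3):
    """Approximate size of the raw pixel data, ignoring any file header."""
    return width * height * bytes_per_pixel / 1_000_000


print(megapixels(1500, 1000))               # 1.5
print(uncompressed_megabytes(1500, 1000))   # 4.5
```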

There is an extra step of confusion concerning the resolution to be used when scanning an image from hardcopy. The resolution at which you scan should correspond to the size of the final use of the image, not the original. For example, scanning a 35mm slide (~1 inch long) to make an A4 hardcopy (~10 inches), you should scan at 3000 DPI in order to get around 300 DPI in the final print.
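
The slide example above can be written as a one-line calculation. The function name and the default of 300 DPI for the final print are assumptions made for this sketch:

```python
def scan_dpi(original_inches, final_inches, final_dpi=300):
    """Scanning resolution needed so the enlarged print reaches final_dpi."""
    return final_dpi * final_inches / original_inches


# A slide about 1 inch long, enlarged to about 10 inches at 300 DPI.
print(scan_dpi(1, 10))  # 3000.0
```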

Uprezzing

If you need to increase the number of pixels in an image, you can use the 'Image Size' command in Photoshop. You need to click the button in the bottom right-hand corner labelled "Resample Image", otherwise the number of pixels will stay the same, and you will only change the text labels for resolution and size. Uprezzing an image adds no information, but smooths out the transitions that would otherwise make the enlarged image look jaggy. (Uprezzing)
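
To see why no information is added, here is a sketch of uprezzing a single row of grey values by linear interpolation. This is only an illustration: image programs generally offer more sophisticated resampling methods, and the function name `uprez_row` is invented for this example.

```python
def uprez_row(row, new_length):
    """Resample a row of grey values to new_length pixels by linear interpolation."""
    if new_length < 2 or len(row) < 2:
        raise ValueError("need at least two pixels in and out")
    scale = (len(row) - 1) / (new_length - 1)
    out = []
    for i in range(new_length):
        pos = i * scale                       # position in the original row
        left = min(int(pos), len(row) - 2)    # original pixel to the left
        frac = pos - left                     # how far towards the next pixel
        out.append(round(row[left] * (1 - frac) + row[left + 1] * frac))
    return out


print(uprez_row([0, 100, 200], 5))  # [0, 50, 100, 150, 200]
```

Every new pixel is just an average of its original neighbours, so the transitions are smoother but nothing new has been recorded.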

Formats

There are dozens of formats available for storing images (see http://www.cica.indiana.edu/graphics/image.formats.html for an extensive list). The differences are in the efficiency of storage, resolution issues, the number of colours available, and software support. Different formats are indicated for different situations.

Raster Formats


(Slide3a, a summary of raster storage formats)

GIF - A very common format. The lossless compression is excellent for molecular graphics, and blocks of a few pure colours, but bad for complex real images, because of the limit of 256 colours that can be used. Used for simple images on the web.

TIFF - Losslessly-compressible format like GIF, but not quite so widely readable (no browsers can handle it without an external agent, for example). Popular in the FAX world, and with scanners. There is no colour compression, so this is the format of choice for maximum information fidelity. The penalty is greater file size.

JPEG - Good for photographic images, bad for line art and object graphics. Uses a very clever and efficient algorithm to smooth out small or gradual colour and intensity changes to give enormous (but lossy) compression without damaging the overall impression of the image. Colour depth is enormous. JPEG is not good or efficient with large blocks of single colours or with text. You choose the degree of compression according to how much quality you can afford to lose. Used for photographic images on the web.

RGB - A misleading name used by SGI for their uncompressed pixmapped images. No compression, no loss, no space left on disk.

TARGA - A simple, uncompressible bitmap format similar to RGB.

PICT - A Macintosh format. Good compression for line/block art, but it can be hard to convert once it's left the Mac.

PNG - A new format invented to avoid the legal problems associated with the GIF format. It has vastly more colours, and decent compression, but JPEGs will usually be smaller. It is now well supported by all browsers except Internet Explorer.

So which format should you use?
For universal use on the web, you are currently restricted to GIFs and JPEGs.

For images with a lot of colours (for example photographs), use JPEGs, and when you save the file try different values for the quality (compression) parameter, choosing the smallest file size that is still acceptably sharp.

If you have block art or only a few colours (fewer than about 200), then a GIF will be sharper than a JPEG, and probably smaller (and thus faster). A GIF will also give sharper text than a JPEG. Most pictures of molecules are on the borderline, but usually a GIF works fine, and has the advantage of giving good text, and accepting transparent backgrounds.
Because Netscape and Explorer have slightly different default colour palettes, it's best not to use all 256 colours that are possible in a GIF, but to restrict yourself to the 216 web-safe colours. You will usually be offered this option when you save your file as a GIF in your image manipulation program.
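
The 216 web-safe colours are those in which each of red, green and blue takes one of six evenly spaced values, giving 6 x 6 x 6 combinations. Your image program will normally do the conversion for you, but a sketch of snapping a colour to the nearest web-safe one (the function name is invented for this example) looks like this:

```python
WEB_SAFE_LEVELS = (0, 51, 102, 153, 204, 255)


def nearest_web_safe(colour):
    """Snap each channel of an RGB colour to the closest web-safe level."""
    return tuple(
        min(WEB_SAFE_LEVELS, key=lambda level: abs(level - channel))
        for channel in colour
    )


print(len(WEB_SAFE_LEVELS) ** 3)          # 216
print(nearest_web_safe((200, 30, 120)))   # (204, 51, 102)
```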

For ultimate quality away from the web, such as sending images to publishers, use TIFF.

Other Formats

PostScript - Used mostly for printing, and files tend to be enormous. Tries to store as vector data with fill commands, but can resort to bitmaps. They can in principle be read by the image manipulation programs, but they should really be considered a final product en route to the printer, not as a storage form.

MPEG - A movie format related to JPEG for stills. Compression in time, space and colour leads to tiny, tiny files. Not really supported by web browsers, but most systems have agents that can deal with the format.

QuickTime - Another movie format that is more flexible than MPEG, but less widely supported. Made popular by Apple.

Compression Ratios

The different compression schemes used by the different formats result in widely varying file sizes for the same image. How widely they differ depends upon the complexity of the image.
(Slide3c, Compression ratios)
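
The dependence on image complexity can be shown with Python's standard `zlib` module. This is a sketch under stated assumptions: `zlib` uses the lossless deflate scheme (the one found inside PNG files), not the schemes used by GIF or JPEG, and the two 200 x 200 greyscale "images" are synthetic, so the numbers only illustrate the trend.

```python
import random
import zlib

WIDTH, HEIGHT = 200, 200

# A "simple" image: every pixel is the same grey value.
flat = bytes([128]) * (WIDTH * HEIGHT)

# A "complex" image: every pixel is random noise (seeded for repeatability).
rng = random.Random(0)
noisy = bytes(rng.randrange(256) for _ in range(WIDTH * HEIGHT))


def compression_ratio(data):
    """Original size divided by losslessly compressed size."""
    return len(data) / len(zlib.compress(data))


print(compression_ratio(flat))    # very large: one colour compresses superbly
print(compression_ratio(noisy))   # about 1: noise barely compresses at all
```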

Compression artifacts

The different compression schemes also give characteristic artifacts, dependent upon the amount of compression used. GIF compression results in quantization of colour that results in banding in the image, whereas JPEG compression results in a blurred, muddy appearance in areas of high contrast.
(Slide3d, Compression artifacts)
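
Banding is easy to reproduce on a smooth grey ramp. The sketch below snaps each value to one of four evenly spaced levels. That even spacing is an assumption made for simplicity (a real GIF palette is usually chosen to suit the image), but the effect is the same: many neighbouring tones collapse into a few flat bands.

```python
def quantize(value, levels):
    """Snap a 0-255 value to the nearest of `levels` evenly spaced values."""
    step = 255 / (levels - 1)
    return round(round(value / step) * step)


gradient = list(range(0, 256, 15))            # a smooth ramp of 18 tones
banded = [quantize(v, 4) for v in gradient]   # the same ramp with 4 tones

print(gradient)
print(banded)
print(sorted(set(banded)))  # [0, 85, 170, 255]
```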

Colour Spaces

There are many ways to express colours, RGB and CMYK being two of the most common. The essential difference is that RGB colours are made by adding Red, Green and Blue colours together, and CMYK colours are made by subtracting from white light the colours Cyan, Magenta, Yellow and blacK. The former is used with luminous screens, the latter with inks on paper, which inevitably absorb more light and give duller colours.
(Slide4a, Colour Spaces)
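
The relationship between the two systems can be sketched with the simple textbook conversion below. Note the assumption: this formula ignores the properties of real inks and paper, so it is not what a publisher or Photoshop actually does when converting, and it cannot tell you which colours are printable.

```python
def rgb_to_cmyk(r, g, b):
    """Convert 0-255 RGB to 0.0-1.0 CMYK using the naive subtractive formula."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)            # black replaces the shared darkness
    if k == 1:                      # pure black: avoid dividing by zero
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return (c, m, y, k)


print(rgb_to_cmyk(255, 0, 0))       # red is made from magenta and yellow ink
print(rgb_to_cmyk(255, 255, 255))   # white needs no ink at all
print(rgb_to_cmyk(0, 0, 0))         # black uses only the black ink
```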

When pictures are published in a journal, there is an even more limited colour range available than when you use a nice ink-jet printer in the lab. If you send the journal nice bright colours, they will map them to the nearest dull, printable colours, and you will be disappointed. There is not much you can do about this, except pre-disappoint yourself by choosing your own selection of dull colours when you create the image. This you can do by using Photoshop's 'gamut warning' facility in CMYK mode. The CMYK gamut is dependent on the printing medium, so before changing mode to CMYK, you must first specify the printing conditions that your publisher will be using. "Euro_standard Coated" is the default for European publishers. Having done this, 'gamut warning' will display unprintable colours as grey, so that you can see which regions to redo. To see a map of which colours can be printed, go to http://www.rapidgraphics.com/faq_gamut.html

Shading Models

In flat shading, whole elements of the picture are drawn the same colour, regardless of any surface texture or curvature. With local shading, curvature of elements is taken into account in the lighting model, but independently of the other elements. For the ultimate in realistic imaging, ray tracing is used, where individual rays of light are followed and calculations of reflection angles and colour absorption are performed at every interaction with a surface. This results in accurate shadows and reflections, but is very expensive in computational time and memory.
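
As an illustration of local shading, the sketch below uses simple diffuse (Lambertian) lighting, in which brightness depends only on the angle between the surface and the light. This particular lighting model is an assumption chosen for the example; molecular graphics programs may use others. Note that nothing in the calculation knows about other objects, which is why local shading gives no shadows.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)


def diffuse_intensity(normal, to_light):
    """Brightness (0.0 to 1.0) of a surface point with the given normal."""
    n = normalize(normal)
    l = normalize(to_light)
    return max(0.0, sum(a * b for a, b in zip(n, l)))


light = (0, 0, 1)                             # light shining from the viewer
print(diffuse_intensity((0, 0, 1), light))    # facing the light: fully lit
print(diffuse_intensity((0, 1, 1), light))    # tilted 45 degrees: dimmer
print(diffuse_intensity((0, 0, -1), light))   # facing away: dark
```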

Image Enhancement

There are many ways to enhance the appearance of an image, but here we will only deal with levels adjustments and sharpening.
Most images benefit from having a range of tones from black to white, but often photographic recording compresses the tonal range so that there are only shades of grey. Using the 'Levels' command that you will find in most image processing programs you can stretch these grey tones so that they span the whole range. In Photoshop you do this by dragging the little triangles under each end of the histogram so that they are at the start and end of the significant part of the histogram.
The 'Curves' command is a more complicated way to achieve the same effect, and also allows you to change the shape of the histogram.
(Slide6b, Levels and curves)
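
What the two triangles do can be sketched as a linear stretch: the chosen black point is mapped to 0, the chosen white point to 255, and everything in between is spread out evenly. The function name and the sample values are invented for this example.

```python
def stretch_levels(pixels, black_point, white_point):
    """Linearly remap grey values so black_point -> 0 and white_point -> 255."""
    scale = 255 / (white_point - black_point)
    return [
        max(0, min(255, round((p - black_point) * scale)))
        for p in pixels
    ]


washed_out = [50, 100, 150, 200]   # only mid greys, no black or white
print(stretch_levels(washed_out, min(washed_out), max(washed_out)))
# [0, 85, 170, 255]
```

Values outside the chosen points are clipped to pure black or pure white, which is why the triangles should sit at the ends of the significant part of the histogram and not inside it.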

The Unsharp Mask command makes an image appear sharper by amplifying contrast at edges in the image. No new information appears, and indeed artifacts are added, but our brains perceive the image as sharper.
(Slide6b, Sharpening)
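
The principle can be sketched on a single row of grey values: blur the image, then add back the difference between the original and the blurred copy. The simple three-pixel average used here is an assumption made to keep the example short; real implementations typically use a Gaussian blur with adjustable radius and threshold.

```python
def box_blur(row):
    """Average each pixel with its two neighbours (edge pixels are repeated)."""
    padded = [row[0]] + list(row) + [row[-1]]
    return [
        (padded[i] + padded[i + 1] + padded[i + 2]) / 3
        for i in range(len(row))
    ]


def unsharp_mask(row, amount=1.0):
    """Sharpen by adding back the difference between original and blurred."""
    blurred = box_blur(row)
    return [
        max(0, min(255, round(p + amount * (p - b))))
        for p, b in zip(row, blurred)
    ]


edge = [100, 100, 100, 200, 200, 200]   # a soft step from dark to light
print(unsharp_mask(edge))               # [100, 100, 67, 233, 200, 200]
```

The dark side of the edge becomes darker and the light side lighter. Those overshoots are the added artifacts, and they are also what makes the edge look crisper.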

Mark Harris March 2009


This document is: http://xray.bmc.uu.se/markh/notes/improc/improc.html