This document was put together by b3ta member claws of doom as part of a digitisation project which is to be found here.

Why do we have file formats?

We have file formats because they form the basis of a convenient way of passing image data from machine to machine. If I were to ask you to pass information about an image to me over the phone, how would you go about it? Would you break the image into sections and describe what is in each section? Would you try and convey the whole of the image at once? Would you try and define what colours there are in the image?

File formats are merely pre-agreed standard ways of describing images – so that when your computer talks to mine about a so-and-so colour in a specific part of the image, my computer will be able to understand exactly, and replicate the image your computer describes.

File formats

This is a rough guide to what each of the popular file formats was designed to do. It discusses which format is suitable for which purpose, and what the specific strengths and weaknesses of each file format are. This isn't meant to be an absolutely accurate fact-laden primer – merely a gentle introduction to discussing file formats.

The three formats under scrutiny here will be GIF (Graphics Interchange Format), JPEG (Joint Photographic Experts Group), and TIFF (Tagged Image File Format). There are many (many!) more file formats than the three described here. These three are merely those used most often for their given purposes.

The basics

All the file formats follow roughly the same logic for storing images. They take an image, break it up into manageable bits, and then store those bits in an order that can be reconstructed later. The "bits" that the images are broken into, and the way the order is chosen, are what differentiate the file formats (on a very basic level).

GIF

A GIF is mostly used on the web for logos, line art and single-pixel images. It is the only widely supported image format that can handle animations on the internet (not discussed here). GIFs are limited (by their definition) to at most 256 colours. This means that they cannot accurately display photographic images, which need 256 levels of each base colour (red, green and blue).
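As a rough check on those numbers (a small sketch, nothing more), the arithmetic looks like this:

```python
# Levels available for each base colour in a typical photograph.
levels_per_channel = 256

# Every combination of red, green and blue levels is a distinct colour.
photo_colours = levels_per_channel ** 3

# The most colours a single GIF colour table can hold.
gif_colours = 256

print(photo_colours)                 # 16777216
print(photo_colours // gif_colours)  # 65536
```

So a photograph can call on 65,536 times more colours than a GIF has room to define.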

To store an image as a GIF, the following steps are taken.

  1. The image is analyzed to see which set of colours (up to 256 different ones) will best describe the colours present.
  2. The colours are defined and each tagged with a different key (colour a, b, c etc. if you will)
  3. The lines of the picture are placed end to end, with the topmost line first.
  4. Each individual pixel is then described. This generates a line of data which could appear as "a,b,a,a,a,c,b,a,a,b,b,b,b,c,a" etc.
  5. This is compressed to: "a,b,3a,c,b,2a,4b,c,a" etc.
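Steps 2 to 5 above can be sketched in a few lines of Python. This is only an illustration of the simplified scheme described here: the function names are made up for the example, step 1 (choosing the best 256 colours) is skipped by assuming the image already has few colours, and a real GIF uses a cleverer compression method (LZW) than this simple run counting.

```python
from itertools import groupby
import string


def flatten(rows):
    """Step 3: place the lines of the picture end to end, topmost first."""
    return [pixel for row in rows for pixel in row]


def build_palette(pixels):
    """Step 2: tag each distinct colour with a key (a, b, c ...).

    Letters are used to match the text, so this sketch only copes with
    26 colours; a real colour table would use the numbers 0 to 255.
    """
    palette = {}
    for colour in pixels:
        if colour not in palette:
            palette[colour] = string.ascii_lowercase[len(palette)]
    return palette


def run_length_encode(keys):
    """Step 5: collapse each run of identical keys into (count, key)."""
    return [(len(list(run)), key) for key, run in groupby(keys)]


def format_runs(runs):
    """Write runs the way the text does: 'a' for one pixel, '3a' for three."""
    return ",".join(key if count == 1 else f"{count}{key}" for count, key in runs)


RED, WHITE, BLACK = (255, 0, 0), (255, 255, 255), (0, 0, 0)

# A made-up 5 x 3 image that produces the same data as the example above.
rows = [
    [RED, WHITE, RED, RED, RED],
    [BLACK, WHITE, RED, RED, WHITE],
    [WHITE, WHITE, WHITE, BLACK, RED],
]

pixels = flatten(rows)
palette = build_palette(pixels)
keys = [palette[colour] for colour in pixels]   # step 4: describe each pixel
encoded = format_runs(run_length_encode(keys))

print(",".join(keys))  # a,b,a,a,a,c,b,a,a,b,b,b,b,c,a
print(encoded)         # a,b,3a,c,b,2a,4b,c,a
```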

From this you can see that a large swathe of the same colour takes up very little space – imagine the section compressed by "2478a". These large swathes of colour occur mostly in logos and blocks of colour. Even the slightest change of shade between neighbouring pixels breaks a run, so an image full of subtle shading barely compresses at all.

Given the original picture dimensions, the image can be worked out again by merely reading the list of data and laying each pixel down – much like a mosaic. If the image had 256 colours or fewer in it at the start, you won't have lost any detail in your image by storing it as a GIF.
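Reading the list back can be sketched like this. It assumes the data is written as in the example above ("3a" meaning three pixels of colour a) and that the picture width is known; the function names are invented for the illustration.

```python
import re


def run_length_decode(encoded):
    """Expand text such as 'a,b,3a' back into one key per pixel."""
    keys = []
    for token in encoded.split(","):
        match = re.fullmatch(r"(\d*)([a-z])", token.strip())
        if match is None:
            raise ValueError(f"cannot read run {token!r}")
        count, key = match.groups()
        # A missing count means a single pixel.
        keys.extend(key * int(count or 1))
    return keys


def lay_out(keys, width):
    """Cut the long line of pixels back into rows of the original width."""
    return [keys[i:i + width] for i in range(0, len(keys), width)]


keys = run_length_decode("a,b,3a,c,b,2a,4b,c,a")
for row in lay_out(keys, width=5):
    print("".join(row))
# abaaa
# cbaab
# bbbca
```

Nothing is guessed at in this direction: every pixel comes back exactly as it was stored.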


JPEG

A JPEG is mostly used for photographs and complex images where continuous change of tone is required. The number of colours isn't limited. The amount of compression depends on the image quality desired.

To store an image as a JPEG, the following steps are taken:

  1. The image is broken up into bits: squares of a size dependent on the compression desired. The higher the compression, the bigger the square. Squares can typically be about 5 pixels by 5 pixels.
  2. The top left and bottom right corners of each square are then looked at, and the colours in those corners defined.
  3. A simple mathematical equation is used to draw a colour curve from the top left corner to the bottom right, approximating as good a match as is possible.
  4. A list is generated of the two colours and the equation for each box, in their order. (example: Col1, Col2, Eq1; Col3, Col1, Eq2). This list is not compressed – compression occurs by making the squares bigger, so that fewer of them are needed to complete a whole picture.
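The simplified scheme in the steps above can be sketched as follows. Treat it as a toy: each "colour" is a single grey level, the "equation" is assumed to be a straight-line blend from one corner to the other, and the function names are invented for the example. A real JPEG encoder works quite differently (fixed 8 by 8 pixel blocks and a cosine transform), so this only illustrates the idea of storing an approximation of each square.

```python
def encode_blocks(image, size):
    """Keep only the top left and bottom right value of each square."""
    height, width = len(image), len(image[0])
    blocks = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            bottom = min(top + size, height) - 1
            right = min(left + size, width) - 1
            blocks.append((image[top][left], image[bottom][right]))
    return blocks


def decode_blocks(blocks, width, height, size):
    """Re-lay the squares in order, blending from corner to corner."""
    image = [[0] * width for _ in range(height)]
    index = 0
    for top in range(0, height, size):
        for left in range(0, width, size):
            first, last = blocks[index]
            index += 1
            bottom = min(top + size, height) - 1
            right = min(left + size, width) - 1
            span = (bottom - top) + (right - left)
            for y in range(top, bottom + 1):
                for x in range(left, right + 1):
                    # 0.0 at the top left corner, 1.0 at the bottom right.
                    t = ((y - top) + (x - left)) / span if span else 0
                    image[y][x] = round(first + (last - first) * t)
    return image


# A made-up 4 x 4 grey image; the bottom right square holds fine detail.
image = [
    [0, 50, 200, 200],
    [50, 100, 200, 200],
    [10, 10, 0, 255],
    [10, 10, 255, 90],
]

blocks = encode_blocks(image, size=2)
decoded = decode_blocks(blocks, width=4, height=4, size=2)

print(blocks)   # [(0, 100), (200, 200), (10, 10), (0, 90)]
print(decoded)  # [[0, 50, 200, 200], [50, 100, 200, 200], [10, 10, 0, 45], [10, 10, 45, 90]]
```

The three smooth squares come back untouched, while the two bright pixels in the bottom right square are replaced by a bland blend.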

From this you can see that if you had any fine detail (smaller than the size of the squares), it would be lost – or at least altered as an approximation. When these fine detail changes are visible, they are known as "artefacts".

Once again, given the dimensions of the original image, it is a simple process to re-lay all the boxes in their correct order and regenerate the image.

Note that the regenerated image isn't exactly the same as your original – merely a (close) approximation. If you re-save the image as a JPEG many times with varying compression (and therefore box size), the image will get further and further away from the original.
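That drift can be seen in a deliberately contrived sketch. It applies the simplified box idea from this guide to a single row of grey levels: each box keeps only its first and last value and blends between them. The function names and the test row are invented for the illustration, and real JPEG re-saving loses detail by a different mechanism.

```python
def approximate_row(row, box_size):
    """Replace each box with a straight blend between its two end values."""
    out = []
    for start in range(0, len(row), box_size):
        box = row[start:start + box_size]
        first, last = box[0], box[-1]
        steps = len(box) - 1
        for i in range(len(box)):
            if steps == 0:
                out.append(first)
            else:
                out.append(round(first + (last - first) * i / steps))
    return out


def total_error(original, copy):
    """Sum of the differences between matching pixels."""
    return sum(abs(a - b) for a, b in zip(original, copy))


# A row of alternating dark and light pixels: all fine detail.
original = [0, 10] * 6

first_save = approximate_row(original, box_size=3)
second_save = approximate_row(first_save, box_size=4)

print(first_save)                          # [0, 0, 0, 10, 10, 10, 0, 0, 0, 10, 10, 10]
print(total_error(original, first_save))   # 40
print(second_save)                         # [0, 3, 7, 10, 10, 7, 3, 0, 0, 3, 7, 10]
print(total_error(original, second_save))  # 54
```

The second save is made from the first copy rather than from the original, so its errors are added on top of the ones already there.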


TIFF

A TIFF is an image format that is not used to display images on the web, but is used to transfer images accurately. It can handle all the colours of a JPEG with the accuracy of a GIF. The down side is that the file size tends to be larger. The up side is that you can re-save a TIFF again and again without losing data between generations, and that TIFFs actually store photographic images better than a JPEG could.

Storing an image as a TIFF is much the same as storing a GIF – just with a larger defined colour table. TIFF is also a highly extensible format: should you want to define an image in your own way, it allows you to do so. It also allows for a large variety of complex compression algorithms, which can help reduce file size.


Examples:

Included are exaggerated examples of the suitability of GIFs or JPEGs for different images. TIFFs would match the best in every case – and would beat the JPEG for photographic images.

Example 1: Logo


logo.gif


logo.jpg

Example 2: Photo


photo.gif


photo.jpg

Comparison table of uses for GIF, JPEG and TIFF

Suitable for    GIF (*.gif)    JPEG (*.jpg/*.jpeg)    TIFF (*.tif)
Logos           Yes            No                     Yes
Photographs     No             Yes                    Yes
Data loss       No/Variable    Yes, variable          No
Web display     Yes            Yes                    No

 


Notes

This is in no way meant to be an exacting or even truthful account of How Things Work – merely an explanation that gives some (if any) insight into the mechanics of image formats. It is intended to whet the appetite more than provide the final answer. It was generated after seeing that explanations of file formats often instantly involved delving deeply into technical specifications. I hope it helps. Should you require further input (Johnny Five is alive!), see the following links:

(c)Gwydion Gruffudd (5/1/2004)

 

 

 

 


Site design © Sam Cartwright 2004