I have been hacking on RAW files a bit. There is no official public specification of the CR2 format, but with some research on the Internet you can find almost everything you need to know.
CR2 files are actually EXIF / TIFF files: they have the same header and the same internal structure (a chain of image file directories, or IFDs), although there are some strange, nonstandard things inside the file.
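To make the TIFF connection concrete, here is a minimal Python sketch that reads the header. The first eight bytes are plain TIFF (byte-order mark, the number 42, offset of the first directory). The "CR" check at byte 8 is an assumption taken from unofficial format notes, not from any Canon document, and `parse_cr2_header` is just a name I made up for illustration.

```python
import struct

def parse_cr2_header(data: bytes):
    """Return (endian, first_ifd_offset, has_cr_marker) for a TIFF-style header."""
    if len(data) < 8:
        raise ValueError("too short to be a TIFF file")
    order = data[:2]
    if order == b"II":
        endian = "<"  # little-endian ("Intel") byte order
    elif order == b"MM":
        endian = ">"  # big-endian ("Motorola") byte order
    else:
        raise ValueError("missing TIFF byte-order mark")
    # Bytes 2-3: magic number 42, bytes 4-7: offset of the first IFD.
    magic, first_ifd = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    # Assumption (unofficial notes): CR2 files carry the ASCII marker "CR"
    # right after the standard TIFF header.
    has_cr_marker = data[8:10] == b"CR"
    return endian, first_ifd, has_cr_marker
```

Feeding it the first 16 bytes of a file opened in binary mode should be enough to tell whether you are looking at a TIFF container at all, and probably whether it is a CR2.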
Did you know that each CR2 file, besides the raw sensor data, contains two JPEG images and one uncompressed RGB image? Each CR2 from my 5D contains a 2496 x 1664 pixel preview image in JPEG format, a 160 x 120 pixel JPEG thumbnail and a 384 x 256 pixel uncompressed RGB image. Those three extra images make up about 20% of the whole file.
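If you want to look for the embedded images yourself, a rough sketch like the one below walks the TIFF directory chain and collects the width, height and compression tags where they are present. It relies only on the standard TIFF layout (a 2-byte entry count, 12-byte entries, then a 4-byte offset to the next directory). It is not a full CR2 parser: it ignores sub-directories such as the EXIF block, and a directory that describes its image with other tags will simply come back as an empty dict. `list_ifds` is a hypothetical helper name.

```python
import struct

# Standard TIFF tag numbers for the fields we care about.
TAG_NAMES = {256: "width", 257: "height", 259: "compression"}

def list_ifds(data: bytes):
    """Walk the top-level IFD chain and return one dict of known tags per IFD."""
    endian = "<" if data[:2] == b"II" else ">"
    (offset,) = struct.unpack_from(endian + "I", data, 4)
    ifds, seen = [], set()
    while offset and offset not in seen:  # 'seen' guards against offset loops
        seen.add(offset)
        (count,) = struct.unpack_from(endian + "H", data, offset)
        info = {}
        for i in range(count):
            entry = offset + 2 + 12 * i
            tag, typ, n = struct.unpack_from(endian + "HHI", data, entry)
            # Only single SHORT (3) or LONG (4) values are stored inline.
            if tag in TAG_NAMES and n == 1 and typ in (3, 4):
                fmt = "H" if typ == 3 else "I"
                (value,) = struct.unpack_from(endian + fmt, data, entry + 8)
                info[TAG_NAMES[tag]] = value
        ifds.append(info)
        # The offset of the next IFD follows the last entry; 0 ends the chain.
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * count)
    return ifds
```

Calling `list_ifds` on the full contents of a CR2 file should return one entry per top-level directory, which is a quick way to check the image sizes quoted above against your own camera's files.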
I wonder what all those images are for. The thumbnail is probably for quick viewing on the LCD (without zooming), and the preview image is probably used when you zoom in on the LCD. My guess is that the 384 x 256 pixel uncompressed image is used for the Direct Print functionality, so that the printer can display the image if it has an LCD screen.

