Yesterday, I found myself in need of a TIFF image splitter. The reason: the scanner in my office doesn’t retain its settings after a document scan. Every time I choose ‘End Scanning’, the output image quality setting is reset to the default, which means I have to change the output settings for every document I want to scan. The only way around this was to scan all the documents in one go. That’s how I ended up needing a TIFF splitter.
A quick Google search brought me to a simple, free, open-source program called Tiff Splitter. It is exactly the kind of program I needed: simple, and it does its job well.
Because it does the job so well, I became curious about how exactly Tiff Splitter works. Since it’s open source, I could dive straight into the code and learn.
Whether we click to select an input file or drop a file, the event handlers eventually call the processFile method:
private void processFile(string fileName)
{
// ... SNIP ...
_configs.inputFile = fileName;
// Configure the extractor
RetObj retObj = TiffSplitCode.Prepare(_configs, out numOfPages);
// ... SNIP ...
// Let user select pages to extract
PageSelection ps = new PageSelection(numOfPages);
ps.ShowDialog();
_configs.fromPage = ps.PageFrom;
_configs.toPage = ps.PageTo;
_configs.doOverwrite = ps.OverwriteFiles;
// ... SNIP ...
// do the work in separate thread
backgroundWorkerSplit.RunWorkerAsync(_configs);
}
The splitting itself happens on the worker thread:
private void backgroundWorkerSplit_DoWork(object sender, DoWorkEventArgs e)
{
ConfObj input = e.Argument as ConfObj;
RetObj retObj = TiffSplitCode.Split(input, _updateProgress);
e.Result = retObj;
}
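If you haven't used BackgroundWorker before, the hand-off here is worth spelling out: the object passed to RunWorkerAsync arrives in DoWork as e.Argument, and whatever DoWork assigns to e.Result is delivered to the RunWorkerCompleted handler. Here is a minimal console sketch of that round trip. Job and the string result are hypothetical stand-ins for ConfObj and RetObj, not Tiff Splitter's actual types.

```csharp
using System;
using System.ComponentModel;
using System.Threading;

// Hypothetical stand-in for ConfObj: just a page range.
public class Job
{
    public int FromPage;
    public int ToPage;
}

public static class WorkerDemo
{
    // Runs the job on a BackgroundWorker and blocks until it has completed.
    public static string RunJob(Job job)
    {
        string result = null;
        using var done = new ManualResetEventSlim(false);
        using var worker = new BackgroundWorker();

        // Runs on a thread-pool thread; e.Argument is the object given to RunWorkerAsync.
        worker.DoWork += (sender, e) =>
        {
            var input = (Job)e.Argument;
            e.Result = $"pages {input.FromPage}-{input.ToPage} processed";
        };

        // Receives whatever DoWork stored in e.Result (or the exception in e.Error).
        worker.RunWorkerCompleted += (sender, e) =>
        {
            result = e.Error == null ? (string)e.Result : "failed: " + e.Error.Message;
            done.Set();
        };

        worker.RunWorkerAsync(job);
        done.Wait(); // a console app has no message loop, so wait explicitly
        return result;
    }
}
```

Calling WorkerDemo.RunJob(new Job { FromPage = 2, ToPage = 5 }) should return "pages 2-5 processed". The ManualResetEventSlim exists only because a console app has nothing else keeping it alive; in a WinForms app like Tiff Splitter the UI thread keeps running, and RunWorkerCompleted is raised back on it.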
The actual work is done by TiffSplitCode.Split:
public static RetObj Split(ConfObj input, UpdateProgress updateProgress)
{
int numOfPages = input.toPage - input.fromPage + 1;
// save each image
for (int i = 0; i < numOfPages; i++)
{
_coder.Save(input.fromPage - 1 + i);
// ... SNIP ...
}
// ... SNIP ...
}
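The loop quietly converts the user's 1-based page numbers into the 0-based frame indices that _coder.Save works with (page 1 is frame 0). Here is a small sketch of just that arithmetic, pulled out into a helper. FrameIndices is a hypothetical name, and the range check is my own addition.

```csharp
using System;
using System.Collections.Generic;

public static class PageMath
{
    // Hypothetical helper: maps an inclusive, 1-based page range (as picked in
    // the PageSelection dialog) to the 0-based frame indices passed to Save.
    public static List<int> FrameIndices(int fromPage, int toPage)
    {
        if (fromPage < 1 || toPage < fromPage)
            throw new ArgumentOutOfRangeException(nameof(fromPage), "Expected 1 <= fromPage <= toPage.");

        int numOfPages = toPage - fromPage + 1;   // same count as in Split
        var frames = new List<int>(numOfPages);
        for (int i = 0; i < numOfPages; i++)
            frames.Add(fromPage - 1 + i);         // same index expression as in Split
        return frames;
    }
}
```

For example, selecting pages 3 to 5 gives FrameIndices(3, 5), which returns the frames 2, 3 and 4.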
Hmm, it looks simple. But what is _coder? How is it able to tell PDF, TIFF and JPG apart? Different formats surely require different treatment, right? (Or so I thought.)
The secret lies in the Prepare and CoderFactory methods:
public static RetObj Prepare(ConfObj input, out int numOfPages)
{
// .. Snipped: Validate input file ..
// overwrite output type if the input is PDF
if (ext.ToUpper() == ".PDF")
input.outputType = OutputType.PDF;
// create output coder
_coder = CoderFactory(input.outputType);
// open file
numOfPages = _coder.LoadImage(input.inputFile);
// prepare
_coder.Prepare(input);
return retobj;
}
static private ICoder CoderFactory(OutputType type)
{
switch (type)
{
case OutputType.TIF:
return new TiffCoder();
case OutputType.JPG:
return new JpegCoder();
case OutputType.PDF:
return new PDFCoder();
default:
throw new Exception("Unknown output format.");
}
}
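Stripped down, this is the classic factory pattern: the caller asks for an ICoder and never learns which concrete class it got. The sketch below reproduces the idea with simplified, hypothetical types. The real ICoder has LoadImage, Prepare and Save, which I've replaced with a single FormatName property so the example stays self-contained.

```csharp
using System;
using System.IO;

// Simplified, hypothetical versions of the types in Tiff Splitter.
public enum OutputType { TIF, JPG, PDF }

public interface ICoder
{
    string FormatName { get; }   // stand-in for LoadImage/Prepare/Save
}

public class TiffCoder : ICoder { public string FormatName => "TIFF"; }
public class JpegCoder : ICoder { public string FormatName => "JPEG"; }
public class PDFCoder  : ICoder { public string FormatName => "PDF"; }

public static class CoderSelection
{
    // Mirrors the rule in Prepare: a PDF input forces PDF output,
    // otherwise the user's chosen output type is kept.
    public static OutputType ResolveOutputType(string inputFile, OutputType requested)
    {
        string ext = Path.GetExtension(inputFile);
        return string.Equals(ext, ".pdf", StringComparison.OrdinalIgnoreCase)
            ? OutputType.PDF
            : requested;
    }

    // Same shape as CoderFactory: callers only ever see the ICoder interface.
    public static ICoder Create(OutputType type) => type switch
    {
        OutputType.TIF => new TiffCoder(),
        OutputType.JPG => new JpegCoder(),
        OutputType.PDF => new PDFCoder(),
        _ => throw new ArgumentOutOfRangeException(nameof(type), "Unknown output format.")
    };
}
```

With this, CoderSelection.Create(CoderSelection.ResolveOutputType("scan.PDF", OutputType.JPG)) hands back a PDFCoder even though JPG was requested. The payoff is that Split doesn't care which format it's handling, so adding a format would take one new class plus one new case. I used a case-insensitive comparison instead of ToUpper() because it sidesteps culture-specific casing rules.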
So now it's clear: the actual loading and splitting happen in the classes that implement the ICoder interface. For TIFF, the work is done by TiffCoder:
public class TiffCoder : ICoder
{
private Image _image;
private FrameDimension _dim;
// ... SNIP ...
public int LoadImage(string fileName)
{
_inputImageName = fileName;
_image = Image.FromFile(fileName);
Guid guid = _image.FrameDimensionsList[0];
_dim = new FrameDimension(guid);
return _image.GetFrameCount(_dim);
}
public void Save(int pageNum)
{
_image.SelectActiveFrame(_dim, pageNum);
string outputFileName = null;
// ... SNIP: Output file name
if (!_config.doOverwrite)
{
outputFileName = HelperMethods.ModifyFileName(outputFileName);
}
_image.Save(outputFileName);
}
// ... SNIP ...
}
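One thing the snippet doesn't reveal is what HelperMethods.ModifyFileName does when overwriting is disabled. Purely as an illustration, and assuming it simply has to avoid clobbering an existing file, one plausible approach is to append a counter until the name is free. MakeUnique below is my own hypothetical helper, not the program's code.

```csharp
using System;
using System.IO;

public static class FileNames
{
    // Hypothetical stand-in for HelperMethods.ModifyFileName (its real logic is not shown):
    // returns the path unchanged if no file exists there, otherwise appends
    // " (1)", " (2)", ... before the extension until an unused name is found.
    public static string MakeUnique(string path)
    {
        if (!File.Exists(path))
            return path;

        string dir = Path.GetDirectoryName(path) ?? "";
        string stem = Path.GetFileNameWithoutExtension(path);
        string ext = Path.GetExtension(path);

        for (int n = 1; ; n++)
        {
            string candidate = Path.Combine(dir, $"{stem} ({n}){ext}");
            if (!File.Exists(candidate))
                return candidate;
        }
    }
}
```

Note that the check isn't atomic: another process could create the file between File.Exists and the actual save. That's usually acceptable for a small desktop tool.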
Well, that's interesting. The program uses .NET's System.Drawing.Image to handle TIFF instead of some external library. As I dug deeper, I found out that System.Drawing is actually a managed interface to GDI+, a native Windows library (mind blown!).
I'll stop my exploration here; perhaps in the future I'll have the motivation to dig deeper. For further reading, check the following references:
- System.Drawing.Image source code
- Brief introduction to GDI
- Microsoft GDI+ page