Chapter 17: PDF to Image Conversion


17.1 Page.ToImage Method

Starting with Version 2.0, AspPDF.NET is capable of converting a PDF page to an image via the method PdfPage.ToImage.

The ToImage method returns an instance of the PdfPreview object which performs the PDF-to-image conversion and generates the page image. The image can then be saved to disk, memory or an HTTP stream via the methods Save, SaveToMemory and SaveHttp, respectively. The ToImage method accepts an optional PdfParam object or parameter string argument.

Under .NET Core, on both Windows and Linux, the methods Save, SaveToMemory and SaveHttp cannot be used because they rely on the System.Drawing.Imaging assembly, which is not available in .NET Core. Use the methods SaveCore, SaveToMemoryCore and SaveHttpCore instead. These methods can only produce PNG output.

By default, PdfPreview generates images in PNG format. This versatile format is ideal for the task because it supports true color, uses artifact-free lossless compression, and is fully supported by all major browsers. PdfPreview can also save images in other formats supported by the .NET Framework, such as JPEG, BMP, GIF and TIFF. When saving to disk, the format is specified via the file extension (the last 3 characters of the path). When saving to memory or an HTTP stream, the format is specified via the optional ImageFormat argument.

The following code sample generates a PDF document from a URL, saves/reopens it, and converts page 1 of the document to an image. The saving/reopening step is necessary because the PdfPage.ToImage method can only be used on existing, not new, documents.

PdfManager objPdf = new PdfManager();
PdfDocument objDoc = objPdf.CreateDocument();
objDoc.ImportFromUrl( "http://www.persits.com/old/index.html", "scale=0.6" );

// Save and reopen, as Page.ToImage only works on existing documents.
PdfDocument objNewDoc = objPdf.OpenDocument(objDoc.SaveToMemory());

// Create preview for Page 1 at 50% scale.
PdfPage objPage = objNewDoc.Pages[1];
PdfPreview objPreview = objPage.ToImage("ResolutionX=36; ResolutionY=36");

objPreview.SaveHttp("filename=preview.png");
Dim objPdf As PdfManager = new PdfManager()
Dim objDoc As PdfDocument = objPdf.CreateDocument()
objDoc.ImportFromUrl( "http://www.persits.com/old/index.html", "scale=0.6" )

' Save and reopen, as Page.ToImage only works on existing documents.
Dim objNewDoc As PdfDocument = objPdf.OpenDocument(objDoc.SaveToMemory())

' Create preview for Page 1 at 50% scale.
Dim objPage As PdfPage = objNewDoc.Pages(1)
Dim objPreview As PdfPreview = objPage.ToImage("ResolutionX=36; ResolutionY=36")

objPreview.SaveHttp("filename=preview.png") ' PNG format by default


By default, the resultant image's width and height in pixels match the page's width and height in user units. For example, a standard US Letter page (8.5 x 11 inches, or 612 x 792 user units) becomes a 612 x 792 pixel image. For a landscape-oriented page, the image's width and height match the page's height and width, respectively.

To control the size of the page image, use the ResolutionX and ResolutionY parameters of the ToImage method. Both default to 72 (dpi). In the code sample above, both are set to 36, which makes each image dimension half of its default value. Values larger (smaller) than 72 make the resultant image proportionally larger (smaller): the image width in pixels equals the page width in user units multiplied by ResolutionX / 72, and likewise for the height and ResolutionY. ResolutionX and ResolutionY are usually set to the same value to preserve the image's aspect ratio.

To obtain the pixel dimensions of the resultant image, use the PdfPreview object's properties Width and Height.
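The size arithmetic described above can be sketched as a small helper: at the default 72 dpi one user unit becomes one pixel, and other resolutions scale each dimension by Resolution / 72. The class and method names below are hypothetical (they are not part of AspPDF.NET), and rounding to the nearest pixel is an assumption, so rely on PdfPreview.Width and Height for the actual values.

```csharp
using System;

// Hypothetical helper, not an AspPDF.NET API: predicts the pixel size of a
// page image from the page size in user units (1/72 inch) and the resolution.
public static class PageImageSize
{
    public static (int Width, int Height) Compute(
        double pageWidth, double pageHeight,
        double resolutionX = 72, double resolutionY = 72)
    {
        // At 72 dpi one user unit maps to one pixel; other resolutions
        // scale each dimension proportionally (rounding is an assumption).
        int width = (int)Math.Round(pageWidth * resolutionX / 72.0);
        int height = (int)Math.Round(pageHeight * resolutionY / 72.0);
        return (width, height);
    }

    // Builds the parameter string passed to PdfPage.ToImage,
    // using invariant formatting so decimals always use a period.
    public static string ToImageParams(double resolutionX, double resolutionY) =>
        FormattableString.Invariant($"ResolutionX={resolutionX}; ResolutionY={resolutionY}");
}
```

For a US Letter page, PageImageSize.Compute(612, 792, 36, 36) returns 306 x 396, matching the first code sample, and PageImageSize.Compute(612, 792, 600, 600) returns 5100 x 6600, the size of a page rendered at the resolution recommended for printing in Section 17.6.1.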

17.2 CMYK-to-RGB Conversion

Page images generated by the ToImage method are always in the RGB color space, but the original PDF being converted to an image may contain images and graphics in the CMYK color space, in which case they have to be converted to RGB.

To achieve reasonably good color reproduction, the ToImage method performs a series of complex non-linear color transformations based on ICC profiles, the standard color space definitions established by the International Color Consortium. Profile-based CMYK-to-RGB conversion is fairly slow. If your PDF document contains large high-resolution CMYK images and performance is of the essence, use the parameter FastCMYK=True, which invokes a simple linear formula for CMYK-to-RGB conversion and improves performance somewhat at the expense of color-reproduction quality.
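For illustration only, the sketch below shows one common naive, profile-free CMYK-to-RGB formula of the kind FastCMYK refers to: each RGB channel is the complement of its ink plus black, clamped to the valid range. This manual does not specify which formula AspPDF.NET actually uses, so treat the formula as an assumption; the class name is hypothetical.

```csharp
using System;

// Naive, profile-free CMYK-to-RGB conversion. This is an illustrative
// assumption, not necessarily the formula AspPDF.NET uses for FastCMYK=True.
// Inputs are ink coverages in the range 0..1; outputs are 0..255.
public static class NaiveCmyk
{
    public static (byte R, byte G, byte B) ToRgb(double c, double m, double y, double k)
    {
        return (Channel(c, k), Channel(m, k), Channel(y, k));
    }

    // Clamped linear formula: channel = 1 - min(1, ink + black).
    private static byte Channel(double ink, double black)
    {
        double value = 1.0 - Math.Min(1.0, Math.Max(0.0, ink + black));
        return (byte)Math.Round(255.0 * value);
    }
}
```

Because it ignores ICC profiles entirely, a formula like this is cheap to evaluate per pixel but can noticeably shift colors, which is the trade-off described above.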

The following images demonstrate the effect of the FastCMYK parameter:

Figure (three panels): the original document; its image with FastCMYK=False (default); its image with FastCMYK=True.

17.3 Error Log

The majority of PDF documents are self-sufficient since they embed all the fonts they use. Documents like that make it easy and straightforward for PDF viewers such as Acrobat Reader and PDF-to-image converters such as AspPDF.NET to do their jobs.

Some PDF documents, however, only reference font names and do not embed the fonts themselves. While large applications such as Acrobat are deployed with an entire library of fonts, AspPDF.NET contains only the 14 standard PDF fonts and must rely on the fonts installed on the system when a document does not embed a font. Therefore, not every document can be rendered properly.

To help diagnose such issues, the PdfPreview object provides the property Log, which returns a string of errors encountered during the PDF-to-image conversion. Most of these errors are font-related. Error entries are separated by a pair of CRLFs. To enable error logging, use the parameter Debug=True:

PdfPreview objPreview = objPage.ToImage( "ResolutionX=20; ResolutionY=20; Debug=True" );
Response.Write( objPreview.Log );

A typical log string may look as follows:

Font 'Palatino-Roman' was replaced by 'Helvetica'.

Could not find external TrueType font 'ArialUnicodeMS'.
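Because entries in the Log string are separated by a pair of CRLFs, the string is easy to split into individual messages, for example to store them or display them one per line. A minimal sketch; the class name is hypothetical and not part of AspPDF.NET:

```csharp
using System;

public static class PreviewLog
{
    // Splits a PdfPreview.Log string into individual entries.
    // Entries are separated by "\r\n\r\n"; empty entries are discarded.
    public static string[] Split(string log)
    {
        if (string.IsNullOrEmpty(log))
            return new string[0];

        string[] entries = log.Split(new[] { "\r\n\r\n" }, StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < entries.Length; i++)
            entries[i] = entries[i].Trim();   // drop stray line breaks or spaces
        return entries;
    }
}
```

With the sample log above, PreviewLog.Split(objPreview.Log) returns two entries, one per font problem.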

17.4 Other ToImage Parameters

A PDF page may contain a /Rotate attribute set to 90, 180 or 270 (degrees), which makes the page appear in landscape mode or upside down. By default, the ToImage method takes the Rotate value into account to orient the image appropriately. If, for whatever reason, the Rotate value needs to be ignored, set the parameter IgnoreRotate to True.

PdfPreview objPreview = objPage.ToImage( "...; IgnoreRotate=True" );
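The effect of /Rotate on the image dimensions can be sketched as follows: a rotation of 90 or 270 degrees swaps the width and height of the image, while IgnoreRotate=True keeps the unrotated dimensions. This is an illustrative helper with a hypothetical name, not an AspPDF.NET API; normalizing negative values follows the PDF specification, which only requires /Rotate to be a multiple of 90.

```csharp
using System;

public static class RotatedSize
{
    // Returns the image (width, height) in user units, i.e. pixels at the
    // default 72 dpi, given the page box size (CropBox, or MediaBox when
    // IgnoreCropBox=True) and the page's /Rotate value.
    public static (double Width, double Height) Compute(
        double boxWidth, double boxHeight, int rotate, bool ignoreRotate = false)
    {
        if (rotate % 90 != 0)
            throw new ArgumentException("/Rotate must be a multiple of 90.", nameof(rotate));

        // Normalize to 0, 90, 180 or 270 (PDF also allows negative values).
        int normalized = ((rotate % 360) + 360) % 360;

        // Quarter turns swap the dimensions unless rotation is ignored.
        bool swap = !ignoreRotate && (normalized == 90 || normalized == 270);
        return swap ? (boxHeight, boxWidth) : (boxWidth, boxHeight);
    }
}
```

For example, a 612 x 792 page with /Rotate 90 yields a 792 x 612 image, or 612 x 792 when IgnoreRotate=True is passed.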

Also, some PDF pages contain /MediaBox and /CropBox attributes that differ from each other. By default, the ToImage method uses the CropBox to calculate the dimensions of the resultant image, which is consistent with the behavior of all major PDF viewers. If, for whatever reason, the MediaBox (which usually covers a larger area than the CropBox) needs to be used instead, set the parameter IgnoreCropBox to True. This may cause the image to include areas of the page that are normally invisible in a PDF viewer.

PdfPreview objPreview = objPage.ToImage( "...; IgnoreCropBox=True" );

17.5 Image Extraction

As an additional bonus, the PdfPreview object is capable of extracting images from a PDF page. The method PdfPreview.ExtractImage returns a new instance of the PdfPreview object representing an image specified by a 1-based index. This image can then be saved the regular way, via the methods Save, SaveToMemory or SaveHttp. If the specified index exceeds the number of images on the page, ExtractImage returns null (C#) or Nothing (VB.NET).

The images are always saved as PNGs as this is PdfPreview's format of choice.

The following code snippet opens a PDF document and saves all of the images from page 1 to disk:

PdfManager objPDF = new PdfManager();
PdfDocument objDoc = objPDF.OpenDocument(@"c:\path\mydoc.pdf");
PdfPage objPage = objDoc.Pages[1];
PdfPreview objPreview = objPage.ToImage();

int i = 1;
PdfPreview objImage;

while ((objImage = objPreview.ExtractImage(i++)) != null)
{
  // false = do not overwrite; a unique file name is generated if the file exists
  objImage.Save(@"c:\path\image.png", false);
}
Dim objPDF As PdfManager = New PdfManager()
Dim objDoc As PdfDocument = objPDF.OpenDocument("c:\path\mydoc.pdf")
Dim objPage As PdfPage = objDoc.Pages(1)
Dim objPreview = objPage.ToImage()

Dim i As Integer = 1

Do While True
  Dim Image As PdfPreview = objPreview.ExtractImage(i)

  If Image Is Nothing Then
    Exit Do
  Else
    ' False = do not overwrite; a unique file name is generated if the file exists
    Image.Save("c:\path\image.png", False)
  End If

  i = i + 1
Loop

As of Version 3.6, AspPDF.NET is capable of replacing images in an existing PDF document with other images or graphics. The image extraction and image replacement features can be used jointly to shrink the overall size of a PDF document by substituting its high-resolution images with their lower-resolution versions. Image replacement is covered in detail in Section 17.8 below.

17.6 Printing

17.6.1 Individual Page Printing

The PdfPreview object offers automatic printing functionality via the method SendToPrinter. This method sends the image of a page contained in this PdfPreview object to a printer.

The SendToPrinter method accepts two arguments: the local or network name of the printer and, optionally, a parameter list adjusting the appearance of the printout. If the printer name is set to null (C#) or Nothing (VB.NET), the default printer name for the current machine is used.

The print quality is determined by the resolution of the image being printed. It is therefore recommended that the ResolutionX and ResolutionY parameters to the ToImage method be set to at least 300 or, better yet, 600.

By default, the SendToPrinter method prints the image as is, without stretching it. If the parameter Stretch is set to True, the image is stretched to cover the entire print area. Additionally, you can use the parameters ScaleX and ScaleY to scale the image up or down. For example, "ScaleX=0.5; ScaleY=0.5" scales the image down by 50%.

The following code sample sends page 2 of c:\path\document.pdf to the printer. The image is stretched to cover the entire print area.

PdfManager objPDF = new PdfManager();
PdfDocument objDoc = objPDF.OpenDocument(@"c:\path\document.pdf");
PdfPage objPage = objDoc.Pages[2];
PdfPreview objPreview = objPage.ToImage("ResolutionX=600; ResolutionY=600");

objPreview.SendToPrinter(@"\\192.168.1.2\HP LaserJet 6P", "Stretch=true");
Dim objPDF As PdfManager = New PdfManager()
Dim objDoc As PdfDocument = objPDF.OpenDocument("c:\path\document.pdf")
Dim objPage As PdfPage = objDoc.Pages(2)
Dim objPreview = objPage.ToImage("ResolutionX=600; ResolutionY=600")

objPreview.SendToPrinter( "\\192.168.1.2\HP LaserJet 6P", "Stretch=true" )

You may encounter the error "Access is denied" when attempting to send the image to a network printer. To avoid the error, you need to impersonate an interactive user account with the LogonUser method of the PdfManager object, as follows:

...
objPDF.LogonUser( "domain", "username", "password" );
objPreview.SendToPrinter( @"\\192.168.1.2\HP LaserJet 6P", "Stretch=true" );
...
objPDF.LogonUser( "domain", "username", "password")
objPreview.SendToPrinter( "\\192.168.1.2\HP LaserJet 6P", "Stretch=true" )

The first argument to the LogonUser method is a Windows domain and can be an empty string. The second and third arguments are the username and password of the account to be impersonated. The optional fourth argument is the logon type and is usually omitted.

As an alternative to the LogonUser method, a user can be impersonated via the <identity> tag in your Web.config file, as follows:

<configuration>
  <system.web>
    <identity impersonate="true" userName="username" password="password" />
  </system.web>

  ...
</configuration>

17.6.2 Document Printing

As of Version 2.7, the page printing functionality described in the previous sub-section has been expanded to support the printing of an entire document or any portion thereof, in both simplex (one-sided) and duplex (double-sided) modes.

In AspPDF.NET 2.7+, the PdfDocument object has been given its own SendToPrinter method which sends the entire document (or an arbitrary set of pages) to the printer as opposed to just an individual page. Internally, the PdfDocument.SendToPrinter method iterates through the pages of the document, creates PdfPreview objects for each of them and sends the page images to the printer one by one. If your printer supports duplex printing, this method can optionally print double-sided documents in both long-edge and short-edge binding modes.

The PdfDocument.SendToPrinter method expects the same arguments as its PdfPreview.SendToPrinter counterpart: the printer name (local or network) and a list of parameters. In addition to the parameters described above, this method also accepts parameters controlling the duplex mode, as well as the page ranges to be printed.

The Duplex parameter controls duplex printing. Duplex=1 enables duplex printing in the regular long-edge-binding mode, and Duplex=2 in the short-edge binding mode. If this parameter is set to 0 or omitted, the regular simplex (one-sided) printing is used.

The From1/To1, From2/To2, ..., FromN/ToN pairs of parameters specify the ranges of pages to be printed. Page indices are 1-based. If the specified ranges overlap, the overlapping pages will be printed multiple times. By default, the entire document is printed.

The following code prints pages 2, 5, and 7-10 of a document in a duplex long-edge-binding mode:

PdfDocument objDoc = objPdf.OpenDocument( @"c:\path\doc.pdf" );
objDoc.SendToPrinter( "Brother HL-2270DW series",
  "Stretch=true; Duplex=1; From1=2;To1=2; From2=5;To2=5; From3=7;To3=10" );
Dim objDoc As PdfDocument = objPdf.OpenDocument( "c:\path\doc.pdf" )
objDoc.SendToPrinter( "Brother HL-2270DW series", _
  "Stretch=true; Duplex=1; From1=2;To1=2; From2=5;To2=5; From3=7;To3=10" )
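Writing the FromN/ToN pairs by hand is error-prone when the page list comes from user input. The sketch below builds the parameter string from a list of 1-based page ranges. The helper is hypothetical and only emits the parameters documented above (Stretch, Duplex, FromN and ToN); it rejects ranges that are empty or start before page 1.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class PrintParams
{
    // Builds a string such as
    // "Stretch=true; Duplex=1; From1=2;To1=2; From2=5;To2=5; From3=7;To3=10".
    // duplex: 0 = simplex, 1 = long-edge binding, 2 = short-edge binding.
    public static string Build(IList<(int From, int To)> ranges, int duplex = 0, bool stretch = true)
    {
        if (duplex < 0 || duplex > 2)
            throw new ArgumentOutOfRangeException(nameof(duplex));

        var sb = new StringBuilder();
        sb.Append(stretch ? "Stretch=true" : "Stretch=false");
        sb.Append("; Duplex=").Append(duplex);

        for (int i = 0; i < ranges.Count; i++)
        {
            var (from, to) = ranges[i];
            if (from < 1 || to < from)   // page indices are 1-based
                throw new ArgumentException($"Invalid page range {from}-{to}.");

            int n = i + 1;               // pairs are numbered From1/To1, From2/To2, ...
            sb.Append($"; From{n}={from};To{n}={to}");
        }
        return sb.ToString();
    }
}
```

With this helper, the call above could be written as objDoc.SendToPrinter("Brother HL-2270DW series", PrintParams.Build(new[] { (2, 2), (5, 5), (7, 10) }, duplex: 1)).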

As of Version 2.9, the PdfDocument.SendToPrinter method also supports printer tray (or bin) selection via the Tray parameter. The Microsoft documentation defines the following values for this parameter:

First (1), upper (1), only one (1), lower (2), middle (3), manual (4), envelope (5), envelope manual (6), auto (7), tractor (8), small format (9), large format (10), large capacity (11), cassette (14), form source (15), last (15).

However, many printers use driver-specific tray values of 256 and up. The correct Tray values for such printers have to be determined by trial and error.

As of Version 3.2.0.2, the PdfDocument.SendToPrinter method also supports printing multiple copies via the Copies parameter (1 by default).

As of Version 3.4.0.3, the Collate parameter (False by default) is supported, which enables collation if set to True. For example, if you print two copies of a three-page document without collation, the pages print in this order: 1, 1, 2, 2, 3, 3. With collation, the pages print in this order: 1, 2, 3, 1, 2, 3.
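The two page orders described above correspond to swapping a pair of nested loops. The sketch below only models the printer output, so you can reason about what "Copies=2; Collate=True" will produce; it is not something you need to call, and the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class CopyOrder
{
    // Returns the order in which pages come out of the printer when printing
    // 'copies' copies of a document with 'pageCount' pages.
    public static List<int> Compute(int pageCount, int copies, bool collate)
    {
        var order = new List<int>();
        if (collate)
        {
            // Collated: complete sets one after another (1, 2, 3, 1, 2, 3).
            for (int c = 0; c < copies; c++)
                for (int p = 1; p <= pageCount; p++)
                    order.Add(p);
        }
        else
        {
            // Not collated: all copies of each page together (1, 1, 2, 2, 3, 3).
            for (int p = 1; p <= pageCount; p++)
                for (int c = 0; c < copies; c++)
                    order.Add(p);
        }
        return order;
    }
}
```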

Version 3.4.0.3 also adds two parameters to support label printers: PaperWidth and PaperHeight. These parameters specify the paper width and height in tenths of a millimeter (one inch is 254 tenths of a millimeter). For example, if your label printer uses 2 5/16" x 4" labels, the PaperWidth and PaperHeight parameters can be set to 590 and 1020, respectively (2 5/16" and 4" are about 587 and 1016 tenths of a millimeter). If these parameters are not used, a document may come out shrunk when printed on a label printer.
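Converting label dimensions to these units is simple arithmetic, since one inch is exactly 25.4 mm, or 254 tenths of a millimeter. A minimal sketch with hypothetical names; for the 2 5/16" x 4" label it yields the exact values 587 and 1016, which the example rounds to 590 and 1020.

```csharp
using System;

public static class PaperSize
{
    // One inch is exactly 25.4 mm, i.e. 254 tenths of a millimeter.
    private const double TenthsOfMmPerInch = 254.0;

    public static int FromInches(double inches) =>
        (int)Math.Round(inches * TenthsOfMmPerInch, MidpointRounding.AwayFromZero);

    public static int FromMillimeters(double mm) =>
        (int)Math.Round(mm * 10.0, MidpointRounding.AwayFromZero);

    // Builds the PaperWidth/PaperHeight part of a SendToPrinter parameter string.
    public static string Params(double widthInches, double heightInches) =>
        $"PaperWidth={FromInches(widthInches)}; PaperHeight={FromInches(heightInches)}";
}
```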

17.7 Structured Text Extraction

As of Version 2.8, the PdfPreview object has been expanded to perform yet another useful task: extracting text strings from the document along with their respective coordinates. This feature enables you to know exactly where on the page a particular text item is. Regular (coordinate-less) text extraction is described in Section 9.4 - Content Extraction.

PdfPreview's TextItems property returns a collection of objects, each encapsulating a text fragment and its coordinates and dimensions. To avoid adding a new object to AspPDF.NET's already populous object diagram, we have retrofitted the PdfRect object for the task by adding a Text property, which returns the actual text fragment in Unicode format. The existing PdfRect properties Left, Bottom, Width and Height return the coordinates of the fragment's lower-left corner and its horizontal and vertical dimensions, respectively.
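Because each fragment is described by its lower-left corner and dimensions in PDF coordinates, where y grows upward, a top-to-bottom, left-to-right reading order (which bit 2 of the ExtractText parameter, described next, produces for you) can be approximated by sorting on the top edge in descending order and then on the left edge. The sketch below uses a hypothetical TextFragment stand-in for PdfRect so that it compiles without AspPDF.NET. The 2-unit line tolerance is an assumption, and the component's own sorting algorithm is not documented here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the PdfRect properties used by TextItems.
public sealed class TextFragment
{
    public string Text;
    public float Left, Bottom, Width, Height;
    public float Top => Bottom + Height;   // upper edge (y grows upward)

    public TextFragment(string text, float left, float bottom, float width, float height)
    {
        Text = text; Left = left; Bottom = bottom; Width = width; Height = height;
    }
}

public static class ReadingOrder
{
    // Sorts fragments top to bottom, then left to right. Fragments whose top
    // edges differ by less than 'tolerance' user units are treated as one line.
    public static List<TextFragment> Sort(IEnumerable<TextFragment> items, float tolerance = 2f)
    {
        var sorted = items.OrderByDescending(f => f.Top).ToList();
        var result = new List<TextFragment>();
        int i = 0;
        while (i < sorted.Count)
        {
            // Collect the fragments on the same line as sorted[i].
            float lineTop = sorted[i].Top;
            var line = new List<TextFragment>();
            while (i < sorted.Count && lineTop - sorted[i].Top < tolerance)
                line.Add(sorted[i++]);

            // Within a line, order fragments from left to right.
            result.AddRange(line.OrderBy(f => f.Left));
        }
        return result;
    }
}
```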

To populate the TextItems collection, the PdfPage.ToImage method must be called with the parameter ExtractText set to a non-zero value. ExtractText can be a combination (sum) of the following flags:

  • Bit 1 (1): Enables text extraction. If this flag is not set, text extraction is not performed and the TextItems collection is empty.
  • Bit 2 (2): Sorts text fragments in the order from top to bottom, and from left to right. If this flag is not set, the text fragments in the TextItems collection appear in an arbitrary order.
  • Bit 3 (4): Glues adjacent text fragments together. If this flag is not set, a single text fragment may contain a single word, part of a word or even a single character. Setting this flag usually combines all or most text fragments of a paragraph line into a single long string. For this flag to work, bit 2 must also be set.
  • Bit 4 (8): Does not glue adjacent text fragments if a space character separates them. For this flag to work, bits 2 and 3 must also be set.
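The flag values and their dependencies can be captured in a [Flags] enum so that invalid combinations are caught before ToImage is called. The enum and helper below are hypothetical conveniences that mirror the list above; they are not AspPDF.NET types, and treating a missing bit 1 as an error is a design choice of this sketch.

```csharp
using System;

// Hypothetical enum mirroring the ExtractText bit values listed above.
[Flags]
public enum ExtractTextFlags
{
    None = 0,
    Extract = 1,      // bit 1: enable text extraction
    Sort = 2,         // bit 2: sort top to bottom, left to right
    Glue = 4,         // bit 3: glue adjacent fragments (requires Sort)
    KeepSpaces = 8    // bit 4: do not glue across spaces (requires Sort and Glue)
}

public static class ExtractTextParam
{
    // Returns the ToImage parameter, e.g. "ExtractText=7",
    // after checking the flag dependencies described above.
    public static string Build(ExtractTextFlags flags)
    {
        if (!flags.HasFlag(ExtractTextFlags.Extract))
            throw new ArgumentException("Bit 1 (1) must be set for text extraction.");
        if (flags.HasFlag(ExtractTextFlags.Glue) && !flags.HasFlag(ExtractTextFlags.Sort))
            throw new ArgumentException("Bit 3 (4) requires bit 2 (2).");
        if (flags.HasFlag(ExtractTextFlags.KeepSpaces) &&
            !(flags.HasFlag(ExtractTextFlags.Sort) && flags.HasFlag(ExtractTextFlags.Glue)))
            throw new ArgumentException("Bit 4 (8) requires bits 2 (2) and 3 (4).");

        return "ExtractText=" + (int)flags;
    }
}
```

ExtractTextParam.Build(ExtractTextFlags.Extract | ExtractTextFlags.Sort | ExtractTextFlags.Glue) returns "ExtractText=7", the value used in the code sample below.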

The following code sample draws red outlines around all text fragments it finds on a page and labels each fragment with its position in the collection (as shown in the image below).

PdfManager objPdf = new PdfManager();
PdfDocument objDoc = objPdf.OpenDocument(Server.MapPath("population.pdf"));
PdfPage objPage = objDoc.Pages[1];
PdfPreview objPreview = objPage.ToImage("extracttext=7"); // sort/glue

objPage.Canvas.LineWidth = 0.5f;
objPage.Canvas.SetFillColor(1, 0, 0);

int i = 1;

foreach(PdfRect rect in objPreview.TextItems)
{
  objPage.Canvas.SetColor(1, 0, 0);
  objPage.Canvas.SetFillColor(1, 0, 0);

  // Red outline
  objPage.Canvas.DrawRect(rect.Left, rect.Bottom, rect.Width, rect.Height);

  // Small box on top to display count
  objPage.Canvas.FillRect(rect.Left, rect.Top, 10, 5);
  objPage.Canvas.DrawRect(rect.Left, rect.Top, 10, 5);
  objPage.Canvas.DrawText(i.ToString(), "x=" + (rect.Left + 1).ToString()+";y=" +
    (rect.Top + 6).ToString() + ";color=white; size=5",
  objDoc.Fonts["Helvetica"]);

  i++;
}

String strFilename = objDoc.Save(Server.MapPath("extracttext.pdf"), false);
Dim objPdf As PdfManager = New PdfManager()
Dim objDoc As PdfDocument = objPdf.OpenDocument(Server.MapPath("population.pdf"))
Dim objPage As PdfPage = objDoc.Pages(1)
Dim objPreview As PdfPreview = objPage.ToImage("extracttext=7") ' sort/glue

objPage.Canvas.LineWidth = 0.5f
objPage.Canvas.SetFillColor(1, 0, 0)

Dim i As Integer = 1

For Each rect As PdfRect in objPreview.TextItems
  objPage.Canvas.SetColor(1, 0, 0)
  objPage.Canvas.SetFillColor(1, 0, 0)

  ' Red outline
  objPage.Canvas.DrawRect(rect.Left, rect.Bottom, rect.Width, rect.Height)

  ' Small box on top to display count
  objPage.Canvas.FillRect(rect.Left, rect.Top, 10, 5)
  objPage.Canvas.DrawRect(rect.Left, rect.Top, 10, 5)
  objPage.Canvas.DrawText(i.ToString(), "x=" + (rect.Left + 1).ToString() + _
    "; y=" + (rect.Top + 6).ToString() + ";color=white; size=5", _
  objDoc.Fonts("Helvetica"))

  i = i + 1
Next

Dim strFilename As String = objDoc.Save(Server.MapPath("extracttext.pdf"), false)


17.8 Image Replacement

17.8.1 Feature Overview

As of Version 3.6, AspPDF.NET is capable of replacing images in an existing PDF document with other images or graphics. This feature is useful, among other things, for reducing the overall size of a PDF document by replacing its high-resolution images with their lower-resolution versions.

The image replacement feature is built on top of the document stitching functionality described in Section 14.1. Consider the following code:

PdfDocument objDoc2 = objPDF.OpenDocument(path);
PdfDocument objDoc1 = objPDF.CreateDocument();
objDoc1.AppendDocument(objDoc2);
objDoc1.Save(path2);

This code creates an empty new document, Doc1, and appends Doc2 to it, thus creating a document almost completely identical to Doc2. All items comprising Doc2 are copied to Doc1 during the appending operation, including images.

If a certain image in Doc2 needs to be replaced with another image, the new image needs to be opened via the Doc1 object's OpenImage method, and a mapping between the new and old images needs to be added with the method AddImageReplacement (introduced in Version 3.6):

PdfDocument objDoc2 = objPDF.OpenDocument(path1);
PdfDocument objDoc1 = objPDF.CreateDocument();

PdfImage objImage = objDoc1.OpenImage(imagePath);

objDoc1.AddImageReplacement(imageID, objImage);

objDoc1.AppendDocument(objDoc2);
objDoc1.Save(path2);

This code copies all items from Doc2 to Doc1 except the image item specified by imageID. That item is not copied, and the PdfImage object is used in its place.

The 1st argument to the AddImageReplacement method specifies the internal ID of the image item to be replaced within Doc2, and the 2nd argument is an instance of the PdfImage object to replace it with. The 2nd argument can also be an instance of the PdfGraphics object if the image needs to be replaced with a graphics as opposed to another image. The method AddImageReplacement can be called multiple times if multiple images need to be replaced.

An image ID is a string containing two numbers separated by an underscore, such as "123_0". The 2nd number is usually 0. Image IDs can be obtained via the Log property of the object representing an extracted image, or via the PdfPreview.ImageItems collection, as described below.
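Since an image ID is just two underscore-separated integers, it can be validated before being passed to AddImageReplacement, for example when IDs come from a configuration file. The helper below is hypothetical. Reading the two numbers as a PDF object number and generation number is an assumption suggested by the second number usually being 0; AddImageReplacement itself only needs the string.

```csharp
using System;
using System.Globalization;

public static class ImageId
{
    // Parses an image ID such as "123_0" into its two numbers.
    // Returns false if the string is not of the form "<digits>_<digits>".
    public static bool TryParse(string id, out int first, out int second)
    {
        first = second = 0;
        if (string.IsNullOrEmpty(id))
            return false;

        string[] parts = id.Split('_');
        return parts.Length == 2
            && int.TryParse(parts[0], NumberStyles.None, CultureInfo.InvariantCulture, out first)
            && int.TryParse(parts[1], NumberStyles.None, CultureInfo.InvariantCulture, out second);
    }

    // Builds an ID string from its two numbers (the second is usually 0).
    public static string Format(int first, int second = 0) =>
        first.ToString(CultureInfo.InvariantCulture) + "_" +
        second.ToString(CultureInfo.InvariantCulture);
}
```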

17.8.2 Combining Image Extraction and Image Replacement

The following code sample replaces all images on page 1 of a PDF document with their resized versions. Image resizing is performed with the standard .NET System.Drawing classes (Bitmap and the JPEG encoder). AspJpeg, another Persits component, which can be downloaded from www.aspjpeg.com, can be used for this task as well.

First, the code sample extracts all images from the document using the procedure described in Section 17.5 earlier in this chapter. The extracted images are then resized by 50% and saved as JPEGs. Their names and image IDs are recorded in lists. The image ID of an extracted image is returned by the Log property of the PdfPreview object representing that extracted image.

Lastly, the AddImageReplacement method is called for every resized image in the array, followed by a call to AppendDocument and Save to complete the image replacement operation.

// Requires: using System; using System.Collections.Generic; using System.Drawing;
// using System.Drawing.Imaging; using System.IO;

// Lists to store image IDs and filenames
List<string> ImageIDs = new List<string>();
List<string> Filenames = new List<string>();

PdfManager objPDF = new PdfManager();

PdfDocument objOriginalDoc = objPDF.OpenDocument(Server.MapPath("twoimages.pdf"));

// Extract all images, resize, build a list of names and image IDs
PdfPage objPage = objOriginalDoc.Pages[1];
PdfPreview objPreview = objPage.ToImage();
PdfPreview objImage;

// These lines control quality (and hence size) of JPEG output
ImageCodecInfo jpegEncoder = GetEncoder(ImageFormat.Jpeg);
EncoderParameters myEncoderParameters = new EncoderParameters(1);
myEncoderParameters.Param[0] =
  new EncoderParameter(System.Drawing.Imaging.Encoder.Quality, 50L);

int i = 1;
while ((objImage = objPreview.ExtractImage(i)) != null)
{
  using (MemoryStream ms = new MemoryStream(objImage.SaveToMemory()))
  using (Bitmap objBitmap = new Bitmap(ms))
  using (Bitmap objResized = new Bitmap(objBitmap,
    new Size(Math.Max(1, objBitmap.Width / 2), Math.Max(1, objBitmap.Height / 2))))
  {
    string strImageName = String.Format("extractedimage{0}.jpg", i);
    objResized.Save(Server.MapPath(strImageName), jpegEncoder, myEncoderParameters);

    ImageIDs.Add(objImage.Log); // Log property returns Image ID
    Filenames.Add(strImageName);
  }
  i++;
}

// Now perform image replacement
PdfDocument objNewDoc = objPDF.CreateDocument();

for (int n = 0; n < ImageIDs.Count; n++)
{
  PdfImage objExtractedImage = objNewDoc.OpenImage(Server.MapPath(Filenames[n]));
  objNewDoc.AddImageReplacement(ImageIDs[n], objExtractedImage);
}

objNewDoc.AppendDocument(objOriginalDoc);

string strFilename = objNewDoc.Save(Server.MapPath("imagereplacement.pdf"), false);

// Helper method: declare it in the page class, outside the method above.
// Returns the installed encoder for the given format (JPEG here).
private static ImageCodecInfo GetEncoder(ImageFormat format)
{
  foreach (ImageCodecInfo codec in ImageCodecInfo.GetImageEncoders())
  {
    if (codec.FormatID == format.Guid)
      return codec;
  }
  return null;
}
' Requires: Imports System.Collections.Generic, System.Drawing,
' System.Drawing.Imaging, System.IO

' Lists to store image IDs and filenames
Dim ImageIDs As New List(Of String)()
Dim Filenames As New List(Of String)()

Dim objPDF As PdfManager = New PdfManager()

Dim objOriginalDoc As PdfDocument = objPDF.OpenDocument(Server.MapPath("twoimages.pdf"))

' Extract all images, resize, build a list of names and image IDs
Dim objPage As PdfPage = objOriginalDoc.Pages(1)
Dim objPreview As PdfPreview = objPage.ToImage()

' These lines control quality (and hence size) of JPEG output
Dim jpegEncoder As ImageCodecInfo = GetEncoder(ImageFormat.Jpeg)
Dim myEncoderParameters As EncoderParameters = New EncoderParameters(1)
myEncoderParameters.Param(0) = New EncoderParameter(System.Drawing.Imaging.Encoder.Quality, 50L)

Dim i As Integer = 1
Do While True
  Dim objImage As PdfPreview = objPreview.ExtractImage(i)
  If objImage Is Nothing Then Exit Do

  Using ms As MemoryStream = New MemoryStream(objImage.SaveToMemory())
    Using objBitmap As Bitmap = New Bitmap(ms)
      ' Integer division (\) keeps the Size arguments integral
      Using objResized As Bitmap = New Bitmap(objBitmap, _
          New Size(Math.Max(1, objBitmap.Width \ 2), Math.Max(1, objBitmap.Height \ 2)))
        Dim strImageName As String = String.Format("extractedimage{0}.jpg", i)
        objResized.Save(Server.MapPath(strImageName), jpegEncoder, myEncoderParameters)

        ImageIDs.Add(objImage.Log) ' Log property returns Image ID
        Filenames.Add(strImageName)
      End Using
    End Using
  End Using

  i = i + 1
Loop

' Now perform image replacement
Dim objNewDoc As PdfDocument = objPDF.CreateDocument()

For n As Integer = 0 To ImageIDs.Count - 1
  Dim objExtractedImage As PdfImage = objNewDoc.OpenImage(Server.MapPath(Filenames(n)))
  objNewDoc.AddImageReplacement(ImageIDs(n), objExtractedImage)
Next

objNewDoc.AppendDocument(objOriginalDoc)

Dim strFilename As String = objNewDoc.Save(Server.MapPath("imagereplacement.pdf"), False)

' Helper function: declare it in the page class, outside the method above.
' Returns the installed encoder for the given format (JPEG here).
Private Shared Function GetEncoder(ByVal format As ImageFormat) As ImageCodecInfo
  For Each codec As ImageCodecInfo In ImageCodecInfo.GetImageEncoders()
    If codec.FormatID = format.Guid Then Return codec
  Next
  Return Nothing
End Function


This code sample reduces the size of the PDF document twoimages.pdf (included in the installation) from 117 KB to 30 KB.

17.8.3 Obtaining Image Information

In addition to the AddImageReplacement method, AspPDF.NET 3.6 also offers a way to obtain a list of images in a PDF document, including information about each image's pixel size, displacement, scaling, rotation, image ID and coordinate transformation matrix, without actually performing image extraction. This information is obtained via the PdfPreview object's ImageItems property which returns a collection of PdfRect objects, each representing an image within the PDF document. This collection is similar to that returned by the TextItems property described above.

To populate the ImageItems collection, the PdfPage.ToImage method must be called with the parameter ImageInfo set to True.

The width and height of the image, in pixels, are returned by the PdfRect properties Right and Top, respectively. The image ID is returned by the PdfRect property Text.

The displacement, scaling, rotation and coordinate transformation matrix values are encapsulated in a PdfParam object returned by the PdfRect property ImageInfo (introduced in Version 3.6). For example, the following code snippet displays the image ID, width, height and scaling factors of all images in a document:

PdfManager objPDF = new PdfManager();
PdfDocument objDoc = objPDF.OpenDocument(@"c:\path\doc.pdf");
string strOutput = "";

foreach( PdfPage objPage in objDoc.Pages )
{
   PdfPreview objPreview = objPage.ToImage("imageinfo=true");
   foreach( PdfRect rect in objPreview.ImageItems )
   {
      // Append (do not overwrite) the information for each image
      strOutput += "Image ID=" + rect.Text + "<br>";
      strOutput += "Width=" + rect.Right + "<br>";
      strOutput += "Height=" + rect.Top + "<br>";
      strOutput += "ScaleX=" + rect.ImageInfo["ScaleX"] + "<br>";
      strOutput += "ScaleY=" + rect.ImageInfo["ScaleY"] + "<br><br>";
   }
}

Response.Write( strOutput );

The full list of values encapsulated in the PdfParam object returned by the PdfRect.ImageInfo property is as follows:

Property            Meaning
shiftX              Horizontal displacement
shiftY              Vertical displacement
rotation            Rotation in degrees
scaleX              Horizontal scaling
scaleY              Vertical scaling
hasMask             1 if the image has transparency, 0 otherwise
a, b, c, d, e, f    The 6 components of the coordinate transformation matrix
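The matrix components a through f follow the standard PDF convention: an image is painted into the unit square, and a point (x, y) of that square lands on the page at (a*x + c*y + e, b*x + d*y + f). Transforming the square's four corners therefore gives the area an image occupies on the page, which helps when sizing a replacement graphics. A minimal sketch, independent of AspPDF.NET; it assumes the a through f values returned by ImageInfo use this standard convention, and the class name is hypothetical.

```csharp
using System;

public static class ImageBounds
{
    // Maps the unit square through the matrix [a b c d e f] and returns the
    // axis-aligned bounding box of the result in page (user-space) coordinates.
    public static (double Left, double Bottom, double Right, double Top) Compute(
        double a, double b, double c, double d, double e, double f)
    {
        double[,] corners = { { 0, 0 }, { 1, 0 }, { 0, 1 }, { 1, 1 } };
        double left = double.MaxValue, bottom = double.MaxValue;
        double right = double.MinValue, top = double.MinValue;

        for (int i = 0; i < 4; i++)
        {
            double x = corners[i, 0], y = corners[i, 1];
            double px = a * x + c * y + e;   // x' = a*x + c*y + e
            double py = b * x + d * y + f;   // y' = b*x + d*y + f
            left = Math.Min(left, px);
            right = Math.Max(right, px);
            bottom = Math.Min(bottom, py);
            top = Math.Max(top, py);
        }
        return (left, bottom, right, top);
    }
}
```

For an unrotated image drawn 200 x 100 units large at (50, 60), the matrix is [200 0 0 100 50 60] and the bounding box is (50, 60) to (250, 160).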

Access to this information can be quite useful when an image needs to be replaced with a graphics, as discussed below.

17.8.4 Replacing Images with Graphics

As shown in the Section 17.8.2 code sample, replacing images with other images with the same aspect ratio is very straightforward. However, if the aspect ratio of the replacement image is different from that of the original image, a vertical or horizontal distortion will occur as the new image will be stretched to fill the area occupied by the old image. To avoid distortions, an image can be replaced with a graphics as opposed to another image. The graphics may have arbitrary content, including other images, drawings, text, etc.
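One way to avoid distortion while still allowing the replacement image to shrink is to scale it uniformly so it fits inside the original area, and then center it. The code sample in this section centers the image at its natural size. The following sketch computes a uniformly scaled, centered placement instead. The helper is hypothetical. Its X and Y values play the same role as param2["x"] and param2["y"] in the sample. Passing the scale factor to DrawImage via ScaleX/ScaleY parameters is an assumption; check the Canvas.DrawImage reference.

```csharp
using System;

public static class AspectFit
{
    // Scales a (srcWidth x srcHeight) image uniformly so it fits inside a
    // (boxWidth x boxHeight) area, then centers it. Returns the scale factor
    // and the lower-left corner of the placed image relative to the box.
    public static (double Scale, double X, double Y) Fit(
        double srcWidth, double srcHeight, double boxWidth, double boxHeight,
        bool allowUpscale = false)
    {
        if (srcWidth <= 0 || srcHeight <= 0)
            throw new ArgumentException("Source dimensions must be positive.");

        // The smaller ratio guarantees that both dimensions fit.
        double scale = Math.Min(boxWidth / srcWidth, boxHeight / srcHeight);
        if (!allowUpscale)
            scale = Math.Min(scale, 1.0);   // never enlarge unless asked to

        double x = (boxWidth - srcWidth * scale) / 2;
        double y = (boxHeight - srcHeight * scale) / 2;
        return (scale, x, y);
    }
}
```

With allowUpscale left at false, an image smaller than the area is placed exactly as in the sample below, centered at its natural size.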

The following code sample replaces all images in a PDF document with a graphics that fills the entire area of the original image with a yellow background and a black border, and places a red image with the caption "Image Removed" in its center. The red image is not stretched in any way.

PdfManager objPDF = new PdfManager();
PdfDocument objOriginalDoc = objPDF.OpenDocument(Server.MapPath("twoimages.pdf"));

// Replace images with graphics
PdfDocument objNewDoc = objPDF.CreateDocument();
PdfImage objImage = objNewDoc.OpenImage(Server.MapPath("17_imageremoved.png"));

foreach( PdfPage objPage in objOriginalDoc.Pages )
{
  PdfPreview objPreview = objPage.ToImage("imageinfo=true");

  foreach( PdfRect rect in objPreview.ImageItems )
  {
    // Create a graphics which will host the image on the new document.
    float fWidth = rect.Right * rect.ImageInfo["ScaleX"];
    float fHeight = rect.Top * rect.ImageInfo["ScaleY"];
    PdfParam param = objPDF.CreateParam();
    param["left"] = param["bottom"] = 0;
    param["right"] = fWidth;
    param["top"] = fHeight;

    // Neutralize image coordinate transformation matrix
    param["a"] = 1 / fWidth;
    param["d"] = 1 / fHeight;
    param["b"] = param["c"] = param["e"] = param["f"] = 0;

    PdfGraphics gr = objNewDoc.CreateGraphics(param);
    gr.Canvas.SetFillColor( 1, 1, 0 ); // yellow
    gr.Canvas.FillRect( 0, 0, fWidth, fHeight );
    gr.Canvas.SetColor( 0, 0, 0 ); // black
    gr.Canvas.LineWidth = 10; // border
    gr.Canvas.DrawRect( 0, 0, fWidth, fHeight );

    PdfParam param2 = objPDF.CreateParam();
    param2["x"] = (fWidth - objImage.Width) / 2;
    param2["y"] = (fHeight - objImage.Height) / 2;
    gr.Canvas.DrawImage( objImage, param2 );

    // Replace image with graphics
    objNewDoc.AddImageReplacement( rect.Text, gr );
  }
}

objNewDoc.AppendDocument( objOriginalDoc );
string strFilename = objNewDoc.Save(Server.MapPath("replacedimage_gr.pdf"), false);
Dim objPDF As PdfManager = New PdfManager()
Dim objOriginalDoc As PdfDocument = objPDF.OpenDocument(Server.MapPath("twoimages.pdf"))

' Replace images with graphics
Dim objNewDoc As PdfDocument = objPDF.CreateDocument()
Dim objImage As PdfImage = objNewDoc.OpenImage(Server.MapPath("17_imageremoved.png"))

For Each objPage As PdfPage In objOriginalDoc.Pages
  Dim objPreview As PdfPreview = objPage.ToImage("imageinfo=true")

  For Each rect As PdfRect In objPreview.ImageItems
    ' Create a graphics which will host the image on the new document.
    Dim fWidth As Single = rect.Right * rect.ImageInfo("ScaleX")
    Dim fHeight As Single = rect.Top * rect.ImageInfo("ScaleY")
    Dim param As PdfParam = objPDF.CreateParam()
    param("left") = 0
    param("bottom") = 0
    param("right") = fWidth
    param("top") = fHeight

    ' Neutralize image coordinate transformation matrix
    param("a") = 1 / fWidth
    param("d") = 1 / fHeight
    param("b") = 0
    param("c") = 0
    param("e") = 0
    param("f") = 0

    Dim gr As PdfGraphics = objNewDoc.CreateGraphics(param)
    gr.Canvas.SetFillColor( 1, 1, 0 ) ' yellow
    gr.Canvas.FillRect( 0, 0, fWidth, fHeight )
    gr.Canvas.SetColor( 0, 0, 0 ) ' black
    gr.Canvas.LineWidth = 10 ' border
    gr.Canvas.DrawRect( 0, 0, fWidth, fHeight )

    Dim param2 As PdfParam = objPDF.CreateParam()
    param2("x") = (fWidth - objImage.Width) / 2
    param2("y") = (fHeight - objImage.Height) / 2
    gr.Canvas.DrawImage( objImage, param2 )

    ' Replace image with graphics
    objNewDoc.AddImageReplacement( rect.Text, gr )
  Next
Next

objNewDoc.AppendDocument( objOriginalDoc )
Dim strFilename As String = objNewDoc.Save(Server.MapPath("replacedimage_gr.pdf"), false)

