18.1 OpenUrl Method: Overview
18.2 Pagination
18.3 Hyperlinks
18.4 Miscellaneous Parameters
18.5 IE Compatibility Mode
18.1 OpenUrl Method: Overview
18.1.1 OpenUrl vs. ImportFromUrl
As of Version 2.9, AspPDF.NET expands its HTML-to-PDF functionality
by adding a new method, OpenUrl, to the PdfDocument object. Like the ImportFromUrl method
described in Chapter 15, OpenUrl helps convert an arbitrary URL, or an HTML string,
to a PDF document, but it does so differently.
While ImportFromUrl relies on our own in-house HTML-rendering engine, OpenUrl delegates the job to
Microsoft Internet Explorer by connecting to the IE WebBrowser object residing in the MSHTML.dll library.
Harnessing the robust IE engine helps achieve the perfect PDF snapshot of any HTML document, no matter how complex, whereas
the ImportFromUrl method is only capable of rendering basic HTML.
However, unlike ImportFromUrl, the OpenUrl method produces a rasterized image of the HTML document.
This method's output is essentially a static bitmap convertible to PDF. The scalability and searchability of the original
document are not preserved.
18.1.2 OpenUrl Usage
The PdfDocument.OpenUrl method expects four arguments: a URL or HTML string, a list of parameters, and a username and password.
The first argument is required, the others are optional. The method returns an instance of the PdfImage
object representing the image of the specified URL or HTML string. This image can then
be drawn on a PDF page via the PdfCanvas.DrawImage method in an arbitrary position and at an arbitrary scale.
By default, the entire web document is converted to a single image. Pagination is covered in the next section.
The following code sample converts our corporate web site http://www.persits.com to a PDF:
C# |
void Page_Load(Object Source, EventArgs E)
{
PdfManager objPdf = new PdfManager();
// Create empty document
PdfDocument objDoc = objPdf.CreateDocument();
// Add a new page
PdfPage objPage = objDoc.Pages.Add();
// Convert www.persits.com to image
PdfImage objImage = objDoc.OpenUrl("http://www.persits.com");
// Shrink as needed to fit width
float fScale = objPage.Width / objImage.Width;
// Align image top with page top
float fY = objPage.Height - objImage.Height * fScale;
// Draw image
PdfParam objParam = objPdf.CreateParam();
objParam["x"] = 0;
objParam["y"] = fY;
objParam["ScaleX"] = fScale;
objParam["ScaleY"] = fScale;
objPage.Canvas.DrawImage(objImage, objParam);
// Save document, the Save method returns generated file name
string strFilename = objDoc.Save(Server.MapPath("iehtmltopdf.pdf"), false);
lblResult.Text = "Success! Download your PDF file <A HREF=" + strFilename + ">here</A>";
}
|
VB.NET |
Sub Page_Load(Source As Object, E As EventArgs)
Dim objPdf As PdfManager = new PdfManager()
' Create empty document
Dim objDoc As PdfDocument = objPdf.CreateDocument()
' Add a new page
Dim objPage As PdfPage= objDoc.Pages.Add()
' Convert www.persits.com to image
Dim objImage As PdfImage= objDoc.OpenUrl("http://www.persits.com")
' Shrink as needed to fit width
Dim fScale As Single = objPage.Width / objImage.Width
' Align image top with page top
Dim fY As Single = objPage.Height - objImage.Height * fScale
' Draw image
Dim objParam As PdfParam = objPdf.CreateParam()
objParam("x") = 0
objParam("y") = fY
objParam("ScaleX") = fScale
objParam("ScaleY") = fScale
objPage.Canvas.DrawImage(objImage, objParam)
' Save document, the Save method returns generated file name
Dim strFilename As String = objDoc.Save(Server.MapPath("iehtmltopdf.pdf"), False)
lblResult.Text = "Success! Download your PDF file <A HREF=" + strFilename + ">here</A>"
End Sub
|
Click the links below to run this code sample:
http://localhost/asppdf.net/manual_18/18_simple.cs.aspx
http://localhost/asppdf.net/manual_18/18_simple.vb.aspx
18.1.3 Authentication
If the URL being opened is protected with Basic authentication, the valid username and password must be passed to the OpenUrl method
as the 3rd and 4th arguments, as follows:
PdfImage objImage = objDoc.OpenUrl( url, "", "username", "password" );
The Username and Password arguments can instead be used to pass an authentication cookie
in case both the script calling OpenUrl and the URL itself
are protected by the same user account under .NET Forms authentication.
To pass a cookie to OpenUrl, the cookie name prepended with the prefix "Cookie:" is passed
via the Username argument, and the cookie value via the Password argument.
The following example illustrates this technique.
Suppose you need to implement a "Click here for a PDF version of this page" feature
in a web application. The application is protected with .NET Forms Authentication:
<authentication mode="Forms">
<forms name="MyAuthForm" loginUrl="login.aspx" protection="All">
<credentials passwordFormat = "SHA1">
<user name="JSmith" password="13A23E365BFDBA30F788956BC2B8083ADB746CA3"/>
... other users
</credentials>
</forms>
</authentication>
The page that needs to be converted to PDF, say report.aspx, contains the button
"Download PDF version of this report" that invokes another script, say convert.aspx, which
calls OpenUrl. Both scripts reside in the same directory under the same protection.
If convert.aspx simply calls objDoc.OpenUrl( "http://localhost/dir/report.aspx", ... ),
the page that ends up being converted will be login.aspx and not report.aspx, because
AspPDF.NET itself has not been authenticated against the user database and naturally will be forwarded
to the login screen.
To solve this problem, we just need to pass the authentication cookie whose name is MyAuthForm
(the same as the form name) to OpenUrl. The following code (placed in convert.aspx) demonstrates this technique:
C# |
...
string strName = "Cookie:" + Request.Cookies["MyAuthForm"].Name;
string strValue = Request.Cookies["MyAuthForm"].Value;
// Convert URL to image
PdfImage objImage = objDoc.OpenUrl("http://localhost/dir/report.aspx", "", strName, strValue);
|
VB.NET |
...
Dim strName As String = "Cookie:" + Request.Cookies("MyAuthForm").Name
Dim strValue As String = Request.Cookies("MyAuthForm").Value
' Convert URL to image
Dim objImage As PdfImage = objDoc.OpenUrl("http://localhost/dir/report.aspx", "", strName, strValue)
|
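The snippet above assumes that the MyAuthForm cookie is present in the request. Request.Cookies returns null for a cookie that was not sent, so dereferencing it directly would throw. The following is a minimal sketch of a guard, not part of AspPDF.NET: the helper name ToOpenUrlCredentials and its string-array return value are illustrative assumptions.

```csharp
using System;

static class CookieAuthSketch
{
    // Hypothetical helper (not an AspPDF.NET API).
    // Returns { username, password } suitable for the 3rd and 4th arguments
    // of OpenUrl, or null when the cookie name or value is missing.
    public static string[] ToOpenUrlCredentials(string cookieName, string cookieValue)
    {
        if (string.IsNullOrEmpty(cookieName) || cookieValue == null)
            return null;

        // The "Cookie:" prefix tells OpenUrl to treat the pair as a cookie
        // rather than as a Basic authentication username and password.
        return new string[] { "Cookie:" + cookieName, cookieValue };
    }
}
```

In convert.aspx, the name and value would come from Request.Cookies["MyAuthForm"] after checking that the cookie object itself is not null. A null result from the helper means the request carries no authentication cookie to forward.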
18.1.4 Direct HTML Feed
The first argument to the OpenUrl method can be used to directly pass an HTML string instead of a URL.
The string must start with the characters "<HTML" or "<html"
to signal that the value is to be treated as an HTML text and not a URL.
The non-ASCII characters in the string must be in Unicode format, and not encoded in any way. For example:
string strText = "<html><table border><tr><th>AA</th><th>BB</th></tr><tr><td>X</td><td>Y</td></tr></table></html>";
strText = strText.Replace("AA", "Greek");
strText = strText.Replace("BB", "Chinese");
strText = strText.Replace('X', Convert.ToChar(0x03A9));
strText = strText.Replace('Y', Convert.ToChar(0x56FD));
PdfImage objImage = objDoc.OpenUrl( strText );
...
The script above produces the following output:
Note that in the direct HTML feed mode, there is no "base" URL by default, so if your HTML string contains images and other objects
pointed to via their relative paths, you must also provide the base URL information via the <base> tag, as follows:
string strText = @"<html><base href=""c:\images\"">
...
<img src=""logo.png"">
...
</html>";
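Because OpenUrl decides between URL mode and direct HTML mode by looking at the first characters of the string, it can be useful to check a string before passing it. The sketch below mirrors only the two prefixes documented above ("<HTML" and "<html"). How OpenUrl treats other casings, leading whitespace or a DOCTYPE declaration is not stated here, so the helper conservatively reports those as not matching. The helper name is an illustrative assumption.

```csharp
using System;

static class DirectHtmlSketch
{
    // Hypothetical helper (not an AspPDF.NET API).
    // Returns true if the string starts with one of the two documented
    // prefixes that signal direct HTML feed mode.
    public static bool LooksLikeDirectHtml(string input)
    {
        if (input == null)
            return false;

        return input.StartsWith("<HTML", StringComparison.Ordinal)
            || input.StartsWith("<html", StringComparison.Ordinal);
    }
}
```

A string that fails this check, such as one that begins with whitespace or with <!DOCTYPE html>, does not start with a documented prefix and may be interpreted as a URL.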
18.2 Pagination
18.2.1 PageHeight & AspectRatio Parameters
As mentioned above, the OpenUrl method returns the snapshot of the HTML document as a single continuous image by default.
For long HTML documents spanning multiple pages, this default behavior may not be practical as multiple images representing the individual pages
of the document are needed instead.
The OpenUrl method is capable of splitting the HTML document's snapshot image into multiple pages. When used in the pagination mode,
OpenUrl generates a linked list of images. The method returns an instance of the PdfImage object which represents
the top page of the document. The subsequent images are obtained via the PdfImage.NextImage property.
This property returns the next PdfImage object in the sequence
or null if the current image is the last one in the linked list.
The pixel height of each individual page image can either be specified directly, via the PageHeight parameter,
or be computed based on the current document's page width and the desired aspect ratio specified via the AspectRatio parameter.
For example, the line
PdfImage objImage = objDoc.OpenUrl( url, "PageHeight=792" );
makes all page images (except possibly the last one) 792 pixels high.
Since the width of the document image wholly depends on the underlying HTML code and is not always known in advance,
it is often more practical to specify the page height indirectly, via an aspect ratio that matches the aspect ratio of the PDF page on which
this image is ultimately to be drawn.
For example, the line
PdfImage objImage = objDoc.OpenUrl( url, "AspectRatio=0.7727" );
makes the aspect ratio of all the page images (except possibly the last one) the same as that of the standard US Letter page (which is 8.5"/11" = 0.7727.)
The image height is computed automatically by dividing the document width, whatever it happens to be, by the specified aspect ratio value.
When drawing the images on the PDF pages, a scaling factor has to be applied to make the image occupy the entire area of the page.
The PageHeight and AspectRatio parameters are mutually exclusive. If both are specified, PageHeight is ignored.
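The arithmetic behind the AspectRatio parameter can be sketched as follows. This is a worked illustration of the formulas described above, not AspPDF.NET code: the class and method names are assumptions, and the rounding OpenUrl applies internally is not documented here.

```csharp
using System;

static class PaginationMath
{
    // Aspect ratio in the sense used by the AspectRatio parameter:
    // page width divided by page height.
    public static double AspectRatio(double pageWidth, double pageHeight)
    {
        return pageWidth / pageHeight;
    }

    // Height of each page image: document width divided by the aspect ratio.
    public static double PageImageHeight(double documentWidth, double aspectRatio)
    {
        return documentWidth / aspectRatio;
    }

    // Number of page images needed to cover the whole document,
    // ignoring hemming and colored page breaks.
    public static int PageCount(double documentHeight, double pageImageHeight)
    {
        return (int)Math.Ceiling(documentHeight / pageImageHeight);
    }
}
```

For a US Letter page, AspectRatio(8.5, 11) is approximately 0.7727. If the HTML document happens to render 1024 pixels wide, each page image is about 1325 pixels high, and a document 5000 pixels tall is split into 4 page images, the last one shorter than the rest.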
The following code sample converts the URL http://support.persits.com/default.asp?displayall=1 to a multi-page PDF
with pagination based on the US Letter aspect ratio:
C# |
void Page_Load(Object Source, EventArgs E)
{
PdfManager objPdf = new PdfManager();
// Create empty document
PdfDocument objDoc = objPdf.CreateDocument();
PdfParam objParam = objPdf.CreateParam();
// Convert URL to image
PdfImage objImage = objDoc.OpenUrl("http://support.persits.com/default.asp?displayall=1",
"AspectRatio=0.7727");
// Iterate through all images
while( objImage != null )
{
// Add a new page
PdfPage objPage = objDoc.Pages.Add();
// Compute scale based on image width and page width
float fScale = objPage.Width / objImage.Width;
// Draw image
objParam["x"] = 0;
objParam["y"] = objPage.Height - objImage.Height * fScale;
objParam["ScaleX"] = fScale;
objParam["ScaleY"] = fScale;
objPage.Canvas.DrawImage( objImage, objParam );
// Go to next image
objImage = objImage.NextImage;
}
// Save document, the Save method returns generated file name
string strFilename = objDoc.Save( Server.MapPath("pages.pdf"), false );
lblResult.Text = "Success! Download your PDF file <A HREF=" + strFilename + ">here</A>";
}
|
VB.NET |
Sub Page_Load(Source As Object, E As EventArgs)
Dim objPdf As PdfManager = new PdfManager()
' Create empty document
Dim objDoc As PdfDocument = objPdf.CreateDocument()
Dim objParam As PdfParam = objPdf.CreateParam()
' Convert URL to image
Dim objImage As PdfImage = objDoc.OpenUrl("http://support.persits.com/default.asp?displayall=1", _
"AspectRatio=0.7727")
' Iterate through all images
While Not objImage Is Nothing
' Add a new page
Dim objPage As PdfPage = objDoc.Pages.Add()
' Compute scale based on image width and page width
Dim fScale As Single = objPage.Width / objImage.Width
' Draw image
objParam("x") = 0
objParam("y") = objPage.Height - objImage.Height * fScale
objParam("ScaleX") = fScale
objParam("ScaleY") = fScale
objPage.Canvas.DrawImage( objImage, objParam )
' Go to next image
objImage = objImage.NextImage
End While
' Save document, the Save method returns generated file name
Dim strFilename As String = objDoc.Save( Server.MapPath("pages.pdf"), False )
lblResult.Text = "Success! Download your PDF file <A HREF=" + strFilename + ">here</A>"
End Sub
|
Click the links below to run this code sample:
http://localhost/asppdf.net/manual_18/18_pages.cs.aspx
http://localhost/asppdf.net/manual_18/18_pages.vb.aspx
18.2.2 Hemming
The code sample in the previous subsection produces a paginated PDF document in which the page delimiters often fall on critical content
such as text or images, as shown below:
For cleaner cutting, the OpenUrl method can be instructed to push the bottom edge of each page upwards until it meets a relatively blank row of pixels.
For lack of a better term, we dubbed this process "hemming", a word used by tailors. Hemming reduces the height of some or all page images somewhat.
By default, OpenUrl performs no hemming. If the Hem parameter is specified and set to a non-zero value,
OpenUrl scans the specified number of pixel rows of each page, starting with the bottom row, looking for a
row with the fewest pixels deviating from the background color.
Once this row is found, it is used as the new page delimiter row, and the next page begins with the row directly below it.
If Hem is set to a negative number such as -1, the entire page image is scanned in search of a suitable row.
The background color against which the pixels are compared is specified via the HemColor parameter and is usually white.
If this parameter is omitted, the predominant color for each row is computed and used as the base color instead.
The image below demonstrates the improvement in pagination when the code sample above is modified by adding the Hem and HemColor parameters,
as follows:
PdfImage objImage = objDoc.OpenUrl( "http://support.persits.com/default.asp?displayall=1", "AspectRatio=0.7727; Hem=40; HemColor=white");
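To make the hemming rule concrete, the row search described above can be sketched on a plain pixel array. This is an illustration of the described behavior, not the actual AspPDF.NET implementation. The representation of pixels as integers, the names, and the tie-breaking rule (the bottom-most candidate wins) are all assumptions. The sketch covers only the case where a background color is given. The per-row predominant color used when HemColor is omitted is not modeled.

```csharp
using System;

static class HemSketch
{
    // Returns the index (0 = top row) of the row that would become the new
    // bottom edge of the page image.
    //   pixels     - page image as [row, column] color values
    //   hem        - number of rows to scan from the bottom; negative = all rows
    //   background - color against which pixels are compared (HemColor)
    public static int FindHemRow(int[,] pixels, int hem, int background)
    {
        int rows = pixels.GetLength(0);
        int cols = pixels.GetLength(1);

        // Hem = 0 means no hemming: the page keeps its original bottom row.
        if (rows == 0 || hem == 0)
            return rows - 1;

        int scan = (hem < 0 || hem > rows) ? rows : hem;
        int bestRow = rows - 1;
        int bestCount = int.MaxValue;

        // Scan upwards from the bottom row.
        for (int r = rows - 1; r >= rows - scan; r--)
        {
            int deviating = 0;
            for (int c = 0; c < cols; c++)
            {
                if (pixels[r, c] != background)
                    deviating++;
            }

            // Strict comparison keeps the bottom-most row among ties.
            if (deviating < bestCount)
            {
                bestCount = deviating;
                bestRow = r;
            }
        }
        return bestRow;
    }
}
```

With a larger Hem value, more rows are examined and a cleaner cut is more likely, at the cost of shorter pages.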
18.2.3 Colored Page Breaks
As of Version 3.0, the OpenUrl method is capable of splitting a document into pages along colored horizontal delimiters contained in the document.
The parameter PageBreakColor specifies the color of the delimiter.
For example, an HTML document may contain the following construct where the page break should be:
<div style="background-color: green; width: 100%; height: 1pt"></div>
This construct appears in the document as a thin green horizontal line. Setting the PageBreakColor parameter to green causes
OpenUrl to create a page break right before this green line, as follows:
PdfImage objImage = objDoc.OpenUrl( strUrl, "AspectRatio=0.7727; PageBreakColor=green");
When PageBreakColor is specified, OpenUrl scans each page image from the top down looking for a row of pixels of the specified color.
The parameter PageBreakThreshold specifies what percentage of pixels in a row must be of the specified color
for this row to be considered a page break line. By default, this value is 0.8, which corresponds to a threshold of 80%.
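The threshold test can be pictured as a simple per-row check. The sketch below is an illustration only, not the library's code. The integer pixel representation and the names are assumptions, and so is the choice of "greater than or equal" rather than "strictly greater than" at the boundary.

```csharp
using System;

static class PageBreakSketch
{
    // Returns true if the fraction of pixels in the row that match the
    // page break color reaches the threshold (0.8 = 80%).
    public static bool IsPageBreakRow(int[] row, int breakColor, double threshold)
    {
        if (row == null || row.Length == 0)
            return false;

        int matching = 0;
        foreach (int pixel in row)
        {
            if (pixel == breakColor)
                matching++;
        }
        return (double)matching / row.Length >= threshold;
    }
}
```

Under the default threshold, a row in which 9 out of 10 pixels are green qualifies as a page break line, while a row with only 7 green pixels out of 10 does not.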
18.3 Hyperlinks
The OpenUrl method is capable of preserving the hyperlinks in the HTML document being converted.
If the method is called with the parameter Hyperlinks set to True,
every image object it generates is populated with the collection of PdfRect objects,
each representing a hyperlink depicted on this image. It is your application's responsibility to draw those hyperlinks
(in the form of link annotations connected to URL actions) on the PDF pages along with the images themselves.
Annotations and actions are described in Chapter 10 - Interactive Features.
As of Version 2.9, the PdfImage object is equipped with the Hyperlinks
property which returns a collection of PdfRect objects.
The PdfRect object encapsulates the standard properties of a rectangle (Left, Bottom, Top, Right, Width, Height)
and also a string property, Text. The properties (Left, Bottom) and (Width, Height)
return the coordinates of the lower-left corner of the hyperlink
relative to the lower-left corner of the image to which this hyperlink belongs, and the hyperlink's dimensions, respectively.
The Text property returns the target URL of this hyperlink.
Note that the coordinates of the hyperlinks are provided in the coordinate space of the image (with its origin in the lower-left corner, as in standard
PDF practice.)
When the image is drawn on a PDF page at a certain location (as specified by the X and Y
parameters of the DrawImage method), the hyperlink annotations must be drawn with the same X and Y displacements.
Also, if scaling is applied to the image via the ScaleX and ScaleY parameters of the DrawImage method,
the same scaling must apply to the hyperlink coordinates and dimensions as well. Failure to adjust the hyperlink coordinates
properly will result in a misalignment between the depiction of the hyperlink on the page and the actual clickable hyperlink area.
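The adjustment amounts to scaling the rectangle and then shifting it by the image's position on the page. The following self-contained sketch shows the arithmetic with plain numbers. The class and method names are illustrative assumptions and do not belong to AspPDF.NET.

```csharp
using System;

static class HyperlinkMath
{
    // Converts a hyperlink rectangle from image space to page space.
    // Returns { x, y, width, height } in page coordinates.
    //   left, bottom, width, height - rectangle in image space
    //   scale                       - factor used for both ScaleX and ScaleY
    //   imageX, imageY              - position where the image is drawn
    public static float[] ToPageRect(float left, float bottom, float width, float height,
                                     float scale, float imageX, float imageY)
    {
        return new float[]
        {
            left * scale + imageX,    // scale, then shift horizontally
            bottom * scale + imageY,  // scale, then shift vertically
            width * scale,            // dimensions are scaled only
            height * scale
        };
    }
}
```

For example, suppose an image 1000 pixels high is drawn at a scale of 0.5 on a 792-point-high page with its top aligned to the top of the page, so that it is placed at X = 0, Y = 792 - 1000 * 0.5 = 292. A hyperlink at (Left = 100, Bottom = 200) measuring 300 by 40 then maps to a link annotation at x = 50, y = 392, with a width of 150 and a height of 20.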
The following code sample performs hemming (described in the previous section) as well as hyperlink rendering.
Clickable hyperlinks on the PDF pages are created with the help of the PdfAnnot and PdfAction objects.
Note that the same scaling is applied to both the image and the coordinates and dimensions of the link annotations.
In addition to that, the Y-coordinate shift applied to the image is also applied to the annotations (the X-coordinate
shift is 0 in our example.)
objHyperlinkParam["x"] = objRect.Left * fScale;
objHyperlinkParam["y"] = objRect.Bottom * fScale + objParam["y"];
objHyperlinkParam["width"] = objRect.Width * fScale;
objHyperlinkParam["height"] = objRect.Height * fScale;
C# |
void Page_Load(Object Source, EventArgs E)
{
PdfManager objPdf = new PdfManager();
// Create empty document
PdfDocument objDoc = objPdf.CreateDocument();
// Parameter object for image drawing
PdfParam objParam = objPdf.CreateParam();
// Parameter object for hyperlink annotation drawing
PdfParam objHyperlinkParam = objPdf.CreateParam();
objHyperlinkParam.Set( "Type = link" );
// Convert URL to image. Enable hyperlinks. Use hemming.
PdfImage objImage = objDoc.OpenUrl( "http://support.persits.com/default.asp?displayall=1",
"AspectRatio=0.7727; hyperlinks=true; hem=50; hemcolor=white" );
// Iterate through all images
while( objImage != null )
{
// Add a new page
PdfPage objPage = objDoc.Pages.Add();
// Compute scale based on image width and page width
float fScale = objPage.Width / objImage.Width;
// Draw image
objParam["x"] = 0;
objParam["y"] = objPage.Height - objImage.Height * fScale;
objParam["ScaleX"] = fScale;
objParam["ScaleY"] = fScale;
objPage.Canvas.DrawImage( objImage, objParam );
// Now draw hyperlinks from the Image.Hyperlinks collection
foreach( PdfRect objRect in objImage.Hyperlinks )
{
objHyperlinkParam["x"] = objRect.Left * fScale;
// Y-coordinate must be lifted by the same amount as the image itself
objHyperlinkParam["y"] = objRect.Bottom * fScale + objParam["y"];
objHyperlinkParam["width"] = objRect.Width * fScale;
objHyperlinkParam["height"] = objRect.Height * fScale;
objHyperlinkParam["border"] = 0;
// Create link annotation
PdfAnnot objAnnot = objPage.Annots.Add("", objHyperlinkParam);
objAnnot.SetAction( objDoc.CreateAction("type=URI", objRect.Text) );
}
// Go to next image
objImage = objImage.NextImage;
}
// Save document, the Save method returns generated file name
string strFilename = objDoc.Save( Server.MapPath("hyperlinks.pdf"), false );
lblResult.Text = "Success! Download your PDF file <A HREF=" + strFilename + ">here</A>";
}
|
VB.NET |
Sub Page_Load(ByVal Source As Object, ByVal E As EventArgs)
Dim objPdf As PdfManager = New PdfManager()
' Create empty document
Dim objDoc As PdfDocument = objPdf.CreateDocument()
' Parameter object for image drawing
Dim objParam As PdfParam = objPdf.CreateParam()
' Parameter object for hyperlink annotation drawing
Dim objHyperlinkParam As PdfParam = objPdf.CreateParam()
objHyperlinkParam.Set("Type = link")
' Convert URL to image. Enable hyperlinks. Use hemming.
Dim objImage As PdfImage = objDoc.OpenUrl("http://support.persits.com/default.asp?displayall=1", _
"AspectRatio=0.7727; hyperlinks=true; hem=50; hemcolor=white")
' Iterate through all images
While Not objImage Is Nothing
' Add a new page
Dim objPage As PdfPage = objDoc.Pages.Add()
' Compute scale based on image width and page width
Dim fScale As Single = objPage.Width / objImage.Width
' Draw image
objParam("x") = 0
objParam("y") = objPage.Height - objImage.Height * fScale
objParam("ScaleX") = fScale
objParam("ScaleY") = fScale
objPage.Canvas.DrawImage(objImage, objParam)
' Now draw hyperlinks from the Image.Hyperlinks collection
For Each objRect As PdfRect In objImage.Hyperlinks
objHyperlinkParam("x") = objRect.Left * fScale
' Y-coordinate must be lifted by the same amount as the image itself
objHyperlinkParam("y") = objRect.Bottom * fScale + objParam("y")
objHyperlinkParam("width") = objRect.Width * fScale
objHyperlinkParam("height") = objRect.Height * fScale
objHyperlinkParam("border") = 0
' Create link annotation
Dim objAnnot As PdfAnnot = objPage.Annots.Add("", objHyperlinkParam)
objAnnot.SetAction(objDoc.CreateAction("type=URI", objRect.Text))
Next
' Go to next image
objImage = objImage.NextImage
End While
' Save document, the Save method returns generated file name
Dim strFilename As String = objDoc.Save(Server.MapPath("hyperlinks.pdf"), False)
lblResult.Text = "Success! Download your PDF file <A HREF=" + strFilename + ">here</A>"
End Sub
|
Click the links below to run this code sample:
http://localhost/asppdf.net/manual_18/18_hyperlinks.cs.aspx
http://localhost/asppdf.net/manual_18/18_hyperlinks.vb.aspx
18.4 Miscellaneous Parameters
18.4.1 Allowed Content
By default, the OpenUrl method instructs Internet Explorer to only display images and videos when rendering the HTML document.
The method supports six parameters, all optional, that control the type of content IE is allowed to load. These parameters are:
- Images (True by default) - instructs IE to load images;
- Video (True by default) - instructs IE to load video;
- DownloadActiveX (False by default) - instructs IE to download ActiveX controls;
- RunActiveX (False by default) - instructs IE to run ActiveX controls;
- Java (False by default) - instructs IE to enable Java applets;
- Scripts (False by default) - instructs IE to enable scripts.
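These settings travel in the same semicolon-separated parameter list used elsewhere in this chapter (for example, "AspectRatio=0.7727; hyperlinks=true"). As a minimal sketch, the list can be assembled from name/value pairs as shown below. The helper is hypothetical and merely concatenates strings. It is not an AspPDF.NET API.

```csharp
using System;
using System.Collections.Generic;

static class OpenUrlParamSketch
{
    // Hypothetical helper: joins name/value pairs into a list of the form
    // "Name1=Value1; Name2=Value2".
    public static string BuildParamList(IEnumerable<KeyValuePair<string, string>> settings)
    {
        List<string> parts = new List<string>();
        foreach (KeyValuePair<string, string> setting in settings)
        {
            parts.Add(setting.Key + "=" + setting.Value);
        }
        return string.Join("; ", parts.ToArray());
    }
}
```

Enabling scripts and disabling video, for instance, yields the string "Scripts=true; Video=false", which would then be passed as the second argument of OpenUrl.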
18.4.2 Internal Window Dimensions
OpenUrl creates an internal window to hold the WebBrowser control. The default dimensions of this window are 100x100.
In most cases, the window is resized automatically to accommodate the HTML document. However,
if the specified URL is a frameset, the window may retain its original size which is usually not large enough for
the HTML content to fit. The parameters WindowWidth and WindowHeight enable you to
specify a window size large enough to accommodate your frameset.
18.5 IE Compatibility Mode
In most cases, the IE rendering engine runs under a "compatibility mode" by default.
Sometimes this causes serious rendering issues -- the output generated by the OpenUrl
method looks considerably different from what is displayed by a browser.
To switch the IE rendering engine to the "regular" mode, the following simple change in the registry is needed:
Under the key
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION
and, on a 64-bit server, under the key
HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION
the following DWORD entry must be added for IIS:
Name=w3wp.exe, Value=9000 (decimal).
IIS has to be reset (iisreset command at the command prompt) for the change to take effect.
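For reference, the registry change described above can also be expressed as a .reg file, sketched below. The decimal value 9000 is written in hexadecimal as 00002328. The second key applies to 64-bit servers only. This fragment is an illustration that has not been tested against any particular server configuration, so back up the registry before importing it.

```reg
Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION]
"w3wp.exe"=dword:00002328

[HKEY_LOCAL_MACHINE\SOFTWARE\Wow6432Node\Microsoft\Internet Explorer\MAIN\FeatureControl\FEATURE_BROWSER_EMULATION]
"w3wp.exe"=dword:00002328
```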