PDF Export Application

Last modified by Marius Dumitru Florea on 2024/12/19 09:49

Adds support for exporting wiki pages to PDF on the client-side using the web browser.
Type: XAR
Category: Application
Developed by: XWiki Development Team
License: GNU Lesser General Public License 2.1
Bundled with: XWiki Standard (14.10+)
Compatibility: XWiki 14.2+

Installable with the Extension Manager

Description

Uses paged.js along with the CSS Paged Media Module and the CSS Generated Content for Paged Media Module to export wiki pages to PDF using the browser's print-to-PDF feature.

This application provides:

  • a "PDF" export format on the Export Modal, replacing by default the old PDF export based on Apache Formatting Objects Processor (FOP)
  • an improved PDF Export Options modal that allows the user to select the PDF template
  • a default PDF template for basic needs (including support for multi-page export)
  • a template provider to help create new PDF templates
  • an administration section to configure various things, such as the list of PDF templates the end user can select from or the tool used to generate the PDF
  • a PDF export job that renders the selected XWiki pages on the server-side in a background (daemon) thread
  • components to print web pages to PDF on the server-side using a headless Chrome web browser running inside a Docker container

History

Originally the XWiki PDF Export feature was developed to work on the server side. However, as XWiki's development progressed, more and more features got implemented in JavaScript, and the server-side PDF export cannot capture changes made to the HTML DOM by JavaScript (that would require a JavaScript engine running on the server side, and it's not easy to integrate one that executes any JavaScript framework properly). Thus, we decided to rewrite the PDF export feature, and this extension is the result. The old server-side PDF export is now deprecated.

How it works

The Front-end

  • The user opens the "Export" modal using the "More Actions > Export" page menu and then selects "PDF" from the list of export formats.
  • If the current page is a nested page (i.e. it can have child pages) then the user gets the "Export Tree Modal" where they can select the pages to export. Otherwise they go directly to the "PDF Export Options" modal.
  • The user chooses the PDF export options and then clicks on the "Export" button.
  • The JavaScript click listener on the "Export" button makes an HTTP request to start the PDF export job on the back-end, passing the collected data (the list of pages to export, the PDF template, whether to generate the cover page and the table of contents, etc.); the HTTP response includes the id of the scheduled job.
  • The JavaScript code then makes subsequent HTTP requests to get the status of the PDF export job, passing the id received when the job was started, until the job ends (either successfully or with a failure).
  • XWiki 14.4.3+, 14.6+ The user can click on the "Cancel" button to cancel the running PDF export job; this sends an HTTP request to the back-end to stop the PDF export by setting the corresponding flag on the job status; the PDF export job won't stop immediately but as soon as it reads the cancel flag.
  • When the JavaScript code detects that the PDF export job finished (based on its status) it has two options:
    • if the job status specifies a PDF file, which is the case when the PDF is generated server-side, then it redirects the user to that file
    • otherwise it uses a hidden iframe to load the PDF template, passing the id of the finished PDF export job, waits for everything to load and be ready for print, and then calls window.print(), which opens the browser's print modal that the user can use to save the result as PDF
  • The PDF template uses the status of the PDF export job specified on the HTTP request to generate the HTML that is going to be printed to PDF
    • it uses paged.js to split the HTML content into print pages and to generate the PDF cover page, table of contents as well as the page header and footer
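The client-side flow above (start the job, then poll its status until it ends) can be sketched as follows. This is a minimal Python sketch, not the application's actual JavaScript code: `start_job` and `get_status` stand in for the two kinds of HTTP requests, and the status fields (`state`, `pdfFileURL`, `error`) as well as the `FINISHED` state name are hypothetical.

```python
import time
from typing import Callable, Optional


def wait_for_pdf_export(
    start_job: Callable[[dict], str],
    get_status: Callable[[str], dict],
    options: dict,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Start a PDF export job and poll its status until the job ends.

    Returns the PDF file URL when the PDF was generated server-side, or None when
    the caller has to load the print preview in a hidden iframe and print it.
    The status fields used here are hypothetical.
    """
    job_id = start_job(options)  # first HTTP request: its response carries the job id
    while True:
        status = get_status(job_id)  # subsequent HTTP requests, passing the job id
        if status["state"] == "FINISHED":
            if status.get("error"):
                raise RuntimeError(status["error"])
            return status.get("pdfFileURL")
        sleep(poll_interval)
```

Cancellation does not change this loop. The cancel request only sets a flag, and polling continues until the job notices the flag and ends.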

The Back-end

  • The PDF export job simply iterates over the list of wiki pages to export (XWiki 16.10.0+ the page order from the navigation tree is respected) and renders them to HTML, collecting the results without aggregating them (this is done later by the PDF template)
  • The rendering results are exposed on the job status (to be read by the PDF template) but they are accessible only by the user that triggered the export.
  • If the configuration says that the PDF should be generated server-side then the PDF export job uses a dedicated component to generate the PDF using a headless Chrome web browser and saves the PDF file as a temporary resource, exposing its reference on the job status.
    • The temporary resource name is a Java UUID, but the resource reference has a "fileName" parameter set by default to the title of the wiki page from where the PDF export is triggered; this "fileName" parameter appears also in the PDF temporary file URL, and is used as file name by default when downloading the PDF.
  • The PDF printer component is responsible for downloading the Docker image, creating the Docker container and connecting to the headless Chrome web browser running inside.
  • The PDF printer uses a separate browser context for each export, copying the cookies from the original request that triggered the PDF export so that the user is authenticated
  • The PDF printer tells Chrome to open the PDF template and waits for everything to be ready before calling the Chrome API to save the web page as PDF, returning the generated PDF file to the PDF export job
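The back-end steps can be modelled with a small Python sketch. `ExportJobStatus`, `run_export` and the `render` callback are hypothetical names, since XWiki's real job is written in Java. The sketch only shows three things: each page is rendered separately, the results are kept per page without being aggregated, and only the user who triggered the export can read them.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class ExportJobStatus:
    owner: str  # the user that triggered the export
    results: Dict[str, str] = field(default_factory=dict)  # page -> rendered HTML

    def get_results(self, user: str) -> Dict[str, str]:
        # Only the user that triggered the export may read the rendering results.
        if user != self.owner:
            raise PermissionError(f"{user} did not trigger this export")
        return self.results


def run_export(owner: str, pages: List[str], render: Callable[[str], str]) -> ExportJobStatus:
    status = ExportJobStatus(owner)
    for page in pages:  # pages are assumed to already be in navigation-tree order
        # Results stay separate; aggregating them is left to the PDF template.
        status.results[page] = render(page)
    return status
```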

Id Generator

The wiki pages included in the PDF export are rendered using a shared id generator in order to ensure that the generated identifiers (e.g. heading or image identifiers) are unique across the entire exported content. The consequence is that the generated id may be different when the page is rendered alone (e.g. in view mode) versus when it is rendered with other pages for PDF export. Suppose you have a wiki page with this heading:

= Description =

The generated id (that can be used to create a link to this section) is probably going to be HDescription when accessing the page in view mode, but it may be HDescription-1 when performing a multi-page export if one of the pages that appear earlier in the PDF also has a section named "Description". This means that if you have code that relies on the generated id, it might not work as expected when performing a multi-page export.
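A simplified model of the shared id generator shows why the same heading can get different ids. This Python sketch assumes that ids are built from an `H` prefix plus the heading text stripped of non-alphanumeric characters, and that `-1`, `-2` and so on are appended on collisions. `SharedIdGenerator` is a hypothetical class, and XWiki's real generator may differ in details.

```python
import re
from typing import Set


class SharedIdGenerator:
    """Simplified model of an id generator shared by all exported pages."""

    def __init__(self) -> None:
        self._used: Set[str] = set()

    def generate(self, prefix: str, text: str) -> str:
        # Keep only ASCII letters and digits, e.g. "Description" -> "HDescription".
        base = prefix + re.sub(r"[^A-Za-z0-9]", "", text)
        candidate, counter = base, 0
        # Append -1, -2, ... until the id is unique across everything rendered so far.
        while candidate in self._used:
            counter += 1
            candidate = f"{base}-{counter}"
        self._used.add(candidate)
        return candidate
```

Rendering a page alone uses a fresh generator, so its "Description" heading gets `HDescription`. In a multi-page export the same generator has already seen earlier pages, so the heading may get `HDescription-1`.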

XWiki 14.10.6+, 15.1+

Note that links that target document fragments are refactored automatically to use the global id (when possible), so you don't have to worry about this. Suppose the page you want to export has this content:

Check the [[description>>||anchor="HDescription"]] section below.

The link is going to be modified automatically to target HDescription-1 if the referenced section is not the first one with that name.
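The automatic link refactoring can be pictured as a lookup from (target page, local id) to the global id produced during the export. In this hypothetical Python sketch, `global_ids` is assumed to be collected while rendering. Links whose target can't be resolved keep their original anchor, which reflects the "when possible" caveat.

```python
from typing import Dict, Optional, Tuple


def refactor_anchor(
    current_page: str,
    target_page: Optional[str],
    anchor: str,
    global_ids: Dict[Tuple[str, str], str],
) -> str:
    """Return the anchor a link should use in the multi-page export."""
    # An empty page reference, as in [[label>>||anchor="..."]], targets the current page.
    page = target_page or current_page
    # Fall back on the original anchor when no global id is known for it.
    return global_ids.get((page, anchor), anchor)
```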

Firefox currently doesn't produce internal links when printing web pages to PDF. See XWIKI-20567. For this reason we recommend using Chrome for performing the PDF export, unless the generated PDF is meant only for printing on paper in which case it doesn't matter.

XWiki 14.10.6+, 15.1+

The links from the exported content can be split into internal and external links based on whether the link target is included or not in the PDF export. The following types of internal links are supported:

  • link to a fragment from the same page:
    [[label>>||anchor="fragmentId"]]
  • link to a fragment from another page that is included in the PDF export:
    [[label>>Other.Page||anchor="fragmentId"]]
  • link to a page included in the PDF export, using view mode and no query string:
    [[label>>Some.Page]]

Links that don't target the view mode or that include a query string are not considered internal, in other words, we consider that their target is not included in the PDF. Internal links are modified automatically before printing the content to PDF in order to ensure that clicking them while reading the PDF will make the PDF viewer scroll the target section into view, rather than opening the link in a web browser.
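The rules above can be condensed into a single predicate. This is a Python sketch with a hypothetical `Link` model (page reference, anchor, action, query string). It is not the application's actual link handling code.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class Link:
    page: Optional[str] = None  # None: the current page, as in [[label>>||anchor="x"]]
    anchor: Optional[str] = None
    action: str = "view"
    query_string: str = ""


def is_internal(link: Link, current_page: str, exported_pages: Set[str]) -> bool:
    """Return True if the link target is part of the PDF export."""
    # Links to another action (e.g. edit) or with a query string are treated as external.
    if link.action != "view" or link.query_string:
        return False
    target = link.page or current_page
    return target in exported_pages
```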

PDF Export Options

pdf-export-options.png

The "PDF Export Options" modal allows you to:

  • Select the PDF template to use. The list of PDF templates you can choose from is configured in the dedicated administration section.
  • Specify whether to generate or not:
    • the cover page
    • the table of contents
    • the header (on each printed page, except for cover page and table of contents)
    • the footer (on each printed page, except for cover page and table of contents)

Note that if any of the table of contents, header or footer options is checked then the Paged.js JavaScript library is used for the print layout, which in some edge cases (for very specific content) can lead to a timeout when performing the export.

PDF Templates

Default PDF Template

This application provides a default PDF template that supports:

  • cover page, showing the title, version, last author and modification date of the wiki page from where the export was triggered (the current wiki page)
  • table of contents, showing:
    • either the headings from the exported wiki page, up to level 3, when a single wiki page is included
    • or the aggregated headings (up to level 3) from all exported wiki pages, including the wiki page titles, for multi-page export

    In both cases the table of contents shows the print page number where each heading appears, and provides internal links to them.

  • page header, showing:
    • either the title of the wiki page from where the export was triggered (for single wiki page export)
    • or the title of the wiki page that provided the content from the current print page (for multi-page export)
  • page footer, showing the print page number and count
  • XWiki 14.10.17+, 15.5.3+, 15.8+ page metadata, which is additional information related to the exported wiki pages, that can be displayed in the PDF header or footer

When multiple wiki pages are exported, the content of each wiki page starts on a new print page in the generated PDF.

Grid system

XWiki's skin is based on Bootstrap, which provides a grid system that helps create responsive, mobile-first UIs. The problem is that this grid system relies on pixel units (in order to match standard screen sizes), so it doesn't work very well with print media (which has standard paper sizes that are not expressed in pixels). For this reason the default PDF template redefines Bootstrap's grid system to match the standard paper sizes used when printing:

  • extra small print: anything less than A4 portrait width
  • small print: A4 portrait up to A4 landscape
  • medium print: A4 landscape up to A3 landscape
  • large print: A3 landscape and up

This means that CSS styles that target, say, medium screens in view mode are mapped and applied automatically when performing a medium print (e.g. A4 landscape). The PDF export uses the A4 (portrait) paper size by default, which means the generated PDF should match what you see in view mode on a small screen / device.
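The mapping from paper width to Bootstrap size class can be expressed as a small function. The sketch below makes two assumptions. First, the boundaries are compared on page width in millimetres (A4 is 210 x 297 mm, A3 is 297 x 420 mm). Second, each boundary belongs to the larger class, so A4 landscape counts as medium print, as stated above. The returned names are Bootstrap's `xs`/`sm`/`md`/`lg` suffixes. The function itself is only illustrative, not the template's CSS.

```python
# Paper widths in millimetres (ISO 216).
A4_PORTRAIT_WIDTH_MM = 210
A4_LANDSCAPE_WIDTH_MM = 297
A3_LANDSCAPE_WIDTH_MM = 420


def print_size_class(page_width_mm: float) -> str:
    """Map a print page width to the Bootstrap size class it behaves like."""
    if page_width_mm < A4_PORTRAIT_WIDTH_MM:
        return "xs"  # extra small print
    if page_width_mm < A4_LANDSCAPE_WIDTH_MM:
        return "sm"  # small print, including A4 portrait (the default paper size)
    if page_width_mm < A3_LANDSCAPE_WIDTH_MM:
        return "md"  # medium print, e.g. A4 landscape
    return "lg"  # large print: A3 landscape and up
```

With the default A4 portrait paper, `print_size_class(210)` gives `sm`. This is why the PDF resembles view mode on a small screen.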

Custom PDF Templates

You can create your own PDF Template by creating a new page and then selecting the "PDF Template" template:

pdf-template-select.png

This leads you to the creation page for your custom template:

pdf-template-edit.png

Once saved, you can see how it looks:

pdf-template-view.png

And you can perform in-place editing of the properties:

pdf-template-inplace.png

You can also inject CSS or JavaScript using Skin Extension xobjects. If you edit your custom PDF Template in object mode you'll see a pre-filled SSX xobject:

pdf-template-ssx.png

Page Metadata

XWiki 14.10.17+, 15.5.3+, 15.8+

A common use case is to display additional information (metadata) about the exported wiki pages in the PDF header or footer. Let's see how we can, for instance, show the tags of the exported wiki pages in the PDF footer.

  1. The first step is to specify the metadata. Edit the Metadata field of your custom PDF template and use a script macro and the special metadata binding (which is a Map<String, String>) to set the metadata:
    {{velocity output="false"}}
    $metadata.putAll({
      'data-tags': $stringtool.join($doc.getTagList(), ', ')
    })
    {{/velocity}}

    As you can see, the output produced when rendering the Metadata field doesn't matter at all; only what you put in the metadata map does. Note also that you can use the doc binding to access the current XWiki document, because the Metadata field is evaluated in the context of each of the wiki pages (XWiki documents) included in the PDF export. The specified metadata ends up in the (print preview) HTML as attributes of the heading (H1) used to display the title of the wiki page. Something like this:

    <h1 id="Hxwiki:Some.Page" class="wikigeneratedid" data-tags="science, technology"
       data-xwiki-document-reference="xwiki:Some.Page">
      <span>Page Title</span>
    </h1>

    For this reason it is advisable to use the data- prefix for the metadata keys (unless you want to set a known attribute of the title heading).

  2. The second step is to specify where to display the metadata. You can, for instance, edit the Footer field of your custom PDF template and put:
    {{html clean="false"}}
    <div>Tags: <span class="pdf-chapter-tags"></span></div>
    <div>
      <span class="pdf-page-number"></span> / <span class="pdf-page-count"></span>
    </div>
    {{/html}}

    Note that we didn't put the actual metadata value in the Footer field. We'll use CSS to show the metadata where the pdf-chapter-tags placeholder is. It's important to understand that the Footer field is rendered only once for the entire PDF export (because it serves as a template for the PDF footer), while the Metadata field is evaluated once for each wiki page included in the PDF export. The metadata values are then injected into the PDF footer, depending on the current wiki page (i.e. the wiki page that provided the content of the current print page).

  3. The third and last step is to use CSS to inject the metadata value into the PDF footer (or header). Edit the Style Sheet Extension object of your custom PDF template and do something like this:
    h1[data-xwiki-document-reference] {
     /* Define a string "variable" with the value of the metadata. */
     /* See https://www.w3.org/TR/css-gcpm-3/#setting-named-strings-the-string-set-pro */
     string-set: chapter-tags attr(data-tags);
    }
    .pdf-chapter-tags:before {
     /* Display the value of the previously defined variable in the PDF footer. */
     /* See https://www.w3.org/TR/css-gcpm-3/#using-named-strings */
     content: string(chapter-tags);
    }
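A model of the CSS named strings used above shows which tags end up in each footer. `string(chapter-tags)` defaults to the `first` keyword: it takes the first assignment on the print page or, if the page has none, the entry value (the last assignment on an earlier page). This Python sketch models only that rule. `footer_values` is a hypothetical helper and not part of paged.js.

```python
from typing import List


def footer_values(assignments_per_page: List[List[str]]) -> List[str]:
    """Simulate string(chapter-tags) for each print page.

    assignments_per_page[i] lists the data-tags values of the page title headings
    (h1[data-xwiki-document-reference]) that appear on print page i, in order.
    """
    values = []
    entry_value = ""  # a named string is empty until its first assignment
    for assignments in assignments_per_page:
        # "first": first assignment on this page, else the value carried over
        values.append(assignments[0] if assignments else entry_value)
        if assignments:
            entry_value = assignments[-1]  # carried over to the following pages
    return values
```

Each exported wiki page starts on a new print page, so its title heading is normally the first element on that page. As a result, the footer shows the tags of the wiki page whose content fills that print page.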

Save your custom PDF template and then you can use it both with single page and multi-page exports.

Skip or Reset Numbered Headings

XWiki 14.10.17+, 15.5.3+, 15.8+

When performing a multi-page export that includes pages for which Numbered Headings is enabled, the page title headings are also numbered by default (with the default PDF template), because they are part of the content. You can use the page metadata to skip the numbering of the page title headings and/or to reset the heading numbering whenever a page title heading is encountered:

{{velocity output="false"}}
$metadata.putAll({
  'data-xwiki-rendering-protected': 'true',
  'data-numbered-headings-start': '0',
  'style': '--numbered-headings-start: 0'
})
{{/velocity}}

Administration Section

pdf-export-adminSection.png

The administration section allows you to:

  • XWiki 14.9+ check the state of the PDF generator
  • set the list of PDF templates that the users can select from on the PDF Export Options modal; note that leaving the list of templates empty effectively disables the browser-based PDF export; the same happens if the current user doesn't have view access to any of the configured PDF templates
  • XWiki 14.9+ select and configure the PDF generator
  • XWiki 14.9+ set the page ready timeout, that is the number of seconds to wait for the web page to be ready for print before aborting the PDF export
  • XWiki 14.10+ set the maximum content size, in kilobytes (KB), that can be included in a PDF export. XWiki 15.5.1+, 14.10.15+, 15.6+ This limit is taken into account only when exporting multiple wiki pages. The default value is 5000 KB (about 5 MB).
  • XWiki 14.10+ set whether to replace the old PDF export based on Apache Formatting Objects Processor (FOP) or not; the label displayed on the Export Modal depends on this setting: "PDF" vs. "PDF (Web)"
  • XWiki 14.10.3+ disable the maximum content size by setting its value to 0
  • XWiki 14.10.15+, 15.5.2+, 15.7+ specify the base URI that the remote (headless) Chrome web browser uses to access XWiki (i.e. the print preview page); the scheme (e.g. HTTP versus HTTPS) and the port number (e.g. 8080) are optional (they fall back on the values from the URL used to trigger the export), but the host (domain name or IP address) is mandatory; this configuration replaces the old "XWiki Host" configuration (export.pdf.xwikiHost).
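The fallback rules for the base URI can be sketched with Python's `urllib.parse`. The function name is hypothetical. The sketch computes only the scheme, host and port, and does not cover how XWiki appends the print preview path.

```python
from urllib.parse import urlsplit


def resolve_chrome_base_uri(configured_uri: str, trigger_url: str) -> str:
    """Combine the configured base URI with the URL used to trigger the export."""
    # A bare host such as "host.xwiki.internal" would be parsed as a path, so make it
    # network-path relative ("//host") first.
    if "//" not in configured_uri:
        configured_uri = "//" + configured_uri
    configured = urlsplit(configured_uri)
    trigger = urlsplit(trigger_url)
    if not configured.hostname:
        raise ValueError("The host (domain name or IP address) is mandatory")
    # Scheme and port are optional: fall back on the ones from the trigger URL.
    scheme = configured.scheme or trigger.scheme
    port = configured.port or trigger.port
    return f"{scheme}://{configured.hostname}" + (f":{port}" if port else "")
```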

PDF Generator

There are multiple ways in which the PDF can be generated and the application provides configuration options in a dedicated administration section (but also in xwiki.properties) to choose what's best for you.

User Browser

The first option (XWiki 14.8+ and the one used by default) is to generate the PDF using the user's own web browser, on the client side. This has the advantage that it works out of the box because it doesn't depend on any external service (like Docker or a remote headless Chrome) but it has the downside that different users (with different web browsers or different versions of the same web browser) can get different results.

XWiki <14.8

For older versions of XWiki you can opt for client-side PDF generation using the available global configuration:

# [Since 14.4.3]
# [Since 14.6RC1]
# Use the user's browser to generate the PDF instead of a headless Chrome browser instance on the server-side (Docker).
export.pdf.serverSide=false

The PDF export job request also has a property to force the client-side generation for a custom export:

#set ($pdfExportJobRequest = $services.export.pdf.createRequest())
## Tell the PDF export job we want to generate the PDF on the client side.
#set ($discard = $pdfExportJobRequest.setServerSide(false))
## The PDF export job will only render the XWiki pages on the server side. Once the job is done you'll have to redirect
## the user to the print preview page with the job id in the query string.
#set ($pdfExportJob = $services.export.pdf.execute($pdfExportJobRequest))

Chrome Docker Container

The PDF can also be generated on the server-side using a headless Chrome web browser running inside a Docker container. The application takes care of:

  • pulling the right Docker image (when not found locally)
  • creating the container and starting it (if there's no existing container available)
  • stopping the container at the end when XWiki shuts down (if the container was created by XWiki)

The requirements for this are:

  • Docker 20.10+ must be installed on the machine running XWiki (the servlet engine) if XWiki is not itself inside a Docker container (see the following section). The reason is that in this case (XWiki running outside Docker, on the same machine as the Docker daemon) the Chrome browser running inside a Docker container needs to access the XWiki instance running on the Docker host. This is possible thanks to the host-gateway magic host name that was introduced in Docker 20.10 and which we use when creating the Chrome container like this: --add-host=host.xwiki.internal:host-gateway.
  • the OS user running XWiki (e.g. "tomcat") must be allowed to use Docker (e.g. on Linux this usually means adding the user to the "docker" group so that it has access to the Docker socket)
  • internet access to pull the Docker image

Docker out of Docker

If XWiki is also running inside a Docker container then:

  • you need to bind-mount the Docker socket so that XWiki can communicate with the Docker daemon in order to manage the headless Chrome container
  • you should create a Docker network, add the XWiki container to that network and configure XWiki to use it for the headless Chrome container so that they can communicate (XWiki needs to access the Chrome container for remote debugging and the Chrome container needs to be able to load XWiki pages)
    # Tell XWiki which Docker network to use to communicate with the headless Chrome container.
    export.pdf.dockerNetwork=xwiki-network
  • you have to specify in the XWiki configuration the base URI that the Chrome container can use to access XWiki (usually the network alias of the XWiki container or its IP address):
    # The base URI that the Chrome container uses to access XWiki.
    export.pdf.xwikiURI=//xwiki-container

Note that in this case you can use an older version of Docker because being in the same network means XWiki and Chrome can talk to each other based on their network aliases or IP addresses. We don't need to rely on the magic host-gateway provided by Docker 20.10+.

Reusable Docker Container

If for some reason the machine running XWiki doesn't have internet access but it has Docker installed then you have the option to (re)use an existing Docker container with the headless Chrome web browser:

# Specify the name of the Docker container to reuse.
export.pdf.chromeDockerContainerName=headless-chrome-pdf-printer

In this case you are responsible for creating the headless Chrome container using a proper image. XWiki will be responsible for starting and stopping the Chrome container as needed. The requirements for this are:

  • Docker must be installed on the machine running XWiki (the servlet engine). No specific version of Docker is needed (from the point of view of XWiki), but you need to make sure that the Chrome container you create (for XWiki to reuse) can access the XWiki instance (specified using the export.pdf.xwikiURI configuration). Be aware that if XWiki runs on the same host as the Docker daemon (rather than inside its own Docker container) then you probably need to:
    • either set export.pdf.xwikiURI=//host.docker.internal, if you are on Windows or macOS and have Docker 18.03+
    • or create the Chrome container with --add-host=host.xwiki.internal:host-gateway, if you are on Linux and have Docker 20.10+ (which supports the magic host-gateway)
  • the OS user running XWiki (e.g. "tomcat") must be allowed to use Docker (e.g. on Linux this usually means adding the user to the "docker" group so that it has access to the Docker socket)

If XWiki is also running inside a Docker container then check out the Docker out of Docker section above.

Remote Chrome

If you don't want to rely on Docker, or you don't want to give XWiki access to Docker for security reasons, but you still want to perform the PDF export on the server side then you also have the option to connect to a remote Chrome instance:

# Specify the Chrome host and port so that we can connect for remote debugging.
export.pdf.chromeHost=172.17.0.3
export.pdf.chromeRemoteDebuggingPort=9222
# Specify how the remote Chrome instance can access the XWiki instance in order to load XWiki pages (print preview).
export.pdf.xwikiURI=//172.17.0.2
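Before pointing XWiki at a remote Chrome instance, you may want to check that its remote debugging endpoint is reachable from the XWiki machine. The sketch below queries the `/json/version` HTTP endpoint exposed by Chrome's DevTools protocol. The helper names are hypothetical, and XWiki does not require this check.

```python
import json
from urllib.request import urlopen


def devtools_version_url(chrome_host: str, port: int = 9222) -> str:
    """Build the DevTools HTTP endpoint URL from chromeHost and chromeRemoteDebuggingPort."""
    return f"http://{chrome_host}:{port}/json/version"


def check_remote_chrome(chrome_host: str, port: int = 9222, timeout: float = 5.0) -> dict:
    """Fetch the browser version info; raises OSError (e.g. URLError) if Chrome can't be reached."""
    with urlopen(devtools_version_url(chrome_host, port), timeout=timeout) as response:
        return json.load(response)
```

For example, with the configuration above, `check_remote_chrome("172.17.0.3")["Browser"]` should report the Chrome version when the remote instance is reachable.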

Note that "remote" could also mean local if you use Docker containers like this:

  • run XWiki in a Docker container
  • run headless Chrome in a Docker container
  • put both containers in the same Docker network
  • configure chromeHost and xwikiURI (see above) either using the container IPs or their network aliases

Headless Chrome on Localhost

If you have Chrome installed on the server where XWiki is running then you can use it to perform the PDF export on the server side:

  1. First you need to run Chrome in headless mode like this:
    chrome --headless --remote-debugging-port=9222 --remote-allow-origins=http://localhost:9222

    You can use a different port number if you wish. Check the list of available Chrome command line switches if you want to tweak its behaviour.

    Starting with Chrome 112 there's a new and improved headless mode available that you can use like this:

    chrome --headless=new --remote-debugging-port=9222 --remote-allow-origins=http://localhost:9222

    Unfortunately the PDF export doesn't work with this mode currently because it fails to create an incognito tab:

    <throwable class="com.github.kklisura.cdt.services.exceptions.ChromeDevToolsInvocationException">
     <detailMessage>Failed to open new tab - no browser is open</detailMessage>
     <stackTrace>
       <trace>com.github.kklisura.cdt.services.impl.ChromeDevToolsServiceImpl.invoke(ChromeDevToolsServiceImpl.java:172)</trace>
       <trace>com.github.kklisura.cdt.services.invocation.CommandInvocationHandler.invoke(CommandInvocationHandler.java:87)</trace>
       <trace>com.sun.proxy.$Proxy174.createTarget(Unknown Source)</trace>
       <trace>org.xwiki.export.pdf.internal.chrome.ChromeManager.createIncognitoTab(ChromeManager.java:156)</trace>
       <trace>org.xwiki.export.pdf.browser.AbstractBrowserPDFPrinter.print(AbstractBrowserPDFPrinter.java:74)</trace>
       <trace>org.xwiki.export.pdf.browser.AbstractBrowserPDFPrinter.print(AbstractBrowserPDFPrinter.java:54)</trace>
       <trace>org.xwiki.export.pdf.internal.job.PDFExportJob.saveAsPDF(PDFExportJob.java:189)</trace>
       <trace>org.xwiki.export.pdf.internal.job.PDFExportJob.runInternal(PDFExportJob.java:121)</trace>
       <trace>org.xwiki.job.AbstractJob.runInContext(AbstractJob.java:246)</trace>
       <trace>org.xwiki.job.AbstractJob.run(AbstractJob.java:223)</trace>
       <trace>java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)</trace>
       <trace>java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)</trace>
       <trace>java.base/java.lang.Thread.run(Thread.java:829)</trace>
     </stackTrace>
    </throwable>

    We think this is a bug in the new headless mode. Note that we also tried with:

    chrome --headless=new --remote-debugging-port=9222 --remote-allow-origins=http://localhost:9222 --incognito --start-in-incognito

    without any luck.

  2. Go to Wiki Administration > Content > PDF Export section and:
    • Select "Remote Chrome" as Generator
    • Set Chrome Host to "localhost"
    • Set Chrome Remote Debugging Port to the port you used (e.g. 9222)
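    • Set XWiki Host to "localhost" (in XWiki 14.10.15+, 15.5.2+, 15.7+ this setting is replaced by the base URI used by Chrome to access XWiki)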
    • Save. Check that the Generator status is "Available"
  3. Perform a PDF export to test the configuration.

Configuration Options

The following configuration options can be set from xwiki.properties:

# [Since 14.4.3]
# [Since 14.6RC1]
# Whether the PDF export should be performed server-side, e.g. using a headless Chrome web browser running inside a
# Docker container, or client-side, using the user's web browser instead; defaults to client-side PDF generation
# starting with 14.8
export.pdf.serverSide=false

# The host running the headless Chrome web browser, specified either by its name or by its IP address. This allows you
# to use a remote Chrome instance, running on a separate machine, rather than a Chrome instance running in a Docker
# container on the same machine; defaults to empty value, meaning that by default the PDF export is done using the
# Chrome instance running in the specified Docker container.
export.pdf.chromeHost=

# The port number used for communicating with the headless Chrome web browser.
export.pdf.chromeRemoteDebuggingPort=9222

#-# [Since 14.10.15]
#-# [Since 15.5.2]
#-# [Since 15.7RC1]
#-# The base URI that the headless Chrome browser should use to access the XWiki instance (i.e. the print preview page).
#-# The host (domain or IP address) is mandatory but the scheme and port number are optional (they default to the scheme
#-# and port number used when triggering the PDF export). Defaults to "host.xwiki.internal" which means the host running
#-# the Docker daemon; if XWiki runs itself inside a Docker container then you should use the assigned network alias,
#-# provided both containers (XWiki and Chrome) are in the same Docker network.
#-#
#-# Note that this configuration replaces the old "export.pdf.xwikiHost" configuration which is currently still taken
#-# into account as a fallback in case this configuration is not set.
export.pdf.xwikiURI = host.xwiki.internal

# The Docker image used to create the Docker container running the headless Chrome web browser.
export.pdf.chromeDockerImage=zenika/alpine-chrome:latest

# The name of the Docker container running the headless Chrome web browser. This is especially useful when reusing an
# existing container.
export.pdf.chromeDockerContainerName=headless-chrome-pdf-printer

# The name or id of the Docker network to add the Chrome Docker container to; this is useful when XWiki itself runs
# inside a Docker container and you want to have the Chrome container in the same network in order for them to
# communicate. The default value "bridge" represents the default Docker network.
export.pdf.dockerNetwork=bridge

# [Since 14.9]
# The number of seconds to wait for the web page to be ready (for print) before timing out.
export.pdf.pageReadyTimeout=60

# [Since 14.10]
# The maximum content size, in kilobytes (KB), a user is allowed to export to PDF; in order to compute the content size
# we sum the size of the HTML rendering for each of the XWiki documents included in the export; the size of external
# resources, such as images, style sheets and JavaScript code, is not taken into account; 0 means no limit;
export.pdf.maxContentSize=5000

# [Since 14.10]
# The maximum number of PDF exports that can be executed in parallel (each PDF export needs a separate thread).
export.pdf.threadPoolSize=3

# [Since 14.10]
# Whether to replace or not the old PDF export based on Apache Formatting Objects Processor (FOP).
export.pdf.replaceFOP=true
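The size check controlled by export.pdf.maxContentSize can be sketched as follows. Two assumptions are not confirmed by this page: sizes are measured in UTF-8 bytes, and 1 KB is 1024 bytes. `check_content_size` is a hypothetical name. The sketch mirrors the rules above: only the HTML renderings are summed, 0 disables the limit, and (XWiki 15.5.1+, 14.10.15+, 15.6+) single-page exports are not checked.

```python
from typing import List


def check_content_size(rendered_html: List[str], max_content_size_kb: int) -> None:
    """Raise ValueError if a multi-page export exceeds the configured limit."""
    if max_content_size_kb == 0 or len(rendered_html) <= 1:
        return  # 0 means no limit; single-page exports are not limited
    # External resources (images, style sheets, JavaScript) are not counted.
    total_bytes = sum(len(html.encode("utf-8")) for html in rendered_html)
    if total_bytes > max_content_size_kb * 1024:
        raise ValueError(
            f"Content size {total_bytes} bytes exceeds {max_content_size_kb} KB"
        )
```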

Script Service

The application provides a script service that can be used to perform custom PDF exports:

## Create a PDF export job request based on the current servlet request.
#set ($pdfExportJobRequest = $services.export.pdf.createRequest())

## Customize the PDF export job request:
#set ($discard = $pdfExportJobRequest.setDocuments($documentReferenceList))
#set ($discard = $pdfExportJobRequest.setTemplate($templateDocumentReference))
#set ($discard = $pdfExportJobRequest.setWithCover(true))
#set ($discard = $pdfExportJobRequest.setWithToc(false))
#set ($discard = $pdfExportJobRequest.setWithHeader(true))
#set ($discard = $pdfExportJobRequest.setWithFooter(false))
#set ($discard = $pdfExportJobRequest.setWithTitle(true))
#set ($discard = $pdfExportJobRequest.setServerSide(true))
#set ($discard = $pdfExportJobRequest.setFileName('myCool.pdf'))

## Trigger the PDF export job and wait for it to finish.
#set ($pdfExportJob = $services.export.pdf.execute($pdfExportJobRequest))
#set ($discard = $pdfExportJob.join())

## Get the PDF file reference from the job status.
#set ($pdfExportJobStatus = $pdfExportJob.status)
#set ($pdfFileReference = $pdfExportJobStatus.getPDFFileReference())
#if ($services.resource.temporary.exists($pdfFileReference))
  #set ($pdfFileURL = $services.resource.temporary.getURL($pdfFileReference))

  ## Redirect the user to the generated PDF file.
  #set ($discard = $response.sendRedirect($pdfFileURL))
#end

Differences between view mode and PDF export

The PDF export is normally supposed to use the same styles and layout as in view mode, but it's not always possible and sometimes there are valid reasons for using different styles. Moreover, custom PDF templates can change the style or layout completely. Here are some known differences that you may notice:

  • Images that are generated from wiki syntax and have their width or height specified are by default resized on the server-side in view mode; this is disabled by default during PDF export because the generated PDF should look good even when zoomed, and zooming PDFs is way more common than zooming web pages. The downside is that the generated PDF can be large, if it includes large images.
  • Body background color is not preserved.
  • Box shadows are dropped, because they don't look good in PDFs.
  • Table content is hyphenated, in order to avoid overflowing the limited print page width.
  • Code blocks use a smaller font size (80%) and long code lines wrap.
  • Multi-column (e.g. table) layout is not perfectly preserved due to some limitations in the JavaScript library we use for splitting the content into print pages.
  • The limited width of a print page can trigger the responsive layout, as if you're viewing the wiki page on a smaller screen. See also the Grid System.
  • Each canvas found in the exported content is replaced by an image generated from that canvas, which means the PDF might not look perfect when zoomed.
  • Internal links (anchors) behave as external links in PDFs generated from Firefox (a Firefox limitation that should be fixed in the future).
  • The generated PDF doesn't preserve all the accessibility features available in view mode (e.g. image alternative text may be lost); this is a limitation of the web browser used to generate the PDF.
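The canvas limitation comes from replacing live drawings with raster snapshots. The following JavaScript sketch shows the general technique; it is not the extension's code, and `replaceCanvasesWithImages` is a hypothetical helper whose `doc` parameter only exists so it can be exercised outside a browser. The image keeps the canvas's pixel resolution, which is why it can blur when the PDF is zoomed.

```javascript
// Hypothetical helper (not the PDF export's implementation): replace every
// <canvas> under `root` with an <img> holding a PNG snapshot of the canvas.
// `doc` is the owning document (pass `document` in a browser).
function replaceCanvasesWithImages(root, doc) {
  // Collect the canvases up front, before we start swapping nodes.
  const canvases = Array.from(root.querySelectorAll('canvas'));
  for (const canvas of canvases) {
    const img = doc.createElement('img');
    // Raster snapshot at the canvas's own pixel resolution.
    img.src = canvas.toDataURL('image/png');
    // Keep the classes so CSS rules targeting the canvas still size the image.
    img.className = canvas.className;
    canvas.replaceWith(img);
  }
  return canvases.length;
}

// Usage in a browser:
// replaceCanvasesWithImages(document.getElementById('xwikicontent'), document);
```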

Adapt the content for PDF export

If you want some parts of the exported content to be excluded from the PDF export, the easiest way is to use CSS. Just mark the content that should be excluded with the hidden-print class:

(% class="hidden-print" %)
Some content that should not appear in PDF

If you want to go one step further, you can also create a wiki macro (named hiddenPrint in the example below) with this code:

(% class="hidden-print" %)(((
{{wikimacrocontent/}}
)))

that could be used like this:

{{hiddenPrint}}
Content that is excluded from PDF export.
{{/hiddenPrint}}

If you want to replace some parts of the content for the PDF export, the easiest way is to use server-side scripting:

{{velocity}}
#if ($xcontext.action == 'export')
  Content for PDF export.
#else
  Content for standard page view.
#end
{{/velocity}}

The downside of this approach is that the content becomes harder to edit in WYSIWYG edit mode (you're forced to edit the source wiki syntax). Alternatively you can rely on CSS:

  • first you need to mark the content that should be replaced and the replacement:
    (% class="hidden-print" %)
    Content that should be replaced in PDF export.

    (% class="hidden-view" %)
    Replaces the previous content in PDF export.
  • then you need to add CSS to your skin or color theme, or use a global style sheet extension:
    @media screen {
      body:not([contenteditable]) #xwikicontent:not([contenteditable]) .hidden-view {
        display: none;
      }
    }

Note that the CSS makes sure the replacement is visible when editing in WYSIWYG edit mode, otherwise it becomes hard to update the replacement content. At the same time this can create confusion because the replacement content "disappears" on save and view.

If you have a custom PDF template and want to adapt the content only when that specific template is used, you can rely on server-side scripting:

{{velocity}}
#set ($expectedTemplateReference = $services.model.resolveDocument('Some.CustomPDFTemplate'))
#set ($actualTemplateReference = $services.job.getCurrentJobStatus(['export', 'pdf']).request.template)
#if ($expectedTemplateReference.equals($actualTemplateReference))
  Using my custom PDF template.
#else
  Either no PDF export or using another PDF template.
#end
{{/velocity}}

Finally, you can also modify the content before the PDF is generated, using JavaScript:

require(['xwiki-page-ready'], function(pageReady) {
  if (window.XWiki?.contextaction === 'export') {
    // Make synchronous changes to the content before PDF export.
    document.querySelectorAll('#xwikicontent p').forEach(paragraph => {
      paragraph.prepend('\u00B6 ');
    });

    // You can also delay the PDF export until some asynchronous action is done.
    pageReady.delayPageReady(new Promise((resolve, reject) => {
      // Not an asynchronous operation here, but you get the idea.
      document.querySelectorAll('#xwikicontent h1').forEach(heading => {
        heading.prepend('\u00A7 ');
      });
      // Don't forget to call resolve(), otherwise the PDF export times out.
      resolve();
    }), 'adjustments for PDF export');
  }
});

You can put the JavaScript code in a JSX that loads globally if you want those changes to be applied independently of the selected PDF template. Alternatively, you can add the JSX to a specific PDF template and mark it as loaded on demand; the PDF export will take care of loading it.
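When the work you delay is genuinely asynchronous (e.g. fetching data for a chart), a promise that never settles keeps the PDF export waiting until it times out. One way to guard against that, sketched below under the assumption that exporting with incomplete content is acceptable, is to pass `delayPageReady` a promise that always resolves within a deadline. `settleWithin`, `loadChartData` and the 10-second limit are hypothetical, not part of the PDF export API.

```javascript
// Hypothetical helper: resolve with the outcome of `promise`, or with
// { timedOut: true } if it hasn't settled after `ms` milliseconds.
// It never rejects: a failure in the delayed work is reported as { error }.
function settleWithin(promise, ms) {
  let timer;
  const deadline = new Promise(resolve => {
    timer = setTimeout(() => resolve({ timedOut: true }), ms);
  });
  const work = Promise.resolve(promise).then(
    value => ({ timedOut: false, value }),
    error => ({ timedOut: false, error })
  );
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, deadline]).finally(() => clearTimeout(timer));
}

// Usage inside the require() callback above (loadChartData is hypothetical):
// pageReady.delayPageReady(settleWithin(loadChartData(), 10000), 'chart data');
```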

Performance

The time needed to generate the PDF export is influenced by multiple factors:

  • the hardware on which the PDF export runs and the amount of resources (CPU, memory) allocated to XWiki and the web browser generating the PDF
    • when using the user's own web browser to generate the PDF, half of the work is done server-side (to render the wiki pages to HTML) and the other half is done client-side (to load the HTML into a DOM document, generate the print preview and finally the PDF)
    • when using a headless Chrome the entire work is done server-side, but again split between XWiki (to generate the HTML) and the headless Chrome (to generate the PDF)
  • the content size (the size of the DOM document built from aggregating the HTML produced from all the exported wiki pages, i.e. the number of DOM tree nodes plus the text size)
  • the number of HTTP requests made by the exported content, e.g. a wiki page with lots of images or any asynchronously loaded data will be slower to export than a wiki page containing mostly text
  • the content structure; tables are much harder to lay out than paragraphs or lists, so a wiki page with many or long tables will take more time than a wiki page of the same size without tables
  • the CSS (style) complexity; a wiki page with little or no custom CSS should be faster to export
  • the JavaScript code executed on load; the PDF export waits for this JavaScript code to be executed; a page with less JavaScript code should be faster to export
  • the browser used to generate the PDF and its version
    • future browser updates could decrease or increase the time
    • Firefox is currently significantly slower than Chrome
    • using the user's own web browser is slower than a headless Chrome because it needs to display the print preview modal which takes additional time (this step is skipped when using a headless Chrome)
    • using a headless Chrome on the same host as XWiki is faster than using a headless Chrome on a different host (network delays)
  • the JavaScript library (paged.js) used to lay out the exported content and split it in print pages; future upgrades of this library should bring performance improvements, but regressions are also possible
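To get a rough idea of the content size factor, you can measure the exported content from the browser console. The sketch below walks `childNodes` under a root node and counts element nodes and text characters; `measureContent` is a hypothetical helper, and the numbers are only useful for comparing pages with each other, not as official thresholds.

```javascript
// Standard DOM nodeType values.
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Hypothetical helper: walk the subtree under `root` (iteratively, to avoid
// deep recursion) and count element nodes and the total text length.
function measureContent(root) {
  const stats = { elements: 0, textLength: 0 };
  const stack = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node.nodeType === ELEMENT_NODE) {
      stats.elements++;
    } else if (node.nodeType === TEXT_NODE) {
      stats.textLength += node.nodeValue.length;
    }
    // childNodes is a NodeList in the browser; Array.from also accepts arrays.
    for (const child of Array.from(node.childNodes || [])) {
      stack.push(child);
    }
  }
  return stats;
}

// In the browser console, on the page you plan to export:
// measureContent(document.getElementById('xwikicontent'));
```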

Troubleshooting

Failed PDF Export

If the PDF export fails then you should first check if the PDF export job starts:

  • open the Network tab from the browser's developer tools
  • reload the page you want to export as PDF
  • open the Export modal and choose "PDF", then select the pages to export and click "Export"
  • clear the request log from the Network tab
  • click on "Export" (from PDF Export Options modal) and check the HTTP request log

Normally, you should see:

  • a first request to /xwiki/bin/get/PageToExport/ that schedules the PDF export job and returns the job status as JSON:
    {"id":["export","pdf","1663658402005-493"],"state":"NONE","canceled":false,"progress":{"offset":0.0}}
    • if this request fails then it probably means that the PDF export job didn't start. Check the HTTP response (it might include a stack trace) and the request parameters (see if they look normal).
  • once the PDF export job is scheduled the front-end starts making HTTP requests to fetch the job status until the job finishes; thus you should see multiple requests like this:
    /xwiki/bin/get/PageToExport/?outputSyntax=plain&sheet=XWiki.PDFExport.WebHome&data=jobStatus&jobId=export%2Fpdf%2F1663658402005-493

    The response is the job status as JSON:

    {
     "id":["export","pdf","1663658402005-493"],
     "state":"FINISHED",
     "canceled":false,
     "progress":{"offset":1.0},
     "pdfFileURL":"/xwiki/tmp/export/document%3Axwiki%3APageToExport.WebHome/pdf/9888d576-2858-4209-af26-5e88d9a1ebab.pdf",
     "failed":false
    }

    If the job failed then you should see "failed": true in the JSON.

  • at the end, if the PDF export job is successful, you should see a request to the generated PDF file that looks like this:
    /xwiki/tmp/export/document%3Axwiki%3APageToExport.WebHome/pdf/9888d576-2858-4209-af26-5e88d9a1ebab.pdf
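The request sequence above can be reproduced from a script, which helps tell a failing job apart from a failing front-end. This is a sketch, not the extension's front-end code: the status URL and its query parameters are copied from the requests shown above, while `jobStatusURL`, `interpretJobStatus`, `waitForPDF` and the one-second polling interval are assumptions. It also assumes the job state becomes `FINISHED` whether the export succeeds or fails, with `failed` telling the two apart.

```javascript
// Build the job status URL for a page URL (e.g. '/xwiki/bin/get/PageToExport/')
// and a job id array (e.g. ['export', 'pdf', '1663658402005-493']).
// URLSearchParams percent-encodes the '/' separators as %2F.
function jobStatusURL(pageURL, jobId) {
  const params = new URLSearchParams({
    outputSyntax: 'plain',
    sheet: 'XWiki.PDFExport.WebHome',
    data: 'jobStatus',
    jobId: jobId.join('/'),
  });
  return `${pageURL}?${params}`;
}

// Classify a job status JSON object into a simple outcome.
function interpretJobStatus(status) {
  if (status.state !== 'FINISHED') {
    return { done: false };
  }
  if (status.failed || status.canceled) {
    return { done: true, ok: false };
  }
  return { done: true, ok: true, pdfFileURL: status.pdfFileURL };
}

// Poll the job status until the job finishes. `fetchImpl` defaults to the
// browser's fetch and can be replaced (e.g. for testing).
async function waitForPDF(pageURL, jobId, { intervalMs = 1000, fetchImpl = fetch } = {}) {
  for (;;) {
    const response = await fetchImpl(jobStatusURL(pageURL, jobId));
    if (!response.ok) {
      throw new Error(`Job status request failed: HTTP ${response.status}`);
    }
    const outcome = interpretJobStatus(await response.json());
    if (outcome.done) {
      return outcome;
    }
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
}
```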

If the PDF export job started but failed then you should check the job log that you can find in:

<permanentDirectory>/jobs/status/export/pdf/<timestamp>/log.xml

<permanentDirectory> is the configured permanent directory. For <timestamp>, either check the most recent log or take the timestamp from the front-end HTTP requests (the jobId parameter). Inside the log.xml file, look for a Java stack trace close to the end of the file.
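Locating the log can be scripted. The following Node.js sketch assumes, as in the path above, that the last element of the job id (e.g. `1663658402005-493`) is the name of the job folder, and that folder names start with a millisecond timestamp, so the highest one is the most recent. `jobLogPath`, `mostRecent` and `latestJobLogPath` are hypothetical helpers.

```javascript
const fs = require('fs');
const path = require('path');

// Build the expected log.xml path from the permanent directory and the jobId
// request parameter, either encoded ('export%2Fpdf%2F1663658402005-493') or
// decoded ('export/pdf/1663658402005-493'). Assumes the last job id element
// is the job folder name.
function jobLogPath(permanentDirectory, jobIdParam) {
  const elements = decodeURIComponent(jobIdParam).split('/');
  const folder = elements[elements.length - 1];
  return path.join(permanentDirectory, 'jobs', 'status', 'export', 'pdf', folder, 'log.xml');
}

// Return the folder name with the highest leading timestamp
// (parseInt stops at the '-'), or null for an empty list.
function mostRecent(folderNames) {
  let best = null;
  for (const name of folderNames) {
    if (best === null || Number.parseInt(name, 10) > Number.parseInt(best, 10)) {
      best = name;
    }
  }
  return best;
}

// Path of the most recent PDF export job log, or null if there is none.
function latestJobLogPath(permanentDirectory) {
  const pdfJobsDir = path.join(permanentDirectory, 'jobs', 'status', 'export', 'pdf');
  const folders = fs.readdirSync(pdfJobsDir, { withFileTypes: true })
    .filter(entry => entry.isDirectory())
    .map(entry => entry.name);
  const latest = mostRecent(folders);
  return latest === null ? null : path.join(pdfJobsDir, latest, 'log.xml');
}

// Example (the permanent directory depends on your installation):
// console.log(latestJobLogPath('/var/lib/xwiki/data'));
```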

If the job log doesn't include enough information to explain the problem, enable debug logs:

  • go to the Logging administration section
  • filter loggers by org.xwiki.export.pdf
  • set log level to debug for the first entry
  • perform the PDF export again and check the new job log; it should contain more detailed information

Unexpected PDF Output

If the generated PDF doesn't match your expectations in terms of layout, styling or content then the following steps may help you investigate the problem:

  • switch to the User Browser (PDF) generator from the PDF Export administration section (if you’re not using it already)
  • open the page to export
  • open the DOM inspector (using web browser's developer tools)
  • export the page to PDF
  • wait for the web browser's print preview modal to appear and then check the iframe element added at the end of the BODY element; open its source URL in a new browser tab (e.g. right-click and select to open it in a new tab)
  • inspect the print preview page to understand the bad layout / styling / content; if you modify your (custom) PDF template you can simply reload the print preview page to check the new output; if you modify the exported content (page) then you need to redo the steps (i.e. redo the export) because the exported content is cached
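If clicking through the DOM inspector is inconvenient, the same lookup can be done from the developer tools console. This sketch assumes, per the steps above, that the print preview is the last IFRAME that is a direct child of BODY; `findPrintPreviewURL` is a hypothetical helper.

```javascript
// Hypothetical helper: return the source URL of the last <iframe> that is a
// direct child of `body`, or null if there is none.
function findPrintPreviewURL(body) {
  const iframes = Array.from(body.children).filter(el => el.tagName === 'IFRAME');
  const last = iframes[iframes.length - 1];
  return last ? last.src : null;
}

// Usage in the console, once the print preview modal is shown:
// const url = findPrintPreviewURL(document.body);
// if (url) window.open(url, '_blank');
```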

If you're using a headless Chrome to perform the export (whether or not it runs in a Docker container) and you want to inspect the PDF output on that specific Chrome instance, then you need to connect to it from your own Chrome instance by following the Chrome remote debugging documentation. Once connected, you can do the previous steps on the remote Chrome, or you can directly load the print preview page that you got on your local Chrome instance in the remote (headless) Chrome instance.
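To find the print preview page among the pages open in the remote Chrome, you can query Chrome's DevTools HTTP endpoint `/json/list`, which returns the debuggable targets as JSON. The sketch below assumes Chrome was started with `--remote-debugging-port=9222` and that this port is reachable from your machine (with Docker, the port must be published); the helper names are hypothetical.

```javascript
// URL of the DevTools target list. Host and port are assumptions: use the
// values your headless Chrome was started with.
function devToolsListURL(host = 'localhost', port = 9222) {
  return `http://${host}:${port}/json/list`;
}

// Keep only page targets, with the fields useful for picking one to inspect.
function pageTargets(targets) {
  return targets
    .filter(target => target.type === 'page')
    .map(({ id, title, url }) => ({ id, title, url }));
}

// Fetch and filter the target list (a browser or Node.js 18+ provides fetch).
async function listRemotePages(host, port, fetchImpl = fetch) {
  const response = await fetchImpl(devToolsListURL(host, port));
  if (!response.ok) {
    throw new Error(`DevTools endpoint returned HTTP ${response.status}`);
  }
  return pageTargets(await response.json());
}

// Usage:
// listRemotePages('localhost', 9222).then(pages => console.table(pages));
```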

Prerequisites & Installation Instructions

We recommend using the Extension Manager to install this extension (Make sure that the text "Installable with the Extension Manager" is displayed at the top right location on this page to know if this extension can be installed with the Extension Manager). Note that installing Extensions when being offline is currently not supported and you'd need to use some complex manual method.

You can also use the following manual method, which is useful if this extension cannot be installed with the Extension Manager or if you're using an old version of XWiki that doesn't have the Extension Manager:

  1. Log in to the wiki with a user having Administration rights
  2. Go to the Administration page and select the Import category
  3. Follow the on-screen instructions to upload the downloaded XAR
  4. Click on the uploaded XAR and follow the instructions
  5. You'll also need to install all dependent Extensions that are not already installed in your wiki

See the different export modes that exist and that can be configured. You should also check the requirement matching the export mode you've chosen to use (or the default mode's requirement if you haven't changed any configuration).

Dependencies

Dependencies for this extension (org.xwiki.platform:xwiki-platform-export-pdf-ui 16.10.1):
