Manual
RealObjects GmbH
Version 11.4.0

PDFreactor is a registered trademark of RealObjects GmbH.

Installation

PDFreactor can be deployed in various ways:

When it is used as a Java library no further installation is required.

However, if the clients for PHP, .NET, Python, Ruby, Perl, JavaScript, Node.js, Java (the client, not the library) or the Python Command Line APIs are used, the PDFreactor Web Service is required.

For details about system requirements and information about the latest changes, please see the readme and changelog files contained within the PDFreactor installation package.

The PDFreactor Library

The PDFreactor package comes with two PDFreactor libraries:

It is generally recommended to use the pdfreactor.jar, since it not only contains PDFreactor itself but also all 3rd party libraries required by PDFreactor. This JAR (Java ARchive, a file container used for Java classes) file is a stand-alone PDFreactor library. No other libraries are required.

If some of the 3rd party libraries are already installed on the server or if certain functionality is not required, the pdfreactorcore.jar can be used. It only contains PDFreactor, while required and optional 3rd party libraries are contained in the required and optional directories, which should be added to the PDFreactor class path manually depending on whether or not they are already installed on the server or their functionality is desired.

Please refer to the README.txt in the PDFreactor/libs directory for more information about the 3rd party libraries.

The PDFreactor Web Service

If PDFreactor is deployed using the PDFreactor installer, the installation provides an option to automatically install the PDFreactor Web Service with PDFreactor. No further configuration is required in this case.

The PDFreactor service is run on the application server Jetty. It is a requirement for the .NET, PHP, Perl, Python, Ruby, Java, JavaScript, Node.js and Python Command Line clients.

By default, Jetty will listen at localhost:9423.

See the Web Service configuration chapter for information on how to modify this, and https://www.eclipse.org/jetty/ for further details about Jetty and ways to configure it.

On Unix and Linux platforms the separate installation of a Java VM is required. Furthermore the PDFreactor Web Service must be started manually. To do so, after extracting the archive or installing the RPM go to the bin subdirectory and use the following command to start the service:

./pdfreactorwebservice start

To stop the service, use:

./pdfreactorwebservice stop

To display whether the service is already running, use:

./pdfreactorwebservice status

PDFreactor Web Service Configuration on Windows

On Windows systems the PDFreactor Web Service is started with the Local Service account by default.

When the Web Service is started using this account, it can only access files from the local file system that the Local Service account is allowed to access. For example, files from the user's home directory cannot be read on most systems. The Web Service may or may not be able to read files from other locations on the disk depending on the system configuration. If you need the Web Service to be able to access a particular file or folder on the disk, add the Local Service user to the list of users that can access this file or folder, and enable read permissions for this user.

In production environments, you may wish to start the PDFreactor Web Service with its own distinct user account.

PDFreactor Web Service Configuration on Linux / Unix

If PDFreactor was installed using the RPM package, PDFreactor will automatically be registered as a systemd service if your system supports systemd, otherwise it will be registered as a "System V Init" script.

Installing PDFreactor through the RPM installer will create a system user called pdfreactor. The PDFreactor Web Service will be executed using this user by default.

Running PDFreactor on systems that support systemd

The PDFreactor Web Service systemd service will automatically be enabled and started by the RPM installer.

You can start, stop, restart or display the status of this service as with any other systemd service:

service pdfreactor start
service pdfreactor stop
service pdfreactor restart
service pdfreactor status

Running PDFreactor as a System V Init Service

The RPM installer will register PDFreactor as a "System V Init" service on systems that do not support systemd.

You can start, stop, restart or display the status of this service as with any other "System V Init" service:

/etc/init.d/pdfreactorwebservice start
/etc/init.d/pdfreactorwebservice stop
/etc/init.d/pdfreactorwebservice restart
/etc/init.d/pdfreactorwebservice status

Installing PDFreactor from a Tarball

PDFreactor is also available as a tarball for systems that do not support RPM, or for users who prefer deployment from a tarball. To start the PDFreactor Web Service after unpacking the tarball, please use the bin/pdfreactorwebservice script located in the PDFreactor deployment directory, e.g.:

<user.home>/PDFreactor/bin/pdfreactorwebservice start

When the PDFreactor Web Service is started in this way, it will be run with the permissions of the user that started it. User privileges can be configured in PDFreactor/jetty/start.d/user-privileges.ini.

PDFreactor Web Service Configuration on macOS

If the "Jetty Application Server" installation component is selected in the .DMG installer, the PDFreactor Web Service will be registered as a LaunchDaemon. This LaunchDaemon will be managed by the user _pdfreactor. This user is removed automatically when PDFreactor is uninstalled again. Note that if you need PDFreactor to have access to files in your file system, you need to make sure they can be read by the _pdfreactor user.

PHP Requirements

To use PDFreactor with the PHP API, a web server (e.g. Apache) with a PHP installation (version >4.3 or >5.0) is required.

The PDFreactor service must be running within Jetty on the same machine.

.NET Requirements

The PDFreactor .NET API requires the Microsoft .NET framework 4.0 including the latest patches.

The PDFreactor service must be running within Jetty on the same machine.

Additional Requirements for ASP.NET

The .NET framework 4.0 must be registered at your IIS (Internet Information Services, https://www.iis.net/) server.

Perl/Python/Ruby Requirements

The Perl/Python/Ruby API can be used via CGI (Common Gateway Interface, a protocol for calling external software via a web server, https://www.w3.org/CGI/) on your web server, or by the corresponding modules for the Apache web server (mod-python, mod-perl, mod-ruby).

The PDFreactor service must be running within Jetty on the same machine.

For specific installation requirements please have a look at the install.txt of the related client.

Integration

You can integrate PDFreactor by directly using it as a Java library, by using its .NET, PHP, Perl, Python, Ruby, JavaScript or Node.js API, or by running it on the command line.

Memory

Depending on the input documents, PDFreactor may require additional memory. Large and especially complex documents, e.g. documents containing several hundred pages or documents using a complex nested HTML structure, may require larger amounts of memory.

The exact amount of memory required depends almost entirely on the input document. Should you run into any issues converting a document, we recommend increasing the memory to e.g. 2GB or higher before attempting another conversion. The first signs of memory running short are unusually long conversion times and high CPU usage of multiple threads, even if only one document is being converted.

See the chapter on configuring the PDFreactor Web Service for how to increase the memory available to the PDFreactor Web Service.

The memory available to the PDFreactor Preview app is set to 1024m by default.

To increase the amount of memory available to the PDFreactor Preview app, you need to adapt the -Xmx1024m parameter in the file PDFreactor/bin/PDFreactor Preview.vmoptions.

To increase the memory to e.g. 2GB, change the parameter to -Xmx2048m or -Xmx2g and restart the PDFreactor Preview app.

Parallel Conversions

When doing multiple parallel PDF conversions, it is important to adapt the available memory to the number of parallel conversions.

Generally, a common document requires no more than 64MB of memory. To safely convert up to 16 of these documents in parallel, PDFreactor requires at least 1GB of memory (16 * 64MB). Keep in mind that this is merely a rule of thumb and that the amount of required memory may vary depending on the documents and integration environments.
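The rule of thumb above is a simple multiplication. The following sketch computes the suggested minimum heap size for a given number of parallel conversions and prints it next to the heap limit of the current Java VM. The class and method names are purely illustrative and not part of PDFreactor, and the 64MB figure is the estimate stated above, not a guarantee.

```java
final class ConversionMemoryEstimate {
    /** Rule-of-thumb memory per common document in MB, as stated above (an estimate). */
    static final long MB_PER_DOCUMENT = 64;

    /** Suggested minimum heap in MB for the given number of parallel conversions. */
    static long requiredHeapMb(int parallelConversions) {
        if (parallelConversions < 1) {
            throw new IllegalArgumentException("parallelConversions must be >= 1");
        }
        return parallelConversions * MB_PER_DOCUMENT;
    }

    public static void main(String[] args) {
        long neededMb = requiredHeapMb(16);
        // maxMemory() reports the heap limit of this VM (roughly the -Xmx value)
        long availableMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Suggested heap: " + neededMb + " MB, VM limit: " + availableMb + " MB");
    }
}
```

If the VM limit is lower than the suggested value, the -Xmx parameter should be raised accordingly.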

Using the Java library

With just a few lines you can create PDFs inside your applications and servlets.

The following sample program converts https://www.realobjects.com/ to PDF and saves it as output.pdf.

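import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

import com.realobjects.pdfreactor.PDFreactor;
import com.realobjects.pdfreactor.Configuration;
import com.realobjects.pdfreactor.Result;
// PDFreactorException must be imported as well; see the API documentation for its package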

public class FirstStepsWithPDFreactor {
    public static void main(String[] args) {
        PDFreactor pdfReactor = new PDFreactor();
        // configuration settings
        Configuration config = new Configuration();
        // the input document
        config.setDocument("https://www.realobjects.com");
        // conversion result
        Result result = null;

        try {
            // render the PDF document
            result = pdfReactor.convert(config);
            byte[] pdf = result.getDocument();
            
            try (OutputStream outputStream = new FileOutputStream("output.pdf")) {
                outputStream.write(pdf);
            } catch (IOException e) {
                e.printStackTrace();
            }
        } catch (PDFreactorException e) {
            // partial result without PDF
            result = e.getResult();
            e.printStackTrace();
        }
    }
}

See the API documentation for details.

Using PDFreactor in a Servlet

When used in a Servlet to generate a PDF that is returned to the client (e.g. a browser) PDFreactor can write directly to the ServletOutputStream:

ServletOutputStream out = response.getOutputStream();
response.setContentType("application/pdf");
pdfReactor.convert(config, out);
out.close();

Logging Handler

PDFreactor uses the Java Logging API to output information about its progress. A simple console logger can be created like this:

Logger pdfReactorLogger = Logger.getAnonymousLogger();
pdfReactorLogger.setLevel(Level.INFO);
pdfReactorLogger.addHandler(new DefaultHandler());
config.setLogger(pdfReactorLogger);

See https://docs.oracle.com/javase/8/docs/technotes/guides/logging/ for more information about the Java Logging API.

OSGi Support

PDFreactor provides support for OSGi out of the box. The Manifest of the self-contained variant of PDFreactor (pdfreactor.jar) includes all entries required to deploy it as a bundle in your OSGi environment. Only the self-contained version of PDFreactor is OSGi compatible. The non-self-contained variant of PDFreactor ("pdfreactorcore.jar" and associated libraries) does not contain appropriate Manifest entries.

Running PDFreactor Without Graphics Environment

If you are using PDFreactor on a system without a graphics environment like X11, you need to enable the headless mode of Java. This can be done by setting the appropriate Java system property. You can either set the property as a Java VM argument or you can set it inside your Java code. It is recommended to set it as early as possible, as changing it affects the entire Java VM instance. In any case it is important to set the property before PDFreactor is instantiated.

As a Java VM Argument

java -Djava.awt.headless=true

In Java Code

public class MyPDFreactorIntegration {
    // set the headless system property
    static {
        System.setProperty("java.awt.headless", "true");
    }

    public void createPDF() {
        PDFreactor pdfReactor = new PDFreactor();
        // ...
    }
}

If the headless mode is not enabled on a system without a graphics environment, you might experience an error similar to this:

java.lang.InternalError: Can't connect to X11 window server using '' as the value of the DISPLAY variable

Improving Cold Start with OpenJ9 Class Data Sharing

When running PDFreactor as a command line application, the time required for the cold start of the Java runtime can be a significant portion of the total time required to convert a single document. Note that this only really applies when using PDFreactor on the command line. When running PDFreactor as a library that is part of a larger application or as a Web Service, the Java runtime is not likely to go through a cold start each time a PDF is converted.

To circumvent this, you could leverage the Class Data Sharing feature of the OpenJ9 runtime (see Class sharing in Eclipse OpenJ9). Creating and using a cache for shared classes will significantly improve the cold start time for the command line. This can improve conversion time up to 30% - 50% for smaller documents.

There is an example (a batch file or a shell script, depending on your installation) on how to use the OpenJ9 runtime in path-to-PDFreactor/bin/openj9. Important: It is for reference only and is not intended for production use. You will have to edit the file and configure the path to your OpenJ9 Java executable in order to use it. You may want to use a different set of OpenJ9 parameters depending on your environment and requirements.

Using the PDFreactor Web Service

If PDFreactor is deployed using the PDFreactor installer, the installation provides an option to automatically install the PDFreactor Web Service with PDFreactor. No further configuration is required in this case.

On Unix and Linux platforms, no installer is available. Therefore, the PDFreactor Web Service must be started manually on these systems. To do so, after unzipping the PDFreactor installation archive go to the path-to-PDFreactor/bin directory and use this command to start the service:

./pdfreactorwebservice start

To stop the service, use:

./pdfreactorwebservice stop

To display whether the service is already running, use:

./pdfreactorwebservice status

Install PDFreactor Web Service as a systemd service

Alternatively, on systems which support systemd, you can install PDFreactor as a system service as follows:

After unzipping the PDFreactor installation archive go to the path-to-PDFreactor/bin directory. Then issue the following commands:

cp pdfreactor.service /etc/systemd/system
systemctl start pdfreactor.service
systemctl enable pdfreactor.service

The PDFreactor Web Service can be used by one of the clients (PHP, .NET, Python, Ruby, Java, JavaScript, Node.js and Python Command Line) or by using its REST (REpresentational State Transfer) API.

Checking if the Web Service is Operational

You can check if the PDFreactor Web Service is operational (i.e. if it can create PDFs) by using the getStatus method of the clients or the /status resource of the REST API. If the Web Service is not working normally, an appropriate exception is thrown when using a client, or the status code 503 is returned when using the REST API. In this case you should restart the PDFreactor Web Service.
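As an illustration of the REST variant of this check, the following sketch sends a GET request to the /status resource using the HTTP client of the Java standard library (Java 11 or later). It assumes the default service URL http://localhost:9423/service/rest and treats any 2xx response as "operational". The latter is an assumption, since only the failure code 503 is documented here. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

final class WebServiceStatusCheck {
    // Default REST base URL; adjust if the service is deployed or configured differently.
    static final String BASE_URL = "http://localhost:9423/service/rest";

    /** Builds the GET request for the /status resource. */
    static HttpRequest statusRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/status")).GET().build();
    }

    /** Assumption: any 2xx code means "operational"; 503 is the documented failure code. */
    static boolean isOperational(int httpStatus) {
        return httpStatus >= 200 && httpStatus < 300;
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<Void> response =
                client.send(statusRequest(BASE_URL), HttpResponse.BodyHandlers.discarding());
        System.out.println(isOperational(response.statusCode())
                ? "Web Service is operational"
                : "Web Service is not operational (HTTP " + response.statusCode() + ")");
    }
}
```

A monitoring script could run such a check periodically and trigger a restart of the service when it fails.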

Debugging start-up

If you have problems starting the PDFreactor web service, you can try to debug the start-up process using the following command:

./pdfreactorwebservice run

Asynchronous Conversions

The PDFreactor Web Service can convert documents asynchronously, meaning that the client is not required to keep an open HTTP connection to the server until the conversion is finished. While this is usually negligible when converting small documents, synchronous conversions may be very detrimental to the user experience when converting large or complex documents.

When doing asynchronous conversions, temporary files are created on the server's file system (unless configured otherwise). These files are deleted when the document is retrieved by the client (except when the keepDocument property is set in the configuration). Should these documents not be retrieved, they will remain on the server until they are automatically deleted after 5 days. It is also safe to remove these files via external cleanup mechanisms.

Starting an Asynchronous Conversion

Converting synchronously is very simple. You send a request for conversion to the server using the convert method and receive the result object in the response. Asynchronous conversions on the other hand have to be managed by the integrating application. You can start an asynchronous conversion by using the convertAsync method. The response is a unique ID which references the conversion you just triggered. The ID is important as it is the only way to check on or retrieve the finished document from the server at a later time.

Java:
// sync
Result result = pdfReactor.convert(config);
// async
String id = pdfReactor.convertAsync(config);

.NET:
// sync
Result result = pdfReactor.Convert(config);
// async
String id = pdfReactor.ConvertAsync(config);

PHP:
// sync
$result = $pdfReactor->convert($config);
// async
$id = $pdfReactor->convertAsync($config);

Python:
# sync
result = pdfReactor.convert(config)
# async
id = pdfReactor.convertAsync(config)

Ruby:
# sync
result = pdfReactor.convert(config)
# async
id = pdfReactor.convertAsync(config)

JavaScript:
// sync
const result = await pdfReactor.convert(config);
// async
const id = await pdfReactor.convertAsync(config);

Node.js:
// sync
const result = await pdfReactor.convert(config);
// async
const id = await pdfReactor.convertAsync(config);

Perl:
# sync
$result = $pdfReactor->convert($config);
# async
$id = $pdfReactor->convertAsync($config);

REST:
To convert synchronously, POST your configuration to /convert
To convert asynchronously, POST your configuration to /convert/async

Command line:
Not possible.

Checking the Progress

Since after the conversion is triggered you do not have any information on whether it is finished or not, your application needs to poll the progress of the conversion. This is done by using the getProgress method, which takes the conversion ID as argument. The returned object contains an indicator whether the conversion is finished, the current estimated progress in percent and a partial log, if a log level was configured.

Java:
Progress progress = pdfReactor.getProgress(id);

.NET:
Progress progress = pdfReactor.GetProgress(id);

PHP:
$progress = $pdfReactor->getProgress($id);

Python:
progress = pdfReactor.getProgress(id)

Ruby:
progress = pdfReactor.getProgress(id)

JavaScript:
const progress = await pdfReactor.getProgress(id);

Node.js:
const progress = await pdfReactor.getProgress(id);

Perl:
$progress = $pdfReactor->getProgress($id);

REST:
Make a GET request to /progress/{id}

Command line:
Not applicable.

Retrieving the Document

After the conversion is finished, you can retrieve the document by using the getDocument method, which again takes the conversion ID as a parameter. The returned result object is the same as if you had called the convert method in the beginning, meaning that it contains the converted document.

Result result = pdfReactor.getDocument(id);

Retrieving the document causes it to be deleted from the server unless configured otherwise. See Deleting the Document for further information.

Deleting the Document

As already mentioned, asynchronously converted documents are stored on the server to be accessible at a later point. To make managing these stored files as convenient as possible, by default the document is deleted from the server once it is retrieved for the first time, e.g. by using the method getDocument. Since this might be undesirable in certain cases, it can be prevented by setting the keepDocument property of the Configuration object to true.

Java:
config.setKeepDocument(true);

.NET:
config.keepDocument = true;

PHP:
$config["keepDocument"] = true;

Python:
config['keepDocument'] = True

Ruby:
config['keepDocument'] = true

JavaScript:
config.keepDocument = true;

Node.js:
config.keepDocument = true;

Perl:
$config["keepDocument"] = true;

REST:
{ "keepDocument": "true" }

Command line:
Not applicable.

When you want to remove the document from the server, call the deleteDocument method with the conversion ID as argument.

Java:
pdfReactor.deleteDocument(id);

.NET:
pdfReactor.DeleteDocument(id);

PHP:
$pdfReactor->deleteDocument($id);

Python:
pdfReactor.deleteDocument(id)

Ruby:
pdfReactor.deleteDocument(id)

JavaScript:
pdfReactor.deleteDocument(id);

Node.js:
pdfReactor.deleteDocument(id);

Perl:
$pdfReactor->deleteDocument($id);

REST:
Make a DELETE request to /document/{id}

Command line:
Not applicable.

Using the REST API

The REST API provides application- and language-neutral access to the PDFreactor Web Service. To use a RESTful resource, your application has to open an HTTP connection to the appropriate URL.

The PDFreactor Web Service offers two REST APIs:

  • Conversion API: The conversion API is used to perform conversions.

  • Monitoring API: The monitoring API is only intended for administrators to observe the service’s load and performance.

All REST APIs are available under /service unless the service is otherwise deployed or configured. RESTful resources respond with an appropriate HTTP status code. Please see the REST API documentation for detailed information.

RESTful Conversion API

The conversion API is used to perform and manage document conversions. While the RESTful URLs are not identical to the corresponding client methods, the names are recognizable.

The RESTful PDFreactor Web Service can be reached at /rest, i.e. via the URL http://localhost:9423/service/rest, unless otherwise deployed or configured. The WADL (Web Application Description Language) is available under http://localhost:9423/service/rest?_wadl.

The following table gives a comprehensive overview of all available RESTful resources:

RESTful Resources of the Conversion API

Resource | HTTP method | Description | Headers
/convert | POST | Converts the specified document into PDF or image. | -
/convert/async | POST | Converts the specified document into PDF or image asynchronously. | Location
/progress/{id} | GET | Checks the progress of the conversion with the given ID. | Location
/document/{id} | GET | Retrieves the converted PDF or image. | -
/document/{id}/{page} | GET | Retrieves the specified page of a converted multi-page image. | -
/document/metadata/{id} | GET | Retrieves the metadata of the converted PDF or image. | -
/document/{id}/show/{fileName} | GET | Displays the converted PDF in the browser with the given file name. | -
/document/{id}/download/{fileName} | GET | Triggers a download of the converted PDF with the given file name. | -
/document/bundle | POST | Downloads a ZIP file containing the PDFs with the given IDs and file names. | -
/document/{id} | DELETE | Deletes the converted PDF or image from the server. | -
/schema | GET | Retrieves the JSON schema for all data models consumed or produced by the PDFreactor Web Service. | -
/status | GET | Checks if the REST service is responsive and able to convert documents. | -
/version | GET | Retrieves the version of the PDFreactor Web Service. | -

To convert a document using the RESTful conversion API, the following resource has to be called using the HTTP POST method:

http://localhost:9423/service/rest/convert

The PDFreactor configuration must be included in the POST data, either as a JSON or an XML string.

Payload

All POST resources require a payload in XML, JSON or ZIP format. Usually, the payload is the PDFreactor configuration. In case of ZIP, the payload is an asset package and contains all resources required to convert it to PDF (see Asset Packages).

When making a request, the appropriate Content-Type header should be set.

XML:

<prws:configuration xmlns:prws="http://webservice.pdfreactor.realobjects.com/">
    <prws:document>https://www.realobjects.com</prws:document>
</prws:configuration>

JSON:

{
    "document": "https://www.realobjects.com"
}
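To illustrate how such a payload is sent, the following sketch posts the JSON configuration above to the /convert resource using the HTTP client of the Java standard library (Java 11 or later) and writes the response to output.pdf. It uses the pdf file extension on the resource so that the binary PDF is returned directly. The class and method names are hypothetical, and the media type application/json for JSON payloads is an assumption.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

final class SyncRestConversion {
    // Default REST base URL; adjust if the service is deployed or configured differently.
    static final String BASE_URL = "http://localhost:9423/service/rest";

    /** Builds a POST to /convert.pdf whose payload is a minimal JSON configuration. */
    static HttpRequest convertRequest(String baseUrl, String documentUrl) {
        // Assumes documentUrl contains no characters that need JSON escaping.
        String json = "{\"document\": \"" + documentUrl + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/convert.pdf"))
                .header("Content-Type", "application/json") // assumed media type for JSON
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpResponse<Path> response = HttpClient.newHttpClient().send(
                convertRequest(BASE_URL, "https://www.realobjects.com"),
                HttpResponse.BodyHandlers.ofFile(Path.of("output.pdf")));
        // The body is written to the file regardless of the status code, so check it.
        System.out.println("HTTP " + response.statusCode() + ", wrote " + response.body());
    }
}
```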

Headers

The RESTful resources /convert/async and /progress/{id} both return a Location header, which contains the URL that should be called next.

The Location header of the /convert/async response contains the complete document URL to /progress/{id}, including the id parameter. This makes it very convenient to get the progress after triggering an async conversion. The Location header of the /progress/{id} response contains the complete document URL to /document/{id}, including the id parameter. This header is only present if the conversion is finished, so it can be used to directly access the converted document.
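The header chain described above can be followed without ever parsing the conversion ID. The following sketch starts an asynchronous conversion, reads the Location header of the response, and polls the progress URL until that response carries a Location header of its own. It is only an illustration: the class and method names are hypothetical, the polling interval and attempt limit are arbitrary, and the media type application/json is an assumption.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;

final class AsyncRestConversion {
    // Default REST base URL; adjust if the service is deployed or configured differently.
    static final String BASE_URL = "http://localhost:9423/service/rest";

    /** Builds the POST request that starts an asynchronous conversion. */
    static HttpRequest startRequest(String baseUrl, String jsonConfig) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/convert/async"))
                .header("Content-Type", "application/json") // assumed media type for JSON
                .POST(HttpRequest.BodyPublishers.ofString(jsonConfig))
                .build();
    }

    /** Returns the Location header of a response, if present. */
    static Optional<String> location(HttpResponse<?> response) {
        return response.headers().firstValue("Location");
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient(); // redirects are not followed by default
        String config = "{\"document\": \"https://www.realobjects.com\"}";

        HttpResponse<Void> started =
                client.send(startRequest(BASE_URL, config), HttpResponse.BodyHandlers.discarding());
        String progressUrl = location(started)
                .orElseThrow(() -> new IllegalStateException("No Location header returned"));

        // The progress response only has a Location header once the conversion is finished.
        Optional<String> documentUrl = Optional.empty();
        for (int attempt = 0; attempt < 120 && documentUrl.isEmpty(); attempt++) {
            Thread.sleep(500);
            HttpResponse<Void> progress = client.send(
                    HttpRequest.newBuilder(URI.create(BASE_URL).resolve(progressUrl)).GET().build(),
                    HttpResponse.BodyHandlers.discarding());
            documentUrl = location(progress);
        }
        System.out.println(documentUrl.map(url -> "Document ready at " + url)
                .orElse("Conversion not finished within the polling limit"));
    }
}
```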

Data Formats

Certain resources like /convert or /progress return data in XML format by default. However, you can control the data format by either specifying appropriate Accept headers or more conveniently by appending a file extension to the REST resource. Not all file extensions are supported for all resources, and some file extensions may behave differently.

  • pdf, png, jpg, bmp, tiff, gif – Retrieves the binary data of the converted PDF or image directly. Also, the appropriate Content-Type headers are included so that you can display the PDF or image directly in the browser. These file extensions are only supported for the /convert and /document resources.

  • bin – Same as above, however, the data is returned as generic binary data with content type "application/octet-stream".

  • json, xml – The data is returned in JSON or XML format.

  • txt – The data is returned as plain text. What exactly is returned depends on the resource:

    • /progress/{id}.txt returns the current estimated progress in percent

    • /version.txt returns the full version as a string

    • /convert.txt or /document/{id}.txt return the converted PDF as a base64 encoded string

To retrieve an asynchronously converted PDF from the server, use the /document resource with the conversion ID "1234" as a URL parameter like this:

http://localhost:9423/service/rest/document/1234

The resource will return a result object which includes (among other data) the converted PDF as a base64-encoded string. If no file extension is given, the data is returned in XML format. If you prefer the data in JSON format, just add the appropriate file extension to the resource:

http://localhost:9423/service/rest/document/1234.json

Sometimes it might be desirable to retrieve the PDF directly as binary data or display it in the browser. For this, simply use the "pdf" file extension:

http://localhost:9423/service/rest/document/1234.pdf

When using the convert or document resources to retrieve the binary data of the converted document directly, you can specify an image file extension like jpg even if you retrieve a PDF (and vice versa). This is not recommended. While the returned binary data is the same, an inappropriate "Content-Type" header is set which might confuse some user agents. If you do not know whether you retrieve an image or a PDF, use the generic extension bin.
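The following sketch shows how the file extension selects the data format when retrieving a document. A small helper builds the resource URL, and the main method downloads the result with the generic bin extension using the HTTP client of the Java standard library (Java 11 or later). The class and method names are hypothetical, and "1234" is only a placeholder for a real conversion ID.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

final class RestResourceUrls {
    // Default REST base URL; adjust if the service is deployed or configured differently.
    static final String BASE_URL = "http://localhost:9423/service/rest";

    /** URL of a converted document; an empty extension requests the default XML result object. */
    static String documentUrl(String conversionId, String extension) {
        String url = BASE_URL + "/document/" + conversionId;
        return extension.isEmpty() ? url : url + "." + extension;
    }

    public static void main(String[] args) throws Exception {
        String id = args.length > 0 ? args[0] : "1234"; // placeholder conversion ID
        HttpRequest request =
                HttpRequest.newBuilder(URI.create(documentUrl(id, "bin"))).GET().build();
        HttpResponse<Path> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofFile(Path.of("result.bin")));
        // The body is written to the file regardless of the status code, so check it.
        System.out.println("HTTP " + response.statusCode() + ", saved to " + response.body());
    }
}
```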

RESTful Monitoring API

The monitoring API of the PDFreactor Web Service can be reached at /monitor, i.e. via the URL http://localhost:9423/service/monitor, unless otherwise deployed or configured.

To use the monitoring API, you must configure an admin key.

RESTful Resources of the Monitoring API

Resource | HTTP method | Description
/server | GET | Provides information about the server environment, number of CPU cores, available memory, environment variables, Java system properties and the PDFreactor service. This includes all server parameters except for the admin key parameters.
/conversions | GET | Provides an overview of all conversions. This includes queued conversion requests, currently running conversions as well as the total number of conversions and failed conversions.
/conversions/running | GET | Same as /conversions, but provides only information about running conversions.
/conversions/queued | GET | Same as /conversions, but provides only information about queued conversion requests.
/conversions/finished | GET | Shows the number of conversions that have finished since the server started.
/conversions/finished/successful | GET | Shows the number of conversions that have successfully finished since the server started.
/conversions/finished/failed | GET | Shows the number of conversions that have failed since the server started.

The monitoring API does not store any conversion information, except for the number of finished and failed conversions. Once a conversion is finished, all other information about it is lost.

Asset Packages

Instead of using a simple configuration to convert an external document, the REST service also accepts an asset package in ZIP format. This package must have a configuration.xml or configuration.json file in its root directory. The content of this configuration file is a normal configuration in XML or JSON format, except that the document is specified as a URL relative to it. All other resources required by the document can also be placed in the asset package and can be linked relatively to the document.

This is an example asset package structure and configuration.

configuration.json:

{
    "document": "input.html",
    "addComments": true,
    "userStyleSheets": [
        {
            "uri": "styles/common.css"
        }
    ]
}

The configuration above points to a document that is located in the same directory as the configuration file as well as a user style sheet in the styles directory. Let's assume the content of the input document looks like this:

<html>
    <head>
        <link rel="stylesheet" href="styles/document.css">
        <script src="scripts/main.js"></script>
    </head>
    <body>
        <p>Hello World <img src="images/beach.png"></p>
    </body>
</html>

The input document also references a style sheet, a script and an image, all located in different directories. Files and directories are arbitrary; only the configuration file must be located in the root directory. All relative URLs are resolved against the root directory of the Asset Package.

With the configuration and input document above, the final package structure should look like this:

myPackage.zip
├ configuration.json
├ input.html
├ styles
│ ├ document.css
│ └ common.css
├ scripts
│ └ main.js
└ images
  └ beach.png

You could then convert this asset package to PDF using e.g. curl:

curl -X POST -H "Cache-Control: no-cache" -H "Content-Type: application/zip" --data-binary @myPackage.zip "http://localhost:9423/service/rest/convert.pdf" > result.pdf
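An asset package can also be assembled programmatically. The following sketch builds a minimal package in memory with the ZIP classes of the Java standard library and writes it to myPackage.zip. The class and method names are hypothetical, and the package contains only a configuration and an input document to keep the example short.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class AssetPackageBuilder {
    /** Zips the given entries (path inside the package -> text content) in memory. */
    static byte[] buildPackage(Map<String, String> entries) {
        // The configuration file must be in the root directory of the package.
        if (!entries.containsKey("configuration.json") && !entries.containsKey("configuration.xml")) {
            throw new IllegalArgumentException("configuration file missing from package root");
        }
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> entry : entries.entrySet()) {
                zip.putNextEntry(new ZipEntry(entry.getKey()));
                zip.write(entry.getValue().getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> entries = new LinkedHashMap<>();
        entries.put("configuration.json", "{ \"document\": \"input.html\" }");
        entries.put("input.html", "<html><body><p>Hello World</p></body></html>");
        Files.write(Path.of("myPackage.zip"), buildPackage(entries));
    }
}
```

Binary resources such as images would need a variant of the helper that accepts byte arrays instead of strings.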

Limitations and Restrictions

Asset packages are subject to the following limitations and restrictions:

  • Asset packages must have a configuration.json or configuration.xml file in their root directory.

  • A document in the asset package must be specified as URL relative to the configuration file.

  • All relatively linked resources must be put in the asset package.

  • No base URL can be specified in the configuration.

  • Relative URLs must not point to locations outside of the asset package.

Prioritizing Jobs

By default, the PDFreactor Web Service processes conversion jobs in FIFO order, i.e. in the same order as they arrive, although conversion times may of course vary. In addition, synchronous conversions generally have a higher priority than asynchronous ones. To prioritize certain jobs, you can specify the requestPriority configuration property. Its value determines at which position in the conversion queue the new conversion is placed. Greater values mean higher priority.

If no other priority is specified, the PDFreactor Web Service assigns the following default priorities:

  • Synchronous conversions: priority 10

  • Asynchronous conversions: priority 0
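The ordering described above can be modeled with a priority queue that sorts by priority first and by arrival order second. The following sketch is only a model of that documented behavior for illustration, not the implementation used by the PDFreactor Web Service. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class JobQueueModel {
    /** A queued job with its priority and arrival sequence number. */
    static final class Job {
        final String name;
        final int priority;
        final long arrival;

        Job(String name, int priority, long arrival) {
            this.name = name;
            this.priority = priority;
            this.arrival = arrival;
        }
    }

    // Greater priority first; jobs with equal priority stay in FIFO order.
    private final PriorityQueue<Job> queue = new PriorityQueue<>(
            Comparator.comparingInt((Job j) -> j.priority).reversed()
                    .thenComparingLong(j -> j.arrival));
    private long nextArrival = 0;

    void submit(String name, int priority) {
        queue.add(new Job(name, priority, nextArrival++));
    }

    /** Removes all jobs and returns their names in processing order. */
    List<String> drain() {
        List<String> order = new ArrayList<>();
        while (!queue.isEmpty()) {
            order.add(queue.poll().name);
        }
        return order;
    }

    public static void main(String[] args) {
        JobQueueModel model = new JobQueueModel();
        model.submit("async-1", 0);   // default priority of asynchronous conversions
        model.submit("sync-1", 10);   // default priority of synchronous conversions
        model.submit("async-2", 0);
        model.submit("urgent", 20);   // explicit requestPriority above both defaults
        System.out.println(model.drain()); // [urgent, sync-1, async-1, async-2]
    }
}
```

In this model, a requestPriority above 10 lets an asynchronous job overtake waiting synchronous ones, while jobs with equal priority keep their arrival order.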

Downloading Document Bundles

To download a converted document, you can use the /document/{id} resource with the ID of the conversion. This downloads a single conversion result. However, sometimes it can be desirable to download multiple converted documents in one request. For this, you can use the /document/bundle resource. Note that this resource requires a POST request rather than GET. It returns a ZIP file containing the requested documents with file names of your choosing.

The operation will fail if at least one of the requested documents cannot be found or if the specified file names are not unique. If no file name is provided, the service will automatically generate one, by either using the documentName configuration property or the conversion ID.

This is an example POST body to download several converted documents. The name property specifies the file name.

{
    "documents": [
        {
            "id": "899159cc-7440-47e9-bd75-3c9be61bb5e3",
            "name": "November Report.pdf"
        },
        {
            "id": "a912e3e9-23b4-4821-bd1e-e72e1d2ce0b6",
            "name": "December Report.pdf"
        },
        {
            "id": "b9c643e0-5f9d-4843-a9f7-71fbb4f13c89",
            "name": "Projection Next Year.pdf"
        }
    ]
}

The resulting ZIP then contains the following files:

bundle.zip
├ November Report.pdf
├ December Report.pdf
└ Projection Next Year.pdf
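The POST body shown above can also be assembled programmatically. The following sketch builds the request with the Python standard library only. The helper build_bundle_request is a hypothetical name, the service URL combines the default host and port with the /document/bundle resource, and the Content-Type header is an assumption.

```python
import json
import urllib.request

# Default REST base URL of a local PDFreactor Web Service.
SERVICE_URL = "http://localhost:9423/service/rest"


def build_bundle_request(documents, service_url=SERVICE_URL):
    """Build a POST request for /document/bundle.

    documents is a list of (conversion_id, file_name) tuples. A file name
    of None leaves the naming to the service.
    """
    names = [name for _, name in documents if name is not None]
    if len(names) != len(set(names)):
        # The service rejects bundles with duplicate file names,
        # so fail early on the client side.
        raise ValueError("file names in a bundle must be unique")

    body = {"documents": [
        {"id": doc_id} if name is None else {"id": doc_id, "name": name}
        for doc_id, name in documents
    ]}
    return urllib.request.Request(
        service_url + "/document/bundle",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed content type
        method="POST",
    )
```

The request can then be sent with urllib.request.urlopen(request) and the response body written to a file such as bundle.zip.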

Using a Client

PDFreactor can also be easily integrated into your web apps using one of the clients, i.e. PHP, .NET, Python, Perl, Ruby, Java, JavaScript, Node.js or the Python Command Line. The clients have to be used in conjunction with the PDFreactor Web Service, which is run by a Jetty application server.

See also The PDFreactor Web Service for information on how to start the service.

Using PHP

To use the PDFreactor PHP API simply copy the PDFreactor.class.php to a directory of your web server where PHP is enabled.

Then include the PDFreactor.class.php with:

include("/path/to/PDFreactor.class.php");

With just a few lines you can create and show PDFs inside your PHP web application:

<?php
include("../PDFreactor.class.php");
$pdfReactor = new PDFreactor();
$config = array("document" => "https://www.pdfreactor.com");

try {
    $result = $pdfReactor->convertAsBinary($config);
    header("Content-Type: application/pdf");
    echo $result;
} catch (PDFreactorWebserviceException $e) {
    header("Content-Type: text/html");
    echo "<h1>An Error Has Occurred</h1>";
    echo "<h2>".$e->getMessage()."</h2>";
}
?>

See the PDFreactor methods in the PHP API docs for all available options.

PHP API specific issues

PHP script timeout: By default, the execution time of PHP scripts is limited to 30 seconds by the max_execution_time setting in the php.ini. When rendering large documents this limit may be exceeded.

Using .NET

You can easily access the PDFreactor service from any .NET language. The library assembly PDFreactor.dll offers you a large subset of the Java-API and takes care of all communication with the service.

A simple usage in C# would be:

PDFreactor pdfReactor = new PDFreactor();
Configuration config = new Configuration();
config.Document = "https://www.pdfreactor.com/";

try
{
    byte[] pdf = pdfReactor.ConvertAsBinary(config);
}
catch (PDFreactorWebserviceException e)
{
    // ...
}

See the PDFreactor methods in the .NET API docs for all available options.

Using ASP.NET

To use the .NET API from ASP.NET (Active Server Pages .NET, a framework by Microsoft for building dynamic web sites and web applications), copy PDFreactor.dll from clients\netstandard2\bin in your PDFreactor installation directory to bin in the root of your IIS application or to the global assembly cache.

An ASP.NET example would be:

<%@ Page Language="C#" Debug="false" %>
<%@ import namespace="RealObjects.PDFreactor.Webservice.Client" %>
<%
PDFreactor pdfReactor = new PDFreactor();
RealObjects.PDFreactor.Webservice.Client.Configuration config =
            new RealObjects.PDFreactor.Webservice.Client.Configuration();
config.Document = "https://www.pdfreactor.com/";

try
{
    byte[] result = pdfReactor.ConvertAsBinary(config);

    Response.ContentType = "application/pdf";
    Response.BinaryWrite(result);
}
catch (PDFreactorWebserviceException e)
{
    Result result = e.Result;
    Response.Write("<h1>Error During Rendering</h1>");
    Response.Write("<h2>"+result.Error+"</h2>");
}
%>

Using Python

To use the PDFreactor Python API simply copy the PDFreactor.py to a directory of your web server where Python is enabled (by e.g. CGI or mod-python).

Then include the PDFreactor.py with:

import sys
sys.path.append("path/to/PDFreactor.py/")
from PDFreactor import *

With just a few lines you can create and show PDFs inside your Python web application:

pdfReactor = PDFreactor()
config = { "document": "https://www.pdfreactor.com" }

try:
    result = pdfReactor.convertAsBinary(config)

    # Used to prevent that newlines are converted to Windows newlines (\n --> \r\n)
    # when using Python on Windows systems
    if sys.platform == "win32":
        import os, msvcrt
        msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

    print "Content-Type: application/pdf\n"
    sys.stdout.write(result)
except PDFreactorWebserviceException as e:
    print "Content-Type: text/html\n"
    print "<h1>Error During Rendering</h1>"
    print "<h2>"+str(e)+"</h2>"

To output the PDF directly to the browser please use the following code:

# Switch stdout to binary mode on Windows only
if sys.platform == "win32":
    import os, msvcrt
    msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY)

# Write the header and the PDF on all platforms
print "Content-Type: application/pdf\n"
sys.stdout.write(result.document)

See the PDFreactor methods in the Python API docs for all available options.

Using Perl

To use the PDFreactor Perl API simply copy the PDFreactor.pm to a directory of your web server where Perl is enabled (by e.g. CGI or mod-perl).

Then include the PDFreactor.pm with:

require "PDFreactor.pm";

With just a few lines you can create and show PDFs inside your Perl web application:

my $pdfReactor = PDFreactor -> new();
$config = { "document" => "https://www.pdfreactor.com" };

eval {
    $result = $pdfReactor->convertAsBinary($config);

    print "Content-type: application/pdf\n\n";
    binmode(STDOUT);
    print $result;
} || do {
    my $e = $@;

    print "Content-type: text/html\n\n";
    print "<h1>Error During Rendering</h1>";
    
    if ($e->isa("PDFreactor::PDFreactorWebserviceException")) {
        print "<h2>".$e->{message}."</h2>";
    } else {
        print "<h2>".$e."</h2>";
    }
};

When outputting the PDF directly to the browser please use the following code before printing the result:

binmode(STDOUT);

See the PDFreactor methods in the Perl API docs for all available options.

Using Ruby

To use the PDFreactor Ruby API simply copy the PDFreactor.rb to a directory of your web server where Ruby is enabled (by e.g. CGI or mod-ruby).

Then include the PDFreactor.rb with:

require 'PDFreactor.rb'

With just a few lines you can create and show PDFs inside your Ruby web application:

pdfReactor = PDFreactor.new()
config = { document: "https://www.pdfreactor.com/" }

begin
    result = pdfReactor.convertAsBinary(config);

    print "Content-type: application/pdf\n\n"
    $stdout.binmode
    print result
rescue PDFreactorWebserviceException => e
    print "Content-type: text/html\n\n"
    puts "<h1>Error During Rendering</h1>"
    puts "<h2>#{e}</h2>"
end

When outputting the PDF directly to the browser please use the following code before printing the result:

$stdout.binmode

See the PDFreactor methods in the Ruby API docs for all available options.

Using Java

To use the PDFreactor Java client simply add the pdfreactor-client.jar to your Java application's class path.

With just a few lines you can create PDFs inside your Java application:

PDFreactor pdfReactor = new PDFreactor();
Configuration config = new Configuration();
config.setDocument("https://www.pdfreactor.com/");

try {
    byte[] result = pdfReactor.convertAsBinary(config);

    // handle the PDF
} catch (PDFreactorWebserviceException e) {
    System.out.println(e.getMessage());
}

See the PDFreactor methods in the Java API docs for all available options.

Using JavaScript/Node.js

This chapter refers to the JavaScript API that allows using PDFreactor from JavaScript in a browser or from a Node.js application.

To use the PDFreactor JavaScript API simply add the PDFreactor.js as a JavaScript to your web page or as a module in your Node.js application.

JavaScript

<script src="PDFreactor.js"></script>

Node.js

const PDFreactor = require('PDFreactor.js');

Because the JavaScript and Node.js clients use HTTP requests which are asynchronous by nature, the convert and all other API methods that retrieve data from the service return Promises.

With just a few lines you can create PDFs inside your web page or application:

const pdfReactor = new PDFreactor();
const config = { document: "https://www.pdfreactor.com/" };

try {
    const result = await pdfReactor.convert(config);
    const pdf = result.document;
    // handle the PDF
} catch (e) {
    if (e instanceof PDFreactor.PDFreactorWebserviceError) {
        console.log(e.message);
    }
}

See the PDFreactor methods in the JavaScript or Node.js API docs for all available options.

Using the Python Command Line

PDFreactor features a Java based command line that uses the Java library and a Python based command line web service client which requires the PDFreactor Web Service to be running.

The Python Command Line executable is located in the PDFreactor/clients/cli directory. It can be used like this:

python pdfreactor.py

See Using the Command Line Application for basic usage, as the arguments are mostly identical.

Batch Processing

The Python Command Line client can be used to batch convert files by either specifying a directory on your system or using wildcards in the input file name.

python pdfreactor.py -i /directory/documents

Here all files in the /directory/documents directory are converted.

python pdfreactor.py -i /directory/documents/test*.html

Here all files in the /directory/documents directory whose names match the pattern test*.html are converted.

Contrary to other clients, the Python Command Line client can also process file paths as input documents (in addition to URLs and content). When using file paths, the PDFreactor Web Service must be running on the same system. If not, the file paths cannot be accessed.

Asynchronous conversions are not possible using the Python Command Line client.

Custom Headers and Cookies

In certain situations it may be necessary to set custom headers and cookies on the connection from the client to the PDFreactor Web Service. This can be done with the connectionSettings.

If sticky cookies are a requirement (e.g. for load balanced scenarios), make sure to use the same instance of the connectionSettings object for each request that should use the same sticky session. PDFreactor automatically modifies the connectionSettings parameter to include all cookies from the response (and thus any potential load balancer sticky cookies).

Java

ConnectionSettings connectionSettings = new ConnectionSettings();
connectionSettings.setHeaders(new HashMap<>());
connectionSettings.setCookies(new HashMap<>());
connectionSettings.getHeaders().put("my-header", "my-header-value");
connectionSettings.getCookies().put("my-cookie", "my-cookie-value");
pdfReactor.convert(config, connectionSettings);

.NET

ConnectionSettings connectionSettings = new ConnectionSettings()
{
    Headers = new NameValueCollection(),
    Cookies = new NameValueCollection()
};
connectionSettings.Headers.Set("my-header", "my-header-value");
connectionSettings.Cookies.Set("my-cookie", "my-cookie-value");
pdfReactor.Convert(config, connectionSettings);

PHP

$connectionSettings = array(
    "headers" => array("my-header" => "my-header-value"),
    "cookies" => array("my-cookie" => "my-cookie-value")
);
$pdfReactor->convert($config, $connectionSettings);

Python

connectionSettings = {
    "headers": { "my-header": "my-header-value" },
    "cookies": { "my-cookie": "my-cookie-value" }
}
pdfReactor.convert(config, connectionSettings)

Ruby

connectionSettings = {
    headers: { "my-header" => "my-header-value" },
    cookies: { "my-cookie" => "my-cookie-value" }
}
pdfReactor.convert(config, connectionSettings)

Note: Make sure to use symbols as property names and strings as header and cookie names and values.

JavaScript

const connectionSettings = {
    headers: { 'my-header': 'my-header-value' }
}
pdfReactor.convert(config, connectionSettings);

Note: Setting cookies manually is not possible in JavaScript. It is done automatically by the browser.

Node.js

const connectionSettings = {
    headers: { 'my-header': 'my-header-value' },
    cookies: { 'my-cookie': 'my-cookie-value'}
}
pdfReactor.convert(config, connectionSettings);

Perl

my %connectionSettings = (
    headers => { 'my-header' => 'my-header-value' },
    cookies => { 'my-cookie' => 'my-cookie-value' }
);
$pdfReactor->convert($config, \%connectionSettings);

Note: Make sure to use a hash reference as the connectionSettings object might be modified by PDFreactor.

REST

Refer to the documentation of your HTTP client on how to set cookies and headers.

Python Command Line

Not possible.

Web Service Configuration

The PDFreactor Web Service can be configured in several ways. Most commonly, you may want to increase the amount of memory available.

Increasing Memory

To increase the amount of memory available to the PDFreactor Web Service, you need to adapt the -Xmx1024m parameter in the file PDFreactor/jetty/start.d/main.ini.

To increase the memory to e.g. 2GB, change the parameter to -Xmx2048m and restart the web service.

It is recommended to adapt the memory parameter for the PDFreactor Web Service appropriately before going into production.
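If the memory setting has to be changed as part of an automated deployment, the edit described above can be scripted. The following is a minimal sketch. The function set_max_heap is a hypothetical helper, and it assumes that the -Xmx parameter stands on a line of its own in the main.ini.

```python
import re


def set_max_heap(ini_text, megabytes):
    """Return ini_text with the -Xmx parameter set to the given size in MB."""
    new_text, count = re.subn(
        r"^-Xmx\d+[kKmMgG]?[ \t]*$",   # e.g. -Xmx1024m on its own line
        "-Xmx{}m".format(megabytes),
        ini_text,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no -Xmx parameter found")
    return new_text


example = "--exec\n-Xmx1024m\n\njetty.http.port=9423\n"
print(set_max_heap(example, 2048))
```

To apply the change, read PDFreactor/jetty/start.d/main.ini, pass its content through the function, write the result back and restart the web service.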

Increasing Maximum Threads

The maximum number of threads limits the number of parallel conversions. For machines with multiple CPU cores, this value can be increased to allow more parallel conversions. This number is automatically determined by the PDFreactor Web Service. It can also be configured manually (see the server parameter threadPoolSize). The Jetty application server also has a configured limit of 200 maximum threads which should only be increased if absolutely necessary.

Keep in mind that more parallel conversions will result in increased memory usage.

Customizing the Server Configuration

Sometimes it may be necessary to change the host or port of the PDFreactor Web Service.

You can change the port in the following line of the PDFreactor/jetty/start.d/main.ini:

…
jetty.http.port=9423
…

Usually it is recommended to run the PDFreactor Web Service on the same machine as the PDFreactor integration. This is not strictly necessary and the host for the service can be changed.

You have to remove the following line from the PDFreactor/jetty/start.d/main.ini:

…
jetty.http.host=localhost
…

This will enable the PDFreactor Web Service to be accessible from other machines. By default, the service is available under "http://localhost:9423/service".

When the PDFreactor Web Service is accessible from other hosts and if it is not secured by other means (e.g. firewalls), there are important security implications that have to be considered.

If either the host or port were changed or if you use a completely custom server for the PDFreactor Web Service, you need to specify the new service URL in the constructor of the PDFreactor instance.

Java

PDFreactor pdfReactor = new PDFreactor("http://myServer:9423/service/rest");

.NET

PDFreactor pdfReactor = new PDFreactor("http://myServer:9423/service/rest");

PHP

$pdfReactor = new PDFreactor("http://myServer:9423/service/rest");

Python

pdfReactor = PDFreactor("http://myServer:9423/service/rest")

Ruby

pdfReactor = PDFreactor.new("http://myServer:9423/service/rest")

JavaScript

pdfReactor = new PDFreactor("http://myServer:9423/service/rest");

Node.js

pdfReactor = new PDFreactor("http://myServer:9423/service/rest");

Perl

my $pdfReactor = PDFreactor->new("http://myServer:9423/service/rest");

Python Command Line

python pdfreactor.py -u http://myServer:9423/service/rest -i input.html
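The service URL passed to the constructor follows directly from the host and port settings. As an illustration, the following sketch derives it from the content of the main.ini. The function service_url_from_ini and its public_host parameter are hypothetical. The sketch assumes plain HTTP and the /service/rest path used in the examples above.

```python
def service_url_from_ini(ini_text, public_host=None):
    """Derive the REST service URL from jetty.http.* lines of a main.ini.

    public_host overrides the host, e.g. when the jetty.http.host line
    was removed to make the service reachable from other machines.
    """
    settings = {}
    for line in ini_text.splitlines():
        line = line.strip()
        if line.startswith("jetty.http.") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()

    # Fall back to the defaults of the PDFreactor Web Service.
    host = public_host or settings.get("jetty.http.host", "localhost")
    port = settings.get("jetty.http.port", "9423")
    return "http://{}:{}/service/rest".format(host, port)


print(service_url_from_ini("jetty.http.port=8080\n", public_host="myServer"))
```

The resulting string is the value that the client constructors and the -u argument shown above expect.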

See Docker Configuration on how to specify memory, parallel conversion limits and the port when using the PDFreactor Docker image.

Accessing the Log

In addition to the possibilities mentioned in Logging, log information is also available via the log and error properties of the Progress object. While the log property contains the conversion logs, the error property contains errors that may have occurred during the conversion and caused it to be aborted. If the conversion is not yet finished, only a partial log will be available.

Additionally, the entire log output of the Jetty application server is written to log files located in the PDFreactor/jetty/logs directory. The server log output can be configured separately using a server parameter (see Server Parameters).

Load Balancing

In high availability and high performance environments it is common to run multiple PDFreactor Web Services behind a load balancer.

When doing synchronous conversions, no additional configuration or settings are required since the request to the web service is completely stateless. When doing asynchronous conversions on the other hand, you have to make sure that all relevant requests are routed to the same web service by the load balancer. This can usually be achieved by setting a sticky cookie. Please refer to the manual of the load balancer on how exactly to handle sticky sessions. When using a client, cookies can be set using the connectionSettings parameter of the PDFreactor instance (see Custom Headers and Cookies).

You can set a pre-defined sticky cookie like this:

Java

ConnectionSettings connectionSettings = new ConnectionSettings();
connectionSettings.setCookies(new HashMap<>());
connectionSettings.getCookies().put("sticky-cookie", "sticky-cookie-value");
String documentId = pdfReactor.convertAsync(config, connectionSettings);
// ...
pdfReactor.getDocument(documentId, connectionSettings);

.NET

ConnectionSettings connectionSettings = new ConnectionSettings()
{
    Cookies = new NameValueCollection()
};
connectionSettings.Cookies.Set("sticky-cookie", "sticky-cookie-value");
string documentId = pdfReactor.ConvertAsync(config, connectionSettings);
// ...
pdfReactor.GetDocument(documentId, connectionSettings);

PHP

$connectionSettings = array(
    "cookies" => array("sticky-cookie" => "sticky-cookie-value")
);
$documentId = $pdfReactor->convertAsync($config, $connectionSettings);
// ...
$pdfReactor->getDocument($documentId, $connectionSettings);

Python

connectionSettings = {
    "cookies": { "sticky-cookie": "sticky-cookie-value" }
}
documentId = pdfReactor.convertAsync(config, connectionSettings)
# ...
pdfReactor.getDocument(documentId, connectionSettings)

Ruby

connectionSettings = {
    cookies: { "sticky-cookie" => "sticky-cookie-value" }
}
documentId = pdfReactor.convertAsync(config, connectionSettings)
# ...
pdfReactor.getDocument(documentId, connectionSettings)

Note: Make sure to use symbols as property names and strings as header and cookie names and values.

JavaScript

Note: Setting cookies manually is not possible in JavaScript. It is done automatically by the browser.

Node.js

const connectionSettings = {
    cookies: { 'sticky-cookie': 'sticky-cookie-value'}
}
const documentId = await pdfReactor.convertAsync(config, connectionSettings);
// ...
await pdfReactor.getDocument(documentId, connectionSettings);

Perl

my %connectionSettings = (
    cookies => { 'sticky-cookie' => 'sticky-cookie-value' }
);
$documentId = $pdfReactor->convertAsync($config, \%connectionSettings);
# ...
$pdfReactor->getDocument($documentId, \%connectionSettings);

Note: Make sure to use a hash reference as the connectionSettings object might be modified by PDFreactor.

REST

Refer to the documentation of your HTTP client on how to set cookies and headers.

Python Command Line

Not possible.

If the sticky cookie is set by the load balancer, you can leave the connectionSettings object empty. PDFreactor will automatically write all response cookies into the connectionSettings object so that they are part of subsequent requests.
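When the REST API is used without a client, this cookie handling has to be done by your own code. The following sketch models the behavior described above with the Python standard library: cookies from Set-Cookie response headers are written into a connectionSettings dictionary so that they can be sent with subsequent requests. The function merge_response_cookies is a hypothetical helper and not part of any PDFreactor client.

```python
from http.cookies import SimpleCookie


def merge_response_cookies(connection_settings, set_cookie_headers):
    """Copy all cookies from Set-Cookie header values into the settings."""
    cookies = connection_settings.setdefault("cookies", {})
    for header in set_cookie_headers:
        parsed = SimpleCookie()
        parsed.load(header)
        for name, morsel in parsed.items():
            # Only name and value are kept. Attributes such as Path
            # are not needed to replay the cookie.
            cookies[name] = morsel.value
    return connection_settings


connection_settings = {}
merge_response_cookies(connection_settings, ["sticky-cookie=node-2; Path=/"])
print(connection_settings)
```

As with the clients, the same connection_settings object has to be reused for all requests that belong to the same asynchronous conversion.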

Server Parameters

Additional configuration options for the server can be specified for the PDFreactor Web Service. These are parameters the client should not or cannot influence. They affect all conversions.

For a complete list of parameters that can be configured, please see the appendix.

These server parameters can be configured in various ways:

Java System Properties

As system properties, server parameters have the following form:

com.realobjects.pdfreactor.webservice.parameterName=parameterValue

To specify system properties for the PDFreactor Web Service, add them to the section "VM Arguments" in the PDFreactor/jetty/start.d/main.ini file, below the "--exec" line like this:

-Dcom.realobjects.pdfreactor.webservice.parameterName=parameterValue

The parameter name must be prefixed with com.realobjects.pdfreactor.webservice.

Servlet Init Parameters

Init parameters are specified in the PDFreactor/jetty/contexts/service.xml file. They appear similar to this:

<Call name="setInitParameter">
    <Arg>com.realobjects.pdfreactor.webservice.parameterName</Arg>
    <Arg>parameterValue</Arg>
</Call>

The parameter name should be prefixed with com.realobjects.pdfreactor.webservice.

Environment Variables

Another way to set server parameters is in form of environment variables. How exactly environment variables are set is dependent on your system, however it should be similar to this:

export PDFREACTOR_PARAMETERNAME=parameterValue

The parameter name is upper cased and must be prefixed with PDFREACTOR_ and all dots (".") must be converted to underscores ("_").
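The naming rules for system properties and environment variables can be expressed in a few lines. This is a sketch for illustration only. The function names are hypothetical, and threadPoolSize is used as an example because it is mentioned in this manual.

```python
PREFIX_SYSTEM_PROPERTY = "com.realobjects.pdfreactor.webservice."
PREFIX_ENVIRONMENT = "PDFREACTOR_"


def as_system_property(name, value):
    """Format a server parameter as a VM argument for the main.ini."""
    return "-D{}{}={}".format(PREFIX_SYSTEM_PROPERTY, name, value)


def as_environment_variable(name, value):
    """Format a server parameter as an environment variable assignment."""
    # Upper case the name and convert all dots to underscores.
    return "{}{}={}".format(PREFIX_ENVIRONMENT,
                            name.upper().replace(".", "_"), value)


print(as_system_property("threadPoolSize", 4))
print(as_environment_variable("threadPoolSize", 4))
```

The first line of output is the form expected below the "--exec" line of the main.ini, the second one is the form to use after export.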

Configuration File

Server parameters can also be configured in a special configuration file. For this, create a new file pdfreactorwebservice.config at the same location where the pdfreactor-webservice.jar is located, which is usually in the PDFreactor/jetty/lib/ext directory. The content of this configuration file is one or more lines, each consisting of the following:

parameterName=parameterValue

This format is similar to Java's properties file format.

Parameter Priority

Should the same server parameter be specified in multiple ways (e.g. as system property and environment variable), the parameter with the highest priority is chosen. The priority is as follows, with the first item having highest priority:

  1. Configuration file

  2. System property

  3. Environment variable

  4. Servlet init parameter
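The priority list above amounts to a simple lookup in a fixed order. The following sketch illustrates this. The function resolve_parameter and the source labels are hypothetical and only mirror the list.

```python
# Sources in descending priority, as listed above.
SOURCES_BY_PRIORITY = (
    "configuration file",
    "system property",
    "environment variable",
    "servlet init parameter",
)


def resolve_parameter(values_by_source):
    """Return (source, value) of the highest-priority source that has a value."""
    for source in SOURCES_BY_PRIORITY:
        if source in values_by_source:
            return source, values_by_source[source]
    return None, None


print(resolve_parameter({
    "environment variable": "8",
    "system property": "4",
}))
```

Here the system property wins over the environment variable, so the effective value is "4".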

Callbacks

When performing asynchronous conversions, you usually have to regularly poll the progress of these conversions to determine when they are finished. As an alternative, you could also use callbacks which will notify you automatically about certain steps of the conversion by performing an HTTP POST request to a specified URL. The posted data is either in JSON, XML or plain text format, depending on the content type that is specified for the callback. Some callbacks return the same data model as if you had called the appropriate API methods. If the specified format is plain text, the data consists of a small string containing only a minimum amount of information.

The following callback types are available:

Callbacks

Callback type | Trigger | Model (JSON/XML) | Model (plain text) | Similar API method
START | The conversion has started on the server. | Info | Document ID | N/A
FINISH | The conversion has finished on the server. | Result | Document ID | getDocument
PROGRESS | The conversion is in progress. | Progress | Progress percentage | getProgress

If you want to be notified once the conversion is done, this example demonstrates how to add a simple "ping" that just posts the document ID of the finished conversion to your server.

Java

config.setCallbacks(new Callback()
    .setUrl("http://myServer/myEndpoint1")
    .setType(CallbackType.FINISH)
    .setContentType(ContentType.TEXT));

.NET

config.Callbacks = new List<Callback>
{
    new Callback
    {
        Url = "http://myServer/myEndpoint1",
        Type = CallbackType.FINISH,
        ContentType = ContentType.TEXT
    }
};

JavaScript

config.callbacks = [{
    url: "http://myServer/myEndpoint1",
    type: PDFreactor.CallbackType.FINISH,
    contentType: PDFreactor.ContentType.TEXT
}];

Node.js

config.callbacks = [{
    url: "http://myServer/myEndpoint1",
    type: PDFreactor.CallbackType.FINISH,
    contentType: PDFreactor.ContentType.TEXT
}];

PHP

$config["callbacks"] = array(
    array(
        "url" => "http://myServer/myEndpoint1",
        "type" => CallbackType::FINISH,
        "contentType" => ContentType::TEXT
    )
);

Python

config['callbacks'] = [{
    'url': 'http://myServer/myEndpoint1',
    'type': PDFreactor.CallbackType.FINISH,
    'contentType': PDFreactor.ContentType.TEXT
}]

Ruby

config['callbacks'] = [{
    url: 'http://myServer/myEndpoint1',
    type: PDFreactor::CallbackType::FINISH,
    contentType: PDFreactor::ContentType::TEXT
}]

Perl

$config->{"callbacks"} = [{
    'url' => "http://myServer/myEndpoint1",
    'type' => PDFreactor::CallbackType->FINISH,
    'contentType' => PDFreactor::ContentType->TEXT
}];

REST

{ "callbacks": [{
    "url": "http://myServer/myEndpoint1",
    "type": "FINISH",
    "contentType": "TEXT"
}]}

Python Command Line

-C config.json

With the following config.json:

{ "callbacks": [{
    "url": "http://myServer/myEndpoint1",
    "type": "FINISH",
    "contentType": "TEXT"
}]}

The next example demonstrates how to add a PROGRESS callback that will be called every 2 seconds until the conversion is finished. The posted data will be in JSON format.

Java

config.setCallbacks(new Callback()
    .setUrl("http://myServer/myEndpoint2")
    .setType(CallbackType.PROGRESS)
    .setContentType(ContentType.JSON)
    .setInterval(2));

.NET

config.Callbacks = new List<Callback>
{
    new Callback
    {
        Url = "http://myServer/myEndpoint2",
        Type = CallbackType.PROGRESS,
        ContentType = ContentType.JSON,
        Interval = 2
    }
};

JavaScript

config.callbacks = [{
    url: "http://myServer/myEndpoint2",
    type: PDFreactor.CallbackType.PROGRESS,
    contentType: PDFreactor.ContentType.JSON,
    interval: 2
}];

Node.js

config.callbacks = [{
    url: "http://myServer/myEndpoint2",
    type: PDFreactor.CallbackType.PROGRESS,
    contentType: PDFreactor.ContentType.JSON,
    interval: 2
}];

PHP

$config["callbacks"] = array(
    array(
        "url" => "http://myServer/myEndpoint2",
        "type" => CallbackType::PROGRESS,
        "contentType" => ContentType::JSON,
        "interval" => 2
    )
);

Python

config['callbacks'] = [{
    'url': 'http://myServer/myEndpoint2',
    'type': PDFreactor.CallbackType.PROGRESS,
    'contentType': PDFreactor.ContentType.JSON,
    'interval': 2
}]

Ruby

config['callbacks'] = [{
    url: 'http://myServer/myEndpoint2',
    type: PDFreactor::CallbackType::PROGRESS,
    contentType: PDFreactor::ContentType::JSON,
    interval: 2
}]

Perl

$config->{"callbacks"} = [{
    'url' => "http://myServer/myEndpoint2",
    'type' => PDFreactor::CallbackType->PROGRESS,
    'contentType' => PDFreactor::ContentType->JSON,
    'interval' => 2
}];

REST

{ "callbacks": [{
    "url": "http://myServer/myEndpoint2",
    "type": "PROGRESS",
    "contentType": "JSON",
    "interval": 2
}]}

Python Command Line

-C config.json

With the following config.json:

{ "callbacks": [{
    "url": "http://myServer/myEndpoint2",
    "type": "PROGRESS",
    "contentType": "JSON",
    "interval": 2
}]}
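On the receiving side, a callback is an ordinary HTTP POST request. The following sketch shows a minimal endpoint built with the Python standard library. It assumes that the request carries a Content-Type header that identifies JSON payloads. The names parse_callback, CallbackHandler and serve as well as the port are hypothetical. XML payloads are returned as raw text.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_callback(content_type, body):
    """Decode a callback payload: a parsed object for JSON, else stripped text."""
    text = body.decode("utf-8")
    if "json" in content_type.lower():
        return json.loads(text)
    # Plain text callbacks only carry a short string, e.g. a document ID.
    return text.strip()


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_callback(self.headers.get("Content-Type", ""),
                                 self.rfile.read(length))
        print("callback received on", self.path, ":", payload)
        # Acknowledge the callback without a response body.
        self.send_response(204)
        self.end_headers()


def serve(port=8080):
    """Listen for callbacks until the process is stopped."""
    HTTPServer(("", port), CallbackHandler).serve_forever()
```

Calling serve() starts the endpoint. The url of a callback then has to point to this machine and port, e.g. with a path such as /myEndpoint1.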

Monitoring

Server administrators may wish to monitor the PDFreactor Web Service and gain access to conversion statistics or server specifics. This can be done via the monitoring functionality of the PDFreactor Web Service.

JSON Configuration Files

Some configuration data is too complex to be packed into a single string, so certain server parameters require a URL or path to a JSON file which then contains the configuration data in JSON format. These parameters usually map certain configuration properties that are only available in the Java library, e.g. the server parameter for connection rules behaves exactly like the configuration property connectionRules of the securitySettings.

To map the Java configuration property to JSON format, use the following rules:

  • A single object in Java maps to a JSON object

  • A list or array in Java maps to a JSON array of JSON objects

  • Java setter methods map to JSON properties by removing the prefix "set" and lowercasing the following character

  • Java Enums map to simple strings in JSON using the same value

Consider the following Connection Rule in Java:

new SecuritySettings()
    .setConnectionRules(
        new ConnectionRule()
            .setName("My Rule")
            .setAction(ConnectionSecurityAction.ALLOW)
            .setHost("**.pdfreactor.com")
    );

Since the Java property connectionRules is a list of connection rules instead of a single object, the JSON format then looks like this:

[
    {
        "name": "My Rule",
        "action": "ALLOW",
        "host": "**.pdfreactor.com"
    }
]
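The mapping rules can be applied mechanically. The following sketch converts Java setter names to JSON property names and writes a list of rules as a JSON array. The helper names are hypothetical, and the rules are given as plain dictionaries keyed by setter name purely for illustration.

```python
import json


def setter_to_property(setter_name):
    """Map a Java setter name to its JSON property name."""
    if not setter_name.startswith("set") or len(setter_name) == 3:
        raise ValueError("not a setter name: " + setter_name)
    rest = setter_name[3:]
    # Remove the prefix "set" and lowercase the following character.
    return rest[0].lower() + rest[1:]


def to_json_config(rules):
    """Serialize a list of {setter name: value} dictionaries as a JSON array."""
    return json.dumps(
        [{setter_to_property(k): v for k, v in rule.items()} for rule in rules],
        indent=4,
    )


print(to_json_config([{
    "setName": "My Rule",
    "setAction": "ALLOW",              # enum constant as a simple string
    "setHost": "**.pdfreactor.com",
}]))
```

The printed array has the same content as the JSON shown above and can be saved to the file that the server parameter points to.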

Using the Command Line Application

PDFreactor comes with a command line interface for easy integration in shell scripts or batch files. It is included in the pdfreactor.jar which is located in the PDFreactor/lib directory. For Windows systems a compiled version is provided which is located in the PDFreactor/bin directory. It can be used like this:

java -jar pdfreactor.jar -i input.html -o output.pdf

For a full list of all arguments and parameters, use the following command:

java -jar pdfreactor.jar --help

When using the Windows executable or the Python Command Line client, replace java -jar pdfreactor.jar with pdfreactor.exe and python pdfreactor.py, respectively.

Standard input and output

The Java command line interface supports standard input and output (stdin and stdout). To read from stdin, you have to specify the input argument as "stdin". To write to stdout, you have to specify the output argument as "stdout".

Reading from stdin:

java -jar pdfreactor.jar -i stdin -o output.pdf < input.html

Writing to stdout:

java -jar pdfreactor.jar -i input.html -o stdout > output.pdf

Combining both:

java -jar pdfreactor.jar -i stdin -o stdout < input.html > output.pdf
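The stdin and stdout support makes it possible to call the command line application from other programs without temporary files. The following sketch does this from Python. The function names are hypothetical, and it assumes that java is on the PATH and that pdfreactor.jar is in the working directory.

```python
import subprocess


def build_command(input_arg="stdin", output_arg="stdout", jar="pdfreactor.jar"):
    """Assemble the argument list for the Java command line interface."""
    return ["java", "-jar", jar, "-i", input_arg, "-o", output_arg]


def convert_html(html_bytes, jar="pdfreactor.jar"):
    """Pipe HTML into the command line application and return the PDF bytes."""
    completed = subprocess.run(
        build_command(jar=jar),
        input=html_bytes,          # sent to the process via stdin
        stdout=subprocess.PIPE,    # the PDF is read from stdout
        check=True,                # raise if the exit code is not 0
    )
    return completed.stdout


print(" ".join(build_command()))
```

A call such as convert_html(b"<p>Hello</p>") corresponds to the combined example above, with the redirections replaced by pipes.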

API Comparison

The following table shows a comparison between the API methods available in the Java library, in clients and as RESTful resources of the PDFreactor Web Service. Please note that depending on the client language, the method signature might be slightly different.

API Comparison

Java library | Client | REST resource (HTTP method) | Description
convert(Configuration) | convert(Configuration) | /convert (POST) | Converts the input document to PDF or image synchronously
convert(Configuration, OutputStream) | Not available | Not available | Converts the input document to PDF or image synchronously and writes it directly in the OutputStream
Not available | convertAsBinary(Configuration) | /convert.pdf (POST) | Converts the input document to PDF or image synchronously and returns the binary data directly
Not available | convertAsBinary(Configuration, Stream) | /convert.pdf (POST) | Converts the input document to PDF or image synchronously and streams the binary data directly to the given stream
Not available | convertAsync(Configuration) | /convert/async (POST) | Converts the input document to PDF or image asynchronously
Not available | getProgress(id) | /progress/{id} (GET) | Checks the progress of an asynchronous conversion
Not available | getDocument(id) | /document/{id} (GET) | Retrieves the converted PDF or image
Not available | getDocumentAsBinary(id) | /document/{id}.bin (GET) | Retrieves the converted PDF or image directly as binary data
Not available | getDocumentMetadata(id) | /document/metadata/{id} (GET) | Retrieves the metadata of the converted PDF or image
Not available | Not available | /document/{id}/{page} (GET) | Retrieves the specified page of a converted multi-page image directly as binary data
Not available | deleteDocument(id) | /document/{id} (DELETE) | Deletes the converted PDF or image from the server
Not available | getStatus() | /status (GET) | Checks if the PDFreactor Web Service is responsive and able to convert
VERSION | getVersion() | /version (GET) | Gets the version of PDFreactor

The API /document/{id}/{page} is only available in REST. In the Java library and the clients, you can simply access the appropriate entry of the array property documentArray of the Result object.

Some methods do not return anything directly (e.g. deleteDocument and getStatus), however, all methods throw appropriate exceptions. RESTful resources respond with appropriate status codes.

The method getVersion does not exist in the API of the Java library. There, the version is available as the constant VERSION.

What API Method Should I Use?

When using PDFreactor Web Service clients, you have several convert API methods (or RESTful resources) at your disposal. Depending on the use case, some API methods are more efficient than others.

Small Documents

Simple Case

Small and simple documents are best converted using the convertAsBinary API method. This method is the most efficient since the document is returned as binary data without any additional overhead.

Since the PDF data is streamed as soon as it is available, it is not possible for PDFreactor to relay errors to the client that occur while writing the PDF. For full error handling use convert or convertAsync instead.

Complex Case

For more complex documents you should use the convert API method. This returns a result object containing the document as a base64-encoded string, as well as a log, number of pages and exceeding content information. When using this method, the PDF document is converted and stored in-memory. It also has slightly more overhead but the result object contains helpful information about the conversion.
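
As a minimal sketch of handling such a result in Python: the snippet below assumes the result is available as a dictionary whose "document" entry holds the base64-encoded string. The key name and the helper save_document are assumptions made for illustration, not confirmed client API.

```python
import base64
import os
import tempfile


def save_document(result, path):
    """Decode the base64-encoded document of a result and write it to path."""
    # Assumption: the result is a dict with the base64 string under "document".
    data = base64.b64decode(result["document"])
    with open(path, "wb") as handle:
        handle.write(data)
    return len(data)


# Stand-in result; a real one would come from the convert API method.
fake_result = {"document": base64.b64encode(b"%PDF-1.7 demo").decode("ascii")}
target = os.path.join(tempfile.gettempdir(), "pdfreactor-demo.pdf")
written = save_document(fake_result, target)
```

Because the whole document travels as one string inside the result, this approach is best kept for documents of moderate size.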

Large Documents

When converting large documents, you should convert asynchronously using the convertAsync API method. This has several advantages. Firstly, the connection to the server is closed directly after the conversion request has been received. This avoids keeping connections open for extended periods of time, which is prone to timeouts and errors. Secondly, the client's integration does not block during the conversion, and you have more control over when to retrieve the converted document. Lastly, the document is stored on the file system of the server, so it does not occupy any memory.
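
The asynchronous workflow can be sketched in Python as a simple polling loop. The method names convertAsync, getProgress and getDocument are taken from the API table above. The boolean "finished" entry of the progress object, the helper convert_and_wait and the FakeClient stand-in are assumptions made so that the sketch runs without a server.

```python
import time


def convert_and_wait(client, config, poll_interval=1.0, max_polls=600):
    """Start an asynchronous conversion and poll until it has finished."""
    document_id = client.convertAsync(config)
    for _ in range(max_polls):
        progress = client.getProgress(document_id)
        # Assumption: the progress object exposes a boolean "finished" entry.
        if progress.get("finished"):
            return client.getDocument(document_id)
        time.sleep(poll_interval)
    raise TimeoutError("conversion %s did not finish in time" % document_id)


class FakeClient:
    """Stand-in for a Web Service client; it finishes after a few polls."""

    def __init__(self, polls_until_done):
        self.remaining = polls_until_done

    def convertAsync(self, config):
        return "demo-id"

    def getProgress(self, document_id):
        self.remaining -= 1
        return {"finished": self.remaining <= 0}

    def getDocument(self, document_id):
        return {"id": document_id}


result = convert_and_wait(FakeClient(3), {"document": "<html></html>"}, poll_interval=0)
```

Bounding the number of polls keeps the integration from waiting indefinitely if a conversion never completes.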

Logging

PDFreactor can produce a detailed log of the entire conversion. To enable logging you have to set an appropriate log level first using the configuration property logLevel, e.g. like this:

config.setLogLevel(LogLevel.WARN);
config.LogLevel = LogLevel.WARN;
$config["logLevel"] = LogLevel::WARN;
config['logLevel'] = PDFreactor.LogLevel.WARN
config['logLevel'] = PDFreactor::LogLevel::WARN
config.logLevel = PDFreactor.LogLevel.WARN;
config.logLevel = PDFreactor.LogLevel.WARN;
$config["logLevel"] = PDFreactor::LogLevel->WARN;
{ "logLevel": "WARN" }
--logLevel WARN

To retrieve the logs, use the log property of the Result object. This gives you a Log object and access to the following logs:

Main log
The main log contains all relevant log information for that conversion. It can be accessed via the records property of a Log object.
CSS log
This log contains detailed information for certain CSS warnings or errors. Those may occur in abbreviated form in the main log but are usually not critical for the conversion. It can be accessed via the recordsCss property.
JavaScript log
PDFreactor logs JavaScript output similar to a browser. While it is also available in the main log, the JavaScript log provides a more comprehensive and machine-readable access to the output. It can be accessed via the recordsJs property.

Additionally, you can retrieve the logs using appropriate debug settings. Refer to Debug Settings for more information.

Examples

The following examples show how to enable logging by setting an appropriate log level and then appending the log to the generated PDF.

Configuration config = new Configuration();
config.setLogLevel(LogLevel.DEBUG);
config.setDebugSettings(new DebugSettings()
    .setAppendLogs(true));
Configuration config = new Configuration
{
    LogLevel = LogLevel.DEBUG,
    DebugSettings = new DebugSettings
    {
        AppendLogs = true
    }
};
$config = array(
    "logLevel" => LogLevel::DEBUG,
    "debugSettings" => array(
        "appendLogs" => true
    )
);
config = {
    'logLevel': PDFreactor.LogLevel.DEBUG,
    'debugSettings': {
        'appendLogs': True
    }
}
config = {
    logLevel: PDFreactor::LogLevel::DEBUG,
    debugSettings: {
        appendLogs: true
    }
}
config = {
    logLevel: PDFreactor.LogLevel.DEBUG,
    debugSettings: {
        appendLogs: true
    }
}
config = {
    logLevel: PDFreactor.LogLevel.DEBUG,
    debugSettings: {
        appendLogs: true
    }
}
$config = {
    'logLevel' => PDFreactor::LogLevel->DEBUG,
    'debugSettings' => {
        'appendLogs' => true
    }
}
{ "logLevel": "DEBUG", "debugSettings": { "all": true }}
-d

Conversion Name

You can specify an arbitrary name for each conversion using the conversionName configuration property. This name will be logged as the first and last line in each conversion log. This makes it easy to match a conversion log to a particular document.

Log Capacity

During the course of the conversion, PDFreactor stores several messages in internal logs so that they can be accessed afterwards. Those internal logs have a limited capacity. By default, each log stores 100 000 entries. This should be sufficient for most documents. In the rare cases where this number needs to be adjusted, you can use the configuration property logMaxLines like this:

config.setLogMaxLines(100);
config.LogMaxLines = 100;
$config["logMaxLines"] = 100;
config['logMaxLines'] = 100
config['logMaxLines'] = 100
config.logMaxLines = 100;
config.logMaxLines = 100;
$config["logMaxLines"] = 100;
{ "logMaxLines": 100 }
--logMaxLines 100

If the log capacity is exceeded, the oldest entries will be removed to make room for the new ones.
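
The eviction behavior described above can be illustrated with a bounded queue in Python. This is only a model of the documented behavior, not PDFreactor's actual implementation; the helper make_log is hypothetical.

```python
from collections import deque


def make_log(max_lines):
    """Create a log that keeps only the newest max_lines entries."""
    # A deque with maxlen silently drops the oldest entry when it is full.
    return deque(maxlen=max_lines)


log = make_log(3)
for number in range(1, 6):
    log.append("message %d" % number)

remaining = list(log)
```

With a capacity of 3 and five messages, only the three most recent messages remain, so a capacity that is too small can hide the beginning of a conversion log.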

License Key

Evaluation Mode

Without a license key PDFreactor runs in evaluation mode. In evaluation mode it is possible to integrate and test PDFreactor just like the full version but the resulting PDF document will include watermarks and additional evaluation pages.

Receiving a License Key

To obtain a license key, please visit the PDFreactor website (https://www.pdfreactor.com). It provides information about all available licenses and how to receive license keys.

Setting the License Key

RealObjects provides you with a license key file in XML format.

The license key can be set as a string using the licenseKey configuration property.

String licensekey = "<license>... your license ...</license>";
config.setLicenseKey(licensekey);
string licensekey = "<license>... your license ...</license>";
config.LicenseKey = licensekey;
$licensekey = "<license>... your license ...</license>";
$config["licenseKey"] = $licensekey;
licensekey = "<license>... your license ...</license>"
config['licenseKey'] = licensekey
licensekey = "<license>... your license ...</license>"
config['licenseKey'] = licensekey
const licensekey = "<license>... your license ...</license>";
config.licenseKey = licensekey;
const licensekey = "<license>... your license ...</license>";
config.licenseKey = licensekey;
$licensekey = "<license>... your license ...</license>";
$config["licenseKey"] = $licensekey;
{ "licenseKey": "<license>... your license ...</license>" }
--licenseKey "<license>... your license ...</license>"

You can ensure that no evaluation or license notices are added to PDF documents by using an appropriate error policy:

config.setErrorPolicies(ErrorPolicy.LICENSE);
config.ErrorPolicies = new List<ErrorPolicy> { ErrorPolicy.LICENSE };
$config["errorPolicies"] = array(ErrorPolicy::LICENSE);
config['errorPolicies'] = [ PDFreactor.ErrorPolicy.LICENSE ]
config['errorPolicies'] = [ PDFreactor::ErrorPolicy::LICENSE ]
config.errorPolicies = [ PDFreactor.ErrorPolicy.LICENSE ];
config.errorPolicies = [ PDFreactor.ErrorPolicy.LICENSE ];
$config["errorPolicies"] = [ PDFreactor::ErrorPolicy->LICENSE ];
{ "errorPolicies": [ "LICENSE" ] }
--errorPolicies LICENSE

This forces PDFreactor to throw an exception instead of adding notices to PDF documents (see Error Policies).

Setting the License Key in the Web Service

For integrators that use the PDFreactor Web Service with either one of the clients or the REST API, it may be useful to not set the license key in their client-side integration. In this case, you can just copy the licensekey.txt file to the PDFreactor/jetty/lib/ext directory (where the pdfreactor.jar and the pdfreactor-webservice.jar files are located). PDFreactor will scan for a license key file in that location and use it if one is found.

See Docker Configuration on how to deploy a license key when using the PDFreactor Docker image.

Observing Document Content

When converting documents into PDF, it may be desirable to programmatically observe certain parts of the document content to ensure that the PDF result is as expected. This can be especially important for highly dynamic input documents for which the result might not have been validated prior to the conversion.

There are currently two parts of the content that can be observed: Exceeding content and missing resources. Exceeding content observes content that overflows certain boundaries, missing resources observes all resources that could not be loaded during conversion.

All content observed this way is logged in the normal PDFreactor log. In addition to that, it is logged in separate, machine-parsable logs which can be retrieved and analyzed after the conversion has finished to verify the result.

A content observer can be configured like this:

ContentObserver contentObserver = new ContentObserver();
// set up contentObserver, see below...
config.setContentObserver(contentObserver);
ContentObserver contentObserver = new ContentObserver();
// set up contentObserver, see below...
config.ContentObserver = contentObserver;
$contentObserver = array();
// set up contentObserver, see below...
$config["contentObserver"] = $contentObserver;
contentObserver = {}
# set up contentObserver, see below...
config['contentObserver'] = contentObserver
contentObserver = {}
# set up contentObserver, see below...
config['contentObserver'] = contentObserver
const contentObserver = {};
// set up contentObserver, see below...
config.contentObserver = contentObserver;
const contentObserver = {};
// set up contentObserver, see below...
config.contentObserver = contentObserver;
$contentObserver = {};
# set up contentObserver, see below...
$config["contentObserver"] = $contentObserver;
{ "contentObserver": {set up contentObserver, see below...} }
-C config.json

With the following config.json:

{ "contentObserver": {set up contentObserver, see below...} }

Exceeding Content

Content that does not fit into its pages can be logged as well as programmatically analyzed. This functionality is enabled and configured by using the content observer and requires two arguments:

The first one specifies what to analyze:
Constant                                          Description
ExceedingContentAnalyze.NONE                      Disable this functionality (default)
ExceedingContentAnalyze.CONTENT                   Analyze content (text and images) only
ExceedingContentAnalyze.CONTENT_AND_BOXES         Analyze content as well as boxes (catches exceeding borders and backgrounds)
ExceedingContentAnalyze.CONTENT_AND_STATIC_BOXES  Analyze content as well as boxes, except for those with absolute or relative positioning
The second one specifies how to analyze:
Constant                                          Description
ExceedingContentAgainst.NONE                      Disable this functionality (default)
ExceedingContentAgainst.PAGE_BORDERS              Find content exceeding the actual edges of the page
ExceedingContentAgainst.PAGE_CONTENT              Find content exceeding the page content area (avoids content extending into the page margins)
ExceedingContentAgainst.PARENT                    Find content exceeding its parent (i.e. any visible overflow)

For example:

contentObserver
    .setExceedingContentAnalyze(ExceedingContentAnalyze.CONTENT_AND_STATIC_BOXES)
    .setExceedingContentAgainst(ExceedingContentAgainst.PAGE_CONTENT);
contentObserver.ExceedingContentAnalyze = ExceedingContentAnalyze.CONTENT_AND_STATIC_BOXES;
contentObserver.ExceedingContentAgainst = ExceedingContentAgainst.PAGE_CONTENT;
$contentObserver["exceedingContentAnalyze"] = ExceedingContentAnalyze::CONTENT_AND_STATIC_BOXES;
$contentObserver["exceedingContentAgainst"] = ExceedingContentAgainst::PAGE_CONTENT;
contentObserver['exceedingContentAnalyze'] = PDFreactor.ExceedingContentAnalyze.CONTENT_AND_STATIC_BOXES
contentObserver['exceedingContentAgainst'] = PDFreactor.ExceedingContentAgainst.PAGE_CONTENT
contentObserver['exceedingContentAnalyze'] = PDFreactor::ExceedingContentAnalyze::CONTENT_AND_STATIC_BOXES
contentObserver['exceedingContentAgainst'] = PDFreactor::ExceedingContentAgainst::PAGE_CONTENT
contentObserver.exceedingContentAnalyze = PDFreactor.ExceedingContentAnalyze.CONTENT_AND_STATIC_BOXES;
contentObserver.exceedingContentAgainst = PDFreactor.ExceedingContentAgainst.PAGE_CONTENT;
contentObserver.exceedingContentAnalyze = PDFreactor.ExceedingContentAnalyze.CONTENT_AND_STATIC_BOXES;
contentObserver.exceedingContentAgainst = PDFreactor.ExceedingContentAgainst.PAGE_CONTENT;
$contentObserver["exceedingContentAnalyze"] = PDFreactor::ExceedingContentAnalyze->CONTENT_AND_STATIC_BOXES;
$contentObserver["exceedingContentAgainst"] = PDFreactor::ExceedingContentAgainst->PAGE_CONTENT;
{ "exceedingContentAnalyze": "CONTENT_AND_STATIC_BOXES",
  "exceedingContentAgainst": "PAGE_CONTENT" }
-C config.json

With the following config.json:

{ "contentObserver": {
    "exceedingContentAnalyze": "CONTENT_AND_STATIC_BOXES",
    "exceedingContentAgainst": "PAGE_CONTENT" }}

To programmatically process the results you can get an array of ExceedingContent objects using the property exceedingContents. Please see the API documentation for details on this class.

Missing Resources

To ensure that all resources referenced in the input document (or in other resources) are loaded, configure the content observer like this:

contentObserver.setMissingResources(true);
contentObserver.MissingResources = true;
$contentObserver["missingResources"] = true;
contentObserver['missingResources'] = True
contentObserver['missingResources'] = true
contentObserver.missingResources = true;
contentObserver.missingResources = true;
$contentObserver["missingResources"] = true;
{ "missingResources": true }
-C config.json

With the following config.json:

{ "contentObserver": {
    "missingResources": true }}

After the conversion, you can access and analyze a log containing all missing resources using the property missingResources. It returns an array of MissingResource objects which contain the resource description, the type (e.g. style sheet, image, etc.) as well as a description of why the resource is missing. If the log is null, no resources are missing. Please see the API documentation for details on this class.

Connections

It is also possible to log all connections or connection attempts performed by PDFreactor. For this, configure the content observer like this:

contentObserver.setConnections(true);
contentObserver.Connections = true;
$contentObserver["connections"] = true;
contentObserver['connections'] = True
contentObserver['connections'] = true
contentObserver.connections = true;
contentObserver.connections = true;
$contentObserver["connections"] = true;
{ "connections": true }
-C config.json

With the following config.json:

{ "contentObserver": {
    "connections": true }}

A log containing all connections or connection attempts can be accessed after the conversion via the connections property. It returns an array of Connection objects which contain data about the connection. For HTTP connections, the data includes the status code as well as request and response headers. Please see the API documentation for details on this class.

Please note that connections that were blocked due to security settings are not included in this log since PDFreactor blocked the connection before even attempting to open it.
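
A minimal sketch of verifying a result after the conversion, in Python. It assumes the result is available as a dictionary that carries the exceedingContents, missingResources and connections properties named above. Treating a null log as empty is documented only for missing resources and is an assumption for the other two. The helpers summarize_observations and is_clean are hypothetical, and the entries of the stand-in result are empty placeholders because their fields are described in the API documentation rather than here.

```python
OBSERVER_LOGS = ("exceedingContents", "missingResources", "connections")


def summarize_observations(result):
    """Count the entries of each content observer log in a result."""
    summary = {}
    for key in OBSERVER_LOGS:
        entries = result.get(key)
        # A missing or null log is treated as "nothing recorded".
        summary[key] = 0 if entries is None else len(entries)
    return summary


def is_clean(result):
    """Return True if nothing exceeded its bounds and no resource was missing."""
    summary = summarize_observations(result)
    return summary["exceedingContents"] == 0 and summary["missingResources"] == 0


# Stand-in result with placeholder entries instead of real log objects.
fake_result = {
    "exceedingContents": [{}],
    "missingResources": None,
    "connections": [{}, {}],
}
summary = summarize_observations(fake_result)
clean = is_clean(fake_result)
```

A check like this can gate further processing, for example by rejecting a document whose content overflows its pages.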

Error Policies

It is possible to adjust PDFreactor's default error policy. Depending on the configured policy, the conversion will fail if certain criteria are met. The following error policies can be set and will terminate the conversion:

Error policies can be set like this:

config.setErrorPolicies(
    ErrorPolicy.LICENSE,
    ErrorPolicy.MISSING_RESOURCE);
config.ErrorPolicies = new List<ErrorPolicy>
{
    ErrorPolicy.LICENSE,
    ErrorPolicy.MISSING_RESOURCE
};
$config["errorPolicies"] = array(
    ErrorPolicy::LICENSE,
    ErrorPolicy::MISSING_RESOURCE);
config['errorPolicies'] = [
    PDFreactor.ErrorPolicy.LICENSE,
    PDFreactor.ErrorPolicy.MISSING_RESOURCE ]
config['errorPolicies'] = [
    PDFreactor::ErrorPolicy::LICENSE,
    PDFreactor::ErrorPolicy::MISSING_RESOURCE ]
config.errorPolicies = [
    PDFreactor.ErrorPolicy.LICENSE,
    PDFreactor.ErrorPolicy.MISSING_RESOURCE ];
config.errorPolicies = [
    PDFreactor.ErrorPolicy.LICENSE,
    PDFreactor.ErrorPolicy.MISSING_RESOURCE ];
$config["errorPolicies"] = [
    PDFreactor::ErrorPolicy->LICENSE,
    PDFreactor::ErrorPolicy->MISSING_RESOURCE ];
{ "errorPolicies": [ "LICENSE", "MISSING_RESOURCE" ] }
--errorPolicies LICENSE MISSING_RESOURCE

Limiting Conversion Times

To limit conversion times and to prevent certain inputs from causing extremely long or even indefinite conversion times, you can specify timeouts. If a timeout is exceeded, the conversion will be aborted.

Conversion times can be limited by specifying a conversionTimeout in seconds.

config.setConversionTimeout(30);
config.ConversionTimeout = 30;
$config["conversionTimeout"] = 30;
config['conversionTimeout'] = 30
config['conversionTimeout'] = 30
config.conversionTimeout = 30;
config.conversionTimeout = 30;
$config["conversionTimeout"] = 30;
{ "conversionTimeout": 30 }
--conversionTimeout 30

To specifically limit JavaScript processing times, see JavaScript Timeout.

To limit resource loading times, see Resource Timeout. These timeouts will not cause the conversion to abort.

Development and Debugging Tools

Debug Settings

When integrating PDFreactor, especially during the trial and development phases, it might be useful to retrieve debugging information about the conversion. The most convenient way to do this is by enabling the various debugging tools of PDFreactor. This can be done in the configuration like this:

config.setDebugSettings(new DebugSettings().setAll(true));
config.DebugSettings = new DebugSettings { All = true };
$config["debugSettings"] = array("all" => true);
config['debugSettings'] = { "all": True }
config['debugSettings'] = { "all": true }
config.debugSettings = { all: true };
config.debugSettings = { all: true };
$config["debugSettings"] = { "all" => true };
{ "debugSettings": { "all": true }}
-d

This causes PDFreactor to do the following:

  • Set the log level to the most verbose level, i.e. LogLevel.PERFORMANCE.

  • Append logs to the generated PDF with that log level. Can be controlled with the appendLogs property of the DebugSettings object.

  • Attach various debug files to the generated PDF. Can be controlled with the attachConfiguration, attachDocuments, attachResources, and attachLogs properties of the DebugSettings object.

  • No longer throw any exceptions. Instead, in case of an exception, a text document is returned that contains the conversion log as well as the exception that would have been thrown. Can be controlled with the forceResult property of the DebugSettings object.

The following debug files are attached by default:

Debug Files
Group          Attachment URL         File                       Description
documents      #, #originalsource     OriginalSource.txt         The original input document
               #finalsource           FinalSource.txt            The input document after XSLT preprocessing
               #originaldocument      OriginalDocument.txt       The initially parsed input document
               #originaldocumentpp    OriginalDocumentPP.txt     A pretty-printed version of the above
               #finaldocument         FinalDocument.txt          The input document after all modifications (JavaScript etc.) are completed
               #finaldocumentpp       FinalDocumentPP.txt        A pretty-printed version of the above
configuration  #configuration         Configuration.txt          The configuration object passed to the PDFreactor instance
                                      ClientConfiguration.txt    The configuration object sent to the PDFreactor Web Service (if used)
resources      #resources             Resources.dat              All used external resources like style sheets, scripts, images etc. as a ZIP file
logs           #log                   Log.txt                    The main PDFreactor conversion log
               #logcss                LogCss.txt                 The PDFreactor CSS log
               #logjavascript         LogJavaScript.txt          The PDFreactor JavaScript log
               #systemproperties      SystemProperties.txt       A list of the current Java system properties
               #connections           Connections.txt            A log of all URL connection attempts performed by PDFreactor
               #missingresources      MissingResources.txt       A log of all resources that could not be loaded

Debug settings are intended for investigation purposes only and not for production use. Activating some or all debug settings may change other configuration properties, such as the log level. This is done for convenience to get the most verbose result when debugging.

Controlling Debug Behavior

If only specific debugging tools are required, instead of setting the all property, you can use the appropriate debug settings to enable the desired setting manually. The following properties are available:

  • all — Activates all of the following debugging tools

  • attachDocuments — Attaches all debug files belonging to the group "documents"

  • attachResources — Attaches all debug files belonging to the group "resources"

  • attachLogs — Attaches all debug files belonging to the group "logs"

  • appendLogs — Appends the PDFreactor log to the generated PDF

  • forceResult — Forces PDFreactor to return a result even if an exception occurred during the conversion

Debug File Dump

In certain cases where no converted document could be created (e.g. when a specific PDF/A conformance could not be achieved) it may be helpful to have access to the debug files mentioned previously. To do this, it is possible to specify a local directory when configuring the debug settings. If such a directory is specified, PDFreactor will attempt to write all available debug files as a single ZIP into that directory. The local directory can be specified like this:

config.setDebugSettings(new DebugSettings()
    .setAll(true)
    .setLocalDirectory(Paths.get("c:\\debug")));

Use the debugLocalDir server parameter to configure the location.

-d c:\debug

Note: This is only available in the Java CLI. For the Python CLI, use the debugLocalDir server parameter to configure the location.

PDFreactor will create a ZIP file with the naming scheme

PDFreactor-dump-yyyy-MM-dd-HH-mm-ss-SSS

where yyyy-MM-dd-HH-mm-ss-SSS represents the serialized date of the dump.
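
To locate dumps programmatically, the naming scheme can be reproduced in Python as sketched below. The helper dump_name is hypothetical, and the sketch makes no assumption about a file extension or about the time zone PDFreactor uses, since neither is stated here.

```python
from datetime import datetime


def dump_name(moment):
    """Format a timestamp as PDFreactor-dump-yyyy-MM-dd-HH-mm-ss-SSS."""
    # SSS is the millisecond part; strftime only offers microseconds.
    millis = moment.microsecond // 1000
    return "PDFreactor-dump-%s-%03d" % (moment.strftime("%Y-%m-%d-%H-%M-%S"), millis)


example = dump_name(datetime(2024, 3, 5, 14, 7, 9, 42000))

# All fields are zero-padded, so sorting names as text sorts them by time.
names = [dump_name(datetime(2023, 12, 31, 23, 59, 59, 999000)), example]
newest = max(names)
```

Because every field has a fixed width, the newest dump in a directory is simply the name that sorts last.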

When using the PDFreactor Web Service, the local directory property is not available. Instead, use the corresponding server parameter debugLocalDir.

Attaching Debug Files Manually

If you only want specific debug files attached, you can forgo enabling the debugging tools entirely and use the attachments feature to make PDFreactor attach the appropriate file. For that, use the URLs mentioned in the Debug Files table.

Inspectable Documents

To create inspectable documents that can be used with the PDFreactor Inspector application, use the inspectableSettings configuration option like this:

config.setInspectableSettings(new InspectableSettings()
    .setEnabled(true));
config.InspectableSettings = new InspectableSettings
{
    Enabled = true
};
config.inspectableSettings = {
    enabled: true
};
config.inspectableSettings = {
    enabled: true
};
$config["inspectableSettings"] = array(
    "enabled" => true
);
config['inspectableSettings'] = {
    'enabled': True
}
config['inspectableSettings'] = {
    enabled: true
}
$config["inspectableSettings"] = {
    'enabled' => true
};
{ "inspectableSettings": {
    "enabled": true
}}

Shorthand:

-I

Longhand:

-C config.json

With the following config.json:

{ "inspectableSettings": {
    "enabled": true
}}

A license key is required to enable the creation of inspectable documents.

Creating inspectable documents increases the conversion time and may require additional memory.

Docker Configuration

Java Options

When using the PDFreactor Docker image, Java arguments such as memory and Java system properties can be specified by passing an environment variable called JAVA_OPTIONS to the container on startup. If you are using Docker Compose, you can also specify the JAVA_OPTIONS environment variable using the environment key in your compose file.

Additional Configuration

The internal directory /ro/config is used for various configurations for the Docker container, so it is recommended that you mount this directory. The following can be configured by simply deploying files in this config directory:

License key

A license key can be deployed to /ro/config/licensekey.txt so that it is automatically loaded by the container.

Custom fonts

PDFreactor automatically loads fonts in the /ro/config/fonts directory and subdirectories.

Server parameters

Instead of using Java system properties or environment variables, server parameters can also be specified in a configuration file which will automatically be loaded when deployed to /ro/config/pdfreactorwebservice.config.

A Docker Compose file that configures memory, maximum parallel conversions and the port as well as a configuration directory could look like this:

version: "2"
services:
  pdfreactor:
    image: realobjects/pdfreactor
    container_name: pdfreactor
    ports:
      - "80:9423"
    volumes:
      - /your/config:/ro/config
    environment:
      JAVA_OPTIONS: >
        -Xmx2g
        -Dcom.realobjects.pdfreactor.webservice.threadPoolSize=4

Security

PDFreactor converts HTML or XML documents which can contain external style sheets, scripts, images or other resources. Depending on the use case, these documents and resources may come from untrusted sources, such as third-party users. This means they might contain malicious code or content which may be used to access private resources through Server-Side Request Forgery.

To protect against potential attacks, PDFreactor has a security layer in place which restricts certain functionality and filters URLs according to configurable security settings via the configuration properties securitySettings and customUrlStreamHandlers, with the latter only available in the Java library.

When using the PDFreactor Web Service, use appropriate "securitySettings" server parameters instead of configuration properties to configure the security settings. Please note that for custom connection rules, you have to specify a URL or path to an external JSON file. customUrlStreamHandlers is not available in the PDFreactor Web Service.

Depending on your use case and processing chain, you should consider supplementing the security features offered by PDFreactor with your own security measures that can protect your system e.g. on the network layer (such as firewalls), which is beyond the scope of PDFreactor.

Connection Security

Whenever PDFreactor attempts a URL connection to a source from an untrusted security context, the URL is vetted against certain criteria before the connection is opened.

Trusted and Untrusted Contexts

PDFreactor distinguishes between two security contexts when applying the security settings: Trusted and untrusted. The PDFreactor API (i.e. the configuration object that is passed to the convert methods) is considered a trusted security context, because usually only integrators have access to it. Any documents or resources that are specified there are not subject to the connection security, although still works. So no matter how you configure the connection security settings, resources specified in configuration properties such as document, userStyleSheets, baseUrl etc. are always allowed because it is assumed they have been set by the integrator.

Please note that this is not transitive. Even though user style sheets and user scripts are always allowed, resources that they load, e.g. via an "@import" rule or XHR, are subject to the connection security.

System fonts can also always be loaded. However, they can be disabled separately.

All other resources, especially those that are part of the input document which is potentially produced by untrusted third parties, are vetted according to the configured security settings.

Untrusted Clients

When using PDFreactor as a publicly available service or in certain other scenarios, PDFreactor processes configurations that may not have been specified by the integrator or that come from user machines which are by default untrusted environments. Additionally, if at any point in your processing chain it is possible for third parties to inject code or content into the configuration object, then the entire configuration object should be considered untrusted.

This is also the case when your PDFreactor integration code is executed on client machines (e.g. when using a JavaScript integration). In this case, your integration code is vulnerable and should not be considered safe.

To protect yourself, you can use the untrustedApi property to configure the security layer in such a way that PDFreactor treats the API as an untrusted context. This means that all security checks are also applied to any resources specified in the PDFreactor configuration object, including the input document. In addition to that, server-specific information is omitted from the logs.

Automatic Redirects

By default, PDFreactor follows redirects automatically. You can disable this with the allowRedirects property:

config.setSecuritySettings(new SecuritySettings()
    .setAllowRedirects(false));

Use the securitySettings server parameters to configure security.

Connection Rules

You can define security rules that either deny or allow connections to certain resources. These rules support wildcard patterns for their hosts and paths. Each rule also has a priority. Rules are evaluated in order of their priority, starting with the highest priority value. If rules have the same priority, they are evaluated in the same order as they were inserted in the API. The priority is 0 by default.

If a resource is not matched by any of the rules (or if there are no rules), the default security behavior is applied.

If multiple resource properties of a rule such as protocol, host, port or path are specified, the resource must match all of the defined properties.
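
The evaluation order can be modeled with a short Python sketch. It assumes that the first matching rule decides the outcome, which the text implies but does not state outright. The Rule class, the evaluate function and the "DEFAULT" marker are hypothetical, and hosts are compared literally here instead of using wildcard patterns.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    action: str  # "ALLOW" or "DENY"
    protocol: Optional[str] = None
    host: Optional[str] = None
    priority: int = 0

    def matches(self, protocol, host):
        # Every property that is set must match; unset properties match anything.
        if self.protocol is not None and self.protocol != protocol:
            return False
        if self.host is not None and self.host.lower() != host.lower():
            return False
        return True


def evaluate(rules, protocol, host, default="DEFAULT"):
    """Return the action of the first matching rule, highest priority first."""
    # sorted() is stable, so rules with equal priority keep insertion order.
    for rule in sorted(rules, key=lambda r: -r.priority):
        if rule.matches(protocol, host):
            return rule.action
    return default


rules = [
    Rule("Deny all", "DENY", priority=-1),
    Rule("Allow internal host", "ALLOW", host="company-cms"),
    Rule("Allow CDN over HTTPS", "ALLOW", protocol="https", host="cdn.company.com"),
]
```

The "Deny all" rule is inserted first but is still evaluated last because of its lower priority.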

When PDFreactor vets resource paths according to security policies, it normalizes the path, ignoring any query parameters and the fragment component. Additionally, relative path segments are resolved and non-URI characters are URL encoded. So for the purposes of path vetting, the path

/part/../resource path/file?param=value#fragment

is normalized to

/resource%20path/file
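
The same normalization can be approximated in Python to check which path a connection rule will actually see. This is a sketch built on the standard library, not PDFreactor's own implementation, so edge cases such as trailing slashes may differ. The helper normalize_path is hypothetical.

```python
import posixpath
from urllib.parse import quote, urlsplit


def normalize_path(url):
    """Approximate the path normalization used for path vetting."""
    path = urlsplit(url).path        # drops the query and the fragment
    path = posixpath.normpath(path)  # resolves "." and ".." segments
    # Encode spaces and similar characters; "%" stays so escapes are not doubled.
    return quote(path, safe="/%")


normalized = normalize_path("/part/../resource path/file?param=value#fragment")
```

Running candidate URLs through such a helper before writing path patterns helps avoid rules that never match because of unencoded characters.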

Both the host and the path in connection rules support wildcard patterns using the "?" and "*" characters. "?" matches a single character, while "*" matches any single path segment (when used in the path property) or one domain label (when used in the host property). To match zero or any number of path segments or domain labels, use "**" instead.

Important: Invalid URI characters (according to RFC 2396) must be URL encoded for path segments!

The matching of hosts is always case-insensitive. The matching of paths is case-insensitive, unless the property caseSensitivePath of the connection rule is set to true.

Note that path patterns must always start with a slash.

This example illustrates how to allow connections to the internal host "company-cms" as well as connections to certain paths of a publicly available CDN. All other connections are automatically denied.

config.setSecuritySettings(new SecuritySettings()
    .setConnectionRules(
        new ConnectionRule()
            .setAction(ConnectionRuleAction.ALLOW)
            .setName("Allow internal company CMS")
            .setHost("company-cms"),
        new ConnectionRule()
            .setAction(ConnectionRuleAction.ALLOW)
            .setName("Allow public company CDN")
            .setProtocol("https")
            .setHost("cdn.company.com")
            .setPath("/public%20assets/**"), // Encode invalid URI characters
        new ConnectionRule()
            .setAction(ConnectionRuleAction.DENY)
            .setName("Deny all")
            .setPath("/**")
            .setPriority(-1) // Make sure this rule is evaluated last
    )
);

Use the securitySettings server parameters to configure security.

The pattern

*.pdfreactor.com

matches the hosts

cloud.pdfreactor.com
www.pdfreactor.com

but not

pdfreactor.com
www.cloud.pdfreactor.com

To match these hosts as well, you could use

**.pdfreactor.com

To allow only CSS files, more specifically files with the extension "css", regardless of the host and path, you could use the following path pattern:

/**/*.css
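
The wildcard semantics of the host and path examples above can be tried out with a small matcher. This is only a sketch under stated assumptions, not PDFreactor's matching code: the class and its methods are hypothetical, matching is always case-insensitive here, and a "*" inside a longer segment (as in "*.css") is assumed to match any run of characters within that segment.

```java
import java.util.regex.Pattern;

// Hypothetical matcher that mimics the documented wildcard behavior.
final class WildcardMatcherSketch {
    static boolean matchesHost(String pattern, String host) {
        return matches(pattern.split("\\.", -1), 0, host.split("\\.", -1), 0);
    }

    static boolean matchesPath(String pattern, String path) {
        return matches(pattern.split("/", -1), 0, path.split("/", -1), 0);
    }

    // Compares pattern segments (p) with value segments (v), left to right.
    private static boolean matches(String[] p, int pi, String[] v, int vi) {
        if (pi == p.length) {
            return vi == v.length;
        }
        if (p[pi].equals("**")) {
            // "**" consumes zero or more whole segments (labels).
            for (int next = vi; next <= v.length; next++) {
                if (matches(p, pi + 1, v, next)) {
                    return true;
                }
            }
            return false;
        }
        return vi < v.length
            && matchesSegment(p[pi], v[vi])
            && matches(p, pi + 1, v, vi + 1);
    }

    // Within one segment: "?" is one character, "*" is any run of characters.
    private static boolean matchesSegment(String pattern, String segment) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '?') {
                regex.append('.');
            } else if (c == '*') {
                regex.append(".*");
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.CASE_INSENSITIVE)
            .matcher(segment).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesHost("*.pdfreactor.com", "cloud.pdfreactor.com"));
        System.out.println(matchesHost("*.pdfreactor.com", "pdfreactor.com"));
        System.out.println(matchesHost("**.pdfreactor.com", "pdfreactor.com"));
        System.out.println(matchesPath("/**/*.css", "/styles/main.css"));
    }
}
```

Splitting the pattern and the value into segments first keeps "*" from ever crossing a "." or "/" boundary, which is what distinguishes it from "**".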

To ensure that no URLs can be accessed, you can deny all URLs with a rule:

config.setSecuritySettings(new SecuritySettings()
    .setConnectionRules(
        new ConnectionRule()
            .setAction(ConnectionRuleAction.DENY)
            .setName("Deny all")
            .setPath("/**")
            .setPriority(-1)
    )
);

Use the securitySettings server parameters to configure security.

Make sure to set the path property to "/**", so that it works for URL types that do not have a host (such as file URLs).

In this case, still allow resources inside the package to be accessed.

Refer to the chapter for more information on how to configure rules in JSON format for the PDFreactor Web Service.

Data URIs and Blobs

Data URIs and Blobs are not subject to connection security and thus cannot be blocked by connection rules, since this would be impractical. The single exception is the allowProtocols setting, which can be used to block data URIs or Blobs altogether by not allowing the "data" or "blob" protocol, respectively.

JAR URLs

When using JAR URLs, security rules apply only to the URL to the JAR file, not the whole JAR URL. When the security settings allow access to a JAR file, access is also automatically granted to all of its entries. You can control access to certain JAR entries by using the entry property of a connection rule. Entries are treated as paths, so you can use wildcard notation.

The following rule grants access to all resources inside the "resources" directory in a specific JAR file. Since an entry is specified, the rule does not grant access to the JAR file itself. Also note that the protocol is "file" and not "jar", since rules apply to the URL to the JAR file and not the whole URL.

config.setSecuritySettings(new SecuritySettings()
    .setConnectionRules(
        new ConnectionRule()
            .setAction(ConnectionRuleAction.ALLOW)
            .setName("Allow access to resources inside a JAR")
            .setProtocol("file")
            .setPath("/path/to/my.jar")
            .setEntry("/resources/**")
    )
);

Use the securitySettings server parameters to configure security.

The example above would grant access to e.g. the resource:

jar:file:///path/to/my.jar!/resources/image.png

If an entry is specified for any connection rule, the rule will no longer apply to the URL itself, only the entry. This means that specifying an entry on rules to non-JAR files makes them useless.
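
To make the distinction between the URL to the JAR file and the entry more tangible, the following sketch splits a JAR URL into these two parts. It is an illustration only; the class and method names are hypothetical and not part of the PDFreactor API.

```java
// Hypothetical helper that separates a JAR URL into the parts a rule sees.
final class JarUrlSketch {
    /**
     * Returns a two-element array: the URL to the JAR file and the entry path.
     * The entry is null if the URL does not contain the "!/" separator.
     */
    static String[] split(String jarUrl) {
        if (!jarUrl.startsWith("jar:")) {
            throw new IllegalArgumentException("Not a JAR URL: " + jarUrl);
        }
        int separator = jarUrl.indexOf("!/");
        if (separator < 0) {
            return new String[] { jarUrl.substring(4), null };
        }
        return new String[] {
            jarUrl.substring(4, separator),   // matched against protocol, host and path
            jarUrl.substring(separator + 1)   // matched against the entry property
        };
    }

    public static void main(String[] args) {
        String[] parts = split("jar:file:///path/to/my.jar!/resources/image.png");
        System.out.println(parts[0]);
        System.out.println(parts[1]);
    }
}
```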

Default Security Behavior

The default security behavior is applied to any URL to which no connection rule matched. The appropriate configuration properties are grouped in the defaults property of the securitySettings. Checks are applied in the following order:

allowSameBasePath

This property is considered true if not specified.

When a document is converted from a URL or a base URL is specified, access to resources within the same base path is allowed. No further security checks will be made for such a resource. Please note that this allows for HSTS, i.e. when the base or document URL is HTTP, then resources within the same base path using HTTPS are also allowed.

This check is always skipped if the untrustedApi property is true.

If a resource is within the same base path, it is allowed. Otherwise, subsequent default checks below are applied.

The base path is the normalized part of the URL leading to the input document (or the base URL if specified), up to the last slash. For HTTP or HTTPS URLs, the base path consists of at least the host, even if the URL does not end with a slash. For file URLs, it is ensured that the base path is never the root directory.

For example, if the following URL is the input URL of your document:

http://myServer/document.html

Then the base path is the following URL:

http://myServer/
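
The derivation of the base path can be approximated as follows. This is a minimal sketch with a hypothetical helper, not PDFreactor's implementation; in particular, it does not implement the special handling that prevents the base path of a file URL from being the root directory.

```java
import java.net.URI;

// Hypothetical helper that approximates how a base path is derived.
final class BasePathSketch {
    static String basePath(String documentUrl) {
        URI uri = URI.create(documentUrl).normalize();
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            // An HTTP(S) URL without a path still has the host as its base.
            path = "/";
        }
        // Keep everything up to and including the last slash.
        String directory = path.substring(0, path.lastIndexOf('/') + 1);
        String authority = uri.getRawAuthority() == null ? "" : uri.getRawAuthority();
        return uri.getScheme() + "://" + authority + directory;
    }

    public static void main(String[] args) {
        System.out.println(basePath("http://myServer/document.html"));
        System.out.println(basePath("http://myServer/docs/manual/index.html"));
    }
}
```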

allowProtocols

This property is considered to have the values "http", "https", "data" and "blob" if not specified.

A list of URL protocols (as lower-case strings) that are allowed. If the protocol of a resource is not contained within this list, the resource is not loaded. Note that the "file" protocol is not handled by this setting. Use allowFileSystemAccess to allow or restrict file URLs.

If the resource's protocol is not allowed, the resource is denied. Otherwise, subsequent default checks below are applied.

allowFileSystemAccess

This property is considered false if not specified.

Allows access to the file system. This is prohibited by default.

If a resource points to a file and file system access is not allowed, the resource is denied. Otherwise, subsequent default checks below are applied.

allowAddresses

This property is considered to have the values PUBLIC, PRIVATE and LOCAL if not specified.

Allows connections to a certain type of host or IP address. Possible values are:

  • PUBLIC: Public hosts or IP addresses.

  • PRIVATE: Hosts in private networks or IP addresses in the private range.

  • LOCAL: Hosts or IP addresses pointing to the local machine.

  • LINK_LOCAL: Link-local addresses or auto-IPs which are usually assigned automatically and are usually not used to provide any useful resources for the conversion. Unless explicitly required, it is recommended to not grant access to this type of address.

If a resource points to a network address that is not allowed, the resource is denied.
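
The order of the default checks can be summarized in a short sketch. All types and names below are hypothetical and not part of the PDFreactor API. The same-base-path test is reduced to a simple prefix comparison (ignoring the HSTS case), and the classification of addresses via java.net.InetAddress is an assumption about how the four address types could be told apart.

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of the documented check order; NOT PDFreactor code.
final class DefaultChecksSketch {
    enum AddressType { PUBLIC, PRIVATE, LOCAL, LINK_LOCAL }

    // Default values as documented above.
    boolean allowSameBasePath = true;
    Set<String> allowProtocols = Set.of("http", "https", "data", "blob");
    boolean allowFileSystemAccess = false;
    Set<AddressType> allowAddresses =
        EnumSet.of(AddressType.PUBLIC, AddressType.PRIVATE, AddressType.LOCAL);

    static AddressType classify(InetAddress address) {
        if (address.isLoopbackAddress()) return AddressType.LOCAL;
        if (address.isLinkLocalAddress()) return AddressType.LINK_LOCAL;
        if (address.isSiteLocalAddress()) return AddressType.PRIVATE;
        return AddressType.PUBLIC;
    }

    boolean isAllowed(URI resource, String basePath) {
        // 1. allowSameBasePath: allowed immediately, no further checks.
        if (allowSameBasePath && basePath != null
                && resource.toString().startsWith(basePath)) {
            return true;
        }
        String scheme = resource.getScheme();
        String protocol = scheme == null ? "" : scheme.toLowerCase(Locale.ROOT);
        boolean isFile = protocol.equals("file");
        // 2. allowProtocols ("file" is handled by the next check instead).
        if (!isFile && !allowProtocols.contains(protocol)) return false;
        // 3. allowFileSystemAccess
        if (isFile && !allowFileSystemAccess) return false;
        // 4. allowAddresses (only relevant if the URL has a host).
        String host = resource.getHost();
        if (host == null) return true;
        try {
            return allowAddresses.contains(classify(InetAddress.getByName(host)));
        } catch (UnknownHostException e) {
            return false; // assumption: unresolvable hosts are denied
        }
    }

    public static void main(String[] args) {
        DefaultChecksSketch checks = new DefaultChecksSketch();
        System.out.println(checks.isAllowed(URI.create("http://127.0.0.1/style.css"), null));
        System.out.println(checks.isAllowed(URI.create("file:///etc/passwd"), null));
        System.out.println(checks.isAllowed(URI.create("http://169.254.10.1/x"), null));
    }
}
```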

When using JAR URLs, the URL to the JAR file is also validated against file system access, allowed protocols as well as allowed addresses. Security rules only apply to the URL to the JAR file.

To allow global file system access, you could use the following default settings. This is not recommended when processing content from untrusted sources!

config.setSecuritySettings(new SecuritySettings()
    .setDefaults(new SecurityDefaults()
        .setAllowFileSystemAccess(true)));

Use the securitySettings server parameters to configure security.

Custom URL Filtering

To further filter URLs, you can implement custom URLStreamHandlers (java.net.URLStreamHandler) for specific protocols. These are used before the internal security checks are made. It is also possible to register such a handler for all protocols; in this case, use an asterisk for the protocol in the API. Only one CustomUrlStreamHandler can be used for a particular protocol. If more are specified, the first one is used. If both a handler for a specific protocol and one for all protocols are defined, the one for the specific protocol is always used.

Please note that this feature is only available in the Java (non-Web Service) API.

config.setCustomUrlStreamHandlers(
    new CustomUrlStreamHandler()
        .setProtocol("file")
        .setHandler(new URLStreamHandler() {
            // your implementation
        })
);

Not possible.

config.setCustomUrlStreamHandlers(
    new CustomUrlStreamHandler()
        .setProtocol("*")
        .setHandler(new URLStreamHandler() {
            // your implementation
        })
);

Not possible.
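
As an illustration of what "your implementation" in the examples above could contain, the following sketch shows a URLStreamHandler that only opens connections to paths below a given directory. It uses only standard java.net classes. The class name and the filtering logic are hypothetical, and it is an assumption that throwing an IOException is an appropriate way to reject a URL in your integration.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

// Hypothetical handler; the filtering logic is an example only.
final class DirectoryRestrictedHandler extends URLStreamHandler {
    private final String allowedPrefix;

    DirectoryRestrictedHandler(String allowedPrefix) {
        this.allowedPrefix = allowedPrefix;
    }

    @Override
    protected URLConnection openConnection(URL url) throws IOException {
        String path = url.getPath();
        // Reject parent-directory segments and anything outside the prefix.
        if (path.contains("/../") || !path.startsWith(allowedPrefix)) {
            throw new IOException("Blocked by custom handler: " + url);
        }
        // Delegate to the JDK's default handler for this protocol.
        return new URL(url.toExternalForm()).openConnection();
    }

    /** Demonstration helper: reports whether the handler rejects the URL. */
    static boolean isBlocked(URLStreamHandler handler, String spec) {
        try {
            new URL(null, spec, handler).openConnection();
            return false;
        } catch (IOException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        URLStreamHandler handler = new DirectoryRestrictedHandler("/public/");
        System.out.println(isBlocked(handler, "file:///public/style.css"));
        System.out.println(isBlocked(handler, "file:///etc/passwd"));
    }
}
```

An instance of such a class could then be passed to setHandler in place of the anonymous URLStreamHandler shown above.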

External XML Parser Resources

By default, PDFreactor does not load external resources during XML parsing, such as DTDs, entities or XIncludes. To allow this for documents, you can use the allowExternalXmlParserResources property of the SecuritySettings.

config.setSecuritySettings(new SecuritySettings()
    .setAllowExternalXmlParserResources(true));

Use the securitySettings server parameters to configure security.

Controlling Client Access

Restricting Service Access

When your PDFreactor Web Service is accessible to a large number of clients or is located in a public cloud, it may be desirable to restrict access to it so that only authorized clients can use the API. This can be done with so-called "API keys". API keys are arbitrary strings that clients must send with each request; otherwise, the request will be rejected.

API keys can be configured via the server parameters apiKeys or apiKeysPath. The first parameter specifies a comma-separated list of API keys. The latter specifies the path to a file apikeys.json. That file contains a single JSON object with API keys as keys and a description of each API key as its value. This is useful if you use many different API keys for different clients and want to have an overview of which API key is used for which client.

To gain access, clients must always send a valid API key with each request. When using one of the clients, an API key can be conveniently set like this (Java example):

pdfReactor.setApiKey("myApiKey");
pdfReactor.setApiKey("myApiKey");
pdfReactor.ApiKey = "myApiKey";
$pdfReactor["apiKey"] = "myApiKey";
pdfReactor['apiKey'] = "myApiKey"
pdfReactor['apiKey'] = "myApiKey"
pdfReactor.apiKey = "myApiKey";
pdfReactor.apiKey = "myApiKey";
$pdfReactor->{apiKey} = "myApiKey";

When using the REST API directly, the API key must always be included in the URL as a query parameter:

/rest/version?apiKey=myApiKey

Please note that this does not make integrations that run on the client (such as JavaScript) secure.

Restricting API Access

Usually when clients use a PDFreactor Web Service, they have access to the full client-side PDFreactor API. However, and especially when the client is untrusted, you may not always want to grant clients access to the full API since this may expose certain server or application-specific information (such as appended logs). To block access to certain parts of the API, you can specify an Override Configuration at the server side in JSON format. All properties that are specified there (and that are non-null) will override similar properties in the client configuration. This means that you can not only specify default values, but also essentially lock certain properties.

This example shows what an Override Configuration should look like to prevent clients from using the debug mode (remember to override the deprecated properties as well) and from adding attachments.

{
    "debugSettings": {},
    "enableDebugMode": false,
    "appendLog": false,
    "attachments": []
}

The server parameter is used to specify a URL to such an Override Configuration JSON file.
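
The effect of an Override Configuration can be pictured as a merge in which every non-null server-side property replaces the corresponding client property. The following sketch models configurations as plain maps. The class is hypothetical, and it is an assumption that overriding works on top-level properties as shown here.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model of the override behavior; NOT PDFreactor code.
final class OverrideConfigSketch {
    static Map<String, Object> applyOverrides(Map<String, Object> client,
                                              Map<String, Object> override) {
        Map<String, Object> effective = new LinkedHashMap<>(client);
        override.forEach((property, value) -> {
            // Only non-null override properties replace client values.
            if (value != null) {
                effective.put(property, value);
            }
        });
        return effective;
    }

    public static void main(String[] args) {
        Map<String, Object> client = new HashMap<>();
        client.put("appendLog", true);
        client.put("encoding", "UTF-8");

        Map<String, Object> override = new HashMap<>();
        override.put("appendLog", false); // locked by the server
        override.put("encoding", null);   // null: the client value is kept

        System.out.println(applyOverrides(client, override));
    }
}
```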

Refer to the chapter for more information on mapping Java classes to JSON format.

Enabling Administrative Access

Certain RESTful APIs of the PDFreactor Web Service (such as the Monitoring API) require you to configure an admin key to be able to use them. Otherwise these APIs are not accessible at all. An admin key can be configured via Server Parameters, more specifically via adminKey or adminKeyPath.

The admin key can be an arbitrary string and is used similarly to an API key. To send the admin key, it has to be appended to the request URL as the query parameter "adminKey" like this:

http://localhost:9423/service/monitor/server?adminKey=yourAdminKey
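
When request URLs are built programmatically, the key should be URL encoded before it is appended. The following helper is a minimal sketch; the class and method names are hypothetical and not part of any PDFreactor client.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for appending an API key or admin key to a URL.
final class KeyUrlSketch {
    static String withKey(String url, String parameter, String key) {
        // Use "&" if the URL already has a query string, "?" otherwise.
        String separator = url.contains("?") ? "&" : "?";
        return url + separator + parameter + "="
            + URLEncoder.encode(key, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(withKey(
            "http://localhost:9423/service/monitor/server", "adminKey", "yourAdminKey"));
    }
}
```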

Hiding Version Information

While information about the PDFreactor version in use can be generally useful, disclosing it gives potential attackers knowledge of the underlying system, which they may use to develop attacks targeting a specific version of PDFreactor. To hide version information, use the security setting hideVersionInfo.

config.setSecuritySettings(new SecuritySettings()
    .setHideVersionInfo(true));

Use the securitySettings server parameters to configure security.

The version as well as other system or server information may also be included in the PDFreactor logs which can be embedded in or attached to the resulting PDF using . To make sure that PDFs do not contain this information, integrators must ensure that the PDFreactor API is properly restricted or not accessible to clients.

Input Formats

PDFreactor can process the following input formats. By default, it automatically tries to identify the right format. The input format of the source document can be overridden using the documentType configuration property.

HTML + CSS

HTML is rendered by PDFreactor using a default CSS style sheet for HTML in addition to the document's style.

HTML is parsed by the built-in HTML5 parser which parses the document according to HTML5 rules. This means that elements missing closing tags (such as <p> without </p>) are handled as demanded by the HTML5 specifications. SVG Elements should be used without having their namespace specified.

See and on how to load additional CSS that is not originally part of the input document.

You can force HTML processing like this:

config.setDocumentType(Doctype.HTML5);
config.DocumentType = Doctype.HTML5;
$config["documentType"] = Doctype::HTML5;
config['documentType'] = PDFreactor.Doctype.HTML5
config['documentType'] = PDFreactor::Doctype::HTML5
config.documentType = PDFreactor.Doctype.HTML5;
config.documentType = PDFreactor.Doctype.HTML5;
$config["documentType"] = PDFreactor::Doctype->HTML5;
{ "documentType": "HTML5" }
--documentType HTML5

Legacy XHTML

It is also possible, albeit discouraged, to enable the legacy XHTML parser and its cleanup processes. You can force this document type like this:

config.setDocumentType(Doctype.XHTML);
config.DocumentType = Doctype.XHTML;
$config["documentType"] = Doctype::XHTML;
config['documentType'] = PDFreactor.Doctype.XHTML
config['documentType'] = PDFreactor::Doctype::XHTML
config.documentType = PDFreactor.Doctype.XHTML;
config.documentType = PDFreactor.Doctype.XHTML;
$config["documentType"] = PDFreactor::Doctype->XHTML;
{ "documentType": "XHTML" }
--documentType XHTML

In legacy XHTML, there are various cleanup tools at your disposal that will attempt to repair non-well-formed XHTML documents:

  • CYBERNEKO (default)
  • JTIDY
  • TAGSOUP
  • NONE (no cleanup)

You can set a cleanup tool like this:

config.setCleanupTool(Cleanup.TAGSOUP);
config.CleanupTool = Cleanup.TAGSOUP;
$config["cleanupTool"] = Cleanup::TAGSOUP;
config['cleanupTool'] = PDFreactor.Cleanup.TAGSOUP
config['cleanupTool'] = PDFreactor::Cleanup::TAGSOUP
config.cleanupTool = PDFreactor.Cleanup.TAGSOUP;
config.cleanupTool = PDFreactor.Cleanup.TAGSOUP;
$config["cleanupTool"] = PDFreactor::Cleanup->TAGSOUP;
{ "cleanupTool": "TAGSOUP" }
--cleanupTool TAGSOUP

HTML + JavaScript

PDFreactor can also process JavaScript contained or linked in the HTML document. JavaScript processing is disabled by default and has to be enabled first. See for further details.

JavaScript processing is only possible when converting HTML, not XML.

See on how to load additional JavaScript that is not originally part of the input document.

XML + CSS

Like HTML, XML documents can be styled via CSS. Because XML does not have a default CSS style sheet, you will have to provide one for your specific XML language.

Alternatively, or in addition to directly styling the XML content, it can be processed by the built-in XSLT (Extensible Stylesheet Language Transformations, https://www.w3.org/TR/xslt) processor, either to modify it or to convert it to HTML.

You can force XML processing like this:

config.setDocumentType(Doctype.XML);
config.DocumentType = Doctype.XML;
$config["documentType"] = Doctype::XML;
config['documentType'] = PDFreactor.Doctype.XML
config['documentType'] = PDFreactor::Doctype::XML
config.documentType = PDFreactor.Doctype.XML;
config.documentType = PDFreactor.Doctype.XML;
$config["documentType"] = PDFreactor::Doctype->XML;
{ "documentType": "XML" }
--documentType XML

XML + XSLT

PDFreactor can optionally transform XML documents using XSLT style sheets. This can transform the document into other formats such as HTML. As with the normal input document, PDFreactor attempts to detect the document type of the post-transformation document. This can be overridden by using the postTransformationDocumentType.

The configuration property xsltMode is used to enable XSLT processing.

config.setPostTransformationDocumentType(Doctype.HTML5);
config.setXsltMode(true);
config.PostTransformationDocumentType = Doctype.HTML5;
config.XsltMode = true;
$config["postTransformationDocumentType"] = Doctype::HTML5;
$config["xsltMode"] = true;
config['postTransformationDocumentType'] = PDFreactor.Doctype.HTML5
config['xsltMode'] = True
config['postTransformationDocumentType'] = PDFreactor::Doctype::HTML5
config['xsltMode'] = true
config.postTransformationDocumentType = PDFreactor.Doctype.HTML5;
config.xsltMode = true;
config.postTransformationDocumentType = PDFreactor.Doctype.HTML5;
config.xsltMode = true;
$config["postTransformationDocumentType"] = PDFreactor::Doctype->HTML5;
$config["xsltMode"] = true;
{ "postTransformationDocumentType": "HTML5", "xsltMode": true }
--postTransformationDocumentType HTML5 --xsltMode

See on how to load additional XSLT style sheets that are not originally part of the input document.

Encoding

PDFreactor automatically detects the encoding of the input document. However, the encoding can also be forced to a specific value, e.g. like this:

config.setEncoding("UTF-8");
config.Encoding = "UTF-8";
$config["encoding"] = "UTF-8";
config['encoding'] = 'UTF-8'
config['encoding'] = 'UTF-8'
config.encoding = "UTF-8";
config.encoding = "UTF-8";
$config["encoding"] = "UTF-8";
{ "encoding": "UTF-8" }
--encoding "UTF-8"

CSS Validation

PDFreactor validates CSS, ignoring unknown properties and property values with invalid syntax. The cssSettings configuration property is used to adjust PDFreactor's default behavior by constructing a CssSettings object. This object has two properties, each one responsible for a different aspect of CSS validation:

validationMode

Adjusts the CSS property validation behavior. This affects how PDFreactor validates CSS property–value combinations when parsing style sheets. The default value is HTML_THIRD_PARTY.

supportQueryMode

Adjusts the CSS property support behavior. This affects how PDFreactor interprets the validity of CSS property–value combinations in CSS "@supports" queries or via JavaScript. The default value is HTML.

Both of these properties are configured using one of the constants below:

ALL

Indicates that all style declarations are considered valid disregarding the possibility of improper rendering.

Valid values may be overwritten by invalid style declarations.

HTML

Indicates that all values set in style declarations will be validated as long as PDFreactor supports the corresponding property.

Style declarations for properties not supported by PDFreactor are taken as invalid.

HTML_THIRD_PARTY

Indicates that all values set in style declarations will be validated as long as PDFreactor supports the corresponding property.

Style declarations for properties not supported by PDFreactor but by third party products are taken as valid.

HTML_THIRD_PARTY_LENIENT

Indicates that all values set in style declarations will be taken as valid if a third party product supports the corresponding property.

Style declarations for properties not supported by any third party product but supported by PDFreactor will be validated.

config.setCssSettings(new CssSettings()
    .setValidationMode(CssPropertySupport.ALL)
    .setSupportQueryMode(CssPropertySupport.ALL));
config.CssSettings = new CssSettings
{
    ValidationMode = CssPropertySupport.ALL,
    SupportQueryMode = CssPropertySupport.ALL
};
config.cssSettings = {
    validationMode: PDFreactor.CssPropertySupport.ALL,
    supportQueryMode: PDFreactor.CssPropertySupport.ALL
}
config.cssSettings = {
    validationMode: PDFreactor.CssPropertySupport.ALL,
    supportQueryMode: PDFreactor.CssPropertySupport.ALL
}
$config["cssSettings"] = array(
    "validationMode" => CssPropertySupport::ALL,
    "supportQueryMode" => CssPropertySupport::ALL
);
config['cssSettings'] = {
    'validationMode': PDFreactor.CssPropertySupport.ALL,
    'supportQueryMode': PDFreactor.CssPropertySupport.ALL
}
config['cssSettings'] = {
    validationMode: PDFreactor::CssPropertySupport::ALL,
    supportQueryMode: PDFreactor::CssPropertySupport::ALL
}
$config["cssSettings"] = {
    'validationMode' => PDFreactor::CssPropertySupport->ALL,
    'supportQueryMode' => PDFreactor::CssPropertySupport->ALL
};
{ "cssSettings": {
    "validationMode": "ALL",
    "supportQueryMode": "ALL"
}}
-C config.json

With the following config.json:

{ "cssSettings": {
    "validationMode": "ALL",
    "supportQueryMode": "ALL"
}}

Quirks Mode

Legacy HTML versions may have different CSS processing or layout rules. To be compatible, PDFreactor offers various quirks settings to adjust its behavior appropriately. This can be done with the quirksSettings configuration property. It takes an object with the following properties:

caseSensitiveClassSelectors

By default, CSS class selectors in HTML are case sensitive.

In the default DETECT mode this behavior is disabled for old HTML doctypes or when there is no doctype.

minLineHeightFromContainer

By default the line-height of text containers, e.g. paragraph elements, is used as the minimum line-height of their lines.

In the default DETECT mode this behavior is disabled for old HTML doctypes or when there is no doctype.

Each of these properties is configured with a QuirksMode constant to enable or disable it independently of the document:

STANDARDS

Forced no-quirks (i.e. standard compliant) behavior.

QUIRKS

Forced quirks behavior.

DETECT

Doctype dependent behavior.

config.setQuirksSettings(new QuirksSettings()
    .setCaseSensitiveClassSelectors(QuirksMode.QUIRKS));
config.QuirksSettings = new QuirksSettings
{
    CaseSensitiveClassSelectors = QuirksMode.QUIRKS
};
config.quirksSettings = {
    caseSensitiveClassSelectors: PDFreactor.QuirksMode.QUIRKS
}
config.quirksSettings = {
    caseSensitiveClassSelectors: PDFreactor.QuirksMode.QUIRKS
}
$config["quirksSettings"] = array(
    "caseSensitiveClassSelectors" => QuirksMode::QUIRKS
);
config['quirksSettings'] = {
    'caseSensitiveClassSelectors': PDFreactor.QuirksMode.QUIRKS
}
config['quirksSettings'] = {
    caseSensitiveClassSelectors: PDFreactor::QuirksMode::QUIRKS
}
$config["quirksSettings"] = {
    'caseSensitiveClassSelectors' => PDFreactor::QuirksMode->QUIRKS
};
{ "quirksSettings": {
    "caseSensitiveClassSelectors": "QUIRKS"
}}
-C config.json

With the following config.json:

{ "quirksSettings": {
    "caseSensitiveClassSelectors": "QUIRKS"
}}

Resource Loading

PDFreactor automatically loads linked external resources, e.g. from tags like <link>, <img> etc. If the respective server does not respond within 60 seconds, loading of the resource will be aborted and it will not be included in the document.

For documents including relative resources, like

<img src="images/a.png" />
<a href="/english/index.html">...</a>
<link href="../css/layout.css" rel="stylesheet" type="text/css" />

PDFreactor needs a base URL (Uniform Resource Locator, https://www.w3.org/Addressing/) to resolve these resources. If your input document source is a URL, the base URL will be set automatically. In all other cases you have to specify it manually:

config.setBaseUrl("https://someServer/public/");
config.BaseUrl = "https://someServer/public/";
$config["baseUrl"] = "https://someServer/public/";
config['baseUrl'] = "https://someServer/public/"
config['baseUrl'] = "https://someServer/public/"
config.baseUrl = "https://someServer/public/";
config.baseUrl = "https://someServer/public/";
$config["baseUrl"] = "https://someServer/public/";
{ "baseUrl": "https://someServer/public/" }
--baseUrl "https://someServer/public/"
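
To see how the relative references from the example document above resolve against such a base URL, the standard java.net.URI class can be used. This sketch only demonstrates generic URL resolution; the class name is hypothetical and PDFreactor's own resolution may differ in details.

```java
import java.net.URI;

// Hypothetical helper demonstrating generic URL resolution.
final class BaseUrlSketch {
    static String resolve(String baseUrl, String reference) {
        return URI.create(baseUrl).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "https://someServer/public/";
        System.out.println(resolve(base, "images/a.png"));
        System.out.println(resolve(base, "/english/index.html"));
        System.out.println(resolve(base, "../css/layout.css"));
    }
}
```

Note that the trailing slash of the base URL matters: without it, the last path segment would be treated as a file name and dropped during resolution.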

It is also possible to specify file URLs:

config.setBaseUrl("file:///directory/");
config.BaseUrl = "file:///c:/directory/";
$config["baseUrl"] = "file:///directory/";
config['baseUrl'] = "file:///directory/"
config['baseUrl'] = "file:///directory/"
config.baseUrl = "file:///directory/";
config.baseUrl = "file:///directory/";
$config["baseUrl"] = "file:///directory/";
{ "baseUrl": "file:///directory/" }
--baseUrl "file:///directory/"

Timeout

Resource loading timeouts can be customized. Timeouts in milliseconds can be configured via the resourceConnectTimeout and resourceReadTimeout configuration options.

The connect timeout is the maximum time allowed for establishing the initial connection to the resource server. The read timeout is the maximum time allowed for downloading the resource from the server (after the connection has been established).

These timeouts can be configured like this:

config.setResourceConnectTimeout(1000);
config.setResourceReadTimeout(1000);
config.ResourceConnectTimeout = 1000;
config.ResourceReadTimeout = 1000;
$config["resourceConnectTimeout"] = 1000;
$config["resourceReadTimeout"] = 1000;
config['resourceConnectTimeout'] = 1000
config['resourceReadTimeout'] = 1000
config['resourceConnectTimeout'] = 1000
config['resourceReadTimeout'] = 1000
config.resourceConnectTimeout = 1000;
config.resourceReadTimeout = 1000;
config.resourceConnectTimeout = 1000;
config.resourceReadTimeout = 1000;
$config["resourceConnectTimeout"] = 1000;
$config["resourceReadTimeout"] = 1000;
{ "resourceConnectTimeout": 1000,
  "resourceReadTimeout": 1000 }
--resourceConnectTimeout 1000 --resourceReadTimeout 1000

HTTPS

PDFreactor supports resource loading from HTTPS and will automatically verify the target SSL certificate. Sometimes this can lead to PDFreactor refusing the connection due to issues with the certificate. If this certificate is still trustworthy (e.g. because the target is located in the intranet) or during the development phase, you can configure PDFreactor to use a more lenient approach and ignore many certificate issues. This can be done with the httpsMode configuration property like this:

config.setHttpsMode(HttpsMode.LENIENT);
config.HttpsMode = HttpsMode.LENIENT;
$config["httpsMode"] = HttpsMode::LENIENT;
config['httpsMode'] = PDFreactor.HttpsMode.LENIENT
config['httpsMode'] = PDFreactor::HttpsMode::LENIENT
config.httpsMode = PDFreactor.HttpsMode.LENIENT;
config.httpsMode = PDFreactor.HttpsMode.LENIENT;
$config["httpsMode"] = PDFreactor::HttpsMode->LENIENT;
{ "httpsMode": "LENIENT" }
--httpsMode "LENIENT"

Authentication

When resources are behind basic or digest authentication, PDFreactor can automatically send appropriate HTTP headers to gain access. You can specify the username and the password for the credentials via the authenticationCredentials configuration property like this:

config.setAuthenticationCredentials(
    new KeyValuePair("username", "password"));
config.AuthenticationCredentials =
    new KeyValuePair("username", "password");
$config["authenticationCredentials"] =
    array(
        "key" => "username",
        "value" => "password"
    );
config['authenticationCredentials'] =
    { 'key': "username", 'value': "password" }
config['authenticationCredentials'] =
    { key: "username", value: "password" }
config.authenticationCredentials =
    { key: "username", value: "password" };
config.authenticationCredentials =
    { key: "username", value: "password" };
$config["authenticationCredentials"] =
    { "key" => "username", "value" => "password" };
{ "authenticationCredentials":
    { "key": "username", "value": "password" }
}
-C config.json

With the following config.json:

{ "authenticationCredentials":
    { "key": "username", "value": "password" }
}
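
For reference, this is how the header value for HTTP basic authentication is generally formed from a username and a password. The sketch illustrates the standard scheme only, not PDFreactor internals. The class name is hypothetical, and digest authentication is not covered.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper showing the standard HTTP Basic scheme.
final class BasicAuthSketch {
    static String basicAuthorizationHeader(String username, String password) {
        // The credentials are joined with a colon and Base64 encoded.
        String credentials = username + ":" + password;
        return "Basic " + Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Value of the "Authorization" request header
        System.out.println(basicAuthorizationHeader("username", "password"));
    }
}
```

Since Base64 is an encoding and not encryption, such credentials should only be sent over HTTPS.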

HTTP Headers and Cookies

Sometimes external resources require additional HTTP headers or cookies, especially when trying to access session-specific resources. PDFreactor will always send all configured headers and cookies when requesting resources. HTTP headers can be specified via the requestHeaders configuration property and cookies via cookies.

Resource servers may have a white list of user agents to which they deliver content. While PDFreactor always sends a default user agent header, it can be overridden if necessary.

config.setRequestHeaders(
        new KeyValuePair("User-Agent", "MyApp/2.0"));
config.RequestHeaders = new List<KeyValuePair>
{
    new KeyValuePair("User-Agent", "MyApp/2.0")
};
$config["requestHeaders"] = array(
    array(
        "key" => "User-Agent",
        "value" => "MyApp/2.0"
    )
);
config['requestHeaders'] = [
    { 'key': "User-Agent", 'value': "MyApp/2.0" }
]
config['requestHeaders'] = [
    { key: "User-Agent", value: "MyApp/2.0" }
]
config.requestHeaders = [
    { key: "User-Agent", value: "MyApp/2.0" }
];
config.requestHeaders = [
    { key: "User-Agent", value: "MyApp/2.0" }
];
$config["requestHeaders"] = [
    { "key" => "User-Agent", "value" => "MyApp/2.0" }
];
{ "requestHeaders": [
    { "key": "User-Agent", "value": "MyApp/2.0" }
]}
-C config.json

With the following config.json:

{ "requestHeaders": [
    { "key": "User-Agent", "value": "MyApp/2.0" }
]}

A common use case for custom cookies is a session cookie that needs to be sent with each resource request so that PDFreactor has access to a user's session. This is relevant when PDFreactor is integrated into a session-based web application. Usually, you would have to find a way to read the session cookies; the example uses a static value instead.

config.setCookies(
        new KeyValuePair("JSESSIONID", "123456789"));
config.Cookies = new List<KeyValuePair>
{
    new KeyValuePair("JSESSIONID", "123456789")
};
$config["cookies"] = array(
    array(
        "key" => "JSESSIONID",
        "value" => "123456789"
    )
);
config['cookies'] = [
    { 'key': "JSESSIONID", 'value': "123456789" }
]
config['cookies'] = [
    { key: "JSESSIONID", value: "123456789" }
]
config.cookies = [
    { key: "JSESSIONID", value: "123456789" }
];
config.cookies = [
    { key: "JSESSIONID", value: "123456789" }
];
$config["cookies"] = [
    { "key" => "JSESSIONID", "value" => "123456789" }
];
{ "cookies": [
    { "key": "JSESSIONID", "value": "123456789" }
]}
-C config.json

With the following config.json:

{ "cookies": [
    { "key": "JSESSIONID", "value": "123456789" }
]}
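
As a rough illustration of how such key/value pairs end up in a request, the following Python sketch merges header pairs and cookie pairs into a single set of HTTP headers. It reflects general HTTP conventions, not PDFreactor's actual implementation; the helper name build_request_headers is hypothetical.

```python
def build_request_headers(request_headers, cookies):
    """Merge key/value header pairs and cookie pairs into one header dict.

    Hypothetical helper; it mirrors the key/value shape of the
    requestHeaders and cookies properties.
    """
    headers = {pair["key"]: pair["value"] for pair in request_headers}
    if cookies:
        # Cookies travel in a single "Cookie" header as name=value pairs.
        headers["Cookie"] = "; ".join(
            f"{pair['key']}={pair['value']}" for pair in cookies
        )
    return headers


headers = build_request_headers(
    [{"key": "User-Agent", "value": "MyApp/2.0"}],
    [{"key": "JSESSIONID", "value": "123456789"}],
)
print(headers)
```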

URL Rewrites

PDFreactor can rewrite all URLs before connections to resources are even opened. This is done via the urlRewriteSettings configuration property. This object takes one or more rules according to which URLs are rewritten. The new URLs are then used to open the connections.

URL rewrite rules take a regular expression pattern and a substitution. The substitution can include group identifiers and back references.

The following sample rewrites all URLs beginning with "http://myOldHost/" to URLs that begin with "https://myNewHost/".

config.setUrlRewriteSettings(new UrlRewriteSettings()
    .setRules(
        new UrlRewriteRule()
            .setPattern("^http://myOldHost/(.*)$")
            .setSubstitution("https://myNewHost/$1")        
    )
);
config.UrlRewriteSettings = new UrlRewriteSettings()
{
    Rules = new List<UrlRewriteRule>
    {
        new UrlRewriteRule()
        {
            Pattern = "^http://myOldHost/(.*)$",
            Substitution = "https://myNewHost/$1"
        }
    }
};
$config["urlRewriteSettings"] = array(
    "rules" => array(
        array(
            "pattern" => "^http://myOldHost/(.*)$",
            "substitution" => "https://myNewHost/$1"
        )
    )
);
config['urlRewriteSettings'] = {
    'rules': [{
        'pattern': "^http://myOldHost/(.*)$",
        'substitution': "https://myNewHost/$1"
    }]
}
config['urlRewriteSettings'] = {
    rules: [{
        pattern: "^http://myOldHost/(.*)$",
        substitution: "https://myNewHost/$1"
    }]
}
config.urlRewriteSettings = {
    rules: [{
        pattern: "^http://myOldHost/(.*)$",
        substitution: "https://myNewHost/$1"
    }]
};
config.urlRewriteSettings = {
    rules: [{
        pattern: "^http://myOldHost/(.*)$",
        substitution: "https://myNewHost/$1"
    }]
};
$config["urlRewriteSettings"] = {
    "rules" => [{
        "pattern" => "^http://myOldHost/(.*)$",
        "substitution" => "https://myNewHost/$1"
    }]
};
{ "urlRewriteSettings": {
    "rules": [{
        "pattern": "^http://myOldHost/(.*)$",
        "substitution": "https://myNewHost/$1"
    }]
}}
-C config.json

With the following config.json:

{ "urlRewriteSettings": {
    "rules": [{
        "pattern": "^http://myOldHost/(.*)$",
        "substitution": "https://myNewHost/$1"
    }]
}}
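
To see what the sample rule does to individual URLs, it can be tried out with any regular expression engine. The following Python sketch is such a dry run and makes no claim about PDFreactor internals; note that Python's re module writes the back reference as \1 where the PDFreactor substitution uses $1, and the function name rewrite is hypothetical.

```python
import re

# Same pattern as in the sample rule. Python's re module writes the
# back reference as \1 where the PDFreactor substitution uses $1.
PATTERN = r"^http://myOldHost/(.*)$"
SUBSTITUTION = r"https://myNewHost/\1"


def rewrite(url: str) -> str:
    """Apply the sample rule; URLs that do not match are returned unchanged."""
    return re.sub(PATTERN, SUBSTITUTION, url)


print(rewrite("http://myOldHost/css/print.css"))
print(rewrite("http://otherHost/css/print.css"))
```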

All URLs that are called by PDFreactor are matched against all URL rewrite rules. The URLs that are being matched are always absolute and normalized. This means that:

  • If the original URL was not absolute, it is resolved against the document's base URL

  • All non-URI characters are URL encoded

  • Dot segments are resolved or removed

Otherwise the URL is matched as-is, including query parameters and userinfo.
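
When writing patterns, it helps to know what a normalized URL looks like. The normalization steps above can be approximated with Python's standard library as follows. This is only a sketch under the assumption that standard URI resolution rules apply; the exact output of PDFreactor may differ in details, and the function name normalize is hypothetical.

```python
from urllib.parse import quote, urljoin, urlsplit, urlunsplit


def normalize(url: str, base_url: str) -> str:
    """Approximate the normalization steps listed above (illustration only)."""
    # 1. Resolve relative URLs against the base URL. urljoin also
    #    resolves "." and ".." segments while merging the paths.
    absolute = urljoin(base_url, url)
    parts = urlsplit(absolute)
    # 2. Percent-encode characters that are not allowed in a URI path,
    #    keeping "/" and existing "%" escapes intact.
    path = quote(parts.path, safe="/%")
    return urlunsplit(
        (parts.scheme, parts.netloc, path, parts.query, parts.fragment)
    )


print(normalize("../img/./my logo.png?v=1", "http://myServer/docs/index.html"))
```

A pattern should therefore be written against the resolved, encoded form (for example "my%20logo.png"), not against the relative URL as it appears in the document.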

Data URLs are also matched, but before the match the data part is removed. This means you can still match the header, but not the actual data.

Since only the result URLs of the rewrite are used to open connections, security settings only apply to the new URLs and not the original ones.

Additional Resources

In certain cases it is desirable to load additional resources, such as style sheets or scripts, without modifying the contents of the input document. This can be achieved by specifying the resources directly in the PDFreactor integration code instead of the document itself.

All of these resources use the Resource model. They are usually specified by a URL or by content. If both content and uri properties are set, the uri is used as a base URL for the resource.

User Style Sheets

User style sheets represent CSS that is loaded in addition to the CSS specified in the input document. Generally, user style sheets have a higher priority than document style sheets, but a lower priority than inline styles.

They can be added like this:

config.setUserStyleSheets(
    new Resource().setContent("p { color: red; }"),
    new Resource().setUri("http://myServer/my.css"));
config.UserStyleSheets = new List<Resource>
{
    new Resource() { Content = "p { color: red; }" },
    new Resource() { Uri = "http://myServer/my.css" }
};
config.userStyleSheets = [
    { content: "p { color: red; }" },
    { uri: "http://myServer/my.css" }
];
config.userStyleSheets = [
    { content: "p { color: red; }" },
    { uri: "http://myServer/my.css" }
];
$config["userStyleSheets"] = array(
    array("content" => "p { color: red; }"),
    array("uri" => "http://myServer/my.css")
);
config['userStyleSheets'] = [
    { 'content': 'p { color: red; }' },
    { 'uri': 'http://myServer/my.css' }
]
config['userStyleSheets'] = [
    { 'content': 'p { color: red; }' },
    { 'uri': 'http://myServer/my.css' }
]
$config["userStyleSheets"] = [
    { "content" => "p { color: red; }" },
    { "uri" => "http://myServer/my.css" }
];
{ "userStyleSheets": [
    { "content": "p { color: red; }" },
    { "uri": "http://myServer/my.css" }
]}

Shorthand:

-c "p { color: red; }" http://myServer/my.css

Longhand:

-C config.json

With the following config.json:

{ "userStyleSheets": [
    { "content": "p { color: red; }" },
    { "uri": "http://myServer/my.css" }
]}

Integration Style Sheets

Integration style sheets are similar to user style sheets, but they have a lower priority than document CSS, and thus also a lower priority than user style sheets.

config.setIntegrationStyleSheets(
    new Resource().setContent("p { font-family: sans-serif }"),
    new Resource().setUri("http://myServer/corporate-identity.css"));
config.IntegrationStyleSheets = new List<Resource>
{
    new Resource() { Content = "p { font-family: sans-serif }" },
    new Resource() { Uri = "http://myServer/corporate-identity.css" }
};
config.integrationStyleSheets = [
    { content: "p { font-family: sans-serif }" },
    { uri: "http://myServer/corporate-identity.css" }
];
config.integrationStyleSheets = [
    { content: "p { font-family: sans-serif }" },
    { uri: "http://myServer/corporate-identity.css" }
];
$config["integrationStyleSheets"] = array(
    array("content" => "p { font-family: sans-serif }"),
    array("uri" => "http://myServer/corporate-identity.css")
);
config['integrationStyleSheets'] = [
    { 'content': 'p { font-family: sans-serif }' },
    { 'uri': 'http://myServer/corporate-identity.css' }
]
config['integrationStyleSheets'] = [
    { 'content': 'p { font-family: sans-serif }' },
    { 'uri': 'http://myServer/corporate-identity.css' }
]
$config["integrationStyleSheets"] = [
    { "content" => "p { font-family: sans-serif }" },
    { "uri" => "http://myServer/corporate-identity.css" }
];
{ "integrationStyleSheets": [
    { "content": "p { font-family: sans-serif }" },
    { "uri": "http://myServer/corporate-identity.css" }
]}
-C config.json

With the following config.json:

{ "integrationStyleSheets": [
    { "content": "p { font-family: sans-serif }" },
    { "uri": "http://myServer/corporate-identity.css" }
]}
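
The priority order described for user and integration style sheets can be summarized in a small Python sketch. It is a deliberate simplification that only models the origin of a declaration and ignores selector specificity and !important; the names ORIGIN_PRIORITY and winning_value are hypothetical.

```python
# Origins ordered from lowest to highest priority, as described in the
# "User Style Sheets" and "Integration Style Sheets" sections.
ORIGIN_PRIORITY = ["integration", "document", "user", "inline"]


def winning_value(declarations):
    """Return the value from the highest-priority origin.

    `declarations` maps an origin name to the value it declares for one
    property on one element. Simplification: selector specificity and
    !important are ignored.
    """
    for origin in reversed(ORIGIN_PRIORITY):
        if origin in declarations:
            return declarations[origin]
    return None


print(winning_value({"integration": "sans-serif", "document": "serif"}))
print(winning_value({"document": "black", "user": "red"}))
```

In practice this means integration style sheets are suited for defaults that documents may override, while user style sheets are suited for overriding the document.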

User Scripts

User scripts represent additional JavaScripts. They are executed after all document JavaScript has finished processing. You can optionally run certain user scripts before any document JavaScript by specifying the beforeDocumentScripts property. This is useful for e.g. JavaScript-based shims.

User scripts can be added like this:

config.setUserScripts(
    new Resource().setContent("console.log('executed first')")
        .setBeforeDocumentScripts(true),
    new Resource().setUri("http://myServer/my.js"));
config.UserScripts = new List<Resource>
{
    new Resource()
    {
        Content = "console.log('executed first')",
        BeforeDocumentScripts = true
    },
    new Resource() { Uri = "http://myServer/my.js" }
};
config.userScripts = [
    {
        content: "console.log('executed first')",
        beforeDocumentScripts: true
    },
    { uri: "http://myServer/my.js" }
];
config.userScripts = [
    {
        content: "console.log('executed first')",
        beforeDocumentScripts: true
    },
    { uri: "http://myServer/my.js" }
];
$config["userScripts"] = array(
    array(
        "content" => "console.log('executed first')",
        "beforeDocumentScripts" => true
    ),
    array("uri" => "http://myServer/my.js")
);
config['userScripts'] = [
    {
        'content': 'console.log("executed first")',
        'beforeDocumentScripts': True
    },
    { 'uri': 'http://myServer/my.js' }
]
config['userScripts'] = [
    {
        'content': 'console.log("executed first")',
        'beforeDocumentScripts': true
    },
    { 'uri': 'http://myServer/my.js' }
]
$config["userScripts"] = [
    {
        "content" => "console.log('executed first')",
        "beforeDocumentScripts" => true
    },
    { "uri" => "http://myServer/my.js" }
];
{ "userScripts": [
    {
        "content": "console.log('executed first')",
        "beforeDocumentScripts": true
    },
    { "uri": "http://myServer/my.js" }
]}

Shorthand:

-j "console.log('executed first')" http://myServer/my.js

Longhand:

-C config.json

With the following config.json:

{ "userScripts": [
    {
        "content": "console.log('executed first')",
        "beforeDocumentScripts": true
    },
    { "uri": "http://myServer/my.js" }
]}
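
The execution order described at the beginning of this section can be sketched in Python as follows. The sketch only orders script names and does not execute anything; the "name" key and the function execution_order are hypothetical and exist only for this illustration.

```python
def execution_order(user_scripts, document_scripts):
    """Order script names the way the User Scripts section describes.

    `user_scripts` is a list of dicts shaped like the userScripts entries,
    plus a hypothetical "name" key used only for this illustration.
    """
    before = [s["name"] for s in user_scripts if s.get("beforeDocumentScripts")]
    after = [s["name"] for s in user_scripts if not s.get("beforeDocumentScripts")]
    # Flagged user scripts first, then the document's own scripts,
    # then all remaining user scripts.
    return before + list(document_scripts) + after


order = execution_order(
    [
        {"name": "shim", "beforeDocumentScripts": True},
        {"name": "my.js"},
    ],
    ["document-script"],
)
print(order)
```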

XSLT Style Sheets

When converting XML documents, you can add XSLT style sheets in your integration code to transform the XML into HTML. They can be added like this:

config.setXsltStyleSheets(
    new Resource().setUri("http://myServer/my.xsl"));
config.XsltStyleSheets = new List<Resource>
{
    new Resource() { Uri = "http://myServer/my.xsl" }
};
config.xsltStyleSheets = [
    { uri: "http://myServer/my.xsl" }
];
config.xsltStyleSheets = [
    { uri: "http://myServer/my.xsl" }
];
$config["xsltStyleSheets"] = array(
    array("uri" => "http://myServer/my.xsl")
);
config['xsltStyleSheets'] = [
    { 'uri': 'http://myServer/my.xsl' }
]
config['xsltStyleSheets'] = [
    { 'uri': 'http://myServer/my.xsl' }
]
$config["xsltStyleSheets"] = [
    { "uri" => "http://myServer/my.xsl" }
];
{ "xsltStyleSheets": [
    { "uri": "http://myServer/my.xsl" }
]}
-C config.json

With the following config.json:

{ "xsltStyleSheets": [
    { "uri": "http://myServer/my.xsl" }
]}

Colors

Color Keywords

Instead of using color functions or the hexadecimal notation, a single human-readable keyword can be used. For more information about which keywords are supported by PDFreactor, see the CSS Color Keywords table. The keywords are internally converted into the user-set color space. By default, they are converted into RGB colors.

RGB Colors

In CSS you can specify RGB (Red Green Blue, an additive color model consisting of the color components red, green and blue) colors in the following ways:

  • # followed by a 6 digit RGB value in hexadecimal notation, e.g. #00ff00 for perfect green. Adding two more digits defines the alpha channel, with ff being opaque.

    You can abbreviate this notation by using only 3 digits which will be expanded internally, e.g. #0f5 equals #00ff55. The same can be done with 4 digits to also define the alpha channel.

  • Using the function rgb. It takes the 3 RGB component values as parameters in decimal or percent notation, e.g. rgb(0,255,0) or rgb(0%,100%,0%) for perfect green.
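
The expansion of the abbreviated hexadecimal notations can be illustrated with a short Python sketch. It follows the rules stated above (each digit is doubled, missing alpha digits mean opaque); the function names expand_hex and to_rgba are hypothetical and not part of any PDFreactor API.

```python
def expand_hex(color: str) -> str:
    """Expand #rgb / #rgba to #rrggbb / #rrggbbaa; longer forms pass through."""
    digits = color.lstrip("#")
    if len(digits) in (3, 4):
        # Each short digit is doubled: "0f5" -> "00ff55".
        digits = "".join(d * 2 for d in digits)
    if len(digits) not in (6, 8):
        raise ValueError(f"unsupported hex color: {color!r}")
    return "#" + digits.lower()


def to_rgba(color: str):
    """Return (red, green, blue, alpha) as integers from 0 to 255."""
    digits = expand_hex(color)[1:]
    if len(digits) == 6:
        digits += "ff"  # no alpha digits means fully opaque
    return tuple(int(digits[i:i + 2], 16) for i in range(0, 8, 2))


print(expand_hex("#0f5"))
print(to_rgba("#00ff00"))
```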

RGBA Colors

RGBA (Red Green Blue Alpha, a color model similar to RGB with extra information about the translucency) colors are also supported and can be specified by using the function rgba. It takes the 3 RGB component values as well as 1 alpha component value as parameters in decimal or percent notation, e.g. rgba(0,0,255,0.5) or rgba(0%,0%,100%,50%) for semi-translucent blue.

While it is currently possible to set RGBA colors on any CSS border, complex border settings (e.g. table cell borders) and border styles other than "solid" are not yet supported and may lead to unexpected visual results.

The functions rgb and rgba share the same syntax and can be used interchangeably, so rgb(0%,0%,100%,50%) will also result in a semi-translucent blue.

CMYK Colors

Besides rgb and rgba PDFreactor also supports the non-standard function cmyk. It takes the 4 CMYK component values as parameters in decimal or percent notation, e.g. cmyk(0,0,1,0) or cmyk(0%,0%,100%,0%) for perfect yellow. An optional fifth parameter can be used to define the color's alpha value, e.g. cmyk(0%,0%,100%,0%,10%) would be a transparent yellow with an alpha of only 10%.

Color keywords can be converted automatically into CMYK using the configuration property colorSpaceSettings.targetColorSpace:

config.setColorSpaceSettings(new ColorSpaceSettings()
    .setTargetColorSpace(ColorSpace.CMYK));
config.ColorSpaceSettings = new ColorSpaceSettings
{
    TargetColorSpace = ColorSpace.CMYK
};
config.colorSpaceSettings = {
    targetColorSpace: PDFreactor.ColorSpace.CMYK
}
config.colorSpaceSettings = {
    targetColorSpace: PDFreactor.ColorSpace.CMYK
}
$config["colorSpaceSettings"] = array(
    "targetColorSpace" => ColorSpace::CMYK
);
config['colorSpaceSettings'] = {
    'targetColorSpace': PDFreactor.ColorSpace.CMYK
}
config['colorSpaceSettings'] = {
    targetColorSpace: PDFreactor::ColorSpace::CMYK
}
$config["colorSpaceSettings"] = {
    'targetColorSpace' => PDFreactor::ColorSpace->CMYK
};
{ "colorSpaceSettings": {
    "targetColorSpace": "CMYK"
}}
-C config.json

With the following config.json:

{ "colorSpaceSettings": {
    "targetColorSpace": "CMYK"
}}

CMYK colors are also supported in SVGs.

HSL Colors

HSL (Hue Saturation Lightness) is an alternative representation of the colors of the RGB color model. The hue value is in the range of 0 to 360, the saturation and lightness values range between 0 and 1. It is possible to set HSL colors using the function hsl. It takes the 3 HSL component values as parameters in decimal or percent notation, e.g. hsl(240,1,0.5) or hsl(66.67%,100%,50%) for blue. As with rgb, there is also the function hsla, and both functions accept an additional parameter for the alpha value.
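
To check which RGB color a given set of HSL values corresponds to, the conversion can be reproduced with Python's standard colorsys module. This is a generic HSL conversion and not PDFreactor's own code; the function name hsl_to_rgb is hypothetical.

```python
import colorsys


def hsl_to_rgb(hue_degrees: float, saturation: float, lightness: float):
    """Convert HSL (hue 0-360, saturation and lightness 0-1) to 8-bit RGB."""
    # colorsys expects hue as a fraction of a full turn and uses the
    # argument order hue, lightness, saturation ("HLS").
    r, g, b = colorsys.hls_to_rgb(hue_degrees / 360.0, lightness, saturation)
    return tuple(round(c * 255) for c in (r, g, b))


print(hsl_to_rgb(240, 1.0, 0.5))  # hue 240 at full saturation, half lightness
print(hsl_to_rgb(240, 0.0, 0.0))  # zero lightness is black regardless of hue
```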

Spot Colors

Spot or separation colors, e.g. Pantone colors, are special named colors for professional printing. The specific color name is passed as is to the print workflow. As they cannot be displayed on screen (or printed without the correct named color), a fallback color must be specified, e.g. a similar CMYK color. A spot color can be used via the CSS functions -ro-spot and -ro-separation. The functions take two or three parameters: the spot color name, the color tint (which is optional and defaults to 1.0, which represents maximum "opacity") and the fallback color.

Color Conversion

Different colors can be converted into a common color space. See for more information.

Compound Formats

In addition to rendering HTML and XML styled with CSS, PDFreactor is also able to render documents with compound formats such as images, SVGs or barcodes, so-called replaced elements.

The replaced elements can be mapped to arbitrary elements using styles.

You can use namespaces to include other document formats to integrate XML elements from a different namespace directly within your document.

Images

PDFreactor supports the image formats PNG, JPEG, TIFF, BMP and GIF, as well as limited support for WebP (lossy simple VP8).

Images are embedded by PDFreactor "as-is", whenever possible, unless the properties or are used. This means that images are not modified in any way and will be embedded without any re-encoding and without any loss in quality. Possible discrepancies in perceived quality might occur depending on the PDF viewer and the zoom level.

PDFreactor supports the img element per default in HTML. For other XML languages, you can use proprietary CSS extensions to define an image element. For example, in an XML vocabulary where an image element is <image source='test.jpg'>, the corresponding CSS definition would be:

image {
    -ro-replacedelement: image;
    -ro-source: attr(source);
}

To define an element as an image element, you must specify the replaced element formatter for images for this element, as shown in the example above. Using the -ro-source property and the attr function, you can select an attribute of this element. The value of this attribute must always be of the type URI (Uniform Resource Identifier, https://www.w3.org/Addressing/) and is used to load the image.

Corrupted images, embedded "as-is", may lead to corrupted PDF output.

Save Memory Mode

PDFreactor needs to access image data multiple times during the conversion. It needs to know an image's dimensions during layout, and then the actual binary data to embed it in the PDF during rendering. To avoid having to download the image multiple times and thus slowing down the conversion, PDFreactor keeps downloaded images in memory for quick access. However, in certain scenarios, images can be quite large, e.g. high-resolution TIFFs for print. In this case, it can actually be detrimental to keep the image in memory. You can use the processingPreferences configuration object to change the default behavior of PDFreactor. The value SAVE_MEMORY_IMAGES prevents PDFreactor from keeping images in memory. Instead, they are downloaded each time PDFreactor requires data.

config.setProcessingPreferences(
    ProcessingPreferences.SAVE_MEMORY_IMAGES);
config.ProcessingPreferences = new List<ProcessingPreferences>
{
    ProcessingPreferences.SAVE_MEMORY_IMAGES
};
$config["processingPreferences"] = 
    array(ProcessingPreferences::SAVE_MEMORY_IMAGES);
config['processingPreferences'] =
    [ PDFreactor.ProcessingPreferences.SAVE_MEMORY_IMAGES ]
config['processingPreferences'] =
    [ PDFreactor::ProcessingPreferences::SAVE_MEMORY_IMAGES ]
config.processingPreferences =
    [ PDFreactor.ProcessingPreferences.SAVE_MEMORY_IMAGES ];
config.processingPreferences =
    [ PDFreactor.ProcessingPreferences.SAVE_MEMORY_IMAGES ];
$config["processingPreferences"] =
    [ PDFreactor::ProcessingPreferences->SAVE_MEMORY_IMAGES ];
{ "processingPreferences": [ "SAVE_MEMORY_IMAGES" ] }
--processingPreferences SAVE_MEMORY_IMAGES

SVG

PDFreactor supports the following SVG (Scalable Vector Graphics, https://www.w3.org/Graphics/SVG/) types: SVG and SVGZ. PDFreactor automatically converts SVG documents referenced via the img element. Example:

<img src="diagram.svg" />

Alternatively, you can embed SVG directly into your documents:

a circle:<br/>
<svg width="100" height="100">
    <circle cx="50" cy="50" r="45" fill="yellow" stroke="black" />
</svg>
<br/>sometext.......

When using non-HTML5 documents, an SVG namespace has to be added and used:

<svg:svg xmlns:svg="http://www.w3.org/2000/svg" width="100" height="100">
    <svg:circle cx="50" cy="50" r="45" fill="yellow" stroke="black" />
</svg:svg>

Rasterization

SVGs are embedded into the PDF as vector graphics, keeping them resolution independent. However, SVGs containing masks, filters or non-default composites have to be rasterized, i.e. converted from a vector graphics format into a raster (pixel) image. This behavior can be configured using CSS:

The style -ro-rasterization: avoid disables the aforementioned SVG features to avoid having to rasterize the image.

The property configures the resolution of the rasterization. The default value is 2, meaning twice the default CSS resolution of 96dpi. Accepted values are all positive integers. Higher resolution factors increase the quality of the image, but also increase the conversion time and the size of the output documents.
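
The effect of the resolution factor can be estimated with simple arithmetic, sketched here in Python. The sketch assumes that pixel dimensions scale linearly with the factor, which follows from the description above but is not taken from PDFreactor's code; the function name rasterized_size is hypothetical.

```python
CSS_DPI = 96  # default CSS resolution named in the text above


def rasterized_size(width_css_px: int, height_css_px: int, factor: int = 2):
    """Return (dpi, width_px, height_px) for a given resolution factor."""
    if factor < 1:
        raise ValueError("the resolution factor must be a positive integer")
    return CSS_DPI * factor, width_css_px * factor, height_css_px * factor


print(rasterized_size(100, 100))     # default factor 2
print(rasterized_size(100, 100, 4))  # 4x the pixels per side, 16x in total
```

Since the pixel count grows with the square of the factor, doubling the factor roughly quadruples the uncompressed image data.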

CMYK Colors in SVG

PDFreactor supports CMYK colors in SVGs. Those are passed to the PDF as-is, as long as the SVG is not rasterized.

stroke="cmyk(0.0, 0.0, 0.0, 1.0)"

MathML

To display MathML (Mathematical Markup Language, https://www.w3.org/Math/) in documents, we recommend using the JavaScript library MathJax (https://www.mathjax.org/ and https://github.com/mathjax/MathJax/), which is licensed under the Apache License 2.0. To use it without modifying the input documents, you can use the following user scripts (see User Scripts).

The first script consists of settings for the second one:
"roMjPath" must be set to the URL or path of the directory containing the file MathJax.js, excluding the filename itself.
"roMjFile" specifies the name of the main MathJax file. It should usually be left at its default.
"roMjSvgBlacker" optionally increases the thickness of the fonts used by MathJax.
Please see the comments in the snippet for example values:

roMjPath = "";           // default:  "",
                         // examples: "MathJax/", "../../resource/js/mathjax/",
                         //     "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/"

roMjFile = "MathJax.js"; // default:  "MathJax.js",
                         // examples: "mathjax.js", "mathjaxmod.js"

roMjSvgBlacker = 0;      // default:  0,
                         // examples: 1, 2

The second script uses the values from the first one and inserts the required script elements into the document, so MathJax is loaded and processes all "math" elements. It does not have to be modified.

document.documentElement.firstElementChild.insertAdjacentHTML('beforeend',
    '\u003Cscript type="text/x-mathjax-config">MathJax.Hub.Config(' +
    JSON.stringify({
               jax: ["input/MathML", "output/SVG"],
        extensions: ["mml2jax.js"],
            MathML: { extensions: ["content-mathml.js"] },
               SVG: { blacker: (typeof window.roMjSvgBlacker == "number" &&
                               window.roMjSvgBlacker > 0 ? window.roMjSvgBlacker : 0) }
    }) +
    ');\u003C/script>\n' +
    '\u003Cscript type="text/javascript" src="' +
    (window.roMjPath ? window.roMjPath : "MathJax/") +
    (window.roMjPath && !(window.roMjPath + "").endsWith("/") ? "/" : "") +
    (window.roMjFile ? window.roMjFile : "MathJax.js") +
    '">\u003C/script>'
);

PDFreactor supports MathJax up to version 2. We recommend using the most recent release of that version to display MathML. MathJax v3 is currently not supported.

Barcodes

PDFreactor supports displaying numerous linear and 2D barcode symbologies using the following style:

.barcode {
    -ro-replacedelement: barcode;
}

The resulting replaced element can be customized by applying various CSS properties.

The most important one is -ro-barcode-type, which can be used to select a specific type (and subtype) of barcode to be rendered. For some types, the last argument of the property is also used to configure a unique characteristic of the barcode (refer to the appendix for more information).

The behavior of most of the -ro-barcode-* properties depends on the selected barcode type.

A full list of all supported barcode types, their subtypes and applicable CSS properties can be found in the appendix.

Defining the Content

There are multiple ways to define the content of the barcode. To define it directly, you can use the -ro-barcode-content property:

.barcode {
    -ro-replacedelement: barcode;
    -ro-barcode-type: upc-e;
    -ro-barcode-content: "123456";
}
Example UPC-E barcode

As MaxiCodes require a primary string in mode 2 or 3, the last argument of -ro-barcode-type is used to add it.

.barcode {
    -ro-replacedelement: barcode;
    -ro-barcode-type: maxicode mode-3 "999999999840012";
    -ro-barcode-content: "1234567894561230";
}
Example MaxiCode barcode

If -ro-barcode-content is not set, PDFreactor will try to use the value of the element's href attribute:

HTML:

<a id="qrcode" href="https://www.pdfreactor.com/"></a>

CSS:

#qrcode {
    -ro-replacedelement: barcode;
    -ro-barcode-type: qrcode;
    -ro-barcode-ecc-level: H;
    -ro-barcode-size: 2;
}
Example QR code

If both -ro-barcode-content and the href attribute are empty, PDFreactor will use the text content of the element. That content is always trimmed, i.e. whitespace characters at its beginning and end are removed. By default other sequences of whitespace characters are collapsed to single spaces. Collapsing can be disabled by changing the value of white-space from normal to pre.
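
The whitespace handling of text content can be pictured with the following Python sketch. It is an approximation of the rules described above, limited to the values normal and pre; the function name barcode_text is hypothetical.

```python
def barcode_text(text: str, white_space: str = "normal") -> str:
    """Prepare element text as barcode content (illustrative sketch)."""
    trimmed = text.strip()  # leading and trailing whitespace is always removed
    if white_space == "pre":
        return trimmed  # inner whitespace is kept as written
    # "normal": every inner run of whitespace collapses to a single space
    return " ".join(trimmed.split())


print(repr(barcode_text("  ABC \n  123  ")))
print(repr(barcode_text("  ABC \n  123  ", white_space="pre")))
```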

Automatically resolving relative URLs

If a relative URL is set as the barcode's content using the url function, or if it is retrieved from the href attribute, PDFreactor will automatically try to resolve it according to the document's baseUrl.

HTML:

<a id="gridmatrix" href="#Barcodes"></a>

CSS:

#gridmatrix {
    -ro-replacedelement: barcode;
    -ro-barcode-type: grid-matrix;
}
Example grid matrix

If links are enabled, PDFreactor will automatically check whether the content of the barcode is a valid URL and add the respective link.

Customizing the barcode color

By default, all barcodes will be rendered in black with a transparent background. To change the foreground color, you can use the -ro-barcode-color property. If it is set to currentColor, the value of the color property will be used.

Adjusting the barcode size

When adjusting the size of the barcode, you should differentiate between two aspects: On the one hand, there is its natural size (also called intrinsic size), which is the size the barcode itself would have without the influence of the layout around it. It depends on factors like the barcode type, its content and certain settings, for example its ecc level.

And on the other hand, there is the specific size (also called extrinsic size), which is the size the barcode actually consumes in the layout and the resulting document. It depends on the context and CSS styles of the barcode element.

Adjusting the specific size

The replaced element will be adjusted automatically to comply with the surrounding document's layout. However, as the aspect ratio is not always preserved, this might result in distorted barcodes, i.e. having an incorrect aspect ratio. This can be prevented by setting the object-fit property to contain.
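
What object-fit: contain does to the barcode's dimensions can be sketched in Python as follows. The calculation reflects the general CSS definition of contain (uniform scaling to fit the box); the function name contain and the example sizes are assumptions made for this illustration.

```python
def contain(natural_w: float, natural_h: float, box_w: float, box_h: float):
    """Scale a natural size uniformly so that it fits inside a box."""
    # The smaller of the two ratios is the largest scale that fits both axes.
    scale = min(box_w / natural_w, box_h / natural_h)
    return natural_w * scale, natural_h * scale


# A square barcode placed in a 300 x 150 box stays square.
print(contain(100, 100, 300, 150))
```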

Adjusting the natural size

For some barcode types, the -ro-barcode-size property can be used to select a certain sized version. E.g. for a QR code, setting -ro-barcode-size to 10 would result in a version 10 QR code, which contains 57 x 57 modules.
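
The relation between a QR code version and its module count follows the QR code standard: version 1 has 21 x 21 modules and every further version adds 4 modules per side. A minimal Python sketch, with the hypothetical function name qr_modules_per_side:

```python
def qr_modules_per_side(version: int) -> int:
    """Modules per side of a QR code symbol of the given version (1-40)."""
    if not 1 <= version <= 40:
        raise ValueError("QR code versions range from 1 to 40")
    # Version 1 is 21 x 21 modules; each further version adds 4 per side.
    return 17 + 4 * version


print(qr_modules_per_side(10))
```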

For other types, like databar-expanded, the property adjusts the number of columns which should be used to store data. Applied to PDF417 codes, the property can additionally be used to adjust how many rows are rendered.

The value defined by -ro-barcode-size might be ignored in some cases, like when the selected size is not sufficient to store the specified amount of data.

If applied to a one dimensional barcode, the property sets the bar height.

More detailed descriptions on how -ro-barcode-size behaves depending on the used barcode type can be found in the appendix.

Object and Embed

PDFreactor supports the object and embed elements of HTML. You can use either element or a combination of both to embed any type of data, for example a Flash animation. The simplest code to do so is:

<embed src="myflash.swf" width="256" height="256"
       type="application/x-shockwave-flash"/>

Besides Flash, you can also embed various other formats, e.g. videos. The data is embedded in the PDF, but whether or not it is displayed depends on the formats supported by your PDF viewer.

iframes

An iframe allows another document, for example content from other pages, to be embedded inside an existing one.

The source document

There are two ways to define the inner document of an iframe. The first option is to use the src attribute and specify the URL from which the document should be loaded. The URL may be absolute or relative and should refer to an HTML document.

The second option is useful if the inner document is very short and simple. When using the srcdoc attribute, its value is set to be the inner document's source code.

<iframe src="https://www.pdfreactor.com" width="600" height="400">
</iframe>

<iframe srcdoc="<p>Hello World</p>">
    <b>This is fallback text in case the user-agent does not support
        iframes.</b>
</iframe>
        

If both attributes have been set, srcdoc has priority over src.

Seamless

If the seamless attribute has been set, the iframe's document behaves as if it were part of the document that contains the iframe. That means that the width and height of the iframe are ignored and the inner document is shown completely if possible.

Furthermore, the borders of the iframe are removed and most importantly all styles from the outer document are inherited by the inner document.

When generating the PDF, the headings and other bookmark styles inside the iframe are passed through, so they can be found in the bookmark list.

The seamless attribute is a boolean attribute: its presence means true and its absence means false. The only valid values of seamless are an empty string or "seamless". The attribute can also be used without any value:

<iframe src="https://www.pdfreactor.com" width="600" height="400"
            seamless>
</iframe>

Generally, true and false are INVALID values for boolean attributes.

Customization

Using CSS styles, it is possible to customize the look and functionality of iframes.

The border, padding and margin can be set or removed with the appropriate styles.

iframe {
    border: none;
    padding: 0px;
    margin: 0px;
}

By default, if seamless is false, neither style sheets nor inline styles are passed down to the iframe's document. However, by using the property -ro-passdown-styles, this behavior can be customized.

When generating a PDF with the bookmarks feature enabled, the headings in the document are added as bookmarks to quickly navigate the document.

Using the property -ro-bookmarks-enabled, it is possible to enable or disable this feature for iframes, thus allowing the headings of the inner document to be added to the bookmarks list or not. The property can be set to either true or false. If the iframe is seamless, it is set to true by default.

<iframe src="https://www.pdfreactor.com" width="600" height="400"
    seamless="seamless" style="-ro-passdown-styles:stylesheets-only;
    -ro-bookmarks-enabled:false;">
</iframe>

Canvas Element

PDFreactor has built-in support for the canvas element of HTML5. The canvas element is a dynamic image for rendering graphics primitives on the fly. In contrast to other replaced elements, the content of the canvas element must be generated dynamically via JavaScript, instead of referencing an external resource that contains the content to be displayed (as is the case, for example, for images).

Below is a simple code fragment which renders shadowed text into a canvas element:

<head>
    <script type="text/javascript">
        function draw() {
            var ctx = document.getElementById("canvas").getContext('2d');
            ctx.font = "50px 'sans-serif'";
            ctx.shadowBlur = 5;
            ctx.shadowColor = "#aaa";
            ctx.shadowOffsetX = 2;
            ctx.shadowOffsetY = 2;
            ctx.fillStyle = "black";
            ctx.fillText("PDFreactor",0,50);
        }
    </script>
</head>
...
<body onload="draw();">
    <canvas id="canvas" width="400" height="300">
        Canvas element is not supported.
    </canvas>
</body>

Resolution Independence

PDFreactor by default does not use a resolution-dependent bitmap as the core of the canvas. Instead it converts the graphics commands from JavaScript to resolution-independent PDF objects. This avoids resolution-related issues like blurriness or pixelation.

Shadows cannot be converted to PDF objects, so they are added as images. This does not affect other objects in the canvas.

Accessing ImageData of a canvas or setting a non-default composite causes that canvas to be rasterized entirely.

This behavior can be configured using CSS:

The style -ro-rasterization: avoid disables functionality that causes the rasterization of the canvas.

The style -ro-rasterization: always forces the canvas to be rasterized in any case.

The property -ro-rasterization-supersampling configures the resolution at which the canvas or shadows are rasterized. The default value is 2, meaning twice the default CSS resolution of 96dpi. Accepted values are 1 to 4. Higher resolution factors increase the quality of the image, but also increase the conversion time and the size of the output documents. This does not affect canvas objects that are not rasterized.
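The relation between the resolution factor and the resulting resolution is simple arithmetic. The following sketch only illustrates the numbers stated above; the function rasterization_dpi is a hypothetical helper, not a PDFreactor API:

```python
CSS_DPI = 96  # default CSS resolution


def rasterization_dpi(factor=2):
    """Return the rasterization resolution in dpi for a resolution factor."""
    if not 1 <= factor <= 4:
        raise ValueError("accepted resolution factors are 1 to 4")
    return factor * CSS_DPI


print(rasterization_dpi())   # 192, the default
print(rasterization_dpi(4))  # 384, the maximum
```

Doubling the factor quadruples the number of pixels of a rasterized image, which is why higher factors noticeably affect conversion time and output size.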

PDF Pages as Images

PDFreactor can losslessly embed pages from other PDFs as images in the document to be converted to PDF or draw them into image output. To use a PDF as an image in a document, simply use the img element, like you would for any other image. Example:

<img src="https://resources.myserver.com/report.pdf" />

In the example above, the PDF image will always display the first page of the PDF. You can select which page should be displayed using the CSS property -ro-source-page. The example below shows how to display page 5 of the PDF:

<img src="https://resources.myserver.com/report.pdf" style="-ro-source-page: 5" />

By default the media box, i.e. the entire sheet, of the PDF page is visible and used for sizing. This can be reduced to any other PDF page box like "crop" or "trim" via the property -ro-source-area. The example below shows how to display only the crop box of the PDF page:

<img src="https://resources.myserver.com/report.pdf" style="-ro-source-area: 'crop'" />

PDF images expose the page count of their source document to JavaScript via the proprietary property roPageCount of the img HTML element. If the object is not a PDF image roPageCount will return 0. In the following example, let's assume we have a PDF image with the id "pdfimage":

var reportPdf = document.getElementById("pdfimage");
var pageCount = reportPdf.roPageCount;

Filters and Shadows

Certain effects, like blurring, are not natively supported by the PDF format. In such cases, PDFreactor has to generate an image of the corresponding element, with the effects already applied. The image can always be displayed in the PDF and, if necessary, an invisible text overlay above the image ensures that the text inside the element can still be selected and copied and remains accessible, e.g. to screen readers.

The CSS properties that require element rasterization are:

When creating soft shadows or using blur filters, the blurring itself is a time-consuming task and can, depending on the content to be generated, increase the creation time of the PDF significantly. Thus blurs and shadows should be used with caution if the conversion time of the PDF is important.

The resolution of the resulting image can be customized via the -ro-rasterization-supersampling property. The default value is 2, meaning 192dpi, as a compromise between quality, performance and size.

Please note that increasing the resolution or applying shadows and filters on large or many elements will not only increase the size of the converted PDF but may also slow down PDF readers.

As a safeguard against memory and performance issues, the maximum size of a single rasterized image can be limited. By default an image will be rasterized to have less than 2 megapixels. This is still large enough to cover an A4 page-sized image with the default supersampling. The CSS property allows to customize or disable that limit.

If the only filter function used is opacity, consider using the CSS property opacity instead. PDFreactor uses native PDF functionality to render the transparent element, thus avoiding the drawbacks of rasterization.

JavaScript

This chapter refers to JavaScript in the input document, processed by PDFreactor like in a browser. There are also:

PDFreactor can be configured to process JavaScript that is embedded into or linked from input HTML documents. This functionality can be enabled as follows:

config.setJavaScriptSettings(new JavaScriptSettings()
    .setEnabled(true));
config.JavaScriptSettings = new JavaScriptSettings
{
    Enabled = true
};
config.javaScriptSettings = {
    enabled: true
};
config.javaScriptSettings = {
    enabled: true
};
$config["javaScriptSettings"] = array(
    "enabled" => true
);
config['javaScriptSettings'] = {
    'enabled': True
}
config['javaScriptSettings'] = {
    enabled: true
}
$config["javaScriptSettings"] = {
    'enabled' => true
};
{ "javaScriptSettings": {
    "enabled": true
}}

Shorthand:

-j

Longhand:

-C config.json

With the following config.json:

{ "javaScriptSettings": {
    "enabled": true
}}

It is also possible to manually add scripts:

config.setUserScripts(
    new Resource().setContent("console.log('test')"));
config.UserScripts = new List<Resource>
{
    new Resource() { Content = "console.log('test')" }
};
config.userScripts = [
    { content: "console.log('test')" }
];
config.userScripts = [
    { content: "console.log('test')" }
];
$config["userScripts"] = array(
    array("content" => "console.log('test')")
);
config['userScripts'] = [
    { 'content': 'console.log("test")' }
]
config['userScripts'] = [
    { 'content': 'console.log("test")' }
]
$config["userScripts"] = [
    { "content" => "console.log('test')" }
];
{ "userScripts": [
    { "content": "console.log('test')" }
]}

Shorthand:

-j "console.log('test')"

Longhand:

-C config.json

With the following config.json:

{ "userScripts": [
    { "content": "console.log('test')" }
]}

See the PDFreactor API documentation for details on these API methods.

JavaScript processing during PDF conversion works like it does in a browser, with some exceptions:

JavaScript processing is subject to a few other limitations that will be eliminated in future versions of PDFreactor:

JavaScript modes

Additional debug information can be logged at different granularities, provided that logging is enabled:

config.setJavaScriptSettings(new JavaScriptSettings()
    .setEnabled(true)
    .setDebugMode(JavaScriptDebugMode.EXCEPTIONS));
config.JavaScriptSettings = new JavaScriptSettings
{
    Enabled = true,
    DebugMode = JavaScriptDebugMode.EXCEPTIONS
};
config.javaScriptSettings = {
    enabled: true,
    debugMode: PDFreactor.JavaScriptDebugMode.EXCEPTIONS
};
config.javaScriptSettings = {
    enabled: true,
    debugMode: PDFreactor.JavaScriptDebugMode.EXCEPTIONS
};
$config["javaScriptSettings"] = array(
    "enabled" => true,
    "debugMode" => JavaScriptDebugMode::EXCEPTIONS
);
config['javaScriptSettings'] = {
    'enabled': True,
    'debugMode': PDFreactor.JavaScriptDebugMode.EXCEPTIONS
}
config['javaScriptSettings'] = {
    enabled: true,
    'debugMode': PDFreactor::JavaScriptDebugMode::EXCEPTIONS
}
$config["javaScriptSettings"] = {
    'enabled' => true,
    'debugMode' => PDFreactor::JavaScriptDebugMode->EXCEPTIONS
};
{ "javaScriptSettings": {
    "enabled": true,
    "debugMode": "EXCEPTIONS"
}}
-C config.json

With the following config.json:

{ "javaScriptSettings": {
    "enabled": true,
    "debugMode": "EXCEPTIONS"
}}

The values of JavaScriptDebugMode are, in order of verbosity:

  • NONE: disables debugging. This is the default mode. It is highly recommended for use in production, as all other modes affect performance negatively by providing the debug information.

  • POSITIONS: enables debugging at the least verbose level. The filenames and line numbers that caused output (e.g. via console.log) are logged. The names of scripts about to be processed are logged as well.

  • EXCEPTIONS: enables debugging with all output from POSITIONS and additionally logs all exceptions thrown during JavaScript processing.

  • FUNCTIONS: enables debugging with all output from EXCEPTIONS and additionally logs all functions entered or exited, including parameters and return values or exceptions.

  • LINES: enables debugging at the most verbose level. In addition to all output from FUNCTIONS every line of executed JavaScript is logged.
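Because each mode includes all output of the less verbose ones, the modes form a simple ordered scale. The following sketch models that ordering; the list DEBUG_MODES and the function includes_output_of are illustrative names only and not part of the PDFreactor API:

```python
# Debug modes in order of verbosity, as listed above.
DEBUG_MODES = ["NONE", "POSITIONS", "EXCEPTIONS", "FUNCTIONS", "LINES"]


def includes_output_of(mode, other):
    """Return True if 'mode' logs everything that 'other' logs."""
    return DEBUG_MODES.index(mode) >= DEBUG_MODES.index(other)


print(includes_output_of("FUNCTIONS", "EXCEPTIONS"))  # True
print(includes_output_of("POSITIONS", "EXCEPTIONS"))  # False
```

When only exceptions are of interest, EXCEPTIONS is therefore the least verbose mode that is sufficient.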

JavaScript libraries and frameworks

The following table lists some of the JavaScript libraries and frameworks supported by PDFreactor:

Library      Notes
jQuery       functional, extensively tested
Highcharts   functional
MooTools     functional
Modernizr    functional
Flotr2       functional
amCharts     functional
Underscore   functional
Handlebars   functional
Less.js      functional
Leaflet      functional
RequireJS    functional
Prototype    functional, except for event functionality
MathJax      functional, SVG output only, see

Proprietary Access to Layout Information

PDFreactor allows JavaScript access to some layout information via the proprietary object ro.layout.

Descriptions

Many proprietary JavaScript functions return so called Description objects: PageDescription, BoxDescription, etc. These objects provide layout information on the specific type of document item, such as a document page.

The description objects contain information about the layout of their content. The properties of a PageDescription, BoxDescription and LineDescription can be found in Appendix: JavaScript Objects And Types.

Description objects are snapshots of the particular moment they were created. Changing the document after getting one has no effect on them.

PageDescriptions

Describes the dimensions of a page and its rectangles as well as some further information. The rectangles are described using DOMRects. A PageDescription is retrieved via the index of the desired page. The first page has the index 0.

var pageDesc = ro.layout.getPageDescription(1);

BoxDescriptions

Describes the position and dimensions of the rectangles of a box as well as some further information. The rectangles are described using DOMRects. BoxDescriptions are retrieved via a DOM element, which may have one box, multiple boxes or none.

var element = document.querySelector("#myElem");
var boxDescriptions = ro.layout.getBoxDescriptions(element);

if (boxDescriptions.length > 0) {
    var boxDescription = boxDescriptions[0];
}

LineDescriptions

Contains information about a line of text. It can be retrieved from a BoxDescription.

var lineDescriptions = boxDescription.lineDescriptions;

DOMRects

A DOMRect contains the position and dimensions of a rectangle.

To retrieve the DOMRect from Page- and BoxDescription use the getter functions that take an optional string parameter. This parameter specifies the length unit of the values of the DOMRect and has to be one of the following absolute CSS units: "px", "pt", "pc", "cm", "mm", "in" or "q". By default this value is "px".

var marginRect = boxDescription.getMarginRect("cm");
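The supported units are all absolute CSS units, so they relate to each other by fixed ratios (1in = 96px = 72pt = 6pc = 2.54cm = 25.4mm = 101.6q). The sketch below illustrates these ratios outside of PDFreactor, e.g. for post-processing exported layout data. The names PX_PER_UNIT and convert_px are assumptions made for this illustration:

```python
# Number of CSS pixels per unit, based on 1in = 96px.
PX_PER_UNIT = {
    "px": 1.0,
    "in": 96.0,
    "pt": 96.0 / 72.0,
    "pc": 16.0,
    "cm": 96.0 / 2.54,
    "mm": 96.0 / 25.4,
    "q": 96.0 / 101.6,  # quarter-millimeters
}


def convert_px(value_px, unit="px"):
    """Convert a length in CSS pixels to the given absolute CSS unit."""
    if unit not in PX_PER_UNIT:
        raise ValueError("unsupported unit: " + unit)
    return value_px / PX_PER_UNIT[unit]


# A width of 96px expressed in centimeters and points.
print(round(convert_px(96, "cm"), 2))  # 2.54
print(round(convert_px(96, "pt"), 2))  # 72.0
```

Passing the unit directly to the getter functions, as in the example above, avoids this manual conversion.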

PDF Output Options

It is possible to specify portions of the PDFreactor configuration in document JavaScript at runtime during the conversion. This can be useful if you want to create PDF attachments dynamically, specify PDF-specific settings like encryption on the fly, change the page order according to content-specific criteria, etc.

You can access these PDF output options via the proprietary object ro.pdf. For a full list of supported properties refer to . The default value of these properties is taken from their respective configuration setting from your PDFreactor configuration. For example, if you have specified the author to be "John Smith" in your configuration, the value of the ro.pdf.author property will also be "John Smith" initially and can be changed as desired.

In some cases it might be desirable to specify PDF attachments not in the PDFreactor API, but dynamically via JavaScript, depending on the document. This example shows how to add a PDF attachment from JavaScript.

ro.pdf.attachments.push({
    name: "log.txt",
    data: "My log text.",
    description: "A JavaScript log"
});

This example uses a custom page order to eliminate the third page from the document.

ro.pdf.pageOrder = "1..2,4..-1";

Even if the integration code specifies an author and a title in the configuration, these values can be overridden at runtime.

Original configuration:

config.setAuthor("Brian Greene");
config.setTitle("The Elegant Universe");
config.Author = "Brian Greene";
config.Title = "The Elegant Universe";
$config["author"] = "Brian Greene";
$config["title"] = "The Elegant Universe";
config['author'] = "Brian Greene"
config['title'] = "The Elegant Universe"
config['author'] = "Brian Greene"
config['title'] = "The Elegant Universe"
config.author = "Brian Greene";
config.title = "The Elegant Universe";
config.author = "Brian Greene";
config.title = "The Elegant Universe";
$config["author"] = "Brian Greene";
$config["title"] = "The Elegant Universe";
{ "author": "Brian Greene", "title": "The Elegant Universe" }
--author "Brian Greene" --title "The Elegant Universe"

Override at runtime:

ro.pdf.author = "Stephen Hawking";
ro.pdf.title = "The Universe in a Nutshell";

Exporting Data From JavaScript

Sometimes it can be desirable to make data from JavaScript available to the PDFreactor integration for processing after the conversion has finished. You can export data from document JavaScript via the ro.exports JavaScript property. The exported data can then be accessed on the Result object via the javaScriptExports property.

You can export any data type with ro.exports. However, since the property javaScriptExports returns a string, the data will be converted internally. If the data type is not a string, PDFreactor will try to convert it to JSON. If the data can't be converted, a generic string representation of it is used or null if none is available. This means that you can conveniently export JavaScript objects or arrays, and then parse the data back from JSON.

While it is possible to export strings directly, it is generally recommended to only export JavaScript arrays or objects which will be converted into JSON. If an empty string is exported, it is converted to null when accessed through the Result object in the PDFreactor integration.

Export an object:

ro.exports = {
    message: "my exported data",
    content: [ 1, 2, 3 ]
};

The javaScriptExports property of the Result object will return the following string:

{"message":"my exported data","content":[1,2,3]}

This string can then be parsed or processed further.
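As a minimal sketch of the parsing step on the integration side, the exported string can be handled with a standard JSON parser. The function parse_exports is a hypothetical helper, and the string literal stands in for the value that would be read from the javaScriptExports property of the Result object:

```python
import json


def parse_exports(exports):
    """Parse the exported string; fall back to the raw string if it is not JSON."""
    if exports is None:
        # Nothing was exported, or an empty string was exported.
        return None
    try:
        return json.loads(exports)
    except json.JSONDecodeError:
        # A plain string was exported directly.
        return exports


# Stand-in for the value of the javaScriptExports property.
exported = '{"message":"my exported data","content":[1,2,3]}'
data = parse_exports(exported)
print(data["message"])       # my exported data
print(sum(data["content"]))  # 6
```

The fallback shows why exporting objects or arrays is the safer choice: a directly exported string such as "123" cannot be distinguished from a JSON number after the fact.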

Timeout

By default, PDFreactor will run JavaScript until it is completed. However, erroneous or malicious scripts might contain endless loops or other structures that will prevent the script from ever finishing. To cancel JavaScript processing after a certain amount of time, you can configure a timeout.

The following example limits JavaScript processing time to 20 seconds.

config.setJavaScriptSettings(new JavaScriptSettings()
    .setEnabled(true)
    .setTimeout(20));
config.JavaScriptSettings = new JavaScriptSettings
{
    Enabled = true,
    Timeout = 20
};
config.javaScriptSettings = {
    enabled: true,
    timeout: 20
};
config.javaScriptSettings = {
    enabled: true,
    timeout: 20
};
$config["javaScriptSettings"] = array(
    "enabled" => true,
    "timeout" => 20
);
config['javaScriptSettings'] = {
    'enabled': True,
    'timeout': 20
}
config['javaScriptSettings'] = {
    'enabled': true,
    'timeout': 20
}
$config["javaScriptSettings"] = {
    'enabled' => true,
    'timeout' => 20
};
{ "javaScriptSettings": {
    "enabled": true,
    "timeout": 20
}}
-C config.json

With the following config.json:

{ "javaScriptSettings": {
    "enabled": true,
    "timeout": 20
}}
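If the configuration file is generated by a script rather than written by hand, the JSON can be assembled programmatically. The following sketch only uses the keys shown in this chapter (enabled, timeout, debugMode); the function javascript_config is a hypothetical helper and not part of any PDFreactor client:

```python
import json


def javascript_config(timeout_seconds=None, debug_mode=None):
    """Build a configuration dictionary that enables JavaScript processing."""
    settings = {"enabled": True}
    if timeout_seconds is not None:
        settings["timeout"] = timeout_seconds
    if debug_mode is not None:
        settings["debugMode"] = debug_mode
    return {"javaScriptSettings": settings}


# Serialize the configuration; the text could be saved as config.json.
config_json = json.dumps(javascript_config(timeout_seconds=20), indent=4)
print(config_json)
```

The resulting text corresponds to the config.json shown above and could be passed to the command line via -C.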

awesomizr.js

The JavaScript library awesomizr.js is a collection of helpful functions for use with PDFreactor. You have to import the JavaScript and in some cases the corresponding CSS. Both the script and the CSS files are located in the PDFreactor/samples directory.

You can add the library by using the PDFreactor configuration property userScripts. To add the respective CSS, use the property userStyleSheets:

config
    .setUserStyleSheets(new Resource()
        .setUri("awesomizr.css"))
    .setUserScripts(
        new Resource().setUri("awesomizr.js"),
        new Resource().setContent("Awesomizr.createTableOfContents();"));
config.UserStyleSheets = new List<Resource>
{
    new Resource
    {
        Uri = "awesomizr.css"
    }
};
config.UserScripts = new List<Resource>
{
    new Resource
    {
        Uri = "awesomizr.js"
    },
    new Resource
    {
        Content = "Awesomizr.createTableOfContents();"
    }
};
config.userStyleSheets = [{
    uri: "awesomizr.css"
}];
config.userScripts = [{
    uri: "awesomizr.js"
}, {
    content: "Awesomizr.createTableOfContents();"
}];
config.userStyleSheets = [{
    uri: "awesomizr.css"
}];
config.userScripts = [{
    uri: "awesomizr.js"
}, {
    content: "Awesomizr.createTableOfContents();"
}];
$config["userStyleSheets"] = array(
    array(
        "uri" => "awesomizr.css"
    )
);
$config["userScripts"] = array(
    array(
        "uri" => "awesomizr.js"
    ),
    array(
        "content" => "Awesomizr.createTableOfContents();"
    )
);
config['userStyleSheets'] = [{
    'uri': 'awesomizr.css'
}]
config['userScripts'] = [{
    'uri': 'awesomizr.js'
}, {
    'content': 'Awesomizr.createTableOfContents();'
}]
config['userStyleSheets'] = [{
    'uri': 'awesomizr.css'
}]
config['userScripts'] = [{
    'uri': 'awesomizr.js'
}, {
    'content': 'Awesomizr.createTableOfContents();'
}]
$config["userStyleSheets"] = [{
    "uri" => "awesomizr.css"
}];
$config["userScripts"] = [{
    "uri" => "awesomizr.js"
}, {
    "content" => "Awesomizr.createTableOfContents();"
}];
{
    "userStyleSheets": [{
        "uri": "awesomizr.css"
    }],
    "userScripts": [{
        "uri": "awesomizr.js"
    }, {
        "content": "Awesomizr.createTableOfContents();"
    }]
}
-C config.json

With the following config.json:

{
    "userStyleSheets": [{
        "uri": "awesomizr.css"
    }],
    "userScripts": [{
        "uri": "awesomizr.js"
    }, {
        "content": "Awesomizr.createTableOfContents();"
    }]
}

Of course, the library and the stylesheet can alternatively be imported by the document itself. However, please note that some functions only work with PDFreactor.

The capabilities of awesomizr.js include:

Output Formats

PDF Output

PDF is the default output format of PDFreactor.

Generally PDFreactor generates PDFs with the Adobe PDF version 1.4. However, some PDF features may require viewers that support newer versions of PDF.

PDF/A and PDF/X conformance may force different PDF versions.

The PDF documents created with PDFreactor may contain additional metadata, which may require a PDF reader that is able to display a later version of Adobe PDF correctly.

Some features of PDFreactor are specific to the PDF output format:

Bookmarks

Figure: Bookmarks in the Adobe Reader

PDFreactor adds bookmarks to your document automatically. This can be disabled by using the disableBookmarks configuration property like this:

config.setDisableBookmarks(true);
config.DisableBookmarks = true;
$config["disableBookmarks"] = true;
config['disableBookmarks'] = True
config['disableBookmarks'] = true
config.disableBookmarks = true;
config.disableBookmarks = true;
$config["disableBookmarks"] = true;
{ "disableBookmarks": true }
--disableBookmarks

When the default HTML mode is enabled, some bookmark levels are applied by default, e.g. the following ones for heading elements:

h1 { bookmark-level: 1;}
h2 { bookmark-level: 2;}
h3 { bookmark-level: 3;}
h4 { bookmark-level: 4;}
h5 { bookmark-level: 5;}
h6 { bookmark-level: 6;}

Using the bookmark-level style you can create bookmarks which link to arbitrary XML elements in your PDF files.

element { bookmark-level: 1; }

Using this property, one can structure the specified elements within the bookmark view of the PDF viewer. The elements are ordered in ascending order. The element with the lowest bookmark level is on top of the bookmark hierarchy (similar to HTML headlines). Several bookmark levels can be set using the bookmark-level style.

The property bookmark-state defines whether the entry is initially open, showing its descendants in the bookmark view of the PDF viewer. With the property bookmark-label it is possible to define the bookmark title. By default, the element's text content is used.
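How bookmark levels translate into a hierarchy can be illustrated with a small sketch. It nests entries in document order so that each entry becomes a child of the closest preceding entry with a lower level, similar to HTML headings. This is only a model of the concept, not PDFreactor's implementation, and the function build_outline is a hypothetical name:

```python
def build_outline(entries):
    """Nest (level, label) pairs, given in document order, into a tree."""
    root = {"level": 0, "label": None, "children": []}
    stack = [root]
    for level, label in entries:
        node = {"level": level, "label": label, "children": []}
        # Go back up until the top of the stack has a lower level.
        while stack[-1]["level"] >= level:
            stack.pop()
        stack[-1]["children"].append(node)
        stack.append(node)
    return root["children"]


outline = build_outline([
    (1, "Introduction"),
    (2, "Motivation"),
    (2, "Scope"),
    (1, "Conclusion"),
])
for top in outline:
    print(top["label"], [child["label"] for child in top["children"]])
```

In this model, the two level 2 entries end up below "Introduction", while "Conclusion" starts a new top-level entry.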

How the coordinate to scroll to is determined can be changed via the property , e.g. the scroll target can be offset by 1cm or the page the element is on can be used instead of the element itself.

Links

PDFreactor adds links to your documents by default. This can be disabled by using the disableLinks configuration property like this:

config.setDisableLinks(true);
config.DisableLinks = true;
$config["disableLinks"] = true;
config['disableLinks'] = True
config['disableLinks'] = true
config.disableLinks = true;
config.disableLinks = true;
$config["disableLinks"] = true;
{ "disableLinks": true }
--disableLinks

For HTML documents the following link styles are applied by default, enabling external and internal links:

a[href] { -ro-link:   attr(href); }
a[name] { -ro-anchor: attr(name); }
[id]    { -ro-anchor: attr(id);   }

Using the styles -ro-link and -ro-anchor, arbitrary elements can be defined to be links or anchors.

linkElement[linkAttribute] { -ro-link: attr(linkAttribute); }
anchorElement[anchorAttribute] { -ro-anchor: attr(anchorAttribute); }

Some PDF viewers recognize URLs written in plain text and convert them to links. This happens independently of PDFreactor and its settings and properties.

Please see for a way to embed target files into the output PDF instead of linking to them.

The clickable areas of links

The proprietary property can be used to specify how the 'clickable' areas of links are determined.

This style is not inherited. It has to be set on the same elements as -ro-link, when those should deviate from the default value: all.

The scroll coordinate for internal links

How the coordinate to scroll to is determined for internal links can be changed via the property on the target element, e.g. the scroll target can be offset by 1cm or the page the element is on can be used instead of the element itself.

Links in Images

When links are enabled the following also create clickable links:

  • Links in SVGs. The target is taken from the a element itself. The clickable area is the bounding rectangle of all elements contained in that element.

  • HTML image map links. The clickable area and target are based on the attributes of the area.

  • Barcodes containing an absolute URL. Those are clickable in their entirety pointing to that URL.

Metadata

The title of a generated PDF document, as well as the additional metadata author, subject and keywords, can be specified in multiple ways:

By default the <title> tag as well as various <meta> tags are read.

The metadata can also be read from other elements using the properties -ro-title, -ro-author, -ro-subject and -ro-keywords.

When a metadata property applies to multiple elements the values are concatenated. Therefore it is recommended to disable the default set elements when specifying other ones:

/* Disable setting title from title or meta tags */
head * {
    -ro-title: none;
}
/* Set title from first heading */
body > h1:first-of-type {
    -ro-title: content();
}

The metadata of the document can be overridden from the API. The following metadata can be directly set by PDFreactor:

  • author – The author of the document

  • title – The document's title

  • subject – The subject of the document

  • creator – The content creator

  • keywords – Usually a comma-separated list of keywords for search engines

config
    .setAuthor("John Doe")
    .setTitle("Architecture of the World Wide Web, Volume One")
    .setSubject("Architecture of the world wide web")
    .setCreator("John's DoeNuts, Inc.")
    .setKeywords("w3c, www");
config.Author = "John Doe";
config.Title = "Architecture of the World Wide Web, Volume One";
config.Subject = "Architecture of the world wide web";
config.Creator = "John's DoeNuts, Inc.";
config.Keywords = "w3c, www";
$config["author"] = "John Doe";
$config["title"] = "Architecture of the World Wide Web, Volume One";
$config["subject"] = "Architecture of the world wide web";
$config["creator"] = "John's DoeNuts, Inc.";
$config["keywords"] = "w3c, www";
config['author'] = 'John Doe'
config['title'] = 'Architecture of the World Wide Web, Volume One'
config['subject'] = 'Architecture of the world wide web'
config['creator'] = "John's DoeNuts, Inc."
config['keywords'] = 'w3c, www'
config['author'] = 'John Doe'
config['title'] = 'Architecture of the World Wide Web, Volume One'
config['subject'] = 'Architecture of the world wide web'
config['creator'] = "John's DoeNuts, Inc."
config['keywords'] = 'w3c, www'
config.author = "John Doe";
config.title = "Architecture of the World Wide Web, Volume One";
config.subject = "Architecture of the world wide web";
config.creator = "John's DoeNuts, Inc.";
config.keywords = "w3c, www";
config.author = "John Doe";
config.title = "Architecture of the World Wide Web, Volume One";
config.subject = "Architecture of the world wide web";
config.creator = "John's DoeNuts, Inc.";
config.keywords = "w3c, www";
$config["author"] = "John Doe";
$config["title"] = "Architecture of the World Wide Web, Volume One";
$config["subject"] = "Architecture of the world wide web";
$config["creator"] = "John's DoeNuts, Inc.";
$config["keywords"] = "w3c, www";
{ "author": "John Doe",
  "title": "Architecture of the World Wide Web, Volume One",
  "subject": "Architecture of the world wide web",
  "creator": "John's DoeNuts, Inc.",
  "keywords": "w3c, www" }
--author "John Doe" \
--title "Architecture of the World Wide Web, Volume One" \
--subject "Architecture of the world wide web" \
--creator "John's DoeNuts, Inc." \
--keywords "w3c, www"

The code above creates metadata as shown in the screenshot below:

Figure: Document properties dialog of Adobe Reader

The PDF "producer" property, also known as "encoding software", cannot be overridden. It will always contain PDFreactor's name and version as well as basic information about the used license. For security purposes, the version number as well as the license information can be suppressed. See for more details.

Custom Properties

You can also add custom properties to the documents, for which you can define the name and value, e.g.

config.setCustomDocumentProperties(
        new KeyValuePair("feedback address", "peter@miller.com"));
config.CustomDocumentProperties = new List<KeyValuePair>
{
    new KeyValuePair("feedback address", "peter@miller.com")
};
$config["customDocumentProperties"] = array(
    array(
        "key" => "feedback address",
        "value" => "peter@miller.com"
    )
);
config['customDocumentProperties'] = [
    { 'key': "feedback address", 'value': "peter@miller.com" }
]
config['customDocumentProperties'] = [
    { key: "feedback address", value: "peter@miller.com" }
]
config.customDocumentProperties = [
    { key: "feedback address", value: "peter@miller.com" }
];
config.customDocumentProperties = [
    { key: "feedback address", value: "peter@miller.com" }
];
$config["customDocumentProperties"] = [
    { "key" => "feedback address", "value" => "peter@miller.com" }
];
{ "customDocumentProperties": [
    { "key": "feedback address", "value": "peter@miller.com" }
]}
-C config.json

With the following config.json:

{ "customDocumentProperties": [
    { "key": "feedback address", "value": "peter@miller.com" }
]}

Interactive PDF Forms

HTML forms are rendered automatically by PDFreactor. In addition, you can also convert HTML forms to fully functional interactive PDF forms (sometimes referred to as AcroForms) using the proprietary CSS property -ro-pdf-format. This property must be specified for the forms you wish to convert to an interactive PDF form.

Example form:

<form id="credentials">
    First Name: <input type="text" value="firstname" />
    Last Name: <input type="text" value="lastname" />
    <input type="submit" />
</form>

To convert the form with the ID "credentials" to an AcroForm, you can use this style declaration:

#credentials, #credentials > input { -ro-pdf-format: pdf; }

Using this style declaration, only the form with the ID "credentials" and the input fields contained in this form are converted to an AcroForm when the PDF is rendered. Only the forms and form elements having this CSS style are converted. You can convert all forms and input fields using this CSS code:

form, form input { -ro-pdf-format: pdf; }

Tagged PDF

Tagged PDF files contain information about the structure of the document. The information about the structure is transported via so-called "PDF tags". Tagging a PDF makes it accessible to assistive technology like screen readers. Furthermore, depending on the application, it may improve the results of copy and paste or allow more advanced processing of the PDF.

Using the addTags configuration property, you can add PDF tags to the PDF documents generated with PDFreactor. If you are generating a PDF from HTML input, the HTML elements and the resulting layout are automatically mapped to the appropriate PDF tag structures, so all you have to do is set the following configuration property to enable this feature:

config.setAddTags(true);
config.AddTags = true;
$config["addTags"] = true;
config['addTags'] = True
config['addTags'] = true
config.addTags = true;
config.addTags = true;
$config["addTags"] = true;
{ "addTags": true }
--addTags

PDF tagging is automatically enabled when it is required by a PDF conformance, like PDF/A-1a, PDF/A-3a or PDF/UA.

For accessible documents it is required to specify the document language, see .

For documents containing text in RTL direction that have to be accessible the property must not be set to "speed", as that only ensures that the text is in the correct order visually, but not logically.

Creating tagged PDFs from non-HTML input documents

When generating PDFs from XML dialects, like DocBook, the elements of this XML language cannot be mapped to PDF tag types automatically. Most of the tag structure is still generated from the information available from the layout of paragraphs, lists, tables and so on. It is, however, necessary to manually mark elements with semantic or structural properties, especially headings.

To do so, you can map XML elements to PDF tag types using proprietary CSS. The relevant properties are -ro-pdf-tag-type and -ro-alt-text, as well as to some extent and .

"-ro-pdf-tag-type" is used to map an element of the XML language you are using to a PDF tag, for example:

sect1 > title {
    -ro-pdf-tag-type: "H2";
}

If you were using DocBook, this would map the "title" elements inside "sect1" elements to the PDF tag "H2" (heading, level 2).

The property "-ro-alt-text" is used to specify an alternative description for an XML element. Example:

img {
    -ro-pdf-tag-type: "Figure";
}
img[alt] {
    -ro-alt-text: attr(alt);
}
The example above maps the HTML element <img> to the PDF tag "Figure", and the content of its alt attribute to an alternative description for this tag.

You can use the property to define which elements or attributes in the input document are used as the source for the names of form elements in the generated PDF. By default, the names are adopted from the value attribute of the form element.

Using the , the name for radio button groups can be adopted in the same way. By default, it will be adopted from the name attribute of the radio button element.

PDF/A Conformance

PDFreactor supports the creation of PDF/A-1a or PDF/A-3a conformant files, as well as other PDF/A sub-formats, which, however, will not be covered in detail.

PDF/A is a family of ISO standards ("ISO 19005") for long-term archiving of documents. The goal of these standards is to ensure the reproduction of the visual appearance as well as the inclusion of the document's structure. All information necessary for displaying the document in the same way every time is embedded in the file. Dependencies on external resources are not permitted. PDF/A-1a and PDF/A-3a also require the output PDF documents to be tagged, providing accessible documents. PDFreactor will automatically ensure the requirements are met as far as possible.

Many companies and government organizations worldwide require PDF/A compliant documents.

PDF/A-1a is the strictest PDF/A standard while the newer PDF/A-3a is more lenient, e.g. allowing transparency and attachments.

PDF/A imposes the following restrictions, which PDFreactor automatically enforces (overriding configuration settings), so no manual intervention is required unless noted otherwise:

  • All used fonts are embedded.

  • All images are embedded.

  • Multi-media content is forbidden.

  • PDF Script is prohibited. (Does not affect JavaScript in the source HTML document)

  • Encryption is prohibited.

  • The PDF must be tagged.

  • Metadata included in the PDF is required to be standard-based XMP.

  • Colors are specified in a device-independent manner. (see below)

  • Attachments are prohibited. (PDF/A-1 only)

  • Transparency is prohibited (PDF/A-1 only), see image alpha channels in PDF/A-1.

PDF/A documents must use either RGB or CMYK colors exclusively (color keywords and gray colors will be converted appropriately). By default RGB colors are expected. Using CMYK requires an output intent including an ICC profile. (It is also possible to specify an RGB profile to replace the default sRGB.) Please see .

To create a PDF/A conformant document, the configuration property conformance is used in the PDFreactor integration, e.g.:

config.setConformance(Conformance.PDFA3A);
config.Conformance = Conformance.PDFA3A;
$config["conformance"] = Conformance::PDFA3A;
config['conformance'] = PDFreactor.Conformance.PDFA3A
config['conformance'] = PDFreactor::Conformance::PDFA3A
config.conformance = PDFreactor.Conformance.PDFA3A;
config.conformance = PDFreactor.Conformance.PDFA3A;
$config["conformance"] = PDFreactor::Conformance->PDFA3A;
{ "conformance": "PDFA3A" }
--conformance "PDFA3A"

The supported PDF/A conformance levels are PDF/A-1a, PDF/A-1b, PDF/A-2a, PDF/A-2b, PDF/A-2u, PDF/A-3a, PDF/A-3b and PDF/A-3u.

PDF/A-1 alpha channels

Images in PDF/A-1 documents may have an alpha channel. However, the values in the channel may only be the minimum and maximum, i.e. fully transparent and fully opaque. For images that violate this requirement, PDFreactor applies dithering to the alpha channel to create a valid one that approximates the original.
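To illustrate what dithering an alpha channel to two levels means, here is a minimal Python sketch. It uses Floyd-Steinberg error diffusion, which is an assumption chosen for illustration; the manual does not state which dithering algorithm PDFreactor applies. The function name dither_alpha and the row-major list layout are hypothetical.

```python
def dither_alpha(alpha, width, height):
    """Reduce 8-bit alpha values (row-major list) to 0 or 255.

    Illustrative Floyd-Steinberg error diffusion; not PDFreactor's
    actual implementation, which is not documented here.
    """
    buf = [float(a) for a in alpha]
    out = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            i = y * width + x
            old = buf[i]
            new = 255 if old >= 128 else 0
            out[i] = new
            err = old - new
            # Distribute the quantization error to neighbours not yet processed.
            if x + 1 < width:
                buf[i + 1] += err * 7 / 16
            if y + 1 < height:
                if x > 0:
                    buf[i + width - 1] += err * 3 / 16
                buf[i + width] += err * 5 / 16
                if x + 1 < width:
                    buf[i + width + 1] += err * 1 / 16
    return out


# A 4x4 patch that is uniformly about 50 % transparent.
patch = dither_alpha([128] * 16, 4, 4)
print(patch)
```

The result contains only fully transparent and fully opaque values, while the share of opaque pixels roughly follows the original alpha level.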

PDFreactor can ignore the alpha channels of images, thus making them compatible with PDF/A-1 output. This can be done with the ignoreAlpha configuration property like this:

config.setIgnoreAlpha(true);
config.IgnoreAlpha = true;
$config["ignoreAlpha"] = true;
config['ignoreAlpha'] = True
config['ignoreAlpha'] = true
config.ignoreAlpha = true;
config.ignoreAlpha = true;
$config["ignoreAlpha"] = true;
{ "ignoreAlpha": true }
--ignoreAlpha

Please note that ignoring the alpha channel of images may lead to unexpected results.

Validation

PDFreactor can optionally validate the generated PDF against specified PDF/A or PDF/UA conformances using the configuration property validateConformance. Validation is optional and might take several minutes depending on the size and complexity of the document. It can be enabled like this:

config.setValidateConformance(true);
config.ValidateConformance = true;
$config["validateConformance"] = true;
config['validateConformance'] = True
config['validateConformance'] = true
config.validateConformance = true;
config.validateConformance = true;
$config["validateConformance"] = true;
{ "validateConformance": true }
--validateConformance
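As a rough illustration of how the conformance and validateConformance properties combine, the following Python sketch assembles a JSON configuration in the same shape as the JSON listings above. The helper name build_conformance_config is hypothetical, and the set of accepted names is deliberately limited to constants that appear in this chapter; the sketch does not call PDFreactor itself.

```python
import json

# Conformance names that appear in this chapter. Further levels exist,
# but their constant names are not listed here, so they are left out.
KNOWN_CONFORMANCES = {"PDFA3A", "PDFA3A_PDFUA", "PDFUA", "PDFX4"}


def build_conformance_config(conformance, validate=False):
    """Return a JSON-ready dict (hypothetical helper, not a PDFreactor API)."""
    if conformance not in KNOWN_CONFORMANCES:
        raise ValueError(f"unknown conformance name: {conformance}")
    config = {"conformance": conformance}
    if validate:
        # Validation is optional and may take several minutes for large documents.
        config["validateConformance"] = True
    return config


config_json = json.dumps(build_conformance_config("PDFA3A", validate=True))
print(config_json)
```

The resulting string could be saved as a config.json file or sent as the configuration payload, depending on the integration.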

It is also possible to create documents that are PDF/UA compliant in addition to being PDF/A compliant, combining the benefits of both formats for maximum accessibility and archivability. We highly recommend adding PDF/UA conformance when creating PDF/A documents:

config.setConformance(Conformance.PDFA3A_PDFUA);
config.Conformance = Conformance.PDFA3A_PDFUA;
$config["conformance"] = Conformance::PDFA3A_PDFUA;
config['conformance'] = PDFreactor.Conformance.PDFA3A_PDFUA
config['conformance'] = PDFreactor::Conformance::PDFA3A_PDFUA
config.conformance = PDFreactor.Conformance.PDFA3A_PDFUA;
config.conformance = PDFreactor.Conformance.PDFA3A_PDFUA;
$config["conformance"] = PDFreactor::Conformance->PDFA3A_PDFUA;
{ "conformance": "PDFA3A_PDFUA" }
--conformance "PDFA3A_PDFUA"

PDF/UA Conformance

PDF/UA is a standard for accessible PDF documents, which has been adopted as a recommendation or requirement by many organizations worldwide.

It primarily defines correct PDF tagging. The only other restriction that may require manual intervention is that the document must have a title. (If the title is not specified in the input document, it can be set via the configuration property title.)

PDFreactor can create PDF/UA compliant documents. Tagging is done by a sophisticated algorithm. For most documents it does not require any manual tweaking to produce results that pass accessibility checks with no errors and little to no warnings.

To create a PDF/UA conformant document, the configuration property conformance can be used in the PDFreactor integration, e.g.:

config.setConformance(Conformance.PDFUA);
config.Conformance = Conformance.PDFUA;
$config["conformance"] = Conformance::PDFUA;
config['conformance'] = PDFreactor.Conformance.PDFUA
config['conformance'] = PDFreactor::Conformance::PDFUA
config.conformance = PDFreactor.Conformance.PDFUA;
config.conformance = PDFreactor.Conformance.PDFUA;
$config["conformance"] = PDFreactor::Conformance->PDFUA;
{ "conformance": "PDFUA" }
--conformance "PDFUA"

It is also possible to create documents that are PDF/A compliant in addition to being PDF/UA compliant, combining the benefits of both formats for maximum accessibility and archivability. We recommend adding PDF/A-3a conformance when creating PDF/UA documents, as long as the additional restrictions are met by the input document.

config.setConformance(Conformance.PDFA3A_PDFUA);
config.Conformance = Conformance.PDFA3A_PDFUA;
$config["conformance"] = Conformance::PDFA3A_PDFUA;
config['conformance'] = PDFreactor.Conformance.PDFA3A_PDFUA
config['conformance'] = PDFreactor::Conformance::PDFA3A_PDFUA
config.conformance = PDFreactor.Conformance.PDFA3A_PDFUA;
config.conformance = PDFreactor.Conformance.PDFA3A_PDFUA;
$config["conformance"] = PDFreactor::Conformance->PDFA3A_PDFUA;
{ "conformance": "PDFA3A_PDFUA" }
--conformance "PDFA3A_PDFUA"

PDF/X Conformance

PDFreactor supports the creation of PDF/X conformant files, specifically PDF/X-1a:2001, PDF/X-3:2002, PDF/X-1a:2003, PDF/X-3:2003, PDF/X-4 and PDF/X-4p. PDF/X restrictions and requirements are enforced as far as possible, which may cause configuration settings to be overridden or conversions to fail with an error message describing non-compliant content or settings that have to be resolved manually. The restrictions and requirements of PDF/X include:

  • All fonts must be embedded.

  • Multimedia content and non-printable annotations are prohibited.

  • Encryption is prohibited.

  • No scripts may be embedded. (This does not affect JavaScript in the input document.)

  • Transparency is prohibited (except in PDF/X-4), see image alpha channels in PDF/A-1.

  • Colors must be specified as CMYK, gray, keywords or spot. (PDF/X-3 relaxes this restriction to allow RGB. However, this requires ICC profile based conversion, which not every print workflow can handle.)

  • An output intent is required, consisting of an output condition identifier string and an ICC profile. (Depending on the exact conformance and target environment it may be legal or required to omit the ICC profile, as long as the identifier is known to the target environment. Constants for the default profiles of Adobe Acrobat Pro DC are available for usage with PDF/X-4p. Please note that the availability of these default profiles may vary between different versions of Acrobat Pro.) Please see .

  • The title metadata is required. Usually, it is set by the document's title element, but it can also be set by the CSS property -ro-title. The third option is to set it via the configuration property title. Please see .

To create a PDF/X conformant document, the configuration property conformance can be used in the PDFreactor integration, e.g.:

config.setConformance(Conformance.PDFX4);
config.Conformance = Conformance.PDFX4;
$config["conformance"] = Conformance::PDFX4;
config['conformance'] = PDFreactor.Conformance.PDFX4
config['conformance'] = PDFreactor::Conformance::PDFX4
config.conformance = PDFreactor.Conformance.PDFX4;
config.conformance = PDFreactor.Conformance.PDFX4;
$config["conformance"] = PDFreactor::Conformance->PDFX4;
{ "conformance": "PDFX4" }
--conformance "PDFX4"

ICC Profiles and Output Intents

PDFreactor allows you to set the output intent of the PDF document, consisting of an identifier and an ICC profile. This is required for certain PDF/A and PDF/X conformance modes, with the ICC profile being optional in some cases. The example below demonstrates how to use the configuration property outputIntent:

config.setOutputIntent(new OutputIntent()
    .setIdentifier("ICC profile identifier")

    // Use this if you are loading the ICC profile via URL (ignored if data is set)
    .setUrl("URL/to/ICC/profile")

    // Use this if you want to specify the ICC profile's binary data
    .setData(iccProfileByteArray)
);
config.OutputIntent = new OutputIntent
{
    Identifier = "ICC profile identifier",

    // Use this if you are loading the ICC profile via URL (ignored if data is set)
    Url = "URL/to/ICC/profile",

    // Use this if you want to specify the ICC profile's binary data
    Data = iccProfileByteArray
};
config.outputIntent = {
    identifier: "ICC profile identifier",

    // Use this if you are loading the ICC profile via URL (ignored if data is set)
    url: "URL/to/ICC/profile",

    // Use this if you want to specify the ICC profile's binary data as base64 string
    data: iccProfileBase64
};
config.outputIntent = {
    identifier: "ICC profile identifier",

    // Use this if you are loading the ICC profile via URL (ignored if data is set)
    url: "URL/to/ICC/profile",

    // Use this if you want to specify the ICC profile's binary data as base64 string
    data: iccProfileBase64
};
$config["outputIntent"] = array(
    "identifier" => "ICC profile identifier",

    // Use this if you are loading the ICC profile via URL (ignored if data is set)
    "url" => "URL/to/ICC/profile",

    // Use this if you want to specify the ICC profile's binary data as base64 string
    "data" => $iccProfileBase64
);
config['outputIntent'] = {
    'identifier': "ICC profile identifier",

    # Use this if you are loading the ICC profile via URL (ignored if data is set)
    'url': "URL/to/ICC/profile",

    # Use this if you want to specify the ICC profile's binary data as base64 string
    'data': iccProfileBase64
}
config['outputIntent'] = {
    identifier: "ICC profile identifier",

    # Use this if you are loading the ICC profile via URL (ignored if data is set)
    url: "URL/to/ICC/profile",

    # Use this if you want to specify the ICC profile's binary data as base64 string
    data: iccProfileBase64
}
$config["outputIntent"] = {
    "identifier" => "ICC profile identifier",

    # Use this if you are loading the ICC profile via URL (ignored if data is set)
    "url" => "URL/to/ICC/profile",

    # Use this if you want to specify the ICC profile's binary data as base64 string
    "data" => $iccProfileBase64
};
{ "outputIntent": {
    "identifier": "ICC profile identifier",
    "url": "URL/to/ICC/profile",
    "data": iccProfileBase64
}}
-C config.json

With the following config.json:

{ "outputIntent": {
    "identifier": "ICC profile identifier",
    "url": "URL/to/ICC/profile",
    "data": iccProfileBase64
}}

The property identifier sets a string identifying the intended output device or production condition in human- or machine-readable form. The property url points to an ICC profile file and the property data sets the binary data of such a profile, the latter having priority.
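The following Python sketch shows one way the base64 string for the data property could be produced and how the outputIntent object from the JSON listing above could be assembled. The helper name build_output_intent and its parameters are hypothetical, and the placeholder bytes stand in for the contents of a real ICC profile file.

```python
import base64
import json


def build_output_intent(identifier, url=None, profile_bytes=None):
    """Assemble the outputIntent object (hypothetical helper)."""
    intent = {"identifier": identifier}
    if url is not None:
        intent["url"] = url
    if profile_bytes is not None:
        # Binary data is passed as a base64 string; when both are set,
        # data has priority over url.
        intent["data"] = base64.b64encode(profile_bytes).decode("ascii")
    return {"outputIntent": intent}


# Placeholder bytes; in practice these would be read from an ICC profile file.
fake_profile = b"\x00\x01\x02\x03"
config = build_output_intent("ICC profile identifier", profile_bytes=fake_profile)
print(json.dumps(config))
```

Supplying only one of url or data keeps the configuration unambiguous, since data takes precedence when both are present.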

The color space of the output intent profile overrides the target color space.

Color Space Conversion

In cases when output PDF documents must consist only of colors and images of a certain color space, but not all input documents and resources match that, you can enable color space conversion. For example, you can convert all CSS colors and images to CMYK with a specified ICC profile matching the output intent of a PDF/A or a PDF/X for printing:

// The required output intent
config.setOutputIntent(new OutputIntent()
    .setIdentifier("ICC profile identifier")
    .setUrl("URL/to/ICC/profile"));
// Color space conversion settings
config.setColorSpaceSettings(new ColorSpaceSettings()
    // The same profile as the output intent, required for accurate conversion to CMYK
    .setCmykIccProfile(new Resource().setUri("URL/to/ICC/profile"))
    // Not necessary to set in this case (overridden by output intent), but recommended
    .setTargetColorSpace(ColorSpace.CMYK)
    // Enable conversion of RGB colors and images to CMYK
    .setConversionEnabled(true));
// The required output intent
config.OutputIntent = new OutputIntent()
{
    Identifier = "ICC profile identifier",
    Url = "URL/to/ICC/profile"
};
// Color space conversion settings
config.ColorSpaceSettings = new ColorSpaceSettings
{
    // The same profile as the output intent, required for accurate conversion to CMYK
    CmykIccProfile = new Resource() { Uri = "URL/to/ICC/profile" },
    // Not necessary to set in this case (overridden by output intent), but recommended
    TargetColorSpace = ColorSpace.CMYK,
    // Enable conversion of RGB colors and images to CMYK
    ConversionEnabled = true
};
// The required output intent
config.outputIntent = {
    identifier: "ICC profile identifier",
    url: "URL/to/ICC/profile"
};
// Color space conversion settings
config.colorSpaceSettings = {
    // The same profile as the output intent, required for accurate conversion to CMYK
    cmykIccProfile: { uri: "URL/to/ICC/profile" },
    // Not necessary to set in this case (overridden by output intent), but recommended
    targetColorSpace: PDFreactor.ColorSpace.CMYK,
    // Enable conversion of RGB colors and images to CMYK
    conversionEnabled: true
};
// The required output intent
config.outputIntent = {
    identifier: "ICC profile identifier",
    url: "URL/to/ICC/profile"
};
// Color space conversion settings
config.colorSpaceSettings = {
    // The same profile as the output intent, required for accurate conversion to CMYK
    cmykIccProfile: { uri: "URL/to/ICC/profile" },
    // Not necessary to set in this case (overridden by output intent), but recommended
    targetColorSpace: PDFreactor.ColorSpace.CMYK,
    // Enable conversion of RGB colors and images to CMYK
    conversionEnabled: true
};
// The required output intent
$config["outputIntent"] = array(
    "identifier" => "ICC profile identifier",
    "url" => "URL/to/ICC/profile"
);
// Color space conversion settings
$config["colorSpaceSettings"] = array(
    // The same profile as the output intent, required for accurate conversion to CMYK
    "cmykIccProfile" => array("uri" => "URL/to/ICC/profile"),
    // Not necessary to set in this case (overridden by output intent), but recommended
    "targetColorSpace" => ColorSpace::CMYK,
    // Enable conversion of RGB colors and images to CMYK
    "conversionEnabled" => true
);
# The required output intent
config['outputIntent'] = {
    'identifier': "ICC profile identifier",
    'url': "URL/to/ICC/profile"
}
# Color space conversion settings
config['colorSpaceSettings'] = {
    # The same profile as the output intent, required for accurate conversion to CMYK
    'cmykIccProfile': { 'uri': "URL/to/ICC/profile" },
    # Not necessary to set in this case (overridden by output intent), but recommended
    'targetColorSpace': PDFreactor.ColorSpace.CMYK,
    # Enable conversion of RGB colors and images to CMYK
    'conversionEnabled': True
}
# The required output intent
config['outputIntent'] = {
    identifier: "ICC profile identifier",
    url: "URL/to/ICC/profile"
}
# Color space conversion settings
config['colorSpaceSettings'] = {
    # The same profile as the output intent, required for accurate conversion to CMYK
    cmykIccProfile: { uri: "URL/to/ICC/profile" },
    # Not necessary to set in this case (overridden by output intent), but recommended
    targetColorSpace: PDFreactor::ColorSpace::CMYK,
    # Enable conversion of RGB colors and images to CMYK
    conversionEnabled: true
}
# The required output intent
$config["outputIntent"] = {
    "identifier" => "ICC profile identifier",
    "url" => "URL/to/ICC/profile"
};
# Color space conversion settings
$config["colorSpaceSettings"] = {
    # The same profile as the output intent, required for accurate conversion to CMYK
    "cmykIccProfile" => { "uri" => "URL/to/ICC/profile" },
    # Not necessary to set in this case (overridden by output intent), but recommended
    "targetColorSpace" => PDFreactor::ColorSpace->CMYK,
    # Enable conversion of RGB colors and images to CMYK
    "conversionEnabled" => true
};
{
    "outputIntent": {
        "identifier": "ICC profile identifier",
        "url": "URL/to/ICC/profile"
    },
    "colorSpaceSettings": {
        "cmykIccProfile": { "uri": "URL/to/ICC/profile" },
        "targetColorSpace": "CMYK",
        "conversionEnabled": true
    }
}

You can also create a web version, which is smaller and in RGB:

// (No output intent required)
// Color space conversion settings
config.setColorSpaceSettings(new ColorSpaceSettings()
    // When converting to RGB the profile is used for accurate conversion from CMYK
    .setCmykIccProfile(new Resource().setUri("URL/to/ICC/profile"))
    // Not necessary to set in this case (default), but recommended
    .setTargetColorSpace(ColorSpace.RGB)
    // Enable conversion of CMYK colors and images to RGB
    .setConversionEnabled(true));
// Reduce image sizes by resampling and compression
config.setUserStyleSheets(new Resource().setContent(
    // downsample images that (in the final layout)
    // have a resolution of more than 200dpi
    "* { -ro-image-resampling: 200dpi; "
    // recompress all images to JPEG with a quality of 90%
      + "-ro-image-recompression: jpeg(90%) }"));
// (No output intent required)
// Color space conversion settings
config.ColorSpaceSettings = new ColorSpaceSettings {
    // When converting to RGB the profile is used for accurate conversion from CMYK
    CmykIccProfile = new Resource() { Uri = "URL/to/ICC/profile" },
    // Not necessary to set in this case (default), but recommended
    TargetColorSpace = ColorSpace.RGB,
    // Enable conversion of CMYK colors and images to RGB
    ConversionEnabled = true
};
// Reduce image sizes by resampling and compression
config.UserStyleSheets = new List<Resource>
{
    new Resource
    {
        // downsample images that (in the final layout)
        // have a resolution of more than 200dpi
        // recompress all images to JPEG with a quality of 90%
        Content = "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
    }
};
// (No output intent required)
// Color space conversion settings
config.colorSpaceSettings = {
    // When converting to RGB the profile is used for accurate conversion from CMYK
    cmykIccProfile: { uri: "URL/to/ICC/profile" },
    // Not necessary to set in this case (default), but recommended
    targetColorSpace: PDFreactor.ColorSpace.RGB,
    // Enable conversion of CMYK colors and images to RGB
    conversionEnabled: true
};
// Reduce image sizes by resampling and compression
config.userStyleSheets = [{
    // downsample images that (in the final layout)
    // have a resolution of more than 200dpi
    // recompress all images to JPEG with a quality of 90%
    content: "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
}];
// (No output intent required)
// Color space conversion settings
config.colorSpaceSettings = {
    // When converting to RGB the profile is used for accurate conversion from CMYK
    cmykIccProfile: { uri: "URL/to/ICC/profile" },
    // Not necessary to set in this case (default), but recommended
    targetColorSpace: PDFreactor.ColorSpace.RGB,
    // Enable conversion of CMYK colors and images to RGB
    conversionEnabled: true
};
// Reduce image sizes by resampling and compression
config.userStyleSheets = [{
    // downsample images that (in the final layout)
    // have a resolution of more than 200dpi
    // recompress all images to JPEG with a quality of 90%
    content: "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
}];
// (No output intent required)
// Color space conversion settings
$config["colorSpaceSettings"] = array(
    // When converting to RGB the profile is used for accurate conversion from CMYK
    "cmykIccProfile" => array("uri" => "URL/to/ICC/profile"),
    // Not necessary to set in this case (default), but recommended
    "targetColorSpace" => ColorSpace::RGB,
    // Enable conversion of CMYK colors and images to RGB
    "conversionEnabled" => true
);
// Reduce image sizes by resampling and compression
$config["userStyleSheets"] = array(
    array(
        // downsample images that (in the final layout)
        // have a resolution of more than 200dpi
        // recompress all images to JPEG with a quality of 90%
        "content" => "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
    )
);
# (No output intent required)
# Color space conversion settings
config['colorSpaceSettings'] = {
    # When converting to RGB the profile is used for accurate conversion from CMYK
    'cmykIccProfile': { 'uri': "URL/to/ICC/profile" },
    # Not necessary to set in this case (default), but recommended
    'targetColorSpace': PDFreactor.ColorSpace.RGB,
    # Enable conversion of CMYK colors and images to RGB
    'conversionEnabled': True
}
# Reduce image sizes by resampling and compression
config['userStyleSheets'] = [{
    # downsample images that (in the final layout)
    # have a resolution of more than 200dpi
    # recompress all images to JPEG with a quality of 90%
    'content': "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
}]
# (No output intent required)
# Color space conversion settings
config['colorSpaceSettings'] = {
    # When converting to RGB the profile is used for accurate conversion from CMYK
    cmykIccProfile: { uri: "URL/to/ICC/profile" },
    # Not necessary to set in this case (default), but recommended
    targetColorSpace: PDFreactor::ColorSpace::RGB,
    # Enable conversion of CMYK colors and images to RGB
    conversionEnabled: true
}
# Reduce image sizes by resampling and compression
config['userStyleSheets'] = [{
    # downsample images that (in the final layout)
    # have a resolution of more than 200dpi
    # recompress all images to JPEG with a quality of 90%
    content: "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
}]
# (No output intent required)
# Color space conversion settings
$config["colorSpaceSettings"] = {
    # When converting to RGB the profile is used for accurate conversion from CMYK
    "cmykIccProfile" => { "uri" => "URL/to/ICC/profile" },
    # Not necessary to set in this case (default), but recommended
    "targetColorSpace" => PDFreactor::ColorSpace->RGB,
    # Enable conversion of CMYK colors and images to RGB
    "conversionEnabled" => true
};
# Reduce image sizes by resampling and compression
$config["userStyleSheets"] = [{
    # downsample images that (in the final layout)
    # have a resolution of more than 200dpi
    # recompress all images to JPEG with a quality of 90%
    "content" => "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
}];
{
    "colorSpaceSettings": {
        "cmykIccProfile": { "uri": "URL/to/ICC/profile" },
        "targetColorSpace": "RGB",
        "conversionEnabled": true
    },
    "userStyleSheets": [{
        "content": "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
    }]
}
-C config.json

With the following config.json:

{
    "colorSpaceSettings": {
        "cmykIccProfile": { "uri": "URL/to/ICC/profile" },
        "targetColorSpace": "RGB",
        "conversionEnabled": true
    },
    "userStyleSheets": [{
        "content": "* { -ro-image-resampling: 200dpi; -ro-image-recompression: jpeg(90%) }"
    }]
}

If "cmykIccProfile" is not set, naive conversion, similar to the one of PDF viewers, is used.
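For orientation, a profile-less ("naive") conversion is commonly computed with simple arithmetic on the color components, as in the Python sketch below. This is an assumption for illustration: the manual does not state the exact formula PDFreactor uses, and the function names are hypothetical.

```python
def naive_cmyk_to_rgb(c, m, y, k):
    """Convert CMYK components in [0, 1] to 8-bit RGB without an ICC profile."""
    r = round(255 * (1 - c) * (1 - k))
    g = round(255 * (1 - m) * (1 - k))
    b = round(255 * (1 - y) * (1 - k))
    return r, g, b


def naive_rgb_to_cmyk(r, g, b):
    """Convert 8-bit RGB to CMYK components in [0, 1] without an ICC profile."""
    k = 1 - max(r, g, b) / 255
    if k == 1:
        # Pure black: avoid dividing by zero below.
        return 0.0, 0.0, 0.0, 1.0
    c = (1 - r / 255 - k) / (1 - k)
    m = (1 - g / 255 - k) / (1 - k)
    y = (1 - b / 255 - k) / (1 - k)
    return c, m, y, k


print(naive_cmyk_to_rgb(1, 0, 0, 0))
print(naive_rgb_to_cmyk(255, 0, 0))
```

Such a conversion ignores the characteristics of the actual printing condition, which is why an ICC profile is needed for accurate results.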

Print Dialog Prompt

PDFreactor can be configured to immediately display a print dialog when a PDF file created with PDFreactor is opened. To do so, the printDialogPrompt configuration property must be used:

config.setPrintDialogPrompt(true);
config.PrintDialogPrompt = true;
$config["printDialogPrompt"] = true;
config['printDialogPrompt'] = True
config['printDialogPrompt'] = true
config.printDialogPrompt = true;
config.printDialogPrompt = true;
$config["printDialogPrompt"] = true;
{ "printDialogPrompt": true }
--printDialogPrompt

PDF Compression

Using the configuration property fullCompression, PDF files can be generated with full compression, thus reducing the file size of the resulting PDF document.

Example usage:

config.setFullCompression(true);
config.FullCompression = true;
$config["fullCompression"] = true;
config['fullCompression'] = True
config['fullCompression'] = true
config.fullCompression = true;
config.fullCompression = true;
$config["fullCompression"] = true;
{ "fullCompression": true }
--fullCompression

Your PDF reader needs to support Adobe PDF version 1.5 in order to be able to display PDF documents created with full compression enabled.

This lossless compression generally has little impact on the size of images. However, it is possible to use proprietary CSS properties to significantly reduce the resolution and quality of images and thus the file size of the PDF. See and for more information.

Full compression also eliminates some inherent size limitations of the PDF format, see .

Encryption and Restrictions

PDFreactor can protect generated PDF documents via 40 or 128 bit encryption.

To encrypt the output PDF, set the encryption strength to a value other than ENCRYPTION_NONE:

config.setEncryption(Encryption.TYPE_128);
config.Encryption = Encryption.TYPE_128;
$config["encryption"] = Encryption::TYPE_128;
config['encryption'] = PDFreactor.Encryption.TYPE_128
config['encryption'] = PDFreactor::Encryption::TYPE_128
config.encryption = PDFreactor.Encryption.TYPE_128;
config.encryption = PDFreactor.Encryption.TYPE_128;
$config["encryption"] = PDFreactor::Encryption->TYPE_128;
{ "encryption": "TYPE_128" }
--encryption "TYPE_128"

When the PDF document is opened, the user has to supply the user password in order to view the content. When no user password is set, the PDF can be viewed by any user. In either case, certain restrictions are imposed. These can be suspended by supplying the owner password. You can set the passwords as follows:

config
    .setUserPassword("upasswd")
    .setOwnerPassword("opasswd");
config.UserPassword = "upasswd";
config.OwnerPassword = "opasswd";
$config["userPassword"] = "upasswd";
$config["ownerPassword"] = "opasswd";
config['userPassword'] = "upasswd"
config['ownerPassword'] = "opasswd"
config['userPassword'] = "upasswd"
config['ownerPassword'] = "opasswd"
config.userPassword = "upasswd";
config.ownerPassword = "opasswd";
config.userPassword = "upasswd";
config.ownerPassword = "opasswd";
$config["userPassword"] = "upasswd";
$config["ownerPassword"] = "opasswd";
{ "userPassword": "upasswd",
  "ownerPassword": "opasswd" }
--userPassword "upasswd" \
--ownerPassword "opasswd"

Though not recommended for security reasons, both passwords can be omitted. However, the owner password must be specified for certain postprocessing steps, e.g. for digital signing or merging.

By default, all restrictions are imposed on the PDF document. You can, however, exclude selected ones by using the following configuration properties:

List of configuration properties to disable restrictions

Property name           Allows ...
allowPrinting           printing
allowCopy               copying or otherwise extracting content
allowAnnotations        adding or modifying annotations and interactive form fields
allowModifyContents     modifying the content of the document
allowDegradedPrinting   printing (same as allowPrinting, however, with a limited resolution) (128 bit encryption only)
allowFillIn             filling in form fields (128 bit encryption only)
allowAssembly           inserting, removing and rotating pages and adding bookmarks (128 bit encryption only)
allowScreenReaders      extracting content for use by accessibility devices (128 bit encryption only)

Please refer to the API docs for further information.
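The interplay of encryption strength, passwords and permissions can be sketched in Python as follows. The helper name build_encryption_config is hypothetical, setting each permission property to true is an assumption about the JSON form, and "TYPE_40" is an assumed constant name for 40 bit encryption; only "TYPE_128" appears in the listings above.

```python
import json

# Permissions that, according to the table above, require 128 bit encryption.
ONLY_128_BIT = {"allowDegradedPrinting", "allowFillIn",
                "allowAssembly", "allowScreenReaders"}
ALL_PERMISSIONS = ONLY_128_BIT | {"allowPrinting", "allowCopy",
                                  "allowAnnotations", "allowModifyContents"}


def build_encryption_config(owner_password, user_password=None,
                            allow=(), encryption="TYPE_128"):
    """Assemble an encryption configuration (hypothetical helper)."""
    unknown = set(allow) - ALL_PERMISSIONS
    if unknown:
        raise ValueError(f"unknown permissions: {sorted(unknown)}")
    if encryption != "TYPE_128" and set(allow) & ONLY_128_BIT:
        raise ValueError("some permissions require 128 bit encryption")
    config = {"encryption": encryption, "ownerPassword": owner_password}
    if user_password is not None:
        config["userPassword"] = user_password
    for name in sorted(allow):
        # Assumption: each permission is a boolean configuration property.
        config[name] = True
    return config


cfg = build_encryption_config("opasswd", user_password="upasswd",
                              allow=["allowPrinting", "allowFillIn"])
print(json.dumps(cfg))
```

Since all restrictions are imposed by default, only the permissions that should be granted need to be listed.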

Viewer Preferences

You can configure the initial presentation of the document in the viewer by setting viewer preferences. Viewer preferences activate or deactivate certain options of the viewer; for example, they can hide the viewer's toolbar when the document is opened.

Note that these preferences are not enforced, i.e. if you set the HIDE_TOOLBAR preference, the user can still display the toolbar again when viewing the PDF. The preference only affects the default state of the toolbar when the document is opened.

Some viewer preferences also influence the default settings of the print dialog of the viewer.

You can set viewer preferences by using the configuration property viewerPreferences, e.g.:

config.setViewerPreferences(ViewerPreferences.PAGE_LAYOUT_SINGLE_PAGE,
    ViewerPreferences.DISPLAY_DOC_TITLE);
config.ViewerPreferences = new List<ViewerPreferences>
{
    ViewerPreferences.PAGE_LAYOUT_SINGLE_PAGE,
    ViewerPreferences.DISPLAY_DOC_TITLE
};
$config["viewerPreferences"] = array(
    ViewerPreferences::PAGE_LAYOUT_SINGLE_PAGE,
    ViewerPreferences::DISPLAY_DOC_TITLE
);
config['viewerPreferences'] = [
    PDFreactor.ViewerPreferences.PAGE_LAYOUT_SINGLE_PAGE,
    PDFreactor.ViewerPreferences.DISPLAY_DOC_TITLE
]
config['viewerPreferences'] = [
    PDFreactor::ViewerPreferences::PAGE_LAYOUT_SINGLE_PAGE,
    PDFreactor::ViewerPreferences::DISPLAY_DOC_TITLE
]
config.viewerPreferences = [
    PDFreactor.ViewerPreferences.PAGE_LAYOUT_SINGLE_PAGE,
    PDFreactor.ViewerPreferences.DISPLAY_DOC_TITLE
];
config.viewerPreferences = [
    PDFreactor.ViewerPreferences.PAGE_LAYOUT_SINGLE_PAGE,
    PDFreactor.ViewerPreferences.DISPLAY_DOC_TITLE
];
$config["viewerPreferences"] = [
    PDFreactor::ViewerPreferences->PAGE_LAYOUT_SINGLE_PAGE,
    PDFreactor::ViewerPreferences->DISPLAY_DOC_TITLE
];
{ "viewerPreferences": [ "PAGE_LAYOUT_SINGLE_PAGE", "DISPLAY_DOC_TITLE" ]}
--viewerPreferences "PAGE_LAYOUT_SINGLE_PAGE" "DISPLAY_DOC_TITLE"

PDFreactor supports the following viewer preferences:

List of Viewer Preferences
Viewer Preference Effect
PAGE_LAYOUT_SINGLE_PAGE Display one page at a time.
PAGE_LAYOUT_ONE_COLUMN Display the pages in one column.
PAGE_LAYOUT_TWO_COLUMN_LEFT Display the pages in two columns, with odd numbered pages on the left.
PAGE_LAYOUT_TWO_COLUMN_RIGHT Display the pages in two columns, with odd numbered pages on the right.
PAGE_LAYOUT_TWO_PAGE_LEFT Display two pages at a time, with odd numbered pages on the left.
PAGE_LAYOUT_TWO_PAGE_RIGHT Display two pages at a time, with odd numbered pages on the right.
PAGE_MODE_USE_NONE Show no panel on startup.
PAGE_MODE_USE_OUTLINES Show bookmarks panel on startup.
PAGE_MODE_USE_THUMBS Show thumbnail images panel on startup.
PAGE_MODE_FULLSCREEN Switch to full screen mode on startup.
PAGE_MODE_USE_OC Show optional content group panel on startup.
PAGE_MODE_USE_ATTACHMENTS Show attachments panel on startup.
HIDE_TOOLBAR Hide the viewer application's tool bars when the document is active.
HIDE_MENUBAR Hide the viewer application's menu bar when the document is active.
HIDE_WINDOW_UI Hide user interface elements in the document's window.
FIT_WINDOW Resize the document's window to fit the size of the first displayed page.
CENTER_WINDOW Position the document's window in the center of the screen.
DISPLAY_DOC_TITLE Display the document's title in the top bar.
NON_FULLSCREEN_PAGE_MODE_USE_NONE Show no panel on exiting full-screen mode. Has to be combined with PAGE_MODE_FULLSCREEN.
NON_FULLSCREEN_PAGE_MODE_USE_OUTLINES Show bookmarks panel on exiting full-screen mode. Has to be combined with PAGE_MODE_FULLSCREEN.
NON_FULLSCREEN_PAGE_MODE_USE_THUMBS Show thumbnail images panel on exiting full-screen mode. Has to be combined with PAGE_MODE_FULLSCREEN.
NON_FULLSCREEN_PAGE_MODE_USE_OC Show optional content group panel on exiting full-screen mode. Has to be combined with PAGE_MODE_FULLSCREEN.
DIRECTION_L2R Position pages in ascending order from left to right.
DIRECTION_R2L Position pages in ascending order from right to left.
PRINTSCALING_NONE Print dialog default setting: disabled scaling
PRINTSCALING_APPDEFAULT Print dialog default setting: set scaling to application default value
DUPLEX_SIMPLEX Print dialog default setting: simplex
DUPLEX_FLIP_SHORT_EDGE Print dialog default setting: duplex (short edge)
DUPLEX_FLIP_LONG_EDGE Print dialog default setting: duplex (long edge)
PICKTRAYBYPDFSIZE_FALSE Print dialog default setting: do not pick tray by PDF size
PICKTRAYBYPDFSIZE_TRUE Print dialog default setting: pick tray by PDF size

The PAGE_LAYOUT_ preferences are overridden by the @-ro-preferences properties and .
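
The rule from the table that the NON_FULLSCREEN_ preferences have to be combined with PAGE_MODE_FULLSCREEN can be checked before a configuration is sent to PDFreactor. The following Python sketch works on the plain preference names as they appear in the table. The function name check_viewer_preferences is hypothetical and not part of any PDFreactor client.

```python
# Prefix shared by the preferences that, according to the table,
# have to be combined with PAGE_MODE_FULLSCREEN.
NON_FULLSCREEN_PREFIX = "NON_FULLSCREEN_PAGE_MODE_"


def check_viewer_preferences(preferences):
    """Return a list of problems found in a list of preference names.

    Hypothetical helper for illustration; an empty list means that the
    one rule checked here is satisfied.
    """
    names = list(preferences)
    problems = []
    uses_non_fullscreen = any(
        name.startswith(NON_FULLSCREEN_PREFIX) for name in names)
    if uses_non_fullscreen and "PAGE_MODE_FULLSCREEN" not in names:
        problems.append("NON_FULLSCREEN_* requires PAGE_MODE_FULLSCREEN")
    return problems
```

Calling check_viewer_preferences(["NON_FULLSCREEN_PAGE_MODE_USE_THUMBS"]) reports one problem, while adding "PAGE_MODE_FULLSCREEN" to the list returns an empty list.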

Merging PDFs

A generated PDF can easily be merged with existing ones. To merge with a single PDF or multiple PDFs use the mergeDocuments configuration property that declares either URLs to or binary data of existing PDF files.

config.setMergeDocuments(
    new Resource().setUri("https://www.myserver.com/overlaid1.pdf"),
    new Resource().setData(pdfBytes));
config.MergeDocuments = new List<Resource>
{
    new Resource { Uri = "https://www.myserver.com/overlaid1.pdf" },
    new Resource { Data = pdfBytes }
};
$config["mergeDocuments"] = array(
    array("uri" => "https://www.myserver.com/overlaid1.pdf"),
    array("data" => $pdfBytesAsBase64)
);
config['mergeDocuments'] = [
    { 'uri': "https://www.myserver.com/overlaid1.pdf" },
    { 'data': pdfBytesAsBase64 }
]
config['mergeDocuments'] = [
    { uri: "https://www.myserver.com/overlaid1.pdf" },
    { data: pdfBytesAsBase64 }
]
config.mergeDocuments = [
    { uri: "https://www.myserver.com/overlaid1.pdf" },
    { data: pdfBytesAsBase64 }
];
config.mergeDocuments = [
    { uri: "https://www.myserver.com/overlaid1.pdf" },
    { data: pdfBytesAsBase64 }
];
$config["mergeDocuments"] = [
    { "uri" => "https://www.myserver.com/overlaid1.pdf" },
    { "data" => $pdfBytesAsBase64 }
];
{ "mergeDocuments": [
    { "uri": "https://www.myserver.com/overlaid1.pdf" },
    { "data": pdfBytesAsBase64 }
]}
-C config.json

With the following config.json:

{ "mergeDocuments": [
    { "uri": "https://www.myserver.com/overlaid1.pdf" },
    { "data": pdfBytesAsBase64 }
]}
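
The placeholder pdfBytesAsBase64 in the samples above stands for the content of a PDF encoded as a Base64 string. The following Python sketch shows one way to produce such a configuration, assuming the existing PDF is available as bytes. The helper name build_merge_config is hypothetical and not part of any PDFreactor client.

```python
import base64
import json


def build_merge_config(uris, pdf_blobs):
    """Build a configuration dict with one mergeDocuments entry per input.

    uris      -- iterable of URL strings pointing to existing PDFs
    pdf_blobs -- iterable of bytes objects holding existing PDFs
    """
    documents = [{"uri": uri} for uri in uris]
    for blob in pdf_blobs:
        # JSON cannot carry raw bytes, so the data is sent as Base64 text.
        documents.append({"data": base64.b64encode(blob).decode("ascii")})
    return {"mergeDocuments": documents}


# Placeholder bytes for illustration only, not a real PDF.
example_config = build_merge_config(
    ["https://www.myserver.com/overlaid1.pdf"],
    [b"%PDF-1.4 placeholder"])
example_json = json.dumps(example_config, indent=4)
```

The string in example_json has the same structure as the config.json shown above and could be written to a file for use with the -C option.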

Whether the generated PDF is appended or laid over the existing PDFs depends on the general type of merge:

  • Concatenation

  • Arrange

  • Overlay

Concatenation merges append the generated PDF before or after the existing ones. The following sample shows how to append the generated PDF after the existing one:

config
    .setMergeDocuments(
        new Resource().setUri("https://www.myserver.com/appendDoc.pdf"))
    .setMergeMode(MergeMode.APPEND);
config.MergeDocuments = new List<Resource>
{
    new Resource { Uri = "https://www.myserver.com/appendDoc.pdf" }
};
config.MergeMode = MergeMode.APPEND;
$config["mergeDocuments"] = array(
    array("uri" => "https://www.myserver.com/appendDoc.pdf")
);
$config["mergeMode"] = MergeMode::APPEND;
config['mergeDocuments'] = [
    { 'uri': "https://www.myserver.com/appendDoc.pdf" }
]
config['mergeMode'] = PDFreactor.MergeMode.APPEND
config['mergeDocuments'] = [
    { uri: "https://www.myserver.com/appendDoc.pdf" }
]
config['mergeMode'] = PDFreactor::MergeMode::APPEND
config.mergeDocuments = [
    { uri: "https://www.myserver.com/appendDoc.pdf" }
];
config.mergeMode = PDFreactor.MergeMode.APPEND;
config.mergeDocuments = [
    { uri: "https://www.myserver.com/appendDoc.pdf" }
];
config.mergeMode = PDFreactor.MergeMode.APPEND;
$config["mergeDocuments"] = [
    { "uri" => "https://www.myserver.com/appendDoc.pdf" }
];
$config["mergeMode"] = PDFreactor::MergeMode->APPEND;
{ "mergeDocuments": [
    { "uri": "https://www.myserver.com/appendDoc.pdf" }
], "mergeMode": "APPEND" }
-C config.json

With the following config.json:

{ "mergeDocuments": [
    { "uri": "https://www.myserver.com/appendDoc.pdf" }
], "mergeMode": "APPEND" }

To insert the generated PDF before the existing ones, use MergeMode.PREPEND.

Arrange inserts specified pages of PDFs into the generated PDF. This merge mode has to be combined with pageOrder (see ) in order to specify which page should be inserted where. The following sample shows how to insert the first page of an existing PDF after the second page of the generated one:

config
    .setMergeDocuments(
        new Resource().setUri("https://www.myserver.com/insertionDoc.pdf"))
    .setMergeMode(MergeMode.ARRANGE)
    .setPageOrder("1,1:1,2..-1");
config.MergeDocuments = new List<Resource>
{
    new Resource { Uri = "https://www.myserver.com/insertionDoc.pdf" }
};
config.MergeMode = MergeMode.ARRANGE;
config.PageOrder = "1,1:1,2..-1";
$config["mergeDocuments"] = array(
    array("uri" => "https://www.myserver.com/insertionDoc.pdf")
);
$config["mergeMode"] = MergeMode::ARRANGE;
$config["pageOrder"] = "1,1:1,2..-1";
config['mergeDocuments'] = [
    { 'uri': "https://www.myserver.com/insertionDoc.pdf" }
]
config['mergeMode'] = PDFreactor.MergeMode.ARRANGE
config['pageOrder'] = "1,1:1,2..-1"
config['mergeDocuments'] = [
    { uri: "https://www.myserver.com/insertionDoc.pdf" }
]
config['mergeMode'] = PDFreactor::MergeMode::ARRANGE
config['pageOrder'] = "1,1:1,2..-1"
config.mergeDocuments = [
    { uri: "https://www.myserver.com/insertionDoc.pdf" }
];
config.mergeMode = PDFreactor.MergeMode.ARRANGE;
config.pageOrder = "1,1:1,2..-1";
config.mergeDocuments = [
    { uri: "https://www.myserver.com/insertionDoc.pdf" }
];
config.mergeMode = PDFreactor.MergeMode.ARRANGE;
config.pageOrder = "1,1:1,2..-1";
$config["mergeDocuments"] = [
    { "uri" => "https://www.myserver.com/insertionDoc.pdf" }
];
$config["mergeMode"] = PDFreactor::MergeMode->ARRANGE;
$config["pageOrder"] = "1,1:1,2..-1";
{ "mergeDocuments": [
    { "uri": "https://www.myserver.com/insertionDoc.pdf" }
], "mergeMode": "ARRANGE",
   "pageOrder": "1,1:1,2..-1" }
-C config.json

With the following config.json:

{ "mergeDocuments": [
    { "uri": "https://www.myserver.com/insertionDoc.pdf" }
], "mergeMode": "ARRANGE",
   "pageOrder": "1,1:1,2..-1" }

More information on the syntax can be found at

Overlay merges add the generated PDF above or below existing PDFs. The following sample shows how to overlay an existing PDF:

config
    .setMergeDocuments(
        new Resource().setUri("https://www.myserver.com/overlaid.pdf"))
    .setMergeMode(MergeMode.OVERLAY);
config.MergeDocuments = new List<Resource>
{
    new Resource { Uri = "https://www.myserver.com/overlaid.pdf" }
};
config.MergeMode = MergeMode.OVERLAY;
$config["mergeDocuments"] = array(
    array("uri" => "https://www.myserver.com/overlaid.pdf")
);
$config["mergeMode"] = MergeMode::OVERLAY;
config['mergeDocuments'] = [
    { 'uri': "https://www.myserver.com/overlaid.pdf" }
]
config['mergeMode'] = PDFreactor.MergeMode.OVERLAY
config['mergeDocuments'] = [
    { uri: "https://www.myserver.com/overlaid.pdf" }
]
config['mergeMode'] = PDFreactor::MergeMode::OVERLAY
config.mergeDocuments = [
    { uri: "https://www.myserver.com/overlaid.pdf" }
];
config.mergeMode = PDFreactor.MergeMode.OVERLAY;
config.mergeDocuments = [
    { uri: "https://www.myserver.com/overlaid.pdf" }
];
config.mergeMode = PDFreactor.MergeMode.OVERLAY;
$config["mergeDocuments"] = [
    { "uri" => "https://www.myserver.com/overlaid.pdf" }
];
$config["mergeMode"] = PDFreactor::MergeMode->OVERLAY;
{ "mergeDocuments": [
    { "uri": "https://www.myserver.com/overlaid.pdf" }
], "mergeMode": "OVERLAY" }
-C config.json

With the following config.json:

{ "mergeDocuments": [
    { "uri": "https://www.myserver.com/overlaid.pdf" }
], "mergeMode": "OVERLAY" }

To add the generated PDF below the existing one use MergeMode.OVERLAY_BELOW.

PDFreactor can repeat the pages of PDFs that have fewer pages than other PDFs involved in the merge. The configuration property overlayRepeat offers different options to do this:

  • repeat only the last page

  • repeat all pages of the PDF

  • do not repeat any pages

  • trim to page count of the shorter document

In the following example, all pages are repeated:

config.setOverlayRepeat(OverlayRepeat.ALL_PAGES);
config.OverlayRepeat = OverlayRepeat.ALL_PAGES;
$config["overlayRepeat"] = OverlayRepeat::ALL_PAGES;
config['overlayRepeat'] = PDFreactor.OverlayRepeat.ALL_PAGES
config['overlayRepeat'] = PDFreactor::OverlayRepeat::ALL_PAGES
config.overlayRepeat = PDFreactor.OverlayRepeat.ALL_PAGES;
config.overlayRepeat = PDFreactor.OverlayRepeat.ALL_PAGES;
$config["overlayRepeat"] = PDFreactor::OverlayRepeat->ALL_PAGES;
{ "overlayRepeat": "ALL_PAGES" }
--overlayRepeat "ALL_PAGES"
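
The four overlayRepeat options listed above can be pictured with a small model. The following Python sketch is illustrative only and is not PDFreactor's implementation. The function name and the mode strings are hypothetical labels for the four options. For every page of the result it reports which page of the shorter PDF would be laid over it.

```python
def overlay_pages(long_count, short_count, mode):
    """Model which page of the shorter PDF lands on each result page.

    Returns a list with one entry per result page: the 1-based page
    number of the shorter PDF, or None if nothing is laid over it.
    The mode strings are hypothetical labels, not API constants.
    """
    if mode not in ("last_page", "all_pages", "none", "trim"):
        raise ValueError("unknown mode: " + mode)
    if short_count < 1 or long_count < short_count:
        raise ValueError("expected 1 <= short_count <= long_count")
    if mode == "trim":
        # The result is cut down to the page count of the shorter document.
        return list(range(1, short_count + 1))
    pages = []
    for index in range(long_count):
        if index < short_count:
            pages.append(index + 1)
        elif mode == "last_page":
            pages.append(short_count)
        elif mode == "all_pages":
            pages.append(index % short_count + 1)
        else:  # mode == "none"
            pages.append(None)
    return pages
```

For a five-page document merged with a two-page one, the model yields [1, 2, 2, 2, 2] for "last_page", [1, 2, 1, 2, 1] for "all_pages", [1, 2, None, None, None] for "none" and [1, 2] for "trim".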

The default merge behavior of PDFreactor is a concatenation after the pages of the existing PDFs.

Digital Signing

PDFreactor is able to sign the PDFs it creates. This makes it possible to validate the identity of the document's creator. A self-signed certificate may be used. Signing PDFs with PDFreactor requires a keystore file that contains the certificate.

The keystore type is required to be one of the following formats:

  • "pkcs12"

  • "jks"

To create a keystore from certificate(s) or read information of an existing keystore such as the keyAlias, the Oracle Keytool can be used.

PDFreactor supports various certificate types for signing a PDF, such as self-signed certificates. Please see the API documentation for details on the available signing modes.

To sign a PDF digitally use the configuration property signPDF:

config.setSignPDF(
new SignPDF()
    .setKeyAlias("keyAlias")
    .setKeystorePassword("keyStorePassword")
    .setKeystoreType(KeystoreType.JKS)
    .setKeystoreURL("http://myServer/Keystore.jks")
    .setSigningMode(SigningMode.SELF_SIGNED));
config.SignPDF = new SignPDF
{
    KeyAlias = "keyAlias",
    KeystorePassword = "keyStorePassword",
    KeystoreType = KeystoreType.JKS,
    KeystoreURL = "http://myServer/Keystore.jks",
    SigningMode = SigningMode.SELF_SIGNED
};
$config["signPDF"] = array(
    "keyAlias" => "keyAlias",
    "keystorePassword" => "keyStorePassword",
    "keystoreType" => KeystoreType::JKS,
    "keystoreURL" => "http://myServer/Keystore.jks",
    "signingMode" => SigningMode::SELF_SIGNED
);
config['signPDF'] = {
    'keyAlias': "keyAlias",
    'keystorePassword': "keyStorePassword",
    'keystoreType': PDFreactor.KeystoreType.JKS,
    'keystoreURL': "http://myServer/Keystore.jks",
    'signingMode': PDFreactor.SigningMode.SELF_SIGNED
}
config['signPDF'] = {
    keyAlias: "keyAlias",
    keystorePassword: "keyStorePassword",
    keystoreType: PDFreactor::KeystoreType::JKS,
    keystoreURL: "http://myServer/Keystore.jks",
    signingMode: PDFreactor::SigningMode::SELF_SIGNED
}
config.signPDF = {
    keyAlias: "keyAlias",
    keystorePassword: "keyStorePassword",
    keystoreType: PDFreactor.KeystoreType.JKS,
    keystoreURL: "http://myServer/Keystore.jks",
    signingMode: PDFreactor.SigningMode.SELF_SIGNED
};
config.signPDF = {
    keyAlias: "keyAlias",
    keystorePassword: "keyStorePassword",
    keystoreType: PDFreactor.KeystoreType.JKS,
    keystoreURL: "http://myServer/Keystore.jks",
    signingMode: PDFreactor.SigningMode.SELF_SIGNED
};
$config["signPDF"] = {
    "keyAlias" => "keyAlias",
    "keystorePassword" => "keyStorePassword",
    "keystoreType" => PDFreactor::KeystoreType->JKS,
    "keystoreURL" => "http://myServer/Keystore.jks",
    "signingMode" => PDFreactor::SigningMode->SELF_SIGNED
};
{ "signPDF": {
    "keyAlias": "keyAlias",
    "keystorePassword": "keyStorePassword",
    "keystoreType": "JKS",
    "keystoreURL": "http://myServer/Keystore.jks",
    "signingMode": "SELF_SIGNED"
}}
-C config.json

With the following config.json:

{ "signPDF": {
    "keyAlias": "keyAlias",
    "keystorePassword": "keyStorePassword",
    "keystoreType": "JKS",
    "keystoreURL": "http://myServer/Keystore.jks",
    "signingMode": "SELF_SIGNED"
}}

To specify the keystoreURL as file URL use the following syntax: file:///path/to/Keystore.jks
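
A file URL of this form can be built from an absolute path with the Python standard library. The following sketch assumes a POSIX-style absolute path. The helper name keystore_file_url is hypothetical.

```python
from urllib.parse import quote


def keystore_file_url(absolute_path):
    """Turn an absolute POSIX path into a file: URL string.

    Characters that are not allowed in URLs, such as spaces,
    are percent-encoded. Slashes are kept as path separators.
    """
    if not absolute_path.startswith("/"):
        raise ValueError("expected an absolute POSIX path")
    return "file://" + quote(absolute_path)
```

For the path /path/to/Keystore.jks this returns file:///path/to/Keystore.jks, which can then be used as the value of keystoreURL.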

If a PDF is signed via the VeriSign signing mode, a plugin for the PDF viewer is required to show the signature.

Font Embedding

By default, PDFreactor automatically embeds the required subsets of all fonts used in the document. This can be disabled using the configuration property disableFontEmbedding.

config.setDisableFontEmbedding(true);
config.DisableFontEmbedding = true;
$config["disableFontEmbedding"] = true;
config['disableFontEmbedding'] = True
config['disableFontEmbedding'] = true
config.disableFontEmbedding = true;
config.disableFontEmbedding = true;
$config["disableFontEmbedding"] = true;
{ "disableFontEmbedding": true }
--disableFontEmbedding

Doing so reduces the file size of the resulting PDF documents. However, these documents are likely to not look the same on all systems. Therefore this property should only be used when necessary.

Overprinting

Overprinting means that one color is printed on top of another color. For example, a background is printed completely, before the text is put on top. As this is a feature for printing it should be used with CMYK colors.

PDFreactor can set the values of the PDF graphics state parameters "overprint" and "overprint mode" via CSS. However, before the CSS properties have any effect, overprinting must first be enabled via the configuration property addOverprint:

config.setAddOverprint(true);
config.AddOverprint = true;
$config["addOverprint"] = true;
config['addOverprint'] = True
config['addOverprint'] = true
config.addOverprint = true;
config.addOverprint = true;
$config["addOverprint"] = true;
{ "addOverprint": true }
--addOverprint

Then using the styles -ro-pdf-overprint and -ro-pdf-overprint-content you can specify the overprint properties of elements and their content to either none (default), mode0 or mode1 (nonzero overprint mode).

-ro-pdf-overprint affects the entire element, while -ro-pdf-overprint-content only affects the content of the element (not its borders and backgrounds). In both cases the children of the element are affected entirely, unless overprint styles are applied to them as well.

The following example sets small text on solid background to overprint, without enabling overprinting for the background of either the paragraphs or the highlighting spans:

p.infobox {
    border: 1pt solid black;
    background-color: lightgrey;
    color: black;
    font-size: 8pt;
    -ro-pdf-overprint-content: mode1;
}
p.infobox span.marked {
    background-color: yellow;
    -ro-pdf-overprint: none;
    -ro-pdf-overprint-content: mode1;
}

For small text on a background, overprinting helps to avoid white lines around the text if the print registration is imperfect.

Attachments

As an alternative to linking to external URLs (see ), PDFreactor also allows embedding their content into the PDF.

Attachments can be defined via CSS, which can be enabled by the configuration property addAttachments:

config.setAddAttachments(true);
config.AddAttachments = true;
$config["addAttachments"] = true;
config['addAttachments'] = True
config['addAttachments'] = true
config.addAttachments = true;
config.addAttachments = true;
$config["addAttachments"] = true;
{ "addAttachments": true }
--addAttachments

The following styles can be used to specify attachments:

  • -ro-pdf-attachment-url:

    A URL pointing to the file to be embedded. This URL can be relative.

  • -ro-pdf-attachment-name:

    The file name associated with the attachment. It is recommended to specify the correct file extension. If this is not specified the name is derived from the URL.

  • -ro-pdf-attachment-description:

    The description of the attachment. If this is not specified the name is used.

  • :

    • element (default): The attachment is related to the area of the element. Viewers may show a marker near that area.

    • document: The file is attached to the document with no relation to the element.
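
The manual does not specify how the file name is derived from the URL when no name is given. One plausible reading is that the last path segment of the URL is used. The following Python sketch illustrates that reading only. The function name default_attachment_name is hypothetical and the actual behavior of PDFreactor may differ.

```python
import posixpath
from urllib.parse import unquote, urlparse


def default_attachment_name(url):
    """Return the last path segment of a URL as a candidate file name.

    Works for absolute and relative URLs. Query strings and fragments
    are ignored, and percent-encoded characters are decoded.
    """
    path = urlparse(url).path
    return unquote(posixpath.basename(path))
```

Under this assumption, the relative URL "../resources/0412/report.doc" would yield the name "report.doc". Specifying the name explicitly avoids relying on such a derivation.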

Attachments can be specified for specific elements as follows:

#downloadReport {
    -ro-pdf-attachment-url: "../resources/0412/report.doc";
    -ro-pdf-attachment-name: "report-2012-04.doc";
    -ro-pdf-attachment-description: "Report for April of 2012";
}

Strings can be read dynamically from the document using the CSS functions attr and content, which read specified attributes or the text content of the element, respectively. Using those, certain a elements can be changed from links to attachments:

.downloadReports a[href] {
    -ro-link: none;
    -ro-pdf-attachment-url: attr(href);
    -ro-pdf-attachment-description: content() " (" attr(href) ")";
}

Attachments can also be set via the configuration property attachments. This configuration property also allows specifying the content of the attachment as a byte array instead of a URL, so dynamically created data can be attached:

config.setAttachments(
    new Attachment()
        .setData("sample attachment text".getBytes())
        .setName("sample.txt")
        .setDescription("a dynamically created attachment containing text"),
    new Attachment()
        .setUrl("../resources/0412/report.doc")
        .setName("report-2012-04.doc")
        .setDescription("Report for April of 2012"));
config.Attachments = new List<Attachment>
{
    new Attachment
    {
        Data = sampleAttachmentTextBytes,
        Name = "sample.txt",
        Description = "a dynamically created attachment containing text"
    },
    new Attachment
    {
        Url = "../resources/0412/report.doc",
        Name = "report-2012-04.doc",
        Description = "Report for April of 2012"
    }
};
$config["attachments"] = array(
    array(
        "data" => $sampleAttachmentTextBytesAsBase64,
        "name" => "sample.txt",
        "description" => "a dynamically created attachment containing text"
    ),
    array(
        "url" => "../resources/0412/report.doc",
        "name" => "report-2012-04.doc",
        "description" => "Report for April of 2012"
    )
);
config['attachments'] = [
    {
        'data': sampleAttachmentTextBytesAsBase64,
        'name': "sample.txt",
        'description': "a dynamically created attachment containing text"
    },
    {
        'url': "../resources/0412/report.doc",
        'name': "report-2012-04.doc",
        'description': "Report for April of 2012"
    }
]
config['attachments'] = [
    {
        data: sampleAttachmentTextBytesAsBase64,
        name: "sample.txt",
        description: "a dynamically created attachment containing text"
    },
    {
        url: "../resources/0412/report.doc",
        name: "report-2012-04.doc",
        description: "Report for April of 2012"
    }
]
config.attachments = [
    {
        data: sampleAttachmentTextBytesAsBase64,
        name: "sample.txt",
        description: "a dynamically created attachment containing text"
    },
    {
        url: "../resources/0412/report.doc",
        name: "report-2012-04.doc",
        description: "Report for April of 2012"
    }
];
config.attachments = [
    {
        data: sampleAttachmentTextBytesAsBase64,
        name: "sample.txt",
        description: "a dynamically created attachment containing text"
    },
    {
        url: "../resources/0412/report.doc",
        name: "report-2012-04.doc",
        description: "Report for April of 2012"
    }
];
$config["attachments"] = [
    {
        "data" => $sampleAttachmentTextBytesAsBase64,
        "name" => "sample.txt",
        "description" => "a dynamically created attachment containing text"
    },
    {
        "url" => "../resources/0412/report.doc",
        "name" => "report-2012-04.doc",
        "description" => "Report for April of 2012"
    }
];
{ "attachments": [
    {
        "data": sampleAttachmentTextBytesAsBase64,
        "name": "sample.txt",
        "description": "a dynamically created attachment containing text"
    },
    {
        "url": "../resources/0412/report.doc",
        "name": "report-2012-04.doc",
        "description": "Report for April of 2012"
    }
]}
-C config.json

With the following config.json:

{ "attachments": [
    {
        "data": sampleAttachmentTextBytesAsBase64,
        "name": "sample.txt",
        "description": "a dynamically created attachment containing text"
    },
    {
        "url": "../resources/0412/report.doc",
        "name": "report-2012-04.doc",
        "description": "Report for April of 2012"
    }
]}

Attaching Debug Files

PDFreactor offers a number of debug files containing useful information about the conversion, e.g. logs. These can be attached to the PDF by specifying a special URL for the attachment. Please refer to for an overview of all available debug files. Note that some debug files might require additional configuration options, such as .

PDF Script

This chapter refers to Scripts added to the resulting PDFs, processed by the PDF-viewer. There are also:

Some PDF viewers (e.g. Adobe Reader) allow the execution of JavaScript, which has been added to the PDF. This way, the document can be changed and dynamic content can be added long after the conversion is complete. Of course the structure of the PDF is different from the HTML and addressing certain elements with PDF scripts has to be done differently.

Please note that support for PDF scripts is not widespread among PDF viewers.

PDFreactor offers two ways to add such scripts to the converted PDF. The first is the configuration property pdfScriptAction, which takes the script as a string and the event that should trigger it.

The supported events are:

  • open: These scripts are triggered when opening the PDF in a viewer.

  • close: These scripts are triggered when closing the PDF.

  • before save: These scripts are triggered just before the viewer saves the PDF.

  • after save: These scripts are triggered after the viewer has saved the PDF.

  • before print: These scripts are triggered just before the viewer prints the PDF.

  • after print: These scripts are triggered after the viewer has printed the PDF.

These PDF scripts must not be confused with the JavaScript that is executed while creating the PDF. PDF scripts basically use the JavaScript syntax, however, they are executed (if this feature is supported and enabled by the viewer application) at a completely different time, e.g. when opening the PDF.

The following PDF script will display a message prompt when the PDF is opened.

config.setPdfScriptAction(new PdfScriptAction()
    .setScript("app.alert('hello');")
    .setTriggerEvent(PdfScriptTriggerEvent.OPEN));
config.PdfScriptAction = new PdfScriptAction
{
    Script = "app.alert('hello');",
    TriggerEvent = PdfScriptTriggerEvent.OPEN
};
$config["pdfScriptAction"] = array(
    "script" => "app.alert('hello');",
    "triggerEvent" => PdfScriptTriggerEvent::OPEN
);
config['pdfScriptAction'] = {
    'script': "app.alert('hello');",
    'triggerEvent': PDFreactor.PdfScriptTriggerEvent.OPEN
}
config['pdfScriptAction'] = {
    script: "app.alert('hello');",
    triggerEvent: PDFreactor::PdfScriptTriggerEvent::OPEN
}
config.pdfScriptAction = {
    script: "app.alert('hello');",
    triggerEvent: PDFreactor.PdfScriptTriggerEvent.OPEN
};
config.pdfScriptAction = {
    script: "app.alert('hello');",
    triggerEvent: PDFreactor.PdfScriptTriggerEvent.OPEN
};
$config["pdfScriptAction"] = {
    "script" => "app.alert('hello');",
    "triggerEvent" => PDFreactor::PdfScriptTriggerEvent->OPEN
};
{ "pdfScriptAction": {
    "script": "app.alert('hello');",
    "triggerEvent": "OPEN"
}}
-C config.json

With the following config.json:

{ "pdfScriptAction": {
    "script": "app.alert('hello');",
    "triggerEvent": "OPEN"
}}

The second way to set scripts is by using the proprietary CSS property pdf-script-action. By using this property, one can define the PDF scripts in the original document. For more information on this property, please see .

Please note that PDF scripts set via the CSS property have a higher priority than those defined via the API.

For each trigger event there can be only one script. When setting scripts several times on the same event, only the last one set will be added to the PDF.
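
The two rules above, one script per trigger event and CSS taking priority over the API, can be modeled as follows. This Python sketch is an illustration under the assumption that "higher priority" means a CSS script replaces an API script for the same event. It is not PDFreactor's implementation, and the function name effective_scripts is hypothetical.

```python
def effective_scripts(api_actions, css_actions):
    """Model which script ends up in the PDF for each trigger event.

    Both arguments are lists of (event, script) pairs in the order
    in which they were set. Returns a dict mapping event to script.
    """
    result = {}
    # Within one source, a later script for an event replaces earlier ones.
    for event, script in api_actions:
        result[event] = script
    css = {}
    for event, script in css_actions:
        css[event] = script
    # Assumption: a script set via CSS replaces an API script for that event.
    result.update(css)
    return result
```

With two API scripts on the open event and one CSS script on the same event, only the CSS script remains for open, while API scripts on other events are kept.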

Preview Images

While most PDF viewers automatically generate page thumbnails to preview pages, PDFreactor can do this during the conversion and embed these preview images. This frees up PDF viewer resources and is especially useful for large documents. You can let PDFreactor create preview images with the addPreviewImages configuration property like this:

config.setAddPreviewImages(true);
config.AddPreviewImages = true;
$config["addPreviewImages"] = true;
config['addPreviewImages'] = True
config['addPreviewImages'] = true
config.addPreviewImages = true;
config.addPreviewImages = true;
$config["addPreviewImages"] = true;
{ "addPreviewImages": true }
--addPreviewImages

Custom XMP

When a conformance such as PDF/A, PDF/X or PDF/UA is used, or when certain other features are enabled, PDFreactor automatically creates an appropriate XMP and adds it to the generated PDF.

Custom XMPs can be loaded via content or uri. You also need to specify a priority, which can be HIGH (which means that the custom XMP replaces the one generated by PDFreactor) or LOW (which means that the custom XMP is only attached if PDFreactor did not generate one).

config.setXmp(new Xmp()
    .setPriority(XmpPriority.HIGH)
    .setUri("http://cdn/myXmp.xml"));
config.Xmp = new Xmp {
    Priority = XmpPriority.HIGH,
    Uri = "http://cdn/myXmp.xml"
};
$config["xmp"] = array(
    "priority" => XmpPriority::HIGH,
    "uri" => "http://cdn/myXmp.xml"
);
config['xmp'] = {
    'priority': PDFreactor.XmpPriority.HIGH,
    'uri': 'http://cdn/myXmp.xml'
}
config['xmp'] = {
    priority: PDFreactor::XmpPriority::HIGH,
    uri: 'http://cdn/myXmp.xml'
}
config.xmp = {
    priority: PDFreactor.XmpPriority.HIGH,
    uri: "http://cdn/myXmp.xml"
};
config.xmp = {
    priority: PDFreactor.XmpPriority.HIGH,
    uri: "http://cdn/myXmp.xml"
};
$config["xmp"] = {
    "priority" => PDFreactor::XmpPriority->HIGH,
    "uri" => "http://cdn/myXmp.xml"
};
{ "xmp": {
    "priority": "HIGH",
    "uri": "http://cdn/myXmp.xml"
}}
-C config.json

With the following config.json:

{ "xmp": {
    "priority": "HIGH",
    "uri": "http://cdn/myXmp.xml"
}}

When attaching a custom XMP with high priority (thus overriding the PDFreactor-generated XMP), conformance such as PDF/A cannot be guaranteed.

Image Output

In addition to PDF, PDFreactor, with the optional Raster Image Output, supports the following image output formats:

These can be selected using the configuration property outputFormat, e.g.:

config.setOutputFormat(new OutputFormat()
    .setType(OutputType.PNG)
    .setWidth(512)
    .setHeight(-1));
config.OutputFormat = new OutputFormat {
    Type = OutputType.PNG,
    Width = 512,
    Height = -1
};
$config["outputFormat"] = array(
    "type" => OutputType::PNG,
    "width" => 512,
    "height" => -1
);
config['outputFormat'] = {
    'type': PDFreactor.OutputType.PNG,
    'width': 512,
    'height': -1
}
config['outputFormat'] = {
    type: PDFreactor::OutputType::PNG,
    width: 512,
    height: -1
}
config.outputFormat = {
    type: PDFreactor.OutputType.PNG,
    width: 512,
    height: -1
};
config.outputFormat = {
    type: PDFreactor.OutputType.PNG,
    width: 512,
    height: -1
};
$config["outputFormat"] = {
    "type" => PDFreactor::OutputType->PNG,
    "width" => 512,
    "height" => -1
};
{ "outputFormat": {
    "type": "PNG",
    "width": 512,
    "height": -1
}}
-C config.json

With the following config.json:

{ "outputFormat": {
    "type": "PNG",
    "width": 512,
    "height": -1
}}

The latter two parameters set the width and height of the resulting images in pixels. If either of these is set to a value of less than 1, it is computed from the other value and the aspect ratio of the page.
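
The computation of a missing dimension from the aspect ratio of the page can be sketched as follows. This Python example is an illustration, not PDFreactor's implementation. The function name is hypothetical, and both the rounding and the error raised when neither dimension is given are assumptions.

```python
def resolve_image_size(width, height, page_width, page_height):
    """Fill in a missing pixel dimension from the page's aspect ratio.

    width, height           -- requested size in pixels, values < 1 mean "compute"
    page_width, page_height -- page size in any unit, only the ratio matters
    """
    if width < 1 and height < 1:
        # Assumption: the manual does not describe this case.
        raise ValueError("at least one of width and height must be >= 1")
    if width < 1:
        width = round(height * page_width / page_height)
    elif height < 1:
        height = round(width * page_height / page_width)
    return width, height
```

For an A4 page (210 by 297) and the sample values above, a width of 512 and a height of -1, this gives a height of 512 × 297 / 210 ≈ 724 pixels.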

for the media feature -ro-output-format, which allows setting styles specific for PDF or image output.

Selecting a page

All image output formats, except for the TIFF formats, create an image of a single page. By default, this is the first page. A different page can be selected using the configuration property pageOrder, e.g.:

config.setPageOrder("5");
config.PageOrder = "5";
$config["pageOrder"] = "5";
config['pageOrder'] = "5"
config['pageOrder'] = "5"
config.pageOrder = "5";
config.pageOrder = "5";
$config["pageOrder"] = "5";
{ "pageOrder": "5" }
--pageOrder "5"

Converting a Document Into Multiple Images

To convert a document into multiple images, set the multiImage parameter of your OutputFormat to true, e.g.:

config.setOutputFormat(new OutputFormat()
    .setType(OutputType.PNG)
    .setWidth(512)
    .setHeight(-1)
    .setMultiImage(true));
config.OutputFormat = new OutputFormat {
    Type = OutputType.PNG,
    Width = 512,
    Height = -1,
    MultiImage = true
};
$config["outputFormat"] = array(
    "type" => OutputType::PNG,
    "width" => 512,
    "height" => -1,
    "multiImage" => true
);
config['outputFormat'] = {
    'type': PDFreactor.OutputType.PNG,
    'width': 512,
    'height': -1,
    'multiImage': True
}
config['outputFormat'] = {
    type: PDFreactor::OutputType::PNG,
    width: 512,
    height: -1,
    multiImage: true
}
config.outputFormat = {
    type: PDFreactor.OutputType.PNG,
    width: 512,
    height: -1,
    multiImage: true
};
config.outputFormat = {
    type: PDFreactor.OutputType.PNG,
    width: 512,
    height: -1,
    multiImage: true
};
$config["outputFormat"] = {
    "type" => PDFreactor::OutputType->PNG,
    "width" => 512,
    "height" => -1,
    "multiImage" => true
};
{ "outputFormat": {
    "type": "PNG",
    "width": 512,
    "height": -1,
    "multiImage": true
}}
-C config.json

With the following config.json:

{ "outputFormat": {
    "type": "PNG",
    "width": 512,
    "height": -1,
    "multiImage": true
}}

The documentArray property of the Result object then returns an array of byte arrays, each containing an image representing one page of the document.
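
As a minimal sketch of handling such a result in Python, the helper below writes each page image to its own file. It assumes that the entries of documentArray arrive as Base64-encoded strings, which is an assumption about the client result rather than something stated above. The function name save_pages and the file name pattern are hypothetical and not part of the PDFreactor API.

```python
import base64
import os
import tempfile


def save_pages(document_array, directory, extension="png"):
    """Write each page image to its own file and return the file paths.

    Assumption: every entry is a Base64-encoded string holding one page image.
    """
    paths = []
    for number, encoded in enumerate(document_array, start=1):
        path = os.path.join(directory, f"page-{number}.{extension}")
        with open(path, "wb") as image_file:
            image_file.write(base64.b64decode(encoded))
        paths.append(path)
    return paths


# Usage with stand-in data instead of a real conversion result
fake_pages = [base64.b64encode(b"page one").decode("ascii"),
              base64.b64encode(b"page two").decode("ascii")]
with tempfile.TemporaryDirectory() as target:
    written = save_pages(fake_pages, target)
    names = [os.path.basename(p) for p in written]
```

In a real application the list would come from the conversion result instead of fake_pages.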

Continuous Output

The configuration property continuousOutput sets PDFreactor to continuous mode. In this mode each document is converted into one image. Also screen styles will be used and print styles will be ignored, resulting in a very browser-like look for the output image.

config.setContinuousOutput(new ContinuousOutput()
    .setWidth(1024)
    .setHeight(768));
config.ContinuousOutput = new ContinuousOutput {
    Width = 1024,
    Height = 768
};
$config["continuousOutput"] = array(
    "width" => 1024,
    "height" => 768
);
config['continuousOutput'] = {
    'width': 1024,
    'height': 768
}
config['continuousOutput'] = {
    width: 1024,
    height: 768
}
config.continuousOutput = {
    width: 1024,
    height: 768
};
config.continuousOutput = {
    width: 1024,
    height: 768
};
$config["continuousOutput"] = {
    "width" => 1024,
    "height" => 768
};
{ "continuousOutput": {
    "width": 1024,
    "height": 768
}}
-C config.json

With the following config.json:

{ "continuousOutput": {
    "width": 1024,
    "height": 768
}}

The first parameter sets the width of the layout, which has the same effect as the width of a browser window. This only changes the layout; the result is still scaled to the width specified by outputFormat.

The second parameter sets the height. This has the same effect as the height of a browser window, i.e. it will cut off the image or increase its height. Values of less than 1 cause the full height of the laid out document to be used.

Grayscale Image

PDFreactor can optionally output images that are entirely grayscale, i.e. that are composed exclusively of shades of gray and don't contain any other color. Such an output can be achieved using the forceGrayscaleImage configuration property like this:

config.setForceGrayscaleImage(true);
config.ForceGrayscaleImage = true;
$config["forceGrayscaleImage"] = true;
config['forceGrayscaleImage'] = True
config['forceGrayscaleImage'] = true
config.forceGrayscaleImage = true;
config.forceGrayscaleImage = true;
$config["forceGrayscaleImage"] = true;
{ "forceGrayscaleImage": true }
--forceGrayscaleImage

Grayscale output cannot be combined with transparency.

Layout Documents

This chapter provides information on how to lay out documents, while focusing on the differences of the paginated layout of PDFreactor, in contrast to the continuous layout of browsers.

The document layout mostly depends on CSS but there are PDFreactor configuration properties and JavaScript functionality that may also be of use to achieve the desired results. While the common CSS properties known from browsers are supported as well, they are not covered in this chapter. Therefore an understanding of basic CSS is required.

Pagination

PDFreactor renders HTML and XML documents on pages. The rules to achieve that are provided by CSS.

The document content is laid out page by page. Whenever there is no more space left on a page, PDFreactor automatically breaks text and boxes to the next page.

Basic page styles are provided for HTML. Page styles for XML documents need to be created based on the document's language.

Layout at Breaks

Boxes around or next to breaks are subject to minor adjustments depending on the situation:

Between Blocks

The top margin of the first block on a page or column is ignored, except for the first page or column and for breaks forced via CSS. This difference can be eliminated by setting the proprietary property to always or none to ensure this adjustment is performed in all or no cases, respectively.

A non-proprietary alternative, which also affects the layout of documents in browsers (especially relevant for multi-column layouts), is to explicitly set specific top margins to 0.

h1 {
    break-before: page;
    margin-top: 0;
}

div.multiColumn > *:first-child {
    margin-top: 0;
}

The bottom margin of the last block on a page or column is always ignored.

Inside Blocks

When a break occurs inside a block (e.g. between two lines of text in a paragraph) the block is split into two parts. There is no border, margin or padding at the bottom of the first part or the top of the second one. Setting the box-decoration-break property to clone forces the inclusion of these borders and paddings. This does not affect the margins.

Images

By default no breaks can occur inside images and other replaced elements. In cases when this is required the proprietary property can be set to the values auto or avoid to explicitly allow breaks inside block images. To avoid too small parts of images to be split-off at the beginning or end the and properties, multiplied by the computed , are taken into account.

Page Selectors

To create an individual page layout pages need to be selected with CSS. In principle it works the same way as selecting an element, but the selector is different.

To select all pages of the document, the @page rule is used instead of the usual element selector.

@page {
    margin: 1in;
}

:first, :left, :right and other page-specific pseudo-classes make it possible to style specific pages, such as the first page (e.g. for a cover page), or subsets of pages, such as left pages.

@page {
    margin: 0.5in;
}
@page:left {
    margin-right: 0.75in;
}
@page:right {
    margin-left: 0.75in;
}

Which pages are left or right can be specified via the @-ro-preferences property

Nth Page

It is possible to select any page by using the prefixed CSS3 pseudo-class :-ro-nth(). This pseudo-class takes a function of the form An+B, similar to the pseudo-class :nth-child().

A single page can be selected (e.g. :-ro-nth(3) selects the third page) or the function can be used to select multiple pages. For example, :-ro-nth(2n) selects every second page (i.e. even pages), while :-ro-nth(2n+1) selects the first and every other page (odd pages).
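
The matching rule can be sketched in a few lines of Python. This is only an illustration that assumes :-ro-nth() follows the same An+B semantics as :nth-child(), where n runs over the non-negative integers. The function name matches_nth is hypothetical.

```python
def matches_nth(a, b, page_number):
    """Return True if page_number equals a*n + b for some integer n >= 0."""
    if a == 0:
        return page_number == b
    difference = page_number - b
    return difference % a == 0 and difference // a >= 0


pages = range(1, 9)
third_page = [p for p in pages if matches_nth(0, 3, p)]   # :-ro-nth(3)
even_pages = [p for p in pages if matches_nth(2, 0, p)]   # :-ro-nth(2n)
odd_pages = [p for p in pages if matches_nth(2, 1, p)]    # :-ro-nth(2n+1)
```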

Note that the selected page number is independent of the page counter, which is used to display page numbers and which can be manipulated.

This pseudo-class can also be used in combination with page names. For more information see .

Last Page

As the counterpart to :first, there is the proprietary selector :-ro-last. It selects the last page of the document.

Please note that as the content of the last page is only known after its content has been computed, there can be situations where the last page is empty. This can happen if the styles that are applied to the last page influence the layout of the page content, e.g. changing the page margins.

Page Size & Orientation

The size and orientation of a page can be set with the size property. PDFreactor supports many different page sizes, see Appendix Supported Page Size Formats.

@page {
    size: letter portrait;
}

To set a page to landscape orientation, "portrait" is replaced by "landscape":

@page {
    size: letter landscape;
}

Instead of setting fixed page formats with a specified orientation it is also possible to set two length values. These then define page size and orientation.

@page {
    size: 4.25in 6.75in;
}

Named Pages

With named pages an element is able to create and appear on a special page that has a name. This name can be used as part of a page selector to attach additional style properties to all pages of that name.

To create a named page, an element receives the page property with a page name as identifier.

All HTML <table> elements have to appear on pages with the name pageName.

table {
    page: pageName;
}

A page break will be inserted before an element that has the page property set. Another page break will be inserted for the next element that defines a different page name (or none) to ensure the Named Page only contains elements that specify its name.

To attach styles to a named page, the page name is added to the @page rule. The page name is not a pseudo-class like :first for example. There is a space between @page and the page name, not a colon.

@page pageName {
    size: letter landscape;
}

Page Groups

When setting a page name, a page group of this name is created automatically. Compared to named pages, page groups are more flexible and can be used to select a certain page, e.g. the first page with a name instead of all pages with that name.

While each page can have only one name, it can belong to multiple page groups, thus allowing an author to nest special pages. This means that if an element sets a page name to 'A', that page belongs to a page group of the same name, but can also belong to a group named 'B', if that group was defined by a parent element.

The following sample applies page orientation and page background color to the same page, by using two page groups.

HTML:

<section>
    <table class="landscape"> ... </table>
</section>

CSS:

section {
    page: outerGroup;
}
.landscape {
    page: innerGroup;
}
/* Make all pages named 'outerGroup' lightblue */
@page :-ro-nth(n of outerGroup) {
    background-color: lightblue;
}
/* Make all pages named 'innerGroup' landscape */
@page :-ro-nth(n of innerGroup) {
    size: A4 landscape;
}

In contrast to named pages, it is possible to create a new group even if the page name did not change. To do so, two adjacent elements, both defining the same page name, have to be divided by a forced page break.

Another advantage of page groups is the possibility to select certain pages belonging to a group name. This is especially useful if the first page of a group should have different styles. To select the nth page of a group, the :-ro-nth(An+B of pageName) pseudo-class is used:

Select the first page of each page group with the name pageName.

@page :-ro-nth(1 of pageName) {
    background-color: lightgrey;
}

For more information on the syntax of the -ro-nth() pseudo class, please see .

Breaking Text

Text is broken whenever there is not enough space left, e.g. inside the line or on the page.

Automatic Hyphenation

Automatic Hyphenation allows breaking words in a way appropriate for the language of the word.

To use Automatic Hyphenation two requirements must be met:

  • The text to hyphenate requires a language set in the document.

  • The language set for the hyphenated text is supported by PDFreactor (see Appendix for more information).

The lang attribute in HTML or the xml:lang attribute in XML allow defining a language for the document and on individual elements, in case they deviate from the document language.

<html lang="en">
    ...
</html>

Hyphenation is enabled or disabled via CSS with the hyphens property:

Hyphenation enabled for an entire document except for paragraphs of the noHyphenation class.

html {
    hyphens: auto;
}
p.noHyphenation {
    hyphens: none;
}

In addition it is possible to specify the number of minimum letters before or after which text can be broken within a word. This is done with the and properties.

Widows & Orphans

If the last line of a paragraph is also the first line of a page it is called a widow.

If the first line of a paragraph is also the last line of a page it is called an orphan.

By default, PDFreactor avoids widows and orphans by adding a page break before the paragraph. This behavior can be changed with the CSS properties orphans and widows.

p {
    orphans: 2;
    widows: 2;
}

Changing the value to 1 will allow widows and orphans. Changing it to higher integer values will prevent even multiple line widows and orphans. (e.g.: orphans: 5 means that if the first 4 lines of a paragraph are the last 4 lines of a page these lines are considered an orphan.)
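
The effect of these two values can be illustrated with a small Python sketch. It assumes the standard CSS meaning of the properties: orphans is the minimum number of lines left at the bottom of a page, widows the minimum number carried to the top of the next page. It ignores every other layout constraint, such as the space actually available. The function name allowed_breaks is hypothetical.

```python
def allowed_breaks(line_count, orphans=2, widows=2):
    """Return after how many lines a paragraph may break across pages.

    A break after k lines leaves k lines at the bottom of the first page
    and line_count - k lines at the top of the next page.
    """
    return [k for k in range(1, line_count)
            if k >= orphans and line_count - k >= widows]


five_lines = allowed_breaks(5)                  # orphans: 2; widows: 2
three_lines = allowed_breaks(3)                 # too short to be split
relaxed = allowed_breaks(3, orphans=1, widows=1)
```

With the default value of 2 for both properties, a three-line paragraph has no permitted break position, so it has to move to the next page as a whole.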

Customizing Line Breaks

By default, the rules for breaking words are defined by the Unicode Standard (see Unicode Standard Annex #14, Unicode Line Breaking Algorithm, https://www.unicode.org/reports/tr14/). In certain situations, however, you may want to define specific break opportunities and forbid others. While this can be done using white-space and soft hyphens, PDFreactor provides a more convenient way for general rules. The proprietary property -ro-line-break-opportunity makes it possible to define precisely between which characters a break is allowed or forbidden.

Break opportunities are specified via regular expressions (regex), excluding lookaheads and lookbehinds. Though the syntax may look confusing to those unfamiliar with regex, it can describe any possible break opportunity. The property value is divided into up to three parts:

  1. normal: This optional identifier specifies that the default rules still apply. Thus the existing rules are only extended instead of being completely overridden.

  2. <whitelist>: These regular expressions describe where break opportunities should be added.

  3. <blacklist>: The blacklist is separated with a slash and describes where break opportunities should be removed. The blacklist is stronger than the whitelist and overrides it in the case of a conflict.

Both whitelist and blacklist describe the character matching using one or two strings. The first string describes the content that must come before the break opportunity, the second what must come after it. The second string can be omitted, while the first string can be an empty string if it is not needed. In regex terms, the first string is a lookbehind and the second is a lookahead, hence the slightly reduced syntax.

As the strings are specified in CSS, each backslash must be escaped. For example an escaped opening parenthesis would require two backslashes. One to escape the parenthesis for regex and one to escape the backslash for CSS: "\\("

A common use case of this property is when trying to break a file path or other technical strings where normal breaking rules are not applied.

Examples

Property Value                        Effect
normal / "[/-]"                       Prevent breaks after a slash or a minus.
normal "\\." "\\w"                    Allow a break after a dot followed by a word character.
normal "\\w" "\\(" / "" "\\(\\)"      Allow a break between a word character and a left parenthesis, except if a left and a right parenthesis follows.
normal "\\\\" / "" "\\\""             Allow a break after a backslash, except if it is followed by a quote.
"\\w" "(\\d){3,}"                     Only allow break between a word character and a number, if at least 3 digits follow.

Long and complex rules (especially those that include wildcards) can impact the performance depending on the length of the paragraphs, so it is best practice to apply the style only to the elements that may actually need them.
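
To make the whitelist and blacklist semantics more tangible, the following Python sketch evaluates one pair of each against a string and returns the positions where a break would be permitted. It illustrates the matching described above and is not PDFreactor's implementation. The default rules enabled by the normal keyword are not modeled, and the function names are hypothetical. Note that each double backslash in the CSS value corresponds to a single backslash in the regular expression.

```python
import re


def matching_positions(text, before, after=""):
    """Positions between characters where `before` matches the text ending at
    the position and `after` matches the text starting at it."""
    before_re = re.compile("(?:" + before + r")\Z")
    after_re = re.compile(after)
    return {i for i in range(1, len(text))
            if before_re.search(text[:i]) and after_re.match(text, i)}


def break_opportunities(text, whitelist, blacklist=None):
    """Apply a whitelist pair and an optional blacklist pair; the blacklist wins."""
    allowed = matching_positions(text, *whitelist)
    if blacklist is not None:
        allowed -= matching_positions(text, *blacklist)
    return sorted(allowed)


# CSS value: "\\." "\\w"
dots = break_opportunities("com.realobjects.pdfreactor", (r"\.", r"\w"))
# CSS value: "\\w" "\\(" / "" "\\(\\)"
calls = break_opportunities("foo(bar) baz()", (r"\w", r"\("), ("", r"\(\)"))
```

In the second call the position before "()" is whitelisted but then removed again by the blacklist, so only the position before "(bar)" remains.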

Generated Content

Generated content does not originate from the document. It is created by CSS during the rendering process and appears in the rendered result as if it was part of the document.

The pseudo-elements ::before and ::after are used to generate content before or after an element. The actual content is created with the content property.

Generated Text

To create generated text, set a String as value of the content property.

Generated Text on an HTML <div> element.

HTML:

<div>This is a note.</div>

CSS:

div::before {
    /* Adds the text "Note:" at the start of the element. */
    content: "Note:";

    padding-right: 0.1in;
    font-weight: bold;
}
div {
    border: 1px solid black;
    background-color: palegoldenrod;
    padding: 0.1in;
}

As a result, the <div> would look like this:

Note: This is a note.

Sometimes it is necessary to add an explicit line break to generated text. To create such a line break, a "\A " (an escaped line break character followed by a space) needs to be added to the String and the white-space property needs to be set to either pre, pre-wrap or pre-line.

div::before {
    content: "RealObjects\A PDFreactor";
    white-space: pre;
}

The result would look like this:

RealObjects
PDFreactor

Generated Images

A generated image can be created with the image's URL set as value of the content property.

h1::before {
    content: url("https://mydomain/pictures/image.svg");
}

Counters

Counters can be used to count elements or pages and then add the value of the Counter to generated text.

A Counter needs to be defined either with the counter-reset or the counter-increment property. Its value is read with the counter() function as value of the content property.

A common use case for Counters is numbered headings. The chapter heading of a document is intended to display a number in front of its text that increases with each chapter.

A chapter heading for HTML <h1> elements using Counters and Generated Text.

h1 {
    /* increases the counter "heading1" by 1 on each <h1> element */
    counter-increment: heading1 1;
}
h1::before {
    /* Adds the current value of "heading1" before the <h1> element's
       text as decimal number */
    content: counter(heading1, decimal)
}

Subchapter headings work the same way, with a simple addition. The number of each subchapter is intended to be reset whenever a new chapter begins. To restart numbering, the counter-reset property is used.

h1 {
    /* resets the value of counter "heading2" to 0 on every  <h1> element */
    counter-reset: heading2 0;
}
h2 {
    counter-increment: heading2 1;
}

h2::before {
    /* Shows the current value of "heading1" and "heading2", separated by a
       generated text ".", the value of "heading2" is shown as lower-case
       letter */
    content: counter(heading1, decimal) "." counter(heading2, lower-alpha)
}
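
The interplay of counter-increment and counter-reset in the two examples above can be traced with a short Python simulation. It is a simplified sketch that treats both counters as document-wide variables and ignores CSS counter scoping. The function names are hypothetical.

```python
def lower_alpha(value):
    """Format a positive integer as a, b, ..., z, aa, ab, ..."""
    letters = ""
    while value > 0:
        value, remainder = divmod(value - 1, 26)
        letters = chr(ord("a") + remainder) + letters
    return letters


def number_headings(tags):
    """Return the generated ::before text for a sequence of 'h1'/'h2' tags."""
    heading1 = heading2 = 0
    labels = []
    for tag in tags:
        if tag == "h1":
            heading1 += 1      # counter-increment: heading1 1
            heading2 = 0       # counter-reset: heading2 0
            labels.append(str(heading1))
        elif tag == "h2":
            heading2 += 1      # counter-increment: heading2 1
            labels.append(f"{heading1}.{lower_alpha(heading2)}")
    return labels


labels = number_headings(["h1", "h2", "h2", "h1", "h2"])
```

The second chapter restarts its subchapters at "a" because the h1 rule resets heading2.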

To define custom counter representations use the @counter-style rule. It is structured like this:

@counter-style <counter-style-name> {

    system:             <counter-system>;
    symbols:            <counter-symbols>;
    additive-symbols:   <additive-symbols>;
    negative:           <negative-symbol>;
    prefix:             <prefix>;
    suffix:             <suffix>;
    range:              <range>;
    pad:                <padding>;
    fallback:           <counter-style-name>;

}

To learn more on how to use the @counter-style rule, see the MDN Documentation.

Page Header & Footer

Header, Footer & Page Side Boxes

It is possible to add Generated Content to a page within the page margin. The page margin is the space between the document content and the edges of a sheet. It is defined on a page using the @page rule and the margin property.

Each page provides sixteen Page Margin Boxes that can display Generated Content much like a pseudo-element. To add Generated Content to a page, add a Page Margin Box declaration to an existing @page rule and set the Generated Content as the value of the content property as usual.

Figure: Page Margin Boxes

A Page Margin Box declaration consists of an "@" character followed by the name of the Page Margin Box.

@top-left {
    content: "RealObjects PDFreactor(R)";
}
@top-right {
    content: "copyright 2021 by RealObjects";
}

Running Elements

Running Elements are elements inside the document that are not rendered inside the document content but inside Page Margin Boxes.

They are useful whenever the content of a Page Margin Box needs to be more complex than Generated Content (e.g. a table) or parts of it need to be styled individually.

In case the document does not provide elements to use Running Elements and Generated Content does not suffice, it is possible to add elements to the document with JavaScript to be able to use Running Elements.

To create a Running Element, an element needs to be positioned as "running", using the running() function with an identifier for the element as argument. The function is set as value of the position property. This removes the element from the document content.

To display a Running Element inside a Page Margin Box, set the element() function as value of the content property. The argument of the function is the same identifier used in the running() function of the Running Element.

An HTML <footer> element at the start of the document used as page footer in all pages.

HTML:

<body>
    <footer>...</footer>
    ...
</body>

CSS:

footer {
    position: running(footerIdentifier);
}
@page {
    @bottom-center {
        content: element(footerIdentifier);
    }
}

The <footer> needs to be at the beginning of the HTML document to guarantee that it will appear on every page of the document.

The reason is that running elements stay anchored to the location where they would appear if they were not Running Elements.

The original position of the running element inside the document plays a key role when designing a document. It provides document designers with additional options.

First of all it is possible to have running elements of the same name, which makes it possible to change the content of a Page Margin Box over the course of the document.

Two Running Elements at the start of the document with the same name. The first appears on page one, the second on every page thereafter because it is the latest Running Element of the name.

HTML:

<body>
    <header id="titlePageHeader">...</header>
    <header id="pageHeader">...</header>
    <!-- first page content -->
    ...
    <!-- second page content -->
    ...
</body>

CSS:

#titlePageHeader, #pageHeader {
    position: running(headerIdentifier);
}
@page {
    @top-center {
        content: element(headerIdentifier);
    }
}

Second, it is possible for running elements to appear for the first time later in the document than on the first page.

An HTML <footer> element at the end of the document is used as Running Element. The page footer displays it in the last page only, as it is not available earlier.

HTML:

<body>
    ...
    <footer>...</footer>
</body>

CSS:

footer {
    position: running(footerIdentifier);
}
@page {
    @bottom-center {
        content: element(footerIdentifier);
    }
}
Notice how the style does not differ from the one used in the first example of this chapter. This shows how much influence the position of a Running Element inside the document has.

It is possible that more than one Running Element of the same name would anchor on the same page. Sometimes, it may not be the first Running Element on a page that should be used for that page. For that case it is possible to add one of these identifiers as second argument to the element() function:

  • start

    • Retrieves the latest Running Element of the name from previous pages.

    • If there is none, nothing is displayed.

  • first

    • Retrieves the first Running Element of the name on the page.

    • If there is none, it falls back to the behavior of start.

    • This is the default behavior if no argument is given.

  • last

    • Retrieves the last Running Element of the name on the page.

    • If there is none, it falls back to the behavior of start.

    • This keyword is useful in case a Running Element is displayed as footer throughout the document but the last page should receive a different Running Element, which is placed at the end of the document.

  • first-except

    • If a Running Element of the name is on the page, nothing is displayed.

    • If there is none, it falls back to the behavior of start.

    • This keyword is useful on chapter title pages where the chapter name is already displayed.
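
The four keywords can be summarized as a small decision procedure. The Python sketch below follows the descriptions in the list above. It assumes a simplified model in which each page is a list of the Running Elements of one name anchored on it, and it is not PDFreactor's implementation. The function name resolve_running_element is hypothetical.

```python
def resolve_running_element(pages, page_index, keyword="first"):
    """Pick the Running Element shown on a page, or None for nothing.

    `pages` holds, per page, the Running Elements of one name anchored on
    that page, in document order.
    """
    on_page = pages[page_index]
    # "start": the latest Running Element from previous pages, if any
    earlier = [e for page in pages[:page_index] for e in page]
    start = earlier[-1] if earlier else None

    if keyword == "start":
        return start
    if keyword == "first":
        return on_page[0] if on_page else start
    if keyword == "last":
        return on_page[-1] if on_page else start
    if keyword == "first-except":
        return None if on_page else start
    raise ValueError(f"unknown keyword: {keyword}")


# Page 1 anchors two headers, page 2 none, page 3 one
pages = [["titlePageHeader", "pageHeader"], [], ["chapterHeader"]]
shown = [resolve_running_element(pages, i) for i in range(len(pages))]
```

With the default keyword first, the second page falls back to the latest element from the first page, which matches the earlier example with two headers of the same name.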

If a Running Element or its contents define Generated Content that contains (or ) their value will be the same as if they were defined as content of the Page Margin Box the Running Element is used in.

Running Documents

In case Generated Content does not suffice and Running Elements are not an option, it is possible to use Running Documents inside Page Margin Boxes.

A Running Document is a String containing an HTML document or document fragment or a URL that references a document as argument of the xhtml() function.

The xhtml() function is a proprietary extension of CSS and will only work for RealObjects products.

/* document fragment */
content: xhtml("<table>…</table>");
/* complete document */
content: xhtml("<html><head>...</head><body>...</body></html>");
/* external document */
content: xhtml(url("header.html"));

The document is loaded independently inside the Page Margin Box but styles from the document are passed down to it. This can be an advantage as the same style is used throughout all documents. In some cases though this behavior is not desired as this style may break the layout of the document inside the Page Margin Box. To prevent passing down style the -ro-passdown-styles property is used.

When using the xhtml() function in non-HTML5 documents (e.g. XHTML inside the head in a <style> element) the entire CSS needs to be wrapped in an XML comment.

<!--
@page {
    @top-center {
        content: xhtml("<table>...</table>");
    }
}
-->

Running Documents have access to Counters and Named Strings from their embedding document and may display them, but cannot influence them.

Counters and Named Strings created inside Running Documents have no effect outside of the Running Document.

Generated Content for Pages

Additional features for Generated Content are available within Page Margin Boxes.

Page Counters

To add page numbers to documents, Page Counters are used. Page Counters work like regular counters, but are defined on pages and accessed in page margin boxes.

The default Page Counter is named "page" and automatically defined in HTML documents.

@page {
    @bottom-right {
        content: counter(page);
    }
}

For XML documents you can define the Page Counter as follows.

@page:first {
    counter-reset: page applicationValue("com/realobjects/pdfreactor/start-page-number");
}

Additionally there is the "pages" counter, which is always defined as the total number of pages of the laid out document.

content: "Page " counter(page) " of " counter(pages)

You can add an offset to the pages counter value (e.g. -1 to ignore the cover page) via the @-ro-preferences property .

Named Strings

Named Strings make it possible to store the text of an element and its Generated Content as a String for use in Page Margin Boxes.

A Named String is defined very similarly to a Counter and is used in a similar way. To create a Named String, the string-set property is used, which requires an identifier and a definition of the contents of the String. To read a Named String, the string() function is used as value of the content property.

A Named String "headingString" created from the heading's text with the function content() and read with the string() function from the page header:

h1 {
    string-set: headingString content(text);
}
@page {
    @top-left {
        content: string(headingString);
    }
}

The content of a named String is very flexible and can take a combination of Strings, counter() functions and Named String keywords.

/* Creates a Named String in the form of "Chapter [chapter number]: [chapter title]". */
h1 {
    string-set: headingString "Chapter " content(before) ": " content()
}
/* Retrieves the first letter of an address element, useful as part of a page header
    for a sorted list of addresses */
address {
    string-set: addressEntry content(first-letter);
}

When a Named String is set multiple times on the current page, the optional second parameter of the string() function, defaulting to first, specifies which one to use:

  • first: the first one

  • last: the last one

  • first-except: none, use empty string

  • start: the first one, if it is at the beginning of the page

If there is none on the current page (or, in case of start, none at its beginning), the last one before is used. If there is none, either, the default is the empty string.

Cross-references

A Cross-reference is a piece of text that references another location in the document in order to establish a thematic relationship to that location.

Although it is perfectly possible to add such references by hand, this approach is prone to error when creating and modifying the document. After a change the numbering and page numbers might not match the numbering from when the cross-reference was first defined. The same could happen to the reference text if it includes the chapter title.

To always keep the reference up-to-date with the referenced location, CSS provides the target-counter() and target-text() functions to retrieve the exact numbering, title or page number of the referenced location.

PDFreactor only resolves internal links referring to an anchor in the same input document, see the chapter for more information.

Counter Cross-references

The target-counter() function is used inside the content property the same way a counter() function would be used. It receives a URL to the referenced location and the name of the counter as identifier. It may receive an optional third argument to define the output style of the counter, just like the counter() function.

Cross-references created from an HTML hyperlink to a chapter heading with a numbering. The Cross-reference is declared with generated text and target-counter() functions to retrieve the page and chapter numbers.

HTML:

...
<p>For more information <a href="#chapter">see</a>.
...
<h1 id="chapter">Cross-references</h1>
...

CSS:

@page {
    @bottom-right {
        content: counter(page);
    }
}
h1 {
    counter-increment: chapterCounter;
}
h1::before {
    content: counter(chapterCounter, upper-roman);
}
a[href]::after {
    content: "Chapter " target-counter(attr(href url), chapterCounter, upper-roman)
                " on page " target-counter(attr(href url), page);
}

Assuming the referenced chapter would render on page 5 as the third chapter, the cross-reference would read:

For more information, see Chapter III on page 5.

Text Cross-references

The target-text() function is used inside the content property in a similar way as the target-counter() function is used. It receives a URL to the referenced location and takes one of these four keywords to specify the text to retrieve:

  • content - Retrieves the textual content of the element. This is the default keyword if no keyword is present.

  • first-letter - Retrieves the first letter of the element's textual content.

  • before - Retrieves the before Generated Content of an element.

  • after - Retrieves the after Generated Content of an element.

The following example shows a cross-reference that references a heading and shows its before Generated Content and text:

a[href]{
    content: target-text(attr(href url), before) " "
        target-text(attr(href url), content);
}

target-text() makes it easy to retrieve the before Generated Content of an element, which may include its numbering. This method does not require any knowledge about how this before Generated Content is created, but it also does not allow rebuilding it into something different.

If the before Generated Content of an element is "2.1" and the page header should be "Chapter 2, Section 1", the target-counter() function provides the necessary means to retrieve all the counters individually.

Footnotes

A footnote is a text note placed on the bottom of a page, a column or a region. It references a specific part of the main content of the document, giving further explanations or information about a citation. A footnote is marked by a defined symbol both in the main content of the page and in the footnote area at the bottom, to show which parts belong together.

For content that is required to have a footnote, the following style can be applied:

float: footnote;
The text content of the element that the style is applied to will appear in the footnote area at the bottom of the page. Content in the footnote area of pages can be styled via CSS using the footnote at-rule.

HTML (snippet)

<p>This is a CSS<span class="footnote">Cascading Style Sheet</span> generated footnote.</p>

CSS

.footnote {
    float: footnote;
}
@page {
    @footnote {
        border-top: solid black 1px;
    }
}

The pseudo-element ::-ro-footnote-area selects the footnote area of multi-column or region elements for styling.

.multiColumn {
    columns: 2;
}
.multiColumn::-ro-footnote-area {
    border-top: solid black 1px;
}

By defining a footnote, a footnote call is left behind in the main content. Its content and style can be influenced by the footnote-call pseudo-element.

For every footnote element, there is also a footnote-marker pseudo-element added. Usually this contains the same number or symbol as the footnote-call it belongs to.

.footnote::footnote-call {
    content: counter(footnote, decimal);
}
.footnote::footnote-marker {
    content: counter(footnote, decimal);
}

By default, the footnote counter is available and is automatically incremented for every element with the style:

float: footnote;

This counter numbers the footnotes sequentially for the entire document. To number footnotes on a per-page basis, the counter has to be reset on every page, using the following style:

@page {
    counter-reset: footnote;
}

PDFreactor currently does not support nested footnotes.

Normally, footnotes are laid out as block elements, which means that they are stacked on top of each other. When there are several short footnotes, it can make sense to place them next to each other, as if they were inline elements. This can be achieved by using the footnote-display property, which can either be set to block or inline:

.footnote {
    float: footnote;
    footnote-display: inline;
}

Continuation Markers

When content is fragmented, it can be helpful to show a hint that it is continued on the next page or that a fragment is a continuation of a previous one. PDFreactor allows specifying such continuation markers.

The markers are generated content and as such they are addressed with proprietary pseudo-elements. The pseudo-element ::-ro-before-break creates markers at the bottom or before a break (e.g. "Continued on next page"), while ::-ro-after-break creates markers at the top or after the break. These continuation markers are only created if there is a next or previous fragment, i.e. the respective element is split.

In the current implementation, the continuation markers can only be applied on block elements (display: block). This means that when intending to apply them on a table, they must be used on a container element that wraps the table:

HTML:

<div class="table">
    <table> ... </table>
</div>

CSS:

div.table::-ro-before-break {
    content: "Continued on page " -ro-counter-offset(page, 1);
    text-align: center;
    font-weight: bold;
}
div.table::-ro-after-break {
    content: "Continuation from page " -ro-counter-offset(page, -1);
    text-align: center;
    font-weight: bold;
}

To refer to the next or previous page number, the proprietary function -ro-counter-offset is used in this sample to offset the current page number by one.

Transforms

PDFreactor is capable of applying two-dimensional transformations to elements with the transform property, which makes moving, rotating and scaling document content possible.

Transforms do not have an impact on the document layout, e.g. content with scaled up size will not push other content away to prevent overlapping.

Reduce Table Width with Rotated Table Headers

Awesomizr is able to reduce the width of table headers with transforms.

The rotateTableHeaders() function transforms and rotates a table header in order to reduce its width. If there is no table header, the first row is converted to one.

This function takes two parameters:

  • table: The HTML node of the table

  • params: An object of optional parameters

Options
Key | Description | Default
angle | The angle in degrees at which the header will be rotated. Should be between -90 and 90. | 45
width | The width that the header cells should have after the transformation, e.g. "20pt". | "auto"
firstCol | Whether to prevent the first column from being transformed. | false
lastCol | Whether to prevent the last column from being transformed. | false
footer | Whether to create a <tfoot> element from the last row in the table. Has no effect if the table already contains a <tfoot>. | false
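
The following is a minimal sketch of how the options from the table might be passed. It assumes that the function is exposed as Awesomizr.rotateTableHeaders (the namespace is inferred from the other Awesomizr calls in this manual) and that awesomizr.js is loaded. The helper name rotateAllHeaders and the chosen option values are purely illustrative.

```javascript
// Hypothetical helper: rotates the headers of every table in a document.
// "awesomizr" is passed in so the sketch does not depend on a global object.
function rotateAllHeaders(doc, awesomizr, params) {
    var tables = doc.querySelectorAll("table");

    for (var i = 0; i < tables.length; i++) {
        // first argument: the table node, second argument: the options object
        awesomizr.rotateTableHeaders(tables[i], params);
    }

    return tables.length;
}

// Example options: steeper angle, fixed header width, first column untouched.
var headerOptions = { angle: 60, width: "20pt", firstCol: true };

// Typical use inside a document script (assumes awesomizr.js is loaded):
// rotateAllHeaders(document, Awesomizr, headerOptions);
```

Options that are left out of the object, such as lastCol and footer here, keep the defaults listed in the table.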

Multi-column Layout

The content of a document can be arranged in columns, with elements like images or titles spanning all columns if desired. Elements are laid out in a way similar to pages: text and boxes will break whenever no space is left in a column.

Multi-column layout is often used in print products like newspapers or magazines. It is intended to reduce the line width to make text easier to read.

The following box shows how text flows in a three-column layout. The paragraphs are numbered to better visualize the effect of multi-column layout.

[1] Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla in libero turpis. Sed sed dolor diam, eu dapibus quam. Quisque ut nulla purus, iaculis sollicitudin erat. Nullam dictum suscipit porttitor.

[2] Aliquam aliquam elementum elementum. Donec vel odio nec diam ullamcorper ultricies vel sit amet elit. Cras non aliquet lectus.

[3] Donec sollicitudin lorem placerat est condimentum rutrum. Fusce tempor cursus rutrum. Duis mattis mattis sapien. Phasellus tempus iaculis tellus sed vestibulum.

[4] Etiam faucibus consectetur augue, sit amet interdum elit dapibus at.

To create a multi-column layout inside an element, add either the property column-count or column-width, or both. By adding them, the element becomes a multi-column element.

The column-count property defines the number of columns inside the element. Any number greater than 1 will create a multi-column layout. The column-count property is especially useful if the actual width of the columns is not as important as the number of columns.

Alternatively, the column-width property can be used to specify a minimum width for the columns. Based on this width the final column count is computed, thus the resulting column widths are likely larger than the specified value.

/* define two columns */
div.twoColumns { column-count: 2; }

/* define columns with a width of 2in */
div.twoInchColumns { column-width: 2in; }
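
The relation between column-width and the resulting column count can be sketched as follows. The calculation follows the pseudo-algorithm of the CSS Multi-column Layout specification. That PDFreactor computes the values in exactly this way is an assumption, and the function name resolveColumns is illustrative.

```javascript
// Computes the used column count and width when only column-width is set.
// All lengths are plain numbers in the same unit (e.g. inches).
function resolveColumns(availableWidth, columnWidth, columnGap) {
    // as many columns of at least columnWidth as fit, but never fewer than one
    var count = Math.max(1, Math.floor((availableWidth + columnGap) / (columnWidth + columnGap)));
    // the remaining space is distributed, so columns may be wider than requested
    var width = (availableWidth + columnGap) / count - columnGap;

    return { count: count, width: width };
}

// A 7.25in wide element with column-width: 2in and a column gap of 0.25in
var columns = resolveColumns(7.25, 2, 0.25);
// columns.count is 3, columns.width is 2.25 (wider than the requested 2in)
```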

By default, PDFreactor aims to balance the content of columns so that the content of all individual columns is equally long, if possible. This has the effect of keeping the height of each column at the possible minimum, which automatically determines the height of the multi-column element as a whole if it wasn't defined by a height property or attribute.

This behavior can also be modified to fill columns sequentially. In this case, each column is filled until no more space is available in it, and the rest of the content is moved to the next column. With this behavior, a multi-column element whose height is not restricted will take up all the remaining space available on the page before content breaks to the next column.

The filling behavior can be controlled with the column-fill property:

/* sequential filling behavior */
div.sequentialFill { column-fill: auto; }

/* balanced filling behavior */
div.balancedFill { column-fill: balance; }

A height defined on the multi-column element is always used, regardless of the filling behavior. If there is less content than there is space inside the multi-column element, a balanced filling behavior will create shorter columns, leaving space at the bottom of the multi-column element. With sequential filling behavior there may not be enough content to fill all the columns, thus columns may be left empty. If there is more content than there is space inside the multi-column element, the multi-column element will create a page break and continue on the next page, at the first column.
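
The difference between the two filling behaviors can be illustrated with a strongly simplified model in which the content consists of equally high lines. This is only a sketch of the idea, not PDFreactor's actual balancing algorithm. The function fillColumns and its parameters are hypothetical.

```javascript
// Distributes a number of equally high lines over columns.
// maxLines: number of lines that fit into one column at the available height
function fillColumns(totalLines, columnCount, maxLines, mode) {
    // balance: columns are only as high as needed to share the content evenly
    // auto: every column is filled completely before the next one is used
    var perColumn = mode === "balance"
        ? Math.min(maxLines, Math.ceil(totalLines / columnCount))
        : maxLines;
    var columns = [];
    var remaining = totalLines;

    for (var i = 0; i < columnCount; i++) {
        var lines = Math.min(perColumn, remaining);
        columns.push(lines);
        remaining -= lines;
    }

    // "overflow" is the content that has to continue on the next page
    return { columns: columns, overflow: remaining };
}

var balanced = fillColumns(10, 3, 8, "balance"); // columns: [4, 4, 2]
var sequential = fillColumns(10, 3, 8, "auto");  // columns: [8, 2, 0]
```

In the sequential case the third column stays empty, while the balanced case produces shorter columns of similar length.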

Usually elements inside a multi-column element are laid out one after another in columns as defined by the filling behavior. Some elements, however, may require a certain behavior when inside columns.

There are elements that are required to span all columns inside the multi-column element instead of only one. Headings, pictures or tables are the most common examples. To have an element span all columns, the column-span property is used.

/* a heading that spans all columns */
h1 { column-span: all; }

/* a table in a single column */
table { column-span: none; }

To add some visual appeal to the multi-column element borders, backgrounds and padding can be used. Beside these standard styles multi-column elements can also receive additional styles for the space between columns.

To visually separate columns it is possible to define the gap width. Gaps can be considered as padding between columns. To define the gap width for a multi-column element the column-gap property is used.

/* a gap of 0.25in */
div.multiColumn { column-gap: 0.25in; }

In addition to the gap, a rule can be added between the columns as an additional visual aid for separating them. To define rules for a multi-column element, either the shorthand property column-rule or the individual properties column-rule-width, column-rule-style and column-rule-color can be used.

/* a solid black rule with 0.1in width*/
div.multiColumn {
    column-rule-width: 0.1in;
    column-rule-style: solid;
    column-rule-color: black;
}

/* the same definition as shorthand */
div.multiColumn { column-rule: 0.1in solid black; }

A multi-column layout with justified text looks best when the text is laid out with automatic hyphenation enabled.

Line Grids and Snapping

With CSS it is possible to align lines of text to invisible grids in the document. This greatly improves the readability of duplex prints and of documents with multi-column layouts. Lines remain at the same position on every page, thus keeping a vertical rhythm which is very beneficial to the reading experience.

The below images show how snapping to the line grid works and how it improves readability in a text with two columns (the line grid is visualized by the dotted lines).

[Figure: Lines not snapped]
[Figure: Lines snapped to grid]

Snapping to the grid can be enabled by using the CSS property -ro-line-snap. In addition to snapping to the baseline of the grid, it is also possible to snap line boxes to the center between two of the grid's lines. The latter may be beneficial for text that contains small and large font sizes because the space in the grid is used more efficiently.

/* snapping to baseline */
p {
    -ro-line-snap: baseline;
}

/* snapping between grid lines */
p {
    -ro-line-snap: contain;
}

Line grids are created automatically. Normally, one line grid is created for the root element on each page and is then used by all its block-level descendants. It is also possible to create a new line grid for a block using its own font and line height settings. This is very useful for multi-column containers as it might be undesirable for such a container to use its parent's grid. A new grid can be created with the following style declaration, using the CSS property -ro-line-grid:

div {
    -ro-line-grid: create;
}

When using Page Floats and line grids, make sure that top floated elements are also set to snap to the grid, otherwise they may push the text below them downwards, so that the lines are no longer aligned with the grid.

Also avoid mixing different line grid settings with page floats, as on each page only the last page float that snaps to a grid can be taken into account, so using different line grids may also lead to misaligned text.

Region Layout

Regions are containers for document content similar to pages or columns, but they can be positioned individually. In contrast to automatically created pages and columns, regions are based on block elements from the document, which presents them with more styling options.

Regions belong to a region chain that connects them and determines how their content flows from one to another. The content of a region chain is called the named flow, and elements can be added to a named flow to be displayed in regions.

[Figure: Regions. A named flow flows through a region chain.]

Adding Regions to Region Chains

Most block elements can be defined as a region. They are not required to be of the same size nor are they required to be the same node name.

To create a region from a block element, the -ro-flow-from property is used. It receives an identifier. A region chain contains all regions with the same identifier in document order. The identifier is also the name of the named flow these regions will display.

A region element will not have its subtree rendered. It either displays content from a named flow or nothing.

A chain of two regions defined for two HTML div elements with IDs "region1" and "region2".

#region1, #region2 {
    -ro-flow-from: regionChainName;
}

PDFreactor lays out content into regions and breaks text and boxes where no space is left. The number of regions inside a region chain is limited by the number of associated region elements, though, and it is possible that the content of a named flow occupies more space than is available inside the regions of a region chain. In that case, content from the named flow overflows the last region inside the region chain.

A region does not influence the style of the content it contains. No style is inherited from a region into the displayed named flow and style that would influence the content of an element has no effect on a region's content.

Adding Content to a Named Flow

The -ro-flow-into property adds document content to a named flow. The content may consist of content from one or more elements. Content assigned to a named flow is not rendered at its position inside the document but inside one of the regions inside the region chain.

The property receives an identifier which is the name of the named flow the content belongs to. An optional keyword defines what part of the styled element should be taken into the named flow:

  • element

    • Adds the entire element to the named flow.

    • If no keyword is given, this is the default behavior.

  • content

    • Adds the element's content to the named flow.

Creation of a named flow for two HTML <article> elements while an HTML <section> element from one of the articles is moved to a different named flow.

HTML:

<article>...</article>
<article>
    ...
    <section id="info">...</section>
</article>

CSS:

article {
    -ro-flow-into: articleNamedFlowName;
}
section#info {
    -ro-flow-into: infoNamedFlowName;
}

The content of a named flow may be rendered inside regions, but it still inherits style and computes its style the same way it would as if it did not appear inside a region.

Region Generated Content

A region element can have ::before and ::after generated content just like any other element. This generated content is rendered above or below the region's content and is not moved to the next region due to lack of space. Instead, the available space inside a region is reduced. If there is not enough space left, the region's content flows over.

Controlling Breaks

Although PDFreactor performs automatic breaks between boxes for pages, columns and regions, it is often necessary to add explicit breaks in certain situations, or to avoid breaks in order to keep content together. This chapter explains how both can be achieved.

PDFreactor provides styles for HTML that influence the break behavior for certain elements like headings. Break Styles for XML documents need to be created based on the document language.

Breaking Around Boxes

To manipulate the break behavior before and after boxes, the break-before and break-after properties are used. They provide keywords to force or avoid page, column and region breaks.

A manual page break before an HTML <h1> element, used to make a chapter start on top of a new page.

h1 {
    break-before: page;
}

A manual page break before an HTML <h1> element, that makes the chapter start on a right page.

h1 {
    break-before: right;
}

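This style creates a page break before the h1 and moves it to the next page. If this is a left page, another page break is performed to move it to a right page.

Avoiding a break after HTML heading elements, so that a heading is not separated from the content that follows it.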

h1, h2, h3, h4, h5, h6 {
    break-after: avoid;
}

PDFreactor also supports the CSS 2.1 properties page-break-before and page-break-after. They are resolved as shorthands for break-before and break-after.

Avoid Breaking Inside Boxes

To manipulate the break behavior inside a box, the break-inside property is used. It specifies whether breaking should be avoided inside the box or not.

Avoid breaks inside an HTML <div> element.

div {
    break-inside: avoid;
}

PDFreactor also accepts the CSS 2.1 property page-break-inside and resolves it as shorthand for break-inside.

Adaptive Page Breaks

Awesomizr is able to automatically add page breaks depending on the amount of space left below an element with the help of the applyAdaptivePageBreaks() function.

A possible use case is to prevent a new section from beginning at the bottom of a page.

The function also prevents large whitespace in situations where only a couple of sentences from a previous section are followed by a page break as the next section begins.

The function takes two parameters:

  • selector: (optional) The CSS selector for the elements that may require a new page break. Default value: "h1, h2"

  • threshold: (optional) If an element is below this percentage of the page height, a page break is inserted. Default value: 67
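
The effect of the threshold parameter can be sketched as a simple check. The sketch assumes that the position of an element is measured from the top of the page to the top of the element, which is not stated explicitly above. The function startsBelowThreshold is hypothetical, and the Awesomizr namespace of the commented call is inferred from the other Awesomizr calls in this manual.

```javascript
// Returns true if an element starts so far down the page that a page break
// should be inserted before it. Lengths are numbers in the same unit.
function startsBelowThreshold(elementTop, pageHeight, threshold) {
    if (threshold === undefined) {
        threshold = 67; // documented default, in percent of the page height
    }

    return (elementTop / pageHeight) * 100 > threshold;
}

// On a 297mm high page, a heading starting at 250mm is at about 84%
var breakNeeded = startsBelowThreshold(250, 297);   // true
var noBreakNeeded = startsBelowThreshold(100, 297); // false

// Assumed call with explicit arguments (requires awesomizr.js):
// Awesomizr.applyAdaptivePageBreaks("h1, h2, h3", 75);
```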

Page Floats

Page floats are an extension of regular floats. Regular floats are also called inline floats, as they float in inline direction, i.e. left and right. Page floats, on the other hand, can float up and down, to the top or the bottom of a fragmentation container (page, column or region). If there is not enough space left, the page float is moved to the next fragmentation container, e.g. to the top/bottom of the following page, while the rest of the content continues on the current page.

The current implementation of page floats does come with some limitations:

The CSS property float has been extended with the values -ro-top and -ro-bottom to enable page floats. To set the distance between two page floats of the same side or to the corresponding edge of the page, the new property -ro-float-offset can be used.

With this sample, elements with the class pageFloatTop float to the top of their page with a gap of 5 mm to the page margin areas at the top.

CSS:

.pageFloatTop {
    float: -ro-top;
    -ro-float-offset: 5mm;
}

When inline floats (left or right floated) precede the page float, the inline float may overflow the page. The same may happen in wrapped column flex items. Basically, when blocks of content are next to each other, problems can arise when the page float does not originate from the first one. This is a known issue that will be addressed in a future version.

Print Specific Page Properties

PDFreactor provides additional means for professional printing that allow specifying oversized pages, a bleed area and marks for cutting sheets to the final page size and for color proofing.

PDF Page Boxes

Page boxes are used to specify the page geometry, especially in professional printing. PDFreactor supports the TrimBox, MediaBox, BleedBox, CropBox and ArtBox.

TrimBox

The TrimBox defines the size of the final print result, the final page. It contains the page content.

The size of the TrimBox is defined in the same way as the page size, using the size property.

The value of the size property also automatically specifies the TrimBox.

size: A4 portrait;

MediaBox

In prepress, a printed document can contain more information than just the actual content in the TrimBox (e.g. bleed or printer marks).

As this information does not belong to the print result and instead needs to be printed around it, a print sheet larger than the print result is needed. The MediaBox defines the size of the print sheet.

Special oversize formats are used as print sheets in such cases. For DIN (Deutsches Institut für Normung, in English: German Institute for Standardization, Germany's ISO member body) standard-based formats, the matching oversize formats to the A series are the DIN-RA and DIN-SRA formats. An overview of all supported page sizes can be found in the Appendix.

The -ro-media-size property is used to specify the media size.

The document should be printed in DIN-SRA4 and the MediaBox is set to this size:

-ro-media-size: SRA4;

The MediaBox is the largest of the five page boxes and contains all others, each of which is smaller than or equal to it in size.

BleedBox

The BleedBox contains the TrimBox and is slightly larger. Content from the TrimBox may "bleed" into the BleedBox where it is still painted.

This is necessary for content that should reach to the edge of the print result. It prevents unprinted areas caused by imprecise trimming of the printed sheet.

The size of the BleedBox is defined as a width that is added to the TrimBox's size on all four sides. Common bleed values are 3-5 mm or 1/8 inch.

The bleed size can be set using the bleed property.

A bleed width of 3mm around the print result. The BleedBox determines its size from the TrimBox and this width.

bleed: 3mm;
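
How the sizes of the page boxes relate to each other can be checked with a small calculation. The following sketch uses the standard dimensions of A4 (210mm x 297mm) and SRA4 (225mm x 320mm). The helper functions bleedBox and fitsInto are illustrative and not part of any PDFreactor API.

```javascript
// Page sizes in millimeters: A4 and the oversize format SRA4
var A4 = { width: 210, height: 297 };
var SRA4 = { width: 225, height: 320 };

// The bleed width is added to the TrimBox on all four sides.
function bleedBox(trimBox, bleed) {
    return { width: trimBox.width + 2 * bleed, height: trimBox.height + 2 * bleed };
}

// A box fits if it is smaller than or equal to the surrounding box.
function fitsInto(inner, outer) {
    return inner.width <= outer.width && inner.height <= outer.height;
}

var bleedA4 = bleedBox(A4, 3);             // 216mm x 303mm
var fitsOnSheet = fitsInto(bleedA4, SRA4); // true
```

With a TrimBox of A4 and a bleed of 3mm, the BleedBox still fits into an SRA4 MediaBox, leaving room for marks around it.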

CropBox

The CropBox defines the complete area of the document that should be displayed on screen or printed out.

The crop size can be defined using the property -ro-crop-size.

The crop size can be set to a specific page size format (like setting the trim size) or to one of the page boxes. It is set to none by default.

The CropBox is set to match the MediaBox.

-ro-crop-size: media;

ArtBox

The ArtBox is used to define a specific area inside which the page's content is located.

Using the property -ro-art-size, the ArtBox can be set to a specific page size or one of the page boxes. It is set to none by default.

When generating a PDF/A conformant file (see PDF/A conformance), the ArtBox must not be defined, so the property must be set to none.

Printer Marks

Printer Marks are special pieces of information located outside of the actual print result. They are used to prove the correctness of the result in prepress printing and are placed outside the TrimBox.

Cutting out the print result of the print sheet is done inside the bleed area. Trim and bleed marks indicate where this area starts and ends. Both types of marks are displayed as hairlines in the corner of the print sheet.

Registration marks show whether the printer's colors are aligned properly. They are printed as crosshair-shaped objects located on each side of the print sheet.

Color bars show if the colors of the print result meet the expected result. They consist of a variety of colors that can be checked individually.

[Figure: Printer Marks]

The marks property is used to add crop, bleed and cross marks. The -ro-marks-width property sets the width of the mark lines, -ro-marks-color sets their color.

marks: crop -ro-bleed cross;
-ro-marks-width: 1pt;
-ro-marks-color: red;

Setting one of the -ro-colorbar-* properties defines where a color bar is added to the document.

-ro-colorbar-bottom-left: gradient-tint;
-ro-colorbar-bottom-right: progressive-color;

Positioning Content Relative to Page Boxes

The proprietary property -ro-position-origin allows content with "position: absolute" to be positioned relative to any page box of its page. This is especially useful to place decorative content relative to the bleed box, thus making it exceed the trim box so that bleed is properly utilized.

@page {
    bleed: 3mm; 
    -ro-media-size: SRA4 portrait; 
    -ro-crop-size: media;
    marks: trim bleed registration; 
    @top-right-corner {
        content: counter(page);
        vertical-align: top;
        text-align: right;
        padding: 1cm;
        position: absolute;
        top: 0;
        right: 0;
        width: 5cm;
        height: 5cm;
        background-image: radial-gradient(at 100% 0%, lightblue 0%, white 50%);
        -ro-position-origin: -ro-bleed-box; /* Position in the bleed box of the page */
    }
}

Leaders

Leaders are often used to draw a visual connection between an entry in a table of contents or similar structures, and a corresponding value.

In CSS, drawing leaders is accomplished via the leader() function. This function accepts the keywords dotted, solid and space, or a string that is repeated as the leader pattern.

A leader may be added using the content property, and can be combined freely with other generated content such as counters.

a.toc_ah2::after {
    content: leader(dotted) " " target-counter(attr(href url), page);
}

This may result in a display such as:

[Figure: Leaders]

Table of Contents

A table of contents can be inserted into a document to generate a list of the chapters or other important sections in the document.

This feature is usually used together with cross-references to add links to a table of contents. With the addition of counters, it can be complemented with the page numbers of the linked chapters.

The createTableOfContents() function provided by Awesomizr allows inserting a table of contents that is generated from given elements.

The table of contents requires certain styles to work properly. These styles are included in the awesomizr.css and should be added either to the document or by using the userStyleSheets configuration property of the PDFreactor API.

The table of contents is inserted as an HTML div element with the class ro-toc. This div can contain two headings (the document title and a heading for the table of contents with the class ro-toc-heading) and div elements with links to the pages and a class depending on the level of the referenced element (ro-toc-heading1, ro-toc-heading2, ...).

The level of a TOC entry is determined by the position of its selector in the elements array.

Awesomizr.createTableOfContents({elements: ["h1", "h2", "h3"]});

The function's optional parameter is an object with several options:

Values of the option object
Key | Type | Description | Default
insertiontarget | string | CSS selector string of the element where the table of contents should be inserted. | "body"
insertiontype | string | Specifies where exactly the table of contents should be inserted: "beforebegin" (before the element), "afterbegin" (as new first-child), "beforeend" (as new last-child), "afterend" (after the element). | "afterbegin"
elements | array | An array of the CSS selector strings of elements that should be added to the table of contents. Each TOC entry gets a class name based on the index of the corresponding selector in this array, e.g. by default the h2 entries have the class ro-toc-level-2. | ["h1", "h2"]
toctitle | string | The title of the table of contents. If an empty string is set, no title is inserted. | "Table of Contents"
disabledocumenttitle | boolean | Whether the document title should NOT be inserted before the table of contents. | false
text | function | By default, the text for the entries of the TOC is the text content of the element matching the specified selector. Alternatively, you can specify a function, the return value of which will be used as text for the respective entry. The element representing the entry is passed as an argument to the function. Returning false will skip the entry entirely and not include it in the TOC. | null

Simple table of contents created with Awesomizr based on HTML <h2> elements.

<link href="css/awesomizr.css" type="text/css" rel="stylesheet" />
<script type="text/javascript" src="awesomizr.js"></script>
...
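<body onload="Awesomizr.createTableOfContents({elements:['h2']});">

Table of contents based on HTML <img> elements, using the text option to take the entry text from the alt attribute of each image.

Awesomizr.createTableOfContents({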
    elements: ['img'],
    text: function(elem) {
        // the entry text should be the image's alt text
        var txt = elem.alt;

        if (txt) {
            return txt;
        }

        // skip images without alt text
        return false;
    }
});

Alternatively, a table of contents can also be created by using XSLT. Both approaches are demonstrated by the two versions of the "Children's Novel" sample. You can find them in the PDFreactor/samples/novel directory.

Shrink-to-Fit

For some documents parts of the content are too wide to fit the pages. In most cases this is caused by HTML documents containing fixed widths intended for screens, e.g. 1024px for the main container element.

While the best solution is adding a print style sheet to override the critical styles with relative widths, such content can also be shrunk automatically without changing the source document or adding specific styles.

There are two different shrink-to-fit functionalities available in PDFreactor, pixelsPerInchShrinkToFit and -ro-scale-content. These are non-exclusive and are applied in the aforementioned order.

Shrink-to-fit is only recommended when you need to force content into the boundaries of pages. For high-fidelity print output, these modes should not be used.

Scaling Pixel Lengths

The configuration property pixelsPerInchShrinkToFit adapts the "pixels per inch" value used for laying out the document, i.e. it only scales lengths set in px, including those set via HTML attributes.

Java:

config.setPixelsPerInchShrinkToFit(true);

.NET:

config.PixelsPerInchShrinkToFit = true;

PHP:

$config["pixelsPerInchShrinkToFit"] = true;

Python:

config['pixelsPerInchShrinkToFit'] = True

Ruby:

config['pixelsPerInchShrinkToFit'] = true

JavaScript and Node.js:

config.pixelsPerInchShrinkToFit = true;

Perl:

$config["pixelsPerInchShrinkToFit"] = true;

REST (JSON):

{ "pixelsPerInchShrinkToFit": true }

Python command line:

--pixelsPerInchShrinkToFit

The pixels per inch can also be specified manually.
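
What adapting the "pixels per inch" value means can be sketched with a small calculation. It is based on the CSS reference value of 96 pixels per inch. How PDFreactor actually determines the adapted value is not described here, so the function pixelsPerInchToFit is only an illustrative assumption.

```javascript
var DEFAULT_PPI = 96; // CSS reference value: 96 pixels per inch

// Returns the pixels-per-inch value at which content of the given pixel
// width just fits the available width in inches.
function pixelsPerInchToFit(contentWidthPx, availableWidthIn) {
    var required = contentWidthPx / availableWidthIn;
    // content that already fits keeps the default resolution
    return Math.max(DEFAULT_PPI, required);
}

// A 1024px wide container on a page with a 6.5in wide content area
var ppi = pixelsPerInchToFit(1024, 6.5); // about 157.5
```

A higher pixels per inch value makes every px length physically smaller, so the 1024px container fits the page while lengths in other units are unaffected.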

Scaling Down Page Content

Page content can be scaled down using the -ro-scale-content property. This property must be part of the @page rule affecting the first page:

@page {
    -ro-scale-content: auto;
}

For further details see the description of the -ro-scale-content property.

Page content scaling, if used, always applies to all pages equally. It cannot be applied to only a subset of pages or page groups.

Scaling Down Text

The proprietary value -ro-scale-down of the CSS property text-overflow allows visually scaling down paragraphs that overflow at the end of lines to automatically make their text fit their width.

Contrary to normal text overflow styles, -ro-scale-down also works with multi-line text. It then applies the scaling to all lines, so that the whole text content is scaled down equally. However, only overflow in inline (i.e. horizontal) direction is taken into account to determine whether scaling needs to be applied, not overflow in block (i.e. vertical) direction.
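
The rule that the whole text is scaled equally, based on horizontal overflow only, can be sketched as follows. The function scaleDownFactor and the use of precomputed line widths are assumptions made for illustration. They do not describe PDFreactor's internal implementation.

```javascript
// Returns the common scale factor for a paragraph whose lines have the
// given widths. Only horizontal overflow is considered.
function scaleDownFactor(lineWidths, containerWidth) {
    var widest = Math.max.apply(null, lineWidths);

    // no line overflows: the text is not scaled
    if (widest <= containerWidth) {
        return 1;
    }

    // the widest line determines the factor for all lines
    return containerWidth / widest;
}

// Three lines in a 200pt wide container, the second one overflows
var factor = scaleDownFactor([180, 250, 120], 200); // 0.8
```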

This feature is especially useful if you want to force text whose length you can't control into a pre-defined container, such as forcing user-supplied text into an existing form field.

Vertical Position

You can control the vertical position of the scaling effect with the CSS property align-content and its usual values: start, end, center, baseline (default) and stretch.

The value stretch won't scale down the text vertically, instead the text is skewed to keep its original height.

.scaleDown {
    /* Enable text scale down */
    text-overflow: -ro-scale-down;
    /* Make sure we only have a single line */
    white-space: nowrap;
    /* Don't scale vertically */
    align-content: stretch;
}

Page Order

Usually, the page order of a PDF is only determined by its input document. However, using the configuration property pageOrder, the page order can be set by providing a string parameter.

For ease of use the following constants are available for the most common cases of page orders:

Instead of using a predefined order the parameter can also provide a custom order as comma-separated list of page numbers and ranges:

Java:

config.setPageOrder("2,5,6*2,8..10,-1,-2");

.NET:

config.PageOrder = "2,5,6*2,8..10,-1,-2";

PHP:

$config["pageOrder"] = "2,5,6*2,8..10,-1,-2";

Python:

config['pageOrder'] = "2,5,6*2,8..10,-1,-2"

Ruby:

config['pageOrder'] = "2,5,6*2,8..10,-1,-2"

JavaScript and Node.js:

config.pageOrder = "2,5,6*2,8..10,-1,-2";

Perl:

$config["pageOrder"] = "2,5,6*2,8..10,-1,-2";

REST (JSON):

{ "pageOrder": "2,5,6*2,8..10,-1,-2" }

Python command line:

--pageOrder "2,5,6*2,8..10,-1,-2"

The page order shown above results in a PDF having the following page numbers from the original document, assuming it has 20 pages total: 2, 5, 6, 6, 8, 9, 10, 20, 19.
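
How such a string expands into page numbers can be sketched with a small function. It only covers the syntax elements visible in the example (single pages, ranges with "..", repetitions with "*" and negative numbers counting from the last page). That a descending range reverses the order is an assumption. The function expandPageOrder is illustrative and not part of the PDFreactor API.

```javascript
// Expands a page order string into a list of original page numbers.
function expandPageOrder(order, pageCount) {
    // negative numbers count from the end: -1 is the last page
    function resolve(text) {
        var n = parseInt(text, 10);
        return n < 0 ? pageCount + 1 + n : n;
    }

    var pages = [];

    order.split(",").forEach(function (token) {
        token = token.trim();

        if (token.indexOf("..") !== -1) {
            var bounds = token.split("..");
            var from = resolve(bounds[0]);
            var to = resolve(bounds[1]);
            var step = from <= to ? 1 : -1; // assumption: descending ranges reverse

            for (var p = from; p !== to + step; p += step) {
                pages.push(p);
            }
        } else if (token.indexOf("*") !== -1) {
            var parts = token.split("*");

            // "6*2" adds page 6 twice
            for (var i = 0; i < parseInt(parts[1], 10); i++) {
                pages.push(resolve(parts[0]));
            }
        } else {
            pages.push(resolve(token));
        }
    });

    return pages;
}

var pages = expandPageOrder("2,5,6*2,8..10,-1,-2", 20);
// pages is [2, 5, 6, 6, 8, 9, 10, 20, 19]
```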

On the Python command line instead of --pageOrder "-1..1" we recommend using --pageOrder="-1..1" to specify the page order.

Merge Mode Arrange

The syntax of page order is extended when setting the merge mode to MERGE_MODE_ARRANGE.

As usual, when a merge mode is selected, PDFreactor requires one or more merge PDFs to be set.

The merge documents specified with the mergeDocuments array are numbered, beginning with one for the first PDF (when specifying a single document, it is also addressed with "1").

To select pages from a merge document, first use its number followed by a colon, which then is followed by the page order syntax described above. Note that the converted document can be addressed using "0:", however, this is not necessary, as it is used by default if no document is specified.
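
The document prefix can be illustrated by splitting a page order into its entries. The following sketch only separates the document number from the page order part and does not interpret the latter. The function splitArrangeOrder is hypothetical.

```javascript
// Splits an arrange-mode page order into entries of document number and
// page order part. Entries without a prefix refer to the converted
// document, which has the number 0.
function splitArrangeOrder(order) {
    return order.split(",").map(function (token) {
        token = token.trim();
        var colon = token.indexOf(":");

        if (colon === -1) {
            return { document: 0, pages: token };
        }

        return {
            document: parseInt(token.substring(0, colon), 10),
            pages: token.substring(colon + 1)
        };
    });
}

var entries = splitArrangeOrder("1,1:1,2..-1");
// entry 1: document 0, pages "1"
// entry 2: document 1, pages "1"
// entry 3: document 0, pages "2..-1"
```

Read this way, the order "1,1:1,2..-1" places the first page of the converted document first, followed by the first page of the first merge document and then the remaining pages of the converted document.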

config
    .setMergeMode(MergeMode.ARRANGE)
    .setMergeDocuments(
        new Resource().setUri("https://www.myserver.com/insert1.pdf"),
        new Resource().setUri("https://www.myserver.com/insert2.pdf"))
    .setPageOrder("1, 1:1, 2:A, 2..-1, 1:2");
config.MergeMode = MergeMode.ARRANGE;
config.MergeDocuments = new List<Resource>
{
    new Resource { Uri = "https://www.myserver.com/insert1.pdf" },
    new Resource { Uri = "https://www.myserver.com/insert2.pdf" }
};
config.PageOrder = "1,1:1,2:A,2..-1,1:2";
$config["mergeMode"] = MergeMode::ARRANGE;
$config["mergeDocuments"] = array(
    array("uri" => "https://www.myserver.com/insert1.pdf"),
    array("uri" => "https://www.myserver.com/insert2.pdf")
);
$config["pageOrder"] = "1,1:1,2:A,2..-1,1:2";
config['mergeMode'] = PDFreactor.MergeMode.ARRANGE
config['mergeDocuments'] = [
    { 'uri': "https://www.myserver.com/insert1.pdf" },
    { 'uri': "https://www.myserver.com/insert2.pdf" }
]
config['pageOrder'] = "1,1:1,2:A,2..-1,1:2"
config['mergeMode'] = PDFreactor::MergeMode::ARRANGE
config['mergeDocuments'] = [
    { uri: "https://www.myserver.com/insert1.pdf" },
    { uri: "https://www.myserver.com/insert2.pdf" }
]
config['pageOrder'] = "1,1:1,2:A,2..-1,1:2"
config.mergeMode = PDFreactor.MergeMode.ARRANGE;
config.mergeDocuments = [
    { uri: "https://www.myserver.com/insert1.pdf" },
    { uri: "https://www.myserver.com/insert2.pdf" }
];
config.pageOrder = "1,1:1,2:A,2..-1,1:2";
config.mergeMode = PDFreactor.MergeMode.ARRANGE;
config.mergeDocuments = [
    { uri: "https://www.myserver.com/insert1.pdf" },
    { uri: "https://www.myserver.com/insert2.pdf" }
];
config.pageOrder = "1,1:1,2:A,2..-1,1:2";
$config["mergeMode"] = PDFreactor::MergeMode->ARRANGE;
$config["mergeDocuments"] = [
    { "uri" => "https://www.myserver.com/insert1.pdf" },
    { "uri" => "https://www.myserver.com/insert2.pdf" }
];
$config["pageOrder"] = "1,1:1,2:A,2..-1,1:2";
{ "mergeMode": "ARRANGE",
  "mergeDocuments": [
    { "uri": "https://www.myserver.com/insert1.pdf" },
    { "uri": "https://www.myserver.com/insert2.pdf" }
], "pageOrder": "1,1:1,2:A,2..-1,1:2" }
-C config.json

With the following config.json:

{ "mergeMode": "ARRANGE",
  "mergeDocuments": [
    { "uri": "https://www.myserver.com/insert1.pdf" },
    { "uri": "https://www.myserver.com/insert2.pdf" }
], "pageOrder": "1,1:1,2:A,2..-1,1:2" }

The order shown above would be:

  • "1" — Page 1 from the converted PDF.

  • "1:1" — Page 1 from insert1.pdf.

  • "2:A" — All Pages from insert2.pdf.

  • "2..-1" — Pages 2 to the last page from the converted PDF.

  • "1:2" — Page 2 from insert1.pdf.
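How the document prefix is read can be sketched in a few lines. This is a hedged illustration of the syntax only; the function name split_arrange_order is hypothetical and not part of any PDFreactor API. Each entry is split into a document number (0 for the converted document, which is the default) and the page selection that follows the colon.

```python
def split_arrange_order(order: str) -> list[tuple[int, str]]:
    """Split an arrange-mode page order into (document number, page spec) pairs."""
    entries = []
    for token in order.split(","):
        token = token.strip()
        if ":" in token:
            # "2:A" addresses merge document 2, page selection "A".
            document, pages = token.split(":", 1)
            entries.append((int(document), pages))
        else:
            # No prefix: the converted document, i.e. document 0.
            entries.append((0, token))
    return entries


print(split_arrange_order("1, 1:1, 2:A, 2..-1, 1:2"))
# [(0, '1'), (1, '1'), (2, 'A'), (0, '2..-1'), (1, '2')]
```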

Pages Per Sheet

Instead of containing only one page of the input document per PDF page, multiple pages of the input document can be displayed on one sheet.

The pages will be arranged in a grid on the sheet. The number of columns and rows of this grid are user-defined.

To utilize Pages Per Sheet use the configuration property pagesPerSheetProperties.

The properties rows and cols define the corresponding number of pages that get laid out on a single page. Their values are required. The values for sheetSize, sheetMargin and spacing can be set as CSS width values. direction defines in which way the single pages are ordered.

There are the following options to set a direction:

config.setPagesPerSheetProperties(new PagesPerSheetProperties()
    .setCols(2)
    .setRows(2)
    .setSheetSize("A4 landscape")
    .setSheetMargin("2.5cm")
    .setSpacing("2cm")
    .setDirection(PagesPerSheetDirection.RIGHT_UP));
config.PagesPerSheetProperties = new PagesPerSheetProperties
{
    Cols = 2,
    Rows = 2,
    SheetSize = "A4 landscape",
    SheetMargin = "2.5cm",
    Spacing = "2cm",
    Direction = PagesPerSheetDirection.RIGHT_UP
};
$config["pagesPerSheetProperties"] = array(
    "cols" => 2,
    "rows" => 2,
    "sheetSize" => "A4 landscape",
    "sheetMargin" => "2.5cm",
    "spacing" => "2cm",
    "direction" => PagesPerSheetDirection::RIGHT_UP
);
config['pagesPerSheetProperties'] = {
    'cols': 2,
    'rows': 2,
    'sheetSize': "A4 landscape",
    'sheetMargin': "2.5cm",
    'spacing': "2cm",
    'direction': PDFreactor.PagesPerSheetDirection.RIGHT_UP
}
config['pagesPerSheetProperties'] = {
    cols: 2,
    rows: 2,
    sheetSize: "A4 landscape",
    sheetMargin: "2.5cm",
    spacing: "2cm",
    direction: PDFreactor::PagesPerSheetDirection::RIGHT_UP
}
config.pagesPerSheetProperties = {
    cols: 2,
    rows: 2,
    sheetSize: "A4 landscape",
    sheetMargin: "2.5cm",
    spacing: "2cm",
    direction: PDFreactor.PagesPerSheetDirection.RIGHT_UP
};
config.pagesPerSheetProperties = {
    cols: 2,
    rows: 2,
    sheetSize: "A4 landscape",
    sheetMargin: "2.5cm",
    spacing: "2cm",
    direction: PDFreactor.PagesPerSheetDirection.RIGHT_UP
};
$config["pagesPerSheetProperties"] = {
    "cols" => 2,
    "rows" => 2,
    "sheetSize" => "A4 landscape",
    "sheetMargin" => "2.5cm",
    "spacing" => "2cm",
    "direction" => PDFreactor::PagesPerSheetDirection->RIGHT_UP
}
{ "pagesPerSheetProperties": {
    "cols": 2,
    "rows": 2,
    "sheetSize": "A4 landscape",
    "sheetMargin": "2.5cm",
    "spacing": "2cm",
    "direction": "RIGHT_UP"
}}
-C config.json

With the following config.json:

{ "pagesPerSheetProperties": {
    "cols": 2,
    "rows": 2,
    "sheetSize": "A4 landscape",
    "sheetMargin": "2.5cm",
    "spacing": "2cm",
    "direction": "RIGHT_UP"
}}
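To estimate how many sheets a document will need, the grid arithmetic can be sketched as below. This is a simplified illustration under the assumption that sheets are filled in document order; the function name group_pages_into_sheets is hypothetical, and the position of each page inside the grid (which depends on the direction setting) is not modeled.

```python
import math


def group_pages_into_sheets(page_count: int, cols: int, rows: int) -> list[list[int]]:
    """Group 1-based page numbers into sheets holding cols * rows pages each."""
    per_sheet = cols * rows
    sheet_count = math.ceil(page_count / per_sheet)
    return [
        # The last sheet may be only partially filled.
        list(range(index * per_sheet + 1, min((index + 1) * per_sheet, page_count) + 1))
        for index in range(sheet_count)
    ]


print(group_pages_into_sheets(10, cols=2, rows=2))
# [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10]]
```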

Booklet

A Booklet is a set of folded pages meant to be read like a book. PDFreactor supports creating Booklets by combining the Page Order functionality with the Pages Per Sheet feature.

It orders the pages in booklet or rtl booklet page order and places two of these pages on each sheet, rotated by 90 degrees and side-to-side.

The configuration property bookletMode allows configuring the page size and margins of the container page as well as choosing between the default booklet page order and a reversed order:

config.setBookletMode(new BookletMode()
    .setSheetSize("A4 landscape")
    .setSheetMargin("1cm")
    .setRtl(false));
config.BookletMode = new BookletMode
{
    SheetSize = "A4 landscape",
    SheetMargin = "1cm",
    Rtl = false
};
$config["bookletMode"] = array(
    "sheetSize" => "A4 landscape",
    "sheetMargin" => "1cm",
    "rtl" => false
);
config['bookletMode'] = {
    'sheetSize': "A4 landscape",
    'sheetMargin': "1cm",
    'rtl': False
}
config['bookletMode'] = {
    sheetSize: "A4 landscape",
    sheetMargin: "1cm",
    rtl: false
}
config.bookletMode = {
    sheetSize: "A4 landscape",
    sheetMargin: "1cm",
    rtl: false
};
config.bookletMode = {
    sheetSize: "A4 landscape",
    sheetMargin: "1cm",
    rtl: false
};
$config["bookletMode"] = {
    "sheetSize" => "A4 landscape",
    "sheetMargin" => "1cm",
    "rtl" => false
}
{ "bookletMode": {
    "sheetSize": "A4 landscape",
    "sheetMargin": "1cm",
    "rtl": false
}}
-C config.json

With the following config.json:

{ "bookletMode": {
    "sheetSize": "A4 landscape",
    "sheetMargin": "1cm",
    "rtl": false
}}
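For orientation, the conventional saddle-stitch imposition that a booklet page order is based on can be sketched as follows. This is an assumption-laden illustration, not PDFreactor's implementation: the exact order PDFreactor produces may differ, the function name booklet_order is hypothetical, and treating rtl as a swap of the two pages on each sheet side is a guess. Blank filler pages are represented as None.

```python
def booklet_order(page_count: int, rtl: bool = False) -> list:
    """Return pages in conventional booklet order, two per sheet side."""
    # A folded sheet carries four pages, so pad to a multiple of four.
    padded = -(-page_count // 4) * 4

    def page(number: int):
        return number if number <= page_count else None

    order = []
    first, last = 1, padded
    while first < last:
        # Front side: outermost pair; back side: the next pair inwards.
        for left, right in ((last, first), (first + 1, last - 1)):
            pair = (right, left) if rtl else (left, right)
            order.extend(page(number) for number in pair)
        first += 2
        last -= 2
    return order


print(booklet_order(8))
# [8, 1, 2, 7, 6, 3, 4, 5]
```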

Pixels per Inch

By default, lengths specified in pixels (i.e. via the CSS unit px or HTML attributes) are converted to physical lengths at a rate of 96 pixels per inch. With the configuration property pixelsPerInch this can be changed, e.g.:

config.setPixelsPerInch(120);
config.PixelsPerInch = 120;
$config["pixelsPerInch"] = 120;
config['pixelsPerInch'] = 120
config['pixelsPerInch'] = 120
config.pixelsPerInch = 120;
config.pixelsPerInch = 120;
$config["pixelsPerInch"] = 120;
{ "pixelsPerInch": 120 }
--pixelsPerInch 120

Increasing the pixels per inch can be used to shrink documents that would be too wide for pages due to fixed widths originally intended for screens.

Finding the optimum value can be automated using shrink to fit.
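The underlying arithmetic is simple enough to sketch by hand. The helper names below are hypothetical and the example widths are made up for illustration; the snippet only shows how a pixel length maps to a physical length and which pixels-per-inch value would make a fixed pixel width fit a given physical width.

```python
MM_PER_INCH = 25.4


def px_to_mm(px: float, pixels_per_inch: float = 96) -> float:
    """Convert a pixel length to millimeters at the given rate."""
    return px / pixels_per_inch * MM_PER_INCH


def min_pixels_per_inch(content_width_px: float, available_width_mm: float) -> float:
    """Smallest rate at which content_width_px fits into available_width_mm."""
    return content_width_px * MM_PER_INCH / available_width_mm


# A 960px wide element is 254mm wide at the default rate of 96,
# but only 203.2mm wide at 120 pixels per inch.
print(px_to_mm(960), px_to_mm(960, 120))

# Rate needed to fit a 1024px layout into a 170mm wide content area.
print(min_pixels_per_inch(1024, 170))
```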

Internationalization

Languages

PDFreactor supports Unicode and includes default fonts for various non-Latin languages. See the Fonts chapter for more information on the included fonts and on how to add additional fonts.

You can specify a language for the whole document either by using the HTML lang attribute or by specifying a default in the API:

<html lang="de-DE">
config.setDocumentDefaultLanguage("de-DE");
config.DocumentDefaultLanguage = "de-DE";
$config["documentDefaultLanguage"] = "de-DE";
config['documentDefaultLanguage'] = "de-DE"
config['documentDefaultLanguage'] = "de-DE"
config.documentDefaultLanguage = "de-DE";
config.documentDefaultLanguage = "de-DE";
$config["documentDefaultLanguage"] = "de-DE";
{ "documentDefaultLanguage": "de-DE" }
--documentDefaultLanguage "de-DE"

The specified language will be used for automatic hyphenation of text and also conveys important information to screen readers when reading accessible PDFs. Specifying the document language is required when producing accessible PDFs; otherwise PDFreactor may use "en-US" as the default.

Counters and list item markers can also be displayed in numerous languages and writing systems. See for all supported styles.

lang attributes can also be used to change the language for parts of the document.

Right-to-Left

PDFreactor analyzes the document to handle both left-to-right and right-to-left text correctly.

The base direction of the document defaults to left-to-right. You can set it to right-to-left by specifying the dir attribute on the root element as in the following example:

<html dir="rtl">

You can also override the base direction specifically for certain elements via the property direction:

div.english {
  direction: ltr;
}

You can override the implicit text direction by combining direction with the property unicode-bidi:

span.forcertl {
  unicode-bidi: bidi-override;
  direction: rtl;
}

Text Direction Dependent Layouts

Using "logical" properties and values, as opposed to the common "physical" ones, allows layouts based on the text direction, instead of fixed "left" and "right" sides. They are mapped to physical sides based on the value of the direction property, which may be ltr (left-to-right, default) or rtl (right-to-left).

The "International Sample" document in the PDFreactor package demonstrates the usage of these properties and values. It can be found in the PDFreactor/samples/i18n directory.

The following tables list the direction dependent logical properties and values as well as the resulting physical ones for both left-to-right and right-to-left direction:

Logical Properties
Property                    LTR                          RTL
padding-inline              padding-left padding-right   padding-right padding-left
padding-inline-start        padding-left                 padding-right
padding-inline-end          padding-right                padding-left
border-inline-start         border-left                  border-right
border-inline-end           border-right                 border-left
border-inline-start-color   border-left-color            border-right-color
border-inline-end-color     border-right-color           border-left-color
border-inline-start-style   border-left-style            border-right-style
border-inline-end-style     border-right-style           border-left-style
border-inline-start-width   border-left-width            border-right-width
border-inline-end-width     border-right-width           border-left-width
border-start-start-radius   border-top-left-radius       border-top-right-radius
border-start-end-radius     border-top-right-radius      border-top-left-radius
border-end-start-radius     border-bottom-left-radius    border-bottom-right-radius
border-end-end-radius       border-bottom-right-radius   border-bottom-left-radius
margin-inline               margin-left margin-right     margin-right margin-left
margin-inline-start         margin-left                  margin-right
margin-inline-end           margin-right                 margin-left
inset-inline                left right                   right left
inset-inline-start          left                         right
inset-inline-end            right                        left

New Logical Values for float and clear
Value                       LTR                          RTL
inline-start                left                         right
inline-end                  right                        left
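The mapping in these tables is mechanical, which the following sketch illustrates. It is a simplified model with a hypothetical function name (to_physical) that covers only the -inline-start and -inline-end longhands and the float and clear values; the two-value shorthands and the border radius properties are deliberately left out.

```python
def to_physical(name: str, direction: str = "ltr") -> str:
    """Map a logical property or value name to its physical counterpart."""
    start, end = ("left", "right") if direction == "ltr" else ("right", "left")
    physical = name.replace("inline-start", start).replace("inline-end", end)
    # inset-inline-start and inset-inline-end map to the plain offsets.
    return physical.removeprefix("inset-")


print(to_physical("margin-inline-start", "ltr"))  # margin-left
print(to_physical("margin-inline-start", "rtl"))  # margin-right
print(to_physical("inline-end", "rtl"))           # left
```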

Media Queries

Media Types

Media Queries are a CSS3 extension of media types. Media types make it possible to define styles that are only applied if the device or application displaying the document accepts the specified type. For example, the following media rule will only be applied if the device accepts the media type print (which PDFreactor does):

@media print {
    p {
        background-color: transparent;
    }
}

If the styles of a certain media type have to be applied, but that media type is not accepted by PDFreactor (e.g. @media screen), the required media types can be set via API:

config.setMediaTypes("screen", "projection", "print");
config.MediaTypes = new List<string> { "screen", "projection", "print" };
$config["mediaTypes"] = array("screen", "projection", "print");
config['mediaTypes'] = [ "screen", "projection", "print" ]
config['mediaTypes'] = [ "screen", "projection", "print" ]
config.mediaTypes = [ "screen", "projection", "print" ];
config.mediaTypes = [ "screen", "projection", "print" ];
$config["mediaTypes"] = [ "screen", "projection", "print" ];
{ "mediaTypes": [ "screen", "projection", "print" ]}
--mediaTypes "screen" "projection" "print"

This example sets the three media types screen, projection and print, thereby overriding PDFreactor's default types.

CSS that should only be used by PDFreactor can either be added via the API or, if it depends on the specific document, be placed in the document using the proprietary media type -ro-pdfreactor.

For example the following rule disables the page background color only if the document is used by PDFreactor:

@media -ro-pdfreactor {
    @page {
        background-color: transparent;
    }
}

Media Features

Media Queries allow styles to depend on certain device features such as the width and height of the viewport. As they extend media types, they may start with one type, which can be followed by media features, each linked with the keyword and.

Media features describe certain device properties, are always enclosed by parentheses and resemble CSS properties. Additionally, most features may be prefixed with min- or max- in order to express "greater or equal to" and "less or equal to" relationships to their value.

@media print and (max-device-width: 1024px) {
    ...
}

The styles of this media rule are only applied if the device width is 1024px or less.
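The effect of the min- and max- prefixes can be sketched as a small comparison function. This is an illustration of the semantics only, with a hypothetical function name (feature_matches) and with all values assumed to be plain pixel numbers; it is not how PDFreactor evaluates media queries internally.

```python
def feature_matches(query_feature: str, query_value_px: float, device_values_px: dict) -> bool:
    """Evaluate one length-based media feature against the device values."""
    if query_feature.startswith("min-"):
        # min-: the device value must be greater than or equal to the query value.
        return device_values_px[query_feature[4:]] >= query_value_px
    if query_feature.startswith("max-"):
        # max-: the device value must be less than or equal to the query value.
        return device_values_px[query_feature[4:]] <= query_value_px
    return device_values_px[query_feature] == query_value_px


device = {"device-width": 1024}
print(feature_matches("max-device-width", 1024, device))  # True
print(feature_matches("min-device-width", 1280, device))  # False
```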

The device properties for conversions can be set using the API:

config.setMediaFeatureValues(new MediaFeatureValue()
    .setMediaFeature(MediaFeature.DEVICE_WIDTH)
    .setValue("1024px"));
config.MediaFeatureValues = new MediaFeatureValue
{
    MediaFeature = MediaFeature.DEVICE_WIDTH,
    Value = "1024px"
};
$config["mediaFeatureValues"] = array(
    "mediaFeature" => MediaFeature::DEVICE_WIDTH,
    "value" => "1024px"
);
config['mediaFeatureValues'] = {
    'mediaFeature': PDFreactor.MediaFeature.DEVICE_WIDTH,
    'value': "1024px"
}
config['mediaFeatureValues'] = {
    mediaFeature: PDFreactor::MediaFeature::DEVICE_WIDTH,
    value: "1024px"
}
config.mediaFeatureValues = {
    mediaFeature: PDFreactor.MediaFeature.DEVICE_WIDTH,
    value: "1024px"
};
config.mediaFeatureValues = {
    mediaFeature: PDFreactor.MediaFeature.DEVICE_WIDTH,
    value: "1024px"
};
$config["mediaFeatureValues"] = {
    "mediaFeature" => PDFreactor::MediaFeature->DEVICE_WIDTH,
    "value" => "1024px"
};
{ "mediaFeatureValues": {
    "mediaFeature": "DEVICE_WIDTH",
    "value": "1024px" }
}
-C config.json

With the following config.json:

{ "mediaFeatureValues": {
    "mediaFeature": "DEVICE_WIDTH",
    "value": "1024px" }
}

The following table provides an overview of the supported media features. The default values can be found in the PDFreactor API documentation.

Supported media features
Feature Name Description min-/ max-
width The width of the targeted display area. Yes
height The height of the targeted display area. Yes
device-width The width of the rendering surface. Yes
device-height The height of the rendering surface. Yes
orientation Is portrait if height is greater than or equal to width, or landscape otherwise. No
aspect-ratio Calculated from width and height. The value is a fraction, e.g. 16/10. Yes
device-aspect-ratio Calculated from the device-width and device-height. The value is a fraction, e.g. 16/9. Yes
color The number of bits per color component of the output device. Yes
color-index The number of entries in the color lookup table. Yes
monochrome The number of bits per pixel in a monochrome frame buffer. Yes
resolution The device resolution in dpi, dpcm or dppx. This also defines the value of the window.devicePixelRatio property available from JavaScript. Yes
grid Whether the device is grid or bitmap based. No
-ro-output-format (proprietary) The output format of the conversion, either pdf, image or viewer (i.e. PDFreactor Preview app). No

PDFreactor does not take account of the values of CSS properties in the document when determining the values of media features. For example, setting the page height to 50mm will have no effect on a media query that tests the max-height of the document. Instead, the media features supported by PDFreactor all have default values (for details see the Configuration.MediaFeature class in the PDFreactor API documentation). These default values can be overridden through the PDFreactor API.
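The derived features in the table above follow directly from the width and height values. The sketch below shows that calculation with a hypothetical helper (derived_media_features) and made-up pixel values; note that the fraction is reduced, so a 16/10 viewport is reported as 8/5.

```python
from fractions import Fraction


def derived_media_features(width_px: int, height_px: int) -> dict:
    """Compute orientation and aspect-ratio from a width and a height."""
    # Portrait if the height is greater than or equal to the width.
    orientation = "portrait" if height_px >= width_px else "landscape"
    ratio = Fraction(width_px, height_px)
    return {
        "orientation": orientation,
        "aspect-ratio": f"{ratio.numerator}/{ratio.denominator}",
    }


print(derived_media_features(1920, 1200))
# {'orientation': 'landscape', 'aspect-ratio': '8/5'}
```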

Document-Specific Preferences

PDFreactor allows setting certain configurations via the CSS of the document that is converted. This is done using the proprietary at-rule @-ro-preferences.

Example:

@-ro-preferences {
    /* The first page of the document should not be a cover page */
    first-page-side: verso;
}
@-ro-preferences properties
Property Name Values Description
first-page-side
  • left

  • right

  • verso

  • recto

  • auto (default)

Sets on which side the first page of the document should be. By default it is right, unless the document direction is right-to-left.
first-page-side-view
  • left

  • right

  • verso

  • recto

  • auto (default)

Sets on which side the first page of the document should appear in viewers, without impact on styles or layout. By default it is the same side as set by first-page-side.
page-layout
  • 1 column

  • 2 column

  • 1 page

  • 2 page

Sets the initial view mode for the document, i.e. whether two pages should be next to each other and how scrolling between the pages should work.
initial-zoom
  • [percentage]

  • fit-page

  • fit-page-width

  • fit-page-height

  • fit-content

  • fit-content-width

  • fit-content-height

Sets the initial zoom factor when opening the document. Can either be a specific percentage value or the zoom factor can be computed dynamically so that the page (or its content) fits into the window of the viewer application. Please note that not all fit-values are supported by all viewers. Generally, fit-page support is more common.
initial-page
  • [number]

Sets the number of the page that should be scrolled to when opening the document. The default value is 1.
pdf-script-action
  • [String]

  • [String] [event] ...

  • none

Sets a PDF script that is executed when the PDF is opened by a viewer application that supports PDF scripts and the corresponding event is triggered (e.g. on opening the PDF). This can also be set via the PDFreactor API. If set by both, the scripts set via API are overridden by those set via the CSS property (only if both are registered on the same event). The property allows a comma-separated list of action and event pairs. More information can be found in the property description.
pages-counter-offset
  • [number]

Sets an optional offset to be added to the value of the pages counter. Negative values are valid. The default value is 0.
pdf-shape-optimization
  • visual (default)

  • none

Sets whether shapes should be written into the PDF in a way that prevents visualization issues in certain PDF viewers.

Converting Large Documents

In most cases, PDFreactor is able to handle even very large documents, provided that enough memory is made available. However, if there is not enough memory available or if large tables cause conversions to be too slow, PDFreactor offers specialized functionalities that disable certain resource-intensive features to allow processing such documents much more efficiently with regard to memory and time. Those can be used separately or in combination.

Segmentation

Enabling segmentation allows PDFreactor to internally split conversions into multiple parts, drastically reducing the amount of memory required for large documents. The minimum document size for this to be noticeable depends on the complexity of the input document, but 5000 pages is a good estimate. This has no visible influence on the resulting PDF document, i.e. the edges of segments are not discernible. However, there are some limitations:

  • Regions are not supported.

  • Shrink-to-Fit via pixelsPerInchShrinkToFit or -ro-scale-content is not supported.

  • The pageOrder setting is not supported.

  • The "pages" counter is not supported. This does not affect the "page" counter, other counters or named strings.

  • Using the function outside of page margin boxes may cause unpredictable results. When it is absolutely necessary it is highly recommended to use on an ancestor element of the ones using the value.

  • "tfoot" and "thead" elements must be placed before the "tbody" or "tr" elements of the same "table". (If the document is not too large this can be corrected via JavaScript.)

  • All "style" elements must be in the "head" of the document.

  • Due to the total amount of pages being unknown during the conversion of any segment but the last, log output and progress monitoring cannot estimate the progress of the conversion.

  • For the CSS functions target-counter and target-text to be able to access information from previous segments the property must be used.

  • JavaScript, when enabled, is run in a preprocessing step with no access to any layout information and increases memory consumption to some extent.

If these restrictions are acceptable, the feature can be enabled in the PDFreactor configuration:

config.setSegmentationSettings(new SegmentationSettings()
    .setEnabled(true));
config.SegmentationSettings = new SegmentationSettings
{
    Enabled = true
};
$config["segmentationSettings"] = array(
    "enabled" => true
);
config['segmentationSettings'] = {
    'enabled': True
}
config['segmentationSettings'] = {
    enabled: true
}
config.segmentationSettings = {
    enabled: true
};
config.segmentationSettings = {
    enabled: true
};
$config["segmentationSettings"] = {
    "enabled" => true
};
{ "segmentationSettings": {
    "enabled": true
}}
-C config.json

With the following config.json:

{ "segmentationSettings": {
    "enabled": true
}}

Some optional functionalities increase the amount of memory required, due to data accumulating over the course of the entire conversion. These include links, bookmarks, tagging and logging at levels more verbose than info.
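Before enabling segmentation, it can be useful to check a configuration for settings that the limitations above rule out. The following sketch does this for a configuration given as a plain dictionary, as it would be written to a config.json. The function name segmentation_conflicts is hypothetical, and only the two configuration properties named explicitly in the list of limitations (pageOrder and pixelsPerInchShrinkToFit) are checked.

```python
def segmentation_conflicts(config: dict) -> list[str]:
    """List configuration properties that are not supported with segmentation."""
    if not config.get("segmentationSettings", {}).get("enabled"):
        return []
    # Properties named as unsupported in the limitations of segmentation.
    unsupported = ("pageOrder", "pixelsPerInchShrinkToFit")
    return [key for key in unsupported if config.get(key)]


config = {
    "segmentationSettings": {"enabled": True},
    "pageOrder": "2,5,6*2,8..10,-1,-2",
}
print(segmentation_conflicts(config))
# ['pageOrder']
```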

Fast Tables

Very large tables have a significant impact on performance. Tables that have simple structures and only basic sets of styles can be declared as fast tables, providing significantly better performance and lower memory requirements at the cost of the following restrictions:

  • Cell content is handled as a single line of text with uniform style and no influence on the table layout. If there is too much content, it will overflow.

  • Styles applied to the cells of the first two body rows are used for the rest of the table's content. Applying different styles to the second row allows alternating even/odd styles. Styles set on the child nodes of cells or other table body rows are ignored.

  • The structure is homogeneous, with all body rows having the same height and the cells of the first row (header or body) defining the widths of their columns. Widths are taken from style only, without measuring content. Column or row spans are not supported. Missing row elements and other incorrect structuring will lead to unexpected results.

  • Supported styles on cells are: , , , , , , , , , , , border-right, border-bottom, and related shorthands.

  • Supported styles on rows are: , and related shorthands.

  • Supported styles on col elements are: , and related shorthands.

  • The cell borders are created by using the border-right and border-bottom styles, creating a grid between the cells, similar to the effect of border-collapse: collapse. The borders at the table edges are created from the styles of the table element.

    Table footer cells are an exception as they use their border-top styles (instead of border-bottom) to create the horizontal border between body and footer cells.

  • Repeating table header and footer groups are limited to one row each. Those are styled independently from the table body.

  • All lengths must be absolute, except for the widths of columns which also support percentages.

  • The style set on the table element is also used for all cells. The property is not supported.

  • PDF tagging functionality has no access to the content of such tables. By default fast tables are marked as artifacts.

If these restrictions are acceptable, the feature can be enabled by setting the style : -ro-fast-table on table elements. The style can be applied selectively, to affect only specific tables of the document.

Recommendations for Large Documents

Enabling not only reduces the size of the resulting file, it also eliminates some inherent size limitations of the PDF format.

When converting via the Java API, an OutputStream should be passed to the convert method, so the document is streamed directly to disk or socket instead of keeping it in memory.

When converting via the web service, the convertAsync method should be used. See and for details.

Many PDF viewers and processors will not properly handle PDF files that are larger than 2GB.

Annotations

When using PDFs in a review process it is helpful to be able to effectively annotate the document. While HTML already provides elements like ins and del, PDFreactor also offers more specialized features.

Comments

It is possible to add PDF comments to the document using the addComments configuration property like this:

config.setAddComments(true);
config.AddComments = true;
$config["addComments"] = true;
config['addComments'] = True
config['addComments'] = true
config.addComments = true;
config.addComments = true;
$config["addComments"] = true;
{ "addComments": true }
--addComments

Depending on how the comment information is stored in your HTML source document, there are several style rules that can be applied. The most common use-cases are to either create a comment from an empty element (where any information is stored in its attributes) or to create a comment from a non-empty element (where the content is the text encompassed by the element):

HTML

<span class="comment" text="My Comment."></span>

CSS

span.comment {
    -ro-comment-content: attr(text);
}

HTML

<span class="comment">This text is commented</span>

CSS

span.comment {
    -ro-comment-content: content();
}

There are different styles to visualize a comment in the PDF:

  • note: Creates a small icon. This is the default style for all comments.

  • invisible: Does not create any visual effects.

  • highlight: Highlights the background of a section of text.

  • underline: Underlines a section of text with a straight line.

  • strikeout: Strikes out a section of text.

  • squiggly: Underlines a section of text with a squiggly line.

The comment styles highlight, underline, strikeout and squiggly are only applicable to comments that encompass a section of text.

The following example demonstrates how to style a simple comment.

HTML

<span class="comment">This is a styled comment</span>

CSS

span.comment {
    -ro-comment-content: content();
    -ro-comment-style: underline;
}

The visualization is ultimately dependent on the PDF viewer and may vary across viewers and/or platforms.

Comments can be customized further by using a variety of style rules. Besides content and style, you can also customize the following properties:

  • Title: The title of the comment. In some cases, this is also used for the author. Use the CSS property -ro-comment-title to specify the title.

  • Color: The color of the comment. The default value for the color depends on the comment style chosen. Use the CSS property -ro-comment-color to set a color.

  • Date: The date of the comment. When no date is specified, the current date is used. Use the CSS property -ro-comment-date to set the date.

  • Date Format: The format of the date you specified. The syntax is identical to Java's SimpleDateFormat (API documentation: https://docs.oracle.com/javase/8/docs/api/java/text/SimpleDateFormat.html). Use the CSS property -ro-comment-dateformat to specify a date format for the comment's date.

  • Position: The position of the comment icon (only applicable for the comment style note). Use the CSS property -ro-comment-position to specify a position for the comment's note icon.

  • Initial state: The initial state of the comment, i.e. whether the comment should be open or closed when the PDF is opened in a viewer. Use the CSS property -ro-comment-state to specify the initial state of the comment bubbles. The state can be either open or closed, with the latter being the default value.

The following sample shows how to customize all of the aforementioned properties.

.comment {
    /* Content: get the content of the comment from the text content of the element */
    -ro-comment-content: content();
    /* Title: get the title from the "author" attribute of the element */
    -ro-comment-title: attr(author);
    /* Style: set the comment style to "note" */
    -ro-comment-style: note;
    /* Color: specify a color for the comment */
    -ro-comment-color: steelblue;
    /* Date: get the date from the "date" attribute of the element */
    -ro-comment-date: attr(date);
    /* Date Format: specify a custom date format */
    -ro-comment-dateformat: "yyyy/dd/MM HH:mm:ss";
    /* Position: shift the comment icon to the right side of the page */
    -ro-comment-position: page-right;
    /* Initial state: open comment bubbles when the PDF is opened */
    -ro-comment-state: open;
    /* additional styles */
}

Please see the documentation of the individual CSS properties for more information.
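When the source document is generated by a script, the value of the date attribute has to match the date format declared in the style sheet. The sketch below produces such a value in Python for the format used in the sample above; the strftime pattern is a hand-written equivalent of the SimpleDateFormat pattern, and the function name comment_date is hypothetical.

```python
from datetime import datetime

# SimpleDateFormat pattern used in the style sheet and its strftime equivalent.
JAVA_PATTERN = "yyyy/dd/MM HH:mm:ss"
PYTHON_PATTERN = "%Y/%d/%m %H:%M:%S"


def comment_date(moment: datetime) -> str:
    """Format a timestamp so that it matches JAVA_PATTERN (day before month)."""
    return moment.strftime(PYTHON_PATTERN)


print(comment_date(datetime(2024, 3, 5, 14, 30, 0)))
# 2024/05/03 14:30:00
```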

Advanced Comments

In some cases, comments have a separate start and end tag. In this case, the additional style rules -ro-comment-start and -ro-comment-end have to be set to match the comment's start and end elements.

commentstart {
    /* some customizations */
    -ro-comment-content: attr(text);
    -ro-comment-title: attr(author);
    -ro-comment-style: highlight;

    /* define the comment start element */
    -ro-comment-start: attr(uid);
}

commentend {
    /* define the comment end element */
    -ro-comment-end: attr(uid);
}
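The pairing of start and end elements through a shared identifier can be illustrated outside of PDFreactor. The sketch below is not how PDFreactor resolves comments; it merely collects the text between commentstart and commentend elements that carry the same uid attribute, which is a quick way to check that the markers in a source document match up. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class CommentRangeCollector(HTMLParser):
    """Collect the text between commentstart and commentend with equal uid."""

    def __init__(self):
        super().__init__()
        self.open_ids = []  # uids of comments that have started but not ended
        self.ranges = {}    # uid -> text covered by the comment

    def handle_starttag(self, tag, attrs):
        uid = dict(attrs).get("uid")
        if tag == "commentstart" and uid:
            self.open_ids.append(uid)
            self.ranges[uid] = ""
        elif tag == "commentend" and uid in self.open_ids:
            self.open_ids.remove(uid)

    def handle_data(self, data):
        # Text belongs to every comment that is currently open.
        for uid in self.open_ids:
            self.ranges[uid] += data


def collect_comment_ranges(html: str) -> dict:
    collector = CommentRangeCollector()
    collector.feed(html)
    collector.close()
    return collector.ranges


sample = (
    '<p>Some <commentstart uid="c1" text="Check this" author="Jane"></commentstart>'
    'marked text<commentend uid="c1"></commentend> here.</p>'
)
print(collect_comment_ranges(sample))
# {'c1': 'marked text'}
```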

Change Bars

Especially when marking only a single word or even less, the usual highlighting styles may not be enough. In such cases, PDFreactor's Change Bars can help to draw attention. A change bar is simply a colored line next to the content, at the same height as the element that enabled it.

The proprietary property -ro-change-bar-color enables them when set to a color.

ins {
    -ro-change-bar-color: yellowgreen;
}

del {
    -ro-change-bar-color: orangered;
}

To prevent different kinds (i.e. colors) of change bars from overlapping, each change bar can be assigned a different offset from the page content edge, by setting -ro-change-bar-offset.

Alternatively, it is also possible to move a change bar to the other page side altogether by using -ro-change-bar-align. This property defines where the change bars are positioned. By default, the bars are positioned in the left (or right) page margin area. If they come from a multi-column element, however, it makes sense to position them next to the columns.

.multi-column ins {
    -ro-change-bar-color: yellowgreen;
    -ro-change-bar-width: thick;
    -ro-change-bar-align: outside column;
}

In the sample above, the bars will be placed next to the respective column, while the side of the column depends on the side of the page: outside means the right side for right pages and the left side for left pages. There is another special setting, best used for multi-column layouts with only two columns. The value distribute-column is combined with page and distributes the change bars on the left and the right side of the page, depending on which side is closer to the column in which the change bar originates.

.multi-column ins {
    -ro-change-bar-color: yellowgreen;
    -ro-change-bar-align: outside distribute-column page;
}

Fonts

To be able to display text, PDFreactor requires font data. This font data must be in TTF (TrueType Font) or OTF (OpenType Font) format and may come from different types of sources (see Font Sources).

Using OpenType fonts with CFF outlines requires Java SE 9 or higher.

Font Sources

The font data of PDFreactor may come from different types of sources.

Core Fonts Pack

PDFreactor contains fonts that will be used for the Default Font Mapping when no other fonts could be registered on the system, e.g. because of insufficient file permissions or due to the fact that there are no fonts available.

These fonts are distributed by RealObjects and licensed by their respective authors under the SIL Open Font License, a free and open source license designed for fonts (https://scripts.sil.org/cms/scripts/page.php?id=OFL_web), or are in the Public Domain.

The packaged core fonts are:

Original Font Name   Type         PDFreactor Font Name          License
Arimo                sans-serif   RealObjects core sans-serif   SIL Open Font License, Version 1.1
Tinos                serif        RealObjects core serif        SIL Open Font License, Version 1.1
Cousine              monospace    RealObjects core monospace    SIL Open Font License, Version 1.1
Dancing Script       cursive      RealObjects core cursive      SIL Open Font License, Version 1.1
Orbitron             fantasy      RealObjects core fantasy      SIL Open Font License, Version 1.1
Quivira              symbol       RealObjects core symbol       Public Domain (http://en.quivira-font.com/notes.php)

Additionally the core fonts contain fallback fonts for symbols and characters from non-Latin languages. Those are the Noto fonts (SIL Open Font License), Nanum Gothic (SIL Open Font License), and Droid Sans Fallback (Apache License).

System and JVM Font Directories

The main sources PDFreactor uses to retrieve font data are:

  • fonts registered with the Java VM

  • fonts located in system font folders

Both provide fonts physically available to PDFreactor.

Java VM fonts are usually located in JAVA_HOME/jre/lib/fonts. The location of system font folders is platform dependent. PDFreactor registers fonts from these sources automatically.

If PDFreactor is unable to retrieve any font data, fonts from the Core Fonts Pack will be used (see Core Fonts Pack).

PDFreactor can be configured to ignore all system fonts and only use fonts that either have been specifically added via configuration properties or that are web fonts from style sheets. This is useful if the system either has no fonts or if you want to avoid system-dependent output. See Controlling the Font Registration and Caching Mechanism for examples.

Additional Fonts & Font Directories

PDFreactor allows setting additional fonts that are neither located in the system font directory nor the font directory of the Java VM. These fonts still need to be physically available to PDFreactor.

To register these fonts with PDFreactor via the API, use the following configuration properties:

  • fontDirectories — The fonts in the specified directories and all their subdirectories will be used by PDFreactor.

  • fonts — Additional fonts from a specified source URL.

For each directory added by the fontDirectories property and for each of their subdirectories, a separate font cache is created. Should the contents of these directories change, please delete the font cache files before running PDFreactor. See the Chapter The Font Cache Mechanism for more information about the font cache.

Font directories can be added like this:

config.setFontDirectories("/myFonts1", "/myFonts2/corporate");

Use the fontDirs server parameter to control custom font directories.

--fontDirectories "/myFonts1" "/myFonts2/corporate"

Instead of adding entire font directories that PDFreactor will scan, you can also add specific fonts like this:

Java:

config.setFonts(
    new Font().setFamily("My Font")
              .setBold(true)
              .setItalic(true)
              .setSource("https://url/to/font.ttf"));

.NET (C#):

config.Fonts = new List<Font>
{
    new Font()
    {
        Family = "My Font",
        Bold = true,
        Italic = true,
        Source = "https://url/to/font.ttf"
    }
};

JavaScript and Node.js:

config.fonts = [
    {
        family: "My Font",
        bold: true,
        italic: true,
        source: "https://url/to/font.ttf"
    }
];

PHP:

$config["fonts"] = array(
    array(
        "family" => "My Font",
        "bold" => true,
        "italic" => true,
        "source" => "https://url/to/font.ttf"
    )
);

Python:

config['fonts'] = [
    {
        'family': 'My Font',
        'bold': True,
        'italic': True,
        'source': 'https://url/to/font.ttf'
    }
]

Ruby:

config['fonts'] = [
    {
        'family': 'My Font',
        'bold': true,
        'italic': true,
        'source': 'https://url/to/font.ttf'
    }
]

Perl:

$config["fonts"] = [
    {
        "family" => "My Font",
        "bold" => true,
        "italic" => true,
        "source" => "https://url/to/font.ttf"
    }
];

REST (JSON):

{ "fonts": [
    {
        "family": "My Font",
        "bold": true,
        "italic": true,
        "source": "https://url/to/font.ttf"
    }
]}

Command line:

-C config.json

With the following config.json:

{ "fonts": [
    {
        "family": "My Font",
        "bold": true,
        "italic": true,
        "source": "https://url/to/font.ttf"
    }
]}

See Docker Configuration on how to deploy fonts when using the PDFreactor Docker image.

CSS Defined Fonts

PDFreactor is capable of using fonts defined in CSS via the @font-face rule. These fonts are retrieved by PDFreactor along with other resources of the document (e.g. images) and will only be used to render the document they belong to.

@font-face {
    font-family: "My Font";
    src: url("https://www.my-server.com/fonts/my-font.ttf");
}

The Font Cache Mechanism

PDFreactor uses a font cache to store required information about available fonts.

Font Cache Lifecycle

One of the steps PDFreactor performs on startup is registering fonts. The first time this is done will take some time since every font inside the font directories available to PDFreactor will be identified and registered.

At the end of this step PDFreactor creates font cache files that will be used on subsequent starts to significantly reduce its startup time. The font caching ensures the rendering process will start as soon as possible.

If a font cache file is present, new fonts put into the font directories available to PDFreactor will be ignored by PDFreactor unless the font cache file has been deleted. Then PDFreactor will create a new font cache file on the next startup as it would on its first one.

To delete the font cache file, visit the user.home/.PDFreactor directory and delete all files inside it.

When using the PDFreactor Web Service, the font cache is located in the jetty/pdfreactor/fontcache directory of your PDFreactor installation instead (unless otherwise configured, see Customizing the Server Configuration).
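
The manual deletion described above could also be scripted. The following Node.js sketch is only an illustration and not part of PDFreactor: the function name clearFontCache is hypothetical, and it assumes that the font cache files are regular files directly inside the cache directory (subdirectories are left untouched).

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Hypothetical helper: deletes all regular files directly inside cacheDir
// and returns the names of the deleted files. Subdirectories are kept.
function clearFontCache(cacheDir) {
    if (!fs.existsSync(cacheDir)) {
        return [];
    }
    const removed = [];
    for (const entry of fs.readdirSync(cacheDir, { withFileTypes: true })) {
        if (entry.isFile()) {
            fs.unlinkSync(path.join(cacheDir, entry.name));
            removed.push(entry.name);
        }
    }
    return removed;
}

// Default cache location for library use as named in this manual:
// user.home/.PDFreactor
const defaultCacheDir = path.join(os.homedir(), ".PDFreactor");
```

Calling clearFontCache(defaultCacheDir) before the next start of PDFreactor would then cause a new font cache to be created. For the Web Service, the jetty/pdfreactor/fontcache directory of the installation would be passed instead.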

Controlling the Font Registration and Caching Mechanism

It is possible to customize the registration and caching of fonts via the API.

The following configuration properties are used to control the font handling behavior of PDFreactor:

  • fontCachePath — Specifies the location where the font cache file should be stored.

  • cacheFonts — Activates or deactivates the file system font cache.

  • disableSystemFonts — If set to true, PDFreactor will neither register system fonts, nor use the font cache if it exists.

  • disableFontRegistration — Specifies that parts of the font caching mechanism should be disabled. This is a legacy property. In nearly all cases you should use either cacheFonts or disableSystemFonts.

As mentioned before, the default font cache is located in the user.home/.PDFreactor directory. To customize this location, you can use the configuration property fontCachePath.

config.setFontCachePath("/myPDFreactor/fontcache/cache.dat");

Use the fontCacheDir server parameter to control the font cache location.

--fontCachePath "/myPDFreactor/fontcache/cache.dat"

If it is undesirable to create a font cache on the server’s file system, e.g. because PDFreactor does not have sufficient privileges to do so, you can use the cacheFonts property to disable the font cache.

config.setCacheFonts(false);

Use the disableFontCache server parameter to control the file system font cache.

--cacheFonts false

PDFreactor can be configured to ignore all system fonts and only use fonts that either have been specifically added via configuration properties or that are web fonts from style sheets:

config.setDisableSystemFonts(true);

Use the disableSystemFonts server parameter to control system font usage.

--disableSystemFonts
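
To obtain output that does not depend on the fonts installed on a system, disabling system fonts is typically combined with explicitly registered fonts. The following is a minimal sketch of such a configuration object in the style of the JavaScript samples in this chapter. The function name systemIndependentFontConfig is hypothetical, and it is an assumption that a JavaScript client accepts disableSystemFonts as a plain boolean property next to fonts.

```javascript
// Hypothetical helper: builds a configuration object that ignores system
// fonts and only registers the given fonts.
// Assumption: the property names match the configuration properties
// "disableSystemFonts" and "fonts" described in this chapter.
function systemIndependentFontConfig(fonts) {
    return {
        disableSystemFonts: true,
        fonts: fonts
    };
}

const config = systemIndependentFontConfig([
    { family: "My Font", source: "https://url/to/font.ttf" }
]);
```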

Font Matching

Matching Generic Font Families

The generic font families are mapped as follows:

Generic Font Mapping

Generic Font Family   Matched Core Font   First System Font Tried
sans-serif            Arimo               Arial
serif                 Tinos               Times New Roman
monospace             Cousine             Courier New
cursive               Dancing Script      Comic Sans MS
fantasy               Orbitron            Impact

Font Alias Names

It is possible to add a font alias name for a font available in the system font directory or the font directory of the Java VM. The font alias name allows referring to a font by a different name.

Authors can thus use a font alias name as the font-family value in CSS instead of the actual font name. Exchanging the font in all these documents can be done by changing the actual font behind the alias.

To define a font alias name via the Java API use the following configuration property:

  • fontAliases — Alias families for registered fonts.

The following example maps the registered font Arial to the name "My Font". So each time you refer to the name "My Font" in CSS, Arial is used internally.

Java:

config.setFontAliases(
    new Font().setFamily("My Font")
              .setSource("Arial"));

.NET (C#):

config.FontAliases = new List<Font>
{
    new Font()
    {
        Family = "My Font",
        Source = "Arial"
    }
};

JavaScript and Node.js:

config.fontAliases = [
    {
        family: "My Font",
        source: "Arial"
    }
];

PHP:

$config["fontAliases"] = array(
    array(
        "family" => "My Font",
        "source" => "Arial"
    )
);

Python and Ruby:

config['fontAliases'] = [
    {
        'family': 'My Font',
        'source': 'Arial'
    }
]

Perl:

$config["fontAliases"] = [
    {
        "family" => "My Font",
        "source" => "Arial"
    }
];

REST (JSON):

{ "fontAliases": [
    {
        "family": "My Font",
        "source": "Arial"
    }
]}

Command line:

-C config.json

With the following config.json:

{ "fontAliases": [
    {
        "family": "My Font",
        "source": "Arial"
    }
]}

Automatic Font Fallback

Whenever the current font cannot be used to display a certain character, an automatic font fallback is used to find a replacement font for this character. To do so, fonts are tried in the following order:

  1. The font-family property of the current element

  2. The configuration property fontFallback

  3. An internal list of recommended fonts

  4. All fonts on the system, starting with those with the most glyphs

A list of fallback fonts can be specified like this:

Java:

config.setFontFallback("My Font", "Arial");

.NET (C#):

config.FontFallback = new List<String> { "My Font", "Arial" };

PHP:

$config["fontFallback"] = array("My Font", "Arial");

Python and Ruby:

config['fontFallback'] = [ "My Font", "Arial" ]

JavaScript and Node.js:

config.fontFallback = [ "My Font", "Arial" ];

Perl:

$config["fontFallback"] = [ "My Font", "Arial" ];

REST (JSON):

{ "fontFallback": [ "My Font", "Arial" ] }

Command line:

--fontFallback "My Font" "Arial"
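
The lookup order listed above can be illustrated with a small model. This is only a sketch and not PDFreactor's actual implementation: the font objects with family, glyphCount and supports(char), the sources structure and the function name findFallbackFont are all hypothetical.

```javascript
// Hypothetical model of the fallback order described above.
// Each font is an object { family, glyphCount, supports(char) }.
function findFallbackFont(char, sources) {
    const order = [
        sources.elementFonts,   // 1. font-family of the current element
        sources.fontFallback,   // 2. configuration property fontFallback
        sources.recommended,    // 3. internal list of recommended fonts
        // 4. all system fonts, those with the most glyphs first
        [...sources.systemFonts].sort((a, b) => b.glyphCount - a.glyphCount)
    ];
    for (const list of order) {
        for (const font of list) {
            if (font.supports(char)) {
                return font.family;
            }
        }
    }
    return null; // no font contains the character
}
```

In this model, a font listed in fontFallback is only consulted when none of the fonts of the element's font-family property contains the character.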

JavaScript Objects and Types

Objects

ro
The ro or window.ro object provides access to PDFreactor's proprietary JavaScript API.
  • exports ?
  • Data that will be made available to the outside integration API. See

  • layout Layout
  • Proprietary layout information.

  • pdf PDF
  • Runtime PDFreactor API

  • terminateConversion(String message)
  • Terminates the current PDF conversion at the next possible moment, causing PDFreactor to throw an appropriate exception with a message equal to the parameter of this method.
    • message String
    • The exception message.
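
As a minimal sketch of how terminateConversion might be used, the following function aborts a conversion when the document grows beyond a page limit. The function name enforcePageLimit and the limit are hypothetical. The ro object is passed in as a parameter so that the logic can also be tried outside of PDFreactor.

```javascript
// Hypothetical helper: terminates the conversion if the document has more
// pages than allowed. `ro` is expected to be PDFreactor's proprietary
// object (window.ro) with its layout.numberOfPages property.
function enforcePageLimit(ro, maxPages) {
    const pages = ro.layout.numberOfPages;
    if (pages > maxPages) {
        ro.terminateConversion("Document has " + pages + " pages, limit is " + maxPages);
        return false;
    }
    return true;
}
```

Inside a document script this would be called as enforcePageLimit(window.ro, 100), for example.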

layout
PDFreactor allows JavaScript access to some layout information via the proprietary object ro.layout.
  • getPageDescription(Number index)
  • Returns a PageDescription for the page with the given index. The first page has the index 0.
    • index Number
    • The page index.

  • getBoxDescriptions(Element element)
  • Returns an array of BoxDescription objects for the given element. Note that one element can have several boxes (e.g. when a paragraph is spread over multiple pages).
    • element Element
    • The DOM element.

  • String getContent(Element element, String pseudoElement)
  • Returns a string containing the layout text content of the specified element and its descendants. The layout text can differ from the DOM text content due to processing, including white-space collapsing and the addition of generated content.
    • element Element
    • The DOM element.

    • pseudoElement String
    • A string specifying which content to return:

      • "before": Retrieves the "before" generated content of the element.

      • "after": Retrieves the "after" generated content of the element.

      • "text": Retrieves the content of the element, excluding its generated content.

      • "all": Retrieves the content of the element.

      If omitted "all" will be applied as default.

      Both "text" and "all" include the generated content of all descendants.

  • String getContent(Number pageIndex, String marginBox)
  • Returns a string containing the content of the page margin box of the specified page.
    • pageIndex Number
    • The page of the page margin box. The first page has the index 0.

    • marginBox String
    • A string specifying the page margin box, e.g. "top-left", see .

  • numberOfPages Number
  • Returns the current total number of pages of the document.
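
Since one element can have several boxes, getBoxDescriptions can be used to find out on which pages an element is laid out. The following sketch assumes that the returned value can be iterated like a regular JavaScript array. The function name pageIndexesOf is hypothetical, and the layout object is passed in as a parameter (inside PDFreactor this would be ro.layout).

```javascript
// Hypothetical helper: returns the distinct page indexes (0-based, sorted
// ascending) on which the boxes of the given element are laid out.
function pageIndexesOf(layout, element) {
    const seen = new Set();
    for (const box of layout.getBoxDescriptions(element)) {
        seen.add(box.pageIndex);
    }
    return Array.from(seen).sort((a, b) => a - b);
}
```

For a paragraph that breaks from the third to the fourth page, the function would return the indexes 2 and 3.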

pdf
It is possible to use certain PDF-specific parts of the PDFreactor API during runtime via the proprietary object ro.pdf.
  • addAttachments Boolean
  • Enables or disables attachments specified in style sheets.

  • addComments Boolean
  • Enables or disables comments in the PDF document.

  • addOverprint Boolean
  • Enables or disables overprinting.

  • addPreviewImages Boolean
  • Enables or disables embedding of image previews per page in the PDF document.

  • addTags Boolean
  • Enables or disables tagging of the PDF document.

  • allowAnnotations Boolean
  • Enables or disables the 'annotations' restriction in the PDF document.

  • allowAssembly Boolean
  • Enables or disables the 'assembly' restriction in the PDF document.

  • allowCopy Boolean
  • Enables or disables the 'copy' restriction in the PDF document.

  • allowDegradedPrinting Boolean
  • Enables or disables the 'degraded printing' restriction in the PDF document.

  • allowFillIn Boolean
  • Enables or disables the 'fill in' restriction in the PDF document.

  • allowModifyContents Boolean
  • Enables or disables the 'modify contents' restriction in the PDF document.

  • allowPrinting Boolean
  • Enables or disables the 'printing' restriction in the PDF document.

  • allowScreenReaders Boolean
  • Enables or disables the 'screen readers' restriction in the PDF document.

  • attachments
  • Adds a file attachment to the PDF document. All attachments that have been set previously in the PDFreactor integration are included as attachments with binary content which will be base64-encoded.

  • author String
  • Sets the value of the author field of the PDF document.

  • bookletMode
  • Convenience method to set pages-per-sheet properties and page order in one step to create a booklet.

  • creator String
  • Sets the value of the creator field of the PDF document.

  • customDocumentProperties
  • Adds custom properties to the PDF document. An existing property of the same name will be replaced.

  • disableBookmarks Boolean
  • Disables bookmarks in the PDF document.

  • disableLinks Boolean
  • Disables links in the PDF document.

  • encryption String
  • Use one of the encryption constants to specify the encryption:

    • "none": Indicates that the document will not be encrypted. If encryption is disabled then no user password and no owner password can be used.

    • "type_128": Indicates that the document will be encrypted using RC4 128 bit encryption. For normal purposes this value should be used.

    • "type_40": Indicates that the document will be encrypted using RC4 40 bit encryption.

  • keywords String
  • Sets the value of the keywords field of the PDF document.

  • ownerPassword String
  • Sets the owner password of the PDF document.

  • pageOrder String
  • Sets the page order of the direct result of the conversion.

    If the merge mode is set to ARRANGE (see ), this property is also used to specify the position of inserted pages from an existing PDF.

    A description of the syntax can be found in the section.

    Additionally, the pageOrder constants can be used:

    • "BOOKLET": Page order mode to arrange all pages in booklet order.

    • "BOOKLET_RTL": Page order mode to arrange all pages in right-to-left booklet order.

    • "EVEN": Page order mode to keep even pages only.

    • "ODD": Page order mode to keep odd pages only.

    • "REVERSE": Page order mode to reverse the page order.

  • pagesPerSheetProperties
  • Sets the properties of a sheet on which multiple pages are being arranged.

    If cols or rows is less than 1, no pages-per-sheet processing is done. This is the case by default.

  • pdfScriptActions
  • Sets a pair of trigger event and PDF script. The script is triggered on the specified event.

    A PDF script is JavaScript that is executed by a PDF viewer (e.g. Adobe Reader). Note that most viewers do not support this feature.

    PDF scripts can also be set by using the proprietary CSS property pdf-script-action. More information can be found in the description of that property.

    Please note that scripts set via CSS have a higher priority. If two scripts are registered on the same event, one via the API and the other via the CSS property, the script set in CSS overrides the other one.

  • printDialogPrompt Boolean
  • Enables or disables a print dialog to be shown upon opening the generated PDF document by a PDF viewer.

  • subject String
  • Sets the value of the subject field of the PDF document.

  • title String
  • Sets the value of the title field of the PDF document.

  • userPassword String
  • Sets the user password of the PDF document.
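
The metadata properties above could, for example, be filled from data found in the document. The following sketch assumes that the properties of ro.pdf can be assigned like ordinary JavaScript properties, which this section does not state explicitly. The function name applyMetadata and the shape of the meta object are hypothetical.

```javascript
// Hypothetical helper: copies the string-valued metadata fields of `meta`
// to the given pdf object (inside PDFreactor this would be ro.pdf).
// Assumption: the properties can be set by plain assignment.
function applyMetadata(pdf, meta) {
    for (const field of ["title", "author", "subject", "keywords"]) {
        if (typeof meta[field] === "string") {
            pdf[field] = meta[field];
        }
    }
    return pdf;
}
```

Fields that are missing from meta, or that are not strings, are left unchanged.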

Types

DOMRect
A DOMRect contains the position and dimensions of a rectangle.
  • left Number
  • The x-coordinate.

  • right Number
  • The x-coordinate plus the width.

  • top Number
  • The y-coordinate.

  • bottom Number
  • The y-coordinate plus the height.

  • width Number
  • The width.

  • height Number
  • The height.
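
The relationship between the six fields can be shown with plain objects. The following sketch does not use PDFreactor at all. makeRect and rectContains are hypothetical helper names, and the objects they produce only mimic the fields of a DOMRect.

```javascript
// Builds a plain object with the same fields as the DOMRect described above:
// right is left plus width, bottom is top plus height.
function makeRect(left, top, width, height) {
    return {
        left: left,
        top: top,
        width: width,
        height: height,
        right: left + width,
        bottom: top + height
    };
}

// Returns true if the point (x, y) lies inside the rectangle, edges included.
function rectContains(rect, x, y) {
    return x >= rect.left && x <= rect.right && y >= rect.top && y <= rect.bottom;
}
```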

Range
Contains information about a fragment of a document that can contain nodes and parts of text nodes.
  • startContainer Node
  • Returns the DOM Node within which the range starts.

  • startOffset Number
  • Returns the offset in the startContainer at which the range starts.

  • endContainer Node
  • Returns the DOM Node within which the range ends.

  • endOffset Number
  • Returns the offset in the endContainer at which the range ends.

Proprietary Types

BoxDescription
Describes the position and dimensions of the rectangles of a box as well as some further information. The rectangles are described by using DOMRect.
  • pageIndex Number
  • The index of the page of this box. The first page has the index 0.

  • pageLeft Boolean
  • Whether the page of this box is on the left.

  • pageDescription PageDescription
  • The PageDescription of the page of this box. It contains the data of the page from the moment when this BoxDescription was created.

  • lineDescriptions
  • Returns an array of LineDescriptions for this box if the box contains text directly.

  • generatedContentDescriptions
  • Returns an object providing access to BoxDescription arrays for the generated content via type name strings. Available generated content type names are "before" and "after" (for normal HTML elements) and "content" (for page margin boxes). Please note that generated content of inline elements is not yet accessible in this way.

  • columnIndex Number
  • For boxes inside a multi-column layout this returns the index of the column the box is in. Otherwise it returns -1. The index starts at 0 for the first column of the multi-column container element. It increases by one for each further column or column-span and is not reset on new pages or by column spans.

  • columnIndexLocal Number
  • For boxes inside a multi-column layout this returns the local index of the column the box is in. Otherwise it returns -1. The local index starts at 0 for the first column of the multi-column container element. It increases by one for each further column and is reset to 0 on new pages as well as on and after column spans.

  • regionIndex Number
  • For boxes inside a Region this returns the index of that region. Otherwise it returns -1. The index starts at 0 for the first region in its chain. It increases by one for each further region in the same chain and is not reset on new pages.

  • regionIndexLocal Number
  • For boxes inside a Region this returns the local index of that region. Otherwise it returns -1. The local index starts at 0 for the first region in its chain. It increases by one for each further region in the same chain and is reset to 0 on new pages.

  • getMarginRect(String unit)
  • Returns a DOMRect describing the margin rectangle. The point of origin is the upper left corner of the page content rectangle.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getBorderRect(String unit)
  • Returns a DOMRect describing the border rectangle. The point of origin is the upper left corner of the page content rectangle.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getPaddingRect(String unit)
  • Returns a DOMRect describing the padding rectangle. The point of origin is the upper left corner of the page content rectangle.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getContentRect(String unit)
  • Returns a DOMRect describing the content rectangle. The point of origin is the upper left corner of the page content rectangle.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getMarginRectInPage(String unit)
  • Returns a DOMRect describing the margin rectangle. The point of origin is the upper left corner of the page rectangle.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getBorderRectInPage(String unit)
  • Returns a DOMRect describing the border rectangle. The point of origin is the upper left corner of the page rectangle.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getPaddingRectInPage(String unit)
  • Returns a DOMRect describing the padding rectangle. The point of origin is the upper left corner of the page rectangle.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getContentRectInPage(String unit)
  • Returns a DOMRect describing the content rectangle. The point of origin is the upper left corner of the page rectangle.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getBoundingLineContentRect(String unit)
  • Returns a DOMRect describing the union of the content rectangles of the LineDescriptions contained in this box, i.e. the bounding rectangle of all text content of the box. The coordinates are relative to the box containing these lines.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")
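
Because an element can be split into several boxes, its overall size has to be computed from all of its BoxDescriptions. The following sketch sums up the heights of the border rectangles in a given unit. The function name totalBorderHeight is hypothetical, the layout object is passed in as a parameter (inside PDFreactor this would be ro.layout), and the result of getBoxDescriptions is assumed to be iterable like an array.

```javascript
// Hypothetical helper: returns the sum of the border rectangle heights of
// all boxes of the element, in the given unit (for example "mm" or "px").
function totalBorderHeight(layout, element, unit) {
    let total = 0;
    for (const box of layout.getBoxDescriptions(element)) {
        total += box.getBorderRect(unit).height;
    }
    return total;
}
```

For a paragraph that is spread over two pages, the result is the height of the first fragment plus the height of the second one.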

PageDescription
Describes the dimensions of a page and its rectangles as well as some further information. The rectangles are described by using DOMRects.
  • pageIndex Number
  • The index of this page. The first page has the index 0.

  • pageLeft Boolean
  • Whether this page is on the left.

  • pageName String
  • The name of this page, if it is a named page and an empty string otherwise.

  • pageGroups String
  • An array containing all names of this page or an empty array if there are none.

  • range
  • The DOM Range of the content of this page. The start- and endContainer are the most deeply nested nodes at the respective page breaks.

  • rangeShallow
  • The DOM Range of the content of this page. The start- and endContainer are the least deeply nested nodes at the respective page breaks.

  • marginBoxDescriptions
  • Returns an object providing access to BoxDescriptions for the page margin boxes via margin box name strings like "top-left". The BoxDescriptions for the content of a margin box are available via the 'content' key of its generatedContentDescriptions object.

  • getMediaRect(String unit)
  • Returns a DOMRect describing the media box of the page.
    The position is relative to the media/trim rectangle, so both values are negative or 0.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getBleedRect(String unit)
  • Returns a DOMRect describing the bleed box of the page.
    The position is relative to the media/trim rectangle, so both values are negative or 0.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getTrimRect(String unit)
  • Returns a DOMRect describing the trim box of the page. This is a synonym for getMarginRect and matches the page size.
    The position is relative to the media/trim rectangle itself, so both values are always 0.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getMarginRect(String unit)
  • Returns a DOMRect describing the margin rectangle of the page. This is a synonym for getTrimRect and matches the page size.
    The position is relative to the media/trim rectangle itself, so both values are always 0.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getBorderRect(String unit)
  • Returns a DOMRect describing the border rectangle of the page.
    The position is relative to the media/trim rectangle, so both values are positive or 0.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getPaddingRect(String unit)
  • Returns a DOMRect describing the padding rectangle of the page.
    The position is relative to the media/trim rectangle, so both values are positive or 0.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getContentRect(String unit)
  • Returns a DOMRect describing the content rectangle of the page.
    The position is relative to the media/trim rectangle, so both values are positive or 0.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getCropRect(String unit)
  • Returns a DOMRect describing the crop box of the page or null if none is set.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getArtRect(String unit)
  • Returns a DOMRect describing the art box of the page or null if none is set.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")
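
Since the content rectangle is positioned relative to the trim rectangle, the distance between the page edges and the page content can be derived from the two. The following sketch shows this calculation. The function name contentInsets is hypothetical, and the page parameter is expected to be a PageDescription as returned by ro.layout.getPageDescription.

```javascript
// Hypothetical helper: returns the distances between the edges of the trim
// rectangle (the page size) and the content rectangle, in the given unit.
function contentInsets(page, unit) {
    const trim = page.getTrimRect(unit);
    const content = page.getContentRect(unit);
    return {
        top: content.top - trim.top,
        right: trim.right - content.right,
        bottom: trim.bottom - content.bottom,
        left: content.left - trim.left
    };
}
```

Each inset is the combined size of the page margin, border and padding on that side.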

LineDescription
Contains information about a line of text. It can be retrieved from a BoxDescription.
  • range
  • The DOM Range from the beginning to the end of the text of the line or null for empty lines.

  • Number getBaselinePosition(String unit)
  • Returns the vertical distance between the baseline position of the line and the top of the content rectangle of the box containing the line.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")

  • getContentRect(String unit)
  • Returns a DOMRect describing the content rectangle of the line, specifically the part of the line actually containing text. The coordinates are relative to the box containing this line.
    • unit String
    • The desired length unit in which the dimensions and coordinates will be returned. (defaults to "px")
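
As a small sketch of how LineDescriptions might be processed, the following function collects the baseline positions of all non-empty lines of a box. The function name baselinesOf is hypothetical. It assumes that lineDescriptions is either an array or missing when the box does not contain text directly.

```javascript
// Hypothetical helper: returns the baseline positions (in the given unit)
// of all lines of the box that contain text. Empty lines, whose range is
// null, are skipped.
function baselinesOf(box, unit) {
    const result = [];
    for (const line of box.lineDescriptions || []) {
        if (line.range !== null) {
            result.push(line.getBaselinePosition(unit));
        }
    }
    return result;
}
```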

Attachment
A JavaScript object containing data for attachments. Unlike the attachments in the normal PDFreactor configuration, these attachments contain text by default, not binary data. It is still possible to attach binary data, however you have to base64-encode the data and set the binary property to true.
  • data String|Blob
  • The textual or base64-encoded binary content of the attachment. Binary content can also be a Blob. May be omitted.

  • url String
  • If data is not specified, the attachment will be retrieved from this URL. If this is "#" the input document URL is used instead.

  • name String
  • The file name associated with the attachment. It is recommended to specify the correct file extension. If this is omitted the name is derived from the URL.

  • description String
  • The description of the attachment. If this is omitted the name is used.

  • binary Boolean
  • This property indicates whether the data property contains base64-encoded binary data or not. If omitted it is treated as false, meaning that the attachment content is treated as UTF-8 encoded text, unless it is a Blob.
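The text and binary variants described above can be illustrated with a minimal sketch that builds two Attachment objects. The helper names makeTextAttachment and makeBinaryAttachment are hypothetical and not part of PDFreactor, and the sketch assumes a JavaScript environment that provides the standard btoa function.

```javascript
// Hypothetical helper: a text attachment. "binary" is omitted, so the
// content is treated as UTF-8 encoded text.
function makeTextAttachment(text, name, description) {
  return { data: text, name: name, description: description };
}

// Hypothetical helper: a binary attachment. The bytes are base64-encoded
// and the binary property is set to true, as required for binary data.
function makeBinaryAttachment(bytes, name, description) {
  let raw = "";
  for (const b of bytes) {
    raw += String.fromCharCode(b & 0xff);
  }
  return { data: btoa(raw), name: name, description: description, binary: true };
}

const note = makeTextAttachment("Hello", "note.txt", "A short note");
const rawBytes = makeBinaryAttachment([72, 105], "hi.bin", "Two raw bytes");
```

How such objects are handed to PDFreactor is not shown here; only the property layout follows the description above.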

BookletMode
A JavaScript object containing data for bookletMode.
  • sheetSize String
  • The size of the sheet as CSS value, e.g. "A3", "letter landscape", "15in 20in", "20cm 30cm".

  • sheetMargin String
  • The sheet margin as CSS margin, e.g. "1in", "1cm 1.5cm", "10mm 20mm 10mm 30mm".

  • rtl Boolean
  • Whether or not the reading order of the booklet should be right-to-left.

KeyValuePair
A JavaScript object containing data for customDocumentProperties.
  • key String
  • The key.

  • value String
  • The value.

PagesPerSheetProperties
A JavaScript object containing data for pagesPerSheetProperties.
  • cols Number
  • The number of columns per sheet.

  • rows Number
  • The number of rows per sheet.

  • sheetSize String
  • The sheet size as CSS size, e.g. "A4", "letter landscape", "15in 20in", "20cm 30cm".

  • sheetMargin String
  • The sheet margin as CSS margin, e.g. "1in", "1cm 1.5cm", "10mm 20mm 10mm 30mm". null is interpreted as 0mm.

  • spacing String
  • The horizontal and vertical space between pages on a sheet as CSS value, for example "0.1in" or "5mm 2mm". null is interpreted as "0mm".

  • direction String
  • The direction in which the pages are ordered on a sheet. Value is one of the following constants:

    • "DOWN_LEFT": Arranges the pages on a sheet from top to bottom and right to left.

    • "DOWN_RIGHT": Arranges the pages on a sheet from top to bottom and left to right.

    • "LEFT_DOWN": Arranges the pages on a sheet from right to left and top to bottom.

    • "LEFT_UP": Arranges the pages on a sheet from right to left and bottom to top.

    • "RIGHT_DOWN": Arranges the pages on a sheet from left to right and top to bottom.

    • "RIGHT_UP": Arranges the pages on a sheet from left to right and bottom to top.

    • "UP_LEFT": Arranges the pages on a sheet from bottom to top and right to left.

    • "UP_RIGHT": Arranges the pages on a sheet from bottom to top and left to right.
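One possible reading of the direction constants is that the first word names the direction in which consecutive pages advance, and the second word names the direction in which the resulting rows or columns follow each other. The following sketch, under that assumption, maps a zero-based page index to a cell on the sheet. The function cellForPage is hypothetical and only models the ordering; it is not PDFreactor's layout code.

```javascript
// Hypothetical model: returns the zero-based column and row (counted from
// the top left corner of the sheet) for the page with the given index.
function cellForPage(index, props) {
  const parts = props.direction.split("_");
  const primary = parts[0];   // direction of consecutive pages
  const secondary = parts[1]; // direction of consecutive rows/columns
  const vertical = primary === "DOWN" || primary === "UP";
  const primaryLength = vertical ? props.rows : props.cols;
  const secondaryLength = vertical ? props.cols : props.rows;
  const slot = index % (props.cols * props.rows); // position on its sheet
  let p = slot % primaryLength;
  let s = Math.floor(slot / primaryLength);
  if (primary === "UP" || primary === "LEFT") p = primaryLength - 1 - p;
  if (secondary === "UP" || secondary === "LEFT") s = secondaryLength - 1 - s;
  return vertical ? { col: s, row: p } : { col: p, row: s };
}

const rightDown = cellForPage(3, { cols: 2, rows: 3, direction: "RIGHT_DOWN" });
const downLeft = cellForPage(4, { cols: 2, rows: 3, direction: "DOWN_LEFT" });
const upRight = cellForPage(0, { cols: 2, rows: 3, direction: "UP_RIGHT" });
```

With 2 columns and 3 rows, "RIGHT_DOWN" places the fourth page in the second column of the second row, while "UP_RIGHT" places the first page in the bottom left cell.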

PdfScriptAction
A JavaScript object containing data for pdfScriptActions.
  • triggerEvent String
  • The event on which the script is executed. Value is one of the following constants:

    • "AFTER_PRINT": This event is triggered after the PDF has been printed by the viewer application.

    • "AFTER_SAVE": This event is triggered after the PDF has been saved by the viewer application.

    • "BEFORE_PRINT": This event is triggered before the PDF is printed by the viewer application.

    • "BEFORE_SAVE": This event is triggered before the PDF is saved by the viewer application.

    • "CLOSE": This event is triggered when the PDF is closed by the viewer application.

    • "OPEN": This event is triggered when the PDF is opened in the viewer application.

  • script String
  • The script source that should be executed.
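As a small illustration, a PdfScriptAction object can be built and checked against the listed trigger events before it is used. The helper makeScriptAction is hypothetical, and the script string is only a placeholder for viewer-side JavaScript.

```javascript
// The trigger event constants listed above.
const TRIGGER_EVENTS = [
  "AFTER_PRINT", "AFTER_SAVE", "BEFORE_PRINT", "BEFORE_SAVE", "CLOSE", "OPEN"
];

// Hypothetical helper: rejects unknown trigger events early.
function makeScriptAction(triggerEvent, script) {
  if (!TRIGGER_EVENTS.includes(triggerEvent)) {
    throw new Error("Unknown trigger event: " + triggerEvent);
  }
  return { triggerEvent: triggerEvent, script: script };
}

// The script is executed by the PDF viewer, not by this code.
const onOpen = makeScriptAction("OPEN", "app.alert('Document opened');");
```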

PDFreactor Web Service Server Configuration

The PDFreactor Web Service server can be configured by using the following server parameters. For additional information, please refer to chapter .

The property "type" indicates which data type is used for the parameter. Some parameters also have a "unit", which is the unit the server parameter refers to. It is only mentioned for information purposes.

adminKey
This parameter specifies a key for privileged access to the service.
See: for more information.
Type: String
adminKeyPath
Similar to adminKey, but specifies the path to a file containing the admin key. If the path of this parameter indicates a directory, the contents of the file adminkey.txt are used, if present within the directory.
See: for more information.
Type: Path
apiKeys
This parameter specifies a comma separated list of strings that are used as API keys.
See: for more information.
Type: List<String>
apiKeysPath
Similar to apiKeys, but instead of a comma separated list it specifies the path to a file containing a JSON object with API keys as keys and a description as value. If the path of this parameter indicates a directory, the contents of the file apikeys.json are used, if present within the directory.
See: for more information.
Type: Path
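To make the difference between the two formats concrete, the following sketch converts a comma separated key list, as used by apiKeys, into the JSON object form described for apikeys.json. The function name, the key names and the descriptions are made up for illustration, and trimming whitespace around keys is an assumption of this sketch rather than documented server behavior.

```javascript
// Hypothetical helper: turns "key-one, key-two" into a JSON object with
// API keys as keys and a description as value.
function apiKeyListToJson(list, describe) {
  const result = {};
  const keys = list.split(",").map((k) => k.trim()).filter((k) => k !== "");
  for (const key of keys) {
    result[key] = describe(key);
  }
  return JSON.stringify(result, null, 2);
}

const apiKeysJson = apiKeyListToJson(
  "key-one, key-two",
  (key) => "Description for " + key
);
```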
assetPackageFiles
This parameter limits the maximum number of files that an asset package may contain. A value of 0 or a negative value indicates that there is no file limit. The default value is 1000.
Type: Integer
Unit: Amount
assetPackageMaxSize
Limits the maximum size of the asset package (in bytes). A value of 0 or a negative value indicates that there is no size limit. By default, no maximum size is configured.
Type: Long
Unit: Bytes
callbackMaxTimeout
Callback timeouts with a negative or zero value are treated as an infinite timeout. If infinite timeouts are undesirable for your server, you can limit it to this value (in milliseconds). By default, no maximum timeout is configured.
Type: Integer
Unit: Milliseconds
callbackTimeout
When clients specify callbacks without a timeout, this value will be used as a default timeout (in milliseconds) for connections to the callback URL. The default value is 30000 (30 seconds).
Type: Integer
Unit: Milliseconds
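The interaction between callbackTimeout and callbackMaxTimeout can be modeled roughly as follows. This is only a sketch of one plausible interpretation: the function effectiveCallbackTimeout is hypothetical, and capping finite timeouts that exceed the maximum is an assumption that the descriptions above do not state explicitly.

```javascript
// Hypothetical model of the effective callback timeout in milliseconds.
// "server" holds the optional server parameters callbackTimeout and
// callbackMaxTimeout.
function effectiveCallbackTimeout(clientTimeout, server) {
  const fallback =
    server.callbackTimeout !== undefined ? server.callbackTimeout : 30000;
  const max = server.callbackMaxTimeout; // undefined: no maximum configured
  // Callbacks without a timeout use the server default.
  const timeout = clientTimeout === undefined ? fallback : clientTimeout;
  // Zero or negative values are treated as an infinite timeout, which is
  // limited to the maximum if one is configured.
  if (timeout <= 0) {
    return max !== undefined ? max : Infinity;
  }
  // Assumption: finite values above the maximum are capped as well.
  return max !== undefined ? Math.min(timeout, max) : timeout;
}
```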
cleanupInterval
This parameter specifies the interval (in days) at which the PDFreactor Web Service deletes asynchronous conversion results that have not been retrieved. The default value is 5.
Type: Integer
Unit: Days
conversionCacheSize
This parameter specifies the number of conversions that are kept in memory (only their metadata, without the document). Conversions that are not kept in memory have to be reloaded from the file system.
Type: Integer
Unit: Amount
conversionTimeout
Specifies a timeout in seconds after which conversions automatically terminate. Specifying the value "0" means that there is no timeout. By default, no timeout is configured.
Type: Integer
Unit: Seconds
debugLocalDir
This specifies the directory where debug files will be dumped by PDFreactor in case debug mode is enabled and no converted document could be created.
Type: Path
disableDocTemp
If set to true, the Web Service will not use a temp folder. This also means that asynchronous conversions are not available. Synchronous conversions will be done in-memory, so make sure that the Web Service has sufficient amounts of memory available.
Type: Boolean
disableFontCache
If set to true, the Web Service will not use a file-based font cache. Generally, this is not recommended since the font cache will then have to be created for every conversion which is likely to have a significant performance impact. The default value is false.
Type: Boolean
disableFontRegistration
If set to true, font registration is disabled and any existing font cache will be ignored and the font directories will be scanned for font information. The default value is false.
Type: Boolean
disableSystemFonts
If set to true, PDFreactor will neither scan for nor use system fonts that are installed on the server. Only fonts specified via CSS and via the server parameter fontDirs as well as PDFreactor internal fonts will be used.
Type: Boolean
docTempDir
This parameter specifies the location of the Web Service's temporary folder which is used to store asynchronously converted documents. The pre-configured location is the pdfreactor/doctemp directory in the PDFreactor/jetty directory.
Type: Path
docTempRetentionPeriod
Asynchronous conversions create temporary files on the server, which are automatically deleted when they are read once. If results of asynchronous conversions are not accessed, these files remain on the server and are deleted after the number of days specified by this parameter. The default value is 5 (days).
Type: Integer
Unit: Days
fontCacheDir
This specifies the directory of the font cache, which will be created by PDFreactor. If no path is specified, the font cache will be created in PDFreactor/jetty/pdfreactor/fontcache.
Type: Path
fontDirs
This parameter takes a colon or semicolon separated list of directories that PDFreactor should scan for fonts.
Type: List<Path>
ignoreClientPriority
If set to true, the Web Service will ignore any priority specified via the priority property in the client's Configuration object.
Type: Boolean
licenseKeyPath
Specifies a file system path, either directly to the license key file or to a directory where the license key file is located.
Type: Path
licenseKeyUrl
Specifies a URL where the license key file is located.
Type: URL
overrideConfig
A URL or path to a file containing a server-side configuration which is used to override any properties in the configuration sent by clients. The file must be a Configuration object in JSON format.
Type: URL
securitySettings.allowExternalXmlParserResources
This parameter specifies whether the XML parser will process external XML resources during parsing, e.g. DTDs, entities, XIncludes. This does not affect HTML5 document processing.
Type: Boolean
securitySettings.allowRedirects
This parameter specifies whether to allow automatic URL redirects when PDFreactor receives appropriate status codes.
Type: Boolean
securitySettings.connectionRules
A URL or path to a file containing a list of rules that PDFreactor evaluates to either deny or allow connections to a particular resource. The file must contain a JSON array of ConnectionRule objects.
Type: URL
securitySettings.defaults.allowAddresses
This parameter specifies a list of address types to which PDFreactor is allowed to connect.
Type: List<Enum>
Values: link_local | local | private | public
securitySettings.defaults.allowFileSystemAccess
This parameter specifies whether to allow file system access for document resources such as CSS or JavaScript.
Type: Boolean
securitySettings.defaults.allowProtocols
This parameter specifies a list of allowed URL protocols. URLs with protocols not in this list will be blocked. Note that "file" protocols are handled by securitySettings.defaults.allowFileSystemAccess instead.
Type: List<String>
securitySettings.defaults.allowSameBasePath
This parameter specifies whether to allow loading of document resources that have the same base path as the document.
Type: Boolean
securitySettings.hideVersionInfo
Specifies whether PDFreactor will include version information in the PDF metadata or in response headers.
Type: Boolean
securitySettings.untrustedApi
Specifies whether the PDFreactor configuration object is considered an untrusted context for the purpose of security. If it is a trusted context, URLs that are specified in the configuration object are not vetted against the security settings and are always allowed. If it is not trusted, the same security settings that are used for document resources apply to all configuration resources (including the document) as well.
Type: Boolean
serverLogLevel
This parameter configures the log level of the server's log. The following levels are available:
  • severe (least verbose)
  • warning
  • info
  • config
  • fine
  • finer
  • finest (most verbose)
The level off disables server logging. The default value is config.
Type: Enum
Values: all | config | fine | finer | finest | info | off | severe | warning
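The ordering of the log levels can be read as a threshold: a message is written if its level is at most as verbose as the configured level. The following sketch models that reading. The function isLogged is hypothetical, and treating "all" as "log everything" is an assumption based on the usual meaning of that value.

```javascript
// Levels ordered from least to most verbose, as listed above.
const LOG_LEVELS = ["severe", "warning", "info", "config", "fine", "finer", "finest"];

// Hypothetical model: would a message of messageLevel be written when the
// server is configured with configuredLevel?
function isLogged(messageLevel, configuredLevel) {
  if (configuredLevel === "off") return false; // logging disabled
  if (configuredLevel === "all") return true;  // assumption: no filtering
  return LOG_LEVELS.indexOf(messageLevel) <= LOG_LEVELS.indexOf(configuredLevel);
}
```

Under this model the default level config lets severe, warning, info and config messages through, but not fine, finer or finest.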
serverLogMode
This parameter configures the log mode of the server. If set to bulk (the default value), the entire log output of a PDF conversion is dumped after the conversion is finished. This can also be set to live which outputs log entries directly. However if there are multiple conversions in parallel, log entries from other conversions may be written out at the same time, so there is no guarantee that you will receive a coherent log of a single conversion (contrary to bulk). The mode off disables the server-side logging of all conversions.
Type: Enum
Values: bulk | live | off
systemdLogLevel
This parameter configures Systemd logging. If this parameter is configured, log messages will be logged to the Systemd log in addition to the server log file. Available values are SEVERE, WARNING, CONFIG, and INFO. Systemd logging is only supported on Linux systems that support Systemd. You can access PDFreactor logs through their identifier, e.g. journalctl -t pdfreactor
Type: Enum
Values: config | info | severe | warning
threadPoolSize
This parameter determines the number of parallel conversions that can be performed by the PDFreactor Web Service. Please note that while there is no maximum value for this, only a thread pool size that is lower than or equal to the system's maximum number of threads will increase performance when converting documents in parallel. The default value is calculated from the system's number of processors.
Type: Integer
Unit: Amount

Supported Barcode Types and Properties

PDFreactor supports the following barcode symbologies, each handling some of the -ro-barcode-* CSS properties differently.

These -ro-barcode-* properties apply to all barcode types:

These apply to all barcode types with human readable text:

-ro-barcode-encoding applies to all barcode types, however they don't necessarily support all 3 available data types.

-ro-barcode-size applies to most barcode types. If the property is not explicitly mentioned, it adjusts the bar height.

Please refer to the CSS documentation for more information.

Some barcode symbologies impose additional restrictions on the input data besides limiting the allowed characters.

If the -ro-barcode-type property is mentioned below, the entry always refers to its optional last argument.

.barcode {
    -ro-replacedelement: barcode;
    -ro-barcode-type: code2of5 interleaved enabled;
    -ro-barcode-content: "1234567890";
}

QR Code

The QR Code bar code symbology according to ISO/IEC 18004:2015.

Identifier: qrcode

Allowed Characters: The Latin-1 set and Kanji characters which are members of the Shift-JIS encoding scheme.

Supported Data Types: eci, hibc, gs1

-ro-barcode-size
Default Value Possible Values Description
auto 1 - 40 Selects a QR code size, refer to the QR code version table for more detailed information.
-ro-barcode-ecc-level
Default Value Possible Values Description
auto L, M, Q, H Sets the error correction level.
QR Code Version Table
-ro-barcode-size Symbol Size
1 21 x 21
2 25 x 25
3 29 x 29
4 33 x 33
5 37 x 37
6 41 x 41
7 45 x 45
8 49 x 49
9 53 x 53
10 57 x 57
11 61 x 61
12 65 x 65
13 69 x 69
14 73 x 73
15 77 x 77
16 81 x 81
17 85 x 85
18 89 x 89
19 93 x 93
20 97 x 97
21 101 x 101
22 105 x 105
23 109 x 109
24 113 x 113
25 117 x 117
26 121 x 121
27 125 x 125
28 129 x 129
29 133 x 133
30 137 x 137
31 141 x 141
32 145 x 145
33 149 x 149
34 153 x 153
35 157 x 157
36 161 x 161
37 165 x 165
38 169 x 169
39 173 x 173
40 177 x 177
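The symbol sizes in the table above follow a simple pattern: each version adds 4 modules per side, starting at 21 for version 1. A minimal sketch of that relationship, with the hypothetical helper name qrSymbolSize:

```javascript
// Returns the number of modules per side for a QR code version (1 to 40),
// matching the version table: 17 + 4 * version.
function qrSymbolSize(version) {
  if (!Number.isInteger(version) || version < 1 || version > 40) {
    throw new RangeError("QR code version must be an integer from 1 to 40");
  }
  return 17 + 4 * version;
}
```

For example, version 14 gives 73 modules per side and version 40 gives 177.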

Code 128

The Code 128 barcode symbology as defined in ISO/IEC 15417:2007.

Identifier: code128

Allowed Characters: 8-bit ISO 8859-1 (Latin-1) characters.

Supported Data Types: eci, hibc, gs1

-ro-barcode-type
Default Value Possible Values Description
disabled enabled, disabled Defines whether to prohibit the barcode from using subset mode C for numeric data compression.
-ro-barcode-reader-initialization
Default Value Possible Values Description
disabled enabled, disabled Defines whether reader initialization instructions should be added to the barcode.

Code 32

Code 32, also known as Italian Pharmacode.

Identifier: code32

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Code 49

Code 49 according to ANSI/AIM-BC6-2000.

Identifier: code49

Allowed Characters: ASCII

Supported Data Types: eci, hibc, gs1

Code 11

Identifier: code11

Allowed Characters: 0-9 and dash (-).

Supported Data Types: eci, hibc

-ro-barcode-human-readable-affix
Default Value Possible Values Description
none One or two strings with a length of 1. Determines the affix characters at the beginning and the end of the human readable text. The first argument sets the prefix, while the second sets the suffix. If the second is omitted, the first argument sets both.
-ro-barcode-type
Default Value Possible Values Description
2 1 or 2 Sets the number of checkdigits to be calculated.

Code 93

Identifier: code93

Allowed Characters: ASCII text.

Supported Data Types: eci, hibc

-ro-barcode-human-readable-affix
Default Value Possible Values Description
none A string with a length of 1. Determines the affix characters at the beginning and the end of the human readable text. When applied to a Code 93 barcode, this affix sets both the prefix and suffix.
-ro-barcode-type
Default Value Possible Values Description
2 1 or 2 Sets the number of checkdigits to be calculated.

Code16k

Identifier: code16k

-ro-barcode-reader-initialization
Default Value Possible Values Description
disabled enabled, disabled Defines whether reader initialization instructions should be added to the barcode.

PDF417

The PDF417/MicroPDF417 bar code symbologies according to ISO/IEC 15438:2006 and ISO/IEC 24728:2006.

Identifier: pdf417

Default Subtype: normal

Allowed Characters: ASCII

Supported Data Types: eci, hibc

Supported Subtypes
Identifier Description
normal A typical PDF417 barcode.
truncated As opposed to a normal PDF417, the truncated version is missing one data codeword and the stop bars from each row.
micro A smaller version of PDF417 codes.
-ro-barcode-ecc-level
Default Value Possible Values Description
auto 0-8 Sets the error correction level. Does not apply to MicroPDF417.
-ro-barcode-size
Default Value Possible Values Description
auto Columns: 1-30 for (truncated) PDF417, 1-4 for MicroPDF417.
Rows: 3-90 for (truncated) PDF417, 4-44 for MicroPDF417.
Sets the number of columns and rows this barcode should contain. The first value defines the columns, the second defines the rows.
-ro-barcode-structured-append
Default Value Possible Values Description
none Positive integers Defines a structured series. The first value sets the total number of barcodes belonging to it, the second value defines the ID of the series.
-ro-barcode-structured-append-position
Default Value Possible Values Description
auto Positive integers Defines the position of this barcode within a structured series.
-ro-barcode-reader-initialization
Default Value Possible Values Description
disabled enabled, disabled Defines whether reader initialization instructions should be added to the barcode.

Australia Post

Identifier: auspost

Supported Data Types: eci, hibc

Australia Post Reply Paid

Identifier: ausreply

Supported Data Types: eci, hibc

Australia Post Routing

Identifier: ausroute

Supported Data Types: eci, hibc

Australia Post Redirect

Identifier: ausredirect

Supported Data Types: eci, hibc

Code 3 of 9

The code 3 of 9 bar code symbology according to ISO/IEC 16388:2007.

Identifier: code39

Default Subtype: normal

Allowed Characters: 0-9, A-Z, dash (-), full stop (.), space, dollar ($), slash (/), plus (+) and percent (%). ASCII for Code 3 of 9 extended.

Supported Data Types: eci, hibc

Supported Subtypes
Identifier Description
normal A standard Code 3 of 9.
extended An extended version which is able to encode all ASCII characters.
-ro-barcode-checkdigit-mode
Default Value Possible Values Description
none mod43, none Sets whether checkdigits should be calculated.
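For reference, the commonly documented modulo 43 scheme for Code 3 of 9 assigns each allowed character a value from 0 to 42, sums the values and takes the remainder modulo 43 as the check character. The sketch below illustrates that scheme. It is not PDFreactor's implementation, and the function name code39CheckCharacter is hypothetical.

```javascript
// The 43 characters of standard Code 3 of 9; the index of a character is
// its value in the modulo 43 calculation.
const CODE39_CHARS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ-. $/+%";

// Computes the modulo 43 check character for the given content.
function code39CheckCharacter(content) {
  let sum = 0;
  for (const ch of content) {
    const value = CODE39_CHARS.indexOf(ch);
    if (value < 0) {
      throw new Error("Character not allowed in Code 3 of 9: " + ch);
    }
    sum += value;
  }
  return CODE39_CHARS[sum % 43];
}
```

For the content "CODE39" the character values add up to 75, and 75 modulo 43 is 32, which corresponds to the character "W".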

MSI Plessey

Identifier: msiplessey

Allowed Characters: 0-9

Supported Data Types: eci, hibc

-ro-barcode-checkdigit-mode
Default Value Possible Values Description
none none, mod10, mod11, mod1010, mod1011 Sets how checkdigits should be calculated.
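As an illustration of the mod10 mode, the sketch below computes a modulo 10 check digit using the Luhn algorithm, which is the scheme usually described for MSI barcodes. Whether PDFreactor uses exactly this variant is an assumption, and the function name msiMod10CheckDigit is hypothetical.

```javascript
// Computes a Luhn (modulo 10) check digit for a string of digits.
function msiMod10CheckDigit(digits) {
  if (!/^[0-9]+$/.test(digits)) {
    throw new Error("Only the digits 0-9 are allowed");
  }
  let sum = 0;
  let doubleIt = true; // the rightmost data digit is doubled first
  for (let i = digits.length - 1; i >= 0; i--) {
    let d = Number(digits[i]);
    if (doubleIt) {
      d *= 2;
      if (d > 9) d -= 9; // same as adding the two digits of the product
    }
    sum += d;
    doubleIt = !doubleIt;
  }
  return (10 - (sum % 10)) % 10;
}
```

For the content "1234567" the weighted sum is 26, so the check digit is 4.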

Channel Code

Channel Code according to ANSI/AIM BC12-1998.

Identifier: channelcode

Allowed Characters: 0-9

Supported Data Types: eci, hibc

-ro-barcode-type
Default Value Possible Values Description
auto 3 - 8 Sets the preferred amount of channels used to encode the data.

Codabar

Codabar barcode symbology according to BS EN 798:1996.

Also known as NW-7, Monarch, Code 27, Ames Code, USD-4 and ABC Codabar.

Identifier: codabar

Allowed Characters: 0-9, dash (-), dollar ($), colon (:), slash (/), full stop (.) and plus (+)

Content must start and end with "A", "B", "C", or "D"

Supported Data Types: eci, hibc
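The content rules for Codabar stated above can be checked before the content is passed to the barcode. The following sketch is a hypothetical validator. It assumes uppercase start and stop characters and does not enforce a minimum number of data characters.

```javascript
// Start and stop character A-D, with only the allowed characters between.
const CODABAR_PATTERN = /^[ABCD][0-9\-$:\/.+]*[ABCD]$/;

// Hypothetical helper: checks content against the rules listed above.
function isValidCodabarContent(content) {
  return CODABAR_PATTERN.test(content);
}
```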

EAN-8

EAN bar code symbology according to BS EN 797:1996

Identifier: ean-8

Allowed Characters: 0-9

Supported Data Types: eci, hibc

-ro-barcode-type
Default Value Possible Values Description
auto An absolute length. Changes the guard length of the barcode.

EAN-13

EAN bar code symbology according to BS EN 797:1996

Identifier: ean-13

Allowed Characters: 0-9

Supported Data Types: eci, hibc

-ro-barcode-type
Default Value Possible Values Description
auto An absolute length. Changes the guard length of the barcode.

UPC-A

UPC bar code symbology according to BS EN 797:1996.

Identifier: upc-a

Allowed Characters: 0-9

Supported Data Types: eci, hibc

-ro-barcode-type
Default Value Possible Values Description
auto An absolute length. Changes the guard length of the barcode.

UPC-E

UPC bar code symbology according to BS EN 797:1996.

Identifier: upc-e

Allowed Characters: 0-9

Supported Data Types: eci, hibc

-ro-barcode-type
Default Value Possible Values Description
auto An absolute length. Changes the guard length of the barcode.

EAN/UPC Addon

EAN/UPC add-on bar code symbology according to BS EN 797:1996.

Identifier: addon

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Telepen

Also known as Telepen Alpha.

Identifier: telepen

Allowed Characters: ASCII

Default Subtype: normal

Supported Data Types: eci, hibc

Supported Subtypes
Identifier Description
normal Allows all ASCII content.
numeric Only allows numeric content.

GS1 Databar / Databar 14

GS1 DataBar according to ISO/IEC 24724:2011

Identifier: databar

Default Subtype: linear

Allowed Characters: 0-9

Supported Data Types: gs1, but with an omitted Application Identifier and check digit. Thus not considered GS1 format data.

Supported Subtypes
Identifier Description
linear Standard Databar.
stacked A stacked version, which is smaller than a linear databar, but not omnidirectional.
omnidirectional A stacked omnidirectional Databar.

GS1 Databar Expanded / Databar 14 Expanded

GS1 DataBar Expanded according to ISO/IEC 24724:2011

Identifier: databar-expanded

Default Subtype: normal

Allowed Characters: 0-9

Supported Data Types: gs1

Supported Subtypes
Identifier Description
normal Standard GS1 Databar Expanded.
stacked A stacked version of the GS1 Databar Expanded.
-ro-barcode-size
Default Value Possible Values Description
auto An integer between 1 and 10 to set the column count, a length to set the bar length or both. Sets the bar length and the number of columns/symbol segments this barcode should contain.

GS1 Databar Limited

GS1 DataBar Limited according to ISO/IEC 24724:2011

Identifier: databar-limited

Allowed Characters: 0-9

Supported Data Types: gs1, but with an omitted Application Identifier and check digit. Thus not considered GS1 format data.

Dutch Post Kix Code

Dutch Post KIX Code as used by Royal Dutch TPG Post (Netherlands).

Identifier: kixcode

Allowed Characters: 0-9, A-Z

Supported Data Types: eci, hibc

Japan Post

The Japanese Postal Code symbology

Identifier: japan-post

Allowed Characters: 0-9, A-Z and the dash (-) character

Supported Data Types: eci, hibc

Royal Mail

Royal Mail 4-State Customer Code

Identifier: royal-mail

Allowed Characters: 0-9, A-Z

Supported Data Types: eci, hibc

Korea Post

Identifier: korea-post

Allowed Characters: 0-9

Supported Data Types: eci, hibc

USPS OneCode (Intelligent Mail)

USPS OneCode (Intelligent Mail Barcode) according to USPS-B-3200F

Identifier: usps-onecode

Allowed Characters: 0-9, dash (-)

Supported Data Types: eci, hibc

USPS Package

USPS Intelligent Mail Package Barcode (IMpb), a linear barcode based on GS1-128.

Identifier: usps-package

Allowed Characters: 0-9

Supported Data Types: gs1

POSTNET (Postal Numeric Encoding Technique)

The POSTNET (Postal Numeric Encoding Technique) barcode symbology used by the United States Postal Service.

Identifier: postnet

Default Subtype: normal

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Supported Subtypes
Identifier Description
normal A standard POSTNET code.
planet A Postal Alpha Numeric Encoding Technique (PLANET) barcode.

Pharmazentralnummer (PZN-8)

A Code 39 based symbology used by the pharmaceutical industry in Germany.

Identifier: pzn8

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Pharmacode

Identifier: pharmacode

Default Subtype: onetrack

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Supported Subtypes
Identifier Description
onetrack A Pharmacode consisting of one track.
twotrack A Pharmacode consisting of two tracks.

Codablock-F

Symbology according to AIM Europe "Uniform Symbology Specification - Codablock F", 1995.

Identifier: codablockf

Allowed Characters: 8-bit ISO 8859-1 (Latin-1)

Supported Data Types: eci, hibc

Logmars

The LOGMARS (Logistics Applications of Automated Marking and Reading Symbols) standard used by the US Department of Defense.

Identifier: logmars

Allowed Characters: 0-9, A-Z, dash (-), full stop (.), space, dollar ($), slash (/), plus (+) and percent (%).

Supported Data Types: eci, hibc

Aztec Runes

Aztec Runes bar code symbology according to ISO/IEC 24778:2008 Annex A.

Identifier: aztec-runes

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Aztec Code

Aztec Code bar code symbology according to ISO/IEC 24778:2008.

Identifier: aztec-code

Allowed Characters: 8-bit ISO 8859-1 (Latin-1)

Supported Data Types: eci, hibc, gs1

-ro-barcode-reader-initialization
Default Value Possible Values Description
disabled enabled, disabled Defines whether reader initialization instructions should be added to the barcode.
-ro-barcode-ecc-level
Default Value Possible Values Description
auto
Value Error Correction Capacity
1 > 10% + 3 codewords
2 > 23% + 3 codewords
3 > 36% + 3 codewords
4 > 50% + 3 codewords
Sets the error correction level.
-ro-barcode-size
Default Value Possible Values Description
auto 1 - 4 for "compact" Aztec code symbols,
5 - 36 for "full-range" Aztec code symbols.
Selects an Aztec code size, refer to the Aztec code version table for more detailed information.
-ro-barcode-structured-append
Default Value Possible Values Description
none An integer for the total number of barcodes, a string for the id. Defines a structured series. The first value sets the total number of barcodes belonging to it, the second value defines the ID of the series.
-ro-barcode-structured-append-position
Default Value Possible Values Description
auto Positive integers. Defines the position of this barcode within a structured series.
Aztec Code Version Table
-ro-barcode-size Symbol Size
1 15 x 15
2 19 x 19
3 23 x 23
4 27 x 27
5 19 x 19
6 23 x 23
7 27 x 27
8 31 x 31
9 37 x 37
10 41 x 41
11 45 x 45
12 49 x 49
13 53 x 53
14 57 x 57
15 61 x 61
16 67 x 67
17 71 x 71
18 75 x 75
19 79 x 79
20 83 x 83
21 87 x 87
22 91 x 91
23 95 x 95
24 101 x 101
25 105 x 105
26 109 x 109
27 113 x 113
28 117 x 117
29 121 x 121
30 125 x 125
31 131 x 131
32 135 x 135
33 139 x 139
34 143 x 143
35 147 x 147
36 151 x 151

Data Matrix

Data Matrix ECC 200 bar code symbology according to ISO/IEC 16022:2006

Identifier: data-matrix

Default Subtype: square

Allowed Characters: ISO/IEC 8859-1 (Latin-1)

Supported Data Types: eci, hibc, gs1

Supported Subtypes
Identifier Description
square A square shaped data matrix.
rectangle A rectangular data matrix
-ro-barcode-reader-initialization
Default Value Possible Values Description
disabled enabled, disabled Defines whether reader initialization instructions should be added to the barcode.
-ro-barcode-size
Default Value Possible Values Description
auto 1 - 30 Selects a Data Matrix size, refer to the Data Matrix version table for more detailed information.
-ro-barcode-structured-append
Default Value Possible Values Description
none Two integers. Defines a structured series. The first value sets the total number of barcodes belonging to it, the second value defines the ID of the series.
-ro-barcode-structured-append-position
Default Value Possible Values Description
auto Positive integers. Defines the position of this barcode within a structured series.
Data Matrix Version Table
-ro-barcode-size Symbol Size
1 10 x 10
2 12 x 12
3 14 x 14
4 16 x 16
5 18 x 18
6 20 x 20
7 22 x 22
8 24 x 24
9 26 x 26
10 32 x 32
11 36 x 36
12 40 x 40
13 44 x 44
14 48 x 48
15 52 x 52
16 64 x 64
17 72 x 72
18 80 x 80
19 88 x 88
20 96 x 96
21 104 x 104
22 120 x 120
23 132 x 132
24 144 x 144
25 8 x 18
26 8 x 32
27 12 x 26
28 12 x 36
29 16 x 36
30 16 x 48

Code One

Identifier: code-one

Allowed Characters: ISO 8859-1 (Latin-1)

Supported Data Types: eci, hibc, gs1

-ro-barcode-size
Default Value Possible Values Description
auto 1-10 Selects a Code One version, refer to the Code One version table for more detailed information.
Code One Version Table
-ro-barcode-size Version (Size)
1 A: 18 x 16
2 B: 22 x 22
3 C: 32 x 28
4 D: 42 x 40
5 E: 54 x 52
6 F: 76 x 70
7 G: 98 x 104
8 H: 134 x 148
9 S: ? x 9
10 T: ? x 17

The width of the Code One versions S and T is determined by the amount of encoded data. For version S it is either 13, 23 or 33, for version T it is either 19, 35 or 51.

Grid Matrix

Grid Matrix bar code symbology according to AIMD014

Identifier: grid-matrix

Allowed Characters: ISO/IEC 8859-1 (Latin-1) and GB-2312

Supported Data Types: eci, hibc

-ro-barcode-reader-initialization
Default Value Possible Values Description
disabled enabled, disabled Defines whether reader initialization instructions should be added to the barcode.
-ro-barcode-ecc-level
Default Value Possible Values Description
auto
Value Error Correction Capacity
1 ~10%
2 ~20%
3 ~30%
4 ~40%
5 ~50%
Sets the error correction level.
-ro-barcode-size
Default Value Possible Values Description
auto 1 - 13 Selects a Grid Matrix size, refer to the Grid Matrix version table for more detailed information.
Grid Matrix Version Table
-ro-barcode-size Symbol Size
1 18 x 18
2 30 x 30
3 42 x 42
4 54 x 54
5 66 x 66
6 78 x 78
7 90 x 90
8 102 x 102
9 114 x 114
10 126 x 126
11 138 x 138
12 150 x 150
13 162 x 162
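The Grid Matrix sizes listed above grow by 12 modules per side with each version, starting at 18 for version 1. A minimal sketch of that relationship, with the hypothetical helper name gridMatrixSymbolSize:

```javascript
// Returns the number of modules per side for a Grid Matrix version
// (1 to 13), matching the version table: 6 + 12 * version.
function gridMatrixSymbolSize(version) {
  if (!Number.isInteger(version) || version < 1 || version > 13) {
    throw new RangeError("Grid Matrix version must be an integer from 1 to 13");
  }
  return 6 + 12 * version;
}
```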

Maxicode

MaxiCode barcode symbology according to ISO 16023:2000

Identifier: maxicode

Default Subtype: mode-4

Allowed Characters: ISO 8859-1 (Latin-1)

Supported Data Types: eci, hibc

Supported Subtypes
Identifier Description
mode-2 Formatted data containing a structured Carrier Message with a numeric postal code.
mode-3 Formatted data containing a structured Carrier Message with an alphanumeric postal code.
mode-4 Unformatted data with Standard Error Correction.
mode-5 Unformatted data with Enhanced Error Correction.
mode-6 Used for programming hardware devices.
-ro-barcode-structured-append
Default Value Possible Values Description
none Positive integers. Defines a structured series. The first value sets the total number of barcodes belonging to it. Structured Maxicode series do not have an ID.
-ro-barcode-structured-append-position
Default Value Possible Values Description
auto Positive integers. Defines the position of this barcode within a structured series.
-ro-barcode-type
Default Value Possible Values Description
auto A string whose characters should conform with the following requirements:
  • 1-9 - Postal code data consisting of up to 9 digits (mode 2) or up to 6 alphanumeric characters (mode 3). The remaining characters should be filled with spaces.
  • 10-12 - Three-digit country code according to ISO-3166.
  • 13-15 - Three digit service code. This depends on your parcel courier.
Sets the primary data and should only be used with Maxicode mode 2 or 3.
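The 15-character layout of the primary data described above can be assembled as in the following sketch. The function maxicodePrimaryData is hypothetical, and the example values are made up. How the resulting string is passed to -ro-barcode-type is not shown here.

```javascript
// Builds the 15-character primary data string:
// characters 1-9 postal code (padded with spaces), 10-12 country code,
// 13-15 service code.
function maxicodePrimaryData(postalCode, countryCode, serviceCode) {
  if (postalCode.length > 9) {
    throw new Error("Postal code may use at most 9 characters");
  }
  if (!/^[0-9]{3}$/.test(countryCode)) {
    throw new Error("Country code must consist of three digits");
  }
  if (!/^[0-9]{3}$/.test(serviceCode)) {
    throw new Error("Service code must consist of three digits");
  }
  return postalCode.padEnd(9, " ") + countryCode + serviceCode;
}

// Example values only: a 5-digit postal code, country code 840, service 001.
const primaryData = maxicodePrimaryData("12345", "840", "001");
```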

Micro QR

Micro QR Code according to ISO/IEC 18004:2006

Identifier: microqr

Allowed Characters: The Latin-1 set and Kanji characters which are members of the Shift-JIS encoding scheme.

Supported Data Types: eci, hibc

-ro-barcode-size
Default Value Possible Values Description
auto 1 - 4, maps to M1 to M4. Selects a Micro QR code size.
-ro-barcode-ecc-level
Default Value Possible Values Description
auto L, M, Q Sets the error correction level.
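
A minimal sketch of a Micro QR barcode using the two properties above. The properties -ro-replacedelement and -ro-barcode-content are assumed from the general barcode documentation rather than this section, the class name is hypothetical, and the way the error correction level is written (as a bare keyword) is an assumption.

```css
/* Hypothetical class; a small numeric Micro QR symbol. */
.serial-code {
    -ro-replacedelement: barcode;   /* assumption: enables barcode rendering */
    -ro-barcode-type: microqr;
    -ro-barcode-content: "12345";   /* assumption: placeholder payload */
    -ro-barcode-size: 2;            /* maps to M2 */
    -ro-barcode-ecc-level: M;       /* one of L, M, Q */
}
```

Leaving both -ro-barcode-size and -ro-barcode-ecc-level at auto is usually the simpler choice unless a fixed symbol size is required.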

Code 2 of 5

The Code 2 of 5 family of barcode standards.

Identifier: code2of5

Default Subtype: matrix

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Supported Subtypes
Identifier Description
matrix Standard Code 2 of 5 mode, also known as Code 2 of 5 Matrix.
industrial Industrial Code 2 of 5.
iata International Air Transport Association variation of Code 2 of 5.
data-logic Code 2 of 5 Data Logic.
interleaved Interleaved Code 2 of 5.
itf14 ITF-14, also known as UPC Shipping Container Symbol or Case Code. Requires a 13-digit numeric input.
dp-leitcode Deutsche Post Leitcode. Requires a 13-digit numerical input.
dp-identcode Deutsche Post Identcode. Requires an 11-digit numerical input.
-ro-barcode-type
Default Value Possible Values Description
disabled enabled, disabled Defines whether a check digit should be added. Only applicable to Code 2 of 5 interleaved.
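
The subtype can be sketched as a second keyword after the barcode type, in the same way the subtypes are written elsewhere in this chapter (for example "databar stacked"). The properties -ro-replacedelement and -ro-barcode-content are assumptions not described in this section; the class name and digits are hypothetical.

```css
/* Hypothetical class; Interleaved 2 of 5 encodes digit pairs,
   so an even number of digits is used here. */
.article-number {
    -ro-replacedelement: barcode;            /* assumption: enables barcode rendering */
    -ro-barcode-type: code2of5 interleaved;  /* type followed by subtype */
    -ro-barcode-content: "1234567890";       /* assumption: placeholder payload */
}
```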

ITF-14 (UPC Shipping Container Symbol or Case Code)

Identifier: itf14

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Deutsche Post Leitcode

Identifier: dp-leitcode

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Deutsche Post Identcode

Identifier: dp-identcode

Allowed Characters: 0-9

Supported Data Types: eci, hibc

Nummer der Versandeinheit / Serial Shipping Container Code

Identifier: nve18 or sscc18

Allowed Content: 0-9

Supported Data Types: gs1

GS1 Composite

GS1 Composite symbology according to ISO/IEC 24723:2010.

Identifier: composite

Consists of a linear part and a two-dimensional (2D) part. The subtypes refer to the 2D part.

Default Subtype: cc-a

Allowed Content: ASCII

Supported Data Types: gs1

Supported Subtypes
Identifier Description
cc-a MicroPDF417 symbol variant, encodes up to 56 alphanumeric digits.
cc-b MicroPDF417 symbol variant, encodes up to 338 alphanumeric digits.
cc-c PDF417 symbol variant, encodes up to 2361 alphanumeric digits.
-ro-barcode-composite-type
Default Value Possible Values Description
code128 Behaves like -ro-barcode-type, but is restricted to the following types/subtypes:
  • code128
  • databar
  • databar stacked
  • databar omnidirectional-stacked
  • databar-expanded
  • databar-expanded stacked
  • databar-limited
  • ean-8
  • upc-a
  • upc-e
Defines the barcode type of the linear part of a GS1 Composite barcode.
-ro-barcode-composite-content
Default Value Possible Values Description
auto Depends on the selected barcode type. Sets the content to be encoded in the linear part of a GS1 composite barcode.

Code Samples for Other Languages

CSS Support

Default Style Rules

The default styles of elements are described in the User Agent Stylesheet. While most of these styles are adapted from the specifications (see https://html.spec.whatwg.org/multipage/rendering.html) and match the styles of browsers, PDFreactor adds some sets of style rules, for example those related to pagination:

Special PDFreactor Default Style Rules
Selector Declarations
@page size: A4;
margin: 2cm;
white-space: pre-line;
counter-increment: page;
h1, h2, h3, h4, h5, h6 break-after: avoid;
@footnote padding-top: 6pt;
border-top: solid black thin;
-ro-border-length: 30%;
margin-top: 6pt;
::footnote-call counter-increment: footnote 1;
content: counter(footnote, decimal);
line-height: 0;
font-size: smaller;
vertical-align: super;
::footnote-marker content: counter(footnote, decimal) " ";
line-height: 0;
font-size: smaller;
vertical-align: super;
blockquote[type="cite"] padding-inline-start: 1em;
border-inline-start: solid;
border-color: blue;
border-width: thin;
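
Because these are ordinary default rules, an author style sheet can override them. The following is a sketch with arbitrary example values, not a recommended configuration:

```css
/* Replace the default A4 page with 2cm margins. */
@page {
    size: Letter;
    margin: 1in;
}

/* Allow page breaks directly after lower-level headings again. */
h4, h5, h6 {
    break-after: auto;
}
```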

CSS Attribute Selector

PDFreactor supports the following CSS selectors which select elements that have certain attributes:

Supported attribute selectors
Attribute selector Meaning CSS Level
Elem[attr] An Elem element with an attr attribute. CSS 2.1
Elem[attr="val"] An Elem element whose attr attribute value is exactly equal to "val". CSS 2.1
Elem[attr~="val"] An Elem element whose attr attribute value is a list of whitespace-separated values, one of which is exactly equal to "val". CSS 2.1
Elem[attr^="val"] An Elem element whose attr attribute value begins exactly with the string "val". CSS 3
Elem[attr$="val"] An Elem element whose attr attribute value ends exactly with the string "val". CSS 3
Elem[attr*="val"] An Elem element whose attr attribute value contains the substring "val". CSS 3
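
A few illustrative rules using the selectors from the table. The element names, attribute values and declarations are arbitrary examples.

```css
/* Any link that has an href attribute. */
a[href] { text-decoration: underline; }

/* Links whose href begins with "https://". */
a[href^="https://"] { color: green; }

/* Links whose href ends with ".pdf". */
a[href$=".pdf"] { font-weight: bold; }

/* Images whose whitespace-separated class list contains "logo". */
img[class~="logo"] { width: 3cm; }

/* Table cells whose data-status attribute is exactly "overdue". */
td[data-status="overdue"] { color: red; }
```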

Supported Page Size Formats

Keywords for the supported A series formats, based on DIN 476/ISO 216, and their corresponding oversize formats
A series Size [mm] RA oversizes Size [mm] SRA oversizes Size [mm]
A0 841 x 1189 RA0 860 x 1220 SRA0 900 x 1280
A1 594 x 841 RA1 610 x 860 SRA1 640 x 900
A2 420 x 594 RA2 430 x 610 SRA2 450 x 640
A3 297 x 420 RA3 305 x 430 SRA3 320 x 450
A4 210 x 297 RA4 215 x 305 SRA4 225 x 320
A5 148 x 210 RA5 152 x 215 SRA5 160 x 225
A6 105 x 148 RA6 107 x 152 SRA6 112 x 160
A7 74 x 105 RA7 76 x 107 SRA7 80 x 112
A8 52 x 74 RA8 53 x 76 SRA8 56 x 80
A9 37 x 52
A10 26 x 37
Keywords for the supported B series formats
B series Size [mm]
B1 707 x 1000
B2 500 x 707
B3 353 x 500
B4 250 x 353
B5 176 x 250
B6 125 x 176
B7 88 x 125
B8 62 x 88
B9 44 x 62
B10 31 x 44
Keywords for the supported C series formats
C series Size [mm]
C1 648 x 917
C2 458 x 648
C3 324 x 458
C4 229 x 324
C5 162 x 229
C6 114 x 162
C7 81 x 114
C8 57 x 81
C9 40 x 57
C10 28 x 40
Keywords for supported international page formats
Page format Size [in]
Letter 8.5 x 11
Legal 8.5 x 14
Ledger 11 x 17
Invoice 5.5 x 8
Executive 7.25 x 10.5
Broadsheet 17 x 22
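
The keywords above are intended for the size property of @page rules. A brief sketch, with the named page "wide" and the class name chosen for illustration only:

```css
/* Default page format for the document. */
@page {
    size: A5;
}

/* A named page in landscape orientation, e.g. for wide tables. */
@page wide {
    size: Legal landscape;
}

/* Hypothetical class that places its element on the named page. */
.wide-table {
    page: wide;
}
```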

Supported Hyphenation Languages

Hyphenation languages (a-f)
ISO 639-1 Language
af Afrikaans
as Assamese
bg Bulgarian
bn Bengali, Bangla
ca Catalan
cy Welsh
da Danish
de New German
de-1901 German traditional
de-CH German, Switzerland
el Greek, Modern
el_Polyton.hyp Greek, Polyton
en English (US)
en-GB English (GB)
eo Esperanto
es Spanish
et Estonian
eu Basque
fi Finnish
fr French
fur Friulian
Hyphenation languages (g-m)
ISO 639-1 Language
gl Galician
grc Greek, Ancient
gu Gujarati
hi Hindi
hr Croatian
hsb Upper Sorbian
ia Interlingua
id Indonesian (Bahasa Indonesia)
is Icelandic
it Italian
ka Georgian
kmr Kurmanji (Northern Kurdish)
kn Kannada
la Latin
la-CL Latin
lt Lithuanian
ml Malayalam
mn Mongolian
mr Marathi
mul Multiple languages
Hyphenation languages (n-z)
ISO 639-1 Language
nb Norwegian Bokmål
nl Dutch
nn Norwegian Nynorsk
oc Occitan
or Oriya
pa Panjabi
pl Polish
pms Piemontese
pt Portuguese
rm Romansh
ro Romanian
ru Russian
sa Sanskrit
sl Slovenian
sr-Cyrl Serbian, Cyrillic
sr-Latn Serbian, Latin
sv Swedish
ta Tamil
te Telugu
th Thai
tk Turkmen
tr Turkish
uk Ukrainian
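
A sketch of how a hyphenation language is typically selected: the language of the content is declared in the document (for example with a lang attribute such as lang="de" on the html element) and hyphenation is enabled via CSS. This assumes the standard hyphens property and the :lang() selector; the selectors are examples.

```css
/* Enable automatic hyphenation for paragraphs in German content. */
:lang(de) p {
    hyphens: auto;
}

/* Keep headings unhyphenated regardless of language. */
h1, h2, h3 {
    hyphens: manual;
}
```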

Supported Length Units

Absolute length units
Unit Description
mm millimeters
cm centimeters
q quarter-millimeters
in inches
pt points
px pixels
pc pica
Proprietary length units
Unit Description
-ro-pw Equal to 1% of the width of the first page, including its margins.
-ro-ph Equal to 1% of the height of the first page, including its margins.
-ro-pmin Equal to the smaller of '-ro-pw' and '-ro-ph'.
-ro-pmax Equal to the larger of '-ro-pw' and '-ro-ph'.
-ro-bw Equal to 1% of the width of the page bleed box of the first page.
-ro-bh Equal to 1% of the height of the page bleed box of the first page.
-ro-bmin Equal to the smaller of '-ro-bw' and '-ro-bh'.
-ro-bmax Equal to the larger of '-ro-bw' and '-ro-bh'.
Relative length units
Unit Description
% percent
em Relative to the font size of the element.
rem Relative to the font size of the root element.
ex Equal to the used x-height of the first available font.
ch Equal to the width of the "0" glyph in the font of the element.
vw Equal to 1% of the width of the content area of the first page.
vh Equal to 1% of the height of the content area of the first page.
vmin Equal to the smaller of 'vw' and 'vh'.
vmax Equal to the larger of 'vw' and 'vh'.
-ro-cap Equal to the capital letter height of the font.
-ro-ic Equal to the width of the glyph "水" (U+6C34) in the font of the element.
-ro-lh Equal to the line height of the element.
-ro-rlh Equal to the line height of the root element.
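
The following sketch contrasts the proprietary page-based units with the viewport units. The class names are hypothetical, and writing the number directly in front of the unit (as with any CSS dimension) is assumed.

```css
/* Spans the full first-page width and height, margins included. */
.cover-image {
    width: 100-ro-pw;
    height: 100-ro-ph;
}

/* Half the width of the content area of the first page. */
.half-column {
    width: 50vw;
}

/* Spacing of exactly one line of the element's own line height. */
p {
    margin-bottom: 1-ro-lh;
}
```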

CSS Color Keywords

Supported Color Keywords
Color name Color hex RGB Decimal
aliceblue #F0F8FF 240,248,255
antiquewhite #FAEBD7 250,235,215
aqua #00FFFF 0,255,255
aquamarine #7FFFD4 127,255,212
azure #F0FFFF 240,255,255
beige #F5F5DC 245,245,220
bisque #FFE4C4 255,228,196
black #000000 0,0,0
blanchedalmond #FFEBCD 255,235,205
blue #0000FF 0,0,255
blueviolet #8A2BE2 138,43,226
brown #A52A2A 165,42,42
burlywood #DEB887 222,184,135
cadetblue #5F9EA0 95,158,160
chartreuse #7FFF00 127,255,0
chocolate #D2691E 210,105,30
coral #FF7F50 255,127,80
cornflowerblue #6495ED 100,149,237
cornsilk #FFF8DC 255,248,220
crimson #DC143C 220,20,60
cyan #00FFFF 0,255,255
darkblue #00008B 0,0,139
darkcyan #008B8B 0,139,139
darkgoldenrod #B8860B 184,134,11
darkgray/darkgrey #A9A9A9 169,169,169
darkgreen #006400 0,100,0
darkkhaki #BDB76B 189,183,107
darkmagenta #8B008B 139,0,139
darkolivegreen #556B2F 85,107,47
darkorange #FF8C00 255,140,0
darkorchid #9932CC 153,50,204
darkred #8B0000 139,0,0
darksalmon #E9967A 233,150,122
darkseagreen #8FBC8F 143,188,143
darkslateblue #483D8B 72,61,139
darkslategray/darkslategrey #2F4F4F 47,79,79
darkturquoise #00CED1 0,206,209
darkviolet #9400D3 148,0,211
deeppink #FF1493 255,20,147
deepskyblue #00BFFF 0,191,255
dimgray/dimgrey #696969 105,105,105
dodgerblue #1E90FF 30,144,255
firebrick #B22222 178,34,34
floralwhite #FFFAF0 255,250,240
forestgreen #228B22 34,139,34
fuchsia #FF00FF 255,0,255
gainsboro #DCDCDC 220,220,220
ghostwhite #F8F8FF 248,248,255
gold #FFD700 255,215,0
goldenrod #DAA520 218,165,32
gray/grey #808080 128,128,128
green #008000 0,128,0
greenyellow #ADFF2F 173,255,47
honeydew #F0FFF0 240,255,240
hotpink #FF69B4 255,105,180
indianred #CD5C5C 205,92,92
indigo #4B0082 75,0,130
ivory #FFFFF0 255,255,240
khaki #F0E68C 240,230,140
lavender #E6E6FA 230,230,250
lavenderblush #FFF0F5 255,240,245
lawngreen #7CFC00 124,252,0
lemonchiffon #FFFACD 255,250,205
lightblue #ADD8E6 173,216,230
lightcoral #F08080 240,128,128
lightcyan #E0FFFF 224,255,255
lightgoldenrodyellow #FAFAD2 250,250,210
lightgray/lightgrey #D3D3D3 211,211,211
lightgreen #90EE90 144,238,144
lightpink #FFB6C1 255,182,193
lightsalmon #FFA07A 255,160,122
lightseagreen #20B2AA 32,178,170
lightskyblue #87CEFA 135,206,250
lightslategray/lightslategrey #778899 119,136,153
lightsteelblue #B0C4DE 176,196,222
lightyellow #FFFFE0 255,255,224
lime #00FF00 0,255,0
limegreen #32CD32 50,205,50
linen #FAF0E6 250,240,230
magenta #FF00FF 255,0,255
maroon #800000 128,0,0
mediumaquamarine #66CDAA 102,205,170
mediumblue #0000CD 0,0,205
mediumorchid #BA55D3 186,85,211
mediumpurple #9370DB 147,112,219
mediumseagreen #3CB371 60,179,113
mediumslateblue #7B68EE 123,104,238
mediumspringgreen #00FA9A 0,250,154
mediumturquoise #48D1CC 72,209,204
mediumvioletred #C71585 199,21,133
midnightblue #191970 25,25,112
mintcream #F5FFFA 245,255,250
mistyrose #FFE4E1 255,228,225
moccasin #FFE4B5 255,228,181
navajowhite #FFDEAD 255,222,173
navy #000080 0,0,128
oldlace #FDF5E6 253,245,230
olive #808000 128,128,0
olivedrab #6B8E23 107,142,35
orange #FFA500 255,165,0
orangered #FF4500 255,69,0
orchid #DA70D6 218,112,214
palegoldenrod #EEE8AA 238,232,170
palegreen #98FB98 152,251,152
paleturquoise #AFEEEE 175,238,238
palevioletred #DB7093 219,112,147
papayawhip #FFEFD5 255,239,213
peachpuff #FFDAB9 255,218,185
peru #CD853F 205,133,63
pink #FFC0CB 255,192,203
plum #DDA0DD 221,160,221
powderblue #B0E0E6 176,224,230
purple #800080 128,0,128
rebeccapurple #663399 102,51,153
red #FF0000 255,0,0
rosybrown #BC8F8F 188,143,143
royalblue #4169E1 65,105,225
saddlebrown #8B4513 139,69,19
salmon #FA8072 250,128,114
sandybrown #F4A460 244,164,96
seagreen #2E8B57 46,139,87
seashell #FFF5EE 255,245,238
sienna #A0522D 160,82,45
silver #C0C0C0 192,192,192
skyblue #87CEEB 135,206,235
slateblue #6A5ACD 106,90,205
slategray/slategrey #708090 112,128,144
snow #FFFAFA 255,250,250
springgreen #00FF7F 0,255,127
steelblue #4682B4 70,130,180
tan #D2B48C 210,180,140
teal #008080 0,128,128
thistle #D8BFD8 216,191,216
tomato #FF6347 255,99,71
turquoise #40E0D0 64,224,208
violet #EE82EE 238,130,238
wheat #F5DEB3 245,222,179
white #FFFFFF 255,255,255
whitesmoke #F5F5F5 245,245,245
yellow #FFFF00 255,255,0
yellowgreen #9ACD32 154,205,50
-ro-comment-highlight #FFFF0B 255,255,11
-ro-comment-underline #23FF06 35,255,6
-ro-comment-strikeout #FB0007 251,0,7
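
Each keyword is interchangeable with the hex and decimal RGB notations listed in the same row. A short sketch with arbitrary selectors:

```css
/* These three rules set the same color (see the "tomato" row). */
.warning-a { color: tomato; }
.warning-b { color: #FF6347; }
.warning-c { color: rgb(255, 99, 71); }

/* Keywords can be used wherever a color value is expected. */
.note {
    background-color: lightgoldenrodyellow;
    border: thin solid slategray;
}
```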

Counter and Ordered List Style Types

Supported counter and ordered list style types
Counter style name
[Table columns: sample markers for the values 1, 12, 123 and 1234]
decimal
decimal-leading-zero
super-decimal
upper-hexadecimal
lower-hexadecimal
octal
binary
upper-roman
lower-roman
upper-alpha
lower-alpha
arabic-indic
armenian
upper-armenian
lower-armenian
bengali
cambodian
devanagari
georgian
upper-greek
lower-greek
gujarati
gurmukhi
hiragana
hiragana-iroha
japanese-formal
japanese-informal
kannada
katakana
katakana-iroha