HTML to PDF using Docker Container

Use PDFreactor in containerized environments

About the technology to convert HTML to PDF: Docker

Docker is a technology to develop, deploy, and run applications in containers. A container image is a standalone, executable, and lightweight software package that includes everything an application needs to run. When a container image is run, it becomes a container, which is isolated from its environment, such as the host system or other containers.

Docker Advantages

The main advantages of running an application in a container, especially PDFreactor for converting HTML to PDF via Docker, are as follows:

  • Flexibility: Most applications (even complex ones) can be containerized and used as a microservice.
  • Shared Resources: Containers share the resources of the host system, and you can limit the resources available to each container.
  • Portability: Develop and build locally, deploy and run anywhere.
  • Easy maintenance: Install and deploy updates or upgrades on-the-fly.
  • Scalability: Increase or decrease the number of (cloned) container instances depending on your requirements.

Installing Docker

Installing Docker is as easy as downloading and installing one of the packages available at the Docker download page. More information on how to get started can be found in the Docker documentation. After installing Docker, it is very easy to deploy a PDFreactor container to convert HTML to PDF using Docker.

Using PDFreactor to convert HTML to PDF via Docker

PDFreactor is provided as a preconfigured, ready-to-run Docker image, which is available on Docker Hub. Using this image is the easiest way to convert HTML to PDF using Docker.

PDFreactor Docker Image

The PDFreactor Docker image is available on Docker Hub and you can pull it using the following command:

docker pull realobjects/pdfreactor

The PDFreactor Docker image is based on the Debian image available on Docker Hub. On top of this base image, we install Oracle Java and a preconfigured Jetty application server (version 9.4.x) containing all files required for the PDFreactor Web Service.

You can, of course, also create your own Docker image based on the PDFreactor Docker image by writing an appropriate Dockerfile and including other Docker images as needed.

Running the PDFreactor Docker Container

To start a container from the image, run the following command:

docker run -d -p 8080:9423 realobjects/pdfreactor

This will map the PDFreactor Web Service running in the Docker container on port 9423 to port 8080 of the host system. You can then access the PDFreactor Web Service on this port of your host system through any of the PDFreactor REST API wrappers, for example using the JavaScript wrapper:

var pdfReactor = new PDFreactor("http://yourhost.com:8080/service/rest");
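
A freshly started container may need a few seconds before the web service accepts requests. The following is a minimal sketch of a readiness check, not an official client feature. It assumes that the service answers a GET request on a "/status" path below the service URL with HTTP 200 once it is operational; that path, the function name "waitForService" and the retry defaults are assumptions for illustration and should be checked against the PDFreactor Web Service documentation.

```javascript
// Sketch: wait until the PDFreactor Web Service in the container responds.
// ASSUMPTION: GET <serviceUrl>/status returns HTTP 200 when the service is ready.
// fetchImpl is injectable so the function can be exercised without a network.
async function waitForService(serviceUrl, { retries = 10, delayMs = 1000, fetchImpl = fetch } = {}) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      const response = await fetchImpl(serviceUrl + "/status");
      if (response.ok) {
        return attempt; // number of attempts that were needed
      }
    } catch (err) {
      // Connection refused or similar: the container is probably still starting.
    }
    if (attempt < retries) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw new Error("PDFreactor Web Service not reachable at " + serviceUrl);
}
```

With the port mapping shown above, this could be called as waitForService("http://yourhost.com:8080/service/rest") before the first conversion is sent.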

PDFreactor Docker Orchestration: Two ways of converting HTML to PDF using Docker

In most cases, a containerized application or solution is used to create a highly available and redundant environment. This is also possible with PDFreactor in orchestration tools such as Kubernetes or Amazon Elastic Container Service with Elastic Load Balancing.

When using the PDFreactor Web Service there are two ways of converting HTML to PDF using Docker, which require different approaches in an orchestrated multi-node environment.

Synchronous conversion

Synchronous conversion is the most straightforward approach to convert HTML to PDF using Docker, as only one request is made to a PDFreactor node in your environment. You send the configuration and/or document to one of your PDFreactor nodes, and the response to this request contains the converted PDF. Requests can be distributed between nodes by a load balancer using, for example, round-robin or load-based allocation.
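
The single request/response cycle can be sketched as follows. This is an illustration under stated assumptions rather than the official client: it assumes that a POST to a "/convert" path below the service URL accepts a JSON configuration with a "document" property and returns a JSON result whose "document" field contains the PDF as base64. The function name "convertSync" is hypothetical.

```javascript
// Sketch: synchronous conversion with a single HTTP request.
// ASSUMPTIONS: POST <serviceUrl>/convert accepts a JSON configuration and
// returns a JSON result whose "document" field holds the PDF as base64.
async function convertSync(serviceUrl, config, fetchImpl = fetch) {
  const response = await fetchImpl(serviceUrl + "/convert", {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "application/json" },
    body: JSON.stringify(config),
  });
  if (!response.ok) {
    throw new Error("Conversion failed with HTTP status " + response.status);
  }
  const result = await response.json();
  // Decode the base64 payload into raw PDF bytes (Node.js Buffer).
  return Buffer.from(result.document, "base64");
}
```

Because the whole conversion is one request, serviceUrl can simply be the address of the load balancer; it does not matter which node handles the request.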

Asynchronous conversion

When using asynchronous conversion to convert HTML to PDF using Docker, multiple requests to one specific PDFreactor node are required (starting the conversion, querying the progress regularly, fetching the resulting PDF). To use the asynchronous endpoints in a highly available, redundant scenario, there are two approaches:
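
To make the three kinds of requests concrete, here is a minimal sketch of the flow against a single node, without a load balancer in between. The endpoint paths ("/convert/async", "/progress/<id>", "/document/<id>"), the use of the "Location" response header to obtain the conversion ID, and the "finished" and "document" field names are assumptions to be verified against the REST API documentation; "convertAsyncOnNode" is a hypothetical name.

```javascript
// Sketch: asynchronous conversion pinned to one node (start, poll, fetch).
// ASSUMPTIONS (verify against the REST API documentation):
//  - POST <nodeUrl>/convert/async starts a conversion and returns the
//    conversion ID as the last path segment of the "Location" header
//  - GET <nodeUrl>/progress/<id> returns JSON with a boolean "finished"
//  - GET <nodeUrl>/document/<id> returns JSON with a base64 "document"
async function convertAsyncOnNode(nodeUrl, config, { pollMs = 1000, fetchImpl = fetch } = {}) {
  const json = { Accept: "application/json" };

  // 1. Start the conversion.
  const start = await fetchImpl(nodeUrl + "/convert/async", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(config),
  });
  const location = start.ok ? start.headers.get("Location") : null;
  if (!location) throw new Error("Could not start conversion: HTTP " + start.status);
  const id = location.split("/").pop();

  // 2. Poll the same node until it reports that the conversion has finished.
  while (true) {
    const progressResponse = await fetchImpl(nodeUrl + "/progress/" + id, { headers: json });
    const progress = await progressResponse.json();
    if (progress.finished) break;
    await new Promise((resolve) => setTimeout(resolve, pollMs));
  }

  // 3. Fetch the finished document from the same node.
  const documentResponse = await fetchImpl(nodeUrl + "/document/" + id, { headers: json });
  const result = await documentResponse.json();
  return Buffer.from(result.document, "base64");
}
```

All three steps use the same nodeUrl. Behind a load balancer that is no longer guaranteed, which is the problem the two approaches address.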

Use a load balancer which relies on header/cookie based persistency.

The PDFreactor Web Service does not currently support this, however. Support will be added in PDFreactor 10 and is already available in the PDFreactor 10 beta release, which you can download here.

This feature lets you pass an empty “CommunicationSettings” object to the “convertAsync” method, which stores all header and cookie information from the response to the conversion request in it (e.g. session cookies or headers set by a load balancer). You can then use this CommunicationSettings object for all further requests regarding this specific conversion by passing it to, for example, the getProgress() or getDocument() method.
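
The underlying principle can be illustrated without the PDFreactor client. The sketch below is not the CommunicationSettings implementation; it is a generic wrapper around an HTTP function that remembers the cookie set on the first response and replays it on later requests. It assumes the load balancer uses a single stickiness cookie, and "createStickySession" is a hypothetical name.

```javascript
// Sketch of the principle only, not the PDFreactor client implementation:
// remember the cookie a load balancer sets on the first response and send
// it back on every follow-up request, so those requests reach the same node.
// Simplification: only a single cookie is handled.
function createStickySession(fetchImpl = fetch) {
  let cookie = null; // filled from the first response that sets a cookie
  return async function stickyFetch(url, options = {}) {
    const headers = { ...(options.headers || {}) };
    if (cookie) headers["Cookie"] = cookie;
    const response = await fetchImpl(url, { ...options, headers });
    const setCookie = response.headers.get("set-cookie");
    if (setCookie && !cookie) {
      // Keep only the "name=value" part, dropping attributes such as Path.
      cookie = setCookie.split(";")[0];
    }
    return response;
  };
}
```

One session would be created per conversion, and the start, progress and document requests for that conversion would all be sent through it.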

If you do not want to install the PDFreactor 10 Beta you could also use the PDFreactor 10 Web Service clients with your existing PDFreactor 9 installation to convert HTML to PDF using Docker, as they are compatible.

Use a shared document storage for all your PDFreactor instances.

This allows you to retrieve the document from any of the PDFreactor nodes, regardless of which node performed the actual conversion. The only drawback is that this works only for the “getDocument” method, once the conversion has finished. You cannot reliably retrieve the current progress of a conversion, as the progress is bound to the server the conversion is running on.

You can specify the path to the document storage using the server parameter “docTempDir”. Currently, only a file system path can be used, so you have to mount a shared disk into your PDFreactor containers.

As you cannot reliably poll the progress of a conversion in this setup, you can instead use the callback functionality of PDFreactor to trigger an endpoint in your application, e.g. to notify the user that the conversion has finished or to periodically report the progress of a conversion.
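
On the application side, such an endpoint could look like the following minimal sketch for Node.js. It assumes that the callback arrives as an HTTP POST with a JSON body; the exact payload depends on the callback type and is therefore handed to the application as parsed JSON without relying on specific field names. "handleCallback" and "startCallbackServer" are hypothetical names.

```javascript
const http = require("http");

// Sketch: endpoint in your application that PDFreactor can call back.
// ASSUMPTION: the callback arrives as an HTTP POST with a JSON body.
async function handleCallback(req, res, onEvent) {
  if (req.method !== "POST") {
    res.writeHead(405); // only POST is accepted
    res.end();
    return;
  }
  // Collect the request body.
  const chunks = [];
  for await (const chunk of req) chunks.push(chunk);

  let payload;
  try {
    payload = JSON.parse(Buffer.concat(chunks).toString("utf8"));
  } catch (err) {
    res.writeHead(400); // body was not valid JSON
    res.end();
    return;
  }
  onEvent(payload); // e.g. notify the user or update a progress bar
  res.writeHead(204);
  res.end();
}

// Hypothetical wiring: start a server that logs every callback it receives.
function startCallbackServer(port) {
  return http
    .createServer((req, res) =>
      handleCallback(req, res, (event) => console.log("PDFreactor callback:", event))
    )
    .listen(port);
}
```

The URL of this endpoint then has to be registered in the conversion configuration; the property names for doing so are described in the PDFreactor manual. The endpoint must be reachable from inside the PDFreactor containers.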

Example of PDFreactor Docker orchestration

Our PDFreactor Evaluation Cloud Service (available at https://cloud.pdfreactor.com) is set up on AWS using the Amazon Elastic Container Service with Elastic Load Balancing. If you want to try converting HTML to PDF using Docker in such an environment without having to set it up yourself, feel free to use the REST API or one of the PDFreactor client wrappers. Examples of how to use it from different languages can be found here.