HTML to PDF using Docker Container
Use PDFreactor in containerized environments
Docker is a technology for developing, deploying, and running applications in containers. A container image is a standalone, executable, and lightweight software package that includes everything an application needs to run. When a container image is run, it becomes a container, which is isolated from its environment, such as the host system or other containers.
The main advantages of running an application in a container are as follows:
PDFreactor is provided as a preconfigured, ready-to-run Docker image. Using this image is the easiest way to convert HTML to PDF using Docker.
The PDFreactor Docker image is available on Docker Hub and you can pull it using the following command:
docker pull realobjects/pdfreactor
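The PDFreactor Docker image is based on the Debian image available on Docker Hub. On top of this base image we install Oracle Java and a preconfigured Jetty application server (version 9.4.x) containing all required files for the PDFreactor Web Service.
To start a container in the background and publish the service's container port 9423 on port 8080 of the host, run:
docker run -d -p 8080:9423 realobjects/pdfreactor
A PDFreactor client can then connect to the Web Service through the published port:
var pdfReactor = new PDFreactor("http://yourhost.com:8080/service/rest");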
Further information on how to configure the PDFreactor Docker Container can be found here.
In most cases a containerized application is used to create a highly available and redundant environment. This is also possible with PDFreactor in orchestration tools such as Kubernetes or Amazon Elastic Container Service with Elastic Load Balancing.
When using the PDFreactor Web Service there are two ways of converting HTML to PDF using Docker, which require different approaches in an orchestrated multi-node environment.
Using synchronous conversion is the easiest and most straightforward approach to convert HTML to PDF using Docker, as only one request will be made to a PDFreactor node in your environment. You basically send the configuration and/or document to one of your PDFreactor nodes and the response of this request will contain the converted PDF. Distribution between nodes can be achieved through a load balancer using e.g. round-robin or load-based allocation.
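To illustrate round-robin allocation, here is a minimal sketch of how requests could be spread over several PDFreactor nodes. In practice a load balancer does this for you. The node URLs and the createRoundRobin helper are hypothetical and not part of any PDFreactor API; only the container port 9423 and the /service/rest path are taken from the commands above.

```javascript
// Hypothetical node addresses; replace them with your own PDFreactor containers.
const nodes = [
  "http://pdfreactor-1:9423/service/rest",
  "http://pdfreactor-2:9423/service/rest",
  "http://pdfreactor-3:9423/service/rest",
];

// Returns a function that hands out the given URLs in order and
// starts again from the first one after the last was used.
function createRoundRobin(urls) {
  if (urls.length === 0) {
    throw new Error("at least one node is required");
  }
  let next = 0;
  return function pick() {
    const url = urls[next];
    next = (next + 1) % urls.length;
    return url;
  };
}

const pickNode = createRoundRobin(nodes);
console.log(pickNode()); // http://pdfreactor-1:9423/service/rest
console.log(pickNode()); // http://pdfreactor-2:9423/service/rest
```

Because a synchronous conversion consists of a single request, it does not matter which node the picker returns for any given document.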
When using asynchronous conversion to convert HTML to PDF using Docker, multiple requests to one specific PDFreactor node are required (starting the conversion, getting the progress regularly, fetching the resulting PDF). To use the asynchronous endpoints in a highly available, redundant scenario, there are the following approaches:
This is, however, something the PDFreactor Web Service currently does not support. This will be improved in PDFreactor 10 and is already available in the PDFreactor 10 beta release, which you can download here.
The feature allows you to pass an empty “CommunicationSettings” object to the “convertAsync” method. All header and cookie information from the response to the convert request is saved in it (e.g. session cookies or headers set by a load balancer). This CommunicationSettings object can then be used for all further requests regarding this specific conversion by passing it to e.g. the getProgress() or getDocument() method.
If you do not want to use the PDFreactor 10 beta, you could also use the PDFreactor 10 Web Service clients with your existing PDFreactor 9 installation, as they are compatible.
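The underlying mechanism can be sketched independently of PDFreactor: cookies set by the load balancer in the first response are stored and sent back with every follow-up request, so that the load balancer routes those requests to the same node. The helper names below are hypothetical and are not the PDFreactor client API, which handles this internally. The cookie name AWSALB is only an example value, as used by AWS Application Load Balancers for sticky sessions.

```javascript
// Stands in for the settings object described above: it starts out empty
// and is filled from the response to the first request.
function createSessionSettings() {
  return { cookies: {} };
}

// Stores the name=value pair of each Set-Cookie header value.
// Attributes such as Path or Expires are ignored in this sketch.
function captureCookies(settings, setCookieHeaders) {
  for (const header of setCookieHeaders) {
    const pair = header.split(";")[0];
    const eq = pair.indexOf("=");
    if (eq > 0) {
      const name = pair.slice(0, eq).trim();
      settings.cookies[name] = pair.slice(eq + 1).trim();
    }
  }
}

// Builds the Cookie request header for all follow-up requests
// that belong to the same conversion.
function cookieHeader(settings) {
  return Object.entries(settings.cookies)
    .map(([name, value]) => `${name}=${value}`)
    .join("; ");
}

const settings = createSessionSettings();
captureCookies(settings, ["AWSALB=abc123; Path=/"]);
console.log(cookieHeader(settings)); // AWSALB=abc123
```

Each conversion should use its own settings object, so that the progress and document requests for one conversion do not end up on the node of another.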
This would allow you to retrieve the document from any of the PDFreactor nodes, regardless of which node the actual conversion ran on. The only drawback is that this only works for the “getDocument” method, once the conversion has finished. You would not be able to reliably retrieve the current progress of the conversion, as this is bound to the server the conversion is running on.
You can specify the path to the document storage using the server parameter “docTempDir”. Currently you can only use a file system path for this, so you would have to mount a shared disk into your PDFreactor containers.
As you are not able to reliably poll the progress of a conversion, you can instead use the callback functionality of PDFreactor to trigger an endpoint in your application, e.g. to notify the user that the conversion has finished or to periodically report the progress of a conversion.
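On the application side, such an endpoint mainly has to record the latest state of each conversion. The following is a minimal sketch of that bookkeeping without any HTTP server code. Note that the payload shape { documentId, progress, finished } and the handleCallback function are assumptions made for illustration; the actual callback body is described in the PDFreactor manual.

```javascript
// Last known state per conversion, keyed by document ID.
const conversions = new Map();

// Applies one callback notification to the tracking map and returns the new state.
// ASSUMPTION: the fields documentId, progress and finished are hypothetical.
function handleCallback(payload) {
  if (!payload || typeof payload.documentId !== "string") {
    throw new Error("callback payload has no documentId");
  }
  const previous = conversions.get(payload.documentId) || { progress: 0, finished: false };
  const state = {
    // Keep the previous progress if this notification does not carry one.
    progress: typeof payload.progress === "number" ? payload.progress : previous.progress,
    // Once a conversion is marked as finished, it stays finished.
    finished: previous.finished || payload.finished === true,
  };
  conversions.set(payload.documentId, state);
  return state;
}

handleCallback({ documentId: "doc-1", progress: 40 });
handleCallback({ documentId: "doc-1", finished: true });
console.log(conversions.get("doc-1")); // { progress: 40, finished: true }
```

Once a conversion is marked as finished, your application can fetch the PDF from any node that has access to the shared document storage.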
Our PDFreactor Evaluation Cloud Service (available at https://cloud.pdfreactor.com) is set up on AWS using the Amazon Elastic Container Service with Elastic Load Balancing. If you want to try converting HTML to PDF using Docker in such an environment, feel free to use the REST API or one of the PDFreactor client wrappers. Examples of how to use it from different languages can be found here.