Rendition Server examines incoming documents for any existing attachments, which can be present in emails, Word documents, PowerPoint presentations, and PDFs. Once identified, these attachments are extracted and included through the Work Item Handler for conversion, becoming additional pages in the resulting PDF.
The Work Item Handler supports the following file formats: pdf, doc, docx, rtf, ppt, pptx, xls, xlsx, odt, ods, odp, msg, eml, zip, rar, and 7z.
However, some attachments cannot be converted by Rendition Server, leading to the following error:
The request id="xxxxxxx-xxx-xxxx-xxx-xxxxxxx" contains elements that couldn't be converted to a PDF
To convert documents that contain attachments in unsupported formats, Rendition Server can discard attachments whose file names match a regular expression. To use this feature, create a strategy based on a Work Item Handler.
Automated Removal using <FileExtractionDiscard>:
- This parameter can be configured through the administration interface http://localhost/config/StrategyConfig/Edit. Select the preferred strategy.
- If the parameter is not present in the WorkItemHandler section, choose it from the parameter list and click "Create" on the right.
The default value removes Adobe Acrobat Distiller Job Configurations (.joboptions).
This default regular expression has the following components:
- "|" separates individual search terms and functions as "or".
- "^" indicates that the file name must begin with the character string that follows.
- "$" indicates that the file name must end with the preceding character string.
- \.joboptions$| matches <*.joboptions>: configuration files containing settings from Acrobat. These files were already filtered by default before Release 3.5.1538.
- ^(pageEntities|PrintMetaData).json$| matches <*pageEntities.json> or <PrintMetaData.json>: files with metadata from mobile apps, often from the iOS environment. These files were already filtered by default before Release 3.5.1538.
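To see how these components interact, the following minimal Python sketch checks a few file names against the two default terms joined with "|". It is an illustration only: the helper name `is_discarded` and the sample file names are hypothetical, and it assumes the server searches for a match anywhere in the attachment name, which may differ from how Rendition Server evaluates the expression internally.

```python
import re

# The two default terms described above, joined with "|" (alternation).
# Assumption: a match anywhere in the name counts, as with re.search.
DEFAULT_DISCARD = r"\.joboptions$|^(pageEntities|PrintMetaData).json$"


def is_discarded(name: str, pattern: str = DEFAULT_DISCARD) -> bool:
    """Return True if the attachment name matches the discard pattern."""
    return re.search(pattern, name) is not None


# Hypothetical attachment names used only for illustration.
samples = [
    "Standard.joboptions",
    "pageEntities.json",
    "PrintMetaData.json",
    "report.json",
    "joboptions.txt",
]

for name in samples:
    print(f"{name}: {'discard' if is_discarded(name) else 'keep'}")
```

In this sketch, "$" keeps `joboptions.txt` from matching the first term, and "^" keeps `report.json` from matching the second.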
The following examples show how to add more file names to the list:
- ^Cx{20}GVService.Intern$| matches <CxxxxxxxxxxxxxxxxxxxxGVService.Intern>: a binary file found in various public administration documents.
- ^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}.pdf$ matches attachments in GMM documents: "problematic" attachments, XML, or HTML contents in PDF files. In the provided examples, these attachments are always named GUID.pdf, for example, <af88bf31-14a4-4b32-8dc4-30ce37491122.pdf>.
- \.exe$ matches *.exe: filters out executables from attachments.
- \.ps7$ matches *.ps7: filters out attached signatures.
- \.doc$ matches *.doc: filters out Word binaries whose processing is blocked by Trust Center settings in Word itself.
As the value of the <FileExtractionDiscard> parameter, the regular expression for this extended discard list would look like this:
\.joboptions$|^(pageEntities|PrintMetaData).json$|^Cx{20}GVService.Intern$|^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}.pdf$|\.exe$|\.ps7$|\.doc$
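Before saving a new value in the administration interface, it can help to try the expression against sample attachment names. The Python sketch below joins the individual terms with "|" and sorts a list of names into kept and discarded. The function `split_attachments` and the sample names are hypothetical. The sketch also assumes case-sensitive, search-style matching, which may not reflect the regular expression engine used by Rendition Server.

```python
import re

# Terms copied from the examples above; joining them with "|" yields
# the combined value shown for <FileExtractionDiscard>.
TERMS = [
    r"\.joboptions$",
    r"^(pageEntities|PrintMetaData).json$",
    r"^Cx{20}GVService.Intern$",
    r"^[0-9a-f]{8}-([0-9a-f]{4}-){3}[0-9a-f]{12}.pdf$",
    r"\.exe$",
    r"\.ps7$",
    r"\.doc$",
]
COMBINED = "|".join(TERMS)
DISCARD = re.compile(COMBINED)


def split_attachments(names):
    """Split attachment names into (kept, discarded) lists."""
    kept, discarded = [], []
    for name in names:
        # Assumption: a match anywhere in the name means "discard".
        (discarded if DISCARD.search(name) else kept).append(name)
    return kept, discarded


# Hypothetical attachment names used only for illustration.
names = [
    "af88bf31-14a4-4b32-8dc4-30ce37491122.pdf",
    "invoice.pdf",
    "setup.exe",
    "letter.doc",
    "letter.docx",
    "Standard.joboptions",
]

kept, discarded = split_attachments(names)
print("kept:", kept)
print("discarded:", discarded)
```

Because \.doc$ is anchored at the end of the name, `letter.docx` is kept while `letter.doc` is discarded. Only the GUID-named PDF is discarded, and `invoice.pdf` is kept.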
For more information, please contact rendition_server@foxitsoftware.com