Dompdf and special characters
When generating dynamic PDF documents in PHP I normally use the excellent TCPDF library, but in my last project I wanted to try out another library called dompdf.
From the dompdf website:
dompdf is an HTML to PDF converter. At its heart, dompdf is (mostly) CSS2.1 compliant HTML layout and rendering engine written in PHP. It is a style-driven renderer: it will download and read external stylesheets, inline style tags, and the style attributes of individual HTML elements. It also supports most presentational HTML attributes.
So dompdf basically makes it possible to create a page layout with good ol' HTML and CSS and convert it directly to PDF on the fly. This is extremely cool, both because HTML and CSS are perfect for this kind of thing and because it makes a great template system for your PDF files, one that is easy and painless to edit.
But...
When I was about 70% done with the task, I discovered a nasty problem with dompdf. When using special characters like the Danish Æ, Ø and Å, something goes totally wrong with the calculation of character widths, resulting in a lot of messed-up text that can't be read. To demonstrate the problem, I have made a small PHP script:
<?php
require_once "dompdf/dompdf_config.inc.php";

// Build a small HTML page: one paragraph with plain ASCII text and one
// with Danish special characters, both in a fixed-width bordered box.
$html = "<html>" .
    "<head>" .
    "<meta http-equiv='content-type' content='text/html; charset=UTF-8' />" .
    "<style type='text/css'>" .
    "  body {" .
    "    font-family: \"Verdana\";" .
    "    font-size: 12px;" .
    "  }" .
    "  p {" .
    "    width: 512px;" .
    "    border: 1px dotted red;" .
    "  }" .
    "</style>" .
    "</head>" .
    "<body>" .
    "<h2>Text without special characters</h2>" .
    "<p>Nunc ut nibh non nulla vulputate placerat non quis leo. Donec varius, felis vel placerat suscipit, libero eros lacinia nulla, ut interdum magna urna eleifend eros. Duis nec porttitor arcu. Integer vitae est adipiscing nisl mollis gravida. Proin metus lorem, ullamcorper vitae eleifend et, vehicula non elit. Donec dignissim mollis rutrum. Nunc vitae neque non nisl placerat sodales. Etiam accumsan diam id lectus hendrerit iaculis. Cras hendrerit arcu in libero euismod ac viverra ante euismod. Nullam felis tellus, dapibus dictum aliquam nec, posuere ut neque. Fusce nec ipsum velit. Nam quam orci, accumsan in tempus non, bibendum eu diam. Integer nec pulvinar velit. Nunc laoreet diam ante. Ut vitae nisl tortor. Phasellus est sapien, mollis sit amet tincidunt vel, eleifend posuere leo. Aenean nec orci mi. Nam diam tellus, egestas et consectetur vulputate, elementum ac lectus. Duis ac sem at elit facilisis pulvinar.</p>" .
    "<h2>Text with special characters</h2>" .
    "<p>Væstibulum vænønåtis erås et velit imperdiæt ut sållicætudin elit ornære. Mæcenæs egåt augue urnø. Aliquam rhoncus viverra blandit. Donæc imperdiet leo porttitor læctus frångilla quis porttitår læctus vestibulum. Vestibulåm pretium ullæmcårper døgnissim. Cras ut ærat non turpås porttitor commodo. Nunc euismod mættis tårtør quis fringilla. Integår viværrå, eros vel ælæmentåm feugiat, tellus mi pållentæsque nøsi, a dictum nibh magna nec ipsum. Morbi mauris dui, consectetur vitae fermentum vel, lacinia condimentum neque. Etiam faucibus, libero et pretium semper, orci justo ornare risus, id scelerisque purus risus eu arcu. " .
    "In hac håbitæsse platæa dictåmst. Væstibulåm cøndimæntåm, læctus egøt cøndimæntum løbortæs, lacås ipsåm færmentåm auguøe, vitæ pølvinår lacøs dui sed est. Sed et nisl vel mægna laoræt rutråm quis id mægnæ. Donæc vænenåtis vålputæte påsuære. Sed in tøllæs non løråm dægniæsim hændrerit søt amåt in pærås.</p>" .
    "</body>" .
    "</html>";

// Render the HTML with dompdf and stream the result to the browser.
$pdf = new DOMPDF();
$pdf->load_html($html);
$pdf->render();
$pdf->stream("file.pdf");
When looking at the generated file, the first paragraph with no special characters renders just as it's supposed to, but in the second paragraph with special characters the letters start to collide, and the width of the container is no longer respected.
This was almost a show-stopper because we needed to render Danish text, but after a couple of hours of research, I luckily found out how to fix the problem.
The fix
The first thing that needs to be done is to generate a set of custom font files using the ttf2ufm program inside dompdf/lib/ttf2ufm. The program is a Windows executable (.exe), so you need access to a Windows machine if you're not already using one. The program needs a special .map file, which can be taken from the TCPDF library, inside the tcpdf/fonts/utils/enc folder. You need to take the file with the correct ISO standard for the language you intend to use. For Danish and most other Western European languages, you need iso-8859-1.map, but have a look at the Wikipedia article on the ISO/IEC 8859 standard to see which file you need.
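Not sure whether iso-8859-1.map covers your language? Since ISO-8859-1 corresponds exactly to the first 256 Unicode code points (U+0000 to U+00FF), you can check a sample of your text in PHP. The chars_outside_latin1() function below is a hypothetical helper, not part of dompdf or TCPDF, and it assumes your text is UTF-8 encoded:

```php
<?php
// Hypothetical helper (not part of dompdf or TCPDF): returns the unique
// characters in a UTF-8 string that ISO-8859-1 cannot represent,
// i.e. every code point above U+00FF.
function chars_outside_latin1(string $text): array
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8,
    // so the negated class matches whole characters, not single bytes.
    preg_match_all('/[^\x{0000}-\x{00FF}]/u', $text, $matches);

    // Drop duplicates and reindex so the result is a plain list.
    return array_values(array_unique($matches[0]));
}

$danish = "Væstibulum vænønåtis erås et velit";
$polish = "Zażółć gęślą jaźń";

foreach (['Danish' => $danish, 'Polish' => $polish] as $label => $sample) {
    $missing = chars_outside_latin1($sample);
    echo $label, ': ',
        $missing ? 'needs another map (' . implode(' ', $missing) . ')'
                 : 'iso-8859-1.map is enough',
        "\n";
}
```

If the list is not empty, you need a different map file; for Polish that would presumably be an ISO-8859-2 map, if your copy of TCPDF ships one.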
Open a command prompt, navigate to the directory containing ttf2ufm and execute the following commands:
> ttf2ufm -L iso-8859-1.map C:\WINDOWS\fonts\arial.ttf Arial
> ttf2ufm -L iso-8859-1.map C:\WINDOWS\fonts\arialbd.ttf ArialBold
> ttf2ufm -L iso-8859-1.map C:\WINDOWS\fonts\arialbi.ttf ArialBoldItalic
> ttf2ufm -L iso-8859-1.map C:\WINDOWS\fonts\ariali.ttf ArialItalic
In the example above, I generate the files needed to use the Arial font, but you can of course use any other TrueType font instead. The procedure is the same.
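If you convert several fonts, it can be handy to generate the commands instead of typing them. The sketch below is hypothetical: ttf2ufm_commands() only builds the same "ttf2ufm -L" command lines as above from a list of font files. It assumes PHP is installed on the Windows machine and that ttf2ufm and iso-8859-1.map are in the current directory; actually running the commands is left commented out.

```php
<?php
// Hypothetical helper: build one ttf2ufm command per font variant, using
// the same "-L <map> <ttf> <output name>" form as the commands above.
// Paths containing spaces would need quoting; these do not.
function ttf2ufm_commands(string $map, string $fontDir, array $variants): array
{
    $commands = [];
    foreach ($variants as $ttf => $outputName) {
        $commands[] = sprintf('ttf2ufm -L %s %s %s', $map, $fontDir . '\\' . $ttf, $outputName);
    }
    return $commands;
}

// TrueType file => base name of the generated afm/ufm/t1a files.
$variants = [
    'arial.ttf'   => 'Arial',
    'arialbd.ttf' => 'ArialBold',
    'arialbi.ttf' => 'ArialBoldItalic',
    'ariali.ttf'  => 'ArialItalic',
];

foreach (ttf2ufm_commands('iso-8859-1.map', 'C:\\WINDOWS\\fonts', $variants) as $command) {
    echo $command, "\n";
    // Uncomment to actually run the conversion (Windows only):
    // passthru($command);
}
```

To convert another font family, swap the entries in $variants for that family's file names and output names.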
You should now see a bunch of afm, ufm and t1a files at the same location as the ttf2ufm program. Copy all the afm and ufm files over to the dompdf fonts folder.
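Copying the files can also be scripted. Here is a minimal sketch with a hypothetical copy_font_metrics() function that copies every .afm and .ufm file (but not the .t1a files) from one folder to another. The example paths in the commented-out call are assumptions; point them at the folder where ttf2ufm wrote its output and at your dompdf fonts folder.

```php
<?php
// Hypothetical helper: copy every .afm and .ufm file from $sourceDir into
// $fontDir. The .t1a files are left where they are.
// Returns the sorted list of copied file names.
function copy_font_metrics(string $sourceDir, string $fontDir): array
{
    // glob() may return false on error, so fall back to an empty list.
    $files = array_merge(
        glob($sourceDir . DIRECTORY_SEPARATOR . '*.afm') ?: [],
        glob($sourceDir . DIRECTORY_SEPARATOR . '*.ufm') ?: []
    );

    $copied = [];
    foreach ($files as $file) {
        $target = $fontDir . DIRECTORY_SEPARATOR . basename($file);
        if (!copy($file, $target)) {
            throw new RuntimeException("Could not copy $file to $target");
        }
        $copied[] = basename($file);
    }

    sort($copied);
    return $copied;
}

// Example (paths are assumptions, adjust them to your setup):
// print_r(copy_font_metrics('dompdf/lib/ttf2ufm', 'dompdf/lib/fonts'));
```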
The next thing that needs to be done is to open the file dompdf_font_family_cache.dist inside the dompdf/lib/fonts folder. This file tells the library which files to use when rendering fonts. Add the following entry directly after the opening "array (":
'arial' => array (
  'normal' => DOMPDF_FONT_DIR . 'Arial',
  'bold' => DOMPDF_FONT_DIR . 'ArialBold',
  'italic' => DOMPDF_FONT_DIR . 'ArialItalic',
  'bold_italic' => DOMPDF_FONT_DIR . 'ArialBoldItalic'
),
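The array values have no file extension, so dompdf presumably appends one itself when it loads the font metrics. Assuming it looks for an .afm and a .ufm file next to each base name, the hypothetical missing_font_files() check below lists any files that are missing before you start rendering. The DOMPDF_FONT_DIR fallback is also an assumption; in a real setup the constant comes from dompdf_config.inc.php.

```php
<?php
// DOMPDF_FONT_DIR is normally defined by dompdf_config.inc.php; this
// fallback value is only an assumption so the sketch runs on its own.
if (!defined('DOMPDF_FONT_DIR')) {
    define('DOMPDF_FONT_DIR', __DIR__ . '/dompdf/lib/fonts/');
}

// Hypothetical check: for every style in a font-family entry, report the
// metric files (base name + extension) that do not exist on disk.
function missing_font_files(array $family, array $extensions = ['afm', 'ufm']): array
{
    $missing = [];
    foreach ($family as $style => $basePath) {
        foreach ($extensions as $ext) {
            $file = $basePath . '.' . $ext;
            if (!is_file($file)) {
                $missing[] = "$style: $file";
            }
        }
    }
    return $missing;
}

// Same entries as the 'arial' array above.
$arial = [
    'normal'      => DOMPDF_FONT_DIR . 'Arial',
    'bold'        => DOMPDF_FONT_DIR . 'ArialBold',
    'italic'      => DOMPDF_FONT_DIR . 'ArialItalic',
    'bold_italic' => DOMPDF_FONT_DIR . 'ArialBoldItalic',
];

foreach (missing_font_files($arial) as $problem) {
    echo "Missing ", $problem, "\n";
}
```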
The last thing to do is to replace cpdf_adapter.cls.php and class.pdf.php with the ones from revision 191 of the dompdf project.
The result
After doing all of the above and changing the CSS to use the Arial font, the rendered PDF file looks much better! No more colliding letters and the width of the parent is respected.
To make life easier for you, I have made a version of dompdf with all the changes ready for download here. This version should be able to render special characters, but remember to set the font to Arial in your CSS.
January 4th, 2010 - 12:15
Thanks for your post! :) I was looking for a dompdf version already prepared for special characters with the Arial font, and I've finally found it. I tried to generate my own Arial font after reading how to do it on other blogs, but I had no success.
So now I'm glad I found your post, but...
I think there's a problem with Arial bold. When I open my dompdf-generated PDF, Adobe Reader says "The font 'ArialBold' contains a bad /BBox" and bold text appears as normal text.
Any idea why this is happening?
Thank you!
January 6th, 2010 - 09:47
I'm glad you found something you could use on my site! :)
Yes, that's right, there seems to be a problem with the bold font. I hadn't noticed it until now because I don't get that error when opening the generated file in Preview on the Mac. I'll try playing around with the different parameters for the ttf2ufm program and see if I can figure something out.
January 13th, 2010 - 16:56
Thank you Michael!
Would be great to hear from you again with the solution! :)
January 20th, 2010 - 09:25
Hi,
When I tried this version of dompdf, the text was rendered in Arial.
However, I don't know why the images are not shown on the page.
A small red cross appears where each image should be.
With the earlier version, i.e. dompdf-0.5.1, I could see the images properly.
Please help.
January 23rd, 2010 - 12:16
Hi Rash,
That sounds like a really strange problem! I just ran a test with my modified library, but I was unable to reproduce the problem you describe. I added a simple PNG image, and it rendered without any problems. Maybe the problem lies in your HTML? You're welcome to send me your script and I'll take a look at it.