THE RANT /
THE SCHPLOG
Schmorp's POD Blog a.k.a. THE RANT
a.k.a. the blog that cannot decide on a name

This document was published 2015-07-29 08:33:05, and since then has not been materially modified.

POD =encoding Woes

Every few months, usually for a different module, I get mail from somebody that goes like this:

Other: MetaCPAN (or something else, it's usually MetaCPAN) displays your documentation wrongly, because you don't have an =encoding directive in your POD but use UTF-8 characters. You need to add =encoding utf-8 to fix this.

Me: This is actually a bug in MetaCPAN, you should report this.

More recently, there is a frequent continuation:

Other: Oh man, just slap the encoding directive in for the sake of it, you don't hurt anybody with it, do you?

Me: The =encoding directive is relatively new, and many POD parsers (especially in older Perls) stop processing the document when they hit it, leading to missing man pages. It's preferable to have some garbled characters on a broken web service than no documentation in older perls.

And that was it then, usually.

Why is it OK to use UTF-8 in POD?

Well, if you think about it, the only non-ASCII encoding of any relevance in POD these days is UTF-8, and this probably has always been the case. Not supporting UTF-8 by default these days when you can is just insane.

Apparently, the people who authored the perlpodspec manpage in perl 5.22 (and earlier perls) thought the same. Not only is =encoding entirely optional, UTF-8 is one of the supported default encodings:

[If there is no byte order mark] the character encoding should be
understood as being UTF-8 if the first highbit byte sequence in the
file seems valid as a UTF-8 sequence, or otherwise as CP-1252 (earlier
versions of this specification used Latin-1 instead of CP-1252).

I don't know of any more "official" specification of POD than perlpodspec. Incidentally, perlpod recommends =encoding (something I definitely agree with), but only so that the formatting module doesn't have to guess.
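
To make the quoted rule concrete, here is a minimal sketch of the guess in Python. It assumes the input has neither a byte order mark nor an =encoding directive, and the function name guess_pod_encoding is made up for illustration. It is not how any particular POD formatter implements the check, only one reading of the paragraph quoted above.

```python
def guess_pod_encoding(data: bytes) -> str:
    """Guess the encoding of POD source that has no BOM and no =encoding line.

    Hypothetical helper: one reading of the perlpodspec rule, not a real API.
    """
    # Find the first byte with the high bit set (the first non-ASCII byte).
    start = next((i for i, b in enumerate(data) if b >= 0x80), None)
    if start is None:
        return "ascii"  # pure ASCII, so the guess does not matter

    # A non-ASCII character in UTF-8 is 2 to 4 bytes long. Check whether a
    # valid one starts at the first high-bit byte.
    for length in (2, 3, 4):
        try:
            data[start:start + length].decode("utf-8")
            return "utf-8"
        except UnicodeDecodeError:
            continue

    # Not valid UTF-8, so fall back to CP-1252 as the spec says.
    return "cp1252"
```

A contributor name stored as UTF-8 comes out as "utf-8", the same name stored as CP-1252 comes out as "cp1252", and neither case needs an =encoding line for the guess to be right.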

The Plot Thickens

Recently, however, I was a victim of overzealous bad parsing myself: I have POD correctness tests based on Test::Pod for almost all my modules, which I use to check documentation for obvious errors before release - and after upgrading to stableperl 5.22, all of them started failing for POD documentation containing UTF-8 characters (usually names of people who contribute, as most contributions come from Europe or the Far East these days).

And it seems MetaCPAN has followed suit, as it now displays "1 POD Error" in big red letters for those modules.

The reason seems to be that Pod::Simple emits a warning when you use UTF-8 characters without =encoding utf-8, and Test::Pod erroneously treats this warning as a bug in the POD file.
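
The difference between "the parser warned" and "the document is broken" can be sketched in a few lines of Python. Everything here is hypothetical: encoding_warnings and pod_test_passes are made-up names, the check is a simplification (any non-ASCII byte in a file with no =encoding line), and none of it is the actual Pod::Simple or Test::Pod code.

```python
import re

# Matches a POD "=encoding NAME" command at the start of a line.
ENCODING_LINE = re.compile(rb"^=encoding[ \t]+\S+", re.MULTILINE)


def encoding_warnings(pod: bytes) -> list[str]:
    """Return warnings (not errors) about undeclared non-ASCII input."""
    has_highbit = any(b >= 0x80 for b in pod)
    if has_highbit and not ENCODING_LINE.search(pod):
        return ["non-ASCII characters seen, but no =encoding directive"]
    return []


def pod_test_passes(pod: bytes, warnings_are_fatal: bool) -> bool:
    """Hypothetical test verdict: fail on warnings only when asked to."""
    return not (warnings_are_fatal and encoding_warnings(pod))
```

With warnings_are_fatal=False, a UTF-8 document without =encoding still passes, which is the behaviour I would expect from a correctness test. With warnings_are_fatal=True, the same valid document fails.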

I reported this bug against Test::Pod, and the reaction by David Wheeler (apparently the maintainer nowadays) stunned me: I should just not run POD tests in my modules, or simply ignore the error. Whatever Pod::Simple does is officially what POD does, and even where the perl POD specification disagrees with his module, his module is right anyway.

Great attitude, but beside the point - explaining that the main problem is Test::Pod treating perfectly valid (and correctly understood and parsed) POD as erroneous didn't help. I am not even sure he understood the problem, but in any case, whatever Pod::Simple does is the holy grail, and the spec and other formatters be damned, backwards compatibility with older perls and other formatters likewise.

And now?

I had to remove POD tests from my modules so my automated installs and testing would work again, which is a great loss, I think. I am also considering simply maintaining my own version of Pod::Simple and/or Test::Pod, but that is of course not something to decide on lightly - I only do this for modules that are effectively dead and unmaintained, but critical for legacy reasons (the full list at this time is fortunately very short: XML::Sablotron and Digest::MD6).

So... well, sucks, but that's it.