Fixing the POD synopsis in OSX – take 2 (perldoc, nroff and UTF-8)

Ever copied and pasted a chunk from perldoc output and found you were getting mysterious errors from perl? I have.

I’ve learnt to retype the ‘-’ characters, because although they look like ASCII ‘-’ characters they’re really a Unicode HYPHEN: U+2010. Some other chars get mangled too, but that’s the most frequent problem for me.
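If retyping gets tedious, the substitution can be undone mechanically. This is only a minimal sketch: the function name fix_hyphens is made up for illustration, and it handles U+2010 only, not the other mangled characters.

```shell
# U+2010 HYPHEN is the three bytes E2 80 90 in UTF-8 (octal 342 200 220).
unicode_hyphen=$(printf '\342\200\220')

# Replace every U+2010 on stdin with a plain ASCII hyphen.
# LC_ALL=C makes sed treat the input as raw bytes, whatever the locale.
fix_hyphens() {
    LC_ALL=C sed "s/$unicode_hyphen/-/g"
}

# Possible uses (file names are placeholders):
#   fix_hyphens < pasted.pl > clean.pl
#   pbpaste | fix_hyphens | pbcopy      # OS X clipboard
```

This treats the symptom after the fact; the rest of this post is about stopping the substitution from happening in the first place.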

So I was delighted to see a blog post by Marcus Ramberg called Fixing the POD synopsis in OSX, wherein he fingers nroff as being the problem and gives a simple solution:

alias perldoc='perldoc -t'

Trouble is, using perldoc -t means you lose the nice bold text that nroff gives you. So I went digging...

It seems the problem only affects people using UTF-8, and that nroff has a -T option that lets you specify the output encoding to use. So if perldoc ran 'nroff -Tascii' instead of plain 'nroff', that would avoid the hyphen problem and let us keep the bold text.

It turns out that perldoc has an option to specify the nroff command to use, so the solution is simple:

alias perldoc="perldoc -n 'nroff -Tascii'"
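A quick way to check whether -Tascii makes a difference on a given machine is to count the U+2010 byte sequences in nroff's output with and without it. This is a rough sketch: count_unicode_hyphens and the sample page are invented for the test, and the counts will depend on the local groff version and locale.

```shell
# Count occurrences of U+2010 (UTF-8 bytes E2 80 90) on stdin.
count_unicode_hyphens() {
    LC_ALL=C grep -o "$(printf '\342\200\220')" | wc -l | tr -d ' '
}

# A tiny made-up man page source containing an ordinary hyphen.
sample='.TH DEMO 1
.SH NAME
demo \- a hyphen-ated example'

# Only run the comparison where nroff is actually installed.
if command -v nroff >/dev/null 2>&1; then
    printf 'default device: '
    printf '%s\n' "$sample" | nroff -man | count_unicode_hyphens
    printf 'with -Tascii:   '
    printf '%s\n' "$sample" | nroff -man -Tascii | count_unicode_hyphens
fi
```

If the first count is non-zero and the second is zero, the alias above should be doing its job.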

This is, of course, still a hack. My main worry is that pod docs using non-ASCII characters may get mangled. A much better fix would be to arrange for the ASCII characters to not get mapped to Unicode at all. So I went digging again...

nroff calls groff -m tty-char ... and tty-char refers to a tty-char.tmac file that defines the character mappings. The groff man pages point me to groff_tmac man pages which tell me I can get groff to look for .tmac files elsewhere by passing a -Mdir option or setting the GROFF_TMAC_PATH environment variable.

I looked at the default file, /usr/share/groff/1.19.2/tmac/tty-char.tmac on my Mac, and... decided it was time to go to sleep! The formatting is probably simple enough but I'm out of tuits.

So, what's needed is for someone to determine what change is needed to the tty-char.tmac file, or the files it refers to, to avoid the unwanted conversions to Unicode. Then put the modified file into a directory and either add a -Mdir option to the nroff alias above, or set the GROFF_TMAC_PATH environment variable. Setting the env var has the benefit of 'fixing' all the man pages as well.
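The plumbing for that, minus the actual edit, might look something like the sketch below. The override directory and the stage_tmac_override helper are made up for illustration, the stock path is the one from my Mac, and the copied file still has to be modified by hand (how is exactly the open question). It also assumes the directory name contains no spaces.

```shell
# Hypothetical override directory; any directory you control will do.
override_dir="${TMAC_OVERRIDE_DIR:-$HOME/.groff/tmac}"

# Copy a stock macro file ($1) into a directory ($2) so it can be edited
# without touching the system copy.
stage_tmac_override() {
    mkdir -p "$2" && cp "$1" "$2/"
}

# Path as found on my Mac; the version directory will differ elsewhere.
stock=/usr/share/groff/1.19.2/tmac/tty-char.tmac
if [ -f "$stock" ]; then
    stage_tmac_override "$stock" "$override_dir"
fi

# Option 1: the env var, which affects everything that runs groff.
export GROFF_TMAC_PATH="$override_dir"

# Option 2: pass -Mdir via the perldoc alias, which affects perldoc only.
alias perldoc="perldoc -n 'nroff -M$override_dir'"
```

Only one of the two options is needed. An unmodified copy changes nothing, so this is only useful once the right edit to the .tmac file is known.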

So, anyone want to dig deeper? (For all I know the solution can already be found on google...)

7 thoughts on “Fixing the POD synopsis in OSX – take 2 (perldoc, nroff and UTF-8)”

  1. The *roff code for a hyphen is \-. I can’t see that mentioned anywhere in that file. I have a sneaky suspicion that it’s a “builtin” thing…

    Looking in /usr/share/groff/1.19.2/font/devascii/R, I can see a line:

    \- 24 0 0055

    According to the docs, that means that when it sees \- on input, it gets mapped to octal 55 (or a hyphen). So that’s not it either.

    Just out of interest, what is the output of locale for you? I wonder if that may be affecting things somehow…

  2. I added the following to .bashrc at some stage but have lost track
    of the details.

    export MANOPT='-E ascii' # avoid apostrophes going unicode

    • Interesting, but I don’t see a MANOPT env var or a -E option for man on Linux (CentOS 5) or OS X (Leopard).
      /etc/man.config defines the args passed to nroff etc., and that has some comments re character sets.
