Andrew Stacey



By: Andrew Stacey


Mon, 11th Apr 2011 (HowDidIDoThat :: LaTeX)

LaTeX to Markdown+iTeX Conversion

LaTeX2iTeX is a Perl script that converts LaTeX code to Markdown+iTeX (more precisely, the flavour of Markdown is Maruku). It was written to make it easier to create nLab pages, but it could be used in other circumstances, and could be adapted into a full conversion from LaTeX code to XHTML.

The underlying engine is an implementation in Perl of a subset of the TeX program, packaged as a Perl module called Text::TeX. Not all of TeX is implemented, which limits the documents that can be converted.


Downloads

  • Text::TeX: perl module required for the program,
  • latex2itex: perl program that does the conversion,
  • latex.sty: definitions of some necessary LaTeX commands,
  • itex.sty: the special iTeX and Markdown commands,
  • PerlTeX: the whole lot as a single tar file.

The files should be installed according to the following scheme:

./latex2itex.pl
./latex.sty
./itex.sty
./Text/TeX.pm

To run the program (on a Unix system), simply type:

./latex2itex.pl < texfile.tex > output

Notes on the Implementation

This program attempts to implement a subset of TeX. It does not implement everything. Since the purpose is conversion from TeX to a markup format, the design of the program has been focussed on that goal. Other aspects may be implemented later.

Writing an Input File

The idea behind this program is that a user can write a page for, say, the nLab using the convenience of macros in ordinary TeX. A by-product of this is that some (La)TeX documents will be suitable for conversion to Markdown+iTeX via this program. However, the best way to use this program is to write the document knowing that it will be converted by this program and thus only use commands recognised by it. (Of course, one can define new commands but the base commands should be those listed below.)

In particular, importing a standard package file is almost certainly not going to work. (Indeed, at present, the \usepackage command is not implemented, though the \input primitive is.) The issue is less about the input than the output: there is no way to know automatically what the correct output should be for the commands from an arbitrary package.
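In practice this means defining whatever is needed directly in the document rather than loading a package. The following is a minimal sketch, assuming that \newcommand, \emph and \textbf behave as described in the lists below; the macro names \nlab and \defn are made up for illustration and are not part of the program.

```latex
% Hypothetical macros, defined in the document itself instead of via \usepackage.
\newcommand{\nlab}{\emph{n}Lab}      % shorthand for the site name
\newcommand{\defn}[1]{\textbf{#1}}   % mark a term that is being defined

A \defn{groupoid} is a category in which every morphism is invertible.
More details are on the \nlab.
```

Because both macros are built from commands whose output is already known, the program has something definite to produce for them, which is exactly what is missing for an arbitrary package.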

Primitives
  • \advance
  • \def: all \defs and \lets are \long and none are \outer
  • \edef
  • \input: doesn't expand its filename argument at present
  • \csname
  • \endcsname
  • \catcode
  • \bye
  • \par
  • \let
  • \relax
  • \show
  • \showthe
  • \expandafter
  • \if
  • \ifx
  • \ifcat
  • \ifnum
  • \iftrue
  • \iffalse
  • \ifinner
  • \ifvmode
  • \ifhmode
  • \fi
  • \else
  • \begingroup
  • \endgroup
  • \global
  • \futurelet
  • \string
  • \message
  • \countdef
  • \chardef
  • \char
  • \escapechar
  • \count
  • \number
  • \noexpand
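As an illustration, here is a short fragment that uses only primitives from this list. The macro names are invented for the example, and the comments describe what TeX itself does; that the program agrees in every detail is an assumption, not something tested here.

```latex
% Uses only \def, \let, \expandafter, \csname, \endcsname, \ifx, \else and \fi.
\def\greet#1{Hello, #1!}
\let\hello\greet                  % \hello now has the same meaning as \greet

% \csname can build a control sequence whose name contains a hyphen;
% \expandafter makes sure the name is built before \def sees it.
\expandafter\def\csname my-macro\endcsname{\hello{nLab}}

\csname my-macro\endcsname        % in TeX: Hello, nLab!

\ifx\hello\greet same\else different\fi   % in TeX: same
```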

LaTeX Commands

The included latex.sty defines the following commands. The definitions are simply copied from latex.ltx with only very minor modification. (Other commands are defined in order to make these work.)

  • \makeatletter
  • \makeatother
  • \gdef
  • \xdef
  • \newif
  • \newcommand
  • \renewcommand
  • \newenvironment
  • \renewenvironment

iTeX Commands

The included itex.sty defines the following commands. (Other commands are defined in order to make these work.)

  • \newitexcommand: This defines a command that is passed "as-is" to the output. The full syntax is \newitexcommand{\command}[n] where [n] is the number of arguments. The arguments are expanded as normal but placed in braces so that they "look right".
  • \newitexenvironment: This is the environment equivalent of \newitexcommand.

These commands are then used to declare all the iTeX commands as "iTeX commands" so that they pass through the parser. For example:

\newitexcommand{\alpha}

means that \alpha in the source becomes \alpha in the output.
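A command with arguments is declared in the same way, giving the number of arguments in square brackets. The sketch below assumes that \binom is not already declared in itex.sty (if it is, the first line is simply redundant), and \nplus is a hypothetical macro introduced only to show the arguments being expanded.

```latex
\newitexcommand{\binom}[2]     % pass \binom through with two arguments
\newcommand{\nplus}{n+1}       % hypothetical macro used inside an argument

$\binom\nplus k$
% Following the description above, the arguments are expanded and then braced,
% so the output should be along the lines of $\binom{n+1}{k}$.
```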

Other commands and environments are also defined.

  • \tableofcontents: expands to * tic\n {: toc}\n
  • \citeyear: expands to [#1](##1)
  • \cite: expands to [#1](##1)
  • \emph: expands to *#1*
  • \parbox: expands to +--\n #2\n =--}
  • \section: expands to \par### #1 ###\par
  • \label: expands to {: \##1}
  • \textbf: expands to **#1**
  • \textup: expands to _#1_{: style="font-style: normal"}
  • \subsection: expands to \par#### #1 ####\par
  • \textrm: expands to \text{#1}
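To see a few of these together, here is a small fragment with the output one would expect from the expansions listed above. This is read off from the list rather than taken from an actual run, so details such as blank lines may differ.

```latex
\section{Limits}
A \emph{limit} is a \textbf{universal} cone.

% Expected output, going by the list above:
%   ### Limits ###
%   A *limit* is a **universal** cone.
```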

There is an environment called itexenv, which is an auxiliary environment used to define environments such as center, definition, theorem, and so forth.

Some catcodes are also changed. In particular, $ is made "other" so that switches into and out of maths mode are passed through as is. The angle brackets, < and >, are made active so that they can expand to \lt and \gt. The superscript and subscript characters are also made active so that their arguments can be expanded as necessary before being inserted back into the stream. As a result, some things that work in TeX do not work in this program; in particular, a^\mathcal{A} works in TeX but not here.
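The usual way round this kind of problem is to brace the argument explicitly. The following sketch shows the presumed workaround, on the assumption that the active ^ and _ take a single ordinary macro argument; it has not been checked against the program.

```latex
$a^\mathcal{A}$     % fine in TeX, but ^ would pick up \mathcal alone as its argument
$a^{\mathcal{A}}$   % braced: ^ receives \mathcal{A} as one argument
$x_{\alpha\beta}$   % the same habit applies to subscripts
```

Always bracing superscripts and subscripts costs nothing in ordinary TeX, so a document written this way remains valid for both.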

The following accent and symbol commands are also defined: \", \`, \', \^, \~, \c, \v, \., \ae, \AE, \oe, \OE, \aa, \AA, \o, \O, \ss, \P.

Ideas For Further Development

There are several obvious things missing.

  • Paragraph Indentation: In Markdown, various environments are continued by indenting or prefixing subsequent paragraphs. This isn't completely implemented.

  • Maths Mode: Although most mathematical material is simply passed through, some things should behave differently in maths mode than in text mode; a proper implementation of maths mode would make this possible.

  • Error Handling: The error handling is appalling. It dies on just about any error, which isn't all that useful.

Acknowledgements

The inner workings of this program were worked out by consulting the excellent book TeX by Topic (texdoc texbytopic), by trial and error with TeX programs, and by asking questions on http://tex.stackexchange.com.

Last modified on:
Mon, 11th Apr 2011