LaTeX to Markdown conversion

A script to convert LaTeX files to Markdown, for use in Jupyter Books. The goal is a script that can fully translate content-based LaTeX, so that one can write in LaTeX and compile PDFs from there, while also generating Jupyter Books directly.

Please note that the script is under active development. We have used it to successfully translate two whole books from LaTeX to Markdown (and on to Jupyter Books), but we would like to test it on more source files. Please let us know if you’re interested in trying it on your own LaTeX, as that can help us improve the script further.

An extension to also include custom-drawn PDF figures will be added to the main script shortly.

Current status

The script latextomarkdown.py reads a single LaTeX file, either specified in the variable ‘filename’ in the main function at the end of the script or (preferred) passed as an argument to the script. The tex file is assumed to contain only content; to run it as LaTeX, embed it in a larger tex file with \documentclass, a preamble, and a \begin{document} .. \end{document} pair. The script outputs filename.md (so currently testfile.md), in which the following have been done:

  • Chapters, sections, and subsections are converted to Markdown level 1/2/3 headings, with labels (if a label is given on the line directly after \chapter{}, etc.).

  • Opening quotes (`) are replaced by ‘.

  • Non-breaking spaces (~) are replaced by HTML non-breaking spaces (&nbsp;), which are also recognized in Markdown.

  • Some of my personal newcommands are converted: \dd, \bvec, \unitvec, \diff, and \inprod.

  • Equations are converted to Markdown equations. Labels are retained (assumed to be on the line directly after \begin{equation}).

  • Align environments are converted to MyST math blocks (shown as block in most Markdown renderers). Unfortunately, these can only contain a single label. Therefore, if an align environment consists of multiple labeled equations, we break it up into multiple blocks, unless the whole block is enclosed in subequations. In that case we search for references to the sub-equation numbers and add the relevant letter to those references (the letters are not shown in the equation number itself, but this is the best we can do for now).

  • Figures are converted to MyST figure blocks (shown as block in most Markdown renderers). Captions and labels are retained. The pdf extension is replaced by svg (as most browsers cannot render pdf). By default, the figure width is the width of the column in the Jupyter Book; to specify a different width, include a comment %Figurewidth: X (with X in pixels, just a number). If a Python file for building the figure is provided, the figure is replaced with the Python code, and a preamble is added to the Markdown file to instruct Jupyter Book to build the Python code. Note that this Python file should be in the subfolder images/ of the main folder (or in a subfolder of images/ if the LaTeX file itself is also in a subfolder).

  • Tables are converted to MyST table blocks (shown as block in most Markdown renderers). Captions and labels are retained.

  • Internal references to sections, figures, tables, and equations are converted to MyST references.

  • Citations are converted to MyST citations. This requires a BibTeX file. There are two options: (1) have a block of references at the bottom of the page (if there are citations, this reference block is added), or (2) create a separate file with the references. The latter option is preferred if the number of citations is relatively large. Note that JupyterBook throws an error if you have the same citation in two pages (but not if you have the same reference to a citation on a separate page).

  • Footnotes are found and converted to actual footnotes in the Markdown page (in a format that makes them links in JupyterBook).

  • The LaTeX commands \emph, \textit, \textbf and \texttt are replaced by their Markdown equivalents.

  • The LaTeX command \index (and its argument) is replaced by its JupyterBook equivalent (admonitions in Markdown).

  • Info boxes are converted to admonition blocks. NB: These are specific to my own books. I found that many people have something similar, but always with their own setup.

  • Problems are converted to Sphinx exercises. The Sphinx exercise gets a title if, in LaTeX, the first line of the problem file is a comment of the form %Problemtitle{X}, % Problemtitle X, or %Problemtitle X (with X the desired title). Any consecutive lines that start with a comment are removed, in particular to allow for a line % Source. NB: Problems can be read from the file itself, or from separate files (one per problem) that are included through \input{}. For problems with sub-problems, only separate files work.

  • (Worked) examples that are included in LaTeX comments from %Worked example start to %Worked example end are converted to blocks. Optionally, include a %Worked example solution somewhere inside the block.
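
To make some of the simpler text-level conversions in the list above concrete, here is a minimal sketch in Python. It is not the code of latextomarkdown.py: the names HEADING_LEVELS, INLINE_RULES, convert_inline and convert_headings are hypothetical, the regular expressions only handle arguments without nested braces, and the `(label)=` line is the standard MyST target syntax, which we assume is the form of label the script emits.

```python
import re

# Hypothetical mapping from LaTeX sectioning commands to Markdown heading markers.
HEADING_LEVELS = {"chapter": "#", "section": "##", "subsection": "###"}

# Hypothetical inline rules; arguments containing nested braces are not handled.
INLINE_RULES = [
    (re.compile(r"\\(?:emph|textit)\{([^{}]*)\}"), r"*\1*"),
    (re.compile(r"\\textbf\{([^{}]*)\}"), r"**\1**"),
    (re.compile(r"\\texttt\{([^{}]*)\}"), r"`\1`"),
]


def convert_inline(text):
    """Replace \\emph, \\textit, \\textbf, \\texttt and ~ in a piece of text."""
    for pattern, replacement in INLINE_RULES:
        text = pattern.sub(replacement, text)
    # Simplification: every ~ is treated as a non-breaking space.
    return text.replace("~", "&nbsp;")


def convert_headings(lines):
    """Convert sectioning commands, using a \\label on the next line if present."""
    out = []
    i = 0
    while i < len(lines):
        heading = re.match(r"\\(chapter|section|subsection)\{(.*)\}\s*$", lines[i])
        if heading:
            # Look for a label on the line directly after the heading command.
            if i + 1 < len(lines):
                label = re.match(r"\s*\\label\{([^{}]*)\}\s*$", lines[i + 1])
                if label:
                    out.append(f"({label.group(1)})=")  # MyST target syntax
                    i += 1  # skip the label line
            out.append(f"{HEADING_LEVELS[heading.group(1)]} {heading.group(2)}")
        else:
            out.append(lines[i])
        i += 1
    return out
```

Calling convert_headings on the three lines \section{Intro}, \label{sec:intro} and Text yields a target line (sec:intro)=, the heading ## Intro, and the unchanged text line.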
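
The comment-based directives mentioned above (%Figurewidth for figures and %Problemtitle for problems) could be recognized along the following lines. This is a sketch under assumptions, not the script's actual implementation: the function names figure_width and split_problem are hypothetical, optional whitespace around the keywords is tolerated, and we read "consecutive lines that start with a comment" as the comment lines at the top of the problem file.

```python
import re

# Matches "%Figurewidth: X", with X a plain number of pixels.
FIGUREWIDTH = re.compile(r"^\s*%\s*Figurewidth:\s*(\d+)\s*$")

# Matches "%Problemtitle{X}", "% Problemtitle X" and "%Problemtitle X".
PROBLEMTITLE = re.compile(r"^\s*%\s*Problemtitle\s*(?:\{(.*)\}|(.+?))\s*$")


def figure_width(lines):
    """Return the width in pixels from a %Figurewidth comment, or None."""
    for line in lines:
        match = FIGUREWIDTH.match(line)
        if match:
            return int(match.group(1))
    return None  # no comment found: fall back to the default column width


def split_problem(lines):
    """Return (title, body_lines) for the lines of one problem file."""
    title = None
    start = 0
    if lines:
        match = PROBLEMTITLE.match(lines[0])
        if match:
            raw = match.group(1) if match.group(1) is not None else match.group(2)
            title = raw.strip()
            start = 1
    # Assumption: drop the comment lines that follow at the top, such as "% Source".
    while start < len(lines) and lines[start].lstrip().startswith("%"):
        start += 1
    return title, lines[start:]
```

With this reading, a problem file that starts with %Problemtitle{Falling ball} followed by a % Source line gives the title Falling ball and a body that begins at the first non-comment line.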

LaTeX template

A LaTeX template is available here. The template should convert cleanly to Markdown with the latextomarkdown.py script, and it also shows how to include image source files, problem source files, and other commonly used features. Please note that the main file of the LaTeX template only includes other files and LaTeX settings; this file cannot (and need not) be converted to Markdown (and will throw an error if you try).

To do

  • Test extensively on other LaTeX files.