parrt/bookish: A tool that translates augmented m


Published: 2024-09-18 10:22

Bookish

Bookish is an XML-ish plus markdown format for books and articles, along with a tool that converts it to HTML and LaTeX. I used it to generate this article: The Matrix Calculus You Need For Deep Learning.

You can use Python directly in the doc, like a notebook, to compute and print stuff:

and display data frames:

and even show matplotlib graphs:

As you'll see below, it also does some really fancy magic to convert full LaTeX equations (or even LaTeX chunks) to SVG images for display inline (it's tricky to get the vertical alignment correct).
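The notebook-like Python feature mentioned above comes down to running each code chunk in order, in one shared namespace, and capturing what it prints. Bookish's own mechanism isn't described here, so this is only a minimal sketch of that idea; `run_chunks` is a hypothetical name:

```python
import contextlib
import io


def run_chunks(chunks):
    """Execute code chunks in order, sharing one namespace like notebook cells.

    Returns the captured stdout of each chunk as a list of strings.
    """
    namespace = {}  # shared, so later chunks can use variables from earlier ones
    outputs = []
    for code in chunks:
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
        outputs.append(buffer.getvalue())
    return outputs
```

For example, `run_chunks(["x = 6 * 7", "print(x)"])` gives `["", "42\n"]`: the first chunk prints nothing, and the second still sees `x`.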

Meta-language

Bookish is mostly XML-like but uses markdown for the more common things, like italics and code fonts. (Note that the XML tags do not always have an end tag, or even the trailing `/` as in `<.../>`.)

Bookish requires a root document that is kind of like a metadata file:

Then the chapter files look like:

<chapter label="intro">

Some text *foo* and `this` is code. Ref [summary], which is forward ref in another file. Links are [cnn]().
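Because `[summary]` refers to a label defined in another file, a translator can't resolve references in a single front-to-back pass. Here is a minimal two-pass sketch. It assumes labels come from `label="..."` attributes and that each file becomes one HTML page. `collect_labels`, `resolve_refs`, and the `file#label` link shape are hypothetical, not bookish's actual code:

```python
import re

LABEL_RE = re.compile(r'label="([^"]+)"')
# A reference is [name] NOT followed by "(", so markdown links like [cnn]() are skipped.
REF_RE = re.compile(r'\[([\w-]+)\](?!\()')


def collect_labels(files):
    """Pass 1: map every label to the file that defines it."""
    labels = {}
    for filename, text in files.items():
        for label in LABEL_RE.findall(text):
            labels[label] = filename
    return labels


def resolve_refs(text, labels):
    """Pass 2: rewrite [label] as an HTML link; leave unknown names untouched."""
    def link(m):
        name = m.group(1)
        if name not in labels:
            return m.group(0)
        return f'<a href="{labels[name]}#{name}">{name}</a>'
    return REF_RE.sub(link, text)
```

Collecting every label before rewriting anything is what makes forward references work, no matter which file is processed first.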

Cheatsheet

Here are the tags that contain attributes, not all of which are required:

Origins in math-infested markup

A co-author and I wrote up a nice mathy LaTeX document called "The Matrix Calculus You Need For Deep Learning" that has over 600 equations. We wanted to post it to the web in HTML or markdown but quickly ran into a problem trying to get equations rendered.

In the end, we converted the source document to markdown and built a translator that generates HTML, using SVG images for equations, and PDF, using native LaTeX equations. It does a pretty good job with HTML, as you can see:

All of those equations, even the ones inline in the text paragraph, are <img> references.
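Turning inline equations into `<img>` references is essentially a regex substitution that also records each equation so it can be rendered later. This sketch assumes inline equations are written between single dollar signs and uses a simple counter for file names. Both are illustrative choices, and `replace_equations` is a hypothetical helper:

```python
import html
import re

# Assumption: inline equations are delimited by single dollar signs.
INLINE_EQN_RE = re.compile(r'\$([^$]+)\$')


def replace_equations(text):
    """Swap each $...$ equation for an <img> tag; return (new_text, equations)."""
    equations = []

    def to_img(m):
        equations.append(m.group(1))
        n = len(equations)  # 1-based index of this equation
        alt = html.escape(m.group(1), quote=True)  # keep raw LaTeX safe in an attribute
        return f'<img src="images/eqn-{n}.svg" alt="{alt}">'

    return INLINE_EQN_RE.sub(to_img, text), equations
```

The returned list of equations is what would then be fed to the LaTeX-to-SVG step.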

Here is the raw matrix-calculus.md that bookish processed to generate those documents.

What's so hard about rendering equations?

If you're doing markdown or HTML, people tend to use MathJax or its faster cousin KaTeX. MathJax is just too slow when you have 600 equations. KaTeX is much better, but it (like MathJax) requires every &, _, etc. to be escaped as \&, \_ so that it doesn't get processed as markdown. That's no problem because I built a translator that escaped everything for me. Then I found out that the JavaScript parser that extracted the LaTeX equation strings was extremely finicky. I had to randomly insert spaces in my equations trying to get them recognized as equations.
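The escaping step described above can be sketched as a one-pass character filter. The character set here (`&`, `_`, `*`) is an assumption, because which characters actually need escaping depends on the markdown processor. `escape_for_markdown` is a hypothetical helper, not part of bookish:

```python
# Assumption: characters the markdown processor would otherwise interpret.
# The exact set depends on which markdown engine sits in front of KaTeX/MathJax.
SPECIALS = set("&_*")


def escape_for_markdown(latex):
    """Backslash-escape markdown-sensitive characters inside a LaTeX string."""
    return "".join("\\" + ch if ch in SPECIALS else ch for ch in latex)
```

For example, `escape_for_markdown(r"a_{ij} & b")` yields `a\_{ij} \& b`, while ordinary LaTeX commands such as `\frac{1}{2}` pass through unchanged.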

There's another problem. Is all of that JavaScript gonna work in EPUB formats? What about the Kindle? Because I'm hoping to write a book on machine learning, I'm leery of relying on full-blown JavaScript to render equations.

I tried pandoc and a few other tools, like MultiMarkdown, but not everything came through correctly to the translated output, and I got tired of chasing all of this down.

As the ANTLR guy, I ain't afeared of building a language translator, and so, following my motto "Why program by hand in five days what you can spend five years of your life automating", I decided to simply solve this problem by building my own markdown translator.

How to typeset and display math via SVG

If you can't use JavaScript, you have to use images. If you have to use images, you want scalable graphics, which means SVG files. So, the translator must extract equations and replace them with <img> tags referencing SVG files. That part is not too hard; take a look at Tex2SVG and you'll see that I'm just running three programs in sequence to process the equation into an SVG file: xelatex, then pdfcrop, then pdf2svg.
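The three-step pipeline above can be sketched with `subprocess`. This is a hedged Python stand-in for the Java code, not a port of it. The `-crop.pdf` naming and the choice of `xelatex` flags are assumptions, and `svg_commands`/`tex_to_svg` are hypothetical names:

```python
import subprocess
from pathlib import Path


def svg_commands(tex_file):
    """Build the three command lines: xelatex -> pdfcrop -> pdf2svg."""
    tex = Path(tex_file)
    pdf = tex.with_suffix(".pdf")
    cropped = tex.with_name(tex.stem + "-crop.pdf")  # assumed naming scheme
    svg = tex.with_suffix(".svg")
    return [
        # nonstopmode: don't stop for input on errors; write the PDF next to the .tex
        ["xelatex", "-interaction=nonstopmode",
         f"-output-directory={tex.parent}", str(tex)],
        ["pdfcrop", str(pdf), str(cropped)],  # trim the page to the equation's bounding box
        ["pdf2svg", str(cropped), str(svg)],  # convert the cropped PDF to SVG
    ]


def tex_to_svg(tex_file):
    """Run each step in sequence; check=True raises at the first failing tool."""
    for cmd in svg_commands(tex_file):
        subprocess.run(cmd, check=True, capture_output=True)
    return Path(tex_file).with_suffix(".svg")
```

Keeping command construction separate from execution makes the pipeline easy to inspect without having the LaTeX tools installed.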

The really tricky bit is the vertical alignment of equations within a line of HTML text. Check out this sentence with embedded equations:

(I had to take a snapshot and show that instead of giving raw HTML plus equations; GitHub's markdown processor didn't handle it properly. Haha.)

What does it mean to properly align an equation's image? It's painful. We need to convince LaTeX to give us metrics on how far the typeset image drops below the baseline. (LaTeX calls this the depth.) It took a while, but I figured out not only how to compute the depth below the baseline but also how to get it back into this Java program via the LaTeX log file. You can see how all of this is done here: . Here is the LaTeX incantation to extract the height and depth of the rendered equation:

\begin{document}
\thispagestyle{empty}
<body>
\setbox0=\vbox{<body>}
\typeout{// bookish metrics: \the\ht0, \the\dp0}
\end{document}

where <body> is the hole where the equation goes.
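Once LaTeX has run, the metrics are sitting in the log as a single line. Here is a hedged sketch of reading that line and using the depth to shift the image down. It assumes `\the\ht0` and `\the\dp0` print values like `6.83331pt` and that the SVG is displayed at its natural size in points. `read_metrics` and `img_tag` are hypothetical names:

```python
import re

# Matches the line written by \typeout{// bookish metrics: \the\ht0, \the\dp0}
METRICS_RE = re.compile(r"// bookish metrics: ([\d.]+)pt, ([\d.]+)pt")


def read_metrics(log_text):
    """Return (height, depth) in TeX points, or None if the line is missing."""
    m = METRICS_RE.search(log_text)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


def img_tag(svg_path, depth_pt):
    """Lower the image by its depth so the equation's baseline meets the text's."""
    return f'<img src="{svg_path}" style="vertical-align: -{depth_pt}pt">'
```

A negative `vertical-align` length moves the image's bottom edge below the text baseline by exactly the depth. Strictly, a TeX point is 1/72.27 inch while a CSS `pt` is 1/72 inch, so this mapping is off by roughly 0.4%, which is usually invisible at text sizes.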

Oh, and to get the font to look less anemic, you need to set the math fonts:

\DeclareSymbolFont{operators}    {OT1}{ztmcm}{m}{n}
\DeclareSymbolFont{letters}      {OML}{ztmcm}{m}{it}
\DeclareSymbolFont{symbols}      {OMS}{ztmcm}{m}{n}
\DeclareSymbolFont{largesymbols} {OMX}{ztmcm}{m}{n}
\DeclareSymbolFont{bold}         {OT1}{ptm}{bx}{n}
\DeclareSymbolFont{italic}       {OT1}{ptm}{m}{it}

One last little tidbit. Image file names are based on the MD5 hash of the equation. There are two benefits: (1) repeated equations share the same file, and (2) although LaTeX is slow (about 1 second per equation), the hashed file name lets us cache all of the images and know when we must refresh an image because the equation changed.
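The hash-as-file-name trick can be sketched in a few lines with `hashlib`. `svg_name` and `needs_render` are hypothetical helpers; the point is that the file name is a pure function of the equation text:

```python
import hashlib
from pathlib import Path


def svg_name(equation):
    """Identical equations hash to the same name, so they share one file."""
    return hashlib.md5(equation.encode("utf-8")).hexdigest() + ".svg"


def needs_render(equation, image_dir):
    """Render only if no file exists yet for this exact equation text."""
    return not (Path(image_dir) / svg_name(equation)).exists()
```

An edited equation hashes to a new name, so it is re-rendered automatically. The old SVG stays behind until something cleans it up.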

It's safe to stop reading here. You can learn everything you need to know about doing this yourself from this description and the source code. This repository is just getting started and is a work in progress, so don't expect a tool you can use yourself, at least at the moment.

Implementation

You will also notice that I have built this program as if it were a programming language translator. The strategy I use is to construct a model of the document from the parse tree using a visitor. Then I use a fiendishly clever bit of code to automatically convert that representation of the document into a tree of string templates. Of course, the set of templates you use determines what output you get. Change the templates and you change the target language. For example, here are the HTML templates.
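The model-to-template idea can be sketched in a few lines. Here Python's `string.Template` stands in for a real template engine, and the model classes, `render`, and both template tables are hypothetical. The point is only that each model object picks the template named after its class, so swapping the table swaps the output language:

```python
from string import Template

# Hypothetical template tables keyed by model class name. $children receives
# the already-rendered nested objects; other $names come from object fields.
HTML_TEMPLATES = {
    "Document": Template("<html><body>$children</body></html>"),
    "Section": Template("<h2>$title</h2>$children"),
    "Paragraph": Template("<p>$text</p>"),
}

LATEX_TEMPLATES = {
    "Document": Template("\\begin{document}\n$children\\end{document}"),
    "Section": Template("\\section{$title}\n$children"),
    "Paragraph": Template("$text\n\n"),
}


class Document:
    def __init__(self, children):
        self.children = children


class Section:
    def __init__(self, title, children):
        self.title = title
        self.children = children


class Paragraph:
    def __init__(self, text):
        self.text = text


def render(node, templates):
    """Fill the template named after the node's class with the node's fields."""
    fields = dict(vars(node))
    if "children" in fields:
        # Render nested model objects first, then splice them in as one string.
        fields["children"] = "".join(render(c, templates) for c in node.children)
    return templates[type(node).__name__].substitute(fields)
```

With `doc = Document([Section("Intro", [Paragraph("Hi")])])`, `render(doc, HTML_TEMPLATES)` produces HTML, and `render(doc, LATEX_TEMPLATES)` produces LaTeX from the same model without touching the model or `render`.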

