
A few weeks ago, @fuglede demonstrated that it was possible to embed llama2 generative AI models inside TrueType fonts using the experimental HarfBuzz WASM shaper engine. Today, we take things a step further by embedding T5 (text-to-text transfer transformer) AI models inside fonts using the same method, granting our fonts the power of text translation. No more localization files or language packs required to port apps to other languages!

Subtitles from Sintel. Original English vs font-translated German.

The source code for translate.ttf can be found here.

Installation and usage

translate_de.ttf can be downloaded from here. This font renders the original English text in German. On Ubuntu, the font can be installed to ~/.local/share/fonts/.
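
As a minimal sketch of the install step, assuming the font has already been downloaded to the current directory: the helper name install_font is hypothetical, and the fc-cache call only applies on systems that use fontconfig.

```shell
# Hypothetical helper: copy a font file into the per-user font directory.
# $1 = path to the font, $2 = target directory (defaults to ~/.local/share/fonts)
install_font() {
  font_file="$1"
  font_dir="${2:-$HOME/.local/share/fonts}"
  mkdir -p "$font_dir"
  cp "$font_file" "$font_dir/"
  # Refresh the fontconfig cache when fc-cache is available; a failure here
  # is reported but does not undo the copy.
  if command -v fc-cache >/dev/null 2>&1; then
    fc-cache -f "$font_dir" || echo "warning: fc-cache failed" >&2
  fi
}

# Usage:
#   install_font translate_de.ttf
```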

Much like llama.ttf, we will need to build wasm-micro-runtime (WAMR) and HarfBuzz. To take full advantage of WebAssembly SIMD instructions, we will also need to enable the LLVM JIT compiler in WAMR.

The following assumes you are on Ubuntu 22.04 or similar. Adjust as needed for your platform.


# Install dependencies
sudo apt install git cmake ccache pkg-config libtool build-essential \
  python3-pip patchelf ragel llvm-15-dev libfreetype-dev
python3 -m pip install ninja

# Let's install all libs under /opt
sudo mkdir -p /opt/translate.ttf
export LD_LIBRARY_PATH=/opt/translate.ttf/lib:$LD_LIBRARY_PATH
export CFLAGS=-I/opt/translate.ttf/include
export CXXFLAGS=-I/opt/translate.ttf/include
export LDFLAGS=-L/opt/translate.ttf/lib

# Build WAMR
git clone https://github.com/bytecodealliance/wasm-micro-runtime.git \
  --branch WAMR-2.1.0 --single-branch --depth 1
cd wasm-micro-runtime/wamr-compiler
cd ..
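# Enable the LLVM JIT and SIMD support (flag names taken from WAMR's build
# options; you may also need -DLLVM_DIR pointing at your LLVM 15 CMake files)
cmake -B build \
  -DCMAKE_INSTALL_PREFIX=/opt/translate.ttf \
  -DWAMR_BUILD_JIT=1 \
  -DWAMR_BUILD_SIMD=1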
cmake --build build --config Release --parallel
sudo cmake --build build --config Release --target install

# The WAMR shared object doesn't appear to pull in libLLVM even though it's
# required at runtime, so add it here
sudo patchelf --add-needed libLLVM-15.so /opt/translate.ttf/lib/libiwasm.so

# Build HarfBuzz (step back out of the wasm-micro-runtime checkout first)
cd ..
git clone https://github.com/harfbuzz/harfbuzz.git --branch 8.5.0 \
  --single-branch --depth 1
cd harfbuzz
./configure --with-wasm=yes --prefix=/opt/translate.ttf
make -j4
sudo make install

When running applications such as gedit, set LD_LIBRARY_PATH or use LD_PRELOAD with the compiled libharfbuzz.so and libiwasm.so.
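
One way to do this is sketched below, assuming the libraries were installed under /opt/translate.ttf as in the build steps. The wrapper name run_with_wasm_harfbuzz is hypothetical; it simply prepends the install directory to LD_LIBRARY_PATH for a single command.

```shell
# Install prefix used in the build steps above
TRANSLATE_TTF_PREFIX=/opt/translate.ttf

# Hypothetical wrapper: run a command with the WASM-enabled libraries first
# on the dynamic linker's search path, keeping any existing entries.
run_with_wasm_harfbuzz() {
  env LD_LIBRARY_PATH="$TRANSLATE_TTF_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Usage:
#   run_with_wasm_harfbuzz gedit
#
# LD_PRELOAD alternative, forcing both libraries into the process:
#   LD_PRELOAD="$TRANSLATE_TTF_PREFIX/lib/libharfbuzz.so:$TRANSLATE_TTF_PREFIX/lib/libiwasm.so" gedit
```

Scoping the variable to one command avoids changing the HarfBuzz used by every other program in the session.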

Translating subtitles with FFmpeg

To showcase the possible applications of this text translation font, let’s do some movie subtitle translation with FFmpeg. FFmpeg’s subtitle burn-in feature takes a video and its subtitles file, then renders a new file with the subtitles drawn directly onto the video. When using libass as the subtitle filter, FFmpeg will invoke HarfBuzz to shape the text. That’s where our custom font will come in.
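
Since this relies on FFmpeg having been built with libass, it can be worth checking for the ass filter before starting. The sketch below is one way to do that; has_filter is a hypothetical helper that assumes the column layout printed by ffmpeg -filters, where the filter name is the second whitespace-separated field.

```shell
# Hypothetical helper: succeeds if the filter list on stdin contains the
# filter named in $1. Assumes the layout of `ffmpeg -filters` output, in
# which the filter name is the second whitespace-separated column.
has_filter() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}

# Usage (requires ffmpeg on PATH):
#   ffmpeg -hide_banner -filters | has_filter ass && echo "ass filter available"
```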

We can test this with Sintel, a 2010 animated short film by the Blender Foundation. The film and .srt subtitles files can be downloaded from durian.blender.org. For this test, we will use the 720p MKV render and the English subtitles.

First, convert the subtitles file to ass (Advanced SubStation Alpha) format:

ffmpeg -i sintel_en.srt sintel_en.ass

Open sintel_en.ass and modify the requested font from Arial to OpenSans. Make sure you have translate_de.ttf installed locally.
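
The same edit can be scripted. This is a rough sketch that assumes GNU sed and the usual ASS layout in which the font name is the second field of each Style: line; set_ass_font is a hypothetical helper name, and the font names are assumed to contain no characters that are special to sed.

```shell
# Hypothetical helper: change the font requested by the Style: lines of an
# ASS file. $1 = file, $2 = current font name, $3 = new font name.
# Only Style: lines are touched, so dialogue text mentioning the old font
# name is left alone.
set_ass_font() {
  sed -i "s/^\(Style: [^,]*\),$2,/\1,$3,/" "$1"
}

# Usage:
#   set_ass_font sintel_en.ass Arial OpenSans
```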

Ensure that LD_LIBRARY_PATH or LD_PRELOAD is set to pull in libharfbuzz with WASM support. Then, subtitles can be generated with:

ffmpeg -i Sintel.2010.720p.mkv -vf "ass=sintel_en.ass" sintel-subtitled.mp4

As the video renders, FFmpeg should print to console that the translation font is being used. You should also see WASM debug logs appear in the console. These logs will show what text is being “shaped” by HarfBuzz and the corresponding results from translation.

To extract a single frame from the render instead:

ffmpeg -i Sintel.2010.720p.mkv -vf "ass=sintel_en.ass,select=eq(n\,3122)" -vframes 1 frame.png

Technical notes

translate.ttf uses the Candle ML framework to perform inference with candle-quantized-t5. Candle’s T5 WASM example was ported over to support WAMR without a dependency on a JavaScript environment. Specifically, the dependency on js-sys was removed and getrandom was patched to use WASI APIs instead.

The WASM module produced by translate.ttf uses WASM SIMD instructions, so WAMR must be built with LLVM JIT support. WAMR also offers tiered running modes that can be enabled at build time, but they will not work with translate.ttf’s SIMD instructions.

By following the strings submitted to HarfBuzz, I observed that gedit implements text wrapping by first shaping the entire paragraph block to get the full length, then calculating how much text should be rendered per line, then re-shaping each line. The way translate.ttf currently performs inference is by breaking up a block of text into sentences, translating each sentence separately. This produces the desired result for the initial shaping request with the entire paragraph, but breaks when the second pass provides partial sentences due to line breaks inserted by text wrapping.
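
The effect can be illustrated with a toy splitter. This is only a shell approximation for demonstration, not the font's actual splitting logic: split_sentences is a hypothetical helper that assumes GNU sed and treats '.', '!' or '?' followed by a space as a sentence boundary.

```shell
# Toy sentence splitter (illustrative only, requires GNU sed): start a new
# line after '.', '!' or '?' when it is followed by a space.
split_sentences() {
  sed 's/\([.!?]\) /\1\n/g'
}

# First pass: the whole paragraph arrives, so every piece is a full sentence.
paragraph="This blade has a dark past. It has shed much innocent blood."
printf '%s\n' "$paragraph" | split_sentences

# Second pass: text wrapping cut the paragraph mid-sentence, so the last
# piece handed to the translator is only a fragment.
wrapped_line="This blade has a dark past. It has shed"
printf '%s\n' "$wrapped_line" | split_sentences
```

In the second call the final piece has no sentence terminator, which is the kind of partial input that degrades the translation.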