translate.ttf: A translation engine in your font
A few weeks ago, @fuglede demonstrated that it was possible to embed llama2 generative AI models inside TrueType fonts using the experimental HarfBuzz WASM shaper engine. Today, we take things a step further by embedding T5 (text-to-text transfer transformer) AI models inside fonts using the same method, granting our fonts the power of text translation. No more localization files or language packs required to port apps to other languages!
Subtitles from Sintel. Original English vs font-translated German.
The source code for translate.ttf can be found here.
Installation and usage
translate_de.ttf can be downloaded from here. This font renders the original English text in German. On Ubuntu, the font can be installed to ~/.local/share/fonts/.
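As a rough sketch, the installation step could look like the following. The helper name install_font and the assumption that translate_de.ttf sits in the current directory are illustrative only; fc-cache and fc-list come from fontconfig and are skipped or reported if unavailable.

```shell
# Hypothetical helper: copy a font into a per-user font directory and refresh
# the fontconfig cache. The default directory is the one named above.
# Usage: install_font FONT_FILE [DEST_DIR]
install_font() {
    font_file=$1
    dest_dir=${2:-$HOME/.local/share/fonts}
    mkdir -p "$dest_dir"
    cp "$font_file" "$dest_dir/"
    # fc-cache ships with fontconfig; skip the refresh if it is not installed.
    if command -v fc-cache >/dev/null 2>&1; then
        fc-cache -f "$dest_dir" || echo "warning: fc-cache failed" >&2
    fi
}

# Assumes translate_de.ttf was downloaded to the current directory.
if [ -f translate_de.ttf ]; then
    install_font translate_de.ttf
    # Show the font as fontconfig sees it, matching on the file name.
    fc-list | grep -i translate_de || echo "font not listed by fontconfig"
fi
```

Passing a second argument installs into a different directory, which is handy for trying the helper without touching the real font folder.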
Much like llama.ttf, we will need to build wasm-micro-runtime (WAMR) and HarfBuzz. To take full advantage of WebAssembly SIMD instructions, we will also need to enable the LLVM JIT compiler in WAMR.
The following assumes you are on Ubuntu 22.04 or similar. Adjust as needed for your platform.
WORKDIR=$(pwd)
# Install dependencies
sudo apt install git cmake ccache pkg-config libtool build-essential \
    python3-pip patchelf ragel llvm-15-dev libfreetype-dev
python3 -m pip install ninja
# Let's install all libs under /opt
sudo mkdir -p /opt/translate.ttf
export LD_LIBRARY_PATH=/opt/translate.ttf/lib:$LD_LIBRARY_PATH
export CFLAGS=-I/opt/translate.ttf/include
export CXXFLAGS=-I/opt/translate.ttf/include
export LDFLAGS=-L/opt/translate.ttf/lib
# Build WAMR
cd ${WORKDIR}
git clone https://github.com/bytecodealliance/wasm-micro-runtime.git \
    --branch WAMR-2.1.0 --single-branch --depth 1
cd wasm-micro-runtime/wamr-compiler
./build_llvm.sh
cd ..
cmake -B build \
    -DCMAKE_INSTALL_PREFIX=/opt/translate.ttf \
    -DWAMR_BUILD_REF_TYPES=1 \
    -DWAMR_BUILD_LIBC_WASI=1 \
    -DWAMR_BUILD_JIT=1 \
    -DWAMR_BUILD_FAST_JIT=0 \
    -DWAMR_BUILD_LAZY_JIT=0 \
    -DWAMR_BUILD_STATIC=0
cmake --build build --config Release --parallel
sudo cmake --build build --config Release --target install
# The WAMR shared object doesn't appear to pull in libLLVM even though it's
# required at runtime, so add it here
sudo patchelf --add-needed libLLVM-15.so /opt/translate.ttf/lib/libiwasm.so
# Build HarfBuzz
cd ${WORKDIR}
git clone https://github.com/harfbuzz/harfbuzz.git --branch 8.5.0 \
    --single-branch --depth 1
cd harfbuzz
./autogen.sh
./configure --with-wasm=yes --prefix=/opt/translate.ttf
make -j4
sudo make install
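When running applications such as gedit, either set LD_LIBRARY_PATH to include /opt/translate.ttf/lib or use LD_PRELOAD with the compiled libharfbuzz.so and libiwasm.so, so that the WASM-enabled HarfBuzz is loaded instead of the system one.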
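One way to do this without changing the environment of the whole shell is a small wrapper, sketched below. The function name run_with_wasm_harfbuzz is made up for illustration, and the paths assume the /opt/translate.ttf install prefix used in the build steps.

```shell
# Install prefix from the build steps above.
PREFIX=/opt/translate.ttf

# Hypothetical wrapper: run one command with the custom libraries first on
# the dynamic linker's search path, keeping any existing LD_LIBRARY_PATH.
run_with_wasm_harfbuzz() {
    LD_LIBRARY_PATH="$PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Show the search path a wrapped command would see.
run_with_wasm_harfbuzz sh -c 'echo "$LD_LIBRARY_PATH"'

# Typical use:
#   run_with_wasm_harfbuzz gedit
# LD_PRELOAD alternative, naming both libraries explicitly:
#   LD_PRELOAD="$PREFIX/lib/libharfbuzz.so:$PREFIX/lib/libiwasm.so" gedit
```

Scoping the variable to a single command keeps other programs on the system HarfBuzz.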
Translating subtitles with FFmpeg
To showcase the possible applications of this text translation font, let’s do some movie subtitle translation with FFmpeg. FFmpeg’s subtitle burn-in feature takes a video and its subtitles file, then renders a new file with the subtitles drawn directly onto the video. When using libass as the subtitle filter, FFmpeg will invoke HarfBuzz to shape the text. That’s where our custom font will come in.
We can test this with Sintel, a 2010 animated short film by the Blender Foundation. The film and .srt subtitles files can be downloaded from durian.blender.org. For this test, we will use the 720p MKV render and the English subtitles.
First, convert the subtitles file to ass (Advanced SubStation Alpha) format:
ffmpeg -i sintel_en.srt sintel_en.ass
Open sintel_en.ass and modify the requested font from Arial to OpenSans. Make sure you have translate_de.ttf installed locally.
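If you would rather script the font change than edit the file by hand, something like the sketch below should work. It assumes the converted file contains style lines of the form "Style: Name,Fontname,Fontsize,..." with the font name as the second field, and it relies on GNU sed for in-place editing. The helper name set_ass_font is hypothetical.

```shell
# Hypothetical helper: rewrite the Fontname field of every "Style:" line.
# Usage: set_ass_font FILE OLD_FONT NEW_FONT
# Font names containing "/" or regex metacharacters are not handled.
set_ass_font() {
    # The font is the second comma-separated field of a style line.
    # -i edits the file in place (GNU sed); dialogue lines are left untouched.
    sed -i -E "s/^(Style: [^,]*),$2,/\1,$3,/" "$1"
}

# Assumes sintel_en.ass is in the current directory.
if [ -f sintel_en.ass ]; then
    set_ass_font sintel_en.ass Arial OpenSans
    # Print the style lines to confirm the change.
    grep '^Style:' sintel_en.ass
fi
```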
Ensure that LD_LIBRARY_PATH or LD_PRELOAD is set to pull in libharfbuzz with WASM support. Then, the video with burned-in subtitles can be rendered with:
ffmpeg -i Sintel.2010.720p.mkv -vf "ass=sintel_en.ass" sintel-subtitled.mp4
As the video renders, FFmpeg should print to console that the translation font is being used. You should also see WASM debug logs appear in the console. These logs will show what text is being “shaped” by HarfBuzz and the corresponding results from translation.
To extract a single frame from the render instead:
ffmpeg -i Sintel.2010.720p.mkv -vf "ass=sintel_en.ass,select=eq(n\,3122)" -vframes 1 frame.png
Technical notes
translate.ttf uses the Candle ML framework to perform inference with candle-quantized-t5. Candle’s T5 WASM example was ported to run on WAMR without a dependency on a JavaScript environment. Specifically, the dependency on js-sys was removed and getrandom was patched to use WASI APIs instead.
The WASM module produced by translate.ttf uses WASM SIMD instructions, so WAMR must be built with LLVM JIT support. WAMR also offers tiered running modes that can be enabled at build time, but they will not work with translate.ttf’s SIMD instructions.
By following the strings submitted to HarfBuzz, I observed that gedit implements text wrapping by first shaping the entire paragraph block to get the full length, then calculating how much text should be rendered per line, then re-shaping each line. translate.ttf currently performs inference by breaking a block of text into sentences and translating each sentence separately. This produces the desired result for the initial shaping request with the entire paragraph, but breaks when the second pass provides partial sentences due to line breaks inserted by text wrapping.
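The effect can be reproduced with a toy sentence splitter. This is only an illustration of the failure mode, not the splitter translate.ttf actually uses. The function name, the splitting rule, and the sample text are all made up, and the sed call relies on GNU sed turning \n in the replacement into a newline.

```shell
# Hypothetical splitter: break text after ., ! or ? followed by whitespace,
# printing one unit per line. Each unit stands in for one translation request.
split_sentences() {
    printf '%s\n' "$1" | sed -E 's/([.!?])[[:space:]]+/\1\n/g'
}

# Made-up sample text, not taken from the subtitles.
paragraph="The font translates text. Each sentence is handled on its own."

# First pass: the whole paragraph is shaped, so every unit is a full sentence.
echo "First pass (whole paragraph):"
split_sentences "$paragraph"

# Second pass: wrapped lines arrive one at a time, so the second sentence
# is seen as two unrelated fragments.
echo "Second pass (wrapped lines):"
split_sentences "The font translates text. Each sentence is"
split_sentences "handled on its own."
```

In the second pass, "Each sentence is" and "handled on its own." would be translated independently, which is why the wrapped output can differ from the translation of the full paragraph.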