Extracting a Tagged PDF Structure Tree as XML

The Project has produced a new tool that extracts the Structure Tree of a tagged PDF file as XML. It is a Lua script that requires only texlua, which is distributed with all major TeX distributions.
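The tool's actual XML vocabulary is defined by its schemas and is not reproduced here. As a rough illustration of what a Structure Tree serialized as XML makes easy to do, the Python sketch below uses only the standard library. It assumes a hypothetical layout in which each structure element becomes an XML element named after its structure type (Document, Sect, P, ...). The names `SAMPLE`, `outline` and `count_tags` are illustrative only and are not part of the Project's tool. The sketch prints an indented outline of the tree and counts how often each structure type occurs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample only: the real output format is defined by the
# tool's RelaxNG schemas. Here each structure element is assumed to be
# an XML element named after its PDF structure type.
SAMPLE = """<StructTreeRoot>
  <Document>
    <H1>Title</H1>
    <Sect>
      <P>First paragraph</P>
      <Figure/>
    </Sect>
  </Document>
</StructTreeRoot>"""


def outline(elem, depth=0):
    """Return one indented line per element, depth-first in document order."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines


def count_tags(root):
    """Count how often each structure type occurs, including the root."""
    counts = {}
    for elem in root.iter():
        counts[elem.tag] = counts.get(elem.tag, 0) + 1
    return counts


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    print("\n".join(outline(root)))
    print(count_tags(root))
```

With real output from the tool, you would replace `ET.fromstring(SAMPLE)` with `ET.parse(path).getroot()` and adapt the element handling to the vocabulary that the provided schemas define.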

RELAX NG schemas are also provided for validating the resulting XML.

A discussion page with full details is available in the Project’s tagging-project repository.

To see the tool in action validating one of the Project’s example WTPDF files, you can use the form at

https://texlive.net/showtags?doc=mathml-AF-ex2-se