Extracting a Tagged PDF Structure Tree as XML
The Project has produced a new tool to extract the Structure Tree
from a tagged PDF file as XML. It is a Lua script that requires only
the texlua interpreter,
which is distributed with all major TeX distributions.
RELAX NG schemas are also provided to validate the resulting XML.
A discussion page with full details
is available in the Project’s tagging-project
repository.
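As a rough illustration of what can be done with a structure tree once it is available as XML, the following Python sketch prints an indented outline of the elements and counts how often each tag occurs. The sample document and its element names (Document, H1, P, L, LI, Lbl, LBody) are assumptions chosen to resemble standard PDF structure types; the XML actually produced by the tool may be organised differently, so consult the discussion page for the real format.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input. The element names are illustrative only and
# are NOT taken from the tool's documented output format.
SAMPLE_XML = """\
<Document>
  <H1>Title</H1>
  <P>First paragraph.</P>
  <L>
    <LI><Lbl>1.</Lbl><LBody>Item</LBody></LI>
  </L>
</Document>
"""


def outline(element, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + element.tag]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


def tag_counts(element):
    """Count how many times each tag occurs in the tree."""
    counts = {}
    for node in element.iter():
        counts[node.tag] = counts.get(node.tag, 0) + 1
    return counts


root = ET.fromstring(SAMPLE_XML)
print("\n".join(outline(root)))
print(tag_counts(root))
```

To try this on real output, replace `ET.fromstring(SAMPLE_XML)` with `ET.parse(path).getroot()`, where `path` points at an XML file written by the tool.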
To see the tool in action, validating one of the Project’s example WTPDF files, you may use the form at