New features in pylatexenc 2

Brief list of new features

  • Improvements to LaTeX parser and its API (pylatexenc.latexwalker):

    • More powerful and versatile way of providing a “latex context” with a collection of known macros, environment definitions, and “latex specials” provided by the pylatexenc.macrospec module.

    • Support for arbitrary sequences of characters that have a special meaning in LaTeX, such as ‘&’, ‘#’, ‘``’, which are referred to as “latex specials”. A new node type (LatexSpecialsNode) represents such sequences of characters;

    • Support for arbitrary macro arguments & formats via custom parsing code. We support, for instance, \verb+...+-type constructs;

    • Better parsing of math mode, and support for display math modes;

    • Parsed LaTeX nodes (LatexNode objects) now retain information about which part of the original string they represent, and therefore what their verbatim latex representation is.

  • Improvements to latex2text:

    • New feature: chunks of text can be filled at a given column width for a more aesthetic result. This can be enabled with the flag fill_text=True|<column-width> in the LatexNodes2Text constructor.

    • The default handling of white space was changed. The flag strict_latex_spaces= now takes the value ‘macros’ as default, which is more reasonable in most cases;

    • Renamed macro specification classes (MacroDef → MacroTextSpec, etc.), which now include support for “latex specials”;

    • New flag math_mode= specifying how to convert math mode to text, extends and replaces keep_inline_math=;

    • Adapted for the updated latexwalker API.

  • New interface for pylatexenc.latexencode, with UnicodeToLatexEncoder and unicode_to_latex(). You can specify custom conversion rules, custom behavior for unknown characters, and more.

    Additional latex escapes from the unicode.xml file maintained at https://www.w3.org/TR/xml-entity-names/#source were added to the default set of latex codes for unicode characters. You can also opt to use only the rules from unicode.xml.

    The earlier function pylatexenc.latexencode.utf8tolatex() was poorly named, given that its argument was a python unicode string, not a utf-8-encoded string. The old function is still provided as is to keep existing code working.

  • Because of improvements to the parser, results might differ slightly from earlier versions.

    For instance, latexwalker now recognizes -- and --- as “latex specials”, and by default latex2text substitutes the corresponding unicode characters for en-dash and em-dash, respectively. You can disable this behavior by filtering out the ‘nonascii-specials’ category from the default latex context database in latex2text:

    from pylatexenc import latex2text

    latex_context = latex2text.get_default_latex_context_db().filter_context(
        exclude_categories=['nonascii-specials']
    )
    l2t = latex2text.LatexNodes2Text(latex_context=latex_context, ...)
    ...
    
  • The three main modules can now be used from the command line: latex2text, latexencode and latexwalker. Run with --help for information about usage and options.

API Changes that might affect existing code

With the important changes introduced in pylatexenc 2.0, some parts of the API were improved and are not necessarily 100% source compatible with pylatexenc 1.x. Code that uses the high-level features of pylatexenc 1.x should run without any modifications. However, if you are using some advanced features of pylatexenc, you might have to make some small changes to your code to adapt to the new API.

  • The specification of known macros, environments, and latex specials for both LatexWalker and LatexNodes2Text has changed. The specifications are now streamlined, organized into categories, and stored in a LatexContextDb object (one for each of these modules).

    Previously, to introduce a custom macro in latexwalker, one could write:

    >>> # pylatexenc 1.x (obsolete in pylatexenc 2 but still works)
    >>> from pylatexenc.latexwalker import LatexWalker, MacrosDef, default_macro_dict
    >>> my_macros = dict(default_macro_dict)
    >>> my_macros['mymacro'] = MacrosDef('mymacro', True, 2)
    >>> w = LatexWalker(r'Text with \mymacro[yes]{one}{two}.', macro_dict=my_macros)
    >>> (nodelist, pos, len_) = w.get_latex_nodes()
    >>> nodelist[1].nodeoptarg
    LatexGroupNode(nodelist=[LatexCharsNode(chars='yes')])
    

    This code still works in pylatexenc 2.0. It is, however, recommended to use the new interface, which is more useful and powerful (see the documentation of pylatexenc.macrospec). The above example would now be written as:

    >>> # pylatexenc 2
    >>> from pylatexenc.latexwalker import LatexWalker, get_default_latex_context_db
    >>> from pylatexenc.macrospec import MacroSpec
    >>> latex_context = get_default_latex_context_db()
    >>> latex_context.add_context_category('mymacros', macros=[ MacroSpec('mymacro', '[{{') ])
    >>> w = LatexWalker(r'Text with \mymacro[yes]{one}{two}.', latex_context=latex_context)
    >>> (nodelist, pos, len_) = w.get_latex_nodes()
    >>> nodelist[1].nodeargd.argnlist[0]
    LatexGroupNode(parsing_state=<parsing state 4504427096>,pos=18, len=5,
    nodelist=[LatexCharsNode(parsing_state=<parsing state 4504427096>,pos=19,
    len=3, chars='yes')], delimiters=('[', ']'))
    

    The same holds for latex2text.

    The pylatexenc.latexwalker.MacrosDef class in pylatexenc 1.x was rewritten and renamed pylatexenc.macrospec.MacroSpec, and corresponding classes pylatexenc.macrospec.EnvironmentSpec and pylatexenc.macrospec.SpecialsSpec were introduced. [pylatexenc.latexwalker.MacrosDef() is now a function that returns a MacroSpec instance.] The pylatexenc.latex2text.MacroDef and pylatexenc.latex2text.EnvDef were rewritten and renamed pylatexenc.latex2text.MacroTextSpec and pylatexenc.latex2text.EnvironmentTextSpec, and the class pylatexenc.latex2text.SpecialsTextSpec was introduced. [The earlier class names now represent functions that return instances of the new classes.]

    For LatexWalker, macro, environment, and latex specials syntax specifications are provided as pylatexenc.macrospec.MacroSpec, pylatexenc.macrospec.EnvironmentSpec, and pylatexenc.macrospec.SpecialsSpec objects, which extend and completely replace the MacrosDef object in pylatexenc 1.x.

    For LatexNodes2Text, specifications of replacement texts for macros, environments, and latex specials are provided as pylatexenc.latex2text.MacroTextSpec, pylatexenc.latex2text.EnvironmentTextSpec, and pylatexenc.latex2text.SpecialsTextSpec objects, which replace the MacroDef and EnvDef objects in pylatexenc 1.x.

  • Text replacements are gone in latex2text. If you used custom text_replacements= in LatexNodes2Text, then you will have to change:

    # pylatexenc 1.x with text_replacements
    text_replacements = ...
    l2t = LatexNodes2Text(..., text_replacements=text_replacements)
    text = l2t.nodelist_to_text(...)
    

    to:

    # pylatexenc 2 text_replacements equivalent compatibility code
    text_replacements = ...
    l2t = LatexNodes2Text(...)
    temp = l2t.nodelist_to_text(...)
    text = l2t.apply_text_replacements(temp, text_replacements)
    

    as a quick fix. It is recommended, however, to treat text replacements as “latex specials” instead. (Otherwise the brute-force text replacements might act on text generated from macros and environments and give unwanted results.) See pylatexenc.macrospec.SpecialsSpec and pylatexenc.latex2text.SpecialsTextSpec.

  • The keep_inline_math= option was deprecated in both LatexWalker and LatexNodes2Text (see issue #14). Instead, you should set the option math_mode= in LatexNodes2Text.

    The design choice was made in pylatexenc 2.0 to have LatexWalker always parse math modes, and have the textual representation be altered not by a parser option but by an option in LatexNodes2Text.

    Both LatexWalker and LatexNodes2Text accept the keep_inline_math= keyword argument to avoid breaking code designed for pylatexenc 1.x; the former ignores it entirely and the latter attempts to set math_mode= to a suitable value.

    The result might differ when you run the same code with pylatexenc 2.0. However, you can obtain the required behavior by replacing the following idioms as shown (recall that the keyword argument to latex_to_text() is the option passed to LatexWalker):

    LatexNodes2Text(keep_inline_math=True).latex_to_text(..., keep_inline_math=True)
      →  LatexNodes2Text(math_mode='verbatim').latex_to_text(...)
    
    LatexNodes2Text(keep_inline_math=True).latex_to_text(..., keep_inline_math=False)
      →  LatexNodes2Text(math_mode='with-delimiters').latex_to_text(...)
    
    LatexNodes2Text(keep_inline_math=False).latex_to_text(..., keep_inline_math=True|False)
      →  LatexNodes2Text(math_mode='text').latex_to_text(...)
    
  • The node structure classes were changed to allow macros, environments and latex specials to have arbitrarily complicated, non-standard arguments. If you relied on the details of the LatexNode objects returned by LatexWalker, then you might have to adjust your code to the API changes. See the documentation of LatexNode and friends.

    • pylatexenc.latexwalker.LatexMacroNode.nodeoptarg and pylatexenc.latexwalker.LatexMacroNode.nodeargs are deprecated in favor of pylatexenc.latexwalker.LatexMacroNode.nodeargd which is now a pylatexenc.macrospec.ParsedMacroArgs instance (or a subclass instance for custom nonstandard macro argument structures);

    • pylatexenc.latexwalker.LatexEnvironmentNode.envname was deprecated in favor of pylatexenc.latexwalker.LatexEnvironmentNode.environmentname;

    • pylatexenc.latexwalker.LatexEnvironmentNode.optargs and pylatexenc.latexwalker.LatexEnvironmentNode.args are deprecated in favor of pylatexenc.latexwalker.LatexEnvironmentNode.nodeargd, which works the same way as for macros;

    • the pylatexenc.latexwalker.LatexSpecialsNode node type was introduced;

    • new attributes were added, e.g., parsing_state, pos, and len on all node types; also pylatexenc.latexwalker.LatexGroupNode.delimiters and pylatexenc.latexwalker.LatexMathNode.delimiters.

  • Be wary of instantiating pylatexenc.latexwalker.LatexNode and its subclasses directly, because new fields might not be initialized properly. Instead, you should consider using pylatexenc.latexwalker.LatexWalker.make_node().