Simple Parser for LaTeX Code

The latexwalker module provides a simple API for parsing LaTeX snippets, and representing the contents using a data structure based on node classes.

LatexWalker will understand the syntax of most common macros. However, latexwalker is NOT a replacement for a full LaTeX engine. (Originally, latexwalker was designed to extract useful text for indexing for text database searches of LaTeX content.)

Simple example usage:

>>> from pylatexenc.latexwalker import LatexWalker, LatexEnvironmentNode
>>> w = LatexWalker(r"""
... \textbf{Hi there!} Here is \emph{a list}:
... \begin{enumerate}[label=(i)]
... \item One
... \item Two
... \end{enumerate}
... and $x$ is a variable.
... """)
>>> (nodelist, pos, len_) = w.get_latex_nodes(pos=0)
>>> nodelist[0]
LatexCharsNode(pos=0, len=1, chars='\n')
>>> nodelist[1]
LatexMacroNode(pos=1, len=18, macroname='textbf',
nodeargd=ParsedMacroArgs(argnlist=[LatexGroupNode(pos=8, len=11,
nodelist=[LatexCharsNode(pos=9, len=9, chars='Hi there!')],
delimiters=('{', '}'))], argspec='{'), macro_post_space='')
>>> nodelist[5].isNodeType(LatexEnvironmentNode)
True
>>> nodelist[5].environmentname
'enumerate'
>>> nodelist[5].nodeargd.argspec
'['
>>> nodelist[5].nodeargd.argnlist
[LatexGroupNode(pos=60, len=11, nodelist=[LatexCharsNode(pos=61, len=9,
chars='label=(i)')], delimiters=('[', ']'))]
>>> nodelist[7].latex_verbatim()
'$x$'

You can also use latexwalker directly from the command line, producing JSON or a human-readable node tree:

$ echo '\textit{italic} text' | latexwalker --output-format=json
{
  "nodelist": [
    {
      "nodetype": "LatexMacroNode",
      "pos": 0,
      "len": 15,
      "macroname": "textit",
[...]

$ latexwalker --help
[...]

The parser can be influenced by specifying a collection of known macros and environments (the “latex context”). These are described by pylatexenc.macrospec.MacroSpec and pylatexenc.macrospec.EnvironmentSpec objects collected in a pylatexenc.macrospec.LatexContextDb object. See the doc of the module pylatexenc.macrospec for more information.

The main LatexWalker class

class pylatexenc.latexwalker.LatexWalker(s, latex_context=None, **kwargs)

A parser which walks through an input stream, parsing it as LaTeX markup.

Arguments:

  • s: the string to parse as LaTeX code

  • latex_context: a pylatexenc.macrospec.LatexContextDb object that provides macro and environment specifications with instructions on how to parse arguments, etc. If you don’t specify this argument, or if you specify None, then the default database is used. The default database is obtained with get_default_latex_context_db().

    New in version 2.0: This latex_context argument was introduced in version 2.0.

Additional keyword arguments are flags which influence the parsing. Accepted flags are:

  • tolerant_parsing=True|False If set to True, then the parser generally ignores syntax errors rather than raising an exception.
  • strict_braces=True|False This option refers specifically to encountering a closing brace when an expression is needed. You generally won’t need to specify this flag; use tolerant_parsing instead.

The methods provided in this class perform various parsing of the given string s. These methods typically accept a pos parameter, an integer that defines the position in the string s at which to start parsing.

These methods, unless otherwise documented, return a tuple (node, pos, len), where node is a LatexNode describing the parsed content, pos is the position at which the LaTeX element of interest was encountered, and len is the length of the string that is considered to be part of the node. That is, the position in the string that is immediately after the node is pos+len.

The following obsolete flag is accepted by the constructor for backwards compatibility with pylatexenc 1.x:

  • macro_dict: This argument is kept for compatibility with pylatexenc 1.x. This is a dictionary of known LaTeX macro specifications. If specified, this should be a dictionary where the keys are macro names and values are pylatexenc.macrospec.MacroSpec instances, as returned for instance by the pylatexenc 1.x-emulating function MacrosDef(). If you specify this argument, you cannot provide a custom latex_context. This argument is superseded by the latex_context argument. Furthermore, if you specify this argument, no specials are parsed so that the behavior is closer to pylatexenc 1.x.

    Deprecated since version 2.0: The macro_dict argument has been replaced by the much more powerful latex_context argument which allows you to further provide environment specifications, etc.

  • keep_inline_math=True|False: Obsolete option. In pylatexenc 1.x, this option triggered a weird behavior especially since there is a similarly named option in pylatexenc.latex2text.LatexNodes2Text with a different meaning. [See Issue #14.] You should now only use the option math_mode= in pylatexenc.latex2text.LatexNodes2Text.

    Deprecated since version 2.0: This option is ignored starting from pylatexenc 2. Instead, you should set the option math_mode= accordingly in pylatexenc.latex2text.LatexNodes2Text.

s

The string that is being parsed.

Do NOT modify this attribute.

get_latex_braced_group(pos, brace_type='{', parsing_state=None)

Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to read a latex group delimited by braces.

Reads a latex expression enclosed in braces { ... }. The first token of s[pos:] must be an opening brace.

Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, the default parsing state is used.

Returns a tuple (node, pos, len), where node is a LatexGroupNode instance, pos is the position of the first char of the expression (which has to be an opening brace), and len is the length of the group, including the closing brace (relative to the starting position).

The group must be delimited by the given brace_type. brace_type may be one of {, [, ( or <, or a 2-item tuple of two distinct single characters providing the opening and closing brace chars (e.g., ("<", ">")).

New in version 2.0: The parsing_state argument was introduced in version 2.0.

get_latex_environment(pos, environmentname=None, parsing_state=None)

Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to read a latex environment.

Reads a latex expression enclosed in a \begin{environment}...\end{environment}. The first token in the stream must be the \begin{environment}.

If environmentname is given and nonempty, then additionally a LatexWalkerParseError is raised if the environment in the input stream does not match the provided environment name.

Arguments to the begin environment command are parsed according to the corresponding specification in the given latex context latex_context provided to the constructor. The environment name is looked up as a “macro name” in the macro spec.

Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, the default parsing state is used.

Returns a tuple (node, pos, len) where node is a LatexEnvironmentNode.

New in version 2.0: The parsing_state argument was introduced in version 2.0.

get_latex_expression(pos, strict_braces=None, parsing_state=None)

Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to parse a single LaTeX expression.

Reads a latex expression, e.g. a macro argument. This may be a single char, an escape sequence, or an expression placed in braces. This is what TeX calls a “token” (and not what we call a token… anyway).

Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, then the default parsing state is used.

Returns a tuple (node, pos, len), where pos is the position of the first char of the expression and len the length of the expression.

New in version 2.0: The parsing_state argument was introduced in version 2.0.

get_latex_maybe_optional_arg(pos, parsing_state=None)

Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to attempt to parse an optional argument.

Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, the default parsing state is used.

Attempts to parse an optional argument. If this is successful, this method returns a tuple (node, pos, len) where node is a LatexGroupNode. Otherwise, this method returns None.

New in version 2.0: The parsing_state argument was introduced in version 2.0.

get_latex_nodes(pos=0, stop_upon_closing_brace=None, stop_upon_end_environment=None, stop_upon_closing_mathmode=None, read_max_nodes=None, parsing_state=None)

Parses the latex content given to the constructor (and stored in self.s) into a list of nodes.

Returns a tuple (nodelist, pos, len) where:

  • nodelist is a list of LatexNode’s representing the parsed LaTeX code.
  • pos is the same as the pos given as argument; if there is leading whitespace it is reported in nodelist using a LatexCharsNode.
  • len is the length of the parsed expression. If one of the stop_upon_…= arguments is provided (cf. below), then the len includes the length of the token/expression that stopped the parsing.

If stop_upon_closing_brace is given and set to a character, then parsing stops once the given closing brace is encountered (but not inside a subgroup). The brace is given as a character, ‘]’, ‘}’, ‘)’, or ‘>’. Alternatively you may specify a 2-item tuple of two single distinct characters representing the opening and closing brace chars. The returned len includes the closing brace, but the closing brace is not included in any of the nodes in the nodelist.

If stop_upon_end_environment is provided, then parsing stops once the given environment is closed. If there is an environment mismatch, then a LatexWalkerParseError is raised except in tolerant parsing mode (see parse_flags()). Again, the closing environment is included in the length count but not the nodes.

If stop_upon_closing_mathmode is specified, then the parsing stops once the corresponding math mode (assumed already open) is closed. This argument may take the values None (no particular request to stop at any math mode token), or one of $, $$, \) or \] indicating a closing math mode delimiter that we are expecting and at which point parsing should stop.

If the token ‘$’ (respectively ‘$$’) is encountered, it is interpreted as the beginning of a new math mode chunk unless the argument stop_upon_closing_mathmode=… has been set to ‘$’ (respectively ‘$$’).

If read_max_nodes is non-None, then it should be set to an integer specifying the maximum number of top-level nodes to read before returning. (Top-level nodes means that macro arguments, environment or group contents, etc., do not count towards read_max_nodes.) If None, the entire input string will be parsed.

Note

There are a few important differences between get_latex_nodes(read_max_nodes=1) and get_latex_expression(): The former reads a logical node of the LaTeX document, which can be a sequence of characters, a macro invocation with arguments, or an entire environment, but the latter reads a single LaTeX “token” in a similar way to how LaTeX parses macro arguments.

For instance, if a macro is encountered, then get_latex_nodes(read_max_nodes=1) will read and parse its arguments, and include it in the corresponding LatexMacroNode, whereas get_latex_expression() will return a minimal LatexMacroNode with no arguments regardless of the macro’s argument specification. The same holds for latex specials. For environments, get_latex_nodes(read_max_nodes=1) will return the entire parsed environment as a LatexEnvironmentNode, whereas get_latex_expression() will return a LatexMacroNode named ‘begin’ with no arguments.

Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, the default parsing state is used.

New in version 2.0: The parsing_state argument was introduced in version 2.0.

get_token(pos, include_brace_chars=None, environments=True, keep_inline_math=None, parsing_state=None, **kwargs)

Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to parse a single “token”, as defined by LatexToken.

Parse the token in the stream pointed to at position pos.

For tokens of type ‘char’, usually a single character is returned. The only exception is at paragraph boundaries, where a single ‘char’-type token has argument ‘\n\n’.

Returns a LatexToken. Raises LatexWalkerEndOfStream if end of stream reached.

The argument include_brace_chars= allows you to specify additional pairs of single characters which should be considered as braces (i.e., of ‘brace_open’ and ‘brace_close’ token types). It should be a list of 2-item tuples, for instance [('[', ']'), ('<', '>')]. The pair (‘{’, ‘}’) is always considered as braces. The delimiters may not have more than one character each.

If environments=False, then \begin and \end tokens count as regular ‘macro’ tokens (see LatexToken); otherwise (the default) they are considered as the token types ‘begin_environment’ and ‘end_environment’.

The parsing of the tokens might be influenced by the parsing_state (a ParsingState instance). Currently, the only influence this has is that some latex specials are parsed differently if in math mode. See doc for ParsingState. If parsing_state is None, the default parsing state returned by make_parsing_state() is used.

Deprecated since version 2.0: The flag keep_inline_math is only accepted for compatibility with earlier versions of pylatexenc, but it has no effect starting in pylatexenc 2. See the LatexWalker class doc.

Deprecated since version 2.0: If brackets_are_chars=False, then square bracket characters count as ‘brace_open’ and ‘brace_close’ token types (see LatexToken); otherwise (the default) they are considered just like other normal characters.

New in version 2.0: The parsing_state argument was introduced in version 2.0.

make_node(node_class, **kwargs)

Create and return a node of type node_class which holds a representation of the latex code at position pos and of length len in the parsed string.

The node class should be a LatexNode subclass. Keyword arguments are supplied directly to the constructor of the node class.

Mandatory keyword-only arguments are ‘pos’, ‘len’, and ‘parsing_state’.

All nodes produced by get_latex_nodes() and friends use this method to create node classes.

New in version 2.0: This method was introduced in pylatexenc 2.0.

make_parsing_state(**kwargs)

Return a new parsing state object that corresponds to the current string that we are parsing (s provided to the constructor) and the current latex context (latex_context provided to the constructor).

If no arguments are provided, this returns the default parsing state.

If keyword arguments are provided, then they can override fields from the default parsing state. For instance, if we enter math mode, you might use:

parsing_state_mathmode = \
    my_latex_walker.make_parsing_state(in_math_mode=True)

parse_flags()

The parse flags currently set on this object. Returns a dictionary with keys ‘keep_inline_math’, ‘tolerant_parsing’ and ‘strict_braces’.

Deprecated since version 2.0: The ‘keep_inline_math’ key is always set to None starting in pylatexenc 2 and might be removed entirely in future versions.

pos_to_lineno_colno(pos, as_dict=False)

Return the line and column number corresponding to the given pos in our string self.s.

The first time this function is called, line numbers are calculated for the entire string. These are cached for future calls which are then fast.

Return a tuple (lineno, colno) giving line number and column number. Line numbers start at 1 and column numbers start at zero, i.e., the beginning of the document (pos=0) has line and column number (1,0). If as_dict=True, then a dictionary with keys ‘lineno’, ‘colno’ is returned instead of a tuple.
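
To make the numbering convention concrete, here is a standalone sketch of a cached offset-to-line lookup. It is not the library's implementation: the LineIndex class is hypothetical and merely follows the same convention (lines start at 1, columns at 0, line starts computed once on first use).

```python
import bisect


class LineIndex:
    """Hypothetical helper mapping string offsets to (lineno, colno)."""

    def __init__(self, s):
        self.s = s
        self._line_starts = None  # computed lazily on the first lookup

    def pos_to_lineno_colno(self, pos, as_dict=False):
        if self._line_starts is None:
            # Offsets at which each line begins; line 1 starts at offset 0.
            self._line_starts = [0] + [
                i + 1 for i, ch in enumerate(self.s) if ch == "\n"
            ]
        # Index of the last line start that is <= pos.
        line_index = bisect.bisect_right(self._line_starts, pos) - 1
        lineno = line_index + 1
        colno = pos - self._line_starts[line_index]
        if as_dict:
            return {"lineno": lineno, "colno": colno}
        return (lineno, colno)


index = LineIndex("first line\n\\textbf{x}\n")
start = index.pos_to_lineno_colno(0)
second = index.pos_to_lineno_colno(11)
print(start, second)
```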

pylatexenc.latexwalker.get_default_latex_context_db()

Return a pylatexenc.macrospec.LatexContextDb instance initialized with a collection of known macros and environments.

TODO: document categories.

If you want to add your own definitions, you should use the pylatexenc.macrospec.LatexContextDb.add_context_category() method. If you would like to override some definitions, use that method with the argument prepend=True. See docs for pylatexenc.macrospec.LatexContextDb.add_context_category().

If there are too many macro/environment definitions, or if there are some irrelevant ones, you can always filter the returned database using pylatexenc.macrospec.LatexContextDb.filter_context().

New in version 2.0: The pylatexenc.macrospec.LatexContextDb class as well as this method, were all introduced in pylatexenc 2.0.

Exception Classes

class pylatexenc.latexwalker.LatexWalkerError

Generic exception class raised by this module.

class pylatexenc.latexwalker.LatexWalkerParseError(msg, s=None, pos=None, lineno=None, colno=None)

Represents an error while parsing LaTeX code.

The following attributes are available if they were provided to the class constructor:

msg

The error message

s

The string that was currently being parsed

pos

The index in the string where the error occurred, starting at zero.

lineno

The line number where the error occurred, starting at 1.

colno

The column number where the error occurred in the line lineno, starting at 1.

class pylatexenc.latexwalker.LatexWalkerEndOfStream(final_space='')

Reached end of input stream (e.g., end of file).

Data Node Classes

class pylatexenc.latexwalker.LatexNode(_fields, _redundant_fields=None, parsing_state=None, pos=None, len=None, **kwargs)

Represents an abstract ‘node’ of the latex document.

Use nodeType() to figure out what type of node this is, and isNodeType() to test whether it is of a given type.

You should use LatexWalker.make_node() to create nodes, so that the latex walker has the opportunity to do some additional setting up.

All nodes have the following attributes:

parsing_state

The parsing state at the time this node was created. This object stores additional context information for this node, such as whether or not this node was parsed in a math mode block of LaTeX code.

See also the LatexWalker.make_parsing_state() and the parsing_state argument of LatexWalker.get_latex_nodes().

pos

The position in the parsed string that this node represents. The parsed string can be recovered as parsing_state.s, see ParsingState.s.

len

How many characters in the parsed string this node represents, starting at position pos. The parsed string can be recovered as parsing_state.s, see ParsingState.s.

New in version 2.0: The attributes parsing_state, pos and len were added in pylatexenc 2.0.

isNodeType(t)

Returns True if the current node is of the given type. The argument t must be a Python class such as, e.g. LatexGroupNode.

latex_verbatim()

Return the chunk of LaTeX code that this node represents.

This is a shorthand for node.parsing_state.s[node.pos:node.pos+node.len].

nodeType()

Returns the class which corresponds to the type of this node. This is a Python class object, that is one of LatexCharsNode, LatexGroupNode, etc.

class pylatexenc.latexwalker.LatexCharsNode(chars, **kwargs)

Bases: pylatexenc.latexwalker.LatexNode

A string of characters in the LaTeX document, without any special LaTeX code.

chars

The string of characters represented by this node.

class pylatexenc.latexwalker.LatexGroupNode(nodelist, **kwargs)

Bases: pylatexenc.latexwalker.LatexNode

A LaTeX group delimited by braces, {like this}.

Note: in the case of an optional macro or environment argument, this node is also used to represent a group delimited by square braces instead of curly braces.

nodelist

A list of nodes describing the contents of the LaTeX braced group. Each item of the list is a LatexNode.

delimiters

A 2-item tuple that stores the delimiters for this group node. Usually this is (‘{’, ‘}’), except for optional macro arguments where this might be for instance (‘[’, ‘]’).

New in version 2.0: The delimiters field was added in pylatexenc 2.0.

class pylatexenc.latexwalker.LatexCommentNode(comment, **kwargs)

Bases: pylatexenc.latexwalker.LatexNode

A LaTeX comment, delimited by a percent sign until the end of line.

comment

The comment string, not including the ‘%’ sign nor the following newline

comment_post_space

The newline that terminated the comment possibly followed by spaces (e.g., indentation spaces of the next line)

class pylatexenc.latexwalker.LatexMacroNode(macroname, **kwargs)

Bases: pylatexenc.latexwalker.LatexNode

Represents a macro type node, e.g. \textbf

macroname

The name of the macro (string), without the leading backslash.

nodeargd

The pylatexenc.macrospec.ParsedMacroArgs object that represents the macro arguments.

For macros that do not accept any argument, this is an empty ParsedMacroArgs instance. The attribute nodeargd can be None even for macros that accept arguments, in the situation where LatexWalker.get_latex_expression() encounters the macro when reading a single expression.

Arguments must be declared in the latex context passed to the LatexWalker constructor, using a suitable pylatexenc.macrospec.MacroSpec object. Some known macros are already declared in the default latex context.

New in version 2.0: The nodeargd attribute was introduced in pylatexenc 2.

macro_post_space

Any spaces that were encountered immediately after the macro.

The following attributes are obsolete since pylatexenc 2.0.

nodeoptarg

Deprecated since version 2.0: Macro arguments are stored in nodeargd in pylatexenc 2. Accessing the argument nodeoptarg will still give a first optional argument for standard latex macros, for backwards compatibility.

If non-None, this corresponds to the optional argument of the macro.

nodeargs

Deprecated since version 2.0: Macro arguments are stored in nodeargd in pylatexenc 2. Accessing the argument nodeargs will still provide a list of argument nodes for standard latex macros, for backwards compatibility.

A list of arguments to the macro. Each item in the list is a LatexNode.

class pylatexenc.latexwalker.LatexEnvironmentNode(environmentname, nodelist, **kwargs)

Bases: pylatexenc.latexwalker.LatexNode

A LaTeX Environment Node, i.e. \begin{something} ... \end{something}.

environmentname

The name of the environment (‘itemize’, ‘equation’, …)

nodelist

A list of LatexNode’s that represent all the contents between the \begin{...} instruction and the \end{...} instruction.

nodeargd

The pylatexenc.macrospec.ParsedMacroArgs object that represents the arguments passed to the environment. These are arguments that are present after the \begin{xxxxxx} command, as in \begin{tabular}{ccc} or \begin{figure}[H]. Arguments must be declared in the latex context passed to the LatexWalker constructor, using a suitable pylatexenc.macrospec.EnvironmentSpec object. Some known environments are already declared in the default latex context.

New in version 2.0: The nodeargd attribute was introduced in pylatexenc 2.

The following attributes are available, but they are obsolete since pylatexenc 2.0.

envname

Deprecated since version 2.0: This attribute was renamed environmentname for consistency with the rest of the package.

optargs

Deprecated since version 2.0: Macro arguments are stored in nodeargd in pylatexenc 2. Accessing the argument optargs will still give a list of initial optional arguments for standard latex macros, for backwards compatibility.

args

Deprecated since version 2.0: Macro arguments are stored in nodeargd in pylatexenc 2. Accessing the argument args will still give a list of curly-brace-delimited arguments for standard latex macros, for backwards compatibility.

class pylatexenc.latexwalker.LatexSpecialsNode(specials_chars, **kwargs)

Bases: pylatexenc.latexwalker.LatexNode

Represents a specials type node, e.g. & or ~

specials_chars

The specials character sequence (string), e.g. ‘&’ or ‘~’.

nodeargd

If the specials spec (cf. SpecialsSpec) has args_parser=None then the attribute nodeargd is set to None. If args_parser is specified in the spec, then the attribute nodeargd is a pylatexenc.macrospec.ParsedMacroArgs instance that represents the arguments to the specials.

The nodeargd attribute can also be None even if the specials expects arguments, in the special situation where LatexWalker.get_latex_expression() encounters this specials.

Arguments must be declared in the latex context passed to the LatexWalker constructor, using a suitable pylatexenc.macrospec.SpecialsSpec object. Some known latex specials are already declared in the default latex context.

New in version 2.0: Latex specials were introduced in pylatexenc 2.0.

class pylatexenc.latexwalker.LatexMathNode(displaytype, nodelist=[], **kwargs)

Bases: pylatexenc.latexwalker.LatexNode

A Math node type.

Note that currently only ‘inline’ math environments are detected.

displaytype

Either ‘inline’ or ‘display’, to indicate an inline math block or a display math block. (Note that math environments such as \begin{equation}...\end{equation}, are reported as LatexEnvironmentNode’s, and not as LatexMathNode’s.)

delimiters

A 2-item tuple containing the begin and end delimiters used to delimit this math mode section.

New in version 2.0: The delimiters attribute was introduced in pylatexenc 2.

nodelist

The contents of the environment, given as a list of LatexNode’s.

Parsing helpers

class pylatexenc.latexwalker.ParsingState(**kwargs)

Stores some information about the current parsing state, such as whether we are currently in a math mode block.

One of the ideas of pylatexenc is to make the parsing of LaTeX code mostly state-independent mark-up parsing (in contrast to a full TeX engine, whose state constantly changes and whose parsing behavior is altered dynamically while parsing). However a minimal state of the context might come in handy sometimes. Perhaps some macros or specials should behave differently in math mode than in text mode.

This class also stores some essential information that is associated with LatexNode’s and which provides a context to better understand the node structure. For instance, we store the original parsed string, and each node refers to which part of the string they represent.

s

The string that is parsed by the LatexWalker

latex_context

The latex context (with macros/environments specifications) that was used when parsing the string s. This is a pylatexenc.macrospec.LatexContextDb object.

in_math_mode

Whether or not we are in a math mode chunk of LaTeX (True or False). This can be inline or display, and can be caused by an equation environment.

math_mode_delimiter

Information about the kind of math mode we are currently in, if in_math_mode is True. This is a string which can be set to aid the parser. The parser sets this field to the math mode delimiter that initiated the math mode (one of '$', '$$', r'\(', r'\['). For user-initiated math modes (e.g. by a custom environment definition), you can set this string to any custom value EXCEPT any of the core math mode delimiters listed above.

Note

The tokenizer/parser relies on the value of the math_mode_delimiter attribute to disambiguate two consecutive dollar signs ...$$... into either a display math mode delimiter or two inline math mode delimiters (as in $a$$b$). You should only set math_mode_delimiter=’$’ if you know what you’re doing.

New in version 2.0: This class was introduced in version 2.0.

New in version 2.7: The attribute math_mode_delimiter was introduced in version 2.7.

Changed in version 2.7: All arguments must now be specified as keyword arguments as of version 2.7.

get_fields()

Returns the fields and values associated with this ParsingState as a dictionary.

sub_context(**kwargs)

Return a new ParsingState instance that is a copy of the current parsing state, but where the given properties keys have been set to the corresponding values (given as keyword arguments).

This makes it easy to create a sub-context in a given parser. For instance, if we enter math mode, we might write:

parsing_state_inner = parsing_state.sub_context(in_math_mode=True)

If no arguments are provided, this returns a copy of the present parsing context object.

class pylatexenc.latexwalker.LatexToken(tok, arg, pos, len, pre_space, post_space='')

Represents a token read from the LaTeX input.

This is used internally by LatexWalker’s methods. You probably don’t need to worry about individual tokens. Rather, you should use the high-level functions provided by LatexWalker (e.g., get_latex_nodes()). So most likely, you can ignore this class entirely.

Instances of this class are what the method LatexWalker.get_token() returns. See the doc of that function for more information on how tokens are parsed.

This is not the same thing as a LaTeX token, it’s just a part of the input which we treat in the same way (e.g. a bunch of content characters, a comment, a macro, etc.)

Information about the object is stored into the fields tok and arg. The tok field is a string which identifies the type of the token. The arg depends on what tok is, and describes the actual input.

Additionally, this class stores information about the position of the token in the input stream in the field pos. This pos is an integer which corresponds to the index in the input string. The field len stores the length of the token in the input string. This means that this token spans in the input string from pos to pos+len.

Leading whitespace before the token is not returned as a separate ‘char’-type token, but it is given in the pre_space field of the token which follows. Pre-space may contain a newline, but not two consecutive newlines.

The post_space is only used for ‘macro’ and ‘comment’ tokens, and it stores any spaces encountered after a macro, or the newline with any following spaces that terminates a LaTeX comment. When we encounter two consecutive newlines these are not included in post_space.

The tok field may be one of:

  • ‘char’: raw character(s) which have no special LaTeX meaning and which are part of the text content.

    The arg field contains the characters themselves.

  • ‘macro’: a macro invocation, but not \begin or \end

    The arg field contains the name of the macro, without the leading backslash.

  • ‘begin_environment’: an invocation of \begin{environment}.

    The arg field contains the name of the environment inside the braces.

  • ‘end_environment’: an invocation of \end{environment}.

    The arg field contains the name of the environment inside the braces.

  • ‘comment’: a LaTeX comment delimited by a percent sign up to the end of the line.

    The arg field contains the text in the comment line, not including the percent sign nor the newline.

  • ‘brace_open’: an opening brace. This is usually a curly brace, and sometimes also a square bracket. What is parsed as a brace depends on the arguments to get_token().

    The arg is a string which contains the relevant brace character.

  • ‘brace_close’: a closing brace. This is usually a curly brace, and sometimes also a square bracket. What is parsed as a brace depends on the arguments to get_token().

    The arg is a string which contains the relevant brace character.

  • ‘mathmode_inline’: a delimiter which starts/ends inline math. This is (e.g.) a single ‘$’ character which is not part of a double ‘$$’ display environment delimiter.

    The arg is the string value of the delimiter in question (‘$’)

  • ‘mathmode_display’: a delimiter which starts/ends display math, e.g., \[.

    The arg is the string value of the delimiter in question (e.g., \[ or $$)

  • ‘specials’: a character or character sequence that has a special meaning in LaTeX. E.g., ‘~’, ‘&’, etc.

    The arg field is then the corresponding SpecialsSpec instance. [The rationale for setting arg to a SpecialsSpec instance, in contrast to the behavior for macros and environments, is that macros and environments are delimited directly by LaTeX syntax and are determined unambiguously without any lookup in the latex context database. This is not the case for specials.]

Legacy Macro Definitions (for pylatexenc 1.x)

pylatexenc.latexwalker.MacrosDef = <function MacrosDef>

Deprecated since version 2.0: Use pylatexenc.macrospec.std_macro() instead which does the same thing, or invoke the MacroSpec class directly (or a subclass).

In pylatexenc 1.x, MacrosDef was a class. Since pylatexenc 2.0, MacrosDef is a function which returns a MacroSpec instance. In this way the earlier idiom MacrosDef(...) still works in pylatexenc 2. The field names of the constructed object might have changed since pylatexenc 1.x, so you might have to adapt existing code if you were accessing individual fields of MacrosDef objects.

In the object returned by MacrosDef(), we provide the legacy attributes macname, optarg, and numargs, so that existing code accessing those properties can continue to work.

pylatexenc.latexwalker.default_macro_dict

Deprecated since version 2.0: Use get_default_latex_context_db() instead, or create your own pylatexenc.macrospec.LatexContextDb object.

Provide an access to the default macro specs for latexwalker in a form that is compatible with pylatexenc 1.x’s default_macro_dict module-level dictionary.

This is implemented using a custom lazy mutable mapping, which behaves just like a regular dictionary but loads the data only once the dictionary is accessed. In this way the default latex specs are not loaded into a python dictionary unless they are actually queried or modified, and thus users of pylatexenc 2.0 that don’t rely on the default macro/environment definitions shouldn’t notice any decrease in performance.
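
The general idea of a lazy mutable mapping can be sketched with the standard library alone. This is not pylatexenc's actual implementation: the LazyDict class, the load_defaults() function and the sample data are all hypothetical.

```python
from collections.abc import MutableMapping


class LazyDict(MutableMapping):
    """Hypothetical mapping that calls its loader on first access only."""

    def __init__(self, loader):
        self._loader = loader
        self._data = None  # nothing loaded yet

    def _ensure_loaded(self):
        if self._data is None:
            self._data = dict(self._loader())
        return self._data

    def __getitem__(self, key):
        return self._ensure_loaded()[key]

    def __setitem__(self, key, value):
        self._ensure_loaded()[key] = value

    def __delitem__(self, key):
        del self._ensure_loaded()[key]

    def __iter__(self):
        return iter(self._ensure_loaded())

    def __len__(self):
        return len(self._ensure_loaded())


load_calls = []


def load_defaults():
    # Stand-in for an expensive load of default definitions.
    load_calls.append(1)
    return {"textbf": "{", "emph": "{"}


macros = LazyDict(load_defaults)
calls_before = len(load_calls)  # creating the mapping loads nothing
spec = macros["textbf"]         # first access triggers the load
macros["mymacro"] = "{{"        # later accesses reuse the loaded data
calls_after = len(load_calls)
print(calls_before, calls_after, len(macros))
```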