Simple Parser for LaTeX Code
The latexwalker module provides a simple API for parsing LaTeX snippets and representing the contents using a data structure based on node classes.
LatexWalker will understand the syntax of most common macros. However, latexwalker is NOT a replacement for a full LaTeX engine. (Originally, latexwalker was designed to extract useful text for indexing in text database searches of LaTeX content.)
Simple example usage:
>>> from pylatexenc.latexwalker import LatexWalker, LatexEnvironmentNode
>>> w = LatexWalker(r"""
... \textbf{Hi there!} Here is \emph{a list}:
... \begin{enumerate}[label=(i)]
... \item One
... \item Two
... \end{enumerate}
... and $x$ is a variable.
... """)
>>> (nodelist, pos, len_) = w.get_latex_nodes(pos=0)
>>> nodelist[0]
LatexCharsNode(pos=0, len=1, chars='\n')
>>> nodelist[1]
LatexMacroNode(pos=1, len=18, macroname='textbf',
nodeargd=ParsedMacroArgs(argnlist=[LatexGroupNode(pos=8, len=11,
nodelist=[LatexCharsNode(pos=9, len=9, chars='Hi there!')],
delimiters=('{', '}'))], argspec='{'), macro_post_space='')
>>> nodelist[5].isNodeType(LatexEnvironmentNode)
True
>>> nodelist[5].environmentname
'enumerate'
>>> nodelist[5].nodeargd.argspec
'['
>>> nodelist[5].nodeargd.argnlist
[LatexGroupNode(pos=60, len=11, nodelist=[LatexCharsNode(pos=61, len=9,
chars='label=(i)')], delimiters=('[', ']'))]
>>> nodelist[7].latex_verbatim()
'$x$'
You can also use latexwalker directly on the command line, producing JSON or a human-readable node tree:
$ echo '\textit{italic} text' | latexwalker --output-format=json
{
"nodelist": [
{
"nodetype": "LatexMacroNode",
"pos": 0,
"len": 15,
"macroname": "textit",
[...]
$ latexwalker --help
[...]
The parser can be influenced by specifying a collection of known macros and environments (the “latex context”) that are specified using pylatexenc.macrospec.MacroSpec and pylatexenc.macrospec.EnvironmentSpec objects in a pylatexenc.macrospec.LatexContextDb object. See the doc of the module pylatexenc.macrospec for more information.
The main LatexWalker class
class pylatexenc.latexwalker.LatexWalker(s, latex_context=None, **kwargs)
A parser which walks through an input stream, parsing it as LaTeX markup.
Arguments:
- s: the string to parse as LaTeX code.
- latex_context: a pylatexenc.macrospec.LatexContextDb object that provides macro and environment specifications with instructions on how to parse arguments, etc. If you don’t specify this argument, or if you specify None, then the default database is used. The default database is obtained with get_default_latex_context_db().
  New in version 2.0: This latex_context argument was introduced in version 2.0.
Additional keyword arguments are flags which influence the parsing. Accepted flags are:
- tolerant_parsing=True|False: If set to True, then the parser generally ignores syntax errors rather than raising an exception.
- strict_braces=True|False: This option refers specifically to encountering a closing brace when an expression is needed. You generally won’t need to specify this flag; use tolerant_parsing instead.
The methods provided in this class perform various kinds of parsing of the given string s. These methods typically accept a pos parameter, an integer that defines the position in the string s at which to start parsing.
These methods, unless otherwise documented, return a tuple (node, pos, len), where node is a LatexNode describing the parsed content, pos is the position at which the LaTeX element of interest was encountered, and len is the length of the string that is considered to be part of the node. That is, the position in the string that is immediately after the node is pos+len.
The following obsolete flags are accepted by the constructor for backwards compatibility with pylatexenc 1.x:
- macro_dict: This argument is kept for compatibility with pylatexenc 1.x. This is a dictionary of known LaTeX macro specifications. If specified, this should be a dictionary where the keys are macro names and values are pylatexenc.macrospec.MacroSpec instances, as returned for instance by the pylatexenc 1.x-emulating function MacrosDef(). If you specify this argument, you cannot provide a custom latex_context. This argument is superseded by the latex_context argument. Furthermore, if you specify this argument, no specials are parsed, so that the behavior is closer to pylatexenc 1.x.
  Deprecated since version 2.0: The macro_dict argument has been replaced by the much more powerful latex_context argument, which allows you to further provide environment specifications, etc.
- keep_inline_math=True|False: Obsolete option. In pylatexenc 1.x, this option triggered a weird behavior, especially since there is a similarly named option in pylatexenc.latex2text.LatexNodes2Text with a different meaning. [See Issue #14.] You should now only use the option math_mode= in pylatexenc.latex2text.LatexNodes2Text.
  Deprecated since version 2.0: This option is ignored starting from pylatexenc 2. Instead, you should set the option math_mode= accordingly in pylatexenc.latex2text.LatexNodes2Text.
Attribute s: The string that is being parsed. Do NOT modify this attribute.
get_latex_braced_group(pos, brace_type='{', parsing_state=None)
Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to read a latex group delimited by braces.
Reads a latex expression enclosed in braces { ... }. The first token of s[pos:] must be an opening brace.
Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, the default parsing state is used.
Returns a tuple (node, pos, len), where node is a LatexGroupNode instance, pos is the position of the first char of the expression (which has to be an opening brace), and len is the length of the group, including the closing brace (relative to the starting position).
The group must be delimited by the given brace_type. brace_type may be one of {, [, ( or <, or a 2-item tuple of two distinct single characters providing the opening and closing brace chars (e.g., ("<", ">")).
New in version 2.0: The parsing_state argument was introduced in version 2.0.
get_latex_environment(pos, environmentname=None, parsing_state=None)
Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to read a latex environment.
Reads a latex expression enclosed in a \begin{environment}...\end{environment}. The first token in the stream must be the \begin{environment}.
If environmentname is given and nonempty, then additionally a LatexWalkerParseError is raised if the environment in the input stream does not match the provided environment name.
Arguments to the begin environment command are parsed according to the corresponding specification in the latex context (latex_context) provided to the constructor. The environment name is looked up as a “macro name” in the macro spec.
Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, the default parsing state is used.
Returns a tuple (node, pos, len) where node is a LatexEnvironmentNode.
New in version 2.0: The parsing_state argument was introduced in version 2.0.
get_latex_expression(pos, strict_braces=None, parsing_state=None)
Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to parse a single LaTeX expression.
Reads a latex expression, e.g., a macro argument. This may be a single char, an escape sequence, or an expression placed in braces. This is what TeX calls a “token” (and not what we call a token… anyway).
Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, then the default parsing state is used.
Returns a tuple (node, pos, len), where pos is the position of the first char of the expression and len is the length of the expression.
New in version 2.0: The parsing_state argument was introduced in version 2.0.
get_latex_maybe_optional_arg(pos, parsing_state=None)
Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to attempt to parse an optional argument.
Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, the default parsing state is used.
Attempts to parse an optional argument. If this is successful, this method returns a tuple (node, pos, len) where node is a LatexGroupNode. Otherwise, it returns None.
New in version 2.0: The parsing_state argument was introduced in version 2.0.
get_latex_nodes(pos=0, stop_upon_closing_brace=None, stop_upon_end_environment=None, stop_upon_closing_mathmode=None, read_max_nodes=None, parsing_state=None)
Parses the latex content given to the constructor (and stored in self.s) into a list of nodes.
Returns a tuple (nodelist, pos, len) where:
- nodelist is a list of LatexNode’s representing the parsed LaTeX code.
- pos is the same as the pos given as argument; if there is leading whitespace it is reported in nodelist using a LatexCharsNode.
- len is the length of the parsed expression. If one of the stop_upon_…= arguments is provided (cf. below), then len includes the length of the token/expression that stopped the parsing.
If stop_upon_closing_brace is given and set to a character, then parsing stops once the given closing brace is encountered (but not inside a subgroup). The brace is given as a character, ‘]’, ‘}’, ‘)’, or ‘>’. Alternatively, you may specify a 2-item tuple of two single distinct characters representing the opening and closing brace chars. The returned len includes the closing brace, but the closing brace is not included in any of the nodes in the nodelist.
If stop_upon_end_environment is provided, then parsing stops once the given environment is closed. If there is an environment mismatch, then a LatexWalkerParseError is raised, except in tolerant parsing mode (see parse_flags()). Again, the closing environment is included in the length count but not in the nodes.
If stop_upon_closing_mathmode is specified, then parsing stops once the corresponding math mode (assumed already open) is closed. This argument may take the values None (no particular request to stop at any math mode token), or one of $, $$, \) or \], indicating a closing math mode delimiter that we are expecting and at which point parsing should stop.
If the token ‘$’ (respectively ‘$$’) is encountered, it is interpreted as the beginning of a new math mode chunk unless the argument stop_upon_closing_mathmode=… has been set to ‘$’ (respectively ‘$$’).
If read_max_nodes is non-None, then it should be set to an integer specifying the maximum number of top-level nodes to read before returning. (Top-level nodes means that macro arguments, environment or group contents, etc., do not count towards read_max_nodes.) If None, the entire input string will be parsed.
Note
There are a few important differences between get_latex_nodes(read_max_nodes=1) and get_latex_expression(): the former reads a logical node of the LaTeX document, which can be a sequence of characters, a macro invocation with arguments, or an entire environment, while the latter reads a single LaTeX “token” in a similar way to how LaTeX parses macro arguments.
For instance, if a macro is encountered, then get_latex_nodes(read_max_nodes=1) will read and parse its arguments and include them in the corresponding LatexMacroNode, whereas get_latex_expression() will return a minimal LatexMacroNode with no arguments regardless of the macro’s argument specification. The same holds for latex specials. For environments, get_latex_nodes(read_max_nodes=1) will return the entire parsed environment as a LatexEnvironmentNode, whereas get_latex_expression() will return a LatexMacroNode named ‘begin’ with no arguments.
Parsing might be influenced by the parsing_state. See doc for ParsingState. If parsing_state is None, the default parsing state is used.
New in version 2.0: The parsing_state argument was introduced in version 2.0.
get_token(pos, include_brace_chars=None, environments=True, keep_inline_math=None, parsing_state=None, **kwargs)
Parses the latex content given to the constructor (and stored in self.s), starting at position pos, to parse a single “token”, as defined by LatexToken.
Parse the token in the stream pointed to at position pos.
For tokens of type ‘char’, usually a single character is returned. The only exception is at paragraph boundaries, where a single ‘char’-type token has argument ‘\n\n’.
Returns a LatexToken. Raises LatexWalkerEndOfStream if the end of the stream is reached.
The argument include_brace_chars= allows you to specify additional pairs of single characters which should be considered as braces (i.e., of ‘brace_open’ and ‘brace_close’ token types). It should be a list of 2-item tuples, for instance [('[', ']'), ('<', '>')]. The pair (‘{’, ‘}’) is always considered as braces. The delimiters may not have more than one character each.
If environments=False, then \begin and \end tokens count as regular ‘macro’ tokens (see LatexToken); otherwise (the default) they are considered as the token types ‘begin_environment’ and ‘end_environment’.
The parsing of the tokens might be influenced by the parsing_state (a ParsingState instance). Currently, the only influence this has is that some latex specials are parsed differently if in math mode. See doc for ParsingState. If parsing_state is None, the default parsing state returned by make_parsing_state() is used.
Deprecated since version 2.0: The flag keep_inline_math is only accepted for compatibility with earlier versions of pylatexenc, but it has no effect starting in pylatexenc 2. See the LatexWalker class doc.
Deprecated since version 2.0: If brackets_are_chars=False, then square bracket characters count as ‘brace_open’ and ‘brace_close’ token types (see LatexToken); otherwise (the default) they are considered just like other normal characters.
New in version 2.0: The parsing_state argument was introduced in version 2.0.
make_node(node_class, **kwargs)
Create and return a node of type node_class which holds a representation of the latex code at position pos and of length len in the parsed string.
The node class should be a LatexNode subclass. Keyword arguments are supplied directly to the constructor of the node class. Mandatory keyword-only arguments are ‘pos’, ‘len’, and ‘parsing_state’.
All nodes produced by get_latex_nodes() and friends are created with this method.
New in version 2.0: This method was introduced in pylatexenc 2.0.
make_parsing_state(**kwargs)
Return a new parsing state object that corresponds to the current string that we are parsing (s provided to the constructor) and the current latex context (latex_context provided to the constructor).
If no arguments are provided, this returns the default parsing state.
If keyword arguments are provided, then they can override fields from the default parsing state. For instance, if we enter math mode, you might use:
parsing_state_mathmode = \
    my_latex_walker.make_parsing_state(in_math_mode=True)
parse_flags()
The parse flags currently set on this object. Returns a dictionary with keys ‘keep_inline_math’, ‘tolerant_parsing’ and ‘strict_braces’.
Deprecated since version 2.0: The ‘keep_inline_math’ key is always set to None starting in pylatexenc 2 and might be removed entirely in future versions.
pos_to_lineno_colno(pos, as_dict=False)
Return the line and column number corresponding to the given pos in our string self.s.
The first time this function is called, line numbers are calculated for the entire string. These are cached for future calls, which are then fast.
Return a tuple (lineno, colno) giving line number and column number. Line numbers start at 1 and column numbers start at zero, i.e., the beginning of the document (pos=0) has line and column number (1,0). If as_dict=True, then a dictionary with keys ‘lineno’, ‘colno’ is returned instead of a tuple.
pylatexenc.latexwalker.get_default_latex_context_db()
Return a pylatexenc.macrospec.LatexContextDb instance initialized with a collection of known macros and environments.
TODO: document categories.
If you want to add your own definitions, you should use the pylatexenc.macrospec.LatexContextDb.add_context_category() method. If you would like to override some definitions, use that method with the argument prepend=True. See docs for pylatexenc.macrospec.LatexContextDb.add_context_category().
If there are too many macro/environment definitions, or if there are some irrelevant ones, you can always filter the returned database using pylatexenc.macrospec.LatexContextDb.filter_context().
New in version 2.0: The pylatexenc.macrospec.LatexContextDb class, as well as this method, were introduced in pylatexenc 2.0.
Exception Classes
class pylatexenc.latexwalker.LatexWalkerError
Generic exception class raised by this module.
class pylatexenc.latexwalker.LatexWalkerParseError(msg, s=None, pos=None, lineno=None, colno=None)
Represents an error while parsing LaTeX code.
The following attributes are available if they were provided to the class constructor:
- msg: The error message.
- s: The string that was being parsed.
- pos: The index in the string where the error occurred, starting at zero.
- lineno: The line number where the error occurred, starting at 1.
- colno: The column number where the error occurred in the line lineno, starting at 1.
class pylatexenc.latexwalker.LatexWalkerEndOfStream(final_space='')
Reached end of input stream (e.g., end of file).
Data Node Classes
class pylatexenc.latexwalker.LatexNode(_fields, _redundant_fields=None, parsing_state=None, pos=None, len=None, **kwargs)
Represents an abstract ‘node’ of the latex document.
Use nodeType() to figure out what type of node this is, and isNodeType() to test whether it is of a given type.
You should use LatexWalker.make_node() to create nodes, so that the latex walker has the opportunity to do some additional setting up.
All nodes have the following attributes:
- parsing_state: The parsing state at the time this node was created. This object stores additional context information for this node, such as whether or not this node was parsed in a math mode block of LaTeX code.
  See also LatexWalker.make_parsing_state() and the parsing_state argument of LatexWalker.get_latex_nodes().
- pos: The position in the parsed string that this node represents. The parsed string can be recovered as parsing_state.s, see ParsingState.s.
- len: How many characters in the parsed string this node represents, starting at position pos. The parsed string can be recovered as parsing_state.s, see ParsingState.s.
New in version 2.0: The attributes parsing_state, pos and len were added in pylatexenc 2.0.
isNodeType(t)
Returns True if the current node is of the given type. The argument t must be a Python class such as, e.g., LatexGroupNode.
latex_verbatim()
Return the chunk of LaTeX code that this node represents.
This is a shorthand for node.parsing_state.s[node.pos:node.pos+node.len].
nodeType()
Returns the class which corresponds to the type of this node. This is a Python class object, that is one of LatexCharsNode, LatexGroupNode, etc.
class
pylatexenc.latexwalker.
LatexCharsNode
(chars, **kwargs)¶ Bases:
pylatexenc.latexwalker.LatexNode
A string of characters in the LaTeX document, without any special LaTeX code.
-
chars
¶ The string of characters represented by this node.
-
-
class
pylatexenc.latexwalker.
LatexGroupNode
(nodelist, **kwargs)¶ Bases:
pylatexenc.latexwalker.LatexNode
A LaTeX group delimited by braces,
{like this}
.Note: in the case of an optional macro or environment argument, this node is also used to represents a group delimited by square braces instead of curly braces.
-
nodelist
¶ A list of nodes describing the contents of the LaTeX braced group. Each item of the list is a
LatexNode
.
-
delimiters
¶ A 2-item tuple that stores the delimiters for this group node. Usually this is (‘{’, ‘}’), except for optional macro arguments where this might be for instance (‘[’, ‘]’).
New in version 2.0: The delimiters field was added in pylatexenc 2.0.
-
-
class
pylatexenc.latexwalker.
LatexCommentNode
(comment, **kwargs)¶ Bases:
pylatexenc.latexwalker.LatexNode
A LaTeX comment, delimited by a percent sign until the end of line.
-
comment
¶ The comment string, not including the ‘%’ sign nor the following newline
-
comment_post_space
¶ The newline that terminated the comment possibly followed by spaces (e.g., indentation spaces of the next line)
-
-
class
pylatexenc.latexwalker.
LatexMacroNode
(macroname, **kwargs)¶ Bases:
pylatexenc.latexwalker.LatexNode
Represents a macro type node, e.g.
\textbf
-
macroname
¶ The name of the macro (string), without the leading backslash.
-
nodeargd
¶ The
pylatexenc.macrospec.ParsedMacroArgs
object that represents the macro arguments.For macros that do not accept any argument, this is an empty
ParsedMacroArgs
instance. The attribute nodeargd can be None even for macros that accept arguments, in the situation whereLatexWalker.get_latex_expression()
encounters the macro when reading a single expression.Arguments must be declared in the latex context passed to the
LatexWalker
constructor, using a suitablepylatexenc.macrospec.MacroSpec
object. Some known macros are already declared in the default latex context.New in version 2.0: The nodeargd attribute was introduced in pylatexenc 2.
-
macro_post_space
¶ Any spaces that were encountered immediately after the macro.
The following attributes are obsolete since pylatexenc 2.0.
-
nodeoptarg
¶ Deprecated since version 2.0: Macro arguments are stored in nodeargd in pylatexenc 2. Accessing the argument nodeoptarg will still give a first optional argument for standard latex macros, for backwards compatibility.
If non-None, this corresponds to the optional argument of the macro.
-
-
class
pylatexenc.latexwalker.
LatexEnvironmentNode
(environmentname, nodelist, **kwargs)¶ Bases:
pylatexenc.latexwalker.LatexNode
A LaTeX Environment Node, i.e.
\begin{something} ... \end{something}
.-
environmentname
¶ The name of the environment (‘itemize’, ‘equation’, …)
-
nodelist
¶ A list of
LatexNode
’s that represent all the contents between the\begin{...}
instruction and the\end{...}
instruction.
-
nodeargd
¶ The
pylatexenc.macrospec.ParsedMacroArgs
object that represents the arguments passed to the environment. These are arguments that are present after the\begin{xxxxxx}
command, as in\begin{tabular}{ccc}
or\begin{figure}[H]
. Arguments must be declared in the latex context passed to theLatexWalker
constructor, using a suitablepylatexenc.macrospec.EnvironmentSpec
object. Some known environments are already declared in the default latex context.New in version 2.0: The nodeargd attribute was introduced in pylatexenc 2.
The following attributes are available, but they are obsolete since pylatexenc 2.0.
-
envname
¶ Deprecated since version 2.0: This attribute was renamed environmentname for consistency with the rest of the package.
-
optargs
¶ Deprecated since version 2.0: Macro arguments are stored in nodeargd in pylatexenc 2. Accessing the argument optargs will still give a list of initial optional arguments for standard latex macros, for backwards compatibility.
-
args
¶ Deprecated since version 2.0: Macro arguments are stored in nodeargd in pylatexenc 2. Accessing the argument args will still give a list of curly-brace-delimited arguments for standard latex macros, for backwards compatibility.
-
-
class
pylatexenc.latexwalker.
LatexSpecialsNode
(specials_chars, **kwargs)¶ Bases:
pylatexenc.latexwalker.LatexNode
Represents a specials type node, e.g.
&
or~
-
specials_chars
¶ The name of the specials (string), without the leading backslash.
-
nodeargd
¶ If the specials spec (cf.
SpecialsSpec
) has args_parser=None then the attribute nodeargd is set to None. If args_parser is specified in the spec, then the attribute nodeargd is apylatexenc.macrospec.ParsedMacroArgs
instance that represents the arguments to the specials.The nodeargd attribute can also be None even if the specials expects arguments, in the special situation where
LatexWalker.get_latex_expression()
encounters this specials.Arguments must be declared in the latex context passed to the
LatexWalker
constructor, using a suitablepylatexenc.macrospec.SpecialsSpec
object. Some known latex specials are already declared in the default latex context.
New in version 2.0: Latex specials were introduced in pylatexenc 2.0.
-
-
class
pylatexenc.latexwalker.
LatexMathNode
(displaytype, nodelist=[], **kwargs)¶ Bases:
pylatexenc.latexwalker.LatexNode
A Math node type.
Note that currently only ‘inline’ math environments are detected.
-
displaytype
¶ Either ‘inline’ or ‘display’, to indicate an inline math block or a display math block. (Note that math environments such as
\begin{equation}...\end{equation}
, are reported asLatexEnvironmentNode
’s, and not asLatexMathNode
’s.)
-
delimiters
¶ A 2-item tuple containing the begin and end delimiters used to delimit this math mode section.
New in version 2.0: The delimiters attribute was introduced in pylatexenc 2.
-
Parsing helpers
class pylatexenc.latexwalker.ParsingState(**kwargs)
Stores some information about the current parsing state, such as whether we are currently in a math mode block.
One of the ideas of pylatexenc is to make the parsing of LaTeX code mostly state-independent mark-up parsing (in contrast to a full TeX engine, whose state constantly changes and whose parsing behavior is altered dynamically while parsing). However, a minimal state of the context might come in handy sometimes. Perhaps some macros or specials should behave differently in math mode than in text mode.
This class also stores some essential information that is associated with LatexNode’s and which provides a context to better understand the node structure. For instance, we store the original parsed string, and each node refers to the part of the string it represents.
- s: The string that is parsed by the LatexWalker.
- latex_context: The latex context (with macros/environments specifications) that was used when parsing the string s. This is a pylatexenc.macrospec.LatexContextDb object.
- in_math_mode: Whether or not we are in a math mode chunk of LaTeX (True or False). This can be inline or display, and can be caused by an equation environment.
- math_mode_delimiter: Information about the kind of math mode we are currently in, if in_math_mode is True. This is a string which can be set to aid the parser. The parser sets this field to the math mode delimiter that initiated the math mode (one of '$', '$$', r'\(', r'\['). For user-initiated math modes (e.g. by a custom environment definition), you can set this string to any custom value EXCEPT any of the core math mode delimiters listed above.
  Note: The tokenizer/parser relies on the value of the math_mode_delimiter attribute to disambiguate two consecutive dollar signs ...$$... into either a display math mode delimiter or two inline math mode delimiters (as in $a$$b$). You should only set math_mode_delimiter=’$’ if you know what you’re doing.
New in version 2.0: This class was introduced in version 2.0.
New in version 2.7: The attribute math_mode_delimiter was introduced in version 2.7.
Changed in version 2.7: All arguments must now be specified as keyword arguments as of version 2.7.
get_fields()
Returns the fields and values associated with this ParsingState as a dictionary.
sub_context(**kwargs)
Return a new ParsingState instance that is a copy of the current parsing state, but where the given properties have been set to the corresponding values (given as keyword arguments).
This makes it easy to create a sub-context in a given parser. For instance, if we enter math mode, we might write:
parsing_state_inner = parsing_state.sub_context(in_math_mode=True)
If no arguments are provided, this returns a copy of the present parsing context object.
class
pylatexenc.latexwalker.
LatexToken
(tok, arg, pos, len, pre_space, post_space='')¶ Represents a token read from the LaTeX input.
This is used internally by
LatexWalker
’s methods. You probably don’t need to worry about individual tokens. Rather, you should use the high-level functions provided byLatexWalker
(e.g.,get_latex_nodes()
). So most likely, you can ignore this class entirely.Instances of this class are what the method
LatexWalker.get_token()
returns. See the doc of that function for more information on how tokens are parsed.This is not the same thing as a LaTeX token, it’s just a part of the input which we treat in the same way (e.g. a bunch of content characters, a comment, a macro, etc.)
Information about the object is stored into the fields tok and arg. The tok field is a string which identifies the type of the token. The arg depends on what tok is, and describes the actual input.
Additionally, this class stores information about the position of the token in the input stream in the field pos. This pos is an integer which corresponds to the index in the input string. The field len stores the length of the token in the input string. This means that this token spans in the input string from pos to pos+len.
Leading whitespace before the token is not returned as a separate ‘char’-type token, but it is given in the pre_space field of the token which follows. Pre-space may contain a newline, but not two consecutive newlines.
The post_space is only used for ‘macro’ and ‘comment’ tokens, and it stores any spaces encountered after a macro, or the newline with any following spaces that terminates a LaTeX comment. When we encounter two consecutive newlines these are not included in post_space.
The tok field may be one of:
‘char’: raw character(s) which have no special LaTeX meaning and which are part of the text content.
The arg field contains the characters themselves.
‘macro’: a macro invocation, but not
\begin
or\end
The arg field contains the name of the macro, without the leading backslash.
‘begin_environment’: an invocation of
\begin{environment}
.The arg field contains the name of the environment inside the braces.
‘end_environment’: an invocation of
\end{environment}
.The arg field contains the name of the environment inside the braces.
‘comment’: a LaTeX comment delimited by a percent sign up to the end of the line.
The arg field contains the text in the comment line, not including the percent sign nor the newline.
‘brace_open’: an opening brace. This is usually a curly brace, and sometimes also a square bracket. What is parsed as a brace depends on the arguments to
get_token()
.The arg is a string which contains the relevant brace character.
‘brace_close’: a closing brace. This is usually a curly brace, and sometimes also a square bracket. What is parsed as a brace depends on the arguments to
get_token()
.The arg is a string which contains the relevant brace character.
‘mathmode_inline’: a delimiter which starts/ends inline math. This is (e.g.) a single ‘$’ character which is not part of a double ‘$$’ display environment delimiter.
The arg is the string value of the delimiter in question (‘$’)
‘mathmode_display’: a delimiter which starts/ends display math, e.g.,
\[
.The arg is the string value of the delimiter in question (e.g.,
\[
or$$
)‘specials’: a character or character sequence that has a special meaning in LaTeX. E.g., ‘~’, ‘&’, etc.
The arg field is then the corresponding
SpecialsSpec
instance. [The rationale for setting arg to a SpecialsSpec instance, in contrast to the behavior for macros and envrionments, is that macros and environments are delimited directly by LaTeX syntax and are determined unambiguously without any lookup in the latex context database. This is not the case for specials.]
Legacy Macro Definitions (for pylatexenc 1.x)
pylatexenc.latexwalker.MacrosDef
Deprecated since version 2.0: Use pylatexenc.macrospec.std_macro() instead, which does the same thing, or invoke the MacroSpec class directly (or a subclass).
In pylatexenc 1.x, MacrosDef was a class. Since pylatexenc 2.0, MacrosDef is a function which returns a MacroSpec instance. In this way the earlier idiom MacrosDef(...) still works in pylatexenc 2. The field names of the constructed object might have changed since pylatexenc 1.x, so you might have to adapt existing code if you were accessing individual fields of MacrosDef objects.
In the object returned by MacrosDef(), we provide the legacy attributes macname, optarg, and numargs, so that existing code accessing those properties can continue to work.
pylatexenc.latexwalker.default_macro_dict
Deprecated since version 2.0: Use get_default_latex_context_db() instead, or create your own pylatexenc.macrospec.LatexContextDb object.
Provides access to the default macro specs for latexwalker in a form that is compatible with pylatexenc 1.x’s default_macro_dict module-level dictionary.
This is implemented using a custom lazy mutable mapping, which behaves just like a regular dictionary but loads the data only once the dictionary is accessed. In this way, the default latex specs are not loaded into a Python dictionary unless they are actually queried or modified, and thus users of pylatexenc 2.0 that don’t rely on the default macro/environment definitions shouldn’t notice any decrease in performance.