latexnodes — LaTeX Nodes Tree and Parsers¶
New in version 3.0: The latexnodes module was introduced in pylatexenc 3.
Parsing State¶
- class pylatexenc.latexnodes.ParsingState(**kwargs)¶
Stores some information about the current parsing state, such as whether we are currently in a math mode block.
One of the ideas of pylatexenc is to make the parsing of LaTeX code mostly state-independent mark-up parsing (in contrast to a full TeX engine, whose state constantly changes and whose parsing behavior is altered dynamically while parsing). However a minimal state of the context might come in handy sometimes. Perhaps some macros or specials should behave differently in math mode than in text mode.
This class also stores some essential information that is associated with LatexNode instances and which provides context to better understand the node structure. For instance, we store the original parsed string, and each node records which part of the string it represents.
- s¶
The string that is parsed by the LatexWalker.
Deprecated since version 3.0: The s attribute is deprecated starting in pylatexenc 3. If you have access to a node instance (cf.
LatexNode
) and would like to find out the original string that was parsed, use node.latex_walker.s instead of querying the parsing state. (The rationale of removing the s attribute from the parsing state is for parsing state objects to have a meaning of their own independently of any string being parsed or any latex walker instance.)
- latex_context¶
The latex context (with macros/environments specifications) that was used when parsing the string s. This is a
pylatexenc.macrospec.LatexContextDb
object.
- in_math_mode¶
Whether or not we are in a math mode chunk of LaTeX (True or False). This can be inline or display, and can be caused by an equation environment.
- math_mode_delimiter¶
Information about the kind of math mode we are currently in, if in_math_mode is True. This is a string which can be set to aid the parser. The parser sets this field to the math mode delimiter that initiated the math mode (one of '$', '$$', r'\(', r'\['). For user-initiated math modes (e.g. by a custom environment definition), you can set this string to any custom value EXCEPT any of the core math mode delimiters listed above.
Note
The tokenizer/parser relies on the value of the math_mode_delimiter attribute to disambiguate two consecutive dollar signs ...$$... into either a display math mode delimiter or two inline math mode delimiters (as in $a$$b$). You should only set math_mode_delimiter='$' if you know what you’re doing.
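The disambiguation described in the note can be sketched as follows. This is a simplified illustration, not the tokenizer's actual code; the helper name read_dollar_delimiter and its signature are hypothetical.

```python
def read_dollar_delimiter(s, pos, in_math_mode=False, math_mode_delimiter=None):
    """Classify the dollar sign(s) found at s[pos] (hypothetical helper)."""
    if not s.startswith('$', pos):
        return None
    # Inside inline math opened by '$', the next '$' has to close it, even if
    # another '$' follows immediately (as in "$a$$b$").
    inside_inline_dollar = in_math_mode and math_mode_delimiter == '$'
    if s.startswith('$$', pos) and not inside_inline_dollar:
        return ('mathmode_display', '$$')
    return ('mathmode_inline', '$')
```

With the state set to in_math_mode=True and math_mode_delimiter='$', the two dollar signs in the middle of $a$$b$ are read as a closing inline delimiter followed by an opening one.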
- latex_group_delimiters¶
Doc …………
- latex_inline_math_delimiters¶
Doc …………
- latex_display_math_delimiters¶
Doc …………
- enable_double_newline_paragraphs¶
Doc …………
- enable_environments¶
Doc ………..
- enable_comments¶
Doc …………
- macro_alpha_chars¶
Doc …………
- macro_escape_char¶
Doc …………….
- comment_start¶
Doc ………….
- forbidden_characters¶
Characters that are simply forbidden to occur as regular characters. You can use this for instance if you’d like to disable some LaTeX-like features but cause the corresponding character to raise an error. For instance, you can force inline math to be typed as \(...\) and not as $...$, and yet still force users to type \$ for a dollar sign by including ‘$’ in the list of forbidden characters.
The forbidden_characters can be a string, or a list of single-character strings; this attribute will be used with the syntax
if (c in forbidden_characters): ...
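A small sketch of why both a string and a list work with this membership test. The function name is_forbidden is hypothetical and only wraps the `in` check quoted above; how the real tokenizer reports the error is not shown here.

```python
def is_forbidden(c, forbidden_characters):
    # For a single character c, `in` behaves the same way on a string
    # and on a list of single-character strings.
    return c in forbidden_characters


as_string = '$%'
as_list = ['$', '%']
```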
New in version 2.0: This class was introduced in version 2.0.
New in version 2.7: The attribute math_mode_delimiter was introduced in version 2.7.
Changed in version 2.7: All arguments must now be specified as keyword arguments as of version 2.7.
New in version 3.0: The attributes latex_group_delimiters, latex_inline_math_delimiters, latex_display_math_delimiters, enable_double_newline_paragraphs, enable_environments, enable_comments, macro_alpha_chars, macro_escape_char, and forbidden_characters were introduced in version 3.
New in version 3.0: This class was moved to pylatexenc.latexnodes.ParsingState starting in pylatexenc 3.0. In earlier versions, this class was located in the latexwalker module, see ParsingState.
- sub_context(**kwargs)¶
Return a new ParsingState instance that is a copy of the current parsing state, but where the given properties have been set to the corresponding values (given as keyword arguments).
This makes it easy to create a sub-context in a given parser. For instance, if we enter math mode, we might write:
parsing_state_inner = parsing_state.sub_context(in_math_mode=True)
If no arguments are provided, this returns a copy of the present parsing context object.
- get_fields()¶
Returns the fields and values associated with this ParsingState as a dictionary.
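The copy-with-overrides behavior of sub_context() and the dictionary returned by get_fields() can be illustrated with a stand-in class. MiniParsingState is hypothetical and carries only two of the fields listed above; it is a sketch of the pattern, not the library's implementation.

```python
class MiniParsingState:
    """Hypothetical stand-in for ParsingState with just two fields."""
    _fields = ('in_math_mode', 'math_mode_delimiter')

    def __init__(self, **kwargs):
        self.in_math_mode = False
        self.math_mode_delimiter = None
        for key, value in kwargs.items():
            if key not in self._fields:
                raise ValueError("unknown field: " + key)
            setattr(self, key, value)

    def get_fields(self):
        # Fields and their current values, as a dictionary.
        return {f: getattr(self, f) for f in self._fields}

    def sub_context(self, **kwargs):
        # Copy all fields, then override the ones given as keyword arguments.
        fields = self.get_fields()
        fields.update(kwargs)
        return MiniParsingState(**fields)


outer = MiniParsingState()
inner = outer.sub_context(in_math_mode=True, math_mode_delimiter='$')
```

The outer state is left untouched, so a parser can hand the inner state to whatever parses the math content and keep using the outer one afterwards.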
- class pylatexenc.latexnodes.ParsingStateDelta(set_attributes=None, _fields=None, **kwargs)¶
Describe a change in the parsing state. This can be the transition into math mode, the definition of a new macro causing the latex context to change, etc.
There are many ways in which the parsing state can change, and this is reflected in the many different subclasses of ParsingStateDelta (e.g., ParsingStateDeltaEnterMathMode).
This class serves both as a base class for general parsing state changes and as a simple implementation of a parsing state change based on parsing state attributes that are to be changed.
- get_updated_parsing_state(parsing_state, latex_walker)¶
Apply any required changes to the given parsing_state and return a new parsing state that reflects all the necessary changes.
The returned parsing state might be the same object instance as the given parsing_state if no changes need to be applied.
- class pylatexenc.latexnodes.ParsingStateDeltaReplaceParsingState(set_parsing_state, **kwargs)¶
A parsing state change in which a new full parsing state object entirely replaces the previous parsing state.
- class pylatexenc.latexnodes.ParsingStateDeltaChained(parsing_state_deltas, **kwargs)¶
Apply multiple parsing state deltas, in the order specified.
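The following sketch shows how an attribute-setting delta and a chained delta could fit together. It assumes, for brevity, that the parsing state is a plain dictionary; the class names SetAttributesDelta and ChainedDelta are hypothetical stand-ins, and only the method name get_updated_parsing_state() is taken from the text above.

```python
class SetAttributesDelta:
    """Hypothetical delta that sets some attributes on the state."""

    def __init__(self, set_attributes=None):
        self.set_attributes = dict(set_attributes or {})

    def get_updated_parsing_state(self, parsing_state, latex_walker=None):
        changed = {k: v for k, v in self.set_attributes.items()
                   if parsing_state.get(k) != v}
        if not changed:
            return parsing_state  # nothing to change: same object is returned
        return dict(parsing_state, **changed)  # new object, original untouched


class ChainedDelta:
    """Hypothetical delta that applies several deltas in the given order."""

    def __init__(self, parsing_state_deltas):
        self.parsing_state_deltas = list(parsing_state_deltas)

    def get_updated_parsing_state(self, parsing_state, latex_walker=None):
        for delta in self.parsing_state_deltas:
            parsing_state = delta.get_updated_parsing_state(parsing_state, latex_walker)
        return parsing_state


state = {'in_math_mode': False, 'enable_comments': True}
chain = ChainedDelta([
    SetAttributesDelta({'in_math_mode': True}),
    SetAttributesDelta({'enable_comments': False}),
])
new_state = chain.get_updated_parsing_state(state)
```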
- class pylatexenc.latexnodes.ParsingStateDeltaWalkerEvent(walker_event_name, walker_event_kwargs)¶
A parsing state change representing a logical “event” (like entering math mode), for which the actual parsing state changes should be requested from the latex walker instance.
DOC………………….
- class pylatexenc.latexnodes.ParsingStateDeltaEnterMathMode(math_mode_delimiter=None, trigger_token=None)¶
A parsing state change representing the beginning of math mode contents.
This class is a semantic marker for entering math mode and does not itself set the field in_math_mode=True for the parsing state. It’s a “walker event parsing state delta”, see
ParsingStateDeltaWalkerEvent
. The latexwalker is queried to obtain the actual parsing state change that should be effected because of the change to math mode. (There might be changes other than in_math_mode=True, such as a different set of macro definitions, etc.)
- class pylatexenc.latexnodes.ParsingStateDeltaLeaveMathMode(trigger_token=None)¶
A parsing state change representing contents in text mode.
See also
ParsingStateDeltaEnterMathMode
.
Latex Token¶
- class pylatexenc.latexnodes.LatexToken(tok, arg, pos, pos_end=None, pre_space='', post_space='', **kwargs)¶
Represents a token read from the LaTeX input. Instances of this class are returned by token readers such as LatexTokenReader.
This is not the same thing as a LaTeX token; it’s just a part of the input which we treat in the same way (e.g. a text character, a comment, a macro, etc.)
Information about the object is stored into the fields tok and arg. The tok field is a string which identifies the type of the token. The arg depends on what tok is, and describes the actual input.
Additionally, this class stores information about the position of the token in the input stream in the field pos. This pos is an integer which corresponds to the index in the input string. The field pos_end stores the position immediately past the token in the input string. This means that the string length spanned by this token is pos_end - pos (without leading whitespace).
Leading whitespace before the token is not returned as a separate ‘char’-type token, but it is given in the pre_space field of the token which follows. Pre-space may contain a newline, but not two consecutive newlines. The pos position is the position of the first character of the token itself, which immediately follows any leading whitespace.
The post_space is only used for ‘macro’ and ‘comment’ tokens, and it stores any spaces encountered after a macro, or the newline with any following spaces that terminates a LaTeX comment. When we encounter two consecutive newlines these are not included in post_space. Contrary to pre_space, the post_space is accounted for in the attribute pos_end, i.e., pos_end points immediately after any trailing whitespace.
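As a worked example of the position bookkeeping described above, the sketch below computes the fields for a single macro token. The helper read_macro_fields and its regular expression are hypothetical and handle only spaces and tabs around an alphabetic macro name; they illustrate the arithmetic, not the real token reader.

```python
import re

# Leading blanks, a backslash, an alphabetic macro name, trailing blanks.
_rx_macro = re.compile(r'(?P<pre>[ \t]*)\\(?P<name>[A-Za-z]+)(?P<post>[ \t]*)')


def read_macro_fields(s, start=0):
    m = _rx_macro.match(s, start)
    if m is None:
        return None
    return {
        'tok': 'macro',
        'arg': m.group('name'),
        'pre_space': m.group('pre'),
        'pos': m.start('name') - 1,   # position of the backslash itself
        'post_space': m.group('post'),
        'pos_end': m.end('post'),     # immediately past the trailing whitespace
    }


fields = read_macro_fields('  \\foo  x')
```

For this input, pos is 2 (after the two leading spaces) and pos_end is 8, so pos_end - pos covers the four characters of \foo plus the two characters of post_space.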
The tok field may be one of:
‘char’: raw character(s) which have no special LaTeX meaning and which are part of the text content.
The arg field contains the characters themselves.
‘macro’: a macro invocation, but not \begin or \end.
The arg field contains the name of the macro, without the leading backslash.
‘begin_environment’: an invocation of \begin{environment}.
The arg field contains the name of the environment inside the braces.
‘end_environment’: an invocation of \end{environment}.
The arg field contains the name of the environment inside the braces.
‘comment’: a LaTeX comment delimited by a percent sign up to the end of the line.
The arg field contains the text in the comment line, not including the percent sign nor the newline.
‘brace_open’: an opening brace. This is usually a curly brace, and sometimes also a square bracket. What is parsed as a brace depends on the arguments to get_token().
The arg is a string which contains the relevant brace character.
‘brace_close’: a closing brace. This is usually a curly brace, and sometimes also a square bracket. What is parsed as a brace depends on the arguments to get_token().
The arg is a string which contains the relevant brace character.
‘mathmode_inline’: a delimiter which starts/ends inline math. This is (e.g.) a single ‘$’ character which is not part of a double ‘$$’ display environment delimiter.
The arg is the string value of the delimiter in question (‘$’).
‘mathmode_display’: a delimiter which starts/ends display math, e.g., \[.
The arg is the string value of the delimiter in question (e.g., \[ or $$).
‘specials’: a character or character sequence that has a special meaning in LaTeX. E.g., ‘~’, ‘&’, etc.
The arg field is then the corresponding SpecialsSpec instance.
The rationale for setting arg to a SpecialsSpec instance, in contrast to the behavior for macros and environments, is that macros and environments are delimited directly by LaTeX syntax and are determined unambiguously without any lookup in the latex context database. This is not the case for specials, where successfully parsing a specials already requires a lookup in the context database, and so the spec object is readily available.
Changed in version 3.0: Starting in pylatexenc 3, the len argument was replaced by pos_end. For backwards compatibility, kwargs arguments are inspected for a len argument. If a len argument is provided and pos_end was left None, then pos_end is set to pos+len.
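The backwards-compatibility rule for the len argument amounts to the following. The function name resolve_pos_end is hypothetical; the sketch only restates the rule given above.

```python
def resolve_pos_end(pos, pos_end=None, **kwargs):
    # A legacy `len` keyword is honored only when pos_end was left as None.
    legacy_len = kwargs.pop('len', None)
    if pos_end is None and legacy_len is not None:
        pos_end = pos + legacy_len
    return pos_end
```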
New in version 3.0: This class was moved to pylatexenc.latexnodes.LatexToken starting in pylatexenc 3.0. In earlier versions, this class was located in the latexwalker module, see LatexToken.
Token Readers¶
- class pylatexenc.latexnodes.LatexTokenReaderBase(**kwargs)¶
Base class for token readers.
A token reader is able to transform input characters (usually given as a single string) into tokens. Tokens are instances of LatexToken.
A token reader also has an internal position pointer that remembers where in the string we should continue to read more tokens. A call to next_token() will both parse a new token and advance the internal position pointer past the token that was just read, such that future calls to next_token() continue parsing tokens as they appear in the string.
A token reader should at minimum provide implementations of peek_token(), move_to_token(), move_past_token(), and cur_pos().
A token reader can (but does not have to) also provide character-level access to the input. This can be used by some special parsers like verbatim parsers. In this case, the token reader should implement peek_chars(), next_chars(), and move_to_pos_chars().
New in version 3.0: The LatexTokenReaderBase class was introduced in pylatexenc 3.0.
- make_token(**kwargs)¶
Return a new
LatexToken
instance with the given parameters. Can be reimplemented if you want to use a custom token class, although I’m not sure why you’d want to do that.
- move_to_token(tok, rewind_pre_space=True)¶
Move the internal position pointer of this token reader to point to the position of the given token tok. That is, a subsequent call to peek_token() or next_token() should read the given token again.
For token readers that keep track of whitespace, if rewind_pre_space=True, then the internal position is set to point to the whitespace that precedes the token tok (as specified in the instance tok); if rewind_pre_space=False, the internal position pointer is set to point to the actual token after the preceding whitespace.
- move_past_token(tok, fastforward_post_space=True)¶
Move the internal position pointer of this token reader to point immediately past the given token tok. That is, a subsequent call to peek_token() or next_token() should return the token that follows tok in the input stream.
For token readers that keep track of whitespace, if fastforward_post_space=True, then whitespace that follows the given tok (for macro and comment tokens) is also skipped.
- peek_token(parsing_state)¶
Parse a single token at the current position in the input stream. Parsing is influenced by the given parsing_state. (See ParsingState.)
The internal position pointer is not updated. I.e., a subsequent call to peek_token() with the same parsing state should return the same token.
If the end of stream is reached, i.e., if there are no remaining tokens at the current internal position, then
LatexWalkerEndOfStream
is raised.
- peek_token_or_none(parsing_state)¶
A convenience method that calls peek_token(), but that returns None instead of raising LatexWalkerEndOfStream.
- next_token(parsing_state)¶
Same as
peek_token()
, but then also updates the internal position pointer of this token reader to advance past the token that was read.
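The difference between peeking and advancing can be sketched with a reader backed by a list of ready-made tokens. ListTokenReader and EndOfStream are hypothetical stand-ins (the latter for LatexWalkerEndOfStream), and the position here is simply an index into the list rather than a character position.

```python
class EndOfStream(Exception):
    """Hypothetical stand-in for LatexWalkerEndOfStream."""


class ListTokenReader:
    def __init__(self, tokens):
        self._tokens = list(tokens)
        self._idx = 0

    def cur_pos(self):
        return self._idx

    def peek_token(self, parsing_state=None):
        # Read the token at the current position without moving.
        if self._idx >= len(self._tokens):
            raise EndOfStream()
        return self._tokens[self._idx]

    def peek_token_or_none(self, parsing_state=None):
        try:
            return self.peek_token(parsing_state)
        except EndOfStream:
            return None

    def next_token(self, parsing_state=None):
        # Same as peek_token(), then advance past the token that was read.
        tok = self.peek_token(parsing_state)
        self._idx += 1
        return tok


reader = ListTokenReader(['\\alpha', '{', 'x', '}'])
```

Calling reader.peek_token() repeatedly keeps returning the first token, while each reader.next_token() call moves on to the following one.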
- cur_pos()¶
Return the current internal position pointer’s state.
- peek_space_chars(parsing_state)¶
Read a sequence of whitespace characters and return them. Whitespace characters should be read until a nonwhitespace character is found.
The current internal position pointer should remain as it is.
- skip_space_chars(parsing_state)¶
Read a sequence of whitespace characters and return them. Whitespace characters should be read until a nonwhitespace character is found.
Advance internal position as whitespace characters are read. The position pointer should be left immediately after any encountered whitespace. If the current pointed position is not whitespace, the position should not be advanced.
- peek_chars(num_chars, parsing_state)¶
Reads at most num_chars of characters at the current position and returns them. The internal position pointer is not changed.
If the pointer is already at the end of the string and there are no chars we can read, then
LatexWalkerEndOfStream
is raised.
- next_chars(num_chars, parsing_state)¶
Reads at most num_chars of characters at the current position and returns them. The internal position pointer is advanced to point immediately after the characters read.
If the pointer is already at the end of the string and there are no chars we can read, then
LatexWalkerEndOfStream
is raised.
- move_to_pos_chars(pos)¶
Move the internal position pointer to a specific character-level position in the input string/stream.
- class pylatexenc.latexnodes.LatexTokenReader(s, *, tolerant_parsing=False)¶
Parse tokens from an input string to create LatexToken instances.
Inherits LatexTokenReaderBase. See also the methods there for the standard token reader interface (such as LatexTokenReaderBase.peek_token() and friends).
The main functionality of this class is coded in the impl_***() methods. To extend this class with custom functionality, you should reimplement those. The methods reimplemented from LatexTokenReaderBase add layers of exception catching and recovery, etc., so be wary of reimplementing them manually.
Attributes:
New in version 3.0: The LatexTokenReader class was introduced in pylatexenc 3.0.
- move_to_token(tok, rewind_pre_space=True)¶
Reimplemented from
LatexTokenReaderBase.move_to_token()
.
- move_past_token(tok, fastforward_post_space=True)¶
Reimplemented from
LatexTokenReaderBase.move_past_token()
.
- peek_chars(num_chars, parsing_state)¶
Reimplemented from
LatexTokenReaderBase.peek_chars()
.
- next_chars(num_chars, parsing_state)¶
Reimplemented from
LatexTokenReaderBase.next_chars()
.
- cur_pos()¶
Reimplemented from
LatexTokenReaderBase.cur_pos()
.
- move_to_pos_chars(pos)¶
Reimplemented from
LatexTokenReaderBase.move_to_pos_chars()
.
- skip_space_chars(parsing_state)¶
Move internal position to skip any whitespace. The position pointer is left immediately after any encountered whitespace. If the current pointed position is not whitespace, the position is not advanced.
If parsing_state.enable_double_newline_paragraphs is set, then two consecutive newlines do not count as whitespace.
Returns the string of whitespace characters that was skipped.
Reimplemented from
LatexTokenReaderBase.skip_space_chars()
.
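A minimal sketch of the whitespace rule, returning a tuple shaped like the (space_string, pos, pos_end) result documented for impl_peek_space_chars(). The function peek_space is hypothetical, and it assumes that a paragraph break is exactly two adjacent newline characters; the real reader may recognize paragraph breaks differently.

```python
def peek_space(s, pos, enable_double_newline_paragraphs=True):
    end = pos
    while end < len(s) and s[end].isspace():
        # Two consecutive newlines are not treated as whitespace to skip.
        if enable_double_newline_paragraphs and s.startswith('\n\n', end):
            break
        end += 1
    return (s[pos:end], pos, end)
```

If the character at pos is not whitespace, the returned string is empty and pos_end equals pos, which matches the "position is not advanced" behavior described above.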
- peek_space_chars(parsing_state)¶
Reimplemented from
LatexTokenReaderBase.peek_space_chars()
.
- peek_token(parsing_state)¶
Read a single token without updating the current position pointer. Returns the token that was parsed.
Parse errors while reading the token are handled differently depending on whether or not we are in tolerant parsing mode. (See the tolerant_parsing attribute and constructor argument.) If not in tolerant mode, the error is raised. When in tolerant parsing mode, the error is translated into a “recovery token” provided by the error object. The “recovery token” is returned as if no error had occurred, in order to continue parsing.
Reimplemented from LatexTokenReaderBase.peek_token().
- impl_peek_token(parsing_state)¶
Read a single token and return it.
If the end of stream is reached, raise
LatexWalkerEndOfStream
(regardless of whether or not we are in tolerant parsing mode).
- impl_peek_space_chars(s, pos, parsing_state)¶
Look at the string s, and identify how many characters need to be skipped in order to skip whitespace. Does not update the internal position pointer.
Return a tuple (space_string, pos, pos_end) where space_string is the string of whitespace characters that would be skipped at the current position pointer (reported in pos). The integer pos_end is the position immediately after the space characters.
No exception is raised if we encounter the end of the stream, we simply stop looking for more spaces.
- impl_char_token(c, pos, pos_end, parsing_state, pre_space)¶
Read a character token.
This method checks that the given character is not a forbidden character, see
ParsingState.forbidden_characters
.
- impl_maybe_read_math_mode_delimiter(s, pos, parsing_state, pre_space)¶
See if we can read a math mode delimiter token. This method is called only after a first check (math mode is enabled in parsing state, and the character is one of the first characters of known math mode delimiters).
Return the math mode token, or None if we didn’t encounter a math mode delimiter.
- impl_read_macro(s, pos, parsing_state, pre_space)¶
Read a macro call token. Called when the character at the current position is a macro escape character (usually \, see ParsingState.macro_escape_char).
Macro characters that form long macro names are determined by the ParsingState.macro_alpha_chars attribute.
Return the macro token.
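The role of the escape character and of the alphabetic characters can be sketched as below. The function read_macro_name is hypothetical, returns only a (name, end_position) pair rather than a token, and assumes ASCII letters as the default set of macro name characters.

```python
import string


def read_macro_name(s, pos, macro_escape_char='\\',
                    macro_alpha_chars=string.ascii_letters):
    if not s.startswith(macro_escape_char, pos):
        return None
    i = pos + len(macro_escape_char)
    if i >= len(s):
        return ('', i)             # escape character at the very end of input
    if s[i] not in macro_alpha_chars:
        return (s[i], i + 1)       # single-character macro such as \% or \,
    j = i
    while j < len(s) and s[j] in macro_alpha_chars:
        j += 1                     # long macro name: run of alpha characters
    return (s[i:j], j)
```

Passing a larger macro_alpha_chars (for instance one that includes '@') makes names such as \make@title read as a single macro in this sketch.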
- rx_environment_name = re.compile('\\s*\\{(?P<environmentname>[A-Za-z0-9*._ :/!^()\\[\\]-]+)\\}')¶
A regular expression that will read the environment name after encountering the \begin or \end constructs.
- parse_latex_environment_name(pos, beginend, pos_envname)¶
Parse an environment name in curly braces after encountering \begin or \end.
We allow for whitespace, an opening brace, an environment name with normal ASCII alphanumeric characters and some standard punctuation, and a closing curly brace.
We use the regular expression stored as the class attribute rx_environment_name. To override it, you can simply set this attribute on your token reader object instance, e.g.,
my_token_reader.rx_environment_name = .....
Return a tuple (environmentname, environment_match_end_pos). If the environment name could not be read because of a parse error, then return (None, None).
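The regular expression quoted above can be tried out on its own. In this sketch the function parse_environment_name is a hypothetical free function that takes the string explicitly (the documented method works on the reader's own input), and pos_envname is assumed to be the position immediately after \begin or \end.

```python
import re

# Same pattern as the rx_environment_name attribute shown above.
rx_environment_name = re.compile(
    r'\s*\{(?P<environmentname>[A-Za-z0-9*._ :/!^()\[\]-]+)\}'
)


def parse_environment_name(s, pos_envname):
    m = rx_environment_name.match(s, pos_envname)
    if m is None:
        return (None, None)  # mirrors the documented parse-error return value
    return (m.group('environmentname'), m.end())
```

Note that the pattern tolerates whitespace between \begin and the opening brace, and accepts starred names such as align*.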
- impl_read_environment(s, pos, parsing_state, beginend, pre_space)¶
Parse a \begin{environmentname} or \end{environmentname} token.
This method is called after we have seen that at the position pos in the string we indeed have \begin or \end (or with the current escape character instead of \).
Return the parsed token.
- impl_read_comment(s, pos, parsing_state, pre_space)¶
Parse and return a comment token.
We also parse the post-space and include it in the token object. New paragraph tokens are never included in the comment’s post-space attribute.
- class pylatexenc.latexnodes.LatexTokenListTokenReader(token_list)¶
A token reader object that simply yields tokens from a list of already-parsed tokens.
This object doesn’t parse any LaTeX code. Use LatexTokenReader for that.
Arguments and Parsed Arguments¶
- class pylatexenc.latexnodes.LatexArgumentSpec(parser, argname=None, parsing_state_delta=None)¶
Specify an argument accepted by a callable (a macro, an environment, or specials).
- parser¶
The parser instance to use to parse an argument to this callable.
For the constructor you can also specify a string representing a standard argument type, such as ‘{’, ‘[’, ‘*’, or also some xparse-inspired strings. See LatexStandardArgumentParser. In this case, a suitable parser is instantiated and stored in the parser attribute.
- argname¶
A name for the argument (which can be None, if the argument is to be referred to only by number).
The name can serve for easier argument lookups and can offer more future-proof flexibility: E.g., while adding more optional arguments renumbers all arguments, you can refer to them by name to avoid having to update all references to argument numbers.
See
ParsedArgumentsInfo
for an interface for looking up argument values on a node instance.
- parsing_state_delta¶
Specify if this argument should be parsed with a specifically altered parsing state (e.g., if the argument should be parsed in math mode).
New in version 3.0: This class was introduced in pylatexenc 3.
- class pylatexenc.latexnodes.ParsedArguments(argnlist=None, arguments_spec_list=None, **kwargs)¶
Parsed representation of macro arguments.
The base class provides a simple way of storing the arguments as a list of parsed nodes.
This base class can be subclassed to store additional information and provide more advanced APIs to access macro arguments for certain categories of macros.
Arguments:
argnlist is a list of latexwalker nodes that represent macro arguments. If the macro arguments are too complicated to store in a list, leave this as None. (But then code that uses the latexwalker must be aware of your own API to access the macro arguments.)
The difference between argnlist and the legacy nodeargs (in pylatexenc 1.x) is that all options, regardless of optional or mandatory, are stored in the list argnlist with possible None‘s at places where optional arguments were not provided. Previously, whether a first optional argument was included in nodeoptarg or nodeargs depended on how the macro specification was given.
argspec is a string or a list that describes what each corresponding argument in argnlist represents. If the macro arguments are too complicated to store in a list, leave this as None. For standard macros and parsed arguments this is a string with characters ‘*’, ‘[’, ‘{’ describing an optional star argument, an optional square-bracket-delimited argument, and a mandatory argument.
Attributes:
- argnlist¶
The list of latexwalker nodes that was provided to the constructor
- arguments_spec_list¶
Argument types, etc. …………….
- argspec¶
Argument type specification provided to the constructor
Deprecated since version 3.0: The attribute argspec is deprecated and read-only starting from pylatexenc 3. Use the arguments_spec_list attribute instead.
- legacy_nodeoptarg_nodeargs¶
A tuple (nodeoptarg, nodeargs) that should be exposed as properties in LatexMacroNode to provide (as best as possible) compatibility with pylatexenc < 2.
This is either (<1st optional arg node>, <list of remaining args>) if the first argument is optional and all remaining args are mandatory; or it is (None, <list of args>) for any other argument structure.
Deprecated since version 2.0: The legacy_nodeoptarg_nodeargs might be removed in a future version of pylatexenc.
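The two cases for this tuple can be sketched with the string form of argspec described earlier (characters ‘*’, ‘[’, ‘{’). The free function below is a hypothetical illustration of the rule, with plain strings standing in for argument nodes.

```python
def legacy_nodeoptarg_nodeargs(argspec, argnlist):
    # First argument optional and every remaining argument mandatory:
    # split off the optional argument.
    if argspec and argspec[0] == '[' and all(c == '{' for c in argspec[1:]):
        return (argnlist[0], list(argnlist[1:]))
    # Any other argument structure: no separate optional argument.
    return (None, list(argnlist))
```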
Changed in version 3.0: This class used to be called ParsedMacroArgs in pylatexenc 2. It provides a mostly backwards-compatible interface to the earlier ParsedMacroArgs class, and is still exposed as macrospec.ParsedMacroArgs.
- to_json_object()¶
Called when we export the node structure to JSON when running latexwalker from the command line.
Return a representation of the current parsed arguments in an object, typically a dictionary, that can easily be exported to JSON. The object may contain latex nodes and other parsed-argument objects, as we use a custom JSON encoder that understands these types.
- class pylatexenc.latexnodes.ParsedArgumentsInfo(parsed_arguments=None, node=None)¶
Utility class that can gather information about the arguments stored in a ParsedArguments instance.
- get_argument_info(arg)¶
Return some information about an argument.
If arg is an integer, then it is interpreted as an index in the list of arguments. If it is a string, then it is interpreted as a named argument, and a corresponding LatexArgumentSpec will be sought with a matching argname attribute.
The returned object is a SingleParsedArgumentInfo instance.
- get_all_arguments_info(args=None, allow_additional_arguments=False, skip_nonexistent_arguments=False, include_unrequested_argnames=None, include_unrequested_argindices=None)¶
A helper function to return info objects for all arguments.
Here, args specifies which arguments to retrieve information for. If args=None, then information about all known arguments is returned. Otherwise, you can specify a list wherein each item is an argument name or an argument index.
This method returns a dictionary of argument names or argument indices to SingleParsedArgumentInfo instances.
Which keys are included in the returned dictionary is determined by args, include_unrequested_argnames, and include_unrequested_argindices:
If args is non-None, then include_unrequested_argnames and include_unrequested_argindices both default to False if they are not specified or if they are set to None. All argument names and indices specified in args are included in the returned dictionary (except those corresponding to missing args in case skip_nonexistent_arguments=True, see below). If some argument names (resp., indices) are not specified in args, they are included only if include_unrequested_argnames (resp. include_unrequested_argindices) is True.
If args is None, then include_unrequested_argnames and include_unrequested_argindices both default to True if they are not specified or if they are set to None. If include_unrequested_argnames is True, then the returned dictionary contains all the known argument names for the parsed arguments. If include_unrequested_argindices is True, then the returned dictionary contains all the known argument indices for the parsed arguments.
The allow_additional_arguments flag sets the behavior to adopt if an argument was found in the present argument list that is not in args. If False, then a parse error is raised complaining about an unexpected argument. If True, it is ignored.
The skip_nonexistent_arguments flag defines the behavior to adopt if an argument requested in args does not appear in the present argument list. If False, then a parse error is raised complaining about a missing argument. If True, the error is ignored and the returned dictionary will not include an entry for that argument.
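The way the two include_unrequested_* flags pick up their defaults can be summarized in a few lines. The helper resolve_include_flags is hypothetical and covers only the default resolution, not the construction of the returned dictionary.

```python
def resolve_include_flags(args=None, include_unrequested_argnames=None,
                          include_unrequested_argindices=None):
    # Both flags default to True when args is None, and to False otherwise.
    default = args is None
    if include_unrequested_argnames is None:
        include_unrequested_argnames = default
    if include_unrequested_argindices is None:
        include_unrequested_argindices = default
    return (include_unrequested_argnames, include_unrequested_argindices)
```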
- class pylatexenc.latexnodes.SingleParsedArgumentInfo(argument_node_object)¶
Helper class to retrieve information about a given argument that was specified and parsed to a latex callable object (macro, environment, or specials).
You normally won’t have to instantiate this object yourself; rather, instances are returned by ParsedArgumentsInfo.get_argument_info() and ParsedArgumentsInfo.get_all_arguments_info().
New in version 3.0: This class was introduced in pylatexenc 3.
- was_provided()¶
Return True if the given argument was provided to the macro (or environment/specials) call, False if the argument was not provided. This only makes sense for optional arguments and will always return True for a mandatory argument that was provided.
Checks that the given node object argument_node_object is not None.
- get_content_nodelist(unwrap_double_group=True, make_nodelist=None)¶
Return a node list with the contents of the argument. The returned object is always a LatexNodeList instance.
If the argument node is a LatexGroupNode instance (e.g., a mandatory argument delimited by braces as in \textbf{Hello world}), then we return the node list contents of that group node. If the argument is a single node instance of a type other than a group node, then we return a new node list containing that single node. If an optional argument was not provided, then we return a node list that contains a single None item.
Additionally, if unwrap_double_group is True, and if the argument was wrapped in a LatexGroupNode, and if the group node has only a single node that is itself a LatexGroupNode with different delimiters, then the contents of the inner group are returned. This makes sure that, for instance, we can pass a raw opening bracket ([) as an optional argument with the construct [{[}].
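The unwrapping rules can be sketched with small stand-in node types. Group and Chars are hypothetical namedtuples used in place of LatexGroupNode and character nodes, and plain Python lists stand in for LatexNodeList.

```python
from collections import namedtuple

# Hypothetical stand-ins for group nodes and character nodes.
Group = namedtuple('Group', ['delimiters', 'nodelist'])
Chars = namedtuple('Chars', ['chars'])


def content_nodelist(arg_node, unwrap_double_group=True):
    if arg_node is None:
        return [None]                  # optional argument not provided
    if not isinstance(arg_node, Group):
        return [arg_node]              # single non-group node
    nodes = arg_node.nodelist
    if (unwrap_double_group and len(nodes) == 1
            and isinstance(nodes[0], Group)
            and nodes[0].delimiters != arg_node.delimiters):
        return nodes[0].nodelist       # e.g. the construct [{[}]
    return nodes


# The optional argument [{[}]: a bracket group holding one brace group.
optarg = Group(('[', ']'), [Group(('{', '}'), [Chars('[')])])
```

With unwrap_double_group left at True, content_nodelist(optarg) yields just the character node for the raw bracket.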
- get_content_as_chars()¶
Return the argument contents as a single character string.
The argument must be such that only character nodes (and possibly comment nodes) were given, and an error will be raised otherwise. The content might still be contained in a single group node.
This method first extracts the content node list with get_content_nodelist(). Then, it iterates through the node list, ignoring None items and comment nodes, while concatenating strings in character nodes. Any other node type causes a LatexWalkerParseError to be raised.
This method is useful to extract character arguments from macro calls with an argument that requires a single string, such as \label{my-label} or \href{https://example.com/}{...}.
If the argument consists of a group which contains character and comment nodes (as happens with arguments delimited by braces), the group delimiters are not included in the returned string.
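The iteration described above can be sketched on an already-extracted node list. Chars and Comment are hypothetical stand-ins for character and comment nodes, and ValueError is used here in place of LatexWalkerParseError.

```python
from collections import namedtuple

# Hypothetical stand-ins for character nodes and comment nodes.
Chars = namedtuple('Chars', ['chars'])
Comment = namedtuple('Comment', ['comment'])


def content_as_chars(nodelist):
    pieces = []
    for node in nodelist:
        if node is None or isinstance(node, Comment):
            continue                   # ignored, as described above
        if isinstance(node, Chars):
            pieces.append(node.chars)
            continue
        # Any other node type is an error (ValueError is a stand-in here).
        raise ValueError("expected only characters, got {!r}".format(node))
    return ''.join(pieces)


label = content_as_chars([Chars('my-'), Comment(' note'), Chars('label'), None])
```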
- parse_content_as_keyval(**kwargs)¶
Return a dictionary of key-values, parsing the present argument as key-value pairs of the form key1=<value1>,key2=<value2>,...
This method is a shorthand for parse_keyval_content().
Nodes Collector¶
- class pylatexenc.latexnodes.LatexNodesCollector(latex_walker, token_reader, parsing_state, stop_token_condition=None, stop_nodelist_condition=None, make_child_parsing_state=None, include_stop_token_pre_space_chars=True)¶
Process a stream of LaTeX tokens and convert them into a list of nodes.
The LatexNodesCollector class functions hand-in-hand with parsers to transform tokens into nodes. A parser such as
LatexGeneralNodesParser
might set up the parsing state correctly and then defer to a LatexNodesCollector instance to actually parse a bulk of contents. The LatexNodesCollector instance, on the other hand, recurses down to calling parsers when we encounter new macros, environments, specials, etc. in the bulk that is being parsed. The result is a node list containing a full tree of child nodes that represents the logical structure of the tokens that were encountered.
The public API of this class resides essentially in the
process_tokens()
method, as well as the
get_final_nodelist()
method (and some other friends, see docs below).
New in version 3.0: The
LatexNodesCollector
class was added in pylatexenc 3.0.
- exception ReachedEndOfStream¶
Raised by the
process_one_token()
method if we reached the end of stream.
You should not have to worry about this exception unless you call
process_one_token()
yourself. But most of the time you’ll be calling
process_tokens()
instead, which does not raise this exception; it directly raises
LatexWalkerEndOfStream
as the higher-level parsers do.
- exception ReachedStoppingCondition(stop_data, **kwargs)¶
Raised by the
process_one_token()
method to indicate that a stopping condition was met.
You should not have to worry about this exception unless you call
process_one_token()
yourself. But most of the time you’ll be calling
process_tokens()
instead, which simply stops processing tokens if a stopping condition is met.
- get_final_nodelist()¶
Returns the final nodelist collected from the processed tokens.
The return value is a
LatexNodeList
instance.
- get_parser_parsing_state_delta()¶
Doc. …………
- pos_start()¶
Returns the first position of nodes in the collected node list (collected up to this point).
- pos_end()¶
Returns the position immediately after the last node in the collected node list (collected up to this point).
- stop_token_condition_met()¶
Returns True if the condition set as stop_token_condition was met while processing tokens.
- stop_token_condition_met_token()¶
Returns the token that caused the stop condition to be met.
- stop_nodelist_condition_met()¶
Returns True if the condition set as stop_nodelist_condition was met while processing tokens.
- stop_condition_stop_data()¶
If a stopping condition was met, returns whatever the stopping condition callback returned that was non-None and caused the processing to stop.
- reached_end_of_stream()¶
Returns True if we reached the end of the stream.
- is_finalized()¶
Whether this object’s node list has been finalized.
Once the object is finalized, you cannot parse any more tokens. See
finalize()
.
- finalize()¶
Finalize this object’s node list. This ensures that any pending characters that were read are collected into a final chars node. (In the future, there might be other tasks to perform to finalize the node list.)
Normally you don’t have to worry about calling finalize() yourself, because it is automatically called by
process_tokens()
. You should only worry about calling finalize() if you are calling process_one_token() manually.
Once you call finalize(), you can no longer make any further calls to
process_tokens()
or
process_one_token()
.
- push_pending_chars(chars, pos)¶
This method should only be called internally or by subclass derived methods.
Adds chars to the pending chars string, i.e., the latest chars that we have seen that will have to be collected into a chars node once we encounter anything other than a regular char.
- flush_pending_chars()¶
This method should only be called internally or by subclass derived methods.
Create a chars node out of all the pending chars that were added with calls to push_pending_chars(). Adds the chars node to the node list, and clears the pending chars string.
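The interplay between push_pending_chars() and flush_pending_chars() can be pictured with a small buffer class. This is a toy sketch under stated assumptions: PendingCharsBuffer is a hypothetical class, chars nodes are represented as plain tuples, and remembering the position of the first pending character is a choice made for this illustration rather than documented library behavior.

```python
class PendingCharsBuffer:
    """Toy model of the pending-chars mechanism (not the library's class)."""

    def __init__(self):
        self.nodelist = []
        self._pending = ""
        self._pending_pos = None

    def push_pending_chars(self, chars, pos):
        # Remember where the run of characters started (assumption of this sketch).
        if not self._pending:
            self._pending_pos = pos
        self._pending += chars

    def flush_pending_chars(self):
        # Nothing to do if no characters are waiting.
        if not self._pending:
            return
        # A tuple stands in for a real chars node.
        self.nodelist.append(("chars", self._pending, self._pending_pos))
        self._pending = ""
        self._pending_pos = None


buf = PendingCharsBuffer()
buf.push_pending_chars("He", 0)
buf.push_pending_chars("llo", 2)
buf.flush_pending_chars()
buf.flush_pending_chars()  # second flush is a no-op: the buffer is empty
```

After the calls above, buf.nodelist holds a single stand-in chars node covering the whole run of characters.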
- push_to_nodelist(node)¶
This method should only be called internally or by subclass derived methods.
Add the given node to the final node list that we are building.
- update_state_from_parsing_state_delta(parsing_state_delta)¶
This method should only be called internally or by subclass derived methods.
Update our parsing_state attribute to account for any parsing state changes that might have been reported by some parsed construct (say, a macro call).
- process_tokens()¶
Read tokens from token_reader until either we reach the end of the stream, or a stopping condition is met.
This function never returns anything interesting.
In all cases, the object is finalized (see
finalize()
) before this method finishes its execution, regardless of whether the function finishes by normal return or by raising an exception.
You can inspect the reason that caused the end of the processing using the methods
stop_token_condition_met()
,
stop_nodelist_condition_met()
and
reached_end_of_stream()
.
You can then call
get_final_nodelist()
to get the nodelist,
get_parser_parsing_state_delta()
to get any carry-over information for the parser for future parsing, etc.
- process_one_token()¶
Read a single token and process it, recursing into brace blocks, environments, etc. if needed, and appending the resulting nodes to the node list.
Whereas
process_tokens()
gathers tokens into nodes until a stopping condition is met or until the end of the stream is reached, the process_one_token() method provides finer control over the process of collecting tokens and gathering them into nodes.
Warning
Normally, it is better to use process_tokens() directly. If you want to read a single node, simply set a stopping condition that stops for instance once the node list has length at least one.
The process_one_token() method requires you to take care of some tasks yourself, which are normally automatically taken care of by
process_tokens()
. Read on below for more information.
A number of tasks that are taken care of by
process_tokens()
are NOT taken care of here:
If an end of stream is reached, we raise the exception LatexNodesCollector.ReachedEndOfStream. It’s up to you to catch it and do something relevant.
If a stopping condition is met, we raise the exception LatexNodesCollector.ReachedStoppingCondition. It’s up to you to catch it and do something relevant.
The function returns normally (without any return value) if neither a stopping condition is met nor the end of stream is reached. Normally, this means we should continue processing tokens.
You have to take care that you call
finalize()
on the nodes collector instance once you’re done processing tokens.
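The responsibilities listed above can be sketched as a manual driving loop. The snippet below does not use pylatexenc: ToyCollector, collect_manually and the two module-level exception classes are hypothetical stand-ins (in the library the exceptions are nested in LatexNodesCollector), and tokens are plain strings. It only illustrates the control flow of catching both exceptions and always calling finalize().

```python
class ReachedEndOfStream(Exception):
    """Stand-in for LatexNodesCollector.ReachedEndOfStream."""


class ReachedStoppingCondition(Exception):
    """Stand-in for LatexNodesCollector.ReachedStoppingCondition."""
    def __init__(self, stop_data):
        super().__init__("stopping condition met")
        self.stop_data = stop_data


class ToyCollector:
    """Toy collector over a list of string tokens (hypothetical)."""

    def __init__(self, tokens, stop_token=None):
        self._tokens = list(tokens)
        self._index = 0
        self._stop_token = stop_token
        self._nodes = []
        self._finalized = False

    def process_one_token(self):
        if self._finalized:
            raise RuntimeError("collector is already finalized")
        if self._index >= len(self._tokens):
            raise ReachedEndOfStream()
        tok = self._tokens[self._index]
        if tok == self._stop_token:
            raise ReachedStoppingCondition(stop_data=tok)
        self._index += 1
        self._nodes.append(tok)

    def finalize(self):
        self._finalized = True

    def is_finalized(self):
        return self._finalized

    def get_final_nodelist(self):
        return list(self._nodes)


def collect_manually(collector):
    """Drive the collector token by token and report why it stopped."""
    try:
        while True:
            collector.process_one_token()
    except ReachedEndOfStream:
        reason = "end_of_stream"
    except ReachedStoppingCondition as exc:
        reason = ("stopped", exc.stop_data)
    finally:
        # The caller is responsible for finalizing in every case.
        collector.finalize()
    return reason, collector.get_final_nodelist()


reason, nodes = collect_manually(ToyCollector(["a", "b", "}", "c"], stop_token="}"))
```

Placing finalize() in a finally clause mirrors the guarantee that process_tokens() gives: the node list is finalized whether the loop ends normally or through an exception.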
- make_child_parsing_state(parsing_state, node_class, token)¶
Create a parsing state for a child node of the given type node_class.
You can reimplement this method to customize the parsing state of child nodes.
- parse_comment_node(tok)¶
Process a token that introduces a comment. The token tok is of type
tok.tok == 'comment'
.
The default implementation creates a
LatexCommentNode
and pushes it onto the node list.
This method can be reimplemented to customize its behavior. Implementations should create the relevant node(s) and push them onto the node list with a call to
push_to_nodelist()
(refer to that method’s doc).
- parse_latex_group(tok)¶
Process a token that introduces a LaTeX group (e.g.
{a group}
). The token tok is of type
tok.tok == 'brace_open'
according to the current parsing state.
The default implementation uses the make_latex_group_parser() method provided by the LatexWalker instance to parse the group node, and pushes the resulting node onto the node list.
This method can be reimplemented to customize its behavior. Implementations should create the relevant node(s) and push them onto the node list with a call to
push_to_nodelist()
(refer to that method’s doc).
- parse_macro(tok)¶
Process a token representing a macro (e.g.
\macro
). The token tok is of type
tok.tok == 'macro'
.
The default implementation looks up the corresponding macro specification object via the parsing state’s latex context database, and defers to
parse_invocable_token_type()
.
This method can be reimplemented to customize its behavior. Implementations should create the relevant node(s) and push them onto the node list with a call to
push_to_nodelist()
(refer to that method’s doc).
- parse_environment(tok)¶
Process a token representing an environment (e.g.
\begin{environment}
). The token tok is of type
tok.tok == 'begin_environment'
.
The default implementation looks up the corresponding environment specification object via the parsing state’s latex context database, and defers to
parse_invocable_token_type()
.
This method can be reimplemented to customize its behavior. Implementations should create the relevant node(s) and push them onto the node list with a call to
push_to_nodelist()
(refer to that method’s doc).
- parse_specials(tok)¶
Process a token representing LaTeX specials (e.g.
~
). The token tok is of type
tok.tok == 'specials'
.
The default implementation defers to
parse_invocable_token_type()
.
This method can be reimplemented to customize its behavior. Implementations should create the relevant node(s) and push them onto the node list with a call to
push_to_nodelist()
(refer to that method’s doc).
- parse_invocable_token_type(tok, spec, node_class, what)¶
Process a token representing either a macro call, a begin environment call, or specials chars.
This method is a convenience method that collects the similar processing for these three node types. The specification class is queried for the relevant parser object (
spec.get_node_parser()
), to which we defer for parsing the macro call / the environment / the specials.
Additionally, the current parsing state is updated using the carry-over information reported by the call parser.
This method can be reimplemented to customize its behavior. Implementations should create the relevant node(s) and push them onto the node list with a call to
push_to_nodelist()
(refer to that method’s doc).
- parse_math(tok)¶
Process a token that introduces LaTeX math mode (e.g.
$ ... $
or
\[ ... \]
). The token tok is of type
tok.tok in ('mathmode_inline', 'mathmode_display')
according to the current parsing state.
The default implementation uses the make_latex_math_parser() method provided by the latex walker to parse the math node, and pushes the resulting node onto the node list.
This method can be reimplemented to customize its behavior. Implementations should create the relevant node(s) and push them onto the node list with a call to
push_to_nodelist()
(refer to that method’s doc).
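The token types named in the parse_*() method descriptions above can be summarized as a lookup table. This is a reading aid only, a minimal sketch: the Token namedtuple, HANDLER_FOR_TOKEN_TYPE and handler_name_for are hypothetical names, and how the collector actually dispatches tokens internally is not specified here.

```python
from collections import namedtuple

# Hypothetical stand-in for a token; only the 'tok' type field matters here.
Token = namedtuple("Token", ["tok", "arg"])

# Token type (tok.tok) -> name of the collector method documented for it.
HANDLER_FOR_TOKEN_TYPE = {
    "comment": "parse_comment_node",
    "brace_open": "parse_latex_group",
    "macro": "parse_macro",
    "begin_environment": "parse_environment",
    "specials": "parse_specials",
    "mathmode_inline": "parse_math",
    "mathmode_display": "parse_math",
}


def handler_name_for(token):
    """Return the documented handler name, or None for other token types."""
    return HANDLER_FOR_TOKEN_TYPE.get(token.tok)


macro_handler = handler_name_for(Token("macro", "textbf"))
math_handler = handler_name_for(Token("mathmode_display", r"\["))
```

A subclass that reimplements one of these methods only changes the handling of the corresponding token type; the other entries are unaffected.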
Exception classes¶
- class pylatexenc.latexnodes.LatexWalkerError¶
Generic exception class raised while parsing LaTeX code. Common base class of LatexWalkerLocatedError and LatexWalkerEndOfStream.
- class pylatexenc.latexnodes.LatexWalkerLocatedError(msg, s=None, pos=None, lineno=None, colno=None, error_type_info=None, **kwargs)¶
Exception class raised to the user when there was an error dealing with LaTeX code. The exception is accompanied by information about where the error occurred in the source LaTeX code.
The following attributes are available if they were provided to the class constructor:
- msg¶
The error message
- s¶
The string that was currently being parsed
- pos¶
The index in the string where the error occurred, starting at zero.
- lineno¶
The line number where the error occurred, starting at 1.
- colno¶
The column number where the error occurred in the line lineno, starting at 0.
- input_source¶
The name of the source (e.g. file name) from which the LaTeX code was obtained. (Optional.)
- error_type_info¶
Specify additional information about the error so that specific applications can interpret the error and provide more meaningful messages to the user. For instance, the message “Character is forbidden: ‘%’” might be cryptic to a user, whereas an application might be able to parse the error_type_info to see that the error is of the type of a forbidden character, and issue a message like “LaTeX comments are not permitted (‘%’ char forbidden), use ‘\%’ for a literal percent sign.”
The error_type_info attribute is a dictionary with at least one key named what. The what key should reflect the type of error that occurred, e.g., token_forbidden_character. Other keys might give additional information about the error (e.g., which character was encountered and was forbidden).
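An application might use error_type_info along the following lines. This is a minimal sketch: friendly_message is a hypothetical helper, and the forbidden_character key is an assumed name for the additional information. Only the what key and the token_forbidden_character value come from the description above.

```python
def friendly_message(msg, error_type_info=None):
    """Turn a raw parser message into a friendlier one when possible."""
    info = error_type_info or {}
    if info.get("what") == "token_forbidden_character":
        # 'forbidden_character' is an assumed key name, used for illustration.
        char = info.get("forbidden_character")
        if char == "%":
            return (
                "LaTeX comments are not permitted ('%' char forbidden), "
                "use '\\%' for a literal percent sign."
            )
        if char is not None:
            return "The character {!r} is not permitted here.".format(char)
    # Unknown error type: fall back to the original message.
    return msg


text = friendly_message(
    "Character is forbidden: '%'",
    {"what": "token_forbidden_character", "forbidden_character": "%"},
)
```

Falling back to the original message keeps the helper safe for error types that the application does not know about.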
- class pylatexenc.latexnodes.LatexWalkerLocatedErrorFormatter(exc)¶
Format the
- class pylatexenc.latexnodes.LatexWalkerParseError(msg, s=None, pos=None, lineno=None, colno=None, error_type_info=None, **kwargs)¶
Represents an error while parsing LaTeX code, specifically while parsing the code into the nodes structure.
- class pylatexenc.latexnodes.LatexWalkerNodesParseError(recovery_nodes=None, recovery_parsing_state_delta=None, recovery_at_token=None, recovery_past_token=None, **kwargs)¶
Represents an error while parsing content nodes, typically as a consequence of LatexWalker.parse_content(). This class carries some additional information about how best to recover from this parse error if we are operating in tolerant parsing mode. E.g., we can already report a list of nodes parsed so far.
In addition to the attributes inherited from
LatexWalkerParseError
, we have:
- recovery_nodes¶
Nodes result (a
LatexNode
or
LatexNodeList
instance) to use as if the parser call had returned successfully.
- recovery_parsing_state_delta¶
Parsing state delta to use as if the parser call had returned successfully.
- recovery_at_token¶
If non-None, then we should reset the token reader’s internal position to try to continue parsing at the given token’s position.
- recovery_past_token¶
If non-None, then we should reset the token reader’s internal position to try to continue parsing immediately after the given token’s position.
This attribute is not to be set if recovery_at_token is already non-None.
New in version 3.0: The
LatexWalkerNodesParseError
class was introduced in pylatexenc 3.
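The relationship between recovery_at_token and recovery_past_token can be sketched as a small helper that picks the position at which to resume. This is an illustration under assumptions: Token is a hypothetical stand-in with pos and pos_end fields, recovery_position is not a library function, and how the token reader actually repositions itself is not described here.

```python
from collections import namedtuple

# Hypothetical token stand-in: start position and position just past the token.
Token = namedtuple("Token", ["pos", "pos_end"])


def recovery_position(recovery_at_token=None, recovery_past_token=None):
    """Return the position at which parsing should resume, or None."""
    if recovery_at_token is not None and recovery_past_token is not None:
        # The two attributes are documented as mutually exclusive.
        raise ValueError("set at most one of the two recovery tokens")
    if recovery_at_token is not None:
        return recovery_at_token.pos  # resume at the token itself
    if recovery_past_token is not None:
        return recovery_past_token.pos_end  # resume right after the token
    return None


tok = Token(pos=10, pos_end=14)
resume_at = recovery_position(recovery_at_token=tok)
resume_past = recovery_position(recovery_past_token=tok)
```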
- class pylatexenc.latexnodes.LatexWalkerTokenParseError(recovery_token_placeholder, recovery_token_at_pos, **kwargs)¶
Represents an error while parsing a single token of LaTeX code. See
LatexTokenReader
.
In addition to the attributes inherited from
LatexWalkerParseError
, we have:
- recovery_token_placeholder¶
A
LatexToken
instance to use in place of a token that we tried, but failed, to parse.
- recovery_token_at_pos¶
The position at which to reset the token_reader’s internal state to attempt to recover from this error.
New in version 3.0: The
LatexWalkerTokenParseError
class was introduced in pylatexenc 3.
Base classes¶
- class pylatexenc.latexnodes.CallableSpecBase¶
The base class for macro, environment, and specials spec classes (see the
pylatexenc.macrospec
module).
As far as this
latexnodes
module’s classes are concerned, a spec object is simply something that can provide a parser to parse the given construct (macro, environment, or specials).
The spec object should implement
get_node_parser()
, which should return a parser instance that can be used to parse the entire construct.
See
macrospec.MacroSpec
for how this is implemented in the
pylatexenc.macrospec
module.
New in version 3.0: The
CallableSpecBase
class was added in pylatexenc 3.0.
- class pylatexenc.latexnodes.LatexWalkerParsingStateEventHandler¶
A LatexWalker parsing state event handler.
The LatexWalker instance will call methods on this object to determine how to update the parsing state upon certain events, such as entering or exiting math mode.
Events:
enter math mode
exit math mode
New in version 3.0: The
LatexWalkerParsingStateEventHandler
class was added in pylatexenc 3.0.
- class pylatexenc.latexnodes.LatexWalkerBase¶
Base class for a latex-walker. Essentially, this is all that the classes and methods in the
latexnodes
module need to know about what a LatexWalker does.
See also
latexwalker.LatexWalker
.
New in version 3.0: The
LatexWalkerBase
class was added in pylatexenc 3.0.
- parsing_state_event_handler()¶
Doc……
- parse_content(parser, token_reader=None, parsing_state=None, open_context=None, **kwargs)¶
Doc……
- make_node(node_class, **kwargs)¶
Doc……
- make_nodelist(nodelist, **kwargs)¶
Doc……
- make_nodes_collector(token_reader, parsing_state, **kwargs)¶
Doc……
- make_latex_group_parser(delimiters)¶
Doc……
- make_latex_math_parser(math_mode_delimiters)¶
Doc……
- check_tolerant_parsing_ignore_error(exc)¶
You can inspect the exception object exc and decide whether or not to attempt to recover from the exception (if you want to be tolerant to parsing errors).
Return the exception object if it should be raised, or return None if recovery should be attempted.
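The return convention (the exception object to raise it, None to attempt recovery) can be sketched as follows. ToyParseError, ToyWalker and handle_error are hypothetical names, and the tolerant_parsing flag is an assumption of this sketch. The snippet only illustrates how a caller might act on the return value.

```python
class ToyParseError(Exception):
    """Hypothetical stand-in for a recoverable parse error."""


class ToyWalker:
    """Toy walker exposing only the method discussed above."""

    def __init__(self, tolerant_parsing):
        self.tolerant_parsing = tolerant_parsing

    def check_tolerant_parsing_ignore_error(self, exc):
        # Recover only from parse errors, and only in tolerant mode.
        if self.tolerant_parsing and isinstance(exc, ToyParseError):
            return None
        return exc


def handle_error(walker, exc):
    """Raise exc unless the walker says recovery should be attempted."""
    to_raise = walker.check_tolerant_parsing_ignore_error(exc)
    if to_raise is not None:
        raise to_raise
    return "recovered"


outcome = handle_error(ToyWalker(tolerant_parsing=True), ToyParseError("bad macro"))
```

With tolerant_parsing=False, or with an exception of another type, handle_error re-raises the exception instead of returning.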
- format_node_pos(node)¶
Doc……
- class pylatexenc.latexnodes.LatexContextDbBase¶
Base class for a parsing state’s LaTeX context database.
A full implementation of how to specify macro, environment, and specials definitions is actually in the
pylatexenc.macrospec
module. As far as this
latexnodes
module is concerned, a latex context database object is simply an object that provides the
get_***_spec()
family of methods along with
test_for_specials()
, and they return relevant spec objects.
The spec objects returned by
get_***_spec()
and
test_for_specials()
are instances of subclasses of
CallableSpecBase
.
New in version 3.0: The
LatexContextDbBase
class was added in pylatexenc 3.0.
- get_macro_spec(macroname)¶
Return the macro spec to use to parse a macro named macroname. The macroname does not contain the escape character (
\
) itself.
This method should return the relevant spec object, which should be an instance of a subclass of
CallableSpecBase
.
The latex context database object may choose to provide a default spec object if macroname wasn’t formally defined. As far as the parsers are concerned, if get_macro_spec() returns a spec object, then the parsers know how to parse the given macro and will happily proceed.
If a macro of name macroname should not be considered as defined, and the parser should not attempt to parse a macro and raise an error instead (or recover from it in tolerant parsing mode), then this method should return None.
- get_environment_spec(environmentname)¶
Like
get_macro_spec()
, but for environments. The environmentname is the name of the environment specified between the curly braces after the
\begin
call.
This method should return the relevant spec object, which should be an instance of a subclass of
CallableSpecBase
.
The latex context database object may choose to provide a default spec object if an environment named environmentname wasn’t somehow formally defined. As far as the parsers are concerned, if get_environment_spec() returns a spec object, then the parsers know how to parse the given environment and will happily proceed.
If an environment of name environmentname should not be considered as defined, and the parser should not attempt to parse the environment and raise an error instead (or recover from it in tolerant parsing mode), then this method should return None.
- get_specials_spec(specials_chars)¶
Like
get_macro_spec()
, but for specials. The specials_chars is the sequence of characters for which we’d like to find if they are a specials construct.
Parsing of specials is different from macros and environments, because there is no universal syntax that distinguishes them (macros and environments are always initiated with the escape character
\
). So the token reader will call
test_for_specials()
to see if the string at the given position can be matched for specials.
The result is that
get_specials_spec()
usually doesn’t get called when parsing tokens. The
get_specials_spec()
method is only called in certain specific situations, such as to get the spec object associated with the new paragraph token
\n\n
.
This method should return the relevant spec object, which should be an instance of a subclass of
CallableSpecBase
, or None if these characters are not to be considered as specials.
- test_for_specials(s, pos, parsing_state)¶
Test the string s at position pos for the presence of specials.
For instance, if the parser tests the string
"Eq.~\eqref{eq:xyz}"
at position 3, then the latex context database might want to report the character
~
as a specials construct and return a specials spec for it.
If specials characters are recognized, then this method should return a corresponding spec object. The spec object should be an instance of a
CallableSpecBase
subclass. In addition, the returned spec object must expose the attribute
specials_chars
. That attribute should contain the sequence of characters that were recognized as special.
If no specials characters are recognized at exactly the position pos, then this method should return None.
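The interface described in this section can be sketched with duck-typed stand-ins. To stay runnable without pylatexenc installed, the classes below do not subclass LatexContextDbBase or CallableSpecBase; ToySpec and ToyContextDb are hypothetical names, the token argument of get_node_parser() is an assumption, and trying longer specials first is a choice made for this sketch.

```python
class ToySpec:
    """Hypothetical spec object; a real one would subclass CallableSpecBase."""

    def __init__(self, kind, name, specials_chars=None):
        self.kind = kind
        self.name = name
        # Required on specs returned by test_for_specials().
        self.specials_chars = specials_chars

    def get_node_parser(self, token):
        # A real spec returns a parser object; this toy provides none.
        raise NotImplementedError("toy spec: no parser provided")


class ToyContextDb:
    """Duck-typed context database with the four documented methods."""

    def __init__(self, macros, environments, specials):
        self._macros = {n: ToySpec("macro", n) for n in macros}
        self._environments = {n: ToySpec("environment", n) for n in environments}
        self._specials = {
            c: ToySpec("specials", c, specials_chars=c) for c in specials
        }

    def get_macro_spec(self, macroname):
        return self._macros.get(macroname)  # None if not defined

    def get_environment_spec(self, environmentname):
        return self._environments.get(environmentname)

    def get_specials_spec(self, specials_chars):
        return self._specials.get(specials_chars)

    def test_for_specials(self, s, pos, parsing_state):
        # Try longer specials first so that '``' wins over a shorter match.
        for chars in sorted(self._specials, key=len, reverse=True):
            if s.startswith(chars, pos):
                return self._specials[chars]
        return None


db = ToyContextDb(macros=["eqref"], environments=["equation"], specials=["~", "``"])
spec = db.test_for_specials("Eq.~\\eqref{eq:xyz}", 3, parsing_state=None)
```

In this sketch, undefined names yield None from every lookup, which corresponds to the case where the parser should report an error (or recover in tolerant parsing mode) rather than use a default spec.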