xlnt/docs/formula.rst
2015-11-03 18:26:33 -05:00

107 lines
3.8 KiB
ReStructuredText

Parsing Formulas
================
`xlnt` supports limited parsing of formulas embedded in cells. The
`xlnt/formula` module contains a `tokenizer` class to break
formulas into their consitutuent tokens. Usage is as follows:
.. code-block:: cpp
#include <xlnt/xlnt.hpp>
int main()
{
std::string formula = R"(=IF($A$1,"then True",MAX(DEFAULT_VAL,'Sheet 2'!B1)))";
xlnt::tokenizer tok(formula);
tok.parse();
for(auto &t : tok.get_items())
{
std::cout << t.get_value() << "\t" << t.get_type() << "\t" << t.get_subtype() << std::endl;
}
// prints:
//
// IF( FUNC OPEN
// $A$1 OPERAND RANGE
// , SEP ARG
// "then True" OPERAND TEXT
// , SEP ARG
// MAX( FUNC OPEN
// DEFAULT_VAL OPERAND RANGE
// , SEP ARG
// 'Sheet 2'!B1 OPERAND RANGE
// ) FUNC CLOSE
// ) FUNC CLOSE
return 0;
}
As shown above, tokens have three attributes of interest:
* ``.value``: The substring of the formula that produced this token
* ``.type``: The type of token this represents. Can be one of
- ``token::literal``: If the cell does not contain a formula, its
value is represented by a single ``LITERAL`` token.
- ``token::operand``: A generic term for any value in the Excel
formula. (See ``.subtype`` below for more details).
- ``token::func``: Function calls are broken up into tokens for the
opener (e.g., ``SUM(``), followed by the arguments, followed by
the closer (i.e., ``)``). The function name and opening
parenthesis together form one ``FUNC`` token, and the matching
parenthesis forms another ``FUNC`` token.
- ``token::array``: Array literals (enclosed between curly braces)
get two ``ARRAY`` tokens each, one for the opening ``{`` and one
for the closing ``}``.
- ``token::paren``: When used for grouping subexpressions (and not to
denote function calls), parentheses are tokenized as ``paren``
tokens (one per character).
- ``token::sep``: These tokens are created from either commas (``,``)
or semicolons (``;``). Commas create ``sep`` tokens when they are
used to separate function arguments (e.g., ``SUM(a,b)``) or when
they are used to separate array elements (e.g., ``{a,b}``). (They
have another use as an infix operator for joining
ranges). Semicolons are always used to separate rows in an array
literal, so always create ``sep`` tokens.
- ``token::op_pre``: Designates a prefix unary operator. Its value is
always ``+`` or ``-``
- ``token::op_in``: Designates an infix binary operator. Possible
values are ``>=``, ``<=``, ``<>``, ``=``, ``>``, ``<``, ``*``,
``/``, ``+``, ``-``, ``^``, or ``&``.
- ``token::op_post``: Designates a postfix unary operator. Its value
is always ``%``.
- ``token::wspace``: Created for any whitespace encountered. Its
value is always a single space, regardless of how much whitespace
is found.
* ``.subtype``: Some of the token types above use the subtype to
provide additional information about the token. Possible subtypes
are:
+ ``token::text``, ``token::number``, ``token::logical``,
``token::error``, ``token::range``: these subtypes describe the
various forms of ``operand`` found in formulae. ``logical`` is
either ``true`` or ``false``, ``range`` is either a named range or
a direct reference to another range. ``text``, ``number``, and
``error`` all refer to literal values in the formula
+ ``token::open`` and ``token::close``: these two subtypes are used by
``paren``, ``func``, and ``array``, to describe whether the token
is opening a new subexpression or closing it.
+ ``token::arg`` and ``token::row``: are used by the ``sep`` tokens,
to distinguish between the comma and semicolon. Commas produce
tokens of subtype ``arg`` whereas semicolons produce tokens of
subtype ``row``