优化boost :: spirit :: qi解析器

时间:2022-10-31 20:20:31

I have a parser that basically prints out the actions of a stack machine with my operator precedence given some expression. My goal is to optimize for speed as much as possible. I have read an article concerning qi optimizations that provides this example code. I understand the gist of the optimizations described in the main article, however I am unclear how to integrate this into my code.

我有一个解析器,基本上打印出堆栈机器的动作,我的运算符优先级给出了一些表达式。我的目标是尽可能优化速度。我已经阅读了一篇关于提供此示例代码的qi优化的文章。我理解主要文章中描述的优化的要点,但是我不清楚如何将它集成到我的代码中。

Here is a following working example of my parser. I have already tried to optimize it somewhat by using raw[] to provide base iterators. The phoenix action calls must be given strings or iterators by which they can create strings; the real versions of these functions are not trivial and their functionality can not yet be evaluated in parsing-time:

以下是我的解析器的以下工作示例。我已经尝试通过使用raw []来提供基本迭代器来进行一些优化。必须给phoenix动作调用赋予字符串或迭代器,通过它们可以创建字符串;这些函数的真实版本并不简单,它们的功能尚不能在解析时评估:

#include <iostream>
#include <vector>
#include <string>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/qi_char.hpp>
#include <boost/spirit/include/qi_parse.hpp>
#include <boost/spirit/include/phoenix_bind.hpp>
namespace qi = boost::spirit::qi;
namespace phx = boost::phoenix;
using std::endl;
using std::cout;
using std::string;
using std::vector;

void fPushOp(const char* op){
  cout << "PushOp: " << op << endl;
}

void fPushInt(const boost::iterator_range<string::const_iterator>& my_str){
  cout << "PushInt: " << my_str << endl;
}

template<typename Iterator, typename Skipper = qi::space_type>
struct Calculator : public qi::grammar<Iterator, Skipper> {

  qi::rule<Iterator, Skipper>  
    expression, logical_or_expression, logical_and_expression, negate_expression, series_expression,
    single_expression, inclusive_or_expression, exclusive_or_expression, and_expression, equality_expression, 
    relational_expression, shift_expression, additive_expression, multiplicative_expression, 
    term, complement_factor, factor, result, integer, variable, variable_combo, word, prefix;

  qi::rule<Iterator> number;
  Calculator() : Calculator::base_type(result)
  {
    number = 
        qi::raw[
            ("0x" >> +qi::char_("0-9a-fA-F"))     
          | ("0b" >> +qi::char_("0-1"))
          | ("0" >>  +qi::char_("0-7"))
          | (+qi::char_("0-9"))
        ] [phx::bind(&fPushInt, qi::_1)]
        ;

    integer = 
          number
        | ('-' >> number) [phx::bind(&fPushOp, "OP_UNARY_MINUS")]
        ;

    variable =
          ((qi::alpha | qi::char_('_')) 
              >> *(qi::alnum | qi::char_('_')) 
              >> '['
              >>  (+(qi::alnum | qi::char_('_') | qi::char_(',')) 
                | ('\'' >> *~qi::char_('\'') >> '\'')) 
              >> ']')
        | ((qi::alpha | qi::char_('_')) >> *(qi::alnum | qi::char_('_')))
        ;

    variable_combo =
        qi::raw [
          variable >> *(qi::char_('.') >> variable)
        ] [phx::bind(&fPushInt, qi::_1)]
        ;

    word = 
        qi::raw[
          variable
        ] [phx::bind(&fPushInt, qi::_1)]
        ;

    factor =
            ("ceil(" >> expression >> ')')                                                      [phx::bind(&fPushOp, "OP_CEIL")]
        |   ("wrap(" >> expression >> ')')                                                      [phx::bind(&fPushOp, "OP_WRAP")]
        |   ("abs(" >> expression >> ')')                                                       [phx::bind(&fPushOp, "OP_ABS")]
        |   ("count1(" >> expression >> ')')                                                    [phx::bind(&fPushOp, "OP_COUNT1")]
        |   ("pick(" >> expression >> ')')                                                      [phx::bind(&fPushOp, "OP_PICK")]
        |   ("defined(" >> expression >> ')')                                                   [phx::bind(&fPushOp, "OP_DEF")]
        |   ("string_equal(" >> word >> ',' >> word >> ')')                                     [phx::bind(&fPushOp, "OP_STREQ")]
        |   ("string_contains(" >> word >> ',' >> word >> ')')                                  [phx::bind(&fPushOp, "OP_STRCON")]
        |   ("lsl(" >> single_expression >> ',' >> single_expression >> ',' >> number >> ')')   [phx::bind(&fPushOp, "OP_LSL")]
        |   ("lsr(" >> single_expression >> ',' >> single_expression >> ')')                    [phx::bind(&fPushOp, "OP_LSR")]
        |   ("asr(" >> single_expression >> ',' >> single_expression >> ',' >> number >> ')')   [phx::bind(&fPushOp, "OP_ASR")]
        |   ("ror(" >> single_expression >> ',' >> single_expression >> ',' >> number >> ')')   [phx::bind(&fPushOp, "OP_ROR")]
        |   ("rrx(" >> single_expression >> ',' >> single_expression >> ',' >> single_expression >> ',' >> number >> ')')[phx::bind(&fPushOp, "OP_RRX")]
        |   ('(' >> expression >> ')')
        |   variable_combo
        |   integer
        ;
    complement_factor = factor
        | ('~' >> factor) [phx::bind(&fPushOp, "OP_COMPLEMENT")]
        ;
    term = complement_factor
      >> *( (".." >> complement_factor) [phx::bind(&fPushOp, "OP_LEGER")]
          | ('\\' >> complement_factor) [phx::bind(&fPushOp, "OP_MASK")]
          ); 
    multiplicative_expression = term
      >> *( ('/' >> term) [phx::bind(&fPushOp, "OP_DIV")]
          | ('%' >> term) [phx::bind(&fPushOp, "OP_MOD")]
          | ('*' >> term) [phx::bind(&fPushOp, "OP_MUL")]
          );
    additive_expression = multiplicative_expression
      >> *( ('+' >> multiplicative_expression)  [phx::bind(&fPushOp, "OP_ADD")]
          | ('-' >> multiplicative_expression)  [phx::bind(&fPushOp, "OP_SUB")]
          );
    shift_expression = additive_expression
      >> *( (">>" >> additive_expression) [phx::bind(&fPushOp, "OP_SRL")]
          | ("<<" >> additive_expression) [phx::bind(&fPushOp, "OP_SLL")]
          );
    relational_expression = shift_expression
      >> *( ('<' >> shift_expression) [phx::bind(&fPushOp, "OP_LT")]
          | ('>' >> shift_expression) [phx::bind(&fPushOp, "OP_GT")]
          | ("<=" >> shift_expression)[phx::bind(&fPushOp, "OP_LET")]
          | (">=" >> shift_expression)[phx::bind(&fPushOp, "OP_GET")]
          );
    equality_expression = relational_expression 
      >> *( ("==" >> relational_expression)[phx::bind(&fPushOp, "OP_EQ")]
          | ("!=" >> relational_expression)[phx::bind(&fPushOp, "OP_NEQ")] 
          );
    and_expression = equality_expression 
      >> *(('&' >> equality_expression)     [phx::bind(&fPushOp, "OP_AND")]); 
    exclusive_or_expression = and_expression 
      >> *(('^' >> and_expression)          [phx::bind(&fPushOp, "OP_XOR")]); 
    inclusive_or_expression = exclusive_or_expression 
      >> *(('|' >> exclusive_or_expression) [phx::bind(&fPushOp, "OP_OR")]); 
    single_expression = inclusive_or_expression;
    series_expression = inclusive_or_expression 
      >> *((',' >> inclusive_or_expression) [phx::bind(&fPushOp, "OP_SERIES")]);
    negate_expression = series_expression
        | ('!' >> series_expression)        [phx::bind(&fPushOp, "OP_NEGATE")];
    logical_and_expression = negate_expression
      >> *(("&&" >> negate_expression)      [phx::bind(&fPushOp, "OP_LOGICAL_AND")]); 
    logical_or_expression = logical_and_expression 
      >> *(("||" >> logical_and_expression) [phx::bind(&fPushOp, "OP_LOGICAL_OR")]);
    expression = logical_or_expression;

    result = expression;
  }
};

int main(){
  Calculator<string::const_iterator> calc;
  const string expr("0xff0000 >> 3 && 3 + (!9) | (0,200)");
  cout << "Expression: " << expr << endl;

  string::const_iterator it = expr.begin();
  phrase_parse(it, expr.end(), calc, qi::space);

  cout << "Remaining: " << (string(it,expr.end())) << endl;
  return 0;
}

Additionally, I read the slides from this pdf concerning utree and am trying to figure out how, if possible, to use the utree output instead of semantic actions since apparently such things are evil. Can someone provide or point me to a simple example on how to construct a utree that can then be fed to a stack machine to print out operations in order?

另外,我从这个关于utree的pdf中读到幻灯片,并试图弄清楚如果可能的话,如何使用utree输出而不是语义动作,因为显然这些事情是邪恶的。有人可以提供或指出一个关于如何构造一个简单的例子,然后可以将其提供给堆栈机器以按顺序打印出操作吗?

1 个解决方案

#1


1  

The optimizations depend on what you want to achieve. Therefore, I think you're optimizing prematurely.

优化取决于您想要实现的目标。因此,我认为你过早地进行了优化。

E.g. parsing a variable_combo as a raw[] input sequence does not make any sense if you want to interpret the symbols later (because you'll have to parse the variable combo again, and you'll even have to anticipate whitespace in there: "foo . bar .tux" is a valid variable combo here).

例如。将variable_combo解析为raw []输入序列如果你想稍后解释这些符号没有任何意义(因为你必须再次解析变量combo,你甚至必须预测那里的空格:“foo” 。吧.tux“这里是一个有效的变量组合)。

I have quite a lot of posts in general dealing with optimizing Boost Spirit (start here e.g.). Quick observations here:

我有很多关于优化Boost Spirit的帖子(例如从这里开始)。快速观察:

  • consider correctness under backtracking; with your grammar parsing 'ceil(3.7') you'll get:

    考虑回溯的正确性;用你的语法解析'ceil(3.7')你会得到:

    Expression: ceil(3.7)
    PushInt: 3
    PushInt: ceil
    Remaining: (3.7)
    

    Note how this emits opcodes when parsing failed. Note also, it emits the wrong opcodes

    请注意解析失败时如何发出操作码。另请注意,它会发出错误的操作码

    • it pushes 3 instead of 3.7
    • 它推3而不是3.7
    • it pushes ceil as an PushInt?
    • 它推动ceil作为PushInt?

    So not only does it detect failure to parse too late, it just ignores the parentheses, fails to spot the function call and parses the number wrong.

    因此,它不仅检测到解析失败太晚,它只是忽略括号,无法发现函数调用并解析数字错误。

    Regarding the premature evaluation, I'm going to point to this popular answer: Boost Spirit: "Semantic actions are evil"?

    关于过早评估,我将指出这个流行的答案:提升精神:“语义行为是邪恶的”?

    Other than that, I'm just confirming my suspicion that you're doing premature optimization. Consider doing

    除此之外,我只是确认我怀疑你正在做过早的优化。考虑做

    #define BOOST_SPIRIT_DEBUG
    

    and then later, in the grammar constructor:

    然后,在语法构造函数中:

    BOOST_SPIRIT_DEBUG_NODES(
            (expression)(logical_or_expression)(logical_and_expression)(negate_expression)(series_expression)(single_expression)
            (inclusive_or_expression)(exclusive_or_expression)(and_expression)(equality_expression)(relational_expression)
            (shift_expression)(additive_expression)(multiplicative_expression)(term)(complement_factor)(factor)(result)(integer)
            (variable)(variable_combo)(word)(prefix)
    

    To really see how your parser behaves.

    要真正了解解析器的行为方式。

  • consider qi::symbols e.g.:

    考虑qi ::符号例如:

    qi::symbols<char,const char*> unary_function;
    
    unary_function.add
        ("ceil",    "OP_CEIL")
        ("wrap",    "OP_WRAP")
        ("abs",     "OP_ABS")
        ("count1",  "OP_COUNT1")
        ("pick",    "OP_PICK")
        ("defined", "OP_DEF");
    
    unary_call = (unary_function >> "(" >> expression >> ')') [phx::bind(&fPushOp, qi::_1)];
    
  • traits might leave more potential for the compiler to optimize after inlining (as opposed to semantic actions, since the many levels of template instantiation can obscure some cases, especially when bind is involved)

    在内联之后,traits可能为编译器提供更多的优化潜力(与语义操作相反,因为模板实例化的许多级别可能会掩盖某些情况,特别是涉及绑定时)

You may want to make operator precedence table driven here, as some of the spirit samples show. The traditional way to use rule-hierarchy to enforce precedence that you've taken complicates the grammar. This has two key downsides:

您可能希望在此处创建运算符优先级表,正如某些精神示例所示。使用规则层次结构强制执行优先级的传统方法使语法复杂化。这有两个关键的缺点:

  • each rule introduces a virtual dispatch (Spirit X3 may not have this limitation anymore in the future)
  • 每条规则都引入了一个虚拟调度(Spirit X3将来可能不再有这个限制)
  • your grammar got so complicated that you lost the overview already (see first bullet)
  • 你的语法变得如此复杂,你已经失去了概述(参见第一篇文章)

Recommendations

I'd suggest

我建议

  1. moving away from evaluating during parsing as the semantic actions grow unwieldy, and are very (very) tricky to get right in the face of (late) backtracking (or even parser failures; the latter could be detected easily, but backtracking can also be benign and very hard to correct for when semantic actions have side effects).

    在解析过程中远离评估,因为语义动作变得笨拙,面对(后期)回溯(甚至解析器失败)非常(非常)棘手;后者可以很容易地被检测到,但回溯也可能是良性的并且很难纠正语义动作有副作用的时候)。

  2. start building the grammar from the simplest rule, gradually building it as you add test cases

    从最简单的规则开始构建语法,在添加测试用例时逐步构建语法

#1


1  

The optimizations depend on what you want to achieve. Therefore, I think you're optimizing prematurely.

优化取决于您想要实现的目标。因此,我认为你过早地进行了优化。

E.g. parsing a variable_combo as a raw[] input sequence does not make any sense if you want to interpret the symbols later (because you'll have to parse the variable combo again, and you'll even have to anticipate whitespace in there: "foo . bar .tux" is a valid variable combo here).

例如。将variable_combo解析为raw []输入序列如果你想稍后解释这些符号没有任何意义(因为你必须再次解析变量combo,你甚至必须预测那里的空格:“foo” 。吧.tux“这里是一个有效的变量组合)。

I have quite a lot of posts in general dealing with optimizing Boost Spirit (start here e.g.). Quick observations here:

我有很多关于优化Boost Spirit的帖子(例如从这里开始)。快速观察:

  • consider correctness under backtracking; with your grammar parsing 'ceil(3.7') you'll get:

    考虑回溯的正确性;用你的语法解析'ceil(3.7')你会得到:

    Expression: ceil(3.7)
    PushInt: 3
    PushInt: ceil
    Remaining: (3.7)
    

    Note how this emits opcodes when parsing failed. Note also, it emits the wrong opcodes

    请注意解析失败时如何发出操作码。另请注意,它会发出错误的操作码

    • it pushes 3 instead of 3.7
    • 它推3而不是3.7
    • it pushes ceil as an PushInt?
    • 它推动ceil作为PushInt?

    So not only does it detect failure to parse too late, it just ignores the parentheses, fails to spot the function call and parses the number wrong.

    因此,它不仅检测到解析失败太晚,它只是忽略括号,无法发现函数调用并解析数字错误。

    Regarding the premature evaluation, I'm going to point to this popular answer: Boost Spirit: "Semantic actions are evil"?

    关于过早评估,我将指出这个流行的答案:提升精神:“语义行为是邪恶的”?

    Other than that, I'm just confirming my suspicion that you're doing premature optimization. Consider doing

    除此之外,我只是确认我怀疑你正在做过早的优化。考虑做

    #define BOOST_SPIRIT_DEBUG
    

    and then later, in the grammar constructor:

    然后,在语法构造函数中:

    BOOST_SPIRIT_DEBUG_NODES(
            (expression)(logical_or_expression)(logical_and_expression)(negate_expression)(series_expression)(single_expression)
            (inclusive_or_expression)(exclusive_or_expression)(and_expression)(equality_expression)(relational_expression)
            (shift_expression)(additive_expression)(multiplicative_expression)(term)(complement_factor)(factor)(result)(integer)
            (variable)(variable_combo)(word)(prefix)
    

    To really see how your parser behaves.

    要真正了解解析器的行为方式。

  • consider qi::symbols e.g.:

    考虑qi ::符号例如:

    qi::symbols<char,const char*> unary_function;
    
    unary_function.add
        ("ceil",    "OP_CEIL")
        ("wrap",    "OP_WRAP")
        ("abs",     "OP_ABS")
        ("count1",  "OP_COUNT1")
        ("pick",    "OP_PICK")
        ("defined", "OP_DEF");
    
    unary_call = (unary_function >> "(" >> expression >> ')') [phx::bind(&fPushOp, qi::_1)];
    
  • traits might leave more potential for the compiler to optimize after inlining (as opposed to semantic actions, since the many levels of template instantiation can obscure some cases, especially when bind is involved)

    在内联之后,traits可能为编译器提供更多的优化潜力(与语义操作相反,因为模板实例化的许多级别可能会掩盖某些情况,特别是涉及绑定时)

You may want to make operator precedence table driven here, as some of the spirit samples show. The traditional way to use rule-hierarchy to enforce precedence that you've taken complicates the grammar. This has two key downsides:

您可能希望在此处创建运算符优先级表,正如某些精神示例所示。使用规则层次结构强制执行优先级的传统方法使语法复杂化。这有两个关键的缺点:

  • each rule introduces a virtual dispatch (Spirit X3 may not have this limitation anymore in the future)
  • 每条规则都引入了一个虚拟调度(Spirit X3将来可能不再有这个限制)
  • your grammar got so complicated that you lost the overview already (see first bullet)
  • 你的语法变得如此复杂,你已经失去了概述(参见第一篇文章)

Recommendations

I'd suggest

我建议

  1. moving away from evaluating during parsing as the semantic actions grow unwieldy, and are very (very) tricky to get right in the face of (late) backtracking (or even parser failures; the latter could be detected easily, but backtracking can also be benign and very hard to correct for when semantic actions have side effects).

    在解析过程中远离评估,因为语义动作变得笨拙,面对(后期)回溯(甚至解析器失败)非常(非常)棘手;后者可以很容易地被检测到,但回溯也可能是良性的并且很难纠正语义动作有副作用的时候)。

  2. start building the grammar from the simplest rule, gradually building it as you add test cases

    从最简单的规则开始构建语法,在添加测试用例时逐步构建语法