
 
 18 Generating C++ Scanners
 **************************
 
 *IMPORTANT*: the present form of the scanning class is _experimental_
 and may change considerably between major releases.
 
    'flex' provides two different ways to generate scanners for use with
 C++.  The first way is to simply compile a scanner generated by 'flex'
 using a C++ compiler instead of a C compiler.  You should not encounter
 any compilation errors (⇒Reporting Bugs).  You can then use C++
 code in your rule actions instead of C code.  Note that the default
 input source for your scanner remains 'yyin', and default echoing is
 still done to 'yyout'.  Both of these remain 'FILE *' variables and not
 C++ _streams_.
 
    You can also use 'flex' to generate a C++ scanner class, using the
 '-+' option (or, equivalently, '%option c++'), which is automatically
 specified if the name of the 'flex' executable ends in a '+', such as
 'flex++'.  When using this option, 'flex' defaults to generating the
 scanner to the file 'lex.yy.cc' instead of 'lex.yy.c'.  The generated
 scanner includes the header file 'FlexLexer.h', which defines the
 interface to two C++ classes.
 
    The first class in 'FlexLexer.h', 'FlexLexer', provides an abstract
 base class defining the general scanner class interface.  It provides
 the following member functions:
 
 'const char* YYText()'
      returns the text of the most recently matched token, the equivalent
      of 'yytext'.
 
 'int YYLeng()'
      returns the length of the most recently matched token, the
      equivalent of 'yyleng'.
 
 'int lineno() const'
      returns the current input line number (see '%option yylineno'), or
      '1' if '%option yylineno' was not used.
 
 'void set_debug( int flag )'
      sets the debugging flag for the scanner, equivalent to assigning to
      'yy_flex_debug' (⇒Scanner Options).  Note that you must
      build the scanner using '%option debug' to include debugging
      information in it.
 
 'int debug() const'
      returns the current setting of the debugging flag.
 
    Also provided are member functions equivalent to
 'yy_switch_to_buffer()', 'yy_create_buffer()' (though the first argument
 is an 'istream&' object reference and not a 'FILE*'),
 'yy_flush_buffer()', 'yy_delete_buffer()', and 'yyrestart()' (again, the
 first argument is an 'istream&' object reference).
 
    The second class defined in 'FlexLexer.h' is 'yyFlexLexer', which is
 derived from 'FlexLexer'.  It defines the following additional member
 functions:
 
 'yyFlexLexer( istream* arg_yyin = 0, ostream* arg_yyout = 0 )'
 'yyFlexLexer( istream& arg_yyin, ostream& arg_yyout )'
      constructs a 'yyFlexLexer' object using the given streams for input
      and output.  If not specified, the streams default to 'cin' and
      'cout', respectively.  'yyFlexLexer' does not take ownership of its
      stream arguments.  It's up to the user to ensure the streams
      pointed to remain alive at least as long as the 'yyFlexLexer'
      instance.
 
 'virtual int yylex()'
      performs the same role as 'yylex()' does for ordinary 'flex'
      scanners: it scans the input stream, consuming tokens, until a
      rule's action returns a value.  If you derive a subclass 'S' from
      'yyFlexLexer' and want to access the member functions and variables
      of 'S' inside 'yylex()', then you need to use '%option yyclass="S"'
      to inform 'flex' that you will be using that subclass instead of
      'yyFlexLexer'.  In this case, rather than generating
      'yyFlexLexer::yylex()', 'flex' generates 'S::yylex()' (and also
      generates a dummy 'yyFlexLexer::yylex()' that calls
      'yyFlexLexer::LexerError()' if called).
 
 'virtual void switch_streams(istream* new_in = 0, ostream* new_out = 0)'
 'virtual void switch_streams(istream& new_in, ostream& new_out)'
      reassigns 'yyin' to 'new_in' (if non-null) and 'yyout' to 'new_out'
      (if non-null), deleting the previous input buffer if 'yyin' is
      reassigned.
 
 'int yylex( istream* new_in, ostream* new_out = 0 )'
 'int yylex( istream& new_in, ostream& new_out )'
      first switches the input and output streams via 'switch_streams(
      new_in, new_out )' and then returns the value of 'yylex()'.
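
    As a rough illustration of the '%option yyclass' mechanism described
 under 'yylex()' above, the following specification derives a subclass
 whose member variable is updated from a rule action.  The class name
 'WordCounter' and its 'words' member are hypothetical, and the sketch
 assumes a C++11 compiler and that the generated scanner has already
 included 'FlexLexer.h' before the '%{' block is copied:

```lex
%{
#include <iostream>

// Hypothetical subclass; the name and the 'words' member are
// illustrative only and are not provided by flex.
class WordCounter : public yyFlexLexer {
public:
    int yylex() override;   // body is generated because of yyclass
    int words = 0;          // updated from the rule actions below
};
%}

%option noyywrap c++ yyclass="WordCounter"

%%

[A-Za-z]+   ++words;   /* a member of WordCounter, reachable here */
.|\n        /* ignore everything else */

%%

int main()
{
    WordCounter lexer;              // reads from cin by default
    while(lexer.yylex() != 0)
        ;
    std::cout << lexer.words << '\n';
    return 0;
}
```

    Without the 'yyclass' option the action would be compiled as part of
 'yyFlexLexer::yylex()', where 'words' is not in scope.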
 
    In addition, 'yyFlexLexer' defines the following protected virtual
 functions which you can redefine in derived classes to tailor the
 scanner:
 
 'virtual int LexerInput( char* buf, int max_size )'
      reads up to 'max_size' characters into 'buf' and returns the number
      of characters read.  To indicate end-of-input, return 0 characters.
      Note that 'interactive' scanners (see the '-B' and '-I' flags in
      ⇒Scanner Options) define the macro 'YY_INTERACTIVE'.  If you
      redefine 'LexerInput()' and need to take different actions
      depending on whether or not the scanner might be scanning an
      interactive input source, you can test for the presence of this
      name via '#ifdef' statements.
 
 'virtual void LexerOutput( const char* buf, int size )'
      writes out 'size' characters from the buffer 'buf', which, while
      'NUL'-terminated, may also contain internal 'NUL's if the scanner's
      rules can match text with 'NUL's in them.
 
 'virtual void LexerError( const char* msg )'
      reports a fatal error message.  The default version of this
      function writes the message to the stream 'cerr' and exits.
 
    Note that a 'yyFlexLexer' object contains its _entire_ scanning
 state.  Thus you can use such objects to create reentrant scanners, but
 see also ⇒Reentrant.  You can instantiate multiple instances of
 the same 'yyFlexLexer' class, and you can also combine multiple C++
 scanner classes together in the same program using the '-P' option
 discussed above.
 
    Finally, note that the '%array' feature is not available to C++
 scanner classes; you must use '%pointer' (the default).
 
    Here is an example of a simple C++ scanner:
 
           // An example of using the flex C++ scanner class.
      
          %{
          #include <iostream>
          using namespace std;
          int mylineno = 0;
          %}
      
          %option noyywrap c++
      
          string  \"[^\n"]+\"
      
          ws      [ \t]+
      
          alpha   [A-Za-z]
          dig     [0-9]
          name    ({alpha}|{dig}|\$)({alpha}|{dig}|[_.\-/$])*
          num1    [-+]?{dig}+\.?([eE][-+]?{dig}+)?
          num2    [-+]?{dig}*\.{dig}+([eE][-+]?{dig}+)?
          number  {num1}|{num2}
      
          %%
      
          {ws}    /* skip blanks and tabs */
      
          "/*"    {
                  int c;
      
                  while((c = yyinput()) != 0)
                      {
                      if(c == '\n')
                          ++mylineno;
      
                      else if(c == '*')
                          {
                          if((c = yyinput()) == '/')
                              break;
                          else
                              unput(c);
                          }
                      }
                  }
      
          {number}  cout << "number " << YYText() << '\n';
      
          \n        mylineno++;
      
          {name}    cout << "name " << YYText() << '\n';
      
          {string}  cout << "string " << YYText() << '\n';
      
          %%
      
          // This include is required if main() is in another source file.
          // #include <FlexLexer.h>
      
          int main( int /* argc */, char** /* argv */ )
          {
              FlexLexer* lexer = new yyFlexLexer;
              while(lexer->yylex() != 0)
                  ;
              delete lexer;
              return 0;
          }
 
    If you want to create multiple (different) lexer classes, use the
 '-P' flag (or the 'prefix=' option) to rename each 'yyFlexLexer' to some
 other 'xxFlexLexer'.  You can then include '<FlexLexer.h>' in your other
 sources once per lexer class, first renaming 'yyFlexLexer' as follows:
 
          #undef yyFlexLexer
          #define yyFlexLexer xxFlexLexer
          #include <FlexLexer.h>
      
          #undef yyFlexLexer
          #define yyFlexLexer zzFlexLexer
          #include <FlexLexer.h>
 
    if, for example, you used '%option prefix="xx"' for one of your
 scanners and '%option prefix="zz"' for the other.