The wrapping tools consist of executables that pull information from C++ header files, and produce wrapper code that allows the C++ interfaces to be used from other programming languages (Python and Java). One can think of the wrappers as having a front-end that parses C++ header files, and a back-end that produces language-specific glue code.
All of the code in this directory is C, rather than C++. One might think this is silly, since the front-end parses C++ .h files and the back-end generates .cxx files. The original reason for this is that the parser uses lex and yacc, which are written in C and previously could not easily be linked into C++ programs.
The C++ Parser
The header vtkParse.h provides a C API for the C++ parser that wrappers use to read the VTK header files. The parser consists of three critical pieces: a preprocessor (see below), a lex-based lexical analyzer (lex.yy.c, generated from vtkParse.l), and a bison-based GLR parser (vtkParse.tab.c, generated from vtkParse.y). Instructions on rebuilding the parser are provided at the end of this document.
vtkParsePreprocess.h provides a preprocessor that can run independently of the parser. In general, the parser does not recursively parse #include files, but it does recursively preprocess them in order to gather all of the macro definitions.
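As a rough illustration of what gathering macro definitions involves, the C sketch below scans a buffer of source text and records the name that follows each #define directive. It is not the vtkParsePreprocess API: MacroTable and collect_defines are hypothetical names, and the sketch ignores macro values, #include recursion, conditionals, and line continuations, all of which a real preprocessor must handle.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_MACROS 64
#define MAX_NAME 64

/* Hypothetical table of macro names gathered from preprocessed text. */
typedef struct {
  char names[MAX_MACROS][MAX_NAME];
  int count;
} MacroTable;

/* Skip spaces and tabs. */
static const char* skip_blanks(const char* p)
{
  while (*p == ' ' || *p == '\t') p++;
  return p;
}

/* Scan 'text' line by line and record the identifier after each "#define".
   Whitespace before '#' and between '#' and "define" is allowed. */
void collect_defines(const char* text, MacroTable* table)
{
  const char* p = text;
  assert(text != NULL && table != NULL);
  while (*p != '\0') {
    const char* q = skip_blanks(p);
    if (*q == '#') {
      q = skip_blanks(q + 1);
      if (strncmp(q, "define", 6) == 0 && (q[6] == ' ' || q[6] == '\t') &&
          table->count < MAX_MACROS) {
        char* out = table->names[table->count];
        int n = 0;
        q = skip_blanks(q + 6);
        /* A macro name is an identifier: letters, digits, underscores. */
        while ((isalnum((unsigned char)*q) || *q == '_') && n < MAX_NAME - 1) {
          out[n++] = *q++;
        }
        out[n] = '\0';
        if (n > 0) {
          table->count++;
        }
      }
    }
    /* Advance to the start of the next line. */
    while (*p != '\0' && *p != '\n') p++;
    if (*p == '\n') p++;
  }
}
```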
vtkParseString.h provides low-level string handling routines that are used by the parser and the preprocessor. Most importantly, it contains a C++ tokenizer. It also contains a cache for storing strings (type names, etc.) that are encountered during the parse.
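The idea behind a string cache is that each distinct string (a type name, say) is stored once, and every later lookup returns the same pointer, so many references can share one copy and cached strings can be compared by pointer. The C sketch below shows that interning idea; StringCache, cache_intern, and cache_free are hypothetical names rather than the vtkParseString API, and the linear search is a simplification where a real cache would more likely use a hash table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string cache: a growable array of owned, unique strings. */
typedef struct {
  char** strings;
  size_t count;
  size_t capacity;
} StringCache;

/* Return the cached copy of 's', adding a copy if it is not yet present.
   Returns NULL only if memory allocation fails. */
const char* cache_intern(StringCache* cache, const char* s)
{
  size_t i;
  char* copy;
  assert(cache != NULL && s != NULL);
  for (i = 0; i < cache->count; i++) {
    if (strcmp(cache->strings[i], s) == 0) {
      return cache->strings[i]; /* already cached: reuse the same pointer */
    }
  }
  if (cache->count == cache->capacity) {
    size_t newcap = cache->capacity ? 2 * cache->capacity : 16;
    char** grown = (char**)realloc(cache->strings, newcap * sizeof(char*));
    if (grown == NULL) return NULL;
    cache->strings = grown;
    cache->capacity = newcap;
  }
  copy = (char*)malloc(strlen(s) + 1);
  if (copy == NULL) return NULL;
  strcpy(copy, s);
  cache->strings[cache->count++] = copy;
  return copy;
}

/* Free every cached string and the array that holds them. */
void cache_free(StringCache* cache)
{
  size_t i;
  for (i = 0; i < cache->count; i++) free(cache->strings[i]);
  free(cache->strings);
  cache->strings = NULL;
  cache->count = cache->capacity = 0;
}
```

Because the cache owns its copies, the caller's buffer can be reused after interning without invalidating the returned pointer.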
vtkParseSystem.h contains utilities for file system access. One of its jobs is to manage a cache of where header files are located on the file system, so that header file lookups remain inexpensive even on slow file systems.
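The benefit of caching file lookups is that each path is checked against the file system at most once. The C sketch below memoizes the result of the POSIX stat() call in a small fixed-size table; FileCache and cached_file_exists are hypothetical names, and the sketch assumes a POSIX system rather than describing how vtkParseSystem actually stores its cache.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

#define FILE_CACHE_SIZE 128
#define FILE_PATH_MAX 256

/* Hypothetical memo table: each path and whether it is a regular file. */
typedef struct {
  char paths[FILE_CACHE_SIZE][FILE_PATH_MAX];
  int exists[FILE_CACHE_SIZE];
  int count;
  int stat_calls; /* number of real file-system queries made */
} FileCache;

/* Return 1 if 'path' names a regular file, consulting the cache first. */
int cached_file_exists(FileCache* cache, const char* path)
{
  struct stat info;
  int i;
  int result;
  assert(cache != NULL && path != NULL);
  for (i = 0; i < cache->count; i++) {
    if (strcmp(cache->paths[i], path) == 0) {
      return cache->exists[i]; /* cache hit: no file-system access */
    }
  }
  cache->stat_calls++;
  result = (stat(path, &info) == 0 && S_ISREG(info.st_mode)) ? 1 : 0;
  /* Remember the answer if there is room and the path fits. */
  if (cache->count < FILE_CACHE_SIZE && strlen(path) < FILE_PATH_MAX) {
    strcpy(cache->paths[cache->count], path);
    cache->exists[cache->count] = result;
    cache->count++;
  }
  return result;
}
```

Negative results are cached too, which matters for header searches: most candidate include directories do not contain a given header.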
vtkParseType.h is a header file that defines the numerical constants that we use to identify C++ types, type qualifiers, and specifiers. These constants are used in the vtkParseData data structures described below.
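To show how numerical type constants can be combined, here is a C sketch that packs a base type and qualifier flags into one unsigned integer and separates them again with masks. The constant names and values are invented for illustration and are not the ones defined in vtkParseType.h.

```c
#include <assert.h>

/* Hypothetical encoding: base type in the low 8 bits, flags above it. */
enum {
  MY_BASE_TYPE_MASK = 0x00FFu,
  MY_CHAR = 0x03u,
  MY_INT = 0x04u,
  MY_DOUBLE = 0x07u,
  MY_CONST = 0x0100u,    /* const applied to the base type */
  MY_POINTER = 0x0200u,  /* one level of '*' */
  MY_REFERENCE = 0x0400u /* '&' */
};

/* Extract the base type, discarding qualifiers and indirection. */
unsigned int base_type(unsigned int type)
{
  return type & MY_BASE_TYPE_MASK;
}

/* Nonzero if the type is a pointer to const char, as in "const char*". */
int is_const_char_pointer(unsigned int type)
{
  return base_type(type) == MY_CHAR && (type & MY_CONST) &&
    (type & MY_POINTER);
}
```

Keeping the base type in its own bit range means a single mask recovers it no matter which qualifiers are set.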
vtkParseAttributes.h is a header file that defines numerical constants for wrapper-specific attributes that can be added to declarations in the VTK header files, for example [[vtk::deprecated]]. These attribute constants are stored in the vtkParseData data structures.
The data structures defined in vtkParseData.h are used for the output of the parser. This header provides data structures for namespaces, classes, methods, typedefs, and for other entities that can be declared in a C++ file. The wrappers convert this data into wrapper code.
vtkParseExtras.h provides routines for managing certain abstractions of the data that is produced by the parser. Most notably, it provides facilities for expanding typedefs and for instantiating templates. Its code is not pretty.
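At its core, typedef expansion means following a chain of aliases until reaching a name that is not itself a typedef. The C sketch below shows that idea over a small hypothetical table; TypedefEntry and expand_typedef are illustrative names rather than the vtkParseExtras interface, and the sketch ignores qualifiers, pointers, and template arguments, which real expansion has to carry along.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical typedef record: 'alias' is declared as 'target'. */
typedef struct {
  const char* alias;
  const char* target;
} TypedefEntry;

/* Follow typedefs starting at 'name' until a non-typedef name is found.
   'max_depth' guards against cycles such as "typedef A B; typedef B A;".
   Returns NULL if the chain is deeper than 'max_depth'. */
const char* expand_typedef(const TypedefEntry* table, size_t n,
                           const char* name, int max_depth)
{
  int depth;
  for (depth = 0; depth < max_depth; depth++) {
    size_t i;
    const char* next = NULL;
    for (i = 0; i < n; i++) {
      if (strcmp(table[i].alias, name) == 0) {
        next = table[i].target;
        break;
      }
    }
    if (next == NULL) {
      return name; /* not a typedef: fully expanded */
    }
    name = next;
  }
  return NULL; /* too deep, probably a cycle */
}
```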
vtkParseMerge.h provides methods for dealing with method resolution order. It defines a data structure for managing a class along with all the classes it derives from, which is needed for handling tricky details of inheritance such as “using” declarations, overrides, and virtual methods.
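One small piece of this is listing a class followed by all of its ancestors, with each class appearing only once even when multiple inheritance makes a base reachable along several paths. The C sketch below does a depth-first walk over a hypothetical table of (derived, base) pairs; the names are illustrative and do not reflect the vtkParseMerge data structure, and depth-first pre-order is just one possible ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical inheritance edge: 'derived' has direct base 'base'. */
typedef struct {
  const char* derived;
  const char* base;
} BaseEdge;

/* Return nonzero if 'name' is already in out[0..count). */
static int already_listed(const char** out, size_t count, const char* name)
{
  size_t i;
  for (i = 0; i < count; i++) {
    if (strcmp(out[i], name) == 0) return 1;
  }
  return 0;
}

/* Append 'name' and then, depth-first, each of its bases (in declaration
   order) to 'out', skipping classes already listed. Returns the new count. */
size_t collect_hierarchy(const BaseEdge* edges, size_t n, const char* name,
                         const char** out, size_t count, size_t max_out)
{
  size_t i;
  if (count >= max_out || already_listed(out, count, name)) return count;
  out[count++] = name;
  for (i = 0; i < n; i++) {
    if (strcmp(edges[i].derived, name) == 0) {
      count = collect_hierarchy(edges, n, edges[i].base, out, count, max_out);
    }
  }
  return count;
}
```

For a diamond where D derives from B and C, and both derive from A, this yields D, B, A, C: A is listed once even though it is reachable through both B and C.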
vtkParseMangle.h provides the name-mangling routines that the Python wrappers rely on to convert C++ names into names that can be used in Python. The mangling is done according to the rules of the IA64 ABI (this same mangling is used to convert C++ APIs into C APIs).
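To illustrate the kind of rules involved, the C sketch below applies two pieces of the Itanium (IA64) C++ ABI mangling grammar to a qualified class name: each identifier becomes its length followed by its characters, and a name with more than one component is wrapped in N and E. The function name mangle_name is hypothetical, it is not the vtkParseMangle API, and it does not handle templates, builtin types, std:: abbreviations, or substitutions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Mangle a "::"-separated class name using the Itanium C++ ABI rules for
   <source-name> (<length><identifier>) and <nested-name> (N ... E).
   Writes into 'out' (size 'outlen'); returns 0 on success, -1 if too small. */
int mangle_name(const char* qualified, char* out, size_t outlen)
{
  const char* p = qualified;
  size_t used = 0;
  int nested = (strstr(qualified, "::") != NULL);
  int n;

  if (outlen == 0) return -1;
  if (nested) {
    if (outlen < 2) return -1;
    out[used++] = 'N';
  }
  while (*p != '\0') {
    const char* sep = strstr(p, "::");
    size_t len = sep ? (size_t)(sep - p) : strlen(p);
    /* snprintf writes "<length><identifier>" and reports the full length. */
    n = snprintf(out + used, outlen - used, "%u%.*s", (unsigned)len, (int)len, p);
    if (n < 0 || (size_t)n >= outlen - used) return -1;
    used += (size_t)n;
    p = sep ? sep + 2 : p + len;
  }
  if (nested) {
    if (used + 1 >= outlen) return -1; /* need room for 'E' and '\0' */
    out[used++] = 'E';
  }
  out[used] = '\0';
  return 0;
}
```

For example, vtkObject mangles to 9vtkObject and vtk::detail::Foo to N3vtk6detail3FooE; both results consist only of letters and digits, which is what makes them usable as Python names.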
A hierarchy file is a text file that lists information about all the types defined in a VTK module. The wrappers use these files to look up types by name. Through the use of vtkParseHierarchy, the wrappers can get detailed information about a type even if the header file only contains a forward declaration, as long as the type is defined somewhere in another header.
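Conceptually, looking up a type in a hierarchy file is a line-by-line search for the entry whose first field is the requested name. The C sketch below assumes a simplified, hypothetical line format, "Name : Superclass ; header.h" (with the superclass part optional), which is not necessarily the exact layout of VTK's hierarchy files; lookup_header is likewise an illustrative name, and it returns the header that defines the type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Search 'text' (one entry per line, assumed format "Name : Super ; file.h")
   for 'name' and copy the header-file field into 'header'.
   Returns 1 if found, 0 otherwise. */
int lookup_header(const char* text, const char* name, char* header, size_t size)
{
  size_t namelen = strlen(name);
  const char* line = text;
  while (*line != '\0') {
    const char* end = strchr(line, '\n');
    if (end == NULL) end = line + strlen(line);
    /* Match the name exactly: it must be followed by a space or ':'. */
    if (strncmp(line, name, namelen) == 0 &&
        (line[namelen] == ' ' || line[namelen] == ':')) {
      const char* semi = memchr(line, ';', (size_t)(end - line));
      if (semi != NULL) {
        const char* start = semi + 1;
        size_t len;
        while (start < end && *start == ' ') start++;
        len = (size_t)(end - start);
        while (len > 0 && start[len - 1] == ' ') len--;
        if (len + 1 > size) return 0;
        memcpy(header, start, len);
        header[len] = '\0';
        return 1;
      }
    }
    line = (*end == '\n') ? end + 1 : end;
  }
  return 0;
}
```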
vtkParseMain.h provides a common main() function for use by the wrapper tool executables. It provides a standard set of command-line options as well as response-file handling, and it invokes the parser.
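A response file lets a tool receive more arguments than fit comfortably on a command line. The C sketch below covers only one step: splitting a response file's contents (assumed to be already read into a writable string) into arguments at whitespace, with double-quoted text kept together as one argument. split_response is a hypothetical name, and these quoting rules are an assumption, not a description of how vtkParseMain parses response files.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Split 'buf' in place into at most 'max_args' arguments separated by
   whitespace; a double-quoted argument may contain spaces (quotes removed).
   Returns the number of arguments stored in 'argv'. */
int split_response(char* buf, char** argv, int max_args)
{
  int argc = 0;
  char* p = buf;
  assert(buf != NULL && argv != NULL);
  while (*p != '\0' && argc < max_args) {
    char* out;
    int quoted = 0;
    while (isspace((unsigned char)*p)) p++;
    if (*p == '\0') break;
    argv[argc++] = out = p;
    /* Copy this argument over itself, dropping the quote marks. */
    while (*p != '\0' && (quoted || !isspace((unsigned char)*p))) {
      if (*p == '"') {
        quoted = !quoted;
      } else {
        *out++ = *p;
      }
      p++;
    }
    /* Step past the separator before terminating: 'out' may equal 'p'. */
    if (*p != '\0') p++;
    *out = '\0';
  }
  return argc;
}
```

Splitting in place avoids extra allocations: every argv entry points into the caller's buffer, which must therefore outlive the argument list.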
vtkWrap.h has functions that are common to the wrapper tools for all the wrapper languages. Unlike vtkParse, it deals with the generation of code rather than the parsing of code.
vtkWrapText.h has functions for automatically generating documentation from the header files that are parsed. It produces the Python docstrings.
The Python wrapper source files are named according to the pieces of wrapper code they produce:
vtkWrapPythonClass creates type objects for vtkObjectBase classes
vtkWrapPythonType creates type objects for other wrapped classes
vtkWrapPythonMethod generates the code for calling C++ methods from Python
vtkWrapPythonOverload maps a Python method to multiple C++ overloads
vtkWrapPythonMethodDef generates the method tables for wrapped classes
vtkWrapPythonTemplate handles the wrapping of C++ class templates
vtkWrapPythonNamespace handles the wrapping of namespaces
vtkWrapPythonEnum creates type objects for enum types
vtkWrapPythonConstant adds C++ constants to Python classes and namespaces
Python Wrapper Executables
vtkWrapPython parses the C++ declarations from a header file and produces wrapper code that can be linked into a Python extension module.
vtkWrapPythonInit produces the PyInit entry point for a Python extension module, as well as code for loading all the dependent modules. The .cxx file produced by vtkWrapPythonInit is compiled and linked together with the .cxx files produced by vtkWrapPython to create the module.
Java Wrapper Executables
vtkWrapJava produces C++ wrapper code that uses the JNI (Java Native Interface).
vtkParseJava produces Java code that sits on top of the C++ code.
vtkWrapHierarchy will slurp up all the header files in a VTK module and produce a “hierarchy.txt” file that provides information about all of the types that are defined in that module. In other words, it provides a summary of the module’s contents. The Python and Java wrapper executables rely on these “hierarchy.txt” files in order to look up types by name.
Rebuilding the Parser
The code for the C++ parser is generated from the files vtkParse.l and vtkParse.y with the classic compiler-generator tools lex and yacc (or, more specifically, with their modern incarnations flex and bison). These tools are readily available on macOS and Linux systems, and they can be installed (with some difficulty) on Windows systems.
The C code that flex and bison generate is not styled according to VTK standards, and must be cleaned up in order to compile without warnings and in order to satisfy VTK’s git hooks and style checks.
The file vtkParse.l contains regular expressions for tokenizing a C++ header file. It is used to generate the file lex.yy.c, which is directly included (i.e. as a C file) by the main parser file, vtkParse.tab.c.
To generate lex.yy.c from vtkParse.l, use the following steps.
Get a copy of flex, version 2.6.4 or later
Run flex --nodefault --noline -olex.yy.c vtkParse.l
In an editor, remove blank lines from the top and bottom of lex.yy.c
Replace all tabs with two spaces (e.g. “:%s/\t/  /g” in vi)
Remove spaces from the ends of lines (e.g. “:%s/ *$//” in vi)
Remove the definition of struct yy_trans_info, which is used nowhere in the code
Add the following code at line 23 (after the “end standard C headers” comment):
#ifndef __cplusplus
extern int isatty(int);
#endif /* __cplusplus */
Finally, if you have clang-format installed, you can use it to re-style the code.
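Two of the cleanup steps above, replacing each tab with two spaces and removing trailing spaces, are simple enough to express in C, for example inside a small filter program. The function name clean_line is hypothetical; in practice the vi commands in the steps (or clang-format) do the same job.

```c
#include <assert.h>
#include <stddef.h>

/* Copy one line (without its newline) from 'in' to 'out', replacing every
   tab with two spaces and dropping spaces at the end of the line.
   'out' must have room for 2 * strlen(in) + 1 characters.
   Returns the length of the result. */
size_t clean_line(const char* in, char* out)
{
  size_t len = 0;
  assert(in != NULL && out != NULL);
  for (; *in != '\0'; in++) {
    if (*in == '\t') {
      out[len++] = ' ';
      out[len++] = ' ';
    } else {
      out[len++] = *in;
    }
  }
  /* Trim trailing spaces, including any that came from trailing tabs. */
  while (len > 0 && out[len - 1] == ' ') len--;
  out[len] = '\0';
  return len;
}
```

Expanding tabs before trimming matters: a line ending in a tab only becomes trailing spaces after the first step, so the trim catches it too.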
The file vtkParse.y contains the rules for parsing a C++ header file. Many of the rules in this file have the same names as in the description of the grammar in the official ISO standard. The file vtkParse.y is used to generate the file vtkParse.tab.c, which contains the parser.
Get a copy of bison 3.2.3 or later; it has a yacc-compatible front end.
Run bison --no-lines -b vtkParse vtkParse.y to generate vtkParse.tab.c
In an editor, replace every “static inline” in vtkParse.tab.c with …
Replace “#if ! defined lint || defined __GNUC__” with …
Comment out the …
If you are familiar with “diff” and “patch” and if you have clang-format, you can automate these code changes as follows. For this, you must use exactly version 3.2.3 of bison to ensure that the code that is produced is as similar as possible to what is currently in the VTK repository.
Run bison (as above) on the vtkParse.y from the master branch
Use clang-format-8 to re-style vtkParse.tab.c to match VTK code style
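Use “git diff -R vtkParse.tab.c” to produce a patch file (the -R option reverses the diff, so the patch turns the freshly generated file into the version that is in the repository)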
If done correctly, this will produce a patch file that contains all the changes above (steps 3 through 9 in the original list). Load the patch file into a text editor to verify that this is so, and remove any superfluous changes from the patch file.
Then, switch to your new vtkParse.y (the one you have modified). Repeat steps 1 and 2 (generate vtkParse.tab.c and reformat it with clang-format). Now you can apply the patch file to automate the original steps 3 through 9. Note that as you continue to edit vtkParse.y and regenerate vtkParse.tab.c, you can continue to use the same patch. Just remember to run clang-format every time that you run bison.
Debugging the Parser
When bison is run, it should not report any shift/reduce or reduce/reduce warnings. If modifications to the rules cause these warnings to occur, you can run bison with the --debug and --verbose options:
bison --debug --verbose -b vtkParse vtkParse.y
This will cause bison to produce a file called “vtkParse.output” that will show which rules conflict with other rules.