Data Assembly#
VTK 10.0 introduces a new mechanism for representing data hierarchies using vtkPartitionedDataSetCollection and vtkDataAssembly. This document describes the design details.
Data Model#
The design is based on three classes:
vtkPartitionedDataSet
is a collection of datasets (not to be confused withvtkDataSet
).vtkPartitionedDataSetCollection
is a collection ofvtkPartitionedDataSet
s.vtkDataAssembly
defines the hierarchical relationships between items in avtkPartitionedDataSetCollection
.
Partitioned Dataset#
vtkPartitionedDataSet
is simply a collection of datasets that are to be
treated as a logical whole. In data-parallel applications, each dataset may
represent a partition of the complete dataset on the current worker process,
rank, or thread. Each dataset in a vtkPartitionedDataSet
is called a
partition, implying it is only a part of a whole.
All non-null partitions have similar field and attribute arrays. For example, if
a vtkPartitionedDataSet
comprises of vtkDataSet
subclasses, all will have
exactly the same number of point data/cell data arrays, with same names, same
number of components, and same data types.
Partitioned Dataset Collection#
vtkPartitionedDataSetCollection
is a collection of vtkPartitionedDataSet
.
Thus, it is simply a mechanism to group multiple vtkPartitionedDataSet
instances together. Since each vtkPartitionedDataSet
represents a whole dataset
(not be confused with vtkDataSet
), we can refer to each item in a
vtkPartitionedDataSetCollection
as a partitioned-dataset.
Unlike items in the vtkPartitionedDataSet
, there are no restrictions of consistency
between each items, partitioned-datasets, in the vtkPartitionedDataSetCollection
.
Thus, in the multiblock-dataset parlance, each item in this collection can be thought
of as a block.
Data Assembly#
vtkDataAssembly
is a means to define an hierarchical organization of items in a
vtkPartitionedDataSetCollection
. This is literally a tree made up of named nodes.
Each node in the tree can have associated dataset-indices. For a vtkDataAssembly
is
associated with a vtkPartitionedDataSetCollection
, each of the
dataset-indices is simply the index of a partitioned-dataset in the
vtkPartitionedDataSetCollection
. A dataset-index can be associated with multiple nodes in
the assembly, however, a dataset-index cannot be associated with the same node more than once.
An assembly provides an ability to define a more complex view of the raw data blocks in
a more application-specific form. This is not much different than what could be achieved using simply
a vtkMultiBlockDataSet
. However, there are several advantages to this separation of storage
(vtkPartitionedDataSetCollection
) and organization (vtkDataAssembly
). These will become clear as
we cover different use-cases.
While nodes in the data-assembly have unique ids, public facing algorithm APIs should not use them. For example an extract-block filter that allows users to choose which blocks (rather partitioned-datasets) to extract from vtkPartitionedDataSetCollection can expose an API that lets users provide path-expression to identify nodes in the associated data-assembly using their names.
Besides accessing nodes by querying using their names, vtkDataAssembly
also
supports a mechanism to iterate over all nodes in depth-first or breadth-first
order using a visitor. vtkDataAssemblyVisitor defines a API that can be
implemented to do custom action as each node in the tree is visited.
Design Implications#
Since
vtkPartitionedDataSet
is simply parts of a whole, there is no specific significance to the number of partitions. In distributed pipelines, for example, avtkPartitionedDataSet
on each rank can have arbitrarily many partitions. Furthermore, filters can add/remove partitions as needed. Since thevtkDataAssembly
never refers to individual partitions, this has no implication to filters that use the hierarchical relationships.When constructing
vtkPartitionedDataSetCollection
in distributed data-parallel cases, each rank should have exactly the same number of partitioned-datasets. In this case, eachvtkPartitionedDataSet
at a specific index across all ranks together is treated as a whole dataset. Similarly, thevtkDataAssembly
on each should be identical.When developing filters, it is worth considering whether the filter really is a
vtkPartitionedDataSetCollection
filter or simply avtkPartitionedDataSet
-aware filter that needs to operate on eachvtkPartitionedDataSet
individually. For example, typical multiblock-aware filters like ghost-cell-generation, data-redistribution, etc., are simplyvtkPartitionedDataSet
filters. ForvtkPartitionedDataSet
-only filters, when the input is avtkPartitionedDataSetCollection
, the executive takes care of looping over each of the partitioned-dataset in the collection, thus simplifying the filter development.Filters that don’t change the number of partitioned-datasets in a vtkPartitionedDataSetCollection don’t generally affect the relationships between the partitioned-datasets and hence can largely pass through the vtkDataAssembly. Only filter like extract-block that remove partitioned-datasets need to update the vtkDataAssembly. There too, vtkDataAssembly provides several convenience methods to update the tree with ease.
It is possible to develop a mapper that uses the
vtkDataAssembly
. Using APIs that let users use path-queries to specify rendering properties for various nodes, the mapper can support use-cases where the input structure keeps changing but the relationships remain largely intact. Since the same dataset-index can be associated with multiple nodes in avtkDataAssembly
, the mapper can effectively support scene-graph like capabilities where user can specify transforms, and other rendering parameters, while reusing the heavy datasets. The mapper can easily tell if a dataset has already been uploaded to the rendering pipeline since it will have the same id and indeed be the same instance even if is being visited through different branches in the tree.