vtkPolyhedron Contour, Clip, and IsInside Algorithms#
Overview#
This document describes the algorithms used by vtkPolyhedron for contouring
(isosurface extraction), clipping, and inside/outside classification.
Reference:
J. López, A. Esteban, J. Hernández, P. Gómez, R. Zamora, C. Zanzi, F. Faura, “A new isosurface extraction method on arbitrary grids”, Journal of Computational Physics, Volume 444, 2021, 110579. https://doi.org/10.1016/j.jcp.2021.110579
The open-source isoap library implementing this algorithm is available at: https://doi.org/10.17632/4rcf98s74c.1
Face Winding Requirement#
All algorithms below require that polyhedron faces have outward-pointing
normals: vertices must be ordered counter-clockwise when viewed from outside
the cell (right-hand rule). This requirement is documented in vtkPolyhedron.h.
The solid-angle IsInside() depends on this for correct sign of the winding
number. The López contour/clip algorithm depends on this for correct
“key face” assignment (outside→inside edge transitions).
IsInside: Solid Angle Summation#
vtkPolyhedron::IsInside() determines whether a query point lies inside or
outside the polyhedron using the winding number computed via solid angle
summation.
Algorithm#
A closed surface subtends exactly 4π steradians when viewed from any interior point, and 0 steradians from any exterior point. This is the 3D generalization of the 2D winding number.
total_solid_angle = 0
for each face:
triangulate face (fan from minimum-index vertex)
for each triangle (a, b, c):
total_solid_angle += solid_angle(a, b, c, query_point)
if |total_solid_angle| ≈ 4π → INSIDE
if |total_solid_angle| ≈ 0 → OUTSIDE
The solid angle of a triangle is computed using the Van Oosterom–Strackee formula:
tan(Ω/2) = det(ra, rb, rc) / (|ra||rb||rc| + |ra|(rb·rc) + |rb|(ra·rc) + |rc|(ra·rb))
where ra = a − x, etc.
A near-surface tolerance provides an early-out for points very close to faces, treating them as inside. This avoids numerical ambiguity at the boundary.
Comparison with Previous Implementation#
Aspect |
Old (Random Ray Casting) |
New (Solid Angle) |
|---|---|---|
Determinism |
Non-deterministic |
Deterministic |
Edge cases |
Ray hits vertex → retry |
No special cases |
Iterations |
Up to 10 with voting |
Single pass |
Winding dependence |
None (counts intersections) |
Requires outward normals |
Contour: López Polygon-Tracing Algorithm#
vtkPolyhedron::Contour() extracts isosurface polygons using the López
polygon-tracing method, implemented in vtkPolyhedronContour.
Algorithm#
Classify vertices as inside (scalar > isoValue) or outside.
Find iso-vertices on edges that cross the isosurface. Each iso-vertex lies on an edge shared by exactly two faces. One face sees the edge as outside→inside (the key face); the other sees inside→outside.
Trace iso-polygons by walking through key faces:
start at any unvisited iso-vertex repeat: face = current.KeyFace next = next iso-vertex on this face (circular order) until next == start
The next-iso-vertex-on-key-face rule produces closed polygons that form the isosurface contour without any post-processing.
Interpolate positions linearly along the crossing edges.
Key Insight#
Each iso-vertex has a “key face” (where the edge goes outside→inside in face traversal order). Tracing through key faces to the next iso-vertex automatically produces closed contour polygons. No triangulation, no edge merging, no fragment assembly.
Comparison with Previous Implementation#
Aspect |
Old (Triangulate-and-Merge) |
New (López) |
|---|---|---|
Face handling |
Triangulates all faces |
Works on original faces |
Internal edges |
Created then removed |
Never created |
Post-processing |
Complex MergeTriFacePolygons |
None |
Winding dependence |
None |
Requires outward normals |
Code complexity |
~800 lines, 20+ functions |
~550 lines, 1 class |
Clip#
vtkPolyhedron::Clip() reuses the contour infrastructure with an additional
face-building step:
Steps 1–3 from Contour (classify, find iso-vertices, trace iso-polygons).
Build clipped faces: for each original face, walk vertices keeping those on the retained side, inserting iso-vertices at cut edges.
Add cap faces: the traced iso-polygons become cap faces that close the clipped polyhedron (with reversed winding for
insideOut).Output as a polyhedron face stream.
Implementation Structure#
File |
Description |
|---|---|
|
Cell class; delegates to |
|
López algorithm: per-cell and bulk threaded API |
|
Threaded contour: polyhedron handling in fast + general paths |
|
Threaded clip: polyhedron handling with face stream output |
Centroid: Divergence Theorem#
vtkPolyhedron::GetCentroid() computes the volume and centroid simultaneously
using the divergence theorem. Each face is fan-triangulated from its first
vertex. Each triangle (v0, v1, v2) forms a signed tetrahedron with the origin
whose volume is v0 . (v1 x v2) / 6. The centroid is the volume-weighted
average of tet centroids (v0 + v1 + v2) / 4.
This replaces the previous pyramid-decomposition approach that required an
arbitrary apex point, per-face normal/area computation via vtkPolygon, and
face centroid calculations.
Computational Complexity#
Let E = total edge count across all faces (equivalently, sum of face vertex counts). For typical polyhedra E is O(F·k) where F = faces and k = average vertices per face.
Operation |
Old |
New |
|---|---|---|
Contour |
O(F·k²) — per-face triangle merge with linear-search edge ordering |
O(E) — single-pass polygon tracing |
Clip |
O(F·k²) — same contour + face rebuild |
O(E) — trace + linear face walk |
IsInside |
O(F) with up to 10 non-deterministic retries |
O(E) single deterministic pass |
GetCentroid |
O(E) with per-face vtkPolygon dispatch |
O(E) direct arithmetic |
The old contour/clip bottleneck was OrderEdgePolygon, which performs a linear
scan plus std::vector::erase (both O(n)) for each edge in a face’s contour
fragment, giving O(k²) per face. The new López algorithm visits each iso-vertex
exactly once during polygon tracing.
The constant-factor improvement is also significant: the old implementation creates ~20 intermediate data structures (vectors of edge vectors, edge-face maps, point-edge multimaps, edge count maps) and converts between representations multiple times. The new implementation uses one hash map for edge-to-iso-vertex lookup and traces directly.
Bulk Threaded API#
vtkPolyhedronContour provides a per-cell static API for integration with
VTK’s threaded contour and clip filters. The API takes the cell’s
vtkCellArray face stream and a vtkDataArray scalar field directly, without
instantiating vtkPolyhedron objects.
Contour: single-call per-cell helper#
Contour extraction uses a single-pass cell helper, ContourCell. The helper
runs the full López trace for one polyhedron and reports the intersected
edges as (globalPtId0, globalPtId1, t) tuples plus a per-polygon
vertex-count array. Iso-vertex deduplication is performed by the calling
filter through its shared edge locator (vtkStaticEdgeLocatorTemplate); the
helper does not write final iso-vertex coordinates. There is no separate
count phase because vtkContour3DLinearGrid does not prefix-sum polyhedron
cell sizes before running the trace: the point merger absorbs per-cell
output into thread-local buffers, and the fast path emits triangle vertex
coordinates directly.
Clip: two-pass Count/Emit#
Clip uses the classic two-pass pattern required to deduplicate iso-vertices through a shared edge locator:
Count pass (parallel): Each thread calls
CountClipper cell to determine output sizes and to record the set of intersected edges.Prefix sum + edge locator build (serial): Global offsets are computed from per-cell counts, and the collected edge list is fed into a shared
vtkStaticEdgeLocatorTemplatethat assigns unique iso-vertex IDs.Emit pass (parallel): Each thread calls
EmitClipper cell, writing geometry into pre-allocated output arrays at the computed offsets.
Both Count and Emit internally call the same RunLopezTrace function. The
trace is O(E). Running it twice per cell (once for Count, once for Emit) is
the canonical price paid for the two-pass parallel structure, and the
thread-local trace workspace keeps the constant factor low.
API Summary#
All entry points take the cell’s vtkCellArray face stream and a
vtkDataArray scalar directly, and report intersected edges as
(globalPtId0, globalPtId1, t) tuples for downstream deduplication via a
shared edge locator.
Contour — single-call helper. Used by vtkContour3DLinearGrid in both
fast and general paths.
static void ContourCell(numPointIds, pointIds, polyhedronFaces, scalars,
isoValue, generateTriangles,
polygonsSize, intersectedEdges);
polygonsSize is a per-output-polygon vertex count (3 for each triangle
when generateTriangles=true); intersectedEdges holds the edge tuples in
polygon traversal order so the caller can build polygon connectivity by
looking up each edge’s deduplicated iso-vertex ID.
Clip — two-pass Count/Emit. Used by vtkTableBasedClipDataSet. Output
point IDs are written directly (combining the kept-point pointMap lookup
with the iso-vertex numberOfKeptPoints + edgeId offset) so the caller can
copy the resulting cell arrays into pre-reserved global offsets without a
second remapping pass.
static void CountClip(numPointIds, pointIds, polyhedronFaces, scalars,
isoValue, insideOut,
&numOutputCells, &numOutputCellConnectivity,
&numOutputFaces, &numOutputFacesConnectivity,
intersectedEdges);
static void EmitClip(numPointIds, pointIds, polyhedronFaces, scalars,
isoValue, insideOut, pointMap, numberOfKeptPoints, edgeLocator,
outputCells, outputFaces);
The clip Count/Emit functions return edge endpoint pairs and interpolation weights rather than interpolated point data, so the calling filter can perform point data interpolation externally in bulk, matching the pattern used by existing VTK threaded filters.
Filter Integration#
Filter |
Path |
How polyhedra are handled |
|---|---|---|
|
Delegates to |
Threaded fast + general paths |
|
|
|
|
|
|
|
|
Polyhedra flow through the same batch / prefix-sum / extract pipeline as other cell types, using the cell-array |
|
Per-cell fallback |
|
Batch Data Extension (vtkTableBasedClipDataSet)#
The vtkTableBasedClipDataSet batch framework uses TableBasedCellBatchData
with an operator+= that accumulates per-batch counts across threads via
prefix sum. Polyhedron support required adding two new fields:
PolyhedronFacesOffset: total face connectivity entries (vertices per face)PolyhedronFaceLocsOffset: total face count entries (number of faces per cell)
These accumulate through the existing operator+= / BuildOffsetsAndGetGlobalSum
machinery alongside CellsOffset, CellsConnectivityOffset, and
CentroidsOffset. The prefix sum converts per-batch counts into global offsets
that ExtractCells uses to write face data at the correct positions.
Polyhedron cells reuse the existing sentinel case values that
vtkMarchingCellsClipCases.h defines for all cell types: KEPT_CELL_CASE
(fast-kept, all vertices on the retained side) and DISCARDED_CELL_CASE
(fast-discarded, no vertices on the retained side). Cells that actually
intersect the isosurface get the case value 1, indicating “handled by the
Count/Emit path.” These values are distinct from the 8-bit MC case indices
used by other cell types, so there is no collision.
Performance#
Benchmarks on an 8-core machine with polyhedron grids (hexahedral cells stored
as VTK_POLYHEDRON), comparing the threaded vtkContour3DLinearGrid and
vtkTableBasedClipDataSet against the legacy per-cell vtkContourGrid and
vtkClipDataSet paths:
Contour (vtkContour3DLinearGrid with MergePoints + InterpolateAttributes
vs vtkContourGrid):
Grid size |
Legacy (ms) |
Threaded (ms) |
Speedup |
|---|---|---|---|
30³ (27K) |
68 |
15 |
4.4x |
50³ (125K) |
194 |
63 |
3.1x |
70³ (343K) |
400 |
183 |
2.2x |
Clip (vtkTableBasedClipDataSet vs vtkClipDataSet):
Grid size |
Legacy (ms) |
Threaded (ms) |
Speedup |
|---|---|---|---|
20³ (8K) |
36 |
14 |
2.6x |
30³ (27K) |
67 |
32 |
2.1x |
See the production CFD mesh section below for detailed clip benchmarks across multiple cut scenarios on a 2.88M-cell mesh.
Key optimizations: direct grid array access when points and scalars are
double (eliminates per-cell copy), thread_local trace workspace (reuses
hash maps and vectors across cells within each thread), and a compact-remap
path that keeps the per-cell scratch buffers dense in local indices.
When input points are not double, a compact-remap path is used: a per-thread
globalToLocal map remaps cell point IDs to dense local indices 0..nCellPts-1,
the face stream is remapped to local indices, and cellPts/cellScalars buffers
of size nCellPts are filled. Performance is comparable to the double path
(68 ms vs 61 ms on the 2.88M-cell drone mesh).
Production CFD Mesh (drone_results.cgns)#
A production all-polyhedra CFD mesh with 2.88M VTK_POLYHEDRON cells and
6.5M points (CGNS NFACE/NGON format). Benchmarks run on an 8-core Linux
workstation. Each reported number is the median wall-time across 5
consecutive Update() calls, after one warmup Update(), measured with
time.perf_counter() around the filter Update(). Baseline is the
released ParaView 6.1.0 binary (pvpython) on the same Linux machine,
against the same mesh and pipeline.
Contour — PRES isocontour at value=0:
The speedup comes from two independent contributions:
Change |
Baseline |
Result |
Speedup |
|---|---|---|---|
López algorithm alone (serial |
~10,300 ms (PV 6.1.0 baseline) |
~1,300 ms (Linux, warning-free) |
~8x |
Threading + C3DLG infrastructure |
~1,300 ms (serial López) |
59 ms (C3DLG, 8 cores, no normals) |
~22x |
Combined vs PV 6.1.0 baseline (no normals) |
~10,300 ms |
59 ms |
~175x |
Combined vs PV 6.1.0 baseline (ComputeNormals=ON) |
~10,400 ms |
73 ms |
~143x |
The old algorithm emits hundreds of topology-defect warnings per run via
mutex-serialized vtkGenericWarningMacro, which dominates wall-clock time
in a warning-heavy environment. The López algorithm handles the same defective
cells silently and correctly, so warning overhead is not a factor in the new
path.
Path |
Time (ms) |
Output cells |
Output size |
|---|---|---|---|
ParaView 6.1.0 baseline, GT=ON, ComputeNormals=OFF |
10,309 |
438K triangles |
18.6 MB |
ParaView 6.1.0 baseline, GT=OFF, ComputeNormals=OFF |
11,247 |
438K triangles |
18.6 MB |
|
59 |
439K triangles |
18.6 MB |
|
73 |
439K triangles |
18.6 MB |
|
51 |
175K polygons |
12.6 MB |
|
55 |
175K polygons |
12.6 MB |
ComputeNormals=ON adds only a few milliseconds (cell-normal compute and
point-normal averaging at merged vertices). The previous orientation pass via
vtkOrientPolyData is no longer needed because the López trace produces
consistently oriented iso-polygons directly, so polygon-mode contours no
longer pay an orientation cost. Linux, 8-core workstation, 2.88M-cell drone
mesh, PRES=0.
The GenerateTriangles=OFF path uses ContourCell(generateTriangles=false) to
extract polygon connectivity directly, bypassing fan-triangulation. Output
polygons carry correct cell data and point normals. The 175K polygon count vs
439K triangle count reflects that polyhedron iso-polygons are typically quads
or pentagons that would otherwise each be split into 2-3 triangles. The 6 MB
output size reduction (32%) is meaningful in time-series workflows. With the
orientation pass removed, GT=OFF is now strictly faster than GT=ON on this
mesh (51 ms vs 59 ms with no normals).
Contour memory — on Windows, the old vtkContourGrid working set peaks at
3,310 MB during processing vs a 2,703 MB baseline — a ~600 MB transient
spike from per-cell vtkPolyhedron intermediate allocations. The new C3DLG
path uses thread-local workspace that stabilizes at high-water mark after the
first few cells and shows no measurable spike.
Clip — plane y=0 (half-domain, ~50% cells kept), 5 reps, median:
Path |
Time (ms) |
Output |
|---|---|---|
ParaView 6.1.0 baseline ( |
4,822 |
1,450,017 cells, 3,406,651 pts |
|
194 |
1,449,631 cells, 3,283,329 pts |
This is a ~25x speedup at the user-visible Clip filter level. The
dominant contribution is that vtkTableBasedClipDataSet now handles
polyhedra directly via CountClip/EmitClip instead of falling back to a
contour-based clip pipeline; the unified inside/outside classification
(shared with ContourCell) avoids redundant per-cell vertex tagging across
the count and emit passes.
Edge-locator sizing is bounded by the set of edges actually intersected by the plane (O(N_intersected × avg_edges_per_cell)) rather than the full input edge set, which keeps small-cut scenarios fast and avoids the half-domain slowdown seen in earlier clip implementations that pre-populated the locator with all input points.
Summary: on this all-polyhedra CFD mesh, vtkContour3DLinearGrid is
~22x faster than serial López contour and ~175x faster than the
ParaView 6.1.0 baseline (no normals). With ComputeNormals=ON the speedup vs
baseline is ~143x. vtkTableBasedClipDataSet with first-class polyhedron
support is ~25x faster than the ParaView 6.1.0 baseline on the y=0 plane
cut. All numbers are PRES=0 contour and y=0 plane clip on the 2.88M-cell
drone mesh, 8-core Linux workstation, comparing to ParaView 6.1.0 (released)
on the same hardware.
Synthetic All-Polyhedra Scaling Study#
To characterize how wall-time scales with mesh size, we run the same
contour and clip operations on synthetic all-polyhedra grids generated by
vtkCellTypeSource(VTK_POLYHEDRON) with a wavelet scalar field
(vtkRTAnalyticSource formula evaluated analytically on the polyhedral
mesh’s points; no auxiliary image or resampler). Isovalue is the field
midrange; clip plane is y = D/2 (half-domain).
Timing methodology: each reported number is the arithmetic mean of 5
consecutive Update() calls on the same filter instance, after a single
warmup Update() to absorb any one-time scheduling or allocation costs.
Each Update() runs the full filter end-to-end (RequestData() to
output ready). Wall-time is measured with time.perf_counter() around
the Update() call. Input dataset construction, scalar field evaluation,
and any pipeline plumbing happen once before timing starts and are not
included in the reported numbers.
This is a problem-size scaling study at fixed thread count: every row uses the same Linux 8-core workstation as the drone benchmark, with VTK SMP running at its default thread count (8). Threads are not varied. The timings therefore reflect how total work grows with mesh size on a fixed machine, not parallel efficiency (strong scaling) or per-thread efficiency under proportional growth (weak scaling).
Grid |
Input cells |
Output cells (contour) |
Contour GT=ON (ms) |
Contour GT=OFF (ms) |
Clip y=mid (ms) |
|---|---|---|---|---|---|
100³ |
1.0 M |
167 K |
18 |
16 |
39 |
200³ |
8.0 M |
673 K |
90 |
76 |
309 |
300³ |
27.0 M |
1.52 M |
230 |
200 |
1,077 |
400³ |
64.0 M |
2.69 M |
463 |
435 |
2,656 |
500³ |
125.0 M |
4.21 M |
761 |
720 |
5,180 |
The plot above is log-log; the dashed line is slope-1 (linear) for
reference. Contour wall-time scales sublinearly with input cells: 125x
input growth from 100³ to 500³ produces only ~42x growth in contour
time. The reason is dimensional: contour input is volumetric (O(D³))
but contour output is the iso-surface, which is O(D²). The output-bounded
portion of the algorithm therefore grows more slowly than total input
work. Clip output is also volumetric (a fraction of the input mesh), so
its cost scales close to linearly with input.
The drone production benchmark (2.88 M cells, 51-73 ms contour, 194 ms clip) sits comfortably within the linear regime of this scaling curve. The 125 M-cell point demonstrates that the algorithm contours nine-figure polyhedral meshes in under one second on commodity hardware.
GenerateTriangles=OFF for Linear Cells#
vtkContour3DLinearGrid now handles GenerateTriangles=OFF for all supported
linear cell types (VTK_HEXAHEDRON, VTK_TETRA, VTK_WEDGE, VTK_PYRAMID) via a
vtkPolygonBuilder post-process on the merged triangle output. Previously,
vtkContourFilter with GenerateTriangles=OFF fell back to vtkContourGrid
(serial, per-cell) for all these types. Now the full threaded C3DLG pipeline
handles them end-to-end — vtkContourGrid is no longer involved.
The polygon builder groups merged triangles by input cell ID and uses edge
interior/boundary counting to reconstruct the iso-polygon boundary. This
reproduces what vtkContourHelper does inside vtkContourGrid, but applied
once on the globally merged output rather than per cell.
Benchmarks on a 30×30×30 synthetic grid (vtkCellTypeSource),
vtkContour3DLinearGrid vs vtkContourGrid:
Cell type |
CGrid ON (ms) |
C3DLG ON (ms) |
CGrid OFF (ms) |
C3DLG OFF (ms) |
OFF speedup |
|---|---|---|---|---|---|
HEX (27K cells) |
12.8 |
1.7 |
6.0 |
3.4 |
1.8x |
TET (324K cells) |
26.0 |
2.1 |
40.0 |
17.0 |
2.4x |
WEDGE (54K cells) |
5.5 |
0.8 |
9.5 |
4.7 |
2.0x |
PYRAMID (162K cells) |
15.1 |
2.0 |
24.6 |
13.1 |
1.9x |
vtkContourFilter now unconditionally routes all supported cell types to
vtkContour3DLinearGrid regardless of GenerateTriangles, passing the flag
through via SetGenerateTriangles(). vtkContourGrid is no longer used for
any mesh that CanFullyProcessDataObject accepts.
Limitations#
These are inherent to vertex-based isosurface extraction, shared with the previous implementation:
Vertex-based sampling: only vertex scalar values are considered
Linear interpolation: iso-vertex position is linearly interpolated
One crossing per edge: an edge can have at most one iso-vertex
Manifold assumption: each edge must be shared by exactly two faces
Outward normals required: faces must be wound with outward-pointing normals
Normal Computation (ComputeNormals=ON)#
When ComputeNormals=ON, vtkContour3DLinearGrid computes per-cell normals
on the contour output and averages them at each merged vertex:
Cell normals —
vtkTriangle::ComputeNormalfor triangles,vtkPolygon::ComputeNormalover all polygon vertices for native iso-polygons (GenerateTriangles=OFF).Point normals — unweighted average of adjacent cell normals at each merged vertex, normalized. Identical to
vtkPolyDataNormals::GetPointNormals.
This relies on the López trace and the marching case tables producing
consistently wound output for the cell types they handle. Polyhedron
generation paths that may emit reordered faces (e.g. the wedge decomposition
in vtkUnstructuredGrid) are corrected at the source so the contour output
is consistent without a post-pass orientation step.
Future Work#
Face-Based Algorithm and GPU Execution#
The current implementation operates cell-by-cell: for each polyhedron, it unpacks the face stream, builds a global-to-local index map, copies coordinates and scalars into a compact local buffer, and runs the López trace. This is efficient on CPU but does not map well to GPU execution, where irregular per-cell control flow and per-cell memory allocation are costly.
We have developed a face-based reformulation of the López algorithm that eliminates all cell-level bookkeeping and exposes the computation as two purely data-parallel phases.
Key Insight#
In a face-based mesh (the native format of CFD solvers such as Fluent and OpenFOAM), each internal face stores its vertex list wound CCW for the owner cell (c0), and a neighbour cell ID (c1). The López polygon trace, which walks cell faces in CCW order to assign key vertices, is equivalent to walking each face once for each of its two cells — forward for c0, reversed for c1. The owner/neighbour direction flag encodes all the cell-level orientation information López needs; no cell topology is accessed during the algorithm.
Algorithm#
Phase 1 runs in parallel over all (face, cell) pairs. For each face, walk its vertices in the CCW direction for the given cell. Collect crossing edges in walk order to form the IPVINT list. For each key edge (a crossing where the walk steps outside-to-inside), pair it with the immediately preceding crossing edge in the IPVINT list (circular). Emit that pair as a polygon graph edge attributed to the cell. For faces with four crossing edges (saddle configurations), this rule resolves the topological ambiguity identically to the original López Algorithm 3.
Phase 2 runs in parallel over cells. Each cell accumulates a small polygon adjacency graph from Phase 1. Since every iso-vertex has degree two in this graph, the polygon is recovered by following the chain until it closes. Multiple disconnected cycles produce separate output polygons.
This reformulation has been verified to produce bit-for-bit identical polygon vertex sets and cyclic orderings to the original López algorithm across 300 test cases including random scalar fields and deliberately constructed ambiguous face configurations.
Phase 1 maps directly to a GPU compute shader or Viskores worklet: one thread per (face, cell) pair, no shared state during emission. Phase 2 maps to one thread per intersected cell executing a short sequential loop of 3 to 20 steps. The compact local index remap required by the current CPU implementation disappears entirely since the algorithm operates on global vertex IDs throughout.
Implications for vtkPolyhedronContour#
The face-based reformulation could replace the current ContourCell and
CountClip/EmitClip implementations, removing the per-cell scratch
buffers (ScratchGlobalToLocal, ScratchFaceStream, ScratchCoords,
ScratchScalars) and the global-to-local hash map build. The public API
surface would change to accept a face list with owner/neighbour
assignments rather than a per-cell face stream. For meshes already stored
in face-based format no preprocessing would be required.
Native Face-Based Storage for Polyhedral Datasets#
The current VTK unstructured grid stores polyhedra via a face stream
embedded in vtkCellArray: for each polyhedron cell, the face stream
encodes face count, then for each face its vertex count followed by global
vertex IDs. This format requires unpacking the full face stream to access
any single face, and does not directly expose the owner/neighbour
relationship between adjacent cells.
A more efficient representation for polyhedral CFD data would store faces as first-class entities:
Face list: one entry per unique face, storing vertex IDs wound CCW for the owner cell, the owner cell ID (c0), and the neighbour cell ID (c1, or -1 for boundary faces).
Cell-to-face map: for each cell, the list of face IDs that bound it. Used only for cell reconstruction — not needed during isosurface extraction.
This representation matches the native format of OpenFOAM
(owner/neighbour/faces files) and Fluent polyhedral meshes. It
enables the face-based isosurface algorithm to operate directly on the
stored data without any intermediate unpacking or reformatting.
Cell Reconstruction from Faces#
When cell-level output is required (e.g. for clip operations that output a clipped unstructured grid rather than a surface), cells can be reconstructed from the face list using the cell-to-face map. Each output polyhedron is described by collecting its bounding face IDs, remapping face vertices to local indices, and writing the VTK face stream. This reconstruction pass is a straightforward map over cells and is independent of the isosurface extraction.
For some in-situ and GPU use cases, cell reconstruction may not be needed at all: the isosurface output is a surface mesh of polygons, which is already in the correct form for GPU rendering without any cell-level structure.
GPU Execution Paths#
Three implementation strategies are being considered for GPU execution of the face-based algorithm:
Viskores worklets provide GPU portability across CUDA, HIP, and OpenMP backends. Phase 1 maps to a scatter worklet over faces; Phase 2 maps to a small gather worklet over cells. Viskores does not currently have a native face-based mesh representation and would require either a new dataset type or a face-array adapter.
Direct CUDA kernels give full control and are appropriate for use cases committed to NVIDIA hardware. The face-based algorithm’s two-phase structure maps directly: a CUDA kernel over (face, cell) pairs for Phase 1, a prefix sum for output sizing, and a second kernel over cells for Phase 2. This path requires no new VTK data structures but is not portable to AMD or Intel GPUs.
Compute shaders (GLSL/WGSL via VTK’s WebGPU or OpenGL backends) are the most forward-looking option for browser-delivered rendering where output is streamed to a JavaScript frontend. Phase 1 is a compute pass with atomic append or two-pass count/emit; Phase 2 can be skipped by rendering polygon edges directly from the Phase 1 output and fan-triangulating from a centroid.
For CFD meshes dominated by polyhedra with smooth scalar fields, ambiguous faces (four crossing edges on a single face) are rare. The IPVINT circular-previous rule handles them correctly in all cases without any geometric test, so all three GPU paths can use the same branch-free per-face kernel.