Improved Deep Image Compositing Using Subpixel Masks
Jonathan Egstad
Mark Davis
Dylan Lacewell
DreamWorks Animation Technical Report 2015-348
Abstract
We present an improved method of producing and manipulating deep pixel data which retains important surface information calculated during rendering for later use in compositing, allowing operations normally performed in the renderer to be deferred until compositing. These include pixel-coverage calculation, pixel filtering, hard-surface blending, and matte-object handling. Current methodologies for representing and transmitting deep pixel data work well for combining volumetric and hard-surface renders but are far less successful at combining hard-surface renders with each other. By retaining additional surface information, a renderer's final integration steps can be reconstructed later in compositing.
CR Categories: I.3.3 [Computer Graphics]: Deep Compositing: Compositing;

Keywords: deep compositing, compositing, rendering, subpixel masks
1 Deep Image Compositing
A typical pixel produced by a CG renderer is normally a series of separate surface shading calculations combined, or merged, together. Most renderers sample surfaces at a higher rate than the output pixel resolution in order to reduce aliasing, and integrate samples, possibly from adjoining pixels, through a pixel filter to produce a smoother final result.

At each subpixel location the renderer evaluates and shades overlapping surfaces and flattens the result before integrating that subpixel's result with the surrounding ones. The final result is termed a flat render since the result is a flat 2D image. Surfaces are typically shaded front to back so that surfaces hidden behind an opaque one are ignored, saving render time. Unfortunately, those hidden surfaces are potentially useful during certain post-processing operations.
To generate a deep image, the renderer outputs shaded surface fragments as a series of deep samples contained within a deep pixel, without flattening, optionally retaining some hidden surfaces. The final flattening step is performed at the very end of post-processing, typically in a compositing package. Deferring flattening can avoid costly re-renders in production; one common use is rendering volumetric and hard-surface elements in separate render passes, or in completely different rendering packages, and combining them later with accurate partially-transparent depth intersections.
2 Deep Workflow Challenges

The current industry-standard workflow for rendering and handling deep images was outlined in [Hillman 2013] and implemented in the OpenEXR library starting with version 2.0 [Kainz 2013]. The manipulation of deep data for compositing is often performed in The Foundry's Nuke compositing package, which provides a specialized tool set for manipulating deep data conforming to the OpenEXR 2.0 recommendations.

In this workflow each deep sample contains at least one Z-depth value defining the sample's distance from camera, or two Z values (termed Zfront and Zback) which define the depth range the sample covers. A depth range of 0 indicates a hard surface, while a range > 0 indicates a homogeneous volume segment. The color encoded into such a volumetric sample is the color at Zback, with logarithmic interpolation used to determine a color value between Zfront and Zback.

While this workflow works well for combining volumetric and hard-surface samples, or combining multiple volumetric samples, it does not work so well when combining hard-surface samples. This is primarily due to: a) the lack of subpixel spatial information, b) no ability to pixel-filter deep samples, and c) support for only logarithmic interpolation of samples.
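To make the volumetric convention concrete, here is a minimal sketch (our illustration, not code from this workflow) of the logarithmic alpha interpolation described in [Kainz 2013]; the Sample struct and its field names are assumptions:

#include <cmath>

// Hypothetical deep-sample record: premultiplied color, alpha, depth range.
struct Sample { float r, g, b, a, zf, zb; };

// Alpha of the sub-segment [zf, z] of a homogeneous volume sample. Opacity
// accumulates exponentially through a homogeneous medium, so the visible
// alpha interpolates logarithmically in depth.
float volumeAlphaAt(const Sample& s, float z)
{
    if (s.zb <= s.zf) return s.a;             // zero thickness: hard surface
    float t = (z - s.zf) / (s.zb - s.zf);     // normalized depth in 0..1
    return 1.0f - std::pow(1.0f - s.a, t);
}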
Lacking additional surface information there is no way to determine x/y correlation between samples, so it is impossible to correctly weight their respective contributions to the final flattened pixel. One way around this is to multiply (pre-weight) the sample color and alpha by its pixel contribution, or coverage, but that only works when the samples are kept isolated: since flattening is performed on the depth-sorted samples front-to-back by successive under [Porter and Duff 1984] operations, the weighting must be non-uniform to compensate for the decreasing contribution of each successive sample. When these pre-weighted samples are interleaved with other samples during a deep merge operation there is no guarantee the correct sample weighting will still exist, leading to visual artifacts.
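For reference, a minimal sketch of the front-to-back flattening loop described above, built on the under operation of [Porter and Duff 1984]; the types and the Zfront sort key are our assumptions:

#include <algorithm>
#include <vector>

struct Sample { float r, g, b, a, zf, zb; };   // premultiplied color + depth range
struct Pixel  { float r = 0, g = 0, b = 0, a = 0; };

// Flatten samples front to back: each successive sample is attenuated by the
// alpha accumulated so far, i.e. composited "under" the running result.
Pixel flatten(std::vector<Sample> samples)
{
    std::sort(samples.begin(), samples.end(),
              [](const Sample& x, const Sample& y) { return x.zf < y.zf; });
    Pixel out;
    for (const Sample& s : samples) {
        float w = 1.0f - out.a;                // remaining visibility
        out.r += w * s.r; out.g += w * s.g;
        out.b += w * s.b; out.a += w * s.a;
    }
    return out;
}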
Another common issue is the need to handle the merging of mutually-cutout objects while accurately applying filter effects like camera defocus (bokeh) in preparation for normal, flat compositing. Mutual cutouts occur when two or more objects are in close proximity, often overlapping or intersecting, but need to be rendered separately from each other. Rendering the objects as deep images can defer overlap/intersection resolution to the deep merging and flattening steps, and allows operations like defocus to work accurately since the algorithm has the information to resolve depth intersections and reveal hidden surfaces. However, keeping all rendered elements as deep data to defer the cutout issue is often impractical due to the high memory and CPU cost of interactively compositing deep data, and it leads to a loss of compositing control since many common comp operations cannot be performed on deep data. To be most flexible in production we generally still want to composite with flat 2D images, but have them pre-processed with proper cutouts, defocused, and ready to merge.
3 Subpixel Masks
To illustrate the subpixel spatial problem, consider pixel filtering. No renderer can perform pixel filtering on output deep samples, and without it there is a perceptual difference between a flat render and a deep render after flattening. Since pixel filtering must always be performed last, we need a method of retaining and transmitting the subpixel surface-fragment information through all subsequent deep operations until flattening is finally performed.
Outputting all subpixel surface fragments as deep samples requires a tremendous amount of memory and disk space, which are already stressed by existing deep compositing workflows. Even if we did, we would still be missing their x/y locations within the pixel and thus be unable to determine their relative distances within the pixel filter disc. We could store the subpixel x/y coordinate in an additional deep data channel, but we would still be outputting all the surface fragments, and they would still need to be coverage pre-weighted. A brute-force solution is to scale up the render resolution to match the subpixel rate and sample only once per pixel, producing subpixel deep samples that have implicit x/y alignment. Unfortunately the increased image resolution would need to be retained through all deep compositing operations until flattening / pixel-filtering / down-sampling is performed.
A better solution, which retains the subpixel spatial information while reducing the deep sample count, is to combine (collapse) subpixel surface fragments together while simultaneously building up a subpixel bitmask which is interpreted as a 2D array of bits. This bitmask provides x/y correlation and pixel-coverage information in a minimum of additional per-sample storage - see section 4 for details on collapsing. The bitmask size should be at least 8x8 (64 bits) to adequately capture high-frequency details like fur and hair. Larger masks could be used, but their storage needs become prohibitive, and supporting variable-sized masks severely complicates sample management.
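As an illustration of the storage involved, an 8x8 mask fits in one 64-bit word; the helper names below are hypothetical:

#include <bit>        // std::popcount (C++20)
#include <cstdint>

using SpMask = uint64_t;                       // 8x8 grid of subpixel bits

// Enable the bit for subpixel (x, y), with x and y in 0..7.
inline void setSubpixel(SpMask& m, int x, int y)
{
    m |= SpMask(1) << (y * 8 + x);
}

// Fraction of the pixel covered by this sample. A mask of 0 is the special
// full-coverage case used for volumes and legacy renders (see below).
inline float coverage(SpMask m)
{
    return m == 0 ? 1.0f : std::popcount(m) / 64.0f;
}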
Deep pixel flattening is performed at each subpixel mask bit by finding all deep samples that have that bit enabled, depth-sorting them, and merging front-to-back while handling sample overlaps. Each subpixel's flattened result is then integrated with the other subpixel results in a pixel filter to produce the final result. This produces more accurate results for overlapping and common-edge surfaces since the depth ordering of the deep samples at each subpixel location is handled uniquely, anti-aliasing surface edges and eliminating the incorrect color mixing of overlapping opaque samples (Figures 1a, 1b, 1c). As mentioned in section 2, it is important to keep surface opacity and pixel coverage separated so that interleaving uncorrelated deep samples does not produce weighting artifacts upon flattening. The surface color is still premultiplied by the surface opacity but is not premultiplied by coverage, which is instead captured in the subpixel mask pattern (Figure 2).
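A sketch of this per-subpixel flattening loop follows (our illustration; a box filter stands in for the fuller kernels discussed below, and the names are assumptions):

#include <algorithm>
#include <cstdint>
#include <vector>

struct Sample { float r, g, b, a, zf, zb; uint64_t mask; };
struct Pixel  { float r = 0, g = 0, b = 0, a = 0; };

// Front-to-back "under" accumulation, as in section 2.
Pixel flattenFrontToBack(std::vector<Sample> s)
{
    std::sort(s.begin(), s.end(),
              [](const Sample& a, const Sample& b) { return a.zf < b.zf; });
    Pixel out;
    for (const Sample& x : s) {
        float w = 1.0f - out.a;
        out.r += w * x.r; out.g += w * x.g;
        out.b += w * x.b; out.a += w * x.a;
    }
    return out;
}

// Flatten a deep pixel one subpixel bit at a time, then integrate the 64
// subpixel results. A wider kernel (e.g. Blackman-sinc) would also gather
// subpixel results from neighboring deep pixels.
Pixel flattenDeepPixel(const std::vector<Sample>& deep)
{
    Pixel out;
    for (int bit = 0; bit < 64; ++bit) {
        std::vector<Sample> covering;
        for (const Sample& s : deep)
            if (s.mask == 0 || (s.mask >> bit) & 1)   // 0 == full coverage
                covering.push_back(s);
        Pixel sub = flattenFrontToBack(covering);
        const float w = 1.0f / 64.0f;                 // box filter weight
        out.r += w * sub.r; out.g += w * sub.g;
        out.b += w * sub.b; out.a += w * sub.a;
    }
    return out;
}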
The final result of this flattening and pixel filtering is not exactly the same as the result from the renderer's filter (for example, jittered subpixel locations have been lost), but it is significantly better than no filtering at all.
However, while subpixel masks solve aliasing issues at the edges of surfaces, there will still be aliasing when flattening uncorrelated deep samples, since there are often no surface edges at those pixels and the subpixel masks are saturated (all bits on). This happens when separate hard-surface renders are deep merged together and the renderer has collapsed the subpixel surface fragments into a single deep sample with a saturated subpixel mask. Consider, for example, two intersecting walls at right angles to each other, where each wall is rendered separately and then deep merged. Since all subpixel bits in the mask share the same Z-depth value, the slope of one surface relative to another cannot be determined at a subpixel level. To anti-alias these types of hard-surface intersections we also need the Z-depth range of the subpixel fragments before they were collapsed - see section 5 for more details on how this is handled.
While adding a 64-bit mask to each deep sample may at first seem very expensive, in practice it produces a net decrease in deep sample count due to the collapsing together of surface fragments. Each deep sample has a relatively high memory cost, typically a minimum of six 32-bit floats (R, G, B, A, Zfront, Zback) totaling 24 bytes, while the mask adds another 8 bytes; collapsing two samples into one therefore yields a net saving of 16 bytes (48 bytes for two unmasked samples versus 32 bytes for one masked sample). When written to an OpenEXR file with lossless compression, the mask data channels usually compress very well since the mask pattern typically changes only along the edges of surfaces. Most hard-surface renders will have little to no change in the pattern along a scanline. Furry/hairy objects are an obvious exception to this, but they still enjoy a sample-count reduction from the collapsing of subsamples - see section 4 for details.
For objects that don't readily produce subpixel information, such as volumes, the subpixel mask can be omitted entirely. A mask value of 0x0 is handled as a special case and indicates full coverage, since a sample with zero coverage would not be output in the first place. This means that legacy renders which do not provide subpixel masks will be interpreted as having full coverage during flattening.
Figure 1a: Relationship of the front and back surfaces for Figures 1b and 1c. In both figures the surfaces are separated in Z and have one edge that lines up as viewed from camera. In Figure 1c the back sample is completely hidden by the front sample.
Figure 1b: Common-edge surface compositing results from a renderer's subpixel algorithm, the OpenEXR 2.0 method, and our modified method using subpixel masks. Note that the final color resulting from the subpixel mask method closely matches the renderer's result.
Figure 2: Comparison of flattening results. In (a) note the
presence of seams along common-edge surfaces due to the
premultiplication of samples by coverage in the renderer. In (b)
no coverage-premultiplication resolves the seams but results in a
loss of anti-aliasing. In (c) using subpixel masks to provide
coverage restores anti-aliasing with no seams.
Figure 3: Comparison of pixel filtering kernels of varying sizes.
The blackman-sinc and box filter implementations of the flattener
match the production renderer implementation so the result is a
very close match to the renderer's pixel filter output.
Figure 1c: Overlapping surface compositing results from a renderer's subpixel algorithm, the OpenEXR 2.0 method, and our modified method using subpixel masks. Note that the final color resulting from the subpixel mask method closely matches the renderer's result.

4 Sample Collapsing
As discussed, the collapsing of surface fragments together while producing subpixel masks can significantly reduce the resulting deep sample count, but what criteria can be used to associate surface fragments with each other? In our experience there is no one correct way, and a number of factors can be taken into account. Here are a few:
• Primitive and/or surface ID - do the fragments come from the same geometric primitive or surface?
• Surface normal - do the fragments share the same general orientation?
• Z-depth - are the fragments close together in Z?
• Brightness - are the fragments generally the same brightness?
Some or all of these factors can be considered when collapsing fragments. However, care must be taken with respect to subpixel brightness changes (or any high-contrast color change), as a loss of high-frequency detail can result if the contrast changes are pre-filtered out during fragment collapsing and are no longer reconstructable during flattening/pixel-filtering. In those instances it's better to generate multiple deep samples with correct subpixel masks to capture the extremes of the contrast change. A good example of this would be a furry character with glinting fur highlights.


The implementation of this scheme will be unique to every renderer, but the basic operation of combining fragments involves averaging their color channels together and finding the min/max depth range for them all. Each fragment's x/y subpixel location and shape is used to build up the final subpixel mask by enabling the corresponding bit(s) in the mask. Since the sample rate of the renderer can be higher or lower than 8x8, more than one bit may need to be enabled, so it is important that the sampling locations are stratified to guarantee at least one falls inside each bit bin; otherwise holes can occur in the mask, producing visible cracks in the flattened result.
At this point the final deep sample with the extra subpixel mask
data channels is written into the OpenEXR deep file - see section
6 for more info.
For example, in a REYES-style renderer one or more micropolygons will intersect a given pixel but only cover a subset of the subpixels. Each micropolygon could be inserted into the deep pixel as a discrete sample with its corresponding subpixel coverage mask. However, in the presence of motion blur and/or highly tessellated geometry there could be as many samples as there are subpixels. To mitigate this, our renderer builds clusters of micropolygons with the same geometry and material ID and collapses each cluster into a single sample. The sample color is derived from the average color of the micropolygons in the cluster, weighted by their respective subpixel contributions. The Zfront and Zback of the sample are derived from the min/max Z of all the micropolygons in the cluster.
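A sketch of collapsing one such cluster of shaded fragments into a single masked deep sample, along the lines described above (the Fragment fields and the 8x8 binning are assumptions, not our renderer's actual code):

#include <algorithm>
#include <cstdint>
#include <vector>

struct Fragment   { float r, g, b, a, z; float sx, sy; };  // sx/sy in [0,1) within the pixel
struct DeepSample { float r = 0, g = 0, b = 0, a = 0;
                    float zf = 0, zb = 0; uint64_t mask = 0; };

// Collapse fragments that share a geometry/material ID: average the color,
// take the min/max depth range, and OR each fragment's stratified sample
// location into the subpixel mask. Coverage stays in the mask; the color is
// deliberately NOT premultiplied by coverage (see section 3).
DeepSample collapse(const std::vector<Fragment>& frags)
{
    DeepSample out;
    out.zf = 1e30f; out.zb = -1e30f;
    for (const Fragment& f : frags) {
        int x = std::min(7, int(f.sx * 8.0f));
        int y = std::min(7, int(f.sy * 8.0f));
        out.mask |= uint64_t(1) << (y * 8 + x);
        out.r += f.r; out.g += f.g; out.b += f.b; out.a += f.a;
        out.zf = std::min(out.zf, f.z);
        out.zb = std::max(out.zb, f.z);
    }
    if (!frags.empty()) {
        // Plain average shown here; production code would weight each
        // micropolygon by its subpixel contribution, as described above.
        float inv = 1.0f / float(frags.size());
        out.r *= inv; out.g *= inv; out.b *= inv; out.a *= inv;
    }
    return out;
}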
5 Surface Flags and Sample Interpolation
Deep merging separate hard-surface renders together will often produce aliasing along any surface intersections due to the lack of surface slope information at the intersection points. Subpixel masks will not help here, as both surfaces likely have full pixel coverage at these locations. The normal of the surface is of limited value, as orientation alone does not provide enough information without corresponding depth information. What is needed is the Z-depth range that the surface covers in the deep pixel, so that the visible portion of each surface's contribution can be linearly weighted relative to the other's, resulting in an anti-aliased intersection (Figure 4). This Z-blending effect can be visually significant at the edges of rounded objects, where the angle of the surface to camera is most oblique, and in regions where the slopes of intersecting surfaces are nearly equal.

Figure 4: Comparing the hard-surface intersections of two teapots, offset and overlapped, with and without linear interpolation. Note that (a) and (b) appear identical: although the samples in (b) have depth ranges, log interpolation fails when sample alpha is 1.0, a very common case with hard surfaces.
However, actually performing the linear interpolation is a challenge in the current OpenEXR workflow, as samples with thickness > 0 are assumed to be volumetric segments and only log interpolation is supported. We need some way of knowing whether a deep sample is hard-surface or volumetric, and it must be stored on the sample so that deep merging hard-surface and volumetric samples together retains that information through subsequent deep operations.
To handle this we add a hard-surface flag bit to each deep sample and store the bit in a 16-bit half-float flag channel as an integer value. If the flag is on (1.0) the flattener performs linear interpolation of the sample's depth range, and if it is off (0.0) it performs the normal log interpolation for a volumetric sample. The flattening algorithm must carefully handle overlaps of differing surface types when combining sample segments (Figure 5).

Figure 5: Flattening steps handling a combination of overlapping volumetric (s0) and hard-surface (s1, s2) samples.
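A sketch of how a flattener might split a sample at an intermediate depth z based on this flag (the flag decode and the linear weighting shown are our assumptions about one plausible implementation):

#include <cmath>

enum SurfaceFlags { kHardSurface = 1, kMatte = 2 };  // integer values in a half-float channel

struct Sample { float r, g, b, a, zf, zb; float flags; };

// Alpha of the visible portion [zf, z] when splitting a sample at depth z.
// A hard surface blends linearly across its depth range; a volume uses the
// standard logarithmic interpolation, which cannot split a sample whose
// alpha is 1.0 (see the Figure 4 caption).
float alphaAt(const Sample& s, float z)
{
    if (s.zb <= s.zf) return s.a;                    // zero thickness
    float t = (z - s.zf) / (s.zb - s.zf);            // normalized depth in 0..1
    if (int(s.flags) & kHardSurface)
        return s.a * t;                              // linear
    return 1.0f - std::pow(1.0f - s.a, t);           // logarithmic
}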
This scheme is backward-compatible with existing volumetric images written without the hard-surface flag, since the flag channel will be filled with zeros when deep pixels are merged together during compositing. We have modified our renderer to set this flag when outputting deep samples, as it is aware of the surface type during shading. We also expose controls for the user to set/change this bit on Nuke's deep OpenEXR reader, or via a separate custom DeepSurfaceType operator.
Another useful custom attribute to store on each sample is whether or not it is a matte object. Making an object the matte source for another object is a common operation in production rendering. The result is similar to setting the matte object's color to black (0) and merging it with other objects, so that the black matte object becomes a holdout of the others. Unfortunately, just setting the object color to zero does not produce a correctly held-out alpha, and setting the matte object's alpha to zero simply makes it a transparent black object, producing no holdout at all. Because of this, the matte operation is handled as a special case in a renderer when merging the surface samples together, and it requires some indicator that a surface is matte, either a shader setting or a geometry attribute. This surface information is normally only valid inside the renderer and is difficult to pass on to later compositing steps.
The matte flag bit has float value 2.0 and is checked during flattening (or by any operation that needs to take matte-ness into account), where the matte operation is handled just as it is in the renderer.
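One plausible way for a flattener to honor the matte flag during the front-to-back under accumulation (our own sketch; the separate occlusion accumulator is an assumption):

struct Sample    { float r, g, b, a; float flags; }; // flags as above (kMatte == 2)
struct Flattened { float r = 0, g = 0, b = 0, a = 0, occ = 0; };

// Composite one depth-sorted sample "under" the accumulated result.
// A matte sample occludes everything behind it (it adds to the occlusion
// accumulator) but contributes neither color nor alpha, so later samples
// are held out and the flattened image stays black and empty there.
void underSample(Flattened& out, const Sample& s)
{
    const bool matte = (int(s.flags) & 2) != 0;
    float w = 1.0f - out.occ;                        // remaining visibility
    if (!matte) {
        out.r += w * s.r; out.g += w * s.g; out.b += w * s.b;
        out.a += w * s.a;
    }
    out.occ += w * s.a;
}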
One common use case for deferring matte-object handling until flattening is the defocusing of deep pixels. Defocusing deep samples produces a flattened pixel result, so the defocus operation must perform both the blurring logic and the flattening logic simultaneously. Performing the defocus after flattening, on already held-out images, will often produce incorrectly blurred edges along intersections, leading to artifacts when the images are merged. With deferred mattes, a matte sample that is closer to camera can become blurred and partially obscure non-matte samples with black, and since depth intersections are being handled at that moment, the intersection edges of the objects are accurately blurred.
6 Storage of Subpixel Mask and Flags
An 8x8 subpixel mask requires 64 bits, or 8 bytes. Storing a 64-bit unsigned integer value uncompressed in an OpenEXR file, or loading it into Nuke's deep system, is not possible since both support only 32-bit pixel data channels. The 64-bit mask can be split into two 32-bit unsigned integer channels, but in practice the mask is split into two 32-bit float channels, since applications often convert integers into floats on read, destroying the bitmask pattern.
The bitmask pattern is copied directly into the two floats, and the reverse is done in the flattener to reassemble the mask (possible endian issues are currently ignored). To avoid the conversion risk the mask could instead be split into four 16-bit integer values and stored in four 32-bit float channels, but that incurs a heavier disk-space and memory footprint, which was deemed undesirable. In practice we have not seen problems with keeping the bitmask pattern stored in floats, except when writing OpenEXR deep files back out of Nuke, where care must be taken to not inadvertently drop the 32-bit floats down to 16-bit half-floats.
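A minimal sketch of this bit-for-bit packing (our illustration; endian issues are ignored here as well):

#include <cstdint>
#include <cstring>

// Copy the raw 64-bit mask pattern into two 32-bit floats without numeric
// conversion, and reassemble it in the flattener. memcpy preserves the bit
// pattern exactly; a numeric cast would destroy it.
void packMask(uint64_t mask, float& f0, float& f1)
{
    uint32_t lo = uint32_t(mask), hi = uint32_t(mask >> 32);
    std::memcpy(&f0, &lo, 4);
    std::memcpy(&f1, &hi, 4);
}

uint64_t unpackMask(float f0, float f1)
{
    uint32_t lo, hi;
    std::memcpy(&lo, &f0, 4);
    std::memcpy(&hi, &f1, 4);
    return uint64_t(lo) | (uint64_t(hi) << 32);
}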
Half-float conversion is not a concern for the flag bits, as they are stored as integer float values (0.0, 1.0, 2.0, 4.0, etc.) and will survive integer and half-float conversions.

When writing an OpenEXR deep file there will be the usual RGBA channels plus AOVs, the Z (front) and ZBack channels, and three new spmask channels storing the mask (spmask.1, spmask.2) and the surface flags (spmask.3). This is the channel list from a typical OpenEXR deep file written by our renderer with the customized deep data:

A, 16-bit floating-point, sampling 1 1
B, 16-bit floating-point, sampling 1 1
G, 16-bit floating-point, sampling 1 1
R, 16-bit floating-point, sampling 1 1
Z, 32-bit floating-point, sampling 1 1
ZBack, 32-bit floating-point, sampling 1 1
spmask.1, 32-bit floating-point, sampling 1 1
spmask.2, 32-bit floating-point, sampling 1 1
spmask.3, 16-bit floating-point, sampling 1 1

Note again that spmask.1 and spmask.2 are 32-bit floating-point rather than unsigned int.

7 Nuke Deep System Changes

Since the current Nuke deep system (as of Nuke 9, 2015) does not support these workflow changes, the following deep nodes needed to be modified or added:

• DeepToImage: Replacement for the stock Foundry flattening node, implementing all the features described in this paper.
• DeepSurfaceType: Sets/clears the hard-surface and matte flags.
• DeepMatte: Sets the matte flag on all samples to mark an image as a matte source.
• DeepPremult: Premults/unpremults the color channels by the subpixel coverage value.
• DeepCamDefocus: Performs camera bokeh defocusing of deep data, with support for the matte flag and subpixel coverage during flattening (same flattening algorithm as DeepToImage).
• DeepGrade: Simplified per-sample color correction with no attempt to re-weight samples.
• exrReaderDeep: Modified to add override controls for setting the subpixel mask and surface flags.
• exrWriterDeep: Modified to always write the subpixel mask channels as 32-bit floats.
8 Conclusion
We have described a method for reproducing several rendering operations that are unavailable in current deep image compositing workflows. The production challenges of the current workflows are: a) lack of the subpixel spatial information needed to correctly merge overlapping or common-edge samples containing pixel-coverage weights, b) lack of ability to pixel-filter, c) the definition of any thick sample segment (> 0 depth range) as volumetric data, and d) lack of formal support for matte objects during merging and flattening.
By extending rather than replacing the current methodology we maintain backward-compatibility while offering new functionality. Adding per-sample subpixel masks provides this minimum set of advantages:
• Improved merging of overlapping surfaces
• Improved merging of common-edge surfaces
• Pixel filtering can be performed
• Reduced sample count
By supporting a hard-surface indicator flag and linear
interpolation of thick samples we can blend the intersections of
these hard surfaces and reduce aliasing.
Adding support for a matte flag avoids the current destructive
holdout methodology and allows complex filtering effects to be
applied with accurate holdouts.
This workflow is a work in progress, and we have identified several areas that would benefit from future work:
• Can OpenEXR and Nuke's deep pixel representation be extended to store the additional bits for the subpixel mask and surface flags as deep sample metadata rather than as raw channel data? This would dramatically simplify the management of the data and avoid its accidental destruction.
• Commercial renderer providers should be encouraged to include the subpixel mask and surface flag information when they write deep OpenEXRs.
• Storing the x/y of the surface normal as 16-bit half-floats to better define the slope direction of the surface, and combining this with the subpixel mask location to find a more accurate per-subpixel Z-depth intersection.
References
Hillman, P. 2013. The Theory of OpenEXR Deep Samples. http://www.openexr.com/TheoryDeepPixels.pdf
Kainz, F. 2013. Interpreting OpenEXR Deep Pixels. http://www.openexr.com/InterpretingDeepPixels.pdf
Porter, T., and Duff, T. 1984. Compositing Digital Images. http://graphics.pixar.com/library/Compositing/paper.pdf