Improved Deep Image Compositing Using Subpixel Masks

Jonathan Egstad*  Mark Davis†  Dylan Lacewell‡
DreamWorks Animation
Technical Report 2015-348

*e-mail: [email protected]  †e-mail: [email protected]  ‡e-mail: [email protected]

Abstract

We present an improved method of producing and manipulating deep pixel data that retains important surface information computed during rendering for later use in compositing, allowing operations normally performed in the renderer to be deferred until compositing. These include pixel-coverage calculation, pixel filtering, hard-surface blending, and matte-object handling. Current methodologies for representing and transmitting deep pixel data work well for combining volumetric and hard-surface renders but are much less successful at combining hard surfaces with each other. By retaining additional surface information, a renderer's final integration steps can be reconstructed later in compositing.

CR Categories: I.3.3 [Computer Graphics]: Deep Compositing: Compositing;

Keywords: deep compositing, compositing, rendering, subpixel masks

1 Deep Image Compositing

A typical pixel produced by a CG renderer is the result of a series of separate surface shading calculations combined, or merged, together. Most renderers sample surfaces at a higher rate than the output pixel resolution in order to reduce aliasing, and they integrate samples, possibly from adjoining pixels, through a pixel filter to produce a smoother final result. At each subpixel location the renderer evaluates and shades overlapping surfaces and flattens the result before integrating that subpixel's result with the surrounding ones. The final result is termed a flat render since it is a flat 2D image.

Surfaces are typically shaded front to back so that surfaces hidden behind an opaque one are ignored, saving render time. Unfortunately, hidden surfaces are potentially useful during certain post-processing operations. To generate a deep image, the renderer outputs shaded surface fragments as a series of deep samples contained within a deep pixel, without flattening, optionally retaining some hidden surfaces. The final flattening step is performed at the very end of post-processing, typically in a compositing package. Deferring flattening can avoid costly re-renders in production. One common use is rendering volumetric and hard-surface elements in separate render passes, or in completely different render packages, and combining them later with accurate partially-transparent depth intersections.

2 Deep Workflow Challenges

The current industry-standard workflow for rendering and handling deep images was outlined in [Hillman 2013] and implemented in the OpenEXR library starting in version 2.0 [Kainz 2013]. The manipulation of deep data for compositing is often performed in The Foundry's Nuke compositing package, which provides a specialized tool set for manipulating deep data conforming to the OpenEXR 2.0 recommendations.

In this workflow each deep sample contains either a single Z-depth value defining the sample's distance from camera, or two Z values (termed Zfront and Zback) which define the depth range the sample covers. A depth range of 0 indicates a hard surface, while a range > 0 indicates a homogeneous volume segment. The color encoded into such a volumetric sample is the color at Zback, with logarithmic interpolation used to determine a color value between Zfront and Zback.
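As a concrete illustration of this representation, here is a minimal C++ sketch of a deep sample and the logarithmic alpha falloff of a homogeneous segment, following [Hillman 2013]; the struct layout and names are illustrative, not part of any standard API.

```cpp
#include <cmath>

// Minimal deep sample following the OpenEXR 2.0 convention described
// above: Zfront == Zback for a hard surface, Zback > Zfront for a
// homogeneous volume segment. Layout is illustrative only.
struct DeepSample {
    float r, g, b, a;     // color, premultiplied by alpha
    float zfront, zback;  // depth range covered by the sample
};

// Alpha accumulated from zfront down to depth z inside a volumetric
// segment, using the logarithmic falloff of a homogeneous medium:
// alpha(t) = 1 - (1 - a)^t, with t in [0,1].
float alphaAtDepth(const DeepSample& s, float z)
{
    if (s.zback <= s.zfront)
        return s.a;  // hard surface: no interior to interpolate
    const float t = (z - s.zfront) / (s.zback - s.zfront);
    return 1.0f - std::pow(1.0f - s.a, t);
}
```

Note that when a = 1 the falloff saturates immediately for any t > 0; this limitation becomes important for hard surfaces in section 5.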
While this workflow works well for combining volumetric and hard-surface samples, or for combining multiple volumetric samples, it does not work as well when combining hard-surface samples with each other. This is primarily due to: a) a lack of subpixel spatial information, b) no ability to pixel-filter deep samples, and c) support for only logarithmic interpolation of samples.

Lacking additional surface information, there is no way to determine x/y correlation between samples, so it is impossible to correctly weight their respective contributions to the final flattened pixel. One way around this is to multiply (pre-weight) the sample color and alpha by its pixel contribution, or coverage, but that only works while the samples are kept isolated. Since flattening is performed on the depth-sorted samples front to back by successive under [Porter and Duff 1984] operations (sketched at the end of this section), the weighting must be non-uniform to compensate for the decreasing contribution of each successive sample. When these pre-weighted samples are interleaved with other samples during a deep merge operation, there is no guarantee the correct sample weighting will still exist, leading to visual artifacts.

Another common issue is the need to handle the merging of mutually-cutout objects while accurately applying filter effects like camera defocus (bokeh) in preparation for normal, flat compositing. Mutual cutouts occur when two or more objects are in close proximity, often overlapping or intersecting, but need to be rendered separately from each other. Rendering the objects as deep images can defer overlap/intersection resolution to the deep merging and flattening steps, and it allows operations like defocus to work accurately since the algorithm has the information needed to resolve depth intersections and reveal hidden surfaces. However, keeping all rendered elements as deep data to defer the cutout issue is often impractical due to the high memory and CPU cost of interactively compositing deep data, and it leads to a loss of compositing control since many common comp operations cannot be performed on deep data. To be most flexible in production we generally still want to composite with flat 2D images, but have them pre-processed with proper cutouts and defocusing, ready to merge.
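For reference, the successive under operations mentioned above can be sketched as follows. This is illustrative code, assuming depth-sorted samples with premultiplied color; the names are not from any particular implementation.

```cpp
#include <vector>

struct Rgba { float r, g, b, a; };  // premultiplied color

// Flatten depth-sorted samples front to back with successive "under"
// operations [Porter and Duff 1984]: each sample is attenuated by the
// visibility remaining behind everything already merged.
Rgba flattenFrontToBack(const std::vector<Rgba>& sorted)
{
    Rgba out = {0.0f, 0.0f, 0.0f, 0.0f};
    for (const Rgba& s : sorted) {
        const float vis = 1.0f - out.a;  // remaining visibility
        out.r += vis * s.r;
        out.g += vis * s.g;
        out.b += vis * s.b;
        out.a += vis * s.a;
    }
    return out;
}
```

The vis term is why coverage pre-weighting must be non-uniform: a sample's effective contribution already depends on everything merged in front of it, so pre-scaled samples no longer weight correctly once uncorrelated samples are interleaved between them.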
3 Subpixel Masks

To illustrate the subpixel spatial problem, consider pixel filtering. A renderer cannot perform pixel filtering on its output deep samples (filtering would mix samples across pixels and depths that must remain separate), and without it there is a perceptual difference between a flat render and a deep render after flattening. Since pixel filtering must always be performed last, we need a method of retaining and transmitting the subpixel surface-fragment information through all deep operations until flattening is finally performed.

Outputting all subpixel surface fragments as deep samples requires a tremendous amount of memory and disk space, which are already stressed by existing deep compositing workflows. Even if we did, we would still be missing their x/y locations within the pixel and would be unable to determine their relative distances within the pixel filter disc. We could store the subpixel x/y coordinate in an additional deep data channel, but we would still be outputting all the surface fragments, and they would still need to be coverage pre-weighted. A brute-force solution is to scale up the render resolution to match the subpixel rate and sample only once per pixel, producing subpixel deep samples that have implicit x/y alignment. Unfortunately the increased image resolution would need to be retained through all deep compositing operations until flattening / pixel-filtering / down-sampling is performed.

A better solution, which retains the subpixel spatial information while reducing the deep sample count, is to combine (collapse) subpixel surface fragments together while simultaneously building up a subpixel bitmask, interpreted as a 2D array of bits. This bitmask provides x/y correlation and pixel-coverage information with a minimum of additional per-sample storage - see section 4 for details on collapsing. The bitmask should be at least 8x8 (64 bits) to adequately capture high-frequency details like fur and hair. Larger masks could be used, but their storage needs become prohibitive and supporting variable-sized masks severely complicates sample management.

Deep pixel flattening is performed at each subpixel mask bit by finding all deep samples that have that bit enabled, depth-sorting them, and merging front to back while handling sample overlaps. Each subpixel's flattened result is then integrated with the other subpixel results in a pixel filter to produce the final result, as sketched in the code below. This produces more accurate results for overlapping and common-edge surfaces, since the depth ordering of the deep samples at each subpixel location is handled uniquely: it anti-aliases surface edges and eliminates the incorrect color mixing of overlapping opaque samples (Figures 1a, 1b, 1c).

As mentioned in section 2, it is important to keep surface opacity and pixel coverage separated so that interleaving uncorrelated deep samples does not produce weighting artifacts upon flattening. The surface color is still premultiplied by the surface opacity but is not premultiplied by coverage, which is instead captured in the subpixel mask pattern (Figure 2). The final result of this flattening and pixel filtering is not exactly the same as the result of the renderer's filter (jittered subpixel locations have been lost, for example), but it is significantly better than no filtering at all.

However, while subpixel masks solve aliasing at the edges of surfaces, there will still be aliasing when flattening uncorrelated deep samples whose subpixel masks are saturated (all bits on), since there are often no surface edges at those pixels. This happens when separate hard-surface renders are deep-merged together and the renderer has collapsed the subpixel surface fragments into a single deep sample with a saturated subpixel mask - for example, two walls at right angles to each other and intersecting, but with each wall rendered separately and then deep-merged. Since all subpixel bits in the mask share the same Z-depth value, the slope of one surface relative to another cannot be determined at a subpixel level. To anti-alias these types of hard-surface intersections we also need the Z-depth range of the subpixel fragments before they were collapsed - see section 5 for more details on how this is handled.

While adding a 64-bit mask to each deep sample may at first seem expensive, in practice it produces a net decrease in deep sample count due to the collapsing together of surface fragments. Each deep sample has a relatively high memory cost, typically a minimum of six 32-bit floats (R, G, B, A, Zfront, Zback) totaling 24 bytes, while the mask adds another 8 bytes; collapsing every two samples into one therefore saves 16 bytes, since two 24-byte samples become one 32-byte sample.
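Here is a minimal sketch of the per-bit flattening loop described above, using a simple box filter in which each subpixel contributes 1/64 of the result. Sample-overlap handling and the wider filter kernels of section 5 and Figure 3 are omitted, and all names are illustrative.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

struct DeepSample {
    float r, g, b, a;  // premultiplied color
    float zfront, zback;
    uint64_t spmask;   // 8x8 subpixel mask; 0x0 means full coverage
};

struct Flat { float r, g, b, a; };

Flat flattenDeepPixel(std::vector<DeepSample> samples)
{
    std::sort(samples.begin(), samples.end(),
              [](const DeepSample& a, const DeepSample& b)
              { return a.zfront < b.zfront; });

    Flat pixel = {0, 0, 0, 0};
    for (int bit = 0; bit < 64; ++bit) {
        // Merge, front to back, only the samples covering this subpixel.
        Flat sp = {0, 0, 0, 0};
        for (const DeepSample& s : samples) {
            if (s.spmask != 0 && !(s.spmask & (uint64_t(1) << bit)))
                continue;  // sample does not cover this subpixel bit
            const float vis = 1.0f - sp.a;
            sp.r += vis * s.r;  sp.g += vis * s.g;
            sp.b += vis * s.b;  sp.a += vis * s.a;
        }
        // Box pixel filter: every subpixel contributes equally.
        pixel.r += sp.r / 64.0f;  pixel.g += sp.g / 64.0f;
        pixel.b += sp.b / 64.0f;  pixel.a += sp.a / 64.0f;
    }
    return pixel;
}
```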
When written to an OpenEXR file with lossless compression, the mask data channels usually compress very well, since the mask pattern typically only changes along the edges of surfaces. Most hard-surface renders will have little to no change in the pattern along a scanline. Furry/hairy objects are an obvious exception, but they still enjoy a sample-count reduction from the collapsing of subsamples - see section 4 for details.

For objects that don't readily produce subpixel information, such as volumes, the subpixel mask can be completely omitted. A mask value of 0x0 is handled as a special case indicating full coverage, since a sample with zero coverage would not be output in the first place. This means that legacy renders which do not provide subpixel masks will be interpreted as having full coverage during flattening.

Figure 1a: Relationship of the front and back surfaces for figures 1b and 1c. In both figures the surfaces are separated in Z and have one edge that lines up as viewed from camera. In figure 1c the back sample is completely hidden by the front sample.

Figure 1b: Common-edge surface compositing results from a renderer's subpixel algorithm, the OpenEXR 2.0 method, and our modified method using subpixel masks. Note that the final color resulting from the subpixel mask method closely matches the renderer's result.

Figure 1c: Overlapping surface compositing results from a renderer's subpixel algorithm, the OpenEXR 2.0 method, and our modified method using subpixel masks. Note that the final color resulting from the subpixel mask method closely matches the renderer's result.

Figure 2: Comparison of flattening results. In (a) note the presence of seams along common-edge surfaces due to the premultiplication of samples by coverage in the renderer. In (b), omitting the coverage-premultiplication resolves the seams but results in a loss of anti-aliasing. In (c), using subpixel masks to provide coverage restores anti-aliasing with no seams.

Figure 3: Comparison of pixel filtering kernels of varying sizes. The blackman-sinc and box filter implementations in the flattener match the production renderer's implementations, so the result is a very close match to the renderer's pixel-filter output.

4 Sample Collapsing

As discussed, collapsing surface fragments together while producing subpixel masks can significantly reduce the resulting deep sample count, but what criteria can be used to associate surface fragments with each other? In our experience there is no one correct way, and a number of factors can be taken into account, including:

Primitive and/or surface ID - do the fragments come from the same geometric primitive or surface?
Surface normal - do the fragments share the same general orientation?
Z-depth - are the fragments close together in Z?
Brightness - are the fragments generally the same brightness?

Some or all of these factors can be considered when collapsing fragments; a sketch of such a test is shown below. However, care must be taken with respect to subpixel brightness changes (or any high-contrast color change): a loss of high-frequency detail can result if the contrast changes are pre-filtered out during fragment collapsing and can no longer be reconstructed during flattening/pixel-filtering. In those instances it is better to generate multiple deep samples with correct subpixel masks to capture the extremes of the contrast change. A good example is a furry character with glinting fur highlights.
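A sketch of what such an association test might look like follows. The fragment fields and every threshold below are hypothetical examples, not values from our renderer.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical surface fragment produced at one subpixel location.
struct Fragment {
    uint32_t primId;      // geometric primitive / surface ID
    float    nx, ny, nz;  // unit shading normal
    float    z;           // depth from camera
    float    luma;        // perceptual brightness
};

// Can two fragments be collapsed into one deep sample? Tests the four
// criteria listed above; all thresholds are illustrative.
bool canCollapse(const Fragment& a, const Fragment& b)
{
    if (a.primId != b.primId)
        return false;                               // same surface?
    if (a.nx * b.nx + a.ny * b.ny + a.nz * b.nz < 0.8f)
        return false;                               // same general orientation?
    if (std::fabs(a.z - b.z) > 0.01f * std::fabs(a.z))
        return false;                               // close together in Z?
    const float ratio = (a.luma + 1e-6f) / (b.luma + 1e-6f);
    return ratio > 0.5f && ratio < 2.0f;            // similar brightness?
}
```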
The implementation of this scheme will be unique to every renderer, but the basic operation of combining fragments involves averaging their color channels together and finding the min/max depth range over them all. Each fragment's x/y subpixel location and shape is used to build up the final subpixel mask by enabling the corresponding bit(s) in the mask. Since the sample rate of the renderer can be higher or lower than 8x8, more than one bit may need to be enabled per fragment, and it is important that the sampling locations are stratified to guarantee at least one falls inside each bit bin; otherwise holes can occur in the mask, producing visible cracks in the flattened result. At this point the final deep sample, with the extra subpixel mask data channels, is written into the OpenEXR deep file - see section 6 for more info.

For example, in a REYES-style renderer one or more micropolygons will intersect a given pixel but cover only a subset of the subpixels. Each micropolygon could be inserted into the deep pixel as a discrete sample with its corresponding subpixel coverage mask. However, in the presence of motion blur and/or highly tessellated geometry there could be as many samples as there are subpixels. To mitigate this, our renderer builds clusters of micropolygons with the same geometry and material ID and collapses each cluster into a single sample. The sample color is derived from the average color of the micropolygons in the cluster, weighted by their respective subpixel contributions. The Zfront and Zback of the sample are derived from the min/max Z of all the micropolygons in the cluster.

5 Surface Flags and Sample Interpolation

Deep-merging separate hard-surface renders together will often produce aliasing along any surface intersections due to the lack of surface-slope information at the intersection points. Subpixel masks will not help here, as both surfaces likely have full pixel coverage at these locations. The surface normal is of limited value, as orientation alone does not provide enough information without corresponding depth information. What is needed is the Z-depth range that the surface covers in the deep pixel, so that the visible portion of each surface's contribution can be linearly weighted relative to the other's, resulting in an anti-aliased intersection (Figure 4). This Z-blending effect can be visually significant at the edges of rounded objects, where the angle of the surface to camera is most oblique, and in regions where the slopes of intersecting surfaces are nearly equal.

However, actually performing the linear interpolation is a challenge in the current OpenEXR workflow, as samples with thickness > 0 are assumed to be volumetric segments and only log interpolation is supported. We need some way of knowing whether a deep sample is hard-surface or volumetric, and it must be stored on the sample so that deep-merging hard-surface and volumetric samples together retains that information through subsequent deep operations. To handle this we add a hard-surface flag bit to each deep sample and store the bit in a 16-bit half-float flag channel as an integer value. If the flag is on (1.0) the flattener performs linear interpolation of the sample's depth range, and if it is off (0.0) it performs the normal log interpolation for a volumetric sample, as sketched below. The flattening algorithm must carefully handle overlaps of differing surface types when combining sample segments (Figure 5).
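A minimal sketch of how a flattener might select the interpolation mode from the per-sample flag; the linear branch here simply weights alpha by the visible fraction of the segment's thickness, which is one plausible reading of the scheme rather than our exact implementation, and a real flattener must also split color and handle the overlap cases of Figure 5.

```cpp
#include <cmath>

struct Segment {
    float a;            // total alpha of the segment
    float zfront, zback;
    bool  hardSurface;  // flag bit carried in the spmask.3 channel
};

// Alpha of the sub-range [zfront, z] of a segment. Hard surfaces are
// weighted linearly by visible thickness; volumes use the usual
// logarithmic falloff.
float partialAlpha(const Segment& s, float z)
{
    if (s.zback <= s.zfront)
        return s.a;                          // zero-thickness sample
    const float t = (z - s.zfront) / (s.zback - s.zfront);
    if (s.hardSurface)
        return s.a * t;                      // linear: valid even at a == 1
    return 1.0f - std::pow(1.0f - s.a, t);  // log: degenerates at a == 1
}
```

As Figure 4 notes, the logarithmic branch degenerates when a = 1 (the sub-range alpha is always 1), which is exactly why thick hard-surface samples need the linear branch.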
This scheme is backward-compatible with existing volumetric images written without the hard-surface flag, since the flag channel will be filled with zeros when deep pixels are merged together during compositing. We have modified our renderer to set this flag when outputting deep samples, as it is aware of the surface type during shading. We also expose controls for the user to set/change this bit on Nuke's deep OpenEXR reader, or by using a separate custom DeepSurfaceType operator.

Another useful custom attribute to store on each sample is whether or not it is a matte object. Making an object the matte source for another object is a common operation in production rendering. The result is similar to setting the matte object's color to black (0) and merging it with the other objects so that the black matte object becomes a holdout of the others. Unfortunately, just setting the object color to zero does not produce a correctly held-out alpha, and setting the matte object's alpha to zero simply makes it a transparent black object, producing no holdout at all. Because of this, the matte operation is handled as a special case in a renderer when merging surface samples together, and it requires some indicator that a surface is matte, either a shader setting or a geometry attribute. This surface information is normally only valid inside the renderer and is difficult to pass on to later compositing steps. The matte flag bit has float value 2.0 and is checked during flattening (or by any operation that needs to take matte-ness into account), where the matte operation is handled just as in the renderer.

One common use case for deferring matte-object handling until flattening is the defocusing of deep pixels. Defocusing deep samples produces a flattened pixel result, so the defocus operation must perform both the blurring logic and the flattening logic simultaneously. Performing the defocus after flattening, on already held-out images, will often produce incorrectly blurred edges along intersections, leading to artifacts when merged. With deferred mattes, a matte sample that is closer to camera can become blurred and partially obscure non-matte samples with black, and since depth intersections are being handled at that moment, the intersection edges of the objects are accurately blurred.

Unlike the subpixel mask bits (see section 6), the flag bits are not at risk from format conversions, as they are stored as small integer float values (0.0, 1.0, 2.0, 4.0, etc.) and will survive integer and half-float conversions.

Figure 4: Comparison of the hard-surface intersections of two teapots, offset and overlapped, with and without linear interpolation. Note that (a) and (b) appear identical: although the samples in (b) have depth ranges, log interpolation fails when sample alpha is 1.0, a very common case with hard surfaces.

When writing an OpenEXR deep file there will be the usual RGBA channels plus AOVs, the Z (front) and ZBack channels, and three new spmask channels storing the mask (spmask.1, spmask.2) and the surface flags (spmask.3). This is the channel list from a typical OpenEXR deep file written by our renderer with the customized deep data:

A, 16bit floatingpoint, sampling 1 1
B, 16bit floatingpoint, sampling 1 1
G, 16bit floatingpoint, sampling 1 1
R, 16bit floatingpoint, sampling 1 1
Z, 32bit floatingpoint, sampling 1 1
ZBack, 32bit floatingpoint, sampling 1 1
spmask.1, 32bit floatingpoint, sampling 1 1
spmask.2, 32bit floatingpoint, sampling 1 1
spmask.3, 16bit floatingpoint, sampling 1 1

Note again that spmask.1 and spmask.2 are 32-bit floating-point rather than unsigned int.
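A minimal sketch of the bit-for-bit float packing behind these two channels (described in section 6 below); as in our implementation, endian issues are ignored, and a 32-bit float is assumed.

```cpp
#include <cstdint>
#include <cstring>

// Copy the raw bit pattern of the 64-bit mask into two 32-bit float
// channels and back. std::memcpy preserves the bits exactly; the
// floats are never interpreted numerically, so the pattern survives
// as long as nothing converts the channels (e.g. to half-float).
void packMask(uint64_t mask, float& spmask1, float& spmask2)
{
    const uint32_t lo = uint32_t(mask);
    const uint32_t hi = uint32_t(mask >> 32);
    std::memcpy(&spmask1, &lo, sizeof(float));
    std::memcpy(&spmask2, &hi, sizeof(float));
}

uint64_t unpackMask(float spmask1, float spmask2)
{
    uint32_t lo, hi;
    std::memcpy(&lo, &spmask1, sizeof(uint32_t));
    std::memcpy(&hi, &spmask2, sizeof(uint32_t));
    return (uint64_t(hi) << 32) | uint64_t(lo);
}
```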
Figure 5: Flattening steps handling a combination of overlapping volumetric (s0) and hard-surface (s1, s2) samples.

6 Storage of Subpixel Mask and Flags

An 8x8 subpixel mask requires 64 bits, or 8 bytes. Storing a 64-bit unsigned integer value uncompressed in an OpenEXR file, or loading it into Nuke's deep system, is not possible since both support only 32-bit pixel data channels. The 64-bit mask can be split into two 32-bit unsigned-integer channels, but in practice the mask is split into two 32-bit float channels, since applications often convert integers into floats on read, destroying the bitmask pattern. The bitmask pattern is copied bit-for-bit into the two floats, and the reverse is done in the flattener to reassemble the mask (possible endian issues are currently ignored). To avoid the conversion risk, the mask could instead be split into four 16-bit integer values stored in four 32-bit float channels, but that incurs a heavier disk-space and memory footprint, which we deemed undesirable. In practice we have not seen problems with keeping the bitmask pattern stored in floats, except when writing OpenEXR deep files back out of Nuke, where care must be taken not to inadvertently drop the 32-bit floats down to 16-bit half-floats.

7 Nuke Deep System Changes

Since the current Nuke deep system (as of Nuke 9, 2015) does not support these workflow changes, some deep nodes needed to be modified or added:

DeepToImage: Replacement for the stock Foundry flattening node, implementing all the features described in this paper
DeepSurfaceType: Sets/clears the hard-surface and matte flags
DeepMatte: Sets the matte flag on all samples to mark an image as a matte source
DeepPremult: Premults/unpremults the color channels by the subpixel coverage value
DeepCamDefocus: Performs camera bokeh defocusing of deep data, with support for the matte flag and subpixel coverage during flattening (same flattening algorithm as DeepToImage)
DeepGrade: Simplified per-sample color correction with no attempt to re-weight samples
exrReaderDeep: Modified to add override controls for setting the subpixel mask and surface flags
exrWriterDeep: Modified to always write the subpixel mask channels as 32-bit floats
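As an illustration of the matte behavior that a flattener such as the DeepToImage replacement must implement, here is a minimal sketch with illustrative names. The key point is that occlusion is accumulated separately from the output alpha, so a matte sample holds out both the color and the alpha of everything behind it.

```cpp
#include <vector>

struct Sample { float r, g, b, a; bool matte; };
struct Flat   { float r, g, b, a; };

// Front-to-back "under" flattening with matte support: matte samples
// contribute nothing to the output, but still occlude samples behind
// them, producing a correctly held-out color and alpha.
Flat flattenWithMattes(const std::vector<Sample>& sorted)
{
    Flat out = {0, 0, 0, 0};
    float occ = 0.0f;                  // occlusion, mattes included
    for (const Sample& s : sorted) {
        const float vis = 1.0f - occ;
        if (!s.matte) {
            out.r += vis * s.r;
            out.g += vis * s.g;
            out.b += vis * s.b;
            out.a += vis * s.a;
        }
        occ += vis * s.a;              // mattes still block what's behind
    }
    return out;
}
```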
8 Future Work

This workflow is a work in progress, and we have identified several areas that would benefit from future work:

• Can OpenEXR's and Nuke's deep pixel representations be extended to store the additional bits for the subpixel mask and surface flags as deep sample metadata rather than as raw channel data? This would dramatically simplify the management of the data and avoid its accidental destruction.

• Commercial renderer providers should be encouraged to include the subpixel mask and surface flag information when they write deep OpenEXR files.

• Storing the x/y of the surface normal as 16-bit half-floats to better define the slope direction of the surface, and combining this with the subpixel mask location to find a more accurate per-subpixel Z-depth intersection.

9 Conclusion

We have described a method for reproducing several rendering operations that are unavailable in current deep image compositing workflows. The production challenges of the current workflows are: a) lack of subpixel spatial information to correctly merge overlapping or common-edge samples containing pixel-coverage weights, b) lack of ability to pixel-filter, c) the limiting definition of a thick sample segment (> 0 depth range) as volumetric data, and d) lack of formal support for matte objects during merging and flattening.

By extending rather than replacing the current methodology we maintain backward-compatibility while offering new functionality. Adding per-sample subpixel masks provides this minimum set of advantages:

• Improved merging of overlapping surfaces
• Improved merging of common-edge surfaces
• Pixel filtering can be performed
• Reduced sample count

By supporting a hard-surface indicator flag and linear interpolation of thick samples we can blend the intersections of these hard surfaces and reduce aliasing. Adding support for a matte flag avoids the current destructive holdout methodology and allows complex filtering effects to be applied with accurate holdouts.

References

1. Hillman, P. 2013. The Theory of OpenEXR Deep Samples. http://www.openexr.com/TheoryDeepPixels.pdf
2. Kainz, F. 2013. Interpreting OpenEXR Deep Pixels. http://www.openexr.com/InterpretingDeepPixels.pdf
3. Porter, T., and Duff, T. 1984. Compositing Digital Images. http://graphics.pixar.com/library/Compositing/paper.pdf