
Understanding 3D rendering step by step with 3DMark11 - BeHardware

Written by Damien Triolet

Published on November 28, 2011

URL: http://www.behardware.com/art/lire/845/

Page 1

Introduction

The representation of real-time 3D in modern games has become so complex that the old adage that a picture is worth a thousand words is, generally speaking, a hard one to apply here. While it's relatively easy to illustrate most graphics effects with particular examples, it's much harder to present them as stages of a full rendering. Nevertheless, this is what's required if we want to understand how the images in recent games are constructed.

Although we will be going into the stats
and other technical detail in this report, we have also come across an ideal
example that allows us to illustrate 3D rendering in practice and somewhat
demystify the process.

3DMark 11

Since the release of 3DMark 11 about a year ago, we have been getting to grips with its inner workings to see whether it really does represent the sort of DirectX 11 implementation that can help us judge the capabilities of current GPUs in games to come. This process has taken us some time, given the thousands of rendering commands to observe and the various bugs and other limitations of the analytical tools offered by AMD and NVIDIA; these complications have forced us to put the report on hold on several occasions.

While these observations have enabled us to formulate
a critique of how 3DMark 11 puts DirectX11 innovations into practice – something
we’ll be coming back to in a forthcoming report – they also represent an
opportunity for us to shed some light, using some clear visuals, on the
different stages required in the construction of the type of real-time 3D
rendering used in recent video games, namely deferred rendering. Deferred
rendering consists in preparing all the ingredients needed for the construction
of an image in advance, storing them in intermediate memory buffers and only
combining them to compute the lighting once the whole scene has been reviewed,
so as to avoid processing hidden pixels.

When developers do their job thoroughly, they optimise every last detail of the 3D rendering they have chosen, which, at the level we are able to observe it, blurs the boundaries between the different stages that make up a rendering, or even removes the separation between them altogether. The situation is slightly different for Futuremark, the developer behind 3DMark 11: their goal is to compare the performance of different graphics cards with modern rendering techniques as objectively as possible, not to implement every last optimisation. This is what has allowed us to take some 'snapshots' of the image construction process.

We have added some stats to our snapshots to give you an idea of the complexity of modern rendering, and we will also explain some of the techniques used. With a view to allowing as many readers as possible to understand how 3D works, we have put the most detailed explanations in insets and included a summary of the different stages on the last page of the report. Those for whom the words "normal map" or "R11G11B10_FLOAT" mean nothing will therefore still be able to visualise, simply and rapidly, how a 3D image is constructed.

Page 2

Deferred rendering, our observations

Deferred rendering

Before getting into more detail we want to
describe the type of rendering observed. 3DMark 11 and more and more games with
advanced graphics use deferred rendering, with Battlefield 3 probably
representing the most advanced implementation. Standard or forward rendering
consists in computing lighting triangle by triangle as objects are processed.
Given that some triangles or pieces of them end up being masked by others,
forward rendering implies the calculation of many pixels that don’t actually
show up in the image. This can result in a very significant waste of processing
resources.

Deferred rendering provides a solution to this problem by
calculating only the basic components of the lighting (including textures) when
it initially takes stock of all the objects in a scene. This data is then stored
in temporary memory buffers known as Render Targets (RT) (together they make up
the g-buffer) and used later for the final calculation of lighting. This process
can be seen as a kind of post-processing filter that is only implemented on the
pixels displayed on screen. This saves processing power and makes it easier to
manage complex lighting from numerous light sources.
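
To make the idea more concrete, here is a minimal CPU-side sketch in C++ of the two phases, with invented structures and a trivial Lambert term standing in for the real shaders; it illustrates the principle only and is not Futuremark's code:

#include <algorithm>
#include <cstddef>
#include <vector>

// Conceptual model of deferred shading (illustration only, not 3DMark 11's code).
// Per-pixel attributes stored in the g-buffer during the geometry pass.
struct GBufferTexel {
    float depth;        // distance to the camera (Z-buffer)
    float normal[3];    // surface normal
    float diffuse[3];   // diffuse colour sampled from the textures
    float specular[3];  // specular colour sampled from the textures
};

struct Light { float dir[3]; float colour[3]; };

// Geometry pass: a rasterised fragment only stores its attributes if it is the
// closest one seen so far; no lighting is evaluated here.
void geometryPass(std::vector<GBufferTexel>& gbuffer, const GBufferTexel& frag, std::size_t pixel) {
    if (frag.depth < gbuffer[pixel].depth)
        gbuffer[pixel] = frag;   // store the ingredients, defer the lighting
}

// Lighting pass: runs once per *visible* pixel, regardless of scene overdraw.
void lightingPass(const std::vector<GBufferTexel>& gbuffer,
                  const std::vector<Light>& lights,
                  std::vector<float>& hdrImage) {   // 3 floats per pixel
    for (std::size_t p = 0; p < gbuffer.size(); ++p) {
        const GBufferTexel& g = gbuffer[p];
        for (const Light& l : lights) {
            float nDotL = std::max(0.0f, g.normal[0]*l.dir[0] + g.normal[1]*l.dir[1] + g.normal[2]*l.dir[2]);
            for (int c = 0; c < 3; ++c)
                hdrImage[3*p + c] += g.diffuse[c] * l.colour[c] * nDotL;   // accumulate light
        }
    }
}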

However, it can cause memory consumption to increase, and the bandwidth required to store all the intermediate data can choke the GPU during the early rendering stages. Its disadvantages also include the difficulty of handling multi-sample antialiasing and transparent surfaces. Futuremark has put a solution in place for multi-sample antialiasing but has opted to keep things simple by ignoring transparent surfaces, which means you won't see any windscreen on the 4x4 that appears in some scenes.

Our observations

To explain how 3D rendering works, we went for scene 3 of 3DMark 11 in Extreme mode, namely at 1920x1080 with 4x antialiasing. This scene has the advantage of being set in daylight.

We have segmented the rendering into stages that more
or less correspond to the passes that structure 3D rendering. While modern GPUs
can do an enormous number of things in a single pass (before writing a result to
memory), it is simpler, more efficient and sometimes compulsory to go for
several rendering passes. This is moreover a fundamental part of deferred
rendering and post processing effects.

We have extracted visuals to
represent each stage as clearly as possible. Given that certain Render Targets
are in HDR, a format that can’t be directly displayed, we have had to modify
them slightly to make them more representative.

For those who really want
to get into the detail, we have added technical explanations and a certain
amount of information linked to each pass along with stats obtained in GPU Perf
Studio:

Rendering time: the time (in ms) taken by the Radeon HD 6970 GPU to process the whole pass, with a small overhead linked to the measuring tools (+ % of the total time for the rendering of the image).
Vertices before tessellation: number of vertices fed into the GPU, excluding the triangles generated through tessellation.
Vertices after tessellation: number of vertices coming out of the tessellator, including the triangles generated by tessellation.
Primitives: number of primitives (triangles, lines or points) fed into the setup engine.
Primitives ejected from the rendering: number of primitives culled by the setup engine, either because they aren't facing the camera and are therefore invisible or because they're outside the field of view.
Pixels: number of pixels generated by the rasterizer (2.1 million pixels for a 1920x1080 area).
Elements exported by the pixel shaders: number of elements written to memory by the pixel shaders; there can be several per pixel generated by the rasterizer, e.g. in the construction of the g-buffer.
Texels: number of texels (texturing components) read by the texturing units; the more complex the filtering, the more there are.
Instructions executed: number of instructions executed by a Radeon HD 6970 for all shader processing.
Quantity of data read: total quantity of data read from both textures and RTs, in the case of blending (with the exception of geometric and depth data).
Quantity of data written: total quantity of data written to the RTs (with the exception of depth data).

Note that these quantities of data are not the same as those transferred to video memory, as GPUs implement numerous optimisations to compress them.

Page 3

Stage 1: clearing memory buffers

The first stage in any 3D rendering is the least interesting: resetting the memory buffer zones, known as Render Targets (RTs), to which the GPU writes its data. Without this, the data defining the previous image would interfere with the new image to be computed.

In certain types of rendering, RTs can be shared between several successive images, for example to accumulate information; in that case they aren't reset. 3DMark 11 doesn't, however, share any data between successive images, which is a requirement for maximum efficiency in a multi-GPU setup.

Resetting all these buffers basically means
stripping all the values they contain back to zero, which corresponds to a black
image. Recent GPUs carry out this process very rapidly, depending on the size of
the memory buffers.
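
Conceptually, the clear just writes zero into every texel of the RT; on a real GPU it is a fast clear issued by the driver (ClearRenderTargetView in Direct3D 11) rather than a per-texel loop. A trivial CPU model in C++:

#include <algorithm>
#include <vector>

// Conceptual model of clearing a render target: every component goes back to 0 (black).
void clearRenderTarget(std::vector<float>& rt) {
    std::fill(rt.begin(), rt.end(), 0.0f);
}

int main() {
    // A 1920x1080 RGBA render target, comparable to one of 3DMark 11's g-buffer RTs.
    std::vector<float> rt(1920 * 1080 * 4, 0.5f);  // leftover data from the previous frame
    clearRenderTarget(rt);                         // back to black before the new frame
}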

When the rendering is initialised, 3DMark 11 resets 7
RTs very rapidly: 0.1ms or 0.1% of the rendering time. Later five very large RTs
dedicated to shadows will also have to be reset, taking the total time taken up
with this thankless task to 1.4ms, or 1.1% of the overall rendering
time.

Page 4

Stage 2: filling the g-buffer

After preparing the RTs, the engine starts a
first geometric pass: filling the g-buffer. At this relatively heavy stage all
the objects that make up the scene are taken into account and processed to fill
the g-buffer. This includes tessellation and the application of the different
textures.

Objects can be presented to the GPU in different
formats.

3DMark 11 uses instancing as often as possible, a mode
that allows you to send a series of identical objects (eg. all the leaves, all
the heads that decorate the columns and so on) with a single rendering command
(draw call). Limiting the number of these reduces CPU consumption. There are 91
in all in this main rendering pass, 42 of which use tessellation. Here are some
examples:

Rendering commands: [ 1 ][ 6 ][ 24 ][ 35 ][ 86 ]
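
As a hedged illustration of what instancing buys (the structures below are invented; in Direct3D 11 the real mechanism is a call such as DrawIndexedInstanced), one command covers a whole batch of identical objects:

#include <cstdio>
#include <vector>

// Illustration only: one "draw call" that submits many copies of the same mesh.
struct Mesh { int indexCount; };
struct InstanceData { float world[16]; };   // per-instance transform matrix

void drawInstanced(const Mesh& mesh, const std::vector<InstanceData>& instances) {
    // A single command: the GPU replays the mesh once per instance,
    // while the CPU cost stays that of one draw call.
    std::printf("1 draw call -> %zu instances x %d indices\n",
                instances.size(), mesh.indexCount);
}

int main() {
    Mesh leaf{512};
    std::vector<InstanceData> leaves(1000);  // e.g. every leaf of a tree in one batch
    drawInstanced(leaf, leaves);
}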

The g-buffer consists of 4 RTs at
1920x1080 with multi-sample type antialiasing (MSAA) 4x. Note that if you look
carefully you can see a small rendering bug:

[ Z-buffer ]
[ Normals ]
[ Diffuse colours ]
[ Specular colours ]

The
Depth Buffer, or Z-buffer, is in D32 format (32-bit). It contains depth
information for each element with respect to the camera: the darker the object
the closer it is.

The normals (= perpendicular to each point) are in
R10G10B10A2_UNORM (32 bits, 10-bit integer for each component). They allow the
addition of details to objects via a highly developed bump mapping
technique.

The diffuse components of the pixel colours are in R8G8B8A8_UNORM (standard 32 bits, 8-bit integer per component) format; they represent uniform lighting that takes into account the angle at which the light hits an object but ignores the direction of the reflected light.

The specular components of pixel colours are in the R8G8B8A8_UNORM
(standard 32 bits, 8-bit integer per component) format and here they take
account of the direction of the reflected light, which means glossy objects can
be designed with a slight light reflection on the edge.
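
To make these formats a little more concrete, here is a small sketch (ours, not Futuremark's) of how a unit normal can be packed into and unpacked from a 32-bit R10G10B10A2_UNORM word, the format used for the normal RT:

#include <algorithm>
#include <cmath>
#include <cstdint>

// Pack a unit normal into R10G10B10A2_UNORM: three 10-bit components plus 2 spare bits.
// Each component is remapped from [-1, 1] to an integer in [0, 1023].
uint32_t packNormalR10G10B10A2(float nx, float ny, float nz) {
    auto quantize = [](float v) -> uint32_t {
        float u = std::clamp(v * 0.5f + 0.5f, 0.0f, 1.0f);       // [-1,1] -> [0,1]
        return static_cast<uint32_t>(std::lround(u * 1023.0f));  // -> 10 bits
    };
    return quantize(nx) | (quantize(ny) << 10) | (quantize(nz) << 20);  // alpha bits left at 0
}

// The inverse operation, as a lighting pass would do when reading the g-buffer.
void unpackNormal(uint32_t packed, float& nx, float& ny, float& nz) {
    nx = ( packed        & 0x3FF) / 1023.0f * 2.0f - 1.0f;
    ny = ((packed >> 10) & 0x3FF) / 1023.0f * 2.0f - 1.0f;
    nz = ((packed >> 20) & 0x3FF) / 1023.0f * 2.0f - 1.0f;
}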

The last of the rendering commands is for the sky, which is represented by a hemisphere that encloses the scene. Given that the sky is not lit like the other parts of the scene but is itself a luminous surface, it is rendered directly rather than deferred, which starts the construction of the final image:


A few stats:

Rendering time: 18.2 ms (14.5%)
Vertices before tessellation: 0.91 million
Vertices after tessellation: 1.95 million
Primitives: 1.90 million
Primitives ejected from the rendering: 1.02 million
Pixels: 8.96 million
Elements exported by the pixel shaders: 30.00 million
Texels: 861.31 million
Instructions executed: 609.53 million
Quantity of data read: 130.2 MB
Quantity of data written: 158.9 MB

Page 5

Stage 3: ambient occlusion

The lighting in 3DMark 11 tries to get as close as possible to the principle of global illumination (radiosity, ray tracing and so on), which is very heavy on resources but takes refractions, reflections and therefore indirect illumination (i.e. the light reflected by any object in the scene) into account. To get close to this type of rendering, Futuremark uses various simulated effects:

- A directional light coming from the ground and numerous fill
lights that simulate the sunlight transmitted indirectly from the ground and
surrounding objects. We’ll cover this further when we come to lighting
passes.

- An ambient occlusion texture that simulates soft shadows
generated by the deficit of indirect light, which can’t be represented by the
first effect (not precise enough). Here’s what it looks like:


Ambient occlusion, written to an RT in R8_UNORM (8-bit integer) format, is calculated from the Depth Buffer and the normals in such a way as to take account of all the geometric details, even those simulated through bump mapping, as is the case with AMD's HDAO used in several games. With the Extreme preset, 5x6 samples are selected with a random parameter and used to determine the ambient occlusion. You can find more detail on this subject in our report on ambient occlusion.
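
To give a rough idea of how such a screen-space term is produced, here is a much simplified stand-in in C++ (not the HDAO shader itself): each pixel compares its depth with a handful of neighbouring depths, and neighbours that are clearly closer to the camera are counted as occluders.

#include <algorithm>
#include <vector>

// Very simplified screen-space ambient occlusion, for illustration only.
// Returns 1.0 for a fully lit pixel, lower values for occluded ones (stored in R8_UNORM).
float ambientOcclusion(const std::vector<float>& depth, int width, int height,
                       int x, int y, int radius = 4) {
    const float center = depth[y * width + x];
    int occluders = 0, samples = 0;
    for (int dy = -radius; dy <= radius; dy += 2) {
        for (int dx = -radius; dx <= radius; dx += 2) {
            int sx = std::clamp(x + dx, 0, width - 1);
            int sy = std::clamp(y + dy, 0, height - 1);
            // A neighbour noticeably closer to the camera partially blocks ambient light.
            if (center - depth[sy * width + sx] > 0.01f) ++occluders;
            ++samples;
        }
    }
    return 1.0f - static_cast<float>(occluders) / static_cast<float>(samples);
}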

A few stats:

Rendering time: 2.3 ms (1.8%)
Vertices before tessellation: 6
Vertices after tessellation: -
Primitives: 2
Primitives ejected from the rendering: 0
Pixels: 2.59 million
Elements exported by the pixel shaders: 2.59 million
Texels: 78.80 million
Instructions executed: 626.23 million
Quantity of data read: 73.3 MB
Quantity of data written: 3.0 MB

Page 6

Stage 4: antialiasing

As deferred rendering isn’t directly
compatible with standard MSAA type antialiasing, notably because the lighting
isn’t calculated during geometry processing, Futuremark had to set up an
alternative technique. It consists in the creation of a map of edges which is
used to filter them during the calculation of lighting, as MSAA would have
done:


Up until this point, all the RTs were rendered with 4x MSAA, as Futuremark opts not to use post-processing antialiasing such as FXAA and MLAA, provided by NVIDIA and AMD for video game developers.

MSAA isn’t however natively
compatible with deferred rendering, which is only designed to calculate lighting
once per pixel and therefore ignores the samples that make it up. One rather
rough and ready approach would be to switch, at this moment, to something
similar to super sampling, which is facilitated by DirectX 10.1 and 11. That
would however mean calculating lighting at 3840x2160, would waste a lot of
resources and would work against the very definition of deferred rendering.

Futuremark went for something else: a hybrid between MSAA and post-processing. Like post-processing, it consists of using an algorithm capable of detecting the edges that need to be smoothed from the g-buffer data. Although not perfect (that would be too resource heavy), this algorithm does a good job of detecting the edges that are external to objects (there's no need to filter internal edges).

This RT, in R8_UNORM (8-bit integer) format, which contains the detected edges, will be used during all the lighting passes to come to mark out the complex pixels that require particular attention. For those pixels, dynamic branching in the pixel shaders computes the blend of the four samples, as a standard use of MSAA would have done.
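
As a hedged sketch of the kind of test involved (Futuremark's exact criterion isn't detailed here), a pixel can be flagged as an edge when its depth or its normal differs markedly from a neighbour's; the lighting shaders then branch to per-sample shading for flagged pixels only:

#include <cmath>

// Illustrative edge test on g-buffer data; thresholds are invented for the example.
// Returns 255 when the pixel sits on a geometric edge and needs per-sample lighting.
unsigned char edgeMask(float depthA, float depthB,
                       const float normalA[3], const float normalB[3]) {
    const float depthThreshold  = 0.001f;  // tolerated depth discontinuity
    const float normalThreshold = 0.9f;    // cosine of the tolerated angle between normals

    float cosAngle = normalA[0]*normalB[0] + normalA[1]*normalB[1] + normalA[2]*normalB[2];
    bool depthEdge  = std::fabs(depthA - depthB) > depthThreshold;
    bool normalEdge = cosAngle < normalThreshold;
    return (depthEdge || normalEdge) ? 255 : 0;   // written to the R8_UNORM edge RT
}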

At the same time, the RT in which the image is being constructed (which so far only contains the sky) and the Depth Buffer, both initially in 4x MSAA format, can be resolved here, as the additional information they contain will not be of any use hereafter. The RTs which contain the diffuse and specular components of the pixel colours must however be kept in 4x MSAA format, as the additional samples they contain will be required for the calculation of complex pixels.

A few stats:

Rendering time: 1.4 ms (1.1%)
Vertices before tessellation: 3
Vertices after tessellation: -
Primitives: 2
Primitives ejected from the rendering: 0
Pixels: 2.07 million
Elements exported by the pixel shaders: 6.22 million
Texels: 39.43 million
Instructions executed: 185.44 million
Quantity of data read: 182.3 MB
Quantity of data written: 9.9 MB

Page 7

Stage 5: shadows

3DMark 11 can generate shadows linked to
directional lights (the sun or the moon) and spot lights (not present in test
3). In both cases shadow mapping is used. This technique consists in projecting
all the objects in the scene from the point of view of the source of light and
only retaining a Z-buffer which is then called a shadow map. In contrast to what
its name (shadow map) might lead you to think, a shadow texture is not applied
to the image.

A shadow map shows, for each of its points, the distance from the light source beyond which objects are in shadow. A pixel's position is then simply cross-checked against the information in the shadow maps to ascertain whether it's lit or in shadow.

For directional light sources, 3DMark 11 uses a variant: cascaded shadow maps (CSM). Given the immense area lit by the sun, it's difficult, even at very high resolution (4096x4096), to get enough precision for shadows, which tend to pixelate. CSMs provide a solution by working with several levels of shadow maps which focus on a progressively smaller area of the view frustum, so as to conserve optimal quality.
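
Selecting the right cascade can be sketched as picking the first shadow map whose range covers the pixel's distance from the camera; the split distances below are invented for illustration:

#include <array>

// Illustrative cascade selection for cascaded shadow maps (split distances are made up).
// Each cascade covers a different slice of the view frustum with the same 4096x4096
// resolution, so nearby shadows get far more texels per metre than distant ones.
int selectCascade(float viewDepth) {
    const std::array<float, 5> splitFar = {5.0f, 15.0f, 40.0f, 100.0f, 250.0f};
    for (int i = 0; i < 5; ++i)
        if (viewDepth <= splitFar[i]) return i;   // use shadow map i
    return 4;                                     // beyond the last split: coarsest map
}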

In Extreme mode, 3DMark 11 creates five 4096x4096 shadow maps, generated from 339 rendering commands, 142 of which use tessellation. This represents one of the largest loads in the scene. The darker an object, the closer it is to the light source:

The scene from the sun: [ CSM 1 ][ CSM 2 ][ CSM 3 ][ CSM 4 ][ CSM 5 ]

Although
it’s possible to calculate all these shadow maps first followed by the lighting
afterwards, Futuremark has decided to interleave them, which probably makes
light processing a little less efficient but avoids putting excessive demands on
memory space. At any given moment then, there is never more than a single shadow
map in the video memory, which is partly why 3DMark 11 can still run pretty well
on graphics cards equipped with just 768 MB, or even 512 MB.

As with the
creation of the g-buffer, we’re talking about geometric passes here given that
the whole scene must be taken into account, or at least a subset of it for the
lower level CSMs. Tessellation is also used as the shadows must correspond to
the objects that make them and this can represent an enormous processing load.
In contrast to the pass for the creation of the g-buffer however, no colour data
is calculated, only depth. Since Doom 3 and the introduction of the GeForce FXs,
GPUs have been able to increase their throughput to a great extent in this
simplified rendering mode.

Note this exception: objects such as vegetation, built from fake geometry using alpha testing, are not processed in this fast mode, as pixels must then be generated so that they can be placed in the scene.

A few stats:

Rendering time: 22.6 ms (17.9%)
Vertices before tessellation: 3.35 million
Vertices after tessellation: 8.91 million
Primitives: 8.50 million
Primitives ejected from the rendering: 5.17 million
Pixels: 83.67 million
Elements exported by the pixel shaders: 24.03 million
Texels: 416.66 million
Instructions executed: 725.13 million
Quantity of data read: 50.5 MB
Quantity of data written: 0.0 MB (the depth data isn't taken into account)

Page 8

Stage 6: primary lights

After preparing the data required for the creation
of shadows, 3DMark 11 moves on to the rendering of the primary light sources,
which take the shadows into account. These sources of light may be directional
(sun, moon…) or spot type. There are no spot sources in the scene observed here
but there is light from the sun. Five cascade shadow maps are required for the
shadows generated by the sun across the scene. Calculation of these shadow maps
is interleaved with the rendering of the lighting in the area of the field of
view they cover so that they don’t monopolise the video memory too
much.

This means that 3DMark 11 requires five passes to compute the
directional lighting to simulate light from the sun (LD2a/b/c/d/e). An
additional pass is used to help simulate the global illumination and more
particularly the light from the sun reflected by the ground, as this then itself
becomes a low intensity source of directional light (LD1). Thus the light
accumulates little by little in the image under preparation:

[ Sky ] + [ LD 1 ] + [ LD2a ] + [ LD2b ] + [ LD2c ] + [ LD2d ] + [ LD2e ]

This
image under preparation, in R11G11B10_FLOAT (fast HDR 32-bit) format, represents
surface lighting, the model for which is a combination of diffuse Oren-Nayar
reflectance and Cook-Torrance specular reflectance as well as Rayleigh-Mie type
atmospheric attenuation. In addition to the shadow maps, it takes into account
the ambient occlusion calculated previously.
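
As a simplified stand-in for that model (Lambert diffuse and Blinn-Phong specular instead of Oren-Nayar and Cook-Torrance, which are considerably more involved), the sketch below shows the general shape of one directional lighting pass: read the g-buffer, weight the diffuse and specular terms by the light, apply the shadow and ambient occlusion factors, and add the result into the HDR accumulation buffer.

#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };
static float dot3(Vec3 a, Vec3 b) { return a.x*b.x + a.y*b.y + a.z*b.z; }

// One pixel of a directional lighting pass (simplified stand-in for 3DMark 11's
// Oren-Nayar / Cook-Torrance model). The result is added into the R11G11B10_FLOAT RT.
Vec3 shadeDirectional(Vec3 normal, Vec3 toLight, Vec3 toCamera,
                      Vec3 diffuseRT, Vec3 specularRT, Vec3 lightColour,
                      float shadow,   // 0..1, from the shadow maps
                      float ao) {     // 0..1, from the ambient occlusion RT
    float nDotL = std::max(0.0f, dot3(normal, toLight));

    Vec3 h{toLight.x + toCamera.x, toLight.y + toCamera.y, toLight.z + toCamera.z};
    float len = std::sqrt(dot3(h, h));
    if (len > 0.0f) { h.x /= len; h.y /= len; h.z /= len; }
    float spec = std::pow(std::max(0.0f, dot3(normal, h)), 32.0f);   // arbitrary glossiness

    float k = nDotL * shadow * ao;
    return { (diffuseRT.x + specularRT.x * spec) * lightColour.x * k,
             (diffuseRT.y + specularRT.y * spec) * lightColour.y * k,
             (diffuseRT.z + specularRT.z * spec) * lightColour.z * k };
}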

In parallel to the surface
lighting, volumetric lighting is also calculated. See the page on this for
further details. Its performance cost is however included in the figures given
here as it’s processed in the same pixel shader as surface lighting.

A few stats:

Rendering time: 24.7 ms (19.6%)
Vertices before tessellation: 18
Vertices after tessellation: -
Primitives: 6
Primitives ejected from the rendering: 0
Pixels: 8.13 million
Elements exported by the pixel shaders: 14.18 million
Texels: 390.91 million
Instructions executed: 2567.59 million
Quantity of data read: 1979.2 MB
Quantity of data written: 54.6 MB

Page 9

Stage 7: secondary lights

To simulate global illumination, 3DMark 11 also
calls on numerous secondary point lights. They represent a point which sends
light in all directions. In the 3DMark 11 implementation, these are fill lights
which ‘fill’ the light space and are thus part of the simulation effects taken
into account for global illumination. More specifically, each of these light
sources slightly illuminates the area it covers (a cube):


There are no fewer than 84 of these
point lights in our test scene:

[ Directional lights ] + [ Fill lights ]

The point lights don't generate any shadows, as ambient occlusion simulates them at a lower processing cost. 3DMark 11 processes them in two passes to take into account a special case: when their volume of influence intersects the camera's near plane.

Volumetric lighting can also be computed for fill lights, but this is not the case in our test scene.
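
A minimal sketch of a fill light's contribution (our own approximation, not the benchmark's falloff curve): the intensity decreases with distance and drops to exactly zero outside the light's volume of influence, which is what allows each of the 84 lights to touch only the pixels it covers.

// Illustrative attenuation for a fill (point) light with a finite radius of influence.
float fillLightAttenuation(float distanceToLight, float radius) {
    if (distanceToLight >= radius) return 0.0f;   // outside the volume: no contribution
    float t = 1.0f - distanceToLight / radius;    // 1 at the light, 0 at the edge
    return t * t;                                 // smooth quadratic falloff
}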

Given the number of point lights, this
part of the process represents a significant component of the rendering
time.

A few stats:

Rendering time: 33.7 ms (26.8%)
Vertices before tessellation: 688
Vertices after tessellation: -
Primitives: 1008
Primitives ejected from the rendering: 853
Pixels: 45.87 million
Elements exported by the pixel shaders: 45.87 million
Texels: 369.86 million
Instructions executed: 9073.06 million
Quantity of data read: 1494.2 MB
Quantity of data written: 177.6 MB

Page 10

Stage 8: volumetric lighting

3DMark 11 uses volumetric lighting to simulate the rays of sunlight that shine through the atmosphere, or through the water in underwater scenes. This approximation uses a ray-marching technique and is built up progressively over the course of the previous lighting passes which, to recap, represent the ground (LD1) and the sun (LD2a/b/c/d/e):

[ LD1 ] + [ LD2a ] + [ LD2b ] + [ LD2c ] + [ LD2d ] + [ LD2e ]

The last lighting pass simply integrates this volumetric component into the final image, still under construction:

[ Without volumetric lighting ]  [ With volumetric lighting ]

Volumetric lighting is obtained by approximating, for each pixel, the light scattered by the atmosphere (or the water) between the surface being observed and the camera. One ray is sent per pixel and per light source, with sampling carried out at several depth levels.
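
The principle can be sketched as a short ray march (the constants and the callback below are invented for illustration): the ray from the camera to the visible surface is sampled at several depths and, at each sample, the light reaching that point is accumulated into the in-scattered term.

// Conceptual ray march for volumetric lighting (illustration only).
// 'litAtDepth' stands in for a lookup into the shadow maps along the ray.
float volumetricScattering(float surfaceDistance,
                           float (*litAtDepth)(float),
                           int steps = 16,
                           float density = 0.02f) {
    float inscattered = 0.0f;
    float stepLen = surfaceDistance / static_cast<float>(steps);
    for (int i = 0; i < steps; ++i) {
        float d = stepLen * (static_cast<float>(i) + 0.5f);  // sample in the middle of each segment
        inscattered += litAtDepth(d) * density * stepLen;    // light scattered towards the camera
    }
    return inscattered;   // added on top of the surface lighting for this pixel
}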

Note that while the optical density is fixed
for the atmosphere, for the water it’s precomputed for each image (as well as
the resulting accumulated transmittance) in an array of 2D textures. This stage
takes place right at the beginning of the rendering, but isn’t required in the
scene we’re looking at.

A few stats:

Rendering time: 0.7 ms (0.6%)
Vertices before tessellation: 3
Vertices after tessellation: -
Primitives: 2
Primitives ejected from the rendering: 0
Pixels: 2.07 million
Elements exported by the pixel shaders: 2.07 million
Texels: 33.18 million
Instructions executed: 232.24 million
Quantity of data read: 15.9 MB
Quantity of data written: 7.9 MB

Page 11

Stage 9: depth of field effect

For the Depth of Field (DoF) effect, 3DMark uses a more complex technique than a simple post-processing filter. It's similar to the "Sprite-based Bokeh Depth of Field" used in Crysis 2. Basically, this technique consists in stretching every pixel that isn't in the sharp area of the image, using the geometry shaders introduced in DirectX 10, by a proportion corresponding to the blurriness of the pixel. Here's what it gives on a section of the image (click on the links to get the full image):

[ Without DoF ]  [ With DoF ]

This type of depth of field effect uses the geometry shaders to generate a sprite (two triangles facing the camera) for each pixel that must be blurred. The size of this sprite depends on the circle of confusion, which is computed beforehand in a 16-bit floating point buffer, and a hexagonal bokeh is used to simulate a diaphragm with six blades.

This operation is carried
out in a 64-bit HDR format, R16G16B16A16_FLOAT, at full resolution as well as at
a resolution divided by 2 and 4. Each pixel to be processed is sent to one of
these resolutions depending on the size of its circle of confusion and they are
combined afterwards to finalise the depth of field effect that can then be added
to the final image.
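
The circle of confusion that drives all of this follows from the classic thin-lens formula; the sketch below uses placeholder camera parameters and invented size thresholds, purely to show the shape of the computation:

#include <cmath>

// Thin-lens circle of confusion; parameter values are placeholders, not the benchmark's camera.
// The larger the result, the blurrier the pixel and the bigger the sprite stretched over it.
float circleOfConfusion(float objectDistance,
                        float focusDistance = 10.0f,    // plane in perfect focus
                        float focalLength   = 0.05f,    // 50 mm lens
                        float apertureDiam  = 0.025f) { // f/2 aperture
    return apertureDiam * focalLength *
           std::fabs(objectDistance - focusDistance) /
           (objectDistance * (focusDistance - focalLength));
}

// Pixels are then routed to full, half or quarter resolution depending on sprite size
// (the thresholds here are invented).
int resolutionTier(float cocInPixels) {
    if (cocInPixels < 4.0f)  return 0;   // full resolution
    if (cocInPixels < 16.0f) return 1;   // half resolution
    return 2;                            // quarter resolution
}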

The darker a pixel, the smaller its circle of
confusion. Here white pixels represent pixels whose circle of confusion is
higher than the value beyond which they are no longer sharp.

More than 2 million small triangles are generated (shown here in fuchsia).

A few stats:

Rendering time: 9.7 ms (7.7%)
Vertices before tessellation: 1.10 million
Vertices after tessellation: -
Primitives: 2.20 million
Primitives ejected from the rendering: 0
Pixels: 22.41 million
Elements exported by the pixel shaders: 22.70 million
Texels: 93.12 million
Instructions executed: 217.96 million
Quantity of data read: 87.1 MB
Quantity of data written: 49.8 MB

Page 12

Stage 10: post-processing

The last heavy rendering stage in 3DMark is post-processing, which includes various filters and optical effects: bloom, halos (lens flares) and reflections formed in the lenses, grain, tone mapping and resizing. The optical effects are calculated by compute shaders and represent the biggest post-processing load. Tone mapping converts the HDR image into a displayable one, while resizing simulates a wide anamorphic format:

[ Before post-processing ]  [ After post-processing ]

Post-processing is segmented into three stages: bloom + lens flares, internal lens reflections, and tone mapping + grain. The last stage is the simplest: a relatively simple pixel shader combines the two effects.
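
As an illustration of that last stage (a generic Reinhard-style curve and a cheap hash-based grain, not Futuremark's exact operators), tone mapping compresses each HDR value into the displayable [0, 1] range and a little noise is added on top:

#include <cstdint>

// Generic tone mapping + film grain for one colour channel (illustration only).
// hdr: linear HDR value from the R11G11B10_FLOAT image; returns a displayable [0, 1] value.
float toneMapWithGrain(float hdr, uint32_t pixelSeed,
                       float exposure = 1.0f, float grainAmount = 0.015f) {
    float exposed = hdr * exposure;
    float ldr = exposed / (1.0f + exposed);          // simple Reinhard-style compression

    pixelSeed = pixelSeed * 747796405u + 2891336453u;                        // cheap hash
    float noise = static_cast<float>((pixelSeed >> 9) & 0xFFFF) / 65535.0f - 0.5f;

    float result = ldr + noise * grainAmount;
    return result < 0.0f ? 0.0f : (result > 1.0f ? 1.0f : result);
}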

The other two stages, which require a 128-bit HDR format (R32G32B32A32_FLOAT), are more complex and call on a fast Fourier transform (FFT) four times, executed as a succession of compute shaders. First of all, the image to be processed is reduced to a resolution that corresponds to the power of two directly above a quarter of the original resolution (1920 -> 480 -> 512). Next it's transformed into the frequency domain, from which the bloom and the lens flares on the one hand, and the reflections on the other, take form by means of dedicated filters. In the first case, the filter must be computed in advance, which accounts for one of the four uses of the fast Fourier transform.
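
The resizing step mentioned above (1920 -> 480 -> 512) simply rounds a quarter of the image width up to the next power of two, which is what the FFT requires:

// Round a quarter of the image width up to the next power of two, as needed by the FFT.
// e.g. 1920 -> 480 -> 512, 2560 -> 640 -> 1024.
unsigned int fftWorkingSize(unsigned int width) {
    unsigned int quarter = width / 4;
    unsigned int size = 1;
    while (size < quarter) size <<= 1;
    return size;
}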

[ Filter ] + [ Image in frequency domain ] -> [ Filter applied ] -> [ Reconstruction – inverse FFT ] = [ Bloom + lens flares ]
[ Lens reflections ]

A few stats:

Rendering time: 10.7 ms (8.5%), of which 10.3 ms via compute shader (8.2%)
Vertices before tessellation: 22
Vertices after tessellation: -
Primitives: 24
Primitives ejected from the rendering: 0
Pixels: 3.44 million
Elements exported by the pixel shaders: 3.44 million
Texels: 104.99 million, of which 72.48 million via compute shader
Instructions executed: 165.20 million, of which 126.48 million via compute shader
Quantity of data read: 819.1 MB, of which 590.0 MB via compute shader
Quantity of data written: 615.1 MB, of which 448.9 MB via compute shader

Page 13

Stage 11: interface

The final stage is also the simplest: drawing the interface over the image that has just been calculated. Each of the elements that make it up is integrated as a texture drawn on a quad (a rectangle formed by two triangles).
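
Compositing each interface texel over the frame is a plain "over" alpha blend, sketched per channel below:

// Standard "over" alpha blend used when a UI texture on a quad is drawn over the frame.
// src: interface texel colour, srcAlpha: its opacity, dst: the rendered frame underneath.
float blendOver(float src, float srcAlpha, float dst) {
    return src * srcAlpha + dst * (1.0f - srcAlpha);
}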


A few stats:

Rendering time: 0.4 ms (0.03%)
Vertices before tessellation: 96
Vertices after tessellation: -
Primitives: 46
Primitives ejected from the rendering: 0
Pixels: 82972
Elements exported by the pixel shaders: 76096
Texels: 86112
Instructions executed: 609.53 million
Quantity of data read: 0.6 MB
Quantity of data written: 0.3 MB

Page 14

The final image

Preparation: [ Objects ] -> [ G-buffer ] + [ Shadows ]
Lighting: [ Sky ] + [ Primary ] + [ Secondary ] + [ Volumetric ]
Post-processing + interface: [ Final image ]

To create an image such as
this one, 3DMark 11 does not hold back in the deployment of resources and here
it has processed 564 draw calls, 12 million triangles, 150 million pixels, 85
lights and 14 billion instructions!

This is enough to bring any current DirectX 11 GPU to its knees, what with tessellation, geometry shaders, compute shaders, high quality shadows, depth of field effects and complex camera lens effects, not to mention the extremely resource-heavy lighting.

This
sort of complexity will inevitably eventually turn up in video games, no doubt
in more efficient forms. Crysis 2 and Battlefield 3 alone already use similar
graphics engines with a few compromises when it comes to geometric load and
lighting algorithms calibrated so as to run on current hardware.

We hope
that this report will have given you a slightly clearer idea of how a modern
graphics engine works. To finish up then, here are the final stats representing
the load to be processed by the GPU:

Rendering time: 125.9 ms (= 8 fps)
Vertices before tessellation: 5.36 million
Vertices after tessellation: 11.97 million
Primitives: 12.61 million
Primitives ejected from the rendering: 6.19 million
Pixels: 179.29 million
Elements exported by the pixel shaders: 151.18 million
Texels: 2.39 billion
Instructions executed: 14.40 billion
Quantity of data read: 4.73 GB
Quantity of data written: 1.08 GB

Copyright © 1997-2014 BeHardware. All rights reserved.