Commits

Jason McKesson committed 2edef3e

Improved build instructions.

Files changed (7)

Documents/Basics/Tutorial 01.xml

                     <para>Change the positions of the vertex data. Keep position values in the [-1,
                         1] range, then see what happens when triangles go outside this range. Notice
                         what happens when you change the Z value of the positions (note: nothing
-                        should happen while they're within the range). Keep W at 1.0 for now.</para>
-                </listitem>
-                <listitem>
-                    <para>Change the values that <function>reshape</function> gives to
-                            <function>glViewport</function>. Make them bigger or smaller than the
-                        window and see what happens. Shift them around to different quadrants within
-                        the window.</para>
-                </listitem>
-                <listitem>
-                    <para>Change the <function>reshape</function> function so that it respects
-                        aspect ratio. This means that the area rendered to may be smaller than the
-                        window area. Also, try to make it so that it always centers the area within
-                        the window.</para>
+                        should happen while they're within the range). Keep the W values at 1.0 for
+                        now.</para>
                 </listitem>
                 <listitem>
                     <para>Change the clear color, using values in the range [0, 1]. Notice how this
                         in the <function>glDrawArrays</function> call from 3 to 6. Add more and play
                         with them.</para>
                 </listitem>
+                <listitem>
+                    <para>Change the values that <function>reshape</function> gives to
+                            <function>glViewport</function>. Make them bigger or smaller than the
+                        window and see what happens. Shift them around to different quadrants within
+                        the window.</para>
+                </listitem>
+                <listitem>
+                    <para>Change the <function>reshape</function> function so that changing the
+                        window size does not stretch the triangle. This means that the area rendered
+                        to, the viewport, may be smaller than the window area. Also, try to make it
+                        so that it always centers the area within the window.</para>
+                </listitem>
             </itemizedlist>
         </section>
         <section>

Documents/Building the Tutorials.xml

     <title>Building the Tutorials</title>
     <para>This section describes how to build the tutorials.</para>
     <formalpara>
-        <title>What You Need</title>
+        <title>What You Need to Download</title>
         <para>Obviously, you will need a C++ compiler and build environment. You will also need the
             Windows or Linux operating systems, as these are the only OSes supported by the
-            tutorials.</para>
+            tutorials. Supported build environments include Visual Studio 2008/2010, Code::Blocks,
+            and Linux-based GNU Makefiles. Other build systems may work, but they are not regularly
+            tested.</para>
     </formalpara>
     <para>You will need to download the <link
             xlink:href="http://bitbucket.org/alfonse/gltut/downloads">source distribution</link>.
         All of the libraries needed to build the tutorials are bundled as part of the distribution,
         so this is the only source code download you will need.</para>
+    <para>You will also need to download the <link xlink:href="http://industriousone.com/premake"
+            >Premake 4</link> utility for your platform of choice. Place the executable somewhere in
+        your command path.</para>
     <para>You will need minimal familiarity with using the command line in order to build these
         tutorials. Also, any mention of directories is always relative to where you unzipped this
         distribution.</para>
-    <simplesect>
+    <formalpara>
         <title>Distribution File Layout</title>
         <para>The layout of the files in the tutorial directory is quite simple. The
                 <filename>framework</filename> directory and all directories of the form
                 <filename>Tut*</filename> contain the code for the various tutorials. The
                 <filename>framework</filename> directory simply contains utility code that is
             commonly used by each tutorial.</para>
-        <para>Each tutorial contains one or more projects; each project is referenced in the text
-            for that tutorial.</para>
-        <para>The <filename>Documents</filename> directory contains the source for the text
-            documentation explaining how these tutorials work. This source is in xml files using the
-            DocBook 5.0 format.</para>
-        <para>The other directories either contain libraries used by the tutorials or data files
-            that the tutorials load.</para>
-    </simplesect>
-    <simplesect>
-        <title>Necessary Utilities</title>
-        <para>In order to build everything, you will need to download the <link
-                xlink:href="http://industriousone.com/premake">Premake 4</link> utility for your
-            platform of choice.</para>
+    </formalpara>
+    <para>Each tutorial contains one or more projects; each project is referenced in the text for
+        that tutorial.</para>
+    <para>The <filename>Documents</filename> directory contains the source for the text
+        documentation explaining how these tutorials work. This source is in xml files using the
+        DocBook 5.0 format.</para>
+    <para>The other directories either contain libraries used by the tutorials or data files that
+        the tutorials load.</para>
+    <formalpara>
+        <title>Premake 4</title>
         <para>Premake is a utility like <link xlink:href="http://www.cmake.org/">CMake</link>: it
             generates build files for a specific platform. Unlike CMake, Premake is strictly a
             command-line utility. Premake's build scripts are written in the <link
                 xlink:href="http://www.lua.org/home.html">Lua language</link>, unlike CMake's build
             scripts that use their own language.</para>
-        <para>Note that Premake only generates build files; once the build files are created, you
-            can use them as normal. It can generate project files for Visual Studio, <link
-                xlink:href="http://www.codeblocks.org/">Code::Blocks</link>, and XCode, as well as
-            GNU Makefiles. And unless you want to modify one of the tutorials, you only need to run
-            Premake once for each tutorial.</para>
-        <para>The Premake download comes as a pre-built executable for all platforms of interest,
-            including Linux.</para>
-    </simplesect>
-    <simplesect>
+    </formalpara>
+    <para>Note that Premake only generates build files; once the build files are created, you can
+        use them as normal. It can generate project files for Visual Studio and <link
+            xlink:href="http://www.codeblocks.org/">Code::Blocks</link>, as well as GNU Makefiles.
+        And unless you want to modify one of the tutorials, you only need to run Premake once for
+        each tutorial.</para>
+    <para>The Premake download comes as a pre-built executable for all platforms of interest,
+        including Linux.</para>
+    <formalpara>
         <title>Unofficial OpenGL SDK</title>
-        <para>Distributed with the tutorials is the Unofficial OpenGL SDK. This is an aggregation of
-            libraries, unifying a number of tools for developing OpenGL applications, all bound
-            together with a unified build system. You do not need to download it; a version of the
-            SDK is part of the tutorial distribution. The copy that comes with these tutorials does
-            not contain the documentation or GLFW.</para>
-        <para>The SDK library uses Premake to generate its build files. So, with
-                <command>premake4.exe</command> in your path, go to the <filename>glsdk</filename>
-            directory. Type <userinput>premake4 <replaceable>plat</replaceable></userinput>, where
-                <replaceable>plat</replaceable> is the name of the platform of choice. For Visual
-            Studio 2008, this would be <quote>vs2008</quote>; for VS2010, this would be
-                <quote>vs2010.</quote> This will generate Visual Studio projects and solution files
-            for that particular version.</para>
-        <para>For GNU and makefile-based builds, this is <quote>gmake</quote>. This will generate a
-            makefile. To build for debug, use <userinput>make config=debug</userinput>; similarly,
-            to build for release, use <userinput>make config=release</userinput>.</para>
-        <para>Using the generated build files, compile for both debug and release. You should build
-            the entire solution; the tutorials use all of the libraries provided.</para>
-        <para>Note that there is no execution of <userinput>make install</userinput> or similar
-            constructs. The SDK is designed to be used where it is; it does not install itself to
-            any system directories on your machine. Incidentally, neither do these tutorials. Also,
-            you should not run the SDK from your the IDE; it's just a library. Successfully
-            compiling it in debug and release is sufficient.</para>
-    </simplesect>
-    <simplesect>
-        <title>Tutorial Building</title>
-        <para>Each tutorial directory has a <filename>premake4.lua</filename> file; this file is
-            used by Premake to generate the build files for that tutorial. Therefore, to build any
-            tutorial, you need only go to that directory and type <userinput>premake4
-                    <replaceable>plat</replaceable></userinput>, then use those build files to build
-            the tutorial.</para>
-        <para>Each tutorial will generally have more than one source file and generate multiple
-            executables. Each executable represents a different section of the tutorial, as
-            explained in that tutorial's documentation.</para>
-        <para>If you want to build all of the tutorials at once, go to the root directory of the
-            distribution and use Premake on the <filename>premake4.lua</filename> file in that
-            directory. It will put all of the tutorials into one giant project that you can
-            build.</para>
-        <para>If you look at any of the tutorial source files, you will not find the
-                <function>main</function> function defined anywhere. This function is defined in
-                <filename>framework/framework.cpp</filename>; it and all of the other source files
-            in the <filename>framework</filename> directory is shared by every tutorial. It does the
-            basic boilerplate work: creating a FreeGLUT window, etc. This allows the tutorial source
-            files to focus on the useful OpenGL-specific code.</para>
-        <para>Note that the framework project is a library, not an executable. So attempting to run
-            the framework from your IDE of choice will not work. You must select one of the tutorial
-            projects and set it to be the active project. Then you will be able to run that tutorial
-            from the IDE.</para>
-    </simplesect>
+        <para>Bundled with the tutorials is the Unofficial OpenGL SDK; you do not need to download
+            it separately. This is an aggregation of libraries, unifying a number of tools for
+            developing OpenGL applications, all bound together with a unified build system. The
+            copy that comes with these tutorials does not contain the documentation or GLFW.</para>
+    </formalpara>
+    <para>The SDK library uses Premake to generate its build files. So, with
+            <command>premake4.exe</command> in your path, open your command prompt and go to the
+            <filename>glsdk</filename> directory. Type <userinput>premake4
+                <replaceable>plat</replaceable></userinput>, where <replaceable>plat</replaceable>
+        is the name of the platform of choice. For Visual Studio 2008, this would be
+            <quote>vs2008</quote>; for VS2010, this would be <quote>vs2010</quote>. This will
+        generate Visual Studio projects and solution files for that particular version.</para>
+    <para>For GNU Make-based builds, the platform name is <quote>gmake</quote>. This will generate a
+        makefile. To build for debug, use <userinput>make config=debug</userinput>; similarly, to
+        build for release, use <userinput>make config=release</userinput>.</para>
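+    <para>As a quick reference, a complete SDK build on Linux might look like the following
+        (assuming <command>premake4</command> is somewhere in your path; on Windows, substitute
+            <quote>vs2008</quote> or <quote>vs2010</quote> and build from the IDE):</para>
+    <screen>cd glsdk
+premake4 gmake
+make config=debug
+make config=release</screen>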
+    <para>Using the generated build files, compile for both debug and release. You should build the
+        entire solution; the tutorials use all of the libraries provided. Do not attempt to run the
+        SDK from your IDE; it's just a set of libraries. All you have to do is compile them for
+        debug and release.</para>
+    <para>There is also no need to execute <userinput>make install</userinput> or similar
+        commands after building the SDK. The SDK is designed to be used where it is; it does not
+        install itself to any system directories on your machine. Incidentally, neither do these
+        tutorials.</para>
+    <formalpara>
+        <title>Tutorial Building and Running</title>
+        <para>Each tutorial directory has a <filename>premake4.lua</filename> file; this file is
+            used by Premake to generate the build files for that tutorial. Therefore, to build any
+            tutorial, you need only go to that directory and type <userinput>premake4
+                <replaceable>plat</replaceable></userinput>, then use those build files to build
+            the tutorial.</para>
+    </formalpara>
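+    <para>For example, building a single tutorial with the makefile generator might look like this
+        (the directory name is only a placeholder for whichever tutorial you want to build):</para>
+    <screen>cd "Tut 01 Hello Triangle"
+premake4 gmake
+make config=debug</screen>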
+    <para>Each tutorial will generally have more than one source file and generate multiple
+        executables. Each executable represents a different section of the tutorial, as explained in
+        that tutorial's documentation.</para>
+    <para>If you want to build all of the tutorials at once, go to the root directory of the
+        distribution and use Premake on the <filename>premake4.lua</filename> file in that
+        directory. It will put all of the tutorials into one giant project that you can
+        build.</para>
+    <para>If you look at any of the tutorial source files, you will not find the
+            <function>main</function> function defined anywhere. This function is defined in
+            <filename>framework/framework.cpp</filename>; it and all of the other source files in
+        the <filename>framework</filename> directory are shared by every tutorial. It does the basic
+        boilerplate work: creating a FreeGLUT window, etc. This allows the tutorial source files to
+        focus on the useful OpenGL-specific code.</para>
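+    <para>To give a rough picture of that division of labor, the framework's
+            <function>main</function> does something along the lines of the sketch below. The
+        function names here are purely illustrative; see <filename>framework/framework.cpp</filename>
+        for the actual interface that tutorials are expected to provide.</para>
+    <programlisting language="cpp">#include &lt;GL/freeglut.h&gt;
+
+//Provided by the tutorial's own source file (names are illustrative).
+void init();
+void display();
+void reshape(int w, int h);
+
+int main(int argc, char *argv[])
+{
+    glutInit(&amp;argc, argv);
+    glutInitDisplayMode(GLUT_DOUBLE | GLUT_RGB | GLUT_DEPTH);
+    glutCreateWindow("Tutorial");
+
+    init();                    //The tutorial creates its OpenGL objects here.
+    glutDisplayFunc(display);  //The tutorial's per-frame rendering.
+    glutReshapeFunc(reshape);  //The tutorial's glViewport handling.
+    glutMainLoop();
+    return 0;
+}</programlisting>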
+    <para>Note that the framework project is a library, not an executable. So attempting to run the
+        framework from your IDE of choice will not work. You must select one of the tutorial
+        projects and set it to be the active project. Then you will be able to run that tutorial
+        from the IDE.</para>
 </article>

Documents/Further Study.xml

     xmlns:xlink="http://www.w3.org/1999/xlink" version="5.0">
     <?dbhtml filename="Further Study.html" ?>
     <title>Further Study</title>
-    <para>G</para>
+    <para>This book provides a firm foundation for you to get started in your adventures as a
+        graphics programmer. However, it ultimately cannot cover everything. What follows will be a
+        general overview of other topics that you should investigate now that you have a general
+        understanding of how graphics work.</para>
     <section>
         <?dbhtml filename="Further Study Debugging.html" ?>
         <title>Debugging</title>
         <para>This book provides functioning code to solve various problems and implement a variety
             of effects. However, it does not talk about how to get code from a non-working state
             into a working one. That is, debugging.</para>
-        <para>Debugging OpenGL code is very difficult. Frequently when there is an OpenGL bug, the
-            result is a massively unhelpful blank screen. If the problem is localized to a single
-            shader or state used to render an object, the result is a black object. Compounding this
-            problem is the fact that OpenGL has a lot of global state. One of the reasons this book
-            will often bind objects, do something with them, and then unbind them, is to reduce the
-            amount of state dependencies. It ensures that every object is rendered with a specific
-            program, a set of textures, a certain VAO, etc. It may be slightly slower to do this,
-            but for a simple application, getting it working is more important.</para>
+        <para>Debugging OpenGL code is very difficult. Frequently when there is a bug in graphics
+            code, the result is a massively unhelpful blank screen. If the problem is localized to a
+            single shader or state used to render an object, the result is a black object or general
+            garbage. Compounding this problem is the fact that OpenGL has a lot of global state. One
+            of the reasons this book will often bind objects, do something with them, and then
+            unbind them, is to reduce the amount of state dependencies. It ensures that every object
+            is rendered with a specific program, a set of textures, a certain VAO, etc. It may be
+            slightly slower to do this, but for a simple application, getting it working is more
+            important.</para>
         <para>Debugging shaders is even more problematic; there are no breakpoints or watches you
             can put on GLSL shaders. Fragment shaders offer the possibility of
                 <function>printf</function>-style debugging: one can always write some values to the
         </formalpara>
         <para>The system for dealing with this is called vertex weighting or skinning (note:
                 <quote>skinning</quote>, as a term, has also been applied to mapping a texture on an
-            object. So be aware of that when doing searches). A character is made of a hierarchy of
-            transformations; each transform is called a bone. Vertices are weighted to particular
-            bones. Where it gets interesting is that vertices can have weights to multiple bones.
-            This means that the vertex's final position is determined by a weighted combination of
-            two (or more) transforms.</para>
+            object. Be aware of that when doing Internet searches). A character is made of a
+            hierarchy of transformations; each transform is called a bone. Vertices are weighted to
+            particular bones. Where it gets interesting is that vertices can have weights to
+            multiple bones. This means that the vertex's final position is determined by a weighted
+            combination of two (or more) transforms.</para>
         <para>Vertex shaders generally do this by taking an array of matrices as a uniform block.
             Each matrix is a bone. Each vertex contains a <type>vec4</type> which contains up to 4
             indices in the bone matrix array, and another <type>vec4</type> that contains the weight
                 which are specified relative to the surface normal. This last part makes the BRDF
                 independent of the surface normal, as it is an implicit parameter in the equation.
                 The output of the BRDF is the percentage of light from the light source that is
-                reflected along the view direction. Thus, the output of the BRDF is multiples into
-                the incident light intensity to produce the output light intensity.</para>
+                reflected along the view direction. Thus, the output of the BRDF, when multiplied by
+                the incident light intensity, produces the reflected light intensity towards the
+                viewer.</para>
         </formalpara>
         <para>By all rights, this sounds like a lighting equation. And it is. Indeed, every lighting
             equation in this book can be expressed in the form of a BRDF. One of the things that
             object into a lab, perform a series of tests on it, and produce a BRDF table out of
             them. This BRDF table, typically expressed as a texture, can then be directly used by a
             shader to show how a surface in the real world actually behaves under lighting
-            conditions. This can provide much more accurate results than using models as we have
-            done.</para>
+            conditions. This can provide much more accurate results than using lighting models as we
+            have done here.</para>
         <formalpara>
             <title>Scalable Alpha Testing</title>
             <para>We have seen how alpha-test works via <literal>discard</literal>: a fragment is
             powerful and very performance-friendly.</para>
         <formalpara>
             <title>Screen-Space Ambient Occlusion</title>
-            <para>One of the many difficult processes when doing rasterization-based rendering is
+            <para>One of the many difficult issues when doing rasterization-based rendering is
                 dealing with interreflection. That is, light reflected from one object that reflects
                 off of another. We covered this by providing a single ambient light as something of
                 a hack. A useful one, but a hack nonetheless.</para>
                 through. Thick clouds appear dark because they scatter and absorb so much light that
                 not much passes through them.</para>
         </formalpara>
-        <para>All of these are light scattering effects. The most common in real-time scenarios is
-            fog, which meteorologically speaking, is simply a low-lying cloud. Ground fog is
-            commonly approximated in graphics by applying a change to the intensity of the light
-            reflected from a surface towards the viewer. The farther the light travels, the more of
-            it is absorbed and reflected, converting it into the fog's color. So objects that are
-            extremely distant from the viewer would be indistinguishable from the fog itself. The
-            thickness of the fog is based on the distance light has to travel before it becomes just
-            more fog.</para>
+        <para>All of these are atmospheric light scattering effects. The most common in real-time
+            scenarios is fog, which meteorologically speaking, is simply a low-lying cloud. Ground
+            fog is commonly approximated in graphics by applying a change to the intensity of the
+            light reflected from a surface towards the viewer. The farther the light travels, the
+            more of it is absorbed and reflected, converting it into the fog's color. So objects
+            that are extremely distant from the viewer would be indistinguishable from the fog
+            itself. The thickness of the fog is based on the distance light has to travel before it
+            becomes just more fog.</para>
         <para>Fog can also be volumetric, localized in a specific region in space. This is often
             done to create the effect of a room full of steam, smoke, or other particulate aerosols.
             Volumetric fog is much more complex to implement than distance-based fog. This is
         <para>Other NPR techniques include drawing objects that look like pencil sketches, which
             require more texture work than rendering system work. Some find ways to make what could
             have been a photorealistic rendering look like an oil painting of some form, or in some
-            cases, the glossy colors of a comic book. And so on. NPR has as its limits the user's
-            imagination. And the cleverness of the programmer to find a way to make it work, of
-            course.</para>
+            cases, the glossy colors of a comic book. And so on. NPR is limited only by the graphics
+            programmer's imagination. And the cleverness of said programmer to find a way to make it
+            work, of course.</para>
     </section>
 </appendix>

Documents/History of Graphics Hardware.xml

             the texture's alpha value. The alpha of the output was controlled with a separate math
             function, thus allowing the user to generate the alpha with different math than the RGB
             portion of the output color. This was the sum total of its fragment processing.</para>
-        <para>It had framebuffer blending support. Its framebuffer could even support a
-            destination alpha value, though you had to give up having a depth buffer to get it.
-            Probably not a good tradeoff. Outside of that issue, its blending support was superior
-            even to OpenGL 1.1. It could use different source and destination factors for the alpha
-            component than the RGB component; the old GL 1.1 forced the RGB and A to be blended with
-            the same factors.</para>
-        <para>The blending was even performed with full 24-bit color precision and then downsampled
-            to the 16-bit precision of the output upon writing.</para>
+        <para>It had framebuffer blending support. Its framebuffer could even support a destination
+            alpha value, though you had to give up having a depth buffer to get it. Probably not a
+            good tradeoff. Outside of that issue, its blending support was superior even to OpenGL
+            1.1. It could use different source and destination factors for the alpha component than
+            the RGB component; the old GL 1.1 forced the RGB and A to be blended with the same
+            factors. The blending was even performed with full 24-bit color precision and then
+            downsampled to the 16-bit precision of the output upon writing.</para>
         <para>From a modern perspective, spoiled with our full programmability, this all looks
             incredibly primitive. And, to some degree, it is. But compared to the pure CPU solutions
             to 3D rendering of the day, the Voodoo Graphics card was a monster.</para>
         <para>The next phase of hardware came, not from 3Dfx, but from a new company, NVIDIA. While
             3Dfx's Voodoo II was much more popular than NVIDIA's product, the NVIDIA Riva TNT
             (released in 1998) was more interesting in terms of what it brought to the table for
-            programmers. Voodoo II was purely a performance improvement; TNT was the next step in
-            the evolution of graphics hardware.</para>
+            programmers.</para>
         <para>Like other graphics cards of the day, the TNT hardware had no vertex processing.
             Vertex data was in clip-space, as normal, so the CPU had to do all of the transformation
             and lighting. Where the TNT shone was in its fragment processing. The power of the TNT
             <acronym>T</acronym>exel. It could access from two textures at once. And while the
             Voodoo II could do that as well, the TNT had much more flexibility to its fragment
             processing pipeline.</para>
-        <para>In order to accomidate two textures, the vertex input was expanded. Two textures meant
-            two texture coordinates, since each texture coordinate was directly bound to a
+        <para>In order to accommodate two textures, the vertex input was expanded. Two textures
+            meant two texture coordinates, since each texture coordinate was directly bound to a
             particular texture. While they were doubling things up, NVIDIA also allowed for two
             per-vertex colors. The idea here has to do with lighting equations.</para>
         <para>For regular diffuse lighting, the CPU-computed color would simply be dot(N, L),
             single <quote>constant</quote> color. The latter, in modern parlance, is the equivalent
             of a shader uniform value.</para>
         <para>That's a lot of potential inputs. The solution NVIDIA came up with to produce a final
-            color was a bit of fixed functionality that we will call the texture environment. It is
-            directly analogous to the OpenGL 1.1 fixed-function pipeline, but with extensions for
-            multiple textures and some TNT-specific features.</para>
-        <para>The idea is that each texture has an environment. The environment is a specific math
-            function, such as addition, subtraction, multiplication, and linear interpolation. The
-            operands to this function could be taken from any of the fragment inputs, as well as a
-            constant zero color value.</para>
-        <para>It can also use the result from the previous environment as one of its arguments.
-            Textures and environments are numbered, from zero to one (two textures, two
-            environments). The first one executes, followed by the second.</para>
-        <para>If you look at it from a hardware perspective, what you have is a two-opcode assembly
-            language. The available registers for the language are two vertex colors, a single
-            uniform color, two texture colors, and a zero register. There is also a single temporary
-            register to hold the output from the first opcode.</para>
+            color was a bit of fixed functionality that NVIDIA calls texture combiners. It is
+            directly analogous to the OpenGL 1.1 fixed-function pipeline texture environment
+            concept, but with extensions for multiple textures and some TNT-specific
+            features.</para>
+        <para>The idea is that each texture has an <quote>environment</quote>. The environment is a
+            specific math function, such as addition, subtraction, multiplication, and linear
+            interpolation. The standard GL fixed-function pipeline only allowed the environment
+            functions to use as parameters the per-vertex color, the color sampled from that
+            particular texture, and a constant color. For multiple textures, the environments are
+            executed in sequence: the environment function for texture 0 executes, then for texture
+            1. The texture 1 environment used the output from texture 0 instead of the per-vertex
+            color.</para>
+        <para>NVIDIA's texture combiners augmented this significantly. The standard environment
+            functions were very limited in terms of operations. For example, the previous color
+            could be multiplied or added to the texture color, but it could not simply ignore the
+            texture color and multiply with the constant color instead. NVIDIA's texture combiners
+            could do this.</para>
+        <para>If you look at it from a hardware perspective, what texture combiners provide is a
+            two-opcode assembly language. The available registers for the language are two vertex
+            colors, a single uniform color, the current opcode's texture color, and a zero register.
+            There is also a single temporary register to hold the output from the first
+            opcode.</para>
         <para>Graphics programmers, by this point, had gotten used to multipass-based algorithms.
             After all, until TNT, that was the only way to apply multiple textures to a single
             surface. And even with TNT, it had a pretty confining limit of two textures and two
                 8x8 depth buffers, so you can use very fast, on-chip memory for it. Rather than
                 having to deal with caches, DRAM, and large bandwidth memory channels, you just have
                 a small block of memory where you do all of your logic. You still need memory for
-                textures and the output image, but your bandwidth needs can be devoted solely to
-                textures.</para>
-            <para>For a time, these cards were competitive with the other graphics chip makers.
-                However, the tile-based approach simply did not scale well with resolution or
-                geometry complexity. Also, they missed the geometry processing bandwagon, which
+                textures, the vertex buffer, and the output image, but your bandwidth needs can be
+                devoted to textures and the vertex buffer.</para>
+            <para>For a time, these cards were competitive with those from the other graphics chip
+                makers. However, the tile-based approach simply did not scale well with resolution
+                or geometry complexity. Also, they missed the geometry processing bandwagon, which
                 really hurt their standing. They fell farther and farther behind the other major
                 players, until they stopped making desktop parts altogether.</para>
             <para>However, they may ultimately have the last laugh; unlike 3Dfx and so many others,
                 longer-lasting mobile devices. Embedded devices tend to use smaller resolutions,
                 which their platform excels at. And with low resolutions, you are not trying to push
                 nearly as much geometry.</para>
-            <para>Thanks to these facts, PowerVR graphics chips power the vast majority of mobile
+            <para>Thanks to these facts, PowerVR's graphics chips power the vast majority of mobile
                 platforms that have any 3D rendering in them. Just about every iPhone, Droid, iPad,
                 or similar device is running PowerVR technology. And that's a growth market these
                 days.</para>
         <?dbhtml filename="History GeForce.html" ?>
         <title>Vertices and Registers</title>
         <para>The next stage in the evolution of graphics hardware again came from NVIDIA. While
-            3Dfx released competing cards, they were again behind the curve. The NVIDIA GeForce 256
-            (not to be confused with the GeForce GT250, a much more modern card), released in 1999,
+            3Dfx released competing cards, they were behind the curve. The NVIDIA GeForce 256 (not
+            to be confused with the GeForce GT250, a much more modern card), released in 1999,
             provided something truly new: a vertex processing pipeline.</para>
         <para>The OpenGL API has always defined a vertex processing pipeline (it was fixed-function
             in those days rather than shader-based). And NVIDIA implemented it in their TNT-era
             themselves can perform operations that generate negative values. Opcodes can even
             scale/bias their inputs, which allow them to turn unsigned colors into signed
             values.</para>
-        <para>Because of this, the GeForce 256 was the first hardware to be able to do functional
-            bump mapping, without hacks or tricks. A single register combiner stage could do 2
-            3-vector dot-products at a time. Textures could store normals by compressing them to a
-            [0, 1] range. The light direction could either be a constant or interpolated per-vertex
-            in texture space.</para>
+        <para>Because of this, the GeForce 256 was the first hardware to be able to do true normal
+            mapping, without hacks or tricks. A single register combiner stage could do 2 3-vector
+            dot-products at a time. Textures could store normals by compressing them to a [0, 1]
+            range. The light direction could either be a constant or interpolated per-vertex in
+            texture space.</para>
         <para>Now granted, this still was a primitive form of bump mapping. There was no way to
-            correct for texture-space values with binormals and tangents. But this was at least
+            correct for tangent-space values with bitangents and tangents. But this was at least
             something. And it really was the first step towards programmability; it showed that
             textures could truly represent values other than colors.</para>
         <para>There was also a single final combiner stage. This was a much more limited stage than
             to provide this level of programmability. While GeForce 3 hardware did indeed have the
             fixed-function vertex pipeline, it also had a very flexible programmable pipeline. The
             retaining of the fixed-function code was a performance need; the vertex shader was not
-            as fast as the fixed-function one. It should be noted that the original X-Box's GPU,
-            designed in tandem with the GeForce 3, eschewed the fixed-functionality altogether in
-            favor of having multiple vertex shaders that could compute several vertices at a time.
-            This was eventually adopted for later GeForces.</para>
+            as fast as the fixed-function one.</para>
         <para>Vertex shaders were pretty powerful, even in their first incarnation. While there was
             no conditional branching, there was conditional logic, the equivalent of the ?:
             operator. These vertex shaders exposed up to 128 <type>vec4</type> uniforms, up to 16
             more tricks. But the main change was something that, in OpenGL terminology, would be
             called <quote>texture shaders.</quote></para>
         <para>What texture shaders did was allow the user to, instead of accessing a texture,
-            perform a computation on that texture's texture unit. This was much like the old texture
-            environment functionality, except only for texture coordinates. The textures were
-            arranged in a sequence. And instead of accessing a texture, you could perform a
+            perform a computation using that texture's texture unit. This was much like the old
+            texture environment functionality, except only for texture coordinates. The textures
+            were arranged in a sequence. And instead of accessing a texture, you could perform a
             computation between that texture unit's coordinate and possibly the coordinate from the
             previous texture shader operation, if there was one.</para>
         <para>It was not very flexible functionality. It did allow for full texture-space bump
                 but the base API was the same. Not so in OpenGL land.</para>
             <para>NVIDIA and ATI released entirely separate proprietary extensions for specifying
                 fragment shaders. NVIDIA's extensions built on the register combiner extension they
-                released with the GeForce 256. They were completely incompatible. And worse, they
-                were not even string-based.</para>
+                released with the GeForce 256. ATI's was brand new. They were completely
+                incompatible, and worse, they were not even string-based.</para>
             <para>Imagine having to call a C++ function to write every opcode of a shader. Now
                 imagine having to call <emphasis>three</emphasis> functions to write each opcode.
                 That's what using those APIs was like.</para>

Documents/Optimization.xml

         where performance is lacking.</para>
     <para>This appendix will instead cover the most basic optimizations. These are not guaranteed to
         improve performance in any particular program, but they almost never hurt. They are also
-        things you can implement relatively easily. These of these as the default standard practice
+        things you can implement relatively easily. Think of these as the default standard practice
         you should start with before performing real optimizations. For the sake of clarity, most of
         the code in this book did not use these practices, so many of them will be new.</para>
+    <para>Do as I say, not as I do.</para>
     <section>
         <title>Vertex Format</title>
-        <para>Interleave vertex arrays for objects where possible. Obviously, if you need to
-            overwrite some vertex data frequently while other data remains static, then you will
-            need to separate that data. But unless you have some specific need to do so, interleave
-            your vertex data.</para>
-        <para>Equally importantly, use the smallest vertex data possible. In the tutorials, the
-            vertex data was almost always 32-bit floats. You should only use 32-bit floats when you
-            absolutely need that much precision.</para>
-        <para>The biggest key to this is the use of normalized integer values for attributes. Here
-            is the definition of <function>glVertexAttribPointer</function>:</para>
+        <para>Interleave vertex attribute arrays for objects where possible. Obviously, if you need
+            to overwrite certain attributes frequently while other attributes remain static, then
+            you will need to separate that data. But unless you have some specific need to do so,
+            interleave your vertex data.</para>
+        <para>Equally importantly, try to use the smallest vertex data possible. Small data means
+            that GPU caches are more efficient; they store more vertex attributes per cache line.
+            This means fewer direct memory accesses, which increases the speed at which vertex
+            shaders receive their attributes. In this book, the vertex data was almost always
+            32-bit floats. You should only use 32-bit floats when you absolutely need that much
+            precision.</para>
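+        <para>As an illustration of both points (the structure and attribute indices below are
+            hypothetical, not taken from the tutorial meshes), a single buffer object can hold a
+            stream of structs, with each attribute addressed by its offset and the struct size used
+            as the stride:</para>
+        <programlisting language="cpp">#include &lt;cstddef&gt;     //offsetof
+
+struct Vertex                   //One interleaved vertex.
+{
+    float position[3];
+    unsigned char color[4];     //Normalized GL_UNSIGNED_BYTE color.
+};
+
+//With the vertex buffer bound to GL_ARRAY_BUFFER:
+glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex),
+    (void*)offsetof(Vertex, position));
+glVertexAttribPointer(1, 4, GL_UNSIGNED_BYTE, GL_TRUE, sizeof(Vertex),
+    (void*)offsetof(Vertex, color));</programlisting>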
+        <para>The biggest key to this is the use of normalized integer values for attributes. As a
+            reminder for how this works, here is the definition of
+                <function>glVertexAttribPointer</function>:</para>
         <funcsynopsis>
             <funcprototype>
                 <funcdef>void <function>glVertexAttribPointer</function></funcdef>
         <para>The best part is that all of this is free; it costs no actual performance. Note
             however that 32-bit integers cannot be normalized.</para>
         <para>Sometimes, color values need higher precision than 8-bits, but less than 16-bits. If a
-            color is a linear RGB color, it is often desirable to give them greater than 8-bit
-            precision. If the alpha of the color is negligible or non-existent, then a special
+            color is in the linear RGB colorspace, it is often desirable to give it greater than
+            8-bit precision. If the alpha of the color is negligible or non-existent, then a special
                 <varname>type</varname> can be used. This type is
                 <literal>GL_UNSIGNED_INT_2_10_10_10_REV</literal>. It takes 32-bit unsigned
             normalized integers and pulls the four components of the attributes out of each integer.
             This type can only be used with normalization:</para>
         <programlisting language="cpp">glVertexAttribPointer(#, 4, GL_UNSIGNED_BYTE, GL_TRUE, ...);</programlisting>
         <para>The most significant 2 bits of each integer are the Alpha. The next 10 bits are the
-            Blue, then Green, and finally red. It is equivalent to this struct in C:</para>
+            Blue, then Green, and finally Red. Make note of the fact that it is reversed. It is
+            equivalent to this bitfield struct in C:</para>
         <programlisting language="cpp">struct RGB10_A2
 {
   unsigned int alpha    : 2;
                 to be in the [-1, 1] range. So signed normalized integers are appropriate here.
                 8-bits of precision are sometimes enough, but 10-bit precision is going to be an
                 improvement. 16-bit precision, <literal>GL_SHORT</literal>, may be overkill, so
-                stick with <literal>GL_INT_2_10_10_10_REV</literal>. Because this format provides 4
-                values, you will still need to use 4 as the size of the attribute, but you can still
-                use <type>vec3</type> in the shader as the normal's input variable.</para>
+                stick with <literal>GL_INT_2_10_10_10_REV</literal> (the signed version of the
+                above). Because this format provides 4 values, you will need to use 4 as the size of
+                the attribute, but you can still use <type>vec3</type> in the shader as the normal's
+                input variable.</para>
         </formalpara>
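+        <para>As a concrete sketch of how such packed normals can be generated on the CPU (this
+            helper is not part of the tutorial framework; it merely illustrates the bit layout
+            described above):</para>
+        <programlisting language="cpp">//Converts one [-1, 1] float into a 10-bit signed, normalized bitfield.
+unsigned int PackSnorm10(float v)
+{
+    int value = (int)(v * 511.0f);            //511 = 2^9 - 1
+    return ((unsigned int)value) &amp; 0x3FF;  //Keep the low 10 bits (two's complement).
+}
+
+//Packs a unit normal for GL_INT_2_10_10_10_REV; the 2-bit W field is left at zero.
+unsigned int PackNormal(float x, float y, float z)
+{
+    return PackSnorm10(x) | (PackSnorm10(y) &lt;&lt; 10) | (PackSnorm10(z) &lt;&lt; 20);
+}</programlisting>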
         <formalpara>
             <title>Texture Coordinates</title>
             well. There is no native 16-bit float type, unlike virtually every other type. Even the
             10-bit format can be built using bit selectors in structs, as above. Generating a 16-bit
             float from a 32-bit float requires care, as well as an understanding of how
-            floating-point values work. The details of that are beyond the scope of this work,
-            however.</para>
+            floating-point values work.</para>
+        <para>This is where the GLM math library comes in handy. It provides
+                <type>glm::thalf</type>, a type that represents a 16-bit floating-point value. It
+            has overloaded operators, so that it can be used like a regular <type>float</type>. GLM
+            also provides <type>glm::hvec</type> and <type>glm::hmat</type> types for vectors and
+            matrices, respectively.</para>
         <formalpara>
             <title>Positions</title>
             <para>In general, positions are the least likely attribute to be easily optimized
                 of approximately [-6550.4, 6550.4]. They also lack some precision, which may be
                 necessary depending on the size and detail of the object in model space.</para>
         </formalpara>
-        <para>If 16-bit floats are insufficient, there are things that can be done. The process is
-            as follows:</para>
+        <para>If 16-bit floats are insufficient, a certain form of compression can be used. The
+            process is as follows:</para>
         <orderedlist>
             <listitem>
                 <para>When loading the mesh data, find the bounding volume of the mesh in model
                     space. To do this, find the maximum and minimum values in the X, Y and Z
                     directions independently. This represents a box in model space that
-                    contains all of the vertices. This rectangle is defined by two vectors: the
+                    contains all of the vertices. This box is defined by two 3D vectors: the
                     maximum vector (containing the max X, Y and Z values), and the minimum vector.
                     These are named <varname>max</varname> and <varname>min</varname>.</para>
             </listitem>
                 attributes begin on a 4-byte boundary. This is true for attributes that are smaller
                 than 4-bytes, such as a 3-vector of 8-bit values. While OpenGL will allow you to use
                 arbitrary alignments, hardware may have problems making it work. So if you make your
-                position data 16-bit floats or signed normalized integers, you will still waste 2
-                bytes from every position. You may want to try making your position values
-                4-dimensional values and using the last value for something useful.</para>
+                3D position data 16-bit floats or 16-bit signed normalized integers, you will still
+                waste 2 bytes from every position. You may want to try making your position values
+                4-dimensional values and putting something useful in the W component.</para>
         </formalpara>
     </section>
     <section>
-        <title>Image Formats</title>
-        <para>As with vertex formats, try to use the smallest format that you can get away with.
-            Also, as with vertex formats, what you can get away with tends to be defined by what you
-            are trying to store in the texture.</para>
-        <formalpara>
-            <title>Normals</title>
-            <para>Textures containing normals can use <literal>GL_RGB10_A2_SNORM</literal>, which is
-                the texture equivalent to the 10-bit signed normalized format we used for attribute
-                normals. However, this can be made more precise if the normals are for a
-                tangent-space bump map. Since the tangent-space normals always have a positive Z
-                coordinate, and since the normals are normalized, the actual Z value can be computed
-                from the other two. So you only need to store 2 values;
-                    <literal>GL_RG16_SNORM</literal> is sufficient for these needs. To compute the
-                third value, do this:</para>
-        </formalpara>
-        <programlisting language="glsl">vec2 norm2d = texture(tangentBumpTex, texCoord).xy;
-vec3 tanSpaceNormal = sqrt(1.0 - dot(norm2d, norm2d));</programlisting>
-        <para>Obviously this costs some performance, so the added precision may not be worthwhile.
-            On the plus side, you will not have to do any normalization of the tangent-space
-            normal.</para>
-        <para>The <literal>GL_RG16_SNORM</literal> format can be made even smaller with texture
-            compression. The <literal>GL_COMPRESSED_SIGNED_RG_RGTC1</literal> compressed texture
-            format is a 2-channel signed integer format. It only takes up 8-bits per pixel.</para>
-        <formalpara>
-            <title>Floating-point Intensity</title>
-            <para>There are two unorthodox formats for floating-point textures, both of which have
-                important uses. The <literal>GL_R11F_G11F_B10F</literal> format is potentially a
-                good format to use for HDR render targets. As the name suggests, it takes up only
-                32-bits. The downside is the relative loss of precision compared to
-                    <literal>GL_RGB16F</literal>. They can store approximately the same magnitude of
-                values, but the smaller format loses some precision. This may or may not impact the
-                overall visual quality of the scene. It should be fairly simple to test to see which
-                is better.</para>
-        </formalpara>
-        <para>The <literal>GL_RGB9_E5</literal> format is used for input floating-point textures. If
-            you have a texture that represents light intensity in HDR situations, this format can be
-            quite handy. The way it works is that each of the RGB colors get 9 bits for their
-            values, but they all share the same exponent. This has to do with how floating-point
-            numbers work, but what it boils down to is that the values have to be relatively close
-            to one another in magnitude. They do not have to be that close; there's still some
-            leeway. Values that are too small relative to larger ones become zero. This is
-            oftentimes an acceptable tradeoff, depending on the particular magnitude in
-            question.</para>
-        <para>This format is useful for textures that are generated offline by tools. You cannot
-            render to a texture in this format.</para>
-        <formalpara>
-            <title>Colors</title>
-            <para>Storing colors that are clamped to [0, 1] can be done with good precision with
-                    <literal>GL_RGBA8</literal> or <literal>GL_SRGB8_ALPHA8</literal> as needed.
-                However, compressed texture formats are available. The S3TC formats are good choices
-                if the compression works reasonably well for the texture. There are sRGB versions of
-                the S3TC formats as well.</para>
-        </formalpara>
-        <para>The difference in the various S3TC formats are how much alpha you need. The choices
-            are as follows:</para>
-        <glosslist>
-            <glossentry>
-                <glossterm>GL_COMPRESSED_RGB_S3TC_DXT1_EXT</glossterm>
-                <glossdef>
-                    <para>No alpha.</para>
-                </glossdef>
-            </glossentry>
-            <glossentry>
-                <glossterm>GL_COMPRESSED_RGBA_S3TC_DXT1_EXT</glossterm>
-                <glossdef>
-                    <para>Binary alpha. Either zero or one for each texel. The RGB color for any
-                        alpha of zero will also be zero.</para>
-                </glossdef>
-            </glossentry>
-            <glossentry>
-                <glossterm>GL_COMPRESSED_RGBA_S3TC_DXT3_EXT</glossterm>
-                <glossdef>
-                    <para>4-bits of alpha per pixel.</para>
-                </glossdef>
-            </glossentry>
-            <glossentry>
-                <glossterm>GL_COMPRESSED_RGBA_S3TC_DXT5_EXT</glossterm>
-                <glossdef>
-                    <para>Alpha is compressed in an S3TC block, much like RG texture
-                        compression.</para>
-                </glossdef>
-            </glossentry>
-        </glosslist>
-        <para>If a variable alpha matters for a texture, the primary difference will be between DXT3
-            and DXT5. DXT5 has the potential for better results, but if the alpha does not compress
-            well with the S3TC algorithm, the results will be rather worse.</para>
+        <title>Textures</title>
+        <para>There are various techniques you can use to improve the performance of texture
+            accesses.</para>
+        <section>
+            <title>Image Formats</title>
+            <para>The smaller the data, the faster it can be fetched into a shader. As with vertex
+                formats, try to use the smallest format that you can get away with. As with vertex
+                formats, what you can get away with tends to be defined by what you are trying to
+                store in the texture.</para>
+            <formalpara>
+                <title>Normals</title>
+                <para>Textures containing normals can use <literal>GL_RGB10_A2_SNORM</literal>,
+                    which is the texture equivalent to the 10-bit signed normalized format we used
+                    for attribute normals. However, this can be made more precise if the normals are
+                    for a tangent-space normal map. Since the tangent-space normals always have a
+                    positive Z coordinate, and since the normals are normalized, the actual Z value
+                    can be computed from the other two. So you only need to store 2 values;
+                        <literal>GL_RG16_SNORM</literal> is sufficient for these needs. To compute
+                    the third value, do this:</para>
+            </formalpara>
+            <programlisting language="glsl">vec2 norm2d = texture(tangentBumpTex, texCoord).xy;
+vec3 tanSpaceNormal = vec3(norm2d, sqrt(1.0 - dot(norm2d, norm2d)));</programlisting>
+            <para>Obviously this costs some performance, so it's a question of how much precision
+                you actually need. On the plus side, using this method means that you will not have
+                to normalize the tangent-space normal fetched from the texture.</para>
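+            <para>For reference, such a two-channel signed normalized texture might be allocated
+                like this (the size variables and the pointer to the texel data are assumed to be
+                defined elsewhere):</para>
+            <programlisting language="cpp">//With the texture object bound to GL_TEXTURE_2D:
+glTexImage2D(GL_TEXTURE_2D, 0, GL_RG16_SNORM, width, height, 0,
+    GL_RG, GL_SHORT, normalTexels);</programlisting>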
+            <para>The <literal>GL_RG16_SNORM</literal> format can be made even smaller with texture
+                compression. The <literal>GL_COMPRESSED_SIGNED_RG_RGTC1</literal> compressed texture
+                format is a 2-channel signed integer format. It only takes up 8-bits per
+                pixel.</para>
+            <formalpara>
+                <title>Floating-point Intensity</title>
+                <para>There are two unorthodox formats for floating-point textures, both of which
+                    have important uses. The <literal>GL_R11F_G11F_B10F</literal> format is
+                    potentially a good format to use for HDR render targets. As the name suggests,
+                    it takes up only 32-bits. The downside is the relative loss of precision
+                    compared to <literal>GL_RGB16F</literal> (as well as the complete loss of a
+                    destination alpha). They can store approximately the same magnitude of values,
+                    but the smaller format loses some precision. This may or may not impact the
+                    overall visual quality of the scene. It should be fairly simple to test to see
+                    which is better.</para>
+            </formalpara>
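+            <para>For example, a minimal sketch of creating such an HDR render target, assuming
+                that a framebuffer object is already bound to
+                    <literal>GL_DRAW_FRAMEBUFFER</literal> and that <literal>width</literal> and
+                    <literal>height</literal> hold the display size, might be:</para>
+            <programlisting language="cpp">//Create a 32-bit-per-pixel floating-point color buffer for HDR rendering.
+GLuint hdrTex;
+glGenTextures(1, &amp;hdrTex);
+glBindTexture(GL_TEXTURE_2D, hdrTex);
+glTexImage2D(GL_TEXTURE_2D, 0, GL_R11F_G11F_B10F, width, height, 0,
+    GL_RGB, GL_FLOAT, NULL);
+
+//Attach it to the currently bound framebuffer object.
+glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
+    GL_TEXTURE_2D, hdrTex, 0);</programlisting>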
+            <para>The <literal>GL_RGB9_E5</literal> format is used for input floating-point
+                textures. If you have a texture that represents light intensity in HDR situations,
+                this format can be quite handy. The way it works is that each of the RGB colors gets
+                9 bits for its value, but they all share the same exponent. This has to do with
+                how floating-point numbers work, but what it boils down to is that the values have
+                to be relatively close to one another in magnitude. They do not have to be that
+                close; there's still some leeway. Values that are too small relative to larger ones
+                become zero. This is oftentimes an acceptable tradeoff, depending on the particular
+                magnitude in question.</para>
+            <para>This format is useful for textures that are generated offline by tools. You cannot
+                render to a texture in this format.</para>
+            <formalpara>
+                <title>Colors</title>
+                <para>Storing colors that are clamped to [0, 1] can be done with good precision with
+                        <literal>GL_RGBA8</literal> or <literal>GL_SRGB8_ALPHA8</literal> as needed.
+                    However, compressed texture formats are available. The S3TC formats are good
+                    choices if the compression artifacts are not too noticeable. There are sRGB
+                    versions of the S3TC formats as well.</para>
+            </formalpara>
+            <para>The difference among the various S3TC formats is how much alpha you need. The
+                choices are as follows:</para>
+            <glosslist>
+                <glossentry>
+                    <glossterm>GL_COMPRESSED_RGB_S3TC_DXT1_EXT</glossterm>
+                    <glossdef>
+                        <para>No alpha.</para>
+                    </glossdef>
+                </glossentry>
+                <glossentry>
+                    <glossterm>GL_COMPRESSED_RGBA_S3TC_DXT1_EXT</glossterm>
+                    <glossdef>
+                        <para>Binary alpha. Either zero or one for each texel. The RGB color for any
+                            texel with a zero alpha will also be zero.</para>
+                    </glossdef>
+                </glossentry>
+                <glossentry>
+                    <glossterm>GL_COMPRESSED_RGBA_S3TC_DXT3_EXT</glossterm>
+                    <glossdef>
+                        <para>4-bits of alpha per pixel.</para>
+                    </glossdef>
+                </glossentry>
+                <glossentry>
+                    <glossterm>GL_COMPRESSED_RGBA_S3TC_DXT5_EXT</glossterm>
+                    <glossdef>
+                        <para>Alpha is compressed in an S3TC block, much like RG texture
+                            compression.</para>
+                    </glossdef>
+                </glossentry>
+            </glosslist>
+            <para>If an image needs to have a varying alpha, the primary difference will be between
+                DXT3 and DXT5. DXT5 has the potential for better results, but if the alpha does not
+                compress well with the S3TC algorithm, the results will be rather worse than
+                DXT3.</para>
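+            <para>If the image was compressed offline, uploading it simply means handing the
+                compressed blocks to OpenGL directly. In this sketch,
+                    <literal>compressedSize</literal> and <literal>compressedData</literal> are
+                placeholders for the output of your own DDS (or similar) file loader:</para>
+            <programlisting language="cpp">//Upload one mipmap level of pre-compressed DXT5 data.
+//compressedSize and compressedData come from your own file loading code.
+glCompressedTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT,
+    width, height, 0, compressedSize, compressedData);</programlisting>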
+        </section>
+        <section>
+            <title>Use Mipmaps Often</title>
+            <para>Mipmapping improves performance when textures are mapped to regions that are
+                larger in texel space than in window space. That is, when texture minification
+                happens. Mipmapping improves performance because it keeps neighboring texture
+                accesses close to one another in texel space. Texture hardware is optimized for
+                fetching nearby regions of a texture, so improving the locality of texture accesses
+                helps performance.</para>
+            <para>How much this matters depends on how the texture is mapped to the surface. Static
+                mapping with explicit texture coordinates, or with linear computation based on
+                surface properties, can use mipmapping to improve locality of texture access. For
+                more unusual mappings or for pure-lookup tables, mipmapping may not help locality at
+                all.</para>
+            <para>Ultimately, mipmaps are more likely to help performance when the texture in
+                question represents some characteristic of a surface, and is therefore mapped
+                directly to that surface. So diffuse textures, normal maps, specular maps, and other
+                surface characteristics are all very likely to gain some performance from using
+                mipmaps. Projective lights are less likely to gain from this, as it depends on the
+                geometry that they are projected onto.</para>
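+            <para>Making sure a texture actually uses its mipmaps takes very little code. A minimal
+                sketch, assuming the base level has already been uploaded and the texture is bound,
+                is:</para>
+            <programlisting language="cpp">//Build the mipmap chain from the base level...
+glGenerateMipmap(GL_TEXTURE_2D);
+//...and make sure minification actually uses it.
+glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
+glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);</programlisting>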
+        </section>
     </section>
     <section>
-        <title>Textures</title>
-        <para>Mipmapping improves performance when textures are mapped to regions that are larger in
-            texel space than in window space. That is, when texture minification happens. Mipmapping
-            improves performance because it keeps the locality of texture accesses near each other.
-            Texture hardware is optimized for accessing regions of textures, so improving locality
-            of texture data will help performance.</para>
-        <para>How much this matters depends on how the texture is mapped to the surface. Static
-            mapping with explicit texture coordinates, or with linear computation based on surface
-            properties, can use mipmapping to improve locality of texture access. For more unusual
-            mappings or for pure-lookup tables, mipmapping may not help locality at all.</para>
-        <para/>
+        <?dbhtml filename="Optimize Core.html"?>
+        <title>Object Optimizations</title>
+        <para>These optimizations all have to do with the concept of objects. An object, for the
+            purpose of this discussion, is a combination of a mesh, program, uniform data, and set
+            of textures used to render some specific thing in the world.</para>
+        <section>
+            <title>Object Culling</title>
+            <para>A virtual world consists of many objects. The more objects we draw, the longer
+                rendering takes.</para>
+            <para>One major optimization is also a very simple one: render only what must be
+                rendered. There is no point in drawing an object in the world that is not actually
+                visible. Thus, the task here is to, for each object, detect if it would be visible;
+                if it is not, then it is not rendered. This process is called visibility culling or
+                object culling.</para>
+            <para>As a first pass, we can say that objects that are not within the view frustum are
+                not visible. This is called frustum culling, for obvious reasons. Determining that
+                an object is off screen is generally a CPU task. Each object must be represented by
+                a simple bounding volume, such as a sphere or camera-space box. These volumes are
+                used because they are relatively easy to test against the view frustum; if a volume
+                is within the frustum, then the corresponding object is considered visible.</para>
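+            <para>A minimal sketch of such a sphere test, using GLM and assuming that the six
+                camera-space frustum planes have already been extracted (with their normals pointing
+                into the frustum), might look like this:</para>
+            <programlisting language="cpp">#include &lt;glm/glm.hpp&gt;
+
+//Each plane is stored as (normal.xyz, d); the normal points into the frustum.
+//How the six planes are extracted from the projection matrix is up to you.
+bool SphereInFrustum(const glm::vec3 &amp;center, float radius,
+                     const glm::vec4 frustumPlanes[6])
+{
+    for(int i = 0; i &lt; 6; ++i)
+    {
+        float dist = glm::dot(glm::vec3(frustumPlanes[i]), center) + frustumPlanes[i].w;
+        if(dist &lt; -radius)
+            return false;  //Entirely behind one plane: not visible.
+    }
+    return true;  //Inside or intersecting every plane.
+}</programlisting>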
+            <para>Of course, this only boils the scene down to the objects in front of the camera.
+                Objects that are entirely occluded by other objects will still be rendered. There
+                are a number of techniques for detecting whether objects obstruct the view of other
+                objects. Portals, BSPs, and a variety of other techniques involve preprocessing
+                certain static terrain to determine visibility sets. Therefore it can be known that,
+                when the camera is in a certain region of the world, objects in certain other
+                regions cannot be visible even if they are within the view frustum.</para>
+            <para>A more fine-grained solution involves using a hardware feature called occlusion
+                queries. This is a way to render an object and then ask how many fragments of that
+                object were actually rasterized. If even one fragment passed the depth test
+                (assuming all possible occluding surfaces have been rendered), then the object is
+                visible and must be rendered.</para>
+            <para>It is generally preferred to render simple test objects, such that if any part of
+                the test object is visible, then the real object will be visible. Drawing a test
+                object is much faster than drawing a complex hierarchical model with specialized
+                skinning vertex shaders. Write masks (set with <function>glColorMask</function> and
+                    <function>glDepthMask</function>) are used to prevent writing the fragment
+                shader outputs of the test object to the framebuffer. Thus, the test object is only
+                tested against the depth buffer, not actually rendered.</para>
+            <para>Occlusion queries in OpenGL are objects that have state. They are created with the
+                    <function>glGenQueries</function> function. To start rendering a test object for
+                occlusion queries, the object generated from <function>glGenQueries</function> is
+                passed to the <function>glBeginQuery</function> function, along with the mode of
+                    <literal>GL_SAMPLES_PASSED</literal>. All rendering commands between
+                    <function>glBeginQuery</function> and the corresponding
+                    <function>glEndQuery</function> are part of the test object. If all of the
+                fragments of the object were discarded (via depth buffer or something else), then
+                the query failed. If even one fragment was rendered, then it passed.</para>
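+            <para>A rough sketch of issuing such a query is shown below;
+                    <function>DrawBoundingBox</function> is a stand-in for whatever code renders
+                your simple test volume:</para>
+            <programlisting language="cpp">GLuint occlusionQuery;
+glGenQueries(1, &amp;occlusionQuery);
+
+//Mask off all writes, so the test volume is only depth-tested, never seen.
+glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
+glDepthMask(GL_FALSE);
+
+glBeginQuery(GL_SAMPLES_PASSED, occlusionQuery);
+DrawBoundingBox();  //Stand-in for rendering the simple test volume.
+glEndQuery(GL_SAMPLES_PASSED);
+
+//Restore the write masks for normal rendering.
+glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
+glDepthMask(GL_TRUE);</programlisting>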
+            <para>This can be used with a concept called conditional rendering. This is exactly what
+                it says: rendering an object conditionally. It allows a series of rendering
+                commands, bracketed by
+                    <function>glBeginConditionalRender</function>/<function>glEndConditionalRender</function>
+                functions, to cause the execution of those rendering commands to happen or not
+                happen based on the status of an occlusion query object. If the occlusion query
+                passed, then the rendering commands will be executed. If it did not, then they will
+                not be.</para>
+            <para>Of course, conditional rendering can cause pipeline stalls; OpenGL still requires
+                that operations execute in-order, even conditional ones. So all later operations
+                will be held up if a conditional render is waiting for its occlusion query to
+                finish. To avoid this, you can specify <literal>GL_QUERY_NO_WAIT</literal> when
+                beginning the conditional render. This will cause OpenGL to render if the query has
+                not completed before this conditional render is ready to be rendered. To gain the
+                maximum benefit from this, it is best to render the conditional objects well after
+                the test objects they are conditioned on.</para>
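+            <para>Continuing the sketch from above, the real object can then be drawn conditionally
+                as follows, where <function>DrawRealObject</function> stands in for the object's
+                normal rendering code:</para>
+            <programlisting language="cpp">//Render the real object only if the earlier query found visible samples.
+//GL_QUERY_NO_WAIT avoids stalling if the query result is not yet available.
+glBeginConditionalRender(occlusionQuery, GL_QUERY_NO_WAIT);
+DrawRealObject();  //Stand-in for the full rendering of the object.
+glEndConditionalRender();</programlisting>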
+        </section>
+        <section>
+            <title>Model LOD</title>
+            <para>When a model is far away, it does not need to look as detailed, since most of the
+                details will be lost due to lack of resolution. Therefore, one can substitute less
+                detailed models for more detailed ones. This is commonly referred to as Level of
+                Detail (<acronym>LOD</acronym>).</para>
+            <para>Of course in modern rendering, detail means more than just the number of polygons
+                in a mesh. It can often mean what shader to use, what textures to use with it, etc.
+                So while meshes will often have LODs, so will shaders. Textures have their own
+                built-in LODing mechanism in mip-mapping. But it is often the case that low-LOD
+                shaders (those used from far away) do not need as many textures as the closer LOD
+                shaders. You might be able to get away with per-vertex lighting for distant models,
+                while you need per-fragment lighting for those close up.</para>
+            <para>The problem with this visually is how to deal with the transitions between LOD
+                levels. If you change them too close to the camera, then the user will notice a pop.
+                If you do them too far away, you lose much of the performance gain from rendering a
+                low-detail mesh far away. Finding a good middle-ground is key.</para>
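+            <para>Whatever thresholds you settle on, the selection itself is usually trivial. A
+                sketch of a purely distance-based scheme, with entirely scene-dependent numbers,
+                might be:</para>
+            <programlisting language="cpp">//Pick a mesh LOD index from the camera-space distance to the object.
+//The distance thresholds here are arbitrary and entirely scene-dependent.
+int SelectLOD(float distanceToCamera)
+{
+    if(distanceToCamera &lt; 20.0f)
+        return 0;  //Full detail.
+    if(distanceToCamera &lt; 60.0f)
+        return 1;  //Medium detail.
+    return 2;      //Lowest detail.
+}</programlisting>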
+        </section>
+        <section>
+            <title>State Changes</title>
+            <para>OpenGL has three kinds of functions: those that actually do rendering, those that
+                retrieve information from OpenGL, and those that modify some information stored in
+                OpenGL. The vast majority of OpenGL functions are the latter. OpenGL's information
+                is generally called <quote>state,</quote> and needlessly changing state can be
+                expensive.</para>
+            <para>Therefore, this optimization rule is to, as best as possible, minimize the number
+                of state changes. For simple scenes, this can be trivial. But in a complicated,
+                data-driven environment, this can be exceedingly complex.</para>
+            <para>The general idea is to gather up a list of all objects that need to be rendered
+                (after culling non-visible objects and performing any LOD work), then sort them
+                based on their shared state. Objects that use the same program share program state,
+                for example. By doing this, if you render the objects in state order, you will
+                minimize the number of changes to OpenGL state.</para>
+            <para>The three most important pieces of state to sort by are the ones that change most
+                frequently: programs (and their associated uniforms), textures, and VAO state.
+                Global state, such as face culling, blending, and the like, is less of a concern
+                because it does not change as often. Generally, all meshes use the same culling
+                parameters, viewport settings, depth comparison state, and so forth.</para>
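+            <para>As a sketch of the idea, the list of visible objects could be sorted with a
+                comparison function along these lines; the <literal>RenderItem</literal> structure
+                is hypothetical, and a real engine would carry far more data per object:</para>
+            <programlisting language="cpp">#include &lt;algorithm&gt;
+#include &lt;vector&gt;
+
+//Hypothetical per-object render data; a real engine would store more.
+struct RenderItem
+{
+    GLuint program;
+    GLuint texture;
+    GLuint vao;
+};
+
+//Order items so that objects sharing the most expensive state are adjacent.
+bool CompareByState(const RenderItem &amp;a, const RenderItem &amp;b)
+{
+    if(a.program != b.program) return a.program &lt; b.program;
+    if(a.texture != b.texture) return a.texture &lt; b.texture;
+    return a.vao &lt; b.vao;
+}
+
+void SortByState(std::vector&lt;RenderItem&gt; &amp;items)
+{
+    std::sort(items.begin(), items.end(), CompareByState);
+}</programlisting>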
+            <para>Minimizing vertex array state changes generally requires more than just sorting;
+                it requires changing how mesh data is stored. This book usually gives every mesh its
+                own VAO, which represents its own separate state. This is certainly very convenient,
+                but it can work against performance if the CPU is a bottleneck.</para>
+            <para>To avoid this, try to group meshes that have the same vertex data formats in the
+                same buffer objects and VAOs. This makes it possible to render several objects, with
+                several different <function>glDraw*</function> commands, all using the same VAO
+                state. <function>glDrawElementsBaseVertex</function> is very useful for this purpose
+                when rendering with indexed data. The fewer VAO binds, the better.</para>
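+            <para>For example, two meshes packed into the same buffer objects and the same VAO might
+                be drawn like this, where the counts and offsets are placeholders for whatever your
+                mesh-packing code produces:</para>
+            <programlisting language="cpp">glBindVertexArray(sharedVao);
+
+//First mesh: its indices start at byte offset 0, its vertices at base vertex 0.
+glDrawElementsBaseVertex(GL_TRIANGLES, firstMeshIndexCount, GL_UNSIGNED_SHORT,
+    0, 0);
+
+//Second mesh: its indices and vertices are stored directly after the first mesh's.
+glDrawElementsBaseVertex(GL_TRIANGLES, secondMeshIndexCount, GL_UNSIGNED_SHORT,
+    (void*)(firstMeshIndexCount * sizeof(GLushort)), firstMeshVertexCount);
+
+glBindVertexArray(0);</programlisting>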
+            <para>There is less information available on how harmful uniform state changes are to
+                performance, or on the performance difference between changing in-program uniforms
+                and buffer-based uniforms.</para>
+            <para>Be advised that state sorting cannot help when dealing with blending, because
+                blending correctness requires rendering blended objects in depth order. Thus, it is
+                best to keep the number of objects that need blending to a minimum.</para>
+            <para>There are also certain tricky states that can hurt, depending on hardware. For
+                example, it is best to avoid changing the direction of the depth test once you have
+                cleared the depth buffer and started rendering to it. This is for reasons having to
+                do with specific hardware optimizations of depth buffering.</para>
+        </section>
     </section>
     <section>
         <title>Finding the Bottleneck</title>
                 milliseconds to spend performing all rendering tasks.</para>
             <para>One thing that confounds performance metrics is the fact that the GPU is both
                 pipelined and asynchronous. When running regular code, if you call a function,
-                you're usually assured that the action the function took has completed when it
+                you're usually assured that the actions the function took have all completed when it
                 returns. When you issue a rendering call (any <function>glDraw*</function>
                 function), not only is it likely that rendering has not completed by the time it has
-                returned, it is very possible that rendering has not even
-                    <emphasis>started</emphasis>. Not even doing a buffer swap will ensure that the
-                GPU has finished, as GPUs can wait to actual perform the buffer swap until
-                later.</para>
+                returned, it is very likely that rendering has not even
+                <emphasis>started</emphasis>. Not even doing a buffer swap will ensure that the GPU
+                has finished, as GPUs can wait to actually perform the buffer swap until later.</para>
             <para>If you specifically want to time the GPU, then you must force the GPU to finish
                 its work. To do that in OpenGL, you call a function cleverly titled
                     <function>glFinish</function>. It will return sometime after the GPU finishes.
                     the time to render doubles, then you are fragment processing bound.</para>
                 <para>Note that rendering time will go up when you increase the resolution. What you
                     are interested in is whether it goes up linearly with the number of fragments
-                    rendered. If the render time only goes up by 1.2x with a 2x increase in number
-                    of fragments, then the code was not fragment processing bound.</para>
+                    rendered. If the rendering time only goes up by 1.2x with a 2x increase in
+                    number of fragments, then the code was not entirely fragment processing
+                    bound.</para>
             </section>
             <section>
                 <title>Vertex Processing</title>
                     significantly (there will generally be some change), then you were vertex
                     processing bound.</para>
                 <para>To turn off fragment processing, simply
-                        <function>glEnable</function>(<literal>GL_CULL_FACE</literal>) and set
-                        <function>glCullFace</function> to <literal>GL_FRONT_AND_BACK</literal>.
-                    That will cause the clipping system to cull all triangles before rasterization.
-                    Obviously, nothing will be rendered, but your performance timings will be for
-                    vertex processing alone.</para>
+                        <function>glEnable</function>(<literal>GL_RASTERIZER_DISCARD</literal>).
+                    This will cause all fragments to be discarded. Obviously, nothing will be
+                    rendered, but all of the steps before rasterization will still be executed.
+                    Therefore, your performance timings will be for vertex processing alone.</para>
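+                <para>A sketch of such a measurement might look like the following, where
+                        <function>RenderScene</function> and <function>GetTimeInMs</function> stand
+                    in for your own drawing code and timer:</para>
+                <programlisting language="cpp">//Measure vertex processing alone by discarding everything before rasterization.
+glEnable(GL_RASTERIZER_DISCARD);
+
+double start = GetTimeInMs();    //Stand-in for your timing function.
+RenderScene();                   //Stand-in for issuing all of the scene's draw calls.
+glFinish();                      //Make the GPU finish before reading the clock.
+double vertexOnlyTime = GetTimeInMs() - start;
+
+glDisable(GL_RASTERIZER_DISCARD);</programlisting>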
             </section>
             <section>
                 <title>CPU</title>
                 way to avoid a vertex-processing heavy section of your renderer. Perhaps you need
                 all of that fragment processing in a certain area of rendering.</para>
             <para>If there is some bottleneck that cannot be optimized away, then turn it to your
-                advantage. If you have a CPU bottleneck, then render more detailed models. If you
-                have a vertex-shader bottleneck, improve your lighting by adding some
-                fragment-shader complexity. And so forth. Just make sure that you do not increase
-                complexity to the point where you move the bottleneck.</para>
-        </section>
-    </section>
-    <section>
-        <?dbhtml filename="Optimize Core.html" ?>
-        <title>Core Optimizations</title>
-        <para/>
-        <section>
-            <title>State Changes</title>
-            <para>This rule is designed to decrease CPU bottlenecks. The rule itself is simple:
-                minimize the number of state changes. Actually doing it is a complex exercise in
-                graphics engine design.</para>
-            <para>What is a state change? Any OpenGL function that changes the state of the current
-                context is a state change. This includes any function that changes the state of
-                objects bound to the current context.</para>
-            <para>What you should do is gather all of the things you need to render and sort them
-                based on state changes. Objects with similar state will be rendered one after the
-                other. But not all state changes are equal to one another; some state changes are
-                more expensive than others.</para>
-            <para>Vertex array state, for example, is generally considered quite expensive. Try to
-                group many objects that have the same vertex attribute data formats in the same
-                buffer objects. Use glDrawElementsBaseVertex to help when using indexed
-                rendering.</para>
-            <para>The currently bound texture state is also somewhat expensive. Program state is
-                analogous to this.</para>
-            <para>Global state, such as face culling, blending, etc, are generally considered less
-                expensive. You should still only change it when necessary, but buffer object and
-                texture state are much more important in state sorting.</para>
-            <para>There are also certain tricky states that can hurt you. For example, it is best to
-                avoid changing the direction of the depth test once you have cleared the depth
-                buffer and started rendering to it. This is for reasons having to do with specific
-                hardware optimizations of depth buffering.</para>
-            <para>It is less well-understood how important uniform state is, or how uniform buffer
-                objects compare with traditional uniform values.</para>
-        </section>
-        <section>
-            <title>Object Culling</title>
-            <para>The fastest object is one not drawn. And there's no point in drawing something
-                that is not seen.</para>
-            <para>The simplest form of object culling is frustum culling: choosing not to render
-                objects that are entirely outside of the view frustum. Determining that an object is
-                off screen is a CPU task. You generally have to represent each object as a sphere or
-                camera-space box; then you test the sphere or box to see if it is partially within
-                the view space.</para>
-            <para>There are also a number of techniques for dealing with knowing whether the view to
-                certain objects are obstructed by other objects. Portals, BSPs, and a variety of
-                other techniques involve preprocessing terrain to determine visibility sets.
-                Therefore, it can be known that, when the camera is in a certain region of the
-                world, objects in certain other regions cannot be visible, even if they are within
-                the view frustum.</para>
-            <para>A level beyond that involves using something called occlusion queries. This is a
-                way to render an object with the GPU and then ask how many fragments of that object
-                were rasterized. It is generally preferred to render simple test objects, such that
-                if any part of the test object is visible, then the real object will be visible.
-                Color masks (with <function>glColorMask</function>) are used to prevent writing the
-                fragment shader outputs of the test object to the framebuffer.</para>
-            <para>Occlusion queries in OpenGL are objects that have state. They are created with the
-                    <function>glGenQueries</function> function. To start rendering a test object for
-                occlusion queries, the object generated from <function>glGenQueries</function> is
-                passed to the <function>glBeginQuery</function> function, along with the mode of
-                    <literal>GL_SAMPLES_PASSED</literal>. All rendering commands between
-                    <function>glBeginQuery</function> and the corresponding
-                    <function>glEndQuery</function> are part of the test object. If all of the
-                fragments of the object were discarded (via depth buffer or something else), then
-                the query failed. If even one fragment was rendered, then it passed.</para>
-            <para>This can be used with conditional rendering. Conditional rendering allows a series
-                of rendering commands, bracketed by
-                    <function>glBeginConditionalRender</function>/<function>glEndConditionalRender</function>
-                functions, to cause rendering of an object to happen or not happen based on the
-                status of an occlusion query object. If the occlusion query passed, then the
-                rendering commands will be executed. If it did not, then they will not be.</para>
-            <para>Of course, conditional rendering can cause pipeline stalls; OpenGL still requires
-                that operations execute in-order, even conditional ones. So all later operations
-                will be held up if a conditional render is waiting for its occlusion query to
-                finish. To avoid this, you can specify <literal>GL_QUERY_NO_WAIT</literal> when
-                beginning the conditional render. This will cause OpenGL to render if the query has
-                not completed before this conditional render is ready to be rendered.</para>
-        </section>
-        <section>
-            <title>Model LOD</title>
-            <para>When a model is far away, it does not need to look as detailed. Therefore, one can
-                substitute more detailed models for less detailed ones. This is commonly referred to
-                as Level of Detail (<acronym>LOD</acronym>).</para>
-            <para>Of course in modern rendering, detail means more than just the number of polygons
-                in a mesh. It can often mean what shader to use, what textures to use with it, etc.
-                So while meshes will often have LODs, so will shaders. Textures have their own
-                built-in LODing mechanism in mip-mapping. But it is often the case that low-LOD
-                shaders (those used from far away) do not need as many textures as the closer LOD
-                shaders. You might be able to get away with per-vertex lighting for distant models,
-                while you need per-fragment lighting for those close up.</para>
-            <para>The general problem is how to deal with the transitions between LOD levels. If you
-                change them too close to the camera, then the user will notice the pop. If you do
-                them too far away, you lose much of the performance impact. Finding a good
-                middle-ground is key.</para>
-        </section>
-        <section>
-            <title>Mipmapping</title>
-            <para>For any texture that represents a surface property of an object, strongly consider
-                giving it mipmaps. This includes bump maps, diffuse textures, specular textures,
-                etc. This is primarily for performance reasons.</para>
-            <para>When you fetch a texel from a texture, the texture unit hardware will usually
-                fetch the neighboring textures at the mip LOD(s) in question. These texels will be
-                stored in local memory called the texture cache. This means that, when the next
-                fragment on the surface comes along, that texel will already be in the cache. But
-                this only works for texels that are near each other.</para>
-            <para>When an object is far from the camera or angled sharply relative to the view, then
-                the two texture coordinates for two neighboring fragments can be quite different
-                from one another. When fetching from a low mipmap (remember: 0 is the biggest
-                mipmap), then the two fragments will get texels that are far apart. Neither one will
-                fetch texels near each other.</para>
-            <para>But if they are fetching from a high mipmap, then the large texture coordinate
-                difference between them translates into a small texel-space difference. With proper
-                mipmaping, neighboring texels can feed on the cache and do fewer memory accesses.
-                This speeds up texturing performance.</para>
-            <para>This also means that biasing the mipmap LOD lower (to larger mipmaps) can cause
-                serious performance problems in addition to aliasing.</para>
+                advantage by increasing the complexity of the other stages in the pipeline. If you
+                have an unfixable CPU bottleneck, then render more detailed models. If you have a
+                vertex-shader bottleneck, improve your lighting by adding some fragment-shader
+                complexity. And so forth. Just make sure that you do not increase complexity to the
+                point where you move the bottleneck and make things slower.</para>
         </section>
     </section>
     <section>

Documents/Texturing/Tutorial 14.xml

                 data, is all OpenGL needs to read our data.</para>
             <para>That describes the data format as we are providing it. The format parameter, the
                 third parameter to the <function>glTexImage*</function> functions, describes the
-                format of the texture's internal storage. The texture's format defines the
-                properties of the texels stored in that texture:</para>
+                format of the texture's internal storage. This is how OpenGL
+                    <emphasis>itself</emphasis> will store the texel data; this does not have to
+                exactly match the format provided. The texture's format defines the properties of
+                the texels stored in that texture:</para>
             <itemizedlist>
                 <listitem>
                     <para>The components stored in the texel. Multiple components can be used, but
                 unsigned normalized 8-bit integers as the input data.</para>
             <para>This is not strictly necessary. We could have used <literal>GL_R16</literal> as
                 our format instead. OpenGL would have created a texture that contained 16-bit
-                unsigned normalized integers. OpenGL would then have had to convert our input data
-                to the 16-bit format. It is good practice to try to match the texture's format with
-                the format of the data that you upload to OpenGL.</para>
+                unsigned normalized integers. OpenGL would then have had to convert our 8-bit input
+                data to the 16-bit format. For the sake of performance, it is good practice to
+                match the texture's format with the format of the data that you upload to OpenGL,
+                so as to avoid this conversion.</para>
             <para>The calls to <function>glTexParameter</function> set parameters on the texture
                 object. These parameters define certain properties of the texture. Exactly what
                 these parameters are doing is something that will be discussed in the next

Documents/preface.xml

             but they are often ignored by most introductory material because they do not work with
             the fixed function pipeline.</para>
         <para>This book is first and foremost about learning how to be a graphics programmer.
-            Therefore, whenever it is possible and practical, this book will present material in a
+            Therefore whenever it is possible and practical, this book will present material in a
             way that encourages the reader to examine what graphics hardware can do in new and
             interesting ways. A good graphics programmer sees the graphics hardware as a set of
             tools to fulfill their needs, and this book tries to encourage this kind of
             thinking.</para>
-        <para>One thing this book is not, however, is a book on graphics APIs. While it does use
-            OpenGL and out of necessity teach rendering concepts in terms of OpenGL, it is not truly
-            a book that is <emphasis>about</emphasis> OpenGL. It is not the purpose of this book to
-            teach you all of the ins and outs of the OpenGL API.There will be parts of OpenGL
-            functionality that are not dealt with because they are not relevant to any of the
-            lessons that this book teaches. If you already know graphics and are in need of a book
-            that teaches modern OpenGL programming, this is not it. It may be useful to you in that
-            capacity, but that is not this book's main thrust.</para>
+        <para>One thing this book is not, however, is a book on graphics APIs. While it does use
+            OpenGL and out of necessity teach rendering concepts in terms of OpenGL, it is not a
+            comprehensive guide to the OpenGL API. It will cover much of the API, as needed for
+            various graphics concepts, but there will be many pieces of functionality supported by
+            the API that will not be covered. If you already know graphics and are in need of a book
+            that teaches modern OpenGL programming, this book may be too basic for you. It will
+            teach you the lion's share of the API, but it is focused primarily on graphics concepts
+            rather than on the API itself.</para>
         <para>This book is intended to teach you how to be a graphics programmer. It is not aimed at
             any particular graphics field; it is designed to cover most of the basics of 3D
             rendering. So if you want to be a game developer, a CAD program designer, do some
             computer visualization, or any number of things, this book can still be an asset for
-            you.</para>
-        <para>This does not mean that it covers everything there is about 3D graphics. Hardly. It
+            you. This does not mean that it covers everything there is about 3D graphics. Hardly. It
             tries to provide a sound foundation for your further exploration in whatever field of 3D
             graphics you are interested in.</para>
         <para>One topic this book does not cover in depth is optimization. The reason for this is