Shared posts

01 May 21:25

Doom 3 - Volumetric Glow

by Simon

This article moved to its own webspace! You can follow it by clicking this:
25 Apr 07:23

Visualizing path finding in JavaScript

by hanecci
  • There was a JavaScript sample that visualizes pathfinding algorithms, so I've linked to it below.
  • Also, there appears to be an algorithm called Jump Point Search, which optimizes the usual A* to make it faster.

Demo

Jump Point Search path finding

24 Apr 11:28

Release: Klei's survival game Don't Starve is out in the digital wild

by John Polson
Michael Nischt

My favorite for quite some time now.
P.S. There is also a version for Chrome native which includes a Steam key.

Eets and Mark of the Ninja developer Klei has released its wilderness survival game Don't Starve for Windows, Mac, and Linux users today. The Tim Burton take on Minecraft will have players maintaining their health, hunger, and sanity while foraging in randomly-generated worlds.

This is just the beginning for Don't Starve, as Klei announced six months of free content to follow the launch. Don't Starve is available for $11.99 at GOG or $13.49 at Steam and the Humble Store.

22 Apr 21:34

On GC in Games (response to Jeff and Casey)

by sebastiansylvan

So it turns out youtube comments suck. I’ll write my response to Jeff and Casey’s latest podcast in blog form instead of continuing the discussion there. View it here: http://www.youtube.com/watch?v=tK50z_gUpZI

Now, first let me say that I agree with 99% of the sentiment of this podcast. I think writing high performance games in Java or C# is kinda crazy, and the current trend of writing apps in HTML5 and JavaScript and then running it on top of some browser-like environment is positively bonkers. The proliferation of abstraction layers and general “cruft” is a huge pet peeve of mine – I don’t understand why it takes 30 seconds to launch a glorified text editor (like most IDEs – Eclipse, Visual Studio, etc.), when it took a fraction of a second twenty years ago on hardware that was thousands of times slower.

That said, I do think their arguments against GC aren’t quite fair (and note that GC has nothing to do with JITs or VMs). They pile up roughly the right things in the “cons” column, but they completely ignore the “pros” column, and as a result act baffled that anyone would ever think GC is appropriate for any reason.

Before I get into it, I should probably link to my previous post on GC where I spend a large chunk of time lamenting how poorly designed C# and Java are w.r.t. GC in particular. Read it here.

To summarize: no mainstream language does this “right”. What you want is a language that’s memory safe, but without relegating every single allocation to a garbage collected heap. 95% of your memory allocations should either be a pure stack allocation, or anchored to the stack (RAII helps), or be tied uniquely to some owning object and die immediately when the parent dies. Furthermore, the language should highly discourage allocations in general – it should be value-oriented like C so that there’s just plain less garbage to deal with in the first place. Rust is a good example of a language of this kind.
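To make that concrete, here is a small C++ sketch of the kind of ownership discipline being described (my illustration, not code from the post):

#include <memory>
#include <vector>

struct Texture { /* pixel data, etc. */ };

struct Model
{
    std::vector<float> vertices;       // owned uniquely by the Model, freed with it
    std::unique_ptr<Texture> albedo;   // dies immediately when its parent Model dies
};

void BuildFrame()
{
    Model model;                                 // anchored to the stack (RAII)
    model.vertices.assign(1024, 0.0f);
    model.albedo = std::make_unique<Texture>();
    // ... use the model for this frame ...
}                                                // everything is freed deterministically here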

You’ll note that most of Jeff and Casey’s ranting is not actually about the GC itself, but about promiscuous allocation behavior, and I fully agree with that, but I think it’s a mistake to conflate the two. GC doesn’t imply that you should heap allocate at the drop of a hat, or that you shouldn’t think about who “owns” what memory.

Here’s the point: Garbage collection is about memory safety. It’s not about convenience, really. Nobody serious argues that GC means you don’t have to worry about resource usage. If you have type safety, array bounds checks, null safety, and garbage collection, you can eliminate memory corruption. That’s why people accept all the downsides of GC even in languages where it comes with much higher than necessary penalties (e.g. Java, C#, Ruby, Lua, Python, and so on… pretty much all mainstream languages).

A couple of weeks ago I spent several days tracking down a heap corruption in a very popular third party game engine. I haven’t tracked down who’s responsible for the bug (though I have access to their repository history), or exactly how long it’s been there, but from the kind of bug it was I wouldn’t be surprised if it’s been there for many years, and therefore in hundreds (or even thousands?) of shipped games. It just started happening after several years for no real reason (maybe the link order changed just enough, or the order of heap allocations changed just enough, to make it actually show up as a crash).

The main thing to say about this bug (I won’t detail it here because it’s not my code) is that it was caused by three different pieces of code interacting badly, where none of the pieces was necessarily doing anything stupid. I can easily see very smart and professional programmers writing these three pieces of code at different times, perhaps going through a few iterations, and all of a sudden there’s a perfect storm, and a latent memory corruption is born.

I mention this because it raises a few important points:

  • Memory corruption is not always caught before you ship. Any argument about memory corruption under manual memory management not being so bad because it’s at least transparent and debuggable, unlike the opaque GC, falls flat on its face for this reason. Yes, you have all the code, and it’s not very complicated, but how does that help you if you never even see the bug before you ship? Memory corruption bugs are frequently difficult to even repro. They might happen once every thousand hours due to some rare race condition, or some extremely rare sequence of heap events. You could in principle debug it (though it often takes considerable effort and time), if you knew it was there, but sometimes you just don’t.
  • Memory corruption is often very hard to debug. Often this goes hand in hand with the previous point. Something scribbles to some memory, and forty minutes later enough errors have cascaded from this to cause a visible crash. It’s extremely hard to trace back in time to figure out the root cause of these things. This is another ding against the “the GC is so opaque” argument. Opacity isn’t just about whether or not you have access to the code – it’s also about how easy it is to fix even if you do. The extreme difficulty of tracking down some of the more subtle memory corruption bugs means that the theoretical transparency you get from owning all the code really doesn’t mean much. With a GC at least most problems are simple to understand – yes you may have to “fix” it by tuning some parameters, or even pre-allocating/reusing memory to avoid the GC altogether (because you can’t break open the GC itself), but this is far less effort and complexity than a lot of heap corruption bugs.
  • Smart people fuck up too. In the comments there were a number of arguments that essentially took the form “real programmers can deal with manual memory management”*. Well, this is an engine developed by some of the best developers in the industry, and it’s used for many thousands of games, including many AAA games. Furthermore, there was absolutely nothing “stupid” going on here. It was all code that looked completely sane and sensible, but due to some very subtle interactions caused a scribble. Also, it’s not hard to go through the release notes for RAD-developed middleware and find fixes for memory corruption bugs – so clearly even RAD engineers (of whom I have a very high opinion) occasionally fuck up here.

With memory safety, most of these bugs simply disappear. The majority of them really don’t happen at all anymore – and the rest turn into a different kind of bug, which is much easier to track down: a space leak (a dangling pointer in a memory safe language just means you’ll end up using more memory than you expected, which can be tracked down in minutes using rudimentary heap analysis tools).

In other words: memory safety eliminates a whole host of bugs, and improves debuggability of some other bugs. Even when a GC causes additional issues (which they do – there’s a real cost to GC for sure) they at least do so before you ship, unlike the bugs caused by not having memory safety. This is a very important distinction!

Yes, you should be careful, and I’m certainly not advocating Java or C# here, but when you do consider the tradeoffs you should at least be honest about the downsides of not having memory safety. There is real value in eliminating these issues up front.

In current languages I would probably almost always come down on the side of not paying the cost of GC for high-performance applications. E.g. I’ll generally argue against any kind of interpretation or VM-based scripting altogether (DSLs that compile to native is a different issue), especially if they require a GC. However, I don’t think you need to overstate your case when making the tradeoff.

If I could pay a small fixed cost of, let’s say 0.5ms per frame, but be guaranteed that I’m not going to have to worry about any memory corruption ever again I’d totally take that tradeoff. We’re not there yet, but we really aren’t that far off either – the problem isn’t intrinsic to GC. Plenty of high performance games, even 60Hz ones, have shipped with GC’d scripting languages, and while they don’t usually collect the whole heap, a lot of them do manage to keep the GC overhead around that level of cost. So maybe in the future, instead of paying 0.5ms to GC a small heap that’s completely ruined by a shitty language that generates too much garbage, we could instead GC the whole heap and end up with similar levels of complexity by just creating less garbage in the first place (using a non-shitty language).

 

*Side note: I really hate arguments of the form “real programmers can deal with X” used to dismiss the problem by basically implying that anyone who has a problem just isn’t very good. It’s incredibly insulting and lazy, and no discussion was ever improved by saying it. In my opinion hubris, or extrapolating too far from your own experience, is a far more common sign of incompetence or inexperience than admitting that something is hard.


18 Apr 14:31

A Journey to Monaco: Andy Schatz Looks Back

A profile of the developer of the hotly anticipated, IGF-winning game. ...

17 Apr 07:28

Dustforce Sales Figures

by Timothy Lottes
16 Apr 08:02

Glossy reflections

by Dom Penfold

Up to now the reflections that have been visible from the raytracer have been perfect reflections. This means that we only need to trace a single reflection ray and we can do perfect mirrors all over the place.

Sadly for photorealism, very few surfaces show perfect reflections. In most materials the reflections become blurry as objects get further away from a surface. This is because most surfaces aren’t perfectly flat: they have small imperfections at a microscopic level, which means that light is reflected in slightly different directions depending on which part of the imperfections it hits.

Glossy reflection example

Example of glossy reflection

As you can see, the right hand side of the image is a reflection (it’s reversed) and the reflection itself is quite distorted. This is because I took the photo by pointing the camera at one of the kitchen units, which has a slightly mottled surface that leads to imperfect reflections.

To simulate this in a raytracer we need to change the part of our raytracing system that calculates the reflection colour. Currently the code looks like this…

    // Mirror the eye vector about the surface normal to get the reflection direction
    DVector3 reflection = eyeVector - normal * (normal.Dot(eyeVector) * 2.f);
    // Trace the reflection ray back into the scene
    scene->Intersect(DRay(hitPos, reflection), response);
    // Accumulate the reflected colour, tinted by the material's reflectivity
    mRed += response.mColour.mRed * mReflectivity.mRed;
    mGreen += response.mColour.mGreen * mReflectivity.mGreen;
    mBlue += response.mColour.mBlue * mReflectivity.mBlue;

To change this so that we calculate a glossy reflection we introduce a new parameter to the material model called “shininess”. If a surface is completely shiny, then the reflection will be perfect and we won’t modify the reflection vector at all. As the shininess decreases we add an increasing amount of noise to the reflection vector. When completely non-shiny we end up with no cohesive reflections.

Now we need to calculate some “noise” for the reflection vector. For a completely uniform level of glossiness we’ll use a random perturbation vector and blend this in using the shininess value. We need to add the following line of code before intersecting the reflection vector with the scene.

    // Random offset in the [-1,1] cube, scaled by how matte (non-shiny) the surface is
    DVector3 perturb = DVector3(scene->GetRandom()*2-1, scene->GetRandom()*2-1, scene->GetRandom()*2-1);
    perturb *= 1.0f - m_Shininess;
    reflection += perturb;
    reflection.Normalise();

This does the trick quite nicely. There may be some argument for using a random vector selected from a sphere rather than a cube, but it probably makes little difference when shininess is close to 1.0.
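If you did want to pick the perturbation from inside a sphere rather than a cube, rejection sampling is the simplest route. Here is a minimal sketch reusing the DVector3 and scene->GetRandom() helpers from the snippets above (illustrative only, not part of the original code):

    // Keep drawing points in the [-1,1] cube until one lands inside the unit sphere
    DVector3 perturb;
    do
    {
        perturb = DVector3(scene->GetRandom()*2-1, scene->GetRandom()*2-1, scene->GetRandom()*2-1);
    } while (perturb.Dot(perturb) > 1.0f);
    perturb *= 1.0f - m_Shininess;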

Once you’ve got this code in place you then need to render multiple reflections per pixel. You can either do this by wrapping the reflection lookup in a loop (sketched below), or by multisampling each pixel. As the pixels are sampled multiple times the glossiness will start to appear. In the following scenes I’ve used 32 samples per pixel to show the glossiness.
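A rough sketch of the loop approach, using the same member variables as the earlier snippets (the sample count and local accumulators are mine, purely for illustration):

    // Average several perturbed reflection rays for this shading point (illustrative only)
    const int kSamples = 32;
    float red = 0.f, green = 0.f, blue = 0.f;
    for (int i = 0; i < kSamples; ++i)
    {
        DVector3 reflection = eyeVector - normal * (normal.Dot(eyeVector) * 2.f);
        DVector3 perturb = DVector3(scene->GetRandom()*2-1, scene->GetRandom()*2-1, scene->GetRandom()*2-1);
        perturb *= 1.0f - m_Shininess;
        reflection += perturb;
        reflection.Normalise();
        scene->Intersect(DRay(hitPos, reflection), response);
        red   += response.mColour.mRed   * mReflectivity.mRed;
        green += response.mColour.mGreen * mReflectivity.mGreen;
        blue  += response.mColour.mBlue  * mReflectivity.mBlue;
    }
    mRed   += red   / kSamples;
    mGreen += green / kSamples;
    mBlue  += blue  / kSamples;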

First off let’s look at a scene with a single reflective object.

Perfect Reflections

Default scene, shininess = 1.0

Now let’s change the shininess to 0.95 (5% perturbation). This now gives us a noticeably glossy reflection.

Glossy reflection

Glossy reflection (shininess = 0.95)

We can go further; here’s an image with a shininess of 0.9 (10% perturbation).

Very Glossy reflection

Glossy reflection (shininess = 0.9)

As you can see we’re now starting to reach the point where we need to run many more samples per pixel to reduce the noise visible in the image.

For the final feature render I’ve mixed in depth of field and area lights. To get this to look really smooth I’ve sampled each pixel 1024 times. This took about 10 minutes to render on my machine at 750×500. That’s 384 million initial rays, with many more secondary rays sent into the scene.

Glossy sphere

Mixture of Glossy Surfaces, Depth of Field and Area lights

Hope you found this useful. As ever, drop me a comment if you’ve got any questions.

13 Apr 10:11

Are Explicit Location Bindings a Good Idea for a Shading Language?

by tangentvector

Probably Not.

Introduction

Both the HLSL and GLSL shading languages support mechanisms for assigning explicit, programmer-selected locations to certain shader parameters. In HLSL, this is done with the register keyword:

// texture/buffer/resource:
Texture2D someTexture          : register(t0); 

// sampler state:
SamplerState linearSampler     : register(s0);

// unordered access view (UAV):
RWStructuredBuffer<T> someData : register(u0);

// constant buffer:
cbuffer PerFrame               : register(b0)    
{
    // offset for values in buffer:
    float4x4 view              : packoffset(c0);
    float4x4 proj              : packoffset(c4);
    // ...
}

When setting shader parameters through the Direct3D API, these explicit locations tell us where to bind data for each parameter. For example, since the cbuffer PerFrame is bound to register b0, we will associate data with it by binding an ID3D11Buffer* to constant buffer slot zero (with, say, PSSetConstantBuffers).
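For example, the host-side binding might look like the sketch below (my illustration, not code from the article; it assumes d3d11.h and a hypothetical helper function name):

#include <d3d11.h>

// With the explicit register(b0) in the shader, the application can bind its
// per-frame constants to slot 0 unconditionally, with no reflection step.
void BindPerFrame(ID3D11DeviceContext* context, ID3D11Buffer* perFrameBuffer)
{
    context->PSSetConstantBuffers(0, 1, &perFrameBuffer); // slot 0 == register(b0)
}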

The OpenGL Shading Language did not initially support such mechanisms, but subsequent API revisions have added more and more uses of the layout keyword:

// texture/buffer/resource
layout(binding = 0) uniform sampler2D someTexture;

// shader storage buffer (SSB)
layout(binding = 0) buffer T someData[];

// uniform buffer:
layout(binding = 0) uniform PerFrame
{
    mat4 view;
    mat4 proj;
    // ...
};

// input and output attributes:
layout(location = 2) in  vec3 normal;
layout(location = 0) out vec4 color;

// default block uniforms (not backed by buffer)
layout(location = 0) uniform mat4 model;
layout(location = 1) uniform mat4 modelInvTranspose;

It is clear that location binding was an afterthought in the design of both languages; the syntax is ugly and obtrusive. Using explicit locations can also be error-prone, since it becomes the programmer’s responsibility to avoid conflicts, etc., and ensure a match between application and shader code. The shading languages “want” you to write your code without any explicit locations.

If you talk to game graphics programmers, though, you will find that they use explicit locations almost exclusively. If you try to give them a shading language without this feature (as GLSL did), they will keep demanding that you add it until you relent (as GLSL did).

Why Do We Need These Things?

If the programmer does not assign explicit locations, then it is up to the shader compiler to do so. Unfortunately, there is no particular scheme that the compiler is required to implement, and in particular:

  • The locations assigned to parameters might not reflect their declared order.
  • A parameter might not be assigned a location at all (if it is statically unreferenced in the shader code).
  • Two different GL implementations might (indeed, will) assign locations differently.
  • A single implementation might assign locations differently for two shaders that share parameters in common.

When an application relies on the shader compiler to assign locations, it must then query the resulting assignment through a “reflection” interface before it can go on to bind shader parameters. What used to be a call like PSSetConstantBuffers(0, ...) must now be something like PSSetConstantBuffers(queriedLocations[0], ...). In the case of Direct3D, these locations can be queried once a shader is compiled to bytecode, after which the relevant meta-data can be stripped, and the overhead of reflection can be avoided at runtime; this is not an option in OpenGL.
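In OpenGL terms, that query-then-bind dance looks roughly like this (my sketch, assuming the usual GL 3.1+ headers/loader, a linked program object, and a float[16] model matrix):

// Without explicit layout qualifiers, the application has to ask where things ended up.
void BindWithReflection(GLuint prog, const float* modelMatrix)
{
    GLuint blockIndex = glGetUniformBlockIndex(prog, "PerFrame");
    glUniformBlockBinding(prog, blockIndex, 0);             // force the block onto binding point 0
    GLint modelLoc = glGetUniformLocation(prog, "model");   // query the compiler-assigned location
    glUseProgram(prog);
    glUniformMatrix4fv(modelLoc, 1, GL_FALSE, modelMatrix); // upload via the queried location
}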

Even statically querying the compiler-assigned locations does not help us with the issue that two different shaders with identical (or near-identical) parameter lists may end up with completely different location assignments. This makes it impossible to bind “long-lived” parameters (e.g., per-frame or camera-related uniforms) once and re-use that state across many draw calls. Every time we change shaders, we would need to re-bind everything since the locations might have changed. In the context of OpenGL, this issue means that linkage between separately-compiled vertex and fragment shaders requires an exact signature match (no attributes dropped), unless explicit locations are used.

As it stands today, you can have clean shader code at the cost of messy application logic (and the loss of some useful mix-and-match functionality), or you can have clean application logic at the cost of uglier shader code.

A Brief Digression

On the face of it, the whole situation is a bit silly. When I declare a function in C, I don’t have to specify explicit “locations” for the parameters lest the compiler reorder them behind my back (and eliminate those I’m not using):

int SomeFunction(
    layout(location = 0) float x,
    layout(location = 1) float A[] );

When I declare a struct, I don’t have to declare the byte offset for each field (or, again, worry about unused fields being optimized away):

struct T {
    layout(offset = 0) int32_t x;
    layout(offset = 4) float y;
};

In practice, most C compilers provide fairly strong guarantees about struct layout, and conform to a platform ABI which guarantees the calling convention for functions, even across binaries generated with different compilers. A high level of interoperability can be achieved, all without the onerous busy-work of assigning locations/offsets manually.

Guarantees, and the Lack Thereof

Why don’t shader compilers provide similar guarantees? For example, why not just assign locations to shader parameters in a well-defined manner based on lexical order: the first texture gets location #0, the next gets #1, and so on? After all, what makes the parameters of a shader any different from the parameters of a C function?

(In fact, the Direct3D system already follows just such an approach for the matching of input and output attributes across stage boundaries. The attributes declared in a shader entry point are assigned locations in the input/output signature in a well-defined fashion based on order of declaration, and unused attributes aren’t skipped.)

Historically, the rationale for not providing guarantees about layout assignment was so that shader compilers could optimize away unreferenced parameters. By assigning locations only to those textures or constants that are actually used, it might be possible to compile shaders that would otherwise fail due to resource limits. In the case of GLSL, different implementations might perform different optimizations, and thus some might do a better job of eliminating parameters than others; the final number of parameters is thus implementation-specific.

This historical rationale breaks down for two reasons. First is the simple fact that on modern graphics hardware, the limits are much harder to reach. The Direct3D 10/11 limits of 128 resources, 16 samplers, and 15 constant buffers are more than enough for most shaders (the limit of only 8 UAVs is a bit more restrictive). Second, and more important, is that if a programmer really cares about staying within certain resource bounds, they will carefully declare only the parameters they intend to use rather than count on implementation-specific optimizations in driver compilers to get them under the limits (at which point they could just as easily use explicit locations).

One wrinkle is that common practice in HLSL is to define several shaders in the same file, and to declare uniform and resource parameters at the global scope. This practice increases the apparent benefit of optimizing away unreferenced parameters. The underlying problem, though, is that the language design forces programmers to use global variables for what are, logically, function parameters. Trying to “fix” this design decision by optimizing away unused parameters is treating the symptoms rather than the disease.

As far as I can tell, there is no particularly compelling reason why a modern shading language should not just assign locations to parameters in a straightforward and deterministic manner. We need an ABI and calling convention for interface between the application and shader code, not a black box.

So Then What About Explicit Locations?

If our compiler assigned locations in a deterministic fashion, would there still be a need for explicit location bindings? Deterministic assignment would serve the same purpose in eliminating the need for a “reflection” API to query parameter bindings (though one could, of course, still be provided).

The remaining benefit of explicit locations is that they allow for us to make the parameter signatures of different shaders “match,” as described above. In the simplest case, a deterministic assignment strategy can ensure that if two shaders share some initial subsequence of parameters in common, then they assign matching locations to those parameters. In the more complex cases (where we want a “gap” in the parameter matching), it seems like the answer is right there in C already: unions allow us to create heterogeneous layouts just fine.

All we need to do, then, is provide a shading language the same kinds of tools that C has for describing layouts (structs and unions), and we should be able to rid ourselves of all these fiddly layout and register declarations.
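As a plain C analogy (my illustration, not a proposal from the article), two shader parameter signatures that share a common prefix, with the deliberate gap described by a union, might look like:

struct SharedParams  { float view[16]; float proj[16]; };  // bound once, shared by every shader
struct ShadowParams  { float lightViewProj[16]; };         // shadow-pass extras
struct ForwardParams { float model[16]; float tint[4]; };  // forward-pass extras

struct ShaderParams
{
    struct SharedParams shared;        // identical layout for every shader
    union
    {
        struct ShadowParams  shadow;
        struct ForwardParams forward;
    } perPass;                         // overlapping region: layout differs per shader
};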

So What Now?

In the long run, this whole issue is probably moot, as “bindless” resource mechanisms start to supplant the very idea of binding locations as a concept in shading languages and graphics APIs.

In the short term, though, I hope that we can get support for deterministic and predictable layout assignment in near-future versions of HLSL and GLSL. This would allow us to write cleaner and simpler shader declarations, without having to compromise the simplicity of our C/C++ application logic.


08 Apr 09:46

GDC13 Summary: Animation Bootcamp Part 4/6

by David Rosen

In this series of posts, I summarize my takeaways from some of the GDC 2013 sessions for anyone who couldn’t be there, starting with the animation bootcamp sessions on the first day. These are reconstructed from notes and memory, and may not exactly match what the speakers said.

Animating the 3rd Assassin

Jonathan Cooper, Animation Director, Ubisoft Montreal

Jonathan has been animating games for 13 years, including lead roles on Mass Effect 1 & 2, Deus Ex: Human Revolution, and Assassin’s Creed 3, and recently won the DICE award for “Outstanding Achievement in Animation”.

What is Assassin’s Creed? It was originally going to be a new Prince of Persia game, but ended up becoming its own IP, focusing on efficient parkour movements through realistic environments instead of flashy freerunning stunts through designed puzzles. The movement of the assassin is meant to be practical and smooth, with no interruptions from backflips or other tricks.

Every Ubisoft game is pitched to the executives using a 7-minute “target video” showing how the game would look in action, so they put together a video demonstrating the key features they wanted in Assassin’s Creed 3 -- such as tree navigation, rope darts, and deep snow. It ended up being greenlit, but the executives said the assassin in the video was too light and airy. They wanted him to feel heavier and tougher, so that was one of the major animation challenges this time around.

They created many thumbnails and character concepts to figure out how this stronger assassin should look. They decided to remove a lot of the armor and other extra clothing that the old assassins wore, and replace it with muscle mass, giving him a broader, but more streamlined look. He has new animations to make his movements look like they use more effort -- in particular, he has a “sprint propulsion” animation at the beginning of every run which emphasizes how much effort he puts into overcoming his significant inertia.

His attacks and assassinations have more follow-through to show how powerful they are, which at first made them seem less responsive. The team solved this problem with a brute force approach, by animating every possible transition in advance. For example, there is one assassination that starts with a jump off a box, and then ends with a run to the right. Or another that starts with a walking approach from behind, and ends with a walk to the left. It all added up to hundreds of animations just for the wrist-blade assassination move, in a detailed matrix including all the possible start and end states.

Many of the animation problems were solved with similar parametric matrices to blend between a large set of animations. For example, there were many jump animations for different heights and distances, and the system would interpolate between the closest ones for any particular jump. They even used a pneumatic jump pad to physically launch stuntmen into the air for various heights and distances, and used those performances as mocap foundations to add more believability to the jump animations.

The technical animation team experimented with many different physics-based and procedural animation techniques, but most of them were rejected -- they created this odd effect in which the characters mindlessly react to the environment, and never anticipate anything. They ended up only using a few procedural effects. Here are two examples: First, there is a detailed lean and foot-placement system for turning while running, which was created procedurally, and then fine-tuned by comparing it to real motion capture data. Second, Assassin’s Creed games have always used IK for the detailed hand and foot placement in climbing, but now there is an additional procedural layer for overall body movement, used for reacting to jumps and landings.

One of the most important secrets of Assassin’s Creed animation has always been the secondary animation -- such as the cloth simulation on the cape, and the spring physics on the bits of equipment attached to the character. Since these items are driven by the physics system instead of directly keyframed in the animations, they really help smooth out animation transitions and increase the perceived fidelity of the game’s animation. That is why the animation in the assassin levels looks smoother and more detailed than the animation in the Desmond levels.

The force of the new assassin’s attacks is also communicated through the camera: each finishing move has an associated camera animation, which is applied when defeating the last enemy in a group. This allows for close-ups on enemy faces as they die, and impacts reinforced with camera movements. The combat camera is much closer in general, dynamically zooming in and out to be as close as it can possibly be without leaving important enemies out of the frame.

He demonstrated the combat animation process in detail, showing how the actors act out the basic movements, but the curves are all wrong because they don’t actually want to hit each other or move at a dangerous speed. The mocap data is then heavily modified by hand to emphasize impacts, ensure that blows actually connect, and generally increase the speed and energy level.

The non-interactive cinematics were recorded using full performance capture -- that is, simultaneous face, body and voice recording, as popularized by Avatar. As Simon mentioned, mocap is just a starting point for the animation team, but the simultaneous recording is really valuable for getting a natural performance from the actors. With the face cameras, the actors are more confident that details of their facial expressions and body language will be conveyed, so they don’t feel the need to exaggerate as much, or use cartoonish voice inflection.

The talk ended with a demo reel of animation from Assassin’s Creed 3, along with text conveying the sheer quantity of animations of each type. The assassin has 330 jump animations, 220 basic locomotion animations, 280 climbing animations, 210 new assassinations, and 3200 fight animations. There are 3400 crowd animations, and 3000 animal animations! Each animal had its own set of parkour moves so it could follow the player around in the environment, including bears, wolves, foxes, deer, and so on. There were 50 animators and animation coders on the project.

I hope you liked this summary! The next one will be all about first-person animation.

06 Apr 11:17

Valve on Porting Source Engine to Linux, Android Next

by noreply@blogger.com (Nitesh)
At the GPU Technology Conference, Valve recently talked about their experience of porting Source Engine to Linux. The slides from the session are now available; you can have a look at them...

[This is a content summary only. Click on the title link to see full content. First published on www.ubuntuvibes.com]
03 Apr 19:03

Some clarifications on my previous post about young people trying to 'get into' the game industry..

by John Ratcliff

My previous post titled "So your teenager tells you they want to 'make video games for a living'.." has gotten a huge response.  I wanted to take a moment to briefly address some of the criticism it has received.

So far, almost all of my friends who actually work in the game industry have given me strong positive feedback, but, there have been a lot of people upset about it too.

My original post was quite lengthy and had some Charles Bloom rant-like elements to it.  That's a good thing; a post which is full of qualifications and half-expressed opinions can feel a bit wishy-washy.  The fact that it has gotten a strong response and spawned a lot of discussion is a good thing too.  Nevertheless, I want to address some very specific criticism I have received.

(1) People take issue with my claim that you have to be a really great artist to work as an artist in the game industry.  


I really don't know what to say about this.  Every single one of my friends who is a professional artist in the game industry strongly and enthusiastically supported this view.  The level and quality of artwork in video games today is incredibly high.  I really don't see much opportunity for people working as a professional artist in the game industry without being exceptionally talented.

(2) People take issue with my claim that you must be 'born' with artistic talent to become a great professional artist.


I really don't know what to say about this either.  In my experience, and by the way I am an artist myself though not good enough to be a professional one, I have never met a great artist who did not have innate talent.  To me this is kind of like saying 'anyone can learn to be a great musician' or 'anyone can learn to be a great singer'.  I think most people would consider that an absurd statement and to me, it is equally absurd to say 'anyone can learn to be a great artist'.

I'm sorry, I just really do not buy this argument at all.

(3) Many people got upset by my statements about the amount of math necessary to become a computer programmer in the game industry.  Even John Carmack took issue with that statement in a comment on the original post.  I realize that I didn't explain this carefully enough, so I will now.


To begin with, the core of my argument is that you should not get a degree/certificate from a 'gaming college'. I have said, as have many others, that not only are most of these institutions running a scam, preying on the hopes and dreams of young people, but, most importantly, that they terribly limit your opportunities.

My argument is that if you aren't good enough to be accepted to a respected university for a degree program as a software engineer, why would you assume you have the necessary skills to work in the video game industry?

My argument is that you should pursue a college degree in computer science at a respected university.  By doing so, you open yourself up to the opportunity to get a job in a lot of fields with any number of technology companies.  The minimum math requirement to get a degree in computer science at most schools is three semesters of calculus, differential equations, matrix and linear algebra.

Now, you may not need all of that math to actually program games.  But you sure need all of that math to get a computer science degree.  I think this may be the distinction people were missing in my original post.

Yes, lots of things in computer games today require a solid math background.  And, yes, there is also a lot of work in computer games which does not require great math skills.  But, I assure you, a degree in computer science from a respected university is going to require you to take these classes.

The audience for my original article was parents with teenagers wanting to pursue 'video games' as a career.  My point was, and remains, that if your child does not have an aptitude for math then it's probably a strong indication they are not going to be successful on a path as a software engineer.

I didn't mean to imply that your kid must be a 'math genius' literally.  What I meant is that only a small percentage of high school students are interested in and take advanced math classes.  Compared to the rest of the student body these kids are the 'brains', the 'smart kids'.  The mere fact that you take a calculus class in high school puts you in some pretty rarefied company.  You are probably on the math team and probably have a deep love and interest in math.

Most importantly to my original point, having that level of math in high-school is generally considered a prerequisite to pursue a degree in computer science.  When I went to school the math requirement was the first thing that weeded out most engineering students.

So, what I was trying to get at is that if your teenager expresses an interest in programming computer games but shows no aptitude for math, that should probably rule them out for pursuing a degree in computer science.

A software engineer is a problem solver.  That is what they are driven to do.  It ultimately doesn't matter so much whether those problems are in computer games, or finance, or robotics, or any one of hundreds of fields.  Personally I have worked on software in a lot of different fields, including doing cardiovascular research at St. Louis University hospital while, at the same time, I was writing a computer game for Electronic Arts.

In my experience, the standards for being hired as a software engineer (in a salaried position) at a game company are really, really, high.

I've done a bunch of hiring in my life and I know the kind of standards we expect candidates to meet.  More to the point, I have gone on interviews at game companies where they wouldn't even consider hiring me either.

I want to be very clear here.  I am not that brilliant of a guy.  I am certainly not a genius or anything like that.  What I am is a very hard worker.  I have a very strong work ethic and I am really good at computer programming and debugging.  I am largely self taught since most everything we do in the game industry was invented long after I left college.

I should also point out that I do not, personally, have a four year college degree.  I have an associate of arts degree and then I attended college at the University of Missouri Rolla to pursue a degree in computer science.

At the time that I attended school we still used punch-cards and programmed in FORTRAN and COBOL.  The C language wasn't even a consideration back then.

Due to financial reasons I was never able to finish my degree and I paid for that heavily.

My employment options were very limited. I could not get a job at any one of hundreds of technology companies which require a bachelor's degree to get in the door.

I made far less money in salary than I could have.  I worked for a couple of years writing games and educational software, largely in assembly language, for a $12,000 a year salary!  At the same exact time (this is 1982) my friends who got their four year degree from UMR in computer science were going to work for McDonnell Douglas, Lockheed, and other companies with starting salaries of nearly $50,000 a year!  It took me many years to get up to that level.

And, over the years, I have seen many self-taught computer programmers, as talented as they may have been, struggle with employment opportunities and lower salaries.

So, my advice is all born out of personal experience.

Can you be a brilliant computer programmer without ever going to school?  Absolutely.  Can you write incredible software without knowing a great deal of math?  Without a doubt.

However....you are digging yourself a very long and deep hole if you do so.

As a parent, do you want to steer your child towards a career path which affords the most opportunity?  Then my advice is, and remains without apology, that you should dissuade them from pursuing a limited education at a 'gaming institute' and instead encourage them to pursue a degree in computer science from a reputable university.  That will afford them the best salary and the widest array of employment opportunities in the future.  And, if they do not have the aptitude, interest, or desire to pursue a degree in computer science, then I would argue, for sure, that getting a lesser degree, with lower standards and a lower quality of education, from a 'gaming school' is a complete and utter waste of time and money.

(4) One thing that really annoyed people is that I talked about how there were 'only two jobs' in the game industry and that is simply not true.

This is another case where I didn't parse things carefully enough.  Yes, there are lots of other jobs in the game industry other than simply artist or programmer.  However, many of those jobs are low paying and, therefore, not viable as a career choice (which was the focus of my original article).

Let's put it this way... because I have been in the game industry for so many years, I am constantly flooded with notices from headhunters.  I daily see them looking for artists and programmers with varying levels of experience. I can barely remember them ever putting out an open casting call for a game designer.  I'm sure it happens, but not that often.  There are a handful of celebrity game designers but it's hardly the sort of thing you go get a dedicated college degree in and then fight off the six figure job offers that come pouring in.

The problem with these other jobs is that for every one job opening there are probably hundreds and hundreds of people who would love to get it.  People want to get into the game industry so badly that they will work for free just for the chance.

I understand how much young people love games and romanticize the idea of working for a game company. But, as a parent, is this really the career path you want to direct your child towards?

Other jobs in the game industry, such as in management and marketing, are certainly available, but a conventional business degree is all that is necessary to pursue those.  For game design, a liberal arts program with some computer classes and game theory are all good, but don't set yourself up for failure by limiting your career choices.

My advice all stays the same.  Don't limit yourself by attending a 'game university'. Get a degree where you can find opportunities in many fields and industries.

But, if you pursue a path where the single and only outcome is getting one of the few, rare, jobs open for a game designer in the game industry, you are setting yourself up for a major disappointment.  The chances are really quite low.

(5) Another critique is that I didn't address all of the opportunities available in the game industry as a whole.

This is true because my article was specifically addressing salaried positions available from established game companies as a career choice.  It would be a completely separate article if I were to talk (enthusiastically) about the indie gaming scene and entrepreneurial business models in gaming.

It's a super exciting time in the game industry with the advent of the mobile platforms.  There are all kinds of incredible opportunities for young people to get into games and completely bypass the traditional established game publishers.

And, you sure don't need a college education to pursue these opportunities either.  However, that really isn't what my previous article was about.

For a more colorful observation on this topic, I direct you to a blog post by my friend Charles Bloom.

(6) This one nobody argued with me about.  The game industry as a 'career' is extremely unstable, and this simply reinforces how important it is not to limit yourself to this very iffy field.  Just to hammer home that point, read these headlines that just came out *today*!  Note that the reward for one team shipping a game is that they all lose their jobs; hurray!

By the way, to make my point further about the predatory practices of these 'gaming colleges', it's more than a little bit amusing that Google AdWords keeps inserting 'game design school' advertisements at the top of these posts now.  It's probably not helping their cause much...

Square Enix LA office again hit with layoffs

by John Keefer, Apr 03, 2013 1:00pm PDT
The Los Angeles offices of Square Enix America have again been hit with an unspecified number of layoffs, reportedly including the CEO and head of marketing. The office houses mostly marketing and public relations staff.




Activision lays off 40 at Deadpool developer

by John Keefer, Apr 03, 2013 12:00pm PDT
Deadpool is scheduled to come out sometime this summer, and development is winding down at High Moon Studios. Unfortunately for 40 full-time members of the dev team, that not only meant an end to their work on the game, but a pink slip as well.





Disney closes LucasArts, cancels current projects

by Steve Watts, Apr 03, 2013 10:40am PDT
Disney has shut down LucasArts and canceled all current projects, as part of a move toward licensing out the Star Wars brand to other publishers. This would impact Star Wars 1313 and Star Wars: First Assault.


03 Apr 19:02

General wisdom

by noreply@blogger.com (Dennis Gustafsson)
I'm following quite a few game programming blogs, and whenever there is a post about a lifehack or general wisdom that can help me simplify my work I'm all ears. So, I thought I'd share some of my own experiences:

Automate everything that can be automated. Especially project file generation. Editing project files is a real energy drainer, and even though IDEs try to make the process smooth, it never is. This really becomes a big problem when multiple platforms come into the picture. Personally I have a Python script that takes a few input parameters, scans the source tree and outputs a nice project file for Visual Studio, or a makefile. You have to bite the sour apple every time Visual Studio changes its project file format, but it's so worth it. I also have similar scripts for documentation, distribution and in some cases code generation. Writing the scripts takes a while, but they can be reused, you get better at writing them every time you do it, and it's more fun than doing dull monkeywork over and over again.

Minimize external library dependencies. People are way too eager to include external libraries and middleware in their projects. I think it is very common that the use of libraries and middleware ends up costing way more than it would have cost to just write the code yourself. Only include an external library in your project if it: 1) Solves one specific task extremely well. 2) Can be trusted to do that. 3) Puts you in control of all memory and IO operations. 4) Can easily be included as source code.

Keep everything in the same project. This ties into the last criterion for using external libraries above. I want all third party libraries to be part of the source tree. Not a dynamic library, not a static library, not even a separate project in Visual Studio, just plain source code in a separate folder. This is important, because it simplifies cross-platform development, especially when automatically generating project files. It also completely takes away the problems with conflicting runtimes for static libraries, mismatching PDBs, etc. It's all going in the same binary anyway, just put your files in the same project and be done with it.

Refactor code by first adding the new and then removing the old. I used to do it the other way around for a long time, ripping out what's ugly and leaving the code base in a broken state until the replacement code is in place. Yes, it sounds kind of obvious in retrospect, but it took me a long time to actually implement this behavior in practice. The only practical problem I've experienced is naming clashes. I usually add a suffix to the replacement code while developing and then remove it once the original code is gone. As an example, if you want to replace your Vector3, create a new class called Vector3New, and then gradually move your code base over to using Vector3New, while continuously testing, and when you're done, remove the original Vector3 and do a search/replace for Vector3New to Vector3.

Don't over-structure your code. This one is really hard. People often talk about code bases lacking structure, but I think it's a much worse and more common problem that a code base has inappropriate structure, or just too much of it. Consider this - given two implementations of some algorithm, where one is a couple of large messy functions in a single file and the other is fifteen files with a ton of inherited classes, abstract interfaces, visitors and decorators. Given that neither of them suits your current needs, which one would you rather refactor? My point is that you shouldn't try to structure something until you know all the requirements. Not to save time first building it, but because it's a pain in the ass to restructure something that already has structure. You can compare it to building a house. Would you rather start with a pile of building material or first disassemble an existing building? To me that's a no-brainer, even if the pile happens to be quite messy. Hence, never define an abstract interface with only one implementation, never write a manager that manages one object, etc. Just start out writing your desired functionality in the simplest possible way, then structure it if and when there is a need for it.

Stay away from modern C++ features and standard libraries. I've tried introducing bits and pieces from STL, boost, exceptions and RTTI throughout the years, but every time I do, something comes out and bites me right in the butt. Buggy implementations, compiler bugs, missing features, restrictions on memory alignment, etc. This is depressing and discouraging, but the sad truth we have to deal with. If you want your code to be truly portable without the hassle (not just in theory, but in practice) you'll have to stick to a very small subset of the C++ standard. In my experience it's better to just accept this and design for it rather than putting up a fight.

Use naming prefixes rather than namespaces. I was advocating namespaces for a long time, but now I've switched sides completely and use prefixes for everything. I kind of agree prefixes are ugly, but they have two obvious benefits that just make it worth it. A) You can search your code base for all instances of a particular class or function, and B) it makes forward declarations as easy as they should be. With namespaces, especially nested ones, forward declarations are just painful, to the point where you tend to not use them at all, leaving you with ridiculous build times. I usually don't even forward declare classes at the top any more, but rather inline them where needed, like: "class PfxSomeClass* myFunction(class PfxSomeOtherClass& param)".
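For comparison, a tiny sketch of the two styles (all names here are invented for illustration):

// With a prefix, the forward declaration can live inline, right in the signature:
class PfxMesh* PfxLoadMesh(const char* path, class PfxAllocator& alloc);

// With nested namespaces, the forward declarations have to be spelled out up front:
namespace engine { namespace io { class Mesh; class Allocator; } }
engine::io::Mesh* LoadMesh(const char* path, engine::io::Allocator& alloc);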


02 Apr 08:03

007 Legends - The World

by Simon

This article moved to its own webspace! You can follow it by clicking this:
01 Apr 13:27

GDC 2013

by Jare

It’s that time of the year again, when everyone else is partying and having fun (or maybe not – WTF IGDA?) at GDC in San Francisco. I’ll try to collect any links to lecture materials I come across. If you know of stuff missing here, please tell me about it on Twitter @TheJare. You can also check industry sites like Gamasutra, Polygon or Develop for ongoing coverage, and hopefully a lot of these materials will show up in the GDC Vault soon.

Edit: also check out eXile’s awesome and better organized compilation (focused on programming & rendering).

Constantly updated…

30 Mar 11:04

GDC 2013 Slides

by Ogre

 

I uploaded the slides from my GDC talk, “Network Serialization and Routing in World of Warcraft”.  You can download them in several formats:

  • Keynote - what I used to build and present them
  • PowerPoint - auto-exported from Keynote, haven’t checked it AT ALL.  Likely has a few broken things
  • PDF - maybe your best bet to just read them

Also, Simon Koo (@sm9kr) seems to have written a summary of the talk in Korean along with pictures of all or most of the slides.  I’m taking his word for it that the text is actually about my talk!


29 Mar 10:11

Two Uses of Voxels in LittleBigPlanet 2’s Graphics Engine - Anton and Alex at Siggraph 2011

Alex recently voyaged over land and sea to attend Siggraph 2011 in Vancouver. There he gave a talk entitled ‘Two Uses of Voxels in LittleBigPlanet 2’s Graphics Engine’ as part of the Advances in Real-Time Rendering in 3D Graphics and Games course.

Oooooh!

The slides from that talk are now available to interested people,  along with some other talks on the same subject matter too, from Bungie, Crytek, EA and DICE!

28 Mar 09:33

Adaptive Volumetric Shadow Maps

by mad\rssvilpa

Adaptive Volumetric Shadow Maps (AVSM) are a new technique that allows a developer to generate real-time, adaptively sampled shadow representations for transmittance through volumetric media.  This new data structure and technique allows for generation of dynamic, self-shadowed volumetric media in real-time rendering engines using today’s DirectX 11 hardware.

Each texel of this new kind of shadow map stores a compact approximation to the transmittance curve along the corresponding light ray.  The main innovation of the AVSM technique is a new streaming compression algorithm that is capable of building a constant-storage, variable-error representation of the visibility curve along the light ray’s travel through the media, which can then be used in later shadow lookups.

Another exciting part of this sample is the use of a new Pixel Shader Ordering feature (called PixelSync) found in new Intel graphics hardware. Two implementations are provided in this sample - a standard DirectX implementation, and an implementation that utilizes this new graphics hardware feature.

This sample was authored by Matt Fife (Intel) and Filip Strugar (Intel), with contributions from Leigh Davies (Intel), based on an original whitepaper and sample from Marco Salvi (Intel).

Adaptive Volumetric Shadow Maps whitepaper by Marco Salvi.

Attachments:

http://software.intel.com/sites/default/files/blog/382519/avsm.zip
http://software.intel.com/sites/default/files/blog/382519/avsm-white-paper.docx

28 Mar 09:33

CPU Texture Compositing with InstantAccess

by mad\djbookou

     

Introduction

The InstantAccess extension, also known as Direct Resource Access (DRA), which Intel has implemented for 4th Generation Intel® Core™ processors with Intel® HD Graphics, allows direct access to memory allocated on the GPU. The extension provides a mechanism for specifying which buffers will be shared and for locking the memory for reading and writing from CPU-side code. This sample updates the existing CPU Texture Compositing sample to use the InstantAccess extension for the composited textures.

InstantAccess

The InstantAccess extension provides a mechanism for directly reading from and writing to buffers that are allocated for the GPU. The extension requires some additional set up code and overrides the D3D functions CopyResource and Map. To utilize the extension, some host-side initialization code is required. Sample code is provided in the IGFXExtensionsHelper.h/cpp files. Before using any extension, the Init function must be called.

To utilize InstantAccess, two buffers are created, each with a call to SetDirectAccessResouceExtension immediately before the buffer creation calls. One buffer is created as a normal buffer, but the call to SetDirectAccessResouceExtension marks it as lockable from the CPU. The other buffer is created as a staging buffer, but the call to SetDirectAccessResouceExtension means that the buffer will be used for InstantAccess. Finally, the D3D CopyResource function is called with the two buffers as parameters to bind them together.

To get CPU-side access to the resource, the application calls the D3D Map function on the staging buffer. The data returned in the D3D11_MAPPED_SUBRESOURCE structure is a RESOURCE_SHARED_MAP_DATA structure. This new structure contains information about the GPU resource, including the CPU-side pointer to the memory.

Sample Implementation

The terrain is broken into tiles where each tile composites 5 diffuse and 5 normal map textures together based on a blend texture that spans the entire terrain. The 9 tiles that surround the camera use one diffuse texture and one normal map texture that are pre-composited on the CPU.

The original sample composites textures together asynchronously into staging textures. The main thread maps the staging textures and creates the tasks for compositing. The main thread checks each frame if any textures have finished compositing. The main thread unmaps the staging textures and calls CopyResource to copy the texture from the staging resource to the standard Texture2D resource. The copy uses the graphics execution units to perform a swizzle from the linear memory format of the staging buffer to the tiled format used by the texture resource.

The new InstantAccess pass is very similar, but instead of writing into a staging buffer, the task copies and swizzles the composited texture directly into the texture memory. This avoids the synchronous copy and performs the swizzle on the CPU, saving valuable execution unit time.

     

    Icon Image: 

    Attachments: 

    http://software.intel.com/sites/default/files/blog/382469/cputexturecompositing-dra.zip
  • Sample Code
  • 28 Mar 09:32

    Programmable Blend with PixelSync

    by mad\djbookou

     

    Introduction

    PixelSync, also known as Pixel Shader Ordering, is a graphics extension that Intel has implemented for Intel's 4th Generation Core Processors with processor graphics. PixelSync guarantees ordered access to unordered access view resources from a pixel shader. This sample demonstrates how to use PixelSync to perform blending in a pixel shader without using fixed function blending.

    Pixel Shader Ordering

    PixelSync provides a mechanism for controlling access to memory in the pixel shader. When invoked, all reads and writes to the specified resources from a given pixel location will be performed in submit order. Note that this ordering is not guaranteed across pixels: if pixel shader invocations operating on two different pixel locations attempt to modify the same memory location, no ordering can be assumed.

    To utilize the extension, some host side initialization code is required. Sample code is provided in the IGFXExtensionsHelper.h/cpp files. Before using any extension, the Init function must be called.

    Additionally, prior to using the extension in a pixel shader, the shader must call IntelExt_Init(), provided in IntelExtensions.hlsl, and then either IntelExt_BeginPixelShaderOrderingOnUAV( RENDER_TARGET_UAV_SLOT ) to serialize access to a specific UAV slot, or IntelExt_BeginPixelShaderOrdering( ) to serialize access to all bound resources. The extension mechanism uses the final Render Target View / Unordered Access View slot, so that slot will not be available for use in the shader.

     

    Sample Implementation

    The sample uses a shared-exponent floating-point format for the render target, where 8 bits of precision store the mantissa for each of the red, green, and blue values, and all three channels share a single exponent value. This format is not supported by DirectX; it is fully defined in the pixel shaders and could be modified to change the distribution of values. Because the format is not natively supported, fixed-function blending cannot be configured to correctly blend values in the render target.
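
    As an illustration of the idea only (this is not the sample's exact bit layout, which is defined in its pixel shaders), the C++ sketch below packs three non-negative color channels into 32 bits as 8-bit mantissas plus a shared, biased 8-bit exponent:

        #include <algorithm>
        #include <cmath>
        #include <cstdint>

        // Pack r, g, b (assumed non-negative) into 8-bit mantissas sharing one biased 8-bit exponent.
        uint32_t PackSharedExponent(float r, float g, float b)
        {
            float maxChannel = std::max(r, std::max(g, b));
            if (maxChannel <= 0.0f) return 0;

            int exponent = 0;
            std::frexp(maxChannel, &exponent);                  // maxChannel = m * 2^exponent, m in [0.5, 1)
            exponent = std::min(127, std::max(-128, exponent)); // keep the biased exponent in 8 bits

            float scale = 255.0f / std::ldexp(1.0f, exponent);  // mantissa scale for this exponent
            uint32_t ri = (uint32_t)std::min(255.0f, r * scale);
            uint32_t gi = (uint32_t)std::min(255.0f, g * scale);
            uint32_t bi = (uint32_t)std::min(255.0f, b * scale);
            uint32_t ei = (uint32_t)(exponent + 128);           // bias the exponent
            return ri | (gi << 8) | (bi << 16) | (ei << 24);
        }

        // Reverse of PackSharedExponent.
        void UnpackSharedExponent(uint32_t v, float& r, float& g, float& b)
        {
            int   exponent = (int)(v >> 24) - 128;
            float scale    = std::ldexp(1.0f, exponent) / 255.0f;
            r = (float)( v        & 0xFF) * scale;
            g = (float)((v >>  8) & 0xFF) * scale;
            b = (float)((v >> 16) & 0xFF) * scale;
        }

    With a format like this, the blend in the pixel shader amounts to unpack, blend, re-pack before writing the result back.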

    The sample renders all of the opaque geometry, then binds the render target as an unordered access view, and uses the PixelSync extension to perform blending of transparent geometry. If the extension is not available, the shader will still perform the blending in the pixel shader, but artifacts will appear where transparent geometry overlaps.
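
    A plausible host-side setup for that transparent pass might look like the sketch below; the function and variable names are illustrative and the real sample may organize this differently. The key point is that the color buffer is bound as a UAV instead of a render target, while the depth buffer stays bound for testing against the opaque pass:

        // Rebind the color buffer as a UAV for the transparent pass. The final RTV/UAV
        // slot is reserved by the extension (see above), so the UAV must go in another slot.
        void BeginProgrammableBlendPass(ID3D11DeviceContext*       context,
                                        ID3D11DepthStencilView*    depthView,
                                        ID3D11UnorderedAccessView* colorBufferUav,
                                        UINT                       uavSlot)
        {
            context->OMSetRenderTargetsAndUnorderedAccessViews(
                0, nullptr,          // no conventional render targets bound
                depthView,           // keep depth so transparent pixels still test against opaque geometry
                uavSlot, 1,          // any slot except the final one reserved by the extension
                &colorBufferUav, nullptr);
            // ...draw the transparent geometry; the pixel shader calls IntelExt_Init() and
            // IntelExt_BeginPixelShaderOrderingOnUAV() and performs the blend itself...
        }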

    The sample also provides a path that uses the DXGI_FORMAT_R11G11B10_FLOAT format and fixed-function blending for comparison.


    Attachments: 

    http://software.intel.com/sites/default/files/blog/382465/programmableblend-pixelshaderordering.zip
  • Sample Code
  • 17 Mar 19:10

    GCN's 25 Performance Tips from Nick Thibieroz

    by Matias N. Goldberg
    Update: Tips 26 to 50 were published.

    Nicolas Thibieroz from AMD has been posting a daily series of "Performance Tips" through his Twitter account.

    These performance tips refer to the GCN architecture, which stands for "Graphics Core Next" and can be found in the AMD Radeon HD 7000 series.

    I do NOT work at AMD, but I thought it would be a shame if all these tips were lost in the Twitterverse, which is not the most reliable place to keep long-term documentation or tips. They would just get lost forever or become scattered.

    So, I've gathered all of them and posted them here. I will try to keep this page up to date as he keeps posting more of them.
    Before you start reading: if you're really new to how GPUs work, or some of these tips leave you with "wth is he talking about??", I highly recommend Emil Persson's ATI Radeon HD 2000 programming guide & Depth In Depth. Reading NVIDIA's old GPU Programming Guide for the GeForce 8 & 9 Series is also very enlightening and lets you contrast the differences.
    It's worth noting that many of the old tips still apply, and that some of the new tips are also general advice (they apply to many other architectures as well, including old ones).

    Ok, here you go:
    #1: Issues with Z-Fighting? Use D32_FLOAT_S8X24_UINT format with no performance or memory impact compared to D24S8.

    #2: Binding a depth buffer as a texture will decompress it, making subsequent Z ops more expensive.

    #3: Invest in DirectCompute R&D to unlock new performance levels in your games.

    #4: On current GCN DX11 drivers the maximum recommended size for NO_OVERWRITE dynamic buffers is 4Mb.

    #5: Limit Vertex and Domain Shader output size to 4 float4/int4 attributes for best performance.

    #6: RGBA16 and RGBA16F are fast export, use those to pack G-Buffer data and avoid ROP bottlenecks.

    #7: Design your game engine with geometry instancing support from an early stage.

    #8: Pure Vertex Shader-based solutions can be faster than using the GS or HS/DS.

    #9: Use a ring of STAGING resources to update textures. UpdateSubresource is slow unless texture size is <4Kb.

    #10: DX11 supports free-threaded resource creation, use it to reduce shader compilation and texture loading times.

    #11: Use the smallest Input Layout necessary for a given VS; this is especially important for depth-only rendering.

    #12: Don't forget to optimize geometry for index locality and sequential read access - including procedural geometry.

    #13: Implement backface culling in Hull Shader if tessellation factors are on the high side.

    #14: Use flow control in shaders but watch out for GPR pressure caused by deep nested branches.

    #15: Some shader instructions are costly; pre-compute constants and store them in constant buffers (e.g. reciprocals).

    #16: Use [maxtessfactor(X)] in Hull Shader declaration to control tessellation costs. Max recommended value is 15.

    #17: Filtering 64-bit texture formats is half-rate on current GCN architectures, only use if needed.

    #18: clip, discard, alpha-to-mask and writing to oMask or oDepth disable Early-Z when depth writes are on.

    #19: Writing to a UAV or oDepth disables both Early-Z and Hi-Z unless conservative oDepth is used.

    #20: DispatchIndirect() and Draw[Indexed]InstancedIndirect() can be used to implement some form of conditional rendering.

    #21: Use a multiple of 64 in Compute Shader threadgroup declaration. 256 is often a good choice.

    #22: Occlusion queries will stall the CPU if not used correctly.

    #23: GetDimensions() is a TEX instruction; prefer storing texture dimensions in a Constant Buffer if TEX-bound.

    #24: Avoid indexing into arrays of shader variables - this has a high performance impact.

    #25: Pack Vertex Shader outputs to a float4 vector to optimize attributes storage.

    Personal notes

    These are my personal notes on the tips. Remember, I do not work at AMD and I'm human, so I could be wrong:
    #1: Issues with Z-Fighting? Use D32_FLOAT_S8X24_UINT format with no performance or memory impact compared to D24S8.
    Interestingly, AMD was not recommending DXGI_FORMAT_D24_UNORM_S8_UINT for shadow maps back in 2008. They recommended using DXGI_FORMAT_D16_UNORM (better) or DXGI_FORMAT_D32_FLOAT (slower) instead.
    NVIDIA, on the other hand, recommended DXGI_FORMAT_D24_UNORM_S8_UINT and noted that DXGI_FORMAT_D32_FLOAT has lower ZCULL efficiency. Unlike AMD, they completely disregarded DXGI_FORMAT_D16_UNORM, as it will not save memory or increase performance.

    Source: GDC 08 DirectX 10 Performance
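
    For reference, here is a minimal D3D11 sketch (names are illustrative, not taken from any of the cited decks) of creating a depth-stencil buffer in the D32_FLOAT_S8X24_UINT format, using a typeless base resource so the depth plane could also be bound as a shader resource:

        #include <d3d11.h>

        // Create a 32-bit float depth + 8-bit stencil buffer (tip #1).
        HRESULT CreateD32S8DepthBuffer(ID3D11Device* device, UINT width, UINT height,
                                       ID3D11Texture2D** tex, ID3D11DepthStencilView** dsv)
        {
            D3D11_TEXTURE2D_DESC desc = {};
            desc.Width            = width;
            desc.Height           = height;
            desc.MipLevels        = 1;
            desc.ArraySize        = 1;
            desc.Format           = DXGI_FORMAT_R32G8X24_TYPELESS;   // typeless base resource
            desc.SampleDesc.Count = 1;
            desc.Usage            = D3D11_USAGE_DEFAULT;
            desc.BindFlags        = D3D11_BIND_DEPTH_STENCIL | D3D11_BIND_SHADER_RESOURCE;
            HRESULT hr = device->CreateTexture2D(&desc, nullptr, tex);
            if (FAILED(hr)) return hr;

            D3D11_DEPTH_STENCIL_VIEW_DESC dsvDesc = {};
            dsvDesc.Format        = DXGI_FORMAT_D32_FLOAT_S8X24_UINT; // the format from tip #1
            dsvDesc.ViewDimension = D3D11_DSV_DIMENSION_TEXTURE2D;
            return device->CreateDepthStencilView(*tex, &dsvDesc, dsv);
        }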

    #6: RGBA16 and RGBA16F are fast export, use those to pack G-Buffer data and avoid ROP bottlenecks.

    According to Thibieroz, in 2011 export costs should be calculated as follows:
    AMD: Total Export Cost = ( Num RTs ) * ( Slowest RT )
    NVIDIA: Total Export Cost = Cost( RT0 ) + Cost( RT1 ) + Cost( RT2 ) +...

    I don't know if the same formula still applies for GCN architecture.
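
    Just to spell out the difference between the two formulas, here is a toy C++ rendering of them (costs are in arbitrary units and purely illustrative):

        #include <algorithm>
        #include <numeric>
        #include <vector>

        // AMD (per the 2011 formula): number of render targets times the slowest one.
        float AmdExportCost(const std::vector<float>& rtCosts)
        {
            return rtCosts.empty() ? 0.0f
                : (float)rtCosts.size() * *std::max_element(rtCosts.begin(), rtCosts.end());
        }

        // NVIDIA: simple sum of the individual render target costs.
        float NvidiaExportCost(const std::vector<float>& rtCosts)
        {
            return std::accumulate(rtCosts.begin(), rtCosts.end(), 0.0f);
        }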

    AMD was discouraging the use of RGBA16 back then, so GCN has probably improved in this aspect.
    NVIDIA said cost is proportional to bit depth, except:
    • <32bpp same speed as 32bpp
    • sRGB formats are slower
    • 1010102 & 111110 are slower than 8888
    Source: GDC 2011 Deferred Shading Optimizations

    #12: Don't forget to optimize geometry for index locality and sequential read access - including procedural geometry.
    AMD Tootle is an excellent tool for that. There was also a paper that Tootle was inspired by (or was it the other way around?), so look for it if you want to do your own implementation.
    It's worth noting that this tip is even more important for mobile GPUs.

    #17: Filtering 64-bit texture formats is half-rate on current GCN architectures, only use if needed.
    When asked further about it by Doug Binks, Thibieroz clarified he meant that 64-bit bilinear filtering is half the rate of 64-bit point filtering. I had the same doubt, so I thought it was worth mentioning.

    #18: clip, discard, alpha-to-mask and writing to oMask or oDepth disable Early-Z when depth writes are on.
    Won Chun asked, "Oh, only clip/mask/discarded fragments are affected, not subsequent fragments that land on the same pixel. Cool." to which Thibieroz replied, "Correct :)"
    This is important. IIRC, on some old hardware, using clip/discard would not only prevent Early-Z for that draw call, but also for any subsequent draw call, even if those later passes didn't use discard at all.

    As a further note, it looks like GCN is a step backwards compared to the ATI Radeon HD 2000 in this respect. According to Persson's Depth In Depth, pages 2 and 3, Early-Z was enabled for the discard/clip cases. Maybe the documentation was incorrect, or maybe it really is a step backwards. Bummer.

    #24: Avoid indexing into arrays of shader variables - this has a high performance impact.

    I asked whether he was referring to constant waterfalling. For those who don't know, constant waterfalling happens when constant variables are indexed as an array: when the vertices being processed together each index a different portion of the constant registers, the operations have to be serialized (e.g. HW skinning and some forms of instancing). Therefore, you should arrange/sort the vertices so that they all read the same index sequentially to reduce serialization. Constant waterfalling doesn't happen if all your vertices access the same indices in the same order (e.g. most lighting calculations).

    Anyway, he wasn't referring to that; he answered: "Referring to declaring (and indexing into) arrays of temp variables. Those increase GPR usage and affect latency hiding".
    Well, that one's new to me!


    Last Updated:  2013-03-16