Shared posts

03 Aug 21:01

Second Reality – source code

by admin
Just a quick follow-up to my previous note. As mentioned by Michal, the Future Crew guys decided to celebrate the 20th anniversary of Second Reality in the best way possible – they released the full source code. Obviously, it’s more of a tidbit than anything else, but it’s still interesting to finally see how certain effects [...]
02 Aug 05:35

Demoscene tribute – Second Reality

by admin
I know I claimed I would not write about Second Reality here, mostly because everyone knows it, but it’s a special day today… It’s been exactly 20 years since Second Reality was shown for the first time, at Assembly 1993. I saw it a few months later and still remember that day. I was living [...]
01 Aug 10:25

A Macbook Air and a Thunderbolt GPU

by Brian Benchoff


When Intel and Apple released Thunderbolt, hallelujahs from the Apple choir were heard. Since very little in any of Apple’s hardware lineup is upgradeable, an external video card is the best of all possible worlds. Unfortunately, Intel doesn’t seem to be taking kindly to the idea of external GPUs. That hasn’t stopped a few creative people like [Larry Gadea] from figuring it out on their own. Right now he’s running a GTX 570 through the Thunderbolt port of his MacBook Air, and displaying everything on the internal LCD. A dream come true.

[Larry] is doing this with a few fairly specialized bits of hardware. The first is a Thunderbolt to ExpressCard/34 adapter, after that an ExpressCard to PCI-E adapter. Couple that with a power supply, GPU, and a whole lot of software configuration, and [Larry] had a real Thunderbolt GPU on his hands.

There are, of course, a few downsides to running a GPU through a Thunderbolt port. The current Thunderbolt spec is equivalent to a PCI-E 4X slot, a quarter of what is needed to get all the horsepower out of high-end GPUs. That being said, it is an elegant-yet-kludgy way to get better graphics performance on the MBA.

Demo video below.


Filed under: macs hacks
29 Jul 06:35

Larry and Jen Do Roman Numerals in C++—Jon Jagger and Olve Maudal

Yes, C++ is for beginning programmers too. This is a delightful introduction to C++ programming -- and with nice test-first style to boot.

Larry and Jen Do Roman Numerals in C++

by Jon Jagger and Olve Maudal

26 Jul 08:26

SIGGRAPH 2013 Course: Physically Based Shading in Theory and Practice

by Timothy Lottes
16 Jul 08:15

DX11: GPU "printf"

by DEADC0DE
So, first a little "announcement": I'm crafting a small DX11 rendering framework in my spare time. I want to have it open-sourced, and it's based on MJP's excellent SampleFramework11.
The goals are to provide an environment roughly as fast to iterate upon as FXComposer was (I consider it dead now...) but for programmers, without being a "shader editor".
If you're interested in collaborating, send me an email at c0de517e (it's a gmail account) with a brief introduction, there is an interesting list of things to do.

That said, this is a little bit of functionality Maurizio Cerrato and I have been working on over a couple of days: a "printf"-like function for pixel (and compute) shaders. It all started when chatting with Daniel Sewell (a brilliant guy, he was my rendering lead on Fight Night): he pointed out that, while working on compute shaders, he found that a neat way to debug them was to display all kinds of interesting debug visualizations by having geometry shaders "decode" buffers and emit lines.

if(IsDebuggedPixel(input.PositionSS.xy)) DebugDrawFloat(float2(ssao, bloom.x), clipPos);
The astute readers will at this point have already figured it all out. PS and CS support append buffers, so a "printf" only has to append some data to a buffer that you can later convert to lines in a geometry shader.

You could emit such data for each PS invocation and later sift through it and display what you need in a meaningful way, but that would be quite slow (and at that point you might want to consider just packing everything into some MRT outputs). The idea behind append buffers is to do the work only for a handful of invocations (e.g. screen positions: if the current sv_position equals the pixel to "debug", then GPU printf...).

To keep everything snappy we also minimize the structure size we use in the append buffer: you can't really printf strings; so far the debugger supports only one to three floats with a color and a position, or lines. Lines are where we started really: our struct contains two endpoints, a color (index) and a flag which distinguishes lines from float printf. Floats just reinterpret one of the endpoints as the data to print.

This append buffer structure then gets fed to a VS/GS that is invoked twice as many times as the append buffer count (via draw indirect; you need to multiply the count by two in a small CS. Remember, you can't emit the start/end vertices as two separate append calls because their order is not deterministic and the vertices would all end up mixed in the buffer!), and the GS emits extra lines if we're printing floats, to display a small line-based font.

If you're thinking that is lame, well, it is: there are limits on the number of primitives the GS can emit, which effectively limit the number of digits you can display, and you have to be careful about that. I "optimized" the code to display the most digits possible, which unfortunately gives you a very low-precision 3-float printf and higher-precision 2-float and 1-float versions (you could, though, call the 1-float version three times... as there the ordering of the three calls doesn't matter).

Keeping the same number of printed digits, the point has to float...
Why not use a bitmap font instead? Glad you asked. Laziness, partially justified by the fact that I didn't want to have two different append buffers, one for lines and one for fonts, as append buffers are a scarce resource on DX11. But it's a very lame justification, because there are plenty of workarounds left for the reader: you could filter the append buffer into two drawcalls with a compute shader, or even draw lines as quads, which would probably be better anyway!

Anyhow, together with shader hot-reloading (which everybody has, right?), this is quite a handy trick. Bonus: on a similar note, have a look at this Shadertoy snippet by my coworker Paul Malin... brilliant guy!

Some code, without doubt full of bugs:

Snippet from the CPU/C++ side, drawing the debug lines...
void ShaderDebugDraw(ID3D11DeviceContext* context, const Float4x4& viewProjectionMatrix, const Float4x4& projMatrix )
{
    SampleFramework11::PIXEvent marker(L"ShaderDebug Draw");

    context->CopyStructureCount(AppendBufferCountCopy, 0, AppendBuffer.UAView);

    // We need a compute shader to write BufferCountUAV, as we need to multiply CopyStructureCount by two
    ID3D11ShaderResourceView* srViews[] = { AppendBuffer.SRView };
    ID3D11UnorderedAccessView* uaViews[] = { AppendBufferCountCopyUAV };
    UINT uavsCount[] = { 0 };
    context->CSSetUnorderedAccessViews(1, 1, uaViews, uavsCount);
    context->CSSetShader(DebugDrawShader.AcquireCS(), NULL, 0);
    context->Dispatch(1,1,1);
    context->CSSetShader(NULL, NULL, 0);
    uaViews[0] = NULL;
    context->CSSetUnorderedAccessViews(1, 1, uaViews, uavsCount);

    // Set all IA stage inputs to NULL, since we're not using it at all.
    void* nulls[D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT] = { NULL };

    context->IASetVertexBuffers(0, D3D11_IA_VERTEX_INPUT_RESOURCE_SLOT_COUNT, (ID3D11Buffer**)nulls, (UINT*)nulls, (UINT*)nulls);
    context->IASetInputLayout(NULL);
    context->IASetIndexBuffer(NULL, DXGI_FORMAT_UNKNOWN, 0);

    // Draw debug lines
    srViews[0] =  AppendBuffer.SRView;
    context->VSSetShaderResources(0, 1, srViews);
    context->GSSetShaderResources(0, 1, srViews);
    context->GSSetShader(DebugDrawShader.AcquireGS(), NULL, 0);
    context->VSSetShader(DebugDrawShader.AcquireVS(), NULL, 0);
    context->PSSetShader(DebugDrawShader.AcquirePS(), NULL, 0);

    shaderDebugDrawDataVS.Data.ViewProjection = viewProjectionMatrix;
    shaderDebugDrawDataVS.Data.Projection = projMatrix;
    shaderDebugDrawDataVS.ApplyChanges(context);
    shaderDebugDrawDataVS.SetVS(context, 0);

    context->IASetPrimitiveTopology(D3D11_PRIMITIVE_TOPOLOGY_LINELIST);
    context->DrawInstancedIndirect(AppendBufferCountCopy, 0);
[...]
This is roughly how the shader library looks for emitting debug lines/debug numbers from pixel shaders
struct ShaderDebugLine
{
float3 posStart;
float3 posEnd;
uint color;
uint flag;
};

cbuffer ShaderDebugData : register(b13)
{
float2 debugPixelCoords;
float2 oneOverDisplaySize;
int debugType;
};
void DebugDrawFloat(float3 number,  float3 pos, int color = 0, uint spaceFlag = SHADER_DEBUG_FLAG_2D)
{
ShaderDebugLine l;
l.posStart = pos;
l.color = color;
l.posEnd = number;
l.flag = SHADER_DEBUG_PRIM_FLOAT3|spaceFlag;
ShaderDebugAppendBuffer.Append(l);
}
float2 SVPosToClipspace(float2 svPos, float2 oneOverDisplaySize) { return (svPos * oneOverDisplaySize) * float2(2,-2) + float2(-1,1); }

bool IsDebuggedPixel(float2 svPos)
{
// This is a bit tricky because it depends on the MSAA pattern

if(debugType == 1)
return dot(abs(debugPixelCoords - svPos + float2(0.5,0.5)), 1.0.xx) <= 0.01f;
else if(debugType == 2)
return dot(abs(svPos % float2(100,100)), 1.0.xx) <= 1.01f;
else return false;
}
And finally, the VS/GS/CS shaders needed to draw the debug buffer emitted from the various PS executions:
static const int DigitFontOffsets[] =
{
0, 8, 10, 20, 30, 38, 48, 58, 62, 72, 82, 84, 86
};

static const float DigitFontScaling = 0.03;
static const float DigitFontWidth = 0.7 * DigitFontScaling; // The font width is 0.5, but we add spacing
static const int DigitFontMaxLinesPerDigit = 5;
static const float2 DigitFont[] =
{
/* 0 */
float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
float2(0.5f, -1.f), float2(0.f, -1.f), float2(0.f, -1.f), float2(0.f, 0.f),
/*1*/
float2(0.5f, 0.f), float2(0.5f, -1.f),
/*2*/
float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -0.5f),
float2(0.5f, -0.5f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.f, -1.f),
float2(0.f, -1.f), float2(0.5f, -1.f),
/*3*/
float2(0.f, 0.f), float2(0.5f,0.f), float2(0.5f,0.f), float2(0.5f,-0.5f),
float2(0.5f,-0.5f), float2(0.f,-0.5f), float2(0.5f,-0.5f), float2(0.5f,-1.f),
float2(0.5f,-1.f), float2(0.f,-1.f),
/*4*/
float2(0.f, 0.f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
float2(0.5f, -0.5f), float2(0.5f, 0.f), float2(0.5f, -0.5f), float2(0.5f, -1.f),
/*5*/
float2(0.f, 0.f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
float2(0.5f, -0.5f), float2(0.5f, -1.f), float2(0.f, 0.f), float2(0.5f, 0.f),
float2(0.f, -1.f), float2(0.5f, -1.f),
/*6*/
float2(0.f, 0.f), float2(0.f, -1.f), float2(0.f, -0.5f), float2(0.5f, -0.5f),
float2(0.5f, -0.5f), float2(0.5f, -1.f), /* avoidable */ float2(0.f, 0.f), float2(0.5f, 0.f),
float2(0.f, -1.f), float2(0.5f, -1.f),
/*7*/
float2(0.5f, 0.f), float2(0.5f, -1.f), float2(0.5f, 0.f), float2(0.f, 0.f),
/* 8 */
float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
float2(0.5f, -1.f), float2(0.f, -1.f), float2(0.f, -1.f), float2(0.f, 0.f),
float2(0.f, -0.5f), float2(0.5f, -0.5f),
/*9*/
float2(0.f, 0.f), float2(0.5f, 0.f), float2(0.5f, 0.f), float2(0.5f, -1.f),
float2(0.5f, -0.5f), float2(0.f, -0.5f), float2(0.f, -0.5f), float2(0.f, 0.f),
float2(0.5f, -1.f), float2(0.f, -1.f),
/*-*/
float2(0.5f, -0.5f), float2(0.f, -0.5f),    
/*.*/
float2(0.8f, -0.9f), float2(0.9f, -1.f),
};

cbuffer ShaderDebugDrawData : register(b0)
{
float4x4 Projection;
float4x4 ViewProjection;
};

struct vsOut
{
float4 Pos : SV_Position;
float3 Color : TexCoord0;
};

StructuredBuffer<ShaderDebugLine> ShaderDebugStructuredBuffer : register(t0); // SRV, bound via VS/GSSetShaderResources slot 0
RWBuffer<uint> StructureCount : register(u1);

void DebugDrawDigit(int digit, float4 pos, inout LineStream<vsOut> GS_Out, float3 color)
{  
for (int i = DigitFontOffsets[digit]; i < DigitFontOffsets[digit+1] - 1; i+=2)
{
vsOut p;
p.Color = color;

p.Pos = pos + float4(DigitFont[i] * DigitFontScaling, 0, 0);
GS_Out.Append(p);

p.Pos = pos + float4(DigitFont[i +1] * DigitFontScaling, 0, 0);
GS_Out.Append(p);

GS_Out.RestartStrip();
}
}

float4 DebugDrawIntGS(int numberAbs, uint numdigit, float4 pos, inout LineStream<vsOut> GS_Out, float3 color)
{
while(numdigit > 0)
{
DebugDrawDigit(numberAbs % 10u , pos, GS_Out, color);
numberAbs /= 10u;
--numdigit;
pos.x -= DigitFontWidth;
}

return pos;
}

void DebugDrawFloatHelperGS(float number, float4 pos, inout LineStream<vsOut> GS_Out, float3 color, int totalDigits)
{
float numberAbs = abs(number);
uint intPart = (int)numberAbs;
uint intDigits = 0;

if(intPart > 0)
intDigits = (uint) log10 ((float) intPart) + 1;

uint fractDigits = max(0, totalDigits - intDigits);

// Get the fractional part 
uint fractPart = round(frac(numberAbs) * pow(10, (fractDigits-1)));

// Draw the fractional part
pos = DebugDrawIntGS(fractPart, fractDigits, pos, GS_Out, color * 0.5 /* make fractional part darker */);

// Draw the .
pos.x -= DigitFontWidth * 0.5;
DebugDrawDigit(11, pos, GS_Out, color);
pos.x += DigitFontWidth * 0.25;

// Draw the int part
if (numberAbs > 0)
{
pos = DebugDrawIntGS(intPart, intDigits, pos, GS_Out, color);
if (number < 0)
DebugDrawDigit(10 /* draw a minus sign */, pos, GS_Out, color);
}
}

vsOut VS(uint VertexID : SV_VertexID)
{
uint index = VertexID/2;

uint col = ShaderDebugStructuredBuffer[index].color;
uint flags = ShaderDebugStructuredBuffer[index].flag;

float3 pos;
if((VertexID & 1)==0) // we're processing the start of the line
pos = ShaderDebugStructuredBuffer[index].posStart;
else // we're processing the end of the line
pos = ShaderDebugStructuredBuffer[index].posEnd;

vsOut output = (vsOut)0;
output.Color = ShaderDebugColors[col];

if(flags & SHADER_DEBUG_FLAG_2D)
output.Pos = float4(pos.xy,0,1);
else if (flags & SHADER_DEBUG_FLAG_3D_VIEWSPACE)
output.Pos = mul( float4(pos.xyz,1.0) , Projection);
else // we just assume SHADER_DEBUG_FLAG_3D_WORLDSPACE otherwise
output.Pos = mul( float4(pos.xyz,1.0) , ViewProjection);

return output;
}

[numthreads(1,1,1)]
void CS(uint3 id : SV_DispatchThreadID)
{
StructureCount[0] *= 2;
StructureCount[1] = 1;
StructureCount[2] = 0;
StructureCount[3] = 0; 
}

float4 PS(vsOut input) : SV_Target0
{
return float4(input.Color, 1.0f);
}

// Worst case we print 3 floats... 4 digits per float, plus we need 4 vertices for the . and -, and another 4 for the cross
[maxvertexcount(3 * (4*(2*DigitFontMaxLinesPerDigit)+4) + 4)]
void GS(line vsOut gin[2], inout LineStream<vsOut> GS_Out, uint PrimitiveID : SV_PrimitiveID)
{
// We'll get two vertices, one primitive, out of the VS for each element in ShaderDebugStructuredBuffer...
// TODO: we could avoid reading ShaderDebugStructuredBuffer if we passed the number flag along from the VS
ShaderDebugLine dbgLine = ShaderDebugStructuredBuffer[PrimitiveID];

// If we got a line, then just re-emit the line coordinates
if((dbgLine.flag & SHADER_DEBUG_PRIM_MASKBITS) == SHADER_DEBUG_PRIM_LINE)
{
GS_Out.Append(gin[0]);
GS_Out.Append(gin[1]);
GS_Out.RestartStrip();

return;
}

float4 pos = gin[0].Pos;

// Draw cross
vsOut p;
p.Color = gin[0].Color;

p.Pos = pos + float4(DigitFontWidth*0.5,0,0,0);
GS_Out.Append(p);
p.Pos = pos + float4(-DigitFontWidth*0.5,0,0,0);
GS_Out.Append(p);
GS_Out.RestartStrip();

p.Pos = pos + float4(0,DigitFontWidth*0.5,0,0);
GS_Out.Append(p);
p.Pos = pos + float4(0,-DigitFontWidth*0.5,0,0);
GS_Out.Append(p);
GS_Out.RestartStrip();

// Draw the numbers, as lines
pos += float4(0,-DigitFontWidth*1.5,0,0);
float3 number = gin[1].Pos.xyz;

if ((dbgLine.flag & SHADER_DEBUG_PRIM_MASKBITS) == SHADER_DEBUG_PRIM_FLOAT1)
{
// Less floats drawn means we can afford more precision without exceeding maxvertexcount
DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 12);
}
else if ((dbgLine.flag & SHADER_DEBUG_PRIM_MASKBITS) == SHADER_DEBUG_PRIM_FLOAT2)
{
// Less floats drawn means we can afford more precision without exceeding maxvertexcount, 12/2 = 6 digits
DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 6);
pos.y -= DigitFontWidth * 2;
DebugDrawFloatHelperGS(number.y, pos, GS_Out, gin[0].Color, 6);
}
else //if ((dbgLine.flag & SHADER_DEBUG_PRIM_MASKBITS) == SHADER_DEBUG_PRIM_FLOAT3)
{
// 3*4 we draw 12 digits here...
DebugDrawFloatHelperGS(number.x, pos, GS_Out, gin[0].Color, 4);
pos.y -= DigitFontWidth * 2;
DebugDrawFloatHelperGS(number.y, pos, GS_Out, gin[0].Color, 4);
pos.y -= DigitFontWidth * 2;
DebugDrawFloatHelperGS(number.z, pos, GS_Out, gin[0].Color, 4);
}
}
12 Jul 13:13

Beat Boxing Takes the Big Stage in This Amazing Video

Jose L. Hidalgo

This guy is epic

Submitted by: Unknown

Tagged: Music , BAMF , beat boxing , funny , Video , g rated , win
10 Jul 16:26

Seashell

This is roughly equivalent to 'number of times I've picked up a seashell at the ocean' / 'number of times I've picked up a seashell', which in my case is pretty close to 1, and gets much closer if we're considering only times I didn't put it to my ear.
09 Jul 19:15

mathimage #20: volcanic landscape

by admin

This one was, again, completely improvised.

It all started with a sphere, but after 30 minutes of exploring a couple of random ideas, I had settled on some sort of flat landscape full of caves (or, “empty with caves”?). From there I spent two hours adding a rocky feel to it, finding some nice color/material distributions for the depths (yellows and browns) and the aerial bits at the top (greys and greens) through three lights (sun, sky, and bounce), adding volumetric smoke and oranges for the lava, and calling it a day. A very successful day! Because, despite all the fakery involved here (which would make any computer graphics professional scream in pain, for this methodology and these tools are exactly the opposite of how you are supposed to do things), the image is compelling and definitely complex for what it is: 300 lines of code/maths!

The realtime version with source code is here:

And a prerendered video (for those with slow computers), with an extra cinematic look, here:

09 Jul 09:43

SIMD transposes 1

by fgiesen

This one tends to show up fairly frequently in SIMD code: Matrix transposes of one sort or another. The canonical application is transforming data from AoS (array of structures) to the more SIMD-friendly SoA (structure of arrays) format: For concreteness, say we have 4 float vertex positions in 4-wide SIMD registers

  p0 = { x0, y0, z0, w0 }
  p1 = { x1, y1, z1, w1 }
  p2 = { x2, y2, z2, w2 }
  p3 = { x3, y3, z3, w3 }

and would really like them in transposed order instead:

  X = { x0, x1, x2, x3 }
  Y = { y0, y1, y2, y3 }
  Z = { z0, z1, z2, z3 }
  W = { w0, w1, w2, w3 }

Note that here and in the following, I’m writing SIMD 4-vectors as arrays of 4 elements – none of this nonsense that some authors tend to do where they write vectors as “w, z, y, x” on Little Endian platforms. Endianness is a concept that makes sense for numbers and no sense at all for arrays, which SIMD vectors are, but that’s a rant for another day, so just be advised that I’m always writing things in the order that they’re stored in memory.

Anyway, transposing vectors like this is one application, and the one I’m gonna stick with for the moment because it “only” requires 4×4 values, which are the smallest “interesting” size in a certain sense. Keep in mind there are other applications though. For example, when implementing 2D separable filters, the “vertical” direction (filtering between rows) is usually easy, whereas “horizontal” (filtering between columns within the same register) is trickier – to the point that it’s often faster to transpose, perform a vertical filter, and then transpose back. Anyway, let’s not worry about applications right now, just trust me that it tends to come up more frequently than you might expect. So how do we do this?

One way to do it

The method I see most often is to try and group increasingly larger parts of the result together. For example, we’d like to get “x0” and “x1” adjacent to each other, and the same with “x2” and “x3”, “y0” and “y1” and so forth. The canonical way to do this is using the “unpack” (x86), “merge” (PowerPC) or “unzip” (ARM NEON) intrinsics. So to bring “x0” and “x1” together in the right order, we would do:

  a0 = interleave32_lft(p0, p1) = { x0, x1, y0, y1 }

where interleave32_lft (“interleave 32-bit words, left half”) corresponds to UNPCKLPS (x86, floats), PUNPCKLDQ (x86, ints), or vmrghw (PowerPC). And to be symmetric, we do the same thing with the other half, giving us:

  a0 = interleave32_lft(p0, p1) = { x0, x1, y0, y1 }
  a1 = interleave32_rgt(p0, p1) = { z0, z1, w0, w1 }

where interleave32_rgt corresponds to UNPCKHPS (x86, floats), PUNPCKHDQ (x86, ints), or vmrglw (PowerPC). The reason I haven’t mentioned the individual opcodes for NEON is that their “unzips” always work on pairs of registers and handle both the “left” and “right” halves at once, forming a combined

  (a0, a1) = interleave32(p0, p1)

operation (VUZP.32) that also happens to be a good way to think about the whole operation on other architectures – even though it is not the ideal way to perform transposes on NEON, but I’m getting ahead of myself here. Anyway, again by symmetry we then do the same process with the other two rows, yielding:

  // (a0, a1) = interleave32(p0, p1)
  // (a2, a3) = interleave32(p2, p3)
  a0 = interleave32_lft(p0, p1) = { x0, x1, y0, y1 }
  a1 = interleave32_rgt(p0, p1) = { z0, z1, w0, w1 }
  a2 = interleave32_lft(p2, p3) = { x2, x3, y2, y3 }
  a3 = interleave32_rgt(p2, p3) = { z2, z3, w2, w3 }

And presto, we now have all even-odd pairs nicely lined up. Now we can build X by combining the left halves from a0 and a2. Their respective right halves also combine into Y. So we do a similar process like before, only this time we’re working on groups that are pairs of 32-bit values – in other words, we’re really dealing with 64-bit groups:

  // (X, Y) = interleave64(a0, a2)
  // (Z, W) = interleave64(a1, a3)
  X = interleave64_lft(a0, a2) = { x0, x1, x2, x3 }
  Y = interleave64_rgt(a0, a2) = { y0, y1, y2, y3 }
  Z = interleave64_lft(a1, a3) = { z0, z1, z2, z3 }
  W = interleave64_rgt(a1, a3) = { w0, w1, w2, w3 }

This time, interleave64_lft (interleave64_rgt) correspond to MOVLHPS (MOVHLPS) for floats on x86, PUNPCKLQDQ (PUNPCKHQDQ) for ints on x86, or VSWP of d registers on ARM NEON. PowerPCs have no dedicated instruction for this but can synthesize it using vperm. The variety here is why I use my own naming scheme in this article, by the way.
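
For concreteness, here is what this variant might look like in C++ with SSE intrinsics (a sketch; the function name is mine, and the intrinsics map to UNPCKLPS/UNPCKHPS and MOVLHPS/MOVHLPS as described above):

  #include <xmmintrin.h>

  // Method 1: 32-bit interleaves, then 64-bit interleaves. Transposes in place.
  static inline void Transpose4x4_Interleave(__m128 &p0, __m128 &p1, __m128 &p2, __m128 &p3)
  {
      __m128 a0 = _mm_unpacklo_ps(p0, p1); // { x0, x1, y0, y1 }
      __m128 a1 = _mm_unpackhi_ps(p0, p1); // { z0, z1, w0, w1 }
      __m128 a2 = _mm_unpacklo_ps(p2, p3); // { x2, x3, y2, y3 }
      __m128 a3 = _mm_unpackhi_ps(p2, p3); // { z2, z3, w2, w3 }

      p0 = _mm_movelh_ps(a0, a2);          // X = { x0, x1, x2, x3 }
      p1 = _mm_movehl_ps(a2, a0);          // Y = { y0, y1, y2, y3 }
      p2 = _mm_movelh_ps(a1, a3);          // Z = { z0, z1, z2, z3 }
      p3 = _mm_movehl_ps(a3, a1);          // W = { w0, w1, w2, w3 }
  }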

Anyway, that’s one way to do it with interleaves. There’s more than one, however!

Interleaves, variant 2

What if, instead of interleaving p0 with p1, we pair it with p2 instead? By process of elimination, that means we have to pair p1 with p3. Where does that lead us? Let’s find out!

  // (b0, b2) = interleave32(p0, p2)
  // (b1, b3) = interleave32(p1, p3)
  b0 = interleave32_lft(p0, p2) = { x0, x2, y0, y2 }
  b1 = interleave32_lft(p1, p3) = { x1, x3, y1, y3 }
  b2 = interleave32_rgt(p0, p2) = { z0, z2, w0, w2 }
  b3 = interleave32_rgt(p1, p3) = { z1, z3, w1, w3 }

Can you see it? We have four nice little squares in each of the quadrants now, and are in fact just one more set of interleaves away from our desired result:

  // (X, Y) = interleave32(b0, b1)
  // (Z, W) = interleave32(b2, b3)
  X = interleave32_lft(b0, b1) = { x0, x1, x2, x3 }
  Y = interleave32_rgt(b0, b1) = { y0, y1, y2, y3 }
  Z = interleave32_lft(b2, b3) = { z0, z1, z2, z3 }
  W = interleave32_rgt(b2, b3) = { w0, w1, w2, w3 }

This one uses just one type of interleave instruction, which is preferable if the 64-bit interleaves don’t exist on your target platform (PowerPC) or would require loading a different permutation vector (SPUs, which have to do the whole thing using shufb).
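
Again as a C++/SSE sketch (my naming); note that this variant really does use nothing but UNPCKLPS/UNPCKHPS:

  // Method 2: two rounds of 32-bit interleaves only. Transposes in place.
  static inline void Transpose4x4_Interleave32(__m128 &p0, __m128 &p1, __m128 &p2, __m128 &p3)
  {
      __m128 b0 = _mm_unpacklo_ps(p0, p2); // { x0, x2, y0, y2 }
      __m128 b1 = _mm_unpacklo_ps(p1, p3); // { x1, x3, y1, y3 }
      __m128 b2 = _mm_unpackhi_ps(p0, p2); // { z0, z2, w0, w2 }
      __m128 b3 = _mm_unpackhi_ps(p1, p3); // { z1, z3, w1, w3 }

      p0 = _mm_unpacklo_ps(b0, b1);        // X = { x0, x1, x2, x3 }
      p1 = _mm_unpackhi_ps(b0, b1);        // Y = { y0, y1, y2, y3 }
      p2 = _mm_unpacklo_ps(b2, b3);        // Z = { z0, z1, z2, z3 }
      p3 = _mm_unpackhi_ps(b2, b3);        // W = { w0, w1, w2, w3 }
  }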

Okay, both of these methods start with a 32-bit interleave. What if we were to start with a 64-bit interleave instead?

It gets a bit weird

Well, let’s just plunge ahead and start by 64-bit interleaving p0 and p1, then see whether it leads anywhere.

  // (c0, c1) = interleave64(p0, p1)
  // (c2, c3) = interleave64(p2, p3)
  c0 = interleave64_lft(p0, p1) = { x0, y0, x1, y1 }
  c1 = interleave64_rgt(p0, p1) = { z0, w0, z1, w1 }
  c2 = interleave64_lft(p2, p3) = { x2, y2, x3, y3 }
  c3 = interleave64_rgt(p2, p3) = { z2, w2, z3, w3 }

Okay. For this one, we can’t continue with our regular interleaves, but we still have the property that each of our target vectors (X, Y, Z, and W) can be built using elements from only two of the c’s. In fact, the low half of each target vector comes from one c and the high half from another, which means that on x86, we can combine the two using SHUFPS. On PPC, there’s always vperm, SPUs have shufb, and NEON has VTBL, all of which are much more general, so again, it can be done there as well:

  // 4 SHUFPS on x86
  X = { c0[0], c0[2], c2[0], c2[2] } = { x0, x1, x2, x3 }
  Y = { c0[1], c0[3], c2[1], c2[3] } = { y0, y1, y2, y3 }
  Z = { c1[0], c1[2], c3[0], c3[2] } = { z0, z1, z2, z3 }
  W = { c1[1], c1[3], c3[1], c3[3] } = { w0, w1, w2, w3 }

As said, this one is a bit weird, but it’s the method used for _MM_TRANSPOSE4_PS in Microsoft’s version of Intel’s xmmintrin.h (SSE intrinsics header) to this day, and it used to be the standard implementation in GCC’s version as well until it got replaced with the first method I discussed.
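
A C++/SSE sketch of this variant too (my own naming; the actual _MM_TRANSPOSE4_PS macro does essentially this, in place, on the four row registers):

  // Method 3: 64-bit interleaves, then 4x SHUFPS. Transposes in place.
  static inline void Transpose4x4_Shufps(__m128 &p0, __m128 &p1, __m128 &p2, __m128 &p3)
  {
      __m128 c0 = _mm_movelh_ps(p0, p1);   // { x0, y0, x1, y1 }
      __m128 c1 = _mm_movehl_ps(p1, p0);   // { z0, w0, z1, w1 }
      __m128 c2 = _mm_movelh_ps(p2, p3);   // { x2, y2, x3, y3 }
      __m128 c3 = _mm_movehl_ps(p3, p2);   // { z2, w2, z3, w3 }

      p0 = _mm_shuffle_ps(c0, c2, _MM_SHUFFLE(2, 0, 2, 0)); // X = { x0, x1, x2, x3 }
      p1 = _mm_shuffle_ps(c0, c2, _MM_SHUFFLE(3, 1, 3, 1)); // Y = { y0, y1, y2, y3 }
      p2 = _mm_shuffle_ps(c1, c3, _MM_SHUFFLE(2, 0, 2, 0)); // Z = { z0, z1, z2, z3 }
      p3 = _mm_shuffle_ps(c1, c3, _MM_SHUFFLE(3, 1, 3, 1)); // W = { w0, w1, w2, w3 }
  }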

Anyway, that was starting by 64-bit interleaving p0 and p1. Can we get it if we interleave with p2 too?

The plot thickens

Again, let’s just try it!

  // (c0, c2) = interleave64(p0, p2)
  // (c1, c3) = interleave64(p1, p3)
  c0 = interleave64_lft(p0, p2) = { x0, y0, x2, y2 }
  c1 = interleave64_lft(p1, p3) = { x1, y1, x3, y3 }
  c2 = interleave64_rgt(p0, p2) = { z0, w0, z2, w2 }
  c3 = interleave64_rgt(p1, p3) = { z1, w1, z3, w3 }

Huh. This one leaves the top left and bottom right 2×2 blocks alone and swaps the other two. But we still got closer to our goal – if we swap the top right and bottom left element in each of the four 2×2 blocks, we have a full transpose as well. And NEON happens to have an instruction for that (VTRN.32). As usual, the other platforms can try to emulate this using more general shuffles:

  // 2 VTRN.32 on NEON:
  // (X, Y) = vtrn.32(c0, c1)
  // (Z, W) = vtrn.32(c2, c3)
  X = { c0[0], c1[0], c0[2], c1[2] } = { x0, x1, x2, x3 }
  Y = { c0[1], c1[1], c0[3], c1[3] } = { y0, y1, y2, y3 }
  Z = { c2[0], c3[0], c2[2], c3[2] } = { z0, z1, z2, z3 }
  W = { c2[1], c3[1], c2[3], c3[3] } = { w0, w1, w2, w3 }

Just like NEON’s “unzip” instructions, VTRN both reads and writes two registers, so it is in essence doing the work of two instructions on the other architectures. Which means that we now have 4 different methods to do the same thing that are essentially the same cost in terms of computational complexity. Sure, some methods end up faster than others on different architectures due to various implementation choices, but really, in essence none of these are fundamentally more difficult (or easier) than the others.

Nor are these the only ones – for the last variant, we started by swapping the 2×2 blocks within the 4×4 matrix and then transposing the individual 2×2 blocks, but doing it the other way round works just as well (and is again the same cost). In fact, this generalizes to arbitrary power-of-two sized square matrices – you can just partition it into differently sized block transposes which can run in any order. This even works with rectangular matrices, with some restrictions. (A standard way to perform “chunky to planar” conversion for old bit plane-based graphics architectures uses this general approach to good effect).

And now?

Okay, so far, we have a menagerie of different matrix transpose techniques, all of which essentially have the same complexity. If you’re interested in SIMD coding, I suppose you can just use this as a reference. However, that’s not the actual reason I’m writing this; the real reason is that the whole “why are these things all essentially the same complexity” thing intrigued me, so a while back I looked into this and found out a whole bunch of cool properties that are probably not useful for coding at all, but which I nevertheless found interesting. In other words, I’ll write a few more posts on this topic, which I will spend gleefully nerding out with no particular goal whatsoever. If you don’t care, just stop reading now. You’re welcome!


08 Jul 19:58

OpenGL 4.3 Pipeline Map available

For OpenGL Insights, we created an OpenGL 4.2 and OpenGL ES 2.0 pipeline map. Patrick Cozzi asked me a few times to update it for OpenGL 4.3 and OpenGL ES 3.0, but I was only interested in doing it if I could make it a lot more detailed, considering that it would not be for printing so it could be as big as I see fit. This endeavour represented a significant amount of work, but I have finally completed the OpenGL 4.3 Pipeline Map.

Wonder why Larrabee for graphics failed? I guess when we see all the fixed-function functionality that hardware vendors can implement very efficiently, it gives us some clues.

Enjoy and send me your feedback for improvements!

01 Jul 05:22

Deferred SSDO

Hello! I've been reading this page recently,
http://kayru.org/articles/dssdo/

 

The idea is quite straightforward and it's quite easy to replace an existing SSAO implementation with it. However, one thing that really confused me was that the author said 'the local occlusion function can be reconstructed by the dot product of the occlusion coefficients and the light vector'.

AFAIK, when calculating diffuse lighting we also need to project the incoming light onto the sphere, then do a dot product between both sets of coefficients.

 

Just wondering why a simple dot product between the occlusion coefficients and the light vector is sufficient in this case. Can anyone help explain the math behind it? Or is there any paper describing the details/trick?

Thanks.

28 Jun 13:26

The controversial placement of the suppository

by Shora
Jose L. Hidalgo

... in case you have any doubts, I'll stick with the ending, ...

... you have to take the suppositories out of the blister pack before inserting them into the anus.

Published on MedTempus - a medicine and health blog.

What is the correct way to insert a suppository? By the pointed end or by the flat one? This momentous and vital question was the undisputed star of an episode of El Intermedio a few months ago. They spent half the programme talking humorously about suppositories, interviewing a pharmacist and carrying out national and international surveys to see how people thought they should be properly inserted and what opinion they had of them.

The result of those surveys was as expected: everyone who appeared (both in Spain and in England) thought it was common sense to insert the suppository by the pointed end, to ease its passage through the rectum. However, the pharmacist interviewed argued that the best way to place it was by the flat end, so that the sphincters of the rectum, in contact with the pointed part, would push the suppository upwards and it would thus sit deeper, easing the absorption of its active ingredient. This is the majority view among healthcare professionals around the world today and appears to be backed by science… However, as we will see below, the story is far from being that simple.

The origin of suppositories is certainly ancient, and there is evidence that the ancient Egyptians, Greeks and Romans already used this system. It was from the end of the 19th century, however, that their use spread to the general population thanks to large-scale manufacturing, and the familiar torpedo shape began to establish itself as the standard; its designer (Henry S. Wellcome) suggested inserting it by the pointed end. And so, from the very beginnings of this characteristic suppository shape, its placement was closely tied to common sense and very little, if at all, to science.

For around a century, both the general population and healthcare professionals recommended and practised rectal insertion by the pointed end, guided by received wisdom and intuition… Until the year 1991 arrived and the prevailing general recommendation took a 180-degree turn, in both the metaphorical and the literal sense. What great event happened that year? In 1991, a small study was published in The Lancet under the title “Rectal suppository: common sense and mode of insertion”, and it shattered all the assumptions about the use of such a glamorous pharmacological invention. The first, less relevant, part of the study consisted of a survey of 620 Egyptian men and women, of very different ages and education levels, about which end they used to insert the suppository into the rectum. As expected, all but 2 people stated that they inserted it by the pointed end, and the most common explanation was “common sense”.

In the second, more interesting, part of the study, they carried out an experiment with 100 patients who were asked to insert the suppository by the pointed end the first time and, the second time they had to place another suppository, to do it by the flat end. The researchers found that when inserting the suppository by the flat end it was much less likely that a finger had to be inserted into the anus to place it properly (only 1% of patients with this method had to resort to that, versus 83% with the conventional pointed-end method). This certainly makes applying a suppository a little less undignified and cumbersome, especially if it is a healthcare professional who has to place it in someone else. Besides this result, they also observed that involuntary expulsion of the suppository was less likely with flat-end insertion (0%, versus 3% for the pointed end), and they proposed as an explanation for this phenomenon that the anal sphincter pushed the pointed end upwards, ensuring its retention.

In view of these findings the authors therefore advised using the flat end to insert suppositories. This counterintuitive information spread like wildfire throughout the healthcare community. Clinical journals, medical textbooks, guidelines and recommendations all over the world incorporated the new recommendation within a few years… And to this day it remains the prevailing practice in the healthcare community, even though the general population has not yet absorbed this information.

However, this is by no means the end of our story about the suppository and its controversial placement. Since the aforementioned study in The Lancet was published, those results have never been replicated in new studies (I cannot imagine why, given the vital importance for medicine of knowing which end to use when sticking a suppository up one's backside…), and several reviews on the subject have appeared, harshly criticising the original article for various errors, imprecisions and flaws in its methodology… The most critical and forceful article against The Lancet original is “Rectal suppository insertion: the reliability of the evidence as a basis for nursing practice”, published in print in 2007. It highlights many points that were not taken into account at the time, for example:

-If a locally acting suppository is inserted by the flat end, using the anal sphincter as an aid to insertion and upward displacement, there is no guarantee that it will remain in contact with the intestinal wall, hindering or preventing the release and absorption of the active ingredient, making this intervention not only unpleasant but also useless.

-Many suppository manufacturers state in their product licences that the recommended insertion is by the pointed end (since that was how they were tested in clinical trials at the time). A minority of manufacturers do not specify how they should be placed, while a few recommend insertion by the flat end. There could therefore be legal problems when inserting suppositories by the flat end, since the manufacturer would not be liable for them if they are not applied in the way it defines.

-The supposed explanation that rectal physiology might help retain suppositories better if they are inserted by their flat end is a hypothesis that has not yet been demonstrated.

-The Lancet study was a small study, with considerable limitations and very little detail; among the many data not reported are the type of suppositories administered, the indication for which they were used and whether the desired effect occurred. This is an important factor, since applying locally acting and systemically acting suppositories is not the same thing.

For all these reasons, many health researchers wonder whether it was really wise to change a practice as widespread as inserting the suppository by the pointed end, when the only existing study defending the flat end raises so many doubts about its validity and strength. They also complain that there has been so little criticism of such a small and very limited study. One nurse goes further and concludes the following:

In the absence of conclusive evidence to recommend a particular method of suppository insertion, it seems that a common-sense approach is needed (Bradshaw and Price, 2006)

Although the idea that a patient might receive clinical care that is not based on best practice is unacceptable, the recommendations on suppository insertion in nursing textbooks and articles have changed radically following the suggestions made by one small clinical trial. There is ambiguity about what constitutes “evidence-based best practice” in the administration of suppositories. If insertion by the pointed or the flat end really matters, then it can be argued that more extensive research is urgently required.

Will we one day see a Cochrane meta-analysis on the recommended way to insert a suppository? In the meantime, it seems there is only one thing that is completely crystal clear today regarding the placement of suppositories: that you have to take them out of the blister pack before inserting them into the anus.

Link to the article The controversial placement of the suppository

25 Jun 17:36

Dirty Game Development Tricks

Stories of deadline-driven tricks and hacks. ...

22 Jun 06:51

3D scanning by calculating the focus of each pixel

by Mike Szczys


We understand the concept [Jean] used to create a 3D scan of his face, but the particulars are a bit beyond our own experience. He is not using a dark room and laser line to capture slices which can be reassembled later. Nope, this approach uses pictures taken with several different focal lengths.

The idea is to process the photos using luminance. It looks at a pixel and its neighbors, subtracting the luminance and summing the absolute values to estimate how well that pixel is in focus. Apparently if you do this with the entire image, and a set of other images taken from the same vantage point with different focal lengths, you end up with a depth map of pixels.
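
In other words, it’s a per-pixel sharpness (local contrast) score. A minimal C++ sketch of such a measure, not [Jean]’s actual code (the function and its 4-neighbour choice are mine):

#include <cmath>
#include <cstddef>

// Sum of absolute luminance differences against the 4 direct neighbours:
// a rough "how much in focus is this pixel" score. Higher means sharper.
float FocusMeasure(const float* luma, std::size_t width, std::size_t height,
                   std::size_t x, std::size_t y)
{
    const float c = luma[y * width + x];
    float score = 0.0f;
    if (x > 0)          score += std::fabs(c - luma[y * width + (x - 1)]);
    if (x + 1 < width)  score += std::fabs(c - luma[y * width + (x + 1)]);
    if (y > 0)          score += std::fabs(c - luma[(y - 1) * width + x]);
    if (y + 1 < height) score += std::fabs(c - luma[(y + 1) * width + x]);
    return score;
}

For each pixel you would then keep the focal distance of the image in the stack that scores highest, which is what builds up the depth map.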

What we find most interesting about this is the resulting pixels retain their original color values. So after removing the cruft you get a 3D scan that is still in full color.

If you want to learn more about laser-based 3D scanning check out this project.

[Thanks Luca]


Filed under: digital cameras hacks
11 Jun 08:57

WebGL raytracing presented at Visionday

by admin

We presented our WebGL raytracer at the Visionday 2013 event.

Material presented in the talk:
Slides (pdf)
Demo 1: Motorcycle
Demo 2: Cornell box Xmas theme

Usage
Use the left mouse button and the keys "wasdqe" to control the camera.
Use the right mouse button to select objects in the scene. The active object can be translated with the gizmo and the material can be changed in the column on the right hand side.

Requirements
You should ensure that you have a web browser that supports WebGL with the OES_texture_float extension.
If you use Windows you will need a recent version of Firefox (v17 has been tested) or Chrome (v23 has been tested) due to some optimizations in the shader compiler in the ANGLE layer. Alternatively, you can enable native OpenGL in your browser (in Firefox open about:config and set webgl.prefer-native-gl=true, in Chrome use the command line argument --use-gl=desktop).



You can also load your own 3D models. This feature was used for our Xmas competition, which was won by this nice image created by Jonas Raagaard.
10 Jun 14:05

Video: Collision detection in MDK2, NeverWinter Nights

Former BioWare lead 3D developer Stan Melax discusses the math and programming behind BSP collision detection used in MDK2 and NeverWinter Nights. ...

10 Jun 07:37

Video: Building SimCity's sandbox

The fun in sandbox games comes from discovering how the mechanics work, says Maxis' Dan Moskowitz, who shares lessons learned from building SimCity, in this free GDC 2013 video. ...

08 Jun 19:16

2013 Demo Tubes

by Timothy Lottes








28 May 09:55

Second impression of the Oculus Rift

by Robert
Jose L. Hidalgo

For when the Oculus arrives :)

After a long wait since I first had the chance to test the Oculus Rift, my own device from the original Kickstarter arrived. So here are some notes after a few hours of testing:

  • It’s more comfortable to wear than the initial prototype. Only downside: there is a spot where my nose supports some of the weight of the Rift and that spot is not covered with foam, so that hurts a bit after a while (but that can be easily fixed by gluing in some foam).
  • The changeable lenses are great, I’m near sighted and cup C works best without glasses for me. Sadly this means that I see a smaller part of the screen.
  • Wearing glasses won’t work for me: while they fit into the Rift, the glasses will touch the lenses even if the ‘clicky adjustment things’[1] are set to extend the Rift to the maximal distance. This could lead to scratches on the glasses and/or the lenses of the Rift…
  • The head tracking works great! For example: when you adjust the placement of the Rift on your nose, the rendered image stays static and you get the impression that the Rift is just a pair of binoculars that point to a large but static screen.
  • The resolution is too low: you can see pixels, the black gaps between them and also the RGB subpixels.
  • After 90 minutes of playing Half-Life 2 I got a bit motion sick while I had no problems with the other demos I tried.
  • Framerate matters a lot: demos that run very smoothly on a desktop PC but only make 30-40 FPS on a notebook feel very laggy inside the Rift. The demand for low-latency rendering and update loops is higher when using HMDs!
  • My first (not too serious) attempts on 3D video see-through were successful but not very practical ;-)
Oculus Rift video see through

Oculus Rift 3D stereo video see-through

If your Rift has already arrived or you can borrow one, I would recommend to test it with these apps:

  • Oculus Rift SDK Tuscany demo: It’s part of the SDK so you might have it already (given that you are a developer).
  • RiftRacer: A racing game with currently 3 test tracks. The nice thing about racing games is that you expect to sit and control your vehicle somewhat artificially with your hands. If a first-person shooter gives you motion sickness, try a racing game (or something similar, like a flight simulator).
  • Blue Marble: Listen to your music and float around in space. No interactions needed, just drift away and look around – probably the least problematic demo if you have motion sickness problems.
  • Half-Life 2: Set the launch option to ‘-vr’ in Steam, opt in to the beta program, and see how a big game looks in VR. The HUD is not working very well yet, but the game itself is already looking good.

The first three demos work on MacOS X and Windows.

Update 5/27/13: Regarding HL2, one problem with the HUD is the low contrast on the health display; you basically need to look into a dark area to read it. The second problem is that the weapon switch menu is out of my FoV, but Tom Forsyth gave the hint that this can be adjusted by setting vr_hud_max_fov to 50 or 40.

[1] yes, that’s the official term [2]
[2] probably not ;-) [3]
[3] on the other hand, Tom Forsyth does work for Oculus now…
26 May 08:24

Cipher

by Oliver Widder
Maya
24 May 18:05

The anatomy of a memory manager

Fable developer Stewart Lynch focuses on C++ memory management for console game development, offering techniques for fragmentation and allocation. ...

22 May 20:29

Scratch-built desk adjusts so you may sit or stand

by Mike Szczys
Jose L. Hidalgo

In case you're handy...


Knowing that this desk was built from scratch is pretty impressive. But the motorized legs that raise and lower the desk to any height really put the project over the top.

Surprisingly this started off as a computer case project. [Loren] upgraded his hardware and couldn’t find a case that would organize it the way he liked. His desk at the time had a glass top and he figured, why not build a new base for the glass which would double as a computer case? From there the project took off as his notebook sketches blossomed into computer renderings which matured into the wooden frame seen above.

Much like the machined computer desk from last December this uses motorized legs to adjust the height of the desk. These cost about $50 each, and he used four of them. If you consider the cost of purchasing a desk this size (which would not have been motorized) he’s still not breaking the bank. This battlestation is now fully functional, but he does plan to add automated control of the legs at some point. We think that means that each has an individual adjustment control which he wants to tie into one controller to rule them all.


Filed under: home hacks
20 May 17:43

The Government insists that today is not Monday but a deferred Friday, postponed in time

by Xavi Puig
Jose L. Hidalgo

HAHAHAHAHAHA

After hearing the complaints of numerous citizens, who have been lamenting all morning that today was Monday, the secretary general of the Partido Popular, María Dolores de Cospedal, made an emergency appearance, with a serious expression, to “contradict the latest reports published about the day this country currently finds itself in”.

“Spain will get out of this week with effort”, Cospedal promises

Cospedal, who did not take questions from journalists, stated that today is Friday and that the weekend is getting ever closer to arriving in Spain because the Government and all Spaniards “are working in that direction”.

“In the face of self-interested rumours that attack this Government over the interests of our country, I must say that we are already in the home stretch of this week”, she clarified, insisting that “today is a deferred Friday, postponed in time”.

The secretary sought to reassure the public and encouraged Spaniards “to keep fighting to get out of this week as soon as possible”. She also displayed a calendar of the week, pointing at the box for Monday 20 May, and expressed her indignation “at publications whose only aim is to damage the PP”.

09 May 08:58

real time ray tracing part 2.

by directtovideo
Jose L. Hidalgo

fuck yeah

Here’s the science bit. Concentrate..

In the last post I gave an overview of my journey through realtime raytracing and how I ended up with a performant technique that worked in a production setting (well, a demo) and was efficient and useful. Now I’m going to go into some more technical details about the approaches I tried and ended up using.

There’s a massive amount of research in raytracing, realtime raytracing, GPU raytracing and so on. Very little of that research ended up with the conclusions I did – discarding the kind of spatial database that’s considered “the way” (i.e. bounding volume hierarchies) and instead using something pretty basic and probably rather inefficient (regular grids / brick maps). I feel that conclusion needs some explanation, so here goes.

I am not dealing with the “general case” problem that ray tracers usually try and solve. Firstly, my solution was always designed as a hybrid with rasterisation. If a problem can be solved efficiently by rasterisation I don’t need to solve it with ray tracing unless it’s proved that it would work out much better that way. That means I don’t care about ray tracing geometry from the point of view of a pinhole camera: I can just rasterise it instead and render out GBuffers. The only rays I care about are secondary – shadows, occlusion, reflections, refractions – which are much harder to deal with via rasterisation. Secondly I’m able to limit my use case. I don’t need to deal with enormous 10 million poly scenes, patches, heavy instancing and so on. My target is more along the lines of a scene consisting of 50-100,000 triangles – although 5 Faces topped that by some margin in places – and a reasonably enclosed (but not tiny… see the city in 5 Faces) area. Thirdly I care about data structure generation time. A lot. I have a real time fully dynamic scene which will change every frame, so the data structure needs to be refreshed every frame to keep up. It doesn’t matter if I can trace it in real time if I can’t keep the structure up to date. Fourthly I have very limited scope for temporal refinement – I want a dynamic camera and dynamic objects, so stuff can’t just be left to refine for a second or two. And fifth(ly), I’m willing to sacrifice accuracy & quality for speed, and I’m mainly interested in high value / lower cost effects like reflections rather than a perfectly accurate unbiased path trace. So this all forms a slightly different picture to what most ray tracers are aiming for.

Conventional wisdom says a BVH or kD-Tree will be the most efficient data structure for real time ray tracing – and wise men have said that BVH works best for GPU tracing. But let’s take a look at BVH in my scenario:
– BVH is slow to build, at least to build well, and building on GPU is still an open area of research.
– BVH is great at quickly rejecting rays that start at the camera and miss everything. However, I care about secondary rays cast off GBuffers: essentially all my rays start on the surface of a mesh, i.e. at the leaf node of a BVH. I’d need to walk down the BVH all the way to the leaf just to find the cell the ray starts in – let alone where it ends up.
– BVH traversal is not that kind to the current architecture of GPU shaders. You can either implement the traversal using a stack – in which case you need a bunch of groupshared memory in the shader, which hammers occupancy. Using groupshared, beyond a very low limit, is bad news mmkay? All that 64k of memory is shared between everything you have in flight at once. The more you use, the less in flight. If you’re using a load of groupshared to optimise something you better be smart. Smart enough to beat the GPU’s ability to keep a lot of dumb stuff in flight and switch between it. Fortunately you can implement BVH traversal using a branching linked list instead (pass / fail links) and it becomes a stackless BVH, which works without groupshared.
But then you hit the other issue: thread divergence. This is a general problem with SIMD ray tracing on CPU and GPU: if rays executed inside one SIMD take different paths through the structure, their execution diverges. One thread can finish while others continue, and you waste resources. Or, one bad ugly ray ends up taking a very long time and the rest of the threads are idle. Or, you have branches in your code to handle different tree paths, and your threads inside a single wavefront end up hitting different branches continually – i.e. you pay the total cost for all of it. Dynamic branches, conditional loops and so on can seriously hurt efficiency for that reason.
– BVH makes it harder to modify / bend rays in flight. You can’t just keep going where you were in your tree traversal if you modify a ray – you need to go back up to the root to be accurate. Multiple bounces of reflections would mean making new rays.

All this adds up to BVH not being all that good in my scenario.

So, what about a really really dumb solution: storing triangle lists in cells in a regular 3D grid? This is generally considered a terrible structure because:
– You can’t skip free space – you have to step over every cell along the ray to see what it contains; rays take ages to work out they’ve hit nothing. Rays that hit nothing are actually worse than rays that do hit, because they can’t early out.
– You need a high granularity of cells or you end up with too many triangles in each cell to be efficient, but then you end up making the first problem a lot worse (and needing lots of memory etc).

However, it has some clear advantages in my case:
– Ray marching voxels on a GPU is fast. I know because I’ve done it many times before, e.g. for volumetric rendering of smoke. If the voxel field is quite low res – say, 64x64x64 or 128x128x128 – I can march all the way through it in just a few milliseconds.
– I read up on the DDA algorithm so I know how to ray march through the grid properly, i.e. visit every cell along the ray exactly once 🙂 (see the sketch just after this list)
– I can build them really really fast, even with lots of triangles to deal with. To put a triangle mesh into a voxel grid all I have to do is render the mesh with a geometry shader, pass the triangle to each 2D slice it intersects, then use a UAV containing a linked list per cell to push out the triangle index on to the list for each intersected cell.
– If the scene isn’t too high poly and spread out kindly, I don’t have too many triangles per cell so it intersects fast.
– There’s hardly any branches or divergence in the shader except when choosing to check triangles or not. All I’m doing is stepping to next cell, checking contents, tracing triangles if they exist, stepping to next cell. If the ray exits the grid or hits, the thread goes idle. There’s no groupshared memory requirement and low register usage, so lots of wavefronts can be in flight to switch between and eat up cycles when I’m waiting for memory accesses and so on.
– It’s easy to bounce a ray mid-loop. I can just change direction, change DDA coefficients and keep stepping. Indeed it’s an advantage – a ray that bounces 10 times in quick succession can follow more or less the same code path and execution time as a ray that misses and takes a long time to exit. They still both just step, visit cells and intersect triangles; it’s just that one ray hits and bounces too.
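
As a rough CPU-side C++ reference of the DDA stepping mentioned above (the shader version is structurally the same; the names and the cell-visit callback are mine, and zero direction components are ignored for brevity):

#include <cmath>

struct Int3 { int x, y, z; };

// Visit every cell a ray passes through in a gridRes^3 grid (Amanatides & Woo style DDA).
// The origin is already in grid space (1 unit = 1 cell); VisitCell returns true to stop (a hit).
template <class VisitFn>
void MarchGrid(float ox, float oy, float oz, float dx, float dy, float dz,
               int gridRes, VisitFn VisitCell)
{
    Int3 cell = { (int)std::floor(ox), (int)std::floor(oy), (int)std::floor(oz) };
    Int3 step = { dx > 0 ? 1 : -1, dy > 0 ? 1 : -1, dz > 0 ? 1 : -1 };

    // Distance along the ray between two cell boundaries on each axis...
    float tDeltaX = std::fabs(1.0f / dx), tDeltaY = std::fabs(1.0f / dy), tDeltaZ = std::fabs(1.0f / dz);
    // ...and distance to the first boundary on each axis.
    float tMaxX = ((step.x > 0 ? cell.x + 1 : cell.x) - ox) / dx;
    float tMaxY = ((step.y > 0 ? cell.y + 1 : cell.y) - oy) / dy;
    float tMaxZ = ((step.z > 0 ? cell.z + 1 : cell.z) - oz) / dz;

    while (cell.x >= 0 && cell.x < gridRes &&
           cell.y >= 0 && cell.y < gridRes &&
           cell.z >= 0 && cell.z < gridRes)
    {
        if (VisitCell(cell))   // intersect the triangles listed in this cell, bounce, etc.
            return;
        // Step into whichever neighbouring cell the ray reaches first.
        if (tMaxX < tMaxY && tMaxX < tMaxZ) { cell.x += step.x; tMaxX += tDeltaX; }
        else if (tMaxY < tMaxZ)             { cell.y += step.y; tMaxY += tDeltaY; }
        else                                { cell.z += step.z; tMaxZ += tDeltaZ; }
    }
}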

Gratuitous screenshot from 5 Faces

Gratuitous screenshot from 5 Faces

So this super simple, very poor data structure is actually not all that terrible after all. But it still has some major failings. It’s basically hard limited on scene complexity. If I have too large a scene with too many triangles, the grid will either have too many triangles per cell in the areas that are filled, or I’ll have to make the grid too high res. And that burns memory and makes the voxel marching time longer even when nothing is hit. Step in the sparse voxel octree (SVO) and the brick map.

Sparse voxel octrees solve the problem of free space management by a) storing a multi-level octree not a flat grid, and b) only storing child cells when the parent cells are filled. This works out being very space-efficient. However the traversal is quite slow; the shader has to traverse a tree to find any leaf node in the structure, so you end up with a problem not completely unlike BVH tracing. You either traverse the whole structure at every step along the ray, which is slow; or use a stack, which is also slow and makes it hard to e.g. bend the ray in flight. Brick maps however just have two discrete levels: a low level voxel grid, and a high level sparse brick map.

In practice this works out as a complete voxel grid (volume texture) at say 64x64x64 resolution, where each cell contains a uint index. The index either indicates the cell is empty, or it points into a buffer containing the brick data. The brick data is a structured buffer (or volume texture) split into say 8x8x8 cell bricks. The bricks contain uints pointing at triangle linked lists containing the list of triangles in each cell. When traversing this structure you step along the low res voxel grid exactly as for a regular voxel grid; when you encounter a filled cell you read the brick, and step along that instead until you hit a cell with triangles in, and then trace those.

The key advantage over an SVO is that there’s only two levels, so the traversal from the top down to the leaf can be hard coded: you read the low level cell at your point in space, see if it contains a brick, look up the brick and read the brick cell at your point in space. You don’t need to branch into a different block of code when tracing inside a brick either – you just change the distance you step along the ray, and always read the low level cell at every iteration. This makes the shader really simple and with very little divergence.
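
Using the declarations from the previous sketch, that whole top-to-leaf traversal collapses into a two-step lookup, which is exactly why there is so little divergence (again, an illustrative sketch only):

// highResCell is a position in effective high-res cell coordinates (0 .. lowResSize * BRICK_SIZE).
// Returns the triangle-list head for that cell, or EMPTY_CELL if it lies in free space.
uint LookupCell(uint3 highResCell)
{
    uint3 lowCell    = highResCell / BRICK_SIZE;
    uint  brickIndex = LowResGrid[lowCell];
    if (brickIndex == EMPTY_CELL)
        return EMPTY_CELL;                      // empty low-res cell: a whole brick's worth of space to skip

    uint3 cellInBrick = highResCell % BRICK_SIZE;
    return BrickCells[BrickCellIndex(brickIndex, cellInBrick)];
}

The march then only varies the distance it steps along the ray: brick-sized steps through empty low-res cells, single-cell steps inside a brick.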

Brick map generation in 2D

Building a brick map works in 3 steps and can be done sparsely, top down:
– Render the geometry to the low res voxel grid. Just mark which cells are filled;
– Run over the whole grid in a post process and allocate bricks to filled low res cells. Store the brick indices in the low res grid in a volume texture (see the compute shader sketch after this list).
– Render the geometry as if rendering to a high res grid (low res size * brick size); when filling in the grid, first read the low res grid, find the brick, then find the location in the brick and fill in the cell. Use a triangle linked list per cell again. Make sure to update the linked list atomically. 🙂
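
Step 2 can be a single compute dispatch over the low-res grid, grabbing brick slots from an atomic counter. A minimal sketch with made-up names – FILLED_MARKER is whatever value step 1 wrote into occupied cells:

#define FILLED_MARKER 1
#define EMPTY_CELL    0xffffffff

RWTexture3D<uint>        LowResGridRW;  // filled/empty markers in, brick indices (or EMPTY_CELL) out
RWStructuredBuffer<uint> BrickCounter;  // a single uint used as an atomic brick allocator

[numthreads(4, 4, 4)]
void AllocateBricksCS(uint3 cell : SV_DispatchThreadID)
{
    if (LowResGridRW[cell] == FILLED_MARKER)
    {
        uint brickIndex;
        InterlockedAdd(BrickCounter[0], 1, brickIndex);   // grab the next free brick slot
        LowResGridRW[cell] = brickIndex;
    }
    else
    {
        LowResGridRW[cell] = EMPTY_CELL;
    }
}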

The voxel filling is done with a geometry shader and pixel shader in my case – it balances workload nicely using the rasteriser, which is quite difficult to do using compute because you have to load balance yourself. I preallocate a brick buffer based on how much of the grid I expect to be filled. In my case I guess at around 10-20%. I usually go for a 64x64x64 low res map and 4x4x4 bricks for an effective resolution of 256x256x256. This is because it worked out as a good balance overall for the scenes; some would have been better at different resolutions, but if I had to manage different allocation sizes I ran into a few little VRAM problems – i.e. running out. The high resolution is important: it means I don’t have too many tris per cell. Typically it took around 2-10 ms to build the brick map per frame for the scenes in 5 Faces – depending on tri count, tris per cell (i.e. contention), tri size etc.
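
The per-cell linked-list update in the fill pass (step 3) only needs a small atomic push from the pixel shader. A sketch of that push, with assumed resource names – the node pool relies on the UAV's hidden counter:

RWStructuredBuffer<uint>  CellHeads : register(u1);  // one list head per high-res cell, EMPTY_CELL if unused
RWStructuredBuffer<uint2> ListNodes : register(u2);  // .x = triangle index, .y = next node index

void PushTriangle(uint cellIndex, uint triangleIndex)
{
    uint node = ListNodes.IncrementCounter();            // allocate a node from the pool
    uint prevHead;
    InterlockedExchange(CellHeads[cellIndex], node, prevHead);
    ListNodes[node] = uint2(triangleIndex, prevHead);    // the new node points at the previous head
}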

One other thing I should mention: where do the triangles come from? In my scenes the triangles move, but they vary in count per frame too, and can be generated on GPU – e.g. marching cubes – or can be instanced and driven by GPU simulations (e.g. cubes moved around on GPU as fluids). I have a first pass which runs through everything in the scene and “captures” its triangles into a big structured buffer. This works in my ubershader setup and handles skins, deformers, instancing, generated geometry etc. This structured buffer is what is used to generate the brick maps in one single draw call. Naturally you could split it up if you had static and dynamic parts, but in my case the time to generate that buffer was well under 1ms each frame (usually more like 0.3ms).
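
I can only guess at what that capture pass looks like internally, but as a sketch of the idea – flattening an already-transformed mesh into one big triangle buffer – a compute shader version could be as simple as this (buffer names, the pre-transformed vertex buffer and TriangleCount are all assumptions):

struct CapturedTriangle { float3 v0; float3 v1; float3 v2; };

StructuredBuffer<float3>             WorldSpaceVerts : register(t0); // positions after skinning/deformers
StructuredBuffer<uint3>              TriIndices      : register(t1);
RWStructuredBuffer<CapturedTriangle> SceneTriangles  : register(u0); // created with a hidden counter

cbuffer CaptureParams : register(b0)
{
    uint TriangleCount;
};

[numthreads(64, 1, 1)]
void CaptureTrianglesCS(uint id : SV_DispatchThreadID)
{
    if (id >= TriangleCount)
        return;

    uint3 tri = TriIndices[id];
    CapturedTriangle t;
    t.v0 = WorldSpaceVerts[tri.x];
    t.v1 = WorldSpaceVerts[tri.y];
    t.v2 = WorldSpaceVerts[tri.z];

    uint slot = SceneTriangles.IncrementCounter();        // append to the global scene list
    SceneTriangles[slot] = t;
}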

Key brick map advantages:
– Simple and fast to build, much like a regular voxel grid
– Much more memory-efficient than a regular voxel grid for high resolution grids
– Skips (some) free space
– Efficient, simple shader with no complex tree traversal necessary, and relatively little divergence
– You can find the exact leaf cell containing any point in space in 2 steps – useful for secondary rays
– It’s quite possible to mix dynamic and static content – prebake some of the brick map, update or append dynamic parts
– You can generate the brick map in camera space, world space, a moving grid – doesn’t really matter
– You can easily bend or bounce the ray in flight just like you could with a regular voxel grid – very important for multi-bounce reflections and refractions (see the sketch below). I can limit the shader execution loop by number of cells marched rather than by number of bounces – so a ray with a lot of quick local bounces takes about as long as a ray that doesn’t hit anything and exits.
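
To illustrate that last point: inside the MarchGrid loop sketched earlier, the hit branch simply turns into a bounce, and the loop budget stays "cells marched" rather than "bounces". RecomputeDDA and hitNormal are hypothetical here – the idea is just to re-derive stp/tMax/tDelta for the new origin and direction:

    if (TraceTrianglesInCell(cell, rayOrigin, rayDir, hitT))
    {
        // Bounce instead of returning: move to the hit point, reflect, keep marching.
        rayOrigin += rayDir * hitT;
        rayDir     = reflect(rayDir, hitNormal);
        RecomputeDDA(rayOrigin, rayDir, cell, stp, tMax, tDelta);
        continue;                                          // same loop, same cell budget
    }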

Gratuitous screenshot from 5 Faces

In conclusion: brick maps gave me a fast, efficient way of managing triangles for my real time raytracer which has a very particular use case and set of limitations. I don’t want to handle camera rays, only secondary rays – i.e. all of my rays start already on a surface. I don’t need to handle millions of triangles. I do need to build the structure very quickly and have it completely dynamic and solid. I want to be able to bounce rays. From a shader coding point of view, I want as little divergence as possible and some control over total execution time of a thread.

I don’t see it taking over as the structure used in Octane or Optix any time soon, but for me it definitely worked out.

07 May 20:36

real time ray tracing.

by directtovideo

It’s practically a tradition.

New hardware generation, new feature set. Ask the age-old question: “is real time ray tracing practical yet?” “No, no it’s not” is the answer that comes back every time.

But when I moved to DirectX 11 sometime in the second half of 2011 I had the feeling that maybe this time it’d be different and the tide was changing. Ray tracing on GPUs in various forms has become popular and even efficient – be it in terms of signed distance field tracing in demos, sparse voxel octrees in game engines, nice looking WebGL path tracers, or actual proper in-viewport production rendering tracers like Brigade / Octane. So I had to try it.

My experience of ray tracing had been quite limited up til then. I had used signed distance field tracing in a 64k, some primitive intersection checking and metaball tracing for effects, and a simple octree-based voxel tracer, but never written a proper ray tracer to handle big polygonal scenes with a spatial database. So I started from the ground up. It didn’t really help that my experience of DX11 was quite limited too at the time, so the learning curve was steep. My initial goal was to render real time sub surface scattering for a certain particular degenerate case – something that could only be achieved effectively by path tracing – and using polygonal meshes with thin features that could not be represented effectively by distance fields or voxels – they needed triangles. I had a secondary goal too; we are increasingly using the demo tools to render things for offline – i.e. videos – and we wanted to be able to achieve much better render quality in this case, with the kind of lighting and rendering you’d get from using a 3d modelling package. We could do a lot with post processing and antialiasing quality but the lighting was hard limited – we didn’t have a secondary illumination method that worked with everything and gave the quality needed. Being able to raytrace the triangle scenes we were rendering would make this possible – we could then apply all kinds of global illumination techniques to the render. Some of those scenes were generated on GPU so this added an immediate requirement – the tracer should work entirely on GPU.

I started reading the research papers on GPU ray tracing. The major consideration for a triangle ray tracer is the data structure you use to store the triangles; a structure that allows rays to quickly traverse space and determine if, and what, they hit. Timo Aila and Samuli Laine in particular released a load of material on data structures for ray acceleration on GPUs, and they also released some source. This led into my first attempt: implementing a bounding volume hierarchy (BVH) structure. A BVH is a tree of (in this case) axis aligned bounding boxes. The top level box encloses the entire scene, and at each step down the tree the current box is split in half at a position and axis determined by some heuristic. Then you put the triangles in each half depending on which one they sit inside, then generate two new boxes that actually enclose their triangles. Those boxes become the child nodes and you recurse again. BVH building was a mystery to me until I read their stuff and figured out that it’s not actually all that complicated. It’s not all that fast either, though. The algorithm is quite heavyweight so a GPU implementation didn’t look trivial – it had to run on CPU as a precalc and it took its time. That pretty much eliminated the ability to use dynamic scenes. The actual tracer for the BVH was pretty straightforward to implement in a pixel or compute shader.
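
For reference, the core query a tracer like that performs at every node is a ray vs axis-aligned box test; a standard slab test in HLSL looks like this (generic textbook code, not taken from the implementation described here):

// Returns true if the ray (given as origin and 1/direction) hits the box before maxT.
bool RayBoxIntersect(float3 rayOrigin, float3 invRayDir, float3 boxMin, float3 boxMax, float maxT)
{
    float3 t0 = (boxMin - rayOrigin) * invRayDir;
    float3 t1 = (boxMax - rayOrigin) * invRayDir;
    float3 tNear = min(t0, t1);
    float3 tFar  = max(t0, t1);
    float  tEnter = max(max(tNear.x, tNear.y), tNear.z);
    float  tExit  = min(min(tFar.x, tFar.y), tFar.z);
    return tEnter <= tExit && tExit >= 0.0 && tEnter <= maxT;
}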

Finally for the first time I could actually ray trace a polygon mesh efficiently and accurately on GPU. This was a big breakthrough – suddenly a lot of things seemed possible. I tried stuff out just to see what could be done, how fast it would run etc. and I quickly came to an annoying conclusion – it wasn’t fast enough. I could trace a camera ray per pixel at the object at a decent resolution in a frame, but if it was meant to bounce or scatter and I tried to handle that it got way too slow. If I spread the work over multiple frames or allowed it seconds to run I could achieve some pretty nice results, though. The advantages of proper ambient occlusion, accurate sharp shadow intersections with no errors or artefacts, soft shadows from area lights and so on were obvious.

An early ambient occlusion ray tracing test

Unfortunately just being able to ray trace wasn’t enough. To make it useful I needed a lot of rays, and a lot of performance. I spent a month or so working on ways to at first speed up the techniques I was trying to do, then on ways to cache or reduce workload, then on ways to just totally cheat.

Eventually I had a solution where every face on every mesh was assigned a portion of a global lightmap, and all the bounce results were cached in a map per bounce index. The lightmaps were intentionally low resolution, meaning fewer rays, and I blurred it regularly to spread out and smooth results. The bounce map was also heavily temporally smoothed over frames. Then for the final ray I traced out at full resolution into the bounce map so I kept some sharpness. It worked..

Multiple-bounce GI using a light map to cache - bounce 1
Multiple-bounce GI using a light map to cache - bounce 2
Multiple-bounce GI using a light map to cache - bounce 3

.. But it wasn’t all that quick, either. It relied heavily on lots of temporal smoothing & reprojection, so if anything moved it took an age to update. However this wasn’t much of a problem because I was using a single BVH built on CPU – i.e. it was completely static. That wasn’t going to do.

At this point I underwent something of a reboot and changed direction completely. Instead of a structure that was quite efficient to trace but slow to build (and only buildable on CPU), I moved to a structure that was as simple to build as I could possibly think of: a voxel grid, where each cell contains a list of triangles that overlap it. Building it was trivial: you can pretty much just render the mesh into the grid and use a UAV to write each triangle’s index into every voxel it overlaps. Tracing it was trivial too – just ray march the voxels, and if a voxel contains triangles then trace the triangles in it. Naturally this was much less efficient to trace than a BVH – you could march over multiple cells that contain the same triangles and had to test them again, and you can’t skip free space at all, you have to trace every voxel. But it meant one important thing: I could ray trace dynamic scenes. It actually worked.

At this point we started work on an ill-fated demo for Revision 2012, which pushed this stuff into actual production.

Ray tracing – unreleased Revision demo, 2012

It was here we hit a problem. This stuff was, objectively speaking, pretty slow and not actually that good looking. It was noisy, and we needed loads of temporal smoothing and reprojection, so everything had to move really slowly to look decent. Clever though it probably was, it wasn’t achieving the kind of results that would stand up well enough on their own to justify the simple scenes it limited us to. That’s a hard lesson to learn with effect coding: no matter how clever the technique, how cool the theory, if it looks like a low resolution baked light map but takes 50ms every frame then it’s probably not worth doing, and the audience – who naturally finds it a lot harder than the creator of the demo to know what’s going on technically – is never going to “get it” either. As a result production came to a halt and in the end the demo was dropped; we used the violinist and the soundtrack as the intro sequence for Spacecut (1st place at Assembly 2012) instead, with an entirely different and much more traditional rendering path.

The work I did on ray tracing still proved useful – we got some new tech out of it, it taught me a lot about compute, DX11 and data structures, and we used the BVH routine for static particle collisions for some time afterwards. I also prototyped some other things like reflections with BVH tracing. And here my ray tracing journey comes to a close.

Ray tracing – unreleased Revision demo, 2012

Ray tracing – unreleased Revision demo, 2012

.. Until the end of 2012.

In the interim I had been working on a lot of techniques involving distance field meshing, fluid dynamics and particle systems, and also volume rendering techniques. Something that always came up was that the techniques typically involved discretising things onto a volume grid – or at least storing lists in a volume grid. The limitation was the resolution of the grid. Too low and it didn’t provide enough detail or had too much in each cell; too high and it ate too much memory and performance. This became a brick wall to making these techniques work.

One day I finally hit on a solution that allowed me to use a sparse grid or octree for these structures. This meant that the grid represented space with a very low resolution volume and then allowed each cell to be subdivided and refined in a tree structure like an octree – but only in the parts of the grid that actually contained stuff. Previously I had considered using these structures but could only build them bottom-up – i.e. start with the highest resolution, generate all the data, then optimise into a sparse structure. That didn’t help when it came to building the structure quickly, in limited memory and in real time – I needed to build it top down, i.e. sparse while generating. This is something I finally figured out, and it proved a solution to a whole bunch of problems.

Around that time I was reading up on sparse voxel octrees and I was wondering if it was actually performant – whether you could use it to ray trace ambient occlusion etc for realtime in a general case. Then I thought – why not put triangles in the leaf nodes so I could trace triangles too? The advantages were clear – fast realtime building times like the old voxel implementation, but with added space skipping when raytraced – and higher resolution grids so the cells contained less triangles. I got it working and started trying some things out. A path tracer, ambient occlusion and so on. Performance was showing a lot more potential. It also worked with any triangle content, including meshes that I generated on GPU – e.g. marching cubes, fluids etc.

At this point I made a decision about design. The last time I tried to use a tracer in a practical application it didn’t work out, because I aimed for something a) too heavy and b) too easy to fake with a lightmap. If I was going to show it, it needed to show something that a) couldn’t be done with a lightmap or be baked or faked easily and b) didn’t need loads of rays. So I decided to focus on reflections. Then I added refractions into the mix and started working on rendering some convincing glass. Glass is very hard to render without a raytracer – the light interactions and refraction are really hard to fake. It seemed like a scenario where a raytracer could win out and it’d be obvious it was doing something clever.

Over time, sparse voxel octrees just weren’t giving me the performance I needed when tracing – the traversal of the tree structure was just too slow and complex in the shader – so I ended up rewriting it all and replacing it with a different technique: brick maps. Brick maps are a kind of special case of sparse voxels: you only have 2 levels: a complete low resolution grid where filled cells contain pointers into an array of bricks. A brick is a small block of high resolution cells, e.g. 8x8x8 cells in a brick. So you have for example a 64x64x64 low res voxel map pointing into 8x8x8 bricks, giving an effective resolution of 512x512x512 – but stored sparsely, so you only need the memory for a small % of the total. The great thing is that, as well as being fast to build, it’s also fast to trace. The shader only has to deal with two levels, so it has much less branching and path divergence. This gave me much higher performance – around 2-3x the SVO method in many places. Finally things were getting practical and fast.

I started doing some proper tests. I found that I could take a reasonable scene – e.g. a city of 50,000 triangles – and build the data structure in 3-4 ms, then ray trace reflections in 6 ms. Adding in extra bounces in the reflection code was easy and only pushed the time up to around 10-12 ms. Suddenly I had a technique capable of rendering something that looked impressive – something that wasn’t going to be easily faked. Something that actually looked worth the time and effort it took.

Then I started working heavily on glass. Getting efficient raytracing working was only a small part of the battle; making a good looking glass shader even with the ray tracing working was a feat in itself. It took a whole lot of hacking, approximations and reading of maths to get a result.
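
To give a flavour of the kind of building block involved: the split between a reflected and a refracted ray is usually weighted by a Fresnel term. This is a generic Schlick-approximation sketch in HLSL, not the actual shader from the demo – eta is the relative index of refraction (n1/n2):

float SchlickFresnel(float3 incidentDir, float3 normal, float eta)
{
    float r0 = (1.0 - eta) / (1.0 + eta);
    r0 *= r0;                                              // reflectance at normal incidence
    float cosTheta = saturate(dot(-incidentDir, normal));
    return r0 + (1.0 - r0) * pow(1.0 - cosTheta, 5.0);
}

// Split an incoming ray at a glass surface into reflected and refracted directions,
// plus the weight of the reflected part.
void SplitGlassRay(float3 rayDir, float3 normal, float eta,
                   out float3 reflDir, out float3 refrDir, out float reflWeight)
{
    reflDir    = reflect(rayDir, normal);
    refrDir    = refract(rayDir, normal, eta);             // returns 0 on total internal reflection
    reflWeight = SchlickFresnel(rayDir, normal, eta);
    if (dot(refrDir, refrDir) == 0.0)
        reflWeight = 1.0;                                  // total internal reflection: everything reflects
}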

The evolution of glass – 1

The evolution of glass – 2

The evolution of glass – 3

The evolution of glass – final

After at last getting a decent result out of the ray tracer I started working on a demo for Revision 2013. At the time I was also working with Jani on a music video – the tail end of that project – so I left him to work on that and tried to do the demo on my own; sometimes doing something on your own is a valuable experience in itself, if nothing else. It meant that I basically had no art whatsoever, so I went on the rob – begged, stole and borrowed anything I could from my various talented artist friends, and filled in the gaps myself.

I was also, more seriously, completely without a soundtrack. Unfortunately Revision’s rules caused a serious headache: they don’t allow any GEMA-affiliated musicians to compete. GEMA-affiliated equates to “member of a copyright society” – which ruled out almost all the musicians I am friends with or have worked with before who are actually still active. Gargaj one day suggested to me, “why don’t you just ask this guy”, linking me to Cloudkicker – an amazing indie artist who happily appears to be against copyright organisations and releases his stuff as “pay what you want”. I mailed him and he gave me the OK. Just hoped he would be OK with the result..

I spent around 3 weeks making and editing content and putting it all together. Making a demo yourself is hard. You’re torn between fixing code bugs or design bugs; making the shaders & effects look good or actually getting content on screen. It proved tough but educational. Using your own tool & engine in anger is always a good exercise, and this time a positive one: it never crashed once (except when I reset the GPU with some shader bug). It was all looking good..

.. until I got to Revision and tried it on the compo PC. I had tested on a high end Radeon and assumed the Geforce 680 in the compo PC would behave similarly. It didn’t. It was about 60% the performance in many places, and had some real problems with fillrate-heavy stuff (the bokeh DOF was slower than the raytracer..). The performance was terrible, and worse – it was erratic. Jumping between 30 and 60 in many places. Thankfully the kind Revision compo organisers (Chaos et al) let me actually sit and work on the compo PC and do my best to cut stuff around until it ran OK, and I frame locked it to 30.

And .. we won! (I was way too hung over to show up to the prize giving though.)

Demo here:

5 Faces by Fairlight feat. CloudKicker
[Youtube]

After Revision I started working on getting the ray tracer working in the viewport, refining on idle. Much more work to do here, but some initial tests with AO showed promise. Watch this space.

AO in viewport – 1 second refine

AO in viewport – 10 second refine

06 May 19:15

How the cello controls the game in Cello Fortress

by Joost van Dongen
Jose L. Hidalgo

what a legend this Joost guy is :)

The most unique aspect of Cello Fortress is how a cellist does a live performance in front of an audience, while at the same time controlling a game. This is completely different from other music games, in which the musician usually plays on a fake plastic instrument, and even if he plays a real instrument, he does nothing but imitate an existing song. In most such other music games, there is hardly any real gameplay: just points based on how well you played the song.

Cello Fortress is a completely different affair: here the cellist is controlling a real game, with real choice and interaction. Depending on what his opponents do, the cellist plays different notes. The cellist can even do things like baiting the opponents with a certain attack and then switching to another.

So how does that work? What does the cellist need to do to trigger the various attacks? Check this trailer to see (and hear!) how it works:


Live video footage in the trailer shot by Zoomin.tv Games at the Indie Games Concert.

Here is an overview of the attacks as explained in the trailer:
  • Slow high notes: long range guns
  • Slow chords: homing missiles
  • Fast high notes: machine guns*
  • Fast chords: double machine guns*
  • Dissonant chords: flamethrowers
  • Slow low notes: create mines
  • Fast low notes: mines move towards the player
  • Special melody 1: obliterate left half of screen
  • Special melody 2: obliterate right half of screen
*Playing even faster notes increases the speed of the machine guns.

The key thing to realise is that the first seven of these attacks allow the cellist to play many different styles, melodies and rhythms, and still achieve that attack. The number of possibilities with "slow high notes" is literally infinite. This is a crucial aspect of the game, since it allows the cellist to improvise in many different ways, keeping each match of Cello Fortress fresh and varied. Having so much freedom also allows an experienced cellist to play fluently from one attack to the other.

There is real gameplay and choice in this. For example, something I often do when playing the cello in Cello Fortress, is play something slow to dare players to get close to my cannons. As soon as they do, I switch to fast chords to damage them from short range.

The special melodies are each 8 notes and have been defined beforehand. The fun in these is that the attack is announced when the 4th note is played, but the damage is not actually done until the 8th note is played. Players who pay close attention can hear the attack coming after only two notes, and thus flee before it even happens.

I can play the melody faster or slower to make the attack happen earlier or later. From a gameplay perspective, one would assume I always attack as quickly as possible, but my goal is actually not purely to win: I want to entertain the players and the audience. So I sometimes deliberately let them live to give them a more fun experience. This can be seen around 1:33 in the trailer: I make the final note very long to allow that player to escape. Just like in a film, the best moments are not when the hero dies, but when he narrowly escapes.



These controls were specifically chosen because they combine music and control in a natural way. Achieving this was more difficult than it may seem. In my very first prototype, the cello simply shot one bullet for every note, and the direction of the bullet depended on the pitch of the note. This turned out to play horribly: whenever the players moved from the left to the right, the cellist had to play a scale from low to high. When they moved back, the notes also had to go back from high to low. This made it completely impossible to play anything that sounded like good music.

Another thing I tweaked a lot is the mapping of which pattern triggers which attack. The current controls work quite well on an emotional level: the attack is linked to the feeling of the music. Slow, low notes often sound quite tense and sad on a cello (especially with the specific types of melodies I personally usually play), and alternating between slow and fast notes creates an awesomely menacing atmosphere. This can be seen in the trailer from around 1:00. Creating tension this way works incredibly well: I performed with Cello Fortress in front of an audience of several hundred people at the Indie Games Concert, and the noises from the audience made it clear that they experienced the tension very strongly.

A note I should make on this trailer is that in the real game, there is a slight delay between the music the cello plays and the moment the guns react to it. This is because analysing music in real time takes a bit of time. To make the trailer more understandable, I have shifted the sound a bit so the music fits the gameplay exactly.

While I am already performing with it, I am also still working on Cello Fortress to improve it. So what is next? My focus for the coming period is first creating real graphics, and after that I want to add a couple more attacks for the cellist. In the meanwhile, I hope more events, venues and exhibits will contact me to perform with Cello Fortress! Check www.cellofortress.com for tour dates and contact info!
24 Apr 13:13

My latest book, free on Amazon for 48 hours

by Borja Prieto

I have published on Amazon the series of posts I wrote about “how to be happy working for yourself”. As I have explained here before, writing the book as a series of blog posts is a relatively easy way to create a first version of the content. In fact, I have found it is the only approach that works for me, and I am going to use it for the two books I still have pending.

Is the book worth buying if you have already read the series?

I think so. If you followed the series and were paying attention, you will have counted 19 tips, even though they were numbered 1 to 20. The published version contains all 20, so even if it is only a little extra content, it is worth reading.

On top of that, the price is unbeatable: you can get the eBook for your Kindle for €0. Yes, I said 0. Only for 24 hours, from 24/4 to 25/4. If it is free, what do you have to lose? Buying it later is not a big loss either, since it costs €1, but if you can save that, why not get it now?

The book has a Creative Commons licence, so you have my permission to convert it to PDF, ePub or any other format, to share it with a friend or with your upstairs neighbour, and you can even sell it if you manage to, without paying me anything or asking for permission. Comment on it, share it, use the ideas for your blog or a presentation or whatever you like, criticise it, expand on it. The book is yours, and you can do whatever you want with it.

What I would like to ask of you is a big favour. In exchange for having received all this content for free, first on the blog and now as an ebook, please leave a review on Amazon after buying the book. Of course, you are free to say whatever you want: whether you liked it or not, whether it was useful to you or not.

If you are thinking of waiting a little so you can pay the euro the book costs and reward my work that way, don't. Of that euro, only 35 cents reach me, which is not going to make me rich. You help me much more by leaving a review and adding a sale to the total now, even at a cost of 0.

You can get it here: https://www.amazon.es/dp/B00CFPGFCK


This article first appeared on Desencadenado.

22 Apr 09:21

Making-of “Turtles all the way down”

by geidav
At Revision 2013 my demo group Brain Control released its newest production Turtles all the way down, which won the 64k-intro competition. Five guys spent a year of most of their spare time creating this 65,536-byte piece of binary data. In this making-of you can read about the process we went through creating the intro and about the technology used.
21 Apr 07:51

Virtuix hooks up Oculus Rift to its Omni treadmill, shows off 'True VR' (video)

by Joe Pollicino
Jose L. Hidalgo

Frack... You'll have to run!


Sure, omni-directional treadmills are nothing new, but Virtuix's take is worth a mention now that it's been shown off working in conjunction with the Oculus Rift. The company's been posting videos of its Omni treadmill working with Kinect for months, but last Thursday it upped the ante by adding the Rift. All told, it makes for what looks to be an intense VR session of Team Fortress 2 -- one-upping Sixense's Razer Hydra demo for the VR headset. The company's been working on this unit as an affordable solution for households, aiming to eventually try for funding via Kickstarter. Catch the video demo after the break and please resist throwing money at the screen in an attempt to get in on the action early.

Filed under: Gaming


Via: Mashable

Source: Virtuix (YouTube)