Hi, looking across the web, I haven't found a solid example of using compute shaders to filter images with HLSL. So here is an example that may help beginners who are struggling with compute shaders and image processing. It doesn't show the filtering itself, but the layout necessary to start filtering in a simple way.

I'm assuming you know how to set up your application with a device and device context, and how to render a quad on the screen with a simple vertex and index buffer, vertex shader and pixel shader (I've added the code).

So to render a picture on the screen we first have to load a texture and set up a texture resource view like this:

hr = D3DX11CreateShaderResourceViewFromFile( pd3dDevice, szFile, NULL, NULL, &m_pTextureRV, NULL );
if( FAILED( hr ) )
return hr;

pd3dDevice is our device, szFile an LPCWSTR string, and m_pTextureRV a texture resource view. Pretty straightforward, and you can load different image formats (tif, bmp, gif, png, jpg, ...) with this function.

Ok, now compile a vertex and pixel shader, create the input layout, and create a quad with a vertex and index buffer. We will not be using this right away, but later to check our compute shader "filtering".

struct QUAD_VERTEX
{
XMFLOAT3 Pos;
XMFLOAT2 Tex;
};

HRESULT CreateQuadVBandcompileshader( ID3D11Device* pd3dDevice )
{
HRESULT hr = S_OK;

// Compiles the vertex shader and then creates it for rendering a texture on a quad
ID3DBlob* pVSBlob = NULL;

hr = CreateShaderFromFile( pd3dDevice, L"shader.hlsl", NULL, NULL, "VS_Tex", "vs_5_0", 0, 0,
NULL, (ID3D11DeviceChild**)&m_pVertexShader, &pVSBlob, NULL);
// this helper calls D3DX11CompileFromFile and pd3dDevice->CreateVertexShader, you'll have to look it up
if( FAILED( hr ) )
return hr; // without this check, pVSBlob could still be NULL when it is used below
// Create vertex input layout for screen quad

const D3D11_INPUT_ELEMENT_DESC quadlayout[] =
{
     { "POSITION", 0, DXGI_FORMAT_R32G32B32_FLOAT, 0, 0, D3D11_INPUT_PER_VERTEX_DATA, 0 },
     { "TEXCOORD", 0, DXGI_FORMAT_R32G32_FLOAT, 0, 12, D3D11_INPUT_PER_VERTEX_DATA, 0 },
};

UINT numElements = ARRAYSIZE( quadlayout );
hr = pd3dDevice->CreateInputLayout( quadlayout, numElements, pVSBlob->GetBufferPointer(),
pVSBlob->GetBufferSize(), &m_pQuadVertexLayout );
pVSBlob->Release();
if( FAILED( hr ) )
return hr;

//pixel shader: compiles the shader from a file and then creates the pixel shader
ID3DBlob* pPSBlob = NULL;
hr = CreateShaderFromFile( pd3dDevice, L"shader.hlsl", NULL, NULL, "PS_Tex", "ps_5_0", 0, 0,
NULL, (ID3D11DeviceChild**)&m_pPixelShader, &pPSBlob, NULL);
if( pPSBlob )
pPSBlob->Release(); // the blob is not needed once the shader has been created
if( FAILED( hr ) )
return hr;

QUAD_VERTEX Verts[4];
Verts[0].Pos = XMFLOAT3( -1, -1, 0.5 );
Verts[0].Tex = XMFLOAT2( 0, 1 );
Verts[1].Pos = XMFLOAT3( -1, 1, 0.5 );
Verts[1].Tex = XMFLOAT2( 0, 0 );
Verts[2].Pos = XMFLOAT3( 1, -1, 0.5 );
Verts[2].Tex = XMFLOAT2( 1, 1 );
Verts[3].Pos = XMFLOAT3( 1, 1, 0.5 );
Verts[3].Tex = XMFLOAT2( 1, 0 );

D3D11_BUFFER_DESC bd;
ZeroMemory( &bd, sizeof(bd) );
bd.Usage = D3D11_USAGE_DEFAULT;
bd.ByteWidth = sizeof( QUAD_VERTEX ) * 4;
bd.BindFlags = D3D11_BIND_VERTEX_BUFFER;
bd.CPUAccessFlags = 0;

D3D11_SUBRESOURCE_DATA InitData;
ZeroMemory( &InitData, sizeof(InitData) );
InitData.pSysMem = Verts;
InitData.SysMemPitch = 0;
InitData.SysMemSlicePitch = 0;
V_RETURN( pd3dDevice->CreateBuffer( &bd, &InitData, &m_pQuadVB ) );

// Create index buffer
WORD indices[] =
{
0,1,2,
2,1,3
};

bd.Usage = D3D11_USAGE_DEFAULT;
bd.ByteWidth = sizeof( WORD ) * 6; // 6 indices needed for 2 triangles in a triangle list
bd.BindFlags = D3D11_BIND_INDEX_BUFFER;
bd.CPUAccessFlags = 0;
InitData.pSysMem = indices;
V_RETURN( pd3dDevice->CreateBuffer( &bd, &InitData, &m_pQuadIB ));
return hr;
}
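
The byte offsets in the input layout above (0 for POSITION, 12 for TEXCOORD) and the ByteWidth of the vertex buffer both follow from the memory layout of QUAD_VERTEX. Here is a minimal sketch of that arithmetic. Float3, Float2, QuadVertex and vertexBufferBytes are hypothetical stand-ins (for XMFLOAT3, XMFLOAT2 and QUAD_VERTEX) so that the snippet compiles without the DirectX headers, and it assumes 4-byte floats:

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical stand-ins for XMFLOAT3 / XMFLOAT2 (three and two packed floats).
struct Float3 { float x, y, z; };
struct Float2 { float x, y; };

// Same member order as QUAD_VERTEX in the post.
struct QuadVertex
{
    Float3 Pos;
    Float2 Tex;
};

// Compile-time checks of the numbers used in the input layout (assumes 4-byte floats).
static_assert(offsetof(QuadVertex, Pos) == 0,  "POSITION starts at byte 0");
static_assert(offsetof(QuadVertex, Tex) == 12, "TEXCOORD follows three 4-byte floats");
static_assert(sizeof(QuadVertex) == 20,        "five floats per vertex");

// ByteWidth for a vertex buffer holding vertexCount vertices.
inline std::size_t vertexBufferBytes(std::size_t vertexCount)
{
    assert(vertexCount > 0);
    return vertexCount * sizeof(QuadVertex);
}
```

For the four corners of the quad this gives 80 bytes, the same value as sizeof( QUAD_VERTEX ) * 4 in the listing.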

So now you should be able to render the picture onto the screen (assuming you have already set up your render target view and viewport).

But that would be too simple. What I would like to do now is set up a compute shader for image filtering. I've seen an example where the data in a texture resource is retrieved through a map and unmap procedure, and then uploaded to a structured buffer. But that means the data is first uploaded to graphics memory in a texture object, then brought back to CPU memory, and then uploaded again to a structured buffer. Nonsense, that can be done much more easily.

A second example you may find somewhere shows how a 2D texture object is sampled in a compute shader through coordinates retrieved from a structured buffer. Okay, that's better, but using a structured buffer for uv coordinates is really not necessary. The important thing here is that you can retrieve data from a texture resource with a compute shader and stream this data directly into a structured buffer, which can then be used as a resource in a pixel shader.

So here we go: based on the number of pixels in the texture, I define one structured buffer (m_pBufResult) with an associated shader resource view (m_pBufResultRV) for reading it as input, and an unordered access view (m_pBufResultUAV) for writing to it as output (to keep things as simple as possible).

void CreateMyBuffersForComputing( ID3D11Device* pd3dDevice)
{
//Get a pointer to the texture resource from the texture resource view
ID3D11Resource* pResource;
m_pTextureRV->GetResource( &pResource);

//Since the ID3D11Texture2D interface inherits from ID3D11Resource, we can cast the pointer and access the descriptor
D3D11_TEXTURE2D_DESC TxDesc;
((ID3D11Texture2D*)pResource)->GetDesc( &TxDesc);

//Using the descriptor we retrieve the number of pixels of the texture and use this for defining the structured buffer.
UINT NmPixels = TxDesc.Width * TxDesc.Height;
pResource->Release();

//Now the structured buffer with shader resource view and unordered access view for use in the compute shader
//the resource view can also be used in a pixel shader!!!!!!!

CreateStructuredBuffer( pd3dDevice, sizeof(XMFLOAT4), NmPixels, NULL, &m_pBufResult );
CreateBufferUAV( pd3dDevice, m_pBufResult, &m_pBufResultUAV );
CreateBufferSRV( pd3dDevice, m_pBufResult, &m_pBufResultRV );

#if defined(_DEBUG)
if ( m_pBufResultUAV )
m_pBufResultUAV->SetPrivateData( WKPDID_D3DDebugObjectName, sizeof( "Result UAV" ) - 1, "Result UAV" );
#endif
}
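
The structured buffer above holds one float4 (16 bytes) per pixel, so its size follows directly from the texture dimensions. As a rough sketch of that calculation, with an overflow guard because D3D11_BUFFER_DESC::ByteWidth is a 32-bit UINT (pixelCount and structuredBufferBytes are hypothetical helper names, not part of the code above):

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Bytes per element: one float4 (four 32-bit floats), as in sizeof(XMFLOAT4).
constexpr std::uint32_t kBytesPerPixel = 16;

// Number of elements for a width x height texture, computed in 64 bits
// so that the multiplication cannot wrap around.
inline std::uint64_t pixelCount(std::uint32_t width, std::uint32_t height)
{
    return static_cast<std::uint64_t>(width) * height;
}

// Writes the buffer size in bytes and returns true if it fits in 32 bits;
// returns false (leaving bytesOut untouched) otherwise.
inline bool structuredBufferBytes(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t& bytesOut)
{
    assert(width > 0 && height > 0);
    const std::uint64_t pixels = pixelCount(width, height);
    const std::uint64_t maxPixels =
        std::numeric_limits<std::uint32_t>::max() / kBytesPerPixel;
    if (pixels > maxPixels)
        return false;
    bytesOut = static_cast<std::uint32_t>(pixels * kBytesPerPixel);
    return true;
}
```

For a 1500 x 1000 image this comes to 24,000,000 bytes. The guard only covers the 32-bit field; the hardware may impose a lower limit on resource size.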

I copied the following code; it explains how to set up structured buffers in order to multiply two matrices (thank you Microsoft for this nice example, http://www.getcodesamples.com/src/C859DF09/BA8F51F3, or actually the people who wrote this piece of code; why are they all so anonymous at this company?). That page also shows a neat way to do debugging; you can also add some code there to create a Texture2D with the mapped data and save an image file with D3DX11SaveTextureToFile.

HRESULT CreateStructuredBuffer( ID3D11Device* pDevice, UINT uElementSize, UINT uCount, VOID* pInitData, ID3D11Buffer** ppBufOut )
{
     *ppBufOut = NULL;
     D3D11_BUFFER_DESC desc;
     ZeroMemory( &desc, sizeof(desc) );
     desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE;
     desc.ByteWidth = uElementSize * uCount;
     desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_STRUCTURED;
     desc.StructureByteStride = uElementSize;
     if ( pInitData )
     {
     D3D11_SUBRESOURCE_DATA InitData;
     InitData.pSysMem = pInitData; //source data
     return pDevice->CreateBuffer( &desc, &InitData, ppBufOut );
     } else
          return pDevice->CreateBuffer( &desc, NULL, ppBufOut );
}

HRESULT CreateRawBuffer( ID3D11Device* pDevice, UINT uSize, VOID* pInitData, ID3D11Buffer** ppBufOut )
{
     *ppBufOut = NULL;
     D3D11_BUFFER_DESC desc;
     ZeroMemory( &desc, sizeof(desc) );
     desc.BindFlags = D3D11_BIND_UNORDERED_ACCESS | D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_INDEX_BUFFER |      D3D11_BIND_VERTEX_BUFFER;

     desc.ByteWidth = uSize;
     desc.MiscFlags = D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS;
     if ( pInitData )
     {
     D3D11_SUBRESOURCE_DATA InitData;
     InitData.pSysMem = pInitData;
     return pDevice->CreateBuffer( &desc, &InitData, ppBufOut );
     } else
          return pDevice->CreateBuffer( &desc, NULL, ppBufOut );
}

//--------------------------------------------------------------------------------------
// Create Shader Resource View for Structured or Raw Buffers
//--------------------------------------------------------------------------------------
HRESULT CreateBufferSRV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, ID3D11ShaderResourceView** ppSRVOut )
{
     D3D11_BUFFER_DESC descBuf;
     ZeroMemory( &descBuf, sizeof(descBuf) );
     pBuffer->GetDesc( &descBuf );
     D3D11_SHADER_RESOURCE_VIEW_DESC desc;
     ZeroMemory( &desc, sizeof(desc) );
     desc.ViewDimension = D3D11_SRV_DIMENSION_BUFFEREX;
     desc.BufferEx.FirstElement = 0;
     if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
     {

// This is a Raw Buffer
     desc.Format = DXGI_FORMAT_R32_TYPELESS;
     desc.BufferEx.Flags = D3D11_BUFFEREX_SRV_FLAG_RAW;
     desc.BufferEx.NumElements = descBuf.ByteWidth / 4;
     } else
         if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
     {

// This is a Structured Buffer
     desc.Format = DXGI_FORMAT_UNKNOWN;
     desc.BufferEx.NumElements = descBuf.ByteWidth / descBuf.StructureByteStride;
     } else
     {
          return E_INVALIDARG;
     }
     return pDevice->CreateShaderResourceView( pBuffer, &desc, ppSRVOut );
}

//--------------------------------------------------------------------------------------
// Create Unordered Access View for Structured or Raw Buffers
//--------------------------------------------------------------------------------------
HRESULT CreateBufferUAV( ID3D11Device* pDevice, ID3D11Buffer* pBuffer, ID3D11UnorderedAccessView** ppUAVOut )
{
     D3D11_BUFFER_DESC descBuf;
     ZeroMemory( &descBuf, sizeof(descBuf) );
     pBuffer->GetDesc( &descBuf );

     D3D11_UNORDERED_ACCESS_VIEW_DESC desc;
     ZeroMemory( &desc, sizeof(desc) );
     desc.ViewDimension = D3D11_UAV_DIMENSION_BUFFER;
     desc.Buffer.FirstElement = 0;
     if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_ALLOW_RAW_VIEWS )
     {
// This is a Raw Buffer
     desc.Format = DXGI_FORMAT_R32_TYPELESS; // Format must be DXGI_FORMAT_R32_TYPELESS, when creating Raw Unordered Access View
     desc.Buffer.Flags = D3D11_BUFFER_UAV_FLAG_RAW;
     desc.Buffer.NumElements = descBuf.ByteWidth / 4;
     } else
          if ( descBuf.MiscFlags & D3D11_RESOURCE_MISC_BUFFER_STRUCTURED )
     {
// This is a Structured Buffer
     desc.Format = DXGI_FORMAT_UNKNOWN; // Format must be DXGI_FORMAT_UNKNOWN when creating a view of a structured buffer
     desc.Buffer.NumElements = descBuf.ByteWidth / descBuf.StructureByteStride;
     } else
     {
          return E_INVALIDARG;
     }

     return pDevice->CreateUnorderedAccessView( pBuffer, &desc, ppUAVOut );
}

 

Next, compile the shader and create the compute shader (m_pComputeShader):

hr = CreateShaderFromFile( pd3dDevice, L"shader.hlsl", pdefines, NULL, "CS", "cs_5_0", NULL, NULL,
NULL, (ID3D11DeviceChild**) &m_pComputeShader, NULL, NULL); 

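Here is the HLSL code to read the Texture2D resource and stream its data to a structured buffer using an UnorderedAccessView. Note that I've declared the texture resource as containing float4s. It may look like magic that this works with all the image formats I tried, but for the usual normalized formats (such as 8-bit UNORM) the values are converted to floats when the texture is read. Pixwide and Pixhigh are not declared in the listings here; they have to reach the shader some other way, for example as macros through pdefines.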

////////////////////////compute shader/////////////////////////////
Texture2D <float4> txFrame : register( t0 );
RWStructuredBuffer<float4> rwbuffer : register( u0);

[numthreads(30, 25, 1)]
void CS( uint3 i : SV_DispatchThreadID)
{
//Read color and write to buffer

     uint bufp = i.x + i.y*1500;
     uint3 indx = uint3( uint(bufp%Pixwide), uint(bufp/Pixwide), 0 );
     rwbuffer[bufp] = txFrame.Load( indx );

}

It's all about indexing the right way. The value 1500 that multiplies the y value of the dispatch thread ID is the X dimension of the pContext->Dispatch call (50) multiplied by the x dimension of numthreads (30). (http://msdn.microsoft.com/en-us/library/windows/desktop/ff471566%28v=vs.85%29.aspx)
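
To make the relation between numthreads, the Dispatch arguments and the 1500 stride concrete, here is a small sketch. groupCount and rowStride are hypothetical helper names, and rounding the group count up is my own addition for sizes that are not a multiple of the group size:

```cpp
#include <cassert>
#include <cstdint>

// Thread-group size declared in the shader: [numthreads(30, 25, 1)].
constexpr std::uint32_t kThreadsX = 30;
constexpr std::uint32_t kThreadsY = 25;

// Number of groups needed to cover `extent` threads with groups of
// `groupSize` threads, rounded up so a partial group is still launched.
// (Assumes extent + groupSize does not exceed the 32-bit range.)
inline std::uint32_t groupCount(std::uint32_t extent, std::uint32_t groupSize)
{
    assert(groupSize > 0);
    return (extent + groupSize - 1) / groupSize;
}

// Total number of threads along X: the factor that multiplies i.y in the shader.
inline std::uint32_t rowStride(std::uint32_t groupsX)
{
    return groupsX * kThreadsX;
}
```

With 1500 threads in X and 1000 in Y this reproduces Dispatch( 50, 40, 1 ) and a stride of 1500. When the group count is rounded up, some threads fall past the end of the image, so a bounds check on the index in the shader is a sensible precaution.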

View the texture as a linear buffer. It's Pixwide pixels wide, so the width index is bufp modulo Pixwide, and the height index is the integer part of bufp/Pixwide (unsigned integer division rounds down).

Use a uint3 as the index for the Texture2D.Load function (x, y and mip level), retrieve the value and fill the structured buffer. As easy as that.
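
The same index arithmetic can be tried out on the CPU. This is only a sketch; PixelCoord, toCoord and toLinear are hypothetical names that mirror the shader's bufp % Pixwide and bufp / Pixwide:

```cpp
#include <cassert>
#include <cstdint>

// Texture coordinate of one pixel.
struct PixelCoord { std::uint32_t x; std::uint32_t y; };

// Linear buffer position -> texture coordinate,
// like uint3( bufp % Pixwide, bufp / Pixwide, 0 ) in the shader.
inline PixelCoord toCoord(std::uint32_t bufp, std::uint32_t pixWide)
{
    assert(pixWide > 0);
    return PixelCoord{ bufp % pixWide, bufp / pixWide };
}

// Texture coordinate -> linear buffer position (the inverse mapping).
inline std::uint32_t toLinear(PixelCoord c, std::uint32_t pixWide)
{
    assert(c.x < pixWide);
    return c.x + c.y * pixWide;
}
```

For example, position 3001 in a 1500 pixel wide image is pixel (1, 2). Note that when the dispatch stride equals Pixwide, as with 1500 here, the result is simply (i.x, i.y); the detour through bufp only matters when the two differ.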

// Run CS
//--------------------------------------------------------------------------------------
void OBJ::RunMyComputeShader( ID3D11DeviceContext* pContext, UINT X, UINT Y, UINT Z )
{

pContext->CSSetShaderResources( 0, 1, &m_pTextureRV );
pContext->CSSetUnorderedAccessViews( 0, 1, &m_pBufResultUAV, NULL );
pContext->CSSetShader( m_pComputeShader, NULL, 0 );

pContext->Dispatch( X, Y, Z ); // X=50, Y=40, Z=1; Y depends on the number of pixels in my texture

pContext->CSSetShader( NULL, NULL, 0 );

ID3D11UnorderedAccessView* ppUAViewNULL[1] = { NULL };
pContext->CSSetUnorderedAccessViews( 0, 1, ppUAViewNULL, NULL );

ID3D11ShaderResourceView* ppSRVNULL[2] = { NULL, NULL };
pContext->CSSetShaderResources( 0, 2, ppSRVNULL );

}
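
To see why the Dispatch dimensions have to match the number of pixels, it can help to emulate the pass on the CPU. The following is only a sketch under my own assumptions: Float4 and emulateCopyPass are hypothetical names, the loops stand in for the GPU threads, and the bounds check is an addition that the shader above does not have:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for an HLSL float4.
struct Float4 { float r, g, b, a; };

// Visits every thread of a groupsX x groupsY dispatch with
// threadsX x threadsY threads per group and copies one pixel per thread.
inline std::vector<Float4> emulateCopyPass(const std::vector<Float4>& texture,
                                           std::uint32_t pixWide, std::uint32_t pixHigh,
                                           std::uint32_t groupsX, std::uint32_t groupsY,
                                           std::uint32_t threadsX, std::uint32_t threadsY)
{
    assert(pixWide > 0 && pixHigh > 0);
    assert(texture.size() == static_cast<std::size_t>(pixWide) * pixHigh);

    // Pixels that no thread reaches stay black.
    std::vector<Float4> result(texture.size(), Float4{ 0.0f, 0.0f, 0.0f, 0.0f });

    const std::uint32_t stride = groupsX * threadsX;   // the "1500" in the shader
    const std::uint32_t rows   = groupsY * threadsY;
    for (std::uint32_t iy = 0; iy < rows; ++iy)
    {
        for (std::uint32_t ix = 0; ix < stride; ++ix)
        {
            const std::uint64_t bufp = ix + static_cast<std::uint64_t>(iy) * stride;
            if (bufp >= result.size())
                continue;   // surplus thread: no pixel to copy
            const std::size_t x = static_cast<std::size_t>(bufp % pixWide);
            const std::size_t y = static_cast<std::size_t>(bufp / pixWide);
            result[static_cast<std::size_t>(bufp)] = texture[x + y * pixWide];
        }
    }
    return result;
}
```

If the dispatch launches fewer threads than there are pixels, the tail of the result stays black, which is a quick way to spot wrong X and Y values.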

 

As it is, this is actually wasting time; we could already have done some nice filtering.

But now you may want to check whether it all worked, with some standard stuff:

Here is the HLSL code for rendering a texture to the screen from a structured buffer. (Note, however, that this may only work on graphics cards with shader model 5 capabilities. I'm not sure about earlier versions.)

StructuredBuffer<float4> OutputBuf : register (t1);

struct VS_STRUCT
{
     float3 Pos : POSITION;
     float2 Tex : TEXCOORD0;
};

struct PS_INPUT
{
     float4 Pos : SV_POSITION;
     float2 Tex : TEXCOORD0;
};

//just copy, don't forget to make the SV_POSITION a float4
PS_INPUT VS_Tex( VS_STRUCT input )
{
     PS_INPUT output = (PS_INPUT)0;
     output.Pos = float4(input.Pos.xyz, 1);
     output.Tex = input.Tex;

     return output;
}

float4 PS_Tex( PS_INPUT input) : SV_Target
{

     int coord = floor(input.Tex.x*Pixwide) + floor(input.Tex.y*Pixhigh)*Pixwide;
     return OutputBuf[coord];

}

Note that I'm using a single int as index, because the structured buffer only has one dimension. The texture UVs are floats going from 0 to 1.0, and need to be combined and converted to an integer index. Tex.x is the fraction of the image width in the horizontal direction, and Tex.y is the fraction of the image height in the vertical direction. Since the pixels in the structured buffer are ordered in horizontal rows, I have to take the height index * the width of the image (Pixwide) + the horizontal index. (To make it look even better, I could define my own linear interpolation here with some more index arithmetic.)
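
As an illustration of this index arithmetic, and of what a hand-written linear interpolation could look like, here is a CPU sketch. It is not the shader from the post: uvToIndex adds a clamp so that a coordinate of exactly 1.0 cannot step past the last column or row, sampleBilinear works on a single-channel buffer for brevity, and all function names are hypothetical:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Clamps value to [0, count - 1] and converts it to an index.
inline std::uint32_t clampToIndex(float value, std::uint32_t count)
{
    const float last = static_cast<float>(count - 1);
    return static_cast<std::uint32_t>(std::min(std::max(value, 0.0f), last));
}

// Nearest-pixel lookup: the arithmetic of PS_Tex plus the clamp.
inline std::uint32_t uvToIndex(float u, float v,
                               std::uint32_t pixWide, std::uint32_t pixHigh)
{
    assert(pixWide > 0 && pixHigh > 0);
    const std::uint32_t x = clampToIndex(std::floor(u * static_cast<float>(pixWide)), pixWide);
    const std::uint32_t y = clampToIndex(std::floor(v * static_cast<float>(pixHigh)), pixHigh);
    return x + y * pixWide;
}

// Linear blend of a and b, with t in [0, 1].
inline float mix(float a, float b, float t) { return a + (b - a) * t; }

// Bilinear lookup in a single-channel buffer stored row by row.
inline float sampleBilinear(const std::vector<float>& buf, float u, float v,
                            std::uint32_t pixWide, std::uint32_t pixHigh)
{
    assert(pixWide > 0 && pixHigh > 0);
    assert(buf.size() == static_cast<std::size_t>(pixWide) * pixHigh);
    // Pixel centres sit at (x + 0.5) / pixWide, so shift by half a pixel first.
    const float px = std::min(std::max(u * static_cast<float>(pixWide) - 0.5f, 0.0f),
                              static_cast<float>(pixWide - 1));
    const float py = std::min(std::max(v * static_cast<float>(pixHigh) - 0.5f, 0.0f),
                              static_cast<float>(pixHigh - 1));
    // px and py are not negative, so truncation acts as floor here.
    const std::uint32_t x0 = static_cast<std::uint32_t>(px);
    const std::uint32_t y0 = static_cast<std::uint32_t>(py);
    const std::uint32_t x1 = std::min(x0 + 1u, pixWide - 1u);
    const std::uint32_t y1 = std::min(y0 + 1u, pixHigh - 1u);
    const float tx = px - static_cast<float>(x0);
    const float ty = py - static_cast<float>(y0);
    const float top    = mix(buf[x0 + y0 * pixWide], buf[x1 + y0 * pixWide], tx);
    const float bottom = mix(buf[x0 + y1 * pixWide], buf[x1 + y1 * pixWide], tx);
    return mix(top, bottom, ty);
}
```

In HLSL the same steps would be applied to each float4 read from the structured buffer; the half-pixel shift is a design choice that keeps the interpolated image aligned with the nearest-pixel version.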

// Render the image

void RenderPicture( ID3D11DeviceContext* pd3dImmediateContext)
{

     UINT stride = sizeof(QUAD_VERTEX ); //all basic stuff......
     UINT offset = 0;
     pd3dImmediateContext->IASetInputLayout( m_pQuadVertexLayout );
     pd3dImmediateContext->IASetPrimitiveTopology( D3D11_PRIMITIVE_TOPOLOGY_TRIANGLELIST );
     pd3dImmediateContext->IASetVertexBuffers( 0, 1, &m_pQuadVB, &stride, &offset );
     pd3dImmediateContext->IASetIndexBuffer( m_pQuadIB, DXGI_FORMAT_R16_UINT, 0 );
     pd3dImmediateContext->PSSetShaderResources( 1, 1, &m_pBufResultRV ); //our Buffer resource view in register 1
     pd3dImmediateContext->VSSetShader( m_pVertexShader, NULL, 0 ); //note that I do not define a sampler!!!
     pd3dImmediateContext->PSSetShader( m_pPixelShader, NULL, 0 );
     pd3dImmediateContext->DrawIndexed( 6, 0, 0 ); //Okay, render the picture, not really very exciting!
     pd3dImmediateContext->VSSetShader( NULL, NULL, 0 );
     pd3dImmediateContext->PSSetShader( NULL, NULL, 0 );
     ID3D11ShaderResourceView* ppSRVNULL[2] = { NULL, NULL};
     pd3dImmediateContext->PSSetShaderResources( 0, 2, ppSRVNULL );

}

m_pSwapChain->Present( 1, 0 );

Instead of rendering the image, I could also have processed it through several compute shader passes in a row (I haven't tried that yet). However, then I would have had to define a second structured buffer, with its own shader resource view and unordered access view. This is because the shader resource view and the unordered access view of the same structured buffer cannot be bound to the pipeline at the same time. So don't forget to unbind resources when you reuse them in a different context.
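
The two-buffer idea is often called ping-ponging. Below is a CPU sketch of the bookkeeping only, since I haven't run it on the GPU: PingPong and runPasses are hypothetical names, and the two std::vector objects stand in for two structured buffers that each have their own SRV and UAV:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Two buffers: one is read (SRV role) while the other is written (UAV role).
struct PingPong
{
    std::vector<float> buffers[2];
    int readIndex = 0;   // index of the buffer currently used as input

    explicit PingPong(std::size_t count)
    {
        buffers[0].resize(count);
        buffers[1].resize(count);
    }

    std::vector<float>& input()  { return buffers[readIndex]; }
    std::vector<float>& output() { return buffers[1 - readIndex]; }

    // After a pass, the buffer just written becomes the next input.
    void swap() { readIndex = 1 - readIndex; }
};

// Runs `passes` passes of a per-element operation; every pass reads one
// buffer and writes the other, so no buffer is read and written at once.
template <typename Op>
void runPasses(PingPong& pp, int passes, Op op)
{
    assert(passes >= 0);
    for (int p = 0; p < passes; ++p)
    {
        const std::vector<float>& in = pp.input();
        std::vector<float>& out = pp.output();
        for (std::size_t k = 0; k < in.size(); ++k)
            out[k] = op(in[k]);
        pp.swap();
    }
}
```

On the GPU, the swap would correspond to unbinding the UAV of the buffer just written and binding its SRV for the next Dispatch; after the last pass, input() holds the final result.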

 

Well that's it...........