Redintegrate - #DevOps? - Cemetech | Forum

AHelper
1
Developers, developers, developers (Posts: 2367)

07 Dec 2013 04:14:56 am

An even more polished example, with full source:
Code:

// Main test case
#include <context.h>
#include <task.h>
#include <taskworker.h>
#include <iostream>
#include <string.h>
#include <ffi.h>
#include <colors.h>

using namespace std;
using namespace Compute;

// Fallback for test 1
void testCPU(TaskWorker* wrk, char* in, char* out)
{
  const char name[] = "CPU";
  unsigned int i = wrk->get_global_id(0);

  if(i >= 28 && i <= 30)
  {
    out[i] = name[i - 28];
  }
  else
  {
    out[i] = in[i];
  }
}

// Fallback for test 2
void testCPU2(TaskWorker* wrk, unsigned int* out)
{
  unsigned int g = wrk->get_global_id(0);
  unsigned int l = wrk->get_local_id(0);

  out[g] = l;
}

int main(int argc, char** argv)
{
  string input;
  char output[256];

  auto test1 = [&input, &output](bool forceCPU)
  {
    // Clear out output
    memset(output, 0, sizeof(output));
    // Make Task, Context is auto-created as Singleton
    Task test("Test task", "test.cl", true, "main", testCPU);

    // Fallback?
    if(forceCPU) test.forceFallback();

    // Input arguments
    test.setArgument(input.size() + 1, (void*)input.data(), CL_MEM_COPY_HOST_PTR | CL_MEM_READ_ONLY, &ffi_type_pointer);
    // Output arguments
    test.setReturn(256, output, CL_MEM_WRITE_ONLY, &ffi_type_pointer);
    // Dimensions
    test.setWorkItems(1, input.size() + 1);
    // Run it
    test.start();
    test.join();
    test.end();
    // Check results
    cout << output << endl;
  };

  // Normal test
  input = string("Hello world from AHelper0's ___ 550 Ti from OpenCL");
  test1(false);
  // Fallback test
  input = string("Hello world from AHelper0's ___ using libffi in fallback mode");
  test1(true);

  unsigned int locals[10];

  auto test2 = [&locals](bool forceCPU)
  {
    Task test2("Test2 task", "test2.cl", true, "main", testCPU2);
    test2.setReturn(sizeof(locals), locals, CL_MEM_WRITE_ONLY, &ffi_type_pointer);
    test2.setWorkItems(1, 10);

    if(forceCPU)
      test2.forceFallback(); // Forcing can happen anywhere, even mid-OpenCL setup

    test2.start();
    test2.join();
    test2.end();

    for(int i = 0; i < 10; i++)
    {
      cout << locals[i] << " ";
    }

    cout << endl;
  };

  // Normal test
  test2(false);
  // Fallback test
  test2(true); // 0's because local mode not supported. Default is to say 0 on error in OpenCL, so...
}

test.cl:
Code:

__kernel void main(__global char *in, __global char *out)
{
  const char name[] = "GTX";

  unsigned int i = get_global_id(0);

  if(i >= 28 && i <= 30)
  {
    out[i] = name[i-28];
  }
  else
  {
    out[i] = in[i];
  }
}

test2.cl:
Code:

__kernel void main(__global unsigned int *out)
{
  unsigned int g = get_global_id(0);
  unsigned int l = get_local_id(0);

  out[g] = l;
}

This test shows off the Compute library running both through OpenCL and through the forced CPU fallback. The first test copies a string and inserts different text depending on which compute function ran. The second test prints the local ID of each work item (in fallback mode it stays at zero, as the local workers are mucked about with internally):
Code:

![HAL@HAL build]$ ./test5
Compute::Compute(): Loaded libOpenCL.so
Compute::Compute(): Detected 1 device(s):
  GeForce GTX 550 Ti 1024
Compute library initialized
Hello world from AHelper0's GTX 550 Ti from OpenCL
Hello world from AHelper0's CPU using libffi in fallback mode
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0
Compute library unloaded

To show off the fallback in action, here is what happens if the OpenCL code fails (the kernel fails to compile in this case):
Code:

![HAL@HAL build]$ ./test5
Compute::Compute(): Loaded libOpenCL.so
Compute::Compute(): Detected 1 device(s):
  GeForce GTX 550 Ti 1024
Compute library initialized
Compute::Task::Task(): Failed to compile CL code: :9:14: error: use of undeclared identifier 'namE'; did you mean 'name'?
    out[i] = namE[i-28];
             ^~~~
             name
:3:14: note: 'name' declared here
  const char name[] = "GTX";
             ^

Hello world from AHelper0's CPU 550 Ti from OpenCL
Compute::Task::Task(): Failed to compile CL code: :9:14: error: use of undeclared identifier 'namE'; did you mean 'name'?
    out[i] = namE[i-28];
             ^~~~
             name
:3:14: note: 'name' declared here
  const char name[] = "GTX";
             ^

Hello world from AHelper0's CPU using libffi in fallback mode
0 1 2 3 4 5 6 7 8 9
0 0 0 0 0 0 0 0 0 0
Compute library unloaded
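
For readers unfamiliar with the pattern, here is a minimal sketch of how a fallback like this could be structured. It is not the Compute library's code: `FallbackTask`, `WorkItem`, and `runDemo` are hypothetical names, and the "accelerated" path is simulated with a plain loop so the example stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Hypothetical stand-in for a worker handle; it only carries the global index.
struct WorkItem {
  std::size_t globalId;
};

// Hypothetical task: tries an accelerated path first and runs a per-item CPU
// function instead when that path fails or the fallback is forced.
class FallbackTask {
public:
  using Accelerated = std::function<bool(std::size_t)>;  // returns false on failure
  using Kernel = std::function<void(const WorkItem&)>;

  FallbackTask(Accelerated accelerated, Kernel cpuKernel)
      : accelerated_(std::move(accelerated)), cpuKernel_(std::move(cpuKernel)) {}

  void forceFallback() { forced_ = true; }

  // Returns true if the accelerated path ran, false if the CPU fallback ran.
  bool run(std::size_t workItems) {
    if (!forced_ && accelerated_ && accelerated_(workItems)) return true;
    for (std::size_t i = 0; i < workItems; ++i) cpuKernel_(WorkItem{i});
    return false;
  }

private:
  Accelerated accelerated_;
  Kernel cpuKernel_;
  bool forced_ = false;
};

// Copies `input`, replacing characters 28..30 with a three-letter tag that
// shows which path produced the output ("GTX" or "CPU").
std::string runDemo(const std::string& input, bool acceleratorWorks, bool force) {
  assert(input.size() > 30);
  std::string output(input.size(), '\0');
  auto copyWithName = [&](std::size_t i, const char* name) {
    output[i] = (i >= 28 && i <= 30) ? name[i - 28] : input[i];
  };
  FallbackTask task(
      [&](std::size_t n) -> bool {
        if (!acceleratorWorks) return false;  // e.g. the kernel failed to compile
        for (std::size_t i = 0; i < n; ++i) copyWithName(i, "GTX");
        return true;
      },
      [&](const WorkItem& w) { copyWithName(w.globalId, "CPU"); });
  if (force) task.forceFallback();
  task.run(input.size());
  return output;
}
```

Calling `runDemo` with `acceleratorWorks` set to false mimics the compile-failure run above: the caller gets the "CPU" string without having asked for the fallback.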

Oh, and did I mention that all of the output is styled/colored?

Random sidenote: Current class count of the project is 91 classes with 16 subprojects.

AHelper
1
Developers, developers, developers (Posts: 2367)

08 Dec 2013 02:38:30 am

SmartBody looks insanely cool, I hope I can use it at some point.

KermMartian
Site Admin (Posts: 64050)

08 Dec 2013 01:41:05 pm

AHelper wrote:

Random sidenote: Current class count of the project is 91 classes with 16 subprojects.

Wow, that's quite extensive indeed. :)

I think this needs a few more screenshots and a concise description so I can let more people know about this project somehow. ;)

AHelper
1
Developers, developers, developers (Posts: 2367)

08 Dec 2013 02:33:42 pm

KermMartian wrote:

AHelper wrote:

Random sidenote: Current class count of the project is 91 classes with 16 subprojects.

Wow, that's quite extensive indeed. :)

I think this needs a few more screenshots and a concise description so I can let more people know about this project somehow. ;)

That is! I am currently tracking down a bug in my usage of SDL2 (or in SDL2 itself) that is causing (one of my PCs running) Windows to not render anything until the window gets destroyed. This happens on my basic SDL2 test case, so I know it really shouldn't be anything complex killing it.

Found it, double buffer code was missing. I had the change in one test case, but nowhere else. All seems fine now.

AHelper
1
Developers, developers, developers (Posts: 2367)

09 Dec 2013 01:55:53 am

About time I set up Trac to keep track of all of the madness in my project. If I had been using it before, the previous bug wouldn't have been forgotten (I remember telling myself not to forget about the change after I found the bug again). I now have a good place to put a bunch of documentation, such as network protocols, file type documentation, milestone tasks, etc. (The current milestone is COLLADA animations, which is what I was going to do before I stopped for a while.)

AHelper
1
Developers, developers, developers (Posts: 2367)

10 Dec 2013 02:51:08 am

Derp, just figured out how animations and controllers are connected: <channel> uses the target attribute to point to a bone and matrix for an armature <node> under <visual_scene>. I was stuck on this for a few days :|

AHelper
1
Developers, developers, developers (Posts: 2367)

14 Dec 2013 01:39:07 pm
Last edited by AHelper on 01 Sep 2016 12:42:24 am; edited 1 time in total

The first steps of loading all of the animation data are done, and suddenly a new COLLADA model appears! The stack of boxes is here? Or is that one box? o_O

animations... Soon...

AHelper
1
Developers, developers, developers (Posts: 2367)

16 Dec 2013 11:00:20 pm

If anyone has worked with OpenCL (or can at least suggest a linear algorithm for this), could you weigh in? I am trying to get joint calculations thrown into OpenCL. Essentially, this requires a matrix multiplication down through a tree. I have "joints" (mat4) that have a variable number of children; this can change to either a fixed maximum number of children or an explicitly specified length. Each joint has a corresponding keyframe mat4, and each joint is multiplied against this value. Then, for each child of a joint, the child's mat4 is multiplied by the parent's mat4, and so on down the tree. I am still a beginner, and as far as I know I can only spawn workers in a linear fashion (I can't get it to nicely traverse a tree; it needs to be linear in threads. Workers can be grouped together, but all groups must be of equal size).

If I limit to, say, 4 children, I could have a group of 4 workers moving down the tree by laying out the input linearly by giving the current joint per-worker and parent per-group...

I will look into tree traversal ideas on the netz later on.

(edit) Looks like I am looking for something like an octree, but variable-sized based on usage.
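
One linear option, sketched here under assumptions that are not confirmed by the project: if the joints are stored so that every parent comes before its children, a single pass over the array is enough, because the parent's result is always ready when a child is reached. `Mat4`, `bakeJoints`, and `kNoParent` are hypothetical names, the matrix type is a bare row-major array standing in for a real mat4, and the multiplication order (parent result times keyframe times bind) is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Mat4 = std::array<float, 16>;  // row-major 4x4, stand-in for a real mat4

constexpr int kNoParent = -1;  // hypothetical marker for root joints

Mat4 identity() {
  Mat4 m{};  // zero-initialized
  m[0] = m[5] = m[10] = m[15] = 1.0f;
  return m;
}

Mat4 translation(float x, float y, float z) {
  Mat4 m = identity();
  m[3] = x;
  m[7] = y;
  m[11] = z;
  return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
  Mat4 c{};
  for (std::size_t r = 0; r < 4; ++r)
    for (std::size_t col = 0; col < 4; ++col)
      for (std::size_t k = 0; k < 4; ++k)
        c[r * 4 + col] += a[r * 4 + k] * b[k * 4 + col];
  return c;
}

// Requires parents[i] < i for every non-root joint (parent-before-child order).
std::vector<Mat4> bakeJoints(const std::vector<Mat4>& binds,
                             const std::vector<Mat4>& keyframes,
                             const std::vector<int>& parents) {
  assert(binds.size() == keyframes.size() && binds.size() == parents.size());
  std::vector<Mat4> out(binds.size());
  for (std::size_t i = 0; i < binds.size(); ++i) {
    const Mat4 local = multiply(keyframes[i], binds[i]);
    if (parents[i] == kNoParent) {
      out[i] = local;
    } else {
      const std::size_t p = static_cast<std::size_t>(parents[i]);
      assert(parents[i] >= 0 && p < i);  // the parent must already be baked
      out[i] = multiply(out[p], local);
    }
  }
  return out;
}
```

This pass is sequential, so it answers the ordering problem rather than the parallelism problem; it mainly shows that a flat array plus a parent index is enough to represent the tree.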

elfprince13
2
OVER NINE THOUSAND! (Posts: 11872)

17 Dec 2013 01:36:53 am

http://devblogs.nvidia.com/parallelforall/thinking-parallel-part-ii-tree-traversal-gpu/

AHelper
1
Developers, developers, developers (Posts: 2367)

17 Dec 2013 01:56:47 am

It looks like that example is iterating over a tree by running multiple threads and specifying which element in the tree to look at in a read-only case. My issue is that I need to modify the tree as I move down, which would break if something runs out-of-order.

I essentially need to lock all joints except for the root. Once the root finishes transforming itself, it unlocks itself for the children to use. I could emulate semaphores by using an atomic type, setting all to be locked except for the first, and simply have a busy loop to block until things get unlocked. Would there be a better way?

elfprince13
2
OVER NINE THOUSAND! (Posts: 11872)

17 Dec 2013 05:25:42 pm

Multiple kernel launches for increasing depth? But you're not adding or deleting nodes, I don't think, so where does the order-dependence come in?

AHelper
1
Developers, developers, developers (Posts: 2367)

17 Dec 2013 05:26:37 pm

See this in-dev snippet:
Code:

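void opencl_fallback_bake_bones(TaskWorker *tw, glm::mat4* binds, glm::mat4* keyframes, uint* parents, atomic_uint* atoms, glm::mat4* jointOut)
{
  uint id = tw->get_global_id(0);
  const uint NO_PARENT = (uint)-1; // root joints store -1 as their parent index

  if(parents[id] == NO_PARENT)
  {
    // Root joint: nothing to wait for
    jointOut[id] = keyframes[id] * binds[id];
  }
  else
  {
    // Spin until the parent has published its result. Waiting on atoms[id]
    // itself would never finish, since only this worker clears that flag.
    // Assumes every entry in atoms starts out non-zero (locked).
    while(atoms[parents[id]]);
    jointOut[id] = jointOut[parents[id]] * keyframes[id] * binds[id];
  }

  // Unlock this joint so its children can proceed
  atoms[id] = 0;
}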

elfprince13
2
OVER NINE THOUSAND! (Posts: 11872)

17 Dec 2013 06:11:02 pm

When you load skeletal data (either from Collada, or from animations), why not do a one-time preprocessing to get absolute transforms instead of relative transforms? That will flatten your dependency tree, so you can do updates in parallel. Otherwise, you're basically just burning GPU resources for no reason.

AHelper
1
Developers, developers, developers (Posts: 2367)

18 Dec 2013 08:59:24 am

That doesn't float well when I blend different animations together.

elfprince13
2
OVER NINE THOUSAND! (Posts: 11872)

18 Dec 2013 03:31:23 pm

Then I recommend using multiple kernel launches to handle nodes at different depths. Busy waiting the GPU is a terrible idea.
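
A rough sketch of the per-depth idea, written as plain C++ rather than OpenCL host code: joints are bucketed by depth, and each bucket is processed as one batch whose items are independent of each other, so each batch could map to one kernel launch. `groupByDepth`, `bakeByLevels`, and `kNoParent` are hypothetical names, and the combine step is left generic (for skeletons it would be the mat4 multiplication).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNoParent = -1;  // hypothetical marker for root joints

// Buckets joint indices by their depth in the tree (roots are depth 0).
// Assumes `parents` describes a valid forest with no cycles.
std::vector<std::vector<std::size_t>> groupByDepth(const std::vector<int>& parents) {
  std::vector<std::vector<std::size_t>> levels;
  for (std::size_t i = 0; i < parents.size(); ++i) {
    std::size_t depth = 0;
    for (int p = parents[i]; p != kNoParent; p = parents[static_cast<std::size_t>(p)]) {
      assert(p >= 0 && static_cast<std::size_t>(p) < parents.size());
      ++depth;
    }
    if (depth >= levels.size()) levels.resize(depth + 1);
    levels[depth].push_back(i);
  }
  return levels;
}

// Processes one level at a time. Inside a level no joint depends on another,
// so the inner loop is the part that could run as parallel work items.
template <typename T, typename Combine>
std::vector<T> bakeByLevels(const std::vector<T>& locals,
                            const std::vector<int>& parents,
                            Combine combine) {
  assert(locals.size() == parents.size());
  std::vector<T> out(locals.size());
  for (const auto& level : groupByDepth(parents)) {
    for (std::size_t id : level) {  // one "launch" per level
      if (parents[id] == kNoParent) {
        out[id] = locals[id];
      } else {
        out[id] = combine(out[static_cast<std::size_t>(parents[id])], locals[id]);
      }
    }
  }
  return out;
}
```

The number of launches equals the depth of the tree, not the number of joints, and no work item ever has to wait on another inside a launch.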

AHelper
1
Developers, developers, developers (Posts: 2367)

19 Dec 2013 01:38:58 am

Hmm, that would work rather than waiting by depth. I will look into it later on, Just Cause 1/2 are stealing my time right now.

AHelper
1
Developers, developers, developers (Posts: 2367)

23 Dec 2013 09:36:40 pm

Trac sure does help organize a lot of the management chaos in Redintegrate :P

I am still quite behind in where I want to be with animation, but I am making sure that the features that I would like in the future are being started now (or at least making sure that adding them in later will not be a problem). For example, I currently need to:

Create a system to export actions from Blender to COLLADA (the Blender bug list says that COLLADA doesn't support "actions" in animations, which may or may not be true, and it would be dumb if it doesn't. Exporting requires all actions to be baked to the armature's keyframes and exported all at once. A bit of JSON in Blender and some processing in Python will clear that up.)
Handle inverse kinematics for features such as gazing, grabbing, etc.
Link bones into Bullet Physics for ragdoll things, measuring impact forces, etc.
Blend different animations together.
Calculate vertex positions from animations in Compute/OpenCL.

elfprince13 wrote:

Then I recommend using multiple kernel launches to handle nodes at different depths. Busy waiting the GPU is a terrible idea.

OpenCL apparently uses barriers to synchronize work items in work groups. How this is done is not specified; it could be a busy wait, or the worker could sleep.

<edit>

COLLADA models now link to their controllers for animation if such a link exists. The actual animation stuff is a WIP: right now I am setting up Compute fallback functions to see how the animations will go, and the OpenCL kernels will get written after much revising/tuning.

AHelper
1
Developers, developers, developers (Posts: 2367)

01 Jan 2014 04:31:38 am

Throwing up another component that will eventually get built into Redintegrate: In-game creation and customization of buildings.

Whether it is making custom buildings for RTS games (why only use pre-made buildings with one function when you can make your own!), procedurally generating buildings for certain maps, or just making standardized buildings for maps without making them by hand in modeling software, I hope this part gets completed. (Does Rust do this? Not sure from the video(s) on Steam.)

Progress is very slow, but should pick up today.

AHelper
1
Developers, developers, developers (Posts: 2367)

30 Jan 2014 12:31:25 pm
Last edited by AHelper on 01 Sep 2016 12:42:24 am; edited 1 time in total

Working more on getting animations supported. I have had a lot of fighting with COLLADA animation and controller data and getting that data into Compute code.

Note, it shouldn't look like that. It should just be a tower of boxes bent over or twisted.

This is the very first render of animation data, so the process of fixing now begins. There was a lot of code that needed to be made before the client would even load or render, so places for mistakes are spread out.

<edit>

Mingw64 builds are working once again after mingw64 switched to SEH, forcing me to recompile nearly all of my dependencies. Here is the same version of the program on Win7:

To infinity and beyond!

AHelper
1
Developers, developers, developers (Posts: 2367)

01 Feb 2014 05:08:48 am
Last edited by AHelper on 01 Sep 2016 12:42:24 am; edited 1 time in total

Just added in CMake commands to allow archives/installers to be generated. For Linux, I have .tar.gz and .rpm packages. For Windows, I have NSIS-created installers and ZIPs.

Mmm...

Working on the package system. Right now, the resourceGroup is in charge of manually searching for files in a folder. That functionality is being abstracted out to allow packages to be either folders or compressed archives in various locations while handling writing as well as reading.

For my code, I use ifstream and ofstream, and those can easily be generated from the abstraction layer to allow read/writes. For other libraries like SDL2, memory mapped file data will be used when dealing with archives. For even worse libraries, a temp file can be used.

<edit> Filesystem/Package abstraction is working. I can simply create a package, get a file (creating if needed), and write to it. For me on Linux, requesting settings.cfg from the settings package pointed to the file ~/.local/share/redintegrate/settings/settings.cfg. I then did mkdir -p ../share/redintegrate/settings and it then pointed to ../share/redintegrate/settings/settings.cfg. When I removed the settings.cfg file and chmod -rw ../share/redintegrate/settings, it would select the package but fail to make the file. Wine works with this as well.
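
The lookup order described above can be sketched roughly as follows. This is only an illustration under assumptions: `resolvePackageDir` is a hypothetical helper, the candidate list is ordered by priority with the per-user directory last, and the existence check is passed in as a callback (real code might use a directory test from the platform or from std::filesystem) so the sketch stays self-contained.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper: returns the first candidate directory that exists.
// If none exists, it returns the last candidate, which is assumed to be the
// per-user data directory that the caller may create on demand.
std::string resolvePackageDir(const std::vector<std::string>& candidates,
                              const std::function<bool(const std::string&)>& exists) {
  assert(!candidates.empty());
  for (const auto& dir : candidates) {
    if (exists(dir)) return dir;
  }
  return candidates.back();
}
```

Note that selection here depends only on existence, not on write permission, which matches the behavior described above where the package was selected but creating the file then failed.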

<edit> Archive handling is in-place. I can now open a .tar.lzma package, list all files and directories, and get the file read path. Still a WIP, but nearly done.