An interesting side effect of the Ateji PX approach to GPU programming is that it makes it easy to map programs onto the GPU memory hierarchy. This mapping is essential for good performance on GPUs, and more generally on any hardware with non-uniform memory access (NUMA).
Remember that Ateji PX expresses parallelism with parallel branches, introduced by the || operator. When parallel branches are nested, each nesting level can be interpreted as mapping to a different level of the memory hierarchy.
What matters here is that the memory hierarchy is expressed in terms of lexical scope. A branch can access a variable from another level of the memory hierarchy if and only if that variable is visible in the branch's lexical scope.
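To make the scoping rule concrete, here is a minimal sketch in plain Java rather than Ateji PX syntax. It assumes nested parallel streams as a stand-in for nested || branches, and the class name LexicalScopeSketch and method run are hypothetical. Each nesting level declares its own variables, and the comments mark which GPU memory level each scope would correspond to under the interpretation described above. This is an analogy only: Java does not place these variables in different physical memories.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.IntStream;

public class LexicalScopeSketch {
    // Hypothetical example: sums the indices 0 .. groups*itemsPerGroup - 1
    // using two levels of nested parallel branches.
    public static long run(int groups, int itemsPerGroup) {
        // Outermost scope: visible to every branch at every level
        // (would correspond to GPU global memory).
        AtomicLong globalSum = new AtomicLong();

        IntStream.range(0, groups).parallel().forEach(g -> {
            // Outer-branch scope: one copy per outer branch, visible only to
            // that branch's inner branches (would correspond to work-group local memory).
            AtomicLong groupSum = new AtomicLong();

            IntStream.range(0, itemsPerGroup).parallel().forEach(i -> {
                // Inner-branch scope: one copy per inner branch, visible to no other
                // branch (would correspond to private memory / registers).
                long item = (long) g * itemsPerGroup + i;
                groupSum.addAndGet(item);
            });

            // forEach is a terminal operation, so all inner branches have finished here.
            globalSum.addAndGet(groupSum.get());
        });

        return globalSum.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 8)); // sum of 0..31
    }
}
```

Note that nothing in the code says which memory a variable lives in; the level is implied purely by where the variable is declared, and an inner branch can reach groupSum or globalSum only because they are lexically visible to it.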
In contrast, languages such as OpenCL use declaration modifiers to place variables in different memory areas. With this approach you can, for instance, declare a variable in the outermost lexical scope but label it with the __private modifier. It looks as if there is only one variable, while in fact each work-item (each running instance of the kernel) has its own copy. Such modifiers make the logic of the code much harder to follow.
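Java offers a loose analogy to this pitfall: a ThreadLocal field is declared once at class scope, so it reads like a single shared variable, yet every thread gets its own independent copy. The sketch below is an illustration under that assumption, not OpenCL code; the class ModifierScopeSketch and method run are hypothetical names.

```java
import java.util.Arrays;

public class ModifierScopeSketch {
    // Declared once, at class scope: it looks like one shared counter.
    // ThreadLocal actually gives every thread its own copy, much as a
    // __private modifier gives every OpenCL work-item its own copy.
    private static final ThreadLocal<Integer> counter = ThreadLocal.withInitial(() -> 0);

    // Starts `threads` threads, each incrementing "the" counter
    // `incrementsPerThread` times, and returns the value each thread saw at the end.
    public static int[] run(int threads, int incrementsPerThread) {
        int[] results = new int[threads];
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int k = 0; k < incrementsPerThread; k++) {
                    counter.set(counter.get() + 1); // touches only this thread's copy
                }
                results[id] = counter.get(); // each thread writes its own slot
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join(); // join() also makes the worker's write to results visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(4, 1000)));
    }
}
```

A reader who assumes counter is shared would expect one thread to finish with 4000. Instead each thread ends at 1000, because nothing at the point of use reveals that the variable is per-thread.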