Parallel Transforms
Iterator-based Parallel Transforms
An iterator-based parallel transform applies the given transform function to a range of items and stores the results in another range specified by two iterators, first and last. The two iterators are typically raw pointers to the first element and to one past the last element of the range in GPU memory space. The task created by the transform method of tf::syclFlow represents a parallel execution of the following loop:
while (first != last) {
  *first++ = callable(*src1++, *src2++, *src3++, ...);
}
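The loop above can be modeled on the host with a short variadic template. This is only a sequential sketch of the semantics, not the syclFlow implementation: the names transform_reference and sum_three are hypothetical helpers introduced here for illustration, and the code runs on ordinary host vectors rather than GPU memory.

```cpp
#include <cassert>
#include <vector>

// Sequential host-side model of the loop that the transform task represents.
// transform_reference is a hypothetical helper, not part of Taskflow.
template <typename I, typename C, typename... S>
void transform_reference(I first, I last, C callable, S... srcs) {
  while (first != last) {
    // Read one element from every source range and advance each source
    // iterator, then store the result and advance the output iterator.
    *first++ = callable((*srcs++)...);
  }
}

// Hypothetical usage: out[i] = x[i] + y[i] + z[i].
// Assumes x, y, and z all have the same length.
inline std::vector<int> sum_three(const std::vector<int>& x,
                                  const std::vector<int>& y,
                                  const std::vector<int>& z) {
  std::vector<int> out(x.size());
  transform_reference(out.data(), out.data() + out.size(),
                      [](int xi, int yi, int zi) { return xi + yi + zi; },
                      x.data(), y.data(), z.data());
  return out;
}
```

The output range fixes the iteration count, so each source range is assumed to hold at least last - first elements.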
The following example creates a transform kernel that assigns each element in the range from gpu_data to gpu_data + 1000 the sum of the corresponding elements at gpu_data_x, gpu_data_y, and gpu_data_z.
taskflow.emplace_on([](tf::syclFlow& sf){
  // gpu_data[i] = gpu_data_x[i] + gpu_data_y[i] + gpu_data_z[i]
  tf::syclTask task = sf.transform(
    gpu_data, gpu_data + 1000,
    [] (int xi, int yi, int zi) { return xi + yi + zi; },
    gpu_data_x, gpu_data_y, gpu_data_z
  );
}, sycl_queue);
Iterations are independent of one another, and each is assigned one kernel thread to run the callable.
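Because no iteration reads the output of another, the per-element work can run in any order and still produce the same result. The sketch below illustrates this on the host under stated assumptions: transform_item and sum_three_reversed are hypothetical names, each call to transform_item stands in for the work of one kernel thread, and the reverse loop is just one arbitrary ordering, not how a SYCL device actually schedules threads.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Work of one logical kernel thread: compute output element i only.
// transform_item is a hypothetical helper, not part of Taskflow.
template <typename T, typename C, typename... S>
void transform_item(std::size_t i, T* out, C callable, const S*... srcs) {
  // Each source range is indexed at the same position i.
  out[i] = callable(srcs[i]...);
}

// Visits the indices in reverse order to show that ordering does not matter.
// Assumes x, y, and z all have the same length.
inline std::vector<int> sum_three_reversed(const std::vector<int>& x,
                                           const std::vector<int>& y,
                                           const std::vector<int>& z) {
  std::vector<int> out(x.size());
  auto add = [](int xi, int yi, int zi) { return xi + yi + zi; };
  for (std::size_t i = out.size(); i-- > 0;) {
    transform_item(i, out.data(), add, x.data(), y.data(), z.data());
  }
  return out;
}
```

A callable that keeps state across calls, or that writes to elements other than its own, would break this independence and is assumed not to be used here.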