syclFlow Algorithms » Parallel Transforms

tf::syclFlow provides a template method, tf::syclFlow::transform, for creating a task to perform parallel transforms by applying the given function to a range of item and stores the transformed result in another range.

Iterator-based Parallel Transforms

Iterator-based parallel-transform applies the given transform function to a range of items and store the result in another range specified by two iterators, first and last. The two iterators are typically two raw pointers to the first element and the next to the last element in the range in GPU memory space. The task created by tf::syclFlow::transform(I first, I last, C&& callable, S... srcs) represents a kernel of parallel execution for the following loop:

while (first != last) {
  *first++ = callable(*src1++, *src2++, *src3++, ...);
}

The two iterators, first and last, are typically two raw pointers to the first element and the next to the last element in the range. The following example creates a transform kernel that assigns each element, starting from gpu_data to gpu_data + 1000, to the sum of the corresponding elements at gpu_data_x, gpu_data_y, and gpu_data_z.

taskflow.emplace_on([](tf::syclFlow& sf){
  // gpu_data[i] = gpu_data_x[i] + gpu_data_y[i] + gpu_data_z[i]
  tf::syclTask task = sf.transform(
    gpu_data, gpu_data + 1000, 
    [] (int xi, int yi, int zi) { return xi + yi + zi; },
    gpu_data_x, gpu_data_y, gpu_data_z
  ); 
}, sycl_queue);

Each iteration is independent of each other and is assigned one kernel thread to run the callable.