Parallel Reduction
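tf::syclFlow provides two methods for parallel reduction over a range of items: reduce, which takes an initial value, and uninitialized_reduce, which does not.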
Reduce Items with an Initial Value
The reduction task created by tf::syclFlow::reduce reduces the items in the range [first, last) using the binary operator bop and stores the reduced result in result. It represents the parallel execution of the following reduction loop on a SYCL device:
while (first != last) {
  *result = bop(*result, *first++);
}
The variable result participates in the reduction loop and must be initialized with an initial value. The following code performs a parallel reduction to sum all the numbers in the given range with an initial value of 1000:
sycl::queue queue;

const size_t N = 1000000;
int* soln = sycl::malloc_shared<int>(1, queue);  // solution
int* data = sycl::malloc_shared<int>(N, queue);  // data
std::for_each(data, data+N, [](int& v){ v = 1; });
*soln = 1000;  // initial value, participates in the reduction

// create a syclflow to perform parallel reduction on a SYCL device
tf::syclFlow syclflow(queue);
syclflow.reduce(data, data+N, soln, [] (int a, int b) { return a + b; });
syclflow.offload();

assert(*soln == static_cast<int>(N) + 1000);
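For checking a device result on the host, the reduction loop shown earlier in this section can be written as an ordinary sequential function. The sketch below is plain C++ with no SYCL or Taskflow dependency. The names reduce_reference and chunked_reduce are hypothetical helpers, not part of Taskflow. The second helper reduces fixed-size chunks independently and then folds the partial results into *result, which is one way to picture how a parallel reduction may regroup the operations. It matches the sequential loop only when bop is associative, as integer addition is. The chunking scheme is an illustration and does not describe how syclFlow actually schedules the work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sequential reference: *result takes part in the reduction as the
// initial value, exactly as in the documented loop.
template <typename I, typename T, typename B>
void reduce_reference(I first, I last, T* result, B bop) {
  while (first != last) {
    *result = bop(*result, *first++);
  }
}

// Hypothetical chunked variant (illustration only): reduce each chunk
// on its own, then fold the partial results into *result.
// Assumes bop is associative and chunk > 0.
template <typename I, typename T, typename B>
void chunked_reduce(I first, I last, T* result, B bop, std::size_t chunk) {
  assert(chunk > 0);
  std::vector<T> partials;
  while (first != last) {
    T acc = *first++;  // seed the chunk with its first item
    for (std::size_t i = 1; i < chunk && first != last; ++i) {
      acc = bop(acc, *first++);
    }
    partials.push_back(acc);
  }
  // The initial value stored in *result is combined exactly once
  // with the partial results.
  for (const T& p : partials) {
    *result = bop(*result, p);
  }
}
```

Calling either helper on a host copy of the data with the same initial value and the same bop gives a value to compare against *soln after offload.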
Reduce Items without an Initial Value
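You can use tf::syclFlow::uninitialized_reduce to perform a parallel reduction without an initial value. It represents the parallel execution of the following reduction loop on a SYCL device: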
*result = *first++;  // no initial values participate in the reduction loop
while (first != last) {
  *result = bop(*result, *first++);
}
The variable result is overwritten with the reduced value, and no initial value participates in the reduction loop. The following code performs a parallel reduction to sum all the numbers in the given range without any initial value:
sycl::queue queue;

const size_t N = 1000000;
int* soln = sycl::malloc_shared<int>(1, queue);  // solution
int* data = sycl::malloc_shared<int>(N, queue);  // data
std::for_each(data, data+N, [](int& v){ v = 1; });
*soln = 1000;  // no effect: overwritten by the reduction

// create a syclflow to perform parallel reduction on a SYCL device
tf::syclFlow syclflow(queue);
syclflow.uninitialized_reduce(
  data, data+N, soln, [] (int a, int b) { return a + b; }
);
syclflow.offload();

assert(*soln == static_cast<int>(N));
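The difference from the initialized form can be seen in a host-side sequential sketch of the loop above. The function name uninitialized_reduce_reference is a hypothetical helper, not a Taskflow API. The documented loop reads *first unconditionally, so the empty-range guard here is an assumption of this sketch. How syclFlow treats an empty range is not stated in this section.

```cpp
#include <cassert>

// Sequential reference for a reduction without an initial value:
// the previous content of *result is discarded and replaced by the
// first item before the loop starts.
// Assumption of this sketch: an empty range leaves *result untouched.
template <typename I, typename T, typename B>
void uninitialized_reduce_reference(I first, I last, T* result, B bop) {
  if (first == last) {
    return;
  }
  *result = *first++;  // overwrite whatever was stored in *result
  while (first != last) {
    *result = bop(*result, *first++);
  }
}
```

With the items {1, 2, 3, 4}, addition as bop, and *result preset to 1000, this helper leaves 10 in *result, whereas the initialized form would produce 1010.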