Parallel Reduction

Taskflow provides a template function that constructs a task to perform parallel reduction over a range of items.

Include the Header

You need to include the header file, taskflow/algorithm/reduce.hpp, for creating a parallel-reduction task.

#include <taskflow/algorithm/reduce.hpp>

Create a Parallel-Reduction Task

The reduction task created by tf::Taskflow::reduce(B first, E last, T& result, O bop) performs parallel reduction over a range of elements specified by [first, last) using the binary operator bop and stores the reduced result in result. It represents the parallel execution of the following reduction loop:

for(auto itr=first; itr<last; itr++) {
  result = bop(result, *itr);
}

At runtime, the reduction task spawns a subflow to perform parallel reduction. The reduced result is stored in result, which the reduction task captures by reference. It is your responsibility to ensure that result remains alive during the parallel execution.

tf::Executor executor;
tf::Taskflow taskflow;

int sum = 100;
std::vector<int> vec = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};

tf::Task task = taskflow.reduce(vec.begin(), vec.end(), sum, 
  [] (int l, int r) { return l + r; }  // binary reducer operator
);
executor.run(taskflow).wait();

assert(sum == 100 + 55);

The order in which the binary operator is applied to pairs of elements is unspecified. In other words, the elements of the range may be grouped and rearranged in arbitrary order, so the result is deterministic only if the operator is associative and commutative. The result and argument types of the binary operator must be consistent with the input data type.
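
To make the effect of grouping concrete, here is a minimal sketch in standard C++ that does not use Taskflow. The helpers sequential_reduce and chunked_reduce are hypothetical names introduced for illustration, and the fixed-size chunking is an assumption, not a description of how Taskflow actually partitions the range.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// Sequential reference: the loop that the reduction task parallelizes.
template <typename It, typename T, typename BinaryOp>
void sequential_reduce(It first, It last, T& result, BinaryOp bop) {
  for (; first != last; ++first) {
    result = bop(result, *first);
  }
}

// Hypothetical helper (not part of Taskflow): reduces [first, last) in
// chunks of at most chunk_size elements and folds each chunk's partial
// result into result. A parallel runtime could process the chunks
// concurrently; here they run one after another so that the grouping is
// easy to follow.
template <typename It, typename T, typename BinaryOp>
void chunked_reduce(It first, It last, T& result, BinaryOp bop,
                    std::ptrdiff_t chunk_size) {
  assert(chunk_size >= 1);  // precondition: chunks must not be empty
  while (first != last) {
    std::ptrdiff_t len =
      std::min<std::ptrdiff_t>(chunk_size, std::distance(first, last));
    It chunk_end = std::next(first, len);
    // Seed the partial result with the first element of the chunk.
    T partial = *first;
    for (++first; first != chunk_end; ++first) {
      partial = bop(partial, *first);
    }
    result = bop(result, partial);
  }
}
```

With the vector {1, 2, ..., 10}, an initial result of 100, and addition, both helpers produce 155. With subtraction, which is not associative, sequential_reduce produces 45 while chunked_reduce with a chunk size of 3 produces 111, which is why an order-sensitive operator is a poor fit for a parallel reduction.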

Similar to Parallel Iterations, you can use std::reference_wrapper to enable stateful parameter passing between the reduction task and others.

int sum = 100;
std::vector<int> vec;
std::vector<int>::iterator first, last;

tf::Task init = taskflow.emplace([&](){
  vec   = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  first = vec.begin();
  last  = vec.end();
});

tf::Task task = taskflow.reduce(std::ref(first), std::ref(last), sum, 
  [] (int l, int r) { return l + r; }  // binary reducer operator
);

// wrong! must use std::ref, or first and last are captured by copy
// tf::Task task = taskflow.reduce(first, last, sum, [] (int l, int r) { 
//   return l + r;    // binary reducer operator
// });

init.precede(task);

executor.run(taskflow).wait();

assert(sum == 100 + 55);

In the above example, when init finishes, vec holds 10 elements, first points to the first element, and last points one past the last element (i.e., the end of the range). Because first and last are passed through std::ref, these changes are visible to the execution context of the reduction task.
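
The mechanism behind std::ref can be seen in isolation with the standard library alone. The sketch below is an illustration under assumptions: make_deferred_sum and deferred_sum_demo are hypothetical names, and the code mimics only the deferred reading of the iterators, not Taskflow's implementation.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical helper (not Taskflow's implementation): builds a deferred
// summation whose range is read only when the returned callable runs.
// The reference wrappers are copied into the closure, but each copy still
// refers to the caller's iterator variables.
template <typename It>
std::function<void()> make_deferred_sum(std::reference_wrapper<It> first,
                                        std::reference_wrapper<It> last,
                                        int& result) {
  return [first, last, &result]() {
    // get() yields the iterators' current values, not the values they had
    // when make_deferred_sum was called.
    for (It itr = first.get(); itr != last.get(); ++itr) {
      result += *itr;
    }
  };
}

// Usage: the callable is created before the vector is populated.
inline int deferred_sum_demo() {
  int sum = 100;
  std::vector<int> vec;
  std::vector<int>::iterator first, last;

  auto task = make_deferred_sum(std::ref(first), std::ref(last), sum);

  // Plays the role of the init task: set up the data and the range.
  vec   = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
  first = vec.begin();
  last  = vec.end();

  task();  // reads first and last only now
  assert(sum == 100 + 55);
  return sum;
}
```

Had the iterators been passed by value, the closure would hold copies taken before vec was populated, and iterating over them would be undefined behavior.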

Create a Parallel-Transform-Reduction Task

It is common to transform each element into a new data type and then perform reduction on the transformed elements. Taskflow provides a method, tf::Taskflow::transform_reduce(B first, E last, T& result, BOP bop, UOP uop), that applies uop to transform each element in the specified range and then performs parallel reduction over result and the transformed elements. It represents the parallel execution of the following reduction loop:

for(auto itr=first; itr<last; itr++) {
  result = bop(result, uop(*itr));
}

The example below transforms each digit in a string to an integer number and then sums up all integers in parallel.

std::string str = "12345678";
int sum {0};
tf::Task task = taskflow.transform_reduce(str.begin(), str.end(), sum,
  [] (int a, int b) {      // binary reduction operator
    return a + b;
  },  
  [] (char c) -> int {     // unary transformation operator
    return c - '0';
  }   
); 
executor.run(taskflow).wait(); 
assert(sum == 1 + 2 + 3 + 4 + 5 + 6 + 7 + 8);  // sum will be 36 
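
For cross-checking the expected value, the C++17 standard library offers a sequential counterpart, std::transform_reduce in <numeric>, which takes the same two operators in the same order. The sketch below is a comparison aid only; digit_sum is a hypothetical name, and the code assumes the string contains nothing but the characters '0' through '9'.

```cpp
#include <cassert>
#include <numeric>
#include <string>

// Sums the decimal digits of str, starting from init, using the standard
// library's sequential std::transform_reduce (requires C++17).
inline int digit_sum(const std::string& str, int init) {
  return std::transform_reduce(
    str.begin(), str.end(), init,
    [](int a, int b) { return a + b; },  // binary reduction operator
    [](char c) -> int {                  // unary transformation operator
      assert(c >= '0' && c <= '9');      // assumption: digits only
      return c - '0';
    }
  );
}
```

Calling digit_sum("12345678", 0) returns 36, matching the assertion in the Taskflow example.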

The order in which the binary operator is applied to the transformed elements is unspecified. Because the transformed elements are temporaries, the binary operator may receive rvalues for both arguments, for example, bop(uop(*itr1), uop(*itr2)). When copying is expensive, consider making the result type T move-constructible so that these temporaries can be moved rather than copied.
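
The following sketch shows one way such rvalue arguments can arise and how a movable result type benefits from them. It is sequential standard C++ written under assumptions: to_set, unite, distinct_chars, and distinct_chars_demo are hypothetical names, and the pairwise grouping is chosen for illustration rather than taken from Taskflow's implementation. Set union is used because it is associative and commutative.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <utility>

// Unary transformation: wraps one character in a single-element set.
inline std::set<char> to_set(char c) { return std::set<char>{c}; }

// Binary reduction: set union. Both parameters are taken by value, so
// rvalue arguments are moved in rather than copied.
inline std::set<char> unite(std::set<char> a, std::set<char> b) {
  a.insert(b.begin(), b.end());
  return a;  // a is moved out, not copied
}

// Hypothetical sequential sketch of a transform-reduce over a string that
// combines transformed elements in pairs before folding them into result.
inline void distinct_chars(const std::string& str, std::set<char>& result) {
  std::size_t i = 0;
  for (; i + 1 < str.size(); i += 2) {
    // Both arguments are temporaries produced by the unary operator.
    std::set<char> pair = unite(to_set(str[i]), to_set(str[i + 1]));
    result = unite(std::move(result), std::move(pair));
  }
  if (i < str.size()) {
    // An odd-length string leaves one element over.
    result = unite(std::move(result), to_set(str[i]));
  }
}

// Usage: collect the distinct letters of a word.
inline std::size_t distinct_chars_demo() {
  std::set<char> letters;
  distinct_chars("mississippi", letters);
  assert((letters == std::set<char>{'i', 'm', 'p', 's'}));
  return letters.size();
}
```

Taking the operands by value lets the same operator accept lvalues and rvalues; when both arguments are temporaries, no set is copied at all.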