Release 3.2.0 (2021/07/29)
Taskflow 3.2.0 is the 3rd release in the 3.x line! This release includes several new changes such as CPU-GPU tasking, algorithm collection, enhanced web-based profiler, documentation, and unit tests.
Download
Taskflow 3.2.0 can be downloaded from here.
System Requirements
To use Taskflow v3.2.0, you need a compiler that supports C++17:
- GNU C++ Compiler at least v8.4 with -std=c++17
- Clang C++ Compiler at least v6.0 with -std=c++17
- Microsoft Visual Studio at least v19.27 with /std:c++17
- AppleClang Xcode Version at least v12.0 with -std=c++17
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
- Intel C++ Compiler at least v19.0.1 with -std=c++17
- Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL20
Taskflow works on Linux, Windows, and Mac OS X.
Working Items
- enhancing support for SYCL with Intel DPC++
- enhancing parallel CPU and GPU algorithms
- designing pipeline interface and its scheduling algorithms
New Features
Taskflow Core
- added tf::SmallVector optimization for optimizing the dependency storage in a graph
- added move constructor and move assignment operator for tf::Taskflow
- added moved run in tf::Executor for automatically managing taskflow's lifetimes
cudaFlow
- improved the execution flow of tf::cudaFlowCapturer when updates involve
New algorithms in tf::cudaFlow and tf::cudaFlowCapturer:
- added tf::cudaFlow::reduce
- added tf::cudaFlow::transform_reduce
- added tf::cudaFlow::uninitialized_reduce
- added tf::cudaFlow::transform_uninitialized_reduce
- added tf::cudaFlow::inclusive_scan
- added tf::cudaFlow::exclusive_scan
- added tf::cudaFlow::transform_inclusive_scan
- added tf::cudaFlow::transform_exclusive_scan
- added tf::cudaFlow::merge
- added tf::cudaFlow::merge_by_key
- added tf::cudaFlow::sort
- added tf::cudaFlow::sort_by_key
- added tf::cudaFlow::find_if
- added tf::cudaFlow::min_element
- added tf::cudaFlow::max_element
- added tf::cudaFlowCapturer::reduce
- added tf::cudaFlowCapturer::transform_reduce
- added tf::cudaFlowCapturer::uninitialized_reduce
- added tf::cudaFlowCapturer::transform_uninitialized_reduce
- added tf::cudaFlowCapturer::inclusive_scan
- added tf::cudaFlowCapturer::exclusive_scan
- added tf::cudaFlowCapturer::transform_inclusive_scan
- added tf::cudaFlowCapturer::transform_exclusive_scan
- added tf::cudaFlowCapturer::merge
- added tf::cudaFlowCapturer::merge_by_key
- added tf::cudaFlowCapturer::sort
- added tf::cudaFlowCapturer::sort_by_key
- added tf::cudaFlowCapturer::find_if
- added tf::cudaFlowCapturer::min_element
- added tf::cudaFlowCapturer::max_element
- added tf::cudaLinearCapturing
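The reduce variants listed above differ mainly in how they treat the value already stored in the result. The following is a sequential, CPU-only sketch of those semantics in standard C++17, as we understand them from the Taskflow documentation. The names reduce_ref, uninitialized_reduce_ref, transform_reduce_ref, ReduceDemo, and reduce_demo are hypothetical helpers for illustration only and are not part of the Taskflow API; the GPU versions run in parallel and take device pointers.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Assumed semantics of `reduce`: the value already stored in `result`
// participates in the reduction as the initial value.
template <typename It, typename T, typename BinaryOp>
void reduce_ref(It first, It last, T& result, BinaryOp bop) {
  for (; first != last; ++first) {
    result = bop(result, *first);
  }
}

// Assumed semantics of `uninitialized_reduce`: the previous content of
// `result` is ignored; the first element seeds the reduction instead.
template <typename It, typename T, typename BinaryOp>
void uninitialized_reduce_ref(It first, It last, T& result, BinaryOp bop) {
  assert(first != last);  // sketch precondition: the range is non-empty
  result = *first++;
  for (; first != last; ++first) {
    result = bop(result, *first);
  }
}

// Assumed semantics of `transform_reduce`: like `reduce`, but every element
// is passed through the unary operator `uop` before being combined.
template <typename It, typename T, typename BinaryOp, typename UnaryOp>
void transform_reduce_ref(It first, It last, T& result,
                          BinaryOp bop, UnaryOp uop) {
  for (; first != last; ++first) {
    result = bop(result, uop(*first));
  }
}

// Hypothetical demo: applies the three variants to the input {1, 2, 3, 4}.
struct ReduceDemo {
  int with_init;
  int without_init;
  int squares;
};

inline ReduceDemo reduce_demo() {
  std::vector<int> v{1, 2, 3, 4};
  ReduceDemo d{100, 100, 0};
  // the initial 100 is kept: 100 + (1 + 2 + 3 + 4)
  reduce_ref(v.begin(), v.end(), d.with_init, std::plus<int>{});
  // the initial 100 is discarded: 1 + 2 + 3 + 4
  uninitialized_reduce_ref(v.begin(), v.end(), d.without_init,
                           std::plus<int>{});
  // sum of squares starting from 0: 1 + 4 + 9 + 16
  transform_reduce_ref(v.begin(), v.end(), d.squares, std::plus<int>{},
                       [](int x) { return x * x; });
  return d;
}
```

In practice this means the result buffer must be initialized before calling a plain reduce, whereas the uninitialized variants can write into freshly allocated memory.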
syclFlow
CUDA Standard Parallel Algorithms
- added tf::cuda_for_each
- added tf::cuda_for_each_index
- added tf::cuda_transform
- added tf::cuda_reduce
- added tf::cuda_uninitialized_reduce
- added tf::cuda_transform_reduce
- added tf::cuda_transform_uninitialized_reduce
- added tf::cuda_inclusive_scan
- added tf::cuda_exclusive_scan
- added tf::cuda_transform_inclusive_scan
- added tf::cuda_transform_exclusive_scan
- added tf::cuda_merge
- added tf::cuda_merge_by_key
- added tf::cuda_sort
- added tf::cuda_sort_by_key
- added tf::cuda_find_if
- added tf::cuda_min_element
- added tf::cuda_max_element
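For readers unfamiliar with the scan family listed above, here is a sequential, CPU-only sketch of what inclusive and exclusive scans compute. It is written in standard C++ and does not call Taskflow. The function names ending in _ref and scan_ref_self_check are hypothetical, and passing the seed of the exclusive scan as an explicit init parameter is an assumption of this sketch rather than a statement about the Taskflow signatures.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Inclusive scan: out[i] combines in[0] ... in[i].
template <typename T, typename BinaryOp>
std::vector<T> inclusive_scan_ref(const std::vector<T>& in, BinaryOp op) {
  std::vector<T> out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    out.push_back(i == 0 ? in[0] : op(out[i - 1], in[i]));
  }
  return out;
}

// Exclusive scan: out[i] combines in[0] ... in[i-1], so in[i] itself is
// excluded. `init` (an assumption of this sketch) seeds the first slot.
template <typename T, typename BinaryOp>
std::vector<T> exclusive_scan_ref(const std::vector<T>& in, T init,
                                  BinaryOp op) {
  std::vector<T> out;
  out.reserve(in.size());
  T running = init;
  for (const T& x : in) {
    out.push_back(running);
    running = op(running, x);
  }
  return out;
}

// Transform-inclusive scan: each element goes through `uop` before it is
// combined into the running result.
template <typename T, typename BinaryOp, typename UnaryOp>
std::vector<T> transform_inclusive_scan_ref(const std::vector<T>& in,
                                            BinaryOp op, UnaryOp uop) {
  std::vector<T> out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    T y = uop(in[i]);
    out.push_back(i == 0 ? y : op(out[i - 1], y));
  }
  return out;
}

// Small self-check on the input {1, 2, 3, 4} using addition.
inline void scan_ref_self_check() {
  const std::vector<int> v{1, 2, 3, 4};
  assert((inclusive_scan_ref(v, std::plus<int>{}) ==
          std::vector<int>{1, 3, 6, 10}));
  assert((exclusive_scan_ref(v, 0, std::plus<int>{}) ==
          std::vector<int>{0, 1, 3, 6}));
}
```

The sequential loops only pin down the expected output; the GPU algorithms produce the same prefix results with parallel work distribution.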
Utilities
- added CUDA meta programming
- added SYCL meta programming
Taskflow Profiler (TFProf)
Bug Fixes
- fixed compilation errors in constructing tf::cudaRoundRobinCapturing
- fixed compilation errors of TLS worker pointer in tf::Executor
- fixed compilation errors of nvcc v11.3 in auto template deduction
  - std::scoped_lock
  - tf::Serializer and tf::Deserializer
- fixed memory leak when moving a tf::Taskflow
Breaking Changes
There are no breaking changes in this release.
Deprecated and Removed Items
- removed tf::cudaFlow::kernel_on method
- removed explicit partitions in parallel iterations and reductions
- removed tf::cudaFlowCapturerBase
- removed tf::cublasFlowCapturer
- renamed update and rebind methods in tf::cudaFlow and tf::cudaFlowCapturer to overloads
Documentation
- revised Static Tasking
- revised Executor
- revised Parallel Reduction
- added cudaFlow Algorithms
- added CUDA Standard Algorithms
Miscellaneous Items
We have published tf::
- Dian-Lun Lin and Tsung-Wei Huang, "Efficient GPU Computation using Task Graph Parallelism," European Conference on Parallel and Distributed Computing (EuroPar), 2021