Release 3.1.0 (2021/04/14)
Taskflow 3.1.0 is the second release in the 3.x line! This release includes several changes, such as CPU-GPU tasking, an algorithm collection, an enhanced web-based profiler, documentation, and unit tests.
Download
Taskflow 3.1.0 can be downloaded from here.
System Requirements
To use Taskflow v3.1.0, you need a compiler that supports C++17:
- GNU C++ Compiler at least v8.4 with -std=c++17
- Clang C++ Compiler at least v6.0 with -std=c++17
- Microsoft Visual Studio at least v19.27 with /std:c++17
- AppleClang Xcode Version at least v12.0 with -std=c++17
- Nvidia CUDA Toolkit and Compiler (nvcc) at least v11.1 with -std=c++17
- Intel C++ Compiler at least v19.0.1 with -std=c++17
- Intel DPC++ Clang Compiler at least v13.0.0 with -std=c++17 and SYCL 2020
Taskflow works on Linux, Windows, and Mac OS X.
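The C++17 requirement above can be checked before building with a small translation unit. The following is a minimal sketch, not part of Taskflow: the macro `SKETCH_CPLUSPLUS` and the helpers `describe` and `sum_pair` are hypothetical names introduced only for illustration. It fails at compile time when the C++17 flag is missing and exercises two C++17 language features.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <type_traits>

// MSVC keeps __cplusplus at 199711L unless /Zc:__cplusplus is passed,
// so prefer _MSVC_LANG when it is defined.
// SKETCH_CPLUSPLUS is a hypothetical macro name, not a Taskflow macro.
#if defined(_MSVC_LANG)
  #define SKETCH_CPLUSPLUS _MSVC_LANG
#else
  #define SKETCH_CPLUSPLUS __cplusplus
#endif

static_assert(SKETCH_CPLUSPLUS >= 201703L,
              "C++17 is required: compile with -std=c++17 or /std:c++17");

// Uses `if constexpr` (C++17) to select a branch at compile time.
template <typename T>
std::string describe(const T& value) {
  if constexpr (std::is_integral_v<T>) {
    return "integral:" + std::to_string(value);
  } else {
    (void)value;  // unused in this branch
    return "other";
  }
}

// Uses structured bindings (C++17) to unpack a tuple.
inline int sum_pair() {
  auto [a, b] = std::make_tuple(1, 2);
  return a + b;
}
```

If the file compiles with the flag listed for your compiler, the language-level requirement is met; GPU toolchains (nvcc, DPC++) still need their own minimum versions from the list above.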
New Features
Taskflow Core
- optimized task node storage by using std::unique_ptr for semaphores
- introduced tf::syclFlow based on Intel DPC++ and SYCL 2020 spec
- merged the execution flow of cudaFlow and cudaFlow capturer
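To illustrate why holding rarely used semaphore data behind a std::unique_ptr can shrink each task node, here is a minimal sketch. It is an assumption-laden illustration, not Taskflow's actual node layout: `Semaphore`, `SemaphoreLists`, `InlineNode`, and `CompactNode` are hypothetical types. The idea is that a node pays only for one pointer unless it actually uses semaphores.

```cpp
#include <cassert>
#include <memory>
#include <vector>

struct Semaphore;  // hypothetical placeholder; only pointers to it are stored

// Hypothetical bundle of the semaphore lists a task may acquire and release.
struct SemaphoreLists {
  std::vector<Semaphore*> to_acquire;
  std::vector<Semaphore*> to_release;
};

// Layout A: every node carries both vectors, even when they stay empty.
struct InlineNode {
  int id;
  SemaphoreLists semaphores;
};

// Layout B: every node carries one pointer; the lists are allocated lazily.
struct CompactNode {
  int id;
  std::unique_ptr<SemaphoreLists> semaphores;

  // Allocates the lists on first use and returns a reference to them.
  SemaphoreLists& ensure_semaphores() {
    if (!semaphores) {
      semaphores = std::make_unique<SemaphoreLists>();
    }
    return *semaphores;
  }
};
```

The trade-off is one extra heap allocation and indirection for nodes that do use semaphores, in exchange for a smaller footprint for the common case where they do not.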
cudaFlow
- optimized tf::cudaRoundRobinCapturing through an event-pruning heuristic
- optimized the default block size used in cudaFlow algorithms
- added tf::cudaFlow::clear() to clean up a cudaFlow
- added tf::cudaFlow::num_tasks() to query the task count in a cudaFlow
- added tf::cudaTask::num_dependents() to query the dependent count in a cudaTask
- added tf::cudaFlowCapturer::clear() to clean up a cudaFlow capturer
- added tf::cudaFlowCapturer::num_tasks() to query the task count in a cudaFlow capturer
- added tf::cudaFlowCapturer rebind methods:
- tf::cudaFlowCapturer::rebind_single_task
- tf::cudaFlowCapturer::rebind_for_each
- tf::cudaFlowCapturer::rebind_for_each_index
- tf::cudaFlowCapturer::rebind_transform
- tf::cudaFlowCapturer::rebind_reduce
- tf::cudaFlowCapturer::rebind_uninitialized_reduce
- added tf::cudaFlow update methods:
- tf::cudaFlow::update_for_each
- tf::cudaFlow::update_for_each_index
- tf::cudaFlow::update_transform
- tf::cudaFlow::update_reduce
- tf::cudaFlow::update_uninitialized_reduce
- added cudaFlow examples:
- parallel reduction (examples/cuda/cuda_reduce.cu)
- parallel transform (examples/cuda/cuda_transform.cu)
- rebind (examples/cuda/cuda_rebind.cu)
syclFlow
- added a task graph-based programming model (see GPU Tasking (syclFlow))
- added syclFlow examples:
- device query (examples/sycl/sycl_device.cpp)
- range query (examples/sycl/sycl_ndrange.cpp)
- saxpy kernel (examples/sycl/sycl_saxpy.cpp)
- atomic operation using oneAPI atomic_ref (examples/sycl/sycl_atomic.cpp)
- vector addition (examples/sycl/sycl_vector_add.cpp)
- parallel reduction (examples/sycl/sycl_reduce.cpp)
- matrix multiplication (examples/sycl/sycl_matmul.cpp)
- parallel transform (examples/sycl/transform.cpp)
- rebind (examples/sycl/sycl_rebind.cpp)
- added syclFlow algorithms
- tf::syclFlow::single_task for single-threaded kernel
- tf::syclFlow::for_each for parallel iterations
- tf::syclFlow::for_each_index for index-based parallel iterations
- tf::syclFlow::reduce for parallel reduction
- tf::syclFlow::uninitialized_reduce for uninitialized parallel reduction
See GPU Tasking (syclFlow) and Compile Taskflow with SYCL for details on compiling and running syclFlow programs.
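The difference between the reduce and uninitialized_reduce algorithms listed above is whether the value already stored in the result takes part in the reduction. The following host-side sketch shows that semantic difference sequentially. It is not the GPU implementation: the function names `reduce_reference` and `uninitialized_reduce_reference` are hypothetical, and leaving the result untouched on an empty range is an assumption of this sketch.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Sequential reference for "reduce": the value already in *result
// participates, so the caller must initialize it first.
template <typename I, typename T, typename B>
void reduce_reference(I first, I last, T* result, B bop) {
  while (first != last) {
    *result = bop(*result, *first++);
  }
}

// Sequential reference for "uninitialized reduce": the previous content
// of *result is ignored and overwritten by the first element.
template <typename I, typename T, typename B>
void uninitialized_reduce_reference(I first, I last, T* result, B bop) {
  if (first == last) {
    return;  // assumption: an empty range leaves *result untouched
  }
  *result = *first++;
  while (first != last) {
    *result = bop(*result, *first++);
  }
}
```

With the input {1, 2, 3, 4}, addition, and a result that starts at 10, the first form yields 20 and the second yields 10. Use the uninitialized form when the result buffer holds no meaningful starting value.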
Utilities
- resolved the compiler warning in serializer caused by constexpr if
- resolved the compiler error of nvcc when parsing variadic namespace
Taskflow Profiler (TFProf)
- added support for syclFlow tasks
Bug Fixes
- fixed the macro expansion issue with MSVC on TF_CUDA_CHECK
- fixed the serializer compile error (#288)
- fixed the tf::cudaTask::type bug in mixing host and empty task types
Breaking Changes
There are no breaking changes in this release.
Deprecated and Removed Items
There are no deprecated or removed items in this release.
Documentation
- added Compile Taskflow with SYCL
- added SYCL example and tests to the page Building and Installing
- added Query the Worker Identifier to the cookbook page Executor
- added syclFlow Algorithms
- revised update methods in GPU Tasking (cudaFlow)
- revised rebind methods in GPU Tasking (cudaFlowCapturer)
Miscellaneous Items
- removed Circle-CI from the continuous integration
- added grok to the user list
- added RavEngine to the user list
- added RPGMPacker to the user list
- added Leanify to the user list