Building and Installing » Compile Taskflow with SYCL

Install SYCL Compiler

To compile Taskflow with SYCL code, you need the DPC++ clang compiler, which can be acquired from Getting Started with oneAPI DPC++.
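A quick sanity check, once the compiler is installed, is to confirm that the DPC++ clang is the one on your PATH and ask it for its version (the exact version string depends on your DPC++ build):

~$ which clang++       # make sure this points to the DPC++ clang you installed
~$ clang++ --version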

Compile Source Code Directly

Taskflow's GPU programming interface for SYCL is tf::syclFlow. Consider the following simple.cpp program that performs the canonical saxpy (single-precision A·X plus Y) operation on a GPU:

#include <taskflow/taskflow.hpp>  // core taskflow routines
#include <taskflow/syclflow.hpp>  // core syclflow routines

int main() {
  
  constexpr size_t N = 1'000'000;  // problem size (any positive element count works)
  
  tf::Executor executor;
  tf::Taskflow taskflow("saxpy example");
  
  sycl::queue queue;
  
  auto X = sycl::malloc_shared<float>(N, queue);
  auto Y = sycl::malloc_shared<float>(N, queue);
  
  taskflow.emplace_on([&](tf::syclFlow& sf){
    tf::syclTask fillX = sf.fill(X, 1.0f, N).name("fillX");
    tf::syclTask fillY = sf.fill(Y, 2.0f, N).name("fillY");
    tf::syclTask saxpy = sf.parallel_for(sycl::range<1>(N), 
      [=] (sycl::id<1> id) {
        X[id] = 3.0f * X[id] + Y[id];
      }
    ).name("saxpy");
    saxpy.succeed(fillX, fillY);
  }, queue).name("syclFlow");
  
  executor.run(taskflow).wait();
}
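Since X and Y are allocated with sycl::malloc_shared, they remain directly accessible on the host once the taskflow finishes. Below is a minimal sketch of what you could add before main returns to verify the result and release the USM allocations (the printing assumes <cstdio> is included; the check itself is not part of the original example):

  // each element of X should now be 3.0f * 1.0f + 2.0f = 5.0f
  for (size_t i = 0; i < N; ++i) {
    if (X[i] != 5.0f) {
      std::printf("unexpected value at index %zu: %f\n", i, X[i]);
      break;
    }
  }
  sycl::free(X, queue);  // release USM memory allocated with malloc_shared
  sycl::free(Y, queue);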

Use DPC++ clang to compile the program with the following options:

  • -fsycl: enable SYCL compilation mode
  • -fsycl-targets=nvptx64-nvidia-cuda-sycldevice: enable the CUDA target (NVIDIA GPUs)
  • -fsycl-unnamed-lambda: enable unnamed SYCL lambda kernels

~$ clang++ -fsycl -fsycl-unnamed-lambda \
           -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
           -I path/to/taskflow -pthread -std=c++17 simple.cpp -o simple
~$ ./simple
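A default-constructed sycl::queue picks the default SYCL device on your system. If you want the syclFlow to run on a GPU explicitly, you can construct the queue with a GPU selector instead (a minimal sketch using the classic sycl::gpu_selector; newer SYCL 2020 compilers also provide sycl::gpu_selector_v):

  sycl::queue queue{sycl::gpu_selector{}};  // request a GPU device for the syclFlow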

Compile Source Code Separately

Large GPU applications are often built by compiling a program into separate object files and linking them together into an executable or a library. You can compile your SYCL code into separate object files and link them to form the final executable. Consider the following example that defines two tasks in two separate source files (main.cpp and syclflow.cpp):

// main.cpp
#include <taskflow/taskflow.hpp>

tf::Task make_syclflow(tf::Taskflow& taskflow);  // create a syclFlow task

int main() {

  tf::Executor executor;
  tf::Taskflow taskflow;

  tf::Task task1 = taskflow.emplace([](){ std::cout << "main.cpp!\n"; })
                           .name("cpu task");
  tf::Task task2 = make_syclflow(taskflow);

  task1.precede(task2);

  executor.run(taskflow).wait();

  return 0;
}

// syclflow.cpp
#include <taskflow/taskflow.hpp>
#include <taskflow/syclflow.hpp>

inline sycl::queue queue;    // create a global sycl queue

tf::Task make_syclflow(tf::Taskflow& taskflow) {
  return taskflow.emplace_on([](tf::syclFlow& cf){
    printf("syclflow.cpp!\n");
    cf.single_task([](){}).name("kernel");
  }, queue).name("gpu task");
}

Compile each source file into an object file using DPC++ clang. Note that main.cpp contains no SYCL code, so it can be compiled without the SYCL-specific options:

~$ clang++ -I path/to/taskflow/ -pthread -std=c++17 -c main.cpp -o main.o
~$ clang++ -fsycl -fsycl-unnamed-lambda \
           -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
           -I path/to/taskflow/ -pthread -std=c++17 -c syclflow.cpp -o syclflow.o

# now we have the two compiled .o objects, main.o and syclflow.o
~$ ls
main.o syclflow.o 

Next, link the two object files to the final executable:

~$ clang++ -fsycl -fsycl-unnamed-lambda \
           -fsycl-targets=nvptx64-nvidia-cuda-sycldevice \
           main.o syclflow.o -pthread -std=c++17 -o main

# run the main program 
~$ ./main
main.cpp!
syclflow.cpp!
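
Because all the SYCL device code lives in syclflow.cpp, a change to main.cpp only requires recompiling main.o and repeating the link step; the previously compiled syclflow.o can be reused.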