Towards accessible programming of the Cerebras CS-2 for scientific HPC applications

Davids Kacs

EPCC, University of Edinburgh


Cerebras CS-2

  • Accelerator device for machine learning
  • Can we use it to accelerate anything else?
  • Architecture [1]:
    • 850,000 cores
    • On-chip network
    • Dedicated hardware routers

CSL

  • Language for programming the CS-2 [2]
  • Unfamiliar to HPC developers
  • Low-level details have to be managed by the user
    • Communication channels
    • SIMD
  • Uses Cerebras’ custom cslc compiler
const arr_dsd = @get_dsd(mem1d_dsd,
    .{.tensor_access = |i|{arr_sz} -> arr[i] }
);
CSL code example: a 1D memory DSD describing access to all arr_sz elements of arr.
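
For readers unfamiliar with CSL, the snippet above builds a descriptor of a memory access pattern, which vector operations can then take as an operand instead of an explicit loop. The Python sketch below only models that idea. `MemDsd1D`, `vector_add`, and their fields are hypothetical names chosen for illustration; they are not part of CSL or the Cerebras SDK, and real DSDs are handled by hardware rather than by loops like these.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MemDsd1D:
    """Hypothetical model of a 1D memory descriptor: which elements of
    `data` are visited, in the spirit of `|i|{arr_sz} -> arr[i]`."""
    data: List[float]
    length: int                  # plays the role of arr_sz
    index: Callable[[int], int]  # maps the counter i to an array index

    def indices(self) -> List[int]:
        return [self.index(i) for i in range(self.length)]

    def read(self) -> List[float]:
        return [self.data[j] for j in self.indices()]

    def write(self, values: List[float]) -> None:
        for j, v in zip(self.indices(), values):
            self.data[j] = v


def vector_add(dst: MemDsd1D, a: MemDsd1D, b: MemDsd1D) -> None:
    """Element-wise add driven entirely by the descriptors."""
    dst.write([x + y for x, y in zip(a.read(), b.read())])


arr = [1.0, 2.0, 3.0, 4.0]
out = [0.0] * 4
arr_dsd = MemDsd1D(arr, length=4, index=lambda i: i)  # models arr[i]
out_dsd = MemDsd1D(out, length=4, index=lambda i: i)
vector_add(out_dsd, arr_dsd, arr_dsd)                 # out = arr + arr

# A different access expression gives a strided view of the same array.
even_dsd = MemDsd1D(arr, length=2, index=lambda i: 2 * i)
```

The point of the model is that the access pattern is declared once and reused, which is one of the low-level details a CSL programmer has to manage by hand.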

CS-2 is difficult to program

  • In-depth knowledge of the architecture needed to achieve high performance
  • Requires learning a new language
    • More time required to port existing applications and write new ones

MLIR - a possible solution

  • Making the CS-2 programmable in an established language would make it easier to use and make better use of developer time
  • This requires a new compiler
  • MLIR compiler framework - part of the LLVM project [3]
    • Composable and extensible
    • Provides reusable high-level abstractions and optimisations, such as xDSL's stencil dialect [4]
  • xDSL - MLIR recreated in Python [5]
    • Focussed on prototyping
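
As background for the stencil dialect mentioned above, a stencil computation updates every grid point from a fixed pattern of neighbouring points. The sketch below shows a 3-point averaging stencil in plain Python. It illustrates the computation pattern only and does not use MLIR, xDSL, or the stencil dialect; the function name and the boundary treatment are assumptions made for the example.

```python
from typing import List


def jacobi_1d_step(u: List[float]) -> List[float]:
    """Apply one 3-point averaging stencil step; boundary values are kept."""
    new = list(u)
    for i in range(1, len(u) - 1):
        # Each interior point depends only on itself and its two neighbours.
        new[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return new


field = [0.0, 0.0, 3.0, 0.0, 0.0]
field = jacobi_1d_step(field)
```

Because every point uses the same neighbour pattern, a compiler can reason about the data each core needs from its neighbours, which is what makes this class of codes a natural fit for reusable high-level optimisations.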

Generating CSL code

  • CSL source generated from high-level MLIR
  • Well-documented API
  • Less flexible: relies on cslc for much of the optimisation
  • Currently in development
  • In collaboration with members of the xDSL project
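
The idea of emitting CSL source from a higher-level representation can be sketched with a toy printer. The example below is a minimal illustration in plain Python, not the xDSL implementation: `Mem1dDsdDecl` and `emit_csl` are hypothetical names, and the only CSL produced is the DSD declaration shown earlier on this poster.

```python
from dataclasses import dataclass


@dataclass
class Mem1dDsdDecl:
    """Hypothetical IR node: declare a 1D memory DSD over a whole array."""
    name: str   # name of the resulting CSL constant
    array: str  # name of the array being accessed
    size: str   # name of the constant holding the element count


def emit_csl(node: Mem1dDsdDecl) -> str:
    """Print the node as CSL source text (doubled braces are literal braces)."""
    return (
        f"const {node.name} = @get_dsd(mem1d_dsd,\n"
        f"    .{{.tensor_access = |i|{{{node.size}}} -> {node.array}[i] }}\n"
        f");"
    )


csl_source = emit_csl(Mem1dDsdDecl(name="arr_dsd", array="arr", size="arr_sz"))
print(csl_source)
```

In this approach the generated text is handed to cslc, so any optimisation below the level of the emitted source is left to that compiler.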

Generating cslc's MLIR

  • Low-level interface to the compiler
  • Better control of the compiled binary
    • Higher complexity
    • Unstable, undocumented API
  • Initial stages of exploration

Similarities

Both approaches can reuse high-level stencil optimisations and communication library components written in CSL.

Figure: Compiler pipeline. The two methods can share a lot of the implementation.

Future work

  • Complete the lowering from stencil MLIR to CSL code
  • Adapt this lowering to cslc's MLIR
  • Use Flang [6] as a frontend in the pipeline to compile Fortran for the CS-2
  • Benchmark both implementations on a range of stencil codes

References