Getting started with the Guix workflow language

Installation

This guide assumes that GNU Guix and the GWL are already installed. If the GWL has not been installed yet, run:

guix install gwl

Then tell Guix where to find the GWL extension:

export GUIX_EXTENSIONS_PATH=$HOME/.guix-profile/share/guix/extensions

Introduction

The GWL has two central concepts: processes and workflows. A process describes a single computation, such as running a program. A workflow describes how multiple processes relate to each other (for example, process B must run after process A, and process C must run before process A).

Processes and workflows are composed using a domain specific language embedded in the general purpose language Scheme. They can be executed in order with the guix workflow command.

Example

Let's start by writing the obligatory “Hello, world!” to see what a workflow might look like.

process hello-world
  # { echo "Hello, world!" }

This text defines a process named “hello-world” that runs a shell snippet printing “Hello, world!” to the screen. Delightful!

Running programs

But “hello-world” alone doesn't justify building yet another workflow language. For real-world work, the GWL draws on the software deployment strengths and reproducibility guarantees of GNU Guix: the packages field automates the deployment of a potentially complex software environment.

process samtools-index
  packages "samtools"
  inputs "/tmp/sample.bam"
  # {
    samtools index {{inputs}}
  }

workflow do-the-thing
  processes samtools-index

The packages field declares that we want the samtools package to be available in the environment of this process. The package variant is fully determined by the version of Guix used and is installed automatically when the process is executed. It is important to list all packages required to run the process in the packages field.

We also defined a simple workflow named do-the-thing that executes just the samtools-index process.

In the next section, we will see how we can combine more processes in a workflow. We will also use process templates to generate processes from a list of input file names.

Defining workflows

A workflow describes how processes relate to each other. So before we can write the workflow, we must define some processes. In this example we will create a file with a process named create-file, and we will compress that file using a process named compress-file.

process create-file
  outputs
    file "file.txt"
  run-time
    complexity
      space 20 MiB
      time  10 seconds
  # { echo hello > {{outputs}} }

process compress-file
  packages "gzip"
  inputs
    file "file.txt"
  outputs
    file "file.txt.gz"
  run-time
    complexity
      space 20 mebibytes
      time   2 minutes
  # { gzip {{inputs}} -c > {{outputs}} }

With these definitions in place, we can run both in one go by defining a workflow.

workflow file-workflow
  processes
    auto-connect create-file compress-file

The workflow specifies all processes that should run. The auto-connect procedure links up all inputs and outputs of all specified processes and ensures that the processes are run in the correct order. Later we will see other ways to specify process dependencies.
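
To make the linking step concrete, here is a minimal Python sketch of the idea behind auto-connect. It is a conceptual model only, not the GWL implementation: the dictionary layout and the function name link_by_files are assumptions made for illustration. A process is taken to depend on every process that produces one of its input files.

```python
# Conceptual model only; the real auto-connect is a Scheme procedure in the GWL.
# Each process is modelled as a dict with a name, input files, and output files.
create_file = {"name": "create-file", "inputs": [], "outputs": ["file.txt"]}
compress_file = {
    "name": "compress-file",
    "inputs": ["file.txt"],
    "outputs": ["file.txt.gz"],
}


def link_by_files(processes):
    """Map each process name to the names of the processes it depends on."""
    # Remember which process produces which file.
    producer = {}
    for proc in processes:
        for out in proc["outputs"]:
            producer[out] = proc["name"]
    # A process depends on the producer of each of its inputs, if there is one.
    return {
        proc["name"]: [producer[f] for f in proc["inputs"] if f in producer]
        for proc in processes
    }


dependencies = link_by_files([create_file, compress_file])
print(dependencies)
```

In this model the order in which the processes are listed does not matter, because the dependencies come from the declared file names alone.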

Process templates

We can parameterize the inputs and outputs for a process, so that the same process template can serve for different inputs and outputs. Here is a process template that is parameterized on input:

process compress-file (with input)
  packages "gzip"
  inputs input
  outputs
    string-append input ".gz"
  run-time
    complexity
      space 20 mebibytes
      time  10 seconds
  # {
    gzip {{input}} -c > {{outputs}}
  }
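
One way to think about a process template is as a function that returns a fresh process description for every argument it receives. The Python sketch below illustrates that reading; the dictionary layout and the example file names are assumptions for illustration and do not reflect how the GWL represents processes internally.

```python
# Conceptual model only: a process template behaves like a function that
# returns a new process description for each argument it is given.
def compress_file(input_name):
    """Return a hypothetical process description for one input file."""
    output_name = input_name + ".gz"  # mirrors: string-append input ".gz"
    return {
        "name": "compress-file",
        "packages": ["gzip"],
        "inputs": [input_name],
        "outputs": [output_name],
        "command": f"gzip {input_name} -c > {output_name}",
    }


# Instantiate the template twice with different (made-up) file names.
first = compress_file("one.txt")
second = compress_file("two.txt")
print(first["command"])
print(second["command"])
```

Each call yields an independent description, so the same template can serve any number of input files.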

Dynamic workflows

We can now dynamically create compression processes by instantiating the compress-file template with specific input file names. We use Scheme's define and map to simplify the work for us:

process create-file (with filename)
  outputs filename
  run-time
    complexity
      space 20 mebibytes
      time  10 seconds
  # { echo "Hello, world!  This is {{outputs}}." > {{outputs}} }

process compress-file (with input)
  packages "gzip"
  inputs input
  outputs
    file input ".gz"
  run-time
    complexity
      space 20 mebibytes
      time  10 seconds
  # { gzip {{inputs}} -c > {{outputs}} }


;; All input files.  The leading dot continues the previous line.
define files
  list "one.txt"
     . "two.txt"
     . "three.txt"

;; Map process templates to files to generate a list of processes.
define create-file-processes
  map create-file files

define compress-file-processes
  map compress-file files

workflow dynamic-workflow
  processes
    auto-connect compress-file-processes create-file-processes

In the GWL, we can also define process dependencies explicitly. This is useful when processes have no declared inputs or outputs. A process may do something other than produce output files, such as inserting data into a database, and in that case its dependencies have to be specified manually.

These dependencies, called restrictions, can be specified as an association list mapping processes to their dependencies, or via the convenient graph syntax.

workflow graph-example
  processes
    graph
      A -> B C
      B -> D
      C -> B
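
The graph above can be read as a mapping from each process to its dependencies. The Python sketch below spells out that mapping and derives one valid execution order with the standard library's graphlib module. It assumes that the process on the left of an arrow depends on the processes on its right; it is an illustration of the ordering, not of how the GWL schedules work.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed reading of the graph: the process left of "->" depends on
# the processes to its right.
dependencies = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["B"],
}

# static_order() yields each node only after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
print(order)  # prints ['D', 'B', 'C', 'A']
```

Under this reading, D has no dependencies and runs first, while A runs last because it waits for both B and C.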

Extending workflows

In the dynamic-workflow we created files and compressed them. In the following workflow we will generate a file containing some information about these compressed files to learn how we can extend a workflow at any point in a new workflow.

;; We are going to extend the workflow defined in the file
;; "example-workflow.w".
define dynamic-workflow
  load-workflow "example-workflow.w"

process list-file-template (with filename)
  name
    string-append "list-file-"
                  basename filename
  packages "gzip"
  inputs filename
  outputs
    file filename ".list"
  run-time
    complexity
      space 20 mebibytes
      time  30 seconds
  # { gzip --list {{inputs}} > {{outputs}} }


;; Get all processes of the other workflow.
define foreign-processes
  workflow-processes dynamic-workflow

;; Get the processes that we want to extend on.
define compress-file-processes
  processes-filter-by-name foreign-processes "compress-file"

;; Create the new processes.
define list-file-processes
  map list-file-template
      append-map process-outputs compress-file-processes

workflow extended-dynamic-workflow
  processes
    append
      ;; These are the process connections of the imported workflow
      workflow-restrictions dynamic-workflow
      ;; And these are the new process connections.  The "zip" procedure
      ;; pairs up each of the processes in "list-file-processes" with
      ;; one of the processes in "compress-file-processes".
      zip list-file-processes compress-file-processes

With list-file-template we created a procedure that returns a process, which in turn generates a file containing details about a compressed archive. In extended-dynamic-workflow we use this procedure to create one such process for every compress-file process and to make it run after that process.
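
The pairing done by append-map and zip can be modelled in a few lines of Python. This is a sketch under stated assumptions: processes are plain dictionaries, the file names are made up, and each compress-file process has exactly one output, so both lists have the same length and order.

```python
import os

# Conceptual model only; names and data layout are assumptions.
compress_file_processes = [
    {"name": "compress-file", "outputs": [name + ".gz"]}
    for name in ["one.txt", "two.txt", "three.txt"]
]


def list_file_template(filename):
    """Return a hypothetical process that lists the contents of one archive."""
    return {
        "name": "list-file-" + os.path.basename(filename),
        "inputs": [filename],
        "outputs": [filename + ".list"],
    }


# Counterpart of append-map: collect the outputs of all compress processes.
archives = [out for proc in compress_file_processes for out in proc["outputs"]]
list_file_processes = [list_file_template(archive) for archive in archives]

# Counterpart of zip: pair each new process with the process it runs after.
pairs = list(zip(list_file_processes, compress_file_processes))
for list_proc, compress_proc in pairs:
    print(list_proc["name"], "runs after the producer of", compress_proc["outputs"][0])
```

Each pair has the same shape as an entry in an association list of dependencies: the new process first, followed by the process it depends on.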

In the processes field we include the contents of dynamic-workflow, thereby concisely extending it.

Further reading

The GWL manual tries to cover everything you will need to know to write real-world scientific workflows with the GWL.

The GNU Guile and GNU Guix manuals are good places to learn the language and concepts on which GWL builds.