nf-core/configs: iris
IRIS profile provided to run nextflow pipelines on the IRIS cluster at Memorial Sloan Kettering Cancer Center (MSKCC)
nf-core/configs: IRIS Configuration
All nf-core pipelines have been successfully configured for use on the IRIS cluster at Memorial Sloan Kettering Cancer Center (MSKCC).
To use, run the pipeline with -profile iris. This will download and launch the iris.config which has been pre-configured with a setup suitable for the IRIS cluster. Using this profile, Singularity images containing all required software will be pulled from our local library or downloaded and cached before execution of the pipeline.
Before running the pipeline
Before running a pipeline for the first time, make sure that the right versions of Java, Nextflow and Singularity are available on the cluster. The IRIS cluster uses the SLURM job scheduler, and Nextflow will automatically submit jobs via SLURM.
Load Java and Singularity
module load java/23.0.1
Singularity 4.1 should be loaded by default at /usr/bin/singularity
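As a quick sanity check after loading the modules, a small helper can report whether each tool is on your PATH. This is only a sketch: check_tool is a hypothetical helper name, not something provided by the IRIS profile.

```bash
# Hypothetical helper (not part of the IRIS profile): report whether a tool is on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "ok: $1 found at $(command -v "$1")"
    else
        echo "missing: $1"
        return 1
    fi
}

# Check the tools this section loads; "|| true" keeps going after a missing tool.
for tool in java singularity; do
    check_tool "$tool" || true
done
```

If a tool is reported as missing, repeat the module load step above before launching a pipeline.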
Install Nextflow
You can install nextflow by running:
curl -s https://get.nextflow.io | bash
chmod +x nextflow
Running the pipeline
A typical command to run an nf-core pipeline on IRIS would look like:
nextflow run nf-core/<PIPELINE> -profile iris [additional pipeline parameters]
Optional Parameters
The IRIS config provides several optional parameters to customize job submission and paths:
- --group: Your IRIS group name (e.g., core006). When specified, sets the default working directory to /scratch/<YOUR_GROUP>/work. If not specified, the working directory defaults to ./work in your current directory.
- --partition: Specify a SLURM partition (default: uses the $NXF_SLURM_PARTITION environment variable or cpu)
- --qos: Set Quality of Service specification for SLURM jobs (e.g., priority)
- --preemptable: Set to true to use preemptable queues for faster job submission (default: false)
- --isolated: Set to true to restrict jobs to only the specified partition (default: false)
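The fallback order for the partition (command-line value, then $NXF_SLURM_PARTITION, then cpu) can be illustrated with a short shell sketch. The function name resolve_partition is hypothetical and only mirrors the documented precedence; the profile itself resolves this inside the Nextflow config.

```bash
# Hypothetical illustration of the documented precedence:
# explicit --partition value > $NXF_SLURM_PARTITION > "cpu".
resolve_partition() {
    echo "${1:-${NXF_SLURM_PARTITION:-cpu}}"
}

# With nothing set, the default partition is used.
unset NXF_SLURM_PARTITION
resolve_partition

# An exported environment variable changes the default for every run.
export NXF_SLURM_PARTITION=cpushort
resolve_partition

# An explicit value always wins.
resolve_partition gpu
```

Exporting NXF_SLURM_PARTITION in your shell startup file is one way to avoid passing --partition on every run.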
Example Commands
Basic usage:
nextflow run nf-core/rnaseq -profile iris --input samplesheet.csv --genome GRCh38
Using the group parameter to set default working directory:
nextflow run nf-core/rnaseq -profile iris --group mygroup --input samplesheet.csv --genome GRCh38
Explicitly setting work and output directories:
nextflow run nf-core/rnaseq -profile iris \
-work-dir /scratch/mygroup/work \
--outdir /data1/mygroup/results \
--input samplesheet.csv --genome GRCh38
Using preemptable queue for faster submission:
nextflow run nf-core/rnaseq -profile iris --preemptable true --input samplesheet.csv --genome GRCh38
Using a QoS for priority:
nextflow run nf-core/rnaseq -profile iris --partition cpu --qos priority --input samplesheet.csv --genome GRCh38
Cluster Details
Resource Limits
The IRIS config sets the following maximum resource limits:
- CPUs: 52 cores per job
- Memory: 550 GB per job
- Time: 7 days per job
Queue Selection
The config automatically selects appropriate SLURM queues based on job requirements:
- cpushort: Jobs with runtime ≤ 2 hours (CPU only)
- gpushort: GPU jobs with runtime ≤ 2 hours
- gpu: Regular GPU jobs
- cpu_highmem: Jobs requiring ≥ 512 GB memory or ≥ 50 GB per CPU
- preemptable: Used when --preemptable true is set
- cpu: Default queue for standard CPU jobs
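The rules above can be read as a small decision function. The sketch below is an illustration only: the function name select_queue and, in particular, the order in which the rules are checked (preemptable first, then GPU, then high memory, then short jobs) are assumptions, since the list does not state which rule wins when several apply.

```bash
# Hypothetical sketch of the queue rules listed above.
# Arguments: runtime in hours, memory in GB, CPUs, GPUs, preemptable (true/false).
# The precedence between rules is an assumption, not taken from the IRIS config.
select_queue() (
    time_h=$1
    mem_gb=$2
    cpus=$3
    gpus=$4
    preemptable=$5

    if [ "$preemptable" = "true" ]; then
        echo "preemptable"
    elif [ "$gpus" -gt 0 ]; then
        # GPU jobs: short queue for runtimes of 2 hours or less
        if [ "$time_h" -le 2 ]; then echo "gpushort"; else echo "gpu"; fi
    elif [ "$mem_gb" -ge 512 ] || [ "$mem_gb" -ge $((50 * cpus)) ]; then
        # 512 GB or more in total, or 50 GB or more per CPU
        echo "cpu_highmem"
    elif [ "$time_h" -le 2 ]; then
        echo "cpushort"
    else
        echo "cpu"
    fi
)

# Starting values of process_low (2 h, 12 GB, 2 CPUs, no GPU)
select_queue 2 12 2 0 false
# Starting values of process_gpu (8 h, 25 GB, 6 CPUs, 1 GPU)
select_queue 8 25 6 1 false
```

Under these assumptions a job that is retried with a longer runtime can move from a short queue to the regular one on its next attempt.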
GPU Support
The config includes support for GPU jobs. Processes labeled with process_gpu or process_gpu_low will automatically:
- Request GPU resources via SLURM (--gres=gpu:1)
- Use appropriate GPU queues (gpu or gpushort)
- Enable GPU support in Singularity containers (--nv flag)
Proactive Resource Detection
The system also monitors resource usage patterns to prevent failures:
- Near Out of Memory: When peak RSS reaches ≥80% of allocated memory
- Near Out of Time: When realtime reaches ≥80% of allocated time
- CPU Starved: When CPU usage reaches ≥80% of available CPU capacity
These conditions trigger proactive resource increases even if the job completes successfully, helping prevent failures in subsequent similar jobs.
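All three checks share the same 80% threshold, so they can be sketched as one comparison between what a task used and what it was allocated. The helper name near_limit and the sample values are hypothetical, and the CPU line assumes usage is reported as a percentage in which 100 means one fully used core.

```bash
# Hypothetical helper: succeeds when usage is at or above 80% of the allocation.
# Both arguments must be integers in the same unit (GB, minutes, CPU percent).
near_limit() {
    used=$1
    allocated=$2
    [ $((used * 100)) -ge $((allocated * 80)) ]
}

# Made-up trace values for one finished task.
peak_rss_gb=30;   memory_gb=36    # about 83% of the allocated memory
realtime_min=200; time_min=480    # about 42% of the allocated time
cpu_percent=550;  cpus=6          # about 92% of 6 cores

if near_limit "$peak_rss_gb" "$memory_gb"; then echo "near out of memory"; fi
if near_limit "$realtime_min" "$time_min"; then echo "near out of time"; fi
if near_limit "$cpu_percent" $((cpus * 100)); then echo "CPU starved"; fi
```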
Process Labels
The config defines several process labels with default resources that scale automatically:
| Label | CPUs | Memory | Time |
|---|---|---|---|
| process_single | 1 | 1 GB | 4 h |
| process_low | 2 | 12 GB | 2 h |
| process_medium | 6 | 36 GB | 8 h |
| process_high | 12 | 72 GB | 16 h |
| process_long | 2 | 12 GB | 20 h |
| process_high_memory | 6 | 200 GB | 8 h |
| process_gpu | 6 | 25 GB | 8 h |
| process_gpu_low | 6 | 25 GB | 2 h |
These are starting values that will be automatically increased on retry if needed.
Singularity Configuration
The config uses Singularity for containerization with the following settings:
- Cache Directory: Automatically set based on working directory or $NXF_SINGULARITY_CACHEDIR
- Library Directory: Uses the shared library, /data1/core006/resources/singularity_image_library (or $NXF_SINGULARITY_LIBRARYDIR)
- Auto-mounting: Enabled for seamless file access
- Scratch Space: Uses /localscratch when available
Working Directory
- If the working directory is not set explicitly, it is configured automatically from your group (via --group <YOUR_GROUP>), giving /scratch/<YOUR_GROUP>/work.
- If no group is given, or /scratch/<YOUR_GROUP> does not exist, the work directory is ./work in your current directory
- Automatic cleanup is enabled when using /scratch to save space
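The selection of the default work directory can be sketched in shell as follows. resolve_work_dir is a hypothetical name, and the second argument exists only so the sketch can be tried outside the cluster; on IRIS the root is /scratch.

```bash
# Hypothetical sketch of the default work directory choice described above.
# $1: group name (may be empty); $2: work root, defaulting to /scratch.
resolve_work_dir() (
    group=$1
    work_root=${2:-/scratch}
    if [ -n "$group" ] && [ -e "$work_root/$group" ]; then
        # The group's scratch area exists: work there.
        echo "$work_root/$group/work"
    else
        # No group given, or no scratch area for it: fall back to ./work.
        echo "$PWD/work"
    fi
)

# Without a group the current directory is used.
resolve_work_dir ""
```

Passing -work-dir on the command line, as in the example commands, overrides whichever default is chosen.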
Automatic Resource Management
The IRIS config includes intelligent retry logic that automatically adjusts resources when jobs fail. The system monitors job execution and dynamically scales resources based on failure patterns and resource utilization.
Retry Strategy Overview
- Jobs are automatically retried up to 3 times on failure
- Resources are dynamically increased based on the failure type and attempt number
- The system uses both multiplicative (scales with attempt) and additive (fixed increment) strategies
Resource Scaling Logic
Memory Scaling
Memory is increased based on the failure type:
| Failure Condition | Attempt 2 | Attempt 3 | Attempt 4+ |
|---|---|---|---|
| Out of Memory (exit 125, 137) | Previous + 10 GB | Previous + 20 GB | Previous + 30 GB |
| Out of Time (exit 15, 140) | Previous + 4 GB | Previous + 8 GB | Previous + 12 GB |
| Near Out of Memory (≥80% used) | Previous + 4 GB | Previous + 8 GB | Previous + 12 GB |
| Other failures | Previous + 2 GB | Previous + 4 GB | Previous + 10 GB |
Formula: new_memory = previous_memory + (multiplier × attempt) + base_increment
CPU Scaling
CPUs are increased when jobs are time-constrained or CPU-starved:
| Failure Condition | Attempt 2 | Attempt 3 | Attempt 4+ |
|---|---|---|---|
| Out of Time (exit 15, 140) | Previous + 1 | Previous + 2 | Previous + 3 |
| Near Out of Time (≥80% used) | Previous + 1 | Previous + 2 | Previous + 3 |
| CPU Starved (≥80% CPU usage) | Previous + 2 | Previous + 4 | Previous + 5 |
| Other failures | Previous | Previous | Previous + 1 |
Formula: new_cpus = previous_cpus + (multiplier × attempt) + base_increment
Time Scaling
Runtime limits are increased for time-related failures:
| Failure Condition | Attempt 2 | Attempt 3 | Attempt 4+ |
|---|---|---|---|
| Out of Time (exit 15, 140) | Previous + 12 h | Previous + 24 h | Previous + 36 h |
| Near Out of Time (≥80% used) | Previous + 12 h | Previous + 24 h | Previous + 36 h |
| Other failures | Previous + 2 h | Previous + 4 h | Previous + 1 d |
Formula: new_time = previous_time + (multiplier × attempt) + base_increment
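To make the three tables easier to compare, the sketch below encodes them as one lookup that returns the amount added on top of the previous attempt. It is an illustration of the tables as written, not of the config closures themselves: the names retry_increment and nth are hypothetical, the failure labels are invented for this example, and "Attempt 4+" is read as a fixed value for every attempt from the fourth onward.

```bash
# Print the Nth of the remaining arguments: "nth 2 a b c" prints "b".
nth() {
    shift "$1"
    echo "$1"
}

# Hypothetical lookup of the scaling tables above.
# $1: resource (memory_gb | cpus | time_h)
# $2: failure  (out_of_memory | out_of_time | near_out_of_memory |
#               near_out_of_time | cpu_starved | anything else = other)
# $3: attempt number (2 = first retry)
retry_increment() (
    resource=$1
    failure=$2
    step=$(($3 - 1))
    if [ "$step" -gt 3 ]; then step=3; fi   # "Attempt 4+" column

    if [ "$step" -lt 1 ]; then
        echo 0                              # first attempt: starting values
    else
        case "$resource:$failure" in
            memory_gb:out_of_memory)      echo $((10 * step)) ;;
            memory_gb:out_of_time)        echo $((4 * step)) ;;
            memory_gb:near_out_of_memory) echo $((4 * step)) ;;
            memory_gb:*)                  nth "$step" 2 4 10 ;;
            cpus:out_of_time)             echo "$step" ;;
            cpus:near_out_of_time)        echo "$step" ;;
            cpus:cpu_starved)             nth "$step" 2 4 5 ;;
            cpus:*)                       nth "$step" 0 0 1 ;;
            time_h:out_of_time)           echo $((12 * step)) ;;
            time_h:near_out_of_time)      echo $((12 * step)) ;;
            time_h:*)                     nth "$step" 2 4 24 ;;
            *) echo "unknown resource: $resource" >&2; exit 1 ;;
        esac
    fi
)

# Increment applied to the runtime on attempt 3 after an out-of-time failure.
retry_increment time_h out_of_time 3
```

The results are still subject to the resource limits listed earlier (52 CPUs, 550 GB, 7 days), which this sketch does not apply.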
Example Retries
Out of Memory Failure
A process_medium job runs out of memory:
Attempt 1: 6 CPUs, 36 GB, 8h → Out of Memory (exit 137)
Attempt 2: 6 CPUs, 46 GB, 8h → Out of Memory (exit 137)
Attempt 3: 6 CPUs, 56 GB, 8h → Success
---
config:
themeVariables:
cScale0: "#D3212C"
cScale1: "#FF681E"
cScale2: "#006B3D"
radar:
curveOpacity: .10
curveStrokeWidth: 2
---
radar-beta
title Out of Memory
axis m["Memory"], s["Cpu"], e["Time"]
curve a["Attempt 1"]{36,6, 8}
curve b["Attempt 2"]{46,6, 8}
curve c["Attempt 3"]{56,6, 8}
ticks 3
Out of Time Failure
A process_low job exceeds the time limit:
Attempt 1: 2 CPUs, 12 GB, 2h → Out of Time (exit 140)
Attempt 2: 3 CPUs, 16 GB, 14h → Out of Time (exit 140)
Attempt 3: 5 CPUs, 24 GB, 38h → Success
---
config:
themeVariables:
cScale0: "#D3212C"
cScale1: "#FF681E"
cScale2: "#006B3D"
radar:
curveOpacity: .10
curveStrokeWidth: 2
---
radar-beta
title Out of Time
axis m["Memory"], s["Cpu"], e["Time"]
curve a["Attempt 1"]{12,2, 2}
curve b["Attempt 2"]{16,3, 14}
curve c["Attempt 3"]{24,5, 38}
ticks 3
Complex Multi-Failure Path
A job experiences multiple failure types across retries:
Attempt 1: 6 CPUs, 36 GB, 8h → Out of Memory (exit 137)
Attempt 2: 6 CPUs, 46 GB, 8h → Out of Time (exit 140)
Attempt 3: 7 CPUs, 50 GB, 20h → Success
---
config:
themeVariables:
cScale0: "#D3212C"
cScale1: "#FF681E"
cScale2: "#006B3D"
radar:
curveOpacity: .10
curveStrokeWidth: 2
---
radar-beta
title Multi-Failure
axis m["Memory"], s["Cpu"], e["Time"]
curve a["Attempt 1"]{36,6, 8}
curve b["Attempt 2"]{46,6, 8}
curve c["Attempt 3"]{50,7, 20}
ticks 3
Getting Help
If you have any questions or issues running nf-core pipelines on IRIS, please contact:
- IRIS Support: Nikhil Kumar (kumarn1@mskcc.org)
- nf-core Slack: https://nfcore.slack.com
Notes
Note: You will need an account on the IRIS cluster at MSKCC to use this profile.
Note: Nextflow should be run from a compute node (via srun or sbatch), not from the login node, to avoid overloading the login infrastructure.
Note: The config automatically enables trace reports to help monitor pipeline execution and resource usage.
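One way to follow the note about compute nodes is to wrap the Nextflow command in a small SLURM batch script. The sketch below writes such a script to a file. The helper name write_launcher, the job name, and the resources requested for the Nextflow head job (2 CPUs, 8 GB, 24 hours on the cpu partition) are placeholders chosen for illustration and are not recommendations from the IRIS team.

```bash
# Hypothetical helper: write a SLURM batch script that runs the Nextflow
# head job on a compute node. Resource values are placeholders.
# $1: path of the script to create; $2: nf-core pipeline name.
write_launcher() (
    script_path=$1
    pipeline=$2
    cat > "$script_path" <<EOF
#!/bin/bash
#SBATCH --job-name=nf-head
#SBATCH --partition=cpu
#SBATCH --cpus-per-task=2
#SBATCH --mem=8G
#SBATCH --time=24:00:00

module load java/23.0.1
nextflow run nf-core/${pipeline} -profile iris "\$@"
EOF
)
```

For example, after write_launcher run_rnaseq.sh rnaseq, the pipeline could be submitted with sbatch run_rnaseq.sh --input samplesheet.csv --genome GRCh38; arguments after the script name are passed through to Nextflow.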
Config file
params {
    config_profile_description = 'IRIS profile provided to run nextflow pipelines on the IRIS cluster at Memorial Sloan Kettering Cancer Center (MSKCC)'
    config_profile_contact = 'Nikhil Kumar (kumarn1@mskcc.org)'
    config_profile_url = 'https://github.com/mskcc-omics-workflows'

    // Resource Limits
    max_cpus = 52
    max_memory = 550.GB
    max_time = 7.d

    // Job Submission Options
    preemptable = false // Use preemptable queue for faster submission
    isolated = false // Set to true when you can only use the provided partition
    group = '' // IRIS group for the job work default path (e.g. /scratch/my_group)
    qos = '' // Set Quality of Service specification for SLURM jobs (e.g. priority)
    partition = '' // SLURM partition (uses $NXF_SLURM_PARTITION or 'cpu' if not set)

    // Path config
    scratch_path = '/localscratch'
    work_path = '/scratch'
    singularity_library = '/data1/core006/resources/singularity_image_library'

    // Validation Parameters
    ignore_params_list = [
        'max_cpus', 'max_memory', 'max_time', 'preemptable', 'scratch_path', 'work_path',
        'singularity_library', 'isolated', 'group', 'qos', 'partition', 'scratch',
        'ignore_params_list', 'schema_ignore_params', 'validationSchemaIgnoreParams',
        'singularity_library_dir', 'singularity_scratch', 'work_dir'
    ]
    schema_ignore_params = params.ignore_params_list.join(',')
    validationSchemaIgnoreParams = params.ignore_params_list.join(',')
}

validation {
    ignoreParams = params.ignore_params_list
}

// Set sensible defaults
params.partition = params.partition ?: env('NXF_SLURM_PARTITION') ?: 'cpu'
params.scratch = new java.io.File(params.scratch_path).exists() ? params.scratch_path : "${env('PWD')}/scratch"
params.work_dir = new java.io.File("${params.work_path}/${params.group}").exists() && new java.io.File("${params.work_path}/${params.group}").path != '/scratch' ? new java.io.File("${params.work_path}/${params.group}").path + '/work' : "${env('PWD')}/work"
workDir = params.work_dir
cleanup = params.work_dir.startsWith('/scratch')
params.singularity_scratch = env('NXF_SINGULARITY_CACHEDIR') ?: params.work_dir + '/singularity_scratch'
params.singularity_library_dir = env('NXF_SINGULARITY_LIBRARYDIR') ?: params.singularity_library

executor {
    name = 'slurm'
    pollInterval = 45.s
    queueSize = 500
    queueStatInterval = '1 min'
    submitRateLimit = '95/1min'
    retry.delay = '1s'
    retry.maxDelay = '1 min'
}

singularity {
    enabled = true
    autoMounts = true
    cacheDir = params.singularity_scratch
    libraryDir = params.singularity_library_dir
    pullTimeout = 1.hour
}

// IRIS SLURM Exit Codes:
// 15, 140 = Wall time limit exceeded
// 125, 137 = Out of memory
process {
    arch = 'linux/x86_64'
    executor = 'slurm'
    resourceLimits = [
        cpus: params.max_cpus,
        memory: params.max_memory,
        time: params.max_time
    ]
withLabel: process_single { cpus = { task.attempt == 1 ? 1 : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace['%cpu'] && task.previousTrace['%cpu'] / task.previousTrace.cpus >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 2 : task.cpus + 2) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus : task.cpus) } memory = { task.attempt == 1 ? 1.GB : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (10.GB * task.attempt) : task.memory + (10.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.memory ? 
(task.previousTrace.memory as nextflow.util.MemoryUnit) + 10.GB : task.memory + 10.GB) : (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 2.GB : task.memory + 2.GB) } time = { task.attempt == 1 ? 4.h : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + (12.h * task.attempt) : task.time + (12.h * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 12.h : task.time + 12.h) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 1.d : task.time + 1.d) : (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 2.h : task.time + 2.h) } } withLabel: process_low { cpus = { task.attempt == 1 ? 2 : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace['%cpu'] && task.previousTrace['%cpu'] / task.previousTrace.cpus >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 2 : task.cpus + 2) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : (task.previousTrace && task.previousTrace.cpus ? 
task.previousTrace.cpus : task.cpus) } memory = { task.attempt == 1 ? 12.GB : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (10.GB * task.attempt) : task.memory + (10.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 10.GB : task.memory + 10.GB) : (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 2.GB : task.memory + 2.GB) } time = { task.attempt == 1 ? 2.h : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + (12.h * task.attempt) : task.time + (12.h * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 12.h : task.time + 12.h) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.time ? 
(task.previousTrace.time as nextflow.util.Duration) + 1.d : task.time + 1.d) : (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 2.h : task.time + 2.h) } } withLabel: process_medium { cpus = { task.attempt == 1 ? 6 : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace['%cpu'] && task.previousTrace['%cpu'] / task.previousTrace.cpus >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 2 : task.cpus + 2) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus : task.cpus) } memory = { task.attempt == 1 ? 36.GB : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (10.GB * task.attempt) : task.memory + (10.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 ? (task.previousTrace && task.previousTrace.memory ? 
(task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 10.GB : task.memory + 10.GB) : (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 2.GB : task.memory + 2.GB) } time = { task.attempt == 1 ? 8.h : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + (12.h * task.attempt) : task.time + (12.h * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 12.h : task.time + 12.h) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 1.d : task.time + 1.d) : (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 2.h : task.time + 2.h) } } withLabel: process_high { cpus = { task.attempt == 1 ? 12 : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace['%cpu'] && task.previousTrace['%cpu'] / task.previousTrace.cpus >= .80 ? (task.previousTrace && task.previousTrace.cpus ? 
task.previousTrace.cpus + 2 : task.cpus + 2) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus : task.cpus) } memory = { task.attempt == 1 ? 72.GB : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (10.GB * task.attempt) : task.memory + (10.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 10.GB : task.memory + 10.GB) : (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 2.GB : task.memory + 2.GB) } time = { task.attempt == 1 ? 16.h : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + (12.h * task.attempt) : task.time + (12.h * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.time ? 
(task.previousTrace.time as nextflow.util.Duration) + 12.h : task.time + 12.h) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 1.d : task.time + 1.d) : (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 2.h : task.time + 2.h) } } withLabel: process_long { cpus = { task.attempt == 1 ? 2 : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace['%cpu'] && task.previousTrace['%cpu'] / task.previousTrace.cpus >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 2 : task.cpus + 2) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus : task.cpus) } memory = { task.attempt == 1 ? 12.GB : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (10.GB * task.attempt) : task.memory + (10.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.memory ? 
(task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 10.GB : task.memory + 10.GB) : (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 2.GB : task.memory + 2.GB) } time = { task.attempt == 1 ? 20.h : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + (12.h * task.attempt) : task.time + (12.h * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 12.h : task.time + 12.h) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 1.d : task.time + 1.d) : (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 2.h : task.time + 2.h) } } withLabel: process_high_memory { cpus = { task.attempt == 1 ? 6 : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ? 
(task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : task.attempt > 1 && task.previousTrace && task.previousTrace['%cpu'] && task.previousTrace['%cpu'] / task.previousTrace.cpus >= .80 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 2 : task.cpus + 2) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) : (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus : task.cpus) } memory = { task.attempt == 1 ? 200.GB : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (10.GB * task.attempt) : task.memory + (10.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 1 && task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) : task.attempt > 3 ? (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 10.GB : task.memory + 10.GB) : (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 2.GB : task.memory + 2.GB) } time = { task.attempt == 1 ? 8.h : task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ? (task.previousTrace && task.previousTrace.time ? 
                (task.previousTrace.time as nextflow.util.Duration) + (12.h * task.attempt) : task.time + (12.h * task.attempt)) :
            task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ?
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 12.h : task.time + 12.h) :
            task.attempt > 3 ?
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 1.d : task.time + 1.d) :
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 2.h : task.time + 2.h)
        }
    }
    withLabel: process_gpu {
        cpus = {
            task.attempt == 1 ? 6 :
            task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ?
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) :
            task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ?
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) :
            task.attempt > 1 && task.previousTrace && task.previousTrace['%cpu'] && task.previousTrace['%cpu'] / task.previousTrace.cpus >= .80 ?
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 2 : task.cpus + 2) :
            task.attempt > 3 ?
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) :
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus : task.cpus)
        }
        memory = {
            task.attempt == 1 ? 25.GB :
            task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) ?
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (10.GB * task.attempt) : task.memory + (10.GB * task.attempt)) :
            task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ?
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) :
            task.attempt > 1 && task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 ?
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) :
            task.attempt > 3 ?
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 10.GB : task.memory + 10.GB) :
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 2.GB : task.memory + 2.GB)
        }
        time = {
            task.attempt == 1 ? 8.h :
            task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ?
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + (12.h * task.attempt) : task.time + (12.h * task.attempt)) :
            task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ?
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 12.h : task.time + 12.h) :
            task.attempt > 3 ?
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 1.d : task.time + 1.d) :
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 2.h : task.time + 2.h)
        }
        accelerator = 1
    }
    withLabel: process_gpu_low {
        cpus = {
            task.attempt == 1 ? 6 :
            task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ?
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) :
            task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ?
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) :
            task.attempt > 1 && task.previousTrace && task.previousTrace['%cpu'] && task.previousTrace['%cpu'] / task.previousTrace.cpus >= .80 ?
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 2 : task.cpus + 2) :
            task.attempt > 3 ?
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus + 1 : task.cpus + 1) :
                (task.previousTrace && task.previousTrace.cpus ? task.previousTrace.cpus : task.cpus)
        }
        memory = {
            task.attempt == 1 ? 25.GB :
            task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 125 || task.previousTrace.exit == 137) ?
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (10.GB * task.attempt) : task.memory + (10.GB * task.attempt)) :
            task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ?
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) :
            task.attempt > 1 && task.previousTrace && task.previousTrace.peak_rss && task.previousTrace.peak_rss / task.previousTrace.memory >= .80 ?
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + (4.GB * task.attempt) : task.memory + (4.GB * task.attempt)) :
            task.attempt > 3 ?
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 10.GB : task.memory + 10.GB) :
                (task.previousTrace && task.previousTrace.memory ? (task.previousTrace.memory as nextflow.util.MemoryUnit) + 2.GB : task.memory + 2.GB)
        }
        time = {
            task.attempt == 1 ? 2.h :
            task.attempt > 1 && task.previousTrace && (task.previousTrace.exit == 15 || task.previousTrace.exit == 140) ?
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + (12.h * task.attempt) : task.time + (12.h * task.attempt)) :
            task.attempt > 1 && task.previousTrace && task.previousTrace.realtime && task.previousTrace.realtime / task.previousTrace.time >= .80 ?
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 12.h : task.time + 12.h) :
            task.attempt > 3 ?
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 1.d : task.time + 1.d) :
                (task.previousTrace && task.previousTrace.time ? (task.previousTrace.time as nextflow.util.Duration) + 2.h : task.time + 2.h)
        }
        accelerator = 1
    }
    queue = {
        if (params.isolated && params.preemptable) {
            return "preemptable,${params.partition}"
        }
        // Only use the set partition when isolated
        else if (params.isolated) {
            return params.partition
        }
        // Short CPU jobs
        else if (task.time <= 2.h && !task.accelerator) {
            return "cpushort,cpu,${params.partition}"
        }
        // Short GPU jobs
        else if (task.accelerator && task.time <= 2.h) {
            return 'gpushort,gpu'
        }
        // GPU jobs
        else if (task.accelerator) {
            return 'gpu'
        }
        // High memory jobs
        else if (task.memory >= 512.GB || task.memory / task.cpus >= 50.GB) {
            return "cpu_highmem,cpu,${params.partition}"
        }
        // Preemptable jobs
        else if (task.attempt < 2 && params.preemptable) {
            return "preemptable,cpu,${params.partition}"
        }
        else {
            return params.partition
        }
    }
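    // Illustration only (not part of iris.config): a sketch of how the queue
    // closure above would be expected to resolve for a few hypothetical tasks.
    // Assumptions: params.partition resolves to 'cpu', and both params.isolated
    // and params.preemptable are false.
    //   time = 1.h, no accelerator               -> "cpushort,cpu,cpu"
    //   time = 1.h, accelerator = 1              -> 'gpushort,gpu'
    //   time = 8.h, accelerator = 1              -> 'gpu'
    //   time = 8.h, memory = 600.GB, cpus = 16   -> "cpu_highmem,cpu,cpu"
    //   time = 8.h, memory = 32.GB,  cpus = 8    -> 'cpu'
    // Branches are checked in order, so a short GPU task is routed by the
    // short-GPU rule before the high-memory rule is ever considered.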
    // Cluster Options for GPU and QoS
    clusterOptions = {
        if (task.accelerator && params.qos) {
            return "--qos=${params.qos} --gres=gpu:${task.accelerator.request}"
        } else if (task.accelerator) {
            return "--gres=gpu:${task.accelerator.request}"
        } else if (params.qos) {
            return "--qos=${params.qos}"
        } else {
            return ''
        }
    }
    // Container Options for GPU Support
    containerOptions = {
        if (task.accelerator && workflow.containerEngine == 'singularity') {
            return '--nv'
        } else if (task.accelerator && workflow.containerEngine == 'docker') {
            return '--gpus all'
        } else {
            return ''
        }
    }
    scratch = params.scratch
    cache = true // Use 'lenient' if caches are not working
    beforeScript = 'unset R_LIBS; export SINGULARITYENV_TMPDIR=$NXF_SCRATCH; export SINGULARITYENV_TMP=$NXF_SCRATCH'
    maxRetries = 3
    errorStrategy = { task.attempt < 4 ? 'retry' : 'ignore' }
    publishDir.mode = 'copy'
    publishDir.enabled = { publishDir.path ? true : false }
    stageInMode = 'symlink'
    stageOutMode = 'copy'
}
workflow.output.mode = 'copy'
trace {
    enabled = true
}