Part 4: Input validation

In this fourth part of the Hello nf-core training course, we show you how to use the nf-schema plugin to validate inputs.

One of the key features of nf-core pipelines is robust input validation. The nf-schema plugin provides automated validation of pipeline parameters and sample sheets against JSON schemas, helping catch errors early and provide clear feedback to users.

In this section, we'll add proper input validation to our pipeline using the nf-schema plugin.

Note

This section assumes you have completed Part 2: Rewrite Hello for nf-core and have a working core-hello pipeline.


1. Understand schema-based validation

Before we implement validation, let's understand how nf-schema works and what schema files do.

1.1. What is nf-schema?

The nf-schema plugin is a Nextflow plugin that provides functionality for:

  • Parameter validation: Validating pipeline parameters against a JSON schema
  • Sample sheet validation: Validating input sample sheets and converting them to channels
  • Help text generation: Automatically generating help text from schema definitions
  • Parameter summary: Displaying which parameters differ from defaults

nf-schema is the successor to the deprecated nf-validation plugin and uses standard JSON Schema Draft 2020-12 for validation.
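For reference, the plugin itself is declared in the pipeline's nextflow.config. The nf-core template sets this up for you; the version pin shown here is illustrative and may differ in your template:

core-hello/nextflow.config (excerpt)
plugins {
    id 'nf-schema@2.1.1'
}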

1.2. The two schema files

An nf-core pipeline typically uses two schema files:

  1. nextflow_schema.json: Defines and validates pipeline parameters (e.g., --input, --outdir, --batch)
  2. assets/schema_input.json: Defines the structure and validates the contents of input sample sheets

Both files use JSON Schema format, a standard for describing and validating JSON data structures.

1.3. Examine an existing schema

Let's look at the schema_input.json file that was created with our pipeline template:

cat core-hello/assets/schema_input.json
core-hello/assets/schema_input.json
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://raw.githubusercontent.com/core/hello/main/assets/schema_input.json",
    "title": "core/hello pipeline - params.input schema",
    "description": "Schema for the file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "sample": {
                "type": "string",
                "pattern": "^\\S+$",
                "errorMessage": "Sample name must be provided and cannot contain spaces",
                "meta": ["id"]
            },
            "fastq_1": {
                "type": "string",
                "format": "file-path",
                "exists": true,
                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
                "errorMessage": "FastQ file for reads 1 must be provided, cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
            },
            "fastq_2": {
                "type": "string",
                "format": "file-path",
                "exists": true,
                "pattern": "^\\S+\\.f(ast)?q\\.gz$",
                "errorMessage": "FastQ file for reads 2 cannot contain spaces and must have extension '.fq.gz' or '.fastq.gz'"
            }
        },
        "required": ["sample", "fastq_1"]
    }
}

This schema defines:

  • The input is an array of objects (one per sample)
  • Each object has fields: sample, fastq_1, and fastq_2
  • Field types and validation rules (e.g., file paths must exist and match certain patterns)
  • Which fields are required
  • Custom error messages for validation failures

This schema is appropriate for FASTQ sequencing data but doesn't match our simple greetings CSV format.
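To make this concrete, here is a sketch of how one row of a matching samplesheet would come out after parsing and validation. Because of the meta: ["id"] directive, the sample value is lifted into a meta map (file-path fields are actually returned as file objects; plain strings are shown here for readability):

// one samplesheet row:
//   sample,fastq_1,fastq_2
//   sampleA,sampleA_R1.fastq.gz,sampleA_R2.fastq.gz
// is parsed into an entry along these lines:
def entry = [ [id: 'sampleA'], 'sampleA_R1.fastq.gz', 'sampleA_R2.fastq.gz' ]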

Takeaway

You now know what nf-schema is, what it does, and how schema files define validation rules for pipeline inputs.

What's next?

Create a custom schema file for our greetings input format.


2. Create a schema for the greetings input

Our greetings.csv file is very simple: it just contains one greeting per line. Let's create a schema that validates this format.

2.1. Understand the greetings.csv format

Let's remind ourselves what our input looks like:

cat core-hello/assets/greetings.csv
core-hello/assets/greetings.csv
Hello
Bonjour
Holà

This is a simple CSV with:

  • One column (no header)
  • One greeting per line
  • Text strings with no special format requirements

2.2. Design the schema structure

For our use case, we want to:

  1. Accept CSV input with one column
  2. Treat each row as a greeting string
  3. Ensure greetings are not empty
  4. Optionally, ensure no whitespace-only entries

We'll structure this as an array of objects, where each object has a greeting field.
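Conceptually, after parsing and validation we want the contents of the samplesheet to look like this (a Groovy sketch of the target shape, not pipeline code):

def greetings = [
    [greeting: 'Hello'],
    [greeting: 'Bonjour'],
    [greeting: 'Holà']
]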

2.3. Create the schema file

Replace the contents of core-hello/assets/schema_input.json with the following:

core-hello/assets/schema_input.json
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "$id": "https://raw.githubusercontent.com/core/hello/main/assets/schema_input.json",
    "title": "core/hello pipeline - params.input schema",
    "description": "Schema for the greetings file provided with params.input",
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "greeting": {
                "type": "string",
                "pattern": "^\\S.*$",
                "errorMessage": "Greeting must be provided and cannot be empty or start with whitespace"
            }
        },
        "required": ["greeting"]
    }
}

Let's break down the key parts:

  • type: "array": The input is parsed as an array (list) of items
  • items.type: "object": Each item in the array is an object
  • properties.greeting: Defines a field called greeting:
      • type: "string": Must be a text string
      • pattern: "^\\S.*$": Must start with a non-whitespace character (but can contain spaces after that)
      • errorMessage: Custom error message shown if validation fails
  • required: ["greeting"]: The greeting field is mandatory
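If you want to convince yourself of what the pattern accepts and rejects, you can try it in a quick Groovy snippet (runnable with the groovy command or in the Nextflow console):

// strings that start with a non-whitespace character pass
assert 'Hello' ==~ /^\S.*$/
assert 'Hello world' ==~ /^\S.*$/

// empty strings and leading whitespace fail
assert !('' ==~ /^\S.*$/)
assert !(' Hello' ==~ /^\S.*$/)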

2.4. Understand the samplesheet structure expectation

When nf-schema reads a CSV or TSV file, it treats the first row as a header: the column names declared there become the field names of each row object, and those names must match the property names defined in the schema.

Our greetings file currently has no header, so nf-schema would have no way of knowing that its single column corresponds to the greeting field. To match the schema, we'll add a header to our greetings file.

2.5. Update the greetings.csv file

Add a header line to the greetings file:

core-hello/assets/greetings.csv (after)
greeting
Hello
Bonjour
Holà

core-hello/assets/greetings.csv (before)
Hello
Bonjour
Holà

Now the CSV file has a header that matches the field name in our schema.

Takeaway

You know how to create a JSON schema that defines validation rules for a simple sample sheet input.

What's next?

Implement the validation in the pipeline code.


3. Implement input validation with samplesheetToList

Now we need to replace our simple CSV parsing with nf-schema's samplesheetToList function, which validates and converts the sample sheet.

3.1. Understand samplesheetToList

The samplesheetToList function:

  1. Reads the input sample sheet (CSV, TSV, JSON, or YAML)
  2. Validates it against the provided JSON schema
  3. Returns a Groovy list where each entry corresponds to a row
  4. Throws helpful error messages if validation fails

Basic usage:

include { samplesheetToList } from 'plugin/nf-schema'

// resolve the schema relative to the pipeline directory, not the launch directory
def input_list = samplesheetToList(params.input, "${projectDir}/assets/schema_input.json")

3.2. Update the input handling code

Open core-hello/subworkflows/local/utils_nfcore_hello_pipeline/main.nf and locate the section where we create the input channel (around line 64).

We need to:

  1. Import the samplesheetToList function
  2. Use it to validate and parse the input
  3. Extract just the greeting strings for our workflow

First, add the import at the top of the file:

core-hello/subworkflows/local/utils_nfcore_hello_pipeline/main.nf (after)
//
// Subworkflow with functionality that may be useful for any Nextflow pipeline
//

import org.yaml.snakeyaml.Yaml
import groovy.json.JsonOutput

include { UTILS_NFSCHEMA_PLUGIN  } from '../../nf-core/utils_nfschema_plugin'
include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline'
include { UTILS_NFCORE_PIPELINE   } from '../../nf-core/utils_nfcore_pipeline'

include { paramsSummaryMap          } from 'plugin/nf-schema'
include { samplesheetToList         } from 'plugin/nf-schema'

core-hello/subworkflows/local/utils_nfcore_hello_pipeline/main.nf (before)
//
// Subworkflow with functionality that may be useful for any Nextflow pipeline
//

import org.yaml.snakeyaml.Yaml
import groovy.json.JsonOutput

include { UTILS_NFSCHEMA_PLUGIN  } from '../../nf-core/utils_nfschema_plugin'
include { UTILS_NEXTFLOW_PIPELINE } from '../../nf-core/utils_nextflow_pipeline'
include { UTILS_NFCORE_PIPELINE   } from '../../nf-core/utils_nfcore_pipeline'

include { paramsSummaryMap          } from 'plugin/nf-schema'

Now update the channel creation code:

core-hello/subworkflows/local/utils_nfcore_hello_pipeline/main.nf (after)
    //
    // Create channel from input file provided through params.input
    //
    ch_samplesheet = Channel.fromList(samplesheetToList(params.input, "${projectDir}/assets/schema_input.json"))
        .map { row ->
            // Extract just the greeting string from each row
            row.greeting
        }

    emit:
    samplesheet = ch_samplesheet
    versions    = ch_versions

core-hello/subworkflows/local/utils_nfcore_hello_pipeline/main.nf (before)
    //
    // Create channel from input file provided through params.input
    //
    ch_samplesheet = Channel.fromPath(params.input)
                        .splitCsv()
                        .map { line -> line[0] }

    emit:
    samplesheet = ch_samplesheet
    versions    = ch_versions

Let's break down what changed:

  1. samplesheetToList(params.input, "${projectDir}/assets/schema_input.json"): Validates the input file against our schema and returns a list
  2. Channel.fromList(...): Converts the list into a Nextflow channel
  3. .map { row -> row.greeting }: Extracts just the greeting string from each validated row
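While developing, you can temporarily peek at what the validated channel emits by adding a view operator right after the channel creation (remove it again afterwards):

    // prints one line per validated greeting, e.g. "Validated greeting: Hello"
    ch_samplesheet.view { greeting -> "Validated greeting: ${greeting}" }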

3.3. Enable parameter validation

The pipeline template already includes code to validate parameters, but we had it disabled with --validate_params false. Now that we have proper schema validation set up, let's enable it.

The validation is controlled by the params.validate_params parameter in core-hello/nextflow.config. Let's check that it's set to true (the default):

core-hello/nextflow.config
    validate_params            = true

This should already be the default. The validation is performed by the UTILS_NFSCHEMA_PLUGIN subworkflow, which is called during pipeline initialization.
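Under the hood, that subworkflow calls nf-schema's validateParameters() function, which checks params against nextflow_schema.json. A minimal standalone equivalent would look like this sketch:

include { validateParameters } from 'plugin/nf-schema'

workflow {
    // throws a descriptive error if any parameter violates nextflow_schema.json
    validateParameters()
}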

Takeaway

You know how to use the samplesheetToList function to validate and parse input sample sheets using JSON schemas.

What's next?

Test that the validation works correctly.


4. Test input validation

Let's verify that our validation works by testing both valid and invalid inputs.

4.1. Test with valid input

First, let's confirm the pipeline still runs successfully with valid input:

nextflow run core-hello --outdir core-hello-results -profile test,docker

Note that we no longer need --validate_params false!

Output
 N E X T F L O W   ~  version 24.10.4

Launching `core-hello/main.nf` [serene_volta] DSL2 - revision: c31b966b36

Input/output options
  input                     : core-hello/assets/greetings.csv
  outdir                    : core-hello-results

Institutional config options
  config_profile_name       : Test profile
  config_profile_description: Minimal test dataset to check pipeline function

Core Nextflow options
  runName                   : serene_volta
  containerEngine           : docker
  profile                   : test,docker

!! Only displaying parameters that differ from the pipeline defaults !!
------------------------------------------------------
executor >  local (7)
[5a/1c3d8b] CORE_HELLO:HELLO:sayHello (1)       | 3 of 3 ✔
[2b/9f4a2c] CORE_HELLO:HELLO:convertToUpper (3) | 3 of 3 ✔
[8c/3e7d1a] CORE_HELLO:HELLO:collectGreetings   | 1 of 1 ✔
[4d/6b2f8e] CORE_HELLO:HELLO:cowpy              | 1 of 1 ✔
-[core/hello] Pipeline completed successfully-

Great! The pipeline runs successfully and validation passes silently.

4.2. Test with invalid input (empty greeting)

Now let's test that validation catches errors. Create a test file with an invalid entry:

cat > /tmp/invalid_greetings.csv << 'EOF'
greeting
Hello

Holà
EOF

This file has an empty second entry (the blank line after Hello), which should fail our validation rule.

Try running the pipeline with this invalid input:

nextflow run core-hello --input /tmp/invalid_greetings.csv --outdir test-results -profile docker
Output
 N E X T F L O W   ~  version 24.10.4

Launching `core-hello/main.nf` [silly_cuvier] DSL2 - revision: c31b966b36

ERROR ~ Validation of '/tmp/invalid_greetings.csv' file failed!

 -- Check '/tmp/invalid_greetings.csv' --
   -> Entry 2: Greeting must be provided and cannot be empty or start with whitespace

 -- Check '.nextflow.log' file for details

Perfect! The validation caught the error and produced a clear, helpful message that tells us:

  • Which file failed validation
  • Which entry (row 2) has the problem
  • What the specific problem is

4.3. Test with missing required field

Let's create another invalid file, this time missing the header:

cat > /tmp/no_header.csv << 'EOF'
Hello
Bonjour
Holà
EOF

Try running with this file:

nextflow run core-hello --input /tmp/no_header.csv --outdir test-results -profile docker

The validation will fail because the first row (Hello) is treated as the header, so every subsequent row is parsed into a field named Hello and the required greeting field is missing.

4.4. Understand validation benefits

Input validation provides several important benefits:

  1. Early error detection: Problems are caught before any processing begins
  2. Clear error messages: Users get specific, actionable feedback about what's wrong
  3. Reduced debugging time: No need to trace cryptic errors through the pipeline
  4. Documentation: The schema serves as documentation of expected input format
  5. Type safety: Ensures data types match expectations throughout the pipeline

Takeaway

You know how to test input validation and understand the benefits it provides for pipeline usability and robustness.

What's next?

Explore additional validation features and parameter schema validation.


5. Optional: Explore parameter validation

In addition to sample sheet validation, nf-schema also validates pipeline parameters against nextflow_schema.json.

5.1. Examine the parameter schema

Let's look at a section of the parameter schema:

grep -A 10 '"batch"' core-hello/nextflow_schema.json

You might notice that the batch parameter isn't defined yet in the schema. Let's add it.

5.2. Add the batch parameter to the schema

The parameter schema can be edited manually, but nf-core provides a tool to help:

nf-core pipelines schema build

This launches an interactive tool that helps you add and configure parameters. However, for our simple case, we can edit the JSON directly.

Open core-hello/nextflow_schema.json and find the "input_output_options" section. Add the batch parameter:

core-hello/nextflow_schema.json (excerpt)
"input_output_options": {
    "title": "Input/output options",
    "type": "object",
    "fa_icon": "fas fa-terminal",
    "description": "Define where the pipeline should find input data and save output data.",
    "required": ["input", "outdir"],
    "properties": {
        "input": {
            "type": "string",
            "format": "file-path",
            "exists": true,
            "schema": "assets/schema_input.json",
            "mimetype": "text/csv",
            "pattern": "^\\S+\\.csv$",
            "description": "Path to comma-separated file containing greetings.",
            "fa_icon": "fas fa-file-csv"
        },
        "batch": {
            "type": "string",
            "default": "batch-01",
            "description": "Name for this batch of greetings",
            "fa_icon": "fas fa-tag"
        },
        "outdir": {
            "type": "string",
            "format": "directory-path",
            "description": "The output directory where the results will be saved.",
            "fa_icon": "fas fa-folder-open"
        }
    }
}

5.3. Test parameter validation

Now try running with an invalid parameter type:

nextflow run core-hello --batch 12345 --outdir test-results -profile test,docker

Depending on your Nextflow version, this may pass (with 12345 coerced to the string "12345") or fail with a type error, since Nextflow parses numeric-looking command-line values as numbers. In practice, parameter validation earns its keep by catching missing required parameters and invalid file paths.

Try running without the required input parameter:

nextflow run core-hello --outdir test-results -profile docker
Output
ERROR ~ Validation of pipeline parameters failed!

 -- Check '.nextflow.log' file for details
 * --input: required property is missing

Excellent! The validation catches missing required parameters.

Takeaway

You know how parameter validation works and how to add parameter definitions to the schema.

What's next?

You can explore the nf-schema documentation to learn about advanced features like metadata maps, multi-file samples, and custom validation rules.


Congratulations!

You've completed the Hello nf-core training course! 🎉

Throughout this course, you've learned how to:

  • Run nf-core pipelines using test profiles and understand their structure
  • Create nf-core-style pipelines from scratch using the nf-core template
  • Make workflows composable with take, main, and emit blocks
  • Integrate nf-core modules from the community repository
  • Implement robust input validation using JSON schemas and the nf-schema plugin
  • Follow nf-core conventions for code organization, configuration, and documentation

You now have the foundational knowledge to develop production-ready Nextflow pipelines that follow nf-core best practices. Your pipeline includes proper module organization, input validation, and is ready to be extended with additional features.

Where to go from here

Ready to take your skills further? Here are some recommended next steps:

Get involved with the community

The nf-core community is welcoming and always happy to help:

  • nf-core Slack: Join the community to ask questions and share your work
  • GitHub Discussions: Participate in discussions about modules and pipelines
  • Contribute: Consider contributing your own modules or improvements back to the community

Thank you for completing this training. We hope you enjoyed learning about nf-core and feel confident building your own pipelines. Happy pipelining! 🚀