BIDS Integration
This document outlines strategies and best practices for integrating signalJourney provenance files within datasets structured according to the Brain Imaging Data Structure (BIDS) standard.
Goal
The primary goal is to store detailed pipeline provenance alongside the BIDS data it describes or generates, without violating BIDS validation rules.
Placement of *_signalJourney.json Files
BIDS currently does not formally specify *_signalJourney.json files. Therefore, they should always be placed within the /derivatives subdirectory of a BIDS dataset.
Recommended Location:
Place signalJourney files within the specific pipeline derivative directory they document:
<bids_root>/
    derivatives/
        <pipeline_name>/
            sub-<label>/
                [ses-<label>/]
                    <datatype>/
                        <source_entities>_desc-<description>_<suffix>.json  <-- Associated data file
                        <source_entities>_desc-<description>_signalJourney.json  <-- Provenance for the data file
            ...
            dataset_description.json
            *_signalJourney.json  <-- Optional: For pipeline-level info not tied to a specific output file
- <pipeline_name>: A descriptive name for your processing pipeline (e.g., eeg_preprocessing, fmri_stats).
- The signalJourney filename should mirror the BIDS filename of the data file it primarily documents, replacing the data suffix (e.g., _eeg.fif) with _signalJourney.json.
- Pipeline-level signalJourney files (which document the overall pipeline execution rather than a specific output file) can be placed at the root of the <pipeline_name> directory.
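The naming rule above can be sketched in Python. The helper name signal_journey_name is hypothetical and not part of any signalJourney tooling. It assumes that the data suffix and extension are everything after the last underscore in the filename.

```python
from pathlib import PurePosixPath


def signal_journey_name(data_file: str) -> str:
    """Return the provenance filename that mirrors a BIDS data filename.

    Hypothetical helper: assumes the data suffix and extension are
    everything after the last underscore (e.g. "_eeg.fif").
    """
    name = PurePosixPath(data_file).name
    entities, sep, _suffix = name.rpartition("_")
    if not sep or not entities:
        raise ValueError(f"no BIDS suffix found in {name!r}")
    return f"{entities}_signalJourney.json"


print(signal_journey_name("sub-01_desc-preproc_eeg.fif"))
# sub-01_desc-preproc_signalJourney.json
```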
Why /derivatives?
- Raw data directories in BIDS have strict validation rules. signalJourney files are not part of the raw data standard.
- /derivatives is the designated location for outputs of processing pipelines.
- This keeps provenance clearly associated with the derived data.
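As a rough illustration of the placement rule, the following sketch scans a dataset for provenance files that sit outside derivatives/<pipeline_name>/. The function name misplaced_journeys is an assumption made for this example; it is not an existing validator.

```python
from pathlib import Path


def misplaced_journeys(bids_root: str) -> list[str]:
    """List *_signalJourney.json files not under derivatives/<pipeline_name>/.

    Hypothetical check; paths are returned relative to the BIDS root.
    """
    root = Path(bids_root)
    misplaced = []
    for path in sorted(root.rglob("*_signalJourney.json")):
        parts = path.relative_to(root).parts
        # Expect at least: derivatives / <pipeline_name> / <file>
        if len(parts) < 3 or parts[0] != "derivatives":
            misplaced.append(path.relative_to(root).as_posix())
    return misplaced
```

An empty list means every provenance file found is inside some pipeline directory under derivatives/.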
Referencing Files within signalJourney
When specifying file paths within inputSources or outputTargets in a signalJourney file placed within a BIDS /derivatives directory:
- Use relative paths: Paths should ideally be relative to the BIDS root directory.
  - Input Raw Data: ../rawdata/sub-01/ses-test/eeg/sub-01_ses-test_task-rest_eeg.vhdr
  - Input Derivative Data: ./derivatives/preprocessing_pipeline/sub-01/eeg/sub-01_desc-preproc_eeg.fif (Note the leading ./)
  - Output Derivative Data: ./derivatives/ica_pipeline/sub-01/eeg/sub-01_desc-cleaned_eeg.fif
- This ensures paths remain valid regardless of where the BIDS dataset is moved.
- Avoid absolute paths unless necessary for external resources.
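One way to produce such paths is to convert an absolute path into one relative to the BIDS root before writing it into inputSources or outputTargets. The sketch below uses only the Python standard library. The helper name to_bids_relative and the /data/study location are assumptions made for illustration.

```python
import posixpath


def to_bids_relative(path: str, bids_root: str) -> str:
    """Express a POSIX-style path relative to the BIDS root (hypothetical helper)."""
    rel = posixpath.relpath(path, start=bids_root)
    # Paths outside the root already start with "..";
    # paths inside it get an explicit leading "./".
    if rel == ".." or rel.startswith("../"):
        return rel
    return "./" + rel


# "/data/study" is an assumed BIDS root used only for this example.
print(to_bids_relative(
    "/data/study/derivatives/ica_pipeline/sub-01/eeg/sub-01_desc-cleaned_eeg.fif",
    "/data/study",
))
# ./derivatives/ica_pipeline/sub-01/eeg/sub-01_desc-cleaned_eeg.fif
```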
Using .bidsignore
To prevent standard BIDS validators (which don't recognize *_signalJourney.json) from generating errors or warnings about these files, add an entry to the .bidsignore file located at the root of your BIDS dataset:
# Ignore signalJourney provenance files
*_signalJourney.json
This tells BIDS-compliant tools to ignore these files during validation scans.
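If the entry is added by a script rather than by hand, it helps to make the step idempotent so repeated pipeline runs do not duplicate the line. A minimal sketch, assuming a hypothetical helper named ensure_bidsignore_entry:

```python
from pathlib import Path

PATTERN = "*_signalJourney.json"


def ensure_bidsignore_entry(bids_root: str, pattern: str = PATTERN) -> bool:
    """Append `pattern` to <bids_root>/.bidsignore unless already listed.

    Hypothetical helper. Returns True if the file was modified.
    """
    ignore_file = Path(bids_root) / ".bidsignore"
    existing = ignore_file.read_text().splitlines() if ignore_file.exists() else []
    if pattern in (line.strip() for line in existing):
        return False
    lines = existing + ["# Ignore signalJourney provenance files", pattern]
    ignore_file.write_text("\n".join(lines) + "\n")
    return True
```

Existing entries in .bidsignore are preserved; the comment and pattern are appended at the end of the file.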
Root-Level vs. Derivative-Level Pipelines
signalJourney can document pipelines that operate on entire datasets or individual subject/session files.
- Subject/Session Level: If a pipeline is run independently for each subject/session, the corresponding *_signalJourney.json file should typically reside alongside the output data within the subject's derivative directory (as shown in the recommended location example).
- Dataset Level: If a single pipeline instance processes multiple subjects (e.g., group analysis, template creation), a single *_signalJourney.json file might be placed at the root of the specific derivative pipeline directory (e.g., /derivatives/<group_analysis_pipeline>/group_analysis_signalJourney.json). This file would then reference multiple subject input files using relative paths from the BIDS root.
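The two placements can be expressed as one path-building routine: without a subject, the file lands at the root of the pipeline directory; with one, it lands alongside that subject's outputs. This is only a sketch. The function journey_path and its parameters are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional


def journey_path(pipeline: str, filename: str, subject: Optional[str] = None,
                 session: Optional[str] = None, datatype: Optional[str] = None) -> str:
    """Build a BIDS-root-relative location for a signalJourney file.

    Hypothetical helper: no subject means dataset level (pipeline root);
    otherwise the file goes under sub-<label>/[ses-<label>/]<datatype>/.
    """
    path = PurePosixPath("derivatives") / pipeline
    if subject is not None:
        path /= f"sub-{subject}"
        if session is not None:
            path /= f"ses-{session}"
        if datatype is not None:
            path /= datatype
    return "./" + (path / filename).as_posix()


print(journey_path("group_analysis", "group_analysis_signalJourney.json"))
# ./derivatives/group_analysis/group_analysis_signalJourney.json
```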
Linking signalJourney Files
If one pipeline's output (documented in pipelineA_signalJourney.json) is the input to another pipeline (documented in pipelineB_signalJourney.json), use the pipelineSource field within the inputSources of pipelineB_signalJourney.json:
// Inside pipelineB_signalJourney.json
"inputSources": [
  {
    "sourceType": "file",
    "location": "./derivatives/pipelineA/sub-01/eeg/sub-01_desc-outputA_eeg.fif",
    "pipelineSource": {
      "pipelineName": "Pipeline A Name", // Matches pipelineInfo.name in pipelineA_signalJourney.json
      "pipelineVersion": "1.2.0", // Matches pipelineInfo.version in pipelineA_signalJourney.json
      "signalJourneyFile": "./derivatives/pipelineA/sub-01/eeg/sub-01_desc-outputA_signalJourney.json" // Optional path to the source file
    }
  }
]
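To keep pipelineName and pipelineVersion in sync with the upstream file, the entry can be generated from pipeline A's provenance rather than typed by hand. The sketch below assumes the upstream file has been loaded into a dict with a top-level pipelineInfo object containing name and version, as the comments in the example indicate. The function linked_input is hypothetical.

```python
import json


def linked_input(location: str, upstream: dict, upstream_file: str = "") -> dict:
    """Build an inputSources entry that points back to the upstream pipeline.

    Hypothetical helper; `upstream` is the parsed upstream signalJourney file.
    """
    info = upstream["pipelineInfo"]
    source = {
        "pipelineName": info["name"],
        "pipelineVersion": info["version"],
    }
    if upstream_file:
        # Optional path to the upstream provenance file
        source["signalJourneyFile"] = upstream_file
    return {"sourceType": "file", "location": location, "pipelineSource": source}


# Stand-in for the parsed contents of pipelineA_signalJourney.json
pipeline_a = {"pipelineInfo": {"name": "Pipeline A Name", "version": "1.2.0"}}
entry = linked_input(
    "./derivatives/pipelineA/sub-01/eeg/sub-01_desc-outputA_eeg.fif",
    pipeline_a,
)
print(json.dumps({"inputSources": [entry]}, indent=2))
```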
Summary of Recommendations
- Place *_signalJourney.json files only within /derivatives/<pipeline_name>/.
- Use filenames that mirror the associated data file, replacing the data suffix with _signalJourney.json.
- Use relative paths from the BIDS root within inputSources and outputTargets.
- Add *_signalJourney.json to your top-level .bidsignore file.
- Store provenance alongside the data it describes (subject/session level where appropriate).
- Use pipelineSource to link dependent pipelines.