Developing new pipelines
Asimov only supports a small number of pipelines “out of the box”, but allows for new pipelines to be added as plugins.
There are two broad approaches to writing a plugin for asimov.
Either you can incorporate it directly into your codebase, which is especially suitable if your pipeline is written in python, or you can write an interface plugin which allows interaction between asimov and the pipeline.
Asimov uses a feature of python packages called an “entrypoint” in order to identify pipelines which are installed with an asimov interface.
How asimov interacts with pipelines
Asimov provides information to the pipeline interface from its ledger, which stores the configuration settings for pipelines, events, and analyses in one single source of truth.
Getting an analysis started is effectively a two-part process, with a “build” phase and a “submit” phase.
During the “build” phase asimov collects the appropriate configuration settings for the pipeline, and creates the configuration file which will be passed to the pipeline’s builder script.
This stage is normally run via the command line as asimov manage build.
This stage runs the Production.make_config method to produce the templated configuration file.
During the “submit” phase asimov runs the pipeline builder script, and then submits the jobs to the scheduler.
This stage is normally run via the command line as asimov manage submit.
The Pipeline.build_dag method is used to run the pipeline builder, and the Pipeline.submit_dag method is used to handle the scheduler submission process.
The asimov monitor command line tool can then be used to check the status of running analyses.
Creating an interface
An asimov pipeline interface is simply a python class which subclasses asimov.pipeline.Pipeline, and provides the pipeline-specific logic to allow asimov to interact with it.
You should implement the new logic for your pipeline by overloading the methods defined in asimov.pipeline.Pipeline, though you can add pipeline-specific methods to the interface as well.
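A minimal skeleton of such an interface might look like the following. A stand-in base class is used here so the sketch is self-contained and runnable; in a real plugin you would subclass asimov.pipeline.Pipeline itself, and the constructor signature is an assumption for illustration:

```python
class Pipeline:
    """Stand-in for asimov.pipeline.Pipeline, so this sketch runs alone."""

    def __init__(self, production, category=None):
        self.production = production


class MyPipeline(Pipeline):
    """A hypothetical interface between asimov and "mypipeline"."""

    name = "mypipeline"

    def build_dag(self, dryrun=False):
        ...  # call the pipeline's builder script

    def submit_dag(self, dryrun=False):
        ...  # hand the generated submit files to the scheduler

    def detect_completion(self):
        ...  # check for the analysis's final data products

    def collect_assets(self):
        ...  # return a dictionary of results files
```

The individual methods are described below; each `...` body would be replaced with logic specific to your pipeline.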
Required methods
These methods are required for the interface to function properly.
Pipeline.detect_completion
This contains the logic which allows asimov to determine if the job has completed successfully.
While the performance of the job is monitored via the job scheduling system, the pipeline interface should check for final products of the analysis to determine completion.
For example, the bilby pipeline interface checks if a posterior samples file has been produced (by checking that the files returned by Pipeline.collect_assets exist).
Pipeline.collect_assets
This method contains logic to find results files from the analysis run directory.
These should be returned in a dictionary, and can be used by other pipelines to find analysis assets.
For example, the Bayeswave interface returns a dictionary in the format:

{"psds": {"L1": "/path/to/L1/psd", "H1": "/path/to/H1/psd"}}
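These two methods can work together, with detect_completion simply checking that the assets reported by collect_assets exist on disk. In the sketch below the run-directory layout and PSD filenames are invented for illustration:

```python
import os


class BayeswaveLikePipeline:
    """Illustrative only: reports per-detector PSD paths, as above."""

    def __init__(self, rundir, detectors=("H1", "L1")):
        self.rundir = rundir
        self.detectors = detectors

    def collect_assets(self):
        # Hypothetical run-directory layout: one PSD file per detector.
        return {"psds": {ifo: os.path.join(self.rundir, f"{ifo}_psd.dat")
                         for ifo in self.detectors}}

    def detect_completion(self):
        # The job is complete once every expected PSD file exists.
        return all(os.path.exists(path)
                   for path in self.collect_assets()["psds"].values())
```

Keeping detect_completion in terms of collect_assets means the expected file layout is defined in exactly one place.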
Pipeline.collect_logs
This method should find all of the log files for the running analysis, and return them as a dictionary.
The most important method overall is build_dag, which is used by the asimov framework to construct the DAG file to be submitted to the condor scheduler.
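A minimal sketch of collect_logs, assuming (purely for illustration) that the pipeline writes *.log files into a logs/ subdirectory of the run directory:

```python
import glob
import os


class MyPipeline:
    def __init__(self, rundir):
        self.rundir = rundir

    def collect_logs(self):
        # Map each log file's name to its contents; the "logs" subdirectory
        # and the *.log suffix are assumptions about this hypothetical
        # pipeline's layout, not a requirement imposed by asimov.
        logs = {}
        for path in glob.glob(os.path.join(self.rundir, "logs", "*.log")):
            with open(path) as logfile:
                logs[os.path.basename(path)] = logfile.read()
        return logs
```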
An example of a complete pipeline interface can be seen in the code for :class:`asimov.pipelines.bilby.BilbyPipeline`.
Pipeline hooks
It is possible to customise the run process of the asimov pipeline runner using hooks.
By overloading the hook methods (listed below) inherited from the asimov.pipeline.Pipeline class, additional operations can be conducted during the processing workflow.
Hooks should take no arguments.
Pipeline.build_dag
This method should call the pipeline script which will take the configuration file for the analysis, and use it to generate submission files for the scheduler.
Pipeline.submit_dag
This method should take the submission files generated by Pipeline.build_dag and submit them to the scheduler.
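As a sketch of how these two methods can be implemented, the example below shells out to a hypothetical builder script and to HTCondor; the mypipeline_build command, the my_pipeline.dag filename, and the dryrun convention are all assumptions for illustration, not asimov requirements:

```python
import os
import subprocess


class MyPipeline:
    def __init__(self, rundir, config):
        self.rundir = rundir
        self.config = config

    def build_dag(self, dryrun=False):
        # "mypipeline_build" is a hypothetical builder script which reads
        # the templated configuration file and writes scheduler submit files.
        command = ["mypipeline_build",
                   "--config", self.config,
                   "--outdir", self.rundir]
        if dryrun:
            return command
        subprocess.run(command, check=True)

    def submit_dag(self, dryrun=False):
        # Hand the generated DAG to HTCondor; "my_pipeline.dag" is an
        # assumption about what the builder script produces.
        command = ["condor_submit_dag",
                   os.path.join(self.rundir, "my_pipeline.dag")]
        if dryrun:
            return command
        subprocess.run(command, check=True)
```

Returning the composed command under dryrun makes the interface easy to test without a scheduler present.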
Pipeline.before_submit
This method is called by asimov before the submit_dag method is run, and can be used to perform any pre-processing which is required before the job is submitted to the scheduler.
Pipeline.after_completion
This method is called by asimov after completion of a job has been detected, and can be used to begin post-processing.
For example, the bilby interface uses this hook to start PESummary post-processing.
Pipeline.after_processing
This method is run by asimov after the post-processing for a job has completed.
The default version of this hook runs the Pipeline.store_results method in order to place the final post-processed results into storage.
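Since hooks take no arguments, overriding one is just a matter of defining the method on your subclass. The sketch below uses a stand-in base class (a real plugin would subclass asimov.pipeline.Pipeline) and records which hooks fired, purely for illustration:

```python
class Pipeline:
    """Stand-in for asimov.pipeline.Pipeline: hooks default to no-ops."""

    def before_submit(self):
        pass

    def after_completion(self):
        pass


class MyPipeline(Pipeline):
    def __init__(self):
        self.actions = []

    def before_submit(self):
        # e.g. adjust the generated submit files before submission
        self.actions.append("pre-processing")

    def after_completion(self):
        # e.g. launch post-processing on the completed analysis's samples
        self.actions.append("post-processing")
```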
Optional methods
Pipeline.run_pesummary
This method will run PESummary on the samples created by an analysis pipeline.
You may overload this method if you need to run PESummary in a non-standard way.
You may also need to overload the Pipeline.detect_completion_processing method if you change this method.
Pipeline.store_results
This method will store the results of the analysis in the asimov results store for the project.
The default method will collect the results files from PESummary, but it can be overloaded in order to store a different set of files or perform additional tasks prior to storage.
Pipeline.detect_completion_processing
This method provides the logic for determining if PESummary, or whichever post-processing commands are run, have completed successfully, and produced outputs.
You should only need to overload this method if you have altered Pipeline.run_pesummary.
Pipeline.eject_job
This method is run by asimov to remove the analysis job from the scheduler.
For example, it will be run if the status of an analysis is set to stop.
Pipeline.clean
This method should remove all of the artefacts of a job from the working directory.
Pipeline.resurrect
This method will be called on jobs which are marked as stuck in the asimov ledger, and can be used to, for example, submit a rescue DAG for a job on the condor scheduler.
Pipeline.read_ini
This should be implemented as a class method, and should parse the configuration file for the pipeline into a dictionary.
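For an ini-format pipeline this can be a thin wrapper around the standard library's configparser; the sketch below assumes the pipeline's configuration files are ini-style:

```python
import configparser


class MyPipeline:
    @classmethod
    def read_ini(cls, filepath):
        # Parse an ini-style configuration file into a plain dictionary
        # of {section: {option: value}} entries.
        parser = configparser.ConfigParser()
        parser.read(filepath)
        return {section: dict(parser[section])
                for section in parser.sections()}
```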
Pipeline.check_progress
This method will be run by asimov to gather information about the current status of the analysis.
Adding an entrypoint
Asimov uses an “entrypoint” to make pipelines discoverable.
In these examples we assume that your pipeline interface is a class called MyPipeline (which subclasses asimov.pipeline.Pipeline), and is located in a file called asimov.py in the main package, i.e.
|- setup.py
|- mypipeline
|  |- __init__.py
|  |- ...
|  |- asimov.py
|  |- ...
|- ...
There are a number of different python packaging technologies, so we will provide examples for just a few here.
setup.cfg
[metadata]
name = mypipeline
version = attr: mypipeline.__version__
description = A pipeline integration between asimov and mypipeline

[options]
install_requires =
    asimov

[options.entry_points]
asimov.pipelines =
    mypipeline = mypipeline.asimov:MyPipeline