Developing new pipelines¶

Asimov only supports a small number of pipelines “out of the box”, but allows for new pipelines to be added as plugins.

There are two broad approaches to writing a plugin for asimov. Either you can incorporate it directly into your codebase, which is especially suitable if your pipeline is written in python, or you can write an interface plugin which allows interaction between asimov and the pipeline.

Asimov uses a feature of python packages called an “entrypoint” in order to identify pipelines which are installed with an asimov interface.

How asimov interacts with pipelines¶

Asimov provides information to the pipeline interface from its ledger, which stores the configuration settings for pipelines, events, and analyses in one single source of truth.

Getting an analysis started is effectively a two-part process, with a “build” phase and a “submit” phase.

During the “build” phase asimov collects the appropriate configuration settings for the pipeline, and creates the configuration file which will be passed to the pipeline’s builder script.

This stage is normally run via the command line as asimov manage build.

This stage runs the Production.make_config method to produce the templated configuration file.

During the “submit” phase asimov runs the pipeline builder script, and then submits the jobs to the scheduler.

This stage is normally run via the command line as asimov manage submit.

The Pipeline.build_dag method is used to run the pipeline builder, and the Pipeline.submit_dag method is used to handle the scheduler submission process.
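Schematically, and only as an illustration (the production object and its pipeline attribute here are assumed stand-ins for asimov's internal bookkeeping, not a documented API), the two phases amount to:

# Illustration only: roughly what the two phases do for one analysis.
# "production" stands for an analysis record loaded from the ledger.

config = production.make_config(f"{production.name}.ini")  # "build" phase

pipeline = production.pipeline   # assumed accessor for the pipeline interface
pipeline.build_dag()             # run the pipeline's builder script
pipeline.submit_dag()            # hand the generated jobs to the scheduler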

The asimov monitor command line tool can then be used to check the status of submitted jobs.

Creating an interface¶

An asimov pipeline interface is simply a python class which subclasses asimov.pipeline.Pipeline, and provides the pipeline specific logic to allow asimov to interact with it.

You should implement the new logic for your pipeline by overloading the methods defined in asimov.pipeline.Pipeline, though you can add pipeline-specific methods to the interface as well.
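As a starting point, a minimal skeleton might look like this (a sketch only; the name attribute and the dryrun arguments are assumptions, not part of a documented contract):

from asimov.pipeline import Pipeline


class MyPipeline(Pipeline):
    """A hypothetical interface between asimov and a pipeline called mypipeline."""

    name = "mypipeline"  # assumed: identifies the pipeline in the ledger

    def build_dag(self, dryrun=False):
        """Run the pipeline's builder to produce scheduler submission files."""
        raise NotImplementedError

    def submit_dag(self, dryrun=False):
        """Submit the files generated by build_dag to the scheduler."""
        raise NotImplementedError

    def detect_completion(self):
        """Return True once the analysis has produced its final products."""
        raise NotImplementedError

    def collect_assets(self):
        """Return a dictionary of results files from the run directory."""
        raise NotImplementedError

    def collect_logs(self):
        """Return a dictionary of log files for the running analysis."""
        raise NotImplementedError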

Required methods¶

These methods are required for the interface to function properly.

Pipeline.detect_completion

This contains the logic which allows asimov to determine if the job has completed successfully. While the performance of the job is monitored via the job scheduling system, the pipeline interface should check for final products of the analysis to determine completion. For example, the bilby pipeline interface checks if a posterior samples file has been produced (by checking that the files returned by Pipeline.collect_assets exist).
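Such a check might look like the following sketch (assuming collect_assets returns nested dictionaries of file paths, as in the example below):

import os

# Method of the MyPipeline interface sketched above.
def detect_completion(self):
    # Sketch: treat the analysis as complete once every results file
    # reported by collect_assets exists on disk.
    assets = self.collect_assets()
    paths = [path for group in assets.values() for path in group.values()]
    return bool(paths) and all(os.path.exists(path) for path in paths)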

Pipeline.collect_assets

This method contains logic to find results files from the analysis run directory. These should be returned in a dictionary, and can be used by other pipelines to find analysis assets. For example, the Bayeswave interface returns a dictionary in the format

{"psds": {"L1": "/path/to/L1/psd", "H1": "/path/to/H1/psd"}}
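A hypothetical implementation might look like this (a sketch; the "<IFO>_psd.dat" naming convention is invented for illustration, and self.production.rundir is assumed to point at the analysis run directory):

import glob
import os

# Method of the MyPipeline interface sketched above.
def collect_assets(self):
    # Sketch: gather per-detector PSD files from the run directory,
    # assuming a hypothetical "<IFO>_psd.dat" naming convention.
    psds = {}
    for path in glob.glob(os.path.join(self.production.rundir, "*_psd.dat")):
        ifo = os.path.basename(path).split("_")[0]
        psds[ifo] = path
    return {"psds": psds}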

Pipeline.collect_logs

This method should find all of the log files for the running analysis, and return them as a dictionary.
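For example (a sketch; the "logs" subdirectory is an assumed convention, not something asimov requires):

import glob
import os

# Method of the MyPipeline interface sketched above.
def collect_logs(self):
    # Sketch: return the contents of every file in an assumed "logs"
    # subdirectory of the run directory, keyed by file name.
    logs = {}
    for path in glob.glob(os.path.join(self.production.rundir, "logs", "*")):
        with open(path) as logfile:
            logs[os.path.basename(path)] = logfile.read()
    return logs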

Alongside these, the build_dag method (described under pipeline hooks below) is especially important: it is used by the asimov framework to construct the DAG file which is submitted to the condor scheduler.

An example of a complete pipeline interface can be seen in the code for asimov.pipelines.bilby.BilbyPipeline.

Pipeline hooks¶

It is possible to customise the run process of the asimov pipeline runner using hooks. By overloading the hook methods (listed below) inherited from the asimov.pipeline.Pipeline class, additional operations can be performed during the processing workflow. Hooks should take no arguments.

Pipeline.build_dag

This method should call the pipeline script which takes the configuration file for the analysis and uses it to generate submission files for the scheduler.
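A sketch of such a method, assuming the pipeline ships a hypothetical mypipeline_build command which reads the templated ini file (the dryrun argument is also an assumption):

import os
import subprocess

# Method of the MyPipeline interface sketched earlier.
def build_dag(self, dryrun=False):
    # Sketch: invoke a hypothetical command-line builder on the
    # configuration file produced during the build phase.
    ini = os.path.join(self.production.rundir, f"{self.production.name}.ini")
    if not dryrun:
        subprocess.run(["mypipeline_build", ini], check=True)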

Pipeline.submit_dag

This method should take the submission files generated by Pipeline.build_dag and submit them to the scheduler.
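For an HTCondor-based pipeline this might look like the following sketch (the DAG file name is an assumption):

import os
import subprocess

# Method of the MyPipeline interface sketched earlier.
def submit_dag(self, dryrun=False):
    # Sketch: hand the DAG generated by build_dag to the HTCondor scheduler.
    dagfile = os.path.join(self.production.rundir, "mypipeline.dag")  # assumed name
    if not dryrun:
        subprocess.run(["condor_submit_dag", dagfile], check=True)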

Pipeline.before_submit

This method is called by asimov before the submit_dag method is run, and can be used to perform any pre-processing which is required before the job is submitted to the scheduler.

Pipeline.after_completion

This method is called by asimov after completion of a job has been detected, and can be used to begin post-processing. For example, the bilby interface uses this hook to start PESummary post-processing.
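A minimal sketch of this hook, deferring to the built-in PESummary runner described under optional methods below (whether run_pesummary is appropriate for your pipeline is an assumption):

# Method of the MyPipeline interface sketched earlier; hooks take
# no arguments beyond self.
def after_completion(self):
    # Sketch: start PESummary post-processing once the job has finished.
    self.run_pesummary()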

Pipeline.after_processing

This method is run by asimov after the post-processing for a job has completed. The default version of this hook runs the Pipeline.store_results method in order to place the final post-processed results into storage.

Optional methods¶

Pipeline.run_pesummary

This method will run PESummary on the samples created by an analysis pipeline. You may overload this method if you need to run PESummary in a non-standard way. You may also need to overload the Pipeline.detect_completion_processing method if you change this method.

Pipeline.store_results

This method will store the results of the analysis in the asimov results store for the project. The default method will collect the results files from PESummary, but it can be overloaded in order to store a different set of files or perform additional tasks prior to storage.

Pipeline.detect_completion_processing

This method provides the logic for determining whether PESummary, or whichever post-processing commands are run, has completed successfully and produced outputs. You should only need to overload this method if you have altered Pipeline.run_pesummary.

Pipeline.eject_job

This method is run by asimov to remove the analysis job from the scheduler. For example, it will be run if the status of an analysis is set to stop.

Pipeline.clean

This method should remove all of the artefacts of a job from the working directory.

Pipeline.resurrect

This method will be called on jobs which are marked as stuck in the asimov ledger, and can be used to, for example, submit a rescue DAG for a job on the condor scheduler.

Pipeline.read_ini

This should be implemented as a class method, and should parse the configuration file for the pipeline into a dictionary.
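A sketch using the standard library's configparser (appropriate only if your pipeline uses ini-style configuration files):

import configparser

# Class method of the MyPipeline interface sketched earlier.
@classmethod
def read_ini(cls, filepath):
    # Sketch: parse an ini-style configuration file into a nested
    # {section: {option: value}} dictionary.
    parser = configparser.ConfigParser()
    parser.read(filepath)
    return {section: dict(parser[section]) for section in parser.sections()}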

Pipeline.check_progress

This method will be run by asimov to gather information about the current status of the analysis.

Adding an entrypoint¶

Asimov uses an “entrypoint” to make pipelines discoverable.

In these examples we assume that your pipeline interface is a class called MyPipeline (which subclasses asimov.pipeline.Pipeline), and is located in a file called asimov.py in the main package, i.e.

|- setup.py
|- mypipeline
   |- __init__.py
   |- ...
   |- asimov.py
   |- ...
|- ...

There are a number of different python packaging technologies, so we will provide examples for just a few here.

setup.cfg¶

[options]
install_requires =
                 asimov

[metadata]
name = mypipeline
version = attr: mypipeline.__version__
description = A pipeline integration between asimov and mypipeline

[options.entry_points]
asimov.pipelines =
                 mypipeline = mypipeline.asimov:MyPipeline
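
pyproject.toml¶

If your package instead uses PEP 621 metadata in pyproject.toml (supported by setuptools and most modern build backends), the equivalent declaration would look like this:

[project]
name = "mypipeline"
version = "0.1.0"
description = "A pipeline integration between asimov and mypipeline"
dependencies = ["asimov"]

[project.entry-points."asimov.pipelines"]
mypipeline = "mypipeline.asimov:MyPipeline"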