Developing new pipelines
========================

Asimov only supports a small number of pipelines "out of the box", but allows new pipelines to be added as plugins. There are two broad approaches to writing a plugin for asimov: either you can incorporate the asimov support directly into your pipeline's codebase, which is especially suitable if your pipeline is written in Python, or you can write an interface plugin which allows interaction between asimov and the pipeline. Asimov uses a feature of Python packages called an "entrypoint" to identify pipelines which are installed with an asimov interface.

How asimov interacts with pipelines
-----------------------------------

Asimov provides information to the pipeline interface from its ledger, which stores the configuration settings for pipelines, events, and analyses in a single source of truth.

Getting an analysis started is effectively a two-part process, with a "build" phase and a "submit" phase.

During the "build" phase asimov collects the appropriate configuration settings for the pipeline and creates the configuration file which will be passed to the pipeline's builder script. This stage is normally run via the command line as ``asimov manage build``, and runs the ``Production.make_config`` method to produce the templated configuration file.

During the "submit" phase asimov runs the pipeline builder script and then submits the jobs to the scheduler. This stage is normally run via the command line as ``asimov manage submit``. The ``Pipeline.build_dag`` method is used to run the pipeline builder, and the ``Pipeline.submit_dag`` method is used to handle the scheduler submission process.

The ``asimov monitor`` command line tool can then be used to check the status of running analyses.

Creating an interface
---------------------

An asimov pipeline interface is simply a Python class which subclasses ``asimov.pipeline.Pipeline`` and provides the pipeline-specific logic which allows asimov to interact with it. You should implement the new logic for your pipeline by overloading the methods defined in ``asimov.pipeline.Pipeline``, though you can add pipeline-specific methods to the interface as well.

Required methods
~~~~~~~~~~~~~~~~

These methods are required for the interface to function properly.

``Pipeline.detect_completion``
    This contains the logic which allows asimov to determine if the job has completed successfully. While the performance of the job is monitored via the job scheduling system, the pipeline interface should check for the final products of the analysis to determine completion. For example, the bilby pipeline interface checks that a posterior samples file has been produced (by checking that the files returned by ``Pipeline.collect_assets`` exist).

``Pipeline.collect_assets``
    This method contains the logic to find results files in the analysis run directory. These should be returned in a dictionary, and can be used by other pipelines to find analysis assets. For example, the Bayeswave interface returns a dictionary in the format::

        {"psds": {"L1": "/path/to/L1/psd", "H1": "/path/to/H1/psd"}}

``Pipeline.collect_logs``
    This method should find all of the log files for the running analysis and return them as a dictionary.

Beyond these, the most important method is ``build_dag`` (described below), which is used by the asimov framework to construct the DAG file to be submitted to the condor scheduler.

An example of a complete pipeline interface can be seen in the code for :class:`asimov.pipelines.bilby.BilbyPipeline`.
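As a rough illustration, the sketch below shows what a minimal interface implementing the required methods might look like. This is only a sketch: the ``samples`` and ``logs`` dictionary keys, the layout of the run directory, and the use of ``self.production.rundir`` to locate it are assumptions made for the example, not requirements imposed by asimov.

.. code-block:: python

    import glob
    import os

    from asimov.pipeline import Pipeline


    class MyPipeline(Pipeline):
        """A hypothetical interface between asimov and ``mypipeline``."""

        name = "mypipeline"

        def detect_completion(self):
            # Treat the job as complete once its final data products exist.
            return len(self.collect_assets().get("samples", [])) > 0

        def collect_assets(self):
            # Gather results files from the analysis run directory; here we
            # assume the run directory is available as self.production.rundir
            # and that results end up in a "result" subdirectory.
            rundir = self.production.rundir
            samples = glob.glob(os.path.join(rundir, "result", "*samples*"))
            return {"samples": samples}

        def collect_logs(self):
            # Return the analysis log files as a {filename: contents} dictionary.
            logs = {}
            for logfile in glob.glob(os.path.join(self.production.rundir, "logs", "*.log")):
                with open(logfile, "r") as handle:
                    logs[os.path.basename(logfile)] = handle.read()
            return logs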
Pipeline hooks
--------------

It is possible to customise the run process of the asimov pipeline runner using hooks. By overloading the hook methods listed below, which are inherited from the ``asimov.pipeline.Pipeline`` class, additional operations can be conducted during the processing workflow. Hooks should take no arguments. A short sketch showing how a couple of these methods might be overloaded is given after the list of optional methods below.

``Pipeline.build_dag``
    This method should call the pipeline script which takes the configuration file for the analysis and uses it to generate submission files for the scheduler.

``Pipeline.submit_dag``
    This method should take the submission files generated by ``Pipeline.build_dag`` and submit them to the scheduler.

``Pipeline.before_submit``
    This method is called by asimov before the ``submit_dag`` method is run, and can be used to perform any pre-processing which is required before the job is submitted to the scheduler.

``Pipeline.after_completion``
    This method is called by asimov after the completion of a job has been detected, and can be used to begin post-processing. For example, the bilby interface uses this hook to start PESummary post-processing.

``Pipeline.after_processing``
    This method is run by asimov after the post-processing for a job has completed. The default version of this hook runs the ``Pipeline.store_results`` method in order to place the final post-processed results into storage.

Optional methods
----------------

``Pipeline.run_pesummary``
    This method will run PESummary on the samples created by an analysis pipeline. You may overload this method if you need to run PESummary in a non-standard way. You may also need to overload the ``Pipeline.detect_completion_processing`` method if you change this method.

``Pipeline.store_results``
    This method will store the results of the analysis in the asimov results store for the project. The default method will collect the results files from PESummary, but it can be overloaded in order to store a different set of files or perform additional tasks prior to storage.

``Pipeline.detect_completion_processing``
    This method provides the logic for determining whether PESummary, or whichever post-processing commands are run, has completed successfully and produced outputs. You should only need to overload this method if you have altered ``Pipeline.run_pesummary``.

``Pipeline.eject_job``
    This method is run by asimov to remove the analysis job from the scheduler. For example, it will be run if the status of an analysis is set to ``stop``.

``Pipeline.clean``
    This method should remove all of the artefacts of a job from the working directory.

``Pipeline.resurrect``
    This method will be called on jobs which are marked as ``stuck`` in the asimov ledger, and can be used to, for example, submit a rescue DAG for a job on the condor scheduler.

``Pipeline.read_ini``
    This should be implemented as a class method, and should parse the configuration file for the pipeline into a dictionary.

``Pipeline.check_progress``
    This method will be run by asimov to gather information about the current status of the analysis.
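As a concrete illustration of the hook mechanism, here is a hedged sketch showing how the ``before_submit`` and ``after_completion`` hooks might be overloaded in the hypothetical ``MyPipeline`` interface sketched earlier (only the hook methods are shown). The ``mypipeline-postprocess`` command and the ``self.production.rundir`` attribute are invented for this example, and a real interface may also want to invoke the parent class's implementations.

.. code-block:: python

    import os
    import subprocess

    from asimov.pipeline import Pipeline


    class MyPipeline(Pipeline):
        """Hook methods for the hypothetical ``mypipeline`` interface."""

        def before_submit(self):
            # Hypothetical pre-processing: make sure the directory the
            # scheduler will write its logs into exists before submission.
            os.makedirs(os.path.join(self.production.rundir, "logs"), exist_ok=True)

        def after_completion(self):
            # Hypothetical post-processing: launch an external script once
            # the analysis has finished; a real interface might instead
            # start PESummary here, as the bilby interface does.
            subprocess.run(
                ["mypipeline-postprocess", "--rundir", self.production.rundir],
                check=True,
            )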
Adding an entrypoint
--------------------

Asimov uses an "entrypoint" to make pipelines discoverable. In these examples we assume that your pipeline interface is a class called ``MyPipeline`` (which subclasses ``asimov.pipeline.Pipeline``) and is located in a file called ``asimov.py`` in the main package, i.e.

.. code-block:: text

    |- setup.py
    |- mypipeline
       |- __init__.py
       |- ...
       |- asimov.py
       |- ...
    |- ...

There are a number of different Python packaging technologies, so we will provide examples for just a few here.

``setup.cfg``
~~~~~~~~~~~~~~

.. code-block:: ini

    [options]
    install_requires =
        asimov
        mypipeline

    [metadata]
    name = mypipeline
    version = attr: mypipeline.__version__
    description = A pipeline integration between asimov and mypipeline

    [options.entry_points]
    asimov.pipelines =
        mypipeline = mypipeline.asimov:MyPipeline
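Once the package has been installed, you can check that the interface has been registered under the ``asimov.pipelines`` entrypoint group yourself. The short sketch below (assuming Python 3.10 or newer, where ``importlib.metadata.entry_points`` accepts a ``group`` keyword) simply lists the registered pipeline interfaces and loads the hypothetical ``mypipeline`` entry.

.. code-block:: python

    from importlib.metadata import entry_points

    # List every pipeline interface registered under the "asimov.pipelines"
    # entrypoint group; a correctly packaged interface should appear here.
    for entry_point in entry_points(group="asimov.pipelines"):
        print(entry_point.name, "->", entry_point.value)
        if entry_point.name == "mypipeline":
            # Loading the entrypoint returns the interface class itself.
            pipeline_class = entry_point.load()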