Interacting with compute clusters

Asimov is designed to form a layer between you, a data analyst, and the computing cluster which performs your analysis.

An important part of its job is therefore interacting with that cluster, including submitting jobs to it, checking on how they’re running, and collecting results once jobs have finished.

Currently, Asimov is designed to use the HTCondor scheduling system, though support for additional schedulers may be added in the future to make Asimov more flexible. HTCondor, which we’ll sometimes call condor for short, can be run both on a full-scale computing cluster and on a single machine (its minicondor distribution is ideal for this). It allows jobs to be queued and run in the correct order, so that the inputs required by each stage of an analysis are produced before that stage starts.

Interacting with the scheduler

Asimov handles all of the required interactions with the HTCondor scheduler. If you’re using Asimov’s command line interface, the olivaw manage submit and olivaw monitor commands initiate interaction between Asimov and the cluster.

The olivaw manage submit command submits a job to the cluster. Each pipeline which Asimov supports provides the logic needed to set a job up on the cluster, based on the configuration files created by running olivaw manage build.
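The order in which these commands are normally run (build, then submit, then monitor) can be sketched as a small Python wrapper around the command line interface. This is only an illustration and is not part of Asimov: the workflow_commands and run_workflow helpers are hypothetical names, and only the three olivaw commands themselves are taken from this page.

```python
import subprocess


def workflow_commands():
    """Return the olivaw commands in the order they are normally run."""
    return [
        ["olivaw", "manage", "build"],   # create the pipeline configuration files
        ["olivaw", "manage", "submit"],  # submit the configured jobs to the cluster
        ["olivaw", "monitor"],           # check job status and update the ledger
    ]


def run_workflow(runner=subprocess.run):
    """Run each command in turn (hypothetical helper, not an Asimov API).

    With the default runner, a non-zero exit status raises
    subprocess.CalledProcessError, so later commands are not attempted.
    `runner` is injectable so the sequence can be inspected without
    olivaw being installed.
    """
    for command in workflow_commands():
        runner(command, check=True)
```

Passing a different runner, for example one that only records the commands, gives a dry run of the same sequence.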

The olivaw monitor command checks the status of a job on the cluster and updates the ledger as appropriate, to indicate whether the job has completed or become stuck. When a job has completed, the pipeline’s post-completion hook is run, allowing post-processing to be started.
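The behaviour described for monitoring can be pictured with the following minimal sketch. It does not reproduce Asimov's implementation: the monitor_job function, the dictionary standing in for the ledger, and the status strings "running", "finished" and "stuck" are all assumptions made for illustration.

```python
def monitor_job(name, status, ledger, post_completion_hook):
    """Record a job's status and react to completion (hypothetical sketch).

    `ledger` is a plain dict standing in for the real ledger, and `status`
    is assumed to be one of "running", "finished" or "stuck".
    """
    # Update the ledger so it reflects the latest known state of the job.
    ledger[name] = status

    if status == "finished":
        # A completed job triggers the pipeline's post-completion hook,
        # which is where post-processing would be started.
        post_completion_hook(name)

    return ledger[name]
```

Here the hook is only called for finished jobs; a stuck job is recorded in the ledger but otherwise left alone.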

Caching

To improve the performance of Asimov’s interactions with clusters, and to reduce the strain placed on the schedulers’ databases, Asimov caches job information for 15 minutes by default. This can be adjusted with the cache_time setting in the main configuration file for Asimov.
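The idea behind this kind of time-based caching can be sketched as follows. This is a generic illustration, not Asimov's own code: the JobStatusCache class and its query and clock parameters are hypothetical, and only the 900-second default is taken from this page.

```python
import time


class JobStatusCache:
    """Cache the result of a scheduler query for `cache_time` seconds."""

    def __init__(self, query, cache_time=900, clock=time.monotonic):
        self._query = query            # callable that asks the scheduler for job data
        self._cache_time = cache_time  # lifetime of a cached result, in seconds
        self._clock = clock            # injectable so expiry can be tested
        self._value = None
        self._fetched_at = None

    def get(self):
        """Return cached job data, querying the scheduler only when stale."""
        now = self._clock()
        expired = (
            self._fetched_at is None
            or now - self._fetched_at >= self._cache_time
        )
        if expired:
            # Only contact the scheduler when the cached copy is too old.
            self._value = self._query()
            self._fetched_at = now
        return self._value
```

Repeated calls to get() within the cache lifetime reuse the stored result, so a busy scheduler is queried at most once per cache_time interval.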

Configuration settings

Configuration options for HTCondor are stored in the htcondor namespace of the main configuration file for Asimov.

cache_time

[htcondor]
cache_time = 900

The time for which job status information should be cached, in seconds. The default setting is 900 seconds (15 minutes). Please take care when reducing this setting, as excessively frequent querying of busy schedulers can result in reduced performance.
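As a rough illustration of how such a setting can be read, the sketch below parses the [htcondor] section with Python's standard configparser module and falls back to the documented default of 900 seconds. It assumes the configuration file uses standard INI syntax; the read_cache_time helper is a hypothetical name, not part of Asimov.

```python
import configparser

DEFAULT_CACHE_TIME = 900  # seconds; the default stated above


def read_cache_time(config_text):
    """Return htcondor.cache_time in seconds, or the default if it is unset."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # `fallback` is used when either the section or the option is missing.
    return parser.getint("htcondor", "cache_time", fallback=DEFAULT_CACHE_TIME)
```

For example, a file containing cache_time = 1800 under [htcondor] (an illustrative value) would yield a 30-minute cache, while a file with no htcondor section would yield the 900-second default.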

© Copyright 2020-2024, Daniel Williams.