Working with open data¶

We designed asimov to make it easier than ever to run your own analyses on LIGO data.

This page will provide a quick overview of how you can set up an analysis similar to the one used in a recent LIGO-Virgo-KAGRA (LVK) publication.

Note

We don’t guarantee that you’ll get precisely the same results as those which were published. We’re working on including in the asimov workflow the various post-processing steps which were used for this publication, but they’re not quite ready yet. However, the analysis should work in a similar fashion.

Getting started¶

The process we’ve outlined here should work on a fairly modern Linux-based computer, or on a Windows computer running Windows Subsystem for Linux. We’ve not yet tested it on macOS.

Getting the analysis environment¶

LIGO analyses require a complicated software stack. Fortunately it’s fairly easy to install this using the conda tool.

Full instructions on using IGWN environments with conda are available at https://computing.docs.ligo.org/conda/usage/, but normally the following steps will work.

First you’ll need to install conda if you haven’t already got it.

$ curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
$ bash Mambaforge-$(uname)-$(uname -m).sh

You’ll be asked some questions about the installation as this runs.
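If the installation succeeded you should be able to check the installed version of conda (you may need to open a new shell first):

$ conda --version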

Next you need to add conda-forge as a source of packages.

$ conda config --add channels conda-forge
$ conda config --set channel_priority strict

We can now create our own environment, which we’ll populate with the IGWN software.

$ curl -L -O https://computing.docs.ligo.org/conda/environments/linux-64/igwn-py39-testing.yaml
$ conda env create --file igwn-py39-testing.yaml
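Creating the environment can take a while, since it installs a fairly large software stack. Once it has finished, the new environment should appear in the list of available environments:

$ conda env list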

We need to activate this environment, so that the versions of the software inside it are used in place of any system-wide installations.

$ conda activate igwn-py39-testing
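As a quick check that the environment is working you can try importing some of the analysis software it provides; the IGWN environments should include the bilby library, for example:

$ python -c "import bilby; print(bilby.__version__)"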

To give us access to open data we also need to install an extra package:

$ conda install -c conda-forge asimov-gwdata

We should now have access to all the software we need to proceed!
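A quick way to check that everything is in place is to confirm that the asimov command-line tool is available:

$ asimov --help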

Getting HTCondor¶

When results are prepared for LVK publications we make use of a large network of computing facilities, coordinated using a tool called HTCondor which distributes analyses across many different computers. Unless you happen to have access to a computing cluster which uses this technology, you’ll need to install a mini version of HTCondor (“minicondor”) which can run on a single machine. The instructions we outline here are derived from the HTCondor documentation which you can find here, and assume that you can run the sudo command on your machine to gain administrator privileges. It’s still possible to get things going without this, but you’ll instead need to follow the instructions here.

You can install the minicondor software by running

$ sudo curl -fsSL https://get.htcondor.org | /bin/bash -s -- --no-dry-run

If everything’s worked you can run

$ condor_status

which should give you a list of the processor slots available on your machine, looking something like this:

Name                 OpSys      Arch   State     Activity     LoadAv Mem   Actv

slot1@azaphrael.org  LINUX      X86_64 Unclaimed Benchmarking  0.000 2011  0+00
slot2@azaphrael.org  LINUX      X86_64 Unclaimed Idle          0.000 2011  0+00
slot3@azaphrael.org  LINUX      X86_64 Unclaimed Idle          0.000 2011  0+00
slot4@azaphrael.org  LINUX      X86_64 Unclaimed Idle          0.000 2011  0+00

Total Owner Claimed Unclaimed Matched Preempting Backfill  Drain

X86_64/LINUX    4     0       0        4        0          0        0      0
Total    4     0       0        4        0          0        0      0

Note

If you’ve previously installed minicondor, you might just need to start it by running

$ sudo condor_master

Creating a new project¶

The first step we need to take is to create an asimov project. This is a special directory structure which will keep all of the components of our analysis organised.

First choose somewhere to keep your project, and make a new directory to keep it in, for example

$ mkdir gwosc-analysis

Then you’ll need to change into that directory by running

$ cd gwosc-analysis

We can then turn this directory into an asimov project by running

$ asimov init "GWOSC Project"

where “GWOSC Project” is the name of your project, and can be anything you like.

You’ll see that the directory has been populated with various files and directories:

$ ls

asimov.log  checkouts  logs  results  working

Right now we don’t need to worry about what these are for, but they’re described elsewhere in the asimov documentation.

Setting up the configuration¶

There are many settings which need to be established before an analysis can run. To keep things as consistent as possible across major analyses, we try to use the same settings everywhere, and only change those which absolutely need to differ on an event-by-event basis.

We’ll now set things up using the settings from some recent LVK publications. Asimov uses YAML files to configure everything, and we apply these to the project. You’ll need to copy the following YAML data into a file called, for example, configuration.yaml, which you can save inside the gwosc-analysis directory.

kind: configuration
pipelines:
  bilby:
    quality:
      state vector:
        L1: L1:DCS-CALIB_STATE_VECTOR_C01
        H1: H1:DCS-CALIB_STATE_VECTOR_C01
        V1: V1:DQ_ANALYSIS_STATE_VECTOR
    sampler:
      sampler: dynesty
    scheduler:
      accounting group: ligo.dev.o4.cbc.pe.bilby
      request cpus: 4
  bayeswave:
    quality:
      state vector:
        L1: L1:DCS-CALIB_STATE_VECTOR_C01
        H1: H1:DCS-CALIB_STATE_VECTOR_C01
        V1: V1:DQ_ANALYSIS_STATE_VECTOR
    scheduler:
      accounting group: ligo.dev.o4.cbc.pe.bilby
      request memory: 1024
      request post memory: 16384
    likelihood:
      iterations: 100000
      chains: 8
      threads: 4
postprocessing:
  pesummary:
    accounting group: ligo.dev.o4.cbc.pe.bilby
    cosmology: Planck15_lal
    evolve spins: forwards
    multiprocess: 4
    redshift: exact
    regenerate posteriors:
    - redshift
    - mass_1_source
    - mass_2_source
    - chirp_mass_source
    - total_mass_source
    - final_mass_source
    - final_mass_source_non_evolved
    - radiated_energy
    skymap samples: 2000

Once you’ve saved the file you need to “apply” it to the project by running

$ asimov apply -f configuration.yaml
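If asimov complains when you apply the file, the most common cause is a YAML indentation or syntax mistake. One quick, optional way to check that the file at least parses as valid YAML is with Python’s yaml module, which should already be present in the environment:

$ python -c "import yaml; yaml.safe_load(open('configuration.yaml'))"

If this prints nothing, the file is syntactically valid.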

Where possible we try not to ship “default” settings with asimov, so that it’s clearer what’s actually being done as the analysis is built.

Adding an event¶

It feels like we’ve spent a lot of time getting things set up, but now we’re ready to actually start looking at gravitational waves.

For this guide we’ll look at GW150914, which was the first gravitational wave to be detected.

The asimov team maintain a set of YAML files for all of the published events in a special repository (https://git.ligo.org/asimov/data/-/tree/main/events). We’ve copied the file for this event onto this page, but you don’t need to save it yourself; the command after the YAML listing will download it directly from our repository and add it to your project.

data:
  channels:
    H1: H1:DCS-CALIB_STRAIN_C02
    L1: L1:DCS-CALIB_STRAIN_C02
  frame types:
    H1: H1_HOFT_C02
    L1: L1_HOFT_C02
  segment length: 4
event time: 1126259462.391
gid: G190047
interferometers:
- H1
- L1
kind: event
likelihood:
  psd length: 4
  reference frequency: 20
  sample rate: 2048
  segment start: 1126259460.391
  start frequency: 13.333333333333334
  window length: 4
name: GW150914_095045
priors:
  amplitude order: 1
  chirp mass:
    maximum: 41.97447913941358
    minimum: 21.418182160215295
  luminosity distance:
    maximum: 10000
    minimum: 10
  mass 1:
    maximum: 1000
    minimum: 1
  mass ratio:
    maximum: 1.0
    minimum: 0.05
quality:
  minimum frequency:
    H1: 20
    L1: 20

To add this event directly from the repository we can simply run

$ asimov apply -f https://git.ligo.org/asimov/data/-/raw/main/events/gwtc-2-1/GW150914_095045.yaml
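As a quick check that the event has been added, you can look in the checkouts directory, which (assuming the default project layout) should now contain a directory for the event:

$ ls checkouts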

Note

While we normally call this event GW150914, its full name is GW150914_095045, and we’ll need to use that later when adding analyses.

Getting data and adding analyses¶

We’re almost there! Now we need to fetch the detector data and add some analyses. Fortunately this is all automated; we just need to copy some information into a YAML file and apply it to the project.

We’ll create three analysis steps: the first fetches the frame files containing the detector data, the second estimates how much noise is in the data (producing the power spectral densities), and the third performs the parameter estimation analysis on the signal itself. All three steps need to run to complete the analysis.

You’ll need to copy the text from this code block into a file, which you can call analyses.yaml:

kind: analysis
name: get-data
pipeline: gwdata
download:
  - frames
---
kind: analysis
name: psd-generation
pipeline: bayeswave
comment: Bayeswave on-source PSD estimation job
needs:
    - get-data
---
kind: analysis
name: data-analysis
pipeline: bilby
waveform:
  approximant: IMRPhenomXPHM
comment: Bilby parameter estimation job
needs:
    - psd-generation

We can then apply this to our project, specifying the correct event.

$ asimov apply -f analyses.yaml -e GW150914_095045
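As an aside, the get-data step defined above fetches the publicly released strain data from GWOSC. If you’re curious about the data themselves you can also download them directly, outside the asimov workflow, using the gwpy package from the environment. This one-liner (purely illustrative, and not part of the analysis) fetches eight seconds of H1 data around the event time from the event file above:

$ python -c "from gwpy.timeseries import TimeSeries; print(TimeSeries.fetch_open_data('H1', 1126259458.391, 1126259466.391))"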

Now everything is set to get started.

Starting the analyses¶

Asimov now needs to generate the pipeline, and submit the relevant jobs to be processed. You can make this happen by running

$ asimov manage build submit
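Once the jobs have been submitted, you should be able to see them in the HTCondor queue:

$ condor_q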

Running these analyses might take quite a long time (potentially several days). We can get asimov to monitor them for us by running

$ asimov start
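While it runs, asimov records what it’s been doing in the project log, and you can also trigger a single monitoring pass by hand with asimov monitor if you’d rather check on things manually:

$ asimov monitor
$ tail -f asimov.log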

The results¶

Eventually the results of the analysis will be generated by asimov and placed in the results directory. You’ll be able to explore them using the “summary pages” which will be generated and placed in the pages directory of your project.
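For example, once analyses have completed you can list what has been produced so far (the exact layout beneath these directories depends on the analyses you’ve run):

$ ls results
$ ls pages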
