Working with open data
======================

We designed asimov to make it easier than ever to run your own analyses on LIGO data.
This page provides a quick overview of how you can set up an analysis similar to the one used in a recent LIGO-Virgo-KAGRA (LVK) publication.

.. note::
   We don't guarantee that you'll get *precisely* the same results as those which were published.
   We're working on including various post-processing steps in the asimov workflow which were used for this publication, but they're not quite ready yet.
   However, the analysis should work in a similar fashion.

Getting started
---------------

The process we've outlined here should work on a fairly modern Linux-based computer, or a Windows computer running Windows Subsystem for Linux.
We've not yet tested it on macOS computers.

Getting the analysis environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

LIGO analyses require a complicated software stack, but fortunately it's fairly easy to install this using the ``conda`` tool.
Full instructions on using IGWN environments with conda are `available here`_, but normally the following steps will work.

First you'll need to install conda, if you haven't already got it.

.. code-block:: console

   $ curl -L -O https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh
   $ bash Mambaforge-$(uname)-$(uname -m).sh

You'll be asked some questions about the installation as this runs.

Next you need to add ``conda-forge`` as a source of packages.

.. code-block:: console

   $ conda config --add channels conda-forge
   $ conda config --set channel_priority strict

We can now create our own environment, which we'll populate with the IGWN software.

.. code-block:: console

   $ curl -L -O https://computing.docs.ligo.org/conda/environments/linux-64/igwn-py39-testing.yaml
   $ conda env create --file igwn-py39-testing.yaml

We need to *activate* this environment, so that we're using the versions of the software installed inside it.

.. code-block:: console

   $ conda activate igwn-py39-testing

To give us access to open data we need to install one extra package as well.

.. code-block:: console

   $ conda install -c conda-forge asimov-gwdata

We should now have access to all the software we need to proceed!

Getting htcondor
~~~~~~~~~~~~~~~~

When results are prepared for LVK publications we make use of a large network of computing facilities, which are coordinated using a tool called `htcondor <https://htcondor.org>`_ that distributes analyses around different computers.
Unless you happen to have access to a computing cluster which uses this technology, you'll need to install a mini version of ``htcondor`` which can run on a single machine.

The instructions we outline here are derived from the `htcondor documentation <https://htcondor.readthedocs.io/>`_, and assume you can run the ``sudo`` command on your machine to gain administrator control.
It's still possible to get things going without this, but you'll instead need to follow the non-root installation instructions in that documentation.

You can install the ``minicondor`` software by running

.. code-block:: console

   $ sudo curl -fsSL https://get.htcondor.org | /bin/bash -s -- --no-dry-run

If everything's worked you can run
.. code-block:: console

   $ condor_status

which should give you a list of the processors available on your machine, looking something like this:

::

   Name                 OpSys  Arch    State      Activity      LoadAv  Mem   Actv

   slot1@azaphrael.org  LINUX  X86_64  Unclaimed  Benchmarking  0.000   2011  0+00
   slot2@azaphrael.org  LINUX  X86_64  Unclaimed  Idle          0.000   2011  0+00
   slot3@azaphrael.org  LINUX  X86_64  Unclaimed  Idle          0.000   2011  0+00
   slot4@azaphrael.org  LINUX  X86_64  Unclaimed  Idle          0.000   2011  0+00

                 Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill  Drain
    X86_64/LINUX     4      0        0          4        0           0         0      0
           Total     4      0        0          4        0           0         0      0

.. note::
   If you've previously installed ``minicondor`` you might just need to start it by running ``$ sudo condor_master``.

Creating a new project
----------------------

The first step is to create an asimov project.
This is a special directory structure which will keep all of the components of our analysis organised.

First choose somewhere to keep your project, and make a new directory for it, for example

.. code-block:: console

   $ mkdir gwosc-analysis

Then you'll need to change into that directory by running

.. code-block:: console

   $ cd gwosc-analysis

We can then turn this directory into an asimov project by running

.. code-block:: console

   $ asimov init "GWOSC Project"

where "GWOSC Project" is the name of your project, and can be anything you like.
You'll see that the directory has been populated with various files and directories:

.. code-block:: console

   $ ls
   asimov.log  checkouts  logs  results  working

Right now we don't need to worry about what these are for; they're described elsewhere in the asimov documentation.

Setting up settings
-------------------

There are lots of settings which need to be established before an analysis can run.
To keep major analyses as consistent as possible, we try to ensure these settings are the same across all of our analyses, and only change the ones which absolutely need to change on an event-by-event basis.
We'll set things up now using the settings from some recent LVK publications.

Asimov uses YAML files to configure everything, and we *apply* these to the project.
You'll need to copy the following YAML data into a file called, for example, ``configuration.yaml``, which you can save inside the ``gwosc-analysis`` directory.

.. code-block:: yaml

   kind: configuration
   pipelines:
     bilby:
       quality:
         state vector:
           L1: L1:DCS-CALIB_STATE_VECTOR_C01
           H1: H1:DCS-CALIB_STATE_VECTOR_C01
           V1: V1:DQ_ANALYSIS_STATE_VECTOR
       sampler:
         sampler: dynesty
       scheduler:
         accounting group: ligo.dev.o4.cbc.pe.bilby
         request cpus: 4
     bayeswave:
       quality:
         state vector:
           L1: L1:DCS-CALIB_STATE_VECTOR_C01
           H1: H1:DCS-CALIB_STATE_VECTOR_C01
           V1: V1:DQ_ANALYSIS_STATE_VECTOR
       scheduler:
         accounting group: ligo.dev.o4.cbc.pe.bilby
         request memory: 1024
         request post memory: 16384
       likelihood:
         iterations: 100000
         chains: 8
         threads: 4
   postprocessing:
     pesummary:
       accounting group: ligo.dev.o4.cbc.pe.bilby
       cosmology: Planck15_lal
       evolve spins: forwards
       multiprocess: 4
       redshift: exact
       regenerate posteriors:
       - redshift
       - mass_1_source
       - mass_2_source
       - chirp_mass_source
       - total_mass_source
       - final_mass_source
       - final_mass_source_non_evolved
       - radiated_energy
       skymap samples: 2000

Once you've saved the file you need to "apply" it to the project by running

.. code-block:: console

   $ asimov apply -f configuration.yaml

We try not to ship "default" settings with asimov where possible, so that it's clearer what's actually being done as the analysis is built.
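Since indentation matters in YAML, it's easy to introduce a mistake while copying settings files like this one.
If ``asimov apply`` complains, a couple of lines of Python will confirm whether the file at least parses; this is just a sanity check using PyYAML (which is included in the IGWN environment), not part of the asimov workflow itself.

.. code-block:: python

   # Sanity check: make sure configuration.yaml is well-formed YAML
   # before (re)applying it to the project.
   import yaml

   with open("configuration.yaml") as settings_file:
       settings = yaml.safe_load(settings_file)

   print(settings["kind"])  # should print "configuration"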
Adding an event
---------------

It feels like we've spent a lot of time getting things set up, but now we're ready to actually start looking at gravitational waves.
For this guide we'll look at GW150914, which was the first gravitational wave to be detected.

The asimov team maintain a set of YAML files for all of the published events `in a special repository`_.
We've copied the file for this event onto this page, but you don't need to save it into a file yourself; the command after the YAML will download it directly from our repository and add it to your project.

.. code-block:: yaml

   data:
     channels:
       H1: H1:DCS-CALIB_STRAIN_C02
       L1: L1:DCS-CALIB_STRAIN_C02
     frame types:
       H1: H1_HOFT_C02
       L1: L1_HOFT_C02
     segment length: 4
   event time: 1126259462.391
   gid: G190047
   interferometers:
   - H1
   - L1
   kind: event
   likelihood:
     psd length: 4
     reference frequency: 20
     sample rate: 2048
     segment start: 1126259460.391
     start frequency: 13.333333333333334
     window length: 4
   name: GW150914_095045
   priors:
     amplitude order: 1
     chirp mass:
       maximum: 41.97447913941358
       minimum: 21.418182160215295
     luminosity distance:
       maximum: 10000
       minimum: 10
     mass 1:
       maximum: 1000
       minimum: 1
     mass ratio:
       maximum: 1.0
       minimum: 0.05
   quality:
     minimum frequency:
       H1: 20
       L1: 20

To add this event directly from the repository we can just run

.. code-block:: console

   $ asimov apply -f https://git.ligo.org/asimov/data/-/raw/main/events/gwtc-2-1/GW150914_095045.yaml

.. note::
   While we normally call this event GW150914, its full name is GW150914_095045, and we'll need to use that later when adding analyses.

Getting data and adding analyses
--------------------------------

We're almost there!
Now we need to fetch the detector data and add some analyses.
Fortunately this is all automated, and we just need to copy some information into a YAML file and apply it to the project.

We'll create three analysis steps: the first fetches the frame files, the second performs some analysis to work out how much noise is in our data, and the third performs the analysis on the signal itself.
All three steps need to run to complete the analysis.
You'll need to copy the text from this code block into a file, which you can call ``analyses.yaml``:

.. code-block:: yaml

   kind: analysis
   name: get-data
   pipeline: gwdata
   download:
   - frames
   ---
   kind: analysis
   name: psd-generation
   pipeline: bayeswave
   comment: Bayeswave on-source PSD estimation job
   needs:
   - get-data
   ---
   kind: analysis
   name: data-analysis
   pipeline: bilby
   waveform:
     approximant: IMRPhenomXPHM
   comment: Bilby parameter estimation job
   needs:
   - psd-generation

We can then apply this to our project, and to the correct event.

.. code-block:: console

   $ asimov apply -f analyses.yaml -e GW150914_095045

Now everything is set to get started.

Starting the analyses
---------------------

Asimov now needs to generate the pipelines, and submit the relevant jobs to be processed.
You can make this happen by running

.. code-block:: console

   $ asimov manage build submit

Running these analyses might take quite a long time (potentially several days).
We can get asimov to monitor them for us by running

.. code-block:: console

   $ asimov start

The results
-----------

Eventually the results of the analysis will be generated by asimov and placed in the ``results`` directory.
You'll be able to explore them using the "summary pages" which will be generated and placed in the ``pages`` directory of your project.
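If you want to work with the posterior samples themselves, rather than just the summary pages, the ``pesummary`` package (included in the IGWN environment) can read the result files.
Here's a minimal sketch; the path below is only a placeholder, and you should substitute the actual samples file which asimov places under your ``results`` directory:

.. code-block:: python

   # Read a pesummary result file and list the posterior samples it
   # contains. The path is a placeholder: substitute the file asimov
   # writes into your project's results directory.
   from pesummary.io import read

   data = read("results/GW150914_095045/posterior_samples.h5")
   samples = data.samples_dict
   print(samples.keys())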