ALMASim Documentation

ALMASim

PyPI version Python 3.12 Documentation CI codecov License: GPL v3

ALMASim is a library-first Python environment for simulating ALMA observations, exploring ALMA metadata, downloading science products, and building ML-ready radio/mm-wave datasets.

It provides reusable services in src/almasim that can be driven by CLI scripts, Jupyter notebooks, a FastAPI backend, or direct Python code — all through the same staged API.


Table of Contents


Key Capabilities

Simulation

  • Build clean sky cubes from point, Gaussian, extended, molecular-cloud, diffuse, Galaxy Zoo, and Hubble-100 source models

  • Simulate single-pointing ALMA interferometric observations with multi-configuration support (12m, 7m, TP)

  • PWV-aware per-channel noise model

  • Additive astrophysical background sky — faint dusty galaxies, diffuse emission, or combined

  • Optional serendipitous source injection

  • Iterative CLEAN-style deconvolution with resumable state

  • TP+INT feather-style image combination

Data Products

  • Dirty cube, dirty visibilities, beam cube, UV mask cube, U/V coordinate cubes

  • Interferometric, total-power, and combined TP+INT image cubes

  • ML-ready HDF5 shards (clean cube + dirty cube + dirty visibilities + UV mask + metadata)

  • Native MeasurementSet (.ms) export via CASA tools or python-casacore

Metadata and Archive

  • Query ALMA observations via TAP with rich inclusion/exclusion filters

  • Normalise TAP columns into stable application fields

  • Resolve DataLink products, download ALMA data products with parallel support

  • Unpack raw ASDMs into MeasurementSets

  • Apply delivered calibration to produce calibrated science MSs

Compute

  • Synchronous, local multiprocess, Dask, Slurm, and Kubernetes backends

  • Backend-agnostic simulation service layer


Architecture

src/almasim/          ← installable library  (pip install almasim)
  services/
    simulation.py     ← staged pipeline entry points
    interferometry/   ← UV sampling, baselines, noise, TP
    imaging/          ← deconvolution, TP+INT combination
    metadata/         ← TAP queries, normalisation
    products/         ← MS export, HDF5 shards, cube export
    compute/          ← backend abstraction
    archive/          ← ASDM unpack, calibration apply
    astro/            ← spectral lines, redshift, parameters
  skymodels/          ← source model implementations

backend/              ← FastAPI service  (Docker: ghcr.io/…/almasim-backend)
frontend/             ← Svelte UI  (requires Docker Compose)
examples/             ← CLI scripts and Jupyter notebooks

The library layer owns all domain logic. The backend is a thin adapter over library services. CLI scripts and notebooks call the same staged services directly.


Installation

Library only (cross-platform)

pip install almasim

With CASA tools (Linux x86-64 only)

casatools and casatasks wheels are Linux-only. Install the optional [casa] extra on a supported Linux system:

pip install "almasim[casa]"

The [casa] extra enables:

  • Native MeasurementSet export via casatools

  • ASDM-to-MS conversion via casatasks.importasdm

  • Calibration application via casatasks.applycal

Without [casa], all simulation, imaging, metadata, and download features still work. The MS export path falls back to python-casacore if available:

pip install "almasim[ms-casacore]"

From source (development)

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install uv
uv sync --group dev

Backend service (Docker Compose)

The FastAPI backend and Svelte frontend require Docker Compose:

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
docker compose up

The backend image is available pre-built from GHCR:

docker pull ghcr.io/michelledelliveneri/almasim-backend:latest

Quick Start

Query ALMA metadata

from almasim.services.metadata.tap.service import query_by_science_type, InclusionFilters

df = query_by_science_type(
    include=InclusionFilters(science_keyword=["Galaxies"], band=[6])
)
print(df[["ALMA_source_name", "Band", "spatial_resolution"]].head())

Run a simulation from a metadata row

from almasim import SimulationParams, run_simulation
from pathlib import Path

params = SimulationParams.from_metadata_row(
    row,                          # pandas Series from a metadata query
    idx=0,
    main_dir=Path("src/almasim"),
    output_dir=Path("output"),
    project_name="my_project",
)

result = run_simulation(params)

Use the staged API

from almasim import (
    SimulationParams,
    generate_clean_cube,
    simulate_observation,
    image_products,
    export_results,
)

params = SimulationParams.from_metadata_row(row, idx=0, ...)

cube_result  = generate_clean_cube(params)
obs_result   = simulate_observation(params, cube_result)
img_result   = image_products(params, obs_result)
export_results(params, cube_result, obs_result, img_result)

Staged Simulation API

The pipeline is split into four composable stages:

Stage

Function

What it does

1

generate_clean_cube()

Build sky cube from skymodel, apply background

2

simulate_observation()

Run interferometric + TP simulation, return dirty products

3

image_products()

Deconvolve, combine INT+TP, build image cubes

4

export_results()

Write cubes, ML shards, parameter summaries to disk

run_simulation() orchestrates all four in sequence.

write_ml_dataset_shard() exports an HDF5 shard (clean cube + dirty cube + dirty visibilities + UV mask + metadata) independently of the main export path.

estimate_simulation_footprint() returns resolved pixel count, channel count, cell size, beam size, and raw output size in GiB — useful for pre-run capacity checks.

Full reference: Simulation docs


Skymodels

Source type

Description

point

Point source — PSF and CLEAN validation

gaussian

2-D Gaussian — compact extended source

extended

TNG-backed realistic extended emission

galaxy-zoo

Galaxy Zoo image morphology prior

hubble-100

Hubble Top-100 image morphology prior

molecular

Molecular cloud structured emission

diffuse

Correlated diffuse emission field

All skymodels accept explicit source_offset_x_arcsec / source_offset_y_arcsec to shift the science target from phase center.

Additive background sky (independent of the main source):

Mode

Effect

blank_field_dsfg

Faint dusty star-forming galaxies

dusty_diffuse

Correlated low-spatial-frequency dusty background

combined

Both of the above

Full reference: Skymodels docs


Compute Backends

Select via SimulationParams.compute_backend:

Backend

Use case

sync

Notebooks, examples, debugging

local

Local CPU parallelism

dask

Distributed execution, cluster scheduling

slurm

HPC job submission

kubernetes

Cluster-native environments

Full reference: Compute docs


Metadata and Downloads

Query metadata via TAP

from almasim.services.metadata.tap.service import (
    query_by_science_type,
    InclusionFilters,
    ExclusionFilters,
)

df = query_by_science_type(
    include=InclusionFilters(
        science_keyword=["Galaxies"],
        band=[6, 7],
        public_only=True,
        science_only=True,
    ),
    exclude=ExclusionFilters(solar=True),
)

Download products

from almasim.services.download import resolve_products, run_download_job

products = resolve_products(df["member_ous_uid"].tolist())
run_download_job(products, destination=Path("downloads"), extract_tar=True)

Full reference: Metadata docs · Downloads docs


Backend Service

The FastAPI backend exposes library services over HTTP and drives the Svelte frontend.

Endpoint group

Purpose

/api/v1/metadata

TAP queries and metadata management

/api/v1/simulation

Simulation job submission and status

/api/v1/download

Product resolution and download jobs

/api/v1/imaging

Deconvolution and combination products

/api/v1/visualizer

Output browsing and product inspection

/health

Health check

/docs

Interactive OpenAPI docs (Swagger UI)

Start locally for development:

cd backend
uv run uvicorn app.main:app --reload --port 8000

Full reference: Frontend docs


Examples

All examples use the sync compute backend and require no running scheduler.

Script

Description

examples/query_metadata_cli.py

Query TAP, export metadata and product CSVs

examples/download_products_cli.py

Resolve and download ALMA products

examples/archive_ms_cli.py

Unpack ASDMs and apply calibration

examples/staged_pipeline_cli.py

Full pipeline: query → simulate → ML shard

examples/imaging_cli.py

Synthetic imaging + iterative deconvolution

# Query metadata for Band 6 galaxy observations
python examples/query_metadata_cli.py \
  --science-keyword Galaxies --band 6 \
  --save-csv examples/output/metadata.csv

# Run a staged simulation from the first metadata row
python examples/staged_pipeline_cli.py \
  --metadata-csv examples/output/metadata.csv \
  --row-idx 0 --project-name demo \
  --ml-shard-path examples/output/demo.h5

# Iterative deconvolution demo
python examples/imaging_cli.py \
  --output-dir examples/output/imaging --cycles 180 --gain 0.12

Notebook equivalents: staged_pipeline_notebook.ipynb · query_metadata_notebook.ipynb · download_products_notebook.ipynb

End-to-end archive pipeline (Marimo)

examples/e2e_archive_pipeline.py is a reactive Marimo notebook that covers the full archive workflow interactively: query ALMA metadata → resolve DataLink products → download → unpack ASDMs → apply calibration.

# Install dev dependencies (includes marimo)
uv sync --group dev

# Interactive editing mode — cells re-run automatically as you edit
marimo edit examples/e2e_archive_pipeline.py

# Read-only app mode — run the pipeline step-by-step via the UI
marimo run examples/e2e_archive_pipeline.py

Steps 4 (unpack) and 5 (calibrate) require CASA tools (Linux x86-64 only):

pip install "almasim[casa]"

The notebook saves query filter presets as .query.json files so they can be reloaded across sessions.


Documentation

Full documentation: micheledelliveneri.github.io/ALMASim

Section

Topics

Quick Start

Installation, first simulation

Simulation

Staged API, SimulationParams, outputs

Interferometry

UV sampling, baselines, multi-config

Noise

PWV-aware noise model

Background Sky

Additive astrophysical background

Skymodels

Source models reference

Imaging

Deconvolution, TP+INT combination

Metadata

TAP queries, filters

Downloads

Product download workflow

Compute Backends

Sync, Dask, Slurm, Kubernetes

Frontend

Svelte UI workflows

Build docs locally:

uv sync --group dev
uv run sphinx-build -b html docs/source docs/build/html

Contributing

git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
uv sync --group dev
uv run pytest --ignore=illustris_python
uv run ruff check .
uv run ruff format .

A release is published automatically when a version tag is pushed:

# 1. Bump version in pyproject.toml and src/almasim/__version__.py
# 2. Commit and tag
git tag v2.1.11
git push origin v2.1.11

The release pipeline then:

  1. Validates that the tag matches pyproject.toml

  2. Runs the full lint + test suite

  3. Publishes wheel and sdist to PyPI via OIDC trusted publisher

  4. Creates a GitHub Release with auto-generated changelog and attached artifacts

  5. Builds and pushes the backend Docker image to GHCR

One-time PyPI setup: register a trusted publisher on PyPI with owner MicheleDelliVeneri, repo ALMASim, workflow release.yml, environment pypi.


License

ALMASim is released under the GNU General Public License v3.

Documentation Contents

Indices and Tables