ALMASim Documentation
ALMASim is a library-first Python environment for simulating ALMA observations, exploring ALMA metadata, downloading science products, and building ML-ready radio/mm-wave datasets.
It provides reusable services in src/almasim that can be driven by CLI scripts, Jupyter notebooks, a FastAPI backend, or direct Python code — all through the same staged API.
Key Capabilities
Simulation
Build clean sky cubes from point, Gaussian, extended, molecular-cloud, diffuse, Galaxy Zoo, and Hubble-100 source models
Simulate single-pointing ALMA interferometric observations with multi-configuration support (12m, 7m, TP)
PWV-aware per-channel noise model
Additive astrophysical background sky — faint dusty galaxies, diffuse emission, or combined
Optional serendipitous source injection
Iterative CLEAN-style deconvolution with resumable state
TP+INT feather-style image combination
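The iterative CLEAN deconvolution mentioned above can be illustrated with a minimal Högbom-style loop, sketched here in plain NumPy. This is a generic textbook version under simplifying assumptions (square, odd-sided PSF; single field), not ALMASim's actual implementation:

```python
import numpy as np

def hogbom_clean(dirty, psf, gain=0.1, n_iter=300, threshold=1e-6):
    """Minimal Hogbom CLEAN: repeatedly find the brightest residual
    pixel and subtract a gain-scaled, shifted copy of the PSF there."""
    half = psf.shape[0] // 2                    # psf assumed square with odd side
    padded = np.pad(dirty.astype(float), half)  # pad so PSF subtraction never clips
    residual = padded[half:half + dirty.shape[0], half:half + dirty.shape[1]]  # view
    model = np.zeros_like(residual)
    for _ in range(n_iter):
        y, x = np.unravel_index(np.argmax(np.abs(residual)), residual.shape)
        peak = residual[y, x]
        if abs(peak) < threshold:               # residual is noise-like: stop
            break
        model[y, x] += gain * peak              # accumulate CLEAN components
        # subtract gain * peak * PSF centred on (y, x); residual updates via the view
        padded[y:y + psf.shape[0], x:x + psf.shape[1]] -= gain * peak * psf
    return model, residual.copy()
```

For a single point source whose dirty image is exactly a scaled PSF, the loop recovers the source flux geometrically, which is why the resumable state in the real pipeline only needs the model and residual so far.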
Data Products
Dirty cube, dirty visibilities, beam cube, UV mask cube, U/V coordinate cubes
Interferometric, total-power, and combined TP+INT image cubes
ML-ready HDF5 shards (clean cube + dirty cube + dirty visibilities + UV mask + metadata)
Native MeasurementSet (.ms) export via CASA tools or python-casacore
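The relationship between these products can be sketched with a toy forward model: the dirty cube is what results when the sky's Fourier transform is sampled only where the UV mask is non-zero. The NumPy illustration below is a simplified single-channel version of that idea, not ALMASim's actual pipeline:

```python
import numpy as np

def dirty_image(sky, uv_mask):
    """Toy interferometric forward model: keep only the visibility
    cells selected by uv_mask, then transform back to the image plane."""
    vis = np.fft.fftshift(np.fft.fft2(sky))      # full visibility (UV) plane
    sampled = np.where(uv_mask, vis, 0.0)        # zero out unsampled (u, v) cells
    return np.fft.ifft2(np.fft.ifftshift(sampled)).real
```

With a fully filled mask the dirty image equals the sky; sparser masks produce the sidelobe-laden images that the deconvolution stage then cleans.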
Metadata and Archive
Query ALMA observations via TAP with rich inclusion/exclusion filters
Normalise TAP columns into stable application fields
Resolve DataLink products and download ALMA data products with parallel support
Unpack raw ASDMs into MeasurementSets
Apply delivered calibration to produce calibrated science MSs
Compute
Synchronous, local multiprocess, Dask, Slurm, and Kubernetes backends
Backend-agnostic simulation service layer
Architecture
```text
src/almasim/            ← installable library (pip install almasim)
  services/
    simulation.py       ← staged pipeline entry points
    interferometry/     ← UV sampling, baselines, noise, TP
    imaging/            ← deconvolution, TP+INT combination
    metadata/           ← TAP queries, normalisation
    products/           ← MS export, HDF5 shards, cube export
    compute/            ← backend abstraction
    archive/            ← ASDM unpack, calibration apply
    astro/              ← spectral lines, redshift, parameters
    skymodels/          ← source model implementations
backend/                ← FastAPI service (Docker: ghcr.io/…/almasim-backend)
frontend/               ← Svelte UI (requires Docker Compose)
examples/               ← CLI scripts and Jupyter notebooks
```
The library layer owns all domain logic. The backend is a thin adapter over library services. CLI scripts and notebooks call the same staged services directly.
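This layering can be illustrated with a small sketch (hypothetical function names, not ALMASim's real API): the library owns validation and domain logic, while the adapter only translates transport-level input and output before delegating:

```python
def run_simulation_service(n_channels: int) -> dict:
    """Library layer: owns validation and domain logic."""
    if n_channels <= 0:
        raise ValueError("n_channels must be positive")
    return {"status": "done", "n_channels": n_channels}

def http_adapter(payload: dict) -> dict:
    """Backend layer: a thin adapter that parses the request,
    delegates to the library, and maps errors to status codes."""
    try:
        body = run_simulation_service(int(payload["n_channels"]))
        return {"code": 200, "body": body}
    except (KeyError, ValueError) as exc:
        return {"code": 400, "body": {"error": str(exc)}}
```

Because the adapter holds no logic of its own, a CLI script or notebook calling `run_simulation_service` directly gets exactly the same behaviour as the HTTP route.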
Installation
Library only (cross-platform)
```bash
pip install almasim
```
With CASA tools (Linux x86-64 only)
casatools and casatasks wheels are Linux-only. Install the optional [casa] extra on a supported Linux system:
```bash
pip install "almasim[casa]"
```
The [casa] extra enables:
- Native MeasurementSet export via `casatools`
- ASDM-to-MS conversion via `casatasks.importasdm`
- Calibration application via `casatasks.applycal`
Without [casa], all simulation, imaging, metadata, and download features still work. The MS export path falls back to python-casacore if available:
```bash
pip install "almasim[ms-casacore]"
```
From source (development)
```bash
git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
pip install uv
uv sync --group dev
```
Backend service (Docker Compose)
The FastAPI backend and Svelte frontend require Docker Compose:
```bash
git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
docker compose up
```
The backend image is available pre-built from GHCR:
```bash
docker pull ghcr.io/micheledelliveneri/almasim-backend:latest
```
Quick Start
Query ALMA metadata
```python
from almasim.services.metadata.tap.service import query_by_science_type, InclusionFilters

df = query_by_science_type(
    include=InclusionFilters(science_keyword=["Galaxies"], band=[6])
)
print(df[["ALMA_source_name", "Band", "spatial_resolution"]].head())
```
Run a simulation from a metadata row
```python
from almasim import SimulationParams, run_simulation
from pathlib import Path

params = SimulationParams.from_metadata_row(
    row,  # pandas Series from a metadata query
    idx=0,
    main_dir=Path("src/almasim"),
    output_dir=Path("output"),
    project_name="my_project",
)
result = run_simulation(params)
```
Use the staged API
```python
from almasim import (
    SimulationParams,
    generate_clean_cube,
    simulate_observation,
    image_products,
    export_results,
)

params = SimulationParams.from_metadata_row(row, idx=0, ...)
cube_result = generate_clean_cube(params)
obs_result = simulate_observation(params, cube_result)
img_result = image_products(params, obs_result)
export_results(params, cube_result, obs_result, img_result)
```
Staged Simulation API
The pipeline is split into four composable stages:
| Stage | Function | What it does |
|---|---|---|
| 1 | `generate_clean_cube` | Build sky cube from skymodel, apply background |
| 2 | `simulate_observation` | Run interferometric + TP simulation, return dirty products |
| 3 | `image_products` | Deconvolve, combine INT+TP, build image cubes |
| 4 | `export_results` | Write cubes, ML shards, parameter summaries to disk |
run_simulation() orchestrates all four in sequence.
write_ml_dataset_shard() exports an HDF5 shard (clean cube + dirty cube + dirty visibilities + UV mask + metadata) independently of the main export path.
estimate_simulation_footprint() returns resolved pixel count, channel count, cell size, beam size, and raw output size in GiB — useful for pre-run capacity checks.
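The arithmetic behind such a capacity check is straightforward; the helper below is a hypothetical back-of-the-envelope version (not the actual `estimate_simulation_footprint`), assuming each product is an (n_chan, n_pix, n_pix) array of 8-byte values:

```python
def estimate_raw_size_gib(n_pix, n_chan, n_products=3, bytes_per_value=8):
    """Rough raw-output footprint in GiB for n_products cubes of
    shape (n_chan, n_pix, n_pix) with bytes_per_value per sample."""
    return n_products * n_chan * n_pix * n_pix * bytes_per_value / 2**30
```

For example, a single 1024×1024 cube with 128 channels of float64 samples comes to exactly 1 GiB, which is the kind of number worth knowing before submitting a batch of simulations.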
Full reference: Simulation docs
Skymodels
| Source type | Description |
|---|---|
| Point | Point source — PSF and CLEAN validation |
| Gaussian | 2-D Gaussian — compact extended source |
| Extended | TNG-backed realistic extended emission |
| Galaxy Zoo | Galaxy Zoo image morphology prior |
| Hubble 100 | Hubble Top-100 image morphology prior |
| Molecular cloud | Molecular cloud structured emission |
| Diffuse | Correlated diffuse emission field |
All skymodels accept explicit source_offset_x_arcsec / source_offset_y_arcsec to shift the science target from phase center.
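The conversion these offsets imply is a simple division by the image cell size; the helper name below is hypothetical, for illustration only:

```python
def offset_arcsec_to_pixels(dx_arcsec, dy_arcsec, cell_size_arcsec):
    """Convert an angular offset from phase centre (arcsec) into
    image-plane pixel units, given the cell (pixel) size in arcsec."""
    return dx_arcsec / cell_size_arcsec, dy_arcsec / cell_size_arcsec
```

So with a 0.5 arcsec cell, an offset of (2.0, -1.0) arcsec shifts the source by (4, -2) pixels from the phase centre.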
Additive background sky (independent of the main source):
| Mode | Effect |
|---|---|
| Galaxies | Faint dusty star-forming galaxies |
| Diffuse | Correlated low-spatial-frequency dusty background |
| Combined | Both of the above |
Full reference: Skymodels docs
Compute Backends
Select via SimulationParams.compute_backend:
| Backend | Use case |
|---|---|
| Sync | Notebooks, examples, debugging |
| Multiprocess | Local CPU parallelism |
| Dask | Distributed execution, cluster scheduling |
| Slurm | HPC job submission |
| Kubernetes | Cluster-native environments |
Full reference: Compute docs
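The pattern behind a backend-agnostic service layer is to hide the execution strategy behind one dispatch interface. The sketch below is illustrative only (it is not ALMASim's internal code) and covers just the sync and multiprocess cases using the standard library:

```python
from concurrent.futures import ProcessPoolExecutor

def run_tasks(fn, items, backend="sync", max_workers=4):
    """Run one task per item on the chosen backend behind a single
    interface; callers never see which executor did the work."""
    if backend == "sync":                      # in-process: easiest to debug
        return [fn(item) for item in items]
    if backend == "multiprocess":              # local CPU parallelism
        with ProcessPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(fn, items))
    raise ValueError(f"unsupported backend: {backend!r}")
```

Dask, Slurm, and Kubernetes backends slot into the same shape by swapping the executor while keeping the task functions untouched.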
Metadata and Downloads
Query metadata via TAP
```python
from almasim.services.metadata.tap.service import (
    query_by_science_type,
    InclusionFilters,
    ExclusionFilters,
)

df = query_by_science_type(
    include=InclusionFilters(
        science_keyword=["Galaxies"],
        band=[6, 7],
        public_only=True,
        science_only=True,
    ),
    exclude=ExclusionFilters(solar=True),
)
```
Download products
```python
from pathlib import Path

from almasim.services.download import resolve_products, run_download_job

products = resolve_products(df["member_ous_uid"].tolist())
run_download_job(products, destination=Path("downloads"), extract_tar=True)
```
Full reference: Metadata docs · Downloads docs
Backend Service
The FastAPI backend exposes library services over HTTP and drives the Svelte frontend.
| Endpoint group | Purpose |
|---|---|
| Metadata | TAP queries and metadata management |
| Simulation | Simulation job submission and status |
| Downloads | Product resolution and download jobs |
| Imaging | Deconvolution and combination products |
| Results | Output browsing and product inspection |
| Health | Health check |
| Docs | Interactive OpenAPI docs (Swagger UI) |
Start locally for development:
```bash
cd backend
uv run uvicorn app.main:app --reload --port 8000
```
Full reference: Frontend docs
Examples
All examples use the sync compute backend and require no running scheduler.
| Script | Description |
|---|---|
| `query_metadata_cli.py` | Query TAP, export metadata and product CSVs |
| `download_products_cli.py` | Resolve and download ALMA products |
| | Unpack ASDMs and apply calibration |
| `staged_pipeline_cli.py` | Full pipeline: query → simulate → ML shard |
| `imaging_cli.py` | Synthetic imaging + iterative deconvolution |
```bash
# Query metadata for Band 6 galaxy observations
python examples/query_metadata_cli.py \
    --science-keyword Galaxies --band 6 \
    --save-csv examples/output/metadata.csv

# Run a staged simulation from the first metadata row
python examples/staged_pipeline_cli.py \
    --metadata-csv examples/output/metadata.csv \
    --row-idx 0 --project-name demo \
    --ml-shard-path examples/output/demo.h5

# Iterative deconvolution demo
python examples/imaging_cli.py \
    --output-dir examples/output/imaging --cycles 180 --gain 0.12
```
Notebook equivalents: staged_pipeline_notebook.ipynb · query_metadata_notebook.ipynb · download_products_notebook.ipynb
End-to-end archive pipeline (Marimo)
examples/e2e_archive_pipeline.py is a reactive Marimo notebook that covers the full archive workflow interactively: query ALMA metadata → resolve DataLink products → download → unpack ASDMs → apply calibration.
```bash
# Install dev dependencies (includes marimo)
uv sync --group dev

# Interactive editing mode — cells re-run automatically as you edit
marimo edit examples/e2e_archive_pipeline.py

# Read-only app mode — run the pipeline step-by-step via the UI
marimo run examples/e2e_archive_pipeline.py
```
Steps 4 (unpack) and 5 (calibrate) require CASA tools (Linux x86-64 only):
```bash
pip install "almasim[casa]"
```
The notebook saves query filter presets as .query.json files so they can be reloaded across sessions.
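The preset mechanism boils down to round-tripping a filter dictionary through JSON; a generic sketch (the notebook's exact schema and helper names may differ):

```python
import json
from pathlib import Path

def save_query_preset(path: Path, filters: dict) -> None:
    """Persist a query filter preset as pretty-printed JSON."""
    path.write_text(json.dumps(filters, indent=2, sort_keys=True))

def load_query_preset(path: Path) -> dict:
    """Reload a previously saved preset for a new session."""
    return json.loads(path.read_text())
```

Because the presets are plain JSON files, they can also be edited by hand or committed alongside the notebook.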
Documentation
Full documentation: micheledelliveneri.github.io/ALMASim
| Section | Topics |
|---|---|
| Quick Start | Installation, first simulation |
| Simulation | Staged API, SimulationParams, outputs |
| Interferometry | UV sampling, baselines, multi-config |
| Noise | PWV-aware noise model |
| Background Sky | Additive astrophysical background |
| Skymodels | Source models reference |
| Imaging and Combination | Deconvolution, TP+INT combination |
| Metadata | TAP queries, filters |
| Downloads | Product download workflow |
| Compute Backends | Sync, Dask, Slurm, Kubernetes |
| Frontend Workflows | Svelte UI workflows |
Build docs locally:
```bash
uv sync --group dev
uv run sphinx-build -b html docs/source docs/build/html
```
Contributing
```bash
git clone https://github.com/MicheleDelliVeneri/ALMASim.git
cd ALMASim
uv sync --group dev

uv run pytest --ignore=illustris_python
uv run ruff check .
uv run ruff format .
```
A release is published automatically when a version tag is pushed:
```bash
# 1. Bump version in pyproject.toml and src/almasim/__version__.py
# 2. Commit and tag
git tag v2.1.11
git push origin v2.1.11
```
The release pipeline then:
- Validates that the tag matches `pyproject.toml`
- Runs the full lint + test suite
- Publishes wheel and sdist to PyPI via OIDC trusted publisher
- Creates a GitHub Release with auto-generated changelog and attached artifacts
- Builds and pushes the backend Docker image to GHCR
One-time PyPI setup: register a trusted publisher on PyPI with owner `MicheleDelliVeneri`, repo `ALMASim`, workflow `release.yml`, environment `pypi`.
License
ALMASim is released under the GNU General Public License v3.
Documentation Contents
- Quick Start
- Simulation
- Interferometry
- Noise
- Background Sky
- Skymodels
- Imaging and Combination
- Metadata
- Downloads
- Compute Backends
- Frontend Workflows
- SimALMA Fidelity Plan