Fitting parameters
This tutorial shows how to perform Bayesian parameter inference on a cosmological model by fitting it to data, and how to forecast parameter constraints from a covariance matrix that describes the uncertainties and correlations of future (not yet observed) data.
Pantheon supernova data
We load the Pantheon dataset, which contains redshifts and apparent magnitudes of 1048 Type Ia supernovae. In the documentation we use the compressed version, binned into 40 redshift bins.
using DataFrames, CSV, LinearAlgebra, PDMats, SymBoltz
binned = true # use compressed dataset in the documentation
docsdir = joinpath(pkgdir(SymBoltz), "docs")
if binned # choose compressed dataset with 40 redshift bins
data = joinpath(docsdir, "Pantheon/Binned_data/lcparam_DS17f.txt")
Csyst = joinpath(docsdir, "Pantheon/Binned_data/sys_DS17f.txt")
else # choose full dataset with 1048 supernovae
data = joinpath(docsdir, "Pantheon/lcparam_full_long.txt")
Csyst = joinpath(docsdir, "Pantheon/sys_full_long.txt")
end
# Read data table
data = CSV.read(data, DataFrame, delim = " ", silencewarnings = true)
# Read covariance matrix of apparent magnitudes (mb)
Csyst = CSV.read(Csyst, DataFrame, header = false) # long vector
Csyst = collect(reshape(Csyst[2:end, 1], (Int(Csyst[1, 1]), Int(Csyst[1, 1])))) # to matrix
Cstat = Diagonal(data.dmb)^2 # statistical covariance: squared standard errors (variances) on the diagonal
C = Csyst + Cstat
# Sort data and covariance matrix with decreasing redshift
is = sortperm(data, :zcmb, rev = true)
C = C[is, is]
C = PDMat(Symmetric(C)) # efficient sym-pos-def matrix with Cholesky factorization
data = data[is, :]
first(data, 10) # show 10 first rows
| Row | #name | zcmb | zhel | dz | mb | dmb | x1 | dx1 | color | dcolor | 3rdvar | d3rdvar | cov_m_s | cov_m_c | cov_s_c | set | ra | dec | biascor | Column20 |
|-----|-------|------|------|----|----|-----|----|----|-------|--------|--------|---------|---------|---------|---------|-----|----|----|---------|----------|
|     | Int64 | Float64 | Float64 | Float64 | Float64 | Float64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Int64 | Missing | Missing |
| 1 | 39 | 1.6123 | 1.6123 | 0.0 | 25.926 | 0.0735 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 2 | 38 | 1.2336 | 1.2336 | 0.0 | 25.304 | 0.05635 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 3 | 37 | 0.9511 | 0.9511 | 0.0 | 24.6411 | 0.0276 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 4 | 36 | 0.821 | 0.821 | 0.0 | 24.2446 | 0.0248 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 5 | 35 | 0.724 | 0.724 | 0.0 | 23.8666 | 0.027 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 6 | 34 | 0.6299 | 0.6299 | 0.0 | 23.5036 | 0.031 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 7 | 33 | 0.5742 | 0.5742 | 0.0 | 23.3004 | 0.0245 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 8 | 32 | 0.5174 | 0.5174 | 0.0 | 23.0012 | 0.02685 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 9 | 31 | 0.4738 | 0.4738 | 0.0 | 22.7974 | 0.02935 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
| 10 | 30 | 0.4355 | 0.4355 | 0.0 | 22.5579 | 0.0254 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | missing | missing |
Let us plot the apparent magnitude as a function of redshift, and the covariance matrix:
using CairoMakie
fig = Figure(size = (600, 800))
ax1 = Axis(fig[1, 1:2], xlabel = "z", ylabel = "m", title = "Apparent magnitude vs. redshift")
scatter!(ax1, data.zcmb, data.mb; markersize = 5, label = "data (Pantheon)")
errorbars!(ax1, data.zcmb, data.mb, data.dmb; linewidth = 1, whiskerwidth = 5)
ax2 = Axis(fig[2, 1]; xlabel = "z", ylabel = "z", title = "Covariance matrix", yreversed = true, aspect = 1)
hm = heatmap!(ax2, extrema(data.zcmb), extrema(data.zcmb), C; colormap = :balance, colorrange = (-0.001, +0.001))
Colorbar(fig[2, 2], hm)
fig

Predicting distances
To predict luminosity distances theoretically, we solve the w0waCDM model:
using SymBoltz
g = SymBoltz.metric()
K = SymBoltz.curvature(g)
X = SymBoltz.w0wa(g; analytical = true)
M = RMΛ(K = K, Λ = X)
M = complete(SymBoltz.background(M); flatten = false)
M = change_independent_variable(M, M.g.a; add_old_diff = true)
pars_fixed = Dict(M.τ => 0.0, M.r.T₀ => NaN, M.X.cₛ² => NaN)
pars_varying = [M.r.Ω₀, M.m.Ω₀, M.K.Ω₀, M.g.h, M.X.w0, M.X.wa]
dL = SymBoltz.distance_luminosity_function(M, pars_fixed, pars_varying, data.zcmb)
μ(p) = 5 * log10.(dL(p)[begin:end-1] / (10*SymBoltz.pc)) # distance modulus
# Show example predictions
Mb = -19.3 # absolute magnitude of Type Ia supernovae (assumed constant, since they are standard candles)
bgopts = (alg = SymBoltz.Tsit5(), reltol = 1e-5, maxiters = 1e3)
p0 = [9.3e-5, 0.3, 0.0, 0.7, -1.0, 0.0] # fiducial parameters
μs = μ(p0)
mbs = μs .+ Mb
lines!(ax1, data.zcmb, mbs; color = :black, label = "theory (ΛCDM)")
axislegend(ax1, position = :rb)
fig
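For reference, the apparent magnitude above follows from the standard distance modulus relation
\[m = M_b + \mu, \quad \mu = 5 \log_{10}\left(\frac{d_L}{10\,\mathrm{pc}}\right),\]
which is exactly what the μ function and mbs compute in the code above.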

Bayesian inference
To perform Bayesian inference, we define a probabilistic model in Turing.jl:
using Turing
@model function supernova(μ_pred, mbs, C; Mb = Mb, Ωr0 = 9.3e-5)
    # Parameter priors
    h ~ Uniform(0.1, 1.0)
    Ωm0 ~ Uniform(0.0, 1.0)
    Ωk0 ~ Uniform(-1.0, +1.0)
    w0 ~ Uniform(-2.0, 0.0)
    wa ~ Uniform(-1.0, +1.0)
    p = [Ωr0, Ωm0, Ωk0, h, w0, wa]
    μs_pred = μ_pred(p)
    if isempty(μs_pred) # solver failed, so reject this sample
        Turing.@addlogprob! -Inf
        return nothing
    end
    mbs_pred = μs_pred .+ Mb
    return mbs ~ MvNormal(mbs_pred, C) # read "measurements are sampled from a multivariate normal with predicted means and covariance matrix C"
    # equivalently:
    #Δmb = mbs .- mbs_pred
    #χ² = transpose(Δmb) * inv(C) * Δmb
    #Turing.@addlogprob! -1/2 * χ²
    #return nothing
end
# Work around https://github.com/JuliaStats/Distributions.jl/issues/1964:
# construct MvNormal directly so the pre-factorized PDMat covariance is used as-is
function MvNormal(μ::AbstractVector{<:Real}, Σ::AbstractPDMat{<:Real})
    R = Base.promote_eltype(μ, Σ)
    Distributions.MvNormal{R, typeof(Σ), typeof(μ)}(μ, Σ)
end
function MvNormal(μ, Σ)
    return Distributions.MvNormal(μ, Σ)
end
sn_w0waCDM = supernova(μ, data.mb, C);
sn_ΛCDM = fix(sn_w0waCDM, w0 = -1.0, wa = 0.0);
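Before sampling, we can sanity-check the model by evaluating its log density at one plausible parameter point. A minimal sketch, assuming the logjoint function from DynamicPPL (re-exported by Turing):
logjoint(sn_ΛCDM, (h = 0.7, Ωm0 = 0.3, Ωk0 = 0.0)) # log prior + log likelihood at this point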
We can now sample from the model to obtain an MCMC chain for the ΛCDM model:
chain = sample(sn_ΛCDM, NUTS(), 1000; initial_params = (h = 0.5, Ωm0 = 0.5, Ωk0 = 0.0))
import Plots, StatsPlots # don't collide with Makie
Plots.plot(chain)
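For quick numerical summaries of the posterior, we can use the chain statistics from MCMCChains; this sketch assumes its summarystats and quantile methods are available once Turing is loaded:
summarystats(chain) # posterior means, standard deviations and diagnostics (ESS, R̂)
quantile(chain) # posterior quantiles for each parameter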
Finally, we can visualize the high-dimensional chain with a corner plot using PairPlots.jl:
using PairPlots
layout = (
    PairPlots.Scatter(),
    PairPlots.Contourf(sigmas = 1:2),
    PairPlots.MarginHist(),
    PairPlots.MarginDensity(color = :black),
    PairPlots.MarginQuantileText(color = :black, font = :regular),
    PairPlots.MarginQuantileLines(),
)
pp = pairplot(chain => layout)

We can easily repeat this for another model:
sn_w0CDM_flat = fix(sn_w0waCDM, Ωk0 = 0.0, wa = 0.0);
chain = sample(sn_w0CDM_flat, NUTS(), 1000; initial_params = (h = 0.5, Ωm0 = 0.5, w0 = -1.0))
pp = pairplot(chain => layout)

Forecasting
Planning future cosmological surveys involves forecasting the precision of the parameter constraints they are expected to deliver. Here we show how to perform such forecasts by combining SymBoltz.jl with Turing.jl.
To start, create another probabilistic supernova model, but instead of observed apparent magnitudes, we now use simulated apparent magnitudes from the fiducial cosmology:
sn_fc_w0waCDM = supernova(μ, mbs, Diagonal(C));
sn_fc_w0CDM_flat = fix(sn_fc_w0waCDM, Ωk0 = 0.0, wa = 0.0);
MCMC-driven forecasting
A general and assumption-free, but expensive, method of forecasting is to explore the likelihood with MCMC (as before, only against simulated data):
chain_fc = sample(sn_fc_w0CDM_flat, NUTS(), 1000)
Plots.plot(chain_fc)
pars0 = Dict(pars_varying .=> p0)
truth = PairPlots.Truth((h = pars0[M.g.h], Ωm0 = pars0[M.m.Ω₀], w0 = pars0[M.X.w0]))
pp_fc = pairplot(chain_fc => layout, truth)

Fisher forecasting
A less general, but cheaper, method is to use a second-order approximation of the likelihood around its maximum (the mean of the approximating Gaussian). This technique is called Fisher forecasting, and requires calculating the Fisher (information) matrix
\[F_{ij} = -\left\langle \frac{\partial^2 \log P(\theta)}{\partial\theta_i \, \partial\theta_j} \right\rangle.\]
This effectively measures how quickly the likelihood falls off from its maximum in different directions in parameter space. Under certain assumptions, the parameter covariance matrix is the inverse of the Fisher matrix: $C = F^{-1}$.
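For a Gaussian likelihood like ours, whose data covariance $\Sigma$ does not depend on the parameters, the Fisher matrix reduces to derivatives of the model prediction:
\[F_{ij} = \frac{\partial \boldsymbol{m}^\mathsf{T}}{\partial \theta_i} \, \Sigma^{-1} \, \frac{\partial \boldsymbol{m}}{\partial \theta_j},\]
where $\boldsymbol{m}(\theta)$ is the vector of predicted apparent magnitudes. Below we instead let Turing estimate $F$ numerically at the mode.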
First, we ask Turing to estimate the maximum likelihood mode of the probabilistic model:
maxl_fc = maximum_likelihood(sn_fc_w0CDM_flat; initial_params = [0.5, 0.5, -1.0]) # with flat priors, this coincides with the MAP estimate inside the prior bounds
ModeResult with maximized lp of 101.16
[0.6999998423648631, 0.29999926589213133, -0.9999967165190045]
As expected, the maximum likelihood corresponds to our chosen fiducial parameters. Moreover, the returned mode estimate object offers convenience methods that greatly simplify the following calculations. Next, we calculate the Fisher matrix at the maximum likelihood estimate, and invert it to get the covariance matrix:
using StatsBase
F_fc = informationmatrix(maxl_fc)
C_fc = inv(F_fc)
3×3 Named Matrix{Float64}
A ╲ B │ h Ωm0 w0
──────┼─────────────────────────────────────────
h │ 1.60344e-5 0.000141453 -0.000493181
Ωm0 │ 0.000141453 0.00351769 -0.00921492
w0 │ -0.000493181 -0.00921492 0.0259976
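The marginalized 1σ forecast uncertainty on each parameter is the square root of the corresponding diagonal entry of the covariance matrix:
σ_fc = sqrt.(diag(C_fc.array)) # 1σ uncertainties for h, Ωm0 and w0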
Finally, we derive 1σ and 2σ confidence ellipses from the covariance matrix and draw them onto our corner plot:
function ellipse(C::Matrix, i, j, c = (0.0, 0.0); nstd = 1, N = 33)
    σᵢ², σⱼ², σᵢⱼ = C[i,i], C[j,j], C[i,j]
    θ = atan(2σᵢⱼ, σᵢ² - σⱼ²) / 2 # rotation angle of the major axis
    a = √((σᵢ²+σⱼ²)/2 + √((σᵢ²-σⱼ²)^2/4 + σᵢⱼ^2)) # semi-major axis
    b = √(max(0.0, (σᵢ²+σⱼ²)/2 - √((σᵢ²-σⱼ²)^2/4 + σᵢⱼ^2))) # semi-minor axis
    a *= nstd # scale semi-axes to trace the nstd-σ ellipse
    b *= nstd
    cx, cy = c
    ts = range(0, 2π, length = N)
    xs = cx .+ a*cos(θ)*cos.(ts) .- b*sin(θ)*sin.(ts)
    ys = cy .+ a*sin(θ)*cos.(ts) .+ b*cos(θ)*sin.(ts)
    return xs, ys
end
for i in eachindex(IndexCartesian(), C_fc)
    ix, iy = i[1], i[2]
    ix >= iy && continue # only draw on lower-triangular panels
    μx = maxl_fc.values[ix]
    μy = maxl_fc.values[iy]
    for nstd in 1:2
        xs, ys = ellipse(C_fc.array, ix, iy, (μx, μy); nstd)
        lines!(pp_fc[iy, ix], xs, ys; color = :red)
    end
end
pp_fc
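If the likelihood is approximately Gaussian around its maximum, the red Fisher ellipses should closely trace the corresponding MCMC contours.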
