With the advent of billion-galaxy surveys with complex data, the need of the hour is to efficiently model galaxy spectral energy distributions (SEDs) with robust uncertainty quantification. The combination of simulation-based inference (SBI) and amortized neural posterior estimation (NPE) has been successfully used to analyse simulated and real galaxy photometry both precisely and efficiently. In this work, we utilise this combination and build on existing literature to analyse simulated noisy galaxy spectra. Here, we demonstrate a proof-of-concept study of spectra that is (a) an efficient analysis of galaxy SEDs and inference of galaxy parameters with physically interpretable uncertainties; and (b) amortized calculations of posterior distributions of said galaxy parameters at the modest cost of a few galaxy fits with Markov chain Monte Carlo (MCMC) methods. We utilise the SED generator and inference framework Prospector to generate simulated spectra, and train a dataset of 2 **×** 10^{6} spectra (corresponding to a five-parameter SED model) with NPE. We show that SBI—with its combination of fast and amortized posterior estimations—is capable of inferring accurate galaxy stellar masses and metallicities. Our uncertainty constraints are comparable to or moderately weaker than traditional inverse-modelling with Bayesian MCMC methods (e.g. 0.17 and 0.26 dex in stellar mass and metallicity for a given galaxy, respectively). We also find that our inference framework conducts rapid SED inference (0.9–1.2 **×** 10^{5} galaxy spectra via SBI/NPE at the cost of 1 MCMC-based fit). With this work, we set the stage for further work that focuses of SED fitting of galaxy spectra with SBI, in the era of JWST galaxy survey programs and the wide-field Roman Space Telescope spectroscopic surveys.

## 1.Introduction

Understanding the mass assembly of galaxies across cosmic time is a major goal of modern extragalactic astrophysics; solving this question sheds light onto a galaxy's underlying formation and evolution mechanism. Galaxies are well-characterized by features like stellar mass, chemical composition, dust attenuation, current star formation rate (SFR), and the star formation history. These parameters can be accurately inferred from a galaxy's spectral energy distribution (SED).

Within the last two decades, photometry-based SED fitting has become a pivotal method to measure the above properties. Ground-based telescopes have been used extensively for large multi-wavelength galaxy surveys—e.g. Sloan Digital Sky Survey (SDSS, Ahumada *et al* 2020), Dark Energy Survey (Abbott *et al* 2018), and DESI Legacy Imaging Surveys (Dey *et al* 2019)—producing large high-quality complex datasets. However, SED studies relying on photometry alone are subject to many challenges, such as the age-metallicity-dust degeneracy (Worthey 1994, Ferreras *et al* 1999). SED fitting using spectra mitigate this challenge significantly, especially to constrain galaxy SFRs, formation timescales, and metallicities, with measurement of absorption line indices and emission line strengths (e.g. Worthey 1994, Leja *et al* 2019b and references therein.)

There are several cutting-edge SED-fitting pipelines with Bayesian frameworks that use Markov chain Monte Carlo (MCMC) methods to infer galaxy properties—e.g. CIGALE, MAGPHYS, and Prospector (Leja *et al* 2017a, 2019a, Carnall *et al* 2019, Johnson *et al* 2021). However, the computational time needed by the fitting algorithms in these frameworks—e.g. MCMC or nested sampling has been recently is a major bottleneck. With the next generation of telescopes, like the Vera Rubin Observatory/ Legacy Survey of Space and Time (Ivezić *et al* 2019) and the Dark Energy Spectroscopic Instrument (DESI; Aghamousa *et al* 2016), tens of millions of optical and infrared galaxy photometry and spectra will be measured. Highly-resolved spectra have higher information content and require more complex and flexible models for fitting. Finally, datasets with spaxels (or spatial and spectral pixels) are increasing in number, e.g. data units in integral-field-unit (IFU) spectrograph observations, with JWST IFU spectroscopy of a gravitationally lensed galaxy (Khullar *et al* 2021), or the SDSS-MaNGA survey observations of star forming galaxies.

Fitting a large number of free parameters is computationally expensive. A five-parameter spectral model within a typical SED fitting code—stellar mass, dust attenuation, metallicity, age—converges to a best-fit model solution in 2–10 CPU h. Moreover, each galaxy spectra requires its own separate inference chains. With the advent of spectroscopic surveys that may potentially observe tens of millions of galaxy spectra, the cost of modelling spectra quickly becomes prohibitive. Thus, the need of the hour is to quickly and reliably deduce the physical parameters of galaxies in large surveys, as well as to rapidly analyse thousands of spectra from within one galaxy.

Deep learning applied to galaxy SED fitting allows, in principle, a mapping between an observed SED and the target galaxy's star formation history, with several studies in the last few years alone demonstrating the efficacy of new methodologies (Leung and Bovy 2019, Lovell *et al* 2019, Hahn and Melchior 2022). These methods allow regression of galaxy parameters, albeit without any uncertainty quantification.

Recent developments in deep learning methods have focused on uncertainty quantification. For example, deep ensembles involve retraining a network many times with different initializations to enable uncertainty quantification of the model outputs. (Ganaie *et al* 2021). Furthermore, with Bayesian neural networks (BNNs), deterministic weights of the model are replaced by probability distributions, which allows the model to provide uncertainties of its outputs (Valentin Jospin *et al* 2020) (training a BNN includes assumptions of priors over the network weights and assumes that parameter posteriors can be approximated by well behaved variational distribution e.g. Gaussian distribution or a mixture of multiple Gaussians).

Simulation-based inference (SBI; Cranmer *et al* 2019) combined with deep learning can mitigate assumptions (e.g. tractable likelihood) that can plague analytic likelihood and posterior modelling, as well as remove computational bottlenecks in statistical calculations. Many astrophysical studies have demonstrated success with SBI in calculating posterior estimation in a rapid manner (Kacprzak *et al* 2018, Alsing *et al* 2019, Zhang *et al* 2021, Huppenkothen and Bachetti 2022, Zhao *et al* 2022).

SBI of galaxy spectra is an exciting opportunity because these models (a) do not require the expression of an explicit likelihood, and (b) can calculate approximate posterior distributions of galaxy parameters efficiently, allowing for robust uncertainty quantification. Recent work has shown that photometric SED data can be used with SBI for fast inference, e.g. Hahn and Melchior (2022), and Robeyns *et al* (2022).

In this work, we demonstrate a proof-of-concept SBI framework to analyse galaxy spectra and recover posteriors efficiently for a five-parameter SED model. We expect to scale this work to upcoming galaxy spectroscopic surveys—like the Dark Energy Spectroscopy Instrument Aghamousa *et al* (2016) and the Roman Space Telescope High-Latitude Survey Wang *et al* (2022)— and for SED models with more complex and flexible descriptions of galaxy properties.

In section 2, we describe the simulated data used in this study. In section 3, we describe the SBI network architecture and analysis, and in section 4, we share our results and next steps. The fiducial cosmology model used for all distance measurements as well as other cosmological values assumes a standard flat cold dark matter (or CDM) Universe with a cosmological constant CDM), corresponding to WMAP9 observations (Hinshaw *et al* 2013).

## 2.Data

We use Prospector (Johnson *et al* 2021) to generate simulated SEDs of galaxies. Prospector relies on MCMC sampling for stellar population synthesis and parameter inference. It is based on the `Python-FSPS` framework, with the MILES stellar spectral library and the MESA isochrones & stellar tracks (MIST) set of isochrones (Conroy and Gunn 2010, Falcón-Barroso *et al* 2011, Foreman-Mackey *et al* 2013, Choi *et al* 2016, Leja *et al* 2017a).

We generate a training set of 10 000 rest-frame SEDs using a five-parameter model, with a delayed, exponentially declining (i.e. delayed-tau) star formation history. The SFH—SFR as a function of time—is:

where SFR is the star formation rate, *t* is the epoch at which the star formation history is being evaluated, and *τ* corresponds to the e-folding time in the delayed-*τ* SFH model.

This model incorporates physical priors used in survey studies of galaxy mass assembly (e.g. see Belli *et al* 2019). We sample the total stellar mass (), the stellar metallicity , a delayed and exponentially declining SFH with age , the e-folding time (*τ* from here on), and dust attenuation (, corresponding to the optical depth of diffuse dust at 5500 Å; from here on referred to as dust.) Each parameter vector ** θ ** comprises these five parameters. Our SED model assumes a Kroupa Initial Mass Function (Kroupa 2001). Nebular continuum and line emission are also present.

See table 1 for a description of the prior range for each model parameter. See figure 1 for five example SEDs/spectra from our training set, highlighting the emission and absorption features depending on the type of galaxy (young vs old stellar populations, respectively).

**Table 1.**Simulated SEDs: model description and prior range.

Parameter | Description | Priors |
---|---|---|

Total stellar mass formed | Log_{10} Uniform: [10^{8}, 10^{13}] | |

Stellar metallicity in units of | Uniform: [−2.0, 0.2] | |

Diffuse dust optical depth | Tophat: [0.1, 10.00] | |

Age of Galaxy (Gyr) | TopHat: [0, 4] | |

τ | e-folding time of SFH (Gyr) | Log_{10} Uniform: [0.1, 1.0] |

We smooth and resample the simulated SEDs to resemble a medium-resolution spectroscopic survey using Prospector's internal resampling utility. We use a velocity smoothing parameter ( (km s^{−1}), to account for the contribution of Doppler broadening by stellar velocities, and resolution of the model libraries), and fix it at 350 km s^{−1}; this smoothing corresponds to 100, similar to a deep galaxy survey conducted with the 100 JWST/NIRSpec prism (Zackrisson *et al* 2017). This results in a training set with each galaxy SED sampling rest-frame 3750–9500 Å, with 138 flux elements for each SED, which is the data vector ** x ** corresponding to each

**.**

*θ*To map our training set to observations, we add stochasticity to the training set in the form of Gaussian noise, to the level of 5% of the flux at a given wavelength, representative of real data at signal-to-noise ratio SNR ∼ 20. We conduct data augmentation to scale 10 000 noise-less spectra in our base training set to 2 **×** 10^{6} spectra with Gaussian noise used in our SBI framework (see section 3).

We also create an additional test of pectra conduct posterior diagnostics. We generate a set of 1000 spectra with noise, from the same parameter prior range.

## 3.Inference methodology

### 3.1.SBI

Our objective is to calculate posterior distributions of the galaxy parameters derived from a typical SED analysis, where ** θ ** is the set of galaxy properties, and

**represents the galaxy spectra. We do this by training our SBI model on the large stochastically-sampled training set of SEDs described in section 2. We utilise Neural Posterior Estimation (NPE) (Papamakarios and Murray 2016, Greenberg**

*x**et al*2019)) which relies on neural networks to train on simulated SEDs with realistic noise, and allow us to estimate 'amortized' posterior distributions.

SBI/NPE requires computational time in advance of the actual inference, and evaluates the posterior for different observations without having to re-run inference (this is known as amortization). This 'amortized' calculation of posteriors then allows us to infer the posteriors of a 'real' galaxy with computational time 1 s. For more details and examples of amortized neural network-based posterior estimation, see Greenberg *et al* (2019) or section 2 of Hahn and Melchior (2022). We provide a short summary below.

NPE uses 'normalizing flows' (Tabak and Turner 2013) as a density estimator, which employs an invertible bijective transformation to map a complex distribution (i.e. the true posterior distribution in SED model parameter space) to a simpler and faster-to-calculate distribution (often Gaussian, or a combination of Gaussians). This results in the calculation of approximate posterior distributions, that are assumed to be a good approximation of the underlying posterior distributions of parameters. In particular, we use Masked Autoregressive Flow (MAFs; Uria *et al* 2016, Papamakarios *et al* 2017) incorporated within `sbi` (similar to Hahn and Melchior 2022). MAFs perform well in modelling conditional probability distributions, such as posteriors (see section 2 of Hahn and Melchior 2022 for more details).

### 3.2.This work

See figure 2 for a depiction of the SBI architecture used in this work. We use a supervised learning pipeline within an SBI framework via the Macke Lab `sbi` toolbox (Tejero-Cantero *et al* 2020). To demonstrate a proof-of-concept, we train on 2 **×** 10^{6} simulations (where each simulation is a noise-added version of an SED in our training set) in an NPE framework. We use 25 hidden units and 10 transform layers without an embedding network in this framework. Our model trains on features in the raw simulated data; this model converges after 87 epochs and takes ∼14 h to train. This analysis generates the set of approximate posterior distributions for our five parameter SED model. We also test on other combinations of hidden units and transform layers, and choose the above as the fiducial choice with more robust results.

### 3.3.Posterior diagnostics

We evaluate the results using a variety of statistical and diagnostic mechanisms. First, to test the precision of our framework, we compare the recovered SED parameter values with the true values ** θ ** of the parameters from our test set. Secondly, to test the health of the calculated posteriors, we also perform posterior predictive checks (PPCs) and simulation-based calibration (SBC) checks (Talts

*et al*2018). PPCs validate that the model SEDs corresponding to the distribution of

*θ*values in a given posterior fall within the allowed range. We do this by cross-checking whether our best-fit model-based spectrum looks similar to observed data

*x*(see figure 3).

SBCs provide a quantitative insight into whether the posterior uncertainties are balanced. In this test, we sample values from the priors, and simulate observations (using our simulator) from these parameters. Following this, we perform inference given each of these observations, which generates SBC posteriors of their own. For a healthy posterior, the SBC ranks of ground truth parameters under the inferred posteriors should follow a uniform distribution (rank plots aid in visually confirming this; see figure 5).

Finally, we compare our results to an MCMC analysis of representative SEDs from the test set.

### 3.4.MCMC analysis

We fit the same five-parameter SED model to representative galaxy SEDs in the test set using the inference framework Prospector, which calculate Markov-Chain Monte Carlo (MCMC)-based posterior distributions.

We assume the following likelihood function:

where () are *n* independent spectral flux elements assumed to be drawn from a Gaussian distribution, and corresponds to the model spectrum for a given parameter set ** θ **.

We use the same ** θ ** prior range and shape as the SBI/NPE analysis, in order to calculate the posterior . We use

`emcee`Foreman-Mackey

*et al*(2013) to conduct the MCMC posterior sampling with 128 walkers, 128 iterations and a burn-in with the step set [4096, 4096, 2048, 512].

Note that non-Gaussian or correlated uncertainties are seen in spectral datasets (e.g. magnitude upper limits in the case of non-detections), which are not accurately captured by the above likelihood, making a 'likelihood-free inference' like SBI the ideal choice for this analysis. The results from the SBI analysis, posterior diagnostics, and MCMC comparison are shown in section 4.

### 3.5.Computing resources

For our SBI analysis, we use the Python 3 Google compute engine backend (with the CPU processor AMD EPYC 7B12), which for our network architecture takes ∼14 CPU hours to train 2 **×** 10^{6} simulated spectra with noise. For every subsequent posterior estimation, this setup takes ∼0.3 s.

For our serialized MCMC calculations, we utilise Prospector runtime on a 2.7 GHz Quad-Core Intel Core i7 processor, which takes ∼14 CPU hours to converge.

## 4.Results

In this proof-of-concept analysis, we test out to set whether an SBI framework can train on realistic noisy galaxy spectra to estimate amortized posteriors robustly. To test the efficacy of our SBI framework, we use a test set of 1000 randomly sampled SEDs from our prior range (see table 1 and section 2).

One such result is shown in the top panel of figure 3, for a galaxy with logM = 10.51 (), , age = 1.63 Gyr, *τ* = 0.11 Gyr, and dust = 0.65 (a metal-poor dusty galaxy). We plot pairwise posterior distributions estimated from both the MCMC (in blue) and SBI (or NPE, in orange) in order to compare constraints across the five-parameter SED model. The truth values are plotted with black dotted lines. In the bottom panel of figure 3, we show the maximum *a posteriori* SED models and model residuals from the SBI/NPE (black) and MCMC (blue) analyses. Also overplotted are 1000 randomly sampled SEDs from the posteriors (in grey).

We find excellent agreement between the median values of parameters across MCMC and SBI/NPE posteriors (when marginalized over other parameters); these values are also accurate relative to the true parameter values ** θ **. We also observe that the age and metallicity constraints are similar in both analyses for this test galaxy, while MCMC mass and dust estimation is more precise relative to SBI/NPE. This is the first demonstration that the proof-of-concept analysis presented here is effective at recovering galaxy SED parameters with spectroscopic observations.

On running inference on a sample of 1000 test galaxies sampled from the 16th–84th percentile range of priors in this study, we find accurate recovery of SED parameters. See figure 4 for a comparison between true and recovered values of each parameter, as well as *χ* plots to show goodness-of-fit across the simulated spectroscopic dataset. The recovered values here are the 50th percentile parameter values, and the uncertainties correspond to confidence intervals between 16th–84th percentile in the posteriors. Here, we demonstrate accurate and precise parameter recovery across the majority of the prior range, specifically for stellar mass (logM), while the constraints on metallicity and age are seen to be wider and less precise. We also note that in our analysis, the recovered parameters are the most biased at the edges of the prior ranges, which indicates that the underlying posterior distribution is not being captured in these parameter ranges. For example, the 16th, 50th and 84th percentile of parameter values for a given galaxy are not accurate descriptors of the underlying posteriors near the edges of the prior range. This can be potentially solved by training on a spectroscopic dataset sampled from a prior range marginally wider than the target spectroscopic survey. We also find no substantial difference in the quality of parameter recovery for star forming galaxies (emission line galaxies; median values of the age parameter 0.75 Gyr), or quiescent systems (galaxies with spectra containing strong absorption line indices; median values of the age parameter 2.5 Gyr).

We also run extensive PPC and SBC checks to test the accuracy and precision of our SED parameter values, where we find that the posterior distributions in this analysis are well converged. See figure 5 for rank distributions for each parameter—a healthy posterior follows uniform distribution (non-uniform distributions indicates a poorly calibrated posterior; Talts *et al* 2018). This demonstration of well-calibrated uncertainties in our SBI analysis is confirmed in figure 6, where we plot the rank cumulative distribution function on the left panel. In both figures, the grey region corresponds to the 95% confidence interval of a uniform distribution, which our parameter rank distributions follow.

The right panel of figure 6 shows the probability coverage curve of each SED parameter ** θ **. The principle behind a probability coverage plot is as follows: a well-calibrated posterior estimator will produce—for an ensemble of SEDs in the test set—parameter uncertainties that accurately reflect the true underlying uncertainty in the ensemble, e.g. a 68% posterior volume will contain 68% of the true SED parameter values of the test ensemble. Generalizing from this, by plotting the fraction of SED parameters in the test set which fall within the posterior volume as a function of the posterior volume, we obtain curves such as those seen in the right panel of figure 6. A well-calibrated estimator will produce probability coverage curves that closely trace the diagonal dashed line, while posterior estimates that are under-confident (i.e. over-estimate uncertainties) will produce curves that fall on the upper-left part of the plot.

For our five-parameter model, we see that the NPE has fairly well-calibrated uncertainty predictions for the age and *τ* parameters. On the other hand, it tends to over-predict the posterior uncertainties for three parameters—stellar mass, dust and metallicity. This is consistent with the results in figure 4—in the goodness-of-fits plots, the scatter in the differences between the predicted and true parameters values across the test set is smaller than what their error bars would suggest. However, we are encouraged by the fact that the NPE analysis (a) accurately predicts the median best-fit values across the test set for those 3 parameters, and (b) in most scientific applications, over-predicting the uncertainties in an analysis is preferable than the alternative. We aim to continue to further improve the posterior uncertainty calibrations in future work.

We also demonstrate here a significant improvement in inference speeds compared with MCMC inference. As mentioned above, the SBI/NPE model uses 25 hidden units and 10 transform layers: this computation takes ∼14 h to train on a CPU, and ∼0.3 s per posterior estimation thereafter (MCMC calculation for a single galaxy takes ∼24 h in our setup). Accounting for the cost of training, we can effectively infer accurate posteriors of 0.9–1.2 **×** 10^{5} galaxy spectra via SBI/NPE at the cost of 1 MCMC-based fit. This demonstrates that the amortization of posterior estimation in SBI with accurate recovery of SED parameters is the biggest advantage of our proof-of-concept analysis.

### 4.1.Caveats

This work presents NPE analyses on spectra generated using (a) empirical templates within flexible stellar population synthesis (FSPS) (MILES spectral libraries, and MIST isochrones; Falcón-Barroso *et al* 2011, Choi *et al* 2016), and (b) with an assumption of uniform (or nearly uniform) SNR and Gaussian noise across wavelength and for each galaxy spectrum. These assumptions have impact on both NPE and MCMC inference of galaxy parameters.

For example, the age-metallicity-dust degeneracy is a systematic that is limited by the information content of the templates, and the wavelength range sampled by the spectra that may or may not contain discriminating information. This affects both MCMC and NPE analyses, and is expected to be mitigated with upcoming spectroscopic surveys of statistical samples of galaxies (DESI, or the Prime Focus Spectrograph; Greene *et al* 2022) as well as simulation studies such as UniverseMachine Behroozi *et al* (2013) and FIRE Ma *et al* (2018).

In addition, SED inference using spectra is only weakly impacted by small (10%–20%) variations in SNRs across wavelengths (especially in the stellar continuum portions of a given spectrum. Several studies analysing quiescent galaxy spectra Carnall *et al* (2019), Khullar *et al* (2022), Tacchella *et al* (2022) as well as photometric SED fitting and parameter recovery Gladis Magris *et al* (2015), Leja *et al* (2017a) seem to point towards this trend. Star-forming galaxies (with strong emission lines) have wavelength regions with peaked SNRs, that improves constraints on parameters such as instantaneous SFRs and ages. This biases our inferences in favor of young stellar populations with emission line spectra, albeit weakly. In figure 4, we observe that both older and younger stellar populations are recovered with similar precision. Finally, we expect spectroscopic surveys like DESI, PFS and the Roman High-Latitude Survey to improve this systematic uncertainty, as they attempt to reach nearly uniform SNRs across wavelength.

## 5.Conclusion and next steps

In this work, we demonstrate a proof-of-concept for amortized NPE with an SBI framework, that utilizes simulated low/medium-resolution galaxy spectra. This is the first-of-its-kind demonstration of this technique on spectra, that will enable precise and rapid estimation of galaxy parameter posteriors for billion-galaxy surveys. We also show here a significant improvement in inference speeds, while maintaining accuracy in the recovery of parameters, with precision comparable or moderately weaker than MCMC constraints.

While this work focuses on using an SBI framework to train on galaxy spectra directly (without any summary statistics or embedded nets), we wish to scale this analysis with graphics processing unit (GPU)-processing on suitable summary statistics (Khullar *et al* in prep). Moreover, the combination of highly complex SED models (Leja *et al* 2019a, Suess *et al* 2022b) and high-resolution spectroscopy will enable precise constraints on star formation histories of galaxies. This is especially true in the era of JWST (Labbe *et al* 2022, Leethochawalit *et al* 2022, Nanayakkara *et al* 2022, Suess *et al* 2022a) and Roman Space Telescope (High Latitude Survey; Wang *et al* 2022), where SED analysis of systematic spectroscopic surveys will be bolstered with an SBI framework.

Finally, when using simulation-trained SBI models on future survey data, it is important to consider possible performance issues that will arise from small differences between simulated and real data (due to approximations, unknown physics or computational constraints, imperfect simulation of noise and other observational effects). The drop in performance of simulation-trained models that are applied to real data is a known issue that affects all deep learning models. Mitigation of these problems is an active area of research, which already led to the development of a broad group of methods called *Domain Adaptation* (Csurka 2017, Wang and Deng 2018). These methods allow deep learning model to learn the features shared between simulated and real data and use only these features for inference. This leads to better alignment of the two data distributions in the latent space of the deep learning model, which leads to improved performance (Ćiprijanović *et al* 2021, 2022). In future work, we will include domain adaptation methods in our SBI frameworks.

## Acknowledgments

The authors thank Alexander Ji, Egor Danilov, Michael D Gladders for their comments and feedback in the planning and analysis of this work. G K thanks the URA Visiting Scholars Program, 2021, for funding this work through graduate student salary support.

The authors are grateful to the reviewers of the journal MLST for their extremely thoughtful and helpful comments; their efforts and feedback have substantially improved the quality of the manuscript.

This manuscript has been supported by Fermi Research Alliance, LLC under Contract No. DE-AC02-07CH11359 with the U.S. Department of Energy (DOE), Office of Science, Office of High Energy Physics.

The authors of this paper have committed themselves to performing this work in an equitable, inclusive, and just environment, and we hold ourselves accountable, believing that the best science is contingent on a good research environment.

We acknowledge the Deep Skies Lab as a community of multi-domain experts and collaborators who have facilitated an environment of open discussion, idea-generation, and collaboration. This community was important for the development of this project.

## Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).

## Author contributions

G Khullar: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Visualization, Writing of original draft; B Nord: Conceptualization, Investigation, Methodology, Project administration, Resources, Supervision, Writing (review & editing), Acquisition of the financial support for the project leading to this publication; A Ćiprijanović: Investigation, Methodology, Analysis, Project administration, Resources, Software, Supervision, Writing (review & editing); J Poh: Methodology, Analysis, Resources, Writing (review & editing); F Xu: Methodology, Resources.

## Data and code availability statement

The code and dataset used to perform the experiments presented in this paper is openly available in our GitHub repository:

https://github.com/deepskies/digs_sbi

Access to the repository and data are available upon publication.