CMA Spectral Unfolder
Multi-species spectral unfolding from scintillator stack detector data using CMA-ES (Covariance Matrix Adaptation Evolution Strategy), with Huber-robust fidelity, Tikhonov / logistic smoothness regularisation, and optional per-detector calibration tuning.
This code reconstructs particle energy spectra from a stack of scintillator detectors by inverting a pre-computed FLUKA Monte Carlo response matrix (RM). The inverse problem is ill-posed: the code solves it by minimising a regularised objective function using the derivative-free CMA-ES optimizer, supporting up to three particle species simultaneously.
The optimizer is designed for high-energy-density physics (HEDP) diagnostics, specifically scintillator-based calorimeters used in laser–plasma interaction experiments. However, the RM-inversion framework is general and applicable to any detector system for which a Monte Carlo response matrix is available.
Physics Background
The Inverse Problem
A stack of N scintillator slabs records the energy deposited by a broad-spectrum particle beam. The measured signal in detector channel \(d_j\) (\(j = 1 \ldots N_\text{FLUKA}\)) is a linear superposition of mono-energetic responses:
where \(R_j(E_i)\) is the response matrix element (energy deposited in detector \(j\) by a particle of energy \(E_i\)), and \(S_i = (dN/dE)_i\) is the unknown spectrum evaluated at the \(i\)-th energy bin. In matrix form: \(\mathbf{d} = \mathbf{R}\,\mathbf{s}\).
Why CMA-ES?
The response matrix is typically rank-deficient or severely ill-conditioned (condition numbers of \(10^6\)–\(10^{12}\) are common for thick-stack calorimeters). Classical linear inversions (SVD truncation, Tikhonov) struggle with positivity constraints and multi-species superposition. CMA-ES is a gradient-free evolutionary strategy that handles:
- Non-convex, multi-modal landscapes
- Box constraints (positivity via log-space parameterisation)
- Heterogeneous variable types (spectra + calibration factors)
- Arbitrary regularisation terms without differentiability requirements
Multi-Species Unfolding
When the detector is sensitive to multiple particle types (e.g. photons + electrons), the measured signal is:
where \(k\) indexes the species, each with its own RM and energy grid. The optimiser solves for all species simultaneously, which breaks the degeneracy that arises from fitting a single RM to a mixed signal.
Architecture
The code is organised into six classes and a set of free functions:
| Component | Role |
|---|---|
Config | Global parameters: paths, energy grids, ROI, calibration, error files |
SpeciesConfig | Per-species RM + energy grid + smoothing hyperparameters |
MultiSpeciesOptimizer | CMA-ES driver: builds objective, runs evolution loop |
OptimizationResult | Structured output container with accessor methods |
DataProcessor | RM import and experimental image reading |
ErrorAnalysis | 2-D error matrix interpolation and error-bar plotting |
Plotter | Standard diagnostic plots |
Getting Started
Installation
Requirements
- Python ≥ 3.9
- NumPy, Pandas, Matplotlib, Pillow, SciPy
cma(pycma) — the CMA-ES implementation by N. Hansen
# Install dependencies
pip install numpy pandas matplotlib pillow scipy cma
# Clone and enter the project
git clone https://github.com/your-org/cma-unfolder.git
cd cma-unfolder
CMA-ES scales as \(\mathcal{O}(n^2)\) per generation with the number of
variables \(n\). With N_guess = 25 per species and
calibration tuning off, \(n = 25\) (or 30 for two species). This is
well within the efficient regime. Do not exceed
N_guess ≈ 200 on a workstation without switching to a
large-scale CMA variant.
Directory Structure
cma-unfolder/
├── CMA_optimizer.py # Main script
├── images/
│ └── raw_data.tiff # Scintillator image
├── RM/
│ └── Response_matrix_double_population/
│ ├── Response_matrix_p/ # Photon RM files
│ │ ├── 0.05_Energy.txt
│ │ ├── 0.05_Spectrum.txt
│ │ ├── 0.05_FLUKA.txt
│ │ ├── 0.1_Energy.txt
│ │ └── ...
│ └── Response_matrix_e/ # Electron RM files
│ └── ...
├── Error/
│ ├── mean1_VAC.txt # Error level 1 (signal > 10%)
│ ├── mean2_VAC.txt # Error level 2 (1% < signal < 10%)
│ ├── mean3_VAC.txt # Error level 3 (0.1% < signal < 1%)
│ └── mean4_VAC.txt # Error level 4 (signal < 0.1%)
└── README.md
Response Matrix Format
Each mono-energetic FLUKA simulation at energy \(E_k\) (in MeV) produces a triplet of files, named using the energy value as prefix:
| File | Contents | Shape |
|---|---|---|
{E_k}_Energy.txt |
Energy bin edges of the simulated spectrum | \(N_\text{sim} \times 1\) |
{E_k}_Spectrum.txt |
Deposited energy spectrum (reserved, not currently used) | \(N_\text{sim} \times 1\) |
{E_k}_FLUKA.txt |
Integrated detector response per scintillator slab | \(N_\text{FLUKA} \times 1\) |
The file prefix must be a parseable floating-point number (e.g.
0.05, 1.5, 100). The importer
sorts by this value and uses it as the RM energy axis.
Configuration — Config
All user-tunable parameters live in the Config class as
class-level attributes. Modify this class to adapt the code to a new
experimental setup. No constructor is needed — all values are set at
import time.
Energy Grids
| Attribute | Type | Description |
|---|---|---|
N_guess | int | Number of spectral bins (CMA variables per species). Default 25. |
E_guess_range | tuple | \(\log_{10}(E_\text{min}/\text{MeV}),\;\log_{10}(E_\text{max}/\text{MeV})\) |
E_guess | ndarray | Log-spaced energy centroids, shape (N_guess,) |
smooth_factor | float | Smoothness penalty weight, auto-scaled by mean Δlog₁₀E |
The number of spectral bins controls the trade-off between spectral
resolution and numerical stability. Too few bins under-resolve spectral
features; too many introduce null-space degrees of freedom that the
regulariser must suppress. Start with N_guess ≈ N_FLUKA
and increase if the residuals show systematic structure.
ROI & Calibration
The ROI array defines pixel rectangles for each scintillator
slab in the experimental image. Each row is
[y_min, y_max, x_min, x_max]. The factor array
contains per-detector calibration multipliers applied to the raw FLUKA
output during RM import.
The factor array must be determined experimentally for your
specific detector stack. If your RM is already properly calibrated, set
all entries to 1.0. Incorrect calibration factors will bias
the unfolded spectrum systematically.
Error Analysis Files
Four text files, each containing a vector of length len(E_error),
encoding the relative unfolding uncertainty at four signal-to-noise thresholds.
These are pre-computed by running Calc_errors.py on the RM.
| File | Signal Regime |
|---|---|
mean1_VAC.txt | Signal / peak > 10% |
mean2_VAC.txt | 1% < signal / peak < 10% |
mean3_VAC.txt | 0.1% < signal / peak < 1% |
mean4_VAC.txt | signal / peak < 0.1% |
Second Species (Electrons)
When n_species = 2, the code loads a second RM from
folder_path_e with its own energy grid
(E_guess_e, N_guess_e). The electron
energy range and smoothing weight are configured independently.
Forward Model
For a single species with spectrum \(\{S_i\}_{i=1}^{N_\text{guess}}\) in log₁₀ space, the predicted detector signal is:
where:
- \(R_{ij}\) =
FLUKA_tot[nearest[i], j]— normalised RM element - \(f_i\) =
FLUKA_fact[nearest[i]]— peak normalization factor (restores absolute scale) - \(c_j\) =
facts[j]— optional per-detector calibration correction (\(\approx 1\))
For \(K\) species the total signal is the sum:
\(\hat{d}_j = \sum_{k=1}^{K} \hat{d}_j^{(k)}\).
This is implemented in _calc_FLUKA_species() and is fully
vectorised over energy bins.
Loss Function
The total objective minimised by CMA-ES is:
Fidelity Term
Relative squared residuals with Huber robustification:
where \(H_\delta(r) = \frac{1}{2}\min(|r|,\delta)^2 + \delta\,(|r| - \min(|r|,\delta))\) transitions from quadratic to linear at threshold \(\delta\). This suppresses the influence of outlier detector channels (e.g. damaged scintillators or channels with poor RM coverage).
The default huber_delta = 1e-2 means that relative residuals
above ~10% are down-weighted linearly rather than quadratically. Increase
δ if your data is clean; decrease it if you suspect systematic errors in
specific channels.
Smoothness Penalties
Simple (Tikhonov-like)
When adaptive_smoothing = False:
This is the squared \(\ell_2\) norm of the second finite difference — a discrete approximation to the curvature penalty \(\int |S''(E)|^2 dE\).
Logistic-Weighted
When adaptive_smoothing = True, the penalty uses first
differences with bin-dependent weights:
where \(x = i/(N-1)\) is the normalised bin index and \(\sigma\) is the logistic function. This reduces regularisation strength at the high-energy tail where the RM sensitivity drops and the spectrum is expected to fall sharply, avoiding the artificial flattening that uniform smoothing produces near the cutoff.
| Parameter | Attribute | Default | Effect |
|---|---|---|---|
| Onset | smooth_transition_start | 0.85 | Fraction of bins before weight starts dropping |
| Floor | smooth_min_weight | 0.01 | Minimum weight at the tail (0 = no regularisation) |
| Steepness | smooth_steepness | 30 | Logistic transition sharpness |
Calibration Factor Tuning
When tune_calibration = True, \(N_\text{FLUKA}\) additional
variables are appended to the CMA vector, representing per-detector
multiplicative corrections \(c_j\). They are initialised near 1.0 and
constrained by a smooth quartic barrier:
where \(\text{sp}(x) = \frac{1}{k}\log(1 + e^{kx})\) is the softplus function with sharpness \(k = 300\). This allows small deviations from unity while strongly penalising large excursions.
Enable calibration tuning only when you have reason to believe the
pre-computed factor array has residual errors (e.g. from
aging scintillators or position-dependent light collection). The
additional degrees of freedom can absorb real spectral features if the
penalty weight facts_penalty_weight is too low.
CMA-ES Internals
The optimizer uses pycma with the following default settings:
| Option | Value | Rationale |
|---|---|---|
sigma0 | 3.0 | Initial step size (log₁₀ scale; covers ~3 decades) |
CMA_stds | 5.0 (spectra), 0.05 (facts) | Heterogeneous scaling for different variable types |
popsize | max(17, 4+3ln n) | Default CMA heuristic, floor at 17 |
maxiter | 100,000 | Upper bound; typically converges in 5k–30k |
tolx | 1e-8 | Convergence on variable change |
CMA_active | 1 | Active CMA (exploits negative covariance information) |
The objective function is dispatched once before the loop via
_build_objective_fn(), which selects among four variants
(base / smooth / calib / smooth+calib) based on the active flags.
This eliminates per-evaluation branching overhead.
Optimisation Vector Layout
x = [ S_photon (N_guess_p) | S_electron (N_guess_e) | facts (N_FLUKA) ]
←── spectral vars ──────────────────────────→ ←── optional ──→
API Reference
SpeciesConfig
Dataclass. Container for one particle species' RM and optimiser settings.
Constructor Parameters
| Name | Type | Description |
|---|---|---|
label | str | Identifier (e.g. 'photon', 'electron') |
E | ndarray (N_data,) | RM energy axis (MeV, ascending) |
E_guess | ndarray (N_guess,) | Desired bin centroids; snapped to RM axis by resolve() |
FLUKA_tot | ndarray (N_data, N_FLUKA) | Normalised detector RM |
FLUKA_fact | ndarray (N_data, 1) | Peak normalisation factors |
smooth_factor | float | Smoothness penalty weight. Default 1.3e-5 |
smooth_transition_start | float | Logistic onset (0–1). Default 0.85 |
smooth_min_weight | float | Logistic floor. Default 0.01 |
smooth_steepness | float | Logistic k. Default 30.0 |
Methods
resolve()
Must be called once before optimisation. Maps E_guess onto the
RM energy axis via nearest-neighbour lookup (injective), sets
nearest_indices and N_guess.
MultiSpeciesOptimizer
Constructor Parameters
| Name | Type | Default | Description |
|---|---|---|---|
species_list | List[SpeciesConfig] | — | 1–3 species configs |
Exp_FLUKA | ndarray | — | Normalised experimental signal |
tune_calibration | bool | False | Optimise per-detector factors |
smoothing | bool | True | Enable smoothness regularisation |
adaptive_smoothing | bool | False | Logistic (True) vs simple (False) |
lower_bound | float | -20 | CMA lower bound (log₁₀) |
upper_bound | float | 20 | CMA upper bound (log₁₀) |
facts_bounds | tuple | (-0.2, 0.2) | Calibration factor range around 1 |
huber_delta | float | 1e-2 | Huber transition threshold |
facts_penalty_weight | float | 1e-2 | Weight on calibration constraint |
cma_options | dict | None | Override CMA-ES settings |
Methods
run_CMA() → OptimizationResult
Execute the CMA-ES evolution loop. Returns a structured
OptimizationResult.
calc_FLUKA(result) → ndarray
Reconstruct the total detector signal from an OptimizationResult.
Useful for computing post-hoc residuals.
mmin(x) → float
Internal objective function called by CMA-ES. Unpacks x and
delegates to the pre-selected variant.
OptimizationResult
Dataclass returned by run_CMA().
| Attribute | Type | Description |
|---|---|---|
spectra_linear | list[ndarray] | Unfolded spectra (linear scale), one per species |
spectra_log | list[ndarray] | Same in log₁₀ (raw CMA output) |
facts | ndarray or None | Optimised calibration factors |
species_labels | list[str] | Species names |
E_guess_list | list[ndarray] | Energy grids (snapped to RM) |
es_result | CMAResult | Raw pycma result object |
get(label) → (E_guess, spectrum_linear)
Convenience accessor. Raises KeyError if the label is not found.
E, spec = result.get('photon')
plt.loglog(E, spec)
DataProcessor
import_RM() → (Spectrum_tot, FLUKA_tot, FLUKA_fact, E)
Reads all *_FLUKA.txt / *_Energy.txt file
triplets from Config.folder_path. Applies
Config.factor, peak-normalises each row, and returns arrays
sorted by ascending energy.
read_image(norm_flag=True, plot_flag=False) → ndarray
Opens Config.image_path, extracts integrated pixel intensity
per ROI. If norm_flag is True, each value is divided by the
ROI pixel count.
find_nearest(array, values) → ndarray
Injective nearest-neighbour mapping: returns one unique index per value.
Raises ValueError if any value falls outside the array range.
ErrorAnalysis
Constructor
ErrorAnalysis(E_error, error_files, ddv, window_size=200)
Loads four error-level files and builds a 2-D error matrix over
(normalised signal, energy), smoothed with a moving-average kernel
of width window_size.
get_errors(x_vals, data_vals) → ndarray
Bilinear interpolation on the error matrix. data_vals is
peak-normalised internally. Returns NaN where interpolation fails
(out-of-bounds).
plot_error_results(E_guess, signal, y_errors, std)
Produces a publication-ready log–log plot with error bars combining
unfolding uncertainty, systematic error (std), and
Poisson counting noise.
Plotter
plot_results(Exp_FLUKA, FLUKA_sim)
Overlays experimental and reconstructed detector signals (normalised).
plot_spectrum(E_guess, sim)
Log–log plot of the unfolded spectrum.
Helper Functions
| Function | Description |
|---|---|
_smoothness_simple(spectrum, E_guess) | Second-difference curvature penalty |
_smoothness_logistic(spectrum, ...) | Logistic-weighted first-difference penalty |
_huber_loss(residuals, delta, weights) | Element-wise Huber loss |
_softplus(x, k) | Smooth ReLU approximation |
_facts_penalty(facts, k) | Quartic barrier for calibration factors |
make_synthetic_exp(...) | Forward-model a known spectrum into synthetic detector data |
Tutorial: Single-Species Unfolding
The simplest use case: unfold a photon spectrum from 17 scintillator channels.
from CMA_optimizer import (
Config, DataProcessor, SpeciesConfig,
MultiSpeciesOptimizer, Plotter
)
# 1. Load RM
_, FLUKA_tot, FLUKA_fact, E = DataProcessor.import_RM()
# 2. Read experimental data
Exp_FLUKA = DataProcessor.read_image(norm_flag=True)
Exp_max = Exp_FLUKA.max()
Exp_FLUKA /= Exp_max
# 3. Configure species
species = SpeciesConfig(
label='photon',
E=E,
E_guess=Config.E_guess.copy(),
FLUKA_tot=FLUKA_tot,
FLUKA_fact=FLUKA_fact,
smooth_factor=5e-6,
)
# 4. Run optimizer
opt = MultiSpeciesOptimizer(
species_list=[species],
Exp_FLUKA=Exp_FLUKA,
smoothing=True,
)
result = opt.run_CMA()
# 5. Plot
E_out, spec = result.get('photon')
Plotter.plot_spectrum(E_out, spec * Exp_max)
Tutorial: Multi-Species (Photon + Electron)
# After loading photon RM as above...
# Load electron RM by temporarily swapping folder_path
_backup = Config.folder_path
Config.folder_path = Config.folder_path_e
_, FLUKA_tot_e, FLUKA_fact_e, E_e = DataProcessor.import_RM()
Config.folder_path = _backup
species_e = SpeciesConfig(
label='electron',
E=E_e,
E_guess=Config.E_guess_e,
FLUKA_tot=FLUKA_tot_e,
FLUKA_fact=FLUKA_fact_e,
smooth_factor=1.3e-9,
smooth_transition_start=0.30,
smooth_min_weight=0.10,
smooth_steepness=10,
)
opt = MultiSpeciesOptimizer(
species_list=[species_p, species_e],
Exp_FLUKA=Exp_FLUKA,
smoothing=True,
)
result = opt.run_CMA()
# Access each species independently
E_p, spec_p = result.get('photon')
E_e, spec_e = result.get('electron')
The electron species typically requires a much lower
smooth_factor and an earlier logistic transition onset
because electron RMs tend to have sharper energy dependence and
fewer spectral bins.
Tutorial: Synthetic Closure Test
The main() function demonstrates a closure test: a known
analytical spectrum is forward-modelled through the RM to produce synthetic
data, then recovered by the optimizer. This validates the pipeline
end-to-end.
from CMA_optimizer import make_synthetic_exp
# Define a known exponential photon spectrum
E_temp = 10 # MeV temperature
known = (1 / Config.E_guess) * np.exp(-Config.E_guess / E_temp)
# Generate synthetic detector signal
Exp_FLUKA = make_synthetic_exp(FLUKA_tot, FLUKA_fact, E, Config.E_guess, known)
Exp_max = Exp_FLUKA.max()
Exp_FLUKA /= Exp_max
# Run optimizer and compare result.spectra_linear[0] against `known`
If the closure test fails to recover the input spectrum, possible causes
include: insufficient N_guess, incorrect
smooth_factor, or RM energy coverage that does not span the
input spectrum. Always run a closure test before analysing real data.
Tutorial: Calibration Factor Tuning
opt = MultiSpeciesOptimizer(
species_list=[species_p],
Exp_FLUKA=Exp_FLUKA,
tune_calibration=True,
facts_penalty_weight=1e-2,
smoothing=True,
)
result = opt.run_CMA()
# Inspect optimised calibration factors
print(result.facts) # should be close to 1.0
With calibration tuning enabled, the optimizer has
\(N_\text{guess} + N_\text{FLUKA}\) free parameters but only
\(N_\text{FLUKA}\) constraints. The problem is under-determined
unless the smoothness penalty and facts penalty jointly constrain
the solution. Monitor result.facts — if values drift
beyond ±10%, increase facts_penalty_weight.
Hyperparameter Guide
| Parameter | Typical Range | Effect of Increase |
|---|---|---|
N_guess | 10–100 | Higher spectral resolution; more ill-posed |
smooth_factor | 1e-7 – 1e-3 | Smoother spectrum; risk of over-regularisation |
huber_delta | 1e-3 – 1e-1 | Less outlier robustness; closer to L2 loss |
facts_penalty_weight | 1e-3 – 1 | Facts pinned closer to 1; less flexibility |
smooth_transition_start | 0.3 – 0.95 | Later onset → more bins fully regularised |
sigma0 | 1 – 5 | Broader initial search; slower convergence |
popsize | 17 – 100 | Better exploration; more evaluations per generation |
Start with the closure test. Adjust smooth_factor until the
recovered spectrum matches the input within the expected error bars. Then
apply to real data. If the residual plot shows systematic structure
(oscillations in the detector-signal comparison), the smoothness is too
high; if the spectrum shows high-frequency noise, it is too low.
Diagnostics & Debugging
Convergence Checks
- Inspect
result.es_result.stop()to see which CMA termination criterion fired. - Use
cma.plot()on the logger for convergence curves (objective value, σ, axis ratios). - If
maxiteris reached without convergence, increase it or widensigma0.
Residual Analysis
FLUKA_sim = opt.calc_FLUKA(result)
residual = (Exp_FLUKA.flatten() - FLUKA_sim / FLUKA_sim.max())
plt.bar(range(Config.N_FLUKA), residual)
plt.ylabel('Relative residual')
plt.xlabel('Detector channel')
plt.show()
Common Failure Modes
Spectrum hits the upper/lower bound
If many bins saturate at upper_bound or
lower_bound, widen the bounds. This typically happens when
the normalisation of Exp_FLUKA is inconsistent with the RM
scale, or when the RM does not cover the energy range of the true spectrum.
Optimizer converges to wrong spectrum
Run the closure test first. If it passes but real data fails, the issue
is likely in the RM (missing physics, incorrect calibration) or the
experimental data (background subtraction, saturation). Try enabling
tune_calibration to absorb per-channel systematics.
CMA-ES stalls (σ collapses prematurely)
Increase popsize (e.g. 50–100) and/or set
'tolfun': 1e-12 in cma_options. Premature
σ-collapse often indicates a flat or degenerate landscape — check
that the RM has sufficient rank for the chosen N_guess.
Performance & Scaling
| Configuration | N_vars | Typical Runtime |
|---|---|---|
| 1 species, 25 bins, no calib | 25 | 1–5 min |
| 2 species (25+5), no calib | 30 | 3–10 min |
| 2 species + calibration | 47 | 10–30 min |
| 1 species, 100 bins | 100 | 30–90 min |
Runtimes are approximate for a single-core desktop (3 GHz). CMA-ES is
inherently sequential in its covariance update, but the objective function
evaluations within each generation are independent and could be parallelised
via cma.CMAEvolutionStrategy's eval_parallel
callback. The forward model itself (_calc_FLUKA_species) is
fully vectorised with NumPy.
Extending the Code
Adding a Third Species
Create a third SpeciesConfig with its own RM and energy grid,
then pass all three to MultiSpeciesOptimizer. The code supports
up to 3 species natively.
Custom Regularisation
To add a new regularisation term (e.g. entropy, L1 sparsity), define a
function with signature f(spectrum: ndarray) → float and add
it to the appropriate _objective_* method. Create a new
objective variant if needed and register it in
_build_objective_fn().
Alternative Fidelity Metrics
The Huber loss can be replaced by any robust estimator (Cauchy, Tukey
biweight) by modifying _huber_loss(). The relative-residual
normalisation \((d - \hat{d})^2 / d^2\) can also be changed to absolute
residuals or Poisson log-likelihood depending on the noise model.
Known Issues
-
Config as mutable class attributes: swapping
Config.folder_pathto load a second RM is a side-effect pattern. A future refactor should pass the folder path as an argument toimport_RM(). -
ErrorAnalysis double-smoothing: the error matrix is
smoothed once inside
_create_error_matrix()and again in__init__. This is inherited from the original code and produces slightly more diffuse error estimates than a single pass. -
Injective nearest-neighbour in
find_nearest: if twoE_guessvalues map to the same RM index, the second is bumped to the next available index. This can shift bins at the edges of the energy grid. Prefer well-separatedE_guessvalues. -
No automatic normalisation consistency check between
Exp_FLUKAand the RM scale. The user must ensure consistent units.
License
This software is released under the PolyForm Noncommercial License 1.0.0. You may use, modify, and distribute the code for any non-commercial purpose. Commercial use requires a separate license from the author.
© 2025 G. Fauvel — CMA Spectral Unfolder