NNLS modeling
evm-gasfit fits one non-negative least squares (NNLS) regression per
(spec, model_by-combo, client). NNLS is the engine that turns measured EVM
runtimes into runtime coefficients; the proposal layer then converts those
into gas costs.
The regression equation
For each model_by slice of a ModelSpec, the fit regresses test_runtime_ms on
opcount and on one interaction column per extra parameter, where:
- test_runtime_ms is the per-fixture wall-clock runtime in milliseconds.
- opcount is the count of the target opcode in that fixture.
- Each param_i is one of the spec's model_params entries other than target_coef. Its column in the design matrix is the interaction opcount × param_value; these features carry per-opcode marginal costs of parameters like memory size or copy length.
- The intercept absorbs fixed per-fixture overhead and is also constrained to be non-negative: a column of ones is prepended to the design matrix before scipy.optimize.nnls is called (modeling/nnls.py).
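Read together, the terms above describe a model of the form test_runtime_ms ≈ intercept + target_coef × opcount + Σ_i coef_i × (opcount × param_i), with every coefficient constrained to be ≥ 0. This form is reconstructed from the term definitions, and coef_i is a label introduced here. The sketch below shows how such a design matrix could be assembled and passed to scipy.optimize.nnls. The helper name build_design_matrix, the size parameter, and the fixture values are illustrative assumptions, not the contents of modeling/nnls.py.

```python
import numpy as np
from scipy.optimize import nnls


def build_design_matrix(opcount, params):
    """Stack [ones, opcount, opcount * param_1, ...] column-wise (hypothetical helper)."""
    opcount = np.asarray(opcount, dtype=float)
    columns = [np.ones_like(opcount), opcount]  # intercept column, then target_coef column
    for values in params.values():
        # Each extra parameter enters as the interaction opcount * param_value.
        columns.append(opcount * np.asarray(values, dtype=float))
    return np.column_stack(columns)


# Made-up, noise-free fixtures: runtime = 0.5 + 0.01*opcount + 0.002*opcount*size
opcount = np.array([10, 20, 30, 40, 50, 60])
size = np.array([1, 4, 2, 8, 3, 6])
runtime_ms = 0.5 + 0.01 * opcount + 0.002 * opcount * size

X = build_design_matrix(opcount, {"size": size})
coef, residual_norm = nnls(X, runtime_ms)  # all entries of coef are >= 0
fit = dict(zip(["intercept", "target_coef", "size"], coef))
```

Because the synthetic data is noise-free and the true coefficients are positive, the non-negativity constraint is inactive here and the fit recovers the generating values.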
Why non-negativity?
Gas costs cannot be negative. Ordinary least squares (OLS) often resolves noise by handing two correlated features opposite-signed coefficients that cancel out; NNLS forbids that and either drives such a coefficient to exactly zero or distributes the signal across the remaining features. The trade-off is that a coefficient sitting at the boundary (zero) reflects an active constraint, not a fitted value; see the p-value treatment below.
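A small illustration of that behaviour, using made-up data rather than real fixtures: the response is deliberately built with a negative weight on the second of two correlated features, so OLS returns a negative coefficient while NNLS pins it at zero and re-fits the other one.

```python
import numpy as np
from scipy.optimize import nnls

# Two correlated, made-up features; y is built as 2*x1 - 1*x2 on purpose.
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 1.0, 2.0, 2.0])
X = np.column_stack([x1, x2])
y = 2.0 * x1 - 1.0 * x2

# Unconstrained least squares recovers the opposite-signed pair (about [2, -1]).
ols_coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# NNLS holds the second coefficient at the boundary and re-fits the first
# (about [1.433, 0], i.e. 43/30 for x1).
nnls_coef, _ = nnls(X, y)
```

The zero on x2 is the constraint at work, not an estimate that x2 has no effect, which is why boundary coefficients need separate handling in the inference step.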
Features that get dropped
A model_params entry is dropped, without failing the fit, when its
param_value is constant across the filtered fixtures: the interaction column
would be a scalar multiple of opcount, indistinguishable from the target_coef
column. The fit proceeds with the remaining features and emits a "dropping
extra feature" warning (modeling/estimate.py).
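The check can be sketched as follows. The function name split_constant_params and the log message wording are hypothetical; the actual logic lives in modeling/estimate.py and may differ.

```python
import logging

logger = logging.getLogger(__name__)


def split_constant_params(params):
    """Return (kept, dropped): params with one distinct value are dropped (hypothetical helper)."""
    kept, dropped = {}, []
    for name, values in params.items():
        if len(set(values)) <= 1:
            # opcount * constant is a scalar multiple of opcount, so it cannot
            # be separated from the target_coef column.
            logger.warning("dropping extra feature %s: constant param_value", name)
            dropped.append(name)
        else:
            kept[name] = values
    return kept, dropped


kept, dropped = split_constant_params({"size": [32, 32, 32], "length": [1, 2, 3]})
```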
Fits that get skipped
A (slice, client) combination is skipped with a warning, and contributes no
row to results.csv, when any of the following holds:
- There are fewer observations than features plus the intercept plus one.
- opcount is constant or all-zero (which would make target_coef unidentifiable).
- The NNLS solver itself raises.
If every (spec, slice, client) is skipped, the pipeline raises
ModelingError.
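A minimal sketch of these rules, assuming n_features counts the non-intercept columns (opcount plus any surviving extras). The names skip_reason and fit_all are hypothetical, and the ModelingError class below is a stand-in for the pipeline's own error of the same name.

```python
class ModelingError(RuntimeError):
    """Stand-in for the pipeline's error of the same name."""


def skip_reason(opcount, n_features):
    """Return a reason string if the fit should be skipped, else None."""
    if len(opcount) < n_features + 1 + 1:  # features + intercept + one
        return "too few observations"
    if len(set(opcount)) <= 1:  # constant, which also covers all-zero
        return "opcount is constant"
    return None


def fit_all(jobs, fit):
    """Run fit(opcount, n_features) per job, skipping unusable or failing ones."""
    results = []
    for opcount, n_features in jobs:
        if skip_reason(opcount, n_features) is not None:
            continue
        try:
            results.append(fit(opcount, n_features))
        except Exception:
            continue  # the solver raised: skip this combination
    if not results:
        raise ModelingError("every fit was skipped")
    return results
```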
Bootstrap inference
Standard errors, confidence intervals, and p-values come from a
non-parametric bootstrap over the rows of the design matrix
(modeling/results.py).
Defaults: bootstrap_iterations: 1000, random_seed: 42; both are configurable
under modeling. The resample indices for all iterations are drawn up front,
so results stay reproducible for a given seed even if some bootstrap fits
fail mid-loop.
A bootstrap iteration that raises leaves a row of NaNs in the coefficient matrix; those rows are filtered out before any statistic is computed (failed draws are not treated as boundary hits at zero).
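The procedure can be sketched as below. This is an illustration under assumptions, not the code in modeling/results.py: the function name bootstrap_nnls, the use of numpy.random.default_rng, and the synthetic data are all choices made for the example.

```python
import numpy as np
from scipy.optimize import nnls


def bootstrap_nnls(X, y, iterations=1000, seed=42):
    """Return coefficient draws from the bootstrap fits that succeeded."""
    rng = np.random.default_rng(seed)
    n_obs, n_coef = X.shape
    # Draw every resample up front so a failed fit cannot shift later draws.
    indices = rng.integers(0, n_obs, size=(iterations, n_obs))
    draws = np.full((iterations, n_coef), np.nan)
    for i, rows in enumerate(indices):
        try:
            draws[i], _ = nnls(X[rows], y[rows])
        except Exception:
            pass  # leave this row as NaN
    succeeded = ~np.isnan(draws).any(axis=1)
    return draws[succeeded]  # failed draws are removed, not counted as zeros


# Made-up data: intercept 0.5 ms, 0.01 ms per opcode, small noise.
noise = np.random.default_rng(0).normal(0.0, 0.001, size=30)
opcount = np.arange(1, 31, dtype=float)
X = np.column_stack([np.ones_like(opcount), opcount])
y = 0.5 + 0.01 * opcount + noise

draws = bootstrap_nnls(X, y, iterations=200)
std_err = draws.std(axis=0, ddof=1)  # bootstrap standard error per coefficient
```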
P-values
For each coefficient, the p-value is computed from the successful bootstrap
draws using ε = 1e-12 and a floor of 1/n_success. The floor honestly reports
"below the bootstrap resolution" rather than literally zero. A coefficient
that the point estimate already pins at exactly zero gets p = 1.0 (the
constraint is active; there's no evidence it's nonzero).
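The exact formula is not reproduced in this section, so the following is only one reading that is consistent with the description: it assumes the p-value is the share of successful bootstrap draws at or below ε, floored at 1/n_success. The function name bootstrap_pvalue and that definition are assumptions, not the project's confirmed implementation.

```python
import numpy as np

EPS = 1e-12


def bootstrap_pvalue(point_estimate, draws, eps=EPS):
    """P-value under the ASSUMED definition: share of draws <= eps, floored at 1/n."""
    if point_estimate == 0.0:
        return 1.0  # constraint active at the point estimate
    draws = np.asarray(draws, dtype=float)
    n_success = draws.size
    share_at_zero = np.count_nonzero(draws <= eps) / n_success
    # Never report less than one draw's worth of resolution.
    return max(share_at_zero, 1.0 / n_success)
```

Under this reading, 1000 successful draws that never touch zero give p = 0.001 rather than 0.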
Confidence intervals
conf_int(alpha=0.05) returns the empirical 2.5 / 97.5 percentiles of the
successful bootstrap matrix per coefficient — i.e. a percentile bootstrap
CI, not a normal-approximation interval.
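A percentile interval of this kind can be computed directly with numpy.percentile. The function below is a sketch with a hypothetical name, not the project's conf_int, and assumes draws holds only the successful bootstrap rows.

```python
import numpy as np


def percentile_conf_int(draws, alpha=0.05):
    """Per-coefficient (low, high) percentile bounds from successful bootstrap draws."""
    low, high = 100 * alpha / 2, 100 * (1 - alpha / 2)
    return np.percentile(draws, [low, high], axis=0).T  # shape: (n_coef, 2)


# One coefficient whose draws are 0, 1, ..., 100: the bounds are 2.5 and 97.5.
draws = np.arange(101, dtype=float).reshape(-1, 1)
ci = percentile_conf_int(draws)
```

Unlike a normal-approximation interval, these bounds can be asymmetric around the point estimate and never fall below zero when every draw is non-negative.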
What lands in results.csv
One row per successful fit, with at minimum:
- test_name, client_name, target_opcode, and one column per model_by dimension.
- intercept_runtime_ms, intercept_pvalue.
- target_coef_runtime_ms, target_coef_pvalue, target_coef_conf_int_low, target_coef_conf_int_high.
- For each surviving extra feature: <extra>_runtime_ms, <extra>_pvalue, <extra>_conf_int_low, <extra>_conf_int_high.
- rsquared, rsquared_adj, nobs.
The target_coef_* columns are what the proposal layer routes to the
target_coef entry of model_params; the per-extra columns route to the
other model_params entries. See Deriving gas params for
the conversion step.
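To make the naming pattern concrete, the sketch below builds the minimum column list for one fit. The function name result_columns is hypothetical, and two details are assumptions: that each model_by dimension appears under its own name, and the column order.

```python
def result_columns(model_by, extras):
    """Minimum results.csv columns for one fit (assumed order and model_by naming)."""
    columns = ["test_name", "client_name", "target_opcode", *model_by]
    columns += ["intercept_runtime_ms", "intercept_pvalue"]
    for name in ["target_coef", *extras]:
        # target_coef and every surviving extra share the same four suffixes.
        columns += [
            f"{name}_runtime_ms",
            f"{name}_pvalue",
            f"{name}_conf_int_low",
            f"{name}_conf_int_high",
        ]
    columns += ["rsquared", "rsquared_adj", "nobs"]
    return columns


columns = result_columns(model_by=["fork"], extras=["size"])
```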
Quality gates
After all fits land, the per-client worst-case selector flags any candidate
row (poor_fit = True) whose p-value or R² crosses one of:
| Knob | Default |
|---|---|
| modeling.poor_fit_p_value_threshold | 0.05 |
| modeling.poor_fit_rsquared_threshold | 0.5 |
Poor-fit selections still make it into new_gas.csv (they're the best
candidate available for that client) but surface in the proposal report's
Poor-fit selections section so reviewers can decide whether to accept,
broaden the fixture set, or split the spec.
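The gate can be sketched as a single predicate. The direction of each comparison is an assumption here (a p-value above its threshold or an R² below its threshold counts as poor), as are the strict inequalities and the function name is_poor_fit.

```python
def is_poor_fit(p_value, rsquared, p_threshold=0.05, rsquared_threshold=0.5):
    """Flag a candidate row; comparison directions are assumed, not confirmed."""
    return p_value > p_threshold or rsquared < rsquared_threshold


# A well-explained fit with a weak coefficient is still flagged.
flagged = is_poor_fit(p_value=0.20, rsquared=0.90)
```

A flagged row is kept as a candidate; the flag only controls whether it is listed for review.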