Reading the outputs¶
A run of evm-gasfit writes its artifacts under the directory you pass to
write_reports(out_dir) (CLI: --out). This page is a file-by-file tour of
what's there and how to read it.
out_dir/
├── results.csv
├── new_gas_all_params.csv
├── new_gas.csv
├── runtime_estimation_autogenerated_report.md
├── new_gas_proposal.md
├── meta.json
├── figs/ # only if output.plots: true
├── glue_results.csv # only if glue_adjustment.enabled
├── glue_opcodes_by_test.csv # only if glue_adjustment.enabled
└── glue_opcodes_autogenerated_report.md # only if glue_adjustment.enabled
CSV artifacts¶
results.csv¶
One row per successful NNLS fit — i.e. per (spec, model_by-combo, client).
Columns include:
| Group | Columns |
|---|---|
| Keys | test_name, target_opcode, client_name, one column per model_by axis. |
| Fit | nobs, rsquared, rsquared_adj. |
| Intercept | intercept_runtime_ms, intercept_pvalue. |
| Target | target_coef_runtime_ms, target_coef_pvalue, target_coef_conf_int_low, target_coef_conf_int_high. |
| Per extra | <extra>_runtime_ms, <extra>_pvalue, <extra>_conf_int_low, <extra>_conf_int_high (for each surviving model_params entry beyond target_coef). |
Skipped fits (insufficient data, constant opcount, solver failure) do not
produce a row — only the log carries the warning. See
NNLS modeling for skip reasons.
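As a quick way to triage results.csv by hand, the sketch below reads rows with the standard csv module and flags fits that miss a pair of illustrative thresholds. The column names come from the table above; the threshold values, the `weak_fits` helper, and the sample rows are assumptions made up for this example and are not the tool's actual gates.

```python
import csv
import io

# Hypothetical thresholds for illustration; the tool's real gates may differ.
MIN_R2 = 0.9
MAX_PVALUE = 0.05


def weak_fits(rows, min_r2=MIN_R2, max_pvalue=MAX_PVALUE):
    """Return the rows whose fit misses either illustrative threshold."""
    flagged = []
    for row in rows:
        r2 = float(row["rsquared"])
        pvalue = float(row["target_coef_pvalue"])
        if r2 < min_r2 or pvalue > max_pvalue:
            flagged.append(row)
    return flagged


# Inline sample standing in for out_dir/results.csv (all values are made up).
SAMPLE = """test_name,target_opcode,client_name,nobs,rsquared,target_coef_runtime_ms,target_coef_pvalue
test_add,ADD,client_a,40,0.98,0.0021,0.0001
test_add,ADD,client_b,40,0.41,0.0035,0.2000
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
for row in weak_fits(rows):
    print(row["test_name"], row["client_name"], row["rsquared"])
```

To run it against a real file, replace `io.StringIO(SAMPLE)` with an open handle on results.csv in your output directory.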
new_gas_all_params.csv¶
The aggregator's per-(gas_param, client, candidate) expansion of
results.csv. One row for every model_params entry of every fit
that could contribute to a gas param — winning and losing candidates
together. Columns include gas_param, client_name, the source
test_name / target_opcode / model_coef_name / model_by-combo, the
source_label of the producing model spec (presets[<name>] or
models.custom[<i>] — the disambiguator between candidates that are otherwise
identical, e.g. two specs differing only in filter_by), the
runtime + CI + p-value, the glue_adjustment applied (zero when no glue
row matched), new_gas_decimal (raw conversion), new_gas_rounded
(ceil-rounded), and two booleans:
- is_winner: set on the row picked by the per-client worst-case selector.
- poor_fit: set on every candidate that failed the p-value / R² thresholds, not just winners. A row that is both is_winner and poor_fit is a fallback winner: the whole (gas_param, client) group had no qualifying row and the selector fell back to the best-effort pick.
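The four combinations of the two flags can be read off mechanically. The sketch below labels each row accordingly; it assumes the booleans are serialized as the strings "True" / "False", which is a guess about the CSV encoding, and the `classify` helper, gas param name, and sample rows are hypothetical.

```python
import csv
import io


def as_bool(text):
    # Assumption: booleans are written as "True"/"False" (case-insensitive).
    return text.strip().lower() == "true"


def classify(row):
    """Label a new_gas_all_params.csv row from its two boolean flags."""
    winner = as_bool(row["is_winner"])
    poor = as_bool(row["poor_fit"])
    if winner and poor:
        return "fallback winner"
    if winner:
        return "winner"
    if poor:
        return "weak candidate"
    return "losing candidate"


# Made-up rows standing in for out_dir/new_gas_all_params.csv.
SAMPLE = """gas_param,client_name,test_name,is_winner,poor_fit
GAS_EXAMPLE,client_a,test_one,True,False
GAS_EXAMPLE,client_a,test_two,False,True
GAS_EXAMPLE,client_b,test_one,True,True
"""

labels = [classify(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
print(labels)
```

Rows labelled "fallback winner" are the ones worth checking first, since the selector had no qualifying candidate for that (gas_param, client) group.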
new_gas.csv¶
The final proposal: one row per gas param, taken as the across-client
worst case of the per-client winners. Columns are gas_param, client_name
(the worst-case client), runtime_ms, CI bounds, the contributing
selected_test / selected_opcode / selected_model_coef_name,
glue_adjustment, new_gas_decimal, new_gas_rounded, and model_by-combo
columns. This is what's diffed against the patched fork baseline in the
report.
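The across-client step can be pictured as a per-param maximum over the per-client winners, followed by the ceil rounding described for new_gas_rounded. The sketch below assumes "worst case" means the largest new_gas_decimal; that reading, the `across_client_worst_case` helper, and the sample values are assumptions for illustration, not the aggregator's actual code.

```python
import math


def across_client_worst_case(winners):
    """Pick, per gas param, the per-client winner with the largest gas value.

    winners: list of dicts with gas_param, client_name, new_gas_decimal.
    Assumes "worst case" means the highest new_gas_decimal.
    """
    worst = {}
    for row in winners:
        current = worst.get(row["gas_param"])
        if current is None or row["new_gas_decimal"] > current["new_gas_decimal"]:
            worst[row["gas_param"]] = row
    # Ceil-round the selected value, mirroring new_gas_rounded.
    return {
        param: {**row, "new_gas_rounded": math.ceil(row["new_gas_decimal"])}
        for param, row in worst.items()
    }


# Made-up per-client winners (one row per gas param and client).
winners = [
    {"gas_param": "GAS_A", "client_name": "client_a", "new_gas_decimal": 2.3},
    {"gas_param": "GAS_A", "client_name": "client_b", "new_gas_decimal": 3.1},
    {"gas_param": "GAS_B", "client_name": "client_a", "new_gas_decimal": 4.0},
    {"gas_param": "GAS_B", "client_name": "client_b", "new_gas_decimal": 3.7},
]

proposal = across_client_worst_case(winners)
for param, row in proposal.items():
    print(param, row["client_name"], row["new_gas_rounded"])
```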
meta.json¶
Run metadata: package version, fork name, anchor rate, config hash, and timestamps. Useful as an audit pointer when comparing runs.
Markdown reports¶
runtime_estimation_autogenerated_report.md¶
Per-spec NNLS summary. For each (test, target, model_by-combo):
- Per-client coefficient table with point estimate, CI, p-value.
- R² and nobs.
- Diagnostic plots inlined when output.plots: true.
This is the place to inspect fit quality — the proposal report only surfaces poor-fit selections, not every poor fit.
new_gas_proposal.md¶
The headline artifact. Sections, in order:
- Run metadata: timestamp, fork, anchor rate.
- Summary: a single line counting proposed / increased / decreased / new / unresolved params and warning counts.
- Contents: TOC with anchor links.
- Proposed gas parameters: the diff table for fitted rows, with columns gas_param, current_gas, proposed_gas, diff, pct. Unresolved params appear as <no fit> rows in a separate block.
- Client comparison: for each fitted param, the worst-client and second-worst-client values plus the worst / second-worst ratio. Large ratios (≳ 2×) flag the worst client as a likely outlier. The per-client overview is rendered either as:
    - A log2(proposed / current) heatmap when output.plots: true (red = more expensive than current, green = cheaper, blank rows = new_params declared without a baseline). The cell the per-client selector picked for each client is outlined.
    - A markdown table fallback when plots are off; the winning cell is bolded.
- Worst-case provenance: one collapsible <details> block per gas param, showing every (test_name, target_opcode, model_coef_name, model_by) contender × every client. This is where you go when a proposed value looks surprising and you want to see what the other candidates said.
- Warnings, in four subsections:
    - Missing parameters: proposed names that produced no value.
    - Incomplete client coverage: proposed for some clients, absent on others.
    - Missing glue adjustments: glue contributions skipped due to fit-quality gates (see Glue adjustment).
    - Other: anything else.
- Poor-fit selections: the same poor-fit winners the new_gas.csv row flags, plus a separate Other weak candidates subsection listing losing candidates that also failed the thresholds. Use this to decide whether the worst-case pick is solid or whether you should split / refit the spec.
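The two numbers in the client comparison are simple to reproduce by hand. The sketch below computes the worst / second-worst ratio with the ≳ 2× rule of thumb and the log2(proposed / current) value behind a heatmap cell. The helper names, the exact 2.0 cutoff, and the sample values are assumptions for illustration; the report's own implementation may differ.

```python
import math

OUTLIER_RATIO = 2.0  # the "≳ 2×" rule of thumb, taken here as a hard cutoff


def client_comparison(per_client_gas):
    """Return (worst_client, ratio, is_outlier) for one gas param.

    per_client_gas: dict of client_name -> proposed gas, with at least two
    clients and positive values.
    """
    ranked = sorted(per_client_gas.items(), key=lambda kv: kv[1], reverse=True)
    (worst_client, worst), (_, second) = ranked[0], ranked[1]
    ratio = worst / second
    return worst_client, ratio, ratio >= OUTLIER_RATIO


def heatmap_cell(proposed, current):
    """log2 ratio: positive means more expensive than current, negative cheaper."""
    return math.log2(proposed / current)


# Made-up per-client proposals for a single gas param.
worst_client, ratio, is_outlier = client_comparison(
    {"client_a": 3.0, "client_b": 9.0, "client_c": 4.0}
)
print(worst_client, ratio, is_outlier)
print(heatmap_cell(6, 3), heatmap_cell(3, 6))
```

A cell value of 1.0 means the proposed gas is double the current value and -1.0 means it is half, which is why the log2 scale reads symmetrically around zero.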
glue_opcodes_autogenerated_report.md (glue runs only)¶
Per-glue-opcode summary: tier, driver fixture, per-client coefficient with quality stats, and plots when enabled. See Glue adjustment for what the tiers mean.
Figures (figs/)¶
When output.plots: true:
- figs/runtime/ contains the per-fit scatter plus regression line, one PNG per (spec, model_by-combo, client).
- figs/glue/ contains per-priced-glue diagnostic plots (glue runs only).
- The proposal report's heatmaps are inlined directly as base64 data URIs so the .md is self-contained.
Turning plots off (output.plots: false) skips figs/ entirely and falls
back to markdown tables in the report.