
Reading the outputs

A run of evm-gasfit writes its artifacts under the directory you pass to write_reports(out_dir) (CLI: --out). This page is a file-by-file tour of what's there and how to read it.

out_dir/
├── results.csv
├── new_gas_all_params.csv
├── new_gas.csv
├── runtime_estimation_autogenerated_report.md
├── new_gas_proposal.md
├── meta.json
├── figs/                                       # only if output.plots: true
├── glue_results.csv                            # only if glue_adjustment.enabled
├── glue_opcodes_by_test.csv                    # only if glue_adjustment.enabled
└── glue_opcodes_autogenerated_report.md        # only if glue_adjustment.enabled
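To sanity-check a run directory against the tree above, a small script can compare what exists with what the two config switches imply. A minimal sketch, assuming the plots argument mirrors output.plots and glue mirrors glue_adjustment.enabled; the helper names are hypothetical and not part of evm-gasfit.

```python
from pathlib import Path
from typing import Union

# Always written, per the out_dir tree.
ALWAYS = {
    "results.csv",
    "new_gas_all_params.csv",
    "new_gas.csv",
    "runtime_estimation_autogenerated_report.md",
    "new_gas_proposal.md",
    "meta.json",
}
# Written only when output.plots: true.
PLOTS_ONLY = {"figs"}
# Written only when glue_adjustment.enabled.
GLUE_ONLY = {
    "glue_results.csv",
    "glue_opcodes_by_test.csv",
    "glue_opcodes_autogenerated_report.md",
}


def expected_artifacts(plots: bool, glue: bool) -> set:
    """Names expected directly under out_dir for the given config switches."""
    names = set(ALWAYS)
    if plots:
        names |= PLOTS_ONLY
    if glue:
        names |= GLUE_ONLY
    return names


def missing_artifacts(out_dir: Union[str, Path], plots: bool, glue: bool) -> list:
    """Expected names that do not exist under out_dir, sorted."""
    root = Path(out_dir)
    return sorted(n for n in expected_artifacts(plots, glue) if not (root / n).exists())
```

An empty list from missing_artifacts means every expected file or directory is present; it does not inspect their contents.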

CSV artifacts

results.csv

One row per successful NNLS fit — i.e. per (spec, model_by-combo, client). Columns include:

Group       Columns
Keys        test_name, target_opcode, client_name, plus one column per model_by axis
Fit         nobs, rsquared, rsquared_adj
Intercept   intercept_runtime_ms, intercept_pvalue
Target      target_coef_runtime_ms, target_coef_pvalue, target_coef_conf_int_low, target_coef_conf_int_high
Per extra   <extra>_runtime_ms, <extra>_pvalue, <extra>_conf_int_low, <extra>_conf_int_high (for each surviving model_params entry beyond target_coef)

Skipped fits (insufficient data, constant opcount, solver failure) do not produce a row — only the log carries the warning. See NNLS modeling for skip reasons.
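Since the extra coefficient groups differ from spec to spec, a script reading results.csv has to discover them from the header. The sketch below does that from the <extra>_runtime_ms naming pattern in the table above. The helper names are hypothetical, and it assumes that a row for a fit without a given extra leaves that extra's cells empty.

```python
import csv
from typing import Iterable, Optional

RUNTIME_SUFFIX = "_runtime_ms"
# Coefficient groups with fixed names in the results.csv column table.
FIXED_GROUPS = ("intercept", "target_coef")


def extra_coef_names(header: Iterable[str]) -> list:
    """Per-extra coefficient names, in column order.

    Every coefficient group has a "<name>_runtime_ms" column; the intercept
    and target_coef groups are excluded.
    """
    names = []
    for col in header:
        if col.endswith(RUNTIME_SUFFIX):
            name = col[: -len(RUNTIME_SUFFIX)]
            if name not in FIXED_GROUPS:
                names.append(name)
    return names


def coef_summary(row: dict, name: str) -> Optional[dict]:
    """Estimate, p-value and CI of one coefficient group in one fit row.

    Works for target_coef and the extras (the intercept has no CI columns).
    Returns None when the estimate cell is empty, assumed to mean the fit
    has no such coefficient.
    """
    if not row.get(f"{name}{RUNTIME_SUFFIX}"):
        return None
    return {
        "runtime_ms": float(row[f"{name}{RUNTIME_SUFFIX}"]),
        "pvalue": float(row[f"{name}_pvalue"]),
        "ci_low": float(row[f"{name}_conf_int_low"]),
        "ci_high": float(row[f"{name}_conf_int_high"]),
    }


def read_fits(path: str) -> tuple:
    """Load results.csv and return (extra coefficient names, rows)."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        rows = list(reader)
        return extra_coef_names(reader.fieldnames or []), rows
```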

new_gas_all_params.csv

The aggregator's per-(gas_param, client, candidate) expansion of results.csv: one row for every model_params entry of every fit that could contribute to a gas param, winning and losing candidates together. Columns include gas_param, client_name, the source test_name / target_opcode / model_coef_name / model_by-combo, and source_label, which names the producing model spec (presets[<name>] or models.custom[<i>]). source_label is what tells apart candidates that are otherwise identical, e.g. two specs differing only in filter_by. The remaining columns are the runtime with its CI and p-value, the glue_adjustment applied (zero when no glue row matched), new_gas_decimal (the raw conversion), new_gas_rounded (rounded up with ceil), and two booleans:

  • is_winner — set on the row picked by the per-client worst-case selector.
  • poor_fit — set on every candidate that failed the p-value / R² thresholds, not just winners. A row that is both is_winner and poor_fit is a fallback winner: the whole (gas_param, client) group had no qualifying row and the selector fell back to the best-effort pick.
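The two flags can be combined to pull out fallback winners and the losing candidates that also failed the thresholds. A sketch under two assumptions: the booleans are serialized as text such as True/False or 1/0, and the helper names are hypothetical.

```python
import csv
from collections import defaultdict


def as_bool(cell: str) -> bool:
    """Parse a boolean CSV cell; the accepted spellings are an assumption."""
    return cell.strip().lower() in {"true", "1", "yes"}


def fallback_winners(rows):
    """Rows flagged both is_winner and poor_fit (best-effort picks)."""
    return [r for r in rows if as_bool(r["is_winner"]) and as_bool(r["poor_fit"])]


def weak_losers_by_group(rows):
    """Losing poor-fit candidates, grouped by (gas_param, client_name)."""
    groups = defaultdict(list)
    for r in rows:
        if as_bool(r["poor_fit"]) and not as_bool(r["is_winner"]):
            groups[(r["gas_param"], r["client_name"])].append(r)
    return dict(groups)


def load_rows(path: str):
    """Read new_gas_all_params.csv into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))
```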

new_gas.csv

The final proposal: one row per gas param, taken as the across-client worst case of the per-client winners. Columns are gas_param, client_name (the worst-case client), runtime_ms, the CI bounds, the contributing selected_test / selected_opcode / selected_model_coef_name, glue_adjustment, new_gas_decimal, new_gas_rounded, and the model_by-combo columns. This is the table that new_gas_proposal.md diffs against the patched fork baseline.
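The worst-case reduction can be reproduced from the is_winner rows of new_gas_all_params.csv, for example to cross-check new_gas.csv. This sketch assumes "worst case" means the largest new_gas_decimal across clients and that ties keep the first row seen; the package may order candidates differently. It also checks the ceil rounding described for new_gas_rounded. Function names are hypothetical.

```python
import math


def worst_case_per_param(winners, key="new_gas_decimal"):
    """For each gas_param, keep the per-client winner with the largest `key`.

    Assumes "worst case" means the most expensive client; on ties the first
    row seen is kept.
    """
    worst = {}
    for row in winners:
        param = row["gas_param"]
        if param not in worst or float(row[key]) > float(worst[param][key]):
            worst[param] = row
    return worst


def rounding_ok(row) -> bool:
    """Check that new_gas_rounded equals ceil(new_gas_decimal) for one row."""
    return int(float(row["new_gas_rounded"])) == math.ceil(float(row["new_gas_decimal"]))
```

Feed worst_case_per_param only the rows whose is_winner flag is set; passing every candidate would pick the most expensive candidate overall rather than the most expensive client.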

meta.json

Run metadata: package version, fork name, anchor rate, config hash, and timestamps. Useful as an audit pointer when comparing runs.
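Comparing two runs' meta.json files comes down to a key-by-key diff. A minimal sketch; the JSON key names are not documented on this page, so which keys to ignore (such as timestamps) is left to the caller, and the helper names are hypothetical.

```python
import json
from pathlib import Path


def load_meta(out_dir) -> dict:
    """Read meta.json from a run's output directory."""
    return json.loads((Path(out_dir) / "meta.json").read_text(encoding="utf-8"))


def diff_meta(a: dict, b: dict, ignore: frozenset = frozenset()) -> dict:
    """Top-level keys whose values differ between two meta.json dicts.

    Returns {key: (value_in_a, value_in_b)}; a key missing on one side shows
    up as None on that side. Keys listed in `ignore` are skipped.
    """
    out = {}
    for key in sorted(set(a) | set(b)):
        if key in ignore:
            continue
        if a.get(key) != b.get(key):
            out[key] = (a.get(key), b.get(key))
    return out
```

For example, diff_meta(load_meta(run_a), load_meta(run_b), ignore=frozenset({"started_at"})), where started_at is a placeholder for whatever the timestamp keys are actually called.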

Markdown reports

runtime_estimation_autogenerated_report.md

Per-spec NNLS summary. For each (test, target, model_by-combo):

  • Per-client coefficient table with point estimate, CI, p-value.
  • R² and nobs.
  • Diagnostic plots inlined when output.plots: true.

This is the place to inspect fit quality — the proposal report only surfaces poor-fit selections, not every poor fit.
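To list every poor fit, not just the ones that reached the proposal, results.csv can be filtered directly. This is an approximation, not the package's gate: it looks only at target_coef_pvalue and rsquared, treats missing values as poor, and takes both thresholds from the caller because the configured values are not shown on this page. The helper names are hypothetical.

```python
import math


def _num(cell: str) -> float:
    """Parse a numeric cell; empty cells become NaN."""
    return float(cell) if cell.strip() else math.nan


def is_poor_fit(row, max_pvalue: float, min_rsquared: float) -> bool:
    """True when the target p-value is too high or R² too low."""
    p = _num(row["target_coef_pvalue"])
    r2 = _num(row["rsquared"])
    if math.isnan(p) or math.isnan(r2):
        return True  # missing statistics count as poor
    return p > max_pvalue or r2 < min_rsquared


def poor_fits(rows, max_pvalue: float, min_rsquared: float):
    """All results.csv rows that fail the caller's thresholds."""
    return [r for r in rows if is_poor_fit(r, max_pvalue, min_rsquared)]
```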

new_gas_proposal.md

The headline artifact. Sections, in order:

  1. Run metadata: timestamp, fork, anchor rate.
  2. Summary: a single line counting proposed / increased / decreased / new / unresolved params, plus warning counts.
  3. Contents: a TOC with anchor links.
  4. Proposed gas parameters: the diff table for fitted rows (gas_param, current_gas, proposed_gas, diff, pct), with <no fit> rows for unresolved params kept in a separate block.
  5. Client comparison: for each fitted param, the worst-client and second-worst-client values plus the worst / second-worst ratio. Large ratios (≳ 2×) flag the worst client as a likely outlier. The per-client overview is rendered either as:
     • a log2(proposed / current) heatmap when output.plots: true (red = more expensive than current, green = cheaper, blank rows = new_params declared without a baseline), with the cell the per-client selector picked for each client outlined; or
     • a markdown table fallback when plots are off, with the winning cell bolded.
  6. Worst-case provenance: one collapsible <details> block per gas param, showing every (test_name, target_opcode, model_coef_name, model_by) contender × every client. This is where to look when a proposed value seems surprising and you want to see what the other candidates said.
  7. Warnings, in four subsections:
     • Missing parameters: proposed names that produced no value.
     • Incomplete client coverage: proposed for some clients, absent on others.
     • Missing glue adjustments: glue contributions skipped due to fit-quality gates (see Glue adjustment).
     • Other: anything else.
  8. Poor-fit selections: the same poor-fit winners flagged in new_gas_all_params.csv (rows with both is_winner and poor_fit set), plus a separate Other weak candidates subsection listing losing candidates that also failed the thresholds. Use this to decide whether the worst-case pick is solid or whether you should split or refit the spec.
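The arithmetic behind the diff table, the heatmap cells and the client-comparison ratio is simple enough to reproduce when spot-checking the report. The sketch assumes pct is the change relative to current_gas in percent and that the 2× outlier cue is applied as ratio >= 2; neither exact definition is spelled out on this page, and the helper names are hypothetical.

```python
import math
from typing import Optional


def diff_and_pct(current: int, proposed: int):
    """diff = proposed - current; pct = diff relative to current, in percent.

    pct is None when current is 0 (no meaningful relative change).
    """
    diff = proposed - current
    pct = None if current == 0 else 100.0 * diff / current
    return diff, pct


def log2_ratio(proposed: float, current: Optional[float]) -> Optional[float]:
    """Heatmap cell value; None for new params without a baseline (blank row)."""
    if current is None or current <= 0 or proposed <= 0:
        return None
    return math.log2(proposed / current)  # > 0 means more expensive (red)


def worst_vs_second(per_client: dict):
    """(worst client, worst value, second-worst value, ratio) for one param."""
    ranked = sorted(per_client.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) < 2:
        return None  # no second-worst client to compare against
    (worst, w), (_, s) = ranked[0], ranked[1]
    ratio = w / s if s > 0 else math.inf
    return worst, w, s, ratio


def likely_outlier(ratio: float, threshold: float = 2.0) -> bool:
    """Flag the worst client when it is at least `threshold` times the runner-up."""
    return ratio >= threshold
```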

glue_opcodes_autogenerated_report.md (glue runs only)

Per-glue-opcode summary: tier, driver fixture, per-client coefficient with quality stats, and plots when enabled. See Glue adjustment for what the tiers mean.

Figures (figs/)

When output.plots: true:

  • figs/runtime/ — per-fit scatter plus regression line, one PNG per (spec, model_by-combo, client).
  • figs/glue/ — per-priced-glue diagnostic plots (glue runs only).
  • The proposal report's heatmaps are inlined directly as base64 data URIs so the .md is self-contained.

Turning plots off (output.plots: false) skips figs/ entirely and falls back to markdown tables in the report.
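Because the proposal heatmaps are embedded as base64 data URIs, they can be pulled back out of new_gas_proposal.md with a regular expression. A sketch assuming standard data:image/<type>;base64,<payload> URIs with no whitespace inside the payload; the function names are hypothetical.

```python
import base64
import re
from pathlib import Path

# data:image/<subtype>;base64,<payload>; the payload is assumed to be plain
# base64 with no embedded whitespace.
DATA_URI = re.compile(r"data:image/([a-zA-Z0-9.+-]+);base64,([A-Za-z0-9+/=]+)")


def extract_inline_images(markdown: str):
    """Yield (subtype, decoded bytes) for every base64 image data URI."""
    for m in DATA_URI.finditer(markdown):
        yield m.group(1), base64.b64decode(m.group(2))


def dump_inline_images(report_path, dest_dir) -> list:
    """Write each inlined image of a report to dest_dir; return the paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    text = Path(report_path).read_text(encoding="utf-8")
    written = []
    for i, (subtype, data) in enumerate(extract_inline_images(text)):
        ext = "svg" if subtype.startswith("svg") else subtype
        path = dest / f"inline_{i}.{ext}"
        path.write_bytes(data)
        written.append(path)
    return written
```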