Repricing Methodology

The proposed gas parameters come from the actual time each execution client takes to run a targeted set of benchmarks: runtimes are collected from the Benchmarkoor API by benchmarkoor-fetch and turned into gas numbers by evm-gasfit. Five stages turn a wall-clock measurement into a gas number:

  1. Benchmark data collection — synthetic blocks stress one operation at a time; Benchmarkoor records each block's runtime and opcode counts.
  2. Runtime model — an NNLS regression isolates the per-unit runtime of each opcode, per client.
  3. Glue adjustment — the overhead of the supporting opcodes that surround the target in each fixture is netted out.
  4. Runtime → gas — the adjusted runtime is converted to gas at a fixed anchor rate.
  5. Worst-case selection — the proposal takes the slowest client for each parameter, then evaluates any derived parameters.

Sections 1–5 below detail those stages. A further analysis builds on the resulting gas numbers and is documented after them: the throughput-loss analysis, which measures how much throughput each repricing choice — keeping today's prices, rounding, or fractional gas — wastes per op and over real mainnet traffic. The companion Bottlenecks Methodology covers the separate question of which operations are already too slow to reprice cheaper.

1. Benchmark data collection

The proposed gas costs are grounded in the actual time each execution client takes to run a specific set of benchmarks. Similar to the methodology of the Gas Cost Estimator project, we generate synthetic blocks that isolate and stress individual EVM operations and use them to derive the various gas parameters.

Concretely, to benchmark a single operation, different blocks are created by varying the number of times the target operation is executed and by changing the parameter values passed to it. These test blocks come from the EEST benchmark suite. The raw data that feeds this analysis was queried from two Benchmarkoor suites: a compute suite and a stateful suite.

The Benchmarkoor tool then runs each block and collects the metrics we need: the block's total execution time and the number of times each operation was executed. Each block is run multiple times on every client to account for variability in execution time. Runtimes are pulled from the Benchmarkoor API by benchmarkoor-fetch; the benchmarks ran on the amsterdam fork and runs are selected by the .*-full.* run-id pattern. The compute suite drives the target-opcode fits, while the stateful suite feeds the glue-opcode analysis. Each block contributes one row to the model: a measured test_runtime_ms alongside the count of every opcode it executed.

2. Runtime model (NNLS)

evm-gasfit solves one non-negative least squares (NNLS) regression per (opcode spec, model variant, client) combination. It models the measured runtime of each fixture as a linear function of how many times the target opcode runs:

test_runtime_ms = intercept + target_coef · opcount + Σ (param_i · opcount · value_i)

The target_coef — the per-unit runtime of the opcode under study — is the quantity we care about. The trailing sum carries variable-cost terms: opcodes whose cost scales with an operand (MOD/SMOD/ADDMOD/MULMOD by bit-width, KECCAK256 base + per-word, several precompiles) get their own fixture parameters, while constant-cost opcodes (DIV, SDIV) collapse to a single coefficient. Parameters that are constant across all fixtures are dropped automatically — their column is indistinguishable from target_coef — and a fit is skipped entirely if there are too few observations or the opcount never varies.
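The row layout described above can be sketched roughly as follows. This is an illustration only: the Fixture fields, the build_design name, and the min_rows threshold are hypothetical and are not taken from evm-gasfit, whose actual skip thresholds are not stated here.

```python
from dataclasses import dataclass, field

@dataclass
class Fixture:
    opcount: int                                  # times the target opcode ran
    runtime_ms: float                             # measured test_runtime_ms
    params: dict = field(default_factory=dict)    # operand values, e.g. {"bits": 256}

def build_design(fixtures, min_rows=3):
    """Return (column_names, rows, y), or None when the fit should be skipped.

    min_rows is an assumed placeholder for "too few observations".
    """
    opcounts = {f.opcount for f in fixtures}
    if len(fixtures) < min_rows or len(opcounts) < 2:
        return None  # too few observations, or the opcount never varies

    # Keep only parameters whose value varies across fixtures. A constant one
    # would produce a column proportional to opcount, i.e. to target_coef.
    names = sorted({k for f in fixtures for k in f.params})
    varying = [k for k in names if len({f.params.get(k) for f in fixtures}) > 1]

    columns = ["intercept", "target_coef"] + varying
    rows = [
        [1.0, float(f.opcount)] + [f.opcount * f.params.get(k, 0.0) for k in varying]
        for f in fixtures
    ]
    y = [f.runtime_ms for f in fixtures]
    return columns, rows, y
```

Each row matches one term of the model: a constant for the intercept, the opcount for target_coef, and opcount · value_i for every surviving parameter.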

The non-negativity constraint is the point of NNLS: gas costs cannot be negative, and an unconstrained fit would let correlated features take opposite signs to cancel noise. NNLS instead drives such a coefficient to exactly zero or spreads the signal across the remaining features.
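To make the effect of the constraint concrete, here is a small dependency-free NNLS solver. It uses cyclic coordinate descent, a technique chosen here for brevity; it is not claimed to be the solver evm-gasfit uses, and the function name is illustrative.

```python
def nnls_coordinate_descent(rows, y, sweeps=2000):
    """Minimise ||A x - y||^2 subject to x >= 0 by cyclic coordinate descent.

    rows is the design matrix as a list of lists, y the measured runtimes.
    """
    n_cols = len(rows[0])
    x = [0.0] * n_cols
    residual = list(y)  # equals y - A x, and x starts at zero
    col_sq = [sum(r[j] ** 2 for r in rows) for j in range(n_cols)]

    for _ in range(sweeps):
        for j in range(n_cols):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Exact 1-D minimiser for coordinate j, then clip at zero.
            grad = sum(r[j] * res for r, res in zip(rows, residual))
            new_xj = max(0.0, x[j] + grad / col_sq[j])
            delta = new_xj - x[j]
            if delta != 0.0:
                for i, r in enumerate(rows):
                    residual[i] -= r[j] * delta
                x[j] = new_xj
    return x


# Synthetic fixtures: the columns are [intercept, opcount].
opcounts = [1000, 2000, 3000, 4000, 5000]
design = [[1.0, float(n)] for n in opcounts]

# Case 1: a positive intercept, so the constraint is inactive.
clean = nnls_coordinate_descent(design, [2.0 + 0.003 * n for n in opcounts])

# Case 2: an unconstrained fit would want intercept = -1; NNLS pins it at 0.
clipped = nnls_coordinate_descent(design, [-1.0 + 0.003 * n for n in opcounts])
```

In the second case the intercept ends at exactly zero and the slope absorbs the remaining signal, which is the behaviour the paragraph above describes.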

Inference is a non-parametric bootstrap over the rows of the design matrix (1000 iterations). Each coefficient gets a p-value (the share of bootstrap estimates that are near zero) and a confidence interval taken from the 2.5 and 97.5 percentiles. A fit is flagged as a poor-fit selection when its p-value exceeds 0.05 or its R² falls below 0.5. Fits cover 6 clients, all shown here and on the runtime/glue pages; ethrex is fitted too but held out of the worst-case selection in step 5.

See the runtime model page for every per-fit regression summary, or the evm-gasfit NNLS docs for the full derivation.
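The bootstrap inference described in this section can be sketched as below. To stay short it resamples rows for a single-feature fit through the origin rather than the full model. The zero_tol threshold for "near zero", the nearest-rank percentile rule, and all function names are assumptions made for illustration.

```python
import random

def slope_nonneg(rows):
    """One-feature non-negative fit through the origin: max(0, sum(xy) / sum(xx))."""
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return max(0.0, sxy / sxx) if sxx else 0.0

def bootstrap(rows, fit, iterations=1000, zero_tol=1e-12, seed=0):
    """Resample rows with replacement; return (p_value, ci_low, ci_high)."""
    rng = random.Random(seed)
    n = len(rows)
    estimates = sorted(
        fit([rows[rng.randrange(n)] for _ in range(n)])
        for _ in range(iterations)
    )
    # p-value: share of bootstrap estimates that are (near) zero.
    p_value = sum(e <= zero_tol for e in estimates) / iterations
    # Nearest-rank 2.5 / 97.5 percentiles of the sorted estimates.
    lo = estimates[int(0.025 * iterations)]
    hi = estimates[min(iterations - 1, int(0.975 * iterations))]
    return p_value, lo, hi

def is_poor_fit(p_value, r2):
    """Gate from the text: p-value above 0.05 or R² below 0.5."""
    return p_value > 0.05 or r2 < 0.5


# Synthetic rows (opcount, runtime_ms) with a small alternating disturbance.
rows = [(1000 * i, 0.003 * 1000 * i + (0.5 if i % 2 else -0.5)) for i in range(1, 11)]
p_value, ci_low, ci_high = bootstrap(rows, slope_nonneg)
```

Resampling whole rows keeps each runtime paired with its own opcode counts, which is what makes the bootstrap non-parametric.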

3. Glue adjustment

Glue adjustment is enabled. A benchmark never measures an opcode in isolation: the recorded runtime also includes the glue opcodes that set up arguments, advance the program counter, and clean the stack (pushes, dups, loads, control flow). Because NNLS attributes the entire intercept + target_coef · opcount budget to the target, glue that scales with the loop body inflates the estimate. Glue adjustment removes that overhead before the runtime is converted to gas.

evm-gasfit works from a fixed set of canonical glue opcodes, fit in dependency tiers (pure opcodes first, then cycle, then mixed tiers that lean on the already-priced ones). The adjustment runs in three steps:

adjusted_target_coef = max(0, target_coef − Σ_g ratio_g · glue_runtime_ms_g)

A glue partner only contributes to the sum when its own fit clears the same p-value (< 0.05) and R² (≥ 0.5) gates. The target's confidence interval is shifted down by the same overhead and clipped at zero. Skipped fits, meaning targets left without a glue correction, are listed in the generated new_gas_proposal.md warnings. See the glue page for the per-fit detail and the evm-gasfit glue docs for the tier structure.
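A minimal sketch of the adjustment formula and its gates follows. It reads ratio_g as the number of glue executions per target execution, which is an interpretation rather than something stated above; the tuple layout and function name are hypothetical.

```python
def adjust_for_glue(target_coef, glue_terms, p_max=0.05, r2_min=0.5):
    """Apply adjusted = max(0, target_coef - sum(ratio_g * glue_runtime_ms_g)).

    glue_terms holds one (ratio, glue_runtime_ms, p_value, r2) tuple per glue
    opcode. A term is only subtracted when its own fit passes both gates.
    """
    overhead = sum(
        ratio * glue_ms
        for ratio, glue_ms, p_value, r2 in glue_terms
        if p_value < p_max and r2 >= r2_min
    )
    return max(0.0, target_coef - overhead)


# Illustrative numbers only: the second glue fit fails the p-value gate,
# so it is ignored and only 2 x 1e-5 ms is netted out.
adjusted = adjust_for_glue(
    1.0e-4,
    [(2.0, 1.0e-5, 0.01, 0.9), (1.0, 5.0e-5, 0.20, 0.9)],
)
```

Clipping at zero keeps a target from going negative when the glue estimate exceeds the raw coefficient.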

4. Runtime → gas

Each adjusted runtime coefficient is converted to gas at a throughput anchor — the gas/second a client should be able to sustain on the worst case. With runtime_ms in milliseconds:

new_gas = ⌈ anchor_rate · runtime_ms / 1000 ⌉

Because the cost before rounding is exactly linear in the anchor, and the worst-case client doesn't change with it, the Repricing page recomputes every cost live in the browser as you move the anchor; no re-fit is needed. The committed baseline is 100,000,000 gas/s (100 Mgas/s). Any op whose converted cost lands above its current price, meaning that at today's price it sustains less throughput than the anchor, is flagged as needing an increase. The ceiling keeps the result an integer.
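The conversion is small enough to write out directly. The sketch below uses the committed 100 Mgas/s baseline; the function names and the sample runtimes are illustrative and are not taken from the fitted data.

```python
import math

ANCHOR_RATE = 100_000_000  # gas/s, the committed baseline

def new_gas(runtime_ms, anchor_rate=ANCHOR_RATE):
    """new_gas = ceil(anchor_rate * runtime_ms / 1000), runtime in milliseconds."""
    return math.ceil(anchor_rate * runtime_ms / 1000)

def needs_increase(runtime_ms, current_gas, anchor_rate=ANCHOR_RATE):
    """True when the converted cost lands above today's price."""
    return new_gas(runtime_ms, anchor_rate) > current_gas


# A hypothetical op taking 2.5e-5 ms per execution:
gas_at_100 = new_gas(2.5e-5)               # 2.5 gas before rounding
gas_at_50 = new_gas(2.5e-5, 50_000_000)    # 1.25 gas before rounding
```

Halving the anchor halves the pre-rounding cost, which is why the page can recompute every price without re-fitting. Note that with floating-point inputs a runtime that converts to almost exactly an integer can round up by one.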

5. Worst-case selection & derived parameters

A single gas parameter is fit many times: once per client, sometimes across several fixtures. The proposal consolidates those in two passes, always erring toward the slowest observation so the cost holds on every implementation.

Finally, derived parameters (those defined in terms of others) are evaluated in declaration order against the fitted values, the fork baseline, and any new_params integer baselines, using a small arithmetic-only expression language. The generated new_gas_proposal.md client-comparison section shows how far the worst client sits from the rest of the field. The evm-gasfit gas-params docs cover the selection and derivation rules in full.
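The selection and derivation can be approximated as follows. This sketch collapses the selection into a single max over clients and evaluates derived expressions with a tiny arithmetic-only evaluator built on Python's ast module. The client names other than ethrex, the derived parameter names, and the expression syntax are all hypothetical; the real rules are in the evm-gasfit gas-params docs.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def eval_arith(expr, env):
    """Evaluate +, -, *, / over numbers and names found in env; reject the rest."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")
    return walk(ast.parse(expr, mode="eval"))

def worst_case(per_client_gas, held_out=("ethrex",)):
    """Take the slowest (highest-gas) client per parameter, skipping held-out ones."""
    return {
        param: max(g for client, g in by_client.items() if client not in held_out)
        for param, by_client in per_client_gas.items()
    }

def evaluate_derived(derived, env):
    """derived is a list of (name, expression) pairs, kept in declaration order."""
    env = dict(env)
    for name, expr in derived:
        env[name] = eval_arith(expr, env)
    return env


fits = {"OPCODE_KECCAK256_BASE": {"client_a": 20, "client_b": 26, "ethrex": 40}}
selected = worst_case(fits)
params = evaluate_derived(
    [("DOUBLE_BASE", "OPCODE_KECCAK256_BASE * 2"), ("PLUS_ONE", "DOUBLE_BASE + 1")],
    selected,
)
```

Evaluating in declaration order lets a later derived parameter refer to an earlier one, as PLUS_ONE does here.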

Throughput loss of repricing

At 100 Mgas/s the fair price for an op is its exact fractional gas — the cost at which a block of that op runs at exactly the anchor's Mgas/s. The Throughput-loss page compares three ways to price each op against that ideal: keep today's integer cost (no reprice), round the fair price up to an integer ≥ 1 (reprice + round), or charge the fair price directly (reprice + fractional, zero loss). Per fitted parameter:

exact            = anchor_rate · runtime_ms / 1000
loss(no reprice) = max(current − exact, 0)
loss(round)      = ⌈exact⌉ − exact

The analysis focuses on the operations repricing makes cheaper. Throughput is wasted only by overcharging: an op whose fair price is below today's cost (exact < current) is overpriced, so the gas limit binds before the real-time budget and capacity sits idle. An op that repricing would make more expensive (underpriced today, exact > current) is undercharged — it isn't wasting throughput — so its no-reprice loss is floored at zero rather than counted as a negative offset. The no-reprice metric is therefore a pure sum of the wasted throughput on the cost-decrease ops, with no undercharge netting against it.
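The three scenarios can be written as small functions. The names are illustrative. The rounded price is floored at 1 to match the "integer ≥ 1" wording above, which only differs from ⌈exact⌉ when exact is zero.

```python
import math

def exact_gas(runtime_ms, anchor_rate=100_000_000):
    """Fair fractional price: anchor_rate * runtime_ms / 1000."""
    return anchor_rate * runtime_ms / 1000

def loss_no_reprice(current, exact):
    """Overcharge from keeping today's price; floored at zero for underpriced ops."""
    return max(current - exact, 0.0)

def loss_round(exact):
    """Overcharge from rounding the fair price up to an integer of at least 1."""
    return max(1, math.ceil(exact)) - exact

def loss_fractional(exact):
    """Charging the fair price directly wastes nothing."""
    return 0.0


# A hypothetical op with a fair price of 2.5 gas:
overpriced_today = loss_no_reprice(3, 2.5)    # today's price is too high
underpriced_today = loss_no_reprice(2, 2.5)   # floored, not counted as negative
rounding_waste = loss_round(2.5)
```

The floor in loss_no_reprice is what keeps undercharged ops from netting against overcharged ones in the headline figure.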

Opcode-level loss is the unweighted per-op ratio loss / charged: the share of the gas you pay that is waste.

Mainnet-traffic loss sums each per-execution op's loss weighted by how often it runs on mainnet, then divides by G, the total block gas_used over the same window. G is all the gas the chain burns, including the 21k intrinsic and calldata, so the result reads as a share of real block gas. Opcode counts come from Xatu's CBT fct_opcode_gas_by_opcode_hourly table and the denominator from fct_execution_gas_used_hourly (make composition writes the committed data/tx_composition*.csv and data/tx_block_gas_hourly.csv). Per-word / per-round coefficients are shown per-op but excluded from the weighted aggregate, because their weight depends on data sizes not modelled here.

Rounding discards roughly half a gas per cheap-op execution at any anchor, so its loss is nearly anchor-flat. The no-reprice loss shrinks as the anchor rises (today's fixed cost becomes a smaller overcharge), and an op drops out of the loss once the anchor pushes it underpriced.
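Both aggregates reduce to a few lines. In this sketch the function names, opcode counts, and gas total are made-up illustration values, not figures from the Xatu tables.

```python
def opcode_level_loss(loss, charged):
    """Share of the gas charged for one op that is waste."""
    return loss / charged if charged else 0.0

def mainnet_traffic_loss(per_op_loss, mainnet_counts, total_block_gas):
    """Sum of loss x execution count over per-execution ops, divided by G.

    per_op_loss should already exclude per-word / per-round coefficients.
    """
    wasted = sum(
        loss * mainnet_counts.get(op, 0)
        for op, loss in per_op_loss.items()
    )
    return wasted / total_block_gas


share = mainnet_traffic_loss(
    per_op_loss={"ADD": 0.5, "MUL": 1.0},     # gas wasted per execution
    mainnet_counts={"ADD": 100, "MUL": 50},   # executions in the window
    total_block_gas=1000,                     # G over the same window
)
```

Dividing by total block gas rather than by opcode gas alone means intrinsic and calldata gas dilute the figure, so it stays comparable to real block usage.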

Mapping gas parameters to loss — static vs. dynamic costs

Loss is computed per fitted gas coefficient, not per opcode. An op whose cost has a static and a size-dependent part is fit as two separate parameters and appears as two rows. For example KECCAK256 splits into OPCODE_KECCAK256_BASE (the per-execution cost) and OPCODE_KECCAK256_PER_WORD (the marginal cost per 32-byte word); the copy opcodes and the hash precompiles split the same way. Each coefficient gets its own modelled runtime_ms (hence its own exact) and is compared against the matching current cost parsed from the proposal table — base against today's base gas, per-word against today's per-word gas.

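The two metrics treat the dynamic coefficients differently: the opcode-level view shows per-word and per-round coefficients alongside the base coefficients, while the mainnet-weighted aggregate leaves them out because their weight depends on data sizes not modelled here.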

PUSH/DUP/SWAP variants are normalized to one parameter (e.g. PUSH2 → PUSH) and their mainnet counts summed, so the single base coefficient carries the combined frequency.
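One way to express that normalization is shown below. It assumes one combined parameter per family and leaves PUSH0 untouched; both are assumptions of this sketch, as are the function names and the sample counts.

```python
import re
from collections import Counter

# PUSH1..PUSH32, DUP1..DUP16, SWAP1..SWAP16 collapse to their family name.
# PUSH0 is deliberately not matched (an assumption of this sketch).
_VARIANT = re.compile(r"^(PUSH|DUP|SWAP)[1-9]\d*$")

def normalize(opcode):
    match = _VARIANT.match(opcode)
    return match.group(1) if match else opcode

def combine_counts(counts):
    """Sum mainnet execution counts after normalizing variant names."""
    combined = Counter()
    for opcode, n in counts.items():
        combined[normalize(opcode)] += n
    return dict(combined)


combined = combine_counts({"PUSH1": 10, "PUSH2": 5, "DUP1": 3, "ADD": 7, "PUSH0": 2})
```

The combined count is then the weight applied to the family's single base coefficient in the mainnet-traffic loss.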

Ops that need a price increase are held at zero loss. When an op is underpriced today, its fair cost at the anchor exceeds the current gas (exact > current; EXP is an example, with a modelled cost far above today's base), so repricing would raise its cost rather than cut it. A block heavy in that op runs slower than the anchor implies: it is undercharged, not wasting throughput. That underpricing is what the bottleneck analysis and a repricing increase exist to fix. Because this page focuses on the cost-decrease ops, the no-reprice loss for these ops is floored at zero (max(current − exact, 0)) rather than counted as a negative undercharge. No undercharge therefore offsets the overcharges, and the no-reprice headline is a pure sum of wasted throughput over the ops repricing makes cheaper, not a signed net. The round and fractional scenarios still price the underpriced ops correctly (⌈exact⌉ ≥ exact raises the cost). Coefficients with a zero or unparseable current cost are dropped from the no-reprice metric only.

Scope & coverage

This analysis covers the EVM's compute operations: arithmetic, bitwise, comparison, stack, control-flow, hashing (KECCAK256), memory, block/transaction and call context, and every precompile (including BLS12-381). The genuinely state- and IO-bound families — storage (SLOAD/SSTORE), account access, transient storage, CREATE, and BLOCKHASH — are deliberately excluded, as their runtime is dominated by state access rather than compute. The op set is the corresponding evm-gasfit presets, declared in fit.yaml.

A few coverage gaps are worth noting: the per-word coefficients for CALLDATACOPY, RETURNDATACOPY and MCOPY aren't fit because the benchmark fixtures don't vary copy size (only their base cost is priced), the BLS12-381 G1/G2 multi-scalar-multiplication precompiles had no matching fixtures in this run, and the MODEXP precompile has no preset upstream. These params simply don't appear in the table.

Gas-cost baseline

Proposed values are compared against the osaka gas-cost table — distinct from the amsterdam fork the benchmarks ran on. The baseline is the osaka GasCosts patched with any config overrides and new_params baselines, and it is the cost reference each proposed value diffs against.

Reproducibility

Re-run the full pipeline end to end:

make fetch        # → data/raw/    (needs a Benchmarkoor token in secrets.json)
make gasfit       # → data/gasfit/
make bottlenecks  # → data/bottlenecks.csv   (worst-case Mgas/s per op)
make composition  # → data/tx_composition*.csv + tx_block_gas_hourly.csv (Xatu; needs xatu creds)
make fracgas      # → data/fracgas*.csv      (repricing throughput loss at the anchor)
make site         # renders this site into docs/

The committed data/ directory makes the published site self-contained: every figure and table on these pages traces back to those artifacts.