Training Compute & FLOP Estimator

Estimate the total floating-point operations (FLOPs) required to train a neural network, based on model parameters, dataset size, and training configuration.

Model Architecture

Total trainable parameters (e.g. 7B = 7,000,000,000)

Dataset & Training

Total tokens seen during training (e.g. 2T = 2,000,000,000,000)

Number of epochs: how many times the dataset is iterated (typically 1 for LLMs)

Hardware Configuration

Peak theoretical throughput in FLOP/s

Model FLOP Utilization (MFU): typical range 30–50% for large-scale training

Formulas Used

Total Training FLOPs (Kaplan et al. / Chinchilla):

C = 6 × N × D × G
  • C — Total compute in FLOPs
  • N — Number of model parameters
  • D — Total tokens processed (tokens × epochs)
  • G — Gradient checkpointing multiplier (1.0 or ~1.33)
  • 6 — 2 FLOPs per multiply-accumulate × 3 passes (1 forward pass, plus a backward pass costing roughly 2× the forward)
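The formula above can be sketched in Python (function and argument names are illustrative, not part of the calculator):

```python
def training_flops(n_params, tokens, epochs=1, grad_checkpointing=False):
    """Total training compute: C = 6 * N * D * G."""
    d = tokens * epochs                      # D: total tokens processed
    g = 4 / 3 if grad_checkpointing else 1.0  # G: ~1 extra forward pass out of 3
    return 6 * n_params * d * g

# 7B parameters trained on 2T tokens (no checkpointing):
c = training_flops(7e9, 2e12)
# c = 8.4e22 FLOPs
```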

Wall-Clock Time:

T = C / (FLOP/s_peak × num_accelerators × MFU)
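A minimal sketch of the wall-clock estimate, assuming an A100-class accelerator at 312 TFLOP/s peak BF16 throughput (the cluster size and MFU below are example values):

```python
def wall_clock_seconds(total_flops, peak_flops_per_s, n_accelerators, mfu):
    """T = C / (peak FLOP/s * num_accelerators * MFU)."""
    return total_flops / (peak_flops_per_s * n_accelerators * mfu)

# 8.4e22 FLOPs on 1024 accelerators, 312 TFLOP/s peak each, 40% MFU:
t = wall_clock_seconds(8.4e22, 312e12, 1024, 0.40)
days = t / 86400  # roughly a week of training
```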

Chinchilla-Optimal Tokens:

D_optimal ≈ 20 × N
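The Chinchilla rule of thumb as code (a sketch; the 20× ratio is an approximation from Hoffmann et al., 2022):

```python
def chinchilla_optimal_tokens(n_params):
    """Compute-optimal training tokens: D_opt ≈ 20 * N."""
    return 20 * n_params

# A 7B-parameter model is compute-optimal at ~140B tokens:
d_opt = chinchilla_optimal_tokens(7e9)
# d_opt = 1.4e11
```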

Memory (minimum lower bound):

Mem = weights (N × bytes) + optimizer (2 × N × 4B for Adam) + gradients (N × bytes)
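A sketch of the memory lower bound, assuming BF16 weights and gradients (2 bytes each) and FP32 Adam moments; byte sizes are parameters so other precisions can be plugged in:

```python
def min_memory_bytes(n_params, weight_bytes=2, grad_bytes=2):
    """Lower bound: weights + Adam optimizer state + gradients.
    Excludes activations, KV cache, and communication buffers."""
    weights = n_params * weight_bytes
    optimizer = 2 * n_params * 4   # Adam: first + second moments, FP32
    grads = n_params * grad_bytes
    return weights + optimizer + grads

# 7B model in BF16: 7e9 * (2 + 8 + 2) bytes = 84 GB, before activations
mem_gb = min_memory_bytes(7e9) / 1e9
```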

Assumptions & References

  • The factor of 6 per parameter per token is derived from Kaplan et al. (2020), "Scaling Laws for Neural Language Models" and validated in Hoffmann et al. (2022), "Training Compute-Optimal Large Language Models" (Chinchilla).
  • Gradient checkpointing recomputes activations during the backward pass, adding ~1 extra forward pass (~33% overhead) to reduce memory usage.
  • Model FLOP Utilization (MFU) of 30–50% is typical for large-scale distributed training; see Chowdhery et al. (2022), where PaLM reports ~46% MFU on TPUs.
  • Memory estimates are a lower bound — activations, KV cache, and communication buffers add significant overhead in practice.
  • Adam optimizer stores first and second moment estimates in FP32, adding 2 × N × 4 bytes regardless of training precision.
  • Hardware peak FLOP/s figures are for the specified tensor/matrix precision (BF16/FP16). FP32 throughput is typically 2–4× lower.
  • This calculator does not account for pipeline bubbles, data loading overhead, or network communication latency in multi-node training.
  • Reference: Epoch AI "Compute Trends" (2023) and OpenAI "AI and Compute" (2018) for historical context on training compute scaling.