Content Inventory & Taxonomy Complexity Estimator

Estimates the overall complexity score of a content inventory and taxonomy system based on content volume, hierarchy depth, cross-linking density, content types, and governance overhead. Use this to plan migration, audit, or redesign efforts.

Formulas

Component Scores (each 0–100):

  • Volume Score = min(100, log₁₀(totalItems) / 6 × 100)
  • Taxonomy Structure Score = min(100, depth × log₂(branchingFactor + 1) / (10 × log₂(33)) × 100)
    where branchingFactor = totalNodes / topCategories
  • Content-Type Diversity Score = min(100, √contentTypes / √50 × 100)
  • Cross-Link Density Score = min(100, ln(crossLinks + 1) / ln(51) × 100)
  • Metadata Richness Score = min(100, metadataFields / 50 × 100)
  • Governance Load Score = min(100, log₁₀(itemsPerOwner + 1) / log₁₀(1001) × 100)
  • Localisation Score = min(100, (languages − 1) / 19 × 100)
  • Churn Score = min(100, ln(updateFreq + 1) / ln(53) × 100)
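
As a rough illustration, the eight component formulas above can be sketched in Python. The function name, the snake_case parameter names and the dictionary keys are assumptions made for this example, not part of the estimator itself. The sketch also assumes valid inputs (total_items ≥ 1, languages ≥ 1, top_categories > 0) and applies only the upper cap of 100 that the formulas specify.

```python
import math


def component_scores(total_items, depth, total_nodes, top_categories,
                     content_types, cross_links, metadata_fields,
                     items_per_owner, languages, update_freq):
    """Return the eight component scores, each capped at 100, as a dict."""
    # branchingFactor = totalNodes / topCategories, as defined above
    branching_factor = total_nodes / top_categories

    raw = {
        "volume": math.log10(total_items) / 6 * 100,
        "tax_structure": depth * math.log2(branching_factor + 1)
                         / (10 * math.log2(33)) * 100,
        "type_diversity": math.sqrt(content_types) / math.sqrt(50) * 100,
        "cross_link": math.log(cross_links + 1) / math.log(51) * 100,
        "metadata": metadata_fields / 50 * 100,
        "governance": math.log10(items_per_owner + 1) / math.log10(1001) * 100,
        "localisation": (languages - 1) / 19 * 100,
        "churn": math.log(update_freq + 1) / math.log(53) * 100,
    }
    # Apply the min(100, ...) cap from each formula
    return {name: min(100.0, value) for name, value in raw.items()}


# Hypothetical inventory, for illustration only
example = component_scores(
    total_items=10_000, depth=4, total_nodes=120, top_categories=8,
    content_types=12, cross_links=5, metadata_fields=15,
    items_per_owner=500, languages=3, update_freq=12,
)
for name, value in example.items():
    print(f"{name}: {value:.1f}")
```

In this hypothetical inventory, 10 000 items give a Volume Score of about 66.7 (log₁₀ of 10 000 is 4, and 4 / 6 × 100 ≈ 66.7), and 15 metadata fields give a Metadata Richness Score of 30.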

Composite Score = 0.20 × Volume + 0.20 × TaxStructure + 0.10 × TypeDiversity + 0.15 × CrossLink + 0.10 × Metadata + 0.10 × Governance + 0.10 × Localisation + 0.05 × Churn
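
A minimal sketch of the weighted sum, assuming the component scores are supplied as a dictionary. The key names and the composite_score function are hypothetical, chosen for this example; the weights are the ones given in the formula above.

```python
# Weights from the Composite Score formula; they sum to 1.0
WEIGHTS = {
    "volume": 0.20,
    "tax_structure": 0.20,
    "type_diversity": 0.10,
    "cross_link": 0.15,
    "metadata": 0.10,
    "governance": 0.10,
    "localisation": 0.10,
    "churn": 0.05,
}


def composite_score(scores):
    """Combine eight 0-100 component scores into one 0-100 composite."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
```

Because the weights sum to 1, the composite stays on the same 0–100 scale as its components: eight scores of 100 yield a composite of 100.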

Audit Hours = totalItems × 0.05 × (1 + compositeScore / 100)

Migration Person-Days = auditHours / 6 × (1 + compositeScore / 200)

Recommended FTEs = max(0.1, (totalItems / 500) × (1 + compositeScore / 100) / owners)
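
The three effort formulas above chain together, which a short Python sketch may make easier to follow. The function and key names are assumptions for illustration; the constants (0.05 h/item, 6 h/person-day, 500 items, 0.1 FTE floor) come straight from the formulas.

```python
def effort_estimates(total_items, composite, owners):
    """Return audit hours, migration person-days and recommended FTEs."""
    # Audit Hours = totalItems x 0.05 x (1 + compositeScore / 100)
    audit_hours = total_items * 0.05 * (1 + composite / 100)
    # Migration Person-Days = auditHours / 6 x (1 + compositeScore / 200)
    migration_person_days = audit_hours / 6 * (1 + composite / 200)
    # Recommended FTEs, floored at 0.1
    recommended_ftes = max(
        0.1, (total_items / 500) * (1 + composite / 100) / owners
    )
    return {
        "audit_hours": audit_hours,
        "migration_person_days": migration_person_days,
        "recommended_ftes": recommended_ftes,
    }


# Hypothetical case: 10 000 items, composite score 50, 20 owners
print(effort_estimates(10_000, 50, 20))
```

For this hypothetical case the arithmetic works out to 750 audit hours (10 000 × 0.05 × 1.5), 156.25 migration person-days (750 / 6 × 1.25) and 1.5 recommended FTEs (20 × 1.5 / 20).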

Complexity Bands: Low < 20 · Moderate 20–40 · High 40–60 · Very High 60–80 · Extreme ≥ 80
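
A small sketch of the band lookup. The ranges above share their end points (for example, 40 appears in both Moderate and High), so this example assumes each boundary value belongs to the higher band, as "Low < 20" and "Extreme ≥ 80" suggest. The function name is hypothetical.

```python
def complexity_band(score):
    """Map a composite score to its complexity band.

    Assumption: shared boundaries (20, 40, 60, 80) fall in the higher band.
    """
    if score < 20:
        return "Low"
    if score < 40:
        return "Moderate"
    if score < 60:
        return "High"
    if score < 80:
        return "Very High"
    return "Extreme"
```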

Assumptions & References

  • Volume uses a log₁₀ scale normalised by 6, so that 1 M items (10⁶) scores 100; linear scaling would compress small inventories unfairly.
  • Taxonomy structure complexity follows information-theoretic branching entropy: deeper trees with higher branching factors are exponentially harder to govern (Zeng & El-Gohary, 2014). The score reaches 100 at a depth of 10 with a branching factor of 32.
  • Content-type diversity uses a square-root scale to reflect diminishing marginal complexity beyond ~20 types.
  • Cross-link density uses a natural-log scale; 50 avg links/item is treated as the practical ceiling for human-navigable faceted taxonomies.
  • Metadata richness is linear up to 50 fields, consistent with Dublin Core extensions and enterprise MDM benchmarks.
  • Governance load is measured as items per owner; ratios above 1 000 items/owner are considered unmanageable without automation.
  • Localisation score is linear from 1 language (no extra complexity) to 20 languages (full complexity).
  • Churn is log-scaled against a weekly update cadence (52/yr) as the practical upper bound for manual governance.
  • Audit effort baseline of 0.05 h/item is derived from industry benchmarks for content audits (Halvorson & Rach, Content Strategy for the Web, 2012).
  • Migration effort assumes 6 productive hours/person-day with a complexity uplift factor.
  • FTE recommendation assumes one governance FTE can manage ~500 items/year at baseline; complexity scales this linearly.
  • Weights were calibrated against practitioner surveys in the Information Architecture Institute's annual IA practice reports.
