Methodology

How figures get to this site, what we project, what we defer, and what we will not show.

How figures get to this site

Every number you see is the result of a deterministic pipeline. No live database; the site is a static export, rebuilt on each data revision.

Ingestion. The RBI e-STATES Excel workbook (and, in Phase 2, TN Budget at a Glance PDFs) is fetched once per fiscal year, hashed, and copied into immutable storage. SCOPE.md §5.1 lists the per-source pipeline; IMPLEMENTATION.md §3.2 documents the parser version that read each row.
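The hash-and-store step can be sketched as follows. This is a minimal illustration, assuming a content-addressed layout in which each stored file is named after its SHA-256 digest; the function names and the directory layout are hypothetical and are not taken from IMPLEMENTATION.md.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def store_immutable(fetched: Path, store_dir: Path) -> Path:
    """Copy a fetched file into store_dir under its own hash (hypothetical layout).

    An existing object is never overwritten, so re-fetching identical
    bytes leaves the store unchanged.
    """
    digest = sha256_of(fetched)
    target = store_dir / f"{digest}{fetched.suffix}"
    if not target.exists():
        store_dir.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(fetched, target)
    return target
```

Naming the stored object after its digest means the hash recorded at fetch time can later be re-checked simply by re-hashing the file.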

Transformation. Python parsers normalise the workbook into long-format Parquet, attaching a source_doc_id on every row. A DuckDB step inside the Next.js prebuild emits per-page JSON payloads (revenue mix, expenditure, deficit, peers, FRBM, population) — see SCOPE.md §5.2.
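As a rough sketch of what "long format" means here: a workbook sheet with one column per fiscal year becomes one record per (head, fiscal year) pair, each carrying its source_doc_id. The field names other than source_doc_id, and the sample values in the usage note, are illustrative assumptions; the real parsers write Parquet rather than Python dictionaries.

```python
def to_long(wide_rows, id_field, source_doc_id):
    """Melt wide rows (one column per fiscal year) into long-format records.

    Every emitted record carries the source_doc_id, so no value can lose
    its link back to the document it came from.
    """
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_field or value is None:
                continue  # skip the label column and empty cells
            long_rows.append({
                "head": row[id_field],
                "fiscal_year": column,
                "value": value,
                "source_doc_id": source_doc_id,
            })
    return long_rows
```

For example, `to_long([{"head": "Energy", "2022-23": 1.0, "2023-24": 2.0}], "head", "doc-1")` yields two records, one per fiscal year, both tagged with `doc-1`.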

Rendering. Next.js statically exports HTML for both /en and /ta routes; runtime reads only the JSON that was generated at build time. There is no live API, no analytics-driven content, no server-side rendering. Everything you see was decided at build time.

The citation pill

Every figure on this site carries a small pill labelled "Source". Tap it.

Each pill expands to three links: the mirror (a copy we host on our CDN, used because the upstream is sometimes unreachable from outside India), the upstream URL at the original publisher, and — for PDF sources — a single-page slice that loads in WhatsApp's in-app browser without breaking the #page=N fragment.

The /sources page lists every document we have ever cited, with its SHA-256 hash and the date we retrieved it.

Temporal coverage — which years each figure type covers

Each chart's per-chart picker exposes only those (year, figure type) combinations that the upstream data actually carries. The table below summarises what's available for the donut, treemap, and peers chart from RBI's e-STATES Appendix-1 (revenue) and Appendix-2 (expenditure):

Telangana exclusion. Telangana was created on 2 June 2014. The peer-states chart excludes TS for any fiscal year before FY 2014-15 — pre-2014 TS rows in the underlying RBI workbook are residuals or back-projections, not directly comparable to today's TS.
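The exclusion rule amounts to a one-line filter. The sketch below assumes each row records the state code and the calendar year in which the fiscal year starts (so FY 2014-15 is 2014); both field names are hypothetical.

```python
TS_FIRST_FY_START = 2014  # FY 2014-15, keyed by its starting calendar year


def peer_rows(rows):
    """Drop Telangana rows for fiscal years before FY 2014-15."""
    return [
        row for row in rows
        if not (row["state"] == "TS" and row["fy_start"] < TS_FIRST_FY_START)
    ]
```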

Per-capita unavailable for FY < 2001-02. The per-capita toggle uses the projection_series census2001_2011_linear@v2 (described below). Years before FY 2001-02 have no Census 2001 anchor, so the toggle has no effect for those years, and an inline note on the chart says so.

Population emitter v2. The previous projection_series census2011_linear@v1 used Census 2011 as a single anchor and projected linearly forward and backward. v2 adds the Census 2001 anchor so the FY 2001-2010 range uses interpolation rather than backward extrapolation. AP/TS pre-2014 totals are back-derived from the undivided-AP 2001 enumeration via the 2011 split ratio (≈ 0.4161), tracked as a documented approximation in RISKS.md.
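A two-anchor linear series can be sketched as below. Between 2001 and 2011 the line interpolates; after 2011 the same line extrapolates. The mapping from fiscal year to reference calendar year is not specified above, so the function simply takes a year. The split helper assumes that 0.4161 is Telangana's share of the undivided-AP total, which is our reading of the ratio rather than something stated above.

```python
def linear_population(year, pop_2001, pop_2011):
    """Population on the straight line through the 2001 and 2011 anchors.

    Years between the anchors are interpolated; years outside them are
    extrapolated along the same line.
    """
    slope = (pop_2011 - pop_2001) / 10  # persons per year between censuses
    return pop_2001 + slope * (year - 2001)


TS_SHARE_2011 = 0.4161  # assumption: Telangana's share of undivided AP


def split_undivided_ap(total):
    """Split an undivided-AP total into (AP, TS) using the 2011 ratio."""
    ts = round(total * TS_SHARE_2011)
    return total - ts, ts
```

With illustrative anchors of 60,000,000 and 70,000,000, `linear_population(2006, 60_000_000, 70_000_000)` returns the midpoint, 65,000,000.0.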

Budget Estimates, Revised Estimates, and Actuals

Indian state budgets publish three figures for each year, each meaning something different.

Why this matters. The most likely citizen misreading of any government chart is "this is what was spent" when it actually shows the plan. The hatch pattern is the visual cue. If a chart is hatched, the government has not yet spent that money — it has only said it intends to.

Per-capita figures: how the projection works

Per-capita figures use the Census 2001 and Census 2011 final population totals as anchors. Years between the two censuses are interpolated linearly, and years after 2011 are extrapolated along the same line. This is the census2001_2011_linear@v2 method recorded on every per-capita figure.

Method. We start with the Census 2011 state populations, fit a linear trend to the inter-census growth rate, and extrapolate forward. There is no smoothing for migration, no district-level redistribution. The method is honest about its bluntness.

Uncertainty. Per-capita figures show a ±3% sensitivity band underneath the headline number. The band is not a confidence interval in the statistical sense; it is a deliberately conservative envelope that reflects two known sources of error: linear extrapolation drift over a 14-year horizon, and inter-state migration that the Census-base method cannot see.
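The arithmetic behind the band is straightforward. The sketch below assumes amounts are held in crore (1 crore = 10^7 rupees) and that the ±3% is applied directly to the headline per-capita value; applying it to the population denominator instead would give a slightly asymmetric band.

```python
RUPEES_PER_CRORE = 10_000_000


def per_capita_with_band(amount_crore, population, band=0.03):
    """Return (low, headline, high) per-capita rupees for a ±band envelope."""
    headline = amount_crore * RUPEES_PER_CRORE / population
    return headline * (1 - band), headline, headline * (1 + band)
```

For example, 100 crore spread over a population of 1,000,000 gives a headline of 1,000 rupees per person, with a band of roughly 970 to 1,030 rupees.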

Andhra Pradesh / Telangana caveat. The 2011 Census was taken before the 2014 bifurcation. The five-state series (TN, KA, KL, AP, TS) uses post-bifurcation populations from the Census 2011 redistricting tables; pre-2014 totals would not be directly comparable to today's AP and Telangana boundaries.

FY 2023 projected populations (the year used in the home-page peer comparison):

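| Code | State | Projected population |
|------|-------|----------------------|
| AP | Andhra Pradesh | 5,61,67,496 |
| KA | Karnataka | 7,30,08,948 |
| KL | Kerala | 3,54,58,031 |
| TN | Tamil Nadu | 8,62,24,466 |
| TS | Telangana | 4,00,26,032 |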
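The populations above use Indian digit grouping (the last three digits, then groups of two, so 5,61,67,496 is 5 crore 61 lakh 67 thousand 496). A small formatter for that grouping might look like this; it is a sketch, not the site's own formatting code.

```python
def format_indian(n: int) -> str:
    """Format an integer with Indian digit grouping, e.g. 56167496 -> 5,61,67,496."""
    digits = str(abs(n))
    if len(digits) <= 3:
        grouped = digits
    else:
        head, tail = digits[:-3], digits[-3:]
        parts = []
        while len(head) > 2:
            parts.insert(0, head[-2:])  # peel off two digits at a time
            head = head[:-2]
        parts.insert(0, head)
        grouped = ",".join(parts + [tail])
    return ("-" if n < 0 else "") + grouped
```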

FRBM ceiling values

The TN Fiscal Responsibility and Budget Management Act, 2003, sets a revenue-deficit ceiling of 0% of GSDP, i.e., the state may not run a revenue deficit. The fiscal-deficit and outstanding-debt ceilings are set by the 15th Finance Commission's TN-specific glide path, with COVID-era relaxations applied to FY 2020-21.

FY 2020-21 (COVID year): the value 4.5% of GSDP shown below is a documented conservative midpoint between the 15th FC baseline (4.0%) and the conditional power-sector-reform allowance (5.0%). RISKS.md tracks this as an open item until a finance-domain reviewer signs off.

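| Fiscal year | Revenue deficit ceiling (% GSDP) | Fiscal deficit ceiling (% GSDP) | Outstanding debt ceiling (% GSDP) |
|-------------|----------------------------------|---------------------------------|-----------------------------------|
| FY 2014-15 | 0.0% | 3.0% | 25.0% |
| FY 2015-16 | 0.0% | 3.0% | 25.0% |
| FY 2016-17 | 0.0% | 3.0% | 25.0% |
| FY 2017-18 | 0.0% | 3.0% | 25.0% |
| FY 2018-19 | 0.0% | 3.0% | 25.0% |
| FY 2019-20 | 0.0% | 3.0% | 25.0% |
| FY 2020-21 | 0.0% | 3.0% | 25.0% |
| FY 2021-22 | 0.0% | 4.5% (conservative midpoint) | 32.6% |
| FY 2022-23 | 0.0% | 4.0% | 32.0% |
| FY 2023-24 | 0.0% | 3.5% | 31.0% |
| FY 2024-25 | 0.0% | 3.0% | 30.0% |
| FY 2025-26 | 0.0% | 3.0% | 29.0% |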
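A ceiling comparison against these rows could be sketched as follows. Only three rows are copied in, for brevity; the dictionary layout and the function name are illustrative, and the sign convention (positive headroom means the actual figure is under the ceiling) is an assumption.

```python
# Fiscal-deficit ceilings (% of GSDP) copied from three rows of the table.
FISCAL_DEFICIT_CEILING = {
    "FY 2022-23": 4.0,
    "FY 2023-24": 3.5,
    "FY 2024-25": 3.0,
}


def headroom(fiscal_year, fiscal_deficit_pct):
    """Ceiling minus actual, in percentage points of GSDP.

    A negative result means the deficit breaches the ceiling.
    """
    return FISCAL_DEFICIT_CEILING[fiscal_year] - fiscal_deficit_pct
```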

Data quality rules

Every build runs the following six checks against the curated Parquet data. Any failure exits non-zero and blocks promotion (IMPLEMENTATION.md §3.5).

  1. Source completeness — no row without a source document ID. If a single number cannot be traced, the build fails.
  2. Source registry integrity — every source ID resolves to a real document on disk, and the SHA-256 of that document matches what we recorded at fetch time.
  3. Sign sanity — receipts and expenditure are non-negative; deficits are typed as totals with a documented sign convention.
  4. Sector coverage — every Budget-Estimate or Revised-Estimate expenditure row has a sector classification. No "miscellaneous" bucket on charts.
  5. Parser provenance — every row records which parser version produced it, when, and has a unique fact ID.
  6. Population / FRBM join coverage — every state-year used on the frontend has a matching row in the population and FRBM tables. No silent NaN denominators.
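A few of these rules can be sketched as a single pass over the rows. The sketch covers rules 1, 3 and 6 fully and only the ID-resolution half of rule 2 (the real check also re-hashes the document on disk). Field names other than the source document ID are hypothetical.

```python
def run_checks(rows, known_source_ids, population_keys):
    """Return a list of (row_index, rule) failures; an empty list means pass."""
    failures = []
    for index, row in enumerate(rows):
        source_id = row.get("source_doc_id")
        if not source_id:
            failures.append((index, "source completeness"))
        elif source_id not in known_source_ids:
            failures.append((index, "source registry integrity"))
        # Receipts and expenditure must be non-negative.
        if row.get("kind") in ("receipt", "expenditure") and row["value"] < 0:
            failures.append((index, "sign sanity"))
        # Every state-year needs a population row to divide by.
        if (row["state"], row["fiscal_year"]) not in population_keys:
            failures.append((index, "population join coverage"))
    return failures
```

A build script would exit non-zero whenever the returned list is non-empty, which is what blocks promotion.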

Contested classifications

Some expenditure heads admit more than one defensible sector classification. Where reasonable people disagree, we render the alternative classifications transparently rather than hiding the choice. Sector classification is judgment-laden, especially when an accounting flow (state-to-utility transfer, residual social-services bucket) sits between policy intent and where the rupees land. For each head where the answer is genuinely contested, we document both candidate sectors and the case for each — and we mark the head as under independent review until a paid TN-fiscal-policy reviewer signs off.

I.A.12: Other Social Services

Status: Under independent review
Assigned sector: General Administration & Other (general_administration_other)
Alternative sector: Welfare & Social Security (welfare_social_security)
FY 2023-24 amount: ₹788.99 crore

Rationale

RBI's 'Others*' is a residual bucket sitting beneath the social-services umbrella in Appendix-2. It is a genuinely ambiguous head: in any given fiscal year TN may book scholarship top-ups, minor welfare schemes, or routine social-services overheads here, and the underlying mix is not exposed at this level of aggregation. The case for the assigned classification (general_administration_other) is conservatism — defaulting a residual to a residual avoids inflating the headline welfare total with line items we cannot inspect, and keeps the welfare bucket made up of heads with named schemes (Social Security & Welfare, Nutrition, SC/ST/OBC Welfare, Labour Welfare, Calamity Relief). The case for the alternative classification (welfare_social_security) is structural — RBI itself slots Others* under social services in the Appendix-2 hierarchy, so a reader following the source taxonomy verbatim would expect this head to count as welfare. The amount involved is small (₹789 cr in FY 2023-24, ~0.3% of revenue expenditure), so the choice does not move the treemap meaningfully; the contestation is principle, not magnitude. A reviewer's call here turns on whether the site should follow RBI's own taxonomy or apply a tighter operational definition of welfare.

I.B.5: Energy

Status: Under independent review
Assigned sector: Energy (energy)
Alternative sector: Welfare & Social Security (welfare_social_security)
FY 2023-24 amount: ₹24,037.19 crore

Rationale

The most politically contested head in the mapping. TN's revenue-expenditure under Energy in FY 2023-24 was ₹24,037 cr, the bulk of which is the state's transfer to TANGEDCO compensating the utility for tariff subsidies — the free-electricity-up-to-100-units domestic scheme, the free-power scheme for farm pumpsets, and concessional tariffs for handloom and powerloom weavers. Operationally these are consumer-side welfare transfers: a household that pays no electricity bill is materially in the same position as one that receives a direct cash transfer of equivalent value, and the policy intent is redistributive rather than infrastructural. Fiscally, however, RBI books the entire state-to-TANGEDCO transfer under Energy (an economic service), because the accounting entity receiving the money is the discom, not the household. The case for the assigned classification (energy) is honesty to source taxonomy and to the discom-funding mechanism — the money flows to a power utility on a power-sector head, and reclassifying it as welfare requires a judgement the RBI publication itself does not make. The case for the alternative classification (welfare_social_security) is operational economics: any analysis that asks 'how much of TN's budget is consumer subsidy?' will under-count by ~₹24,000 cr if Energy stays in the energy bucket. Unlike I.A.12 the magnitude is large — moving this head to welfare would push welfare past ₹57,000 cr and shrink the energy bucket to near zero in FY 2023-24. A reviewer's call here likely turns on the CAG audit treatment of the TANGEDCO transfer (whether it is recorded as a subsidy disbursement or a sector outlay) and on whether the reviewer is willing to depart from RBI's own classification.

When reviewer feedback returns, mappings may bump to v2. The audit trail of changes lives in `data/mappings/CHANGELOG.md`.
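To make the effect of a reclassification concrete, the sketch below re-aggregates the two contested heads under either mapping. The amounts and sector keys are the ones listed above; the record layout and function name are illustrative and do not reflect the actual format of the mapping files.

```python
CONTESTED_HEADS = [
    {"code": "I.A.12", "assigned": "general_administration_other",
     "alternative": "welfare_social_security", "amount_crore": 788.99},
    {"code": "I.B.5", "assigned": "energy",
     "alternative": "welfare_social_security", "amount_crore": 24037.19},
]


def sector_totals(heads, use_alternative=frozenset()):
    """Sum amounts per sector, switching the listed head codes to their alternative."""
    totals = {}
    for head in heads:
        key = "alternative" if head["code"] in use_alternative else "assigned"
        sector = head[key]
        totals[sector] = totals.get(sector, 0.0) + head["amount_crore"]
    return totals
```

Calling `sector_totals(CONTESTED_HEADS, {"I.B.5"})` moves the full Energy amount into welfare_social_security while leaving I.A.12 where it is, which is the larger of the two swings discussed above.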

W0 feasibility evidence

Before any cloud spending, we ran a feasibility gate documented in notebooks/00-feasibility.md. The decision verdict, recorded on 2026-05-13, was:

MVP path: RBI e-STATES Excel only. PDF parsing deferred to post-MVP.

asia-south1 mandate: weakened. Region choice to be revisited at W1 once a personal-India fetch run is scheduled.

LlamaParse account: not created. Will not be created unless a personal-India PDF fetch returns a real PDF and pdfplumber accuracy on it is < 95%.
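The LlamaParse condition reduces to a two-part test. In the sketch below, "a real PDF" is approximated by checking for the standard %PDF- file signature, and the accuracy figure is passed in as a fraction; how accuracy is measured is not specified here, so both choices are assumptions.

```python
def looks_like_pdf(payload: bytes) -> bool:
    """True if the payload starts with the PDF file signature."""
    return payload.startswith(b"%PDF-")


def should_create_llamaparse_account(payload: bytes, pdfplumber_accuracy: float) -> bool:
    """Apply the gate: a real PDF was fetched AND pdfplumber accuracy is below 95%."""
    return looks_like_pdf(payload) and pdfplumber_accuracy < 0.95
```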

What this site does NOT show

The MVP is deliberately narrow. The full anti-goals list is in SCOPE.md §8.4; the short version: