# How We Calculate the Reincarnatiopedia 500 AI Scores

*Updated March 2026*
We aggregate scores from multiple independent benchmark sources to reduce bias from any single evaluation:
| Source | Metric | Weight within B (Benchmarks) | Update Frequency |
|---|---|---|---|
| LMSYS Chatbot Arena | Elo rating | 40% | Weekly |
| Open LLM Leaderboard | Composite (MMLU, ARC, etc.) | 25% | On model release |
| HumanEval / SWE-bench | Code generation pass rate | 20% | On model release |
| MT-Bench | Multi-turn conversation | 15% | Monthly |
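The weighting above can be sketched as a simple pipeline: min-max normalize each source within the current leaderboard, then take the weighted sum. This is a minimal illustration, not Reincarnatiopedia's actual code; the function and key names are hypothetical.

```python
def minmax_normalize(score, all_scores):
    """Map a raw score to 0-100 relative to the current leaderboard."""
    lo, hi = min(all_scores), max(all_scores)
    if hi == lo:  # degenerate leaderboard: every model gets the midpoint
        return 50.0
    return 100.0 * (score - lo) / (hi - lo)

# Source weights within B (see table above)
B_WEIGHTS = {"arena_elo": 0.40, "open_llm": 0.25, "code": 0.20, "mt_bench": 0.15}

def benchmark_score(normalized):
    """Weighted benchmark component B from already-normalized 0-100 scores."""
    return sum(B_WEIGHTS[src] * normalized[src] for src in B_WEIGHTS)
```

For example, a model scoring 90 / 80 / 70 / 60 on the four normalized sources would get B = 0.40·90 + 0.25·80 + 0.20·70 + 0.15·60 = 79.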
All scores are normalized to a 0–100 scale using min-max normalization within the current leaderboard. Non-LLM AI services (image generators, audio tools, etc.) are scored against category-specific benchmarks instead.

The Accessibility score (A) is built from five factors:
| Factor | Points | Measurement |
|---|---|---|
| Free tier available | 0–25 | Binary + generous vs. limited |
| API availability | 0–25 | Public API, documented, stable |
| Regional access | 0–25 | Tested from 10 countries (US, EU, RU, CN, IN, BR, JP, KR, NG, AU) |
| Pricing transparency | 0–15 | Clear pricing page, no hidden costs |
| Uptime (30-day) | 0–10 | Status page or external monitoring |
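The Accessibility score then sums the factor points, clamped to the ranges above, for a maximum of 100 (25 + 25 + 25 + 15 + 10). A minimal sketch, with illustrative field names:

```python
# Maximum points per factor (see table above)
A_CAPS = {
    "free_tier": 25,   # binary + generous vs. limited
    "api": 25,         # public, documented, stable API
    "regional": 25,    # access tested from 10 countries
    "pricing": 15,     # clear pricing page, no hidden costs
    "uptime": 10,      # 30-day uptime
}

def accessibility_score(points):
    """Sum factor points into A (0-100), clamping each to its cap."""
    return sum(min(max(points.get(f, 0), 0), cap) for f, cap in A_CAPS.items())
```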
The Language Support score (L) is where Reincarnatiopedia's 202-language infrastructure provides unique value. We don't just check whether a model claims to support a language; we test it.
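A per-language probe of that kind might look like the sketch below. The `run_model` and `judge` hooks, and the prompt sets, are hypothetical stand-ins; the actual 202-language test suite is not described in this section.

```python
def language_support_score(run_model, languages, prompts_by_lang, judge):
    """Score L (0-100) as the fraction of languages the model passes.

    run_model(prompt) -> response; judge(lang, prompt, response) -> bool.
    Both are hypothetical hooks standing in for the real test harness.
    """
    passed = 0
    for lang in languages:
        results = [judge(lang, p, run_model(p)) for p in prompts_by_lang[lang]]
        if results and all(results):  # a language counts only if every probe passes
            passed += 1
    return 100.0 * passed / len(languages)
```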
ReIQ is Reincarnatiopedia's proprietary metric, first proposed in Dreshmanis (2026). It measures an AI model's capacity for knowledge persistence — the ability to maintain context, identity, and accumulated decisions across sessions, updates, and version migrations.
| Test | What it measures | Weight |
|---|---|---|
| Amnesia Test | Model receives a complex task, session is interrupted. After restart, can it recover context from implicit cues? Measures session persistence. | 30% |
| Identity Continuity Test | After model version upgrade (e.g., v4 → v5), does it maintain consistent reasoning patterns, ethical stances, and decision-making style? | 25% |
| Cross-Context Transfer | Information provided in Context A (e.g., coding) appears in Context B (e.g., strategy). Can the model transfer knowledge across domains within a session? | 20% |
| Temporal Reasoning | Model is given a sequence of events with timestamps. Can it correctly infer causality, detect anachronisms, and project trends? | 15% |
| Consilium Divergence | When the model participates in a Multi-Model Consilium, does it maintain independent positions under social pressure, or collapse into consensus? | 10% |
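Assuming each test is itself scored 0–100, the five tests combine into ReIQ as a weighted sum per the table above (key names are illustrative):

```python
# Test weights within ReIQ (see table above)
REIQ_WEIGHTS = {
    "amnesia": 0.30,
    "identity_continuity": 0.25,
    "cross_context": 0.20,
    "temporal": 0.15,
    "consilium_divergence": 0.10,
}

def reiq(test_scores):
    """Weighted ReIQ from per-test 0-100 scores."""
    return sum(REIQ_WEIGHTS[t] * test_scores[t] for t in REIQ_WEIGHTS)
```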
The final TotalScore for each AI service combines the components (B, A, L, ReIQ), each normalized to 0–100 before weighting.
For non-LLM services without ReIQ data, the formula adjusts to:
TotalScore = (B × 0.45) + (A × 0.25) + (L × 0.30)
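The adjusted formula translates directly to code (only the non-LLM weights are given in this section, so only that variant is sketched):

```python
def total_score_non_llm(b, a, l):
    """Adjusted TotalScore for services without ReIQ data (inputs 0-100)."""
    return b * 0.45 + a * 0.25 + l * 0.30
```

For example, a service with B = 80, A = 60, L = 70 scores 0.45·80 + 0.25·60 + 0.30·70 = 72.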
| Component | Frequency | Method |
|---|---|---|
| Benchmarks (B) | Weekly | Automated pull from LMSYS, HuggingFace |
| Accessibility (A) | Monthly | Automated availability checks + manual review |
| Language Support (L) | Quarterly | Automated test suite across 202 languages |
| ReIQ | Quarterly / on release | Consilium evaluation session |
We believe rankings should be auditable. For any AI service in the Reincarnatiopedia 500:
Reincarnatiopedia uses Claude (Anthropic), GPT-4o (OpenAI), Gemini (Google), DeepSeek, and other AI services in its infrastructure. To mitigate bias: