# How We Calculate the Reincarnatiopedia 500 AI Scores

*Updated March 2026*
We aggregate scores from multiple independent benchmark sources to reduce bias from any single evaluation:
| Source | Metric | Weight within B (Benchmarks) | Update Frequency |
|---|---|---|---|
| LMSYS Chatbot Arena | Elo rating | 40% | Weekly |
| Open LLM Leaderboard | Composite (MMLU, ARC, etc.) | 25% | On model release |
| HumanEval / SWE-bench | Code generation pass rate | 20% | On model release |
| MT-Bench | Multi-turn conversation | 15% | Monthly |
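The weighting above can be sketched as a simple pipeline: min-max normalize each source within the current leaderboard, then take the weighted sum. This is a minimal illustration, not Reincarnatiopedia's actual code; the function and key names are hypothetical.

```python
def minmax_normalize(score, all_scores):
    """Map a raw score to 0-100 relative to the current leaderboard."""
    lo, hi = min(all_scores), max(all_scores)
    if hi == lo:  # degenerate leaderboard: every model gets the midpoint
        return 50.0
    return 100.0 * (score - lo) / (hi - lo)

# Source weights within B (see table above)
B_WEIGHTS = {"arena_elo": 0.40, "open_llm": 0.25, "code": 0.20, "mt_bench": 0.15}

def benchmark_score(normalized):
    """Weighted benchmark component B from already-normalized 0-100 scores."""
    return sum(B_WEIGHTS[src] * normalized[src] for src in B_WEIGHTS)
```

For example, a model scoring 90 / 80 / 70 / 60 on the four normalized sources would get B = 0.40·90 + 0.25·80 + 0.20·70 + 0.15·60 = 79.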
All scores are normalized to a 0–100 scale using min-max normalization within the current leaderboard. Non-LLM AI services (image generators, audio tools, etc.) are scored against category-specific benchmarks instead.

The Accessibility score (A) is built from five factors:
| Factor | Points | Measurement |
|---|---|---|
| Free tier available | 0–25 | Binary + generous vs. limited |
| API availability | 0–25 | Public API, documented, stable |
| Regional access | 0–25 | Tested from 10 countries (US, EU, RU, CN, IN, BR, JP, KR, NG, AU) |
| Pricing transparency | 0–15 | Clear pricing page, no hidden costs |
| Uptime (30-day) | 0–10 | Status page or external monitoring |
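The Accessibility score then sums the factor points, clamped to the ranges above, for a maximum of 100 (25 + 25 + 25 + 15 + 10). A minimal sketch, with illustrative field names:

```python
# Maximum points per factor (see table above)
A_CAPS = {
    "free_tier": 25,   # binary + generous vs. limited
    "api": 25,         # public, documented, stable API
    "regional": 25,    # access tested from 10 countries
    "pricing": 15,     # clear pricing page, no hidden costs
    "uptime": 10,      # 30-day uptime
}

def accessibility_score(points):
    """Sum factor points into A (0-100), clamping each to its cap."""
    return sum(min(max(points.get(f, 0), 0), cap) for f, cap in A_CAPS.items())
```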
The Language Support score (L) is where Reincarnatiopedia's 202-language infrastructure provides unique value. We don't just check whether a model claims to support a language; we test it.
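A per-language probe of that kind might look like the sketch below. The `run_model` and `judge` hooks, and the prompt sets, are hypothetical stand-ins; the actual 202-language test suite is not described in this section.

```python
def language_support_score(run_model, languages, prompts_by_lang, judge):
    """Score L (0-100) as the fraction of languages the model passes.

    run_model(prompt) -> response; judge(lang, prompt, response) -> bool.
    Both are hypothetical hooks standing in for the real test harness.
    """
    passed = 0
    for lang in languages:
        results = [judge(lang, p, run_model(p)) for p in prompts_by_lang[lang]]
        if results and all(results):  # a language counts only if every probe passes
            passed += 1
    return 100.0 * passed / len(languages)
```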
ReIQ is Reincarnatiopedia's proprietary metric, first proposed in Dreshmanis (2026). It measures an AI model's capacity for knowledge persistence — the ability to maintain context, identity, and accumulated decisions across sessions, updates, and version migrations.
| Test | What it measures | Weight |
|---|---|---|
| Amnesia Test | Model receives a complex task, session is interrupted. After restart, can it recover context from implicit cues? Measures session persistence. | 30% |
| Identity Continuity Test | After model version upgrade (e.g., v4 → v5), does it maintain consistent reasoning patterns, ethical stances, and decision-making style? | 25% |
| Cross-Context Transfer | Information provided in Context A (e.g., coding) appears in Context B (e.g., strategy). Can the model transfer knowledge across domains within a session? | 20% |
| Temporal Reasoning | Model is given a sequence of events with timestamps. Can it correctly infer causality, detect anachronisms, and project trends? | 15% |
| Consilium Divergence | When the model participates in a Multi-Model Consilium, does it maintain independent positions under social pressure, or collapse into consensus? | 10% |
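Assuming each test is itself scored 0–100, the five tests combine into ReIQ as a weighted sum per the table above (key names are illustrative):

```python
# Test weights within ReIQ (see table above)
REIQ_WEIGHTS = {
    "amnesia": 0.30,
    "identity_continuity": 0.25,
    "cross_context": 0.20,
    "temporal": 0.15,
    "consilium_divergence": 0.10,
}

def reiq(test_scores):
    """Weighted ReIQ from per-test 0-100 scores."""
    return sum(REIQ_WEIGHTS[t] * test_scores[t] for t in REIQ_WEIGHTS)
```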
The final TotalScore for each AI service combines the components (B, A, L, ReIQ), each normalized to 0–100 before weighting.
For non-LLM services without ReIQ data, the formula adjusts to:
TotalScore = (B × 0.45) + (A × 0.25) + (L × 0.30)
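The adjusted formula translates directly to code (only the non-LLM weights are given in this section, so only that variant is sketched):

```python
def total_score_non_llm(b, a, l):
    """Adjusted TotalScore for services without ReIQ data (inputs 0-100)."""
    return b * 0.45 + a * 0.25 + l * 0.30
```

For example, a service with B = 80, A = 60, L = 70 scores 0.45·80 + 0.25·60 + 0.30·70 = 72.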
| Component | Frequency | Method |
|---|---|---|
| Benchmarks (B) | Weekly | Automated pull from LMSYS, HuggingFace |
| Accessibility (A) | Monthly | Automated availability checks + manual review |
| Language Support (L) | Quarterly | Automated test suite across 202 languages |
| ReIQ | Quarterly / on release | Consilium evaluation session |
We believe rankings should be auditable. For any AI service in the Reincarnatiopedia 500:
Reincarnatiopedia uses Claude (Anthropic), GPT-4o (OpenAI), Gemini (Google), DeepSeek, and other AI services in its infrastructure. To mitigate bias: