Transparency

SPORE methodology

This page documents how the metrics displayed on SPORE actually work. The project claims scientific rigour, and rigour begins with stating honestly how each figure is produced, including where the method is imperfect.

The novelty score (0.00 to 1.00)

What it is: an estimate, by the LLM itself, of how original the formulated hypothesis is relative to the recent scientific literature found through Semantic Scholar.

How it is produced: after querying the Semantic Scholar API on the hypothesis's keywords, SPORE submits the full set of papers returned to the LLM (DeepSeek or Claude depending on the stage) with an explicit instruction: "assess how novel this hypothesis is against what already exists". The LLM returns a score between 0 and 1 and a categorical verdict (novel · incremental · already_explored · already_proven).

What it is NOT: an objective measure based on embeddings, cosine distance, or statistical analysis of the citation graph. There is no mathematical formula behind it. This is a heuristic self-assessment.
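Since the score is whatever the LLM returns, the only mechanical step is validating its reply. The sketch below illustrates that step. The JSON reply shape and the function name parse_novelty_reply are assumptions made for illustration; only the score range and the four verdict labels come from this page.

```python
import json

# The four categorical verdicts listed on this page.
VERDICTS = {"novel", "incremental", "already_explored", "already_proven"}


def parse_novelty_reply(raw: str) -> tuple[float, str]:
    """Validate a hypothetical JSON reply such as
    {"score": 0.8, "verdict": "novel"} and return (score, verdict)."""
    data = json.loads(raw)
    score = float(data["score"])
    verdict = data["verdict"]
    # Reject anything outside the documented 0.00 to 1.00 range.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    # Reject labels that are not one of the four documented verdicts.
    if verdict not in VERDICTS:
        raise ValueError(f"unknown verdict: {verdict!r}")
    return score, verdict
```

Validation of this kind checks the form of the reply only. It says nothing about whether the score is well calibrated.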

Acknowledged limitations

The score reflects the LLM's self-evaluation bias (a tendency to overrate its own outputs, or to underrate them when trained for caution).
The temporal window of comparison depends on the Semantic Scholar results and is not strictly bounded.
A "rediscovered" concept that has existed for 30 years but does not appear in the Semantic Scholar top 10 may be scored as novel.
The mean score observed across published briefs is 0.80, which signals systematic over-evaluation rather than a neutral measurement.

How to read it: use the novelty score as a relative indicator (this hypothesis seems more novel than that one, according to the model), not as an absolute measure. To assess the genuine novelty of a hypothesis, read the "Novelty assessment" section of the brief, which lists the closest existing works identified.

Planned evolution: a future sprint (referenced in the backlog as N2.7-bis) will implement an algorithmic score based on semantic distance (sentence-transformers embeddings) and the absence of co-occurrence in the corpus. The current score will be retained alongside it for comparison.
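One possible shape for the semantic-distance part of such a score is sketched below. It is not the planned implementation. It assumes the embeddings have already been computed elsewhere and are passed in as plain lists of floats, and it assumes novelty is defined as one minus the highest cosine similarity to any retrieved paper. The co-occurrence component is not covered, and the function names are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def distance_novelty(hypothesis_vec: list[float],
                     paper_vecs: list[list[float]]) -> float:
    """Assumed definition: 1 minus similarity to the closest paper."""
    if not paper_vecs:
        # Nothing to compare against: treated as maximally novel,
        # which is itself a debatable assumption.
        return 1.0
    closest = max(cosine_similarity(hypothesis_vec, p) for p in paper_vecs)
    # Clamp to the same 0.00 to 1.00 range as the current score.
    return min(1.0, max(0.0, 1.0 - closest))
```

A score of this kind would share one limitation of the current one: a paper missing from the retrieved set cannot lower the result.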

The panel consensus score (0 to 10)

What it is: the weighted average of the individual scores given by the five AI reviewers of the post-publication panel.

How it is produced: each reviewer (Methodologist, Domain expert, Devil's advocate, Industry reviewer, Funding strategist) assigns a score out of 10 according to its own criteria, accompanied by a categorical verdict (strong_accept · accept · weak_accept · weak_reject · reject) and a confidence rating (0 to 1). The consensus_score is the mean of these scores, weighted by each reviewer's confidence.
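The confidence-weighted mean described above can be sketched as follows. The function signature and the sample values are illustrative assumptions, not real panel data.

```python
def consensus_score(reviews: list[tuple[float, float]]) -> float:
    """Confidence-weighted mean of (score, confidence) pairs.

    score is out of 10, confidence is between 0 and 1.
    """
    total_weight = sum(conf for _, conf in reviews)
    if total_weight == 0:
        raise ValueError("all confidences are zero; the mean is undefined")
    return sum(score * conf for score, conf in reviews) / total_weight


# Illustrative values only: one (score, confidence) pair per reviewer.
panel = [(8, 0.9), (7, 0.8), (4, 0.5), (6, 0.6), (7, 0.7)]
print(round(consensus_score(panel), 2))
```

With these sample values the weighted result is roughly 6.66, against a plain mean of 6.4: the low score carries less weight because its reviewer reported low confidence.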

Meta-Reviewer verdict

If consensus_score ≥ 7.0 and a majority of first-iteration (iter1) verdicts are accept → publish_brief
If 4.5 ≤ consensus_score < 7.0 → revise_and_resubmit (one iteration maximum)
If consensus_score < 4.5 → reject
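The three rules above can be read as a small decision function. The sketch below rests on two assumptions that this page does not settle: which verdict labels count as "accept" for the majority test, and what happens when the score is at least 7.0 without a majority accept (treated here as revise_and_resubmit).

```python
# Assumption: these three labels count as "accept" for the majority test.
ACCEPT_VERDICTS = {"strong_accept", "accept", "weak_accept"}


def meta_verdict(consensus: float, iter1_verdicts: list[str]) -> str:
    """Map a consensus score and first-iteration verdicts to an outcome."""
    accepts = sum(v in ACCEPT_VERDICTS for v in iter1_verdicts)
    majority_accept = accepts > len(iter1_verdicts) / 2
    if consensus >= 7.0 and majority_accept:
        return "publish_brief"
    if consensus >= 4.5:
        # Also covers the unspecified case: score >= 7.0, no majority.
        return "revise_and_resubmit"
    return "reject"
```

For example, meta_verdict(7.5, ["accept", "accept", "weak_accept", "reject", "reject"]) yields publish_brief, while the same verdicts with a score of 5.0 yield revise_and_resubmit.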

What it is NOT: an independent validation. The five reviewers are projections of the same linguistic representational space as the one that generated the hypothesis. They detect internal inconsistencies, not experimental contradiction.

The kill rate

What it is: the proportion of attempted collisions that do not yield a published brief.

How it is produced: a public counter, updated on each cycle, visible on the Statistics page.

Why it is high: across 2,095 collisions attempted, 38 briefs were published (a 98.2% kill rate). This figure is NOT a defect; it is rigorous selection at work. Most random domain pairs do not yield a scientifically defensible causal bridge. Publishing every collision would amount to publishing about 98% noise.
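The arithmetic behind the figure is simple enough to check by hand. A minimal sketch, using the two counts quoted above (the function name is illustrative):

```python
def kill_rate(attempted: int, published: int) -> float:
    """Share of attempted collisions that did not yield a published brief."""
    if attempted <= 0:
        raise ValueError("attempted must be positive")
    return 1.0 - published / attempted


rate = kill_rate(attempted=2095, published=38)
print(f"{rate:.1%}")  # prints 98.2%
```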

Reference verification

What it is: every DOI cited in a published brief has been technically verified through the Semantic Scholar API.

What it means: the DOI exists, it points to an article identified on Semantic Scholar, and the title and authors retrieved match what is cited in the brief.

What it does NOT mean: that the cited conclusion has been verified. SPORE guarantees that the references exist and are correctly identified. SPORE does not guarantee that the LLM has correctly interpreted the content of each article. For critical briefs, always verify the conclusion by consulting the original article.

Public costs

All LLM inference costs are published in real time on the Statistics page. Mean cost per brief: ~$0.51. Cumulative cost since launch: visible publicly.

Technical stack

LLM: DeepSeek V3.2 (primary) + Claude (critical stages)
Embeddings: sentence-transformers all-MiniLM-L6-v2
Bibliography: Semantic Scholar API
Scientific domains: OpenAlex (500 level-2 concepts)
Pipeline: Python 3.12, LangGraph, SQLite
Frontend: Next.js 14, App Router

This page is a living document. If you find an imprecise formulation, contact benoit@spore-research.com.

Updated: May 2026.