Workflows
The work TAISR is built for.
Five canonical jobs of technical AI safety research. Each is a first-class workflow with its own scaffold and output shape, not a special case of a generic chat surface.
Primary workflows
Literature synthesis
Draw together current evidence on a research question — say, chain-of-thought deception or what scheming evals do and do not measure — without smoothing disagreement into one fluent story. TAISR works per claim across the corpus, using a RECTIFIER-like decomposition: source mapping, support state, contradictions, and unresolved questions. The output is claim state, not prose followed by citations.
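One way to picture "claim state, not prose followed by citations" is as a per-claim record. The sketch below is illustrative only: the class names, field names, and the rule that derives a support state are assumptions made for this example, not TAISR's actual schema, and the source ids are placeholders.

```python
from dataclasses import dataclass, field
from enum import Enum


class Support(Enum):
    """Hypothetical support states for a single claim."""
    SUPPORTED = "supported"
    CONTESTED = "contested"
    CONTRADICTED = "contradicted"
    UNADDRESSED = "unaddressed"


@dataclass
class ClaimState:
    """One claim with its source mapping kept attached (assumed shape)."""
    claim: str
    supporting: list[str] = field(default_factory=list)      # ids of sources that support the claim
    contradicting: list[str] = field(default_factory=list)   # ids of sources that contradict it
    open_questions: list[str] = field(default_factory=list)  # unresolved questions

    @property
    def support(self) -> Support:
        # Disagreement is reported as its own state instead of being averaged away.
        if self.supporting and self.contradicting:
            return Support.CONTESTED
        if self.supporting:
            return Support.SUPPORTED
        if self.contradicting:
            return Support.CONTRADICTED
        return Support.UNADDRESSED


# Placeholder claim and source ids, for illustration only.
state = ClaimState(
    claim="Hypothetical claim under review",
    supporting=["source-1"],
    contradicting=["source-2"],
    open_questions=["Does the result hold outside the original setup?"],
)
print(state.support.value)  # contested
```

Deriving the support state from the source lists, instead of storing it separately, keeps the summary from drifting away from the evidence it summarizes.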
Benchmark and evaluation comparison
Compare safety-relevant evals — for example, METR's autonomy evals against Apollo's deception evals — without turning incompatible benchmarks into a fake leaderboard. TAISR keeps task distribution, time budget, tool access, scoring rule, contamination risk, and human-baseline conditions attached to the comparison.
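The idea of keeping conditions attached to a comparison can be sketched as a record of the six listed conditions plus a check that reports where two evals differ. Everything here is an assumption for illustration: the `EvalConditions` type, the `mismatched_conditions` helper, and the example values, which are made up and do not describe any real benchmark.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class EvalConditions:
    """The conditions named above, kept alongside any score (assumed shape)."""
    task_distribution: str
    time_budget: str
    tool_access: str
    scoring_rule: str
    contamination_risk: str
    human_baseline: str


def mismatched_conditions(a: EvalConditions, b: EvalConditions) -> list[str]:
    """Return the names of the conditions on which two evals differ."""
    return [
        f.name
        for f in fields(EvalConditions)
        if getattr(a, f.name) != getattr(b, f.name)
    ]


# Made-up values for two hypothetical evals.
eval_a = EvalConditions(
    task_distribution="multi-step agentic tasks",
    time_budget="fixed wall-clock limit",
    tool_access="shell and browser",
    scoring_rule="binary task success",
    contamination_risk="low",
    human_baseline="timed expert attempts",
)
eval_b = EvalConditions(
    task_distribution="dialogue scenarios",
    time_budget="fixed wall-clock limit",
    tool_access="shell and browser",
    scoring_rule="rubric-graded transcripts",
    contamination_risk="low",
    human_baseline="timed expert attempts",
)

gaps = mismatched_conditions(eval_a, eval_b)
print(gaps)  # ['task_distribution', 'scoring_rule']
```

A non-empty result is a reason to present the two evals side by side with their differences stated, not to rank them on a single axis.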
Safety-case and reporting review
Review a safety argument or technical report for what the evidence actually supports. TAISR separates evidence-for, evidence-against, missing evidence, and decision implications, applying AlphaProof-like support discipline where formal verification is not available. "Insufficient evidence" is a first-class verdict, not a hedged conclusion.
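Treating "insufficient evidence" as a first-class verdict can be illustrated with a small decision function over the four categories above. The enum, the function name, and especially the decision rule are assumptions chosen to keep the example short. They are not TAISR's review logic.

```python
from enum import Enum


class Verdict(Enum):
    """Hypothetical verdicts; INSUFFICIENT_EVIDENCE is a peer of the others."""
    SUPPORTED = "supported"
    NOT_SUPPORTED = "not supported"
    INSUFFICIENT_EVIDENCE = "insufficient evidence"


def review_claim(
    evidence_for: list[str],
    evidence_against: list[str],
    missing_evidence: list[str],
) -> Verdict:
    """Toy rule (an assumption): any required-but-missing evidence, or no
    evidence at all, yields INSUFFICIENT_EVIDENCE before anything else."""
    if missing_evidence or not (evidence_for or evidence_against):
        return Verdict.INSUFFICIENT_EVIDENCE
    if evidence_against:
        return Verdict.NOT_SUPPORTED
    return Verdict.SUPPORTED


# Placeholder evidence labels, for illustration only.
result = review_claim(
    evidence_for=["internal red-team summary"],
    evidence_against=[],
    missing_evidence=["no held-out evaluation reported"],
)
print(result.value)  # insufficient evidence
```

The point of the sketch is the return type: the reviewer has an explicit third outcome available, so a gap in the evidence does not have to be folded into a softened "supported".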
Challenge and rebuttal review
Evaluate objections claim by claim, including objections that are partly right and partly wrong, instead of capitulating or defending wholesale. TAISR preserves the challenge trail and distinguishes methodological disagreement from counterevidence, closer to AI Co-Scientist's competing-hypothesis structure than a generic agree/disagree response.
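A challenge trail of this kind could be represented as a list of per-claim entries, each recording what sort of objection was raised and how it was resolved. The types and field names below are hypothetical and only show the shape. They are not TAISR's data model.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectionKind(Enum):
    METHODOLOGICAL = "methodological"    # disputes how the evidence was gathered
    COUNTEREVIDENCE = "counterevidence"  # offers evidence pointing the other way


class Outcome(Enum):
    ACCEPTED = "accepted"
    PARTLY_ACCEPTED = "partly accepted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class ChallengeEntry:
    """One objection against one claim, with its resolution (assumed shape)."""
    claim: str
    objection: str
    kind: ObjectionKind
    outcome: Outcome
    rationale: str


def summarize(trail: list[ChallengeEntry]) -> dict[str, int]:
    """Count outcomes across the trail without discarding the entries."""
    counts = {outcome.value: 0 for outcome in Outcome}
    for entry in trail:
        counts[entry.outcome.value] += 1
    return counts


# Placeholder claims and objections, for illustration only.
trail = [
    ChallengeEntry(
        claim="Claim 1",
        objection="Sample size is too small",
        kind=ObjectionKind.METHODOLOGICAL,
        outcome=Outcome.PARTLY_ACCEPTED,
        rationale="Valid for the subgroup result, not for the headline result",
    ),
    ChallengeEntry(
        claim="Claim 2",
        objection="A later replication found the opposite",
        kind=ObjectionKind.COUNTEREVIDENCE,
        outcome=Outcome.ACCEPTED,
        rationale="Replication used the same protocol",
    ),
]
print(summarize(trail))
```

Keeping `kind` and `outcome` as separate fields is what lets a partly right objection be recorded as such, instead of forcing a wholesale accept or reject.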
Research-gap and hypothesis support
Map where the literature on a topic is settled, contested, or unaddressed: for example, which empirical questions about scheming evals have been answered and which have not been attempted. TAISR treats absence as an evidence state, not an invitation to invent novelty, and evaluates hypotheses against the packet rather than against narrative plausibility.