A summary of recent AI research papers, research-tool releases, and lab updates.
I built this small AI research radar to keep up with research-related AI news. It automatically collects signals such as new papers, lab announcements, model and developer-tool updates, research-writing tools, AI-for-science work, mathematical reasoning, literature-search systems, robotics, hardware, and selected company news, then filters the results against a set of research-oriented keywords.
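As a rough illustration of keyword filtering, here is a minimal sketch that scores each item by summing per-keyword weights and keeps the items that reach a threshold. The weights, the threshold, and the names `KEYWORD_WEIGHTS`, `score_item`, and `filter_items` are assumptions made up for this example; the radar's actual keyword list and scoring formula are not described here.

```python
import re

# Hypothetical keyword weights (illustrative only, not the radar's real list).
KEYWORD_WEIGHTS = {
    "lean": 5.0,
    "proof": 4.0,
    "gpu": 3.0,
    "agents": 2.0,
}

def score_item(text, weights=KEYWORD_WEIGHTS):
    """Return (score, matched keywords) for one news item."""
    # Lowercase and split into alphanumeric words so matching is case-insensitive.
    words = set(re.findall(r"[a-z0-9]+", text.lower()))
    matched = sorted(k for k in weights if k in words)
    return sum(weights[k] for k in matched), matched

def filter_items(items, min_score=3.0):
    """Keep items whose score reaches min_score, highest score first."""
    scored = [(score_item(text)[0], text) for text in items]
    kept = [(score, text) for score, text in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[0], reverse=True)
```

A real pipeline would likely also weight sources and tags, which this sketch leaves out.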
This paper evaluates LLM-driven formal proof search in Lean for open mathematics. Its strongest agent solved 9 of 353 open Erdős problems and proved 44 of 492 OEIS conjectures, showing how AI-aided proof search can start contributing to real mathematical research.
NVIDIA describes how its BioNeMo stack, Parabricks, and RTX PRO 4500 Blackwell Server Edition GPU accelerate precision-medicine workloads. Parabricks moves genomic analysis tasks such as alignment and variant calling from hours to minutes; the RTX PRO 4500 Blackwell gives roughly 2x gains for tools including Minimap2, fq2bam, and DeepVariant. For protein work, OpenFold3 sees up to 2.4x speedups, while Smith-Waterman…
Keywords: Blackwell, life sciences, molecular, NVIDIA, protein · score 47
Google Research's ERA is a Gemini-based research coding system that searches literature, writes code, explores solutions, and evaluates results for scientific problems. The Nature-published work reports expert-level performance across genomics, public health, satellite imagery, neuroscience, time-series forecasting, and mathematics, and feeds into the Computational Discovery trusted-tester tool.
Keywords: computational discovery, empirical research assistance · score 26.6
NVIDIA summarizes eight ICRA 2026 research papers focused on moving robotics policies from simulation into real-world deployment. The work spans GPU-accelerated multi-arm planning, Isaac Lab-trained navigation policies that transfer across robot bodies, cluttered-object grasping, deformable-object manipulation, precise assembly, and vision-language-action reliability. Reported results include 3x faster multi-arm pla…
Artificial Analysis and IBM introduce ITBench-AA, an agentic enterprise IT benchmark focused first on Kubernetes SRE incident response. Frontier models inspect logs, traces, metrics, and topology to identify root-cause entities; the launch report says all frontier models scored below 50%, with Claude Opus 4.7 and GPT-5.5 near the top.
MagenticLite is an agentic system for small models that works across the browser and local file system in a single workflow. It combines specialized models and orchestration to support efficient agentic performance on everyday tasks.
Anthropic introduces Claude Opus 4.8 as an upgrade to its Opus model family, emphasizing stronger coding, agentic-task, and professional-work performance. The release is relevant for research workflows because it targets long-running, consistency-sensitive work such as coding, analysis, document-heavy tasks, and tool-using agent workflows.
Apple researchers introduce ParaRNN, a framework that makes nonlinear RNNs trainable in parallel by recasting recurrent computation as a system solved with Newton-style iterations and parallel scans. The ICLR 2026 Oral work reports a 665x speedup over sequential training, enables 7B-parameter GRU/LSTM-style language models competitive with transformers and Mamba, and includes a public codebase for experimenting with…
Keywords: source/tag match · score 6
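To make the "Newton-style iterations and parallel scans" idea in the ParaRNN item concrete, here is a toy sketch for a scalar recurrence h_t = tanh(w * h_{t-1} + x_t). Each Newton step linearizes the recurrence around the current guess, which turns it into a chain of affine maps that a prefix scan can solve for all time steps at once. This is not the ParaRNN codebase or its API: the function names, the scalar model, and the pure-Python scan are assumptions for illustration, and the "parallel" passes here still run as ordinary loops.

```python
import math

def compose(first, second):
    """Compose two affine maps h -> a*h + b, applying `first` then `second`."""
    a1, b1 = first
    a2, b2 = second
    return (a2 * a1, a2 * b1 + b2)

def affine_prefix_scan(maps):
    """Inclusive scan under `compose`, using about log2(T) doubling passes.

    Every position in a pass is updated independently, so a parallel
    backend could run one pass in one step; here it is a plain loop.
    """
    out = list(maps)
    shift = 1
    while shift < len(out):
        out = [
            compose(out[i - shift], out[i]) if i >= shift else out[i]
            for i in range(len(out))
        ]
        shift *= 2
    return out

def rnn_sequential(x, w, h0=0.0):
    """Reference: h_t = tanh(w * h_{t-1} + x_t), one step at a time."""
    h, states = h0, []
    for x_t in x:
        h = math.tanh(w * h + x_t)
        states.append(h)
    return states

def rnn_newton(x, w, h0=0.0, max_iter=50, tol=1e-12):
    """Solve for all h_t together with Newton iterations plus a scan."""
    if not x:
        return []
    h = [0.0] * len(x)                      # initial guess for h_1..h_T
    for _ in range(max_iter):
        prev = [h0] + h[:-1]                # current guess for each h_{t-1}
        maps = []
        for h_prev, x_t in zip(prev, x):
            f = math.tanh(w * h_prev + x_t)
            jac = w * (1.0 - f * f)         # derivative of f w.r.t. h_{t-1}
            # Linearized step: h_t = jac * h_{t-1} + (f - jac * h_prev)
            maps.append((jac, f - jac * h_prev))
        new_h = [a * h0 + b for a, b in affine_prefix_scan(maps)]
        done = max(abs(n - o) for n, o in zip(new_h, h)) < tol
        h = new_h
        if done:
            break
    return h
```

For a sequence of length T, this iteration reproduces the sequential result after at most T Newton steps, because each step makes at least one more leading state exact. The speedups quoted in the item depend on running the scan on parallel hardware, which this sketch does not attempt.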
arXiv: formal proof and mathematical reasoning · May 22, 2026
This paper tests Claude Code as an agentic prover for Lean 4 program verification on the CLEVER benchmark. It reports high rates of valid specifications, implementation certification, and end-to-end verified generation, suggesting current program-verification benchmarks may be too easy for modern agentic proving systems.
Large language models (LLMs) are revolutionizing the financial trading landscape by enabling sophisticated analysis of vast amounts of unstructured data to...
Google Research summarizes I/O 2026 research launches across science, agents, open models, hardware, weather, and quantum. Highlights include Gemini for Science, ERA and Co-Scientist work, Antigravity 2.0 for multi-agent development, Gemma V4, and research moving into product and scientific workflows.
NVIDIA's Nemotron-Labs Diffusion models bring diffusion-style text generation to LLM workflows. The family supports autoregressive, diffusion, and self-speculation modes, with open 3B, 8B, and 14B text models plus training code; NVIDIA reports substantially higher token-per-forward-pass throughput while keeping familiar deployment paths.
Data Formulator introduces AI-powered analytics for enterprise data workflows. Data teams bring enterprise data into an AI-ready workspace where users explore, analyze, and visualize it with AI agents.
Keywords: agents · score 11
arXiv: formal proof and mathematical reasoning · May 25, 2026
This Lean 4 paper targets a bottleneck in parallel tactic search: each branch often re-runs expensive elaboration instead of reusing the proof state. The authors introduce proof-state snapshotting in the Lean language server so many search branches can reuse one elaborated state, making portfolio-style proof search more practical.
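The snapshot-reuse idea in the item above can be pictured with a toy sketch: elaborate once, then hand each search branch its own copy of the resulting state instead of re-elaborating per branch. Everything here is hypothetical, including `ToyElaborator`, the dictionary-based proof state, and the tactic strings, which are only labels; it does not use or model the Lean language server's real interface.

```python
import copy

class ToyElaborator:
    """Stand-in for an expensive elaboration step (hypothetical)."""

    def __init__(self):
        self.calls = 0

    def elaborate(self, theorem):
        # Count calls so the reuse is observable.
        self.calls += 1
        return {"goals": [theorem]}

def try_tactic(state, tactic):
    """Toy tactic: 'trivial' closes the first goal, anything else does nothing.

    Returns True when no goals remain.
    """
    if tactic == "trivial" and state["goals"]:
        state["goals"].pop(0)
    return not state["goals"]

def search_with_snapshot(elaborator, theorem, tactics):
    """Run every tactic branch from one shared snapshot of the proof state."""
    snapshot = elaborator.elaborate(theorem)   # elaborate once
    results = {}
    for tactic in tactics:
        branch = copy.deepcopy(snapshot)       # each branch mutates its own copy
        results[tactic] = try_tactic(branch, tactic)
    return results
```

With three candidate tactics, `elaborator.calls` stays at 1 after the search, whereas a search that re-elaborates per branch would pay that cost three times.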
Autonomous AI agents are becoming more capable. Open models, Model Context Protocol (MCP)-connected tools, and portable skills are also making agents easier to...
AutoformBot is a multi-agent Lean 4 system for autoformalizing textbook mathematics at scale. It produced Atlas, a verified library with more than 45,000 declarations and 500,000 lines of Lean code from 26 open-access graduate-level textbooks, suggesting large-scale autoformalization is becoming technically and economically feasible.
NVIDIA CUDA 13.3 brings new capabilities and performance optimizations to developers across the CUDA ecosystem. The launch of NVIDIA CUDA Tile programming in...
Keywords: CUDA, GPU, NVIDIA · score 28.3
arXiv: formal proof and mathematical reasoning · May 27, 2026
This paper studies when Lean can be trusted as a judge for natural-language math answers, finding that proof success is highly coverage-dependent and sometimes unfaithful. It introduces COVCAL, a selective-risk method that accepts answers only when Lean-trace diagnostics support a calibrated accuracy guarantee, otherwise abstaining.
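The accept-or-abstain pattern in the item above can be sketched as generic selective prediction: pick a score threshold on a labeled calibration set so that accepted answers meet a target accuracy, then abstain below it. This is not the COVCAL algorithm. The single scalar diagnostic score, the function names, and the purely empirical threshold (which carries no statistical guarantee) are all assumptions for illustration.

```python
def pick_threshold(cal_scores, cal_correct, target_accuracy):
    """Return the lowest score threshold whose accepted calibration examples
    reach target_accuracy, or None if no threshold does."""
    pairs = sorted(zip(cal_scores, cal_correct), reverse=True)
    best = None
    correct = 0
    for n, (score, ok) in enumerate(pairs, start=1):
        correct += int(ok)
        # Only evaluate at the end of a group of tied scores, since a
        # threshold at `score` accepts every example with that score.
        last_of_tie = n == len(pairs) or pairs[n][0] != score
        if last_of_tie and correct / n >= target_accuracy:
            best = score
    return best

def judge(score, threshold):
    """Accept when the diagnostic score clears the threshold, else abstain."""
    if threshold is not None and score >= threshold:
        return "accept"
    return "abstain"
```

Choosing the lowest qualifying threshold maximizes coverage on the calibration data; a method that promises a calibrated guarantee would need an explicit statistical bound on top of this.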
SoundnessBench tests whether LLMs can judge the methodological soundness of machine-learning research ideas before costly experiments. Across 12 frontier models, the benchmark finds a strong optimism bias toward weak proposals, suggesting current AI scientist systems still need better proposal-evaluation tools.