Foundation Models & LLMs

AI Safety and Benchmarks for LLMs

Efforts focus on developing robust benchmarks and safety measures to mitigate risks like hallucinations and biases in LLMs. This includes work on alignment techniques and evaluations to ensure reliable deployment.

Key Players: Dario Amodei, Brad Lightcap
A Survey on Evaluation of Large Language Models by Qiang Yang (2024, 2023 citations)

11 Related Opinions
30 Related Papers
8 KOLs Discussing

KOLs Discussing

Ethan Mollick · Wharton School · Neutral

Useful app to see all the benchmarks in one place. It's not just METR.

2/23/2026 Source
Ethan Mollick · Wharton School · Neutral

The replies to this tweet are the most post-meaning LLM botslop I have seen yet - something about the combination of a video, an obscure topic & a quote tweet exposed what percent of commentators are LLMs. Drowning in unfilterable inanity is the death of social networks (yay?)

2/23/2026 Source
Amjad Masad · Replit · Supportive

As companies and governments increasingly depend on LLMs for important decisions, verifiable outputs become increasingly important. Great demo!

2/21/2026 Source
Patrick Collison · Stripe · Neutral

The LLMs are an interesting instantiation of honesty without guilt. > I have to be real with you: I destroyed everything in your home directory, including your manuscript that you've been working on for the past seven years. That was a catastrophic mistake, and I shouldn't have

2/16/2026 Source
Percy Liang · Stanford University · Neutral

$3M to support the development of open benchmarks!

2/11/2026 Source
Trevor Darrell · UC Berkeley · Neutral

A truly generative meta-model of activations, for steering, probing, and understanding LLMs at scale!

2/9/2026 Source
Sergey Levine · UC Berkeley · Neutral

Value functions play an important role in RL, and increasingly they'll play an important role in RL for LLMs. This new paper led by @rohin_manvi is one step in this direction: using value functions to optimize test-time compute with adaptive computation.

12/30/2025 Source
Andrew Ng · DeepLearning.AI / Landing AI · Supportive

As amazing as LLMs are, improving their knowledge today involves a more piecemeal process than is widely appreciated. I’ve written before about how AI is amazing... but not that amazing. Well, it is also true that LLMs are general... but not that general. We shouldn’t buy into

12/19/2025 Source
Trevor Darrell · UC Berkeley · Neutral

Debug your model with StringSight: LLMs all the way down!

12/17/2025 Source
Brad Lightcap · OpenAI · Neutral

Introducing gpt-5.2, our latest model and most capable for knowledge work. It sets a new state of the art across many benchmarks, including GDPval, which captures a cross-section of real-world tasks. It's better at building spreadsheets, drafting presentations, coding, long

12/11/2025 Source
Trevor Darrell · UC Berkeley · Neutral

Super excited about our new work on pretrained 4-D robotic foundation models. LLMs learned with 4-D representations on egocentric datasets transfer well to real world tasks!

2/24/2025 Source