Foundation Models & LLMs
Large language models, training at scale, architecture innovations, benchmarks
The core technology layer — who is building the best models and how fast they improve
In the Foundation Models & LLMs sector, the most critical development right now is the rapid advance of scaling and multimodal capabilities, driven by efficiency gains from key players such as Stanford and Google. With 117 of 240 expert stances supporting continued innovation, models are posting significant performance improvements, as evidenced by recent papers, but this progress is tempered by growing concerns over environmental sustainability and safety risks. The result is a high-activity period, with new benchmarks and experiments pushing the boundaries of AI utility.

The hottest sub-topics include scaling laws for LLMs, where Percy Liang of Stanford argues that scaling compute and data yields efficiency gains, as detailed in his 2024 paper 'Lost in the Middle' (648 citations), despite evidence of diminishing returns. Another key area is AI safety and benchmarking, led by Dario Amodei of Anthropic and Brad Lightcap, who advocate robust evaluations to address hallucinations and bias, as outlined in Qiang Yang's 2024 survey (over 2,000 citations). Multimodal integration, championed by Hugo Larochelle and Bernhard Schölkopf, is also heating up, with Sergey Levine's 2023 PaLM-E paper demonstrating enhanced real-world applications in search and robotics.

The central debate in the sector is whether scaling laws remain the optimal path for LLM advancement. Proponents such as Percy Liang and Jeff Dean of Google hold that scaling continues to drive performance gains and real-world applications, pointing to experiments showing efficiency improvements. Critics such as Bernhard Schölkopf of the Max Planck Institute and Nick Frosst counter that scaling yields diminishing returns at unsustainable environmental cost, citing research that calls for alternative approaches to curb carbon footprints.
For investors, the implications are substantial: there are opportunities in backing companies focused on efficient architectures and safety measures, with the potential for high returns amid rapid innovation. The main risks to watch are regulatory hurdles tied to environmental impact and ethical concerns, which could delay deployments and raise costs. The sector's current momentum creates a narrow window for strategic investment before potential overregulation stifles growth.
Key Voices in Foundation Models & LLMs

Brad Lightcap
OpenAI
5 posts

Trevor Darrell
UC Berkeley
4 posts

Aravind Srinivas
Perplexity AI
4 posts

Casey Newton
Platformer
4 posts

Guillermo Rauch
Vercel
2 posts

Mark Chen
OpenAI
2 posts

Sam Altman
OpenAI
2 posts

Emad Mostaque
Stability AI
2 posts

Tri Dao
FlashAttention
2 posts

Aidan N. Gomez
Cohere
1 post

Dario Amodei
Anthropic
1 post

Alexandr Wang
Scale AI
1 post

we're partnering with @bcg @mckinsey @accenture and @capgemini to deploy openai frontier to enterprises globally https://t.co/5dKA0LViti

Unicorns have always been used to measure sparks of AGI. (This was written by GPT-2 in February, 2019)

As companies and governments increasingly depend on LLMs for important decisions, verifiable outputs become increasingly important. Great demo!

Something folks haven't figured out: 15,000 tokens/second speeds and million-token context windows aren't for humans. They are for the AIs to talk to each other and coordinate faster than we ever could. Not just a bit faster and better: orders of magnitude. That's your competition

The future of design is… engineering. All designers at @vercel now also build, thanks to tools like @v0, Claude Code, and Cursor. They've been contributing to our frontends and apps for a while now. But over the past few months, the leap they've made is engineering the design https://t.co/5un9xjSxoY

This is incredible btw - using Gemini 3.1 as a city builder. I used to dream about this when painstakingly making virtual cities for simulation games like Republic.

Gemini 3 Pro has been upgraded to Gemini 3.1 Pro for all Perplexity Pro and Max users (consumer and enterprise). It's the second most picked model by our Enterprise customers after Claude 4.5 Sonnet/Opus family. Enjoy! https://t.co/E5SH1WxnH5

AI is an amplifier of your intellect and values. A mirror of your soul. If you were a confirmation bias person, AI can be catastrophic for you. There’s some way to contort almost any prompt to give you the answer you’re looking for. The extreme version of this is AI psychosis.

Sonnet 4.6 for all Perplexity Pro and Max customers available now (consumer and enterprise), across all clients - web, mobile, Comet

Happy for my brother. An absolute triumph for Benchmark.

New record for GPT 5.2 Pro ⏲️ Wonder when this will be days 🤔 https://t.co/scuvbDEDrr

New family of Aya models that are small and very effective at key geographies!

Here's an interesting visual reasoning benchmark at which 3-year olds apparently handily beat all frontier models. https://t.co/vDyAlW2BKQ https://t.co/eXfW6bRMtd

Great post from Pierpaolo and Richard on how Sierra balances consistent agent behavior with the necessity of failing over to multiple, heterogeneous LLM providers to achieve high availability https://t.co/Ox0LDTDeBs

This is definitely something to be aware of both for benchmark builders and users IMO. For longer-running, more difficult tasks, the differences between which agent you use can be big, like a 10% gain in success rate when going from Claude Code to OpenHands.

Making progress in Quantum Field Theory with GPT-5.2. It's happening, for real.

$3M to support the development of open benchmarks!

We updated GPT-5.2 (the instant model) in ChatGPT today. Not a huge change, but hopefully you find it a little better.

We fixed search over your history (past threads) on Perplexity. Works really well now. https://t.co/fsDwXcBCz7

A truly generative meta-model of activations, for steering, probing, and understanding LLMs at scale!

We've upgraded Perplexity's Advanced Deep Research harness to run with Opus 4.6 (from last week's version with Opus 4.5). This furthers our lead on Google's DSQA benchmark over other alternatives. Rolled out to all Max users immediately, and slowly rolling to all Pro users. https://t.co/8wmfBxkwSP

we wrote about our in-house data agent used by ~4k people, from product/eng to research, GTM, finance, and more. it was built with codex, and runs on codex, gpt-5, and our evals & embeddings APIs. like codex, it works like a teammate you can collab with https://t.co/sjPGis8CHk

Wonderful collaboration with @francesarnold! We employed genSLM, the first genome-scale language model, to design functional and versatile enzymes.

I can’t wait for tonight’s rubber match to the Bears-Packers trilogy this season. Both of the regular season games were fantastic (the first settled on a late interception of Caleb Williams, and the second in OT on a Caleb bomb to DJ Moore). Caleb Williams' first playoff game, https://t.co/9tLLmrG6Uf

introducing openai for healthcare. it includes chatgpt for healthcare, as well as models optimized for care providers and workflows. both our APIs and chatgpt support HIPAA compliance requirements. we're partnering with HCA, boston children's hospital, MSK, stanford health and

I've decided to release a minimal, free online version of my upcoming "10-202 - Intro to Modern AI" course, starting January 26: https://t.co/ptnrNmVPyf. As a brief summary, this course introduces students to the elements of modern AI systems: you'll build and train a simple LLM

a master class on the physics of language models by FAIR's @ZeyuanAllenZhu

Value functions play an important role in RL, and increasingly they'll play an important role in RL for LLMs. This new paper led by @rohin_manvi is one step in this direction: using value functions to optimize test-time compute with adaptive computation.

