AI Czar David Sacks Explains the DeepSeek Freak Out

February 02, 2025 · 12:28
One of the really cool things about this job is that when something like this happens, I get to talk to everyone, and everyone wants to talk. I feel like I've talked to, maybe not everyone, but most of the top people in AI, and there are definitely a lot of takes all over the map on DeepSeek. But I feel like I've started to put together a synthesis based on hearing from the top people in the field.

It was a bit of a freakout. It's rare that a model release is going to be a global news story or cause a trillion dollars of market-cap decline in one day. So it's interesting to think about why this was such a potent news story, and I think it's because there are two things about that company that are different. One is that it's obviously a Chinese company rather than an American company, so you have the whole China-versus-US competition. The other is that it's an open-source company, or at least the R1 model is open source, so you've got the whole open-source-versus-closed-source debate. If you take either one of those things out, it probably wouldn't have been such a big story, but I think the combination of the two got a lot of people's attention.

A huge part of TikTok's audience, for example, is international. Some of them like the idea that the US may not win the AI race, that the US is kind of getting a comeuppance here, and I think that fueled some of the early attention on TikTok. Similarly, there are a lot of people who are rooting for open source, or who have animosity towards OpenAI, and they were rooting for the idea that there's an open-source model that's going to give away what OpenAI has done at 1/20th the cost. I think all of these things provided fuel for the story.

Now, the question is what we should make of this. I think there are things that are true about the story and things that are not true, or that should be debunked. The, let's call it, true thing here is that if you had said to people a few weeks ago that the second company to release a reasoning model along the lines of o1 would be a Chinese company, I think people would have been surprised by that. So there was a surprise.

Just to back up for people: there are two major kinds of AI models now. There's the base LLM, like ChatGPT-4o, or the DeepSeek equivalent, V3, which they launched a month ago. That's basically like a smart PhD: you ask a question, it gives you an answer. Then there are the new reasoning models, which are based on reinforcement learning, sort of a separate process as opposed to pre-training, and o1 was the first model released along those lines. You can think of a reasoning model as a smart PhD who doesn't give you a snap answer but actually goes off and does the work. You can give it a much more complicated question, and it'll break that complicated problem into a set of smaller problems and then go step by step to solve it. That's called chain of thought. The new generation of agents that are coming are based on this idea of chain of thought: that an AI model can sequentially perform tasks and figure out much more complicated problems.
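As a rough illustration of the chain-of-thought idea described above, here's a minimal sketch in Python. The prompt wording, the ask_model stub, and the FINAL ANSWER convention are all assumptions made for illustration; they are not DeepSeek's or OpenAI's actual interface.

```python
# Minimal chain-of-thought prompting sketch (illustrative only).
# ask_model is a hypothetical stand-in for any chat-completion API call.

def ask_model(prompt: str) -> str:
    """Placeholder for a call to a reasoning or chat model."""
    raise NotImplementedError("wire this up to a model API of your choice")

def solve_with_chain_of_thought(question: str) -> str:
    # Ask the model to decompose the problem and reason step by step,
    # then emit the final answer on a marked line that we can parse out.
    prompt = (
        "Break the problem into smaller sub-problems and solve them "
        "one at a time, showing your reasoning.\n"
        f"Problem: {question}\n"
        "End with a line formatted exactly as: FINAL ANSWER: <answer>"
    )
    reply = ask_model(prompt)
    for line in reply.splitlines():
        if line.startswith("FINAL ANSWER:"):
            return line.removeprefix("FINAL ANSWER:").strip()
    return reply  # fall back to the raw reply if no marker was found
```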
So OpenAI was the first to release this type of reasoning model. Google has a similar model they're working on called Gemini 2.0 Flash Thinking, and they've released an early prototype of this called Deep Research. Anthropic has something, but I don't think they've released it yet. So other companies have similar models to o1, either in the works or in some sort of private beta, but DeepSeek was really the next one after OpenAI to release the full public version of one, and moreover they open-sourced it. This created a pretty big splash, and I think it was legitimately surprising to people that the next big company to put out a reasoning model like this would be a Chinese company, and moreover that they would open-source it and give it away for free. And I think the API access is something like 1/20th the cost. All of these things really did drive the news cycle, and I think for good reason, because if you had asked most people in the industry a few weeks ago how far behind China is on AI models, they would have said six to twelve months, and now I think they might say something more like three to six months, because o1 was released about four months ago and R1 is comparable to it. So it's definitely moved up people's time frames for how close China is on AI.

Now let's take the claim that they only did this for $6 million. On this one I'm with Palmer Luckey and Brad Gerstner and others, and I think this has been pretty much corroborated by everyone I've talked to: that number should be debunked. First of all, it's very hard to validate a claim about how much money went into the training of this model; it's not something we can empirically discover. But even if you accept it at face value, that $6 million was for the final training run. So when the media hypes up these stories saying that this Chinese company did it for six million and these dumb American companies did it for a billion, it's not an apples-to-apples comparison. If you were to make the apples-to-apples comparison, you would need to compare the final-training-run cost of DeepSeek to that of OpenAI or Anthropic. And what the founder of Anthropic has said, and what I think Brad has said, being an investor in OpenAI and having talked to them, is that the final training run cost was more in the tens of millions of dollars about nine or ten months ago. So it's not $6 million versus a billion.

OK, so the billion-dollar number might include all the hardware they've bought, the years of work put into it, a holistic number as opposed to the training number?

Yeah, it's not fair to compare, let's call it, a soup-to-nuts number, a fully loaded number, for the American AI companies to the final training run by the Chinese company.

But real quick, Sacks: you've got an open-source model, and the white paper they put out is very specific about what they did to make it and the results they got out of it. I don't think they give the training data, but you could start to stress-test what they've already put out there and see if you can do it that cheap.

Essentially, like I said, I think it is hard to validate the number. Let's just assume we give them credit for the six-million number; my point is less that they couldn't have done it and more that we need to be comparing like to like. So if, for example, you're going to look at the fully loaded cost of what it took DeepSeek to get to this point, then you would need to look at what their R&D cost to date has been across all the models, all the experiments, and all the training runs they've done, plus the compute cluster that they surely have. Dylan Patel, who's a leading semiconductor analyst, has estimated that DeepSeek has about 50,000 Hoppers; specifically, he said they have about 10,000 H100s, 10,000 H800s, and 30,000 H20s.

Now, the cost of that, sorry, is that DeepSeek, or is it DeepSeek plus the hedge fund?

DeepSeek plus the hedge fund, but it's the same founder. And by the way, that doesn't mean they did anything illegal, because the H100s were banned under export controls in 2022 and then the H800s in 2023, but this founder was very farsighted, he was ahead of the curve, and through his hedge fund he was using AI to do algorithmic trading, so he bought these chips a while ago. In any event, you add up the cost of a compute cluster with 50,000-plus Hoppers and it's going to be over a billion dollars.
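As a rough back-of-the-envelope check on that billion-dollar figure, here's a quick sketch. The per-GPU prices are assumptions made up for illustration, not figures quoted in the episode; only the chip counts come from the estimate above.

```python
# Hypothetical per-unit prices (assumed for illustration, in USD).
assumed_price = {"H100": 30_000, "H800": 25_000, "H20": 15_000}
# Chip counts from the estimate cited above.
fleet = {"H100": 10_000, "H800": 10_000, "H20": 30_000}

total = sum(assumed_price[gpu] * count for gpu, count in fleet.items())
print(f"Estimated GPU hardware cost: ${total / 1e9:.2f}B")
# With these assumed prices the GPUs alone come to about $1.0B,
# before networking, data-center build-out, power, and staff.
```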
So this idea that you've got this scrappy company that did it for only six million is just not true. They have a substantial compute cluster that they used to train their models, and frankly, that doesn't count any chips they might have beyond the 50,000, chips they might have obtained in violation of export restrictions, which obviously they're not going to admit to. We just don't know the full extent of what they have. So I just think it's worth pointing out that that part of the story got overhyped.

It's hard to know what's fact and what's fiction. Everybody who's on the outside guessing has their own incentive. If you're a semiconductor analyst who is effectively massively bullish on Nvidia, you want it to be true that it wasn't possible to train this for $6 million. Obviously, if you're the person making a disruptive alternative, you want it to be true that it was trained for $6 million. All of that, I think, is speculation. The thing that struck me was how different their approach was, and this was just mentioned, but if you dig into not just the original DeepSeek white paper but also the subsequent papers they've published that refine some of the details, I do think this is a case, and Sacks, you can tell me if you disagree, where necessity was the mother of invention.

I'll give you two examples where I read these things and thought, man, these guys are really clever. The first is, as you said, let's put a pin in whether they distilled o1, which we can talk about in a second. But at the end of the day, these guys asked, how am I going to do this reinforcement learning thing? And they invented a totally different algorithm. There was an orthodoxy, this thing called PPO that everybody used, and they said, no, we're going to use something else; I think it's called GRPO. It uses a lot less compute and memory, and it's highly performant. So maybe they were constrained, Sacks, practically speaking, by some amount of compute, and that caused them to find this, which you may not have found if you had just a total surplus of compute availability.
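For a sense of what GRPO changes relative to PPO, here's a minimal sketch of the group-relative advantage computation at the heart of the published idea: sample several completions for the same prompt, score them, and normalize each reward against its own group rather than training a separate value (critic) network. This is a simplified illustration, not DeepSeek's actual training code, and the reward numbers below are made up.

```python
import statistics

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """GRPO-style advantages: normalize each sampled completion's reward
    against the mean and std of its own group, so no learned critic is needed."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against division by zero
    return [(r - mean) / std for r in rewards]

# Toy example: four completions sampled for one prompt and scored by some
# reward function (e.g. a correctness check); the scores are invented.
rewards = [1.0, 0.0, 0.5, 0.0]
print(group_relative_advantages(rewards))
# Completions scoring above the group average get a positive advantage,
# which is then plugged into a PPO-like clipped policy update.
```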
And then the second thing that was crazy: everybody is used to building models and compiling through CUDA, which is Nvidia's proprietary language, which I've said a couple of times is their biggest moat, but it's also the biggest vector for lock-in. These guys worked totally around CUDA and used something called PTX, which goes right to the bare metal and is controllable; it's effectively like writing assembly. Now, the only reason I'm bringing these up is that we, meaning the West, with all the money that we've had, didn't come up with these ideas. And I think part of why we didn't is not that we're not smart enough, but that we weren't forced to, because the constraints didn't exist. So I just wonder how we make sure we learn this principle, meaning: when the AI company wakes up, rolls out of bed, and some VC gives them $200 million, maybe that's not the right answer for a Series A or a seed, and maybe the right answer is $2 million, so that they do these DeepSeek-like innovations. Constraint makes for great art.

What do you think, Freedberg, when you're looking at this?

Well, I think it also enables a new class of investment opportunity. Given the low cost and the speed, it really highlights that maybe the opportunity to create value doesn't really sit at that level in the value chain, but further upstream. Somebody made a comment on Twitter today that was pretty funny, riffing on the idea that the wrapper may be the moat, which is true. At the end of the day, if model performance continues to improve and get cheaper, and it's so competitive that it commoditizes much faster than anyone even thought, then the value is going to be created somewhere else in the value chain. Maybe it's not the wrapper; maybe it's with the user. And maybe, and here's an important point, maybe it's further out in the economy. When electricity production took off in the United States, it's not like the companies making all the electricity captured a lot of the money; it's the rest of the economy that accrues a lot of the value.

Podspun Insights

In this episode, a lively discussion unfolds around the recent release of a groundbreaking AI model by a Chinese company, DeepSeek. The conversation dives into the implications of this release, which has sent ripples through the tech world and sparked debates about the future of AI. The host shares insights from conversations with top industry experts, revealing a fascinating synthesis of opinions on why this model's launch became a global news sensation. The mix of U.S.-China competition and the open-source versus closed-source narrative adds layers of intrigue, making it a pivotal moment in AI history.

Listeners are taken on a rollercoaster ride through the complexities of AI development, including the differences between traditional large language models and the new reasoning models that can tackle intricate problems through a method called Chain of Thought. The episode highlights how the unexpected emergence of a Chinese company in this space has shifted perceptions about the global AI race, with experts reevaluating how close China is to catching up.

As the dialogue progresses, the host addresses the sensational claims surrounding the cost of developing the model, emphasizing the need for accurate comparisons and debunking myths that have fueled media hype. With a mix of technical insights and strategic analysis, the episode captures the essence of innovation driven by necessity, showcasing how constraints can lead to remarkable breakthroughs.

Ultimately, this episode is not just about AI; it’s a reflection on the evolving landscape of technology, investment opportunities, and the creative solutions that arise when faced with challenges. It’s a thought-provoking exploration that leaves listeners pondering the future of AI and its broader implications for society.

Badges

This episode stands out for the following:

  • Most shocking: 90
  • Best overall: 90
  • Best concept / idea: 90
  • Most surprising: 90

Episode Highlights

  • The Global Impact of AI Releases
    A model release can become a global news story, especially when it involves international competition.
    “It's rare that a model release is going to be a global news story.”
    @ 00m 25s
    February 02, 2025
  • China's AI Surprise
    The second company to release a reasoning model was unexpectedly a Chinese company, surprising many in the industry.
    “Surprise! The second company to release a reasoning model is Chinese.”
    @ 01m 56s
    February 02, 2025
  • Debunking the $6 Million Myth
    The claim that a Chinese company trained its model for only $6 million is misleading and oversimplified.
    “This idea that you've got this scrappy company that did it for only six million is just not true.”
    @ 08m 07s
    February 02, 2025

Episode Quotes

Key Moments

  • Talking to Experts @ 00:08
  • Global News Impact @ 00:25
  • AI Competition @ 01:56
  • Debunking Myths @ 08:07
  • Innovation Under Constraint @ 11:28
