DeepSeek Doesn’t Deter Nvidia’s Growth

01/28/2025

Here is my understanding of the DeepSeek breakthrough and its repercussions for the AI ecosystem.

DeepSeek made effective use of “test-time scaling” (time scaling, for short), which lets its r1 model think longer during the inference phase. Instead of coming up with the answer immediately, the model spends more compute working through the problem, and can then give a better answer than models that respond right away.
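To make the "more compute at inference buys a better answer" idea concrete, here is a minimal sketch. It does not reproduce DeepSeek's method (r1 spends its extra compute on a longer chain of reasoning); it swaps in a simpler, well-known technique, majority voting over several independent samples. The function name and the 60% per-sample accuracy are illustrative assumptions.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a majority of n independent samples is correct.

    Assumptions (for illustration only): the question has two possible
    answers, n is odd so the vote cannot tie, and each sample is
    independently correct with probability p.
    """
    if n % 2 == 0:
        raise ValueError("use an odd n so the vote cannot tie")
    need = n // 2 + 1  # smallest number of correct samples that wins the vote
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


if __name__ == "__main__":
    # More samples means more inference compute per query.
    for n in (1, 3, 9, 27):
        print(n, round(majority_vote_accuracy(0.6, n), 3))
```

Under these assumptions, accuracy climbs as the number of samples grows, but every extra sample is extra GPU time, which is the trade-off the rest of this post is about.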

How did the model get to that level of efficiency?

DeepSeek used a lot of interesting and effective techniques to make better use of its resources, and this article from NextPlatform does an excellent job with the details.

Besides effective time scaling, the model was distilled from the answers of other models, including the models behind ChatGPT.

What does that mean for the future of AGI, AI, ASI, and so on?

Time scaling will be adopted more widely, and tech leaders across Silicon Valley are already working to improve their own methods as cost-effectively as possible. That is the logical next step: for AI to be any good, superior inference was always going to be the differentiator and the source of added value.

Time scaling can be done at the edge as the software gets smarter.

If the software gets smarter, will it require more GPUs?

I think GPU requirements will not diminish, because you need GPUs for both training and time scaling, and smarter software will still need to distill data.

Cheaper LLMs are not a plug-and-play replacement. They will still require significant investment and expertise to train and to turn into an effective inference model. Likewise, a 10x reduction in cost is a good target, but it will compromise quality and performance. Eventually, the lower-tier market will get crowded and commoditized (democratized, if you will), which may call for cheaper hardware and architectures from AI chip designers and give them an opportunity to serve lower-tier customers.

Inferencing

Over time, yes, inference will become more important. Nvidia has been talking for a long time about the scaling law, under which the role of training diminishes and the need for smarter inference grows. They are working on this as well; I even suspect that the $3,000 Digits they showcased for edge computing will provide some of the power needed.

Reducing variable costs per token/query is huge: the variable cost will fall, which is a major boon to the AI industry. Previously, retrieving and answering tokens cost more than the entire monthly subscription to ChatGPT or Gemini.

From Gavin Baker on X on APIs and Costs:

R1 from DeepSeek seems to have done that: “r1 is cheaper and more efficient to inference than o1 (ChatGPT). r1 costs 93% less to *use* than o1 per each API, can be run locally on a high end work station and does not seem to have hit any rate limits which is wild.”

However, “Batching massively lowers costs and more compute increases tokens/second so still advantages to inference in the cloud.”
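The two cost claims quoted above reduce to simple arithmetic. The sketch below is only illustrative: the function names, the $100 baseline, the $2 per GPU-hour rate, and the 20 tokens/second figure are my own placeholder assumptions, not published prices, and the batching model assumes throughput grows linearly with batch size, which real hardware only approximates.

```python
def cost_after_reduction(baseline_cost: float, reduction: float) -> float:
    """Cost remaining after a fractional reduction (0.93 means 93% less)."""
    return baseline_cost * (1.0 - reduction)


def cost_per_million_tokens(gpu_cost_per_hour: float,
                            tokens_per_second_per_request: float,
                            batch_size: int) -> float:
    """Serving cost per million tokens on one GPU.

    Simplifying assumption: every request in the batch keeps the same
    tokens/second, so total throughput grows linearly with batch size.
    """
    tokens_per_hour = tokens_per_second_per_request * batch_size * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # A hypothetical $100 API bill at a 93% lower price.
    print(cost_after_reduction(100.0, 0.93))
    # Hypothetical GPU at $2/hour, 20 tokens/second per request.
    for batch in (1, 8, 32):
        print(batch, round(cost_per_million_tokens(2.0, 20.0, batch), 2))
```

The second function is why cloud inference keeps an edge: a workstation serving one user runs at batch size 1, while a provider can spread the same GPU-hour across many concurrent requests.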

It is comparable to o1 from a quality perspective although lags o3.

There were real algorithmic breakthroughs that led to it being dramatically more efficient both to train and inference.

On training costs and real costs:

Training in FP8, MLA and multi-token prediction are significant. It is easy to verify that the r1 training run only cost $6m.
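The roughly $6m figure can be checked with back-of-the-envelope arithmetic. As far as I recall, it traces back to the DeepSeek-V3 technical report, which cites about 2.788 million H800 GPU-hours priced at an assumed rental rate of $2 per GPU-hour; treat both constants below as assumptions to verify against the report. The 2048-GPU cluster size is the one mentioned in this post.

```python
GPU_HOURS = 2_788_000           # total H800 GPU-hours, as reported (assumption)
RENTAL_USD_PER_GPU_HOUR = 2.0   # rental price assumed in the report (assumption)
CLUSTER_GPUS = 2048             # cluster size mentioned in this post


def training_run_cost(gpu_hours: float, usd_per_gpu_hour: float) -> float:
    """Rental cost of a training run, ignoring research, staff, and data."""
    return gpu_hours * usd_per_gpu_hour


def wall_clock_days(gpu_hours: float, gpus: int) -> float:
    """Days the run would take if every GPU in the cluster is busy throughout."""
    return gpu_hours / gpus / 24


if __name__ == "__main__":
    print(training_run_cost(GPU_HOURS, RENTAL_USD_PER_GPU_HOUR))
    print(round(wall_clock_days(GPU_HOURS, CLUSTER_GPUS), 1))
```

The arithmetic holds on its own terms: about $5.6m, and a little under two months on 2048 GPUs. What it leaves out is everything that is not the final run.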

The general consensus is that the “REAL” costs of the DeepSeek model are much larger than the $6m given for the r1 training run.

Omitted are:

Hundreds of millions of dollars already spent on prior research, plus access to much larger clusters.

DeepSeek likely had more than 2048 H800s; an equivalently smart team can’t just spin up a 2000 GPU cluster and train r1 from scratch with $6m.

There was a lot of distillation – i.e. it is unlikely they could have trained this without unhindered access to GPT-4o and o1, which is ironic because you’re banning the GPUs but giving access to distill leading edge American models... Why buy the cow when you can get the milk for free?

The NextPlatform, too, expressed doubts about DeepSeek’s resources:

“We are very skeptical that the V3 model was trained from scratch on such a small cluster.”

A schedule of Nvidia’s geographical revenues for Q3-FY2025 showed 15% of Nvidia’s revenue, or over $4Bn, “sold” to Singapore, with the caveat that Singapore may not be the ultimate destination. That raises the suspicion that DeepSeek may have gotten access to Nvidia’s higher-end GPUs despite the US export ban, or stockpiled them before the ban.

Better software and inference is the way of the future

As one of the AI vendors at CES told me, she had the algorithms to answer customer questions and provide analytical insights at the edge for several customers. Her company has the customers’ data and the software, but it couldn’t scale because AWS was charging too much for cloud GPU usage when it didn’t need that much power. So besides r1’s breakthrough in AGI, this movement has been afoot for a while, and it will spur investment and innovation in inference.

We will definitely continue to see demand for high-end Blackwell GPUs to train on data and create better models for at least the next 18 to 24 months, after which the focus should shift to inference. As Nvidia’s CEO said, 40% of their GPUs are already being used for inference.
