Fountainhead Investing

  • Objective Analysis: Research On High Quality Companies With Sustainable Moats
  • Tech Focused: 60% Allocated To AI, Semiconductors, Technology

5 Star Tech Analyst Focused On Excellent Companies With Sustainable Moats

Categories
AI, Cloud Service Providers, Industry, Semiconductors, Stocks

Hyperscalers, Meta and Microsoft Confirm Massive Capex Plans

Meta (META) has committed to $60-65Bn of Capex and Microsoft (MSFT) to $80Bn: after the DeepSeek revelations, this is a great sign of confidence for Nvidia (NVDA), Broadcom (AVGO), Marvell (MRVL), and other semiconductor companies. Nvidia, Broadcom, and Marvell should continue to see solid demand in 2025.

Meta CEO Mark Zuckerberg also mentioned that one of Meta’s advantages (and, by the same rationale, other US firms’) is a continuous supply of chips, which DeepSeek will not have; US customers like Meta will easily outperform when it comes to scaling and serving customers. (They will fine-tune Capex between training and inference.) Meta is also looking at custom silicon for other workloads, which will help Broadcom and Marvell.

Meta executives specifically called out a machine-learning system designed jointly with Nvidia as one of the factors driving better-personalized advertising. This is a good partnership and I don’t see it getting derailed anytime soon.

Meta also talked about how squarely focused they are on software and algorithm improvements. Better inference models are the natural progression and the end goal of AI. The goal is to make AI pervasive in all kinds of apps for consumers, businesses, medical breakthroughs, and so on. For that to happen, you still need scalable computing power to reach the threshold where models have been trained enough to provide better inference and/or be generative enough for a specific domain or area of expertise.

This is the tip of the iceberg; we’re nowhere close to a reduction in spending. Most forecasts I looked at saw data center training spend growth slowing only in 2026, with spending on inference then growing at a slower pace. Nvidia’s consensus forecasts show a 50% revenue gain in 2025 and 25% thereafter, so we still have a long way to go.

I also read that Nvidia’s GPUs are doing 40% of inference work; they’re very much on the ball on inference.

The DeepSeek impact: Had DeepSeek’s breakthrough in smarter inference been announced by an American or other non-Chinese company, and had they not claimed a dramatically cheaper cost, it wouldn’t have made the impact it did. The surprise element was the reported total spend and the claim that they didn’t have access to GPUs – it was meant to shock and awe and create cracks in the massive spending ecosystem, which it is doing. But the reported total spend, and the claim of not using high-end GPUs, doesn’t seem plausible, at least to me. Here’s my earlier article detailing some of the reasons. The Chinese government has subsidized every export entry to the world, from furniture to electric vehicles, so why not this one? That has been their regular go-to-market strategy.

Cheaper LLMs are not a plug-and-play replacement. They will still require significant investment and expertise to train and create an effective inference model. I don’t think GPU requirements will diminish, because you need GPUs for training and time scaling, and smarter software will still need to distill data.

As a number, aiming at a 10x reduction in cost is a good target, but it will compromise quality and performance. Eventually, the lower-tier market will get crowded and commoditized – democratized, if you will – which may require cheaper versions of hardware and architecture from AI chip designers as an opportunity to serve lower-tier customers.

American companies will have to work harder, for sure – customers want cheap (Databricks’ CEO’s phone hasn’t stopped ringing with requests for alternative solutions) – unless they TikTok this one as well…

Categories
AI, Cloud Service Providers, Industry, Semiconductors, Stocks

DeepSeek Hasn’t Deep-Sixed Nvidia

01/28/2025

Here is my understanding of the DeepSeek breakthrough and its repercussions on the AI ecosystem:

DeepSeek used “time scaling” effectively, which allows its r1 model to think deeper at the inference phase. By using more compute instead of producing an answer immediately, the model takes longer to search for a better solution and then answers the query better than existing models.
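
To make the idea concrete, here’s a minimal Python sketch of inference-time (“test-time”) scaling. The generate() and score() functions are hypothetical stand-ins for a real model and a real verifier/reward model – the only point is that spending more compute per query (a larger n) buys a better expected answer:

```python
# Minimal sketch of test-time scaling: instead of returning the first
# answer, spend extra inference compute sampling several candidate
# answers and keep the best-scoring one.
# generate() and score() are hypothetical placeholders, not a real API.
import random

def generate(prompt: str) -> str:
    # A real model would sample a chain of reasoning plus an answer here.
    return f"candidate-{random.randint(0, 9)} for: {prompt}"

def score(candidate: str) -> float:
    # A real verifier/reward model would rate answer quality here.
    return random.random()

def best_of_n(prompt: str, n: int) -> str:
    """More inference compute (larger n) -> better expected answer."""
    return max((generate(prompt) for _ in range(n)), key=score)

print(best_of_n("Explain the selloff in NVDA", n=1))   # cheap and fast
print(best_of_n("Explain the selloff in NVDA", n=16))  # slower, usually better
```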

How did the model get to that level of efficiency?

DeepSeek used a lot of interesting and effective techniques to make better use of its resources, and this article from The Next Platform does an excellent job with the details.

Besides effective time scaling, the model distilled answers from other models, including ChatGPT’s.
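
For illustration, here is a toy NumPy sketch of what distillation means – a small “student” model is trained to match a larger “teacher’s” softened output distribution rather than learning from raw data alone. Every name and number below is made up for the example; this is not DeepSeek’s actual method or code:

```python
# Toy sketch of knowledge distillation: train a student to match the
# teacher's softened output distribution (soft targets). Illustrative
# NumPy only; the "teacher" is a frozen random linear map standing in
# for a large model queried for its token probabilities.
import numpy as np

rng = np.random.default_rng(0)
dim, vocab, T = 16, 50, 2.0        # feature dim, vocab size, temperature

def softmax(z, T=1.0):
    e = np.exp(z / T - (z / T).max())
    return e / e.sum()

W_teacher = rng.normal(size=(dim, vocab))         # frozen "teacher"
W_student = rng.normal(size=(dim, vocab)) * 0.01  # trainable "student"

lr = 0.5
for _ in range(2000):
    x = rng.normal(size=dim)
    p_t = softmax(x @ W_teacher, T)   # soft targets from the teacher
    p_s = softmax(x @ W_student, T)
    # Gradient of cross-entropy(p_t, p_s) with respect to student logits:
    W_student -= lr * np.outer(x, (p_s - p_t) / T)

# After training, the student's distribution should sit close to the teacher's.
x = rng.normal(size=dim)
p_t, p_s = softmax(x @ W_teacher, T), softmax(x @ W_student, T)
print("KL(teacher || student) =", float((p_t * np.log(p_t / p_s)).sum()))
```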

What does that mean for the future of AGI, AI, ASI, and so on?

Time scaling will be adopted more frequently, and tech leaders across Silicon Valley are responding to improve their methods as cost-effectively as possible. That is the logical next step – for AI to be any good, superior inference was always going to be the differentiator and the value addition.

Time scaling can be done at the edge as the software gets smarter.

If the software gets smarter, will it require more GPUs?

I don’t think GPU requirements will diminish, because you need GPUs for training and time scaling, and smarter software will still need to distill data.

Cheaper LLMs are not a plug-and-play replacement. They will still require significant investment and expertise to train and create an effective inference model. As a number, aiming at a 10x reduction in cost is a good target, but it will compromise quality and performance. Eventually, the lower-tier market will get crowded and commoditized – democratized, if you will – which may require cheaper versions of hardware and architecture from AI chip designers as an opportunity to serve lower-tier customers.

Inferencing

Over time, yes, inference will become more important – Nvidia has long been talking about the scaling law, which diminishes the role of training and raises the need for smarter inference. They are working on this as well; I even suspect that the $3,000 Digits machine they showcased for edge computing will provide some of the power needed.

Reducing variable costs per token/query is huge: the variable cost will come down, which is a huge boon to the AI industry; previously, retrieving and answering the tokens for a query could cost more than an entire monthly subscription to ChatGPT or Gemini.
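
A rough back-of-the-envelope makes the point. The sketch below compares pure output-token costs at roughly the published early-2025 rates for o1 and r1 – the prices and usage figures are illustrative assumptions, so plug in current rates to reproduce:

```python
# Back-of-the-envelope token economics. Prices are illustrative
# placeholders (roughly the early-2025 published output-token rates
# for o1 and r1), not quotes from either vendor.
PRICE_PER_M_OUTPUT = {"o1": 60.00, "r1": 2.19}  # USD per 1M output tokens

def monthly_cost(model: str, queries_per_day: int, tokens_per_answer: int) -> float:
    tokens = queries_per_day * 30 * tokens_per_answer
    return tokens / 1_000_000 * PRICE_PER_M_OUTPUT[model]

for model in ("o1", "r1"):
    print(model, f"${monthly_cost(model, queries_per_day=50, tokens_per_answer=800):,.2f}/mo")
# o1 -> $72.00/mo of output tokens alone, more than a $20 ChatGPT
# subscription; r1 -> $2.63/mo for the same usage, a ~96% reduction.
```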

From Gavin Baker on X, on APIs and costs:

R1 from DeepSeek seems to have done that: “r1 is cheaper and more efficient to inference than o1 (ChatGPT). r1 costs 93% less to *use* than o1 per API call, can be run locally on a high-end workstation, and does not seem to have hit any rate limits, which is wild.”

However, “Batching massively lowers costs and more compute increases tokens/second so still advantages to inference in the cloud.”

It is comparable to o1 from a quality perspective, although it lags o3.

There were real algorithmic breakthroughs that led to it being dramatically more efficient both to train and inference.  

On training costs and real costs:

Training in FP8, MLA, and multi-token prediction are significant. It is easy to verify that the r1 training run itself only cost $6m.

The general consensus is that the “REAL” costs of the DeepSeek model are much larger than the $6Mn given for the r1 training run.

Omitted are:

  • Hundreds of millions of dollars spent on prior research, plus access to much larger clusters.
  • DeepSeek likely had more than 2,048 H800s; an equivalently smart team can’t just spin up a 2,000-GPU cluster and train r1 from scratch with $6m.
  • There was a lot of distillation – i.e., it is unlikely they could have trained this without unhindered access to GPT-4o and o1, which is ironic because you’re banning the GPUs but giving access to distill leading-edge American models… Why buy the cow when you can get the milk for free?

The Next Platform, too, expressed doubts about DeepSeek’s resources:

We are very skeptical that the V3 model was trained from scratch on such a small cluster.

A schedule of geographical revenues for Nvidia’s Q3-FY2025 showed 15% of Nvidia’s revenue – over $4Bn – “sold” to Singapore, with the caveat that it may not be the ultimate destination. This also raises doubts that DeepSeek may have gotten access to Nvidia’s higher-end GPUs despite the US export ban, or stockpiled them before the ban.

Better software and inference are the way of the future

As one of the AI vendors at CES told me, she had the algorithms to answer customer questions and provide analytical insights at the edge for several customers – they have the data from their customers and the software, but they couldn’t scale because AWS was charging them too much for cloud GPU usage when they didn’t need that much power. So besides r1’s breakthrough, this movement has been afoot for a while, and it will spur investment and innovation in inference. We will definitely continue to see demand for high-end Blackwell GPUs to train data and create better models for at least the next 18 to 24 months, after which the focus should shift to inference; as Nvidia’s CEO said, 40% of their GPUs are already being used for inference.

Categories
Semiconductors

Teradyne’s Weak 2024 Outlook Sparks Concerns, While Advantest Surges Ahead

Teradyne’s guidance was a big disappointment – they’re forecasting zero growth for 2024, mainly because the first two quarters will be lower, with growth picking up in Q3 and Q4. Consensus estimates were for 10% growth in 2024.

Even as end clients like cloud service providers and hyperscalers have bought more semiconductors in the last six months, capacity utilization of Teradyne’s testing equipment is still low, and new equipment purchases will not be triggered till that capacity is used up. This isn’t always a linear relationship, and often there are lags. Second, the mobile and PC markets have also not seen increased demand, and Teradyne will not get visibility into mobile phone demand till April (read: Apple via TSM, which is their largest customer).

Management believes that there is little downside left; they see utilization rates improving, and unit growth in PCs and smartphones could be a tailwind in the second half. They are still maintaining 2026 estimates of $4.3Bn in revenue and over $6 in EPS, but they have a lot of catching up to do.

Contrast this with its closest competitor, Advantest (ATEYY), which surprised this morning and increased guidance by 2-3% for the next quarter. They’re expecting a great second half as well.

I own more Advantest than Teradyne, and it’s also done much better for me. If things don’t improve at Teradyne, I may just focus on Advantest.

Categories
Networking

Arista Networks: A Strong Buy for Long-Term Growth in Network Infrastructure

Arista Networks (ANET): Buy at $245, one-year target $280.

Investment horizon: 5 years, 16-20% annual return. P/E 34, 3-5 year EPS growth of 14-16%.
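
A quick arithmetic check of those numbers, using only the figures in this post: compounding the $245 price at the stated 16-20% annual return gives the 5-year range, and the $280 one-year target implies roughly a 14% first-year gain.

```python
# Sanity-check the rating line above (figures from the post, not advice).
buy, one_year_target = 245.0, 280.0
for cagr in (0.16, 0.20):
    print(f"{cagr:.0%} CAGR -> ${buy * (1 + cagr) ** 5:,.0f} in 5 years")
# 16% -> ~$515, 20% -> ~$610
print(f"$280 target implies {one_year_target / buy - 1:.1%} in year one")  # ~14.3%
```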

Arista is the best large-scale network provider for hyperscalers like Meta and Microsoft. Unlike Cisco (CSCO), Arista didn’t focus on selling gear; instead, it partnered with hyperscalers to build their networks and platforms from scratch. This is a unique competitive advantage and very profitable too; Arista boasts the best margins (32% operating profit) and cash flow in the industry. It is a bit expensive, with much of the earnings growth of 44% in 2023 already priced in and the stock doubling from $120 last year. Still, it is an excellent long-term play as the pick-and-shovel play for AI and high-speed data networks; it tends to surprise, so EPS growth could well be higher.

I’ve owned it since May 2023, and I add on declines; my last purchase was around $231.