
NVIDIA stock to lose $400 billion (US tech $1 trillion) after DeepSeek release

Myths

Member
Nvidia and OpenAI when investors find out they ain't need all that money

 
Last edited:
And personally I think open-source AI is a Pandora's box that can't be closed again, and now we have hostile actors actively using it. And not just the CCP.
1000% agree. We, as a society, are not mature enough or responsible enough to have tech this powerful so easily and readily available to everyone on the planet. I mean, just think of what kind of damage a modern-day school shooter will be able to do in the future with help from AI (or AGI).
 

Fabieter

Member
I moved to Australia, and because they have no car industry to "protect" they allow Chinese cars here. They really are as good as the major brands, in most cases with better features, but at a fraction of the cost.

The problem is that China's government is subsidizing these companies heavily, so they can sell those products at garage-sale prices. That goes against every fair trade agreement, and they are rightfully banned in a lot of countries.
 

hyperbertha

Member
Not so sure about AGI by 2030. LLMs are not enough. But this field advances so fast that I'm really predicting on a hunch. It doesn't seem very far off and seems attainable by 2030, but there are probably a few hurdles in the way. The whole training approach might have to change. On top of that, government roadblocks could slow things down.
Does it really advance so fast? We haven't had any meaningful advance since ChatGPT. o1 brute-forces reasoning to the point where it can solve a few more problems, but it's not really reasoning, as many experts say. I tend to agree. It seems these reasoning models are just memorizing the kinds of steps a particular type of problem might warrant.
 
Last edited:

nemiroff

Gold Member
What's good about this (DeepSeek R1): From here on, apps will be lightweight enough to run locally on devices without needing to be connected all the time.

In the bigger picture, when the dust settles, I can't see there being much if any proprietary stuff going on here to stop everyone else from doing the same. And the "saved" energy will just be used to further accelerate the overarching timeline. I mean, there's no stopping the AI race, and we're just at the beginning. Who knows, Nvidia stock may soon be up again for all I know.

I watched a software engineer explain this, and here's the summary:

Distillation: Deepseek R1 uses a technique called "distillation" to learn from larger, more powerful AI models. Instead of trying to replicate the entire knowledge base of these larger models, it focuses on mimicking their desired outputs. This allows Deepseek R1 to achieve impressive performance with a much smaller size, making it more efficient in terms of both processing power and memory usage.

Multiple Large Language Models: Deepseek R1 is trained on a diverse set of large language models. This exposure to different perspectives helps it become more robust and adaptable, improving its overall performance and efficiency.

Smaller Size: Deepseek R1's smaller size is a major factor in its efficiency. It allows the model to run on less powerful hardware, making it more accessible to a wider range of users and reducing the computational resources required for its operation.

In essence, DeepSeek R1's efficiency stems from its ability to learn effectively from larger models while maintaining a compact size. This approach allows it to deliver powerful AI capabilities without the need for excessive computing power.
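For the curious, here's what that distillation objective looks like in code. This is a minimal sketch with a generic teacher/student pair; the tiny stand-in models, temperature, and hyperparameters are illustrative assumptions, not DeepSeek's actual training setup:

```python
import torch
import torch.nn.functional as F

# Stand-ins: a larger frozen "teacher" and a much smaller "student" we deploy.
teacher = torch.nn.Sequential(
    torch.nn.Linear(16, 64), torch.nn.ReLU(), torch.nn.Linear(64, 8)
)
student = torch.nn.Linear(16, 8)
opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's outputs so more signal transfers

for step in range(100):
    x = torch.randn(32, 16)  # a batch of inputs
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / T, dim=-1)
    student_logp = F.log_softmax(student(x) / T, dim=-1)
    # KL divergence pulls the student's output distribution toward the
    # teacher's, i.e. "mimicking desired outputs" rather than copying weights.
    loss = F.kl_div(student_logp, teacher_probs, reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The key design choice is that the loss targets the teacher's softened output distribution rather than hard labels, which is what lets a much smaller student absorb a larger model's behavior.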


Edit:

It was a summary from this video:

 
Last edited:

vkbest

Member
The way the market reacted tho.. :goog_rolleyes:
The market reaction is pretty normal; the abnormal thing was investing so much money in those companies. I mean, they have been investing so much money because they thought OpenAI, Nvidia, etc. were the winning horse. Now a small startup has made a model 20x more efficient in training and inference.
 

nemiroff

Gold Member
The market reaction is pretty normal; the abnormal thing was investing so much money in those companies. I mean, they have been investing so much money because they thought OpenAI, Nvidia, etc. were the winning horse. Now a small startup has made a model 20x more efficient in training and inference.

The Nvidia stock drop likely reflects the concern that the market may not need as many powerful and expensive GPUs to build and run these models as anticipated. The rest of the tech market may just have been triggered as a result.

Keep in mind: Deepseek R1 is most likely built on all of those companies' existing models. And there's no stopping those same companies from replicating the same type of distillation builds. Good thing is, users may benefit from potentially cheaper AI services in the end.

We're at the beginning of the AI race. I have no doubt more shifts and surprises are incoming.


Disclaimer: I have no idea; I'm just a normal guy in his armchair.
 

GHG

Gold Member




Text version:

1) DeepSeek r1 is real, with important nuances. Most important is the fact that r1 is so much cheaper and more efficient to inference than o1, not the $6m training figure. r1 costs 93% less to *use* than o1 per API call, can be run locally on a high-end workstation, and does not seem to have hit any rate limits, which is wild. Simple math is that every 1b active parameters requires 1 GB of RAM in FP8, so r1 requires 37 GB of RAM. Batching massively lowers costs and more compute increases tokens/second, so there are still advantages to inference in the cloud. Would also note that there are true geopolitical dynamics at play here and I don't think it is a coincidence that this came out right after "Stargate." RIP, $500 billion - we hardly even knew you.

Real: 1) It is/was the #1 download in the relevant App Store category. Obviously ahead of ChatGPT; something neither Gemini nor Claude was able to accomplish. 2) It is comparable to o1 from a quality perspective, although it lags o3. 3) There were real algorithmic breakthroughs that led to it being dramatically more efficient both to train and to inference. Training in FP8, MLA and multi-token prediction are significant. 4) It is easy to verify that the r1 training run only cost $6m. While this is literally true, it is also *deeply* misleading. 5) Even their hardware architecture is novel and I will note that they use PCI-Express for scale-up.

Nuance: 1) The $6m does not include "costs associated with prior research and ablation experiments on architectures, algorithms and data" per the technical paper. "Other than that, Mrs. Lincoln, how was the play?" This means that it is possible to train an r1-quality model with a $6m run *if* a lab has already spent hundreds of millions of dollars on prior research and has access to much larger clusters. DeepSeek obviously has way more than 2048 H800s; one of their earlier papers referenced a cluster of 10k A100s. An equivalently smart team can't just spin up a 2000-GPU cluster and train r1 from scratch with $6m. Roughly 20% of Nvidia's revenue goes through Singapore; 20% of Nvidia's GPUs are probably not in Singapore, despite their best efforts. 2) There was a lot of distillation - i.e. it is unlikely they could have trained this without unhindered access to GPT-4o and o1. As
@altcap
pointed out to me yesterday, it's kinda funny to restrict access to leading-edge GPUs and not do anything about China's ability to distill leading-edge American models - it obviously defeats the purpose of the export restrictions. Why buy the cow when you can get the milk for free?

2) Conclusions: 1) Lowering the cost to train will increase the ROI on AI. 2) There is no world where this is positive for training capex or the "power" theme in the near term. 3) The biggest risk to the current "AI infrastructure" winners across tech, industrials, utilities and energy is that a distilled version of r1 can be run locally at the edge on a high-end workstation (someone referenced a Mac Studio Pro). That means a similar model will run on a superphone in circa 2 years. If inference moves to the edge because it is "good enough," we are living in a very different world with very different winners - i.e. the biggest PC and smartphone upgrade cycle we have ever seen. Compute has oscillated between centralization and decentralization for a long time. 4) ASI is really, really close and no one really knows what the economic returns to superintelligence will be. If a $100 billion reasoning model trained on 100k-plus Blackwells (o5, Gemini 3, Grok 4) is curing cancer and inventing warp drives, then the returns to ASI will be really high, and training capex and power consumption will steadily grow; Dyson Spheres will be back to being the best explanation for Fermi's paradox. I hope the returns to ASI are high - it would be so awesome. 5) This is all really good for the companies that *use* AI: software, internet, etc. 6) From an economic perspective, this massively increases the value of distribution and *unique* data - YouTube, Facebook, Instagram and X. 7) American labs are likely to stop releasing their leading-edge models to prevent the distillation that was so essential to r1, although the cat may already be entirely out of the bag on this front - i.e. r1 may be enough to train r2, etc.

Grok-3 looms large and might significantly impact the above conclusions. This will be the first significant test of scaling laws for pre-training, arguably since GPT-4. In the same way that it took several weeks to turn v3 into r1 via RL, it will likely take several weeks to run the RL necessary to improve Grok-3's reasoning capabilities. The better the base model, the better the reasoning model should be, as the three scaling laws are multiplicative: pre-training, RL during post-training, and test-time compute during inference (a function of the RL). Grok-3 has already shown it can do tasks beyond o1 - see the Tesseract demo - and how far beyond is going to be important. To paraphrase an anonymous Orc from "The Two Towers," meat might be back on the menu very shortly. Time will tell, and "when the facts change, I change my mind."
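As a quick sanity check of the RAM arithmetic in the quote above (FP8 means one byte per parameter, and r1 activates roughly 37B parameters per token), a few lines of Python reproduce the figure:

```python
# Checking the thread's stated arithmetic, not a full memory model:
# KV cache, activations and the full (non-active) MoE weights are ignored.
active_params = 37e9       # r1's active parameters per token (MoE)
bytes_per_param_fp8 = 1    # FP8 = 1 byte per parameter
ram_gb = active_params * bytes_per_param_fp8 / 1e9
print(f"{ram_gb:.0f} GB")  # -> 37 GB, matching the claim
```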
 
Last edited:

E-Cat

Member
What evidence is there that simply injecting more compute will magically let it get 90 percent on FrontierMath? These models are not really reasoning. There is a high likelihood that we have hit an architectural brick wall and that some new breakthrough is needed.
You clearly have no idea what you're talking about. There was a 3-month gap between o1 and o3, during which performance went from 2% to 25%. No architectural breakthrough between the two models, just reinforcement learning bootstrapping the previous generation. There was no wall in performance as long as more test-time compute was added. Now we have 25x more efficiency for free; cool. I'm not relying on that AT ALL for my prognosis, btw. See o4's performance on this benchmark in March or April and come talk to me again.
 

StereoVsn

Gold Member
What's good about this (DeepSeek R1): From here on, apps will be lightweight enough to run locally on devices without needing to be connected all the time.

In the bigger picture, when the dust settles, I can't see there being much if any proprietary stuff going on here to stop everyone else from doing the same. And the "saved" energy will just be used to further accelerate the overarching timeline. I mean, there's no stopping the AI race, and we're just at the beginning. Who knows, Nvidia stock may soon be up again for all I know.

I watched a software engineer explain this, and here's the summary:

Distillation: Deepseek R1 uses a technique called "distillation" to learn from larger, more powerful AI models. Instead of trying to replicate the entire knowledge base of these larger models, it focuses on mimicking their desired outputs. This allows Deepseek R1 to achieve impressive performance with a much smaller size, making it more efficient in terms of both processing power and memory usage.

Multiple Large Language Models: Deepseek R1 is trained on a diverse set of large language models. This exposure to different perspectives helps it become more robust and adaptable, improving its overall performance and efficiency.

Smaller Size: Deepseek R1's smaller size is a major factor in its efficiency. It allows the model to run on less powerful hardware, making it more accessible to a wider range of users and reducing the computational resources required for its operation.

In essence, DeepSeek R1's efficiency stems from its ability to learn effectively from larger models while maintaining a compact size. This approach allows it to deliver powerful AI capabilities without the need for excessive computing power.


Edit:

It was a summary from this video:


The one other thing to consider is that they used ChatGPT and Llama as the base first, so they cut down their initial dev time. Those billions of dollars spent by the US essentially allowed this Chinese company to provide a much cheaper model.

The pattern repeats itself from the 1990s-2000s, and the US keeps doing this shit.
 

GHG

Gold Member


The release of DeepSeek AI from a Chinese company should be a wake-up call for our industries that we need to be laser-focused on competing to win, because we have the greatest scientists in the world.

Even Chinese leadership told me that this is very unusual. When you hear of DeepSeek, when you hear of somebody coming up with something, we always have the ideas, we're always first. So I would say that's a positive; that could be very much a positive development.

So instead of spending billions and billions, you'll spend less and hopefully come up with the same solution. In the very near future we're going to be placing tariffs on foreign production of computer chips, semiconductors and pharmaceuticals to return production of these essential goods to the United States of America. They left us and went to Taiwan, which has about 98% of the chip business, by the way, and we want them to come back.

 

TVexperto

Member
I think the point is that the EU in general is way behind in AI research. There is Mistral (I think that's the name), but it's nowhere near the front-runners.

Separately, as far as gaming goes, a far more interesting development is the new Tencent model that turns images and text prompts into full-on 3D models that you can use for various purposes.

Read up on Hunyuan3D, it’s pretty damn wild. I feel sorry for 3D artists :(.

Edit: It’s also free and on Hugging Face and GitHub.
Claude is a French company and, in my opinion, much better at coding than ChatGPT because it has Artifacts.
 

Buggy Loop

Member


As soon as any media starts comparing American models spending billions and China just doing it for $6M, I stop listening.

It's stupid as fuck.

The $6M does not include the costs associated with prior research and ablation experiments on architectures, algorithms and data. They used American models to distill from and learn.



You put export limits on AI GPUs, but you let China use GPT to distill from it unhindered.

slow clap

But continue spreading panic, GHG.

ByteDance already has a model that beats DeepSeek, and they'll grow like mushrooms month to month. OMG, AI evolves? /pikachu face
 

GHG

Gold Member
As soon as any media starts comparing American models spending billions and China just doing it for $6M, I stop listening.

It's stupid as fuck.

The $6M does not include the costs associated with prior research and ablation experiments on architectures, algorithms and data. They used American models to distill from and learn.



You put export limits on AI GPUs, but you let China use GPT to distill from it unhindered.

slow clap

But continue spreading panic, GHG.

ByteDance already has a model that beats DeepSeek, and they'll grow like mushrooms month to month. OMG, AI evolves? /pikachu face


Yes, it's little old me causing panic and instructing hedge funds to offload half a trillion dollars of Nvidia stock.



Sorry.
 

StereoVsn

Gold Member
Claude is a french company and in my opinion much better with coding than chatgpt because it has artifacts
Anthropic is great, but it's a US startup. You might be thinking of Mistral. That one is also pretty decent but doesn't really offer much of a reason to use it.

Edit: Also, the panic is over and markets are up, and so is Nvidia. 😉

And yeah, the US should never have allowed such free use of models or the open-sourcing of things such as Llama.

We will see what Trump does, but the cat is out of the proverbial bag.
 
Last edited:

viveks86

Member
Many experts in the field predict the singularity is actually closer than we think.

Ray Kurzweil is one who predicts we'll hit this level in 2045. Some have made predictions earlier than that, but I think 2045 is the most realistic prediction. Either way, it's happening in our lifetimes.

It could greatly benefit humanity and lead us to a utopian future, or be our catastrophic downfall. It all depends on how we prepare for and manage it. To do this, there needs to be global cooperation on its development.
Yeah, ~2050 sounds about right. There also needs to be some major paradigm shift in computing before we get several orders of magnitude more power: quantum computing, nuclear fusion, or a move away from silicon transistors altogether. Hopefully I'll be retired by then and can witness the biggest leap in humanity's history (or its demise). And if I'm going to become fodder for bioelectricity for the machine overlords, just feed my brain video games for "maximum throughput".

 

GHG

Gold Member


Said this yesterday, but some people are too busy being butthurt to notice the hot tips I'm giving them.

It's not about things "slowing down". Meta is up, Microsoft is attempting to rally back, and Apple (whose chips will benefit the most from this due to their architecture) is up. These (and all other competing) companies will benefit greatly from DeepSeek and can now use what they already have to scale up at no extra cost - along with looking at alternatives to Nvidia; yes, this may well bring AMD back into the picture if the methodologies can be adapted to get around AMD's chip-to-chip communication issues.


----

The Singularity will be here before TSMC has built another factory on US soil. Trump's most idiotic move so far.

The US is over 35 trillion dollars in debt. But yes, let's just continue to spend as much as humanly possible, and even resort to asking the government for money, instead of working towards efficiency first.

When all you have is a hammer, everything looks like a nail.
 
Last edited:

Fess

Member
If there was ever an industry that needed heavy regulation then it is AI. AI should be tools that make life better for humanity, not replace humanity.
Not AI, but I work in an industry where automation has been ramping up. I was very glad to hear that one important point the higher-ups considered in their evaluation of a swap to robotic manufacturing was how it would affect us normal workers.

Would we still have something to do? What would we do? Would we think the new duties were fun?

In the end, I understand that I'm just a number in a math equation about cost reduction, but it's good that they at least somewhat think about the consequences.
 

ResurrectedContrarian

Suffers with mild autism
Sounds like current AI is overrated bullshit grifting

A simple change in coding and you get “better” results

I'm not sure that you follow the change I described.

Replacing supervised fine-tuning with self-rewarding reinforcement by rewarding reasoning chains isn't just a simple change in coding. It's a complex development, and in fact one that builds on OpenAI's own history of research. They pioneered the application of RL to language models and have published extensively on it; these new models take the next step, building on that research and pushing the RL angle even further with a smart simplification of the stack to boost things quickly.
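To make that concrete, here is a toy sketch of the idea: sample a whole reasoning chain, score it with a rule-based verifier, and reinforce chains that end in a correct answer. The four canned "chains", the verifier, and the plain REINFORCE update are all illustrative assumptions; none of this is DeepSeek's or OpenAI's actual code, where chains are sampled token-by-token from an LLM:

```python
import torch

# Toy "policy": logits over four candidate reasoning chains for one prompt.
chains = ["2+2=5", "2+2=22", "2+2=4", "2+2, so the answer is 4"]
correct = [False, False, True, True]
logits = torch.zeros(len(chains), requires_grad=True)
opt = torch.optim.SGD([logits], lr=0.5)

def reward(i: int) -> float:
    # Rule-based verifier: reward a chain only if its final answer checks out.
    return 1.0 if correct[i] else 0.0

for step in range(200):
    probs = torch.softmax(logits, dim=0)
    i = int(torch.multinomial(probs, 1))        # sample one reasoning chain
    baseline = sum(reward(j) * probs[j].item() for j in range(len(chains)))
    advantage = reward(i) - baseline
    # REINFORCE: raise the log-probability of sampled chains with positive
    # advantage; no supervised "gold" reasoning traces are ever shown.
    loss = -advantage * torch.log(probs[i])
    opt.zero_grad()
    loss.backward()
    opt.step()

print(torch.softmax(logits, 0))  # mass shifts onto the verifiably correct chains
```

The point of the simplification is that the training signal is just "did the chain end in a verifiable answer", which scales without human-labeled reasoning steps.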


🤷🏼‍♂️ You can say all the gobbledygook you want, but if it's not capable of searching the internet, it is basically useless.😤

There's a setting right there on the official DeepSeek app that lets you toggle on live internet search by the model.

Why do 7B's image examples still not look that great? Is this a different kind of generative model? The beautiful girl looks pretty terrible compared to what other modern models churn out, which are effectively perfect and photo realistic. 7B's eyes are still fucked up and the chin/jaw/mouth proportions seem off.

Yeah, DeepSeek's image model is not state of the art. It's a neat experiment, and their techniques will be looked at by others, but it's no replacement even for the Flux models, which are significantly more powerful.
 

Topher

Identifies as young
Not AI, but I work in an industry where automation has been ramping up. I was very glad to hear that one important point the higher-ups considered in their evaluation of a swap to robotic manufacturing was how it would affect us normal workers.

Would we still have something to do? What would we do? Would we think the new duties were fun?

In the end, I understand that I'm just a number in a math equation about cost reduction, but it's good that they at least somewhat think about the consequences.

My son works in a manufacturing plant, and I can easily see his job being affected by the same scenario. I think these things can be useful up to a point, but they can also be incredibly shortsighted if the goal of all this is just to make more money for stockholders, everything else be damned.
 

Buggy Loop

Member


And you interconnect them - I mean, for any AI farm worth a damn... he's missing a piece of information.

Stability AI's founder has been benchmarking it for weeks: "The MFU on the deepseek runs is oddly low and the unified memory and way higher interconnect should yield big step up in 8 bit precision"

If you stay at small scale for local inference, Nvidia Digits is made almost exclusively for this kind of model.

The same dude, 2 weeks ago:



RIP Mac mini cluster, his words. By his own simplistic definition of DeepSeek being just a measure of cost per GB, you're at $15.63 (!!!) per GB.

 
you have a billion people investing in Nvidia, in something most don't understand, with product development that'll probably take tons of unexpected turns, plus two rival countries both going at it hard, while everyone else is whispering "psst... dotcom bubble".

it's going to be a wild ride.
"bubble" still intact.
 

Hudo

Gold Member
Claude is a French company and, in my opinion, much better at coding than ChatGPT because it has Artifacts.
Black Forest Labs (Germany) is also doing some good work on LLMs. And one of their models is used by X's Grok, afaik.
 
Former Microsoft engineer's thoughts on this:


The way I'm understanding it based on his video, you still need those massive high-compute, high-power models to exist, because DeepSeek trains on the huge data banks of those models and then distills them down, lowering the power and compute requirements.

So if all those huge models go proprietary and sue people who train on them, isn't that a way they could protect their dominance?

Kind of ironic, given that those huge models also train on questionably-obtained data...

Good video, btw.
 

yogaflame

Member
Wow, another spy machine of the Chinese communist government. More stealth bombers, spaceships, cars, and missiles to copy from the USA. Good luck.
 
Last edited:

E-Cat

Member
It's not a dumb move if Xi and the CCP invade Taiwan in 2027-2028, which is a non-zero chance.
Meanwhile, it's a much higher than zero chance that AGI will have been realized by 2027-2028 - at least sans the tariffs that now threaten the US' global AI dominance by doubling the cost of the entire supply chain, especially in light of DeepSeek. Now is the time to buy hand over fist, not shoot yourself in the foot and suck on the bloody stumps.
 

ResurrectedContrarian

Suffers with mild autism
Claude is a French company and, in my opinion, much better at coding than ChatGPT because it has Artifacts.

Claude is from Anthropic, which is an American company. They named it after the mathematician Claude Shannon, who laid the groundwork for information theory (...Claude Shannon was also American, despite the name).

There is a somewhat prominent French company training LLMs, called Mistral, but they haven't released anything too groundbreaking in a while (they did help pioneer the "mixture of experts" approach that is one part of DeepSeek, so that's something).


Black Forest Labs (Germany) is also doing some good work on LLMs. And one of their models is used by X's Grok, afaik.

BFL is fantastic, but so far they've only released (diffusion-based) image models, no language models at all. Their next announced project is video.

Grok used to use BFL's FLUX model as its image generator, but just within the past month or so they switched to their own in-house model. But FLUX is one of the best open-source image models around - probably still the best.
 

JCK75

Member
It's a really good time to buy their stock, because it won't take long before the truth comes out that DeepSeek is using Nvidia and is just lying about it for legal reasons.
 

ResurrectedContrarian

Suffers with mild autism
It's a really good time to buy their stock, because it won't take long before the truth comes out that DeepSeek is using Nvidia and is just lying about it for legal reasons.
They are using NVIDIA chips -- they admit that openly in their paper.

But the chips are H800s, a bit slower and cheaper, instead of the cutting-edge new chips that NVIDIA is hyping as the obligatory hardware for the next wave of models. So it's just a matter of the market thinking "maybe chip scaling isn't the deciding factor anymore." Even then, it's not really a logical reason to dump the stock, given how dominant NVIDIA remains.

The H800s, by the way, are actually slower because we made them that way on purpose lol - the US required chips sold to China to be downscaled in their capabilities.


But DeepSeek shows this really doesn't matter... the highest-powered chip isn't the prime factor for innovating or winning in AI research.
 
Last edited:

StereoVsn

Gold Member
Meanwhile, it's a much higher than zero chance that AGI will have been realized by 2027-2028 - at least sans the tariffs that now threaten the US' global AI dominance by doubling the cost of the entire supply chain, especially in light of DeepSeek. Now is the time to buy hand over fist, not shoot yourself in the foot and suck on the bloody stumps.
… I am sure that's totally going to magically produce physical hardware after Xi invades. We aren't talking singularity here, and 2027-2028 is unlikely anyway.
 

Sentenza

Member
What evidence is there that simply injecting more compute will magically let it get 90 percent on FrontierMath? These models are not really reasoning. There is a high likelihood that we have hit an architectural brick wall and that some new breakthrough is needed.
Despite some fearmongering on this topic, every sign so far seems to point to the fact that "scaling" hasn't capped yet.

As they keep throwing more and more compute at these LLMs, new properties and capabilities seem to emerge.
 

E-Cat

Member
… I am sure that's totally going to magically produce physical hardware after Xi invades. We aren't talking singularity here, and 2027-2028 is unlikely anyway.
No, I'm saying the singularity is in play before that. Just force them to build more plants in the US while holding off tariffs as a carrot.
 
Last edited:

Hudo

Gold Member
Claude is from Anthropic, which is an American company. They named it after the mathematician Claude Shannon, who laid the groundwork for information theory (...Claude Shannon was also American, despite the name).

There is a somewhat prominent French company training LLMs, called Mistral, but they haven't released anything too groundbreaking in a while (they did help pioneer the "mixture of experts" approach that is one part of DeepSeek, so that's something).




BFL is fantastic, but so far they've only released (diffusion-based) image models, no language models at all. Their next announced project is video.

Grok used to use BFL's FLUX model as its image generator, but just within the past month or so they switched to their own in-house model. But FLUX is one of the best open-source image models around - probably still the best.
You are right. BFL is doing work on diffusion-based stuff. I was mistaking them for another company. Mea culpa.
 