1000% agree. We, as a society, are not mature enough nor responsible enough to have tech this powerful so easily and readily available to everyone on the planet. I mean, just think of what kind of damage a modern-day school shooter will be able to do in the future with help from AI (or AGI).

And personally I think Open Source AI is a Pandora's box that can't be put back, and now we have hostile actors that are actively using it. And not just the CCP.
I moved to Australia, and because they have no car industry to "protect" they allow Chinese cars here, and they really are as good as the major brands - in most cases with better features, but at a fraction of the cost.
> Not so sure AGI by 2030. LLMs are not enough. But I mean, this advances so fast, I'm predicting on a hunch really. It doesn't seem to be very far and seems attainable by 2030, but there are probably a few hurdles in the way. The whole training approach might have to change. On top of government roadblocks slowing things down.

Does it really advance so fast? We haven't had any meaningful advance since ChatGPT. o1 brute-forces reasoning to the point where it can solve a few more problems, but it's not really reasoning, many experts say. I tend to agree. It seems these reasoning models are rather just memorizing the kinds of steps a particular type of problem might warrant.
> The way the market reacted tho..

The market reaction is pretty normal; the abnormal thing was investing so much money in those companies. I mean, they had been investing that much money because they thought OpenAI, Nvidia, etc. were the winning horse. Now a small startup makes a model 20x more efficient in training and inference.
1) DeepSeek r1 is real, with important nuances. Most important is the fact that r1 is so much cheaper and more efficient to inference than o1, not the $6m training figure. r1 costs 93% less to *use* than o1 per API call, can be run locally on a high-end workstation, and does not seem to have hit any rate limits, which is wild. Simple math is that every 1B active parameters requires 1 GB of RAM in FP8, so r1 requires 37 GB of RAM. Batching massively lowers costs and more compute increases tokens/second, so there are still advantages to inference in the cloud. Would also note that there are true geopolitical dynamics at play here and I don't think it is a coincidence that this came out right after "Stargate." RIP, $500 billion - we hardly even knew you.

Real:
1) It is/was the #1 download in the relevant App Store category. Obviously ahead of ChatGPT; something neither Gemini nor Claude was able to accomplish.
2) It is comparable to o1 from a quality perspective, although it lags o3.
3) There were real algorithmic breakthroughs that led to it being dramatically more efficient both to train and inference. Training in FP8, MLA and multi-token prediction are significant.
4) It is easy to verify that the r1 training run only cost $6m. While this is literally true, it is also *deeply* misleading.
5) Even their hardware architecture is novel and I will note that they use PCI-Express for scale-up.

Nuance:
1) The $6m does not include "costs associated with prior research and ablation experiments on architectures, algorithms and data" per the technical paper. "Other than that, Mrs. Lincoln, how was the play?" This means that it is possible to train an r1-quality model with a $6m run *if* a lab has already spent hundreds of millions of dollars on prior research and has access to much larger clusters. DeepSeek obviously has way more than 2048 H800s; one of their earlier papers referenced a cluster of 10k A100s. An equivalently smart team can't just spin up a 2000-GPU cluster and train r1 from scratch with $6m. Roughly 20% of Nvidia's revenue goes through Singapore. 20% of Nvidia's GPUs are probably not in Singapore, despite their best efforts.
2) There was a lot of distillation - i.e. it is unlikely they could have trained this without unhindered access to GPT-4o and o1. As @altcap pointed out to me yesterday, kinda funny to restrict access to leading-edge GPUs and not do anything about China's ability to distill leading-edge American models - obviously defeats the purpose of the export restrictions. Why buy the cow when you can get the milk for free?
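For the curious, the RAM arithmetic above as a minimal sketch (assumptions: the 37B active-parameter figure from the post, FP8 meaning one byte per parameter; KV cache and activation memory are ignored):

```python
# Back-of-the-envelope weight memory: active parameters x bytes per parameter.
# Assumes FP8 = 1 byte/param; KV cache and activations are not counted.
def weight_memory_gb(active_params_billion: float, bytes_per_param: float = 1.0) -> float:
    return active_params_billion * bytes_per_param  # 1B params at 1 byte ~= 1 GB

print(weight_memory_gb(37))       # ~37 GB for r1's 37B active params in FP8
print(weight_memory_gb(37, 2.0))  # ~74 GB if the same weights were FP16/BF16
```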
2) Conclusions:
1) Lowering the cost to train will increase the ROI on AI.
2) There is no world where this is positive for training capex or the "power" theme in the near term.
3) The biggest risk to the current "AI infrastructure" winners across tech, industrials, utilities and energy is that a distilled version of r1 can be run locally at the edge on a high-end workstation (someone referenced a Mac Studio Pro). That means that a similar model will run on a superphone in circa 2 years. If inference moves to the edge because it is "good enough," we are living in a very different world with very different winners - i.e. the biggest PC and smartphone upgrade cycle we have ever seen. Compute has oscillated between centralization and decentralization for a long time.
4) ASI is really, really close and no one really knows what the economic returns to superintelligence will be. If a $100 billion reasoning model trained on 100k-plus Blackwells (o5, Gemini 3, Grok 4) is curing cancer and inventing warp drives, then the returns to ASI will be really high, and training capex and power consumption will steadily grow; Dyson spheres will be back to being the best explanation for Fermi's paradox. I hope the returns to ASI are high - would be so awesome.
5) This is all really good for the companies that *use* AI: software, internet, etc.
6) From an economic perspective, this massively increases the value of distribution and *unique* data - YouTube, Facebook, Instagram and X.
7) American labs are likely to stop releasing their leading-edge models to prevent the distillation that was so essential to r1, although the cat may already be entirely out of the bag on this front - i.e. r1 may be enough to train r2, etc.

Grok-3 looms large and might significantly impact the above conclusions. This will be the first significant test of scaling laws for pre-training, arguably since GPT-4. In the same way that it took several weeks to turn v3 into r1 via RL, it will likely take several weeks to run the RL necessary to improve Grok-3's reasoning capabilities. The better the base model, the better the reasoning model should be, as the three scaling laws are multiplicative: pre-training, RL during post-training and test-time compute during inference (a function of the RL). Grok-3 has already shown it can do tasks beyond o1 - see the Tesseract demo - and how far beyond is going to be important. To paraphrase an anonymous Orc from "The Two Towers," meat might be back on the menu very shortly. Time will tell, and "when the facts change, I change my mind."
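To illustrate the "three scaling laws are multiplicative" claim with a toy formula: if each axis contributes an independent power-law gain, the gains compound rather than add. Every exponent and input below is invented purely for illustration, not taken from any paper:

```python
# Toy multiplicative scaling: capability ~ C_pre^a * C_rl^b * C_test^c.
# The exponents a, b, c and the FLOP counts are made-up illustrative values.
def toy_capability(pretrain_flops: float, rl_flops: float, test_time_flops: float,
                   a: float = 0.08, b: float = 0.05, c: float = 0.03) -> float:
    return (pretrain_flops ** a) * (rl_flops ** b) * (test_time_flops ** c)

base = toy_capability(1e25, 1e23, 1e12)
scaled = toy_capability(1e25, 1e23, 1e14)  # 100x more test-time compute only
print(scaled / base)  # ~1.15: each axis multiplies, rather than adds to, the others
```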
> What evidence is there that simply injecting more compute will magically let it get 90 percent on frontier math? These models are not really reasoning. There is a high likelihood that we have hit an architectural brick wall, and some new breakthrough is needed.

You have clearly no idea what you're talking about. There was a 3-month delay between o1 and o3, during which performance went from 2% to 25%. No architectural breakthrough between the two models, just reinforcement learning bootstrapping the previous generation. There was no wall in performance as long as more test-time compute was added. Now we have 25x more efficiency for free, cool. I'm not relying on that AT ALL for my prognosis, btw. See o4's performance on this benchmark in March or April and come talk to me again.
What's good about this (DeepSeek R1): from here on, apps will be lightweight enough to run locally on devices without needing to be connected all the time.
In the bigger picture, when the dust settles, I can't see there being much, if any, proprietary stuff going on here to stop everyone else from doing the same. And the "saved" energy will just be used to further accelerate the overarching timeline. I mean, there's no stopping the AI race, and we're just at the beginning. Who knows, Nvidia stock may soon be up again, for all I know.
I watched a software engineer explain this, and here's the summary:
Distillation: DeepSeek R1 uses a technique called "distillation" to learn from larger, more powerful AI models. Instead of trying to replicate the entire knowledge base of these larger models, it focuses on mimicking their desired outputs. This allows DeepSeek R1 to achieve impressive performance with a much smaller size, making it more efficient in terms of both processing power and memory usage.

Multiple Large Language Models: DeepSeek R1 is trained on a diverse set of large language models. This exposure to different perspectives helps it become more robust and adaptable, improving its overall performance and efficiency.

Smaller Size: DeepSeek R1's smaller size is a major factor in its efficiency. It allows the model to run on less powerful hardware, making it more accessible to a wider range of users and reducing the computational resources required for its operation.

In essence, DeepSeek R1's efficiency stems from its ability to learn effectively from larger models while maintaining a compact size. This approach allows it to deliver powerful AI capabilities without the need for excessive computing power.
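To make "mimicking their desired outputs" concrete, here is a minimal sketch of one classic form of distillation (a student matching a teacher's softened output distribution via a KL loss, following Hinton et al.). This illustrates the general technique only; it is not DeepSeek's actual training code, and the temperature is an arbitrary choice:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL divergence between softened teacher and student token distributions."""
    t = temperature
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    # The t**2 factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t

# Usage sketch: the teacher runs frozen; only the student receives gradients.
student_logits = torch.randn(4, 32000, requires_grad=True)  # (batch, vocab)
with torch.no_grad():
    teacher_logits = torch.randn(4, 32000)                   # stand-in teacher output
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()
```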
Edit:
It was a summary from this video:
The release of DeepSeek AI from a Chinese company should be a wake-up call for our industries that we need to be laser-focused on competing to win, because we have the greatest scientists in the world.

Even Chinese leadership told me that this is very unusual. When you hear of DeepSeek, when you hear of somebody coming up with something, we always have the ideas, we're always first, so I would say that's a positive; that could be very much a positive development.

So instead of spending billions and billions, you'll spend less and you'll come up with hopefully the same solution. In the very near future we're going to be placing tariffs on foreign production of computer chips, semiconductors and pharmaceuticals, to return production of these essential goods to the United States of America. They left us and went to Taiwan, which is where about 98% of the chip business is, by the way, and we want them to come back.
> Claude is a French company and in my opinion much better with coding than ChatGPT because it has Artifacts.

I think the point is that the EU in general is way behind in AI research. There is Mistral (I think that's the name) but it's nowhere near the front runners.
Separately, as far as gaming goes, a far more interesting development is the new Tencent model that turns images and text prompts into full-on 3D models that you can use for various purposes.
Read up on Hunyuan3D, it’s pretty damn wild. I feel sorry for 3D artists.
Edit: It’s also free and on Hugging Face and GitHub.
> I think the most important point is that everyone is rediscovering the wheel, seeing that developing at a low level will always bring more results.

That's an interesting one for gaming as well. I think it's become pretty obvious that there's a lot less optimisation going on this gen.
As soon as any media starts comparing American models spending billions with China just doing it for $6M, I stop listening.

It's stupid as fuck.

The $6M does not include the costs associated with prior research and ablation experiments on architectures, algorithms and data. They used American models to distill and learn from.

You put export limits on AI GPUs, but you let China use GPT to distill it unhindered.

slow clap

But continue spreading panic, GHG.
ByteDance already has a model that beats DeepSeek, and they'll grow like mushrooms month to month. OMG, AI evolves? /pikachu face
> Claude is a French company and in my opinion much better with coding than ChatGPT because it has Artifacts.

Anthropic is great, but it's a US startup. You might be thinking of Mistral. That one is also pretty decent but doesn't really offer much reason to use it.
> Yes, it's little old me causing panic and instructing hedge funds to offload half a trillion dollars of Nvidia stock.

Thanks, Obama.. I mean Trump… I mean GHG!
Sorry.
> Many experts in the field predict the singularity is actually closer than we think.
> Ray Kurzweil is one who predicts we'll hit this level in 2045. Some have made predictions earlier than that, but I think 2045 is the most realistic prediction. Either way, it's happening in our lifetimes.
> It could greatly benefit humanity and lead us to a utopian future, or be our catastrophic downfall. It all depends on how we prepare for and manage it. To do this, there needs to be global cooperation on its development.

Yeah, ~2050 sounds about right. There also needs to be some major paradigm shift in computing before we get several orders of magnitude more power. Like quantum computing, nuclear fusion and a move away from silicon transistors altogether. Hopefully I'll be retired by then and can witness the biggest leap in humanity (or its demise). And if I'm going to become fodder for bioelectricity for the machine overlords, just feed my brain video games for "maximum throughput".
It's not about things "slowing down". Meta is up, Microsoft is attempting to rally back, and Apple (whose chips will benefit the most from this due to their architecture) is up. These (and all other competing) companies will benefit greatly from DeepSeek and can now use what they already have (along with looking at alternatives to Nvidia - yes, this may well bring AMD back into the picture if the methodologies can be adapted to get around AMD's chip-to-chip issues) to scale up at no extra cost.
The Singularity will be here before TSMC has built another factory on US soil, Trump's most idiotic move so far
If there was ever an industry that needed heavy regulation, then it is AI. AI should be tools that make life better for humanity, not replace humanity.
Up over 6% today, so DLSS is swapped on now?
Sounds like current AI is overrated bullshit grifting.

A simple change in coding and you get "better" results.
You can say all the gobbledygook you want, but if it's not capable of searching the internet, it is basically useless.
Why do 7B's image examples still not look that great? Is this a different kind of generative model? The beautiful girl looks pretty terrible compared to what other modern models churn out, which are effectively perfect and photo realistic. 7B's eyes are still fucked up and the chin/jaw/mouth proportions seem off.
Not AI, but I work in an industry where automation has been ramping up. I was very glad to hear that one important point the higher-ups had in their evaluation of a swap to robotic manufacturing was how it would affect us normal workers.

Would we still have something to do? What would we do? Would we think the new duties were fun?

In the end I understand that I'm just a number in a math equation about cost reduction, but it's good that they at least somewhat think about the consequences.
> Claude is a French company and in my opinion much better with coding than ChatGPT because it has Artifacts.

Black Forest Labs (Germany) is also doing some good work on LLMs. And one of their models is used by X's Grok, afaik.
Former Microsoft engineer's thoughts on this:
> The Singularity will be here before TSMC has built another factory on US soil, Trump's most idiotic move so far

It's not a dumb move if Xi and the CCP invade Taiwan in 2027-2028, which is a non-zero chance.
I think the most important point is that everyone is rediscovering the wheel, seeing that developing at a low level will always bring more results.
> Didn't it just come out that they are running on 50,000 NVIDIA chips?

That was just what Elon said, I think. Don't think it is verified.
> It's not a dumb move if Xi and the CCP invade Taiwan in 2027-2028, which is a non-zero chance.

Meanwhile, it's a much higher than zero chance that AGI will have been realized by 2027-2028, at least sans the tariffs that now threaten the US' global AI dominance by doubling the cost of the entire supply chain; especially in light of DeepSeek. Now is the time to buy hand over fist, not shoot yourself in the foot and suck on the bloody stumps.
Claude is a French company and in my opinion much better with coding than ChatGPT because it has Artifacts.
> It's a really good time to buy their stock because it's not going to take long before the truth comes out that DeepSeek is using Nvidia and is just lying about it due to legal reasons.

They are using NVIDIA chips -- they admit that openly in their paper.
> Meanwhile, it's a much higher than zero chance that AGI will have been realized by 2027-2028, at least sans the tariffs that now threaten the US' global AI dominance by doubling the cost of the entire supply chain; especially in light of DeepSeek. Now is the time to buy hand over fist, not shoot yourself in the foot and suck on the bloody stumps.

… I am sure that's totally going to magically produce physical hardware after Xi invades. We aren't talking singularity here, and 2027-2028 is unlikely anyway.
> What evidence is there that simply injecting more compute will magically let it get 90 percent on frontier math? These models are not really reasoning. There is a high likelihood that we have hit an architectural brick wall, and some new breakthrough is needed.

Despite some fearmongering on this topic, every sign so far seems to actually point to the fact that "scaling" hasn't capped yet.
> … I am sure that's totally going to magically produce physical hardware after Xi invades. We aren't talking singularity here, and 2027-2028 is unlikely anyway.

No, I'm saying the singularity is in play before that. Just force them to make more plants in the US while holding off tariffs as a carrot.
> Claude is from Anthropic, which is an American company. They named it that after the mathematician Claude Shannon, who laid the groundwork for information theory (...Claude Shannon was also American, despite the name).
> There is a somewhat prominent French company training LLMs, called Mistral, but they haven't released anything too groundbreaking in a while (they did help pioneer the "mixture of experts" approach that is one part of DeepSeek, so that's something).
> BFL is fantastic, but so far they've only released (diffusion-based) image models, no language models at all. Their next project was announced to be video.
> Grok used to have BFL's FLUX model as its image generator, but just within the past month-ish they switched to their own in-house model. But FLUX is one of the best open source image models around, probably still the best.

You are right. BFL is doing work on diffusion-based stuff. I was mistaking them for another company. Mea culpa.
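Since "mixture of experts" came up above, here's a minimal sketch of the idea: a learned router activates only the top-k expert networks per token, so most parameters sit idle on any given forward pass. All sizes and the top_k value here are illustrative, not any real model's configuration:

```python
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    """Toy mixture-of-experts layer: route each token to its top-k experts."""
    def __init__(self, dim: int = 64, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, dim)
        scores = self.router(x)                           # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)    # pick top-k experts per token
        weights = weights.softmax(dim=-1)                 # normalise their weights
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                     # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

moe = TinyMoE()
print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64]); only 2 of 8 experts run per token
```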