Chinese start-ups such as DeepSeek are challenging global AI giants



Posted

Their models are cheaper thanks to US export restrictions that have inadvertently spurred innovation

Last month, a Chinese start-up called DeepSeek astonished the international tech community with its latest open-source artificial intelligence model. DeepSeek-V3 delivers a performance comparable to that of better-funded US rivals such as OpenAI. This week it impressed once again with R1, its foray into AI reasoning.

https://archive.ph/51WIt#selection-2415.0-2419.1

 

I see lots of comments about how China just produces cheap copycat goods. Clearly this is an outdated view. Japanese manufacturing was regarded in the same way for about 25 years after WW2.

Posted

They're doing it for a fraction of what the US companies are spending.

 

See the much more extensive post on this above mine.

Posted
On 1/25/2025 at 4:39 PM, placeholder said:

 

I see lots of comments about how China just produces cheap copycat goods. Clearly this is an outdated view.

 

Actually that's exactly what happened with DeepSeek. 

 

It may be the case it has managed to cut costs and compute, but we do know that it is built at least in part on the shoulders of the giants: it uses Nvidia chips, albeit older, cheaper versions, and utilises Meta's open-source Llama architecture, as well as Alibaba's equivalent Qwen.

 

https://www.bbc.com/news/articles/cx2jxvn0r51o

 

They copied Meta's open source. And by all accounts DeepSeek is not as good as the US competition; for instance, you can't ask it questions about Tiananmen Square. It censors itself when it comes to Chinese topics.

Posted
4 hours ago, Cameroni said:

 

Actually that's exactly what happened with DeepSeek. 

 

It may be the case it has managed to cut costs and compute, but we do know that it is built at least in part on the shoulders of the giants: it uses Nvidia chips, albeit older, cheaper versions, and utilises Meta's open-source Llama architecture, as well as Alibaba's equivalent Qwen.

 

https://www.bbc.com/news/articles/cx2jxvn0r51o

 

They copied Meta's open source. And by all accounts DeepSeek is not as good as the US competition; for instance, you can't ask it questions about Tiananmen Square. It censors itself when it comes to Chinese topics.

Actually, it's broader than China vs the USA; it's also about open source vs commercial AI. The open-source AI community is quite dynamic, and this is an additional contribution that will be improved upon not only by China but also by other open-source AI participants.

 

The choice of open source by the Chinese has two advantages:

- as you pointed out, it allows them to benefit from previous open-source knowledge and resources;

- it directs a share of open-source AI innovation by other actors towards complementary work that will help China cope with its current limitations on chips and other resources.

Posted

I think the scariest part is that at the moment nobody really knows the details. People claim this and that, but is all the information true? The market is obviously reacting already, because maybe it is true.

 

AI is so hyped, and so many people invest in whatever looks promising, that the bubble is very unstable.

And I guess what just happened could happen again and again. Or maybe not. It's a high-risk investment, somewhere between making huge amounts of money and losing it all.

Posted
12 minutes ago, rabas said:

The above video by Microsoft engineer David Plummer is a clear intro to what DeepSeek is, and isn't. The clever part is its distillation process, which turns larger models into smaller, good-enough models that use less hardware. But I do see a possible catch, and asked Grok to confirm (we're already a mix of human and AI!). I post my question and its answer for clarity; a rough sketch of the distillation idea follows after the answer. However it goes, DeepSeek's arrival will be transformative.

 

Me: My question is about the newly released DeepSeek AI software and its operation. The question is specific. Preface: DeepSeek is said to use larger AI models to train a number of smaller models that can run on smaller, and thus cheaper, hardware. Does this mean that 1) larger hardware and training are still required when training the smaller models, and 2) does the DeepSeek open-source code they released include the initial large training model, or does it only include code for running the smaller models?

 

(Spoiler: required, and no, not released)

 

Grok:

  1. Larger Hardware and Training for Smaller Models: Yes, the process of training smaller models through distillation or transfer learning still requires larger hardware and computational resources initially. DeepSeek uses a large model (like DeepSeek-R1 or DeepSeek-V3) to train these smaller models. The initial training of the large model necessitates significant computational resources, but once trained, its knowledge can be distilled into smaller models that require less computational power for deployment. This is evident from the strategies mentioned, where DeepSeek-R1 knowledge is distilled into smaller models, which implies that the heavy lifting is done during the training of the larger model.

  2. Open-Source Code and Model Inclusion: The open-source release by DeepSeek includes models that are distilled from the larger models, particularly the DeepSeek-R1 series. However, the training code for the initial large models (like DeepSeek-V3 or the base models used for distillation) does not seem to be open-sourced. The comments on X and technical reports indicate that while the distilled models and possibly some of their training code are available, the actual training pipeline for the larger models, including the data and the original training code, is not part of the open-source release. This means users can run and possibly fine-tune the smaller models but cannot replicate the training of the large models without additional resources or information.

In summary, while you can utilize and potentially further train the smaller models with less hardware, the groundwork of training the larger models still requires substantial resources. The open-source materials provided by DeepSeek focus on the smaller, distilled models rather than the full spectrum of their training process for the large models.

End of Grok.
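
To make the distillation idea in Grok's point 1 concrete, here is a minimal sketch of a single teacher-student training step in PyTorch. Everything in it is an assumption for illustration: the models are plain callables returning logits, and the temperature and loss weighting are generic defaults, not DeepSeek's actual recipe.

import torch
import torch.nn.functional as F

def distillation_step(teacher, student, batch, optimizer, T=2.0, alpha=0.5):
    """One training step: the small student learns to imitate the big teacher."""
    # The expensive teacher only runs forward here; it is never updated.
    with torch.no_grad():
        teacher_logits = teacher(batch["input_ids"])
    student_logits = student(batch["input_ids"])

    # Soft loss: match the teacher's temperature-softened distribution
    # (KL divergence, scaled by T^2 as in standard distillation).
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)

    # Hard loss: ordinary cross-entropy against the true labels.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        batch["labels"].view(-1),
    )

    loss = alpha * soft + (1 - alpha) * hard
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

Note how this matches Grok's two answers: the teacher's forward passes still need big hardware during training, but only the small student has to be shipped, and releasing the student tells you nothing about how the teacher was trained.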
 

Do Grok and other AIs know up-to-date data?

Not long ago I asked an AI about a recent event, and the answer was something like: I don't have any data about that event.

Posted
11 minutes ago, OneMoreFarang said:

Do Grok and other AIs know up-to-date data?

Not long ago I asked an AI about a recent event, and the answer was something like: I don't have any data about that event.

 

Good question. From my post you can see that Grok is well aware of DeepSeek and how it works. How so, if it takes so long to train?

 

Learning and knowledge break down into 3 stages.

 

Stage 1: Build an initial model (the links held in memory), which takes many months at full power. Done once per major revision.

Stage 2: These large models can be updated at significant but lesser cost. Done periodically, as required by new info, not too often.

Stage 3: On the fly. When Grok answered my question, he first looked at a number of websites and X posts to learn. This is current, probably from some indexing. I.e., he is not yet trained on it but can go 'read up' before answering my question. (A rough sketch of this pattern follows below.)
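
For illustration, here is a minimal sketch of that Stage 3 pattern, often called retrieval-augmented generation. The search_web and llm functions are hypothetical placeholders standing in for whatever search index and model Grok actually uses.

from dataclasses import dataclass

@dataclass
class Document:
    url: str
    text: str

def answer_with_retrieval(question, search_web, llm):
    # The model is not retrained; it reads fresh sources at question
    # time and conditions its answer on them.
    docs = search_web(question, max_results=5)  # -> list of Document
    context = "\n\n".join(f"[{d.url}]\n{d.text}" for d in docs)
    prompt = (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
    return llm(prompt)

The design point is that only the prompt changes per question; the model's trained weights (Stages 1 and 2) stay fixed, which is why it can discuss events newer than its training data.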

 

