
New AI Kid On The Block: DeepSeek

Music1969
So I've been hearing a lot that DeepSeek's open-source AI uses reinforcement learning.

I've been watching some short videos that try to explain it, but I'm struggling to understand.

So I understand it is a trial-and-error method with a reward system - but how does it know it has reached the correct answer?

How do we know, and how does it know, that it hasn't given us gibberish as an answer?

Apparently it doesn't use human feedback in its training (that is costly and inefficient).

Seems a bit mysterious to me.

Can anyone dumb it down? Specifically for DeepSeek.

This guy says he reverse-engineered it, so I assume the answer is in there and I just didn't understand the key part:

 
OK, found this, where he says we don't actually provide the answer in training, which is again confusing to me:

 
They don't use human-labeled training data, but the training data is labeled: in the full paper, section 2.2.2., they mention using rule-based verification of model outputs for mathematical problems, and compilers + test cases for verifying solutions to coding problems. Later, in section 2.3.3., they clarify that the reinforcement learning process uses only "data that could be evaluated using rule-based rewards". It's unclear to me if there are more categories of data here beyond math and coding.
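To make that concrete, here is a minimal sketch of what a rule-based reward could look like (my own illustration based on the paper's description, not DeepSeek's actual code; all names and the sample data are made up):

import re
import subprocess
import sys
import tempfile

def math_reward(model_output: str, reference_answer: str) -> float:
    # Rule-based check for math: compare the last number in the model's
    # output against the known reference answer. No human grades anything.
    numbers = re.findall(r"-?\d+(?:\.\d+)?", model_output)
    return 1.0 if numbers and numbers[-1] == reference_answer else 0.0

def code_reward(model_code: str, test_code: str) -> float:
    # Rule-based check for coding: run the generated code against test
    # cases; reward 1.0 only if the tests pass (exit code 0).
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(model_code + "\n" + test_code)
        path = f.name
    result = subprocess.run([sys.executable, path], capture_output=True, timeout=10)
    return 1.0 if result.returncode == 0 else 0.0

# The "label" is just a mechanically verifiable answer:
print(math_reward("...so the total is 42", "42"))            # 1.0
print(code_reward("def add(a, b):\n    return a + b",
                  "assert add(2, 2) == 4"))                  # 1.0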

Data for the supervised fine tuning phase of DeepSeek-R1 training is generated by sampling outputs from the model after reinforcement learning, then using the DeepSeek-V3 model to evaluate those outputs and generate labels, then feeding that data back into the DeepSeek-R1 training process. Thus, minimal human labeling is required for this training process; direct human labeling is only used to gather the "cold start" dataset for DeepSeek-R1 reinforcement learning, which is a relatively small dataset containing only thousands of examples. Publicly available RLHF datasets often contain ~100k examples; a Huggingface blog post from 2022 says that RLHF datasets of ~50k examples are sufficient; the original RLHF paper mentions about 60k examples used with GPT-3 at the time.
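As a toy, self-contained sketch of that rejection-sampling loop (my paraphrase of section 2.3.3; generate and v3_judge are hypothetical stand-ins for the real models):

def generate(prompt, n):
    # Stand-in for sampling n candidate outputs from the RL-trained model.
    return [f"candidate {i} for: {prompt}" for i in range(n)]

def v3_judge(prompt, output):
    # Stand-in for DeepSeek-V3 evaluating an output; here, a trivial rule.
    return output.endswith(prompt)

sft_dataset = []
for prompt in ["2 + 2 = ?", "reverse a list in Python"]:
    for output in generate(prompt, n=4):
        if v3_judge(prompt, output):
            # Accepted outputs become (prompt, target) pairs for the next
            # round of supervised fine-tuning -- no human wrote the targets.
            sft_dataset.append({"prompt": prompt, "target": output})

print(len(sft_dataset))   # 8: every candidate passes this toy judge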

In the implementation of the training function (approximately 8:30 in the first video), you can see that each input sample comes with a "Prompt" and a "Target"; the target is the label.
 

Thanks, this comes to the crux of my query - in this example below I did notice the answers were there in the code.

So who entered these 'targets'/answers? If they were entered by humans, then isn't it human-labeled?

[Screenshot: training code showing each sample's 'Prompt' and 'Target' fields]
 
Yes, in this video, he wrote his entire training dataset from scratch by hand, which would be considered human labeling. However, remember that he is not showing us the actual DeepSeek-R1 code and data in the video, but instead is demonstrating his (highly simplified) interpretation of the paper. The video communicates the high-level design of the training process while omitting the details of how these datasets are constructed and used in reality.
 
Thanks, now it makes sense to me.

Fascinating stuff.

Some people in the video comments said this DeepSeek-R1 is showing 'truly emergent' behaviour - have you seen examples of that?

And what do they really even mean by that?
 
The "aha moment" example in both videos and the paper is an example of the emergent behavior that they are talking about. They never taught the model to identify reasoning errors and "backtrack" to try different approaches to generate an answer, but it was able to figure out how to do this on its own over the course of the reinforcement learning process.

Edit: in general, "emergent behavior" is an unpredictable property of a system that "emerges" only after the finished system is built and observed. LLMs in general demonstrate tons of emergent behavior. We train the base models to predict the next word given a sequence of input words; from this trivially simple task setup, we have created all of our ChatGPTs and DeepSeeks and Llamas and friends, where we talk about how some models are "better at math" or "more conversational". All of these skills and "personalities" are emergent properties of LLMs. No one ever expected that a next-word predictor could "learn math", yet here we are -- the core innovation or outcome of DeepSeek is the discovery that reinforcement learning on math problems can teach a model to solve any kind of problem.
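For concreteness, the "trivially simple task setup" mentioned above is just next-token classification; here is a toy version of the training objective in numpy (the vocabulary and logits are made up for illustration):

import numpy as np

# Toy next-word prediction: for some context, the model emits logits over
# a vocabulary; training minimizes cross-entropy against the actual next
# word. Everything else an LLM appears to "know" emerges from this.
vocab = ["the", "cat", "sat", "on", "mat"]
logits = np.array([0.2, 1.5, 0.1, 0.3, -0.4])  # model output for one context
target = vocab.index("cat")                    # the true next word

probs = np.exp(logits - logits.max())
probs /= probs.sum()                           # softmax
loss = -np.log(probs[target])                  # cross-entropy loss
print(f"p(next='cat') = {probs[target]:.3f}, loss = {loss:.3f}")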
OK, found this, where he says we don't actually provide the answer in training, which is again confusing to me:

In this video, saying that "we don't actually provide the answer in training" is completely incorrect. Actually, most of what he says is not very accurate -- his explanations of reinforcement learning and policy optimization are basically nonsense. In his "baby learning to walk" example, the "reward model" for that baby is the pain of falling down. This is effectively a training label: if the "walking policy" (i.e. the brain) makes a good decision about which muscle to use, then the baby feels no pain, and the "walking policy" will learn that it should use that muscle in that way. If the "walking policy" makes a bad decision and the baby falls down, then it will learn something different.
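To make the "reward is effectively a training label" point concrete, here is a deliberately tiny REINFORCE-style sketch of my own (not DeepSeek's actual GRPO algorithm): the policy is never shown a correct answer, yet the reward alone steers it.

import numpy as np

rng = np.random.default_rng(0)
logits = np.zeros(2)                 # policy over two actions ("muscles")

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

for step in range(2000):
    p = softmax(logits)
    action = rng.choice(2, p=p)
    reward = 1.0 if action == 1 else -1.0   # action 1 = "didn't fall down"
    # REINFORCE: move log-probability of the taken action in proportion
    # to the reward (grad of log p(action) w.r.t. logits is onehot - p).
    grad = -p
    grad[action] += 1.0
    logits += 0.1 * reward * grad

print(softmax(logits))               # probability mass shifts to action 1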
 
Some people in the video comments said this DeepSeek-R1 is showing 'truly emergent' behaviour
That should not come as a surprise. All systems (other than the most simple) have emergent properties or emergent behaviour.

You don't need a computer to have emergent behaviour. For example, the export of silver from South America by the Spanish after 1545 should have made Spain rich relative to other countries (expected behaviour), but it also caused inflation in Europe whilst kick-starting international trade mechanisms, especially with China (emergent behaviour).

A new tax is introduced which brings a government more money to begin with (expected behaviour), for example the 1696 tax on windows. But over time the income drops as loopholes are discovered (emergent behaviour), such as bricking up windows.
 
The "aha moment" example in both videos and the paper is an example of the emergent behavior that they are talking about. They never taught the model to identify reasoning errors and "backtrack" to try different approaches to generate an answer, but it was able to figure out how to do this on its own over the course of the reinforcement learning process.

Edit: in general, "emergent behavior" is an unpredictable property of a system that "emerges" only after the finished system is built and observed. LLMs in general demonstrate tons of emergent behavior. We train the base models to predict the next word given a sequence of input words; from this trivially simple task setup, we have created all of our ChatGPTs and DeepSeeks and Llamas and friends, where we talk about how some models are "better at math" or "more conversational". All of these skills and "personalities" are emergent properties of LLMs. No one ever expected that a next-word predictor could "learn math", yet here we are -- the core innovation or outcome of DeepSeek is the discovery that reinforcement learning on math problems can teach a model to solve any kind of problem.

In this video, saying that "we don't actually provide the answer in training" is completely incorrect. Actually, most of what he says is not very accurate -- explanations of reinforcement learning and policy optimization are basically nonsense. In his "baby learning to walk" example, the "reward model" for that baby is the pain of falling down. This is effectively a training label: if the "walking policy" (i.e. brain) makes a good decision about what muscle to use, then the baby feels no pain, and the "walking policy" will learn that it should use that muscle in that way. If the "walking policy" makes a bad decision and the baby falls down, then it will learn something different.
DeepSeek-R1 is obviously a hot topic at the moment.

I saw this comment on Reddit.

Is there any truth to this? That it was trained on the outputs of ChatGPT etc.?

So when you said DeepSeek-R1 learned math, did it really learn?

[Screenshot: the Reddit comment claiming DeepSeek was trained on ChatGPT outputs]
 
DeepSeek poses significant cybersecurity and data privacy risks that are concerning security experts. DeepSeek's advanced AI capabilities make it a powerful tool for identifying vulnerabilities, automating cyberattacks, and processing massive datasets. This could enable more sophisticated and targeted attacks against U.S. systems and infrastructure. The AI model's ability to analyze encrypted data and correlate information also raises concerns about potential espionage and data exfiltration. Additionally, DeepSeek's generative capabilities could be used to create highly convincing phishing attempts and disinformation campaigns. This increases the risk of social engineering attacks that could trick individuals into revealing sensitive information. While DeepSeek itself has had security vulnerabilities discovered, the primary concern is how it could be leveraged as an offensive tool by malicious actors.

Given DeepSeek's rapid rise and competitive capabilities compared to U.S. AI models, addressing associated security risks will likely be an urgent priority for policymakers and the cybersecurity community.
Where was this from?
 
Is there any truth to this? That it was trained on the outputs of ChatGPT etc.?
It is not trained exclusively on the output of ChatGPT and other models, so this comment is nonsense. It's like saying that because I learned algebra and calculus in grade school, my math skills can never ever surpass the level of my grade school teachers. If I proceed to university and study advanced math, I could learn something new and beyond the knowledge of those teachers.
DeepSeek poses significant cybersecurity and data privacy risks that are concerning security experts.
Fearmongering at its finest. While none of this is false, OpenAI thought that GPT-2 was too dangerous to release in 2019, with identical concerns about disinformation, impersonation, and abuse. This time around we've got the extra aspect of "China scary" to reignite the debate. I don't believe that DeepSeek-R1 has changed the risk landscape significantly compared to prior frontier models from OpenAI, Anthropic, etc.
 
Anyone reading the DeepSeek privacy policy will be concerned. It collects tons of personal data from the user and stores it on servers in China, which are governed far differently than those in the USA/EU. In addition, the software can be hacked by outside sources who may want your data as well. The list of exploits is growing. Even today Reuters reports registration has been down due to security issues.


One person on YouTube is promoting isolating DeepSeek from the internet to avoid the significant privacy/security issues:


Meanwhile, it's one of Apple's top app downloads. What can possibly go wrong? :facepalm:
My point is that none of this is new; DeepSeek hasn't changed anything, it's just given us a new reason to re-examine old problems. Any China-based service stores personal data on Chinese servers which are accessible to the Chinese government. Any system that is exposed to the Internet can be hacked. OpenAI went through similar teething issues with prompt injection and XSS vulnerabilities. Disabling new user registration to address critical security issues is a common tactic for cloud service providers.
 
Anyone reading the DeepSeek privacy policy will be concerned.

Sounds like someone is peeved that their Nvidia shares just dropped through the floor.

Anyone reading the privacy policies of any of the online AI providers should be concerned. Just as they should be concerned by the egregious privacy infringements of the parent companies that make competing models, e.g. Meta, X, Microsoft, OpenAI, Google, etc. Many of these models, including DeepSeek, can be run locally if privacy is a concern (which of course it should be).

Whether your data is 'safer' stored on US servers rather than Chinese is probably moot at this point ...
 
Sounds like someone is peeved that their Nvidia shares just dropped through the floor.
OT, but I wonder if that will persist or improve as DeepSeek undergoes the predicted hacks, server overload, teething problems, etc.?
 
I have been worried about the incredible cost of bringing AI to consumers. The high cost of training the model (both in compute resources and real energy) means that only a few companies can afford it, focusing the power in those few entities. I am especially worried about dominance of Nvidia on the hardware side.

While US companies have dominated AI, China has been developing various AI technologies and, interestingly enough, putting them all out as open source. The latest development, DeepSeek, seems transformative in how much cheaper it is to train. I think the numbers are that it only cost $6M to train versus over $100M for OpenAI. It also has clever features, such as being able to show how it is reasoning through a problem. The one video I watched showed it to be quite human-like in how it explored the problem, often backtracking to find a new solution.

To get a taste of it myself, I created a login today and asked it two related questions:

Who is Amir Majidimehr

As of my last update in October 2023, Amir Majidimehr is a well-known figure in the audiophile community and the technology sector. He has held significant positions in various tech companies and is recognized for his expertise in digital media, audio engineering, and consumer electronics.

Majidimehr has worked at Microsoft, where he was involved in the development of digital media technologies, including the Windows Media platform. He is also known for his contributions to the field of high-fidelity audio and has been a vocal advocate for high-resolution audio formats.

In addition to his professional work, Majidimehr is an active participant in online forums and communities dedicated to audiophiles, where he shares his knowledge and insights on audio technology and equipment. His contributions have made him a respected voice in the audiophile world.

If you are looking for more specific or recent information about Amir Majidimehr, I recommend checking the latest sources or his professional profiles on platforms like LinkedIn.



It is not profound and is dated, but from what I recall, it is as good as OpenAI.

What is audio science review

Audio Science Review (ASR) is a popular online platform and community dedicated to the objective evaluation and discussion of audio equipment, including headphones, amplifiers, DACs (digital-to-analog converters), speakers, and other audio-related gear. The site is known for its rigorous, data-driven approach to reviewing audio products, relying heavily on measurements and scientific analysis rather than subjective listening impressions.

Key Features of Audio Science Review:

  1. Measurement-Based Reviews:
    ASR uses advanced audio measurement tools and techniques to evaluate the performance of audio equipment. Metrics such as frequency response, distortion, noise floor, and jitter are analyzed to provide an objective assessment of a product's performance.
  2. Transparency and Objectivity:
    The platform emphasizes transparency in its testing methodologies, often sharing detailed graphs, charts, and data to support its conclusions. This approach appeals to those who prefer evidence-based evaluations over subjective opinions.
  3. Community Discussions:
    ASR has an active forum where members discuss audio equipment, share insights, and debate topics related to audio science and technology. The community includes both audiophiles and professionals in the audio engineering field.
  4. Focus on Value and Performance:
    ASR often highlights products that offer excellent performance at reasonable prices, challenging the notion that high cost always equates to better sound quality.
  5. Founder and Leadership:
    The site was founded by Amir Majidimehr, a well-known figure in the audio and technology industries. His background in engineering and digital media has helped shape ASR's scientific approach to audio evaluation.

Why ASR Stands Out:

  • Critique of Audiophile Myths:
    ASR is known for debunking common audiophile myths, such as the perceived benefits of ultra-expensive cables or the idea that high-resolution audio formats are always audibly superior to standard formats.
  • Accessible Data:
    The site makes complex audio measurements understandable for a broad audience, helping consumers make informed decisions based on performance rather than marketing claims.

Criticism:

While ASR is highly regarded for its scientific approach, it has also faced criticism from some audiophiles who argue that subjective listening experiences and personal preferences are equally important in evaluating audio gear. Some feel that ASR's focus on measurements overlooks the emotional and subjective aspects of music enjoyment.

Overall, Audio Science Review is a valuable resource for anyone interested in understanding the technical performance of audio equipment and making data-driven purchasing decisions. You can visit the site at audiosciencereview.com.


Again, this is as good as any other chatbot and what I expect from the technology. So no compromise in fidelity despite the huge reduction in training cost.

When I last looked at using such AI to summarize review threads, the cost was prohibitive. This should substantially reduce that overhead and burden. I suspect there is considerable turmoil in tech companies over DeepSeek as I type this!
 
I suspect there is considerable turmoil in tech companies over DeepSeek as I type this!
I'll say. Nvidia is currently down almost 14% TODAY (including an after-hours improvement), which represents something like a drop of $400B in market cap. To put it in perspective, this is considered enough money to end world hunger for 10 years. TSMC is down a similar percentage.

I had read somewhere that there were advances in computation that allowed model training to be done 95% faster / cheaper... I was wondering if that would affect Nvidia's business if it panned out. I guess now we know.
 

I am a bull on $NVDA. But a decent read: https://stratechery.com/2025/deepseek-faq/
 
On this site, which has over 1 million user votes across numerous LLMs, DeepSeek and ChatGPT are tied for 2nd/3rd place behind 1st-place Gemini. How they get the ratings is explained on the site and is quite complex. FYI
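The post doesn't name the site, but leaderboards like this typically turn pairwise "which answer was better?" votes into ratings with an Elo-style update. A simplified sketch (the model names and votes below are made up, and real leaderboards use more elaborate statistics):

# Minimal Elo-style rating from pairwise votes (illustrative only).
K = 32
ratings = {"model_a": 1000.0, "model_b": 1000.0, "model_c": 1000.0}
votes = [("model_a", "model_b"),   # (winner, loser) pairs from user votes
         ("model_c", "model_a"),
         ("model_a", "model_b")]

for winner, loser in votes:
    # Expected score of the winner given the current rating gap.
    expected = 1 / (1 + 10 ** ((ratings[loser] - ratings[winner]) / 400))
    ratings[winner] += K * (1 - expected)
    ratings[loser] -= K * (1 - expected)

print(sorted(ratings.items(), key=lambda kv: -kv[1]))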
 
I am a bull on $NVDA. But a decent read: https://stratechery.com/2025/deepseek-faq/
From the article:

simply paying Nvidia more isn’t the only way to make better models.
Seems bad for $NVDA; gamers sure aren't going to keep the business going the way it has been...

Also, to the extent we're accelerating progress towards "take-off" (as discussed in the linked article), I'll just say that this is also bad for everyone who doesn't stand to get rich from having invested in AI.

"AI take-off" means "you are no longer employable" for 99.9% of workers.

That's not panicky fear-mongering; that's basically the definition of AGI OpenAI uses, with 99.9% inserted in place of "most" by me. If there's a computer program that's actually smarter than someone in practice, why employ that person? You're going to pay them to do a worse job than the computer does?

I've been asking this for decades now and never got a satisfying answer.
 