DeepSeek for Mortals: Decoding DeepSeek

CryptoGPTo
14 min read · Feb 4, 2025


Imagine you’re watching someone build the most amazing LEGO set ever. That’s what’s happening right now in the world of artificial intelligence with something called DeepSeek-R1. It’s like a super-smart digital brain that can solve puzzles and think through problems in ways we’ve never seen before. But here’s the really cool part — unlike other fancy AI systems that are kept under lock and key, DeepSeek-R1 is like a public library — open for everyone to use!

What makes it special? Well, it’s like having a Swiss Army knife that’s both incredibly powerful and surprisingly easy to use. The creators combined three amazing things: a clever new way of building AI (what we call innovative architecture), smart training methods (like teaching a child through games and practice), and the best part — they made it open-source, meaning anyone can use it without needing a supercomputer in their basement. It’s democratizing AI — fancy words for saying “making powerful technology available to everyone,” from students to scientists to businesses.

Stick with me as we explore how this digital brain was built, trained, and made smarter. We’ll look at the clever tricks used to overcome challenges (like teaching a robot to dance), how they packed all this knowledge into smaller, more efficient packages (think shrink ray!), and how they’re making this technology work everywhere from smartphones to supercomputers. Whether you’re a tech wizard or just curious about how machines learn to think, this story of DeepSeek-R1 will change how you see the future of technology.

How was DeepSeek-R1 Built?

Think of DeepSeek-R1 like building a really smart robot. Just as you wouldn’t build a robot in one go, DeepSeek-R1 is built in stages, each stage teaching it something new. What makes it special is that it’s like a Swiss Army knife — it can do many different tasks really well, but doesn’t need a supercomputer to run. That’s what makes DeepSeek-R1 different from other AI models that need massive computers to work.

Now let’s explore how DeepSeek-R1 was built.

Meet the Brain: DeepSeek-v3

At its heart, DeepSeek-v3 is like a super-smart brain with different expert areas — imagine having a room full of specialists, but only calling on the ones you need for each task. It’s built using something called a Mixture-of-Experts (MoE) approach, which means it has 671 billion building blocks (we call them parameters), but cleverly only uses about 37 billion at a time to save energy — like turning on only the lights in the room you’re using.
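If you like seeing ideas in code, here is a minimal sketch of the top-k routing trick behind Mixture-of-Experts, written in plain NumPy. To be clear, this is a toy illustration, not DeepSeek-v3's actual implementation: the real model routes every token inside a transformer and adds extra load-balancing machinery, and the sizes, weights, and function names below are made up for the example.

```python
import numpy as np

def moe_layer(x, expert_weights, router_weights, top_k=2):
    """Toy Mixture-of-Experts layer: route each input to only a few experts.

    x              : (hidden,) input vector
    expert_weights : list of (hidden, hidden) matrices, one per expert
    router_weights : (num_experts, hidden) matrix that scores experts
    """
    scores = router_weights @ x                  # one score per expert
    top_experts = np.argsort(scores)[-top_k:]    # pick the k best-scoring experts
    gates = np.exp(scores[top_experts])
    gates /= gates.sum()                         # softmax over the chosen experts

    # Only the selected experts do any work; the rest stay "switched off",
    # which is the spirit of activating ~37B of 671B parameters at a time.
    out = np.zeros_like(x)
    for gate, idx in zip(gates, top_experts):
        out += gate * (expert_weights[idx] @ x)
    return out

# Toy usage: 8 experts, 16-dimensional hidden state, 2 experts active per input.
rng = np.random.default_rng(0)
hidden, num_experts = 16, 8
x = rng.normal(size=hidden)
experts = [rng.normal(size=(hidden, hidden)) for _ in range(num_experts)]
router = rng.normal(size=(num_experts, hidden))
y = moe_layer(x, experts, router, top_k=2)
print(y.shape)  # (16,)
```

The key point is the loop at the end: only the chosen experts ever multiply anything, which is exactly the "turn on only the lights in the room you're using" saving.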

What makes this AI special is how much it has learned — it has read through 14.8 trillion tokens (think of them as small chunks of text) of high-quality data. It also has some neat tricks up its sleeve, like a special way of paying attention to important information (Multi-head Latent Attention) and a smart system for making sure all its “experts” share the workload fairly.

If you’re curious about how this expert system works in detail, I wrote about a similar approach in this article about another AI model.

All these clever design choices mean DeepSeek-v3 can think faster and learn more efficiently without needing as much computing power. The best part? Since December 2024, anyone can download and use it because it’s open-source — like a recipe that’s been shared with the world.

The Learning Baby: DeepSeek-R1-Zero and the “Aha” Moment

The DeepSeek-v3 model is where the innovation that paved the way for DeepSeek-R1 begins. Now, let me tell you something fascinating about its training process. Imagine you're teaching a child to solve puzzles, but instead of showing them how to do it first, you let them figure it out through trial and error. That's exactly what DeepSeek does!

Unlike other AI models that typically start by copying examples (known as Supervised Fine-Tuning or SFT), DeepSeek jumps straight into learning through experience — what we call Reinforcement Learning (RL).

Picture this: the model starts from scratch, learning to think step-by-step. It records its reasoning between special markers (we call them <think> tags) before giving the final answer. It’s like in a math class, where showing your work is just as important as the final solution!

Now, here’s where it gets clever. To make this process efficient, DeepSeek uses Group Relative Policy Optimization (GRPO). Imagine you’re a teacher with a class full of students. Instead of relying on another teacher (a “critic model”) to grade their work, you compare how well each student performed on the same problem. That’s GRPO — it evaluates a group of answers together and learns by comparing them.
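Here is a tiny sketch of that "compare the class to itself" idea, the group-relative advantage at the heart of GRPO. It is deliberately stripped down: the full GRPO objective also includes a clipped policy ratio and a KL penalty, and the reward values below are invented for the example.

```python
import numpy as np

def grpo_advantages(rewards):
    """Group Relative Policy Optimization, in miniature.

    For one question, the model writes several answers (the "group").
    Each answer's advantage is simply how much better or worse it scored
    than the rest of its own group, so no separate critic model is needed.
    """
    rewards = np.asarray(rewards, dtype=float)
    mean, std = rewards.mean(), rewards.std()
    return (rewards - mean) / (std + 1e-8)   # small epsilon avoids dividing by zero

# Example: 4 attempted answers to the same math problem, scored 1.0 for a
# correct, well-formatted answer and 0.0 otherwise.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
# roughly [+1, -1, -1, +1]: the good answers get pushed up, the bad ones down
```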

The model earns rewards in two key ways:

  • Accuracy: Like getting points for the correct answer. In math or coding tasks (like LeetCode challenges), it’s easy to verify if the response is right or wrong.
  • Format: Similar to extra points for neatness in an exam. The model must organize its reasoning neatly between <think> tags, showing each step clearly.

What’s special here is that DeepSeek simplifies the training process. Instead of relying on complex neural networks to judge answers (which can sometimes be “tricked” by clever but incorrect responses), it uses simple, rule-based grading. This approach keeps training faster, more reliable, and consistent — just like having clear, straightforward grading criteria in a classroom.
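To make "rule-based grading" concrete, here is a toy scoring function. The <think> tag comes from the description above, but the exact reward values, the regular expression, and the way the final answer is pulled out after the closing tag are my own simplifications for illustration.

```python
import re

def rule_based_reward(response, expected_answer):
    """Toy rule-based grading: no neural judge, just simple checks.

    Format reward  : did the model put its reasoning inside <think>...</think>?
    Accuracy reward: does the final answer match the known correct answer?
    """
    format_ok = bool(re.search(r"<think>.+?</think>", response, flags=re.DOTALL))
    # Assume the final answer is whatever follows the closing </think> tag.
    final_answer = response.split("</think>")[-1].strip()
    accuracy_ok = final_answer == expected_answer

    return (1.0 if accuracy_ok else 0.0) + (0.5 if format_ok else 0.0)

response = "<think>12 times 12 is 144, half of that is 72.</think> 72"
print(rule_based_reward(response, "72"))   # 1.5, meaning correct and nicely formatted
```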

Now, imagine you’re watching a child solve puzzles. At first, they might rush through, but over time they learn to take their time, think through each step, and double-check their work. That’s exactly what happens with DeepSeek during its training! As it practices more (through reinforcement learning), it naturally begins to “think longer,” writing detailed explanations. Over time, it develops impressive skills: it starts self-reflecting (checking its own work) and exploring alternatives (trying different ways to solve problems). The best part? No one explicitly taught it to do this — it figured it out on its own, just like humans improve through practice.

But here’s an interesting twist. Imagine learning to play an instrument. Even if you hit the right notes, your performance might still sound off without proper technique. That’s what happened with DeepSeek-R1-Zero, the earlier version of the model. While it learned to solve problems well through trial and error, it had two major quirks:

  1. Unreadable Responses: Imagine someone with brilliant ideas but messy handwriting. DeepSeek-R1-Zero could solve problems but struggled to explain its reasoning clearly. This happened because it skipped the crucial supervised fine-tuning stage, which teaches good “communication” skills before delving into complex problem-solving.
  2. Language Mixing: Think of someone randomly switching between English and French mid-sentence — confusing, right? DeepSeek-R1-Zero had a similar issue. It sometimes mixed multiple languages in a single response, making it difficult for users to follow along. This happened because it hadn’t learned to consistently stick to one language.

Despite these challenges, something incredible emerged — an “Aha” moment for both the model and its creators.

These quirks highlighted just how powerful reinforcement learning can be. With the right incentives, the model not only learned to solve problems but also developed complex reasoning behaviors all on its own.

This discovery laid the foundation for refining DeepSeek into the more polished and versatile DeepSeek-R1 model.

And that’s where the story of DeepSeek’s innovation gets even more exciting!

The “Aha” Moment

Here’s something fascinating that happened during the development of DeepSeek-R1. Imagine you’re teaching a child to solve puzzles, and suddenly you notice they’re spending more time thinking through each move, even going back to check their work. That’s exactly what researchers saw with DeepSeek-R1-Zero!

During training, the model began taking more “thinking time” and frequently stopped to double-check its work. It was like watching a student naturally develop better study habits over time.

This was a real “lightbulb moment.” Rather than giving the AI detailed, step-by-step instructions, the researchers simply rewarded it for good solutions — what we call reinforcement learning (RL).

And just like a child learning through trial and error, the AI began creating its own clever strategies to solve problems.

This discovery was a game-changer: it showed that AI can develop sophisticated thinking patterns when given the right incentives, without needing explicit instructions.

Building on this insight, the researchers designed DeepSeek-R1 like a student progressing through education levels. They combined the trial-and-error approach of RL with direct teaching methods (known as supervised fine-tuning). This blend helped resolve the early quirks in DeepSeek-R1-Zero, such as mixing languages in responses or generating hard-to-read explanations. By balancing these two training methods, the researchers created an AI that wasn’t just smart but also much better at clearly explaining its reasoning.

The Big Daddy: DeepSeek-R1

Here’s where it gets really interesting! You know how when you’re learning something new, you often make mistakes and learn from them? Well, that’s exactly what happened with DeepSeek-R1. The scientists looked at all the quirks and hiccups that its earlier version (DeepSeek-R1-Zero) had — like mixing up languages or explaining things in a confusing way — and thought, “Aha! We can use these lessons to make something even better!” It’s like when you’re learning to ride a bike — you fall a few times, but each tumble teaches you something valuable about balance. The team took all these lessons and used them to create a smarter, more polished version of the AI.

You know how when you’re learning to cook, you follow steps in a recipe book? Well, DeepSeek-R1 follows its own special recipe, and I’ve got a neat diagram that shows exactly how it’s done:

Alright, let me walk you through this fascinating process:

  1. Cold Start Fine-Tuning: Imagine teaching a baby to talk. You start with simple words and phrases, right? That’s exactly what we do with DeepSeek-R1! You see, its earlier version (DeepSeek-R1-Zero) had some trouble — it was like a toddler mixing up languages and speaking unclearly. So, they came up with a clever solution: they started by teaching it with a small but super-high-quality set of examples that show step-by-step thinking (what we call chain-of-thought or CoT examples). Think of it like teaching someone to solve a puzzle. Instead of just showing them the final picture, we show them how to look at the pieces, sort them, and put them together one by one. This helps DeepSeek-R1 learn to break down complex problems into smaller, manageable steps. It’s like teaching someone to cook by first showing them how to read a recipe, then how to prepare ingredients, and finally how to combine everything. By using these carefully chosen examples during this “cold start” phase, DeepSeek-R1 learned to think more clearly and explain its reasoning in a way that makes sense to humans. It’s like the difference between someone mumbling in mixed-up languages and someone speaking clearly and logically in a language you understand. Pretty neat, right?
  2. Learning Through Rewards (Reasoning-Oriented RL): Remember how we talked about the model learning through trial and error? Well, this is where it gets really interesting! Think of it like training a puppy — you give treats when they do something right. In this case, the AI is taught two main things using the same GRPO framework we discussed earlier. First, solving problems better: imagine you’re teaching someone math. When they solve a problem correctly and show their work neatly, you give them a gold star. That’s exactly what happens here — when the AI solves problems (especially math ones) using good logic and clear steps, it gets a “reward.” Over time, just like a student getting better at homework, the AI learns to think more carefully and solve problems more cleverly. Second, speaking one language at a time: imagine you’re talking to someone who keeps jumping between English and French in the same sentence. Pretty confusing, right? Well, DeepSeek-R1-Zero had the same problem. So the researchers came up with a clever trick: a special reward, kind of like a gold star for a student, every time the AI stuck to one language throughout its entire response. Think of it like teaching a child to finish their story in English before starting a new one in French — it just makes more sense that way! This simple but effective approach helped solve one of the big issues seen in the earlier model, DeepSeek-R1-Zero, where it would get a bit… linguistically confused.
  3. Rejection Sampling and SFT: The next step was to make the model smarter, using something really clever called Rejection Sampling and Supervised Fine-Tuning (SFT). Imagine you’re baking cookies, but instead of keeping all of them, you only keep the ones that look and taste perfect. That’s exactly what happens here: the AI generates lots of responses, but the researchers are pretty picky about which ones they keep for training. Here’s how it works: once the AI has gotten pretty good at its job through reinforcement learning (think of it like practice makes perfect), it is asked to solve lots of problems. For each problem, it comes up with multiple solutions. Just like a strict but fair teacher, the researchers only keep the answers that meet high standards — the ones that are correct, clear, and well-explained. Everything else goes in the recycling bin. This way, they build up a collection of top-notch examples covering all sorts of tasks — not just math, but also writing essays, answering trivia questions, and even questions about itself — which helps the AI become a well-rounded thinker. Finally, there’s one more important check: the training data is cleaned up to make it crystal clear, removing confusing stuff like mixed-up languages (imagine someone switching between English and Spanish mid-sentence!), super-long rambling paragraphs, and messy code blocks. Think of it like editing a book to make sure every page is easy to read and understand. (A rough sketch of this filtering idea appears after this walkthrough.)
  4. RL for all Scenarios: Now we come to the final chapter in training DeepSeek-R1 — it’s like teaching a smart student to become not just knowledgeable, but also kind and helpful. Think of it as finishing school for AI! In this last stage, reinforcement learning (RL) is used again — imagine giving a thumbs up every time the AI does something right. The goal is to make it super helpful while staying within the rules of good behavior (what we call ethical considerations). For brain-teaser tasks like solving math problems, it’s pretty straightforward — like giving a gold star when a student shows their work and gets the right answer. When the AI follows proper mathematical rules and shows logical thinking, it gets a reward, and over time, just like a student who practices a lot, it gets better and better at problem-solving. But here’s where it gets really interesting: for tasks that don’t have clear right or wrong answers, like writing a story or giving advice, something special is used. “Reward models,” which act like experienced teachers who’ve learned from lots of human feedback, guide the AI toward answers that most people would find helpful and appropriate.

Think of this final RL stage as graduation day — after all this training, DeepSeek-R1 becomes like a well-rounded student who’s not just book-smart, but also knows how to be helpful and considerate. It can handle all sorts of tasks while staying true to what humans expect and value.
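Before we move on, here is a back-of-the-envelope sketch of the rejection-sampling filter from step 3 above: generate many answers, keep only the ones that pass the quality checks, and recycle them as supervised training data. The checker functions here are deliberately silly stand-ins; the real pipeline used proper correctness tests and much richer readability filters over hundreds of thousands of samples.

```python
def rejection_sample(model_outputs, is_correct, is_readable):
    """Keep only the 'perfect cookies': correct, clean answers become training data.

    model_outputs : list of dicts like {"question": ..., "answer": ...}
    is_correct    : function that checks the answer against a known solution
    is_readable   : function that rejects mixed languages, rambling, messy code, etc.
    """
    keep = []
    for sample in model_outputs:
        if is_correct(sample) and is_readable(sample):
            keep.append(sample)   # goes into the supervised fine-tuning set
    return keep

# Toy usage with stand-in checkers.
outputs = [
    {"question": "2+2?", "answer": "<think>2 plus 2 is 4.</think> 4"},
    {"question": "2+2?", "answer": "<think>deux plus two equals quatre</think> 5"},
]
good = rejection_sample(
    outputs,
    is_correct=lambda s: s["answer"].strip().endswith("4"),
    is_readable=lambda s: "deux" not in s["answer"],   # crude "one language only" check
)
print(len(good))  # 1: only the clean, correct sample survives
```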

DeepSeek-R1 is a powerful AI that requires significant computing resources. To address this, scientists developed a method to compress its capabilities into smaller models while maintaining performance.

Through a process called “distillation,” they efficiently transferred the AI’s intelligence into more compact versions — similar to condensing a recipe while preserving its essential flavors.

Distillation

The DeepSeek researchers discovered a clever way to pass down the problem-solving skills of their large AI model to smaller, more efficient models — similar to how a master chef trains apprentices by sharing their best techniques. These “mini-brains” may be compact, but they can often match the performance of their larger counterpart, solving problems just as effectively with a fraction of the computing power.

This breakthrough is a game-changer. It means AI can now be deployed in places with limited resources, like smartphones, small robots, or other devices without access to supercomputers. Because these smaller models require less computational power, they can be used in a wide range of applications without the need for expensive, large-scale infrastructure.

Distillation, the process used to create these smaller models, addresses the resource challenges of training and deploying large models like DeepSeek-R1. By carefully transferring the knowledge and reasoning capabilities from the larger model, researchers retain much of the original model’s performance while significantly reducing hardware demands. This allows for scalable AI solutions across diverse environments, all while maintaining high efficiency and effectiveness.

Here is a neat diagram that shows how Distillation works:

  1. Teaching the Small AI Using the Big AI: Imagine you have a super-smart AI teacher (DeepSeek-R1) that creates about 800,000 homework problems with detailed solutions — like showing all the steps to solve a math problem or explain a complex idea. These problems cover everything from solving tricky math equations to answering tough questions and writing essays. It’s like creating the perfect study guide!
  2. Training the Student AIs: We pick some smaller AIs (like Qwen and Llama) to be our students. Think of them as younger siblings who want to be as smart as their big brother or sister. These student AIs learn by studying all those perfect examples from the teacher AI, trying to copy not just the answers, but also how to think through problems step by step.
  3. Keeping It Simple: Instead of using complicated training methods (like reinforcement learning, where AIs learn through trial and error), the students simply study the teacher’s examples directly. This is like learning from a textbook instead of experimenting in a lab — it’s faster, uses less energy, and still works really well! (See the sketch below for a toy version of how those study examples are put together.)
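Here is a rough sketch of what steps 1 and 2 boil down to in practice: the teacher's worked examples become plain prompt-and-target pairs that the student is fine-tuned to imitate. The field names, prompt format, and tiny example are mine; the source only tells us that roughly 800,000 teacher-generated samples were used for ordinary supervised fine-tuning of smaller Qwen and Llama models.

```python
def build_distillation_dataset(teacher_examples):
    """Turn the teacher's worked examples into plain supervised training pairs.

    teacher_examples : list of dicts like
        {"question": ..., "reasoning": ..., "answer": ...}
    produced by the large model. Each becomes a simple prompt/target pair
    that a smaller student model can learn to imitate.
    """
    pairs = []
    for ex in teacher_examples:
        prompt = ex["question"]
        # The student learns to reproduce the reasoning AND the answer,
        # not just the final number; that is what transfers the "thinking".
        target = f"<think>{ex['reasoning']}</think> {ex['answer']}"
        pairs.append({"prompt": prompt, "target": target})
    return pairs

teacher_examples = [
    {"question": "What is 15% of 80?",
     "reasoning": "10% of 80 is 8, 5% is 4, so 15% is 12.",
     "answer": "12"},
]
dataset = build_distillation_dataset(teacher_examples)
print(dataset[0]["target"])
# These pairs would then be fed to ordinary supervised fine-tuning of a small
# model such as Qwen or Llama; no reinforcement learning is involved.
```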

Results of Distillation: The Magic of Shrinking AI

Amazing things happen when we “shrink” our AI models. You know how sometimes smaller things can be just as powerful as bigger ones? Like how a tiny phone today can do more than a huge computer from the 1980s?

  • Well, these smaller, “distilled” AI models are doing something incredible — they’re actually thinking and reasoning just as well as (and sometimes even better than!) their bigger cousins. It’s like having a tiny expert in your pocket!
  • Here’s a mind-blowing example: Take our compact model called DeepSeek-R1-Distill-Qwen-7B. Despite being much smaller, it’s actually beating GPT-4o (which is like the heavyweight champion of AI) at solving math problems. It’s like having a quick-thinking student outsmarting a room full of professors!
  • But wait, it gets even more exciting: Right now, these smaller models are already impressive, but imagine if we taught them the same advanced learning techniques (what we call RL or Reinforcement Learning) that we used with the bigger models. It would be like giving our clever student even more tools to work with — especially for tasks that need careful thinking or understanding what humans really want.

Conclusion

What makes DeepSeek-R1 so special?

Imagine you’ve built the world’s most advanced calculator, but instead of keeping the blueprints secret, you figure out how to make a simpler version that almost anyone can build. That’s what the DeepSeek team has done with AI. They’ve created something remarkable and then found clever ways to make it work with less powerful computers.

What’s truly fascinating is how this changes the game for everyone working with AI. In the past, you needed extremely expensive computers — think millions of dollars worth of equipment — to run advanced AI systems. But DeepSeek-R1 proves something important:

with smart engineering and careful training, you can build AI systems that run on much simpler hardware.

It’s like discovering you can build a race car engine that runs on regular fuel.

This breakthrough means that researchers and developers who couldn’t afford expensive equipment can now work on advanced AI projects. Universities with limited budgets, small companies, and independent researchers can all participate in pushing AI forward. It’s not just about making things cheaper — it’s about making advanced AI accessible to more minds and more ideas.

In my next post, I’ll discuss my thoughts on how DeepSeek has transformed the AI landscape and its potential impact on our world.
