From 335c220ec121e95cbeba61f4ee2a2531890c6386 Mon Sep 17 00:00:00 2001 From: Abbie Santo Date: Mon, 10 Feb 2025 12:51:05 +0300 Subject: [PATCH] Add 'DeepSeek-R1, at the Cusp of An Open Revolution' --- ...R1%2C-at-the-Cusp-of-An-Open-Revolution.md | 40 +++++++++++++++++++ 1 file changed, 40 insertions(+) create mode 100644 DeepSeek-R1%2C-at-the-Cusp-of-An-Open-Revolution.md diff --git a/DeepSeek-R1%2C-at-the-Cusp-of-An-Open-Revolution.md b/DeepSeek-R1%2C-at-the-Cusp-of-An-Open-Revolution.md new file mode 100644 index 0000000..522be56 --- /dev/null +++ b/DeepSeek-R1%2C-at-the-Cusp-of-An-Open-Revolution.md @@ -0,0 +1,40 @@ +
DeepSeek R1, the newest entrant to the Large Language Model wars, has created quite a splash over the last couple of weeks. Its entry into a space dominated by the Big Corps, while pursuing asymmetric and novel strategies, has been a refreshing eye-opener.
+
GPT-style AI improvement was starting to show signs of slowing down, and has been observed to be reaching a point of diminishing returns as it runs out of the data and compute needed to train and fine-tune increasingly large models. This has turned the focus towards building "reasoning" models that are post-trained through reinforcement learning, with techniques such as inference-time and test-time scaling and search algorithms used to make the models appear to think and reason better. OpenAI's o1-series models were the first to achieve this successfully with their inference-time scaling and Chain-of-Thought reasoning.
+
Intelligence as an emergent property of Reinforcement Learning (RL)
+
Reinforcement Learning (RL) has been used effectively in the past by Google's DeepMind team to build highly intelligent and specialized systems where intelligence is observed as an emergent property of a rewards-based training approach that yielded achievements like AlphaGo (see my post on it here - AlphaGo: a journey to machine intuition).
+
DeepMind went on to build a series of Alpha* projects that achieved many notable feats using RL:
+
AlphaGo, which beat the world champion Lee Sedol in the game of Go
AlphaZero, a system that learned to play games such as Chess, Shogi and Go without human input
AlphaStar, which achieved high performance in the complex real-time strategy game StarCraft II
AlphaFold, a tool for predicting protein structures which significantly advanced computational biology
AlphaCode, a model designed to generate computer programs, performing competitively in coding challenges
AlphaDev, a system developed to discover novel algorithms, notably optimizing sorting algorithms beyond human-derived techniques
+All of these systems achieved mastery in their own domains through self-training/self-play, maximizing the cumulative reward over time by interacting with their environment, where intelligence was observed as an emergent property of the system.
+
RL mimics the process through which a child learns to walk, through trial, error and first principles.
+
R1 model training pipeline
+
At a technical level, DeepSeek-R1 leverages a combination of Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) in its training pipeline:
+
Using RL and DeepSeek-v3, an interim reasoning model called DeepSeek-R1-Zero was developed purely with RL, without relying on SFT, and it showed remarkable reasoning abilities that matched the performance of OpenAI's o1 on certain benchmarks such as AIME 2024.
+
The model was, however, affected by poor readability and language-mixing, and is only an interim reasoning model built on RL principles and self-evolution.
+
DeepSeek-R1-Zero was then used to generate SFT data, which was combined with supervised data from DeepSeek-v3 to re-train the DeepSeek-v3-Base model.
+
The new DeepSeek-v3-Base model then underwent additional RL with prompts and scenarios to arrive at the DeepSeek-R1 model.
+
The R1 model was then used to distill a number of smaller open-source models such as Llama-8b and Qwen-7b/14b, which outperformed larger models by a wide margin, effectively making the smaller models more accessible and usable.
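The multi-stage recipe above can be summarized in a short sketch. This is only a hypothetical outline of the stages as described in this post; the function name and stage labels are illustrative placeholders, not DeepSeek's actual code.

```python
# Hypothetical outline of the R1 training stages described above.
# The function name and stage labels are illustrative placeholders.

def train_r1_pipeline():
    stages = []

    # Stage 1: pure RL on the base model (no SFT) yields R1-Zero:
    # strong reasoning, but poor readability and language-mixing.
    stages.append("RL(DeepSeek-v3-Base) -> DeepSeek-R1-Zero")

    # Stage 2: R1-Zero generates SFT data, combined with supervised
    # data from DeepSeek-v3, to re-train the base model.
    stages.append("SFT(DeepSeek-v3-Base, R1-Zero data + v3 data)")

    # Stage 3: a further round of RL over prompts and scenarios
    # produces the final R1 model.
    stages.append("RL(re-trained base) -> DeepSeek-R1")

    # Stage 4: R1 outputs are used to distill smaller open models
    # (e.g. Llama-8b, Qwen-7b/14b).
    stages.append("Distill(DeepSeek-R1) -> smaller open models")

    return stages
```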
+
Key contributions of DeepSeek-R1
+
1. RL without the need for SFT for emergent reasoning capabilities
+R1 was the first open research project to validate the efficacy of RL applied directly to the base model without relying on SFT as a first step, which resulted in the model developing advanced reasoning capabilities purely through self-reflection and self-verification.
+
Although it did degrade in its language capabilities during the process, its Chain-of-Thought (CoT) capabilities for solving complex problems were later used for further RL on the DeepSeek-v3-Base model, which became R1. This is a significant contribution back to the research community.
+
The analysis below of DeepSeek-R1-Zero and OpenAI o1-0912 shows that it is viable to achieve robust reasoning capabilities through RL alone, which can be further augmented with other techniques to deliver even better reasoning performance.
+
It's quite intriguing that the application of RL gives rise to seemingly human capabilities of "reflection" and reaching "aha" moments, causing the model to pause, ponder and focus on a specific aspect of the problem, resulting in emergent abilities to problem-solve as humans do.
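Notably, this kind of RL setup needs no learned reward model: the DeepSeek-R1 report describes simple rule-based rewards combining a format check with an accuracy check. The sketch below is only illustrative of that idea; the actual reward functions, tag conventions and weights are not published, so every detail here is an assumption.

```python
import re

def compute_reward(response: str, expected_answer: str) -> float:
    """Illustrative rule-based reward in the spirit of R1-Zero training.

    Two components (weights and tag format are assumptions, not
    DeepSeek's actual values): a format reward for wrapping reasoning
    in <think> tags, and an accuracy reward for the final answer.
    """
    reward = 0.0
    # Format reward: reasoning must appear inside <think>...</think>.
    if re.search(r"<think>.+?</think>", response, re.DOTALL):
        reward += 0.5
    # Accuracy reward: the text after the reasoning block must match
    # the expected answer exactly.
    answer = response.split("</think>")[-1].strip()
    if answer == expected_answer.strip():
        reward += 1.0
    return reward
```

Because the reward is a pure function of the output string, it is cheap, unhackable by a learned critic, and scales to millions of rollouts.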
+
2. Model distillation
+DeepSeek-R1 also demonstrated that larger models can be distilled into smaller ones, which makes advanced capabilities available in resource-constrained environments, such as your laptop. While it's not possible to run a 671b model on a stock laptop, you can still run a distilled 14b model, which still performs better than most publicly available models out there. This allows intelligence to be brought closer to the edge, permitting faster inference at the point of experience (such as on a smartphone, or on a Raspberry Pi), which paves the way for more use cases and possibilities for innovation.
+
Distilled models are very different from R1, which is a massive model with an entirely different architecture than the distilled variants, so they are not directly comparable in terms of capability, but are instead built to be smaller and more efficient for constrained environments. The ability to distill a larger model's capabilities down to a smaller model for portability, accessibility, speed, and cost will open up a lot of possibilities for applying artificial intelligence in places where it would otherwise not have been possible. This is another crucial contribution of this technology from DeepSeek, which I believe has even further potential for the democratization and accessibility of AI.
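Distillation itself is a well-worn idea. DeepSeek's distilled models were reportedly produced by fine-tuning smaller models on R1-generated samples, but the classic formulation (Hinton et al.) instead matches the student's temperature-softened output distribution to the teacher's. A minimal sketch of that textbook variant, with made-up logits, shown purely for illustration:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(teacher_logits, student_logits, temperature=2.0):
    """Classic soft-label distillation loss: KL divergence between the
    softened teacher and student distributions. Note: this is the
    textbook variant for illustration; DeepSeek distilled R1 by
    fine-tuning small models on R1-generated samples, not by
    matching logits."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The loss is zero when the student already mirrors the teacher, and strictly positive otherwise, so minimizing it pulls the small model's behavior toward the large one's.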
+
Why is this moment so significant?
+
DeepSeek-R1 was a pivotal contribution in many ways.
+
1. The contributions to the state-of-the-art and to open research help move the field forward where everyone benefits, not just a few highly funded AI labs building the next billion-dollar model.
2. Open-sourcing and making the model freely available follows an asymmetric strategy against the prevailing closed nature of much of the model-sphere of the larger players. DeepSeek should be applauded for making their contributions free and open.
3. It reminds us that it's not just a one-horse race, and it incentivizes competition, which has already resulted in OpenAI o3-mini, a cost-effective reasoning model which now shows Chain-of-Thought reasoning. Competition is a good thing.
4. We stand at the cusp of an explosion of small models that are hyper-specialized and optimized for a specific use case, and that can be trained and deployed cheaply for solving problems at the edge. It raises a lot of exciting possibilities and is why DeepSeek-R1 is one of the most pivotal moments of tech history.
+Truly exciting times. What will you build?
\ No newline at end of file