That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a whole lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models is something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those, the unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI, and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself:
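Ollama can pull GGUF files straight from Hugging Face, so the invocation looks something like this (the Q8_0 quantization tag is an assumption; the repository offers several quantizations):

```bash
# Pull the Unsloth GGUF build of the 8B distill from Hugging Face and start a chat session
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```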
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
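LLM's built-in log commands will show where that SQLite database lives and let you review what has been logged, for example:

```bash
llm logs path   # print the location of the SQLite database LLM logs to
llm logs -n 1   # show the most recently logged prompt and response
```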
I installed the plugin and ran it like this:
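Something along these lines (the model identifier is assumed to match the name Ollama registered for the Hugging Face pull above):

```bash
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama

# Run a prompt against the distilled 8B model
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```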
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its huge size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such a fascinating insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B, currently my favourite GPT-4 class model that I've run on my own machine, I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this, a 34GB download:
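Again via Ollama's Hugging Face support, something like the following (the repository name and Q3_K_M tag are assumptions; a 3-bit quantization of a 70B model comes out at roughly the 34GB mentioned above):

```bash
# Pull and run a ~34GB quantization of the 70B distill
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```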
Can it draw a pelican?

I tried my classic "Generate an SVG of a pelican riding a bicycle" prompt too. It did not do very well:

It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try out DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com. You'll need to create an account (sign in with Google, use an email address, or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that from LLM by dropping this into your extra-openai-models.yaml configuration file:
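A sketch of what that YAML could look like, based on LLM's extra-openai-models.yaml format (the api_base value assumes DeepSeek's OpenAI-compatible endpoint):

```yaml
# Register DeepSeek's hosted reasoner model as an OpenAI-compatible model for LLM
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com/v1"
  api_key_name: deepseek
```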
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
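In full, that looks something like:

```bash
# Store the DeepSeek API key under the name referenced by api_key_name above
llm keys set deepseek

# Run a prompt against the hosted model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```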
This won't show you the reasoning tokens, sadly. Those are provided by the API (example here) but LLM doesn't yet have a way to display them.