That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.
There's a lot of stuff in the new release.
DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:
DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.
I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.
"To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen."
This is a fascinating flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).
Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?
(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)
Licensing aside, these distilled models are fascinating beasts.
Running DeepSeek-R1-Distill-Llama-8B-GGUF
Quantized versions are already starting to show up. So far I've tried just one of those - unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI - and it's really fun to play with.
I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.
First I fetched and ran the model using Ollama itself:
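The exact invocation didn't survive this copy; given the Unsloth GGUF repo named above, the command would look something like this (the `:Q8_0` quantization tag is an assumption - substitute whichever quant you want):

```shell
# Pull and chat with the distilled 8B model directly from Hugging Face via Ollama
# (the quantization tag is illustrative, not confirmed by the original post)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```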
This downloads around 8.5GB of model data and starts an interactive chat interface.
Once the model has been fetched, LLM can talk to it too. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.
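Those SQLite logs can be browsed later with LLM's built-in `llm logs` command - a quick sketch:

```shell
# Print the location of the SQLite database LLM logs to
llm logs path
# Show the three most recent logged prompts and responses
llm logs -n 3
```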
I installed the plugin and ran it like this:
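The commands themselves aren't preserved in this copy; with the llm-ollama plugin they would look roughly like this (the model ID must match whatever name Ollama reported when the model was pulled):

```shell
# Install the plugin that lets LLM talk to local Ollama models
llm install llm-ollama
# Run a prompt against the distilled model (model ID is illustrative)
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```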
Each response starts with a `<think>...</think>` pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:
Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.
First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're typically found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.
[...]
It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the `</think>` tag:
The Joke:
A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place dishes on the table, causing a few spills.
After a series of comical mishaps, the walrus looks up and says with a grin, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"
Explanation:
- Setup: Pelican and Walrus run an unlikely tea room. Their sizes lead to comical chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.
So yeah, the joke is awful. But the process of getting there was such an interesting insight into how these new models work.
This was a fairly small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B (currently my favourite GPT-4 class model that I've run on my own machine) I have high expectations.
Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working like this - a 34GB download:
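The command isn't preserved in this copy; a plausible sketch, assuming Unsloth also published a 70B GGUF repo and that a low-bit quantization accounts for the roughly 34GB size (the exact quant tag is a guess):

```shell
# A lower-bit quant keeps the 70B download to roughly 34GB
# (repo name and quant tag are assumptions, not confirmed by the original post)
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```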
Can it draw a pelican?
I tried my classic Generate an SVG of a pelican riding a bicycle prompt too. It did not do very well:
It looked to me like it got the order of the elements wrong, so I followed up with:
the background ended up covering the rest of the image
It thought some more and gave me this:
As with the earlier joke, the chain of thought in the transcript was far more interesting than the end result.
Other ways to try DeepSeek-R1
If you want to try out the model without installing anything you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.
DeepSeek offer the model via their API, using an OpenAI-compatible endpoint. You can access that via LLM by dropping this into your extra-openai-models.yaml configuration file:
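The YAML itself didn't survive this copy; a sketch of what such an entry looks like, following LLM's documented extra-openai-models.yaml format (the `api_key_name` value must match the key name used with `llm keys set` below):

```yaml
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```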
Then run `llm keys set deepseek` and paste in your API key, then use `llm -m deepseek-reasoner 'prompt'` to run prompts.
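Spelled out as commands (the example prompt is a placeholder, not from the original post):

```shell
# Store the DeepSeek API key under the name referenced in the YAML config
llm keys set deepseek
# Run a prompt through DeepSeek's OpenAI-compatible API
llm -m deepseek-reasoner 'Five great names for a pet pelican'
```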
This won't show you the reasoning tokens, sadly. Those are served up by the API (example here) but LLM doesn't yet have a way to display them.