<p><a href="https://www.deepseek.com/" target="_blank" rel="noreferrer noopener nofollow">DeepSeek V3</a> paper details: how to bypass the CUDA monopoly!</p>

<p>DeepSeek's two recently released models, DeepSeek-V3 and DeepSeek-R1, achieve performance comparable to similar models from OpenAI at a much lower cost.</p>

<p>According to foreign press reports, in just two months the team trained a 671-billion-parameter MoE language model on a cluster of 2,048 H800 GPUs, roughly ten times more efficiently than the top AI labs.</p>

<p>This breakthrough was not achieved through CUDA alone, but through a large number of fine-grained optimizations and the use of NVIDIA's assembly-like PTX (Parallel Thread Execution) programming.</p>

<p><a href="https://www.deepseek.com/" target="_blank" rel="noreferrer noopener nofollow">DeepSeek</a> had to take a different path from OpenAI and the other companies that rely on brute-force computing power, working within hard hardware constraints.</p>
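To give a concrete, hedged illustration of what programming at the PTX level means (a minimal sketch of our own, not code from the DeepSeek report), CUDA C++ allows raw PTX instructions to be embedded inline:

```cuda
// Minimal sketch of PTX-level tuning from CUDA C++ (illustrative only,
// not DeepSeek's actual code). Compile with: nvcc -arch=sm_80 example.cu
__global__ void add_one(int *x) {
    int v = x[threadIdx.x];
    // Inline PTX: issue a raw "add.s32" instruction instead of C++ "+"
    asm volatile("add.s32 %0, %0, 1;" : "+r"(v));
    x[threadIdx.x] = v;
}
```

Inline `asm` is documented NVIDIA functionality; running `nvcc -ptx example.cu` dumps the full PTX the compiler generates, which is the layer such fine-grained optimizations target.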
<p>It used a series of technical innovations to reduce the model's compute requirements while still achieving performance gains.</p>

<p>Some enthusiastic comments from netizens:</p>

<p>"In this world, there are actually groups of people crazy enough to say things like 'CUDA is too slow!'"</p>

<h2 class="wp-block-heading"><span class="ez-toc-section" id="Genius_geeks_fine-tune_PTX_to_maximize_GPU_performance"></span>Genius geeks fine-tune PTX to maximize GPU performance<span class="ez-toc-section-end"></span></h2>

<p>NVIDIA PTX (Parallel Thread Execution) is an intermediate instruction-set architecture designed specifically for NVIDIA GPUs, sitting between high-level GPU programming languages (such as CUDA C/C++) or other front ends and the low-level machine code (streaming assembly, or SASS).</p>

<p>PTX is a low-level instruction-set architecture that exposes the GPU as a data-parallel computing device, enabling fine-grained optimizations such as register allocation and thread/warp-level tuning that are not possible in languages like CUDA C/C++.</p>

<p>When PTX is lowered to SASS, it is optimized for a specific generation of NVIDIA GPUs.</p>

<p>When training the V3 model, DeepSeek reconfigured the NVIDIA H800 GPU:</p>

<p>Of the 132 streaming multiprocessors, 20 were allocated to inter-server communication, mainly for data compression and decompression, to overcome the processor's interconnect limit and 
improve processing throughput.</p>

<p>To maximize performance, DeepSeek also implemented advanced pipelining algorithms through additional fine-grained thread/warp-level adjustments.</p>

<p>These optimizations go far beyond conventional CUDA development and are extremely difficult to maintain. That said, this level of optimization is precisely what demonstrates the DeepSeek team's remarkable technical ability.</p>

<p>The V3 paper specifically discusses the details of PTX.</p>

<p>This is because, under the double pressure of a worldwide GPU shortage and US export restrictions, companies like DeepSeek have had to look for innovative solutions.</p>

<p>Fortunately, they have made significant progress in this area.</p>

<p>One developer believes that "low-level GPU programming is the right direction. 
The more you optimize, the lower the cost, or the performance budget can be spent on other advances at no extra expense."</p>

<p>This breakthrough has had a significant impact on the market, and some investors believe the new model will reduce demand for high-performance hardware, which could hurt the sales of companies such as NVIDIA.</p>

<p>Industry veterans, however, including former Intel CEO Pat Gelsinger, believe that AI applications can absorb all the computing power available.</p>

<p>Gelsinger sees DeepSeek's breakthrough as a new way to embed AI capabilities into inexpensive mass-market devices.</p>

<h2 class="wp-block-heading"><span class="ez-toc-section" id="PTX_and_CUDA"></span>PTX and CUDA<span class="ez-toc-section-end"></span></h2>

<p>So does DeepSeek's arrival mean that cutting-edge LLM development no longer requires large-scale 
GPU clusters?</p>

<p>Will the huge compute investments of Google, <a href="https://openai.com/" target="_blank" rel="noreferrer noopener nofollow">OpenAI</a>, Meta, and xAI ultimately be wasted? The general consensus among AI developers is that they will not.</p>

<p>What is certain, however, is that there is still enormous untapped potential in data processing and algorithm optimization, and more innovative optimization methods will surely emerge.</p>

<p>Because DeepSeek's V3 model is open, the details are laid out in its technical report.</p>

<p>The report documents the deep low-level optimizations DeepSeek performed. In short, the degree of optimization can be summed up as "they rebuilt the entire system from the ground up."</p>

<p>As mentioned above, when training V3 on the H800 GPU, DeepSeek customized the GPU's core compute units (streaming multiprocessor cores, or SMs) to meet its specific needs.</p>

<p>Of the 132 SMs in total, they specifically allocated 20 to handle inter-server communication tasks rather than compute tasks.</p>

<p>This customization is done at the level of PTX (Parallel Thread Execution), the NVIDIA GPU's low-level instruction set.</p>

<p>PTX runs at a level close to assembly language and enables fine-grained optimizations such as register allocation and thread/warp-level tuning.</p>
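To make the register-level nature of PTX concrete, here is roughly what a trivial "add 1" kernel looks like at the PTX level. This is a hand-written sketch in the spirit of `nvcc` output, not code from the DeepSeek report; note that virtual registers such as `%r1` are managed explicitly:

```cuda
// Sketch of PTX for a kernel that increments one integer (illustrative).
.version 7.0
.target sm_80
.address_size 64

.visible .entry add_one(.param .u64 ptr)
{
    .reg .b32  %r<3>;      // 32-bit virtual registers
    .reg .b64  %rd<3>;     // 64-bit virtual registers (addresses)

    ld.param.u64       %rd1, [ptr];   // load the pointer parameter
    cvta.to.global.u64 %rd2, %rd1;    // convert to a global-space address
    ld.global.u32      %r1, [%rd2];   // load the value
    add.s32            %r2, %r1, 1;   // increment
    st.global.u32      [%rd2], %r2;   // store it back
    ret;
}
```

Every load, store, and register is spelled out by hand, which is exactly the control (and the burden) the article describes.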
<p>However, this fine-grained control is complex and difficult to maintain.</p>

<p>This is why developers usually prefer high-level programming languages such as CUDA, which deliver sufficient performance for most parallel programming tasks and remove the need for low-level optimization.</p>

<p>However, when it comes to squeezing maximum efficiency out of GPU resources and meeting specific optimization requirements, developers have to turn to PTX.</p>

<h2 class="wp-block-heading"><span class="ez-toc-section" id="However_the_technical_barriers_remain"></span>However, the technical barriers remain<span class="ez-toc-section-end"></span></h2>

<p>On this point, commentator Ian Cutress noted: "DeepSeek's use of PTX does not remove the technical barriers of CUDA."</p>

<p>CUDA is a high-level language. It makes it easy to develop libraries and interfaces for NVIDIA GPUs and supports rapid iterative development.</p>

<p>CUDA can improve performance by fine-tuning the underlying code (for example, PTX), and its core libraries are already mature.</p>
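A hedged sketch of what that mature library ecosystem buys you (our own example, not from the DeepSeek report): a few lines of host code and cuBLAS computes y = a·x + y on the GPU, with no kernel written at all. This is precisely what raw PTX programming gives up.

```cuda
// Illustrative sketch: the CUDA library ecosystem at work (cuBLAS SAXPY).
#include <cublas_v2.h>
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    const int n = 4;
    float hx[n] = {1, 2, 3, 4}, hy[n] = {0, 0, 0, 0}, alpha = 2.0f;

    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    cublasHandle_t handle;
    cublasCreate(&handle);
    cublasSaxpy(handle, n, &alpha, dx, 1, dy, 1);  // y = alpha*x + y
    cublasDestroy(handle);

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("%g %g %g %g\n", hy[0], hy[1], hy[2], hy[3]);  // prints: 2 4 6 8
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```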
<p>Today, most production-grade software is built on CUDA.</p>

<p>PTX is closer to an assembly language directly intelligible to the GPU. It operates at a low level and allows micro-level optimization.</p>

<p>Choosing to program in PTX means that none of the built-in CUDA libraries mentioned above can be used. It is a very laborious undertaking that requires deep expertise in hardware and runtime issues.</p>

<p>However, developers who fully understand what they are doing really can achieve better runtime performance and optimization.</p>

<p>At present, the defining feature of the NVIDIA ecosystem is still CUDA.</p>

<p>Developers who want to extract an extra 10-20% of performance or energy efficiency from their workload, such as companies that deploy models in the cloud and sell token services, really have pushed their optimization from the CUDA level down to the PTX level.</p>
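One hedged sketch of what such CUDA-to-PTX tuning can look like (again our own example, not DeepSeek's code): newer GPU generations expose single PTX instructions, such as Ampere's `redux.sync`, that replace whole instruction sequences, which is also why such code ties itself to one hardware generation.

```cuda
// Illustrative sketch of arch-specific PTX tuning (not DeepSeek's code).
// On sm_80+ (Ampere) a single "redux.sync" PTX instruction sums a value
// across a warp; older GPUs need a shuffle loop. Code like this must be
// re-tuned for each hardware generation.
__device__ int warp_sum(int v) {
#if __CUDA_ARCH__ >= 800
    // One PTX instruction: reduce v across all 32 lanes of the warp
    asm volatile("redux.sync.add.s32 %0, %0, 0xffffffff;" : "+r"(v));
#else
    // Portable CUDA fallback: log2(32) = 5 shuffle steps
    for (int off = 16; off > 0; off >>= 1)
        v += __shfl_down_sync(0xffffffff, v, off);
#endif
    return v;
}
```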
<p>They are willing to invest the time because it pays off in the long run.</p>

<p>It should be noted that PTX is usually optimized for a specific hardware model and is difficult to port across hardware unless adaptation logic is written specifically for each target.</p>

<p>Moreover, hand-tuning compute kernels demands great perseverance, courage, and a particular ability to stay calm, because the program may hit a memory-access error every 5,000 cycles.</p>

<p>Of course, for the scenarios where PTX is genuinely necessary, and for the developers who are paid enough to deal with these problems, we have nothing but understanding and respect.</p>

<p>For all other developers, it is advisable to keep using CUDA or other higher-level CUDA-based variants (or MLIR).</p>