{"id":710,"date":"2025-01-29T12:02:22","date_gmt":"2025-01-29T12:02:22","guid":{"rendered":"https:\/\/janusai.pro\/?p=710"},"modified":"2025-01-29T12:02:40","modified_gmt":"2025-01-29T12:02:40","slug":"deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly","status":"publish","type":"post","link":"https:\/\/janusai.pro\/cs\/deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly\/","title":{"rendered":"DeepSeek V3 paper details: How to bypass the CUDA monopoly!"},"content":{"rendered":"<p><a href=\"https:\/\/www.deepseek.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">DeepSeek V3<\/a> paper details: how to bypass the CUDA monopoly!<\/p>\n\n\n\n<p>The two recently released DeepSeek models, DeepSeek-V3 and DeepSeek-R1, deliver performance comparable to similar OpenAI models at a much lower cost.<\/p>\n\n\n\n<p>According to foreign media reports, in just two months they trained a 671-billion-parameter MoE language model on a cluster of 2,048 H800 GPUs, reportedly ten times more efficient than the leading AI companies.<\/p>\n\n\n\n<p>This breakthrough was achieved not through ordinary CUDA code, but through a large number of fine-grained optimizations and programming in PTX (Parallel Thread Execution), NVIDIA's assembly-like instruction set.<\/p>\n\n\n\n<p>Under hardware constraints, <a href=\"https:\/\/www.deepseek.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">DeepSeek<\/a> was forced to take a different path from OpenAI and other companies that rely on brute-force computing power. 
It used a series of technical innovations to reduce the model's power requirements while also improving performance.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=ZDM1YTM0ODZkYmQzOWNkNzc2ZTBmNzUwY2ZjOWYxMjZfYnUyVHFsb05ya0c1M0hvMGRUbk9CN3FVekR1ZjlQMEZfVG9rZW46TUtzM2JudThpb1p3NHJ4SlZNeWNWdU10bnNnXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>Some enthusiastic comments from internet users:<\/p>\n\n\n\n<p>\"If there is any group of people in this world crazy enough to say things like 'CUDA is too slow!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=OGEyMmE3ZTJkODlkZDlmNjliZTI1MzI5YTE4ZWE3MjdfWWRBam5VTkVaV1ZsMFg3VzVTRjRDZlUzV2ZiSHZYT2RfVG9rZW46VGZsdWJrTzZHb243OUx4bEZsbmNmMFNzblFiXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=NzI1ZTBlMjJkMDI2N2MyMDdkMGI4YmU5OTJjNGM0YzFfZW4xbjVERFdhdGVObHBDUWR3NVZjbDRSM2lrVDlWRGlfVG9rZW46Q2N5MWIxV2ltbzdmZU14VXI2amNuZDk2bmRkXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MDMyN2YyYjYwYTNkZDhmMmEyYWY2MjMzZGE3MGM1ZmFfM29veUZrRWdYODRGR0JVdWVVTnRoMzVwTWxjV09CT25fVG9rZW46SVE2dGJWek9Mb29jaTJ4ZnkzWWN5bUZWbnVnXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_82_2 counter-hierarchy ez-toc-counter 
ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle table of contents\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/janusai.pro\/cs\/deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly\/#Genius_geeks_fine-tune_PTX_to_maximize_GPU_performance\" >Genius geeks fine-tune PTX to maximize GPU performance<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/janusai.pro\/cs\/deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly\/#PTX_and_CUDA\" >PTX and CUDA<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-3\" href=\"https:\/\/janusai.pro\/cs\/deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly\/#However_the_technical_barriers_remain\" >However, the technical barriers remain<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Genius_geeks_fine-tune_PTX_to_maximize_GPU_performance\"><\/span>Genius geeks fine-tune PTX to maximize GPU performance<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>NVIDIA PTX (Parallel Thread Execution) is an intermediate instruction set architecture designed specifically for NVIDIA GPUs. It sits between high-level GPU programming languages (such as CUDA C\/C++) or other language front ends and the low-level machine code (streaming assembler, or SASS).<\/p>\n\n\n\n<p>PTX is a low-level instruction set architecture that exposes the GPU as a data-parallel computing device and allows fine-grained optimizations, such as register allocation and thread- and warp-level tuning, that are difficult or impossible to express in languages like CUDA C\/C++.<\/p>\n\n\n\n<p>When PTX is compiled to SASS, it is optimized for a specific generation of NVIDIA GPUs.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MGIwZTQ0ZDdhMTgxYTBlMmEzZDE5OTczN2ZlZmEzNGFfc3I3T2U0UzNCOGdjd1ZHcktHd1hkd1RpcXlLbkxrU1FfVG9rZW46Vk05WWJ0a1Bob3NkYzl4bXpFc2N6anI3bktjXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>When training the V3 model, DeepSeek reconfigured the 
NVIDIA H800 GPU:<\/p>\n\n\n\n<p>Of its 132 streaming multiprocessors (SMs), 20 were allocated to server-to-server communication, mainly for data compression and decompression, to overcome the processor's interconnect limits and speed up transaction processing.<\/p>\n\n\n\n<p>To maximize performance, DeepSeek also implemented advanced pipelining algorithms through additional fine-grained thread- and warp-level tuning.<\/p>\n\n\n\n<p>These optimizations go far beyond conventional CUDA development, but they are extremely difficult to maintain. Even so, this level of optimization demonstrates the DeepSeek team's exceptional engineering skill.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MDk2ZDMyM2IzOGU5OWNmN2JhOTY2ZDZmMjhkOWYwZGFfUnFiV0hvbnQ0ZUFHSHg3WHpyMW5jYTRvMURPM1pDSTZfVG9rZW46QnZVNWJyUzBDb2FWeE54Ym4ybGNZNXlnbmFnXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>The V3 paper explicitly mentions details about PTX.<\/p>\n\n\n\n<p>This is because, under the dual pressure of a global GPU shortage and US restrictions, companies like DeepSeek have had to look for innovative solutions.<\/p>\n\n\n\n<p>Fortunately, they have made significant progress in this area.<\/p>\n\n\n\n<p>One developer believes that \"low-level GPU programming is the right direction. The more optimization, the lower the cost, or the more 
performance budget that can be put toward further progress at no extra expense.\"<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MmEzYzA5ZTVmNjE4ZTlhMWE0NWU1ZTgyZTA2NmUxMDJfUWdNb21QeEFtUWlFSFA1aGFWZEZJMzlUNjdPT3J5NXRfVG9rZW46RWtaaGJ2UlBHbzk2VWF4TmxkeGNPeGdKblJnXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>This breakthrough has had a significant impact on the market. Some investors believe the new model will reduce demand for high-performance hardware, which could affect the sales of companies such as NVIDIA.<\/p>\n\n\n\n<p>However, industry veterans, including former Intel CEO Pat Gelsinger, believe that AI applications can make full use of all available computing power.<\/p>\n\n\n\n<p>Gelsinger sees DeepSeek's breakthrough as a new way to build AI capabilities into low-cost devices for the mass market.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=NzgzZjM2ZTVlOWM0OWI1MDE5OTI1NTQwNWRjYTI5Y2NfZ25sc2tPNFJ1UHZwemp1WEVlclU1cloxZXI5aHJMbEZfVG9rZW46SHlGTGJnNHpHbzNzbnd4bkxPQ2N4T0RyblZkXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"PTX_and_CUDA\"><\/span>PTX and CUDA<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>So does the arrival of DeepSeek mean that developing cutting-edge LLMs no longer requires 
large GPU clusters?<\/p>\n\n\n\n<p>Will the huge investments in computing resources by Google, <a href=\"https:\/\/openai.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">OpenAI<\/a>, Meta, and xAI ultimately go to waste? The general consensus among AI developers is that they will not.<\/p>\n\n\n\n<p>What is certain, however, is that there is still huge untapped potential in data processing and algorithm optimization, and more innovative optimization methods will surely emerge.<\/p>\n\n\n\n<p>DeepSeek's V3 model has been released openly, and its technical report discloses the details at length.<\/p>\n\n\n\n<p>The report documents the deep low-level optimizations carried out by DeepSeek. In short, the extent of the optimization can be summed up as \"they rebuilt the entire system from the ground up.\"<\/p>\n\n\n\n<p>As mentioned above, when training V3 on H800 GPUs, DeepSeek customized the GPU's core compute units (streaming multiprocessors, or SMs) to its specific needs.<\/p>\n\n\n\n<p>Of the 132 SMs in total, 20 were dedicated to server-to-server communication tasks rather than compute tasks.<\/p>\n\n\n\n<p>This customization is done at the level of PTX (Parallel Thread Execution), the low-level instruction set of NVIDIA GPUs.<\/p>\n\n\n\n<p>PTX operates at a level close to assembly language and allows fine-grained optimizations 
such as register allocation and thread- and warp-level tuning. However, this fine-grained control is complex and hard to maintain.<\/p>\n\n\n\n<p>That is why developers usually prefer high-level programming languages such as CUDA, which provide sufficient performance optimization for most parallel programming tasks and remove the need for low-level optimization.<\/p>\n\n\n\n<p>However, when it comes to maximizing the efficiency of GPU resources and meeting specific optimization requirements, developers have to turn to PTX.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"However_the_technical_barriers_remain\"><\/span>However, the technical barriers remain<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>On this point, internet user Ian Cutress said: \"DeepSeek's use of PTX does not remove the technical barriers of CUDA.\"<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=YTFkNWFkMTNiYjQzNDZiMDI3ZmYxYjA3MzExYjE1MGRfemNRaFdmM1R4MTMwUWVWTUxxbHN2SjZYNEhvazBrZlNfVG9rZW46SFlEU2IwNEd3b29kMGl4cmVaOGNTcHFZbmxjXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>CUDA is a high-level language. It makes it easier to develop libraries and interfaces for NVIDIA GPUs and supports rapid iterative development.<\/p>\n\n\n\n<p>CUDA can optimize performance by fine-tuning the underlying code (that is, 
PTX), and the core libraries are already in place. Most production-grade software today is built on CUDA.<\/p>\n\n\n\n<p>PTX is closer to a readable assembly language for the GPU. It works at a low level and allows micro-level optimization.<\/p>\n\n\n\n<p>Choosing to program in PTX means that none of the ready-made CUDA libraries mentioned above can be used. It is a very tedious task that requires deep knowledge of the hardware and of runtime issues.<\/p>\n\n\n\n<p>However, developers who fully understand what they are doing can indeed achieve better performance and runtime optimization.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MGU3N2MyY2Y5NDg0MzAxYjkzNzI4MDg3YTRjN2JiNjJfSTI4Um5wZkxwamJMNjRLdmx6TnFmcTlhVDhLbTEyYlhfVG9rZW46SVpVcWJ4TmRtbzdYRjF4RFk5SWN4OWdjbmRlXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>For now, CUDA remains the mainstream of the NVIDIA ecosystem.<\/p>\n\n\n\n<p>Developers who want to squeeze an extra 10-20% of performance or power efficiency out of their workloads, such as companies that deploy models in the cloud and sell token services, have indeed moved their optimization from the CUDA level down to the PTX level. 
They are willing to invest the time because it pays off in the long run.<\/p>\n\n\n\n<p>Note that PTX code is usually optimized for a specific hardware model and is hard to port across different hardware unless adaptation logic is written specifically for that purpose.<\/p>\n\n\n\n<p>Hand-tuning a compute kernel also takes a great deal of perseverance, courage, and a special ability to stay calm, because the program may hit a memory access error every 5,000 cycles.<\/p>\n\n\n\n<p>Of course, for scenarios where PTX is truly needed, and for developers who are paid well enough to deal with these problems, we have full understanding and respect.<\/p>\n\n\n\n<p>All other developers are advised to keep using CUDA or other higher-level variants built on CUDA (or MLIR).<\/p>","protected":false},"excerpt":{"rendered":"<p>DeepSeek V3 paper details: How to bypass the CUDA monopoly! The two recently released DeepSeek models, DeepSeek-V3 and DeepSeek-R1, deliver performance comparable to similar OpenAI models at a much lower cost. 
According to foreign media reports, in just two months they trained a 671-billion-parameter MoE language model on a cluster of 2,048...<\/p>","protected":false},"author":2,"featured_media":684,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kadence_starter_templates_imported_post":false,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-710","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/posts\/710","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/comments?post=710"}],"version-history":[{"count":1,"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/posts\/710\/revisions"}],"predecessor-version":[{"id":711,"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/posts\/710\/revisions\/711"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/media\/684"}],"wp:attachment":[{"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/media?parent=710"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/categories?post=710"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/janusai.pro\/cs\/wp-json\/wp\/v2\/tags?post=710"}],"curies":[{"name":"wp","href":"ht
tps:\/\/api.w.org\/{rel}","templated":true}]}}