{"id":710,"date":"2025-01-29T12:02:22","date_gmt":"2025-01-29T12:02:22","guid":{"rendered":"https:\/\/janusai.pro\/?p=710"},"modified":"2025-01-29T12:02:40","modified_gmt":"2025-01-29T12:02:40","slug":"deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly","status":"publish","type":"post","link":"https:\/\/janusai.pro\/tr\/deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly\/","title":{"rendered":"DeepSeek V3 paper details: how to bypass the CUDA monopoly!"},"content":{"rendered":"<div style=\"margin-top: 0px; margin-bottom: 0px;\" class=\"sharethis-inline-share-buttons\" ><\/div>\n<p><a href=\"https:\/\/www.deepseek.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">DeepSeek V3<\/a> paper details: how to bypass the CUDA monopoly!<\/p>\n\n\n\n<p>DeepSeek's two recently released models, DeepSeek-V3 and DeepSeek-R1, achieve performance comparable to OpenAI's similar models at a much lower cost.<\/p>\n\n\n\n<p>According to foreign media reports, in just two months they trained a 671-billion-parameter MoE language model on a cluster of 2,048 H800 GPUs, reportedly about 10 times more efficiently than leading AI companies.<\/p>\n\n\n\n<p>This breakthrough was achieved not with high-level CUDA code alone but through a large number of fine-grained optimizations and the use of NVIDIA's assembly-like PTX (Parallel Thread Execution) programming.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.deepseek.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">DeepSeek<\/a> was forced by hardware restrictions to take a different path from OpenAI and other companies that rely on brute-force computing power. 
It used a series of technological innovations to reduce the model's computing power requirements while still achieving performance gains.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=ZDM1YTM0ODZkYmQzOWNkNzc2ZTBmNzUwY2ZjOWYxMjZfYnUyVHFsb05ya0c1M0hvMGRUbk9CN3FVekR1ZjlQMEZfVG9rZW46TUtzM2JudThpb1p3NHJ4SlZNeWNWdU10bnNnXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>Some enthusiastic comments from internet users:<\/p>\n\n\n\n<p>\"In this world, 'CUDA is too slow!<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=OGEyMmE3ZTJkODlkZDlmNjliZTI1MzI5YTE4ZWE3MjdfWWRBam5VTkVaV1ZsMFg3VzVTRjRDZlUzV2ZiSHZYT2RfVG9rZW46VGZsdWJrTzZHb243OUx4bEZsbmNmMFNzblFiXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=NzI1ZTBlMjJkMDI2N2MyMDdkMGI4YmU5OTJjNGM0YzFfZW4xbjVERFdhdGVObHBDUWR3NVZjbDRSM2lrVDlWRGlfVG9rZW46Q2N5MWIxV2ltbzdmZU14VXI2amNuZDk2bmRkXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MDMyN2YyYjYwYTNkZDhmMmEyYWY2MjMzZGE3MGM1ZmFfM29veUZrRWdYODRGR0JVdWVVTnRoMzVwTWxjV09CT25fVG9rZW46SVE2dGJWek9Mb29jaTJ4ZnkzWWN5bUZWbnVnXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_72 counter-hierarchy ez-toc-counter ez-toc-grey ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">Table of Contents<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" 
class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Contents\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #999;color:#999\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewbox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #999;color:#999\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewbox=\"0 0 24 24\" version=\"1.2\" baseprofile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/janusai.pro\/tr\/deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly\/#Genius_geeks_fine-tune_PTX_to_maximize_GPU_performance\" title=\"Genius geeks fine-tune PTX to maximize GPU performance\">Genius geeks fine-tune PTX to maximize GPU performance<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/janusai.pro\/tr\/deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly\/#PTX_and_CUDA\" title=\"PTX and CUDA\">PTX and CUDA<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link 
ez-toc-heading-3\" href=\"https:\/\/janusai.pro\/tr\/deepseek-v3-paper-details-how-to-bypass-the-cuda-monopoly\/#However_the_technical_barriers_remain\" title=\"However, the technical barriers remain\">However, the technical barriers remain<\/a><\/li><\/ul><\/nav><\/div>\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"Genius_geeks_fine-tune_PTX_to_maximize_GPU_performance\"><\/span>Genius geeks fine-tune PTX to maximize GPU performance<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>NVIDIA PTX (Parallel Thread Execution) is an intermediate instruction set architecture designed specifically for NVIDIA GPUs. It sits between high-level GPU programming languages (such as CUDA C\/C++) or other language front ends and low-level machine code (streaming assembly, or SASS).<\/p>\n\n\n\n<p>PTX is a low-level instruction set architecture that presents the GPU as a data-parallel computing device and allows fine-grained optimizations, such as register allocation and thread\/warp-level tuning, that are not possible in languages like CUDA C\/C++.<\/p>\n\n\n\n<p>When PTX is translated to SASS, it is optimized for a specific generation of NVIDIA GPUs.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MGIwZTQ0ZDdhMTgxYTBlMmEzZDE5OTczN2ZlZmEzNGFfc3I3T2U0UzNCOGdjd1ZHcktHd1hkd1RpcXlLbkxrU1FfVG9rZW46Vk05WWJ0a1Bob3NkYzl4bXpFc2N6anI3bktjXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>When training the V3 model, DeepSeek reconfigured the NVIDIA H800 GPU:<\/p>\n\n\n\n<p>Of the 132 streaming multiprocessors, 20 were allocated to inter-server communication, mainly for data compression and decompression, to break through the processor's connection limit and speed up transaction processing.<\/p>\n\n\n\n<p>To maximize performance, DeepSeek also implemented advanced pipelining algorithms through additional fine-grained thread\/warp-level tuning.<\/p>\n\n\n\n<p>These optimizations go far beyond conventional CUDA development, but they are extremely difficult to maintain. Even so, this level of optimization fully demonstrates the DeepSeek team's exceptional engineering ability.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MDk2ZDMyM2IzOGU5OWNmN2JhOTY2ZDZmMjhkOWYwZGFfUnFiV0hvbnQ0ZUFHSHg3WHpyMW5jYTRvMURPM1pDSTZfVG9rZW46QnZVNWJyUzBDb2FWeE54Ym4ybGNZNXlnbmFnXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>The V3 paper specifically mentions details about PTX.<\/p>\n\n\n\n<p>This is because, under the dual pressure of the global GPU shortage and US restrictions, companies like DeepSeek have had to look for innovative solutions.<\/p>\n\n\n\n<p>Fortunately, they have made significant breakthroughs in this area.<\/p>\n\n\n\n<p>One developer believes that \"low-level GPU programming is the right direction. 
More optimization means lower cost, or a performance budget that can be spent on other advances without additional expenditure.\"<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MmEzYzA5ZTVmNjE4ZTlhMWE0NWU1ZTgyZTA2NmUxMDJfUWdNb21QeEFtUWlFSFA1aGFWZEZJMzlUNjdPT3J5NXRfVG9rZW46RWtaaGJ2UlBHbzk2VWF4TmxkeGNPeGdKblJnXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>This breakthrough has had a significant impact on the market: some investors believe the new model will reduce demand for high-performance hardware, which could affect the sales of companies such as NVIDIA.<\/p>\n\n\n\n<p>However, industry veterans, including former Intel CEO Pat Gelsinger, believe that AI applications will make full use of all the computing power available.<\/p>\n\n\n\n<p>Gelsinger sees DeepSeek's breakthrough as a new way to embed AI capabilities in low-cost devices for the mass market.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=NzgzZjM2ZTVlOWM0OWI1MDE5OTI1NTQwNWRjYTI5Y2NfZ25sc2tPNFJ1UHZwemp1WEVlclU1cloxZXI5aHJMbEZfVG9rZW46SHlGTGJnNHpHbzNzbnd4bkxPQ2N4T0RyblZkXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"PTX_and_CUDA\"><\/span>PTX and CUDA<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>So does DeepSeek's arrival mean that developing cutting-edge LLMs no longer requires large-scale GPU clusters?<\/p>\n\n\n\n<p>Will the huge investments in computing resources by Google, <a href=\"https:\/\/openai.com\/\" target=\"_blank\" rel=\"noreferrer noopener nofollow\">OpenAI<\/a>, Meta, and xAI ultimately go to waste? The general consensus among AI developers is that this is not the case.<\/p>\n\n\n\n<p>However, it is certain that there is still great potential to be tapped in data processing and algorithm optimization, and that more innovative optimization methods will emerge in the future.<\/p>\n\n\n\n<p>The details of DeepSeek's open-source V3 model are described in its technical report.<\/p>\n\n\n\n<p>The report documents the deep, low-level optimizations that DeepSeek carried out. 
In short, the degree of optimization can be summed up as \"they rebuilt the whole system from the ground up.\"<\/p>\n\n\n\n<p>As mentioned above, when training V3 on H800 GPUs, DeepSeek customized the GPU's core computing units (streaming multiprocessors, or SMs) to meet specific needs.<\/p>\n\n\n\n<p>Of the 132 SMs in total, they dedicated 20 specifically to inter-server communication tasks rather than computing tasks.<\/p>\n\n\n\n<p>This customization is done at the level of PTX (Parallel Thread Execution), the low-level instruction set of NVIDIA GPUs.<\/p>\n\n\n\n<p>PTX operates at a level close to assembly language and allows fine-grained optimizations such as register allocation and thread\/warp-level tuning. 
However, this fine control is complex and difficult to maintain.<\/p>\n\n\n\n<p>This is why developers usually prefer high-level programming languages such as CUDA, which provide sufficient performance optimization for most parallel programming tasks and remove the need for low-level optimization.<\/p>\n\n\n\n<p>However, when it comes to maximizing the efficiency of GPU resources and meeting specific optimization requirements, developers have to turn to PTX.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><span class=\"ez-toc-section\" id=\"However_the_technical_barriers_remain\"><\/span>However, the technical barriers remain<span class=\"ez-toc-section-end\"><\/span><\/h2>\n\n\n\n<p>On this point, internet user Ian Cutress said: \"DeepSeek's use of PTX does not remove the technical barriers of CUDA.\"<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=YTFkNWFkMTNiYjQzNDZiMDI3ZmYxYjA3MzExYjE1MGRfemNRaFdmM1R4MTMwUWVWTUxxbHN2SjZYNEhvazBrZlNfVG9rZW46SFlEU2IwNEd3b29kMGl4cmVaOGNTcHFZbmxjXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>CUDA is a high-level language. It makes it easier to develop libraries and interfaces for NVIDIA GPUs and supports rapid iterative development.<\/p>\n\n\n\n<p>CUDA can optimize performance by fine-tuning the underlying code (that is, PTX), and the base libraries are already complete. 
Most production-grade software today is built on CUDA.<\/p>\n\n\n\n<p>PTX is closer to an assembly language for the GPU. It works at a low level and allows micro-level optimization.<\/p>\n\n\n\n<p>If you choose to program in PTX, none of the built-in CUDA libraries mentioned above can be used. This is very tedious work that requires deep expertise in hardware and runtime issues.<\/p>\n\n\n\n<p>However, if developers fully understand what they are doing, they can indeed achieve better runtime performance and optimization.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/rmy9ct2fln.feishu.cn\/space\/api\/box\/stream\/download\/asynccode\/?code=MGU3N2MyY2Y5NDg0MzAxYjkzNzI4MDg3YTRjN2JiNjJfSTI4Um5wZkxwamJMNjRLdmx6TnFmcTlhVDhLbTEyYlhfVG9rZW46SVpVcWJ4TmRtbzdYRjF4RFk5SWN4OWdjbmRlXzE3MzgxNTE4NjQ6MTczODE1NTQ2NF9WNA\" alt=\"\"\/><\/figure>\n\n\n\n<p>At present, the mainstream of the NVIDIA ecosystem is still CUDA.<\/p>\n\n\n\n<p>Developers who want to squeeze an extra 10-20% of performance or power efficiency out of their compute workloads, such as companies that deploy models in the cloud and sell token services, have indeed optimized from the CUDA level down to the PTX level. 
They are willing to invest the time because it pays off in the long run.<\/p>\n\n\n\n<p>Note that PTX is usually optimized for a specific hardware model and is difficult to port between different hardware unless adaptation logic is written specifically.<\/p>\n\n\n\n<p>In addition, manually tuning a compute kernel requires great perseverance, courage, and a special ability to stay calm, because the program may throw a memory access error every 5,000 cycles.<\/p>\n\n\n\n<p>Of course, we fully understand and respect the scenarios where PTX is genuinely needed, and the developers who are paid enough to deal with these issues.<\/p>\n\n\n\n<p>For all other developers, it is advisable to keep using CUDA or other advanced variants based on CUDA (or MLIR).<\/p>","protected":false},"excerpt":{"rendered":"<p>DeepSeek V3 paper details: how to bypass the CUDA monopoly! DeepSeek's two recently released models, DeepSeek-V3 and DeepSeek-R1, achieve performance comparable to OpenAI's similar models at a much lower cost. 
According to foreign media reports, in just two months they trained a 671-billion-parameter MoE language model on a cluster of 2,048 H800 GPUs...<\/p>","protected":false},"author":2,"featured_media":684,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kadence_starter_templates_imported_post":false,"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"footnotes":""},"categories":[1],"tags":[],"class_list":["post-710","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-uncategorized"],"_links":{"self":[{"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/posts\/710","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/comments?post=710"}],"version-history":[{"count":1,"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/posts\/710\/revisions"}],"predecessor-version":[{"id":711,"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/posts\/710\/revisions\/711"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/media\/684"}],"wp:attachment":[{"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/media?parent=710"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/categories?post=710"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/janusai.pro\/tr\/wp-json\/wp\/v2\/tags?post=710"}],"curies":[{"name":"wp","href":"https:\/\/a
pi.w.org\/{rel}","templated":true}]}}