
MiniMax Bets on Linear Attention, Cutting Million-Token Long-Context Compute to 1/2700

The Transformer architecture dominates today's wave of generative AI, but it is not flawless, and it is not without challengers.

MiniMax-01 has stirred up the open-source community as a disruptor: it bets on the linear attention mechanism and scales it to an unprecedented 456 billion parameters.

This is a technical gamble, and it may also be the next milestone in architectural innovation.

▲ MiniMax-01 technical report

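In this installment of our interview series on innovative large-model architectures, QbitAI invited Zhong Yiran, who leads the MiniMax-01 architecture, to talk about the full journey of linear attention from the lab to an industrial-grade large model, and to share his thinking and insights on model architecture.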

Below is an edited transcript of QbitAI's conversation with MiniMax's Zhong Yiran:

Pioneering a Non-Mainstream Technical Route

QbitAI: Could you start by briefly introducing yourself?

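MiniMax Zhong Yiran: I'm Zhong Yiran, currently Senior Research Director at MiniMax. I'm mainly responsible for network architecture design and for multimodal understanding models. At MiniMax, my main work has been leading the design of MiniMax-01's network architecture.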

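Before that, I was a Young Scientist at the Shanghai AI Laboratory and PI of its New Architecture Exploration Group, responsible for efficient training and modeling methods for non-Transformer architectures and for research on audio-visual-language multimodal fusion.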

QbitAI: When did you start researching linear attention, and why did you choose this technical route?

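MiniMax Zhong Yiran: I first started working on linear attention in July 2021. It actually grew out of a paper I wrote when I finished my PhD in 2020, "Invertible Attention". At the time, invertible neural networks and the attention mechanism were both very hot, so we combined the two.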

▲ Paper: Invertible Attention

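Later, some members of our team turned out to be very interested in mathematics. Efficient sequence modeling methods like linear attention are mathematically demanding and require a lot of derivation, which happened to match the team's interests, so we chose this direction.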

QbitAI: What was the state of linear attention in the industry at that time?

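MiniMax Zhong Yiran: At the time it was very much non-mainstream and very few people worked on it, because most researchers were working on Transformers. In NLP, the Transformer was already close to unifying the field.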

Our thinking was: rather than keep working on Transformers and blend into the crowd, why not do something different.

QbitAI: How did you judge the technical potential of the linear attention route?

MiniMax Zhong Yiran: Our original motivation was straightforward: solve the Transformer's quadratic computational complexity. We tested many methods at the time, including sparse Transformers and linear attention.

We found that sparse Transformers did work, using less memory and running faster than the Transformer, while linear attention performed poorly and was also slow. Yet we still chose linear attention.

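On the one hand, it is mathematically very interesting, and we believed its results shouldn't be that bad. On the other hand, we believed the ceiling of sparse attention is full attention, which it can hardly surpass, whereas linear attention still has a chance of surpassing it.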

QbitAI: Could you explain what linear attention is?

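MiniMax Zhong Yiran: Linear attention is essentially a kernel trick. In a Transformer, the three matrices Q, K and V are multiplied together, and because their dimensions differ, computing QKᵀ first or KᵀV first leads to different computational complexity: for sequence length n and head dimension d, QKᵀ is an n × n matrix, while KᵀV is only d × d.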

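Multiplying K and V first makes the complexity linear in sequence length. The problem is that QKᵀ is passed through a softmax, a row-wise nonlinearity that cannot be moved past the matrix product, so the computation can't simply be regrouped to multiply K and V first. So the first step of linear attention is to remove the softmax.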

But removing the softmax hurts the results, so the next task is to keep the results consistent without the softmax. That is what linear attention is really about.
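
The regrouping argument can be checked numerically. Below is a minimal NumPy sketch, assuming the bare form with the softmax simply deleted (no feature map, no normalization, no causal mask, all of which real linear attention adds). Both groupings give the same output, but only one of them ever builds the n × n matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1024, 64  # sequence length and head dimension (illustrative values)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))

# "QK first": Q @ K.T is n x n, so this costs O(n^2 d) time and O(n^2) memory.
# With a softmax this is the only option, because softmax needs whole score rows.
left = (Q @ K.T) @ V

# "KV first": K.T @ V is only d x d, so this costs O(n d^2), linear in n.
# The regrouping is exact only because nothing nonlinear sits between the products.
right = Q @ (K.T @ V)

print(np.allclose(left, right))  # True
```

Associativity of matrix multiplication is the whole trick. The engineering work lies in what replaces the softmax so that quality does not suffer.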

▲ MiniMax-Text-01 architecture diagram

QbitAI: What are the fundamental differences between linear attention, sparse attention and linear RNN architectures?

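MiniMax Zhong Yiran: Sparse attention is still essentially softmax attention. It just computes fewer entries than the dense attention matrix. Sliding window attention, for example, only computes the attention scores inside the window and gets its speedup by computing less.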
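
To make "computing less" concrete, here is a hedged NumPy sketch of causal sliding-window attention. The window size and shapes are illustrative. It is still softmax attention: the loop version simply never computes the scores that the dense version masks out.

```python
import numpy as np

def causal_softmax_attention(Q, K, V, window=None):
    """Dense reference: full n x n scores, masked to j <= i (and i - j < window)."""
    n, d = Q.shape
    i, j = np.indices((n, n))
    mask = j <= i
    if window is not None:
        mask &= (i - j) < window
    scores = np.where(mask, Q @ K.T / np.sqrt(d), -np.inf)
    scores -= scores.max(axis=1, keepdims=True)  # row max is finite: the diagonal is kept
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V

def sliding_window_attention(Q, K, V, window):
    """Same result, but each query only scores its most recent `window` keys."""
    n, d = Q.shape
    out = np.empty_like(V)
    for i in range(n):
        lo = max(0, i - window + 1)
        scores = Q[i] @ K[lo:i + 1].T / np.sqrt(d)  # at most `window` scores
        weights = np.exp(scores - scores.max())
        out[i] = (weights / weights.sum()) @ V[lo:i + 1]
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
local = sliding_window_attention(Q, K, V, window=4)
```

With `window` equal to the sequence length, the sliding version reduces to ordinary causal softmax attention, which is why its quality ceiling is full attention.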

Linear RNNs and linear attention, on the other hand, are essentially the same thing. Some people call it an RNN and others call it attention.

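That's because all of them can be written in RNN form. For example, Lightning Attention corresponds to RWKV-4, while RWKV-7 is actually an improved Gated DeltaNet. They are similar in essence but differ in implementation details.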
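
The claim that linear attention and linear RNNs are "the same thing" can be checked directly. This is a minimal sketch, assuming the simplest causal form with no decay and no normalization, so it is exactly neither Lightning Attention nor RWKV. The same outputs come from a masked matrix product and from a recurrence over a fixed d × d state.

```python
import numpy as np

def linear_attention_parallel(Q, K, V):
    """'Attention' view: causal mask on the unnormalized n x n score matrix."""
    return np.tril(Q @ K.T) @ V

def linear_attention_recurrent(Q, K, V):
    """'RNN' view: fixed-size state S_t = S_{t-1} + k_t^T v_t, output o_t = q_t S_t."""
    S = np.zeros((K.shape[1], V.shape[1]))
    out = np.empty((Q.shape[0], V.shape[1]))
    for t in range(Q.shape[0]):
        S += np.outer(K[t], V[t])
        out[t] = Q[t] @ S
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((32, 8)) for _ in range(3))
parallel = linear_attention_parallel(Q, K, V)
recurrent = linear_attention_recurrent(Q, K, V)
```

The parallel form suits training on whole sequences. The recurrent form suits decoding, because its state never grows.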

▲ Paper: RWKV-7 "Goose" with Expressive Dynamic State Evolution

QbitAI: What were the key milestones in research on linear attention?

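MiniMax Zhong Yiran: The earliest work was around 2018-19, when researchers found that a kernel trick could reduce the theoretical computational complexity of Transformer softmax attention. At the time, though, the results were poor and the efficiency was low.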

In 2019-20, sparse attention was still the mainstream, and companies such as Google proposed many sparse attention variants. Linear attention only began to appear after that, and it suffered from poor results and low speed.

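Researchers mainly took two routes to improve it. The first approximates the softmax function so that the resulting distribution matches softmax. The second, which is the route we chose, stops caring about approximating softmax and models attention with a completely different method.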

In October 2021 we published our first paper, "cosFormer: Rethinking Softmax in Attention", which replaced the softmax operation with a cos function so that the computation could be decomposed.
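
A cos function makes the computation decomposable because of the identity cos(a − b) = cos a cos b + sin a sin b. The NumPy sketch below follows the published cosFormer recipe as we understand it: a ReLU feature map plus a cos(π/2 · (i − j)/M) re-weighting with M ≥ n. It is non-causal and single-head for brevity, so treat it as an illustration rather than the paper's code.

```python
import numpy as np

def cos_reweighted_quadratic(Q, K, V, M):
    """Reference: build the n x n similarity matrix explicitly, O(n^2 d)."""
    idx = np.arange(len(Q))
    weight = np.cos(np.pi / 2 * (idx[:, None] - idx[None, :]) / M)
    sim = (np.maximum(Q, 0) @ np.maximum(K, 0).T) * weight
    return (sim @ V) / sim.sum(axis=1, keepdims=True)

def cos_reweighted_linear(Q, K, V, M):
    """Same output in O(n d^2): split the cos weight into two separable terms."""
    angle = np.pi / 2 * np.arange(len(Q)) / M
    cos, sin = np.cos(angle)[:, None], np.sin(angle)[:, None]
    Qr, Kr = np.maximum(Q, 0), np.maximum(K, 0)
    Qc, Qs, Kc, Ks = Qr * cos, Qr * sin, Kr * cos, Kr * sin
    numerator = Qc @ (Kc.T @ V) + Qs @ (Ks.T @ V)            # only d x d products
    denominator = Qc @ Kc.sum(axis=0) + Qs @ Ks.sum(axis=0)  # row normalizer
    return numerator / denominator[:, None]

rng = np.random.default_rng(0)
n, d = 24, 16
# Shift by +1 so nearly all ReLU features are positive and the normalizer stays > 0.
Q, K = rng.standard_normal((n, d)) + 1.0, rng.standard_normal((n, d)) + 1.0
V = rng.standard_normal((n, d))
quadratic = cos_reweighted_quadratic(Q, K, V, M=n)
linear = cos_reweighted_linear(Q, K, V, M=n)
```

With M ≥ n, every angle difference lies strictly inside (−π/2, π/2). The cos weights therefore stay positive, and nearby tokens get larger weights than distant ones.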

In the first half of 2022 we published the second paper, "The Devil in Linear Transformer". It analyzed why linear attention's performance degrades and proposed solutions, and it was the precursor of Lightning Attention.

▲ Paper: The Devil in Linear Transformer

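Later we also studied positional encodings designed specifically for linear attention, as well as long convolutions, and published TNN, "Toeplitz Neural Network for Sequence Modeling". It is a method similar to S4, the predecessor of Mamba.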
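
A long convolution is multiplication by a Toeplitz matrix, meaning one whose entries depend only on t − s. That structure lets it run in O(n log n) with an FFT instead of O(n²). This is a minimal NumPy sketch, assuming a single channel and an explicit coefficient vector `a`. TNN's actual relative-position encoder and decay bias are not reproduced.

```python
import numpy as np

def toeplitz_matvec(a, x):
    """Reference, O(n^2): y = T x with lower-triangular Toeplitz T[t, s] = a[t - s]."""
    n = len(x)
    t, s = np.indices((n, n))
    T = np.where(s <= t, a[np.clip(t - s, 0, n - 1)], 0.0)
    return T @ x

def long_conv_fft(a, x):
    """Same causal convolution in O(n log n); zero-padding to 2n avoids wrap-around."""
    n = len(x)
    y = np.fft.irfft(np.fft.rfft(a, 2 * n) * np.fft.rfft(x, 2 * n), 2 * n)
    return y[:n]

rng = np.random.default_rng(0)
a = rng.standard_normal(64)  # one coefficient per relative offset t - s
x = rng.standard_normal(64)
direct, via_fft = toeplitz_matvec(a, x), long_conv_fft(a, x)
```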

Finally we introduced Lightning Attention. By improving the decay scheme and the network structure, it matched the Transformer in quality, and a blockwise algorithm (a tiling technique) made it faster.
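
Here is a hedged sketch of the tiling idea. It assumes a single head, a fixed scalar decay λ (a handcrafted decay, in the interview's terms) and no normalization. Inside each block the output is a small masked attention, and everything earlier is summarized by one d × d state. The sketch is checked against the naive O(n²) form. It is not MiniMax's GPU kernel, and its IO-aware details are not reproduced.

```python
import numpy as np

def decay_mask(n, lam):
    """D[i, j] = lam**(i - j) for j <= i, else 0."""
    i, j = np.indices((n, n))
    return np.where(j <= i, lam ** np.maximum(i - j, 0), 0.0)

def decayed_attention_naive(Q, K, V, lam):
    """o_t = sum_{s <= t} lam**(t - s) (q_t . k_s) v_s, materializing n x n."""
    return ((Q @ K.T) * decay_mask(len(Q), lam)) @ V

def decayed_attention_tiled(Q, K, V, lam, block=16):
    """Same output: masked attention within a block plus a d x d state across blocks."""
    n = len(Q)
    S = np.zeros((K.shape[1], V.shape[1]))  # S = sum_{s < p} lam**(p - s) k_s^T v_s
    out = np.empty((n, V.shape[1]))
    for p in range(0, n, block):
        Qb, Kb, Vb = Q[p:p + block], K[p:p + block], V[p:p + block]
        L = len(Qb)
        steps = np.arange(L)
        intra = ((Qb @ Kb.T) * decay_mask(L, lam)) @ Vb   # pairs inside the block
        inter = (lam ** steps)[:, None] * (Qb @ S)        # all earlier blocks
        out[p:p + L] = intra + inter
        # Move the state to the start of the next block: decay it, then add this block.
        S = lam ** L * S + (Kb * (lam ** (L - steps))[:, None]).T @ Vb
    return out

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((50, 8)) for _ in range(3))
naive = decayed_attention_naive(Q, K, V, lam=0.9)
tiled = decayed_attention_tiled(Q, K, V, lam=0.9, block=16)
```

Only block-sized masks and one d × d state are ever live at once. That memory profile is what makes the blockwise version hardware-friendly.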

QbitAI: How do you view the current non-Transformer technical routes?

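MiniMax Zhong Yiran: Linear attention is itself a non-Transformer approach. Among non-Transformer architectures, every route other than the RNN-like one has declined.

Take CNNs, for example long convolutions and large-kernel convolutions. They didn't perform well and seem to have gradually been phased out, although they are still fairly strong in some areas and remain reasonably effective in sequence modeling, for instance in anomaly detection tasks.

There are really only three non-Transformer architectures: linear attention, long convolution and linear RNNs.

**But in fact all three can be unified into one, which we call the linear complexity model.** We wrote a paper that brings all three together.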

▲ Paper: Unlocking the Secrets of Linear Complexity Sequence Model from A Unified Perspective

QbitAI: What is the core difference between Lightning Attention and Mamba or RWKV?

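MiniMax Zhong Yiran: The most fundamental difference is that Lightning Attention is the simplest form of linear attention. Mamba and RWKV both use data-dependent decay, whereas Lightning Attention uses a handcrafted decay for the sake of speed, that is, a decay that is specified by hand.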

Learnable decay does perform somewhat better, but it costs speed. For example, RWKV-7 is 10-15% slower than Gated DeltaNet, and Gated DeltaNet in turn runs at roughly half the speed of Lightning Attention.
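
This is a minimal single-head sketch of the two kinds of decay. With a handcrafted decay, each past token's weight is a known power of a constant. With a data-dependent decay, it is a product of gates computed from the inputs. The sigmoid-of-a-linear-projection gate below is a hypothetical stand-in. Mamba, RWKV and Gated DeltaNet each parameterize their decays differently, often per channel.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 4
X = rng.standard_normal((n, d))  # hypothetical token inputs that drive the gate
K = rng.standard_normal((n, d))
V = rng.standard_normal((n, d))
w = rng.standard_normal(d)       # hypothetical gate weights

lam = 0.9                        # handcrafted decay: one constant, fixed in advance
S_fixed = np.zeros((d, d))
S_gated = np.zeros((d, d))
gates = []
for t in range(n):
    S_fixed = lam * S_fixed + np.outer(K[t], V[t])
    g = 1.0 / (1.0 + np.exp(-(X[t] @ w)))  # data-dependent decay in (0, 1)
    gates.append(g)
    S_gated = g * S_gated + np.outer(K[t], V[t])

# Fixed decay: token s is weighted by lam**(n-1-s), known before seeing any data.
closed_fixed = sum(lam ** (n - 1 - s) * np.outer(K[s], V[s]) for s in range(n))
# Gated decay: token s is weighted by the product of all later gates.
closed_gated = sum(np.prod(gates[s + 1:]) * np.outer(K[s], V[s]) for s in range(n))
```

Because the fixed weights are known in advance, a blockwise kernel can precompute its decay masks. With gates, those masks depend on the data and have to be built from cumulative products at run time, which plausibly accounts for part of the speed gap described above.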

RWKV does model better than Lightning Attention, but it is slower, and it still hasn't solved the retrieval problem.

QbitAI: Is it now an industry consensus that linear attention has a high ceiling and is viable?

MiniMax Zhong Yiran: No. If it were a consensus, everyone would be scaling up linear attention models. It isn't a consensus now either: if it were, everyone would be going all-in on linear, and clearly they aren't.

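For us, though, we already saw this in the second half of 2023. I asked and talked with many people at the time. The point they raised most often was that they knew linear attention really does work at small scale, but they felt it would break down once scaled up.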

So I thought: then I'll scale it up and show everyone. Now that MiniMax-01 is out, nobody doubts linear attention's capability at large scale anymore.

From Small Experiments to Large-Scale Deployment

QbitAI: Do you think linear attention's ceiling can surpass full attention?

MiniMax Zhong Yiran: We can already see that hybrid architectures outperform pure Transformers. But the biggest problem of pure linear attention is retrieval ability, which academia currently finds hard to solve.

Existing methods are complex and slow, and they still can't fully solve it. That is why we had to move to a hybrid architecture.
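
One standard way to see why retrieval is hard for pure linear attention: its state is a fixed d × d matrix, and such a matrix can store at most d key/value associations without interference. The NumPy toy below is our own illustration, not an experiment from MiniMax. It writes pairs into the state S = Σ kᵢᵀvᵢ and reads them back using the stored keys.

```python
import numpy as np

def recall_error(num_pairs, d, rng):
    """Store key/value pairs in a d x d linear-attention state, then query each key."""
    if num_pairs <= d:
        # Up to d mutually orthonormal keys fit in d dimensions: no interference.
        basis, _ = np.linalg.qr(rng.standard_normal((d, d)))
        keys = basis[:num_pairs]
    else:
        # More keys than dimensions: they cannot all be orthogonal.
        keys = rng.standard_normal((num_pairs, d))
        keys /= np.linalg.norm(keys, axis=1, keepdims=True)
    values = rng.standard_normal((num_pairs, d))
    S = keys.T @ values      # the whole "memory": always d x d, whatever num_pairs is
    retrieved = keys @ S     # reading key i returns v_i plus cross-talk from the others
    return np.abs(retrieved - values).max()

rng = np.random.default_rng(0)
within_capacity = recall_error(32, 64, rng)
over_capacity = recall_error(256, 64, rng)
```

Softmax attention keeps every key and value in its KV cache, so it does not hit this limit. That is the motivation for keeping a few softmax layers in a hybrid.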

QbitAI: When you decided to take the work out of the lab, what milestone had you observed?

MiniMax Zhong Yiran: By May-June 2023 we already had Lightning Attention-2 internally. At the time it was the world's first linear attention implementation that ran faster than FlashAttention.

We believed it had cleared the bar for industrial use: the technology was very mature and ready to be scaled up.

QbitAI: How do you define that bar for industrial use?

MiniMax Zhong Yiran: First, it performs better than the Transformer. Second, it is faster than the Transformer. Then it has what it takes to replace the Transformer. We verified this on a 15B dense model at the time.

QbitAI: At the point when you left the lab, why did you end up teaming up with MiniMax?

MiniMax Zhong Yiran: I had actually talked with several big tech companies at the time. But in the end, it was with MiniMax that we got this done.

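First, cosFormer was a paper I co-authored with Yan Junjie, so we already had a basis for collaboration. Junjie was my boss back when he was at SenseTime. At the end of 2023 he invited me to dinner. He is someone who believes in the possibilities of frontier technology, and my understanding is that he was also looking for a technical breakthrough at the time.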

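By then MiniMax had already finished its MoE research, and there were actually few obvious breakthrough points left for the next step. Lightning Attention had already been published and Mamba had become popular, so in his eyes this was a viable direction.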

QbitAI: Is this related to MiniMax's interactive companion products?

MiniMax Zhong Yiran: Not really. What Yan Junjie cares about more is the model's ceiling and how to push further through it.

QbitAI: In the public eye, linear attention may be seen more as a way to break through on efficiency than as a way to raise the ceiling.

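MiniMax Zhong Yiran: The point is this. First, each company's compute is fixed. The faster you can make the model, the more data it can consume, and the better the resulting model. With compute held constant, a faster model is a better model.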

QbitAI: Have you seen data hitting a ceiling?

MiniMax Zhong Yiran: Not yet, I'd say. Data is still in a scaling phase, though perhaps not as aggressively as in 2023.

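That's because data keeps growing. New data appears every day, so the model always has new data to process. The internet produces a huge amount of data every day, and after cleaning we can still extract new data from it.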

QbitAI: Compared with all the data accumulated over human history, has the growth rate of data slowed?

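MiniMax Zhong Yiran: Not necessarily. Look at five thousand years of Chinese history: what accumulated amounts to just a handful of books. With the internet, though, data volume has grown along a very steep curve. All the data produced before the internet may not add up to what is produced in a single year since.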

QbitAI: What challenges did Lightning Attention face during scale-up?

MiniMax Zhong Yiran: To verify its scalability, we first ran scaling-law experiments, growing from small models step by step to 7B and 9B, and finally scaling to a model of more than 400B.

We also proved theoretically that linear attention has a larger capacity than the Transformer.

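We define capacity as the size of the RNN's current state. For the Transformer, the capacity is O(d), where d is the hidden size. For linear attention, it is d²/h, where h is the number of heads. Since d is far larger than h, the capacity is larger.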
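
The d²/h figure follows from how the state is split across heads: each of the h heads keeps a (d/h) × (d/h) matrix. As a worked example with hypothetical values d = 4096 and h = 32 (not MiniMax-01's actual configuration):

$$
h\left(\frac{d}{h}\right)^{2}=\frac{d^{2}}{h},\qquad \frac{d^{2}/h}{d}=\frac{d}{h}
$$

$$
\frac{4096^{2}}{32}=524{,}288,\qquad \frac{524{,}288}{4096}=128=\frac{d}{h}
$$

So the gap over O(d) is a factor of d/h, the per-head dimension. That is why "d is far larger than h" is the condition that matters.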

In the final implementation we also verified that the hybrid model performs better than a pure Transformer.

QbitAI: How did you achieve a 4M-token sequence window?

MiniMax Zhong Yiran: For Lightning, the training length can be arbitrary. As long as compute is saturated, training at 8K, 32K or 128K runs at the same speed: the TGS (tokens per GPU per second) is the same.

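The Transformer, by contrast, has n² computational complexity. The longer the sequence, the faster the compute grows, and latency rises quadratically. At a length of 1M, softmax attention's latency is 2,700 times that of Lightning Attention.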
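
A rough FLOP count makes the growth visible. This sketch counts only the two matrix products in one head's attention core, with a hypothetical head dimension d = 128. It ignores projections, kernels, memory traffic and hardware, so it is not expected to reproduce the measured 2,700× latency figure. It only shows the trend: the softmax-to-linear ratio grows linearly with n, while the linear per-token cost stays flat (the constant-TGS point).

```python
def attention_core_flops(n, d, kind):
    """Rough matmul FLOPs for one head's attention core (projections excluded)."""
    if kind == "softmax":
        return 4 * n * n * d  # Q K^T (2 n^2 d) + weights @ V (2 n^2 d)
    if kind == "linear":
        return 4 * n * d * d  # K^T V (2 n d^2) + Q (K^T V) (2 n d^2)
    raise ValueError(f"unknown kind: {kind}")

d = 128  # hypothetical head dimension
for n in (8_192, 32_768, 131_072, 1_048_576):
    ratio = attention_core_flops(n, d, "softmax") / attention_core_flops(n, d, "linear")
    per_token = attention_core_flops(n, d, "linear") // n
    print(f"n={n:>9,}  softmax/linear = {ratio:>6,.0f}x  linear FLOPs/token = {per_token:,}")
```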

QbitAI: What technical challenges remain on the way to an unlimited context window?

MiniMax Zhong Yiran: Our current hybrid architecture still keeps 1/8 softmax attention, and at 1M length that is the bottleneck: the latency from that 1/8 far exceeds the latency of the remaining 7/8 linear attention.

To optimize for long text, we would certainly need to optimize the softmax attention part, perhaps borrowing ideas from sparse attention to make it faster and lighter.

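We are also considering a more extreme mix of softmax and linear attention: not 1/8, but perhaps 1/16 or 1/32. The most radical option is a single softmax layer in the whole model, but we didn't adopt it to stay on the safe side. The main concern is its impact on retrieval ability.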
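
To see why the remaining 1/8 dominates at long lengths, this sketch lays out a hybrid stack with one softmax layer per period. It then estimates attention-core cost with a crude FLOP model: 4n²d for a softmax core and 4nd² for a linear core, with a hypothetical head dimension d = 128. Placing the softmax layer last in each group is an assumption made for illustration.

```python
def hybrid_layout(num_layers, period=8):
    """One softmax-attention layer per `period` layers; the rest use linear attention."""
    return ["softmax" if (i + 1) % period == 0 else "linear" for i in range(num_layers)]

def softmax_cost_share(n, d, period):
    """Fraction of attention-core FLOPs spent in the softmax layers (rough count)."""
    softmax_core, linear_core = 4 * n * n * d, 4 * n * d * d
    return softmax_core / (softmax_core + (period - 1) * linear_core)

def attention_cost_vs_full(n, d, period):
    """Hybrid attention-core FLOPs divided by those of an all-softmax stack."""
    softmax_core, linear_core = 4 * n * n * d, 4 * n * d * d
    hybrid = softmax_core + (period - 1) * linear_core  # one group of `period` layers
    return hybrid / (period * softmax_core)

layout = hybrid_layout(16, period=8)
for n in (8_192, 131_072, 1_048_576):
    for period in (8, 16, 32):
        print(f"n={n:>9,} period={period:>2}: softmax share={softmax_cost_share(n, 128, period):.4f}, "
              f"cost vs full={attention_cost_vs_full(n, 128, period):.4f}")
```

Under this model, the softmax layers account for nearly all attention cost at 1M tokens whatever the period. Moving from 1/8 to 1/32 therefore cuts the total attention cost roughly fourfold, which is the lever the answer describes.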

QbitAI: Why is retrieval ability so important for a model?

MiniMax Zhong Yiran: **Retrieval is the foundation of in-context learning; it is a necessary condition.**

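You have to remember the information in the context to do in-context learning, and in-context learning underpins all the advanced capabilities of today's large models, such as CoT (Chain of Thought), especially long CoT. All of them depend on retrieval ability.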

Winning with New Architectures

QbitAI: Have you followed the latest architectural improvements to FFN and attention in the industry?

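MiniMax Zhong Yiran: The improvement for FFN is MoE. I've also looked at ByteDance's UltraMem, but I think it is lossy, a lossy compression, and it may run into problems once scaled up. We haven't scaled it up ourselves, though, so I can only say it might have problems.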

▲ Paper: Ultra-Sparse Memory Network

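That's because FFN improvements basically come down to this. On MoE, our improvements amount to moving from the earlier large-expert setup to today's small-expert setup, which makes the model sparser, and then adding further acceleration on top. That still needs more research.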

Beyond that, since an FFN is just matrix multiplication, the only remaining optimization is the kind Nvidia does: low-level matrix-multiplication optimization at the CUDA level.
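
For reference, the top-k routing that MoE relies on fits in a few lines. This is a minimal NumPy sketch, assuming each expert is a single weight matrix (real experts are small FFNs) and a plain linear router. The expert count, k and sizes are hypothetical, and nothing here is MiniMax's routing scheme. In this picture, moving to smaller experts means more, narrower expert matrices, typically with a larger k. Each expert's work is still just a matrix multiplication.

```python
import numpy as np

def topk_moe(x, router_w, experts, k):
    """Send each token to its k highest-scoring experts; mix outputs by gate weight."""
    logits = x @ router_w                        # (tokens, num_experts)
    top = np.argsort(-logits, axis=1)[:, :k]     # k best experts per token
    out = np.zeros((x.shape[0], experts[0].shape[1]))
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                     # softmax over the chosen experts only
        for gate, e in zip(gates, top[t]):
            out[t] += gate * (x[t] @ experts[e])  # an expert is just a matmul here
    return out

rng = np.random.default_rng(0)
tokens, d_model, num_experts = 6, 16, 8
x = rng.standard_normal((tokens, d_model))
router_w = rng.standard_normal((d_model, num_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(num_experts)]
y = topk_moe(x, router_w, experts, k=2)
```

Only k of the `num_experts` matrices are touched per token, which is where the sparsity saving comes from.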

QbitAI: Have you followed improvements to attention architectures in the industry?

MiniMax Zhong Yiran: Improvements to attention basically mean going linear. We are also considering whether to build a stronger linear attention in the future, accelerating it further on top of what we have now.

There are many possible directions. One is changing the decay; another is changing some of the small tricks inside. Look out for our upcoming paper for the details.

QbitAI: Is our current ratio of context length to inference cost relatively advanced?

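MiniMax Zhong Yiran: **Once it comes to extending the sequence length, we have a very clear compute-cost advantage.** The longer the sequence, the more obvious the cost advantage becomes, for both inference and training.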

For example, at 1M tokens, linear attention consumes 1/2700 of the compute of full attention. Because we still keep 1/8 full attention, our overall cost is basically 1/8 that of a Transformer, since the linear attention part adds almost no overhead.
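
The "basically 1/8" follows from a weighted sum. Counting attention compute only, and taking the quoted 1/2700 as the cost of a linear layer relative to a full-attention layer at 1M tokens:

$$
\frac{C_{\text{hybrid}}}{C_{\text{full}}}\approx\frac{1}{8}\cdot 1+\frac{7}{8}\cdot\frac{1}{2700}\approx 0.125+0.00032\approx 0.1253
$$

So the seven linear layers add well under 1% on top of the single softmax layer's 1/8.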

▲ Long-input processing efficiency of linear attention compared with the world's leading models

QbitAI: With compute overhead this low, can the model actually become compute-bound?

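MiniMax Zhong Yiran: Right now it is indeed memory-bound: decoding is limited by memory access, not by compute. Lightning is so fast, really too fast, that there is no way to make memory access consume as few resources as the computation does. The main reason is that sequence lengths in real applications are not long enough.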

Making it compute-bound in the future comes down to how memory access is optimized. That will be the engineering team's responsibility.
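
A back-of-the-envelope roofline check shows why single-sequence decoding tends to be memory-bound. This sketch assumes a dense layer whose weights are read once per decoding step in 2-byte precision, and a hypothetical accelerator whose peak FLOP/s and bandwidth are made-up round numbers. It ignores the KV cache or recurrent state, activations and kernel overheads.

```python
def decode_arithmetic_intensity(batch_size, bytes_per_param=2):
    """FLOPs per byte of weight traffic for one decoding step of a dense layer.

    Every weight is loaded once per step and used in one multiply-add
    (2 FLOPs) for each sequence in the batch.
    """
    return 2 * batch_size / bytes_per_param

# Hypothetical accelerator figures, chosen only as round numbers.
peak_flops = 1.0e15      # FLOP/s
mem_bandwidth = 3.0e12   # bytes/s
ridge_point = peak_flops / mem_bandwidth  # intensity needed to become compute-bound

for batch in (1, 8, 64, 512):
    ai = decode_arithmetic_intensity(batch)
    regime = "compute" if ai >= ridge_point else "memory"
    print(f"batch={batch:>4}: {ai:6.1f} FLOPs/byte -> {regime}-bound")
```

Under these assumptions, only much larger effective batches, or fewer bytes moved per FLOP, push decoding past the ridge point. That is the memory-access optimization the answer leaves to engineering.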

QbitAI: If linear attention becomes the next mainstream architecture, what hardware adaptations would suit it best?

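MiniMax Zhong Yiran: The tricky part here is that you have to consider the sequence length. If your sequence lengths are around 8K or 32K, attention accounts for only about ten-odd percent of the total, and the remaining eighty-odd percent is the FFN that follows.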

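Even if you optimized attention to the extreme, all the way down to zero, you would only remove ten-odd percent of the latency. If you lengthen the sequence, attention's share keeps growing, but that is true for full attention. For linear attention, the share stays the same.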

Because the FFN is linear in sequence length and so is linear attention, attention's share stays at roughly 10% and barely changes. Even at 1M tokens it is still around ten-odd percent.

With full attention, however, attention may take up 99% of the computation and the FFN only 1%. So linear attention's advantage only shows up on long text.
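
A rough per-layer FLOP model reproduces the shape of this argument, though not the exact percentages, which depend on what is counted and on the real model's dimensions. The sketch assumes an FFN with a 4× hidden expansion (16nd² FLOPs), a softmax attention core of 4n²d and a linear attention core of 4nd²/h. It ignores the QKV/output projections, and d = 4096 and h = 32 are hypothetical.

```python
def attention_share(n, d, kind, h=32):
    """Fraction of a layer's attention-core + FFN FLOPs spent in attention (rough)."""
    ffn = 16 * n * d * d              # two matmuls with a 4x hidden expansion
    if kind == "softmax":
        attn = 4 * n * n * d          # grows with n^2
    elif kind == "linear":
        attn = 4 * n * d * d / h      # h heads, each with a (d/h) x (d/h) state
    else:
        raise ValueError(f"unknown kind: {kind}")
    return attn / (attn + ffn)

d = 4096  # hypothetical hidden size
for n in (8_192, 32_768, 1_048_576):
    print(f"n={n:>9,}  softmax share={attention_share(n, d, 'softmax'):.3f}  "
          f"linear share={attention_share(n, d, 'linear'):.4f}")
```

The n cancels out of the linear share entirely, while the softmax share climbs toward 1 as n grows. This is the asymmetry the answer describes.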

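If linear architectures become mainstream, the next step may be to pursue low-power hardware, where the remaining lever is reducing energy consumption. Spiking neural network (SNN) chips might be a better fit, and some people are in fact working on them.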

▲ Illustration of a spiking neural network chip

Looking Ahead: The Road to AGI

QbitAI: What do you expect from open-sourcing the model?

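MiniMax Zhong Yiran: First, there is the publicity effect. Personally, I think that beyond flexing some muscle, what matters most about open source is how people can put it to use afterward. Open-sourcing smaller models may be something we lean toward in the future.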

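Building the infrastructure that lets people fine-tune is also something we may need to consider. Open source is a long-term commitment for us, and our flagship models should continue to be open-sourced.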

QbitAI: Could some pure, non-hybrid architecture win out in the future?

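MiniMax Zhong Yiran: Right now no method does better than hybrid, especially on speed. Adding a small amount of softmax attention gives a very clear speed advantage when sequences are not especially long, particularly since FlashAttention appeared.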

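Research on pure architectures continues, but it is very hard and the low-hanging fruit is gone. We have some technical approaches, but none is simple to implement. In the end it depends on how long a sequence we need to support.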

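Another question is whether there is strong real demand for ultra-long context. Models such as Claude already reach 200K tokens of context, yet users seem quite satisfied with the lengths available today. Agent applications may create demand for ultra-long sequences in the future, but there is no mature benchmark for that yet.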

Still, I think it's like Nvidia building graphics cards with performance ahead of what future games need: it may not be needed yet, but it is technology built for the future.

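Deep research, for instance, requires a model to read the content of dozens of websites, with processing times of tens of minutes. That could be one application area for long context.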

QbitAI: What do you think the next big thing after CoT might be?

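MiniMax Zhong Yiran: We've thought about this. For now, reasoning models are very popular, and this year's mainstream will still be reasoning. Beyond that, it's hard for us to imagine any especially big change still ahead for pure language models.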

I've also talked with other professors. Their sense is that everyone will go back to cutting model cost: making reasoning faster and faster and cheaper and cheaper, pushing costs down while maintaining quality.

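That's because we are quickly approaching the ceiling, and most current work is filling gaps in large models' capabilities. As for an even bigger technical breakthrough, it is probably rare in the short term; we haven't seen one yet.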

QbitAI: After linear attention, what direction might MiniMax explore next?

MiniMax Zhong Yiran: The next one may be multimodal architectures: specifically, whether we should build a native large-model architecture that unifies generation and understanding.

QbitAI: With AGI as the end goal, will a model with O(n²) or O(n) computational complexity be the better answer?

MiniMax Zhong Yiran: O(n), of course. Thinking of it anthropomorphically, humans must be O(n). As an analogy, if human complexity were O(n²), I would be speaking to you more and more slowly.

For a Transformer, inference has O(n²) computational complexity: each new token attends over every token before it, so the latency of emitting the 1st token and the 100th token is not the same.
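
This per-step cost sketch illustrates the point about the 1st versus the 100th token. With a KV cache, step t of softmax attention scores the new query against t cached keys, so per-token work grows with t and the total over n tokens is O(n²). A linear-attention recurrence does the same fixed amount of work at every step. The counts are rough matmul FLOPs for one head, and d = 128 is hypothetical.

```python
def softmax_decode_step_flops(t, d):
    """Step t: score the new query against t cached keys, then mix t cached values."""
    return 4 * t * d

def linear_decode_step_flops(t, d):
    """Step t: update a fixed d x d state and read it once; t does not appear."""
    return 4 * d * d

d = 128  # hypothetical head dimension
for t in (1, 100, 10_000):
    print(t, softmax_decode_step_flops(t, d), linear_decode_step_flops(t, d))

n = 10_000
total_softmax = sum(softmax_decode_step_flops(t, d) for t in range(1, n + 1))
total_linear = sum(linear_decode_step_flops(t, d) for t in range(1, n + 1))
```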

We humans can't imagine such a thing. From birth, a person never restarts and keeps producing output all along, so the human cost of computation per step stays constant.

QbitAI: Are humans necessarily the optimal solution for intelligence?