{"componentChunkName":"component---src-templates-info-posts-js","path":"/information/AI/8bfa88f8-39d9-50c7-a6a8-c1e7795341f7","result":{"data":{"microcmsInformation":{"id":"8bfa88f8-39d9-50c7-a6a8-c1e7795341f7","title":"Gemma 4 in Depth: An Overview of Google DeepMind's Open-Weight LLM/VLM Family and an Adoption Guide","date":"April 4, 2026","image":{"url":"https://images.microcms-assets.io/assets/52137c02cafa4450bbdc092b64fbadac/8bad1975bebc42efbf5c01ac2eff98c5/gemma-4-header.png"},"author":{"author":"Goubara"},"body":"<h1 id=\"h6dfe3f55e2\">Gemma 4 in Depth: An Overview of Google DeepMind's Open-Weight LLM/VLM Family and an Adoption Guide</h1><p>On April 2, 2026, Google DeepMind announced <strong>Gemma 4</strong>, a new family of open-weight models. Gemma 4 is released under the <strong>Apache 2.0 license</strong> and is a set of models focused on reasoning, agentic workflows, coding, and multimodal understanding. Google officially positions Gemma 4 as an open model “built on the same research and technology as Gemini 3.”</p><h2 id=\"hbbe87573b8\">What Is Gemma 4: Definition and Capabilities</h2><p>Gemma 4 is a family of <strong>open-weight LLMs/VLMs (models whose weights are publicly released)</strong> developed by Google DeepMind. Built on Gemini technology, it is strong in reasoning, agentic workflows, coding, and multimodal understanding.</p><p>The main features are as follows.</p><ul><li><strong>License:</strong> Apache 2.0 (permits commercial use, modification, and redistribution under the license's terms)</li><li><strong>Input/output:</strong> Text and image input → text output. The small models (E2B/E4B) also accept audio input</li><li><strong>Context length:</strong> Up to 128K tokens for E2B/E4B, and up to 256K tokens for 26B-A4B/31B</li><li><strong>Multilingual:</strong> Pretrained on more than 140 languages, with standard support for more than 35</li><li><strong>Thinking mode:</strong> Includes a configurable thinking mode for advanced reasoning tasks</li></ul><p>The move to Apache 2.0 from the custom “Gemma Terms of Use” that applied through Gemma 3 is a significant change from an enterprise adoption standpoint.</p><h2 id=\"h8353926643\">Model Lineup: Understanding the Four Variants</h2><p>Gemma 4 is offered in four sizes to suit different use cases and execution environments.</p><table><tbody><tr><th colspan=\"1\" rowspan=\"1\"><p>Variant</p></th><th colspan=\"1\" rowspan=\"1\"><p>Architecture</p></th><th colspan=\"1\" rowspan=\"1\"><p>Total parameters</p></th><th colspan=\"1\" rowspan=\"1\"><p>Active at inference</p></th><th colspan=\"1\" rowspan=\"1\"><p>Context</p></th><th colspan=\"1\" rowspan=\"1\"><p>Audio input</p></th></tr><tr><td colspan=\"1\" rowspan=\"1\"><p><strong>E2B</strong></p></td><td colspan=\"1\" rowspan=\"1\"><p>Dense + PLE</p></td><td 
colspan=\"1\" rowspan=\"1\"><p>approx. 5.1B</p></td><td colspan=\"1\" rowspan=\"1\"><p>approx. 2.3B effective</p></td><td colspan=\"1\" rowspan=\"1\"><p>128K</p></td><td colspan=\"1\" rowspan=\"1\"><p>Yes</p></td></tr><tr><td colspan=\"1\" rowspan=\"1\"><p><strong>E4B</strong></p></td><td colspan=\"1\" rowspan=\"1\"><p>Dense + PLE</p></td><td colspan=\"1\" rowspan=\"1\"><p>Not listed</p></td><td colspan=\"1\" rowspan=\"1\"><p>approx. 4B effective</p></td><td colspan=\"1\" rowspan=\"1\"><p>128K</p></td><td colspan=\"1\" rowspan=\"1\"><p>Yes</p></td></tr><tr><td colspan=\"1\" rowspan=\"1\"><p><strong>26B-A4B</strong></p></td><td colspan=\"1\" rowspan=\"1\"><p>MoE (128 experts)</p></td><td colspan=\"1\" rowspan=\"1\"><p>26B</p></td><td colspan=\"1\" rowspan=\"1\"><p>approx. 3.8B</p></td><td colspan=\"1\" rowspan=\"1\"><p>256K</p></td><td colspan=\"1\" rowspan=\"1\"><p>No</p></td></tr><tr><td colspan=\"1\" rowspan=\"1\"><p><strong>31B</strong></p></td><td colspan=\"1\" rowspan=\"1\"><p>Dense</p></td><td colspan=\"1\" rowspan=\"1\"><p>31B</p></td><td colspan=\"1\" rowspan=\"1\"><p>31B</p></td><td colspan=\"1\" rowspan=\"1\"><p>256K</p></td><td colspan=\"1\" rowspan=\"1\"><p>No</p></td></tr></tbody></table><p><strong>The “E” stands for Effective parameters.</strong> A technique called Per-Layer Embeddings (PLE) lets these models reach expressiveness comparable to larger models while using less memory than their actual parameter count would suggest. E2B is said to fit in about 1.5GB of memory or less when quantized, which puts running on smartphones within reach.</p><p>26B-A4B is a <strong>Mixture of Experts (MoE)</strong> model that activates only 8 of its 128 small experts, plus 1 shared expert, for each token. Because only about 3.8B parameters are used at inference, it suits latency-sensitive uses. 31B Dense is the variant that puts quality first and is also a strong base for fine-tuning.</p><h2 id=\"hbdab032353\">Architecture Highlights: PLE, Hybrid Attention, and p-RoPE</h2><p>Gemma 4's architecture has the following three main technical innovations.</p><h3 id=\"h67cfb8d46e\">Per-Layer Embeddings (PLE)</h3><p>PLE is <strong>a second embedding table that feeds a small residual signal</strong> into each decoder layer. For each layer it produces a vector that combines a token-ID component with a context-aware component, and a lightweight residual block adjusts the hidden state after attention and feed-forward. This gives even the small models representational depth comparable to larger models.</p><h3 
id=\"h69cc5ff577\">Hybrid Attention</h3><p>Local sliding-window attention (512 tokens in the small models, 1024 tokens in the large models) and global full-context attention are <strong>interleaved layer by layer</strong>. The final layer always uses global attention, which balances deep long-context understanding with a low memory footprint.</p><h3 id=\"h611214b84f\">Proportional RoPE (p-RoPE)</h3><p>This is a <strong>dual-RoPE configuration</strong> that applies standard RoPE to the sliding-window layers and Proportional RoPE to the global layers. The global layers unify Keys and Values, which allows long contexts to be processed efficiently.</p><h2 id=\"h16cc2cc6c3\">Performance Metrics: Progress Since Gemma 3</h2><p>The main metrics published on the official blog and in the model card are summarized below.</p><table><tbody><tr><th colspan=\"1\" rowspan=\"1\"><p>Benchmark</p></th><th colspan=\"1\" rowspan=\"1\"><p>31B Dense</p></th><th colspan=\"1\" rowspan=\"1\"><p>Notes</p></th></tr><tr><td colspan=\"1\" rowspan=\"1\"><p>MMLU-Pro</p></td><td colspan=\"1\" rowspan=\"1\"><p>85.2%</p></td><td colspan=\"1\" rowspan=\"1\"><p>Reported to exceed Qwen 3.5 27B</p></td></tr><tr><td colspan=\"1\" rowspan=\"1\"><p>AIME 2026</p></td><td colspan=\"1\" rowspan=\"1\"><p>89.2%</p></td><td colspan=\"1\" rowspan=\"1\"><p>Mathematical reasoning</p></td></tr><tr><td colspan=\"1\" rowspan=\"1\"><p>LiveCodeBench v6</p></td><td colspan=\"1\" rowspan=\"1\"><p>80.0%</p></td><td colspan=\"1\" rowspan=\"1\"><p>Coding</p></td></tr><tr><td colspan=\"1\" rowspan=\"1\"><p>Codeforces ELO</p></td><td colspan=\"1\" rowspan=\"1\"><p>2150</p></td><td colspan=\"1\" rowspan=\"1\"><p>Competitive with larger models</p></td></tr></tbody></table><p>Note: The scores above are figures reported on the official blog and in related articles. They may vary with evaluation conditions (prompt settings, whether thinking mode is enabled, and so on), so check the details in the <a href=\"https://ai.google.dev/gemma/docs/core/model_card_4\" target=\"_blank\" rel=\"noopener noreferrer\">official model card</a> when making an adoption decision.</p><p>The main differences from Gemma 3 are as follows.</p><ul><li><strong>Reasoning and math:</strong> Substantial improvement on advanced reasoning benchmarks such as AIME</li><li><strong>Coding:</strong> Better results on code tasks, including native function calling support</li><li><strong>Safety:</strong> For both text-to-text and image-to-text, policy violations are substantially reduced while the rate of unjustified refusals is also kept low</li><li><strong>License:</strong> Changed from the custom Gemma license to Apache 2.0</li><li><strong>Audio input:</strong> E2B/E4B support automatic speech recognition (ASR) and speech translation (a capability Gemma 3 did not have)</li><li><strong>Arena ranking:</strong> 31B ranks third among open models worldwide and 26B ranks sixth (Arena AI 
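text leaderboard, per the official blog announcement, as of April 1, 2026)</li></ul><h2 id=\"h910a6f3a9c\">Getting and Running Gemma 4: A Guide by Platform</h2><p>Gemma 4 can be obtained and run from multiple platforms.</p><h3 id=\"h8cefb065a2\">Hugging Face</h3><p>Each variant can be downloaded from its model page (for example, <a href=\"https://huggingface.co/google/gemma-4-31B\" target=\"_blank\" rel=\"noopener noreferrer\">google/gemma-4-31B</a>). Models with the it suffix are instruction-tuned.</p><pre><code># Install the dependencies first (shell): pip install -U transformers torch\nfrom transformers import pipeline\n\n# Load the instruction-tuned 31B model as a text-generation pipeline\npipe = pipeline(&quot;text-generation&quot;, model=&quot;google/gemma-4-31B-it&quot;)\n</code></pre><h3 id=\"h5dbc8ded28\">Ollama</h3><p>Gemma 4 can be run locally with <a href=\"https://ollama.com/library/gemma4\" target=\"_blank\" rel=\"noopener noreferrer\">Ollama</a>. Instruction-tuned models are available.</p><pre><code>ollama run gemma4</code></pre><h3 id=\"hfde792250c\">Kaggle</h3><p>You can try the models directly in a notebook environment from the Kaggle Models page. GPU-equipped environments are available within the free tier.</p><h3 id=\"h2ce8c90aec\">Google AI Studio / Gemini API</h3><p>API calls are also possible through <a href=\"https://ai.google.dev/gemma\" target=\"_blank\" rel=\"noopener noreferrer\">Google AI for Developers</a>. This is an option when you have no local GPU.</p><h3 id=\"h0338a89506\">Other Supported Frameworks</h3><p>Gemma 4 has also been confirmed to work with LM Studio, Gemma.cpp, LiteRT-LM, llama.cpp, MediaPipe, MLX, Transformers, PyTorch, and Keras. Quantized builds (GGUF and others) are provided by Unsloth and the community, which helps when running in VRAM-constrained environments.</p><h2 id=\"h1271628ce7\">Caveats and Limitations</h2><ul><li><strong>Output is text only:</strong> Image and audio generation are not supported. Even when the input is multimodal, the output is limited to text</li><li><strong>Audio input is limited to the small models:</strong> ASR and speech translation are available only on E2B/E4B. Audio input cannot be used with 26B-A4B/31B</li><li><strong>GPU requirements:</strong> Running 31B Dense in fp16 requires roughly 60GB or more of VRAM (31B parameters × 2 bytes is about 62GB for the weights alone). Quantization or choosing the MoE model (26B-A4B) can address this, but weigh the trade-off with accuracy</li><li><strong>Fine-tuning:</strong> 31B Dense is recommended as a base for fine-tuning. Note that fine-tuning the MoE model is structurally somewhat more difficult</li><li><strong>Safety:</strong> Although improved over Gemma 3, the issues common to LLMs (hallucination, bias, and so on) still exist. Guardrails are recommended in production environments</li><li><strong>Interpreting benchmarks:</strong> Published scores are results under specific evaluation conditions. Performance on your own tasks needs to be verified independently</li></ul><h2 id=\"ha214098e44\">Summary</h2><p>Gemma 4, released by Google DeepMind, 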
is the latest open-weight model family; its main features are <strong>Apache 2.0</strong>, <strong>long context of 128K to 256K</strong>, <strong>thinking mode</strong>, <strong>function calling</strong>, <strong>image and video understanding</strong>, and <strong>audio input on E2B/E4B</strong>. The roles are clearly divided, with the small models aimed at edge and mobile and the large models aimed at PCs and workstations, and the wide range of options from local execution to a hosted API is also appealing. Gemma 4 is a strong candidate for developers who want to seriously explore on-device AI and local inference.</p><h2 id=\"h03e7af0d39\">Reference Links</h2><ul><li><a href=\"https://blog.google/innovation-and-ai/technology/developers-tools/gemma-4/\" target=\"_blank\" rel=\"noopener noreferrer\">Gemma 4: Byte for byte, the most capable open models – Google Blog (official announcement)</a></li><li><a href=\"https://ai.google.dev/gemma/docs/core/model_card_4\" target=\"_blank\" rel=\"noopener noreferrer\">Gemma 4 model card – Google AI for Developers (official model card)</a></li><li><a href=\"https://developers.googleblog.com/bring-state-of-the-art-agentic-skills-to-the-edge-with-gemma-4/\" target=\"_blank\" rel=\"noopener noreferrer\">Bring state-of-the-art agentic skills to the edge with Gemma 4 – Google Developers Blog</a></li><li><a href=\"https://huggingface.co/blog/gemma4\" target=\"_blank\" rel=\"noopener noreferrer\">Welcome Gemma 4: Frontier multimodal intelligence on device – Hugging Face Blog</a></li><li><a href=\"https://deepmind.google/models/gemma/gemma-4/\" target=\"_blank\" rel=\"noopener noreferrer\">Gemma 4 – Google DeepMind</a></li><li><a href=\"https://huggingface.co/google/gemma-4-31B\" target=\"_blank\" rel=\"noopener noreferrer\">google/gemma-4-31B – Hugging Face</a></li></ul>","category":{"category":"AI"}}},"pageContext":{"id":"8bfa88f8-39d9-50c7-a6a8-c1e7795341f7"}},"staticQueryHashes":["3649515864","63159454"]}