<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
    <channel>
        <title>Open Access on Producthunt daily</title>
        <link>https://producthunt.programnotes.cn/en/tags/open-access/</link>
        <description>Recent content in Open Access on Producthunt daily</description>
        <generator>Hugo -- gohugo.io</generator>
        <language>en</language>
        <lastBuildDate>Tue, 08 Apr 2025 15:27:36 +0800</lastBuildDate><atom:link href="https://producthunt.programnotes.cn/en/tags/open-access/index.xml" rel="self" type="application/rss+xml" /><item>
        <title>llama-models</title>
        <link>https://producthunt.programnotes.cn/en/p/llama-models/</link>
        <pubDate>Tue, 08 Apr 2025 15:27:36 +0800</pubDate>
        
        <guid>https://producthunt.programnotes.cn/en/p/llama-models/</guid>
        <description>&lt;img src="https://images.unsplash.com/photo-1612866510806-a9a9f7abd246?ixid=M3w0NjAwMjJ8MHwxfHJhbmRvbXx8fHx8fHx8fDE3NDQwOTcyNDB8&amp;ixlib=rb-4.0.3" alt="Featured image of post llama-models" /&gt;&lt;h1 id=&#34;meta-llamallama-models&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;https://github.com/meta-llama/llama-models&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;meta-llama/llama-models&lt;/a&gt;
&lt;/h1&gt;&lt;p align=&#34;center&#34;&gt;
&lt;/p&gt;
&lt;p align=&#34;center&#34;&gt;
        🤗 &lt;a href=&#34;https://huggingface.co/meta-Llama&#34;&gt; Models on Hugging Face&lt;/a&gt;&amp;nbsp | &lt;a href=&#34;https://ai.meta.com/blog/&#34;&gt; Blog&lt;/a&gt;&amp;nbsp |  &lt;a href=&#34;https://llama.meta.com/&#34;&gt;Website&lt;/a&gt;&amp;nbsp | &lt;a href=&#34;https://llama.meta.com/get-started/&#34;&gt;Get Started&lt;/a&gt;&amp;nbsp | &lt;a href=&#34;https://github.com/meta-llama/llama-cookbook&#34;&gt;Llama Cookbook&lt;/a&gt;&amp;nbsp
&lt;br&gt;
&lt;hr&gt;
&lt;h1 id=&#34;llama-models&#34;&gt;Llama Models
&lt;/h1&gt;&lt;p&gt;Llama is an accessible, open large language model (LLM) designed for developers, researchers, and businesses to build, experiment, and responsibly scale their generative AI ideas. Part of a foundational system, it serves as a bedrock for innovation in the global community. A few key aspects:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Open access&lt;/strong&gt;: Easy accessibility to cutting-edge large language models, fostering collaboration and advancements among developers, researchers, and organizations&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Broad ecosystem&lt;/strong&gt;: Llama models have been downloaded hundreds of millions of times, there are thousands of community projects built on Llama and platform support is broad from cloud providers to startups - the world is building with Llama!&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Trust &amp;amp; safety&lt;/strong&gt;: Llama models are part of a comprehensive approach to trust and safety, releasing models and tools that are designed to enable community collaboration and encourage the standardization of the development and usage of trust and safety tools for generative AI&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Our mission is to empower individuals and industry through this opportunity while fostering an environment of discovery and ethical AI advancements. The model weights are licensed for researchers and commercial entities, upholding the principles of openness.&lt;/p&gt;
&lt;h2 id=&#34;llama-models-1&#34;&gt;Llama Models
&lt;/h2&gt;&lt;p&gt;&lt;a class=&#34;link&#34; href=&#34;https://pypi.org/project/llama-models/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/pypi/dm/llama-models&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;PyPI - Downloads&#34;
	
	
&gt;&lt;/a&gt;
&lt;a class=&#34;link&#34; href=&#34;https://discord.gg/TZAAYNVtrU&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;img src=&#34;https://img.shields.io/discord/1257833999603335178&#34;
	
	
	
	loading=&#34;lazy&#34;
	
		alt=&#34;Discord&#34;
	
	
&gt;&lt;/a&gt;&lt;/p&gt;
&lt;table&gt;
  &lt;thead&gt;
      &lt;tr&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Model&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Launch date&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Model sizes&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Context Length&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Tokenizer&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Acceptable use policy&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/th&gt;
          &lt;th style=&#34;text-align: center&#34;&gt;&lt;strong&gt;Model Card&lt;/strong&gt;&lt;/th&gt;
      &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Llama 2&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;7/18/2023&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;7B, 13B, 70B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;4K&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Sentencepiece&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama2/USE_POLICY.md&#34; &gt;Use Policy&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama2/LICENSE&#34; &gt;License&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama2/MODEL_CARD.md&#34; &gt;Model Card&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Llama 3&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;4/18/2024&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;8B, 70B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;8K&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;TikToken-based&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3/USE_POLICY.md&#34; &gt;Use Policy&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3/LICENSE&#34; &gt;License&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3/MODEL_CARD.md&#34; &gt;Model Card&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Llama 3.1&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;7/23/2024&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;8B, 70B, 405B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;128K&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;TikToken-based&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_1/USE_POLICY.md&#34; &gt;Use Policy&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_1/LICENSE&#34; &gt;License&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_1/MODEL_CARD.md&#34; &gt;Model Card&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Llama 3.2&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;9/25/2024&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;1B, 3B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;128K&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;TikToken-based&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_2/USE_POLICY.md&#34; &gt;Use Policy&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_2/LICENSE&#34; &gt;License&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_2/MODEL_CARD.md&#34; &gt;Model Card&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Llama 3.2-Vision&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;9/25/2024&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;11B, 90B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;128K&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;TikToken-based&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_2/USE_POLICY.md&#34; &gt;Use Policy&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_2/LICENSE&#34; &gt;License&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_2/MODEL_CARD_VISION.md&#34; &gt;Model Card&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Llama 3.3&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;12/04/2024&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;11B, 90B&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;128K&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;TikToken-based&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_3/USE_POLICY.md&#34; &gt;Use Policy&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_3/LICENSE&#34; &gt;License&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama3_3/MODEL_CARD.md&#34; &gt;Model Card&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
      &lt;tr&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Llama 4&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;4/5/2025&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;Scout-17B-16E, Maverick-17B-128E&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;10M, 1M&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;TikToken-based&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama4/USE_POLICY.md&#34; &gt;Use Policy&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama4/LICENSE&#34; &gt;License&lt;/a&gt;&lt;/td&gt;
          &lt;td style=&#34;text-align: center&#34;&gt;&lt;a class=&#34;link&#34; href=&#34;models/llama4/MODEL_CARD.md&#34; &gt;Model Card&lt;/a&gt;&lt;/td&gt;
      &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;
&lt;h2 id=&#34;download&#34;&gt;Download
&lt;/h2&gt;&lt;p&gt;To download the model weights and tokenizer:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Visit the &lt;a class=&#34;link&#34; href=&#34;https://llama.meta.com/llama-downloads/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Meta Llama website&lt;/a&gt;.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Read and accept the license.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Once your request is approved you will receive a signed URL via email.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Install the &lt;a class=&#34;link&#34; href=&#34;https://github.com/meta-llama/llama-stack&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Llama CLI&lt;/a&gt;: &lt;code&gt;pip install llama-stack&lt;/code&gt;. (&lt;strong&gt;&amp;lt;&amp;ndash; Start Here if you have received an email already.&lt;/strong&gt;)&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run &lt;code&gt;llama model list&lt;/code&gt; to show the latest available models and determine the model ID you wish to download. &lt;strong&gt;NOTE&lt;/strong&gt;:
If you want older versions of models, run &lt;code&gt;llama model list --show-all&lt;/code&gt; to show all the available Llama models.&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Run: &lt;code&gt;llama download --source meta --model-id CHOSEN_MODEL_ID&lt;/code&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Pass the URL provided when prompted to start the download.&lt;/p&gt;
&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Remember that the links expire after 24 hours and a certain amount of downloads. You can always re-request a link if you start seeing errors such as &lt;code&gt;403: Forbidden&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;running-the-models&#34;&gt;Running the models
&lt;/h2&gt;&lt;p&gt;In order to run the models, you will need to install dependencies after checking out the repository.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# Run this within a suitable Python environment (uv, conda, or virtualenv)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;pip install .&lt;span class=&#34;o&#34;&gt;[&lt;/span&gt;torch&lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;Example scripts are available in &lt;code&gt;models/{ llama3, llama4 }/scripts/&lt;/code&gt; sub-directory. Note that the Llama4 series of models require at least 4 GPUs to run inference at full (bf16) precision.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;8
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;cp&#34;&gt;#!/bin/bash
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;NGPUS&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;4&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;CHECKPOINT_DIR&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;~/.llama/checkpoints/Llama-4-Scout-17B-16E-Instruct
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;PYTHONPATH&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;$(&lt;/span&gt;git rev-parse --show-toplevel&lt;span class=&#34;k&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  torchrun --nproc_per_node&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;$NGPUS&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -m models.llama4.scripts.chat_completion &lt;span class=&#34;nv&#34;&gt;$CHECKPOINT_DIR&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --world_size &lt;span class=&#34;nv&#34;&gt;$NGPUS&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;The above script should be used with an Instruct (Chat) model. For a Base model, update the &lt;code&gt;CHECKPOINT_DIR&lt;/code&gt; path and use the script &lt;code&gt;models.llama4.scripts.completion&lt;/code&gt;.&lt;/p&gt;
&lt;h2 id=&#34;running-inference-with-fp8-and-int4-quantization&#34;&gt;Running inference with FP8 and Int4 Quantization
&lt;/h2&gt;&lt;p&gt;You can reduce the memory footprint of the models at the cost of minimal loss in accuracy by running inference with FP8 or Int4 quantization. Use the &lt;code&gt;--quantization-mode&lt;/code&gt; flag to specify the quantization mode. There are two modes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;fp8_mixed&lt;/code&gt;: Mixed precision inference with FP8 for some weights and bfloat16 for activations.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;int4_mixed&lt;/code&gt;: Mixed precision inference with Int4 for some weights and bfloat16 for activations.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Using FP8, running Llama-4-Scout-17B-16E-Instruct requires 2 GPUs with 80GB of memory. Using Int4, you need a single GPU with 80GB of memory.&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;MODE&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;fp8_mixed  &lt;span class=&#34;c1&#34;&gt;# or int4_mixed&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;if&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;[&lt;/span&gt; &lt;span class=&#34;nv&#34;&gt;$MODE&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;==&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;fp8_mixed&amp;#34;&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;]&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;;&lt;/span&gt; &lt;span class=&#34;k&#34;&gt;then&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nv&#34;&gt;NGPUS&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;2&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;else&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  &lt;span class=&#34;nv&#34;&gt;NGPUS&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;k&#34;&gt;fi&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;CHECKPOINT_DIR&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;~/.llama/checkpoints/Llama-4-Scout-17B-16E-Instruct
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nv&#34;&gt;PYTHONPATH&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;k&#34;&gt;$(&lt;/span&gt;git rev-parse --show-toplevel&lt;span class=&#34;k&#34;&gt;)&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  torchrun --nproc_per_node&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;nv&#34;&gt;$NGPUS&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  -m models.llama4.scripts.chat_completion &lt;span class=&#34;nv&#34;&gt;$CHECKPOINT_DIR&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --world_size &lt;span class=&#34;nv&#34;&gt;$NGPUS&lt;/span&gt; &lt;span class=&#34;se&#34;&gt;\
&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;  --quantization-mode &lt;span class=&#34;nv&#34;&gt;$MODE&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;p&gt;For more flexibility in running inference (including using other providers), please see the &lt;a class=&#34;link&#34; href=&#34;https://github.com/meta-llama/llama-stack&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;&lt;code&gt;Llama Stack&lt;/code&gt;&lt;/a&gt; toolset.&lt;/p&gt;
&lt;h2 id=&#34;access-to-hugging-face&#34;&gt;Access to Hugging Face
&lt;/h2&gt;&lt;p&gt;We also provide downloads on &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/meta-llama&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Hugging Face&lt;/a&gt;, in both transformers and native &lt;code&gt;llama4&lt;/code&gt; formats. To download the weights from Hugging Face, please follow these steps:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Visit one of the repos, for example &lt;a class=&#34;link&#34; href=&#34;https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;meta-llama/Llama-4-Scout-17B-16E&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Read and accept the license. Once your request is approved, you&amp;rsquo;ll be granted access to all Llama 3.1 models as well as previous versions. Note that requests used to take up to one hour to get processed.&lt;/li&gt;
&lt;li&gt;To download the original native weights to use with this repo, click on the &amp;ldquo;Files and versions&amp;rdquo; tab and download the contents of the &lt;code&gt;original&lt;/code&gt; folder. You can also download them from the command line if you &lt;code&gt;pip install huggingface-hub&lt;/code&gt;:&lt;/li&gt;
&lt;/ul&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;huggingface-cli download meta-llama/Llama-4-Scout-17B-16E-Instruct-Original --local-dir meta-llama/Llama-4-Scout-17B-16E-Instruct-Original
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;To use with transformers, the following snippet will download and cache the weights:&lt;/p&gt;
&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt; 1
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 2
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 3
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 4
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 5
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 6
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 7
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 8
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt; 9
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;10
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;11
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;12
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;13
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;14
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;15
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;16
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;17
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;18
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;19
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;20
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;21
&lt;/span&gt;&lt;span class=&#34;lnt&#34;&gt;22
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-python&#34; data-lang=&#34;python&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;c1&#34;&gt;# inference.py&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;from&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;transformers&lt;/span&gt; &lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Llama4ForConditionalGeneration&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;kn&#34;&gt;import&lt;/span&gt; &lt;span class=&#34;nn&#34;&gt;torch&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model_id&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;meta-llama/Llama-4-Scout-17B-16E-Instruct&amp;#34;&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;AutoTokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;model_id&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;messages&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;p&#34;&gt;{&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;role&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;user&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;content&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;:&lt;/span&gt; &lt;span class=&#34;s2&#34;&gt;&amp;#34;Who are you?&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;},&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;inputs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;apply_chat_template&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;messages&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;add_generation_prompt&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;return_tensors&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;pt&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;return_dict&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;kc&#34;&gt;True&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;Llama4ForConditionalGeneration&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;from_pretrained&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;    &lt;span class=&#34;n&#34;&gt;model_id&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;device_map&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;auto&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;torch_dtype&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;torch&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;bfloat16&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;outputs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;generate&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;**&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;inputs&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;to&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;model&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;device&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;),&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;max_new_tokens&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;100&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;)&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;n&#34;&gt;outputs&lt;/span&gt; &lt;span class=&#34;o&#34;&gt;=&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;tokenizer&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;batch_decode&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;outputs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[:,&lt;/span&gt; &lt;span class=&#34;n&#34;&gt;inputs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;s2&#34;&gt;&amp;#34;input_ids&amp;#34;&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;.&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;shape&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;o&#34;&gt;-&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;1&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;]&lt;/span&gt; &lt;span class=&#34;p&#34;&gt;:])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt;&lt;span class=&#34;nb&#34;&gt;print&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;(&lt;/span&gt;&lt;span class=&#34;n&#34;&gt;outputs&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;[&lt;/span&gt;&lt;span class=&#34;mi&#34;&gt;0&lt;/span&gt;&lt;span class=&#34;p&#34;&gt;])&lt;/span&gt;
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;div class=&#34;highlight&#34;&gt;&lt;div class=&#34;chroma&#34;&gt;
&lt;table class=&#34;lntable&#34;&gt;&lt;tr&gt;&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code&gt;&lt;span class=&#34;lnt&#34;&gt;1
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;
&lt;td class=&#34;lntd&#34;&gt;
&lt;pre tabindex=&#34;0&#34; class=&#34;chroma&#34;&gt;&lt;code class=&#34;language-bash&#34; data-lang=&#34;bash&#34;&gt;&lt;span class=&#34;line&#34;&gt;&lt;span class=&#34;cl&#34;&gt; torchrun --nnodes&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;1&lt;/span&gt; --nproc_per_node&lt;span class=&#34;o&#34;&gt;=&lt;/span&gt;&lt;span class=&#34;m&#34;&gt;8&lt;/span&gt; inference.py
&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/table&gt;
&lt;/div&gt;
&lt;/div&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;installations&#34;&gt;Installations
&lt;/h2&gt;&lt;p&gt;You can install this repository as a &lt;a class=&#34;link&#34; href=&#34;https://pypi.org/project/llama-models/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;package&lt;/a&gt; by just doing &lt;code&gt;pip install llama-models&lt;/code&gt;&lt;/p&gt;
&lt;h2 id=&#34;responsible-use&#34;&gt;Responsible Use
&lt;/h2&gt;&lt;p&gt;Llama models are a new technology that carries potential risks with use. Testing conducted to date has not — and could not — cover all scenarios.
To help developers address these risks, we have created the &lt;a class=&#34;link&#34; href=&#34;https://ai.meta.com/static-resource/responsible-use-guide/&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;Responsible Use Guide&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&#34;issues&#34;&gt;Issues
&lt;/h2&gt;&lt;p&gt;Please report any software “bug” or other problems with the models through one of the following means:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reporting issues with the model: &lt;a class=&#34;link&#34; href=&#34;https://github.com/meta-llama/llama-models/issues&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;https://github.com/meta-llama/llama-models/issues&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Reporting risky content generated by the model: &lt;a class=&#34;link&#34; href=&#34;http://developers.facebook.com/llama_output_feedback&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;developers.facebook.com/llama_output_feedback&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Reporting bugs and security concerns: &lt;a class=&#34;link&#34; href=&#34;http://facebook.com/whitehat/info&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;facebook.com/whitehat/info&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&#34;questions&#34;&gt;Questions
&lt;/h2&gt;&lt;p&gt;For common questions, the FAQ can be found &lt;a class=&#34;link&#34; href=&#34;https://llama.meta.com/faq&#34;  target=&#34;_blank&#34; rel=&#34;noopener&#34;
    &gt;here&lt;/a&gt;, which will be updated over time as new questions arise.&lt;/p&gt;
</description>
        </item>
        
    </channel>
</rss>
