With the release of GPT from OpenAI, many companies entered the race to create robust Generative Large Language Models of their own. Creating a Generative AI from scratch can involve a pretty cumbersome process, as it requires conducting thorough research in the field of Generative AI and performing numerous trials and errors. It also entails carefully curating a high-quality dataset, as the effectiveness of Large Language Models heavily depends on the data they are trained on. And lastly, it requires enormous computation power to train these models, which many companies cannot access. So as of now, only a few companies can create these LLMs, including OpenAI and Google, and now finally, Meta has joined this race with the introduction of LlaMA.
This article was published as a part of the Data Science Blogathon.
LlaMA (Large Language Model Meta AI) is a Generative AI model, specifically a group of foundational Large Language Models developed by Meta AI, a company owned by Meta(Formerly Facebook). Meta announced Llama in Feb of 2023. Meta released Llama in different sizes(based on parameters), i.e., 7,13,33, and 65 billion parameters with a context length of 2k tokens. The model is with the intent to help researchers advance their knowledge in the field of AI. The small 7B models allow researchers with low computation power to study these models.
With the introduction of LlaMa, Meta has entered the LLM space and is now competing with OpenAI’s GPT and Google’s PaLM models. Meta believes that retraining or fine-tuning small models with limited computation resources can achieve results on par with state-of-the-art models in their respective fields. Meta AI’s LlaMa differs from OpenAI and Google’s LLM because the LlaMA model family is completely Open Source and free for anyone to use, and it even released the LlaMA weights for researchers for non-commercial uses.
LlaMA 2 surpasses the previous version, LlaMA version 1, which Meta released in July of 2023. It came out in three sizes: 7B, 13B, and 70B parameter models. Upon its release, LlaMA 2 achieved the highest score on Hugging Face. Even across all segments (7B, 13B, and 70B), the top-performing model on Hugging Face originates from LlaMA 2, having been fine-tuned or retrained.
Llama 2 was trained on 2 Trillion Pretraining Tokens. The context length for all the Llama 2 models is 4k(2x the context length of Llama 1). Llama 2 outperformed state-of-the-art open-source models such as Falcon and MPT in various benchmarks, including MMLU, TriviaQA, Natural Question, HumanEval, and others (You can find the comprehensive benchmark scores on Meta AI’s website). Furthermore, Llama 2 underwent fine-tuning for chat-related use cases, involving training with over 1 million human annotations. These chat models are readily available to use on the Hugging Face website.
The source code for Llama 2 is available on GitHub. If you want to work with the original weights, these are also available, but for this, you need to provide your name and email to the Meta AIs website. So go to the Meta AI by clicking here, then enter your name, email address, and organization(student if you are not working). Then scroll down and click on accept and continue. Now you will get a mail stating that you can download the model weights. The form will look like the one below.
Now there are two ways to work with your model. One is to directly download the model through the instructions and link provided in the email(the hard way, and only good if you have a decent GPU), and the other is to use Hugging Face and Google Colab. In this article, I will go through the easy way, which anyone can try. Before going to Google Colab, we need to set up a Hugging Face account and create an Inference API. Then we need to go to the llama 2 model in Hugging Face(which you can do by clicking here), and then provide the email you provided to the Meta AI website. Then you will be authenticated and will be shown something similar to the below.
Now, we can download any Llama 2 model through Hugging Face and start working with it.
In the last section, we have seen the prerequisites before testing the Llama 2 model. We will start with importing necessary libraries in the Google Colab, which we can do with the pip command.
!pip install -q transformers einops accelerate langchain bitsandbytes
We need to install these necessary packages to start working with Llama 2. Also, the transformers library from hugging face to download the model. The einops function performs easy matrix multiplications within the model(it uses Einstein Operations/Summation notation), accelerates bits and bytes to speedup the inference, and langchain integrates our llama.
Next, to login into the Hugging Face through colab through the Hugging Face API Key, we can download the llama model; for this, we do the following.
!huggingface-cli login
Now we provide the Hugging Face Inference API key we created earlier. Then if it prompts Add token as git credential? (Y/n), Then you can reply with n. Now we are logged into Hugging Face API Key and are ready to download the model.
Now to download our model, we will write the following.
from langchain import HuggingFacePipeline
from transformers import AutoTokenizer
import transformers
import torch
model = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
torch_dtype=torch.bfloat16,
trust_remote_code=True,
device_map="auto",
max_length=1000,
eos_token_id=tokenizer.eos_token_id
)
llm = HuggingFacePipeline(pipeline = pipeline, model_kwargs = {'temperature':0})
Here we set the model’s temperature and pass the pipeline we created to the pipeline variable. This HuggingFacePipeline will now allow us to use the model that we have downloaded.
We shall create a Prompt Template for our model and then test it.
from langchain import PromptTemplate, LLMChain
template = """
You are an intelligent chatbot that gives out useful information to humans.
You return the responses in sentences with arrows at the start of each sentence
{query}
"""
prompt = PromptTemplate(template=template, input_variables=["query"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
print(llm_chain.run('What are the 3 causes of glacier meltdowns?'))
So we asked the model to list the three possible causes of glacier meltdowns, and the model returned the following:
We see that the model has done exceptionally well. The best part is that it used emoji numbering to represent the points and has exactly returned 3 points to the output. It even used the water tide emoji to represent the glaciers. This way, you can start working with the Llama 2 from Hugging Face and Colab.
In this article, we have briefly examined the LlaMA(Large Language Model Meta AI)models created and released by Meta AI. We have learned about the different model sizes of its and seen how version 2, i.e., Llama 2, clearly defeats the state-of-the-art Open Source LLMs at different benchmarks. Finally, we have gone through the process of getting access to the Llama 2 model trained weights. Finally, we walked through the Llama-2 7B chat version in the Google Colab through the Hugging Face and LangChain libraries.
Some of the key takeaways from this article include:
A. LlaMA is a group of foundational LLMs developed by Meta AI, owned by Meta(Formerly Facebook); this was announced to the public in February 2023.
A. Llama 2 comes in 3 different sizes, they are 7B, 13B, and the 70B parameter model. All three of them work exceptionally well and can be fine-tuned easily.
A. Yeah. It is possible to run the 7B model of Llama 2 on the local machine, which requires you to have at least 10GB of GPU VRAM for the model to work properly. Though quantized versions of Llama 2 7B are available, they require even less VRAM, and some can run only with the CPU.
A. Meta AI has announced that Llama and Llama 2 will be open-sourced. They even provide the model weights if requested through a form on their website. Within hours after releasing Llama 2, many alternative Llama 2 models have sprung up in the Hugging Face.
A. With Llama, we can create applications like conversation chatbots, sentiment classification systems, summarization tools, and many more. In the future, developers will create even smaller versions that can work to develop Generative AI-enabled mobile applications.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Lorem ipsum dolor sit amet, consectetur adipiscing elit,