Just a few days ago, Meta AI released the new Llama 3.1 family of models. A day after that release, Mistral AI announced its largest model so far, called Mistral Large 2. The model is trained on a large corpus of data and is expected to perform on par with current SOTA models like GPT-4o and Claude 3 Opus, sitting just below the open-source Meta Llama 3.1 405B. Like the Meta models, Large 2 is said to excel at multilingual tasks. In this article, we will go through the Mistral Large 2 model and check how well it performs in different areas.
This article was published as a part of the Data Science Blogathon.
Mistral AI has recently announced the release of its newest and largest model, named Mistral Large 2, just after Meta AI released the Llama 3.1 family of models. Mistral Large 2 is a 123-billion-parameter model with 96 attention heads, and it has a context length of 128k tokens, matching the Llama 3.1 family.
Similar to the Llama 3.1 family, Mistral Large 2 was trained on diverse data covering many languages, including Hindi, French, Korean, Portuguese, and more, though its multilingual performance falls just short of Llama 3.1 405B. The model was also trained on over 80 programming languages, with a focus on Python, C++, JavaScript, C, and Java. The team states that Large 2 is exceptional at following instructions and keeping track of long conversations.
The major difference between the Llama 3.1 family and the Mistral Large 2 release is their respective licenses. While Llama 3.1 is released for both commercial and research purposes, Mistral Large 2 is released under the Mistral Research License, which allows developers to study and experiment with the model but not use it to build commercial applications. The team states that developers can work with Mistral Large 2 to create strong Agentic systems, leveraging its exceptional JSON generation and tool-calling skills.
Mistral Large 2 gets great results on the HuggingFace Open LLM benchmarks. In coding, it outperforms the recently released Codestral and Codestral Mamba, and its performance comes close to leading models like GPT-4o, Claude 3 Opus, and Llama 3.1 405B.
The graph above depicts reasoning benchmarks for different models. We can see that Large 2 performs well at reasoning, falling just short of OpenAI's GPT-4o. Compared to the previously released Mistral Large, Mistral Large 2 improves on its predecessor by a wide margin.
This graph shows the scores achieved by different SOTA models on the Multilingual MMLU benchmark. We can see that Mistral Large 2 comes very close to Llama 3.1 405B despite being roughly three times smaller, and it beats the other models in all of the languages shown.
In this section, we will get an API key from the Mistral website, which will let us access the newly released Mistral Large 2 model. First, we need to sign up on the Mistral AI portal and verify our mobile number; then we can open the API Keys page of the console to create an API key.
On the API Keys page, we can create a new API key by clicking on the Create new key button. So, we will create a key and store it somewhere safe.
Now, we will start by installing the following library:
!pip install -q mistralai
This installs the mistralai library, which is maintained by Mistral AI and lets us access all of the Mistral AI models through the API key we created.
Next, we will store our key in an environment variable with the below code:
import os
os.environ["MISTRAL_API_KEY"] = "YOUR_API_KEY"
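If you prefer not to hardcode the key in a notebook, a small optional alternative is to prompt for it at runtime with Python's getpass module; a minimal sketch:

# Optional: read the key interactively instead of hardcoding it
import os
from getpass import getpass

os.environ["MISTRAL_API_KEY"] = getpass("Enter your Mistral API key: ")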
Now, we will begin the coding part to test the new model.
from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

# Build the conversation as a list of ChatMessage objects
message = [ChatMessage(role="user", content="What is a Large Language Model?")]

# Create the client with the API key we stored earlier
client = MistralClient(api_key=os.environ["MISTRAL_API_KEY"])

# Send the chat request to the Mistral Large 2 model
response = client.chat(
    model="mistral-large-2407",
    messages=message
)

# Print only the text of the model's reply
print(response.choices[0].message.content)
Running this has produced the output below:
The Large Language Model generates a well-structured and straight-to-the-point response.
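For longer answers, the client can also stream the response token by token instead of waiting for the full text. Below is a minimal sketch, assuming the same v0.x mistralai client used above, which exposes a chat_stream() method:

# Stream the reply as it is generated rather than waiting for the full text
for chunk in client.chat_stream(
    model="mistral-large-2407",
    messages=[ChatMessage(role="user", content="What is a Large Language Model?")],
):
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")

We have seen that Mistral Large 2 is said to perform well at coding tasks, so let us now test the model by asking it a coding-related question.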
response = client.chat(
    model="mistral-large-2407",
    messages=[ChatMessage(role="user", content="Create a good looking profile card in css and html")]
)

print(response.choices[0].message.content)
Here, we have asked the model to generate code for a good-looking profile card in CSS and HTML. We can check the response generated above. Mistral Large 2 generated the HTML code, followed by the CSS, and finally explained how the code works. It even tells us to replace profile-pic.png so that our own photo appears on the card. Now let us test this code in an online web editor.
The results can be seen below:
Now this is a good-looking profile card. The styling is impressive, with a rounded photo and a well-chosen color scheme. The code even includes hyperlinks for Twitter, LinkedIn, and GitHub, which you can point at the respective profile URLs. Overall, Mistral Large 2 serves as an excellent coding assistant for developers who are just getting started.
The Mistral AI team has announced that Mistral Large 2 is one of the best choices for creating Agentic workflows, where a task requires multiple Agents and the Agents require multiple tools to solve it. For this, Mistral Large 2 has to be good at two things: generating structured responses in JSON format, and reliably calling the different tools it is given.
Let us test the model by asking it to generate a response in JSON format.
For this, the code will be:
messages = [
    ChatMessage(role="user", content="""Who are the best F1 drivers and which team do they belong to?
Return the names and the teams in a short JSON object.""")
]
response = client.chat(
    model="mistral-large-2407",
    response_format={"type": "json_object"},
    messages=messages,
)
print(response.choices[0].message.content)
Here, the process for generating a JSON response is very similar to a regular chat completion: we just send a message asking the model to list some of the best F1 drivers along with the teams they drive for. The only difference is that, inside the chat() function, we pass a response_format parameter containing a dictionary that states we need a JSON object.
Running the code and checking the results above, we can see that the model has indeed generated a JSON response.
We can validate the JSON response with the code below:

import json

try:
    # json.loads() raises an exception if the string is not valid JSON
    json.loads(response.choices[0].message.content)
    print("Valid JSON")
except Exception as e:
    print("Failed")
Running this has printed Valid JSON to the terminal. So Mistral Large 2 is capable of generating valid JSON.
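Once validated, the response string can be parsed into a regular Python object and used like any other data. Below is a short sketch; note that the exact keys (here assumed to be "drivers", "name", and "team") depend on how the model structured its answer and may differ between runs:

import json

# Parse the JSON string into a Python dictionary
data = json.loads(response.choices[0].message.content)

# The key names below are assumptions about the model's output
for driver in data.get("drivers", []):
    print(driver.get("name"), "-", driver.get("team"))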
Let us test the function-calling abilities of this model as well. For this:
def add(a: int, b: int) -> int:
    return a + b

tools = [
    {
        "type": "function",
        "function": {
            "name": "add",
            "description": "Adds two numbers",
            "parameters": {
                "type": "object",
                "properties": {
                    "a": {
                        "type": "integer",
                        "description": "An integer number",
                    },
                    "b": {
                        "type": "integer",
                        "description": "An integer number",
                    },
                },
                "required": ["a", "b"],
            },
        },
    }
]

name_to_function = {
    "add": add
}
Now, we will give this function to the model and test it.
# tool_choice="auto" lets the model decide whether to call a tool
response = client.chat(
    model="mistral-large-2407",
    messages=[ChatMessage(role="user", content="I have 19237 apples and 21374 oranges. How many fruits I have in total?")],
    tools=tools,
    tool_choice="auto"
)

from rich import print as rprint

# Inspect the tool call the model decided to make
rprint(response.choices[0].message.tool_calls[0])
rprint("Function Name:", response.choices[0].message.tool_calls[0].function.name)
rprint("Function Args:", response.choices[0].message.tool_calls[0].function.arguments)
We can take a look at the output above. The model has indeed made a tool call to the add function, providing the arguments a and b along with their values. Note that although the function arguments look like a dictionary, they arrive as a string, so we convert them to a dictionary with the json.loads() method before handing them to the function. We then access the function from the name_to_function dictionary, pass it the parameters it takes, and print the output it generates (see the sketch below). With this example, we have taken a look at the tool-calling abilities of Mistral Large 2.
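Here is a minimal sketch of that dispatch step, assuming the response object from the tool-calling request above: json.loads() converts the argument string into a dictionary, and name_to_function maps the reported function name back to the actual Python function.

import json

tool_call = response.choices[0].message.tool_calls[0]

# The arguments arrive as a JSON string, so convert them to a dict first
function_args = json.loads(tool_call.function.arguments)

# Look up the Python function by its reported name and call it
function_result = name_to_function[tool_call.function.name](**function_args)
print("Result:", function_result)  # expected: 19237 + 21374 = 40611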
Mistral Large 2, the latest model from Mistral AI, boasts an impressive 123 billion parameters and demonstrates exceptional instruction-following and long-conversation capabilities. While it is much smaller than Llama 3.1 405B, it outperforms comparable models on coding tasks and shows remarkable performance on reasoning and multilingual benchmarks. Its ability to generate structured responses and call tools makes it an excellent choice for creating Agentic workflows.
Q1. Can Mistral Large 2 be used for commercial applications?
A. No, Mistral Large 2 is released under the Mistral Research License, which restricts commercial use.
Q2. Can Mistral Large 2 generate structured responses?
A. Yes, Mistral Large 2 can generate structured responses in JSON format, making it suitable for Agentic workflows.
Q3. Can Mistral Large 2 call external tools?
A. Yes, Mistral Large 2 can call external tools and functions. It is good at understanding the functions given to it and selects the best one based on the user's request.
Q4. How can I access Mistral Large 2?
A. Currently, anyone can sign up on the Mistral AI website and create a free trial API key, valid for a few days, with which we can interact with the model through the mistralai library.
Q5. Is Mistral Large 2 available on cloud platforms?
A. Mistral Large 2 is available on popular cloud providers like Vertex AI from GCP, Azure AI Studio from Azure, Amazon Bedrock, and IBM watsonx.ai.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.