Microsoft Phi 3 Mini: The Tiny Model That Runs on Your Phone

NISHANT TIWARI 21 May, 2024

6 min read

Introduction

In the field of artificial intelligence (AI), there’s always been a belief that bigger is better. But Microsoft has just shaken things up with their latest creation, Phi-3-mini. It’s a small AI model that’s turning heads by showing that size isn’t everything. Despite being much smaller than its counterparts, Phi-3-mini can hold its own when it comes to understanding language and making sense of things. This challenges the idea that only large language models (LLMs) can do the heavy lifting in AI. This article delves into what this new model is all about and how it is redefining AI innovation. So MicroSoft launched its miny model called Phi 3. And In this Article you will get to know about about phi 3 and Microsoft phi 3 model

Microsoft Phi 3 Mini: The Tiny Model That Runs on Your Phone

Understanding the Phi-3-mini

Phi-3-mini is a recent advancement in small language models (SLMs) developed by Microsoft. Here’s a breakdown of its key features:

Size and Capability: Phi-3-mini is a lightweight model with only 3.8 billion parameters. Despite its compact size, it offers performance comparable to much larger models on various tasks, including language understanding, reasoning, coding, and math.
Training Data: The secret behind Phi-3-mini’s capability lies in its unique training data. It leverages a combination of synthetic data and filtered, high-quality data from publicly available websites. This focus on quality and reasoning-dense properties equips the model to handle complex problems.
Fine-tuning for Safety and Usefulness: Phi-3-mini goes beyond just training on data. It undergoes additional processes like supervised fine-tuning and direct preference optimization. These techniques ensure the model adheres to human instructions and prioritizes safety in its responses.
Technical details: Phi-3-mini is built on a transformer architecture, a common design for large language models. It’s a decoder-only model, meaning it specializes in generating text based on the input it receives. The model is particularly well-suited for prompts and instructions delivered in a chat format.
Availability: If you’re interested in trying out Phi-3-mini, you can access it through various platforms like Microsoft Azure AI Studio, Hugging Face, and Ollama.pen_spark

Phi-3 compared to other language models

Here’s how Phi-3 stacks up against other language models:

Size Advantage:

Phi-3 is a Small Language Model (SLM), meaning it has far fewer parameters (billions) compared to Large Language Models (LLMs) (trillions). This makes Phi-3:
- More resource-efficient: Requires less power to run, ideal for devices like smartphones.
- Faster: Able to process information and respond quicker.

Performance:

Despite its size, Phi-3 performs very well on benchmarks that assess:
- Language processing
- Coding
- Mathematical reasoning
It can even outperform similar-sized models and even some larger LLMs in these tasks.

Training Techniques:

Phi-3 leverages a couple of key strategies for its success:
- High-quality data: Trained on a carefully curated dataset that includes filtered public documents, educational materials, and even synthetic data (generated by other LLMs).
- Knowledge distillation: Learns from the knowledge of larger models in a compressed way.

Variants and Availability:

Phi-3 comes in different sizes (e.g., Phi-3 mini) with varying capabilities.
It’s an open-source model, freely available for developers to use and experiment with.

Why Big isn’t Always Better in AI?

Recently, there has been a significant focus on scaling up LLMs, believing that bigger models lead to better performance. However, the phi-3-mini model, despite achieving a similar level of language understanding and reasoning ability as much larger models, is still fundamentally limited by its size for certain tasks. The model cannot store extensive “factual knowledge,” resulting in lower performance on tasks such as TriviaQA. This limitation has prompted the exploration of augmentation with a search engine to address the model’s weakness. Additionally, the model’s language capabilities are mostly restricted to English, highlighting the need to explore multilingual capabilities for Small Language Models (SLMs) as an important next step.

Phi-3: A Family of Powerful Small Language Models (SLMs)

Microsoft’s Phi-3-mini is part of a family of powerful SLMs developed to challenge the assumption that bigger is always better. These SLMs have been designed to achieve high performance with a significantly smaller number of parameters compared to larger models. The phi-3-mini model, with 3.8 billion parameters, has been trained on 3.3 trillion tokens.

Despite the small size, it demonstrates performance that rivals much larger models, such as Mixtral 8x7B and GPT-3.5. The innovation lies in the dataset used for training, which is a scaled-up version of the one used for phi-2. This dataset consists of heavily filtered web data and synthetic data. This approach has enabled the development of powerful SLMs that can be deployed on devices with limited computational resources.

Phi-3: A Family of Powerful Small Language Models (SLMs)

Inside Phi-3

Phi-3 refers to a series of language models developed by Microsoft, with Phi-3-mini being a notable addition. Phi-3-mini is a 3.8 billion parameter language model trained on 3.3 trillion tokens, designed to be as powerful as larger models while being small enough to be deployed on a phone. Despite its compact size, Phi-3-mini boasts impressive performance, rivaling that of larger models such as Mixtral 8x7B and GPT-3.5. It achieves 69% on MMLU and 8.38 on MT-bench, showcasing its prowess in language understanding and reasoning.

Furthermore, Phi-3-mini can be quantized to 4 bits, occupying approximately 1.8GB of memory, making it suitable for deployment on mobile devices. The model’s training data, a scaled-up version of the one used for Phi-2, is composed of heavily filtered web data and synthetic data, contributing to its remarkable capabilities.

Microsoft Phi 3 Mini vs other language models

The Secret Sauce of Phi-3’s Success

The success of Phi-3 can be attributed to its training methodology, which utilizes high-quality training data to improve the performance of SLMs. The training data consists of heavily filtered web data and synthetic data, following the sequence of works initiated in “Textbooks Are All You Need.” This method allows Phi-3-mini to reach the level of highly capable models such as GPT-3.5 with only 3.8B parameters. This showcases the effectiveness of the training approach. Additionally, the model is chat-finetuned, aligning it for robustness, safety, and chat format, further contributing to its success.

Where Phi-3 Shines and What It Still Learns

Phi-3-mini exhibits strengths in its compact size, impressive performance, and the ability to be deployed on mobile devices. Its training with high-quality data and chat-finetuning contribute to its success. This allows it to rival larger models in language understanding and reasoning.

However, the model is fundamentally limited by its size for certain tasks. It cannot store extensive “factual knowledge,” leading to lower performance on tasks such as TriviaQA. Nevertheless, efforts to resolve this weakness are underway, including augmentation with a search engine and exploring multilingual capabilities for Small Language Models.

Safety First with Phi-3

Phi-3-mini was developed with a strong emphasis on safety and responsible AI principles, in alignment with Microsoft’s guidelines. The approach to ensuring safety involved various measures such as safety alignment in post-training, red-teaming, and automated testing. It also involved evaluations across multiple categories of responsible AI (RAI) harm. The model’s training data was carefully curated and modified to address RAI harm categories, leveraging both existing datasets and in-house generated ones.

An independent red team at Microsoft played a crucial role in identifying areas of improvement during the post-training process. This led to the refinement of the dataset and a significant decrease in harmful response rates. The post-training process itself consisted of supervised finetuning (SFT) and direct preference optimization (DPO), which utilized high-quality data across diverse domains to steer the model away from unwanted behavior.

Despite the diligent RAI efforts, challenges around factual inaccuracies, biases, inappropriate content generation, and safety issues remain, as is the case with most LLMs. However, the use of carefully curated training data and targeted post-training, along with insights from red-teaming, has significantly mitigated these issues.

Conclusion

The Phi-3 model, including phi-3-mini, phi-3-small, and phi-3-medium, has been extensively evaluated and compared with other available language models. The results of the benchmarks demonstrate the model’s impressive reasoning ability, language understanding, and performance in multi-turn conversations. The model’s capacity to handle long context tasks, while maintaining high quality, has been highlighted.

Additionally, the post-training process, including the development of a long context version of phi-3-mini, has further enhanced the model’s capabilities. Going forward, the model’s main advancements would be its multilingual capabilities and the use of a search engine to improve factual knowledge. Overall, the Phi-3 model has shown promising results and potential for further development and application.