Microsoft has recently introduced Phi-2, a small language model (SLM) that showcases impressive capabilities in the field of generative AI. Despite the “small” label, Phi-2 boasts 2.7 billion parameters, more than double its predecessor Phi-1.5, which had 1.3 billion. Microsoft reports “state-of-the-art performance” among language models with fewer than 13 billion parameters, with Phi-2 matching or outperforming models up to 25 times its size on complex benchmarks.
Phi-2 has outperformed models such as Meta’s Llama-2, Mistral, and Google’s Gemini Nano 2 on various benchmarks. Microsoft’s goal with the Phi series is to develop an SLM whose capabilities and performance rival those of much larger models. The company aims to explore whether the emergent abilities seen in large models can be achieved at a smaller scale through strategic training choices and data selection.
To train Phi-2, Microsoft carefully curated high-quality data, including “textbook-quality” information, and supplemented it with web data filtered for educational value and content quality. This meticulous approach to data selection is a key contributor to Phi-2’s exceptional performance.
So, why is Microsoft focused on SLMs? One reason is that SLMs offer a cost-effective alternative to larger language models. They are particularly useful for tasks that do not require the extensive power of an LLM. Additionally, SLMs have lower computational power requirements compared to LLMs, meaning users do not need expensive GPUs to handle their data-processing needs.
In conclusion, Microsoft’s Phi-2 demonstrates the potential of small language models in the field of generative AI. By delivering strong benchmark performance from a comparatively modest parameter count, Phi-2 challenges the notion that only large language models can excel in this domain. Through strategic selection and augmentation of training data, Microsoft aims to push the boundaries of what SLMs can achieve.