Generative AI is a rapidly growing technology that powers chat tools such as OpenAI’s ChatGPT and Google Bard, as well as image generation systems like Stable Diffusion and DALL-E. Today, however, these tools depend on cloud-based data centers with numerous GPUs to perform the computing needed for each query. The future possibility of running generative AI tasks directly on mobile devices, connected cars, and smart speakers like Amazon Echo, Google Home, or Apple HomePod is being explored.
Taiwan-based semiconductor company MediaTek believes that future is closer than we realize. The company has announced a collaboration with Meta to run Meta’s Llama 2 LLM directly on devices, using MediaTek’s latest-generation APUs and its NeuroPilot software development platform, without relying on external processing. This does not eliminate the need for data centers entirely, however: because of the size of LLM datasets and the storage performance they demand, a smaller data center is still necessary.
For example, Llama 2’s “small” dataset contains 7 billion parameters, roughly 13GB, which is suitable for basic generative AI functions. Larger LLMs with 70 billion parameters or more, however, require far more storage than is practical in a smartphone. Over the next few years, LLMs in development are expected to be 10 to 100 times larger than Llama 2 or GPT-4, with storage requirements in the hundreds of gigabytes and beyond. While a smartphone cannot store such a dataset or deliver sufficient IOPS against it, a cache appliance with fast flash storage and terabytes of RAM can.
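The storage figures above follow from simple arithmetic. Here is a minimal sketch assuming 16-bit (2-byte) weights; quantized formats would shrink these numbers further, and the 700-billion-parameter model is purely hypothetical:

```python
def model_size_gib(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate on-disk size of a model's weights in GiB (16-bit by default)."""
    return num_params * bytes_per_param / 2**30

# Llama 2's 7B model at 16-bit precision lands near the ~13GB cited above.
for name, params in [("7B", 7e9), ("70B", 70e9), ("700B (hypothetical)", 700e9)]:
    print(f"{name}: ~{model_size_gib(params):.0f} GiB")
```

At this precision the 7B model is about 13 GiB, the 70B model about 130 GiB, and a hypothetical 700B model well over a terabyte, which is why the article points to cache appliances rather than phone storage.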
MediaTek anticipates that Llama 2-based AI applications will become available for smartphones powered by its next-generation flagship SoC, which is set to be released by the end of the year. For devices to access these datasets, mobile carriers would need low-latency edge networks: small data centers or equipment closets with fast connections to 5G towers. Sitting directly on the carrier’s network, these data centers would let LLMs running on smartphones reach parameter data without going through multiple network “hops.”
On-device generative AI offers several benefits. First, it reduces latency, since data is processed on the device itself, especially when localized caching holds frequently accessed parts of the parameter dataset. Second, it improves data privacy: only model parameter data moves through the data center, while user data stays on the device. Third, it improves bandwidth efficiency, because a significant amount of data processing happens on the device itself. Fourth, it increases operational resiliency: with a large enough parameter cache, the system can keep functioning even if the network is disrupted. Lastly, it is more energy-efficient, requiring fewer compute-intensive resources at the data center and less energy to transmit data.
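The “localized cache” idea behind the first benefit can be illustrated with a toy LRU cache for parameter shards. Everything here (shard IDs, the fetch function) is hypothetical for illustration, not MediaTek’s or Meta’s actual design:

```python
from collections import OrderedDict

class ShardCache:
    """Toy LRU cache: keep hot parameter shards on-device, fetch misses remotely."""

    def __init__(self, capacity_shards, fetch_fn):
        self.capacity = capacity_shards
        self.fetch_fn = fetch_fn  # e.g., pulls a shard from an edge data center
        self._cache = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, shard_id):
        if shard_id in self._cache:
            self._cache.move_to_end(shard_id)  # mark as most recently used
            self.hits += 1
            return self._cache[shard_id]
        self.misses += 1
        shard = self.fetch_fn(shard_id)  # network round-trip to the edge
        self._cache[shard_id] = shard
        if len(self._cache) > self.capacity:
            self._cache.popitem(last=False)  # evict least recently used shard
        return shard
```

Every cache hit avoids a network round-trip, which is exactly where the latency and resiliency benefits described above come from.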
However, achieving these benefits may require splitting workloads and using load-balancing techniques to mitigate centralized data center compute costs and network overhead. There are also limits to how powerful an LLM current device hardware can run. And while interception of data crossing the network becomes less of a concern, there is still a security risk of sensitive data being compromised on the local device if it is not properly managed. Finally, updating model data and maintaining consistency across a large number of distributed edge caching devices pose challenges of their own.
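One way the workload-splitting mentioned above could work is a simple routing policy: run locally when the needed parameters are cached, fall back to the edge otherwise, and degrade gracefully when the network is down. This is a hypothetical sketch, not a description of any vendor’s actual scheme:

```python
from enum import Enum

class Route(Enum):
    ON_DEVICE = "on-device"      # all needed shards are cached locally
    EDGE = "edge"                # fetch missing shards from the edge data center
    DEGRADED = "degraded-local"  # network down: serve from the cache alone

def route_request(needed_shards: set, cached_shards: set, network_up: bool) -> Route:
    """Pick where to serve an inference request, preferring local execution."""
    if needed_shards <= cached_shards:
        return Route.ON_DEVICE
    if network_up:
        return Route.EDGE
    return Route.DEGRADED
```

The degraded path is what makes the resiliency benefit possible, at the cost of reduced quality when only part of the model is cached.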
Another consideration is the cost of these mini edge data centers. Today, edge networking is built out by edge service providers such as Equinix, whose facilities serve the likes of Netflix and Apple’s iTunes, rather than by mobile network operators like AT&T, T-Mobile, or Verizon. Generative AI service providers such as OpenAI/Microsoft, Google, and Meta would need to establish similar arrangements.
In conclusion, on-device generative AI is a topic that tech companies are actively exploring. Within the next five years, it is possible that on-device intelligent assistants will become a reality. The era of AI in our pockets is approaching sooner than expected.