Efficient LLM Implementation on Apple Devices
Large language models (LLMs) like GPT-4 typically require substantial computing power and memory, yet AI researchers at Apple claim to have found an efficient way to deploy LLMs on iPhones and other Apple devices with relatively limited internal memory.
Innovative Solution for Overcoming Memory Limitations
In a research paper, the team details their solution for running large language models that exceed the available DRAM capacity on mobile devices such as iPhones. They suggest storing the model parameters in flash memory and transferring them to DRAM only as they are needed.
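As a rough illustration of this flash-to-DRAM pattern, the sketch below memory-maps a weight file so nothing is copied into DRAM until a row is actually requested. The file name, array shapes, and caching scheme are assumptions made for the example, not details of Apple's implementation.

```python
import numpy as np

# Hypothetical weight file kept on flash, memory-mapped read-only so data
# only moves into DRAM when it is actually touched. Shapes are illustrative.
flash_weights = np.memmap("weights.bin", dtype=np.float16, mode="r",
                          shape=(1_000_000, 4096))

dram_cache = {}  # row index -> weight row currently resident in DRAM

def fetch_rows(row_indices):
    """Copy only the requested weight rows from flash into DRAM,
    reusing rows that were loaded on earlier calls."""
    for i in row_indices:
        if i not in dram_cache:
            dram_cache[i] = np.array(flash_weights[i])  # forces a real copy into RAM
    return np.stack([dram_cache[i] for i in row_indices])
```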
Enhancing Throughput with Data Recycling
To maximize throughput, the authors describe a ‘recycling’ technique in which data already processed by the AI model is reused, reducing the need to repeatedly fetch the same data from flash memory and making processing smoother. The researchers also note that reading data in larger, grouped chunks allows faster sequential reads from flash, contributing to swifter processing and quicker responses from the AI model. A sketch combining both ideas follows.
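The sketch below shows one way the two ideas could fit together: weights for recently used neurons stay resident in DRAM for a few tokens (the recycling), and each neuron's weights are stored contiguously so a single sequential read fetches the whole bundle (the larger chunks). The window size, file layout, and helper names are hypothetical, not taken from the paper.

```python
from collections import deque
import numpy as np

WINDOW = 4  # how many recent tokens keep their neurons resident (assumed value)

# Hypothetical bundled layout: each neuron's related weight vectors are stored
# next to each other, so one contiguous flash read brings in the whole bundle.
bundled = np.memmap("ffn_bundled.bin", dtype=np.float16, mode="r",
                    shape=(16_384, 2 * 4096))

recent = deque(maxlen=WINDOW)  # per-token sets of active neuron ids
resident = {}                  # neuron id -> bundled weights held in DRAM

def step(active_neurons):
    """Recycling: only neurons not already resident from the last WINDOW
    tokens are fetched from flash; stale neurons are evicted afterwards."""
    needed = set(active_neurons)
    for i in needed - set(resident):
        resident[i] = np.array(bundled[i])  # one contiguous read per neuron bundle
    recent.append(needed)
    still_used = set().union(*recent)
    for i in list(resident):
        if i not in still_used:
            del resident[i]
    return {i: resident[i] for i in needed}
```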
Impact on AI Model Performance
Together, these two methods could allow AI models up to twice the size of the available DRAM to run, with inference speeds reported as 4 to 5 times faster on the CPU and 20 to 25 times faster on the GPU compared with naively loading the full model.
Potential Applications for Enhanced LLMs in Apple Products
More efficient LLM execution on iPhones could enable more advanced Siri commands, real-time language translation, and AI-driven photography features. There is speculation that Apple is already developing its own large language model, referred to by employees as ‘AppleGPT’, and intends to integrate generative AI into applications such as Siri, Xcode, and Keynote.