Apple ReALM is Making Siri Smarter and Outperforms GPT-4

Updated on April 4, 2024

Apple has long been a leader in user privacy and security, and that focus shows in its AI strategy as well. ReALM (Reference Resolution As Language Modelling), described in a recently published Apple paper, points to a smarter, faster Siri and hints at the future direction of Apple products.

Siri has been the well-mannered (or, at times, quite exasperating) voice assistant on Apple devices for years. While it can help with simple tasks, Siri struggles to interpret context and ambiguity in user commands, which can lead to unnatural conversations and a poor user experience. Apple ReALM marks a big shift in what Siri could become, and it may influence other Apple products as well.

What is Apple ReALM?

ReALM’s primary role is to give AI models like Siri context by reconstructing the display: it labels each on-screen object and records its location, converting the visual layout into a text-based representation. That representation serves as a clue the voice assistant can use to interpret user requests.

ReALM is far smaller and faster than GPT-4, yet it performs comparably on a number of benchmarks, which shows Apple’s innovative approach to AI development.

Unlike conventional AI assistants, which frequently lack a sense of context, ReALM handles it well. For example, you could ask Siri to “Play the song again”, and ReALM would pinpoint the song you just played and fulfil the request. This change in processing enables a smoother and more pleasant user experience.

For instance, while scrolling through a news article mentioning a restaurant, you could simply say, “Call them” to ReALM. By understanding the context (the restaurant name on the webpage), ReALM could directly initiate a call without needing further clarification. This eliminates the frustration of repetitive prompts and makes interacting with your device feel more seamless.
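The screen-to-text idea behind examples like this can be sketched in a few lines. The following is a hypothetical illustration, not Apple’s implementation: it assumes on-screen elements arrive as labelled boxes, orders them top-to-bottom and left-to-right, and flattens them into plain text that a language model could read alongside the user’s request.

```python
# Hypothetical sketch of a ReALM-style screen encoding. Each element is a
# labelled box; we sort by vertical then horizontal position and emit one
# text line per element so an LLM can "see" the screen as text.

def encode_screen(elements):
    """elements: list of dicts with 'label', 'text', and 'box' (x, y)."""
    ordered = sorted(elements, key=lambda e: (e["box"][1], e["box"][0]))
    return "\n".join(
        f"[{i}] {el['label']}: {el['text']}" for i, el in enumerate(ordered)
    )

# Toy screen: a webpage mentioning a restaurant, with a call button below.
screen = [
    {"label": "button", "text": "Call", "box": (200, 300)},
    {"label": "text", "text": "Luigi's Pizzeria", "box": (40, 120)},
    {"label": "text", "text": "Open until 10pm", "box": (40, 160)},
]
print(encode_screen(screen))
```

The encoded text, together with the spoken request, would then be handed to the model as ordinary input, which is why no separate image-processing step is needed.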

Key Features of ReALM

ReALM, or Reference Resolution As Language Modelling, is designed to enhance the capabilities of voice assistants like Siri.

Here are the main features of ReALM:

Small and Efficient: Compared to models such as GPT-4, ReALM is much smaller, making it better suited for on-device use. Despite having far fewer parameters, it matches or exceeds GPT-4 on specific tasks.

Visual Modelling: ReALM rebuilds a model of the screen in which every on-screen object is labelled and its position specified. This produces a text-based representation of the screen layout that provides contextual hints to the voice assistant when users make requests.

Improved Context Understanding: ReALM strengthens Siri’s grasp of dialogue by taking context into account and resolving the meaning of ambiguous expressions. Its reference resolution helps Siri give highly contextualized answers, which is essential for successfully handling domain-specific questions.

On-Screen Content Utilization: The model can draw its answers from on-screen material, improving Siri’s responsiveness to the user’s current environment. It can also take account of activities running in the background, which adds further value to voice commands.

Textual Encoding Approach: ReALM encodes on-screen content as text in a form that language models can process directly. This approach is faster and uses fewer resources, reducing the voice assistant’s response time while improving accuracy.

Efficiency in Parsing Contextual Data: By encoding the screen as text, ReALM eliminates the need to convert images into text with a separate system. As a result, gathering contextual information becomes both more efficient and less resource-intensive.

Also Read: Apple Strengthens AI Capabilities by Acquiring DarwinAI

Performance and Efficiency of ReALM

It is precisely these features that make ReALM significantly more efficient than its competitors in on-device performance. Its compact size enables faster on-device operation than large models such as GPT-4, which require far more computational power. This efficiency is critical for voice assistants, which must process a user’s request quickly and accurately while preserving the device’s battery life.

Context Comprehension and Reference Resolution

ReALM’s core strength is enhancing Siri’s contextual recognition in dialogue while also processing on-screen content and background activity. What sets it apart from other models is its ability to convert conversational, on-screen, and background entities into text that an LLM can process. This novel approach treats conversational context as a subtle but versatile tool for making virtual assistants even more user friendly.
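To make the reference-resolution task concrete, here is a deliberately simplified toy resolver. It is not Apple’s method, which uses a language model over the textual encoding; this sketch uses hand-written rules purely to illustrate what “resolving a reference” means: mapping a vague request like “Call them” onto one of the candidate entities gathered from the conversation and the screen.

```python
# Toy reference resolver (hypothetical rules, not ReALM's LLM approach):
# given a user request and candidate entities, return the entity the
# request most plausibly refers to.

PHONE_WORDS = {"call", "dial", "phone"}

def resolve_reference(request, entities):
    """entities: list of dicts with 'type' and 'value' keys."""
    words = set(request.lower().split())
    if words & PHONE_WORDS:
        # "Call them" should resolve to a phone-number entity if one exists.
        for entity in entities:
            if entity["type"] == "phone_number":
                return entity
    # Otherwise fall back to the most recently mentioned entity.
    return entities[-1] if entities else None

# Candidates extracted from a webpage mentioning a restaurant.
entities = [
    {"type": "business", "value": "Luigi's Pizzeria"},
    {"type": "phone_number", "value": "+1-555-0134"},
]
print(resolve_reference("Call them", entities)["value"])  # prints +1-555-0134
```

In ReALM the selection step is learned rather than rule-based, which is what lets it generalize to ambiguous, domain-specific requests.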

Benchmarking Against GPT Models

In benchmark tests, ReALM outperformed systems with comparable functionality. Even the smallest ReALM model delivered performance comparable to, or better than, GPT-4, and the larger models substantially surpassed it. The results also suggest that ReALM can handle tasks such as on-screen references, where its performance trails GPT-4’s only slightly while remaining very good given its far smaller parameter count.

Scalability and Versatility

ReALM comes in several sizes, namely ReALM-80M, ReALM-250M, ReALM-1B, and ReALM-3B, offering the flexibility to match the model to the specific application. This scalability reflects Apple’s commitment to making AI technology available across different devices and use cases.

Also Read: Apple WWDC 2024 Dates, Key Highlights and Expected AI Announcements

Apple ReALM vs GPT-4


Apple found that ReALM models performed similarly to GPT-4 while using fewer parameters, making them more suitable for on-device use. Increasing ReALM’s parameter count resulted in it outperforming GPT-4 by a large margin.

Processing: ReALM is on-device; GPT-4 is primarily cloud-based.
Focus: ReALM targets contextual understanding and efficiency; GPT-4 targets generative text and code.
Data Usage: ReALM uses limited data (text descriptions); GPT-4 is trained on vast amounts of data (text, code, images).
Privacy: ReALM is high (keeps data on the device); GPT-4 is lower (data is sent to the cloud for processing).
Speed: ReALM is potentially faster (avoids cloud communication); GPT-4 is potentially slower (limited by internet speed).
Accuracy for Contextual Requests: ReALM is potentially higher (tailored for understanding context); GPT-4 is potentially lower (less emphasis on contextual understanding).
Suitability for Siri: ReALM is high (improves response speed and understanding); GPT-4 is lower (not specifically designed for smart assistants).

Apple ReALM packs a punch, and its on-device focus checks all the boxes.

Also Read: Apple’s First Large Multimodal Model -MM1


Apple ReALM is an important new technology for AI on devices. ReALM focuses on being fast, efficient, and protecting user privacy. Unlike cloud-based AI like GPT-4, ReALM works directly on the device. This means faster response times and better privacy, making ReALM a good choice for virtual assistants like Siri.

While we cannot yet see ReALM’s full effect, one thing is clear: it works nearly as well as GPT-4 with far fewer parameters, even for understanding on-screen information. Remarkably, Apple ReALM also outperforms GPT-4 on certain domain-specific user requests. This makes ReALM a strong candidate for a practical reference-resolution system that can run on devices without sacrificing performance. With its scalability, versatility, and Apple’s continued innovation, ReALM brings a new dimension to Apple’s AI capabilities and a more user-friendly Siri.
