Toloka welcomes new investors Bezos Expeditions and Mikhail Parakhin in strategic funding round

Learn more

Large Action Models: Transforming Human-Computer Interaction

Toloka Team

March 20, 2024

Essential ML Guide

Can your AI agent survive in the real world?

Training datasets are what it needs to reason, adapt, and act in unpredictable environments

Get training data

Large Action Models (LAMs) are a type of foundation model, like large language models (LLMs). However, the two differ sharply in how they interact with the world. LAMs represent a vital step towards seamless human-computer interaction, promising more intuitive, efficient, and personalized interactions in a variety of fields. Below, we look at what LAMs are, how they are used, and how they differ from LLMs.

What are Large Action Models?

A Large Action Model (LAM) is not just a single AI application that generates a response to a prompt or an image on demand. It is a full-fledged system that accomplishes complex tasks, much as a person does when working across different apps on a smartphone. LAMs are designed to interact with user interfaces as a human would.

LAMs are AI models that can perform tasks without requiring assistance, helping humans achieve their goals.

Large action models came under the spotlight after the recent launch of Rabbit's artificial intelligence device, the R1. It runs on rabbit OS, which is powered by a LAM and has a natural language interface. The device has a screen, a built-in microphone, a rotating camera, and an analog button to start a conversation with the AI. Unlike a standard smartphone, it has no apps; instead, the creators of the Rabbit R1 rely on a more intuitive, mostly voice-controlled interface.

The Rabbit R1 is an AI-powered gadget that can use your apps for you. Source: The Verge

The developers say that rabbit OS understands everything the user tells it. It does this by recognizing human intentions, which differ from person to person and can change rapidly. Besides understanding what users say, it also performs actions for them: it can translate, navigate the web, answer questions, play music on demand, book plane tickets, and more.

The user can interact with different applications through the R1. The device can perform these actions because LAMs can connect with real-world applications. With this capability, they can control other devices, automate processes, and collect and use data.

How Does It Work?

To understand the structured nature of human-computer interactions within applications, a LAM uses neuro-symbolic programming. This is an artificial intelligence approach that combines neural networks, which are inspired by the structure of the brain, with symbolic AI, which deals with logic and symbols. Neuro-symbolic techniques can help LAMs understand and represent the complex relationships between actions and human intention.
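To make the neuro-symbolic idea more concrete, here is a minimal sketch in Python. It is not Rabbit's implementation, which has not been published in detail. The "neural" part is replaced by a tiny keyword scorer with invented weights, and the symbolic part is a table of explicit rules. All names (INTENT_WEIGHTS, PLANS, score_intents, plan_actions) and the step labels are hypothetical.

```python
import math

# Stand-in for the neural part: a tiny keyword scorer with made-up weights.
# A real LAM would use a trained network here; this is an illustrative assumption.
INTENT_WEIGHTS = {
    "play_music": {"play": 2.0, "song": 1.5, "music": 1.5},
    "book_flight": {"book": 2.0, "flight": 2.0, "ticket": 1.0},
}

# Symbolic part: explicit, human-readable rules. Each intent lists the
# information it requires and the ordered interface steps that fulfil it.
PLANS = {
    "play_music": {
        "requires": ["track"],
        "steps": ["open_player", "search:{track}", "press_play"],
    },
    "book_flight": {
        "requires": ["origin", "destination"],
        "steps": [
            "open_airline_site",
            "fill_route:{origin}-{destination}",
            "confirm_booking",
        ],
    },
}


def score_intents(utterance):
    """Return a softmax distribution over the known intents."""
    words = utterance.lower().split()
    raw = {
        intent: sum(weights.get(word, 0.0) for word in words)
        for intent, weights in INTENT_WEIGHTS.items()
    }
    total = sum(math.exp(value) for value in raw.values())
    return {intent: math.exp(value) / total for intent, value in raw.items()}


def plan_actions(utterance, slots):
    """Pick the most likely intent, then expand it into steps using the rules."""
    probabilities = score_intents(utterance)
    intent = max(probabilities, key=probabilities.get)
    rule = PLANS[intent]
    # Symbolic check: refuse to act if required information is missing.
    missing = [name for name in rule["requires"] if name not in slots]
    if missing:
        raise ValueError(f"missing information for {intent}: {missing}")
    return intent, [step.format(**slots) for step in rule["steps"]]


if __name__ == "__main__":
    print(plan_actions("play a song", {"track": "Yesterday"}))
```

The scorer handles the fuzzy part (what does the user probably want?), while the rule table keeps the resulting plan explicit and checkable, which is the division of labor the neuro-symbolic approach aims for.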

Furthermore, a LAM helps the user perform an action by drawing on extensive knowledge of user interfaces. During training, the LAM learns what a large number of website and application interfaces look like and how they work.

During training, LAMs adopt a technique called “imitation through demonstration” or “learning through demonstration”. They observe how people engage with interfaces as they click buttons or enter data, and then mimic these actions. By learning from examples provided by users, they become more adaptable to interface changes and capable of handling diverse tasks.
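The idea of learning from demonstration can be illustrated with a deliberately simple sketch. Assume each demonstration is a list of (screen state, action) pairs recorded while a person uses an interface. The toy "policy" below just remembers the most frequent action seen in each state. Production systems would use far richer models; the function names (learn_policy, replay) and the state and action labels are invented for this example.

```python
from collections import Counter, defaultdict


def learn_policy(demonstrations):
    """Map each observed state to the action people chose most often there."""
    counts = defaultdict(Counter)
    for demonstration in demonstrations:
        for state, action in demonstration:
            counts[state][action] += 1
    return {state: c.most_common(1)[0][0] for state, c in counts.items()}


def replay(policy, start_state, transitions, max_steps=10):
    """Follow the learned policy through a simulated interface."""
    state, trace = start_state, []
    for _ in range(max_steps):
        if state not in policy:
            break  # no demonstration covered this state, so stop
        action = policy[state]
        trace.append(action)
        state = transitions.get((state, action))
    return trace


if __name__ == "__main__":
    login = [
        ("login_page", "type_username"),
        ("username_filled", "type_password"),
        ("password_filled", "click_submit"),
    ]
    # Two users log in normally; one clicks the help link instead.
    demos = [login, login, [("login_page", "click_help")]]
    # Hypothetical model of how the interface responds to each action.
    transitions = {
        ("login_page", "type_username"): "username_filled",
        ("username_filled", "type_password"): "password_filled",
        ("password_filled", "click_submit"): "home_page",
    }
    policy = learn_policy(demos)
    print(replay(policy, "login_page", transitions))
```

Because the policy is built from recorded examples, adding new demonstrations (for instance, after an interface is redesigned) updates its behavior without any hand-written rules.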

What’s The Difference Between LAM and LLM?

The key distinction between LAMs and Large Language Models (LLMs) is this: while LLMs are adept at generating text based on input prompts, LAMs focus on understanding actions and orchestrating sequences of actions to accomplish specific goals.

At the moment, LLMs are proficient in understanding and generating natural language text, but they may not be optimized for task execution. For example, they can recommend which flight is best to choose, but the booking on the airline's website has to be done by the user.

Large Action Models, on the other hand, are designed not just to understand language but also to take actions based on that understanding. They can book appointments, make reservations, or complete forms by interacting with applications or systems. Because LAMs are specifically designed to understand human intentions and act on them within applications or systems, they are better suited to task-oriented interactions.

While Large Language Models rely primarily on neural network architectures for language processing, LAMs often incorporate hybrid approaches that combine neural networks with symbolic reasoning or planning algorithms. This enables LAMs to understand both the language context and the underlying structure of actions required to accomplish tasks effectively.
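The flight example can be turned into a small sketch that contrasts the two behaviors. It involves no real model or airline API: recommend_flight stands in for an LLM-style answer that stops at text, book_cheapest stands in for a LAM-style agent that goes on to act, and FakeAirlineSite is a hypothetical in-memory substitute for a booking interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Made-up flight data for illustration.
FLIGHTS = [{"id": "TK100", "price": 320}, {"id": "TK200", "price": 275}]


def recommend_flight(flights):
    """LLM-style behavior: produce text; the user still has to book."""
    best = min(flights, key=lambda flight: flight["price"])
    return f"The cheapest option is flight {best['id']} at ${best['price']}."


@dataclass
class FakeAirlineSite:
    """Hypothetical stand-in for an airline's booking interface."""
    selected: Optional[str] = None
    booked: List[str] = field(default_factory=list)

    def select_flight(self, flight_id):
        self.selected = flight_id

    def confirm(self):
        if self.selected is None:
            raise RuntimeError("no flight selected")
        self.booked.append(self.selected)


def book_cheapest(flights, site):
    """LAM-style behavior: decide, then carry out the steps on the interface."""
    best = min(flights, key=lambda flight: flight["price"])
    actions = [("select_flight", best["id"]), ("confirm", None)]
    for name, argument in actions:
        method = getattr(site, name)
        if argument is None:
            method()
        else:
            method(argument)
    return actions


if __name__ == "__main__":
    print(recommend_flight(FLIGHTS))
    site = FakeAirlineSite()
    book_cheapest(FLIGHTS, site)
    print(site.booked)
```

Both functions reach the same decision; the difference is that the second one changes the state of the (simulated) application rather than leaving that step to the user.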

Business Cases for LAM

Large Action Models (LAMs) have huge potential across a variety of business applications in different industries. The following are some potential business cases for the use of LAM technology:

Virtual assistants and customer support. LAMs have the potential to serve as a core framework for developing advanced virtual assistants that can not only understand and respond to customer queries but also carry out tasks on customers' behalf;

Process automation. LAMs may automate repetitive and time-consuming activities for organizations, such as data entry, paperwork handling, or stock management. Through voice input, for instance, similar documents can be filled out quickly. This can lead to significant time and cost savings and, as recognition technology advances, to better accuracy and fewer errors;

Retail and customer service. LAMs can analyze customers' shopping history, preferences, and behavior to provide personalized recommendations. Based on past purchases, they may suggest relevant products, promotions, or recipe ideas to shoppers as they move through the supermarket aisles. On top of this, by analyzing customer feedback and sentiment, LAMs can identify areas for improvement and help solve customer problems in real time. All of this can help retailers increase overall shopping satisfaction and boost their revenue.

LAM in Chatbots and Assistants

Besides communicating and answering questions, LAM personal assistants similar to Rabbit's R1 can analyze a user's preferences, habits, and past interactions to provide personalized recommendations for restaurants, movies, books, or places to travel. Similarly, they can offer personalized advice on health, fitness, or personal finance based on individual goals and preferences.

In addition, LAMs can integrate with smart home devices and IoT (Internet of Things) systems to control appliances, monitor power consumption, or enhance home security. They can respond to voice instructions, adjust devices' settings based on user preferences, and automate routine activities to enhance convenience and comfort.

The Future of LAMs

LAMs can be considered part of the broader category of Generative AI and also a leap forward from Large Language Models. The R1, a standalone device developed by Rabbit, is one of the earliest incarnations of a LAM: a small, intuitive, and fast AI computer. It's a simple device that can perform a huge number of complex tasks as a human would.

If that's just the beginning, what does the future hold when LAMs are more thoroughly defined and explored? Hopefully, they will become truly useful and accurate assistants for people. However, there are some concerns about giving them full autonomy. For example, it is not clear whether they can be trusted to make critical decisions and take actions. Therefore, human involvement and control must remain part of the design and customization of LAMs.
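One common way to keep a human in control is an approval gate: the agent runs routine steps on its own but pauses for confirmation before anything marked as critical. The sketch below is one possible shape for such a gate, not a description of how any existing LAM handles it. The CRITICAL_ACTIONS set, the "name:argument" action format, and the run_with_oversight function are all assumptions made for illustration.

```python
# Hypothetical list of action names that always need human sign-off.
CRITICAL_ACTIONS = {"pay", "delete_account", "send_money"}


def run_with_oversight(actions, execute, approve):
    """Run actions in order, asking a human before each critical one.

    execute: callable that performs a single action.
    approve: callable that returns True if the human allows the action.
    """
    log = []
    for action in actions:
        name = action.split(":", 1)[0]
        if name in CRITICAL_ACTIONS and not approve(action):
            log.append((action, "skipped"))
            continue
        execute(action)
        log.append((action, "done"))
    return log


if __name__ == "__main__":
    performed = []
    # The human declines every critical action in this run.
    result = run_with_oversight(
        ["open_cart", "pay:49.99"], performed.append, lambda action: False
    )
    print(result)
    print(performed)
```

In this run the cart is opened automatically, while the payment is skipped because the approval callback declined it.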

LAMs have all the prerequisites to become one of the most powerful AI technologies. As they continue to improve their understanding of human intention and their execution of actions, they will become increasingly effective at automating complex tasks within a variety of industries. This applies not only to routine administrative tasks but also to more complex decision-making and problem-solving processes. In summary, LAMs promise to change how people interact with technology, bringing gains in task automation, productivity, and everyday convenience.

If you are seeking a reliable and trusted data partner to support the development of your own Generative AI model, Toloka stands out with its comprehensive offerings in supervised fine-tuning, RLHF, and evaluation. Toloka's expertise in these areas not only ensures high-quality training data but also enables fine-tuning and validation of Generative AI models for optimal performance and accuracy. Get in touch to discuss your GenAI model development pipeline with Toloka’s team.
