The Cetacean Translation Initiative (CETI) has been working to decode sperm whale communication since its founding in 2020. The organization now includes more than 50 researchers who are using artificial intelligence to help them understand the whales’ language. With help from a team at MIT, they are now using a large language model (LLM) to analyze the patterns in whale communication.

[Image: A pod of sperm whales underwater in the Atlantic Ocean off the Azores. Credit: Adobe]
Sperm whales were chosen for this endeavor because they have the largest brains of any animal on Earth and live in complex, multigenerational social groups. Because they communicate in discrete clicks, they are also a strong candidate for translation by an LLM.
The first phase of the mission to decode sperm whale communication was to build a large-scale acoustic and behavioral data set that would let an LLM observe whale vocalizations in context and, eventually, translate them. After collection, the raw data was processed, visualized and prepared for machine learning.
Training WhaleLM
In its 2024 report, CETI announced that its scientists had created the sperm whale acoustic and behavioral data set that would be used to train an LLM. To build it, researchers developed the CETI Glider System to record audio, used drones to tag the whales and deployed hydrodynamic whale tags to track their movements. The drones and gliders use machine-learning-based systems that let them follow whales using the animals’ echolocation clicks and other data.
According to CETI, the resulting data set of whale sounds and behaviors revealed that sperm whales have a phonetic alphabet. The whales communicate with sequences of clicks called codas, which range from three to 40 clicks long and vary between social groups. Researchers have also reported finding vowels and diphthongs (a combination of two adjacent vowel sounds within the same syllable, such as the ‘ou’ in hound) within the whales’ language.
Computer scientists at MIT used CETI’s labeled data set to train a model, called WhaleLM, that can annotate new recordings and separate calls according to which whale is making them. The model learns the patterns of whale communication and predicts what should come next. According to an unreviewed preprint, WhaleLM used codas to predict current whale behavior with 72% accuracy and future actions with 86% accuracy, providing the first evidence that whales could be using codas to coordinate behavior.
However, major challenges remain. Although WhaleLM can predict patterns with high accuracy, working out what individual codas actually mean will take more time.
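WhaleLM’s architecture is not described here, so the following is only a toy illustration of the general idea of next-unit prediction: a bigram model that counts which coda type tends to follow which, then predicts the most frequent follower. The function names are hypothetical, and the coda labels and sequences are invented placeholders, not real recordings or CETI data.

```python
from collections import Counter, defaultdict

def train_bigram(sequences):
    """Count how often each coda type is followed by each other coda type."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev):
    """Return the coda type most often seen after `prev`, or None if `prev` is unseen."""
    if prev not in counts:
        return None
    return counts[prev].most_common(1)[0][0]

# Invented toy sequences; "A", "B", "C" are placeholders, not real coda types.
exchanges = [
    ["A", "B", "A", "B", "C"],
    ["A", "B", "C"],
    ["B", "C", "A", "B"],
]
model = train_bigram(exchanges)
print(predict_next(model, "A"))  # B
print(predict_next(model, "B"))  # C
```

A model like WhaleLM would presumably draw on much longer context and on which whale is vocalizing; the bigram counts here only show the basic principle of predicting the next unit from what came before.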
Whales have accents
Because codas differ between social groups, researchers can tell groups apart by the codas they use, much as a listener can tell an American accent from a British one.
CETI researchers discovered that codas also have two other features that vary between groups, called rubato and ornamentation. Rubato is the subtle, gradual stretching or shortening of coda duration across consecutive codas, and ornamentation is the occasional addition of an extra click at the end of a coda. The researchers drew these findings from an analysis of 8,719 codas.
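To make these terms concrete, here is a minimal sketch, assuming each coda arrives as a sorted list of distinct click timestamps in seconds. It computes a coda’s duration, treats its rhythm as the inter-click intervals divided by that duration, flags an ornament when the coda has exactly one click more than a hypothetical expected count for its type, and uses the change in duration between consecutive codas as a crude stand-in for rubato. These definitions and names are simplifications chosen for illustration, not CETI’s published method.

```python
from dataclasses import dataclass

@dataclass
class CodaFeatures:
    n_clicks: int
    duration: float      # seconds from first to last click (overall tempo)
    rhythm: list[float]  # inter-click intervals divided by the duration
    ornamented: bool     # True if the coda has exactly one click more than expected

def coda_features(click_times, expected_clicks):
    """Summarize one coda from its sorted, distinct click timestamps in seconds.

    `expected_clicks` is a hypothetical parameter: the click count of the
    un-ornamented coda type this coda is compared against.
    """
    if len(click_times) < 2:
        raise ValueError("a coda needs at least two clicks")
    intervals = [b - a for a, b in zip(click_times, click_times[1:])]
    duration = click_times[-1] - click_times[0]
    return CodaFeatures(
        n_clicks=len(click_times),
        duration=duration,
        rhythm=[ici / duration for ici in intervals],
        ornamented=len(click_times) == expected_clicks + 1,
    )

def rubato_proxy(codas):
    """Duration change between consecutive codas, a crude stand-in for rubato."""
    return [b.duration - a.duration for a, b in zip(codas, codas[1:])]

plain = coda_features([0.0, 0.25, 0.5, 1.0], expected_clicks=4)
stretched = coda_features([0.0, 0.3125, 0.625, 1.25], expected_clicks=4)
ornate = coda_features([0.0, 0.25, 0.5, 1.0, 1.125], expected_clicks=4)
print(plain.rhythm)                       # [0.25, 0.25, 0.5]
print(stretched.rhythm == plain.rhythm)   # True
print(rubato_proxy([plain, stretched]))   # [0.25]
print(ornate.ornamented)                  # True
```

In this toy example, the stretched coda keeps the same rhythm but takes longer, which shows up as a positive duration change, while the ornamented coda is flagged by its single extra trailing click.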
CETI is based on the Eastern Caribbean island of Dominica, where it partners with several local organizations as well as global partners such as National Geographic. Dominica is home to one of the few resident, non-migratory populations of sperm whales, which allows researchers to study the same whales and families over many years, something that could be essential for understanding their language. CETI researchers continue to track, tag and study the whales as the organization moves into the next phase of its mission.
CETI hopes to foster empathy for whales and other species by demonstrating their similarities to humans. The nonprofit also hopes that greater attention to and care for these whales will lead to new conservation strategies.
“We remain eternal optimists that humans are capable of confronting any challenge, even fostering a deeper connectivity across life on Earth, and a collective desire and resolution to protect it,” CETI Founder David Gruber said in the 2024 report.



