I was a bit skeptical at first, but I must admit now: this is a sea change, a revolution as big as the invention of the printing press. Future generations will refer to 2023 as the time of a paradigmatic transformation for knowledge workers and a leap forward in Artificial Intelligence.
I was skeptical at first when the news broke about ChatGPT generating well-written articles and fluent translations, because it seemed like more of the same. Old news. We have been training neural networks for years already, and we know that the data we feed into the models can work miracles. It's not as if we suddenly have machines that can think like humans. They still make mistakes. In fact, the mistakes are now more treacherous: hallucinations, as people call them. They make you think the output is perfect when underneath it is all made up and false. So why the sudden hype? What's new?
What's new is the massive scale of the Large Language Models: taking virtually everything that has ever been published on the internet as training data and modeling it with billions of parameters. It sounds like an astronomer's description of the universe or a physicist's description of the realm of particles. It is beyond our imagination, intangible, and yet it's real, perhaps even more real than our visible world. It's real because, empirically speaking, it works. The results speak for themselves. In the world of translation, it's like moving away from your own trusted translation memories and dedicated MT engine to … yes, an invisible universe. How can that be better? It's a leap of faith.
It's a leap of faith that millions are suddenly prepared to take. That is the other thing that's new. The rapid adoption creates an unprecedented speed of innovation. Every week, if not every day, there are new breakthroughs. OpenAI, which stood at the start of this revolution with ChatGPT, certainly doesn't have a monopoly anymore. There is already a wide choice of LLMs from other big tech companies and start-ups. The word is spreading that no company, big or small, will be able to control this beast of innovation. Since the weights of Meta's foundational 65-billion-parameter Large Language Model LLaMA leaked to the public shortly after its limited research release at the end of February, we have seen many of the barriers to widespread adoption fall. No, not everybody needs to train their own large model. In an open-source world, we can leverage each other's models. No, the cost of using this technology is no longer prohibitive. We can run it from our phones.
The AI revolution will have a big impact on the way we live and work, and of course it will change the translation industry. The technology and the tools will change; jobs will go away and new jobs will be created; businesses will disappear and new businesses will pop up. It is hard to envision exactly how the process of change will unfold and what the landscape will look like in the aftermath. But there are some things we can speculate about, or even be quite certain of.
Data is at the core of the AI revolution. Trillions of words accumulated in massive language models generate these amazing results: fluently written texts and translations. Interestingly enough, the translation industry has a history of automating its processes with similar data, words stored in translation memories, but the size of our data collections is nothing compared to the LLMs. The good news for operators in the translation industry is that data quality can outweigh data size. NLP engineers are starting to realize that there are diminishing returns in making Large Language Models ever larger. (Meta's LLaMA is actually relatively small compared to other LLMs.) The challenge now is to make them smarter, better and more compact. The latest developments also show that models are becoming building blocks we can stack on, meaning that we do not have to retrain large models from scratch every time. We can leverage existing models with our own data to customize them and improve the quality of the results.
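Leveraging an existing model with your own data typically starts with converting translation memory segments into fine-tuning examples. A minimal sketch of that preparation step, assuming a hypothetical instruction-style `prompt`/`completion` record format (the exact schema depends on the fine-tuning toolkit you use):

```python
import json

def tm_to_finetune_examples(tm_pairs, src_lang="English", tgt_lang="Dutch"):
    """Convert translation-memory (source, target) pairs into
    instruction-style fine-tuning records, ready to be written
    out as JSONL for a model-customization pipeline."""
    examples = []
    for source, target in tm_pairs:
        examples.append({
            "prompt": f"Translate the following {src_lang} text to {tgt_lang}:\n{source}",
            "completion": target,
        })
    return examples

# Illustrative segment pair, not real TM data.
tm = [("Press the power button.", "Druk op de aan/uit-knop.")]
records = tm_to_finetune_examples(tm)
print(json.dumps(records[0], ensure_ascii=False))
```

The point of the sketch is that customization is a data-transformation job on assets the industry already owns, not a from-scratch training effort.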
The translation industry sits on an ocean of highly valuable data that can make the difference between a nice demo and a production-ready, quality-assured translation system. However, the reality is that most translation memory data is not in the right shape and condition to be used in the new AI processes. We expect it will take quite some time for language service providers and their customers to clean their data, go that extra mile and get optimal results. Those who move fastest will be the winners.
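What "cleaning" means in practice can be illustrated with a few common heuristics: dropping empty segments, untranslated copies, and badly misaligned pairs. A minimal sketch, with made-up example data and an assumed length-ratio threshold:

```python
def clean_tm(pairs, max_ratio=3.0):
    """Filter obviously noisy translation-memory pairs:
    empty sides, source copied into target, extreme length ratios."""
    cleaned = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # target was likely left untranslated
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue  # lengths diverge too much: likely misaligned
        cleaned.append((src, tgt))
    return cleaned

# Illustrative data: only the first pair should survive.
pairs = [("Hello world.", "Hallo wereld."),
         ("", "Leeg"),
         ("Same text", "Same text")]
print(clean_tm(pairs))
```

Real cleaning pipelines go much further (tag handling, deduplication, language identification), but even simple filters like these decide whether TM data helps or hurts an AI process.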
Quality review has traditionally been a painstaking and time-consuming process in the translation industry: an informal, ad-hoc process based entirely on human judgment. This form of quality control is incompatible with AI-driven translation. Encouraged by the impressive results, more enterprises are becoming interested in adopting what they call an MT-First Strategy. This means they will use MT technology to translate everything, literally all content that exists in and around their organizations, and then, as a second step, decide which parts of the translated content require editing, polishing and review.
But how will they decide? It is physically impossible for human reviewers to plow through the massive volumes of machine-translated content. The AI revolution has brought a solution for this too: Quality Estimation, or Quality Prediction. Here again, we are looking at machines that have been trained on good and bad examples of translations to render a fully automatic judgment of translation quality. The first of these AI-driven QE systems are already on the market, and we expect these AIQE systems to become standard features in the translation industry. We believe they will be instrumental in separating fully automatic translation workflows from human-in-the-loop workflows.
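The workflow split that QE enables can be sketched in a few lines: segments whose estimated quality clears a threshold go straight through, the rest are routed to a human. The scores and threshold below are hypothetical placeholders for the output of a real QE model:

```python
def route_by_quality(segments, threshold=0.85):
    """Split machine-translated segments into a fully automatic
    stream and a human-in-the-loop stream, based on a per-segment
    quality-estimation score in the range 0.0 to 1.0."""
    auto, review = [], []
    for text, score in segments:
        if score >= threshold:
            auto.append(text)    # publish without human touch
        else:
            review.append(text)  # send to a human editor
    return auto, review

# Illustrative segments with made-up QE scores.
segments = [("Goede vertaling.", 0.93),
            ("Twijfelachtige vertaling.", 0.61)]
auto, review = route_by_quality(segments)
print(auto, review)
```

The interesting design question is not the routing itself but where the threshold sits, since that single number encodes an enterprise's tolerance for unreviewed machine output.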
Since MT gained popularity over the past six years, we have seen CAT tools and TMS platforms populated with ever more MT plug-ins. This illustrated the prevailing thinking: MT is a nice-to-have for the content for which no translation memory matches exist. The AI revolution we are going through now could change that machine-assisted translation mindset completely. MT, empowered by LLMs, could very well become the first translation resource for everything. CAT tools could lose their hegemony in the translation industry and become boutique-style workstations for professionals. TMS platforms will be challenged to transform into hybrid AI-driven platforms and to compete with brand-new GenAI multi-modal (translation) platforms built directly on LLMs.
It’s hard to predict how the technology and tools race will play out. Innovation on this front will be the most exciting development to follow.
The localization industry is at the forefront of this AI revolution. After all, it is the Language Models that trigger these miraculous results, expanding further into the workspace of translators, editors, writers and many other knowledge workers. Jobs are changing and disappearing. New jobs and opportunities will undoubtedly come up. New services and skills will be needed. Where it is just as easy, better or faster to generate multilingual content directly, translation may be skipped altogether. The business will be redefined. The destiny of the localization industry is in many ways to converge with other industries and, in doing so, to dissolve as a separately identifiable sector. Localization becomes a function of customer support, digital marketing, manufacturing or HR.

We believe that the AI revolution in the translation industry will not lead to unemployment, quite the opposite. The general expectation that translation of everything is ubiquitously available, a utility just like electricity, water and the internet, will be the motor for the creation of many new jobs. But, in the end, things may not move as fast as some of us wish. There is still a lot to worry about, and much work to be done by governments and large enterprises to lay the groundwork for this AI revolution to realize its full potential in the world of translation.
Want to dive deeper into the conversation and hear different perspectives? Save your seat and join us for the upcoming webinar, ChatGPT and the Takeaways for the Translation Industry, on May 31st!
Jaap van der Meer founded TAUS in 2004. He is a language industry pioneer and visionary, who started his first translation company, INK, in The Netherlands in 1980. Jaap is a regular speaker at conferences and author of many articles about technologies, translation and globalization trends.