On June 25-26 more than fifty thought leaders gathered at the Grand America Hotel in Salt Lake City with one objective in mind: to define the steps for improving the global content industry ecosystem and making it ready for emerging technologies and the changes to come. This creative process was guided by industry veterans Jaap van der Meer and Renato Beninatto.
To set the context of a world becoming increasingly global, with technology only accelerating the pace of change, Jaap van der Meer took us on a sentimental journey down translation automation memory lane. He highlighted some of the historical and political events that shaped the language industry as we know it today, making us realize not only that some of the people in the room that day had played an active role in them, but also that some of those challenges remain valid today, and that we must accelerate our search for ways to overcome them.
The 50s were a decade of firsts in translation automation: the first Machine Translation Conference in 1952, and the first public demonstration of the Georgetown-IBM MT system, translating from Russian into English, in 1954. What followed were more conferences, the creation of the first research groups and the publication of scientific papers. In the mid-50s, the term “artificial intelligence” (AI), coined by John McCarthy, becomes known to the world.
The late 60s bring OpenLogos (1968), an open source system translating from English and German into French, Italian, Spanish and Portuguese, meant to enhance the human translator's work environment. In the same year, Peter Toma founds LATSEC to market Systran – the first real commercial MT company, a pioneer that has withstood the test of time.
In 1971, Logos begins working on a rule-based English-Vietnamese engine to enable the US authorities to transfer military technology to South Vietnam. The same year, a project begins at Brigham Young University to develop an automated translation system. In the mid-70s, Carnegie Mellon University's Computer Science Department develops Harpy, the first real-time 1,000-word continuous speech recognition system.
Skipping a couple of decades known for the launch of various MT companies, we get to 2005. The conditions are better - there are web services, the cloud, Moses and statistical machine translation. TAUS is one year old and coming on strong with the idea that translation should be available to every citizen of the world - The Promise of Translation "Out of the Wall".
Fast forward fourteen years, and there we were, at the TAUS Industry Leaders Forum in Salt Lake City, in the era of Neural Machine Translation (NMT) and amidst the fourth industrial revolution brought on by digitalization. The future is change, and we are not ready: our global content pipelines are not yet fully invisible, autonomous and data-driven, and the global content supply chain is largely fragmented. To prepare, we need to bridge the knowledge, operational and data gaps at the industry level, together!
Aiman Copty (Oracle), JP Barraza (Systran) and Stéphan Déry (Translation Bureau of Canada) shared their thoughts on traditional roles, processes and data acquisition models, to inspire honest discussions in the sessions that followed.
Machine Translation (MT) is not a solution that replaces humans; it makes available the content you would otherwise have no time for, explains JP. With the proliferation of NMT and the content explosion, Aiman sees an opportunity for language professionals to become language advisors. The open question is where we will find these experts in the future.
Operationally, the future involves an invisible translation workflow and moving up the value chain in MT with language-specific design. Most products will have machine learning built into them, explains Aiman, and it is the international element that a device will need to learn. At the Translation Bureau of Canada, they’ve already seen a 40-55% efficiency gain from the switch from TM to NMT, but they still outsource tens of millions of dollars worth of translations. How do I pay my translators? How do I measure? These remain unsolved questions for Stéphan.
Data is crucial if we want translations to become an extension of the products, Aiman stresses. JP points out that in domains with little data available, you need a human in the loop. The long-tail conversation and the race for data have also started at the Translation Bureau of Canada, as the government has announced its plan to focus on dozens of indigenous languages in the near future.
The panel with Mimi Hills (VMware), Bert Vander Meeren (Google) and Shelby Cemer (Microsoft) set the stage for the buyers’ discussion.
For buyers, the operational gap seems to lie with the types of content that still require heavy manual work; ideally, they want to move toward low or no touch. It also appears that for some buyers the quality tolerance threshold has dropped for certain types of content and locales, while on the service side a clearly defined threshold still applies. Getting data around this from customers becomes key, even though user feedback can be hard to mine.
The data gap is clearly visible in the push from the market into long-tail languages for which data is currently missing. At the intersection of operations and data, where business decisions about market and language planning are made, there is an opportunity for a more consolidated approach.
The panelists Jim Saunders (SDL), Jack Welde (Smartling), Sarah Weldon (Google) and Bob Willans (XTM) share the same perspective when it comes to MT and AI: they see them as catalysts for change and the way to a no-touch automated supply chain. The technology marches forward and we need to catch up.
Customization is the new niche, and that is where data is most needed. When it comes to the knowledge gaps, the panelists stressed the importance of the cultural shift in mindset that has to occur among people, not only the scarcity of resources.
“Let’s stop talking about the factory, let’s talk about the outcomes!” said Jaime Punishill (Lionbridge), comparing the development of the global content industry with fintech. Panelists Erik Vogt (RWS Moravia), Olga Beregovaya (Welocalize), Garry Levitt (thebigword) and Allison McDougall (AMPLEXOR) continued along the same line of thought.
Not all content is created equal, and language service providers need to look at who is going to use the content, where it will be served and consumed, and what its shelf life is. The content transformation introduces new value-added services built around data, marketing and digital transformation.
They also need to follow customer demand and innovate on behalf of the client at the same time, without losing their reputation for problem solving or their margins. One opportunity to help close the knowledge gap is to go beyond the usual pool of linguists and help customers identify talent.
It is clear that the industry is in a phase of transformation and that MT is no longer the boogeyman. While the buyers are focused on high-level topics like making the process invisible, data privacy and GDPR, the providers are rushing to develop new value-added services and are worrying about access to data and data ownership. There is a clear need for more collaboration, more consolidation, and a shift in focus to the outcomes, not the factory!
Milica is a marketing professional with over 10 years of experience in the field. As TAUS Head of Product Marketing, she manages the positioning and commercialization of TAUS data services and products, as well as the development of taus.net. Before joining TAUS in 2017, she worked in various roles at Booking.com, including localization management, project management, and content marketing. Milica holds two MAs in Dutch Language and Literature, from the University of Belgrade and Leiden University. She is passionate about continuously inventing new ways to teach languages.