The last significant breakthrough in statistical machine translation (SMT) technology came in 2005, when David Chiang published his influential paper on hierarchical translation models, which significantly improved the quality of statistical MT between distant languages. Today we stand on the verge of an even more exciting moment in MT history: deep learning (DL) is pushing MT towards much higher accuracy and finally bringing human-like semantics into the translation process.
In general terms, DL is a family of machine learning algorithms that use multilayer artificial neural networks to efficiently learn representations of high-level features from noisy observations. Artificial neural networks, inspired by our knowledge of the biological brain, have already led to breakthroughs in several data-centered fields, including speech recognition, computer vision, user behavior prediction and nonlinear classification.
The idea of using a computational model of natural neurons to learn from data is not new. In fact, modern DL is a reincarnation of the traditional neural networks of the early 1990s, which had only a small number of layers. The key difference between those old-style networks and DL is the multilayer architecture, which allows the latter to learn and represent features more completely (not all features can be defined by experts) and more accurately (through multiple levels of representation). Another key factor in DL's success is the ability of multilayer neural networks to learn an optimal set of features describing objects automatically, instead of relying on the hand-engineered features traditionally used with older networks. Training DL networks requires extensive computational resources (whose unavailability is the main reason DL did not capture the headlines earlier) and abundant training data, but it makes it possible to discover dependencies that were previously undiscoverable.
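To make the idea of automatically learned features concrete, here is a minimal sketch (in Python with NumPy, not any specific MT system): a network with one hidden layer learns the XOR function, which no single linear layer can represent. The hidden layer discovers the intermediate features on its own, with no hand engineering.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: XOR, a classic nonlinear problem.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Randomly initialized weights: one hidden layer, one output layer.
W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 1.0
for step in range(5000):
    # Forward pass: each layer builds a higher-level representation.
    h = sigmoid(X @ W1 + b1)   # learned intermediate features
    p = sigmoid(h @ W2 + b2)   # prediction built from those features

    # Backward pass: gradients of the squared error, layer by layer.
    d_p = (p - y) * p * (1 - p)
    d_h = (d_p @ W2.T) * h * (1 - h)

    W2 -= lr * h.T @ d_p
    b2 -= lr * d_p.sum(axis=0)
    W1 -= lr * X.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(p, 2))  # approaches [[0], [1], [1], [0]]
```

Stacking more such layers is what gives DL its "multiple levels of representation": each layer re-describes the previous layer's output in progressively more abstract terms.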
Many modern natural language processing applications rely heavily on machine learning methods. This is why expectations for DL in Natural Language Processing (NLP) in general, and in DL MT in particular, were extremely high. In practice, however, learning higher levels of abstraction (semantics) from lower levels (lexical features) via intermediate steps (morphology, syntax, etc.) turned out to be a difficult task, and the usability of DL for NLP remained an open question for some years. The main reasons for this were:
Nowadays, however, the status quo is changing markedly:
DL is currently on the verge of breaking the quality barrier and making MT smarter. On the other hand, DL MT is still essentially an extension of the statistical approach, which Noam Chomsky, the most-quoted living scientist, criticized by saying that "statistical models have been proven incapable of learning language". In other words, while significant quality improvements can be expected because DL helps learn a distributed semantic representation of human language, it cannot yet accurately generalize beyond the discrete word space defined by a finite training dataset.
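To illustrate what a "distributed semantic representation" means in practice: words are mapped to vectors in a continuous space, and semantic relatedness becomes geometric proximity. The vectors below are invented purely for illustration; a real MT system would learn them from data as part of its encoder.

```python
import numpy as np

# Hand-picked toy vectors (not from any trained model).
embeddings = {
    "king":  np.array([0.80, 0.65, 0.10]),
    "queen": np.array([0.78, 0.70, 0.12]),
    "apple": np.array([0.10, 0.05, 0.90]),
}

def cosine(a, b):
    """Cosine similarity: 1.0 means identical direction in the space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine(embeddings["king"], embeddings["queen"]))  # close to 1.0
print(cosine(embeddings["king"], embeddings["apple"]))  # much smaller
```

It is this geometry that lets a DL system treat related words similarly even when one of them is rare in the training data, something discrete phrase tables in classic SMT could not do.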
Maxim Khalilov is currently head of R&D at Glovo, a Spanish on-demand courier service unicorn. Prior to that, he was director of applied artificial intelligence at Unbabel, a company disrupting the customer service market with machine translation, and worked as a product owner in data science at Booking.com, responsible for the collection and exploitation of digital content for the hospitality market. Maxim is also a co-founder of the natural language processing company NLPPeople.com, holds a Ph.D. from the Polytechnic University of Catalonia (Barcelona, 2009) and an MBA from IE Business School (Madrid, 2016), and is the author of more than 30 scientific publications.