Nowadays, Machine Translation (MT) is increasingly regarded as another asset in the translation industry, alongside translation memories (TMs) and terminologies. Language Service Providers (LSPs) are gradually weighing the benefits of MT for their projects and either creating a dedicated MT department, partnering with major MT engine providers, or relying on their clients’ MT results. The use of MT is thus a global fact, but its application in particular contexts still has to be explored.
With the purpose of gathering data on real-life practices, UAB PhD researcher María Do Campo Bayón launched a survey addressed to Language Service Providers to understand how much data is needed to create and train an MT engine. The survey allowed descriptive answers about the LSPs’ use and/or development of MT. The call for participation was made in collaboration with TAUS.
These are some of the insights:
LSPs reported that their decision to make such an investment is usually justified or driven by the following criteria:
Once the motivation is established, it is important to have a clear idea of what you want to achieve with the MT engine. Based on that, you need to determine the minimum amount of data needed to train it. 55.6% of respondents answered that they have determined the minimum amount of data needed for the training of one or more language pairs, whereas the remaining 44.4% have not yet established such a figure.
Among those who have already established minimum figures, the reported volumes vary, although they all mention relatively large figures. The range goes from at least around 80,000 segment pairs up to 10-15 million. Different language pairs and content types will demand more or less data, but in general, we can establish a minimum of around 500,000 - 1,000,000 segment pairs. As for the sources of that training data, respondents reported the following:
Client’s TM (50%)
Corpus from specific content type (17%)
Client-based terminology (8%)
General corpus (8%)
All of the above (17%)
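Since client TMs are by far the most common source and the volume figures above are expressed in segment pairs, a first practical check is simply counting the translation units a TM contains. Below is a minimal sketch in Python using only the standard library; the file name and the 500,000-pair threshold are placeholders for illustration, not values taken from the survey.

```python
# Count translation units (<tu> elements) in a TMX export and compare the
# total against a minimum training-data threshold. Illustrative sketch only:
# the file name and threshold below are hypothetical.
import xml.etree.ElementTree as ET

MIN_SEGMENT_PAIRS = 500_000  # hypothetical minimum, taken from the range discussed above

def count_segment_pairs(tmx_path: str) -> int:
    """Return the number of translation units in a TMX file."""
    tree = ET.parse(tmx_path)
    return len(tree.getroot().findall("./body/tu"))

if __name__ == "__main__":
    total = count_segment_pairs("client_tm.tmx")  # hypothetical file name
    print(f"{total:,} segment pairs found")
    if total < MIN_SEGMENT_PAIRS:
        print("Below the minimum training volume; consider adding domain corpora.")
```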
After the training, you need to set up a testing phase that matches your project’s pipeline and workflow. The answers revealed that, for testing, LSPs follow a mix of automatic and human evaluations. Most participants combine automatic metrics with tests involving linguists (post-editing tasks, manual scoring tasks, etc.). A third of the companies choose to do only human evaluations, such as editing by a native translator, human revision, or outsourcing to machine learning specialists. Other LSPs reported that they use only automatic metrics, such as scripts for BLEU and TER and comparison reports. There are also companies that define a specific process based on their pipeline and their type of projects and clients. In one specific example, the LSP first contrasts automatic metrics with the human translation. They then use a test set, extract 1,000 words from a representative text, and run automatic and human evaluations to compare specific vs. generic engines. They repeat these tests until acceptable metrics and scores are achieved.
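To illustrate the automatic side of such a testing phase, the sketch below computes corpus-level BLEU and TER with the sacrebleu package. The survey answers only mention “scripts for BLEU and TER” without naming a tool, so the choice of sacrebleu and the file names are assumptions for the example.

```python
# Score an MT engine's output against reference translations with corpus-level
# BLEU and TER. Assumes "pip install sacrebleu"; the file names are hypothetical.
from sacrebleu.metrics import BLEU, TER

def load_lines(path: str) -> list[str]:
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f]

hypotheses = load_lines("engine_output.txt")  # MT output, one segment per line
references = load_lines("reference.txt")      # human translations, same order

bleu = BLEU().corpus_score(hypotheses, [references])
ter = TER().corpus_score(hypotheses, [references])

print(f"BLEU: {bleu.score:.1f}")  # higher is better
print(f"TER:  {ter.score:.1f}")   # edit rate in percent, lower is better
```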
When asked about the kind of indicators used, companies reported using all available indicators: human evaluation, automatic evaluation, edit distance, post-editing effort, and productivity tests (only carried out in long-term projects).
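Of these indicators, edit distance is the easiest to script in-house: comparing the raw MT output with its post-edited version gives a rough proxy for post-editing effort. The sketch below uses a plain character-level Levenshtein distance normalized by segment length; this is an illustration rather than the metric any particular respondent uses, and word-level measures such as TER are common alternatives.

```python
# Rough post-editing effort: normalized character-level edit distance between
# the raw MT output and the post-edited segment (0.0 = untouched, 1.0 = rewritten).
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def pe_effort(mt_segment: str, post_edited: str) -> float:
    longest = max(len(mt_segment), len(post_edited)) or 1
    return levenshtein(mt_segment, post_edited) / longest

# Example: one substitution in a 24-character segment -> low effort (~0.04).
print(pe_effort("The cat sit on the mat.", "The cat sat on the mat."))
```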
Regarding the kind of comparisons they do, these are the results:
It is difficult to establish clear rules or guidelines for approving the use of an MT engine. That is why the survey also asked for minimal thresholds in two scenarios: low and high impact/visibility projects.
First, we asked respondents for a minimum threshold in low impact/visibility technical documentation projects. These are the common answers:
Then, we asked the same question for a high impact/visibility project. The answers are different:
All answers have one thing in common: they all mention quality. As quality is a well-known controversial term in translation theory, it is important to determine what is considered MT output quality. Most respondents (50%) refer to engine output quality in terms of productivity gain, for example, if post-editing is faster than human translation, if the edit distance is lower, or if the project is delivered earlier. Others (17%) consider an engine good enough to use if the style is acceptable. Finally, a sizeable group of participants (33%) relies on consistency over the trials, on acceptable metrics and A/F scores, and on the “green light” of human evaluators.
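As an illustration of the productivity-gain criterion most respondents describe, a go/no-go check could compare post-editing throughput with translation from scratch. The figures and the 10% margin below are entirely hypothetical and not thresholds reported in the survey.

```python
# Hypothetical approval check: accept the engine if post-editing throughput
# beats from-scratch translation by a chosen margin. All numbers are illustrative.
def productivity_gain(pe_words_per_hour: float, ht_words_per_hour: float) -> float:
    """Relative gain of post-editing over human translation from scratch."""
    return (pe_words_per_hour - ht_words_per_hour) / ht_words_per_hour

gain = productivity_gain(pe_words_per_hour=750, ht_words_per_hour=500)
print(f"Productivity gain: {gain:.0%}")                          # 50% in this made-up case
print("Engine approved" if gain >= 0.10 else "Keep retraining")  # 10% margin is arbitrary
```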
Even if LSPs are already working with a tested and evaluated MT engine, they may consider improving it further in the future. When it comes to improving the engines, all survey participants agree on two main reasons for adjustments: post-editors’ performance (speed and quality) and clients’ needs. So, it is advisable to ask for feedback from both post-editors and clients and to continue reporting on MT productivity and quality. Nevertheless, 11.1% of companies report that cost is also a factor to consider.
Final recommendations
Before the training phase:
After the training phase:
When the engine is in a production environment:
ABOUT THE AUTHORS