As with most machine learning applications, machine translation tools need training data to produce intelligent results.
We have concluded that more data is not always better. Instead of massive amounts of data, we need high-quality data, clustered for specific domains and content types.
Agenda
The ideal query corpus
Tips on training data optimization and evaluation
Where can you find high-quality training data?
Use case presentations by Lilt and SYSTRAN