Effectiveness of Domain-specific Language Data

We ran an in-domain data experiment in the WMT Workgroup to measure the effectiveness of domain-specific training data. TAUS Matching Data corpora performed strongly across all language pairs and proved that fine-tuning on this data brings a guaranteed BLEU score improvement.

Author
Milica Panić

Milica is a marketing professional with over 10 years in the field. As TAUS Head of Product Marketing she manages the positioning and commercialization of TAUS data services and products, as well as the development of taus.net. Before joining TAUS in 2017, she worked in various roles at Booking.com, including localization management, project management, and content marketing. Milica holds two MAs in Dutch Language and Literature, from the University of Belgrade and Leiden University. She is passionate about continuously inventing new ways to teach languages.
