TAUS Webinar - Data Cleaning 101


Data Cleaning 101

23 February 2021

5:00 - 6:00 pm CEST

Every company ‘sits’ on a mountain of language data in translation memories and content management systems. But those data are locked up in legacy formats and templates that make them hard to use and access in modern machine translation scenarios...


Agenda

Problems in data (why cleaning is required)

The available tools and their limitations

Cleaning based on sentence embeddings (LASER, LaBSE)

Comparison with examples
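
To give a rough idea of what cleaning based on sentence embeddings can look like, here is a minimal sketch in Python. It is not the method shown in the webinar. It assumes each source and target segment has already been encoded into a vector by a multilingual encoder such as LASER or LaBSE, and it keeps only the pairs whose vectors are similar enough. The names `cosine_similarity` and `filter_pairs`, the toy vectors, and the 0.7 threshold are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Pair = Tuple[str, str]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_pairs(
    pairs: Sequence[Pair],
    src_vecs: Sequence[Vector],
    tgt_vecs: Sequence[Vector],
    threshold: float = 0.7,  # illustrative value, would need tuning on real data
) -> List[Pair]:
    """Keep translation pairs whose source and target embeddings are similar."""
    kept = []
    for pair, src, tgt in zip(pairs, src_vecs, tgt_vecs):
        if cosine_similarity(src, tgt) >= threshold:
            kept.append(pair)
    return kept


# Toy 3-dimensional vectors standing in for real sentence embeddings.
pairs = [("Hello world", "Hallo Welt"), ("Hello world", "Preisliste 2019")]
src_vecs = [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
tgt_vecs = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]

clean = filter_pairs(pairs, src_vecs, tgt_vecs)
```

In this toy run only the first pair survives, because its two vectors point in nearly the same direction while the second pair's vectors are orthogonal. With real data, a suitable threshold depends on the encoder and the language pair.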