From Words to Algorithms: a Career Journey to Language Engineering

Lisa Vasileva works as a Machine Learning Engineer at TAUS. She joined the company in 2020 as an intern, as part of her Humanities Research Master studies in Human Language Technology (HLT) at Vrije Universiteit Amsterdam. Her background as a professional translator and her transition to the field of Natural Language Processing (NLP) coincidentally mirror the journey of TAUS from its translation resource center days to the data company that it is today. With the world of translation increasingly dominated by AI applications, Lisa’s journey also illustrates a career path that many other language professionals might embark on. We asked Lisa to tell us about her journey into NLP and, more specifically, about her recent contributions to the TAUS Estimate API offering.

What brought you to the world of translation?

I have always enjoyed learning foreign languages, so I decided to study Linguistics and major in Translation Studies, as it seemed a good application of my interests and aptitude. After graduating, I joined a localization company, where I started out as a trainee translator and went on to become a lead translator and reviewer. I think translation did turn out to be a fitting occupation for me: I enjoyed it and spent almost six years in the industry before deciding to explore other career opportunities.

What motivated your decision to study Human Language Technology at VU Amsterdam?

In my job as a translator I learned about Machine Translation (MT) from a different angle: not just as a user and consumer, but also as an evaluator and post-editor. This experience gave me more insight into how MT engines are built and improved, and introduced me to the bigger field of NLP. I started considering jobs in NLP, but often felt that careers in this field were not something I had the right background for.

The HLT Master's program at the VU offers courses in NLP and welcomes people with a Linguistics background — this seemed the right (if not perfect) combination for me, so I applied.

How did you find TAUS? Tell us about your internship topic and experience.

TAUS participated in the annual Meet & Greet hosted for the students of the HLT program. At this event, companies come to talk about internship and employment opportunities, and students learn about prospective jobs and in-demand job profiles. TAUS looked like a great place to combine my experience in translation and my new skills in NLP, since they are helping companies improve their MT engines through better data and data solutions.

My internship project, on automated detection of machine-generated and human-generated text in an NMT framework, was separate from the company's operations, but I got to know the team and learned about the business activities and goals. So when I was offered a full-time position after the internship, I gladly stayed.

What is your current role at TAUS? Please tell us about some of the recent projects you have been involved in.

I am a Machine Learning Engineer, and in this role I build NLP-focused solutions for both internal applications and customer-oriented projects.

Most recently, I have been involved in advancing our Machine Translation Quality Estimation (MTQE) offering, and I really enjoy this work. MTQE systems get better every year, as evidenced by the results of the WMT shared tasks in Quality Estimation. However, as their performance improves, the systems are also becoming more complex and less interpretable. Now that quality predictions are increasingly accurate, the industry seems to require better insight into what lies behind the predictions: Which aspects are being penalised? What contributes to a higher score?

Together with the wider Engineering and Data team at TAUS, I have been working on ways to make MTQE more interpretable by adding glassbox features to the MTQE predictions. One of these features serves to reveal if MT output is lacking in grammar, fluency or other linguistic aspects. At the moment this feature is made available as a complementary MTQE metric — LinguisticQE. It is a separate score intended to assess if MT output is grammatically correct, fluent and natural-sounding. It is currently available for selected language pairs, and we are working on expanding the language coverage and incorporating it into our general-purpose MTQE offering.

What are the aspects of your new profession as an ML engineer that you are particularly excited about?

On a higher level, I am excited and motivated by just how many problems and challenges can be tackled with NLP technologies, and in a meaningful way. NLP tools and techniques power many applications, from "direct" ones like auto-complete, spelling correction and writing tools to less direct ones like bias detection and mitigation, fact-checking and hate speech detection.

In the work we do at TAUS, I can see how we can continue to make MT better and more accessible to both companies and individuals — by making it more reliable with MTQE solutions and domain adaptation.

Now, the big question on Human versus Machine, in the realm of translation. What’s your take on the future of human translators?

That's a difficult question! Even though there has been amazing progress in MT in the last 10 years, I don't think the profession of translator is going to disappear any time soon. I would even be as bold as to say that MT will only completely replace human translators when we have reached true general artificial intelligence. (Which some in the community think might happen very soon if we are not careful!)

This might sound contradictory to what I said before, so I would like to emphasise: advancements in MT and NLP in general have made the work of a translator easier in many ways. As I see it, translators can be equipped to be more productive, spending more of their time on the truly creative parts of their job and less on "boring" tasks like LQA, which is becoming more efficient and accurate, or post-editing, which is more manageable thanks to domain adaptation. The role and skillset of a translator are transforming, not going away or being replaced.

What would be your advice to people with a linguistic/humanities background who are interested in exploring the NLP field? What would be the ways to get started?

In my experience, NLP is a truly interdisciplinary field, and subject matter expertise can contribute greatly to it. It takes technical skills and knowledge to build NLP tools and systems, but it increasingly takes in-domain expertise to evaluate them and advance their performance from good enough to excellent.

Since I came to the field with a Linguistics background, I can see how linguistic knowledge is most helpful not just when creating the systems but also when evaluating them. In more mainstream domains and resource-rich language pairs, MT has reached the level of performance where adequacy and accuracy are consistently reasonable, and we are moving on to improvement of such aspects as fluency, readability and well-formedness, which are by default subjective and more difficult to evaluate.

In a Google Research paper on systematic differences between supervised and unsupervised MT output, this evaluation was tackled with a linguistically motivated automated metric that makes it possible to quantify aspects of fluency and naturalness through a carefully designed assessment of structural similarity between human and machine translation. For me, this metric is a good example of how the practical and theoretical linguistic toolsets can be incorporated into an NLP problem: knowledge of structural differences between languages and of standards of translation quality is combined with NLP tools to extract the information necessary for modelling these structural differences, and to design a meaningful interpretation of the result. Confirmed in this way, the hypothesis about stylistic differences between unsupervised and supervised MT paves the way for further improvements of MT in terms of fluency, style and readability.

For anybody interested in NLP and curious to see if they can apply their knowledge in the field, there are so many resources to explore that it might be difficult to decide where to begin. Among other great sources, I find two really approachable and useful to a wide audience: Python for Everybody from Coursera (since Python provides an amazing ecosystem for NLP researchers and practitioners), and Making Friends with Machine Learning, a great intuitive introduction to Machine Learning.

Interested in learning more about the NLP work that Lisa and the rest of the TAUS Engineering and Data team do? Contact us.

