What a different world we suddenly find ourselves in. Who would have imagined, just two weeks ago, that planes would stay on the ground, that we would no longer be queuing on the autobahn or standing in line for the barista, and that all the NBA games, the Eurovision Song Contest and UEFA Euro 2020 would be canceled? Who would have thought that we would all find ourselves together in a war with an invisible enemy that undermines our normal lives, our markets and our society?
Solidarity Wins
It is amazing to see how determined we are, as a species you could say, in a time of crisis like this. Draconian measures are being taken in almost real time by governments and local authorities. Economies are shut down and people are being isolated. We understand that this is necessary. Yet we realize that real solidarity in our societies and communities depends entirely on full disclosure and verifiable information. People spend hours searching, reading and studying the latest news on Corona and COVID-19. The virus is at work among us now. How do we resist it, beat it, or live with it? For most people, especially the younger generation, this crisis is an unprecedented experience.
It is reassuring to witness how, in general, virtuous behavior prevails, how solidarity seems to be winning, for now at least, and how efficiently the necessary measures are being implemented. Most of us realize that things will get worse before they get any better. COVID-19 is already affecting 176 of the world's 195 countries. Winning this battle in the long run depends on constant solidarity, trust, knowledge and understanding among all of our fellow citizens on this planet.
TAUS is launching the Corona Crisis Corpus project. We will collect language data specific to virus outbreaks, health conditions and cures, symptoms and medicines, hospitals and treatments, and everything that citizens and patients around the world want and need to know about the coronavirus and our joint effort to vanquish it. We will clean, cluster and organize the data and make them available in the form of bilingual corpora in the TAUS Data Library. MT developers, Language Service Providers and everyone else who is training their own MT engines can come to our site, download these corpora and use them to improve their translation services and systems.
We will kick this off with a ‘Corona Starter Kit Corpus’ containing all relevant matches from the existing TAUS Data Cloud. This Corona Starter Kit Corpus will be available in a number of languages fairly soon.
We then invite translators and agencies as well as life sciences companies to contribute their own translation memories covering this same domain, so that together we can expand both the volume of good data and the language spread. TAUS will apply the Matching Data service to all the translation memories we receive so as to clean, cluster and organize the data into Corona Crisis Corpora in as many languages as possible. The resulting corpora will all be available in the TAUS Data Library.
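For readers who want a feel for what "clean and select" can mean for a translation memory, here is a minimal sketch in Python. It is not the TAUS Matching Data service, whose internals are not described here. It only illustrates the general idea under simple assumptions: segments arrive as (source, target) pairs, and relevance is judged by a small, hypothetical list of seed terms. The names `DOMAIN_TERMS`, `clean`, `is_relevant` and `select_pairs` are made up for this illustration.

```python
import re

# Hypothetical seed terms, chosen for illustration only.
# A real domain filter would use a much richer query than a keyword list.
DOMAIN_TERMS = {
    "coronavirus", "covid-19", "virus", "symptom", "symptoms",
    "quarantine", "vaccine", "hospital",
}


def clean(text):
    """Collapse runs of whitespace and trim the segment."""
    return re.sub(r"\s+", " ", text).strip()


def is_relevant(source):
    """Return True if the source segment contains at least one seed term."""
    tokens = set(re.findall(r"[a-z0-9-]+", source.lower()))
    return bool(tokens & DOMAIN_TERMS)


def select_pairs(pairs):
    """Clean, deduplicate and domain-filter (source, target) segment pairs."""
    seen = set()
    selected = []
    for source, target in pairs:
        source, target = clean(source), clean(target)
        if not source or not target:
            continue  # drop pairs with an empty side
        key = (source.lower(), target.lower())
        if key in seen:
            continue  # drop exact duplicates, ignoring case
        seen.add(key)
        if is_relevant(source):
            selected.append((source, target))
    return selected
```

Given a handful of English-Dutch pairs, `select_pairs` would keep a sentence about stopping the virus and drop one about an invoice. A production pipeline would also need language identification, alignment checks and proper clustering, none of which is attempted in this sketch.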
We trust that Google, Microsoft, Facebook, Amazon, Systran, Iconic, and dozens of other small and large MT developers around the world will access the Corona Crisis Corpora and very quickly train and optimize their engines. That will help the 4.5 billion internet users and fellow citizens on our planet, who search every day for unbiased, solid information on this life-threatening crisis, to find the right content in the languages they can read and understand.
If we do this job well, it will send out good vibrations about how our industry can help the world communicate better. Hopefully this effort will continue to reverberate after the dust settles and we return to business as usual.
Ah, and yes, of course, this is a charity project. TAUS is putting in the labor and infrastructure for free. We are asking all language data contributors to share their data for free. The Corona Crisis Corpora can all be downloaded for free. There is no money at stake at any point in this endeavor.
TAUS proposes the following rules for the Corona Crisis Corpus project:
Contact data@taus.net if you would like to share your medical data.
Jaap van der Meer founded TAUS in 2004. He is a language industry pioneer and visionary, who started his first translation company, INK, in The Netherlands in 1980. Jaap is a regular speaker at conferences and author of many articles about technologies, translation and globalization trends.