Language Data

TAUS Data Sale to Boost Multilingual LLMs

by Anne-Maj van der Meer

11/03/2024

Purchase TAUS's exclusive data collection of close to 7.4 billion words covering 483 language pairs, now available at discounts of more than 95% off the original value.

Transforming Translations: The Crucial Role of Language Data in the Age of Large Language Models and Generative AI

by Anne-Maj van der Meer

09/11/2023

Explore the crucial role language data plays in training and fine-tuning LLMs and GenAI: it ensures high-quality, context-aware translations and fosters the symbiosis of human and machine in the localization sector.

Domain Adaptation: Types and Methods

by Anne-Maj van der Meer

19/12/2022

Domain adaptation approaches fall into three categories according to the level of supervision used during training.

Ten-Step Guide to Data Cleaning

by Anne-Maj van der Meer

19/12/2022

Machine learning and AI applications need data to work, and the cleaner that data is, the better the results.

A Brief Introduction to Text Summarization

by Anne-Maj van der Meer

19/12/2022

Text summarization falls into two types: extraction and abstraction. With the power of AI, summarization is becoming more popular and accessible.

Synthetic Data Generation for Neural Machine Translation

by Lahorka Nikolovski

07/10/2022

Generating synthetic parallel data through back-translation offers a solution to the problem of translating low-resource languages and texts from low-resource domains.

What is Speech Recognition and how to do it?

by Pamela Álvarez Ferreira

22/06/2022

The implementation of AI and ML algorithms and computational techniques is helping to improve the accuracy of converting speech into text.

Types of Audio Transcription and when to use them

by Pamela Álvarez Ferreira

20/05/2022

Choosing the right type of audio transcription (verbatim, edited, intelligent, or phonetic) is crucial to meeting the needs of your transcription project.

Natural Language Technologies (NLT) to Drive the Next Generation of AI Solutions

by Şölen Aslan

03/03/2022

Natural Language Technologies are on the rise: making optimal use of NLT and its subcategories is crucial to staying up to date with the latest AI solutions.

NLP-driven Word Clouds in Data Marketplace

by András Aponyi

03/01/2022

What can word clouds driven by NLP tell you about your training datasets? Here is how we create word clouds on the TAUS Data Marketplace.

Data-Enhanced Machine Translation

by Jaap van der Meer

02/12/2021

The next logical translation solution: Data-Enhanced Machine Translation (DEMT).

Data and AI Trends in 2022

by Şölen Aslan

01/12/2021

The language data and AI trends you should expect to rise in 2022: expansion of multilingual AI data and models, more companies joining the data market, data diversity, and lifelong learning machines.