Quality Estimation serves as a powerful tool for risk management in translation workflows. By providing accurate predictions of the quality of machine-generated content, it enables proactive identification of potential issues before they impact the final output. This proactive approach allows for timely adjustments and corrections, mitigating the risk of delivering subpar or inaccurate translations to clients or end-users.
Moreover, QE contributes to better resource allocation by identifying segments or areas of content that may require additional attention. This foresight aids in optimizing workflows, ensuring that human resources are directed where they are most needed. Ultimately, the risk management aspect of QE enhances the overall reliability and reputation of translation services.
One of the key benefits of Quality Estimation is its significant impact on reducing post-editing efforts. By accurately predicting the quality of machine-generated content, QE helps identify segments that are likely to require post-editing intervention. This targeted approach minimizes the need for extensive post-editing across the entire document, saving time and resources.
Reduced post-editing efforts lead to increased efficiency in translation workflows. Translators can focus their efforts on refining specific areas flagged by QE, ensuring a more streamlined and effective post-editing process. This benefit is particularly valuable in scenarios where time is of the essence, enabling quicker delivery of high-quality translations to clients.
Quality Estimation facilitates the benchmarking of Machine Translation (MT) engines, providing valuable insights into their performance. By evaluating and comparing multiple MT engines, QE enables organizations to make informed decisions about the most suitable engine for specific projects or domains.
Benchmarking with QE helps organizations identify the strengths and weaknesses of different MT engines, allowing for data-driven decisions in selecting the most reliable and accurate solution. This benefit is crucial for companies operating in diverse industries with varying language requirements, ensuring that the chosen MT engine aligns with the specific needs and expectations of each project.
TAUS has trained a generic model for quality estimation based on the data in our Data Repository. The generic model is trained on 100+ languages and, as the name suggests, on broad, general-domain content. The performance of this model differs per language and domain. Because the model is not tied to a specific quality standard, users need to do some exploration to find the right threshold for their type of content and use case.
Custom models are trained on demand. For a custom model, TAUS works closely with the client to train the model on their unique type of content (taking into account specific jargon and brand names) as well as on the specific quality expectations they may have. The custom model outputs a custom score based on the specified parameters; this can be a label (“good”, “bad”) or a number.
At TAUS, we maintain volume-based pricing for our Quality Estimation. Users can purchase a credit bundle that contains a number of characters (starting at 2 million). For every segment sent through the API, the characters of both source and target are counted and subtracted from the credits available in the bundle. Once a bundle is depleted, users can easily purchase a new one.
A credit bundle can be used both on generic and custom models.
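As a rough illustration of how this counting works, the sketch below sums source and target characters per segment and subtracts the total from a bundle. The exact counting rules applied by TAUS (handling of whitespace, markup, or encoding) may differ; this is only a sketch under that assumption.

```python
# Minimal sketch of estimating credit consumption before sending segments through the API.
# Assumes characters are counted as len(source) + len(target) per segment; the actual
# counting rules used by TAUS may differ.

def estimate_credit_usage(segments, remaining_credits):
    """Return (characters_used, credits_left) for a batch of (source, target) pairs."""
    used = sum(len(src) + len(tgt) for src, tgt in segments)
    return used, remaining_credits - used

segments = [
    ("The quick brown fox jumps over the lazy dog.",
     "De snelle bruine vos springt over de luie hond."),
]
used, left = estimate_credit_usage(segments, remaining_credits=2_000_000)
print(f"Characters used: {used}, credits left: {left}")
```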
The Estimate API, like other API-based services, is designed for easy integration into other applications. TAUS offers developer support and resources to assist with integration, further simplifying the process. We also have integrations with memoQ and Blackbird.
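For illustration only, the sketch below shows what sending a segment pair to a QE endpoint could look like from Python. The URL, payload fields, and authentication header are placeholders, not the actual Estimate API specification; please refer to the official API documentation and developer resources for the real request format.

```python
# Hypothetical sketch of posting a source/target pair to a QE endpoint over HTTP.
# Endpoint, payload fields, and auth header below are illustrative placeholders only.
import json
import urllib.request

API_URL = "https://example.com/estimate"   # placeholder, not the real endpoint
API_KEY = "YOUR_API_KEY"                   # placeholder

payload = {
    "sourceLanguage": "en",
    "targetLanguage": "nl",
    "source": "The quick brown fox jumps over the lazy dog.",
    "target": "De snelle bruine vos springt over de luie hond.",
}

request = urllib.request.Request(
    API_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json",
             "Authorization": f"Bearer {API_KEY}"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    print(json.load(response))  # e.g. a quality score per segment
```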
Deployed as a cloud-based service, QE can be highly scalable, allowing it to handle varying workloads and accommodate growing demands by provisioning additional resources dynamically. Additionally, we apply advanced machine learning techniques, such as distributed training and inference, to further enhance scalability by enabling efficient processing of large datasets and rapid response times.
Quality standards and expectations are subjective and differ per use case, domain, and content type, so it is ultimately up to you to decide where to draw the line between good and bad quality. However, the TAUS NLP team offers the following guidelines for interpreting the scores of V2 of the generic model (a short sketch after the list shows one way to apply them):
Scores above 0.90 generally indicate good translations
0.88–0.90 is a gray area (can be good, might have issues)
Below 0.88 usually indicates at least minor errors
Below 0.80 suggests serious errors
Below 0.70 indicates very poor quality
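As a rough illustration, the sketch below maps these guideline thresholds onto labels in Python. The band names and cut-offs are one possible interpretation of the guidelines above, not a fixed standard; where you draw each line depends on your content and use case.

```python
# One way to turn the guideline thresholds for V2 of the generic model into labels.
# The band names are illustrative; adjust the cut-offs to your own use case.

def interpret_qe_score(score: float) -> str:
    """Map a QE score (0-1) onto the guideline bands described above."""
    if score >= 0.90:
        return "good"                    # generally a good translation
    if score >= 0.88:
        return "gray area"               # can be good, might have issues
    if score >= 0.80:
        return "minor errors likely"
    if score >= 0.70:
        return "serious errors likely"
    return "very poor"

for s in (0.95, 0.89, 0.85, 0.75, 0.60):
    print(s, "->", interpret_qe_score(s))
```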
The reliability of QE scores varies depending on the context and model used. QE scores are approximations, derived from mathematical representations of sentences, that aim to indicate how closely a translation matches its source in meaning. Generic models trained on vast multilingual data provide a broad understanding but may require human interpretation to correlate scores with human judgment. Customizing QE models offers flexibility: they can be tailored to specific domains and language pairs, improving adaptability and certainty in score interpretation. Options for QE score categorization range from discrete labels such as "poor", "average", and "good" to continuous values, with custom models offering finer control over categorization based on labeled training data.
Quality Estimation scores should not be confused with the scores generated by Translation Memory software. Full and fuzzy matches do not exist within the context of Quality Estimation, simply because there are no reference translations. Scores are generated based on models that are trained to rate both the accuracy and the fluency of the translations. A perfect translation (or 100% match) therefore does not occur with a QE model.
Quality Estimation models can be somewhat stricter with regard to translation accuracy. If, for instance, a human translator or post-editor chooses (for stylistic reasons) to skip a word in the translation or to use a collective noun instead of a specific noun, the Estimate API will return a lower score. A change in the syntax of a sentence is usually also penalized by the Estimate API.
Training custom QE models typically requires labeled data that consists of pairs of source and target sentences and their corresponding quality scores or labels. These quality scores can be human judgments indicating the perceived quality or fluency of translations. If labeled data is not available, we apply synthetic data generation techniques to augment the available data, either provided by the customer, or taken from the TAUS Data Repository.
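For illustration, the snippet below sketches what such labeled data could look like. The field names, example sentences, and the 0–1 scoring scale are assumptions made for this example, not a required format.

```python
# Illustrative example of the kind of labeled data a custom QE model is trained on:
# source/target sentence pairs with a human quality judgment. Field names and the
# 0-1 scoring scale are assumptions for illustration only.

training_examples = [
    {"source": "Press the power button to turn on the device.",
     "target": "Druk op de aan/uit-knop om het apparaat in te schakelen.",
     "score": 0.95},   # accurate, fluent translation
    {"source": "Press the power button to turn on the device.",
     "target": "Druk op de knop om het apparaat uit te schakelen.",
     "score": 0.40},   # meaning error: "turn off" instead of "turn on"
]
```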
A customized QE model tailored to a specific domain or dataset often yields more accurate predictions compared to a generic model. This is because it can leverage domain-specific features, nuances, and patterns that may not be captured effectively by a generic model. Consequently, the customized model can provide more relevant and precise insights, leading to improved decision-making and performance within its designated domain or context.
Data analysis and cleaning: if the customer has provided a training dataset, we analyze it to identify any inconsistencies, missing data, or other issues that need to be resolved before training.
Synthetic data generation: we generate synthetic data to augment the existing dataset and generate negative examples that are essential for optimal model performance.
Training: we fine-tune the QE model using the cleaned dataset.
Testing: we test the model's performance on a held-out test set to evaluate its accuracy and identify any areas for improvement (a minimal sketch of this step follows below).
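As a minimal illustration of the testing step, the sketch below compares model predictions on a held-out set with human judgments using Pearson correlation. The metric choice and the scores shown are illustrative assumptions, not the actual evaluation protocol used for a given custom model.

```python
# Minimal sketch of the testing step: compare model predictions on a held-out set
# against human quality judgments. Pearson correlation is one example metric;
# the scores below are made-up illustrative values.
from statistics import correlation  # requires Python 3.10+

human_scores = [0.95, 0.40, 0.88, 0.72, 0.60]   # held-out human judgments (illustrative)
model_scores = [0.93, 0.45, 0.85, 0.70, 0.55]   # model predictions on the same segments

print(f"Pearson r on held-out set: {correlation(human_scores, model_scores):.3f}")
```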
TAUS does not store any data that is sent through the API. Metadata, such as language combinations and quality scores, is stored so that users can gain insights into their quality levels over time, per language pair, per model, etc. through the Reports section of their TAUS account.
We have a full legal framework in place that can be found here.