This report presents the results of a benchmarking exercise. It outlines our testing methodology and the training process, then summarizes the key insights gained from analyzing the correlation between QE scores and human evaluations.
Contents of the Report
The Setup - Defining the Benchmarking Framework and Evaluation Process
The Results - Correlation between QE Scores and Human Labels
Key Takeaways - Strengths & Limitations
Case Studies