PHASE #3: Datasheets for Machine Translation

As mentioned in the previous post, this template is inspired by Datasheets for Datasets [1], which our research team has extended to cover Machine Translation (MT) (Phase 1). To show the community the possible advantages of such documentation, we provide examples for well-known datasets in the Computational Linguistics field (Phase 2).

3.1 Machine Translation template

Below, we attach the Machine Translation template, which can be freely accessed and used.

Datasheet_for_Dataset_Template.pdf

Questions that have been taken from Datasheets for Datasets [1] can be found in the previous post (Phase 1). The questions that our team proposed concerning MT can also be found in the same post (Phase 2). Moreover, we offer an explanation of why we consider these questions to be a fundamental part of the Datasheet.

3.2 Datasheet examples

The research team provides these first examples to inspire the community to complete and publish a Datasheet for Machine Translation for datasets that have been released or are about to be released.

  • Europarl v10

Europarl_v10_Datasheet.pdf

  • News-Commentary v15

News_v15_Datasheet.pdf

References

[1] Datasheets for Datasets

Written on April 27, 2020