DIN 19461:2026-06
Language resources and language technology - Derived text formats (DTF) / Note: Date of issue 2026-05-01
| Edition date: | 2026-06-01 (in force) |
|---|---|
| Available languages: | German |
| Abstract: | Derived text formats are abstracted representations of an original text that remove copyrighted content but retain relevant information for text and data mining (TDM). Examples are word lists or N-grams. They enable legally compliant research, transparency and reusability. One area of application for derived text formats is the development and improvement of Large Language Models (LLMs). This document sets out general principles for derived text formats as such and for their creation and provision. Based on this, analysis procedures can then be adapted to the derived text formats. By using this document, the limits of the analysis procedures, e.g. for the analysis of protected works, can be named and described. The aim of these principles is to make the use of text collections more legally secure and sustainable, especially in the case of protected works, to facilitate cooperation, to create trust and to open up new possibilities for the use of modern analysis methods. |
| Keywords: | Analysis, Copyright, Data, Data mining, Definitions, Derived, Information, Information management, Information processing, Information sciences, Languages, Methods, Methods of analysis, Natural language systems, Procedures, Resources, Technology, Terminology, Text, Text processing |
| ICS: | 01.140.20 - Information sciences, 01.020 - Terminology (principles and coordination) |
| CTN: | |
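
The abstract names word lists and N-grams as examples of derived text formats. As a rough illustration only, the sketch below shows one way such representations might be computed from a text. It is not taken from DIN 19461; the function names (`tokenize`, `word_list`, `ngram_counts`), the tokenization rule, and the choice of `n` are all assumptions made for this example, and the sketch says nothing about whether a given output satisfies the standard or any legal requirement.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"\w+", text.lower())


def word_list(tokens: list[str]) -> list[tuple[str, int]]:
    """Return (word, frequency) pairs sorted alphabetically.

    The original word order is discarded, so the running text
    cannot be read back from this representation.
    """
    return sorted(Counter(tokens).items())


def ngram_counts(tokens: list[str], n: int = 2) -> Counter:
    """Count contiguous n-token sequences (hypothetical helper; n is an assumption)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    tokens = tokenize("The cat sat. The cat ran.")
    print(word_list(tokens))
    print(ngram_counts(tokens, n=2).most_common(1))
```

Both outputs keep frequency information useful for text and data mining while dropping the full sequence of the source text. Smaller values of `n` retain less of the original wording.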










