The Language Technology Lab carries out research in the field of Natural Language Processing.
We strongly believe that engineering is a key part of research in this field and that often a new insight is only to be found when re-implementing an approach. We are especially interested in analyzing and processing non-standard, error-prone language as found in social media and learner language.
Consequently, we mainly focus on two areas of specialization:
Educational NLP: Short answer scoring, Essay scoring, Vocabulary Acquisition, Spelling and grammar correction
Social Media Analysis: Robustness of tools, Domain adaptation, Large-scale semantic processing
Areas
- Educational NLP
Educational Language Technology
We mainly focus on these research areas of educational language technology:
Difficulty prediction:
- C-tests
• Lisa Beinborn, Torsten Zesch and Iryna Gurevych: Predicting the Difficulty of Language Proficiency Tests
- Texts / CEFR
• Ildikó Pilán, Elena Volodina and Torsten Zesch: Predicting proficiency levels in learner writings by transferring a linguistic complexity model from expert-written coursebooks
Vocabulary:
- Lexical Recognition Tests (Arabic)
• Osama Hamed and Torsten Zesch: Automatic Diacritization as Prerequisite Towards the Automatic Generation of Arabic Lexical Recognition Tests
Scoring and analysis of texts:
- Toolkit
• Torsten Zesch and Andrea Horbach: ESCRITO - An NLP-Enhanced Educational Scoring Toolkit
- Essays
Quality:
• Andrea Horbach, Dirk Scholten-Akoun, Yuning Ding and Torsten Zesch: Fine-grained essay scoring of a complex writing task for native speakers
• Torsten Zesch, Michael Wojatzki and Dirk Scholten-Akoun: Task-Independent Features for Automated Essay Grading
- Short answers
General:
• Andrea Horbach and Torsten Zesch: The Influence of Variance in Learner Answers on Automatic Content Scoring
• Brian Riordan, Andrea Horbach, Aoife Cahill, Torsten Zesch and Chong Min Lee: Investigating neural architectures for short answer scoring
Chinese:
• Yuning Ding, Andrea Horbach, Haoshi Wang, Xuefeng Song and Torsten Zesch: Chinese Content Scoring: Open-Access Datasets and Features on Different Segmentation Levels
Cross-lingual:
• Andrea Horbach, Sebastian Stennmanns and Torsten Zesch: Cross-lingual Content Scoring
- Reading and listening comprehension
under review ...
- Clustering for scoring support
• Torsten Zesch, Michael Heilman and Aoife Cahill: Reducing Annotation Efforts in Supervised Short Answer Scoring
- Robustness (Spelling, Adversarials)
• Yuning Ding, Brian Riordan, Andrea Horbach, Aoife Cahill and Torsten Zesch: Don't take "nswvtnvakgxpm" for an answer - The surprising vulnerability of automatic content scoring systems to adversarial input
• Andrea Horbach, Yuning Ding and Torsten Zesch: The Influence of Spelling Errors on Content Scoring Performance
Language test generation:
- Gap-fill bundles
• Niklas Meyer, Michael Wojatzki and Torsten Zesch: Validating Bundled Gap Filling – Empirical Evidence for Ambiguity Reduction and Language Proficiency Testing Capabilities
- C-tests
• Torsten Zesch, Andrea Horbach, Melanie Goggin and Jennifer Wrede-Jackes: A flexible online system for curating reduced redundancy language exercises and tests
- Reading comprehension
• Andrea Horbach, Itziar Aldabe, Marie Bexte, Oier Lopez de Lacalle and Montse Maritxalar: Appropriateness and Pedagogic Usefulness of Reading Comprehension Questions
- Challenging distractors
• Torsten Zesch and Oren Melamud: Automatic Generation of Challenging Distractors Using Context-Sensitive Inference Rules
Feedback (work in progress)
- Content-related feedback for short answers
- Highlighting and statistical analysis of linguistic features
- Spelling errors of primary school students
(Other) modalities
- Handwriting
• Christian Gold and Torsten Zesch: Exploring the Impact of Handwriting Recognition on the Automated Scoring of Handwritten Student Answers
- Picture/Text (work in progress)
• Scoring of picture description tasks
- Spoken language (work in progress)
• Short-answer scoring (spoken answers)
• Assessment of language proficiency in spoken language
- Social Media
Social Media Analysis
We mainly focus on these research areas of social media analysis:
- Robustness of tools
- Domain adaptation
- Large-scale semantic processing
- Stance detection
- Hate speech detection
- Paraphrase and entailment recognition
- Robust and scalable preprocessing
- NLP Engineering
NLP engineering
We are committed to reproducible and replicable research. Thus, we develop and maintain multiple open-source software projects:
- DKPro
- JWPL
- Cyber-Physical Systems
Human-centered Cyber-Physical Systems
The Language Technology Lab is part of the research profile Human-centered Cyber-physical Systems in the Faculty of Engineering.
Language plays an important role in cyber-physical systems as a means of communication between humans and such systems. Thus, it is necessary to better understand how machines can automatically understand language. On the one hand, we need to analyze the structure of the language, e.g. by automatically identifying POS tags. On the other hand, we need to semantically analyze a given statement and to contextualize it within a given interaction. For this purpose, it is necessary to better understand the role of the language users in a communication process.
Ongoing
- User-Centered Social Media (DFG Research Training Group 2015-2020)
User-Centred Social Media
The Research Training Group “User-Centred Social Media” (UCSM) is an interdisciplinary Research Training Group (Graduiertenkolleg) at the Department of Computer Science and Applied Cognitive Science of the University of Duisburg-Essen. The programme is funded by the DFG and started on October 1, 2015.
The emergence of Social Media marks a significant step in the application of information and communication technology with a profound impact on people, businesses, and society. Social Media constitute complex sociotechnical systems, encompassing potentially very large user groups, both in public and organizational contexts, and exhibiting features such as user-generated content, social interaction and awareness, and emergent functionality. While Social Media use is widespread and increasing, significant research gaps exist with respect to analyzing and understanding the characteristics and determinants of user behaviour, both at the individual and the collective level, as well as regarding the user-centered design of Social Media systems, aiming at empowering users to better appropriate, control and adapt systems for their individual goals. There is a growing demand in academia and in industry for scientifically trained experts that are knowledgeable both in the human-oriented and the technical aspects of Social Media.
More information can be found at the User-Centred Social Media homepage.
- Argument-Based Decision Support for Recommender Systems (DFG SPP RATIO 2018-2020)
Argument-Based Decision Support for Recommender Systems (ASSURE)
ASSURE is a project within the SPP RATIO.
Argumentative statements contained in user-generated texts such as online product reviews can significantly facilitate a user’s decision. Recommender systems aim at alleviating the user’s decision problem by suggesting items the user is likely interested in, but do not exploit the potential of reasoned arguments given for or against a certain item or its properties. The overall objective of the ASSURE project is to make use of arguments embedded in online reviews to significantly improve the quality and transparency of recommendations given by the system, and to provide users with a much higher level of interactive control over the recommendation process than is currently the case. The project aims at advancing the state of the art in several respects: Firstly, we will develop novel methods for extracting arguments from the typically informal texts found in user reviews. We will further enrich the arguments with annotations of how specific and how emotionally intense they are.
Secondly, we will combine the extracted arguments and the additional annotations with user ratings and other item-related data in an integrated user and item model to improve the effectiveness of recommender algorithms. This model will also provide a basis for developing novel techniques through which users can interactively explore, filter, or weight different arguments, as well as other data, to control how recommendations are generated. Thirdly, we will develop methods for providing users with personalized, argument-based explanations of the items recommended. A further important outcome of the project will be a dataset of unprecedented quality and size that is annotated on different layers regarding argumentation. Such a dataset is a prerequisite for further research on argumentation in the context of recommending, and will be suited for use in shared tasks that form part of the priority program.
More information can be found at the ASSURE Homepage.
- Automatic Scoring of Free-Text Answers (TestDaF 2019-2020)
The TestDaF Institute is one of the biggest providers of language proficiency testing for German as a foreign language. LTL collaborates with the TestDaF Institute on automatic and assisted scoring of free-form answers, more precisely answers given to listening comprehension prompts and learner essays. We explore ways to reduce the scoring workload of human graders and to ensure fast and consistent scoring of free-text answers.
- Bildungsgerechtigkeit im Fokus II (BMBF 2016-2020)
Within the project Bildungsgerechtigkeit im Fokus, we are part of subproject 2 (Teilprojekt 2): Blended learning.
- Sustainability of research software (2018-2020)
Reproducibility of experiments is a key requirement of scientific work. With DKPro Core and DKPro Text Classification, we are working towards improved reproducibility of software experiments.
We received three years of funding from the DFG to further improve DKPro Core and DKPro Text Classification as landmark tools fit for conducting scientific experiments.
- Exploration of digital technologies in public employment services using the example of text mining (2019-2020)
Employment is an essential part of participation in society. Public employment services play an important role here. They are often directly and indirectly involved in the initiation of new employment relationships. This is particularly the case where job seekers cannot find employment on their own.
The planned project examines how the use of digital technologies changes work and organization in employment agencies and job centers. The study uses text mining as a concrete application example of digital technologies. In an interdisciplinary collaboration between sociology and computer science, changes in the work and organization of job placement depending on different scenarios of digitization are examined.
- Hate Speech Research Overview
At the Language Technology Lab, we have conducted multiple research projects on hate speech. Our research interests lie in automating hate speech detection and in how automated detection can be made better and more reliable. To this end, we have examined definitional and linguistic challenges and assessed how gender can play a role in hate speech detection. We have explored the development of monolingual and multilingual classification systems that can be used to identify and categorize offensive language on social media. We have developed a classification system to distinguish between free speech and language constituting a criminal offense. While our research focuses on the technical aspects of hate speech, we aim to take a more holistic approach by also taking social and legal aspects into consideration.
Hate Speech Definitions
This research examines how reliability of hate speech annotations can be achieved based on the first German hate speech corpus on refugees. Our results suggest that detailed instructions for the annotation could be more useful than considering hate speech as a binary yes or no decision.
- B. Ross et al., Measuring the Reliability of Hate Speech Annotations: The Case of the European Refugee Crisis. In Proceedings of NLP4CMC III: 3rd Workshop on Natural Language Processing for Computer-Mediated Communication (Michael Beißwenger, Michael Wojatzki, Torsten Zesch, eds.), 2016. https://arxiv.org/pdf/1701.08118.
Significance of Implicitness in Hate Speech
The research explores whether implicitness affects the perception of hate speech. Our findings suggest that it is crucial to take implicitness into account when developing automated hate speech detection systems.
- Benikova, D., Wojatzki, M., & Zesch, T. (2017). What does this imply? Examining the Impact of Implicitness on the Perception of Hate Speech. In GSCL 2017, Berlin, Germany. https://link.springer.com/chapter/10.1007/978-3-319-73706-5_14
Hate Speech towards Women
This study explores whether there is a relationship between the perception of hate speech and gender by asking female and male subjects to judge 400 assertions targeting women. The objective of the research is to find out whether being part of the targeted group or personally agreeing with an assertion influences how hate speech is perceived.
- Gold, D., Wojatzki, M., Horsmann, T., & Zesch, T. (2018) Do Women Perceive Hate Differently: Examining the Relationship Between Hate Speech, Gender, and Agreement Judgments. In KONVENS.
https://www.oeaw.ac.at/fileadmin/subsites/academiaecorpora/PDF/konvens18_13.pdf
Hate Speech Detection Systems
We participated in SemEval 2019 with two contributions. The first was a system for detecting hate speech in multilingual posts (Task 5); the second addressed the identification and categorization of offensive language on social media (Task 6).
- Zhang, H., Wojatzki, M., Horsmann, T., & Zesch, T. (2019). ltl.uni-due at SemEval-2019 Task 5: Simple but Effective Lexico-Semantic Features for Detecting Hate Speech in Twitter. In SemEval 2019. https://www.aclweb.org/anthology/S19-2078.pdf
- Aggarwal, P., Horsmann, T., Wojatzki, M., & Zesch, T. (2019). LTL-UDE at SemEval-2019 Task 6: BERT and Two-Vote Classification for Categorizing Offensiveness. In SemEval 2019.
https://www.aclweb.org/anthology/S19-2121.pdf
Classification of Criminal Offenses
We have developed an automated classification system to determine which Twitter posts would constitute a criminal offense under German criminal law, using a data annotation schema that consists of a series of binary decisions. Our findings suggest that the majority of posts fall under the category of morally offensive but do not constitute a criminal offense.
- Zufall, F., Horsmann, T., & Zesch, T. (2019). From Legal to Technical Concept: Towards an Automated Classification of German Political Twitter Postings as Criminal Offenses. In NAACL. https://www.aclweb.org/anthology/N19-1135.pdf
- Code and Data
Fast or Accurate? A Comparative Evaluation of PoS Tagger Models
Part-of-Speech tagging is an important preprocessing step for many applications in Natural Language Processing. This importance is reflected in the many PoS tagger implementations available today. Which one do you use? Are you sure it is the best choice for your needs?
When choosing a PoS tagger, there are two properties that should influence your choice:
Speed and accuracy. Big Data scenarios put a stronger focus on speed than the Digital Humanities, where speed is often of minor importance.
Besides the well-known Stanford PoS tagger and the TreeTagger, there are many more alternatives. Each implementation often provides more than one model, so which is the best?
Experimental Setting
We evaluated a total of 27 models for 9 different PoS tagger implementations. The tagger implementations are listed below; we evaluated them on two languages, English and German.
For English, we evaluated each tagger model on the following corpora: British National Corpus, Brown, Gimpel, MASC, Switchboard. For German, we evaluated on Tüba-D/Z and Rehbein.
We excluded the Wall Street Journal corpus for English and the Tiger and Negra corpora for German, as many models have been trained on them. For each language, we evaluate every PoS tagger model on every corpus and additionally measure the tagging runtime. The timer starts immediately before the tagger is called and stops right after it returns. The figure below shows the workflow of our experiment.
To overcome the differences in the tagsets of the various corpora, we harmonised the tags to a coarse-grained tagset consisting of eleven tags.
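The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the `COARSE_MAP` entries, the `harmonise` and `evaluate` names, and the fallback label `OTHER` are all hypothetical, and the mapping covers only a handful of tags rather than the eleven coarse tags mentioned above (which are not listed here). A tagger is modelled simply as a function from a token list to a tag list.

```python
import time
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical, partial mapping from corpus-specific tags to coarse tags.
# The actual eleven-tag set is not given in the text; these entries are
# illustrative placeholders only.
COARSE_MAP: Dict[str, str] = {
    "NN": "NOUN", "NNS": "NOUN", "NE": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB", "VVFIN": "VERB",
    "JJ": "ADJ", "ADJA": "ADJ",
    "DT": "DET", "ART": "DET",
}

Tagger = Callable[[Sequence[str]], List[str]]
Sentence = Tuple[List[str], List[str]]  # (tokens, gold tags)


def harmonise(tag: str) -> str:
    """Map a fine-grained tag to its coarse tag; unknown tags become OTHER."""
    return COARSE_MAP.get(tag, "OTHER")


def evaluate(tagger: Tagger, corpus: Sequence[Sentence]) -> Tuple[float, float]:
    """Return (accuracy on harmonised tags, seconds spent inside the tagger)."""
    correct = 0
    total = 0
    elapsed = 0.0
    for tokens, gold_tags in corpus:
        # Time only the tagger call itself: start right before, stop right after.
        start = time.perf_counter()
        predicted = tagger(tokens)
        elapsed += time.perf_counter() - start
        # Compare predictions and gold tags after harmonisation.
        for pred, gold in zip(predicted, gold_tags):
            correct += harmonise(pred) == harmonise(gold)
            total += 1
    accuracy = correct / total if total else 0.0
    return accuracy, elapsed


if __name__ == "__main__":
    # Toy corpus and toy tagger, purely for demonstration.
    toy_corpus = [(["the", "dog", "barks"], ["DT", "NN", "VBZ"])]
    noun_tagger = lambda tokens: ["NN"] * len(tokens)
    acc, secs = evaluate(noun_tagger, toy_corpus)
    print(f"accuracy={acc:.2f}, runtime={secs:.6f}s")
```

Comparing tags only after harmonisation means that a tagger trained on a different tagset than the evaluation corpus is not penalised for label differences alone, which is the purpose of the coarse tagset.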
Results
The samples highlighted in red are the ones showing the best speed/accuracy combination. The surprising winner is the rule-based Hepple tagger.
The most accurate German tagger is the TreeTagger. HunPos offers a reasonable trade-off between speed and accuracy. We currently do not have a rule-based tagger for German to test whether the results of the Hepple tagger transfer to German.
How to cite us?
Horsmann, Tobias; Erbs, Nicolai; Zesch, Torsten (2015): Fast or Accurate? – A Comparative Evaluation of PoS Tagging Models. Proceedings of the International Conference of the German Society for Computational Linguistics and Language Technology (GSCL-2015), Essen, Germany.
Past
- Semi-automatic generation of reading comprehension questions (Stifterverband 2018-2019)
The goal of this project is to improve the provision and integration of reading comprehension tests. We aim to motivate students to study the literature that is relevant to the lecture. To this end, different types of tests (free text, multiple choice, fill-in-the-blank, etc.) are automatically generated by state-of-the-art language technology and curated by the teachers.
The tasks will be varied according to the curricula we teach and can be directly adopted and used for evaluation. We will release the required software as open source, and its modular design allows straightforward integration into existing teaching processes.
The project has high potential for transfer to other disciplines and teaching formats. After all, wherever source texts are available, reading comprehension tests can be generated for them, which also reduces the amount of manual work.
For more information, please visit FELLOWSHIPS HOCHSCHULLEHRE - FELLOWS 2017
- CAPE - Computer-assisted Programming Exercises (UDE 2018)
For the lectures "Fundamental Artificial Intelligence" (about 250 students) and "Language Technology" (about 50 students), we will prepare programming tasks to improve the students' programming skills. Additionally, to lower the barrier to entry, we use a pre-configured system for the programming tasks.
Link to the website coming soon.
- INDUS - Individualized Language Learning (DFG 2014-2018)
Indus Network
Individualized language learning as a counterpart to standardized classes is now just around the corner due to new developments in the field of language technology. Thus, not only commonly spoken languages but also languages with a smaller number of native speakers can be learned. It is becoming apparent, however, that embedding those technologies into real learning environments gives rise to new questions, which can only be answered within the framework of interdisciplinary research.
The INDUS-Network ("Individualisiertes Sprachlernen" / "Personalized language learning") unites experts in the fields of language technology, linguistics, educational research, psychology of learning, pedagogical psychology, language acquisition research, and didactics of language learning.
Those experts work together on aspects of individualization, modeling of learners, and adjustments of teaching materials to different initial situations.
More information can be found on the INDUS-Network homepage.
- German-Arab Transformation Partnership (DAAD 2016)
More information can be found on the webpage of the workshop in Tunisia.