We are committed to reproducible and replicable research.
Thus, we generally make all research software publicly available. We are also involved in the development of several language technology tools.
We give only a short overview here; please refer to the project websites for more information.
Code - Tools
- c-test builder is a web-based tool that allows the easy creation of c-test style language tests.
- DKPro Core is a collection of software components for natural language processing (NLP) based on the Apache UIMA framework.
- DKPro Lab is a lightweight framework for parameter sweeping experiments.
- DKPro Similarity is an open source framework for word and text similarity.
- DKPro TC is a UIMA-based text classification framework built on top of DKPro Core and DKPro Lab.
- DKPro Toolbox aims to provide simplified access to linguistic processing of text in a Java environment.
- escrito is a toolkit for scoring student writings using NLP techniques that addresses two main user groups: teachers and NLP researchers.
- jWeb1T is an open source Java tool for efficiently searching n-gram data in the Web 1T 5-gram corpus format.
- JWPL (Java Wikipedia Library) is a free, Java-based application programming interface that provides access to all information in Wikipedia.
Code - Experiments
- Diacritization (link is coming soon)
- “Fast or Accurate?” A Comparative Evaluation of PoS Tagger Models
- “LRT English” (link is coming soon)
- ASAP spelling: This repository contains gold-standard spelling correction annotations for the test data section of the ASAP short answer scoring corpus.
- Diacritization Data (Quran, RDI & Tashkeela) (link is coming soon)
- Hate speech datasets: Ross et al., Benikova et al., and Wojatzki et al.
- Semantic Relatedness Datasets