Computational Linguistics in the Netherlands 30

09:30 – 10:30

Semantics I – Drift 21 0.32 (chair: Els Lefever)

09:30 – 09:50

Evaluating the consistency of word embeddings from small data
Jelke Bloem¹, Antske Fokkens², Aurélie Herbelot³
¹University of Amsterdam, ²VU Amsterdam, ³University of Trento

09:50 – 10:10

Type-Driven Composition of Word Embeddings in the age of BERT
Gijs Wijnholds
Utrecht University

10:10 – 10:30

A diachronic study on the compositionality of English noun-noun compounds using vector-based semantics
Prajit Dhar¹, Janis Pagel², Lonneke van der Plas³, Sabine Schulte im Walde⁴
¹University of Groningen, ²Institute for Natural Language Processing, University of Stuttgart, ³University of Malta, ⁴University of Stuttgart

09:30 – 10:30

Machine Learning – Drift 21 1.05 (chair: Marco Spruit)

09:30 – 09:50

Investigating The Generalization Capacity Of Convolutional Neural Networks For Interpreted Languages
Daniel Bezema and Denis Paperno
Utrecht University

09:50 – 10:10

Predicting the number of citations of scientific articles with shallow and deep models
Gideon Maillette de Buy Wenniger¹, Herbert Teun Kruitbosch², Lambert Schomaker¹, Valentijn A. Valentijn³
¹Autonomous Perceptive Systems group – Bernoulli Institute, University of Groningen, ²University of Groningen, ³Kapteyn Astronomical Institute, University of Groningen

10:10 – 10:30

A Non-negative Tensor Train Decomposition Framework for Language Data
Tim Van de Cruys
IRIT & CNRS

09:30 – 10:30

Text Analytics I – Drift 25 1.02 (chair: Ineke Schuurman)

09:30 – 09:50

Language features and social media metadata for age prediction using CNN
Abhinay Pandya¹, Mourad Oussalah¹, Paola Monachesi², Panos Kostakos¹
¹University of Oulu, ²Utrecht University

09:50 – 10:10

EventDNA: Identifying event mention spans in Dutch-language news text
Camiel Colruyt, Orphée De Clercq, Véronique Hoste
LT3, Ghent University

10:10 – 10:30

Cross-context News Corpus of Protest Events
Ali Hürriyetoğlu, Erdem Yoruk, Deniz Yuret, Osman Mutlu, Burak Gurel, Cagri Yoltar, Firat Durusan
Koç University

09:30 – 10:30

Syntax & Parsing I – Drift 21 0.05 (chair: Michael Moortgat)

09:30 – 09:50

Linguistic enrichment of historical Dutch using deep learning
Silke Creten¹, Peter Dekker², Vincent Vandeghinste³
¹KU Leuven, ²Vrije Universiteit Brussel & Instituut voor de Nederlandse Taal, ³Instituut voor de Nederlandse Taal

09:50 – 10:10

Resolution of morphosyntactic ambiguity in Russian with two-level linguistic analysis
Uliana Petrunina
University of Tromsø

10:10 – 10:30

Task-specific pretraining for German and Dutch dependency parsing
Daniël de Kok and Tobias Pütz
University of Tübingen

09:30 – 10:30

Sentiment Analysis – Drift 21 1.09 (chair: Kalliopi Zervanou)

09:30 – 09:50

An unsupervised aspect extraction method with an application to Dutch book reviews
Stephan Tulkens¹ and Andreas van Cranenburgh²
¹CLiPS, University of Antwerp, ²University of Groningen

09:50 – 10:10

Improving Pattern.nl sentiment analysis
Lorenzo Gatti and Judith van Stegeren
Human Media Interaction, University of Twente

10:10 – 10:30

Dutch language polarity analysis on reviews and cognition description datasets
Gerasimos Spanakis and Josephine Rutten
Maastricht University

11:00 – 12:00

Keynote: Multilingual Dependency Parsing: From Universal Dependencies to Sesame Street – Drift 21 0.32 (chair: Jan Odijk)

While research on dependency parsing has always had a strong multilingual orientation, the lack of standardized annotations for a long time made it difficult both to meaningfully compare results across languages and to develop truly multilingual systems. The Universal Dependencies project has during the last five years tried to overcome this obstacle by developing cross-linguistically consistent morphosyntactic annotation for many languages. During the same period, dependency parsing (like the rest of NLP) has been transformed by the adoption of continuous vector representations and neural network techniques. In this talk, I will introduce the framework and resources of Universal Dependencies, and discuss advances in multilingual dependency parsing enabled by these resources in combination with deep learning techniques, ranging from traditional word and character embeddings to deep contextualized word representations like ELMo and BERT.
Joakim Nivre

12:00 – 13:45

Poster – Drift 21 0.03

BLISS: A collection of Dutch spoken dialogue about what makes people happy
Jelte van Waterschoot¹, Iris Hendrickx², Arif Khan², Marcel de Korte³
¹University of Twente, ²Radboud University Nijmegen, ³ReadSpeaker

Dutch Anaphora Resolution: A Neural Network Approach towards Automatic die/dat Prediction
Liesbeth Allein¹, Artuur Leeuwenberg², Marie-Francine Moens¹
¹KU Leuven, ²Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht

Computational Model of Quantification
Guanyi Chen and Kees van Deemter
Utrecht University

Parallel corpus annotation and visualization with TimeAlign
Martijn van der Klis and Ben Bonfil
UiL OTS, Utrecht University

Towards a Dutch FrameNet lexicon and parser using the data-to-text method
Gosse Minnema¹ and Levi Remijnse²
¹University of Groningen, ²VU University Amsterdam

The merits of Universal Language Model Fine-tuning for Small Datasets – a case with Dutch book reviews
Benjamin van der Burgh and Suzan Verberne
LIACS, Leiden University

Low-Resource Unsupervised Machine Translation using Dependency Parsing
Lukas Edman, Gertjan van Noord, Antonio Toral
University of Groningen

Relation extraction for images using the image captions as supervision
Xue Wang¹, Youtian Du¹, Suzan Verberne², Fons J. Verbeek²
¹Xi’an Jiaotong University, ²LIACS, Leiden University

Evaluating and improving state-of-the-art named entity recognition and anonymisation methods
Chaïm van Toledo and Marco Spruit
Universiteit Utrecht

Automatic extraction of semantic roles in support verb constructions
Ignazio Mauro Mirto
Università di Palermo

Introducing CROATPAS: A digital semantic resource for Croatian verbs
Costanza Marini and Elisabetta Ježek
University of Pavia

SONNET: our Semantic Ontology Engineering Toolset
Maaike de Boer, Jack Verhoosel, Roos Bakker
TNO

Stylometric and Emotion-Based Features for Hate Speech Detection
Ilia Markov and Walter Daelemans
University of Antwerp, CLiPS

Evaluating an Acoustic-based Pronunciation Distance Measure Against Human Perceptual Data
Martijn Bartelds and Martijn Wieling
University of Groningen

Examination on the Phonological Rules Processing of Korean TTS
Hyeon-yeol Im
Chung-ang University

12:00 – 13:45

Poster – Drift 21 0.06

How far is “man bites dog” from “dog bites man”? Investigating the structural sensitivity of distributional verb matrices
Luka van der Plas
Utrecht University

The Effect of Vocabulary Overlap on Linguistic Probing Tasks for Neural Language Models
Prajit Dhar and Arianna Bisazza
University of Groningen

Towards automation of language assessment procedures
Sjoerd Eilander and Jan Odijk
Utrecht University, UiL-OTS

A Collection of Side Effects and Coping Strategies in Patient Discussion Groups
Anne Dirkson, Suzan Verberne, Wessel Kraaij
Leiden University

A replication study for better application of text classification in political science.
Hugo de Vos
Leiden University

Article omission in Dutch newspaper headlines
R. van Tuijl and Denis Paperno
Utrecht University

Innovation Power of ESN
Erwin Koens
HU University of Applied Sciences Utrecht

Multi-label ICD Classification of Dutch Hospital Discharge Letters
Ayoub Bagheri¹, Arjan Sammani², Daniel Oberski³, Folkert W. Asselbergs²
¹Department of Methodology and Statistics, Utrecht University, ²Department of Cardiology, Division of Heart and Lungs, University Medical Center Utrecht, ³Department of Methodology and Statistics, Faculty of Social Sciences, Utrecht University

Political self-presentation on Twitter before, during, and after elections: A diachronic analysis with predictive models
Harmjan Setz, Marcel Broersma, Malvina Nissim
University of Groningen

Text Processing with Orange
Erik Tjong Kim Sang¹, Peter Kok¹, Wouter Smink², Bernard Veldkamp², Gerben Westerhof², Anneke Sools²
¹Netherlands eScience Center, ²University of Twente

Whose this story? Investigating Factuality and Storylines
Tommaso Caselli, Marcel Broersma, Blanca Calvo Figueras, Julia Meyer
Rijksuniversiteit Groningen

Bootstrapping the extension of an Afrikaans treebank through gamification
Peter Dirix¹ and Liesbeth Augustinus²
¹Cerence, KU Leuven, ²CCL, KU Leuven

Starting a treebank for Ughele
Peter Dirix¹ and Benedicte Haraldstad Frostad²
¹Cerence, KU Leuven, ²Norwegian Language Council, Oslo

13:45 – 14:45

Semantics II – Drift 21 0.32 (chair: Malvina Nissim)

13:45 – 14:05

Comparing Frame Membership to WordNet-based and Distributional Similarity
Esra Abdelkareem
Debrecen University

14:05 – 14:25

Representing a concept by the distribution of names of its instances
Matthijs Westera¹, Gemma Boleda², Sebastian Padó³
¹Universitat Pompeu Fabra, ²ICREA / Universitat Pompeu Fabra, ³Universität Stuttgart

14:25 – 14:45

Semantic parsing with fuzzy meaning representations
Pavlo Kapustin¹ and Michael Kapustin²
¹University of Bergen, ²Moscow Institute of Physics and Technology

13:45 – 14:45

Text Analytics II – Drift 25 1.02 (chair: Paola Monachesi)

13:45 – 14:05

Annotating sexism as hate speech: the influence of annotator bias
Elizabeth Cappon¹,², Guy De Pauw¹,², Walter Daelemans²
¹TEXTGAIN, ²University of Antwerp

14:05 – 14:25

Accurate Estimation of Class Distributions in Textual Data
Erik Tjong Kim Sang¹, Kim Smeenk², Aysenur Bilgin³, Tom Klaver¹, Laura Hollink³, Jacco van Ossenbruggen³,⁴, Frank Harbers², Marcel Broersma²
¹Netherlands eScience Center, ²University of Groningen, ³CWI, ⁴VU

14:25 – 14:45

Interpreting Dutch Tombstone Inscriptions
Johan Bos
University of Groningen

13:45 – 14:45

Medical NLP – Drift 21 1.05 (chair: Thierry Declerck)

13:45 – 14:05

Extracting Drug, Reason, and Duration Mentions from Clinical Text Data: A Comparison of Approaches
Jens Lemmens, Simon Suster, Walter Daelemans
University of Antwerp, CLiPS

14:05 – 14:25

Dialogue Summarization for Smart Reporting: the case of consultations in health care.
Sabine Molenaar, Fabiano Dalpiaz, Sjaak Brinkkemper
Utrecht University

14:25 – 14:45

Natural Language Processing and Machine Learning for Classification of Dutch Radiology Reports
Prajakta Shouche¹ and Ludo Cornelissen²
¹University of Groningen, ²University Medical Center Groningen

13:45 – 14:45

Syntax & Parsing II – Drift 21 0.05 (chair: Gosse Bouma)

13:45 – 14:05

Frequency-tagged EEG responses to grammatical and ungrammatical phrases.
Amelia Burroughs, Nina Kazanina, Conor Houghton
University of Bristol

14:05 – 14:25

Detecting syntactic differences automatically using the minimum description length principle
Martin Kroon¹, Sjef Barbiers¹, Jan Odijk², Stéphanie van der Pas¹
¹Leiden University, ²Utrecht University

14:25 – 14:45

Complementizer Agreement Revisited: A Quantitative Approach
Milan Valadou
KU Leuven

14:45 – 16:15

Poster – Drift 21 0.03

Alpino for the masses
Joachim Van den Bogaert
K.U. Leuven

GrETEL @ INT: Querying Very Large Treebanks by Example
Vincent Vandeghinste and Koen Mertens
Instituut voor de Nederlandse Taal

Convergence in First and Second Language Acquisition Dialogues
Arabella Sinclair and Raquel Fernández
ILLC, University of Amsterdam

ExpReal: A Multilingual Expressive Realiser
Ruud de Jong¹, Nicolas Szilas², Mariët Theune¹
¹University of Twente, ²University of Geneva

IVESS: Intelligent Vocabulary and Example Selection for Spanish vocabulary learning
Jasper Degraeuwe and Patrick Goethals
Ghent University

Interlinking the ANW Dictionary and the Open Dutch WordNet
Thierry Declerck
DFKI GmbH

Psycholinguistic Profiling of Contemporary Egyptian Colloquial Arabic Words
Bacem Essam¹ and Ameni Mejri²
¹Peerwith, ²Debrecen University

BERT-NL: a set of language models pre-trained on the Dutch SoNaR corpus
Alex Brandsen¹, Anne Dirkson¹, Suzan Verberne¹, Maya Sappelli², Dungh Manh Chu³, Kimberly Stoutjesdijk³
¹Leiden University, ²FDMG / HAN, ³FD Mediagroep

Dialect-aware Tokenisation for Translating Arabic User Generated Content
Pintu Lohar¹, Haithem Afli², Andy Way³
¹Dublin City University, ²ADAPT Centre, Cork Institute of Technology, ³ADAPT Centre, Dublin City University

Literary MT under the magnifying glass: Assessing the quality of an NMT-translated Agatha Christie novel.
Margot Fonteyne, Arda Tezcan, Lieve Macken
LT3, Ghent University, Belgium

WordNet, occupations and natural gender
Ineke Schuurman¹, Vincent Vandeghinste², Leen Sevens¹
¹KU Leuven, ²Instituut voor de Nederlandse Taal

Mark my Word: A Sequence-to-Sequence Approach to Definition Modeling
Timothee Mickus¹, Denis Paperno², Mathieu Constant³
¹Université de Lorraine, ATILF, ²Utrecht University, ³Université de Lorraine, CNRS, ATILF

Nederlab Word Embeddings
Martin Reynaert
KNAW Meertens Institute / Tilburg University

Neural Semantic Role Labeling Using Deep Syntax for French FrameNet
Tatiana Bladier¹ and Marie Candito²
¹Heinrich Heine University of Düsseldorf, ²LLF (Univ Paris Diderot / CNRS)

14:45 – 16:15

Poster – Drift 21 0.06

Acoustic speech markers for psychosis
Janna de Boer¹, Alban Voppel², Frank Wijnen³, Iris Sommer²
¹UMC Utrecht, ²UMC Groningen, ³UiL OTS

Social media candidate generation as a psycholinguistic task
Stephan Tulkens, Dominiek Sandra, Walter Daelemans
University of Antwerp, CLiPS

Evaluating Language-Specific Adaptations of Multilingual Language Models for Universal Dependency Parsing
Ahmet Üstün, Arianna Bisazza, Gosse Bouma, Gertjan van Noord
University of Groningen

SPOD: Syntactic Profiler of Dutch
Gertjan van Noord, Jack Hoeksema, Peter Kleiweg, Gosse Bouma
University of Groningen

HAMLET: Hybrid Adaptable Machine Learning approach to Extract Terminology
Ayla Rigouts Terryn, Veronique Hoste, Els Lefever
LT3, Ghent University

How Similar are Poodles in the Microwave? Classification of Urban Legend Types
Myrthe Reuver
Radboud University Nijmegen

Identifying Predictors of Decisions for Pending Cases of the European Court of Human Rights
Masha Medvedeva, Michel Vols, Martijn Wieling
University of Groningen

Rightwing Extremism Online Vernacular: Empirical Data Collection and Investigation through Machine Learning Techniques
Pierre Voué
Textgain

SnelSLiM: a webtool for quick stable lexical marker analysis
Bert Van de Poel
KU Leuven

Syntactic, semantic and phonological features of speech in schizophrenia spectrum disorders; a combinatory classification approach.
Alban Voppel¹,², Janna de Boer¹,², Hugo Schnack²,³, Iris Sommer¹
¹UMC Groningen, ²UMC Utrecht, ³Universiteit Utrecht

Tracing thoughts – application of “ngram tracing” on schizophrenia data
Lisa Becker¹ and Walter Daelemans²
¹University of Potsdam, ²Universiteit Antwerpen

Spanish ‘se’ and ‘que’ in Universal Dependencies (UD) parsing: a critical review
Patrick Goethals and Jasper Degraeuwe
Ghent University

16:15 – 17:15

Language Generation – Drift 21 0.05 (chair: Mariët Theune)

16:15 – 16:35

Generating relative clauses from logic
Crit Cremers
LUCL, Leiden University

16:35 – 16:55

Elastic words in English and Chinese: are they the same phenomenon?
Lin Li, Kees van Deemter, Denis Paperno
Utrecht University

16:55 – 17:15

Generation of Image Captions Based on Deep Neural Networks
Shima Javanmardi¹, Ali Mohammad Latif¹, Fons Verbeek², Mohammad Taghi Sadeghi¹
¹Yazd University, ²Leiden University

16:15 – 17:15

Semantics III – Drift 21 0.32 (chair: Suzan Verberne)

16:15 – 16:35

AETHEL: typed supertags and semantic parses for Dutch
Konstantinos Kogkalidis¹, Michael Moortgat¹, Richard Moot²
¹Utrecht University, ²CNRS

16:35 – 16:55

Evaluating character-level models in neural semantic parsing
Rik van Noord, Antonio Toral, Johan Bos
University of Groningen

16:55 – 17:15

Testing Abstract Meaning Representation for Recognizing Textual Entailment
Lasha Abzianidze
CLCG, University of Groningen

16:15 – 17:15

Text & Speech Analytics – Drift 25 1.02 (chair: Erik Tjong Kim Sang)

16:15 – 16:35

Towards Dutch Automated Writing Evaluation
Orphee De Clercq
LT3, Ghent University

16:35 – 16:55

Automatic Analysis of Dutch speech prosody
Aoju Chen, Na Hu, Berit Janssen
Utrecht University

16:55 – 17:15

Hyphenation: from transformer models and word embeddings to a new linguistic rule-set
Francois REMY
UGent/IDLAB

16:15 – 17:15

Translation – Drift 21 1.05 (chair: Vincent Vandeghinste)

16:15 – 16:35

Translation mining in the domain of conditionals: first results
Jos Tellings
Utrecht University

16:35 – 16:55

Automatic Detection of English-Dutch and French-Dutch Cognates on the basis of Orthographic Information and Cross-Lingual Word Embeddings
Sofie Labat, Els Lefever, Pranaydeep Singh
LT3, Ghent University

16:55 – 17:15

On the difficulty of modelling fixed-order languages versus case marking languages in Neural Machine Translation
Stephan Sportel and Arianna Bisazza
University of Groningen

16:15 – 17:15

Shared Task – Drift 21 1.09 (chair: Marijn Schraagen)
