Finnish Internet Parsebank

The Finnish Internet Parsebank (FIP) consists of nearly 4 billion words collected automatically from the Web, with full morphological and dependency syntax analyses. At the word level, the analyses give the part-of-speech class of each word and its morphological features (such as noun, singular and genitive); at the sentence level, they give the sentence structure and the syntactic function of each word in it (such as nominal subject). The annotation follows the Universal Dependencies (UD) scheme, a framework that aims for cross-linguistically consistent annotation and has been applied to 47 languages. UD offers novel insights into many linguistic research problems by making it possible to study them across languages.
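To make the two annotation layers concrete, the following is a minimal Python sketch that reads a sentence in CoNLL-U, the tab-separated format commonly used for UD data. It is an assumption that the downloadable FIP uses this format; the sample sentence is hand-made for illustration and not taken from the parsebank, and the names `SAMPLE`, `parse_feats` and `parse_conllu` are hypothetical.

```python
# Hand-made illustrative sentence ("Koiran häntä heiluu.", "The dog's tail wags."),
# written in CoNLL-U. Assumption: the parsebank download uses this format.
SAMPLE = """\
# text = Koiran häntä heiluu.
1\tKoiran\tkoira\tNOUN\t_\tCase=Gen|Number=Sing\t2\tnmod:poss\t_\t_
2\thäntä\thäntä\tNOUN\t_\tCase=Nom|Number=Sing\t3\tnsubj\t_\t_
3\theiluu\theilua\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres|VerbForm=Fin|Voice=Act\t0\troot\t_\t_
4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_
"""


def parse_feats(field):
    """Turn 'Case=Gen|Number=Sing' into {'Case': 'Gen', 'Number': 'Sing'}."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_conllu(text):
    """Return a list of sentences; each sentence is a list of token dicts."""
    sentences, current = [], []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment line
        cols = line.split("\t")
        if len(cols) != 10 or not cols[0].isdigit():
            continue  # skip multiword-token ranges, empty nodes, malformed lines
        current.append({
            "id": int(cols[0]),
            "form": cols[1],
            "lemma": cols[2],
            "upos": cols[3],                 # word level: part-of-speech class
            "feats": parse_feats(cols[5]),   # word level: morphological features
            "head": int(cols[6]),            # sentence level: governing word (0 = root)
            "deprel": cols[7],               # sentence level: syntactic function
        })
    if current:
        sentences.append(current)
    return sentences


sentences = parse_conllu(SAMPLE)
for tok in sentences[0]:
    print(tok["form"], tok["upos"], tok["feats"], tok["head"], tok["deprel"])
```

In this sample, "Koiran" carries the word-level analysis noun, singular, genitive, while "häntä" carries the sentence-level function nominal subject (`nsubj`) of the verb "heiluu".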

The FIP is available through user interfaces and as a downloadable version in which the order of the sentences has been shuffled; see https://turkunlp.org/finnish_nlp.html#parsebank.



Institution

School of Languages and Translation Studies, Institute for Advanced Studies and Department of Information Technology, University of Turku