Finnish Internet Parsebank

The Finnish Internet Parsebank (FIP) consists of nearly 4 billion words automatically collected from the Web and comes with full morphological and dependency syntax analyses. At the word level, these include the part-of-speech class of each word and its morphological features (such as noun, singular, genitive); at the sentence level, they include the sentence structure and the syntactic function of each word (such as nominal subject). The annotations follow the Universal Dependencies (UD) scheme, a syntactic model that aims for cross-linguistically consistent annotation and has been applied to 47 languages. By enabling the study of linguistic research problems across languages, UD allows for novel insights into many of them.
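To make the two annotation layers concrete, the following is a minimal sketch of how one UD-annotated sentence could be read in Python. It assumes the data is stored in CoNLL-U, the standard exchange format of the UD project; the text above does not state the format of the FIP download, so this is an assumption. The sample sentence ("Koiran häntä heiluu.", roughly "The dog's tail wags.") is a hand-constructed illustration, not an excerpt from the FIP, and the names Token, parse_feats and parse_sentence are hypothetical helpers.

```python
from typing import Dict, List, NamedTuple


class Token(NamedTuple):
    index: int             # position of the word in the sentence (1-based)
    form: str              # surface form
    lemma: str             # base form
    upos: str              # universal part-of-speech class
    feats: Dict[str, str]  # morphological features
    head: int              # index of the governing word (0 = root)
    deprel: str            # syntactic function relative to the head


# Hand-constructed example sentence in CoNLL-U (assumed format, not FIP data).
SAMPLE = "\n".join([
    "# text = Koiran häntä heiluu.",
    "1\tKoiran\tkoira\tNOUN\t_\tCase=Gen|Number=Sing\t2\tnmod:poss\t_\t_",
    "2\thäntä\thäntä\tNOUN\t_\tCase=Nom|Number=Sing\t3\tnsubj\t_\t_",
    "3\theiluu\theilua\tVERB\t_\tMood=Ind|Number=Sing|Person=3|Tense=Pres\t0\troot\t_\t_",
    "4\t.\t.\tPUNCT\t_\t_\t3\tpunct\t_\t_",
])


def parse_feats(field: str) -> Dict[str, str]:
    """Turn 'Case=Gen|Number=Sing' into {'Case': 'Gen', 'Number': 'Sing'}."""
    if field == "_":
        return {}
    return dict(pair.split("=", 1) for pair in field.split("|"))


def parse_sentence(block: str) -> List[Token]:
    """Parse one CoNLL-U sentence block into a list of Token records."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        cols = line.split("\t")
        if not cols[0].isdigit():
            continue  # skip multiword-token ranges and empty nodes
        tokens.append(Token(int(cols[0]), cols[1], cols[2], cols[3],
                            parse_feats(cols[5]), int(cols[6]), cols[7]))
    return tokens


tokens = parse_sentence(SAMPLE)

# Word level: part of speech and morphological features of the first word.
print(tokens[0].upos, tokens[0].feats)

# Sentence level: words whose syntactic function is nominal subject.
subjects = [t.form for t in tokens if t.deprel == "nsubj"]
print(subjects)
```

Here the first word carries the word-level analysis mentioned above (noun, singular, genitive), while the head and deprel columns encode the sentence-level structure, with the second word marked as the nominal subject of the verb.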

The FIP is available through a user interface and as a downloadable version shuffled at the sentence level.


School of Languages and Translation Studies, Institute for Advanced Studies and Department of Information Technology, University of Turku