Partner No. 5 Institute of Computer Science, Polish Academy of Sciences
Workpackage Acronym: ICS-MM
Title: An HPSG treebank for Polish
Coordinator: Leonard Bolc
Goal: to create a treebank of syntactic structures in Polish, using HPSG (Head-driven Phrase Structure Grammar) to encode the parse trees.
Summary of the project
The objective of our proposal is to create a treebank of syntactic structures in Polish, using HPSG (Head-driven Phrase Structure Grammar) to encode the parse trees. The formal HPSG grammar of Polish developed in our ongoing KBN project will be used for this purpose. Such an HPSG-encoded treebank will provide sound linguistic grounds for evaluating and improving the KBN grammar and its implementation. The treebank can also be used to evaluate other grammars and to write more effective parsers, e.g., ones that capture free word order phenomena or incorporate probabilistic data.
The HPSG framework we have chosen is currently one of the leading linguistic formalisms, used in both theoretical and application-oriented research programs all over the world.
Interest in modern language technologies has recently extended to Slavic languages as well (Czech and Bulgarian so far), and HPSG-based grammars have been used in LaTeSlav (Language Processing Technologies for Slavic Languages), a European Union joint research project. Using a uniform linguistic platform for diverse languages simplifies potential integration with grammars of other languages. Building a treebank of linguistic constructions is another major step towards future practical developments. Although in the KBN project we concentrate mostly on the syntactic description of Polish, both semantics and morphology will be taken into account in our grammar.
The work in this project will be divided into two tasks: preparing the test data from a corpus of Polish texts, and manually annotating the selected texts to produce a linguistically motivated set of syntactic parses. This will also be the first test of the adequacy and coverage of our HPSG grammar. Once the treebank is prepared, it will be used to improve the implementation of the grammar. The organisation and management of the proposal are closely tied to our ongoing project. Work on the proposal can start no sooner than the end of the second phase of the KBN project.