Partner No. 5                         Institute of Computer Science,  Polish Academy of Sciences
Workpackage#                       4
Workpackage Acronym:    ICS-MM
Title:                                         An HPSG treebank for Polish
Coordinator:                            Leonard Bolc
Goal:                                   To create a treebank of syntactic structures in Polish, using HPSG (Head-driven Phrase Structure Grammar) to encode the parse trees.

Summary of the project
The objective of our proposal is to create a treebank of syntactic structures in Polish, using HPSG (Head-driven Phrase Structure Grammar) to encode the parse trees. The formal HPSG grammar of Polish developed in our ongoing KBN project will be used for this purpose. Such an HPSG-encoded treebank will provide sound linguistic grounds for evaluating and improving the KBN grammar and its implementation. The treebank can also be used to evaluate other grammars and to write more effective parsers, e.g., ones that capture free word order phenomena or incorporate probabilistic data.

The HPSG framework we have chosen is currently one of the leading linguistic formalisms, used in both theoretical and application-oriented research programs all over the world.
Recently, interest in modern language technologies has also extended to Slavic languages (Czech and Bulgarian so far), and HPSG-based grammars have been used in LaTeSlav (Language Processing Technologies for Slavic Languages), a European Union joint research project. Using a uniform linguistic platform for diverse languages simplifies potential integration with grammars of other languages. Another major step towards future practical developments is building a treebank of linguistic constructions. Although in the KBN project we concentrate mostly on the syntactic description of Polish, both semantics and morphology will be taken into account in our grammar.

The work in this project will be divided into two tasks: preparing test data from a corpus of Polish texts, and manually annotating the selected texts to produce a linguistically motivated set of syntactic parses. This will also be the first test of the adequacy and coverage of our HPSG grammar. Once such a treebank is prepared, it will be used to improve the implementation of the grammar. The organisation and management of the proposal are closely tied to our ongoing project. Work on the proposal can start no sooner than the end of the second phase of the KBN project.
