Software and linguistic resources

VISPER  http://itakura.kes.vslib.cz/kes/visper.html or http://www.fm.vslib.cz/~kes/visper.html

VISPER is a unique software system designed for teaching some essential topics in automatic speech recognition (ASR). Its main strength is the visualization of basic ASR tasks, such as signal acquisition, speech parameterization, endpoint detection, DTW-based matching, and the application of continuous hidden Markov models. Learning and understanding these topics becomes much easier with VISPER because the system works like an experimental workbench that lets a user find answers to many common questions by running highly illustrative experiments.
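For readers unfamiliar with DTW-based matching, the following is a minimal sketch of the general technique (dynamic time warping between two one-dimensional sequences). It is not VISPER code; the function name `dtw_distance` and the sample sequences are assumptions made for illustration, and a real recognizer would compare vectors of speech parameters rather than single numbers.

```python
def dtw_distance(a, b):
    """Return the DTW alignment cost between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] holds the best cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # A frame may be matched, or either sequence may be stretched.
            cost[i][j] = local + min(cost[i - 1][j],
                                     cost[i][j - 1],
                                     cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical data: a template, a slower utterance of it, and a different pattern.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
other = [3.0, 1.0, 3.0, 1.0, 3.0]

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The time-stretched copy aligns with the template at zero cost, while the unrelated pattern does not, which is the property that makes DTW useful for matching words spoken at different speeds.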
 

GATE (General Architecture for Text Engineering) (http://www.dcs.shef.ac.uk/research/groups/nlp/gate/)

GATE is an architecture and development environment for language processing R&D, and comes bundled with an advanced Information Extraction system for English.
GATE 1.5.1 is now available for download. This release includes Java support, better SGML support, a manual annotation tool, an annotation comparison tool and various other goodies. The system is free for research purposes, and comes in source and binary form for common platforms.

Hamish Cunningham, Research Fellow in Computer Science, University of Sheffield, UK
http://www.dcs.shef.ac.uk/~hamish/
 

CoreLex  http://www.cs.brandeis.edu/~paulb/CoreLex/corelex.html

An ONTOLOGY, LEXICAL SEMANTIC DATABASE and TAGSET for nouns,
 organized around SYSTEMATIC POLYSEMY and UNDERSPECIFICATION.

CoreLex developed out of a thesis on systematic polysemy and underspecification of nouns, establishing an ontology and semantic database of 126 semantic types, covering around 40,000 nouns and defining a large number of systematic polysemous classes that are derived by a careful analysis of sense distributions in WordNet.
The semantic types are underspecified representations based on Generative Lexicon theory and are used in an underspecified approach to semantic tagging. This approach addresses two problems: sense enumeration (the difficulty of deciding the number of discrete senses), due to systematic polysemy; and multiple reference (NPs denoting more than one model-theoretic referent), due to underspecification. Semantic tags that are based on traditional, discrete senses tend to be too fine-grained for practical use. For instance, WordNet has, at the lowest level, around 60,000 different tags (synsets) for nouns alone. The CoreLex approach, on the other hand, offers a concise set of 126 tags that are inherently more coarse-grained because they take systematic polysemy and underspecification into account.
The CoreLex database is freely available for research purposes, including commercial ones.
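The idea of grouping nouns by sense distribution and tagging them with one underspecified type can be sketched roughly as follows. This is an illustration only, not the CoreLex procedure or data: the sense inventory `TOY_SENSES`, the type names, and both helper functions are hypothetical and were invented for this example.

```python
from collections import defaultdict

# Hypothetical toy inventory: noun -> coarse type of each listed sense.
# Invented for illustration; not taken from WordNet or CoreLex.
TOY_SENSES = {
    "book": ["artifact", "communication"],
    "letter": ["communication", "artifact"],
    "lamb": ["animal", "food"],
    "chicken": ["food", "animal"],
    "stone": ["substance"],
}


def polysemy_classes(senses):
    """Group nouns that share the same set of coarse sense types."""
    classes = defaultdict(list)
    for noun, types in senses.items():
        classes[frozenset(types)].append(noun)
    return classes


def underspecified_tag(noun, senses):
    """Return one tag covering all of the noun's types, without picking a sense."""
    return "+".join(sorted(set(senses[noun])))


classes = polysemy_classes(TOY_SENSES)
for types, nouns in classes.items():
    print(sorted(types), sorted(nouns))
print(underspecified_tag("book", TOY_SENSES))
```

In this toy setting, nouns with the same sense distribution fall into one class and receive a single tag, so a tagger does not have to choose between discrete senses.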
 

EAGLES report  http://www.ilc.pi.cnr.it/EAGLES96/rep2

EUROPEAN LANGUAGE RESOURCES ASSOCIATION
ELRA News
ELRA/ELDA, 55-57 rue Brillat Savarin, 75013 PARIS
Tel: +33 1 43 13 33 33  Fax: +33 1 43 13 33 30
E-mail: info-elra@calva.net  http://www.icp.grenet.fr/ELRA/home.html