Software and linguistic resources
VISPER is a unique software system designed for teaching some essential
topics in automatic speech recognition (ASR). Its main strength lies in
visualizing the basic ASR tasks, such as signal acquisition, speech
parameterization, endpoint detection, DTW-based matching and the application
of continuous hidden Markov models. Learning and understanding these
topics becomes much easier with VISPER because the system works like an
experimental workbench that lets users find answers to many common
questions by running highly illustrative experiments.
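DTW-based matching, one of the tasks VISPER visualizes, compares an input utterance with stored word templates by warping the time axis so that sequences of different lengths can be aligned. The Python sketch below illustrates the idea under our own simplifying assumptions: one scalar feature per frame, |x - y| as the local distance, and the common step pattern (vertical, horizontal, diagonal). It is not VISPER's code. Real recognizers compare multidimensional feature vectors such as cepstral coefficients, and the templates here are hypothetical.

```python
def dtw_distance(a, b):
    """Return the cumulative DTW alignment cost between sequences a and b.

    Assumes scalar features and |x - y| as the local distance; each cell
    may be reached by a vertical, horizontal or diagonal step.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # vertical step
                                     cost[i][j - 1],      # horizontal step
                                     cost[i - 1][j - 1])  # diagonal step
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW distance."""
    return min(templates,
               key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "feature" templates for two words.
templates = {"yes": [1, 3, 4, 3, 1], "no": [2, 2, 0, 0, 2]}
print(recognize([1, 1, 3, 4, 4, 3, 1], templates))
```

The input is longer than the "yes" template, yet DTW aligns the repeated frames at zero cost, which is exactly the time-axis flexibility that makes template matching work for speech spoken at varying rates.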
GATE (General Architecture for Text Engineering) http://www.dcs.shef.ac.uk/research/groups/nlp/gate/
GATE is an architecture and development environment for language processing
R&D, and comes bundled with an advanced Information Extraction system
for English.
GATE 1.5.1 is now available for download. This release includes Java
support, improved SGML support, a manual annotation tool, an annotation
comparison tool and various other goodies. The system is free for research
purposes and comes in source and binary form for common platforms.
Hamish Cunningham, Research Fellow in Computer Science, University of
Sheffield, UK
http://www.dcs.shef.ac.uk/~hamish/
CoreLex http://www.cs.brandeis.edu/~paulb/CoreLex/corelex.html
An ONTOLOGY, LEXICAL SEMANTIC DATABASE and TAGSET for nouns,
organized around SYSTEMATIC POLYSEMY and UNDERSPECIFICATION.
CoreLex grew out of a thesis on systematic polysemy and underspecification
of nouns. It establishes an ontology and semantic database of 126 semantic
types, covers around 40,000 nouns and defines a large number of
systematically polysemous classes, derived by a careful analysis of sense
distributions in WordNet.
The semantic types are underspecified representations based on Generative
Lexicon theory. They are used in an underspecified approach to semantic
tagging that addresses two problems: sense enumeration (the difficulty of
deciding the number of discrete senses), which arises from systematic
polysemy; and multiple reference (NPs denoting more than one model-theoretic
referent), which arises from underspecification. Semantic tags based on
traditional, discrete senses tend to be too fine-grained for practical use.
For instance, at its lowest level WordNet has around 60,000 different tags
(synsets) for nouns alone. The CoreLex approach, by contrast, offers a
concise set of 126 tags that are inherently coarser-grained because they
take systematic polysemy and underspecification into account.
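How grouping senses by type yields a small, coarse tagset can be shown with a toy Python sketch. It assumes each noun's WordNet senses have already been mapped to basic types; the type names and sense assignments below are hypothetical placeholders, not CoreLex's actual inventory or procedure. A noun's underspecified tag is taken to be the set of basic types its senses cover, and nouns that share the same set form one systematically polysemous class.

```python
from collections import defaultdict

# Hypothetical data: each noun's senses, already mapped to basic types.
# These labels are illustrative only, not CoreLex's real type inventory.
SENSE_TYPES = {
    "book":      ["artifact", "communication", "communication"],
    "newspaper": ["artifact", "communication", "organization"],
    "letter":    ["communication", "artifact"],
    "chicken":   ["animal", "food"],
    "lamb":      ["animal", "food", "animal"],
    "rock":      ["substance"],
}


def polysemous_classes(sense_types):
    """Group nouns by the set of basic types their senses cover.

    The frozenset acts as the underspecified tag: individual senses are
    discarded and only the combination of types is kept.
    """
    classes = defaultdict(list)
    for noun, types in sense_types.items():
        classes[frozenset(types)].append(noun)
    return dict(classes)


classes = polysemous_classes(SENSE_TYPES)
for tag, nouns in sorted(classes.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(tag), "->", sorted(nouns))
```

In this toy data, 13 fine-grained senses collapse into 4 tags; "book" and "letter", for example, share one artifact/communication tag instead of carrying separate sense labels.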
The CoreLex database is freely available for research purposes, including
commercial ones.
EUROPEAN LANGUAGE RESOURCES ASSOCIATION
ELRA News
ELRA/ELDA, 55-57 rue Brillat Savarin, 75013 PARIS
Tel: +33 1 43 13 33 33  Fax: +33 1 43 13 33 30
E-mail: info-elra@calva.net  http://www.icp.grenet.fr/ELRA/home.html