Search
Results
Fuzzy Clustering for Topic Analysis and Summarization of Document Collections

Abstract
Large document collections, such as those delivered by Internet search engines, are difficult and time-consuming for users to read and analyse. The detection of common and distinctive topics within a document set, together with the generation of multi-document summaries, can greatly ease the burden of information management. We show how this can be achieved with a clustering algorithm based on fuzzy set theory, which (i) is easy to implement and integrate into a personal information system, (ii) generates a highly flexible data structure for topic analysis and summarization, and (iii) also delivers excellent performance.
Ontological Text Mining of Software Documents

Abstract
Documents written in natural languages constitute a major part of the software engineering lifecycle artifacts. Especially during software maintenance or reverse engineering, semantic information conveyed in these documents can provide important knowledge for the software engineer. In this paper, we present a text mining system capable of populating a software ontology with information detected in documents.
Task-Dependent Visualization of Coreference Resolution Results

Abstract
Graphical visualizations of coreference chains support a system developer in analyzing the behavior of a resolution algorithm. In this paper, we state explicit use cases for coreference chain visualizations and show how they can be resolved by transforming chains into other, standardized data formats, namely Topic Maps and Ontologies.
Next-Generation Summarization: Contrastive, Focused, and Update Summaries

Abstract
Classical multi-document summaries focus on the common topics of a document set and omit distinctive themes particular to a single document—thereby often suppressing precisely that kind of information a user might need for a specific task. This can be avoided through advanced multi-document summaries that take a user's context and history into account, by delivering focused, contrastive, or update summaries. To facilitate the generation of these different summaries, we propose to generate all types from a single data structure, topic clusters, which provide for an abstract representation of a set of documents. Evaluations carried out on five years' worth of data from the DUC summarization competition prove the feasibility of this approach.
Durm German Lemmatizer v1.0 Released
Submitted by rene on Thu, 2007-05-31 08:59.I'm happy to announce the first public release of our free/open source Durm Lemmatization System for the German language.
The release comes with source code, binaries, documentation, resources (German lexicon, Case Tagger probabilities), and manually annotated texts from the German Wikipedia for evaluation.
New Job, New Website
Submitted by rene on Sat, 2008-05-31 19:00.Deadline extended for STSM
Submitted by rene on Wed, 2008-04-16 22:36.Call for Papers: International Workshop on Semantic Technologies in System Maintenance (STSM 2008)
Submitted by rene on Tue, 2008-02-12 21:19.Together with Jürgen Rilling, Dragan Gaševi?, and Jeff Z. Pan I'm organizing the first International Workshop on Semantic Technologies in System Maintenance (STSM 2008), which will be co-located with the 16th IEEE International Conference on Program Comprehension (ICPC 2008) in Amsterdam, The Netherlands.
Detailed information on the workshop, submission guidelines, and other news are now available from the workshop's webpage.
