Text Mining and Software Engineering: An Integrated Source Code and Document Analysis Approach

Abstract
Documents written in natural languages constitute a major part of the artifacts produced during the software engineering lifecycle. Especially during software maintenance or reverse engineering, semantic information conveyed in these documents can provide important knowledge for the software engineer. In this paper, we present a text mining system capable of populating a software ontology with information detected in documents. A particular novelty is the integration of results from automated source code analysis into an NLP pipeline, which allows software artifacts represented in code and in natural language to be cross-linked on a semantic level.
Reference
René Witte, Qiangqiang Li, Yonggang Zhang, and Juergen Rilling. Text Mining and Software Engineering: An Integrated Source Code and Document Analysis Approach. IET Software Journal, Volume 2, Issue 1, 2008, pp. 3-16. DOI: 10.1049/iet-sen:20070110. Special Section on Natural Language in Software Engineering.
BibTeX entry (also available for download):
@article{witte_etal_ietsoftware2008,
  title     = {{Text Mining and Software Engineering:
               An Integrated Source Code and Document Analysis Approach}},
  author    = {Ren\'{e} Witte and Qiangqiang Li
               and Yonggang Zhang and Juergen Rilling},
  journal   = {IET Software},
  number    = {1},
  volume    = {2},
  year      = {2008},
  pages     = {3--16},
  publisher = {IET},
  url       = {http://link.aip.org/link/?SEN/2/3/1},
  doi       = {10.1049/iet-sen:20070110}
}
You can also:
- view the official version of this paper in the IET online library
- see the table of contents for IET Software, Vol. 2, No. 1
Download
Postprint: witte_etal_iet2008.pdf
MD5 checksum: 57e152ec1939a21c2c68c68831ac90cb
Copyright © 2008 IET. This paper is a postprint of a paper submitted to and accepted for publication in the IET Software Journal, Volume: 2, Issue: 1, 2008, and is subject to IET copyright [http://www.iet.org]. The copy of record is available at http://link.aip.org/link/?SEN/2/3/1, DOI: 10.1049/iet-sen:20070110.
