Outlier elimination in construction of software metric models

Victor K.Y. Chan, W. Eric Wong

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution, peer-reviewed)

12 Citations (Scopus)

Abstract

Software metric models relate various software metrics of software projects. Their purpose is to predict some of these metrics for future projects from the other metrics of those projects. Constructing a software metric model means deriving such relationships, usually from a data sample of the relevant metrics for past software projects. Such a sample often inevitably contains a few very extreme projects whose metrics are related in ways that deviate substantially from the relationships among the metrics of the remaining "mainstream" bulk of projects. These "outlier" projects exert considerable undue influence on the derivation of the relationships during model construction, so that the derived relationships do not faithfully reflect the true "mainstream" relationships. The direct consequence is degraded prediction accuracy of the constructed models for future projects. To overcome this problem, we propose a methodology that identifies and eliminates such outliers before model construction. The methodology uses least median of squares (LMS) regression to uncover the outliers and is applicable regardless of the model construction approach used afterwards. We also conducted a case study applying the methodology, and the results show that it improved the prediction accuracy of most of the models evaluated in the study. We therefore recommend the methodology for future software metric model construction. This paper documents the methodology and the successful case study.
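The abstract does not spell out how the LMS regression is used to flag outliers. The following is a minimal sketch of one common way to do it, assuming a single predictor (for example, project size) and a single response (for example, effort). It approximates the LMS fit by trying every line through a pair of projects, standardizes the residuals with the preliminary robust scale estimate from Rousseeuw and Leroy, and flags projects whose standardized residual exceeds 2.5. The function names (`lms_line`, `lms_outliers`, `eliminate_outliers`), the pairwise search, the use of the plain median, the cutoff and the sample data are all illustrative assumptions, not the authors' published procedure.

```python
# Hypothetical sketch of LMS-based outlier screening; not the paper's exact procedure.
import itertools
import math
import statistics


def lms_line(xs, ys):
    """Approximate least median of squares (LMS) fit of y = a + b*x.

    Every pair of points with distinct x values defines a candidate line;
    the candidate with the smallest median squared residual is kept.
    This exhaustive pairwise search is only practical for small samples.
    """
    best = None
    for i, j in itertools.combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # the two points define a vertical line: skip
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        med = statistics.median((y - a - b * x) ** 2 for x, y in zip(xs, ys))
        if best is None or med < best[0]:
            best = (med, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best  # (median squared residual, intercept a, slope b)


def lms_outliers(xs, ys, cutoff=2.5, tol=1e-9):
    """Indices of projects whose standardized LMS residual exceeds cutoff."""
    n, p = len(xs), 2  # p = number of fitted coefficients (a and b)
    if n <= p:
        raise ValueError("need more projects than fitted coefficients")
    med, a, b = lms_line(xs, ys)
    residuals = [y - a - b * x for x, y in zip(xs, ys)]
    # Preliminary robust scale estimate (Rousseeuw and Leroy, 1987)
    scale = 1.4826 * (1 + 5 / (n - p)) * math.sqrt(med)
    if scale <= tol:
        # At least half the projects lie (numerically) on the LMS line:
        # flag every project that does not.
        return [k for k, r in enumerate(residuals) if abs(r) > tol]
    return [k for k, r in enumerate(residuals) if abs(r / scale) > cutoff]


def eliminate_outliers(xs, ys, cutoff=2.5):
    """Return (xs, ys) with the LMS-flagged outlier projects removed."""
    flagged = set(lms_outliers(xs, ys, cutoff))
    kept = [k for k in range(len(xs)) if k not in flagged]
    return [xs[k] for k in kept], [ys[k] for k in kept]


if __name__ == "__main__":
    sizes = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical project sizes
    efforts = [3, 5, 7, 9, 11, 13, 15, 60]  # last project is an extreme outlier
    print(lms_outliers(sizes, efforts))  # [7]
```

Because the screening only removes projects, the retained data can be handed to any subsequent model-construction technique, in line with the abstract's claim that the methodology is independent of the construction approach. The exhaustive pairwise search costs roughly cubic time in the number of projects, so for larger samples a random subset of pairs is the usual compromise.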

Original language: English
Title of host publication: Proceedings of the 2007 ACM Symposium on Applied Computing
Publisher: Association for Computing Machinery
Pages: 1484-1488
Number of pages: 5
ISBN (Print): 1595934804, 9781595934802
Publication status: Published - 2007
Event: 2007 ACM Symposium on Applied Computing - Seoul, Korea, Republic of
Duration: 11 Mar 2007 - 15 Mar 2007

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing

Conference

Conference: 2007 ACM Symposium on Applied Computing
Country/Territory: Korea, Republic of
City: Seoul
Period: 11/03/07 - 15/03/07

Keywords

  • Least Median of Squares (LMS)
  • Models
  • Outliers
  • Software metrics
