Applying statistical methodology to optimize and simplify software metric models with missing data

W. Eric Wong, Jin Zhao, Victor K.Y. Chan

Research output: Conference contribution, peer-reviewed

14 citations (Scopus)

Abstract

During the construction of a software metric model, the decision on whether a particular predictor metric should be included is most likely based on an intuitive or experience-based assumption that the predictor metric has a statistically significant impact on the target metric. However, a model constructed based on such an assumption may contain redundant predictor metric(s) and/or unnecessary predictor metric complexity. This is because the assumption made before the model construction is not verified after the model is constructed. To resolve the first problem (i.e., possible redundant predictor metric(s)), we propose a statistical hypothesis testing methodology to verify "retrospectively" the statistical significance of the impact of each predictor metric on the target metric. If the variation of a predictor metric does not correlate enough with the variation of the target metric, the predictor metric should be deleted from the model. For the second problem (i.e., unnecessary predictor metric complexity), we use "goodness-of-fit" to determine whether certain categories of a categorical predictor metric should be combined. In addition, missing data often appear in the data sample used for constructing the model. We use a modified k-nearest neighbors (k-NN) imputation method to deal with this problem. A study using data from the "Repository Data Disk - Release 6" is reported. The results indicate that our methodology can be useful in trimming redundant predictor metrics and identifying unnecessary categories initially assumed for a categorical predictor metric in the model.
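
As a rough illustration of the missing-data step mentioned in the abstract, the sketch below shows plain k-NN mean imputation. The abstract does not describe how the authors modify k-NN, so this is the textbook variant and not their method. The function name `knn_impute`, the use of `None` for missing values, and the distance rescaling are all assumptions made for illustration.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of that column over the k nearest rows.

    Hypothetical helper: plain k-NN mean imputation, not the paper's
    modified variant (which the abstract does not specify).
    """

    def distance(a, b):
        # Root-mean-square difference over columns observed in both rows;
        # infinite when the two rows share no observed column.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are the other rows in which column j is observed.
            # Distances use the original rows, never already-imputed values.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Toy data (invented for this example): the third project lacks its second metric.
sample = [[1.0, 10.0], [2.0, 20.0], [3.0, None], [10.0, 100.0]]
filled = knn_impute(sample, k=2)
```

In this toy sample the two rows closest to the incomplete one on the first metric donate their second-metric values, and the gap is filled with their mean. A cell stays `None` when no donor row shares an observed column with it.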

Original language: English
Title of host publication: Applied Computing 2006 - The 21st Annual ACM Symposium on Applied Computing - Proceedings of the 2006 ACM Symposium on Applied Computing
Publisher: Association for Computing Machinery
Pages: 1728-1733
Number of pages: 6
ISBN (Print): 1595931082, 9781595931085
DOIs
Publication status: Published - 2006
Event: 2006 ACM Symposium on Applied Computing - Dijon, France
Duration: 23 Apr 2006 to 27 Apr 2006

Publication series

Name: Proceedings of the ACM Symposium on Applied Computing
Volume: 2

Conference

Conference: 2006 ACM Symposium on Applied Computing
Country/Territory: France
City: Dijon
Period: 23/04/06 to 27/04/06
