Two Nearest Means Method: Regression through Searching in the Data

Author(s): Farrokh Alemi, Madhukar Reddy Vongala, Sri Surya Krishna Rama Taraka Naren Durbha, Manaf Zargoush

Background: Historically, regression equations have been fit by minimizing the sum of squared residuals.

Objective: We present an alternative approach that fits regression equations by searching for specific cases in the database. Case-based reasoning predicts outcomes by matching a new case to training cases, without modeling the relationship between features and outcome. This study compares the accuracy of the two nearest means (2NM) method, a search-based, case-based reasoning approach, with that of regression, a feature-based reasoning approach.

Data Sources: The accuracy of the two methods was examined in predicting mortality of 296,051 residents in Veterans Health Affairs nursing homes. Data were collected from 1/1/2000 to 9/10/2012 and were randomly divided into training (90%) and validation (10%) samples.

Study Design: Cohort observational study.

Data Collection/Extraction Methods: In the 2NM algorithm, the data are first transformed so that all features are monotonically related to the outcome. Second, all means that violate the monotone order are set aside, to be processed as exceptions to the general algorithm. Third, to predict a new case, the means in the training set are divided into “excessive” and “partial” means, based on how they match the new case. Fourth, the outcome for the new case is predicted as the average of two means: the excessive mean with the minimum outcome and the partial mean with the maximum outcome. To evaluate the method, we compared the accuracy of linear logistic regression and the proposed procedure in predicting mortality from age, gender, and 10 daily living disabilities.
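The third and fourth steps can be sketched in a few lines of Python. This is a minimal illustration under assumptions that the abstract does not spell out: features are coded as 0/1 indicators where 1 never lowers risk, a "mean" is the average outcome of all training cases sharing one feature combination, an "excessive" mean comes from a combination that has every feature of the new case (and possibly more), and a "partial" mean comes from a combination that has no feature the new case lacks. The function names stratum_means and predict_2nm are hypothetical, and the handling of means that violate monotone order (step two) is omitted.

```python
from collections import defaultdict
from statistics import mean


def stratum_means(cases):
    """Average outcome for each distinct feature combination.

    `cases` is an iterable of (features, outcome) pairs, where `features`
    is a tuple of 0/1 indicators (assumed coding: 1 never lowers risk).
    """
    groups = defaultdict(list)
    for features, outcome in cases:
        groups[tuple(features)].append(outcome)
    return {combo: mean(outcomes) for combo, outcomes in groups.items()}


def predict_2nm(means, new_case):
    """Average the lowest 'excessive' mean and the highest 'partial' mean."""
    new_case = tuple(new_case)
    # Assumed definition: excessive = has every feature of the new case.
    excessive = [m for combo, m in means.items()
                 if all(c >= n for c, n in zip(combo, new_case))]
    # Assumed definition: partial = has no feature the new case lacks.
    partial = [m for combo, m in means.items()
               if all(c <= n for c, n in zip(combo, new_case))]
    bounds = []
    if excessive:
        bounds.append(min(excessive))  # tightest bound from above
    if partial:
        bounds.append(max(partial))    # tightest bound from below
    if not bounds:
        raise ValueError("no comparable feature combination in training data")
    return sum(bounds) / len(bounds)
```

Under these assumptions, a combination that matches the new case exactly counts as both excessive and partial, so when the training means respect the monotone order the prediction reduces to that combination's own mean. When only one of the two sets is non-empty, this sketch falls back to the single available bound, which is a choice made here for illustration and not something the abstract specifies.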

Principal Findings: In the cases set aside for validation, the 2NM had a McFadden pseudo R-squared of 0.51. The linear logistic regression, trained on the same training sample and evaluated on the same validation cases, had a McFadden pseudo R-squared of 0.09. The 2NM was significantly more accurate (alpha < 0.001) than linear logistic regression. A procedure is described for constructing a non-linear regression that achieves the same level of accuracy as the 2NM.
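For readers unfamiliar with the metric, McFadden's pseudo R-squared is one minus the ratio of the model's log-likelihood to the log-likelihood of an intercept-only (null) model. The sketch below computes it for binary outcomes. It assumes the null model predicts the observed event rate of the sample being scored; the abstract does not state how the null model was defined in the study, and the function names and the clipping constant eps are illustrative choices.

```python
import math


def log_likelihood(y_true, p_pred, eps=1e-12):
    """Bernoulli log-likelihood of 0/1 outcomes given predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def mcfadden_r2(y_true, p_pred):
    """McFadden pseudo R-squared: 1 - lnL(model) / lnL(null)."""
    # Assumption: the null model predicts the sample's own event rate.
    base_rate = sum(y_true) / len(y_true)
    ll_model = log_likelihood(y_true, p_pred)
    ll_null = log_likelihood(y_true, [base_rate] * len(y_true))
    return 1 - ll_model / ll_null
```

A value of 0 means the predictions are no better than the base rate, and values closer to 1 mean the predicted probabilities sit closer to the observed outcomes.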

Conclusions: The 2NM, a case-based reasoning method, captured nonlinear interactions in the data.



© 2016-2025, Copyrights Fortune Journals. All Rights Reserved!