Research & Development World

Counterintuitive Approach Yields Big Benefits for High-dimensional, Small-sized Problems

By R&D Editors | May 15, 2015

Emphasizing the less common classes in datasets leads to improved accuracy in feature selection.

Extracting meaningful information from clinical datasets can mean the difference between a successful diagnosis and a protracted illness. However, datasets vary widely in both the number of ‘features’ measured and the number of independent observations taken. Now, A*STAR researchers have developed an approach for targeted feature selection from datasets with small sample sizes that tackles the so-called class imbalance problem.

The class imbalance problem, in which common (‘majority class’) data overwhelm rare (‘minority class’) data, is a significant hurdle in data mining. It is particularly evident in datasets that have many features, known as high-dimensional data, or few samples. Both situations are common in gene expression analysis and clinical data.

Feng Yang and colleagues from the A*STAR Institute of High Performance Computing took an unconventional approach to this problem. They began with a common pattern classification method called linear discriminant analysis (LDA). But when there are far more features than samples, the within-class scatter matrix that LDA must invert is singular, so to make feature selection tractable, the problem had to be ‘regularized.’
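The article does not spell out which forms of regularization the team analyzed. For reference, the sketch below shows the most common textbook form, a ridge (shrinkage) term `lam * I` added to the within-class scatter matrix of two-class LDA, which makes the matrix invertible even with fewer samples than features. The function names, toy data and the default `lam` are illustrative assumptions, not the authors' code.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def mean(rows: Matrix) -> Vector:
    """Column-wise mean of a list of samples (one row per sample)."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def scatter(rows: Matrix, mu: Vector) -> Matrix:
    """Class scatter matrix: the sum over samples of (x - mu)(x - mu)^T."""
    p = len(mu)
    s = [[0.0] * p for _ in range(p)]
    for x in rows:
        dx = [xi - mi for xi, mi in zip(x, mu)]
        for i in range(p):
            for j in range(p):
                s[i][j] += dx[i] * dx[j]
    return s


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def regularized_lda(majority: Matrix, minority: Matrix, lam: float = 0.5) -> Vector:
    """Two-class LDA direction w = (S_w + lam * I)^-1 (mu_minority - mu_majority).

    With n samples, S_w has rank at most n - 2, so it is singular whenever the
    number of features p exceeds n - 2; the lam * I term makes it invertible.
    """
    mu0, mu1 = mean(majority), mean(minority)
    s0, s1 = scatter(majority, mu0), scatter(minority, mu1)
    p = len(mu0)
    sw = [[s0[i][j] + s1[i][j] + (lam if i == j else 0.0) for j in range(p)]
          for i in range(p)]
    d = [b - a for a, b in zip(mu0, mu1)]
    return solve(sw, d)


if __name__ == "__main__":
    # Toy data: 5 samples and 4 features, so the unregularized S_w is singular.
    majority = [[1.0, 0.0, 2.0, 1.0], [1.5, 0.5, 2.5, 0.5], [0.5, 1.0, 2.0, 1.5]]
    minority = [[3.0, 1.0, 0.0, 2.0], [2.5, 1.5, 0.5, 2.5]]
    w = regularized_lda(majority, minority)
    # Project each class mean onto w; the minority mean lands higher.
    print([round(sum(wi * xi for wi, xi in zip(w, mean(c))), 3) for c in (majority, minority)])
```

Because `S_w + lam * I` is positive definite for any `lam > 0`, the projected minority mean always lies above the projected majority mean along `w`. The size of `lam` trades stability against fidelity to the raw scatter.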

“After we analyzed the different forms of regularization,” Yang recalls, “we found that one intrinsic difference of the existing forms of regularization is the class emphasis.”

Existing regularization methods favored the majority class. “Intuitively, the majority class should be given more emphasis weight, since it has more samples,” acknowledges Yang. “However, our study proved that this is not true in the high-dimensional, small-sized situation with class imbalance.”

Indeed, their study showed that when the minority class was emphasized more heavily, both classification accuracy and robustness improved.

“From the view of sample distribution in the subspace, minority class emphasis will actually ‘squeeze’ the samples in the minority class to form a compact ‘nucleus’ in the subspace of selected features, which would be easier to be classified,” Yang explains.
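To make ‘class emphasis’ concrete, the sketch below assumes a hypothetical weighting that is not taken from the paper: the within-class scatter is formed as `alpha * S_min + (1 - alpha) * S_maj + lam * I`, so `alpha > 0.5` emphasizes the minority class. The toy scatter matrices, `emphasized_direction` and `minority_spread` are invented for illustration. Measuring how spread out the minority class is along the resulting LDA direction, relative to the class separation, shows a version of the ‘squeeze’ Yang describes.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def solve(a: Matrix, b: Vector) -> Vector:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def quad(w: Vector, s: Matrix) -> float:
    """Quadratic form w^T S w."""
    p = len(w)
    return sum(w[i] * s[i][j] * w[j] for i in range(p) for j in range(p))


def emphasized_direction(s_min: Matrix, s_maj: Matrix, d: Vector,
                         alpha: float, lam: float = 0.1) -> Vector:
    """Hypothetical class-weighted LDA: w = (alpha*S_min + (1-alpha)*S_maj + lam*I)^-1 d."""
    p = len(d)
    s = [[alpha * s_min[i][j] + (1 - alpha) * s_maj[i][j] + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    return solve(s, d)


def minority_spread(w: Vector, s_min: Matrix, d: Vector) -> float:
    """Minority scatter along w divided by the squared mean separation (scale-free)."""
    return quad(w, s_min) / sum(wi * di for wi, di in zip(w, d)) ** 2


if __name__ == "__main__":
    s_min = [[4.0, 0.0], [0.0, 1.0]]  # minority samples spread mostly along feature 0
    s_maj = [[1.0, 0.0], [0.0, 4.0]]  # majority samples spread mostly along feature 1
    d = [1.0, 1.0]                    # difference between class means
    for alpha in (0.1, 0.5, 0.9):
        w = emphasized_direction(s_min, s_maj, d, alpha)
        print(alpha, round(minority_spread(w, s_min, d), 3))
```

In this toy case the minority spread falls as `alpha` rises (roughly 2.21, 1.25 and 0.82), because heavier minority weight steers the direction away from the axis along which minority samples are scattered. This is only an illustration of the intuition; it does not reproduce the paper's analysis.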

The approach was tested on five gene microarray datasets that suffered from class imbalance, with sample sizes ranging from 60 to 136 and feature counts from 2,000 to 12,600. Using an incremental approach, Yang and his team cut the computation time required for feature selection from 4,215 seconds to 49 seconds, a roughly 86-fold reduction.
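The article does not say what the incremental approach consists of. One standard way to make greedy forward feature selection incremental, shown here as an assumption rather than as the authors' algorithm, is to grow the inverse of the regularized scatter matrix `M` one feature at a time using the Schur-complement (bordered-matrix) formula. Adding a feature then costs O(k^2) for k already-selected features instead of a fresh O(k^3) inversion, and the Fisher criterion J(S) = d_S^T M_S^-1 d_S increases by exactly (d_j - u^T d_S)^2 / s, where u = M_S^-1 b and s is the Schur complement. The names `bordered_inverse` and `forward_select` are hypothetical.

```python
from typing import List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Matrix-vector product a @ x."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def dot(x: Vector, y: Vector) -> float:
    """Inner product of two vectors."""
    return sum(xi * yi for xi, yi in zip(x, y))


def bordered_inverse(a_inv: Matrix, b: Vector, c: float) -> Matrix:
    """Inverse of the symmetric matrix [[A, b], [b^T, c]], given A^-1.

    Uses the Schur complement s = c - b^T A^-1 b, costing O(k^2) for a
    k x k matrix A instead of O(k^3) for inverting from scratch.
    """
    u = matvec(a_inv, b)  # u = A^-1 b
    s = c - dot(b, u)
    k = len(u)
    rows = [[a_inv[i][j] + u[i] * u[j] / s for j in range(k)] + [-u[i] / s]
            for i in range(k)]
    return rows + [[-ui / s for ui in u] + [1.0 / s]]


def forward_select(m: Matrix, d: Vector, n_select: int) -> Tuple[List[int], float]:
    """Greedy forward selection maximizing J(S) = d_S^T M_S^-1 d_S.

    m: symmetric positive-definite matrix over all features, e.g. a
       regularized within-class scatter S_w + lam * I (hypothetical input).
    d: difference between the two class means.
    Returns the selected feature indices (in order) and the final J(S).
    """
    selected: List[int] = []
    a_inv: Matrix = []  # inverse of M restricted to the selected features
    score = 0.0
    for _ in range(min(n_select, len(d))):
        best: Optional[Tuple[float, int]] = None
        d_s = [d[i] for i in selected]
        for j in range(len(d)):
            if j in selected:
                continue
            b = [m[i][j] for i in selected]
            u = matvec(a_inv, b)
            s = m[j][j] - dot(b, u)  # Schur complement, > 0 for SPD m
            gain = (d[j] - dot(u, d_s)) ** 2 / s  # exact increase in J
            if best is None or gain > best[0]:
                best = (gain, j)
        gain, j = best
        # Grow the cached inverse instead of re-inverting M_S from scratch.
        a_inv = bordered_inverse(a_inv, [m[i][j] for i in selected], m[j][j])
        selected.append(j)
        score += gain
    return selected, score


if __name__ == "__main__":
    m = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # toy SPD matrix
    d = [1.0, 2.0, 0.5]                                      # toy mean difference
    print(forward_select(m, d, 2))  # selects feature 1 first, then feature 0
```

Each candidate is scored from cached quantities without inverting anything, which is where an incremental scheme of this kind saves time as the feature pool grows into the thousands.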

“Due to some practical limitations, such as the very specific case of a rare disease in clinic data, many practical problems will be of high dimensionality, small sample size and class imbalance,” Yang notes. “There are still issues that need to be addressed to deal with these kinds of problems.”

The A*STAR-affiliated researchers contributing to this research are from the Institute of High Performance Computing.

Citation: Yang, F., Mao, K. Z., Lee, G. K. K. & Tang, W. Emphasizing minority class in LDA for feature subset selection on high-dimensional small-sized problems. IEEE Transactions on Knowledge and Data Engineering 27, 88–101 (2015).

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media