Research & Development World

Predictive Analytics: Harnessing Insights from Text and Network Data

By R&D Editors | November 14, 2014

The predictive analytics landscape covers a wide variety of techniques and methods designed to derive insights from data. These techniques, which include statistical modeling methods, classification rules, forecasting techniques, simulation models, and machine learning tools, have long been used successfully on structured data: data that consists of numeric or categorical attributes, where the number of categories is limited. In recent years, the volume and variety of data available for analysis have exploded, and most of this data comes in non-traditional forms that the traditional techniques were not designed to handle.

This article describes how you can transform non-traditional data, such as unstructured data (text) or semi-structured data (networks), into a structured form that you can then use to augment traditional data. Combining both types of data provides greater opportunities for actionable insight.

Text Data

Traditional predictive modeling tools use structured data to predict a response variable, such as the likelihood that a customer responds to a credit card offer, the probability of defaulting on a loan, or the risk of an adverse reaction to a drug treatment. Many of these applications also have access to sources of unstructured data that, until recently, went untapped. One of the most commonly available forms of such data is text: call center notes, warranty claims, survey responses, and social media content such as blogs and tweets about new product releases.

Chakraborty and Pagolu (2014) describe an illustrative example of how such text data can be used in predictive models. Models that detect which customers are likely to “churn” (switch to a different carrier in telecommunications, to a different bank in the financial industry, and so on) are used in many industries. Recent studies have shown that adding insight gained from customer feedback, such as survey responses, can improve the predictive power of these models. The text is first transformed into structured data, for example by applying Singular Value Decomposition (SVD) or clustering, and the resulting structured data is then used to augment the traditional attributes in the model. Nareddy and Chakraborty (2011) give a detailed example of this process and show that adding information from textual data can considerably reduce misclassification rates.
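The SVD route described above can be sketched as follows. This is a minimal, dependency-free illustration under stated assumptions, not the procedure used in the cited papers: the crude tokenizer, the raw term counts (no TF-IDF weighting or stop-word removal), the use of plain power iteration in place of an optimized SVD routine, and the example comments and attribute columns are all hypothetical. Each document becomes a row of term counts; projecting those rows onto the top-k right singular vectors yields k numeric columns that are appended to each customer's traditional attributes.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than two letters (crude on purpose)."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if len(t) > 2]


def term_document_matrix(docs):
    """Rows are documents, columns are vocabulary terms, cells are raw term counts."""
    counts = [Counter(tokenize(d)) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[t] for t in vocab] for c in counts], vocab


def mat_vec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def top_right_singular_vectors(A, k, iters=300, seed=0):
    """Approximate the top-k right singular vectors of A (a truncated SVD) by
    power iteration on the Gram matrix A^T A, deflating after each component."""
    n = len(A[0])
    G = [[sum(row[i] * row[j] for row in A) for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]  # strictly positive start vector
        for _ in range(iters):
            w = mat_vec(G, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                return vectors  # no variance left to explain
            v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, mat_vec(G, v)))  # eigenvalue of A^T A (sigma squared)
        vectors.append(v)
        # Deflate so the next pass converges to the next singular direction.
        G = [[G[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return vectors


def svd_text_features(docs, k):
    """Score each document on the top-k directions: A V_k equals U_k Sigma_k."""
    A, _ = term_document_matrix(docs)
    V = top_right_singular_vectors(A, k)
    return [[sum(a * b for a, b in zip(row, v)) for v in V] for row in A]


def augment(structured_rows, text_rows):
    """Append the text-derived columns to each customer's traditional attributes."""
    return [list(s) + list(t) for s, t in zip(structured_rows, text_rows)]


# Hypothetical survey comments and traditional attributes (age, tenure in months).
comments = [
    "Dropped calls again, thinking about switching carriers",
    "Great coverage and friendly support staff",
    "Billing error again and support never calls back",
]
structured = [[34, 26], [51, 80], [28, 7]]
X = augment(structured, svd_text_features(comments, k=2))
```

Each row of `X` now holds the two traditional attributes followed by two text-derived scores, ready for any standard classifier. In practice you would weight terms, filter stop words, and call an optimized SVD routine; power iteration is used here only to keep the sketch self-contained.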

Network Data

Most predictive models (regression models, decision trees, neural networks, and so on) are built from attributes of individual observations, each of which typically holds all the information about a specific customer or item. For example, if you want to identify customers who might be good candidates for a new smartphone model, you might build a probability-of-response model based on the characteristics of individual customers. You might also reason that the campaign would be more successful if you offered attractive deals to the customers on your target list who are most popular in their social networks. Modern social media let you collect information about “friends” and “friends of friends,” so you can easily build up a “network” of customers from which to gain valuable insight. This data is usually represented as a graph, with individuals as nodes and relationships as links between them.
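The graph representation described above, and one simple way to pick the "most popular" customers on a target list, can be sketched as follows. This is an illustrative assumption rather than a prescribed method: popularity is approximated here by degree (the number of direct links), and the customer names and friendship pairs are hypothetical.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected graph as an adjacency map: customer -> set of directly linked customers."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            graph[a].add(b)
            graph[b].add(a)
    return graph


def most_connected(graph, targets, n):
    """Return the n target customers with the most direct links (ties broken by name).
    Degree is used as a simple, assumed proxy for popularity."""
    return sorted(targets, key=lambda c: (-len(graph.get(c, ())), c))[:n]


# Hypothetical "friend" pairs collected from social media.
friendships = [
    ("ana", "ben"), ("ana", "cara"), ("ana", "dev"),
    ("ben", "cara"), ("eli", "ana"), ("dev", "eli"),
]
graph = build_graph(friendships)
print(most_connected(graph, ["ben", "dev", "ana", "fay"], 2))  # ['ana', 'ben']
```

Customers absent from the graph (such as `fay`) simply get a degree of zero. Richer centrality measures could replace degree without changing the overall idea.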

Figure 1: Example Network Model: Featurization

Baesens and Verbeke (2012) give examples of analytical techniques for incorporating network information into predictive models. One way to transform network information into a structured form for a traditional predictive model is to use first- and second-order relationships between customers (through social media connections or otherwise). Attributes derived from these relationships can then be used alongside traditional attributes, such as an individual’s age, recency of contact, and number of contacts. Figure 1 shows typical data for an analysis to determine the likelihood that a customer will “churn” (switch to a competing product). The premise for including such information in a predictive model is that customers are not isolated entities: their decisions to continue with a company’s services or to switch to a competitor are often influenced by friends and relatives.
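The first- and second-order featurization described above can be sketched like this. It is a minimal illustration under assumptions: the three feature definitions (degree, churned friends, churned friends of friends), the customer names, the set of known churners, and the traditional attribute names are hypothetical, chosen only to show how network attributes can sit next to traditional ones in a single row.

```python
from collections import defaultdict


def build_graph(edges):
    """Undirected graph as an adjacency map: customer -> set of directly linked customers."""
    graph = defaultdict(set)
    for a, b in edges:
        if a != b:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def network_features(graph, churned, customer):
    """Hypothetical first- and second-order churn-exposure attributes for one customer."""
    first = graph.get(customer, set())
    second = set()
    for friend in first:
        second |= graph.get(friend, set())
    second -= first | {customer}  # friends of friends only, excluding direct friends and self
    return {
        "degree": len(first),
        "churned_friends": len(first & churned),
        "churned_friends_of_friends": len(second & churned),
    }


# Hypothetical links, known churners, and traditional attributes.
graph = build_graph([
    ("ana", "ben"), ("ana", "cara"), ("ana", "dev"),
    ("ben", "cara"), ("eli", "ana"), ("dev", "eli"),
])
churned = {"ben", "eli"}
traditional = {"cara": {"age": 42, "contacts_last_90_days": 3}}

row = {**traditional["cara"], **network_features(graph, churned, "cara")}
print(row)
# {'age': 42, 'contacts_last_90_days': 3, 'degree': 2, 'churned_friends': 1, 'churned_friends_of_friends': 1}
```

Building one such row per customer produces an ordinary table that a regression model or decision tree can consume directly.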

Other Sources of Data

In addition to text data and network data, other sources of nontraditional data, such as voice, video, image, and streaming sensor data, can also be used effectively. The most common way to incorporate voice data into an analysis is to convert it to text and then apply text analytics techniques. For video and image data, an initial analysis typically involves finding similarities and detecting anomalies as well as temporal and spatial variations. In all such cases, you can take best advantage of your existing techniques by transforming the nontraditional data into a structured form and then mining it with traditional techniques.
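For streaming sensor data, one common way to reach a structured form is to summarize fixed-size windows of readings. The sketch below assumes non-overlapping windows and four summary statistics (mean, population standard deviation, minimum, maximum); the window size, feature set, and readings are hypothetical choices for illustration, not a method described in this article.

```python
import statistics


def window_features(stream, window):
    """Turn a stream of sensor readings into one structured row per full window.

    Readings are consumed one at a time, so this works on generators as well as
    lists; a trailing partial window is dropped.
    """
    buffer = []
    for reading in stream:
        buffer.append(reading)
        if len(buffer) == window:
            yield {
                "mean": statistics.mean(buffer),
                "std": statistics.pstdev(buffer),
                "min": min(buffer),
                "max": max(buffer),
            }
            buffer = []


# Hypothetical temperature readings; the spike in the first window shows up in "max" and "std".
readings = [20.0, 20.5, 21.0, 35.0, 20.1, 20.2, 20.3, 20.4, 19.9]
rows = list(window_features(readings, window=4))
```

Each resulting row has the same columns, so it can be joined to other attributes and fed to a traditional model; overlapping (sliding) windows are a common alternative when finer time resolution matters.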

Conclusion

Analyzing data of multiple types within one overarching framework is a rich area of research opportunity, and it can add valuable insight to business decisions.

References

  • Baesens, B., and Verbeke, W. 2012. “Social Networks in Data Mining: Challenges and Applications.” SAS Talks. http://support.sas.com/community/events/sastalks/presentations/SocialNetworksinDataMiningSAStalks.pdf.
  • Chakraborty, G., and Pagolu, M. 2014. “Analysis of Unstructured Data: Applications of Text Analytics and Sentiment Mining.” In Proceedings of the SAS Global Forum 2014 Conference. http://support.sas.com/resources/papers/proceedings14/1288-2014.pdf.
  • Nareddy, M., and Chakraborty, G. 2011. “Improving Customer Loyalty Program through Text Mining of Customers’ Comments.” In Proceedings of the SAS Global Forum 2011 Conference. http://support.sas.com/resources/papers/proceedings11/223-2011.pdf.

Radhika Kulkarni, Ph.D. is SAS Vice President for Advanced Analytics R&D, and a 2014 INFORMS Fellow.

Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media