Analyzing greater volumes of data at improved resolution, with access for users at all levels
Data mining is an information extraction activity that seeks to discover hidden facts contained in large relational databases. Reports generated from a data mining project, however, may be difficult for non-technical users to understand. They are typically summarized to make them easier to follow, but the summaries often omit important information that business users could act on. Nuggets of information uncovered through the data mining exercise thus prove to be about as valuable as fool’s gold.
Value of visual presentation
Why is it important to present interconnected information in a visual manner? There are three basic reasons:
• A vast majority of people are visual learners. When data is represented graphically, most of us arrive more quickly at the “ah-ha moment”: the instant when we see a correlation in data that we once only suspected was there.
• The more information being explored, and the more complex that information, the harder it is to see relationships between data, particularly for non-technical personnel such as sales and marketing staff. Information visualization products can examine information from a wide variety of database sources and find relationships that are nearly impossible to discern when the volume of data becomes overwhelming.
• Manually querying a data repository is tedious and exacting.
With some information visualization products, the user is essentially required to have a pre-existing idea of the relationships for which they are searching. Conversely, more advanced information visualization products act like ad hoc query tools, in that it is not necessary to have a specific query. The user can simply “fish” for relationships without any defined strategic direction. Taking this capability a step further, such programs as the Thinkmap Java-based information visualization tool allow users to dynamically define the mapping between the data and the visualization in real time. What’s more, these tools encourage exploration and discovery and, more importantly, assist users in communicating their findings quickly to analysts or to other parties external to their organization. The visual information extracted through Thinkmap can then be exported into Adobe PDFs or pulled into a PowerPoint presentation.
[Figure: A sample relationship map]
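To make the idea of “fishing” for relationships concrete, here is a minimal Python sketch of the data structure behind a relationship map. It is an illustration only and does not represent how Thinkmap or any other product works: the records, the rule that two people are related when they share a project, and the function names `build_relationship_map` and `neighbors_within` are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical records: each row links a person to a project (made-up data).
records = [
    ("Ana", "Project X"),
    ("Ben", "Project X"),
    ("Ben", "Project Y"),
    ("Cal", "Project Y"),
    ("Dee", "Project Z"),
]

def build_relationship_map(rows):
    """Link two people whenever they appear on the same project."""
    members = defaultdict(set)
    for person, project in rows:
        members[project].add(person)
    graph = defaultdict(set)
    for people in members.values():
        for a, b in combinations(sorted(people), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph

def neighbors_within(graph, start, hops):
    """Breadth-first walk: everyone reachable from `start` in at most `hops` steps."""
    seen = {start}
    frontier = {start}
    for _ in range(hops):
        # Expand one step outward, skipping anyone already visited.
        frontier = {n for node in frontier for n in graph[node]} - seen
        seen |= frontier
    return seen - {start}

relationship_map = build_relationship_map(records)
print(sorted(neighbors_within(relationship_map, "Ana", 1)))
print(sorted(neighbors_within(relationship_map, "Ana", 2)))
```

The user never writes a query here. Starting from any item and widening the number of hops surfaces connections, such as the indirect link between Ana and Cal, that no single row in the source data states.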
Information visualization
Enter information visualization, an interface that sits atop the data mining reporting program. With this interface, complex relationships uncovered in multidimensional data are presented in an interactive graphical format that is easier to comprehend and re-query to find more meaningful information from data mining results. Ultimately, it enables more people to take an active role in data mining activities and creates a greater possibility for new insight to be found and used by the people who actually have customer contact.
Information visualization can potentially do far more than just increase data comprehension. It can also allow the user to uncover a greater number of trends or patterns within the data, including less obvious ones, than can be discerned through a standard data mining report. The process resembles having a conversation with your data: underlying information (data, processes, relations or concepts) is expressed in such a way that patterns and trends are easier to observe than through conventional forms of analysis, such as standard reports, and users reach actionable decisions more efficiently as a result.
Because the field is relatively new, there still exists some confusion about what exactly constitutes an information visualization product and what does not. Essentially, there are four guidelines that help define this class of products:
• Information visualization is a breed of products that must have a graphic component tightly integrated with a data source, allowing for dynamic database queries. The end result should be a product that not only shows a flat graph, but also allows for further navigation.
• Information visualization products are end-user tools, and do not include products that are defined as database management, data warehousing, data acquisition, and so on.
• Information visualization products portray both qualitative and quantitative results. Products such as Excel that are limited to showing relative values in a graph provide a visual means of data comprehension, but not the means to work with qualitative results as one might find with data mining.
• Information visualization products are inherently analytical. Products that render 3-D imagery based on specification data should instead be classified as design software. Information visualization products do more than simply plot data; while plotting data is generally representative of historical analysis, information visualization offers some decision analysis components (“what if” capabilities, parametric graphics, and so forth).
Interestingly, while “modern” information visualization is a new discipline, the use of graphical methods to illustrate patterns or trends in data is not. Florence Nightingale is remembered as the mother of modern nursing. However, few realize that her place in history is at least partly linked to her use of graphical methods to convey complex statistical information dramatically to a broad audience. After witnessing deplorable sanitary conditions in the Crimea, she wrote Notes on Matters Affecting the Health, Efficiency and Hospital Administration of the British Army (1858), which includes several graphs of her own design that she called “Coxcombs.” Nightingale made it clear that far more deaths were attributable to non-battle causes (so-called “preventable causes”) than to battle-related causes.[1] A great resource for those looking to see examples of this type of visualization is Edward Tufte’s The Visual Display of Quantitative Information.
Practical applications
There are a number of applications for which information visualization is particularly well-suited:
• Gene Expression: Within the scientific community, we’ve seen adoption of information visualization applications in a number of different areas. Researchers exploring gene expression can look for gene patterns and correlate them to “soft” information, such as the number of articles published about a particular gene. This same type of visualization can be used within chemistry, as a means of navigating thousands of compounds, sifting through those that have certain similar properties.
• Pharmaceuticals: Effective marketing of pharmaceuticals requires an analysis of several factors, including public health information, doctor relationships and the interplay between cost and policy in treatment choice. Additionally, pharmaceutical research often involves analysis of multiple compounds in order to determine the best choice for production. Information visualization provides a more intuitive method of understanding professional influence, medical claims data, or even analysis of gene expressions.
• Intelligence community: Information visualization can present multiple dimensions of information that can be extremely useful in helping an analyst quickly sift through information to find patterns, filter the data live, and drill down to more meaningful result sets. These result sets can subsequently be exported via XML to other analytical packages.
• Knowledge management: Knowledge management describes the process of systematically leveraging information and internal expertise to improve innovation, productivity and competency. Through information visualization, it is possible to illustrate relationships that can help organizations improve teamwork and productivity. For example, access to a map of an organization’s divisional expertise connects employees with the colleagues and the data they need, resulting in quicker response rates, a reduction of redundant “innovation,” and an increase in employee satisfaction.
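The XML export mentioned under the intelligence community application can be sketched in a few lines of Python using the standard library. This is a hedged illustration, not the export format of any particular product: the element names (`resultSet`, `link`, `source`, `target`), the `relation` attribute and the sample rows are all assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical result set left after filtering; field names are illustrative.
results = [
    {"source": "Ana", "target": "Ben", "relation": "shared-project"},
    {"source": "Ben", "target": "Cal", "relation": "shared-project"},
]

def results_to_xml(rows):
    """Serialize each relationship as a <link> element under one root."""
    root = ET.Element("resultSet")
    for row in rows:
        link = ET.SubElement(root, "link", relation=row["relation"])
        ET.SubElement(link, "source").text = row["source"]
        ET.SubElement(link, "target").text = row["target"]
    return ET.tostring(root, encoding="unicode")

xml_text = results_to_xml(results)
print(xml_text)
```

Any package that can parse XML can then read the string held in `xml_text`, which is what makes a plain, self-describing format a reasonable choice for handing results to other analytical tools.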
Information visualization products also are being employed in some comparatively “exotic” applications. Some of these offerings are being used in genealogy to create representations of family trees. A number of law firms are relying on these products to present corporate taxonomy within the firm and find the legal documents they need. Museums are even using them to find correlations between different artists and artwork.
Conclusion
The future of information visualization lies not only in its ability to analyze even greater volumes of data, but in improved resolution when moving from thousands of items on screen to millions and back again. Different information visualization products work at different resolutions. When the user goes beyond a certain number of data items, resolution tends to break down; in fact, too much information can cause the screen to resemble a giant hairball. Ultimately, information visualization companies are developing ways to move more smoothly between resolutions. Regardless of the changes, however, the core of information visualization will remain the same, allowing people at all levels of an organization to converse with their data and, from these conversations, glean the patterns and trends that will help them become more efficient, productive and successful.
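One common way to avoid the “hairball” is to collapse individual items into summary nodes once a display budget is exceeded. The Python sketch below illustrates that general idea under stated assumptions; it is not taken from any specific product. The function name `summarize_for_display`, the `max_items` threshold and the sample documents are hypothetical.

```python
from collections import Counter

def summarize_for_display(items, max_items):
    """Return individual item names if they fit, else one summary node per group.

    `items` is a list of (name, group) pairs; `max_items` is a hypothetical
    display budget, not a figure taken from any product.
    """
    if len(items) <= max_items:
        return [name for name, _ in items]
    counts = Counter(group for _, group in items)
    return [f"{group} ({n} items)" for group, n in sorted(counts.items())]

items = [
    ("doc1", "contracts"),
    ("doc2", "contracts"),
    ("doc3", "patents"),
    ("doc4", "tax"),
]
print(summarize_for_display(items, max_items=10))  # few items: show each one
print(summarize_for_display(items, max_items=3))   # too many: show groups
```

Moving between resolutions then amounts to changing the budget or expanding a single group back into its members.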
Reference
1. www.math.yorku.ca/SCS/Gallery
Michael Freedman is co-founder and CEO of Thinkmap. He may be contacted at [email protected].