Research & Development World


Contingency Tables: A Special Class of Analysis

By R&D Editors | September 6, 2013

Mark Anawis

Several tests can provide an understanding of the relationship between X and Y

Mark Twain wrote: “Facts are stubborn, but statistics are more pliable.” Statistics actually can be more “pliable,” although not in the sense that Mark Twain intended. Often, we have data that are categorical rather than continuous. Categorical data assume discrete values, whereas continuous data can assume an infinite number of values. Categorical data can be numeric with intervals of a fixed unit size, such as liters of product. They also can be ordinal (i.e., with ordered levels), whether numeric, such as tumor classes, or character, such as days of the week. Lastly, they can be character and nominal (i.e., with no ordered levels), such as male/female. Categorical variables can be either the independent (X) or the dependent (Y) variables. The examples listed above are X categorical variables; an example of a Y categorical variable is pass/fail.

Figure 1: Contingency Analysis of Eye Color By Gender

A special class of analysis exists where both X and Y variables are categorical. It is called a Contingency Table, or Cross Tabulation. It displays the levels of the X variable as rows and the levels of the Y variable as columns in a grid. Each cell is a combination of one level of the X variable and one level of the Y variable, and contains the count of the observations that fall into that cell (O, for observed). The expected frequency (E) of each cell can be calculated as the product of the row total and the column total divided by the grand total. A cell chi-square value (χ²) can then be calculated as (O – E)² / E. The individual cell chi-square values can be summed to give an overall chi-square value for hypothesis testing, which tests whether the variables are independent. The chi-square distribution is a non-symmetrical continuous distribution whose shape depends on the degrees of freedom; for a contingency table these are (r – 1)(c – 1), where r and c are the numbers of rows and columns. For few degrees of freedom, the distribution is strongly skewed to the right. As the degrees of freedom increase, it becomes more symmetrical. The alpha value is the amount of Type I error that is allowed, that is, the probability of rejecting the null hypothesis when it is true.
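The expected-frequency and cell chi-square arithmetic can be sketched in a few lines of Python. This is a minimal illustration, not the JMP implementation; the function names and the counts are hypothetical, and the degrees of freedom are taken as (rows – 1)(columns – 1), the usual convention for an r x c table.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def pearson_chi_square(observed):
    """Return (overall chi-square, degrees of freedom) for an r x c table of counts."""
    expected = expected_counts(observed)
    # Sum the cell chi-square values (O - E)^2 / E over every cell.
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi_square, df


# Hypothetical counts (not the article's data): rows are X levels, columns are Y levels.
observed = [[10, 20], [30, 40]]
chi_square, df = pearson_chi_square(observed)
print(expected_counts(observed), chi_square, df)
```

The sketch assumes every expected count is greater than zero; a table with an empty row or column would need to be trimmed first.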

Contingency tables have tests that use the negative log-likelihood instead of the sums of squares found in ANOVA, so they use different calculations to perform their tests. The model negative log-likelihood measures the reduction in uncertainty due to the model and is used to construct the chi-square test statistics. It is calculated as the difference between the corrected total negative log-likelihood and the error negative log-likelihood. The corrected total negative log-likelihood is the uncertainty when the probabilities are estimated by fixed rates for each response level. The error negative log-likelihood is the uncertainty calculated after fitting a model. An R-square is calculated as the ratio of the negative log-likelihood for the model to the negative log-likelihood for the corrected total.
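As a rough sketch of these quantities, the three negative log-likelihoods can be computed directly from the cell counts. The function names and counts below are hypothetical, and the code assumes the rows are the X levels and the columns are the response (Y) levels; it illustrates the definitions above and is not taken from any particular package.

```python
import math


def negative_log_likelihoods(observed):
    """Return (model, error, corrected total) negative log-likelihoods for a count table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Corrected total: one fixed rate per response (column) level, ignoring X.
    total = -sum(c * math.log(c / grand_total) for c in col_totals if c > 0)
    # Error: response rates fitted separately within each X (row) level.
    error = -sum(
        o * math.log(o / r)
        for row, r in zip(observed, row_totals)
        for o in row
        if o > 0
    )
    return total - error, error, total


def likelihood_ratio_summary(observed):
    """Return (likelihood ratio chi-square, R-square) as described in the text."""
    model, _error, total = negative_log_likelihoods(observed)
    return 2 * model, model / total


# Hypothetical counts, not the article's data.
g_square, r_square = likelihood_ratio_summary([[10, 20], [30, 40]])
print(g_square, r_square)
```

The R-square here is undefined when the response has only one observed level, because the corrected total is then zero.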

Figure 2: Analysis of Means for Proportions

The Likelihood Ratio Chi-square statistic is just twice the Model negative log-likelihood. The Pearson Chi-square statistic is the sum of the cell chi-square values, (O – E)² / E. For each test, the reported probability is that of obtaining a chi-square value greater than the one listed purely by chance if no relationship exists between X and Y. In the case of a 2 x 2 table, along with the Likelihood Ratio and Pearson Chi-square tests, there is also Fisher’s Exact Test, which is based upon the hypergeometric probability and is applicable to either one-tailed or two-tailed hypotheses, and relative risk and odds ratios can be calculated. If we examine a simple situation of gender versus eye color, we can see the following contingency table and tests (Figure 1).

Fisher’s Exact Test indicates that there is no statistically significant difference between genders with regard to eye color, as seen in the Prob for the two-tailed test. Nor is there a significant difference in either direction, favoring Females or Males, as seen in the Prob for the Left and Right one-tailed tests.
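The hypergeometric calculation behind Fisher’s Exact Test is small enough to sketch by hand. The version below is an illustration under stated assumptions: the tails are defined on the top-left cell count, the two-tailed probability sums all tables no more likely than the observed one (one common convention), and the counts are hypothetical rather than the eye color data.

```python
from math import comb


def fisher_exact_2x2(table):
    """Return (left, right, two-tailed) probabilities for a 2 x 2 table of counts."""
    (a, b), (c, d) = table
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    low, high = max(0, col1 - row2), min(row1, col1)
    probs = {x: prob(x) for x in range(low, high + 1)}
    p_observed = probs[a]
    left = sum(p for x, p in probs.items() if x <= a)
    right = sum(p for x, p in probs.items() if x >= a)
    # Small relative tolerance so floating-point ties are counted as "as extreme".
    two_tailed = sum(p for p in probs.values() if p <= p_observed * (1 + 1e-7))
    return left, right, min(two_tailed, 1.0)


# Hypothetical counts, not the article's data.
left, right, two_tailed = fisher_exact_2x2([[3, 1], [1, 3]])
print(left, right, two_tailed)
```

Which tail corresponds to which gender depends on how the rows and columns are ordered, so the Left and Right labels in a software report should be read against its own table layout.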

Figure 3: Contingency Analysis of Test B By Test A

For Relative Risk, there are four combinations presented. Each is the ratio of the row percents of the two genders for a given eye color. Let’s look at the second Relative Risk. Its value of 1.29 says that the probability of blue eye color is 29 percent higher for males than for females. Lower and upper 95 percent confidence limits are also given for each ratio.

The Odds Ratio is calculated using a cross-product ratio. Its value of 0.66 says that the odds of having blue eyes are 34 percent lower for females than for males. Lower and upper 95 percent confidence limits also are given for the Odds Ratio.
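Both ratios follow directly from the four cell counts of a 2 x 2 table. The sketch below uses hypothetical counts and function names, and its confidence limits use the common large-sample approximation on the log scale; that interval method is an assumption here, since the article does not say how its limits were computed.

```python
import math


def relative_risk(a, b, c, d, z=1.96):
    """Ratio of row proportions a/(a+b) to c/(c+d), with approximate 95% limits."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)


def odds_ratio(a, b, c, d, z=1.96):
    """Cross-product ratio (a*d)/(b*c), with approximate 95% limits."""
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


# Hypothetical 2 x 2 counts laid out as [[a, b], [c, d]]; not the article's data.
print(relative_risk(10, 20, 30, 40))
print(odds_ratio(10, 20, 30, 40))
```

Both functions assume all four counts are greater than zero. An interval that includes 1 is consistent with no difference between the two rows.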

An Analysis of Means for Proportions can be done to compare the proportion at each level to the overall proportion using the normal approximation to the binomial. In this data set, we can see that the difference between females and males for blue eye color is not statistically significant at the 0.05 significance level (Figure 2).

In comparing variables with the same levels, such as Test A vs. Test B, an agreement Kappa statistic with standard error and confidence limits can be calculated. A Bowker’s test of symmetry also can be calculated, where the null hypothesis is that the probabilities satisfy symmetry. In this data set, the agreement is high (Kappa = 0.86) and the Bowker’s probability is high, indicating that the null hypothesis of symmetry has not been rejected (Figure 3).
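A minimal sketch of both agreement statistics follows, using hypothetical counts for two tests with the same levels (rows for one test, columns for the other). The function names are illustrative. The p-value is returned only for one degree of freedom, where the chi-square tail can be obtained from the standard library; larger tables would need a chi-square distribution function from a statistics package.

```python
import math


def cohen_kappa(table):
    """Agreement beyond chance: (observed - expected agreement) / (1 - expected agreement)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_observed = sum(table[i][i] for i in range(len(table))) / n
    p_expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    return (p_observed - p_expected) / (1 - p_expected)


def bowker_symmetry(table):
    """Return (statistic, df, p_value); p_value is None unless df == 1."""
    k = len(table)
    statistic, df = 0.0, 0
    for i in range(k):
        for j in range(i + 1, k):
            pair_total = table[i][j] + table[j][i]
            if pair_total > 0:  # off-diagonal pairs with no counts carry no information
                statistic += (table[i][j] - table[j][i]) ** 2 / pair_total
                df += 1
    # For one degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2)) if df == 1 else None
    return statistic, df, p_value


# Hypothetical counts, not the article's Test A / Test B data.
table = [[40, 5], [5, 50]]
print(cohen_kappa(table), bowker_symmetry(table))
```

For a 2 x 2 table, this symmetry statistic reduces to McNemar’s test without a continuity correction.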

When ordinal variables are compared, measures of association such as Gamma, Kendall’s Tau-b, Stuart’s Tau-c, and Somers’ D are used to determine whether variable Y increases as X increases.
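These measures are all built from counts of concordant and discordant pairs of observations. As an illustration, Gamma, the simplest of them, can be sketched as follows; the function names and counts are hypothetical, and the rows and columns are assumed to be listed in increasing order of their levels.

```python
def concordant_discordant(table):
    """Count concordant and discordant pairs in a table with ordered rows and columns."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):  # only pairs in which the second row is higher
                for m in range(cols):
                    if m > j:  # column also higher: the pair agrees in direction
                        concordant += table[i][j] * table[k][m]
                    elif m < j:  # column lower: the pair disagrees in direction
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D); ranges from -1 to 1, ignoring tied pairs."""
    concordant, discordant = concordant_discordant(table)
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts, not the article's data.
print(goodman_kruskal_gamma([[10, 20], [30, 40]]))
```

A positive value suggests that Y tends to increase as X increases, and a negative value suggests the reverse. Tau-b, Tau-c, and Somers’ D use the same pair counts but adjust differently for ties.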

There also are tests for more complex situations. When there is a third classifying variable, the relationship between the two other variables can be tested while accounting for blocking by the classifying variable using the Cochran-Mantel-Haenszel test. In the case where one variable has two levels and acts as the dependent variable and the other variable is ordinal and acts as the independent variable, the Cochran-Armitage Trend test can be used.
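For the simplest stratified case, a 2 x 2 table within each level of the classifying variable, the Cochran-Mantel-Haenszel statistic can be sketched as below. This is an illustration under assumptions: no continuity correction is applied, the function name and counts are hypothetical, and software packages may report additional or differently corrected variants.

```python
import math


def cmh_statistic(strata):
    """Return (statistic, p_value) for a list of 2 x 2 tables, one per stratum."""
    diff_sum = 0.0
    var_sum = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        if n < 2:  # a stratum this small contributes no variance
            continue
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        # Observed minus expected top-left count, given the stratum's margins.
        diff_sum += a - row1 * col1 / n
        # Hypergeometric variance of the top-left count.
        var_sum += row1 * row2 * col1 * col2 / (n ** 2 * (n - 1))
    statistic = diff_sum ** 2 / var_sum
    # The statistic has one degree of freedom, so the tail is erfc(sqrt(x / 2)).
    return statistic, math.erfc(math.sqrt(statistic / 2))


# Hypothetical strata, not the article's data.
strata = [[[10, 20], [20, 40]], [[8, 4], [3, 9]]]
print(cmh_statistic(strata))
```

The sketch assumes at least one stratum has non-zero variance; otherwise the statistic is undefined.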

Contingency Tables summarize information where both X and Y variables are categorical. They have several tests associated with them, which can provide an understanding of the relationship between these variables.

Note: All tables and analyses were generated using JMP version 10.0.2.

Mark Anawis is a Principal Scientist and ASQ Six Sigma Black Belt at Abbott. He may be reached at [email protected].


Copyright © 2025 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media