Research & Development World

Contingency Tables: A Special Class of Analysis

By R&D Editors | September 6, 2013

Mark Anawis

Several tests can provide an understanding of the relationship between X and Y

Mark Twain wrote: “Facts are stubborn, but statistics are more pliable.” Statistics actually can be more “pliable,” although not in the sense that Mark Twain intended. Often, we have data that are categorical rather than continuous. Categorical data assume discrete values, whereas continuous data can assume an infinite number of values. Categorical data can be numeric on an interval scale with a fixed unit size, such as liters of product. They also can be numeric or character and ordinal (i.e., ordered levels), such as tumor classes (numeric) or days of the week (character), respectively. Lastly, they can be character and nominal (i.e., no ordered levels), such as male/female. Categorical variables can be either the independent (X) or the dependent (Y) variables. Examples of X categorical variables are those listed above. An example of a Y categorical variable is pass/fail.

Figure 1: Contingency Analysis of Eye Color By Gender

A special class of analysis exists where both X and Y variables are categorical. It is called a Contingency Table, or Cross Tabulation. It displays the levels of the X variable as rows and the levels of the Y variable as columns in a grid. Each cell is a combination of one level of the X variable and one level of the Y variable. It contains the count of the number of values which fall into that cell (‘O’ for observed). The expected frequency (E) of each cell can be calculated as the product of row total and column total divided by grand total. Then a cell chi-square value (χ²) can be calculated as (O – E)² / E. The individual chi-square values can be summed to calculate an overall chi-square value for hypothesis testing. This tests whether the variables are independent. The chi-square distribution is a non-symmetrical continuous distribution. Its shape depends on the degrees of freedom, which for a contingency table with r rows and c columns is (r – 1)(c – 1). For few degrees of freedom, it is strongly skewed to the right. As the degrees of freedom increase, it becomes more symmetrical. The alpha value is the amount of Type I error that is allowed. This is the probability of rejecting the null hypothesis when it is true.
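The expected-count and chi-square arithmetic can be sketched in a few lines of Python. The counts below are hypothetical and are not the data behind the article's figures. The function name is an illustration only, and the degrees of freedom are taken as (rows - 1)(columns - 1), the usual convention for a two-way table.

```python
import math

# Hypothetical counts (not the article's data): rows = gender, columns = eye color.
observed = [
    [20, 30],  # female: blue, brown
    [25, 25],  # male:   blue, brown
]

def chi_square(table):
    """Return (expected counts, cell chi-squares, Pearson chi-square, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # E = row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Cell chi-square = (O - E)^2 / E
    cells = [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
             for o_row, e_row in zip(table, expected)]
    statistic = sum(sum(row) for row in cells)
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, cells, statistic, dof

expected, cells, statistic, dof = chi_square(observed)
# With 1 degree of freedom the upper-tail probability has a closed form.
p_value = math.erfc(math.sqrt(statistic / 2)) if dof == 1 else None
print(expected, round(statistic, 4), dof, p_value)
```

If the resulting probability is larger than the chosen alpha, the null hypothesis of independence is not rejected.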

Contingency tables have tests which use the negative log-likelihood instead of the sums of squares found in ANOVA. They use different calculations to perform their tests. The model negative log-likelihood measures the reduction in uncertainty due to the model and is used to construct the chi-square test statistics. It is calculated as the difference between the corrected total negative log-likelihood and the error negative log-likelihood. The corrected total negative log-likelihood is the uncertainty when the probabilities are estimated by fixed rates for each response level. The error negative log-likelihood is the uncertainty calculated after fitting a model. An Rsquare is calculated as the ratio of the negative log-likelihood for the model to the negative log-likelihood for the corrected total.
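As a rough sketch of these quantities, the Python below computes the three negative log-likelihoods for a two-way table. It assumes that the fitted model gives each X level (row) its own response rates and that the corrected total uses the overall column rates. The counts and function name are hypothetical.

```python
import math

# Hypothetical counts (not the article's data): rows = X levels, columns = Y levels.
observed = [
    [20, 30],
    [25, 25],
]

def neg_log_likelihoods(table):
    """Return (corrected total, error, model) negative log-likelihoods."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Uncertainty when every row shares the same overall response rates.
    total = -sum(c * math.log(c / n) for c in col_totals if c > 0)
    # Uncertainty after fitting: each row has its own response rates.
    error = -sum(o * math.log(o / r)
                 for row, r in zip(table, row_totals)
                 for o in row if o > 0)
    return total, error, total - error

total, error, model = neg_log_likelihoods(observed)
r_square = model / total                 # share of uncertainty explained
likelihood_ratio_chi_square = 2 * model  # twice the model negative log-likelihood
print(round(model, 4), round(r_square, 4), round(likelihood_ratio_chi_square, 4))
```

Doubling the model value gives the same number as the familiar form 2 Σ O ln(O / E), which is a useful check on the arithmetic.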

Figure 2: Analysis of Means for Proportions

The Likelihood Ratio Chi-square test statistic is just twice the Model negative log-likelihood. The Pearson Chi-square test statistic is the sum of the cell chi-square values. For each test, the probability of obtaining a chi-square value greater than the one listed purely by chance, if no relationship exists between the X and Y variables, can be calculated. In the case of a 2 x 2 table, several additional results are available along with the Likelihood Ratio and Pearson Chi-square tests: Fisher’s Exact Test, which is based upon the hypergeometric probability and is applicable to either one-tailed or two-tailed hypotheses; relative risk; and odds ratios. If we examine a simple situation of gender versus eye color, we can see the following contingency table and tests (Figure 1).

The Fisher’s Exact test indicates that there is no statistically significant difference between genders with regard to eye color, as seen in the Prob for the 2-tail test. Nor is there a significant difference for either Females or Males, as seen in the Prob for the Left and Right one-tailed tests.
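The hypergeometric calculation behind Fisher's Exact Test is compact enough to sketch directly. The function below is an illustration, not JMP's implementation. Its two-tailed rule (sum every table no more probable than the observed one) is a common convention and is an assumption here.

```python
from math import comb

def fisher_exact(a, b, c, d):
    """Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Returns (left, right, two_tailed) probabilities for the top-left count.
    """
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {k: prob(k) for k in range(lo, hi + 1)}
    left = sum(p for k, p in probs.items() if k <= a)
    right = sum(p for k, p in probs.items() if k >= a)
    # Two-tailed: all tables no more probable than the observed one.
    cutoff = probs[a] * (1 + 1e-9)
    two_tailed = sum(p for p in probs.values() if p <= cutoff)
    return left, right, two_tailed

# Hypothetical 2 x 2 counts, not the article's data.
left, right, two_tailed = fisher_exact(3, 1, 1, 3)
print(left, right, two_tailed)
```

The small tolerance on the cutoff guards against floating-point ties between equally probable tables.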

Figure 3: Contingency Analysis of Test B By Test A

For Relative Risk, there are four combinations presented. Each is the ratio of the row percents of the genders for a given eye color. Let’s look at the second Relative Risk. This says that the probability of blue eye color is 1.29 times as high (29 percent higher) for males as for females. Lower and upper 95 percent confidence limits are also given for each ratio.

The Odds Ratio is calculated using a cross-product ratio. It says that the odds of having blue eyes for females are 0.66 times the odds for males (34 percent lower). Lower and upper 95 percent confidence limits also are given for the Odds Ratio.
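Both ratios are simple to compute from the four cell counts. The sketch below uses hypothetical counts rather than the article's data, and it builds the 95 percent limits with the common log-scale Wald approximation, which is an assumption and may differ from the method the software uses.

```python
import math

def relative_risk(a, b, c, d, z=1.96):
    """Risk of column 1 in row 1 relative to row 2, with log-scale Wald limits."""
    rr = (a / (a + b)) / (c / (c + d))
    se = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return rr, rr * math.exp(-z * se), rr * math.exp(z * se)

def odds_ratio(a, b, c, d, z=1.96):
    """Cross-product odds ratio (a * d) / (b * c), with log-scale Wald limits."""
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return ratio, ratio * math.exp(-z * se), ratio * math.exp(z * se)

# Hypothetical counts (not the article's data).
male_blue, male_brown, female_blue, female_brown = 25, 25, 20, 30

# Probability of blue eyes for males relative to females.
rr, rr_low, rr_high = relative_risk(male_blue, male_brown, female_blue, female_brown)
# Odds of blue eyes for females relative to males.
oratio, or_low, or_high = odds_ratio(female_blue, female_brown, male_blue, male_brown)
print(rr, (rr_low, rr_high), oratio, (or_low, or_high))
```

When an interval contains 1, the data are consistent with no difference between the two groups.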

An Analysis of Means of Proportions can be done to compare the proportions of the levels to the overall proportion using the normal approximation to the binomial. In this data set, we can see that the difference between females and males for blue eye color is not statistically significant at probability level 0.05 (Figure 2).
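A minimal sketch of the decision limits for the two-group case follows, assuming the normal approximation to the binomial. With only two groups the critical value reduces to the ordinary normal quantile. With more groups a dedicated analysis-of-means critical value is needed, which this sketch does not attempt. The counts and function name are hypothetical.

```python
import math
from statistics import NormalDist

def anom_two_proportions(successes, sizes, alpha=0.05):
    """Compare each of two group proportions with the overall proportion.

    Returns (overall proportion, list of (proportion, lower, upper, outside)).
    """
    if len(successes) != 2 or len(sizes) != 2:
        raise ValueError("this sketch only covers two groups")
    total = sum(sizes)
    p_bar = sum(successes) / total
    h = NormalDist().inv_cdf(1 - alpha / 2)  # two-group critical value
    results = []
    for x, n in zip(successes, sizes):
        # Standard error of (group proportion - overall proportion).
        se = math.sqrt(p_bar * (1 - p_bar) * (total - n) / (total * n))
        p = x / n
        results.append((p, p_bar - h * se, p_bar + h * se, abs(p - p_bar) > h * se))
    return p_bar, results

# Hypothetical data: blue-eyed counts out of 50 females and 50 males.
p_bar, results = anom_two_proportions([20, 25], [50, 50])
for p, lower, upper, outside in results:
    print(round(p, 3), round(lower, 3), round(upper, 3), outside)
```

A group whose proportion falls outside the limits differs from the overall proportion at the chosen alpha.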

In comparing variables with the same levels, such as Test A vs. Test B, an agreement Kappa statistic with standard error and confidence limits can be calculated. A Bowker’s test of symmetry also can be calculated where the null hypothesis is that the probabilities satisfy symmetry. In this data set, the agreement is high (Kappa = 0.86) and the Bowker’s probability is high, indicating that the null hypothesis of symmetry has not been rejected (Figure 3).
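Both statistics come straight from the square agreement table. The sketch below uses a hypothetical table, not the Test A vs. Test B data in Figure 3, and it omits the standard error and confidence limits.

```python
def kappa(table):
    """Cohen's kappa for a square agreement table (rows = Test A, columns = Test B)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    p_observed = sum(table[i][i] for i in range(k)) / n
    p_chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

def bowker(table):
    """Bowker's symmetry chi-square and its degrees of freedom, k(k - 1) / 2."""
    k = len(table)
    statistic = 0.0
    for i in range(k):
        for j in range(i + 1, k):
            pair = table[i][j] + table[j][i]
            if pair > 0:  # skip empty off-diagonal pairs to avoid dividing by zero
                statistic += (table[i][j] - table[j][i]) ** 2 / pair
    return statistic, k * (k - 1) // 2

# Hypothetical pass/fail results for the same items on two tests.
agreement = [
    [40, 5],
    [10, 45],
]
print(kappa(agreement), bowker(agreement))
```

For a 2 x 2 table, Bowker's statistic reduces to McNemar's test without a continuity correction.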

When ordinal variables are compared, measures of association such as Gamma, Kendall’s Tau-b, Stuart’s Tau-c, and Somers’ D are used to determine whether variable Y increases as X increases.
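These measures are all built from counts of concordant and discordant pairs. As an illustration, the sketch below computes Gamma and Kendall's Tau-b for a hypothetical table whose rows and columns are assumed to be listed in increasing order. The function names are not from any particular package.

```python
import math

def concordance(table):
    """Count concordant and discordant pairs in an ordered r x c table."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            for i2 in range(i + 1, rows):
                for j2 in range(cols):
                    if j2 > j:    # higher on X and higher on Y
                        concordant += table[i][j] * table[i2][j2]
                    elif j2 < j:  # higher on X but lower on Y
                        discordant += table[i][j] * table[i2][j2]
    return concordant, discordant

def gamma_and_tau_b(table):
    """Goodman-Kruskal Gamma and Kendall's Tau-b."""
    c, d = concordance(table)
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # Number of pairs not tied on X and not tied on Y, respectively.
    untied_x = (n * n - sum(r * r for r in row_totals)) / 2
    untied_y = (n * n - sum(t * t for t in col_totals)) / 2
    return (c - d) / (c + d), (c - d) / math.sqrt(untied_x * untied_y)

# Hypothetical ordered table (low/high X by low/high Y).
gamma, tau_b = gamma_and_tau_b([[10, 5], [5, 10]])
print(gamma, tau_b)
```

Gamma ignores tied pairs entirely, while Tau-b corrects for them, so Tau-b is usually the smaller of the two in absolute value.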

There also are tests for more complex situations. When there is a third classifying variable, the relationship between the two other variables can be tested while accounting for blocking by the classifying variable using the Cochran-Mantel-Haenszel test. In the case where one variable has two levels and acts as the dependent variable and the other variable is ordinal and acts as the independent variable, the Cochran-Armitage Trend test can be used.
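For the simplest blocked case, a set of 2 x 2 tables with one table per level of the classifying variable, the Cochran-Mantel-Haenszel statistic can be sketched as follows. The strata are hypothetical, and the version shown has no continuity correction, which is an assumption about the variant in use.

```python
import math

def cmh_statistic(strata):
    """Cochran-Mantel-Haenszel chi-square (1 degree of freedom, no continuity correction).

    strata: list of 2 x 2 tables [[a, b], [c, d]], one per blocking level.
    """
    diff = 0.0
    variance = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        row1, row2, col1, col2 = a + b, c + d, a + c, b + d
        diff += a - row1 * col1 / n  # observed minus expected top-left count
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    return diff ** 2 / variance

# Hypothetical data: the same 2 x 2 comparison repeated in two blocks.
strata = [
    [[10, 5], [5, 10]],
    [[8, 7], [6, 9]],
]
statistic = cmh_statistic(strata)
p_value = math.erfc(math.sqrt(statistic / 2))  # upper tail of chi-square, 1 df
print(round(statistic, 4), round(p_value, 4))
```

Pooling the differences within blocks, rather than collapsing the tables, is what keeps the blocking variable from confounding the comparison.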

Contingency Tables summarize information where both X and Y variables are categorical. They have several tests associated with them, which can provide an understanding of the relationship between these variables.

Note: All tables and analyses were generated using JMP version 10.0.2.

Mark Anawis is a Principal Scientist and ASQ Six Sigma Black Belt at Abbott. He may be reached at editor@ScientificComputing.com.

Copyright © 2022 WTWH Media LLC. All Rights Reserved. The material on this site may not be reproduced, distributed, transmitted, cached or otherwise used, except with the prior written permission of WTWH Media