Classical Test Theory in Psychometrics Explained
Classical Test Theory (CTT), also known as True Score Theory, is one of the oldest and most widely used frameworks in psychometrics. It provides a mathematical and conceptual foundation for understanding the reliability and validity of psychological and educational tests. CTT is particularly useful for analyzing test scores and improving the quality of measurements in fields such as psychology, education, and the social sciences.
CTT serves as the foundation for many standardized tests and psychological assessments, and it remains a cornerstone of psychometrics due to its simplicity and practical applicability. This introduction covers the core concepts, assumptions, formulas, applications, and limitations of CTT.
Observed Score, True Score, and Error
In CTT, an individual's observed score on a test is considered a combination of two components: the true score and the error score. This relationship is expressed as:

$$X = T + E$$

where:
- X : Observed score (the score actually obtained by an individual on a test).
- T : True score (the hypothetical score that would be obtained if there were no measurement error).
- E : Error score (the difference between the observed score and the true score, representing random measurement error).
The true score is a theoretical construct that reflects the individual's actual level of the trait being measured, while the error score represents random fluctuations due to factors such as test administration, environmental conditions, or individual variability (Lord & Novick, 1968).
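The decomposition above can be illustrated with a small simulation. This is a minimal sketch, not part of the original text: the function name `simulate_scores`, the normal distributions, and all numeric defaults (mean 50, true-score SD 10, error SD 5) are assumptions chosen purely for illustration.

```python
import random
import statistics


def simulate_scores(n_examinees=5000, true_mean=50.0, true_sd=10.0,
                    error_sd=5.0, seed=0):
    """Simulate CTT scores: each observed score is X = T + E.

    All distributions and parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    # True scores: each examinee's hypothetical error-free score.
    true_scores = [rng.gauss(true_mean, true_sd) for _ in range(n_examinees)]
    # Error scores: random noise with mean zero, drawn independently of T.
    errors = [rng.gauss(0.0, error_sd) for _ in range(n_examinees)]
    # Observed scores: the only quantity available in a real testing situation.
    observed = [t + e for t, e in zip(true_scores, errors)]
    return true_scores, errors, observed


true_scores, errors, observed = simulate_scores()
print("mean error:", round(statistics.mean(errors), 3))
print("observed variance:", round(statistics.variance(observed), 1))
print("true + error variance:",
      round(statistics.variance(true_scores) + statistics.variance(errors), 1))
```

In a simulation the true and error scores are known, so the mean error can be checked against zero and the observed variance compared with the sum of the two component variances. With real test data only the observed scores are available.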
Assumptions of Classical Test Theory
CTT is based on several key assumptions:
- Linearity: The observed score is a linear combination of the true score and error.
- Unbiasedness: The expected value of the error score is zero, i.e., $\mathbb{E}(E) = 0$.
- Independence: The true score and error score are uncorrelated, i.e., $\rho_{TE} = 0$.
- Homoscedasticity: The variance of error scores is constant across all true score levels, i.e., $\operatorname{Var}(E \mid T) = \sigma_E^2$ for every value of $T$.
These assumptions provide the foundation for the mathematical derivations and interpretations in CTT (Crocker & Algina, 1986).
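One consequence of these assumptions is that the variance of observed scores splits into a true score part and an error part. A short derivation, writing $\sigma_X^2$, $\sigma_T^2$, and $\sigma_E^2$ for the observed, true, and error score variances:

$$
\sigma_X^2 = \operatorname{Var}(T + E) = \sigma_T^2 + \sigma_E^2 + 2\operatorname{Cov}(T, E) = \sigma_T^2 + \sigma_E^2
$$

The covariance term vanishes because the true score and error score are assumed to be uncorrelated.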
Reliability in Classical Test Theory
Definition of Reliability
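Reliability is a central concept in CTT, referring to the consistency or stability of test scores. It is defined as the proportion of variance in the observed scores that is attributable to the true scores:

$$\rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}$$

where $\sigma_T^2$ is the true score variance and $\sigma_X^2$ is the observed score variance. Since the observed score variance is the sum of the true score variance and the error variance $\sigma_E^2$, reliability can also be expressed as:

$$\rho_{XX'} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}$$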
Reliability ranges from 0 to 1, with higher values indicating greater consistency in test scores (Nunnally & Bernstein, 1994).
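As a worked example of the definition, the sketch below computes reliability from a true score variance and an error variance. The function name `reliability` and the numbers are hypothetical. In practice the true score variance is not observable, so reliability has to be estimated rather than computed this way.

```python
def reliability(true_var, error_var):
    """Reliability as true score variance divided by observed score variance.

    Assumes observed variance = true variance + error variance (the CTT
    decomposition). Inputs are illustrative, not estimated from data.
    """
    if true_var < 0 or error_var < 0:
        raise ValueError("variances must be non-negative")
    observed_var = true_var + error_var
    if observed_var == 0:
        raise ValueError("observed variance must be positive")
    return true_var / observed_var


# Hypothetical test: true score variance 100, error variance 25.
print(reliability(100.0, 25.0))  # 0.8
```

Here 80% of the observed score variance is attributable to true score differences and the remaining 20% to measurement error.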
Types of Reliability
Several methods are used to estimate reliability in CTT:
- Test-Retest Reliability: Consistency of scores over time.
- Parallel-Forms Reliability: Consistency of scores across different versions of the test.
- Internal Consistency: Consistency of scores across items within the same test (e.g., Cronbach's alpha, which increases with the number of items and with the average correlation among them).
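Of the methods listed, internal consistency is the easiest to compute from a single test administration. The sketch below uses the common variance-based formula for Cronbach's alpha, alpha = k / (k - 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the response data are hypothetical.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of respondent rows (one score per item)."""
    n_items = len(item_scores[0])
    if n_items < 2:
        raise ValueError("alpha needs at least two items")
    if any(len(row) != n_items for row in item_scores):
        raise ValueError("every respondent needs a score on every item")
    # Sample variance of each item across respondents.
    item_vars = [statistics.variance([row[j] for row in item_scores])
                 for j in range(n_items)]
    # Sample variance of the total (sum) score across respondents.
    total_var = statistics.variance([sum(row) for row in item_scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents answering 3 items on a 1-5 scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 4, 5],
    [3, 3, 4],
]
print(round(cronbach_alpha(responses), 3))  # 0.956
```

The value is high here because the three made-up items rank the respondents almost identically. Real data sets are much larger and typically yield lower values.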
Standard Error of Measurement (SEM)
The Standard Error of Measurement (SEM) quantifies the precision of an individual's test score by estimating the variability due to random measurement error. It is calculated as:

$$\mathrm{SEM} = \sigma_X \sqrt{1 - \rho_{XX'}}$$

where $\sigma_X$ is the standard deviation of the observed scores and $\rho_{XX'}$ is the reliability of the test. A smaller SEM indicates greater precision in measurement. The SEM is also used to build a confidence interval around an individual's observed score, indicating the range within which their true score is likely to fall:

$$X \pm z \cdot \mathrm{SEM}$$

where z represents the critical value from the standard normal distribution (e.g., 1.96 for a 95% confidence level).
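A minimal sketch of the SEM calculation, assuming the standard formulas SEM = SD × sqrt(1 − reliability) and interval = observed score ± z × SEM. The function names and the example values (SD of 15, reliability of 0.91, observed score of 110) are hypothetical.

```python
import math


def standard_error_of_measurement(observed_sd, reliability):
    """SEM = SD of observed scores times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return observed_sd * math.sqrt(1.0 - reliability)


def true_score_interval(observed_score, observed_sd, reliability, z=1.96):
    """Confidence interval around an observed score: X +/- z * SEM."""
    sem = standard_error_of_measurement(observed_sd, reliability)
    return observed_score - z * sem, observed_score + z * sem


# Hypothetical test: observed-score SD of 15 and reliability of 0.91.
sem = standard_error_of_measurement(15.0, 0.91)
low, high = true_score_interval(110.0, 15.0, 0.91)
print(round(sem, 2))                  # 4.5
print(round(low, 2), round(high, 2))  # 101.18 118.82
```

With these values, a person who scores 110 has a 95% interval of roughly 101 to 119, which shows how much uncertainty remains even on a fairly reliable test.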
Validity in Classical Test Theory
While reliability concerns the consistency of test scores, validity refers to the extent to which a test measures what it purports to measure. Validity is not directly quantified in CTT but is assessed through various forms of evidence, including:
- Content Validity: The extent to which the test content represents the domain of interest.
- Criterion-Related Validity: The relationship between test scores and an external criterion.
- Construct Validity: The degree to which the test measures the theoretical construct it is intended to measure.
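Of the three forms of evidence listed, criterion-related validity is the one most often summarized with a single number: the correlation between test scores and the external criterion. The sketch below computes a Pearson correlation by hand. The function name `pearson_r` and both data sets are hypothetical, and using the Pearson coefficient is an assumption about how the relationship is quantified.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: selection test scores and later job performance ratings.
test_scores = [52, 61, 47, 70, 65, 58]
performance = [3.1, 3.8, 2.9, 4.2, 3.5, 3.6]
print(round(pearson_r(test_scores, performance), 2))
```

A coefficient near 1 would suggest that the test scores track the criterion closely, while a coefficient near 0 would suggest little relationship.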
Applications of Classical Test Theory
CTT is applied in various fields, including:
- Employment Testing: Organizations use CTT-based analyses to check the reliability of hiring tests and to establish cut-off scores for decision-making.
- Psychological Assessments: Personality tests such as the Big Five Test can utilise CTT to estimate measurement precision of major personality dimensions, including Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism.
- Survey Research: CTT-based reliability analysis helps ensure consistency in measuring attitudes, opinions, and behaviors.
- Educational Testing: Standardized tests such as the SAT and GRE rely on CTT principles to evaluate test reliability.
Limitations of Classical Test Theory
Despite its widespread use, CTT has several limitations:
- Dependence on Test Length: Reliability estimates can change based on the number of items in the test. Longer tests generally yield higher reliability estimates.
- Sample Dependency: The reliability and validity of a test are dependent on the sample from which it was derived, making generalization difficult (Traub, 1997).
- Equal Error Assumption: CTT assumes that measurement error is the same for all examinees, which may not hold in practice. Some individuals may be more prone to errors than others.
- Random Error Assumption: CTT assumes that measurement error is random and does not account for systematic error.
- Limited Ability to Handle Missing Data: Traditional CTT techniques do not offer robust solutions for dealing with missing responses in test data.
- Inability to Model Item-Specific Characteristics: CTT works mainly at the level of total scores and does not explicitly model the difficulty or discrimination of individual items, unlike Item Response Theory (IRT), which provides a more detailed analysis of individual test items.
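The dependence on test length noted in the first limitation can be made concrete with the Spearman-Brown prophecy formula, which predicts the reliability of a test lengthened by a factor n using comparable items: new reliability = n × r / (1 + (n − 1) × r). This formula is not stated in the text above and is brought in here as an illustration. The function name `spearman_brown` and the example values are hypothetical.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability after changing test length by `length_factor`.

    Assumes the added (or removed) items are comparable to the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical test with reliability 0.70, doubled and then halved in length.
print(round(spearman_brown(0.70, 2.0), 3))  # 0.824
print(round(spearman_brown(0.70, 0.5), 3))  # 0.538
```

Doubling the test raises the predicted reliability from 0.70 to about 0.82, and halving it lowers the prediction to about 0.54, so reliability coefficients from tests of different lengths are not directly comparable.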
Alternatives: Item Response Theory (IRT)
Unlike CTT, Item Response Theory (IRT) considers item-level data and models the probability of a correct response based on an individual's ability. Key differences include:
| Feature | Classical Test Theory (CTT) | Item Response Theory (IRT) |
| --- | --- | --- |
| Focus | Total test scores | Individual item performance |
| Error Treatment | Assumes equal error for all test takers | Models error for each test item |
| Application | Simpler, easier to compute | More complex, requires computational models |
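To show what modeling the probability of a correct response looks like, here is a minimal sketch using the two-parameter logistic (2PL) model, one common IRT model. The text does not name a specific model, so the choice of 2PL is an assumption. The function name `prob_correct` and the parameter values are hypothetical.

```python
import math


def prob_correct(ability, difficulty, discrimination=1.0):
    """2PL item response function: P(correct) = 1 / (1 + exp(-a * (theta - b))).

    ability (theta), difficulty (b), and discrimination (a) are on the usual
    IRT logit scale; the values used below are illustrative only.
    """
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# One hypothetical item (difficulty 0.5, discrimination 1.2) at three ability levels.
for theta in (-1.0, 0.5, 2.0):
    print(theta, round(prob_correct(theta, difficulty=0.5, discrimination=1.2), 3))
```

When ability equals the item's difficulty the probability is exactly 0.5, and it rises toward 1 as ability increases. Each item carries its own parameters, which is the item-level information a CTT total score does not provide.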
Conclusion
Classical Test Theory remains a foundational framework in psychometrics, providing essential tools for understanding and improving the quality of psychological and educational measurements. While newer models, such as Item Response Theory (IRT), have expanded on CTT, its principles continue to play a vital role in test development, evaluation, and interpretation because they are simple to apply in practice.