Psychometrics: An Introduction
Psychometrics is the area of psychology that specializes in how to measure what we talk and think about: how to assign numbers to observed psychological phenomena.
The chapter is divided into three sections: (1) types of psychological tests, (2) psychometric properties of tests, and (3) test user qualifications and administration of tests.
Where possible an effort has been made to address the context of disability determination; however, the chapter is primarily an introduction to psychological testing. The ensuing discussion lays out some of the distinctions among such tests; however, it is important to note that there is no one correct cataloging of the types of tests because the different categorizations often overlap.
Psychological tests can be categorized by the nature of the behavior they assess (what they measure), their administration, their scoring, and how they are used. The accompanying figure illustrates the types of psychological measures as described in this report.

[Figure: Components of psychological assessment]
NOTE: Performance validity tests do not measure cognition, but are used in conjunction with performance-based cognitive tests to examine whether the examinee is exerting sufficient effort to perform well and responding to the best of his or her ability.

The Nature of Psychological Measures

One of the most common distinctions made among tests is whether they are measures of typical behavior (often non-cognitive measures) versus tests of maximal performance (often cognitive tests) (Cronbach, , ). A measure of typical behavior asks those completing the instrument to describe what they would commonly do in a given situation.
Measures of typical behavior, such as personality, interests, values, and attitudes, may be referred to as non-cognitive measures.
A test of maximal performance, obviously enough, asks people to answer questions and solve problems as well as they possibly can. Because tests of maximal performance typically involve cognitive performance, they are often referred to as cognitive tests.
Most intelligence and other ability tests would be considered cognitive tests; they can also be known as ability tests, but this would be a more limited category.
Non-cognitive measures rarely have correct answers per se, although there are exceptions in some cases. It is through these two lenses—non-cognitive measures and cognitive tests—that the committee examines psychological testing for the purpose of disability evaluation in this report. One distinction among non-cognitive measures is whether the stimuli composing the measure are structured or unstructured.
A structured personality measure, for example, may ask people true-or-false questions about whether they engage in various activities or not. Those are highly structured questions.
On the other hand, in administering some commonly used personality measures, the examiner provides an unstructured projective stimulus, such as an inkblot or a picture. The test-taker is asked to describe what they see or what they imagine the inkblot or picture to be depicting.
The premise of these projective measures is that when presented with ambiguous stimuli an individual will project his or her underlying and unconscious motivations and attitudes. The scoring of these latter measures is often more complex than it is for structured measures. There is great variety in cognitive tests and what they measure, thus requiring a lengthier explanation.
Cognitive tests are often separated into tests of ability and tests of achievement; however, this distinction is not as clear-cut as some would portray it. Both kinds of tests involve what the test-taker has learned and can do. However, achievement tests typically involve learning from specialized education and training experiences, whereas most ability tests assess learning that has occurred in one's broader environment.
Some aspects of learning are clearly both; for example, vocabulary is learned at home, in one's social environment, and in school. Notably, the best predictor of intelligence test performance is one's vocabulary, which is why a vocabulary test is often given first during intelligence testing or, in some cases, constitutes the core of the intelligence test. Conversely, one can also have a vocabulary test based on words one learns only in an academic setting.
Intelligence tests are so prevalent in many clinical psychology and neuropsychology situations that we also consider them neuropsychological measures. Some abilities are measured using subtests from intelligence tests; certain working memory tests, for example, are intelligence subtests that are also administered on their own.
There are also standalone tests of many kinds of specialized abilities. Some ability tests are broken into verbal and performance tests. Verbal tests, obviously enough, use language to ask questions and demonstrate answers. Performance tests on the other hand minimize the use of language; they can involve solving problems that do not involve language.
They may involve manipulating objects, tracing mazes, placing pictures in the proper order, and finishing patterns, for example. This distinction is most commonly used in the case of intelligence tests, but can be used in other ability tests as well. Performance tests are also sometimes used when the test-taker lacks competence in the language of the testing.
Many of these tests assess visual spatial tasks. Historically, nonverbal measures were given as intelligence tests for non-English speaking soldiers in the United States as early as World War I.
These tests continue to be used in educational and clinical settings given their reduced language component. Cognitive tests are also distinguished as speeded tests versus power tests. A truly speeded test is one on which every test-taker could answer every question correctly if given enough time.
Some tests of clerical skills are exactly like this; they may have two lists of paired numbers, for example, where some pairings contain two identical numbers and other pairings are different. The test-taker simply circles the pairings that are identical. Pure power tests are measures in which the only factor influencing performance is how much the test-taker knows or can do.
A true power test is one where all test-takers have enough time to do their best; the only question is what they can do. Obviously, few tests are either purely speeded or purely power tests.
Most have some combination of both. For example, a testing company may use a rule of thumb that 90 percent of test-takers should complete 90 percent of the questions; however, it should also be clear that the purpose of the testing affects rules of thumb such as this. Few teachers would wish to have many students unable to complete the tests that they take in classes, for example. When test-takers have disabilities that affect their ability to respond to questions quickly, some measures provide extra time, depending upon their purpose and the nature of the characteristics being assessed.
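The 90/90 rule of thumb above can be expressed as a simple completion check. The function name and the completion counts below are invented for illustration:

```python
# Hypothetical check of the "90/90" rule of thumb: did at least 90 percent
# of test-takers reach at least 90 percent of the items? All names and
# counts here are invented for illustration.

def meets_completion_rule(items_total, items_reached, threshold=0.9):
    """items_reached: number of items each test-taker got to attempt."""
    completers = sum(1 for n in items_reached if n >= threshold * items_total)
    return completers / len(items_reached) >= threshold

# 9 of these 10 test-takers reached at least 45 of the 50 items
reached = [50, 48, 47, 45, 50, 49, 46, 50, 44, 50]
ok = meets_completion_rule(50, reached)  # True: 90 percent completed
```

As the surrounding text notes, the appropriate threshold depends on the purpose of the testing; 90/90 is only one convention.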
Questions on both achievement and ability tests can involve either recognition or free-response in answering. In educational and intelligence tests, recognition tests typically include multiple-choice questions where one can look for the correct answer among the options, recognize it as correct, and select it as the correct answer.
Free-response questions, in contrast, require one to recall or work out the answer without choosing from among alternative responses. This distinction also holds for some non-cognitive tests, but that case is discussed later in this section because it involves selection among preferences rather than recognition of correct answers.
For example, a recognition question on a non-cognitive test might ask someone whether they would rather go ice skating or to a movie; a free recall question would ask the respondent what they like to do for enjoyment.
Cognitive tests of various types can be considered as process or product tests. Take, for example, mathematics tests in school. In some instances, only getting the correct answer leads to a correct response. In other cases, teachers may give partial credit when a student performs the proper operations but does not get the correct answer.
Similarly, psychologists and clinical neuropsychologists often observe not only whether a person solves problems correctly (the product) but also how the person goes about solving them (the process).

Test Administration

One of the most important distinctions relates to whether tests are group administered or are individually administered by a psychologist, physician, or technician.
Tests that traditionally were group administered were paper-and-pencil measures. Often for these measures, the test-taker received both a test booklet and an answer sheet and was required, unless he or she had certain disabilities, to mark his or her responses on the answer sheet. In recent decades, some tests have come to be administered using technology (i.e., by computer or another electronic device). There may be some adaptive qualities to tests administered by computer, although not all computer-administered tests are adaptive; technology-administered tests are discussed further below.
An individually administered measure is typically given to the test-taker by a psychologist, physician, or technician. More confidence is often placed in individually administered measures, because the trained professional administering the test can make judgments during testing that inform administration, scoring, and other observations related to the test.
Tests can be administered in an adaptive or linear fashion, whether by computer or individual administrator. A linear test is one in which questions are administered one after another in a pre-arranged order. An adaptive test is one in which the test-taker's performance on earlier items affects the questions he or she receives subsequently.
Typically, if the test-taker answers the first questions correctly (or, in the case of a non-cognitive measure, in accordance with preset or expected response algorithms), the subsequent questions become progressively more difficult until the level appropriate to the examinee's performance is reached or the test is completed.
If the test-taker does not answer the first questions correctly (or as typically expected, in the case of a non-cognitive measure), then easier questions are generally presented.
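A minimal sketch of this adaptive branching, under the simplifying assumption of a fixed step-size update. Real adaptive tests use statistical item-response models; the item bank, the 0.5 step, and all names below are invented for illustration:

```python
# Toy adaptive test: each response nudges a running ability estimate up or
# down, and the next question is the unused item whose difficulty is
# closest to that estimate. This is an invented sketch, not a real
# item-response-theory implementation.

def run_adaptive_test(item_bank, answers_correctly, start=0.0, steps=5):
    """item_bank: item difficulties; answers_correctly: callable standing
    in for the examinee, returning True/False for a given difficulty."""
    ability = start
    administered = []
    remaining = sorted(item_bank)
    for _ in range(min(steps, len(remaining))):
        # choose the unused item closest to the current estimate
        item = min(remaining, key=lambda d: abs(d - ability))
        remaining.remove(item)
        correct = answers_correctly(item)
        administered.append((item, correct))
        # harder items follow correct answers, easier ones follow errors
        ability += 0.5 if correct else -0.5
    return ability, administered

# a simulated examinee who can handle items easier than difficulty 1.0
ability, administered = run_adaptive_test(
    [-2.0, -1.0, 0.0, 1.0, 2.0],
    answers_correctly=lambda difficulty: difficulty < 1.0,
    steps=4,
)
```

The estimate rises after correct answers and falls after errors, so the administered difficulties bracket the examinee's level rather than following a fixed order.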
Tests can be administered in written (keyboard or paper-and-pencil) fashion, orally, using an assistive device (most typically for individuals with motor disabilities), or in performance format, as previously noted. It is generally difficult to administer oral or performance tests in a group situation; however, some electronic media are making it possible to administer such tests without human examiners. Another distinction among measures relates to who the respondent is.
In most cases, the test-taker him- or herself is the respondent to any questions posed by the psychologist or physician. In the case of a young child, an individual with autism, or an individual who has lost language ability, for example, the examiner may need to ask others who know the individual (parents, teachers, spouses, other family members) how the individual behaves and to describe his or her personality, typical behaviors, and so on.
Scoring Differences

Tests are categorized as objectively scored, subjectively scored, or, in some instances, both. An objectively scored instrument is one on which correct answers are counted and either constitute, or are converted into, the final score. Such tests may be scored manually or using optical scanning machines, computerized software, other electronic media, or even templates (keys) placed over answer sheets so that a person can count the number of correct answers.
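Objective scoring as described above amounts to counting matches against a fixed key. A minimal sketch, with an invented answer key and response set:

```python
# Minimal sketch of objective scoring: responses are compared against a
# fixed answer key and the raw score is the count of matches. The key and
# responses here are invented for illustration.

def score_objective(answer_key, responses):
    return sum(1 for key, resp in zip(answer_key, responses) if key == resp)

answer_key = ["B", "D", "A", "C", "B"]
responses = ["B", "D", "C", "C", "A"]
raw_score = score_objective(answer_key, responses)  # 3 of 5 answers match
```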
Subjectively scored instruments, by contrast, rely on professional judgment: examiner ratings and interpretations of self-reports are converted to a score, whether numerical or not, using a rubric or scoring system.
Sometimes subjective scores may include both quantitative and qualitative summaries or narrative descriptions of an individual's performance.

Scores on tests are often considered to be either norm-referenced (normative) or criterion-referenced. Norm-referenced cognitive measures, such as college and graduate school admissions measures, inform test-takers where they stand relative to others in the distribution. For example, an applicant to a college may learn that she is at the 60th percentile, meaning that she scored better than 60 percent of the norm group and less well than 40 percent.
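The percentile interpretation above can be computed directly as the share of the norm group scoring below a given score. The norm-group scores below are invented for illustration:

```python
# Sketch of a percentile rank in the sense used above: the percentage of
# the norm group scoring below a given score. The norm-group scores are
# invented for illustration.

def percentile_rank(score, norm_scores):
    below = sum(1 for s in norm_scores if s < score)
    return 100.0 * below / len(norm_scores)

norm_group = [45, 50, 55, 60, 65, 70, 75, 80, 85, 90]
rank = percentile_rank(66, norm_group)  # beats 5 of 10 scores -> 50.0
```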
Likewise, most if not all intelligence tests are norm-referenced, and most other ability tests are as well. In recent years there has been more of a call for criterion-referenced tests, especially in education (Hambleton and Pitoniak, ). For criterion-referenced tests, one's score is compared not to the scores of other test-takers but to a fixed standard.
High school graduation tests, licensure tests, and other tests that decide whether test-takers have met minimal competency requirements are examples of criterion-referenced measures. When one takes a driving test to earn a driver's license, for example, one does not find out where one's driving falls in the distribution of national or statewide drivers; one simply passes or fails.
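A minimal sketch of the criterion-referenced decision, with an invented cut score of 80; note that no norm group appears anywhere in the logic:

```python
# Sketch of the criterion-referenced idea: the outcome depends only on a
# fixed standard, never on how other test-takers perform. The cut score
# of 80 is invented for illustration.

CUT_SCORE = 80

def criterion_referenced_result(raw_score, cut=CUT_SCORE):
    # no norm group enters this decision
    return "pass" if raw_score >= cut else "fail"

results = [criterion_referenced_result(s) for s in (85, 79, 80)]
```

The same score of 85 passes whether peers average 60 or 95, which is exactly the contrast with the norm-referenced percentile interpretation.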
Test Content

As noted previously, the most important distinction among most psychological tests is whether they assess cognitive versus non-cognitive qualities. In clinical psychological and neuropsychological settings such as those that are the concern of this volume, the most common cognitive tests are intelligence tests, other clinical neuropsychological measures, and performance validity measures. Many tests used by clinical neuropsychologists, psychiatrists, technicians, or others assess specific types of functioning, such as memory or problem solving.
Performance validity measures are typically short assessments, sometimes interspersed among components of other assessments, that help the psychologist determine whether the examinee is exerting sufficient effort to perform well and responding to the best of his or her ability. The most common non-cognitive measures in clinical psychology and neuropsychology settings are personality measures and symptom validity measures.
Some personality tests, such as the Minnesota Multiphasic Personality Inventory (MMPI), assess the degree to which someone exhibits behaviors that are seen as atypical relative to the norming sample. Symptom validity measures are scales, like performance validity measures, that may be interspersed throughout a longer assessment to examine whether a person is portraying him- or herself in an honest and truthful manner.
Somewhere between these two types of tests—cognitive and non-cognitive—are various measures of adaptive functioning that often include both cognitive and non-cognitive components.

Psychometric Properties of Tests

In evaluating the quality of psychological measures, we are traditionally concerned primarily with test reliability (i.e., the consistency of test scores) and test validity (i.e., whether scores measure what they are intended to measure). This section provides a general overview of these concepts to help orient the reader for the ensuing discussions in Chapters 4 and 5.
In addition, given the implications of applying psychological measures with subjects from diverse racial and ethnic backgrounds, issues of equivalence and fairness in psychological testing are also presented.

Reliability

Reliability refers to the degree to which scores from a test are stable and results are consistent.
When constructs are not reliably measured, the obtained scores will not approximate the true value of the psychological variable being measured. It is important to understand that observed or obtained test scores are considered to be composed of true and error elements. A standard error of measurement is often presented to describe, within a level of confidence (e.g., 95 percent), the range within which an individual's true score is likely to fall.
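Under classical test theory, the standard error of measurement follows from the test's standard deviation and reliability: SEM = SD * sqrt(1 - reliability). A sketch with invented scale values (SD 15, reliability .91):

```python
import math

# Standard error of measurement under classical test theory:
#   SEM = SD * sqrt(1 - reliability)
# The obtained score plus or minus z * SEM gives a confidence band for the
# true score. The scale values below (SD 15, reliability .91) are invented
# for illustration.

def standard_error_of_measurement(sd, reliability):
    return sd * math.sqrt(1.0 - reliability)

def true_score_band(obtained, sd, reliability, z=1.96):
    sem = standard_error_of_measurement(sd, reliability)
    return obtained - z * sem, obtained + z * sem

sem = standard_error_of_measurement(sd=15, reliability=0.91)   # 4.5
band = true_score_band(obtained=100, sd=15, reliability=0.91)  # ~ (91.2, 108.8)
```

On this invented IQ-style scale, an obtained 100 is consistent, at 95 percent confidence, with true scores roughly between 91 and 109.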
Reliability is generally assessed in four ways:

1. Test-retest: consistency of test scores over time (stability, temporal consistency);
2. Inter-rater: consistency of test scores among independent judges;
3. Parallel or alternate forms: consistency of scores across different forms of the test (stability and equivalence); and
4. Internal consistency: consistency of different items intended to measure the same thing within the test (homogeneity).

A special case of internal consistency reliability is split-half reliability, in which scores on two halves of a single test are compared and the comparison is converted into an index of reliability.
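The split-half procedure can be sketched as follows: correlate the two half-test scores, then apply the Spearman-Brown correction, r_full = 2r / (1 + r), to estimate reliability at full test length. The half-test scores for five hypothetical test-takers are invented:

```python
# Sketch of split-half reliability with the Spearman-Brown correction.
# The half-test scores are invented for illustration.

def pearson_r(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def spearman_brown(r_half):
    # estimates reliability at double (i.e., full) test length
    return 2 * r_half / (1 + r_half)

odd_item_half = [10, 12, 14, 16, 18]
even_item_half = [11, 12, 15, 15, 19]
r_half = pearson_r(odd_item_half, even_item_half)
estimated_reliability = spearman_brown(r_half)  # higher than r_half
```

The correction is needed because each half is only half as long as the real test, and shorter tests are less reliable.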
A number of factors can affect the reliability of a test's scores. These include the time between two administrations (which affects test-retest and alternate-forms reliability) and the similarity of content and of subjects' expectations across different elements of the test (in alternate-forms, split-half, and internal consistency approaches). In addition, changes in subjects over time, whether introduced by physical ailments, emotional problems, or the subject's environment, as well as test-based factors such as poor test instructions, subjective scoring, and guessing, will also affect test reliability.
It is important to note that a test can generate reliable scores in one context and not in another, and that the inferences that can be made from different estimates of reliability are not interchangeable (Geisinger, ).

Validity

While the scores resulting from a test may be deemed reliable, this does not necessarily mean that scores from the test have validity.
In discussing validity, it is important to highlight that validity refers not to the measure itself (i.e., a test is not inherently valid or invalid) but to the interpretation of its scores. To be considered valid, the interpretation of test scores must be grounded in psychological theory and empirical evidence demonstrating a relationship between the test and what it purports to measure (Furr and Bacharach, ; Sireci and Sukin, ). Historically, the fields of psychology and education have described three primary types of evidence related to validity (Sattler, ; Sireci and Sukin, ): 1.
Construct evidence of validity: the degree to which an individual's test scores correlate with the theoretical concept the test is designed to measure (i.e., the construct);
2. Content evidence of validity: the degree to which the test content represents the targeted subject matter and supports a test's use for its intended purposes; and
3. Criterion-related evidence of validity: the degree to which the test's scores correlate with other measurable, reliable, and relevant variables (i.e., criteria).
Other kinds of validity with relevance to SSA have been advanced in the literature but are not completely accepted in professional standards as types of validity per se. These include:

1. Diagnostic validity: the degree to which psychological tests truly aid in the formulation of an appropriate diagnosis;
2. Ecological validity: the degree to which test scores represent everyday levels of functioning; and
3. Cultural validity: the degree to which test content and procedures accurately reflect the sociocultural context of the subjects being tested.

Each of these forms of validity poses complex questions regarding the use of particular psychological measures with the SSA population.
For example, ecological validity is especially critical in the use of psychological tests with SSA given that the focus of the assessment is on examining everyday levels of functioning.
Measures like intelligence tests have sometimes been criticized for lacking ecological validity (Groth-Marnat, ; Groth-Marnat and Teal, ). More recent discussions of validity have shifted toward an argument-based approach, using a variety of evidence to build a case for the validity of test score interpretation (Furr and Bacharach, ). In this approach, construct validity is viewed as an overarching paradigm under which evidence from multiple sources is gathered to support the interpretation of test scores.
Five key sources of validity evidence that affect the degree to which a test fulfills its purpose are generally considered (AERA et al., ):

1. Test content: Does the test content reflect the important facets of the construct being measured? Are the test items relevant and appropriate for measuring the construct and congruent with the purpose of testing?
2. Relation to other variables: Is there a relationship between test scores and other criteria or constructs that are expected to be related?
3. Internal structure: Does the actual structure of the test match the theoretically based structure of the construct?
4. Response processes: Are respondents applying the theoretical constructs or processes the test is designed to measure?
5. Consequences of testing: What are the intended and unintended consequences of testing?

Standardization and Testing Norms

As part of the development of any psychometrically sound measure, explicit methods and procedures by which tasks should be administered are determined and clearly spelled out.
This is commonly known as standardization. Typical standardized administration procedures include (1) a quiet, relatively distraction-free environment; (2) precise reading of scripted instructions; and (3) provision of the necessary tools or stimuli.
All examiners use such methods and procedures during the process of collecting the normative data, and the same procedures normally should be used in any other administration, which enables application of the normative data to the individual being evaluated (Lezak et al., ).
Standardized tests provide a set of normative data (i.e., norms) against which an individual's performance can be compared. Norms consist of transformed scores such as percentiles, cumulative percentiles, and standard scores. Without standardized administration, the individual's performance may not accurately reflect his or her ability.
For example, an individual's abilities may be overestimated if the examiner provides additional information or guidance beyond what is outlined in the test administration manual. Conversely, a claimant's abilities may be underestimated if appropriate instructions, examples, or prompts are not presented.
When nonstandardized administration techniques must be used, norms should be used with caution due to the systematic error that may be introduced into the testing process; this topic is discussed in detail later in the chapter.
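The transformed scores mentioned above can be sketched as standard-score conversions: a raw score is first expressed as a z score relative to the norm group, then rescaled, for example to a T score (mean 50, SD 10). The norm-group scores below are invented:

```python
from statistics import mean, pstdev

# Sketch of standard-score transformations against an invented norm group:
# z score (mean 0, SD 1) and T score (mean 50, SD 10).

def z_score(raw, norm_scores):
    return (raw - mean(norm_scores)) / pstdev(norm_scores)

def t_score(raw, norm_scores):
    return 50 + 10 * z_score(raw, norm_scores)

norm_group = [40, 45, 50, 55, 60]
z = z_score(60, norm_group)  # about 1.41 SDs above the norm-group mean
t = t_score(60, norm_group)  # about 64 on the T-score scale
```

The same raw score thus takes on meaning only relative to the norm group through which it is transformed.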
It is important to clearly understand the population for which a particular test is intended. The standardization sample is another name for the norm group. Norms enable one to make meaningful interpretations of obtained test scores, such as making predictions based on evidence. Developing appropriate norms depends on the size and representativeness of the sample. In general, the more people in the norm group, the closer the approximation to the population distribution, so long as they represent the group that will be taking the test.
Norms should be based upon representative samples of individuals from the intended test population, as each person should have an equal chance of being in the standardization sample. Stratified samples enable the test developer to identify particular demographic characteristics represented in the population and more closely approximate these features in proportion to the population.
For example, intelligence test scores are often established based upon census-based norming with proportional representation of demographic features including race and ethnic group membership, parental education, socioeconomic status, and geographic region of the country.
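Census-based proportional allocation can be sketched as follows; each stratum receives a share of the norm sample matching its share of the population. The region names, shares, and sample size below are invented for illustration:

```python
# Sketch of proportional (census-based) stratified allocation for a norm
# sample. The strata and population shares are invented for illustration.

def allocate_stratified(population_shares, sample_size):
    return {stratum: round(sample_size * share)
            for stratum, share in population_shares.items()}

shares = {"Northeast": 0.17, "Midwest": 0.21, "South": 0.38, "West": 0.24}
allocation = allocate_stratified(shares, sample_size=2000)
# e.g. the "South" stratum gets 38 percent of the 2,000 norm cases
```

With shares that do not divide evenly, rounding can leave the total a few cases off, so actual norming studies adjust the final counts.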
When tests are applied to individuals for whom the test was not intended and who, hence, were not included as part of the norm group, inaccurate scores and subsequent misinterpretations may result.