
Psychometrics and assessment techniques

 



  ©Team Focus 2004

FAQs

Click on any of the questions below to go to the answer.
 


What is a psychometric test?

How can I know if a particular test or questionnaire is worth using?

What does reliability mean?

What does validity mean?

Can psychometric tests replace conventional methods of assessing people?

Why use ability tests when a personality test can tell you all you need to know?

Is a personality test always better than 'human judgement'?

What are 'Level A' and 'Level B'?

What are the advantages and disadvantages of on-line testing as compared with the traditional pencil-and-paper method?

How are tests scored?

Why are some tests timed and others not?

How are psychometric tests constructed?

Can tests be practised?
 


 

What is a psychometric test?

A psychometric test is a standardised method for assessing a mental or psychological attribute such as a skill, an ability or a dimension of personality. To say it is 'standardised' means that the assessment is made in a standardised manner (i.e. in the same way for all people and from occasion to occasion), that the person's responses to the test are recorded and scored in a standardised manner (e.g. by the use of a fixed scoring key) and that the results are interpreted in a standardised manner (for example, by comparing the scores obtained by an individual to those obtained by a large sample of people who have taken the test previously).

 

How can I know if a particular test or questionnaire is worth using?

Firstly you need to know that the test has been developed according to sound psychometric principles and whether there is evidence of its reliability (its ability to give consistent results) and its validity (its ability to assess what it is supposed to assess). This is a complex area and you will probably not be able to determine the answer to these questions unless you are a psychologist or have undergone special training in psychometric assessment. However, you should be able to seek advice on these issues from a consultancy firm specialising in psychometric assessment, particularly if the firm employs qualified industrial / occupational psychologists.

Secondly, you need to know whether the test or questionnaire will answer the particular questions you are asking. For example, if you are selecting for a particular position, one of the principal questions you are likely to be asking is "how well would this candidate perform in the job a year from now?". In this case, you would need to know that the test is able to make valid predictions about future performance in work of the type in question. You might also need to know whether the person will work well in a team environment. For this purpose, you would need to know whether the test is able to provide scores on personality dimensions relevant to team work. More than likely, no single test or questionnaire will be able to answer all the questions you need answers to and you will probably need to use a combination of tests and questionnaires, in addition to other selection methods. Your consultancy firm should be able to help you select a battery of techniques suitable for the purpose in question.

 

What does reliability mean?

Reliability is essentially a measure of a test's ability to produce consistent results. What this means is that if the person were to take the test on several different occasions then, practice effects aside, they would obtain the same result. To say 'practice effects aside' means that one has to take into consideration the obvious point that if a person takes an ability test several times, they are likely to improve their score each time simply due to the effect of practice. When assessing the reliability of a test, psychologists are able to use statistical techniques to control for this effect.

There are different types of reliability and different ways of assessing it, though each method is normally expressed in terms of an index called the reliability coefficient. This index ranges from 0 to 1, with 0 meaning zero reliability and 1 meaning 100% reliability. The reliability of good ability and reasoning tests is normally somewhere between 0.7 and 0.9. The reliability of personality measures tends to be lower for a variety of reasons, but should desirably be around 0.7 or above. For reputable tests from reputable publishers, you should be able to check the reliability of the test by consulting the test manual or enquiring directly of the test publishers.
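As a rough illustration of one such method, the sketch below estimates test-retest reliability as the Pearson correlation between scores from two sittings of the same test. The scores are invented for illustration, the function name `pearson_r` is our own, and no correction for practice effects is attempted.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Invented scores for six people who took the same test on two occasions.
first_sitting = [12, 18, 25, 31, 36, 40]
second_sitting = [14, 17, 27, 30, 38, 41]

test_retest = pearson_r(first_sitting, second_sitting)
print(f"test-retest reliability estimate: {test_retest:.2f}")
```

A value close to 1 indicates that people keep roughly the same rank order from one sitting to the next. Real reliability studies use far larger samples than this.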

Reliability is considered to be an essential characteristic of any test. If a test is unable to measure something consistently, then it is unlikely to lead to valid inferences being drawn about the person being assessed. Nevertheless, reliability by itself is not sufficient. A test must also be valid for the purpose for which it is going to be used and it is validity which is in fact the most important consideration when evaluating a test.

 

What does validity mean?

The term validity can be defined in two ways: (a) the ability of the test to assess what it is supposed to assess, and (b) the ability of the test to allow valid inferences to be drawn from its results. These are in fact two ways of saying the same thing. For example, suppose we are using a personality questionnaire which contains a dimension called "Social Confidence". We firstly wish to know that the score on this dimension really does reflect the amount of social confidence a person has - or putting it differently, that people with high scores on this dimension really will show more confidence in social settings than people who score low on the dimension. Alternatively, we might say that we want to be confident in any inferences we make about the person on the basis of their score on this dimension. For example, if we wish to make the inference that people who score high on this dimension will perform well in customer service jobs, then we would want to know that this would be a correct inference which would be borne out in practice.

Psychologists measure validity in different ways and refer to different types of validity - though it is not always easy to understand the difference between the different types of validity. We say a test or questionnaire has 'construct validity' if indeed the test measures the psychological attribute (or construct) that it purports to measure. Thus, an intelligence test is said to be valid if it does indeed measure how much intelligence a person has. Unfortunately however, there is a chicken-and-egg problem here in that in order to know if this is the case, we need to have some independent measure of a person's intelligence to check the results of the test against. And then we need to have some further means of assessing the construct validity of that measure. It is a never ending problem.

For this reason, in employment situations (particularly in selection situations), a second form of validity, 'predictive validity', is thought to be more relevant. Predictive validity is the ability of a test or questionnaire to make specific predictions about something which can be objectively assessed. For example, we could assess the predictive validity of a dimension of 'Sociability' by seeing if people who score highly on that scale tend to spend more time in the company of friends and associates than those who score low on the scale. Or for a dimension of 'Sales Potential', we could follow up a group of sales personnel who had been given the questionnaire at selection and see if those who got high scores on the Sales Potential dimension really did have higher sales figures a year later than those who got low scores.
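The Sales Potential follow-up can be sketched in a few lines. The data below are invented purely for illustration, and splitting the group at the median score is just one simple way of comparing 'high' and 'low' scorers. A real validation study would use a much larger sample and a proper statistical test.

```python
from statistics import mean, median

# Invented data: (Sales Potential score at selection, sales a year later).
follow_up = [
    (4, 110), (9, 205), (6, 150), (3, 95),
    (8, 190), (5, 140), (7, 160), (2, 100),
]

# Split the group at the median questionnaire score.
cut = median(score for score, _ in follow_up)
high = [sales for score, sales in follow_up if score > cut]
low = [sales for score, sales in follow_up if score <= cut]

# A positive difference is consistent with (but does not prove) predictive validity.
difference = mean(high) - mean(low)
print(f"high scorers: {mean(high)}, low scorers: {mean(low)}, difference: {difference}")
```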

Just as with reliability, validity is an essential characteristic of a test. Before choosing a test to use for a given purpose, you should always check for evidence of its validity. Such evidence can be found either in the test manual or by enquiring directly of the test publisher.

 

Can psychometric tests replace conventional methods of assessing people?

The answer to this is partly yes and partly no. Properly developed and valid psychometric tests can certainly replace less reliable and valid techniques. For example, unstructured interviews when used to assess both ability and personality are certainly far less effective than psychometric methods. Judgements about a person's ability solely on the basis of their educational qualifications also leave much to be desired and are far better done by means of tests. However, tests and questionnaires can only provide part of the picture. Their principal function is as a supplement to both traditional methods and other objective methods of assessment. A balanced assessment program, depending on the purposes to be achieved, should include structured interviews, tests and questionnaires and objective assessments of behaviour (for example structured exercises, role-plays and other observation methods).

 

Why use ability tests when a personality test can tell you all you need to know?

If a personality test can assess such dimensions as social confidence, leadership, rationality and so on, why should you need to use ability tests or other assessment methods to measure these characteristics? The problem is that what personality tests assess is not ability but inclination and disposition. If a person scores highly on a personality dimension of rationality, this means not that they are capable of rational thinking but that they have particular personality characteristics which dispose them to a rational rather than an irrational style of thinking. A particular individual might be very highly disposed to such a style of thinking but, because he/she lacks certain intellectual and cognitive skills, actually be extremely poor when it comes to rational thinking. In general therefore, if you need to know how good someone is at doing something, the best approach is to watch them doing it (by using a standardised objective exercise). If such a standardised exercise is not possible, the next best thing is to use a psychometric test designed to assess the ability in question. Personality measures of ability-related dispositions come a very poor third after these.

 

Is a personality test always better than 'human judgement'?

Let us imagine that there exist certain individuals who, without having had training in psychology or a related discipline, just happen to have a great deal of insight into people. How would one expect such people to fare in competition with personality questionnaires? Well, they would probably do extremely well. Some people really do have great insight and really could tell you more about a person than a personality questionnaire could. And if your company's chief interviewer happens to be such a person, then you are in luck and you can save a great deal of money on personality testing.

The problem is - how do you know whether your chief interviewer is such a person? Because they themselves say so? A lot of people are extremely confident in their ability as judges of other people, but in most cases their confidence has little to support it. Could you yourself check out their ability as judges of people? Well here you have a problem as the only way you could check out their ability would be by being an expert judge of people yourself. And then someone else is going to have to confirm that you are indeed such an expert - and so the problem goes on.

To cut a long story short, there no doubt are a few people around who can make better judgements of personality than can be made by a personality questionnaire. The problem is that they are probably very few and far between and it is extremely difficult to find out who they are. Much better to give up the hope of finding the Holy Grail and invest some money in getting your people trained in personality assessment.

Having said this however, it nevertheless remains the case that a personality questionnaire by itself can only provide you with a set of scores, one for each dimension assessed by the questionnaire. After that point you really do need to use expert judgement. Some people will naturally be better at this than others, but all will need to be carefully trained in the sorts of inferences one can make from personality scores and how each dimension in the questionnaire relates to particular aspects of job performance.

 

What are 'Level A' and 'Level B'?

Level A and Level B are the two levels of training which form the principal stages in the British Psychological Society's (BPS's) certification system for psychometric training. Level A is the first stage and covers the basic principles of psychometrics and qualifies you to use and purchase ability and reasoning tests and other simple instruments. Level B is the second stage and qualifies you to administer and interpret personality questionnaires.

Level A and Level B courses are offered by the major test publishers and by other training providers in the UK. To offer these courses, an organisation must have been 'verified' by the BPS to ensure that those responsible for training are fully conversant with the basic syllabus set out by the BPS and are assessing course participants appropriately. Click here if you would like to find out more about the Level A and Level B courses offered by Team Focus.

 

What are the advantages and disadvantages of on-line testing as compared with the traditional pencil-and-paper method?

The principal advantages of on-line testing are convenience, speed and cost. With on-line testing, you do not need people to come to your premises to take a test. All you need is to send them the link and their entry codes (e.g. organisation code, access code, password) for your on-line facility and this allows them to take the tests and questionnaires on-line whenever and wherever is convenient to them.

As regards speed, as soon as the questionnaire responses are submitted, the data is analysed and a report is created and e-mailed to you (with perhaps also a feedback report sent to the test-taker as well). All of this happens in less than a second after the data has been submitted.

As regards cost, on-line test administration and reports are very significantly cheaper than pencil-and-paper test materials and bureau-service reports. However, by far the greatest saving is in terms of staff time that would otherwise be involved in the test administration, scoring and report writing processes.

With advanced online systems, there are also considerable advantages in terms of the administration and management of testing sessions. Team Focus's 'Profiling for Success' on-line testing facility, for example, has a Client Area where clients can set up access codes for their testing sessions. Access codes determine precisely which tests will be administered during a session, what reports will be generated and to whom the reports will be sent. Clients can also order credits on-line for their usage of the Profiling for Success system.

A disadvantage of on-line testing is that it is not possible to make reliable assessments of ability in selection situations if the candidate takes the test remotely. Obviously, there would be nothing to prevent them having a more able friend sitting next to them telling them the answers to the questions. This is not however to say that on-line testing cannot be used for ability assessment. Many organisations conduct on-line assessment of ability by getting candidates to take the test under supervised conditions while attending for interview. Although this requires the time of an administrator to supervise the testing, it still saves considerable time that would otherwise have been required for scoring and report writing.

 

How are tests scored?

Tests are normally scored using an answer key of some sort. In the case of ability tests, the answer key provides the 'correct' answer to each question and usually a positive number (most commonly 1) is assigned for a correct answer and a zero is assigned for an incorrect answer (though there might be exceptions to this general rule). The scores are then totalled to obtain what is called the "Raw Score" on the test.
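A minimal sketch of this kind of scoring is shown below. The answer key, the item labels and the function name `raw_score` are all made up for illustration, and one point per correct answer is assumed.

```python
def raw_score(responses, answer_key, points_per_item=1):
    """Total the points for responses that match the answer key."""
    return sum(
        points_per_item
        for item, correct in answer_key.items()
        if responses.get(item) == correct
    )

# Hypothetical four-item answer key and one candidate's responses.
answer_key = {"q1": "b", "q2": "d", "q3": "a", "q4": "c"}
responses = {"q1": "b", "q2": "a", "q3": "a"}  # q4 left unanswered

score = raw_score(responses, answer_key)
print(score)  # 2
```

Unanswered questions simply score zero here, in the same way as incorrect answers.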

In the case of personality questionnaires, there are no right answers as such. In this case, particular responses will contribute to one or other side of a personality dimension. For example, suppose a personality questionnaire contains a question such as "I like going to parties", to which the possible responses are (a) agree, (b) uncertain and (c) disagree. If the person gives response (a), this might then contribute a score of 2 points to a scale of Sociability. A response of (b) might contribute just one point to this same scale and a response of (c) would contribute zero points. In the case of other questionnaires, the different responses to a given question might actually contribute to different scales. Whichever way it is done, one finishes up with a set of raw scores, one raw score for each dimension which is assessed by the questionnaire.
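The scoring of a personality scale can be sketched in the same way. In the hypothetical scoring key below, the first question and its points follow the example above. The second question is invented to show an item where agreement counts against the scale.

```python
from collections import defaultdict

# Hypothetical scoring key: question -> response -> (scale, points).
SCORING_KEY = {
    "I like going to parties": {
        "a": ("Sociability", 2),
        "b": ("Sociability", 1),
        "c": ("Sociability", 0),
    },
    # Invented item: agreeing contributes nothing to Sociability.
    "I prefer to spend evenings alone": {
        "a": ("Sociability", 0),
        "b": ("Sociability", 1),
        "c": ("Sociability", 2),
    },
}

def scale_raw_scores(responses, scoring_key=SCORING_KEY):
    """Add up the points each response contributes to its scale."""
    totals = defaultdict(int)
    for question, response in responses.items():
        scale, points = scoring_key[question][response]
        totals[scale] += points
    return dict(totals)

responses = {
    "I like going to parties": "a",
    "I prefer to spend evenings alone": "b",
}
raw_scores = scale_raw_scores(responses)
print(raw_scores)  # {'Sociability': 3}
```

Because each response names its own scale, the same structure also covers questionnaires in which different responses to one question feed different scales.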

Except for certain special types of test, the raw scores by themselves do not actually provide very much information on how well the person has done or what their personality is like. Suppose for instance that a person scores 27 points out of a maximum of 50 on a test of verbal ability. On the face of it, that doesn't sound particularly good. However, if most other people who take the same test score below 20, then a score of 27 is extremely good. The same applies with the raw scores on personality dimensions. Is a raw score of 18 points out of 26 on a scale of Extraversion high or low? We don't know until we can find out how other people have scored.

For this reason, we normally take the scoring one final step further, using what are called 'norm tables'. Norm tables are used to convert raw scores into standardised scores. The standardised scores express the person's performance on the test in terms of a comparison with how a large sample of other people have performed. For example, using a norm table in the first example above, we might find that raw scores higher than 27 were obtained by only 5 percent of a large sample of people who took the test on a previous occasion (and conversely that 95 percent of people scored 27 or less). So the raw score of 27 is actually a very high score. We would express the result by saying that the person scored "at the 95th percentile".

Using the norm table is very simple. We just look up the raw score in one column and then go over to the percentile score in another column. Of course it goes without saying that if we are using a computerised or on-line test, all of this, including the basic calculation of the raw scores is done for us by a computer program, so saving a great deal of administrative time.
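The look-up itself might be sketched as follows. The norm table here is entirely hypothetical (only the row placing a raw score of 27 at the 95th percentile comes from the example above), and returning 1 for scores below the first row is our own assumption.

```python
from bisect import bisect_right

# Hypothetical norm table: (raw score, percentile) rows, sorted by raw score.
NORM_TABLE = [(10, 5), (15, 20), (18, 40), (20, 60), (23, 80), (27, 95), (35, 99)]

def percentile_for(raw, table=NORM_TABLE):
    """Return the percentile of the highest row whose raw score is <= raw."""
    raw_scores = [row_raw for row_raw, _ in table]
    idx = bisect_right(raw_scores, raw) - 1
    if idx < 0:
        return 1  # assumption: scores below the first row get the 1st percentile
    return table[idx][1]

print(percentile_for(27))  # 95
print(percentile_for(19))  # 40
```

Published norm tables normally list every possible raw score, so no such in-between look-up is needed. The shortened table simply keeps the example small.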

As it happens, there are different types of standard score used with different tests. The percentile score as described above is the most common. But you may also encounter the Sten score (ranging from 1 to 10 with scores of 5 and 6 being in the middle), the Stanine score (ranging from 1 to 9 with a score of 5 at the middle) and the T-score (ranging from 20 to 80 with 50 in the middle).
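These three scales are usually derived from the z-score, that is, the number of standard deviations by which a raw score lies above or below the mean of the norm group. The sketch below uses the conventional definitions (T-scores have a mean of 50 and a standard deviation of 10; Stens and Stanines each have a standard deviation of 2, with means of 5.5 and 5). The norm-group mean and standard deviation in the example are invented.

```python
from math import floor

def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - norm_mean) / norm_sd

def t_score(z):
    """T-score: mean 50, standard deviation 10."""
    return 50 + 10 * z

def sten(z):
    """Sten: mean 5.5, SD 2, rounded to a whole number from 1 to 10."""
    return min(10, max(1, floor(5.5 + 2 * z + 0.5)))

def stanine(z):
    """Stanine: mean 5, SD 2, rounded to a whole number from 1 to 9."""
    return min(9, max(1, floor(5 + 2 * z + 0.5)))

# Invented norm group: mean raw score 18, standard deviation 6.
z = z_score(27, norm_mean=18, norm_sd=6)
print(z, t_score(z), sten(z), stanine(z))  # 1.5 65.0 9 8
```

Some test manuals derive these scores from percentile bands rather than from the formulae above, so the manual for the test in question remains the authority.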

 

Why are some tests timed and others not?

Generally speaking, personality questionnaires and other instruments which assess attitudes, interests and similar traits are not timed. The person can take as long as they wish to answer all the questions. Nevertheless, when administering a personality questionnaire, the test administrator may encourage the person taking the test not to spend too long thinking about the questions and to try to give the answer that comes to them most naturally and instinctively.

On the other hand, most tests of ability and aptitude are timed. It is important that the timing is very precise - just another minute working on the test might significantly improve a person's score. It is important therefore that candidates work as quickly as they can, though not so quickly that they begin to make mistakes. If a candidate gets bogged down trying to find the answer to a particular question, he/she may waste a great deal of time which could have been devoted to answering other questions. Normally it is possible to return to earlier questions so the best policy in such circumstances would be to move on to other questions and then return to the difficult question at the end.

 

How are psychometric tests constructed?

A full answer to this question would require a large technical volume. However, the basic principles can be set out simply.

The first stage in developing a test is to undertake research into the area of ability or personality that is to be assessed by the test. In the case of ability tests, this would involve researching the components that make up the ability in question, the way the ability is expressed in actual performance at tasks, the sorts of questions or exercises which would tap into that ability and so on. In the case of personality it is somewhat more complex and involves looking into the nature of personality and how it is structured in terms of particular dimensions. Personality questionnaires are normally based on a particular 'model' of personality and it should be said that there are a variety of different models of personality, each of which offers a different perspective upon human behaviour.

Once the initial research stage has been completed, the next stage involves writing trial test items (questions) and testing these out on preliminary samples of people. The purpose of this testing stage is to see if the items work in practice, whether they are reliable and whether each item appears to be assessing the ability or dimension of personality it is supposed to assess. This stage of the process can be very lengthy, involving several cycles of writing and rewriting test items and testing them out in practice.

The next stage is to assemble the best items from the initial trialling stages into something that will correspond to the final test. This test is once again trialled and the data from the trialling will allow a final version to be constructed.

The last stage is called 'standardisation'. This is where the final version of the test is administered to a large sample of people (the 'standardisation sample'), which is representative of the sort of people the test will be used on in the future. The sample would include at least several hundred people and often significantly more than this. The data acquired during this stage is used to construct the norm tables which will later be used to convert the raw scores obtained from the test into 'standardised scores'. These standardised scores express an individual's test result in terms of a comparison with the performance of the people in the standardisation sample.
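To give a feel for how a norm table comes out of the standardisation data, the sketch below turns a sample of raw scores into a percentile table, taking the percentile for a raw score to be the percentage of the sample scoring at or below it. The twenty scores are invented and far fewer than any real standardisation sample would contain.

```python
from collections import Counter

def build_norm_table(sample_scores):
    """Map each observed raw score to the % of the sample scoring at or below it."""
    counts = Counter(sample_scores)
    n = len(sample_scores)
    table = {}
    running = 0
    for raw in sorted(counts):
        running += counts[raw]
        table[raw] = round(100 * running / n)
    return table

# Tiny invented 'standardisation sample' of 20 raw scores.
sample = [9, 11, 12, 13, 14, 15, 15, 16, 17, 17,
          18, 18, 18, 19, 19, 20, 22, 24, 27, 33]

norms = build_norm_table(sample)
print(norms[27])  # 95
print(norms[18])  # 65
```

With this sample, 19 of the 20 people scored 27 or less, so a raw score of 27 falls at the 95th percentile.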

 

Can tests be practised?

There are competing arguments about whether practice affects performance on ability or aptitude tests. For tests such as the Scholastic Assessment Test (SAT), which is used to predict college performance in America, test preparation or ‘coaching’ is a big industry and many claims are made by coaching companies for the efficacy of their services.

The true claims for practice can be better understood if we look at two separate components that go to make up test performance. The first of these, and the one which test users are really interested in measuring, is a person’s true ability in the area being measured. This ability is likely to remain fairly static, although it may fluctuate slightly with experience. For example, ability with numbers may increase if a person has to do a lot of number work and this, in turn, may be reflected in their score on a numerical reasoning test. Conversely, not using numbers for some time is likely to make a person far slower when completing the same numerical test, and so have the effect of lowering their overall score. Some improvements in score may occur if a respondent has recently taken the same test. In this case, improvement may be partially due to them remembering some of the test items and so working out the answers to them more quickly than the first time.

Despite this, coaching for actual test questions is likely to have very little effect on overall performance. This is because of the wide range of tests and item types available. A more effective way of preparing people, and one which is increasingly recognised as being best practice in psychometric testing, is to increase their familiarity with testing and so make them more ‘test wise’.

Many people use psychometrics because it allows them to compare respondents’ abilities on a ‘level playing field’. Companies may be faced with applicants with a range of experience and educational backgrounds, and this makes it difficult to make fair comparisons between them. With an ability or aptitude test, all respondents face the same set of questions under the same conditions. Whilst this increases the fairness of the comparison process, it needs to be recognised that respondents will have different levels of familiarity with testing and different levels of comfort with psychometrics. This familiarity, or ‘test wiseness’, can have a significant effect on performance.

Example and practice questions, which are common in psychometric tests, are designed to make sure that all respondents are equally familiar with the test. However, many tests also come with test preparation leaflets or ‘test taker’s guides’ which can be sent to respondents before they attend the testing session. The purpose of these is to help test takers understand what they will be asked to do, give them tips on how to prepare and give them examples of the types of question they will see in the test. Use of such guides is now recognised as best practice and, in some cases, full practice tests are used to give respondents detailed feedback in order to help them develop their test taking style.

For a general guide on preparing for tests, click here (document in Microsoft Word format).

Full practice tests are also available from Profiling for Success to individuals or organisations. Click here if you would like to find out more about the Profiling for Success 'buy-online' facility which allows individuals to practise tests on-line and receive full feedback on their results.

Organisations wishing to find out more about Profiling for Success Reasoning tests (which are available in either standard or practice versions) should click here.
