Aspects of a Psychological Test
The first psychological tests were created to measure intelligence and aptitude, but as psychological theory changed and testing became accepted and widespread in schools, health care, industry, and the courts, tests were designed to measure many other aspects of human behavior. Designing a psychological test is extremely difficult because the human phenomena being measured, such as personality or aptitude, are not tangible or easily observable. The foundations of a good psychological test are reliability and validity. A test is reliable when it yields consistent results each time it is repeated under the same conditions. A test is considered valid when it accurately measures what its designer intended it to measure. To be valid, a test must first be reliable. However, a reliable test is not necessarily valid: it can produce consistent scores without measuring what it is supposed to measure.
To make tests reliable and valid, experts have distinguished several types of reliability and validity that the designer of a psychological test must check. Types of reliability include test-retest reliability and inter-rater reliability. All tests contain the possibility of error, so an observed score is treated as the true score plus a measurement error. Measurement errors can occur for a variety of reasons: the person taking the test may not have been at their best (for example, anxious or sick), or the environment may not have been ideal for test taking (for example, people making noise in the room). Types of validity include face validity and construct validity.
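The idea that a score is made up of a true score plus a measurement error can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the function name, the true score of 100, and the error spread of 5 points are all hypothetical, and real measurement error need not follow a normal distribution.

```python
import random


def simulate_observed_scores(true_score, error_sd, n_administrations, seed=0):
    """Return simulated observed scores: true score plus random error.

    Assumption for illustration: the error is random noise centered on
    zero (normally distributed) with standard deviation `error_sd`.
    """
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    return [true_score + rng.gauss(0, error_sd) for _ in range(n_administrations)]


# Hypothetical person with a true score of 100, tested 1000 times.
scores = simulate_observed_scores(true_score=100, error_sd=5, n_administrations=1000)

# Individual scores vary, but their average stays close to the true score
# because the random errors tend to cancel out.
mean_observed = sum(scores) / len(scores)
print(round(mean_observed, 1))
```

No single administration reveals the true score exactly, which is why a test designer looks at patterns across many scores rather than at one result.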
Reliability
To evaluate a test's reliability, the test-retest method is used: the test is given to the same group of individuals two or more times. Their scores from the first administration are compared with their scores from the second to see what changes have occurred. To check for reliability, a researcher has to determine whether the differences in scores resulted from measurement error or from real changes in what is being measured. Inter-rater reliability examines how closely the scores assigned by different people who rate or code the same tests agree.
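One common way to compare two sets of scores is a correlation coefficient, and one simple way to compare two raters is the share of cases on which they agree. The sketch below assumes these two indexes for illustration; the text does not say which statistics a researcher should use, and the scores and ratings shown are made-up data.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    if n != len(second) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 scores")
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    covariance = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    var_first = sum((a - mean_first) ** 2 for a in first)
    var_second = sum((b - mean_second) ** 2 for b in second)
    return covariance / sqrt(var_first * var_second)


def percent_agreement(rater_a, rater_b):
    """Fraction of cases on which two raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


# Hypothetical scores for five people tested twice (test-retest).
first_administration = [98, 105, 110, 92, 120]
second_administration = [100, 103, 112, 95, 118]
retest_r = pearson_r(first_administration, second_administration)

# Hypothetical codes assigned by two raters to the same four responses.
agreement = percent_agreement(["high", "low", "high", "high"],
                              ["high", "low", "low", "high"])
```

A correlation near 1 suggests that people kept roughly the same standing from one administration to the next. Percent agreement is easy to read but does not correct for agreement that happens by chance, so chance-corrected statistics such as Cohen's kappa are often preferred.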
Validity
Researchers assess a test's face validity by asking whether, on the surface, the test appears to measure what it was designed to measure. Construct validity is considered the most important type of validity because it means that the test actually measures what it is supposed to. Construct validity is very difficult to establish and takes a long time.
Generalizability
The word generalizability is often used as an umbrella term for reliability and validity in determining whether a test is well designed. The term refers to two questions. Will a test given to one group of people show similar results when it is given to another group? And will the results of one test be similar to those of other tests measuring the same thing? For example, will two different intelligence tests get similar results from the same group, and will one test get similar results from two different groups?
Fact
The majority of psychological tests used by clinicians are sold by publishers who specialize in tests. These tests are copyrighted and cannot be copied without the publisher's permission. Your mental health provider has to buy psychological tests from these publishers, which is one of the reasons you pay additional fees for testing.

