# Interpreting The Result of Item Analysis

In item analysis, interpreting the results is fairly straightforward. The only problem that might occur for beginners is unfamiliarity with the data (or, worse, never having analyzed it before). The more we analyze, the easier it becomes to understand what is happening inside the data.

Therefore, in this article we will start from the basic knowledge of item analysis. The results discussed here come from a calculation using the script I created in R to interpret the results of a Computer Based Test (CBT) on the Moodle platform.

Here is the link: Item Analysis using R.

First, let's refresh our memory and look at the output of the R calculation in the first sheet. For novice users, focus on column C. The ideal items are marked with "ok". This means there is no problem with difficulty level, discrimination, or item–total correlation (using the point-biserial). These are good questions. You can read the potential problems in columns D to G.

At the top of the sheet, you will find a summary of item quality. Please bear in mind that each item can have more than one problem; therefore, if you sum up the n (number of items) of ideal, easy, difficult, low-discrimination, and item-to-score items, it will not equal the total number of questions in the assessment.

You can stop here. With the data from the first sheet, you can duplicate the good questions into new questions of the same type. You can be quite confident that these questions will give you similar results, and you will save a lot of time.

Or, with the provided information, you can decide that you want a particular exam to consist of x percent difficult questions and y percent easy questions. It all depends on the purpose of your exam. You can use difficult/easy questions from the warning sheet that do not have warnings in columns E to G. Further considerations regarding item analysis are discussed after the "basic knowledge of Item Analysis" section.

What about questions with warnings? Well, we have to review further evidence.
Now, let's review some basic theory and check the second sheet.

Minimum Requirement Skills/Knowledge
First and foremost, we have to remember that in medical education it is common that students must know or be able to perform certain skills. Let's take Basic Life Support (BLS) as an example. If you as the educationalist have already set the standard that students must be able to perform BLS, then understanding all the knowledge necessary to do BLS is a must. Therefore, if you have a question about BLS, we expect the result to be "easy" (in column B). Why? Because every student should answer it correctly. In this case, the "easy" warning is not a problem, and the question is not categorized as a bad question.

Empirical Analysis

Empirical item analysis typically covers:

1. Item difficulty.
2. Item–total test correlation.
3. Item discrimination.
4. Distractor functioning.

Item Difficulty (column E)

Item difficulty measures how difficult a question is for the students. It is the ratio of students who answered correctly to the total number of students who took the exam, so it ranges from 0 to 1. The more students answer correctly, the closer the value is to 1. If the question is difficult (few students answer correctly), the value is close to 0.
Let's see a simple example:
100 students take an exam with 100 questions. 65 students answer question number 1 correctly.
This gives an item difficulty for question number one of 65/100 = 0.65.
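The arithmetic above can be sketched in a few lines (an illustrative Python snippet, not the R script itself):

```python
# Item difficulty: proportion of examinees who answered the item correctly.
def item_difficulty(responses):
    """responses: list of 0/1 item scores, one per student."""
    return sum(responses) / len(responses)

# 100 students took the exam; 65 answered question 1 correctly.
question_1 = [1] * 65 + [0] * 35
print(item_difficulty(question_1))  # → 0.65
```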

Easy, right?
What is the purpose of item difficulty? It gives us information that helps distinguish between poor and good performers.

In the script I wrote (Item Analysis using R), I use the range 0.35–0.8 for good questions, as it provides the maximum amount of information about differences between students. Some institutions use the range 0.3–0.7.
Note: you should avoid questions with values close to 0 or 1. They might have content validity problems.

In the first sheet, I wrote the warnings "difficult" and "too difficult". Below 0.35 is "difficult" and below 0.25 is "too difficult". And of course, above 0.8 is "easy".
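Those cut-offs can be written as a tiny classifier (an illustrative Python sketch; the label strings mirror the warnings described above, not necessarily the exact strings in my spreadsheet):

```python
def difficulty_warning(p):
    """Map an item-difficulty value (0-1) to a warning label."""
    if p > 0.8:
        return "easy"
    if p < 0.25:
        return "too difficult"
    if p < 0.35:
        return "difficult"
    return "ok"  # within the good range 0.35-0.8

print(difficulty_warning(0.65))  # → ok
print(difficulty_warning(0.30))  # → difficult
print(difficulty_warning(0.20))  # → too difficult
```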

Item–total test correlation (column G)
This is the correlation between answering a question correctly and the total score. For example, does answering question number 2 correctly correlate with the total score? We expect that students who answer number 2 correctly will have higher total scores than those who answer it wrong.

In the psychometric library in R, the item criterion gives us the freedom to correlate an item with any Y criterion (choose what you want to correlate with). In the script, I correlate each item with the total score, and the result is in column G. It is a point-biserial correlation.

Conceptually, it works as follows. This correlation assumes that the item score is a true dichotomy (an MCQ with only one correct answer). The r ranges between -1.00 and 1.00. High positive values indicate a good item. On the other hand, low or negative values are bad, and you could consider deleting the item.
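As a rough illustration, the point-biserial can be computed from the standard formula r = (M1 − M0)/s · √(pq), where M1 and M0 are the mean total scores of those who got the item right and wrong, s is the (population) standard deviation of the totals, and p and q are the proportions right and wrong. A Python sketch, not the R package's implementation:

```python
from math import sqrt

def point_biserial(item, totals):
    """item: 0/1 scores for one item; totals: total test scores (same student order)."""
    n = len(item)
    right = [t for x, t in zip(item, totals) if x == 1]
    wrong = [t for x, t in zip(item, totals) if x == 0]
    p = len(right) / n
    q = 1 - p
    mean_y = sum(totals) / n
    sd_y = sqrt(sum((t - mean_y) ** 2 for t in totals) / n)  # population SD
    m1 = sum(right) / len(right)
    m0 = sum(wrong) / len(wrong)
    return (m1 - m0) / sd_y * sqrt(p * q)
```

For example, with item scores [1, 1, 0, 0] and total scores [4, 3, 2, 1] this gives about 0.894: the two students who got the item right also have the highest totals, so the item correlates strongly with the test.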

Item discrimination
This is called the D-index: the ability to distinguish between students who understand the material and those who do not. It is calculated from the proportions of high achievers and low achievers who answer the item correctly.

The value of the D-index is between -1.0 and 1.0.
High positive values indicate good items; conversely, low or negative values are bad. The simple explanation is: when the low-scoring students can answer a question just as well as the high scorers, the question cannot distinguish student ability very well.
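A common way to compute the D-index is the difference in proportion correct between the top and bottom total-score groups, often the top and bottom 27% of examinees. This is an illustrative Python sketch of that convention; the grouping used in my script may differ:

```python
def d_index(item, totals, fraction=0.27):
    """item: 0/1 scores for one item; totals: total test scores (same student order).
    Returns proportion correct in the top group minus the bottom group."""
    order = sorted(range(len(totals)), key=lambda i: totals[i])
    k = max(1, int(len(totals) * fraction))
    low = [item[i] for i in order[:k]]    # lowest-scoring students
    high = [item[i] for i in order[-k:]]  # highest-scoring students
    return sum(high) / len(high) - sum(low) / len(low)

# Ten students: only the top scorers answered this item correctly.
print(d_index([0, 0, 0, 0, 0, 1, 1, 1, 1, 1], list(range(1, 11))))  # → 1.0
```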

Further Considerations In Item Analysis

Below are considerations you should keep in mind when you analyze the items.

1. An easy question is not always bad, even in the extreme condition of being easy with negative discrimination and non-functioning distractors (all or most students answer correctly). Do you know why? If with that question you want to test basic knowledge that all students in that year must know (e.g., the normal range of blood pressure), then you would expect all students to answer correctly.
2. On the other side, if you want to know whether a difficult question can be answered by the smartest students, then you expect very few students to be able to answer it, and many of them will choose the wrong options.
3. Based on my experience, considering the alpha and the total number of students who passed adds another dimension. Remember that in Classical Test Theory (CTT), this calculation is based on the sample of students who took the exam. You have to understand the student characteristics: are they known as a clever group of students, or the opposite? This will affect the result of the item analysis. With the same questions, if you test a group of smart students, the questions will all appear easy. What should you consider in this situation? Look at the alpha: if it is high (> 0.80), your measurement is good, and you simply have a smart group of students.
4. If many students fail but the alpha value is high, you may find many questions marked as difficult. Don't panic. Your exam is good; it could be that your students did not study, or in some cases only studied from previous exams' questions. So when an exam consists mostly of new questions, the item analysis will show a lot of questions marked as difficult, but the alpha value will be very high.
5. A high number of easy questions combined with a low alpha value can be a sign of an "exam breach".
6. Another consideration is to pay attention to the number of borderline students (within the SEM at a 95% confidence interval). If many students fall into this category, be aware that your exam may contain a lot of error.
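Point 6 can be made concrete with the standard CTT formulas: SEM = SD·√(1 − α), and a student is "borderline" when their total score lies within ±1.96·SEM of the pass mark. This is an illustrative Python sketch under those assumptions; the function names and the cut score are hypothetical, not taken from my spreadsheet:

```python
from math import sqrt

def pop_sd(xs):
    """Population standard deviation."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sem(totals, alpha):
    """Standard error of measurement: SD * sqrt(1 - alpha)."""
    return pop_sd(totals) * sqrt(1 - alpha)

def borderline(totals, cut, alpha):
    """Total scores that fall within +/-1.96 * SEM of the cut score."""
    band = 1.96 * sem(totals, alpha)
    return [t for t in totals if abs(t - cut) <= band]

# Four students, pass mark 13, alpha 0.75: two fall inside the 95% band.
print(borderline([10, 12, 14, 16], 13, 0.75))  # → [12, 14]
```

The higher the alpha, the narrower the band, so fewer students end up in the uncertain zone around the pass mark.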

All in all, distinguishing good items from bad ones is a piece of cake. But in some conditions you should consider the other parameters I provide in the spreadsheet.

References:
The variables in my scripts are based on this and this. Both are psychometric workshops from the AMEE 2015 conference in Glasgow.