Applying Generalizability Theory to OSCE Reliability Analysis Using EduG: A Simple How-To

There are two keywords here: Generalizability Theory (G-Theory or GT) and EduG.

First, a quick fact about EduG. EduG is Windows-based software for running GT analyses, and one of the very few GT programs with a graphical user interface. It was designed by Prof Jean Cardinet from Switzerland, who was involved in developing GT itself under Prof Lee Cronbach in the 1960s.

EduG is very flexible. It produces a variance table, a G table (which is very important), a Decision table (D table), and of course a G coefficient. It allows you to create nested designs and mixed models, and even to isolate facet(s). The limitation of EduG is that the data have to be balanced. In his book and in an email to me, Prof Cardinet said that randomly discarding the excess data to balance the design would be fine. In my experience, this fits our "regular" needs under most conditions.

Even though EduG produces the variance table, it does not calculate F tests. Its author mentioned that we could use other software for that, since F tests are not the main focus of EduG. In some complicated designs, running an F test with widely available software is not possible; in that case, you can simplify the design to run the F test.
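Once the design is simplified to a single facet, the F test itself is short. The sketch below is only an illustration, written in Python (my choice, not something EduG provides), with made-up total scores for three locations; it computes a one-way ANOVA F statistic and its degrees of freedom. The p-value would still come from an F table or a statistics library (for example SciPy's `scipy.stats.f_oneway`).

```python
# Hypothetical sketch: one-way ANOVA F statistic for a single facet
# (here, location), i.e. a simplified design analysed outside EduG.


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of score lists."""
    all_scores = [x for g in groups for x in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group SS: group size times squared distance of group mean from grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group SS: squared distance of each score from its own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up total scores for three locations (A, B, C)
location_a = [70, 72, 68, 75]
location_b = [65, 66, 70, 63]
location_c = [80, 78, 82, 79]
f, df1, df2 = one_way_anova_f([location_a, location_b, location_c])
print(f"F({df1}, {df2}) = {f:.2f}")
```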

In this article, I will focus on using EduG to run the reliability analysis for the Objective Structured Clinical Examination (OSCE) and anything similar to it (e.g. the Multiple Mini Interview). This should cover the basic knowledge needed to run the analysis. If you want to know more about GT, you should first understand the ins and outs of Analysis of Variance and then go through a GT book.

GT is a method of reliability analysis. In AMEE Guide No. 68, this method is the "gold standard" for analyzing the reliability of an OSCE, and analyzing an OSCE with Classical Test Theory (CTT) is not recommended. In my paper "Calibration of Communication Skills Items in OSCE Checklists according to the MAAS-Global", published in Patient Education & Counseling, I explained that CTT is limited because it splits the observed score variance into only two components (true variance and error variance). The AMEE Guide gives more examples of the limitations of CTT (please refer to the source).

Furthermore, as I argued in my paper, the strongest point of GT is its ability to quantify the contribution of each potential source of error. In this analysis, the G table shows in detail which source contributes the most error, so we can pinpoint the problem(s). As mentioned in the guide, this gives the OSCE administration valuable information for improving the quality of the test. Using a Decision Study (D-study), we can also model how changes to the assessment would improve its quality. Therefore, we will have a clear guideline on how to reduce assessment error in future tests.

Just for your information, GT is not popular at all among researchers and educationalists, possibly because of the complexity of the theory itself. According to a systematic review, up to 2014 the majority of OSCEs were analyzed with Cronbach's alpha as the reliability test. What is the problem with Cronbach's alpha? Prof Lee Cronbach, its creator, saw its limitations and went on to develop a new theory to overcome them. Alpha generalizes the KR-20 formula, which was written for dichotomous (right/wrong) items such as multiple choice questions, to items with more score points, such as the Likert or numeric scales usually used in OSCE rubrics. But alpha still treats all measurement error as a single term, so it cannot tell us whether the error comes from examiners, stations, or locations. Many researchers use alpha to measure internal consistency within each OSCE station, and then report the reliability of the whole OSCE as the average of the station coefficients.
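The "single error term" point can be made concrete. Below is a small Python sketch with made-up Likert ratings (5 students x 4 items) showing a known identity: in a crossed persons x items design, which has only one source of error, Cronbach's alpha and the relative G coefficient come out as the same number. GT goes further only when the design has more facets, which is exactly the OSCE situation. The data and function names are hypothetical.

```python
# Sketch: for a crossed persons x items design (one error source), Cronbach's
# alpha and the relative G coefficient are the same number.


def sample_var(xs):
    """Sample variance with an n - 1 denominator."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def cronbach_alpha(scores):
    """scores: one row per person, one column per item."""
    k = len(scores[0])
    item_vars = [sample_var([row[j] for row in scores]) for j in range(k)]
    total_var = sample_var([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def relative_g_p_by_i(scores):
    """Relative G coefficient from a two-way ANOVA without replication."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_res = ss_total - ss_p - ss_i  # person x item interaction, confounded with error
    ms_p = ss_p / (n_p - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    var_p = (ms_p - ms_res) / n_i  # person (universe-score) variance component
    return var_p / (var_p + ms_res / n_i)


# Made-up Likert ratings: 5 students x 4 checklist items
scores = [
    [4, 5, 4, 3],
    [2, 3, 3, 2],
    [5, 5, 4, 5],
    [3, 2, 3, 3],
    [4, 4, 5, 4],
]
print(round(cronbach_alpha(scores), 3), round(relative_g_p_by_i(scores), 3))
```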

What about the passing score in an OSCE? A search on Google Scholar turns up many methods used for OSCE standard setting, and it is your decision which one to use. EduG provides the SEM (standard error of measurement), which can be used for that purpose. Even though many OSCEs use a static cut score rather than a dynamic one, studies suggest that a valid OSCE cut score should be set with the Borderline Regression Analysis (BRA) method.
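For readers new to borderline regression, here is a minimal Python sketch for one station, assuming examiners give each student both a checklist score and a global rating (1 = fail, 2 = borderline, 3 = pass, 4 = good; this scale and the data are hypothetical). It regresses checklist scores on global ratings and reads off the predicted checklist score at the borderline rating. How, or whether, to widen that cut score with EduG's SEM is a local policy decision and is not shown.

```python
# Hypothetical sketch of the borderline regression method for one station:
# regress checklist scores on global ratings (ordinary least squares), then
# take the predicted checklist score at the "borderline" rating as the cut score.


def borderline_regression_cut(checklist, global_ratings, borderline_value):
    n = len(checklist)
    mean_x = sum(global_ratings) / n
    mean_y = sum(checklist) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(global_ratings, checklist))
    sxx = sum((x - mean_x) ** 2 for x in global_ratings)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept + slope * borderline_value


# Made-up data for 8 students at one station
ratings = [1, 2, 2, 3, 3, 4, 4, 3]
checklist = [8, 11, 12, 15, 14, 18, 19, 16]
cut = borderline_regression_cut(checklist, ratings, borderline_value=2)
print(f"Station cut score: {cut:.2f}")
```

In practice the cut score is computed per station and then summed or averaged across stations, depending on how the OSCE is scored.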

OK, enough with the hocus-pocus theory above. Let's go straight to the calculation.

In this article, the scenario is an OSCE with 10 stations, conducted in 3 different locations. The OSCE is administered in one day only, starting in the morning and finishing in the evening. There is a short break every 2 cycles and a full break in the mid-afternoon for lunch. In our school's clinical rotation program, 4 large departments run their OSCEs on the same day. In this scenario I took data from 2 departments only (5 stations each).

Let's start with the raw data. Most likely the results are managed and saved in a spreadsheet (e.g. MS Excel, LibreOffice Calc). The picture below illustrates the file layout.

The sample above uses LibreOffice Calc (an open-source office suite).
Exam results are usually saved in this format: an ID column, followed by the score for each station in the columns to the right. Each following row is another student. The station scores of each department are grouped together, one department after another.

If the OSCE runs in different locations because of limited rooms, or if there is more than one OSCE circuit, then the students from each additional location/circuit are saved below the students from the previous one (illustration below).


This layout is sometimes called the wide data format. To be used as EduG input, the data have to be in the long data format (one value per line), saved as a .txt or .csv file. EduG needs only the station scores as input: no total score and no other information. Therefore, the file should be clean.

Converting the wide format to the long format is not practical with manual copy-paste and transpose. If you are familiar with SPSS, there is a menu (Data > Restructure) to change wide data to long data; afterwards, save only the scores as .txt or .csv. In my experience, an Excel macro works too, and that is what I usually do.

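Sub RangeToColumn()
    ' Copies a block of station scores (one student per row) into a single
    ' column, reading row by row: student 1, stations 1..n; then student 2; ...
    Dim varray As Variant
    Dim i As Long, j As Long, k As Long
    Application.ScreenUpdating = False
    k = 1                                    ' first output row
    varray = Range("C2:F61").Value           ' source block: station scores only
    For i = 1 To UBound(varray, 1)           ' each row (student)
        For j = 1 To UBound(varray, 2)       ' each column (station)
            Cells(k, 9).Value = varray(i, j) ' write to column 9 (column I)
            k = k + 1
        Next j
    Next i
    Application.ScreenUpdating = True
End Sub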

The macro above has served me well. Before using it, you need to know how to run a macro in Excel. Just copy and paste the script and edit a few parameters:
Range("C2:F61") –> change to your data range (station scores only, no ID or total columns)
Cells(k, 9) –> k is the output row (starting at row 1); 9 is the output column, counted from the left (column I)
You are good to go. Run the script.
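If you do not use Excel macros, the same row-by-row flattening can be done in a few lines of Python. This is a sketch under stated assumptions: the wide file is a comma-separated .csv with one header row, the station scores sit in a contiguous block of columns, and the file names in the example call are hypothetical. It writes one score per line, in the same order as the macro.

```python
import csv


def wide_to_long(rows, first_col, last_col):
    """Flatten wide rows (one student per row, one station per column) into a
    single list, reading row by row like the Excel macro: student 1's stations,
    then student 2's, and so on. Column indices are 0-based and inclusive."""
    values = []
    for row in rows:
        values.extend(row[first_col:last_col + 1])
    return values


def convert_file(in_path, out_path, first_col, last_col):
    """Read a wide .csv (one header row assumed) and write one score per line."""
    with open(in_path, newline="") as f:
        reader = csv.reader(f)
        next(reader)  # skip the header row
        values = wide_to_long(reader, first_col, last_col)
    with open(out_path, "w") as f:
        f.write("\n".join(values) + "\n")
    return len(values)


# Example call with hypothetical file names; columns C..F are indices 2..5,
# matching Range("C2:F61") in the macro:
# convert_file("osce_wide.csv", "osce_long.txt", 2, 5)
```

Because locations are stacked below each other and departments are grouped left to right, reading row by row keeps location, student, department, and station in a consistent nested order.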


Now we assume that you already have a .txt or .csv file ready to import into EduG. We can start EduG.

When you start EduG, a blank window appears. You have to create an EduG file before you can use it: click File > New and choose where you want to save the file. This logic is very different from MS Word, so don't get confused. In Word, the blank document is ready to use as soon as you open it, and you save it afterwards. In EduG, you create the file first, and only then is the blank file ready to use.

The picture above shows a clean EduG window. It is simple: all options are visible, with no hidden settings. Next, we will go step by step through creating the file to calculate GT, specifically for the scenario above. Settings for other types of research or calculations are not explained in this article.

Based on the scenario, we decide to use 4 facets (location, students, departments, and stations). For those who are not familiar with the term, a facet is equivalent to a factor in ANOVA. Since GT was developed from CTT and Analysis of Variance, GT uses the term facet instead of factor to avoid confusion.

As we can see above, the students are nested within locations: they were divided into 3 large groups based on the building (A, B, C). The stations are nested within departments: the stations were designed and owned by each department, so the stations (O = Observation) are unique to each department. This nesting must be written correctly in “Label”. The design also has to be typed into “Measurement Design” before entering the data. The differentiation facets go on the left side of “/” and the instrumentation facets on the right side. The order of the labels on each side doesn't matter; what matters is whether a facet is to the left or to the right of “/”.

Do not make mistakes when entering the “Level”, which is the number of levels of each facet. Locations: 3, since the exam was held in 3 different locations. Are there only 30 students? NO. The total number of students is 30 x 3 = 90 (remember, they are divided into 3 buildings); because students are nested within locations, the level entered for students is the number per location (30). The total number of data values is 3 x 30 x 2 x 5 = 900. So your input file should contain exactly 900 values, or in Excel, 900 rows in one column. If these two numbers (the product of the levels and the number of data values) do not match, EduG will give you a warning, and you cannot continue to the next step.
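The check EduG performs here is simple arithmetic, so it is easy to run yourself before importing. A minimal Python sketch, assuming a one-value-per-line .txt file (the file name in the commented call is hypothetical, and the dictionary keys are descriptive labels only, not EduG's label syntax):

```python
from math import prod

# Levels from the scenario. Because students are nested in locations, their
# level is the number of students per location; stations are nested in
# departments, so their level is the number of stations per department.
levels = {"Location": 3, "Student (per location)": 30,
          "Department": 2, "Station (per department)": 5}


def expected_cells(levels):
    """Number of data values the design needs: the product of all levels."""
    return prod(levels.values())


def count_values(path):
    """Count non-empty lines in a one-value-per-line .txt file."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())


print(expected_cells(levels))  # 900
# Hypothetical file name:
# assert count_values("osce_long.txt") == expected_cells(levels)
```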

Once you click the Import button, this menu appears. Choose your file and you are ready to analyze. But before you start computing, re-check your data structure. Click Insert data, and you will see the manual data entry grid. This is a very handy tool.

Look at the structure of the data. Is it in the correct order? If not, you will get a false reading. Remember, EduG only checks whether the number of data values matches the levels in the design settings; it cannot check whether the values are in the right order.
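One way to check the order is to generate the facet index of every row and paste it next to your 900 values. The Python sketch below assumes (please confirm this in the Insert data view) that values are read with the first declared facet varying slowest and the last declared facet varying fastest, like nested loops; that is also the order the row-by-row flattening of the wide spreadsheet produces.

```python
from itertools import product


def data_order(levels):
    """List index tuples in the assumed input order: the first facet varies
    slowest and the last facet fastest (like nested loops)."""
    return list(product(*(range(1, n + 1) for n in levels)))


# Facet order assumed: location, student (within location),
# department, station (within department)
order = data_order([3, 30, 2, 5])
print(len(order))   # 900
print(order[:3])    # [(1, 1, 1, 1), (1, 1, 1, 2), (1, 1, 1, 3)]
print(order[5])     # (1, 1, 2, 1)
```

Row 6 should therefore be the first station of the second department for the first student of the first location; if your spreadsheet shows something else there, the data are out of order.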

Once the data are imported and the data structure has been checked, the next step is the calculation.

I prefer the report in .RTF format, since it is easier to read and its tables are easier to format than in .txt. I usually calculate the means before running the GT analysis; this gives me a better understanding of the data.

In this example, it is helpful to check the differences between locations, and between departments within each location.

As seen above, the file lists the mean for each location and for each department within each location. The grand mean is always printed at the top of the table.
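If you want to cross-check these means outside EduG, the sketch below recomputes the grand mean, the mean per location, and the mean per department within each location from the flat list of values. It is a Python illustration, assuming the values are ordered location > student > department > station as in the long file prepared earlier; the tiny example dataset is made up.

```python
from collections import defaultdict
from itertools import product


def mean(xs):
    return sum(xs) / len(xs)


def facet_means(values, n_loc, n_stu, n_dep, n_sta):
    """Grand mean, mean per location, and mean per department within each
    location, for a flat list ordered location > student > department > station."""
    if len(values) != n_loc * n_stu * n_dep * n_sta:
        raise ValueError("number of values does not match the design")
    by_loc = defaultdict(list)
    by_loc_dep = defaultdict(list)
    cells = product(range(1, n_loc + 1), range(1, n_stu + 1),
                    range(1, n_dep + 1), range(1, n_sta + 1))
    for (loc, _stu, dep, _sta), x in zip(cells, values):
        by_loc[loc].append(x)
        by_loc_dep[(loc, dep)].append(x)
    return (mean(values),
            {k: mean(v) for k, v in by_loc.items()},
            {k: mean(v) for k, v in by_loc_dep.items()})


# Tiny made-up example: 2 locations, 1 student each, 2 departments, 1 station
grand, per_loc, per_loc_dep = facet_means([1, 3, 5, 7], 2, 1, 2, 1)
print(grand, per_loc, per_loc_dep)
```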

NOTE: this article does not discuss how to interpret the results of GT in EduG. The main focus of this section is running the analysis.

When the Optimization option is clicked, a window appears in which you type the alternative numbers of levels (e.g. stations) you want to test.

The next step is the GT calculation, the main goal of this analysis. The most important options to choose are Anova, Coef_G, and Optimization. Optimization is the D-study. Advanced users can combine it with “Observation Design Reduction”.

Now we can start computing. When you start, EduG will warn you about the report file, since the Mean calculation has already been saved to it.

Do not click Replace: it creates a new file and your Mean calculation is gone. Choose “Add”, and the new results will be added to the same file as the Mean calculation. Make sure the report file is not open in another program, or EduG will give you an error message.

The first table is the Analysis of Variance table, as seen below.

Below the ANOVA table, the G table is printed, along with the SEM.

As we can see above, the G table shows the percentage contribution of each source of error, so the exact source of the problem can be determined. The ANOVA table, by contrast, only shows the estimated variance components.
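The percentages in the G table are ratios: each error component divided by the total error variance. The Python sketch below uses invented variance components and descriptive labels (not EduG output) to show the arithmetic, together with the relative coefficient Eρ² = σ²τ / (σ²τ + σ²δ) and the absolute coefficient Φ = σ²τ / (σ²τ + σ²Δ). It assumes each error component has already been divided by its sample size, which EduG does for you.

```python
def g_summary(diff_var, rel_errors, abs_only_errors):
    """Percentage contribution of each error source and the two G coefficients.

    diff_var        : differentiation (true-score) variance
    rel_errors      : {source: error variance} entering the relative error
    abs_only_errors : {source: error variance} entering only the absolute error
    All error variances are assumed already divided by their sample sizes.
    """
    rel_err = sum(rel_errors.values())
    abs_err = rel_err + sum(abs_only_errors.values())
    pct_rel = {s: 100 * v / rel_err for s, v in rel_errors.items()}
    pct_abs = {s: 100 * v / abs_err
               for s, v in {**rel_errors, **abs_only_errors}.items()}
    coef_rel = diff_var / (diff_var + rel_err)  # relative coefficient, E rho^2
    coef_abs = diff_var / (diff_var + abs_err)  # absolute coefficient, Phi
    return pct_rel, pct_abs, coef_rel, coef_abs


# Invented numbers, for illustration only
pct_rel, pct_abs, e_rho2, phi = g_summary(
    diff_var=4.0,
    rel_errors={"student x station (residual)": 1.5, "student x department": 0.5},
    abs_only_errors={"station": 0.8, "department": 0.2},
)
print(pct_rel)  # {'student x station (residual)': 75.0, 'student x department': 25.0}
print(round(e_rho2, 3), round(phi, 3))  # 0.667 0.571
```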

Remember the D-study (Optimization) option? Its results are printed below, as follows.

The D table tells us the hypothetical results if the exam setting, or the research condition, were changed. Here, increasing the number of stations from 5 to 10 increases the reliability of the assessment.
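The idea behind the D table can be sketched for a deliberately simplified design with only persons and stations. In that design, the relative coefficient is σ²p / (σ²p + σ²ps,e / n), so adding stations shrinks the error term. The variance components below are invented, and EduG's optimization handles the full nested design rather than this single-facet shortcut.

```python
def project_relative_g(var_person, var_residual, n_stations):
    """Relative G coefficient of a simplified persons x stations design
    if the exam used n_stations stations."""
    return var_person / (var_person + var_residual / n_stations)


# Invented variance components estimated for a single station
var_person, var_residual = 2.0, 6.0
for n in (5, 10, 15):
    print(n, round(project_relative_g(var_person, var_residual, n), 3))
# 5 0.625
# 10 0.769
# 15 0.833
```

The gain flattens as stations are added, which is why a D-study is useful before deciding whether extra stations are worth the cost.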

All in all, the most important conclusion for this exam is:
The biggest source of error, which needs to be solved before the next test, is examiner error at one location and in one department only. Future exams need to address the standardization of student marking.

(How did I reach that conclusion? Please continue to the other part, Interpretation of GT in EduG.)

Thanks for reading. Enjoy trying Generalizability Theory analysis on your own OSCE.

This article is very special to me. No words can ever express my appreciation for Prof Jean Cardinet, who passed away in June 2015. I was lucky enough to learn directly from the guru during my PhD journey, even though only via email. It started when I read his book about GT and EduG; after searching Google for a long time, I finally found his email address. I started emailing him because I did not understand GT at all, not even a little. Funnily enough, for the first 4 emails, over 6 months or so, he always replied in French. He might have thought I was just a random guy trying to learn GT, and I had to use a translator just to read his emails. Things changed after that: he replied in English, maybe because Prof Cardinet realized that this guy really wanted to learn. Email after email, we discussed his book chapter by chapter, from the beginning, through the middle, to the end. Every time I had 1 or 2 questions, he would reply with pages of explanation in an attachment. I would ask another question after a week or two of reading and trying to understand his explanation. He mentioned that much of what he explained was not written in his book, since it would have made the book too thick and too complicated for early learners of GT. In total, his explanations are almost as thick as the book itself. I keep his notes with his book, making it twice as thick as the original. If I am not mistaken, this went on for about 2 years. In early 2015 he mentioned that he was undergoing cancer treatment and had to wait several days after chemotherapy before he could answer my questions. He was over 80 years old. Finally, there was an email saying that it might be his last email to me. I was so shocked, so sad. After that, he never replied to my emails, and in early 2016 I heard from his former office that he had passed away in June 2015. I think I became his last student. Thank you for your patience, and thank you for your time, Prof Cardinet.
Your guidance has given me invaluable knowledge. RIP