Difficulty-level Modeling of Ontology-based Factual Questions

Tracking #: 1898-3111

Vinu Ellampallil Venugopal
P Sreenivasa Kumar

Responsible editor: 
Michel Dumontier

Submission type: 
Full Paper
Semantics-based knowledge representations such as ontologies have been found to be very useful in automatically generating meaningful factual questions. Determining the difficulty level of these system-generated questions helps to utilize them effectively in various educational and professional applications. The existing approach for predicting the difficulty level of factual questions utilizes only a few naive features, and its accuracy (F-measure) is close to only 50% on our benchmark set of 185 questions. In this paper, we propose a new methodology for this problem by identifying new features and by incorporating an educational theory related to the difficulty level of a question, called Item Response Theory (IRT). In IRT, the knowledge proficiency of end users (learners) is considered when assigning difficulty levels, based on the assumption that a given question is perceived differently by learners of different proficiency levels. We have carried out a detailed study of the features/factors of a question statement that could determine its difficulty level for three learner categories (experts, intermediates, and beginners), and we formulate ontology-based metrics for them. We then train three logistic regression models to predict the difficulty level corresponding to the three learner categories. The output of these models is interpreted using IRT to find a question's overall difficulty level. The accuracy of the three models, based on cross-validation, is found to be in a satisfactory range (67-84%). The proposed model (comprising three classifiers) outperforms the existing model by more than 20% in precision, recall, and F1-score.
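The abstract describes combining the outputs of three per-category classifiers into one overall difficulty level via an IRT-style interpretation. A minimal sketch of that combination step is shown below; the binary "hard for this group?" inputs and the specific mapping rule are illustrative assumptions, not the authors' actual procedure.

```python
# Hypothetical sketch of the IRT-style interpretation step: each of the
# three logistic regression models (one per learner category) is assumed
# to output a binary "hard for this group?" prediction, and the overall
# difficulty level is read off from which groups find the question hard.
# The mapping below is an assumed illustration, not the paper's rule.

def overall_difficulty(expert_hard, intermediate_hard, beginner_hard):
    """Map three per-category binary predictions to an overall level."""
    if expert_hard:
        return "high"       # hard even for experts
    if intermediate_hard:
        return "medium"     # easy for experts, hard for intermediates
    if beginner_hard:
        return "low"        # only beginners find it hard
    return "very low"       # easy for all learner categories

print(overall_difficulty(False, True, True))  # prints "medium"
```

The intuition follows IRT: a question's position on the difficulty scale is anchored by the most proficient learner group that still finds it hard.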
Solicited Reviews:
Review #1
Anonymous submitted on 04/Jun/2018
Review Comment:

All suggestions for improvements from my original review have been met.

Review #2
Anonymous submitted on 22/Jul/2018
Review Comment:

The paper successfully closes the gap between engineered features and educational theory
by investigating the features' relatedness to Item Response Theory. Further, the work
proposes a difficulty model that takes into consideration the ability of the learner and
can estimate multi-class question difficulty for different learner categories (i.e.
expert, intermediate, beginner). The influence of each feature is studied and it was shown
that features specific to the learner categories correlate with the actual category. Also,
the method outperforms the authors' previously proposed approach.

Significance of the results:
Results seem to be significant on the (small) test set of 185 questions. The argument
for choosing only questions that were labeled with Method 1 for testing, and all others
for training, is not clear to me. Why not make sure that all data points are used for
training and testing, similar to n-fold cross-validation? Nonetheless, I find the
evaluation convincing that the proposed features are effective in determining question
difficulty.

Minor comments:
Section 5.2: From my understanding it should be "Actors who acted only in dramas are
possible answers to Qn-5." as compared to "Actors who acted only in dramas are not
possible answers to Qn-5."
Section 6.2: [17] reported an accuracy of 66.4%, not precision.
References: Author naming is inconsistent: compare [20] Ellampallil Venugopal Vinu with
[21] E.V. Vinu, for example.

The paper is well written and easy to follow. The methodology is motivated by existing
theory and convincingly evaluated. The work is clearly positioned in contrast to
existing research. I therefore suggest accepting this paper!