
MS Thesis
IDENTIFY STUDENT’S GRADE AND
DROPOUT ISSUE
Submitted By:
Mehreen Rehman
Reg#: 671-FBAS/MSCS/S12
Supervisor:
Ms. Zakia Jalil
Assistant Professor
Co Supervisor:
Sadia Arshid
Department of Computer Science & Software Engineering,
Faculty of Basic & Applied Sciences, International Islamic University,
Sector H-10, Islamabad
2014
FINAL APPROVAL
It is certified that we have read the thesis titled “Identify Student’s Grade and Dropout Issue”
submitted by Ms. Mehreen Rehman (671-FBAS/MSCS/S12), and that it is accepted in its present
form by the Department of Computer Science and Software Engineering, Faculty of Basic &
Applied Sciences, International Islamic University Islamabad, as satisfying the thesis
requirements for the degree of MS in Computer Science.
Committee
Internal Examiner
Lecturer,
Department of Computer Science,
International Islamic University, Islamabad.
External Examiner
Lecturer,
Department of Computer Science,
International Islamic University, Islamabad.
Supervisor
Ms. Zakia Jalil
Assistant Professor
Department of Computer Science,
International Islamic University, Islamabad.
______________________________________________________________________________
Identify Student’s Grade and Dropout Issue
A thesis submitted to the
Department of Computer Science
and Software Engineering,
International Islamic University, Islamabad
As a partial fulfillment of the requirements
for the award of the degree of
MS Computer Science
DECLARATION
I hereby declare that “Identify Student’s Grade and Dropout Issue”, neither as a whole nor as
a part thereof, has been copied from any source. I have developed this project and the
accompanying report entirely on the basis of my personal efforts, made under the sincere
guidance of my supervisor. No portion of the work presented in this report has been submitted
in support of any application for any other degree or qualification of this or any other
university or institution of learning.
Mehreen Rehman
671-FBAS/MSCS/S-12
ACKNOWLEDGEMENTS
All praise and much gratitude to Almighty Allah, the most merciful and glorious, who granted
me the potential to work hard and the perseverance to accomplish this research work.
I would like to extend special thanks to my supervisor, Ms. Zakia Jalil, Assistant Professor,
who always provided me the greatest support and help whenever I needed it throughout my
research work. She was readily available for consultation, shared her knowledge with me, and
motivated me towards the successful completion of this research work.
I cannot forget the constant support of my affectionate parents, who always showed their desire
and prayed for my success, as well as provided me financial and moral support throughout my life.
I would like to thank all my respectable teachers, sincere friends and all those people in the
faculty who helped me during this research project.
Mehreen Rehman
671-FBAS/MSCS/S-12
PROJECT IN BRIEF
PROJECT TITLE: Identify Student’s Grade and Dropout Issue

UNIVERSITY: Department of Computer Science & Software Engineering, International Islamic University, Islamabad

UNDERTAKEN BY: Mehreen Rehman, 671-FBAS/MSCS/S12

SUPERVISED BY: Ms. Zakia Jalil, Assistant Professor, Department of Computer Science & Software Engineering, International Islamic University, Islamabad

CO-SUPERVISED BY: Sadia Arshid, Assistant Professor, Department of Computer Science & Software Engineering, International Islamic University, Islamabad

TOOLS USED: WEKA; MS Office 2007 for documentation & presentation

OPERATING SYSTEM: Windows 7 (64-bit)

SYSTEM USED: HP Pavilion dv5, Intel(R) Core(TM) 2 Duo CPU P7350 @ 2.00 GHz, 4 GB RAM

START DATE: May 2014

COMPLETION DATE: Oct 2014
Abstract
Educational data mining is used to study the data available in the educational field and bring out
the hidden knowledge in it. The knowledge is hidden among the educational data set and is
extractable through data mining techniques. Each university has specific criteria for course
evaluation. In the International Islamic University, the total marks of a course are 100, of which
40% come from internal evaluation and 60% from the final exam. The continuous internal evaluation
of 40 marks comprises quiz (10), mid-term (20) and assignment (10). I will derive variables from
the internal data for predicting student performance in the final exam. The admission merit
criteria of the International Islamic University are academic qualification: 40% and admission
test: 60%. The admission test is divided into five sections: series/sequences, quantitative,
logic, analytical and English. By using HSSC and entry test results, dropout students are
predicted.
In this research, the classification task will be used to identify student’s grade and dropout
issue. I will use the decision tree method for data classification. Through this task I will
extract knowledge that describes students’ performance in the final exam. It will help to identify
the dropout students and the students who need special attention, and allow the teacher to provide
appropriate advising.
Keywords: Educational data mining, Data Mining.
Table of Contents
1. INTRODUCTION ............................................................................................................... 1
1.1 Data Mining .................................................................................................................. 1
1.2 Classification Methods .................................................................................................. 1
1.3 Clustering ..................................................................................................................... 2
1.4 Prediction ..................................................................................................................... 3
1.5 Association rule ............................................................................................................ 3
1.6 Neural networks ............................................................................................................ 4
1.7 Nearest Neighbor Method ............................................................................................. 4
1.8 Decision Tree ................................................................................................................ 5
1.9 Bayesian Classification ................................................................................................. 6
1.10 Research Objective ....................................................................................................... 7
2. Literature Review................................................................................................................. 8
2.1 Analyze students’ performance .......................................................................................... 8
2.2 A prediction for Student's Performance ......................................................................... 8
2.3 A prediction for performance improvement........................................................................ 8
2.4 A Prediction for Performance Improvement of Engineering Students ................................ 9
2.5 Classification to Decrease Dropout Rate of Students .......................................................... 9
2.6 MED to Reduce Dropout Rates ........................................................................................ 10
2.7 Improving the Student’s Performance ............................................................................. 10
2.8 Study of Factors Analysis Affecting Academic Achievement .......................................... 10
2.9 EDM for Predicting the Performance of Students ........................................................... 11
2.10 A prediction of performer or underperformer using classification .................................. 11
2.11 The Student Performance Analysis and Prediction ........................................................ 11
2.12 Predicting Student Performance using ID3 and C4.5 ..................................................... 12
2.13 Predicting Graduate Employment.................................................................................. 12
2.14 Evaluation of Student Performance ............................................................................... 13
2.15 Literature Survey Concept Matrix ................................................................................. 13
3. Problem Statement .......................................................................................................... 16
4. Proposed Solution .............................................................................................................. 17
4.1 WEKA............................................................................................................................. 17
4.2 Data collection for identifying students’ grades ............................................................... 17
4.3 Data collection for identifying the dropout issue .............................................................. 18
4.4 Implementation ................................................................................................................ 18
5. Experiments ........................................................................................................................ 19
5.1 ID3 Decision Tree............................................................................................................ 19
5.2 Data set ........................................................................................................................... 19
5.3 Data selection and transformation ................................................................................... 19
5.4 Implementation of Mining Model .................................................................................... 22
5.5 Decision Tree .................................................................................................................. 22
5.6 The ID3 Decision Tree ..................................................................................................... 22
5.7 Impurity Measurement ..................................................................................................... 22
5.8 Entropy............................................................................................................................ 23
5.9 Information gain .............................................................................................................. 23
5.10 ID3 Algorithm .............................................................................................................. 23
5.11 C4.5 ............................................................................................................................... 24
6. Discussion on Result ............................................................................................................ 26
6.1 For identifying student grades .......................................................................................... 26
6.2 For identifying the dropout issue ..................................................................................... 38
7. Conclusion
8. References ...................................................................................................................... 63
List of Tables
Table Number                                                              Page No
Table 1: Conduction of research                                                  2
Table 2: Student relative variables                                             18
Table 3: Student relative variables                                             76
Table 4: Data sets of English file                                              77
List of Figures
Figure Number                                                             Page No
Figure 1: Data Mining                                                           15
Figure 2: Classification                                                        15
Figure 3: Clustering                                                            15
Figure 4: Regression                                                            15
Figure 5: Association rule                                                      15
Figure 6: Nearest Neighbor Method                                               15
Figure 7: Decision tree                                                         15
Figure 8: English file in WEKA                                                  15
Figure 9: C4.5 result                                                           15
Figure 10: Evaluation of test split                                             15
Figure 11: Classifier visualization                                             15
Figure 12: Dropout file in WEKA                                                 15
Figure 13: Evaluation of test split                                             15
Figure 14: Run information                                                      15
Figure 15: C4.5 result                                                          15
Chapter 1
1. INTRODUCTION
1.1 Data Mining
Data mining is the process of extracting interesting (non-trivial, implicit, previously
unknown and potentially useful) patterns or knowledge from huge amounts of data. Data
mining techniques are used to work on large volumes of data to discover hidden patterns
and relationships helpful in decision making. A data mining algorithm is a well-defined
procedure that takes data as input and produces output in the form of models or patterns.
The term "well-defined" indicates that the procedure can be precisely encoded as a finite
set of rules.
Figure 1: Data Mining
The newly emerging field [1] called Educational Data Mining is concerned with developing
methods that discover knowledge from data originating in educational settings. Educational
data mining is used to identify and improve educational processes, which can enhance
decision-making methodology. Key uses of EDM include predicting student performance and
studying learning in order to propose improvements to current educational practice. EDM
can be viewed as one of the learning sciences, as well as an area of data mining.
1.2 Classification Methods
Classification methods such as decision trees, Bayesian networks and so forth can be
applied to educational data for predicting students' performance in examinations. The
prediction will help to identify the weak students and help them to score better marks.
The ID3 (Iterative Dichotomiser 3), C4.5, CART and ADT (Alternating Decision Tree)
decision tree algorithms are applied to students' data to predict their performance in
the final exam.
Figure 2: Classification
1.3 Clustering
Clustering can be described as the identification of similar classes of objects. By using
clustering techniques we can further identify dense and sparse regions in object space,
and can discover the overall distribution pattern and correlations among data attributes.
The classification approach can also be used as an effective means of distinguishing
groups or classes of objects, but it becomes costly, so clustering can be used as a
preprocessing step for attribute subset selection and classification.
Figure 3: Clustering
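The clustering idea above can be made concrete with a minimal k-means sketch in Python. This is an illustrative toy only: the (quiz, mid-term) mark pairs are invented, and the thesis itself performs its analysis in WEKA rather than in code like this.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # pick k distinct initial centroids
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(p[0] for p in cl) / len(cl),
                                sum(p[1] for p in cl) / len(cl))
    return centroids, clusters

# Two well-separated groups of hypothetical (quiz, mid-term) marks.
data = [(8, 18), (9, 19), (7, 17), (2, 5), (1, 4), (3, 6)]
cents, groups = kmeans(data, 2)
# groups now separates the high-mark records from the low-mark records.
```

On this toy data the procedure recovers the dense high-mark and low-mark regions, which is the "dense and sparse regions" intuition described above.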
1.4 Prediction
Regression techniques can be adapted for prediction. Regression analysis can be used to
model the relationship between one or more independent variables and dependent variables.
In data mining, independent variables are attributes that are already known, and response
variables are what we want to predict. Unfortunately, many real-world problems are not
simply prediction. Consequently, more complex techniques (e.g., logistic regression,
decision trees, or neural nets) may be necessary to forecast future values. The same model
types can often be used for both regression and classification.
Figure 4: Regression
For instance, the CART (Classification and Regression Trees) decision tree algorithm can
be used to build both classification trees (to classify categorical response variables)
and regression trees (to forecast continuous response variables). Neural networks can also
create both classification and regression models.
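As a concrete illustration of regression as prediction, the following ordinary least squares sketch fits a line to invented mark data. It is a toy example, not the WEKA-based procedure used later in this thesis.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b (one independent variable)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept from the means.
    a = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return a, my - a * mx

# Hypothetical data: mid-term marks (x) against final-exam marks (y).
xs = [10, 12, 14, 16, 18]
ys = [30, 36, 42, 48, 54]   # exactly y = 3*x, so the fit is exact
a, b = fit_line(xs, ys)
print(a, b)  # → 3.0 0.0
```

Given a new mid-term mark, `a * x + b` would be the predicted final-exam mark; real data would of course not lie exactly on a line.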
1.5 Association rule
Association and correlation analysis is generally used to discover frequent itemsets among
large data sets. This kind of finding helps businesses to make certain decisions, such as
catalogue design, cross marketing and customer shopping behaviour analysis.
Figure 5: Association rule
Association rule algorithms need to be able to generate rules with confidence values less
than one. However, the number of possible association rules for a given dataset is
generally very large, and a high proportion of the rules are usually of little (if any)
value.
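The support and confidence measures behind association rules can be sketched directly. The "courses passed" transactions below are invented for illustration only.

```python
def support(transactions, itemset):
    """Fraction of transactions containing every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, lhs, rhs):
    """Confidence of lhs -> rhs: support(lhs and rhs) / support(lhs)."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)

# Hypothetical transactions: the set of courses each student passed.
T = [{"math", "stats", "dm"}, {"math", "dm"}, {"math", "stats"}, {"stats", "dm"}]
print(support(T, {"math", "dm"}))        # 2 of 4 transactions → 0.5
print(confidence(T, {"math"}, {"dm"}))   # 2 of the 3 "math" transactions
```

A rule such as {math} → {dm} here has confidence 2/3, i.e. less than one, exactly the situation described above.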
1.6 Neural networks
A neural network is a set of connected input/output units in which each connection has a
weight associated with it. During the learning phase, the network learns by adjusting the
weights so as to be able to predict the correct class labels of the input tuples. Neural
networks have the remarkable ability to derive meaning from complicated or imprecise data,
and can be used to extract patterns and detect trends that are too complex to be noticed
by either humans or other computer techniques.
They are well suited for continuous-valued inputs and outputs. Neural networks are best at
identifying patterns or trends in data and are well suited for prediction or forecasting
needs.
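The weight-adjustment idea can be sketched with a single perceptron unit, the simplest neural building block. The "pass only when both components are good" rule and the data are invented for illustration; the thesis experiments do not use this code.

```python
def train_perceptron(samples, epochs=10, lr=0.1):
    """One unit, two inputs; weights are nudged by the prediction error."""
    w0, w1, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x0, x1), target in samples:
            pred = 1 if w0 * x0 + w1 * x1 + b > 0 else 0
            err = target - pred          # error drives the weight change
            w0 += lr * err * x0
            w1 += lr * err * x1
            b  += lr * err
    return w0, w1, b

# Hypothetical rule: "pass" (1) only when both internal components are good.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w0, w1, b = train_perceptron(data)
predict = lambda x0, x1: 1 if w0 * x0 + w1 * x1 + b > 0 else 0
```

After a few epochs the learned weights reproduce the target labels, which is the "learning by adjusting weights" described above in its smallest possible form.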
1.7 Nearest Neighbor Method
A technique that classifies each record in a dataset based on a combination of the classes
of the k record(s) most similar to it in a historical dataset (where k is greater than or
equal to 1). It is sometimes called the k-nearest neighbour technique.
Figure 6: Nearest Neighbor Method
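A minimal k-nearest-neighbour sketch makes the definition above concrete. The (quiz, mid-term) mark records and their Pass/Fail labels are invented for illustration.

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify query by majority vote among the k nearest training records."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(train, key=lambda rec: dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical (quiz, mid-term) marks labelled Pass/Fail.
train = [((9, 18), "Pass"), ((8, 16), "Pass"), ((7, 15), "Pass"),
         ((2, 4), "Fail"), ((3, 5), "Fail"), ((1, 3), "Fail")]
print(knn_predict(train, (8, 17)))  # → Pass
print(knn_predict(train, (2, 5)))   # → Fail
```

With k = 3, each query is labelled by the majority class of its three most similar historical records, exactly as the definition states.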
1.8 Decision Tree
A decision tree is a tree in which each branch node represents a choice between a number
of alternatives, and each leaf node represents a decision. Decision trees are commonly
used for gaining information for the purpose of decision making. A decision tree starts
with a root node on which users take actions. From this node, users split each node
recursively according to the decision tree learning algorithm. The final result is a
decision tree in which each branch represents a possible scenario of a decision and its
outcome. Three widely used decision tree learning algorithms are ID3, ASSISTANT and C4.5.
Figure 7: Decision tree
a) ID3
ID3 (Iterative Dichotomiser 3) is a decision tree algorithm introduced in 1986 by Ross
Quinlan [1]. ID3 uses the information gain measure to choose the splitting attribute. It
only accepts categorical attributes when building a tree model, and it does not give
accurate results when there is noise; to remove the noise, preprocessing techniques must
be applied. Continuous attributes can be handled by the ID3 algorithm by discretizing
them, or directly, by considering the values to find the best split point and taking a
threshold on the attribute values. ID3 does not support pruning.
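The information gain measure that ID3 uses can be computed directly. The sketch below is a toy illustration (the attendance/assignment records are invented, and the thesis experiments use WEKA's implementation, not this code).

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr_index, labels):
    """Reduction in entropy after splitting on the attribute at attr_index."""
    parts = {}
    for row, lab in zip(rows, labels):
        parts.setdefault(row[attr_index], []).append(lab)
    remainder = sum(len(p) / len(labels) * entropy(p) for p in parts.values())
    return entropy(labels) - remainder

# Hypothetical records: (attendance, assignment-submitted) -> result.
rows   = [("Good", "Yes"), ("Good", "Yes"), ("Poor", "No"), ("Poor", "Yes")]
labels = ["Pass", "Pass", "Fail", "Fail"]
print(info_gain(rows, 0, labels))  # attendance separates the classes → 1.0
print(info_gain(rows, 1, labels))  # assignment gives a smaller gain
```

ID3 would choose attendance as the split here because it yields the larger gain, reducing the entropy of the class label to zero in each branch.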
b) C4.5 algorithm
The C4.5 algorithm is a successor to ID3, developed by Ross Quinlan [2]. C4.5 handles both
categorical and continuous attributes when building a decision tree. To handle continuous
attributes, C4.5 splits the attribute values into two partitions based on a selected
threshold, such that all values above the threshold go to one child and the remaining ones
to the other. It also handles missing attribute values. C4.5 uses gain ratio as the
attribute selection measure to build the decision tree; this removes the bias of
information gain towards attributes with many outcome values. First, the gain ratio of
each attribute is calculated; the root node is the attribute whose gain ratio is maximum.
C4.5 uses pessimistic pruning to remove unnecessary branches in the decision tree and
improve the accuracy of classification.
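The gain ratio correction can also be shown in a few lines. The toy attribute values below are invented; the point is that a many-valued attribute (such as a unique student ID) is penalised by the split information, which is exactly the bias removal described above.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """Information gain divided by split information, as C4.5 uses."""
    n = len(labels)
    parts = {}
    for v, lab in zip(values, labels):
        parts.setdefault(v, []).append(lab)
    gain = entropy(labels) - sum(len(p) / n * entropy(p) for p in parts.values())
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in parts.values())
    return gain / split_info if split_info else 0.0

labels = ["Pass", "Pass", "Fail", "Fail"]
# A unique-ID-like attribute has maximal gain but is penalised by split info.
print(gain_ratio(["a", "b", "c", "d"], labels))  # gain 1.0 / split info 2.0 → 0.5
print(gain_ratio(["x", "x", "y", "y"], labels))  # gain 1.0 / split info 1.0 → 1.0
```

Plain information gain would rate both attributes equally (gain 1.0); gain ratio prefers the two-valued attribute, as C4.5 intends.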
1.9 Bayesian Classification
The Naïve Bayes classifier technique is particularly suited when the dimensionality of the
inputs is high. Despite its simplicity, Naïve Bayes can often outperform more
sophisticated classification techniques. A Naïve Bayes model can identify the
characteristics of dropout students; it shows the probability of each input attribute for
the predicted state.
A Naïve Bayesian classifier is a simple probabilistic classifier based on applying Bayes'
theorem (from Bayesian statistics) with strong (naïve) independence assumptions. By the
use of Bayes' theorem we can compute the probability of each class from the observed
attribute values.
1.10 Research Objective
The main objective of this thesis is to use data mining methodologies to study students'
performance in the courses. Data mining provides many tasks that can be used to study
student performance. In this research, the classification task will be used to evaluate
students' performance, and as there are many approaches that are used for data
classification, the decision tree method is used here. Information like Student Batch
(SB), Quiz Marks (QM), Mid Paper Marks (MPM), Assignment Marks (AM), Attendance of
Student (ATT), HSSC Marks (HSSC), Entry Test Marks (ETM) and End Semester Marks (ESM) was
collected from the students' management system, to predict the performance in the final
exam.
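Since WEKA reads data sets in ARFF format, the collected attributes can be laid out as an ARFF header. The sketch below is illustrative only: the attribute names come from the list above, but the attribute types and the grade labels for ESM are assumptions, not the thesis's actual file.

```python
# Attribute names from the thesis; types and the ESM label set are assumed.
attributes = [
    ("SB", "numeric"), ("QM", "numeric"), ("MPM", "numeric"),
    ("AM", "numeric"), ("ATT", "numeric"), ("HSSC", "numeric"),
    ("ETM", "numeric"), ("ESM", "{A,B,C,D,F}"),
]

def arff_header(relation, attrs):
    """Build the @relation/@attribute/@data header WEKA expects."""
    lines = ["@relation " + relation]
    lines += ["@attribute {} {}".format(name, typ) for name, typ in attrs]
    lines.append("@data")
    return "\n".join(lines)

print(arff_header("students", attributes))
```

One data row per student would follow the `@data` line, with values in the same order as the attribute declarations.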
2. Literature Review
2.1 Analyze students’ performance
Bharadwaj and Pal [3] acquired university students' data such as attendance, class test,
seminar and assignment marks from the students' previous database, to predict their
performance at the end of the semester. The data set used in this study was obtained from
VBS Purvanchal University, Jaunpur (Uttar Pradesh), by sampling the Computer Applications
department's MCA (Master of Computer Applications) course from session 2007 to 2010. The
initial size of the data is 50.
The main objective was to use data mining techniques to study students' performance in the
courses. They selected a few derived variables, but they did not select the variable mid
paper marks. They chose the variable ASS (assignment performance) and divided it into two
classes: Yes, the student submitted the assignment, and No, the student did not submit the
assignment; however, I think it should be divided into three classes: Poor (< 40%),
Average (40%-60%) and Good (> 60%). To evaluate students' performance they used the
classification task with the decision tree method, but in a decision tree there is no
backtracking, so a local optimal solution can be taken as the global solution and rules.
2.2 A Prediction for Student's Performance
The main goal of Abeer and Ibrahim [4] was to use data mining techniques to study
students' performance at the end of the course. Overall evaluation and the classification
task are used to predict the final grade of students. The data set used in this study was
obtained from a students' database used in one of the educational institutions, by
sampling the Information Systems department from session 2005 to 2010. The initial size
of the data is 1547 records.
In the full research paper they do not state that they used the WEKA tool for the
implementation, yet at the end they show a figure of the results in WEKA. WEKA is open
source software that implements a large collection of machine learning algorithms and is
widely used in data mining applications.
2.3 A prediction for performance improvement
Bhardwaj and Pal [5] conducted a study on student performance by selecting 300 students
from 5 different degree colleges conducting the BCA (Bachelor of Computer Applications)
course of Dr. R. M. L. Awadh University, Faizabad, India. By means of the Bayesian
classification method on 17 attributes, it was found that factors such as students' grade
in the senior secondary exam, living location, medium of teaching, mother's qualification,
students' other habits, family annual income and students' family status were highly
correlated with students' academic performance.
The objectives were framed to support the low academic achievers in higher education. The
Bayesian classification method is used on the student database to predict the students'
division on the basis of the previous year's database; however, the Bayesian classifier
demands great consistency in the data, so some other technique could be taken into
consideration. Another classification task, i.e. the decision tree method, could also be
used to predict the students' division on the basis of the previous year's database. They
selected 14 derived variables. Other variables could also be selected, i.e. students'
grade in High School education (HSG), students' grade in Senior Secondary (SSG), and the
admission type (Atype).
2.4 A Prediction for Performance Improvement of Engineering Students
Surjeet Kumar Yadav and Saurabh Pal [6] conducted a study on student performance by
selecting 90 students from 5 different degree colleges conducting the BCA (Bachelor of
Computer Applications) course of Dr. R. M. L. Awadh University, Faizabad, India. By means
of the decision tree classification method on 17 attributes, it was observed that factors
such as students' grade in the senior secondary exam, living location, medium of teaching,
mother's qualification, students' other habits, family annual income and students' family
status were highly correlated with students' academic performance.
The objectives were framed in order to aid the low academic achievers in engineering. The
C4.5, ID3 and CART decision tree algorithms are applied to engineering students' data to
predict their performance in the final exam. Other classification tasks could additionally
be applied, i.e. the Bayesian classification method on these 17 attributes.
2.5 Classification to Decrease Dropout Rate of Students
The primary objective of Dr. Saurabh Pal [7] was to use data mining methodologies to find
the students who are likely to drop out in their first year of engineering. The study was
conducted on student performance by selecting 165 students. The classification task is
used to evaluate the previous year's student dropout data, and as there are many
approaches that are used for data classification, the Bayesian classification method was
applied on 17 attributes. Information like marks in High School, marks in Senior
Secondary, students' family position and so on was collected from the student management
system, to predict the list of students who need special attention.
They used the Bayesian classification method to find the students who are likely to drop
out in their first year of engineering. However, the problem with Bayes' theorem is that
it favours the highest occurring value, so for using this technique the data should be
consistent enough to overcome this issue. The Bayesian classifier demands great
consistency in the data, so some other method could be taken into consideration.
2.6 MED to Reduce Dropout Rates
Saurabh Pal [8] used data mining methodologies to find the students who are likely to drop
out in their first year of engineering. In this study, the classification task is used to
evaluate the previous year's student dropout data, and as there are many approaches that
are used for data classification, the ID3, C4.5, CART and ADT decision tree methods are
used here. Information like grade in High School, grade in Senior Secondary, student's
family income, parents' qualification and so on was collected from the student management
system, to predict the list of students who need special attention.
The main objective was to use data mining methods to find the students who are likely to
drop out in their first year of engineering; the ID3, C4.5, CART and ADT decision tree
methods are used here. They selected 14 derived variables. Other variables could also be
selected, i.e. Students' food habit (SFH), Students' other habit (SOH), Students' family
status (FStat), student's family size (FSize), Family annual income status (FAIn), and
whether the student lives in a hostel or not (Hos).
2.7 Improving the Student’s Performance
In K. Shanmuga Priya [9], data classification with a decision tree is used, which helps to
improve the students' performance in a finer manner, gives high confidence to students in
their studies, and identifies the students who need special advising or counselling by the
teacher, which gives high quality of education. The data set used is obtained from the
M.Sc. IT department of Information Technology, 2009 to 2012 batch, Hindustan College of
Arts and Science, Coimbatore. Initially, 50 students' data were taken as a sample and
errors were removed.
No tool and no software is used. They selected a few derived variables, but they did not
select the variable mid paper marks. They chose the variable PP (Paper Presentations);
paper presentation is divided into two classes: Yes, the student participated in the
presentation, and No, the student did not participate in the presentation. However, I
think it should be divided into three classes: Poor, Average and Good.
2.8 Study of Factors Analysis Affecting Academic Achievement
The aim of Pimpa Cheewaprakobkit [10] is to analyse the factors affecting the prediction
of students' academic achievement. It is helpful in identifying weak students who perform
poorly in their studies. The data set included 1,600 student records with 22 attributes of
students registered between the years 2001 and 2011 in a university in Thailand. They used
the WEKA open source data mining tool to analyse the attributes. Two classification
algorithms were adopted and compared: the neural network and the C4.5 decision tree
algorithm.
Three main issues and items of future work are that each factor has a different
significance value, that other variables or factors should be considered as well, and
that ways should be found to advise and support the at-risk students. Future research
should extend the study to investigate the students' performance in other programs.
2.9 EDM for Predicting the Performance of Students
The scope of Ajay Kumar Pal and Saurabh Pal [11] is to extract the knowledge discovered
from the student database for improving student performance. Data mining techniques
including a rule learner (OneR), a popular decision tree algorithm C4.5 (J48), a neural
network (Multilayer Perceptron), and a nearest neighbour algorithm (IB1) are used. The
data set used in that study was obtained from different colleges by sampling the B.Sc.
(Bachelor of Science) course of session 2011-12. The initial size of the data is 200.
They used the WEKA open source software.
2.10 A prediction of performer or underperformer using classification
The aim of Ajay Kumar Pal and Saurabh Pal [12] is likewise to extract knowledge from the student database in order to improve student performance. Data mining techniques including a rule learner (OneR), the decision tree algorithm C4.5 (J48), a neural network (Multilayer Perceptron) and a Nearest Neighbour algorithm (IB1) are used. The data set was obtained from different colleges by sampling the B.Sc. course of session 2011-12, with an initial size of 200 records. The Weka open-source software was used.
2.11 The Student Performance Analysis and Prediction.
Data mining techniques play an important role in data analysis. For the construction of a classification model that could predict the performance of students, particularly of engineering branches, a decision tree algorithm combined with data mining techniques has been used in the research. Various factors may influence the performance of students. In Vivek Kumar Sharma [13] some significant factors have been considered while building the decision tree for classifying students according to their grades. In this paper four different decision tree algorithms, J48, NBTree, REPTree and SimpleCart, were compared, and the J48 decision tree algorithm was found to be the most suitable algorithm for model construction. The cross-validation method and the percentage-split method were used to evaluate the efficiency of the different algorithms. The traditional KDD process has been used as the methodology. The WEKA (Waikato Environment for Knowledge Analysis) tool was used for analysis and prediction. Results obtained in the study may be helpful for identifying weak students, so that management can take appropriate actions and the success rate of students can be increased sufficiently.
2.12 Predicting Student Performance using ID3 AND C4.5
Kalpesh Adhatrao [14] analyzed the data of students enrolled in the first year of engineering. This data was obtained from the information provided by the admitted students to the institution. It includes their full name, gender, application ID, scores in the board examinations of classes X and XII, scores in entrance examinations, category and admission type. They then applied the ID3 and C4.5 algorithms, after pruning the dataset, to predict the results of these students in their first semester as accurately as possible.
In this project, prediction parameters such as the decision trees generated using RapidMiner are
not updated dynamically within the source code. In the future, we plan to make the entire
implementation dynamic to train the prediction parameters itself when new training sets are fed
into the web application. Also, in the current implementation, we have not considered
extracurricular activities and other vocational courses completed by students, which we believe
may have a significant impact on the overall performance of the students. Considering such
parameters would result in better accuracy of prediction.
2.13 Predicting Graduate Employment
Data mining has been applied in various areas because of its ability to rapidly analyze vast amounts of data. The aim of Bangsuk Jantawan [15] is to build a Graduates Employment Model using the classification task in data mining, and to compare several data mining approaches, such as the Bayesian method and the Tree method. The Bayesian method includes five algorithms: AODE, BayesNet, HNB, NaiveBayes and WAODE. The Tree method includes five algorithms: BFTree, NBTree, REPTree, ID3 and C4.5. The study uses a classification task in WEKA and compares the results of each algorithm, where several classification models were generated. To validate the generated models, the experiments were conducted using real data collected from graduate profiles at Maejo University in Thailand. The model is intended to be used for predicting whether a graduate was employed, unemployed, or in an undetermined situation.
2.14 Evaluation of Student Performance
In P. Ajith, M.S.S. Sai [16] outlier detection mechanisms are used for identifying outliers, which improves the quality of decision making. They used outlier analysis to identify outliers in the student data. In the proposed system, a clustering technique together with univariate analysis is implemented. Clustering is finding groups of objects such that the objects in one group are similar to each other and different from the objects in another group. While clustering, the large data set is divided into clusters, some of which contain outliers. After clustering, the data points which lie outside the clusters are identified and treated as outliers. Identification is carried out using univariate analysis, which is the simplest form of quantitative (statistical) analysis. A basic way of presenting univariate data is to create a frequency distribution of the individual cases. Here, they analyze the performance of the UG students of their college and present the results using an outlier detection tool. The analyzed results are represented using histograms based on univariate analysis.
2.15 Literature Survey Concept Matrix
Table 2.1 below briefly describes the purpose of each surveyed paper, the techniques used, the results and outcomes of the proposed solution, the advantages of the proposed method, and the future work of the papers.
Publication Name & Year | Purpose | Technique | Results / Outcome | Advantage
(IJCSIS)-11 | Mining Educational Data to Analyze Students' Performance | Decision tree method | PSM has the highest gain | Helps to improve the division of the student
(WJCAT)-13 | A prediction for Student's Performance Using Classification Method | ID3 decision tree, Weka tool | Midstream has the highest gain | Reduces the failing ratio
(IJCSIS)-11 | A prediction for performance improvement using classification | Bayes classification and data mining, MATLAB tool | Other factors affect student performance | Identifies those students who need special attention
(WCSIT)-12 | A Prediction for Performance Improvement of Engineering Students using Classification | C4.5, ID3 and CART decision tree algorithms, Weka tool | C4.5 has the highest accuracy of 67.778% compared to the other methods | The model successfully identifies the students who are likely to fail
(IJCSIS)-12 | Mining Educational Data Using Classification to Decrease Dropout Rate of Students | Bayes classification implemented in the Weka tool | The students with mid = Hindi do not continue their study | Predicts the list of students who are going to drop their study
(IJIEEB)-12 | Mining Educational Data to Reduce Dropout Rates of Engineering Students | C4.5, ID3, CART and ADT decision tree algorithms implemented in the Weka tool | ID3 can learn effective predictive models from the student dropout data | Produces a short but accurate prediction list of student dropout
(IJANA)-13 | Improving the Student's Performance Using Educational Data Mining | ID3 decision tree | The attribute OSM has the maximum gain value | Improves students' performance in an efficient way
(IMECS)-13 | Study of Factors Analysis Affecting Academic Achievement of Undergraduate Students in International Program | Performance comparison between Decision Tree and Neural Network models | The Decision Tree model is more accurate than the Neural Network model | The model will be updated and tested to reach a satisfactory level
(IJCIT)-13 | Data Mining Techniques in EDM for Predicting the Performance of Students | The rule learner (OneR), decision tree algorithm C4.5 (J48), neural network (MultiLayer Perceptron) and Nearest Neighbour algorithm (IB1) implemented in WEKA | Results show that the Nearest Neighbour algorithm IB1 classifier has the lowest average error | Produces a short but accurate prediction list for the students by applying the predictive models
Table 2.1: Literature survey concept matrix
3. Problem Statement
In Abeer [2] a few derived variables were selected, but they did not select the variable mid marks, which has a strong effect on the final exam. They divided the variable assignment into two classes: Yes – the student submitted the assignment, No – the student did not submit the assignment; but I think it should be divided into three classes: Poor – < 40%, Average – between 40% and 60%, Good – > 60%. To analyze students' performance they used the decision tree classification method, but in a decision tree there is no backtracking, so a locally optimal solution can be taken as the global solution, and unreliable rules are inferred if a small data set is selected. No tool and no software were used. They were not able to predict students' division for the first semester.
4. Proposed Solution
Classification is performed using a decision tree technique to predict the performance at the end of the semester. The decision tree method will be applied to the student database to predict the students' division on the basis of the past database. Those variables will be chosen that strongly influence students' performance. This study will help the students and the teachers to improve the division of the student. It will also help to identify those students who need special attention, reducing the failing ratio by allowing appropriate action to be taken at the right time.
Data analysis and the implementation of a decision support system are used for evaluating students' grades and identifying dropout students using a classification algorithm (decision tree). To evaluate student grades, attributes of internal data are selected from the student database. To identify dropout students, attributes of HSSC and entry test results are selected from the student database. The system will be implemented in WEKA.
4.1 WEKA
Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can
either be applied directly to a dataset or called from your own Java code. Weka contains tools for
data pre-processing, classification, regression, clustering, association rules, and visualization. It is
also well-suited for developing new machine learning schemes.
The data set will be obtained from IIUI. For evaluating students' grades, the data set will come from the Computer Science and Software Engineering department; the initial size of the data is 200 for each subject. For identifying dropout students, the data set will be obtained from the admission department; the initial size of the data will be 9,217 records. The data will be stored in different tables, which will be joined into a single table.
4.2 Data collection for identifying students' grades
To predict the students' division in computer science and software engineering subjects, the attributes and their values are:
1. Student batch (present, senior)
2. Quiz marks (poor, average, good)
3. Mid paper marks (A, B+, B, C+, C, D+, D, F)
4. Assignment marks (poor, average, good)
5. Attendance (poor, average, good)
6. End semester marks (A, B+, B, C+, C, D+, D, F)
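Before WEKA can use these attributes, they have to be written out in its ARFF file format (see Section 4.1). The fragment below is only a sketch of what such a file could look like for the grade data; the relation name and the two data rows are invented for illustration.

```text
@relation student-grade

@attribute Batch      {present, senior}
@attribute Quiz       {poor, average, good}
@attribute Mid        {A, B+, B, C+, C, D+, D, F}
@attribute Ass        {poor, average, good}
@attribute Attendance {poor, average, good}
@attribute ESM        {A, B+, B, C+, C, D+, D, F}

@data
present, good, B+, good, average, B
present, average, F, poor, poor, D
```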
4.3 Data collection for identifying the dropout issue
To predict the dropout students of International Islamic University, Islamabad (IIUI), the attributes and their values are:
1. Gender (male, female)
2. HSSC marks (A, B+, B, C+, C, D+, D, F)
3. Series/sequences (average, good)
4. Quantitative (poor, average, good)
5. Logic (average, good)
6. Analytical (poor, average, good)
7. English (poor, average, good, vgood)
8. End semester marks (A, B+, B, C+, C, D+, D, F)
9. Dropout (yes, no)
4.4 Implementation
I have divided the whole implementation into three phases. In the first phase, data about students who had been admitted was collected. This included the details submitted to the institution at the time of enrolment. In the second phase, irrelevant data was removed from the collected data and the relevant data was fed into a database. The third phase involved applying the ID3 and C4.5 algorithms on the training data to obtain the decision trees of the algorithms.
5. Experiments
In this chapter we discuss the implementation scenario and the obtained results in detail. The chapter first describes the data set and gives its related statistics, then briefly discusses the parameter settings for both methods and the reasons behind them, and finally presents the results and discussion for each of the two methods separately.
5.1 ID3 Decision Tree
The ID3 algorithm was invented by Ross Quinlan. It is a precursor to the C4.5 algorithm. We can create a decision tree for a given data set using the ID3 algorithm, which classifies the data using attributes. ID3 follows the Occam's Razor principle: it is used to create the smallest possible decision tree.
In an educational system, students' performance can be improved by analyzing the internal assessment and the end semester examination. Internal assessment covers class tests, seminars, attendance and lab practicals conducted by the teacher. Along with the internal assessment, the communication skills and paper presentations of the students during their academic days also need to be analyzed for the improvement of students' performance.
5.2. Data set
The data set used in this study was obtained from the Computer Science and Software Engineering department of International Islamic University Islamabad, covering all courses of Batch 2013. The initial size of the data for each subject is 200. Data stored in different tables was joined into a single table. The variables are derived from the internal dataset.
For identifying dropout students, the data set was to be obtained from the admission department, with an initial size of 9,217 records. However, the university could not allow the admission data to be used, so I constructed my own synthetic (fake) data set instead.
5.3. Data selection and transformation
To identify student performance, some of the fields required for the data mining process were selected, and some derived attributes were included. These attributes are given in Table 2.
Attribute | Description | Possible Values
Batch | Student batch | {senior, present}
Quiz | Quiz marks | {good, average, poor}
Ass | Assignment marks | {good, average, poor}
Mid | Mid grades | A = 80%–100%, B+ = 75%–79%, B = 70%–74%, C+ = 65%–69%, C = 60%–64%, D+ = 55%–59%, D = 50%–54%, F = below 50%
ESM | End semester marks | A = 80%–100%, B+ = 75%–79%, B = 70%–74%, C+ = 65%–69%, C = 60%–64%, D+ = 55%–59%, D = 50%–54%, F = below 50%
Table 2: Student-related variables
The values for the attributes are explained as follows for student performance.
• QUIZ – marks obtained in quizzes. In each semester two class tests are conducted, and the average of the class tests is used to calculate the sessional marks. Quiz is split into three classes: Poor – below 40%, Average – between 40% and 60%, Good – above 60%.
• ASS – assignment performance. In each semester two assignments are given to the students by each teacher. Assignment performance is split into three classes: Poor, Average and Good.
• MID – grades are assigned to all students using the following mapping: A – 80%–100%, B+ – 75%–79%, B – 70%–74%, C+ – 65%–69%, C – 60%–64%, D+ – 55%–59%, D – 50%–54%, and F – below 50%.
• ESM – end semester marks obtained in the semester. Grades are assigned to all students using the same mapping as for MID.
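The percentage-to-class mappings above can be written down directly as a small discretization routine. The sketch below (in Python) is illustrative only; the function names and the treatment of the exact boundary values, which the text leaves open, are my own assumptions.

```python
def to_class(pct):
    """Map a percentage to the three-way quiz/assignment classes.

    Boundary handling (what happens at exactly 40% or 60%) is an
    assumption; the thesis only states Poor < 40%, Good > 60%.
    """
    if pct > 60:
        return "good"
    if pct > 40:
        return "average"
    return "poor"

def to_grade(pct):
    """Map a percentage to the Mid/ESM grade scale of Table 2."""
    bands = [(80, "A"), (75, "B+"), (70, "B"), (65, "C+"),
             (60, "C"), (55, "D+"), (50, "D")]
    for lower_bound, grade in bands:
        if pct >= lower_bound:
            return grade
    return "F"

print(to_class(72), to_grade(72))  # good B
```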
To identify the dropout issue, some of the fields required for the data mining process were selected, and some derived attributes were included. These attributes are given in Table 3.
Attribute | Description | Possible Values
Gender | Gender | {male, female}
HSSC | HSSC marks | A+ – 80% to 100%, A – 70%–79%, B – 60%–69%, C – 50%–59%
ESM | End semester marks | A = 80%–100%, B+ = 75%–79%, B = 70%–74%, C+ = 65%–69%, C = 60%–64%, D+ = 55%–59%, D = 50%–54%, F = below 50%
SQ | Sequence | {good > 10, average < 10}
QT | Quantitative | {good > 15, average 10–15, poor < 10}
LO | Logic | {good > 15, average 10–15, poor < 10}
AT | Analytical | {good > 10, average < 10}
EH | English | {vgood > 25, good 20–25, average 10–20, poor < 10}
Dropout | Dropout | {yes, no}
Table 3: Student-related variables for the dropout issue
The values for the attributes are explained as follows for dropout students.
• HSSC marks – the student's grade in higher secondary education. Grades are assigned to all students using the following mapping: A+ – 80% to 100%, A – 70%–79%, B – 60%–69%, C – 50%–59%.
• ESM – end semester marks obtained in the semester. Grades are assigned to all students using the following mapping: A – 80%–100%, B+ – 75%–79%, B – 70%–74%, C+ – 65%–69%, C – 60%–64%, D+ – 55%–59%, D – 50%–54%, and F – below 50%.
• Sequence – an entry test part with a total of 15 marks. It is split into two classes: good > 10 and average < 10.
• Quantitative – entry test part. It is split into three classes: good > 15, average between 10 and 15, poor < 10.
• Logic – entry test part. It is split into three classes: good > 15, average between 10 and 15, poor < 10.
• Analytical – entry test part. It is split into two classes: good > 10, average < 10.
• English – entry test part. It is split into four classes: vgood > 25, good between 20 and 25, average between 10 and 20, poor < 10.
• Dropout – the dropout condition: whether the student continued or not after one year. The value is no if the student continued the study and yes if the student dropped the study after one year.
5.4 Implementation of Mining Model
Weka is open-source software that implements a large collection of machine learning algorithms and is widely used in data mining applications. From the above data, the drop.arff file was created. This file was loaded into the WEKA Explorer. The Classify panel enables the user to apply classification and regression algorithms to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, or the model itself. The algorithm used for classification is Naive Bayes. Under "Test options", 10-fold cross-validation is selected as our evaluation approach. Since there is no separate evaluation data set, this is necessary to get a reasonable idea of the accuracy of the generated model. This predictive model provides a way to predict whether a new student will continue to be enrolled or not after one year.
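On categorical attributes like these, Naive Bayes reduces to counting value frequencies per class. The following minimal sketch illustrates the idea behind what WEKA computes; it is not the actual WEKA implementation, and the toy records, attribute choices and smoothing details are invented for illustration.

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Count class priors and per-attribute value frequencies."""
    class_counts = Counter(labels)
    # value_counts[cls][attr_index][value] -> frequency
    value_counts = defaultdict(lambda: defaultdict(Counter))
    for row, cls in zip(rows, labels):
        for i, v in enumerate(row):
            value_counts[cls][i][v] += 1
    return class_counts, value_counts

def predict_nb(model, row):
    """Pick the class maximizing prior * product of value likelihoods."""
    class_counts, value_counts = model
    total = sum(class_counts.values())
    best, best_p = None, -1.0
    for cls, n in class_counts.items():
        p = n / total
        for i, v in enumerate(row):
            counts = value_counts[cls][i]
            # Laplace smoothing so unseen values do not zero out p
            p *= (counts[v] + 1) / (n + len(counts) + 1)
        if p > best_p:
            best, best_p = cls, p
    return best

# Invented toy records: (HSSC, Quantitative) -> dropout?
rows = [("A", "good"), ("C", "poor"), ("B", "average"), ("C", "poor")]
labels = ["no", "yes", "no", "yes"]
model = train_nb(rows, labels)
print(predict_nb(model, ("C", "poor")))  # yes
```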
5.5 Decision Tree
A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision. Decision trees are commonly used for gaining information for the purpose of decision making. A decision tree starts with a root node on which users take actions. From this node, each node is split recursively according to the decision tree learning algorithm. The final result is a decision tree in which each branch represents a possible scenario of a decision and its outcome.
Three commonly used decision tree learning algorithms are ID3, ASSISTANT and C4.5.
5.6 The ID3 Decision Tree
ID3 is a simple decision tree learning algorithm developed by Ross Quinlan [14]. The basic idea of the ID3 algorithm is to construct the decision tree by employing a top-down, greedy search through the given sets, testing each attribute at every tree node. In order to select the attribute that is most useful for classifying a given set, we introduce a metric: information gain. To find an optimal way to classify a learning set, what we need to do is to minimize the questions asked (i.e. to minimize the depth of the tree). Thus, we need some function which can measure which questions provide the most balanced splitting. The information gain metric is such a function.
5.7 Impurity Measurement
In the dataset there will be some number of attributes and classes of those attributes. The measurement of the homogeneity or heterogeneity of the data in the dataset is based on the classes. A table is pure if it contains one and only one class; a data table which consists of several classes is called heterogeneous, or impure.
There are several ways to measure the impurity of a table; the well-known methods are entropy, the Gini index and the classification error.
Here, entropy is used to quantify the impurity. The entropy of a pure table becomes zero when the probability becomes one, and it reaches its maximum value when all classes in the dataset have equal probability.
5.8 Entropy
Given probabilities p1, p2, …, ps, where Σ pi = 1, entropy is defined as
H(p1, p2, …, ps) = - Σ pi log pi
Entropy measures the amount of order in a given database state. A value of H = 0 identifies a perfectly classified set. In other words, the higher the entropy, the higher the potential to improve the classification process.
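As a concrete example, the entropy of a class distribution can be computed straight from the class counts. The sketch below assumes a base-2 logarithm, which is the usual convention for ID3; the counts are invented.

```python
import math

def entropy(counts):
    """H(p1..ps) = -sum(p_i * log2(p_i)), computed from class counts.

    Written as sum(p * log2(1/p)) to avoid a negative zero for pure sets.
    """
    n = sum(counts)
    return sum((c / n) * math.log2(n / c) for c in counts if c > 0)

print(entropy([8, 8]))    # 1.0: a perfectly mixed two-class set
print(entropy([16, 0]))   # 0.0: a pure set
```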
5.9 Information gain
The information gain measures the expected increase in purity of the subsets that are produced by splitting on an attribute of the given data set. This measure is used to determine the best attribute for a particular node in the tree. Selecting a new attribute and partitioning the examples is repeated for each non-terminal node. An attribute that has already been incorporated higher in the tree is excluded, so that any given attribute appears at most once along any path through the tree.
This process is continued at every leaf node until either of the following conditions is met:
(i) every attribute is already included along the path through the tree, or
(ii) the entropy value of the node is zero, in which case the given value is associated with the leaf node.
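In code, the information gain of an attribute is the entropy of the whole set minus the size-weighted entropy of the subsets induced by the attribute's values. The sketch below is a minimal illustration; the two-attribute toy data is invented.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Entropy of a list of class labels, base-2 logarithm."""
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(S, A) = H(S) - sum_v |S_v|/|S| * H(S_v)."""
    subsets = defaultdict(list)
    for row, y in zip(rows, labels):
        subsets[row[attr]].append(y)
    remainder = sum(len(s) / len(labels) * entropy(s)
                    for s in subsets.values())
    return entropy(labels) - remainder

# Invented toy data: attribute 0 separates the classes perfectly,
# attribute 1 carries no information.
rows = [("good", "x"), ("good", "y"), ("poor", "x"), ("poor", "y")]
labels = ["pass", "pass", "fail", "fail"]
print(info_gain(rows, labels, 0))  # 1.0
print(info_gain(rows, labels, 1))  # 0.0
```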
5.10 ID3 Algorithm
ID3 (Examples, Target_attribute, Attributes)
Step 1: Create a tree with a root node.
Step 2: If all the given examples are positive, return the single-node tree with the root labelled +.
Step 3: If all the given examples are negative, return the single-node tree with the root labelled -.
Step 4: If the set of predicting attributes is empty, return the single-node tree with the root labelled with the most common value of the target attribute in the given examples.
Step 5: Otherwise begin
(i) A ← the attribute from Attributes that best classifies the given examples.
(ii) The decision attribute for the root is A.
(iii) For each possible value Vi of A:
(a) add a branch below the root corresponding to A = Vi;
(b) let Examples_Vi be the subset of the given examples that have the value Vi for A;
(c) if Examples_Vi is empty, add below this branch a leaf node labelled with the most common target value in the given examples; otherwise, add below this branch the subtree ID3(Examples_Vi, Target_attribute, Attributes – {A}).
Step 6: End the procedure.
Step 7: Return the root node.
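Read as code, the recursion above is quite short. The following is a minimal ID3 sketch for categorical attributes; the dict-based tree representation and the tie-breaking behaviour of max() are my own choices, not part of the original algorithm description.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    subsets = defaultdict(list)
    for row, y in zip(rows, labels):
        subsets[row[attr]].append(y)
    return entropy(labels) - sum(
        len(s) / len(labels) * entropy(s) for s in subsets.values())

def id3(rows, labels, attrs):
    """Return a leaf label or a nested dict {attr: {value: subtree}}."""
    if len(set(labels)) == 1:          # Steps 2/3: pure node -> leaf
        return labels[0]
    if not attrs:                      # Step 4: no attributes -> majority
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))  # Step 5(i)
    tree = {best: {}}
    for v in set(r[best] for r in rows):                         # Step 5(iii)
        sub = [(r, y) for r, y in zip(rows, labels) if r[best] == v]
        sub_rows = [r for r, _ in sub]
        sub_labels = [y for _, y in sub]
        tree[best][v] = id3(sub_rows, sub_labels,
                            [a for a in attrs if a != best])
    return tree

# Invented toy data: (Quiz, Ass) -> pass/fail band
rows = [("good", "good"), ("good", "poor"), ("poor", "good"), ("poor", "poor")]
labels = ["pass", "pass", "fail", "fail"]
print(id3(rows, labels, [0, 1]))  # a tree that splits on attribute 0 (Quiz)
```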
5.11 C4.5
C4.5 is a well-known algorithm used to generate decision trees. It is an extension of the ID3 algorithm designed to overcome its disadvantages. The decision trees generated by the C4.5 algorithm can be used for classification, and therefore C4.5 is also referred to as a statistical classifier. The C4.5 algorithm made a number of improvements to the ID3 algorithm [2]. Some of these are:
1. Handling training data with missing attribute values
2. Handling attributes with differing costs
3. Pruning the decision tree after its creation
4. Handling attributes with discrete and continuous values
Let the training data be a set S = s1, s2, … of already classified samples. Each sample si = x1, x2, … is a vector in which x1, x2, … represent the attributes or features of the sample. The training data is accompanied by a vector C = c1, c2, …, in which c1, c2, … represent the class to which each sample belongs.
At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits the set of samples S into subsets enriched in one class or the other [5]. Its splitting criterion is the normalized information gain (difference in entropy) that results from choosing an attribute to split the data. The attribute with the highest normalized information gain is chosen to make the decision. The C4.5 algorithm then recurses on the smaller sub-lists having the next highest normalized information gain.
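C4.5's normalized information gain, usually called the gain ratio, divides the information gain by the split information of the attribute, which penalizes attributes that fragment the data into many small subsets. A sketch of that extra step follows; the entropy and gain helpers repeat the earlier ID3 sketch, and the toy data is invented.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    n = len(labels)
    return sum((c / n) * math.log2(n / c) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    subsets = defaultdict(list)
    for row, y in zip(rows, labels):
        subsets[row[attr]].append(y)
    return entropy(labels) - sum(
        len(s) / len(labels) * entropy(s) for s in subsets.values())

def split_info(rows, attr):
    """SplitInfo(S, A) = entropy of the attribute-value column itself."""
    return entropy([r[attr] for r in rows])

def gain_ratio(rows, labels, attr):
    si = split_info(rows, attr)
    return info_gain(rows, labels, attr) / si if si else 0.0

# Attribute 1 also separates the classes, but splits the set into four
# singleton branches; the gain ratio penalizes it relative to attribute 0.
rows = [("good", "a"), ("good", "b"), ("poor", "c"), ("poor", "d")]
labels = ["pass", "pass", "fail", "fail"]
print(gain_ratio(rows, labels, 0))  # 1.0
print(gain_ratio(rows, labels, 1))  # 0.5
```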
6 Discussion on Results
6.1 Identifying student grades
The data set used in this study was obtained from the Computer Science and Software Engineering department of International Islamic University Islamabad, covering all courses of Batch 2013. Table 4 shows the data set of the subject English; the total number of samples is 168. The samples are divided into three equal parts: two parts are used for training and one part for testing, i.e. 112 samples for training and 56 samples for testing.
ID | Batch | Quiz | Ass | Mid | ESM
1 | present | good | good | F | B
2 | present | good | good | B+ | B
3 | present | good | good | A | A
4 | present | good | good | A | A
5 | present | good | good | D | C
6 | present | average | good | C | C+
7 | present | good | good | A | A
8 | present | good | good | B | B
9 | present | average | good | D | C+
10 | present | good | good | A | B+
11 | present | good | good | A | A
12 | present | good | good | B | C+
13 | present | good | good | A | A
14 | present | good | good | D+ | B
15 | present | average | good | B | C+
16 | present | good | average | B+ | B+
17 | present | average | good | B | B
18 | present | good | good | D | C+
19 | present | good | average | C | C
20 | present | average | good | C | B
21 | present | good | good | C | B
22 | present | average | average | F | D
23 | present | good | good | A | A
24 | present | average | average | B | C+
25 | present | average | average | D+ | C+
26 | present | good | good | B+ | B+
27 | present | good | average | A | B
28 | present | good | good | B | B
29 | present | good | average | B | B
30 | present | good | average | C | B
31 | present | good | average | D+ | C+
32 | present | good | good | B | B
33 | present | good | average | C+ | B
34 | present | average | good | B | B
35 | present | good | good | B | B
36 | present | good | good | B | B
37 | present | good | good | A | A
38 | present | average | average | C+ | B
39 | present | good | good | D | B+
40 | present | good | good | B | A
41 | present | good | average | B | B+
42 | present | good | average | A | A
43 | present | good | good | A | B+
44 | present | average | average | B | C+
45 | present | good | good | A | B+
46 | present | good | average | A | B+
47 | present | good | good | A | A
48 | present | good | good | B+ | A
49 | present | good | average | B | B+
50 | present | average | good | C | C
51 | present | good | good | B | B+
52 | present | good | average | D+ | C+
53 | present | good | good | A | A
54 | present | good | good | B+ | B
55 | present | good | average | A | A
56 | present | good | good | A | A
57 | present | good | average | C | B
58 | present | good | good | A | B
59 | present | average | average | B+ | B
60 | present | average | average | B+ | B+
61 | present | average | good | A | B
62 | present | fail | good | B | B
63 | present | good | good | B | A
64 | present | average | average | B+ | B+
65 | present | good | good | B | B+
66 | present | good | good | A | B+
67 | present | average | good | C | B
68 | present | good | average | A | A
69 | present | average | average | F | D
70 | present | good | good | D | C+
71 | present | good | good | D | B+
72 | present | good | good | D | C+
73 | present | good | fail | B | A
74 | present | average | average | F | C+
75 | present | poor | average | D+ | C+
76 | present | good | good | B | B+
77 | present | good | good | B+ | A
78 | present | good | average | C | B
79 | present | average | good | B | C+
80 | present | average | average | D+ | C+
81 | present | fail | average | D | D+
82 | present | average | good | D+ | B
83 | present | good | good | C | B
84 | present | good | good | A | A
85 | present | good | average | B+ | B+
86 | present | average | average | A | B+
87 | present | fail | good | A | B+
88 | present | good | good | A | A
89 | present | good | good | C | C
90 | present | good | good | B+ | B+
91 | present | good | good | B+ | B+
92 | present | good | good | A | B+
93 | present | good | average | B | B+
94 | present | average | good | C | B
95 | present | average | good | A | B+
96 | present | good | good | B | B+
97 | present | good | good | C+ | B
98 | present | good | average | F | D
99 | present | good | good | A | A
100 | present | average | fail | C | C
101 | present | good | good | A | A
102 | present | good | good | A | A
103 | present | good | good | C+ | B
104 | present | good | good | B+ | A
105 | present | good | good | A | A
106 | present | average | average | B+ | B
107 | present | poor | good | C+ | B
108 | present | good | good | A | A
109 | senior | good | average | A | A
110 | present | good | good | B+ | B+
111 | present | good | good | A | B+
112 | present | good | average | A | B+
113 | present | good | good | C | B+
114 | present | good | average | A | B+
115 | present | average | good | A | A
116 | present | average | good | A | B+
117 | present | good | good | B | B+
118 | present | good | good | A | A
119 | present | good | average | A | A
120 | present | good | good | A | A
121 | present | good | average | B | B+
122 | present | average | good | F | C+
123 | present | average | good | B | C+
124 | present | good | good | A | B+
125 | present | average | good | D | C
126 | present | good | average | C+ | C
127 | present | average | good | C+ | B
128 | present | average | good | C+ | B
129 | present | good | average | A | B
130 | present | average | good | C | C+
131 | present | average | average | D+ | C+
132 | present | good | good | A | A
133 | present | good | good | A | B+
134 | present | average | good | B | B
135 | present | good | good | B | C+
136 | present | average | average | B | B
137 | present | good | average | C | C+
138 | present | good | average | C | C+
139 | present | average | good | F | D
140 | present | good | average | A | A
141 | present | average | good | C | B+
142 | present | good | good | A | A
143 | present | good | good | A | A
144 | present | good | good | A | A
145 | present | good | average | D | C
146 | present | good | good | B+ | B+
147 | present | good | average | B | C+
148 | present | good | average | B+ | A
149 | present | good | good | A | A
150 | present | fail | average | B+ | B
151 | present | average | good | D | C
152 | present | average | average | B | B
153 | present | good | good | C | C+
154 | present | good | average | B | B
155 | present | good | good | B+ | B+
156 | present | good | average | A | A
157 | present | good | average | B+ | B+
158 | present | good | average | C | C+
159 | present | good | good | A | A
160 | present | good | good | C | B
161 | present | average | good | B | B
162 | present | average | average | C+ | B
163 | present | average | average | C | C+
164 | present | good | good | B | B+
165 | present | good | good | C | C+
166 | senior | average | fail | C+ | A
167 | senior | good | good | A | B+
168 | senior | good | good | B | B
Table 4: Data set of the English subject
The root node can be deduced by calculating the information gain from the given student data set. First we have to calculate the entropy value. The dataset S is a set of 168 values of "A", "B+", "B", "C+", "C", "D+", "D" and "F" for the attribute ESM.
Entropy(S) = - (A/n) * log(A/n) - (B+/n) * log(B+/n) - (B/n) * log(B/n) - (C+/n) * log(C+/n) - (C/n) * log(C/n) - (D+/n) * log(D+/n) - (D/n) * log(D/n) - (F/n) * log(F/n)
where n = 168 and each grade symbol stands for the number of samples with that ESM value.
This form (Fig 1) shows the input values of the given data set. From these input values we calculate the entropy, gain, split information and gain ratio for each attribute.
Using the entropy value we calculate the gain value:
Gain(S, A) = Entropy(S) - Σv (|Sv| / |S|) * Entropy(Sv)
where the sum runs over every value v of the attribute A, and Sv is the subset of S for which A has the value v.
Attribute selection can be done by calculating Gain Ratio. Before that we must calculate the Split
Information. Split Information = log (gain)
= log (Gain)
Using the split value Gain Ratio can be calculated.
Gain Ratio = split information / gain
Fig 2 shows the calculated values of entropy, gain, split information and gain ratio for the given attributes. The attribute Mid has the maximum gain ratio, so it becomes the root node of the decision tree. These calculations are repeated on each branch until all the data have been classified or all the given attributes have been used.
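These formulas can be sketched in a few lines of Python (a minimal illustration, not the implementation used in this thesis; the toy rows at the end are hypothetical and are only there to exercise the functions):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = - sum over classes of (c/n) * log2(c/n)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attr, target="ESM"):
    """Information gain, split information and gain ratio of splitting on `attr`."""
    n = len(rows)
    gain = entropy([r[target] for r in rows])
    split_info = 0.0
    for value, count in Counter(r[attr] for r in rows).items():
        subset = [r[target] for r in rows if r[attr] == value]
        gain -= (count / n) * entropy(subset)             # Gain = E(S) - sum (|Sv|/n)*E(Sv)
        split_info -= (count / n) * math.log2(count / n)  # SplitInfo of the partition
    return gain, split_info, (gain / split_info if split_info else 0.0)

# Hypothetical toy rows, just to show the calling convention:
rows = [{"Mid": "A", "ESM": "A"}, {"Mid": "A", "ESM": "A"},
        {"Mid": "C", "ESM": "B"}, {"Mid": "C", "ESM": "C"}]
g, s, gr = gain_ratio(rows, "Mid")
```

The attribute with the largest gain ratio over the full data set is the one chosen for the root node.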
WEKA is a widely used toolbox for machine learning and data mining, originally developed at the University of Waikato in New Zealand. It contains a large collection of state-of-the-art machine learning and data mining algorithms written in Java. WEKA provides tools for regression, classification, clustering, association rules, visualization and data preprocessing. It has become very popular with academic and industrial researchers, and is also widely used for teaching purposes. To use WEKA, the collected data have to be prepared and converted to the ARFF file format to be compatible with the WEKA data mining toolbox.
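As a sketch of that conversion step, the following Python renders nominal records in ARFF form. The attribute names follow the English data set used in this chapter; the listed value domains and the helper function are assumptions for illustration, not the exact files used in the thesis:

```python
def to_arff(relation, attributes, rows):
    """Render rows as a WEKA-compatible ARFF document (nominal attributes only)."""
    lines = ["@relation " + relation]
    for name, values in attributes:
        lines.append("@attribute %s {%s}" % (name, ",".join(values)))
    lines.append("@data")
    lines.extend(",".join(row) for row in rows)
    return "\n".join(lines)

# Attribute names from the English data set; the value domains are assumptions.
ATTRIBUTES = [
    ("Batch", ["present", "senior"]),
    ("Quiz",  ["good", "average", "fail", "poor"]),
    ("Ass",   ["good", "average", "fail"]),
    ("Mid",   ["A", "B+", "B", "C+", "C", "D+", "D", "F"]),
    ("ESM",   ["A", "B+", "B", "C+", "C", "D+", "D", "F"]),
]
arff_text = to_arff("english", ATTRIBUTES, [("present", "good", "good", "A", "A")])
```

The resulting text can be saved with an .arff extension and opened directly in the WEKA Explorer.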
Figure 8 English file in WEKA
Decision tree methods have been applied on the dataset at hand to construct the classification model, using the ID3 decision tree algorithm. After applying the preprocessing and preparation techniques, we analyze the data visually to find the distribution of values. Figure 8 shows the chart of the ESM values: in the data set of the English subject there are 41 samples of A, 41 samples of B+, 44 samples of B, 28 samples of C+, 9 samples of C, 1 sample of D+ and 4 samples of D.
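The class distribution quoted above can be tallied with a few lines of Python (the counts are the ones reported in the text):

```python
from collections import Counter

# ESM class counts for the English data set, as reported above.
esm_counts = Counter({"A": 41, "B+": 41, "B": 44, "C+": 28, "C": 9, "D+": 1, "D": 4})
total = sum(esm_counts.values())          # should account for all 168 instances
majority = esm_counts.most_common(1)[0]   # the most frequent grade
```

The counts sum to 168, matching the number of instances, with B as the most frequent grade.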
Figure 9 C4.5 result
The figure shows the decision tree, in which Mid is the root node; the variable Mid has the highest gain ratio, so it is placed at the root. The tree created by the ID3 algorithm was very deep, since it begins with the attribute Mid, which has 8 values. The other attributes participating in the decision tree were Batch, Quiz and Ass.
The ID3 tree showed that all these attributes have an impact on the grades of a student, but the most decisive attributes were Mid and Quiz. A further indication that can be drawn from the tree is that students with Mid = "A" to "C+" continue their studies.
Figure 10 shows the result of the ID3 decision tree algorithm and the accuracy obtained by applying the classification method to the data set: the accuracy of the ID3 algorithm is about 40%.
Figure 10 evaluation on test split
6.1.1 === Run information ===
Scheme: weka.classifiers.trees.Id3
Relation: english3-weka.filters.unsupervised.attribute.Remove-R1
Instances: 168
Attributes: 5
Batch
Quiz
Ass
Mid
ESM
Test mode: split 66.0% train, remainder test
6.1.2 === Classifier model (full training set) ===
Id3

Mid = F
| Quiz = good
| | Ass = good: B
| | Ass = average: D
| | Ass = fail: null
| Quiz = average
| | Ass = good: C+
| | Ass = average: D
| | Ass = fail: null
| Quiz = fail: null
| Quiz = poor: null
Mid = B+
| Quiz = good
| | Ass = good: B+
| | Ass = average: B+
| | Ass = fail: null
| Quiz = average: B
| Quiz = fail: B
| Quiz = poor: null
Mid = A
| Quiz = good
| | Ass = good
| | | Batch = present: A
| | | Batch = senior: B+
| | Ass = average
| | | Batch = present: A
| | | Batch = senior: A
| | Ass = fail: null
| Quiz = average
| | Ass = good: B+
| | Ass = average: B
| | Ass = fail: null
| Quiz = fail: B+
| Quiz = poor: null
Mid = D
| Quiz = good
| | Ass = good: C+
| | Ass = average: C
| | Ass = fail: null
| Quiz = average: C
| Quiz = fail: D+
| Quiz = poor: null
Mid = C
| Ass = good: B
| Ass = average
| | Quiz = good: B
| | Quiz = average: C+
| | Quiz = fail: null
| | Quiz = poor: null
| Ass = fail: C
Mid = B
| Quiz = good
| | Ass = good
| | | Batch = present: B+
| | | Batch = senior: B
| | Ass = average: B+
| | Ass = fail: A
| Quiz = average
| | Ass = good: B
| | Ass = average: C+
| | Ass = fail: null
| Quiz = fail: B
| Quiz = poor: null
Mid = D+
| Ass = good: B
| Ass = average: C+
| Ass = fail: null
Mid = C+
| Ass = good: B
| Ass = average
| | Quiz = good: B
| | Quiz = average: B
| | Quiz = fail: null
| | Quiz = poor: null
| Ass = fail: A

Time taken to build model: 0 seconds
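To illustrate how such a printed tree classifies a new student, here is a small Python sketch that encodes two of the branches above (Mid = D+ and Mid = B+) as nested dictionaries. This is an illustration only, not the WEKA implementation; a "null" leaf is represented as None:

```python
# Two branches of the printed ID3 tree, encoded by hand for illustration.
TREE = {"attr": "Mid", "branches": {
    "D+": {"attr": "Ass", "branches": {"good": "B", "average": "C+", "fail": None}},
    "B+": {"attr": "Quiz", "branches": {
        "good": {"attr": "Ass", "branches": {"good": "B+", "average": "B+", "fail": None}},
        "average": "B", "fail": "B", "poor": None}},
}}

def predict(tree, student):
    """Follow attribute tests from the root until a leaf grade (or a null leaf) is reached."""
    while isinstance(tree, dict):
        tree = tree["branches"][student[tree["attr"]]]
    return tree
```

For example, predict(TREE, {"Mid": "D+", "Ass": "good"}) follows the Mid = D+, Ass = good branch and returns "B", matching the corresponding leaf in the output above.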
6.1.3 === Evaluation on test split ===
=== Summary ===
Correctly Classified Instances          23                40.3509 %
Incorrectly Classified Instances        32                56.1404 %
Kappa statistic                          0.2555
Mean absolute error                      0.1728
Root mean squared error                  0.3361
Relative absolute error                 79.825  %
Root relative squared error            102.2144 %
UnClassified Instances                   2                 3.5088 %
Total Number of Instances               57
6.1.4 === Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
 0.417    0.395    0.227     0.417    0.294      0.487    B
 0.786    0.073    0.786     0.786    0.786      0.903    A
 0.5      0.038    0.333     0.5      0.4        0.705    C
 0.2      0.156    0.222     0.2      0.211      0.683    C+
 0.267    0.075    0.571     0.267    0.364      0.747    B+
 0        0        0         0        0          0.75     D
 0        0        0         0        0          0.5      D+
Weighted Avg.
 0.418    0.155    0.458     0.418    0.416      0.717
6.1.5 === Confusion Matrix ===

 a  b  c  d  e  f  g   <-- classified as
 5  0  1  5  1  0  0 | a = B
 2 11  0  0  1  0  0 | b = A
 1  0  1  0  0  0  0 | c = C
 6  0  1  2  1  0  0 | d = C+
 7  3  0  1  4  0  0 | e = B+
 1  0  0  1  0  0  0 | f = D
 0  0  0  0  0  0  0 | g = D+
Figure 11 classifier visualization
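As a cross-check, the headline accuracy can be recomputed from the confusion matrix with a few lines of Python (matrix values copied from the output above; the 2 unclassified instances are added back into the denominator of 57 test instances):

```python
# Confusion matrix of the ID3 evaluation (rows = actual class, columns = predicted).
classes = ["B", "A", "C", "C+", "B+", "D", "D+"]
matrix = [
    [5,  0, 1, 5, 1, 0, 0],   # a = B
    [2, 11, 0, 0, 1, 0, 0],   # b = A
    [1,  0, 1, 0, 0, 0, 0],   # c = C
    [6,  0, 1, 2, 1, 0, 0],   # d = C+
    [7,  3, 0, 1, 4, 0, 0],   # e = B+
    [1,  0, 0, 1, 0, 0, 0],   # f = D
    [0,  0, 0, 0, 0, 0, 0],   # g = D+
]
correct = sum(matrix[i][i] for i in range(len(classes)))  # diagonal = correct predictions
classified = sum(map(sum, matrix))                        # instances that received a label
accuracy = 100 * correct / (classified + 2)               # 2 of 57 were left unclassified
```

This reproduces the 23 correctly classified instances and the 40.3509 % accuracy reported in the summary.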
6.2 Identifying the dropout issue
The data set used in this study was to be obtained from the International Islamic University Islamabad admission department; however, the university did not allow this. Table 6 shows the data set of 9217 samples. The samples are divided into three equal parts: two parts are used for training and one part for testing, i.e. 6144 samples are used for training and 3074 samples are used for testing.
Table 6 Data set for dropout issue (first 16 of 9217 rows; the table continues in the same way)

Gender  HSSC Marks  ESM  sequence  Quantitative  Logic    Analytical  English  dropout
male    A+          A    good      good          good     good        vgood    no
male    A+          A    good      good          good     good        good     no
male    A+          A    good      good          good     good        average  no
male    A+          A    good      good          good     good        poor     yes
male    A+          A    good      good          good     average     vgood    no
male    A+          A    good      good          good     average     good     no
male    A+          A    good      good          good     average     average  no
male    A+          A    good      good          good     average     poor     no
male    A+          A    good      good          average  good        vgood    no
male    A+          A    good      good          average  good        good     yes
male    A+          A    good      good          average  good        average  no
male    A+          A    good      good          average  good        poor     no
male    A+          A    good      good          average  average     vgood    no
male    A+          A    good      good          average  average     good     yes
male    A+          A    good      good          average  average     average  no
male    A+          A    good      good          average  average     poor     no
...
Decision tree methods have been applied on this dataset to construct the classification model, again using the ID3 decision tree algorithm. After applying the preprocessing and preparation techniques, we analyze the data visually to find the distribution of values. Figure 12 shows the chart of the class label dropout: in the dropout data set there are 6103 samples of yes and 3113 samples of no.
Figure 12 dropout file in WEKA
6.2.1 === Run information ===
Scheme: weka.classifiers.trees.Id3
Relation: dropout7
Instances: 9216
Attributes: 9
Gender
HSSC Marks
ESM
sequence
Quantitative
Logic
Analytical
English
droupout
Test mode: split 66.0% train, remainder test
Figure 14 run information
6.2.2 === Classifier model (full training set) ===
Id3
ESM = A
| English = vgood
| | sequence = good
| | | Quantitative = good: no
| | | Quantitative = average
| | | | Logic = good: no
| | | | Logic = average
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | | Logic = poor: no
| | | Quantitative = poor
| | | | Logic = good: no
| | | | Logic = average: no
| | | | Logic = poor
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | sequence = average: no
| English = good
| | Quantitative = good
| | | sequence = good
| | | | Logic = good: no
| | | | Logic = average: yes
| | | | Logic = poor
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | sequence = average: no
| | Quantitative = average: no
| | Quantitative = poor: no
| English = average
| | Quantitative = good: no
| | Quantitative = average: no
| | Quantitative = poor
| | | sequence = good: no
| | | sequence = average
| | | | Logic = good: no
| | | | Logic = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | Logic = poor
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| English = poor
| | Logic = good
| | | sequence = good
| | | | Analytical = good
| | | | | Quantitative = good: yes
| | | | | Quantitative = average: no
| | | | | Quantitative = poor: yes
| | | | Analytical = average: no
| | | sequence = average: no
| | Logic = average
| | | Quantitative = good: no
| | | Quantitative = average
| | | | sequence = good: no
| | | | sequence = average
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | Quantitative = poor: no
| | Logic = poor
| | | Quantitative = good
| | | | sequence = good: no
| | | | sequence = average: yes
| | | Quantitative = average
| | | | sequence = good
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | sequence = average: yes
| | | Quantitative = poor: no
ESM = B+
| English = vgood
| | Quantitative = good
| | | Logic = good: no
| | | Logic = average
| | | | sequence = good: no
| | | | sequence = average
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | Logic = poor
| | | | sequence = good
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | sequence = average: no
| | Quantitative = average: no
| | Quantitative = poor: no
| English = good
| | Quantitative = good
| | | Logic = good
| | | | sequence = good: yes
| | | | sequence = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | Logic = average
| | | | sequence = good
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | sequence = average: no
| | | Logic = poor: no
| | Quantitative = average
| | | sequence = good: no
| | | sequence = average
| | | | Analytical = good: no
| | | | Analytical = average
| | | | | Logic = good: yes
| | | | | Logic = average: no
| | | | | Logic = poor: yes
| | Quantitative = poor: no
| English = average
| | Logic = good
| | | Quantitative = good: no
| | | Quantitative = average: no
| | | Quantitative = poor
| | | | sequence = good
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | | sequence = average: no
| | Logic = average
| | | Quantitative = good: no
| | | Quantitative = average: no
| | | Quantitative = poor
| | | | Analytical = good: no
| | | | Analytical = average: yes
| | Logic = poor
| | | Quantitative = good
| | | | sequence = good: no
| | | | sequence = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | Quantitative = average: no
| | | Quantitative = poor: no
| English = poor
| | Logic = good
| | | sequence = good
| | | | Quantitative = good: no
| | | | Quantitative = average
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | | Quantitative = poor: no
| | | sequence = average
| | | | Quantitative = good
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | Quantitative = average: no
| | | | Quantitative = poor
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | Logic = average
| | | Quantitative = good: no
| | | Quantitative = average
| | | | sequence = good: no
| | | | sequence = average
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | Quantitative = poor: no
| | Logic = poor
| | | Quantitative = good
| | | | Analytical = good: no
| | | | Analytical = average: yes
| | | Quantitative = average: no
| | | Quantitative = poor
| | | | sequence = good: yes
| | | | sequence = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
ESM = B
| English = vgood
| | Logic = good
| | | Analytical = good
| | | | sequence = good
| | | | | Quantitative = good: yes
| | | | | Quantitative = average: no
| | | | | Quantitative = poor: no
| | | | sequence = average
| | | | | Quantitative = good: no
| | | | | Quantitative = average: yes
| | | | | Quantitative = poor: yes
| | | Analytical = average
| | | | Quantitative = good
| | | | | sequence = good: yes
| | | | | sequence = average: no
| | | | Quantitative = average: no
| | | | Quantitative = poor: no
| | Logic = average
| | | Quantitative = good: no
| | | Quantitative = average
| | | | sequence = good: no
| | | | sequence = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | Quantitative = poor: no
| | Logic = poor: no
| English = good
| | sequence = good: no
| | sequence = average
| | | Quantitative = good: no
| | | Quantitative = average
| | | | Analytical = good: no
| | | | Analytical = average
| | | | | Logic = good: no
| | | | | Logic = average: yes
| | | | | Logic = poor: yes
| | | Quantitative = poor
| | | | Logic = good
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | Logic = average: no
| | | | Logic = poor: no
| English = average
| | sequence = good: no
| | sequence = average
| | | Quantitative = good: no
| | | Quantitative = average
| | | | Logic = good
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | | Logic = average: no
| | | | Logic = poor: no
| | | Quantitative = poor
| | | | Analytical = good
| | | | | Logic = good: no
| | | | | Logic = average: no
| | | | | Logic = poor: yes
| | | | Analytical = average
| | | | | Logic = good: yes
| | | | | Logic = average: yes
| | | | | Logic = poor: no
| English = poor
| | Analytical = good
| | | Quantitative = good
| | | | sequence = good
| | | | | Logic = good: no
| | | | | Logic = average: yes
| | | | | Logic = poor: yes
| | | | sequence = average: no
| | | Quantitative = average
| | | | sequence = good
| | | | | Logic = good: yes
| | | | | Logic = average: yes
| | | | | Logic = poor: no
| | | | sequence = average: no
| | | Quantitative = poor
| | | | sequence = good: no
| | | | sequence = average: yes
| | Analytical = average
| | | sequence = good: no
| | | sequence = average
| | | | Quantitative = good
| | | | | Logic = good: no
| | | | | Logic = average: no
| | | | | Logic = poor: yes
| | | | Quantitative = average: no
| | | | Quantitative = poor
| | | | | Logic = good: yes
| | | | | Logic = average: yes
| | | | | Logic = poor: no
ESM = C+
| English = vgood
| | Quantitative = good
| | | Logic = good: no
| | | Logic = average: no
| | | Logic = poor
| | | | sequence = good: no
| | | | sequence = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | Quantitative = average
| | | Logic = good: no
| | | Logic = average: no
| | | Logic = poor: no
| | Quantitative = poor: no
| English = good
| | Logic = good
| | | Quantitative = good: no
| | | Quantitative = average: no
| | | Quantitative = poor
| | | | sequence = good: no
| | | | sequence = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | Logic = average: no
| | Logic = poor
| | | Quantitative = good: no
| | | Quantitative = average
| | | | sequence = good
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | | sequence = average: no
| | | Quantitative = poor: no
| English = average
| | Quantitative = good: no
| | Quantitative = average: no
| | Quantitative = poor
| | | Logic = good: no
| | | Logic = average: no
| | | Logic = poor
| | | | sequence = good
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | sequence = average: no
| English = poor: no
ESM = C
| Analytical = good
| | Quantitative = good
| | | English = vgood
| | | | Logic = good: no
| | | | Logic = average: no
| | | | Logic = poor
| | | | | sequence = good: no
| | | | | sequence = average: yes
| | | English = good: no
| | | English = average: no
| | | English = poor
| | | | Logic = good
| | | | | sequence = good: yes
| | | | | sequence = average: no
| | | | Logic = average: no
| | | | Logic = poor: no
| | Quantitative = average: no
| | Quantitative = poor: no
| Analytical = average
| | Quantitative = good
| | | English = vgood
| | | | Logic = good
| | | | | sequence = good: no
| | | | | sequence = average: yes
| | | | Logic = average: no
| | | | Logic = poor: no
| | | English = good: no
| | | English = average: no
| | | English = poor: no
| | Quantitative = average
| | | Logic = good
| | | | English = vgood
| | | | | sequence = good: yes
| | | | | sequence = average: no
| | | | English = good: no
| | | | English = average: no
| | | | English = poor
| | | | | sequence = good: no
| | | | | sequence = average: yes
| | | Logic = average: no
| | | Logic = poor
| | | | English = vgood: no
| | | | English = good: no
| | | | English = average: yes
| | | | English = poor: no
| | Quantitative = poor
| | | English = vgood: no
| | | English = good
| | | | Logic = good: no
| | | | Logic = average: yes
| | | | Logic = poor: no
| | | English = average: no
| | | English = poor: no
ESM = D+
| sequence = good
| | English = vgood
| | | Quantitative = good: no
| | | Quantitative = average
| | | | Logic = good: no
| | | | Logic = average
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | | Logic = poor: no
| | | Quantitative = poor: no
| | English = good
| | | Quantitative = good
| | | | Logic = good: no
| | | | Logic = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | Logic = poor: no
| | | Quantitative = average: no
| | | Quantitative = poor
| | | | Logic = good
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | | Logic = average: no
| | | | Logic = poor: no
| | English = average
| | | Quantitative = good
| | | | Logic = good: no
| | | | Logic = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | Logic = poor
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | Quantitative = average: no
| | | Quantitative = poor
| | | | Logic = good
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | | Logic = average: no
| | | | Logic = poor: no
| | English = poor
| | | Quantitative = good
| | | | Logic = good: no
| | | | Logic = average: no
| | | | Logic = poor
| | | | | Analytical = good: no
| | | | | Analytical = average: yes
| | | Quantitative = average
| | | | Logic = good: no
| | | | Logic = average
| | | | | Analytical = good: yes
| | | | | Analytical = average: no
| | | | Logic = poor: no
| | | Quantitative = poor: no
| sequence = average
| | Analytical = good
| | | English = vgood
| | | | Quantitative = good: no
| | | | Quantitative = average: no
| | | | Quantitative = poor
| | | | | Logic = good: no
| | | | | Logic = average: yes
| | | | | Logic = poor: no
| | | English = good
| | | | Quantitative = good
| | | | | Logic = good: yes
| | | | | Logic = average: yes
| | | | | Logic = poor: no
| | | | Quantitative = average
| | | | | Logic = good: no
| | | | | Logic = average: no
| | | | | Logic = poor: yes
| | | | Quantitative = poor
| | | | | Logic = good: yes
| | | | | Logic = average: yes
| | | | | Logic = poor: no
| | | English = average
| | | | Quantitative = good
| | | | | Logic = good: yes
| | | | | Logic = average: yes
| | | | | Logic = poor: no
| | | | Quantitative = average
| | | | | Logic = good: yes
| | | | | Logic = average: no
| | | | | Logic = poor: yes
| | | | Quantitative = poor: no
| | | English = poor
| | | | Logic = good: yes
| | | | Logic = average
| | | | | Quantitative = good: yes
| | | | | Quantitative = average: no
| | | | | Quantitative = poor: no
| | | | Logic = poor: no
| | Analytical = average
| | | Quantitative = good: no
| | | Quantitative = average
| | | | English = vgood: no
| | | | English = good: no
| | | | English = average: no
| | | | English = poor
| | | | | Logic = good: no
| | | | | Logic = average: no
| | | | | Logic = poor: yes
| | | Quantitative = poor
| | | | Logic = good
| | | | | English = vgood: no
| | | | | English = good: no
| | | | | English = average: no
| | | | | English = poor: yes
| | | | Logic = average
| | | | | English = vgood: yes
| | | | | English = good: yes
| | | | | English = average: no
| | | | | English = poor: no
| | | | Logic = poor
| | | | | HSSC Marks = A+: no
| | | | | HSSC Marks = A: no
| | | | | HSSC Marks = B: no
| | | | | HSSC Marks = C
| | | | | | English = vgood: no
| | | | | | English = good: no
| | | | | | English = average: no
| | | | | | English = poor
| | | | | | | Gender = male: no
| | | | | | | Gender = female: yes
ESM = D: yes
ESM = F: yes
Time taken to build model: 0.02 seconds
6.2.3 === Evaluation on test split ===
=== Summary ===
Correctly Classified Instances        3088                98.5637 %
Incorrectly Classified Instances        45                 1.4363 %
Kappa statistic                          0.9678
Mean absolute error                      0.0144
Root mean squared error                  0.1198
Relative absolute error                  3.2148 %
Root relative squared error             25.4112 %
Total Number of Instances             3133

Figure 13 evaluation on test split
6.2.4 === Detailed Accuracy By Class ===

TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
 0.986    0.015    0.992     0.986    0.989      0.985    no
 0.985    0.014    0.973     0.985    0.979      0.985    yes
Weighted Avg.
 0.986    0.015    0.986     0.986    0.986      0.985
=== Confusion Matrix ===

    a    b   <-- classified as
 2058   29 | a = no
   16 1030 | b = yes
Figure 15 C4.5 result
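The reported kappa statistic can likewise be recomputed from this 2x2 confusion matrix (cell values copied from the output above):

```python
# Dropout confusion matrix: rows are actual classes, columns are predicted classes.
nn, ny = 2058, 29     # actual "no":  2058 predicted no, 29 predicted yes
yn, yy = 16, 1030     # actual "yes": 16 predicted no, 1030 predicted yes
n = nn + ny + yn + yy                                          # 3133 test instances
p_o = (nn + yy) / n                                            # observed agreement
p_e = ((nn + yn) * (nn + ny) + (ny + yy) * (yn + yy)) / n**2   # chance agreement
kappa = (p_o - p_e) / (1 - p_e)                                # Cohen's kappa
```

The observed agreement reproduces the 98.5637 % accuracy, and the kappa value matches the reported 0.9678.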
7 Conclusion
In this thesis, the classification task is applied to a student database to predict the students' performance on the basis of the previous semester's database. Although there are many approaches to data classification, the decision tree method is used here. Information such as attendance, mid paper marks, assignment marks, entry test marks and other variables was collected from the student database to predict performance at the end of the semester, and information such as gender, ESM, HSSC marks, entry test marks and other variables was collected from the student database to identify the dropout issue. This study will help the students and the teachers to improve the division of the students. It will also help to identify those students who need special attention, so that the failing ratio can be reduced and appropriate action taken at the right time.
8 References
[1] J. R. Quinlan, "Induction of Decision Trees", Machine Learning, Vol. 1, pp. 81-106, 1986.
[2] J. R. Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann Publishers, Inc., 1992.
[3] Brijesh Kumar Bhardwaj, Saurabh Pal, "Mining Educational Data to Analyze Students' Performance", (IJCSIS), Vol. 2, No. 6, 2011.
[4] Abeer Badr El Din Ahmed, Ibrahim Sayed Elaraby, "Data Mining: A prediction for Student's Performance Using Classification Method", (WJCAT), 2(2): 43-47, 2014.
[5] Brijesh Kumar Bhardwaj, Saurabh Pal, "Data Mining: A prediction for performance improvement using classification", (IJCSIS), Vol. 9, No. 4, April 2011.
[6] Surjeet Kumar Yadav, Saurabh Pal, "Data Mining: A Prediction for Performance Improvement of Engineering Students using Classification", (WCSIT), Vol. 2, No. 2, pp. 51-56, 2012.
[7] Saurabh Pal, "Mining Educational Data Using Classification to Decrease Dropout Rate of Students", (IJMSE), Vol. 3, No. 5, May 2012.
[8] Saurabh Pal, "Mining Educational Data to Reduce Dropout Rates of Engineering Students", (IJIEEB), 2012.
[9] K. Shanmuga Priya, A. V. Senthil Kumar, "Improving the Student's Performance Using Educational Data Mining", (IJANA), Vol. 04, Issue 04, pp. 1680-1685, 2013.
[10] Pimpa Cheewaprakobkit, "Study of Factors Analysis Affecting Academic Achievement of Undergraduate Students in International Program", IMECS 2013, March 13-15, 2013.
[11] Ajay Kumar Pal, Saurabh Pal, "Data Mining Techniques in EDM for Predicting the Performance of Students", JCSIT, Vol. 02, Issue 06, November 2013.
[12] U. K. Pandey, S. Pal, "Data Mining: A prediction of performer or underperformer using classification", (IJCSIT) International Journal of Computer Science and Information Technology, Vol. 2(2), pp. 686-690, ISSN: 0975-9646, 2011.
[13] Mrinal Pandey, Vivek Kumar Sharma, "A Decision Tree Algorithm Pertaining to the Student Performance Analysis and Prediction", International Journal of Computer Applications (0975-8887), Vol. 61, No. 13, January 2013.
[14] Kalpesh Adhatrao, Aditya Gaykar, Amiraj Dhawan, Rohit Jha, Vipul Honrao, "Predicting Students' Performance Using ID3 and C4.5 Classification Algorithms", International Journal of Data Mining & Knowledge Management Process (IJDKP), Vol. 3, No. 5, September 2013.
[15] Bangsuk Jantawan, Cheng-Fa Tsai, "The Application of Data Mining to Build Classification Model for Predicting Graduate Employment", (IJCSIS) International Journal of Computer Science and Information Security, Vol. 11, No. 10, October 2013.
[16] P. Ajith, M. S. S. Sai, B. Tejaswi, "Evaluation of Student Performance: An Outlier Detection Perspective", International Journal of Innovative Technology and Exploring Engineering (IJITEE), ISSN: 2278-3075, Vol. 2, Issue 2, January 2013.
[17] Margaret H. Dunham, "Data Mining: Introductory and Advanced Topics".
[18] http://en.wikipedia.org/wiki/Predictive_modelling