V-Index: an Index based on Consistent
Researcher Productivity
Submitted By:
S. M Saleem Yasir
364-FBAS/MSCS/F07
Supervised By:
Dr. Ali Daud
Assistant Professor
Department of Computer Science and Software Engineering
International Islamic University, Islamabad
Department of Computer Science and Software Engineering
Faculty of Basic and Applied Sciences
International Islamic University, Islamabad
A dissertation submitted in partial fulfillment of requirements
for the degree of MS in Computer Science
at the Faculty of Basic and Applied Sciences
International Islamic University
Islamabad, Pakistan
May 2012
In the name of
Allah
Most Merciful and Compassionate, the most gracious and beneficent
whose help and guidance we always solicit at every step and moment.
Dedicated to my Parents, Teachers
and
Muslim Ummah.
Department of Computer Science and Software Engineering,
International Islamic University Islamabad, Pakistan
Date: 30/08/2012
Final Approval

This is to certify that we have read and evaluated the thesis entitled V-Index: an Index based on
Consistent Researcher Productivity submitted by S. M Saleem Yasir under Reg No. 364-FBAS/MSCS/F07
and that in our opinion it is fully adequate in scope and quality as a thesis for
the degree of Master of Science in Computer Science.
Committee
External Examiner:
Dr. Zia Ul Qayyum
Professor
Department of Computing and Technology
IQRA University H-9 Islamabad
Internal Examiner:
Dr. Ayyaz Hussain
Assistant Professor
Department of CS & SE
IIU Islamabad
Supervisor:
Dr. Ali Daud
Assistant Professor
Department of CS&SE
IIUI
Declaration
I hereby certify that the work presented in this thesis is, to the best of my knowledge and belief,
original, except as acknowledged in the text, and that the material has not been submitted, either
in whole or in part, for a degree at this or any other university.
I acknowledge that I have read and understood the University’s rules, requirements, procedures
and policy relating to my higher degree research award and to my thesis. I certify that I have
complied with the rules, requirements, procedures and policy of the University (as they may be
from time to time).
Name: _______________________________________
Signature: ____________________________________
Date: ________________________________________
ACKNOWLEDGMENT
In the name of Allah, Most Gracious, Most Merciful
I thank Almighty ALLAH for giving me the courage and patience to carry out this work. I am
very thankful to International Islamic University for providing such a good research
environment.
I wish to thank my supervisor Dr. Ali Daud for his continuous advice, support and encouragement
throughout this work. He has instilled in me a state of confidence, with which I now feel that I
can do research on any new topic following his research guidelines.
I am grateful to the Department of Computer Science and Software Engineering, IIU Islamabad, and
its faculty members for providing a healthy environment for research.
I would be failing in my duties if I did not thank my fellow graduate students,
especially Mr. Khalid Mahmood, Mr. Asad Mehmood Khan, Mr. Naveed Ahmad, Mr. Zafar
Mahmood and Mr. Muhammad Abid, for their continuous motivational support. I am looking
forward to continued collaboration with them in the future.
I would also like to thank my dear friends Mr. Muhammad Mehran Ajmal, Mr. Ibrar Munsif,
Mr. Waqar Ahmad, Mr. Saqib Hanif, Mr. Haris, Mr. Haider Ali Farooq and Mr. Waqas Mirza,
who have been a continuous motivation behind my success. I am especially thankful to Mr.
Kashif Abbasi for helping me out when I was doing my implementation.
Finally, I am eternally grateful to my parents and my whole family. Their endless support,
encouragement and stimulation have been a true source of strength and inspiration for me. I also
thank my wife for her consistent optimism whenever I was frustrated.
Abstract
In the current era, a tremendous amount of scientific research work is published by
thousands of researchers annually. Different methods have been proposed for researcher
productivity indexing based on the quantity and quality of publications. Unfortunately, none of
them considers the variation among the numbers of citations received by a researcher’s papers.
In this thesis, a new method named Variation-Index (v-index) is proposed to handle this issue.
The v-index considers the consistency of the citations received by a researcher’s publications, in
addition to their quantity and quality, for indexing. We have also proposed incorporating a time
factor into the normal v-index by merging in the capabilities of the m-quotient index. This
citation variation enhancement is quite general and can be merged into any of the existing
indexing measures with ease.
We have used the h and g indices as our base study and compared the results extracted
from a real data set of scientists from the Google Scholar database. In the results analysis we
have compared the results of both the simple and the time-normalized v-index, and time
normalization has a clear impact on scientist ranking. A scientist ranked high by the v-index is
affected by the time-normalized v-index, and his or her ranking changes accordingly. This shows
that the time factor has a strong impact on scientist ranking. Through a statistical measure we
have shown that our method performs better than other methods in terms of consistency when
evaluating scientists. Quantitatively, we have calculated the standard deviation and applied it to
the final results for ranking purposes: the higher the standard deviation, the lower the
consistency and thus the lower the ranking. This is our quantitative measure for ranking. We have
developed an application which calculates the v-index of scientists and produces results in
comparison with the existing h and g indices.
Table of Contents
Chapter 1 Introduction
1.1- Introduction
1.2- Motivation
1.3- Objective of Study
1.4- Scope of Study
1.5- Thesis Organization
Chapter 2 Related Work
2.1- Related Work
2.2- Ranking Methods
    2.2.1- Citations Count
    2.2.2- Impact Factors
    2.2.3- Index
2.3- Existing Methodologies
    2.3.1- H-Index
    2.3.2- G-Index
2.4- What is Problem Statement?
2.5- Consistency
2.6- Problem Statement
2.7- Summary
Chapter 3 Methodology
3.1- Methodology
3.2- Standard Deviation
3.3- Proposed Method
    3.3.1- V-Index: Variation Index
        3.3.1.1- Example
    3.3.2- Time Normalized V-index
    3.3.3- Granularity of results
    3.3.4- Results and Discussions of solved example
3.4- Summary
Chapter 4 Experiments and Implementation
4.1- Experiment and Implementation
4.2- Dataset
    4.2.1- Publish and Perish Utility
    4.2.2- Data Extraction and Preprocessing
    4.2.3- Database Tables
4.3- Development Tool and Programming language
    4.3.1- Visual Studio 2008
    4.3.2- C#
4.4- Application
    4.4.1- Screen shots and Descriptions
4.5- Summary
Chapter 5 Results and Analysis
5.1- Results and Analysis
5.2- Scientists with same h-index
5.3- Scientists with same g-index
5.4- Scientists with same h and g-index
5.5- Summary
Chapter 6 Research Contribution and Conclusions
6.1- Research Contribution
    6.1.1- Productivity and Efficiency with Consistency
    6.1.2- Proposal of Simple Method
6.2- Conclusion
References
List of Tables
1. Table 2.1: Scientist A’s sample citation distribution
2. Table 3.1: Sample data scientist A
3. Table 3.2: Sample data scientist B
4. Table 3.3: H, G and V Indices
5. Table 4.1: Dataset Table
6. Table 4.2: Index Table
7. Table 5.1: Scientist with same h-index 15
8. Table 5.2: Result of scientists with h-index 15
9. Table 5.3: Scientists with h-index 16
10. Table 5.4: Result of scientists with h-index 16
11. Table 5.5: Scientist with same g-index 25
12. Table 5.6: Result of scientists with g-index 25
13. Table 5.7: Scientist with same g-index 21
14. Table 5.8: Result of scientists with g-index 21
15. Table 5.9: Scientists with same h and g indices
16. Table 5.10: Results of scientists with same h and g indices
List of Figures
1. Figure 1.1: Comparing Scientist’s work
2. Figure 3.1: Comparison between the received citations of author A and author B
3. Figure 4.1: Screen shot V-index simulation form
4. Figure 4.2: Screen shot index calculation form
5. Figure 5.1: Chart of scientists with h-index 15
6. Figure 5.2: Chart of scientists with h-index 16
7. Figure 5.3: Chart of scientists with g-index 25
8. Figure 5.4: Chart of scientists with g-index 15
9. Figure 5.5: Chart of scientists with same h and g indices
Chapter 1
Introduction
1.1- Introduction
In the current era, scientific success is based on the research produced by scientists in
different fields of study. A researcher’s success is based on the papers he or she publishes in
different journals and conferences. A large amount of money is being invested in scientific
research in advanced countries, due to which competition is getting tougher every day. Currently
a massive amount of scientific research work is published, and organizations need to evaluate
researchers’ work to find suitable researchers for emerging industry requirements [7,11]. All
scientific progress depends on the quality of the research work produced by researchers.
Scopus, Thomson ISI, Google Scholar and Microsoft Academic Search have built and
maintain large databases of researchers’ publications in different journals and conference
proceedings, and of the citations received by them. They provide the h and g indices of
researchers, which are the measures most used for judging researcher productivity.
Ranking of scientific journals, conferences and individual scientists has created a competitive
environment, and because ranking depends upon the number of publications, this competition has
led researchers to produce an enormous amount of research work. The major problem for ranking
is to identify the quality work within this huge volume of data. Many measures have been proposed
and used to evaluate individual work as well as journals and conferences. Different measures use
different techniques for evaluation, i.e. some consider the number of publications, and some
consider the citations received by the published work. With time, new variations have been
introduced to overcome the problems of previous methods, and a number of other factors have
been introduced, such as time, age, field, area and application.
Different indexing methods have been proposed and used to measure the quantity and
quality of researchers’ work. In the past, the impact factor (IF) [13] was considered to be the
best indexing method for evaluating journal articles. It is based on the average number of
citations received by an article published in a science journal. Journals with a high IF are
considered more productive than those with a lower IF. The Impact Factor was limited to journal
indexing; consequently, a general indexing scheme useful for journals, conferences and
researchers, named the h-index [15], was proposed. It does not consider the average citation
count over the number of documents published. One can use the h-index to assess the work of an
individual researcher as well as a group or team of researchers. Later the g-index [9] was
proposed, which uses the same method as the h-index to calculate the impact and quantity of a
researcher’s published work but is more sensitive than the h-index because it gives more
importance to researchers with highly cited papers. A number of variants of the h-index and
g-index have been proposed by different researchers, suggesting enhancements to the existing
methods that remove their weaknesses [5,6,16,17]. The h-index and g-index have also been merged
to benefit from both at the same time [2]. All the existing indexing methods ignore the variation
in citations across a researcher’s papers.
1.2- Motivation
Identifying a scientist’s contribution in terms of producing quality work has always been a
problem. Any method can be used to evaluate a scientist’s worth, but it should be transparent and
fair, because the evaluation results drive decisions on awarding limited grants, promotions,
fellowships and awards. Our motivation is to find the best way to assess the contribution of a
scientist by removing the gray areas of the existing h and g indices. We have based our proposed
solution on the “Consistency” factor, which has been ignored by the previous methods. In our
method, a scientist who is more consistent in producing quality work, as measured by received
citations, is considered to be contributing more.
Figure 1.1: Comparing Scientist’s work
As shown in Figure 1.1, we have a situation in which all the scientists have the same h-index
value and we have to find the best scientist among them.
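The situation in Figure 1.1 can be illustrated with a minimal Python sketch. The two citation lists are hypothetical and are not taken from the thesis dataset, and the population standard deviation is used only as a stand-in for the consistency idea. This is not the v-index formula itself, which is given in Chapter 3.

```python
from statistics import pstdev


def h_index(citations):
    """Largest rank h such that the paper at rank h has at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


# Hypothetical citation counts per paper (illustrative only).
scientist_x = [10, 9, 9, 8, 8]    # evenly cited papers
scientist_y = [40, 12, 6, 5, 5]   # one highly cited paper, the rest much lower

# Both scientists obtain the same h-index of 5 ...
print(h_index(scientist_x), h_index(scientist_y))

# ... but the spread of their citations differs; a smaller spread is read
# here as higher consistency (assumption: population standard deviation).
print(pstdev(scientist_x), pstdev(scientist_y))
```

Under these assumed numbers the h-index cannot separate the two scientists, while the spread of citations can.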
1.3- Objective of Study
The objective of our study is to introduce a new index which uses consistency as the key
parameter for evaluating a scientist’s work, and to compare this index with the existing
well-known indices, the h-index and g-index. The main objectives of this study are as follows.
• Proposing the new v-index (variation index), which identifies the variations and time
efficiency in received citations and finds the most consistent scientist.
• Comparing the results of the proposed method with the existing h and g indices.
• Improving the efficiency of existing ranking methods.
1.4- Scope of Study
The scope of our study is to implement our proposed solution using real data of scientists and
generate the results. We have developed simulation software using C# as the programming
language in the Visual Studio 2008 environment. The dataset of all the scientists has been
extracted from the Google Scholar database using the Publish or Perish (PoP) utility. We have
performed all the calculations on the real dataset.
1.5- Thesis Organization
The rest of the thesis is organized in the following manner.
Chapter 2: This chapter describes in detail the different evaluation methods for ranking
scientists, journals and conferences, together with the history of these methods and their
applications. All the related work is covered in detail. Different indices and previously proposed
methods for evaluating scientists are described along with their strengths and drawbacks.
On the basis of this literature review, the problem statement is formulated and written down
in this chapter.
Chapter 3: This chapter presents the methodology we have used to support our proposed
solution. Standard deviation and variance are defined and explained, and then we explain our
proposed solution and how it is applied. A sample problem with dummy data is also solved to
support our proposed solution.
Chapter 4: This chapter covers the tools and utilities used to collect and preprocess the data,
and the tools and programming language used to develop the simulation software that calculates
our variation index. Screen shots with descriptions are added to give readers a look at the UI.
Chapter 5: This chapter discusses in detail the input data and the output produced by our
program, and presents an analysis of the results. We have taken different criteria to support our
proposed methodology for the evaluation of scientists, i.e. scientists with the same h-index,
scientists with the same g-index, and scientists with the same h and g index values.
Chapter 6: This chapter concludes the thesis and discusses enhancements. The research
contributions are also stated concisely.
Chapter 2
Related Work and Problem Statement
2.1- Related Work
We have studied in detail a number of articles and research papers regarding scientist
ranking and evaluation. A number of different methods were explored, i.e. citation count, Impact
Factor, and the h-index with its variations and enhancements. During our literature review we
considered the strengths and weaknesses of the different methods, their applications, impact and
research contribution. On the basis of this review we selected the well-known h-index and
g-index as our base indices and then studied in detail the enhancements and variations of
these indices. Through this extensive study we found that the element of consistency of
received citations is missing from the process of evaluating a scientist’s work. Details of the
related work, with selected papers from different scientists, are explained in the coming section
on existing methodologies.
2.2- Ranking Methods
Ranking a scientist on the basis of his/her research contribution has been a serious and highly
discussed issue for many years. Different types of methods are used to rank scientists, but no
single method is able to compute a ranking effectively while considering all
aspects [21]. If an evaluation method is good in one aspect, it is lacking in another. Some of the
most used ranking methods are discussed below.
2.2.1- Citations Count
A citation is the acknowledgment of one scientist’s work by using it as a reference in another
work, i.e. if we use an idea or work published in any journal or conference as a reference in our
own publication or article, we have cited that paper and its citation count increases by one.
Citation count is one of the earliest methods used to evaluate the ranking of a scientist, journal
or conference, by counting the number of citations received in a certain period of
time [24][25]. Citation count can be applied at the following levels.
• Individual publication (how many publications have cited this publication)
• Scientist or author (total number of citations received across all publications of a scientist)
• Journal or conference (average citation count received by the articles or papers published
in a specific journal or conference).
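The three levels of citation counting listed above can be sketched in Python as follows. The records, names and numbers are hypothetical and serve only to illustrate the aggregation. The first level is simply the per-paper citation count stored in each record.

```python
from collections import defaultdict

# Hypothetical records: (paper title, author, venue, citations received).
papers = [
    ("Paper 1", "Author A", "Journal X", 12),
    ("Paper 2", "Author A", "Journal Y", 3),
    ("Paper 3", "Author B", "Journal X", 8),
]


def citations_per_author(records):
    """Total citations received across all publications of each author."""
    totals = defaultdict(int)
    for _title, author, _venue, cites in records:
        totals[author] += cites
    return dict(totals)


def average_citations_per_venue(records):
    """Average citations per paper for each journal or conference."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for _title, _author, venue, cites in records:
        sums[venue] += cites
        counts[venue] += 1
    return {venue: sums[venue] / counts[venue] for venue in sums}


print(citations_per_author(papers))         # author-level totals
print(average_citations_per_venue(papers))  # venue-level averages
```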
2.2.2- Impact Factors
Eugene Garfield proposed a method to assess the quality of work published by a journal,
known as the Impact Factor (IF). The Impact Factor has been widely used as the standard method
to evaluate journal ranking. A journal with a higher Impact Factor was considered more
valuable than others. The Impact Factor of a journal is calculated as the average number of
citations received in a given year by each paper the journal published during the previous two
years. The Impact Factor of a journal for 2011 would be measured as follows.
T = Number of citations received during 2011, in indexed journals, by items the journal
published in 2009 and 2010.
C = Total number of citable items the journal published in 2009 and 2010.
Then the Impact Factor of that journal for the year 2011 is calculated as
IF (2011) = T/C
The Impact Factor was used to rank journals and cannot be applied to an individual scientist’s
work [5]. Some other improvements and variations to the Impact Factor were also proposed by the
same organization. The Immediacy Index is calculated by dividing the number of citations that
the papers published in a journal in a given year receive during that year by the number of
papers published in that year. Other enhancements, like the Aggregate Impact Factor, were
introduced for subject categories of journals. All these measures can only be applied to journal
ranking, not to the work of an individual or a group of people (team).
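As a small worked illustration of IF = T/C, the following Python sketch computes a 2011 Impact Factor. The function name and the counts are assumptions chosen for the example and do not describe any real journal.

```python
def impact_factor(citations_to_prev_two_years, citable_items_prev_two_years):
    """Impact Factor = T / C for a given year.

    T: citations received in that year by items published in the two
       preceding years; C: citable items published in those two years.
    """
    if citable_items_prev_two_years <= 0:
        raise ValueError("the journal must have published at least one citable item")
    return citations_to_prev_two_years / citable_items_prev_two_years


# Hypothetical journal: 120 citable items published in 2009-2010,
# which together received 300 citations during 2011.
if_2011 = impact_factor(300, 120)
print(if_2011)  # 2.5
```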
2.2.3- Index
A new way of ranking a scientist and evaluating his or her work was proposed by J. E.
Hirsch in 2005, which later became known as the h-index. This index-based method incorporates
both the quality and the quantity of the work produced by the scientist, without complex
mathematical calculation, and derives a single index value. The index uses citations to capture
the quality of published work and the number of publications to capture the quantity of the work
produced at the same time. The Hirsch index is very easy to use and is therefore widely used. A
number of other proposed indices are variations of the h-index.
2.3- Existing Methodologies
Garfield [13] proposed a method to assess the quality of work published by a journal,
known as the Impact Factor (IF). A journal with a higher Impact Factor was considered more
valuable than others. The Impact Factor of a journal is calculated as the average number of
citations received by each published paper during the previous two years. The Impact Factor was
used to rank journals and cannot be applied directly to an individual researcher’s work [1].
Because the IF represents the whole journal, a researcher whose paper in that journal received no
citations is credited with the same IF as researchers whose papers in the same journal received
many citations.
Individuals should be indexed based on the quality and quantity of their own publications and
the citations received by those publications, not by the journal in which they publish. To compare
and evaluate individual research, Hirsch [15] proposed the h-index. Papers are arranged in
descending order of the citations they have received. The h-index is the largest paper rank N
such that the paper at rank N has at least N citations, while all the succeeding
papers have N or fewer citations. The h-index is robust in the sense that it does not punish a
researcher for papers that are not cited relative to those with a high citation rate [4,12].
One lasting limitation of these indexing methods has also been discussed: they cannot be used
to measure the impact of researchers awarded a Nobel Prize for their extraordinary work
[15].
In recent years the h-index has been the most widely practiced index for measuring and assessing
the quality and quantity of the work of individual researchers directly, unlike the IF, which can
measure researcher productivity only indirectly through journal citations. It can be applied to
journal publications as well as articles appearing in different conferences, but the h-index has
been found to be insensitive to certain factors, such as giving more importance to highly cited
papers. Consequently, Egghe [9,10] proposed a new index called the g-index, which gives extra
weight to highly cited papers. If the publications of a scientist are ranked in descending order
of citations, then the g-index is the largest rank g such that the top g publications collectively
received at least g² citations. The g-index calculation resembles that of the h-index and makes
the procedure of ranking scientists more sensitive [8], but as both indices take natural-number
values, they both lack discriminatory power.
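The g-index definition above (the largest rank g such that the top g papers together have at least g² citations) can be sketched in Python as follows. This is a minimal illustration with a hypothetical citation list. It assumes g cannot exceed the number of published papers, which is one common convention.

```python
def g_index(citations):
    """Largest rank g such that the top g papers have at least g*g citations in total."""
    ranked = sorted(citations, reverse=True)
    total = 0
    g = 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites            # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical scientist with one highly cited paper.
sample = [30, 5, 3, 1, 0, 0]
print(g_index(sample))  # 6, whereas the h-index of the same list is 3
```

The single highly cited paper lifts the g-index well above the h-index, which is the extra sensitivity the text describes.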
Both the h-index and the g-index ignore the career length of the researcher, which is discussed by
Burrell [5], and an enhancement to the existing h-index named the m-quotient, which includes
career length, was proposed. In the m-quotient, the h-index value is divided by the number of
years of research activity. Later, Burrell [6] proposed the a-index, arguing that the most prolific
core of a scientist’s output can be expressed as the average number of citations per published
paper in the h-core. Instead of using the arithmetic average to measure the central tendency of
citation distributions, a new method based on the median, named the m-index, was introduced [20],
motivated by the effect of extreme values on the arithmetic average. Another variation of the
g-index and h-index was presented by Kosmulski [17], known as the h(2) index. The calculation of
the h(2) index, like that of the original g-index, adds more sensitivity to the h-index and gives
importance to more highly cited papers. The h(2) index of a scientist is the largest natural number
h(2) such that the h(2) most cited publications received at least (h(2))² citations each [17].
A weakness of the a-index was discussed in [16]: its calculation involves division by the h-index,
which penalizes a good researcher with a higher h-index. Jin et al. [16] addressed this unfair
behavior of the a-index and proposed a new solution in the shape of the r-index. Instead of
dividing by the researcher’s h-index value, the r-index takes the square root of the sum of the
citations of the published papers in Hirsch’s core. Along with the r-index, Jin et al. [16] also
proposed the AR-index, which builds on the r-index. It considers not only the intensity of the
citations of the published articles but also the age of each publication, which makes it more
sensitive: with the passage of time, the index of a scientist can decrease as well as increase.
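The h-core based variants described above can be sketched together in Python. This is an illustrative sketch rather than a reference implementation. The function names and the sample data are hypothetical, and the h(2) function assumes the definition in which each of the h(2) most cited papers has at least (h(2))² citations.

```python
import math
from statistics import median


def h_index(citations):
    """Largest rank h such that the paper at rank h has at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank:
            break
        h = rank
    return h


def h_core(citations):
    """Citation counts of the h most cited papers (the Hirsch core)."""
    return sorted(citations, reverse=True)[:h_index(citations)]


def m_quotient(citations, career_years):
    """h-index divided by the number of years of research activity."""
    return h_index(citations) / career_years


def a_index(citations):
    """Average number of citations of the papers in the h-core."""
    core = h_core(citations)
    return sum(core) / len(core) if core else 0.0


def m_index(citations):
    """Median number of citations of the papers in the h-core."""
    core = h_core(citations)
    return median(core) if core else 0.0


def r_index(citations):
    """Square root of the total citations of the papers in the h-core."""
    return math.sqrt(sum(h_core(citations)))


def h2_index(citations):
    """Largest k such that the k most cited papers have at least k*k citations each."""
    ranked = sorted(citations, reverse=True)
    k = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites < rank * rank:
            break
        k = rank
    return k


sample = [25, 16, 9, 4, 1]   # hypothetical citation counts
print(h_index(sample), a_index(sample), m_index(sample), h2_index(sample))
print(m_quotient(sample, career_years=8), r_index(sample))
```

For this sample the h-core is the four papers with 25, 16, 9 and 4 citations, so the a-index is their mean and the r-index is the square root of their sum, 54.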
New methods have been suggested to complement the existing h-index by removing its weakness of
ignoring details [18]. The idea is to create an h-sequence and an h-matrix for the scientist to
find his or her rank at different time spans of the scientific career, while the original
Hirsch index of that scientist can also be read from the h-sequence and h-matrix. Egghe and
Rousseau [11] proposed the weighted h-index, written as the hw-index. It depends on the number of
citations obtained by the published papers in the Hirsch core. It was presented in both continuous
and discrete settings. It was observed that in the continuous setting the index worked well and
showed good results, while in the discrete setting some deviations from the ideal results were
encountered. Alonso et al. [2] tried to reduce the weaknesses of the h-index and g-index by
merging the properties of both indices into a new index known as the hg-index. The relationship
between journals and researchers has also been discussed, and an indexing criterion that considers
the journal and the scientist at the same time has been proposed [3]. The intuition was that both
entities are interrelated, as highly ranked journals have publications by highly ranked
scientists. In the related work studied so far, no one handles the problem of variation in the
citations of a researcher’s published work, which motivated us to propose the v-index.
2.3.1- H-Index
Jorge E. Hirsch proposed the index known as the h-index [1]. It is used to measure the productivity
and impact of the published work of a scholar. The index uses the set of a scientist's most cited
papers and the number of citations each of these papers has received in other scientists' work. The
Hirsch index can also be used to measure the productivity and impact of the work produced by a team
of scientists. The h-index requires the distribution of citations received by the scientist's
publications. Hirsch defines it as:
A scientist has index h if h of [his/her] Np papers have at least h citations each, and the other
(Np − h) papers have at most h citations each.
Put simply, a scientist with index h has produced h papers, each of which has received at least h
citations from other researchers. The h-index thus covers both the publications and the number of
citations they have received. The index was intended to compare the work of different scientists in
the same field, and it improves on simple measures such as the total number of citations or
publications.
The h-index can be estimated from the total number of citations with the following equation.
h = \sqrt{\frac{N_{cT}}{a}}    (1)
Where N_{cT} is the total number of citations received and a is a proportionality constant that
ranges between 3 and 5. The h-index is widely used as a substitute for the customary impact factor
to evaluate the effort and contribution of a particular scientist. Because the h-index considers
only the most cited papers of an individual scientist, its calculation is simple. The h-index of a
scientist increases as citations accumulate, so it depends on the number of years of the
scientist's career.
Hirsch proposed the h-index to address the main weaknesses of other evaluation indicators, such as
the total number of papers or the total number of citations. The number of papers says nothing about
the quality of the work produced by a scientist, while the total number of citations can be inflated
by one or two very highly cited publications even when the other publications of the same scientist
have few or no citations. The h-index takes into account both the number of citations and the number
of publications at the same time, and it is not affected by a large number of citations to one or
two papers.
Example 1
Scientist A has:
Number of documents published: 12
Total number of citations: 249
Then, with a = 4, the h-index will be
h = \sqrt{\frac{N_{cT}}{a}} = \sqrt{\frac{249}{4}} = 7.889
h = 7.889
where
N_{cT} = total number of citations
a = proportionality constant that ranges between 3 and 5
h = h-index
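The estimate of Eq. 1 and the count implied by Hirsch's definition can be compared with a short sketch. It is only an illustration: the function names are hypothetical, the citation counts are those listed for scientist A in Table 2.1, and a = 4 is an assumed choice within the stated range of 3 to 5.

```python
import math


def h_index_estimate(total_citations, a):
    """Estimate h from the total citation count: h = sqrt(N_cT / a)."""
    return math.sqrt(total_citations / a)


def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Citation counts of scientist A as listed in Table 2.1.
scientist_a = [50, 44, 40, 31, 25, 18, 12, 10, 9, 5, 4, 1]

estimate = h_index_estimate(sum(scientist_a), a=4)  # a = 4 is an assumption
exact = h_index(scientist_a)
print(f"estimate from Eq. 1: {estimate:.2f}")
print(f"h-index by definition: {exact}")
```

For this citation list the definition gives 9, the value used for scientist A in Chapter 3, while Eq. 1 gives an approximation from the total citation count alone.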
2.3.2- G-Index
In 2006 Leo Egghe presented a new solution, based on the existing h-index, to assess the
contribution of a scientist [2]. This solution, known as the g-index, also depends on the number of
citations received by the published work of an individual scientist. The g-index prepares the data
in the same way as the h-index, i.e. documents are ranked and sorted in descending order of their
citation counts. The index is then measured from the distribution of citations received by the
published articles of the given scientist [3].
Given a set of published papers sorted in descending order of the number of citations they have
received, the g-index is the largest number g such that the top g papers collectively have at least
g² citations [3].
The g-index closely resembles the h-index and tries to remove its weaknesses. It makes the procedure
of ranking scientists more sensitive, but because both indices take natural-number values, both lack
discriminatory power.
Example 2
Let scientist A have:
Number of documents published: 12
Total number of citations: 249
The row for rank 15 in the sample data table (Table 2.1) gives the calculated g-index of scientist
A: it is the last rank at which the cumulative citations (249) are at least the squared rank (225).
g-index = 15
Table 2.1: Scientist A's sample citation distribution
Columns: publication rank (PubR), citations (Ci), squared publication rank (PubR²), cumulative citations (∑Ci)
PubR   Ci   PubR²   ∑Ci
1      50   1       50
2      44   4       94
3      40   9       134
4      31   16      165
5      25   25      190
6      18   36      208
7      12   49      220
8      10   64      230
9      9    81      239
10     5    100     244
11     4    121     248
12     1    144     249
13*    0    169     249
14*    0    196     249
15*    0    225     249
16*    0    256     249
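The g-index lookup in Table 2.1 can be sketched in a few lines. This is a minimal illustration, not the thesis implementation: the function name is hypothetical, and treating ranks beyond the last real paper as zero-citation entries is an assumption based on the starred rows of the table.

```python
def g_index(citations):
    """Largest g such that the top g papers together have at least g^2 citations.

    Ranks beyond the last paper count as zero-citation entries
    (an assumption mirroring the starred rows of Table 2.1).
    """
    ranked = sorted(citations, reverse=True)
    g = 0
    cumulative = 0
    rank = 1
    while True:
        if rank <= len(ranked):
            cumulative += ranked[rank - 1]
        if cumulative >= rank * rank:
            g = rank
            rank += 1
        else:
            break
    return g


# Citation counts of scientist A as listed in Table 2.1.
scientist_a = [50, 44, 40, 31, 25, 18, 12, 10, 9, 5, 4, 1]
print(g_index(scientist_a))
```

With these counts the loop stops at rank 16, where the squared rank (256) first exceeds the cumulative total (249), so the function returns 15 as in Example 2.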
2.4- What is a Problem Statement?
The problem statement is the most sensitive part of any research. A well formalized and clearly
stated problem statement shows the worth of a scientific work and so adds value to the proposed
methodology [23]. It describes the existing problem, which helps to define the scope of the
research and gives it direction.
2.5- Consistency
Consistency is a measure of how closely the data items in a data set are dispersed around the mean
value or center. Higher consistency means that the data items in the data set lie close to each
other. In our research, consistency is the basic element in the formulation of both the problem
statement and the proposed solution. In our study of related work we found that no one had
previously considered this basic attribute to evaluate scientists. Our proposed solution holds that
a scientist whose published papers receive citations more consistently is a more contributing
scientist.
2.6- Problem Statement
We have discussed Hirsch's h-index, the g-index and the different variations proposed in recent
years. A detailed study of the h-index and g-index shows that they do not handle the variation in
the number of citations between the work of two or more scientists with the same h-index or g-index
value. For example, suppose two scientists A and B both have an h-index of 20. A has 15 papers with
more than 50 citations each, while B has no paper exceeding 50 citations. This shows that scientist
A's work is more worthy than that of scientist B, but the h-index and g-index are not sensitive to
this situation and cannot take into account the variation in citation counts of two scientists with
the same h-index and g-index values.
We propose a new solution that calculates this variation and then divides the scientist's h-index
or g-index value by it to obtain the final rank of the quality of the scientist's work. Our method
makes the h-index and g-index more sensitive to the variation in citation counts.
2.7- Summary
In this chapter we have discussed in detail the methods of ranking scientific work: impact factor,
average counts and indexes. After that, recent work on ranking and on enhancements or variations of
the h-index was explained. In the related work, the h-index and its variations such as the g-index,
m-quotient, A-index, Ar-index, hg-index and many more were presented with their proposed approach
and weaknesses. We then took the h-index and g-index, solved examples to calculate both, and found
that scientists can resemble each other in both indices, so that the h-index and g-index are unable
to tell the difference between two scientists. This forms the basis of our motivation and of the
formulation of the problem statement. The problem statement makes clear that two scientists who
resemble each other in every other aspect can still differ in the variation of their received
citations. This citation variation factor plays an important role in scientist ranking.
Chapter 3
Methodology
3.1- Methodology
The process of collecting and extracting data for scientific research is called research
methodology. Data may be collected theoretically or practically. In the theoretical way, a concept
is argued through strategic measures that are not verified in practice, whereas in the practical
way real statistical data is used and the idea is proved experimentally or mathematically through
rigorous analysis.
In our research we have used real data of scientists and applied different mathematical operations
to it in a practical implementation to prove our proposed solution.
3.2- Standard Deviation
Standard deviation is used to determine the inconsistency and diversity in a set of data. It
measures the dispersion of the values around the mean. A smaller standard deviation means the data
points lie closer to the mean, whereas a larger standard deviation shows that the values lie far
from the mean [26].
The standard deviation of a dataset is simply the square root of its variance. It is denoted by σ,
and the formula to calculate it is given below.
\sigma = \sqrt{\frac{1}{d} \sum_{i=1}^{n} (X_i - \mu)^2}    (3)
Where i, running from 1 to n, indexes the publications, X_i is the number of citations received by
publication i, µ is the mean value and d is the total number of publications.
Standard deviation can be used to:
• Assess the degree of scattering of the values around their mean.
• Assess the error in the mean of a data set when it is used to estimate the mean of the whole
  population from which the sample was extracted.
• Calculate the probabilities of events occurring in the given data set.
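As a small illustration of Eq. 3, the sketch below computes the standard deviation with the number of publications as the divisor (the population form). The function name is hypothetical, and the input is the twelve citation counts listed for scientist A in this thesis; the standard library function statistics.pstdev is shown alongside as a cross-check.

```python
import math
import statistics


def population_std(values):
    """Square root of the mean squared deviation from the mean (Eq. 3)."""
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    return math.sqrt(variance)


# Citation counts of scientist A (twelve publications).
scientist_a = [50, 44, 40, 31, 25, 18, 12, 10, 9, 5, 4, 1]

sigma = population_std(scientist_a)
print(round(sigma, 2))                            # 16.19
print(round(statistics.pstdev(scientist_a), 2))   # same value from the standard library
```

Dividing by the number of publications rather than by one less is the choice that yields 16.19 for this list.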
3.3- Proposed Method
Here we propose our methods for scientist ranking. Our new indexes are the v-index and the v-index
with time normalization. Both of these indexes are discussed in detail below.
3.3.1- V-Index: Variation Index
In this section our proposed method, named v-index, is given, where 'v' stands for the variation in
the received citations. The case of two scientists A and B, given in Table 3.1 and Table 3.2, is
used to show how variation in the received citations of published papers plays an important role in
differentiating between the work of researchers with the same h-index and g-index values. We deal
here with the situation in which both researchers have the same number of published papers and the
same total number of citations. In this case the h and g indices do not differentiate the quality
of the work produced by the researchers, because they cannot handle citation variation. The v-index
takes the quantity, quality and citation variation of papers into consideration together.
3.3.1.1- Example
In the dataset given in Table 3.1 and Table 3.2, both scientists have the same number of published
articles and the same total number of received citations. The h-index and g-index, shown by the
highlighted rows, are the same for researchers A and B. Publication rank is denoted by PubR,
citations by Ci, the square of the publication rank by PubR² and cumulative citations by ∑Ci.
Table 3.1: Sample data, scientist A
PubR   Ci   PubR²   ∑Ci
1      50   1       50
2      44   4       94
3      40   9       134
4      31   16      165
5      25   25      190
6      18   36      208
7      12   49      220
8      10   64      230
9      9    81      239
10     5    100     244
11     4    121     248
12     1    144     249
13*    0    169     249
14*    0    196     249
15*    0    225     249
16*    0    256     249

Table 3.2: Sample data, scientist B
PubR   Ci    PubR²   ∑Ci
1      108   1       108
2      50    4       158
3      20    9       178
4      14    16      192
5      13    25      205
6      13    36      218
7      11    49      229
8      10    64      239
9      9     81      248
10     1     100     249
11     0     121     249
12     0     144     249
13*    0     169     249
14*    0     196     249
15*    0     225     249
16*    0     256     249

In both tables rank 9 is the h-index row and rank 15 is the g-index row.
Table 3.1 and Table 3.2 show that researchers A and B have the same h and g indexes. Both indexes
are too insensitive to tell the difference between the works of the two scientists, because they do
not consider the variation in the citation distribution. Figure 3.1 shows the citation variation
for the work published by researchers A and B. It clearly shows that the citation variation for
researcher A is smaller than that of researcher B. One can say that researcher A has a more stable
graph of citations or productivity.
To add the citation variation factor to the h, g or similar indexes, standard deviation is used,
which is a common method of measuring variation in data and is given in Eq. 3. Standard deviation
can also be calculated by taking the square root of the variance (footnote 1). Variance is
calculated by taking the arithmetic mean of the squared differences between each value and the
mean.
Footnote 1: http://www.mathsisfun.com/data/standard-deviation.html
Figure 3.1: Comparison between the received citations of author A and author B.
After the citation variation is calculated, the v-index is obtained by simply dividing the
scientist's existing index value, i.e. the h or g index, by the calculated standard deviation; the
time-normalized variant of Section 3.3.2 also divides by the active lifetime of the scientist. The
resulting v-index ranks scientists with all the good features of the h and g indices along with the
consistency of their quality work. In this work the citation variation effect is added only to the
h and g indexes (Eq. 4), but the enhancement is general and can be applied to other existing
indexes as well.
v = \frac{h}{\sigma} or \frac{g}{\sigma}    (4)
Where h and g are the existing indices and σ is the citation variation calculated through standard
deviation.
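The division in Eq. 4 can be sketched for the two sample scientists. This is an illustrative sketch under stated assumptions rather than the thesis application: the function names are hypothetical, the h-based value uses the standard deviation of the twelve listed citation counts, and the g-based value pads the list with zero-citation entries up to rank g. That padding is an assumption, chosen because it yields standard deviations close to those reported for the worked example.

```python
import statistics


def padded(citations, length):
    """Extend the citation list with zeros up to the given length (assumption)."""
    return citations + [0] * max(0, length - len(citations))


def v_index(index_value, citations):
    """Existing index (h or g) divided by the population standard deviation."""
    return index_value / statistics.pstdev(citations)


# Citation counts from Table 3.1 and Table 3.2; both scientists have h = 9, g = 15.
scientist_a = [50, 44, 40, 31, 25, 18, 12, 10, 9, 5, 4, 1]
scientist_b = [108, 50, 20, 14, 13, 13, 11, 10, 9, 1, 0, 0]
h, g = 9, 15

results = {}
for name, cites in (("A", scientist_a), ("B", scientist_b)):
    v_h = v_index(h, cites)
    v_g = v_index(g, padded(cites, g))
    results[name] = (v_h, v_g)
    print(f"scientist {name}: v(h) = {v_h:.3f}, v(g) = {v_g:.3f}")
```

Scientist A obtains the larger value for both the h-based and the g-based v-index, because the same index is divided by a smaller standard deviation.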
3.3.2- Time Normalized V-index
We can improve the results with the time-normalized v-index, another index derived from the simple
v-index. In this index we include the Author Research Age factor to show the efficiency of the
author.
The formula for the time-normalized v-index is:
v_t = \frac{h}{\sigma \cdot t} or \frac{g}{\sigma \cdot t}    (5)
Where t denotes the active lifetime of a scientist, calculated as the difference in years between
the current year and the year of the scientist's first publication.
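The time normalization can be sketched as follows. The function names, the input values and the two years are hypothetical placeholders chosen only to show the arithmetic; rejecting a non-positive lifetime is a design choice of this sketch, not something the thesis specifies.

```python
def active_lifetime(first_publication_year, current_year):
    """Years between the first publication and the current year (t)."""
    return current_year - first_publication_year


def time_normalized_v_index(index_value, sigma, lifetime_years):
    """Existing index divided by both the standard deviation and the lifetime."""
    if sigma <= 0 or lifetime_years <= 0:
        raise ValueError("sigma and lifetime_years must be positive")
    return index_value / (sigma * lifetime_years)


# Placeholder inputs for illustration only.
t = active_lifetime(first_publication_year=2004, current_year=2012)
v_t = time_normalized_v_index(index_value=15, sigma=22.7, lifetime_years=t)
print(t, round(v_t * 100, 2))
```

A scientist whose first publication appeared in the current year would have t = 0, so a real implementation has to decide how to treat that case; this sketch simply raises an error.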
3.3.3- Granularity of results
Most of the time the results are decimal values, which are awkward to present as the index of a
scientist. An index should be a single, easily read value, whereas the current values are too
fine-grained. We can multiply the results by 10, 100, 1000 or more until we reach the desired
magnitude and then round the result to get rid of the decimal point.
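The scaling step can be sketched as below. The function name is hypothetical, and rounding half up is an assumption of this sketch; the thesis only says that the result is rounded. The inputs are the four v-index values quoted for the worked example (0.56, 0.90, 0.30 and 0.55).

```python
from decimal import Decimal, ROUND_HALF_UP


def scale_index(value, factor):
    """Multiply a decimal index by factor and round half up to an integer."""
    scaled = Decimal(str(value)) * Decimal(factor)
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


for v in (0.56, 0.90, 0.30, 0.55):
    print(v, scale_index(v, 10), scale_index(v, 100))
```

Decimal arithmetic is used here so that a value such as 0.55 × 10 is rounded from exactly 5.5 rather than from a binary floating-point approximation.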
3.3.4- Results and Discussions of solved example
Table 3.3 shows the calculated index values of scientists A and B for the h, g and v indexes. The
v-index is noticeably more sensitive than the widely used h and g indices. Neither of the existing
indices considers the consistency of producing good work. It can happen that one researcher
produces a single paper that receives a great number of citations while all of his or her other
publications receive average or low citation counts, whereas a second scientist consistently
produces good work with high numbers of citations. Both scientists have an h-index of 9 and a
g-index of 15, and both of these indices are insensitive to the consistency of publishing highly
cited work. Table 3.3 shows the final results and a comparison of the v-index with the other
indices, together with the calculated standard deviation. The v-index values of our proposed method
are (0.56, 0.90) for scientist A and (0.30, 0.55) for scientist B, for the h and g index values
respectively. The higher v-index values of researcher A compared with researcher B, for both the h
and g indexes, show that scientist A is better than scientist B owing to more consistency in
producing quality work.
Table 3.3: h, g and v indices
Scientist  Index  Value  Standard Deviation (σ)  V-index  Multiple of 10  Multiple of 100
A          h      9      16.19                   0.56     6               56
A          g      15     16.69                   0.90     9               90
B          h      9      29.20                   0.30     3               30
B          g      15     27.41                   0.55     6               55
Since the v-index separates the two researchers in this example, where the existing methods rank
them equally, we assume that it can be applied to any real data for more accurate indexing. We can
also see that multiplying the results by 10 or 100 gives a whole number that can be used as the new
index of the scientist.
3.4- Summary
In this chapter we have discussed the methodology followed to implement the proposed solution.
Variance and standard deviation are the key factors in the v-index calculation, so in the first
part we explained both terms in detail along with the formula to calculate them. The v-index method
was then applied to a worked example. The calculation of the v-index is simple: first calculate the
variance and standard deviation, then divide the existing h or g index value by the calculated
standard deviation. This gives the v-index; multiplying the result by 100 and rounding it gives a
single number that is easy to understand. The factor of 100 defines the granularity and helps put
the results in a human-readable form. The results table shows that scientist A has a better index
ranking than scientist B, which was undetectable by the existing h and g indexes.
Chapter 4
Experiment and Implementation
4.1- Experiment and Implementation
To fulfill the requirements of the thesis I have developed an application to demonstrate the
performance of the variation index. It executes the whole process of calculating the h and g
indices and the variation index and gives the final results. Different experiments are performed
using different criteria. All experiments and simulations are run on a local system using a real
data set of a number of scientists extracted from the Google Scholar database.
4.2- Dataset
There are different databases containing information on scientists and their publications in
journals and conferences. Google Scholar, Scopus and Web of Science (WoS), also known as Web of
Knowledge, provide the facility to calculate the h-index and g-index of a given scientist. All of
these organizations maintain their own databases in the backend, so the publications recorded for
the same author can differ between databases, and so can the resulting h and g indices.
We have used the Google Scholar database to extract the data of scientists by using the PoP
utility. Google Scholar has grown into a huge source of online data in recent years, and it is free
and easy to access.
4.2.1- Publish or Perish Utility
The Publish or Perish utility by Harzing provides the same facility of retrieving an author's
published work and rank on different measuring scales. Publish or Perish uses Google Scholar's
database in the backend. We have used the Publish or Perish (PoP) utility to extract the
information of more than 12 thousand scientists with their respective publications and citation
records. All the data is extracted to a comma-separated values (CSV) data file.
4.2.2- Data Extraction and Preprocessing
All the data is initially in the form of a text file, which is difficult to manipulate, so some
data preprocessing is required to convert the data into the desired form. We have used Microsoft
Access for database creation. The preprocessing step transfers the data from the CSV file into the
appropriate fields of the respective database table.
4.2.3- Database Tables
We have created two database tables in MS Access. Their fields and structure are shown below. The
table named "Dataset" is the main table; it contains information about each scientist and his or
her publications with the citations received. We use the data in this table to calculate the
indices of the scientists.
Table 4.1: Dataset table
Dataset
Number
Citations
Author 1
Author 2
Author 3
Author 4
Paper_Title
Year
Pub_Type
Publisher
Publisher_link
Paper_Link
The fields in the above table store the following data.
Number: An auto-generated sequence number.
Citations: The citations received by the respective publication.
Author1: The name of the first author.
Author2: The name of the second author, if any.
Author3: The name of the third author, if any.
Author4: The name of the fourth author, if any.
Paper_Title: The title of the publication.
Year: The year of publication.
Pub_Type: The publication type, either journal or conference.
Publisher: The name of the publisher.
Publisher_link: A link to the publisher's web site.
Paper_Link: A link to the online view of the paper.
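The field list above can be illustrated with a small loading sketch. It is not the thesis implementation: SQLite stands in for MS Access, the CSV header is assumed to match the field names exactly (the real Publish or Perish export layout is not given here), the two sample rows are placeholders rather than real data, and the names SCHEMA, COLUMNS and load_csv are hypothetical.

```python
import csv
import io
import sqlite3

# Schema mirroring the fields of Table 4.1 (SQLite used as a stand-in for MS Access).
SCHEMA = """
CREATE TABLE Dataset (
    Number INTEGER PRIMARY KEY AUTOINCREMENT,
    Citations INTEGER,
    Author1 TEXT, Author2 TEXT, Author3 TEXT, Author4 TEXT,
    Paper_Title TEXT, Year INTEGER, Pub_Type TEXT,
    Publisher TEXT, Publisher_link TEXT, Paper_Link TEXT
)
"""

COLUMNS = [
    "Citations", "Author1", "Author2", "Author3", "Author4",
    "Paper_Title", "Year", "Pub_Type", "Publisher",
    "Publisher_link", "Paper_Link",
]


def load_csv(conn, csv_file):
    """Insert every CSV row into Dataset; empty cells become NULL."""
    reader = csv.DictReader(csv_file)
    placeholders = ", ".join("?" for _ in COLUMNS)
    sql = f"INSERT INTO Dataset ({', '.join(COLUMNS)}) VALUES ({placeholders})"
    rows = [[row.get(col) or None for col in COLUMNS] for row in reader]
    conn.executemany(sql, rows)
    conn.commit()
    return len(rows)


# Placeholder rows for illustration only (not taken from the real data set).
sample = io.StringIO(
    ",".join(COLUMNS) + "\n"
    "12,Author One,Author Two,,,Placeholder title A,2009,journal,Placeholder Press,,\n"
    "3,Author One,,,,Placeholder title B,2011,conference,Placeholder Press,,\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
loaded = load_csv(conn, sample)
summary = conn.execute(
    "SELECT Author1, COUNT(*), SUM(Citations) FROM Dataset GROUP BY Author1"
).fetchone()
print(loaded, summary)
```

Grouping by the first author, as in the final query, is one simple way to collect the per-scientist citation counts that the index calculations need.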
The second table in our database, named "Indextable", contains the calculated index information.
The data in this table is calculated at runtime during execution of the application. It is used to
show the results and the name of the best scientist during the results and analysis phase. The
structure of "Indextable" is shown below.
Table 4.2: Index table
Indextable
Publications
Scientist_name
H_index
G_index
Variance
Standard_deviation
V_index_hindex
V_index_gindex
Vindex_h_time
Vindex_g_time
Author_age
The fields in the above table store the following data.
Publications: The total number of publications of the respective scientist.
Scientist_Name: The name of the scientist.
H_Index: The calculated h-index of the respective scientist.
G_Index: The calculated g-index of the respective scientist.
Variance: The variance of the citations received by the respective scientist.
Standard_Deviation: The standard deviation of the received citations.
V_index_hindex: The calculated variation index for the h-index value of the scientist.
V_index_gindex: The calculated variation index for the g-index value of the scientist.
Vindex_h_time: The time-normalized result for the h-index value of the scientist.
Vindex_g_time: The time-normalized result for the g-index value of the scientist.
Author_age: The active Author Research Age of the scientist.
4.3- Development Tool and Programming language
Different development tools are available. To simulate this project I have used the Microsoft
application development tool, i.e. the Visual Studio 2008 integrated environment, with C# as the
programming language.
4.3.1- Visual Studio 2008
Microsoft Visual Studio provides an integrated development environment (IDE) for developers. Users
can develop console applications and GUI-based applications such as Windows Forms applications, web
applications and services. Visual Studio supports a number of programming languages through
language services: the code editor and debugger, with the help of the Common Language Runtime
(CLR), can support almost any programming language, provided the corresponding language-specific
service is present and installed. The built-in languages include C, C++, Visual C++, VB.net,
C#.net, J#, F# etc. [22].
4.3.2- C#
Visual C# is designed for developing different types of applications in the .NET framework
environment. C# is simple, versatile, powerful, type-safe and object-oriented. Its innovations
enable rapid application development while keeping the clarity and style of the older C language.
I have coded the entire application in C#.
4.4- Application
Our application is a desktop application containing different forms, developed in Visual Studio
2008 using the C# language. It is integrated with the database tables created in MS Access. Two
forms are designed. The first form, named "V-index Simulation", is shown on launching the
application. This form is used to load data from the database and to view the data of an individual
scientist selected from the scientist list. The second form, named "Index Form", becomes visible
when the user clicks the "Calculate Index" button on the first form. On loading the second form,
all the required calculations are made in the backend. This form is used to compare the scientists
that fulfil the filtering criteria, and the result for the best scientist is shown on the form.
4.4.1- Screen shots and Descriptions
Figure 4.1 shows the main form, which appears when the application is launched. This form is used
to load the database and view the list of scientists. The user can select a scientist to view his
or her publications and the relevant information on each publication. The user can filter the
scientist list in two ways, i.e. by number of publications and by number of citations. The
Calculate Index button calculates the h, g and v indices of all the scientists in the list.
Figure 4.1: Screen shot V-index simulation form.
When the user clicks the Calculate Index button on the first form, a new form opens. This form,
shown in Figure 4.2, is used to filter the scientists according to their indices. The user can
select scientists by their h and g indices, separately or together. When the user clicks the Apply
button after defining the filter criteria in the left pane, the details of all scientists meeting
the selection criteria appear, with the calculated variance, standard deviation and v-index.
Meanwhile, the details of the most consistent scientist are shown in the bottom left pane, where
the user can see the scientist's name and v-index.
Figure 4.2: Screen shot Index calculation form
4.5- Summary
In this chapter we have discussed the tools and technologies used during the implementation of the
project. The dataset was generated from Google Scholar by using the freely available Publish or
Perish (PoP) utility. We have generated data of more than 10000 scientists with more than 16000
publications. The whole data set was extracted to a .csv file, which was converted into an MS
Access database table. Another table, "Indextable", was created for the calculated results. We have
used C#.NET and ADO to develop the UI of our implementation and the database connection. Screen
shots of the developed project are included.
Chapter 5
Results and Analysis
5.1- Results and Analysis
In this chapter we discuss in detail the results generated by our proposed v-index using the real
data of scientists extracted from Google Scholar. To analyze the results properly we have defined
three conditions under which two or more scientists may resemble each other and stand at the same
rank, a situation to which the existing h and g indices are insensitive. We then used our proposed
v-index method to distinguish between the scientists on the basis of the consistency of their
quality work. After applying our method we are able to rank the scientists more appropriately.
5.2- Scientists with same h-index
In this section we consider the case in which two or more scientists have the same h-index value.
Using our data set we found that the seven scientists listed in Table 5.1 all have an h-index of
15. This is our first problem, which we have solved by using the v-index to distinguish between
these scientists and rank them properly based on the consistency of their good work, in the sense
of the citations received by their publications.
The table gives the citation distribution over the publications of each scientist, along with the
calculated h-index, which is highlighted in row 15. Figure 5.1 charts these values. It shows a
clear difference in the citation distribution of the individual scientists, which is undetectable
by the h-index.
Table 5.1: Scientists with same h-index 15 (citations per publication rank)
PubR  MR Azimi-Sadjadi  F Hirsch  TF Syeda-Mahmood  PK Bhattacharya  M Bhattacharya  DV Gupta  AK Aziz
1     83                100       147               132              375             351       770
2     80                71        111               90               54              195       463
3     69                45        65                73               52              105       201
4     57                43        49                72               36              90        101
5     50                40        49                68               27              87        50
6     30                38        42                57               24              74        33
7     27                37        41                50               22              63        31
8     26                36        35                38               22              51        31
9     23                32        31                34               18              50        29
10    22                31        29                32               18              29        25
11    20                25        26                21               18              28        24
12    19                18        18                19               17              22        20
13    18                16        17                15               15              20        18
14    16                16        15                15               15              15        15
15    15                15        15                15               15              15        15
16    14                12        14                10               13              13        14
17    14                11        14                10               12              13        13
18    14                10        13                9                12              10        13
19    14                7         12                9                12              9         13
20    13                6         11                8                11              8         12
21    13                6         10                8                10              7         11
22    12                5         9                 8                10              4         11
23    0                 5         8                 8                9               2         10
24    0                 4         6                 7                9               1         9
Figure 5.1: Chart of scientists with h-index 15
We have applied our v-index method to rank the scientists given in Table 5.1. The results of the
experiment are given in Table 5.2, and it is clear that the v-index has detected the difference
between the contributions of the scientists on the basis of the consistency of their received
citations.
Table 5.2: Result of scientists with h-index 15
Scientist Name    h-index  Standard Deviation (σ)  V-index  V-index Time Normalized  Author Research Age (yrs)
TF Syeda-Mahmood  15       30.6165221501631        49       2                        21
F Hirsch          15       22.696365347782         66       8                        8
MR Azimi-Sadjadi  15       17.8922702118354        84       3                        26
PK Bhattacharya   15       32.2897042414451        47       1                        49
M Bhattacharya    15       62.5699608438426        24       1                        38
DV Gupta          15       72.1920331835336        21       1                        36
AK Aziz           15       145.03648966151         10       0                        51
The highlighted rows in the results table mark the most consistent scientists, who are therefore
ranked highest among the scientists listed.
The table also shows that the time normalization factor in the v-index calculation affects the
ranking. With the simple v-index calculation MR Azimi-Sadjadi is ranked highest, but when time
normalization is applied F Hirsch moves to the top of the index table, which shows that the time
factor has a major impact on scientist ranking.
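The change of ranking described above can be reproduced with a short sketch. It uses the h-index, standard deviation and research age reported in Table 5.2 for three of the scientists; the variable names, the factor of 100 and the use of unrounded intermediate values are choices of this sketch, not details confirmed by the thesis.

```python
# (name, h-index, standard deviation, research age in years) from Table 5.2.
rows = [
    ("TF Syeda-Mahmood", 15, 30.6165221501631, 21),
    ("F Hirsch", 15, 22.696365347782, 8),
    ("MR Azimi-Sadjadi", 15, 17.8922702118354, 26),
]

scored = []
for name, h, sigma, age in rows:
    v = 100 * h / sigma   # v-index scaled by 100
    v_time = v / age      # time-normalized v-index
    scored.append((name, v, v_time))

by_v = sorted(scored, key=lambda row: row[1], reverse=True)
by_v_time = sorted(scored, key=lambda row: row[2], reverse=True)

print("by v-index:        ", [name for name, _, _ in by_v])
print("by time-normalized:", [name for name, _, _ in by_v_time])
```

Sorting by the plain v-index puts MR Azimi-Sadjadi first, while sorting by the time-normalized value puts F Hirsch first, in line with the discussion above.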
In our second set we found 4 scientists with the same h-index value of 16. We applied the same
process as for the scientists of Table 5.1.
Table 5.3: Scientists with h-index 16 (citations per publication rank)
PubR  M Heinz  R Bhattacharya  SY Chung  M Khalid
1     135      110             74        151
2     127      49              74        61
3     81       44              67        50
4     67       41              42        43
5     65       40              42        38
6     56       35              37        35
7     41       30              37        33
8     35       26              30        29
9     32       26              29        22
10    29       21              19        21
11    26       20              19        19
12    22       20              18        18
13    22       19              18        18
14    20       17              17        17
15    18       16              17        16
16    17       16              16        16
17    15       14              15        14
18    13       12              9         13
19    13       12              9         13
20    12       11              8         13
21    11       10              8         12
22    10       10              7         11
The chart in Figure 5.2 is drawn using the values given in Table 5.3. This chart also shows the
difference between the scientists.
Figure 5.2: Chart of scientists with h-index 16
We applied our proposed index (v-index); the results of the experiment are shown in Table 5.4. The
values in the result table show that M Khalid is the most contributing scientist of the group, so
with the normal v-index calculation he is placed at the top, highlighted in the table. The result
table shows that the standard deviation plays the key role in ranking the scientists: a lower
standard deviation gives a scientist a better ranking.
Table 5.4: Result of scientists with h-index 16
Scientist Name   h-index  Standard Deviation (σ)  V-index  V-index Time Normalized  Author Research Age (yrs)
M Khalid         16       19.0702733246098        84       3                        32
SY Chung         16       19.631215331319         82       14                       6
R Bhattacharya   16       20.3232498080061        79       3                        27
M Heinz          16       32.6375875519641        49       12                       4
When the time-normalized v-index is introduced, the ranking changes again: SY Chung is now
placed above M Khalid, and even M Heinz moves above M Khalid.
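The effect of time normalization on this set can be reproduced with a few lines of Python. This is a minimal sketch under two assumptions that are inferred from the values in Table 5.4 rather than stated in this chapter: the v-index is taken as the h-index divided by the standard deviation, scaled by 100 and rounded, and the time-normalized value is the v-index divided by the author's research age, rounded. The function names are hypothetical.

```python
def v_index(index_value, sigma, scale=100):
    """Index value divided by the citation standard deviation (scaled, rounded)."""
    return round(scale * index_value / sigma)


def time_normalized(v, research_age_years):
    """v-index divided by the author's research age in years (rounded)."""
    return round(v / research_age_years)


# (name, h-index, standard deviation, research age) as reported in Table 5.4.
rows = [
    ("M Khalid", 16, 19.0702733246098, 32),
    ("SY Chung", 16, 19.631215331319, 6),
    ("R Bhattacharya", 16, 20.3232498080061, 27),
    ("M Heinz", 16, 32.6375875519641, 4),
]

results = []
for name, h, sigma, age in rows:
    v = v_index(h, sigma)
    results.append((name, v, time_normalized(v, age)))

# Rank by plain v-index, then by time-normalized v-index (highest first).
by_v = [name for name, v, _ in sorted(results, key=lambda r: -r[1])]
by_time = [name for name, _, vt in sorted(results, key=lambda r: -r[2])]

print(results)
print(by_v)
print(by_time)
```

Under these assumptions the sketch gives the v-index and time-normalized values listed in Table 5.4, and the two orderings differ: M Khalid leads on the plain v-index, while the much shorter research ages of SY Chung and M Heinz lift them above M Khalid after normalization.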
5.3- Scientists with same g-index
In our second case we found multiple scientists with the same g-index value. In this
section we extract the data of 5 scientists with a g-index of 25 and then apply our v-index
method to rank them properly. Table 5.5 lists, for each scientist, the citations per publication
and their cumulative sum. The highlighted row in the table marks the g-index value.
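The PubR² and ∑Ci columns of Table 5.5 follow the usual g-index construction: the g-index is the largest rank g at which the cumulative citation count of the top g publications is at least g². A minimal Python sketch of that check, using the citation column listed for L Egghe in Table 5.5; the function name `g_index` is an illustrative choice.

```python
from itertools import accumulate


def g_index(citations):
    """Largest g such that the top g papers together have at least g*g citations."""
    ranked = sorted(citations, reverse=True)
    g = 0
    for rank, total in enumerate(accumulate(ranked), start=1):
        if total >= rank * rank:
            g = rank
    return g


# Citations per publication listed for L Egghe in Table 5.5.
egghe = [75, 44, 43, 41, 32, 30, 27, 26, 25, 24, 23, 21, 21, 21,
         19, 19, 19, 17, 17, 16, 15, 15, 15, 13, 13, 13, 13, 12]

print(g_index(egghe))
```

At rank 25 the cumulative sum (631) still reaches 25² = 625, while at rank 26 it (644) falls short of 26² = 676, so the sketch returns 25.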
Table 5.5: Scientists with g-index value 25
(Cit = citations of the publication at rank PubR; ∑Ci = cumulative citations up to that rank;
the row PubR = 25 corresponds to the g-index)
              L Egghe          SY Chung         R Bhattacharya   T Syeda-Mahmood  M Mahmood
PubR  PubR²   Cit  ∑Ci         Cit  ∑Ci         Cit  ∑Ci         Cit  ∑Ci         Cit  ∑Ci
1     1       75   75          74   74          110  110         92   92          126  126
2     4       44   119         74   148         49   159         60   152         83   209
3     9       43   162         67   215         44   203         58   210         73   282
4     16      41   203         42   257         41   244         57   267         70   352
5     25      32   235         42   299         40   284         56   323         46   398
6     36      30   265         37   336         35   319         49   372         40   438
7     49      27   292         37   373         30   349         39   411         39   477
8     64      26   318         30   403         26   375         23   434         38   515
9     81      25   343         29   432         26   401         20   454         22   537
10    100     24   367         19   451         21   422         17   471         21   558
11    121     23   390         19   470         20   442         16   487         11   569
12    144     21   411         18   488         20   462         15   502         10   579
13    169     21   432         18   506         19   481         14   516         8    587
14    196     21   453         17   523         17   498         13   529         8    595
15    225     19   472         17   540         16   514         11   540         8    603
16    256     19   491         15   555         15   529         11   551         6    609
17    289     19   510         15   570         14   543         11   562         5    614
18    324     17   527         9    579         12   555         10   572         4    618
19    361     17   544         9    588         12   567         10   582         4    622
20    400     16   560         8    596         11   578         10   592         3    625
21    441     15   575         8    604         10   588         9    601         3    628
22    484     15   590         7    611         10   598         9    610         3    631
23    529     15   605         7    618         10   608         7    617         3    634
24    576     13   618         6    624         9    617         7    624         2    636
25    625     13   631         6    630         9    626         7    631         2    638
26    676     13   644         5    635         9    635         6    637         2    640
27    729     13   657         5    640         9    644         6    643         2    642
28    784     12   669         4    644         7    651         5    648         2    644
Figure 5.3 shows the citation distribution over the publications of these scientists. The chart
is drawn from the values given in Table 5.5 and clearly shows how differently and how
inconsistently the citations of the scientists are distributed. This inconsistency is not
captured by the existing g-index. To address this weakness of the g-index we applied our
proposed v-index to evaluate the work of the scientists.
The results of the experiment are shown in Table 5.6. The proposed v-index exposes
differences between the scientists that the g-index was unable to detect, and Table 5.6 gives
the resulting new ranking. Previous methods do not consider the consistency of received
citations; the experiment shows that consistency plays a major role in ranking the scientists.
Figure 5.3: Chart of scientists with g-index 25
The highlighted row marks the most consistent scientist in the set.
Table 5.6: Result of scientists with g-index 25
Scientist Name    g-index  Standard Deviation (σ)  V-index  V-index Time Normalized  Author Research Age (yrs)
L Egghe           25       11.2829554969327        222      6                        35
SY Chung          25       19.631215331319         127      21                       6
R Bhattacharya    25       20.3232498080061        123      5                        27
T Syeda-Mahmood   25       20.4859566192221        122      6                        20
M Mahmood         25       29.292154291385         85       3                        27
The table above shows that the plain v-index and the time-normalized v-index produce different
rankings. L Egghe is ranked highest by the plain v-index, while with time normalization SY Chung
is placed at the top.
Our dataset contains another set of scientists with the same g-index: two scientists with a
g-index value of 21. Table 5.7 shows the cumulative sum of the received citations of each
scientist, with the g-index value in the highlighted row. We applied our proposed method to
this set to find the scientist with the most consistent record of received citations.
Table 5.7: Scientists with g-index value 21
(Cit = citations of the publication at rank PubR; ∑Ci = cumulative citations up to that rank;
the row PubR = 21 corresponds to the g-index)
              BB Bhattacharya  P Dev
PubR  PubR²   Cit  ∑Ci         Cit  ∑Ci
1     1       65   65          108  108
2     4       62   127         37   145
3     9       41   168         37   182
4     16      27   195         32   214
5     25      26   221         25   239
6     36      23   244         23   262
7     49      18   262         22   284
8     64      17   279         20   304
9     81      17   296         19   323
10    100     16   312         18   341
11    121     16   328         14   355
12    144     15   343         13   368
13    169     14   357         13   381
14    196     14   371         12   393
15    225     13   384         12   405
16    256     13   397         10   415
17    289     13   410         7    422
18    324     13   423         7    429
19    361     13   436         7    436
20    400     12   448         7    443
21    441     12   460         7    450
22    484     12   472         6    456
23    529     12   484         6    462
24    576     12   496         6    468
25    625     12   508         5    473
Figure 5.4 gives a graphical representation of the data in Table 5.7. The chart clearly shows
that BB Bhattacharya is more consistent than P Dev, whereas the existing g-index places both
scientists at the same rank and ignores the consistency of received citations. With the v-index
we can identify the more consistently contributing scientist of the two.
Figure 5.4: Chart of scientists with g-index 21.
Table 5.8 shows the calculated results, which confirm that BB Bhattacharya is more consistent
than P Dev. The proposed v-index therefore ranks BB Bhattacharya higher, whereas the existing
g-index could not distinguish the two scientists and gives both the same rank. The highlighted
row marks the more consistent scientist in both cases, plain and time-normalized.
Table 5.8: Result of scientists with g-index 21
Scientist Name    g-index  Standard Deviation (σ)  V-index  V-index Time Normalized  Author Research Age (yrs)
P Dev             21       14.7586212570298        142      3                        42
BB Bhattacharya   21       12.3199223944341        171      6                        27
5.4- Scientists with same h and g-index
This is the third and last case, which considers the rare situation in which two or more
scientists have the same h-index and the same g-index. Fortunately, our dataset contains such a
case: two scientists, A Dev and DP Chakraborty, both have an h-index of 7 and a g-index of 15.
This is the best real-data example in which the two existing indices, even taken together,
cannot rank the scientists, whereas applying our v-index method on the basis of consistency
reveals a rank difference between them. Table 5.9 shows the citation distribution of both
scientists, with the rows that determine the h and g indices highlighted.
Table 5.9: Scientists with same h and g indices
(Cit = citations of the publication at rank PubR; ∑Ci = cumulative citations up to that rank;
the rows PubR = 7 and PubR = 15 correspond to the h-index and the g-index)
              A Dev            DP Chakraborty
PubR  PubR²   Cit  ∑Ci         Cit  ∑Ci
1     1       57   57          138  138
2     4       45   102         26   164
3     9       41   143         19   183
4     16      22   165         9    192
5     25      21   186         8    200
6     36      9    195         7    207
7     49      9    204         7    213
8     64      6    210         5    218
9     81      5    215         4    222
10    100     5    220         4    226
11    121     5    225         3    229
12    144     5    230         2    231
13    169     5    235         2    233
14    196     4    239         1    234
15    225     4    243         1    235
16    256     4    247         1    236
17    289     3    250         1    237
18    324     3    253         1    238
19    361     2    255         0    238
20    400     2    257         0    238
Figure 5.5 shows the received citations against the publications of both scientists. The chart
shows the inconsistency of received citations and the difference between the two scientists,
which the existing methods, even collectively, were unable to detect. The applied methodology
effectively identifies the more consistent scientist.
Table 5.10 shows the results after processing the data of both scientists with the v-index
methodology. There is a clear difference between the scientists with respect to consistency in
received citations. In this case the v-index outperforms the existing h and g indices.
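The tie described in this section, and the way the v-index breaks it, can be illustrated as follows. The sketch recomputes the h and g indices from the citation columns of Table 5.9 and then divides them by the standard deviations reported in Table 5.10. The standard deviations are taken as reported rather than recomputed, and the scaling by 100 with rounding is an assumption inferred from the tabulated values; all function names are hypothetical.

```python
from itertools import accumulate


def h_index(citations):
    """Number of ranks at which the citation count is at least the rank."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Number of ranks at which the cumulative citations reach rank squared."""
    ranked = sorted(citations, reverse=True)
    totals = accumulate(ranked)
    return sum(1 for rank, t in enumerate(totals, start=1) if t >= rank * rank)


def v_index(index_value, sigma, scale=100):
    """Assumed form: index divided by standard deviation, scaled and rounded."""
    return round(scale * index_value / sigma)


# Citation columns from Table 5.9.
a_dev = [57, 45, 41, 22, 21, 9, 9, 6, 5, 5, 5, 5, 5, 4, 4, 4, 3, 3, 2, 2]
dp_chakraborty = [138, 26, 19, 9, 8, 7, 7, 5, 4, 4, 3, 2, 2, 1, 1, 1, 1, 1, 0, 0]

# Standard deviations as reported in Table 5.10 (not recomputed here).
sigma = {"A Dev": 13.1708854000869, "DP Chakraborty": 25.6166798339341}

summary = {}
for name, cites in [("A Dev", a_dev), ("DP Chakraborty", dp_chakraborty)]:
    h, g = h_index(cites), g_index(cites)
    summary[name] = {
        "h": h,
        "g": g,
        "v_h": v_index(h, sigma[name]),
        "v_g": v_index(g, sigma[name]),
    }

print(summary)
```

Both scientists come out with h = 7 and g = 15, yet the v-index values differ by roughly a factor of two because DP Chakraborty's citations are concentrated in a single highly cited publication.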
Figure 5.5: Chart of scientists with same h and g indices
Table 5.10: Results of scientists with same h and g indices
Scientist        Index  Value  Standard Deviation (σ)  V-index  V-index Time Normalized  Author Research Age (yrs)
A Dev            h      7      13.1708854000869        53       2                        23
A Dev            g      15     13.1708854000869        114      5                        23
DP Chakraborty   h      7      25.6166798339341        27       1                        23
DP Chakraborty   g      15     25.6166798339341        59       3                        23
5.5- Summary
In this chapter the calculated results have been analysed. We considered three cases and
calculated both the v-index and the time-normalized v-index on the dataset. In the first case,
scientists with the same h-index were considered, and both the v-index and the time-normalized
v-index differentiated and ranked the scientists in terms of consistency. Likewise, in the other
two cases, scientists with the same g-index and scientists with the same h and g indices were
considered, and both versions of the v-index, with and without time normalization, were
applied. The analysis shows that the time normalization factor has a major impact on the
ranking: it changes the order of the scientists, as can be seen from the highlighted rows in the
result tables.
Chapter 6
Research Contributions and Conclusions
6.1 Research Contribution
The major contributions of this work are (1) highlighting the importance of consistent
citations across a researcher's publications for productivity indexing, and (2) proposing a
simple method for calculating the variation among the citations received by the papers of a
researcher. To the best of our knowledge, this is the first work that considers the citation
variation of papers for researcher productivity indexing.
6.1.1 Productivity and Efficiency with Consistency
Highlighting the importance of consistency in received citations is the major
contribution of our research work. We have shown how consistency plays an important role in
ranking scientists and in assessing the productivity and efficiency of their published work.
Combining the research age of a scientist with the v-index additionally favours scientists who
reached their index values in a shorter time.
6.1.2 Proposal of Simple Method
In our research we have proposed a simple method for calculating a ranking index for a
scientist's work. It is easy to apply to any dataset and can be combined with any existing
index. Its main purpose is to add a consistency and efficiency factor to an existing index to
obtain a new index value. We have demonstrated the method by applying it to the existing h and
g indices.
6.2 Conclusions
Existing methods for indexing researchers or groups do not consider the very important
factor of consistent productivity. Adding a consistent productivity factor in terms of the
citation variation of a researcher's papers is novel, on the premise that a researcher with a
more consistent citation record is more productive. The idea of consistent productivity is quite
general and can be applied to any existing researcher productivity index by simply dividing its
value by the citation variation, measured by the standard deviation, as we did here for the h
and g index values. Time normalization, like the m-quotient for research career length [6], can
easily be performed for citation variation by considering the research age of an author or
scientist. Including this time factor improved the results and revealed the more efficient
scientists, those who achieve a given ranking in a shorter period of time.
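The generality claimed above can be sketched as a small helper that divides any index value by the standard deviation of the citation counts. This is an illustrative sketch, not the thesis implementation: the use of the population standard deviation (`statistics.pstdev`), the scaling by 100, the helper names, and the two citation records are all assumptions made for the example.

```python
from statistics import pstdev


def h_index(citations):
    """Number of ranks at which the citation count is at least the rank."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def consistency_adjusted(index_value, citations, scale=100):
    """Divide an index value by the standard deviation of the citations."""
    sigma = pstdev(citations)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("all publications have identical citation counts")
    return scale * index_value / sigma


# Hypothetical citation records: same h-index, different spread.
steady = [10, 9, 8, 8, 7, 6, 6, 5]
spiky = [60, 9, 8, 7, 6, 6, 5, 5]

v_steady = consistency_adjusted(h_index(steady), steady)
v_spiky = consistency_adjusted(h_index(spiky), spiky)
print(h_index(steady), h_index(spiky), v_steady > v_spiky)
```

Both hypothetical records have an h-index of 6, but the record without a single dominant publication obtains the larger adjusted value. The same helper accepts a g-index or any other index value in place of the h-index.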
The method can be improved by considering the year-wise citations of each publication. This
is an open enhancement: ranking a scientist over the course of a career by the citations that a
specific publication receives each year. It would show the quality of a scientist's
contributions if a publication keeps receiving more and more citations every year.
References
[1] Adler, R., Ewing, J. and Taylor, P. (2008) “Citation Statistics” A report from the
International Mathematical Union (IMU).
[2] Alonso, S., Cabrerizo, F.J., Herrera-Viedma, E. and Herrera, F. (2010) “hg-index: A new
index to characterize the scientific output of researchers based on the h- and g- indices.”
Scientometrics, Vol. 82(2), pp. 391-400.
[3] Bouyssou, D. and Marchant, T. (2010) “Consistent bibliometric rankings of authors and of
journals.” Journal of Informetrics, Vol. 4, pp. 365-378.
[4] Braun, T., Glänzel, W. and Schubert, A. (2006) “A Hirsch-type index for journals.” Jointly
published by Akadémiai Kiadó, Budapest Scientometrics, and Springer, Dordrecht., Vol.
69(1), pp. 169–173.
[5] Burrell, Q.L. (2007) “Hirsch’s h-index: a stochastic model.” Journal of Informetrics, Vol.
1(1), pp. 16–25.
[6] Burrell, Q.L. (2007) “On the h-index, the size of the Hirsch core and Jin’s A-index.” Journal
of Informetrics, Vol. 1(2), pp. 170–177.
[7] Cabrerizo, F.J., Alonso, S., Herrera-Viedma, E. and Herrera, F. (2009) “q2-Index:
Quantitative and Qualitative Evaluation Based on the Number and Impact of Papers in the
Hirsch Core.” Journal of Informetrics, Vol. 4(1), pp. 23-28.
[8] Costas, R. and Bordons, M. (2008) “Is g-index better than h-index? An exploratory study at
the individual level”, Jointly published by Akadémiai Kiadó, Budapest and Springer,
Dordrecht, Scientometrics, Vol. 77(2), pp. 267–288.
[9] Egghe, L. (2006) “Theory and Practice of the g-index.” Jointly published by Akadémiai
Kiadó, Budapest and Springer, Dordrecht; Scientometrics, Vol. 69(1), pp. 131–152.
[10] Egghe, L. (2006) “An Improvement to H-index: The G-index”. ISSI News-Letter, Vol. 2(1)
pp. 8-9.
[11] Egghe, L. and Rousseau, R. (2008) “An h-index weighted by citation impact.” Information
Processing and Management, Vol. 44(2), pp. 770-780.
[12] Egghe, L. and Rousseau, R. (2006) “An informetric model for the Hirsch-index” Jointly
published by Akadémiai Kiadó, Budapest and Springer, Dordrecht. Scientometrics, Vol.
69(1), pp.121–129.
[13] Garfield, E. (2001) “Impact factors, and why they won’t go away.” Nature, Vol. 411(6837),
pp. 522–522.
[14] Glänzel, W. (2006) “On the h-index – a mathematical approach to a new measure of
publication activity and citation impact.” Scientometrics, 67, pp. 315-321.
[15] Hirsch, J. E. (2005) “An index to quantify an individual's scientific research output.”
Proceedings of the National Academy of Sciences of the United States of America, Vol. 102,
pp. 16569–16572.
[16] Jin, B.h., Liang, L.M., Rousseau, R. and Egghe, L. (2007) “The R- and AR- indices:
Complementing the h-index.”, Chinese Science Bulletin, Vol. 52(6), pp. 855-863.
[17] Kosmulski, M. (2006) “A new hirsch-type index saves time and works equally well as the
original h-index”. International Society for Scientometrics and Informetrics (ISSI).
[18] Liang, L. (2006) “h-index sequence and h-index matrix: Constructions and applications.”
Jointly published by Akadémiai Kiadó, Budapest and Springer, Dordrecht, Scientometrics,
Vol. 69(1), pp. 153–159.
[19] Rousseau, R. (2006) “Simple models and the corresponding h and g-indexes.”
http://hdl.handle.net/1942/944.
[20] Sidiropoulos, A., Katsaros, D. and Manolopoulos, Y. (2007) “Generalized Hirsch h-index
for disclosing latent facts in citation networks” Jointly published by Akadémiai Kiadó,
Budapest and Springer, Dordrecht Scientometrics, Vol. 72(2), pp. 253–280.
[21] M Jagdesh Kumar. (2011) “Evaluating Scientists: Citations, Impact Factor, h-Index, Online
Page Hits and What Else?” http://mamidala.wordpress.com/2011/07/10/25/
[22] Visual Studio 2008, http://en.wikipedia.org/wiki/Microsoft_Visual_Studio.
[23] Editorial Science Direct. (2007) “What is a Problem statement”. Library & Information
Science Research 29, pp.307–309.
[24] Garfield, E. (1955) “Citation Indexes for Science: A New Dimension in Documentation through
Association of Ideas.” Science, Vol. 122(3159), pp. 108-111.
[25] Garfield, E. (1973) “Citation Frequency as a Measure of Research Activity and Performance”.
Published in Essays of an Information Scientist Vol.1, pp. 406-408
[26] Article on Standard deviation in Wikipedia. http://en.wikipedia.org/wiki/Standard_deviation
[27] Article on standard deviation and Variance. http://www.managers-net.com/stddev.html