Author Productivity Indexing via Topic Sensitive
Weighted Citations
Submitted By:
Shabnam Bibi
589-FBAS/MSCS/F09
Supervised By:
Dr. Ali Daud
Assistant Professor
Department of Computer Science and Software Engineering
Faculty of Basic and Applied Sciences
International Islamic University, Islamabad
Author Productivity Indexing via Topic Sensitive
Weighted Citations
Dissertation submitted in partial fulfillment of requirements
for the award of degree of MS in Computer Science
Department of Computer Science & Software Engineering
Faculty of Basic and Applied Sciences
International Islamic University, Islamabad
By:
Shabnam Bibi
October 2013
Dedicated to my beloved grandfather and my
parents
Department of Computer Science and Software Engineering,
International Islamic University Islamabad, Pakistan
Date:
/ /2013
Final Approval
This is to certify that we have read and evaluated the thesis entitled “Author Productivity
Indexing via Topic Sensitive Weighted Citations” submitted by Shabnam Bibi under Reg No.
589-FBAS/MSCS/F09. It is our judgment that this thesis is of sufficient standard to warrant its
acceptance by International Islamic University, Islamabad for the degree of MS in Computer
Science.
Committee
External Examiner
Dr. Waseem Shahzad
____________________________________
Assistant Professor
FAST – National University of Computer and Emerging Sciences
Islamabad
Internal Examiner
Miss Zakia Jalil
Lecturer
International Islamic University Islamabad
Supervisor
Dr. Ali Daud
Assistant Professor
International Islamic University Islamabad
_____________________________________
_____________________________________
Declaration
I hereby certify that, to the best of my knowledge and belief, the work presented in this thesis
contains no previously published or written material by another person, except where due
reference has been made in the text, and that the material has not been submitted, either in whole
or in part, for a degree at this or any other university.
It is further declared that the work was done under the dexterous guidance of my supervisor Dr.
Ali Daud. I acknowledge that I have read and understood the University’s rules, requirements,
procedures and policy relating to my higher degree research award and to my thesis. I certify that
I have complied with the rules, requirements, procedures and policy of the University (as they
may be amended from time to time).
Name: _______________________________________
Signature: ____________________________________
Date: ________________________________________
ACKNOWLEDGMENT
In the name of Allah, Most Gracious, Most Merciful
All praise is due to Allah Almighty, whose divine help and guidance enabled me to accomplish
this tedious task. His blessings and perpetual succor have always surrounded me amidst
darkness and helplessness, despite my disobedience and ignorance.
Besides, it gives me pleasure to acknowledge the paternal affection of my supervisor Dr. Ali
Daud and his continuous advice, support and encouragement throughout this work. It was due to
his erudite and creative guidance that I decided to write my thesis on “Author Productivity
Indexing via Topic Sensitive Weighted Citations”, and his consistent encouragement and immense
help enabled me to complete this work. I am grateful to the Department of Computer Science and
Software Engineering, IIU Islamabad, and its faculty members for providing a healthy
environment for research.
I offer my heartfelt gratitude to my teachers and fellow graduate students for their
continuous motivational support.
My dearest friends Miss Saba Gul, Miss Nodia Athar, Miss Nabila Naz, Miss Ambreen Fatima
and Miss Saima Tariq truly deserve my special thanks for their selfless support, which is
no less than a blessing in this world of chaos.
Finally, I am perpetually obliged to my parents, my grandfather and my whole family. Their
endless support, encouragement and motivation have been a true source of strength and
inspiration for me.
Abstract
Different author productivity indexing methods have been proposed to rank scientists on the
basis of their research work. Some of these methods, e.g. the h-index, rank scientists using their
publications and the total number of citations received by those publications. Other methods
assign weighted citations to the authors of a multi-authored paper according to their contribution
to the paper. Because there is no ground truth about the actual contribution of each author, the
contribution has been judged by the author's rank in the paper, e.g. in the kth-rank index. The
author productivity indexing methods proposed so far have not considered the topic-based
contribution of authors when assigning them weighted citations in a multi-authored paper. We
propose two methods to deal with this limitation. The first, the NWC-index, assigns a Normalized
Weighted Citations (NWC) score to the co-authors of a paper according to their rank by dividing
the paper's total citations among them. The second, the TSWC-index, assigns Topic Sensitive
Weighted Citations to the authors of a paper according to their topic relatedness: the topic of each
co-author of a paper is compared with that of its first author; if they share the same topic, the
co-author's NWC score is increased, and if they do not, it is decreased. We use the h-index and
the kth-rank index as baseline methods and compare the results of our proposed methods with
them. The results clearly show the differences among authors' full citation scores, weighted
citation scores and topic sensitive weighted citation scores.
Table of Contents
Chapter 1
1. Introduction
1.1. Author Productivity Indexing
1.2. Why we use author productivity indexing
1.3. Weighted Citation
1.4. Topic sensitive weighted citation (our idea)
1.5. Research Contribution
1.6. Thesis Outline
Chapter 2
2. Literature Review
2.1. Classification of Author Productivity Indexing
2.1.1. Concept Matrix of the Indexing Methods
2.1.2. Citations Based Indexing
2.1.3. High Citations Based Indexing
2.1.4. Time Based Indexing
2.1.5. Excess Citations Based Indexing
2.1.6. Co-authors and Weighted Citations Based Indexing
2.1.7. Topic Based Indexing
2.2. Problem Statement
2.3. Objective of Research
Chapter 3
3. Methodology
3.1. Baseline Methods
3.1.1. h-index
3.1.2. kth-rank Index
3.2. Proposed Method
3.3. Latent Dirichlet Allocation
Chapter 4
4. Experiments
4.1. Data set
4.2. Development Tools and Programming language
4.3. Parameter settings
4.3.1. Parameters for LDA
4.3.2. Parameter for h-index
4.4. Baseline Methods
4.4.1. H-index
4.4.2. Kth-rank index
4.5. Results and Discussions
4.5.1. H-index
4.5.2. Kth-rank index
4.5.3. Proposed methods NWC-index and TSWC-index
4.5.4. Scenarios of NWC-index
4.5.5. Scenarios of TSWC-index
4.5.6. Comparison of TSWC-index and NWC-index
Chapter 5
5. Conclusions
References
List of Tables
Table 2.1: Concept Matrix
Table 3.1: NWC and TSWC scores of authors having one not same topic author
Table 3.2: NWC and TSWC scores of authors of more than one not same topic authors
Table 3.3: NWC and TSWC scores of same topic authors in paper
Table 3.4: h-index, NWC-index, TSWC-index and kth-rank index
Table 4.1: Rank of authors by their h-index
Table 4.2: Rank of authors by their kth-rank index
Table 4.3: Rank of authors by their NWC-index
Table 4.4: Rank of authors by their TSWC-index
Table 4.5: Rank of authors by their TSWC-index and variations with NWC-index
Table 4.6: Position relocation with respect to h-index: Position up
Table 4.7: Position relocation with respect to h-index: Position down
Table 4.8: Position stable with respect to h-index
Table 4.9: Position relocation with respect to kth-rank index: Position up
Table 4.10: Position relocation with respect to kth-rank index: Position down
Table 4.11: Position stable with respect to kth-rank index
Table 4.12: Position relocation with respect to h-index: Position up
Table 4.13: Position relocation with respect to h-index: Position down
Table 4.14: Position stable with respect to h-index
Table 4.15: Position relocation with respect to kth-rank index: Position up
Table 4.16: Position relocation with respect to kth-rank index: Position down
Table 4.17: Position stable with respect to kth-rank index
Table 4.18: Position relocation with respect to NWC-index: Position up
Table 4.19: Position relocation with respect to NWC-index: Position down
Table 4.20: Position stable with respect to NWC-index
List of Figures
Figure 2.1: Classification of Author Productivity Indexing
Figure 3.1: Latent Dirichlet allocation
Figure 4.1: Comparison of h-index, kth-rank index, NWC-index and TSWC-index
Figure 4.2: Comparison of h-index with NWC-index
Figure 4.3: Scenario 1: Position up with respect to h-index
Figure 4.5: Scenario 3: Position stable with respect to h-index
Figure 4.6: Scenario 1: Position up with respect to kth-rank index
Figure 4.7: Scenario 2: Position down with respect to kth-rank index
Figure 4.9: Scenario 1: Position up with respect to h-index
Figure 4.11: Scenario 3: Position stable with respect to h-index
Figure 4.12: Scenario 1: Position up with respect to kth-rank index
Figure 4.13: Scenario 2: Position down with respect to kth-rank index
Figure 4.14: Scenario 3: Position stable with respect to kth-rank index
Figure 4.15: Comparison of TSWC-index with NWC-index
Figure 4.16: Scenario 1: Position up with respect to NWC-index
Figure 4.17: Scenario 2: Position down with respect to NWC-index
Figure 4.18: Scenario 3: Position stable with respect to NWC-index
Chapter 1
Introduction
_____________________________________________________________________________________________
Author Productivity Indexing Via Topic Sensitive Weighted Citations
1
1. Introduction
A researcher’s success in any field is judged by his/her productivity and its impact.
Productivity, the number of papers a researcher has published, is the quantitative aspect of
research. Impact, the number of citations those publications have received, is the qualitative
aspect of research. Research work is published in journals and presented at conferences. When
other researchers acknowledge this work, published or unpublished, as a reference in their own
work, it gets cited.
1.1. Author Productivity Indexing
An evaluation method is needed to identify prominent researchers, to measure the research
performance of individuals, and to rank journals and conferences. Such methods of evaluation
are known as author productivity indexing.
Different author productivity indexing methods have been proposed to evaluate the research
productivity of researchers fairly and to remove the limitations of existing methods. Initially, the
Impact Factor (IF) [18] was used as a tool to quantify research output. However, IF is used to
compare, evaluate and rank journals, not researchers. A journal’s IF is the average number of
citations received by papers published in that journal in the previous two years. People often
compare individual papers by the IF of the journal in which they were published, but a paper in a
high-IF journal will not necessarily receive many citations, so this does not rank individual papers
fairly. Some author productivity indexing methods consider only the total number of publications
of a researcher. They therefore deal only with the quantitative aspect of research, because only
productivity is considered and the impact of the publications is ignored. Other methods consider
the total number of citations received by the papers, which measures their impact.
Further indexing methods have been proposed for evaluating the scientific performance of
individuals, for comparing researchers in the same field and in different fields, and for ranking
them directly. These techniques consider different aspects of research work, but almost all of
them use the number of papers published by a researcher and the total number of citations
received by those papers to evaluate the researcher's performance. When citations are counted,
different cases must be taken into account. For a single-author paper, the author receives all of
the credit. However, the average number of authors on scientific papers is increasing because
complicated problems require expertise from more subspecialties [25]. When multiple authors
have contributed to a paper, techniques are needed to assign them credit according to their
contributions.
One of the best-known indexing methods, the h-index [20], was proposed in 2005. It is a
single-valued index for evaluating the scientific performance of researchers that combines the
number of papers a researcher has published with the number of citations those papers have
received. The h-index is insensitive to highly cited papers [15], [26], so the g-index [15][16] and
the h(2)-index [26] were later proposed as enhancements that remove this limitation. Further
variations of the h-index and g-index were proposed to overcome some of their limitations and
add improvements, such as the A-index [22], the R- and AR-indices [23][24], the m-index [9], the
e-index [35], and the k and w indices [5]. A flaw of all these indexing methods is that, for a
multi-authored paper, they assign the paper's total citations to each of its authors, even though
the authors' contributions to the paper are not equal. To remove this limitation, some techniques
were proposed that consider the number of collaborators who worked together and assign them
credit according to their contributions (using different criteria), such as the hI-index [7],
fractional h and g indices [17], the hp-index [34], the hap-index [13], the hm-index [29][30], the
harmonic h-index [19], the kth-rank index [33], the w-index [36], the gm-index [31], the ĥ-index
[21], CCA h and g indices [27], the hmc-index [28], and the k-norm and w-norm indices [5]. Some
techniques take a researcher’s career length into account, like the m-quotient [10]. Other indices
combine existing indices, like the hg-index [4] and the q2-index [11], to keep their advantages
collectively and remove their disadvantages.
1.2. Why we use author productivity indexing
Author productivity indexing is used to rank researchers for purposes such as the following:
• To hire good faculty in universities and institutes.
• To find paper reviewers for conferences and journals.
• To find experts for project review committees.
• To find good researchers for collaboration.
These indices also have limitations. For example, a researcher may receive a Nobel Prize for a
single paper of exceptional significance in his/her field, yet such work would not be properly
reflected when evaluated with these indices [20].
1.3. Weighted Citation
Weighted citation is a quantitative scheme used to measure the contribution of an author in
a multi-authored paper. A weighting criterion is used to allocate the citations of a paper among
its authors according to their contributions. Different indexing methods have been proposed that
account for the number of co-authors and collaborators and assign weighted citations and credit
to authors according to their contributions.
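To make weighted citations concrete, the sketch below contrasts three ways of allocating one paper's citations among its co-authors: full counting (every author receives all citations, as the h-index family does), equal fractional counting, and harmonic counting, a positional scheme in which the author at byline rank k receives a share proportional to 1/k. These are generic schemes chosen for illustration; the function names are hypothetical, and this is not the NWC scheme proposed in this thesis, which is defined in Chapter 3.

```python
from fractions import Fraction


def full_credit(citations: int, n_authors: int) -> list[Fraction]:
    """Full counting: every co-author is credited with all of the citations."""
    return [Fraction(citations)] * n_authors


def fractional_credit(citations: int, n_authors: int) -> list[Fraction]:
    """Equal fractional counting: citations are split evenly among co-authors."""
    return [Fraction(citations, n_authors)] * n_authors


def harmonic_credit(citations: int, n_authors: int) -> list[Fraction]:
    """Harmonic counting: the author at byline rank k gets a share proportional
    to 1/k, normalised so that the shares add up to the paper's citations."""
    raw = [Fraction(1, k) for k in range(1, n_authors + 1)]
    total = sum(raw)
    return [citations * r / total for r in raw]


if __name__ == "__main__":
    # A four-author paper cited 50 times.
    print([float(x) for x in full_credit(50, 4)])        # [50.0, 50.0, 50.0, 50.0]
    print([float(x) for x in fractional_credit(50, 4)])  # [12.5, 12.5, 12.5, 12.5]
    print([float(x) for x in harmonic_credit(50, 4)])    # [24.0, 12.0, 8.0, 6.0]
```

Full counting credits 200 citations in total for a paper that received only 50, whereas both weighted schemes conserve the paper's 50 citations; exact fractions are used so the shares always sum back to the original count.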
1.4. Topic sensitive weighted citation (our idea)
Existing weighting criteria do not assign weights to researchers according to their
relatedness to the topic of the paper. Topic sensitive weighted citation means that weighted
citations are assigned to the authors of a paper according to their topic relatedness. For example,
suppose a paper on machine learning has four authors A1, A2, A3 and A4 and has been cited 50
times. Authors A1, A2 and A4 work on machine learning, whereas author A3 works on another
topic. In this case, A3 has limited knowledge of, and contribution to, this topic compared with
A1, A2 and A4. The 50 citations should therefore be divided among the authors so that A1, A2
and A4 receive the largest shares according to their rank in the paper, and A3 receives a smaller
share because A3 does not work on this topic.
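The machine learning example can be worked through with a small sketch. It assumes a hypothetical rule, not the exact TSWC formula of Chapter 3: start from rank-based shares (harmonic 1/k weights are used here as a stand-in), multiply the share of every author whose topic differs by a hypothetical `penalty` factor, and renormalise so the shares still sum to the paper's citations. How the `same_topic` flags are obtained (for example, by comparing each co-author's topic with the first author's, as the abstract describes) is left outside the sketch.

```python
from fractions import Fraction


def rank_shares(n_authors: int) -> list[Fraction]:
    """Stand-in rank weights (harmonic 1/k), normalised to sum to 1."""
    raw = [Fraction(1, k) for k in range(1, n_authors + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def topic_sensitive_credit(citations: int, same_topic: list[bool],
                           penalty: Fraction = Fraction(1, 2)) -> list[Fraction]:
    """Split a paper's citations among its authors (in byline order).

    same_topic[k] is True if author k works on the paper's topic.
    Off-topic authors have their rank share multiplied by `penalty`
    (a hypothetical parameter); all shares are then renormalised, so
    on-topic authors gain exactly what off-topic authors lose.
    """
    base = rank_shares(len(same_topic))
    adjusted = [w if on_topic else w * penalty
                for w, on_topic in zip(base, same_topic)]
    total = sum(adjusted)
    return [citations * w / total for w in adjusted]


if __name__ == "__main__":
    # Paper cited 50 times; A1, A2 and A4 work on machine learning, A3 does not.
    shares = topic_sensitive_credit(50, [True, True, False, True])
    print([round(float(s), 2) for s in shares])  # [26.09, 13.04, 4.35, 6.52]
```

Compared with the rank-only shares of 24, 12, 8 and 6 citations, A1, A2 and A4 gain while A3 drops from 8 to about 4.35, and the total stays at 50. Because of the renormalisation, the penalty only has an effect when a paper mixes on-topic and off-topic authors.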
1.5. Research Contribution
By studying the indexing methods for assessing the research work of scientists, we
noticed that none of them considers the topic relatedness of authors when assigning them the
weighted citations of a paper. We address this limitation in our research. Our contribution is to
assign Topic Sensitive Weighted Citations to authors in multi-authored papers, treating topic
sensitivity as a key factor in evaluating researchers' work. The proposed index was then
calculated for researchers, and the results were compared with our baseline methods.
1.6. Thesis Outline
The rest of this thesis is organized as follows.
Chapter 2: This chapter reviews the existing author productivity indexing methods in detail and
discusses their contributions and drawbacks. Based on this review, a problem statement is
formulated and the objective of the research is discussed.
Chapter 3: This chapter describes in detail the methodology of our proposed solution and of the
baseline methods.
Chapter 4: This chapter describes our data set and the experiments performed on it, discusses
the results of our proposed methods, and compares them with the baseline methods.
Chapter 5: This chapter presents the conclusions of our research, its contributions and future
work.
Chapter 2
Literature Review
2. Literature Review
Many techniques have been proposed for evaluating the scientific performance of
researchers. We have studied a number of research papers that propose techniques and indices
for ranking researchers on the basis of their productivity and impact. As discussed in Chapter 1,
all of these indices and techniques are forms of author productivity indexing, which uses the
publications of authors as the basis for assessing their research work. In this chapter, these
indices and techniques are discussed in detail along with their strengths and weaknesses.
Initially, the Impact Factor (IF) [18] was used. It was proposed by Eugene Garfield in 1960 and
was used to select journals for the Science Citation Index (SCI) in order to compare and rank
journals. A journal’s IF is the average number of citations received by papers published in that
journal in the previous two years. For example, if a journal has an IF of 4 in 2013, the papers it
published in 2011 and 2012 received 4 citations each, on average, in 2013. A journal with a high
IF is considered more significant.
The IF for 2013 is calculated as:
IF(2013) = (citations received in 2013 by papers published in 2011 and 2012)
/ (total number of papers published in 2011 and 2012)
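The two-year calculation above can be sketched as a short function. As a simplifying assumption, every paper counts as a citable item (the text does not distinguish item types), and the input format, a list of (publication year, citations received in the target year) pairs, is hypothetical.

```python
def impact_factor(year: int, papers: list[tuple[int, int]]) -> float:
    """Two-year impact factor of a journal for `year`.

    `papers` holds (publication_year, citations_received_in_year) pairs.
    Only papers published in the two preceding years are counted.
    """
    window = [cites for pub_year, cites in papers
              if pub_year in (year - 1, year - 2)]
    if not window:
        raise ValueError("no papers published in the two preceding years")
    return sum(window) / len(window)


# Hypothetical journal: four papers from 2011-2012 plus one older paper.
papers = [(2011, 10), (2011, 0), (2012, 2), (2012, 0), (2010, 30)]
print(impact_factor(2013, papers))  # 3.0
```

The 2010 paper falls outside the window and is ignored. The resulting IF of 3.0 is the same number attached to all four recent papers, even though two of them received no citations and one received ten, which illustrates why a journal-level average says little about individual papers.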
Several problems are associated with IF [32]: it is not statistically representative of the
individual articles of a journal or of their actual citation counts, and citations to non-citable
articles are included in the average, among other issues. The same score, the journal impact
factor, is assigned to all articles of a journal, whether they received many citations, few citations
or none at all. As a result, an article with few citations published in a high-IF journal is
considered more effective than an article with more citations published in a low-IF journal.
People also compare individual papers using the IF of the journal in which they were
published. Using IF to compare individual papers is a dangerous misuse: if substituting a
journal's IF for an individual article's citation count makes no sense, then it follows that using IF
to evaluate the authors of those articles makes no sense either [3]. Some indexing methods
consider only the total number of publications when evaluating the research work of researchers
and ignore the impact of those publications. Others consider the citations received by
publications and thus measure their impact.
2.1. Classification of Author Productivity Indexing
We have categorized author productivity indexing methods according to the aspects
they address, as follows:
Figure 2.1: Classification of Author Productivity Indexing (classes: Citations, High Citations, Co-authors and Weighted Citations, Excess Citations, Time Based, Topic Based)
Figure 2.1 summarizes the indexing methods we have studied and included in our
literature review. We have grouped these methods into classes according to the aspects they
address. Over time, each new indexing method has improved on the existing ones.
2.1.1. Concept Matrix of the Indexing Methods
Table 2.1: Concept Matrix
Index | Citations | High Citations | Co-authors and Weighted Citations | Excess Citations | Time Based | Topic Based
h-index [20] | ✓ | × | × | × | × | ×
g-index [15][16] | ✓ | ✓ | × | × | × | ×
h(2)-index [26] | ✓ | ✓ | × | × | × | ×
hI-index [7] | ✓ | × | ✓ | × | × | ×
m-quotient [10] | ✓ | × | × | × | ✓ | ×
A-index [22] | ✓ | ✓ | × | × | × | ×
R-index [24] | ✓ | ✓ | × | × | × | ×
AR-index [23][24] | ✓ | × | × | × | ✓ | ×
Fractional h and g indices [17] | ✓ | × | ✓ | × | × | ×
hp-Index [34] | ✓ | × | ✓ | × | × | ×
hap-index [13] | ✓ | × | ✓ | × | × | ×
hm-index [29][30] | ✓ | × | ✓ | × | × | ×
Harmonic h-index [19] | ✓ | × | ✓ | × | × | ×
m-index [9] | ✓ | ✓ | × | × | × | ×
kth-rank [33] | ✓ | × | ✓ | × | × | ×
w-index [36] | ✓ | × | ✓ | × | × | ×
e-index [35] | ✓ | × | × | ✓ | × | ×
gm-index [31] | ✓ | × | ✓ | × | × | ×
ĥ-index [21] | ✓ | × | × | × | × | ×
hg-index [4] | ✓ | ✓ | × | × | × | ×
q2-index [11] | ✓ | ✓ | × | × | × | ×
Fractional counting of authorship [12] | ✓ | × | ✓ | × | × | ×
Positional and equal weights scheme [1] | ✓ | × | ✓ | × | × | ×
CCA h and g indices [27] | ✓ | × | ✓ | × | × | ×
hmc-index [28] | ✓ | × | ✓ | × | × | ×
k and w indices [5] | ✓ | × | × | ✓ | × | ×
k-norm and w-norm [5] | ✓ | × | ✓ | × | × | ×
WFO, WFI [2] | ✓ | × | ✓ | × | × | ×
p-index [6] | ✓ | × | ✓ | × | × | ×
Table 2.1 shows which aspects each indexing method addresses. All of the indexing methods
use publications as a key factor for assessing the research work of scientists. The citations
received by publications are used in different forms, such as high-citation counts, weighted
citations and excess-citation counts; each of these forms exposed a flaw of earlier indexing
methods that did not address that aspect, and improved on them. Some methods involve the time
factor to assess scientists' work according to their scientific age. We have studied these indexing
methods together with their contributions, limitations and advantages. We found that none of
them uses a topic-based weighting scheme for authors, i.e. one in which an author of a
multi-authored paper who works on the paper's topic receives a high weighted score, while the
authors who do not are penalized with reduced weighted scores. We have proposed the
TSWC-index to take authors' topics into account when assigning them weighted citations.
2.1.2. Citations Based Indexing
To rank scientists based on their productivity and its impact, different indexing methods have been proposed. An index for evaluating the scientific performance of researchers, called the h-index [20], was proposed by Hirsch JE in 2005. Unlike IF, it directly measures both productivity (the total number of papers) and impact (the total citations of those papers). The papers are arranged in descending order of the citations they received, and the h-index is the largest rank r such that the paper at rank r has received at least r citations; all subsequent papers have fewer or equal citations each. Hirsch defines the h-index as:
A scientist has index h if h of his or her N papers have ≥ h citations each and the other (N − h) papers have ≤ h citations each.
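The definition above translates directly into a short procedure. The following is a minimal Python sketch, assuming each paper's citation count is given as a non-negative integer; the function name `h_index` and the sample record are illustrative only.

```python
from typing import Sequence


def h_index(citations: Sequence[int]) -> int:
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # descending order of citations
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the paper at this rank still has >= rank citations
        else:
            break  # later papers have no more citations, so h cannot grow
    return h


if __name__ == "__main__":
    print(h_index([10, 8, 5, 4, 3]))  # 4
```

For the sample record the fourth paper still has 4 citations but the fifth has only 3, so h = 4; the extra citations of the top papers do not affect the result, which is the insensitivity discussed next.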
It is a single-number criterion for evaluating the scientific output of researchers and is easy to understand, but it cannot be used to evaluate the work of a Nobel Prize winner. Once papers have been counted towards the h-index, it makes no difference whether they go on to receive many more citations or none at all. Later, different indexing methods were proposed to remove its limitations. Some of the limitations discussed by other authors are: it is insensitive to highly cited papers [15], and it ignores the career length of a researcher [10]. Zhang CT [35] criticized the h-index for two reasons: first, excess citations are ignored, and second, the h-index is a natural number, so different researchers may have the same h-index. The h-index was also criticized for its insensitivity to the number of co-authors and their contributions [33], [36], [17], [7], [27], [30], [34] and [13]. Hirsch, who proposed the h-index, also noticed this and proposed the ĥ-index [21], which takes the number of co-authors into account. A scientist has ĥ-index ĥ if ĥ of his/her papers belong to his/her ĥ-core, and a paper belongs to the ĥ-core if it has at least ĥ citations and also belongs to the ĥ-core of each of its co-authors. Thus each author receives either full credit for a paper or no credit at all, and senior authors are favored.
2.1.3. High Citations Based Indexing
L. Egghe proposed the g-index [15][16], which is more sensitive to highly cited papers. The g-index measures the performance of a researcher or journal by considering the top papers and counting the citations they received. Its calculation is as simple as that of the h-index: papers are arranged in descending order of citations received, and the g-index is the highest number g such that the top g papers jointly received at least g² citations. For any researcher, g is greater than or equal to h. Another variation of the g-index and h-index, known as the h(2)-index, was proposed by Kosmulski M [26]; like the g-index, it gives more weight to highly cited papers. It is defined as the highest natural number h(2) such that the h(2) most cited papers of a researcher each received at least [h(2)]² citations. For example, an h(2)-index of 10 means that the author has published at least 10 papers, each of which has been cited at least 100 times. The h(2)-index of an author is always less than or equal to his/her h-index, so verifying the authorship of the relevant papers requires less work, especially when different scientists share the same first and last names.
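Both definitions only need the citation counts sorted in descending order. A minimal Python sketch follows, assuming g is capped at the number of published papers (some variants of the g-index pad the list with fictitious zero-cited papers, which this sketch does not do); the function names are hypothetical.

```python
from typing import Sequence


def g_index(citations: Sequence[int]) -> int:
    """Largest g such that the top g papers jointly have at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites  # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


def h2_index(citations: Sequence[int]) -> int:
    """Largest h2 such that the h2 most cited papers each have at least h2**2 citations."""
    ranked = sorted(citations, reverse=True)
    h2 = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank * rank:
            h2 = rank
        else:
            break
    return h2


if __name__ == "__main__":
    papers = [10, 8, 5, 4, 3]
    print(g_index(papers), h2_index(papers))  # 5 2
```

For this sample record h = 4, so the output illustrates the ordering h(2) ≤ h ≤ g stated above.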
Later, Jin [22] proposed the A-index, which is the average number of citations of the papers in the h-core: the total number of citations of these papers is divided by h. Both the A-index and the g-index take the total citations of highly cited papers into account, so the A-index can increase while the h-index remains the same, as the citations of the h-core papers grow. Jin et al [24] noticed that, because the total citations of the h-core papers are divided by the h-index when calculating the A-index, a better researcher is punished for having a higher h-index. To remove this problem, Jin et al [24] proposed the R-index, which is the square root of the total number of citations of the papers in the h-core.
A variation of the A-index based on the median instead of the average, called the m-index, was proposed by Bornmann et al [9]. The m-index is the median number of citations of the papers in the h-core. The median is calculated by arranging the papers in the h-core in decreasing order of citations and choosing the middle one. The median was chosen because citation count distributions are usually skewed.
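The three h-core statistics can be sketched together in Python. This is a minimal illustration, assuming that for an h-core of even size the median is the average of the two middle values (the behaviour of `statistics.median`); the helper `h_core` and the sample record are hypothetical.

```python
import math
from statistics import median
from typing import List, Sequence


def h_core(citations: Sequence[int]) -> List[int]:
    """Citation counts of the papers in the h-core (the top h papers)."""
    ranked = sorted(citations, reverse=True)
    # Because ranked is non-increasing, `cites >= rank` holds for a prefix of length h.
    h = sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)
    return ranked[:h]


def a_index(citations: Sequence[int]) -> float:
    core = h_core(citations)
    return sum(core) / len(core) if core else 0.0  # average citations in the h-core


def r_index(citations: Sequence[int]) -> float:
    return math.sqrt(sum(h_core(citations)))  # square root of the h-core citations


def m_index(citations: Sequence[int]) -> float:
    core = h_core(citations)
    return float(median(core)) if core else 0.0  # median citations in the h-core


if __name__ == "__main__":
    papers = [10, 8, 5, 4, 3]  # h = 4, h-core = [10, 8, 5, 4]
    print(a_index(papers), round(r_index(papers), 3), m_index(papers))  # 6.75 5.196 6.5
```

Dividing by h (A-index) and taking a square root (R-index) treat the same 27 h-core citations differently, which is the punishment of a higher h-index that motivated the R-index.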
Some indices were proposed that combine existing indices in order to keep their advantages and remove their disadvantages. One of them is the hg-index [4], proposed by Alonso et al [4], which is based on the combination of the h-index and the g-index. The hg-index is defined as the geometric mean of the h and g indices. It is useful for comparing researchers who have the same h-index but different numbers of papers and citations. It takes highly cited papers into account while also reducing the impact of a single highly cited paper.
In order to combine the number of papers with their impact, Cabrerizo et al [11] proposed the q2-index, which is the geometric mean of the h-index and the m-index. The h-index describes the number of papers, while the m-index describes the impact of the papers in the h-core, so the advantages of the two indices are combined. Its computation is simple, and it increases whenever either of the two indices increases.
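Both combined indices are plain geometric means, so a sketch only needs the component values. In this minimal Python illustration the inputs are assumed to have been computed beforehand (h = 4, g = 5 and m = 6.5 correspond to the hypothetical record [10, 8, 5, 4, 3]); the function names are illustrative.

```python
import math


def hg_index(h: float, g: float) -> float:
    """Geometric mean of the h-index and the g-index (Alonso et al.)."""
    return math.sqrt(h * g)


def q2_index(h: float, m: float) -> float:
    """Geometric mean of the h-index and the m-index (Cabrerizo et al.)."""
    return math.sqrt(h * m)


if __name__ == "__main__":
    print(round(hg_index(4, 5), 3), round(q2_index(4, 6.5), 3))  # 4.472 5.099
```

Because a geometric mean grows whenever either factor grows, both indices increase if either of their component indices increases, as stated above.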
2.1.4. Time Based Indexing
Burrell [10] noticed that the h-index ignores the career length of a researcher and proposed the m-quotient, which depends on the career length of a researcher who began his/her career some years ago and is still active. To calculate the m-quotient, the h-index is divided by the career length, i.e., the total number of years since the first paper was published.
After proposing the R-index [24], Jin et al [23][24] also proposed an age-dependent R-index named the AR-index. The AR-index is defined as the square root of the sum, over the papers in the h-core, of each paper's average number of citations per year. Because the AR-index takes into account the age of papers as well as the total number of citations in the h-core, a researcher's index may decrease with the passage of time.
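A minimal Python sketch of the two time-based indices follows. It assumes the career length and paper ages are measured in whole years and that every paper age is at least 1 (so a paper published in the current year counts as one year old); the function names and sample data are hypothetical.

```python
import math
from typing import Iterable, Tuple


def m_quotient(h: int, career_years: float) -> float:
    """h-index divided by the number of years since the first paper was published."""
    if career_years <= 0:
        raise ValueError("career length must be positive")
    return h / career_years


def ar_index(h_core_papers: Iterable[Tuple[int, float]]) -> float:
    """Square root of the sum of citations per year over the h-core papers.

    Each item is (citations, age_in_years); ages are assumed to be >= 1.
    """
    return math.sqrt(sum(cites / age for cites, age in h_core_papers))


if __name__ == "__main__":
    print(m_quotient(12, 6))  # 2.0
    young = [(40, 4), (20, 2), (10, 5)]
    old = [(40, 8), (20, 4), (10, 10)]  # same citations, papers twice as old
    print(ar_index(young) > ar_index(old))  # True: the AR-index falls as papers age
```

The second comparison shows the property mentioned above: if the h-core papers stop gaining citations, their per-year averages shrink and the AR-index decreases.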
2.1.5. Excess Citations Based Indexing
As noted in Section 2.1.2, Zhang CT [35] criticized the h-index because it ignores excess citations and because, being a natural number, it can give different researchers the same value. To solve these limitations, he proposed a complement to the h-index called the e-index. The e-index represents the ignored excess citations and is used to evaluate the scientific output of researchers who have the same h-index. The e-index is calculated as:
$$e^2 = \sum_{j=1}^{h} cit_j - h^2 \qquad (2.1)$$
where $\sum_{j=1}^{h} cit_j$ is the total number of citations received by the papers ranked 1 to h, and $h^2$ is the square of the h-index. The e-index is a real number that ranges between 0 and ∞, so for a fair comparison of researchers the e-index must be used together with the h-index, because together they give complete information about the citations in the h-core.
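Equation (2.1) can be checked on two hypothetical records that share h = 4 but differ in their excess citations. A minimal Python sketch (the function name is illustrative):

```python
import math
from typing import Sequence


def e_index(citations: Sequence[int]) -> float:
    """Zhang's e-index: e**2 = (citations of the h-core papers) - h**2, as in Eq. (2.1)."""
    ranked = sorted(citations, reverse=True)
    h = sum(1 for rank, cites in enumerate(ranked, start=1) if cites >= rank)
    excess = sum(ranked[:h]) - h * h  # citations beyond the h**2 needed to reach h
    return math.sqrt(excess)


if __name__ == "__main__":
    print(round(e_index([10, 8, 5, 4, 3]), 3))  # 3.317
    print(e_index([4, 4, 4, 4]))  # 0.0
```

Both records have h = 4, but only the first has excess citations in its h-core (27 − 16 = 11), which is exactly the difference the e-index is meant to expose.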
In order to evaluate scientific output fairly in fields such as the social sciences and humanities, where research papers receive fewer citations, Anania et al [5] proposed two indices named the k-index and the w-index. Both indices are defined over the real numbers: the integer part equals the h-index and the fractional part reflects the excess citations. The k-index only accounts for the total citations of one's most cited papers, i.e., the papers included in the h-core. The k-index is calculated as:
=ℎ+ 1− ℎ ∑
, ,..
, ∀ℎ > 0
(2.2)
And k = 0 when h = 0. w-index is defined as:
= ℎ+ 1−ℎ
, ∀ℎ > 0
(2.3)
And w = 0 when h = 0. The w-index accounts for the total citations of all the papers published, so both indices increase as papers receive more citations.
2.1.6. Co-authors and Weighted Citations Based Indexing
Because the h-index is only suitable for comparing researchers of the same field, Batista et al [7] proposed the hI-index for comparing researchers of different fields such as Physics, Chemistry, Biology and Mathematics. The hI-index also considers the co-authors of the papers. It can be interpreted as the number of papers with at least hI citations that a researcher would have if he/she had worked alone. It is calculated as h² divided by the total number of authors of the papers in the h-core, where h² is the square of the h-index. If a researcher has index h and has worked alone on all of those papers, the total number of authors equals h, and dividing h² by h gives h, so his/her hI equals h; with co-authors, hI is less than h. Its limitations are that its value can decrease when papers with many authors in the h-core receive more citations, and that it is restricted to the h-core, so single-authored papers ranked between hI and h are not taken into account [30][29]. The h and g indices were also criticized by Egghe L [17] for their insensitivity to the number of co-authors, and he proposed fractional h and g indices that use the co-author information of papers fractionally, by two methods.
a) By fractional paper counts, in which the citation scores remain unchanged, so the order of the papers does not change. Each paper i with a(i) authors counts as 1/a(i) of a paper, so the ranks 1, 2, 3, ... are replaced by the fractional ranks 1/a(1), 1/a(1) + 1/a(2), 1/a(1) + 1/a(2) + 1/a(3), and so on. The fractional h-index hF is the largest fractional rank such that
$$h_F = \sum_{j=1}^{r} \frac{1}{a(j)} \le c_r \qquad (2.4)$$
where $c_r$ is the number of citations of the paper at rank r. The fractional g-index gF is the largest fractional rank such that
$$g_F = \sum_{j=1}^{r} \frac{1}{a(j)}, \qquad \left(\sum_{j=1}^{r} \frac{1}{a(j)}\right)^2 \le \sum_{j=1}^{r} c_j \qquad (2.5)$$
b) By fractional citation counts, in which an author receives a credit of $c_i/a(i)$ citations for paper i if that paper has $c_i$ citations and a(i) authors. After calculating these fractional citations $c_i/a(i)$ for each paper, the papers are rearranged in descending order of fractional citations, and the fractional h and g indices are calculated. The fractional h-index hf is the largest rank r such that
$$\frac{c_r}{a(r)} \ge r \qquad (2.6)$$
The fractional g-index gf is the largest rank r such that
$$\sum_{j=1}^{r} \frac{c_j}{a(j)} \ge r^2 \qquad (2.7)$$
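The co-author corrections above (hI, and Egghe's hF and hf) can be compared on one hypothetical record. The following minimal Python sketch assumes the number of authors of every paper is known, breaks citation ties by input order, and uses `fractions.Fraction` so that fractional ranks are exact; all function names are illustrative.

```python
from fractions import Fraction
from typing import List, Sequence, Tuple


def _ranked(citations: Sequence[int], authors: Sequence[int]) -> List[Tuple[int, int]]:
    """(citations, number_of_authors) pairs in descending order of citations."""
    return sorted(zip(citations, authors), key=lambda paper: paper[0], reverse=True)


def hi_index(citations: Sequence[int], authors: Sequence[int]) -> float:
    """Batista's hI: h**2 divided by the total number of authors of the h-core papers."""
    papers = _ranked(citations, authors)
    h = sum(1 for rank, (cites, _) in enumerate(papers, start=1) if cites >= rank)
    if h == 0:
        return 0.0
    return h * h / sum(n for _, n in papers[:h])


def h_frac_papers(citations: Sequence[int], authors: Sequence[int]) -> Fraction:
    """Egghe's hF (Eq. 2.4): order unchanged, each paper advances the rank by 1/a(i)."""
    rank, best = Fraction(0), Fraction(0)
    for cites, n in _ranked(citations, authors):
        rank += Fraction(1, n)  # fractional rank 1/a(1) + ... + 1/a(r)
        if rank > cites:
            break
        best = rank
    return best


def h_frac_citations(citations: Sequence[int], authors: Sequence[int]) -> int:
    """Egghe's hf (Eq. 2.6): re-sort papers by c_i / a(i), then apply the h rule."""
    shares = sorted((Fraction(c, n) for c, n in zip(citations, authors)), reverse=True)
    h = 0
    for rank, share in enumerate(shares, start=1):
        if share < rank:
            break
        h = rank
    return h


if __name__ == "__main__":
    cites = [10, 8, 5, 4, 3]
    n_authors = [2, 1, 4, 1, 3]
    print(hi_index(cites, n_authors))          # 2.0
    print(h_frac_papers(cites, n_authors))     # 11/4
    print(h_frac_citations(cites, n_authors))  # 3
```

The sketch also shows the reordering issue raised below: under fractional citation counts the two-author paper with 10 citations (5 per author) falls behind the single-author paper with 8.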
In some cases the value of gF increases, which is not reasonable: because the citations of a paper are shared among its co-authors, the g-index should be reduced under fractional counting [31]. Another disadvantage is that the papers are rearranged after calculating the fractional citations of each paper, so highly cited papers with many authors can drop out of the core [30]. In short, [7] and [17] considered the number of co-authors of papers when calculating the hI-index and the fractional h and g indices respectively. Wan et al [34] proposed another variation of the h-index, named the pure h-index and denoted hP, which takes into account not only the number of co-authors of a paper but also the author's relative rank in the paper. The same approach was applied to the R-index, giving the pure R-index RP, which also considers the total number of citations in the h-core. To calculate hP, a normalized score is calculated for each author of a paper such that the scores of all authors of the paper sum to one; fractional counting, proportional counting, geometric counting and similar schemes may be used for this normalized scoring. The RP-index is the same as the R-index, except that all citations of the papers of an author are divided by the average number of co-authors of that author in the h-core. Chai et al [13] criticized the hP-index for favoring multi-authored papers and proposed an improvement called the adapted pure h-index, denoted hap, which favors single-authored papers. It also replaces the h-core with an hap-core, consisting of the articles whose citations are greater than or equal to hap. Schreiber M [29][30] proposed a modification of the h-index called the hm-index, which removes the limitations of the hI-index and the hf-index. It is defined as the reduced number of papers that have been cited hm or more times: each paper is counted fractionally according to the inverse of its number of authors, which gives an effective rank $r_{eff}$, and hm is calculated as:
$$h_m = \max_r \{\, r_{eff}(r) : r_{eff}(r) \le C(r) \,\}, \qquad r_{eff}(r) = \sum_{r'=1}^{r} \frac{1}{a(r')} \qquad (2.8)$$
where C(r) is the total number of citations of the paper at rank r. Calculating the hm-index does not require reordering the papers, and its value increases when papers with many authors in the h-core receive more citations.
Another method of sharing the credit of a paper among all of its co-authors, called harmonic credit allocation, was proposed by Hagen [19]. In this method, each author receives credit according to his/her rank and the number of co-authors N, such that the first author receives the most credit and each subsequent author receives less, i.e., the ith author receives more credit than the (i+1)th author. As the number of authors increases, the credit per author decreases. The credit of the ith author is calculated as:
$$\frac{1/i}{1 + \frac{1}{2} + \cdots + \frac{1}{N}} \qquad (2.9)$$
The credit of the ith author is then multiplied by the original citations of the paper to obtain his/her contribution when determining the harmonic h-index. Another variant of the h-index, known as the kth-rank index, was proposed by Sekercioglu C. H [33]; it quantifies the contributions of co-authors according to their rank in the paper. According to Sekercioglu, the kth-ranked co-author contributes 1/k as much as the first author; e.g., for a paper with 3 authors, the first author contributes 1 because he/she has rank 1, the second author contributes 1/2, the third author contributes 1/3, and so on. Zhang C.T [36] also criticized the h-index for its insensitivity to the number of co-authors and to their individual contributions according to their ranks in the paper. For this purpose he proposed the weighted h-index w, which is based on weighted citations. He also criticized the kth-rank index [33], arguing that the last author is usually the corresponding author, so the first and last authors should both take full credit, while the credit of the remaining co-authors is decreased such that their weights sum to one. In other words, the weighted h-index based on weighted citations was proposed to quantify the contributions of researchers according to their rank in a particular article. Later, Schreiber M [31] proposed a modification of the g-index named the gm-index, which is also an enhancement of gF [17]; in the gm-index, fractional counting of papers always reduces the g-index. It is defined as the highest effective rank that is less than or equal to the effective number of citations on average. Another variation for fractionally quantifying research output in the case of multiple authors was proposed by Carbone V [12]. In this approach, the total number of citations of each paper is divided by the square root of the number of co-authors of that paper. Like fractional counting, this approach divides the citations of a paper equally among all of its co-authors and does not take the rank of authors into consideration. Other enhancements of the h-index, known as the weighted h-index and weighted citation h-cuts, were proposed by Abbas A M [1]; they take into account the number of authors of a paper. He proposed two weight assignment schemes: one assigns weights to authors according to their positions in a paper, and the other assigns them equal weights. The weighted contributions of authors are taken into account in the weighted h-indices, and the excess citation count is taken into account in the weighted citation h-cuts.
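The credit-sharing schemes described above differ mainly in how a paper's citations are split by author rank. The following minimal Python sketch compares four of them for one hypothetical paper (30 citations, 3 authors); the function names are illustrative, and each function returns per-author weights in rank order that are multiplied by the paper's citations.

```python
import math
from typing import List


def fractional_credit(n: int) -> List[float]:
    """Equal shares: each of the n authors gets 1/n."""
    return [1.0 / n] * n


def harmonic_credit(n: int) -> List[float]:
    """Hagen's harmonic credit, Eq. (2.9): (1/i) / (1 + 1/2 + ... + 1/n)."""
    norm = sum(1.0 / k for k in range(1, n + 1))
    return [(1.0 / i) / norm for i in range(1, n + 1)]


def kth_rank_credit(n: int) -> List[float]:
    """Sekercioglu: the kth author counts 1/k as much as the first author."""
    return [1.0 / k for k in range(1, n + 1)]


def carbone_credit(n: int) -> List[float]:
    """Carbone: every author's citations are divided by sqrt(n)."""
    return [1.0 / math.sqrt(n)] * n


if __name__ == "__main__":
    cites, n = 30, 3
    schemes = [("fractional", fractional_credit), ("harmonic", harmonic_credit),
               ("kth-rank", kth_rank_credit), ("Carbone", carbone_credit)]
    for name, scheme in schemes:
        weights = scheme(n)
        shares = [round(cites * w, 2) for w in weights]
        print(name, shares, "total", round(sum(cites * w for w in weights), 2))
```

Only the fractional and harmonic weights sum to one, so only they distribute exactly the paper's citations; the kth-rank and square-root schemes credit the co-authors with more citations in total than the paper received.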
Another variation of the h-index, which allocates credit to the authors of a multi-authored paper according to their contributions, was proposed by Liu et al [27]. This method of allocating credit lies between fractional counting and harmonic credit allocation, so it is called the combined credit allocation (CCA) method. The proposed indices are known as the CCA h-index, hc, and the CCA g-index, gc. The first author and the corresponding authors are the most important authors (MIAs), so the authors are reordered such that the MIAs tie for the first rank. Their normalized credit allocated proportion in a paper is calculated as:
( , ) =
∑
(2.10)
∑
Where
is the total number of MIAs in a paper and n is the total number of authors in that
paper. Value of q is set to 2/3.
The normalized credit allocated proportion of the rth-author in n-authored paper is:
( , ) =
∑
(2.11)
If a paper has been cited c1 times, the citations allocated to the rth author are c1 × P(r,n). The allocated citations produced by this method replace the original citations of each paper for the rth author when determining hc and gc. The CCA h-index requires the papers to be reordered after citations are allocated to each author of a multi-authored paper [28]. To remove this limitation, Liu et al [28] proposed a modification of the h-index known as the hmc-index. The hmc-index uses the framework of the hm-index [29][30], but instead of fractional counting it uses the combined credit allocation (CCA) method, so that authors with different contributions gain different credit. This method uses an effective paper count instead of an effective rank. The effective paper count is the sum of the normalized credit allocated proportions that an author has obtained in r papers:
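$$r_{eff}(r) = \sum_{i=1}^{r} P\big(rank(i), n(i)\big) \qquad (2.12)$$
where rank(i) is the author's rank in the ith paper and n(i) is the number of authors of the ith paper, and hmc is defined as:
$$h_{mc} = \max_r \{\, r_{eff}(r) : r_{eff}(r) \le C(r) \,\} \qquad (2.13)$$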
In some cases an author with a lower contribution obtains a higher hmc-index. To remove this problem, the rational hmc-index, denoted hmcr, is used so that the author with the greater contribution obtains the higher hmcr.
The k and w indices proposed in [5] were also normalized to account for co-authors by using normalized citations, which gives the k-norm and w-norm indices. The normalized citations of a researcher for a multi-authored paper are obtained by dividing the total citations of the paper by its number of authors. Abramo et al [2] measured the distortion introduced into the performance ranking of individuals when the number of co-authors, their ranks and their contributions to multi-authored papers are ignored. They proposed two yearly productivity measures, Weighted Fractional Output (WFO) and Weighted Fractional Impact (WFI). WFO weights each paper according to the co-author's rank, and WFI considers the citations of the papers as well. As another method of correcting the citations of co-authors in multi-authored papers, Aziz et al [6] introduced a harmonic weighting algorithm that considers the order and number of the co-authors who contributed to the paper, and proposed the "profit (p)-index", which approximately measures the extent to which authors profit from the contributions of their co-authors. It shows that the contribution of co-authors to the work of an author is significant. The p-index ranges from 0 to 1, and a higher value indicates a greater contribution of co-authors to an author's papers.
2.1.7. Topic Based Indexing
Topic based indexing means an indexing technique that assigns weighted citations to the co-authors of a paper according to their topics. None of the indexing methods discussed above addresses this issue. We have chosen it as our proposed idea for assigning weights to co-authors: if an author does not have the topic on which the paper was published, his/her weight is reduced, and the weighted scores of the other authors who have that topic are increased according to their rank in the paper.
2.2. Problem Statement
Suppose four authors A1, A2, A3 and A4 have contributed to a paper on machine learning, and authors A1, A2 and A4 have that topic while A3 does not. Then the contributions of A1, A2 and A4 should carry more weight than that of A3, because the contribution of a researcher whose topic of interest matches the topic of the paper is more valuable than that of a researcher whose topic of interest is different. By studying different author productivity indexing methods in the literature review, we noticed that none of them considers the topic-based contributions of co-authors in multi-authored papers, and none assigns weighted citations to authors according to their topics. Topic sensitive weighted citations show the worth of researchers' contributions to a paper on the basis of their topics. We have proposed a quantitative method that assigns weighted citations to co-authors according to their topics.
2.3. Objective of Research
None of the author productivity indexing methods discussed in the literature review considers the topics of authors when assigning them weighted citations. The objective of our research is to address this limitation of the previous indexing methods and to introduce a new index. The main objectives of our research are:
• To assign Normalized Weighted Citations (NWC-index) to the co-authors of a paper.
• To propose the TSWC-index (Topic Sensitive Weighted Citations index), which increases or decreases the NWC score of the co-authors of a paper according to their relatedness to the topic of the paper.
• To compare the results of our proposed methods with the baseline methods, the h-index and the kth-rank index.
We have used LDA to cluster authors based on the titles of their papers. The clusters are used to identify the topics of authors and to calculate the TSWC-index based on those topics.
Chapter 3
Methodology
3. Methodology
Methodology refers to the methods applied in a specific field of study. In this chapter we discuss the methodology used in our research. We have chosen the h-index and the kth-rank index as our baseline methods. We applied our proposed methods to the dataset to validate the proposed solution, and then compared the results with the baseline methods.
3.1. Baseline Methods
3.1.1. h-index
The h-index was proposed by Hirsch J E [20] to assess the scientific productivity of researchers. Hirsch defined the h-index as: "A scientist has index h if h of his or her N papers have at least h citations each and the other (N − h) papers have ≤ h citations each". Mathematically, using Hirsch's relation $N_{c,t} = a h^2$:
$$h = \sqrt{\frac{N_{c,t}}{a}} \qquad (3.1)$$
where $N_{c,t}$ is the sum of the citations of all papers of the scientist and a is a proportionality constant that ranges between 3 and 5. We have set its value to 4. For example, if scientist A1 has published 10 papers and these papers have received a total of 392 citations, then
$$h = \sqrt{\frac{392}{4}} = \sqrt{98} \approx 9.90$$
The h-index assigns all of the citations of a paper to each of its co-authors.
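Equation (3.1) with a = 4 can be evaluated directly. A minimal Python sketch of the worked example (the function name and the range check on a are illustrative):

```python
import math


def h_from_total_citations(total_citations: float, a: float = 4.0) -> float:
    """Eq. (3.1): h = sqrt(N_ct / a), with the proportionality constant a in [3, 5]."""
    if not 3.0 <= a <= 5.0:
        raise ValueError("a is expected to lie between 3 and 5")
    return math.sqrt(total_citations / a)


if __name__ == "__main__":
    print(round(h_from_total_citations(392), 2))  # 9.9 (a = 4, as in the text)
    print(round(h_from_total_citations(392, a=3.0), 2))  # a smaller a gives a larger h
```

The choice of a matters: with the same 392 citations, a = 3 yields a larger estimate than a = 4, so values are only comparable when the same constant is used throughout.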
3.1.2. kth-rank Index
The kth-rank index was proposed by Sekercioglu C. H [33] to quantify the contributions of co-authors according to their rank in a paper. According to Sekercioglu, the kth-ranked co-author contributes 1/k as much as the first author; i.e., in a paper with 3 authors, the 1st author contributes 1 because he/she has rank 1, the 2nd author contributes 1/2 and the 3rd author contributes 1/3. If a paper has received 50 citations, the 1st author receives 50 citations, the 2nd author receives 25 and the 3rd author receives 17 (50/3 ≈ 16.67, rounded). The combined number of citations assigned to the co-authors of the paper is therefore 92, which is greater than the total number of citations (50) received by the paper. To avoid this, we have proposed a method of dividing the total citations received by a paper such that authors receive citations according to their rank.
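The over-counting in the 50-citation example can be reproduced with a few lines of Python. This sketch assumes that each share c/k is rounded to the nearest integer with halves rounded up (the text only shows 16.67 becoming 17); the function name is hypothetical.

```python
from typing import List


def kth_rank_citations(citations: int, n_authors: int) -> List[int]:
    """Citations credited to each author: citations / k for the kth author, rounded half up."""
    # (2c + k) // (2k) == floor(c / k + 1/2) for non-negative integers.
    return [(2 * citations + k) // (2 * k) for k in range(1, n_authors + 1)]


if __name__ == "__main__":
    shares = kth_rank_citations(50, 3)
    print(shares, sum(shares))  # [50, 25, 17] 92
```

Integer arithmetic is used instead of Python's built-in `round`, which rounds halves to the nearest even number and would therefore not round halves up.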
3.2. Proposed Method
None of the indexing methods discussed in the literature review covers topic-based weighted citations for the authors of multi-authored papers, so we have proposed two methods of assigning weights to authors. The first is Normalized Weighted Citations (NWC), which assigns weighted citations to the authors of a paper. Because the names of the authors of a paper are arranged according to their contribution, this method divides the total number of citations received by a paper among its authors according to their rank in the paper. The second method is Topic Sensitive Weighted Citations (TSWC), which increases or decreases the NWC scores of authors on the basis of their topics. The topic of each co-author is compared with the topic of the first author of the paper: if it differs, the co-author's NWC score is reduced, and the NWC scores of the first author and of the other authors who share the first author's topic are increased. We have also calculated the NWC-index and TSWC-index and compared the results with the h-index and kth-rank index. Before calculating NWC and TSWC, we clustered authors on the basis of the titles of their papers to identify their topics; the topic with the maximum probability for an author is considered his/her topic of interest. For clustering, we have used Latent Dirichlet Allocation (LDA).
Algorithm
Step 1:
Using LDA, group authors into 100 clusters (topics) based on their papers. An author's topic is taken to be the one for which he/she has the maximum probability.
Step 2:
Calculate the Normalized Weighted Citations (NWC) score of each author in each paper:
$$NWC_i = \frac{N - i + 1}{\sum_{k=1}^{N} k} \times Cit$$
where i is the rank of the author in the paper, N is the total number of authors of the paper, and Cit is the total number of citations received by the paper. The result is rounded to the nearest integer, with halves rounded up.
Step 3:
Compare the topic of each author of each paper with the topic of its first author.
1) For an author with the same topic, use the NWC_i calculated in step 2.
2) For an author with a different topic, decrease NWC_i to NWC_i / 2 and round the result.
Step 4:
Distribute the value calculated in step 3_2) for the different-topic author among the same-topic authors (including the first author):
$$NWC_j = \frac{M - j + 1}{\sum_{k=1}^{M} k} \times NWC$$
where j is the rank of the author among the M same-topic authors of the paper, and NWC is the value calculated in step 3_2).
Step 5:
Calculate the Topic Sensitive Weighted Citations (TSWC) of each same-topic author by adding the value calculated in step 4 to the value from step 3_1) for that author. For a different-topic author, use the value calculated in step 3_2) as his/her TSWC.
Step 6:
Calculate the NWC-index and TSWC-index of each author in the same way as the h-index.
Step 7:
Compare the NWC-index and TSWC-index with the baseline methods.
Note: If a paper has more than one different-topic author, the values calculated in step 3_2) are summed, and step 4 is then applied to the sum.
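Steps 2 to 5 can be sketched in Python for a single paper. This is a minimal illustration under stated assumptions: every rounding is to the nearest integer with halves rounded up (which reproduces the worked example and Tables 3.1 to 3.3), the pooled citations in Step 4 are split by rank among the same-topic authors only, and each author's topic agreement is supplied as a precomputed boolean (the LDA clustering of Step 1 is not shown). The function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def round_half_up(num: int, den: int) -> int:
    """Round num / den to the nearest integer, halves up (non-negative integers only)."""
    return (2 * num + den) // (2 * den)


def rank_shares(total: int, n: int) -> List[int]:
    """Split `total` by rank weights (n - i + 1) / (1 + 2 + ... + n) for i = 1..n."""
    denom = n * (n + 1) // 2
    return [round_half_up((n - i + 1) * total, denom) for i in range(1, n + 1)]


def nwc_tswc(cit: int, same_topic: Sequence[bool]) -> Tuple[List[int], List[int]]:
    """NWC and TSWC scores for authors listed in rank order.

    same_topic[i] says whether author i shares the first author's topic; the first
    author (index 0) is always treated as same-topic.
    """
    nwc = rank_shares(cit, len(same_topic))  # Step 2
    same_idx = [i for i, s in enumerate(same_topic) if s or i == 0]
    tswc = list(nwc)  # Step 3_1: same-topic authors keep their NWC for now
    pool = 0
    for i in range(len(nwc)):
        if i not in same_idx:
            tswc[i] = round_half_up(nwc[i], 2)  # Step 3_2: NWC / 2, rounded
            pool += tswc[i]  # Note: summed over all different-topic authors
    # Steps 4-5: split the pool by rank among the same-topic authors and add it.
    for i, extra in zip(same_idx, rank_shares(pool, len(same_idx))):
        tswc[i] += extra
    return nwc, tswc


if __name__ == "__main__":
    print(nwc_tswc(15, [True, True, False, True]))  # ([6, 5, 3, 2], [7, 6, 2, 2])
    print(nwc_tswc(15, [True, False, False, False]))  # ([6, 5, 3, 2], [12, 3, 2, 1])
    print(nwc_tswc(16, [True, True, True]))  # ([8, 5, 3], [8, 5, 3])
```

The three calls correspond to the papers of Tables 3.1, 3.2 and 3.3. Integer arithmetic is used for rounding because Python's built-in `round` rounds halves to the nearest even number (it would turn 4.5 into 4 rather than 5).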
Example:
A paper has four authors A, B, C and D with ranks 1, 2, 3 and 4 respectively, and the paper has received 15 citations. The first author, A, has the maximum contribution to the paper, so he/she is taken to have the topic of the paper. B and D have the same topic as A, while C has a different topic. According to step 2 of the proposed method, the NWC score of each author is:
A's NWC = 4/10 × 15 = 6
B's NWC = 3/10 × 15 = 4.5 ≈ 5
C's NWC = 2/10 × 15 = 3
D's NWC = 1/10 × 15 = 1.5 ≈ 2
According to step 3, authors B and D have the same topic as A, so their NWC scores remain 5 and 2 respectively, and A's NWC score remains 6 as calculated in step 2. C's topic differs from A's, so his/her NWC score is decreased as:
C's NWC = 3/2 = 1.5 ≈ 2
According to step 4, the weights of authors A, B and D in the NWC score of C calculated in step 3 are:
A's NWC = 3/6 × 2 = 1
B's NWC = 2/6 × 2 ≈ 0.67 ≈ 1
D's NWC = 1/6 × 2 ≈ 0.33 ≈ 0
In step 5, the TSWC scores of A, B and D are calculated by adding their weights from step 4 to their NWC scores from step 3:
A's TSWC = 6 + 1 = 7
B's TSWC = 5 + 1 = 6
D's TSWC = 2 + 0 = 2
For C, the value calculated in step 3 is used:
C's TSWC = 2
The following tables show the NWC and TSWC scores of the co-authors of a paper. The Topic column indicates whether an author shares the area of interest (topic) of the paper: True or False. For the first author of a paper, the topic is always True, while for the other co-authors we check their topics and assign weights to them relative to the topic of the first author. In Table 3.1, A1 is the first author because of his/her maximum contribution to the paper, so his/her topic is True for the paper. This means that he/she has that area of interest and there is no doubt about his/her contribution to the paper. The contributions of the other co-authors are judged by their topic similarity to the first author: A2 and A4 have the same topic as A1, while A3 does not. A co-author's papers may be spread over different clusters or topics; in that case, we consider the topic for which he/she has the maximum probability.
Table 3.1: NWC and TSWC scores of authors, with one author of a different topic
Author | Topic | Cit | NWC | TSWC | Rank
A1 | True | 15 | 6 | 7 | 1
A2 | True | 15 | 5 | 6 | 2
A3 | False | 15 | 3 | 2 | 3
A4 | True | 15 | 2 | 2 | 4
In the above table, the TSWC of A3 is decreased because he/she does not share A1's topic. A2 and A4 share A1's topic, so the citations taken from A3 are redistributed among A1, A2 and A4; the TSWC of A1 and A2 increases, while A4's share rounds to 0, so A4's TSWC stays equal to his/her NWC. Table 3.2 presents another example, in which more than one author has a different topic from author A1.
Table 3.2: NWC and TSWC scores of authors when more than one co-author has a different topic
Author | Topic | Cit | NWC | TSWC | Rank
A1 | True | 15 | 6 | 12 | 1
A5 | False | 15 | 5 | 3 | 2
A6 | False | 15 | 3 | 2 | 3
A7 | False | 15 | 2 | 1 | 4
In table 3.2, the TSWC scores of A5, A6 and A7 decrease, and only the TSWC score of A1 increases, because the combined citations deducted from A5, A6 and A7 are assigned to A1. Table 3.3 presents another example, in which all three authors have the same topic.
Table 3.3: NWC and TSWC scores of authors with the same topic in a paper
Author | Topic | Cit | NWC | TSWC | Rank
A4 | True | 16 | 8 | 8 | 1
A8 | True | 16 | 5 | 5 | 2
A1 | True | 16 | 3 | 3 | 3
All three authors of the paper in table 3.3 have the same topic, so we do not decrease their NWC scores; their NWC and TSWC scores are the same. The h-index, NWC-index, TSWC-index and kth-rank index of A1, A2 and A4 are:
Table 3.4: h-index, NWC-index, TSWC-index and kth-rank index
Author | Cit | h-index | NWC | NWC-index | TSWC | TSWC-index | Kth-rank citations | Kth-rank index
A1 | 46 | 3 | 20 | 2 | 27 | 2 | 35 | 2
A2 | 15 | 1 | 5 | 1 | 6 | 1 | 8 | 1
A4 | 31 | 2 | 10 | 1 | 9 | 1 | 20 | 2
In the above table, author A1 has an h-index of 3. After calculating NWC and TSWC, his/her NWC-index and TSWC-index become 2, which is less than the h-index and equal to the kth-rank index. For author A2, the NWC-index and TSWC-index remain 1, the same as the h-index and kth-rank index. A4's h-index and kth-rank index are 2, which is greater than his/her NWC-index and TSWC-index of 1. In our proposed methods, the NWC-index and TSWC-index are always less than or equal to the h-index and kth-rank index, because the citations of a paper are divided among its authors and no author receives the total citations of a multi-authored paper; only the author of a single-author paper receives all of its citations. Therefore, the NWC-index and TSWC-index are always less than or equal to the values given by the baseline methods.
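The point that no author of a multi-authored paper keeps all of its citations can be illustrated with a Python sketch. The weighting below is an assumption for illustration, not the NWC definition used in this thesis: the author at position i of n receives a share proportional to n − i + 1, rounded half up. It happens to reproduce the NWC columns of tables 3.1 and 3.3; as table 3.1 shows, rounding can make the shares add up to slightly more than the paper's citations.

```python
import math


def linear_share_split(citations, n_authors):
    """Hypothetical NWC-style split: the author at position i (1-based) gets
    citations * (n - i + 1) / (n * (n + 1) / 2), rounded half up."""
    total_weight = n_authors * (n_authors + 1) / 2
    return [math.floor(citations * (n_authors - i + 1) / total_weight + 0.5)
            for i in range(1, n_authors + 1)]


print(linear_share_split(15, 4))  # [6, 5, 3, 2]  (NWC column of table 3.1)
print(linear_share_split(16, 3))  # [8, 5, 3]     (NWC column of table 3.3)
print(linear_share_split(15, 1))  # [15]          a single author keeps all citations
```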
3.3. Latent Dirichlet Allocation
Hard clustering assigns each document to exactly one cluster without dealing with semantics. This problem motivates topic modeling techniques based on a latent topic layer. A topic model is used to discover the topics in a collection of documents; it generates soft clusters and deals with the semantics of the words in the documents. Latent Dirichlet Allocation (LDA), proposed by Blei et al. [8], is a topic model that clusters words into topics. Its latent topic layer allows a document to belong to more than one cluster. The K-means algorithm, in contrast, is an example of hard clustering and assigns only one topic to an author. We have used the generative probabilistic model LDA to cluster authors on the basis of their paper titles, which takes the multiple topics of an author into account. A generative probabilistic model randomly generates observable data once its hidden parameters are specified; it defines a joint probability distribution over observations and label sequences. In LDA, a multinomial distribution θd over the topics Z is sampled for each document d. The multinomial distribution generalizes the binomial distribution: a binomial distribution describes n independent trials that each have two possible outcomes, whereas in a multinomial distribution each of the n trials has k possible outcomes whose probabilities sum to 1. The basic terms and notation used in LDA are word, document and corpus.
• A word is an item from a vocabulary; it is the basic unit of discrete data and is denoted by w.
• A document is a collection of N words, denoted by W = {w1, w2, ..., wN}, where wn is the nth word.
• A corpus is a collection of M documents, denoted by D = {W1, W2, ..., WM}.
The latent topic layer is Z = {Z1, Z2, ..., Zk}, where Zi is a latent topic of the words wd in document d. The Dirichlet parameter α governs the topic distribution of each document, and the parameter β governs the word distribution of each topic. LDA assumes the following generative process for each document W in a corpus D:
1. Choose N ~ Poisson(ξ).
2. Choose θ ~ Dir(α).
3. For each of the N words wn:
a) Choose a topic zn ~ Multinomial(θ).
b) Choose a word wn from p(wn | zn, β), a multinomial probability conditioned on the topic zn.
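The generative process above can be simulated with a short NumPy sketch. The vocabulary size and the values of α, β and ξ are made-up illustrations; β is treated as a fixed k × V matrix of word probabilities, as in the basic model, and the function name is hypothetical.

```python
import numpy as np


def generate_document(alpha, beta, xi, rng):
    """Sample one document from LDA's generative process (steps 1-3).

    alpha: Dirichlet parameter, shape (k,)
    beta:  k x V matrix; row i holds p(w | z = i)
    xi:    Poisson mean of the document length
    """
    n_words = rng.poisson(xi)                      # 1. N ~ Poisson(xi)
    theta = rng.dirichlet(alpha)                   # 2. theta ~ Dir(alpha)
    topics, words = [], []
    for _ in range(n_words):                       # 3. for each of the N words
        z = rng.choice(len(alpha), p=theta)        #    a) z_n ~ Multinomial(theta)
        w = rng.choice(beta.shape[1], p=beta[z])   #    b) w_n ~ p(w | z_n, beta)
        topics.append(int(z))
        words.append(int(w))
    return theta, topics, words


rng = np.random.default_rng(0)
alpha = np.array([0.5, 0.5])
beta = np.array([[0.7, 0.2, 0.1, 0.0],
                 [0.0, 0.1, 0.2, 0.7]])
theta, topics, words = generate_document(alpha, beta, xi=8, rng=rng)
print(theta, topics, words)
```

Because each word is drawn only from its own topic's row of β, topic 0 here can never emit word 3 and topic 1 can never emit word 0.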
In the basic model, the following assumptions are made:
• The dimensionality k of the Dirichlet distribution, and hence of the topic variable z, is assumed known and fixed.
• The word probabilities are parameterized by a k × V matrix β, where βij = p(w^j = 1 | z^i = 1).
• The Poisson assumption is not critical to anything that follows; more realistic document length distributions can be used when needed.
Figure 3.1: Latent Dirichlet allocation
The boxes are plates. The outer plate represents documents, and the inner plate represents the repeated choice of topics and words within a document. Each document contains N words w, and each word has a topic assignment z. Unshaded and shaded circles represent hidden and observed variables, respectively. The inner plate is repeated N times, once for each word, and the outer plate is repeated M times, once for each document. The parameters α and β are estimated with the variational EM algorithm [8].
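The parameters α and β are corpus-level parameters, sampled once in the process of generating a corpus. A k-dimensional Dirichlet random variable θ satisfies θi ≥ 0 and Σi θi = 1, and has the following probability density:
$$ p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1} \qquad (3.1) $$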
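Equation (3.1) can be evaluated numerically with Python's standard library. This sketch works in log space with math.lgamma to avoid overflow of the gamma function for large α; the function name and the test points are illustrative.

```python
import math


def dirichlet_density(theta, alpha):
    """Evaluate equation (3.1): p(theta | alpha) for a point theta on the simplex."""
    if abs(sum(theta) - 1.0) > 1e-9 or any(t <= 0 for t in theta):
        raise ValueError("theta must be strictly positive and sum to 1")
    # log of the normalizing constant Gamma(sum alpha_i) / prod Gamma(alpha_i)
    log_norm = math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
    # log of theta_1^(alpha_1 - 1) * ... * theta_k^(alpha_k - 1)
    log_kernel = sum((a - 1.0) * math.log(t) for t, a in zip(theta, alpha))
    return math.exp(log_norm + log_kernel)


print(dirichlet_density([0.5, 0.5], [1.0, 1.0]))            # approximately 1 (uniform)
print(dirichlet_density([0.2, 0.3, 0.5], [1.0, 1.0, 1.0]))  # approximately 2
```

With all αi = 1 the density is constant on the simplex and equals Γ(k), which is why the two calls return about 1 and 2.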
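Given the parameters α and β, the joint distribution of a topic mixture θ, a set of N topics z and a set of N words W is given by:
$$ p(\theta, \mathbf{z}, W \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \qquad (3.2) $$
where the topic probability is simply a component of θ:
$$ p(z_n \mid \theta) = \theta_i \quad \text{for the unique } i \text{ such that } z_n^i = 1 \qquad (3.3) $$
Integrating over θ and summing over z gives the marginal distribution of a document:
$$ p(W \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta)\, p(w_n \mid z_n, \beta) \right) d\theta \qquad (3.4) $$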
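The integral in equation (3.4) is intractable to compute exactly in general. To make its meaning concrete, here is a simple Monte Carlo sketch in NumPy that draws θ from Dir(α) and averages the bracketed product. This is a swapped-in approximation for illustration only, not the variational EM procedure [8] used to fit the model; the function name and parameter values are hypothetical.

```python
import numpy as np


def doc_marginal_mc(words, alpha, beta, n_samples=20000, seed=0):
    """Monte Carlo estimate of equation (3.4), p(W | alpha, beta).

    Draws theta ~ Dir(alpha) and averages
    prod_n sum_z p(z | theta) p(w_n | z, beta) over the draws.
    """
    rng = np.random.default_rng(seed)
    thetas = rng.dirichlet(alpha, size=n_samples)   # shape (S, k)
    # per_word[s, n] = sum_z theta[s, z] * beta[z, w_n]
    per_word = thetas @ beta[:, words]              # shape (S, N)
    return per_word.prod(axis=1).mean()


beta = np.array([[0.7, 0.2, 0.1, 0.0],
                 [0.0, 0.1, 0.2, 0.7]])
print(doc_marginal_mc([0, 0, 3], np.array([0.5, 0.5]), beta))
```

For a one-word document the exact value is E[θ]·β[:, w], for example 0.5 × 0.7 = 0.35 for word 0 above, which the estimate approaches as the number of samples grows.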
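Finally, the probability of the corpus is obtained by multiplying the marginal probabilities of its documents:
$$ p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d)\, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d \qquad (3.5) $$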
Here θd is a document-level variable, sampled once per document, while zdn and wdn are word-level variables, sampled once for each word in each document.
LDA assumes that topics generate words. Words are the only observed data in documents; the model assumes that each document consists of several topics and that each word is generated by one of the document's topics. First, a multinomial distribution θ over topics is randomly sampled for each document d with parameter α. Second, a topic z is chosen for each word w from this topic distribution. Finally, the word w is generated by randomly sampling from the topic-specific multinomial distribution φz.
The probability of generating a word w in document d under LDA is:
$$ p(w \mid d, \theta, \phi) = \sum_{z=1}^{k} p(w \mid z, \phi_z)\, p(z \mid d, \theta_d) \qquad (3.6) $$
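Equation (3.6) is a dot product between the document's topic proportions and the column of φ that belongs to the word. A minimal NumPy sketch, with made-up values of θd and φ and a hypothetical function name:

```python
import numpy as np


def word_probability(word_id, theta_d, phi):
    """Equation (3.6): p(w | d) = sum over topics z of p(w | z, phi_z) * p(z | d, theta_d).

    theta_d: topic proportions of document d, shape (k,)
    phi:     topic-word distributions, shape (k, V)
    """
    return float(np.dot(phi[:, word_id], theta_d))


theta_d = np.array([0.25, 0.75])
phi = np.array([[0.7, 0.2, 0.1, 0.0],
                [0.0, 0.1, 0.2, 0.7]])
print(word_probability(3, theta_d, phi))   # approximately 0.525 (= 0.25*0.0 + 0.75*0.7)
```

Summing this probability over every word in the vocabulary gives 1, since both θd and each row of φ are probability distributions.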
Chapter 4
Experiments
4. Experiments
In this chapter, we discuss the experiments performed on our data set and their results. The chapter is organized in four parts. First, we describe the data set used in our experiments. Second, we discuss the parameter settings for the baseline methods and the proposed methods. Third, we briefly describe the baseline methods chosen for comparison. Fourth, we present the results of our proposed methods and compare them with those of the baseline methods.
4.1. Data set
We chose the DBLP-Citation-network V5 data set from arnetminer.org [14] for our experiments. DBLP is a computer science bibliography that contains about 1.8 million publications by about 1 million authors. Our data set consists of the following data variables:
• Paper ID
• Authors
• Title
• Citations
The size of our data set is:
• Number of authors: 127410
• Number of publications: 100000
We preprocessed the titles by removing stop words, punctuation and numbers to improve the quality of the results. The stop-word list, which is available on the internet, contains words such as a, an, the, is, are, than, of, to, in and on.
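A Python sketch of this preprocessing step follows. The thesis used PHP for this step; the version below assumes lower-casing (not stated in the text) and uses only the stop words named above, whereas the full list taken from the internet is longer. The function name is hypothetical.

```python
import re

# Assumption: only the stop words named in the text; the real list is longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "than", "of", "to", "in", "on"}


def preprocess_title(title):
    """Lower-case a title, drop numbers and punctuation, then remove stop words."""
    title = title.lower()
    title = re.sub(r"[0-9]+", " ", title)      # remove numbers
    title = re.sub(r"[^\w\s]|_", " ", title)   # remove punctuation (and underscores)
    return [token for token in title.split() if token not in STOP_WORDS]


print(preprocess_title("Mining Association Rules in Large Databases, 1993"))
# ['mining', 'association', 'rules', 'large', 'databases']
```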
4.2. Development Tools and Programming Languages
We preprocessed our data set using PHP in the WampServer2.2a-x32 Windows web development environment. We implemented LDA in C++ and used Java to calculate our proposed indices.
4.3. Parameter settings
4.3.1. Parameters for LDA
In our experiments, authors were clustered with LDA on the basis of the titles of their papers. The corpus-level parameters α and β were set to 50/z and 0.01, respectively, where z is the number of topics, which we set to 100 for our data set. We selected 100 topics on the basis of human judgment of meaningful topics together with perplexity [8], which estimates the performance of probabilistic topic models; the lower the perplexity, the better the model.
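For reference, the perplexity of a held-out set of documents, as used in [8], is exp(−Σd log p(wd) / Σd Nd). The Python sketch below assumes the per-document log-likelihoods have already been computed from a trained model; the function name and the numbers in the sanity check are illustrative.

```python
import math


def perplexity(doc_log_likelihoods, doc_lengths):
    """Held-out perplexity: exp(-sum_d log p(w_d) / sum_d N_d).

    doc_log_likelihoods: natural-log likelihood log p(w_d) of each document
    doc_lengths:         number of words N_d in each document
    """
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_lengths))


# Sanity check: a model that gives every word of a V-word vocabulary
# probability 1/V has perplexity V.
V = 1000
lengths = [5, 8, 3]
log_liks = [n * math.log(1.0 / V) for n in lengths]
print(perplexity(log_liks, lengths))   # approximately 1000
```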
4.3.2. Parameter for h-index
The h-index is estimated by dividing the total number of citations of an author's papers by a proportionality constant a and then taking the square root of the result, following Hirsch's empirical relation N = a·h², where N is the total number of citations. The constant a ranges between 3 and 5; we selected a = 4.
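The index calculation can be written in Python as below. Rounding down to an integer is inferred from the tables rather than stated in the text: with a = 4, floor(√(total / 4)) reproduces the index values in tables 3.4 and 4.1 to 4.4 for the totals checked (for example, 46 citations give 3 and 36114 give 95). The function name is hypothetical.

```python
import math


def index_from_total(total, a=4):
    """Index = floor(sqrt(total / a)) with proportionality constant a (3 <= a <= 5).

    Uses the integer-safe identity floor(sqrt(x)) == isqrt(floor(x)) for x >= 0.
    """
    if total < 0 or not 3 <= a <= 5:
        raise ValueError("total must be non-negative and a must lie in [3, 5]")
    return math.isqrt(int(total // a))


# Totals from table 3.4 (A1: citations, NWC, TSWC, kth-rank) and table 4.1
print(index_from_total(46), index_from_total(20), index_from_total(27), index_from_total(35))
print(index_from_total(36114), index_from_total(34194))
```

The same function applies to NWC and TSWC totals, which may be fractional; `total // a` floors a float as well.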
4.4. Baseline Methods
4.4.1. H-index
One of our baseline methods is the h-index, a well-known index for assessing a researcher's work. The h-index considers neither the individual contribution of each author to a paper nor topic sensitivity when assigning citations. We address these issues in our proposed indices, the NWC-index and the TSWC-index. We chose 30 authors for analysis.
4.4.2. Kth-rank index
The second baseline method is the kth-rank index, which considers the individual contribution of each author of a paper and assigns weighted citations to the authors according to their contributions. In this method, the first author of a paper always receives all citations of that paper, as every author does in the h-index, while the other authors receive fewer citations than the first author. This method does not consider topic sensitivity when assigning weighted citations, which we address in the TSWC-index. We chose the same 30 authors for analysis.
4.5. Results and Discussions
In this part, we compare the results of our proposed indices, the NWC-index and the TSWC-index, with those of the baseline methods, and we discuss the authors who obtain the same rank as, or a different rank from, their rank in the baseline methods. Before calculating the NWC-index, we clustered the authors with LDA on the basis of their papers to identify their topics, which are then used in calculating the TSWC-index.
Comparison with baseline methods
4.5.1. H-index
The h-index assigns all citations of a paper to each of its authors. We chose 30 authors for analysis. The following table shows the ranks of the authors calculated using the h-index.
Table 4.1: Rank of authors by their h-index
S.No | Rank | Authors | Citations | h-index
1 | 1 | David E. Goldberg | 36114 | 95
2 | 2 | William G. Cochran | 34194 | 92
3 | 3 | C. A. R. Hoare | 28234 | 84
4 | 4 | J. Ross Quinlan | 16859 | 64
5 | 5 | Jeffrey D. Ullman | 15825 | 62
6 | 6 | Bertrand Meyer | 14960 | 61
7 | 7 | Jiawei Han | 11376 | 53
8 | 8 | Ricardo A. Baeza-Yates | 10628 | 51
9 | 9 | Micheline Kamber | 10144 | 50
10 | 10 | Edmund M. Clarke | 9759 | 49
11 | 11 | Paul C. van Oorschot | 9022 | 47
12 | 11 | Alfred Menezes | 8940 | 47
13 | 12 | Scott A. Vanstone | 8606 | 46
14 | 12 | Gerard Salton | 8499 | 46
15 | 13 | Michael McGill | 8440 | 45
16 | 13 | Berthier A. Ribeiro-Neto | 8379 | 45
17 | 14 | Frederick Eddy | 7966 | 44
18 | 14 | Michael R. Blaha | 7966 | 44
19 | 14 | William J. Premerlani | 7966 | 44
20 | 14 | James E. Rumbaugh | 7966 | 44
21 | 14 | William E. Lorensen | 7966 | 44
22 | 14 | Gregory Piatetsky-Shapiro | 7846 | 44
23 | 15 | Anil K. Jain | 7554 | 43
24 | 15 | Rajeev Motwani | 7489 | 43
25 | 16 | Prabhakar Raghavan | 7293 | 42
26 | 17 | Bernd-Holger Schlingloff | 6999 | 41
27 | 17 | Christos H. Papadimitriou | 6913 | 41
28 | 18 | Usama M. Fayyad | 6268 | 39
29 | 19 | Richard C. Dubes | 6064 | 38
30 | 19 | Franco P. Preparata | 6010 | 38
In table 4.1, the authors are ranked by their h-index: the author with the highest h-index is ranked 1, the author with the second-highest h-index is ranked 2, and so on. Authors with equal h-index values share the same rank.
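The ranking used in tables 4.1 to 4.4 behaves like a dense ranking: tied values share a rank and the next distinct value takes the next consecutive rank. A short Python sketch of that rule, with a hypothetical function name:

```python
def dense_rank(scores):
    """Rank scores in descending order; equal scores share a rank and the
    next distinct score receives the next consecutive rank."""
    distinct = sorted(set(scores), reverse=True)
    rank_of = {value: position + 1 for position, value in enumerate(distinct)}
    return [rank_of[s] for s in scores]


# h-index values of the first 14 authors in table 4.1
h_values = [95, 92, 84, 64, 62, 61, 53, 51, 50, 49, 47, 47, 46, 46]
print(dense_rank(h_values))
# [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 11, 12, 12]
```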
4.5.2. Kth-rank index
The kth-rank index assigns the citations of a paper to its co-authors according to their positions in the author list. It thus assigns weighted citations to the authors of a paper, unlike the h-index, which assigns the total citations of a paper to each of its co-authors even when their contributions to the paper differ. In the kth-rank index, only the first author of a co-authored paper receives the total citations of that paper. The following table shows the ranks of authors calculated using the kth-rank index.
Table 4.2: Rank of authors by their kth-rank index
S.No | Rank | Authors | Citations | Kth-rank citations | Kth-rank index
1 | 1 | David E. Goldberg | 36114 | 36097 | 94
2 | 2 | William G. Cochran | 34194 | 34194 | 92
3 | 3 | C. A. R. Hoare | 28234 | 28231 | 84
4 | 4 | J. Ross Quinlan | 16859 | 16859 | 64
5 | 5 | Bertrand Meyer | 14960 | 14868 | 60
6 | 6 | Jeffrey D. Ullman | 15825 | 13835 | 58
7 | 7 | Jiawei Han | 11376 | 10734 | 51
8 | 8 | Ricardo A. Baeza-Yates | 10628 | 9438 | 48
9 | 9 | Edmund M. Clarke | 9759 | 8741 | 46
10 | 9 | Alfred Menezes | 8940 | 8701 | 46
11 | 9 | Gerard Salton | 8499 | 8499 | 46
12 | 10 | James E. Rumbaugh | 7966 | 7966 | 44
13 | 11 | Rajeev Motwani | 7489 | 7324 | 42
14 | 12 | Anil K. Jain | 7554 | 6685 | 40
15 | 13 | Usama M. Fayyad | 6268 | 6239 | 39
16 | 13 | Christos H. Papadimitriou | 6913 | 6149 | 39
17 | 14 | Franco P. Preparata | 6010 | 5985 | 38
18 | 15 | Micheline Kamber | 10144 | 5048 | 35
19 | 16 | Gregory Piatetsky-Shapiro | 7846 | 4611 | 33
20 | 16 | Paul C. van Oorschot | 9022 | 4580 | 33
21 | 17 | Michael McGill | 8440 | 4220 | 32
22 | 17 | Berthier A. Ribeiro-Neto | 8379 | 4162 | 32
23 | 18 | Michael R. Blaha | 7966 | 3983 | 31
24 | 19 | Prabhakar Raghavan | 7293 | 3639 | 30
25 | 20 | Bernd-Holger Schlingloff | 6999 | 3480 | 29
26 | 21 | Richard C. Dubes | 6064 | 3032 | 27
27 | 22 | Scott A. Vanstone | 8606 | 2880 | 26
28 | 23 | William J. Premerlani | 7966 | 2655 | 25
29 | 24 | Frederick Eddy | 7966 | 1992 | 22
30 | 25 | William E. Lorensen | 7966 | 1593 | 19
4.5.3. Proposed methods NWC-index and TSWC-index
The NWC-index, one of our proposed methods, divides the citations received by a paper among its co-authors so that the first author receives the largest share, the second author the second-largest share, and so on. In this method, no co-author receives the total citations of a paper; only the author of a single-author paper does. The following table shows the ranks of authors calculated by the NWC-index.
Table 4.3: Rank of authors by their NWC-index
S.No | Rank | Authors | Citations | NWC | NWC-index
1 | 1 | David E. Goldberg | 36114 | 36085 | 94
2 | 2 | William G. Cochran | 34194 | 34194 | 92
3 | 3 | C. A. R. Hoare | 28234 | 28227 | 84
4 | 4 | J. Ross Quinlan | 16859 | 16859 | 64
5 | 5 | Bertrand Meyer | 14960 | 14845 | 60
6 | 6 | Jeffrey D. Ullman | 15825 | 13175 | 57
7 | 7 | Jiawei Han | 11376 | 7095 | 42
8 | 8 | Ricardo A. Baeza-Yates | 10628 | 6311 | 39
9 | 9 | Gerard Salton | 8499 | 5686 | 37
10 | 10 | Edmund M. Clarke | 9759 | 5469 | 36
11 | 11 | Rajeev Motwani | 7489 | 4875 | 34
12 | 12 | Anil K. Jain | 7554 | 4386 | 33
13 | 12 | Alfred Menezes | 8940 | 4358 | 33
14 | 13 | Christos H. Papadimitriou | 6913 | 4231 | 32
15 | 14 | Franco P. Preparata | 6010 | 3987 | 31
16 | 15 | Micheline Kamber | 10144 | 3363 | 28
17 | 15 | Gregory Piatetsky-Shapiro | 7846 | 3201 | 28
18 | 16 | Paul C. van Oorschot | 9022 | 3081 | 27
19 | 17 | Usama M. Fayyad | 6268 | 2872 | 26
20 | 17 | Michael McGill | 8440 | 2813 | 26
21 | 17 | Berthier A. Ribeiro-Neto | 8379 | 2753 | 26
22 | 18 | James E. Rumbaugh | 7966 | 2655 | 25
23 | 19 | Prabhakar Raghavan | 7293 | 2423 | 24
24 | 19 | Bernd-Holger Schlingloff | 6999 | 2317 | 24
25 | 20 | Michael R. Blaha | 7966 | 2124 | 23
26 | 21 | Richard C. Dubes | 6064 | 2021 | 22
27 | 22 | William J. Premerlani | 7966 | 1593 | 19
28 | 22 | Scott A. Vanstone | 8606 | 1445 | 19
29 | 23 | Frederick Eddy | 7966 | 1062 | 16
30 | 24 | William E. Lorensen | 7966 | 531 | 11
Table 4.3 shows the researchers' ranks according to the proposed NWC-index. Both the ranks and the index values differ from those of the h-index. The NWC scores of some researchers decrease because our proposed method divides the total citations of a paper among its co-authors; only the author of a single-author paper receives the total citations of that paper. William G. Cochran and J. Ross Quinlan have NWC scores equal to the total citations of their papers because they are the sole authors of all their papers, so their h-index and NWC-index are the same. After calculating the NWC score of each author of each paper, we checked the authors' topics and calculated their TSWC scores for each paper, which are then used to calculate the TSWC-index. The following table shows the ranks of authors after calculating the TSWC score of each author.
Table 4.4: Rank of authors by their TSWC-index
S.No | Rank | Authors | NWC | TSWC | TSWC-index
1 | 1 | David E. Goldberg | 36085 | 36084 | 94
2 | 2 | William G. Cochran | 34194 | 34194 | 92
3 | 3 | C. A. R. Hoare | 28227 | 28227 | 84
4 | 4 | J. Ross Quinlan | 16859 | 16859 | 64
5 | 5 | Bertrand Meyer | 14845 | 14834 | 60
6 | 6 | Jeffrey D. Ullman | 13175 | 12878 | 56
7 | 7 | Ricardo A. Baeza-Yates | 6311 | 7292 | 42
8 | 7 | Jiawei Han | 7095 | 7110 | 42
9 | 8 | Edmund M. Clarke | 5469 | 6494 | 40
10 | 9 | Rajeev Motwani | 4875 | 6074 | 38
11 | 10 | Gerard Salton | 5686 | 5686 | 37
12 | 11 | Alfred Menezes | 4358 | 5311 | 36
13 | 11 | Anil K. Jain | 4386 | 5282 | 36
14 | 12 | Franco P. Preparata | 3987 | 4981 | 35
15 | 13 | Christos H. Papadimitriou | 4231 | 4813 | 34
16 | 14 | Usama M. Fayyad | 2872 | 3483 | 29
17 | 15 | Micheline Kamber | 3363 | 3363 | 28
18 | 16 | Michael McGill | 2813 | 2813 | 26
19 | 17 | James E. Rumbaugh | 2655 | 2655 | 25
20 | 18 | Gregory Piatetsky-Shapiro | 3201 | 2403 | 24
21 | 19 | Michael R. Blaha | 2124 | 2124 | 23
22 | 20 | Scott A. Vanstone | 1445 | 1912 | 21
23 | 21 | Paul C. van Oorschot | 3081 | 1658 | 20
24 | 22 | William J. Premerlani | 1593 | 1593 | 19
25 | 23 | Berthier A. Ribeiro-Neto | 2753 | 1397 | 18
26 | 24 | Prabhakar Raghavan | 2423 | 1213 | 17
27 | 24 | Bernd-Holger Schlingloff | 2317 | 1180 | 17
28 | 25 | Frederick Eddy | 1062 | 1062 | 16
29 | 26 | Richard C. Dubes | 2021 | 1011 | 15
30 | 27 | William E. Lorensen | 531 | 531 | 11
Table 4.4 shows the variations in the ranks of authors after calculating the TSWC score by checking the topics of the authors of each paper: the NWC score of an author whose topic differs from that of the first author is decreased, and the NWC score of an author with the same topic is increased, as shown in the TSWC column. The TSWC-index is then calculated for all authors. The citation scores of authors such as Ricardo A. Baeza-Yates, Jiawei Han and Edmund M. Clarke increase in TSWC because they share the topic of the first author of the papers in which they are included. William G. Cochran and J. Ross Quinlan keep the same citation scores as in the h-index table because they are the sole authors of their papers. The TSWC citation scores of Jeffrey D. Ullman and Prabhakar Raghavan decrease because their topic differs from that of the first author in some of their papers. The variations in the indices and ranks of the authors are shown in the following table.
Table 4.5: Rank of authors by their TSWC-index and variations with NWC-index
S.No | Authors | NWC-index | TSWC-index | Variation in index | Earned position in rank
1 | David E. Goldberg | 94 | 94 | 0 | 0
2 | William G. Cochran | 92 | 92 | 0 | 0
3 | C. A. R. Hoare | 84 | 84 | 0 | 0
4 | J. Ross Quinlan | 64 | 64 | 0 | 0
5 | Bertrand Meyer | 60 | 60 | 0 | 0
6 | Jeffrey D. Ullman | 57 | 56 | -1 | 0
7 | Ricardo A. Baeza-Yates | 39 | 42 | +3 | +1
8 | Jiawei Han | 42 | 42 | 0 | 0
9 | Edmund M. Clarke | 36 | 40 | +4 | +2
10 | Rajeev Motwani | 34 | 38 | +4 | +2
11 | Gerard Salton | 37 | 37 | 0 | -1
12 | Alfred Menezes | 33 | 36 | +3 | +1
13 | Anil K. Jain | 33 | 36 | +3 | +1
14 | Franco P. Preparata | 31 | 35 | +4 | +2
15 | Christos H. Papadimitriou | 32 | 34 | +2 | 0
16 | Usama M. Fayyad | 26 | 29 | +3 | +3
17 | Micheline Kamber | 28 | 28 | 0 | 0
18 | Michael McGill | 26 | 26 | 0 | +1
19 | James E. Rumbaugh | 25 | 25 | 0 | +1
20 | Gregory Piatetsky-Shapiro | 28 | 24 | -4 | -3
21 | Michael R. Blaha | 23 | 23 | 0 | +1
22 | Scott A. Vanstone | 19 | 21 | +2 | +2
23 | Paul C. van Oorschot | 27 | 20 | -7 | -5
24 | William J. Premerlani | 19 | 19 | 0 | 0
25 | Berthier A. Ribeiro-Neto | 26 | 18 | -8 | -6
26 | Prabhakar Raghavan | 24 | 17 | -7 | -5
27 | Bernd-Holger Schlingloff | 24 | 17 | -7 | -5
28 | Frederick Eddy | 16 | 16 | 0 | -2
29 | Richard C. Dubes | 22 | 15 | -7 | -5
30 | William E. Lorensen | 11 | 11 | 0 | -3
In table 4.5, the minus (-) and plus (+) signs represent a decrease and an increase, respectively, in the TSWC-index and rank compared with the NWC-index and its rank. Rajeev Motwani has an NWC-index of 34 and a TSWC-index of 38, so his TSWC-index increases by 4 (denoted by +4); his rank rises by 2, from 11 in table 4.3 to 9 in table 4.4. Rajeev Motwani's TSWC-index increases because some of his co-authors do not share his topic of interest, so their TSWC scores are decreased and his is increased. Jiawei Han has a rank of 7 and an index of 42 in both table 4.3 and table 4.4, so there is no change in his rank, which is denoted by 0 in table 4.5. Richard C. Dubes has an NWC-index of 22 and a TSWC-index of 15, so his TSWC-index decreases by 7 points and his rank drops by 5 positions.
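The last two columns of table 4.5 follow from two subtractions, shown in the Python sketch below with values read from tables 4.3 and 4.4 (the function name is hypothetical). Because rank 1 is the best position, positions earned are the old rank minus the new rank.

```python
def variation_and_earned(nwc_index, nwc_rank, tswc_index, tswc_rank):
    """Table 4.5 columns: change in index (TSWC - NWC) and positions earned
    (old rank - new rank, so a move from rank 11 to rank 9 is +2)."""
    return tswc_index - nwc_index, nwc_rank - tswc_rank


# (NWC-index, NWC rank, TSWC-index, TSWC rank) from tables 4.3 and 4.4
authors = {
    "Rajeev Motwani": (34, 11, 38, 9),
    "Jiawei Han": (42, 7, 42, 7),
    "Richard C. Dubes": (22, 21, 15, 26),
}
for name, values in authors.items():
    print(name, variation_and_earned(*values))
# Rajeev Motwani (4, 2)
# Jiawei Han (0, 0)
# Richard C. Dubes (-7, -5)
```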
The following figure compares all the baseline methods and the proposed methods.
Comparison of baseline indices and proposed indices
Figure 4.1: Comparison of h-index, kth-rank index, NWC-index and TSWC-index (index rank of each of the 30 authors; series: h-index rank, kth-rank index rank, NWC-index rank, TSWC-index rank)
Figure 4.1 compares the ranks of the authors under the h-index, kth-rank index, NWC-index and TSWC-index. The NWC-index and TSWC-index of the authors are less than or equal to their values under the baseline methods. Neither proposed index can exceed the baseline indices because, in our proposed methods, only an author whose papers are all single-authored receives all of their citations, so his/her NWC-index and TSWC-index equal the h-index; for co-authored papers, the NWC-index and TSWC-index are less than the h-index and kth-rank index. Under the NWC-index and TSWC-index, the ranks of most authors decrease compared with their h-index ranks. We now compare each proposed method with the baseline methods.
Comparison of NWC-index with h-index
Figure 4.2: Comparison of h-index with NWC-index (index rank of each author; series: h-index rank, NWC-index rank)
4.5.4. Scenarios of NWC-index
The NWC-index is either equal to or less than the h-index and kth-rank index, but the ranks of authors can change in different ways between these indices. Some authors have an NWC-index less than or equal to their h-index but a higher rank than under the h-index; some keep the same rank as under the h-index; and others have a lower rank than their h-index rank. The three scenarios, rank up, rank down and rank stable, are described below.
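The three scenarios can be expressed as a simple comparison of an author's baseline rank with his/her NWC-index rank. The Python sketch below uses rank pairs from tables 4.6, 4.7 and 4.8; the function name is hypothetical, and the same rule applies when the baseline is the kth-rank index.

```python
def scenario(baseline_rank, nwc_rank):
    """Classify an author's relocation; rank 1 is best, so a smaller NWC rank is 'up'."""
    if nwc_rank < baseline_rank:
        return "up"
    if nwc_rank > baseline_rank:
        return "down"
    return "stable"


# (h-index rank, NWC-index rank) pairs from tables 4.6, 4.7 and 4.8
examples = {
    "Franco P. Preparata": (19, 14),
    "Scott A. Vanstone": (12, 22),
    "Jiawei Han": (7, 7),
}
for name, (h_rank, nwc_rank) in examples.items():
    print(name, scenario(h_rank, nwc_rank), h_rank - nwc_rank)
# Franco P. Preparata up 5
# Scott A. Vanstone down -10
# Jiawei Han stable 0
```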
4.5.4.1. Comparison of NWC-index with h-index
Scenario 1: Relocation with respect to h-index rank: Rank up
Table 4.6: Position relocation with respect to h-index: Position up
S.No | Authors | h-index rank | h-index | NWC-index rank | NWC-index | Earned position in NWC-index
1 | Bertrand Meyer | 6 | 61 | 5 | 60 | +1
2 | Gerard Salton | 12 | 46 | 9 | 37 | +3
3 | Anil K. Jain | 15 | 43 | 12 | 33 | +3
4 | Rajeev Motwani | 15 | 43 | 11 | 34 | +4
5 | Christos H. Papadimitriou | 17 | 41 | 13 | 32 | +4
6 | Usama M. Fayyad | 18 | 39 | 17 | 26 | +1
7 | Franco P. Preparata | 19 | 38 | 14 | 31 | +5
In the above table, Bertrand Meyer has an h-index of 61 and is ranked 6th. His NWC-index is 60, which is less than his h-index because the NWC-index is calculated from the weighted citations each author receives in multi-authored papers; nevertheless, he moves up one position compared with his h-index rank. Similarly, Anil K. Jain's NWC-index is 10 less than his h-index, but his rank rises by 3. Franco P. Preparata has an h-index of 38 and an NWC-index of 31, yet his rank rises by 5 positions.
Figure 4.3: Scenario 1: Position up with respect to h-index (index rank of each author; series: h-index rank, NWC-index rank)
In figure 4.3, the taller bar represents the h-index rank of each author and the shorter bar represents the NWC-index rank. Since rank 1 is the best position, a taller bar indicates a lower rank.
Scenario 2: Relocation with respect to h-index rank: Rank down
Table 4.7: Position relocation with respect to h-index: Position down
S.No | Authors | h-index rank | h-index | NWC-index rank | NWC-index | Position down in NWC-index
1 | Jeffrey D. Ullman | 5 | 62 | 6 | 57 | -1
2 | Micheline Kamber | 9 | 50 | 15 | 28 | -6
3 | Paul C. van Oorschot | 11 | 47 | 16 | 27 | -5
4 | Alfred Menezes | 11 | 47 | 12 | 33 | -1
5 | Scott A. Vanstone | 12 | 46 | 22 | 19 | -10
6 | Michael McGill | 13 | 45 | 17 | 26 | -4
7 | Berthier A. Ribeiro-Neto | 13 | 45 | 17 | 26 | -4
8 | James E. Rumbaugh | 14 | 44 | 18 | 25 | -4
9 | Michael R. Blaha | 14 | 44 | 20 | 23 | -6
10 | William J. Premerlani | 14 | 44 | 22 | 19 | -8
11 | Frederick Eddy | 14 | 44 | 23 | 16 | -9
12 | William E. Lorensen | 14 | 44 | 24 | 11 | -10
13 | Gregory Piatetsky-Shapiro | 14 | 44 | 15 | 28 | -1
14 | Prabhakar Raghavan | 16 | 42 | 19 | 24 | -3
15 | Bernd-Holger Schlingloff | 17 | 41 | 19 | 24 | -2
16 | Richard C. Dubes | 19 | 38 | 21 | 22 | -2
In table 4.7, the NWC-index of each author decreases because the citations are divided among co-authors, and along with the NWC-index, the authors' NWC-index ranks also fall compared with their h-index ranks. Jeffrey D. Ullman is ranked 5th by h-index and 6th by NWC-index, so he moves down one position. Scott A. Vanstone drops 10 positions, from 12th by h-index to 22nd by NWC-index. The other authors follow the same pattern.
Figure 4.4: Scenario 2: Position down with respect to h-index (index rank of each author; series: h-index rank, NWC-index rank)
In the above figure, the shorter h-index rank bars denote higher ranks, while the taller NWC-index rank bars denote lower ranks.
Scenario 3: Position stable with respect to h-index
Table 4.8: Position stable with respect to h-index
S.No | Authors | h-index rank | h-index | NWC-index rank | NWC-index
1 | David E. Goldberg | 1 | 95 | 1 | 94
2 | William G. Cochran | 2 | 92 | 2 | 92
3 | C. A. R. Hoare | 3 | 84 | 3 | 84
4 | J. Ross Quinlan | 4 | 64 | 4 | 64
5 | Jiawei Han | 7 | 53 | 7 | 42
6 | Ricardo A. Baeza-Yates | 8 | 51 | 8 | 39
7 | Edmund M. Clarke | 10 | 49 | 10 | 36
In table 4.8, all authors have the same rank under both the h-index and the NWC-index, even though their h-index and NWC-index values differ, because in the NWC-index the citations of a paper are divided among its co-authors. William G. Cochran and J. Ross Quinlan are the sole authors of their papers, and David E. Goldberg has the most citations even though he has co-authors on some of his papers, so they all keep the same rank under the NWC-index as under the h-index.
Figure 4.5: Scenario 3: Position stable with respect to h-index (index rank of each author; series: h-index rank, NWC-index rank)
4.5.4.2. Comparison of NWC-index with kth-rank index
An author's NWC-index may be equal to or less than his/her kth-rank index, and there are three scenarios for the authors' ranks under the two indices. The NWC-index cannot be higher than the kth-rank index because, in our proposed method, the total citations of a paper are divided among its authors and none of them receives the total citations of that paper, except the author of a single-author paper. Some authors have an NWC-index less than or equal to their kth-rank index but a higher rank; some keep the same rank as under the kth-rank index; and others have a lower rank than their kth-rank index rank. The three scenarios, rank up, rank down and rank stable, are described below.
Scenario 1: Relocation with respect to kth-rank index rank: Rank up
Table 4.9: Position relocation with respect to kth-rank index: Position up
S.No | Authors | Kth-rank index rank | Kth-rank index | NWC-index | NWC-index rank | Earned position in NWC-index
1 | Gregory Piatetsky-Shapiro | 16 | 33 | 28 | 15 | +1
2 | Bernd-Holger Schlingloff | 20 | 29 | 24 | 19 | +1
3 | William J. Premerlani | 23 | 25 | 19 | 22 | +1
4 | Frederick Eddy | 24 | 22 | 16 | 23 | +1
5 | William E. Lorensen | 25 | 19 | 11 | 24 | +1
In table 4.9, the NWC-index of every author decreases, yet each author's NWC-index rank rises by one position. Because the first author of a paper does not receive the total citations of the paper in the NWC-index, as he/she does in the kth-rank index, the authors' NWC-index values decrease, but these authors nevertheless obtain higher ranks than under the kth-rank index.
Figure 4.6: Scenario 1: Position up with respect to kth-rank index (index rank of each author; series: kth-rank index rank, NWC-index rank)
The shorter NWC-index rank bars show the rise in the authors' ranks compared with their kth-rank index ranks.
Scenario 2: Relocation with respect to kth-rank index: Rank down
Table 4.10: Position relocation with respect to kth-rank index: Position down
S.No | Authors | Kth-rank | Kth-rank index | NWC-index | NWC-index Rank | Position down in NWC-index
1 | Edmund M. Clarke | 9 | 46 | 36 | 10 | -1
2 | Alfred Menezes | 9 | 46 | 33 | 12 | -3
3 | James E. Rumbaugh | 10 | 44 | 25 | 18 | -8
4 | Usama M. Fayyad | 13 | 39 | 26 | 17 | -4
5 | Michael R. Blaha | 18 | 31 | 23 | 20 | -2
In Table 4.10, the authors' ranks drop along with their NWC-index values. For these authors the
gap between the kth-rank index and the NWC-index is large, so they lose positions in the
NWC-index ranking. For example, James E. Rumbaugh has a kth-rank index of 44; his NWC-index
falls to 25 and his rank drops from 10 to 18.
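The explanation above can be checked against Table 4.10 itself. The sketch below recomputes the "Position down" column as kth-rank minus NWC-index rank and prints each author's drop in index value next to it. All numbers come from Table 4.10; the variable names are only illustrative.

```python
# (author, kth-rank, kth-rank index, NWC-index, NWC-index rank, reported position change)
table_4_10 = [
    ("Edmund M. Clarke", 9, 46, 36, 10, -1),
    ("Alfred Menezes", 9, 46, 33, 12, -3),
    ("James E. Rumbaugh", 10, 44, 25, 18, -8),
    ("Usama M. Fayyad", 13, 39, 26, 17, -4),
    ("Michael R. Blaha", 18, 31, 23, 20, -2),
]

for author, old_rank, old_index, new_index, new_rank, reported in table_4_10:
    move = old_rank - new_rank       # negative = lost positions
    gap = old_index - new_index      # how far the index value fell
    assert move == reported, author  # the table's last column is consistent
    print(f"{author:18s} index drop {gap:2d}  position change {move:+d}")
```

Rumbaugh has both the largest drop in index value (19) and the largest loss of positions (8), in line with the text. The relation is not strictly monotonic, however: Menezes and Fayyad both drop by 13 but lose 3 and 4 positions, because the outcome also depends on where the neighbouring authors land.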
Figure 4.7: Scenario 2: Position down with respect to kth-rank index (bar chart; x-axis: authors; y-axis: index rank; series: Kth-rank, NWC-index Rank)
Scenario 3: Position stable with respect to kth-rank index
Table 4.11: Position stable with respect to kth-rank index
S.No | Authors | Kth-rank | Kth-rank index | NWC-index | NWC-index Rank
1 | David E. Goldberg | 1 | 94 | 94 | 1
2 | William G. Cochran | 2 | 92 | 92 | 2
3 | C. A. R. Hoare | 3 | 84 | 84 | 3
4 | J. Ross Quinlan | 4 | 64 | 64 | 4
5 | Bertrand Meyer | 5 | 60 | 60 | 5
6 | Jeffrey D. Ullman | 6 | 58 | 57 | 6
7 | Jiawei Han | 7 | 51 | 42 | 7
8 | Ricardo A. Baeza-Yates | 8 | 48 | 39 | 8
9 | Gerard Salton | 9 | 46 | 37 | 9
10 | Rajeev Motwani | 11 | 42 | 34 | 11
11 | Anil K. Jain | 12 | 40 | 33 | 12
12 | Christos H. Papadimitriou | 13 | 39 | 32 | 13
13 | Franco P. Preparata | 14 | 38 | 31 | 14
14 | Micheline Kamber | 15 | 35 | 28 | 15
15 | Paul C. van Oorschot | 16 | 33 | 27 | 16
16 | Michael McGill | 17 | 32 | 26 | 17
17 | Berthier A. Ribeiro-Neto | 17 | 32 | 26 | 17
18 | Prabhakar Raghavan | 19 | 30 | 24 | 19
19 | Richard C. Dubes | 21 | 27 | 22 | 21
20 | Scott A. Vanstone | 22 | 26 | 19 | 22
In Table 4.11, each author's NWC-index is either equal to or less than his or her kth-rank index,
but the author's rank is the same under both methods.
Figure 4.8: Scenario 3: Position stable with respect to kth-rank index (bar chart; x-axis: authors; y-axis: index rank; series: Kth-rank, NWC-index Rank)
4.5.5. Scenarios of TSWC-index
The TSWC-index is either equal to or less than the h-index and the kth-rank index. Three
scenarios arise with respect to the ranks of authors under these indices. Some authors have a
TSWC-index less than or equal to their h-index but a higher rank than under the h-index; some
keep the same rank; and some drop to a lower rank. The three scenarios below show whether an
author's rank moves up, moves down or stays stable relative to the h-index.
4.5.5.1. Comparison of TSWC-index with h-index
The Topic Sensitive Weighted Citation index (TSWC-index) assigns weighted citations to the
co-authors of a paper according to their topic relatedness with the paper's first author. An
author's TSWC score decreases when his or her topic differs from that of the first author, and
increases when the topics match. The h-index, on the other hand, assigns the full citations of a
paper to each of its co-authors, so an author's h-index is always greater than or equal to his or
her TSWC-index. The following three scenarios show the relation between the h-index and the
TSWC-index when the TSWC-index is equal to or less than the h-index.
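The topic-sensitive step described above can be sketched for a single paper as follows. This is an illustration under stated assumptions, not the exact procedure of this thesis: the rank-based weights are harmonic credits normalised to sum to one (in the spirit of Hagen [19]) and merely stand in for the NWC weights; an off-topic co-author loses a hypothetical fraction `penalty` of his or her weight; and the removed weight is redistributed to the same-topic authors in proportion to their weights. The topic labels and citation count are hypothetical.

```python
def harmonic_weights(n_authors):
    """Stand-in for NWC weights: the k-th author gets 1/k, normalised to sum to 1."""
    raw = [1.0 / k for k in range(1, n_authors + 1)]
    total = sum(raw)
    return [r / total for r in raw]


def tswc_shares(topics, penalty=0.5):
    """Sketch of a topic-sensitive reallocation of credit within one paper.

    topics: topic label of each author in byline order (index 0 = first author)
    penalty: hypothetical fraction of weight removed from an off-topic co-author
    """
    weights = harmonic_weights(len(topics))
    first_topic = topics[0]
    same_topic = [i for i, t in enumerate(topics) if t == first_topic]
    shares = list(weights)
    removed = 0.0
    for i, topic in enumerate(topics):
        if topic != first_topic:
            cut = weights[i] * penalty
            shares[i] -= cut
            removed += cut
    # The first author is always in same_topic, so this total is positive.
    same_total = sum(weights[i] for i in same_topic)
    for i in same_topic:
        shares[i] += removed * weights[i] / same_total
    return shares


citations = 120                    # hypothetical citation count of one paper
topics = ["DM", "DM", "DB"]        # third author works on a different topic
nwc = [round(citations * w, 2) for w in harmonic_weights(len(topics))]
tswc = [round(citations * s, 2) for s in tswc_shares(topics)]
print("NWC-style credit: ", nwc)
print("TSWC-style credit:", tswc)
```

The total credit of the paper is unchanged; it only moves from the off-topic co-author to the same-topic authors. When every author shares the first author's topic, `tswc_shares` returns the unmodified weights, matching the case in which an author's TSWC score equals his or her NWC score.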
Scenario 1: Relocation with respect to h-index rank: Rank up
Table 4.12: Position relocation with respect to h-index: Position up
S.No | Authors | h-index Rank | h-index | TSWC-index | TSWC-index Rank | Earned position in TSWC-index
1 | Bertrand Meyer | 6 | 61 | 60 | 5 | +1
2 | Ricardo A. Baeza-Yates | 8 | 51 | 42 | 7 | +1
3 | Edmund M. Clarke | 10 | 49 | 40 | 8 | +2
4 | Gerard Salton | 12 | 46 | 37 | 10 | +2
5 | Anil K. Jain | 15 | 43 | 36 | 11 | +4
6 | Rajeev Motwani | 15 | 43 | 38 | 9 | +6
7 | Christos H. Papadimitriou | 17 | 41 | 34 | 13 | +4
8 | Usama M. Fayyad | 18 | 39 | 29 | 14 | +4
9 | Franco P. Preparata | 19 | 38 | 35 | 12 | +7
In Table 4.12, each author's TSWC-index is lower than his or her h-index, yet every author ranks
higher than under the h-index. Bertrand Meyer's h-index is 61; because topic sensitive weighted
citations are allocated to him, his TSWC-index decreases to 60, but his rank improves by one
position, from 6 to 5. The other authors follow the same pattern. The figure below presents these
results.
Figure 4.9: Scenario 1: Position up with respect to h-index (bar chart; x-axis: authors; y-axis: index rank; series: h-index Rank, TSWC-index Rank)
Scenario 2: Relocation with respect to h-index rank: Rank down
Table 4.13: Position relocation with respect to h-index: Position down
S.No | Authors | h-index Rank | h-index | TSWC-index | TSWC-index Rank | Position down in TSWC-index
1 | Jeffrey D. Ullman | 5 | 62 | 56 | 6 | -1
2 | Micheline Kamber | 9 | 50 | 28 | 15 | -6
3 | Paul C. van Oorschot | 11 | 47 | 20 | 21 | -10
4 | Scott A. Vanstone | 12 | 46 | 21 | 20 | -8
5 | Michael McGill | 13 | 45 | 26 | 16 | -3
6 | Berthier A. Ribeiro-Neto | 13 | 45 | 18 | 23 | -10
7 | James E. Rumbaugh | 14 | 44 | 25 | 17 | -3
8 | Michael R. Blaha | 14 | 44 | 23 | 19 | -5
9 | William J. Premerlani | 14 | 44 | 19 | 22 | -8
10 | Frederick Eddy | 14 | 44 | 16 | 25 | -11
11 | William E. Lorensen | 14 | 44 | 11 | 27 | -13
12 | Gregory Piatetsky-Shapiro | 14 | 44 | 24 | 18 | -4
13 | Prabhakar Raghavan | 16 | 42 | 17 | 24 | -8
14 | Bernd-Holger Schlingloff | 17 | 41 | 17 | 24 | -7
15 | Richard C. Dubes | 19 | 38 | 15 | 26 | -7
Table 4.13 shows authors whose TSWC-index and rank both decrease once the TSWC-index is
calculated. Micheline Kamber's TSWC-index falls to 28 from her h-index of 50; this large
difference costs her 6 positions. The authors at S.No 7 to 12 share an h-index of 44 and an
h-index rank of 14, but their TSWC-index values decrease by different amounts and become
distinct. This happens because each author's topic is checked against the topic of the first
author of each paper: wherever the topics differ, the author's NWC score is reduced, which lowers
both the TSWC-index and the rank.
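The tie-breaking effect described for S.No 7 to 12 can be reproduced directly from Table 4.13. The short sketch below orders the six authors who share an h-index of 44 by their TSWC-index and checks that their reported TSWC-index ranks come out in the same order. The numbers are copied from the table; the variable names are illustrative.

```python
# Authors tied at h-index 44 (h-index rank 14) in Table 4.13:
# (author, TSWC-index, TSWC-index rank)
tied_at_44 = [
    ("James E. Rumbaugh", 25, 17),
    ("Michael R. Blaha", 23, 19),
    ("William J. Premerlani", 19, 22),
    ("Frederick Eddy", 16, 25),
    ("William E. Lorensen", 11, 27),
    ("Gregory Piatetsky-Shapiro", 24, 18),
]

by_tswc = sorted(tied_at_44, key=lambda row: row[1], reverse=True)
for author, tswc, rank in by_tswc:
    print(f"{author:26s} TSWC-index {tswc:2d}  rank {rank}")

# Once ordered by TSWC-index, the reported ranks must not decrease.
ranks = [rank for _, _, rank in by_tswc]
assert ranks == sorted(ranks)
```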
Figure 4.10: Scenario 2: Position down with respect to h-index (bar chart; x-axis: authors; y-axis: index rank; series: h-index Rank, TSWC-index Rank)
Scenario 3: Position stable with respect to h-index
Table 4.14: Position stable with respect to h-index
S.No | Authors | h-index Rank | h-index | TSWC-index | TSWC-index Rank
1 | David E. Goldberg | 1 | 95 | 94 | 1
2 | William G. Cochran | 2 | 92 | 92 | 2
3 | C. A. R. Hoare | 3 | 84 | 84 | 3
4 | J. Ross Quinlan | 4 | 64 | 64 | 4
5 | Jiawei Han | 7 | 53 | 42 | 7
6 | Alfred Menezes | 11 | 47 | 36 | 11
In Table 4.14, all authors keep the same rank under both indices. William G. Cochran, C. A. R.
Hoare and J. Ross Quinlan have the same h-index and TSWC-index. Cochran and Quinlan receive
the same number of weighted citations as raw citations, which shows that they are the sole
authors of their papers. Hoare's citations are reduced slightly because of a co-authored paper,
but not enough to change his TSWC-index. The TSWC-index of Jiawei Han and Alfred Menezes
decreases, yet their ranks stay the same.
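Why a sole author's TSWC-index equals the h-index, and why a lightly co-authored record such as Hoare's can keep the same value, can be illustrated with a minimal sketch. The credit function below is a hypothetical stand-in for the NWC weights (harmonic credit normalised to one); the only property that matters here is that a sole author receives a weight of exactly 1. The paper records are invented for illustration.

```python
def credit_share(n_authors, position):
    """Hypothetical normalised rank-based share; a sole author always gets 1.0."""
    raw = [1.0 / k for k in range(1, n_authors + 1)]
    return raw[position - 1] / sum(raw)


def h_index(scores):
    """Count the positions whose score is at least the position (a prefix of the sorted list)."""
    ranked = sorted(scores, reverse=True)
    return sum(1 for i, s in enumerate(ranked, start=1) if s >= i)


# (citations, number of authors, author's byline position): hypothetical records
sole_papers = [(120, 1, 1), (64, 1, 1), (30, 1, 1), (9, 1, 1), (2, 1, 1)]
co_paper = (3, 2, 2)   # one lightly cited paper as second of two authors

raw = [c for c, _, _ in sole_papers + [co_paper]]
weighted = [c * credit_share(n, p) for c, n, p in sole_papers + [co_paper]]

print("h-index on raw citations:     ", h_index(raw))
print("h-type index on weighted ones:", h_index(weighted))
```

The co-authored paper's credit drops from 3 to about 1, but since that paper was never among the top four, both indices stay at 4.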
Figure 4.11: Scenario 3: Position stable with respect to h-index (bar chart; x-axis: authors David E. Goldberg, William G. Cochran, C. A. R. Hoare, J. Ross Quinlan, Jiawei Han, Alfred Menezes; y-axis: index rank; series: h-index Rank, TSWC-index Rank)
4.5.5.2. Comparison of TSWC-index with kth-rank index
Both the kth-rank index and the TSWC-index assign weighted citations to authors. Under the
kth-rank index, however, the first author of a multi-authored paper receives the paper's full
citations, and the other authors receive citations according to their rank. The TSWC-index, on the
other hand, divides the total citations of a paper among its authors using the criterion of topic
sensitivity. An author's TSWC-index is therefore less than or equal to his or her kth-rank index,
depending on whether the author shares the first author's topic and whether he or she is the sole
author of the paper. An author's rank under the TSWC-index may be lower than, higher than or
equal to the rank under the kth-rank index.
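The contrast between the two schemes for a single paper can be sketched numerically. The kth-rank allocation below is a hypothetical simplification (the first author keeps all citations and the k-th author receives citations/k), not the exact kth-rank formula. The divided allocation uses harmonic weights normalised to one as a stand-in for the NWC/TSWC weights, before any topic adjustment. The paper and its citation count are invented.

```python
def kth_rank_style(citations, n_authors):
    """Hypothetical simplification: first author keeps the full count, k-th author gets citations / k."""
    return [citations if k == 1 else citations / k for k in range(1, n_authors + 1)]


def divided_style(citations, n_authors):
    """Hypothetical normalised split: harmonic weights scaled so the credits sum to `citations`."""
    raw = [1.0 / k for k in range(1, n_authors + 1)]
    total = sum(raw)
    return [citations * r / total for r in raw]


citations, n_authors = 110, 3
full = kth_rank_style(citations, n_authors)
split = divided_style(citations, n_authors)
for k, (a, b) in enumerate(zip(full, split), start=1):
    print(f"author {k}: kth-rank-style {a:6.2f}  divided {b:6.2f}")
print(f"total credit: {sum(full):.2f} vs {sum(split):.2f}")
```

Every author, including the first, receives no more under the divided scheme than under the kth-rank-style scheme, and the divided credits add up to the paper's actual citations. This is why, before topic adjustment, an index built on divided credit cannot exceed one built on kth-rank-style credit.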
Scenario 1: Relocation with respect to kth-rank index: Rank up
Table 4.15: Position relocation with respect to kth-rank index: Position up
S.No | Authors | Kth-rank | Kth-rank index | TSWC-index | TSWC-index Rank | Earned position in TSWC-index
1 | Ricardo A. Baeza-Yates | 8 | 48 | 42 | 7 | +1
2 | Edmund M. Clarke | 9 | 46 | 40 | 8 | +1
3 | Rajeev Motwani | 11 | 42 | 38 | 9 | +2
4 | Anil K. Jain | 12 | 40 | 36 | 11 | +1
5 | Franco P. Preparata | 14 | 38 | 35 | 12 | +2
6 | Michael McGill | 17 | 32 | 26 | 16 | +1
7 | Scott A. Vanstone | 22 | 26 | 21 | 20 | +2
8 | William J. Premerlani | 23 | 25 | 19 | 22 | +1
The TSWC-index of the authors in Table 4.15 is lower than their kth-rank index because the actual
citations of each paper are divided among its co-authors. Nevertheless, these authors rank higher
under the TSWC-index than under the kth-rank index. Rajeev Motwani, for example, moves up two
positions, from 11 to 9, and the other authors also move up.
Figure 4.12: Scenario 1: Position up with respect to kth-rank index (bar chart; x-axis: authors; y-axis: index rank; series: Kth-rank, TSWC-index Rank)
Scenario 2: Relocation with respect to kth-rank index: Rank down
Table 4.16: Position relocation with respect to kth-rank index: Position down
S.No | Authors | Kth-rank | Kth-rank index | TSWC-index | TSWC-index Rank | Position down in TSWC-index
1 | Alfred Menezes | 9 | 46 | 36 | 11 | -2
2 | Gerard Salton | 9 | 46 | 37 | 10 | -1
3 | James E. Rumbaugh | 10 | 44 | 25 | 17 | -7
4 | Usama M. Fayyad | 13 | 39 | 29 | 14 | -1
5 | Gregory Piatetsky-Shapiro | 16 | 33 | 24 | 18 | -2
6 | Paul C. van Oorschot | 16 | 33 | 20 | 21 | -5
7 | Berthier A. Ribeiro-Neto | 17 | 32 | 18 | 23 | -6
8 | Michael R. Blaha | 18 | 31 | 23 | 19 | -1
9 | Prabhakar Raghavan | 19 | 30 | 17 | 24 | -5
10 | Bernd-Holger Schlingloff | 20 | 29 | 17 | 24 | -4
11 | Richard C. Dubes | 21 | 27 | 15 | 26 | -5
12 | Frederick Eddy | 24 | 22 | 16 | 25 | -1
13 | William E. Lorensen | 25 | 19 | 11 | 27 | -2
In Table 4.16, every author's rank drops under the TSWC-index. Gregory Piatetsky-Shapiro and
Paul C. van Oorschot share a kth-rank of 16 and a kth-rank index of 33, but their TSWC-index
values fall to 24 and 20 and their ranks to 18 and 21, respectively, because their citations are
reduced by the topic sensitive weighting.
Figure 4.13: Scenario 2: Position down with respect to kth-rank index (bar chart; x-axis: authors; y-axis: index rank; series: Kth-rank, TSWC-index Rank)
Scenario 3: Position stable with respect to kth-rank index
Table 4.17: Position stable with respect to kth-rank index
S.No | Authors | Kth-rank | Kth-rank index | TSWC-index | TSWC-index Rank
1 | David E. Goldberg | 1 | 94 | 94 | 1
2 | William G. Cochran | 2 | 92 | 92 | 2
3 | C. A. R. Hoare | 3 | 84 | 84 | 3
4 | J. Ross Quinlan | 4 | 64 | 64 | 4
5 | Bertrand Meyer | 5 | 60 | 60 | 5
6 | Jeffrey D. Ullman | 6 | 58 | 56 | 6
7 | Jiawei Han | 7 | 51 | 42 | 7
8 | Christos H. Papadimitriou | 13 | 39 | 34 | 13
9 | Micheline Kamber | 15 | 35 | 28 | 15
All authors in Table 4.17 have the same rank under both methods. The first five authors have equal
kth-rank index and TSWC-index values, while the remaining authors have a lower TSWC-index than
kth-rank index.
Figure 4.14: Scenario 3: Position stable with respect to kth-rank index (bar chart; x-axis: authors; y-axis: index rank; series: Kth-rank, TSWC-index Rank)
4.5.6. Comparison of TSWC-index and NWC-index
Some authors obtain a higher TSWC-index than NWC-index because most of their papers fall
within an area of interest they share with the first authors. Others obtain a lower TSWC-index
because their topic differs from that of the first author, so their NWC score is reduced. The
remaining authors have equal TSWC-index and NWC-index values because they share the same
topic in all their papers, so their NWC and TSWC scores are identical. The following figure
compares the NWC-index and the TSWC-index.
Figure 4.15: Comparison of TSWC-index with NWC-index (panel title: "Comparison of Authors NWC-index rank and TSWC-index rank"; bar chart of NWC-index rank and TSWC-index rank for each of the 30 authors)
Figure 4.15 shows how the authors' ranks vary between the NWC-index and the TSWC-index.
Under the TSWC-index, authors who share the first author's topic in most of their multi-authored
papers rise in rank, while authors whose topic differs from that of the first author drop to lower
ranks. The scenarios of the TSWC-index with respect to the NWC-index are given below.
Scenario 1: Relocation with respect to NWC-index: Rank up
Table 4.18: Position relocation with respect to NWC-index: Position up
S.No | Authors | TSWC-index Rank | TSWC-index | NWC-index | NWC-index Rank | Earned position in TSWC-index
1 | Ricardo A. Baeza-Yates | 7 | 42 | 39 | 8 | +1
2 | Edmund M. Clarke | 8 | 40 | 36 | 10 | +2
3 | Rajeev Motwani | 9 | 38 | 34 | 11 | +2
4 | Alfred Menezes | 11 | 36 | 33 | 12 | +1
5 | Anil K. Jain | 11 | 36 | 33 | 12 | +1
6 | Franco P. Preparata | 12 | 35 | 31 | 14 | +2
7 | Usama M. Fayyad | 14 | 29 | 26 | 17 | +3
8 | Michael McGill | 16 | 26 | 26 | 17 | +1
9 | James E. Rumbaugh | 17 | 25 | 25 | 18 | +1
10 | Michael R. Blaha | 19 | 23 | 23 | 20 | +1
11 | Scott A. Vanstone | 20 | 21 | 19 | 22 | +2
In Table 4.18, Michael McGill, James E. Rumbaugh and Michael R. Blaha have a TSWC-index equal
to their NWC-index because their NWC and TSWC scores are identical, yet each of them moves up
one position under the TSWC-index. Their TSWC scores do not decrease because they share the
topic of the first author and of the other co-authors. The remaining authors have a higher
TSWC-index than NWC-index and also a higher rank. Their scores increase because some of their
co-authors do not share the first author's topic: the NWC scores of those co-authors are reduced,
and the reduced amount is added to the scores of these authors.
Figure 4.16: Scenario 1: Position up with respect to NWC-index (bar chart; x-axis: authors; y-axis: index rank; series: TSWC-index Rank, NWC-index Rank)
Scenario 2: Relocation with respect to NWC-index: Rank down
Table 4.19: Position relocation with respect to NWC-index: Position down
S.No | Authors | TSWC-index Rank | TSWC-index | NWC-index | NWC-index Rank | Position down in TSWC-index
1 | Gerard Salton | 10 | 37 | 37 | 9 | -1
2 | Gregory Piatetsky-Shapiro | 18 | 24 | 28 | 15 | -3
3 | Paul C. van Oorschot | 21 | 20 | 27 | 16 | -5
4 | Berthier A. Ribeiro-Neto | 23 | 18 | 26 | 17 | -6
5 | Prabhakar Raghavan | 24 | 17 | 24 | 19 | -5
6 | Bernd-Holger Schlingloff | 24 | 17 | 24 | 19 | -5
7 | Frederick Eddy | 25 | 16 | 16 | 23 | -2
8 | Richard C. Dubes | 26 | 15 | 22 | 21 | -5
9 | William E. Lorensen | 27 | 11 | 11 | 24 | -3
In Table 4.19, every author's TSWC-index rank drops. Gerard Salton, Frederick Eddy and William E.
Lorensen have the same TSWC-index as NWC-index, yet their ranks drop by 1, 2 and 3 positions,
respectively, because other authors' TSWC-index values increase and overtake them. In contrast,
Gregory Piatetsky-Shapiro, Paul C. van Oorschot, Berthier A. Ribeiro-Neto, Prabhakar Raghavan,
Bernd-Holger Schlingloff and Richard C. Dubes do not share the first author's topic in some of
their multi-authored papers, so their NWC scores are reduced, which lowers both their TSWC-index
and their rank.
Figure 4.17: Scenario 2: Position down with respect to NWC-index (bar chart; x-axis: authors; y-axis: index rank; series: TSWC-index Rank, NWC-index Rank)
Scenario 3: Position stable with respect to NWC-index
Table 4.20: Position stable with respect to NWC-index
S.No | Authors | TSWC-index Rank | TSWC-index | NWC-index | NWC-index Rank
1 | David E. Goldberg | 1 | 94 | 94 | 1
2 | William G. Cochran | 2 | 92 | 92 | 2
3 | C. A. R. Hoare | 3 | 84 | 84 | 3
4 | J. Ross Quinlan | 4 | 64 | 64 | 4
5 | Bertrand Meyer | 5 | 60 | 60 | 5
6 | Jeffrey D. Ullman | 6 | 56 | 57 | 6
7 | Jiawei Han | 7 | 42 | 42 | 7
8 | Christos H. Papadimitriou | 13 | 34 | 32 | 13
9 | Micheline Kamber | 15 | 28 | 28 | 15
10 | William J. Premerlani | 22 | 19 | 19 | 22
Authors in Table 4.20 have the same rank under the NWC-index and the TSWC-index. William G.
Cochran, C. A. R. Hoare, J. Ross Quinlan, Micheline Kamber and William J. Premerlani have NWC
and TSWC scores of 34194, 28227, 16859, 3363 and 1593, respectively, identical under both
methods, so their NWC-index and TSWC-index are also identical. They share the topic of the first
author and of all co-authors in their multi-authored papers, so their TSWC score neither increases
nor decreases and their rank is unchanged. The TSWC scores of David E. Goldberg, Bertrand Meyer
and Jiawei Han decrease only slightly, which does not affect their TSWC-index or rank. Jeffrey D.
Ullman's TSWC score decreases from 13175 to 12878, which lowers his TSWC-index by one,
whereas Christos H. Papadimitriou's TSWC score increases, raising his TSWC-index to 34 from an
NWC-index of 32. In both cases, however, the rank is the same under both indices.
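The observation that a small change in TSWC score can shift the index by one in one case and leave it unchanged in another follows from how h-type indices work: only papers near the threshold matter. The sketch below uses invented per-paper scores, not the actual records of Ullman, Goldberg or any other author in the data set.

```python
def h_index(scores):
    """h-type index on (possibly fractional) per-paper scores."""
    ranked = sorted(scores, reverse=True)
    return sum(1 for i, s in enumerate(ranked, start=1) if s >= i)


# Hypothetical per-paper credits: ten papers score at least 10, so the index is 10.
before = [70, 40, 25, 10, 10, 10, 10, 10, 10, 10, 3]

# Case 1: a large cut to the most cited paper leaves the index unchanged.
top_cut = [30] + before[1:]
# Case 2: a tiny cut to a paper sitting exactly at the threshold lowers the index by one.
edge_cut = before[:-2] + [9.5, 3]

for label, scores in [("before", before), ("top cut", top_cut), ("edge cut", edge_cut)]:
    print(f"{label:8s} total {sum(scores):6.1f}  index {h_index(scores)}")
```

A drop of 40 at the top does not move the index, whereas a drop of 0.5 at the threshold costs one point, so the size of the change in total score alone does not determine the change in the index.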
Figure 4.18: Scenario 3: Position stable with respect to NWC-index (bar chart; x-axis: authors; y-axis: index rank; series: TSWC-index Rank, NWC-index Rank)
5. Conclusions
It is important to consider the topics of co-authors when weighted citations are assigned to them
in a multi-authored paper. To evaluate scientists on the basis of their topic-based contribution, we
have proposed the NWC-index and the TSWC-index. The NWC-index of each author is calculated
in the same way as the h-index, after Normalized Weighted Citation (NWC) scores are allocated to
the authors of multi-authored papers according to their rank. For the TSWC-index, the topic of
each author in each paper is checked against that of the paper's first author: if the topics differ,
the author's NWC score is reduced and the NWC scores of the remaining same-topic authors are
increased, after which the TSWC-index is calculated in the same way as the h-index. Comparing
the two proposed methods clearly shows how the allocation of Topic Sensitive Weighted Citations
changes both the index values and the ranking of authors. Our results also show these effects
with respect to the kth-rank index and the h-index. Our analysis shows that an author of
single-authored papers receives the full citation score, so his or her NWC-index and TSWC-index
are the same as the h-index and the kth-rank index. One direction for future work is to handle
equal contributions in multi-authored papers, i.e. papers to which all authors have contributed
equally. Another is to reduce co-authors' weights according to how strongly their topics correlate
with the first author's topic: a co-author whose topic is closely correlated with the first author's
should have his or her weight reduced by a smaller amount, and a co-author whose topic is hardly
correlated should have it reduced more.
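The second future-work direction could take the following form. This is a speculative sketch, not part of the proposed method: it assumes each author is represented by a topic distribution (for example, one inferred with LDA [8]), measures topic correlation as cosine similarity, and removes a share of the co-author's weight proportional to one minus that similarity, capped by a hypothetical `max_penalty`. The distributions and the weight are invented.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two topic distributions (0 when either is all zeros)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def graded_cut(weight, first_author_topics, coauthor_topics, max_penalty=0.5):
    """Weight removed from a co-author: small when topics are close, larger when they diverge."""
    similarity = cosine_similarity(first_author_topics, coauthor_topics)
    return weight * max_penalty * (1.0 - similarity)


first_author = [0.7, 0.2, 0.1]   # hypothetical topic mixture over three topics
close_coauthor = [0.6, 0.3, 0.1]
distant_coauthor = [0.05, 0.15, 0.8]

weight = 0.3                     # hypothetical NWC weight of the co-author
print(f"close co-author loses   {graded_cut(weight, first_author, close_coauthor):.4f}")
print(f"distant co-author loses {graded_cut(weight, first_author, distant_coauthor):.4f}")
```

With non-negative topic mixtures the similarity lies between 0 and 1, so the cut never exceeds `max_penalty` times the weight. The removed amount could then be redistributed to the same-topic authors, as in the binary same-topic/different-topic scheme.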
References
[1] Abbas, A.M. 2011. Weighted indices for evaluating the quality of research with multiple
authorship. Scientometrics, 88, 1, 107–131.
[2] Abramo, G., D’Angelo, C.A. and Rosati, F. 2013. The importance of accounting for the
number of coauthors and their order when assessing research performance at the
individual level in the life sciences. Journal of Informetrics, 7, 1, 198–208.
[3] Adler, R., Ewing, J. and Taylor, P. 2008. Citation Statistics. A report from the
International Mathematical Union (IMU).
[4] Alonso, S., Cabrerizo, F.J., Herrera-Viedma, E. and Herrera, F. 2010. hg-Index: A new
index to characterize the scientific output of researchers based on the h- and g-indices.
Scientometrics, 82, 2, 391-400.
[5] Anania, G. and Caruso, A. 2013. Two simple new bibliometric indexes to better evaluate
research in disciplines where publications typically receive less citations. Scientometrics,
96, 2, 617-631.
[6] Aziz, N.A. and Rozing, M.P. 2013. Profit (p)-Index: The Degree to Which Authors Profit
from Co-Authors. PLoS ONE, 8, 4, e59814. doi:10.1371/journal.pone.0059814.
[7] Batista, P.D., Campiteli, M.G., Kinouchi, O. and Martinez, A.S. 2006. Is it possible to
compare researchers with different scientific interests? Scientometrics, 68, 1, 179-189.
[8] Blei, D.M., Ng, A.Y. and Jordan, M.I. 2003. Latent Dirichlet Allocation. JMLR, 3, 993–1022.
[9] Bornmann, L., Mutz, R. and Daniel, H.D. 2008. Are there better indices for evaluation
purposes than the h-index? A comparison of nine different variants of the h-index using
data from biomedicine. Journal of the American Society for Information Science and
Technology, 59, 5, 830-837.
[10] Burrell, Q.L. 2007. Hirsch’s h-index: a stochastic model. Journal of Informetrics, 1, 1, 16–25.
[11] Cabrerizo, F.J., Alonso, S., Herrera-Viedma, E. and Herrera, F. 2010. q2-Index:
Quantitative and Qualitative Evaluation Based on the Number and Impact of Papers in
the Hirsch Core. Journal of Informetrics, 4, 1, 23-28.
[12] Carbone, V. 2011. Fractional counting of authorship to quantify scientific research
output. arxiv:1106.0114v1.
[13] Chai, J.C., Hua, P.H., Rousseau, R. and Wan, J.K. 2008. The Adapted Pure h-Index. In
Proceedings of WIS 2008, Fourth International Conference on Webometrics, Informetrics
and Scientometrics & Ninth COLLNET Meeting, Berlin.
[14] DBLP Bibliography Database: DBLP-Citation-network V5, http://arnetminer.org/citation.
[15] Egghe, L. 2006. Theory and Practice of the g-index. Jointly published by Akadémiai
Kiadó, Budapest and Springer, Dordrecht, Scientometrics, 69, 1, 131-152.
[16] Egghe, L. 2006. An Improvement to H-index: The G-index. ISSI News-Letter, 2, 1, 8-9.
[17] Egghe, L. 2008f. Mathematical theory of the h-index and g-index in case of fractional
counting of authorship. Journal of the American Society for Information Science and
Technology, 59, 10, 1608–1616.
[18] Garfield, E. 1999. Journal Impact Factor: a brief review. CMAJ, 161, 979-980.
[19] Hagen, N.T. 2008. Harmonic allocation of authorship credit: Source-level correction of
bibliometric bias assures accurate publication and citation analysis. PLoS One, 3, 12,
e4021.
[20] Hirsch, J.E. 2005. An index to quantify an individual’s scientific research output. In
Proceedings of the National Academy of Sciences, 102, 46, 16569-16572.
[21] Hirsch, J.E. 2010. An index to quantify an individual’s scientific research output that
takes into account the effect of multiple coauthorship. Scientometrics, 85, 3, 741–754.
[22] Jin, B.H. 2006. h-Index: An evaluation indicator proposed by scientist. Science Focus, 1,
1, 8–9.
[23] Jin, B.H. 2007. The AR-index: complementing the h-index. ISSI Newsletter, 3, 1, 6.
[24] Jin, B.H., Liang, L.M., Rousseau, R. and Egghe, L. 2007. The R- and AR- indices:
Complementing the h-index. Chinese Science Bulletin, 52, 6, 855-863.
[25] Kennedy, D. 2003. Multiple authors, multiple problems. Science, 301, 733.
[26] Kosmulski, M. 2006. A new Hirsch-type index saves time and works equally well as the
original h-index. ISSI Newsletter, 2, 3, 4–6.
[27] Liu, X.Z. and Fang, H. 2012. Fairly sharing the credit of multi-authored papers and its
application in the modification of h-index and g-index. Scientometrics, 91, 1, 37–49.
[28] Liu, X.Z. and Fang, H. 2012b. Modifying h-index by allocating credit of multi-authored
papers whose author names rank based on contribution. Journal of Informetrics, 6, 4,
557–565.
[29] Schreiber, M. 2008a. To share the fame in a fair way, hm modifies h for multi-authored
manuscripts. New Journal of Physics, 10, 040201, 1-9.
[30] Schreiber, M. 2008b. A modification of the h-index: The h(m)-index accounts for multi-authored manuscripts. Journal of Informetrics, 2, 3, 211–216.
[31] Schreiber, M. 2009. Fractionalized counting of publications for the g-Index. Journal of
the American Society for Information Science and Technology, 60, 10, 2145–2150.
[32] Seglen, P.O. 1997. Why the impact factor of journals should not be used for evaluating
research. British Medical Journal, 314, 7079, 498-502.
[33] Sekercioglu, C.H. 2008. Quantifying coauthor contributions. Science, 322, 371.
[34] Wan, J.K., Hua, P.H. and Rousseau, R. 2007. The pure h-index: calculating an author’s h-index by taking co-authors into account. COLLNET Journal of Scientometrics and
Information Management, 1, 2, 1-5.
[35] Zhang, C.T. 2009. The e-Index, Complementing the h-Index for Excess Citations. PLoS
ONE 4, 5, e5429. doi:10.1371/journal.pone.0005429.
[36] Zhang, C.T. 2009. A proposal for calculating weighted citations based on author rank.
EMBO Reports, 10, 5, 416–417.