Author Productivity Indexing via Topic Sensitive Weighted Citations

Submitted By:
Shabnam Bibi
589-FBAS/MSCS/F09

Supervised By:
Dr. Ali Daud
Assistant Professor

Department of Computer Science and Software Engineering
Faculty of Basic and Applied Sciences
International Islamic University, Islamabad

Dissertation submitted in partial fulfillment of the requirements for the award of the degree of MS in Computer Science

October 2013

Dedicated to my beloved grandfather and my parents

Department of Computer Science and Software Engineering, International Islamic University Islamabad, Pakistan
Date: / /2013

Final Approval

This is to certify that we have read and evaluated the thesis entitled “Author Productivity Indexing via Topic Sensitive Weighted Citations” submitted by Shabnam Bibi under Reg No. 589-FBAS/MSCS/F09. It is our judgment that this thesis is of sufficient standard to warrant its acceptance by International Islamic University, Islamabad for the degree of MS in Computer Science.

Committee

External Examiner
Dr. Waseem Shahzad
Assistant Professor
FAST – National University of Computer and Emerging Sciences, Islamabad

Internal Examiner
Miss Zakia Jalil
Lecturer
International Islamic University Islamabad

Supervisor
Dr. Ali Daud
Assistant Professor
International Islamic University Islamabad

Declaration

I hereby certify that the work presented in this thesis, to the best of my knowledge and belief, contains no previously published or written material by another person, except where due reference has been made in the text, and that the material has not been submitted, either in whole or in part, for a degree at this or any other university.
It is further declared that the work was done under the dexterous guidance of my supervisor, Dr. Ali Daud. I acknowledge that I have read and understood the University’s rules, requirements, procedures and policy relating to my higher degree research award and to my thesis. I certify that I have complied with the rules, requirements, procedures and policy of the University (as they may be from time to time).

Name: _______________________________________
Signature: ____________________________________
Date: ________________________________________

ACKNOWLEDGMENT

In the name of Allah, Most Gracious, Most Merciful

All praises to Allah Almighty, whose Divine help and guidance enabled me to accomplish this tedious task. His blessings and perpetual succor have always surrounded me amidst the darkness and helplessness despite my disobedience and ignorance.

It gives me pleasure to acknowledge the paternal affection of my supervisor, Dr. Ali Daud, and his continuous advice, support and encouragement throughout this work. It was due to his erudite and creative guidance that I decided to write my thesis on “Author Productivity Indexing via Topic Sensitive Weighted Citations”. His consistent encouragement and immense help enabled me to complete this work.

I am grateful to the Department of Computer Science and Software Engineering, IIU Islamabad, and its faculty members for providing a healthy environment for research. I pay my heartfelt gratitude to my teachers as well as my fellow graduate students for their continuous motivational support. My dearest friends Miss Saba Gul, Miss Nodia Athar, Miss Nabila Naz, Miss Ambreen Fatima and Miss Saima Tariq truly deserve my special thanks for their selfless support, which is nothing less than a blessing in this world of chaos.

Finally, I am perpetually obliged to my parents, my grandfather and my whole family. Their endless support, encouragement and motivation have been a true source of strength and inspiration for me.
Abstract

Different author productivity indexing methods have been proposed to rank scientists on the basis of their research work. Some of these methods rank scientists using their publications together with the total number of citations those publications have received, e.g. the h-index. Other methods assign weighted citations to the authors of a multi-authored paper according to their contribution to the paper. As there is no ground truth about the actual contribution of each author, contribution is judged by the author’s rank in the paper, e.g. the kth-rank index. The author productivity indexing methods proposed so far have not considered the topic-based contribution of authors when assigning them weighted citations in a multi-authored paper.

We propose two methods to deal with this limitation. The first method, the NWC-index, assigns a Normalized Weighted Citations (NWC) score to the co-authors of a paper according to their rank by dividing the paper’s total citations among them. The second method, the TSWC-index, assigns Topic Sensitive Weighted Citations to the authors of a paper according to their topic relatedness. The topic of each co-author of a paper is checked against that of its first author: if they have the same topic, the co-author’s Normalized Weighted Citations score is increased; if they do not, it is decreased. We use the h-index and the kth-rank index as our baseline methods and compare the results of our proposed methods with them. The results clearly show the difference among an author’s full citations score, weighted citations score and topic sensitive weighted citations score.

Table of Contents

Chapter 1
1. Introduction
1.1. Author Productivity Indexing
1.2. Why we use author productivity indexing
1.3. Weighted Citation
1.4. Topic sensitive weighted citation (our idea)
1.5. Research Contribution
1.6. Thesis Outline
Chapter 2
2. Literature Review
2.1. Classification of Author Productivity Indexing
2.1.1. Concept Matrix of the Indexing Methods
2.1.2. Citations Based Indexing
2.1.3. High Citations Based Indexing
2.1.4. Time Based Indexing
2.1.5. Excess Citations Based Indexing
2.1.6. Co-authors and Weighted Citations Based Indexing
2.1.7. Topic Based Indexing
2.2. Problem Statement
2.3. Objective of Research
Chapter 3
3. Methodology
3.1. Baseline Methods
3.1.1. h-index
3.1.2. kth-rank Index
3.2. Proposed Method
3.3. Latent Dirichlet Allocation
Chapter 4
4. Experiments
4.1. Data set
4.2. Development Tools and Programming language
4.3. Parameter settings
4.3.1. Parameters for LDA
4.3.2. Parameter for h-index
4.4. Baseline Methods
4.4.1. H-index
4.4.2. Kth-rank index
4.5. Results and Discussions
4.5.1. H-index
4.5.2. Kth-rank index
4.5.3. Proposed methods NWC-index and TSWC-index
4.5.4. Scenarios of NWC-index
4.5.5. Scenarios of TSWC-index
4.5.6. Comparison of TSWC-index and NWC-index
Chapter 5
5. Conclusions
References

List of Tables

Table 2.1: Concept Matrix
Table 3.1: NWC and TSWC scores of authors having one not same topic author
Table 3.2: NWC and TSWC scores of authors of more than one not same topic authors
Table 3.3: NWC and TSWC scores of same topic authors in paper
Table 3.4: h-index, NWC-index, TSWC-index and kth-rank index
Table 4.1: Rank of authors by their h-index
Table 4.2: Rank of authors by their kth-rank index
Table 4.3: Rank of authors by their NWC-index
Table 4.4: Rank of authors by their TSWC-index
Table 4.5: Rank of authors by their TSWC-index and variations with NWC-index
Table 4.6: Position relocation with respect to h-index: Position up
Table 4.7: Position relocation with respect to h-index: Position down
Table 4.8: Position stable with respect to h-index
Table 4.9: Position relocation with respect to kth-rank index: Position up
Table 4.10: Position relocation with respect to kth-rank index: Position down
Table 4.11: Position stable with respect to kth-rank index
Table 4.12: Position relocation with respect to h-index: Position up
Table 4.13: Position relocation with respect to h-index: Position down
Table 4.14: Position stable with respect to h-index
Table 4.15: Position relocation with respect to kth-rank index: Position up
Table 4.16: Position relocation with respect to kth-rank index: Position down
Table 4.17: Position stable with respect to kth-rank index
Table 4.18: Position relocation with respect to NWC-index: Position up
Table 4.19: Position relocation with respect to NWC-index: Position down
Table 4.20: Position stable with respect to NWC-index

List of Figures

Figure 2.1: Classification of Author Productivity Indexing
Figure 3.1: Latent Dirichlet allocation
Figure 4.1: Comparison of h-index, kth-rank index, NWC-index and TSWC-index
Figure 4.2: Comparison of h-index with NWC-index
Figure 4.3: Scenario 1: Position up with respect to h-index
Figure 4.5: Scenario 3: Position stable with respect to h-index
Figure 4.6: Scenario 1: Position up with respect to kth-rank index
Figure 4.7: Scenario 2: Position down with respect to kth-rank index
Figure 4.9: Scenario 1: Position up with respect to h-index
Figure 4.11: Scenario 3: Position stable with respect to h-index
Figure 4.12: Scenario 1: Position up with respect to kth-rank index
Figure 4.13: Scenario 2: Position down with respect to kth-rank index
Figure 4.14: Scenario 3: Position stable with respect to kth-rank index
Figure 4.15: Comparison of TSWC-index with NWC-index
Figure 4.16: Scenario 1: Position up with respect to NWC-index
Figure 4.17: Scenario 2: Position down with respect to NWC-index
Figure 4.18: Scenario 3: Position stable with respect to NWC-index

Chapter 1
Introduction

1. Introduction

A researcher’s success in any field is judged by his/her productivity and its impact. Productivity is the number of papers a researcher has published; it is the quantitative aspect of research. Impact is the number of citations that the publications have received; it is the qualitative aspect of research. The work produced by researchers is published in journals and presented at conferences. This work, whether published or unpublished, is acknowledged by others as a reference in their own work and thus gets cited.

1.1. Author Productivity Indexing

To identify prominent researchers, to measure the performance of individuals in research and to rank journals and conferences, a method of evaluation is needed. This method of evaluation is known as author productivity indexing. Different author productivity indexing methods have been proposed to evaluate the research productivity of researchers fairly and to remove the limitations of existing methods.

Initially, the Impact Factor (IF) [18] was used as a tool to quantify the research work produced. It is used to compare, evaluate and rank journals rather than researchers. A journal’s IF is the average number of citations received by papers published in that journal in the previous two years. To compare individual papers, people use the impact factor of the journal in which the paper was published.
However, publication in a high-IF journal does not mean that a paper will itself receive many citations, so this is not a fair ranking of individual papers. Some author productivity indexing methods consider only the total number of publications of a researcher. They thus deal only with the quantitative aspect of research: productivity is considered, but the impact of the publications is not. Some methods consider the total number of citations received by the papers, which measures impact.

Other indexing methods have been proposed for evaluating the scientific performance of individuals, for comparing researchers in the same field and in different fields, and for ranking them directly. These techniques consider different aspects of research work. Almost all of them consider the number of papers published by a researcher and the total number of citations received by those papers to evaluate a scientist’s research performance. When citations are counted, different conditions are taken into account; for example, in the case of a single-author paper, the author gets all of the credit. The average number of authors on scientific papers is increasing because complicated problems require more subspecialties [25]. When multiple authors have contributed to a paper, techniques are needed to assign them credit according to their contributions.

One of the well-known indexing methods, the h-index [20], was proposed in 2005. It is a single-valued index used for evaluating the scientific performance of researchers. It takes into account both the number of papers and the number of citations received by those papers.
The h-index is insensitive to highly cited papers [15], [26], so the g-index [15][16] and the h(2)-index [26] were proposed later as enhancements that remove this limitation. Different variations of the h-index and g-index were proposed later to overcome some of their limitations and add improvements, such as the A-index [22], the R and AR indices [23][24], the m-index [9], the e-index [35], and the k and w indices [5].

A flaw of these indexing methods is that, for a multi-authored paper, they all assign the total citations of the paper to each of its authors even though the contributions of the authors are not the same. To remove this limitation, some techniques were proposed that consider the number of collaborators who worked together and assign them credit according to their contributions (using different criteria), such as the hI-index [7], fractional h and g indices [17], hp-index [34], hap-index [13], hm-index [29][30], harmonic h-index [19], kth-rank [33], w [36], gm-index [31], ĥ-index [21], CCA h and g indices [27], hmc-index [28], and k-norm and w-norm [5]. Some techniques, such as the m-quotient [10], were proposed to take the researcher’s career length into account. Some indices based on combinations of existing indices, such as the hg-index [4] and the q²-index [11], were proposed to keep their advantages collectively and remove their disadvantages.

1.2. Why we use author productivity indexing

Author productivity indexing is a useful way of ranking researchers for the following purposes:
• To hire good faculty in universities and institutes.
• To find paper reviewers for conferences and journals.
• To find experts for project reviewing committees.
• To find good researchers for collaboration.
However, a researcher may receive a Nobel Prize for a single paper of exceptional quality in his/her field; the work of such a Nobel Prize winner cannot be evaluated properly using these indices [20].

1.3. Weighted Citation

Weighted citation is a quantitative scheme used to measure the contribution of an author in a multi-authored paper. A weighting criterion is used to assign the citations of a paper to its authors: the citations are allocated to the authors based on their contribution. Different indexing methods have been proposed that account for the number of co-authors and collaborators and assign weighted citations and credit to authors according to their contribution.

1.4. Topic sensitive weighted citation (our idea)

Existing weighting criteria do not assign weights to researchers according to their relatedness to the topic of the paper. Topic sensitive weighted citation means that weighted citations are assigned to the authors of a paper according to their topic relatedness. For example, suppose a paper on machine learning has four authors A1, A2, A3 and A4 and has been cited 50 times. Authors A1, A2 and A4 work on the topic of machine learning, while author A3 works on another topic. In this case, author A3 has limited knowledge of and contribution to this topic compared to A1, A2 and A4, so the 50 citations must be divided among the authors in such a way that A1, A2 and A4 receive the maximum citations according to their rank in the paper and A3 receives the minimum number of citations, as he/she does not work on this topic.

1.5. Research Contribution

By studying the indexing methods for assessing the research work of scientists, we noticed that none of these methods considers the topic relatedness of authors when assigning them the weighted citations of a paper. We decided to overcome this limitation in our research. Our contribution is to assign Topic Sensitive Weighted Citations to authors in multi-authored papers. We assign weighted citations to the authors of a paper by considering topic sensitivity as a key factor for evaluating researchers’ work. The proposed index is then calculated for researchers and the results are compared with our baseline methods.

1.6. Thesis Outline

The rest of the thesis is arranged as follows.

Chapter 2: The existing author productivity indexing methods are reviewed in detail, and their contributions and drawbacks are discussed. Based on this review, a problem statement is formulated and the objective of the research is discussed.

Chapter 3: The methodology used for our proposed solution and the baseline methods is described in detail.

Chapter 4: Our data set and the experiments performed on it are discussed. The results of our proposed idea are presented and compared with the baseline methods.

Chapter 5: The conclusion of our research is presented, and the research contribution and future work are discussed.
Chapter 2
Literature Review

2. Literature Review

Many techniques have been proposed for evaluating the scientific performance of researchers. We studied a number of research papers in which different techniques and indices were proposed for ranking researchers on the basis of their productivity and impact. As discussed in Chapter 1, all these indices and techniques are author productivity indexing methods; they use the publications of authors as the basis for assessing their research work. In this chapter, these indices and techniques are discussed in detail along with their strengths and weaknesses.

Initially, the Impact Factor was used. The Impact Factor (IF) [18] was proposed by Eugene Garfield in 1960. It was used to select journals for the Science Citation Index (SCI) and to compare and rank journals. A journal’s IF is the average number of citations received by papers published in that journal in the previous two years. If a journal has an IF of 4 in 2013, it means that the papers it published in 2011 and 2012 received 4 citations each on average in 2013. A journal with a high IF is considered more significant. The IF for 2013 is calculated by dividing the total number of citations received during 2013 by papers published in 2011 and 2012 by the total number of papers published in 2011 and 2012. Several problems are associated with the IF [32]: it does not statistically represent the individual articles of a journal or their real citations, citations are on average assigned to non-citable articles as well, and there are many others.
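For reference, the two-year calculation described above can be written compactly. The symbols are notation assumed here for illustration and do not appear in the thesis: C_2013(y) denotes the number of citations received during 2013 by papers the journal published in year y, and P_y the number of papers it published in year y. The numbers in the second line are hypothetical and only reproduce the IF of 4 used in the example.

```latex
\[
\mathrm{IF}_{2013}
  = \frac{C_{2013}(2011) + C_{2013}(2012)}{P_{2011} + P_{2012}}
\]
% Hypothetical numbers, for illustration only:
% 800 citations in 2013 to 200 papers published in 2011 and 2012.
\[
\mathrm{IF}_{2013} = \frac{800}{200} = 4
\]
```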
The same score, i.e. the journal impact factor, is assigned to all articles of a journal, whether they received many citations, few citations or none at all. Thus an article with few citations published in a high-IF journal is considered more effective than an article with more citations published in a low-IF journal. People use the impact factor of the journal in which a paper was published to compare individual papers. Using IF to compare individual papers is a dangerous misuse, and once it is realized that substituting IF for individual article citation counts makes no sense, it follows that using IF to evaluate the authors of those articles makes no sense either [3]. Some indexing methods consider only the total number of publications when evaluating the work of researchers and ignore the impact of the publications. Others consider the citations received by publications and thus measure the impact of the publications.

2.1. Classification of Author Productivity Indexing

We have categorized author productivity indexing methods according to the aspects they address. The categorization is as follows:

[Figure 2.1: Classification of Author Productivity Indexing. Categories: Citations; High Citations; Co-authors and Weighted Citations; Excess Citations; Time Based; Topic Based]

Figure 2.1 summarizes the indexing methods we studied and included in our literature review. We categorized these methods into classes on the basis of the aspects they address. Over time, these indexing methods have improved on the existing indexing methods.
2.1.1. Concept Matrix of the Indexing Methods

Table 2.1: Concept Matrix
Aspects (columns): Citations; High Citations; Co-authors and Weighted Citations; Excess Citations; Time Based; Topic Based
Indexing methods (rows): h-index [20]; g-index [15][16]; h(2)-index [26]; hI-index [7]; m-quotient [10]; A-index [22]; R-index [24]; AR-index [23][24]; Fractional h and g indices [17]; hp-index [34]; hap-index [13]; hm-index [29][30]; Harmonic h-index [19]; m-index [9]; kth-rank [33]; w-index [36]; e-index [35]; gm-index [31]; ĥ-index [21]; hg-index [4]; q²-index [11]; Fractional counting of authorship [12]; Positional and equal weights scheme [1]; CCA h and g indices [27]; hmc-index [28]; k and w indices [5]; k-norm and w-norm [5]; WFO, WFI [2]; p-index [6]

The concept matrix in Table 2.1 shows which indexing method addresses which aspects. All of the indexing methods use publications as a key factor for assessing the research work of scientists. Citations received by publications are used in different forms, such as high citation counts, weighted citations and excess citation counts; the methods that use them exposed the failure of previous indexing methods to address these aspects and made improvements. Some methods involve the time factor to assess scientists’ work according to their scientific age. We studied these indexing methods along with their contributions, limitations and advantages. We found that none of the indexing methods discusses a topic-based weight assignment scheme for authors, i.e. a scheme in which, if one author of a multi-authored paper works on the topic of the paper and the rest do not, that author gets a high weighted score while the others are penalized and their weighted scores are reduced. We propose the TSWC-index to consider the topics of authors when assigning weighted citations to them.

2.1.2. Citations Based Indexing

To rank scientists based on their productivity and its impact, different indexing methods have been proposed. An index for evaluating the scientific performance of researchers, called the h-index [20], was proposed by Hirsch JE in 2005. Unlike IF, it directly measures productivity (total publications or papers) and impact (total citations of the papers). The papers are arranged in descending order of citations received, and the h-index is the largest paper rank h for which the paper at that rank has received at least h citations; the subsequent papers have h or fewer citations each. Hirsch defines the h-index as follows: a scientist has index h if h of his or her N papers have ≥ h citations each and the other (N − h) papers have ≤ h citations each.

It is a single-number criterion for evaluating the scientific output of researchers and is easy to understand, but it cannot be used to evaluate the work of a Nobel Prize winner. Once papers have been counted in the h-index, it becomes insignificant whether they go on to receive more citations or none at all. Later, different indexing methods were proposed which removed its limitations. Some of the limitations discussed by other authors are that it is insensitive to highly cited papers [15] and that it ignores the career length of the researcher [10]. Zhang CT [35] criticized the h-index for two reasons.
The first is that excess citations are ignored, and the second is that the h-index is a natural number, so different researchers may have the same h-index. The h-index was also criticized for its insensitivity to the number of co-authors and their contributions [33], [36], [17], [7], [27], [30], [34] and [13]. Hirsch, who proposed the h-index, also noticed this and proposed the ħ-index [21], which takes the number of co-authors into account. A scientist has index ħ if ħ of his/her papers belong to his/her ħ-core; a paper belongs to the ħ-core of a scientist if it has at least ħ citations and also belongs to the ħ-core of each co-author of the paper. Thus each author is allocated either full credit or no credit at all, and senior authors are favored.

2.1.3. High Citations Based Indexing
L. Egghe proposed the g-index [15][16], which is more sensitive to highly cited papers. The g-index measures the performance of a researcher or journal by considering the top papers and counting the citations they received. Its calculation is as simple as that of the h-index. Papers are arranged in descending order of citations received, and the g-index is the highest number g of papers that jointly received g² or more citations. For any researcher, g is greater than or equal to h. Another variation of the g-index and h-index, known as the h(2)-index, was proposed by Kosmulski M [26]; like the g-index, it gives more weight to highly cited papers. It is defined as the highest natural number h(2) such that the author's h(2) most cited papers each received at least [h(2)]² citations. For example, if an author has an h(2)-index of 10, then the author has published at least 10 papers, each of which has been cited at least 100 times.
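The ranking rules behind the h-index, g-index and h(2)-index can be illustrated with a short Python sketch. The function names are illustrative and are not taken from the cited papers; the g-index is capped here at the number of published papers, which is an assumption about the convention in use.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    # In a descending list, c >= rank holds for ranks 1..h and fails afterwards.
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the top g papers jointly have at least g^2 citations.

    Assumption: g cannot exceed the number of published papers.
    """
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g


def h2_index(citations):
    """Largest k such that the k most cited papers have at least k^2 citations each."""
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank * rank)


if __name__ == "__main__":
    sample = [10, 8, 5, 4, 3]  # hypothetical citation counts
    print(h_index(sample), g_index(sample), h2_index(sample))
```

For the hypothetical list above, h is at most g, and h(2) is at most h, in line with the relations stated in the text.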
The h(2)-index of an author never exceeds his/her h-index, so it requires less work to verify the authorship of the relevant papers, especially when different scientists share the same first and last names. Later, Jin [22] proposed the A-index, which is the average number of citations of the published papers included in the h-core: the total number of citations of the h-core papers is divided by h. The A-index and the g-index both take into account the total number of citations and the highly cited papers included in the h-core, so the A-index can increase when the h-index remains the same while the citations of the papers increase. Jin et al [24] noticed that, because the total number of citations in the h-core is divided by h when calculating the A-index, a better researcher is punished for having a higher h-index. To get rid of this problem, Jin et al [24] proposed the R-index, which is calculated by taking the square root of the total number of citations of the papers in the h-core. A variation of the A-index called the m-index, based on the median instead of the average, was proposed by Bornmann et al [9]. The m-index is the median number of citations of the published papers included in the h-core. The median is calculated by arranging the papers in the h-core in decreasing order of citations and choosing the middle one. The median was chosen because the distribution of citation counts is usually skewed. Some indices combine existing indices in order to keep their advantages and remove their disadvantages. One of them is the hg-index, proposed by Alonso et al [4], which combines the h-index and the g-index. The hg-index is defined as the geometric mean of the h and g indices.
It is good for comparing researchers with the same h-index but different numbers of papers and citations. It takes highly cited papers into account and also minimizes the impact of a single highly cited paper. In order to combine the number of papers with the impact of papers, Cabrerizo et al [11] proposed the q2-index. It is the geometric mean of the h-index and the m-index. The h-index reflects the number of papers while the m-index reflects the impact of the papers in the h-core, so the advantages of the two indices are combined. Its computation is simple and it increases when either of the two indices increases.

2.1.4. Time Based Indexing
Burrell [10] noticed that the h-index ignores the career length of the researcher and proposed the m-quotient, which depends on the researcher's career length. It applies to a researcher whose career began some years ago and who is still active. To calculate the m-quotient, the h-index is divided by the career length, i.e. the total number of years since the first paper was published. After proposing the R-index [24], Jin et al [23][24] also proposed an age dependent R-index named the AR-index. The AR-index is defined as the square root of the sum, over the papers in the h-core, of the average number of citations per year of each paper. The AR-index takes into account the age of the papers together with the total number of citations in the h-core, so it allows the index of a researcher to decrease with the passage of time.

2.1.5. Excess Citations Based Indexing
Zhang C T [35] criticized the h-index for two reasons. The first is that excess citations are ignored, and the second is that the h-index is a natural number, so different researchers may have the same h-index. To address these limitations, he proposed a complement of the h-index called the e-index. The e-index represents the ignored excess citations and is used to compare the scientific output of researchers having the same h-index.
The e-index is calculated as:

$$ e = \sqrt{\sum_{j=1}^{h} cit_j - h^2} \tag{2.1} $$

where $\sum_{j=1}^{h} cit_j$ is the citations received by the papers from rank 1 to rank h, and $h^2$ is the square of the h-index. The e-index is a real number that ranges between 0 and ∞, so for a fair comparison of researchers the e-index must be used together with the h-index, because together they give complete information about the citations in the h-core. In order to evaluate scientific output fairly in the social sciences and humanities, where research papers receive fewer citations, Anania et al [5] proposed two indices named the k-index and the w-index. Both indices are defined over the real numbers. The integer part is equal to the h-index and the fractional part reflects the excess citations. The k-index only accounts for the total citations of one's most cited papers, i.e. the papers included in the h-core. The k-index is calculated as:

$$ k = h + \left(1 - \frac{h^2}{\sum_{i=1}^{h} c_i}\right), \quad \forall h > 0 \tag{2.2} $$

and k = 0 when h = 0, where $c_i$ is the number of citations of the paper at rank i. The w-index is defined as:

$$ w = h + \left(1 - \frac{h^2}{\sum_{i=1}^{N} c_i}\right), \quad \forall h > 0 \tag{2.3} $$

and w = 0 when h = 0, where N is the total number of published papers. The w-index accounts for the total citations of all the papers published. Both indices therefore increase as the papers receive more citations.

2.1.6. Co-authors and Weighted Citations Based Indexing
As the h-index is used to compare researchers of the same field, Batista et al [7] proposed the hI index for comparing researchers of different fields such as Physics, Chemistry, Biology and Mathematics. hI also considers the co-authors of the papers. hI is effectively the number of papers that a researcher would have written alone and that have at least hI citations each. hI is calculated as h² divided by the total number of authors of the h papers in the h-core, where h² is the square of the h-index.
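The excess-citation indices of Section 2.1.5 and the hI index can be sketched in Python as follows. This is an illustration under stated assumptions: the e-index is taken as the square root of the h-core citations in excess of h², and the k-index and w-index are taken as h plus the fraction 1 − h²/C, where C is the citation total of the h-core (for k) or of all papers (for w). The function names and the (citations, number of authors) pair layout are hypothetical.

```python
import math


def h_core(citations):
    """Return (h, citations of the h-core papers) for a list of citation counts."""
    ranked = sorted(citations, reverse=True)
    h = sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)
    return h, ranked[:h]


def e_index(citations):
    """Square root of the h-core citations that exceed h^2 (assumed form of Eq. 2.1)."""
    h, core = h_core(citations)
    return math.sqrt(sum(core) - h * h)


def k_index(citations):
    """h plus a fractional part from h-core citations (assumed form of Eq. 2.2)."""
    h, core = h_core(citations)
    return 0.0 if h == 0 else h + (1 - h * h / sum(core))


def w_index(citations):
    """h plus a fractional part from all citations (assumed form of Eq. 2.3)."""
    h, _ = h_core(citations)
    return 0.0 if h == 0 else h + (1 - h * h / sum(citations))


def hi_index(papers):
    """h^2 divided by the total number of authors of the h-core papers.

    `papers` is a list of (citations, number_of_authors) pairs.
    """
    ranked = sorted(papers, key=lambda p: p[0], reverse=True)
    h = sum(1 for rank, (c, _) in enumerate(ranked, start=1) if c >= rank)
    if h == 0:
        return 0.0
    authors_in_core = sum(n for _, n in ranked[:h])
    return h * h / authors_in_core


if __name__ == "__main__":
    cites = [10, 8, 5, 4, 3]  # hypothetical citation counts
    print(e_index(cites), k_index(cites), w_index(cites))
    print(hi_index([(10, 1), (8, 2), (5, 3), (4, 1), (3, 2)]))
```

Under these assumptions a researcher who wrote every h-core paper alone gets hI equal to h, which matches the single-author case described in the text.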
If a researcher has index h and wrote those h papers alone, then the total number of authors equals h, and dividing h² by h gives h; in this case hI equals h, whereas with co-authors hI is less than h. Its limitations are that its value decreases when papers with many authors in the h-core are cited more often, and that it is restricted to the h-core, so single-authored papers ranked between hI and h are not taken into account [30][29]. Egghe L [17] also criticized the h and g indices for their insensitivity to the number of co-authors and proposed fractional h and g indices, which handle the co-author information of papers fractionally in two ways.

a) By fractional paper counts, in which the citation scores remain unchanged so the order of the papers does not change. Paper i, which has $\varphi(i)$ authors, takes a rank of $1/\varphi(i)$, so the ranks 1, 2, 3, ... are replaced by $\frac{1}{\varphi(1)}$, $\frac{1}{\varphi(1)} + \frac{1}{\varphi(2)}$, $\frac{1}{\varphi(1)} + \frac{1}{\varphi(2)} + \frac{1}{\varphi(3)}$ and so on respectively. The fractional h-index, hF, is the largest fractional rank such that

$$ h_F = \sum_{i=1}^{r} \frac{1}{\varphi(i)} \le c_r \tag{2.4} $$

where $c_r$ is the number of citations of the paper at rank r. The fractional g-index, gF, is the largest fractional rank such that

$$ g_F = \sum_{i=1}^{r} \frac{1}{\varphi(i)}, \qquad g_F^2 \le \sum_{i=1}^{r} c_i \tag{2.5} $$

b) By fractional citation counts, in which an author receives $c_i/\varphi(i)$ credit for paper i if that paper has $c_i$ citations and $\varphi(i)$ authors. After the fractional citations $c_i/\varphi(i)$ have been calculated for each paper, the papers are arranged in descending order of fractional citations and the fractional h and g indices are then calculated. The fractional h-index, hf, is the largest rank r such that

$$ \frac{c_r}{\varphi(r)} \ge r \tag{2.6} $$

The fractional g-index, gf, is the largest rank r such that

$$ \sum_{i=1}^{r} \frac{c_i}{\varphi(i)} \ge r^2 \tag{2.7} $$

In some cases the gF value increases, which is not reasonable: the citations of a paper are distributed among its co-authors, so the g-index should be reduced under fractional counting [31]. Another disadvantage is that the papers are rearranged after the fractional citations of each paper have been calculated, so highly cited papers with many authors drop out of the core [30]. Batista et al [7] and Egghe [17] considered the number of co-authors of papers when calculating the hI index and the fractional h and g indices respectively. Wan et al [34] proposed another variation of the h-index named the pure h-index, denoted by hP, that takes into account not only the number of co-authors of a paper but also the author's relative rank in the paper. This approach is also applied to the R-index, giving the pure R-index, denoted by RP, which also considers the total number of citations in the h-core. To calculate hP, a normalized score is calculated for each author of a paper such that the scores of all authors of the paper sum to one. For normalized scoring, fractional counting, proportional counting, geometric counting etc. may be used. The RP-index is the same as the R-index except that the citations of an author's papers are divided by the average number of co-authors of that author in the h-core. Chai et al [13] criticized the hP-index for favoring multi-authored papers and proposed an improvement called the adapted pure h-index, denoted by hap. This index favors single-authored papers. It also changes the h-core to the hap-core, in which the citations of every article are greater than or equal to hap. Schreiber M [29][30] proposed a modification of the h-index called the hm-index that removes the limitations of the hI-index and the hf-index.
It is defined as the reduced number of papers that have been cited hm or more times. Each paper is counted fractionally according to the inverse of its number of authors, and the cumulative sum of these fractions is called the effective rank, denoted by $r_{\mathrm{eff}}$. hm is calculated as:

$$ h_m = \max_{r} \left( r_{\mathrm{eff}}(r) \le C(r) \right) \tag{2.8} $$

where C(r) is the total citations of the paper at rank r. Calculating the hm-index does not require reordering the papers, and its value increases when papers with many authors in the h-core receive more citations. Another method of sharing the credit of a paper among all of its co-authors, called harmonic credit allocation, was proposed by Hagen [19]. In this method, each author receives credit according to his/her rank and the number of co-authors N, such that the first author receives the most credit and authors at higher ranks receive less, i.e. the ith author receives more credit than the (i+1)th author. As the number of authors increases, the credit per author decreases. The credit of the ith author is calculated as:

$$ \frac{1/i}{1 + \frac{1}{2} + \cdots + \frac{1}{N}} \tag{2.9} $$

The credit of the ith author is then multiplied by the original citations of the paper to obtain his/her contribution in determining the harmonic h-index. Another variant of the h-index, known as the kth-rank index, was proposed by Sekercioglu C H [33]; it quantifies the contributions of co-authors according to their rank in the paper. According to Sekercioglu, the kth-ranked co-author contributes 1/k as much as the first author. For example, in a paper with 3 authors, the first author contributes 1 because he/she has rank 1, the second author contributes 1/2 and the third author contributes 1/3. Zhang C T [36] also criticized the h-index for its insensitivity to the number of co-authors and to their individual contributions according to their ranks in a paper.
For this purpose he proposed the weighted h-index ‘w’, which is based on weighted citations. He also criticized the kth-rank index [33], arguing that the last author is usually the corresponding author, so the first and last authors should take full credit while the credit of the remaining co-authors is decreased such that the sum of the weights of these authors is one. In other words, the weighted h-index based on weighted citations was proposed in order to quantify the contribution of researchers according to their rank in a particular article. Later Schreiber M [31] proposed a modification of the g-index named the gm-index, which also improves on gF [17]; in the gm-index, fractional counting of papers always reduces the g-index. It is defined as the highest effective rank that is less than or equal to the effective number of citations on average. Another variation for fractionally quantifying research output in the case of multiple authors was proposed by Carbone V [12]. In this approach, the total number of citations of each paper is divided by the square root of the number of co-authors of that paper. Like fractional counting, this approach divides the citations of a paper equally among all of its co-authors and does not take the rank of the authors into consideration. Other enhancements to the h-index, known as weighted h-indices and weighted citation h-cuts, were proposed by Abbas A M [1]; they take into account the number of authors of a paper. He proposed two weight assignment schemes: one assigns weights to authors according to their positions in a paper and the other assigns them equal weights. The weighted contributions of authors are taken into account in the weighted h-indices, and the excess citation count is taken into account in the weighted citation h-cuts. Another variation of the h-index, which allocates credit to the authors of a multi-authored paper according to their contribution, was proposed by Liu et al [27].
This method of allocating credit lies between fractional counting and harmonic credit allocation, so it is called the combined credit allocation (CCA) method. The proposed indices are known as the CCA h-index, hc, and the CCA g-index, gc. The first author and the corresponding authors are the most important authors (MIAs), so the order of authors is arranged such that the MIAs tie for the first rank. Their normalized credit allocated proportion in a paper is calculated as:

( , ) = ∑ ∑ (2.10)

Here the total number of MIAs in the paper and the total number of authors n of that paper enter the formula, and the value of q is set to 2/3. The normalized credit allocated proportion of the rth author in an n-authored paper is:

( , ) = ∑ (2.11)

If a paper has been cited c1 times, then the citations allocated to the rth author are c1 × P(r, n). The allocated citations produced by this method replace the original citations of each paper for the rth author in determining hc and gc. The CCA h-index needs the papers to be reordered once citations have been allocated to each author of a multi-authored paper [28]. To remove this limitation, Liu et al [28] proposed a modification of the h-index known as the hmc-index. The hmc-index uses the framework of the hm-index [29][30], but instead of fractional counting it uses the combined credit allocation (CCA) method, so that authors with different contributions gain different credit. This method uses an effective paper count instead of an effective rank. The effective paper count is the sum of the normalized credit allocated proportions that an author has obtained in r papers:

$$ p_{\mathrm{eff}}(r) = \sum_{i=1}^{r} P(\mathrm{rank}(i), n(i)) \tag{2.12} $$

where rank(i) is the rank of the author in the ith paper and n(i) is the number of authors of the ith paper. hmc is defined as:

$$ h_{mc} = \max_{r} \left( p_{\mathrm{eff}}(r) \le C(r) \right) \tag{2.13} $$

In some cases an author with a lower contribution gets a higher hmc-index.
To remove this, the rational hmc, denoted as the hmcr-index, is used to assign a higher hmcr to the author with more contribution. The k and w indices proposed in [5] were also normalized to account for co-authors by using normalized citations, which gives the k-norm and w-norm indices. Normalized citations for a researcher in a multi-authored paper are obtained by dividing the total citations of each paper by its number of authors. Abramo et al [2] measured the distortion introduced into the performance ranking of individuals when the number of co-authors, their rank and their contribution are ignored in multi-authored papers. They proposed two yearly productivity measures, Weighted Fractional Output (WFO) and Weighted Fractional Impact (WFI). WFO is a paper based weight of a co-author according to his/her rank, and WFI also considers the citations of the papers. As another method of correcting the citations of co-authors in multi-authored papers, Aziz et al [6] introduced a harmonic weighting algorithm that considers the order and number of the co-authors who contributed to the paper, and proposed the “profit (p)-index”, which estimates the extent to which authors profit from the contributions of their co-authors. It shows that the contribution of co-authors to the work of an author is significant. The p-index ranges from 0 to 1, and a higher value indicates a greater contribution of co-authors to an author's papers.

2.1.7. Topic Based Indexing
Topic based indexing means an indexing technique that assigns weighted citations to the co-authors of a paper according to their topics. None of the indexing methods has discussed this issue.
We have chosen this as our proposed idea: weights are assigned to co-authors so that, if an author does not work on the topic of the paper, his/her weight is reduced and the weighted scores of the other authors who do work on that topic are increased according to their rank in the paper.

2.2. Problem Statement
Suppose four authors A1, A2, A3 and A4 have contributed to a paper on machine learning. Authors A1, A2 and A4 work on that topic while A3 does not. Then the contributions of A1, A2 and A4 clearly carry more weight than that of A3, because the contribution of a researcher whose topic of interest matches the paper is more useful than that of a researcher who contributed to the paper but has a different topic of interest. By studying different author productivity indexing methods in the literature review, we noticed that none of them considers the topic based contribution of co-authors in multi-authored papers or assigns weighted citations to authors according to their topics. Topic sensitive weighted citations show the worth of researchers' contributions to a paper on the basis of their topics. We propose a quantitative method that assigns weighted citations to co-authors according to their topics.

2.3. Objective of Research
None of the author productivity indexing methods discussed in the literature review considers the topics of authors when assigning them weighted citations. The objective of our research is to address this limitation of the previous indexing methods and to introduce our new index. Following are the main objectives of our research.
To assign Normalized Weighted Citations (NWC-index) to co-authors of a paper.
To propose the TSWC-index (Topic Sensitive Weighted Citations index), which increases or decreases the NWC score of the co-authors of a paper according to their relatedness to the topic of the paper.
To compare the results of our proposed methods with the baseline methods, h-index and kth-rank index.
We have used LDA to cluster authors based on the titles of their papers. The clusters are used to identify the topics of authors and to calculate the TSWC-index based on those topics.

Chapter 3 Methodology

3. Methodology
Methodology refers to the methods applied to a specific field of study. In this chapter we discuss the methodology used for our research. We have chosen the h-index and the kth-rank index as our baseline methods. We have applied our proposed method to the dataset in order to evaluate our proposed solution. The results are then compared with the baseline methods.

3.1. Baseline Methods
3.1.1. h-index
The h-index was proposed by Hirsch J E [20] to assess the scientific productivity of researchers. Hirsch defined the h-index as: “A scientist has index h if h of his or her N papers have at least h citations each and the other (N − h) papers have ≤ h citations each”. Mathematically,

$$ h = \sqrt{\frac{N_{c,tot}}{a}} \tag{3.1} $$

where $N_{c,tot}$ is the sum of the citations of all papers of the scientist and a is a proportionality constant that ranges between 3 and 5. We have set its value to 4. For example, if scientist A1 has published 10 papers and the total number of citations received by these papers is 392, then $h = \sqrt{392/4} = \sqrt{98} \approx 9.90$. The h-index assigns all of the citations of a paper to each of its co-authors.

3.1.2. kth-rank Index
The kth-rank index was proposed by Sekercioglu C H [33]; it quantifies the contributions of co-authors according to their rank in the paper. According to Sekercioglu, the kth-ranked co-author contributes 1/k as much as the first author. For example, in a paper with 3 authors, the 1st author contributes 1 because he/she has rank 1, the 2nd author contributes 1/2 and the 3rd author contributes 1/3. If a paper has received 50 citations, then the 1st author receives 50 citations, the 2nd author receives 25 and the 3rd author receives 17 (50/3, rounded). The citations assigned to the co-authors of the paper sum to 92, which is greater than the total number of citations (50) received by the paper. For this reason we propose a method that divides the total citations received by a paper in such a way that authors receive citations according to their rank.

3.2. Proposed Method
None of the indexing methods discussed in the literature review covers topic based weighted citations for authors of multi-authored papers. We have proposed two methods of assigning weights to authors. The first is Normalized Weighted Citations (NWC), which assigns weighted citations to the authors of a paper. The names of authors are arranged in a paper according to their contribution, so this method divides the total number of citations received by a paper among its authors according to their rank in the paper. The second method is Topic Sensitive Weighted Citations (TSWC), which increases or decreases the NWC score of authors on the basis of their topics. Topic similarity is checked as follows: if an author of a paper does not have the same topic as the first author of that paper, his/her NWC score is reduced, and the NWC scores of the first author and of the other authors having the same topic as the first author are increased.
We have also calculated the NWC-index and TSWC-index and compared the results with the h-index and kth-rank index. Before calculating NWC and TSWC, we cluster authors on the basis of the titles of their papers in order to identify their topics. The topic with the maximum probability for an author is considered his/her topic of interest. For clustering, we have used Latent Dirichlet Allocation (LDA).

Algorithm
Step 1: Using LDA, build 100 author-wise clusters based on the authors' papers. Take as an author's topic the one for which he/she has the maximum probability.
Step 2: Calculate the Normalized Weighted Citations (NWC) score of each author in each paper:

$$ NWC_i = \frac{N - i + 1}{\sum_{k=1}^{N} k} \times Cit $$

where i is the rank of the author in the paper, N is the total number of authors of that paper and Cit is the total citations received by that paper.
Step 3: Compare the topic of each author of a paper with the topic of its first author.
1) For an author with the same topic, use the NWCi calculated in Step 2.
2) For an author without the same topic, decrease the score to NWCi / 2 and round the result.
Step 4: Calculate the weight of each same-topic author in the value calculated in Step 3(2) for the not-same-topic author:

$$ NWC_j = \frac{M - j + 1}{\sum_{k=1}^{M} k} \times NWC $$

where j is the rank of the same-topic author among the M same-topic authors of the paper and NWC is the value calculated in Step 3(2).
Step 5: Calculate the Topic Sensitive Weighted Citations (TSWC) of each same-topic author by adding the value calculated in Step 4 to the value from Step 3(1) for that author. Use the value calculated in Step 3(2) as the TSWC of a not-same-topic author.
Step 6: Calculate the NWC-index and TSWC-index of each author in the same way as the h-index.
Step 7: Compare the NWC-index and TSWC-index with the baseline methods.
Note: If a paper has more than one not-same-topic author, the values calculated in Step 3(2) are summed and Step 4 is then applied to the sum.
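Steps 2 to 5 of the algorithm can be sketched in Python as follows. This is a minimal illustration and not the implementation used in the thesis. It assumes that the rank weight of the author at rank i among N authors is (N − i + 1) / (1 + 2 + ... + N), that halves are rounded up, and that in Step 4 the same-topic authors are re-ranked among themselves; these assumptions were chosen because they are consistent with the worked examples of this chapter. The function names and the boolean `same_topic` list are hypothetical.

```python
from fractions import Fraction
from math import floor


def round_half_up(x):
    """Round to the nearest integer, with halves rounded up (4.5 -> 5)."""
    return floor(x + Fraction(1, 2))


def rank_share(rank, n):
    """Assumed rank weight: (n - rank + 1) / (1 + 2 + ... + n)."""
    return Fraction(n - rank + 1, n * (n + 1) // 2)


def nwc_scores(n_authors, cit):
    """Step 2: split the citations of one paper among its authors by rank."""
    return [round_half_up(rank_share(i, n_authors) * cit)
            for i in range(1, n_authors + 1)]


def tswc_scores(same_topic, cit):
    """Steps 3 to 5 for one paper.

    same_topic[i] is True when the author at rank i + 1 shares the topic of
    the first author (so same_topic[0] is always True).
    Returns the pair (NWC scores, TSWC scores).
    """
    nwc = nwc_scores(len(same_topic), cit)
    tswc = list(nwc)
    pool = 0  # summed reduced scores of the not-same-topic authors
    for i, same in enumerate(same_topic):
        if not same:
            tswc[i] = round_half_up(Fraction(nwc[i], 2))  # Step 3(2)
            pool += tswc[i]
    same_idx = [i for i, same in enumerate(same_topic) if same]
    for j, i in enumerate(same_idx, start=1):  # Step 4, then Step 5
        tswc[i] += round_half_up(rank_share(j, len(same_idx)) * pool)
    return nwc, tswc


if __name__ == "__main__":
    # Four authors, 15 citations, third author off-topic.
    print(tswc_scores([True, True, False, True], 15))
```

Exact fractions are used instead of floating point so that values such as 4.5 are rounded predictably; Python's built-in `round` rounds halves to the nearest even integer and would give different scores.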
Example: A paper has four authors A, B, C and D with ranks 1, 2, 3 and 4 respectively. The paper has 15 citations. The first author, A, has the maximum contribution to the paper, so his/her topic is taken to be the topic of the paper. B and D have the same topic as A, while C does not. According to Step 2 of the proposed method, the NWC score of each author is:

A's NWC = 4/10 × 15 = 6
B's NWC = 3/10 × 15 = 4.5 ≈ 5
C's NWC = 2/10 × 15 = 3
D's NWC = 1/10 × 15 = 1.5 ≈ 2

According to Step 3, authors B and D have the same topic as A, so their NWC scores remain 5 and 2 respectively, and A's NWC score remains 6 as calculated in Step 2. C's topic is not the same as A's topic, so his/her NWC score is decreased:

C's NWC = 3/2 = 1.5 ≈ 2

According to Step 4, the weights of authors A, B and D in the NWC score of C calculated in Step 3 are:

A's NWC = 3/6 × 2 = 1
B's NWC = 2/6 × 2 ≈ 1
D's NWC = 1/6 × 2 ≈ 0

In Step 5, we calculate the TSWC scores of A, B and D by adding their weights calculated in Step 4 to their NWC scores from Step 3:

A's TSWC = 6 + 1 = 7
B's TSWC = 5 + 1 = 6
D's TSWC = 2 + 0 = 2

For C we use the value calculated in Step 3:

C's TSWC = 2

The following tables show the NWC and TSWC of the co-authors of a paper. The Topic column indicates whether the area of interest (topic) of an author matches the topic of the paper (True) or not (False). For the first author of a paper the topic is always True; for the other co-authors of the paper we check their topics and assign weights to them relative to the topic of the first author.
In Table 3.1, A1 is the first author because of his/her maximum contribution to the paper, so his/her topic is True for the paper. This means that he/she has that area of interest and we have no doubt about his/her contribution to the paper. The contribution of the other co-authors is judged by their topic similarity to the first author. A2 and A4 have the same topic as author A1, and A3 does not. The papers of a co-author may be assigned to different clusters or topics. In that case, we take as the author's topic the one for which he/she has the maximum probability.

Table 3.1: NWC and TSWC scores of authors having one not same topic author
Author  Rank  Topic  Cit  NWC  TSWC
A1      1     True   15   6    7
A2      2     True   15   5    6
A3      3     False  15   3    2
A4      4     True   15   2    2

In Table 3.1, we have decreased the TSWC of A3 because he/she does not have the same topic as A1. A2 and A4 have the same topic as A1. The TSWC scores of A1, A2 and A4 are computed by adding their shares of A3's reduced score, since they share the same topic of interest. Another example is presented in Table 3.2, in which more than one author has a different topic from author A1.

Table 3.2: NWC and TSWC scores of authors of more than one not same topic authors
Author  Rank  Topic  Cit  NWC  TSWC
A1      1     True   15   6    12
A5      2     False  15   5    3
A6      3     False  15   3    2
A7      4     False  15   2    1

In Table 3.2, the TSWC scores of A5, A6 and A7 have decreased and only the TSWC of A1 has increased, by assigning to A1 the combined reduced citations of A5, A6 and A7. Another example is presented in Table 3.3, which consists of three authors having the same topic.

Table 3.3: NWC and TSWC scores of same topic authors in paper
Author  Rank  Topic  Cit  NWC  TSWC
A4      1     True   16   8    8
A8      2     True   16   5    5
A1      3     True   16   3    3

All three authors of the paper in Table 3.3 have the same topic, so we have not decreased their NWC scores. They have the same NWC and TSWC scores.
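Once per-paper scores have been summed for an author, an index can be derived from the total with Equation (3.1). The short Python sketch below shows this under two assumptions that the text does not state explicitly: the same constant a = 4 is used for every index, and the result is truncated to an integer. With these assumptions the totals 46, 20, 27 and 35 listed for author A1 in Table 3.4 map to 3, 2, 2 and 2. The function name is hypothetical.

```python
import math


def index_from_total(total_citations, a=4):
    """Index from a citation total via h = sqrt(N_c,tot / a), truncated.

    Assumption: truncation to an integer; the worked example in Section 3.1.1
    reports the untruncated value instead.
    """
    return int(math.sqrt(total_citations / a))


if __name__ == "__main__":
    # Totals for author A1: citations, NWC, TSWC and kth-rank citations.
    for total in (46, 20, 27, 35):
        print(total, index_from_total(total))
```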
The h-index, NWC-index, TSWC-index and kth-rank index of A1, A2 and A4 are:

Table 3.4: h-index, NWC-index, TSWC-index and kth-rank index

Author  Cit  h-index  NWC  NWC-index  TSWC  TSWC-index  Kth-rank  Kth-rank index
A1      46   3        20   2          27    2           35        2
A2      15   1        5    1          6     1           8         1
A4      31   2        10   1          9     1           20        2

In Table 3.4, author A1 has an h-index of 3. After calculating NWC and TSWC, his/her NWC-index and TSWC-index become 2, which is less than the h-index and equal to the kth-rank index. The NWC-index and TSWC-index of author A2 remain the same as the h-index and kth-rank index, which are 1. A4's h-index and kth-rank index are 2, which is greater than his/her NWC-index and TSWC-index of 1. In our proposed methods, the NWC-index and TSWC-index are always less than or equal to the h-index and kth-rank index, because the citations of a paper are divided among its authors and no author of a multi-authored paper receives the total citations; only the author of a single-author paper does. As a result, the NWC-index and TSWC-index never exceed the baseline indices.

3.3. Latent Dirichlet Allocation

Hard clustering assigns a document to exactly one cluster without dealing with semantics. This problem motivates topic modeling techniques based on a latent topic layer. A topic model is used for discovering topics in a collection of documents. It generates soft clusters and deals with the semantics of words in documents. Latent Dirichlet Allocation (LDA), proposed by Blei et al. [8], is used in topic modeling. It clusters words into topics. The latent topic layer allows a document to belong to more than one cluster. The K-means algorithm is an example of hard clustering; it would assign only one topic to an author. We have used the generative probabilistic model LDA to cluster authors on the basis of the titles of their papers, which allows multiple topics per author.
A generative probabilistic model generates observable data randomly once the hidden parameters are specified. It presents a joint probability distribution over label sequences and observations. For each document d, a multinomial distribution over topics Z is sampled. The multinomial distribution generalizes the binomial distribution: a binomial distribution describes n independent trials with two possible outcomes each, whereas in a multinomial distribution each trial has k possible outcomes whose probabilities sum to 1.

The basic terminology and notation used in LDA are word, document and corpus. A word is an item of the vocabulary; it is a unit of discrete data denoted by w. A document is a collection of words denoted by W = {w1, w2, ..., wN}, where wn is the nth word. A corpus is a collection of documents denoted by D = {W1, W2, ..., WM}. The latent topic layer is Z = {Z1, Z2, Z3, ..., Zi}, where Zi is a latent topic assigned to the words wd of document d. The Dirichlet parameter α governs the topic distribution per document and the parameter β governs the word distribution per topic. LDA assumes the following generative process for each document W in corpus D.

1. Select N ~ Poisson(ξ).
2. Select θ ~ Dir(α).
3. For each of the N words wn:
   a) Select a topic zn ~ Multinomial(θ).
   b) Select a word wn from p(wn | zn, β), a multinomial probability conditioned on the topic zn.

In the basic model, the following assumptions are made. The dimensionality k of the Dirichlet distribution, and therefore of the topic variable z, is assumed known and fixed. The word probabilities are parameterized by a k × V matrix β, where βij = p(wj = 1 | zi = 1). The Poisson assumption is not critical to anything that follows, and a more realistic document length distribution may be used when needed.
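The generative process above can be sketched in a few lines of Python using only the standard library. This is an illustration under stated assumptions, not the C++ implementation used in this work: the function names, the toy vocabulary and the parameter values are hypothetical, the Dirichlet draw is obtained by normalizing independent gamma variates, and the Poisson draw uses Knuth's multiplication method, which is only suitable for small means.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw from Poisson(mean) with Knuth's method (fine for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def sample_dirichlet(rng, alpha):
    """Draw theta ~ Dir(alpha) by normalizing gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(rng, alpha, beta, vocabulary, mean_length):
    """Follow steps 1-3: length, topic mixture, then a topic and a word per position."""
    n_words = sample_poisson(rng, mean_length)                        # step 1
    theta = sample_dirichlet(rng, alpha)                              # step 2
    words = []
    for _ in range(n_words):                                          # step 3
        topic = rng.choices(range(len(alpha)), weights=theta)[0]      # step 3a
        words.append(rng.choices(vocabulary, weights=beta[topic])[0]) # step 3b
    return theta, words


# Hypothetical toy setting: two topics over a four-word vocabulary.
vocabulary = ["query", "index", "neural", "network"]
beta = [[0.5, 0.5, 0.0, 0.0],   # topic 0 favours database words
        [0.0, 0.0, 0.5, 0.5]]   # topic 1 favours learning words
theta, words = generate_document(random.Random(0), [0.5, 0.5], beta, vocabulary, 6)
print(theta, words)
```

Each row of beta plays the role of one row of the k × V word probability matrix, and theta is the per-document topic mixture drawn in step 2.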
Figure 3.1: Latent Dirichlet allocation

The boxes are plates. The outer plate represents documents, and the inner plate represents the repeated choice of topics and words within a document. Each document contains N words w, each with a topic assignment z. Unshaded and shaded circles represent hidden variables and observable variables respectively. The inner plate repeats N times, once for each word, and the outer plate repeats M times, once for each document. The parameters α and β are corpus-level parameters, sampled once in generating a corpus; they are estimated with the variational EM algorithm [8].

A k-dimensional Dirichlet random variable θ satisfies θ_i ≥ 0 and \sum_{i=1}^{k} \theta_i = 1. It has the following probability density:

\[ p(\theta \mid \alpha) = \frac{\Gamma\left(\sum_{i=1}^{k} \alpha_i\right)}{\prod_{i=1}^{k} \Gamma(\alpha_i)} \, \theta_1^{\alpha_1 - 1} \cdots \theta_k^{\alpha_k - 1} \tag{3.1} \]

Given the parameters α and β, the joint distribution of a topic mixture θ, a set of N topics z and a set of N words w is given by:

\[ p(\theta, z, w \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{n=1}^{N} p(z_n \mid \theta) \, p(w_n \mid z_n, \beta) \tag{3.2} \]

where p(z_n | θ) is simply θ_i for the unique i such that

\[ z_n^i = 1 \tag{3.3} \]

Summing over z and integrating over θ, the marginal distribution of a document is obtained as:

\[ p(W \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{n=1}^{N} \sum_{z_n} p(z_n \mid \theta) \, p(w_n \mid z_n, \beta) \right) d\theta \tag{3.4} \]

Finally, the probability of a corpus is calculated by multiplying the marginal probabilities of its documents:

\[ p(D \mid \alpha, \beta) = \prod_{d=1}^{M} \int p(\theta_d \mid \alpha) \left( \prod_{n=1}^{N_d} \sum_{z_{dn}} p(z_{dn} \mid \theta_d) \, p(w_{dn} \mid z_{dn}, \beta) \right) d\theta_d \tag{3.5} \]

θ_d is a document-level variable that is sampled once per document. z_dn and w_dn are word-level variables that are sampled once for each word in each document. LDA assumes that topics generate words. Words are the only observable data in documents; each document consists of some topics, and each word is created by one of the document's topics. Firstly, a multinomial distribution θ over topics is randomly sampled for each document d with parameter α.
Secondly, a topic z is chosen for each word w from this topic distribution, and finally the word w is generated by randomly sampling from the topic-specific multinomial distribution ɸ_z. The probability of generating a word w in document d under LDA is:

\[ P(w \mid d, \theta, \phi) = \sum_{z} P(w \mid z, \phi_z) \, P(z \mid d, \theta_d) \tag{3.6} \]

4. Experiments

In this chapter we discuss the experiments performed on our data set and their results. The chapter is organized in four parts. First, we describe the data set used in our experiments. Second, we give the parameter settings for the baseline methods and the proposed methods. Third, we briefly describe the baseline methods chosen for comparison. Fourth, we present the results of our proposed methods and compare them with the baseline methods.

4.1. Data set

We have chosen the DBLP-Citation-network V5 data set from arnetminer.org [14] for our experiments. DBLP is a computer science bibliography that consists of about 1.8 million publications by about 1 million authors. Our data set consists of the following variables:

Paper ID
Authors
Title
Citations

The size of our data set is:

Number of authors: 127410
Number of publications: 100000

We have preprocessed the titles by removing stop words, punctuation and numbers, so that these tokens do not distort the topics. Stop-word lists are available on the internet and contain words such as a, an, the, is, are, than, of, to, in and on.

4.2. Development Tools and Programming language

We have preprocessed our data set using PHP.
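The title preprocessing described in Section 4.1 can be sketched as follows. The sketch is written in Python for illustration only, whereas the actual preprocessing was done in PHP; the function name, the sample title and the short stop-word set (the ten words listed in Section 4.1) are assumptions, and a full stop-word list would be much longer.

```python
import re

# Only the stop words named in Section 4.1; a real list would be longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "than", "of", "to", "in", "on"}


def preprocess_title(title):
    """Lowercase a title, drop punctuation and numbers, then drop stop words."""
    lowered = title.lower()
    # Keep ASCII letters and whitespace only; this simplification also
    # discards accented letters.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    return [token for token in letters_only.split() if token not in STOP_WORDS]


# Hypothetical title used only to exercise the function.
print(preprocess_title("The Design of an Index for 2 Databases"))
# prints ['design', 'index', 'for', 'databases']
```

The remaining tokens of each title would then form the document that is passed to LDA.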
We have used the WampServer2.2a-x32 Windows web development environment for preprocessing our data set. We have implemented LDA in C++, and for the calculation of our proposed indices we have used the Java programming language.

4.3. Parameter settings

4.3.1. Parameters for LDA

In our experiments, clusters of authors are made from the titles of their papers using LDA. The corpus-level parameters α and β are set to 50/z and 0.01 respectively, where z is the number of topics. The number of topics z is set to 100 for our data set. We have selected 100 topics on the basis of human judgment of the meaningfulness of the topics, together with perplexity [8], a measure of the performance of probabilistic topic models for which lower values are better.

4.3.2. Parameter for h-index

The h-index is calculated by dividing the total number of citations of an author's papers by a proportionality constant "a" and then taking the square root, that is, h = √(C/a), where C is the author's total number of citations. "a" ranges between 3 and 5. We have selected 4 as the value of a.

4.4. Baseline Methods

4.4.1. H-index

One of our baseline methods is the h-index, a well-known index used for the assessment of a researcher's work. The h-index considers neither the individual contribution of authors to papers nor topic sensitivity when assigning citations. We have addressed these issues in our proposed indices, named NWC-index and TSWC-index. We have chosen 30 authors for analysis purposes.

4.4.2. Kth-rank index

The second baseline method is the kth-rank index, which considers the individual contribution of each author to a paper and assigns weighted citations to authors according to their contributions to the paper.
In this method, the first author of a paper always receives all citations of that paper, as every author does in the h-index, while the other authors receive fewer citations than the first author. This method does not consider topic sensitivity in assigning weighted citations to authors, which is what we propose in the TSWC-index. We have chosen 30 authors for analysis purposes.

4.5. Results and Discussions

In this part, we compare the results of our proposed indices, NWC-index and TSWC-index, with the baseline methods. We discuss the authors who have gained the same rank or a lower rank than in the baseline methods. Before calculating the NWC-index, we have clustered the authors based on their papers using LDA to determine their topics, which are then used in calculating the TSWC-index.

Comparison with baseline methods

4.5.1. H-index

The h-index assigns all citations of a paper to each author of that paper. We have chosen 30 authors for analysis. The following table shows the ranks of authors calculated using the h-index.

Table 4.1: Rank of authors by their h-index

S.No  Rank  Authors                     Citations  h-index
1     1     david e. Goldberg           36114      95
2     2     william g. Cochran          34194      92
3     3     c. a. r. hoare              28234      84
4     4     j. ross quinlan             16859      64
5     5     jeffrey d. ullman           15825      62
6     6     bertrand meyer              14960      61
7     7     jiawei han                  11376      53
8     8     ricardo a. baeza-yates      10628      51
9     9     micheline kamber            10144      50
10    10    edmund m. Clarke            9759       49
11    11    paul c. van oorschot        9022       47
12    11    alfred menezes              8940       47
13    12    scott a. Vanstone           8606       46
14    12    gerard salton               8499       46
15    13    michael mcgill              8440       45
16    13    berthier a. ribeiro-neto    8379       45
17    14    frederick eddy              7966       44
18    14    michael r. blaha            7966       44
19    14    william j. premerlani       7966       44
20    14    james e. rumbaugh           7966       44
21    14    william e. lorensen         7966       44
22    14    gregory piatetsky-shapiro   7846       44
23    15    anil k. jain                7554       43
24    15    rajeev motwani              7489       43
25    16    prabhakar raghavan          7293       42
26    17    bernd-holger schlingloff    6999       41
27    17    christos h. Papadimitriou   6913       41
28    18    usama m. Fayyad             6268       39
29    19    richard c. dubes            6064       38
30    19    franco p. preparata         6010       38

In Table 4.1, the authors are ranked according to their h-index. The author with the highest h-index is ranked 1, the author with the second highest h-index is ranked 2, and so on. Authors with equal h-index share the same rank.

4.5.2. Kth-rank index

The kth-rank index assigns the citations of a paper to its co-authors according to their ranks in the paper. Thus it assigns weighted citations to the authors of a paper, unlike the h-index, which assigns the total citations of a paper to each of its co-authors even when their contributions to the paper are not the same. Only the first author of a co-authored paper receives the total citations of that paper in the kth-rank index. The following table shows the ranks of authors calculated using the kth-rank index.

Table 4.2: Rank of authors by their kth-rank index

S.No  Rank  Authors                     Citations  Kth-rank citations  Kth-rank index
1     1     david e. Goldberg           36114      36097               94
2     2     william g. Cochran          34194      34194               92
3     3     c. a. r. hoare              28234      28231               84
4     4     j. ross quinlan             16859      16859               64
5     5     bertrand meyer              14960      14868               60
6     6     jeffrey d. ullman           15825      13835               58
7     7     jiawei han                  11376      10734               51
8     8     ricardo a. baeza-yates      10628      9438                48
9     9     edmund m. Clarke            9759       8741                46
10    9     alfred menezes              8940       8701                46
11    9     gerard salton               8499       8499                46
12    10    james e. rumbaugh           7966       7966                44
13    11    rajeev motwani              7489       7324                42
14    12    anil k. jain                7554       6685                40
15    13    usama m. fayyad             6268       6239                39
16    13    christos h. papadimitriou   6913       6149                39
17    14    franco p. preparata         6010       5985                38
18    15    micheline kamber            10144      5048                35
19    16    gregory piatetsky-shapiro   7846       4611                33
20    16    paul c. van oorschot        9022       4580                33
21    17    michael mcgill              8440       4220                32
22    17    berthier a. ribeiro-neto    8379       4162                32
23    18    michael r. blaha            7966       3983                31
24    19    prabhakar raghavan          7293       3639                30
25    20    bernd-holger schlingloff    6999       3480                29
26    21    richard c. dubes            6064       3032                27
27    22    scott a. Vanstone           8606       2880                26
28    23    william j. premerlani       7966       2655                25
29    24    frederick eddy              7966       1992                22
30    25    william e. lorensen         7966       1593                19

4.5.3. Proposed methods NWC-index and TSWC-index

NWC-index is one of our proposed methods. It divides the citations received by a paper among its co-authors in such a way that the first author receives the most citations, the second author receives the second most, and so on. In this method, no co-author receives the total citations of a paper; only the author of a single-author paper does. The following table shows the ranks of authors calculated by NWC-index.

Table 4.3: Rank of authors by their NWC-index

S.No  Rank  Authors                     Citations  NWC    NWC-index
1     1     david e. Goldberg           36114      36085  94
2     2     william g. Cochran          34194      34194  92
3     3     c. a. r. hoare              28234      28227  84
4     4     j. ross quinlan             16859      16859  64
5     5     bertrand meyer              14960      14845  60
6     6     jeffrey d. ullman           15825      13175  57
7     7     jiawei han                  11376      7095   42
8     8     ricardo a. baeza-yates      10628      6311   39
9     9     gerard salton               8499       5686   37
10    10    edmund m. Clarke            9759       5469   36
11    11    rajeev motwani              7489       4875   34
12    12    anil k. jain                7554       4386   33
13    12    alfred menezes              8940       4358   33
14    13    christos h. papadimitriou   6913       4231   32
15    14    franco p. preparata         6010       3987   31
16    15    micheline kamber            10144      3363   28
17    15    gregory piatetsky-shapiro   7846       3201   28
18    16    paul c. van oorschot        9022       3081   27
19    17    usama m. Fayyad             6268       2872   26
20    17    michael mcgill              8440       2813   26
21    17    berthier a. ribeiro-neto    8379       2753   26
22    18    james e. rumbaugh           7966       2655   25
23    19    prabhakar raghavan          7293       2423   24
24    19    bernd-holger schlingloff    6999       2317   24
25    20    michael r. blaha            7966       2124   23
26    21    richard c. dubes            6064       2021   22
27    22    william j. premerlani       7966       1593   19
28    22    scott a. Vanstone           8606       1445   19
29    23    frederick eddy              7966       1062   16
30    24    william e. lorensen         7966       531    11

Table 4.3 shows the researchers' ranks according to the proposed NWC-index. The table shows variations in ranks and index values compared with the h-index. The NWC of some researchers has decreased because our proposed method divides the total citations of a paper among its co-authors. Only the author of a single-author paper receives the total citations of that paper. william g. Cochran and j. ross quinlan have an NWC equal to the total citations received by their papers because they are the single authors of all their papers, so their h-index and NWC-index are the same.

After calculating the NWC of each author in each paper, we have checked the authors' topics and calculated the TSWC score of each author in each paper, which is then used for calculating the TSWC-index. The following table shows the ranks of authors after calculating the TSWC score of each author.

Table 4.4: Rank of authors by their TSWC-index

S.No  Rank  Authors                     NWC    TSWC   TSWC-index
1     1     david e. Goldberg           36085  36084  94
2     2     william g. Cochran          34194  34194  92
3     3     c. a. r. hoare              28227  28227  84
4     4     j. ross quinlan             16859  16859  64
5     5     bertrand meyer              14845  14834  60
6     6     jeffrey d. ullman           13175  12878  56
7     7     ricardo a. baeza-yates      6311   7292   42
8     7     jiawei han                  7095   7110   42
9     8     edmund m. Clarke            5469   6494   40
10    9     rajeev motwani              4875   6074   38
11    10    gerard salton               5686   5686   37
12    11    alfred menezes              4358   5311   36
13    11    anil k. jain                4386   5282   36
14    12    franco p. preparata         3987   4981   35
15    13    christos h. papadimitriou   4231   4813   34
16    14    usama m. Fayyad             2872   3483   29
17    15    micheline kamber            3363   3363   28
18    16    michael mcgill              2813   2813   26
19    17    james e. rumbaugh           2655   2655   25
20    18    gregory piatetsky-shapiro   3201   2403   24
21    19    michael r. blaha            2124   2124   23
22    20    scott a. Vanstone           1445   1912   21
23    21    paul c. van oorschot        3081   1658   20
24    22    william j. premerlani       1593   1593   19
25    23    berthier a. ribeiro-neto    2753   1397   18
26    24    prabhakar raghavan          2423   1213   17
27    24    bernd-holger schlingloff    2317   1180   17
28    25    frederick eddy              1062   1062   16
29    26    richard c. dubes            2021   1011   15
30    27    william e. lorensen         531    531    11

Table 4.4 shows the variations in the ranks of authors after calculating the TSWC score by checking the topics of the authors of each paper: for an author whose topic differs from that of the first author, the NWC score is decreased, and for an author with the same topic, the NWC score is increased, as shown in the TSWC column. The TSWC-index is then calculated for all authors. The citations of some authors, e.g. ricardo a. baeza-yates, jiawei han and edmund m. Clarke, have increased in TSWC because they have the same topic as the first author of the papers in which they appear. william g. Cochran and j. ross quinlan have the same citation score as in the h-index table because they are the single authors of their papers. The TSWC citation scores of jeffrey d. ullman and prabhakar raghavan have decreased because they do not have the same topic as the first author in some of their papers. The variations in the indices and ranks of authors can be seen in the following table.
Table 4.5: Rank of authors by their TSWC-index and variations with NWC-index

S.No  Authors                     NWC-index  TSWC-index  Variations in index  Earned position in rank
1     david e. Goldberg           94         94          0                    0
2     william g. Cochran          92         92          0                    0
3     c. a. r. hoare              84         84          0                    0
4     j. ross quinlan             64         64          0                    0
5     bertrand meyer              60         60          0                    0
6     jeffrey d. ullman           57         56          -1                   0
7     ricardo a. baeza-yates      39         42          +3                   +1
8     jiawei han                  42         42          0                    0
9     edmund m. Clarke            36         40          +4                   +2
10    rajeev motwani              34         38          +4                   +2
11    gerard salton               37         37          0                    -1
12    alfred menezes              33         36          +3                   +1
13    anil k. jain                33         36          +3                   +1
14    franco p. preparata         31         35          +4                   +2
15    christos h. Papadimitriou   32         34          +2                   0
16    usama m. Fayyad             26         29          +3                   +3
17    micheline kamber            28         28          0                    0
18    michael mcgill              26         26          0                    +1
19    james e. rumbaugh           25         25          0                    +1
20    gregory piatetsky-shapiro   28         24          -4                   -3
21    michael r. blaha            23         23          0                    +1
22    scott a. Vanstone           19         21          +2                   +2
23    paul c. van oorschot        27         20          -7                   -5
24    william j. premerlani       19         19          0                    0
25    berthier a. ribeiro-neto    26         18          -8                   -6
26    prabhakar raghavan          24         17          -7                   -5
27    bernd-holger schlingloff    24         17          -7                   -5
28    frederick eddy              16         16          0                    -2
29    richard c. dubes            22         15          -7                   -5
30    william e. lorensen         11         11          0                    -3

In Table 4.5, the (-) sign and the (+) sign represent a decrease and an increase respectively, both in the TSWC-index compared with the NWC-index and in rank. rajeev motwani has an NWC-index of 34 and a TSWC-index of 38, so his TSWC-index has increased by 4, denoted by +4, and his rank has improved by 2, since his rank is 11 in Table 4.3 and 9 in Table 4.4. rajeev motwani's TSWC-index has increased because some of his co-authors do not share his topic of interest, so their TSWC scores are decreased and his TSWC score is increased.
jiawei han has rank 7 and an index of 42 in both Table 4.3 and Table 4.4, so there is no change in his rank or index, which is denoted by 0 in Table 4.5. richard c. dubes has an NWC-index of 22 and a TSWC-index of 15, so his TSWC-index has decreased by 7 points and his rank has dropped by 5 positions. A comparison of all the baseline methods and proposed methods is presented in the following figure.

Figure 4.1: Comparison of h-index, kth-rank index, NWC-index and TSWC-index (chart title: Comparative analysis h-index, kth-rank index, NWC-index and TSWC-index; x-axis: Authors; y-axis: index rank; series: h-index rank, kth-rank index rank, NWC-index rank, TSWC-index rank)

Figure 4.1 shows the comparison of the h-index, kth-rank index, NWC-index and TSWC-index ranks of authors. The NWC-index and TSWC-index of an author are less than or equal to the baseline indices. They are never greater, because in our proposed methods only an author whose papers are all single-authored receives all citations, in which case his/her NWC-index and TSWC-index equal the h-index. For an author of co-authored papers, the NWC-index and TSWC-index are less than the h-index and kth-rank index.
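The index values and ranks compared here can be recomputed from citation counts with a small Python sketch. It uses the estimate of Section 4.3.2 with a = 4; rounding the square root down and giving tied authors the same rank without skipping the next rank are assumptions inferred from the numbers reported in Tables 4.1 to 4.4, not rules stated in the text. The function names are hypothetical.

```python
import math


def estimated_index(citations, a=4):
    """Estimate an index as floor(sqrt(citations / a)), as in Section 4.3.2."""
    # floor(sqrt(x)) equals the integer square root of floor(x) for x >= 0.
    return math.isqrt(citations // a)


def dense_ranks(indices):
    """Rank values in descending order; equal values share a rank (assumed)."""
    distinct = sorted(set(indices), reverse=True)
    position = {value: rank for rank, value in enumerate(distinct, start=1)}
    return [position[value] for value in indices]


# Citation counts of rows 10 to 13 of Table 4.1.
citations = [9759, 9022, 8940, 8606]
indices = [estimated_index(c) for c in citations]
print(indices)               # prints [49, 47, 47, 46]
print(dense_ranks(indices))  # prints [1, 2, 2, 3]
```

The same two functions apply unchanged to kth-rank citations, NWC and TSWC totals, since only the citation count passed in differs between the four indices.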
According to the NWC-index and TSWC-index calculations, the ranks of most authors have decreased compared with their h-index ranks. We now compare our proposed methods one by one with the baseline methods.

Comparison of NWC-index with h-index

Figure 4.2: Comparison of h-index with NWC-index (chart title: Comparative analysis h-index and NWC-index; x-axis: Authors; y-axis: index rank; series: h-index rank, NWC-index rank)

4.5.4. Scenarios of NWC-index

The NWC-index is either equal to or less than the h-index and kth-rank index, but there are different scenarios with respect to the ranks of authors in these indices. Some authors have an NWC-index less than or equal to their h-index but a higher rank than their h-index rank. Some have an equal rank, and some have a lower rank than their h-index rank. The following are the three scenarios: rank up, rank down and rank stable.

4.5.4.1. Comparison of NWC-index with h-index

Scenario 1: Relocation with respect to h-index rank: Rank up

Table 4.6: Position relocation with respect to h-index: Position up

S.No  Authors                     h-index Rank  h-index  NWC-Rank  NWC-index  Earned position in NWC-index
1     bertrand meyer              6             61       5         60         +1
2     gerard salton               12            46       9         37         +3
3     anil k. jain                15            43       12        33         +3
4     rajeev motwani              15            43       11        34         +4
5     christos h. papadimitriou   17            41       13        32         +4
6     usama m. fayyad             18            39       17        26         +1
7     franco p. preparata         19            38       14        31         +5

In Table 4.6, bertrand meyer has an h-index of 61 and is ranked at position 6, but his NWC-index is 60, which is less than his h-index because the NWC-index is calculated on the basis of the weighted citations received by each author of a multi-authored paper. He has earned one position higher than his h-index rank. Similarly, anil k. jain has an NWC-index that is 10 less than his h-index, but his rank has improved by 3. franco p. preparata has an h-index of 38, and according to our method his NWC-index is 31, which is less than his h-index, but his rank has improved by 5 positions.

Figure 4.3: Scenario 1: Position up with respect to h-index (chart title: Scenario 1: Position up; x-axis: Authors; y-axis: index Rank; series: h-index Rank, NWC-Rank)

In Figure 4.3, the taller bar represents the h-index rank of an author, while the other bar represents the NWC rank. A taller bar corresponds to a lower rank position of the author.

Scenario 2: Relocation with respect to h-index rank: Rank down

Table 4.7: Position relocation with respect to h-index: Position down

S.No  Authors                     h-index Rank  h-index  NWC-index Rank  NWC-index  Position down in NWC-index
1     jeffrey d. ullman           5             62       6               57         -1
2     micheline kamber            9             50       15              28         -6
3     paul c. van oorschot        11            47       16              27         -5
4     alfred menezes              11            47       12              33         -1
5     scott a. Vanstone           12            46       22              19         -10
6     michael mcgill              13            45       17              26         -4
7     berthier a. ribeiro-neto    13            45       17              26         -4
8     james e. rumbaugh           14            44       18              25         -4
9     michael r. blaha            14            44       20              23         -6
10    william j. premerlani       14            44       22              19         -8
11    frederick eddy              14            44       23              16         -9
12    william e. lorensen         14            44       24              11         -10
13    gregory piatetsky-shapiro   14            44       15              28         -1
14    prabhakar raghavan          16            42       19              24         -3
15    bernd-holger schlingloff    17            41       19              24         -2
16    richard c. dubes            19            38       21              22         -2

In Table 4.7, the NWC-index of each author has decreased because the citations are divided among co-authors, and along with the decrease in NWC-index the NWC-index rank has also dropped compared with the h-index rank. jeffrey d. ullman is ranked 5 by h-index and 6 by NWC-index, so his position is down by 1 in NWC-index. scott a. Vanstone's rank has dropped by 10 positions in NWC-index, because his h-index rank is 12 and his NWC-index rank is 22. The same holds for the other authors.

Figure 4.4: Scenario 2: Position down with respect to h-index (chart title: Scenario 2: Position down; x-axis: Authors; y-axis: index Rank; series: h-index Rank, NWC-index Rank)

In Figure 4.4, the shorter bar, i.e. the h-index rank bar, denotes a better rank position of an author, while the taller bar of the NWC-index rank represents a drop in the author's rank.

Scenario 3: Position stable with respect to h-index

Table 4.8: Position stable with respect to h-index

S.No  Authors                  h-index Rank  h-index  NWC-index Rank  NWC-index
1     david e. Goldberg        1             95       1               94
2     william g. Cochran       2             92       2               92
3     c. a. r. hoare           3             84       3               84
4     j. ross quinlan          4             64       4               64
5     jiawei han               7             53       7               42
6     ricardo a. baeza-yates   8             51       8               39
7     edmund m. Clarke         10            49       10              36

In Table 4.8, the ranks of all authors are the same in both the h-index and the NWC-index, even where the h-index and NWC-index values of an author differ because in the NWC-index the citations of a paper are divided among its co-authors. william g. Cochran and j. ross quinlan are the single authors of their papers, and david e. Goldberg has the maximum citations even though he has co-authors in some of his papers, so they all have the same rank in NWC-index as in h-index.

Figure 4.5: Scenario 3: Position stable with respect to h-index (chart title: Scenario 3: Position stable; x-axis: Authors; y-axis: index Rank; series: h-index Rank, NWC-index Rank)

4.5.4.2. Comparison of NWC-index with kth-rank index

An author's NWC-index may be equal to or less than his/her kth-rank index, and there are three scenarios with respect to the ranks of authors in both indices. The NWC-index cannot be higher than the kth-rank index because in our proposed method the total citations of a paper are divided among its authors, and none of them receives the total citations of that paper except the author of a single-authored paper. Some authors have an NWC-index less than or equal to their kth-rank index but a higher rank than their kth-rank index rank. Some authors have an equal rank, and some have a lower rank than their kth-rank index rank. The following are the three scenarios: rank up, rank down and rank stable.

Scenario 1: Relocation with respect to kth-rank index rank: Rank up

Table 4.9: Position relocation with respect to kth-rank index: Position up

S.No  Authors                     Kth-rank  Kth-rank index  NWC-index  NWC-index Rank  Earned position in NWC-index
1     gregory piatetsky-shapiro   16        33              28         15              +1
2     bernd-holger schlingloff    20        29              24         19              +1
3     william j. premerlani       23        25              19         22              +1
4     frederick eddy              24        22              16         23              +1
5     william e. lorensen         25        19              11         24              +1

In Table 4.9, the NWC-index of all the authors has decreased, but their NWC-index rank has improved by one position.
Because the first author of a paper does not receive the paper's full citation count in the NWC-index, as he/she does in the kth-rank index, the NWC-index of these authors is lower, yet their rank is higher than their kth-rank.

Figure 4.6: Scenario 1: Position up with respect to kth-rank index (series: Kth-rank, NWC-index Rank; x-axis: Authors; y-axis: index Rank)

In Figure 4.6, a shorter NWC-index rank bar represents a better rank than the author's kth-rank.

Scenario 2: Relocation with respect to kth-rank index: Rank down

Table 4.10: Position relocation with respect to kth-rank index: Position down

S.No  Author                     Kth-rank  Kth-rank index  NWC-index  NWC-index rank  Position down in NWC-index
1     Edmund M. Clarke           9         46              36         10              -1
2     Alfred Menezes             9         46              33         12              -3
3     James E. Rumbaugh          10        44              25         18              -8
4     Usama M. Fayyad            13        39              26         17              -4
5     Michael R. Blaha           18        31              23         20              -2

In Table 4.10, the authors' ranks fall in the NWC-index together with their index values. The gap between the kth-rank index and the NWC-index is larger for these authors, so they lose rank. For example, James E. Rumbaugh has a kth-rank index of 44; his NWC-index drops to 25 and his rank falls from 10 to 18.

Figure 4.7: Scenario 2: Position down with respect to kth-rank index (series: Kth-rank, NWC-index Rank; x-axis: Authors; y-axis: index Rank)

Scenario 3: Position stable with respect to kth-rank index

Table 4.11: Position stable with respect to kth-rank index

S.No  Author                     Kth-rank  Kth-rank index  NWC-index  NWC-index rank
1     David E. Goldberg          1         94              94         1
2     William G. Cochran         2         92              92         2
3     C. A. R. Hoare             3         84              84         3
4     J. Ross Quinlan            4         64              64         4
5     Bertrand Meyer             5         60              60         5
6     Jeffrey D. Ullman          6         58              57         6
7     Jiawei Han                 7         51              42         7
8     Ricardo A. Baeza-Yates     8         48              39         8
9     Gerard Salton              9         46              37         9
10    Rajeev Motwani             11        42              34         11
11    Anil K. Jain               12        40              33         12
12    Christos H. Papadimitriou  13        39              32         13
13    Franco P. Preparata        14        38              31         14
14    Micheline Kamber           15        35              28         15
15    Paul C. van Oorschot       16        33              27         16
16    Michael McGill             17        32              26         17
17    Berthier A. Ribeiro-Neto   17        32              26         17
18    Prabhakar Raghavan         19        30              24         19
19    Richard C. Dubes           21        27              22         21
20    Scott A. Vanstone          22        26              19         22

In Table 4.11, each author's NWC-index is equal to or less than the kth-rank index, but the author's rank is the same under both methods.

Figure 4.8: Scenario 3: Position stable with respect to kth-rank index (series: Kth-rank, NWC-index Rank; x-axis: Authors; y-axis: index rank)

4.5.5. Scenarios of TSWC-index

The TSWC-index is either equal to or less than the h-index and the kth-rank index. There are three scenarios with respect to the ranks of authors in these indices. Some authors have a TSWC-index less than or equal to their h-index but a better rank than in the h-index, some keep the same rank, and some fall to a worse rank. The three scenarios below show whether an author's rank moves up, moves down or stays stable relative to the h-index.

4.5.5.1.
Comparison of TSWC-index with h-index

The Topic Sensitive Weighted Citation index (TSWC-index) assigns weighted citations to the co-authors of a paper according to their topic relatedness with the first author of that paper. An author's TSWC score decreases when his/her topic differs from that of the first author and increases when the topic is the same. The h-index, on the other hand, assigns the full citations of a paper to each of its co-authors, so an author's h-index is always higher than or equal to his/her TSWC-index. The three scenarios below show the relation between h-index rank and TSWC-index rank when the TSWC-index is equal to or less than the h-index.

Scenario 1: Relocation with respect to h-index rank: Rank up

Table 4.12: Position relocation with respect to h-index: Position up

S.No  Author                     h-index rank  h-index  TSWC-index  TSWC-index rank  Earned position in TSWC-index
1     Bertrand Meyer             6             61       60          5                +1
2     Ricardo A. Baeza-Yates     8             51       42          7                +1
3     Edmund M. Clarke           10            49       40          8                +2
4     Gerard Salton              12            46       37          10               +2
5     Anil K. Jain               15            43       36          11               +4
6     Rajeev Motwani             15            43       38          9                +6
7     Christos H. Papadimitriou  17            41       34          13               +4
8     Usama M. Fayyad            18            39       29          14               +4
9     Franco P. Preparata        19            38       35          12               +7

In Table 4.12, each author's TSWC-index is lower than the h-index, but the author's rank is better than in the h-index. Bertrand Meyer has an h-index of 61. His TSWC-index drops to 60 because topic sensitive weighted citations are allocated to him, yet his rank improves by one position, from 6 to 5. The same holds for the other authors. The figure showing these results is presented below.
Figure 4.9: Scenario 1: Position up with respect to h-index (series: h-index Rank, TSWC-index Rank; x-axis: Authors; y-axis: index Rank)

Scenario 2: Relocation with respect to h-index rank: Rank down

Table 4.13: Position relocation with respect to h-index: Position down

S.No  Author                     h-index rank  h-index  TSWC-index  TSWC-index rank  Position down in TSWC-index
1     Jeffrey D. Ullman          5             62       56          6                -1
2     Micheline Kamber           9             50       28          15               -6
3     Paul C. van Oorschot       11            47       20          21               -10
4     Scott A. Vanstone          12            46       21          20               -8
5     Michael McGill             13            45       26          16               -3
6     Berthier A. Ribeiro-Neto   13            45       18          23               -10
7     James E. Rumbaugh          14            44       25          17               -3
8     Michael R. Blaha           14            44       23          19               -5
9     William J. Premerlani      14            44       19          22               -8
10    Frederick Eddy             14            44       16          25               -11
11    William E. Lorensen        14            44       11          27               -13
12    Gregory Piatetsky-Shapiro  14            44       24          18               -4
13    Prabhakar Raghavan         16            42       17          24               -8
14    Bernd-Holger Schlingloff   17            41       17          24               -7
15    Richard C. Dubes           19            38       15          26               -7

Table 4.13 lists the authors whose TSWC-index and rank are both lower than their h-index and h-index rank. Micheline Kamber's rank falls by 6 positions and her TSWC-index falls to 28 from an h-index of 50; this large gap is what lowers her rank. The authors at S.No 7 to 12 share an h-index of 44 and an h-index rank of 14, but their TSWC-index values differ from one another. The topic of each author is checked against the topic of the first author, and when the topics differ the author's NWC score is reduced, which lowers both the TSWC-index and the rank.
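The adjustment described above (reduce the score of a co-author whose topic differs from the first author's, and raise the scores of the same-topic authors) can be illustrated per paper. The sketch below is only an illustration under stated assumptions: the size of the reduction (`penalty`), the proportional sharing of the removed amount, and the function name `redistribute_scores` are all hypothetical and are not taken from the thesis, whose actual weighting rule is defined in the methodology chapter.

```python
def redistribute_scores(nwc_scores, same_topic, penalty=0.5):
    """Shift citation credit from off-topic co-authors to same-topic authors.

    nwc_scores: per-author NWC scores for one paper, first author first.
    same_topic: True where the author shares the first author's topic
                (the first author is always True).
    penalty:    ASSUMED fraction removed from each off-topic author's score.
    """
    # Total credit held by same-topic authors before the adjustment.
    on_topic_total = sum(s for s, t in zip(nwc_scores, same_topic) if t)
    if on_topic_total == 0:
        return list(nwc_scores)  # nothing to redistribute to

    # Amount taken away from off-topic authors.
    removed = sum(s * penalty for s, t in zip(nwc_scores, same_topic) if not t)

    adjusted = []
    for score, on_topic in zip(nwc_scores, same_topic):
        if on_topic:
            # ASSUMPTION: removed credit is shared in proportion to NWC score.
            adjusted.append(score + removed * (score / on_topic_total))
        else:
            adjusted.append(score - score * penalty)
    return adjusted


if __name__ == "__main__":
    # Hypothetical paper with 100 citations already split into NWC scores.
    print(redistribute_scores([50.0, 30.0, 20.0], [True, True, False]))
```

With these hypothetical inputs the third author keeps 10 of 20 points and the other two gain 6.25 and 3.75 points, so the paper's total of 100 is preserved. When every author shares the first author's topic, the scores are returned unchanged, which corresponds to the cases where the TSWC score equals the NWC score.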
Figure 4.10: Scenario 2: Position down with respect to h-index (series: h-index Rank, TSWC-index Rank; x-axis: Authors; y-axis: index Rank)

Scenario 3: Position stable with respect to h-index

Table 4.14: Position stable with respect to h-index

S.No  Author                     h-index rank  h-index  TSWC-index  TSWC-index rank
1     David E. Goldberg          1             95       94          1
2     William G. Cochran         2             92       92          2
3     C. A. R. Hoare             3             84       84          3
4     J. Ross Quinlan            4             64       64          4
5     Jiawei Han                 7             53       42          7
6     Alfred Menezes             11            47       36          11

In Table 4.14, every author keeps the same rank. William G. Cochran, C. A. R. Hoare and J. Ross Quinlan have equal h-index and TSWC-index values. Cochran and Quinlan receive the same number of citations under both methods, which shows that they are the sole authors of their papers. Hoare's citations are reduced slightly because of a co-authored paper, but the reduction is too small to change his TSWC-index. The TSWC-index of Jiawei Han and Alfred Menezes is lower than their h-index, but their rank is unchanged.

Figure 4.11: Scenario 3: Position stable with respect to h-index (series: h-index Rank, TSWC-index Rank; x-axis: Authors; y-axis: index Rank)

4.5.5.2. Comparison of TSWC-index with kth-rank index

Both the kth-rank index and the TSWC-index assign weighted citations to authors. In the kth-rank index, however, the first author of a multi-authored paper receives the full citations of the paper and the other authors receive citations according to their rank. The TSWC-index, on the other hand, divides the total citations among the authors of a paper using the criterion of topic sensitivity.
The TSWC-index of an author is less than or equal to his/her kth-rank index, depending on whether the author shares the topic of the first author and whether the author's papers are single-authored. An author's rank in the TSWC-index may be lower than, higher than or equal to his/her rank in the kth-rank index.

Scenario 1: Relocation with respect to kth-rank index: Rank up

Table 4.15: Position relocation with respect to kth-rank index: Position up

S.No  Author                     Kth-rank  Kth-rank index  TSWC-index  TSWC-index rank  Earned position in TSWC-index
1     Ricardo A. Baeza-Yates     8         48              42          7                +1
2     Edmund M. Clarke           9         46              40          8                +1
3     Rajeev Motwani             11        42              38          9                +2
4     Anil K. Jain               12        40              36          11               +1
5     Franco P. Preparata        14        38              35          12               +2
6     Michael McGill             17        32              26          16               +1
7     Scott A. Vanstone          22        26              21          20               +2
8     William J. Premerlani      23        25              19          22               +1

The TSWC-index of the authors in Table 4.15 is lower than their kth-rank index because the actual citations of each paper are divided among its co-authors. The TSWC-index rank of these authors is nevertheless better than their kth-rank. Rajeev Motwani's rank improves by 2 positions, and the ranks of the other authors improve as well.

Figure 4.12: Scenario 1: Position up with respect to kth-rank index (series: Kth-rank, TSWC-index Rank; x-axis: Authors; y-axis: index Rank)

Scenario 2: Relocation with respect to kth-rank index: Rank down

Table 4.16: Position relocation with respect to kth-rank index: Position down

S.No  Author                     Kth-rank  Kth-rank index  TSWC-index  TSWC-index rank  Position down in TSWC-index
1     Alfred Menezes             9         46              36          11               -2
2     Gerard Salton              9         46              37          10               -1
3     James E. Rumbaugh          10        44              25          17               -7
4     Usama M. Fayyad            13        39              29          14               -1
5     Gregory Piatetsky-Shapiro  16        33              24          18               -2
6     Paul C. van Oorschot       16        33              20          21               -5
7     Berthier A. Ribeiro-Neto   17        32              18          23               -6
8     Michael R. Blaha           18        31              23          19               -1
9     Prabhakar Raghavan         19        30              17          24               -5
10    Bernd-Holger Schlingloff   20        29              17          24               -4
11    Richard C. Dubes           21        27              15          26               -5
12    Frederick Eddy             24        22              16          25               -1
13    William E. Lorensen        25        19              11          27               -2

In Table 4.16, the rank of every author is worse in the TSWC-index. Gregory Piatetsky-Shapiro and Paul C. van Oorschot share a kth-rank of 16 and a kth-rank index of 33, but their TSWC-index values fall to 24 and 20 and their ranks to 18 and 21 respectively, because their citations are reduced under topic sensitive weighted citations.

Figure 4.13: Scenario 2: Position down with respect to kth-rank index (series: Kth-rank, TSWC-index Rank; x-axis: Authors; y-axis: index Rank)

Scenario 3: Position stable with respect to kth-rank index

Table 4.17: Position stable with respect to kth-rank index

S.No  Author                     Kth-rank  Kth-rank index  TSWC-index  TSWC-index rank
1     David E. Goldberg          1         94              94          1
2     William G. Cochran         2         92              92          2
3     C. A. R. Hoare             3         84              84          3
4     J. Ross Quinlan            4         64              64          4
5     Bertrand Meyer             5         60              60          5
6     Jeffrey D. Ullman          6         58              56          6
7     Jiawei Han                 7         51              42          7
8     Christos H. Papadimitriou  13        39              34          13
9     Micheline Kamber           15        35              28          15

All authors in Table 4.17 hold the same rank under both methods. The first five authors have equal kth-rank index and TSWC-index values, while the remaining authors have a TSWC-index lower than their kth-rank index.

Figure 4.14: Scenario 3: Position stable with respect to kth-rank index (series: Kth-rank, TSWC-index Rank; x-axis: Authors; y-axis: index Rank)

4.5.6.
Comparison of TSWC-index and NWC-index

Some authors obtain a higher TSWC-index than NWC-index because most of their papers fall in the same area of interest as their first authors. Some obtain a lower TSWC-index because their topic differs from that of the first author, so their NWC score is reduced. Others obtain a TSWC-index equal to their NWC-index because all of their papers share the first author's topic, so their NWC score and TSWC score are the same. The comparison of NWC-index and TSWC-index is given in the following figure.

Figure 4.15: Comparison of TSWC-index with NWC-index (series: NWC-index rank, TSWC-index rank; x-axis: Authors; y-axis: index rank)

Figure 4.15 shows how the ranks of authors vary between the NWC-index and the TSWC-index. In the TSWC-index, ranks improve for authors who share the topic of the first author in most of their multi-authored papers. Authors whose topic differs from that of the first author receive lower ranks in the TSWC-index. The different scenarios of the TSWC-index with respect to the NWC-index are shown below.

Scenario 1: Relocation with respect to NWC-index: Rank up

Table 4.18: Position relocation with respect to NWC-index: Position up

S.No  Author                     TSWC-index rank  TSWC-index  NWC-index  NWC-index rank  Earned position in TSWC-index
1     Ricardo A. Baeza-Yates     7                42          39         8               +1
2     Edmund M. Clarke           8                40          36         10              +2
3     Rajeev Motwani             9                38          34         11              +2
4     Alfred Menezes             11               36          33         12              +1
5     Anil K. Jain               11               36          33         12              +1
6     Franco P. Preparata        12               35          31         14              +2
7     Usama M. Fayyad            14               29          26         17              +3
8     Michael McGill             16               26          26         17              +1
9     James E. Rumbaugh          17               25          25         18              +1
10    Michael R. Blaha           19               23          23         20              +1
11    Scott A. Vanstone          20               21          19         22              +2

In Table 4.18, Michael McGill, James E. Rumbaugh and Michael R. Blaha have a TSWC-index equal to their NWC-index because their NWC and TSWC scores are the same, but their ranks improve by one position in the TSWC-index. Their TSWC score does not decrease because they share the topic of their first authors and of their co-authors. The remaining authors have a TSWC-index greater than their NWC-index and better ranks as well. Their scores increase because some of their co-authors do not share the first author's topic: the NWC scores of those co-authors are reduced and the scores of the same-topic authors are increased.

Figure 4.16: Scenario 1: Position up with respect to NWC-index (series: TSWC-index Rank, NWC-index Rank; x-axis: Authors; y-axis: index Rank)

Scenario 2: Relocation with respect to NWC-index: Rank down

Table 4.19: Position relocation with respect to NWC-index: Position down

S.No  Author                     TSWC-index rank  TSWC-index  NWC-index  NWC-index rank  Position down in TSWC-index
1     Gerard Salton              10               37          37         9               -1
2     Gregory Piatetsky-Shapiro  18               24          28         15              -3
3     Paul C. van Oorschot       21               20          27         16              -5
4     Berthier A. Ribeiro-Neto   23               18          26         17              -6
5     Prabhakar Raghavan         24               17          24         19              -5
6     Bernd-Holger Schlingloff   24               17          24         19              -5
7     Frederick Eddy             25               16          16         23              -2
8     Richard C. Dubes           26               15          22         21              -5
9     William E. Lorensen        27               11          11         24              -3

In Table 4.19, the TSWC-index rank of every author is worse than the NWC-index rank. Gerard Salton, Frederick Eddy and William E. Lorensen have a TSWC-index equal to their NWC-index, but their TSWC-index rank falls by 1, 2 and 3 positions respectively. Their rank falls because the TSWC-index of other authors has increased, which moves those authors ahead of them. Gregory Piatetsky-Shapiro, Paul C. van Oorschot, Berthier A. Ribeiro-Neto, Prabhakar Raghavan, Bernd-Holger Schlingloff and Richard C. Dubes do not share the first author's topic in some of their multi-authored papers, so their NWC score is reduced, which in turn lowers their TSWC-index and their rank.

Figure 4.17: Scenario 2: Position down with respect to NWC-index (series: TSWC-index Rank, NWC-index Rank; x-axis: Authors; y-axis: index rank)

Scenario 3: Position stable with respect to NWC-index

Table 4.20: Position stable with respect to NWC-index

S.No  Author                     TSWC-index rank  TSWC-index  NWC-index  NWC-index rank
1     David E. Goldberg          1                94          94         1
2     William G. Cochran         2                92          92         2
3     C. A. R. Hoare             3                84          84         3
4     J. Ross Quinlan            4                64          64         4
5     Bertrand Meyer             5                60          60         5
6     Jeffrey D. Ullman          6                56          57         6
7     Jiawei Han                 7                42          42         7
8     Christos H. Papadimitriou  13               34          32         13
9     Micheline Kamber           15               28          28         15
10    William J. Premerlani      22               19          19         22

The authors in Table 4.20 hold the same rank in the NWC-index and the TSWC-index. William G. Cochran, C. A. R. Hoare, J. Ross Quinlan, Micheline Kamber and William J. Premerlani have NWC and TSWC scores of 34194, 28227, 16859, 3363 and 1593 respectively, so their NWC-index and TSWC-index are equal. They share the topic of their first authors and of all co-authors in their multi-authored papers, so their TSWC score neither increases nor decreases and their rank stays the same. The TSWC scores of David E. Goldberg, Bertrand Meyer and Jiawei Han are reduced only slightly, which affects neither their TSWC-index nor their rank. Jeffrey D. Ullman's TSWC score falls from 13175 to 12878, which lowers his TSWC-index by one, and Christos H. Papadimitriou's TSWC score rises, which raises his TSWC-index to 34 against an NWC-index of 32; the ranks of both authors nevertheless remain the same in both indices.

Figure 4.18: Scenario 3: Position stable with respect to NWC-index (series: TSWC-index Rank, NWC-index Rank; x-axis: Authors; y-axis: index rank)

Chapter 5 Conclusions

5. Conclusions

It is important to consider the topics of co-authors when weighted citations are assigned to them in a multi-authored paper. To evaluate scientists on the basis of their topic-based contribution, we have proposed the NWC-index and the TSWC-index. The NWC-index of each author is calculated in the same way as the h-index, after Normalized Weighted Citations (NWC) scores have been allocated to the authors of multi-authored papers according to their rank.
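The statement that the index is calculated in the same way as the h-index, but over weighted scores, can be made concrete with a short sketch. It assumes only the standard h-index definition (the largest h such that h papers each have a score of at least h) applied to per-paper weighted scores; the function name `h_type_index` and the example numbers are hypothetical and do not come from the thesis dataset.

```python
def h_type_index(scores):
    """Largest h such that at least h papers have a score >= h.

    `scores` may be full citation counts (plain h-index) or fractional
    per-paper weighted scores such as an author's NWC or TSWC scores.
    """
    ranked = sorted(scores, reverse=True)
    h = 0
    for position, score in enumerate(ranked, start=1):
        if score >= position:
            h = position
        else:
            break
    return h


if __name__ == "__main__":
    # Hypothetical author with five papers.
    full_citations = [10, 8, 5, 4, 3]      # every co-author gets full credit
    weighted_scores = [10, 4, 2.5, 2, 3]   # credit shared on co-authored papers
    print(h_type_index(full_citations))    # 4
    print(h_type_index(weighted_scores))   # 3
```

In this hypothetical case the first paper is single-authored, so its score is unchanged, while sharing credit on the other papers lowers the index from 4 to 3. An author whose papers are all single-authored has identical score lists and therefore identical index values under both calculations.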
The topic of each author is checked in each paper against the topic of its first author. When the topics differ, the author's NWC score is reduced and the NWC scores of the remaining same-topic authors are increased; the TSWC-index is then calculated in the same way as the h-index. We have compared the results of both proposed methods, and the comparison clearly shows how the allocation of Topic Sensitive Weighted Citations changes the ranking of authors and the values of their indices. Our results also show the effects on author ranking and the variations in indices with respect to the kth-rank index and the h-index. Our analysis shows that an author with only single-authored papers receives the full citation score, so his/her NWC-index and TSWC-index are equal to the h-index and the kth-rank index.

One direction for future work is to identify equal contributions in multi-authored papers, i.e. cases in which all authors of a paper have contributed equally to it. Another is to reduce co-authors' weights according to the correlation of their topics with that of the first author. If a co-author's topic is closely correlated with the first author's topic, his/her weight should be reduced by a smaller amount, and if the topic is hardly correlated with the first author's topic, the weight should be reduced more.

References

[1] Abbas, A.M. 2011. Weighted indices for evaluating the quality of research with multiple authorship. Scientometrics, 88, 1, 107–131.
[2] Abramo, G., D’Angelo, C.A. and Rosati, F. 2013. The importance of accounting for the number of coauthors and their order when assessing research performance at the individual level in the life sciences. Journal of Informetrics, 7, 1, 198–208.
[3] Adler, R., Ewing, J. and Taylor, P. 2008. Citation Statistics. A report from the International Mathematical Union (IMU).
[4] Alonso, S., Cabrerizo, F.J., Herrera-Viedma, E. and Herrera, F. 2010. hg-Index: A new index to characterize the scientific output of researchers based on the h- and g-indices. Scientometrics, 82, 2, 391–400.
[5] Anania, G. and Caruso, A. 2013. Two simple new bibliometric indexes to better evaluate research in disciplines where publications typically receive less citations. Scientometrics, 96, 2, 617–631.
[6] Aziz, N.A. and Rozing, M.P. 2013. Profit (p)-Index: The Degree to Which Authors Profit from Co-Authors. PLoS ONE, 8, 4, e59814. doi:10.1371/journal.pone.0059814.
[7] Batista, P.D., Campiteli, M.G., Kinouchi, O. and Martinez, A.S. 2006. Is it possible to compare researchers with different scientific interests? Scientometrics, 68, 1, 179–189.
[8] Blei, D.M., Ng, A.Y. and Jordan, M.I. 2003. Latent Dirichlet Allocation. JMLR, 3, 993–1022.
[9] Bornmann, L., Mutz, R. and Daniel, H.D. 2008. Are there better indices for evaluation purposes than the h-index? A comparison of nine different variants of the h-index using data from biomedicine. Journal of the American Society for Information Science and Technology, 59, 5, 830–837.
[10] Burrell, Q.L. 2007. Hirsch’s h-index: a stochastic model. Journal of Informetrics, 1, 1, 16–25.
[11] Cabrerizo, F.J., Alonso, S., Herrera-Viedma, E. and Herrera, F. 2010. q2-Index: Quantitative and Qualitative Evaluation Based on the Number and Impact of Papers in the Hirsch Core. Journal of Informetrics, 4, 1, 23–28.
[12] Carbone, V. 2011. Fractional counting of authorship to quantify scientific research output. arXiv:1106.0114v1.
[13] Chai, J.C., Hua, P.H., Rousseau, R. and Wan, J.K. 2008. The Adapted Pure h-Index. In Proceedings of WIS 2008, Berlin, Fourth International Conference on Webometrics, Informetrics and Scientometrics & Ninth COLLNET Meeting.
[14] DBLP Bibliography Database: DBLP-Citation-network V5, http://arnetminer.org/citation.
[15] Egghe, L. 2006. Theory and Practice of the g-index. Jointly published by Akadémiai Kiadó, Budapest and Springer, Dordrecht, Scientometrics, 69, 1, 131–152.
[16] Egghe, L. 2006. An Improvement to H-index: The G-index. ISSI Newsletter, 2, 1, 8–9.
[17] Egghe, L. 2008f. Mathematical theory of the h-index and g-index in case of fractional counting of authorship. Journal of the American Society for Information Science and Technology, 59, 10, 1608–1616.
[18] Garfield, E. 1999. Journal Impact Factor: a brief review. CMAJ, 161, 979–980.
[19] Hagen, N.T. 2008. Harmonic allocation of authorship credit: Source-level correction of bibliometric bias assures accurate publication and citation analysis. PLoS ONE, 3, 12, e4021.
[20] Hirsch, J.E. 2005. An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences, 102, 46, 16569–16572.
[21] Hirsch, J.E. 2010. An index to quantify an individual’s scientific research output that takes into account the effect of multiple coauthorship. Scientometrics, 85, 3, 741–754.
[22] Jin, B.H. 2006. h-Index: An evaluation indicator proposed by scientist. Science Focus, 1, 1, 8–9.
[23] Jin, B.H. 2007. The AR-index: complementing the h-index. ISSI Newsletter, 3, 1, 6.
[24] Jin, B.H., Liang, L.M., Rousseau, R. and Egghe, L. 2007. The R- and AR-indices: Complementing the h-index. Chinese Science Bulletin, 52, 6, 855–863.
[25] Kennedy, D. 2003. Multiple authors, multiple problems. Science, 301, 733.
[26] Kosmulski, M. 2006. A new Hirsch-type index saves time and works equally well as the original h-index. ISSI Newsletter, 2, 3, 4–6.
[27] Liu, X.Z. and Fang, H. 2012. Fairly sharing the credit of multi-authored papers and its application in the modification of h-index and g-index. Scientometrics, 91, 1, 37–49.
[28] Liu, X.Z. and Fang, H. 2012b. Modifying h-index by allocating credit of multi-authored papers whose author names rank based on contribution. Journal of Informetrics, 6, 4, 557–565.
[29] Schreiber, M. 2008a. To share the fame in a fair way, hm modifies h for multi-authored manuscripts. New Journal of Physics, 10, 040201, 1–9.
[30] Schreiber, M. 2008b. A modification of the h-index: The h(m)-index accounts for multi-authored manuscripts. Journal of Informetrics, 2, 3, 211–216.
[31] Schreiber, M. 2009. Fractionalized counting of publications for the g-Index. Journal of the American Society for Information Science, 60, 10, 2145–2150.
[32] Seglen, P.O. 1997. Why the impact factor of journals should not be used for evaluating research. British Medical Journal, 314, 7079, 498–502.
[33] Sekercioglu, C.H. 2008. Quantifying coauthor contributions. Science, 322, 371.
[34] Wan, J.K., Hua, P.H. and Rousseau, R. 2007. The pure h-index: calculating an author’s h-index by taking co-authors into account. COLLNET Journal of Scientometrics and Information Management, 1, 2, 1–5.
[35] Zhang, C.T. 2009. The e-Index, Complementing the h-Index for Excess Citations. PLoS ONE, 4, 5, e5429. doi:10.1371/journal.pone.0005429.
[36] Zhang, C.T. 2009. A proposal for calculating weighted citations based on author rank. EMBO Reports, 10, 5, 416–417.