Co-authorship Database Development Using Google Scholar Data

COLLECTING AND MAPPING FACULTY COLLABORATION DATA
Mingzhu Zhu, Dr. Brook Wu, Dr. Nancy Steffen-Fluhr, Dr. S. Roxanne Hiltz, Dr. Katia Passerini, Dr. Anatoliy Gruzd, Regina Collins
Data collection from Google Scholar
Data collection from Scopus
Scopus author search: first name, last name
Affiliation filtering: common names
Pub list fetcher: extract title, year, source title…
Pub detail page fetcher: extract bibliographic data, mapping authors with their affiliation information
Merge three co-authorship databases
A list of NJIT faculty names was used to carry out Google Scholar's author search, and 63,937 raw search hits were collected. For the co-authorship database, only publications with at least two NJIT authors qualify for inclusion.
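The inclusion rule (at least two NJIT authors) can be sketched as a simple filter. This is a minimal sketch, assuming each publication is a dict with an `authors` list and that author names are already normalized to match the faculty list exactly; `njit_coauthored`, the dict layout, and the sample names are hypothetical.

```python
from typing import Iterable


def njit_coauthored(pubs: Iterable[dict], njit_faculty: set, min_njit: int = 2) -> list:
    """Keep publications that list at least `min_njit` distinct NJIT faculty.

    Hypothetical helper: assumes pub["authors"] holds names already normalized
    to the same spelling used in `njit_faculty`.
    """
    kept = []
    for pub in pubs:
        # Use a set so a name listed twice is only counted once
        njit_authors = {name for name in pub["authors"] if name in njit_faculty}
        if len(njit_authors) >= min_njit:
            kept.append(pub)
    return kept


# Usage with made-up names and titles
faculty = {"Faculty A", "Faculty B"}
hits = [
    {"title": "Paper 1", "authors": ["Faculty A", "Faculty B", "External X"]},
    {"title": "Paper 2", "authors": ["Faculty A", "External Y"]},
]
print([p["title"] for p in njit_coauthored(hits, faculty)])  # ['Paper 1']
```

In practice, name matching is the hard part; exact string matching is used here only to keep the filtering logic visible.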
(Flowchart: the co-authorship DB from Scopus, the co-authorship DB from Google Scholar, and the co-authorship DB manually created 2 years ago are combined through publication merge, duplicate removal, and author pub mapping into a single co-authorship database.)
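Merging the three databases requires deciding when two records describe the same publication. The sketch below is a minimal illustration, assuming records are dicts with `title` and `authors` and that lowercasing, removing punctuation, and collapsing whitespace give a good enough match key. `title_key`, `merge_databases`, and the source labels are hypothetical, not the procedure used for the poster.

```python
import re


def title_key(title: str) -> str:
    """Hypothetical match key: lowercase, drop punctuation, collapse whitespace."""
    no_punct = re.sub(r"[^\w\s]", "", title.lower())
    return " ".join(no_punct.split())


def merge_databases(sources: dict) -> list:
    """Merge co-authorship records from several named sources.

    Records whose titles share a key are collapsed into one entry; their author
    sets are unioned and the contributing source names are recorded.
    """
    merged = {}
    for source_name, records in sources.items():
        for rec in records:
            entry = merged.setdefault(
                title_key(rec["title"]),
                {"title": rec["title"], "authors": set(), "sources": set()},
            )
            entry["authors"].update(rec["authors"])
            entry["sources"].add(source_name)
    return list(merged.values())


# Usage with made-up records; the source labels are illustrative
sources = {
    "scopus": [{"title": "Mining Co-authorship Data", "authors": ["A", "B"]}],
    "google_scholar": [{"title": "mining coauthorship data", "authors": ["B", "C"]}],
    "manual": [{"title": "Another Paper", "authors": ["A", "C"]}],
}
for entry in merge_databases(sources):
    print(entry["title"], sorted(entry["authors"]), sorted(entry["sources"]))
```

Keeping the set of contributing sources makes it easy to see afterwards which publications were found by only one of the three collection methods.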
Fig-2. Display of Publications in Google Scholar and ACM
Co-authorship database development using Google Scholar
Period of study: publications from 2000-2010
Co-authorship data between NJIT faculty members
Co-authorship data between NJIT faculty and graduate students
Co-authorship data between NJIT faculty and external authors
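The three kinds of co-authorship data and the 2000-2010 study period can be expressed as a pair classifier. This is a minimal sketch, assuming it is already known which names belong to NJIT faculty and which to NJIT graduate students, and that everyone else counts as an external author. The function name, the inclusive year bounds, and the sample data are assumptions.

```python
from itertools import combinations


def classify_pairs(pubs, faculty, students, start=2000, end=2010):
    """Count co-author pairs by type for publications in [start, end].

    Hypothetical helper. Only pairs involving at least one NJIT faculty member
    are counted; a non-faculty partner is a graduate student if listed in
    `students`, otherwise an external author.
    """
    counts = {"faculty-faculty": 0, "faculty-student": 0, "faculty-external": 0}
    for pub in pubs:
        if not start <= pub["year"] <= end:
            continue  # outside the study period
        for a, b in combinations(sorted(set(pub["authors"])), 2):
            if a in faculty and b in faculty:
                counts["faculty-faculty"] += 1
            elif a in faculty or b in faculty:
                partner = b if a in faculty else a
                kind = "faculty-student" if partner in students else "faculty-external"
                counts[kind] += 1
    return counts


pubs = [
    {"year": 2005, "authors": ["F1", "F2", "S1", "E1"]},
    {"year": 1999, "authors": ["F1", "F2"]},  # excluded by the period filter
]
print(classify_pairs(pubs, faculty={"F1", "F2"}, students={"S1"}))
# {'faculty-faculty': 1, 'faculty-student': 2, 'faculty-external': 2}
```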
An overview of co-authorship database development using Google Scholar is illustrated in Fig-3.
Co-authorship network visualization
Co-authorship database development using Scopus
An overview of co-authorship database development using Scopus is illustrated in Fig-1.
(Flowchart labels from Fig-1 and Fig-3: NJIT faculty name list; for each faculty name, do author search (step 1); Scopus; author list; pub list; pub detail; bibliographic data extractor; co-authored publication extraction (step 2); duplicate removal; page fetcher: downloads pubs from different DLs (step 3); candidate coauthored publication {Title, URLs}; publication database; coauthor database; author affiliation mapping; raw data process.)
Fig-1. Overview of co-authorship database development from Scopus
A list of NJIT faculty names was used to carry out the Scopus author search. Fortunately, no NJIT faculty members have common names. We use affiliation information to filter out authors who come from other organizations but have the same names as NJIT faculty members.
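The affiliation filter can be sketched as a substring check on each author-search hit. This assumes each hit is a dict carrying a list of affiliation strings and that NJIT appears as "New Jersey Institute of Technology" or "NJIT". The field names, the marker strings, and `filter_by_affiliation` are hypothetical, not the Scopus response format.

```python
# Hypothetical marker strings; real affiliation strings may vary.
NJIT_MARKERS = ("new jersey institute of technology", "njit")


def filter_by_affiliation(hits, markers=NJIT_MARKERS):
    """Keep author-search hits whose affiliations mention NJIT.

    Assumes each hit is a dict with an optional "affiliations" list of strings.
    Same-name authors at other organizations are dropped.
    """
    kept = []
    for hit in hits:
        text = " | ".join(hit.get("affiliations", [])).lower()
        if any(marker in text for marker in markers):
            kept.append(hit)
    return kept


hits = [
    {"name": "J. Doe", "affiliations": ["New Jersey Institute of Technology, Newark"]},
    {"name": "J. Doe", "affiliations": ["Another University"]},
    {"name": "J. Doe"},  # no affiliation recorded
]
print(len(filter_by_affiliation(hits)))  # 1
```

Hits with no recorded affiliation are dropped here; a more cautious variant might set them aside for manual review instead.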
The co-authorship database was developed by extracting the bibliographic data for each faculty member.
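One way to hold the extracted bibliographic data and derive the author affiliation mapping from it is sketched below. The `PubRecord` fields (title, year, source title, author/affiliation pairs) follow the items listed for the pub list and pub detail page fetchers, but the class name, field names, and `author_affiliation_map` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class PubRecord:
    """Hypothetical record for one publication's bibliographic data."""
    title: str
    year: int
    source_title: str
    # (author name, affiliation) pairs as read from the publication detail page
    author_affiliations: list = field(default_factory=list)


def author_affiliation_map(records):
    """Map each author name to the set of affiliations seen across records."""
    mapping = defaultdict(set)
    for rec in records:
        for author, affiliation in rec.author_affiliations:
            mapping[author].add(affiliation)
    return dict(mapping)


records = [
    PubRecord("Paper 1", 2005, "Journal X", [("A", "NJIT"), ("B", "Other University")]),
    PubRecord("Paper 2", 2008, "Conference Y", [("A", "NJIT")]),
]
print(author_affiliation_map(records))
# {'A': {'NJIT'}, 'B': {'Other University'}}
```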
(Flowchart labels from Fig-3: ……; ScienceDirect; publication database {Title, URLs}; affiliation filtering; portal.acm.org; Google Scholar; affiliation verification (step 4); coauthored publication database {Title, authors}.)
Fig-3. Overview of co-authorship database development
Co-authored publication extraction:
If a paper is co-authored by two NJIT faculty members, it should appear in both of their publication lists. For instance, suppose there are two authors A and B with publication lists $P_a$ and $P_b$:
Publication list for author A: $P_a = \{p^{(a)}_1, p^{(a)}_2, p^{(a)}_3, \ldots, p^{(a)}_n\}$, and
Publication list for author B: $P_b = \{p^{(b)}_1, p^{(b)}_2, p^{(b)}_3, \ldots, p^{(b)}_m\}$.
If a publication $p$ is co-authored by A and B, then $p \in P_a$ and $p \in P_b$, i.e., $p \in P_a \cap P_b$.
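The intersection rule extends to every pair of faculty members: compute the shared publications of each pair and collect them as candidates. This is a minimal sketch, assuming each publication list is a set of identifiers that are already consistent across authors (for example, normalized titles); `coauthored_candidates` and the sample data are hypothetical. In this sketch a paper with three NJIT authors appears once per author pair, which is one way duplicate candidates can arise.

```python
from itertools import combinations


def coauthored_candidates(pub_lists):
    """Return (author_a, author_b, publication) for every pair's shared publications.

    `pub_lists` maps each faculty member to a set of publication identifiers;
    a publication co-authored by A and B lies in pub_lists[A] & pub_lists[B].
    """
    candidates = []
    for a, b in combinations(sorted(pub_lists), 2):
        for pub in sorted(pub_lists[a] & pub_lists[b]):
            candidates.append((a, b, pub))
    return candidates


pub_lists = {
    "A": {"p1", "p2", "p3"},
    "B": {"p2", "p4"},
    "C": {"p2", "p3"},
}
cands = coauthored_candidates(pub_lists)
print(cands)
# [('A', 'B', 'p2'), ('A', 'C', 'p2'), ('A', 'C', 'p3'), ('B', 'C', 'p2')]
print(sorted({pub for _, _, pub in cands}))  # ['p2', 'p3']
```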
2,043 coauthored publication candidates were identified. After duplicate removal, 1,914 publications remained.
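Duplicate removal depends on how "the same publication" is detected. Exact title comparison misses small differences such as hyphenation or spacing, so the sketch below swaps in fuzzy matching with Python's `difflib.SequenceMatcher`. The 0.9 threshold, the record layout, and the function names are assumptions, not the poster's method.

```python
from difflib import SequenceMatcher


def is_near_duplicate(title1: str, title2: str, threshold: float = 0.9) -> bool:
    """Treat two titles as the same publication if their similarity ratio >= threshold."""
    a = " ".join(title1.lower().split())
    b = " ".join(title2.lower().split())
    return SequenceMatcher(None, a, b).ratio() >= threshold


def remove_duplicates(candidates, threshold=0.9):
    """Keep the first candidate of each group of near-identical titles.

    Candidates are dicts with at least a "title" key (e.g. {Title, URLs}).
    """
    unique = []
    for cand in candidates:
        if not any(is_near_duplicate(cand["title"], kept["title"], threshold) for kept in unique):
            unique.append(cand)
    return unique


candidates = [
    {"title": "Mining Co-authorship Networks", "url": "u1"},
    {"title": "mining coauthorship  networks", "url": "u2"},
    {"title": "A Different Paper", "url": "u3"},
]
print([c["url"] for c in remove_duplicates(candidates)])  # ['u1', 'u3']
```

Every candidate is compared against every kept title, so the cost grows quadratically; for larger collections, comparing only titles that share some cheap attribute (such as publication year) would cut the work down.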
Fig-4. Co-authorship network created using the keyword “Jian”
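A co-authorship network like the one in Fig-4 can be built from a coauthored publication database of {Title, authors} records: authors become nodes, and each edge is weighted by the number of shared publications. This is a minimal sketch with made-up data. It does not reproduce how the keyword “Jian” was used to select the network, and `coauthorship_edges` is a hypothetical name.

```python
from collections import Counter
from itertools import combinations


def coauthorship_edges(pubs):
    """Weight each author pair by the number of publications they share."""
    weights = Counter()
    for pub in pubs:
        # Sort so each undirected edge always uses the same (a, b) key
        for a, b in combinations(sorted(set(pub["authors"])), 2):
            weights[(a, b)] += 1
    return weights


pubs = [
    {"title": "T1", "authors": ["B", "A"]},
    {"title": "T2", "authors": ["A", "B", "C"]},
]
for (a, b), w in sorted(coauthorship_edges(pubs).items()):
    print(a, b, w)
# A B 2
# A C 1
# B C 1
```

The resulting weighted edge list can be handed to a graph or visualization tool to draw the network.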