EFFICIENT WEB SEARCH STRATEGIES HOW TO USE GOOGLE IN AN APPROPRIATE MANNER

ProZ.com Regional Conference –
Edinburgh, UK, 10/11 November 2006
EFFICIENT WEB SEARCH STRATEGIES
OR HOW TO USE GOOGLE IN AN APPROPRIATE MANNER IN A T&I CONTEXT
Steffen Walter – http://proz.com/pro/34047
© Steffen Walter 2006
Ways and means to achieve better Internet search results
WHY THIS PRESENTATION?
Available search options appear to be underused
ProZ.com evidence: many KudoZ questions could have been resolved before they were posted (by performing appropriate ProZ.com term and/or Internet searches)
DO YOUR OWN RESEARCH – IT’S A BASIC TRANSLATOR SKILL!
CONTENT OVERVIEW
CAUTION! Google is an IT-based tool that does NOT replace the human intellect
How it works
Getting started – the Google Toolbar
Web search basics and advanced features
What sources to look for from a T&I perspective
Quality sources vs web garbage – how to tell
Other searchable resources
Summary
Overview of presentation content
HOW GOOGLE WORKS
• Google presumably holds the world’s largest database of (cached) websites and other documents
• So-called “spiders” or “robots” (special software developed by Google) index and rank websites and other documents
• On a given website, “spiders” collect data by moving from hyperlink to hyperlink
• Collected data is processed by Google to establish the “page ranking” (reflected in the order in which URLs appear after a specific search)
• Shortcoming: page ranking may be affected by advertising, which can distort search results
Principle of Google
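The crawl-and-rank principle can be illustrated with a toy model. The sketch below is not Google's implementation, which the slides do not describe; it assumes a tiny, hypothetical link graph (LINKS), follows hyperlinks breadth-first, and computes a simplified PageRank-style score in which pages linked to by well-scored pages score higher. The damping value and the number of rounds are illustrative assumptions.

```python
from collections import deque

# Hypothetical link graph: page -> pages it links to (an assumption for illustration).
LINKS = {
    "home": ["about", "glossary"],
    "about": ["home"],
    "glossary": ["home", "about", "terms"],
    "terms": ["glossary"],
}


def crawl(start, links):
    """Visit pages breadth-first by following hyperlinks from `start`."""
    seen, queue, order = {start}, deque([start]), []
    while queue:
        page = queue.popleft()
        order.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return order


def rank(links, damping=0.85, rounds=50):
    """Simplified PageRank-style score: each page passes a share of its
    score to the pages it links to, repeated for a fixed number of rounds."""
    pages = list(links)
    score = {p: 1.0 / len(pages) for p in pages}
    for _ in range(rounds):
        new = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += damping * score[page] / len(targets)
        score = new
    return score


visited = crawl("home", LINKS)
scores = rank(LINKS)
for page in sorted(scores, key=scores.get, reverse=True):
    print(f"{page:10s} {scores[page]:.3f}")
```

The printed order corresponds to the order in which a search engine of this kind would list the pages, all else being equal.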
Google help screen – more info
Use the Google help pages at http://www.google.com/support to get familiar with search options and additional features, such as the Google Toolbar.
Start: the Google Toolbar for added search convenience
To download the Google Toolbar, go to the page shown above, http://toolbar.google.com.
Available for Internet Explorer and Mozilla Firefox.
Note the “PageRank” feature in the toolbar (cf. “How Google Works”).
Google start page – simple search
Simple search input screen; the advanced search screen explains the search options in more detail.
Google advanced search screen
Web search basics
• Any combination of terms entered in the search box will yield hits containing ALL terms (there is no need to add AND in between)
• Example: translation interpreting editing proofreading generates a list of URLs/pages containing all four terms
• Choose keywords that are as specific as possible to reduce the number of hits and increase their relevance (i.e. go from more to less specific in the search process)
• Common words, such as (in English) the, in, or of, are usually ignored by the search algorithm (they must be preceded by a plus sign to be taken into account, as in translation +in +the world)
• Searches are not case-sensitive: TRansLaTion will yield the same hits as translation
Example:
Try a Google Web search for the term sequence translation interpreting editing
proofreading (entered exactly as shown here, without inverted commas).
Pages/documents will appear that contain all four terms, such as several
ProZ.com profile hits.
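The basic rules (all terms required, common words ignored unless preceded by a plus sign, no case sensitivity) can be mimicked with a toy matcher. This is only a sketch of the behaviour described on the slide, not of the search engine itself; the stop-word list and the sample page text are assumptions made for illustration.

```python
import re

# Assumed stop-word list, for illustration only.
STOP_WORDS = {"the", "in", "or", "of"}


def effective_terms(query):
    """Return the terms that count: lower-cased, with common words
    dropped unless they carry a leading plus sign."""
    terms = []
    for raw in query.lower().split():
        forced = raw.startswith("+")
        word = raw.lstrip("+")
        if word and (forced or word not in STOP_WORDS):
            terms.append(word)
    return terms


def matches(query, text):
    """True if every effective term occurs as a word in the text (implicit AND)."""
    words = set(re.findall(r"\w+", text.lower()))
    return all(term in words for term in effective_terms(query))


# Hypothetical page text.
page = "Translation, interpreting, editing and proofreading services"
print(matches("translation interpreting editing proofreading", page))
print(matches("TRansLaTion", page))
print(effective_terms("translation +in +the world"))
```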
Advanced search features
• If required, terms can be excluded by putting a minus sign immediately in front, as in languages translation interpreting -proofreading or auctions paintings -ebay
• Phrase search can be performed by putting the entire phrase in inverted commas, as in “translation in the world”
• Putting the tilde sign ~ in front of terms enables a fuzzy search for synonyms or related words, as in ~vehicles ~transport ~shipping, which also yields hits with terms such as shipments or transportation
• The separator symbol | (keyboard shortcut AltGr + the < > key), as in “to make|take a decision” (in inverted commas), will show hits with either of the phrases “to make a decision” and “to take a decision”
• A search can be restricted to specific domains or URLs with the site: command, as in translation site:.proz.com, IAS IFRS “accounting standards” site:.eu.int, or “code of conduct” site:.iti.org.uk
Examples:
1. Go to the Google Web search page and type in languages translation interpreting proofreading and languages translation interpreting -proofreading, or auctions paintings ebay and auctions paintings -ebay. Compare the number of hits: it will be larger in the latter cases because fewer terms are required, which results in a lesser degree of specificity.
2. Type the entire phrase “translation in the world” in the search box, which gives, among other hits, a (somewhat outdated) paper on machine translation; see http://www.eamt.org/archive/summit95.html
3. Type in ~vehicles ~transport ~shipping and you will see other, related terms appearing in the search results.
4. Keying in translation site:.proz.com will yield all hits on the ProZ.com site that contain the term translation. Also try the second combination, IAS IFRS “accounting standards” site:.eu.int, which will lead you straight to the relevant EU Directive at http://europa.eu.int/eurlex/pri/de/oj/dat/2003/l_261/l_26120031013de00010420.pdf (please note that this document reflects the 2003 version of the standards, which has been updated and expanded in the meantime). The third hit returned by the search for “code of conduct” site:.iti.org.uk gets you directly to the downloadable code of conduct documents of the UK Institute of Translation and Interpreting, accessible from http://www.iti.org.uk/pages/joinITI/code.asp
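For anyone who assembles such queries repeatedly, the operators above can be combined programmatically. The following is a minimal sketch: build_query and search_url are hypothetical helper names, and the search URL pattern is an assumption rather than something taken from the slides. Note that the code uses straight ASCII quotation marks for phrases, which is what a search box expects.

```python
from urllib.parse import urlencode


def build_query(terms=(), phrase=None, exclude=(), site=None):
    """Combine plain terms, one quoted phrase, excluded terms and a
    site: restriction into a single query string."""
    parts = list(terms)
    if phrase:
        parts.append(f'"{phrase}"')
    parts.extend(f"-{word}" for word in exclude)
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a search URL (assumed URL pattern)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


q1 = build_query(["languages", "translation", "interpreting"], exclude=["proofreading"])
q2 = build_query(["IAS", "IFRS"], phrase="accounting standards", site=".eu.int")
q3 = build_query(phrase="code of conduct", site=".iti.org.uk")
for q in (q1, q2, q3):
    print(q)
    print(search_url(q))
```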
More advanced search operators
At http://www.google.com/intl/en/help/operators.html (the screenshot above), you will find a complete list of advanced search operators supported by Google, such as the define: command (for example, enter define:code of conduct to get a list of [English] code of conduct definitions in various contexts).
Special search features
Features other than ordinary web search at
http://www.google.com/help/features.html
There’s more to it than pure Web search
Google includes more options than pure web search, such as:
• the Google Directory - http://www.google.com/dirhp (see above), a search tool organized along broad main categories and more specific sub-categories
Examples:
The main category “References” includes links to online encyclopedias, mono- and multilingual dictionaries, libraries, and many more
The “Translation” subcategory is ‘hidden’ behind Science -> Social Sciences -> Linguistics -> Translation
The “World” category contains a similar search structure linking to sources in languages other than English
• the Google Scholar site (see next slide)
There’s more to it than pure Web search (2)
See the screenshot of the Google Scholar page (advanced search) above, http://scholar.google.com/advanced_scholar_search?hl=en&lr, which serves as a tool for restricting your search to scientific publications on the Web, such as peer-reviewed articles, papers or theses, or references thereto.
Example: Enter the medical term/subject “congenital heart disease in adults” in the Scholar page search box to get a list of URLs containing (English) articles or abstracts related to that subject, such as http://bja.oxfordjournals.org/cgi/content/short/93/1/129
Please note that search results are usually introductory paragraphs or abstracts of articles/papers. Full texts can normally only be viewed by subscribers and/or registered users.
What sources to look for from a T&I perspective
• Web-based monolingual encyclopedias or other comprehensive references, such as Merriam-Webster or Wikipedia, which is available in several languages (see Google Directory above)
• Bi- or multilingual glossaries and dictionaries
• Parallel texts, such as those appearing on websites operated by multinational groups of companies (e.g. annual reports) or government institutions
• Monolingual specialized texts, such as articles or papers (see Google Scholar page above)
• Translation and interpretation portals/venues/resource collections, such as ProZ.com ☺
Broad categorization of online sources relevant to translation and interpretation
Examples:
1. Encyclopedias and similar:
Merriam-Webster - http://www.m-w.com/
Wikipedia - http://en.wikipedia.org/wiki/Main_Page (English), http://de.wikipedia.org/wiki/Hauptseite (German), http://fr.wikipedia.org/wiki/Accueil (French)
Encyclopædia Britannica - http://www.britannica.com/ (subscription fee applies after free trial)
Answers.com – http://www.answers.com
HowStuffWorks – http://www.howstuffworks.com
2. Glossaries/dictionaries, to be searched for according to the specific context and bookmarked and/or downloaded for further use (-> see also Glosspost - http://www.proz.com/glosspost)
3. For parallel texts see, for example, http://www.daimlerchrysler.com/ (English) vs http://www.daimlerchrysler.de (German), or http://www.bundesregierung.de/Webs/Breg/DE/Homepage/home.html (German) vs http://www.bundesregierung.de/Webs/Breg/EN/Homepage/home.html (English) vs http://www.bundesregierung.de/Webs/Breg/FR/Homepage/home.html (French)
4. For monolingual specialized texts see Google Scholar page above
And now for some more practical examples …
… straight out of KudoZ:
• http://www.proz.com/kudoz/1621556 - DE>EN filtergängig
• Assuming that “filtergängig” denotes matter or particles able to pass through the filter, I came up with the guess “filter-passing”.
• To confirm, a parallel search for “filtergängige” and “filter-passing” was carried out
• Result: parallel English/German title at http://www.vdi.de/vdi/vrp/richtliniendetails/index.php?ID=2756443
Practical examples to demonstrate search options
Some more practical examples … (2)
• http://www.proz.com/kudoz/831339 - DE>EN “Kreuzdiagnosen und Sterndiagnosen”
• As these terms seemed to be related to the ICD coding of diseases, and the asker indicated that she wanted a confirmation of UK usage, two search steps were performed:
a) A site-specific Web search for “ICD Kreuzdiagnosen site:.de”, which yielded a hit at http://www.uniessen.de/nieren&hochdruck/Pages/DRGNephrol.html that confirmed “Kreuzdiagnosen” to rank above “Sterndiagnosen”, with both belonging to the ICD system
b) Assuming that DE “Stern” was equivalent to EN “asterisk”, a second search for “ICD ‘asterisk codes’ site:.uk”, which confirmed the use of “asterisk codes” in the authoritative sources http://www.connectingforhealth.nhs.uk/dscn/dscn2001/072001.pdf and http://www.ic.nhs.uk/casemix/faq/sub4/hrg_grouper_8
• As a by-product, these documents also proved “dagger codes” to be the EN equivalent of “Kreuzdiagnosen”, with the “dagger” being the + sign.
Practical examples to demonstrate search options (please note that links
originally quoted in the question were replaced with current ones serving the
same purpose)
Some more practical examples … (3)
• http://www.proz.com/kudoz/855179 - EN>DE “feed and discharge chutes”
• Based on prior knowledge that “feed and discharge” are often rendered as “Aufgabe und Austrag” in German in materials handling and conveying equipment, while “chutes” are “Schurren” or “Rutschen”, a “site:.de” search for each of the compound nouns “Aufgabeschurre”, “Aufgaberutsche”, “Austragsrutsche” and “Austragsschurre” was performed.
• Examples of results:
http://www.vhvanlagenbau.de/www/Deutsch/Produkte/Zubehoer/schurrenhauben_main.htm (“Aufgabeschurre”, with images)
http://www.maschinensucher.de/bruecken/Verfahrenstechnik.html (“Austragsschurre”)
Practical examples to demonstrate search options
Some more practical examples … (4)
• http://www.proz.com/kudoz/692500 - DE>EN “Rückbildungsphase” in a cardiology context (ECG)
• To confirm the initial assumption that “repolarization phase” would be an adequate rendition, and based on the premise that parallel DE and EN sources exist in the free Wikipedia encyclopedia, a two-step search for a) EN ECG Wikipedia and b) DE EKG Wikipedia was undertaken in order to find the source term and the assumed target term in the descriptions.
• Results: DE http://de.wikipedia.org/wiki/Elektrokardiogramm contains “Erregungsrückbildung” while the parallel EN description at http://en.wikipedia.org/wiki/Electrocardiogram uses “repolarization”; cf. DE “T-Welle entspricht der Erregungsrückbildung der Kammer” vs. EN “The T wave represents the repolarization of the ventricles”.
Practical examples to demonstrate search options
More tips and tricks if all else fails
• Use existing knowledge and linguistic imagination to come up with “educated guesses” in the target language to be used as search terms; more often than not this will take you one step further (DE>EN KudoZ example: http://www.proz.com/kudoz/866781 “Werksstückausstand”, presumably synonymous with the more common “Werkstücküberstand”: I guessed that “workpiece projection/overhang” could be appropriate, and found native English example sites confirming that suspicion).
• Try a parallel bilingual search of the source term/phrase and the assumed (or partly known) target term/phrase with a restriction to either source or target language domains.
• If used responsibly, Google can be of some help in identifying correct native usage (for native EN sources, this should always involve a domain-specific phrase search for .uk, .us, .gov, .au etc. sources).
Some more tips …
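The parallel bilingual search with a domain restriction lends itself to a small helper that generates one query per educated guess and domain. The sketch below is an illustration only; parallel_queries is a hypothetical function name, and the candidate terms and domains are taken from the KudoZ example above purely as sample input.

```python
from itertools import product


def parallel_queries(source_term, candidates, domains):
    """Build one phrase query per (candidate, domain) pair, pairing the
    source term with an assumed target term and restricting the domain."""
    return [
        f'"{source_term}" "{candidate}" site:{domain}'
        for candidate, domain in product(candidates, domains)
    ]


queries = parallel_queries(
    "Werkstücküberstand",
    ["workpiece projection", "workpiece overhang"],
    [".de", ".uk"],
)
for query in queries:
    print(query)
```

Each printed line can be pasted into the search box as is; which guesses produce hits on native-language domains then has to be judged by the translator.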
Quality sources vs web garbage – how to tell
Screening the search hit list for
- Domain extensions (for example, in a scientific context, .edu or .ac.uk sites, if not already specified by the site: command, are more likely to be of help than .com or .biz sites)
- URLs of reputable companies/institutions that increase the likelihood of encountering reliable sources
- Correct spelling and grammar (site extracts/paragraphs containing many errors indicate sloppiness and inaccuracy)
Using site extracts/sentences shown in the hit list
- To identify key terms that indicate the source’s relevance for the subject matter at hand and
- To get a first impression as to whether the terminology used is in keeping with the field or text to be translated
Sources not meeting the above conditions should be disregarded
An attempt at separating the wheat from the chaff
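The domain-extension check can be turned into a rough first-pass sort of a hit list. This is a sketch under stated assumptions: the scores are arbitrary illustrative weights, the URLs are made up, and the result is no substitute for actually reading the extracts as described above.

```python
from urllib.parse import urlparse

# Illustrative groupings based on the scientific-context example; the weights are assumptions.
PREFERRED = (".edu", ".ac.uk")
LESS_LIKELY = (".com", ".biz")


def domain_score(url):
    """Score a URL by its domain extension: 2 = preferred, 1 = neutral, 0 = less likely."""
    host = urlparse(url).hostname or ""
    if host.endswith(PREFERRED):
        return 2
    if host.endswith(LESS_LIKELY):
        return 0
    return 1


def screen(urls):
    """Sort hits so that preferred domains come first (original order kept within a group)."""
    return sorted(urls, key=domain_score, reverse=True)


# Hypothetical hit list.
hits = [
    "http://shop.example.com/a",
    "http://www.example.ac.uk/b",
    "http://example.org/c",
    "http://lab.example.edu/d",
]
for url in screen(hits):
    print(domain_score(url), url)
```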
Summary
PROS:
• The Internet has become a vast knowledge resource to be exploited to the largest extent possible.
• Search engines such as Google can be a valuable tool to identify appropriate sources.
• Provided basic and advanced features are used in the right combination to refine the search, satisfactory results can be achieved.
CONS:
• Despite frequent success in identifying appropriate sources, Google should be used with the proverbial pinch of salt.
• As such, the sheer number of hits is irrelevant in the majority of cases.
• Due care should be taken to identify quality sources as far as practically possible (see above) in order to steer clear of any GIGO phenomenon.
• Internet search is recommended only after unsuccessful consultation of offline or other established sources, such as glossaries/references supplied by customers or your own resources developed over time.
• Google cannot replace your own thinking and competence in both source and target language.
Value and limitations of Google as an aid to translation/interpretation
GIGO – Garbage In, Garbage Out
Beyond Google – other searchable resources
Encyclopedias and similar:
Merriam-Webster - http://www.m-w.com/
Wikipedia: http://en.wikipedia.org/wiki/Main_Page (English), http://de.wikipedia.org/wiki/Hauptseite (German), http://fr.wikipedia.org/wiki/Accueil (French)
Encyclopædia Britannica - http://www.britannica.com/ (subscription fee applies after free trial)
Answers.com – http://www.answers.com
HowStuffWorks – http://www.howstuffworks.com