ProZ.com Regional Conference – Edinburgh, UK, 10/11 November 2006

EFFICIENT WEB SEARCH STRATEGIES
OR HOW TO USE GOOGLE IN AN APPROPRIATE MANNER IN A T&I CONTEXT

Steffen Walter – http://proz.com/pro/34047
Steffen Walter 2006

Ways and means to achieve better Internet search results

WHY THIS PRESENTATION?
Available search options seem to be used only to an insufficient extent
ProZ.com evidence – many KudoZ questions could have been resolved before they were posted (by performing appropriate ProZ.com term and/or Internet searches)
DO YOUR OWN RESEARCH – IT’S A BASIC TRANSLATOR SKILL!

CONTENT OVERVIEW
CAUTION! Google is an IT-based tool that does NOT replace the human intellect
How it works
Getting started – the Google Toolbar
Web search basics and advanced features
What sources to look for from a T&I perspective
Quality sources vs web garbage – how to tell
Other searchable resources
Summary

Overview of presentation content

HOW GOOGLE WORKS
• Google presumably holds the world’s largest database of (cached) websites and other documents
• So-called “spiders” or “robots” (special software developed by Google) index and rank websites and other documents
• On a given website, “spiders” collect data by moving from hyperlink to hyperlink on that page
• Collected data is processed by Google to establish the “page ranking” (reflected in the order in which URLs appear after a specific search)
• Shortcoming: page ranking is impacted by advertising, which can distort search results

Principle of Google

Google help screen – more info
Use this Google subpage – http://www.google.com/support – to get familiar with search options and additional features, such as the Google Toolbar.

Start: the Google Toolbar for added search convenience
To download the Google Toolbar, go to the page shown above, http://toolbar.google.com. Available for Internet Explorer and Mozilla Firefox.
Note the “Page Rank” feature in the toolbar (cf. “How Google Works”).

Google start page – simple search
Simple search input screen – advanced screen explains search options in more detail

Google advanced search screen

Web search basics
• Any term combination entered in the search box will yield hits with ALL terms (there is no need to add AND in between)
• Example: translation interpreting editing proofreading generates a list of URLs/pages containing all four terms
• Choose keywords that are as specific as possible to reduce the number of hits and increase their relevance (i.e. move from more specific to less specific terms as the search proceeds)
• Common words, such as (in English) the, in, or of, are usually ignored by the search algorithm (they must be preceded by a plus sign to be taken into account, such as in translation +in +the world)
• Searches are not case-sensitive – TRansLaTion will yield the same hits as translation

Example: Try a Google Web search for the term sequence translation interpreting editing proofreading (entered exactly as shown here, without inverted commas). Pages/documents will appear that contain all four terms, such as several ProZ.com profile hits.

Advanced search features
• If required, terms can be excluded by putting a minus sign immediately in front, such as in languages translation interpreting -proofreading or auctions paintings -ebay
• Phrase search can be performed by putting the entire phrase in inverted commas, such as in “translation in the world”
• Use of the tilde sign ~ in front of terms enables fuzzy search for synonyms or related words, as in ~vehicles ~transport ~shipping, which also yields hits with terms such as shipments or transportation
• Use of the separator symbol | (i.e. the shortcut AltGr + <>), such as in “to make|take a decision” (in inverted commas), will show hits with either of the phrases “to make a decision” and “to take a decision”
• Search can be restricted to specific domains or URLs by using the site: command, as in the sequences translation site:.proz.com, IAS IFRS “accounting standards” site:.eu.int, or “code of conduct” site:.iti.org.uk

Examples:
1. Go to the Google Web search page and type in languages translation interpreting proofreading and languages translation interpreting -proofreading, or auctions paintings ebay and auctions paintings -ebay. Compare the number of hits – it will be larger in the latter cases as fewer terms are taken into account, which results in a lesser degree of specificity.
2. Type the entire phrase “translation in the world” in the search box, which gives, among other hits, a (somewhat outdated) paper on machine translation – see http://www.eamt.org/archive/summit95.html
3. Type in ~vehicles ~transport ~shipping – and you will see other, related terms appearing in the search results
4. Keying in translation site:.proz.com will yield all hits pertaining to the ProZ.com site that contain the term translation. Also try the second combination IAS IFRS “accounting standards” site:.eu.int, which will lead you straight to the relevant EU Directive at http://europa.eu.int/eurlex/pri/de/oj/dat/2003/l_261/l_26120031013de00010420.pdf (please note that this document reflects the 2003 version of the standards, which has been updated and expanded in the meantime).
The third hit appearing as a result of the search for “code of conduct” site:.iti.org.uk gets you directly to the downloadable code of conduct documents of the UK Institute of Translation and Interpreting, accessible from http://www.iti.org.uk/pages/joinITI/code.asp

More advanced search operators
At http://www.google.com/intl/en/help/operators.html (the screenshot above), you will find a complete list of advanced search operators supported by Google, such as the define: command (for example, enter the sequence define:code of conduct to get a list of [English] code of conduct definitions in various contextual settings).

Special search features
Features other than ordinary web search at http://www.google.com/help/features.html

There’s more to it than pure Web search
Google includes more options than pure web search, such as:
• the Google Directory – http://www.google.com/dirhp (see above), a search tool organized along broad main categories and more specific sub-categories
  Examples:
  The main category “References” includes links to online encyclopedias, mono- and multilingual dictionaries, libraries, and many more
  The “Translation” subcategory is ‘hidden’ behind Science -> Social Sciences -> Linguistics -> Translation
  The “World” category contains a similar search structure linking to sources in languages other than English
• the Google Scholar site – see next slide

There’s more to it than pure Web search (2)
See the screenshot of the Google Scholar page (advanced search) above, http://scholar.google.com/advanced_scholar_search?hl=en&lr, which serves as a tool restricting your search to scientific publications on the Web, such as peer-reviewed articles, papers or theses, or references thereto.
Example: Enter the medical term/subject “congenital heart disease in adults” in the Scholar page search box to get a list of URLs containing (English) articles or abstracts related to that subject, such as http://bja.oxfordjournals.org/cgi/content/short/93/1/129
Please note that search results are usually introductory paragraphs or abstracts of articles/papers. Full texts can normally only be viewed by subscribers and/or registered users.

What sources to look for from a T&I perspective
• Web-based monolingual encyclopedias or other comprehensive references, such as Merriam Webster or Wikipedia, which is available in several languages (see Google Directory above)
• Bi- or multilingual glossaries and dictionaries
• Parallel texts, such as those appearing on websites operated by multinational groups of companies (e.g. annual reports) or government institutions
• Monolingual specialized texts, such as articles or papers (see Google Scholar page above)
• Translation and interpretation portals/venues/resource collections, such as ProZ.com ☺

Broad categorization of online sources relevant to translation and interpretation
Examples:
1. Encyclopedias and similar:
   Merriam Webster – http://www.m-w.com/
   Wikipedia – http://en.wikipedia.org/wiki/Main_Page (English), http://de.wikipedia.org/wiki/Hauptseite (German), http://fr.wikipedia.org/wiki/Accueil (French)
   Encyclopædia Britannica – http://www.britannica.com/ (subscription fee applies after free trial)
   Answers.com – http://www.answers.com
   HowStuffWorks – http://www.howstuffworks.com
2. Glossaries/dictionaries to be searched for according to specific context and bookmarked and/or downloaded for further use (-> see also Glosspost – http://www.proz.com/glosspost)
3. For parallel texts see, for example, http://www.daimlerchrysler.com/ (English) vs http://www.daimlerchrysler.de (German), or http://www.bundesregierung.de/Webs/Breg/DE/Homepage/home.html (German) vs http://www.bundesregierung.de/Webs/Breg/EN/Homepage/home.html (English) vs http://www.bundesregierung.de/Webs/Breg/FR/Homepage/home.html (French)
4. For monolingual specialized texts see Google Scholar page above

And now for some more practical examples …
… straight out of KudoZ:
• http://www.proz.com/kudoz/1621556 – DE>EN filtergängig
• Assuming that “filtergängig” denotes matter or particles able to pass through the filter, I came up with the guess “filter-passing”.
• To confirm, a parallel search for “filtergängige” and “filter-passing” was carried out
• Result: parallel English/German title at http://www.vdi.de/vdi/vrp/richtliniendetails/index.php?ID=2756443

Practical examples to demonstrate search options

Some more practical examples … (2)
• http://www.proz.com/kudoz/831339 – DE>EN “Kreuzdiagnosen und Sterndiagnosen”
• As these terms seemed to be related to the ICD coding of diseases, and the asker indicated that she wanted a confirmation of UK usage, two search steps were performed:
  a) A site-specific Web search “ICD Kreuzdiagnosen site:.de”, which yielded a hit at http://www.uniessen.de/nieren&hochdruck/Pages/DRGNephrol.html that confirmed “Kreuzdiagnosen” to be superior to “Sterndiagnosen” while both belonged to the ICD system
  b) Assuming that DE “Stern” was equivalent to EN “asterisk”, a second search for “ICD ‘asterisk codes’ site:.uk” was undertaken, which confirmed the use of “asterisk codes” in the authoritative sources http://www.connectingforhealth.nhs.uk/dscn/dscn2001/072001.pdf and http://www.ic.nhs.uk/casemix/faq/sub4/hrg_grouper_8
• As a by-product, these documents also proved “dagger codes” to be the EN equivalent of “Kreuzdiagnosen”, with the “dagger” being the + sign.
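
The two site-restricted searches in steps a) and b) combine plain terms, a quoted phrase and a site: restriction. As a rough sketch only, such search strings could be assembled programmatically as shown below. The helper names build_query and search_url are made up for this illustration and are not part of any Google API, and the use of a q parameter on www.google.com/search is an assumption rather than something stated in the slides.

```python
from urllib.parse import urlencode


def build_query(terms=(), phrases=(), exclude=(), site=None):
    """Assemble a search string from the operators described in the slides.

    Hypothetical helper for illustration; not part of any Google API.
    """
    parts = list(terms)
    # Phrase search: put each phrase in inverted commas (double quotes).
    parts += ['"{}"'.format(p) for p in phrases]
    # Exclusion: a minus sign immediately in front of the term.
    parts += ["-" + t for t in exclude]
    # Domain restriction: the site: command, e.g. site:.uk
    if site:
        parts.append("site:" + site)
    return " ".join(parts)


def search_url(query):
    # Assumption: the search page accepts the query in a "q" parameter.
    return "https://www.google.com/search?" + urlencode({"q": query})


step_a = build_query(terms=["ICD", "Kreuzdiagnosen"], site=".de")
step_b = build_query(terms=["ICD"], phrases=["asterisk codes"], site=".uk")
print(step_a)  # ICD Kreuzdiagnosen site:.de
print(step_b)  # ICD "asterisk codes" site:.uk
print(search_url(step_b))
```

Keeping the operators as separate arguments makes it easy to rerun the same term against a different domain (for example .de for the source term and .uk for the assumed target term) without retyping the whole string.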
Practical examples to demonstrate search options (please note that links originally quoted in the question were replaced with current ones serving the same purpose)

Some more practical examples … (3)
• http://www.proz.com/kudoz/855179 – EN>DE “feed and discharge chutes”
• Based on prior knowledge that “feed and discharge” are often rendered as “Aufgabe und Austrag” in German in materials handling and conveying equipment while “chutes” are “Schurren” or “Rutschen”, a “site:.de” search for each of the compound nouns “Aufgabeschurre”, “Aufgaberutsche”, “Austragsrutsche” and “Austragsschurre” was performed.
• Examples of results:
  http://www.vhvanlagenbau.de/www/Deutsch/Produkte/Zubehoer/schurrenhauben_main.htm (“Aufgabeschurre”, with images)
  http://www.maschinensucher.de/bruecken/Verfahrenstechnik.html (“Austragsschurre”)

Practical examples to demonstrate search options

Some more practical examples … (4)
• http://www.proz.com/kudoz/692500 – DE>EN “Rückbildungsphase” in a cardiology context (ECG)
• To confirm the initial assumption that “repolarization phase” would be an adequate rendition, and based on the premise that parallel DE and EN Web sources exist in the free Wikipedia encyclopedia, a two-step search for a) EN ECG Wikipedia and b) DE EKG Wikipedia was undertaken in order to find the source term and the assumed target term used in the description.
• Results: DE http://de.wikipedia.org/wiki/Elektrokardiogramm contains “Erregungsrückbildung” while the parallel EN description at http://en.wikipedia.org/wiki/Electrocardiogram uses “repolarization” – cf. DE T-Welle entspricht der Erregungsrückbildung der Kammer vs. EN The T wave represents the repolarization of the ventricles.

Practical examples to demonstrate search options

More tips and tricks if all else fails
• Use existing knowledge and linguistic imagination to come up with “educated guesses” in the target language to be used as search terms – more often than not this will take you one step further (DE>EN KudoZ example: http://www.proz.com/kudoz/866781 “Werksstückausstand”, presumably synonymous with the more common “Werkstücküberstand” – I guessed that “workpiece projection/overhang” could be appropriate, and found native English example sites confirming that suspicion).
• Try a parallel bilingual search of the source term/phrase and the assumed (or partly known) target term/phrase with a restriction to either source or target language domains.
• If used responsibly, Google can be of some help in identifying correct native usage (for native EN sources, this should always involve a domain-specific phrase search for .uk, .us, .gov, .au etc. sources).

Some more tips …

Quality sources vs web garbage – how to tell
Screening the search hit list for
- Domain extensions (for example, in a scientific context, .edu or .ac.uk sites – if not already specified by the site: command – are more likely to be of help than .com or .biz sites)
- URLs of reputable companies/institutions that increase the likelihood of encountering reliable sources
- Correct spelling and grammar (site extracts/paragraphs containing many errors indicate sloppiness and inaccuracy)
Using site extracts/sentences shown in the hit list
- To identify key terms that indicate the source’s relevance for the subject matter at hand and
- To get a first impression as to whether the terminology used is in keeping with the field or text to be translated
Sources not meeting the above conditions should be disregarded

An attempt at separating the wheat from the chaff

Summary
PROS:
• The Internet has become a vast knowledge resource to be exploited to the largest extent possible.
• Search engines such as Google can be a valuable tool to identify appropriate sources.
• Provided basic and advanced features are used in the right combination to refine the search, satisfactory results can be achieved.
CONS:
• Despite frequent success in identifying appropriate sources, Google should be used with the proverbial pinch of salt.
• As such, the sheer number of hits is irrelevant in the majority of cases.
• Due care should be taken to identify quality sources as far as practically possible (see above) in order to steer clear of any GIGO phenomenon.
• Internet search is recommended only after unsuccessful consultation of offline or other established sources, such as glossaries/references delivered by customers or your own resources developed over time.
• Google cannot replace your own thinking and competence in both source and target language.

Value and limitations of Google as an aid to translation/interpretation
GIGO – Garbage In, Garbage Out

Beyond Google – other searchable resources
Encyclopedias and similar:
Merriam Webster – http://www.m-w.com/
Wikipedia – http://en.wikipedia.org/wiki/Main_Page (English), http://de.wikipedia.org/wiki/Hauptseite (German), http://fr.wikipedia.org/wiki/Accueil (French)
Encyclopædia Britannica – http://www.britannica.com/ (subscription fee applies after free trial)
Answers.com – http://www.answers.com
HowStuffWorks – http://www.howstuffworks.com
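
The first screening criterion named under “Quality sources vs web garbage” (domain extensions) lends itself to a small sketch. This is an illustration under stated assumptions, not a recommended tool: the function names, the suffix list and all sample URLs below are invented for the example, and a preferred suffix says nothing about spelling, grammar or terminology, which still have to be judged by reading the hit.

```python
from urllib.parse import urlparse

# Suffixes the slide names as more promising in a scientific context.
# The list is an assumption for this example; adjust it per assignment.
PREFERRED_SUFFIXES = (".edu", ".ac.uk")


def has_preferred_domain(url, suffixes=PREFERRED_SUFFIXES):
    """Return True if the host name of url ends with a preferred suffix."""
    host = urlparse(url).hostname or ""
    return host.endswith(tuple(suffixes))


def screen_hits(urls):
    """List hits on preferred domains first; order is otherwise unchanged."""
    # sorted() is stable, so hits within each group keep their ranking.
    return sorted(urls, key=lambda u: not has_preferred_domain(u))


# Made-up hit list for demonstration purposes only.
hits = [
    "http://www.example.biz/glossary.html",
    "http://www.example.ac.uk/coding.html",
    "http://www.example.com/page",
    "http://www.example.edu/paper.pdf",
]
for url in screen_hits(hits):
    print(url)
```

Reordering rather than discarding hits is a deliberate choice here: a .com site of a reputable company may still be a reliable source, so it stays in the list for manual review.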