What Is Needed To Get MT Right? Understanding ROI, Pitfalls and Best Practices
Kirti Vashee, Vice President, [email protected]
Copyright © 2013, Asia Online Pte Ltd

• What are the MT options available today?
• What are the most important determinants of success with MT?
• Understanding customization from an overall control and management viewpoint
• What are the ROI implications?
• What other critical factors determine success?

• Free Online MT Systems
  – Google Translate
  – Bing Translate
  – 250 to 500 million users/day
• Open Source
  – Moses / Apertium
• Instant Custom MT
  – "Upload & Pray"
• Expert MT Systems

How may I help you today?
• Real-Time Translation Model
  – Translation must be immediate
  – No editing
  – Content type examples:
    • Inter-office chat
    • Multilingual customer support
    • Immediate document translation
• Content Publishing
  – Bulk translation
    • Knowledge base
    • Support articles
  – Specialist translation
    • Marketing
    • Product documentation
• Data Mining
  – Sentiment analysis
  – Social media monitoring

Adapting an MT engine to minimize the amount of post editing required + adapting an MT engine so that it translates more in context = the highest possible MT output quality, to get work done faster and at lower cost.

Sample translations (LP = language pair):

JA-EN
  Source: なお, 以下の座標系の定義は以下の通り。
  Human Reference: Definitions pertaining to the coordinate systems are given below.
  Customized: Furthermore, the definition of the coordinate systems are as follows.
  Foundation: Furthermore, the following coordinate system as defined.

JA-EN
  Source: せん断試験の管理特性を規定し判断基準は明確か
  Human Reference: Are the control characteristics of shearing test defined to specify criteria for judgement clearly?
  Customized: Are the control characteristics of shear test defined to specify criteria for judgement clearly?
  Foundation: Shear test criterion for defining characteristics of the clear?

JA-EN
  Source: ベントチューブスポット溶接の強度は確認しているか
  Human Reference: Is the strength of spot-welds on vent tubes checked?
  Customized: Is the strength of spot-welds on vent tubes checked?
  Foundation: It is the intensity of the welding spot vent tubes?

EN-DE
  Source: An alternate host can start the meeting and act as the host.
  Human Reference: Alternative Gastgeber können das Meeting starten und als Gastgeber handeln.
  Customized: Alternative Gastgeber können das Meeting starten und als Gastgeber handeln.
  Foundation: Stellvertretendes Gastgeber beginnen können und so zu tun, als die Tagung des Aufnahmelandes.

EN-DE
  Source: You can publish a recorded training session that was created with WebEx Recorder.
  Human Reference: Sie können eine aufgezeichnete Schulungssitzung veröffentlichen, die mit dem WebEx-Rekorder aufgezeichnet wurde.
  Customized: Sie können eine aufgezeichnete schulungssitzung veröffentlichen, die mit dem WebEx-Rekorder erstellt wurde.
  Foundation: Sie können eine namentliche Fortbildungsveranstaltung veröffentlichen, mit WebEx Fahrtenschreiber.

EN-DE
  Source: Once customer approves your request, the customer can select an application to share.
  Human Reference: Wenn der Kunde Ihre Anforderung genehmigt, kann er eine Applikation zum Teilen auswählen.
  Customized: Wenn der Kunde Ihre Anforderung genehmigt, kann der Kunde eine Applikation zum Teilen auswählen.
  Foundation: Wenn Verbraucher stimmt ihrem Antrag, der Kunde auswählen können, einen Antrag zu teilen.

EN-ES
  Source: Remove the steel ball from the main oil gallery before cleaning.
  Human Reference: Retire la bola de acero de la canalización de aceite principal antes de limpiar.
  Customized: Retire la bola de acero de la canalización de aceite principal antes de la limpieza.
  Foundation: Eliminar la bola de acero de la limpieza galería antes de petróleo.

EN-ES
  Source: Continuously with the ignition on and the propulsion system active.
  Human Reference: Continuamente con el encendido conectado y el sistema de propulsión activo.
  Customized: Continuamente con el encendido en posición on y el sistema de propulsión activo.
  Foundation: Continuamente con la ignición en activo y el sistema de propulsión.

EN-ES
  Source: The average response time goal is assigned a specific time goal.
  Human Reference: El objetivo del tiempo de respuesta medio se asigna a un objetivo de tiempo específico.
  Customized: El objetivo del tiempo de respuesta medio se asigna a un objetivo de tiempo específico.
  Foundation: La meta media del tiempo de respuesta se asigna una meta del momento específico.

Customization teaches an engine how to translate using YOUR style and vocabulary.

Just Add Water: Upload Data
If it were really this easy, don't you think custom MT success stories would be everywhere? Time, effort, skill and investment are mandatory.

• One-button custom MT
• Simply upload your data and magic happens: a custom MT engine is created in hours or minutes.
• A new phase of MT over-promising?

Flaws in the Instant MT Approach
• Instant MT cannot read your mind.
• Instant MT cannot determine which writing style, target audience, formats, vocabulary or capitalization you want.
• Instant MT cannot determine what is missing and whether your data is suitable for your goal.
• You don't know what the right data is.

• Ease of use hides complexity and a lack of key functionality.
• There is a perception of control, but almost all controls are missing.
  – The ability to upload data does not give you control.
• An inherent issue with DIY MT, whether Moses-based or from a commercial service: it assumes that the user knows how to do it themselves.
The DIY model enables users to create low-quality MT engines very easily.
Detailed DIY and Language Studio™ comparison: http://www.asiaonline.net/EN/Resources/Articles

MT System Quality Characteristics: Productivity Implications
• Free Online Engines: Can be useful in some languages, but often deliver lower productivity than using TM alone, and are impossible to adapt to specific needs. 1,000 to 3,000 words/day per human editor. Average segment quality ≈ 25% to 40% TM fuzzy match.
• Human TEP Process: Typically produces 2,500 words/day per translator.
• Low Quality (Moses): Less than 5% of these systems can outperform free online MT, and best-case productivity may be in the 3,000 words/day range. Average segment quality = 50% to 60% TM fuzzy match.
• Average Expert System: These systems can provide 5,000 to 7,000 words/day per editor. Average segment quality = 60% to 75% TM fuzzy match.
• Superior Expert: These systems can provide 9,000 to 12,000 words/day per editor. Average segment quality = 70% to 85% TM fuzzy match.
• Exceptional MT: These systems can provide 12,000+ words/day per editor. Average segment quality = 80% to 90% TM fuzzy match.

MT learns from post-editing feedback, and translation quality constantly improves. The cost of post editing progressively falls as MT quality increases after each engine learning iteration.
[Figure: Post Editing Cost. Cost per word (scale 1 to 6) over engine learning iterations 1 to 6, comparing post editing (human translation) with MT post editing.]

Post Editing Effort Reduces Over Time
The post-editing and cleanup effort gets easier as the MT engine improves. Initial efforts should focus on error analysis and correction of a representative sample data set. Each successive project should get easier and more efficient.
[Figure: post editing effort and raw MT quality over engine learning iterations 1 to 6, relative to target quality and publication quality levels.]
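The throughput figures above translate directly into project cost. The Python sketch below reproduces the arithmetic behind the 500,000-word cost comparison that follows. It assumes an 8-hour working day (which is how a $45 hourly rate becomes the $360 daily cost used there); the `Scenario` class and `project_costs` function are illustrative names, not part of any product.

```python
from dataclasses import dataclass

HOURS_PER_DAY = 8  # assumption: an 8-hour day turns $45/hour into $360/day


@dataclass
class Scenario:
    name: str
    words_per_day: int   # words translated or post-edited per person per day
    hourly_rate: float   # USD per hour
    word_rate: float     # USD per word


def project_costs(s: Scenario, project_words: int) -> dict:
    """Man-days and total cost of a project under hourly and per-word pricing."""
    man_days = project_words / s.words_per_day
    daily_cost = s.hourly_rate * HOURS_PER_DAY
    return {
        "man_days": round(man_days, 2),
        "hourly_cost": round(man_days * daily_cost, 2),
        "word_rate_cost": round(project_words * s.word_rate, 2),
        "daily_cost_at_word_rate": round(s.words_per_day * s.word_rate, 2),
    }


# Figures taken from the cost comparison in this deck.
SCENARIOS = [
    Scenario("Standard TEP", 2_500, 45.0, 0.15),
    Scenario("Excellent Moses", 3_000, 45.0, 0.12),
    Scenario("Average Expert", 6_000, 45.0, 0.10),
    Scenario("Excellent Expert", 9_000, 45.0, 0.075),
]

if __name__ == "__main__":
    for s in SCENARIOS:
        print(s.name, project_costs(s, 500_000))
```

Keeping throughput and rates as inputs makes it easy to rerun the comparison with an organization's own measured words/day instead of these benchmark figures.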
Cost comparison for a 500,000-word project:

                                Standard TEP   Excellent Moses   Average Expert   Excellent Expert
Translated words/day            2,500          3,000             6,000            9,000
Hourly rate                     $45            $45               $45              $45
Word rate                       15 cents       12 cents          10 cents         7.5 cents
Daily cost at hourly rate       $360           $360              $360             $360
Daily cost at word rate         $375           $360              $600             $675
Project cost at hourly rate     $72,000        $60,000           $30,000          $20,000
Project cost at word rate       $75,000        $60,000           $50,000          $37,500
Man-days                        200.00         166.67            83.33            55.56

On MT Customization
There are many factors that affect the machine translation output quality of a custom engine. These include:
– The complexity of the language pair and of the subject domain
– The quality of the source content
– The amount and quality of translation memories available for training
– The amount and quality of bilingual dictionaries and glossaries available for training
– The domain suitability of the translation memories
– The amount of effort put into the initial customization for normalization and rule refinement
– The amount of effort put in by the customer to work with Language Studio™ linguists to identify and resolve issues
– The amount of post-edited data fed back into the engine for incremental quality improvement

[Figure: customization workflow. Stages: Data Collections (Original Translation Sources, Language Pair Foundation Data), Data Preparation, Data Cleaning, Training, Diagnostics and Fine Tuning, Quality Assurance, Translate.]

[Figure: Custom Engine = Asia Online Foundation Data (Language Pair Foundation + Domain Foundation) + Sub-Domain Specific Data (Client Data + Manufactured Data).]

A high-quality engine is as much about what data is not in the engine as about what data is in it.
• High-quality data from general domains that is used as a base to build upon
• Data has had extensive cleaning and validation
• Data is from trusted sources
• Data is balanced and normalized
• Data that might negatively influence an engine is removed

Basic Client Data
• Bilingual Translation Memories: In-domain historical translations in source and target language.
• Target Language Monolingual Data: Monolingual target-language text and URLs of in-domain websites.
Extra Data (If Available)
• Bilingual Dictionaries and Glossaries: In-domain and client-specific glossaries and dictionaries.
• Source Language Non-Translatable Terms: Source-language terms, such as product names and place names, that should not be translated.
• Source Material To Be Translated: Source material can be analyzed and processed to further improve quality.
• Style Guides: Rules can be added to match client style guide requirements.

[Figure: Human Feedback. Raw MT output of the initial system broken down into Correct, Mistranslation, Syntax/Grammar, Terminology, Spelling and Punctuation, before and after targeted corrections of bad learning and of spelling and terminology.]
Human feedback can raise the raw output to previously unseen quality levels.

Recap: Custom Engine = Asia Online Foundation Data + Sub-Domain Specific Data (Client Data + Manufactured Data).
Tools and processes include:
• Gap analysis
• Unknown word analysis
• Inflected form creation
• Grammatical structure creation
• Syntactic clause creation
• Terminology normalization
• Automated rule generation
• Over 150 tools and processes

Example of a targeted correction:
Raw MT: The quick brown fox over jumps the lazy dog
After adding corrective data generated by Language Studio™ Pro: The quick brown fox jumps over the lazy dog
Examples of the corrective data:
• Buddha jumps over the wall
• Siemens Wind Power CEO jumps over to Repower
• Judge jumps over bench in courtroom melee
• Military surveillance bot jumps over 25 foot walls
• Robbie Maddison jumps over Tower Bridge on motorbike
• Man jumps over Grand Canyon
• Cow jumps over Moon
• With IE9 in sight, Firefox jumps over 50% market share mark
• Long jumper Brian Thomas jumps over a car to raise money
• Rally car jumps over a crazy fan!
• Kobe jumps over a speeding Aston Martin
• A deer jumps over a motorcycle
• A woman jogging in a California state park jumps over a 100-foot cliff to get away from attacker
• An Afghan Army soldier jumps over a irrigation canal while conducting a foot patrol

One more EN-ES sample from the same comparison:
EN-ES
  Source: Install the shift lever with a new "torx" screw.
  Human Reference: Instale la palanca de cambios con un tornillo "torx" nuevo.
  Customized: Instale la palanca de cambios con una nueva "torx" tornillo.
  Foundation: Instalar el una Nueva palanca de cambios con tornillo "torx".

Customization teaches an engine how to translate using YOUR style and vocabulary.

1. When customizing an MT engine, the user must be able to define and refine the writing style, preferred terminology, target audience and purpose of the custom engine.
2. All the data used within the custom MT engine must be able to be refined to match the needs and purpose of each individual engine.
3. High-quality in-domain data should be the primary data used to build statistical models.
4. Cleaning data requires human cognition and understanding of the data; it is not automatic.
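Principle 4 says cleaning cannot be fully automated, but simple automated checks can decide what a human should look at first. The Python sketch below flags translation-memory segments that are empty, duplicated, identical in source and target, or have a suspicious length ratio, and leaves every decision to a reviewer. The thresholds and the `flag_segments` name are illustrative assumptions, not a description of Language Studio™ behavior.

```python
from collections import Counter
from typing import List, Tuple

# Illustrative thresholds (assumptions). Character-length ratios differ a lot
# between language pairs (e.g. JA-EN vs EN-ES), so tune them per pair.
MIN_LEN_RATIO = 0.5
MAX_LEN_RATIO = 2.0

Flag = Tuple[int, str, str, List[str]]


def flag_segments(pairs: List[Tuple[str, str]]) -> List[Flag]:
    """Return (index, source, target, reasons) for TM segments needing review.

    Nothing is deleted: the output is a review queue for a human linguist.
    """
    cleaned = [(s.strip(), t.strip()) for s, t in pairs]
    counts = Counter(cleaned)
    flagged: List[Flag] = []
    for i, (src, tgt) in enumerate(cleaned):
        reasons: List[str] = []
        if not src or not tgt:
            reasons.append("empty side")
        else:
            ratio = len(tgt) / len(src)
            if not MIN_LEN_RATIO <= ratio <= MAX_LEN_RATIO:
                reasons.append(f"length ratio {ratio:.2f}")
            if src == tgt:
                reasons.append("target identical to source")
        if counts[(src, tgt)] > 1:
            reasons.append("duplicate pair")
        if reasons:
            flagged.append((i, src, tgt, reasons))
    return flagged


SAMPLE_TM = [
    ("Remove the steel ball before cleaning.", "Retire la bola de acero antes de limpiar."),
    ("Remove the steel ball before cleaning.", "Retire la bola de acero antes de limpiar."),
    ("WebEx Recorder", "WebEx Recorder"),
    ("Install the shift lever with a new torx screw.", ""),
]

if __name__ == "__main__":
    for index, src, tgt, reasons in flag_segments(SAMPLE_TM):
        print(index, reasons)
```

The third sample shows why a flag is a prompt for review rather than a verdict: a product name left untranslated may be exactly right, which is the case the Non-Translatable Terms data source exists for.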
When customizing an MT engine, the user must be able to define and refine the writing style, preferred terminology, target audience and purpose of the custom engine.
• Clean Data SMT requires a more granular definition, mapped to the actual use of the engine, than just a generic top-level domain.
• Granular sub-domains provide considerably more accurate translations and higher post-editing productivity.

[Figure: customization hierarchy for EN-ES]
Level 1, Generic: EN-ES (Google/Bing or Dirty Data SMT)
Level 2, Domain: Automotive (Dirty Data SMT + In Domain Data)
Level 3, Client: Honda, Toyota
Level 4, Product: Honda Cars, Honda Motorbikes, Toyota Cars
Level 5, Target Audience / Purpose: User Manuals, Engineering, Service Manuals (Honda Cars and Motorbikes); Marketing, Service Reports (Toyota Cars)

• Defining a custom engine at a high level (top-level domain) such as "Automotive" or "Information Technology" is not sufficient and will result in inconsistent terminology and writing style.
• The minimum starting level for a Clean Data model is level 3, with terminology defined and normalized for consistency.
• To be considered as using the Clean Data SMT model, a plan to reach at least level 4 must be defined, in which data is further refined from level 3 to level 4 or 5.
• True Clean Data SMT models are defined to at least level 4, and usually to level 5.
• SMT systems that only define level 2 cannot be considered Clean Data SMT, as a top-level domain is too generic to deliver accurate and consistent context and terminology.

All the data used within the custom MT engine must be able to be refined to match the needs and purpose of each individual engine.
• You should always have control of your own data.
• What about the other data used within the engine?
• Many MT vendors offer baseline data:
  – Is your data included in the vendor's baseline?
  – Can you remove unwanted data from the vendor's baseline?
  – Can you normalize the vendor's baseline?
  – Can you verify the quality of the vendor's baseline?
• Language Studio™ enables full control and transparency of all data in a custom engine.

High-quality in-domain data should be the primary data used to build statistical models.

Dirty Data SMT Model
• Data
  – Gathered from as many sources as possible.
  – Domain of knowledge does not matter.
  – Data quality is not important.
  – Data quantity is important.
• Theory
  – Good data will be more statistically relevant.

Clean Data SMT Model
• Data
  – Gathered from a small number of trusted, quality sources.
  – Domain of knowledge must match the target.
  – Data quality is very important.
  – Data quantity is less important.
• Theory
  – Bad or undesirable patterns cannot be learned if they don't exist in the data.

[Figure: the same EN-ES customization hierarchy (Automotive; Honda and Toyota; Cars and Motorbikes; document types), marking the Google/Bing quality level and the typical competitor quality level.]
Productivity gain by customization level:
• Generic: unknown
• Domain: < 20-40%
• Client: 50%+
• Product: 90%+
• Target Audience / Purpose: 150-300%+

• Generic MT from Google, Bing, etc. offers unknown productivity gains, and sometimes productivity losses, due to lack of control.
• Instant MT offers < 20-40% productivity gains because of its top-level-domain-only, "dirty data SMT" customization model.
• Language Studio™:
  – Targets 150-300%+ productivity gains with a granular sub-domain "clean data SMT" approach.
  – Provides complete control of writing style and terminology, mapped to the target audience, reducing editing effort.

The Language Studio™ cycle:
1. Customize: Create a new custom engine using foundation data and your own language assets.
2. Measure: Measure the quality of the engine for rating and future improvement comparisons.
3. Improve: Provide corrective feedback, removing the potential for translation errors.
4. Manage: Manage translation projects while generating corrective data for quality improvement.

Additional Training Data
Each custom engine is a living engine and constantly improves with use. There are many sources of data that can improve an engine's translation quality:
• Post-Edited Machine Translations: Post editing of raw MT rapidly improves translation quality.
• Data Manufacturing: Language Studio™ analyzes edits and other data and manufactures new data to improve quality.
• Bilingual Translation Memories: Additional in-domain historical translations in source and target language that were not included in earlier training.
• Target Language Monolingual Data: Additional monolingual target-language text and URLs of in-domain websites that were not included in earlier training.
• Bilingual Dictionaries and Glossaries: Additional in-domain and client-specific glossaries and dictionaries that were not included in earlier training.
• Source Language Non-Translatable Terms: Additional source-language terms that should not be translated and were not included in earlier training.

Runtime Improvements
Fine tuning to specific formats and style guide requirements can be performed at runtime without retraining the engine. Runtime customizations:
• Before machine translation, when the source text is processed and modified:
  – Pre-Translation JavaScript
  – Pre-Translation Corrections
  – Non-Translatable Terms
  – Runtime Glossary
• After machine translation:
  – Post-Translation Adjustments
  – Post-Translation JavaScript
These features enable:
• Normalization of terms
• Control of preferred terminology
• Mapping of complex rules as specified in the style guide
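The runtime customizations listed above can be pictured as a pipeline wrapped around the engine call. The Python sketch below is a hypothetical illustration, not the Language Studio™ API: `toy_engine` stands in for any MT engine, the rule lists are invented EN-ES examples, the two JavaScript hooks are omitted, and protecting non-translatable and glossary terms with placeholder tokens is one common technique assumed here for illustration.

```python
import re
from typing import Callable, Dict, List

# Hypothetical rule lists for an EN->ES job (illustrative values only).
PRE_TRANSLATION_CORRECTIONS: Dict[str, str] = {"e-mail": "email"}   # source-side fixes
NON_TRANSLATABLE_TERMS: List[str] = ["WebEx"]                       # never translated
RUNTIME_GLOSSARY: Dict[str, str] = {"host": "anfitrión"}            # forced target terms
POST_TRANSLATION_ADJUSTMENTS: Dict[str, str] = {"correo-e": "correo electrónico"}


def replace_terms(text: str, mapping: Dict[str, str]) -> str:
    """Replace whole-word occurrences of each key with its value."""
    for old, new in mapping.items():
        # A function replacement avoids backslash processing in `new`.
        text = re.sub(rf"\b{re.escape(old)}\b", lambda _m, new=new: new, text)
    return text


def run_pipeline(source: str, translate: Callable[[str], str]) -> str:
    """Apply PTC, protect NTT and glossary terms, translate, restore, then PTA."""
    # 1. Pre-Translation Corrections: make the source more suitable for MT.
    text = replace_terms(source, PRE_TRANSLATION_CORRECTIONS)

    # 2-3. Swap non-translatable and glossary terms for placeholder tokens
    #      that the engine is expected to pass through untouched.
    restore: Dict[str, str] = {}
    for i, term in enumerate(NON_TRANSLATABLE_TERMS):
        token = f"__NTT{i}__"
        text = re.sub(rf"\b{re.escape(term)}\b", token, text)
        restore[token] = term            # put back verbatim
    for i, (src_term, tgt_term) in enumerate(RUNTIME_GLOSSARY.items()):
        token = f"__GLO{i}__"
        text = re.sub(rf"\b{re.escape(src_term)}\b", token, text)
        restore[token] = tgt_term        # put back as the required translation

    # 4. Machine translation by whatever engine the caller supplies.
    target = translate(text)

    # 5. Restore protected and glossary terms.
    for token, value in restore.items():
        target = target.replace(token, value)

    # 6. Post-Translation Adjustments: normalize target-language terms.
    return replace_terms(target, POST_TRANSLATION_ADJUSTMENTS)


def toy_engine(text: str) -> str:
    """Hypothetical stand-in for an MT engine: word-by-word EN->ES lookup.

    Unknown tokens, including the placeholders, pass through unchanged.
    """
    words = {"The": "El", "the": "el", "can": "puede", "send": "enviar",
             "email": "correo-e", "from": "desde"}
    return re.sub(r"\w+", lambda m: words.get(m.group(0), m.group(0)), text)


if __name__ == "__main__":
    print(run_pipeline("The host can send the e-mail from WebEx.", toy_engine))
```

Because the rule lists are plain data, a "default" set and a "job-specific" set (the two forms described in this deck) could be kept separately and merged per job before calling `run_pipeline`.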
Pre-Translation JavaScript (JS): Complex pre-processing can be customized via JavaScript.
Pre-Translation Corrections (PTC): A list of terms that adjust the source text, fixing common issues and making it more suitable for translation.
Non-Translatable Terms (NTT): A list of monolingual terms used to ensure that key terms are not translated.
Runtime Glossary (GLO): A list of bilingual terms used to ensure that terminology is translated a specific way.

After Machine Translation: the target text is processed and modified.
Post-Translation Adjustments (PTA): A list of terms in the target language that modify the translated output. This is very useful for normalizing target terms.
Post-Translation JavaScript (JS): Complex post-processing can be customized via JavaScript.

Runtime customizations can be applied in two forms:
• Default: applied to all jobs.
• Job-specific: a different set of customizations can be applied for different clients.

Productivity Is the Best Quality Metric
Metrics That Really Count
• Productivity: words per day per human resource
• Margin: 2 to 3 times the profit margin is commonplace
• Consistency: writing style and terminology; MT + human delivers higher quality than a human-only approach
• New Business: new business not accessible with a human-only approach
• Competitive production advantages

Examples of other "useful" quality indicators
Automated metrics (good indicators, but not absolute):
• BLEU (Bilingual Evaluation Understudy)
• NIST
• F-Measure (F1 Score or F-Score)
• METEOR (Metric for Evaluation of Translation with Explicit Ordering)
Manual quality metrics (most were designed for human translation rather than MT):
• Edit Distance (does not take into account the complexity of an edit)
• SAE J2450 (industry-specific)

Raw MT often has a greater number of errors than first-pass human translation. However:
1. MT errors are easy to see and easy to fix (e.g., simple grammar errors).
2. MT provides more accurate and consistent terminology than human translators, especially when more than one person works on a project.
3. Human errors may be fewer, but they are harder to see and harder to fix.

Counting only the number of errors has no value as a metric, because the complexity of each error is not taken into account. MT output with more errors is often faster to edit and fix than a first-pass human translation with "fewer" errors.

Return On Investment (ROI)
• A performance measure used to evaluate the efficiency of an investment or to compare the efficiency of several different investments.
• To calculate ROI, the net gain of an investment (its return minus its cost) is divided by the cost of the investment; the result is expressed as a percentage or a ratio.

The Return On Investment formula:
$$\mathrm{ROI} = \frac{\text{Gain from Investment} - \text{Cost of Investment}}{\text{Cost of Investment}}$$

There will always be someone who will do it cheaper. But at what cost?

Total Cost of Ownership (TCO)
• Understanding TCO is essential: it is a means for businesses to assess both the direct and indirect costs and benefits related to any purchase.
• The intention is to arrive at a final figure that reflects the effective cost of the purchase, all things considered.
• There is no specific formula for TCO; it is unique to each organization and requirement.
• TCO should include ALL costs over the entire lifetime of the investment, not just the initial costs.
  – Initial costs, recurring costs, replacement costs, end-of-life costs…

Resource areas: Human Resources, Technical Resources, Operational Resources.

Productivity Gains: The increase in productivity for all human tasks within the translation production process and workflow.
Profit Margins: The increase in profit margin for each project, and progressive improvements in profit margin as custom machine translation engines mature.
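The ROI formula and the TCO idea above combine naturally: the cost side of ROI should be the lifetime TCO, not just the initial outlay. A minimal Python sketch follows; the class and function names are illustrative, and the TCO figures in the example are hypothetical. Only the $72,000 and $20,000 project costs come from the 500,000-word cost comparison earlier in this deck, and counting that saving on two projects is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TotalCostOfOwnership:
    """Lifetime costs of an MT investment, all in the same currency."""
    initial: float              # e.g. engine customization and setup
    recurring_per_year: float   # e.g. hosting, retraining, support
    years: int
    replacement: float = 0.0
    end_of_life: float = 0.0

    def total(self) -> float:
        return (self.initial + self.recurring_per_year * self.years
                + self.replacement + self.end_of_life)


def roi(gain: float, cost: float) -> float:
    """ROI = (gain from investment - cost of investment) / cost of investment."""
    if cost <= 0:
        raise ValueError("cost of investment must be positive")
    return (gain - cost) / cost


if __name__ == "__main__":
    # Hypothetical TCO figures; the gain reuses the hourly-cost difference
    # between standard TEP ($72,000) and an excellent expert system ($20,000),
    # assumed here to be realized on two projects.
    tco = TotalCostOfOwnership(initial=20_000, recurring_per_year=5_000, years=2)
    gain = 2 * (72_000 - 20_000)
    print(f"TCO = {tco.total():,.0f}, ROI = {roi(gain, tco.total()):.1%}")
```

Keeping the TCO components explicit makes it harder to quote an attractive ROI that silently ignores recurring or end-of-life costs.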
There are also a number of other, less obvious ROI benefits, such as greater consistency in translations, which increases customer satisfaction. Other business benefits relate more to sales and ongoing business flow. These include:

New Business:
• New projects that would not be possible without machine translation, such as new deals where machine translation was a component.
• New projects that are machine-translation-only.
• New projects where time was a critical factor, which would not have been possible with a human-only approach.

Competitive Projects: Time, quality and price are all factors that affect the competitiveness of a project bid:
• Projects where you could offer a more competitive bid than your competitors due to the use of machine translation.
• Projects that would have been lost to a competitor without advantages such as the speed, terminology accuracy and writing style consistency that good machine translation offers.
• Projects where greater flexibility on pricing during negotiations, made possible by higher profit margins, helped to win the project.
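Because ROI depends so heavily on how quickly an up-front investment is recovered, a simple payback calculation is a useful companion to the ROI formula: divide the up-front cost by the saving per word. The Python sketch below uses the per-word rates from the 500,000-word cost comparison (15 cents for standard TEP versus 7.5 cents for an excellent expert system) together with a hypothetical $20,000 customization cost; both function names are illustrative, and real figures should be substituted.

```python
import math


def break_even_words(upfront_cost: float, human_rate: float, mt_rate: float) -> int:
    """Words to process before per-word savings repay an up-front cost.

    human_rate and mt_rate are costs per word (e.g. 0.15 for 15 cents).
    """
    saving_per_word = human_rate - mt_rate
    if saving_per_word <= 0:
        raise ValueError("the MT workflow must cost less per word than the human one")
    return math.ceil(upfront_cost / saving_per_word)


def projects_to_break_even(upfront_cost: float, human_rate: float,
                           mt_rate: float, words_per_project: int) -> int:
    """Whole projects of a given size needed to reach the break-even point."""
    words = break_even_words(upfront_cost, human_rate, mt_rate)
    return math.ceil(words / words_per_project)


if __name__ == "__main__":
    # Hypothetical $20,000 customization cost; word rates from the cost table.
    print(break_even_words(20_000, 0.15, 0.075),
          projects_to_break_even(20_000, 0.15, 0.075, 500_000))
```

With these placeholder figures the break-even point is about 266,667 words, so a single 500,000-word project would recover the cost; smaller projects, or a smaller per-word saving, push the payback point further out.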
• ROI is directly linked to the time needed to improve quality
• Faster improvement delivers faster ROI
• Many customers recover their costs on their first project
• Projects progressively become lower cost

IOLAR: From DIY Moses to Language Studio™
• German to Slovenian
• Technical engineering
• Project of ~1 million words
• Encouraged by TAUS and other DIY MT advocates, IOLAR tried building its own DIY MT engine:
  – Hired a computational linguist
  – Spent 6 months trying to build a quality system
  – Result:
    • Inconsistent and unusable results
    • Google was better quality (although also unusable)
    • Considerable time, effort and cost spent
• Built an expert customized engine with Asia Online:
  – Worked with Language Studio™ linguists, who created a custom engine plan to deliver high-quality output and address issues
  – Result:
    • Within a very short period, had an engine delivering high-quality output
    • Some output was as good as what their human translators were producing
    • Cost and time were greatly reduced compared to a human-only or DIY approach
Case study: http://www.asiaonline.net/EN/Resources/CaseStudies/IOLAR1.aspx

Issue: Because Slovenian is a heavily inflected language, one very common issue was that the correct term was translated, but in the incorrect inflected form. In many cases, the correct inflected form was not in the translation memories provided by IOLAR.
Solution: Language Studio™ Advanced Data Manufacturing tools were used to manufacture appropriate inflected forms in the correct context. This data ensured that the correct inflected form was available in the training data, reducing the number of incorrect inflections in the output.
• IOLAR had a working engine in just a few weeks
• The quality was immediately usable
• Terminology was very consistent
• Writing style was very consistent
• Using blind test sets on the initial engine:
  – 32 BLEU points better than Google Translate
  – 34 BLEU points better than Microsoft Translator
• On receipt of the first round of post edits, quality improved by an additional 4 BLEU points

“Since our initial internal efforts did not progress with the desired speed we turned to Asia Online to deal with the growing urgency being communicated by our clients. ... From a business perspective it was clear that outsourcing to an expert was a better strategy than a DIY struggle, and I would say that our investment in Asia Online’s Language Studio™ technology was one of the best technology investments that we have made. ... Some of the very technical segments were the same quality as human translation.”
– Simon Bratina, Executive Technical Director, IOLAR

ALT
• English to ES, RU, JA, ID and FR-CA
• Food preparation
• Project of 1+ million words
• Very limited resources available for training MT engines
  – ALT collaborated with Asia Online to develop a terminology-driven data manufacturing strategy
  – Built up critical data resources very strategically to ensure success and demonstrate productivity
• Built several customized engines that delivered on productivity, even in Japanese
  – Worked with Language Studio™ linguists, who created a custom engine plan to deliver high-quality output and address issues
  – Result:
    • Within a very short period, had an engine delivering high-quality output
    • Improved relationship with the customer and growing business
    • Developed a collaboration model for including post-editors in the compensation determination process
Case study: http://www.asiaonline.net/EN/Resources/CaseStudies/AdvancedLanguageTranslation1.aspx

“There were far fewer errors produced by the Language Studio™ custom MT engine than the competitor's legacy MT engine. Notably there were fewer wrong meanings, structural errors and wrong terms in the Language Studio™ custom MT engine, that were "typical SMT problems" in the competitor's legacy MT engine.”

“The final translation quality after post-editing was better with the new Language Studio™ custom MT engine than the competitor's legacy MT engine and also better than a human only translation approach. Terminology was more consistent with a combined Language Studio™ custom MT engine plus human post editing approach.”

Asia Online v. Competing MT System
• Total raw J2450 errors: 2x fewer
• Raw J2450 score: 2x better
• Total post-edited (PE) J2450 errors: 5.3x fewer
• PE J2450 score: 4.8x better
• PE rate: 32% faster

“We found that 52% of the raw original output from Asia Online had no errors at all – which is great for an initial engine.”
– Kevin Nelson, Managing Director, Omnilingua Worldwide

How do you pay post-editors fairly if each engine is different?
Tools Needed:
• Effective quality metrics
  – Automated
  – Human
• Confidence scores
  – Scores on a 0-100 scale
  – Can be mapped to fuzzy TM match equivalents
• Post-edit quality analysis
  – After editing is complete, or even while editing is in progress, effort can be easily measured

• Manage expectations of key players
  – PMs, editors, translators
  – Clients
• Provide many editing examples
• Get MT output to acceptable levels
  – Fix dumb errors early and quickly to minimize repetition
  – Cycle quickly through initial MT development cycles
  – Take feedback seriously and incorporate it quickly
  – Retrain the engine frequently so edits get easier
• Establish fair and reasonable compensation
  – Based on benchmarks and real throughput
  – Err on the side of overcompensation rather than undercompensation
• Community management
  – Train editors
  – Communicate and collaborate
  – Quality management

[Figure: human vs. machine translation by content volume. Human translation addresses existing markets ($36B); machine translation opens new markets. Content categories: Corporate, Products, User Interface, User Documentation, Enterprise Information, Communications, User Generated Content.]
Example volumes (words):
• Corporate Brochures: 2,000
• Product Brochures: 10,000
• Software Products: 50,000
• Manuals / Online Help: 200,000
• HR / Training / Reports: 500,000
• Support / Knowledge Base
• Email / IM: 10,000,000
• Call Center / Help Desk: 20,000,000+
• Blogs / Reviews: 50,000,000+

www.kv-emptypages.blogspot.com
Thank You
Kirti Vashee – [email protected]
Follow me on Twitter: @kvashee
Join the Automated Language Translation group on LinkedIn