Reading Between The Lines With Context

Achieving Effective Hyper-Localization Through Context Awareness Models

Published in Advice / Tips & Tricks

By Tim YoungHoon Jung, Founder and CEO of XL8 

The tone of an email can easily be misread or misinterpreted. The same is true for translated movie or television content. In fact, it’s one of the biggest challenges affecting the viewer experience across the globe. Getting content out to the market faster means more consumers can enjoy the titles they want, and content owners can efficiently use and monetize their extensive libraries by giving them an extended life in new markets.

Variety is important to consumers; however, we must always remember that positive viewing experiences hinge on accuracy, tone, and intent: in a word, context.

Hyper-localization, which considers the various dialects and local vernaculars of a country’s regions, creates additional complexities when translating content. Therefore, localization strategies must also address hyper-localization and its impact on the overall message. Creating the right balance takes time, and unfortunately, translators are often required to turn around localized content under significantly compressed delivery timeframes. This is where context-aware Machine Translation (MT) based technologies can speed the localization process and deliver higher accuracy.

Translating text “word for word” is now only the baseline, the minimum expected of AI-based machine translation. As audiences become more demanding, the stakes are raised even higher for localization providers to go above and beyond, delivering highly accurate meanings according to context while taking into account cultural, historical, and regional variations.

This article builds upon the previous discussions below and provides a deeper exploration of context-aware localization:

  • The first blog outlines the vision of XL8, the growing trend of combining translation with “hyper-localization” capabilities, and the need for and benefits of context awareness to achieve accurate content translation. 
  • The second blog provides background on XL8’s use of machine translation technology and its Context Awareness models for providing more accurate translations of colloquial phrases. Ultimately, XL8’s MT engines enable providers to fully understand the circumstances behind the original text and accurately translate dialog by actually “localizing” content instead of simply translating it “word for word.”

Most MT models translate sentences one by one, losing critical context that is outside of the primary sentence. Instead, context-aware translation models, like the ones employed by XL8, use information “surrounding” the source sentence. 

Context awareness also looks at sentences as a whole to determine if words should be feminine or masculine. Consider these two sentences: “I like this flower. Put it in the bag.” 

The pronoun “it” in the second sentence refers to the flower. In English, “it” doesn’t vary based on the gender of the noun it replaces. In a language like French, however, pronouns change according to gender.

Our CA models convert the English sentences to: J’aime cette fleur. Mets-la dans le sac.

Without that context, a model translating the second sentence on its own could render “it” as the masculine “le” (“Mets-le dans le sac”). Because “fleur” is feminine, the correctly translated pronoun for the flower is the feminine “la.”
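As a rough illustration of what using information “surrounding” the source sentence can mean in practice, the Python sketch below pairs each sentence with the sentence or sentences that precede it, so that a translation model could see “I like this flower.” while translating “Put it in the bag.” The names ContextualInput and with_preceding_context are hypothetical and chosen only for this example. This is not XL8’s implementation, and it stops short of calling any translation engine.

```python
from typing import List, NamedTuple


class ContextualInput(NamedTuple):
    """One translation unit plus the context handed to the model (hypothetical type)."""
    context: List[str]  # preceding sentences supplied as extra information
    source: str         # the sentence that is actually being translated


def with_preceding_context(sentences: List[str], window: int = 1) -> List[ContextualInput]:
    """Pair each sentence with up to `window` sentences that come before it."""
    if window < 0:
        raise ValueError("window must be non-negative")
    return [
        ContextualInput(context=sentences[max(0, i - window):i], source=sentence)
        for i, sentence in enumerate(sentences)
    ]


dialog = ["I like this flower.", "Put it in the bag."]
inputs = with_preceding_context(dialog, window=1)

for item in inputs:
    print(f"context={item.context!r} -> translate: {item.source!r}")
```

With a window of 0, every sentence is presented on its own, which corresponds to the sentence-by-sentence behavior described above. A larger window gives the model more of the conversation to draw on when choosing between forms such as “le” and “la.”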

Considering the context of a conversation, in other words, “reading between the lines,” was until very recently considered a solely human capability. Now, Context Awareness models are rapidly gaining that same ability. These CA models provide stunning results in terms of accuracy, often exceeding the expectations of translators while providing extraordinary consistency across a document as well as across an entire series.

It’s up to individual content owners or licensees to decide on the level of accuracy they need for each application. For example, live event audiences are more tolerant of mistakes created by AI-generated captioning, but mistakes detract from “offline” experiences such as viewing pre-recorded, broadcast, or OTT content.

With different language pairs, we’re able to achieve different levels of accuracy, and a growing number of language pairs now reach the mid-to-high 90 percent range. But don’t just take our word for it. A committee of localization partners recently tested our new model for translating English to LATAM Spanish across several categories of programming (e.g., sci-fi, comedy, food, travel, drama). The tests were conducted with and without the XL8 Context Awareness model applied.

Although both sets performed well, the accuracy of XL8’s Context Awareness model averaged 95.5% while the normal model averaged 91.2% (a difference of 4.3 percentage points).

Overall, the CA model was more accurate regarding gender and formality consistency among multiple subtitles. While both performed well at providing coherent sentences, even when faced with misspelled words or odd phrasings, the CA model was more accurate with certain categories like food, where dishes were described in extreme detail with long lists of ingredients.

English to LATAM Spanish, as of July 20, 2022

| C.A. 2022 / Genre | No. of Lines | Accurate Lines, Engine 1 | Accurate Lines, Engine 2 | Accuracy %, Engine 1 | Accuracy %, Engine 2 |
|---|---|---|---|---|---|
| Documentary - Travel | 400 | 365 | 327 | 91.3% | 81.8% |
| SciFi | 400 | 385 | 364 | 96.3% | 91.0% |
| Comedy | 400 | 385 | 373 | 96.3% | 93.3% |
| Drama - Crime | 400 | 389 | 374 | 97.3% | 93.5% |
| Reality - Food | 400 | 378 | 369 | 94.5% | 92.3% |
| K-Drama | 400 | 391 | 382 | 97.8% | 95.5% |
| Total | 2400 | 2293 | 2189 | 95.5% | 91.2% |

Engine 1 - XL8 (context awareness model)

Engine 2 - XL8 (normal model)
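For readers who want to check the arithmetic, the Python sketch below recomputes the accuracy figures from the line counts reported in the table. The variable and function names are ours, chosen for illustration, and the calculation assumes that accuracy is simply accurate lines divided by total lines, which is consistent with the published percentages. It also separates the gap in percentage points from the relative improvement, two figures that are easy to confuse.

```python
# (genre, total lines, accurate lines with context awareness, accurate lines with normal model)
# Counts are taken from the table above.
RESULTS = [
    ("Documentary - Travel", 400, 365, 327),
    ("SciFi", 400, 385, 364),
    ("Comedy", 400, 385, 373),
    ("Drama - Crime", 400, 389, 374),
    ("Reality - Food", 400, 378, 369),
    ("K-Drama", 400, 391, 382),
]


def accuracy_pct(accurate: int, total: int) -> float:
    """Accuracy as a percentage, assuming accuracy = accurate lines / total lines."""
    return 100.0 * accurate / total


# Per-genre accuracy and the gap between the two engines in percentage points.
for genre, lines, ca_ok, normal_ok in RESULTS:
    ca = accuracy_pct(ca_ok, lines)
    normal = accuracy_pct(normal_ok, lines)
    print(f"{genre:22s} CA {ca:6.2f}%  normal {normal:6.2f}%  gap {ca - normal:5.2f} points")

# Totals across all six genres.
total_lines = sum(lines for _, lines, _, _ in RESULTS)
ca_total = sum(ca_ok for _, _, ca_ok, _ in RESULTS)
normal_total = sum(normal_ok for _, _, _, normal_ok in RESULTS)

ca_acc = accuracy_pct(ca_total, total_lines)
normal_acc = accuracy_pct(normal_total, total_lines)

point_gap = ca_acc - normal_acc                 # difference in percentage points
relative_gain = 100.0 * point_gap / normal_acc  # improvement relative to the normal model

print(f"Overall: CA {ca_acc:.2f}%, normal {normal_acc:.2f}%")
print(f"Gap: {point_gap:.2f} percentage points ({relative_gain:.2f}% relative improvement)")
```

Run against these counts, the overall gap is roughly 4.3 percentage points, which corresponds to a relative improvement of a little under 5% over the normal model.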

The food category is a good example of how accuracy levels can fluctuate. With certain advanced language pairs, accuracy may only reach 75%, and for those exceptions we recommend augmenting the translation with a post-edit review and quality control (QC) process. We are also constantly training and updating language pairs to make them context-aware, and through those efforts we’ve seen accuracy increases exceeding 15% compared to the previous model.

Linguists who have worked with our CA-based models agree that these are the tools they need to work more efficiently and complete projects with a higher level of customer satisfaction.

“XL8’s new Context Awareness engines make the translation and QC process much easier and faster,” said June K., a linguist with 4 years of localization experience and a specialist in Korean and English-to-Korean translations. “The accuracy of the results got higher from 70% to 90% in the best cases, and the time we spent on revising the translation was shortened by nearly 20%. The main characters’ names and the key phrases show much more consistency compared to the original engines, which helps us maintain the best quality of our industry.”

Even linguists who were at first hesitant due to their previous machine translation experiences have realized the workflow improvements possible with the XL8 model, leading to higher-quality work and faster customer responsiveness.

“I remember the first time I heard that we would implement the NMT [Neural Machine Translation] engine,” said Francesco R., a linguist with 15 years of localization experience and a specialist in French, Italian and English. “I was skeptical because, in my previous experience, machine translation meant lower quality. But in time, working with and on the engine, I have seen it improve in understanding the context and provide better translations, thanks to our database made of millions of lines, thanks to the programmers, and thanks to the feedback of translators. Now the NMT is a good instrument to help us provide quality translations for our clients and reduced turnaround times.”

The global demand for content is only going to increase, and customizable tools like our Context Awareness models will become increasingly critical to achieving effective hyper-localization. Our vision is to remove language barriers, allowing everybody to communicate and enjoy the content they want to watch in their own language, while also giving content owners more control over the entire localization process and helping them grow their businesses efficiently.
