In this data age, we can personalize the provision of information because we are rapidly learning more about hundreds of millions of individual users of services. The underlying idea is that an AI-enabled information system (internet, website, media, etc.) will be able to give end-users exactly the content they want. Why? Because these users offer increasingly rich data profiles to the machines running online services. How far, then, will a person's experience of language(s) - one crucial variable - offer a challenge or a solution to tomorrow's personalized data game?
There are two directions we can take:
Take this baseline situation: a virtual assistant (VA) can listen to my voice, detect signs of depression (or even Covid-19!) in the sounds emitted by my vocal cords, and transmit an alert to a treatment service, which will make a diagnosis by drawing on my entire online medical history, running it against a zillion other sufferers' records, and analyzing it all with machine learning. Personalized help will not be far away.
Note that in this case, the system doesn't even need to know which language I speak, as the crucial data points are detected in the sounds of my voice rather than the meaning of my words. Quite a broad range of specialized psychological, medical and sociological insight can likely be derived from listening to speakers solely as 'noise makers' rather than as 'sense makers.' A written message bearing identifiable semantic value, however, would almost certainly not trigger a depression alert as such (it could be a fake, couldn't it?). Yet generalized rants and extremist content posted to social media can provide "psychological" alerts (due to specific words in a given language) for content moderators.
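To make the 'noise maker' point concrete, here is a minimal sketch of what such a voice-screening pipeline might look like. Everything in it is an assumption for illustration - the feature set, the classifier, the placeholder training data and the utterance.wav file are all hypothetical, and this is emphatically not a clinical tool. The point is simply that the pipeline works on acoustic statistics and never transcribes a single word.

```python
# Illustrative sketch only: acoustic screening without touching word meaning.
import numpy as np
import librosa
from sklearn.linear_model import LogisticRegression

def acoustic_features(waveform: np.ndarray, sr: int) -> np.ndarray:
    """Summarize a voice clip as prosodic/spectral statistics only (no words)."""
    mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13)   # timbre
    f0 = librosa.yin(waveform, fmin=60, fmax=400, sr=sr)        # pitch contour
    rms = librosa.feature.rms(y=waveform)                        # loudness
    return np.concatenate([mfcc.mean(axis=1),
                           [np.nanmean(f0), np.nanstd(f0)],
                           [rms.mean(), rms.std()]])

# Hypothetical training data: feature vectors plus 0/1 screening labels.
X_train = np.random.rand(200, 17)        # placeholder for labelled voice clips
y_train = np.random.randint(0, 2, 200)   # placeholder screening labels
screener = LogisticRegression().fit(X_train, y_train)

# At call time the assistant would extract features from the live utterance
# and forward only a risk score, never the words, to a treatment service.
clip, sr = librosa.load("utterance.wav", sr=16000)  # hypothetical recording
risk = screener.predict_proba(acoustic_features(clip, sr).reshape(1, -1))[0, 1]
print(f"screening score: {risk:.2f}")
```

Whatever the real feature set turns out to be, the architecture stays language-agnostic: the same model can score a Kiswahili speaker and an English speaker without knowing which is which.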
So far in the translation industry, p13n (personalization) has been used to refer to the action of adapting content for traditional (big) language communities. The general view found in progress reports is that the range of languages being datafied and digitized as online tongues is rising, albeit slowly, and that the unsurprising rationale for this is that most speakers of a language prefer to do business, search for information, or enjoy entertainment in their own tongue. So localizing a website from English into Kiswahili is rather condescendingly understood today as in some way 'personalizing' content for some 130 million potential users. Just as the Wycliffe Global Alliance has been steadily 'personalizing' parts of the Christian Bible into 3,384 languages.
The trouble with this interpretation is that once you’ve got a new language onboard, your p13n work never stops. You will have to start addressing all those new language speakers with their different personal preferences as evidenced in their usage (dialectal, social, racial, religious, educational, etc.), which is what we ultimately mean by p13n. What they prefer to read/hear and what they prefer to say/write. Let’s look at what this could mean.
AI-driven micro-analytics is set to reveal more about the power of language to influence specific individuals, not just large cohorts of readers/listeners in general. Consider the current retail trend of giving customers more background information about product sourcing in order to build greater trust and attract various new sub-groups. What if the impact of this tactic depended on personalized language preferences concerning the very nouns, verbs, adjectives, and rhetorical constructions that go into marketing content, once first contact is made through an advert?
And what if financial, insurance and similar services attempted to radically personalize the style of their communications to build stronger personal relationships by adapting parts of their content to near-individual psychological preferences when talking about money or debt?
This fine-tuned p13n is surely on the horizon. The global commercial translation industry is worth only a few billion dollars more than Amazon's annual R&D budget of $35.9Bn. We can be almost sure that some of this R&D spend will go into working out the next stage in deploying voice assistants (VAs). Spoken language is a remarkable conveyor of sentiment and an influence on personal decision-making. VA developers are well aware of the volume, depth and granularity of information that voice input and output can provide to marketing services of all kinds.
So if distinguishing between English and Kiswahili voices is deemed a major l10n achievement today (and translating content from one to the other is a real advance), the next steps in personalized VA language will involve far subtler dimensions. These could include understanding and delivering on the rhetorical benefits of different vocab choices, linguistic registers, topicality of content, voice musicality, and the use of humor or play.
Brands and services will have to develop content that echoes the natural discourse of potential end-users while leading those same consumers to embrace the specific design language of the brand. A complex equation. Only big data will be able to tell us if and how these semiotic features of human vocal and textual experience will influence language use. We will then be able to use smaller sets of personal data to build those end-user p13n profiles.
A simple contrast helps understand these possibilities in the voice domain. In standard conference interpreting situations, we don't expect a match between the source speaker's and the interpreter's gender, age, or voice quality - a 25-year-old woman can interpret ("speak for") a 70-year-old man with a dialect-based accent. Someday soon (shock and horror!) we will have a conference interpretation system that will model the man's voice in near real-time and automatically generate a translation delivered in his personal voice quality and speaking style. This could seem 'uncanny', as they say, or it could become another new normal.
Making a user (customer, etc.) feel more at home with content - by testing different voices, rhythms, and speeds for delivering the content in question - will therefore be the stuff of machine learning worldwide, and will probably form a major thread of future language p13n. This will mean that anyone in charge of multilingual versions of VAs and other robots (it may not even involve traditional translation as such) will need to go much deeper into the data than simply localizing "content as phrases" from language A to B. Their job will be to ensure a linguistic match between a p13n profile and a given message. And in due course, all this will have to be automated to handle communication in signing versions as well as written and spoken language.
The language trick that enables this familiar sort of p13n is paraphrase - two or more semantically equivalent versions of a single meaningful utterance that can be used in differently marked social and behavioral situations (Get the hell out! vs. Please leave immediately!). Translation is therefore a form of cross-lingual paraphrase. Yet inside some commercial translation, we shall presumably need to provide different paraphrases of the same basic utterance for different age or racial groups, education cohorts, and game players, in order to address humorous or serious situations, and all the rest of the variables from privacy to big crowds. Personalizing means customizing, as transcreators know well.
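As a thought experiment, that paraphrase-selection step can be reduced to something as simple as the sketch below. The profile fields, the locale codes and the utterance bank (including the Swahili rendering) are invented for illustration; a production system would learn such mappings from usage data rather than hard-code them.

```python
# Illustrative sketch: choosing among paraphrases of one utterance by profile.
from dataclasses import dataclass

@dataclass
class P13nProfile:
    # Hypothetical profile fields; real ones would be inferred from usage data.
    register: str   # e.g. "blunt", "formal"
    locale: str     # e.g. "en-US", "sw-KE"

# One meaning, several socially marked paraphrases; translation simply adds
# the cross-lingual dimension (sketched here for two locales).
LEAVE_NOW = {
    ("en-US", "blunt"):  "Get the hell out!",
    ("en-US", "formal"): "Please leave immediately.",
    ("sw-KE", "formal"): "Tafadhali ondoka mara moja.",
}

def realize(utterance_bank: dict, profile: P13nProfile) -> str:
    """Pick the paraphrase matching the end-user's profile, with a safe fallback."""
    return (utterance_bank.get((profile.locale, profile.register))
            or utterance_bank[("en-US", "formal")])

print(realize(LEAVE_NOW, P13nProfile(register="blunt", locale="en-US")))
```

The interesting design question is not the lookup itself but how the bank of marked paraphrases gets built and kept fresh - which is where transcreators, and their data, come in.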
With the increasing production of spoken content and messaging, we will need to be able to mix and match these different custom registers in subtle ways. For example, the whole logic of emoji and visual languaging currently rampant on social media is tending towards capturing more of the allusive, freewheeling, shorthand power of language on the wing, and making it zing. However, translating some of this playful yet meaningful content into other languages at scale - our first step in p13n - is hardly on the agenda yet for machines, as they don't yet have access to enough variegated language data. Trying to do this at human speed would usually be counter-productive.
P13n will also require us to translate content for different communities of inclusiveness, either for legal and political reasons or because one language’s innate structure requires subtler gender management or racial coding than another’s. This in turn will require access to greater knowledge about specific language behaviors than we might typically expect from our daily production teams.
At yet another level of communication, people instinctively prefer specific qualities of voice (think of your favorite actors and singers), and might therefore appreciate voice content that addresses them in their preferred (although possibly faked) sonorities - especially if we can choose between spoken and written forms of the same content depending on our media choice at a specific moment.
Different types of content for different cohorts might therefore trigger different voice choices, rhythms, speeds, and syntaxes in a commercial context. All these variables could then become issues of linguistic choice among translation target users as well, massively complicating the delivery of translated content and opening it up to automated solutions. These solutions will first need to machine-test end-user evaluations of different versions on a vast scale to see what works. Linguists will then be able to work on successive fashions in language artistry to build p13n libraries.
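One plausible shape for that machine-testing loop is a simple explore/exploit scheme over competing wordings, sketched below with invented variants and a toy feedback signal. Real deployments would need far richer engagement metrics and per-cohort segmentation; this only shows the mechanism.

```python
# Illustrative sketch: epsilon-greedy testing of alternative wordings.
import random
from collections import defaultdict

# Hypothetical candidate wordings of one message for one cohort.
VARIANTS = ["Please leave immediately.", "Time to head out!", "Kindly exit now."]

shows = defaultdict(int)    # how often each wording was served
clicks = defaultdict(int)   # positive end-user reactions per wording

def choose_variant(epsilon: float = 0.1) -> str:
    """Mostly exploit the best-scoring wording, occasionally explore others."""
    if random.random() < epsilon or not any(shows.values()):
        return random.choice(VARIANTS)
    return max(VARIANTS, key=lambda v: clicks[v] / shows[v] if shows[v] else 0.0)

def record_feedback(variant: str, engaged: bool) -> None:
    shows[variant] += 1
    clicks[variant] += int(engaged)

# Toy usage: serve 1,000 impressions with a simulated engagement signal.
for _ in range(1000):
    v = choose_variant()
    record_feedback(v, engaged=random.random() < 0.2)  # placeholder reaction model
```

At scale, the output of loops like this is exactly the raw material the linguists mentioned above would mine to build p13n libraries.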
This all suggests that the industry will need to move beyond the traditional choice of "national" or "regional" tongues as the defining criterion of p13n, and espouse the general tenets of "design thinking" and "millennial" mindsets when it comes to communication and digital content. Planning how to personalize effectively, collect the data that determine personalized value in language both read/heard and written/spoken, and enrich the capacity of transcreation, machine translation, and other techniques to address this highly competitive challenge will no doubt fill many post-Covid sleepless nights.
Long-time European language technology journalist, consultant, analyst and adviser.