SwissTXT and its mission to revolutionise interpreting for deaf people
In a scene from the film «Kingsman: The Secret Service», Galahad enters the Kingsman meeting room and is met by Arthur. Bear with me while we take a detour into film. This does relate to interpreting for the deaf – I promise.
Galahad takes a seat at the otherwise empty meeting room table. Once they both put on their AR glasses, they immediately see all the other Kingsman agents as 3D or stereoscopic avatars.
This film extract illustrates how SwissTXT, a multimedia arm of Swiss broadcaster SRG, envisions sign language interpreting in the future. They foresee augmented reality creating an image of a person – in this instance, the interpreter – who will translate spoken language into sign language in real time.
What’s this got to do with a teletext company?
So I already mentioned that SwissTXT deals with multimedia. You might already have heard of it because it creates teletext. You know the brightly coloured pixel writing on a dark background you get when you press the Text button on your remote. But of course, it does much more than that as a subsidiary of SRG. It was founded in 1983 and amongst other things, it deals with access services.
«We made a commitment 35 years ago to create a subtitling service. At first, this was fairly simplistic. Then came the UN Convention on the Rights of Persons with Disabilities, and the demands for accessibility grew. With technological advances, demand increased even further,» says Robin Ribback, Innovation Manager at SwissTXT.
And that’s why SwissTXT isn’t just responsible for creating subtitles these days. It’s also tasked with things like sign language and audio description. It goes without saying that people with sensory disabilities should have access to information, education and culture.
But the UN Convention on the Rights of Persons with Disabilities isn’t just about accessible TV. It also covers accessibility issues in other areas, such as education, events, the corporate world and politics. «It’s SwissTXT’s job to build an ecosystem for access to information, education and culture,» explains Robin.
SwissTXT has teamed up with a number of universities, including the Universities of Zurich, St. Gallen and Lausanne and the Universities of Applied Sciences in Winterthur, Bern and Olten. This collaboration ensures people with disabilities, such as hearing impairments, can still access education: their lectures are transcribed live and delivered online to a tablet with the help of what is called a ReSpeaker.
How spoken language gets turned into written language
You might be wondering how spoken language is interpreted for the hearing impaired at the moment. It works with the help of a person called a ReSpeaker. Universities are a good place to see this used in practice.
Here’s what happens: the lecturer’s speech is transmitted online to a ReSpeaker. This ReSpeaker can be anywhere, and their job is to repeat what is said clearly, including the punctuation. Automatic speech recognition software then converts the ReSpeaker’s spoken words into written text. This text is shared with the hearing impaired person online. As a result, deaf students can follow along with the lesson by reading a transcript of what the lecturer is saying.
Essentially, that’s how subtitles work today. The system is now being optimised in stages. In a second phase, the ReSpeaker is replaced with automated speech recognition, which transforms the lecturer’s spoken word directly into text. This text is then corrected by a person.
In the third phase, the human aspect is completely removed from the process, as AI is in charge of the whole translation process. But that’s not to say it should work like YouTube, where the subtitles are purely a word-for-word transcription. They have to be edited into succinct text.
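To make the three phases easier to compare, here is a minimal Python sketch that lists who does what in each one. It is purely illustrative: the names `Stage`, `PHASES` and `is_fully_automated` are made up for this example and the stage descriptions paraphrase the article, so none of this reflects SwissTXT's actual software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One step on the way from speech to subtitle text (illustrative only)."""
    task: str
    performed_by: str  # either "human" or "machine"


# Hypothetical summary of the three phases described in the article.
PHASES = {
    1: [
        Stage("repeat the speaker's words with punctuation (ReSpeaker)", "human"),
        Stage("speech recognition on the ReSpeaker's voice", "machine"),
    ],
    2: [
        Stage("speech recognition on the original speaker's voice", "machine"),
        Stage("correct and tidy up the recognised text", "human"),
    ],
    3: [
        Stage("speech recognition on the original speaker's voice", "machine"),
        Stage("edit the transcript into succinct text", "machine"),
    ],
}


def human_stages(phase: int) -> list:
    """Return the stages of a phase that still need a person."""
    return [stage for stage in PHASES[phase] if stage.performed_by == "human"]


def is_fully_automated(phase: int) -> bool:
    """A phase is fully automated when no stage is performed by a human."""
    return not human_stages(phase)


if __name__ == "__main__":
    for number in sorted(PHASES):
        label = "fully automated" if is_fully_automated(number) else "human in the loop"
        print(f"Phase {number}: {label}")
```

Laid out like this, the shift is easy to see: the human moves from the start of the chain in phase one to the end of it in phase two, and disappears in phase three.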
«It’s important for us to keep improving. And the key there is in collecting data,» says Robin. To do that, SwissTXT continually collects information from its briefs on broadcasts, education, events, entertainment and politics. This helps to make sure AI is always improving with deep learning.
«At the moment, people still play an important role in accessibility. But we’re always boosting our data so that the automated systems can take on more of the work. And then we’ll come to a point where the machines can handle the whole process,» says Robin, confidently. This would let them meet their goal of granting hearing impaired people access to 100% live text – all of the time and everywhere.
Obviously, this isn’t just for education and TV. It’s also for events, entertainment and politics. This kind of interpreting means the deaf community could follow meetings at the Council of States and the National Council, alongside other access services such as audio description and sign language. Even today at events, you get subtitles for stadium commentators. As a result, people with hearing impairments can go along to FC Bayern Munich matches in the stadium and follow what the commentators are saying with the help of AR glasses (linked article in German).
How spoken language gets turned into sign language
«Deaf people want sign language interpreting,» explains Michaela Nachtrab, Business Developer for Access Services and a sign language interpreter herself. «They obviously want to communicate in their own language.» However, that’s not as straightforward as just creating subtitles. To actually understand sign language, a lot of factors come into play.
For instance, the signs or gestures are important and then there’s also the role the torso plays along with facial expressions. «Even small facial movements can make a difference to the meaning. To give you an example, when I raise an eyebrow and look down, I’m asking a question,» says Michaela. And the torso is used to show things like locations and positions.
For this to work, there needs to be an inbuilt artificial image – AI – of a sign language interpreter. You could call it an avatar. «But avatars are often equated with games. That’s why we’re calling our image a realatar,» explains Robin.
To create the realatar, SwissTXT starts as always with the subtitles. First, they record interpreters in a special studio and generate a digital copy of them. This realatar can then be broadcast on devices like notebooks or tablets.
As in the example with the ReSpeaker, the interpreters can work from anywhere. All they need is a camera that films their face, a camera to record their movements and motion sensors to register their hand movements. It’s conceivable that everyone could have a realatar of themselves, meaning you could have Samuel L. Jackson signing for you in future.
«At the moment we’re in the initial phase, which you could call live remote avatar puppeteering,» says Robin. «That might not sound like very much but this is what lets interpreters do their work from anywhere – even from home. This saves us a great deal of money,» Michaela adds.
Now it’s all about collecting data on their movements and facial expressions: «Up until now people haven’t thought of visually recording movements that sign language interpreters make,» Robin points out. Speech recognition data has been collected since 1987. But data banks for sign language, on the other hand, are only just getting built now.
The first data to be collected is for weather forecasts. That’s because the language used for this type of report is fairly limited and clear-cut. Consequently, there’s less room for misinterpretation.
In terms of language understanding, people usually get it right 99% of the time, while machines only understand 85% of it. In sign language, this value of machine comprehension drops significantly: a machine that gets 55% of it right is already doing well. However, when it comes to understanding spoken language, a machine will recognise at least 90% of words.
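The article doesn’t say how these percentages are measured. One common way to score recognised text is word accuracy: count the fewest word insertions, deletions and substitutions needed to turn the machine’s output into the correct text, then divide by the number of words in the correct text. The Python sketch below assumes that measure. The function names and the weather sentence are invented for illustration, and this is not necessarily how SwissTXT arrives at its figures.

```python
def word_edit_distance(reference: list, hypothesis: list) -> int:
    """Fewest word insertions, deletions and substitutions (Levenshtein distance)."""
    # previous[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        current = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,                      # reference word was dropped
                current[j - 1] + 1,                   # extra word was inserted
                previous[j - 1] + substitution_cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def word_accuracy(reference: str, hypothesis: str) -> float:
    """Share of reference words the machine got right, between 0.0 and 1.0."""
    ref_words = reference.lower().split()
    hyp_words = hypothesis.lower().split()
    if not ref_words:
        raise ValueError("reference text must contain at least one word")
    errors = word_edit_distance(ref_words, hyp_words)
    return max(0.0, 1.0 - errors / len(ref_words))


if __name__ == "__main__":
    # Made-up weather sentence with a single misrecognised word.
    spoken = "tomorrow will be sunny with light wind in the afternoon"
    recognised = "tomorrow will be sunny with light wind in the evening"
    print(f"word accuracy: {word_accuracy(spoken, recognised):.0%}")
```

With one wrong word out of ten, this example lands on the 90% mark mentioned above. It also shows why weather reports are a sensible starting point: the smaller and more predictable the vocabulary, the fewer words there are to get wrong.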
As with subtitles, sign language interpreting should be automated in three phases. This helps to train AI with natural language processing (NLP) and deep learning. By the end of this, AI should recognise spoken language, be able to translate it into sign language and use a realatar.
The aim: making everything available to people with hearing impairments
This is where HbbTV comes into play. It displays a transparent browser overlay across the TV signal. What does that mean in practice? It displays the subtitles above or below the sign language interpreter.
And what works for watching TV should also be suitable for other areas, such as education, events and politics. As with the examples we started with in the scene from «Kingsman: The Secret Service» and the stadium commentator at FC Bayern Munich, augmented reality is a key factor. Watch out for developments in AR glasses, where they’re set to display a 3D sign language interpreter.
However, the whole concept of 3D or stereoscopy brings up other questions. «How can the Kingsmen drink whisky together? Holographic people and objects are fixed elements within the room. In other words, they go well beyond simple holograms. Then you’re likely to get questions about how a hybrid professional environment would work. That’s where research is headed in future,» says Robin with certainty.
In case you’re interested, the data SwissTXT has been collecting is freely available online.