It’s easy to get lost in the foreboding narratives surrounding new technologies like virtual reality, artificial intelligence (AI) and augmented reality. Media outlets certainly play a role in amplifying these fears. We tend to fear the unknown, and it does not help that we can already imagine dangerous uses of these technologies. From malicious deepfakes to killer robots, the mind wanders…
It is refreshing to hear a more encouraging perspective. Hao Li, co-founder and CEO of Pinscreen, is a computer graphics and computer vision expert who works on creating accessible virtual avatars. “I was really fascinated by the fact that you can create anything you want,” says Li. His focus quickly became digitising humans because “humans were one of the hardest things to do. One of the reasons for that is that we’re very sensitive if things don’t look right with a person… I knew if you put sufficient effort you could basically create the perfect digital human.”
Pinscreen offers a range of AI-powered solutions: fully autonomous virtual assistants, and an avatar-creation tool that can build a complete 3D avatar of someone from a single photo. The company is now working on technology that generates virtual avatars automatically, in order to make them accessible to everyone. “If it is not accessible, then it means that this technology can be used for things that are basically reserved for professionals. This means that you can only use it for entertainment purposes, like film production or video games, but if we ever wanted to have a virtual avatar of ourselves, for example, in applications beyond entertainment and production, then it has to be accessible,” says Li.
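Pinscreen’s photo-to-avatar pipeline is proprietary, but the usual academic recipe starts the same way: detect dense facial landmarks in the photo, then fit a parametric 3D face model to them. As an illustration of that first step only, here is a minimal sketch using Google’s MediaPipe Face Mesh (the photo path is a placeholder, and everything downstream of the landmarks is out of scope):

```python
# A minimal sketch of step one of typical photo-to-avatar pipelines: extracting
# dense 3D facial landmarks from a single image with MediaPipe Face Mesh.
# Pinscreen's actual pipeline is proprietary; this only shows the general idea.
import cv2
import mediapipe as mp

image = cv2.imread("portrait.jpg")  # placeholder path to a single face photo
with mp.solutions.face_mesh.FaceMesh(static_image_mode=True) as face_mesh:
    # MediaPipe expects RGB; OpenCV loads BGR.
    results = face_mesh.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))

if results.multi_face_landmarks:
    landmarks = results.multi_face_landmarks[0].landmark  # 468 (x, y, z) points
    # Downstream (not shown): fit a 3D morphable face model to these points,
    # then infer texture and hair to complete the avatar.
    print(f"Detected {len(landmarks)} landmarks; first point: "
          f"({landmarks[0].x:.3f}, {landmarks[0].y:.3f}, {landmarks[0].z:.3f})")
```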
AI as a facilitating force
Why is that important? Aren’t we already too preoccupied with our two-dimensional digital identities? Do digital avatars or the technology behind them have anything to add to our lives?
Turns out they do. Making this technology accessible opens the door to several use cases. The first is telecommunication. “Let’s say we’re communicating in 2D, but what if we wanted to have an interview and we’re in the same space? If we can both be teleported to the same place and have a face-to-face conversation without the need to physically be there, I think that’s one of the things that this kind of technology can facilitate,” says Li. And the benefits could be substantial. “If we can do this virtually but still have the same experience, that would solve the energy problem.”
The second is scaled content creation. Say a company wants to work with a celebrity but cannot fit into the celebrity’s schedule or afford the budget. Creating visuals of the celebrity without involving them in the recording or production process lets that work scale. “For the celebrity it’s also interesting because instead of just having a few recordings a year they can do a thousand,” says Li. The celebrity gains a bigger reach, earns more overall, and can pick the most interesting projects to pour their creativity into, while the companies that want to work with them can do so more cost-effectively.
The third is human-machine interaction. There are many things in daily life that don’t scale or, as we all now know from first-hand experience, can get dangerously interrupted. “In the pandemic many doctors weren’t available or many people didn’t have the luxury to see a doctor. So imagine if you had the chance to have a virtual doctor that can help you at least with 60% of the usual things – they can diagnose your symptoms and give you some guidance,” explains Li. A good portion of services like healthcare and education can be automated, and a realistic autonomous interface would help there. “The human is really just the interface and allows us to be more natural because it will be weird if we talk to a robot-looking thing. I think there’s always more emotion when we talk to another human.” A realistic-looking avatar could even help with special cases – like helping people with autism cope with social stress.
United against deepfakes
Of course, there are potentially harmful uses of this technology, and Li doesn’t take them lightly. Deepfakes, videos in which a deep generative model swaps one person’s face for another’s, are one of them. “The first time that something like that came up was back in 2018 when deepfakes came up,” remembers Li. The core technology that Li and others work on was used to put celebrities into pornography without their consent. The ability to manipulate video also meant it could be used to interfere in politics or spread fake news.
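To see why such swaps became so easy to produce, it helps to look at the architecture that popularised them: one shared encoder learns a compact representation of faces, while one decoder per identity learns to reconstruct that person; decoding person A’s frames with person B’s decoder performs the swap. The PyTorch sketch below illustrates the idea; the network sizes and details are illustrative assumptions, not any particular tool’s implementation.

```python
# A minimal sketch of the shared-encoder / dual-decoder autoencoder behind
# early face-swap deepfakes. Layer sizes are illustrative assumptions.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Compresses a 64x64 RGB face crop into a latent code shared by both identities.
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    # One decoder is trained per identity; both read the shared latent space.
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),    # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),  # 32 -> 64
        )
    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

encoder = Encoder()
decoder_a, decoder_b = Decoder(), Decoder()  # identity A, identity B

# Training reconstructs each person with their own decoder (loss loop omitted).
# The "swap" happens at inference: encode a frame of A, decode with B's decoder.
frame_of_a = torch.rand(1, 3, 64, 64)  # stand-in for a real video frame
swapped = decoder_b(encoder(frame_of_a))
```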
Even at a smaller scale, the malicious uses of this technology can be damaging. “There’s actually been an FBI warning a month ago where they were notifying private companies that they’re observing some new threats where AI-synthesised media has been used for scam purposes,” recalls Li. “You have phishing attacks where a fake identity of a person tries to impersonate the real person.”
Companies already use detection algorithms to identify deepfakes and fake news, but they aren’t always reliable. Li emphasises the importance of having mechanisms in place, pointing to the recent efforts of social media platforms such as Facebook and Twitter. Flagging suspicious content and providing further context is important. “If you look at fake news in general they don’t even need deepfakes. It’s just a photo of something. If the photo is well taken, if the news article is really well-written, then the only thing that would prevent you from spreading [fake news] further is fact-checking. You need mechanisms. I think that’s the key.”
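One common pattern behind such detectors is a binary classifier fine-tuned on real and synthesised faces. Below is a minimal sketch of that pattern, assuming torchvision’s pretrained ResNet-18; the helpers and training data are hypothetical, not any vendor’s actual system.

```python
# A minimal sketch of a frame-level deepfake detector: a pretrained CNN
# fine-tuned to classify face crops as real (label 0) or synthesised (label 1).
# Batches of shape (N, 3, 224, 224) with integer labels are assumed.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, 2)  # replace head: real vs. fake

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(crops: torch.Tensor, labels: torch.Tensor) -> float:
    # One gradient step on a batch of labelled face crops.
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(crops), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def fake_probability(crops: torch.Tensor) -> torch.Tensor:
    # Per-frame probability that a crop is synthesised.
    model.eval()
    return torch.softmax(model(crops), dim=1)[:, 1]
```

A detector like this is only as reliable as the fakes it was trained on, which is one reason detection alone isn’t enough and why, as Li argues, flagging and fact-checking mechanisms matter.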
Li is already working on a new project to prevent the spread of fake news at scale. Led by the University of California, Berkeley (UC Berkeley) and funded by the Defense Advanced Research Projects Agency (DARPA), the program, called SEMAFOR (for Semantic Forensics), aims to take a multi-modal approach to combating the next generation of AI manipulations in media. Spreading fake news is currently very easy: creating convincing pieces of manipulated video, audio and images isn’t that difficult. The new approach that Li and his colleagues are pursuing in SEMAFOR tries to put content in context. By looking at the bigger picture and taking into account things like movements and gestures, they try to find anomalies or correlations that reveal whether a piece of content has been manipulated. “What if people are creating something so sophisticated that could generate content at scale so people can’t see what’s right or wrong anymore? It’s not only about detecting that. It’s being able to tell how it’s been generated, where the data came from, and if the content has been developed for something good or bad. Because not every manipulation is that.”
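SEMAFOR’s methods aren’t public as code, but the multi-modal idea can be sketched: score each modality for anomalies, score the consistency between modalities, and combine the evidence before flagging content for human review. Everything in the sketch below (the detectors, weights and threshold) is a hypothetical placeholder, not SEMAFOR’s implementation.

```python
# A toy sketch of multi-modal media forensics in the spirit of SEMAFOR's goal:
# combine per-modality anomaly scores with cross-modal consistency checks.
# Every detector, weight and threshold here is a hypothetical placeholder.
from dataclasses import dataclass, field

@dataclass
class MediaItem:
    frames: list = field(default_factory=list)  # video frames
    audio: bytes = b""                          # audio track
    caption: str = ""                           # accompanying text / claimed context

def visual_anomaly(item: MediaItem) -> float:
    # Stub: a real detector would look for blending artefacts or implausible
    # gestures and lighting. Returns a score in [0, 1].
    return 0.0

def audio_anomaly(item: MediaItem) -> float:
    # Stub: a real detector would look for synthetic-voice artefacts.
    return 0.0

def cross_modal_inconsistency(item: MediaItem) -> float:
    # Stub: a real detector would check lips against audio, caption against scene.
    return 0.0

def manipulation_score(item: MediaItem) -> float:
    # Weighted combination of the evidence; real weights would be learned.
    weighted = [
        (0.4, visual_anomaly),
        (0.3, audio_anomaly),
        (0.3, cross_modal_inconsistency),
    ]
    return sum(w * detector(item) for w, detector in weighted)

def flag_for_review(item: MediaItem, threshold: float = 0.7) -> bool:
    # Flagging routes content to human fact-checkers rather than auto-deleting
    # it, since, as Li notes, not every manipulation is malicious.
    return manipulation_score(item) >= threshold

print(flag_for_review(MediaItem()))  # False for the empty placeholder item
```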
With any new technology as transformative as AI, there will be efforts to use it for harm. It’s easy to get intimidated and lose sight of the potential benefits. Even if current efforts to combat malicious uses fail to protect against everything, they are still worthwhile. As Li says, “You always have that one specific expert that can create the perfect thing, that can bypass the security, but if you can stop most of it that’s already really important.”