Research into machine learning and artificial intelligence, now a key technology in practically every industry and company, is far too voluminous for anyone to read it all. This column, Perceptron, aims to collect some of the most relevant recent findings and papers (particularly in, but not limited to, artificial intelligence) and explain why they matter.
Over the past few weeks, Google researchers have demonstrated an AI system, PaLI, that can multitask in over 100 languages. Elsewhere, a Berlin-based group launched a project called Source+, designed to let artists (including visual artists, musicians, and writers) opt in to, and opt out of, having their work used as training data for AI.
AI systems like OpenAI’s GPT-3 can generate fairly coherent text or summarize existing text from the web, e-books, and other information sources. But historically they have been limited to a single language, limiting both their usefulness and their reach.
Fortunately, research on multilingual systems has accelerated in recent months, driven in part by community efforts like Hugging Face’s BLOOM. To capitalize on these advances in multilingualism, a team at Google created PaLI, which was trained on both images and text to perform tasks such as image captioning, object detection, and optical character recognition.
Google claims that PaLI can understand 109 languages and the relationships between words in those languages and images, allowing it to, for example, caption a picture of a postcard in French. Although the work remains firmly in the research phase, the creators say it illustrates the important interplay between language and imagery, and that it could form the basis for a commercial product down the line.
Speech is another aspect of language in which AI is steadily improving. Play.ht recently showed off a new text-to-speech model that puts a remarkable amount of emotion and range into its output. The clips it posted last week sound fantastic, although of course they are cherry-picked.
We generated a clip of our own using the intro to this article, and the results are still solid.
It’s not yet clear exactly what this type of voice generation will be most useful for. We’re not quite at the stage where such models narrate whole books; or rather, they can, but it might not be anyone’s first choice just yet. But as the quality improves, the applications multiply.
Matt Dryhurst and Holly Herndon, an academic and a musician respectively, have teamed up with the organization Spawning to launch Source+, a standard they hope will address the problem of image-generating AI systems being built with artwork from artists who weren’t informed or asked for permission. Source+, which is free, aims to let artists opt out of having their work used for AI training if they choose.
Image generation systems like Stable Diffusion and DALL-E 2 were trained on billions of images pulled from the web to “learn” how to translate text prompts into art. Some of these images came from public art communities like ArtStation and DeviantArt (not necessarily with the artists’ knowledge), giving the systems the ability to mimic particular creators, including artists like Greg Rutkowski.
Because the systems can mimic art styles, some artists fear they could threaten their livelihoods. Source+, while voluntary, could be a step toward giving artists a greater say in how their art is used, Dryhurst and Herndon say, assuming it’s adopted at scale (a big if).
DeepMind has a research team trying to resolve another long-standing problematic aspect of AI: its tendency to spew toxic and misleading information. Focusing on text, the team developed a chatbot called Sparrow that can answer common questions by searching the web with Google. Other cutting-edge systems like Google’s LaMDA can do the same, but DeepMind claims that Sparrow provides believable, non-toxic answers to questions more often than its peers.
The trick was to align the system with people’s expectations of it. DeepMind recruited people to use Sparrow and had them provide feedback to train a model of how useful the answers were: participants were shown multiple answers to the same question and asked which one they liked best. The researchers also defined rules for Sparrow, such as “do not make threatening statements” and “do not make hateful or offensive comments,” and had participants probe the system by trying to trick it into breaking those rules.
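Comparison-based feedback like this is commonly turned into a training signal with a pairwise (Bradley-Terry style) loss: a reward model assigns each answer a score, and training pushes the preferred answer’s score above the rejected one’s. The sketch below illustrates that general technique only; it assumes a toy linear reward model with hypothetical hand-picked features, and it is not DeepMind’s actual Sparrow reward model or training setup, which the column doesn’t describe.

```python
import math


def reward(weights, features):
    """Toy linear reward model (hypothetical): dot product of weights and answer features."""
    return sum(w * x for w, x in zip(weights, features))


def preference_prob(weights, preferred, rejected):
    """Bradley-Terry probability that a rater picks `preferred` over `rejected`."""
    diff = reward(weights, preferred) - reward(weights, rejected)
    return 1.0 / (1.0 + math.exp(-diff))


def pairwise_loss(weights, comparisons):
    """Mean negative log-likelihood of the observed human choices."""
    total = sum(math.log(preference_prob(weights, p, r)) for p, r in comparisons)
    return -total / len(comparisons)


def train(comparisons, dim, lr=0.5, steps=200):
    """Fit the reward weights with plain batch gradient descent on the pairwise loss."""
    weights = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for preferred, rejected in comparisons:
            # Gradient of -log(sigmoid(diff)) is -(1 - sigmoid(diff)) * (preferred - rejected).
            p = preference_prob(weights, preferred, rejected)
            for i in range(dim):
                grad[i] -= (1.0 - p) * (preferred[i] - rejected[i])
        weights = [w - lr * g / len(comparisons) for w, g in zip(weights, grad)]
    return weights


# Hypothetical features per answer: [cites_evidence, stays_on_topic].
# Each pair is (answer the rater preferred, answer the rater rejected).
comparisons = [
    ([1.0, 1.0], [0.0, 1.0]),
    ([1.0, 0.0], [0.0, 0.0]),
    ([1.0, 1.0], [0.0, 0.0]),
]
weights = train(comparisons, dim=2)
print("learned weights:", weights)
print("loss before:", pairwise_loss([0.0, 0.0], comparisons))
print("loss after:", pairwise_loss(weights, comparisons))
```

Only the difference between two scores enters the loss, which is why raters can give relative judgments (“which answer is better?”) instead of absolute ratings.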
DeepMind admits Sparrow has room for improvement. But in one study, the team found that the chatbot provided a “plausible” answer supported by evidence 78% of the time when it was asked a factual question, and violated the aforementioned rules only 8% of the time. That’s better than DeepMind’s original dialog system, the researchers noted, which broke the rules roughly three times as often when tricked into doing so.
A separate DeepMind team recently tackled a very different domain: video games, which have historically been difficult for AI to master quickly. Their system, cheekily named MEME, reportedly achieved “human-level” performance on 57 different Atari games 200 times faster than the previous best system.
According to DeepMind’s report describing MEME, the system can learn to play the games after observing roughly 390 million frames (“frames” being the still images that refresh very quickly to give the impression of motion). That might sound like a lot, but the previous state-of-the-art system required 80 billion frames across the same number of Atari games.
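The “200 times faster” claim appears to be about data efficiency (frames observed) rather than wall-clock time. A quick check using only the two frame counts quoted above lands at roughly 205x, consistent with that reading; the constant names are illustrative.

```python
# Frame counts quoted in the paragraph above (approximate).
MEME_FRAMES = 390e6          # ~390 million frames
PREVIOUS_SOTA_FRAMES = 80e9  # 80 billion frames


def frame_efficiency(previous_frames: float, new_frames: float) -> float:
    """How many times fewer frames the new system needs than the old one."""
    return previous_frames / new_frames


ratio = frame_efficiency(PREVIOUS_SOTA_FRAMES, MEME_FRAMES)
print(f"{ratio:.0f}x fewer frames")  # prints "205x fewer frames"
```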
Playing Atari well might not sound like a particularly desirable skill. And indeed, some critics argue that games are the wrong benchmark for AI because of their abstractness and relative simplicity. But research labs like DeepMind believe the approaches could be applied to other, more useful areas in the future, such as robots that learn to perform tasks more effectively by watching videos, or self-improving, self-driving cars.
Nvidia had a field day on the 20th, announcing dozens of products and services, among them several interesting AI efforts. Self-driving cars are one of the company’s areas of focus, both powering the AI and training it. For the latter, simulators are crucial, and it is likewise important that the virtual roads resemble real ones. Nvidia describes a new, improved content pipeline that speeds up bringing data collected by cameras and sensors on real cars into the digital realm.
Things like real-world vehicles, irregularities in the road surface, or tree cover can be accurately reproduced, so the self-driving AI doesn’t learn on a sanitized version of the street. It also makes it possible to create larger and more varied simulation setups in general, which helps with robustness.
Nvidia also introduced its IGX system for autonomous platforms in industrial settings, meaning human-machine collaboration of the kind you might find on a factory floor. There’s no shortage of such systems, of course, but as tasks and operating environments grow more complex, the old methods no longer suffice, and companies looking to improve their automation are looking to the future.
“Proactive” and “predictive” safety is what IGX is designed to help with, meaning catching safety issues before they cause outages or injuries. A robot might have its own emergency braking mechanism, but if a camera monitoring the area can tell it to swerve before a forklift gets in its way, things go a little more smoothly. Exactly which company or software will accomplish this (and on what hardware, and how it all gets paid for) is still a work in progress, with the likes of Nvidia and startups like Veo Robotics feeling their way.
Another interesting step forward came on Nvidia’s home turf of gaming. The company’s latest and greatest GPUs are built not only for pushing triangles and shaders, but also for accelerating AI-powered tasks like its proprietary DLSS technology, which upscales frames and adds generated ones.
The problem Nvidia is trying to solve is that game engines are so demanding that generating more than 120 fps (to keep up with the latest monitors) while maintaining visual fidelity is a Herculean task that even powerful GPUs can barely manage. But DLSS acts as a kind of intelligent frame blender that can upscale the output frame without introducing aliasing or artifacts, so the game doesn’t have to push as many pixels.
With DLSS 3, Nvidia claims it can generate entire additional frames at a 1:1 ratio, so you could render 60 frames natively and the other 60 through AI. I can think of a few reasons that could make things weird in a high-performance gaming environment, but Nvidia is probably well aware of them. In any case, you’ll have to pay about a thousand dollars for the privilege of using the new system, since it will only work on RTX 40-series cards. But if graphical fidelity is your top priority, go for it.
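The back-of-the-envelope arithmetic behind the two paragraphs above can be sketched as follows. The 120 fps target and the 1:1 generated-to-rendered ratio come from the text; the 2560x1440 render and 3840x2160 output resolutions are an illustrative assumption, not a DLSS setting quoted by Nvidia, and the helper names are hypothetical.

```python
def frame_time_ms(fps: float) -> float:
    """Time budget for each displayed frame, in milliseconds."""
    return 1000.0 / fps


def native_fps_needed(target_fps: float, generated_per_rendered: int = 1) -> float:
    """Frames per second the engine must render itself if the GPU synthesizes
    `generated_per_rendered` extra frames for every natively rendered one."""
    return target_fps / (1 + generated_per_rendered)


def pixel_count(width: int, height: int) -> int:
    """Number of pixels in a frame of the given resolution."""
    return width * height


target = 120
native = native_fps_needed(target)  # 1:1 ratio, as Nvidia describes for DLSS 3
print(f"Displayed: {target} fps, {frame_time_ms(target):.2f} ms per frame")
print(f"Rendered natively: {native:.0f} fps, {frame_time_ms(native):.2f} ms per frame")

# Illustrative resolutions (assumption): render at 2560x1440, output at 3840x2160.
upscale_ratio = pixel_count(3840, 2160) / pixel_count(2560, 1440)
print(f"Output frame has {upscale_ratio:.2f}x the pixels of the rendered frame")
```

With frame generation, the engine’s per-frame budget doubles from about 8.33 ms to 16.67 ms, and upscaling cuts the rendered pixel count by a further factor of 2.25 in this example. The sketch ignores the cost of running the AI model itself and any added latency.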
The last item today is a drone-based 3D printing technique from Imperial College London that could be used for autonomous construction processes sometime in the distant future. For now, it’s definitely not practical for creating anything bigger than a trash can, but it’s still early days. The team hopes to scale it up eventually, and the concept does look great, but it’s worth keeping expectations grounded for now.