At the Lawrence J. Ellison Institute for Transformative Medicine of USC, scientists have trained a neural network to spot different types of breast cancer using a small data set of fewer than 1,000 images. Instead of training the AI system to distinguish between groups of samples, the researchers taught the network to recognize the visual “tissue fingerprint” of tumors so that it could work on much larger, unannotated data sets.
Halfway across the country in suburban Chicago, Oracle’s construction and engineering group is working with video-camera and software companies to build an artificial intelligence system that can tell from live video feeds—with up to 92% accuracy—whether construction workers are wearing hard hats and protective vests and practicing social distancing.
Such is the promise of computer vision, whereby machines are trained to interpret and understand the physical world around them, oftentimes spotting and comparing fine visual cues the human eye can miss. The fusion of computer vision with deep learning (a branch of artificial intelligence that employs neural networks), along with advances in graphics processors that run many calculations in parallel and the availability of huge data sets, has led to leaps in accuracy.
Now, a generation of GPUs equipped with even more circuitry for parsing photos and video and wider availability of cloud data centers for training statistical prediction systems have quickened development in self-driving cars, oil and gas exploration, insurance assessment, and other fields.
“Machine learning has completely changed computer vision since 2012, as the new deep-learning methods simply perform far better than what was possible previously,” says David Lowe, a professor emeritus of computer science at the University of British Columbia who works on automated driving and developed a computer vision algorithm that led to advances in robotics, retail, and police work in the 2000s.
“Almost all computer vision problems are now solved with deep learning using massive amounts of training data,” he says. “This means the major difficulty and expense are gathering very large data sets consisting of images that are correctly labeled with the desired results.”
56% of business and IT executives say their organizations use computer vision technologies.1
Oracle is making servers available on its Oracle Cloud Infrastructure that run Nvidia’s latest A100 GPUs. In addition to faster processing cores, bulked-up memory, and quicker data shuttling among processors, the GPUs include circuitry and software that make training AI systems on photos and video quicker and more accurate.
Powerful but static
There are still limits to today’s vision systems. Autonomous vehicles need to clear safety hurdles stemming from the vast number of unpredictable events that arise when people and animals get near cars, an area that machine learning systems are hard to train to recognize. Computers still can’t reliably predict what will happen in certain situations, such as when a car is about to swerve, in a way that humans intuitively can. Many applications are limited in their usefulness by the availability or cost of generating large sets of clearly labeled training data.
“Today’s AI is powerful, but it’s static,” said Fei-Fei Li, codirector of Stanford University’s Human-Centered AI Institute, during a recent corporate talk. “The next wave of AI research ought to focus on this more active perspective and interaction with the real world instead of the passive work we’ve been doing.”
Neural networks use successive layers of computation to understand increasingly complex concepts, then arrive at an answer. Running deep learning systems on GPUs lets them train on large volumes of data, a process that involves multiplying data points by their statistical weights in parallel across graphics chips’ many small processors. In computer vision, the techniques have led to the ability to quickly identify people, objects, and animals in photos or on the street; build robots that can see and work better alongside humans; and develop vehicles that drive themselves.
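For readers who want to see the idea of layered, weighted computation in concrete form, here is a minimal sketch in plain Python. It is an illustration only, not the code of any system described in this article: the function names (`relu`, `dense`, `forward`), the three-value "image," and every weight are made up for the example. Real vision networks have millions of weights and run on GPU libraries rather than Python loops.

```python
# Illustrative sketch only: a tiny two-layer network in plain Python.
# All names, sizes, and weight values are hypothetical.

def relu(values):
    """Replace negative activations with zero (a common nonlinearity)."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One layer: each output is a weighted sum of the inputs plus a bias.

    Every output row is independent of the others, which is why a GPU
    can compute all of them at the same time.
    """
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]

def forward(pixels, layers):
    """Pass the input through each layer in turn and return the final scores."""
    activations = pixels
    for i, (weights, biases) in enumerate(layers):
        activations = dense(activations, weights, biases)
        if i < len(layers) - 1:  # no nonlinearity after the last layer
            activations = relu(activations)
    return activations

# A made-up three-value "image" and two made-up layers.
PIXELS = [0.5, -1.0, 2.0]
LAYERS = [
    ([[1.0, 0.0, 1.0], [0.0, 1.0, 0.0]], [0.0, 0.5]),  # 3 inputs -> 2 outputs
    ([[2.0, -1.0]], [0.25]),                           # 2 inputs -> 1 output
]

SCORE = forward(PIXELS, LAYERS)
print(SCORE)
```

Training, which this sketch leaves out, consists of repeatedly adjusting the weights so that the final score moves closer to the correct label for each training image.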
“Training can use such vast amounts of computation that there are some problems constrained simply by the speed of processors,” says computer scientist Lowe. “However, training is highly parallel, meaning that just devoting more money to large data centers makes it possible to train problems of any size, so the decision can become simply an economic one: How many dollars should be devoted to finding the best solution to a given data set?”
Thousands of chips
For video analysis, for example, each new Nvidia A100 GPU includes five video decoders (compared with one in the previous-generation chip), letting the performance of video decoding match that of AI training and prediction software. The chips include technology for detecting and classifying JPEG images and segmenting them into their component parts, an active area of computer vision research. Nvidia, which is acquiring mobile chip maker Arm Holdings, also offers software that takes advantage of the A100’s video and JPEG capabilities to keep GPUs fed with a pipeline of image data.
Using Oracle Cloud, businesses can run applications that connect GPUs via a high-speed remote direct memory access network to build clusters of thousands of graphics chips at speeds of 1.6 terabits per second, says Sanjay Basu, Oracle Cloud engineering director.
An oil and gas reservoir modeling company in Texas uses Oracle Cloud Infrastructure to classify images taken from inside wells to determine promising drilling sites, Basu says. It also employs so-called AI “inference” to make decisions on real-world data after training its machine learning system.
94% of executives say their organizations already use computer vision or plan to in the next year.1
An auto insurance claims inspector runs a cluster of computers in Oracle’s cloud that train a machine learning system to recognize photos of cars damaged in accidents. Insurers can make quick repair estimates after drivers, using an insurer-provided app, send them photos snapped with their phones.
Oracle is also in discussions with European automakers about applying its cloud computing infrastructure to train automated driving systems based on images and video of traffic and pedestrians captured during test runs.
In a Deloitte survey of more than 2,700 IT and business executives in North America, Europe, China, Japan, and Australia published this year, 56% of respondents said their companies are already using computer vision, while another 38% said they plan to in the next year. According to research firm Omdia, the global computer vision software market is expected to grow from $2.9 billion in 2018 to $33.5 billion by 2025.
1 Source: Deloitte Insights “State of AI in the Enterprise” report, 2020.