AI - Deep Learning - Computer Vision

Updated: Feb 1, 2021

For an introduction to key terminology on AI and Deep Learning, please refer to this article.


What it is and the value it drives

Computer Vision is an area of Deep Learning that allows computers to understand images and “predict” what an image shows. It is called a prediction because what is easy for humans is often hard for computers, so the model outputs its best estimate rather than a certainty. In many domains computers are closing in on human accuracy, and in some situations they even exceed it. At its core, computer vision is still trying to predict "yes" or "no", but this time for an image. What the computer “sees” are the numbers that correspond to the red, green, and blue values of each pixel in the image, which the algorithm learns to associate with what the image depicts. Its origins were playful, involving apps that detected whether an image was a cat or not a cat. It has since evolved into significant commercial applications, popular ones being autonomous driving, precision agriculture, and pathology detection in X-rays.
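To make the idea of an image as numbers concrete, here is a minimal sketch in plain Python. The 2x2 image and the helper name `flatten_pixels` are illustrative assumptions and do not come from any particular library.

```python
# A tiny 2x2 color image, written as rows of (red, green, blue) values.
# Each value ranges from 0 (none) to 255 (full intensity).
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]

def flatten_pixels(img):
    """Turn the image into one flat list of numbers scaled to the range 0..1."""
    return [channel / 255 for row in img for pixel in row for channel in pixel]

features = flatten_pixels(image)
print(len(features))  # 2 rows x 2 columns x 3 channels = 12 numbers
print(features[:3])   # the first pixel (pure red): [1.0, 0.0, 0.0]
```

A list of numbers like `features` is what a model actually receives as input. A real photo simply produces a much longer list.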


While computer vision may sound complex, the basic workflow is not. First, it depends on data sets of images, some of which are actually quite small. This is the case for X-rays, where there may be only a few thousand images, or even fewer where the pathology is positive, and the term now used for this is “small data”. That image data in turn needs to be labeled: in the case of an X-ray, an experienced doctor needs to label the image with a 1 or 0 to indicate whether the pathology is positive or negative. In the case of road images for autonomous driving, each image needs to be labeled to specify whether it contains, say, a car or a pedestrian. Today, there are many third-party companies that label data sets. These labeled images are fed into the “Neural Network Model”, which is simply a large collection of formulas, each simple in its own right, that when stacked together produce a powerful prediction. The trained model is then used to predict what a future incoming image is or, better said, the probability of what a new image is.
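The workflow described above (labeled images in, a trained model out, then a probability for a new image) can be sketched with a single artificial "neuron" in plain Python. This is a toy under stated assumptions: the four-pixel "images", the labels, and the function names `predict` and `train` are all hypothetical, and a real neural network stacks many such units and trains on far more data.

```python
import math

# Toy labeled data set: each "image" is 4 brightness values in 0..1.
# Label 1 = bright image (stand-in for "positive"), 0 = dark image.
images = [
    [0.9, 0.8, 0.9, 0.7],
    [0.8, 0.9, 0.7, 0.9],
    [0.1, 0.2, 0.1, 0.3],
    [0.2, 0.1, 0.3, 0.1],
]
labels = [1, 1, 0, 0]

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(weights, bias, pixels):
    """One simple formula: weighted sum of the pixels, then sigmoid."""
    z = sum(w * p for w, p in zip(weights, pixels)) + bias
    return sigmoid(z)

def train(images, labels, epochs=2000, lr=0.5):
    """Nudge the weights after each example to reduce the prediction error."""
    weights = [0.0] * len(images[0])
    bias = 0.0
    for _ in range(epochs):
        for pixels, label in zip(images, labels):
            error = predict(weights, bias, pixels) - label
            weights = [w - lr * error * p for w, p in zip(weights, pixels)]
            bias -= lr * error
    return weights, bias

weights, bias = train(images, labels)

# A new, unseen bright image: the model returns a probability, not a certainty.
new_image = [0.85, 0.9, 0.8, 0.75]
print(round(predict(weights, bias, new_image), 3))
```

The printed value should be close to 1 for this bright image and close to 0 for a dark one, which is the "probability of what a new image is" mentioned above.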


For applications that deal with recognizing images, the most popular type of Neural Network is the CNN, or Convolutional Neural Network. Images are large, and thus computationally expensive to process. A CNN reduces the size of an image by applying a 'filter' or 'convolution' that shrinks it over several steps while keeping key attributes of the image, such as edges and other distinguishing features. Thus, by the time the image has made its way through the neural network, it is significantly smaller to analyze but retains the key properties that allow the model to predict what it is. All the above steps and the CNN example can be seen below.
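As a rough illustration of how a filter shrinks an image while preserving an edge, the following plain-Python sketch applies one hand-written 3x3 filter and one pooling step to a 6x6 grayscale image. The image, the filter values, and the function names `convolve` and `max_pool` are assumptions chosen for illustration. In a real CNN the filter values are learned during training, and many filters are applied in parallel.

```python
# A 6x6 grayscale image: dark (0) on the left, bright (9) on the right.
image = [[0, 0, 0, 0, 9, 9] for _ in range(6)]

# A 3x3 filter that responds to vertical edges (dark-to-bright, left to right).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

def convolve(img, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so strictly speaking this is a cross-correlation.
    """
    k = len(kernel)
    out_h = len(img) - k + 1
    out_w = len(img[0]) - k + 1
    return [
        [
            sum(img[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(img, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(img[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(img[0]) - size + 1, size)
        ]
        for r in range(0, len(img) - size + 1, size)
    ]

feature_map = convolve(image, vertical_edge)  # 6x6 -> 4x4
pooled = max_pool(feature_map)                # 4x4 -> 2x2

print(feature_map[0])  # [0, 0, 27, 27]
print(pooled)          # [[0, 27], [0, 27]]
```

The 36 input numbers are reduced to 4, yet the result still records that the edge sits on the right-hand side of the image.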



The value of this technology comes from improved speed and quality in many business processes, as well as cost savings from automating tasks within a process. Because most companies are organized around a value chain, which is in turn broken down into key processes, many of these processes are candidates for automation with computer vision.


Where it is today

There is much good news today for organizations looking to deploy computer vision. A great number of data sets are available to train models, and given the broad reach of mobile phones it is easy to produce large data sets quickly. There are third-party companies, such as iMerit, that help label images quickly. The major cloud platforms all offer services to train models and predict what images are, such as Google AutoML and Amazon Rekognition. These services all come with tools to test model accuracy. And for those wishing to deploy their own custom models, the algorithms are openly available. For example, YOLO (You Only Look Once), a widely used real-time object-detection algorithm relevant to applications such as autonomous driving, can be downloaded from GitHub.
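To give a sense of what testing model accuracy involves, here is a minimal sketch that computes three common metrics from a list of true labels and a list of predicted labels. It is not the API of Google AutoML or Amazon Rekognition. The function name `evaluate` and the sample labels are hypothetical.

```python
def evaluate(true_labels, predicted_labels):
    """Compute accuracy, precision, and recall for 1/0 labels."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # correctly flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # correctly cleared
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # missed positives
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

# Hypothetical held-out test set: 1 = pathology present, 0 = absent.
true_labels = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 0, 1, 0]

metrics = evaluate(true_labels, predicted)
print(metrics)  # {'accuracy': 0.875, 'precision': 1.0, 'recall': 0.75}
```

Here the model raised no false alarms (precision 1.0) but missed one of the four positive cases (recall 0.75), a distinction that matters a great deal in settings such as X-ray screening.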


In recent months, Japanese companies have created some marquee applications of AI computer vision.


The Japanese car auction company AUCNET used Deep Learning computer vision to predict the price of a used car on konpeki.io. Pricing a used car is the type of process typically done with flow charts, data science, and rule-based systems that evaluate areas of a car such as the body, tires, or interior. Traditional data analytics can show that a certain type of car is underpriced and thus prompt an appraiser to make a correction. The deep-learning-powered application, by contrast, simply requires uploading images of the car from various angles and uses these images to predict the price of the car, dramatically shortening the process and improving accuracy. To achieve this, AUCNET trained a model on many images of cars that had been "labeled" with the type of car, hence enabling the prediction.


Another talked-about Deep Learning computer vision application is Tuna Scope (tuna-scope.com), developed by Dentsu in Japan. Its model predicts the freshness of Japanese tuna from a single phone picture of a cross-section of the tuna’s tail. This application is a great example of automation in a case where there are fewer and fewer experts, and knowledge needs to be captured to avoid being lost.


Globally, some of the most notable uses of computer vision have been autonomous driving and X-ray pathology detection.


How the technology will continue to evolve

As with traditional ML, there will be three important developments in computer vision.


First, highly creative use cases will emerge to show how processes can be dramatically improved using image recognition. For example, Google and the US Navy recently partnered to use images taken by drones to detect where ships have rusty parts, reducing the need for long maintenance walk-throughs.


Second, the predictive power of these models will be increased by augmenting data with synthetic data, or otherwise improving the models.
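One simple way to enlarge a data set artificially is to create altered copies of the images already labeled. The sketch below shows this with a horizontal flip and a brightness shift in plain Python. It covers only this basic form of augmentation, not fully generated synthetic images. The function names and the tiny 2x2 grayscale image are assumptions made for illustration.

```python
import random

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [list(reversed(row)) for row in img]

def shift_brightness(img, delta):
    """Brighten or darken every pixel, clamped to the valid 0..255 range."""
    return [[min(255, max(0, v + delta)) for v in row] for row in img]

def augment(dataset, seed=0):
    """Return the originals plus two altered copies of each labeled image."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    augmented = []
    for img, label in dataset:
        augmented.append((img, label))
        augmented.append((flip_horizontal(img), label))
        augmented.append((shift_brightness(img, rng.randint(-30, 30)), label))
    return augmented

# Hypothetical data set: one 2x2 grayscale image labeled 1.
dataset = [([[10, 200], [30, 250]], 1)]
bigger = augment(dataset)

print(len(bigger))   # 3 examples from 1 original
print(bigger[1][0])  # flipped copy: [[200, 10], [250, 30]]
```

Each altered copy keeps the label of its original, so the extra examples cost no additional labeling effort.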


Third, more use will be made of AutoML models. These have continued to improve in accuracy, getting better at recognizing images and accounting for variations in the image data.


The key applications

Industry sectors across the economy will be able to drive significant value from this technology:

  • Retail and Retail Security: Where it can be used to help users rapidly recognize and shop for products they want, as well as protect physical stores and other assets.

  • Manufacturing: Where image recognition can be used to detect defects using pictures taken directly on the manufacturing line.

  • Healthcare: Radiology applications for key pathologies across the board will continue to flourish and compensate for the shortage of radiologists.

  • Agriculture: Precision agriculture analyzes images of crops to specify where they may need fertilizers or irrigation, saving on agricultural inputs.

  • Government: Where it can be leveraged for document recognition and where, if used appropriately, facial recognition can reduce fraud and enable authentication.
