
AI - Deep Learning – Language and Speech Processing

Updated: Feb 1, 2021

For an introduction to key terminology on AI and Deep Learning, please refer to this article.


What it is and the value it drives

A smart speaker narrating the news, a virtual assistant setting up a reminder for an upcoming appointment, text and audio translated automatically in real time, a chat bot on a website resolving a customer service issue: all of these are applications of Natural Language Processing (NLP), the field of AI that deals with understanding, manipulating, and generating text. It spans text-to-speech, speech-to-text, machine translation, and conversational AI. Automatic Speech Recognition (ASR) develops applications that convert speech sounds to text; speech synthesis (or Text-To-Speech, TTS) deals with the reverse process. These fields are adjacent to Optical Character Recognition (OCR), which, while technically part of computer vision, enables computers to read printed or handwritten documents.

At their core, NLP models are also trying to predict something: the next word, or more precisely, the "probability of a sentence" given the words it contains. By capturing grammar rules and statistical patterns in text, NLP models solve "tasks", i.e., very specific use cases in text analysis.
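To make the "probability of a sentence" idea concrete, here is a minimal sketch using a simple bigram model built from a toy, invented corpus. Real NLP systems are trained on billions of words and use neural networks rather than raw counts, but the chain-rule intuition is the same.

```python
from collections import Counter

# Toy corpus, invented for illustration; real models train on billions of words.
corpus = [
    "the customer asked a question",
    "the agent answered the question",
    "the customer thanked the agent",
]

# Count single words and adjacent word pairs across the corpus.
unigrams, bigrams = Counter(), Counter()
for sentence in corpus:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    unigrams.update(tokens)
    bigrams.update(zip(tokens, tokens[1:]))

def sentence_probability(sentence: str) -> float:
    """Chain rule: P(sentence) ~= product of P(word | previous word)."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing so unseen word pairs do not zero out the product.
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(unigrams))
    return prob

print(sentence_probability("the customer asked a question"))   # relatively high
print(sentence_probability("question a asked customer the"))   # much lower
```

The model assigns a higher score to the word order it has seen before, which is exactly the signal NLP tasks build on.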

The promise of NLP is to automate the process of making sense of unstructured text data, enabling companies to consume vastly larger amounts of text with a fraction of the human effort. This capability can be applied to tasks previously thought to require human judgment, such as research, customer interactions, and document translation.

Where it is today

Business use cases for speech and language processing can be grouped into five categories: customer experience analytics, virtual agents and contact center analytics, augmented search, automated document review, and natural language generation. Following are some examples of each in turn.


Customer experience analytics. Companies can analyze social media posts and customer service transcripts to understand customer sentiment and satisfaction. For example, apparel company Puma, in partnership with Cloud Cherry, improved its Net Promoter Score (NPS) by 20% by using sentiment analysis, and reduced staff-related complaints by 40%. (A minimal sketch of this kind of sentiment scoring appears after these examples.)

Virtual agents and contact center analytics. "Chat bots" are conversational interfaces that help customers navigate a service and resolve issues with limited human intervention. These models are trained on question-and-answer pairs so that they can predict the best response to a customer's question. They are widely used today for straightforward transactions such as ordering a pizza or booking a plane ticket. In 2018, Shiseido's customer service team integrated BEDORE, a "dialogue engine", to process chat and voice conversations. Since deployment, the system has handled 80% of customer enquiries, expanding the support window and reducing the workload on human operators.

Augmented search. Navigating the deluge of unstructured content in search of answers is a time-consuming undertaking in most functions, from R&D to Legal to Customer Support. "Information extraction" systems can navigate vast pools of information based on unstructured queries to find and synthesize insight. For example, Japanese medical startup Ubie offers hospitals an AI-driven clinical questionnaire that helps doctors identify potential diseases based on patients' reported symptoms.

Automated document review. From Risk and Compliance to resume screening, NLP can be used to process massive volumes of text in real time, seeking out anomalies in key documents. In June 2020, legal tech startup MNTSQ launched a partnership with the Bank of Fukuoka to pilot its contract generation and review technology, which will reduce contract processing time and scan contracts for risky clauses.

Natural Language Generation (NLG). NLG techniques are used to create reports from financial records and press searches, and even to translate documents between languages. Readily available translation arrived with Google Translate in 2006, and now more than 100 billion words are translated every day.
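As promised above, here is a minimal sketch of the sentiment-scoring step behind customer experience analytics, using the open-source Hugging Face transformers library with its default English sentiment model. The sample reviews are invented; a production system would add language-specific models, batching, and error handling.

```python
from transformers import pipeline

# Downloads a default pretrained English sentiment model on first use.
classifier = pipeline("sentiment-analysis")

# Invented sample feedback; in practice this would come from social media
# posts or customer service transcripts.
reviews = [
    "The delivery was fast and the staff were very helpful.",
    "I waited forty minutes on the phone and nobody answered.",
]

for review, result in zip(reviews, classifier(reviews)):
    print(f"{result['label']:8s} ({result['score']:.2f})  {review}")
```

Aggregating these per-message labels over time is what turns raw transcripts into metrics such as sentiment trends or complaint categories.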

How the technology will continue to evolve

The next decade will likely see a steady increase in the quality of NLP models. Progress is being made in accuracy on various tasks, and research is constantly pushing the state-of-the-art in increasingly complex tasks such as document summarization, speech recognition, and question answering. Some of the most exciting developments on the horizon concern the quality of text and speech generation models.

The accuracy of NLP systems across a broad range of tasks will continuously improve, sustained by reduced costs of high-performance computing resources, greater availability of training data from various industries and use cases, and recent developments in Deep Learning for language.


From 2018, the introduction of Deep Learning NLP models ushered in a new era of advanced use cases underpinned by better accuracy and shorter lead times. In summer 2020, OpenAI launched its powerful GPT-3 model in beta. GPT-3 can interpret human language, such as writing prompts, questions, or other instructions, and in response generate prose, provide information, and even produce working code in a programming language. In Japan, similar models are being replicated, such as ELYZA. Provided sufficient Japanese data sets are available, these powerful models can be integrated into multiple business applications.
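GPT-3 itself is only available through OpenAI's hosted API, but the same prompt-and-generate pattern can be sketched locally with a smaller, openly available model such as GPT-2 via the transformers library. The prompt below is invented, and output quality will be far below GPT-3; the point is the interaction pattern, not the result.

```python
from transformers import pipeline

# GPT-2 stands in here for much larger generative models such as GPT-3,
# which are accessed through a hosted API rather than run locally.
generator = pipeline("text-generation", model="gpt2")

prompt = "Customer: My order arrived damaged. Agent:"
outputs = generator(prompt, max_length=60, num_return_sequences=1)

print(outputs[0]["generated_text"])
```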



The key future applications

In Japan as well as globally, NLP applications are likely to find their way into more and more sectors and use cases. This expansion will be driven by three factors: confidence in the technology and ongoing accuracy improvements; availability of low-code solutions, supported by commoditization of NLP as a service; and improved data maturity of companies, which will provide easy access to business-critical data for strategic and operational analysis.

In particular, there are four domains in which more widespread adoption in Japan is expected.

Chat bots. Improved text generation techniques will enable chat bots to satisfy the demand for 24/7 instant customer service at reduced operational costs. The global chat bot market is anticipated to reach $9.4 billion by 2024, with retail and banking driving growth. IDC forecasts that, as soon as 2022, 30% of enterprises will be using conversational interfaces for customer engagement.

High accuracy and real-time machine translation. Machine translation is set to become even more pervasive in both personal and working life, making it possible to engage seamlessly with content from around the world and to hold conversations across language barriers more smoothly than before. Translation has a clear effect on cross-border collaboration: the introduction of machine translation on eBay, for example, boosted international trade on the marketplace, significantly increasing the number of items sold from the US and Europe to other countries.
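As a small illustration of how machine translation can be embedded in a workflow, the sketch below uses an open-source Japanese-to-English Marian model through the transformers library. The model name and sample sentence are assumptions for illustration only, not a recommendation of a particular system.

```python
from transformers import pipeline

# Assumed model name: an open-source Japanese-to-English Marian model
# published by the Helsinki-NLP group on the Hugging Face hub.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-ja-en")

text = "本日の会議は午後3時に開始します。"  # "Today's meeting starts at 3 p.m."
result = translator(text)

print(result[0]["translation_text"])
```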

Speech recognition. Human-level speech recognition and synthesis are also quickly approaching, driving diverse use cases such as automated caption generation, instant transcription services, and even more natural AI assistants.
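As a minimal sketch of transcription, the example below uses the open-source SpeechRecognition package, which wraps several recognition back ends. The audio file name is a placeholder, and the Google Web Speech back end shown here requires an internet connection; offline engines can be swapped in via the library's other recognize_* methods.

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

# "meeting.wav" is a placeholder for any mono WAV/AIFF/FLAC recording.
with sr.AudioFile("meeting.wav") as source:
    audio = recognizer.record(source)

# Sends the audio to the free Google Web Speech API back end.
try:
    print(recognizer.recognize_google(audio))
except sr.UnknownValueError:
    print("Speech was not intelligible enough to transcribe.")
```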

Document processing. As companies move towards establishing data lakes, the costs of integrating information extraction and document processing technologies will fall, and it will become easier to reap the automation and analytics benefits of these solutions.
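As a minimal sketch of the information-extraction step in document review, the example below pulls named entities, dates, and amounts out of a contract-like sentence with the open-source spaCy library. The sample text is invented, and the small English model must be downloaded separately.

```python
import spacy

# Requires: python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Invented contract-style sentence for illustration.
text = (
    "This agreement between Acme Corp and Nippon Trading K.K. takes effect "
    "on 1 April 2021 and includes a penalty of 5 million yen for late delivery."
)

doc = nlp(text)
for ent in doc.ents:
    print(f"{ent.label_:10s} {ent.text}")
```

Extracted entities like these are the raw material for flagging risky clauses, populating contract databases, or routing documents for review.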

