Machine learning is enjoying well-justified hype as organizations uncover more of its potential benefits. Its algorithms produce accurate results and therefore deliver real value to businesses. They are highly effective at tasks such as object detection, character recognition and behaviour prediction. Moreover, new algorithms that fuse existing ideas and models can handle even more complex tasks, such as generating natural language descriptions of images or videos.
How neural networks pay attention
In recent years, neural networks have made significant progress in image and natural language processing. They have learned to recognize, localize and segment objects in images, as well as to translate natural language and answer questions. One of the methods behind this progress is the introduction of neural attention models. These models enable neural networks to select and work with particular pieces of data at a given moment.
The main task of the neural attention mechanism is to learn where the important information is located. Let’s look at the example of a neural machine translator.
First, the words of the input sentence are fed into the encoder, which compresses the sentence’s meaning into a so-called thought vector. Based on this vector, the decoder produces the words of the output sentence one by one. At every step, the attention mechanism helps the decoder focus on different fragments of the input sentence.
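A single decoder step of this process can be sketched in a few lines of plain Python. This is a minimal illustration of dot-product attention, not the implementation used by any particular translator: the function names (softmax, attend), the two-dimensional vectors and their values are all assumptions made up for the example, and in a real model the states are learned.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    Each encoder state is scored by its dot product with the decoder
    state; the context is the weighted average of the encoder states.
    """
    scores = [sum(d * e for d, e in zip(decoder_state, state))
              for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[i] for w, state in zip(weights, encoder_states))
               for i in range(dim)]
    return weights, context

# Made-up encoder states, one per word of a three-word input sentence.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
# Hypothetical decoder state at the current output step.
decoder_state = [2.0, 0.0]

weights, context = attend(decoder_state, encoder_states)
print(weights)  # the first input word receives the largest weight
print(context)  # the summary of the input that the decoder sees at this step
```

The weights are recomputed at every output step, which is what lets the decoder shift its focus from one fragment of the input sentence to another.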
How to use neural attention
The neural attention mechanism can be instrumental in automating complex data processing tasks that involve large volumes of fast-moving data. Here are a few practical applications:
Neural machine translation
Machine translation improved significantly in 2017 with the introduction of bidirectional residual seq2seq (sequence-to-sequence) neural networks with an attention mechanism. The mechanism determines the importance of each word in the input sentence in order to extract additional local information about that word. As a result, modern tools can now produce high-quality translations of lengthy and complex sentences.
The best illustration of neural attention’s successful use is the neural machine translator under the hood of Google Translate.
Here is a good example that illustrates the distribution of attention when translating from English into French. The language model of the decoder and the attention mechanism were trained to produce the correct word sequence in the output sentence.
Text summarization
Writing annotations for articles and other texts is daunting and time-consuming, especially when the data is vast and heterogeneous. Attention models can pinpoint the most significant parts of a text and compose a meaningful headline, so you no longer have to read the full text to capture its central idea. Text summarization can do this in an instant, and the results are useful for generating web page titles, high-level information research or segmenting information for fast reading.
Chatbots with a question-answering system
In search of efficiency, businesses try to automate as many routine processes as possible; however, a perfect tool for human and machine interaction has not been created yet. Natural language processing (NLP) is not an easy task, but with the attention mechanism, its accuracy increases dramatically. The attention mechanism can detect the most significant (key) words, even from a long and complex question, and produce the right answer. The mechanism can be implemented as an add-on and work together with the neural network on a common knowledge base. The chatbots then bring the mechanism behind machine translation to a higher level of abstraction: translating from one verbal sequence to another.
Image processing
The neural attention mechanism can be used to improve the results of computer vision tasks.
Image recognition
A convolutional neural network with an attention mechanism can learn where the significant objects are in the image and make decisions based on the information flowing from that area.
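As a rough sketch of how such spatial attention might work, the example below scores every cell of a tiny feature map against a query vector and pools the features with the resulting weights. The 2 x 2 feature map, the query vector and the function name spatial_attention are assumptions invented for illustration; in a trained network both the features and the query are learned.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def spatial_attention(feature_map, query):
    """Return (heatmap, pooled) for a grid of feature vectors.

    The heatmap has one weight per grid cell; pooled is the
    attention-weighted average of all the cell features.
    """
    cells = [cell for row in feature_map for cell in row]
    scores = [sum(q * f for q, f in zip(query, cell)) for cell in cells]
    weights = softmax(scores)
    width = len(feature_map[0])
    heatmap = [weights[r * width:(r + 1) * width]
               for r in range(len(feature_map))]
    pooled = [sum(w * cell[i] for w, cell in zip(weights, cells))
              for i in range(len(query))]
    return heatmap, pooled

# Made-up 2 x 2 feature map with two channels per cell; the bottom-right
# cell responds strongly in the first channel.
feature_map = [
    [[0.1, 0.0], [0.2, 0.1]],
    [[0.0, 0.1], [3.0, 0.2]],
]
query = [1.0, 0.0]  # hypothetical query that looks for the first channel

heatmap, pooled = spatial_attention(feature_map, query)
print(heatmap)  # the largest weight sits on the bottom-right cell
print(pooled)   # features dominated by the attended area
```

A classifier that reads the pooled vector instead of a plain average is effectively deciding on the basis of the attended area.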
Natural language image captioning
The idea is the same as in image recognition, but when captioning an image, the attention heatmap changes with each word of the generated sentence.
The neural network can translate everything it sees in the image into words. Look how the network distributes its attention at different stages of formulating the description.
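The way attention moves across the image from word to word can be imitated with a toy example. Everything here is hypothetical: the three named image regions, their one-hot features and the per-word decoder states are invented so that the shift of focus is easy to see, whereas a real captioning model learns all of them.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_over_regions(state, region_features):
    """Weight each image region by its dot product with the decoder state."""
    scores = [sum(s * f for s, f in zip(state, features))
              for features in region_features]
    return softmax(scores)

# Hypothetical image regions and their made-up feature vectors.
region_names = ["dog", "frisbee", "grass"]
region_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Hypothetical decoder states at the steps that emit each caption word.
per_word_states = [
    ("dog", [3.0, 0.0, 0.0]),
    ("frisbee", [0.0, 3.0, 0.0]),
    ("grass", [0.0, 0.0, 3.0]),
]

focus = {}
for word, state in per_word_states:
    weights = attention_over_regions(state, region_features)
    # Record the region that receives the most attention for this word.
    focus[word] = region_names[weights.index(max(weights))]

print(focus)
```

Each pass through the loop produces a new set of weights over the same regions, which corresponds to a new heatmap for every generated word.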
This function has rich practical potential. Captioning can be used to automate the creation of hashtags or subtitles, to write descriptions for blind people and even to produce daily surveillance reports for security guards.
CAPTCHA solving
The attention mechanism turned out to be very successful in solving CAPTCHAs that are based on recognition and segmentation of noisy or distorted pictures with subsequent text input. The CAPTCHA image is solved by the deconvolution method that creates the segmentation mask. Such segmentation can be used as a simple convolutional variant of attention.
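To see why a segmentation mask can act as a simple form of attention, consider multiplying the image by the mask pixel by pixel. The sketch below assumes the mask has already been produced by some segmentation network; the image values, the mask and the function name apply_mask are made up for illustration.

```python
def apply_mask(image, mask):
    """Scale every pixel by its mask weight (0 = ignore, 1 = keep)."""
    return [[pixel * weight for pixel, weight in zip(image_row, mask_row)]
            for image_row, mask_row in zip(image, mask)]

# Made-up grayscale image: character strokes mixed with noise.
image = [
    [0.9, 0.2, 0.8],
    [0.1, 0.7, 0.3],
]
# Hypothetical segmentation mask marking the pixels that belong to characters.
mask = [
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
]

attended = apply_mask(image, mask)
print(attended)  # noise pixels are zeroed, character pixels pass through
```

A mask with values between 0 and 1 would work the same way and would give a softer weighting of the pixels.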
A CAPTCHA solving function can find its potential application with chatbots that have to parse third-party sites to answer queries.
Image-based question-answering systems
Like conventional linguistic question-answering systems, an image-based question-answering system takes a natural language input, but instead of consulting a knowledge base, it uses the attention mechanism to find the answer in the image.
Conclusion
The attention mechanism has proven to be an extremely effective tool capable of solving intellectual problems that require a detailed and gradual analysis of input data.
Usually, the introduction of the attention mechanism can improve the performance of the algorithms that are already in use by businesses. At the same time, attention models do not cause any significant increase in the existing algorithm’s computational complexity. In addition, the attention mechanism can help businesses cope with tasks that could not be solved successfully before: processing or generating data with a temporal structure.
About the Author
Mikhail Konstantinov is a Data Scientist at ELEKS. Mikhail’s core strengths are Python, TensorFlow (Google’s Deep Learning Framework), PyTorch and Keras. He is particularly skilled in handling high volumes of structured and unstructured data, building statistical analysis and predictive models, data and feature engineering, and machine learning. Deep learning has been Mikhail’s recent focus; he has over 3 years of commercial experience in building ML models with modern deep learning frameworks and advanced neural architectures used for image recognition and segmentation, text processing, behavioral factors analysis, and neural machine translation.