Applied Machine Learning in Affective Computing and Human Movement

Machines’ Limitations and Motives

Technological advances have touched every field of life and have steadily raised human expectations over the last two decades. Machines are now expected to act and behave like humans, which is a very ambitious demand. The main difference lies in human intelligence, which is based on learned knowledge. Humans acquire knowledge through observation, training, and experience. Machines need not only knowledge but also the ability to think, feel, express, and decide. Efforts to endow machines with this kind of intelligence have been reported using different approaches, with partial success.

Machines are now capable of dealing with complex situations within the single domain in which they have been trained. To enable them to solve complex problems in every domain of life, we would need to train them on every domain of knowledge, which is neither simple nor viable. Without going into psychological and physiological technicalities, the AI challenges can be categorized as situational, mental, textual, visual/audio, and hybrid.

Challenges in Detecting Human Emotions

Artificial Intelligence (AI) is now part of daily life in one way or another. It is the study of the principles of rational agents and of the components for constructing them, as described by Russell and Norvig [1]. The rationality of these agents covers a broad range of mental attitudes [2, 3]. Notions of rationality are now also helpful in explaining the limitations of humans in daily life [4]. In other words, data sets about our daily activities can now easily be recorded and used for analysis or implemented in an AI-based system. There are, however, some exceptions where further investigation and research are required.

One of the main challenges is how to monitor the emotional state of a human. According to Shaver [5], the basic emotions are love, joy, surprise, anger, sadness, and fear. The problem is compounded when secondary and tertiary emotions must also be identified. For example, the primary emotion “anger” has six secondary emotions, including “irritation”, and irritation itself has six tertiary emotions: aggravation, irritation, agitation, annoyance, grouchiness, and grumpiness. Furthermore, a machine should also detect the intensity of an emotion, because a human can judge the level of anger or of any other emotion. Such detection is not impossible if brain signals can be intercepted and translated into an emotional state. The brain-computer interface (BCI), sometimes called a mind-machine interface (MMI), is an area of study where further research is needed to achieve the goals of AI.
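
The three-level hierarchy described above can be sketched as a nested mapping. This is only a minimal illustration: the one branch named in the text (anger, irritation, and its tertiary emotions) is filled in, the other branches are left empty rather than guessed, and the `EmotionReading` type with its 0-to-1 `intensity` scale is a hypothetical choice made for this example.

```python
from dataclasses import dataclass

# Primary emotions as listed in the text (Shaver [5]).
PRIMARY_EMOTIONS = ("love", "joy", "surprise", "anger", "sadness", "fear")

# primary -> secondary -> tertiary. Only the branch given in the text is
# filled in; the other branches are intentionally left empty.
EMOTION_TREE = {name: {} for name in PRIMARY_EMOTIONS}
EMOTION_TREE["anger"]["irritation"] = (
    "aggravation", "irritation", "agitation",
    "annoyance", "grouchiness", "grumpiness",
)


@dataclass(frozen=True)
class EmotionReading:
    """A detected emotion with an intensity (hypothetical 0.0-1.0 scale)."""
    label: str
    intensity: float


def resolve(label):
    """Return the (primary, secondary, tertiary) path for a label, or None.

    A label that names both a secondary and a tertiary emotion (such as
    "irritation") resolves to the secondary level.
    """
    for primary, secondaries in EMOTION_TREE.items():
        if label == primary:
            return (primary, None, None)
        for secondary, tertiaries in secondaries.items():
            if label == secondary:
                return (primary, secondary, None)
            if label in tertiaries:
                return (primary, secondary, label)
    return None
```

For instance, `resolve("annoyance")` walks up to `("anger", "irritation", "annoyance")`, so a detector that emits a fine-grained label can still be reported at the primary level.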

Challenges in Detecting Human Sentiments

Ever since the introduction of big data, data mining techniques have received more attention from the research community. Sentiment analysis, also referred to as text analysis or opinion mining, is one of the basic and most challenging tasks of affective computing. The technology has evolved, and soon services and products will be defined according to public opinions and reviews, which will increase the prominence of sentiment analysis and affective computing. Text analysis requires efficient algorithms to detect the correct sentiment in a text. The literature shows that texts can be successfully analyzed and classified into narrow categories (binary polarity), for example positive or negative, or good or bad. Again, these achievements are partial, and further research is needed before text analysis can give detailed results.

Opinion mining is usually used to classify reviews of products or services. Expanding on the basics, Syner [6] proposed techniques to classify reviews on a multi-scale basis for more in-depth analysis. After reading a paragraph of text, a human can comment on it in an objective as well as a subjective manner. For machines, classifying text as objective or subjective is a more difficult task. The main reason is that in some texts the subjectivity of words depends largely on their context, while in others an objective paragraph may contain some subjective sentences. This problem can be dealt with by removing objective sentences before the text is classified by binary polarity, as demonstrated by Pang [7].
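
The two-stage order described above (drop objective sentences first, then classify polarity) can be sketched as follows. This is not the method of [7]; it is a toy illustration in which the word lists, the naive sentence splitter, and all function names are assumptions made for this example.

```python
import re

# Hypothetical toy lexicons, chosen only for this illustration.
POSITIVE_WORDS = {"love", "great", "good", "excellent"}
NEGATIVE_WORDS = {"hate", "terrible", "bad", "awful", "disappointing"}
SUBJECTIVE_WORDS = POSITIVE_WORDS | NEGATIVE_WORDS


def tokenize(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def split_sentences(text):
    """Naive sentence splitter on '.', '!' and '?'."""
    return [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains any opinion word."""
    return any(tok in SUBJECTIVE_WORDS for tok in tokenize(sentence))


def classify_polarity(text):
    """Stage 1: drop objective sentences. Stage 2: binary polarity vote."""
    subjective = [s for s in split_sentences(text) if is_subjective(s)]
    score = 0
    for sentence in subjective:
        for tok in tokenize(sentence):
            score += (tok in POSITIVE_WORDS) - (tok in NEGATIVE_WORDS)
    if score == 0:
        return "undecided", subjective
    return ("positive" if score > 0 else "negative"), subjective


review = "The phone has a 6 inch screen. The battery life is great. I love the camera."
label, kept = classify_polarity(review)
```

Here the first sentence is filtered out as objective and only the remaining two are scored. In this toy version the filter does not change the score, since objective sentences contain no lexicon words; with a trained classifier in the second stage, the filter would determine which sentences the classifier sees.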

Several tools are available for opinion mining. Some of the most widely used are:

Natural Language Toolkit (NLTK): provides libraries for statistical natural language processing in Python.
OpenNLP: provides a library for machine learning-based natural language processing.
DiscoverText: provides cloud-based text analysis features such as an active-learning machine classification engine.
Microsoft Distributed Machine Learning Toolkit (DMTK): provides support for data parallelization, a distributed word embedding algorithm, and a topic model algorithm.
SentiWordNet: provides a lexical resource for opinion mining.
NetOwl: extracts entities, events, and links, and performs name matching.
GATE: an open-source toolkit for text mining.
WordNet-Affect: provides support for correlating affective concepts with affective words.

To read more about relevant tools and software, follow the link at [8].

Potential Way-outs

As discussed in the previous section, there has been progress in every aspect of AI, but complete solutions to the related problems are still not effective enough. When the problem is analyzed as a whole system, the defective component seems to be the integration approaches that combine the different parts. To explain this concept further, let us consider an example scenario.

Way-outs Implementation in a Use-case Scenario

As mentioned earlier, it is challenging to detect emotions and expressions such as anger or a smile, along with their intensity, through existing approaches. When monitoring the health status of a patient, we can detect anger by analyzing the facial features in a given image. In addition, we can attach sensors to the patient’s body to monitor heartbeat, blood pressure, and so on. The level of anger can then be perceived more accurately by analyzing the data from all relevant sources. In this case, we can have the following data sources:

Facial expression analysis
Heartbeat rate
Blood pressure level
Background image mood

The facial expression analysis can be combined with the readings from the heartbeat and blood pressure sensors. In addition, if we are able to detect the background of the image and sense the mood there, the accuracy will improve further.
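
One simple way to combine these sources is a weighted average of per-source scores (often called late fusion). The sketch below assumes that each source has already been reduced to a score between 0 and 1. The weights, the normalization ranges, and the names `normalize` and `fuse_anger` are hypothetical, chosen only for illustration, and carry no clinical meaning.

```python
# Hypothetical per-source weights; they sum to 1.0 and are not from the text.
WEIGHTS = {
    "face": 0.4,
    "heart_rate": 0.25,
    "blood_pressure": 0.25,
    "background": 0.1,
}


def normalize(value, low, high):
    """Map a raw sensor value into [0, 1], clamping out-of-range readings."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def fuse_anger(scores, weights=WEIGHTS):
    """Weighted average over the available sources.

    Missing sources are skipped and the remaining weights are renormalized.
    """
    used = {k: w for k, w in weights.items() if k in scores}
    total = sum(used.values())
    if total == 0:
        raise ValueError("no usable data source")
    return sum(scores[k] * w for k, w in used.items()) / total


# Example readings; the ranges passed to normalize() are assumptions.
readings = {
    "face": 0.8,                                          # facial expression score
    "heart_rate": normalize(110, low=60, high=160),       # beats per minute
    "blood_pressure": normalize(150, low=110, high=190),  # systolic, mmHg
    "background": 0.3,                                    # background mood score
}
anger_level = fuse_anger(readings)
```

Because missing sources are skipped, the same function still returns an estimate when, say, only the facial score is available, which matches the idea that each extra source refines the result rather than being required for it.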

By applying available machine learning techniques, the idea of deep learning can be applied, where different combinations of results suggest different outputs. Deep learning is a form of machine learning that stacks multiple layers. Thus, it is the integration of available technologies that can make a difference. These technologies can be exploited to enhance the results for a specific task.
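
To make the idea of multiple layers concrete, the following sketch passes four input scores through a hidden layer and an output layer. It uses only two layers for brevity, and the weights are hand-picked, hypothetical values; a real model would learn them from data and would usually be built with a deep learning library.

```python
import math


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W @ x + b), one output per row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def sigmoid(z):
    """Squash a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical, hand-picked parameters; a real model would learn them.
HIDDEN_W = [[1.0, 0.5, 0.5, 0.0],
            [0.0, 1.0, 1.0, -0.5]]
HIDDEN_B = [0.0, -0.5]
OUTPUT_W = [[1.5, 1.0]]
OUTPUT_B = [-1.0]


def predict_anger(features):
    """features: [face, heart_rate, blood_pressure, background], each in [0, 1]."""
    hidden = dense(features, HIDDEN_W, HIDDEN_B, relu)
    return dense(hidden, OUTPUT_W, OUTPUT_B, sigmoid)[0]
```

Each hidden unit responds to a different combination of the inputs, and the output layer combines those responses, which is the sense in which different combinations of results suggest different outputs.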

Deep Learning in Human Movement Detection

Human movement (HM) detection has emerged as an active research direction in the last decade. It is well suited to human-computer interface (HCI), video surveillance, health, and sports applications. In HCI, it is widely used for visual commands and avatar control. In video surveillance, movement detection is used for geofencing (monitoring virtual geographical boundaries), crowd surveillance, and the detection of abnormal activities. HM plays a vital role in health applications; for example, it is used for monitoring the elderly and for gait analysis. In sports, HM is also used to analyze physical fitness.

Human movement is analyzed from video sequences. For this purpose, kinematic parameters are investigated, as in a biomechanical study of human motion. Kinematics covers the geometrical properties of motion: position, velocity, and acceleration. To detect human movement in a video, segmentation is performed, in which moving objects are identified by pixel changes. Segmentation can also be performed using either the temporal or the spatial information of the images in a video. Depth imagery analysis emerged after the introduction of time-of-flight (ToF) cameras, which use depth sensors. This provided exciting opportunities to develop natural user interfaces (NUI) based on hand gestures [9].
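
Two of the steps above lend themselves to a short sketch: finding moving pixels by differencing consecutive frames, and estimating velocity and acceleration from a sequence of positions by finite differences. The frames below are tiny grayscale grids, and the threshold, the sample track, and the function names are assumptions made for illustration.

```python
def moving_pixels(prev_frame, frame, threshold=30):
    """Return (row, col) positions whose intensity changed by more than threshold."""
    return [
        (r, c)
        for r, (prev_row, row) in enumerate(zip(prev_frame, frame))
        for c, (p, q) in enumerate(zip(prev_row, row))
        if abs(q - p) > threshold
    ]


def centroid(pixels):
    """Mean (row, col) of a non-empty list of pixel positions."""
    n = len(pixels)
    return (sum(r for r, _ in pixels) / n, sum(c for _, c in pixels) / n)


def finite_difference(values, dt):
    """Difference quotient between consecutive samples taken dt apart."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]


# Segmentation by pixel change between two 3x4 grayscale frames.
prev = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
curr = [[0, 0, 0, 0], [0, 200, 200, 0], [0, 0, 0, 0]]
changed = moving_pixels(prev, curr)   # [(1, 1), (1, 2)]
center = centroid(changed)            # (1.0, 1.5)

# Kinematics from a hypothetical horizontal track sampled every 0.5 s.
positions = [0.0, 1.0, 4.0, 9.0]
velocity = finite_difference(positions, dt=0.5)      # [2.0, 6.0, 10.0]
acceleration = finite_difference(velocity, dt=0.5)   # [8.0, 8.0]
```

Note that plain frame differencing marks both the pixels an object has left and the pixels it has entered, so a practical system would typically refine the mask before tracking a centroid over time.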

The most challenging task is to identify the human body, face, hands, and legs in an image. Human body shapes, sizes, and appearances vary hugely for many reasons, including environmental and cultural ones. For this purpose, several approaches have been proposed, mainly statistical models. These models first learn the object patterns and are then used for comparison with the patterns detected in images. Objects are detected using features such as size, edges, color, and texture. The model has to learn all the features and their combinations for object estimation. In this case, a well-suited approach is deep learning, where features can be treated as layers of the model. Deep learning algorithms can model abstractions from the data.

Key Takeaways

A machine learning technique such as SVM is well suited to smaller data sets, while deep learning is a better choice for larger data sets and complex problems.
Research areas such as image recognition, human movement detection, and text/sentiment analysis deal with larger data sets, and hence deep learning algorithms are highly recommended.
When solving a problem or designing an AI-based system, the choice of the best tool, technique, and integration plays a significant role in achieving the desired results.
