Machine learning: Typical tasks and ways to complete them

20 May 2022
Machine Learning

This article explains how complex problems can be solved using modern ML methods.

Machine learning (ML) is a branch of artificial intelligence (AI) in which applications learn from data in order to make more accurate predictions. Models are trained on historical data and then used to make predictions about new data.

Problems that machine learning can solve

Binary classification

Binary classification involves predicting one of two class labels. Typically, one class represents the normal state and the other the abnormal state. This kind of classification applies whenever the answer to a question is either "yes" or "no."

For example, an email may be in a normal state (not spam) or an abnormal state (spam), and the task is to decide which folder an incoming message belongs in. Similarly, a medical test result is in a normal state if no disease is detected and in an abnormal state if a disease is detected.
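
As a rough illustration of the spam example, here is a minimal sketch of logistic regression (one common binary classifier) written in plain Python. The single feature (a count of suspicious words per email), the toy data, and the learning rate and epoch count are all assumptions made for this example, not values from a real system.

```python
import math

# Hypothetical toy data: number of suspicious words per email and a label
# (1 = spam, 0 = not spam). The values are illustrative, not real data.
suspicious_word_counts = [0, 1, 1, 2, 5, 6, 7, 8]
is_spam = [0, 0, 0, 0, 1, 1, 1, 1]


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit a weight and a bias by batch gradient descent on the log loss."""
    weight, bias = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x / n
            grad_b += error / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias


def predict_spam(x, weight, bias, threshold=0.5):
    """Return 1 (spam) if the predicted probability reaches the threshold."""
    return 1 if sigmoid(weight * x + bias) >= threshold else 0


weight, bias = train_logistic_regression(suspicious_word_counts, is_spam)
print(predict_spam(0, weight, bias), predict_spam(8, weight, bias))
```

In practice a spam filter would use many features and a tested library implementation; the sketch only shows the "yes or no" decision being learned from labeled examples.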

Multi-class classification

Multi-class classification refers to classification with more than two class labels. The number of class labels can vary. For example, a facial recognition system can identify a photo as belonging to one of a large number of people, just as a plant can be identified as belonging to a particular species.
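
The plant example can be sketched with a nearest-centroid classifier, one of the simplest multi-class methods: each class is summarized by the average of its training points, and a new sample gets the label of the closest average. The species labels and measurements below are hypothetical.

```python
import math

# Hypothetical measurements (petal length, petal width in cm) for three
# made-up species labels; the numbers are illustrative only.
training_data = {
    "species_a": [(1.4, 0.2), (1.5, 0.3), (1.3, 0.2)],
    "species_b": [(4.5, 1.5), (4.1, 1.3), (4.7, 1.4)],
    "species_c": [(6.0, 2.5), (5.8, 2.2), (6.1, 2.3)],
}


def centroid(points):
    """Average each coordinate over all points of one class."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def fit_centroids(data):
    """Compute one centroid per class label."""
    return {label: centroid(points) for label, points in data.items()}


def predict_species(centroids, sample):
    """Assign the sample to the class whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(centroids[label], sample))


centroids = fit_centroids(training_data)
print(predict_species(centroids, (1.6, 0.4)))
print(predict_species(centroids, (4.3, 1.3)))
print(predict_species(centroids, (5.9, 2.1)))
```

The same code works for any number of classes, which is the point of the multi-class setting: only the dictionary of labeled examples changes.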

Regression

Regression analysis is a collection of machine learning methods that predict a continuous outcome variable from the values of one or more predictor variables. A regression model fits a mathematical equation that predicts how the outcome changes as the predictors change.
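
A minimal sketch of the simplest case, a straight line fitted to one predictor by ordinary least squares, might look like this. The data is a made-up, noise-free example (the outcome is exactly twice the predictor plus one), chosen so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Slope = covariance of x and y divided by the variance of x.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: outcome = 2 * predictor + 1, with no noise.
predictor = [1, 2, 3, 4, 5]
outcome = [3, 5, 7, 9, 11]

slope, intercept = fit_line(predictor, outcome)
print(slope, intercept)       # the fitted equation
print(slope * 6 + intercept)  # prediction for an unseen predictor value
```

Real data is noisy, so the fitted line will not pass through every point; the equation then describes the average trend rather than an exact rule.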

Clustering

Clustering is one approach to finding hidden patterns in unlabeled data. In simple terms, data points are divided into groups called clusters, so that points within a cluster are similar to each other and dissimilar to points in other clusters.
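
One widely used clustering method is k-means, sketched below in plain Python. The points and the choice of two starting centroids are assumptions for the example; practical implementations usually pick the starting centroids randomly or with a dedicated initialization scheme.

```python
import math


def k_means(points, initial_centroids, max_iterations=100):
    """Alternate between assigning points and moving centroids until stable."""
    centroids = list(initial_centroids)
    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for i in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:  # an empty cluster keeps its previous centroid
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids


# Hypothetical unlabeled points forming two visibly separate groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8.5, 9.5)]
labels, centroids = k_means(points, initial_centroids=[points[0], points[3]])
print(labels)
```

Note that no labels are supplied: the grouping comes only from the distances between points.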

Anomaly detection

Anomaly detection is the process of finding unusual patterns in a dataset, i.e. data points that do not fit the rest of the data. Such anomalies can indicate unusual network traffic, a faulty sensor, data that needs cleaning up, or the presence of an unwanted third party.

The detection of anomalies is typically used for:

  • Data cleanup
  • Detecting intrusions (database or ecosystem)
  • Detecting fraud
  • Monitoring system health
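
A very simple way to flag anomalies, for instance in readings from a possibly faulty sensor, is the z-score: how many standard deviations a value lies from the mean. The sketch below assumes made-up sensor readings and a threshold of 2.0; both are illustrative choices, and more robust methods exist for real workloads.

```python
import statistics


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard
    deviations away from the mean of the series."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []  # all values are identical, so nothing stands out
    return [
        (index, value)
        for index, value in enumerate(values)
        if abs(value - mean) / stdev > threshold
    ]


# Hypothetical sensor readings with one obviously suspicious value.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 25.0, 10.0, 10.1]
anomalies = find_anomalies(readings)
print(anomalies)
```

Because the outlier itself inflates the mean and standard deviation, the threshold has to be chosen with the size of the dataset in mind.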

Recommendations

A recommendation engine uses a variety of algorithms to suggest the most relevant information to users. It captures a client's past behavior and, based on this, recommends products, news, or other content in line with the user's interests.
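
One basic approach is user-based collaborative filtering: find users whose past behavior overlaps with the client's, then suggest what they chose and the client has not. The sketch below uses Jaccard similarity on sets of purchased items; the user names, purchase histories, and the scoring rule are all assumptions made for this example.

```python
def jaccard(a, b):
    """Share of items two users have in common, between 0 and 1."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(user, histories, top_n=3):
    """Suggest items bought by similar users that `user` does not have yet."""
    own_items = histories[user]
    scores = {}
    for other, items in histories.items():
        if other == user:
            continue
        similarity = jaccard(own_items, items)
        for item in items - own_items:
            # Items from more similar users accumulate a higher score.
            scores[item] = scores.get(item, 0.0) + similarity
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0][:top_n]


# Hypothetical purchase histories.
histories = {
    "alice": {"laptop", "mouse", "keyboard"},
    "bob": {"laptop", "mouse", "monitor"},
    "carol": {"novel", "cookbook"},
}
print(recommend("alice", histories))
```

Here "alice" shares two purchases with "bob" and none with "carol", so only the item from "bob" is suggested.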

Machine learning works with data types such as media (images, videos, sounds), tabular data, time-series data, and text data. Read the examples below to find out how this works. 

Working with images

Beyond binary and multi-class classification, which simply sort images into categories, machine learning can use deep neural networks for more detailed image analysis. Examples include examining X-rays for pneumonia or other abnormalities in the lungs, and analyzing seismic images to identify dynamics.

Working with tabular data

Before exploring this in detail, let's establish what kind of tabular data machine learning can work with. Examples of tabular data include: 

  • Loan application forms
  • Property parameters
  • Customer characteristics
  • Sociological surveys
  • Patients’ vital signs and test results
  • Regional economic indicators

Algorithms that can be applied to analyze tabular data include:

  • Linear regression
  • Logistic regression (binary classification)
  • Linear discriminant analysis (multi-class classification)
  • Naive Bayes classifier (classification)
  • Decision trees (regression, classification)

Analysis of loan application forms 

For example, when analyzing a loan application form, machine learning can use classification to determine whether it is worth issuing a loan to an applicant. Alternatively, it can use regression to estimate how much money could be lent and what the risk of non-repayment would be for a given amount.
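
To make the classification case concrete, here is a minimal sketch of a decision stump, i.e. a decision tree with a single split, trained on one column of a loan table. The feature (debt-to-income ratio), the toy values, and the repayment labels are hypothetical; a real model would use many columns and a deeper tree.

```python
def fit_stump(values, labels):
    """Find the threshold and direction that misclassify the fewest rows.

    Returns (threshold, label_if_below): rows with value <= threshold get
    label_if_below, all other rows get the opposite label.
    """
    ordered = sorted(set(values))
    # Candidate thresholds are midpoints between neighboring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    best = None
    for threshold in candidates:
        for label_if_below in (0, 1):
            errors = sum(
                (label_if_below if v <= threshold else 1 - label_if_below) != y
                for v, y in zip(values, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, label_if_below)
    if best is None:
        raise ValueError("need at least two distinct feature values")
    return best[1], best[2]


def predict_repaid(value, threshold, label_if_below):
    """Apply the learned split to a new application."""
    return label_if_below if value <= threshold else 1 - label_if_below


# Hypothetical applications: debt-to-income ratio and whether the loan
# was repaid (1) or not (0).
debt_to_income = [0.10, 0.15, 0.20, 0.25, 0.45, 0.50, 0.60, 0.70]
repaid = [1, 1, 1, 1, 0, 0, 0, 0]

threshold, label_if_below = fit_stump(debt_to_income, repaid)
print(threshold, label_if_below)
print(predict_repaid(0.20, threshold, label_if_below))
print(predict_repaid(0.60, threshold, label_if_below))
```

The learned rule is easy to read back ("approve if the ratio is below the threshold"), which is one reason tree-based models are popular for tabular data.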

Customer characteristics

When working with customer data, machine learning processes the information using classification, regression, and clustering. These tasks allow us, respectively, to determine whether a client will stay or leave, to predict how much a client is likely to spend in the following month, and to segment clients by their interest in certain products (for subsequent marketing) or by their ability to pay.

In addition, machine learning makes it possible to determine the interests of an individual client for further recommendations, or to detect fraudulent use of a credit card.

Patients’ vital signs and test results

If the task is to monitor the vital signs of patients, then classification, regression, and clustering can be used. Classification can help make diagnoses, clustering can identify abnormal processes, and regression can estimate the time to recover or forecast future diseases.

Working with time-series data

Machine learning can also work with time-series data, using algorithms designed for data that is ordered in time.

Time-series data includes:

  • Seismographic monitoring
  • Movement of a car or courier
  • Customer transactions
  • Website logs
  • Prices of exchange-traded instruments

For example, by studying the past and current prices of exchange-traded instruments, you can predict their future prices.

Here is a list of algorithms for working with time-series data:

  • Moving averages
  • Brown’s exponential smoothing
  • Holt’s double exponential smoothing
  • Holt-Winters’ triple exponential smoothing
  • ARIMA (autoregressive integrated moving average model)
  • SARIMA (seasonal autoregressive integrated moving average model)
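
The first three methods in the list are simple enough to sketch in a few lines of plain Python. The price series, the smoothing parameters (alpha and beta), and the way the initial level and trend are chosen in the Holt function are assumptions for this example; libraries differ in how they initialize and tune these values.

```python
def moving_average(series, window):
    """Average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: recent values weigh more than old ones."""
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def holt_forecast(series, alpha, beta, steps_ahead=1):
    """Double exponential smoothing: track a level and a trend, then
    extrapolate the trend `steps_ahead` periods into the future."""
    level = series[0]
    trend = series[1] - series[0]  # assumed initialization
    for value in series[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (previous_level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return level + steps_ahead * trend


# Hypothetical closing prices that rise by 2 each period.
prices = [10.0, 12.0, 14.0, 16.0, 18.0]
print(moving_average(prices, window=3))
print(exponential_smoothing(prices, alpha=0.5))
print(holt_forecast(prices, alpha=0.5, beta=0.5))
```

On this steadily rising series, simple smoothing lags behind the latest price, while the trend term in Holt's method lets the forecast continue the rise. Seasonal data would call for the Holt-Winters or SARIMA models from the list.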

Working with text data

Machine learning can be applied to text data such as social media and online chat messages, emails, articles and news, phone conversations, support service requests, and contracts. Classification can be used to identify positive and negative user reactions, to find leaders in specific communities, and to assess the quality of phone operators' work. Machine learning can also determine the topics of documents, measure how relevant a text is to a topic, and summarize a text. In addition, it can sort emails into spam and non-spam and prioritize incoming emails by importance.
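
As an illustration of classifying user reactions, here is a minimal sketch of a Naive Bayes text classifier with add-one (Laplace) smoothing. The four training messages and their labels are made up for the example, and splitting on whitespace is a deliberate simplification of real text preprocessing.

```python
import math
from collections import Counter

# Hypothetical labeled messages; a real system would need far more data.
training_texts = [
    ("great service thank you", "positive"),
    ("very helpful and friendly", "positive"),
    ("terrible service never again", "negative"),
    ("very slow and rude", "negative"),
]


def train(examples):
    """Count words per class, messages per class, and the full vocabulary."""
    word_counts = {}
    class_counts = Counter()
    vocabulary = set()
    for text, label in examples:
        class_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for word in text.lower().split():
            counts[word] += 1
            vocabulary.add(word)
    return word_counts, class_counts, vocabulary


def classify(text, model):
    """Pick the class with the highest log-probability for the text."""
    word_counts, class_counts, vocabulary = model
    total_examples = sum(class_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(class_counts[label] / total_examples)
        for word in text.lower().split():
            if word in vocabulary:  # words never seen in training are skipped
                score += math.log(
                    (counts[word] + 1) / (total_words + len(vocabulary))
                )
        scores[label] = score
    return max(scores, key=scores.get)


model = train(training_texts)
print(classify("friendly and helpful", model))
print(classify("slow and rude service", model))
```

The same structure applies to spam filtering: only the labels and the training messages change.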

Conclusion

There are various ways to use machine learning and its algorithms to improve many areas of life. If you have an idea or any queries related to machine learning and its application in your business, we have a specialist with more than ten years’ experience in the industry who is always happy to answer questions. You can leave your contact details here and we will reply to you within two business days.


Authors: V. Nareyko feat. V. Kurbatov