Fundamentals of Machine Learning

1. Intro to Machine Learning

From smart assistants to self-driving cars, you’ve probably seen or heard a lot about AI (artificial intelligence) in the past couple of years. AI is a booming field that is increasingly integrated into a variety of scientific and commercial areas. But the AI commonly used in our society is not the same as the AI you see in the movies. AI is a very broad term for applications in which a machine mimics cognitive functions that people associate with the human mind, such as “learning” and “problem-solving.”

Most modern artificial intelligence systems are built upon ML (machine learning) algorithms. Machine learning uses algorithms that are trained on data and then apply what they have “learned” to make decisions or predictions about new inputs. A wide spectrum of programs can be considered machine learning programs, from the linear regression function on your calculator to complex medical diagnosis systems.

Now, you might ask, “Why would we even need machine learning algorithms when humans could perform the same functions?” ML is mainly used in situations where:

  • Humans can’t explain their expertise, such as in forensic analysis
  • Models must be customized, such as in personalized medicine
  • There is more data than a human can interpret, such as in biochemistry
  • Humans are not efficient at a task, such as in industrial applications

Well, how does an ML model work, then? Think back to middle school math, when you learned that functions can be written in the form y = f(x): given a value x, the function f outputs a value y. Now imagine that you only know the y-value in certain cases but want to predict it in cases where it is unknown. That is where a basic prediction function comes into play.

Let’s say you have a dataset of NFL quarterbacks’ passing accuracy versus their arm lengths, with arm length as the x-value and accuracy as the y-value. If you wanted to predict the accuracy of a quarterback with an arm length of 3.5 feet, but no such quarterback exists in the NFL, you could use a machine learning model to predict the hypothetical quarterback’s accuracy. First, you would split your dataset into a training set and a testing set. Then, you would try different types of models on the training set to find the f(x) that best fits the trend between arm length and accuracy. Finally, you would use the test set, which the model has never seen, to evaluate how well f(x) generalizes to new data.
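To make these steps concrete, here is a minimal Python sketch. It assumes a single input (arm length) and fits a straight line by ordinary least squares, which is just one of the many model types you could choose. The data is randomly generated and entirely made up, not real NFL statistics, and the helpers `train_test_split`, `fit_line`, and `mean_absolute_error` are hypothetical functions written for this example (not scikit-learn's functions of similar names).

```python
import random

# Made-up data for illustration only (NOT real NFL statistics):
# x = arm length in feet, y = passing accuracy in percent.
random.seed(0)
arm_lengths = [random.uniform(2.0, 3.0) for _ in range(40)]
accuracies = [40 + 8 * x + random.gauss(0, 1.5) for x in arm_lengths]


def train_test_split(xs, ys, test_fraction=0.25, seed=42):
    """Shuffle the (x, y) pairs, then split them into training and testing sets."""
    pairs = list(zip(xs, ys))
    random.Random(seed).shuffle(pairs)
    n_test = int(len(pairs) * test_fraction)
    return pairs[n_test:], pairs[:n_test]


def fit_line(pairs):
    """Fit y = slope * x + intercept by ordinary least squares; return it as f(x)."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = cov / var
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def mean_absolute_error(f, pairs):
    """Average distance between the model's predictions and the true y-values."""
    return sum(abs(f(x) - y) for x, y in pairs) / len(pairs)


train, test = train_test_split(arm_lengths, accuracies)
f = fit_line(train)  # learn f(x) from the training set only
print(f"Test-set error: {mean_absolute_error(f, test):.2f} percentage points")
print(f"Predicted accuracy at 3.5 ft: {f(3.5):.1f}%")
```

Notice that 3.5 feet lies outside the range of arm lengths in this made-up training data, so the prediction is an extrapolation: the fitted line simply assumes the trend continues, which real data may not do.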

Of course, the data in the example above would be relatively clean, with only one input variable (x) and one output variable (y). In machine learning models applied in scientific and commercial fields, there can be hundreds or thousands of input or output variables. This is where a distinction between ideal and real data can be made. Ideal data has an easily generalizable trend with strong correlations between variables. Real-world data, by contrast, is often noisy and does not always show strong correlations between variables.
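One simple way to see the difference between ideal and real data is to measure the Pearson correlation coefficient r, which is 1 for a perfect upward linear trend and moves toward 0 as the relationship weakens. The sketch below uses made-up data: an "ideal" series that follows y = 2x + 1 exactly, and a "real" series with the same trend plus random noise. The noise level is an arbitrary assumption chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient r between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


rng = random.Random(1)
xs = [i / 10 for i in range(50)]
ideal = [2 * x + 1 for x in xs]                   # perfect linear trend
real = [2 * x + 1 + rng.gauss(0, 3) for x in xs]  # same trend buried in random noise

print(f"ideal data: r = {pearson(xs, ideal):.3f}")  # r = 1.000
print(f"real data:  r = {pearson(xs, real):.3f}")   # noticeably lower than 1
```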

Types of Machine Learning

Within machine learning, there are two main types: supervised and unsupervised learning. Imagine you are a machine learning scientist working on a cancer diagnosis algorithm in a hospital. You are given a set of X-rays, each labeled as showing cancer or not showing cancer. This is supervised learning: every example comes with the desired output, or label, that the model is trained to predict. Now imagine you are given the same X-ray data without any diagnosis labels, and you have to find a trend on your own. This is called unsupervised learning, which relies on methods such as clustering to identify structure in the data.
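Unsupervised learning can be illustrated with k-means, one common clustering method. In this minimal sketch, the input is a list of made-up numeric measurements with no labels attached; the algorithm groups them into k clusters by repeatedly assigning each value to its nearest cluster center and then moving each center to the mean of its members. The function name `kmeans_1d` and the one-dimensional data are assumptions for illustration.

```python
def kmeans_1d(values, k=2, iterations=20):
    """Minimal k-means for one-dimensional data; returns (centroids, labels)."""
    # Start with k centers spread evenly through the sorted values.
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins the cluster with the nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each center to the mean of the values assigned to it.
        for c in range(k):
            members = [v for v, label in zip(values, labels) if label == c]
            if members:
                centroids[c] = sum(members) / len(members)
    return centroids, labels


# Made-up, unlabeled measurements: nobody tells the algorithm which group is which.
measurements = [1.0, 1.2, 0.8, 1.1, 5.0, 5.3, 4.9, 5.1]
centers, labels = kmeans_1d(measurements)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

The cluster numbers 0 and 1 carry no meaning of their own. In a supervised setting the labels would arrive with the data; here, deciding what each cluster represents is left to a human expert.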

AI Academies

There is also a third, less common type of machine learning called reinforcement learning, in which an algorithm learns through trial and error: it is rewarded or penalized for particular actions. Because the goal is to maximize the total reward over time rather than only the immediate reward, reinforcement learning algorithms effectively think several steps ahead, always trying to make the optimal move, or a near-optimal move when the problem is too complex to find the true optimum. Reinforcement learning is used in specialized cases such as certain video games with AI opponents, travel planning algorithms, and budget optimization algorithms.
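The trial-and-error loop can be sketched with tabular Q-learning, one standard reinforcement learning algorithm. The environment here is a hypothetical five-cell corridor: the agent starts in cell 0, earns a reward of +1 for reaching cell 4, and pays a small penalty for every other move. The discount factor `gamma` controls how much future rewards count, which is what lets the agent "look ahead". The corridor, the reward values, and all parameters are arbitrary choices for this example.

```python
import random

N_STATES = 5        # hypothetical corridor with cells 0..4; cell 4 is the goal
ACTIONS = (-1, +1)  # move left or move right


def step(state, action):
    """Toy environment: returns (next_state, reward, done)."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    if nxt == N_STATES - 1:
        return nxt, 1.0, True   # reward for reaching the goal
    return nxt, -0.01, False    # small penalty for every other move


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Trial and error: usually exploit the best-known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            # Discounted future value: this is how the agent accounts for later rewards.
            future = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * future - q[(state, action)])
            state = nxt
    return q


q = q_learning()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]: move right in every non-goal cell
```

Moving left is penalized no more than moving right; the agent learns to avoid it only because it delays the future goal reward, which is the "thinking ahead" described above.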

Regression vs. Classification

There are two main tasks that can be accomplished with ML: regression and classification. Regression involves making quantitative predictions of a continuous value, such as a quarterback’s accuracy from his arm length. Classification involves assigning data to qualitative categories, such as labeling an X-ray as showing cancer or not.
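The difference between the two tasks shows up in the type of output. In this sketch, both functions use the same k-nearest-neighbors idea (find the k training examples closest to a new input), but regression averages the neighbors' numeric values while classification takes a majority vote over their category labels. The training data and the labels "low" and "high" are made up for illustration.

```python
from collections import Counter


def nearest(train, x, k=3):
    """Return the k training pairs whose input is closest to x."""
    return sorted(train, key=lambda pair: abs(pair[0] - x))[:k]


def knn_regress(train, x, k=3):
    """Regression: average the neighbors' numeric targets (a continuous output)."""
    return sum(y for _, y in nearest(train, x, k)) / k


def knn_classify(train, x, k=3):
    """Classification: majority vote over the neighbors' labels (a category)."""
    votes = Counter(label for _, label in nearest(train, x, k))
    return votes.most_common(1)[0][0]


# Made-up training data: the same inputs paired with numbers vs. with categories.
numeric_train = [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0), (4.0, 40.0), (5.0, 50.0)]
label_train = [(1.0, "low"), (2.0, "low"), (3.0, "low"), (4.0, "high"), (5.0, "high")]

print(knn_regress(numeric_train, 2.1))  # 20.0 (a number on a continuous scale)
print(knn_classify(label_train, 4.4))   # high (one of a fixed set of categories)
```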

Summary

Machine learning has a variety of applications in our day-to-day lives, from social networks and recommendation algorithms to navigation and personalized medicine. Understanding the foundations of ML and how to use it will become an increasingly important skill in the years to come.