Instance-based Learning: A Powerful Overview of an Alternative Approach

Jack Xia

October 12, 2023

Providing an overview of the family of techniques that Howso Engine uses.

In the world of AI and machine learning, model-based learning tools such as ChatGPT have taken the world by storm. These tools are often powered by deep neural networks and represent some of the latest advances in the field. Despite the amazing progress and countless applications, model-based learning tools share some inherent structural challenges that limit their ability to be understood by humans. Howso Engine is different. Instead of neural networks, Engine relies on instance-based learning (IBL).

So, what is IBL? How is it different from other AI? And why does IBL differentiate Howso as an Understandable AI Engine? In this blog post, we’ll answer each of these questions, helping you better understand the Howso difference.

What is IBL?

Most AI and machine learning techniques in use today take a model-based learning approach. These techniques take the data and abstract it down to a simpler representation. For example, a simple linear regression is a model-based technique: it takes the dataset and tries to compress the underlying trend into a single function. Modern AI algorithms are much more sophisticated than a simple regression, but conceptually they are trying to accomplish the same thing.

Unlike model-based learning, instance-based learning algorithms instead retain the instances (data) in memory and make inferences, predictions, or decisions by comparing new instances (or data points) to the retained instances. By referencing or querying the saved instances directly, this method has many advantages that a model-based approach cannot replicate.
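To make the contrast concrete, here is a minimal Python sketch. The dataset (hours studied versus exam score) and the function names are invented for illustration and are not part of Howso Engine. The model-based function compresses the data into a slope and an intercept, while the instance-based function keeps the data and answers from the closest stored instance.

```python
from statistics import mean

# Hypothetical toy dataset: (hours studied, exam score)
data = [(1.0, 52.0), (2.0, 58.0), (3.0, 65.0), (4.0, 70.0), (5.0, 78.0)]

def fit_line(points):
    """Model-based: compress the data into two numbers (slope, intercept)
    using ordinary least squares."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return slope, y_bar - slope * x_bar

def predict_from_line(model, x):
    """Answer using only the compressed model; the data is no longer needed."""
    slope, intercept = model
    return slope * x + intercept

def predict_from_instances(points, x):
    """Instance-based: keep the data and answer from the closest stored instance.
    Returns the prediction together with the instance it came from."""
    nearest = min(points, key=lambda p: abs(p[0] - x))
    return nearest[1], nearest

model = fit_line(data)
line_prediction = predict_from_line(model, 3.2)
instance_prediction, source = predict_from_instances(data, 3.2)
```

After fitting, the line-based prediction depends only on two numbers, so the original data could be thrown away. The instance-based prediction comes back with the stored instance that produced it.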

IBL Example

Now that we have the core definition down, let’s use an analogy to illustrate IBL. Imagine you are a student, and you are taking an open-notes exam. Throughout the semester, you took each piece of information you learned in class and stored it in a filing cabinet. Because you are very well-organized, you placed your notes in specific folders depending on the subject. You also created detailed labels describing what each folder contains so they are easy to query. Thus, during the exam, you can easily find the answer to each question by searching through the folder of information related to that topic. Little did you know, by thoughtfully storing data and then finding answers based on information you have already learned, you are behaving exactly as an instance-based learning technique would! In this situation, a model-based learning technique would be more analogous to the teacher allowing you to write down some formulas on a single notecard. While some subjects can fit onto a single notecard, there will be many where you wish you had all of your notes!

Now that we have a baseline for what IBL is, let’s expand our technical definition. As mentioned earlier, IBL is a family of techniques that draw inferences from stored data, so there are a variety of ways in which instance-based learning techniques search through and sort data. One method you may already be familiar with follows the principle of “nearest neighbors”, which predicts information about a new data point by identifying that point’s closest neighbors in its memory of saved data points. As a member of the IBL family, Howso Engine uses a technique that is similar to nearest neighbors, but includes advances in statistics, physics, game theory, and information theory to improve its performance.
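As a rough illustration of the nearest-neighbors idea, the sketch below implements a plain k-nearest-neighbors classifier in Python. This is the textbook technique, not Howso Engine's algorithm, and the data, labels, and function names are all made up for the example. Note that the prediction is returned alongside the stored instances that produced it.

```python
import math
from collections import Counter

def k_nearest(instances, query, k=3):
    """Return the k stored (features, label) pairs closest to the query,
    using Euclidean distance."""
    return sorted(instances, key=lambda inst: math.dist(inst[0], query))[:k]

def predict(instances, query, k=3):
    """Predict a label by majority vote among the k nearest stored instances.
    Also return those instances so the prediction can be traced to its sources."""
    neighbors = k_nearest(instances, query, k)
    votes = Counter(label for _, label in neighbors)
    label = votes.most_common(1)[0][0]
    return label, neighbors

# Hypothetical "memory" of saved instances: two features and a class label.
memory = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.3), "A"),
    ((4.0, 4.2), "B"),
    ((4.1, 3.9), "B"),
    ((3.8, 4.0), "B"),
]

label, cited = predict(memory, (1.1, 1.0), k=3)
```

Here `cited` holds the three stored instances behind the prediction, which is the code equivalent of pulling the relevant folder out of the filing cabinet.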

What are some advantages of IBL?

Most other forms of AI, such as neural networks, use models or abstractions of relationships within the data to make predictions about new information. By nature, these models are not transparent. That is why we refer to neural networks as black box AI.

Let us expand upon our definition of black box AI by extending it to our exam prep analogy. With an IBL approach, you meticulously organized and saved each piece of original information from your class so that you are able to directly cite it on your final exam. In the black box AI approach, where you have some formulas written down on a notecard, you have no notes to directly cite. If you forget how these individual formulas fit into the problem statement, there is no additional information you can retrieve. You might even have written down the wrong formula, and you cannot reference your original notes or print materials to make sure the formula is written down correctly.

With black box AI, you rely on an abstraction of your information, but you can never truly understand it because you cannot cite your sources. With IBL, you can trust your information because you are answering questions using your original information and have the capability to directly cite it.

Why is IBL technology useful?

If you are using AI to make decisions about people’s lives, you need to know where the information behind those decisions came from, and you need to be able to cite your sources.

Think about it this way: let’s say that your doctor is recommending a risky procedure. Would you be more comfortable with your doctor’s recommendation if they told you to just trust them and were unable to cite their sources? Or would you feel more comfortable if your doctor could pull the peer-reviewed study from their desk and show the information directly to you? If it were me, I would vote for the latter. IBL technology shifts the burden of trust onto the AI itself, asking it to directly cite its sources and enhancing our ability to make human-critical decisions with technology.
