An Introduction to Instance-Based Learning

Jack Xia


Providing an overview of the family of techniques that Howso Engine uses.

Instance-based learning (IBL) is a family of techniques, otherwise known as algorithms, methods, or procedures. These algorithms make inferences, predictions, or decisions by comparing a new instance (or data point) to instances (or data) that they have previously encountered.

To make these comparisons, IBL saves data in memory that it can easily reference (or query) to gain understanding about new information.

Now that we have the core definition down, let’s break IBL down further with an analogy. Pretend that you are a student and, throughout the semester, you take each piece of information you learn in class and store it in a filing cabinet. Because you are very well-organized, you place your notes in specific folders depending on the subject. You also create detailed labels about what each folder contains. Luckily for you, at the end of the semester, the final exam is “open note.”

This exam is a piece of cake because of your efforts to organize your notes. You are able to easily find the answer to each question by finding the folder whose contents are most similar to the question’s topic.

Little did you know, by thoughtfully storing data, then finding answers based on information you have already learned, you are behaving exactly as an instance-based learning technique would.

Now that we have a baseline for what IBL is, let’s expand our technical definition. As mentioned earlier, IBL is a family of techniques that draw inferences from stored data, and the members of that family differ in how they search through and sort that data.

One method you may already be familiar with follows the principle of “nearest neighbors”: it predicts information about a new data point by identifying that point’s closest neighbors among the data points saved in memory. As a member of the IBL family, Howso Engine uses a technique that is similar to nearest neighbors, but includes advances in statistics, physics, game theory, and information theory to improve its performance.
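To make the nearest neighbors idea concrete, here is a minimal sketch in Python. It is a textbook-style k-nearest-neighbors classifier, not Howso Engine's technique, and everything in it is an assumption made for illustration: the toy `memory` list, its made-up values, and the hypothetical `knn_predict` helper.

```python
import math
from collections import Counter

# Hypothetical toy "memory" of saved instances: (features, label) pairs.
# The values are made up purely for illustration.
memory = [
    ((1.0, 1.0), "A"),
    ((1.5, 2.0), "A"),
    ((5.0, 5.0), "B"),
    ((6.0, 5.5), "B"),
    ((5.5, 6.5), "B"),
]

def knn_predict(memory, query, k=3):
    """Predict a label by majority vote among the k saved instances closest to query."""
    # Rank every saved instance by its Euclidean distance to the new point.
    by_distance = sorted(memory, key=lambda item: math.dist(item[0], query))
    # Keep the labels of the k closest instances and return the most common one.
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

prediction = knn_predict(memory, (5.2, 5.1), k=3)
print(prediction)
```

Note that nothing is “trained” here: the saved data is the model, and each prediction is just a lookup against it.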

How is IBL different from other AI?

Most other forms of AI, such as neural networks, use models or abstractions of relationships within the data to make predictions about new information. By nature, these models are difficult for humans to interpret. That’s why we refer to neural networks as black box AI.

Let’s expand upon our definition of black box AI by extending it to our exam prep analogy. With an IBL approach, you meticulously organize and save each piece of original information from your class so that you can directly cite it during your final exam.

Now we’ll walk through what a black box AI approach would look like. You would study your notes, remember how they connect to other pieces of information from your class, and then throw your notes out. Unfortunately, come final exam time, you now have no notes to directly cite.

You remember how all the information connects together but you can’t verify where the information came from. While some people have great memories, or even spend a lot of time studying, if you always throw the notes out, you will never be able to use them during the exam.

With black box AI, you rely on an abstraction of your information, but you can never truly understand it because you can’t cite your sources. With IBL, you can trust your information because you are answering questions using your original information and have the capability to directly cite it.
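The “cite your sources” idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Howso Engine works: the `notes` records, their ids and values, and the `predict_with_sources` helper are all hypothetical. The point is that an instance-based prediction can be returned together with the saved instances it was computed from.

```python
import math

# Hypothetical saved instances with made-up values: (id, features, outcome).
notes = [
    ("note-1", (1.0, 2.0), 10.0),
    ("note-2", (2.0, 1.0), 12.0),
    ("note-3", (8.0, 9.0), 40.0),
    ("note-4", (9.0, 8.0), 44.0),
]

def predict_with_sources(notes, query, k=2):
    """Average the outcomes of the k closest saved instances and report their ids."""
    # Find the k saved instances closest to the query (Euclidean distance).
    nearest = sorted(notes, key=lambda note: math.dist(note[1], query))[:k]
    # The estimate is the plain average of those instances' outcomes.
    estimate = sum(outcome for _, _, outcome in nearest) / len(nearest)
    # The "citation": exactly which saved instances produced the estimate.
    sources = [note_id for note_id, _, _ in nearest]
    return estimate, sources

estimate, sources = predict_with_sources(notes, (8.2, 8.9), k=2)
print(estimate, sources)
```

Because the answer is built directly from saved records, anyone reviewing it can pull up those records and check them, which is the step the black box student can no longer perform.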

Why is IBL technology useful?

If you are using AI to make decisions about people’s lives, you need to know where the information behind those decisions came from, and you need to be able to cite your source.

Think about it this way: what if the student we’ve been discussing is learning to be a medical doctor? Would you be comfortable with your doctor making recommendations about your health if they couldn’t cite their sources?

Or would you feel more comfortable if your doctor could directly cite the information they have learned? If it were me, I would vote for the latter. IBL technology shifts the burden of trust onto the AI itself, asking it to directly cite its source and enhancing our ability to make human-critical decisions with technology.