Infer Meaning from Point Patterns with Nearest Neighbor Distances

By Lillian Pierson

Distance-based point pattern measurement, as in nearest neighbor algorithms, is useful for describing the second-order effects in a dataset: the effects caused by interactions between its data points. Most of the time, you probably wouldn’t know that your data points are influencing one another, but you can test for this kind of influence by using nearest neighbor distances to draw inferences from point patterns.

To illustrate the idea of second-order effects, consider a dataset that describes the spread of Ebola in Sierra Leone populations. Population density is a risk factor for the spread of Ebola within a community because the disease is more likely to spread when infected people are in close proximity to non-infected people. The spread of Ebola is therefore an effect that’s directly caused by interactions between “data points”: the individuals in an Ebola-affected population.

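Average nearest neighbor algorithms calculate a descriptive index value that compares the observed average distance between each data point and its nearest neighbor with the average distance you would expect if the same number of points were distributed randomly over the same area. If the calculated index value is less than 1, then the data is said to show clustered patterning. If the index value is greater than 1, then the data is said to show dispersion patterning. A value close to 1 suggests a pattern that is not distinguishable from random.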
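As a rough illustration of how such an index can be computed, here is a minimal Python sketch. It assumes the common formulation in which the observed mean nearest neighbor distance is divided by the mean distance expected for a random pattern, 0.5 / sqrt(n / area). The function name, the sample coordinates, and the 10 x 10 study area are all hypothetical, and the brute-force neighbor search is only practical for small datasets.

```python
import math

def average_nearest_neighbor_index(points, area):
    """Ratio of observed to expected mean nearest neighbor distance."""
    n = len(points)
    # Observed: mean distance from each point to its closest other point
    # (brute-force search, fine for small point sets).
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    # Expected mean distance for a random pattern with the same density.
    expected = 0.5 / math.sqrt(n / area)
    return observed / expected

# Hypothetical coordinates: two tight groups inside a 10 x 10 study area.
points = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3),
          (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
index = average_nearest_neighbor_index(points, area=100.0)
print(f"index = {index:.2f}")  # a value below 1 points to clustering
```

Note that the result depends on the study area you pass in: the same points in a larger area yield a smaller index, so the area should reflect the region that was actually sampled.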

So how can you detect interactions between data points? You have to look at the patterns within the data. Clustered patterning indicates that some sort of interaction is going on between the data points and that this interaction is pulling them closer together, which increases their average similarity values. The easiest way to understand this concept is to think of two magnets placed close to one another with opposite poles facing. If you look at the descriptive index value for the magnets and see that it is less than 1, then you could infer that opposite poles are facing, and that the magnets are thus attracting one another, from this index value alone.

In dispersion patterning, on the other hand, interaction between the data points pushes them farther apart, which decreases their average similarity values. Going back to the magnets analogy, this time the two magnets repel one another. If you look at the descriptive index value for the magnets and see that it is greater than 1, then you could infer that like poles are facing, and that the magnets are thus repelling one another.
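To make the two cases concrete, the following sketch builds one evenly spaced pattern and one tightly packed pattern, each with 25 points in the same 5 x 5 study area, and labels each by its index value. The helper names (ann_index, describe) and both synthetic patterns are assumptions made for illustration, and the index is computed with the same observed-over-expected ratio assumed earlier (redefined here so the example runs on its own). In real analyses the index is usually paired with a significance test before any interaction is inferred.

```python
import math

def ann_index(points, area):
    """Observed mean nearest neighbor distance over the random expectation."""
    n = len(points)
    observed = sum(
        min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        for i, p in enumerate(points)
    ) / n
    return observed / (0.5 / math.sqrt(n / area))

def describe(index):
    """Label a pattern using the thresholds from the text."""
    if index < 1:
        return "clustered"
    if index > 1:
        return "dispersed"
    return "random"

area = 25.0  # hypothetical 5 x 5 study area

# Evenly spaced points, as if each one repelled its neighbors.
dispersed = [(i + 0.5, j + 0.5) for i in range(5) for j in range(5)]

# The same number of points packed into a small patch, as if attracted.
clustered = [(2 + 0.1 * i, 2 + 0.1 * j) for i in range(5) for j in range(5)]

dispersed_index = ann_index(dispersed, area)
clustered_index = ann_index(clustered, area)
print(f"dispersed: {dispersed_index:.2f} -> {describe(dispersed_index)}")
print(f"clustered: {clustered_index:.2f} -> {describe(clustered_index)}")
```

The evenly spaced pattern yields an index well above 1 and the packed pattern an index well below 1, which matches the repelling and attracting magnets in the analogy.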