Module 12: Linear Regression and Correlation

Outliers

Barbara Illowsky & OpenStax et al.

In a given set of data, you want to look for an overall pattern and any outliers. An outlier is an observation that does not fit the pattern of the rest of the data. It is sometimes called an extreme value. On a graph, an outlier appears to lie outside the overall pattern. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening.


The interquartile range (IQR = Q3 – Q1) can help identify potential outliers. A value is suspected to be a potential outlier if it is less than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile. Potential outliers always require further investigation.
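The rule can be written as two "fences": Q1 – (1.5)(IQR) and Q3 + (1.5)(IQR). The Python below is a minimal sketch, assuming Q1 and Q3 have already been found. The helper names `iqr_fences` and `potential_outliers` are hypothetical, and the six-value data set in the usage lines is made up for illustration.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def potential_outliers(values, q1, q3, k=1.5):
    """Return the values strictly below the lower fence or strictly above the upper fence."""
    lower, upper = iqr_fences(q1, q3, k)
    return [v for v in values if v < lower or v > upper]


# Made-up data: 2, 4, 5, 7, 9, 30 has Q1 = 4 and Q3 = 9
# (the medians of the lower half 2, 4, 5 and the upper half 7, 9, 30).
print(iqr_fences(4, 9))                               # (-3.5, 16.5)
print(potential_outliers([2, 4, 5, 7, 9, 30], 4, 9))  # [30]
```

The multiplier k defaults to 1.5 to match the rule stated above; the comparisons are strict, so a value exactly on a fence is not flagged.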


Note

A potential outlier is a data point that is significantly different from the other data points. These special data points may be errors or some kind of abnormality or they may be a key to understanding the data.

Example

For the following 13 real estate prices, calculate the IQR and determine if any prices are potential outliers. Prices are in dollars.

389,950; 230,500; 158,000; 479,000; 639,000; 114,950; 5,500,000; 387,000; 659,000; 529,000; 575,000; 488,800; 1,095,000

Solution:

Order the data from smallest to largest.

114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,800; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000

M = 488,800 (the median, which is the seventh of the 13 ordered values)

Q1 = (230,500 + 387,000)/2 = 308,750 (the median of the six values below M)

Q3 = (639,000 + 659,000)/2 = 649,000 (the median of the six values above M)
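The quartiles here are found by the median-of-halves method: with an odd number of values, the overall median is left out and Q1 and Q3 are the medians of the lower and upper halves. The sketch below assumes that convention; the function name `quartiles` is a hypothetical helper. Other software often uses interpolation instead and can report slightly different quartiles for the same data.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) by the median-of-halves method.

    For an odd number of values the overall median is excluded from both
    halves, as in the worked example. Needs at least two values.
    """
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]                               # values below the median
    upper = xs[half + 1:] if n % 2 else xs[half:]   # values above the median
    return median(lower), median(xs), median(upper)


prices = [389_950, 230_500, 158_000, 479_000, 639_000, 114_950, 5_500_000,
          387_000, 659_000, 529_000, 575_000, 488_800, 1_095_000]
print(quartiles(prices))  # (308750.0, 488800, 649000.0)
```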

IQR = 649,000 – 308,750 = 340,250

(1.5)(IQR) = (1.5)(340,250) = 510,375

Q1 – (1.5)(IQR) = 308,750 – 510,375 = –201,625

Q3 + (1.5)(IQR) = 649,000 + 510,375 = 1,159,375

No house price is less than –201,625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
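As a cross-check, Python's standard-library `statistics.quantiles` (Python 3.8 and later) can compute the quartiles directly. For these 13 prices its default "exclusive" method happens to agree with the hand-computed quartiles in the solution; for other sample sizes the two conventions can differ, so treat this as a sketch rather than a replacement for the textbook method.

```python
import statistics

prices = [389_950, 230_500, 158_000, 479_000, 639_000, 114_950, 5_500_000,
          387_000, 659_000, 529_000, 575_000, 488_800, 1_095_000]

# statistics.quantiles sorts the data itself; "exclusive" is its default method.
q1, med, q3 = statistics.quantiles(prices, n=4, method="exclusive")
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Flag prices strictly outside the fences.
outliers = [p for p in prices if p < lower or p > upper]
print(q1, med, q3)   # 308750.0 488800.0 649000.0
print(lower, upper)  # -201625.0 1159375.0
print(outliers)      # [5500000]
```

Note that 1,095,000 is large but still below the upper fence, so only 5,500,000 is flagged.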

License


Introductory Statistics (adapted by Darlene Young) by Barbara Illowsky & OpenStax et al. is licensed under a Creative Commons Attribution 4.0 International License, except where otherwise noted.
