Module 12: Linear Regression and Correlation

# Outliers

Barbara Illowsky & OpenStax et al.

An **outlier** is an observation of data that does not fit the rest of the data. It is sometimes called an **extreme value**. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500), while others may indicate that something unusual is happening.

The interquartile range (*IQR*), the difference between the third and first quartiles, can help to determine potential outliers. A value is suspected to be a potential outlier if it is more than (1.5)(*IQR*) below the first quartile or more than (1.5)(*IQR*) above the third quartile. Potential outliers always require further investigation.
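The rule above can be sketched in a few lines of Python. This is a minimal illustration, not a standard library routine: the function names `outlier_fences` and `is_potential_outlier`, the parameter `k`, and the quartile values used in the demonstration are all assumptions made for this example.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1  # interquartile range
    return q1 - k * iqr, q3 + k * iqr


def is_potential_outlier(value, q1, q3, k=1.5):
    """Return True if value lies below the lower cutoff or above the upper one."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper


# Hypothetical quartiles, chosen only to illustrate the rule.
lower, upper = outlier_fences(q1=20, q3=30)
print(lower, upper)                             # 5.0 45.0
print(is_potential_outlier(50, q1=20, q3=30))   # True
print(is_potential_outlier(40, q1=20, q3=30))   # False
```

A value flagged this way is only a candidate; as noted above, it still requires further investigation.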

### Note

A potential outlier is a data point that is significantly different from the other data points. These special data points may be errors or some kind of abnormality or they may be a key to understanding the data.

### Example

For the following 13 real estate prices, calculate the *IQR* and determine if any prices are potential outliers. Prices are in dollars.

389,950; 230,500; 158,000; 479,000; 639,000; 114,950; 5,500,000; 387,000; 659,000; 529,000; 575,000; 488,800; 1,095,000

Solution:

Order the data from smallest to largest.

114,950; 158,000; 230,500; 387,000; 389,950; 479,000; 488,800; 529,000; 575,000; 639,000; 659,000; 1,095,000; 5,500,000

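*M* = 488,800 (the median, which is the 7th of the 13 ordered values)

*Q*1 = (230,500 + 387,000) / 2 = 308,750

*Q*3 = (639,000 + 659,000) / 2 = 649,000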

*IQR* = 649,000 – 308,750 = 340,250

(1.5)(*IQR*) = (1.5)(340,250) = 510,375

*Q*1 – (1.5)(*IQR*) = 308,750 – 510,375 = –201,625

*Q*3 + (1.5)(*IQR*) = 649,000 + 510,375 = 1,159,375

No house price is less than –201,625. However, 5,500,000 is more than 1,159,375. Therefore, 5,500,000 is a potential outlier.
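The arithmetic in this solution can be checked with a short Python sketch. The helper `quartiles` is a hypothetical function written for this example. It assumes the convention used in the solution, where each quartile is the median of one half of the ordered data and the overall median is left out of both halves when the number of values is odd. Library quantile functions often follow other conventions and may return slightly different quartiles.

```python
from statistics import median

prices = [389_950, 230_500, 158_000, 479_000, 639_000, 114_950, 5_500_000,
          387_000, 659_000, 529_000, 575_000, 488_800, 1_095_000]


def quartiles(data):
    """Return (Q1, median, Q3) using medians of the lower and upper halves.

    Assumption: for an odd number of values, the overall median is
    excluded from both halves, as in the worked solution.
    """
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)


q1, m, q3 = quartiles(prices)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Keep every price that falls outside the two cutoffs.
potential_outliers = [p for p in prices if p < lower_fence or p > upper_fence]

print(q1, m, q3)                 # 308750.0 488800 649000.0
print(iqr)                       # 340250.0
print(lower_fence, upper_fence)  # -201625.0 1159375.0
print(potential_outliers)        # [5500000]
```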