Discover the patterns in your data! Learn to identify different types of distributions, spot outliers, and understand how they affect your statistical analysis.
Normal distribution:
• Bell-shaped curve
• Symmetric around the mean
• Mean = Median = Mode
• Most data near the center
Skewed distribution:
• Right-skewed: Tail extends right
• Left-skewed: Tail extends left
• Mean, median, and mode generally differ
• Asymmetric shape
Uniform distribution:
• All values equally likely
• Flat, rectangular shape
• No single peak (no clear mode)
• Equal frequency across range
Bimodal distribution:
• Two distinct peaks
• Two modes
• Often indicates two groups
• Valley between peaks
Outliers:
• Data points that are significantly different from other observations
• Values that fall far outside the typical range of the data
• Can be caused by measurement errors, rare events, or data entry mistakes
• Can significantly affect statistical measures
Data: 85, 88, 92, 89, 87, 91, 90, 88, 86, 45
Step 1: Calculate statistical measures
Mean = 84.1
Median = 88
Range = 47
Step 2: Identify the outlier
The score of 45 is much lower than all other scores
It's 40 points below the next lowest score (85)
Step 3: Impact analysis
Without outlier: Mean ≈ 88.4, Median = 88
The outlier lowered the mean by more than 4 points but left the median unchanged
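The comparison in Step 3 can be reproduced with a short script. This is a minimal sketch using Python's standard `statistics` module; the helper name `summarize` is an illustrative assumption, not part of the lesson.

```python
from statistics import mean, median

scores = [85, 88, 92, 89, 87, 91, 90, 88, 86, 45]
# Drop the suspected outlier (45) to see how each measure responds.
without_outlier = [s for s in scores if s != 45]

def summarize(values):
    """Return (mean, median, range) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values)

full = summarize(scores)
trimmed = summarize(without_outlier)

print(f"With outlier:    mean={full[0]:.1f}, median={full[1]}, range={full[2]}")
print(f"Without outlier: mean={trimmed[0]:.1f}, median={trimmed[1]}, range={trimmed[2]}")
```

Running it shows the mean and range shifting noticeably once 45 is removed, while the median stays where it was.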
Data: 12, 15, 18, 20, 22, 25, 28, 30, 35, 50
Step 1: Find Q1, Q2 (median), and Q3, taking Q1 and Q3 as the medians of the lower and upper halves
Q1 = 18 (25th percentile)
Q2 = 23.5 (50th percentile)
Q3 = 30 (75th percentile)
Step 2: Calculate IQR
IQR = Q3 - Q1 = 30 - 18 = 12
Step 3: Find outlier boundaries
Lower boundary = Q1 - 1.5 × IQR = 18 - 18 = 0
Upper boundary = Q3 + 1.5 × IQR = 30 + 18 = 48
Step 4: Identify outliers
The value 50 is above the upper boundary (48)
Therefore, 50 is an outlier
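The four steps above can be sketched in code. This example assumes the median-of-halves convention for quartiles, which matches the values in Step 1; the function names `quartiles` and `iqr_outliers` are illustrative. Other conventions, such as the default of `statistics.quantiles`, interpolate differently and can give slightly different quartiles.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method.

    For an odd number of values the middle value is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(data), median(upper)

def iqr_outliers(values, k=1.5):
    """Return (outliers, (lower_boundary, upper_boundary)) using the k x IQR rule."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high], (low, high)

data = [12, 15, 18, 20, 22, 25, 28, 30, 35, 50]
q1, q2, q3 = quartiles(data)
outliers, (low, high) = iqr_outliers(data)

print(f"Q1={q1}, Q2={q2}, Q3={q3}")
print(f"Boundaries: {low} to {high}")
print(f"Outliers: {outliers}")
```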
Data: Income levels in a neighborhood: $30k, $35k, $40k, $45k, $50k, $55k, $60k, $65k, $70k, $200k
Statistical Measures:
Mean = $65,000
Median = $52,500
Mode = No clear mode (every value occurs once)
Skewness Analysis:
Mean > Median → Right-skewed
The high income ($200k) pulls the mean to the right
Interpretation:
Most people earn around $30k-$70k, but one person earns much more, creating a long tail to the right.
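As a quick check of the income example, the three measures can be computed with the standard `statistics` module. This is only a sketch: the variable names are illustrative, and treating "every value is a mode" as "no clear mode" is an assumption made for this example.

```python
from statistics import mean, median, multimode

incomes_k = [30, 35, 40, 45, 50, 55, 60, 65, 70, 200]  # thousands of dollars

avg = mean(incomes_k)
mid = median(incomes_k)
modes = multimode(incomes_k)  # all values tied for most frequent

# If every distinct value is returned as a mode, no value stands out.
has_clear_mode = len(modes) < len(set(incomes_k))

# Rule of thumb from the lesson: mean above median suggests a right tail.
direction = "right-skewed" if avg > mid else "left-skewed" if avg < mid else "symmetric"

print(f"mean={avg}k, median={mid}k, clear mode: {has_clear_mode}, {direction}")
```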
Data: Test scores: 20, 85, 88, 90, 92, 94, 95, 96, 97, 98
Statistical Measures:
Mean = 85.5
Median = 93
Most scores are high (85-98)
Skewness Analysis:
Mean < Median → Left-skewed
The low score (20) pulls the mean to the left
Interpretation:
Most students scored very well (85-98), but one student scored much lower, creating a long tail to the left.
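The mean-versus-median comparison is a rule of thumb. A complementary check, not used in the lesson itself, is the moment-based skewness coefficient, whose sign indicates the direction of the longer tail. The sketch below assumes the population form $g_1 = m_3 / m_2^{3/2}$; the function name `moment_skewness` is illustrative.

```python
from statistics import mean

def moment_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    mu = mean(values)
    m2 = sum((x - mu) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mu) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

test_scores = [20, 85, 88, 90, 92, 94, 95, 96, 97, 98]
incomes_k = [30, 35, 40, 45, 50, 55, 60, 65, 70, 200]

scores_skew = moment_skewness(test_scores)   # negative: tail to the left
incomes_skew = moment_skewness(incomes_k)    # positive: tail to the right

print(f"test scores: {scores_skew:.2f}, incomes: {incomes_skew:.2f}")
```

A negative coefficient agrees with the left-skewed reading of the test scores, and a positive one agrees with the right-skewed reading of the incomes.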
Signs of good data quality:
• Consistent measurement units
• No obvious outliers
• Reasonable distribution shape
• Mean and median close together
Signs of poor data quality:
• Many extreme outliers
• Inconsistent units or scales
• Unusual distribution patterns
• Large gaps between mean and median
Scenario: Analyzing heights of 8th grade students: 150cm, 155cm, 160cm, 165cm, 170cm, 175cm, 180cm, 185cm, 190cm, 250cm
Statistical Analysis:
Mean = 178cm
Median = 172.5cm
Range = 100cm
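Quality Issues:
The 250cm value is 60cm above the next tallest student (190cm) and is implausible for an 8th grader.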
Recommendation:
Investigate the 250cm measurement - it's likely a recording mistake (for example, a mistyped 150cm or 160cm).
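A simple screening pass for this scenario might look like the sketch below. The plausibility limits are hypothetical values chosen for illustration, not part of the lesson, and a flagged value still needs to be investigated by hand before it is corrected or removed.

```python
from statistics import mean, median

heights_cm = [150, 155, 160, 165, 170, 175, 180, 185, 190, 250]

# Hypothetical plausibility limits for 8th-grade heights (an assumption).
PLAUSIBLE_MIN_CM = 120
PLAUSIBLE_MAX_CM = 210

# Flag values outside the assumed plausible range for manual review.
suspect = [h for h in heights_cm if not (PLAUSIBLE_MIN_CM <= h <= PLAUSIBLE_MAX_CM)]

# A gap between mean and median hints that an extreme value is pulling the mean.
gap = mean(heights_cm) - median(heights_cm)

print(f"Values to investigate: {suspect}")
print(f"Mean minus median: {gap}cm")
```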
Outliers might be legitimate data points. Always investigate before removing them.
For skewed distributions, the median is often a better measure of central tendency.
The shape of the distribution tells you important information about your data.
Problem 1:
Identify the outlier in: 12, 15, 18, 20, 22, 25, 28, 30, 35, 100
The outlier is 100 - it's much larger than all other values and lies above the IQR upper boundary (Q3 + 1.5 × IQR = 30 + 18 = 48).
Problem 2:
Is this distribution right-skewed or left-skewed: Mean = 45, Median = 50?
Left-skewed. When mean < median, the distribution usually has a long tail to the left.
Problem 3:
Which measure is most affected by outliers: mean, median, or mode?
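The mean is most affected by outliers because every value contributes to it in proportion to its size, so a single extreme value can shift it. The median and mode depend on position and frequency, so they change little or not at all.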