Incorrect GROUP BY

From SQLZoo
Jump to navigation Jump to search
schema:gisq

Problem

When using a GROUP BY clause every field in the SELECT list must be either:

  • One of the GROUP BY terms - in this case region
  • An aggregate function - for example SUM or COUNT
  • An expression based on the above

In the example the field name may not be used on the SELECT line.

In a group by line each region shows up only once - however in a typical region such as Africa there are several different name values. WHich one should SQL pick?

SELECT name, region, MAX(population)
  FROM bbc
  GROUP BY region

Solutions

  • Remove the offending field from the SELECT line
SELECT region, MAX(population)
 FROM bbc
 GROUP BY region
  • Add the field to the GROUP BY clause
SELECT name, region, MAX(population)
 FROM bbc
 GROUP BY name,region
  • Aggregate the offending field
SELECT MAX(name), region, MAX(population)
 FROM bbc
 GROUP BY region
DataWars, Data Science Practice Projects - LogoDataWars: Practice Data Science/Analysis with +100 Real Life Projects