MoL Chapter 8 Lab

From SQLZoo
Jump to navigation Jump to search
List all details from the weather table for dunstaffnage for 1974 so that you can see the NULL value
SELECT *
  FROM weather
 WHERE station='dunstaffnage' AND yr=1974
SELECT *
  FROM weather
 WHERE station='dunstaffnage' AND yr=1974
What output do you expect from this query? Notice that some values are missing in the tmin column for this year and station.
+--------------+------+------+------+------+
| station      | yr   | mnth | tmin | tmax |
+--------------+------+------+------+------+
| dunstaffnage | 1974 |    1 |  3.8 |  8.7 |
| dunstaffnage | 1974 |    2 |  3.7 |    8 |
| dunstaffnage | 1974 |    3 |  2.8 |  8.9 |
| dunstaffnage | 1974 |    4 |  3.9 |   14 |
| dunstaffnage | 1974 |    5 |  7.3 | NULL |
| dunstaffnage | 1974 |    6 |  8.3 | 16.8 |
| dunstaffnage | 1974 |    7 | 10.2 | 15.9 |
| dunstaffnage | 1974 |    8 | 10.7 | 16.9 |
| dunstaffnage | 1974 |    9 |    8 |   14 |
| dunstaffnage | 1974 |   10 |  5.4 | 10.9 |
| dunstaffnage | 1974 |   11 |  3.7 |  8.9 |
| dunstaffnage | 1974 |   12 |  4.6 |  9.1 |
+--------------+------+------+------+------+
SELECT COUNT(tmin), COUNT(tmax)
  FROM weather
 WHERE station='dunstaffnage' AND yr=1974
SELECT COUNT(tmin), COUNT(tmax)
  FROM weather
 WHERE station='dunstaffnage' AND yr=1974
Find the number of non-NULL values for station and the number of non-NULL values for tmin over the whole data set.
SELECT COUNT(station),COUNT(yr)
  FROM weather
SELECT COUNT(station),COUNT(tmin)
  FROM weather
You should notice that the columns station, yr and mnth are never NULL.
Show the number readings in the whole data set together with the number of missing values for ewach of the columns tmin, tmax, rain and sun.
You can use the expression COUNT(station)-COUNT(tmin) to find the number of missing values in tmin. Your output should look like this:
+----------+--------------+--------------+--------------+-------------+
| readings | tmin missing | tmax missing | rain missing | sun missing |
+----------+--------------+--------------+--------------+-------------+
|     7756 |          104 |          105 |           79 |        1931 |
+----------+--------------+--------------+--------------+-------------+
SELECT COUNT(station),COUNT(yr)
  FROM weather
SELECT COUNT(station) AS readings,
       COUNT(station)-COUNT(tmin) AS 'tmin missing',
       COUNT(station)-COUNT(tmax) AS 'tmax missing',
       COUNT(station)-COUNT(rain) AS 'rain missing',
       COUNT(station)-COUNT(sun)  AS 'sun missing'
  FROM weather
List the station, the year and the month for each reading where tmin is NULL in 2008.
SELECT station, yr, mnth
  FROM weather
WHERE yr=2000
SELECT station, yr, mnth
  FROM weather
WHERE tmin IS NULL AND yr=2008
All stations have some readings for sun and two stations have complete records. List the number of missing sun values and the total number of readings for each station.
For each station show the number of readings and the number of missing sun values.
SELECT station, COUNT(station)
  FROM weather
GROUP BY station
SELECT station,count(1) as 'readings',count(1)-count(sun) as 'missing sun'
  FROM weather
 GROUP BY station
Stornoway has several missing tmin values from 1920. Where tmin is missing we can estimate it by subtracting 6 from the tmax value when the real value is NULL.
Show the mnth, estimated tmin and actual tmax for Stornoway in 1920. Round your answers to 1 decimal place.
You will need to use the COALESCE function for this.
SELECT mnth, tmin,ROUND(tmax-6,1) AS 'est tmin', tmax
  FROM weather
 WHERE station='Stornoway' AND yr = 1920
SELECT mnth, tmin,ROUND(COALESCE(tmin,tmax-6),1) 'est tmin', tmax
  FROM weather
 WHERE station='Stornoway' AND yr = 1920
Wick has several missing values for tmin and tmax in month 2 for the years 1920,1923,1934,1998 - we need to put these right as well as we can.
Show yr, tmin, tmax, est_tmin and est_tmax for Wick in February for the years 1920,1923,1934 and 1998.
  • Where tmin is missing estimate it using tmax-5 if possible and 1.0 if tmax is also missing.
  • Where tmax is missing use tmin+5 if possible and 6.0 if tmin is also missing.
You will need to use the COALESCE function for this.
SELECT yr,tmax,tmin
  FROM weather
 WHERE station='wick' and mnth=2
 AND yr in (1920,1923,1934,1998)
SELECT yr, tmin,tmax,
             ROUND(COALESCE(tmin,tmax-5,1),1) 'est tmin',
             ROUND(COALESCE(tmax,tmin+5,6),1) 'est tmax'
  FROM weather
 WHERE station='wick' and mnth=2
 AND yr in (1920,1923,1934,1998)
DataWars, Data Science Practice Projects - LogoDataWars: Practice Data Science/Analysis with +100 Real Life Projects