Difference between pages "Self join" and "SELECT .. GROUP BY"

From SQLZOO
(Difference between pages)
Jump to: navigation, search
(As stated, Question 7 result should only give one line for each unique pair of company, num; thus added DISTINCT.)
 
 
Line 1: Line 1:
==Edinburgh Buses==
 
<p>[[Edinburgh_Buses. |Details of the database]] Looking at the data</p>
 
stops('''id''', name)
 
route('''num''','''company''','''pos''', ''stop'')
 
  
<div class='schema'></div>
+
<h1>SELECT .. GROUP BY</h1>
<div class = 'ref_section'>
+
Host cities and continents for the Olympics Games are stored in the table <code>games</code>.
<table class = 'db_ref'>
+
Notice that Europe appears in the table twice:
<tr><th>'''stops'''</th><th>'''route'''</th></tr>
+
<table>
<tr><td>id</td><td>num</td></tr>
+
<caption align='center'>'''games'''</caption>
<tr><td>name</td><td>company</td></tr>
+
<tr><th align='center'>'''yr'''</th><th align='center'>'''city'''</th><th align='center'>'''continent'''</th></tr>
<tr><td></td><td>pos</td></tr>
+
<tr><td>2000</td><td align='left'>Sydney</td><td align='left'>Australasia</td></tr>
<tr><td></td><td>stop</td></tr>
+
<tr><td>2004</td><td align='left'>Athens</td><td align='left'>Europe</td></tr>
<tr><td></td><td></td></tr>
+
<tr><td>2008</td><td align='left'>Beijing</td><td align='left'>Asia</td></tr>
 +
<tr><td>2019</td><td align='left'>London</td><td align='left'>Europe</td></tr>
 
</table>
 
</table>
</div>
+
 
+
<div class='ht'>
<div class="progress_panel"><div>
+
<div class='params'>schema:scott</div>
  <div class="summary">Summary</div>
+
<source lang=sql class='tidy'> DROP TABLE games</source>
  <div class="progressbarbg">
+
<source lang=sql class='setup'> CREATE TABLE games(
    <div class="progressbar"></div>
+
  yr INTEGER,
  </div>
+
  city VARCHAR(20),
</div></div>
+
  continent VARCHAR(20));
 
+
INSERT INTO games VALUES (2000,'Sydney','Australasia');
<div class='qu'>
+
INSERT INTO games VALUES (2004,'Athens','Europe');
How many '''stops''' are in the database. 
+
INSERT INTO games VALUES (2008,'Beijing','Asia');
<source lang='sql' class='def'>
+
INSERT INTO games VALUES (2012,'London','Europe');
</source>
 
 
 
<source lang='sql' class='ans'>
 
SELECT COUNT(*)  
 
FROM stops
 
</source>
 
</div>
 
 
 
<div class='qu'>
 
Find the '''id''' value for the stop 'Craiglockhart'   
 
<source lang='sql' class='def'>
 
</source>
 
 
 
<source lang='sql' class='ans'>
 
SELECT id
 
FROM stops
 
WHERE name='Craiglockhart'
 
</source>
 
</div>
 
 
 
 
 
<div class='qu'>
 
Give the '''id''' and the '''name''' for the '''stops''' on the '4' 'LRT' service.   
 
<source lang='sql' class='def'>
 
 
</source>
 
</source>
 
+
In a GROUP BY statement only <i>distinct</i> values are shown for the column in the GROUP BY.
<source lang='sql' class='ans'>
+
This example shows the continents hosting the Olympics with the count of the number of games held.
SELECT id, name FROM stops, route
+
<source lang='sql' class='def e-oracle'>
  WHERE id=stop
+
SELECT continent, COUNT(yr) FROM scott.games
    AND company='LRT'
+
GROUP BY continent
    AND num='4'
 
 
</source>
 
</source>
</div>
 
 
<h2>Routes and stops</h2>
 
<div class='qu'>
 
The query shown gives the number of routes that visit either London Road (149) or Craiglockhart (53). Run the query and notice the two services that link these '''stops''' have a count of 2.
 
Add a HAVING clause to restrict the output to these two routes.   
 
 
<source lang='sql' class='def'>
 
<source lang='sql' class='def'>
SELECT company, num, COUNT(*)
+
SELECT continent, COUNT(yr) FROM games
FROM route WHERE stop=149 OR stop=53
+
GROUP BY continent
GROUP BY company, num
 
</source>
 
 
 
<source lang='sql' class='ans'>
 
SELECT company, num, COUNT(*)
 
FROM route WHERE stop=149 OR stop=53
 
GROUP BY company, num
 
HAVING COUNT(*)=2
 
 
</source>
 
</source>
 
</div>
 
</div>
  
<div class='qu'>
+
<p>See also</p>
Execute the self join shown and observe that b.stop gives all the places you can get to from Craiglockhart, without changing routes.
+
<ul>
Change the query so that it shows the services from Craiglockhart to London Road.     
+
   <li>[[SUM_and_COUNT |SUM and COUNT]]</li>
<source lang='sql' class='def'>
+
   <li>[[SELECT_.._WHERE |SELECT WHERE]]</li>
SELECT a.company, a.num, a.stop, b.stop
+
</ul>
FROM route a JOIN route b ON
 
  (a.company=b.company AND a.num=b.num)
 
WHERE a.stop=53
 
</source>
 
 
 
<source lang='sql' class='ans'>
 
SELECT a.company, a.num, a.stop, b.stop
 
FROM route a JOIN route b ON
 
  (a.company=b.company AND a.num=b.num)
 
WHERE a.stop = 53 AND b.stop=149
 
</source>
 
</div>
 
 
 
<div class='qu'>
 
The query shown is similar to the previous one, however by joining two copies of the '''stops''' table we can refer to '''stops''' by '''name''' rather than by number.
 
Change the query so that the services between 'Craiglockhart' and 'London Road' are shown. If you are tired of these places try 'Fairmilehead' against 'Tollcross'     
 
<source lang='sql' class='def'>
 
SELECT a.company, a.num, stopa.name, stopb.name
 
FROM route a JOIN route b ON
 
  (a.company=b.company AND a.num=b.num)
 
  JOIN stops stopa ON (a.stop=stopa.id)
 
  JOIN stops stopb ON (b.stop=stopb.id)
 
WHERE stopa.name='Craiglockhart'
 
</source>
 
 
 
<source lang='sql' class='ans'>
 
SELECT a.company, a.num, stopa.name, stopb.name
 
FROM route a JOIN route b ON
 
  (a.company=b.company AND a.num=b.num)
 
   JOIN stops stopa ON (a.stop=stopa.id)
 
  JOIN stops stopb ON (b.stop=stopb.id)
 
WHERE stopa.name='Craiglockhart'
 
  AND stopb.name='London Road'
 
</source>
 
</div>
 
 
 
<h2>[[Using a self join]]</h2>
 
<div class='qu'>
 
Give a list of all the services which connect stops 115 and 137 ('Haymarket' and 'Leith')     
 
<source lang='sql' class='def'>
 
</source>
 
 
 
<source lang='sql' class='ans'>
 
SELECT DISTINCT R1.company, R1.num
 
   FROM route R1, route R2
 
  WHERE R1.num=R2.num AND R1.company=R2.company
 
    AND R1.stop=115 AND R2.stop=137
 
</source>
 
</div>
 
 
 
<div class='qu'>
 
Give a list of the services which connect the '''stops''' 'Craiglockhart' and 'Tollcross'     
 
<source lang='sql' class='def'>
 
</source>
 
 
 
<source lang='sql' class='ans'>
 
SELECT R1.company, R1.num
 
  FROM route R1, route R2, stops S1, stops S2
 
  WHERE R1.num=R2.num AND R1.company=R2.company
 
    AND R1.stop=S1.id AND R2.stop=S2.id
 
    AND S1.name='Craiglockhart'
 
    AND S2.name='Tollcross'
 
</source>
 
</div>
 
 
 
<div class='qu'>
 
Give a list of the '''stops''' which may be reached from 'Craiglockhart' by taking one bus. Include the details of the appropriate service.       
 
<source lang='sql' class='def'>
 
</source>
 
 
 
<source lang='sql' class='ans'>
 
SELECT S2.id, S2.name, R2.company, R2.num
 
  FROM stops S1, stops S2, route R1, route R2
 
  WHERE S1.name='Craiglockhart'
 
    AND S1.id=R1.stop
 
    AND R1.company=R2.company AND R1.num=R2.num
 
    AND R2.stop=S2.id
 
</source>
 
</div>
 
 
 
<div class='qu'>
 
Try to show it is possible to get from Sighthill to Craiglockhart by changing lanes twice.
 
This DB doesn't make this easy!       
 
<source lang='sql' class='def'>
 
</source>
 
 
 
<source lang='sql' class='ans'>
 
select distinct a.name, c.name
 
from stops a JOIN route z ON a.id=z.stop
 
JOIN route y ON y.num = z.num
 
JOIN stops b ON y.stop=b.id
 
JOIN route x ON x.num = y.num
 
JOIN stops c ON c.id=x.stop
 
where a.name='Craiglockhart'
 
AND c.name ='Sighthill'
 
</source>
 
</div>
 
 
 
<div>
 
<div class="lsclear">Clear your results</div>
 
<p><div class="quizlink">[[Self join Quiz]]</div></p>
 
</div>
 

Revision as of 21:30, 7 May 2013

SELECT .. GROUP BY

Host cities and continents for the Olympics Games are stored in the table games. Notice that Europe appears in the table twice:

games
yrcitycontinent
2000SydneyAustralasia
2004AthensEurope
2008BeijingAsia
2019LondonEurope
schema:scott
 DROP TABLE games
 CREATE TABLE games(
  yr INTEGER,
  city VARCHAR(20),
  continent VARCHAR(20));
INSERT INTO games VALUES (2000,'Sydney','Australasia');
INSERT INTO games VALUES (2004,'Athens','Europe');
INSERT INTO games VALUES (2008,'Beijing','Asia');
INSERT INTO games VALUES (2012,'London','Europe');

In a GROUP BY statement only distinct values are shown for the column in the GROUP BY. This example shows the continents hosting the Olympics with the count of the number of games held.

SELECT continent, COUNT(yr) FROM scott.games
 GROUP BY continent
SELECT continent, COUNT(yr) FROM games
 GROUP BY continent

See also