1
00:00:00,506 --> 00:00:09,396
[ Silence ]

2
00:00:09,896 --> 00:00:10,896
>> All right.

3
00:00:10,966 --> 00:00:13,276
Welcome back to Computer
Science S-75.

4
00:00:13,276 --> 00:00:15,776
This is Lecture 5,
SQL, Continued.

5
00:00:15,986 --> 00:00:17,386
So this worked out
so well last time.

6
00:00:17,386 --> 00:00:18,726
I thought we could start
with the same question.

7
00:00:18,776 --> 00:00:19,596
What did we do last time?

8
00:00:20,106 --> 00:00:20,226
Yes.

9
00:00:21,896 --> 00:00:24,396
>> Posting into internet SQL.

10
00:00:24,446 --> 00:00:26,006
>> OK, good, so an internet SQL.

11
00:00:26,006 --> 00:00:27,456
So structured query language.

12
00:00:27,456 --> 00:00:29,936
Someone from this side
now, what is SQL about?

13
00:00:31,286 --> 00:00:31,746
Yeah, Jack [assumed spelling].

14
00:00:31,746 --> 00:00:34,716
>> It's a nice way that we
can interact with the data

15
00:00:34,716 --> 00:00:38,176
that we can store
in MySQL database.

16
00:00:38,276 --> 00:00:39,936
>> OK. Nice way to
interact with a data

17
00:00:39,936 --> 00:00:41,676
that you can store
in a MySQL database.

18
00:00:41,676 --> 00:00:43,366
And in what sense, Axel
[assumed spelling],

19
00:00:43,366 --> 00:00:45,756
do you interact with
the database?

20
00:00:46,566 --> 00:00:49,276
>> Well, you can interact
with it because it runs

21
00:00:49,276 --> 00:00:50,526
on a port on a server--

22
00:00:50,526 --> 00:00:50,706
>> OK.

23
00:00:50,816 --> 00:00:53,346
>> -- and essentially just
stop it via a terminal window--

24
00:00:53,436 --> 00:00:53,836
>> OK.

25
00:00:53,836 --> 00:00:57,706
>> -- type commands but there is
a GUI that's called PhpMyAdmin--

26
00:00:57,776 --> 00:00:58,006
>> OK.

27
00:00:58,006 --> 00:01:00,306
>> -- that we use
together with MySQL.

28
00:01:00,546 --> 00:01:01,236
>> OK, good.

29
00:01:01,236 --> 00:01:03,746
So we've seen at least a
couple of ways to interface

30
00:01:03,746 --> 00:01:04,806
with the MySQL database.

31
00:01:04,806 --> 00:01:07,466
You can use the command line
or the actual MySQL client,

32
00:01:07,466 --> 00:01:10,076
which is the sort of retro
black and white window,

33
00:01:10,356 --> 00:01:13,096
which is wonderfully useful
but also limited just in terms

34
00:01:13,096 --> 00:01:14,536
of its screen real state.

35
00:01:14,766 --> 00:01:17,696
We also looked at a tool called
PhpMyAdmin, which is popular.

36
00:01:17,696 --> 00:01:20,066
It's not the only tool but
it tends to be popular.

37
00:01:20,256 --> 00:01:21,736
It's a coincidence
for our purposes

38
00:01:21,736 --> 00:01:22,956
that it's actually
written in PHP.

39
00:01:22,956 --> 00:01:24,326
We haven't looked
underneath the hood

40
00:01:24,326 --> 00:01:26,256
but if you did start
poking around, you would see

41
00:01:26,256 --> 00:01:29,626
that that entire GUI is a
dynamic website implemented

42
00:01:29,626 --> 00:01:30,746
in PHP but there are others.

43
00:01:30,746 --> 00:01:31,776
If you're a Windows user,

44
00:01:31,776 --> 00:01:33,826
there's actually a
downloadable Windows client

45
00:01:33,826 --> 00:01:36,616
that you can use natively
on your own Windows computer

46
00:01:36,836 --> 00:01:40,496
and connect to, for instance,
a remote MySQL database.

47
00:01:40,816 --> 00:01:42,166
Generally, you cannot, though,

48
00:01:42,166 --> 00:01:44,806
connect too far away
MySQL databases.

49
00:01:44,806 --> 00:01:48,056
The reason being that
the communication is not,

50
00:01:48,426 --> 00:01:50,226
by default, encrypted in any way

51
00:01:50,226 --> 00:01:53,626
and indeed MySQL itself does not
really have very good built-in

52
00:01:53,626 --> 00:01:58,226
functionality to do the
equivalent of say HTTPS or SSL.

53
00:01:58,226 --> 00:02:01,496
So it's generally best practiced
to keep your MySQL database

54
00:02:01,496 --> 00:02:03,296
on the same network
as the machines

55
00:02:03,296 --> 00:02:04,536
that are actually talking to it.

56
00:02:04,856 --> 00:02:07,906
So to be more clear, you could
not generally sign up for

57
00:02:07,906 --> 00:02:11,856
like dreamhost.com, have MySQL
database somewhere out there

58
00:02:11,856 --> 00:02:14,226
on the internet and then
use like this Windows client

59
00:02:14,486 --> 00:02:16,536
to connect to it because
everything you've been sending,

60
00:02:16,536 --> 00:02:18,306
including your password,
in the clear.

61
00:02:18,306 --> 00:02:19,626
So be mindful of that.

62
00:02:19,966 --> 00:02:20,336
All right.

63
00:02:20,336 --> 00:02:21,746
So that's how you
interface with it.

64
00:02:21,886 --> 00:02:23,386
What kinds of queries

65
00:02:23,386 --> 00:02:26,186
or statements did we see
last time are possible?

66
00:02:26,766 --> 00:02:27,846
>> INSERT INTO.

67
00:02:28,036 --> 00:02:28,866
>> INSERT INTO.

68
00:02:28,866 --> 00:02:30,946
So this is for inserting
rows into a table.

69
00:02:30,946 --> 00:02:33,906
We call that MySQL databases
or relational databases,

70
00:02:33,906 --> 00:02:35,616
which means there
are rows and columns.

71
00:02:35,926 --> 00:02:37,916
What other statements
did we see?

72
00:02:37,916 --> 00:02:41,786
>> SELECT and then would
you want to select a star,

73
00:02:41,786 --> 00:02:44,726
a wildcard star or natural--

74
00:02:44,846 --> 00:02:46,136
>> Good. So we saw
SELECT queries

75
00:02:46,136 --> 00:02:48,536
like select star
from the table name.

76
00:02:48,536 --> 00:02:51,956
And that gives everything or
you can say select X, Y, Z,

77
00:02:52,186 --> 00:02:53,936
that will just give
you those three fields

78
00:02:53,936 --> 00:02:55,456
and then we introduced
the notion of a predicate,

79
00:02:55,456 --> 00:02:58,416
and a predicate lets you
filter that result set,

80
00:02:58,416 --> 00:03:02,486
much like XPath lets you filter
the node set that comes back.

81
00:03:02,486 --> 00:03:06,066
You can say where
user ID equals 1

82
00:03:06,066 --> 00:03:08,796
or where email address
equals JHarvard

83
00:03:08,796 --> 00:03:11,006
at something dot
something or the like.

84
00:03:11,006 --> 00:03:12,186
So you can filter your queries

85
00:03:12,186 --> 00:03:14,496
and you can even Boolean
and/or them together.

86
00:03:14,496 --> 00:03:17,776
So you can say give me
anyone whose gender is male

87
00:03:17,776 --> 00:03:19,026
or gender is female.

88
00:03:19,026 --> 00:03:20,256
So you can get back to everyone

89
00:03:20,256 --> 00:03:22,286
in that way even though
star would suffice

90
00:03:22,366 --> 00:03:23,156
in a case like that.

91
00:03:23,396 --> 00:03:25,656
So in short, we have this
ability to filter our results.

92
00:03:25,656 --> 00:03:26,366
So that's SELECT.

93
00:03:26,646 --> 00:03:27,636
We've talked about INSERT.

94
00:03:27,906 --> 00:03:28,366
There's DELETE.

95
00:03:28,366 --> 00:03:31,776
We haven't really used this
one yet but you can use DELETE

96
00:03:31,776 --> 00:03:34,186
to delete rows as you'll find.

97
00:03:34,486 --> 00:03:35,376
What's another?

98
00:03:35,676 --> 00:03:35,796
Yeah.

99
00:03:36,436 --> 00:03:37,096
>> UPDATE.

100
00:03:37,216 --> 00:03:37,816
>> UPDATE.

101
00:03:37,916 --> 00:03:41,026
So UPDATE is for actually
updating existing rows.

102
00:03:41,106 --> 00:03:43,576
This might make sense if a
user updates their password,

103
00:03:43,576 --> 00:03:46,346
their name, their phone number,
whatever the case may be.

104
00:03:46,656 --> 00:03:50,196
So, for those unfamiliar,
realize that this kind

105
00:03:50,196 --> 00:03:52,996
of database follows a
paradigm that has kind

106
00:03:52,996 --> 00:03:56,346
of a silly acronym associated
with it but handy to know.

107
00:03:56,986 --> 00:04:02,436
This is the-- literally, the
CRUD model for database queries.

108
00:04:02,436 --> 00:04:03,606
Anyone want to take a stab

109
00:04:03,606 --> 00:04:05,696
at guessing what
this acronym means?

110
00:04:06,356 --> 00:04:08,436
It's really kind of stupid.

111
00:04:10,536 --> 00:04:13,526
At least the acronym
but the capabilities

112
00:04:13,576 --> 00:04:15,996
that it's describing is
actually quite common.

113
00:04:16,066 --> 00:04:17,636
So C is for-- Yeah.

114
00:04:17,636 --> 00:04:22,476
>> It might be Create, Read and
then something and then Delete.

115
00:04:22,476 --> 00:04:22,686
>> So close.

116
00:04:22,686 --> 00:04:23,526
Yes. What's the something?

117
00:04:24,016 --> 00:04:24,486
What's the U?

118
00:04:24,486 --> 00:04:24,636
>> Update.

119
00:04:25,266 --> 00:04:25,676
>> Update.

120
00:04:25,676 --> 00:04:26,216
That's all it is.

121
00:04:26,216 --> 00:04:28,036
So Create, Read, Update, Delete.

122
00:04:28,256 --> 00:04:32,996
These are the incarnation
of this acronym

123
00:04:32,996 --> 00:04:36,896
in SQL is what we've already
said, create is insert,

124
00:04:37,536 --> 00:04:40,866
read is select, update is
update, and delete is delete.

125
00:04:41,056 --> 00:04:45,166
But you'll find that in other
context where you have this kind

126
00:04:45,216 --> 00:04:46,266
of expressive capabilities,

127
00:04:46,616 --> 00:04:49,116
that it follows this
so called CRUD model.

128
00:04:49,116 --> 00:04:51,206
So just FYI if you
come across that.

129
00:04:51,726 --> 00:04:52,266
All right.

130
00:04:52,266 --> 00:04:55,476
So, let's introduce a couple
of new features now of SQL

131
00:04:55,476 --> 00:04:57,236
and then we'll try
to take a stab

132
00:04:57,236 --> 00:04:59,086
at designing something
a little more complex

133
00:04:59,116 --> 00:05:00,766
than last weeks user's table.

134
00:05:01,016 --> 00:05:03,396
So recall that MySQL has
a whole bunch of types.

135
00:05:03,556 --> 00:05:07,146
And here we have on the top
left, string-related ones,

136
00:05:07,146 --> 00:05:09,726
then dates and times,
then numeric ones,

137
00:05:09,726 --> 00:05:11,896
then some floating
point values and blobs.

138
00:05:12,056 --> 00:05:14,476
Let see if we can pluck
off a few these interesting

139
00:05:14,476 --> 00:05:15,976
characteristics of
each of these.

140
00:05:16,336 --> 00:05:20,106
So when might you use a
VARCHAR field in a table?

141
00:05:20,106 --> 00:05:23,156
How about someone
from this side again?

142
00:05:24,306 --> 00:05:26,416
VARCHAR, when should
you go for a VARCHAR?

143
00:05:26,416 --> 00:05:26,886
Yeah, Louis [assumed spelling].

144
00:05:26,956 --> 00:05:27,516
>> Middle initial.

145
00:05:27,906 --> 00:05:29,116
>> OK, so a middle initial.

146
00:05:29,116 --> 00:05:29,366
Why?

147
00:05:29,366 --> 00:05:32,516
>> Because you'll know like the
length of it and see the space.

148
00:05:32,546 --> 00:05:33,076
>> OK, good.

149
00:05:33,076 --> 00:05:34,826
So it's probably
just one character,

150
00:05:34,826 --> 00:05:37,026
maybe two or maybe zero.

151
00:05:37,026 --> 00:05:39,366
And so, if that's the case,
you might not necessarily want

152
00:05:39,366 --> 00:05:41,086
to waste space unnecessarily.

153
00:05:41,406 --> 00:05:44,046
So initials, not necessarily
a very compelling case

154
00:05:44,046 --> 00:05:45,536
because we're only
talking one or two bytes.

155
00:05:45,536 --> 00:05:47,936
So what's another scenario
that's a little more compelling

156
00:05:47,936 --> 00:05:48,566
for a VARCHAR?

157
00:05:48,636 --> 00:05:50,846
>> When you use it when
you do not know the length

158
00:05:52,566 --> 00:05:53,956
of the thing that user inputs.

159
00:05:53,996 --> 00:05:54,126
>> Yes.

160
00:05:54,126 --> 00:05:55,046
>> Username or name.

161
00:05:55,046 --> 00:05:57,666
>> Good. So for something where
you really don't know the length

162
00:05:57,666 --> 00:06:00,366
and an initial, I mean,
most of the time, it's one

163
00:06:00,366 --> 00:06:02,376
or zero maybe two,
but I'm hard-pressed

164
00:06:02,376 --> 00:06:03,556
to even think of
a two-letter one.

165
00:06:03,846 --> 00:06:06,266
So for something that's a
little more variable by nature,

166
00:06:06,266 --> 00:06:09,766
email address or name, then
a VARCHAR might be reasonable

167
00:06:09,766 --> 00:06:12,066
because then you can
say anyone's name can be

168
00:06:12,066 --> 00:06:16,136
up to 30 characters but frankly
most people's names are a few

169
00:06:16,136 --> 00:06:18,446
characters, 10 maybe, if that.

170
00:06:18,776 --> 00:06:21,946
So, why waste 22 or so
additional characters,

171
00:06:21,996 --> 00:06:25,206
bytes in your database,
storing them unnecessarily

172
00:06:25,206 --> 00:06:26,236
if do not need them.

173
00:06:26,236 --> 00:06:27,756
So there's a price we
pay, though, for this.

174
00:06:28,896 --> 00:06:32,096
What's the downside of
saying variable length CHARs

175
00:06:32,096 --> 00:06:33,326
as opposed to just a CHAR field?

176
00:06:33,326 --> 00:06:33,393
Yeah.

177
00:06:33,393 --> 00:06:37,536
>> Searching is slower if
it's not on the same size.

178
00:06:37,536 --> 00:06:40,556
>> Yeah. So searching is
most likely slower even

179
00:06:40,556 --> 00:06:42,496
if it's imperceptible
for certain cases.

180
00:06:42,496 --> 00:06:45,176
In large scale scenarios,
we have many, many,

181
00:06:45,176 --> 00:06:48,616
many rows of data in your
database and you actually want

182
00:06:48,616 --> 00:06:49,996
to search those rows
efficiently.

183
00:06:50,166 --> 00:06:51,806
One of the upsides
of using a CHAR,

184
00:06:51,806 --> 00:06:53,696
which by contrast is
a fixed length field,

185
00:06:53,886 --> 00:06:57,036
is that you can generally search
that field more efficiently

186
00:06:57,036 --> 00:06:59,826
because you don't have sort
of this variable length

187
00:07:00,776 --> 00:07:04,956
to your cells in that
table so to speak,

188
00:07:05,086 --> 00:07:06,376
rather everything
is fixed length.

189
00:07:06,426 --> 00:07:09,506
So just like an array allows you
random access, similarly might

190
00:07:09,506 --> 00:07:13,736
to be able to leverage the idea
of random access in your column

191
00:07:13,736 --> 00:07:17,606
if all of the lengths are
identical, eight bytes,

192
00:07:17,656 --> 00:07:18,636
six bytes, sort of the like.

193
00:07:18,916 --> 00:07:20,816
The downside of course
of a fixed length is

194
00:07:20,816 --> 00:07:22,866
that if your middle
initial is just one letter

195
00:07:22,866 --> 00:07:24,736
or if your email
address is fewer

196
00:07:24,736 --> 00:07:26,756
than the fixed length
you've come up with, well,

197
00:07:26,756 --> 00:07:27,986
then you're just wasting space.

198
00:07:28,236 --> 00:07:30,236
So it's a tradeoff, we just
do not get anything for free.

199
00:07:30,446 --> 00:07:31,926
What about text fields?

200
00:07:32,306 --> 00:07:34,126
Why did we introduce those?

201
00:07:34,406 --> 00:07:35,206
When are they useful?

202
00:07:35,726 --> 00:07:35,816
Yeah.

203
00:07:37,366 --> 00:07:41,446
>> I think text was
for like longer texts.

204
00:07:41,576 --> 00:07:44,166
>> OK, so longer text.

205
00:07:51,416 --> 00:07:56,156
>> That were much bigger than
what you would typically--

206
00:07:56,416 --> 00:07:58,586
say the length of the VARCHAR,

207
00:07:58,586 --> 00:08:01,886
some text would be very
much bigger than VARCHAR.

208
00:08:01,886 --> 00:08:06,356
>> Good. So much bigger
than a VARCHAR, typically.

209
00:08:06,406 --> 00:08:08,236
So if it's like in essay fields.

210
00:08:08,236 --> 00:08:12,256
So if it's like a college
application kind of thing

211
00:08:12,256 --> 00:08:14,836
where you want people
to upload big blob--

212
00:08:14,836 --> 00:08:18,846
actually, college essays
are what, like 500 words--

213
00:08:18,846 --> 00:08:21,576
So probably not a
good scenario there,

214
00:08:21,866 --> 00:08:28,346
but if it's a much larger corpus
of text, maybe it's the contents

215
00:08:28,346 --> 00:08:30,756
of a web page that you
screen scraped and you want

216
00:08:30,756 --> 00:08:35,166
to store all of that data
in just a big chunk of text,

217
00:08:35,166 --> 00:08:37,536
maybe you would use
a text field.

218
00:08:37,536 --> 00:08:41,766
And one of the upsides of that,
as we'll probably encounter

219
00:08:41,816 --> 00:08:44,526
in PhpMyAdmin again tonight,
is that you can specify

220
00:08:44,526 --> 00:08:49,656
that you want to have a
full text index on it,

221
00:08:49,656 --> 00:08:53,246
which means you can use
Google-like queries to search

222
00:08:53,246 --> 00:08:56,096
for keywords in that field,

223
00:08:56,096 --> 00:08:58,366
and you don't get
those same capabilities

224
00:08:58,366 --> 00:09:00,176
in for instance a
VARCHAR or a CHAR itself.

225
00:09:00,176 --> 00:09:01,226
So what does this mean?

226
00:09:01,226 --> 00:09:04,056
Well, typically, when you're
using a VARCHAR or a CHAR,

227
00:09:04,056 --> 00:09:08,236
you can say something like
this, SELECT star FROM users

228
00:09:08,236 --> 00:09:10,076
WHERE what might be reasonable.

229
00:09:10,076 --> 00:09:12,826
How about let's say WHERE
citystate-- oops, autocomplete--

230
00:09:12,906 --> 00:09:17,066
WHERE citystate LIKE, and now
I do not necessarily know what

231
00:09:17,346 --> 00:09:20,356
their state is going be but I
want to search for all states,

232
00:09:20,406 --> 00:09:23,416
let's say containing the
word let's say "New".

233
00:09:23,506 --> 00:09:27,936
I do not know why I want to
do this but I want things

234
00:09:27,936 --> 00:09:29,826
like New York, New London.

235
00:09:29,876 --> 00:09:32,726
I want to get back all
of the rows in the table

236
00:09:32,936 --> 00:09:35,616
where people are from towns
whose names contain the word

237
00:09:35,616 --> 00:09:35,683
"new".

238
00:09:35,683 --> 00:09:39,366
It's completely arbitrary,
this particular example,

239
00:09:39,366 --> 00:09:40,346
but how can I express this?

240
00:09:40,376 --> 00:09:41,036
Well, there's the LIKE operator

241
00:09:41,066 --> 00:09:42,296
in SQL whereby you can
pattern-match, and the fact

242
00:09:42,326 --> 00:09:43,346
that I've used this
percent signs happens

243
00:09:43,376 --> 00:09:44,996
to be the wildcard character
in the context of a predicate.

244
00:09:45,026 --> 00:09:46,046
So in this context,
it's not the star,

245
00:09:46,076 --> 00:09:47,306
it's in fact the percent
sign but this allows me

246
00:09:47,336 --> 00:09:48,926
to do pattern matching inside of
a CHAR field or a VARCHAR field.

247
00:09:48,956 --> 00:09:50,366
However, if I instead make
something a text field then I

248
00:09:50,396 --> 00:09:51,746
can actually use a more
powerful matching expression

249
00:09:51,776 --> 00:09:52,826
that allows me to say,
"Give me all the things

250
00:09:52,856 --> 00:09:53,726
that match this keyword
and this one,"

251
00:09:53,756 --> 00:09:55,286
and then your database engine
can even rank them in terms

252
00:09:55,316 --> 00:09:56,306
of relevance which
might be advantageous

253
00:09:56,336 --> 00:09:57,086
if you're building
a search engine

254
00:09:57,116 --> 00:09:57,746
for your site or the like.

255
00:09:57,776 --> 00:09:58,976
But to do that it needs
to be a text field.

256
00:09:59,096 --> 00:10:01,556
This is probably--
typically the more common one

257
00:10:01,556 --> 00:10:02,896
when you're just trying
to look up queries.

258
00:10:02,896 --> 00:10:05,156
And you'll see this
perhaps in MySQL

259
00:10:05,156 --> 00:10:07,776
if you actually leverage
its search tab,

260
00:10:07,776 --> 00:10:08,836
which we didn't use last week,

261
00:10:09,166 --> 00:10:11,506
but you'll see it among
the various options.

262
00:10:12,276 --> 00:10:12,636
All right.

263
00:10:12,886 --> 00:10:16,426
So, dates and date
times and years.

264
00:10:16,806 --> 00:10:19,786
Why use something like a date
or date time or year field

265
00:10:19,786 --> 00:10:22,076
when clearly, you could
implement this yourself

266
00:10:22,176 --> 00:10:24,226
with just CHAR fields
or VARCHARs?

267
00:10:24,226 --> 00:10:24,696
What's the point?

268
00:10:24,696 --> 00:10:24,763
Yeah?

269
00:10:24,763 --> 00:10:26,526
>> It's probably
more standardized

270
00:10:26,836 --> 00:10:31,396
and may be more efficient.

271
00:10:31,466 --> 00:10:32,796
>> Good. So more standardized

272
00:10:32,796 --> 00:10:34,656
and arguably could be
made more efficient.

273
00:10:34,656 --> 00:10:37,916
For instance, to
represent a year, we humans,

274
00:10:37,916 --> 00:10:40,216
we typically write
out four characters

275
00:10:40,216 --> 00:10:42,836
but maybe a computer can do
that a little more efficiently

276
00:10:42,836 --> 00:10:45,836
by storing the number of
years from some offset, 1970.

277
00:10:45,926 --> 00:10:47,066
Now, that would be
dumb in this case

278
00:10:47,066 --> 00:10:49,646
because then you can never
store other years but you--

279
00:10:49,646 --> 00:10:51,246
stands the reason that
the computer could come

280
00:10:51,246 --> 00:10:53,216
up a more efficient
representation

281
00:10:53,216 --> 00:10:55,456
than you plus it is
in deed standardized.

282
00:10:55,646 --> 00:10:58,456
You also have the ability
in MySQL to call functions

283
00:10:58,556 --> 00:11:00,956
that are built in to the
database engine itself.

284
00:11:00,956 --> 00:11:03,546
So, you can manipulate things
like dates and times just

285
00:11:03,546 --> 00:11:06,176
like you can actually do in
PHP and in other languages.

286
00:11:06,506 --> 00:11:09,306
But if you ask for a date,
you can get it back in--

287
00:11:09,376 --> 00:11:12,126
a date time rather, you get it
back in your preferred format,

288
00:11:12,126 --> 00:11:14,256
maybe it's 24-hour
time, 12-hour time,

289
00:11:14,546 --> 00:11:16,126
you get that kind
of flexibility.

290
00:11:16,296 --> 00:11:17,456
And so it's standardized

291
00:11:17,456 --> 00:11:19,836
and it's better typed
inside of your database.

292
00:11:19,836 --> 00:11:21,846
So, it's not just
some free form string

293
00:11:22,076 --> 00:11:24,146
that you're trusting
yourself and other developers

294
00:11:24,146 --> 00:11:26,066
to actually insert reliably.

295
00:11:26,346 --> 00:11:29,246
And in term of numbers,
TINYINT SMALLINT MEDIUMINT,

296
00:11:29,246 --> 00:11:30,136
INT, and BIGINT.

297
00:11:30,416 --> 00:11:32,206
So INT is pretty common to use.

298
00:11:32,276 --> 00:11:35,736
Facebook, for instance, use
INTs early on for its IDs.

299
00:11:35,736 --> 00:11:37,326
But what's the downside
of using an INT

300
00:11:37,326 --> 00:11:38,506
for things like Facebook IDs?

301
00:11:38,506 --> 00:11:38,656
Yeah?

302
00:11:38,656 --> 00:11:44,836
>> You eventually get a lot
of them might run out of the--

303
00:11:44,836 --> 00:11:48,446
what you can store in it.

304
00:11:48,716 --> 00:11:49,506
>> Yeah, exactly.

305
00:11:49,506 --> 00:11:51,376
So there is a finite
number of INTS,

306
00:11:51,376 --> 00:11:54,416
which in this case are 32-bit
values, and if they're unsigned,

307
00:11:54,416 --> 00:11:56,566
you have as many
as 4 billion values

308
00:11:56,566 --> 00:11:58,346
which frankly would be
a good problem to have

309
00:11:58,346 --> 00:12:00,576
if you site has more
than 4 billion users,

310
00:12:00,576 --> 00:12:02,906
and therefore you've overflowed
in INT, but you got to think

311
00:12:02,906 --> 00:12:04,966
about and in Facebook's
case, people are signing

312
00:12:04,966 --> 00:12:07,176
up for fake accounts
and they're already had

313
00:12:07,206 --> 00:12:09,176
like 800 million
legitimate users.

314
00:12:09,176 --> 00:12:10,686
So, they're getting
up there anyway,

315
00:12:10,976 --> 00:12:13,766
and so it was definitely
a scenario

316
00:12:13,766 --> 00:12:16,226
in which they would
not have wanted

317
00:12:16,226 --> 00:12:19,076
to use 32-bit INTs long-term
just because you're fixed

318
00:12:19,076 --> 00:12:23,066
in the maximum number values
that you can use there.

319
00:12:23,276 --> 00:12:25,426
So, frankly, these days,
if you're doing anything

320
00:12:25,426 --> 00:12:27,676
where you anticipate
a huge number of rows,

321
00:12:28,046 --> 00:12:29,666
it's not unreasonable
to use BIGINT,

322
00:12:29,806 --> 00:12:32,966
which would be a 64-bit value
and what kinds of scenarios,

323
00:12:32,966 --> 00:12:34,916
well, probably not
users, you don't need 2

324
00:12:34,916 --> 00:12:38,396
to the 64 possible users
in your system most likely.

325
00:12:38,726 --> 00:12:41,706
But if you're writing
things out like log files

326
00:12:41,976 --> 00:12:44,566
or you're recording
transactions of some sort

327
00:12:44,566 --> 00:12:47,096
where you have no idea how many
widgets you're going to sell

328
00:12:47,096 --> 00:12:49,986
in your e-commerce site, or
you just don't want to have

329
00:12:49,986 --> 00:12:53,976
to bound yourself, it's probably
worth spending an extra 32 bits

330
00:12:53,976 --> 00:12:56,456
just to ensure that you'll
have a higher probability

331
00:12:56,456 --> 00:12:57,826
of uniqueness long-term,

332
00:12:58,046 --> 00:13:00,356
especially when consider
the auto_increment feature.

333
00:13:00,356 --> 00:13:02,406
And this is kind of one of
the little downsides of it.

334
00:13:02,406 --> 00:13:05,236
What was the auto_increment
feature of a numeric field?

335
00:13:05,326 --> 00:13:05,436
Yeah?

336
00:13:05,926 --> 00:13:09,186
>> If you add auto_increment,
it's automatically going to add

337
00:13:09,186 --> 00:13:11,126
into itself, so you're
going to have, say,

338
00:13:11,126 --> 00:13:12,686
an automatic unique ID--

339
00:13:12,686 --> 00:13:13,116
>> Exactly.

340
00:13:13,776 --> 00:13:15,686
>> -- so you don't have
to query the database

341
00:13:15,756 --> 00:13:18,016
with the largest number,
take it back to the server,

342
00:13:18,066 --> 00:13:19,456
plus one, and then INSERT.

343
00:13:19,716 --> 00:13:22,096
>> Exactly, so with
auto_increment, by contrast,

344
00:13:22,096 --> 00:13:24,716
you can tell the
database, this field here--

345
00:13:24,826 --> 00:13:28,216
let's call it ID as we did with
the user's table last week.

346
00:13:28,366 --> 00:13:30,546
You can just say, this should
be number and it should start

347
00:13:30,546 --> 00:13:32,606
at one and it should
forever auto_increment

348
00:13:32,606 --> 00:13:34,546
for each additional
insertion that I do.

349
00:13:34,786 --> 00:13:36,736
And that's fine and you
can get up to 2 billion

350
00:13:36,736 --> 00:13:39,726
with signed numbers, 4 billion
with unsigned, or 2 to the 64

351
00:13:39,726 --> 00:13:41,456
with unsigned 64-bit numbers.

352
00:13:41,836 --> 00:13:44,916
But, what you don't get is
the ability to reuse numbers.

353
00:13:44,916 --> 00:13:47,706
So, if you delete, for instance,
a user from that table,

354
00:13:47,946 --> 00:13:50,466
their number is not
going to get reused.

355
00:13:50,466 --> 00:13:52,996
So you could end up getting
sparsely-populated data,

356
00:13:52,996 --> 00:13:55,786
which isn't necessarily a bad
thing, but that just means

357
00:13:55,786 --> 00:13:58,586
that even though you might
not have 4 billion users,

358
00:13:58,586 --> 00:14:01,436
you might have used 4 billion
IDs, it just so happens

359
00:14:01,436 --> 00:14:03,666
that a bunch of those people
deleted their accounts.

360
00:14:03,666 --> 00:14:06,886
Now, you could go through and
change around people's IDs

361
00:14:07,096 --> 00:14:08,636
but if-- I mean, if
you do use Facebook,

362
00:14:08,636 --> 00:14:10,156
just imagine how many
thing would break

363
00:14:10,156 --> 00:14:13,436
if you started changing people's
IDs, any URLs people had copied

364
00:14:13,436 --> 00:14:16,196
and pasted around the world
containing a user ID would now

365
00:14:16,196 --> 00:14:19,206
be invalid if your re--
changing people IDs around.

366
00:14:19,206 --> 00:14:20,536
So that's probably a bad idea.

367
00:14:20,916 --> 00:14:23,496
And frankly, it's just
expensive to go back and figure

368
00:14:23,496 --> 00:14:25,066
out where are the
holes in your data,

369
00:14:25,066 --> 00:14:28,316
so most databases just kind of
forge ahead blindly to higher

370
00:14:28,316 --> 00:14:30,676
and higher numbers so at least
you have more wiggle room

371
00:14:30,796 --> 00:14:32,496
with something like
a 64-bit value.

372
00:14:32,596 --> 00:14:33,796
So those are some
of the gotchas.

373
00:14:33,796 --> 00:14:37,246
All right, so floating point
numbers doubles are what you

374
00:14:37,246 --> 00:14:38,836
would expect in most languages.

375
00:14:38,836 --> 00:14:41,966
Decimal is an interesting one
and it will come up probably

376
00:14:41,966 --> 00:14:44,476
in the context for you of
project one which will talk

377
00:14:44,476 --> 00:14:48,026
about tonight, where you want o
represent money somehow, dollars

378
00:14:48,026 --> 00:14:49,996
and cents for at
least US currency.

379
00:14:50,316 --> 00:14:52,706
What's the problem with
representing money in general

380
00:14:52,706 --> 00:14:56,006
with a floating point value
in any language really, yeah?

381
00:14:56,136 --> 00:14:59,006
>> If you ever want to display
it and it has a cent value

382
00:14:59,006 --> 00:15:03,856
that runs like 90 cents or 80
cents, it cuts off the last zero

383
00:15:03,856 --> 00:15:06,706
and it just looks like $12.9
or something like that.

384
00:15:06,706 --> 00:15:07,446
>> OK. So, sure.

385
00:15:07,446 --> 00:15:09,536
So there is this
aesthetic issue involved

386
00:15:09,536 --> 00:15:12,406
in floating point numbers where
if the amount is a dollar 90,

387
00:15:12,526 --> 00:15:14,166
as we would see it on a store,

388
00:15:14,456 --> 00:15:16,456
well the zero is not
strictly necessary

389
00:15:16,456 --> 00:15:17,836
for that number's
representation.

390
00:15:17,836 --> 00:15:19,306
So a computer, by default,

391
00:15:19,306 --> 00:15:22,616
might just do dollar
sign 1.9 and that's it.

392
00:15:22,616 --> 00:15:24,336
So that problem we
can fix pretty easily

393
00:15:24,366 --> 00:15:28,426
by just formatting it as a
correct string with two digits.

394
00:15:28,646 --> 00:15:30,946
But there's an underlying
problem whenever you use

395
00:15:30,986 --> 00:15:34,136
floating point values,
whether it's a float or double,

396
00:15:34,136 --> 00:15:37,936
32-bit or 64-bit real number.

397
00:15:38,726 --> 00:15:43,126
What's the problem?

398
00:15:43,236 --> 00:15:46,176
Why should you not store
money as floats or doubles?

399
00:15:46,676 --> 00:15:48,766
Yeah, Jack.

400
00:15:49,056 --> 00:15:50,126
>> Rounding issues.

401
00:15:50,656 --> 00:15:55,056
>> Rounding issues,
what do you mean?

402
00:15:55,516 --> 00:16:01,816
>> It's like-- from my
experience with floats

403
00:16:02,066 --> 00:16:09,766
and doubles, occasionally
if you try to store money in

404
00:16:09,766 --> 00:16:14,166
or any value in, and you want to
get down to the exact value back

405
00:16:14,166 --> 00:16:20,036
in the end, you're going to end
up with 0.999 and something--

406
00:16:20,036 --> 00:16:20,706
>> Good. Exactly.

407
00:16:20,706 --> 00:16:23,926
In fact, there is an
infinite number of numbers

408
00:16:24,016 --> 00:16:28,636
that you cannot represent
precisely using a floating point

409
00:16:28,636 --> 00:16:32,846
value, whether it's a 32-bit
float or a 64-bit double,

410
00:16:33,366 --> 00:16:35,386
and you can think
of it is follows,

411
00:16:35,386 --> 00:16:37,206
if you have a 32-bit value
or even a bigger one,

412
00:16:37,206 --> 00:16:40,546
a 64-bit value, there's still
a finite number of numbers

413
00:16:40,546 --> 00:16:43,566
that you can represent
with those bits.

414
00:16:43,566 --> 00:16:47,296
However, how many real
numbers are there in the world,

415
00:16:47,416 --> 00:16:50,406
where a real number, in
this case, is something

416
00:16:50,406 --> 00:16:54,186
with a decimal point and a
number before and after it?

417
00:16:54,186 --> 00:16:56,896
How many of those are there?

418
00:16:56,896 --> 00:16:57,176
>> Infinite.

419
00:16:57,346 --> 00:16:58,396
>> So there's an
infinite, right?

420
00:16:58,396 --> 00:17:03,256
But if you only have a
finite number of ways

421
00:17:03,806 --> 00:17:07,746
of representing real
numbers in a computer,

422
00:17:07,816 --> 00:17:09,546
as you do with the
float or double,

423
00:17:09,546 --> 00:17:12,696
but there's an infinite number
of them, you're going to have

424
00:17:12,696 --> 00:17:15,366
to round or cut corner
somewhere.

425
00:17:15,366 --> 00:17:20,596
So the problem of representing
something important, like money,

426
00:17:20,596 --> 00:17:24,326
using floating point values,
is that you might be trying

427
00:17:24,416 --> 00:17:26,716
to represent a dollar 90.

428
00:17:26,716 --> 00:17:31,936
But you know what, the closest
the computer can actually get

429
00:17:32,076 --> 00:17:44,016
using its bits is $1.89
cents or $1.89.9999 cents,

430
00:17:44,016 --> 00:17:45,196
which is not exactly right.

431
00:17:45,196 --> 00:17:50,056
So, how many of you have ever
seen, "Office Space" the movie?

432
00:17:50,056 --> 00:17:51,236
Just you and me?

433
00:17:51,236 --> 00:17:52,216
Just Louis and me?

434
00:17:52,216 --> 00:17:56,036
Or "Superman 3",
even less likely?

435
00:17:56,036 --> 00:17:56,766
Chris [assumed spelling]
saw that one as well.

436
00:17:56,766 --> 00:17:59,996
OK. So, if you haven't, this is
a good excuse to watch a movie.

437
00:17:59,996 --> 00:18:01,156
You can say it's for class.

438
00:18:01,156 --> 00:18:04,336
Watch the movie, "Office Space",
which is the funnier of the two.

439
00:18:04,336 --> 00:18:05,506
And there is a scene
in this movie--

440
00:18:05,506 --> 00:18:08,846
this is not a spoiler at all,
but there is a scene in this--

441
00:18:08,846 --> 00:18:10,726
it's kind of a spoiler,
but you--

442
00:18:10,726 --> 00:18:11,736
by now, you should
have seen this.

443
00:18:11,736 --> 00:18:14,086
It came out like 10 years ago.

444
00:18:14,476 --> 00:18:16,056
So, there is a scene
in the movie

445
00:18:16,056 --> 00:18:20,106
where these guys are trying to
steal money from their company

446
00:18:20,106 --> 00:18:26,816
because the company's
computers are not taking

447
00:18:26,866 --> 00:18:30,166
into account this kind
of imprecision of values.

448
00:18:30,166 --> 00:18:31,446
So, essentially,
they write a program

449
00:18:31,446 --> 00:18:35,226
that says anytime there is an
amount, like a dollar point 89

450
00:18:35,406 --> 00:18:39,586
or rather, let's say
a dollar 90.00001,

451
00:18:40,386 --> 00:18:43,416
which would normally be shown
as just a dollar and 90.

452
00:18:43,416 --> 00:18:45,826
There're still some fractions
of cents there, right?

453
00:18:45,826 --> 00:18:50,146
There's an opportunity to
steal those fractions of cents

454
00:18:50,146 --> 00:18:52,176
but the company would never
know because all they see

455
00:18:52,176 --> 00:18:53,886
on their paperwork is 1.90.

456
00:18:53,886 --> 00:18:57,406
So, they write a program to
steal all of those fractions

457
00:18:57,406 --> 00:19:00,036
of pennies and hilarity
ensues as a result.

458
00:19:00,036 --> 00:19:03,816
So a real world example of
floating point imprecision.

459
00:19:03,896 --> 00:19:05,496
Excellent movie called
"Office Space."

460
00:19:05,496 --> 00:19:05,746
All right.

461
00:19:05,746 --> 00:19:08,516
So, blobs, we haven't used them,
not really going to use them,

462
00:19:08,516 --> 00:19:10,246
but what could they be used for?

463
00:19:10,246 --> 00:19:10,313
Yeah.

464
00:19:10,313 --> 00:19:10,486
>> Images.

465
00:19:10,486 --> 00:19:10,876
>> Images, yes.

466
00:19:10,876 --> 00:19:14,016
So, binary objects of some
sort, binary large objects.

467
00:19:14,016 --> 00:19:16,636
But again, I would generally
recommend that if you need

468
00:19:16,886 --> 00:19:19,746
to store binary data, you store
in a folder on the file system,

469
00:19:19,746 --> 00:19:21,886
and then store in your
database instead what?

470
00:19:21,886 --> 00:19:21,986
Yeah.

471
00:19:21,986 --> 00:19:22,196
>> Path.

472
00:19:22,196 --> 00:19:22,736
>> Just the path.

473
00:19:22,736 --> 00:19:26,296
The file name or the path to it,
whatever directory it happens

474
00:19:26,326 --> 00:19:27,736
to be stored and plus its
filename, something like that.

475
00:19:27,766 --> 00:19:27,946
All right.

476
00:19:27,976 --> 00:19:29,386
So, here are just a couple
of functions that are germane

477
00:19:29,416 --> 00:19:30,376
to the data types
built in to MySQL.

478
00:19:30,406 --> 00:19:31,606
If you go to that URL there,
you'll see a whole bunch

479
00:19:31,636 --> 00:19:32,806
of others that are built
in, things like averaging

480
00:19:32,836 --> 00:19:33,496
and summation and the like.

481
00:19:33,526 --> 00:19:34,486
But here's just two
very common ones,

482
00:19:34,516 --> 00:19:35,446
date formatting and
time formatting.

483
00:19:35,476 --> 00:19:36,736
So long as you have stored
a date or time in one

484
00:19:36,766 --> 00:19:37,936
of those date time-related
fields we looked at,

485
00:19:37,966 --> 00:19:39,136
literally date, time, date,
time, year and so forth,

486
00:19:39,166 --> 00:19:40,336
you can call this
function in your SQL query

487
00:19:40,366 --> 00:19:41,596
to get back a different
representation of that.

488
00:19:41,626 --> 00:19:42,466
So, how does this
work in practice?

489
00:19:42,496 --> 00:19:43,996
Well, instead of doing something
like SELECT sales date From--

490
00:19:44,026 --> 00:19:45,436
OK, auto complete, just going
to keep doing this to us.

491
00:19:45,466 --> 00:19:46,306
SELECT salesdate FROM mytable.

492
00:19:46,336 --> 00:19:47,626
If instead, you want to not
get back something like--

493
00:19:47,656 --> 00:19:48,676
and actually let me
change it to this.

494
00:19:48,706 --> 00:19:49,156
If you don't want

495
00:19:49,186 --> 00:19:50,506
to get something back that's
not very user-friendly

496
00:19:50,536 --> 00:19:52,996
like 2012-07-01 at 23 hours,
1 minute, and 23 seconds,

497
00:19:53,026 --> 00:19:54,106
like this is what will
come back by default,

498
00:19:54,136 --> 00:19:54,976
that's not a very user friendly.

499
00:19:55,066 --> 00:19:57,866
You could get that back and
then use something like PHP

500
00:19:57,866 --> 00:20:00,636
to massage it into something
a little more user-friendly

501
00:20:00,946 --> 00:20:09,426
or you can actually say
DATE_FORMAT, passing this in,

502
00:20:09,626 --> 00:20:13,426
and then passing in
something like Y-M-D

503
00:20:13,426 --> 00:20:18,256
to just get back the month, or
you could something, like M/D,

504
00:20:18,256 --> 00:20:21,196
to just get back
month/day and so forth.

505
00:20:21,196 --> 00:20:23,016
And you can do something
similar with--

506
00:20:23,016 --> 00:20:26,966
you can do something
similar with times as well.

507
00:20:26,966 --> 00:20:27,746
All right.

508
00:20:28,896 --> 00:20:32,006
So, just a couple of
functions but realize

509
00:20:32,006 --> 00:20:33,236
that functionality exists.

510
00:20:33,776 --> 00:20:35,396
But now, let's take a look

511
00:20:35,396 --> 00:20:37,906
at one other fundamentally
different approach

512
00:20:37,906 --> 00:20:39,256
from last time.

513
00:20:39,256 --> 00:20:41,106
So, here's a snippet
of representative code

514
00:20:41,426 --> 00:20:44,366
that actually doesn't do all
that much besides connecting

515
00:20:44,366 --> 00:20:46,566
to a database, but uses
different functions

516
00:20:46,566 --> 00:20:47,706
than we used last time.

517
00:20:48,106 --> 00:20:50,536
What were the functions
we used last week

518
00:20:50,536 --> 00:20:54,366
to implement our various
log-in examples to connect

519
00:20:54,366 --> 00:20:57,646
to a database and select
the database and so forth?

520
00:20:58,496 --> 00:20:58,596
Yeah.

521
00:20:58,756 --> 00:21:02,066
>> You do MySQL and
then Connect.

522
00:21:02,066 --> 00:21:03,186
>> Good, so mysql_connect.

523
00:21:03,396 --> 00:21:07,596
Then there is mysql_select_db
for selecting a database.

524
00:21:07,596 --> 00:21:11,006
Then there is mysql_query
for querying the database.

525
00:21:11,116 --> 00:21:12,436
And there's a whole
bunch of others.

526
00:21:12,436 --> 00:21:15,606
In fact, if we look this
up in the documentation,

527
00:21:15,606 --> 00:21:17,266
let me go ahead and
pull up a browser.

528
00:21:17,466 --> 00:21:20,436
Let me go ahead and
search for MySQL PHP.

529
00:21:20,436 --> 00:21:23,856
That will pull up the
manual page on php.net

530
00:21:23,886 --> 00:21:26,936
that has a listing of all of
the various functions here,

531
00:21:27,406 --> 00:21:30,096
and the ones we're are
about to get back here,

532
00:21:30,606 --> 00:21:38,586
albeit a little slowly every
time we try to do this.

533
00:21:40,156 --> 00:21:40,846
There we go.

534
00:21:41,456 --> 00:21:42,056
All right.

535
00:21:42,056 --> 00:21:45,606
So, you can see here a list of
most of the MySQL functions.

536
00:21:45,606 --> 00:21:46,646
The rest are cut off.

537
00:21:46,646 --> 00:21:48,246
And there's more here
than we probably are going

538
00:21:48,246 --> 00:21:49,276
to care to ever use.

539
00:21:49,276 --> 00:21:52,156
But mysql_connect is
up there at the top.

540
00:21:52,556 --> 00:21:55,006
Mysql_fetch_assoc is there.

541
00:21:55,006 --> 00:21:56,896
There is another one,
mysql_fetch_array,

542
00:21:56,896 --> 00:21:57,796
which allows you to get back

543
00:21:57,796 --> 00:22:00,016
in numeric array if
you would prefer.

544
00:22:00,256 --> 00:22:02,066
There's mysql_fetch_object
if you're familiar

545
00:22:02,066 --> 00:22:03,486
with object-oriented
programming.

546
00:22:03,486 --> 00:22:06,786
You can actually have the
database, or rather the library,

547
00:22:07,036 --> 00:22:11,156
return to you a PHP object
inside of which are properties

548
00:22:11,156 --> 00:22:13,346
that represent the
various cells from a row.

549
00:22:13,526 --> 00:22:14,866
And a whole bunch
of other ones here.

550
00:22:14,866 --> 00:22:16,326
There's mysql_num_rows.

551
00:22:16,326 --> 00:22:18,506
We use that to figure out
how many rows had come back

552
00:22:18,506 --> 00:22:19,966
in our result set and so forth.

553
00:22:20,186 --> 00:22:24,796
But there is a downside of using
this built-in library in PHP.

554
00:22:24,796 --> 00:22:26,566
One is the function
names are atrocious.

555
00:22:26,916 --> 00:22:32,896
The worst of them is recall
mysql_real_escape_string.

556
00:22:33,036 --> 00:22:35,696
Now, just because it's
got awfully named,

557
00:22:35,696 --> 00:22:38,596
it doesn't mean it's
a bad thing to use.

558
00:22:38,596 --> 00:22:41,946
In fact, not using it is a
far worse sin than having use

559
00:22:41,946 --> 00:22:44,046
that particular name
since you'll be vulnerable

560
00:22:44,186 --> 00:22:46,036
to SQL injection attacks,
which we talked briefly

561
00:22:46,036 --> 00:22:48,076
about last time, but we
will look again more closely

562
00:22:48,076 --> 00:22:50,696
at the end of the semester
when we focus on security.

563
00:22:51,096 --> 00:22:53,456
But also, you're doing
a couple of things

564
00:22:53,456 --> 00:22:54,666
that are little short-sighted.

565
00:22:54,666 --> 00:22:59,716
One, you have to--
you're tied now to MySQL.

566
00:22:59,716 --> 00:23:03,886
If you ever decide or your new
boss ever decides or whatever,

567
00:23:04,066 --> 00:23:07,256
someone decides to change
the database from MySQL

568
00:23:07,256 --> 00:23:10,086
to something called PostgreSQL
or Oracle or the like,

569
00:23:10,326 --> 00:23:12,756
you literally have to go through
and rewrite all of your code

570
00:23:12,756 --> 00:23:15,376
or at least do a massive find
and replace, and even then,

571
00:23:15,376 --> 00:23:16,506
some things are likely to break.

572
00:23:16,726 --> 00:23:17,856
So, you've just committed
yourself

573
00:23:17,856 --> 00:23:19,866
from day 1 to using MySQL.

574
00:23:19,866 --> 00:23:22,716
And maybe that's fine and maybe
9 times out of 10 that's fine,

575
00:23:23,036 --> 00:23:26,456
but for the other scenario is
it really worth the aggravation

576
00:23:26,456 --> 00:23:27,096
down the line?

577
00:23:27,466 --> 00:23:29,766
But more compellingly
too, it's so much easier

578
00:23:29,766 --> 00:23:30,946
to make mistakes, right?

579
00:23:30,946 --> 00:23:34,066
If you-- as soon as you start
writing code for project 1,

580
00:23:34,066 --> 00:23:37,286
which involves using a database,
the very first time you forget

581
00:23:37,516 --> 00:23:39,596
to call
mysql_real_escape_string,

582
00:23:39,846 --> 00:23:42,186
that's all it takes for your
site to get compromised.

583
00:23:42,186 --> 00:23:44,946
And at that point,
it would be so nice

584
00:23:45,276 --> 00:23:48,716
if instead someone else could
do the scrubbing of data for you

585
00:23:48,716 --> 00:23:51,446
so that the burden is not on
every one of us in this room

586
00:23:51,446 --> 00:23:54,546
to have to call this particular
function every time we want

587
00:23:54,546 --> 00:23:56,696
to protect ourselves against
theses kinds of attacks.

588
00:23:57,046 --> 00:23:59,456
Better would be to use
a library that does

589
00:23:59,456 --> 00:24:01,266
that kind of escaping for us.

590
00:24:01,606 --> 00:24:04,246
So enter into the picture
something called PDO,

591
00:24:04,246 --> 00:24:07,836
portable data objects, which
refers to a code, like this,

592
00:24:08,416 --> 00:24:11,346
that abstracts away
the detail of what kind

593
00:24:11,346 --> 00:24:12,516
of database you're using.

594
00:24:12,816 --> 00:24:16,486
And by that, I mean, you can
use the exact same code to talk

595
00:24:16,486 --> 00:24:20,296
to MySQL, to talk to Oracle,
Microsoft Access even SQLite,

596
00:24:20,396 --> 00:24:22,646
which recall is just
a little binary file

597
00:24:22,736 --> 00:24:24,186
that you store locally on disk.

598
00:24:24,186 --> 00:24:26,516
So you have this layer
of abstraction now

599
00:24:26,516 --> 00:24:29,746
so that the code you write
doesn't have to change at all

600
00:24:29,986 --> 00:24:31,386
if you do change your database.

601
00:24:31,656 --> 00:24:34,156
The only thing you have
to change is a variable

602
00:24:34,316 --> 00:24:36,296
or an argument that
you pass in up here

603
00:24:36,296 --> 00:24:37,516
in the top, for instance.

604
00:24:37,516 --> 00:24:40,166
Notice that I've said
a variable called DSN,

605
00:24:40,166 --> 00:24:43,966
by convention
mysql:dbname=lecture.

606
00:24:43,966 --> 00:24:49,716
So that's really no different
from last time, host=127.0.0.1.

607
00:24:49,716 --> 00:24:53,266
And the only part of that line
that I would have to change

608
00:24:53,266 --> 00:24:55,096
if I move to something,
like Oracle,

609
00:24:55,446 --> 00:24:57,726
is the one mention of MySQL.

610
00:24:57,726 --> 00:25:00,686
So, essentially, that's a
unique string that identifies

611
00:25:00,946 --> 00:25:05,126
for this library what type
of driver to use, MySQL,

612
00:25:05,126 --> 00:25:08,206
Oracle of the like, and then
what the data base name is

613
00:25:08,206 --> 00:25:10,116
and the host actually is.

614
00:25:10,116 --> 00:25:12,396
And then the user and password
have nothing to do really

615
00:25:12,766 --> 00:25:14,386
with the type of
database you're using.

616
00:25:14,666 --> 00:25:15,746
So, what do I have now?

617
00:25:15,816 --> 00:25:18,036
So for those unfamiliar
with tries and catches,

618
00:25:18,076 --> 00:25:19,906
this simply has to do
with exception handling,

619
00:25:20,186 --> 00:25:22,436
so this doesn't have anything
fundamentally to do to this--

620
00:25:22,436 --> 00:25:26,816
with this idea of using this
library, but the way it works is

621
00:25:26,816 --> 00:25:28,676
that when you call a new PDO

622
00:25:29,066 --> 00:25:31,976
and the jargon here is
instantiating a new object

623
00:25:32,006 --> 00:25:33,206
of type PDO.

624
00:25:33,496 --> 00:25:35,546
You pass in a couple
of three arguments,

625
00:25:35,596 --> 00:25:37,976
the connection string,
user, and password.

626
00:25:38,276 --> 00:25:39,926
And then if something
goes wrong,

627
00:25:40,196 --> 00:25:43,306
rather than returning false
as is the case with a lot

628
00:25:43,306 --> 00:25:45,436
of PHP functions, rather
than triggering an error

629
00:25:45,536 --> 00:25:47,476
which generally forces
your program to die,

630
00:25:47,746 --> 00:25:50,906
it instead does what's
called "throws an exception".

631
00:25:51,126 --> 00:25:54,426
And if your code has not
been designed to catch any

632
00:25:54,456 --> 00:25:58,236
such exceptions, your program
will quit, right then and there,

633
00:25:58,236 --> 00:25:59,906
crashing on the user
effectively.

634
00:26:00,196 --> 00:26:04,036
So instead, if we say "prepare
to catch" a PDO exception,

635
00:26:04,036 --> 00:26:07,736
and if one happens, let's
call it $e, at least, you can,

636
00:26:07,736 --> 00:26:11,066
in a slightly more user-friendly
way, handle that kind of error.

637
00:26:11,066 --> 00:26:13,856
Now what might the error be,
username or password is wrong,

638
00:26:14,016 --> 00:26:16,436
MySQL is down, something
like that, but this line

639
00:26:16,436 --> 00:26:19,426
of code essentially,
becomes our equivalent now,

640
00:26:19,426 --> 00:26:21,156
of mysql_connect.

641
00:26:21,856 --> 00:26:25,606
Now, beyond this, and you'll
see in section tonight as well

642
00:26:25,606 --> 00:26:28,026
as with this tutorial which
we refer to in project 1,

643
00:26:28,496 --> 00:26:31,276
pretty much the-- all of the
lessons from last time and all

644
00:26:31,276 --> 00:26:33,826
of the lessons from tonight
about SQL itself still apply.

645
00:26:33,826 --> 00:26:36,056
This doesn't change SQL, this
doesn't change your selects

646
00:26:36,056 --> 00:26:37,836
or your updates or your
inserts or deletes,

647
00:26:38,116 --> 00:26:40,056
all it changes really
is how you initially,

648
00:26:40,056 --> 00:26:41,146
connect to the database,

649
00:26:41,316 --> 00:26:43,626
and it also just changes
what function you calls.

650
00:26:43,626 --> 00:26:45,966
Instead of mysql_query,
you're instead going

651
00:26:45,966 --> 00:26:48,136
to execute a function
called EXECUTE

652
00:26:48,226 --> 00:26:51,416
or another one called
PREPARE and the nice thing

653
00:26:51,416 --> 00:26:55,486
about PREPARE is that it
creates in database terminology,

654
00:26:55,486 --> 00:26:56,976
which called a prepared
statement.

655
00:26:57,446 --> 00:27:00,016
This is sort of like
a compiled SQL string,

656
00:27:00,226 --> 00:27:02,686
so if you have query
you execute a whole lot,

657
00:27:02,816 --> 00:27:06,256
select star from table
where foo equals bar.

658
00:27:06,496 --> 00:27:08,696
If you're constantly
executing that string--

659
00:27:08,746 --> 00:27:11,956
that query, but maybe only
one part of it is changing,

660
00:27:12,166 --> 00:27:14,756
maybe the value you're searching
for is what's changing,

661
00:27:14,756 --> 00:27:17,976
so select star from users
where ID equals something.

662
00:27:18,356 --> 00:27:21,716
If you want to prepare that
query but call it again

663
00:27:21,716 --> 00:27:23,576
and again and again, and
every time you call it,

664
00:27:23,576 --> 00:27:24,596
you just want to change what?

665
00:27:24,956 --> 00:27:26,836
The ID, that you're
searching for,

666
00:27:27,076 --> 00:27:29,946
you can call PDO's
PREPARE function

667
00:27:30,216 --> 00:27:33,596
and this will essentially
optimize that query

668
00:27:33,986 --> 00:27:37,106
to just have one part of it
changed on each iteration.

669
00:27:37,356 --> 00:27:40,186
And the other upside is
when you plug in an ID

670
00:27:40,186 --> 00:27:43,076
to a prepared statement or
really plug in any value

671
00:27:43,076 --> 00:27:44,216
to a prepared statement,

672
00:27:44,446 --> 00:27:46,216
guess what the library
does for you for free.

673
00:27:46,216 --> 00:27:46,306
Yeah.

674
00:27:46,796 --> 00:27:48,576
>> It escapes it.

675
00:27:48,986 --> 00:27:51,516
>> It escapes it, so
you no longer have

676
00:27:51,516 --> 00:27:53,266
to think about any of that.

677
00:27:53,426 --> 00:27:55,216
So understanding it
still good certainly,

678
00:27:55,436 --> 00:27:58,136
but you no longer have to
worry about trusting yourself

679
00:27:58,136 --> 00:28:00,296
to write a code that's
100% correct

680
00:28:00,446 --> 00:28:02,086
when it comes to security.

681
00:28:02,086 --> 00:28:04,556
So again, let's see some
examples tonight in section.

682
00:28:04,556 --> 00:28:07,076
This is what we'll have
use in project one,

683
00:28:07,276 --> 00:28:10,706
but realize this is just a layer
on top of those MySQL functions

684
00:28:10,706 --> 00:28:12,466
that come built into PHP.

685
00:28:12,466 --> 00:28:15,816
Now there's the PDO library
which just solves a couple

686
00:28:15,816 --> 00:28:18,626
of problems that you
might, otherwise, run into.

687
00:28:19,486 --> 00:28:24,236
All right so last time,
we had life pretty simple.

688
00:28:24,656 --> 00:28:27,396
We only had one table called
"Users" and what kinds

689
00:28:27,396 --> 00:28:29,206
of fields do we have
in our Users' table?

690
00:28:30,056 --> 00:28:30,166
Yeah?

691
00:28:30,556 --> 00:28:31,706
>> Username and password.

692
00:28:31,846 --> 00:28:33,376
>> Username, password
and what else?

693
00:28:34,066 --> 00:28:36,576
>> And we have user ID.

694
00:28:36,626 --> 00:28:38,256
>> We eventually
added a user ID.

695
00:28:38,256 --> 00:28:40,166
Why did we introduce
a user ID, yeah?

696
00:28:41,126 --> 00:28:43,236
>> To be the primary key.

697
00:28:43,446 --> 00:28:44,856
>> We wanted a primary key--

698
00:28:44,856 --> 00:28:47,536
a primary key, which you
uniquely identifies every row

699
00:28:47,536 --> 00:28:49,646
in the table, but why
not just use username?

700
00:28:49,646 --> 00:28:56,806
>> Because username
may be duplicated

701
00:28:56,806 --> 00:28:57,686
and maybe the unique one.

702
00:28:57,686 --> 00:29:00,556
>> OK, well let's assume
that usernames, by design,

703
00:29:00,556 --> 00:29:03,016
will be unique just like the
numeric IDs will be unique

704
00:29:03,016 --> 00:29:05,796
but what was better about
using an integer for the ID,

705
00:29:05,796 --> 00:29:09,666
the unique identifier, as
opposed to just trusting

706
00:29:09,666 --> 00:29:11,396
that usernames are unique?

707
00:29:11,396 --> 00:29:11,466
Yeah?

708
00:29:12,406 --> 00:29:14,666
>> The [inaudible] will
be sending a lot more data

709
00:29:14,666 --> 00:29:16,306
if you send the username.

710
00:29:16,306 --> 00:29:18,436
>> Yeah, exactly, it's
really, a performance thing.

711
00:29:18,436 --> 00:29:21,306
If users can have
variable length usernames,

712
00:29:21,306 --> 00:29:24,266
three characters, four, maybe
even 12 characters long,

713
00:29:24,486 --> 00:29:26,996
if you want to look up a user,
that means you have to search

714
00:29:27,026 --> 00:29:29,656
for 3 or 4 or 12 characters.

715
00:29:29,856 --> 00:29:33,616
By contrast, if you also
assigned users a unique numeric

716
00:29:33,616 --> 00:29:35,646
identifier like an
INT or BIGINT,

717
00:29:35,946 --> 00:29:39,296
then you are spending exactly
4 bytes of maybe 8 bytes

718
00:29:39,296 --> 00:29:42,936
on every user, so you again have
a fixed width table effectively

719
00:29:42,936 --> 00:29:45,516
and it's just much more
efficient in general to search

720
00:29:45,516 --> 00:29:48,716
for integers, whether
they're 32 or 64-bits,

721
00:29:48,716 --> 00:29:51,446
than for arbitrary
length strings.

722
00:29:52,026 --> 00:29:55,166
So that's the reason we
ultimately introduce the ID

723
00:29:55,166 --> 00:29:56,696
field but the problem too,

724
00:29:56,696 --> 00:29:58,806
is that we started telling
the story about normalization,

725
00:29:59,206 --> 00:30:01,676
and we started talking about,
"Well, what if users have cities

726
00:30:01,676 --> 00:30:03,196
and states and zip codes?"

727
00:30:03,486 --> 00:30:05,466
There's obviously going
to be some redundancy,

728
00:30:05,706 --> 00:30:09,276
and so we propose factoring
out what, from a user's table,

729
00:30:09,276 --> 00:30:12,436
if we also want to store their
addresses, cities and states

730
00:30:12,436 --> 00:30:13,356
and zips and so forth.

731
00:30:14,286 --> 00:30:16,196
Well, how did we solve
that inefficiency?

732
00:30:16,446 --> 00:30:16,566
Yeah.

733
00:30:17,086 --> 00:30:20,596
>> We store the zip codes
instead of the actual address.

734
00:30:21,106 --> 00:30:21,716
>> Yeah, good.

735
00:30:21,716 --> 00:30:24,836
So rather than store
Cambridge, mass, O2138.

736
00:30:24,836 --> 00:30:26,426
Cambridge, mass, O2138.

737
00:30:26,426 --> 00:30:29,096
Cambridge, mass-- so it's
already getting boring to say

738
00:30:29,096 --> 00:30:31,866
and imagine just storing that
again and again and again

739
00:30:31,866 --> 00:30:33,896
for every user, we can
do better than this.

740
00:30:33,896 --> 00:30:37,426
Well, what do we minimally need
to identify user's locales,

741
00:30:37,726 --> 00:30:38,936
probably just the zip code.

742
00:30:39,246 --> 00:30:42,036
So now we could have a
zips table that has city,

743
00:30:42,086 --> 00:30:45,696
state and zip as its columns,
but the only thing we put

744
00:30:45,696 --> 00:30:47,736
in the user's table is
which of those three fields?

745
00:30:48,146 --> 00:30:49,056
>> Zip code.

746
00:30:49,276 --> 00:30:50,736
>> The zip code, exactly.

747
00:30:51,016 --> 00:30:53,196
So, that's nice but it feels

748
00:30:53,196 --> 00:30:56,196
like the price I'm now paying
is every time I want to look

749
00:30:56,196 --> 00:30:57,986
up a user and figure
out their address,

750
00:30:57,986 --> 00:30:59,946
now I have to do a
select on the user's table

751
00:30:59,946 --> 00:31:03,466
and also another
on the zip's table.

752
00:31:03,466 --> 00:31:05,756
And my God, imagine if we
did this with other pieces

753
00:31:05,756 --> 00:31:07,726
of data as well, all right?

754
00:31:07,726 --> 00:31:10,706
If we have other common
pieces of data, would we have

755
00:31:10,706 --> 00:31:12,976
to now select from a third
table or fourth table?

756
00:31:12,976 --> 00:31:15,466
It feels like we're solving one
problem but introducing others,

757
00:31:15,466 --> 00:31:17,166
namely ones related
to performance.

758
00:31:17,476 --> 00:31:19,806
But thankfully, relational
databases allow you

759
00:31:19,806 --> 00:31:21,606
to join tables together.

760
00:31:21,606 --> 00:31:23,596
So let's take a look
at this as an example.

761
00:31:23,876 --> 00:31:27,196
So at top right here
there're a couple of tables.

762
00:31:27,196 --> 00:31:30,276
This is taken from a
site called w3schools.com

763
00:31:30,276 --> 00:31:32,746
which has varying
degrees of accuracy

764
00:31:32,746 --> 00:31:35,306
but this particular example
is decent to work with.

765
00:31:35,626 --> 00:31:37,776
We have two tables,
employees and orders.

766
00:31:37,996 --> 00:31:42,106
And in the employee's table we
have a field called Employee_ID

767
00:31:42,296 --> 00:31:43,866
and then another
field called Name.

768
00:31:43,866 --> 00:31:48,246
So it looks like Ola has an
employee ID of 01, Tove has one

769
00:31:48,246 --> 00:31:50,786
of O2, Stephen of
O3, and so forth,

770
00:31:50,936 --> 00:31:52,446
so nothing too unreasonable
there.

771
00:31:52,676 --> 00:31:57,026
And now under Orders, we
have a product ID, a product,

772
00:31:57,026 --> 00:32:00,356
and an employee ID table
that apparently is specifying

773
00:32:00,356 --> 00:32:02,706
who sold these products.

774
00:32:02,846 --> 00:32:05,096
So it's just some kind of
log, maybe it's for commission

775
00:32:05,096 --> 00:32:06,756
so you know whom-- who
to pay and how much

776
00:32:06,756 --> 00:32:08,516
to pay of who sold what.

777
00:32:08,516 --> 00:32:11,296
Well, let's start finding some
faults with this design and see

778
00:32:11,296 --> 00:32:13,716
if we can't fix but let's
also consider what they did

779
00:32:13,716 --> 00:32:14,886
that wasn't bad.

780
00:32:15,316 --> 00:32:17,326
So at the top, the
employee's table,

781
00:32:17,326 --> 00:32:18,916
if you were doing this table,

782
00:32:18,916 --> 00:32:20,526
what would you have
done differently?

783
00:32:21,076 --> 00:32:26,216
Isaac [assumed spelling],
that counts as a hand up.

784
00:32:28,416 --> 00:32:28,536
>> Well--

785
00:32:28,706 --> 00:32:30,706
>> Find faults with the
employee's table at the moment.

786
00:32:31,516 --> 00:32:35,876
[ Inaudible Remark ]

787
00:32:36,376 --> 00:32:40,406
Sir, that's actually--
I forgot your name.

788
00:32:40,966 --> 00:32:41,096
>> Scoot.

789
00:32:41,846 --> 00:32:42,476
>> Scott, yes.

790
00:32:43,886 --> 00:32:45,086
Find faults with the
employee's table.

791
00:32:46,516 --> 00:32:49,606
[ Inaudible Remark ]

792
00:32:50,106 --> 00:32:52,846
OK, good. So we might as--
we have a database, right?

793
00:32:52,846 --> 00:32:55,276
We might as well tease apart,
first name and last name

794
00:32:55,276 --> 00:32:57,436
because storing it with
just the commas is just

795
00:32:57,436 --> 00:32:58,486
completely unnecessary.

796
00:32:58,486 --> 00:33:01,056
It's as though we've
created a fake column by sort

797
00:33:01,056 --> 00:33:04,536
of merging together the idea of
a CSV for the database table,

798
00:33:04,756 --> 00:33:08,396
why not just put last name
field and first name field?

799
00:33:08,396 --> 00:33:09,106
So we could do that.

800
00:33:09,106 --> 00:33:09,896
So that's pretty good.

801
00:33:10,206 --> 00:33:11,346
All right, and what else?

802
00:33:11,346 --> 00:33:14,886
Well, for some reason, these
employee IDs are written as 01,

803
00:33:14,886 --> 00:33:19,086
02 which suggests that they're
using like a CHAR or a VARCHAR.

804
00:33:19,086 --> 00:33:20,626
And that's just dumb
at this point, right?

805
00:33:20,626 --> 00:33:23,176
It should just be a number
and if it's an actual number,

806
00:33:23,176 --> 00:33:25,566
it would suggest that
there's not a prefix of zero

807
00:33:25,566 --> 00:33:27,766
but now we're just kind of
inferring from the example.

808
00:33:27,766 --> 00:33:29,006
But that could be
a potential fault.

809
00:33:29,506 --> 00:33:30,846
So now, what about orders?

810
00:33:30,846 --> 00:33:33,946
Well, it looks like here
that what they have done

811
00:33:33,946 --> 00:33:37,916
which is good, is they have
not put the employee's names

812
00:33:38,026 --> 00:33:39,136
in the order's table.

813
00:33:39,276 --> 00:33:42,436
They've just put the
employee's employee IDs.

814
00:33:42,796 --> 00:33:46,596
So this is similar conceptually
then to not putting city, state,

815
00:33:46,656 --> 00:33:49,526
zip in a user's table
just putting zip.

816
00:33:49,686 --> 00:33:52,656
So that seems like they made a
smart decision, what was kind

817
00:33:52,656 --> 00:33:53,666
of stupid though, nonetheless?

818
00:33:54,546 --> 00:33:56,276
Find fault with the
order's table now.

819
00:33:56,666 --> 00:34:00,086
Jack, that counts as a hand up.

820
00:34:01,066 --> 00:34:03,006
Everyone has to stop
scratching their head.

821
00:34:03,906 --> 00:34:05,526
Find faults with
the order's table.

822
00:34:05,526 --> 00:34:15,696
>> I don't really see much fault
in the order's table, you know,

823
00:34:15,696 --> 00:34:21,126
except for maybe that old
one, like the employee IDs

824
00:34:22,596 --> 00:34:25,986
or being 01, they have 01--

825
00:34:25,986 --> 00:34:27,846
>> I'm going to decline
that one.

826
00:34:27,916 --> 00:34:29,246
So no, I can't reuse that idea.

827
00:34:29,686 --> 00:34:30,926
There're still faults
here though.

828
00:34:30,926 --> 00:34:34,016
Yeah. You could have
an order ID.

829
00:34:34,016 --> 00:34:34,986
Yes, so we have no way

830
00:34:34,986 --> 00:34:36,666
of uniquely identifying
these orders

831
00:34:36,666 --> 00:34:38,436
and maybe there's
no use case for it

832
00:34:38,436 --> 00:34:40,356
but that does seem
inconsistent with the idea

833
00:34:40,356 --> 00:34:42,666
of having employee
IDs and product IDs.

834
00:34:43,166 --> 00:34:45,416
And even beyond that,
there's still--

835
00:34:45,416 --> 00:34:48,896
there's pretty bad fault
in the order's table.

836
00:34:48,896 --> 00:34:49,206
Axel.

837
00:34:49,566 --> 00:34:51,456
>> I don't know if
this counts as a fault

838
00:34:51,456 --> 00:34:55,546
but I really wouldn't order
the tables from the employee ID

839
00:34:55,546 --> 00:34:58,096
and order from the
product ID [inaudible].

840
00:34:58,256 --> 00:35:00,636
>> OK, order the
table from product ID.

841
00:35:00,636 --> 00:35:03,446
But isn't that what
they've done it on the left?

842
00:35:03,446 --> 00:35:08,726
>> Yes, but what about
to the 35 and 36?

843
00:35:09,146 --> 00:35:09,846
>> Oh, to the 30--

844
00:35:09,846 --> 00:35:11,126
>> That's just arbitrarily.

845
00:35:11,576 --> 00:35:14,736
>> I see. So let's-- it
is an arbitrary example

846
00:35:14,736 --> 00:35:15,816
and maybe those were
also got deleted

847
00:35:15,816 --> 00:35:17,246
because they were canceled
even though that's kind

848
00:35:17,246 --> 00:35:18,376
of a lot of cancellations.

849
00:35:18,476 --> 00:35:18,676
Jack.

850
00:35:18,976 --> 00:35:19,576
>> I figured it.

851
00:35:19,616 --> 00:35:21,926
They have product ID,
and the product stored

852
00:35:21,926 --> 00:35:23,376
in the table while-- yes,

853
00:35:23,476 --> 00:35:25,636
they have a product ID they
can store that somewhere else.

854
00:35:25,636 --> 00:35:25,946
>> Exactly.

855
00:35:26,516 --> 00:35:30,976
[ Inaudible Remark ]

856
00:35:31,476 --> 00:35:34,616
Exactly. So whoever made this
table kind of got it half right

857
00:35:34,616 --> 00:35:36,906
and they did factor
out the employee ID

858
00:35:36,906 --> 00:35:39,246
and put only the employee
ID here, but then they sort

859
00:35:39,246 --> 00:35:41,266
of forgot that lesson
learned and seemed

860
00:35:41,266 --> 00:35:43,936
to be duplicating product name,
product name, product name,

861
00:35:43,936 --> 00:35:45,766
product name, and
you can't see it here

862
00:35:45,766 --> 00:35:48,646
but suppose another person
orders another printer.

863
00:35:48,646 --> 00:35:50,596
Well, what's going to
end up in the next row?

864
00:35:50,806 --> 00:35:53,676
Well, it's going to be a product
ID of 234 but then the word

865
00:35:53,676 --> 00:35:55,316
"printer" in the second column

866
00:35:55,316 --> 00:35:58,466
and then whoever the employee ID
is that's sold on that printer.

867
00:35:58,696 --> 00:36:00,956
So we probably should
introduce another table.

868
00:36:00,956 --> 00:36:03,026
What should that table
be called reasonably?

869
00:36:03,746 --> 00:36:03,996
>> Products.

870
00:36:04,466 --> 00:36:05,146
>> Products, right?

871
00:36:05,146 --> 00:36:06,456
So like a product table.

872
00:36:06,676 --> 00:36:10,556
Inside of which is a product
ID field, and a name field

873
00:36:10,936 --> 00:36:13,766
which then suggests that how
should we fix the order's table,

874
00:36:13,766 --> 00:36:14,516
what should go?

875
00:36:14,516 --> 00:36:15,106
Yeah, Isaac.

876
00:36:16,536 --> 00:36:18,936
>> Just product ID.

877
00:36:18,936 --> 00:36:21,686
>> So just keep the
product IDs and ditch what?

878
00:36:21,686 --> 00:36:23,616
>> And take out then product.

879
00:36:23,616 --> 00:36:25,316
>> Exactly, ditch the product.

880
00:36:25,496 --> 00:36:28,346
Now, if you continue this kind
of normalization as it's called,

881
00:36:28,546 --> 00:36:31,536
your database starts to
get very cleanly designed.

882
00:36:31,536 --> 00:36:35,946
And by clean, I mean, there is
only one authoritative place

883
00:36:36,026 --> 00:36:38,156
to find out what the
name is of a product.

884
00:36:38,156 --> 00:36:40,516
You check the products' table
and there's only one mention

885
00:36:40,516 --> 00:36:42,076
of printer or table or chair.

886
00:36:42,286 --> 00:36:43,846
It's not duplicated
all over the place.

887
00:36:43,846 --> 00:36:45,826
And this is good obviously
if you just change the name

888
00:36:45,826 --> 00:36:48,006
of something, you wouldn't
want to have to go through all

889
00:36:48,006 --> 00:36:49,486
of your various tables
looking for all

890
00:36:49,486 --> 00:36:52,636
of the redundantly named things
just so you can update them.

891
00:36:53,006 --> 00:36:54,196
So that's good, but again,

892
00:36:54,196 --> 00:36:55,826
it feels like I'm creating
a huge amount of work

893
00:36:55,826 --> 00:36:58,536
for myself now because
whereas in a simple world,

894
00:36:58,676 --> 00:37:00,786
I just select star
from one table,

895
00:37:01,026 --> 00:37:02,846
now I have like three
different tables.

896
00:37:02,846 --> 00:37:05,196
And how do I select
data simultaneously

897
00:37:05,446 --> 00:37:08,536
so that the data I get back is
representative of a given moment

898
00:37:08,536 --> 00:37:11,496
of time and not from
this second followed

899
00:37:11,496 --> 00:37:13,866
by another second
followed by another second,

900
00:37:14,086 --> 00:37:16,826
whereby your queries themselves
might be some number of seconds

901
00:37:16,826 --> 00:37:17,806
or milliseconds apart.

902
00:37:17,986 --> 00:37:19,556
How do I get a snapshot in time?

903
00:37:19,746 --> 00:37:21,526
Well, we can actually
do this all at once.

904
00:37:21,526 --> 00:37:23,596
We can do this with a couple
of different syntaxis.

905
00:37:23,846 --> 00:37:24,756
So let's try this.

906
00:37:24,986 --> 00:37:26,126
Let's assume for the moment

907
00:37:26,656 --> 00:37:30,136
that this is good enough even
though the faults we already

908
00:37:30,136 --> 00:37:33,186
found, if we assume these
table structures, though,

909
00:37:33,186 --> 00:37:35,836
how can we go about
querying data

910
00:37:35,836 --> 00:37:37,456
and getting it back all at once?

911
00:37:38,036 --> 00:37:40,786
Well, here's some slightly
new syntax and I've written

912
00:37:40,786 --> 00:37:42,786
on three lines really
just for readability.

913
00:37:42,836 --> 00:37:43,736
SQL doesn't care.

914
00:37:43,736 --> 00:37:47,136
It only matters when-- that
the whole expression is

915
00:37:47,136 --> 00:37:48,016
syntactically valid.

916
00:37:48,356 --> 00:37:52,896
So SELECT
Employees.Name,Orders.Product.

917
00:37:53,416 --> 00:37:54,536
Now, what does this mean?

918
00:37:54,536 --> 00:37:57,336
Well, Employees.Name, as you
can probably infer, is referring

919
00:37:57,336 --> 00:37:58,746
to the name field of what table?

920
00:37:59,406 --> 00:38:00,846
Employee. So that's all.

921
00:38:00,846 --> 00:38:02,386
The dot notation
does exactly that,

922
00:38:02,386 --> 00:38:05,176
and we saw the dot notation
actually [inaudible] last week.

923
00:38:05,556 --> 00:38:08,826
Orders.Product, same idea,
select the product name

924
00:38:08,896 --> 00:38:10,316
from the order's table.

925
00:38:10,896 --> 00:38:13,226
Where do you want to
select those fields from?

926
00:38:13,226 --> 00:38:14,826
It's a little-- it
feels a little redundant

927
00:38:14,826 --> 00:38:16,026
but this is just the way it is.

928
00:38:16,186 --> 00:38:19,206
You then specify what tables
this query is selecting

929
00:38:19,206 --> 00:38:19,656
data from.

930
00:38:19,656 --> 00:38:22,286
So you have to say, from
employees comma orders.

931
00:38:22,376 --> 00:38:24,236
And it doesn't matter of the
order in which you say them

932
00:38:24,236 --> 00:38:25,916
but you have to say the
table names that you want

933
00:38:25,916 --> 00:38:26,886
to involve in this query.

934
00:38:27,386 --> 00:38:28,986
But now I'm doing
this last predicate.

935
00:38:28,986 --> 00:38:30,456
And this is definitely
more involved

936
00:38:30,456 --> 00:38:32,836
than the ones we've
looked at briefly thus far

937
00:38:33,116 --> 00:38:37,986
where Employees.Employee_
ID=Orders.Employee_ID.

938
00:38:37,986 --> 00:38:42,096
In English, what is that
predicate doing for us?

939
00:38:42,636 --> 00:38:42,716
Yeah.

940
00:38:46,516 --> 00:38:52,296
[ Inaudible Remark ]

941
00:38:52,796 --> 00:38:55,406
Exactly. It's creating a
cross reference of sort

942
00:38:55,406 --> 00:38:58,066
between the two tables
namely employees and orders

943
00:38:58,486 --> 00:39:03,446
and it's explaining
to the database how

944
00:39:03,446 --> 00:39:05,416
to join those two
tables together.

945
00:39:05,746 --> 00:39:07,626
Now, I can sort of simulate
this with my fingers.

946
00:39:07,626 --> 00:39:09,286
Let me pull up the
data over here.

947
00:39:09,476 --> 00:39:11,846
What is obviously
the one common field

948
00:39:11,846 --> 00:39:14,436
between the employee's
table and the order's table?

949
00:39:14,436 --> 00:39:17,336
So it's the Employee_ID, right?

950
00:39:17,336 --> 00:39:18,476
That's the only that's in both.

951
00:39:18,636 --> 00:39:23,546
So if we again kind of do the
little physical depiction I did

952
00:39:23,546 --> 00:39:24,846
this with my fingers last week,

953
00:39:24,846 --> 00:39:27,126
suppose that here are the
employees and supposed

954
00:39:27,126 --> 00:39:29,406
that these are the employee IDs,
the tips of my fingers here.

955
00:39:29,556 --> 00:39:32,246
And now the order's table
also has employee ID

956
00:39:32,246 --> 00:39:33,986
and it's the tips
of my fingers here.

957
00:39:34,206 --> 00:39:36,056
Effectively, what
that query is doing

958
00:39:36,056 --> 00:39:39,676
by saying match
Employees.Employee

959
00:39:39,676 --> 00:39:41,916
_ID=orders.employee_ID,

960
00:39:42,126 --> 00:39:44,206
it's saying to line
up the employee's ID.

961
00:39:44,206 --> 00:39:47,086
So if this is 001 on the
left, it's 01 on the right.

962
00:39:47,316 --> 00:39:49,266
If it's 03 on the left,
it's 03 on the right.

963
00:39:49,266 --> 00:39:52,886
So you kind of join these
things together using identical

964
00:39:52,886 --> 00:39:55,986
employee IDs and what does
this whole query return?

965
00:39:56,236 --> 00:39:59,196
What returns a result say
which recalls a fancy way

966
00:39:59,196 --> 00:40:01,686
of saying a temporary table,
that table looks like this

967
00:40:02,156 --> 00:40:04,456
where these fields have
been conjoined somehow

968
00:40:04,636 --> 00:40:08,686
and each row represents the
combination of some data

969
00:40:08,686 --> 00:40:11,446
from one table with
the other table.

970
00:40:12,286 --> 00:40:14,656
So, let's actually try
to see this in practice.

971
00:40:14,656 --> 00:40:16,976
Let's go and recreate then
we'll improve upon it.

972
00:40:17,236 --> 00:40:21,576
So let me go over to phpMyAdmin
in the appliance, and I'm going

973
00:40:21,576 --> 00:40:24,916
to go ahead and create
a new database.

974
00:40:25,496 --> 00:40:29,826
I'm going to go ahead and
call this jharvard_Monday just

975
00:40:29,826 --> 00:40:30,776
for kicks today.

976
00:40:30,776 --> 00:40:31,886
So now we have a clean database.

977
00:40:31,886 --> 00:40:33,496
We won't commingle
it with last time.

978
00:40:33,796 --> 00:40:35,226
Let me choose the
Monday database.

979
00:40:35,546 --> 00:40:36,816
And now let's create a table.

980
00:40:36,816 --> 00:40:39,586
Now, in general I would not
do what W3Schools did here

981
00:40:39,586 --> 00:40:41,736
with the capitalization, but
let's just keep it identical.

982
00:40:41,736 --> 00:40:43,226
So we'll create an
employee's table,

983
00:40:43,406 --> 00:40:47,176
and how many columns
did it have?

984
00:40:47,276 --> 00:40:49,676
Two. So let's do that and
then the first one was

985
00:40:49,676 --> 00:40:51,106
called Employee_ID.

986
00:40:51,316 --> 00:40:55,626
It could be an INT but
let's just do BIGINT just

987
00:40:55,626 --> 00:40:56,706
to be different this time.

988
00:40:56,706 --> 00:40:58,446
And then it had a name
field which should be

989
00:40:58,446 --> 00:40:59,476
of what type, probably?

990
00:41:00,976 --> 00:41:02,566
Yes. So let's go in the
VARCHAR, and now we have

991
00:41:02,566 --> 00:41:03,646
to make a design decision,

992
00:41:03,646 --> 00:41:05,716
how many characters
should the name be?

993
00:41:08,476 --> 00:41:08,656
>> Fifty.

994
00:41:08,796 --> 00:41:09,496
>> Fifty? OK.

995
00:41:09,496 --> 00:41:10,426
So let's-- 50.

996
00:41:10,426 --> 00:41:13,526
Something reasonable but be
consistent in general probably.

997
00:41:13,766 --> 00:41:15,666
And now what else do
we want to choose here?

998
00:41:15,666 --> 00:41:18,236
Should either ID or
employee name be null?

999
00:41:19,756 --> 00:41:22,396
Probably not, I want both
really in my database.

1000
00:41:22,556 --> 00:41:27,866
Index, should employee ID
be a primary unique index

1001
00:41:27,926 --> 00:41:29,586
or a full text?

1002
00:41:30,696 --> 00:41:30,996
>> Primary.

1003
00:41:30,996 --> 00:41:31,446
>> Primary.

1004
00:41:31,446 --> 00:41:33,956
Why, because it hopefully
uniquely identifies the user.

1005
00:41:34,086 --> 00:41:35,626
And we're going to kind
of clean up 1 to 10.

1006
00:41:35,626 --> 00:41:38,906
We're not going to have prefixed
leading 0s in our employee IDs.

1007
00:41:38,906 --> 00:41:40,786
Let's actually use a
number like a BIGINT.

1008
00:41:41,236 --> 00:41:43,406
How about the name?

1009
00:41:43,406 --> 00:41:45,106
Should we make the names unique?

1010
00:41:47,016 --> 00:41:48,046
[ Inaudible Remark ]

1011
00:41:48,046 --> 00:41:50,166
Right, there's a lot of like
Mike Smiths in the world.

1012
00:41:50,166 --> 00:41:52,916
It would kind of be unfortunate
if we can only hire one

1013
00:41:52,916 --> 00:41:55,336
at our company so we probably
should not impose that.

1014
00:41:55,786 --> 00:41:59,886
When might I want to
define an INDEX on a field?

1015
00:41:59,886 --> 00:42:02,266
So it's not unique INDEX
it's just an INDEX.

1016
00:42:02,346 --> 00:42:04,616
What was the advantage
of defining an INDEX?

1017
00:42:05,486 --> 00:42:10,356
I didn't really talk about this
but just think instinctively,

1018
00:42:10,686 --> 00:42:14,036
why would you want to index,
so to speak, a database field?

1019
00:42:14,216 --> 00:42:14,896
>> For searchability

1020
00:42:15,166 --> 00:42:17,146
>> For searchability,
for performance, right?

1021
00:42:17,146 --> 00:42:19,806
If a field is not indexed
and you want to search on it,

1022
00:42:20,036 --> 00:42:21,706
you essentially have
to do linear search.

1023
00:42:21,706 --> 00:42:23,156
You have to search
the whole column

1024
00:42:23,366 --> 00:42:24,896
to find the values
you're looking for.

1025
00:42:25,106 --> 00:42:27,206
But if instead you as
the database designer,

1026
00:42:27,206 --> 00:42:30,906
say index this field, you're
essentially providing a hint

1027
00:42:30,906 --> 00:42:34,476
to MySQL that you or someone
will maybe want to search

1028
00:42:34,476 --> 00:42:37,776
on this field, so please
do some effort upfront

1029
00:42:38,086 --> 00:42:40,236
to optimize those
future searches.

1030
00:42:40,236 --> 00:42:42,266
And it will create some
kind of tree structure,

1031
00:42:42,266 --> 00:42:44,836
it's generally called the B-tree
structure that gives us more

1032
00:42:44,836 --> 00:42:47,706
like logarithmic search time
instead of linear search time,

1033
00:42:47,706 --> 00:42:49,916
which for those unfamiliar,
just means it will be faster.

1034
00:42:50,226 --> 00:42:52,526
How about AI, which is
just a shorthand way

1035
00:42:52,526 --> 00:42:53,646
of saying auto_increments?

1036
00:42:53,646 --> 00:42:55,476
Should name or ID
be auto_increment?

1037
00:42:56,486 --> 00:42:58,476
Probably ID if we don't
want to have to deal

1038
00:42:58,476 --> 00:42:59,946
with managing those ourselves.

1039
00:42:59,946 --> 00:43:02,256
In terms of defaults,
we probably don't want

1040
00:43:02,256 --> 00:43:04,376
to give people default
names or anything like that,

1041
00:43:04,416 --> 00:43:05,796
so let's leave everything
else alone.

1042
00:43:05,796 --> 00:43:07,386
Lastly, let me mention this.

1043
00:43:07,386 --> 00:43:08,256
We'll come back to this.

1044
00:43:08,256 --> 00:43:10,596
We didn't choose this
last time but notice

1045
00:43:10,596 --> 00:43:14,176
under database engine, there are
different engines you can use.

1046
00:43:14,176 --> 00:43:16,856
And I made the comparison
with like NTFS and HFS+

1047
00:43:16,936 --> 00:43:19,506
for file systems, if you're
familiar with those words.

1048
00:43:19,776 --> 00:43:22,436
But for now we're going to
leave it on the default InnoDB,

1049
00:43:22,696 --> 00:43:25,826
but later we'll see what
the actual implications are

1050
00:43:25,826 --> 00:43:28,856
of not giving that
choice much thought.

1051
00:43:29,176 --> 00:43:30,226
There will be some tradeoffs.

1052
00:43:30,226 --> 00:43:30,936
So let's save.

1053
00:43:31,576 --> 00:43:31,836
All right.

1054
00:43:31,836 --> 00:43:33,626
So what query was just executed?

1055
00:43:33,626 --> 00:43:35,836
Again, here is one of the
upsides of phpMyAdmin.

1056
00:43:35,836 --> 00:43:37,176
It kind of teaches you as you go

1057
00:43:37,376 --> 00:43:40,226
by telling you what query
it actually executed

1058
00:43:40,226 --> 00:43:42,116
to do your bidding
just a moment ago.

1059
00:43:42,386 --> 00:43:44,456
So we'd executed to
create table statements.

1060
00:43:44,786 --> 00:43:47,756
So now, let's go ahead
and create this table.

1061
00:43:48,456 --> 00:43:55,236
And let's go ahead and
choose create table, orders.

1062
00:43:55,506 --> 00:43:57,466
How many columns
does this thing have?

1063
00:43:58,176 --> 00:43:59,436
There's three at the moment.

1064
00:43:59,436 --> 00:44:02,056
And we'll just deal with
the imperfections of it now.

1065
00:44:02,276 --> 00:44:06,906
In those three columns,
where what, product ID?

1066
00:44:07,016 --> 00:44:09,066
So, Prod_ID.

1067
00:44:09,106 --> 00:44:11,476
And then we had product.

1068
00:44:12,816 --> 00:44:14,386
And then we had what,
as the last field?

1069
00:44:14,896 --> 00:44:15,366
>> Employee_ID.

1070
00:44:15,706 --> 00:44:17,366
>> Good, Employee_ID.

1071
00:44:18,186 --> 00:44:20,996
All right, so for product
ID, what should this guy be?

1072
00:44:20,996 --> 00:44:21,906
Let's go with BIGINTS.

1073
00:44:21,906 --> 00:44:24,406
So if we're going to use BIGINTS
everywhere, let's just use them

1074
00:44:24,456 --> 00:44:26,696
for this guy too,
product should be what?

1075
00:44:27,206 --> 00:44:27,396
>> VARCHAR.

1076
00:44:29,946 --> 00:44:30,816
>> VARCHAR.

1077
00:44:30,816 --> 00:44:32,026
How long should the
products name be?

1078
00:44:32,026 --> 00:44:35,196
>> Depends on what
you're selling.

1079
00:44:35,256 --> 00:44:36,246
>> Depends on what
you're selling.

1080
00:44:36,636 --> 00:44:39,366
We are selling printers,
tables, and chairs.

1081
00:44:39,896 --> 00:44:40,116
>> Twenty.

1082
00:44:41,016 --> 00:44:42,256
>> Twenty, all right?

1083
00:44:42,956 --> 00:44:44,746
The marketing people will be
upset with you when they come

1084
00:44:44,746 --> 00:44:45,686
up with the longer word.

1085
00:44:45,686 --> 00:44:46,236
But that's OK.

1086
00:44:46,656 --> 00:44:47,906
Employee_ID should be what?

1087
00:44:48,256 --> 00:44:48,486
>> BIGINTS.

1088
00:44:51,136 --> 00:44:53,766
>> BIGINTS for consistency
now, right, because this is

1089
00:44:53,766 --> 00:44:56,566
as we'll call it, a
foreign key whereas

1090
00:44:56,606 --> 00:44:58,606
in the other table
it was a primary key.

1091
00:44:58,836 --> 00:45:01,776
All right, we'll
leave blank there.

1092
00:45:01,776 --> 00:45:02,466
Let's see.

1093
00:45:02,466 --> 00:45:04,106
Null, should any of
these fields be null?

1094
00:45:04,106 --> 00:45:06,866
Probably not, unless you
want no one to be able

1095
00:45:06,866 --> 00:45:07,656
to solve these things.

1096
00:45:08,096 --> 00:45:12,836
Index, which should
the product ID be?

1097
00:45:17,496 --> 00:45:18,716
>> Primary case.

1098
00:45:19,416 --> 00:45:19,906
>> Primary case.

1099
00:45:19,906 --> 00:45:23,096
So this one is kind of a trick
question, so I would argue

1100
00:45:23,096 --> 00:45:28,416
that primary key is not
appropriate here, but why?

1101
00:45:29,976 --> 00:45:31,286
Here's the-- here
are the tables again.

1102
00:45:31,286 --> 00:45:31,366
Yeah.

1103
00:45:31,466 --> 00:45:36,386
>> It's not really
that one search word,

1104
00:45:36,386 --> 00:45:40,096
you want to link the
product to the employee.

1105
00:45:41,196 --> 00:45:43,166
>> True, but I would
argue I would want

1106
00:45:43,166 --> 00:45:44,396
to search on product ID.

1107
00:45:44,396 --> 00:45:46,866
Suppose I want to know how
many chairs have we sold,

1108
00:45:46,866 --> 00:45:48,566
I could do a select
star from orders

1109
00:45:48,566 --> 00:45:50,886
where product ID equals 865.

1110
00:45:51,956 --> 00:45:54,386
So, searching on is
compelling, but primary key,

1111
00:45:54,386 --> 00:45:57,196
primary key means that
uniquely identifies the row.

1112
00:45:57,576 --> 00:46:01,306
So what would the implication
be of making product ID unique

1113
00:46:01,676 --> 00:46:04,536
in a primary key at that,
in this particular table?

1114
00:46:04,536 --> 00:46:04,656
Yeah.

1115
00:46:05,186 --> 00:46:08,406
>> If you had more than
one chair and your table

1116
00:46:08,406 --> 00:46:10,206
which you're probably
going to have [inaudible].

1117
00:46:10,206 --> 00:46:10,736
>> Exactly.

1118
00:46:10,736 --> 00:46:13,666
We could only ever sell
one printer or one table

1119
00:46:13,666 --> 00:46:14,996
or one chair, or rather

1120
00:46:15,256 --> 00:46:18,206
in individual employee
could only sell one printer

1121
00:46:18,206 --> 00:46:20,536
or one table, or one chair
because again the definition

1122
00:46:20,536 --> 00:46:23,376
of primary key must
uniquely identify a row.

1123
00:46:23,766 --> 00:46:27,646
So, if you have another-- if
you have employe number 1,

1124
00:46:27,646 --> 00:46:31,316
if Ola sells a second printer,
what should the row be?

1125
00:46:31,396 --> 00:46:33,886
It should be 2, 3,
4 printer or 1,

1126
00:46:34,256 --> 00:46:36,956
but that would violate the
primary key constraint.

1127
00:46:37,096 --> 00:46:39,586
So really, someone else
offered up the solution earlier,

1128
00:46:39,586 --> 00:46:41,256
what field is really missing

1129
00:46:41,256 --> 00:46:42,916
from the order's table
that should be there?

1130
00:46:43,016 --> 00:46:44,156
[ Inaudible Remark ]

1131
00:46:44,156 --> 00:46:47,106
I might be just call
it order ID.

1132
00:46:47,546 --> 00:46:49,886
So something that uniquely
identifies the order.

1133
00:46:49,886 --> 00:46:51,366
And if we did introduce that,

1134
00:46:51,366 --> 00:46:53,966
I would propose making
order ID the primary key

1135
00:46:54,166 --> 00:46:56,796
because then you can start
it one and two and three

1136
00:46:56,796 --> 00:47:00,016
and for everything you sell,
you can increment the order ID.

1137
00:47:00,396 --> 00:47:02,496
So, what might we
want to do then

1138
00:47:02,496 --> 00:47:05,476
with the poorly designed
table that we have here?

1139
00:47:05,756 --> 00:47:08,596
Well, let's go ahead and-- I'm
going to propose indexing it.

1140
00:47:08,766 --> 00:47:11,006
It's not unique but this
way I can search on it,

1141
00:47:11,076 --> 00:47:13,426
which reasonable to want
to search on an ID fields.

1142
00:47:13,646 --> 00:47:15,586
Now the next field was Name.

1143
00:47:15,666 --> 00:47:18,496
Should we have any keys
on the product name?

1144
00:47:24,276 --> 00:47:24,346
>> No.

1145
00:47:24,466 --> 00:47:25,716
>> OK, I heard a no.

1146
00:47:25,836 --> 00:47:27,056
Anyone who want to
argue the opposite,

1147
00:47:27,056 --> 00:47:30,036
and then well flesh
out which is best?

1148
00:47:30,106 --> 00:47:36,536
So I'll play the contrarian.

1149
00:47:36,596 --> 00:47:40,416
So I'm going to propose yes,
I do want an INDEX on here.

1150
00:47:41,656 --> 00:47:44,376
Why might I be taking
that position no?

1151
00:47:45,386 --> 00:47:48,166
And again the field in question
is the name of the product.

1152
00:47:48,916 --> 00:47:48,983
Yeah.

1153
00:47:49,226 --> 00:47:52,026
>> We want to search for
all tables that were sold?

1154
00:47:52,136 --> 00:47:53,206
>> Yeah, what if
I want to search

1155
00:47:53,206 --> 00:47:54,516
for all tables that were sold?

1156
00:47:54,516 --> 00:47:56,226
So I'm, you know,
the salesperson.

1157
00:47:56,226 --> 00:47:59,306
I don't really remember what
the number is for that product

1158
00:47:59,306 --> 00:48:01,416
but I want to check how
many chairs have I sold

1159
00:48:01,416 --> 00:48:03,776
or my boss wants to check
how many chairs have I sold,

1160
00:48:04,156 --> 00:48:05,856
and maybe we have
different types of chairs.

1161
00:48:05,856 --> 00:48:08,346
Maybe we have products called
big chair, small chair,

1162
00:48:08,346 --> 00:48:09,916
white chair, black
chair, you know,

1163
00:48:10,006 --> 00:48:13,116
any kind of longer product name
that has the key word chair

1164
00:48:13,426 --> 00:48:15,736
but I, like a person
just using a computer,

1165
00:48:15,736 --> 00:48:17,546
I just want to do the equivalent
of like a Goggle search.

1166
00:48:17,546 --> 00:48:18,656
I want to search for chair.

1167
00:48:18,896 --> 00:48:21,026
Now, what's the SQL
keyword that we can use

1168
00:48:21,056 --> 00:48:22,816
to do those kinds
of keyword searches?

1169
00:48:23,386 --> 00:48:23,486
>> Like.

1170
00:48:24,636 --> 00:48:26,576
>> Like. So we saw like earlier.

1171
00:48:26,756 --> 00:48:30,056
So actually, the like expression
here would probably be

1172
00:48:30,326 --> 00:48:34,206
"%chair%", because that
would find me any products

1173
00:48:34,206 --> 00:48:35,866
that have the word
chair in them.

1174
00:48:35,986 --> 00:48:38,416
Now that could fail if there's
some weird word out there

1175
00:48:38,416 --> 00:48:43,026
that has chair as a
prefix or as a substring,

1176
00:48:43,356 --> 00:48:45,796
but it's probably good
enough for our purposes here.

1177
00:48:46,036 --> 00:48:48,766
So why might I want
to index then the name

1178
00:48:48,766 --> 00:48:50,016
if I want to search on it?

1179
00:48:50,116 --> 00:48:52,886
Now, maybe I do, maybe I don't
but if I do I should have

1180
00:48:52,926 --> 00:48:54,656
that as an index, but not unique

1181
00:48:54,656 --> 00:48:56,056
because then I could
only share--

1182
00:48:56,056 --> 00:48:57,576
sell one chair for instance.

1183
00:48:57,966 --> 00:48:59,526
Now lastly, employee ID,

1184
00:49:00,536 --> 00:49:03,556
should employee ID be a
primary key, unique or index?

1185
00:49:04,046 --> 00:49:09,486
Full text isn't relevant
because it's not a text field.

1186
00:49:10,456 --> 00:49:13,436
Yeah. Exactly, probably you want
to index it because it is an ID

1187
00:49:13,576 --> 00:49:16,906
and we want to not
only search on it here,

1188
00:49:17,026 --> 00:49:18,586
we also want to do
the join thing.

1189
00:49:18,586 --> 00:49:21,406
And in fact, anytime you're
joining one table with another,

1190
00:49:21,646 --> 00:49:23,866
the field or fields
that you want to join

1191
00:49:23,866 --> 00:49:27,826
on should be defined as having
indexes for performance,

1192
00:49:27,826 --> 00:49:30,226
otherwise, again, it's a
very expensive operation.

1193
00:49:30,226 --> 00:49:31,246
If you remember--
what's your name again?

1194
00:49:31,436 --> 00:49:31,706
>> Ben.

1195
00:49:31,776 --> 00:49:34,076
>> Ben, OK, trying
to get them all now.

1196
00:49:34,076 --> 00:49:34,486
All right.

1197
00:49:34,616 --> 00:49:36,466
So, let's finish this up.

1198
00:49:36,466 --> 00:49:37,926
Let's make this an INDEX.

1199
00:49:38,196 --> 00:49:39,866
And should any of these
be auto_increment?

1200
00:49:39,866 --> 00:49:45,976
>> If we had like
a sales ID that's--

1201
00:49:45,976 --> 00:49:48,456
>> Good. If we had, let's
keep calling it an order ID.

1202
00:49:48,456 --> 00:49:49,876
Then yes, that would
be reasonable

1203
00:49:49,876 --> 00:49:52,286
if order ID uniquely
identifies that order.

1204
00:49:52,476 --> 00:49:54,016
But as it stands now, no,

1205
00:49:54,016 --> 00:49:55,866
I don't want this auto
incrementing because I'm going

1206
00:49:55,866 --> 00:49:58,826
to deliberately specify
who sold what

1207
00:49:58,826 --> 00:49:59,976
and what that ID actually is.

1208
00:50:00,186 --> 00:50:02,186
So let me go ahead
and click Save.

1209
00:50:02,566 --> 00:50:07,176
And now let's go ahead and
look back at this query.

1210
00:50:07,476 --> 00:50:12,286
So, when I SELECT
Employees.Name, Orders.Product

1211
00:50:12,476 --> 00:50:14,826
FROM Employees, Orders
WHERE Employee_ID

1212
00:50:14,826 --> 00:50:17,046
in one table equals
Employee_ID in the other,

1213
00:50:17,336 --> 00:50:19,786
I get back this sort
of visual effect,

1214
00:50:19,786 --> 00:50:21,716
and what does the
temporary table look like?

1215
00:50:21,716 --> 00:50:26,366
Well, if we kind of do that
query, we get back this table.

1216
00:50:26,856 --> 00:50:28,956
So, it's a temporary
table, it's my result set,

1217
00:50:28,956 --> 00:50:30,636
it has some number of
rows, in this case,

1218
00:50:30,786 --> 00:50:34,036
three and I have all
Ola and Stephen and all

1219
00:50:34,036 --> 00:50:35,576
of the items that they sold.

1220
00:50:35,826 --> 00:50:37,606
But this is a little
confusing to me.

1221
00:50:38,046 --> 00:50:42,906
I thought I have colleagues
named Tove and Kari also.

1222
00:50:43,636 --> 00:50:46,006
Why are they not in my table?

1223
00:50:46,006 --> 00:50:47,956
What's the mistake or
problem here, Isaac?

1224
00:50:48,106 --> 00:50:50,106
[ Inaudible Remark ]

1225
00:50:50,196 --> 00:50:52,516
Yeah. This isn't a bug, right?

1226
00:50:52,516 --> 00:50:55,046
It's just that neither of them
is selling very much right now

1227
00:50:55,046 --> 00:50:57,026
because look at the orders
table, who sold things,

1228
00:50:57,026 --> 00:51:01,366
employee ID one and then three
and three sold two things.

1229
00:51:01,626 --> 00:51:03,766
So, Tove and Kari
haven't sold anything.

1230
00:51:03,766 --> 00:51:06,566
So this is indeed
correct, it just so happens

1231
00:51:06,566 --> 00:51:09,176
that only Stephen and
Ola have sold something

1232
00:51:09,566 --> 00:51:11,216
from among the products.

1233
00:51:11,886 --> 00:51:14,706
So, there's another syntax now
we can use to do this join.

1234
00:51:14,706 --> 00:51:17,656
This is what's generally
called an implicit join,

1235
00:51:17,966 --> 00:51:18,906
why is it implicit?

1236
00:51:18,906 --> 00:51:21,316
Well, I have nowhere used
literally the world join

1237
00:51:21,516 --> 00:51:22,526
and you're just kind

1238
00:51:22,526 --> 00:51:25,676
of implicitly saying create
a new table that's the result

1239
00:51:25,676 --> 00:51:27,456
of joining these two
tables by a way of the

1240
00:51:27,456 --> 00:51:29,296
where clause that I had there.

1241
00:51:29,296 --> 00:51:30,976
But we can be more explicit here

1242
00:51:30,976 --> 00:51:32,026
and we can do something
like this.

1243
00:51:32,566 --> 00:51:37,286
Notice this is after,
before, after, before.

1244
00:51:37,646 --> 00:51:39,606
So, here's the after
version, what's different?

1245
00:51:39,806 --> 00:51:42,736
I still say SELECT
Employees.Name, Orders.Product

1246
00:51:42,936 --> 00:51:46,826
but this time, I say From
Employees JOIN Orders.

1247
00:51:47,146 --> 00:51:51,926
So, I explicitly say, I want
to join employees and orders

1248
00:51:52,576 --> 00:51:55,166
but how do I want to join
them, I have to specify

1249
00:51:55,266 --> 00:51:58,616
on Employee_ID equals
Employee_ID from each

1250
00:51:58,616 --> 00:51:59,636
of the respective tables.

1251
00:52:00,076 --> 00:52:03,866
So, the end result is identical
and it's really up to you as to

1252
00:52:03,866 --> 00:52:05,476
which one is more clear.

1253
00:52:05,476 --> 00:52:08,856
Frankly, I almost always go with
the join syntax like this just

1254
00:52:08,856 --> 00:52:12,446
because it's super explicit
as to what I'm joining on what

1255
00:52:12,586 --> 00:52:15,166
and it's a little more
clear to me what's going on.

1256
00:52:15,166 --> 00:52:17,926
And again, the white space where
I fit enter is meaningless.

1257
00:52:18,026 --> 00:52:20,716
I just did this for formatting
reasons on the screen.

1258
00:52:21,246 --> 00:52:26,606
So that you've heard it now,
there's other types of joins.

1259
00:52:27,546 --> 00:52:33,586
There are left joins, there are
right joins and outer and inner

1260
00:52:33,586 --> 00:52:36,326
and now what the relevance is

1261
00:52:36,326 --> 00:52:38,976
for us is it won't really
affect you with project 1

1262
00:52:39,256 --> 00:52:42,086
but sometimes there
are corner cases,

1263
00:52:42,156 --> 00:52:47,066
like what if you don't
have-- suppose that you--

1264
00:52:47,366 --> 00:52:49,076
let's see, who is number three.

1265
00:52:49,076 --> 00:52:50,666
Suppose that Stephen was fired,

1266
00:52:50,916 --> 00:52:52,296
he no longer works
at the company.

1267
00:52:52,526 --> 00:52:54,896
So you go ahead and delete
him from the employees table.

1268
00:52:55,476 --> 00:52:58,246
You probably don't want to
lose his entire sales history

1269
00:52:58,246 --> 00:52:59,646
because if you're trying
to balance the books,

1270
00:52:59,646 --> 00:53:01,896
you want to know what you
sold and who sold it even

1271
00:53:01,896 --> 00:53:03,056
if that person is not there.

1272
00:53:03,396 --> 00:53:06,376
But the problem, though,
is that if Stephen is not

1273
00:53:06,376 --> 00:53:11,166
in the employees table anymore
and you do a join employee ID,

1274
00:53:11,486 --> 00:53:13,166
those rows are going
to disappear.

1275
00:53:13,686 --> 00:53:16,606
They will not appear in the
output and we won't know

1276
00:53:16,606 --> 00:53:18,826
that Stephen sold both
the table and a chair.

1277
00:53:19,146 --> 00:53:22,276
So, by using left join or right
join, you can essentially say

1278
00:53:22,496 --> 00:53:23,746
which of your two tables

1279
00:53:23,746 --> 00:53:27,476
that you're joining should
carry a little more weight.

1280
00:53:28,006 --> 00:53:32,316
So, in other words, if in
this case, I did a right join,

1281
00:53:32,816 --> 00:53:34,846
now which is the table on the
left and which is the right?

1282
00:53:34,846 --> 00:53:37,706
It's not quite obvious from the
way I've formatted the text here

1283
00:53:37,976 --> 00:53:41,026
but notice I said FROM
Employees JOIN Orders.

1284
00:53:41,026 --> 00:53:42,366
If you write that
out in a sentence,

1285
00:53:42,646 --> 00:53:45,326
which one's on the left
employees and then orders.

1286
00:53:45,326 --> 00:53:47,506
So, employee join orders,
if you write it out,

1287
00:53:47,506 --> 00:53:49,106
you have a left table
and a right table,

1288
00:53:49,326 --> 00:53:50,916
even though it might happen
to be on separate lines.

1289
00:53:51,396 --> 00:53:52,546
So what's the implication then?

1290
00:53:52,546 --> 00:53:57,436
If I actually said FROM
Employees right JOIN Orders,

1291
00:53:58,266 --> 00:54:01,516
that means that every
relevant row

1292
00:54:01,516 --> 00:54:04,336
in the orders table should be
included in the result set.

1293
00:54:04,836 --> 00:54:06,306
And if it just so happens

1294
00:54:06,576 --> 00:54:09,086
that there is no
corresponding employee ID

1295
00:54:09,206 --> 00:54:13,236
in the employees table because
Stephen was let go, that's OK,

1296
00:54:13,386 --> 00:54:16,396
still give me a row for
all of the stuff he sold

1297
00:54:16,536 --> 00:54:18,456
but put no there and no there.

1298
00:54:18,506 --> 00:54:20,926
You don't know who sold it but
you don't want to lose track

1299
00:54:20,926 --> 00:54:22,296
of the fact that it was sold.

1300
00:54:22,616 --> 00:54:25,206
Now, by contrast, a left join
would not solve this problem.

1301
00:54:25,486 --> 00:54:26,636
But if something were missing

1302
00:54:26,636 --> 00:54:29,466
from the orders table then
maybe you do a left join.

1303
00:54:29,606 --> 00:54:32,526
But the idea here is that
you bias it toward one table

1304
00:54:32,526 --> 00:54:35,266
or the other just so that
you don't accidentally drop

1305
00:54:35,556 --> 00:54:36,326
some rows.

1306
00:54:38,136 --> 00:54:44,566
All right, any questions
on joins and we'll try

1307
00:54:44,566 --> 00:54:48,546
to make this more
near term relevant

1308
00:54:48,546 --> 00:54:51,876
when we discuss later
today project 1.

1309
00:54:53,146 --> 00:54:53,486
All right.

1310
00:54:53,556 --> 00:54:55,986
So let's go ahead and take a
five-minute break but the teaser

1311
00:54:55,986 --> 00:54:58,306
for now is we're going to
come back and talk about milk

1312
00:54:58,306 --> 00:55:00,866
and going to the store and
putting in a refrigerator.

1313
00:55:01,006 --> 00:55:03,126
So hopefully that will get
you to come back all excited.

1314
00:55:03,636 --> 00:55:06,136
Take a five-minute break.

1315
00:55:06,136 --> 00:55:06,596
All right.

1316
00:55:06,776 --> 00:55:09,756
So, consider the
following scenario.

1317
00:55:09,756 --> 00:55:14,206
You have a roommate and you
have a refrigerator and you both

1318
00:55:14,256 --> 00:55:18,096
like milk and you open them--
one of you gets home one day

1319
00:55:18,096 --> 00:55:20,356
and you open the fridge
and you're out of milk.

1320
00:55:20,356 --> 00:55:23,936
And so, you close the
fridge, you head outside,

1321
00:55:24,016 --> 00:55:26,906
you walk across the street
to CVS and you get in line

1322
00:55:26,906 --> 00:55:28,416
to buy some milk and the lines

1323
00:55:28,416 --> 00:55:30,256
at CVS these days are
always ridiculously long

1324
00:55:30,256 --> 00:55:31,536
because they have those
self-checkout machines.

1325
00:55:31,536 --> 00:55:32,106
So it's perfect.

1326
00:55:32,346 --> 00:55:34,956
Because now your roommate comes
home while you're still waiting

1327
00:55:34,956 --> 00:55:37,216
in line at the self-checkout
machine at CVS.

1328
00:55:37,216 --> 00:55:39,606
He or she discovers
you're out of milk,

1329
00:55:39,786 --> 00:55:42,686
he or she closes the
refrigerator, walks outside,

1330
00:55:42,686 --> 00:55:45,206
goes to the supermarket
instead of CVS,

1331
00:55:45,206 --> 00:55:46,716
so you don't actually
cross paths.

1332
00:55:47,436 --> 00:55:50,126
Then some number of minutes
later, you both get home

1333
00:55:50,236 --> 00:55:53,706
and voila, you now have twice
as much milk and milk spoils

1334
00:55:53,706 --> 00:55:55,736
so this was not a very good
plan because you like milk

1335
00:55:55,736 --> 00:55:57,786
but you don't really like milk
so it's not like you're going

1336
00:55:57,786 --> 00:55:59,136
to drink two gallons of milk.

1337
00:55:59,456 --> 00:56:01,466
So now, you've wasted
some money in some milk.

1338
00:56:01,696 --> 00:56:01,986
All right.

1339
00:56:02,226 --> 00:56:03,446
It's a horrible problem, right?

1340
00:56:03,856 --> 00:56:05,666
So how do you solve
this in the real world?

1341
00:56:06,616 --> 00:56:10,066
How do you avoid getting
twice as much milk?

1342
00:56:10,466 --> 00:56:10,586
Yes?

1343
00:56:13,496 --> 00:56:16,476
>> I mean you could just
write a note on the fridge

1344
00:56:16,476 --> 00:56:17,446
that you've gone out to buy--

1345
00:56:17,446 --> 00:56:18,336
>> OK, good.

1346
00:56:18,446 --> 00:56:20,406
Right, so it's totally
reasonable decision.

1347
00:56:20,406 --> 00:56:21,966
If you are the first
one to come home

1348
00:56:21,966 --> 00:56:23,016
and you realize there's no milk,

1349
00:56:23,056 --> 00:56:24,606
just leave your roommate
a note, right?

1350
00:56:24,606 --> 00:56:25,676
It's a courteous thing to do.

1351
00:56:25,676 --> 00:56:26,616
You'll go get the milk.

1352
00:56:26,796 --> 00:56:27,916
He or she will appreciate it.

1353
00:56:27,916 --> 00:56:30,736
And so when he or she then comes
home, opens the fridge door

1354
00:56:30,736 --> 00:56:32,366
and realizes, oh,
there's no milk,

1355
00:56:32,606 --> 00:56:34,526
he or she does not
take it upon himself

1356
00:56:34,526 --> 00:56:36,366
to go to the store as well.

1357
00:56:36,636 --> 00:56:37,896
So this is good.

1358
00:56:37,896 --> 00:56:41,176
A note is very reasonable thing
and we can actually kind of ramp

1359
00:56:41,176 --> 00:56:42,726
up the metaphor a bit
and instead of saying

1360
00:56:42,726 --> 00:56:45,066
like leaving a note on the
fridge, you could be really kind

1361
00:56:45,066 --> 00:56:47,526
of crazy and just lock
the fridge altogether

1362
00:56:47,736 --> 00:56:49,426
if you're the first
person, right?

1363
00:56:49,426 --> 00:56:51,996
Because the problem here
fundamentally is that both

1364
00:56:51,996 --> 00:56:54,886
of you are independently
checking the state

1365
00:56:54,886 --> 00:56:55,856
of the refrigerator.

1366
00:56:56,076 --> 00:56:58,096
Now, you start thinking of
the refrigerator as a variable

1367
00:56:58,096 --> 00:57:00,466
that has some value and
that value is either zero

1368
00:57:00,466 --> 00:57:02,636
or one there's milk
or maybe it's a number

1369
00:57:02,636 --> 00:57:04,496
that indicates how many
ounces of milk you have,

1370
00:57:04,796 --> 00:57:07,576
but either way, it's a variable,
both of you at the moment

1371
00:57:07,576 --> 00:57:08,226
in the first version

1372
00:57:08,226 --> 00:57:11,416
of the story are
independently checking its state

1373
00:57:11,766 --> 00:57:14,226
but the problem is
that your operations

1374
00:57:14,226 --> 00:57:15,986
of checking the variable's state

1375
00:57:16,646 --> 00:57:19,896
and updating the variable's
state, in other words,

1376
00:57:19,896 --> 00:57:23,496
opening the fridge and looking
and then, buying more milk,

1377
00:57:23,816 --> 00:57:25,346
those operations
are not, atomic.

1378
00:57:25,796 --> 00:57:28,266
They can happen but there
can be interruptions

1379
00:57:28,316 --> 00:57:29,856
in between those two operations.

1380
00:57:29,856 --> 00:57:31,056
What's the interrupt here?

1381
00:57:31,356 --> 00:57:35,156
You check the variable's state,
you then start to go for milk,

1382
00:57:35,156 --> 00:57:37,426
your roommate then checks
the variable's state

1383
00:57:37,516 --> 00:57:42,046
and you both have correct
views of the world, but you,

1384
00:57:42,046 --> 00:57:44,066
the first person, have
made a decision based

1385
00:57:44,066 --> 00:57:45,636
on that variable's
initial state,

1386
00:57:45,636 --> 00:57:47,076
by walking across
the street to CVS.

1387
00:57:47,656 --> 00:57:51,386
So when you come back is
when the problem ensues.

1388
00:57:51,796 --> 00:57:54,756
So again, a note will solve this
problem assuming your roommate

1389
00:57:54,756 --> 00:57:56,816
reads it and if he or she is
not really paying attention

1390
00:57:56,816 --> 00:57:58,726
to the post-it notes, you
could literally put a lock

1391
00:57:58,726 --> 00:57:59,606
on the refrigerator.

1392
00:57:59,606 --> 00:58:00,596
What would that prevent?

1393
00:58:00,866 --> 00:58:03,006
Well that would physically
prevent your roommate

1394
00:58:03,206 --> 00:58:05,266
from checking the
value of that variable.

1395
00:58:05,266 --> 00:58:06,586
Now, it could be
a little awkward

1396
00:58:06,586 --> 00:58:08,186
because if they're really
determined to have milk,

1397
00:58:08,186 --> 00:58:09,366
you're going to come home
and they're still going

1398
00:58:09,366 --> 00:58:10,576
to be holding on to the door

1399
00:58:10,806 --> 00:58:13,336
because you've blocked
them, so to speak.

1400
00:58:13,336 --> 00:58:14,906
They're in what's
called deadlock mode

1401
00:58:14,906 --> 00:58:17,436
if we really abuse this
analogy to computer stuff.

1402
00:58:17,476 --> 00:58:20,216
So they're in deadlock because
nothing can actually happen

1403
00:58:20,216 --> 00:58:22,136
there or they're spinning
and waiting for the lock.

1404
00:58:22,136 --> 00:58:23,466
Deadlock usually
involves two people.

1405
00:58:24,136 --> 00:58:27,806
So, we can solve that problem
in the real world, but now,

1406
00:58:27,806 --> 00:58:32,196
let's consider a related problem
in the world of databases.

1407
00:58:32,496 --> 00:58:35,896
So, the same scenario arouse
in conversation last week,

1408
00:58:35,896 --> 00:58:38,206
when we were talking about
registering for a website

1409
00:58:38,206 --> 00:58:39,986
or buying something
from a website.

1410
00:58:40,276 --> 00:58:42,526
For instance in a simple
scenario of registration,

1411
00:58:42,786 --> 00:58:45,016
suppose that your
name is Mike Smith

1412
00:58:45,306 --> 00:58:47,236
and someone else's
name is Mike Smith.

1413
00:58:47,236 --> 00:58:50,306
And just by bad lock, both of
you hop on the internet one day

1414
00:58:50,306 --> 00:58:53,126
and tried to register for
the very new facebook.com,

1415
00:58:53,126 --> 00:58:54,016
the next big thing.

1416
00:58:54,266 --> 00:58:55,896
And you really want Smith

1417
00:58:56,116 --> 00:58:58,426
as your username or
Mike or M. Smith.

1418
00:58:58,426 --> 00:59:00,606
Whatever the case may be you
both what the same username.

1419
00:59:00,876 --> 00:59:02,896
And suppose that by bad
luck, you're both sitting

1420
00:59:02,896 --> 00:59:04,736
at your laptops and
you both hit enter

1421
00:59:04,936 --> 00:59:07,246
at the same time,
well, what happens?

1422
00:59:07,246 --> 00:59:09,176
Well, those HTTP
requests go to the server,

1423
00:59:09,176 --> 00:59:11,766
the server has a database
now, so there's some code

1424
00:59:11,766 --> 00:59:14,946
that passes those username
requests off to the database

1425
00:59:15,216 --> 00:59:16,806
and what's going to happen?

1426
00:59:16,806 --> 00:59:19,776
Well even though, we like
to think of computers

1427
00:59:19,776 --> 00:59:22,306
as doing dozens of things at
once because they seem to be.

1428
00:59:22,536 --> 00:59:24,636
Really, they're typically
doing one thing

1429
00:59:24,636 --> 00:59:27,526
or maybe a finite number
of things simultaneously.

1430
00:59:27,826 --> 00:59:31,146
And in this case let's consider
a simple story whereby your two

1431
00:59:31,146 --> 00:59:34,586
requests might have arrived
at the same time at the server

1432
00:59:34,656 --> 00:59:38,346
for the username Smith, but one
of you has got to win, right?

1433
00:59:38,346 --> 00:59:40,036
One of them will
be serviced ever

1434
00:59:40,036 --> 00:59:41,896
so quickly before the other one.

1435
00:59:42,226 --> 00:59:43,266
So what's the problem?

1436
00:59:43,496 --> 00:59:47,116
What if that first, request
is, is Smith available?

1437
00:59:47,836 --> 00:59:50,146
This guy checks, but then
the computer again needs

1438
00:59:50,146 --> 00:59:53,886
to maintain this
illusion of parallelism,

1439
00:59:54,046 --> 00:59:57,416
so as we'd said last week this
thread, your request is going

1440
00:59:57,416 --> 00:59:59,896
to be put to sleep, maybe
for a second, probably,

1441
00:59:59,896 --> 01:00:03,356
just for a split second at which
point the other person's thread

1442
01:00:03,356 --> 01:00:05,436
is going to be a
awoken and it's going

1443
01:00:05,436 --> 01:00:06,966
to check, is Smith available?

1444
01:00:07,126 --> 01:00:09,716
The answer to that query
is also going to be yes

1445
01:00:10,216 --> 01:00:13,416
if you just use a simple
MySQL SELECT statement, right?

1446
01:00:13,416 --> 01:00:14,076
You might select.

1447
01:00:14,076 --> 01:00:17,216
Select star from users
where username equals Smith.

1448
01:00:17,696 --> 01:00:20,726
If that returns zero rows,
this guy knows it's available.

1449
01:00:20,726 --> 01:00:23,866
If that query returns zero rows,
that means this guy can have

1450
01:00:23,866 --> 01:00:24,876
that username as well.

1451
01:00:25,136 --> 01:00:27,366
So suppose you issue your
SELECT statement first.

1452
01:00:28,016 --> 01:00:30,906
But then in your
next line of code is

1453
01:00:30,906 --> 01:00:32,706
when you actually
do your insert.

1454
01:00:32,906 --> 01:00:36,206
How do you create a new user,
you insert into that table.

1455
01:00:36,566 --> 01:00:37,606
But what's going to happen?

1456
01:00:37,796 --> 01:00:43,456
So the first user has his choice
of username, Smith, inserted.

1457
01:00:43,976 --> 01:00:45,936
The second one, what happens?

1458
01:00:46,606 --> 01:00:49,626
Now there's a collision, right?

1459
01:00:49,626 --> 01:00:51,566
And so the MySQL query function

1460
01:00:51,566 --> 01:00:53,006
or the corresponding
PDO is going

1461
01:00:53,006 --> 01:00:54,806
to return some kind of error.

1462
01:00:54,986 --> 01:00:56,296
And yet why should
it be an error?

1463
01:00:56,296 --> 01:00:59,926
You told both of these
Michaels that Smith is available

1464
01:01:00,236 --> 01:01:03,936
so this is an incorrect
result, quite bad.

1465
01:01:04,216 --> 01:01:07,526
Now instead, let's assume a
more compelling scenario, right?

1466
01:01:07,526 --> 01:01:10,466
OK, so what, the second Mike
Smith didn't get his favorite

1467
01:01:10,466 --> 01:01:12,036
username, so not a huge deal.

1468
01:01:12,336 --> 01:01:14,776
But now consider the
case of money or ATM.

1469
01:01:15,346 --> 01:01:17,986
So if you want to go to a
cash machine and suppose

1470
01:01:17,986 --> 01:01:20,836
that you're being really
crafty, you're a bad guy,

1471
01:01:21,096 --> 01:01:22,316
and you've somehow
figured out how

1472
01:01:22,316 --> 01:01:25,146
to duplicate your plastic
ATM card, that's not hard,

1473
01:01:25,146 --> 01:01:26,176
it's just a magnetic strip.

1474
01:01:26,496 --> 01:01:29,326
And suppose you go up to an ATM
and you cover up the cameras

1475
01:01:29,326 --> 01:01:30,296
so there's no corner cases,

1476
01:01:30,416 --> 01:01:33,766
and you put in the two ATM cards
simultaneously and you log in

1477
01:01:34,126 --> 01:01:37,706
and then you do something
like, I want to do a withdraw

1478
01:01:37,766 --> 01:01:42,996
of $100 enter simultaneously
on these two machines.

1479
01:01:44,066 --> 01:01:45,336
How is this story similar?

1480
01:01:45,456 --> 01:01:47,746
Where in lies the same problem?

1481
01:01:48,146 --> 01:01:48,456
Axel?

1482
01:01:48,716 --> 01:01:52,616
>> When the ATMs checks the
account for balance or whatever,

1483
01:01:52,866 --> 01:01:55,316
it's going to return the
same value to both ATMs.

1484
01:01:55,526 --> 01:01:55,756
>> Good.

1485
01:01:55,756 --> 01:01:57,586
>> So essentially it's going
to withdraw the same amount

1486
01:01:57,866 --> 01:01:59,446
from the same-- from
the same balance

1487
01:01:59,446 --> 01:02:02,986
and the end result is
going to be the same.

1488
01:02:02,986 --> 01:02:03,686
>> OK, exactly.

1489
01:02:03,686 --> 01:02:05,886
So if it's really the
same kind of story,

1490
01:02:05,886 --> 01:02:08,106
it's just the variable you're
checking now is your account

1491
01:02:08,106 --> 01:02:12,466
balance and suppose that you
only have $100 in your account,

1492
01:02:12,776 --> 01:02:14,486
you being the bad guy,

1493
01:02:14,606 --> 01:02:16,616
but you've executed this
query simultaneously.

1494
01:02:16,616 --> 01:02:19,166
So just as in the case of
milk, just as in the case

1495
01:02:19,166 --> 01:02:24,546
of the username checking, both
ATMs say select star from,

1496
01:02:24,546 --> 01:02:31,146
let's say, accounts where user
ID equals 12345, whatever,

1497
01:02:31,146 --> 01:02:33,506
that is the unique
identifier on my ATM card.

1498
01:02:33,706 --> 01:02:35,326
And what I'm selecting
this time is balance

1499
01:02:35,816 --> 01:02:39,526
and the balance comes back and
I say, "Oh, this user has $100"

1500
01:02:39,746 --> 01:02:41,616
but both ATMs get
that same answer.

1501
01:02:41,616 --> 01:02:43,146
So what does an ATM
do then next?

1502
01:02:43,486 --> 01:02:46,786
Well, if there is $100 in the
account, what should it do?

1503
01:02:46,786 --> 01:02:50,596
It should dispense $100
from both machines.

1504
01:02:51,406 --> 01:02:55,976
So now you've withdrawn $200
even though you only had $100

1505
01:02:56,226 --> 01:03:01,386
and somehow your account now is
at negative 100 or worse, zero,

1506
01:03:01,926 --> 01:03:03,786
right, if the math just
kind of works out that way

1507
01:03:03,786 --> 01:03:05,126
and they're using
an unsigned insert

1508
01:03:05,126 --> 01:03:06,716
because it doesn't really
makes sense for a bank

1509
01:03:06,716 --> 01:03:08,606
to just give you more money
than you actually have,

1510
01:03:08,606 --> 01:03:10,666
unless you actually signed
paperwork for a loan.

1511
01:03:11,076 --> 01:03:13,386
So in the worse case from
the bank's perspective,

1512
01:03:13,386 --> 01:03:15,996
they have just given
a guy who owns $100,

1513
01:03:16,356 --> 01:03:18,156
$200 and they don't even know it

1514
01:03:18,156 --> 01:03:20,096
because the account
balance is now zero.

1515
01:03:20,526 --> 01:03:22,226
So how could we have
solved this problem?

1516
01:03:23,416 --> 01:03:25,146
How could we have solved
the username problem

1517
01:03:25,496 --> 01:03:28,396
so that the expected
outcome is indeed correct?

1518
01:03:29,806 --> 01:03:29,916
Yeah?

1519
01:03:30,446 --> 01:03:31,546
>> Lock the fridge.

1520
01:03:31,546 --> 01:03:32,776
>> OK, so lock the fridge.

1521
01:03:32,776 --> 01:03:34,856
So what does that mean
in the case of the ATM?

1522
01:03:35,086 --> 01:03:38,546
Well, you just have to ensure
that after you ask a question

1523
01:03:38,546 --> 01:03:40,726
like how much money
does this guy have,

1524
01:03:41,166 --> 01:03:45,356
you then perform the deduction
before answering any similar

1525
01:03:45,356 --> 01:03:49,806
queries for other ATMs
for that particular guy.

1526
01:03:50,156 --> 01:03:52,516
You somehow need to
make your two operations

1527
01:03:52,766 --> 01:03:53,796
of checking balance

1528
01:03:54,316 --> 01:03:57,246
and withdrawing money
atomic, so to speak.

1529
01:03:57,246 --> 01:04:00,146
Atomic in the sense that it's
like a very small particle,

1530
01:04:00,286 --> 01:04:02,756
it all happens at once together.

1531
01:04:03,066 --> 01:04:04,176
So how do we do this?

1532
01:04:04,176 --> 01:04:05,516
We don't really have
this way of--

1533
01:04:05,516 --> 01:04:07,436
we don't really have a way
of doing this in code just

1534
01:04:07,436 --> 01:04:10,586
yet because in PHP, recall,
when we were doing all those log

1535
01:04:10,586 --> 01:04:13,976
in examples last week, we called
MySQL query and then MySQL query

1536
01:04:13,976 --> 01:04:15,436
in different places of code.

1537
01:04:15,726 --> 01:04:16,636
You want to somehow be able

1538
01:04:16,636 --> 01:04:19,086
to execute two queries
simultaneously.

1539
01:04:19,316 --> 01:04:21,096
Now, you saw semicolons in use

1540
01:04:21,096 --> 01:04:22,636
for the MySQL command
line client,

1541
01:04:22,686 --> 01:04:23,806
the black and white interface.

1542
01:04:23,996 --> 01:04:25,946
But it's not enough to
just separate your queries

1543
01:04:25,946 --> 01:04:28,596
by semicolons and send them
both to the database at once

1544
01:04:28,596 --> 01:04:29,966
because they can
still get interrupted.

1545
01:04:30,246 --> 01:04:31,686
You have to be more explicit

1546
01:04:31,686 --> 01:04:35,276
when you want your database
operations to be atomic

1547
01:04:35,276 --> 01:04:37,466
and there's a couple
of ways to do that.

1548
01:04:37,856 --> 01:04:42,356
There is this handy
syntax in MySQL

1549
01:04:42,586 --> 01:04:44,786
where you can do the following.

1550
01:04:44,786 --> 01:04:46,786
This doesn't quite
solve the ATM example

1551
01:04:46,786 --> 01:04:48,706
but it does solve
similar problems

1552
01:04:48,976 --> 01:04:50,636
and that you can express
yourself as follows.

1553
01:04:50,636 --> 01:04:54,066
INSERT INTO table, whatever
it's called, columns a, b, c,

1554
01:04:54,496 --> 01:04:56,566
what values do you want
to insert for a, b, and c?

1555
01:04:56,566 --> 01:04:58,986
Let's arbitrarily say
the value is 1, 2, 3.

1556
01:04:59,456 --> 01:05:05,316
However, if one of those
fields, say, a is defined

1557
01:05:05,316 --> 01:05:11,406
as a primary key, unique
key, and if you are trying

1558
01:05:11,406 --> 01:05:15,456
to insert a duplicate key,
this syntax let's you say,

1559
01:05:15,696 --> 01:05:16,836
"Mm-mm, don't do that.

1560
01:05:16,836 --> 01:05:19,936
Instead, just update one
of those field's values."

1561
01:05:20,186 --> 01:05:21,386
So what's the implication here?

1562
01:05:21,386 --> 01:05:23,236
Let's actually take an example

1563
01:05:25,126 --> 01:05:28,796
like let's say ordering
something--

1564
01:05:29,046 --> 01:05:30,036
let's come up with
a good one here.

1565
01:05:30,256 --> 01:05:32,376
Let's actually use
stocks, stocks symbols.

1566
01:05:32,586 --> 01:05:35,406
So suppose you're implementing
a website for keeping track

1567
01:05:35,406 --> 01:05:37,596
of what shares of
stock someone owns

1568
01:05:37,976 --> 01:05:39,526
and you just want a total count.

1569
01:05:39,526 --> 01:05:40,926
How many shares of
Google do I own?

1570
01:05:40,926 --> 01:05:43,186
How many shares of Microsoft
do I own and so forth.

1571
01:05:43,396 --> 01:05:45,016
So in whatever table
you're using

1572
01:05:45,356 --> 01:05:49,686
to store a user's portfolio,
you simply want to have one row

1573
01:05:49,686 --> 01:05:53,066
for Google maximally and in
another column next to Google,

1574
01:05:53,066 --> 01:05:54,536
you want the total
number of shares.

1575
01:05:54,756 --> 01:05:56,156
And if it's one, that's fine,

1576
01:05:56,406 --> 01:05:58,496
but if the user buys
some Google shares today

1577
01:05:58,496 --> 01:06:00,086
and then buy some more tomorrow,

1578
01:06:00,336 --> 01:06:01,816
you want to update
that same row.

1579
01:06:01,816 --> 01:06:04,016
You do not want duplicate
Google rows just

1580
01:06:04,016 --> 01:06:04,736
because there's no need.

1581
01:06:04,736 --> 01:06:05,386
It's inefficient.

1582
01:06:05,386 --> 01:06:07,166
You can just update
that particular value

1583
01:06:07,236 --> 01:06:08,566
in this version of the story.

1584
01:06:09,016 --> 01:06:11,466
So with this kind of query,
you could say exactly that.

1585
01:06:11,636 --> 01:06:15,406
INSERT INTO table and then
in parenthesis "Google"

1586
01:06:15,406 --> 01:06:18,246
or whatever its stock
symbol is, comma something,

1587
01:06:18,496 --> 01:06:22,446
comma 1 where the 1 is
the number of shares

1588
01:06:22,446 --> 01:06:23,416
that you're trying to buy.

1589
01:06:24,256 --> 01:06:28,186
If, though, there is a duplicate
key, well, what's likely

1590
01:06:28,186 --> 01:06:29,816
to be the unique key here?

1591
01:06:29,816 --> 01:06:32,986
Goog, G-O-O-G happens to
be the symbol for Google.

1592
01:06:33,226 --> 01:06:37,146
So if you already have a row in
this table with a key of Goog,

1593
01:06:37,326 --> 01:06:39,046
what is this code
telling it to do?

1594
01:06:39,046 --> 01:06:41,206
It's instead saying,
"Don't insert a new row

1595
01:06:41,206 --> 01:06:43,486
with the values a,
b, c. Instead,

1596
01:06:43,486 --> 01:06:46,026
just update the existing
row c value

1597
01:06:46,026 --> 01:06:47,256
by incrementing it by one."

1598
01:06:48,146 --> 01:06:50,226
So in other words, this
is one way of expressing

1599
01:06:50,496 --> 01:06:52,946
with a new query that we
haven't seen before with the

1600
01:06:52,946 --> 01:06:56,796
ON DUPLICATE KEY syntax
essentially checking

1601
01:06:56,796 --> 01:07:00,806
for the presence of a row or
a value and then acting based

1602
01:07:00,806 --> 01:07:04,016
on that check all
atomically, all at once.

1603
01:07:04,516 --> 01:07:07,026
By contrast, if you had to do
this manually, you would need

1604
01:07:07,026 --> 01:07:09,376
to do a select to first find
out if there's already a row

1605
01:07:09,376 --> 01:07:12,566
for Google, then you would
do an update or an insert.

1606
01:07:12,866 --> 01:07:16,256
This allows you to collapse a
select followed by an update

1607
01:07:16,256 --> 01:07:18,156
or an insert into
just one query.

1608
01:07:18,436 --> 01:07:19,736
So that's one way of doing this

1609
01:07:19,736 --> 01:07:22,446
but unfortunately
that is not enough.

1610
01:07:22,816 --> 01:07:26,236
So thankfully, databases
typically support transaction

1611
01:07:26,306 --> 01:07:26,846
these days.

1612
01:07:27,316 --> 01:07:29,106
So InnoDB is now relevant.

1613
01:07:29,296 --> 01:07:32,526
The storage engine
that we chose earlier

1614
01:07:32,756 --> 01:07:34,986
for my database was
InnoDB by default.

1615
01:07:35,286 --> 01:07:36,526
That just means that's
the format

1616
01:07:36,526 --> 01:07:37,906
in which all of my
data is stored.

1617
01:07:38,316 --> 01:07:39,626
So what does that really mean?

1618
01:07:39,626 --> 01:07:42,446
Well, that means I have a
feature called transactions

1619
01:07:42,726 --> 01:07:45,846
that I can use on that database
engine type but not on others.

1620
01:07:46,296 --> 01:07:48,496
So how do you use transactions?

1621
01:07:48,766 --> 01:07:49,556
You can do this.

1622
01:07:49,556 --> 01:07:53,006
These four lines represent
four separate SQL statements.

1623
01:07:53,316 --> 01:07:55,226
The first, START TRANSACTION

1624
01:07:55,226 --> 01:07:57,776
and you can also say begin
transaction, they're synonyms.

1625
01:07:58,186 --> 01:08:01,946
That just means here comes
a sequence of SQL queries

1626
01:08:02,136 --> 01:08:04,116
that I want to execute
atomically.

1627
01:08:04,446 --> 01:08:07,656
In other words, execute all
of these together or none

1628
01:08:07,656 --> 01:08:10,866
of them do not interrupt
anyone of these steps

1629
01:08:10,916 --> 01:08:13,166
with someone else's queries
or anything like that.

1630
01:08:13,266 --> 01:08:15,716
Put a lock on these rows.

1631
01:08:16,066 --> 01:08:20,276
So UPDATE account SET balance =
balance - 1000 WHERE number = 2,

1632
01:08:20,276 --> 01:08:23,996
UPDATE account SET balance =
balance + 1000 WHERE number = 1.

1633
01:08:24,236 --> 01:08:25,236
What's the context here?

1634
01:08:25,236 --> 01:08:26,246
It's kind of arbitrary.

1635
01:08:26,456 --> 01:08:29,046
But imagine that I'm just
trying to transfer $1000

1636
01:08:29,046 --> 01:08:30,316
between two different accounts.

1637
01:08:30,596 --> 01:08:32,606
One of those accounts,
account number is one,

1638
01:08:32,686 --> 01:08:34,196
the other account number is two.

1639
01:08:34,616 --> 01:08:36,906
So this is one of
those scenarios much

1640
01:08:36,906 --> 01:08:40,746
like the ATM withdrawal that
I want both the deduction

1641
01:08:40,746 --> 01:08:43,386
from one account and the
deposit into the other

1642
01:08:43,556 --> 01:08:47,716
to happen together or not at
all, otherwise I'm going to end

1643
01:08:47,716 --> 01:08:52,036
up losing $1000 accidentally
or gaining $1000 accidentally

1644
01:08:52,036 --> 01:08:56,156
and not just moving $1000
from one account to the other.

1645
01:08:56,506 --> 01:08:58,166
COMMIT, as the name suggests,

1646
01:08:58,166 --> 01:09:00,506
means commit this
transaction to the database.

1647
01:09:00,506 --> 01:09:01,726
In other words, execute.

1648
01:09:01,986 --> 01:09:04,076
So by saying START TRANSACTION,

1649
01:09:04,076 --> 01:09:05,816
you're essentially telling
the database to buffer

1650
01:09:05,816 --> 01:09:07,706
up a few SQL statements.

1651
01:09:08,006 --> 01:09:10,136
Once you do COMMIT,
it executes them all

1652
01:09:10,136 --> 01:09:13,096
without letting anyone
else squeeze inside there.

1653
01:09:13,386 --> 01:09:17,616
What's also nice is that there's
a command we can use that's sort

1654
01:09:17,616 --> 01:09:20,006
of the opposite of
COMMIT, it's ROLLBACK.

1655
01:09:20,496 --> 01:09:23,006
So this is another
powerful feature of InnoDB.

1656
01:09:23,006 --> 01:09:26,156
Whereby suppose we do the
following, start transaction,

1657
01:09:26,206 --> 01:09:28,066
update one account,
update the other account

1658
01:09:28,066 --> 01:09:31,196
and now you do a check, select
the balance from the accounts

1659
01:09:31,196 --> 01:09:33,986
where number equals 2 and then
in my-- with my hash sign there,

1660
01:09:34,286 --> 01:09:36,116
I'm just proposing that
we have some PHP code.

1661
01:09:36,116 --> 01:09:38,376
It's in the comment because I
just mean it to be pseudo code.

1662
01:09:38,676 --> 01:09:41,616
Suppose that I screwed up and
after selecting the balance

1663
01:09:41,616 --> 01:09:44,726
from account where number
equals 2, now I realize, shoot,

1664
01:09:44,726 --> 01:09:45,826
now the user has gone

1665
01:09:45,826 --> 01:09:48,686
to a negative $1000
or negative $1 even.

1666
01:09:48,966 --> 01:09:53,206
I noticed a problem and I want
to undo every one of the steps

1667
01:09:53,316 --> 01:09:56,826
that I just executed whether
it's three steps or two or ten.

1668
01:09:57,436 --> 01:09:58,676
I want to undo these changes.

1669
01:09:58,676 --> 01:10:01,266
It's like hitting command
or control Z, Z, Z, Z,

1670
01:10:01,266 --> 01:10:03,846
Z when you really screw up in an
essay you want to just go back,

1671
01:10:03,846 --> 01:10:04,616
back, back, back, back

1672
01:10:04,616 --> 01:10:05,976
or in some programming
code you're writing.

1673
01:10:06,446 --> 01:10:08,826
So ROLLBACK undoes everything

1674
01:10:09,136 --> 01:10:10,926
up until the most
recent transaction.

1675
01:10:10,926 --> 01:10:14,036
So it's wonderfully powerful
and allows you to try

1676
01:10:14,036 --> 01:10:15,186
to achieve some result.

1677
01:10:15,346 --> 01:10:18,276
But if you can't, you don't have
to figure out how to fix all

1678
01:10:18,276 --> 01:10:21,386
of your various tables
and cells based

1679
01:10:21,386 --> 01:10:22,756
on the changes you
made thus far.

1680
01:10:23,796 --> 01:10:26,776
So wonderfully powerful and it's
also quite efficient and it's

1681
01:10:26,776 --> 01:10:28,636
in contrast to something
called locks.

1682
01:10:29,086 --> 01:10:30,686
So MyISAM, for many years,

1683
01:10:30,686 --> 01:10:34,096
was the more popular
default engine for MySQL.

1684
01:10:34,626 --> 01:10:37,106
The upside of MyISAM is
that it actually tends

1685
01:10:37,106 --> 01:10:39,946
in certain benchmarks to be
faster than InnoDB especially

1686
01:10:39,946 --> 01:10:42,156
for selects and sometimes
writes,

1687
01:10:42,616 --> 01:10:44,626
but you don't get
the same property

1688
01:10:44,626 --> 01:10:47,106
of having transaction
support but you do have locks.

1689
01:10:47,536 --> 01:10:50,166
In MyISAM tables, which
is just another option

1690
01:10:50,166 --> 01:10:52,836
from that dropdown, essentially
the data then is stored

1691
01:10:52,836 --> 01:10:53,946
in a slightly different format.

1692
01:10:54,336 --> 01:10:55,926
But if you want to
achieve the same idea,

1693
01:10:55,926 --> 01:10:57,956
you have to literally
lock the whole table.

1694
01:10:58,256 --> 01:11:01,426
So the syntax for that which is
less commonly necessary now is

1695
01:11:01,546 --> 01:11:03,396
LOCK TABLES account WRITE.

1696
01:11:03,806 --> 01:11:04,666
Now, what does that mean?

1697
01:11:04,736 --> 01:11:05,986
Account is the table name.

1698
01:11:06,266 --> 01:11:09,046
And what kind of things do
you prevent people from doing?

1699
01:11:09,376 --> 01:11:10,066
From writing.

1700
01:11:10,356 --> 01:11:12,496
So you can think of this is as
like the padlock on the fridge.

1701
01:11:12,496 --> 01:11:15,276
You're preventing your roommate
from writing to the fridge

1702
01:11:15,506 --> 01:11:18,276
by inserting more milk
into that variable.

1703
01:11:18,586 --> 01:11:21,816
So this just means don't let--
anyone else can read the table,

1704
01:11:21,816 --> 01:11:23,496
I don't really care,
but I don't want them

1705
01:11:23,496 --> 01:11:25,716
to write to the table here.

1706
01:11:26,286 --> 01:11:28,746
So as now, I'm going to select
the balance and then I'm going

1707
01:11:28,746 --> 01:11:30,176
to go ahead and update
the balance,

1708
01:11:30,176 --> 01:11:32,726
this time equal to 1500.

1709
01:11:32,726 --> 01:11:34,346
And then I'm going
to unlock the tables.

1710
01:11:34,546 --> 01:11:36,226
So in effect, you can
do the same thing.

1711
01:11:36,226 --> 01:11:37,986
You can say execute
all of this stuff

1712
01:11:38,406 --> 01:11:39,896
and don't let anyone
get in the middle.

1713
01:11:40,076 --> 01:11:42,126
But the problem with
locks in MyISAM is

1714
01:11:42,126 --> 01:11:45,486
that you literally lock the
entire table which means even

1715
01:11:45,486 --> 01:11:47,966
if someone wants to check
someone else's account balance

1716
01:11:48,226 --> 01:11:51,236
or someone else wants to make a
deposit that has nothing to do

1717
01:11:51,236 --> 01:11:52,416
with account number 2.

1718
01:11:52,736 --> 01:11:54,066
The entire table is locked.

1719
01:11:54,406 --> 01:11:57,206
So this tends to be
bad for efficiency

1720
01:11:57,326 --> 01:11:59,136
if you're essentially
telling everyone,

1721
01:11:59,136 --> 01:12:01,486
every other row especially
if you have a million rows,

1722
01:12:01,696 --> 01:12:04,586
no one else can deposit money,
check their account balances

1723
01:12:04,586 --> 01:12:07,386
or anything because I am
blocking this whole table.

1724
01:12:07,866 --> 01:12:09,346
Transactions in InnoDB

1725
01:12:09,346 --> 01:12:11,936
by contrast effectively
only lock the rows

1726
01:12:11,936 --> 01:12:12,766
that you care about.

1727
01:12:13,206 --> 01:12:15,686
So it's much better for
performance at least

1728
01:12:15,686 --> 01:12:16,906
in terms of transactions.

1729
01:12:16,906 --> 01:12:18,706
So that there is the tradeoff.

1730
01:12:19,196 --> 01:12:23,506
All right, questions?

1731
01:12:24,046 --> 01:12:28,366
So let me come back to one thing

1732
01:12:28,366 --> 01:12:30,286
that relates now
to InnoDB as well.

1733
01:12:30,516 --> 01:12:33,056
So recall that we were
re-implementing these couple

1734
01:12:33,056 --> 01:12:36,056
of tables for the example
involving employees

1735
01:12:36,056 --> 01:12:36,696
and products.

1736
01:12:37,116 --> 01:12:39,146
And suppose now, what I want

1737
01:12:39,146 --> 01:12:43,446
to do is refine a little bit
this thing here, Employee_ID,

1738
01:12:43,696 --> 01:12:45,606
recall we defined
an index on it.

1739
01:12:45,896 --> 01:12:46,706
Now, why do we do that?

1740
01:12:46,706 --> 01:12:47,736
For performance really.

1741
01:12:47,736 --> 01:12:48,796
It's a unique identifier.

1742
01:12:48,796 --> 01:12:50,326
I might want to search
by employee ID

1743
01:12:50,326 --> 01:12:51,286
so I made it an index.

1744
01:12:51,756 --> 01:12:55,466
But notice this, I'm going
to go down to relation view

1745
01:12:55,466 --> 01:12:58,826
in phpMyAdmin which is this link
here below the table summary.

1746
01:12:59,166 --> 01:13:02,746
And now, notice that, oops, did
I not actually check that box.

1747
01:13:02,886 --> 01:13:04,946
I'm going to go back.

1748
01:13:04,946 --> 01:13:06,826
Let's see, employees, we have.

1749
01:13:08,166 --> 01:13:10,956
So here's how you look up
what indexes you've created

1750
01:13:10,956 --> 01:13:12,156
in my-- phpMyAdmin.

1751
01:13:12,156 --> 01:13:13,936
I've clicked the
employees table.

1752
01:13:13,936 --> 01:13:16,846
And here I see a reminder of
what the columns are called.

1753
01:13:17,226 --> 01:13:21,426
If I click Indexes, I'll see
that this tables has two indexes

1754
01:13:21,426 --> 01:13:26,446
at the moment, a primary
index on the Employee_ID field

1755
01:13:26,656 --> 01:13:28,526
and it's unique by
definition of primary.

1756
01:13:28,836 --> 01:13:32,876
And then I remember that I also
put an index on Name which was

1757
01:13:33,126 --> 01:13:35,126
on the Name column
and it's BTREE.

1758
01:13:35,126 --> 01:13:37,626
That's an illusion to
the tree structure.

1759
01:13:37,986 --> 01:13:41,456
That actually does the--
That optimizes the searching.

1760
01:13:41,846 --> 01:13:43,126
But now, let me go
back to Orders

1761
01:13:43,126 --> 01:13:45,366
because I actually think I did
not check the box correctly

1762
01:13:45,366 --> 01:13:46,936
earlier and look at my Indexes.

1763
01:13:46,936 --> 01:13:50,256
And indeed-- OK, I
completely screwed up earlier.

1764
01:13:50,556 --> 01:13:53,326
I didn't mean to check,
select all of those things.

1765
01:13:53,326 --> 01:13:55,416
This is a shortcoming
of my memory

1766
01:13:55,416 --> 01:13:57,216
and of phpMyAdmin's interface.

1767
01:13:57,916 --> 01:14:03,496
What I did here in defining
an index, I defined it on all

1768
01:14:03,496 --> 01:14:05,936
of three fields which
was not what I intended.

1769
01:14:05,936 --> 01:14:09,126
I wanted to have an index on
each of the individual field

1770
01:14:09,126 --> 01:14:11,596
so that I could search on
each field individually.

1771
01:14:11,946 --> 01:14:15,046
This index suggests
that I want to select--

1772
01:14:15,416 --> 01:14:18,086
I want to search on all
three fields simultaneously.

1773
01:14:18,426 --> 01:14:22,476
As with the where statement
like where product ID equals 1

1774
01:14:22,656 --> 01:14:27,226
and product equals table
and Employee_ID equals 3.

1775
01:14:27,636 --> 01:14:30,076
So that's not the kind
of query I had in mind.

1776
01:14:30,346 --> 01:14:31,146
So I'm going to fix this.

1777
01:14:31,206 --> 01:14:35,196
Let me click Drop next to this
key rather next to this index.

1778
01:14:35,196 --> 01:14:37,396
And let me do this manually.

1779
01:14:37,446 --> 01:14:40,936
So I'm going to go
here and add a index.

1780
01:14:41,596 --> 01:14:43,416
I'm going to go here
and add an index

1781
01:14:43,596 --> 01:14:45,036
and over here and add an index.

1782
01:14:45,506 --> 01:14:47,976
Unfortunately, phpMyAdmin,
when you do it all at once,

1783
01:14:47,976 --> 01:14:49,866
apparently assumes
you want a joint key.

1784
01:14:50,156 --> 01:14:51,836
So now, I have three
separate indices

1785
01:14:52,116 --> 01:14:54,046
which is the intended behavior.

1786
01:14:54,326 --> 01:14:57,116
And the link I'm going to
click now is relation view.

1787
01:14:57,116 --> 01:15:00,526
And this is another feature you
get with a database like MySQL

1788
01:15:00,526 --> 01:15:03,416
that you don't get with XML
or with CSV or the like.

1789
01:15:03,996 --> 01:15:06,976
Notice under Relations, so I
can now define what are called

1790
01:15:07,096 --> 01:15:08,326
foreign key constraints.

1791
01:15:08,326 --> 01:15:10,666
And that's a feature
of InnoDB only.

1792
01:15:11,296 --> 01:15:13,386
In other words, I can
now tell the database

1793
01:15:13,956 --> 01:15:16,546
where these fields also appear.

1794
01:15:16,796 --> 01:15:20,436
And the one I care about at
the moment is Employee_ID.

1795
01:15:21,046 --> 01:15:24,306
Notice here that under this
checkbox, I have a list of all

1796
01:15:24,306 --> 01:15:27,256
of the other fields I've defined
in other tables with indices.

1797
01:15:27,726 --> 01:15:31,256
And the one I want to do is I
want to say that this table,

1798
01:15:31,446 --> 01:15:36,376
these orders .Employee_ID
field has a relationship

1799
01:15:36,636 --> 01:15:40,026
with the employee's
table .Employee_ID field.

1800
01:15:40,406 --> 01:15:43,406
In other words, I have
given this an index

1801
01:15:43,526 --> 01:15:45,816
in the orders table,
Employee_ID.

1802
01:15:46,416 --> 01:15:48,856
And where else does
Employee_ID appear?

1803
01:15:48,886 --> 01:15:51,546
Obviously in the employee's
table where it's a primary key.

1804
01:15:51,956 --> 01:15:54,936
So if I want to teach the
database that Employee_ID

1805
01:15:55,086 --> 01:15:58,206
in the orders table
is a foreign key.

1806
01:15:58,606 --> 01:15:59,606
And what's a foreign key?

1807
01:15:59,756 --> 01:16:02,506
Well, it's a field that happens
to be a primary key elsewhere.

1808
01:16:02,786 --> 01:16:04,736
I literally just make
that association here.

1809
01:16:05,086 --> 01:16:06,516
So now, the database,

1810
01:16:06,516 --> 01:16:10,556
MySQL knows that Employee_ID
is a primary key in employees.

1811
01:16:11,156 --> 01:16:13,586
And it's a foreign
key in orders.

1812
01:16:14,106 --> 01:16:14,966
Well, who cares?

1813
01:16:15,196 --> 01:16:18,676
Well now, you can set up rules
for undelete and unupdate.

1814
01:16:19,306 --> 01:16:22,636
In other words, by restricting
this field in this way,

1815
01:16:23,046 --> 01:16:27,016
I can now prevent
developers, myself included,

1816
01:16:27,196 --> 01:16:29,996
from deleting this
employee from the database.

1817
01:16:29,996 --> 01:16:34,216
Why? Because if Stephen,
employee three, has both a row

1818
01:16:34,216 --> 01:16:36,846
in the employee's table and he's
also sold a couple of things

1819
01:16:36,846 --> 01:16:39,996
in the orders table, by
specifying this relationship

1820
01:16:39,996 --> 01:16:42,066
between the two tables,
I can ensure that I

1821
01:16:42,066 --> 01:16:44,926
or a colleague doesn't
accidentally delete Stephen

1822
01:16:45,126 --> 01:16:46,396
from the employee's table

1823
01:16:46,396 --> 01:16:48,696
because I've restricted
deletes and updates.

1824
01:16:49,056 --> 01:16:50,806
In other words, if I
try to delete this--

1825
01:16:51,076 --> 01:16:54,316
delete Stephen, it's not going
to be allowed because I'm going

1826
01:16:54,316 --> 01:16:55,836
to be violating these
constraints.

1827
01:16:56,086 --> 01:16:58,336
By contrast, I can
specify that I don't want

1828
01:16:58,336 --> 01:17:00,926
to restrict those
kinds of operations.

1829
01:17:01,156 --> 01:17:03,026
I want to do something
like CASCADE.

1830
01:17:03,026 --> 01:17:05,306
And CASCADE is kind
of the opposite.

1831
01:17:05,306 --> 01:17:07,826
What this means is that
if I delete Stephen

1832
01:17:08,586 --> 01:17:10,776
from the employee's table,
you know what, go ahead

1833
01:17:10,776 --> 01:17:12,646
and delete Stephen from
the orders table, too.

1834
01:17:12,646 --> 01:17:13,506
We're done with Stephen.

1835
01:17:13,756 --> 01:17:14,816
If he's fired, that's it.

1836
01:17:14,816 --> 01:17:16,986
We're going to remove
all evidence of him.

1837
01:17:17,426 --> 01:17:19,596
So this might not be
the desired behavior.

1838
01:17:19,596 --> 01:17:21,256
Sometimes it is,
sometimes it isn't.

1839
01:17:21,456 --> 01:17:23,986
But if you did want that
behavior, where deleting

1840
01:17:23,986 --> 01:17:27,046
from one table automatically
triggers deletes from another,

1841
01:17:27,296 --> 01:17:28,956
you can do this again
all in the database.

1842
01:17:28,956 --> 01:17:29,926
So what's the upside?

1843
01:17:30,146 --> 01:17:32,156
Well, you don't have to
worry about this in code,

1844
01:17:32,156 --> 01:17:33,896
whether it's PHP or
some other language,

1845
01:17:33,896 --> 01:17:36,866
your database can do be
self-managing in this way.

1846
01:17:37,156 --> 01:17:38,246
So all you have to worry

1847
01:17:38,246 --> 01:17:40,636
about is deleting the one
thing you care abut, Stephen,

1848
01:17:40,636 --> 01:17:41,616
from the user's table.

1849
01:17:41,736 --> 01:17:44,136
And if Stephen's name
or ID is anywhere else

1850
01:17:44,136 --> 01:17:46,716
in your entire database and
any of your dozens of tables,

1851
01:17:47,166 --> 01:17:49,546
these kinds of operations
can go through automatically

1852
01:17:49,546 --> 01:17:50,786
and clean all that up for you.

1853
01:17:50,976 --> 01:17:53,066
Now again, probably bad
for terminating an employee

1854
01:17:53,066 --> 01:17:55,256
because you don't want to forget
what they did actually sell.

1855
01:17:55,486 --> 01:17:56,856
So we might instead
want to do something

1856
01:17:56,856 --> 01:17:58,846
like restrict in that case.

1857
01:17:58,936 --> 01:18:00,456
But that's another
advantage of InnoDB

1858
01:18:00,456 --> 01:18:03,726
that you don't get from MyISAM.

1859
01:18:03,846 --> 01:18:08,736
All right, so let's
talk about design

1860
01:18:08,736 --> 01:18:10,046
with an eye toward project 1.

1861
01:18:10,046 --> 01:18:12,976
So project 0 still in
progress presumably,

1862
01:18:13,056 --> 01:18:15,546
per the course's website
we'll announce tomorrow

1863
01:18:15,546 --> 01:18:17,656
on the course's homepage
how to submit it

1864
01:18:17,736 --> 01:18:19,006
and the process for doing so.

1865
01:18:19,006 --> 01:18:20,376
So keep an eye on that.

1866
01:18:20,376 --> 01:18:21,986
You will need to submit
it electronically.

1867
01:18:22,386 --> 01:18:28,826
And then this second
project 1 is due next week

1868
01:18:28,936 --> 01:18:31,056
so realize there's
deliberately this overlap

1869
01:18:31,056 --> 01:18:33,426
so that even though project
0 is not due until Wednesday,

1870
01:18:33,636 --> 01:18:35,656
we'll release the next
project on a Monday just

1871
01:18:35,656 --> 01:18:37,446
so that you can manage
your own time.

1872
01:18:37,446 --> 01:18:39,016
And if you get a
head start, great.

1873
01:18:39,226 --> 01:18:41,776
If not, you can start on
Wednesday or thereafter.

1874
01:18:42,316 --> 01:18:46,186
All right, so CS75 Finance is
the challenge at hand next.

1875
01:18:46,496 --> 01:18:48,126
And this is your second project

1876
01:18:48,126 --> 01:18:50,346
that actually involves
MySQL this time around.

1877
01:18:50,346 --> 01:18:52,116
So an actual database,
no more XML.

1878
01:18:52,706 --> 01:18:55,816
Division for CS75 Finance
for the specification,

1879
01:18:55,816 --> 01:18:57,766
whose PDF is already online is

1880
01:18:57,886 --> 01:19:01,226
to implement your own
etrade.com-like website,

1881
01:19:01,446 --> 01:19:04,826
a website that allows you to
manage the buying and selling

1882
01:19:04,826 --> 01:19:07,856
of stocks and to
allow users to pretend

1883
01:19:07,906 --> 01:19:09,836
that they're actually buying
and selling real stocks.

1884
01:19:09,836 --> 01:19:11,926
In other words, if I'm a
user using your version

1885
01:19:11,926 --> 01:19:15,066
of CS75 Finance, you're going
to have to allow me to register

1886
01:19:15,066 --> 01:19:16,986
for an account and then
log in to your site.

1887
01:19:17,216 --> 01:19:18,286
And the moment I register,

1888
01:19:18,286 --> 01:19:20,826
it's a nice little promotion
you're going to give me $10,000.

1889
01:19:21,076 --> 01:19:23,596
And you're going to do that
by associating that with me,

1890
01:19:23,596 --> 01:19:25,526
my user ID, somewhere
in your database.

1891
01:19:26,046 --> 01:19:30,006
And then, you're going to
enable me to get stock quotes

1892
01:19:30,006 --> 01:19:32,656
like how much is Google right
now, how much is Yahoo right now

1893
01:19:32,656 --> 01:19:34,646
or any other company,
and then buy

1894
01:19:34,646 --> 01:19:36,956
and sell stocks, so to speak.

1895
01:19:37,066 --> 01:19:38,336
And how do you buy a stock?

1896
01:19:38,336 --> 01:19:40,896
Well, you're going to check how
much the stock currently is.

1897
01:19:40,896 --> 01:19:43,406
And if I want to buy one share,
you're going to do the math,

1898
01:19:43,546 --> 01:19:46,006
one share times stock price
is probably a few dollars

1899
01:19:46,006 --> 01:19:46,886
or a few hundred dollars.

1900
01:19:47,086 --> 01:19:48,786
Then you're going to check
what's my account balance.

1901
01:19:48,786 --> 01:19:50,956
I have $10,000 out of the gate.

1902
01:19:50,956 --> 01:19:52,746
And then you're going to do
the math and say, "OK now,

1903
01:19:52,746 --> 01:19:55,076
you have $9,000 something
but you have one share

1904
01:19:55,076 --> 01:19:56,946
of stock in some company.

1905
01:19:57,386 --> 01:20:01,066
So the building blocks we've
been looking at tonight in terms

1906
01:20:01,066 --> 01:20:04,446
of locks and in terms of
automatically updating rows

1907
01:20:04,446 --> 01:20:06,346
and the like, it's
definitely going to come

1908
01:20:06,346 --> 01:20:08,506
into play here especially
since money is involved,

1909
01:20:08,616 --> 01:20:10,796
where correctness is
all the more important

1910
01:20:10,796 --> 01:20:12,916
because otherwise
someone, you or the bank,

1911
01:20:13,146 --> 01:20:13,996
are going to lose money.

1912
01:20:14,376 --> 01:20:16,396
So the proposal we
have here is this.

1913
01:20:16,396 --> 01:20:19,776
First, take notice that here
is the most recent piece

1914
01:20:19,776 --> 01:20:22,736
of spam I got over the weekend,
advertising some penny stock

1915
01:20:22,736 --> 01:20:24,686
and this penny stock goes
for 3 cents right now.

1916
01:20:25,006 --> 01:20:28,086
Invariably, we have to change
this example every year

1917
01:20:28,086 --> 01:20:30,186
because these penny stocks keep
getting driven out of business.

1918
01:20:30,616 --> 01:20:31,946
But this is an email I got.

1919
01:20:31,946 --> 01:20:34,166
Someone encouraging me
to buy this penny stock

1920
01:20:34,166 --> 01:20:35,916
and a penny stock is
one that's really cheap,

1921
01:20:35,996 --> 01:20:37,666
some usually some
number of pennies.

1922
01:20:38,176 --> 01:20:41,756
And this was 3 cents
as of yesterday.

1923
01:20:42,336 --> 01:20:45,256
So, what's the relevance here?

1924
01:20:45,456 --> 01:20:47,396
This is really just a
screenshot of Yahoo Finance.

1925
01:20:47,626 --> 01:20:49,976
What's nice about Yahoo
Finance is that you can look

1926
01:20:49,976 --> 01:20:51,866
up stock quotes like this one.

1927
01:20:51,866 --> 01:20:54,186
ROSV is the stock symbol.

1928
01:20:54,186 --> 01:20:56,896
If you're unfamiliar with
buying and selling stocks,

1929
01:20:56,896 --> 01:20:59,646
realize that this specification
points you at a tutorial

1930
01:20:59,646 --> 01:21:01,816
for them that applies to--

1931
01:21:02,146 --> 01:21:04,636
that will give you a sense

1932
01:21:04,636 --> 01:21:06,786
of how this world
works if unfamiliar.

1933
01:21:06,786 --> 01:21:09,466
But let me go to Yahoo
Finance and type in ROSV,

1934
01:21:09,626 --> 01:21:12,046
which again is the
symbol for Rostock

1935
01:21:12,046 --> 01:21:13,756
or Rostock Ventures Comp, Corp,

1936
01:21:13,756 --> 01:21:15,926
whatever that is,
and get quotes.

1937
01:21:15,926 --> 01:21:16,926
And I see pretty much this.

1938
01:21:16,926 --> 01:21:19,766
So it looks like no, it didn't
really trade heavily today.

1939
01:21:20,616 --> 01:21:24,176
In fact, 20,000 shares were
actually bought or sold today

1940
01:21:24,176 --> 01:21:26,416
but the price remains
at 3 cents.

1941
01:21:27,076 --> 01:21:29,386
So this is nice and interesting
but what's really need

1942
01:21:29,386 --> 01:21:35,096
about Yahoo is that down
here at the bottom is

1943
01:21:35,096 --> 01:21:36,476
that they have this
little toolbox.

1944
01:21:36,866 --> 01:21:38,326
Because what we want
to do is not want to--

1945
01:21:38,326 --> 01:21:40,466
we don't want to resort to
something like screen scraping

1946
01:21:40,466 --> 01:21:42,146
where to get a stock
price from Yahoo,

1947
01:21:42,146 --> 01:21:44,696
we have to grab their web
page and then read the HTML

1948
01:21:44,696 --> 01:21:47,436
and figure out what number in
that mess is the stock price.

1949
01:21:47,866 --> 01:21:50,246
Rather here is a nice
link, Download Data.

1950
01:21:50,626 --> 01:21:51,706
So let's actually click this.

1951
01:21:51,706 --> 01:21:53,716
I'm going to click
Download Data delayed

1952
01:21:53,716 --> 01:21:54,756
and that's just a reminder

1953
01:21:54,756 --> 01:21:56,586
that it's actually a
few minutes delayed.

1954
01:21:56,986 --> 01:21:58,486
It's not truly real time.

1955
01:21:58,666 --> 01:22:00,826
Now notice this downloaded
file called quotes.csv.

1956
01:22:00,826 --> 01:22:04,016
So this is nice because now we
can tie it together last week's

1957
01:22:04,016 --> 01:22:06,586
discussion of CSV and
open up this file.

1958
01:22:07,406 --> 01:22:13,086
It's going to open up a
spreadsheet program like numbers

1959
01:22:13,156 --> 01:22:19,146
or like Excel and what do I
have here, well this is it.

1960
01:22:19,646 --> 01:22:21,756
So it's a very simple
spreadsheet, it seems,

1961
01:22:21,756 --> 01:22:24,386
where the left-hand
column contains ROSV,

1962
01:22:24,456 --> 01:22:28,936
the next column contains 0.03
which is presumably the price,

1963
01:22:29,006 --> 01:22:32,296
the next column contains
today's date and then the time

1964
01:22:32,506 --> 01:22:34,516
and then I'm not sure
about those other fields,

1965
01:22:34,586 --> 01:22:35,486
draw off the top of my head

1966
01:22:35,486 --> 01:22:37,476
because it's not actually
indicated but in a spec,

1967
01:22:37,756 --> 01:22:39,156
we actually point
out exactly what--

1968
01:22:39,156 --> 01:22:41,126
how you can figure out what
all the various fields mean.

1969
01:22:41,316 --> 01:22:43,296
But the real important one
for now is just the price.

1970
01:22:43,856 --> 01:22:44,916
It's 3 cents.

1971
01:22:45,056 --> 01:22:47,126
So what's the implication
of having access to CSVs?

1972
01:22:47,126 --> 01:22:50,916
It turns out it's really
easy in PHP to write code

1973
01:22:50,916 --> 01:22:54,746
that essentially pretends to be
a browser, hits a foreign URL,

1974
01:22:54,956 --> 01:22:56,816
gets back the result, and if--

1975
01:22:56,816 --> 01:22:59,116
it's great if the
result is in CSV.

1976
01:22:59,116 --> 01:23:00,976
What function do we
talk about last week

1977
01:23:00,976 --> 01:23:03,516
that will actually parse the CSV
file and hand you back an array?

1978
01:23:04,186 --> 01:23:05,326
What's that?

1979
01:23:05,526 --> 01:23:05,926
>> XPath?

1980
01:23:06,206 --> 01:23:08,066
>> Not XPath.

1981
01:23:08,066 --> 01:23:10,776
So XPath was only for XML
but there was a function

1982
01:23:11,056 --> 01:23:15,216
that we talked about that can
open a CSV file and parse it

1983
01:23:15,216 --> 01:23:18,146
for you and hand all of
the issues involving commas

1984
01:23:18,146 --> 01:23:19,096
and quotes and the like.

1985
01:23:19,096 --> 01:23:19,536
Louis?

1986
01:23:19,726 --> 01:23:20,496
>> Fgetcsv.

1987
01:23:20,496 --> 01:23:21,506
>> Yeah, so fgetcsv.

1988
01:23:21,506 --> 01:23:23,976
We didn't use it but now is
going to be our chance tonight

1989
01:23:23,976 --> 01:23:26,596
to actually use this
function to fetch the data

1990
01:23:26,596 --> 01:23:28,306
and then do something with it.

1991
01:23:28,636 --> 01:23:30,296
So we can pretend
to be a browser,

1992
01:23:30,406 --> 01:23:32,486
effectively downloading
the CSV file.

1993
01:23:32,486 --> 01:23:35,236
Why? Well if we'd look
at the Yahoo Finance URL,

1994
01:23:35,236 --> 01:23:37,636
let's take a look at
what it actually is.

1995
01:23:38,066 --> 01:23:41,456
Let me go ahead and
control click

1996
01:23:41,456 --> 01:23:44,536
and choose copy link
address and then paste this

1997
01:23:44,576 --> 01:23:45,916
into a little text file.

1998
01:23:46,256 --> 01:23:48,546
Notice that this is the
URL that we just visited,

1999
01:23:48,546 --> 01:23:51,746
download.finance
.yahoo.com/d/ quotes.csv?

2000
01:23:51,796 --> 01:23:53,826
The question mark is
the interesting part

2001
01:23:53,826 --> 01:23:56,046
because that means
there's some dynamism.

2002
01:23:56,636 --> 01:24:00,646
S equals ROSV and f
equals a lot of stuff.

2003
01:24:01,456 --> 01:24:05,256
So take a guess, what does the
s HTTP parameter apparently

2004
01:24:05,256 --> 01:24:07,716
represent here?

2005
01:24:07,926 --> 01:24:08,026
Yeah?

2006
01:24:08,356 --> 01:24:08,596
>> Search?

2007
01:24:09,026 --> 01:24:11,036
>> Search or, yeah, more
concretely here, though,

2008
01:24:11,036 --> 01:24:12,006
what are you searching for?

2009
01:24:12,296 --> 01:24:13,216
Isaac?

2010
01:24:13,456 --> 01:24:13,656
>> Stock.

2011
01:24:13,846 --> 01:24:17,296
>> The stock symbol, so s is
presumably the stock symbol.

2012
01:24:17,296 --> 01:24:18,726
Now I didn't create this URL.

2013
01:24:18,726 --> 01:24:22,146
Remember I searched for ROSV
and hit enter, but it appears

2014
01:24:22,146 --> 01:24:25,146
to have generated later in
that page under the toolbox

2015
01:24:25,146 --> 01:24:28,316
of link that, you know, I feel
like I can fake this, right?

2016
01:24:28,316 --> 01:24:30,306
We already know that you
can generate your own URLs

2017
01:24:30,306 --> 01:24:33,666
or your own strings in
PHP, surely I could copy

2018
01:24:33,666 --> 01:24:36,626
and paste most of this URL but
just change which part of it

2019
01:24:37,556 --> 01:24:38,676
in my forthcoming project?

2020
01:24:38,886 --> 01:24:39,836
Just the value of?

2021
01:24:40,196 --> 01:24:40,606
>> The stock?

2022
01:24:40,786 --> 01:24:42,196
>> S, yeah, just the
value of the stock.

2023
01:24:42,476 --> 01:24:43,866
Now what does f refer to?

2024
01:24:43,866 --> 01:24:46,446
That's another parameter and I
only know what those fields mean

2025
01:24:46,656 --> 01:24:48,206
by looking it up in
the documentation.

2026
01:24:48,206 --> 01:24:51,896
We give you URL in line but
this is just some arbitrary sort

2027
01:24:51,896 --> 01:24:55,326
of 1990 style thing from
Yahoo where they have sl,

2028
01:24:55,326 --> 01:24:57,336
I think that is, 1d1t1.

2029
01:24:57,716 --> 01:24:58,506
Each of those letters

2030
01:24:58,506 --> 01:25:01,576
and numbers just
represents a certain field,

2031
01:25:01,636 --> 01:25:04,126
like the stock price,
the date, the time,

2032
01:25:04,126 --> 01:25:07,096
all of the columns we saw on our
spreadsheet, they are derived

2033
01:25:07,096 --> 01:25:10,046
from that cryptic looking
string there and some random guy

2034
01:25:10,046 --> 01:25:11,676
in the internet figured
out what all of these meant

2035
01:25:11,676 --> 01:25:13,186
and that's the URL
we put in the specs

2036
01:25:13,186 --> 01:25:14,906
so that you can go see
what they all represent.

2037
01:25:15,416 --> 01:25:17,306
But by default, we
get back this.

2038
01:25:17,686 --> 01:25:21,466
This last thing, &e=.csv
is actually a hack.

2039
01:25:21,466 --> 01:25:24,386
Older browsers sometimes used
to choke when you were trying

2040
01:25:24,386 --> 01:25:27,956
to download something like
a CSV if the file extension

2041
01:25:28,216 --> 01:25:30,916
in the URL was not actually CSV.

2042
01:25:31,176 --> 01:25:32,736
So even though this
is not a file,

2043
01:25:32,736 --> 01:25:37,456
this is probably a dynamically
generated spreadsheet of sort,

2044
01:25:37,456 --> 01:25:40,856
a CSV file, it's not an actual
file on a hard drive somewhere,

2045
01:25:41,096 --> 01:25:43,936
that's just a silly little
old school workaround

2046
01:25:44,126 --> 01:25:46,686
for various browser issues
that have largely disappeared

2047
01:25:46,686 --> 01:25:47,706
at least in this case.

2048
01:25:48,176 --> 01:25:50,036
So what's the takeaway here?

2049
01:25:50,236 --> 01:25:51,756
It feels like I should
be able to copy

2050
01:25:51,756 --> 01:25:56,856
and paste this entire URL and
just change this if I want

2051
01:25:56,856 --> 01:25:59,616
to implement my own
stock trading website

2052
01:25:59,616 --> 01:26:03,246
that allows users to specify
stocks via my own interface.

2053
01:26:03,726 --> 01:26:05,246
So let's take a look
at an example here.

2054
01:26:05,246 --> 01:26:06,716
Let me go into the appliance

2055
01:26:07,466 --> 01:26:10,036
where I have this
simple page waiting,

2056
01:26:10,036 --> 01:26:12,646
this is one of the
snippets of code available

2057
01:26:12,646 --> 01:26:16,876
on the lecture's page
tonight for tonight's lecture.

2058
01:26:16,876 --> 01:26:18,586
This is index.php.

2059
01:26:18,646 --> 01:26:20,366
The functionality
I've implemented

2060
01:26:20,366 --> 01:26:21,416
in advance here is this.

2061
01:26:21,586 --> 01:26:25,196
If I go here and type in
GOOG as Google stock symbol,

2062
01:26:25,196 --> 01:26:27,396
it doesn't matter if it's upper
case or lower case for Yahoo,

2063
01:26:27,706 --> 01:26:31,546
and then click Submit, notice
that this is what comes back.

2064
01:26:32,266 --> 01:26:35,996
So as of tonight, Monday,
it's apparently 574.92.

2065
01:26:36,196 --> 01:26:37,106
Let's try another one.

2066
01:26:37,106 --> 01:26:42,506
Microsoft, MSFT is their symbol,
submit, there is this 29.44.

2067
01:26:42,816 --> 01:26:45,406
We can do something
like Facebook, submit,

2068
01:26:46,216 --> 01:26:48,526
there is this 28 and so forth.

2069
01:26:48,826 --> 01:26:49,866
Now, where is this coming?

2070
01:26:50,126 --> 01:26:51,286
This is not hard coded.

2071
01:26:51,566 --> 01:26:53,756
This is generated
dynamically by the server.

2072
01:26:53,756 --> 01:26:56,716
So let's take a look
now at index.php.

2073
01:26:56,716 --> 01:26:59,756
Let me go ahead and
open up index,

2074
01:27:01,036 --> 01:27:03,386
and it's a pretty small program

2075
01:27:03,776 --> 01:27:06,246
and index rather
is just my form.

2076
01:27:06,686 --> 01:27:09,236
So notice this is where
we were a moment ago.

2077
01:27:09,886 --> 01:27:11,586
Where do I send the user

2078
01:27:11,586 --> 01:27:13,436
when I submit the
form just to be clear?

2079
01:27:14,056 --> 01:27:14,276
Yeah?

2080
01:27:14,686 --> 01:27:15,806
>> Quote.php.

2081
01:27:16,066 --> 01:27:16,406
>> Quote.php.

2082
01:27:16,406 --> 01:27:18,766
I'm using GET just
because I don't have to

2083
01:27:18,766 --> 01:27:21,186
but I chose to, input
name="symbol".

2084
01:27:21,186 --> 01:27:22,866
I didn't call it s
but not a big deal.

2085
01:27:22,866 --> 01:27:24,086
I can call it anything I want.

2086
01:27:24,186 --> 01:27:25,696
Type as text and
then a submit button.

2087
01:27:25,806 --> 01:27:26,366
So that's it.

2088
01:27:26,886 --> 01:27:30,056
And now what's the next
file obviously of interest?

2089
01:27:30,196 --> 01:27:35,866
Quote.php, so let me go in
there and open up quote.php,

2090
01:27:35,866 --> 01:27:39,136
and this is how I go about
getting the stock price.

2091
01:27:39,636 --> 01:27:42,506
So notice at the very top of
my file I've got some PHP tags,

2092
01:27:42,796 --> 01:27:44,016
so now I'm in PHP mode.

2093
01:27:44,356 --> 01:27:45,656
Notice I'm next doing this.

2094
01:27:46,086 --> 01:27:50,966
I'm URL-encoding s. What does
it mean to URL-encode something?

2095
01:27:51,016 --> 01:27:51,136
Yeah?

2096
01:27:51,476 --> 01:27:56,246
>> I think it takes all the--

2097
01:27:56,246 --> 01:28:00,566
all the spaces and
turns it into, like,

2098
01:28:00,566 --> 01:28:01,176
[inaudible] and all that.

2099
01:28:01,366 --> 01:28:03,646
>> Yes. Spaces are generally
just simplified as pluses

2100
01:28:03,646 --> 01:28:06,756
but are B percent
20, yup, percent 20.

2101
01:28:06,756 --> 01:28:10,766
So yes, URL-encode takes
a string and ensures

2102
01:28:10,766 --> 01:28:13,286
that there are no
dangerous characters

2103
01:28:13,286 --> 01:28:16,316
or confusing characters
for URL's sake.

2104
01:28:16,516 --> 01:28:17,776
What are the confusing
characters?

2105
01:28:17,776 --> 01:28:19,246
Well, URLs can not have spaces.

2106
01:28:19,246 --> 01:28:20,786
So how are they typically
represented?

2107
01:28:20,786 --> 01:28:23,276
With a space character or
with descriptive sequence

2108
01:28:23,276 --> 01:28:24,666
of percent two, zero.

2109
01:28:24,936 --> 01:28:27,306
So URL-encode does that.

2110
01:28:27,666 --> 01:28:28,696
Now, is this a big deal?

2111
01:28:28,696 --> 01:28:29,466
I just typed in GOOG.

2112
01:28:29,466 --> 01:28:33,126
URL-encode is not going to
do anything to that string.

2113
01:28:33,126 --> 01:28:34,636
It's not going to do
anything to Microsoft.

2114
01:28:34,636 --> 01:28:36,436
It's not going to
do anything to FB.

2115
01:28:36,466 --> 01:28:38,576
But if I typed in
some weird punctuation

2116
01:28:38,576 --> 01:28:41,426
for certain stock symbols or I
hit the space bar accidentally,

2117
01:28:41,746 --> 01:28:43,626
that would handle
that kind of scenario.

2118
01:28:43,926 --> 01:28:46,946
So this is just one of those
do it as a matter of principle.

2119
01:28:47,026 --> 01:28:49,786
Any time you send user
input to a browser,

2120
01:28:49,936 --> 01:28:51,276
you call htmlspecialchars.

2121
01:28:51,276 --> 01:28:53,366
Any time you send user
input to a database,

2122
01:28:53,726 --> 01:28:55,196
you call
mysql_real_escape_string

2123
01:28:55,196 --> 01:28:58,136
or now the prepare
function in PDO.

2124
01:28:58,686 --> 01:29:03,596
Any time you pass something to
a URL, you should call urlencode

2125
01:29:03,786 --> 01:29:05,206
on it, those three
rules of thumb.

2126
01:29:05,206 --> 01:29:06,456
All right.

2127
01:29:06,726 --> 01:29:09,386
So $url is my next
line in line four

2128
01:29:09,506 --> 01:29:12,056
and notice all I did
literally what I promised.

2129
01:29:12,056 --> 01:29:14,866
I copied and pasted the
URL from Yahoo's website

2130
01:29:15,226 --> 01:29:20,056
and I put this little
placeholder here of $s

2131
01:29:20,056 --> 01:29:21,446
with the curly braces around it.

2132
01:29:21,766 --> 01:29:23,726
The curly braces are
not strictly necessary.

2133
01:29:23,726 --> 01:29:25,996
It's just good practice
that I've gotten into just

2134
01:29:25,996 --> 01:29:30,736
in case it's a complex looking
variable like a superglobal

2135
01:29:30,736 --> 01:29:31,716
or an associative array.

2136
01:29:32,166 --> 01:29:35,906
So that just means put the
value of s there and s, recall,

2137
01:29:35,906 --> 01:29:37,136
was defined one line earlier

2138
01:29:37,436 --> 01:29:39,426
to be the return
value of urlencode.

2139
01:29:39,426 --> 01:29:41,486
All right, the next
line, what am I doing?

2140
01:29:41,916 --> 01:29:44,176
This is what's great
about PHP sometimes.

2141
01:29:44,176 --> 01:29:50,406
It just does so many things
easily, $handle = fopen URL

2142
01:29:50,406 --> 01:29:51,836
"r" let's go in reverse order.

2143
01:29:51,836 --> 01:29:53,656
What does the r likely mean?

2144
01:29:54,506 --> 01:29:54,596
Yeah?

2145
01:29:55,116 --> 01:29:55,256
>> Read.

2146
01:29:55,516 --> 01:29:58,556
>> Read. So this doesn't really
make sense in the context

2147
01:29:58,556 --> 01:30:00,786
of URLs because it's
not like you can write,

2148
01:30:00,786 --> 01:30:03,336
change the URL writing
code on your server

2149
01:30:03,336 --> 01:30:03,976
as opposed to someone else's.

2150
01:30:04,876 --> 01:30:08,396
But fopen is generally used
to open a file, file open.

2151
01:30:08,906 --> 01:30:13,206
But PHP overloads a lot of
these file related functions.

2152
01:30:13,506 --> 01:30:16,206
So that if you instead give it,
not the name of a file that's

2153
01:30:16,206 --> 01:30:19,156
in your current folder, but
you instead give it to URL,

2154
01:30:19,156 --> 01:30:22,126
it will open a connection
to that server via TCP/IP,

2155
01:30:22,126 --> 01:30:24,386
it will do all the
requisite DNS stuff,

2156
01:30:24,386 --> 01:30:28,126
it will get back the results
of that URL and hand it to you

2157
01:30:28,466 --> 01:30:31,576
in the form of a reference,
called the file handle.

2158
01:30:31,956 --> 01:30:33,846
So that's what $handle
represents.

2159
01:30:33,846 --> 01:30:36,976
It represents essentially the
return value from that server

2160
01:30:36,976 --> 01:30:38,636
that represents the
file that came back,

2161
01:30:38,636 --> 01:30:40,586
which is hopefully
an actual CSV.

2162
01:30:40,926 --> 01:30:43,366
Now, here is the function I
promised that we could use.

2163
01:30:43,466 --> 01:30:47,936
Fgetcsv takes a file handle,
so it doesn't take a string,

2164
01:30:48,466 --> 01:30:50,616
doesn't take a file name,
it takes a file handle,

2165
01:30:50,616 --> 01:30:52,406
which is the return
the value of fopen.

2166
01:30:52,406 --> 01:30:53,496
So, you have to do it this way.

2167
01:30:54,036 --> 01:30:57,796
And it reads in the
first row from that file.

2168
01:30:58,386 --> 01:31:01,116
So, you would have to call
fgetcsv again and again

2169
01:31:01,116 --> 01:31:03,716
and again if the
spreadsheet contained multiple

2170
01:31:03,716 --> 01:31:04,416
stock symbols.

2171
01:31:04,416 --> 01:31:06,136
But as we saw, there
is just one.

2172
01:31:06,136 --> 01:31:07,786
So, I'm only calling this once.

2173
01:31:08,286 --> 01:31:10,186
So fgetcsv returns the first.

2174
01:31:10,186 --> 01:31:13,056
And what's the data
type that comes back?

2175
01:31:13,286 --> 01:31:17,386
Well, it's an array, where it's
a numerically indexed array,

2176
01:31:17,546 --> 01:31:21,646
where $row open bracket zero
close bracket will represent the

2177
01:31:21,646 --> 01:31:23,206
first column from the CSV.

2178
01:31:23,526 --> 01:31:25,116
Bracket one the next column,

2179
01:31:25,146 --> 01:31:27,336
bracket two the next
column and so forth.

2180
01:31:27,526 --> 01:31:29,866
So, there is no notion
of headers in CSV.

2181
01:31:30,116 --> 01:31:32,976
Sometimes companies or
people will put the first row

2182
01:31:32,976 --> 01:31:38,236
as having words for human
friendliness however fgetcsv is

2183
01:31:38,236 --> 01:31:39,376
not going to use that at all.

2184
01:31:39,656 --> 01:31:43,606
So, in fact, you sometimes want
to call fgetcsv once to E-up

2185
01:31:43,706 --> 01:31:47,166
that first line as being useful
for humans but not for machines,

2186
01:31:47,166 --> 01:31:49,716
because it's the
next row and beyond

2187
01:31:49,966 --> 01:31:50,966
that actually has your data.

2188
01:31:51,196 --> 01:31:54,546
But for us, so simple,
because there's only one column

2189
01:31:54,546 --> 01:31:56,756
or one row to worry about
that came back from Yahoo.

2190
01:31:56,756 --> 01:31:57,946
So I just call fgetcsv.

2191
01:31:57,946 --> 01:32:00,646
And then for good measure,
I call fclose on the handle

2192
01:32:00,646 --> 01:32:01,656
to close that connection.

2193
01:32:01,656 --> 01:32:02,646
I don't need anything more.

2194
01:32:03,136 --> 01:32:07,276
Now, down below, I have
some super simple HTML

2195
01:32:07,406 --> 01:32:11,616
and notice what I'm plugging
in, the current price of this.

2196
01:32:11,866 --> 01:32:14,596
Now, just to be clear, why
am I calling htmlspecialchars

2197
01:32:14,596 --> 01:32:16,936
on the symbol that the user
typed in to my little form?

2198
01:32:17,326 --> 01:32:17,496
Yeah.

2199
01:32:18,516 --> 01:32:24,156
[ Inaudible Remark ]

2200
01:32:24,656 --> 01:32:26,306
Exactly. I want to protect

2201
01:32:26,306 --> 01:32:28,846
against what we've been calling
cross-site scripting attacks,

2202
01:32:28,926 --> 01:32:30,236
so more on that again to come.

2203
01:32:30,236 --> 01:32:31,686
But for now, we want
to make sure

2204
01:32:31,686 --> 01:32:33,676
that they've not typed
any dangerous characters

2205
01:32:33,676 --> 01:32:36,166
or been tricked into typing
in some dangerous characters.

2206
01:32:36,436 --> 01:32:38,666
And now the interesting
part, the current price

2207
01:32:38,666 --> 01:32:42,316
of that symbol is and then
some more PHP code over here.

2208
01:32:42,686 --> 01:32:43,876
And what am I outputting?

2209
01:32:44,136 --> 01:32:45,416
Row bracket one.

2210
01:32:45,746 --> 01:32:46,716
How do I know it's one?

2211
01:32:46,796 --> 01:32:48,046
I just looked at the CSV, right?

2212
01:32:48,046 --> 01:32:49,826
We just open the N
numbers or an Excel.

2213
01:32:49,826 --> 01:32:53,246
We saw that column one, not
zero, column one is the price,

2214
01:32:53,726 --> 01:32:55,076
so I want to print that out.

2215
01:32:55,736 --> 01:32:58,356
Now there is a problem here,
like Jack's concern earlier

2216
01:32:58,356 --> 01:33:00,896
about money, could end up
causing us some problem.

2217
01:33:00,896 --> 01:33:04,036
And if the price happens to
be something, like a $1.90,

2218
01:33:04,126 --> 01:33:05,626
maybe we would get $1.09.

2219
01:33:05,626 --> 01:33:06,836
You'll have to actually
experiment

2220
01:33:06,836 --> 01:33:09,066
with some real stock
quotes and see what kinds

2221
01:33:09,066 --> 01:33:10,586
of numbers Yahoo
is spitting out.

2222
01:33:10,946 --> 01:33:14,016
But realize there's a
function called number_format

2223
01:33:14,016 --> 01:33:16,906
in PHP, printf in PHP.

2224
01:33:16,906 --> 01:33:19,036
Both of which allow
you to format numbers

2225
01:33:19,036 --> 01:33:22,626
to a specific number of decimal
places, so you can guarantee

2226
01:33:22,696 --> 01:33:24,516
that your money will be
formatted in the right way.

2227
01:33:24,516 --> 01:33:26,876
And back to our discussion
now of databases,

2228
01:33:26,876 --> 01:33:29,666
what were the data types
that allow decimal points?

2229
01:33:29,736 --> 01:33:32,686
It was float, double
and there is one other.

2230
01:33:33,006 --> 01:33:33,186
Yeah.

2231
01:33:33,546 --> 01:33:33,756
>> Decimal.

2232
01:33:33,756 --> 01:33:36,036
>> Yeah. It was called
decimal, which doesn't exists

2233
01:33:36,036 --> 01:33:37,416
in most programming languages,

2234
01:33:37,456 --> 01:33:41,136
but this is a database field
type that actually allows you

2235
01:33:41,136 --> 01:33:43,816
to specify precisely the number

2236
01:33:43,816 --> 01:33:47,566
of digits you want before the
period and after the period.

2237
01:33:47,886 --> 01:33:49,666
So for this look--
ultimately look

2238
01:33:49,666 --> 01:33:51,366
in the MySQL documentation
for more.

2239
01:33:51,366 --> 01:33:52,646
It's very easy to use.

2240
01:33:52,936 --> 01:33:56,566
But you'll be able to decide you
probably want two, maybe three,

2241
01:33:56,566 --> 01:33:59,836
maybe even four numbers
after the decimal point.

2242
01:33:59,836 --> 01:34:02,676
It really depends on what kind
of precision Yahoo is returning,

2243
01:34:02,676 --> 01:34:04,626
is it just cents or is
it fractions of cents.

2244
01:34:05,026 --> 01:34:07,636
And ask for how many numbers
to the left of the symbol--

2245
01:34:08,046 --> 01:34:09,606
the left of the period
you'll kind of have

2246
01:34:09,606 --> 01:34:10,476
to make a judgment call.

2247
01:34:10,476 --> 01:34:13,106
I mean, Google is like $500, so
you need at least three digits.

2248
01:34:13,676 --> 01:34:16,246
Berkshire Hathaway,
sometimes $1000,

2249
01:34:16,246 --> 01:34:17,366
so you might need four digits.

2250
01:34:17,366 --> 01:34:18,936
But you probably
don't need 20 digits

2251
01:34:18,936 --> 01:34:20,686
because that would be a
really expensive stock.

2252
01:34:20,856 --> 01:34:22,456
So you're going to have
a judgment call there.

2253
01:34:22,686 --> 01:34:25,896
But realize that decimal for
money is probably your best bet,

2254
01:34:26,036 --> 01:34:27,676
because there is no imprecision.

2255
01:34:27,766 --> 01:34:31,406
If you specify give me two
numbers after the decimal place,

2256
01:34:31,766 --> 01:34:32,816
that's what you'll get.

2257
01:34:32,816 --> 01:34:34,266
You won't get weird
rounding issues

2258
01:34:34,266 --> 01:34:35,496
as you would more traditionally.

2259
01:34:35,496 --> 01:34:38,496
So realize decimal will be
your friend in the database

2260
01:34:38,496 --> 01:34:40,676
and something, like
printf or number_format,

2261
01:34:40,676 --> 01:34:42,476
will be your friend when
it comes to the aesthetics

2262
01:34:42,816 --> 01:34:44,726
of showing a human the number.

2263
01:34:46,136 --> 01:34:46,616
All right.

2264
01:34:46,706 --> 01:34:49,366
So, that's how we can
fetch stock queries,

2265
01:34:49,496 --> 01:34:50,766
and that's it for integration.

2266
01:34:50,886 --> 01:34:52,006
What's really nice here is

2267
01:34:52,006 --> 01:34:56,316
that with CS75 finance you can
integrate real time stock prices

2268
01:34:56,316 --> 01:35:01,906
into your application by just
using this machine readable

2269
01:35:01,906 --> 01:35:05,326
format, CSV, which we all agree
is kind of a crappy format.

2270
01:35:05,326 --> 01:35:06,296
It's not very flexible.

2271
01:35:06,296 --> 01:35:08,006
You can't sort of tag
information very well.

2272
01:35:08,006 --> 01:35:08,736
It's very flood.

2273
01:35:08,736 --> 01:35:10,016
It's vulnerable to like commas

2274
01:35:10,016 --> 01:35:11,916
and quotes being
an awkward place.

2275
01:35:12,366 --> 01:35:14,226
But it's at least
machine readable.

2276
01:35:14,226 --> 01:35:15,906
And it's not something
that we have to worry

2277
01:35:15,906 --> 01:35:17,596
about parsing ourselves.

2278
01:35:18,096 --> 01:35:21,516
So, when we get back that value,
we can just plug it right in.

2279
01:35:21,516 --> 01:35:24,246
And we can now do it to perform
math and bys and the cells

2280
01:35:24,246 --> 01:35:27,686
and the like, so a bit of
a word on strategy then.

2281
01:35:27,686 --> 01:35:30,516
So, this website has to
support these features.

2282
01:35:30,546 --> 01:35:34,286
And the dot, dot, dot refers to
other things that you might want

2283
01:35:34,286 --> 01:35:35,226
to add by your own choice.

2284
01:35:35,226 --> 01:35:36,926
So, you're going to
need to enable users

2285
01:35:36,926 --> 01:35:38,556
to log in to your website.

2286
01:35:38,926 --> 01:35:40,796
Now, you have a bit of a
chicken and an egg problem here,

2287
01:35:40,796 --> 01:35:42,106
because how you're
going to implement log

2288
01:35:42,106 --> 01:35:44,156
in functionality before
you actually have users.

2289
01:35:44,426 --> 01:35:46,386
And again, obviously, it's
going to be you pretending

2290
01:35:46,386 --> 01:35:47,986
to be a user or creating users.

2291
01:35:48,216 --> 01:35:50,816
So, you'll think through
how best to approach this.

2292
01:35:51,286 --> 01:35:53,616
If you want to implement log
in, well, you have the advantage

2293
01:35:53,616 --> 01:35:55,486
of having seen like
nine different log

2294
01:35:55,486 --> 01:35:58,466
in examples thus far, probably
you shouldn't use the earliest

2295
01:35:58,466 --> 01:36:01,236
of those, with hard coded John
Harvard username and password.

2296
01:36:01,556 --> 01:36:03,786
But the most recent ones, where
you actually use the database.

2297
01:36:03,786 --> 01:36:05,686
So, you actually have
code already from class

2298
01:36:06,006 --> 01:36:09,086
that you can adopt for your own
project to implement log ins.

2299
01:36:09,386 --> 01:36:11,776
Now, how do you implement
registration?

2300
01:36:11,776 --> 01:36:13,506
Well, registration is
not going to use select

2301
01:36:14,236 --> 01:36:15,516
from your user's table.

2302
01:36:15,516 --> 01:36:17,086
What keywords we're
going to use instead?

2303
01:36:18,216 --> 01:36:18,456
Yeah.

2304
01:36:18,666 --> 01:36:19,106
>> Insert into.

2305
01:36:19,656 --> 01:36:20,706
>> Insert into, right?

2306
01:36:20,706 --> 01:36:24,106
If you want to register for an
account, you, the programmer,

2307
01:36:24,106 --> 01:36:25,356
are probably going
to have to call one

2308
01:36:25,356 --> 01:36:27,696
or more insert statements
to add that user

2309
01:36:27,696 --> 01:36:30,326
to the table once they've told
you their preferred username

2310
01:36:30,536 --> 01:36:31,236
and password.

2311
01:36:31,406 --> 01:36:34,056
Passwords, that's in the
design decision, right?

2312
01:36:34,056 --> 01:36:36,096
We talked about not
storing in clear text,

2313
01:36:36,096 --> 01:36:37,126
not the best practice.

2314
01:36:37,346 --> 01:36:38,406
You should at least hash it,

2315
01:36:38,406 --> 01:36:40,906
but using the password function
might not be the best approach.

2316
01:36:40,906 --> 01:36:42,886
So, again, more on project
1 in sections tonight.

2317
01:36:42,886 --> 01:36:46,156
But realize even though the
functionality is maybe obvious,

2318
01:36:46,186 --> 01:36:48,276
the implementation is
meant to be non-obvious.

2319
01:36:48,366 --> 01:36:50,736
A lot of interesting
decision points to make.

2320
01:36:51,056 --> 01:36:53,036
What I would propose in
general is that if you want

2321
01:36:53,036 --> 01:36:55,646
to implement log-in first,
that's fine and you should.

2322
01:36:55,946 --> 01:36:58,816
But you can simulate
registrations, how?

2323
01:36:59,006 --> 01:37:00,556
Just use phpMyAdmin, right?

2324
01:37:00,556 --> 01:37:01,826
Create your user's table.

2325
01:37:02,056 --> 01:37:03,616
Literally click the insert tab

2326
01:37:03,616 --> 01:37:05,516
and just manually
insert some fake users,

2327
01:37:05,516 --> 01:37:07,326
so that you at least have
someone to play with.

2328
01:37:07,646 --> 01:37:09,546
Then, go and test your log

2329
01:37:09,546 --> 01:37:12,666
in code using those users you
already inserted manually.

2330
01:37:12,946 --> 01:37:14,146
Once you have log in working,

2331
01:37:14,146 --> 01:37:17,656
then you can actually implement
register.php or similar

2332
01:37:18,086 --> 01:37:19,606
to actually do those inserts.

2333
01:37:19,866 --> 01:37:22,866
So realize, you certainly
shouldn't try tackling steps one

2334
01:37:22,866 --> 01:37:25,876
and two and three and four and
five and then test everything,

2335
01:37:26,096 --> 01:37:28,306
you can absolutely
do this step by step

2336
01:37:28,366 --> 01:37:29,906
by just simulating
certain things

2337
01:37:30,086 --> 01:37:32,166
by using the command
line or phpMyAdmin.

2338
01:37:32,706 --> 01:37:33,716
So, what about get quotes?

2339
01:37:34,206 --> 01:37:34,996
Well, here, too, you kind

2340
01:37:35,026 --> 01:37:36,296
of have some code
from class, right?

2341
01:37:36,296 --> 01:37:39,206
How do you go about reading
a stock quote from Yahoo?

2342
01:37:39,426 --> 01:37:41,506
It's like two lines
of code, maybe three.

2343
01:37:41,506 --> 01:37:43,496
So you at least have
that module now.

2344
01:37:43,726 --> 01:37:45,846
But that's a good candidate
to write a function for it,

2345
01:37:45,846 --> 01:37:47,716
so it's not going to be the
only thing your file does.

2346
01:37:47,716 --> 01:37:49,776
You can write a function that
you then call, so you could have

2347
01:37:49,776 --> 01:37:52,746
like a helpers.php file
again or something like that

2348
01:37:52,746 --> 01:37:54,356
if you've gone that
route for project 0.

2349
01:37:54,616 --> 01:37:56,746
So getting the quote
is relatively easy.

2350
01:37:57,206 --> 01:37:59,196
Now selling and buying
is less obvious.

2351
01:37:59,266 --> 01:38:01,076
So, why do I propose
selling first?

2352
01:38:01,556 --> 01:38:04,096
Create your portfolio table,
or whatever you're going

2353
01:38:04,096 --> 01:38:08,006
to call the table that keeps
track of who has bought what.

2354
01:38:08,786 --> 01:38:11,576
And how do you stimulate
buying stocks before you have

2355
01:38:11,576 --> 01:38:12,546
that in functionality?

2356
01:38:12,926 --> 01:38:14,036
Use phpMyAdmin, right?

2357
01:38:14,036 --> 01:38:16,116
Go to the insert tab and
just manually pretend

2358
01:38:16,116 --> 01:38:17,946
like you bought Goog
and these many shares

2359
01:38:18,276 --> 01:38:20,386
or you bought Microsoft
and these many shares.

2360
01:38:20,606 --> 01:38:22,466
Then you can implement
selling functionality.

2361
01:38:22,706 --> 01:38:25,126
And do look at the spec, because
we do specify what we mean

2362
01:38:25,126 --> 01:38:25,896
by selling.

2363
01:38:25,896 --> 01:38:28,186
We, for instance, don't let
yourself fractions of shares.

2364
01:38:28,526 --> 01:38:31,136
We specify that you can--
the user has to sell them all

2365
01:38:31,136 --> 01:38:33,186
at once, so you don't have
to sell just a few of them.

2366
01:38:33,186 --> 01:38:35,096
But when it comes to
buying, you're going

2367
01:38:35,406 --> 01:38:36,586
to have to do a few things.

2368
01:38:36,586 --> 01:38:39,086
You're going to have to not only
get a quote right then and there

2369
01:38:39,086 --> 01:38:40,226
because we want it
to be current.

2370
01:38:40,666 --> 01:38:43,036
You're going to have to then may
check how much money the user

2371
01:38:43,036 --> 01:38:43,856
actually has.

2372
01:38:43,856 --> 01:38:46,066
So, somewhere in this
database, you have to keep track

2373
01:38:46,066 --> 01:38:49,616
of my account balance, which
initially we propose as $10,000.

2374
01:38:49,616 --> 01:38:51,506
After that, it should
go up and down based

2375
01:38:51,506 --> 01:38:52,426
on what the user does.

2376
01:38:52,776 --> 01:38:54,236
And you're going to
have to make sure

2377
01:38:54,636 --> 01:38:57,846
that if the user has already
bought some Google shares,

2378
01:38:58,526 --> 01:39:00,856
you're going to have to update
that row in your database,

2379
01:39:00,896 --> 01:39:02,546
so that you don't
just have many, many,

2380
01:39:02,546 --> 01:39:03,696
many different Google rows

2381
01:39:03,696 --> 01:39:05,656
when clearly you
could consolidate them

2382
01:39:05,656 --> 01:39:06,846
and just care about the total.

2383
01:39:06,846 --> 01:39:08,596
So in short, there's
going to be another--

2384
01:39:08,596 --> 01:39:10,166
a number of other
design decisions.

2385
01:39:10,166 --> 01:39:11,976
But how many tables do
we seem to be talking

2386
01:39:11,976 --> 01:39:12,776
about already, minimally?

2387
01:39:13,716 --> 01:39:19,686
>> Minimally two,
users and portfolios.

2388
01:39:19,686 --> 01:39:20,966
>> OK, users and portfolios.

2389
01:39:20,966 --> 01:39:23,646
And what kinds of field should
probably go into user's table,

2390
01:39:23,646 --> 01:39:24,876
even though there are
many ways to do this?

2391
01:39:26,146 --> 01:39:26,316
Yeah?

2392
01:39:26,706 --> 01:39:29,396
>> Username, password,
probably email and the, perhaps,

2393
01:39:29,396 --> 01:39:34,296
portfolio ID or something like
that, [inaudible] perhaps.

2394
01:39:34,466 --> 01:39:34,996
>> OK. Good.

2395
01:39:34,996 --> 01:39:37,926
So, your username, password
and maybe email or maybe

2396
01:39:37,926 --> 01:39:39,206
that could just be the username.

2397
01:39:39,206 --> 01:39:43,896
The-- And some kind of unique
ID, because you're going to want

2398
01:39:43,896 --> 01:39:46,346
to correlate a user
in the user's table

2399
01:39:46,606 --> 01:39:49,986
with a portfolio owner
in the portfolio's table,

2400
01:39:49,986 --> 01:39:52,246
because what fields might
go in the portfolio's table?

2401
01:39:52,746 --> 01:39:54,446
>> Your stocks?

2402
01:39:54,636 --> 01:39:55,276
>> Your stocks.

2403
01:39:55,476 --> 01:39:57,836
So one field is maybe the
symbol of the thing you own.

2404
01:39:58,176 --> 01:40:01,816
One field is the ID of the
person who owns that symbol.

2405
01:40:02,066 --> 01:40:03,996
And then, what's probably
at least the third field

2406
01:40:03,996 --> 01:40:05,046
in the portfolio's table?

2407
01:40:05,716 --> 01:40:05,816
Yeah?

2408
01:40:06,256 --> 01:40:08,716
>> How many stocks
you have [inaudible].

2409
01:40:08,716 --> 01:40:09,186
>> Exactly.

2410
01:40:09,186 --> 01:40:11,006
Quantity. Do you want
to store the price

2411
01:40:11,226 --> 01:40:12,376
in the portfolio's table?

2412
01:40:12,876 --> 01:40:12,946
>> No.

2413
01:40:13,936 --> 01:40:15,906
>> Probably not, right,
because the price is going

2414
01:40:16,116 --> 01:40:18,716
to change presumably.

2415
01:40:18,986 --> 01:40:21,586
So, it's not like once you buy
a stock, that's what it's worth.

2416
01:40:21,586 --> 01:40:22,686
It's going to change day to day.

2417
01:40:22,686 --> 01:40:22,936
Yeah?

2418
01:40:22,936 --> 01:40:27,486
>> Just looking at the
stock, you might want

2419
01:40:27,736 --> 01:40:29,056
to actually store
the purchase price

2420
01:40:29,056 --> 01:40:30,986
because [inaudible]
calculate the difference.

2421
01:40:30,986 --> 01:40:31,286
>> Good.

2422
01:40:31,286 --> 01:40:32,166
>> It's gone up or down.

2423
01:40:32,346 --> 01:40:33,636
>> So there is an opportunity.

2424
01:40:33,636 --> 01:40:35,156
And here in lies
the dot, dot, dot.

2425
01:40:35,386 --> 01:40:36,646
Maybe it would be a good idea

2426
01:40:36,646 --> 01:40:39,406
to store the purchase
price recognizing

2427
01:40:39,406 --> 01:40:41,116
that the current
price will change.

2428
01:40:41,296 --> 01:40:43,206
But then, you can do
some basic arithmetic

2429
01:40:43,206 --> 01:40:46,716
and say your portfolio is
up 50% or it's down 50%.

2430
01:40:46,716 --> 01:40:48,896
But to do that, you have to
know what the starting point

2431
01:40:48,896 --> 01:40:49,986
or the starting price was.

2432
01:40:50,296 --> 01:40:53,306
But there, realize that you
run into some design issues,

2433
01:40:53,306 --> 01:40:56,606
like if you're going to start
restoring the purchase price,

2434
01:40:57,216 --> 01:41:00,336
now you might not want
to consolidate rows

2435
01:41:00,476 --> 01:41:01,696
in your portfolio table.

2436
01:41:01,696 --> 01:41:04,216
Because what if I buy a
share of Google at $400,

2437
01:41:04,546 --> 01:41:09,506
but another share of Google at
$500 I want to somehow remember

2438
01:41:09,506 --> 01:41:11,966
that one of those shares was
400, one of those was 500

2439
01:41:11,966 --> 01:41:14,386
or I want to store the average
price or something like that,

2440
01:41:14,566 --> 01:41:17,456
now I have some non-obvious
accounting issues to deal with.

2441
01:41:17,456 --> 01:41:19,376
So realizing the
spec, we do allow you

2442
01:41:19,376 --> 01:41:21,786
to simplify certain things,
but those are the kinds

2443
01:41:21,786 --> 01:41:26,276
of tradeoffs ultimately
that you need to make.

2444
01:41:26,416 --> 01:41:27,216
Any questions?

2445
01:41:27,766 --> 01:41:31,266
So it's actually--

2446
01:41:31,266 --> 01:41:33,586
you'll actually find that it's
pretty neat once you have the

2447
01:41:33,586 --> 01:41:35,246
things up and running
and you actually buy

2448
01:41:35,246 --> 01:41:38,006
and sell these stocks and then
check your portfolio the next

2449
01:41:38,006 --> 01:41:41,316
day because if you're querying
Yahoo every time the user wants

2450
01:41:41,316 --> 01:41:43,836
to look at their portfolio as
you'll have to, you'll be able

2451
01:41:43,836 --> 01:41:46,126
to see whether things
are going up or down.

2452
01:41:46,126 --> 01:41:48,226
And certainly, if you're
buying like millions of shares

2453
01:41:48,226 --> 01:41:49,346
of penny stocks or whatnot,

2454
01:41:49,386 --> 01:41:51,286
you can actually see
things change day to day

2455
01:41:51,286 --> 01:41:53,106
if there's actually a price
movement and it's going

2456
01:41:53,106 --> 01:41:54,696
from 3 cents to 4 cents.

2457
01:41:54,696 --> 01:41:56,996
It's actually a nice
33% profit overnight.

2458
01:41:57,436 --> 01:41:59,306
But, of course, this
is just imaginary

2459
01:41:59,306 --> 01:42:03,716
and you're just doing the math
even though the actual volume

2460
01:42:03,716 --> 01:42:06,186
of shares might not be
consistent with reality.

2461
01:42:06,546 --> 01:42:08,546
Any questions.

2462
01:42:08,546 --> 01:42:09,626
All right.

2463
01:42:11,796 --> 01:42:13,726
So, why don't we
officially wrap there?

2464
01:42:13,726 --> 01:42:14,856
I'll stick around for questions.

2465
01:42:14,856 --> 01:42:17,526
We'll let Chris get set up
and dive a bit more into PDO

2466
01:42:17,526 --> 01:42:21,346
in section, into project
1 and into SQL itself.

2467
01:42:24,016 --> 01:42:25,656
All right.

2468
01:42:27,516 --> 01:42:34,330
[ Silence ]

