1
00:00:00,036 --> 00:00:00,186
START

2
00:00:00,686 --> 00:00:08,806
[ Silence ]

3
00:00:09,306 --> 00:00:11,326
>> All right, welcome back
to Computer Science E-75.

4
00:00:11,326 --> 00:00:13,856
This is lecture one in which
we actually dive into PHP.

5
00:00:13,856 --> 00:00:15,786
And so you pulled
up your browser,

6
00:00:15,786 --> 00:00:19,006
you hit www.google.com
and you hit enter.

7
00:00:19,446 --> 00:00:22,566
Can we play that back to the
story, what happens first

8
00:00:22,706 --> 00:00:25,436
and try to impress everyone
with as much technical detail

9
00:00:25,436 --> 00:00:27,286
by just one step as possible.

10
00:00:27,866 --> 00:00:30,556
Give me one step
in this process.

11
00:00:30,626 --> 00:00:31,846
You have hit enter,
what happens?

12
00:00:32,476 --> 00:00:32,606
Yes.

13
00:00:34,516 --> 00:00:36,896
>> Communication
with the DNS server.

14
00:00:37,066 --> 00:00:38,256
>> OK, so there's
some communication

15
00:00:38,256 --> 00:00:41,336
with the DNS server, whereby
your browser asks the local

16
00:00:41,336 --> 00:00:42,206
operating system.

17
00:00:42,206 --> 00:00:43,946
What is the IP address
of google.com?

18
00:00:44,226 --> 00:00:46,366
If your operating system
itself does not know,

19
00:00:46,366 --> 00:00:48,686
it in turn asks the
local DNS server.

20
00:00:48,856 --> 00:00:51,416
And who typically owns or
controls these DNS servers?

21
00:00:52,016 --> 00:00:53,676
[ Inaudible Remark ]

22
00:00:53,676 --> 00:00:54,946
Yeah, your ISP.

23
00:00:54,946 --> 00:00:56,796
So, Verizon,
Comcast, Harvard,

24
00:00:56,796 --> 00:00:58,586
your company, anyone
along those lines.

25
00:00:58,586 --> 00:01:02,416
And if your company or your ISP
does not know what the IP is

26
00:01:02,416 --> 00:01:03,646
for google.com, what
happens next?

27
00:01:04,346 --> 00:01:04,446
Yup.

28
00:01:05,756 --> 00:01:07,366
>> They probably know another
DNS provider that knows

29
00:01:07,416 --> 00:01:10,926
so little, it may
direct to that stuff.

30
00:01:10,986 --> 00:01:13,956
>> Excellent, they probably know
some other DNS server and so,

31
00:01:13,956 --> 00:01:15,896
they ask the-- a
bigger fish followed

32
00:01:15,896 --> 00:01:17,276
by a bigger fish and so forth.

33
00:01:17,276 --> 00:01:20,236
And worst case, there are these
root servers that at least know

34
00:01:20,346 --> 00:01:22,136
where the other authorities are

35
00:01:22,136 --> 00:01:24,736
for the various .coms,
.nets, .orgs.

36
00:01:24,946 --> 00:01:27,496
And the reason that all works
is that when you buy google.com

37
00:01:27,496 --> 00:01:29,546
or your own personal
domain, you at least have

38
00:01:29,616 --> 00:01:31,536
to tell your registrar what?

39
00:01:31,606 --> 00:01:31,736
Yeah.

40
00:01:32,276 --> 00:01:37,786
>> On the DNS server if you're--

41
00:01:37,976 --> 00:01:40,036
where you're getting
your website.

42
00:01:40,036 --> 00:01:43,416
>> The DNS servers of the-- of
the hosting company or whatnot,

43
00:01:43,416 --> 00:01:46,316
where your website lives, and
that's typically called NS1

44
00:01:46,356 --> 00:01:47,726
and NS2, just conventions.

45
00:01:47,996 --> 00:01:51,516
But the important detail is that
there are usually two DNS servers

46
00:01:51,516 --> 00:01:55,216
that in turn know your
website's IP address,

47
00:01:55,416 --> 00:01:59,486
know your webs-- domain name's
e-mail servers and the like.

48
00:01:59,616 --> 00:02:01,666
OK, so now my browser
knows the IP address

49
00:02:01,666 --> 00:02:05,086
of google.com,
what happens next?

50
00:02:06,206 --> 00:02:06,296
Yeah.

51
00:02:07,216 --> 00:02:09,686
>> It sends a
GET request.

52
00:02:09,726 --> 00:02:09,816
>> Good.

53
00:02:09,816 --> 00:02:12,346
>> Yeah, the room of
actual hard drive.

54
00:02:12,816 --> 00:02:13,766
>> Good so we told the story

55
00:02:13,766 --> 00:02:16,096
of the virtual envelope,
a.k.a. packet, and that's sent

56
00:02:16,096 --> 00:02:18,516
from point A, you,
to point B, Google.

57
00:02:18,516 --> 00:02:22,176
And inside that envelope is
this message "get me slash"

58
00:02:22,386 --> 00:02:23,606
and then there's some reminder

59
00:02:23,916 --> 00:02:25,536
of the protocol that's
being spoken

60
00:02:25,536 --> 00:02:27,986
"HTTP slash 1.1" or whatnot.

61
00:02:28,076 --> 00:02:30,116
What's also inside
of that packet?

62
00:02:30,946 --> 00:02:31,996
What other information?

63
00:02:32,406 --> 00:02:36,566
>> Could be a reminder of
what actual web address

64
00:02:37,116 --> 00:02:38,116
that user typed in.

65
00:02:38,256 --> 00:02:40,376
>> Good, so a reminder of
the address that the user typed

66
00:02:40,376 --> 00:02:43,706
in, which is the Host HTTP
header and this is crucial

67
00:02:43,706 --> 00:02:44,946
for what feature offered

68
00:02:44,946 --> 00:02:48,056
by today's web servers?
Someone else? Yeah.

69
00:02:48,236 --> 00:02:49,256
>> Virtual hosting.

70
00:02:49,586 --> 00:02:53,026
>> Virtual hosting, whereby
you can put many websites

71
00:02:53,026 --> 00:02:56,126
on the same physical machine
and even on the same IP address

72
00:02:56,446 --> 00:02:59,836
because browsers thankfully will
remind the server what host name

73
00:02:59,836 --> 00:03:02,576
was actually requested so that
the web server can distinguish

74
00:03:02,576 --> 00:03:05,846
between your website and someone
else's website and so forth.

75
00:03:06,196 --> 00:03:08,226
All right, this virtual
envelope goes to Google.

76
00:03:08,226 --> 00:03:09,896
Google opens the
envelope, so to speak.

77
00:03:09,956 --> 00:03:11,696
Sees "GET slash" dot, dot, dot.

78
00:03:11,916 --> 00:03:14,036
Realizes "Oh, you want
the root of our website?"

79
00:03:14,036 --> 00:03:17,266
In Google's case, that's all
the HTML and other assets

80
00:03:17,266 --> 00:03:19,776
that compose their home
page for searching.

81
00:03:20,056 --> 00:03:22,476
And so, they respond with
the packet of their own

82
00:03:22,476 --> 00:03:24,796
or more packets of their
own inside of which is all

83
00:03:24,796 --> 00:03:26,896
that HTML, your browser
receives it,

84
00:03:26,896 --> 00:03:29,096
renders it, connection is closed.

85
00:03:29,336 --> 00:03:31,046
Now in terms of more
subtle details,

86
00:03:31,046 --> 00:03:33,726
browsers these days are
fairly smart in that rather

87
00:03:33,726 --> 00:03:35,816
than ever have to ask
the operating system,

88
00:03:35,816 --> 00:03:37,186
Mac OS, Windows, whatever,

89
00:03:37,306 --> 00:03:40,356
what the IP address is of
google.com, a browser will cache

90
00:03:40,436 --> 00:03:41,766
that IP address typically.

91
00:03:42,016 --> 00:03:44,386
So this just means it's
slightly more efficient

92
00:03:44,386 --> 00:03:46,686
than asking the operating system
and certainly more efficient

93
00:03:46,686 --> 00:03:48,276
than asking local DNS servers.

94
00:03:48,576 --> 00:03:50,026
But there's a gotcha,
and one of the themes

95
00:03:50,026 --> 00:03:52,326
of this course will be to try to
point out some of these details.

96
00:03:52,506 --> 00:03:54,056
Because, if you are
not just a user

97
00:03:54,056 --> 00:03:56,136
but you're actually a
web developer trying

98
00:03:56,136 --> 00:03:58,556
to build new websites, suppose

99
00:03:58,556 --> 00:04:01,536
that the IP address has
been cached but suppose

100
00:04:01,536 --> 00:04:03,546
that you moved the
website to another server

101
00:04:03,546 --> 00:04:05,046
or another virtual machine.

102
00:04:05,366 --> 00:04:07,166
There are these gotchas
you might run into.

103
00:04:07,166 --> 00:04:08,986
And so one of the
recurring themes of any sort

104
00:04:08,986 --> 00:04:10,186
of web development especially

105
00:04:10,396 --> 00:04:13,886
in this PHP world is to constantly
be clearing your cache.

106
00:04:13,886 --> 00:04:16,266
And one of the upsides
of using Chrome, frankly,

107
00:04:16,646 --> 00:04:21,386
for primary development is it
has incognito mode which, well,

108
00:04:21,386 --> 00:04:23,646
usually is used so you can
browse sketchy places online.

109
00:04:23,646 --> 00:04:25,596
It can also be used to
a developer's advantage

110
00:04:25,836 --> 00:04:28,326
in that it will prevent
cookies from being saved

111
00:04:28,466 --> 00:04:30,096
and other details
from being cached.

112
00:04:30,126 --> 00:04:32,096
But even then, it's not
perfect and even I often

113
00:04:32,216 --> 00:04:34,416
have to quit the
browser entirely, clear my

114
00:04:34,416 --> 00:04:35,376
cache manually.

115
00:04:35,636 --> 00:04:37,866
If you ever notice anomalies
happening or, like,

116
00:04:37,866 --> 00:04:40,896
"I know I changed that file"
it could just be some stupid

117
00:04:40,896 --> 00:04:41,566
cache issue.

118
00:04:41,566 --> 00:04:43,176
Just-- So put that in
the back of your mind

119
00:04:43,176 --> 00:04:44,386
so that you don't waste 10,

120
00:04:44,386 --> 00:04:47,116
20 minutes some night this
summer chasing down a bug

121
00:04:47,116 --> 00:04:48,536
that you actually already fixed.

122
00:04:48,536 --> 00:04:51,826
Caching takes many forms
and DNS is just one of them.

123
00:04:51,826 --> 00:04:52,636
All right.

124
00:04:52,636 --> 00:04:56,396
So any questions on that
big picture of HTTP?

125
00:04:56,846 --> 00:04:58,926
None? All right.

126
00:04:59,136 --> 00:05:00,846
So where does this all fit in?

127
00:05:00,846 --> 00:05:03,016
So this is the picture we
essentially just painted

128
00:05:03,016 --> 00:05:06,326
verbally, so what's
on the end of point B?

129
00:05:06,326 --> 00:05:08,166
In this case Google
or some other server.

130
00:05:08,166 --> 00:05:11,806
So one of the most popular web
servers out there is Apache.

131
00:05:11,806 --> 00:05:13,176
This is freely available
software.

132
00:05:13,176 --> 00:05:15,976
It can run on Linux computers,
Macs, Windows computers

133
00:05:15,976 --> 00:05:18,556
but it's super common in
the Linux and Unix world

134
00:05:18,556 --> 00:05:20,416
in particular, and those
tend to be machines used

135
00:05:20,416 --> 00:05:21,736
for web servers these days.

136
00:05:22,486 --> 00:05:24,306
It is the A in LAMP.

137
00:05:24,536 --> 00:05:28,496
So LAMP is just a silly
buzzword: Linux, Apache, MySQL,

138
00:05:28,496 --> 00:05:31,366
PHP, and that's just a
buzzword saying, "I'm using all

139
00:05:31,366 --> 00:05:32,726
of these various technologies."

140
00:05:32,726 --> 00:05:35,566
But common jargon in the
industry is to say that

141
00:05:35,566 --> 00:05:37,166
"I'm running a LAMP stack."

142
00:05:37,166 --> 00:05:40,286
And that just means you have
Linux as your operating system,

143
00:05:40,286 --> 00:05:42,706
Apache as your web
server and so forth.

144
00:05:42,706 --> 00:05:44,686
And so there's nothing
technical about the term,

145
00:05:44,926 --> 00:05:46,636
but we'll be looking at
the individual pieces.

146
00:05:46,636 --> 00:05:50,286
So one of the latest versions
of Apache is 2.2 something.

147
00:05:50,286 --> 00:05:51,636
This is the documentation there.

148
00:05:51,636 --> 00:05:52,926
I will say from personal
experience,

149
00:05:52,926 --> 00:05:54,506
I've never found it
the most user-friendly.

150
00:05:54,776 --> 00:05:57,436
So frankly Google is the
better friend to me at least

151
00:05:57,486 --> 00:05:58,736
than Apache is, or websites,

152
00:05:58,736 --> 00:06:02,056
stackoverflow.com,
serverfault.com.

153
00:06:02,056 --> 00:06:03,156
These are wonderful places

154
00:06:03,156 --> 00:06:06,686
where smart technical people
post generally useful solutions

155
00:06:06,686 --> 00:06:07,566
to common problems.

156
00:06:07,846 --> 00:06:11,066
So keep an eye out
for-- or make use

157
00:06:11,266 --> 00:06:12,826
of those resources
as you see fit.

158
00:06:13,086 --> 00:06:15,246
But what are the kinds
of things that you can do

159
00:06:15,246 --> 00:06:16,896
with the web server
configuration?

160
00:06:17,066 --> 00:06:18,446
Well, virtual host name.

161
00:06:18,446 --> 00:06:21,076
So this is a representative
snippet

162
00:06:21,556 --> 00:06:25,676
from a file called httpd.conf.

163
00:06:25,676 --> 00:06:27,666
And let me just pull
up a little scratch pad

164
00:06:27,696 --> 00:06:29,766
so we can type out
some notes here.

165
00:06:30,266 --> 00:06:33,906
And the blackboards are
occluded by the projector here

166
00:06:33,906 --> 00:06:35,116
so we'll use TextEdit.

167
00:06:35,576 --> 00:06:39,136
So this just so happens
to be the name typically

168
00:06:39,136 --> 00:06:41,836
of a configuration file;
however, you might also see it

169
00:06:41,836 --> 00:06:44,886
as apache.conf, apache2.conf.

170
00:06:45,076 --> 00:06:46,796
It really depends on
your operating system

171
00:06:46,796 --> 00:06:49,066
or the distribution of Linux
for instance that you're using.

172
00:06:49,116 --> 00:06:51,036
But the important takeaway is

173
00:06:51,036 --> 00:06:53,936
that this is typically the
main configuration file

174
00:06:53,936 --> 00:06:56,246
for an Apache-based web server
and internetics [phonetic]

175
00:06:56,736 --> 00:06:59,946
in Microsoft IIS server
has similar features.

176
00:06:59,946 --> 00:07:01,216
There's other web
servers software

177
00:07:01,216 --> 00:07:03,206
but Apache is definitely
among the most common.

178
00:07:03,486 --> 00:07:06,736
And here is a representative
snippet from that file

179
00:07:06,916 --> 00:07:08,866
that apparently is
implementing what feature

180
00:07:09,516 --> 00:07:11,346
for the web server
if you can infer.

181
00:07:11,346 --> 00:07:14,476
Kind of just guess
by reading it.

182
00:07:14,476 --> 00:07:14,543
Yeah.

183
00:07:14,543 --> 00:07:19,746
>> First off, it's on port 80, so
that's a regular website.

184
00:07:19,746 --> 00:07:22,456
>> OK. Good, so you see
port 80 at the very top there

185
00:07:22,586 --> 00:07:24,116
which suggests it's
indeed a sort

186
00:07:24,116 --> 00:07:26,306
of standard website
living on a standard port.

187
00:07:27,446 --> 00:07:28,626
What else comes to mind?

188
00:07:28,626 --> 00:07:31,246
What other feature is being
conveyed by this config?

189
00:07:31,246 --> 00:07:31,313
Yeah.

190
00:07:31,866 --> 00:07:33,606
>> A database.

191
00:07:34,306 --> 00:07:39,026
>> A database, where are
you inferring database from?

192
00:07:39,446 --> 00:07:40,636
>> Is that port 443?

193
00:07:40,996 --> 00:07:44,126
>> 44-- so not-- 443 is
actually used for SSL.

194
00:07:44,156 --> 00:07:45,286
So there's two pieces here.

195
00:07:45,286 --> 00:07:47,516
We can-- and we'll focus
on both, but first,

196
00:07:47,706 --> 00:07:50,236
the top one port 80 is sort
of the simpler of the two,

197
00:07:50,236 --> 00:07:51,416
so let's look there first.

198
00:07:51,836 --> 00:07:53,366
So I'll put this one up.

199
00:07:53,366 --> 00:07:55,806
So virtual hosting, this feature

200
00:07:55,806 --> 00:07:58,856
whereby a web server
can use multiple--

201
00:07:58,856 --> 00:08:02,096
the same IP address for
multiple websites is implemented

202
00:08:02,096 --> 00:08:03,676
literally by way
of a file like this.

203
00:08:03,756 --> 00:08:05,026
This is telling the web server,

204
00:08:05,026 --> 00:08:06,536
and the top thing there
is just a comment,

205
00:08:06,916 --> 00:08:10,016
this is telling the web server
"Hey, define a virtual host

206
00:08:10,016 --> 00:08:14,816
or Vhost on port 80 of any IP
address that the server has."

207
00:08:14,816 --> 00:08:17,196
So star denotes anything,
and in this case,

208
00:08:17,196 --> 00:08:18,696
it's meant to mean
an IP address.

209
00:08:19,036 --> 00:08:21,446
And this is relevant because if
the web server just so happens

210
00:08:21,446 --> 00:08:24,276
to have multiple IP addresses,
this is a wild card character

211
00:08:24,276 --> 00:08:26,436
that just says, it doesn't
matter what IP address

212
00:08:26,436 --> 00:08:29,486
the request comes in on,
go ahead and just listen

213
00:08:29,486 --> 00:08:31,246
on port 80 on all of those IPs.

214
00:08:31,506 --> 00:08:33,496
So another common thing
especially if you're developing

215
00:08:33,496 --> 00:08:37,176
on your local virtual machine
which is increasingly common,

216
00:08:37,176 --> 00:08:40,126
and this again is what we'll do in
the class, sometimes you do need

217
00:08:40,126 --> 00:08:43,556
to know the IP address, especially
in various cloud environments.

218
00:08:43,736 --> 00:08:47,016
So just be mindful that sometimes
star is not sufficient unless

219
00:08:47,016 --> 00:08:51,226
you have configuration--
another layer of configuration

220
00:08:51,226 --> 00:08:52,426
that I'll wave my
hand at for now

221
00:08:52,486 --> 00:08:54,026
because we're just
looking at a snippet here.

222
00:08:54,336 --> 00:08:56,606
So this says, listen on
port 80 on any IP address

223
00:08:56,606 --> 00:08:58,886
that the server has
for incoming requests.

224
00:08:59,356 --> 00:09:03,426
Now, when in-- requests do come
in to the server, thankfully,

225
00:09:03,426 --> 00:09:06,786
they should have that
host colon HTTP header

226
00:09:06,906 --> 00:09:09,956
that reminds the server
what this request was for.

227
00:09:10,226 --> 00:09:11,926
So, if you skim through
some of these,

228
00:09:11,926 --> 00:09:14,266
and let's skip the top
part now, server name,

229
00:09:14,366 --> 00:09:17,496
this is where the Vhost's
name is actually defined.

230
00:09:17,806 --> 00:09:19,486
And we'll see it down here, too.

231
00:09:19,486 --> 00:09:22,196
For the SSL version, the name of
this website will be the same.

232
00:09:22,466 --> 00:09:24,656
But I've also defined
what we call an alias,

233
00:09:25,126 --> 00:09:26,336
which is just what in this case?

234
00:09:26,576 --> 00:09:27,216
Quick sanity check.

235
00:09:27,826 --> 00:09:27,916
Yeah.

236
00:09:28,576 --> 00:09:30,446
>> The same size of [inaudible].

237
00:09:30,766 --> 00:09:31,276
>> Exactly.

238
00:09:31,276 --> 00:09:34,316
The alias here is just
cs75.net with no www.

239
00:09:34,626 --> 00:09:37,786
So, this is just one of the
steps necessary to ensure

240
00:09:37,786 --> 00:09:42,386
that both www.cs75.net work
and cs75.net work.

241
00:09:42,616 --> 00:09:44,536
So, the quick story
I told on Monday

242
00:09:44,536 --> 00:09:48,866
about certain websites just not
working with just something.com

243
00:09:48,866 --> 00:09:51,506
or the like, is because
someone did not think

244
00:09:51,506 --> 00:09:53,936
to configure a fairly
minor detail like this.

245
00:09:54,066 --> 00:09:56,956
Again, this is Apache but
other web servers, lighttpd,

246
00:09:56,956 --> 00:09:58,886
Nginx and others have
similar features.

247
00:09:59,086 --> 00:10:02,756
So, this is one step and just
to tie Monday into tonight,

248
00:10:03,216 --> 00:10:06,306
what was the other key
detail that you need to do

249
00:10:06,306 --> 00:10:07,646
to ensure that both work?

250
00:10:07,836 --> 00:10:11,166
Both www and not www.

251
00:10:11,376 --> 00:10:12,656
Not a redirect.

252
00:10:12,656 --> 00:10:14,736
Redirect is really just
to ensure the user ends

253
00:10:14,736 --> 00:10:15,996
up at the place you want

254
00:10:16,166 --> 00:10:18,676
where you want both destinations
fundamentally to work.

255
00:10:19,016 --> 00:10:20,436
[ Inaudible Remark ]

256
00:10:20,436 --> 00:10:23,676
So, we needed a DNS record,
an A record in particular.

257
00:10:23,676 --> 00:10:28,686
So, we needed to specify that
cs75.net itself has an A record

258
00:10:28,896 --> 00:10:34,346
and we need to specify that
www.cs75.net has an A record or,

259
00:10:34,886 --> 00:10:36,976
what other type of
record could it be?

260
00:10:37,106 --> 00:10:39,106
[ Inaudible Remark ]

261
00:10:39,196 --> 00:10:43,086
Multiple aliases which we
called CNAMES on Monday.

262
00:10:43,476 --> 00:10:44,986
So, CNAME, or canonical name.

263
00:10:45,086 --> 00:10:48,146
Now, this too is sort of
a corner case, technically,

264
00:10:48,426 --> 00:10:53,236
unfortunately, you cannot
generally make CNAMEs

265
00:10:53,546 --> 00:10:54,866
for the root of a domain.

266
00:10:54,866 --> 00:11:00,486
cs75.net cannot be a CNAME for
something else but something

267
00:11:00,486 --> 00:11:04,956
with a host name, www,
ftp, mail, .something.com,

268
00:11:04,956 --> 00:11:06,066
those can all be CNAMES

269
00:11:06,316 --> 00:11:08,086
and that's even a bit
of an oversimplification.

270
00:11:08,086 --> 00:11:12,286
You can have cs75.net be a
CNAME technically but things

271
00:11:12,286 --> 00:11:14,096
like e-mail tend to
break as a result.

272
00:11:14,096 --> 00:11:17,016
So, let me just make the
blanket statement that this has

273
00:11:17,066 --> 00:11:21,166
to be an A record, this can
be an A record or a CNAME.

274
00:11:21,366 --> 00:11:23,386
So, just little things you need
to keep in mind when setting

275
00:11:23,386 --> 00:11:26,246
up for instance your own domain
name that you just bought.

276
00:11:26,246 --> 00:11:28,686
Server admin, so this is
just a floppy detail so that

277
00:11:28,686 --> 00:11:32,106
if there's ever an error on your
website and you see, like, a 404

278
00:11:32,106 --> 00:11:33,086
or something like that,

279
00:11:33,276 --> 00:11:34,916
if you haven't customized
the error message,

280
00:11:34,916 --> 00:11:37,166
the footer of the web
page is generally going

281
00:11:37,166 --> 00:11:39,836
to give the email address of
the webmaster at something.com.

282
00:11:40,126 --> 00:11:41,226
In this case, we're telling them

283
00:11:41,226 --> 00:11:42,946
to use this address
just because.

284
00:11:43,006 --> 00:11:45,656
So, it's not something like
webmaster, which doesn't exist

285
00:11:45,776 --> 00:11:48,766
in our case since we're
such a small shop.

286
00:11:49,116 --> 00:11:52,476
Lastly, custom log, error log,
these kind of do what they say.

287
00:11:52,476 --> 00:11:54,746
It's just specifying the
folder in which you want logs

288
00:11:54,796 --> 00:11:59,006
to be stored and most important
line here though perhaps is

289
00:11:59,006 --> 00:12:00,006
document root.

290
00:12:00,206 --> 00:12:02,226
Now, this is kind of crazy
long and cryptic.

291
00:12:02,396 --> 00:12:05,606
It just is what we as a
class decided to do in terms

292
00:12:05,606 --> 00:12:07,516
of the layout of our hard drive.

293
00:12:07,946 --> 00:12:10,996
However, all this is
telling the virtual host is

294
00:12:10,996 --> 00:12:14,866
that the HTML files or
PHP files, GIFs or PNGs

295
00:12:14,926 --> 00:12:17,086
for this virtual
host called www.

296
00:12:17,086 --> 00:12:20,326
[inaudible].net live
specifically

297
00:12:20,496 --> 00:12:22,336
in this directory on the server.

298
00:12:22,846 --> 00:12:25,406
Very often this will be a much
shorter path for normal people

299
00:12:25,406 --> 00:12:27,946
but we've kind of laid ourselves
out fairly hierarchically,

300
00:12:27,946 --> 00:12:31,626
which is why it's so long
but that's all it means.

301
00:12:31,786 --> 00:12:33,046
All right, any questions?

302
00:12:33,836 --> 00:12:34,666
And again, this is something

303
00:12:34,666 --> 00:12:36,806
that for the first project you
have an opportunity to tinker

304
00:12:36,806 --> 00:12:39,716
with and even break if you
want and you'll be able

305
00:12:39,716 --> 00:12:40,996
to restore it rather easily.

306
00:12:41,496 --> 00:12:43,016
All right.

307
00:12:43,146 --> 00:12:48,396
So the virtual host on port 443
is a little more interesting

308
00:12:48,436 --> 00:12:51,866
but also mostly a duplicate,
but a few lines are new.

309
00:12:51,866 --> 00:12:53,906
Which ones jump out at
you as obviously new?

310
00:12:53,906 --> 00:12:56,936
so all the SSL stuff
at the bottom.

311
00:12:57,176 --> 00:13:00,436
So SSL is kind of a
pain to setup at least

312
00:13:00,436 --> 00:13:02,906
with certain web
servers whereby you have

313
00:13:02,906 --> 00:13:04,116
to configure a few files.

314
00:13:04,356 --> 00:13:05,736
So what is SSL?

315
00:13:05,906 --> 00:13:07,406
SSL is Secure Sockets Layer.

316
00:13:07,646 --> 00:13:09,436
This is the protocol
that websites use

317
00:13:09,436 --> 00:13:11,606
to communicate securely
with browsers

318
00:13:11,926 --> 00:13:15,396
but what is necessary before
you can actually use SSL

319
00:13:15,396 --> 00:13:16,026
on your website?

320
00:13:16,246 --> 00:13:16,926
Does anyone know?

321
00:13:17,756 --> 00:13:19,786
What's involved in doing this?

322
00:13:22,436 --> 00:13:22,576
Yeah.

323
00:13:22,576 --> 00:13:24,686
>> I think you need to
distribute a certificate

324
00:13:24,686 --> 00:13:27,686
that the user will have to get.

325
00:13:27,886 --> 00:13:31,656
>> Exactly, you need to
distribute a certificate

326
00:13:31,746 --> 00:13:36,556
that the user will need
part of, it will need

327
00:13:36,556 --> 00:13:37,956
to get some help from you.

328
00:13:38,006 --> 00:13:39,866
Thankfully, it's all automatic.

329
00:13:39,866 --> 00:13:42,216
So how do you go about
getting an SSL certificate?

330
00:13:42,546 --> 00:13:45,516
So there's a couple
of things you can do.

331
00:13:45,516 --> 00:13:47,476
You can either create
one and sign it,

332
00:13:47,506 --> 00:13:50,056
so to speak, yourself, or
you can pay someone else.

333
00:13:50,056 --> 00:13:52,096
And have you ever been
to a website that said--

334
00:13:52,096 --> 00:13:54,176
whereby the browser upon
visiting, yells at you

335
00:13:54,336 --> 00:13:59,656
saying something like "this
website cannot be trusted" or,

336
00:14:00,186 --> 00:14:03,186
you know, "you should not go
here" for some reason like that.

337
00:14:03,186 --> 00:14:05,656
So that's because that
website probably doesn't have a

338
00:14:05,656 --> 00:14:07,156
certificate that was signed

339
00:14:07,156 --> 00:14:09,286
by what's called a
certificate authority.

340
00:14:09,286 --> 00:14:16,876
And I think I can actually
simulate this, I just happened

341
00:14:17,036 --> 00:14:21,586
to come across this the other day
because I wanted to make one

342
00:14:21,796 --> 00:14:24,196
of my university
websites run over SSL.

343
00:14:24,196 --> 00:14:26,816
So let me open up
Chrome here and type

344
00:14:26,816 --> 00:14:29,636
in https://cs.harvard.edu,
enter.

345
00:14:29,636 --> 00:14:31,246
Perfect, perfect example.

346
00:14:31,366 --> 00:14:33,296
So the CS department
has not paid for

347
00:14:33,356 --> 00:14:36,846
what's called an SSL
certificate, ironically.

348
00:14:36,846 --> 00:14:39,816
And I will fix this but it's
a great demonstration, so.

349
00:14:39,816 --> 00:14:40,426
What does this mean?

350
00:14:40,426 --> 00:14:43,106
It means that the site isn't
necessarily insecure, per se.

351
00:14:43,106 --> 00:14:47,156
It pretty much boils down and
this is some what pessimistic

352
00:14:47,156 --> 00:14:50,476
to the fact that we have not
paid for an SSL certificate.

353
00:14:50,476 --> 00:14:53,996
We have created an SSL
certificate whereby that's just

354
00:14:54,086 --> 00:14:54,706
a command.

355
00:14:54,706 --> 00:14:56,386
On a Linux computer,

356
00:14:56,386 --> 00:14:59,456
you typically run a
command called openssl

357
00:14:59,576 --> 00:15:04,416
with some fairly arcane command
line arguments and hit enter.

358
00:15:04,416 --> 00:15:08,376
And that gives you what's called
a public key and a private key.

359
00:15:08,506 --> 00:15:09,126
What does that mean?

360
00:15:09,126 --> 00:15:11,516
Well for our purposes here,

361
00:15:11,516 --> 00:15:14,266
just know that there's a fancy
mathematical relationship

362
00:15:14,336 --> 00:15:16,416
between this thing called a
public key and a private key.

363
00:15:16,416 --> 00:15:19,446
They're really just big random
numbers and mathematically,

364
00:15:19,446 --> 00:15:22,786
people in the internet
can use CS75's public key

365
00:15:22,996 --> 00:15:24,496
to encrypt information to us.

366
00:15:24,496 --> 00:15:26,506
So if some random
user is visiting,

367
00:15:26,506 --> 00:15:29,016
trying to visit Harvard's
CS website,

368
00:15:29,016 --> 00:15:31,906
their browser automatically
will say to cs.harvard.edu,

369
00:15:31,906 --> 00:15:35,366
"Can I please have
your public key?"

370
00:15:35,366 --> 00:15:37,476
And the server will
send it for free

371
00:15:37,476 --> 00:15:39,366
and over the internet publicly,

372
00:15:39,366 --> 00:15:41,366
it's not something
that's secure.

373
00:15:41,366 --> 00:15:48,276
Public key is meant to
be-- by definition public.

374
00:15:48,516 --> 00:15:54,536
That browser will behind the
scenes unbeknownst to the user,

375
00:15:54,536 --> 00:15:58,216
use that public key,
that big random number

376
00:15:58,216 --> 00:15:59,646
to encrypt their request.

377
00:15:59,646 --> 00:16:04,596
And the request can be
something stupid like get slash

378
00:16:04,596 --> 00:16:07,786
and that's literally all
my request just now was.

379
00:16:07,866 --> 00:16:09,556
But it encrypts it
nonetheless.

380
00:16:09,556 --> 00:16:14,296
And you could probably guess,
what is the only number

381
00:16:14,346 --> 00:16:17,486
in the world that can decrypt
something that's been encrypted

382
00:16:17,486 --> 00:16:21,396
with the public key?
The private key.

383
00:16:21,396 --> 00:16:23,306
And that's something
that my server

384
00:16:23,306 --> 00:16:26,716
or the CS department
server keeps to itself.

385
00:16:26,716 --> 00:16:28,646
And you don't give it out,

386
00:16:28,646 --> 00:16:31,596
and the web server's
never going to send it.

387
00:16:31,596 --> 00:16:34,586
It's stored somewhere
on the hard drive.

388
00:16:34,586 --> 00:16:37,956
Now mathematically,
that key will be used

389
00:16:38,086 --> 00:16:40,446
with a mathematical formula

390
00:16:40,496 --> 00:16:43,446
to reverse the effects
essentially of the encryption.

391
00:16:43,716 --> 00:16:48,566
So that what the CS department's
web server finally sees is get

392
00:16:48,566 --> 00:16:50,426
slash or whatever it
is the user wants.

393
00:16:50,476 --> 00:16:54,206
And conversely it works
in the other direction.

394
00:16:54,206 --> 00:16:58,386
When you install a browser, your
browser generates a public

395
00:16:58,386 --> 00:17:00,026
and private key pair,
so that--

396
00:17:00,026 --> 00:17:01,646
the traffic can work

397
00:17:01,646 --> 00:17:04,336
in the opposite direction
as well if necessary.

398
00:17:04,336 --> 00:17:06,086
So what's the takeaway here?

399
00:17:06,146 --> 00:17:08,236
We did all that in
the CS department

400
00:17:08,236 --> 00:17:10,096
but we didn't pay
someone else to certify

401
00:17:10,096 --> 00:17:13,566
that we are Harvard
University's CS department.

402
00:17:13,596 --> 00:17:17,866
So the way SSL works
at a higher level is

403
00:17:18,086 --> 00:17:21,366
that there is this chain
of trust that humans

404
00:17:21,366 --> 00:17:24,326
in the world have tried to build
up whereby there's big companies

405
00:17:24,326 --> 00:17:26,076
like Verisign is one of them.

406
00:17:26,306 --> 00:17:28,956
GoDaddy is another and maybe
even Namecheap does this.

407
00:17:28,956 --> 00:17:30,786
Even more cheaply than others,

408
00:17:30,786 --> 00:17:35,036
whereby you have these fairly
big entities in the world

409
00:17:35,036 --> 00:17:37,346
who charge you money
to then stamp

410
00:17:37,346 --> 00:17:40,166
so to speak your
certificate as valid.

411
00:17:40,226 --> 00:17:40,796
What does that mean?

412
00:17:40,966 --> 00:17:41,786
They digitally sign it.

413
00:17:41,786 --> 00:17:44,796
So there's actually some
interesting mathematics there

414
00:17:45,126 --> 00:17:49,896
that are involved but at the
end of the day, it's in part

415
00:17:49,896 --> 00:17:51,866
a marketing thing,
whereby we, the whole world

416
00:17:51,866 --> 00:17:55,366
of internet users are
trusting that if Verisign says

417
00:17:55,636 --> 00:17:59,676
that this SSL certificate
belongs to cs.harvard.edu,

418
00:18:00,006 --> 00:18:04,976
then if I trust Verisign, I
should trust this website.

419
00:18:04,976 --> 00:18:08,436
Now how does Verisign
do the authorization?

420
00:18:08,676 --> 00:18:12,126
Well, some of these
registrars or these sellers

421
00:18:12,126 --> 00:18:15,616
of SSL certificates, they'll go

422
00:18:16,796 --> 00:18:21,006
to reasonable lengths
to make sure.

423
00:18:21,226 --> 00:18:23,596
They'll call you on the phone,

424
00:18:23,636 --> 00:18:26,386
they'll check some
business records.

425
00:18:26,426 --> 00:18:29,066
That's what you get if
they're really being diligent.

426
00:18:29,066 --> 00:18:35,006
But the reality is all they
do is send an email typically

427
00:18:35,006 --> 00:18:37,936
to whoever is on file as the
owner of the domain name,

428
00:18:38,026 --> 00:18:40,046
and in this case it's
Drew Faust or someone

429
00:18:40,046 --> 00:18:42,796
like that for harvard.edu.

430
00:18:42,796 --> 00:18:45,916
And that person has to say,
"Yes, I own this domain

431
00:18:46,096 --> 00:18:50,536
and I approve this digital
signing of this certificate."

432
00:18:50,536 --> 00:18:53,226
And then, you get back your
digitally signed certificate.

433
00:18:53,226 --> 00:18:58,366
And what you do as the system
administrator is you install

434
00:18:58,366 --> 00:18:59,886
that digitally signed
certificate

435
00:18:59,886 --> 00:19:01,976
which frankly is a big
number supplemented

436
00:19:01,976 --> 00:19:03,516
by another big number
and you install it

437
00:19:03,776 --> 00:19:05,756
on your web server using
the syntax that we just saw

438
00:19:05,756 --> 00:19:07,076
and we'll see again
in just a moment.

439
00:19:07,076 --> 00:19:07,966
So how do you get
this certificate?

440
00:19:07,996 --> 00:19:09,526
Well, you can go to someone like
Verisign-- and let's do that.

441
00:19:09,556 --> 00:19:11,206
Verisign.com and here we have--
let's see lots of products.

442
00:19:11,236 --> 00:19:11,656
So, oh, here we go.

443
00:19:11,686 --> 00:19:12,406
Buy SSL certificates and OK.

444
00:19:12,436 --> 00:19:13,336
You know it's going
to be expensive

445
00:19:13,366 --> 00:19:14,686
when they don't tell you the
price right away on the page,

446
00:19:14,716 --> 00:19:15,646
so let's compare all
SSL certificates.

447
00:19:15,676 --> 00:19:16,276
OK. So what do we get?

448
00:19:16,306 --> 00:19:17,176
Let's see, let's
just spoil the-- OK.

449
00:19:17,206 --> 00:19:17,476
Here we go.

450
00:19:17,506 --> 00:19:18,346
OK, they're still
not-- oh there we go.

451
00:19:18,376 --> 00:19:19,756
OK. So here's what an SSL
certificate apparently costs

452
00:19:19,786 --> 00:19:20,386
if you go through Verisign.

453
00:19:20,416 --> 00:19:21,376
And mind you, it's
just for one year.

454
00:19:21,406 --> 00:19:22,726
So you're essentially renting
their approval for a year.

455
00:19:22,756 --> 00:19:23,476
What you get now is what here?

456
00:19:23,506 --> 00:19:24,286
Different encryption strengths.

457
00:19:24,316 --> 00:19:25,666
So if you're familiar with
cryptography, the more bits

458
00:19:25,696 --> 00:19:27,166
in the cipher in the encryption
algorithm, the more secure

459
00:19:27,196 --> 00:19:27,916
in theory the transmission is.

460
00:19:27,946 --> 00:19:29,296
Extended validation, not
quite sure what this means,

461
00:19:29,326 --> 00:19:30,706
probably has something to do
with the duration of it.

462
00:19:30,736 --> 00:19:31,876
The warranty, I've never
really understood, you know,

463
00:19:31,906 --> 00:19:32,866
you're going to pay $400

464
00:19:32,896 --> 00:19:33,916
and somehow they're
warranting your website

465
00:19:33,946 --> 00:19:34,996
for $1.5 million.

466
00:19:35,026 --> 00:19:36,076
I assume the fine print
says something like,

467
00:19:36,106 --> 00:19:37,516
"If the cryptography we use
is broken, fundamentally,

468
00:19:37,546 --> 00:19:38,206
we will pay out this amount."

469
00:19:38,236 --> 00:19:38,836
I'm just making that up.

470
00:19:38,866 --> 00:19:40,336
But the reality is this is
pretty meaningless, all of this.

471
00:19:40,366 --> 00:19:41,236
And the fact that
you get the right

472
00:19:41,266 --> 00:19:42,796
to put Norton Secured Seal
on your website is atrocious.

473
00:19:42,826 --> 00:19:44,146
Because anyone can put
an image tag on a website

474
00:19:44,176 --> 00:19:44,836
that says something like that.

475
00:19:44,866 --> 00:19:45,766
So a lot of this,
realize, is trying

476
00:19:45,796 --> 00:19:46,996
to create an industry
around sending a message

477
00:19:47,026 --> 00:19:47,776
of security to end users.

478
00:19:47,806 --> 00:19:49,066
But seeing this should never
mean anything to anyone.

479
00:19:49,096 --> 00:19:49,996
It just means that
someone knows how

480
00:19:50,026 --> 00:19:50,866
to embed an image on a website.

481
00:19:50,896 --> 00:19:51,616
And the takeaway here too is

482
00:19:51,646 --> 00:19:53,146
that using Verisign isn't
necessarily all that compelling.

483
00:19:53,176 --> 00:19:54,196
If we instead go to GoDaddy.com.

484
00:19:54,226 --> 00:19:55,636
GoDaddy.com which again
tries to sell you everything

485
00:19:55,666 --> 00:19:56,866
and the kitchen sink when
you visit their website,

486
00:19:56,896 --> 00:19:57,976
at least is more
reasonable when it comes

487
00:19:58,006 --> 00:19:59,116
to SSL certificates
whereby you can get away

488
00:19:59,146 --> 00:20:00,976
with $69.99 a year
or the premium SSL.

489
00:20:01,876 --> 00:20:04,616
And in this case premium
SSL, which is a feature a lot

490
00:20:04,616 --> 00:20:07,266
of these SSL providers
have tried to market

491
00:20:07,266 --> 00:20:10,546
in recent years, has really
one fundamental difference.

492
00:20:10,696 --> 00:20:14,276
What does it mean when you visit
a website and the address bar,

493
00:20:14,456 --> 00:20:17,466
it not only says HTTPS
but it also turns green

494
00:20:17,996 --> 00:20:20,076
and says the company's
name in that address bar?

495
00:20:20,626 --> 00:20:22,586
What does it mean?

496
00:20:22,586 --> 00:20:24,556
>> It's supposed to mean
this site is really secure

497
00:20:24,626 --> 00:20:26,926
and you really trust it.

498
00:20:26,926 --> 00:20:29,276
>> Right. But in reality what
does it effectively mean based

499
00:20:29,276 --> 00:20:29,756
on this--

500
00:20:29,756 --> 00:20:30,736
>> They paid a hundred
dollars [inaudible].

501
00:20:31,336 --> 00:20:33,506
>> Exactly, they paid a
hundred dollars instead

502
00:20:33,506 --> 00:20:34,806
of $70 to get that right.

503
00:20:34,806 --> 00:20:37,466
Now before we just said these
sentences, how many of you knew

504
00:20:37,636 --> 00:20:39,836
that a green address
bar meant something

505
00:20:39,836 --> 00:20:40,726
fundamentally different?

506
00:20:41,396 --> 00:20:44,006
OK. So-- OK.

507
00:20:44,206 --> 00:20:46,446
Even eh, like-- so
there's the question.

508
00:20:46,446 --> 00:20:49,396
Is it really worth $30 to
convince no one in this room

509
00:20:49,396 --> 00:20:51,056
that your site is more secure?

510
00:20:51,296 --> 00:20:53,156
So I'm being a little
pessimistic with all of this.

511
00:20:53,156 --> 00:20:54,866
But frankly I do think
this is a bit of a scam.

512
00:20:54,866 --> 00:20:56,166
That we've built up
this whole industry,

513
00:20:56,166 --> 00:20:58,066
that in theory is actually
a wonderful idea.

514
00:20:58,196 --> 00:21:01,456
These chains of trust whereby if
you trust someone authoritative,

515
00:21:01,646 --> 00:21:02,856
like Verisign or the like,

516
00:21:02,856 --> 00:21:04,616
you can then trust
anyone they trust.

517
00:21:04,896 --> 00:21:06,716
But the reality is, it's so easy

518
00:21:06,716 --> 00:21:08,526
to get an SSL certificate
these days.

519
00:21:08,786 --> 00:21:15,116
And even until recently most
browsers did not put this crazy

520
00:21:15,156 --> 00:21:17,306
sounding message in
front of the user.

521
00:21:17,306 --> 00:21:20,536
You might see a little broken
link or a broken padlock icon

522
00:21:20,916 --> 00:21:22,386
but they didn't really
raise the bar.

523
00:21:22,386 --> 00:21:24,466
One thing Google has
started doing is putting

524
00:21:24,466 --> 00:21:25,246
up a site like this.

525
00:21:25,726 --> 00:21:28,236
But I dare say, and this
is a made up statistic,

526
00:21:28,236 --> 00:21:30,376
9 times out of 10, when
you see this message,

527
00:21:30,596 --> 00:21:32,196
it's just because
someone has let--

528
00:21:32,196 --> 00:21:33,956
hasn't paid for their
SSL certificate

529
00:21:33,956 --> 00:21:35,286
for the year or it has lapsed.

530
00:21:35,336 --> 00:21:36,266
I do this all the time,

531
00:21:36,266 --> 00:21:38,676
once a year our websites start
saying this because I forgot

532
00:21:38,676 --> 00:21:40,386
to pay the bill for
the SSL certificate.

533
00:21:40,666 --> 00:21:43,146
But fundamentally, it's a
wonderful idea because it means

534
00:21:43,146 --> 00:21:44,606
that you might be
visiting a site

535
00:21:44,996 --> 00:21:46,816
that is not who they
claim to be.

536
00:21:47,036 --> 00:21:48,416
Because rather, you're
the victim

537
00:21:48,416 --> 00:21:49,436
of what might be called a man

538
00:21:49,436 --> 00:21:51,496
in the middle attack
whereby someone has gotten

539
00:21:51,496 --> 00:21:52,996
into the middle of
your DNS traffic

540
00:21:52,996 --> 00:21:56,366
and even though you think
you're visiting cs.harvard.edu,

541
00:21:56,616 --> 00:21:59,316
some bad guy sitting in
Starbucks has actually led you

542
00:21:59,316 --> 00:22:02,246
to his website instead and is
trying to trick you into typing

543
00:22:02,246 --> 00:22:03,946
in your user name and
password and the like.

544
00:22:04,256 --> 00:22:06,796
So again the mathematics, the
technology itself is wonderful

545
00:22:07,016 --> 00:22:08,976
but the fact that there is this
market where you're paying hundreds

546
00:22:08,976 --> 00:22:11,986
of dollars versus tens of
dollars is a bit unfortunate

547
00:22:12,196 --> 00:22:12,996
that that's where we're at.

548
00:22:13,146 --> 00:22:13,296
Yeah.

549
00:22:13,816 --> 00:22:21,846
>> This message will only
appear if port 443 is active

550
00:22:22,816 --> 00:22:25,746
and SSL is being offered.

551
00:22:25,746 --> 00:22:26,176
>> Enabled.

552
00:22:26,436 --> 00:22:26,976
Exactly.

553
00:22:27,016 --> 00:22:27,966
>> Otherwise it will not.

554
00:22:28,366 --> 00:22:28,776
>> Correct.

555
00:22:28,776 --> 00:22:31,766
If the web server itself is
not configured to listen,

556
00:22:31,766 --> 00:22:34,836
so to speak, on port
443, then this--

557
00:22:34,926 --> 00:22:35,796
you will just get a dead end

558
00:22:35,796 --> 00:22:38,056
and you will get a generic
browser message saying

559
00:22:38,056 --> 00:22:40,006
"server not found" or
something to that effect.

560
00:22:40,336 --> 00:22:42,536
So you must, per the
configuration we started

561
00:22:42,536 --> 00:22:43,426
glancing at,

562
00:22:43,426 --> 00:22:45,656
at least have your website
configured to listen

563
00:22:45,656 --> 00:22:47,816
on both of those TCP ports.

564
00:22:47,876 --> 00:22:49,806
Recall our discussion
of ports on Monday.

565
00:22:50,056 --> 00:22:51,526
We can do a little
introspection here.

566
00:22:51,526 --> 00:22:54,466
If I click the X up
here and then zoom in.

567
00:22:54,946 --> 00:22:57,346
Server's certificate
does not match the URL,

568
00:22:57,566 --> 00:22:59,256
server certificate has expired,

569
00:22:59,306 --> 00:23:00,556
server certificate
is not trusted.

570
00:23:00,556 --> 00:23:02,056
So, we're really not
doing so well here.
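The three warnings just read off the screen correspond to three separate checks. Below is a minimal sketch in Python, assuming a certificate can be summarized as a dictionary with a subject name, an issuer, and an expiration date; the field names, the issuer names, and the dates are all hypothetical, and a real browser verifies digital signatures rather than comparing issuer strings.

```python
from datetime import date

def certificate_problems(cert, hostname, today, trusted_issuers):
    """Return the warnings a browser might raise for this certificate."""
    problems = []
    if cert["subject"] != hostname:
        problems.append("certificate does not match the URL")
    if today > cert["expires"]:
        problems.append("certificate has expired")
    if cert["issuer"] not in trusted_issuers:
        problems.append("certificate is not trusted")
    return problems

# Hypothetical values, loosely modeled on the page shown in lecture.
cert = {
    "subject": "eecs.harvard.edu",
    "issuer": "Example Unknown CA",
    "expires": date(2020, 5, 31),
}
trusted = {"Example Trusted CA"}

for problem in certificate_problems(cert, "cs.harvard.edu", date(2020, 9, 1), trusted):
    print(problem)
```

Each check is independent of the others, which is why renewing the certificate alone would clear only one of the three messages.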

571
00:23:02,336 --> 00:23:05,636
So let's click on certificate
information just to see what--

572
00:23:05,636 --> 00:23:09,126
oh, but the irony is-- but we
have a very secure connection

573
00:23:09,126 --> 00:23:10,766
to whoever the hell
this is on the internet.

574
00:23:11,396 --> 00:23:13,176
So, let's click certificate
information

575
00:23:13,176 --> 00:23:15,786
and we'll get a little
more detail.

576
00:23:16,276 --> 00:23:18,496
So looks like this
certificate expired in May.

577
00:23:18,496 --> 00:23:20,746
So I'm guilty of the same,
so I can't really poke fun

578
00:23:21,306 --> 00:23:23,196
at them for doing this.

579
00:23:23,286 --> 00:23:25,376
But if we click details
and scroll down,

580
00:23:25,806 --> 00:23:29,766
we see that the certificate
they're actually using

581
00:23:29,766 --> 00:23:35,336
for cs.harvard.edu is
actually for eecs.harvard.edu.

582
00:23:35,336 --> 00:23:37,416
That's Electrical Engineering
and Computer Science.

583
00:23:37,836 --> 00:23:38,856
So there is a-- unfortunately,

584
00:23:38,956 --> 00:23:40,826
I've just revealed who's
responsible for this certificate

585
00:23:40,826 --> 00:23:43,306
but he's no longer
here, so it is OK.

586
00:23:43,626 --> 00:23:46,816
But the takeaway here is
that there's a few solutions.

587
00:23:46,816 --> 00:23:48,966
Either one, you pay the
bill and then at least one

588
00:23:48,966 --> 00:23:50,066
of those messages goes away.

589
00:23:50,066 --> 00:23:51,706
And it's not just a
matter of paying the bill,

590
00:23:51,706 --> 00:23:54,836
you have to download an
updated certificate to install

591
00:23:54,836 --> 00:23:57,396
in your web server with an
updated date for expiration.

592
00:23:58,286 --> 00:24:01,826
But more than that, we also
have to fix the domain name.

593
00:24:02,076 --> 00:24:03,646
And so you have a
few options here.

594
00:24:03,646 --> 00:24:06,346
You either one, buy a
separate SSL certificate

595
00:24:06,586 --> 00:24:10,916
for cs.harvard.edu in
addition to eecs.harvard.edu,

596
00:24:11,236 --> 00:24:13,356
or you can buy what's called
a wildcard certificate.

597
00:24:13,356 --> 00:24:16,006
And for instance the course
CS75, we have this ourselves.

598
00:24:16,316 --> 00:24:20,516
It's unfortunately like $199
a year, but what that means

599
00:24:20,516 --> 00:24:23,316
for your money, is that you can
protect and avoid these kinds

600
00:24:23,316 --> 00:24:26,266
of warnings for *.cs75.net.

601
00:24:26,266 --> 00:24:29,496
Any subdomains you want and we
happen to use things like mail

602
00:24:29,496 --> 00:24:31,276
and others for back-end
technical reasons.

603
00:24:31,546 --> 00:24:33,376
So for us that actually
tends to make sense.
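To make the wildcard idea concrete, here is a small Python sketch of how a name like *.cs75.net might be compared against a hostname. It assumes the common rule that the asterisk stands for exactly one leftmost label; the function name is made up for illustration, and real browsers apply additional rules.

```python
def hostname_matches(pattern, hostname):
    """Return True if a certificate name (possibly a wildcard) covers hostname."""
    pattern_labels = pattern.lower().split(".")
    host_labels = hostname.lower().split(".")
    if len(pattern_labels) != len(host_labels):
        return False
    if pattern_labels[0] == "*":
        # The wildcard replaces one non-empty label; the rest must match exactly.
        return host_labels[0] != "" and pattern_labels[1:] == host_labels[1:]
    return pattern_labels == host_labels

for name in ["mail.cs75.net", "www.cs75.net", "cs75.net", "a.b.cs75.net"]:
    print(name, hostname_matches("*.cs75.net", name))
```

Under that assumption the wildcard covers mail.cs75.net and www.cs75.net, but not the bare cs75.net or a name nested two levels deep.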

604
00:24:33,746 --> 00:24:35,116
So there's a few solutions here.

605
00:24:35,226 --> 00:24:38,436
And I should say too, one
of the other reasonably--

606
00:24:38,436 --> 00:24:41,646
compelling reasons to pay
more money to a bigger fish

607
00:24:41,646 --> 00:24:43,496
than someone like
GoDaddy or Namecheap

608
00:24:43,496 --> 00:24:46,916
for SSL certificates is that,
as part of this chain of trust,

609
00:24:46,966 --> 00:24:49,956
the various browser
manufacturers Microsoft, Google,

610
00:24:50,206 --> 00:24:54,036
Apple and so forth, they
ship their browsers, Safari,

611
00:24:54,036 --> 00:24:58,276
IE and so forth, with certain
certificate authorities' own

612
00:24:58,536 --> 00:24:59,756
certificates installed.

613
00:25:00,136 --> 00:25:02,896
So in other words it's up to
those big companies of browsers

614
00:25:02,896 --> 00:25:03,636
to decide who--

615
00:25:03,636 --> 00:25:05,736
which certificate
authorities you should trust.

616
00:25:06,056 --> 00:25:07,756
And some of those
vendors, Microsoft,

617
00:25:07,756 --> 00:25:08,426
they might have a list

618
00:25:08,426 --> 00:25:10,426
of certificate authorities
they trust that's this long,

619
00:25:10,676 --> 00:25:13,706
Google's might be this long,
really depends on the company.

620
00:25:14,036 --> 00:25:16,946
So if you go with some
fly-by-night operation

621
00:25:16,946 --> 00:25:19,866
or you yourself digitally
sign your own certificate,

622
00:25:20,006 --> 00:25:22,506
which is mathematically
possible, you--

623
00:25:22,506 --> 00:25:24,706
if you are not trusted
or that fly-by-night

624
00:25:24,706 --> 00:25:27,266
SSL company is not
trusted by Microsoft or Google,

625
00:25:27,536 --> 00:25:29,216
you're going to get
this kind of warning.

626
00:25:29,526 --> 00:25:31,026
So one of the things
you're paying--

627
00:25:31,026 --> 00:25:33,686
and if you frankly are a Fortune
500 company and the difference

628
00:25:33,686 --> 00:25:37,306
between $300 or $1000 is not
such a big deal to make sure

629
00:25:37,306 --> 00:25:40,696
that more of your customers
reach your website correctly,

630
00:25:40,746 --> 00:25:43,466
it might be worth spending
more money because it could be

631
00:25:43,466 --> 00:25:46,506
that someone has got the latest
version of Android and they're--

632
00:25:46,506 --> 00:25:48,606
for whatever reason it did not
ship with the right certificates

633
00:25:48,606 --> 00:25:51,376
or someone's using version
1.0 of Netscape or something

634
00:25:51,376 --> 00:25:54,036
like that, and so certificates
aren't trusted inside of that.

635
00:25:54,186 --> 00:25:57,206
So again, you're paying
to minimize the risk

636
00:25:57,206 --> 00:26:00,016
of users running into this
kind of unrecognized message

637
00:26:00,416 --> 00:26:02,436
but that's orthogonal
to the expiration

638
00:26:02,436 --> 00:26:05,246
which is just a matter
of we let the bill lapse.

639
00:26:05,496 --> 00:26:06,186
Any questions?

640
00:26:06,626 --> 00:26:09,126
No? All right.

641
00:26:09,246 --> 00:26:11,126
So how do you actually
configure this?

642
00:26:11,466 --> 00:26:15,516
Well, when you create your
certificate, running a command

643
00:26:15,516 --> 00:26:20,466
on the computer, you end up
with two files, one is a key

644
00:26:20,716 --> 00:26:21,926
and one is a certificate.

645
00:26:22,976 --> 00:26:24,566
The key-- rather
one is a private key,

646
00:26:24,566 --> 00:26:25,576
one is a public key.

647
00:26:25,836 --> 00:26:29,066
This line here, SSLCertificateKeyFile,
this is literally

648
00:26:29,066 --> 00:26:32,026
where our private key
can be found on the server.

649
00:26:32,026 --> 00:26:35,726
For security reasons, I've
faked it as path to cs75.key

650
00:26:35,956 --> 00:26:37,396
but it's somewhere
on the hard drive.

651
00:26:37,396 --> 00:26:40,076
And I should make it clear,
it is not in the same location

652
00:26:40,076 --> 00:26:42,506
as your HTML files and GIFs
because that would be stupid

653
00:26:42,506 --> 00:26:43,926
if you-- anyone could
just download it.

654
00:26:43,926 --> 00:26:44,746
So it's somewhere else.

655
00:26:45,226 --> 00:26:48,036
The certificate key--
the SSLCertificateFile,

656
00:26:48,276 --> 00:26:50,436
this is what you're paying for.

657
00:26:50,686 --> 00:26:53,656
You upload your public key
to GoDaddy or Verisign,

658
00:26:53,846 --> 00:26:56,126
they then send you
back via email

659
00:26:56,126 --> 00:26:59,786
or a download, a digitally signed
copy which has your big number

660
00:26:59,846 --> 00:27:01,156
and essentially another very big number.

661
00:27:01,236 --> 00:27:02,686
And then you install that here.

662
00:27:02,966 --> 00:27:05,616
And then lastly, this
chain file just has to do

663
00:27:05,616 --> 00:27:09,406
with some registrar, some
SSL providers whereby just

664
00:27:09,406 --> 00:27:11,746
in case their-- one
of the certificate--

665
00:27:11,746 --> 00:27:15,106
authority certificates
didn't ship with the browser,

666
00:27:15,406 --> 00:27:18,796
this chain certificate
essentially says we trust this

667
00:27:18,796 --> 00:27:22,326
person so it's OK if your
certificate is signed by them.

668
00:27:22,326 --> 00:27:25,096
So I have glossed over some
of the technical detail,

669
00:27:25,096 --> 00:27:27,916
and it turns out, as nice
as the theory is,

670
00:27:27,916 --> 00:27:29,506
SSL itself is still
completely broken,

671
00:27:29,506 --> 00:27:31,096
like it can be circumvented.

672
00:27:31,096 --> 00:27:33,156
And I'll actually try to dig
up an article and I'll post it

673
00:27:33,156 --> 00:27:34,836
on the lecture's
page after tonight.

674
00:27:35,106 --> 00:27:37,096
If you're curious to see
an interesting presentation

675
00:27:37,096 --> 00:27:38,856
on the various ways
in which you can--

676
00:27:39,216 --> 00:27:42,706
thwart SSL and trick users

677
00:27:42,706 --> 00:27:45,016
into thinking it's secure
when it really isn't.

678
00:27:45,746 --> 00:27:47,906
So nice story but the whole
world is broken anyway.

679
00:27:49,266 --> 00:27:51,286
Any questions about SSL?

680
00:27:52,216 --> 00:27:54,736
There's one corner case
that you need to be mindful

681
00:27:54,736 --> 00:27:57,866
of when setting up your
own website: running SSL

682
00:27:57,916 --> 00:27:59,676
on your website requires

683
00:27:59,676 --> 00:28:06,136
that you have a unique--
fill in the blank.

684
00:28:06,216 --> 00:28:08,496
Requires that your
website have a unique IP.

685
00:28:08,716 --> 00:28:11,376
And this is one of the genuine
gotchas with SSL.

686
00:28:11,376 --> 00:28:14,006
You have a sort of
Catch-22 with SSL.

687
00:28:14,316 --> 00:28:17,076
Because SSL is about
encrypting information,

688
00:28:17,706 --> 00:28:18,686
what gets encrypted?

689
00:28:18,796 --> 00:28:21,376
Really everything in the
request and the response.

690
00:28:21,456 --> 00:28:22,906
So everything inside

691
00:28:22,906 --> 00:28:24,986
of the virtual envelope
is encrypted. What are some

692
00:28:24,986 --> 00:28:26,586
of the things inside
the virtual envelope?

693
00:28:26,586 --> 00:28:28,876
Well, the GET line and also--

694
00:28:29,036 --> 00:28:32,706
>> The specific server you
would be on a virtual--

695
00:28:32,826 --> 00:28:33,356
>> Exactly.

696
00:28:33,516 --> 00:28:37,576
The Host header, which tells the
server which Vhost this is for.

697
00:28:37,776 --> 00:28:39,306
But the problem is,
as we're looking

698
00:28:39,306 --> 00:28:40,276
at the configuration here.

699
00:28:40,276 --> 00:28:43,616
Every Vhost can obviously
have its own SSL certificate

700
00:28:43,726 --> 00:28:45,616
because it might be
foo.com, bar.com.

701
00:28:45,616 --> 00:28:47,276
This could be unrelated
entities.

702
00:28:47,276 --> 00:28:50,666
This is a snippet of a
shared web host's web

703
00:28:50,666 --> 00:28:51,766
server configuration.

704
00:28:52,136 --> 00:28:55,296
So, if you're getting an encrypted
request, the only way

705
00:28:55,296 --> 00:28:58,476
to figure out how to--
who the request is for,

706
00:28:58,656 --> 00:29:00,146
is to decrypt the request.

707
00:29:00,686 --> 00:29:03,186
But to decrypt the request,
you have to know who it is

708
00:29:03,186 --> 00:29:07,266
for because the SSL
certificate key--

709
00:29:07,266 --> 00:29:10,136
the private key you should
use is tied to that Vhost.

710
00:29:10,446 --> 00:29:11,906
You again have this Catch-22.

711
00:29:12,086 --> 00:29:15,706
You can only figure out who it's
for by knowing who it's for.

712
00:29:16,166 --> 00:29:18,756
And so, there's, you know,
there's-- in theory a workaround,

713
00:29:18,756 --> 00:29:20,646
you could try all possible
private keys you have

714
00:29:20,646 --> 00:29:21,726
on the system decrypting

715
00:29:21,726 --> 00:29:23,226
but that's not necessarily
deterministic

716
00:29:23,226 --> 00:29:24,936
and it is also a little
hackish, especially

717
00:29:24,936 --> 00:29:26,636
if you have hundreds
of Vhosts on the server.

718
00:29:26,846 --> 00:29:29,776
So the de facto result is
that you just can't do it.

719
00:29:30,156 --> 00:29:34,116
But if you give every
Vhost a unique IP address

720
00:29:34,486 --> 00:29:36,816
and then associate
effectively the certificate

721
00:29:36,816 --> 00:29:39,106
with that IP address,
then you're safe.

722
00:29:39,516 --> 00:29:41,956
Because then, you can just
assume that if it comes

723
00:29:41,956 --> 00:29:47,396
in on IP address w.x.y.z it must
be using this SSL certificate.
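The unique-IP approach can be sketched in a few lines of Python: the server chooses a certificate and private key from the IP address the connection arrived on, which it knows before decrypting anything. The addresses, hostnames, and file paths below are hypothetical placeholders, not the course's actual configuration.

```python
# Hypothetical mapping from the IP address a connection arrived on to that
# virtual host's name, certificate file, and private key file.
CERTS_BY_IP = {
    "192.0.2.10": ("foo.com", "/path/to/foo.crt", "/path/to/foo.key"),
    "192.0.2.11": ("bar.com", "/path/to/bar.crt", "/path/to/bar.key"),
}

def select_certificate(local_ip):
    """Pick a Vhost's certificate using only the destination IP address."""
    if local_ip not in CERTS_BY_IP:
        raise LookupError("no SSL Vhost configured for " + local_ip)
    return CERTS_BY_IP[local_ip]

vhost, cert_file, key_file = select_certificate("192.0.2.11")
print(vhost, cert_file, key_file)
```

Because the lookup never touches the encrypted Host header, the Catch-22 goes away, at the cost of one IP address per certificate.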

724
00:29:47,396 --> 00:29:49,066
And there is one corner case.

725
00:29:49,446 --> 00:29:51,676
If you have a wildcard
certificate like we,

726
00:29:51,676 --> 00:29:54,626
the course, do, thankfully
with the wildcard,

727
00:29:54,626 --> 00:29:56,606
we don't need a unique
IP address for all

728
00:29:56,606 --> 00:29:59,976
of our subdomains, FTP,
mail, web and so forth.

729
00:30:00,246 --> 00:30:02,476
Because, if they all
come to the same server,

730
00:30:02,476 --> 00:30:04,206
you can use the same
wildcard certificate

731
00:30:04,206 --> 00:30:05,826
to decrypt all of that traffic.

732
00:30:06,296 --> 00:30:09,496
So in short, when you sign up
for a web host, if you want SSL,

733
00:30:09,496 --> 00:30:11,776
which frankly these days, it's
just a good thing to have,

734
00:30:11,776 --> 00:30:12,836
good practice to get into,

735
00:30:13,036 --> 00:30:15,246
it's probably worth
paying a few dollars more

736
00:30:15,246 --> 00:30:16,386
to get a unique IP address.

737
00:30:16,606 --> 00:30:18,096
Because otherwise,
your users will get

738
00:30:18,096 --> 00:30:19,586
that very scary, red message.

739
00:30:19,866 --> 00:30:22,126
And Google makes it, Chrome
makes it easy to click through.

740
00:30:22,176 --> 00:30:25,316
Firefox, you literally have to
click like five buttons in order

741
00:30:25,316 --> 00:30:26,946
to get past the warnings.

742
00:30:26,946 --> 00:30:27,576
It's atrocious.

743
00:30:27,576 --> 00:30:29,196
No normal user will
ever figure it out.

744
00:30:30,546 --> 00:30:32,306
So paying for an SSL
certificate is sort

745
00:30:32,306 --> 00:30:33,696
of a necessary evil these days.

746
00:30:33,806 --> 00:30:35,476
The end result is
great, cryptography.

747
00:30:35,806 --> 00:30:38,186
But a bunch of hoops you
have to try-- jump through.

748
00:30:38,736 --> 00:30:41,466
All right, any questions?

749
00:30:41,956 --> 00:30:42,023
Yeah.

750
00:30:42,023 --> 00:30:46,526
>> When will you post the paper
that you mentioned [inaudible]?

751
00:30:46,646 --> 00:30:47,316
>> By morning.

752
00:30:47,316 --> 00:30:49,636
I'll dig up the URL
and then I will post it

753
00:30:49,636 --> 00:30:51,266
on the lecture's
page of the website.

754
00:30:51,726 --> 00:30:53,986
So that-- if of interest,
you can check that out.

755
00:30:54,136 --> 00:30:57,956
Let me just pull
up our slides here.

756
00:30:58,956 --> 00:31:02,706
And go to-- so what about this?

757
00:31:02,956 --> 00:31:06,806
This is among the more cryptic
pieces of syntax that's useful

758
00:31:06,806 --> 00:31:08,406
to know or at least
get comfortable with

759
00:31:08,406 --> 00:31:09,976
or get comfortable
copying and pasting.

760
00:31:10,266 --> 00:31:11,876
Because with Apache
you can actually start

761
00:31:11,876 --> 00:31:13,326
to do fairly powerful things.

762
00:31:13,326 --> 00:31:14,736
And this is perhaps
one of the most common.

763
00:31:15,036 --> 00:31:16,796
This is using a
feature of Apache

764
00:31:16,796 --> 00:31:18,836
and other web servers have
very similar functionality,

765
00:31:18,836 --> 00:31:20,086
though they might call
it something different,

766
00:31:20,466 --> 00:31:21,996
called URL rewriting.

767
00:31:22,266 --> 00:31:24,656
So mod_rewrite just
means module rewrite,

768
00:31:24,726 --> 00:31:25,866
this is an optional feature.

769
00:31:25,866 --> 00:31:28,066
You can enable in an
Apache web server

770
00:31:28,406 --> 00:31:30,026
that lets you rewrite URLs.

771
00:31:30,366 --> 00:31:32,926
Now even if you've never
seen this syntax before,

772
00:31:33,706 --> 00:31:35,526
what do you think
these three lines

773
00:31:35,526 --> 00:31:37,246
of monospaced text are doing?

774
00:31:37,856 --> 00:31:37,936
Yeah.

775
00:31:38,756 --> 00:31:43,256
>> Compensating for
omissions and misspellings.

776
00:31:43,256 --> 00:31:45,186
>> Compensating for
omissions and misspellings?

777
00:31:45,606 --> 00:31:47,946
Sort of. That's actually
a good thought.

778
00:31:48,186 --> 00:31:52,066
The only catch there is that
if the user does mistype the

779
00:31:52,176 --> 00:31:56,186
address, it won't necessarily
work unless DNS is configured

780
00:31:56,186 --> 00:31:58,566
to at least deliver the
user to this end point.

781
00:31:58,936 --> 00:32:03,206
So in other words, if they
accidentally type wwww.cs75.net,

782
00:32:03,426 --> 00:32:06,876
that will be a dead end unless
we in DNS have allowed it to work

783
00:32:06,876 --> 00:32:09,766
with an A record, for
instance, or a wildcard record

784
00:32:09,766 --> 00:32:10,806
which is also possible.

785
00:32:11,326 --> 00:32:13,866
What else might this
be doing though?

786
00:32:13,866 --> 00:32:15,106
That's on the right track though

787
00:32:15,106 --> 00:32:17,266
and this is a very concrete
case that we're solving.

788
00:32:17,856 --> 00:32:21,616
>> Maybe it has something to do
with the checking if it's HTTPS.

789
00:32:21,616 --> 00:32:22,406
>> OK, good.

790
00:32:22,406 --> 00:32:24,126
So is it checking
whether it's HTTPS.

791
00:32:24,126 --> 00:32:27,676
So it's technically not, though
it's very close to doing that.

792
00:32:27,676 --> 00:32:29,116
We could tweak it
in a certain way.

793
00:32:29,116 --> 00:32:29,196
Yeah?

794
00:32:29,676 --> 00:32:34,986
>> Is this kind of like
a redirect or something?

795
00:32:35,106 --> 00:32:36,916
>> It is a redirect and
what's it redirecting

796
00:32:36,966 --> 00:32:37,876
from and to do you think?

797
00:32:37,876 --> 00:32:41,176
>> From the top one to
the bottom one.

798
00:32:41,176 --> 00:32:42,826
>> From the top one to
the bottom one, sort of.

799
00:32:43,106 --> 00:32:44,416
So that's actually pretty close.

800
00:32:44,416 --> 00:32:46,006
So let's start teasing
this apart.

801
00:32:46,376 --> 00:32:48,176
So the very first line
does what it says.

802
00:32:48,176 --> 00:32:50,706
It turns this so-called rewrite
engine on. Without that--

803
00:32:50,706 --> 00:32:52,816
and this is a common
thing I often forget about--

804
00:32:52,946 --> 00:32:54,816
nothing is going to work
unless you explicitly turn the

805
00:32:54,816 --> 00:32:55,566
engine on.

806
00:32:55,936 --> 00:32:57,336
So first line does that.

807
00:32:57,646 --> 00:32:58,816
Second line is a condition.

808
00:32:58,816 --> 00:33:01,266
So you can think of this as
a sort of cryptic way

809
00:33:01,266 --> 00:33:03,516
of implementing an
if-else type condition.

810
00:33:03,876 --> 00:33:07,846
So if the HTTP_HOST
variable-- so what is this?

811
00:33:07,846 --> 00:33:09,656
Anything with a
percent sign, curly brace,

812
00:33:09,656 --> 00:33:11,536
and then a capitalized
phrase like that,

813
00:33:11,816 --> 00:33:13,656
it's what's called an
environment variable

814
00:33:13,656 --> 00:33:14,476
on the web server.

815
00:33:14,476 --> 00:33:16,346
There's a whole bunch of
variables that are set sort

816
00:33:16,346 --> 00:33:18,256
of automatically for
you when a user visits,

817
00:33:18,546 --> 00:33:20,136
among them is HTTP_HOST.

818
00:33:20,226 --> 00:33:23,196
And that is a variable that
specifies what is the IP address

819
00:33:23,196 --> 00:33:25,506
or literally the word, the
host name or domain name

820
00:33:25,636 --> 00:33:26,506
that the user visited.

821
00:33:26,506 --> 00:33:28,646
It's equivalent to the
host line if you will

822
00:33:28,646 --> 00:33:29,666
from the HTTP request.

823
00:33:30,336 --> 00:33:33,186
So bang here is part now
of a regular expression.

824
00:33:33,186 --> 00:33:36,106
So if unfamiliar, a regular
expression is a pattern

825
00:33:36,106 --> 00:33:37,216
that you're trying to match.

826
00:33:37,496 --> 00:33:39,496
Bang is the opposite of true,

827
00:33:39,496 --> 00:33:43,426
so it means if the
HTTP_HOST is not going

828
00:33:43,426 --> 00:33:46,576
to match the following,
don't proceed any further.

829
00:33:46,786 --> 00:33:48,036
So what are we trying to match?

830
00:33:48,036 --> 00:33:51,986
The caret symbol means
what in a regex?

831
00:33:51,986 --> 00:33:53,846
Regex is a fancy way of
saying regular expression.

832
00:33:53,976 --> 00:33:54,186
>> Anything?

833
00:33:54,756 --> 00:33:56,556
>> Not anything; that
would be dot.

834
00:33:58,136 --> 00:33:59,476
>> Not any case.

835
00:33:59,926 --> 00:34:01,046
>> Not any case.

836
00:34:01,236 --> 00:34:03,286
Caret symbol, anyone else?

837
00:34:05,616 --> 00:34:05,806
>> Begins.

838
00:34:05,926 --> 00:34:06,866
>> Begins, perfect.

839
00:34:06,866 --> 00:34:09,086
So caret symbol means
the beginning

840
00:34:09,086 --> 00:34:12,926
of the variable's value
must start with www. This is

841
00:34:12,926 --> 00:34:15,286
to avoid accidental
substring matching

842
00:34:15,286 --> 00:34:16,116
where you're matching part

843
00:34:16,116 --> 00:34:18,006
of the domain name
but not all of it.

844
00:34:18,236 --> 00:34:20,426
So this means you must
start matching from www.

845
00:34:20,426 --> 00:34:22,306
In other words, the first letters

846
00:34:22,306 --> 00:34:24,456
in the host name
must actually be www.

847
00:34:24,456 --> 00:34:27,556
It can't be xyz, www.

848
00:34:27,996 --> 00:34:30,506
So www, I have a backslash dot.

849
00:34:30,636 --> 00:34:31,926
Based on what I just said

850
00:34:31,926 --> 00:34:34,036
about dot's significance,
what is backslash dot?

851
00:34:34,536 --> 00:34:38,676
It's an escape character, so
it means literally a period.

852
00:34:38,716 --> 00:34:41,396
If you just say period that
means any character can be here,

853
00:34:41,606 --> 00:34:44,406
backslash dot means only
a dot can be here.

854
00:34:44,766 --> 00:34:51,816
cs75, backslash dot, net means
it must match literally a .net

855
00:34:51,816 --> 00:34:54,866
and then this NC is fairly
arcane; it just means no case.

856
00:34:54,946 --> 00:34:55,946
It's case insensitive.

857
00:34:55,946 --> 00:34:57,956
It doesn't matter if the user
has the caps lock key on,

858
00:34:57,956 --> 00:35:00,056
this will still match
if the word is correct.

859
00:35:00,456 --> 00:35:02,996
So if the HTTP host is not equal

860
00:35:02,996 --> 00:35:07,606
to literally www.cs75.net
proceed to the following line.

861
00:35:07,946 --> 00:35:09,386
What does the following
line say?

862
00:35:09,386 --> 00:35:10,426
This is the rewrite rule.

863
00:35:10,426 --> 00:35:12,836
So this is the-- if
you have an if-else,

864
00:35:12,966 --> 00:35:16,686
this is the if-then
part of the expression.

865
00:35:17,066 --> 00:35:18,426
So if, then do this.

866
00:35:18,826 --> 00:35:21,446
So this thing here, let's come
back to and focus on this.

867
00:35:21,626 --> 00:35:24,986
I am going to rewrite the
user, rather, redirect the user

868
00:35:24,986 --> 00:35:30,196
to HTTPS://www.cs75.net/$1.

869
00:35:30,496 --> 00:35:34,506
What might $1 be referring
to, for those familiar

870
00:35:34,506 --> 00:35:35,166
with regexes?

871
00:35:35,516 --> 00:35:39,456
>> I think it's whatever
the user typed in after .net?

872
00:35:39,456 --> 00:35:39,976
>> Exactly.

873
00:35:39,976 --> 00:35:41,636
>> So that you wouldn't
have like a [inaudible].

874
00:35:42,086 --> 00:35:42,626
>> Exactly.

875
00:35:42,976 --> 00:35:44,246
So let's go back to this.

876
00:35:44,246 --> 00:35:45,126
What is this doing?

877
00:35:45,326 --> 00:35:48,206
Parentheses, in the context
of regular expressions,

878
00:35:48,356 --> 00:35:50,556
generally mean capturing
parentheses.

879
00:35:50,896 --> 00:35:55,336
So this cryptic sequence of
symbols here means dot star.

880
00:35:55,336 --> 00:35:58,696
So dot is any character,
star means zero or more

881
00:35:58,696 --> 00:35:59,846
of the preceding thing.

882
00:36:00,076 --> 00:36:02,806
So this means zero or more
characters; capture them.

883
00:36:03,176 --> 00:36:04,356
Where are you capturing
them from?

884
00:36:04,356 --> 00:36:06,626
Exactly what you said,
anything after the slash

885
00:36:06,626 --> 00:36:09,636
that the user typed in is
captured by these parentheses

886
00:36:09,796 --> 00:36:13,006
and by convention is stored
in a variable called $1.

887
00:36:13,546 --> 00:36:15,546
If I had a second pair
of parentheses over here

888
00:36:15,546 --> 00:36:17,436
for whatever reason,
then I would have access

889
00:36:17,436 --> 00:36:22,496
to $1 and $2 and $3 and $4.

890
00:36:22,496 --> 00:36:24,776
So it's a generic
way of not knowing

891
00:36:24,776 --> 00:36:26,776
in advance how many
parentheses you might have,

892
00:36:26,776 --> 00:36:29,496
but you can at least express
yourself after the fact.

893
00:36:29,856 --> 00:36:33,446
So this just ensures that if
the user visits something/abc,

894
00:36:33,446 --> 00:36:40,416
I will not just be redirecting
the user to www.cs75.net.

895
00:36:40,416 --> 00:36:40,816
That's it.

896
00:36:40,906 --> 00:36:44,536
I will also have the courtesy
of sending them to /abc.
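
A sketch of the three lines being walked through, reconstructed from the spoken description rather than copied from the slide (the slide itself is not part of this transcript). It assumes the rules sit in a per-directory .htaccess file, where the path being matched carries no leading slash.

```apache
# Reconstruction from the walkthrough; assumes per-directory (.htaccess) context.
RewriteEngine On

# If the Host header does NOT (!) begin (^) with www.cs75.net,
# compared case-insensitively ([NC]) ...
RewriteCond %{HTTP_HOST} !^www\.cs75\.net [NC]

# ... capture the whole requested path as $1 and redirect to the www host,
# keeping that path. R=301 and L are flags on the rule.
RewriteRule (.*) https://www.cs75.net/$1 [R=301,L]
```

If the same lines went into the main server or virtual-host configuration instead, the matched path would begin with a slash, so a pattern such as `^/(.*)$` would be the usual way to avoid a doubled slash in the target.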

897
00:36:44,536 --> 00:36:48,126
And this is infuriating how
few websites actually do this,

898
00:36:48,126 --> 00:36:49,276
especially on mobile phones.

899
00:36:49,276 --> 00:36:50,986
If you're in the habit of
reading news or whatnot

900
00:36:50,986 --> 00:36:53,256
on your phone, this is a
detail that drives me nuts.

901
00:36:53,256 --> 00:36:55,846
I'll go to like Google News,
which has links to all sorts

902
00:36:55,846 --> 00:36:59,156
of websites, I'll click through,
and for whatever stupid reason,

903
00:36:59,156 --> 00:37:01,736
the website will decide,
"Oh, you probably want--

904
00:37:01,736 --> 00:37:04,306
you came to us from Google News,
but we want to show you our--

905
00:37:04,306 --> 00:37:06,856
the mobile version of our
website, so let us send you

906
00:37:06,856 --> 00:37:09,606
to m.news.com" or
whatever it is,

907
00:37:09,856 --> 00:37:13,276
completely forgetting what the
URL was that you were at.

908
00:37:13,276 --> 00:37:15,216
So the end result is you
can't view the article

909
00:37:15,216 --> 00:37:15,906
that you clicked on.

910
00:37:16,276 --> 00:37:17,216
How do you fix this?

911
00:37:18,026 --> 00:37:19,206
Simple as something like this.

912
00:37:19,526 --> 00:37:20,656
Now, if they're not
using Apache,

913
00:37:20,656 --> 00:37:22,296
it's going to be a little
different, but the fix,

914
00:37:22,296 --> 00:37:23,776
it's fundamentally that simple

915
00:37:23,836 --> 00:37:25,376
to remember what
the user typed in.

916
00:37:25,376 --> 00:37:26,856
So again, in terms
of user experience,

917
00:37:26,856 --> 00:37:29,636
in terms of running your own
websites, super simple thing

918
00:37:29,636 --> 00:37:32,056
to do and certainly to
your users' advantage,

919
00:37:32,056 --> 00:37:34,426
because if you're like me,
you just leave that news site

920
00:37:34,426 --> 00:37:35,676
and never come back
because it just--

921
00:37:35,676 --> 00:37:37,736
it was annoying to
visit in that case.

922
00:37:38,516 --> 00:37:38,866
All right.

923
00:37:38,866 --> 00:37:41,026
And how about a couple
more technical details?

924
00:37:41,056 --> 00:37:42,556
R equals 301.

925
00:37:42,826 --> 00:37:44,946
Anyone want to guess
what's that referring to?

926
00:37:44,946 --> 00:37:46,366
>> Isn't that the redirect one?

927
00:37:46,536 --> 00:37:48,456
>> Yeah. The redirect
status code that we talked

928
00:37:48,456 --> 00:37:51,196
about on Monday, 301
means, what specifically?

929
00:37:51,736 --> 00:37:51,866
Moved--

930
00:37:53,056 --> 00:37:54,126
>> Permanently.

931
00:37:55,116 --> 00:37:55,926
>> -- permanently.

932
00:37:56,246 --> 00:37:59,136
So this is in contrast
with 302, which happens

933
00:37:59,136 --> 00:38:01,176
to be moved temporarily.

934
00:38:01,746 --> 00:38:02,706
Who cares?

935
00:38:02,836 --> 00:38:05,166
Like, why are there two separate
codes, do you think, whose

936
00:38:05,166 --> 00:38:07,406
functionality is
essentially the same.

937
00:38:07,496 --> 00:38:10,926
>> If it's moved permanently
or computers don't save that.

938
00:38:10,926 --> 00:38:14,126
>> Good. If it's a 301 and
thus permanent, the browser,

939
00:38:14,126 --> 00:38:16,006
if it's smart, it will
cache that response

940
00:38:16,346 --> 00:38:18,016
and the next time
you, the human,

941
00:38:18,176 --> 00:38:19,966
try to visit the same
page, you're just going

942
00:38:19,966 --> 00:38:21,416
to be automatically redirected

943
00:38:21,606 --> 00:38:24,356
without wasting the server's
time asking the same question.

944
00:38:24,666 --> 00:38:27,216
Whereas 302 means
it's temporary,

945
00:38:27,246 --> 00:38:28,966
you probably should
check back with me.

946
00:38:29,246 --> 00:38:31,096
So upside is, you
save a little time.

947
00:38:31,096 --> 00:38:33,386
The user gets a response
a little bit faster.

948
00:38:33,726 --> 00:38:34,836
Downside though is what?

949
00:38:35,266 --> 00:38:40,516
What's the downside
of 301 do you think?

950
00:38:41,536 --> 00:38:43,326
Again, think-- start
thinking about corner cases

951
00:38:43,326 --> 00:38:45,046
and problems you
might be creating

952
00:38:45,046 --> 00:38:46,386
by trying to be helpful.

953
00:38:47,346 --> 00:38:48,486
For the-- what's that?

954
00:38:48,486 --> 00:38:50,506
>> In case it will
revert that back.

955
00:38:50,506 --> 00:38:51,546
>> In case it reverts back.

956
00:38:51,546 --> 00:38:54,666
Suppose that you just decide
to reconfigure your server

957
00:38:54,666 --> 00:38:57,486
or you change the name
of it or whatever.

958
00:38:57,616 --> 00:38:59,056
You know, it's not
something you do commonly,

959
00:38:59,326 --> 00:39:02,296
but the day you do it, are you
going to be tricking your users

960
00:39:02,296 --> 00:39:03,536
into visiting a dead end?

961
00:39:03,696 --> 00:39:04,836
And so you have to be mindful,

962
00:39:04,836 --> 00:39:07,316
especially if you're the
person doing the web server

963
00:39:07,316 --> 00:39:09,466
configuration, not the
development of the website,

964
00:39:09,856 --> 00:39:11,206
you know, maybe we
should make sure both

965
00:39:11,206 --> 00:39:13,996
of these continue working for
some number of days or weeks

966
00:39:14,126 --> 00:39:15,326
so that anyone in the world

967
00:39:15,326 --> 00:39:18,606
who had cached this response
finally reboots their computer

968
00:39:18,606 --> 00:39:19,866
or quits their browsers.

969
00:39:19,866 --> 00:39:21,956
So these are the kinds of
corner cases to be mindful

970
00:39:21,956 --> 00:39:24,856
of especially when you care
ever so much about uptime

971
00:39:24,856 --> 00:39:27,386
and making sure your
users don't hit dead ends.
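
One way to act on that caution, offered as a suggestion rather than something stated in the lecture: while the rule is still being tested, use R=302, which browsers typically do not cache, and switch to R=301 only once the target is settled. The rule below is the same reconstructed sketch with only the flag changed.

```apache
# Hypothetical testing variant: a temporary redirect, so a mistake is not
# remembered by browsers the way a cached 301 can be.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\.cs75\.net [NC]
RewriteRule (.*) https://www.cs75.net/$1 [R=302,L]
```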

972
00:39:27,736 --> 00:39:29,986
L, probably won't guess
this, this means last.

973
00:39:30,346 --> 00:39:31,706
This just means if
you have a whole bunch

974
00:39:31,706 --> 00:39:34,286
of these rewrite conditions
and rules in the same file,

975
00:39:34,746 --> 00:39:36,366
this is just a way of
saying, "That's it.

976
00:39:36,616 --> 00:39:38,656
Don't bother processing
anything else in the file.

977
00:39:38,656 --> 00:39:40,426
We want this redirect
to kick in first."

978
00:39:41,226 --> 00:39:42,416
So find fault with this.

979
00:39:42,416 --> 00:39:44,736
I'm kind of looking
at my own line here,

980
00:39:44,906 --> 00:39:48,346
and there's technically a bug
even though it's not likely ever

981
00:39:48,346 --> 00:39:49,226
to be encountered.

982
00:39:49,496 --> 00:39:51,246
How could I have been
a little more rigorous

983
00:39:51,246 --> 00:39:55,056
with defining this do you think?

984
00:39:55,056 --> 00:39:56,976
Specifically, I'm thinking
about my pattern matching.

985
00:39:57,116 --> 00:39:59,346
It's not quite as
robust or correct

986
00:39:59,346 --> 00:40:01,306
as I think it probably
should be,

987
00:40:01,306 --> 00:40:02,546
if you want to be
really nit-picky.

988
00:40:03,486 --> 00:40:04,366
Yeah. With--

989
00:40:05,246 --> 00:40:08,926
>> I don't know that would help

990
00:40:09,176 --> 00:40:15,146
if you could add
HTTP in front of www.

991
00:40:15,146 --> 00:40:15,706
>> HTTP in front of www.

992
00:40:15,706 --> 00:40:16,866
Oh, good-- so good
thought, but to put HTTP there,

993
00:40:16,866 --> 00:40:18,166
it would actually break it.

994
00:40:18,326 --> 00:40:21,126
Because HTTP_HOST, the
variable, by definition,

995
00:40:21,126 --> 00:40:22,796
and you can only know this
by reading the manual,

996
00:40:22,996 --> 00:40:25,516
does not contain the protocol,
it only contains the host.

997
00:40:25,856 --> 00:40:28,646
How about if I point
at the end here?

998
00:40:29,686 --> 00:40:31,006
What could I be doing better?

999
00:40:31,626 --> 00:40:31,896
Yeah.

1000
00:40:32,076 --> 00:40:34,196
>> The slash at the end.

1001
00:40:34,196 --> 00:40:34,986
>> So good thought too.

1002
00:40:34,986 --> 00:40:37,826
Slash also though doesn't belong
because it's part of the path

1003
00:40:37,826 --> 00:40:39,796
and host is literally
just the host.

1004
00:40:40,056 --> 00:40:41,036
But it is something there.

1005
00:40:41,186 --> 00:40:43,016
If you are familiar with
regular expressions,

1006
00:40:43,016 --> 00:40:43,516
it could be--

1007
00:40:43,796 --> 00:40:45,366
>> Sets, I think it corresponds,

1008
00:40:45,816 --> 00:40:47,126
gives [inaudible]
toward the end.

1009
00:40:47,346 --> 00:40:49,356
>> Exactly, and for
some crazy reason,

1010
00:40:49,356 --> 00:40:50,426
you would like to think that--

1011
00:40:50,426 --> 00:40:51,426
or it'd be a nice world

1012
00:40:51,426 --> 00:40:54,296
if the caret symbol represented
both the beginning and the end

1013
00:40:54,296 --> 00:40:56,326
of a string, but the
world chose dollar sign.

1014
00:40:56,616 --> 00:41:00,656
So, I should really put a
dollar sign after the T here,

1015
00:41:00,936 --> 00:41:01,956
because that would mean,

1016
00:41:01,956 --> 00:41:05,896
you have to literally
match NET and that's it.

1017
00:41:06,226 --> 00:41:07,466
Now, why is that relevant?

1018
00:41:08,476 --> 00:41:11,276
Well, it's probably not that
relevant because I do not know

1019
00:41:11,276 --> 00:41:15,266
of any top level domains
that exist today that are--

1020
00:41:15,266 --> 00:41:18,806
that start with NET and
have more letters after them.

1021
00:41:19,056 --> 00:41:20,626
But there's this
trendency [phonetic] now

1022
00:41:20,626 --> 00:41:22,896
where the world is
creating much bigger names.

1023
00:41:23,096 --> 00:41:25,096
And in fact if you
pay like $100,000,

1024
00:41:25,096 --> 00:41:27,226
you can get .google or .apple.

1025
00:41:27,226 --> 00:41:29,946
But someone could get
.networksolutions.

1026
00:41:30,156 --> 00:41:32,526
And as soon as we
do that, then again,

1027
00:41:32,526 --> 00:41:34,186
the pattern match
is not quite right.

1028
00:41:34,306 --> 00:41:36,046
But again, it has no
real material effect

1029
00:41:36,046 --> 00:41:38,206
because if DNS weren't set up,
the user would never reach me.

1030
00:41:38,546 --> 00:41:40,856
But again, just a little thing
to be mindful of that is not

1031
00:41:40,856 --> 00:41:42,066
as precise as we could be.
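
The tightened condition described here would look roughly like this; only the trailing dollar sign differs from the reconstructed original.

```apache
# ^ anchors the start of the host name and $ anchors the end, so a
# hypothetical host such as www.cs75.networksolutions no longer counts
# as a match for www.cs75.net.
RewriteCond %{HTTP_HOST} !^www\.cs75\.net$ [NC]
```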

1032
00:41:43,026 --> 00:41:43,736
All right.

1033
00:41:43,776 --> 00:41:46,106
So, what is this-- OK,
that was really technical.

1034
00:41:46,556 --> 00:41:48,966
Who cares, what is
this really doing?

1035
00:41:49,636 --> 00:41:52,576
Why would the user
ever reach my website

1036
00:41:52,576 --> 00:41:55,996
and not already be
at www.cs75.net?

1037
00:41:56,616 --> 00:42:01,896
What is the point of these
three lines from a user's--

1038
00:42:02,066 --> 00:42:03,856
or really just big picture here?

1039
00:42:04,376 --> 00:42:08,566
How else could you
visit www.cs75.net?

1040
00:42:08,916 --> 00:42:11,076
Even today with
your laptops?

1041
00:42:11,176 --> 00:42:11,286
Yeah.

1042
00:42:11,666 --> 00:42:12,986
>> Use FTP.

1043
00:42:13,226 --> 00:42:15,076
>> OK, FTP but then
this won't even kick

1044
00:42:15,076 --> 00:42:19,366
in because this is just the web,
just port 80, just HTTP.

1045
00:42:19,576 --> 00:42:21,136
How else could you visit
the course's home page?

1046
00:42:21,136 --> 00:42:21,226
Yeah.

1047
00:42:21,486 --> 00:42:23,276
>> There could be an error in one

1048
00:42:23,276 --> 00:42:25,936
of the DNS servers
that [inaudible]--

1049
00:42:25,936 --> 00:42:26,126
>> OK.

1050
00:42:26,626 --> 00:42:30,066
>> Someone to your ID and
[inaudible] who doesn't intend

1051
00:42:30,496 --> 00:42:31,506
to go your actual [inaudible].

1052
00:42:31,946 --> 00:42:32,546
>> Oh, so that's good.

1053
00:42:32,546 --> 00:42:33,886
So if there's a DNS error

1054
00:42:33,886 --> 00:42:35,756
or there's just some
maliciousness going on,

1055
00:42:35,846 --> 00:42:38,626
you could be led to
our website and-- right?

1056
00:42:38,626 --> 00:42:40,906
We did this Monday, what was
this stupid little demo I did

1057
00:42:40,906 --> 00:42:44,756
on the fly that made a
certain news company look a

1058
00:42:44,756 --> 00:42:45,386
little silly?

1059
00:42:45,946 --> 00:42:49,896
>> Change the name CNN.

1060
00:42:50,316 --> 00:42:50,836
>> Yeah, right?

1061
00:42:50,836 --> 00:42:52,996
I think I had davidnews.com
all of a sudden

1062
00:42:52,996 --> 00:42:55,456
and we went there
and we stayed there.

1063
00:42:55,516 --> 00:42:57,926
And I mentioned at the time
that CNN, if they just put

1064
00:42:57,926 --> 00:43:00,616
like two lines of configuration
in the file, they could fix this

1065
00:43:00,836 --> 00:43:03,336
and immediately redirect the
user to protect their branding

1066
00:43:03,336 --> 00:43:05,736
so that it goes back
to www.cnn.com.

1067
00:43:06,126 --> 00:43:07,336
This is exactly the fix.

1068
00:43:07,616 --> 00:43:09,596
Now we're not doing it because
we're worried people are going to come

1069
00:43:09,596 --> 00:43:12,586
up with like fake cs75.com
or stupid stuff like that.

1070
00:43:12,846 --> 00:43:13,846
But just the simpler case: what

1071
00:43:13,846 --> 00:43:18,626
if they just visit
http://cs75.net, enter.

1072
00:43:19,006 --> 00:43:21,496
We just decided as a course
that like most websites

1073
00:43:21,496 --> 00:43:24,686
on the internet, we want to
standardize not on cs75.net,

1074
00:43:24,686 --> 00:43:27,716
which we want to work but we
want to redirect the users

1075
00:43:27,716 --> 00:43:31,126
so that they end
up at www.cs75.net.

1076
00:43:31,126 --> 00:43:33,406
Now why? Part of it is
just, you know, branding.

1077
00:43:33,406 --> 00:43:35,556
If you want to-- there's
something to be said for just

1078
00:43:35,556 --> 00:43:37,786
at least standardizing
what your URLs look like,

1079
00:43:37,836 --> 00:43:39,746
whether it has the www or not.

1080
00:43:40,196 --> 00:43:41,976
But more than that, we
mentioned briefly on Monday

1081
00:43:41,976 --> 00:43:44,446
and we'll revisit this in
time, the cookie issue.

1082
00:43:44,446 --> 00:43:46,536
Whereby, if you do
have a subdomain,

1083
00:43:46,536 --> 00:43:49,456
you can then isolate cookies
to be part of the www subdomain

1084
00:43:49,456 --> 00:43:51,836
and they don't have to be global

1085
00:43:51,836 --> 00:43:53,796
to your whole domain
name cs75.net.

1086
00:43:53,796 --> 00:43:56,976
So in other words, all
these lines are doing for us,

1087
00:43:56,976 --> 00:43:59,726
and these are literally the
lines we use on our website.

1088
00:43:59,986 --> 00:44:05,506
If I go to http://cs75.net,
enter, where do I end up?

1089
00:44:06,676 --> 00:44:10,786
Well a couple of places, one, I
ended up at www, just because.

1090
00:44:10,786 --> 00:44:14,896
But I also end up at the SSL
version also just because.

1091
00:44:14,896 --> 00:44:17,386
And then this, it's just
because we're using MediaWiki,

1092
00:44:17,386 --> 00:44:19,786
software that automatically
makes the default page called

1093
00:44:19,786 --> 00:44:21,226
main page for no good reason.

1094
00:44:21,706 --> 00:44:24,296
So there's a few
things going on there.

1095
00:44:24,496 --> 00:44:28,166
So you can infer from this
though, how can you enforce use

1096
00:44:28,226 --> 00:44:29,606
of SSL on your website?

1097
00:44:29,606 --> 00:44:31,976
Suppose you're a bank,
suppose you're Gmail these days

1098
00:44:31,976 --> 00:44:35,076
and you want to force
users to stay on HTTPS even

1099
00:44:35,076 --> 00:44:38,016
if they visit HTTP,
how do you do it?

1100
00:44:38,266 --> 00:44:40,026
Well, it's pretty much
the same trick here.

1101
00:44:40,196 --> 00:44:41,946
But rather than check
the host name

1102
00:44:41,946 --> 00:44:44,986
which is not the problem
now, you want to check SSL,

1103
00:44:45,286 --> 00:44:47,276
so what you can really
do in this case,

1104
00:44:47,356 --> 00:44:48,946
is instead a line like this.

1105
00:44:48,946 --> 00:44:57,146
RewriteCond HTTPS not equal On.

1106
00:44:57,206 --> 00:44:59,246
So this is a
slightly different syntax

1107
00:44:59,666 --> 00:45:01,756
but this is a different
condition we could use

1108
00:45:01,896 --> 00:45:03,146
that asks a different question.

1109
00:45:03,446 --> 00:45:07,986
If the environment variable
called HTTPS is not equal to on,

1110
00:45:08,156 --> 00:45:10,566
what's the implication?

1111
00:45:10,566 --> 00:45:11,276
It means it's off.

1112
00:45:11,726 --> 00:45:12,856
And so what should you do?

1113
00:45:12,856 --> 00:45:15,106
Well, the next line is
that same rewrite rule,

1114
00:45:15,336 --> 00:45:16,576
you will redirect the user.

1115
00:45:16,776 --> 00:45:18,856
So, this is how you enforce SSL.

1116
00:45:18,856 --> 00:45:21,426
This is one way you can
enforce SSL on a website.
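
Written out, the HTTPS check might look like the sketch below. The condition follows the spoken "HTTPS not equal on"; the rule is the same reconstructed redirect as before, and per-directory (.htaccess) context is again assumed.

```apache
RewriteEngine On

# %{HTTPS} holds "on" for an SSL/TLS connection and "off" otherwise.
# "!=on" is a plain string comparison (not a regex), negated by the "!".
RewriteCond %{HTTPS} !=on

# Send the user to the same path over https.
RewriteRule (.*) https://www.cs75.net/$1 [R=301,L]
```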

1117
00:45:21,756 --> 00:45:21,856
Yeah.

1118
00:45:22,296 --> 00:45:26,016
>> And so this checks for
every page to say somewhat

1119
00:45:26,096 --> 00:45:28,556
about [inaudible]
.com slash banking--

1120
00:45:28,556 --> 00:45:28,976
>> Exactly.

1121
00:45:28,976 --> 00:45:31,776
>> -- but still work
[inaudible] send to the HTTPS.

1122
00:45:31,946 --> 00:45:32,486
>> Exactly.

1123
00:45:32,486 --> 00:45:36,476
This will work for every page
on the website because we had

1124
00:45:36,476 --> 00:45:40,006
that additional use of the
capturing parentheses to ensure

1125
00:45:40,006 --> 00:45:42,106
that they don't just go back
to the generic home page,

1126
00:45:42,106 --> 00:45:44,036
which is just annoying at
least in my experience,

1127
00:45:44,536 --> 00:45:47,806
but rather they go to slash
whatever they were at.

1128
00:45:47,806 --> 00:45:49,826
And this gets installed,
to be clear, either

1129
00:45:49,826 --> 00:45:51,846
in that file called httpd.conf.

1130
00:45:52,176 --> 00:45:56,696
But as you'll also see, there are
per-directory configuration

1131
00:45:56,696 --> 00:46:00,496
files that Apache supports
called .htaccess files,

1132
00:46:00,716 --> 00:46:04,626
literally just a text file
called period H-T-A-C-C-E-S-S.

1133
00:46:05,156 --> 00:46:07,306
And that syntax looks
very similar to this.

1134
00:46:07,806 --> 00:46:11,366
But, you can't necessarily do
everything in an .htaccess file

1135
00:46:11,366 --> 00:46:13,156
that you can in the main
server configuration.

1136
00:46:13,156 --> 00:46:15,706
It depends on whether people like
us, the system administrators

1137
00:46:15,706 --> 00:46:19,716
of a website let you put
certain commands in a directory.

1138
00:46:20,116 --> 00:46:23,326
So, you can use .htaccess files
to password-protect directories,

1139
00:46:23,326 --> 00:46:26,736
for instance, or to change MIME types,

1140
00:46:26,736 --> 00:46:28,826
so to speak some
fairly arcane details.

1141
00:46:29,126 --> 00:46:30,666
But this is one of
the most compelling.

1142
00:46:31,496 --> 00:46:32,756
And there's actually
another one.

1143
00:46:33,886 --> 00:46:35,276
Facebook, if you're a user.

1144
00:46:35,426 --> 00:46:37,156
Almost, many of the URLs end

1145
00:46:37,306 --> 00:46:42,126
in what file extension
as we said on Monday?

1146
00:46:42,126 --> 00:46:44,166
So, .php just because, like,
for historical reasons,

1147
00:46:44,166 --> 00:46:46,566
they still use PHP for a lot
of their front end stuff,

1148
00:46:46,606 --> 00:46:47,736
but there's no technical reason

1149
00:46:47,736 --> 00:46:50,896
to expose what language
you're using on your server.

1150
00:46:50,896 --> 00:46:54,086
In fact, it feels like it's just
a waste of four bytes, right?

1151
00:46:54,086 --> 00:46:57,546
Why bother sending .php when
it's strictly not necessary.

1152
00:46:57,706 --> 00:47:00,436
And frankly it's very
web 2.0 these days

1153
00:47:00,436 --> 00:47:03,206
to have cleaner URLs,
prettier URLs.

1154
00:47:03,206 --> 00:47:05,286
They just don't have cruft
like file extensions.

1155
00:47:05,626 --> 00:47:09,946
These httpd.conf and also
.htaccess files can also be used

1156
00:47:10,166 --> 00:47:14,676
to let you avoid ever
putting .php in your URLs.

1157
00:47:14,836 --> 00:47:17,326
Your files on your hard drive
can still be called hello.php

1158
00:47:17,326 --> 00:47:21,996
but the user could just visit
/hello and, using mod_rewrite,

1159
00:47:21,996 --> 00:47:23,626
you can essentially
tell the web server

1160
00:47:23,816 --> 00:47:28,266
if the file /hello does not
exist, look for /hello.php.

1161
00:47:28,266 --> 00:47:31,196
And if that exists,
serve that up instead.
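
A common way to express that idea, given here as a sketch rather than the course's own configuration, and assuming the lines live in an .htaccess file next to hello.php:

```apache
RewriteEngine On

# Only act when the requested path is not an existing file or directory ...
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d

# ... and a file with the same name plus .php does exist.
RewriteCond %{REQUEST_FILENAME}.php -f

# Internal rewrite (no R flag): Apache serves hello.php while the
# browser's address bar still shows /hello.
RewriteRule ^(.+)$ $1.php [L]
```

Because there is no R flag, no redirect is sent to the browser; the mapping from /hello to hello.php happens entirely on the server.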

1162
00:47:31,636 --> 00:47:31,846
Yeah.

1163
00:47:31,846 --> 00:47:33,496
>> No, nothing.

1164
00:47:34,026 --> 00:47:36,536
>> OK. So, lot of power.

1165
00:47:36,706 --> 00:47:39,096
I will say too, this
is one of the things

1166
00:47:39,096 --> 00:47:42,006
that frustrates some people
including myself the most

1167
00:47:42,146 --> 00:47:44,456
because the slightest
syntax error anywhere,

1168
00:47:44,656 --> 00:47:46,406
if you get the permissions
of the file wrong,

1169
00:47:46,406 --> 00:47:47,306
your whole website can break.

1170
00:47:47,616 --> 00:47:50,756
So, it's a lot of power and a
lot of trial and error and a lot

1171
00:47:50,756 --> 00:47:52,566
of googling sometimes
to solve these problems.

1172
00:47:53,096 --> 00:47:54,316
All right.

1173
00:47:54,796 --> 00:47:55,456
Any questions?

1174
00:47:56,166 --> 00:47:58,476
No? All right.

1175
00:47:58,476 --> 00:48:00,666
So, where can you use
stuff like this?

1176
00:48:00,666 --> 00:48:03,146
Well, next week, when we start
talking about the first project,

1177
00:48:03,146 --> 00:48:05,466
we'll introduce this
appliance, this virtual machine

1178
00:48:05,696 --> 00:48:07,926
in which you have your own
version of Apache running.

1179
00:48:08,246 --> 00:48:11,186
But-- And certainly after the
course or even during the course

1180
00:48:11,186 --> 00:48:13,916
if you want to experiment
with other approaches,

1181
00:48:13,916 --> 00:48:17,076
it's actually very easy to get
LAMP onto your own computer.

1182
00:48:17,076 --> 00:48:18,656
You don't need to pay for
a web host, you don't need

1183
00:48:18,656 --> 00:48:20,026
to set up a Linux computer.

1184
00:48:20,026 --> 00:48:21,446
You can do it on
your own Mac or PC.

1185
00:48:21,566 --> 00:48:25,536
In fact Mac OS these days comes
with Apache, comes with PHP,

1186
00:48:25,616 --> 00:48:28,376
comes with Python, comes
with Perl, a lot of support

1187
00:48:28,426 --> 00:48:31,176
for web programming
related stuff built

1188
00:48:31,176 --> 00:48:33,546
in even though you sometimes
have to run some commands

1189
00:48:33,546 --> 00:48:34,526
to actually enable it.

1190
00:48:34,526 --> 00:48:37,466
Your laptop is not a web server
by default even though Apache is

1191
00:48:37,466 --> 00:48:38,266
in there if it's a Mac.

1192
00:48:38,736 --> 00:48:39,896
Windows tends not to come

1193
00:48:39,896 --> 00:48:42,546
with as much software along
these lines but either way,

1194
00:48:42,546 --> 00:48:44,576
there are some packages,
this is one of them XAMPP

1195
00:48:44,836 --> 00:48:48,686
that makes it pretty easy
to make a web environment

1196
00:48:48,686 --> 00:48:50,356
on your own computer
not necessarily

1197
00:48:50,356 --> 00:48:52,266
for serving content
to real users.

1198
00:48:52,586 --> 00:48:54,896
We had that discussion on Monday
that, you know, getting users

1199
00:48:54,896 --> 00:48:57,506
from the outside world to your
home with your cable modem

1200
00:48:57,506 --> 00:48:59,046
and all that, it's not trivial

1201
00:48:59,046 --> 00:49:00,956
and your ISP might not
even let you or like,

1202
00:49:01,576 --> 00:49:03,106
but for development purposes.

1203
00:49:03,106 --> 00:49:05,716
You don't need an
actual web server per se.

1204
00:49:05,716 --> 00:49:08,226
You don't need to pay anyone
to start doing web development.

1205
00:49:08,226 --> 00:49:10,166
You can do it on your
own local hard drive even

1206
00:49:10,166 --> 00:49:13,596
if it's not static content, HTML
files but it's actually dynamic

1207
00:49:13,856 --> 00:49:15,466
with something like PHP.

1208
00:49:15,466 --> 00:49:20,086
So, XAMPP is just the
product name for free software

1209
00:49:20,326 --> 00:49:24,936
that includes support for Linux,
Mac OS, Solaris, and Windows.

1210
00:49:24,936 --> 00:49:27,426
So, it doesn't matter what
OS you have and it installs

1211
00:49:27,426 --> 00:49:30,466
for you Apache, MySQL,
PHP and also even Perl

1212
00:49:30,716 --> 00:49:32,726
which is the other
P in LAMP sometimes.

1213
00:49:32,966 --> 00:49:35,676
Or actually no, that's
the P in XAMPP in LAMP.

1214
00:49:35,676 --> 00:49:36,526
So, what does this mean?

1215
00:49:36,526 --> 00:49:39,916
It means you go to their website
which is, you just google XAMPP

1216
00:49:40,016 --> 00:49:41,096
to pull up their page.

1217
00:49:41,406 --> 00:49:42,966
You can install the software.

1218
00:49:42,966 --> 00:49:46,316
And ideally, you then have
some nice documentation locally

1219
00:49:46,506 --> 00:49:48,526
and your own database,
your own web server,

1220
00:49:48,526 --> 00:49:49,326
your own installation of PHP

1221
00:49:49,326 --> 00:49:51,906
so you can do all your
development locally,

1222
00:49:51,906 --> 00:49:53,816
which is nice because
it's super fast.

1223
00:49:53,816 --> 00:49:55,536
And it means you can work
in a cafe or what not

1224
00:49:55,536 --> 00:49:57,726
without even having
internet access.

1225
00:49:58,116 --> 00:49:58,976
There are some corner cases.

1226
00:49:59,046 --> 00:50:01,626
XAMPP hasn't been the easiest
historically to set up.

1227
00:50:01,666 --> 00:50:04,566
Sometimes it does not quite
work on everyone's computers,

1228
00:50:04,566 --> 00:50:06,666
which is why we actually
transitioned to the VM approach

1229
00:50:06,816 --> 00:50:07,866
where we can guarantee

1230
00:50:07,866 --> 00:50:10,786
that everyone's environment is
the same and works correctly.

1231
00:50:11,156 --> 00:50:13,886
But certainly moving forward
when you no longer want to rely

1232
00:50:13,926 --> 00:50:17,206
on course-provided software,
realize this is a nice local

1233
00:50:17,316 --> 00:50:18,736
development option as well.

1234
00:50:18,816 --> 00:50:22,556
And similarly, you can configure
most anything you would like.

1235
00:50:23,046 --> 00:50:24,856
Any questions then?

1236
00:50:24,856 --> 00:50:25,656
All right.

1237
00:50:26,336 --> 00:50:29,576
It feels like a good point
to take a five-minute break

1238
00:50:29,576 --> 00:50:32,086
and when we return, why
don't we dive into PHP

1239
00:50:32,086 --> 00:50:33,756
and actually finish
the back end

1240
00:50:33,756 --> 00:50:35,396
of something like google.com.

1241
00:50:35,476 --> 00:50:40,266
So let's take five.

1242
00:50:41,356 --> 00:50:42,746
All right, we're back.

1243
00:50:42,746 --> 00:50:46,056
So just a couple of
details, you should have

1244
00:50:46,056 --> 00:50:49,016
or should soon receive
an email invitation

1245
00:50:49,056 --> 00:50:50,516
from the course's
discussion tool.

1246
00:50:50,776 --> 00:50:52,266
We'll post a link
and announcement

1247
00:50:52,266 --> 00:50:54,316
on the course's home page to
explain where to go and how

1248
00:50:54,316 --> 00:50:56,966
to go if you do not receive such
a link but it would have gone

1249
00:50:56,966 --> 00:50:59,396
to this e-mail to
the e-mail address

1250
00:50:59,396 --> 00:51:01,206
with which you registered
for the course, FYI,

1251
00:51:01,346 --> 00:51:03,236
in case that's not an address
you use quite commonly.

1252
00:51:03,236 --> 00:51:06,066
But again, more details on the
course's home page by tomorrow.

1253
00:51:06,226 --> 00:51:09,276
Let me introduce another
of the course's TFs, Allan,

1254
00:51:09,416 --> 00:51:11,726
who if you would not mind coming
up close to my microphone,

1255
00:51:11,726 --> 00:51:12,986
would like to say
hello to the class.

1256
00:51:13,166 --> 00:51:13,606
>> Hi everyone.

1257
00:51:13,896 --> 00:51:14,996
My name is Allan.

1258
00:51:14,996 --> 00:51:15,736
You can call me Allen.

1259
00:51:16,656 --> 00:51:19,956
It's-- These are for you and
I'm here to take your questions

1260
00:51:19,956 --> 00:51:21,146
and help you out
with anything you--

1261
00:51:21,486 --> 00:51:22,546
>> OK. Excellent.

1262
00:51:23,106 --> 00:51:26,236
And Peter, whom we met on Monday,
will be back shortly this

1263
00:51:26,236 --> 00:51:28,976
evening and once lecture
wraps, we'll dive into section.

1264
00:51:28,976 --> 00:51:30,036
Which again will
be an opportunity

1265
00:51:30,036 --> 00:51:33,516
for slightly more intimate
Q&A to go over concepts

1266
00:51:33,546 --> 00:51:35,396
that might be a little
more abstract

1267
00:51:35,396 --> 00:51:38,096
and particularly once the
first project is released

1268
00:51:38,096 --> 00:51:40,526
which will be on July 9th is
when the first one will go out,

1269
00:51:40,726 --> 00:51:42,736
it will be an opportunity
particularly to focus

1270
00:51:42,736 --> 00:51:44,626
on the project and get
direction and guidance

1271
00:51:44,626 --> 00:51:46,016
and design tips on them.

1272
00:51:46,086 --> 00:51:47,506
So, more on that to come.

1273
00:51:47,926 --> 00:51:48,326
All right.

1274
00:51:48,486 --> 00:51:50,806
So, time for some PHP.

1275
00:51:50,806 --> 00:51:54,056
So recall that we
talked briefly about some

1276
00:51:54,056 --> 00:51:56,506
of the basic UI mechanisms
that browsers allow.

1277
00:51:56,506 --> 00:52:00,166
Radio buttons, text
fields, text areas,

1278
00:52:00,166 --> 00:52:01,106
checkboxes and the like.

1279
00:52:01,106 --> 00:52:01,956
And these really are going

1280
00:52:01,956 --> 00:52:04,016
to be the fundamental
mechanisms whereby we go

1281
00:52:04,016 --> 00:52:08,126
from static web sites with just
HTML and CSS to dynamic websites

1282
00:52:08,126 --> 00:52:10,446
with some kind of
server side intelligence

1283
00:52:10,686 --> 00:52:12,466
that does something
based on user input

1284
00:52:12,636 --> 00:52:14,916
to produce dynamic user output.

1285
00:52:15,176 --> 00:52:18,036
So these days, thankfully the
web is getting more interesting

1286
00:52:18,036 --> 00:52:21,916
and sexier than some of these
more old school UI mechanisms.

1287
00:52:21,916 --> 00:52:24,436
But even the fanciest
of autocomplete widgets

1288
00:52:24,436 --> 00:52:26,616
that you see, and
calendaring things

1289
00:52:26,616 --> 00:52:28,316
where you can choose
calendar dates and the

1290
00:52:28,316 --> 00:52:30,416
like are still built
on top of these

1291
00:52:30,416 --> 00:52:32,116
but all the more
stylized these days

1292
00:52:32,116 --> 00:52:33,566
with JavaScript and with CSS.

1293
00:52:33,696 --> 00:52:35,696
And so we'll look at
some of that fancier use

1294
00:52:35,696 --> 00:52:38,966
of input mechanisms in a
few weeks when we get to AJAX

1295
00:52:38,966 --> 00:52:40,846
and JavaScript itself.

1296
00:52:40,846 --> 00:52:43,316
So here is a representative
snippet of Google.

1297
00:52:43,316 --> 00:52:44,446
Recall that on Monday,

1298
00:52:44,446 --> 00:52:46,856
we started implementing the same
interface even though it was all

1299
00:52:46,856 --> 00:52:47,916
black and white in text.

1300
00:52:48,146 --> 00:52:50,516
But we did have a text box and
we did have a couple of buttons

1301
00:52:50,516 --> 00:52:52,956
and when you click that submit
button, you actually ended

1302
00:52:52,956 --> 00:52:54,526
up initially nowhere, right?

1303
00:52:54,526 --> 00:52:56,996
We ended up on my same file,
which is not dynamic at all.

1304
00:52:57,146 --> 00:52:59,776
But then I went in and
changed the action attribute

1305
00:52:59,776 --> 00:53:01,326
so that we actually
submitted to Google,

1306
00:53:01,506 --> 00:53:02,996
so technically we
cut some corners

1307
00:53:02,996 --> 00:53:05,046
and didn't implement a
dynamic website ourselves

1308
00:53:05,046 --> 00:53:08,446
but we did look at the basic
mechanism whereby form input

1309
00:53:08,476 --> 00:53:13,516
becomes a GET request or, as an
alternative to GET, a POST.

1310
00:53:13,926 --> 00:53:15,256
For those familiar, what are--

1311
00:53:15,256 --> 00:53:17,196
what is one or more of the
fundamental differences

1312
00:53:17,196 --> 00:53:18,836
between using GET versus POST?

1313
00:53:18,836 --> 00:53:19,246
Yeah.

1314
00:53:19,246 --> 00:53:22,476
>> Oh, GET is actually going
to include what you entered

1315
00:53:22,636 --> 00:53:25,516
in the form of the URL.

1316
00:53:25,516 --> 00:53:25,666
>> OK.

1317
00:53:25,666 --> 00:53:27,916
>> And POST is just
not going to do that.

1318
00:53:27,916 --> 00:53:28,566
>> OK. Excellent.

1319
00:53:28,566 --> 00:53:32,946
So GET requests will have state
change in the URL itself.

1320
00:53:32,946 --> 00:53:35,176
And that's exactly what we
saw on Monday with the Google

1321
00:53:35,176 --> 00:53:37,356
where we had question
mark, what came next?

1322
00:53:38,356 --> 00:53:39,276
Question mark--

1323
00:53:39,886 --> 00:53:39,986
>> Q.

1324
00:53:39,986 --> 00:53:43,026
>> -- Q equals whatever--
harvard whatever I tap in--

1325
00:53:43,116 --> 00:53:44,466
type in or the user types in.

1326
00:53:44,756 --> 00:53:46,426
So POST does not do that.
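
A small Python sketch of that distinction, using the standard library's `urllib.parse`. The `/search` path is an assumption for illustration, and the POST half is hypothetical, since the lecture notes that Google itself does not accept POST here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

form = {"q": "harvard"}  # what the user typed into the text box

# GET: the form data is appended to the URL after a question mark.
get_url = "https://www.google.com/search?" + urlencode(form)
print(get_url)  # https://www.google.com/search?q=harvard

# POST: the URL stays clean; the same encoded data travels in the request body.
post_url = "https://www.google.com/search"
post_body = urlencode(form)
print(post_url, post_body)

# The receiving side can recover the parameters from the GET URL alone.
print(parse_qs(urlsplit(get_url).query))  # {'q': ['harvard']}
```

Either way the server receives the same key and value; what differs is where they travel.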

1327
00:53:46,886 --> 00:53:49,206
So, that's a nice
distinction, but what's--

1328
00:53:49,206 --> 00:53:50,456
what are some more distinctions

1329
00:53:50,456 --> 00:53:52,866
or what would motivate
you using GET versus POST

1330
00:53:52,866 --> 00:53:55,356
if functionally they
could be the same.

1331
00:53:55,636 --> 00:53:57,436
You could still get
search results

1332
00:53:57,436 --> 00:53:59,016
for instance even though Google

1333
00:53:59,016 --> 00:54:00,486
as an aside does
not support POST.

1334
00:54:01,656 --> 00:54:03,686
What's the-- What else
should drive you to GET

1335
00:54:03,686 --> 00:54:06,556
versus POST or vice versa?

1336
00:54:07,176 --> 00:54:07,256
Yeah.

1337
00:54:08,456 --> 00:54:11,086
>> Well if you're on the
site that tends to deal

1338
00:54:11,086 --> 00:54:16,126
with uploads, then
you'd want to use POST,

1339
00:54:16,126 --> 00:54:18,076
which has special ways
to deal with large files--

1340
00:54:18,076 --> 00:54:18,786
>> Excellent.

1341
00:54:18,786 --> 00:54:21,096
Yeah. So GET requests are
not so great for things

1342
00:54:21,096 --> 00:54:23,246
like file uploads,
photo uploads, right?

1343
00:54:23,296 --> 00:54:25,746
If anything conceptually,
this just makes no sense,

1344
00:54:25,746 --> 00:54:27,696
how do you upload
a file in a URL.

1345
00:54:27,696 --> 00:54:30,426
Now technically, you can encode
it using something called base64

1346
00:54:30,426 --> 00:54:34,316
encoding where you convert the
binary image of zeros and ones

1347
00:54:34,316 --> 00:54:36,826
to As and Bs and Cs and
1, 2, 3s and so forth.
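
To make the base64 idea concrete, here is a minimal Python sketch with the standard `base64` module. The four sample bytes are the usual opening bytes of a JFIF JPEG file, and the 3000-byte buffer is an arbitrary stand-in for a larger file.

```python
import base64

raw = bytes([0xFF, 0xD8, 0xFF, 0xE0])  # opening bytes of a JFIF JPEG file

# Binary in, plain letters, digits and a few symbols out.
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)  # /9j/4A==

# The encoding is reversible, so the server can get the original bytes back.
print(base64.b64decode(encoded) == raw)  # True

# Every 3 bytes become 4 characters, so the text is a third larger than the file.
print(len(base64.b64encode(bytes(3000))))  # 4000
```

That growth is part of why a URL is a poor place for a file: even a tiny image needs thousands of characters.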

1348
00:54:37,086 --> 00:54:39,516
But the other gotcha is that
most browsers have a limit

1349
00:54:39,866 --> 00:54:41,506
on the maximum length
of the URL.

1350
00:54:41,756 --> 00:54:43,586
Unfortunately, this
is not standardized

1351
00:54:43,586 --> 00:54:45,196
and it's barely even documented.

1352
00:54:45,196 --> 00:54:46,376
But the rough rule of thumb is

1353
00:54:46,376 --> 00:54:48,946
if your URL is several
hundred characters long,

1354
00:54:48,946 --> 00:54:49,986
it's probably too long.

1355
00:54:50,196 --> 00:54:53,356
And a reasonable cut off is
something like 1024 characters.

1356
00:54:53,356 --> 00:54:54,596
You're definitely
pushing your limits.

1357
00:54:54,876 --> 00:54:56,646
However, it's completely
browser dependent.

1358
00:54:56,686 --> 00:54:59,176
Some browsers support
8000-character URLs,

1359
00:54:59,176 --> 00:55:02,616
1000-character URLs but the
take away is that, really,

1360
00:55:02,616 --> 00:55:04,406
you have to deal with
lowest common denominator,

1361
00:55:04,406 --> 00:55:05,286
whatever that is.

1362
00:55:05,416 --> 00:55:09,066
And so anytime your URLs start
getting long, it's probably time

1363
00:55:09,066 --> 00:55:11,806
to rethink your design and start
using something called AJAX,

1364
00:55:11,806 --> 00:55:14,266
which again we'll
look at or using POST.

1365
00:55:14,266 --> 00:55:15,536
POST does not have a limit.

1366
00:55:16,066 --> 00:55:20,556
In fact, one of the upsides of
POST is that it, in HTTP headers,

1367
00:55:20,756 --> 00:55:24,916
will tell the server how big
the file or parameters are

1368
00:55:24,916 --> 00:55:27,956
that are being posted, so to
speak, so that the server knows

1369
00:55:27,996 --> 00:55:29,556
when it's received everything.

1370
00:55:29,556 --> 00:55:30,806
So the browser figures out,

1371
00:55:30,806 --> 00:55:33,646
OK, this is like a 5
megabyte photo, so I'm going

1372
00:55:33,646 --> 00:55:36,946
to tell the web server through
the headers, expect 5 megabytes.

1373
00:55:37,156 --> 00:55:40,376
And then what the server gets
below all the headers is all

1374
00:55:40,376 --> 00:55:43,856
the crazy zeros and ones or
equivalently A, B, Cs, 1, 2,

1375
00:55:43,886 --> 00:55:45,536
3s but it knows where they stop.

1376
00:55:45,536 --> 00:55:47,476
So it knows when it's
received the whole photo.
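
A simplified Python sketch of how a Content-Length header lets the server know where the body stops. The host `example.com`, the path `/upload.php`, and both helper functions are hypothetical; real form uploads typically use multipart encoding, which this sketch leaves out.

```python
def build_post(host, path, body, content_type):
    """Assemble a minimal HTTP/1.1 POST request as raw bytes."""
    headers = [
        f"POST {path} HTTP/1.1",
        f"Host: {host}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(body)}",  # tells the server where the body stops
    ]
    return "\r\n".join(headers).encode("ascii") + b"\r\n\r\n" + body


def read_body(request):
    """Server side: skip past the headers, then read exactly Content-Length bytes."""
    head, _, rest = request.partition(b"\r\n\r\n")
    length = 0
    for line in head.split(b"\r\n")[1:]:  # [1:] skips the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    return rest[:length]


photo = bytes(range(256)) * 4  # 1024 stand-in bytes instead of a 5-megabyte photo
request = build_post("example.com", "/upload.php", photo, "application/octet-stream")

# Even with extra bytes after the body, the server stops at the declared length.
print(read_body(request + b"trailing junk") == photo)  # True
```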

1377
00:55:47,726 --> 00:55:48,816
So POST is great for that,

1378
00:55:48,816 --> 00:55:51,186
what else is POST
compelling for?

1379
00:55:51,186 --> 00:55:53,656
What other use cases
besides file uploads?

1380
00:55:59,476 --> 00:56:01,086
And put on your paranoid hat.

1381
00:56:01,086 --> 00:56:03,376
If you're using GET,
what are you at risk for?

1382
00:56:03,956 --> 00:56:04,036
Yeah.

1383
00:56:05,016 --> 00:56:08,966
>> Somebody is actually
sniffing what the user sends.

1384
00:56:09,296 --> 00:56:09,656
>> Perfect.

1385
00:56:09,656 --> 00:56:12,126
So if-- and what might the user
send that could be sensitive?

1386
00:56:12,606 --> 00:56:14,956
>> I mean, you wouldn't really
send the password or a--

1387
00:56:14,956 --> 00:56:15,216
>> Good.

1388
00:56:15,286 --> 00:56:16,666
>> -- username with
the GET request.

1389
00:56:16,666 --> 00:56:17,056
>> OK, good.

1390
00:56:17,056 --> 00:56:19,516
So sending user names,
passwords, credit card numbers,

1391
00:56:19,516 --> 00:56:22,266
anything that's arguably
sensitive probably should not be

1392
00:56:22,266 --> 00:56:23,906
submitted by a GET
because it ends

1393
00:56:23,906 --> 00:56:25,966
up in the URL, and
why is that bad?

1394
00:56:25,966 --> 00:56:28,256
Well fundamentally, it's still
being sent to the web server

1395
00:56:28,256 --> 00:56:30,336
and if it's over SSL,
it's at least encrypted.

1396
00:56:30,556 --> 00:56:33,076
However, it's not encrypted
from your family members

1397
00:56:33,076 --> 00:56:35,846
or your friends or your
roommates who might sit

1398
00:56:35,846 --> 00:56:37,036
down at your same computer.

1399
00:56:37,036 --> 00:56:39,436
And you know what you can do
with most browsers today? Browse

1400
00:56:39,436 --> 00:56:40,566
through the history, right?

1401
00:56:40,566 --> 00:56:43,096
And if it's in the URL, that
means it's going to get logged

1402
00:56:43,096 --> 00:56:44,546
and it's going to end
up in autocomplete

1403
00:56:44,546 --> 00:56:46,316
until the cache is
manually cleared.

1404
00:56:46,576 --> 00:56:48,566
It's just too easy then
for someone to find it.

1405
00:56:48,566 --> 00:56:50,046
And it's also going to
end up somewhere else.

1406
00:56:50,956 --> 00:56:53,306
Even though it might be
transmitted over SSL,

1407
00:56:53,306 --> 00:56:56,056
so random people on the internet
or Starbucks can't see it,

1408
00:56:56,746 --> 00:56:59,726
once the server gets the
request, many servers as we--

1409
00:56:59,856 --> 00:57:03,356
you can maybe infer from the
httpd.conf configuration file

1410
00:57:03,356 --> 00:57:05,036
out there have logs.

1411
00:57:05,256 --> 00:57:06,966
And what tends to
get logged in logs?

1412
00:57:06,996 --> 00:57:09,636
Not POSTs, because they could be
huge, 5 megabytes and what not.

1413
00:57:09,936 --> 00:57:12,306
But typically what
are logged in logs?

1414
00:57:13,296 --> 00:57:15,976
GET requests, including
the URL that was visited.

1415
00:57:16,306 --> 00:57:18,736
Which means any website
that's ever used GET

1416
00:57:18,736 --> 00:57:21,986
for password authentication
or credit card submission--

1417
00:57:21,986 --> 00:57:24,056
which would be rare but
could happen especially

1418
00:57:24,056 --> 00:57:25,336
if the person does not
know what they are doing--

1419
00:57:25,706 --> 00:57:26,916
it's ending up in the logs.

1420
00:57:26,916 --> 00:57:30,166
Which means some random person's
unencrypted log files has all

1421
00:57:30,166 --> 00:57:31,646
of your sensitive information.
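
A Python sketch of why that matters. The log format here is a simplified, hypothetical one, loosely modeled on the request line that web servers commonly record; the IP address, path, and credentials are made up.

```python
def access_log_line(client_ip, method, target, status):
    """Format a simplified access-log entry: the request line, but never the body."""
    return f'{client_ip} "{method} {target} HTTP/1.1" {status}'


# Login submitted with GET: the credentials are part of the URL, so they get logged.
get_line = access_log_line(
    "203.0.113.7", "GET", "/login.php?user=alice&password=hunter2", 200
)

# Login submitted with POST: the credentials travel in the body, which is not logged.
post_line = access_log_line("203.0.113.7", "POST", "/login.php", 200)

print(get_line)
print(post_line)
print("hunter2" in get_line, "hunter2" in post_line)  # True False
```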

1422
00:57:31,646 --> 00:57:33,596
So in short, anytime
something is big

1423
00:57:33,596 --> 00:57:34,856
or anytime something
is sensitive,

1424
00:57:35,136 --> 00:57:36,496
GET is not the way to go.

1425
00:57:36,806 --> 00:57:39,156
However, that would
seem that's just fine,

1426
00:57:39,156 --> 00:57:40,536
just use POST all
the time, right?

1427
00:57:40,536 --> 00:57:41,676
Just avoid all these
issues altogether.

1428
00:57:41,676 --> 00:57:43,026
I do not have to remember
what the difference is.

1429
00:57:43,916 --> 00:57:46,326
But what's the downside
of using POST?

1430
00:57:47,016 --> 00:57:49,956
Based on your own, maybe even
non-technical user experience,

1431
00:57:50,506 --> 00:57:51,286
what's the downside?

1432
00:57:51,816 --> 00:57:51,896
Yeah.

1433
00:57:56,676 --> 00:57:59,486
>> Can't copy-paste the URL--

1434
00:57:59,676 --> 00:57:59,926
>> Yeah.

1435
00:57:59,926 --> 00:58:00,366
>> -- available [inaudible].

1436
00:58:01,306 --> 00:58:01,666
>> Perfect.

1437
00:58:01,666 --> 00:58:02,996
You can't copy-paste the URL.

1438
00:58:02,996 --> 00:58:04,856
Completely reasonable
concern especially

1439
00:58:04,856 --> 00:58:06,466
from the user experience
user perspective.

1440
00:58:06,466 --> 00:58:08,276
Because, very reasonable
for someone to want

1441
00:58:08,276 --> 00:58:11,046
to copy the URL say "Oh,
check out this book" or "check

1442
00:58:11,046 --> 00:58:12,956
out this link", whatever
it is you're looking at.

1443
00:58:13,196 --> 00:58:15,006
And it's actually pretty
infuriating when the person

1444
00:58:15,006 --> 00:58:17,316
who receives the email says
"Oh, I only see their home page"

1445
00:58:17,456 --> 00:58:18,926
because they just
redirected them,

1446
00:58:18,926 --> 00:58:20,166
because of a number of things.

1447
00:58:20,386 --> 00:58:23,696
One, the state that was
necessary to remember that book,

1448
00:58:23,696 --> 00:58:25,506
the ISBN or whatever
was not in the URL

1449
00:58:25,506 --> 00:58:29,356
because they were using POST, or
it's even worse, some websites--

1450
00:58:29,356 --> 00:58:30,736
even I think the
Harvard Coop does this.

1451
00:58:31,046 --> 00:58:33,146
When you navigate
around their website,

1452
00:58:33,546 --> 00:58:35,326
the URL similarly doesn't change

1453
00:58:35,326 --> 00:58:37,546
because the information being
stored is best that I can tell

1454
00:58:37,546 --> 00:58:38,646
in their session cookies.

1455
00:58:38,646 --> 00:58:41,586
Something we'll talk about
next week or later tonight,

1456
00:58:41,966 --> 00:58:44,696
whereby it's only
remembered by the server.

1457
00:58:44,696 --> 00:58:46,096
Thanks to a cookie
where you are,

1458
00:58:46,096 --> 00:58:48,776
which means even you can't
bookmark your own pages

1459
00:58:48,806 --> 00:58:49,796
that are of interest to you.

1460
00:58:50,056 --> 00:58:51,646
So in short, horrible design,

1461
00:58:51,646 --> 00:58:53,936
and some websites are
very much guilty of this.

1462
00:58:53,936 --> 00:58:57,246
So anytime you want the
user to be able to save state

1463
00:58:57,346 --> 00:59:01,166
in a URL, whether in an email or
just with the back button too,

1464
00:59:01,336 --> 00:59:04,586
It's helpful to make sure
it is in the URL itself.

1465
00:59:05,246 --> 00:59:09,366
Of course there's
another reason,

1466
00:59:10,056 --> 00:59:12,536
this is getting better these
days with modern browsers,

1467
00:59:12,756 --> 00:59:15,276
but typically with POSTs
if you click reload,

1468
00:59:15,596 --> 00:59:18,026
you'll often get prompted
and the website will say

1469
00:59:18,026 --> 00:59:19,746
or the browser will say
"Are you sure you want

1470
00:59:19,746 --> 00:59:21,016
to resubmit this form?"

1471
00:59:21,266 --> 00:59:23,776
So there's also issues of
resubmitting forms and what not

1472
00:59:23,776 --> 00:59:24,736
that are typically bad.

1473
00:59:24,736 --> 00:59:27,106
And so one of the things that's
gotten more common these days

1474
00:59:27,476 --> 00:59:29,976
to avoid people accidentally
checking out twice

1475
00:59:30,186 --> 00:59:33,476
or buying things twice on
an online store, you know,

1476
00:59:33,476 --> 00:59:35,086
having that message
say, wait a minute,

1477
00:59:35,086 --> 00:59:36,516
are you sure you want
to submit this form?

1478
00:59:36,746 --> 00:59:38,686
What you can often do is--

1479
00:59:38,686 --> 00:59:40,516
once the user does a POST

1480
00:59:40,516 --> 00:59:42,196
because they have
uploaded something

1481
00:59:42,486 --> 00:59:43,776
or they bought something,

1482
00:59:44,026 --> 00:59:47,106
what you then do is immediately
redirect them with a 301

1483
00:59:47,106 --> 00:59:49,846
or a 302 which only use GETs.

1484
00:59:49,846 --> 00:59:52,806
You cannot use redirects to
repost somewhere else, FYI.

1485
00:59:53,296 --> 00:59:55,516
Then the user, if they
accidentally hit reload

1486
00:59:55,516 --> 00:59:58,216
or hit back in their browser,
they're only going to get back

1487
00:59:58,216 --> 00:59:59,976
and forth between GET
requests, not a POST.

1488
01:00:00,636 --> 01:00:02,716
So you can also discourage
the user

1489
01:00:02,716 --> 01:00:04,376
from submitting a form again.

1490
01:00:04,596 --> 01:00:06,296
And there's other protections
you can put in place,

1491
01:00:06,596 --> 01:00:07,716
but that's another reason, too,

1492
01:00:07,936 --> 01:00:10,006
if you want to avoid
resubmission of forms.

1493
01:00:10,306 --> 01:00:13,306
Sending a GET via
redirect can be one level

1494
01:00:13,336 --> 01:00:14,946
of protection against that.
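
The redirect-after-POST idea can be sketched as a toy request handler in Python. The paths `/checkout` and `/thanks`, the `orders` list, and the `handle` function are all hypothetical; the sketch uses the 302 mentioned in the lecture, though HTTP also defines 303 See Other for this pattern.

```python
orders = []  # stand-in for a database of purchases


def handle(method, path, form=None):
    """Return (status, headers, body) for a request to a toy store."""
    if method == "POST" and path == "/checkout":
        orders.append(form["item"])
        # Redirect instead of rendering a page, so a reload repeats a GET.
        return 302, {"Location": "/thanks"}, ""
    if method == "GET" and path == "/thanks":
        return 200, {}, f"Thanks! Orders so far: {len(orders)}"
    return 404, {}, "Not found"


status, headers, _ = handle("POST", "/checkout", {"item": "book"})

# The browser follows the redirect with a GET; reloading repeats only that GET.
for _ in range(3):
    print(handle("GET", headers["Location"]))

print(orders)  # ['book']
```

However many times the confirmation page is reloaded, the purchase is recorded once.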

1495
01:00:15,636 --> 01:00:16,106
All right.

1496
01:00:17,126 --> 01:00:19,236
So, here we go with PHP.

1497
01:00:19,236 --> 01:00:21,466
This is going to be a
fairly rapid tour of this,

1498
01:00:21,466 --> 01:00:23,876
because again the course
does assume nontrivial prior

1499
01:00:23,876 --> 01:00:24,796
programming experience.

1500
01:00:25,076 --> 01:00:26,776
So this is another
detail to where

1501
01:00:26,776 --> 01:00:29,786
if you find yourself asking, what
is programming, again,

1502
01:00:29,786 --> 01:00:31,576
we should have a
conversation right after class

1503
01:00:31,576 --> 01:00:34,116
or with Allan or with Peter
if you're more comfortable

1504
01:00:34,116 --> 01:00:36,156
about what your own background
is because we're going

1505
01:00:36,156 --> 01:00:38,796
to start talking about things
like arrays and hash tables

1506
01:00:38,796 --> 01:00:39,826
and associative arrays.

1507
01:00:39,826 --> 01:00:43,176
And if this is all new to
you, it's definitely going

1508
01:00:43,176 --> 01:00:44,896
to be a bigger challenge

1509
01:00:45,176 --> 01:00:47,036
but we've certainly had
students do it before,

1510
01:00:47,036 --> 01:00:48,556
so use your judgment
along the way.

1511
01:00:49,026 --> 01:00:50,076
So, one of the best things

1512
01:00:50,076 --> 01:00:52,986
about PHP is its
documentation to be honest.

1513
01:00:52,986 --> 01:00:55,066
It's actually fairly
user-friendly,

1514
01:00:55,346 --> 01:00:57,476
very nice to navigate
and so let me just pull

1515
01:00:57,476 --> 01:00:59,856
up an arbitrary example,
kind of a boring function

1516
01:00:59,856 --> 01:01:01,086
but one that's commonly used.

1517
01:01:01,436 --> 01:01:03,366
If I Google PHP date function,

1518
01:01:03,976 --> 01:01:07,106
I can go up to a representative
documentation page here.

1519
01:01:07,336 --> 01:01:08,976
And just to give you a quick tour

1520
01:01:09,486 --> 01:01:11,266
of something you'll see
much more when you dive

1521
01:01:11,266 --> 01:01:15,066
into the course's projects,
along the left-hand side

1522
01:01:15,066 --> 01:01:19,726
of the website is a list
of all of the related

1523
01:01:19,726 --> 01:01:24,336
or available functions, PHP
is actually not this slow

1524
01:01:24,336 --> 01:01:25,246
of a language usually.

1525
01:01:25,926 --> 01:01:29,656
Let's try reloading.

1526
01:01:35,756 --> 01:01:37,926
OK. She didn't-- oh, so,

1527
01:01:37,926 --> 01:01:39,326
actually there's an
interesting lesson there.

1528
01:01:39,746 --> 01:01:41,866
So actually, let's try
this rather than just give

1529
01:01:41,866 --> 01:01:42,616
up on this altogether.

1530
01:01:42,616 --> 01:01:45,636
Let me see if we can--
oh, damn network.

1531
01:01:45,796 --> 01:01:48,096
So I was going to pull
up Chrome's network tab,

1532
01:01:48,096 --> 01:01:50,626
we could look at exactly what
was hanging there, but it seems

1533
01:01:50,626 --> 01:01:51,576
to have resolved itself.

1534
01:01:51,576 --> 01:01:53,276
So, a quick tour then
of the page here.

1535
01:01:53,516 --> 01:01:56,106
So on the left-hand side is
all of the related functions,

1536
01:01:56,106 --> 01:01:59,036
just FYI, a little overwhelming
at first but the reality is

1537
01:01:59,036 --> 01:02:00,786
for this class and really
in general, you're not going

1538
01:02:00,786 --> 01:02:02,756
to need to know every one of
these functions, just looking it

1539
01:02:02,756 --> 01:02:04,816
up on demand is useful
enough typically.

1540
01:02:05,106 --> 01:02:08,196
On the right-hand side is the
canonical layout of a function.

1541
01:02:08,196 --> 01:02:10,036
So, it tells you
first what version

1542
01:02:10,036 --> 01:02:11,586
of PHP supports this function.

1543
01:02:11,586 --> 01:02:14,676
This is actually important not
so much when you control your

1544
01:02:14,676 --> 01:02:17,146
on own server because either
you'll be running yourself

1545
01:02:17,146 --> 01:02:20,666
if it's your own server, pretty
recent version of PHP, 5.1, 5.2,

1546
01:02:20,666 --> 01:02:24,586
5.3, 5.4 are fairly recent
incarnations, with 5.4 the latest.

1547
01:02:24,936 --> 01:02:26,576
But there are some
web hosting companies

1548
01:02:26,576 --> 01:02:29,606
that might still be running
PHP 4, not terribly common

1549
01:02:29,876 --> 01:02:31,306
but you will lose a huge amount

1550
01:02:31,306 --> 01:02:33,706
of functionality including
object-oriented programming

1551
01:02:33,706 --> 01:02:38,086
support, if you are running something
as old as PHP 4, just FYI.

1552
01:02:38,416 --> 01:02:39,646
So, what does this function do?

1553
01:02:39,646 --> 01:02:41,746
It formats a local date
and time which means

1554
01:02:41,746 --> 01:02:48,756
if I give it a string like H
colon M for hours colon minutes

1555
01:02:48,756 --> 01:02:49,666
or something like that.

1556
01:02:49,966 --> 01:02:54,476
It should return to me a
formatted string like 3:00 p.m.

1557
01:02:54,956 --> 01:02:55,886
or something like that.

1558
01:02:55,886 --> 01:02:58,036
So, that's what it does, it
gives me the current time

1559
01:02:58,216 --> 01:03:01,546
or it converts a numeric
time stamp to a date.

1560
01:03:01,846 --> 01:03:04,146
So, here is how you
parse the signatures.

1561
01:03:04,386 --> 01:03:05,746
This means it returns a string.

1562
01:03:05,796 --> 01:03:07,026
This means it takes a string

1563
01:03:07,026 --> 01:03:08,766
as its first argument,
which is the format.

1564
01:03:09,186 --> 01:03:12,656
Any variable in PHP, as we'll
see quite a bit, starts

1565
01:03:12,656 --> 01:03:13,666
with a dollar sign.

1566
01:03:14,156 --> 01:03:17,376
Square brackets in documentation
means it's optional,

1567
01:03:17,376 --> 01:03:19,886
which means if you want to
override the current date

1568
01:03:19,886 --> 01:03:22,346
and time you can pass a
new numeric time stamp.

1569
01:03:22,616 --> 01:03:24,136
For those unfamiliar,
a time stamp

1570
01:03:24,136 --> 01:03:26,506
in many programming languages
is the number of seconds

1571
01:03:26,506 --> 01:03:29,236
since January 1st 1970,
the so-called epoch.

1572
01:03:29,676 --> 01:03:32,036
And then you can override
the default behavior.

1573
01:03:32,036 --> 01:03:35,106
Useful if you've stored time
stamps in like a database

1574
01:03:35,106 --> 01:03:36,226
and you want to display them

1575
01:03:36,226 --> 01:03:38,546
in some human friendly
way after the fact.
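As a minimal sketch of the signature just described (the format string first, then the optional timestamp), the lines below format one stored timestamp. The value 1000000000 and the variable names are assumptions for illustration, and gmdate() is used in place of date() only so that the output does not depend on the server's time zone. Note that among PHP's format characters, minutes are written i rather than M.

```php
<?php
// Hypothetical timestamp pulled from a database: seconds since the epoch.
$stored = 1000000000;

// gmdate() takes the same arguments as date() but always formats in UTC.
$clock    = gmdate('H:i', $stored);   // 24-hour hours and minutes
$weekday  = gmdate('l', $stored);     // full name of the day of the week
$epochDay = gmdate('l, j F Y', 0);    // timestamp 0 is the epoch itself

echo $clock, "\n";     // 01:46
echo $weekday, "\n";   // Sunday
echo $epochDay, "\n";  // Thursday, 1 January 1970

// With no second argument, date() formats the current local time.
echo date('g:i a'), "\n";
```

Passing the timestamp explicitly is the case that matters for values read back from a database; leaving it off simply means "now".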

1576
01:03:39,056 --> 01:03:39,496
All right.

1577
01:03:39,496 --> 01:03:41,296
This returns a string
formatted according

1578
01:03:41,296 --> 01:03:43,186
to the given format string
dot, dot, dot.

1579
01:03:43,606 --> 01:03:45,546
Here's just some more
detail on the format,

1580
01:03:45,606 --> 01:03:49,386
so the format parameter can
apparently be a quoted string

1581
01:03:49,386 --> 01:03:52,246
containing all of these
various placeholders, d for day,

1582
01:03:52,536 --> 01:03:55,696
j for day of the month without
leading zeroes and so forth.

1583
01:03:55,866 --> 01:03:58,616
Memorizing this is not a
good use of any human's time,

1584
01:03:59,026 --> 01:04:00,476
but looking it up is reasonable.

1585
01:04:00,696 --> 01:04:02,596
Let's just scroll
down, past all of that.

1586
01:04:02,596 --> 01:04:04,116
Timestamp does what I promised.

1587
01:04:04,116 --> 01:04:06,856
Return value returns a
formatted date string.

1588
01:04:07,186 --> 01:04:09,846
If you do something wrong,
it goes on to explain

1589
01:04:09,846 --> 01:04:10,796
that there's an error.

1590
01:04:11,006 --> 01:04:12,786
And then let me scroll
down here.

1591
01:04:12,966 --> 01:04:14,126
The examples, frankly, are

1592
01:04:14,126 --> 01:04:17,026
where my eye is typically
drawn most immediately.

1593
01:04:17,126 --> 01:04:20,946
So, if I take a look here
this gives me some little

1594
01:04:20,946 --> 01:04:21,466
cheat sheets.

1595
01:04:21,516 --> 01:04:25,006
If I want to print
out echo date "l"

1596
01:04:25,246 --> 01:04:27,486
for whatever reason L
denotes the day of the week.

1597
01:04:27,486 --> 01:04:29,066
If it is Monday, it
would print Monday.

1598
01:04:29,066 --> 01:04:31,396
Today, it would print
Wednesday dynamically.

1599
01:04:31,706 --> 01:04:33,556
Here's some more
complicated string

1600
01:04:33,666 --> 01:04:37,276
that they claim will print out
this and so on and so forth.

1601
01:04:37,346 --> 01:04:39,476
This is the kind of thing
that this function does.

1602
01:04:39,746 --> 01:04:41,466
But the takeaway is,
for our purposes now,

1603
01:04:41,466 --> 01:04:44,036
just that PHP's documentation is
always structured in this way.

1604
01:04:44,336 --> 01:04:48,046
Summary of the function up top,
description of the parameters,

1605
01:04:48,116 --> 01:04:50,066
some version notes
in case you need

1606
01:04:50,066 --> 01:04:52,056
to be aware what
version of PHP you have.

1607
01:04:52,486 --> 01:04:54,616
Example one, example
two, example three.

1608
01:04:54,616 --> 01:04:56,896
And then at the bottom,
there's generally some pretty

1609
01:04:56,896 --> 01:05:00,476
intelligent discussion on the
comment threads that are there.

1610
01:05:00,476 --> 01:05:01,706
It's not really crazy talk.

1611
01:05:01,706 --> 01:05:03,136
They seem to moderate
it quite well,

1612
01:05:03,346 --> 01:05:06,316
so you actually see people
sharing useful code for common

1613
01:05:06,316 --> 01:05:08,836
workarounds or common tricks
that someone might want

1614
01:05:08,836 --> 01:05:10,646
to do related to
the date function.

1615
01:05:10,646 --> 01:05:12,756
So in short, the documentation
will be your friend

1616
01:05:12,756 --> 01:05:14,926
and what we will do in
lecture is not to go

1617
01:05:14,926 --> 01:05:17,496
through mind-numbing tours of
the various functions that exist

1618
01:05:17,496 --> 01:05:20,256
and so forth, but focus much
more so on the concepts,

1619
01:05:20,256 --> 01:05:22,976
on the syntax, and on
the overall framework

1620
01:05:22,976 --> 01:05:25,576
so that you know as you
dive in to how do I do this,

1621
01:05:25,576 --> 01:05:28,436
how do I do this, where
it fits in big picture

1622
01:05:28,546 --> 01:05:29,576
in terms of a project.

1623
01:05:30,146 --> 01:05:33,806
So PHP is an interpreted
language.

1624
01:05:33,806 --> 01:05:35,876
What does it mean for a
language to be interpreted?

1625
01:05:36,386 --> 01:05:40,596
Or what is the opposite

1626
01:05:41,426 --> 01:05:43,636
of an interpreted language
even though they're not truly

1627
01:05:44,056 --> 01:05:44,956
literally opposites.

1628
01:05:45,316 --> 01:05:45,406
Yeah?

1629
01:05:46,016 --> 01:05:47,556
[ Inaudible Remark ]

1630
01:05:47,556 --> 01:05:50,186
A compiled language, so a
compiled language is something

1631
01:05:50,186 --> 01:05:54,506
like C or C++, a language
that has source code written

1632
01:05:54,506 --> 01:05:56,816
in English-like syntax
but you have to run it

1633
01:05:56,816 --> 01:05:59,916
through a compiler like GCC
or Visual Studio or the like

1634
01:06:00,196 --> 01:06:02,556
and it outputs what's
generally called object code

1635
01:06:02,656 --> 01:06:05,856
or more specifically zeroes and
ones that are patterned in a way

1636
01:06:06,086 --> 01:06:08,756
that a CPU like an
Intel CPU understands.

1637
01:06:08,966 --> 01:06:11,576
An interpreted language
skips that step, essentially,

1638
01:06:11,776 --> 01:06:13,586
whereby instead you
write the source code

1639
01:06:13,586 --> 01:06:15,416
and then you pass
your source code

1640
01:06:15,416 --> 01:06:17,336
through what's called
an interpreter instead

1641
01:06:17,336 --> 01:06:19,966
of a compiler and then an
interpreter essentially reads

1642
01:06:19,966 --> 01:06:21,436
that language that
you've written,

1643
01:06:21,436 --> 01:06:23,786
the source code you've written,
top to bottom, left to right,

1644
01:06:24,016 --> 01:06:27,036
doing line by line exactly
what you tell it to do.

1645
01:06:27,196 --> 01:06:29,406
So the upside is, there's
no intermediate step,

1646
01:06:29,406 --> 01:06:31,936
you don't have to run the
compiler then run your program.

1647
01:06:32,316 --> 01:06:34,736
In an interpreted world,
you just run your program

1648
01:06:34,796 --> 01:06:35,976
through the interpreter
and that's it.

1649
01:06:35,976 --> 01:06:37,156
It's one step instead of two.

1650
01:06:37,156 --> 01:06:39,856
But what's the downside
of the fact

1651
01:06:39,856 --> 01:06:42,166
that it's interpreting it
line by line as opposed

1652
01:06:42,166 --> 01:06:43,596
to converting it
to zeroes and ones?

1653
01:06:44,036 --> 01:06:44,326
>> Performance.

1654
01:06:44,646 --> 01:06:45,786
>> Performance, typically.

1655
01:06:45,786 --> 01:06:48,156
So compiled languages
tend to be faster

1656
01:06:48,156 --> 01:06:50,196
because you're spending
more time and memory

1657
01:06:50,196 --> 01:06:52,736
and disk space upfront
to convert source code

1658
01:06:52,736 --> 01:06:55,206
to object code, zeroes and
ones, but once it's zeroes

1659
01:06:55,206 --> 01:06:58,816
and ones, it's super ready to be
read and understood by the CPU.

1660
01:06:59,096 --> 01:07:01,136
Whereas an interpreted
language typically needs

1661
01:07:01,136 --> 01:07:03,516
to be literally interpreted
again and again,

1662
01:07:03,516 --> 01:07:07,256
and every time I call the
date function, D-A-T-E needs

1663
01:07:07,256 --> 01:07:11,016
to be parsed or read and
then converted effectively

1664
01:07:11,016 --> 01:07:12,686
to the underlying functionality.

1665
01:07:13,066 --> 01:07:16,416
Now, there exist
compilers of sorts for PHP

1666
01:07:16,416 --> 01:07:18,616
and for other interpreted
languages

1667
01:07:18,616 --> 01:07:20,546
and what are called
opcode caches.

1668
01:07:20,546 --> 01:07:22,176
More on this at the end of
this semester when we talk

1669
01:07:22,176 --> 01:07:24,946
about scalability, which
simply means for now,

1670
01:07:25,166 --> 01:07:26,686
that smart web servers

1671
01:07:26,866 --> 01:07:30,136
and interpreters will do
the interpretation once,

1672
01:07:30,466 --> 01:07:33,486
convert it to some intermediate
format and then save

1673
01:07:33,486 --> 01:07:34,496
that intermediate format,

1674
01:07:34,576 --> 01:07:38,346
which in the PHP world is
called opcodes, O-P-C-O-D-E-S.

1675
01:07:38,836 --> 01:07:41,946
And this just means it will skip
that step the next time around.

1676
01:07:41,946 --> 01:07:44,596
It's not quite compiled
but at least it's better,

1677
01:07:44,596 --> 01:07:46,226
it's a closer approximation
to it.

1678
01:07:46,526 --> 01:07:48,896
Frankly, it's a nice thing
with interpreted languages

1679
01:07:48,896 --> 01:07:51,116
because you don't have to go
through that annoying step

1680
01:07:51,366 --> 01:07:52,846
of recompiling and recompiling.

1681
01:07:52,876 --> 01:07:55,386
Every time you make a
change, you can interact

1682
01:07:55,386 --> 01:07:57,146
with your code a
lot more fluidly.

1683
01:07:57,146 --> 01:07:59,716
It just saves some steps,
especially for large projects

1684
01:07:59,986 --> 01:08:02,726
which might have a large number
of files and lines of code

1685
01:08:02,726 --> 01:08:04,456
to actually compile otherwise.

1686
01:08:04,456 --> 01:08:06,346
So, some upsides
and some downsides.

1687
01:08:06,346 --> 01:08:09,416
If you're crazy popular
like someone like Facebook,

1688
01:08:09,416 --> 01:08:12,506
Facebook actually has a
framework called HipHop

1689
01:08:12,716 --> 01:08:15,406
for PHP, which they released
open source a while back

1690
01:08:15,406 --> 01:08:20,106
which actually compiles PHP down
to C++ which is then compiled

1691
01:08:20,106 --> 01:08:23,486
and turned into object code
to get maximal performance

1692
01:08:23,896 --> 01:08:25,186
out of the code that they write.

1693
01:08:25,186 --> 01:08:26,836
And this is motivated
by a number of things,

1694
01:08:26,836 --> 01:08:29,676
but among the things they
discuss publicly is this:

1695
01:08:29,676 --> 01:08:32,606
PHP is fairly omnipresent
and it's a fairly easy language

1696
01:08:32,606 --> 01:08:33,916
for people to learn
especially coming

1697
01:08:33,916 --> 01:08:34,966
out of college and the like.

1698
01:08:35,256 --> 01:08:38,026
So it means they can have their
developers using a language

1699
01:08:38,026 --> 01:08:40,606
that's fairly easy to learn,
they probably already know it,

1700
01:08:40,886 --> 01:08:43,226
and they can then defer
the performance details

1701
01:08:43,276 --> 01:08:45,686
that are typically associated
with the language to some

1702
01:08:45,686 --> 01:08:48,996
of their more advanced engineers
who can then take PHP code

1703
01:08:49,196 --> 01:08:51,646
down to something that's
even more highly performing.

1704
01:08:51,766 --> 01:08:53,846
So among the options
that exist these days.

1705
01:08:54,136 --> 01:08:55,656
So a lot of the arguments
you might see on the web

1706
01:08:55,656 --> 01:08:58,666
about performance of PHP
versus Ruby versus Python

1707
01:08:58,666 --> 01:08:59,776
versus Java versus this.

1708
01:09:00,246 --> 01:09:02,596
There are many, many
different technical solutions

1709
01:09:02,596 --> 01:09:03,586
to the performance question.

1710
01:09:03,586 --> 01:09:06,106
And a very valid heuristic, I
think, when choosing a language,

1711
01:09:06,106 --> 01:09:07,566
whether it's going
to be PHP or another,

1712
01:09:07,896 --> 01:09:10,026
is what you already know
and what the cost is to you

1713
01:09:10,026 --> 01:09:11,476
to develop or to
learn something else,

1714
01:09:11,476 --> 01:09:13,276
what friends know or what
your colleagues know,

1715
01:09:13,746 --> 01:09:17,896
and also what tools exist to
mitigate the prices you pay

1716
01:09:18,196 --> 01:09:20,146
to use something like
an interpreted language.

1717
01:09:21,596 --> 01:09:24,246
So suPHP, this is something
that will be installed

1718
01:09:24,246 --> 01:09:25,476
in the CS50 Appliance.

1719
01:09:25,536 --> 01:09:28,686
It is installed on some
web hosts, but not nearly

1720
01:09:28,686 --> 01:09:30,206
as many as would be good.

1721
01:09:30,726 --> 01:09:34,726
So suPHP is substitute
user PHP and it exists

1722
01:09:34,726 --> 01:09:36,486
to solve the following problem.

1723
01:09:36,486 --> 01:09:39,906
When you have a web server, you
have software running on it

1724
01:09:39,906 --> 01:09:42,696
that listens for connections
on port 80 and so forth.

1725
01:09:42,896 --> 01:09:46,496
Years ago, most such servers
ran as a username called root.

1726
01:09:46,596 --> 01:09:49,246
Root is the administrative
user and running anything

1727
01:09:49,246 --> 01:09:51,486
as root is generally bad, why?

1728
01:09:52,116 --> 01:09:52,226
Yes?

1729
01:09:53,136 --> 01:09:56,396
>> Well, you can, like,
destroy your computer.

1730
01:09:56,886 --> 01:09:58,106
>> You can destroy
your computer, how?

1731
01:09:59,356 --> 01:09:59,876
Be more specific.

1732
01:09:59,876 --> 01:10:03,316
>> Well, you can remove,
like, files that are essential

1733
01:10:03,316 --> 01:10:04,446
to the operating system.

1734
01:10:06,356 --> 01:10:09,526
>> Good. So if the root
user has full-fledged access

1735
01:10:09,526 --> 01:10:12,126
to the system, if you make
a mistake in your code,

1736
01:10:12,126 --> 01:10:15,136
if it's a web server, and
you're running web code,

1737
01:10:15,296 --> 01:10:17,646
and you make a mistake and you
accidentally delete the wrong

1738
01:10:17,646 --> 01:10:20,416
directory, that is permanent,
like you can touch anything

1739
01:10:20,416 --> 01:10:22,366
on the system including
the password file

1740
01:10:22,366 --> 01:10:24,306
which, even though it's encrypted,
should not generally be shared

1741
01:10:24,306 --> 01:10:24,836
with the world.

1742
01:10:25,046 --> 01:10:27,656
So in short, running anything
as root puts you at risk

1743
01:10:27,866 --> 01:10:30,076
because if what root
is doing is buggy,

1744
01:10:30,246 --> 01:10:32,506
and odds are you're human,
you err, you're going

1745
01:10:32,506 --> 01:10:33,976
to write buggy code sometimes,

1746
01:10:33,976 --> 01:10:35,466
that means who's
running the buggy code,

1747
01:10:35,466 --> 01:10:37,016
the most important
user on the system

1748
01:10:37,256 --> 01:10:39,076
which means your entire
machine could be compromised

1749
01:10:39,076 --> 01:10:40,046
if you screw up.

1750
01:10:40,286 --> 01:10:42,706
So finally, the world years
ago got into the habit

1751
01:10:42,706 --> 01:10:44,926
of at least running web
servers in particular

1752
01:10:45,036 --> 01:10:46,006
as a different username.

1753
01:10:46,006 --> 01:10:48,976
Sometimes "nobody"
literally, the username nobody

1754
01:10:48,976 --> 01:10:52,436
or www or
apache or httpd,

1755
01:10:52,436 --> 01:10:54,086
it doesn't really
matter what it is,

1756
01:10:54,236 --> 01:10:55,916
it matters that it's not root.

1757
01:10:56,266 --> 01:10:57,396
But some problems arise,

1758
01:10:57,396 --> 01:10:59,916
especially in this popular
world these days of V-hosting

1759
01:10:59,916 --> 01:11:01,546
and web hosts, commercial
web hosts.

1760
01:11:01,546 --> 01:11:05,356
Because just think of
this, if you are customer A

1761
01:11:05,356 --> 01:11:08,146
and there's a customer B, and
you have someone like DreamHost

1762
01:11:08,146 --> 01:11:10,646
or the like, you each
have accounts with them

1763
01:11:10,866 --> 01:11:12,716
and you have your own
usernames and passwords

1764
01:11:12,716 --> 01:11:14,396
and you have your own home
directory, so to speak,

1765
01:11:14,396 --> 01:11:15,686
where you can store your code.

1766
01:11:15,956 --> 01:11:19,846
But the web server runs under
username Apache for instance,

1767
01:11:19,846 --> 01:11:21,796
Apache again being the
web server software.

1768
01:11:22,216 --> 01:11:26,446
In terms of permissions,
Apache is not you, obviously,

1769
01:11:26,446 --> 01:11:29,626
because you are A or you are B,
so you have different usernames.

1770
01:11:29,836 --> 01:11:31,926
But if Apache is the web
server and the web server needs

1771
01:11:31,926 --> 01:11:34,956
to obviously be able to see your
files in order to serve them up,

1772
01:11:35,326 --> 01:11:37,956
what kinds of permission do your
files need if you're familiar

1773
01:11:37,956 --> 01:11:40,786
with Linux file permissions
or Windows,

1774
01:11:40,786 --> 01:11:42,196
really file permissions
in general?

1775
01:11:43,096 --> 01:11:44,966
Your files have to be what's
called world readable,

1776
01:11:45,116 --> 01:11:45,606
typically.

1777
01:11:45,876 --> 01:11:48,086
You can do more fine-grained
permissions, but the reality is

1778
01:11:48,086 --> 01:11:49,936
on most systems, the
easier approach is,

1779
01:11:50,196 --> 01:11:53,986
you're told to chmod your file
644, more on that in the future.

1780
01:11:54,226 --> 01:11:55,986
But make your files
world readable.

1781
01:11:55,986 --> 01:11:58,586
Why? Because you don't really
need the world to read them,

1782
01:11:58,586 --> 01:12:00,176
you need the web
server to be able

1783
01:12:00,176 --> 01:12:02,856
to read them including
especially your PHP code,

1784
01:12:02,856 --> 01:12:04,546
which we're
about to start writing.
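To make the 644 mode concrete, here is a small sketch that applies it to a throwaway file from PHP and reads it back. The temporary file is an assumption for illustration, and the result presumes a Unix-like system. The last digit, 4, is what grants read access to every other account on the machine.

```php
<?php
// Create a throwaway file (hypothetical stand-in for hello.php).
$path = tempnam(sys_get_temp_dir(), 'demo');

// 0644: owner may read and write, group may read, everyone else may read.
chmod($path, 0644);
clearstatcache();

// Keep only the last four octal digits of the mode, e.g. "0644".
$mode = substr(sprintf('%o', fileperms($path)), -4);

// The "other" read bit is what makes the file world readable.
$worldReadable = (fileperms($path) & 0004) !== 0;

echo $mode, ' world readable: ', $worldReadable ? 'yes' : 'no', "\n";

// Clean up the throwaway file.
unlink($path);
```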

1785
01:12:05,066 --> 01:12:06,636
So, what's the implication,
though?

1786
01:12:06,636 --> 01:12:07,416
There's a few things.

1787
01:12:07,416 --> 01:12:08,946
If your files are world readable

1788
01:12:08,946 --> 01:12:11,416
so that this middle man Apache
can read them, that's great.

1789
01:12:11,416 --> 01:12:12,556
It makes the website work.

1790
01:12:12,976 --> 01:12:15,796
But it also means that someone
else can read your file, too.

1791
01:12:16,306 --> 01:12:19,986
>> That would probably
be the other customer.

1792
01:12:20,046 --> 01:12:21,726
>> The other customer,
customer B, right?

1793
01:12:21,726 --> 01:12:23,356
Because world readable
is world readable.

1794
01:12:23,586 --> 01:12:27,056
Now, if your files are being
served up on a web server,

1795
01:12:27,056 --> 01:12:30,346
that means you can see your
files at URLs like /hello.php.

1796
01:12:30,346 --> 01:12:32,286
So that means anyone

1797
01:12:32,286 --> 01:12:34,476
on the internet knows what
your files are called.

1798
01:12:34,716 --> 01:12:36,986
So, the other customer,
because he or she can log

1799
01:12:36,986 --> 01:12:40,346
in to the same server can
just enter your directory,

1800
01:12:40,346 --> 01:12:42,396
and even though they might
not be able to see all

1801
01:12:42,396 --> 01:12:44,546
of your files, if they
know what they're called,

1802
01:12:44,866 --> 01:12:46,986
they can then definitely
see your files

1803
01:12:46,986 --> 01:12:49,376
by just using a text editor
or some kind of program

1804
01:12:49,586 --> 01:12:50,796
that just opens these files.

1805
01:12:51,046 --> 01:12:53,636
Now, that's not such a big
deal for JavaScript, for CSS,

1806
01:12:53,696 --> 01:12:55,476
because frankly, who cares?

1807
01:12:55,476 --> 01:12:57,836
That stuff is by nature
of JavaScript and CSS,

1808
01:12:57,836 --> 01:13:00,036
going to be sent to the browser
and the whole world anyway.

1809
01:13:00,036 --> 01:13:03,296
And you might try to obfuscate,
as we'll discuss in a few weeks

1810
01:13:03,296 --> 01:13:05,546
with minification and
hiding things from users.

1811
01:13:05,706 --> 01:13:07,656
But you can't really protect
your intellectual property

1812
01:13:07,656 --> 01:13:10,046
when it comes to JavaScript
and CSS because the browser

1813
01:13:10,046 --> 01:13:11,276
and the whole world
have to see it.

1814
01:13:11,596 --> 01:13:14,036
But PHP, you might put
a lot of heart into it

1815
01:13:14,036 --> 01:13:15,846
and you've put a lot of
intellectual property

1816
01:13:15,846 --> 01:13:18,076
into your PHP code which
is really the secret sauce

1817
01:13:18,076 --> 01:13:19,216
of your business or whatever.

1818
01:13:19,676 --> 01:13:21,956
But now, the web
server needs to be able

1819
01:13:21,956 --> 01:13:23,856
to read it as can any customer.

1820
01:13:24,056 --> 01:13:24,926
So now, you are at risk

1821
01:13:24,926 --> 01:13:27,806
of the customer seeing all
the hard work you've done.

1822
01:13:28,046 --> 01:13:30,886
In fact, what might your files
contain out of necessity,

1823
01:13:31,616 --> 01:13:33,156
if familiar with
databases and the like?

1824
01:13:33,346 --> 01:13:33,436
Yeah?

1825
01:13:34,076 --> 01:13:37,766
>> The PHP would need
to contain the name

1826
01:13:37,766 --> 01:13:39,446
of the database and
the password.

1827
01:13:39,446 --> 01:13:40,096
>> Exactly.

1828
01:13:40,096 --> 01:13:42,326
Things like usernames and
passwords for databases,

1829
01:13:42,326 --> 01:13:45,376
for caching engines, for
Facebook APIs, whatever it is,

1830
01:13:45,656 --> 01:13:49,946
your PHP code might have,
inside of it, variables

1831
01:13:50,176 --> 01:13:51,706
that did need to be there

1832
01:13:51,956 --> 01:13:54,446
but you don't need customer
B being able to see it.

1833
01:13:54,446 --> 01:13:57,036
So in short, running a web
server as Apache is great

1834
01:13:57,036 --> 01:13:59,756
for security of the whole
site, bad for the security

1835
01:13:59,756 --> 01:14:01,616
of customers A and B and C

1836
01:14:01,616 --> 01:14:03,146
who probably don't
even know each other

1837
01:14:03,146 --> 01:14:04,696
and certainly shouldn't
trust each other.

1838
01:14:05,036 --> 01:14:06,686
So, thankfully there's
a solution here

1839
01:14:06,796 --> 01:14:07,946
and it comes in different forms.

1840
01:14:07,946 --> 01:14:10,216
One of the solutions
for PHP is suPHP.

1841
01:14:10,396 --> 01:14:15,606
In the suPHP model,
customer A's code is executed

1842
01:14:15,606 --> 01:14:19,146
by a username called A,
the same user's username.

1843
01:14:19,196 --> 01:14:22,606
B's code is executed by
username B. In other words,

1844
01:14:22,606 --> 01:14:27,386
the web server sort of magically
transforms itself into user A

1845
01:14:27,386 --> 01:14:29,926
when it's time to execute A's
code and transforms itself

1846
01:14:29,926 --> 01:14:32,736
into user B when it's time to
execute B's code which means

1847
01:14:32,736 --> 01:14:35,706
if you screw up and have buggy
code and you're customer A,

1848
01:14:35,876 --> 01:14:38,696
whose files could you possibly
delete under this model?

1849
01:14:41,016 --> 01:14:42,116
Only your own.

1850
01:14:42,456 --> 01:14:44,496
And, you know, that still
might be unfortunate

1851
01:14:44,536 --> 01:14:47,016
but at least you're not
compromising anyone else

1852
01:14:47,156 --> 01:14:47,896
on the system.

1853
01:14:47,896 --> 01:14:49,776
And it's your own fault if
you delete your own files,

1854
01:14:49,776 --> 01:14:51,326
but it's a good thing that
you can't delete anyone

1855
01:14:51,326 --> 01:14:51,976
else's files.

1856
01:14:52,316 --> 01:14:53,726
This also solves another issue.

1857
01:14:53,726 --> 01:14:56,186
If your website is like
a commercial website even

1858
01:14:56,186 --> 01:14:58,546
if it's small with only hundreds
or thousands of customers,

1859
01:14:58,826 --> 01:15:01,146
and those customers need to
upload files like photos,

1860
01:15:01,146 --> 01:15:02,936
or videos or stuff that's
not meant to be public

1861
01:15:02,936 --> 01:15:05,376
in the Facebook sense but
fairly private at least

1862
01:15:05,376 --> 01:15:07,036
in the limited privacy sense.

1863
01:15:07,436 --> 01:15:11,626
So, the upside here is when
a user uploads a file now

1864
01:15:11,806 --> 01:15:15,196
and the web server is using
suPHP, that file will be saved

1865
01:15:15,196 --> 01:15:20,066
on the disk as owned by user
A and B's files will be saved

1866
01:15:20,066 --> 01:15:22,896
as user B. By contrast
in the other model

1867
01:15:22,896 --> 01:15:26,026
where everything gets run by
Apache, who saves the files?

1868
01:15:26,256 --> 01:15:29,656
Apache, which means
Apache owns the files

1869
01:15:29,936 --> 01:15:32,456
and that means the
only way to ensure

1870
01:15:32,456 --> 01:15:34,506
that they can be
accessed subsequently is

1871
01:15:34,506 --> 01:15:37,316
to make them world
readable which means all

1872
01:15:37,316 --> 01:15:39,976
of the new content your
users are uploading is going

1873
01:15:39,976 --> 01:15:41,486
to be readable by
customer A, and B,

1874
01:15:41,486 --> 01:15:43,276
and C, and D on the system.
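A rough way to see which account ends up owning what gets written is to create a file and ask for its owner. This is only a sketch: the temporary file stands in for an upload, and posix_geteuid() comes from PHP's POSIX extension, which is assumed here but not guaranteed to be installed. Under suPHP both IDs would be the customer's own; in the shared model they would belong to the web server's account.

```php
<?php
// Stand-in for an uploaded file (hypothetical; a real upload would arrive via a form).
$path = tempnam(sys_get_temp_dir(), 'upload');
file_put_contents($path, 'example upload');

// Numeric ID of the account that owns the new file.
$ownerUid = fileowner($path);

// Numeric ID of the account PHP is running as, if the POSIX extension is present.
$processUid = function_exists('posix_geteuid') ? posix_geteuid() : null;

echo 'file owner: ', $ownerUid, "\n";
echo 'running as: ', $processUid === null ? 'unknown' : $processUid, "\n";

// Clean up the stand-in file.
unlink($path);
```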

1875
01:15:43,276 --> 01:15:44,806
So, in short, this is good

1876
01:15:44,806 --> 01:15:47,236
and this is not a feature
that's typically advertised

1877
01:15:47,236 --> 01:15:48,756
significantly by web hosts.

1878
01:15:48,756 --> 01:15:50,606
I don't even know if
DreamHost does it these days.

1879
01:15:50,606 --> 01:15:52,926
I'm going to guess they don't
because we didn't see mention

1880
01:15:52,926 --> 01:15:54,086
of it, but don't
hold me to that.

1881
01:15:54,346 --> 01:15:56,446
You might want to dig a little
deeper into the fine print.

1882
01:15:56,716 --> 01:15:59,246
But if you are using something
like a virtual private server,

1883
01:15:59,246 --> 01:16:01,606
you can also avoid this issue
altogether because at least

1884
01:16:01,606 --> 01:16:03,176
if you own the whole server,

1885
01:16:03,176 --> 01:16:04,876
even if it's a rented
virtual machine,

1886
01:16:05,036 --> 01:16:08,076
at least there's no other
customers on the same server.

1887
01:16:08,076 --> 01:16:10,566
So, again, something to be
mindful of so that, you know,

1888
01:16:10,566 --> 01:16:13,516
when you pay $8.95, $5.99,
whatever it is per month,

1889
01:16:13,516 --> 01:16:15,406
again you're getting
what you pay for.

1890
01:16:15,406 --> 01:16:17,766
And if you care about
your intellectual property

1891
01:16:17,766 --> 01:16:19,766
and the security of your
site, these are the kinds

1892
01:16:19,766 --> 01:16:22,696
of questions you should be
mindful of asking or reading

1893
01:16:22,696 --> 01:16:24,726
up on before signing up.

1894
01:16:24,796 --> 01:16:27,796
So, suPHP is something that will
be installed in the appliance.

1895
01:16:28,126 --> 01:16:31,086
So, for those who would like to
read up on the language itself,

1896
01:16:31,086 --> 01:16:33,556
this week, there will just be
recommended readings of sorts.

1897
01:16:33,856 --> 01:16:35,906
Realize that there are
some good tutorials online.

1898
01:16:35,906 --> 01:16:37,986
And again, if you have
a programming background

1899
01:16:37,986 --> 01:16:39,676
in any syntactically
similar language,

1900
01:16:39,676 --> 01:16:41,496
some of these might even be
boring which would be great

1901
01:16:41,596 --> 01:16:43,106
because it will walk
you through for loops

1902
01:16:43,106 --> 01:16:44,296
and while loops and the like.

1903
01:16:44,596 --> 01:16:45,926
So, we'll just do a
quick tour of some

1904
01:16:45,926 --> 01:16:48,716
of these syntactic details
tonight, but then focus on some

1905
01:16:48,716 --> 01:16:49,936
of the higher level concepts

1906
01:16:49,976 --> 01:16:53,226
that will be distinct
to web programming.

1907
01:16:53,756 --> 01:16:57,286
So, without further ado, one
of the more stupid details

1908
01:16:57,286 --> 01:16:58,206
but I just put it out there

1909
01:16:58,206 --> 01:16:59,406
because it's the
first thing you see.

1910
01:16:59,636 --> 01:17:01,746
Variables, again, start
with dollar signs in PHP

1911
01:17:01,746 --> 01:17:04,486
and here is the rule
as to what is valid.

1912
01:17:04,486 --> 01:17:06,606
In short, I would choose
variables in a sort

1913
01:17:06,606 --> 01:17:08,816
of normal way typically
with alphabetical letters

1914
01:17:08,816 --> 01:17:11,556
but there are some other things
you can use like underscores

1915
01:17:11,556 --> 01:17:12,786
and numbers and the like.
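The naming rule can be sketched as a regular expression. The pattern below is the one the PHP manual gives for valid names (the part after the dollar sign); the helper isValidVariableName() and the sample names are hypothetical, introduced only for illustration.

```php
<?php
// A letter or underscore first, then any number of letters, digits,
// or underscores (bytes 0x80-0xff are also allowed by the manual's pattern).
function isValidVariableName($name)
{
    return preg_match('/^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$/', $name) === 1;
}

$checks = array(
    'x'      => isValidVariableName('x'),
    '_tmp2'  => isValidVariableName('_tmp2'),
    'n2fast' => isValidVariableName('2fast'),   // starts with a digit
    'my-var' => isValidVariableName('my-var'),  // hyphen is not allowed
);

foreach ($checks as $label => $ok) {
    echo $label, ': ', $ok ? 'valid' : 'invalid', "\n";
}
```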

1916
01:17:13,306 --> 01:17:14,746
But again, we won't
spend too much time

1917
01:17:14,746 --> 01:17:16,476
on this kind of level of detail.

1918
01:17:17,046 --> 01:17:20,476
Data types, PHP is what's
called a loosely typed language

1919
01:17:20,476 --> 01:17:22,736
which means the data
types exist, kind of,

1920
01:17:23,126 --> 01:17:25,696
but they're not readily enforced
in the same way that they are

1921
01:17:25,696 --> 01:17:28,226
in Java or in C or in C++.

1922
01:17:28,616 --> 01:17:30,216
So, what data types
exist, booleans,

1923
01:17:30,216 --> 01:17:32,046
integers, floats, and strings.

1924
01:17:32,496 --> 01:17:35,616
But when you declare a variable,
you do not specify its type.

1925
01:17:35,786 --> 01:17:37,206
It is inferred by the type

1926
01:17:37,206 --> 01:17:39,586
of value you actually
put inside of it.

1927
01:17:39,826 --> 01:17:41,746
So, if you say something
like $x,

1928
01:17:41,956 --> 01:17:43,856
because again dollar sign
means this is a variable,

1929
01:17:43,856 --> 01:17:45,856
$x is a very boring
name for a variable

1930
01:17:45,856 --> 01:17:49,776
but it's a variable equal sign,
one, two, three, semicolon.

1931
01:17:50,096 --> 01:17:51,676
That data-- The data type

1932
01:17:51,676 --> 01:17:55,096
of that value will be integer
even though I didn't specify it

1933
01:17:55,096 --> 01:17:55,626
as such.

1934
01:17:55,916 --> 01:18:01,666
If by contrast I say $x
equals 1.23 semicolon,

1935
01:18:01,666 --> 01:18:04,256
it's instead going to be what?

1936
01:18:05,626 --> 01:18:05,856
Yeah?

1937
01:18:06,146 --> 01:18:06,546
>> Float.

1938
01:18:06,756 --> 01:18:10,036
>> Float, a floating point
value, a real number.

1939
01:18:10,036 --> 01:18:11,836
If you instead say equals true

1940
01:18:11,836 --> 01:18:13,716
or equals false it's
going to be a bool.

1941
01:18:13,966 --> 01:18:18,516
If you instead say "hello",
it's going to be a string.
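
A minimal sketch of the assignments just described, assuming a plain PHP script run on its own. The variable name $x is the lecture's placeholder; gettype() is a standard PHP function that reports the type PHP inferred.

```php
<?php
// Sketch only: $x is reused with four different kinds of value.
$x = 123;
echo gettype($x), "\n";   // integer

$x = 1.23;
echo gettype($x), "\n";   // double (gettype()'s historical name for float)

$x = true;
echo gettype($x), "\n";   // boolean

$x = "hello";
echo gettype($x), "\n";   // string
```

The same variable holds four different types in turn, and nothing in the code ever names a type.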

1942
01:18:18,866 --> 01:18:20,386
But that type is not invariant.

1943
01:18:20,456 --> 01:18:25,216
If you try to use a string
in a boolean context,

1944
01:18:25,416 --> 01:18:27,586
then you get a lot
of implicit conversion.

1945
01:18:27,586 --> 01:18:29,116
So, in other words
in an if condition,

1946
01:18:29,406 --> 01:18:33,066
normally you would say
something like if x equals y

1947
01:18:33,426 --> 01:18:36,136
or you would say if true,
something like that.

1948
01:18:36,586 --> 01:18:38,426
If instead you say
if "hello", well,

1949
01:18:38,426 --> 01:18:42,426
hello will be implicitly
cast to a boolean

1950
01:18:42,726 --> 01:18:47,716
and because hello is not the
number zero, the boolean value

1951
01:18:47,716 --> 01:18:49,476
of hello is going to be true.
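
A short sketch of how strings behave as truth values. The sample strings are made up for illustration. Per the PHP manual, the only strings that convert to false are the empty string and the string "0", so "hello" is true.

```php
<?php
// Which strings count as true in an if condition?
// Only "" and "0" convert to false; everything else is true.
$samples = ["hello", "", "0", "0.0", " "];
$results = [];
foreach ($samples as $s) {
    $results[$s] = (bool) $s;
    echo var_export($s, true), " => ", var_export((bool) $s, true), "\n";
}
```

Note that "0.0" and a single space both count as true, which is one place where this convenience can surprise you.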

1952
01:18:49,906 --> 01:18:53,166
So, you can use strings
even as truth values

1953
01:18:53,166 --> 01:18:55,126
which can encourage
sloppy programming

1954
01:18:55,126 --> 01:18:56,516
and we'll see some
examples of these,

1955
01:18:56,926 --> 01:18:59,326
but it's also useful
sometimes in that it's not

1956
01:18:59,326 --> 01:19:01,306
as pedantic a language
as something like Java

1957
01:19:01,306 --> 01:19:03,636
where you are constantly
casting things back and forth.

1958
01:19:03,956 --> 01:19:04,066
Yeah?

1959
01:19:04,066 --> 01:19:05,736
>> Can you perform
string operations

1960
01:19:05,736 --> 01:19:09,406
on integers or vice versa?

1961
01:19:09,506 --> 01:19:09,926
>> Good question.

1962
01:19:09,926 --> 01:19:11,266
Can you perform string
operations

1963
01:19:11,266 --> 01:19:12,636
on integers and vice versa?

1964
01:19:12,706 --> 01:19:15,386
Yes, they will be upcast
to a string in that case

1965
01:19:15,386 --> 01:19:17,146
and become part of
the string itself.
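
A small sketch of mixing integers and strings, assuming default (non-strict) typing. The variable names are hypothetical.

```php
<?php
$n = 123;                        // an integer
$s = "Total: " . $n;             // concatenation converts 123 to "123"
echo $s, "\n";                   // Total: 123
echo strlen((string) $n), "\n";  // 3

$fromForm = "123";               // form input arrives as a string
$sum = $fromForm + 1;            // a numeric string is converted for arithmetic
var_dump($sum);                  // int(124)
```

The conversion runs in both directions: the integer becomes part of a string under the dot operator, and the numeric string becomes a number under the plus operator.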

1966
01:19:17,566 --> 01:19:19,486
And one of the motivations
for this is that PHP

1967
01:19:19,486 --> 01:19:21,616
from the start was really
designed to be web-centric,

1968
01:19:21,766 --> 01:19:23,966
and the reality is when
you're writing web software,

1969
01:19:24,166 --> 01:19:26,936
you're interacting with the
user entirely via strings.

1970
01:19:26,976 --> 01:19:28,746
Now, the user might
type in one, two, three,

1971
01:19:28,956 --> 01:19:32,776
but as we've seen via HTTP GETs
and talked about HTTP POST,

1972
01:19:33,036 --> 01:19:34,606
it's all text at
the end of the day.

1973
01:19:34,606 --> 01:19:37,646
There's no data type associated
with an HTML input field.

1974
01:19:37,956 --> 01:19:39,896
So, even though the user
might type one, two, three,

1975
01:19:39,896 --> 01:19:43,246
what's going to be sent to the
server is "one, two, three".

1976
01:19:43,496 --> 01:19:46,426
And so the fact that there is
this loose typing is reasonably

1977
01:19:46,426 --> 01:19:49,106
consistent with what you're
getting from the user anyway,

1978
01:19:49,306 --> 01:19:51,866
even though again it
can feel a little messy,

1979
01:19:51,866 --> 01:19:54,786
and it is in some sense,
but that's at least one

1980
01:19:54,786 --> 01:19:56,726
of the original motivations
for it.

1981
01:19:56,726 --> 01:20:00,606
In terms of objects
and collections, Java--

1982
01:20:00,606 --> 01:20:03,946
PHP has arrays and it also has
objects, more on those to come.

1983
01:20:04,166 --> 01:20:05,806
And there's also
things called resources.

1984
01:20:06,266 --> 01:20:08,446
Resource is something
like when you open a file,

1985
01:20:08,636 --> 01:20:11,966
what you get back is not the
file per se, you get sort

1986
01:20:11,966 --> 01:20:14,916
of like a pointer or a
reference in C or Java-speak.

1987
01:20:15,126 --> 01:20:18,436
And that reference is to
a resource which is sort

1988
01:20:18,436 --> 01:20:19,776
of like a special object

1989
01:20:19,916 --> 01:20:21,456
that contains interesting
information,

1990
01:20:21,456 --> 01:20:23,516
the size of the file, your
location and then the type

1991
01:20:23,516 --> 01:20:25,596
of it, and so forth,
details like that.

1992
01:20:25,926 --> 01:20:26,966
Null is null.

1993
01:20:26,966 --> 01:20:29,206
It's when you have
no value there.

1994
01:20:29,946 --> 01:20:33,706
You can have the value null as
a placeholder, but variables

1995
01:20:33,706 --> 01:20:36,686
in PHP, as we'll see, can
also be set or not set.

1996
01:20:36,686 --> 01:20:38,066
So, null is an actual value.

1997
01:20:38,066 --> 01:20:39,676
It doesn't mean the
absence of a value.

1998
01:20:39,976 --> 01:20:42,306
You can have the absence of
a value as we'll soon see.
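
A sketch of the difference between a variable that holds null and one that was never set. The names $placeholder and neverAssigned are hypothetical. One caveat: isset() reports false in both cases, so this sketch uses get_defined_vars() to tell them apart.

```php
<?php
$placeholder = null;                   // set, but holding the value null

var_dump(is_null($placeholder));       // bool(true)
var_dump(isset($placeholder));         // bool(false): isset() treats null like "not set"

$vars = get_defined_vars();
var_dump(array_key_exists('placeholder', $vars));    // bool(true): the variable exists
var_dump(array_key_exists('neverAssigned', $vars));  // bool(false): no such variable
```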

1999
01:20:42,696 --> 01:20:43,636
And then there's mixed.

2000
01:20:43,636 --> 01:20:45,906
So, mixed isn't really a type
but you'll see these things

2001
01:20:45,906 --> 01:20:47,466
in documentation, in particular.

2002
01:20:47,806 --> 01:20:50,686
If you see on PHP.net
documentation

2003
01:20:50,686 --> 01:20:53,796
that says this function takes
mixed, what does that mean?

2004
01:20:54,126 --> 01:20:56,076
Well, it means it can take
any number of different types.

2005
01:20:56,076 --> 01:20:58,216
It can accept a string
or a number,

2006
01:20:58,376 --> 01:21:01,396
and this is where PHP is both
handy but also a little sloppy

2007
01:21:01,396 --> 01:21:04,226
in that it's not strictly
typed or strongly typed.

2008
01:21:04,396 --> 01:21:08,126
Number means integer or float
if the function doesn't care.

2009
01:21:08,366 --> 01:21:10,596
And a callback is
a function pointer.

2010
01:21:10,596 --> 01:21:12,956
We won't spend too much time on
those but you can pass functions

2011
01:21:12,956 --> 01:21:16,536
around by pointers or by
references generally known

2012
01:21:16,536 --> 01:21:17,916
as a callback, in this case.

2013
01:21:18,776 --> 01:21:22,846
Another word on mixed, PHP
is very common for its design

2014
01:21:23,076 --> 01:21:25,576
of returning mixed data types.

2015
01:21:25,896 --> 01:21:28,166
So it's very common in
PHP for a function even

2016
01:21:28,166 --> 01:21:30,186
like date to return strings.

2017
01:21:30,756 --> 01:21:33,486
But if something goes wrong, it
could actually return a bool.

2018
01:21:33,486 --> 01:21:34,906
And what's it going to
return in that case?

2019
01:21:35,396 --> 01:21:38,876
False. So, it's very often,
it's very common rather

2020
01:21:38,876 --> 01:21:41,646
in PHP functions that
you'll-- 99% of the time,

2021
01:21:41,646 --> 01:21:42,926
it will return a
certain data type

2022
01:21:43,126 --> 01:21:44,916
but it could return
something very different.

2023
01:21:45,156 --> 01:21:49,106
So, learning to check for
that correctly is good

2024
01:21:49,106 --> 01:21:51,106
in the context of PHP.

2025
01:21:51,396 --> 01:21:53,396
So, I'll point that
along the way as well.

2026
01:21:53,796 --> 01:21:57,046
So now, some special variables
before we start writing

2027
01:21:57,046 --> 01:21:57,626
some code.

2028
01:21:58,096 --> 01:22:01,986
So, in PHP, there are
special global variables

2029
01:22:01,986 --> 01:22:04,336
that are called superglobals.

2030
01:22:04,336 --> 01:22:06,936
They are in scope, so
to speak, everywhere.

2031
01:22:06,936 --> 01:22:09,346
In any line of code you write
so long as it's executed

2032
01:22:09,346 --> 01:22:12,216
by a web server, you have
access to these variables.

2033
01:22:12,266 --> 01:22:13,256
They start with dollar sign,

2034
01:22:13,256 --> 01:22:14,896
start with underscores
and then all caps.

2035
01:22:15,316 --> 01:22:18,786
So, $_GET is a variable.

2036
01:22:18,816 --> 01:22:19,686
It's going to be an array.

2037
01:22:19,686 --> 01:22:23,566
It's going to be an associative
array, AKA hash table, AKA--

2038
01:22:23,566 --> 01:22:26,796
not really an object, but
it's a key value store.

2039
01:22:27,886 --> 01:22:30,626
What do you think is in
that variable called $_GET?

2040
01:22:31,566 --> 01:22:31,886
Take a guess.

2041
01:22:32,186 --> 01:22:32,286
Yeah.

2042
01:22:32,896 --> 01:22:37,116
>> All the things that are in
the URL that performs them.

2043
01:22:37,116 --> 01:22:37,566
>> Exactly.

2044
01:22:37,566 --> 01:22:40,666
So, Q equals Harvard,
foo equals bar,

2045
01:22:40,666 --> 01:22:41,456
baz equals qux,

2046
01:22:41,456 --> 01:22:44,676
whatever the user submitted via
the form is going to be handed

2047
01:22:44,676 --> 01:22:47,446
to you on a platter, so to speak,
in the form of this variable

2048
01:22:47,586 --> 01:22:49,716
so that if you want the
value of Q, you just have

2049
01:22:49,716 --> 01:22:50,916
to look inside that variable.
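
A minimal sketch of reading the q parameter, assuming a hypothetical request such as /search.php?q=Harvard. The file name and the fallback to an empty string are assumptions, not part of the lecture.

```php
<?php
// For a request like /search.php?q=Harvard (hypothetical URL),
// PHP fills $_GET with ['q' => 'Harvard'].
$q = $_GET['q'] ?? '';   // empty string when q was not submitted

// Escape before echoing, since the value comes straight from the user.
echo 'You searched for: ', htmlspecialchars($q), "\n";
```

Run from the command line, $_GET is simply empty, so the fallback applies and the script still runs.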

2050
01:22:50,916 --> 01:22:52,816
And this is one of the things
that's compelling about PHP.

2051
01:22:52,816 --> 01:22:56,206
In contrast, in a language like Perl,
which was very popular years ago

2052
01:22:56,206 --> 01:22:59,696
for web programming, you either
jump through hoops or use an--

2053
01:22:59,696 --> 01:23:03,136
a popular library to actually
parse the HTTP requests

2054
01:23:03,306 --> 01:23:06,376
to get access to
the keys and values.

2055
01:23:06,476 --> 01:23:09,236
PHP and frameworks like Django
and Ruby on Rails make this

2056
01:23:09,236 --> 01:23:10,786
so much easier these days.

2057
01:23:10,786 --> 01:23:12,736
And PHP does this
through the superglobals.

2058
01:23:13,056 --> 01:23:14,956
$_POST, well, you can guess what

2059
01:23:14,956 --> 01:23:17,386
that does: anything you
post ends up in that array.

2060
01:23:17,716 --> 01:23:19,766
$_FILES is great, too.

2061
01:23:19,766 --> 01:23:22,066
If you do let the user
upload photos or whatnot,

2062
01:23:22,316 --> 01:23:25,106
you're handed the files in the
form of an array, you don't have

2063
01:23:25,106 --> 01:23:28,126
to parse it or figure out how to
deal with file uploads yourself,

2064
01:23:28,126 --> 01:23:29,356
super easy in that sense.

2065
01:23:29,716 --> 01:23:34,436
Some of the more esoteric
ones now are SERVER and ENV.

2066
01:23:34,996 --> 01:23:37,456
SERVER contains things
like the user's IP address,

2067
01:23:37,856 --> 01:23:38,916
their user agent.

2068
01:23:39,156 --> 01:23:40,076
What was user agent?

2069
01:23:41,026 --> 01:23:41,146
Yeah?

2070
01:23:42,026 --> 01:23:44,046
>> The browser and
the operating system.

2071
01:23:44,046 --> 01:23:46,216
>> The browser and the operating
system, that cryptic string

2072
01:23:46,216 --> 01:23:49,086
that is apparently being
sent every time the browser

2073
01:23:49,086 --> 01:23:49,606
visits you.

2074
01:23:49,956 --> 01:23:53,406
And this ENV variable is rarely
used but it gives you access

2075
01:23:53,466 --> 01:23:55,306
to lower level details
on the machine.

2076
01:23:55,636 --> 01:23:57,766
COOKIE is nice and we'll
come back to that next week.

2077
01:23:57,766 --> 01:24:01,546
But COOKIE stores cookies,
key values that you might send

2078
01:24:01,546 --> 01:24:02,906
or receive from browsers.

2079
01:24:03,236 --> 01:24:07,036
$_REQUEST has all of
the interesting details

2080
01:24:07,036 --> 01:24:08,086
about the user's request.

2081
01:24:08,356 --> 01:24:09,986
What path did they request?

2082
01:24:10,546 --> 01:24:14,876
What was-- Was there a question
mark in the URL with parameters?

2083
01:24:14,876 --> 01:24:17,886
So, you have access to the
raw details before they end

2084
01:24:17,886 --> 01:24:21,356
up in a more user-friendly place
like GET and POST and COOKIE.
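
A sketch of where these request details usually live, offered as a hedge rather than a definitive map. Per the PHP manual, $_REQUEST is by default a merged copy of the contents of $_GET, $_POST and $_COOKIE, while the client's address, the user agent, the requested path and the raw query string are exposed through $_SERVER. The 'unknown' fallbacks are assumptions for running outside a web server.

```php
<?php
// These keys are filled in by the web server; on the command line most
// are absent, hence the ?? fallbacks.
$ip    = $_SERVER['REMOTE_ADDR']     ?? 'unknown';  // the user's IP address
$agent = $_SERVER['HTTP_USER_AGENT'] ?? 'unknown';  // browser and operating system string
$uri   = $_SERVER['REQUEST_URI']     ?? 'unknown';  // requested path plus any ?query
$query = $_SERVER['QUERY_STRING']    ?? '';         // text after the question mark

echo "$ip\n$agent\n$uri\n$query\n";
```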

2085
01:24:21,716 --> 01:24:25,296
And SESSION is one of the
most powerful ones, arguably.

2086
01:24:25,296 --> 01:24:27,806
It is the thing that allows
you to implement a state

2087
01:24:28,016 --> 01:24:29,966
and implement things
like shopping carts.

2088
01:24:30,346 --> 01:24:33,766
Even though HTTP, as we sort
of began to discuss on Monday,

2089
01:24:33,766 --> 01:24:36,106
is stateless in that as
soon as you visit a page,

2090
01:24:36,106 --> 01:24:38,276
and you disconnect from the
server and the page is loaded,

2091
01:24:38,536 --> 01:24:40,746
you no longer have a connection
to the server anymore.

2092
01:24:41,146 --> 01:24:45,446
Via COOKIES, you can remember
or rather a server can remember

2093
01:24:45,446 --> 01:24:47,336
that you're logged in
and we likened the COOKIE

2094
01:24:47,336 --> 01:24:49,246
on Monday to like a hand stamp.

2095
01:24:49,246 --> 01:24:50,156
And what is SESSION?

2096
01:24:50,526 --> 01:24:52,756
SESSION is this amazing
superglobal in PHP

2097
01:24:52,756 --> 01:24:55,986
that you the programmer can
put anything you want in it,

2098
01:24:55,986 --> 01:24:59,426
any keys and values, any
numbers, any strings, any ISBNs,

2099
01:24:59,566 --> 01:25:01,656
of things a user put
in their shopping cart.

2100
01:25:02,126 --> 01:25:04,926
And the next time the
user visits your website,

2101
01:25:05,126 --> 01:25:07,856
so long as their cookie
hasn't expired, you can access

2102
01:25:07,856 --> 01:25:11,286
that exact same data
in $_SESSION,

2103
01:25:11,526 --> 01:25:12,896
magically, so to speak.

2104
01:25:12,896 --> 01:25:15,646
You don't have to worry about
figuring out who the user is.

2105
01:25:15,956 --> 01:25:18,316
PHP and in turn the
web server do all

2106
01:25:18,316 --> 01:25:20,156
that for you out of the box.

2107
01:25:20,266 --> 01:25:23,256
So again, another upside
of a language like this.

2108
01:25:24,206 --> 01:25:26,516
So let's actually see this
in action rather than talking

2109
01:25:26,516 --> 01:25:28,756
about it in the abstract.

2110
01:25:28,756 --> 01:25:34,066
So, last time, recall that we
had this file, let me go to cs75.net,

2111
01:25:34,066 --> 01:25:38,376
lectures, where we
posted a video and more.

2112
01:25:38,746 --> 01:25:40,606
And in our source
code directory,

2113
01:25:40,606 --> 01:25:42,596
typically if we write some
source code on the fly,

2114
01:25:42,596 --> 01:25:45,236
during lecture I'll clean it up
and then upload it the next day

2115
01:25:45,236 --> 01:25:46,946
to the server if you want to
play around so you don't have

2116
01:25:46,946 --> 01:25:48,176
to write down code and whatnot.

2117
01:25:48,506 --> 01:25:50,276
Or if we have some stuff in
advance, I'll put it there.

2118
01:25:50,276 --> 01:25:52,296
So this is from Monday,
and we had this site,

2119
01:25:52,296 --> 01:25:53,596
Google and Google Search.

2120
01:25:53,826 --> 01:25:56,446
And when I submitted this,
recall that if I search

2121
01:25:56,446 --> 01:26:01,306
for Harvard enter, I ended up
at-- enter, oh, we broke it.

2122
01:26:01,306 --> 01:26:02,716
I should fix-- I will fix this.

2123
01:26:02,716 --> 01:26:04,926
Recall that I broke it
at the very end of class,

2124
01:26:05,686 --> 01:26:09,076
by changing the value of Q
to something else altogether

2125
01:26:09,076 --> 01:26:12,156
because I think I said like QQQ
or something random like that.

2126
01:26:12,546 --> 01:26:16,636
So, let's now instead of using
Google to do our back end,

2127
01:26:16,636 --> 01:26:18,936
let's instead write
the back end ourselves.

2128
01:26:18,986 --> 01:26:20,456
So I'm going to go
ahead and do this.

2129
01:26:20,456 --> 01:26:23,336
First let me grab this page
source and I'm going to open

2130
01:26:23,336 --> 01:26:25,136
up our little text
editor as before.

2131
01:26:26,236 --> 01:26:28,776
And yeah, this is what
I did wrong last time.

2132
01:26:28,826 --> 01:26:32,406
So now it's back to Q. But this
time, I'm going to change this

2133
01:26:32,406 --> 01:26:34,546
to point at my own server.

2134
01:26:34,816 --> 01:26:37,246
So then a word on a server,
if I scroll over here,

2135
01:26:37,246 --> 01:26:40,496
this is my CS50 Appliance, the
virtual machine that in a week

2136
01:26:40,496 --> 01:26:42,436
and a half's time, we'll
start using as well,

2137
01:26:42,596 --> 01:26:44,636
and it's in Linux
computer, but more than that,

2138
01:26:44,636 --> 01:26:45,996
even though it looks
like a desktop

2139
01:26:45,996 --> 01:26:49,156
with a little Start-like menu
in Windows, it's still a server,

2140
01:26:49,326 --> 01:26:51,026
and I can see this as follows.

2141
01:26:51,026 --> 01:26:58,796
If I go ahead and inside of
the appliance I visit, Google,

2142
01:26:58,796 --> 01:27:03,556
I see Google, but if instead
I do http://localhost,

2143
01:27:03,876 --> 01:27:08,446
localhost is the common name
for a Linux computer when you're

2144
01:27:08,446 --> 01:27:09,706
on the Linux computer itself.

2145
01:27:09,706 --> 01:27:12,166
And this is true in Mac OS as
well and sort of in Windows.

2146
01:27:12,356 --> 01:27:14,166
Localhost refers to
the computer you're on.

2147
01:27:14,536 --> 01:27:19,546
So when I visit http://localhost
and let's just say slash,

2148
01:27:19,546 --> 01:27:25,016
enter, I should see the root
directory of the web server.

2149
01:27:25,186 --> 01:27:26,216
So this is what I'm seeing.

2150
01:27:26,216 --> 01:27:28,416
The fact that I'm seeing
this page, and actually,

2151
01:27:28,416 --> 01:27:30,436
it tells us literally what
it is, "This page is used

2152
01:27:30,436 --> 01:27:32,916
to test the proper operation
of the Apache HTTP server

2153
01:27:32,916 --> 01:27:33,976
after it's been installed.

2154
01:27:33,976 --> 01:27:36,106
If you can read this page, it
means the web server installed

2155
01:27:36,106 --> 01:27:37,626
at this site is working
properly but has not

2156
01:27:37,626 --> 01:27:38,476
yet been configured."

2157
01:27:38,716 --> 01:27:40,646
So that's great, that's
exactly what I wanted to see.

2158
01:27:40,646 --> 01:27:43,796
Some indication that the web server
is working and now it's up to me

2159
01:27:43,796 --> 01:27:46,166
to actually populate
it with some data.

2160
01:27:46,596 --> 01:27:48,446
Now I can do something else now.

2161
01:27:49,136 --> 01:27:51,026
And these kinds of steps
if you're unfamiliar,

2162
01:27:51,026 --> 01:27:53,916
we will explain in the first--
before the first project.

2163
01:27:54,116 --> 01:27:55,896
What I've done is, right
now, is I've opened

2164
01:27:55,896 --> 01:27:57,346
up a so-called terminal window.

2165
01:27:57,636 --> 01:28:00,286
This is an old school
black and white interface

2166
01:28:00,286 --> 01:28:02,726
for navigating the
contents of a computer.

2167
01:28:02,726 --> 01:28:04,486
It's like the DOS
prompt of yesteryear.

2168
01:28:04,486 --> 01:28:05,986
Mac OS, it's the
terminal window.

2169
01:28:06,136 --> 01:28:09,536
Windows sort of has an analog in
the command prompt, but it's not

2170
01:28:09,536 --> 01:28:11,956
as flexible as on
Linux and Mac OS.

2171
01:28:12,466 --> 01:28:14,066
And I can do a few things here.

2172
01:28:14,066 --> 01:28:16,586
Again, we'll document
this more in the future,

2173
01:28:16,826 --> 01:28:19,696
but this is fairly archean
command for making a directory,

2174
01:28:19,826 --> 01:28:22,416
mkdir, space, the
name of the directory.

2175
01:28:22,416 --> 01:28:24,106
Now I'm going to go
ahead and hit enter.

2176
01:28:24,336 --> 01:28:26,746
And what that will do for
me, ignore the control C,

2177
01:28:27,066 --> 01:28:29,766
is I can now do cd public_html

2178
01:28:29,996 --> 01:28:31,936
and that stands for
change directory.

2179
01:28:32,196 --> 01:28:35,506
And in change directory,
now I am inside of this,

2180
01:28:35,506 --> 01:28:37,646
so cd is like double
clicking a folder

2181
01:28:37,866 --> 01:28:40,306
in a modern operating system
which then opens a new window.

2182
01:28:40,606 --> 01:28:43,086
So cd has now put me
inside of public_html.

2183
01:28:43,576 --> 01:28:46,236
So now I'm going to
go ahead and do this.

2184
01:28:46,236 --> 01:28:50,796
I'm going to go ahead and run
a command like gedit hello

2185
01:28:50,796 --> 01:28:53,846
or let's do google.html.

2186
01:28:54,296 --> 01:28:56,646
Gedit happens to be a
text editor for Linux,

2187
01:28:56,646 --> 01:28:59,316
so it's like TextEdit,
it's like notepad.exe,

2188
01:28:59,376 --> 01:29:01,446
but this one's a little nicer

2189
01:29:01,446 --> 01:29:03,556
in that it supports something
called syntax highlighting,

2190
01:29:03,556 --> 01:29:05,296
whereby my code will
be colorized

2191
01:29:05,296 --> 01:29:06,396
to be more user-friendly.

2192
01:29:06,766 --> 01:29:08,876
So let me go ahead and copy
what we wrote on Monday

2193
01:29:09,316 --> 01:29:11,296
over here and paste it in.

2194
01:29:11,406 --> 01:29:13,276
So this is what I mean
by syntax highlighted.

2195
01:29:13,276 --> 01:29:14,896
It's just pink and
purple and whatnot,

2196
01:29:14,896 --> 01:29:17,926
just to draw our attention to
semantically the different parts

2197
01:29:18,106 --> 01:29:20,906
of the web page, and now I'm
going to go ahead and hit save.

2198
01:29:22,116 --> 01:29:24,576
So control S, or I
can go to file menu.

2199
01:29:24,576 --> 01:29:26,506
And now, let me go back
to that terminal window,

2200
01:29:26,506 --> 01:29:29,126
and again I'm back in Linux
here, and I'm going to go ahead

2201
01:29:29,126 --> 01:29:33,506
and do ls, and notice I have
a file, called google.html.

2202
01:29:33,846 --> 01:29:35,266
And I can do all
sorts of commands.

2203
01:29:35,266 --> 01:29:38,186
There's the cat command which
shows you the contents of files.

2204
01:29:38,186 --> 01:29:40,626
There's the more command which
shows you the contents of files.

2205
01:29:40,936 --> 01:29:42,076
You can do any number of things.

2206
01:29:42,076 --> 01:29:43,576
I can accidentally delete it

2207
01:29:43,576 --> 01:29:45,866
with the rm command,
don't do that.

2208
01:29:46,266 --> 01:29:49,506
But I can do all sorts of things
at the so-called command line

2209
01:29:49,786 --> 01:29:52,256
that I could with a mouse and
a keyboard, traditionally.

2210
01:29:52,896 --> 01:29:55,886
So, what's the takeaway here?

2211
01:29:56,226 --> 01:29:59,196
Now that I have google.html,
notice that I have it

2212
01:29:59,196 --> 01:30:02,426
in my public_html directory,
but if you can infer,

2213
01:30:02,496 --> 01:30:03,716
who am I at the moment?

2214
01:30:03,716 --> 01:30:05,096
What's my user name?

2215
01:30:06,616 --> 01:30:06,766
Yeah?

2216
01:30:08,086 --> 01:30:08,866
>> Jharvard?

2217
01:30:08,866 --> 01:30:10,286
>> Jharvard, so I
am John Harvard.

2218
01:30:10,286 --> 01:30:10,866
Why is that?

2219
01:30:10,866 --> 01:30:12,906
Well we configured this
particular virtual machine

2220
01:30:13,156 --> 01:30:14,976
with a generic username,
John Harvard,

2221
01:30:14,976 --> 01:30:17,386
so that anyone can use it
and so that in documentation

2222
01:30:17,386 --> 01:30:19,866
and whatnot, we can tell you
exactly what your username is.

2223
01:30:19,866 --> 01:30:20,966
It just gets a little
more annoying

2224
01:30:20,996 --> 01:30:22,446
if everyone has unique addresses

2225
01:30:22,446 --> 01:30:24,236
because troubleshooting
is harder and so forth.

2226
01:30:24,406 --> 01:30:26,696
So just assume you've signed
up for a web hosting company.

2227
01:30:26,906 --> 01:30:29,976
They have arbitrarily told you
your username will be jharvard

2228
01:30:29,976 --> 01:30:32,006
instead of A or B. So now I'm

2229
01:30:32,006 --> 01:30:34,116
in John Harvard's so-called
home directory,

2230
01:30:34,426 --> 01:30:36,276
the folder that I get
for all my storage.

2231
01:30:36,496 --> 01:30:39,576
And in there I created the
public_html subdirectory

2232
01:30:39,686 --> 01:30:40,346
or folder.

2233
01:30:40,526 --> 01:30:42,896
And in there, just to be clear,
what's inside of public_html

2234
01:30:43,406 --> 01:30:46,536
at this point in the story?

2235
01:30:46,656 --> 01:30:47,286
Google.html.

2236
01:30:47,486 --> 01:30:49,826
So how do I visit google.html?

2237
01:30:50,456 --> 01:30:53,986
Well, I'm going to open my
Chrome browser and rather

2238
01:30:53,986 --> 01:30:58,216
than visit just localhost,
I'm going to actually do this,

2239
01:30:58,216 --> 01:31:05,836
http://localhost/
~jharvard/google.html.

2240
01:31:06,236 --> 01:31:09,586
So this is a convention
on a lot of web servers.

2241
01:31:09,586 --> 01:31:12,656
When you want to access
a specific person's home

2242
01:31:12,656 --> 01:31:16,856
directory, you do slash tilde
username, slash filename.

2243
01:31:17,176 --> 01:31:18,866
You do not type what apparently?

2244
01:31:19,336 --> 01:31:20,846
public_html.

2245
01:31:20,976 --> 01:31:25,656
So public_html is implied by the
fact that you're using the URL,

2246
01:31:25,656 --> 01:31:28,186
so don't type public_html
in the URL itself.

2247
01:31:28,186 --> 01:31:29,936
And now I'm going to
go ahead and hit enter.

2248
01:31:30,316 --> 01:31:33,846
And voila-- damn it, broken.

2249
01:31:34,666 --> 01:31:35,996
So what does this mean?

2250
01:31:36,076 --> 01:31:38,126
First of all, which--
what's the status code here?

2251
01:31:38,126 --> 01:31:38,756
Has anyone spotted it?

2252
01:31:40,196 --> 01:31:40,306
Yeah?

2253
01:31:40,756 --> 01:31:41,076
>> Forbidden.

2254
01:31:41,556 --> 01:31:44,596
>> Forbidden, 403, you can see
it in the tab at the very top.

2255
01:31:44,596 --> 01:31:46,616
So that's one of those
more archean status codes.

2256
01:31:46,616 --> 01:31:48,466
404 is a little more
common, File Not Found.

2257
01:31:48,636 --> 01:31:50,266
File is there, but
I'm forbidden.

2258
01:31:50,266 --> 01:31:52,276
So just high level, in
English, what does this mean?

2259
01:31:52,726 --> 01:31:52,816
Yeah?

2260
01:31:53,816 --> 01:31:56,016
>> Didn't set the permissions.

2261
01:31:56,016 --> 01:31:57,066
>> I haven't set
the permissions.

2262
01:31:57,066 --> 01:31:59,146
So we talked earlier about the
idea of global permissions.

2263
01:31:59,396 --> 01:32:01,476
Now let's frame this
in a Linux context.

2264
01:32:01,476 --> 01:32:03,196
And again, Mac OS
is very similar.

2265
01:32:03,516 --> 01:32:07,386
Windows isn't quite the same
process, but the ideas exist

2266
01:32:07,386 --> 01:32:08,596
on all of these platforms.

2267
01:32:08,596 --> 01:32:11,126
So, let me do ls for list again.

2268
01:32:11,216 --> 01:32:13,426
This is like dir if you
come from a Windows world.

2269
01:32:13,676 --> 01:32:16,006
And I see google.html,
not all that enlightening.

2270
01:32:16,216 --> 01:32:17,466
But I can do a long listing.

2271
01:32:17,466 --> 01:32:20,036
So ls -l and then hit enter.

2272
01:32:20,326 --> 01:32:22,776
So -l, for those less
familiar with Linux

2273
01:32:22,776 --> 01:32:26,016
or unfamiliar is the switch,
the command line switch or flag

2274
01:32:26,016 --> 01:32:27,506
or option, whatever
you want to call it,

2275
01:32:27,736 --> 01:32:29,656
that modifies the
behavior of the command

2276
01:32:29,656 --> 01:32:31,566
which in this case
is called ls, enter.

2277
01:32:31,766 --> 01:32:33,186
And now I see more outputs.

2278
01:32:33,256 --> 01:32:34,096
What do I see now?

2279
01:32:34,686 --> 01:32:38,076
I see first who owns the
file, what is their group

2280
01:32:38,076 --> 01:32:40,026
and by default the
appliance is configured

2281
01:32:40,026 --> 01:32:42,266
so that there's a students group
and there's only one student

2282
01:32:42,266 --> 01:32:43,546
for everyone called jharvard.

2283
01:32:43,686 --> 01:32:44,986
But when you install
your appliance,

2284
01:32:44,986 --> 01:32:46,366
you're not sharing
the same appliance.

2285
01:32:46,366 --> 01:32:49,646
You have your own
copy of the appliance

2286
01:32:49,646 --> 01:32:51,606
with its own jharvard account.

2287
01:32:51,606 --> 01:32:54,616
This means it is 424 bytes

2288
01:32:54,616 --> 01:32:57,366
which means 424 characters
I typed in to that file.

2289
01:32:57,626 --> 01:32:58,996
This is when we last edited it.

2290
01:32:59,226 --> 01:33:00,366
This is the name of the file.

2291
01:33:00,366 --> 01:33:03,856
And I skip the most interesting
part which is over here.

2292
01:33:04,676 --> 01:33:07,056
Now, this is maybe
a little cryptic

2293
01:33:07,056 --> 01:33:09,346
but rw generally
denotes read and write.

2294
01:33:09,846 --> 01:33:11,996
And what we have
here is an indication

2295
01:33:11,996 --> 01:33:13,236
of three types of permissions.

2296
01:33:13,366 --> 01:33:14,816
So this is a very crash course.

2297
01:33:15,176 --> 01:33:16,876
Again, you don't need to
commit all of this to memory

2298
01:33:16,876 --> 01:33:19,296
yet because they'll come up
again in the actual projects.

2299
01:33:19,296 --> 01:33:21,706
But what we've just
done here is--

2300
01:33:22,096 --> 01:33:24,806
let me actually copy
and paste this.

2301
01:33:25,546 --> 01:33:27,326
We have this sequence here.

2302
01:33:27,866 --> 01:33:31,836
What in the world
does this mean?

2303
01:33:31,976 --> 01:33:33,626
Well first, I'm going to
cheat and I'm going to get rid

2304
01:33:33,626 --> 01:33:37,006
of this one, the first dash is
either a D if it's a directory,

2305
01:33:37,296 --> 01:33:39,556
or a hyphen if it's a
file, something else

2306
01:33:39,556 --> 01:33:40,296
if it's something else.

2307
01:33:40,296 --> 01:33:41,806
But for now, let's just
assume that directories

2308
01:33:41,806 --> 01:33:42,836
and files are all that exist.

2309
01:33:43,416 --> 01:33:46,016
So now there's this and let
me put some spaces then.

2310
01:33:46,306 --> 01:33:48,806
It looks like we have a
pattern of triples here.

2311
01:33:49,146 --> 01:33:54,216
The first triple is
the owner, so to speak.

2312
01:33:54,216 --> 01:33:58,076
The second sequence is the
group, in this case, students.

2313
01:33:58,116 --> 01:34:00,086
And then the last is the world.

2314
01:34:00,386 --> 01:34:01,926
So what is the implication
right now?

2315
01:34:01,956 --> 01:34:03,846
The owner can read
and write this file.

2316
01:34:04,386 --> 01:34:06,236
The group, students,
can read and write.

2317
01:34:06,526 --> 01:34:08,716
That feels a little
worrisome, but in this case,

2318
01:34:09,106 --> 01:34:11,126
the virtual machine
is on my own computer.

2319
01:34:11,516 --> 01:34:13,826
There's a students group
but I'm the only student.

2320
01:34:14,016 --> 01:34:15,856
So this is kind of immaterial.

2321
01:34:15,886 --> 01:34:18,526
So it's not great but not
bad, it doesn't really--

2322
01:34:18,526 --> 01:34:19,916
it's not applicable
at the moment.

2323
01:34:20,286 --> 01:34:21,656
The whole world though
can read this

2324
01:34:21,926 --> 01:34:23,866
and that's what I
want for an html file.

2325
01:34:24,226 --> 01:34:26,236
So it feels like my
permissions are right.
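A minimal sketch of how that ten-character string breaks down, assuming a scratch file in a temporary directory (demo_dir is a hypothetical path, not the lecture's public_html):
```shell
# Hypothetical scratch directory standing in for public_html.
demo_dir=$(mktemp -d)
touch "$demo_dir/google.html"
chmod 664 "$demo_dir/google.html"   # owner rw-, group rw-, world r--
# Keep only the first ten characters of the long listing.
mode=$(ls -l "$demo_dir/google.html" | cut -c1-10)
file_type=$(printf '%s' "$mode" | cut -c1)      # - for a file, d for a directory
owner_bits=$(printf '%s' "$mode" | cut -c2-4)   # first triple: the owner
group_bits=$(printf '%s' "$mode" | cut -c5-7)   # second triple: the group
world_bits=$(printf '%s' "$mode" | cut -c8-10)  # third triple: everyone else
echo "type=$file_type owner=$owner_bits group=$group_bits world=$world_bits"
rm -rf "$demo_dir"
```
The world triple is the one the web server falls under when it runs as a user other than the file's owner.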

2326
01:34:27,496 --> 01:34:29,496
What else could be wrong then?

2327
01:34:30,436 --> 01:34:33,066
Again, the context is that the web
server is running as Apache

2328
01:34:33,066 --> 01:34:35,356
or some username
that's not me right now,

2329
01:34:35,866 --> 01:34:38,266
but we have to give
him access to it.

2330
01:34:38,266 --> 01:34:38,333
Yeah?

2331
01:34:38,333 --> 01:34:40,246
>> I have a question.

2332
01:34:40,846 --> 01:34:40,936
>> OK.

2333
01:34:42,016 --> 01:34:43,066
[ Inaudible Remark ]

2334
01:34:43,066 --> 01:34:46,196
The last dash?

2335
01:34:46,526 --> 01:34:47,316
In this case, no.

2336
01:34:47,316 --> 01:34:49,856
This is actually OK and
others would be possible.

2337
01:34:49,856 --> 01:34:52,776
Technically rw or
r would be fine.

2338
01:34:52,996 --> 01:34:55,746
Or even rw nothing
r would be fine.

2339
01:34:55,976 --> 01:34:58,106
Point is that the world
has to be able to read it.

2340
01:34:58,446 --> 01:35:00,006
But what else does the
world have to be able

2341
01:35:00,006 --> 01:35:01,866
to have access to, do you think?

2342
01:35:02,436 --> 01:35:02,666
>> Directory.

2343
01:35:02,746 --> 01:35:03,866
>> The directory, right?

2344
01:35:03,866 --> 01:35:05,266
We got to go one level higher.

2345
01:35:05,266 --> 01:35:06,296
So how can I do this?

2346
01:35:06,296 --> 01:35:10,096
Well, when I did ls -l a moment
ago, I only saw the file.

2347
01:35:10,096 --> 01:35:15,436
Let me do ls -a -l, which is
all and long, or I can do this:

2348
01:35:15,786 --> 01:35:18,186
you can combine switches
typically in Linux just

2349
01:35:18,186 --> 01:35:19,976
for our convenience
like this, al.

2350
01:35:20,386 --> 01:35:21,386
Now I see more.

2351
01:35:21,666 --> 01:35:23,846
The first two lines
are dot and dot-dot.

2352
01:35:23,846 --> 01:35:27,706
What does dot represent
in a typical file system?

2353
01:35:29,496 --> 01:35:29,976
Sure.

2354
01:35:30,016 --> 01:35:31,656
[ Inaudible Remark ]

2355
01:35:31,656 --> 01:35:34,006
Close. What does dot represent?

2356
01:35:34,956 --> 01:35:35,876
Oh, let me change the question.

2357
01:35:35,876 --> 01:35:37,036
What does dot-dot represent?

2358
01:35:37,346 --> 01:35:37,836
Excellent.

2359
01:35:37,936 --> 01:35:40,236
It means the directory
above, so dot-dot.

2360
01:35:40,446 --> 01:35:42,256
So dot, though, by
contrast represents?

2361
01:35:42,846 --> 01:35:44,406
>> Current folder.

2362
01:35:44,546 --> 01:35:46,006
>> The current folder,
the one that you're in.

2363
01:35:46,006 --> 01:35:49,036
So dot is where you are,
dot-dot is your so-called parent

2364
01:35:49,216 --> 01:35:52,626
which just means the thing
you're inside of, that is--

2365
01:35:52,806 --> 01:35:55,366
the parent folder is the one
that your folder is inside of.

2366
01:35:55,646 --> 01:35:58,566
So dot here refers to a
directory called public_html.

2367
01:35:58,896 --> 01:36:01,956
Dot-dot refers to
my home directory.

2368
01:36:02,556 --> 01:36:07,446
And now-- I know what it is.

2369
01:36:08,216 --> 01:36:09,186
Damn it. OK.

2370
01:36:09,666 --> 01:36:11,606
So, I'm going to have to
fake the story slightly

2371
01:36:11,606 --> 01:36:13,136
for just a moment.

2372
01:36:13,136 --> 01:36:14,426
Everything is actually correct.

2373
01:36:14,596 --> 01:36:17,086
There is another secret
setting that I changed earlier

2374
01:36:17,086 --> 01:36:17,966
in the week while playing

2375
01:36:17,966 --> 01:36:19,536
with the virtual machine
that explains this.

2376
01:36:19,536 --> 01:36:22,876
It's a feature called SELinux
for security enhanced Linux

2377
01:36:22,876 --> 01:36:25,926
which disallows anyone including
John Harvard from using the web.

2378
01:36:26,256 --> 01:36:28,436
So let me see if I
can quickly fix this,

2379
01:36:29,066 --> 01:36:31,576
but this was a wonderful stroll
down the diagnostic techniques

2380
01:36:31,576 --> 01:36:33,416
that would have led
us to the solution.

2381
01:36:35,516 --> 01:36:48,476
[ Pause ]

2382
01:36:48,976 --> 01:36:51,476
Uh-huh, oops, and we go here.

2383
01:36:52,886 --> 01:36:55,556
OK. So, OK.

2384
01:36:56,026 --> 01:36:59,476
So, this is a detail you
will not trip over yourself

2385
01:36:59,476 --> 01:37:01,866
because by default what I just
did is already done for you.

2386
01:37:01,866 --> 01:37:03,876
It's just that I disabled that while
playing around the other day.

2387
01:37:04,166 --> 01:37:07,096
This was an additional security
mechanism called SELinux

2388
01:37:07,096 --> 01:37:10,146
which comes with flavors of
Linux like Fedora and CentOS,

2389
01:37:10,146 --> 01:37:13,466
and Red Hat, and it's meant to
lock down systems even more.

2390
01:37:13,946 --> 01:37:16,566
But it doesn't matter because the
story we told is still very much

2391
01:37:16,566 --> 01:37:17,036
the same.

2392
01:37:17,036 --> 01:37:20,326
In fact, I can simulate now how
we could have created a problem

2393
01:37:20,326 --> 01:37:21,906
for ourselves as follows.

2394
01:37:21,906 --> 01:37:23,246
Let me go into this directory

2395
01:37:23,466 --> 01:37:24,856
and everything now
looks correct.

2396
01:37:24,986 --> 01:37:27,786
All of this is good because
it means a few things.

2397
01:37:27,786 --> 01:37:29,766
Google.html is readable
by the world.

2398
01:37:30,136 --> 01:37:32,556
What do you think x means
for both dot and dot-dot?

2399
01:37:33,006 --> 01:37:33,376
>> Executable.

2400
01:37:33,716 --> 01:37:34,546
>> Executable.

2401
01:37:34,636 --> 01:37:38,056
Now normally, executable
means like execute a file,

2402
01:37:38,056 --> 01:37:40,606
run a program, but that's
not the case for directories

2403
01:37:40,606 --> 01:37:42,356
because notice the D and the D?

2404
01:37:42,506 --> 01:37:44,976
For directories, if a
directory is executable,

2405
01:37:44,976 --> 01:37:47,276
that means someone
can get into it.

2406
01:37:47,586 --> 01:37:50,836
They can't necessarily read it
and see the contents; that's read.

2407
01:37:50,956 --> 01:37:54,026
Execute means they can do
the equivalent of cd into it

2408
01:37:54,126 --> 01:37:57,416
or they can visit the URL that
contains that directory's name.

2409
01:37:57,646 --> 01:37:58,896
So the fact that this is x,

2410
01:37:58,896 --> 01:38:01,626
this is x and this is r
is actually perfect.

2411
01:38:01,756 --> 01:38:02,626
That's what we want.

2412
01:38:02,826 --> 01:38:04,506
But I can simulate
it being wrong.

2413
01:38:04,506 --> 01:38:07,276
Suppose that by default
when I'd created this file,

2414
01:38:08,236 --> 01:38:09,466
it looked like this.

2415
01:38:10,996 --> 01:38:12,416
What's wrong now
with this picture?

2416
01:38:13,346 --> 01:38:14,116
What jumps out at you?

2417
01:38:14,606 --> 01:38:14,673
Yeah.

2418
01:38:14,673 --> 01:38:18,266
>> That it's only read,
written by the owner

2419
01:38:18,266 --> 01:38:21,626
and no one else can access it.

2420
01:38:21,626 --> 01:38:21,896
>> Perfect.

2421
01:38:21,896 --> 01:38:24,626
Only readable and writable by the
owner, no one else can read it.

2422
01:38:24,626 --> 01:38:25,366
That's a problem.

2423
01:38:25,556 --> 01:38:27,286
So there's a bunch
of ways to fix this

2424
01:38:27,286 --> 01:38:29,856
but the way we'll
introduce for now is chmod

2425
01:38:29,856 --> 01:38:35,276
which is change mode and then
a for all aka everybody, plus,

2426
01:38:35,276 --> 01:38:39,086
what do I want to give everyone,
r. So a little arcane,

2427
01:38:39,086 --> 01:38:42,036
the syntax, but then this
command gives us what

2428
01:38:42,036 --> 01:38:42,346
we want.

2429
01:38:42,466 --> 01:38:44,426
Change the mode of
google.html

2430
01:38:44,656 --> 01:38:47,496
to give everyone r.
The plus means give,

2431
01:38:47,496 --> 01:38:48,876
minus means subtract.

2432
01:38:49,096 --> 01:38:52,946
So enter, ls -al and now
that problem is solved.
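A small sketch of that fix, assuming a scratch copy of google.html in a hypothetical temporary directory that starts out readable by the owner only:
```shell
demo_dir=$(mktemp -d)                 # hypothetical scratch directory
touch "$demo_dir/google.html"
chmod 600 "$demo_dir/google.html"     # simulate the broken state: owner-only
before=$(ls -l "$demo_dir/google.html" | cut -c1-10)
chmod a+r "$demo_dir/google.html"     # a = all, + = give, r = read
after=$(ls -l "$demo_dir/google.html" | cut -c1-10)
echo "$before -> $after"              # -rw------- -> -rw-r--r--
rm -rf "$demo_dir"
```
Swapping the plus for a minus (a-r) would take read access away from everyone again.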

2433
01:38:53,126 --> 01:38:56,776
By contrast, if the
directories looked like this,

2434
01:38:58,316 --> 01:39:00,486
propose to me how we
fix this problem now.

2435
01:39:00,486 --> 01:39:02,976
Now my dot and dot-dot
directories are no longer

2436
01:39:02,976 --> 01:39:05,926
executable which means my file
is readable but no one can get

2437
01:39:05,926 --> 01:39:07,446
into this directory via the web.

2438
01:39:08,086 --> 01:39:09,096
How do I fix this?

2439
01:39:13,856 --> 01:39:14,386
>> A plus x.

2440
01:39:14,386 --> 01:39:17,496
>> OK. Good, a+x for
executability and then the name

2441
01:39:17,496 --> 01:39:19,636
of the file which is--
or folder which is dot

2442
01:39:19,636 --> 01:39:21,606
and I can actually put
a space separated list

2443
01:39:21,606 --> 01:39:22,956
of these things on
the command line.

2444
01:39:22,956 --> 01:39:27,076
I can hit that and now ls
-al, we fix that problem, too.
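The same idea for a directory can be sketched like this, assuming a hypothetical scratch directory named public_html that starts out without any execute bits:
```shell
demo_dir=$(mktemp -d)                 # hypothetical scratch location
mkdir "$demo_dir/public_html"
chmod 644 "$demo_dir/public_html"     # readable, but nobody can enter it
before=$(ls -ld "$demo_dir/public_html" | cut -c1-10)
chmod a+x "$demo_dir/public_html"     # let everyone traverse the directory
after=$(ls -ld "$demo_dir/public_html" | cut -c1-10)
echo "$before -> $after"              # drw-r--r-- -> drwxr-xr-x
rm -rf "$demo_dir"
```
chmod accepts several paths in one call, so the lecture's space-separated form (chmod a+x . ..) fixes both directories at once.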

2445
01:39:27,296 --> 01:39:33,646
Now suppose I goofed and suppose
I do chmod a+x google.html you

2446
01:39:33,646 --> 01:39:36,116
can maybe guess what's
going to change.

2447
01:39:36,116 --> 01:39:38,806
So think to yourself what is
this line going to look like.

2448
01:39:39,186 --> 01:39:43,966
In just a second, now it
has an x everywhere as well.

2449
01:39:44,166 --> 01:39:45,356
Does this mean anything?

2450
01:39:45,506 --> 01:39:48,646
In this case, no, it's an
HTML file, it's a static file.

2451
01:39:48,866 --> 01:39:50,776
Making it executable
means nothing.

2452
01:39:50,776 --> 01:39:52,476
And so, is this going
to break anything?

2453
01:39:52,476 --> 01:39:54,876
No, it's just kind of
wrong in principle.

2454
01:39:55,186 --> 01:39:57,236
However, sometimes with PHP,

2455
01:39:57,236 --> 01:40:00,006
your PHP files need
to be executable.

2456
01:40:00,216 --> 01:40:02,976
That is not the case
on most web servers.

2457
01:40:03,386 --> 01:40:05,596
Typically, they just
need to be readable.

2458
01:40:06,416 --> 01:40:08,526
And we'll now see
some PHP, all right.

2459
01:40:08,756 --> 01:40:11,026
So that was a lot of
fun making google.html.

2460
01:40:11,276 --> 01:40:15,266
Now, let us pretend to
implement a Google server.

2461
01:40:15,266 --> 01:40:18,356
I'm going to go ahead
and hit New,

2462
01:40:18,896 --> 01:40:20,556
let me copy this temporarily.

2463
01:40:20,556 --> 01:40:23,486
So new file, I'm going to
save this as server.php.

2464
01:40:23,486 --> 01:40:25,496
So our very first PHP file,
we're going to pretend

2465
01:40:25,496 --> 01:40:28,726
to be Google for
a moment, enter.

2466
01:40:28,956 --> 01:40:31,316
And now I'm going to start, you
know, I'm going to cheat here

2467
01:40:31,736 --> 01:40:33,526
and say, you know
what, I don't want

2468
01:40:33,526 --> 01:40:34,946
to do any of these just yet.

2469
01:40:34,946 --> 01:40:38,256
I'm going to just do something
silly like coming soon.

2470
01:40:39,286 --> 01:40:41,906
So this, I argue, is PHP.

2471
01:40:42,576 --> 01:40:46,126
I named the file server.php,
I claim you now know PHP.

2472
01:40:46,126 --> 01:40:47,496
And why is that?

2473
01:40:47,676 --> 01:40:51,756
Well in the world of PHP you
can actually commingle HTML

2474
01:40:51,946 --> 01:40:54,266
and CSS with raw PHP code.

2475
01:40:54,516 --> 01:40:57,516
So the fact that I haven't
actually written any PHP code,

2476
01:40:57,736 --> 01:41:01,156
is actually kind of sad
because this is not PHP,

2477
01:41:01,226 --> 01:41:02,276
but this will still work.

2478
01:41:02,396 --> 01:41:04,066
So let's actually take
a look at what happens.

2479
01:41:04,436 --> 01:41:08,736
I'm going to go into google.html
now, which again we made Monday.

2480
01:41:09,166 --> 01:41:10,806
And I've already fixed
the query string.

2481
01:41:10,876 --> 01:41:13,716
But I don't want to go to
search on google.com now,

2482
01:41:13,916 --> 01:41:16,036
I'm instead going to
change this to server.php.

2483
01:41:16,036 --> 01:41:19,386
In other words, when I submit
this form now, I want it going

2484
01:41:19,386 --> 01:41:21,656
to my own file just
to see what happens.
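A rough sketch of what the edited form might look like, written out from the shell; the markup is an assumption (the real google.html from Monday is not shown here), and only the action of server.php and the field name q come from the lecture:
```shell
demo_dir=$(mktemp -d)   # hypothetical scratch directory
# Quoted 'EOF' keeps the shell from expanding anything inside the heredoc.
cat > "$demo_dir/google.html" <<'EOF'
<!DOCTYPE html>
<html>
  <head><title>Google</title></head>
  <body>
    <form action="server.php" method="get">
      <input name="q" type="text">
      <input type="submit" value="Google Search">
    </form>
  </body>
</html>
EOF
# Confirm where the form now submits.
form_action=$(grep -o 'action="[^"]*"' "$demo_dir/google.html")
echo "$form_action"
rm -rf "$demo_dir"
```
Because the method is get, whatever is typed into the q field ends up in the query string of the request to server.php.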

2485
01:41:22,096 --> 01:41:23,636
So let's go ahead
and pull this up.

2486
01:41:23,966 --> 01:41:27,066
And let me go ahead and type
in Harvard again, enter.

2487
01:41:27,806 --> 01:41:29,176
Wait a minute, something
is wrong.

2488
01:41:30,286 --> 01:41:31,056
What did I do that's wrong?

2489
01:41:31,646 --> 01:41:35,766
I did not implement
this certainly.

2490
01:41:35,876 --> 01:41:35,976
Yeah.

2491
01:41:36,031 --> 01:41:38,031
[ Inaudible Remark ]

2492
01:41:38,046 --> 01:41:38,686
Perfect, right?

2493
01:41:39,106 --> 01:41:40,206
Stupid mistake, right?

2494
01:41:40,206 --> 01:41:40,956
Caching, right?

2495
01:41:40,956 --> 01:41:42,426
The browser has to be reloaded

2496
01:41:42,426 --> 01:41:44,366
to actually get the
new copy of the HTML.

2497
01:41:44,366 --> 01:41:48,476
So let's hit the back button,
and let's then reload here.

2498
01:41:48,636 --> 01:41:50,066
And now, let me do
a sanity check.

2499
01:41:50,066 --> 01:41:50,806
I'm going to right click

2500
01:41:50,806 --> 01:41:52,846
and view page source,
now it's correct.

2501
01:41:52,986 --> 01:41:55,526
This is what the browser
is now seeing, server.php.

2502
01:41:55,526 --> 01:41:57,716
So here we go, I'm
going to search

2503
01:41:57,716 --> 01:42:00,486
for Harvard now and hit enter.

2504
01:42:01,616 --> 01:42:03,216
Hmm, problem.

2505
01:42:03,296 --> 01:42:05,816
So this is a security feature
that's actually provided

2506
01:42:05,816 --> 01:42:06,436
by suPHP.

2507
01:42:06,436 --> 01:42:11,966
Just for good measure, suPHP
does not want your PHP files

2508
01:42:11,966 --> 01:42:13,016
to be writable, why?

2509
01:42:13,156 --> 01:42:15,576
Because if you screw up,
if the file is writable,

2510
01:42:15,706 --> 01:42:17,896
you could change the
file itself somehow.

2511
01:42:18,116 --> 01:42:22,876
So we can fix this using what we
know already of chmod, ls -al--

2512
01:42:23,246 --> 01:42:26,656
oops-- ls -al, the problem is

2513
01:42:26,656 --> 01:42:29,086
that the PHP file is
writable by group.

2514
01:42:29,086 --> 01:42:34,306
How do I take away that W
from my group do you think?

2515
01:42:34,796 --> 01:42:34,916
Yeah.

2516
01:42:34,916 --> 01:42:37,696
>> Use G minus [inaudible].

2517
01:42:37,786 --> 01:42:40,866
>> Perfect, g-w
for server.php, enter.

2518
01:42:41,096 --> 01:42:43,906
And now I do ls -al
and that's OK.

2519
01:42:43,906 --> 01:42:46,056
And you know what I'm going
to do one more thing chmod,

2520
01:42:46,546 --> 01:42:50,716
I'm going to do a-r
on server.php.

2521
01:42:50,936 --> 01:42:53,666
And now, here is the output.

2522
01:42:53,726 --> 01:42:55,226
This is actually wrong now.

2523
01:42:55,436 --> 01:42:56,676
I need to give myself read access back.

2524
01:42:56,676 --> 01:43:02,056
So chmod owner, o+r
on server.php, ls -al--

2525
01:43:03,096 --> 01:43:06,446
oops-- let's cheat here.

2526
01:43:07,126 --> 01:43:11,166
So now what do we see?

2527
01:43:11,406 --> 01:43:17,816
OK. So now, I argue that
this is sufficient for PHP.

2528
01:43:18,096 --> 01:43:22,386
Whereas JavaScript and HTML
and CSS and GIFs and PNGs

2529
01:43:22,386 --> 01:43:25,176
and JPEGs need to
be readable by all,

2530
01:43:25,526 --> 01:43:29,466
I argue now that PHP files
only have to be readable by me.
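The sequence of changes can be sketched as follows, assuming a scratch server.php in a hypothetical temporary directory. One detail worth stating: in chmod's symbolic modes the owner is u (user), while o means others, so the step that restores the owner's read bit is u+r.
```shell
demo_dir=$(mktemp -d)                  # hypothetical scratch directory
touch "$demo_dir/server.php"
chmod 664 "$demo_dir/server.php"       # starting point: group-writable
start=$(ls -l "$demo_dir/server.php" | cut -c1-10)
chmod g-w "$demo_dir/server.php"       # take write away from the group
no_group_write=$(ls -l "$demo_dir/server.php" | cut -c1-10)
chmod a-r "$demo_dir/server.php"       # take read away from everyone
no_read=$(ls -l "$demo_dir/server.php" | cut -c1-10)
chmod u+r "$demo_dir/server.php"       # give read back to the owner only
owner_only=$(ls -l "$demo_dir/server.php" | cut -c1-10)
echo "$start -> $no_group_write -> $no_read -> $owner_only"
rm -rf "$demo_dir"
```
The final state, -rw-------, is the one the lecture argues is sufficient for a PHP file under suPHP; chmod 600 server.php would reach it in a single step.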

2531
01:43:30,646 --> 01:43:31,696
Why this distinction?

2532
01:43:31,696 --> 01:43:38,186
Why does this make
sense in the context

2533
01:43:38,186 --> 01:43:39,546
of what we've discussed
thus far today?

2534
01:43:40,286 --> 01:43:40,376
Yeah.

2535
01:43:41,676 --> 01:43:43,806
>> This is just wild guess.

2536
01:43:43,806 --> 01:43:47,106
That made the PHPs just
run on the server not

2537
01:43:47,106 --> 01:43:49,316
by the actual user
on the other side--

2538
01:43:49,316 --> 01:43:49,656
>> Perfect.

2539
01:43:49,656 --> 01:43:52,816
>> -- it's just getting
what the PHP needs to,

2540
01:43:52,816 --> 01:43:54,066
in which is irrelevant
in this case.

2541
01:43:54,296 --> 01:43:54,906
>> Exactly.

2542
01:43:55,006 --> 01:43:59,036
So whereas static files
like JavaScript, CSS, HTML,

2543
01:43:59,036 --> 01:44:01,836
JPEG are ultimately sent
literally to the user

2544
01:44:01,836 --> 01:44:03,666
to be viewed and
seen by him or her.

2545
01:44:03,666 --> 01:44:07,566
PHP is meant to be first
interpreted by the server

2546
01:44:07,716 --> 01:44:09,486
and then the server
will send the output

2547
01:44:09,536 --> 01:44:11,966
of that PHP file to the browser.

2548
01:44:12,216 --> 01:44:14,046
Now at the moment, we have
kind of a silly example.

2549
01:44:14,046 --> 01:44:16,736
Inside of server.php is
no PHP code whatsoever.

2550
01:44:16,736 --> 01:44:19,036
What's inside of
there, just HTML.

2551
01:44:19,386 --> 01:44:22,206
So what's going to happen when
I reload the page, and resubmit

2552
01:44:22,206 --> 01:44:25,366
that form, the web server
Apache is going to realize, "Oh,

2553
01:44:25,506 --> 01:44:27,496
you have submitted a
form to a PHP file."

2554
01:44:27,496 --> 01:44:30,006
Why? Because it ends in .php.

2555
01:44:30,006 --> 01:44:33,556
I am configured because of
the way the LAMP stack works

2556
01:44:33,826 --> 01:44:37,876
to interpret .php files
using the PHP interpreter

2557
01:44:37,876 --> 01:44:40,146
which is just a program
that understands PHP.

2558
01:44:40,146 --> 01:44:42,966
Now, the PHP interpreter is
going to look for PHP code.

2559
01:44:43,316 --> 01:44:44,906
Anything that's not PHP code,

2560
01:44:44,906 --> 01:44:47,016
it's designed to just
spit out raw.

2561
01:44:47,266 --> 01:44:50,306
So anything in the file,
even if it ends in .php,

2562
01:44:50,306 --> 01:44:52,226
if it's not PHP code itself,

2563
01:44:52,226 --> 01:44:54,176
it just gets sent
raw to the browser.

2564
01:44:54,376 --> 01:44:56,076
So what is the user going
to see in this case?

2565
01:44:56,246 --> 01:44:59,046
Literally all of my HTML because
I haven't written a single line

2566
01:44:59,046 --> 01:45:00,376
of PHP code yet.

2567
01:45:00,836 --> 01:45:04,116
But the point though is that
because it did end in PHP,

2568
01:45:04,176 --> 01:45:07,596
the principle is the same, only
the web server has to be able

2569
01:45:07,596 --> 01:45:10,526
to read that PHP file in
order to interpret it.

2570
01:45:10,656 --> 01:45:17,206
But who is the web server going
to be running as for PHP files?

2571
01:45:17,676 --> 01:45:17,756
Yeah.

2572
01:45:18,816 --> 01:45:19,166
>> Jharvard

2573
01:45:19,166 --> 01:45:22,276
>> Jharvard, because
of the suPHP feature.

2574
01:45:22,276 --> 01:45:24,366
Substitute user PHP, means

2575
01:45:24,366 --> 01:45:29,176
for any PHP files substitute the
user who owns the file so that,

2576
01:45:29,356 --> 01:45:31,436
the security mechanism
we discussed is in place.

2577
01:45:31,826 --> 01:45:34,926
So I'm going back to my
browser, I'm going to go back

2578
01:45:34,986 --> 01:45:38,066
to the form, I'm going
to resubmit Harvard

2579
01:45:38,066 --> 01:45:39,406
to my fake Google search.

2580
01:45:39,406 --> 01:45:42,856
And now enter. Now,
notice the URL

2581
01:45:43,366 --> 01:45:47,516
is server.php?q=Harvard.
Coming Soon.

2582
01:45:47,796 --> 01:45:49,296
Now, let's write some PHP code.

2583
01:45:49,296 --> 01:45:51,156
One of the most powerful
things you can do

2584
01:45:51,156 --> 01:45:55,046
in a dynamic website is actually
spit out what the user has done.

2585
01:45:55,356 --> 01:45:59,156
So here is my PHP code, rather--
well, it's sort of meaningless

2586
01:45:59,206 --> 01:46:02,556
because there is no PHP,
let me-- your server.php.

2587
01:46:02,556 --> 01:46:06,276
Instead of coming soon,
let me do something like,

2588
01:46:06,676 --> 01:46:12,196
"You wanted to search
for:" let me do a bold tag,

2589
01:46:12,646 --> 01:46:16,066
and let me really cheat
now, harvard, save this.

2590
01:46:16,726 --> 01:46:16,956
All right.

2591
01:46:16,956 --> 01:46:19,756
Now, nobody should be fooled
by this, when I go back here,

2592
01:46:20,416 --> 01:46:22,526
go back, do I have
to reload the form?

2593
01:46:23,106 --> 01:46:25,926
No, because I only changed
the server.php file.

2594
01:46:25,926 --> 01:46:27,836
You don't need to
refresh everything.

2595
01:46:27,836 --> 01:46:29,236
I didn't change google.html.

2596
01:46:29,286 --> 01:46:32,816
Let me go ahead and click
Google search, oh my God,

2597
01:46:33,066 --> 01:46:34,166
we now have a dynamic website.

2598
01:46:34,166 --> 01:46:36,766
I typed Harvard and Harvard
appeared on the screen,

2599
01:46:37,146 --> 01:46:38,226
but not really, right?

2600
01:46:38,226 --> 01:46:41,916
Because if I go back
again, and I type in Yale

2601
01:46:42,396 --> 01:46:44,566
and click Google Search, OK,
I'm clearly cheating.

2602
01:46:44,796 --> 01:46:47,186
So let's be a little
more genuinely dynamic.

2603
01:46:47,186 --> 01:46:49,896
Let's go here, and I don't
want to spit out Harvard.

2604
01:46:50,376 --> 01:46:53,136
But based on the discussion
of superglobals earlier,

2605
01:46:53,246 --> 01:46:55,646
where in the world can we
find what the user typed

2606
01:46:55,646 --> 01:46:56,206
in for queue?

2607
01:46:56,706 --> 01:46:58,976
Yeah, go ahead, yup.

2608
01:46:59,516 --> 01:47:01,736
[ Inaudible Remark ]

2609
01:47:02,236 --> 01:47:03,756
In the $_GET superglobal, yeah.

2610
01:47:03,756 --> 01:47:05,316
So let's do this, we now need

2611
01:47:05,536 --> 01:47:07,796
to insert the value
of that variable.

2612
01:47:07,796 --> 01:47:11,896
And you might just want to do
this, $_GET, here is the syntax

2613
01:47:11,896 --> 01:47:14,126
for going into a superglobal.

2614
01:47:14,326 --> 01:47:17,096
You do square brackets,
quote and quote the name

2615
01:47:17,096 --> 01:47:19,156
of the thing you want
to get, close bracket.

2616
01:47:19,156 --> 01:47:22,066
All right, so this is
a superglobal itself

2617
01:47:22,336 --> 01:47:23,936
but it's more specifically
an associative

2618
01:47:23,936 --> 01:47:27,326
array, otherwise known
as a hash table, hash map,

2619
01:47:27,326 --> 01:47:28,406
whatever you're familiar with.

2620
01:47:28,596 --> 01:47:31,986
And that means you index into
it using not numbers but words

2621
01:47:31,986 --> 01:47:35,336
or letters, and what you get out
of it is the key-- the values.

2622
01:47:35,926 --> 01:47:38,856
So in this case, we should
get back H-A-R-V-A-R-D

2623
01:47:38,856 --> 01:47:41,176
or Y-A-L-E, but not quite.

2624
01:47:41,436 --> 01:47:43,486
So, let me try this just
to prove that I'm wrong.

2625
01:47:43,876 --> 01:47:46,356
Let me go back here,
real search.

2626
01:47:47,856 --> 01:47:50,116
OK, clearly not what
I want but I need

2627
01:47:50,116 --> 01:47:52,846
to tell the server,
here is PHP code.

2628
01:47:52,846 --> 01:47:54,806
Otherwise, it's just
cryptic looking English.

2629
01:47:55,006 --> 01:47:58,486
The means by which I do that
is I have to enter PHP mode,

2630
01:47:58,736 --> 01:48:02,336
open bracket question
mark PHP space.

2631
01:48:02,926 --> 01:48:05,686
And then on the end
kind of the opposite,

2632
01:48:05,686 --> 01:48:07,396
question mark close bracket.

2633
01:48:07,716 --> 01:48:11,476
If you've come from the world of
ASP in Windows or JSP in Java,

2634
01:48:11,476 --> 01:48:14,136
you might have seen similar
tags, this just means,

2635
01:48:14,246 --> 01:48:17,616
enter PHP mode, do
something, exit PHP mode.

2636
01:48:17,896 --> 01:48:19,466
So let's see what the
end result is here.

2637
01:48:20,066 --> 01:48:24,956
Let me go back to Google,
reverse, Google search

2638
01:48:24,956 --> 01:48:27,336
for Yale, interesting.

2639
01:48:27,996 --> 01:48:29,536
What is missing here now?

2640
01:48:30,006 --> 01:48:32,666
What did I do wrong?

2641
01:48:33,826 --> 01:48:33,946
Yeah?

2642
01:48:34,076 --> 01:48:40,626
>> Well, you have to actually
set the value of the GET.

2643
01:48:40,806 --> 01:48:41,436
>> Exactly.

2644
01:48:41,816 --> 01:48:44,206
So think about any programming
language you know, generally,

2645
01:48:44,206 --> 01:48:45,926
if you want to print
the value of variable,

2646
01:48:46,116 --> 01:48:47,726
it's not sufficient
just to write the name

2647
01:48:47,726 --> 01:48:48,986
of the variable in your program.

2648
01:48:49,516 --> 01:48:49,616
>> Echo.

2649
01:48:50,856 --> 01:48:53,056
>> Echo would work but we
have a couple of options here.

2650
01:48:53,056 --> 01:48:56,166
We can say echo,
literally, we can say print

2651
01:48:56,306 --> 01:48:59,446
and then we can do a parenthesis
to make an actual function call.

2652
01:48:59,446 --> 01:49:00,416
I'll go with this one for now.

2653
01:49:00,416 --> 01:49:02,156
But echo is also
a viable option,

2654
01:49:02,346 --> 01:49:06,016
and now we're explicitly telling
the interpreter, print the value

2655
01:49:06,016 --> 01:49:07,636
of this variable here.

2656
01:49:07,636 --> 01:49:10,986
So let's go back to my
browser, go back, resubmit Yale.

2657
01:49:11,916 --> 01:49:14,166
And now, we have
some dynamism to it.

2658
01:49:14,336 --> 01:49:14,446
Yeah.

2659
01:49:14,616 --> 01:49:17,126
>> Is there a difference
between echo and print?

2660
01:49:17,126 --> 01:49:18,576
>> Is there a difference
between echo and print?

2661
01:49:18,576 --> 01:49:19,486
Not really.

2662
01:49:19,486 --> 01:49:22,406
Print behaves like a function,
echo is a language construct,

2663
01:49:22,406 --> 01:49:23,586
and there are crazy people
on the internet

2664
01:49:23,586 --> 01:49:25,606
that have done benchmarks
comparing print and echo.

2665
01:49:25,856 --> 01:49:27,656
And every blog post
that I've read,

2666
01:49:27,656 --> 01:49:28,696
pretty much says
they're equivalent.

2667
01:49:29,206 --> 01:49:31,696
Now, except for microseconds
or milliseconds

2668
01:49:31,696 --> 01:49:34,156
if you're echoing millions of
things, but for all intents

2669
01:49:34,156 --> 01:49:35,186
and purposes, they're the same.

2670
01:49:36,116 --> 01:49:37,586
So we can do something
else here.

2671
01:49:37,586 --> 01:49:39,676
And now this is a religious
thing that I'm sure some people

2672
01:49:39,676 --> 01:49:41,296
on the Internet will
hate me for saying.

2673
01:49:41,606 --> 01:49:44,746
But, I've always thought
this is an atrocious construct

2674
01:49:44,806 --> 01:49:46,286
for saying enter PHP mode.

2675
01:49:46,536 --> 01:49:49,636
And indeed, PHP also supports
what are called short tags,

2676
01:49:49,636 --> 01:49:51,886
open bracket question
mark and that's it.

2677
01:49:52,086 --> 01:49:54,336
Now, there are corner
cases you can get into

2678
01:49:54,336 --> 01:49:56,776
and if you read the crazy
religious debates online,

2679
01:49:56,776 --> 01:49:57,256
you'll see that, one

2680
01:49:57,256 --> 01:49:59,056
of the reasonably
compelling reasons is that,

2681
01:49:59,396 --> 01:50:03,126
if a web server is not
configured with support

2682
01:50:03,126 --> 01:50:05,316
for short tags, this is
a short tag, because why?

2683
01:50:05,316 --> 01:50:06,876
It's shorter than I
what previously typed.

2684
01:50:07,296 --> 01:50:08,446
Then you do run the risk

2685
01:50:08,446 --> 01:50:11,656
of having your raw PHP code
transmitted to the user

2686
01:50:11,656 --> 01:50:13,626
as though it's just
HTML or the like at

2687
01:50:13,626 --> 01:50:15,736
which point you've
disclosed the sanctity

2688
01:50:15,736 --> 01:50:17,506
of your intellectual
property, or worse,

2689
01:50:17,506 --> 01:50:18,656
your user names and passwords.

2690
01:50:18,986 --> 01:50:20,456
So that's kind of a legitimate concern.

2691
01:50:20,756 --> 01:50:24,096
But if you are running your own
web server, and have control

2692
01:50:24,096 --> 01:50:26,616
over the short tags feature
in a file called php.ini,

2693
01:50:27,386 --> 01:50:29,636
which is config file, I think
we mentioned briefly on Monday,

2694
01:50:29,816 --> 01:50:30,966
that will be on the
appliance for you

2695
01:50:30,966 --> 01:50:31,966
to tinker with if you want.

2696
01:50:32,426 --> 01:50:33,856
Frankly, I just think
there's an elegance

2697
01:50:33,856 --> 01:50:34,766
about the symmetry of this.

2698
01:50:34,976 --> 01:50:36,956
But typically when
you're writing code,

2699
01:50:37,146 --> 01:50:39,586
that won't necessarily
run on your own server

2700
01:50:39,826 --> 01:50:43,356
but could be posted
as open source code,

2701
01:50:43,666 --> 01:50:45,486
or you're writing it
for corporate project

2702
01:50:45,486 --> 01:50:47,926
where you don't have control
over the web servers themselves,

2703
01:50:48,206 --> 01:50:50,596
the first way I did it with
open bracket question mark PHP is the

2704
01:50:50,596 --> 01:50:51,126
preferred way.

2705
01:50:51,126 --> 01:50:53,236
Because it's more portable,
it's not going to break.

2706
01:50:53,366 --> 01:50:55,486
Because the worst thing,
is if you download code

2707
01:50:55,726 --> 01:50:57,806
that someone else has written
and it's all short tags

2708
01:50:57,806 --> 01:50:59,636
and your web server
doesn't support short tags

2709
01:50:59,636 --> 01:51:01,156
and you might not
control your web server

2710
01:51:01,156 --> 01:51:03,496
because it's a third party web
host, it's a pain in the neck.

2711
01:51:03,496 --> 01:51:04,726
You go through thousands
of lines

2712
01:51:04,726 --> 01:51:07,166
of your own code
changing your short tags

2713
01:51:07,166 --> 01:51:09,006
to long tags or vice versa.

2714
01:51:09,946 --> 01:51:12,436
So just FYI, you'll
see both tricks online.

2715
01:51:13,276 --> 01:51:16,866
So this is nice but can
we do better than this?

2716
01:51:16,866 --> 01:51:19,876
Well, let's actually try
something a little more general.

2717
01:51:19,876 --> 01:51:23,506
Let me go in here instead,
let me create a new form

2718
01:51:24,186 --> 01:51:28,696
and let's do a few different
data types this time.

2719
01:51:28,696 --> 01:51:31,306
Let me go ahead here and paste
this in just to get it started.

2720
01:51:31,706 --> 01:51:34,406
And then I'll have
a registration form,

2721
01:51:35,116 --> 01:51:37,906
and center Google Registration.

2722
01:51:38,496 --> 01:51:40,736
Again, we'll do register dot--

2723
01:51:40,736 --> 01:51:41,846
or this time we'll
do register.php.

2724
01:51:41,846 --> 01:51:44,916
And let's do a few
things this time.

2725
01:51:44,916 --> 01:51:47,556
I'm going to do input,
name, equals name.

2726
01:51:48,666 --> 01:51:51,426
And I'm going to say, let's
do this quick and dirty

2727
01:51:51,496 --> 01:51:54,536
for a registration form for like
a conference or student group

2728
01:51:54,536 --> 01:51:55,506
or something like that,

2729
01:51:55,996 --> 01:51:58,496
input name equals
name, type equals text.

2730
01:51:58,826 --> 01:52:01,206
And now, let's do
a line break here,

2731
01:52:01,756 --> 01:52:04,986
and let's just do another
something here, like,

2732
01:52:06,096 --> 01:52:09,426
let's do Gender and
let's do this check--

2733
01:52:09,576 --> 01:52:13,436
or rather, radio,
for something like gender.

2734
01:52:14,386 --> 01:52:20,606
And then I'll say value equals
M for male and I'll say M here.

2735
01:52:20,666 --> 01:52:24,596
And then I'll say over here,
input type equals name--

2736
01:52:24,786 --> 01:52:32,076
nope gender-- nope, name equals
gender, type equals radio,

2737
01:52:32,076 --> 01:52:35,466
value equals F, and
now I'll put F here.

2738
01:52:35,756 --> 01:52:38,926
And then should we do
one more, let's do one,

2739
01:52:38,976 --> 01:52:40,996
just a simple drop
down down here.

2740
01:52:41,256 --> 01:52:49,286
Let's do a select, name
equals states, close select.

2741
01:52:50,166 --> 01:52:55,056
Let's do this here, option value
equals let's say Connecticut,

2742
01:52:55,776 --> 01:52:58,756
close option, and Massachusetts.

2743
01:52:58,756 --> 01:52:59,996
So our registration form

2744
01:52:59,996 --> 01:53:02,116
for whatever reason will only
support people from Connecticut

2745
01:53:02,116 --> 01:53:03,096
or Massachusetts just

2746
01:53:03,096 --> 01:53:05,476
so we don't get bored
typing them all out.

2747
01:53:05,476 --> 01:53:08,886
OK, so I've made a very
quick and dirty form in--

2748
01:53:09,426 --> 01:53:12,446
sadly a file called google.php.

2749
01:53:12,446 --> 01:53:13,666
So, I'll restore that later

2750
01:53:13,666 --> 01:53:14,996
so you can have the
original code back.

2751
01:53:15,756 --> 01:53:17,546
Let's go ahead and save
this as something else.

2752
01:53:17,806 --> 01:53:19,676
So, register.html.

2753
01:53:20,576 --> 01:53:22,976
OK. So, now let me pull
this up in my browser.

2754
01:53:23,586 --> 01:53:26,906
Server is going to
change to register.html.

2755
01:53:27,286 --> 01:53:30,046
OK. So there we have a pretty
atrocious-looking website.

2756
01:53:30,256 --> 01:53:32,646
And in fact I've omitted one
of the more important pieces.

2757
01:53:33,736 --> 01:53:35,346
So, what do we need?

2758
01:53:35,946 --> 01:53:38,086
It'll be nice if we
had a submit button.

2759
01:53:38,086 --> 01:53:42,376
So, let's go in here,
input type equals submit,

2760
01:53:42,716 --> 01:53:47,756
value equals register,
close brackets, reload.

2761
01:53:47,896 --> 01:53:50,106
OK. So, there's our
very simple website.

2762
01:53:50,106 --> 01:53:51,936
It's a little more interesting
than our fake Google site

2763
01:53:51,936 --> 01:53:54,606
because at least now we have a
couple of user input mechanisms

2764
01:53:54,606 --> 01:53:55,756
that we didn't have before.

2765
01:53:55,986 --> 01:53:57,086
So, then let's now look

2766
01:53:57,086 --> 01:53:58,906
on the back end what
we're going to get.

2767
01:53:59,226 --> 01:54:01,836
So, first, let me fill
this out as a sample.

2768
01:54:01,836 --> 01:54:04,246
David, male, we'll change
this to Massachusetts,

2769
01:54:04,246 --> 01:54:05,596
and now I'm going
to click register.

2770
01:54:05,596 --> 01:54:07,586
But let me zoom out so we
can see the URL change.

2771
01:54:08,026 --> 01:54:11,826
Register, and now register.php
was not found on the server

2772
01:54:11,826 --> 01:54:13,876
but that makes sense because
we haven't created it yet.

2773
01:54:14,226 --> 01:54:15,286
So, let's go ahead and do that.

2774
01:54:15,286 --> 01:54:17,536
Let me go back to
my text editor,

2775
01:54:18,096 --> 01:54:22,326
let me copy this temporarily,
make a new file, paste this in,

2776
01:54:22,326 --> 01:54:24,586
we'll call this register.php.

2777
01:54:24,586 --> 01:54:28,716
And I want to say
here registered

2778
01:54:29,286 --> 01:54:36,326
and we'll say something like,
"Hello", open brackets, print,

2779
01:54:36,486 --> 01:54:39,286
dollar sign, underscore,
GET, name,

2780
01:54:39,936 --> 01:54:42,456
close bracket, close
bracket there.

2781
01:54:42,906 --> 01:54:44,246
So, let's take this
one step at a time.

2782
01:54:44,356 --> 01:54:45,946
First, I'm just going
to say hello

2783
01:54:46,036 --> 01:54:48,816
to whoever it was
that registered, OK?

2784
01:54:49,376 --> 01:54:51,006
So, let's get back
over to the browser.

2785
01:54:51,466 --> 01:54:54,136
We'll go back, we
submit the form

2786
01:54:54,756 --> 01:54:56,306
and damn it, same bug again.

2787
01:54:56,366 --> 01:54:57,376
So, quick how do we fix this,

2788
01:54:57,656 --> 01:54:59,166
writeable by group,
that was the problem.

2789
01:55:00,066 --> 01:55:00,606
Chmod.

2790
01:55:02,116 --> 01:55:02,506
>> G.

2791
01:55:03,076 --> 01:55:03,156
>> G.

2792
01:55:05,356 --> 01:55:06,016
>> Plus

2793
01:55:07,086 --> 01:55:09,496
>> Minus, w register.php.

2794
01:55:09,676 --> 01:55:10,926
OK, fixed.

2795
01:55:11,416 --> 01:55:13,326
Let me go back here.

2796
01:55:13,326 --> 01:55:16,446
And notice, new status code
500, 500 is generally the worst,

2797
01:55:16,446 --> 01:55:17,576
it means it really
did something wrong.

2798
01:55:17,896 --> 01:55:18,266
All right.

2799
01:55:18,386 --> 01:55:19,696
So, let's go back here.

2800
01:55:19,976 --> 01:55:22,486
Let's reload the form and
voilà, Hello David.

2801
01:55:22,486 --> 01:55:23,896
OK, so some progress there.

2802
01:55:23,896 --> 01:55:24,496
So that's good.

2803
01:55:24,616 --> 01:55:26,466
And let me introduce one
other syntactic trick.

2804
01:55:26,686 --> 01:55:28,306
Frankly, this isn't the
prettiest thing printing

2805
01:55:28,306 --> 01:55:29,736
out a symbol there.

2806
01:55:30,066 --> 01:55:32,716
There is this trick you
can do with short tags

2807
01:55:32,716 --> 01:55:33,636
which is very compelling.

2808
01:55:33,636 --> 01:55:35,936
If you want to insert
the value of a variable,

2809
01:55:36,126 --> 01:55:38,326
you can put open bracket
question mark equal sign

2810
01:55:38,326 --> 01:55:39,606
with no space in between them.

2811
01:55:39,956 --> 01:55:42,786
So, just to confirm, let me go
back to the page, let me reload

2812
01:55:43,156 --> 01:55:45,216
and seems to be staying
the same, which is great.

2813
01:55:45,416 --> 01:55:46,536
Now, let's look at the URL.

2814
01:55:46,536 --> 01:55:49,686
It's more complex than Google's
because we have multiple inputs.

2815
01:55:49,686 --> 01:55:51,526
David, male, state equals MA.

2816
01:55:51,956 --> 01:55:53,936
How do we get access
to these other values?

2817
01:55:53,936 --> 01:55:56,926
Well, first let's do a quick and
dirty thing and let's just look

2818
01:55:56,926 --> 01:55:58,796
at the entire contents of GET.

2819
01:55:59,246 --> 01:56:01,816
So, let me go into
register.php and I'm going

2820
01:56:02,116 --> 01:56:05,176
to cheat now, I'm going to
output a pre-formatted tag,

2821
01:56:05,516 --> 01:56:08,586
recall that pre-formatted
text uses a monospaced font just

2822
01:56:08,586 --> 01:56:09,876
so everything looks like code.

2823
01:56:10,176 --> 01:56:17,106
And what I'm going to do in here
is instead going to do <?= $_GET ?>.

2824
01:56:17,796 --> 01:56:19,136
But this isn't quite right.

2825
01:56:19,136 --> 01:56:20,836
Actually let me put
this on this line.

2826
01:56:21,336 --> 01:56:23,156
So, you'd like to think
this will just print

2827
01:56:23,156 --> 01:56:24,356
out the entirety of GET.

2828
01:56:24,446 --> 01:56:26,006
But let's see what
I see instead,

2829
01:56:26,076 --> 01:56:27,886
if I go to here, let me reload.

2830
01:56:28,716 --> 01:56:30,206
OK, not that enlightening.

2831
01:56:30,206 --> 01:56:31,306
It just says array.

2832
01:56:31,556 --> 01:56:34,496
But that makes sense because
I did say GET is an array.

2833
01:56:34,806 --> 01:56:36,406
So, we need to print
it recursively

2834
01:56:36,406 --> 01:56:37,726
to see what's inside the array.

2835
01:56:38,096 --> 01:56:39,276
And the trick you can use,

2836
01:56:39,276 --> 01:56:40,986
and this is not generally
for production code.

2837
01:56:40,986 --> 01:56:44,296
You don't say print, you
say print_r for recursive,

2838
01:56:44,586 --> 01:56:47,076
and it's a wonderful way of just
taking a quick peek inside

2839
01:56:47,076 --> 01:56:47,746
of variables.
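
As a rough sketch of that quick peek: the `$params` array below is a hypothetical stand-in for `$_GET`, assuming the form was submitted with name=David, gender=M, and state=MA. It is meant for debugging only, not production code.

```php
<?php
// Hypothetical stand-in for $_GET after submitting the registration form.
$params = ['name' => 'David', 'gender' => 'M', 'state' => 'MA'];

// Echoing an array directly only yields the word "Array" (plus a warning).
// print_r walks the array recursively; passing true as the second argument
// returns the text instead of printing it.
$dump = print_r($params, true);

// Wrap the dump in <pre> so the browser keeps the line breaks and indentation.
// Escaping the dump keeps any HTML inside the values from being interpreted.
echo '<pre>' . htmlspecialchars($dump) . '</pre>';
```

Returning the dump as a string, rather than printing it straight away, makes it easy to escape or log it first.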

2840
01:56:47,746 --> 01:56:49,416
So, I'm going to
go to registered,

2841
01:56:49,416 --> 01:56:51,036
reload, and there we go.

2842
01:56:51,436 --> 01:56:52,626
So, this is what it looks like.

2843
01:56:52,626 --> 01:56:54,326
This is completely
arbitrary formatting.

2844
01:56:54,326 --> 01:56:56,566
This has nothing to do with
the underlying implementation,

2845
01:56:56,566 --> 01:56:58,626
it's just a pretty way of
printing the information.

2846
01:56:58,786 --> 01:57:01,336
And now, I see three
keys, name, gender,

2847
01:57:01,336 --> 01:57:03,536
state, followed by three values.

2848
01:57:03,536 --> 01:57:04,986
So, this is just a
nice sanity check

2849
01:57:04,986 --> 01:57:06,556
as to what's actually in there.

2850
01:57:06,896 --> 01:57:08,466
So, now I can do
something like this.

2851
01:57:08,466 --> 01:57:12,376
Let me go back in the P--
register.php, let me go back

2852
01:57:12,376 --> 01:57:21,116
to saying h1 Hello equals
$_GET name close bracket

2853
01:57:21,116 --> 01:57:23,466
exclamation point close h1.

2854
01:57:23,556 --> 01:57:26,526
Now, let me do this again and
I'm going to say something

2855
01:57:26,526 --> 01:57:31,226
like You are a-- this is going

2856
01:57:31,226 --> 01:57:33,026
to be a little underwhelming
at first.

2857
01:57:33,026 --> 01:57:37,696
Let's just do gender and
then close that h1 tag.

2858
01:57:38,206 --> 01:57:43,076
And then finally,
you are from state.

2859
01:57:43,076 --> 01:57:44,966
So, this should hopefully
follow logically

2860
01:57:44,966 --> 01:57:46,186
from what we did a moment ago.

2861
01:57:46,406 --> 01:57:47,536
So, let's reload now.

2862
01:57:48,026 --> 01:57:49,486
And the fonts are a little big.

2863
01:57:49,886 --> 01:57:51,636
Not the most user
friendly thing,

2864
01:57:51,836 --> 01:57:53,206
but at least we're on our way.

2865
01:57:53,476 --> 01:57:56,086
However, notice that there
is no security mechanism

2866
01:57:56,086 --> 01:57:57,056
in place here right now.

2867
01:57:57,056 --> 01:57:59,196
There is no sanity
checking of user's input.

2868
01:57:59,396 --> 01:58:02,216
And notice, we used GET,
recall the URL looks like this.

2869
01:58:02,696 --> 01:58:05,526
So, what if I instead
do something like this,

2870
01:58:05,766 --> 01:58:08,556
this is not like a
correct website right now.

2871
01:58:08,556 --> 01:58:10,746
So, there's opportunities
here, right?

2872
01:58:10,746 --> 01:58:13,576
There's opportunities to one
make sure that what the user--

2873
01:58:13,576 --> 01:58:17,926
what we provided to the user
as options is actually checked

2874
01:58:18,146 --> 01:58:19,126
on the server side.

2875
01:58:19,536 --> 01:58:21,486
Two, we can make it
more user friendly,

2876
01:58:21,486 --> 01:58:24,286
it'd be nice if Massachusetts
said Massachusetts not MA.

2877
01:58:24,286 --> 01:58:27,566
It'd be nice if the M
became male in lower case

2878
01:58:27,566 --> 01:58:29,966
or "you are a guy" or "you are
a girl" or just something.

2879
01:58:30,206 --> 01:58:31,686
So, there seems to
be opportunities here

2880
01:58:31,686 --> 01:58:34,236
for if conditions and
else's and some kind

2881
01:58:34,236 --> 01:58:35,976
of conditional checks
and so forth.

2882
01:58:36,196 --> 01:58:37,106
So, we can build up there.
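
One way those server-side checks might look, as a minimal sketch: the lookup tables and the `label_for` helper are hypothetical names, and the codes CT and MA are assumed to be the values the form's drop-down submits.

```php
<?php
// Hypothetical lookup tables: the only values the form actually offers,
// mapped to friendlier labels for display.
const GENDERS = ['M' => 'male', 'F' => 'female'];
const STATES  = ['CT' => 'Connecticut', 'MA' => 'Massachusetts'];

// Return the friendly label for a submitted code, or null when the code
// is missing or is not one of the options the form offered.
function label_for(array $allowed, $code): ?string
{
    if (is_string($code) && array_key_exists($code, $allowed)) {
        return $allowed[$code];
    }
    return null;
}

// Never assume the parameter is present or well-formed.
$state = label_for(STATES, $_GET['state'] ?? null);
echo $state === null ? 'Unrecognized state.' : "You are from {$state}.";
```

Because only labels from the server's own tables are ever printed, a value typed by hand into the URL cannot end up on the page.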

2883
01:58:37,106 --> 01:58:39,266
But one of the most important
takeaways is that right now,

2884
01:58:39,266 --> 01:58:42,316
we're just trusting what the
user has submitted to the form

2885
01:58:42,576 --> 01:58:45,076
and this in and of itself
is not a good assumption

2886
01:58:45,076 --> 01:58:47,056
because we can do something
even worse than this.

2887
01:58:47,226 --> 01:58:49,496
This is a very common
thing known as an--

2888
01:58:49,576 --> 01:58:50,836
cross-site scripting attack.

2889
01:58:51,356 --> 01:58:54,216
That we'll talk about more
toward the end of the semester.

2890
01:58:54,216 --> 01:58:56,406
But if you're familiar with
JavaScript even minimally,

2891
01:58:56,746 --> 01:59:02,206
what if I do something crazy
like this, you have been hacked,

2892
01:59:02,936 --> 01:59:05,276
question mark here,
close script tag.

2893
01:59:05,276 --> 01:59:07,286
OK, that's my name
I claim, all right?

2894
01:59:07,286 --> 01:59:08,456
So, what's going
to happen now?

2895
01:59:08,966 --> 01:59:11,686
Well, because of the
server-side code, what am I doing

2896
01:59:11,686 --> 01:59:13,796
with the name parameter?

2897
01:59:14,686 --> 01:59:14,796
Yeah.

2898
01:59:15,166 --> 01:59:17,086
>> You're closing
the script site?

2899
01:59:17,086 --> 01:59:18,046
>> I'm closing the script.

2900
01:59:18,046 --> 01:59:19,526
Well, and well, actually
you notice here,

2901
01:59:19,526 --> 01:59:21,056
I closed it but I
also opened it.

2902
01:59:21,206 --> 01:59:24,266
What am I doing in
register.php with that value?

2903
01:59:24,936 --> 01:59:25,046
Yeah.

2904
01:59:25,566 --> 01:59:27,296
>> You're actually
going to get--

2905
01:59:27,546 --> 01:59:36,486
you're going to send
that string to the user

2906
01:59:36,486 --> 01:59:38,716
and the user [inaudible]
is going

2907
01:59:38,836 --> 01:59:41,466
to interpret it as JavaScript.

2908
01:59:41,466 --> 01:59:42,246
>> Exactly.

2909
01:59:42,246 --> 01:59:44,616
I'm literally going to spit
out what the user typed.

2910
01:59:44,616 --> 01:59:47,676
But if the user typed HTML,
that's going to be added

2911
01:59:47,676 --> 01:59:49,276
to the page and that HTML
is going to be executed

2912
01:59:49,276 --> 01:59:50,366
or interpreted, and
if it's a script tag,

2913
01:59:50,396 --> 01:59:51,596
it means the JavaScript
code is going to run.

2914
01:59:51,626 --> 01:59:53,276
So, in short, what we just did
is amazingly simple, too simple.

2915
01:59:53,306 --> 01:59:54,296
Very bad, like this
is not good code.

2916
01:59:54,326 --> 01:59:55,256
And many websites
make this mistake

2917
01:59:55,286 --> 01:59:55,976
because watch what happens now.

2918
01:59:56,046 --> 02:00:02,576
If I go here and click register,
uh-oh, what did I do wrong?

2919
02:00:02,576 --> 02:00:07,726
Register [inaudible] script,
all right, stand by one second.

2920
02:00:07,726 --> 02:00:09,216
My dramatic alert.

2921
02:00:09,216 --> 02:00:12,166
You have been hacked.

2922
02:00:13,276 --> 02:00:15,756
Hmm, Chrome, are you
doing this to me?

2923
02:00:16,016 --> 02:00:17,796
[ Inaudible Remark ]

2924
02:00:17,796 --> 02:00:21,986
That should be OK
where we put it.

2925
02:00:23,276 --> 02:00:23,646
>> Semicolon.

2926
02:00:23,776 --> 02:00:27,096
>> Semicolon, let me go back.

2927
02:00:27,856 --> 02:00:29,296
You have been hacked.

2928
02:00:29,296 --> 02:00:32,806
That should be OK, let me try
one other thing otherwise this

2929
02:00:32,806 --> 02:00:35,126
is going to be a
very underwhelming--

2930
02:00:35,776 --> 02:00:41,476
type equals-- oh my God.

2931
02:00:41,476 --> 02:00:42,116
All right.

2932
02:00:42,846 --> 02:00:44,466
Stand by for one second.

2933
02:00:45,146 --> 02:00:49,406
We're going to try
one other thing here.

2934
02:00:49,896 --> 02:00:55,116
Otherwise, you will never
believe anything else I'll say.

2935
02:00:55,946 --> 02:01:06,126
OK, 151, 128.

2936
02:01:06,126 --> 02:01:06,976
Register.html.

2937
02:01:08,136 --> 02:01:10,566
OK, so before I tell
you what I just did,

2938
02:01:10,566 --> 02:01:13,206
we're going to try this again.

2939
02:01:13,326 --> 02:01:22,426
Script, alert, you have
been hacked, Massachusetts.

2940
02:01:23,466 --> 02:01:24,416
Oh, damn you Chrome.

2941
02:01:24,616 --> 02:01:26,446
OK, Google has been too
helpful for its own good.

2942
02:01:26,886 --> 02:01:30,266
So, Google is detecting what
we just did and is scrubbing

2943
02:01:30,266 --> 02:01:33,786
that apparently for us, which
is rather good and bad of them.

2944
02:01:34,136 --> 02:01:35,856
So, this was the effect
I was trying to create.

2945
02:01:35,856 --> 02:01:38,336
So, I very quickly opened
up Firefox instead,

2946
02:01:38,336 --> 02:01:40,296
which apparently doesn't have
this protection in place,

2947
02:01:40,676 --> 02:01:43,136
and this is not the
behavior we wanted.

2948
02:01:43,136 --> 02:01:46,766
But as soon as I click OK,
we should at least see some

2949
02:01:46,766 --> 02:01:49,606
of the behavior I expected
but not quite all of it.

2950
02:01:49,606 --> 02:01:50,656
Now, this is stupid, right?

2951
02:01:50,656 --> 02:01:53,496
You're an idiot if you're
trying to like trick yourself

2952
02:01:53,496 --> 02:01:55,036
into executing JavaScript
alerts.

2953
02:01:55,036 --> 02:01:56,646
Like this is not really
threatening anyone other

2954
02:01:56,646 --> 02:01:57,676
than myself.

2955
02:01:57,706 --> 02:02:00,016
However, if you think
about how we did it,

2956
02:02:00,166 --> 02:02:01,646
notice what's in the URL there.

2957
02:02:02,016 --> 02:02:04,576
So, apparently you can
trigger these kinds of tricks

2958
02:02:04,856 --> 02:02:07,216
by typing input
manually into forms

2959
02:02:07,216 --> 02:02:08,486
but that's the silly
way of doing it.

2960
02:02:08,486 --> 02:02:10,436
What if instead you are a
bad guy and you're doing

2961
02:02:10,436 --> 02:02:12,776
like a phishing attack,
sending people bogus emails,

2962
02:02:13,036 --> 02:02:14,416
and you're telling
them to click a link

2963
02:02:14,756 --> 02:02:16,626
and they don't necessarily
see the whole link

2964
02:02:16,716 --> 02:02:19,756
because it's hidden with
HTML email formatting.

2965
02:02:20,156 --> 02:02:23,366
But they click that link,
they get led to my page

2966
02:02:23,366 --> 02:02:25,016
and then some JavaScript
code executes.

2967
02:02:25,046 --> 02:02:26,946
Well, this too is stupid
JavaScript.

2968
02:02:26,946 --> 02:02:28,696
Triggering an alert
is not hacking anyone.

2969
02:02:29,206 --> 02:02:31,516
But as we'll see in a few
weeks with JavaScript,

2970
02:02:31,516 --> 02:02:35,426
you also have access to a
user's cookies in JavaScript,

2971
02:02:35,606 --> 02:02:37,686
which means there are attacks
that we'll talk about later

2972
02:02:37,686 --> 02:02:40,576
in the semester whereby you can
steal someone's session cookie,

2973
02:02:40,836 --> 02:02:43,246
high jacking their session
in the same way we discussed

2974
02:02:43,246 --> 02:02:45,266
on Monday with Firesheep
and Starbucks and the

2975
02:02:45,266 --> 02:02:48,826
like by having tricked the user
into typing or clicking a link

2976
02:02:49,106 --> 02:02:51,486
that takes advantage
of this failure

2977
02:02:51,486 --> 02:02:53,386
to escape the user's input.

2978
02:02:53,666 --> 02:02:56,776
So, the fix here is
actually relatively simple,

2979
02:02:56,776 --> 02:03:00,626
if tedious in my code, you
never, ever, ever, ever,

2980
02:03:00,626 --> 02:03:02,686
want to trust what
the user has typed in.

2981
02:03:02,986 --> 02:03:06,466
So, the real way to echo user
input is something like this,

2982
02:03:06,466 --> 02:03:10,236
htmlspecialchars, which is an
annoyingly long function name

2983
02:03:10,236 --> 02:03:11,866
but it is a very good function.

2984
02:03:12,116 --> 02:03:15,986
And that it will ensure that
any potentially dangerous

2985
02:03:15,986 --> 02:03:19,356
characters, among them the open
bracket, which as you know,

2986
02:03:19,356 --> 02:03:22,476
demarcates the start of an
HTML tag will be escaped.
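
A minimal sketch of that fix, assuming a hypothetical `greet` helper and a UTF-8 page. `htmlspecialchars` is PHP's built-in function; the name passed in below is a made-up example of the kind of input typed into the form.

```php
<?php
// Escape a user-supplied value before echoing it into HTML.
// ENT_QUOTES escapes both double and single quotes; UTF-8 is assumed
// to be the page's encoding.
function greet(string $name): string
{
    $safe = htmlspecialchars($name, ENT_QUOTES, 'UTF-8');
    return "<h1>Hello, {$safe}!</h1>";
}

// Hypothetical malicious "name": the angle brackets come out as &lt; and &gt;,
// so the browser displays the text instead of running it as a script.
echo greet("<script>alert('You have been hacked');</script>");
```

In a real page the argument would come from the request, for example `greet($_GET['name'] ?? '')`, and every other echoed parameter would be escaped the same way.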

2987
02:03:22,626 --> 02:03:26,726
So that now, if I go back and
resubmit the exact same form--

2988
02:03:27,206 --> 02:03:30,316
now, I look like the idiot
because I've typed in--

2989
02:03:30,476 --> 02:03:32,366
displaying exactly
what I typed in,

2990
02:03:32,366 --> 02:03:35,326
which you would think is the
expected behavior anyway.

2991
02:03:35,616 --> 02:03:37,756
So, one of the recurring themes
that we'll discuss not just

2992
02:03:37,756 --> 02:03:39,586
at the end of the semester
but throughout is how

2993
02:03:39,586 --> 02:03:42,496
to take advantage of
things like escaping both

2994
02:03:42,766 --> 02:03:45,296
for user input here,
for JavaScript inputs

2995
02:03:45,296 --> 02:03:47,896
and most importantly
for database inputs

2996
02:03:48,166 --> 02:03:52,596
so that ultimately you are not
vulnerable to attacks like this.

2997
02:03:52,776 --> 02:03:54,596
So, what did I do
to work around this?

2998
02:03:54,596 --> 02:03:57,366
In Firefox, notice my
URL is very different.

2999
02:03:57,426 --> 02:04:00,826
In Firefox, what URL did I
use to visit the website?

3000
02:04:01,556 --> 02:04:03,446
The same website.

3001
02:04:04,746 --> 02:04:05,976
Yeah. So, go ahead, what is it?

3002
02:04:06,016 --> 02:04:07,176
[ Inaudible Remark ]

3003
02:04:07,176 --> 02:04:13,676
Yeah, so this private
IP, 192.168.151.128.

3004
02:04:13,726 --> 02:04:14,406
Where did that come from?

3005
02:04:14,436 --> 02:04:15,076
Well, the CS-50 appliance,

3006
02:04:15,076 --> 02:04:15,976
the virtual machine
I've been running,

3007
02:04:15,976 --> 02:04:17,856
it's just a computer
on the internet.

3008
02:04:17,856 --> 02:04:20,536
albeit a virtual one, and because I'm running it
and because I'm running it

3009
02:04:20,536 --> 02:04:23,636
in the program called VMware,
which is again a hypervisor

3010
02:04:23,636 --> 02:04:25,786
that allows you to run one
operating system on another.

3011
02:04:26,036 --> 02:04:28,476
Notice in the bottom right
hand corner of the appliance,

3012
02:04:28,826 --> 02:04:31,396
there is mention
of my IP address.

3013
02:04:31,666 --> 02:04:33,456
And this can change
all the time.

3014
02:04:33,456 --> 02:04:34,856
VMware in this case is acting

3015
02:04:34,856 --> 02:04:38,146
as the so-called DHCP server
giving the appliance a different

3016
02:04:38,146 --> 02:04:39,886
IP potentially every
time I turn it on.

3017
02:04:40,226 --> 02:04:42,126
But this is just a
configuration we put here

3018
02:04:42,126 --> 02:04:44,756
to always remind the human
what IP address he or she has.

3019
02:04:45,156 --> 02:04:46,526
So, what is the implication?

3020
02:04:46,756 --> 02:04:49,246
This is nice because it
means I can as I promise

3021
02:04:49,246 --> 02:04:51,886
on Monday minimize the
appliance altogether.

3022
02:04:52,086 --> 02:04:53,976
Not even have to worry about
getting too comfortable

3023
02:04:53,976 --> 02:04:55,746
with the actual Linux
environment,

3024
02:04:55,746 --> 02:04:58,146
and I can just treat
this as a remote server.

3025
02:04:58,456 --> 02:05:00,586
Now, it's remote in
the sense that I--

3026
02:05:00,956 --> 02:05:03,186
it's remote as though
it's remote.

3027
02:05:03,186 --> 02:05:04,566
It's actually physically present

3028
02:05:04,766 --> 02:05:08,756
but I can still address it
via-- an IP address here.

3029
02:05:08,896 --> 02:05:13,106
And if I'm on my own Mac or
my PC depending on your OS,

3030
02:05:13,106 --> 02:05:16,156
I can now just visit that
actual URL with the browser,

3031
02:05:16,356 --> 02:05:18,426
it's as though I'm
visiting a remote server.

3032
02:05:18,536 --> 02:05:20,816
And if I'm really particular,
and I just don't like looking

3033
02:05:20,816 --> 02:05:22,666
at this address, what
I can do as what I did

3034
02:05:22,666 --> 02:05:25,666
on Monday whereby I can
open up the terminal window

3035
02:05:26,426 --> 02:05:32,336
and I can do edit
/etc/hosts, type in my password.

3036
02:05:32,336 --> 02:05:36,026
And then remember we did this
trick here so let me go here

3037
02:05:36,026 --> 02:05:40,996
and then I can do
davidsecretwebsite.com.

3038
02:05:41,486 --> 02:05:45,836
And now, because I've taught my
Mac to make the DNS association

3039
02:05:45,836 --> 02:05:49,066
for me, I can change this
to this, and now notice,

3040
02:05:49,066 --> 02:05:51,056
davidsecretwebsite.com is born.

3041
02:05:51,376 --> 02:05:53,406
Albeit only on my
own local computer.

3042
02:05:53,686 --> 02:05:56,826
So, when I mentioned earlier
that you can do developments

3043
02:05:56,826 --> 02:05:59,086
on your own computer,
it's a wonderful way

3044
02:05:59,086 --> 02:06:00,436
of doing website development

3045
02:06:00,436 --> 02:06:01,846
because you can still
simulate all

3046
02:06:01,846 --> 02:06:03,906
of the realities
of HTTP and DNS.

3047
02:06:04,126 --> 02:06:06,086
But locally without needing
an internet connection,

3048
02:06:06,086 --> 02:06:08,526
without needing a remote server, without having to pay anyone
without having to pay anyone

3049
02:06:08,526 --> 02:06:10,926
for those services, you can
spend those months upfront

3050
02:06:11,226 --> 02:06:15,766
working at home, in a café, at
work all without needing any

3051
02:06:15,766 --> 02:06:18,356
of the physical infrastructure
that's typically associated

3052
02:06:18,356 --> 02:06:18,976
with the internet.

3053
02:06:18,976 --> 02:06:19,936
So, we'll also introduce you

3054
02:06:19,936 --> 02:06:21,526
in the first project
to this approach.

3055
02:06:21,976 --> 02:06:23,546
But we've only just
scratched the surface.

3056
02:06:23,546 --> 02:06:25,906
So one, all I've been doing
is echoing out input,

3057
02:06:26,006 --> 02:06:28,846
but clearly a website like
Facebook or Google takes input,

3058
02:06:29,096 --> 02:06:31,636
it checks the inputs with
if conditions and elses

3059
02:06:31,636 --> 02:06:32,876
and loops and whatnot,

3060
02:06:33,186 --> 02:06:36,006
it does things like writing
data to databases.

3061
02:06:36,236 --> 02:06:39,726
And it would also be nice
to move away from what seems

3062
02:06:39,726 --> 02:06:41,226
to be a very sloppy start.

3063
02:06:41,226 --> 02:06:43,906
Whereby, we've been writing
HTML and then I kind of dropped

3064
02:06:43,906 --> 02:06:47,196
into PHP mode very quickly,
then went back to HTML.

3065
02:06:47,376 --> 02:06:48,866
This is not going
to scale very well.

3066
02:06:48,866 --> 02:06:50,546
So, if you're coming to the
course with a background

3067
02:06:50,546 --> 02:06:55,266
in ASP or JSP or even Django
or Rails, there are ways

3068
02:06:55,266 --> 02:06:57,546
of cleaning up our code so
that we can practice some good

3069
02:06:57,546 --> 02:07:00,376
principles like, let's
keep presentation separate

3070
02:07:00,376 --> 02:07:01,086
from our data.

3071
02:07:01,086 --> 02:07:03,536
This is one of these mantras
that makes good sense especially

3072
02:07:03,536 --> 02:07:06,416
for large projects where
you keep your HTML separate

3073
02:07:06,416 --> 02:07:08,336
from your CSS, separate
from your JavaScript,

3074
02:07:08,516 --> 02:07:09,546
separate from your data,

3075
02:07:09,656 --> 02:07:11,996
separate now from
your PHP code.

3076
02:07:11,996 --> 02:07:14,276
So, even though tonight
we've started to dive

3077
02:07:14,276 --> 02:07:16,486
in with this commingling
approach, and on Monday,

3078
02:07:16,486 --> 02:07:18,326
we'll do some more of
the same, we'll also look

3079
02:07:18,556 --> 02:07:20,866
at some common paradigms
among them, MVC,

3080
02:07:20,866 --> 02:07:22,936
Model-View-Controller,
where you can really start

3081
02:07:22,936 --> 02:07:26,566
to separate these
things into more complex,

3082
02:07:26,566 --> 02:07:31,096
more sophisticated, or rather, more
cleanly designed applications.

3083
02:07:31,536 --> 02:07:32,946
But for now, why
don't we go ahead

3084
02:07:32,946 --> 02:07:34,306
and adjourn here officially.

3085
02:07:34,306 --> 02:07:35,946
We'll take a 5, 10-minute break.

3086
02:07:35,946 --> 02:07:36,826
Peter will get set up. If you'd

3087
02:07:36,826 --> 02:07:38,606
like to remain for
section by all means do.

3088
02:07:38,836 --> 02:07:40,386
Otherwise section will
be filmed as usual

3089
02:07:40,386 --> 02:07:43,236
and be placed online
by sometime tomorrow.

3090
02:07:43,466 --> 02:07:45,126
And I'll linger around
for one on one questions.

3091
02:07:45,256 --> 02:07:46,926
All right, we'll
see you on Monday.

3092
02:07:51,016 --> 02:07:51,083
[ Silence ]

3093
02:07:51,083 --> 02:07:51,150
END

