All right. Welcome back. This is Lecture Eight, Security. Only one lecture to go after this. So summer is almost over and tonight we start talking about all of the dangerous things you might currently be doing in your code, specifically in the context of PHP, and MySQL, as well as in JavaScript. So of the topics we've discussed thus far this semester what are some of the possible threats that we've encountered, possible security issues we've encountered? What comes to mind? Ben? 

SQL injection attacks when typing in scripts as [inaudible] parameters. 

Okay, good. So SQL injection attacks, which we'll see a concrete example of today. Typing script, JavaScript code, into like a form and accidentally tricking the user into displaying that on some webpage or executing that on some webpage. We'll see an example of that as well. Other possible threats? Isaac is looking to Axle for an answer. What do you got, Axle? 

It's not really a possible threat but if you don't encrypt your data over HTTPS it's going to be open for anybody to see and hijack your session and all of that. 

Okay, good. So back in lecture zero we talked a bit about HTTPS, or SSL, and how not using it obviously means your traffic is completely unencrypted and what the implications of that are, particularly for what kind of information? What kind of information is at risk if you're using HTTP versus HTTPS? Yeah? 

Well your data is really sent unencrypted so anybody on the same network can see that connection and get all the information that's sent. 

Okay, good. So anyone on the same network can sniff and read any of the data that you're transmitting, and frankly on the entire Internet, right? Anyone between you, point A, and say Amazon, point B. If Amazon's not using HTTPS at all, in theory anyone with access to any of the various routers in between point A and B could sniff that traffic. So it doesn't just have to be wireless connectivity nearby you. All right, so let's try to flesh some of these things out and most of you probably aren't familiar with a protocol called Telnet. You might be familiar with a protocol called FTP. Does someone want to pick off one, or either, or both of these -- what these protocols, Telnet and FTP, are or are used for? Isaac? 

Well, I know FTP, like file transfer protocol, means to send like files from one computer to another. 

Good. So, file transfer protocol is FTP, it's used to transfer programs, files, anything from one computer to another. There's a gotcha with it: it's completely unencrypted, and yet many webhosts support only FTP, or many webhosts offer FTP but don't really warn customers of the potential concerns with it. So what does it mean concretely if you're transferring stuff via FTP? Right, if you have a webhosting account, maybe it's DreamHost, maybe it's someone else, by nature of the content you're uploading you want the whole world to be able to see your GIFs, and JPEGs, and HTML files anyway. So what's the big deal if moving the files from your computer or even your appliance to this remote server is all unencrypted? Axle? 

Well, say that you send more important stuff than just images. Say you send the PHP configuration file. 

Good. 

It contains the database name, username, password. And once somebody has that they could essentially log into your database and get everything. 

Yeah, absolutely. So if you're sending more than just publicly accessible content like, HTML files, JPEGS, images and the like, but rather you're sending things like your PHP code, which has your intellectual property, or your configuration files for PHP, which might have your database usernames and passwords, and God knows what else, well now you're sending this completely in the clear for anyone on the Internet, or anyone in Starbucks, anyone between point A and B to potentially intercept. And even worse than that, what also is going in the clear if you're not using any kind of encryption when you use FTP? Axle? 

I think your login credentials for the actual server. 

Exactly. Which is the absolute worst part of the whole story. Which is that your username and your password are sent completely in the clear, which means who cares if everything else is sent in the clear, anyone else can just log in to the server after the fact and do whatever they want with your account. So in short, if you end up signing up for some commercial webhost, if you end up supporting your own server locally, there is no reason today to use FTP unless it's maybe on a local isolated network. But even then it's just bad practice because you can use quite easily an encrypted protocol, one of which we're about to see, one of which isn't mentioned here, but it's called SFTP, secure file transfer protocol, which actually has encryption. And these days there is really no good reason not to use something like SFTP. So Telnet is a little more dated, but it's still used for various things. It's a protocol that's used to control one computer from another. Think about your terminal window within the CS50 appliance. When you open a terminal you have a black and white command prompt. Well, Telnet is a protocol that allows you to access a black and white command prompt like that but from another computer. So I could be sitting at home and I could use Telnet back in the day to connect to a Harvard server and then have a blinking prompt on my window -- on my screen, but that computer that I'm actually controlling is somewhere else on campus. Or conversely, in theory you could telnet, as a verb, from your own laptop to the appliance in order to get a prompt, but these days you would instead use something called SSH. Indeed the appliance does not have Telnet support enabled, you cannot connect to it insecurely via Telnet, rather you have to use SSH, which encrypts the traffic. Even if you're just trying to connect from your Mac or PC to the appliance in order to pull up that terminal window. 
And though I keep describing it as a black and white window, of course some of you might have noticed that it supports color in theory with various programs, but it's still a command line interface ultimately. So, HTTP, this one we've definitely talked about. What kinds of things are sent in the clear when you're using just HTTP and not HTTPS? What kind of stuff is at risk? How about Jack? You have the look of an answer on your face. 

[Inaudible] together here. Put me on the spot here. 

I know, I have time. It's okay. Come on. We can do it. HTTP, you have -- it's completely sent in the clear. It's used by a lot of websites. So what kind of stuff might be sent in the clear? [ Inaudible Speaker ] Okay. Session, what's a session though? 

The very same file that is on your computer that tells a website or a webhost that you are the one currently logged into them. 

Okay. 

And someone can steal your session and then go and pretty much login as if they were you. 

Okay, good. So recall that we discussed this sort of higher level concept of a session that's incarnated in PHP with the superglobal called $_SESSION that it gives you the illusion of having really a sort of persistent connection to some user even though they might be visiting you with a browser every few seconds or every few minutes. Nonetheless, you're still provided with the storage even though HTTP itself is stateless. Now you're not technically transferring the session back and forth across the Internet via HTTP. What are you actually transferring across the Internet via HTTP that enables sessions to exist? 

It's some sort of session ID which is a really long stream of letters and numbers you can't really guess. 

Perfect. So it's a really long sequence of letters and numbers and this is specifically an example of what? The session ID is implemented with what feature of HTTP? Or how -- where is this sent? Let's see, someone else? So like I've totally -- I agree that HTTP involves sending this unique identifier that somehow implements sessions but how? If we opened up that virtual envelope where in the contents inside would it be -- this ID? Or what's it an example of? No one wants to make eye contact. Axle? 

I mean sending -- it's sent in the headers as the session cookie. 

Okay, good. So it's sent in the headers as a session cookie, or just as a more general notion of a cookie. Right? There's a Set-Cookie header that a server sends to a browser and the Set-Cookie header allows you to set a value for some key and in the world of PHP that key happens to be called by default PHPSESSID, in all caps, but that's totally arbitrary. And the server meanwhile sets a value for that key which is, as Jack says, a big random sequence of letters and numbers, and then every time the browser revisits that same website that originally sent it the Set-Cookie header, that browser is essentially reminding the server, "I am this ID. I am this ID." And weeks ago we kind of likened it to a handstamp that you get at a club or an amusement park. So that it permits you access even after your first time through that gate so that you can kind of come and go as you please. Because that handstamp is sort of reminding the guy at the gate that you have been here before and in this case it's being even more specific, it's reminding him who you actually are, not in terms of your identity but in terms of your unique identifier. So HTTP is dangerous in terms of its lack of encryption because the session ID, if it's being sent back and forth across the wire, it doesn't itself contain private information, it doesn't contain your username, your password, your credit card information. Because again, as Jack said, it's a big random number or big random sequence of letters and numbers, but the catch is if you're not encrypting it what's the implication if someone sniffs it and steals it somehow by listening to your traffic? Axle? 
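
As a rough sketch of the handstamp exchange just described, the PHP below parses a Set-Cookie header the way a browser might and builds the Cookie header it would send back on later requests. The helper names and the session ID value are invented for illustration; real browsers and PHP's session machinery do considerably more.

```php
<?php
// Hypothetical helper (not part of PHP itself): pull the name and value
// out of a raw "Set-Cookie: name=value; attr; attr" header line.
function parse_set_cookie(string $headerLine): array
{
    // Drop the "Set-Cookie:" prefix, then keep only the first "name=value" pair.
    $body = trim(substr($headerLine, strlen('Set-Cookie:')));
    $firstPair = explode(';', $body)[0];
    [$name, $value] = explode('=', $firstPair, 2);
    return ['name' => trim($name), 'value' => trim($value)];
}

// Hypothetical helper: the header a browser sends back on each later request.
function build_cookie_header(array $cookie): string
{
    return 'Cookie: ' . $cookie['name'] . '=' . $cookie['value'];
}

// The ID below is made up; a real one is generated by the server.
$fromServer = 'Set-Cookie: PHPSESSID=5899f546557421d38d74b659e5bf384f; path=/';
$cookie = parse_set_cookie($fromServer);
echo build_cookie_header($cookie), "\n";
```

Note that nothing in the returned header identifies the user other than the ID itself, which is why it matters who else can see it.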

Well they can take your session and pretend that their computer is actually my computer running the same session. 

Exactly. 

[Inaudible] website, that website is going to recognize that website and it's going to say, "Hey, you're already logged in." It's going to log them in and it's going to display the login -- the secure side of the page. 

Exactly. Right. If all it takes to remind a server who you are is presentation of this big random sequence of letters and numbers and a bad guy is able to steal that sequence of letters and numbers from you, by just listening in on your traffic wirelessly, or even poking around on your computer, and then copying that big random sequence of letters and numbers. And then he is smart enough to know how to configure their computer to transmit that same cookie value with a cookie header, well he or she can just pretend to be you and the server doesn't really know the difference. So we'll come back to how we might mitigate this, but that's one of the key shortcomings of just using HTTP for anything sensitive, not to mention the fact that if you're submitting a form that has your username, your password, your credit card number, if the site is itself not using HTTPS and all of that stuff too is going in the clear, and so more private information could indeed be taken in that case. MySQL meanwhile is similar in spirit to the rest of these protocols in that it itself is not encrypted. Recall we talked briefly a couple lectures ago that generally you want your MySQL server sitting on the same network as your web servers, even if it's on a different physical machine so that your traffic is only going over your own local network and not connecting to some remote database server elsewhere. So, what are some of the problems we did solve though thus far? Well, recall there's this feature in the appliance, which I think I mentioned ever so briefly, or maybe not even, suPHP, substitute user PHP. Oh we did have this conversation. Recall that one of the problems with a web server in general is that if you are running that web server as root, that's very bad, and thankfully it's rarely done these days. Why is it bad to run a web server, which is just a program that serves up webpages, as the administrator account, so-called root? Jack? 
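
To make the hijacking point concrete, here is a toy sketch in PHP, assuming a server that looks up users purely by the session ID in the Cookie header. The store, the function name, and the ID are all hypothetical; the point is only that a sniffed copy of the cookie is indistinguishable from the original.

```php
<?php
// Toy stand-in for the server's session store: session ID => stored data.
// Real PHP keeps this in temp files or a database; the ID here is made up.
$sessionStore = [
    '5899f546557421d38d74b659e5bf384f' => ['user' => 'jharvard'],
];

// Hypothetical lookup: the server only ever sees the ID in the Cookie header.
function user_for_request(array $store, string $cookieHeader): ?string
{
    if (!preg_match('/PHPSESSID=([0-9a-f]+)/', $cookieHeader, $m)) {
        return null; // no session cookie presented
    }
    return $store[$m[1]]['user'] ?? null;
}

$victim   = 'Cookie: PHPSESSID=5899f546557421d38d74b659e5bf384f';
$attacker = 'Cookie: PHPSESSID=5899f546557421d38d74b659e5bf384f'; // sniffed copy

// Both requests resolve to the same account.
$victimUser   = user_for_request($sessionStore, $victim);
$attackerUser = user_for_request($sessionStore, $attacker);
var_dump($victimUser === $attackerUser);
```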

If anyone ever breaks into the account via some PHP hack or something they can literally wreck anything on the server. 

Exactly. 

[Inaudible] has root access. 

Perfect. So if the web server is running with administrative privileges as the so-called root user and that web server is executing buggy code, that you or someone else wrote, buggy in the sense that there's something stupid in there that lets a user execute some command. The scary part here is that who's going to be executing that command if a bad guy is taking advantage of that bug or vulnerability in the software is going to be executed as root. And root unfortunately generally has privileges to delete everything, download anything, change usernames, passwords, install anything. It's just there's really no constraints on that particular user. So, running anything as root is generally bad because again, if what root is running is vulnerable to being taken over by a bad guy, or tricked into executing some arbitrary command, well then that command's going to be run as root and at that point who knows what the bad guys has done to or with your system. Axle, question, comment? No? Okay. All right, so we can fix this by running a web server under a different username. Something like Apache, or HTTPD, or some systems run it as literally an account called "Nobody." And in all of those cases the user in question does not have administrative privileges. So the worst thing that can happen if you're running your web server as Apache or as HTTPD, as the appliance does, is that the only account that can be compromised, the only account whose files can be deleted, the only account that can have some damage done to it is the HTTPD user. Now that's not great because that means a bad guy could take down your entire web server, or delete the logs for the web server, or any files that the HTTPD user owns, but at least that's one user you can just remove the whole web server account, you can blow away all those files because root hasn't been compromised and you can reinstall. But there is a problem. 
If your files are being read and executed by Apache, the web server, what do you have to chmod your files to be in that case? If you are someone like J. Harvard, or Axle, or Isaac, if you have your own user ID and yet the web server is running as a different username, like HTTPD, what do you have to set the permissions on your own files to for this to all work? Previously it didn't matter because root can read and write any file so it doesn't matter what the permissions are. He has unfettered access. But HTTPD wouldn't. So what do you have to chmod your files usually? And chmod remember is the command for changing the mode of a file which means the permission, 644, 711, 700, A plus R, A plus X, whatever the case may be. And don't worry if you don't remember the codes, but in words what kind of permissions do your files need? Yeah? 

This is probably not the best solution but you can set all plus read on a group of all the users that are allowed access, including Apache. 

Okay, say that once more. Set what to read? 

All plus read. 

Oh, okay. 

On the group -- on a group, like students or [inaudible] on a web server. 

Okay, good. 

[Inaudible] all the accounts that are allowed to access that file, including Apache. 

Okay, good. So in theory you could put all of the users into a group called students or something like that, and make sure that Apache is in the same group, and then you can give the group read access, and the command for this, recall, is not A plus R, but would be G plus R in this case, chmod G plus R, group plus readability, right, to whatever file or directory is in question. All right, so that's not bad. It's a little more work and it's a little weird I would say that you have all of these students for instance or all of these customers in a group and the web server who is not a customer or a student in that same group. It's a little weird, but possible. But if we don't like that we could just do A plus R, all plus everybody can read the files, and that seems reasonable right? Because if my GIF's, and JPEG's, and HTML files are on the Internet they're meant to be read. What's the big deal about giving read access to all students or to Apache and other accounts on the system? Jack? 

Well that means if I really wanted to I could find a way to easily see your plain text PHP without having to go through any hoops. 

So in this case if you're on a shared webhost and you're a customer, someone else is a customer, someone else is a customer, and all of you have your own websites, which is common on a virtual hosting environment like DreamHost or the like, but the web server by necessity needs you to chmod your files to be world readable, those files then are going to be readable by anyone else on the system. For instance, if Axle has chmoded his files to be world readable just because Apache needs them to be. Well, if Jack is a malicious user on the system and knows Axle's username he can essentially start poking around his account using cd or ls or the various Linux commands with which you could do this, and see Axle's PHP files, inside of which might be passwords, usernames, and so forth. So it doesn't feel ideal, it feels like we're giving too much access to the world here. Yeah? 
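
The difference between the group-readable and world-readable options discussed above comes down to a few permission bits. A minimal sketch, assuming a file that starts out as mode 600; the helper name is made up for illustration.

```php
<?php
// Hypothetical helper: report who can read a file, based on its mode bits.
function readable_by(int $mode): array
{
    return [
        'owner' => ($mode & 0400) !== 0,
        'group' => ($mode & 0040) !== 0,
        'world' => ($mode & 0004) !== 0, // every other account on the machine
    ];
}

$groupOnly = readable_by(0640); // the result of chmod g+r on a 600 file
$everyone  = readable_by(0644); // the result of chmod a+r on a 600 file

print_r($groupOnly);
print_r($everyone);
```

With 644, the "world" bit is what lets another customer on the same shared host read the file, not just the web server.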

But the PHP source would never be distributed over the Internet, right? Because it's actually configured never to display the PHP source unless there's an internal server error. 

Good. Correct. So, in this case your PHP code is not at risk for being spit out on the Internet without being interpreted. The threat here is that Jack is just another paying customer on the same server. So at least it's not billions of people who could potentially see your code, but it's at least a few more malicious users or just curious, nosy people who are poking around the account. So thankfully there do exist protections against even this, and even then the appliance has this built in even though it's not strictly necessary if there's only one John Harvard and the appliance isn't on the whole Internet. But the principle is the same in that suPHP and other software like it allows you to specify that the username that should be used to execute this PHP file should be jharvard. Shouldn't be httpd, definitely shouldn't be root, it should be whoever actually owns the file. So the idea here is that when you are running Apache in the appliance, which I'm currently doing, as is anyone else running this version of it, and I'm going to run a command that we haven't run before but just to poke around, ps aux | grep httpd. So this is a fairly cryptic sequence of symbols that simply gives me a process list, ps, with a bunch of flags which means show me everything. The pipe means send the output of ps to the command grep, and the grep command is like a find command. So I'm saying spit out the process list, all the running programs on the system, like the activity monitor or the task manager in Mac OS and Windows, respectively. And then pass that output to grep and search for httpd. And I'm going to hit enter and what you see here is, oh, I lied. The username that's being used is not in fact httpd, it's apache. So in this case each of these rows says that there's a program called httpd running on the system, and that's to be expected, right? 
The appliance runs a web server, that's how project zero, project one works in the appliance. There seems to be a whole bunch of them, but more on that in a moment. But if I scroll to the left you see in the left most column who the web server is running as here and almost all of those are as apache, and it's those rows that are going to be used to actually execute user code or serve up user files. But you don't see jharvard's name, but that's fine because notice what's going to happen here if I go into my vhost directory and my appliance directory. Here's all of our examples from last time. I'm going to do a quick and dirty demo here, demo.php. And I'm going to say echo "hello"; just as a quick test. Okay, and let me zoom out. Let me open my browser, and let me go to http://appliance/demo.php. Okay, so now we're -- this is just sort of, we did this weeks ago. So now let's do something a little more interesting. Let's do echo `whoami`; Now notice I'm using back-ticks, which mean execute the command called whoami. So whoami is a program on the system, and I can demonstrate this at a command prompt. So let me pause that program and run whoami, enter, and I'm indeed jharvard. If I go back in here, the fact that I'm putting whoami in the PHP program means that when this file is interpreted it's going to inform me who is interpreting the file. And so the litmus test here is, is it root? Is it apache? Or is it jharvard? Hopefully it's jharvard, otherwise suPHP is not in fact enabled. So let's go down here to the appliance again. Let's reload, and indeed I'm running as jharvard. And if it weren't for suPHP we would instead see the username apache in this case. As an aside, too, another thing that's useful diagnostically when you're setting up your own webhost, which some of you might want to do after the class ends, there's a function in PHP called phpinfo. 
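
The whoami litmus test from the demo can be written out roughly as follows. This is a sketch that assumes a Unix-like system where the whoami command exists and where PHP is permitted to run shell commands; the variable names are made up.

```php
<?php
// Backticks in PHP run a shell command and return its output;
// shell_exec('whoami') is the equivalent function call.
$viaBackticks = trim((string) `whoami`);
$viaFunction  = trim((string) shell_exec('whoami'));

// Under suPHP this should be the file's owner (e.g. jharvard);
// without it, the web server's own account (e.g. apache).
echo "This script is running as: $viaBackticks\n";
```

Run from a command prompt, this reports whoever invoked it; served through the web server, it reports whichever account the interpreter was started as, which is the whole point of the test.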
Generally you would not write a program and then make it available on the web that echoes phpinfo because this dumps all of the details of the current PHP installation, including its version number and all of the various modules that are installed. But if I go up here and click on reload this is what phpinfo spits out. It deliberately spits out a whole bunch of HTML that's crazy cryptic looking at first but this configure command essentially tells you how the people at Fedora, who oversee this operating system, decided to compile this version of PHP. So PHP itself, recall, is a program, it's an interpreter; it's an executable program that happens to read or interpret other programs. It itself is written in C, or C plus plus most likely, and this is the command with which they compiled PHP from source code into its binary. And all of these various flags essentially tell you what features are enabled. But there's an easier way to parse this. If we scroll down we see a whole bunch of stuff. For instance, I mentioned weeks ago php.ini is the configuration file that's typically used, and this output is confirming as much, that the config file we're using is in the etc directory, called php.ini. We have, let's see, what are other particulars here of interest. You can see that there's apparently built in bz2, which is compression support, some kind of calendar support, some kind of -- let's scroll down -- whoops. Let's scroll down even further to -- let's see if we can find this. DOM support and so forth. So PHP has a whole bunch of modules that you can add optionally, and long story short this output just informs you what's actually there. So this is useful because sometimes when you're using commercial webhosts they might have certain features on that you did not have on in the appliance. 
They might have certain features off, and running this, say, on your local machine, your Mac, your PC, your appliance, and then also running this command on the remote webhost, like DreamHost, will give you a sense of what the differences might be. For instance, there is a feature of PHP called Magic Quotes and this has largely been disabled these days, but Magic Quotes did this. Anytime a user used GET or POST to send input to a PHP file, PHP, if Magic Quotes were enabled, would very presumptuously escape all of the quotes in that input. So anytime there was a quote PHP would automatically put a backslash there. The upside of this is that if you then insert that into your database you're already safe, for the most part. Right? Because all of the potentially dangerous characters have been escaped. And we'll come back to SQL injection attacks in a bit. The problem though is that if you then call mysql_real_escape_string, or use PDO, or the equivalent, it's going to escape the escaped characters. And so a symptom that just the other day someone was seeing was that her code was spitting out backslash marks all throughout her website, and it was because not only was she escaping user input, as is good practice, the web server was presumptuously doing it too, and so just a lot of bad things happened aesthetically. So in short, this is not a good feature; you shouldn't outsource your security to the web server, you should be doing this yourself in code. So this is something too that can be helpful diagnostically. And if I search for this, let's see if we can find it in here. Magic -- yeah, there it is here. So, enable-magic-quotes, but we've disabled it elsewhere in the configuration file even though Fedora enabled it by default. All right, so finally, suPHP then ensures what? 
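
The stray-backslash symptom can be reproduced in a few lines. This sketch uses addslashes() as a stand-in for both Magic Quotes and the programmer's own escaping (mysql_real_escape_string needs a live database connection, so it is not used here), and the sample input is made up.

```php
<?php
$input = "O'Reilly";

// Roughly what Magic Quotes did automatically to GET/POST input.
$afterMagicQuotes = addslashes($input);

// The programmer then escapes again, as is good practice.
$escapedTwice = addslashes($afterMagicQuotes);

// The database strips one layer of escaping when it stores the value
// (simulated here with stripslashes), so a backslash survives.
$stored = stripslashes($escapedTwice);
echo $stored, "\n";
```

Escaping exactly once, in your own code, avoids the leftover backslash.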
So, if your PHP files are executed as you, jharvard, or Axle, or Jack, and you screw up, and you write buggy PHP code that somehow allows someone on the Internet to trick you into running commands, not just whoami, but maybe the delete command, whose files, whose accounts are at risk when using something like suPHP? Jack? 

Only their own. 

Only their own. Right? You can't affect other customers, you can't affect the root account, you can't affect the web server account. So in general this is a very good thing. Meanwhile, files like images and CSS files and HTML files, those are just served up, not as you, but as Apache itself because it doesn't matter. Those are static files; they're not programs being executed. So suPHP really just applies here to PHP files. All right, so you guys proposed cookies earlier as a potential threat and here's one such example. So these are HTTP headers. Two hundred at the top signifies what? Don't say okay. Yeah. 

Everything -- well, it's really okay, but that the page has been received by the client and there's no error [inaudible] server side. 

Okay, good. So it indicates that there was no error on the server and that everything is well, it's been received okay, and indeed everything is okay. So this is in contrast to something like 404, 401, 500, all of the various numbers that we sometimes see on our own or other people's websites when mistakes have happened. But here, 200, you rarely see and in fact you'll only see it if you take a look at the HTTP headers, because it means all is well. So we see some other information. Date of the current web server, when this response was made, the server's name and version number, X-Powered-By. Why are these included in the headers? Axle? 

Really no reason other than [inaudible] and just Apache selling their product. 

Which is free, in fairness. 

Yeah. No, no, no, but they are telling people about it and it could be a potential security risk because you are telling people what version of PHP you are running. 

Good. 

So if, for example, we saw an older version in the PHP and later we had discovered like a flaw in that, something that you do with that, you would know that, well this server's vulnerable and now you can search the Internet for those servers. 

Exactly. So one is branding, that's why it's there for the most part. But two, the downside of this is that you're telling the whole world not only what you're running but what version of it. So as Axle says, if there's somehow a flaw discovered in PHP or in Apache, and it's in a specific version of one of those because it was introduced accidentally at some point. Well now, you've just told the whole world that hey, I'm vulnerable to this error and if someone is kind of like aggregating that information and hanging onto it for a rainy day the moment the world realizes that oh, PHP 5.3.3 is buggy, let me go ahead and wage my attack using the list I gathered in advance by poking around on the Internet to compromise those servers. So you're just making it unnecessarily easy for the adversary. So this kind of stuff is typically on by default but where can you disable something like the Apache line? Its version number? Where would you go? So if you agree that this is not necessary and not great how do you go about turning this off? Jack? 
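
As a sketch of how little effort that aggregation takes, the PHP below pulls product and version pairs out of two response headers. The header values are hypothetical (only PHP 5.3.3 is mentioned in the lecture), and the function name is invented.

```php
<?php
// Hypothetical response headers; the Apache version string is made up.
$headers = [
    'Server: Apache/2.2.17 (Fedora)',
    'X-Powered-By: PHP/5.3.3',
];

// Collect product => version pairs, the way a scanner might build its list.
function advertised_versions(array $headers): array
{
    $found = [];
    foreach ($headers as $line) {
        if (preg_match('#^(?:Server|X-Powered-By):\s*([A-Za-z]+)/([\d.]+)#', $line, $m)) {
            $found[$m[1]] = $m[2];
        }
    }
    return $found;
}

$versions = advertised_versions($headers);
print_r($versions);
```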

Isn't it somewhere in the httpd.conf? 

Good. Yeah, so there's this file we keep referring to, httpd.conf. It's generally somewhere in the etc directory, the et cetera directory, and you just have to find the appropriate line there that has to do with ServerTokens, which will determine whether or not this will be displayed. And how about something related to PHP, X-Powered-By, how do you get rid of that? Axle? 

[Inaudible] to the php.ini. 

Yeah, exactly, php.ini, the config file for PHP specifically, there's a directive in there called expose_php, which by default is on; you just have to change it to off and then restart the web server. All right, so more interesting is this expires date. This is kind of weird, right? I definitely didn't make this example in 1981 and yet for some reason there's mention of 1981. Thursday, 19 November, 1981 in my headers, for some reason. Expires. Why is this here? Frankly, Apache version two and PHP 5.3.3 did not exist in 1981, let alone the web. At least in this form. What could that possibly signify? Yeah? Scott? [ Inaudible Speaker ] Okay. So in this case it's actually not the expiration date for the cookie, though that's on the right track, but it's the expiration date for the page. So if a browser and server are trying to optimize so as to not re-download this content unnecessarily if it hasn't changed, the expiration here is telling the browser that this page expired in the past, which means you should always re-fetch it. And this is actually just kind of a stupid convention; essentially what most websites do is if you want to disable caching or try to disable caching of things like HTML files that are generated by PHP, or really whatever file this is referring to here, you specify that this page expired like 10 years ago. Right? So then you don't do like a minute ago, you don't do an hour ago, just in case there's a bit of clock skew. You choose something that's really far in the past so that when the web server -- when the browser receives this it's going to realize wow, this page is really old, the next time you request the same URL I'm going to definitely request it again and not go to my cache. So not all browsers historically have respected all of these things, so we have an additional header here, Cache-Control: no-store, no-cache, must-revalidate. All these possible directives trying to really discourage caching. 
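
A small sketch of the expire-in-the-past convention, assuming the 1981 date from the slide. The time of day in the Expires value and the helper name are illustrative assumptions; in a real page each line would be passed to PHP's header() function before any output is sent.

```php
<?php
// Hypothetical helper: the anti-caching headers discussed above, as raw lines.
function no_cache_headers(): array
{
    return [
        'Expires: Thu, 19 Nov 1981 08:52:00 GMT', // a date safely in the past
        'Cache-Control: no-store, no-cache, must-revalidate',
    ];
}

foreach (no_cache_headers() as $line) {
    echo $line, "\n";
}

// The Expires date is long past, so any cached copy is already stale,
// even if the browser's clock is off by hours or days.
$expires = strtotime('Thu, 19 Nov 1981 08:52:00 GMT');
$alreadyExpired = $expires < time();
var_dump($alreadyExpired);
```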
So in general you'll find that this is a combination of techniques that people use for various browsers. Pragma: no-cache is yet another header that's meant to further discourage caching. At the end of the day the browser can still do whatever it wants, so these various headers exist really to really encourage the browser to cooperate and not cache. But there in bold is our Set-Cookie header, PHPSESSID, big random sequence of letters and numbers, path, and Set-Cookie: secret equals 12345 is not a session cookie, but what are some of the takeaways here? So one, the set cookie here does not seem to have an expiration time associated with it. It's not -- there's no mention of seven days from now, an hour from now. So what's the implication? Axle? 

Well, it's going to live forever so anybody finding your computer in a week will be able to see the cookie and anyone -- yeah. 

So, careful, it's actually the opposite in this case. So PHPSESSID -- so when you set an expiration to zero, whereby you don't have an expiration, that actually means the opposite, which is that this is only going to live for the life of the browser. So as soon as you close your -- quit your browser or even worse, restart your computer, that session cookie is going to be lost. The server might still have the contents of your session stored around in a temp file or in a database, but your browser is not supposed to resend this cookie once you have actually quit the browser. Now, changing tabs, sometimes the behavior is not quite predictable but generally until you quit your browser session cookies might linger, but a session cookie by definition is meant to live only for the life of the browser actually running. When you quit it, it should go away. By contrast if you actually saw an expiration date next to path here, path is slash, just signifying the root of the web server, then you could specify that this cookie is in fact good for a week, a month, a year, and you could typically do that yourself if you wanted to remember something for some amount of time. So the second Set-Cookie line here is just stupid. It seems that the programmer of this webpage has specified a secret key of 12345. In other words, feels like the website is trying to remember your password by storing it in a cookie. So what is bad about this? Well one, there's absolutely no reason as we've seen to store the user's password in a cookie. It suffices in PHP through all of our login examples just to use the session and just remember the user by way of this random identifier. You don't have to have the user send you his username and password, again, and again, and again. So this would be indicative of bad practice or at least an opportunity now for a bad guy to kind of do something malicious with that. Yeah? Axle? 
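
The difference between a session cookie and a longer-lived one shows up directly in the header line. This is a minimal sketch with a made-up helper and made-up cookie names; in a real page you would more likely call PHP's setcookie(), passing time() plus the lifetime as the expiry argument.

```php
<?php
// Hypothetical helper: build a Set-Cookie line. A $lifetime of 0 means
// "no expires attribute", i.e. a session cookie that dies with the browser.
function set_cookie_line(string $name, string $value, int $lifetime = 0): string
{
    $line = "Set-Cookie: $name=$value; path=/";
    if ($lifetime > 0) {
        // gmdate() formats the expiry in GMT, as cookie dates are written.
        $line .= '; expires=' . gmdate('D, d M Y H:i:s', time() + $lifetime) . ' GMT';
    }
    return $line;
}

$sessionCookie = set_cookie_line('PHPSESSID', 'abc123');           // gone when the browser quits
$weekCookie    = set_cookie_line('theme', 'dark', 7 * 24 * 60 * 60); // good for a week

echo $sessionCookie, "\n";
echo $weekCookie, "\n";
```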

I have a question. When I work with cookies [inaudible] specify which domain and sub-domain the cookie was valid in, so not specifying [inaudible] that make the cookie valid in all sub-domains of the root of the path? 

Exactly, in this case. Yeah. So, if we had visited foo.com, the cookie is valid for foo.com and anything above it. So www.foo.com, bar.foo.com, or the like. 
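
To make the difference between a session cookie and a cookie with an expiration concrete, here is a minimal sketch in Python rather than the PHP used in this course. It builds the two kinds of Set-Cookie headers with the standard library's http.cookies module. The cookie names and values are made up for illustration and are not taken from the slide.

```python
from http.cookies import SimpleCookie

# Session cookie: no Expires or Max-Age attribute, so the browser is
# expected to discard it when the browser quits.
session = SimpleCookie()
session["PHPSESSID"] = "5899f546557421d38d98b0de2d36ec1c"  # made-up value
session["PHPSESSID"]["path"] = "/"

# Persistent cookie: an explicit lifetime of one week, given in seconds.
persistent = SimpleCookie()
persistent["remembered"] = "yes"  # hypothetical name and value
persistent["remembered"]["path"] = "/"
persistent["remembered"]["max-age"] = 7 * 24 * 60 * 60

session_header = session.output()
persistent_header = persistent.output()
print(session_header)
print(persistent_header)
```

Only the second header carries a lifetime; the first relies on the browser treating a cookie with no expiration as good only until the browser quits.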

Would that be a security threat as well if you were running a big site that has multiple sub-domains? 

It's a good question. Potentially yes. If you're running a big site with multiple sub-domains, or different applications, web applications running at different sub-domains, absolutely. Generally you should put cookies in the most narrowly defined cookie space possible. So if you have a website that is, again, foo.com, and you have a.foo.com, b.foo.com, c.foo.com, all of which are different applications maybe with different users, different functionality, then you should really be setting your cookies in a.foo.com, or b.foo.com, and nothing should go in foo.com itself. Really good point. You can also mitigate this in some part by, as we saw with mod_rewrite, the ability to redirect the user to different URLs. If you want to standardize not on CS75.net, but www.CS75.net, you can ensure through a redirect mechanism that you're only planting the cookies in the www version ultimately, which can be useful. All right, so let's push a little harder on this cookie issue in session hijacking. So session hijacking, again, to be clear, refers to the process of someone stealing your cookie somehow, whether by sitting near you in Starbucks, or having access to the routers on the Internet between Points A and B, and then presenting it to the world as their own. Now how can you go about hijacking someone's session? Well physical access. If you have physical access to someone's computer how do you go about finding their session cookie? Yeah, Jack? 

You find their cookies folder and you take whatever is in there. 

Yeah, exactly. You can poke around in some operating systems, in some browsers, literally to a folder somewhere on the hard drive that contains cookies. Now in fact, the folder tends not to contain session cookies because it tends to contain persistent cookies that have an expiration that's not zero, but in this case you can probably open like, about colon cookies, or about colon history. Generally in all browsers you can start poking around your own history. So if you have physical access to your sibling's or your roommate's computer and they've left it unlocked there's really not much of a barrier between you and their cookies. It might take some technical know-how but frankly if you Google how to find cookies in Firefox or the like, I'm sure someone has posted how you can go about finding cookies in various browsers. Useful for diagnostic purposes, a little scary if you're vulnerable to having your session hijacked. Packet sniffing. So we talked about this earlier, whereby it's really not all that hard to download free software these days that just sniffs wireless traffic in Starbucks, in this room. Anytime you're not using something like WPA2, the encryption protocol that's used by a lot of wireless routers these days to encrypt your traffic, well then anyone sitting near you right now could be sniffing your traffic and stealing your cookie from any website that's unencrypted. So session fixation is just an unnecessarily fancy way of saying hardcoding a session ID as your own. Now in theory you could just guess someone's session ID by picking a random sequence of letters and numbers, but the reality is, as you saw, these things are so long that it's going to take a huge amount of time for you to try guessing all possible session values -- session keys and that's why they're so long. 
But as soon as you've found it, maybe by sniffing or physical access, session fixation just refers to the spoofing of your own cookie, as by writing a program or downloading some program that says here, use this cookie as my own and not what the web server gave me. And then XSS, cross site scripting attacks, we've discussed these a couple times, and we'll see in just a little bit an example. So how do we mitigate these threats? Like this is a bunch of scenarios, all of which lead to a bad ending for me whereby my session's been hijacked. What kinds of defenses do we have against these various scenarios? Axle? 

Well, physical access, you don't really have that much defense against. Once somebody has your computer they can essentially do everything. 

Okay. 

But the packet sniffing and session fixation, that could be fixed by sending the data encrypted by -- 

Good. 

-- HTTPS and the XSS could be fixed by being really thorough and escaping everything that the user sends [inaudible]. 

Okay, good. So in the case of packet sniffing especially, just using HTTPS goes a long way, because HTTPS is end-to-end encryption between point A and point B. And in this way you're ensuring that your cookies are among the things encrypted so no one can see actually what you've encrypted in that scenario. All right, but suppose the website doesn't offer HTTPS, so what if I just instead turn on the encryption feature of my wireless router, in Starbucks or in my home, or even on Harvard's campus? What if I just turn on WPA2 so there's a little padlock icon then next to the name of the router in my list on Mac OS or Windows? Does that solve it? Yeah? 

Well anybody who's ever done a trace route sees that the actual packet is sent through multiple [inaudible] and many networks before it actually arrives, so anybody on any of those networks can essentially do it even if they don't have -- and if they don't have WPA2 they can sniff it. 

Okay, good. And to be clear then, when I turn on something like WPA2 and connect to an access point, a WiFi access point that requires a password, for instance that one right there on the wall with the green blinking light, where is my data encrypted, between what points? Yeah? 

Well, your computer and the router. 

Good. And where is it not encrypted? 

Anywhere else. 

Good, between the router and anywhere else. All right, so good. So, what's the implication there? Well, you're protecting yourself against someone in Starbucks but you're not protecting yourself against someone who's somehow sitting between you and point B, whether it's some random staff member in some facility that has a router or it's someone in Amazon's area. In short, you don't have true end-to-end encryption. So, better, but not great. And XSS we'll come back to in just a bit as to how to try protecting against that. But there's some other things I propose here. Hard-to-guess session keys. So frankly they're already pretty hard to guess but clearly PHP's long session IDs are not a complete protection, because again, as soon as you sniff one then you can just copy/paste. So making them even longer is probably not going to gain us much. What about rekeying the session? In other words changing the user's session ID, changing the value of their PHPSESSID cookie every few seconds, every few minutes, every request, whatever, just change it once in a while. Good? Bad? Axle? 

I see a potential risk for that. Wouldn't you be sending the session -- wouldn't you be sending it many times so that somebody on the network could see well, this computer is every minute sending out a session ID and I know it lasts for a minute, so they'll see. If you just send it once then it might be harder for somebody to actually find that in [inaudible] in all the traffic. 

Okay. 

If you send it continuously somebody might notice that and that might be a security threat, even though it only lasts for like a minute or whatever. 

Okay, good. So rekeying really just means sending a new Set-Cookie header, but the problem is if you're worried about people sniffing your keys and that's why you're changing the key, well if the threat is that they're sniffing your keys you can change it as much as you want, the bad guy's still just going to sniff the new ones. So you might be making his life more annoying in that he has to constantly stay up to date with your latest key, but actually there's a worse scenario here. If the web server is changing the key -- 

Well if the person, whoever's trying hack in, sniffs your cookie or your key, sniffs your session and takes the session and they log on in some period of time while it's expired, the session, they get your session ID and they also make it so that you can't [inaudible] on with that same session. 

Perfect. If you rekey the session but the bad guy's already sniffed your initial ID, spoofed his own as yours, so as to log into your Facebook account or whatever, and now the server decides okay, it's time to rekey to prevent bad guys from getting in, who's going to get the new key, potentially first? The bad guy, at which point your cookie is no longer valid so you've effectively just been logged out of your account. So in short, rekeying while it may seem like, oh, this is a good way to sort of dodge the adversary, doesn't really fundamentally help us because it's still sending the rekeying over the same insecure mechanism. Encryption, this is actually a pretty decent solution, and using at least something like WiFi with encryption enabled so at least you're not vulnerable to random people who are sitting in the airport or Starbucks or bored near you. More likely than not someone near you cares a little more about your data, roommate, sibling, or whatnot, than some random person on the Internet. So at least that raises the bar, but even better would be HTTPS. Unfortunately not all servers support that and indeed it was only a year or so ago that Google started offering it for Gmail by default, and Facebook only a few months back started offering SSL support as well. So, not all websites have done it. Because what's involved in SSL? How do you make your site run on HTTPS? What do you need? Yeah? 

Well you buy a certificate that a person sells and the way that people sell certificates is through a big chain of trust. Yeah, somebody trusts somebody and all the vendors, so that sells certificates, they're backed up by bigger players, big companies. So that's essentially how it works. And it's not really -- it can be assuring for the user but it's mostly just getting the green bar and then an image say certificate, like [inaudible]. 

Okay. Good. So, in short if you want to enable SSL on your server, one, you have to buy -- or you should buy a certificate from someone reasonably reputable. You don't need to break the bank by paying for $1000 VeriSign certificates. Generally $50 or $100 ones from places like GoDaddy or the like suffice. But you do have to buy this because you need someone else to endorse the fact that you are who you say you are. In other words, if I go to GoDaddy and buy a certificate, before they give me that certificate they're going to send an email to the email address of the person who bought that domain. Hopefully it's me, because if that's the case they're going to send an email to the email address with which I bought the domain, that email account I'm going to check, I'm going to see, oh, someone is trying to buy an SSL certificate for CS75.net, is this okay? Click this link to approve. And if indeed I've received that email I can approve it. By contrast, if Axle owns some domain name, like Axle.com, and I want to, for whatever reason buy an SSL certificate for his domain. Because maybe I'm trying to trick users into visiting my website by calling it the same domain and therefore, I want to trick them even more into thinking it's secure by buying an SSL certificate for his domain name, he's going to get that email confirmation because he bought the domain, he gave them his email address, and unfortunately they're not going to hand me the SSL certificate until he confirms. And unless he actually confirms, say because he isn't reading the email carefully, I'm not going to get that certificate after all. So once I do that I install it in my web server. We saw a while back in httpd.conf you have to specify the file name of the certificate that you've downloaded as well as your so-called private key, which is a number that you have generated. 
Because at the end of the day recall that SSL boils down to this thing called public key cryptography, which is a mechanism mathematically whereby a person has a private key and public key, and which of those is used for what process when it comes to encryption? With which of those keys, private or public, do you encrypt information if you know? Yeah? Axle? 

The private one? 

Not a bad guess, but it's the opposite. So in this case, in public-key crypto you start the process by generating two really big random numbers that are somehow -- not quite random, they're somehow mathematically related. And they have these key properties in public-key crypto whereby the private key is the only key in the world that can decrypt information that's been encrypted with the public key. Now why is this compelling? The fact that I have two keys is really nice because suppose now Jack and I -- suppose I'm a user on the Internet and Jack has a website and he's trying to sell widgets on this website securely using HTTPS, and therefore he's bought an SSL certificate and somehow I need to communicate securely with Jack. Well, unfortunately most encryption algorithms assume a private secret between me and Jack. For instance, if you think back to a silly example in grade school, if you wanted to pass a note to a classmate or someone you were crushing on. You want to send them a secret note, literally pass it to a classmate and then give it to the boy or girl across the room, but you don't really want the teacher, let alone the kids between you and that boy or girl to intercept it and read your secret love note, or whatever it is you're sending. So a kid might do something super simple, like change the letters around, so instead of sending an A, you say a B. Instead of writing a B, you write a C. C, D, and so forth. Something simple like that, and in fact there's an algorithm known as the Caesar cipher, or ROT13, which is a specific example of this that does exactly that. Right? Because what's a non-technical teacher going to do? They're going to see this note, it's going to look like nonsense, they're not going to know what it is, or they're at least not going to care to decrypt it by figuring out how you encrypted it. So, in short, nice little childhood encryption scheme. 
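
The letter-shifting scheme described above can be sketched in a few lines. This is a minimal illustration in Python rather than the PHP used in this course. The function name rotate and the sample note are made up for the example. A shift of 1 matches the "A becomes B" description, and a shift of 13 gives ROT13.

```python
import string

def rotate(text: str, shift: int) -> str:
    """Shift each letter by `shift` places, wrapping around the alphabet.

    Characters that are not ASCII letters are passed through unchanged.
    """
    result = []
    for ch in text:
        if ch in string.ascii_uppercase:
            result.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        elif ch in string.ascii_lowercase:
            result.append(chr((ord(ch) - ord("a") + shift) % 26 + ord("a")))
        else:
            result.append(ch)
    return "".join(result)

note = "MEET ME AFTER CLASS"
scrambled = rotate(note, 1)        # A -> B, B -> C, and so forth
recovered = rotate(scrambled, -1)  # the recipient shifts back by the same amount
print(scrambled)
print(recovered)
```

Both sides have to know the shift in advance, which is exactly the shared-secret requirement that makes this kind of scheme awkward between strangers on the Internet.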
So the problem with that is that if Jack is some random website on the Internet from whom I want to buy something for the first time, Jack and I do not have some known algorithm that we can use. I can't just write him a purchase order and then encrypt it by changing A's to B's, B's to C's, and handing it to him over the Internet because he's obviously not going to know how to reverse that process. So, I could call him up, we could agree on some secret like rotate all the letters one place or something stupid like that, but obviously this is not how Amazon.com works, or real stores work. So public-key crypto is an alternative to that scenario, which is generally called secret-key crypto where Jack and I have some secret. If that's not going to work I can instead use public-key crypto whereby I have a public key and a private key, Jack has a public key and a private key, and the nice feature of public keys is that as their name suggests you can broadcast them to the world. You can display them on your website. You can put them in your email signatures. You can just send them in the clear over the Internet whenever someone asks you for it. And indeed the way SSL and other algorithms work to get started is if I am trying to buy something from Jack's website and it's using SSL, Jack's website essentially says, "Here, this is my public-key, use this to send me information securely." And before I send him any messages or orders or credit card information I first take my information and I encrypt it with his public key, then I transmit the cipher text, the scrambled stuff, across the Internet to him, because what's the only key in the world that can unlock or decrypt that message now? Axle? 

Well, that's the private key. 

The private key. 

He's the only one who has that. 

Exactly. He, by definition of private is the only one who should have that. He's made a mistake if he gives it to anyone else. So only he should be able to decrypt that message. And similarly if he wants to send me a secret message I can just give him my public key, like I can anyone else, and that process can happen in the other direction as well. In reality something like SSL uses a little bit of public-key crypto as well as some other techniques because it tends to be a little more expensive computationally than secret-key crypto, but that in general is the key property. So, how does this help us? Well, so SSL is again, as Axle said, it's this chain of trust. You buy an SSL certificate and it will just work mathematically but until the browser -- until you pay someone for that certificate and have them digitally sign the certificate, so to speak, no one else in the world is supposed to trust the certificate. Because recall from a few weeks ago various browsers, IE, and Firefox, and Chrome, and the like all ship with certain certificate authority keys in them -- certificate authority certificates in them, which says Chrome will trust any SSL certificate from GoDaddy, from VeriSign, and from this list of a whole bunch of other SSL certificate selling companies. So, if you have not bought your certificate from one of those companies Chrome is going to do what to the user when you try to visit the website? Jack? 

Send some crazy warning to you that says like, this site has an invalid security certificate and you shouldn't stay here, or it might be dangerous [inaudible]. 

Exactly. You will get some warning message scaring you. Chrome tends to look like this right now. "This site's security certificate is not trusted!" I ironically as I did a few weeks ago went to cs.harvard.edu, which has not paid for an SSL certificate, and you see a big red X, you see a big red screen here which means you can't actually proceed unless you click this button. Now, out of curiosity what do most of you do when you encounter a website like this? Ben? 

Keep going. 

You just keep going. Anyone else just keep going? Yeah? Axle? I mean I do. 

If you wanted to access the website in the first place. 

Right, if you want to access the website and you know you haven't mistyped it, so you haven't gone to some sketchy random website. You're where you want to be. This generally signifies an error, or a lack of payment, or the like. So I would propose, and I have no data to back this up, that most people probably proceed. Now, in fairness, Chrome makes it pretty easy to proceed, you literally just click proceed now. If I instead use something like Firefox and go in here, Firefox is a real pain in the neck. So, if you've ever used Firefox here is how you get around this issue. And this is kind of tragic because if you do screw up, or you don't want to pay for a certificate, and you pretty much should. Or if you generate your own certificate and do what's called self-signing it, in other words you do the mathematics yourself, which is totally legitimate in terms of the formula, but you haven't had anyone else vouch for you like GoDaddy or VeriSign, here is what your friends or family or customers would have to do on Firefox. One, they're probably not going to click "get me out of here." They're instead going to have to click "I understand the risks." Add exception, get certificate, confirm security exception. And now you can proceed, and so forth. And right now it's hanging for just a moment. There we go. Now it redirected to the actual CS page. So no, most people are not going to do that, even I get annoyed as anything when I have to go through those hoops. So in short, this is why you pay for an SSL certificate. Now the theory behind it is great. As we discussed a couple weeks ago, chain of trust, it's very reasonable, it's a nice way of sort of ensuring there's a mechanism in place whereby you're not visiting potentially bad guys or the wrong websites. But again, it's nice in theory, it's not necessarily the best experience in the end in practice. All right, any questions? 
All right, so rather than scare you with some math let's go ahead and take our five minute break here, but when we come back we'll give an example of how you can implement public-key cryptography to make it a little more concrete than just there's math that makes it work. And then we'll also take a look at a bunch of other threats including involving SQL, JavaScript, and HTML. So why don't we go ahead and take our five minute break here. All right. So, math time, though it's fairly formulaic math. All right, so how do you possibly send -- come up with a public and a private key in such a way that you can use the public key to reverse the effects of -- rather, you can use the private key to reverse the effects of the public key and ultimately exchange some secret? So, there's a couple of very popular protocols when it comes to public key cryptography. RSA is one with which you're probably familiar, at least by name. Has to do with coming up with a public and a private key using very interesting properties of large numbers that when multiplied together are very difficult to un-multiply or factor back down to their primes. So -- but that one's a little more involved and so a nice one to use for the sake of discussion when it comes to public-key crypto is something called Diffie-Hellman, or DH. And this is an algorithm that similarly involves public-key cryptography, but it's an interesting example of how you can essentially shout across a crowded room some number and your partner, B, can do the same and somehow together that mere exchange of that information is enough to come up with a public/private key pair. So this story works as follows. In the case here of Alice and Bob we have the following story. Alice and Bob in advance are going to agree upon two numbers, G and P. P is going to be some prime number, and G is going to be what's generally called a generator, which is often the number two. As simple as that. 
But P is a big prime number and they can choose P however they like. So they have to agree upon those in advance, they can tell their friends and announce to the world what they are, these are not secret values, but they have to decide on them upfront. So then Alice decides on some random number A, and Bob similarly decides on some random number B. They don't tell each other these numbers, but they choose them in isolation of each other, and then they perform a bit of mathematics, specifically Alice goes ahead and computes this value here. G to the A mod P. In other words, Alice goes ahead and takes G, which is probably the number two, raises it to the power A, and then does modulo P. What does the modulo operator generally do? Yeah, Isaac? 

Well it's kind of like the remainder. 

Okay, yeah. It's kind of like the remainder. It's what you would get if you were to divide some value, like G to the A, by some other value. It's what you have left over if it doesn't divide evenly. And as an aside I'm trying a new program for the first time here which is why I have this trial software on the screen, but it's the only way I could try drawing for the first time here today. So G to the A mod P ends up being some number, and we're going to call it generically T sub A. So it's T and it's Alice's T. So what does Alice then do? She transmits that number, G to the A mod P, otherwise known as T sub A, across the Internet, in the clear, doesn't matter. So in other words, even though it's drawn here mathematically as G to the A mod P, she doesn't send that written expression. She sends the result of that arithmetic operation, T sub A. So Bob does the same thing but using his B instead of A. So he gets some number, sends that across the Internet. So at this point in the story Alice has sent that first box, Bob has now sent this second box, and so now Alice has what? She has A, because she came up with it, and she also has T sub B. Similarly, Bob has B and T sub A. So those are the values that have now been exchanged. So what do they then do? They then both go ahead and compute this value and this value. Alice computes T sub B raised to the A power. Now what does that mean? Well, T sub B is just our variable that represents what Bob sent her. So she's raising whatever Bob sent her to the power of A. And Bob does the same thing, raising to the power of B what Alice sent him. But if you recall how exponentiation works, when you raise something to the power and then again to another power you end up multiplying the exponents. So mathematically what both Alice and Bob have done here is raise G, which again is some number like two, to the power of A times B, or rather, power of B times A, but that's the same thing. 
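
The exchange described so far can be tried with small numbers. The following is a minimal sketch in Python rather than the PHP used in this course. The specific values (P = 23, G = 2, Alice's 6, Bob's 15) are assumptions chosen only so the arithmetic is easy to follow by hand; a real exchange would use a very large prime and randomly chosen secrets.

```python
# Public values, agreed upon in advance and not secret.
p = 23  # a small prime, for illustration only
g = 2   # the generator

# Private values, chosen in isolation and never transmitted.
a = 6   # Alice's secret (illustrative)
b = 15  # Bob's secret (illustrative)

# Each side computes g to its own secret, mod p, and sends the result in the clear.
t_a = pow(g, a, p)  # what Alice sends to Bob
t_b = pow(g, b, p)  # what Bob sends to Alice

# Each side raises what it received to its own secret, again mod p.
alice_result = pow(t_b, a, p)
bob_result = pow(t_a, b, p)

print(t_a, t_b)
print(alice_result, bob_result)
```

Both results are G to the AB mod P, so they match even though neither side ever learned the other's secret. Python's three-argument pow performs the modular exponentiation directly, which avoids computing the enormous intermediate power.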
With multiplication you can do it in either direction, mod P. So at this point in the story Alice has A, she has T sub B, but she also has G to the AB mod P, even though she doesn't know what B is. And Bob conversely has the same resulting value, G to the AB mod P, and he has B, but he doesn't have A. So we seem to have constructed a scenario in which both Alice and Bob have some shared secret, rather where Alice and Bob both have some value, G to the AB mod P, but each of them only has a piece of the rest of the puzzle. Alice has A, Bob has B. So essentially Alice and Bob can now use this number to encrypt information and they can reverse its effects by using their own respective private keys. So it's not quite identical to, for instance, what a browser would use these days, it's sort of a simpler version, or simpler story of what's possible. But it hints at how the mathematics can behave in such a way that you can share some information publicly, like me just talking to Jack's web server, or me just talking to Bob in this scenario, and nonetheless preserving some notion of privacy whereby there's still a number that only I know and can use for the decryption part. All right, so now let's try a concrete example of the SQL injection attack. We've been talking about it for some time but haven't necessarily teased it apart. So here's a login form up top. It's representative of most login forms. You got a username, a password field, and then a checkbox for keeping me logged in. Now as an aside, before we look at the SQL, how does a box like that generally get implemented? Facebook has something like this; most websites have something like this. And when you check that it obviously keeps you logged in either for seven days or until you explicitly log out. Yeah, Axle? 

Well it has to store the login variable in a cookie. Right? Because the session would be destroyed once you closed the browser window. 

Good. 

So if you want to be still logged in when you open it again you have to store it [inaudible]. 

Good. Exactly. So, by checking this box I'm essentially asking the web server, plant a cookie on my hard drive that somehow will remind you who I am and that I have logged in. Now in the worst possible implementation of this, that box could result in the server setting a cookie with both my username and password so that anytime I visit it just sends it again and again and then when I finally logout it just deletes those cookies. But that's of course not a good mechanism, we talked earlier about HTTPS and if it's sent in the clear you're just telling the whole world your username and password. Plus it's just not necessary. So instead what this button would probably do is tell the web server to set another cookie that's not the PHPSESSID, which is a PHP specific thing, it's just for the superglobal's purposes. Instead setting a cookie with the setcookie function that's called authenticated, or something like that. And that key authenticated has some value that's similarly a big random looking number, but that big random looking number is remembered on the server, maybe in some database, and the next time the server sees that same big random number in a cookie, it checks its database and sees, oh, I gave this big random number to Isaac, let me assume that this is Isaac again and show him Isaac's Facebook profile, or whatever website he's actually visiting. Now of course you're still vulnerable to what? 
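
The "big random looking number remembered on the server" idea can be sketched as follows. This is a minimal illustration in Python rather than the PHP used in this course. The dictionary stands in for a database table, and the function names (issue_token, user_for_token, forget_token) are hypothetical, not part of any framework.

```python
import secrets

# Stand-in for a server-side database table mapping tokens to usernames.
remembered_logins = {}

def issue_token(username: str) -> str:
    """Create a hard-to-guess token, remember whose it is, and return it.

    The returned value is what the server would place in the cookie.
    """
    token = secrets.token_hex(16)  # 16 random bytes, as 32 hex characters
    remembered_logins[token] = username
    return token

def user_for_token(token: str):
    """Return the username the token was issued to, or None if unknown."""
    return remembered_logins.get(token)

def forget_token(token: str) -> None:
    """Invalidate the token, as when the user explicitly logs out."""
    remembered_logins.pop(token, None)

cookie_value = issue_token("isaac")
print(user_for_token(cookie_value))
```

The cookie carries only the token, never the username or password, so the server can invalidate it at any time by forgetting it. Anyone who obtains the token can still present it, which is why the transport matters.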

Physical [inaudible]. 

Okay, physical access is always a problem, but what else here? Jack? 

Packet sniffing. 

Packet sniffing. Right? If it's not HTTPS this feature is even worse now because sessions are generally ephemeral, right? They only exist for the life of that browser window. But something like a cookie that has a seven day expiration or no expiration, that means someone can sniff this and use it anytime they want so all the more reason to ensure that this is encrypted. All right, so now the SQL part of this. Suppose that this form is a login form that essentially results in a SQL query getting generated like this one here. So we have -- this can be implemented in a few ways, and I went with the mysql_query version of this specifically so that the inputs would not be escaped. So let me go ahead here and zoom in a bit. And what do we have here? So on the left we just have a variable called result for our result set. To the right we have mysql_query, recall which was the function we used initially a couple lectures ago for executing SQL commands. Then I'm using sprintf. Does anyone recall what sprintf does? It's not strictly necessary but it's a possible approach? 

That you insert placeholders where the variables go and then you do command and then you define the variable. 

Yeah. Exactly. You just insert these placeholders. So in this case I have a placeholder %s, another placeholder %s, and that means that the next two arguments to sprintf, here's the first, here's the second, are going to be plugged in for that placeholder. And it just makes things a little more readable, I don't have to use the concatenation operator, the dot operator. I don't have to use the curly braces. It's just one way of constructing a SQL query without just doing it all very manually. But there's a fundamental problem here with my query because I've obviously not done what? Axle? 

You haven't escaped with user input [inaudible] anything. 

Exactly. I haven't called mysql_real_escape_string, which even though it's a poorly named function it does escape potentially dangerous characters, things that might lead to the server being tricked into executing a command like delete, or drop in the database. Now, how can we see this here? Well notice that the SQL query that's being built up is SELECT uid, so user ID, or whatever that is, FROM users WHERE username='%s' AND password='%s'. So what's perhaps a dangerous character a malicious user could provide when filling out this form? Jack? 

Semicolon? 

Semicolon, potentially dangerous because it would seem to terminate the query, or in this case what's even worse? 

You can do a single quote to end the username -- yeah end the username and then you can insert the SQL in between. 

Exactly. In this case the semicolon is not going to be too worrisome because it's going to be in between quotes. So it's not going to terminate the query. But if I were to be like David O'Malley, like O-'-M-A-L-L-E-Y, or some Irish name that has an apostrophe, or any name that has an apostrophe, that's going to be a problem because it's going to be WHERE username='O seemingly unquote and then Malley and then another quote, a third quote, and now things are just imbalanced. Now that's going to break. So that's just going to trigger some kind of server error, but what if the bad guy is smart enough to realize that if he's going to pretend to close one of my quotes, if he's going to pretend to close this first quote here. He had better realize that in order for this not to be a syntax error, and in order to truly trick the server into executing something, he better open a new quote later that corresponds with my second apostrophe. So what do I mean by this? Well let's take a look at what a user might type in. Suppose the user -- rather, whoops. Suppose the user types in this, and I've deliberately removed the bullets that would normally appear in a password field just so you can see what this bad guy has typed. But what if he proclaims his password is 12345' OR '1'='1. Now that in and of itself does not look syntactically valid, but what's the implication of the bad guy having provided this as his password? Axle? 

Well it's going to be sent to the server and first of all it's going to add two quotes on either side because that's what's inside your password thing. 

Good. 

But then it's going to interpret -- I think it's going to interpret the 1=1 as just a valid, logical statement, so it's going to return true to the login. 

Exactly. So because one obviously equals one, and because the bad guy has been smart enough to construct a string that looks weird, but if you think about where it's going in my PHP code, it's going to be prefixed with a single quote and it's going to be suffixed with a single quote, at which point this actually becomes a syntactically valid SQL expression, or condition. So now the fact that he's specifically saying OR '1' equals '1, that's the real brilliant aspect here, because he had a hunch that I was doing some kind of SELECT where I'm and-ing or maybe or-ing things together. But if he somehow tricks me into executing, give me uid if 1 equals 1, well that code is indeed going to return one or more uid's if there are any users in the system. And presumably, like we saw in our login examples a couple weeks ago, if you are using the presence of a uid as signifying that a user exists and should therefore be logged in, well now the bad guy has somehow tricked the server into logging him in as who knows who but as someone. And if his goal was simply to get into the system or get WiFi access, or you know, take someone's money, now he has access to an account on the system even if he doesn't know which uid was returned. So to be clear, in red here, the query that's just been constructed would look like this. SELECT uid FROM users WHERE username equals 'jharvard' AND password equals '12345' OR '1' equals '1'. And unfortunately one always equals one, which means you're going to get back a result set with one or more uid's if there are, again, one or more in the database. So, kind of bad. How do we actually fix this? Irony is that it's so simple to fix, it's annoying to type but simple to fix. 
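To make the query construction just described concrete, here is a minimal sketch in JavaScript rather than the lecture's PHP. The function name buildLoginQuery is hypothetical and the code only builds the string; it never talks to a database. It simply shows what the server ends up with when the password field is pasted between the quotes unchanged.

```javascript
// Hypothetical helper that mirrors the lecture's sprintf-style query.
// Deliberately unsafe: user input is pasted between the quotes as-is.
function buildLoginQuery(username, password) {
  return "SELECT uid FROM users WHERE username='" + username +
         "' AND password='" + password + "'";
}

// The password the bad guy types into the form.
const injectedQuery = buildLoginQuery("jharvard", "12345' OR '1'='1");

console.log(injectedQuery);
// SELECT uid FROM users WHERE username='jharvard' AND password='12345' OR '1'='1'
```

Because AND binds more tightly than OR in SQL, the WHERE clause reads as (username and password match) OR '1'='1', and the second half is true for every row.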
And we can take the exact same code in blue from before and this time simply call mysql_real_escape_string on both the username field and on the password field so that now I have the ability to specify that these things should be escaped in advance specifically by calling this function here, and this function here. So, why do SQL injection attacks nonetheless happen even though it's so relatively easy to avoid them by simply escaping your input? And to be clear, what does mysql_real_escape_string do? It's for things like quotes; it puts a backslash in front of them, which means you won't be tricked into executing the wrong thing. So why does the world still suffer SQL injection attacks sometimes? Jack? 

Just errors on the programmer. Maybe he didn't put it in. 

Yeah. Exactly. The lack of knowledge, lack of remembering, an attitude of oh, I'll go back and escape my inputs later, which is a horrible possible scenario, to know you're doing it wrong and then to claim you'll do it later, lest you forget. 
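As a rough illustration of the escaping step discussed above, the sketch below puts a backslash in front of quotes before the input reaches the query string. It is written in JavaScript, and escapeSqlString and buildEscapedLoginQuery are hypothetical names; this is a simplified stand-in, not PHP's mysql_real_escape_string, which handles more characters and takes the connection's character set into account.

```javascript
// Simplified stand-in for the escaping the lecture describes:
// prefix backslashes, single quotes, and double quotes with a backslash.
function escapeSqlString(value) {
  return String(value).replace(/[\\'"]/g, (ch) => "\\" + ch);
}

// Same query shape as before, but every user-supplied value is escaped first.
function buildEscapedLoginQuery(username, password) {
  return "SELECT uid FROM users WHERE username='" + escapeSqlString(username) +
         "' AND password='" + escapeSqlString(password) + "'";
}

const escapedQuery = buildEscapedLoginQuery("jharvard", "12345' OR '1'='1");

console.log(escapedQuery);
// SELECT uid FROM users WHERE username='jharvard' AND password='12345\' OR \'1\'=\'1'
```

With the quotes escaped, the whole input stays inside one string literal, so the database compares the password against that odd-looking text instead of evaluating a condition.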

But also I don't think that the login forms are the most vulnerable because I think people don't do this in the login forms, but I know of some Internet security programs they scan the entire website looking for [inaudible] forms and the ones that usually the most vulnerable ones the ones that say send a suggestion, or send [inaudible]. 

Yeah. 

The ones that were added later on top of the site, after the login was finished, and they didn't really think that that would be a potential security threat. 

That's good. Yeah. It's potentially the stupid or the seemingly innocuous forms that have nothing to do with security, nothing to do with users, but if they have to do with SQL, like if you're using SQL to insert into the database some user's feedback, well the problem here is that they can execute not just OR 1 equals 1, they could do something like semicolon SELECT star FROM users and dump your whole database. Or worse, they can do DROP TABLE users, or DELETE FROM users. You can do any number of things. So in fact some of the popular press attacks that you've read about, there was one, was it Yahoo? Someone's recently, I forget, that involved a dump of a SQL database, was very likely the result or was likely the result of something like this where someone injected SQL into a script that hadn't scrubbed it properly. That or someone got access electronically to the database and just kind of manually executed these queries. Both scenarios could yield the data in question. I don't think Yahoo, or whoever it was, was very -- 

I think it was LinkedIn. 

Oh, it was LinkedIn, maybe that was the one. I don't think they were very forthcoming with their details. But the fact too that one of them actually had clear text passwords I think. That was idiotic, whoever that was. That was just not necessary. Right? That was like, what? Lecture three or something? So, all right. Anyhow. So what more is there to fear out there? Oh, and to be clear, in green here this is what the bad guy would experience if you actually called mysql_real_escape_string. Looks stupid but it's no longer tricking you into evaluating something as a 1 equals 1 expression. Now it says give me the user ID where the username is jharvard and the password is literally 12345\' OR and so forth. Which is not likely someone's password. And even if it is, it's probably John Harvard who's trying to log in with that. All right. So same-origin policy, and its relation here to security. So we talked briefly about this on Monday in what context? There's a same-origin policy that, long story short, essentially says what? Axle? 

Well, I encountered it when I tried to get some [inaudible] data from the [inaudible]. They wouldn't allow me because I wasn't on the same server. 

Good. 

So what I had to do -- well, you can go around it, you can just do a PHP [inaudible] contents and then echo [inaudible] HTML, but you would essentially, the policy constitutes that you have to be on the same server [inaudible] database. 

Exactly. Browser behavior is governed generally by the same-origin policy, which is really relevant for us most recently in the context of Ajax. Which allows us of course to get more data from a server and incorporate it into the existing DOM. Problem is the same-origin policy tells browsers that you can make Ajax requests to other servers but you cannot integrate their response into the current DOM because it's not from the same origin as the original DOM, the original HTML file that was downloaded. So there's a couple workarounds for this. One, the person who owns the server, Yahoo in that case, could start sending certain HTTP headers, called CORS headers, C-O-R-S, that essentially tell browsers it's okay to let people incorporate our data into their existing DOMs. Most websites don't enable that, and you can do it if you run the server, you can't do it if obviously you're trying to use Yahoo. Alternatively you can write a proxy, a little PHP script that's maybe one or two lines that makes the request for you, but that PHP script at least lives in the same origin as your JavaScript file, so that might help. You can use a third party proxy service like Yahoo's YQL service that I mentioned last time, which will allow you to go to some other server and send the data back to you in a way that you can incorporate it back into your own DOM, which is nice. Or you can use something called JSONP, or padded JSON, which essentially uses some interesting script tag hacks so that you can grab data from a server and you tell the server in advance what function you want to be called when the data comes back. And if the server cooperates then you can trick the server, rather, it's not so much a trick in that case. You can incorporate data into your DOM by using a server that supports, again, JSONP, padded JSON. But many websites don't support that; they'll support just JSON or XML, or the like. So in short, a number of workarounds but a pain nonetheless. 
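The JSONP idea mentioned above can be sketched roughly as follows. Everything named here is an assumption for illustration: jsonpResponse stands in for a cooperating server, handleQuote is the callback the page asked for, and the payload is made up. In a real browser the response would arrive through a script tag; this sketch simulates that step so it can run under Node.

```javascript
// Server side (hypothetical): wrap the JSON in a call to the function
// name the client asked for, which is the "padding" in padded JSON.
function jsonpResponse(callbackName, data) {
  // Only accept plain identifiers so the callback name cannot smuggle in code.
  if (!/^[A-Za-z_$][\w$]*$/.test(callbackName)) {
    throw new Error("invalid callback name");
  }
  return callbackName + "(" + JSON.stringify(data) + ");";
}

// Client side: the function the page wants called when the data comes back.
let receivedQuote = null;
function handleQuote(quote) {
  receivedQuote = quote;
}

// Made-up payload, purely for illustration.
const jsonpBody = jsonpResponse("handleQuote", { symbol: "INFX.PK", shares: 1 });

// A browser would execute jsonpBody by loading it via <script src="...">.
// Here that is simulated by running the text with handleQuote in scope.
new Function("handleQuote", jsonpBody)(handleQuote);

console.log(jsonpBody);      // handleQuote({"symbol":"INFX.PK","shares":1});
console.log(receivedQuote);  // the parsed object, now available to the page
```

The point of the design is that script tags are not subject to the same restriction as Ajax responses, so the data arrives as code that calls back into the page. That also means the page is trusting the other server to send only what it promised.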
Oh, and this affects everything, windows, frames, objects, Ajax requests, and it's the Ajax ones that are the most current for us. All right, so a couple of remaining attacks to be mindful of. Things that you might not have even appreciated as you've made your HTML based websites or more recently your PHP websites, or now your JavaScript based websites. Cross-Site Request Forgeries, which has an acronym I rarely remember, and Cross-Site Scripting attacks, XSS. So what are the two and what should you fear? So here's the story here. You log into something like project2 and suppose it lives in some domain name on the web. So domain.tld, top level domain. So you log into project2.domain.tld. You then visit a bad guy's site and the bad guy has a link to this URL here, http://project2.domain.tld/buy.php?symbol=INFX.PK. You unwittingly buy the penny stock. What's going on here? So this is assuming something like CS75 finance in this case, although, actually, two would have been the [inaudible]. So if it's project1, this would have been something like CS75 finance, and it seems to be proposing that this bad guy has a link that just so happens to be leading back to your domain, but he has it there because he just likes tricking people into buying his penny stock. And that's advantageous for him, the bad guy, because the more people that buy the penny stock, the more it drives the price up, then he can sell and screw over all these people, so a reasonable attack scenario. So how is this working, or what's the flaw in your website, project2.domain.tld? Jack? [ Inaudible Speaker ] Good. [ Inaudible Speaker ] Perfect, right? It's not that hard to figure out how a website works; you can just use Chrome's Inspector. Anyone who's taken this class, like, has the savvy with which to start poking around, and just figure out how almost every mechanism of a website works, unless things -- like, JavaScript [inaudible] which just makes it harder. 
But at the end of the day you can certainly figure out what HTTP parameters are being used by just watching the traffic in your own browser or some other debugging tool. And if you made the conscious decision as the designer of this website to have a buy.php file, which is reasonable, and it takes a parameter called symbol, and a stock symbol as its value, and that buys one share [inaudible] or something like that. Well, you have made a mistake here because you've made it super easy for a bad guy to similarly construct a URL that looks like this. And the bad guy might not even put it in his website, what if he just sends a million spams and tells people in that spam email to click this link, right? He's going to get some small percentage of people actually clicking the link, who if they also happen to be users of your website, have just been tricked into buying that stock. Now obviously this is kind of a silly name for a website, but what if the URL is actually etrade.com, or something like that, and you've been logged in recently to your E-Trade account and they have implemented their [inaudible] parameters in this way, you can be tricked into buying the penny stock. So what's a defense? So Jack proposed, like, a random number. Can you elaborate, Jack? [ Inaudible Speaker ] Good. [ Inaudible Speaker ] Good. So you can have the web server, etrade.com for instance, generating some kind of random token that it requires be in the actual buy. And the motivation here is that if the server uses cookies or sessions to remember that I gave Jack this random ID for subsequent purchases, now the bad guy has to not only know the format of the URL, he also has to get so damn lucky as to also hard code into his link the same identifier that Jack was handed. And if the number is long enough, there's no way statistically that's going to happen. So that can help ward off the threat. [ Inaudible Speaker ] Okay. [ Inaudible Speaker ] Ah, a really good thought. 
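Jack's random-token defense, as just described, might look something like the sketch below. It is written for Node in JavaScript rather than PHP, and the names tokensBySession, issueToken, and isGenuineBuy are hypothetical; a real site would keep the token in its session storage rather than in an in-memory Map.

```javascript
const nodeCrypto = require("node:crypto");

// Hypothetical in-memory stand-in for session storage: session ID -> token.
const tokensBySession = new Map();

// Called when the server renders the page that contains the buy form or link.
function issueToken(sessionId) {
  const token = nodeCrypto.randomBytes(32).toString("hex"); // long and random
  tokensBySession.set(sessionId, token);
  return token;
}

// Called by the buy handler before any stock is purchased.
function isGenuineBuy(sessionId, submittedToken) {
  const expected = tokensBySession.get(sessionId);
  if (typeof expected !== "string" || typeof submittedToken !== "string") {
    return false;
  }
  const a = Buffer.from(expected);
  const b = Buffer.from(submittedToken);
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && nodeCrypto.timingSafeEqual(a, b);
}

const jacksToken = issueToken("jacks-session");
console.log(isGenuineBuy("jacks-session", jacksToken)); // true
console.log(isGenuineBuy("jacks-session", "12345"));    // false
```

A forged link can copy the URL format, but it cannot include the token that was handed to one particular session, which is the whole point of making the number long.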
So among the HTTP headers that a browser typically sends there's one called the referer header, which specifies, I came from this URL recently, and this is useful. This is how things like Google Analytics figure out where you're coming from when you get to a page. They can tell you that you came from the Google, or the like. But the catch with this HTTP referer header is that it's not guaranteed to exist. It's a nice feature that most browsers honor, but if you're one of these paranoid types, or you are behind corporate firewall that scrubs certain information, the HTTP referer header is not required for correct behavior of a browser, so privacy scraping tools -- privacy scrubbing tools could remove it all together. And it's just not always sent, depending on how you visited the URL or you open a new tab or the like. So not bad, it raises the bar a bit, but it's not a sufficiently reliable mechanism. It's not going to keep your purchases super secure. So one problem, too, seems to be we're using what method here for the stock purchase? Ben? [ Inaudible Speaker ] Okay, so what's an obvious alternative, then? [ Inaudible Speaker ] Okay, good. So if you use post instead, you don't just have to trick the user into clicking the link, you have to trick them into filling out a form, at least clicking a button. Axle? [ Inaudible Speaker ] Good. [ Inaudible Speaker ] Good. So that's the catch. So this does raise the bar, and you've just protected yourself against the less intelligent of adversaries. But we saw the other day how you can register event handlers [inaudible] that prevents form submission. Turns out in JavaScript there's a function called submit that you can call to actually submit a form via code. So all you've done here, too, is you've made it a little harder for them. You've tried -- now the user has to fill out a -- has to submit a form by clicking a button. 
But, again, with JavaScript, and almost everyone has JavaScript enabled, a user could visit a page, there could be a hidden form there, and they're tricked into submitting it because there's a JavaScript function that just says, when the dom is loaded, submit this form automatically. So it raises the bar, helps maybe with emails a little bit, but not all of the possible attacks. And in fact it's a little scarier than this. Even though I chose this example of, like, a spam email or the user visiting the URL, take a look at these other possible attacks. Suppose you just happen to visit a website that has an image in it, a script tag, an iframe, another JavaScript tag with some code, you can trick the user to visiting URL in so many different ways, right, simply by including an image tag, that's maybe the simplest one. And notice this is kind of weird in that I'm saying the image is the URL of the buy.php, but it's not a big deal if that doesn't actually return an image, right? It's going to return a broken icon or something like that, but who cares, I just tricked the user into buying the stock already. So realize that you don't even have to be overtly trying to trick the user into clicking a link, you can trick them into sending an HTTP request from their browser or their mail client just by hard coding the URL. So now POST would help with these attacks in particular, but not in the case of a browser supporting JavaScript. So how can we really fix this, then? It's not sufficient just to use POST. I claim it's not sufficient to rely on the referer header, because it's not always there. It might help, but not sufficient. POST doesn't seem to do it fully for us. The random numbers, not bad, actually, that's probably one of the most solid solutions thus far because it requires server side cooperation. Axle? [ Inaudible Speaker ] Okay. [ Inaudible Speaker ] Okay. Good. [ Inaudible Speaker ] Good. [ Inaudible Speaker ] Okay, so a slight variant on Jack's idea. 
This time you have a hidden form field that contains some server generated token that it can validate somehow so that if it doesn't get that same token, it knows that this is a forged buy. So that's not bad. That, too, is solid. What is -- if you've ever bought something from Amazon, what does Amazon ask you to do before you buy something? Yeah? [ Inaudible Speaker ] Okay, so CAPTCHA's. What's a CAPTCHA? [ Inaudible Speaker ] Okay, but what does that mean? 

It's a -- generally an image that viewers having a hard time figuring out. It's distorted text -- 

Okay, good. 

-- generally. And for -- for [inaudible] it's really hard to see what it actually says, but for you it's really easy [inaudible]. So [inaudible] spend time [inaudible] can't have [inaudible] because [inaudible]. But [inaudible] buy stock, it's going to go, okay, [inaudible] five letters and five numbers. 

Perfect. Yeah, and so you just -- you put a bump in the road, so that even though the user could be tricked into visiting your webpage, either behind the scenes with a tag like these, the image tag, or explicitly by a link they click in an email or in a website that they visited or redirect. Before the buy.php actually buys the stock, it shows the user a CAPTCHA, a thing like, please type in the following word that you see, just to raise the bar so that the user now kind of has to be not so sharp. If they're like, okay, I'll fill out this form, even though that form says to buy this stock fill out this form, or fill out this CAPTCHA. And similarly we could do something like Amazon does itself, they just prompt the user to re-login. So even if you're logged into your Amazon account and are browsing and adding things to your cart, the moment you click check out, even if you just logged in a minute ago, they prompt you to login again, putting that bump in the road so that the user has to demonstrate that it's indeed me and not some bad guy that just tricked me into visiting the checkout line. And that's particularly important for Amazon because they have this patented feature that does what? One-click shopping. All right, one-click, kind of bad, right? If you can buy a stock or buy a book or buy a TV with one click, you better be sure that it's the user who has clicked and they haven't been duped by one of these various techniques. And lastly, XSS. So this one we've talked about, but let's take a look at a concrete example. So this URL's unfortunately a little long and small, but demonstrates the idea. So let me zoom in. So suppose in this scenario you happen to click on a link that goes to vulnerable.com, literally, and foo equals and then some crazy looking script tag. 
Now, generally the script tag would not be written as a script tag, it would be something cryptic looking like this with percent signs, because, recall, PHP and other languages have a urlencode function that takes potentially dangerous characters and spaces and URL-encodes them using percent signs and numbers to make sure that it's all one string, with no spaces or breakages in it. But just so we can talk about it, the top is what the bottom translates to. But what does that do? It's a script tag inside of which is apparently some JavaScript code, and we didn't see document.location the other day, but document.location is a property inside of the document global object in JavaScript that if you assign a new URL to it, it redirects the browser to that URL. So we saw the redirect ability of PHP by sending the location header, the [inaudible] header. You can also redirect users in JavaScript by setting document.location or document.location.href, more specifically, to another URL and they'll be whisked away as soon as the code executes. But in this case the bad guy is doing something clever, he's sending the user to badguy.com, for clarity, /log.php, the idea being that he's going to log whatever he gets of this file. And the argument he's giving himself is cookie equals the [inaudible] of document.cookie. We also didn't talk about this, but it turns out that in JavaScript you can also access cookies; you can set them and get them, and they're all stored in an object called document.cookie. So inside this global document object there's another object called cookie and everything in there is -- all of the key value pairs you have for this website's cookies are stored there. So what's the implication? If it's a PHP based website you're visiting, there's at least one cookie involved [inaudible] sessions and that cookie is called what? [ Inaudible Speaker ] A PHPSESSID, right? That big capitalized word we keep seeing in the headers. 
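To see how the readable script tag turns into the cryptic percent-sign version, here is a small sketch using JavaScript's built-in encodeURIComponent, which plays a role similar to PHP's urlencode, though the two differ in a few characters, such as how spaces are written. The exact payload below is a reconstruction from the description above, not a copy of the slide.

```javascript
// Reconstructed from the lecture's description; the slide's exact text may differ.
const xssPayload =
  "<script>document.location='http://badguy.com/log.php?cookie='+document.cookie</script>";

// The link the bad guy distributes: the script tag rides along as the value of foo.
const xssLink = "http://vulnerable.com/?foo=" + encodeURIComponent(xssPayload);

console.log(xssLink); // angle brackets, slashes, colons, etc. appear as %XX codes

// Decoding the parameter gives back the original script tag, which is what
// the vulnerable server sees when it reads foo.
const decodedFoo = decodeURIComponent(xssLink.split("foo=")[1]);
console.log(decodedFoo === xssPayload); // true
```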
So PHPSESSID is going to be present in the memory of any browser that's visited a PHP website that has session_start having been called, where sessions are in use. So that means in document.cookie I have access to a user's session cookie. So what I'm doing here is constructing a URL, badguy.com/log.php cookie equals document.cookie, so that effectively the URL I'm going to be sending the user to is badguy.com/log.php?cookie equals one, two, three, four, five, six, seven, eight, whatever the big random number is that is the user's PHP session ID. So what does this mean? This is a very sophisticated way of sniffing someone's session cookie, not via WiFi, not via a wired Internet connection. In fact, it's doing it via JavaScript. So they could be anywhere in the world. They could be on an encrypted connection, on a VPN, they can have WPA2 installed, but it doesn't matter because JavaScript is as close to the user as you can get. The cookies are unencrypted at that point, they're stored inside of this container called document.cookie. So if I trick the user into executing some JavaScript code, that JavaScript code doesn't have to be something stupid, like a few lectures ago where I just said, alert hello, or alert annoy, or whatever it was, rather, it can be something dangerous like this. Because now badguy.com has in his log file somewhere, what? [ Silence ] [ Inaudible Speaker ] Exactly, someone's cookie value. Maybe it's Facebook, maybe it's Gmail, maybe it's Bank of America, something and now he can presumably take that value from his logs and use a special program that he wrote or downloaded and pretend that that is his own cookie, and now he's logged in as this random person. So in the end of the story -- by the end of the story, vulnerable.com has to be flawed, it has to be vulnerable by doing what? So step two describes it, but what do I mean by it makes the mistake of writing the value of foo to its body? Jack? [ Inaudible Speaker ] Exactly. Right? 
So just as we did this silly example a couple lectures ago where I tricked the browser into triggering an alert that said hi, or annoy me, or whatnot, if there's a similar form on vulnerable.com's website, and I have tricked it into pre-populating that form field with in this case a script tag, because that's what the bad guy has tried to trick me into providing as input, and you only called print or echo or used the equals sign operator, you did not use what function? [ Inaudible Speaker ] Yeah, htmlspecialchars, which escapes things with ampersands and entities and the like and keeps them from executing. Well, badguy.com is going to get your cookies in this case. So what's the key takeaway here? How do you protect yourself against XSS? Jack? [ Inaudible Speaker ] It really is as simple as that, right? Like, the alternative is, don't click links, but that's not going to happen, users are going to do that. And it doesn't even matter if they click, because the bad guy can just use an image tag to trick them. Don't trust user input. So that's a given, all right. So a key theme here with these latest attacks is, you should never trust that what the user is typing is going to be valid, and you should certainly encode it or escape it so that you're warding off these kinds of attacks. All right, because what if -- and again, just to emphasize one lesson from our JavaScript discussions, but this is why we have form validation. Why don't I just check that what the user is submitting doesn't contain the open bracket or the script tag? Why do I also still need to encode or escape all user input? [ Inaudible Speaker ] Exactly. [ Inaudible Speaker ] Exactly. 
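The output-escaping fix discussed above can be sketched in JavaScript as follows. The functions escapeHtml and renderSearchPage are hypothetical names, and escapeHtml is only a rough analogue of PHP's htmlspecialchars covering the five characters shown; it is meant to illustrate the idea, not to replace a vetted library.

```javascript
// Rough analogue of htmlspecialchars: turn markup characters into entities
// so the browser displays them as text instead of interpreting them.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#039;",
  };
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Hypothetical page that writes the value of foo into its body, escaped first.
function renderSearchPage(foo) {
  return "<p>You searched for: " + escapeHtml(foo) + "</p>";
}

const safePage = renderSearchPage("<script>alert('hi')</script>");
console.log(safePage);
// <p>You searched for: &lt;script&gt;alert(&#039;hi&#039;)&lt;/script&gt;</p>
```

Because this runs on the server at output time, it still applies when client-side validation has been switched off or bypassed.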
[Inaudible] validation is not sufficient because it can be so easily disabled, as I did the other day by just clicking something, you can turn it off for your entire browser by some preferences menu, usually, or you can just write your own software at a terminal window that pretends to be a browser and therefore doesn't have any JavaScript support whatsoever, it just makes HTTP requests. So there's a whole number of attacks that we explored today, and we've talked about things here and there over time. But ultimately the lesson really should be, never trust the user's input, and also consider the fact that at least one of your users is going to be some adversary or just a little bit curious as to how your site works, and you should never just expect that, you know, users are going to behave in the manner you intend. So what remains? Thus far we've been using [inaudible] appliance, you've been having probably one user bang on your website, or two, including your teaching fellow. But on Monday what we'll discuss is scalability and how you can actually take a website and not just tolerate dozens or hundreds or even thousands of users, but maybe tens of thousands. And what kinds of design decisions you can make even for the simplest of projects so that if you do have some happy coincidence of becoming popular overnight, as some websites these days have become, you at least have designed things in such a way that you can scale your website out without having to throw a lot of money at it, certainly, and also without having to rewrite all of your code. There's a number of decisions you'll be able to make up front so that, if you are so lucky as to have a problem of scalability, you'll be able to adapt to it. Why don't we adjourn here. I'll stick around for one on one questions, otherwise we have Section coming up, and I'll see you guys on Monday.