All right. Apologies for the delay. Welcome to Computer Science S-75, Building Dynamic Websites. My name is David. I'll be your instructor this summer. And it's a pretty brief summer. So we're going to dive right in tonight to some material, then we'll take a breath, look at the structure of the course, expectations thereof, and then conclude with some additional material. And along the way, please interject with any questions that you might have. But first some questions from me. So, you go ahead on the internet on your laptop or desktop. You pull up your favorite browser, you type in www.google.com and hit enter, what happens? Let's tell the story and we can be as high level or low level as we want and I'll steer us in both directions. So you've hit enter, what happens? Give me anything you got. Yeah? Well, first the request is sent through your modem to your internet service provider to wherever the Google website is stored and then the information is sent back. Oh good, OK. So that's the whole story. So that's very good. Let's tease it apart a little bit now. And I'll repeat some of the answers sometimes into the microphone so that our folks who are taking the course from afar can hear everything. So your computer makes a request through your modem, it goes to your ISP, reaches google.com's servers and they reply with a response. So good, now let's dive in deeper there, and let's focus on the act of hitting enter. Does someone want to propose, give me just one step, in more technical detail what happens next? And then we'll get to that same end point eventually. [ Inaudible Remark ] Perfect. So we first need to translate the name of the site, in this case www.google.com, into an IP address. And someone else, what is an IP address? Yeah. Well, it identifies like a server or? OK, good. So an IP address, it identifies a server or computer on the internet. And an IP address is simply a number of this form.
Let me go ahead and pull up a little scratch pad for notes here. So, an IP address, as you've probably seen, is something of the form w.x.y.z. And little internet trivia, each of these placeholders can be a digit from what to what? Or number from what to what? Sure. Zero to 255. Perfect. Zero to 255. And there are some restrictions on what numbers can be where but essentially you have number dot number dot number dot number, and each of those numbers can be again zero through 255. And if we really want to start pressing deeper here, how many bits are used to represent an entire IP address under this scheme? For those familiar with bits, 32. So why is that? Well for those less familiar, if you want to represent the numbers zero through 255, which is a total of 256 numbers, you need 8 bits, because 2 to the 8th is 256. But we won't go into too much detail along those lines. But if you've seen that IP addresses are just 32 bits, it is because each of these four numbers is 8 bits itself. So actually, let's go here, there won't be much math in this course after the following sentence really. But if you have 32 bits, how many possible IP addresses are there for the world's computers? Yeah. [ Inaudible Remark ] So it's 2 to the 32nd which is roughly? Those who are good with math in their heads? So it's roughly 4 billion. So that's a lot but these days most of you have laptops, most of you have desktops, most of you have telephones in your pockets or iPads or the like. So there are more and more devices these days that are consuming IP addresses. So if you follow the popular media of late you'll find that people have been freaking out that we're about to run out of IP addresses. But that's because we've been using version 4 for far too long.
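For those following along at home, the arithmetic just described can be sketched in a few lines of Python. This is only an illustration, not anything shown in lecture: the function name dotted_quad_to_int is made up, and 1.2.3.4 is a placeholder address rather than a real server.

```python
def dotted_quad_to_int(address: str) -> int:
    """Pack the four 0-255 numbers of a w.x.y.z address into one 32-bit integer."""
    parts = [int(p) for p in address.split(".")]
    if len(parts) != 4 or any(not 0 <= p <= 255 for p in parts):
        raise ValueError(f"not a valid w.x.y.z address: {address!r}")
    value = 0
    for part in parts:
        # Each of the four numbers occupies 8 bits, so shift left by 8 each time.
        value = (value << 8) | part
    return value


VALUES_PER_NUMBER = 2 ** 8       # 256 possible values: 0 through 255
TOTAL_IPV4_ADDRESSES = 2 ** 32   # 4,294,967,296, i.e. roughly 4 billion

# 1.2.3.4 is a placeholder address used purely for illustration.
example = dotted_quad_to_int("1.2.3.4")
```

The largest address, 255.255.255.255, packs to 2 to the 32nd minus 1, which is why the total count is 2 to the 32nd.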
Thankfully, version 6 has begun to get rolled out and version 6 will have 128-bit IP addresses, which is great because that's 2 to the 128th, which is huge, barely pronounceable, but it will also become a little more complex to write these things down. So we can squeeze a few more years of discussion out of these addresses but realize the world is transitioning. Now just for the sake of the experience for those at home. Let me actually pause here, just so we can plug in this recording device so we can capture to another format. So let's leave that as a cliffhanger for just a minute or two and I'll be right back. So, where did we leave off? You'd just hit enter, we had proposed that your computer had translated or needed to translate the host name www.google.com into an IP address and then we talked for a moment about various forms of IP addresses. So let's now push a little harder on how this translation happens. So Google has a numeric address of this form. And as an aside Google actually has probably a whole bunch of IP addresses of that form, all of which lead to the same experience but perhaps different servers. So how does your little Mac or PC or Linux computer know what the IP address of www.google.com actually is? It has to do with a domain name lookup, the DNS servers. OK. Good. So it has to do with domain name lookup using a DNS server. So for those unfamiliar DNS is Domain Name System. And this is an infrastructure on the internet that pretty much does exactly that. It converts domain names and host names to IP addresses and vice versa. And we'll see tonight that it does a few other things in terms of helping with rerouting of email, with validation of ownership of domains and the like. So there are these servers out there. Now your computer, your home probably doesn't have its own DNS server but probably Harvard does if you're on campus or Comcast does or Verizon or your company does.
Now if you're at a small college, for instance, and you're not visiting google.com but you're visiting somerandomwebsite.com, it's very possible that you are the first person on that campus to visit that website ever or at least in a long time. So what if your small little campus's DNS server has no idea what this IP address is? Are you sort of out of luck because you went to that school and not one where there are more people using that website? Or equivalently it's kind of a chicken and the egg problem. If you're the first person to ever need to visit that website and therefore your campus's DNS server has no idea what that mapping is, how do you solve this problem? Yeah. [ Inaudible Remark ] Exactly, so there's a hierarchy, thankfully, to the DNS system whereby even though you might have your own DNS server on campus or at your company, that doesn't necessarily store all possible domain names and IP addresses in the world. In fact, that would be quite a large database otherwise, and it's just not efficient to keep all of them around if they're not being accessed at all or very frequently. But your ISP knows some bigger fish and maybe that bigger fish knows an even bigger fish that has its own DNS servers that might know. But in the worst case, if no one along this hierarchy knows, there also exist in the world what are called root servers which are spread out geographically across several continents, and it's those root servers that essentially know who does know what the IP address is of somerandomwebsite.com. In other words, those root servers know who the authority is for instance for all of the dot coms in the world or all of the dot nets or the like so that you can have this initial request from little old your computer bubble up to these very high level servers and then bubble back down to some authority who does actually know.
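To make that bubbling up and bubbling back down a bit more concrete, here is a toy sketch in Python. It is a deliberately simplified model, not how real DNS software is written: the three dictionaries, the server labels such as tld-com and ns-somerandom, and the address 1.2.3.4 are all made up for illustration.

```python
# Toy model of the DNS hierarchy. Every name and address below is hypothetical.
ROOT = {"com": "tld-com", "net": "tld-net"}  # root: who is the authority for each TLD
TLD_SERVERS = {"tld-com": {"somerandomwebsite.com": "ns-somerandom"}}  # TLD: who knows each domain
AUTHORITATIVE = {"ns-somerandom": {"www.somerandomwebsite.com": "1.2.3.4"}}  # the server that actually knows


def resolve(hostname: str, cache: dict) -> tuple:
    """Return (ip_address, steps), where steps lists who had to be asked."""
    if hostname in cache:
        # Someone on campus asked recently, so the local server already knows.
        return cache[hostname], ["campus cache"]
    labels = hostname.split(".")
    tld = labels[-1]                    # e.g. "com"
    domain = ".".join(labels[-2:])      # e.g. "somerandomwebsite.com"
    steps = ["campus cache (miss)", "root"]
    tld_server = ROOT[tld]              # bubble up: the root knows who handles .com
    steps.append(tld_server)
    authority = TLD_SERVERS[tld_server][domain]  # bubble down: who is the authority
    steps.append(authority)
    ip_address = AUTHORITATIVE[authority][hostname]
    cache[hostname] = ip_address        # remember the answer for the next person
    return ip_address, steps
```

The first lookup on a campus walks the whole chain; the second one is answered from the campus cache, which is why only the first visitor pays the full cost.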
And the reason that that works is because when you go and buy your own domain name, which is a process we'll discuss in just a bit, you have to tell the world what the IP address is of your DNS server. So someone has to be informed proactively once really and only once when you buy the domain. So for now, let's come back to our story, we've hit enter, google.com was in my browser's window, my computer has somehow figured out that it is 1.2.3.4 or something like that. So now my computer puts together a message to send across the internet to google.com. What does that message look like? Well, in its simplest form it's a message that pretty much looks like this. It is literally the word GET in all caps, a space, a forward slash, if you're just requesting the root of the web server, demarked typically with slash. And then HTTP slash version number. Now in reality there are a few more headers, so to speak, HTTP headers that get sent from browser to server and we'll see those in action in just a bit. But this message captures really the most important aspect of the request. So your little computer creates a virtual envelope, more technically called a packet of some sort. Inside of that packet is a message like this. And on the front of that virtual envelope is a to address, namely 1.2.3.4 or whatever Google's IP address is. In the return field of this virtual envelope, you know, just like you were mailing something to a human, there is the return address, which should be whose IP address, probably? So your own IP address and your computer does know that if you have an internet connection. And then your computer sends it out on the internet. Now we can dive deeper and deeper and deeper but for now assume that your ISP has what's called a default gateway, a.k.a. router.
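The request message and the virtual envelope can be sketched roughly as follows. This is an illustrative model only: the names build_request and Envelope are invented here, 1.2.3.4 is the placeholder address from above, 5.6.7.8 is a made-up stand-in for your own address, and the Host line is just one example of the additional headers a browser sends.

```python
from dataclasses import dataclass


def build_request(path: str = "/", host: str = "www.google.com", version: str = "1.1") -> str:
    """Build a minimal HTTP request: the word GET, a space, the path, then HTTP/version."""
    lines = [
        f"GET {path} HTTP/{version}",
        f"Host: {host}",   # one example of the extra headers browsers include
        "",                # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)


@dataclass
class Envelope:
    """A toy stand-in for a packet: a to address, a return address, and a message."""
    to_address: str
    from_address: str
    message: str


# Both addresses are placeholders, not real machines.
envelope = Envelope(to_address="1.2.3.4", from_address="5.6.7.8", message=build_request())
```

Real packets carry more fields than this, but the to address, the return address, and the message inside are the parts that matter for the story.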
And routers are the computers on the internet that know how to get data from point A to point B or if they don't know precisely how to go from A to B, they know whom to pass it off to, who can then get it one step closer to point B. So in reality a packet, this virtual envelope might go from router to router to router to router, sometimes as many as 30 different routers across the globe until finally it gets to its actual destination, google.com. Google receives this virtual envelope, sees that it's for its IP address, opens the envelope up, sees this message. google.com's server happens to be running a web server and so that web server looks for the file called slash. Now slash is typically a synonym for an actual filename like index.html or index.php or any number of other default standard filenames. So Google grabs that file from its hard drive and then puts it in its own virtual envelope, flips the two IP addresses, the from and the to, sends it back to the internet via these routers. It arrives on my computer. My computer, unbeknownst to me, opens this envelope, sees a whole bunch of language called HTML, renders that HTML top to bottom and I see the search page for Google's main site. Yeah? What is the function of the slash? What is the function of the slash? So whenever you type in a URL, there are several different components to it. Http typically followed by :// followed by something like this, and so this is let's say a representative URL. But we can actually tease this apart into a few components. This is the protocol or scheme at the beginning. Even though in a browser we almost always use http://. Have folks seen others? Yeah? HTTPS. HTTPS similar but different in that it uses cryptography, a topic we'll come back to. Yeah? Ftp://. Ftp://, sftp://, webcal://, some of these are more standardized than others but the scheme is typically an indicator to some piece of software how it should view the contents at that address.
So what comes after the ://? You typically have something called a host name or subdomain name followed by the domain name which in this case is google.com, or followed more precisely by a domain name with a TLD, top-level domain, a .com, .edu, .gov, .uk would be the TLD. And then you have what we call a path. And a path specifies exactly what file or folder you want to access. So a single slash means get me the root of my hard drive and if you come from the Windows world, this is essentially equivalent to c: or on a Mac it's equivalent to that or on a Linux computer it's equivalent to that. So that is truly the root of your hard drive, the folder in which everything else on your hard drive lives. Now, it turns out in a browser these days, you don't have to type most of that. You can omit the HTTP, you can typically omit the www, you can omit the slash and things just work. Why is that? Well, for the most part, it's because browsers have just gotten a lot more user friendly, right. There was a time, a few years ago where advertisements in print and on TV would actually have http:// but then the world kind of realized that you know, anytime you see www something, it's probably a website so we started omitting http://. Now the world has gotten acclimated to any mention of .com or .gov so we don't even really need the www anymore. And so, whether or not www works or doesn't work is actually completely configurable by the system administrators of a website. And in fact, I don't have a soapbox to hop on right now but invariably during a semester, I'll come across some website for which foo.com or whatever their domain is .com just doesn't work, you have to type in www.something.com and that's just a foolish technical design decision on their part. We'll talk today about how you can configure things to just work.
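The pieces of a URL named above can be teased apart with Python's standard urllib.parse module. This is a small sketch rather than a robust parser: the function name describe_url is made up, and taking the last two labels as the domain is a naive assumption that fails for TLDs such as .co.uk.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Split a URL into the components discussed: scheme, host, TLD, domain, path."""
    parts = urlsplit(url)
    labels = parts.hostname.split(".")
    return {
        "scheme": parts.scheme,            # e.g. "http" or "https"
        "host": parts.hostname,            # e.g. "www.google.com"
        "tld": labels[-1],                 # e.g. "com"
        "domain": ".".join(labels[-2:]),   # naive: last two labels only
        "path": parts.path or "/",         # an omitted path is treated as the root
    }


info = describe_url("http://www.google.com/")
```

Treating an empty path as a single slash mirrors what the browser does when you leave the trailing slash off.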
And it involves a bit of DNS, a bit of web server configuration but typically, you don't see that dead end because browsers these days, if you type in foo.com and hit enter and there is no foo.com IP address out there, the browser will presumptuously or helpfully prepend www to the start of the address and then retry that one. Some browsers, if you just type foo, will automatically try .com, .net, .gov, some of the most popular ones. So in short, a lot of the technical processes that are happening are being sort of hidden now by browser user friendliness for better or for worse. So, the story began with hitting enter, the story ended with your seeing the homepage of Google. Any questions on the various steps in between, whether high level or low level? Right. So that's the story told from the perspective of a user. Why don't we tell the story from the perspective now of someone who owns a website or wants to operate a website? So suppose one of your goals in this class or some other is to actually have your own presence on the web. When you actually buy your own domain name and have your own business or personal home page or whatever the case may be, how do you go about doing that? You need more than just a laptop and a browser, now you need a server on the internet because even though every computer on the internet, your laptop included, has an IP address, it's not necessarily publicly accessible. And even that statement's a bit of an oversimplification. You do not necessarily have a public IP address. In fact if you go home and you have internet access at home, especially wireless, you probably have a home router like an Apple AirPort Extreme or you have a Linksys router or some device with antennas that gives you wireless internet access. But Comcast or Verizon or whoever you're paying each month gives you internet access into the house via your cable modem or DSL modem, which in turn is probably connected to that router.
If it's not one and the same device, which some of the ISPs provide, these all-in-one devices these days, odds are you have one IP address. And if you have three brothers and sisters or parents or grandkids in the house, all of you are sharing that one IP address. And yet the individual computers in the home still need an IP address so what actually is the case is that when you're in a home network, you have what's called generally a private IP address. Something of the form, anyone know what a popular internal IP address is? [ Inaudible Remark ] Yeah. Exactly. Anything in fact starting with 192.168 dot something dot something is a private IP address. So the folks who invented the internet along the way decided, "You know what? We should probably have some IP addresses that should never be given out so that within a company or a home or a little test network, you can have IP addresses that are guaranteed not to exist on the public internet." So what home routers typically use is 192.168.0 or .1 and then the last number can be again, between zero and 255. But with some exceptions: it can't be zero, can't usually be 255, so there are some constraints but it gives you roughly 250 or so possible IP addresses. If you don't like that, there's 172.16 dot something dot something. There are a few more constraints on this one, but then if you really need a lot of internal IP addresses, you can have what's called a class A private network, 10 dot anything is a private address. And this actually gives you millions of IP addresses for your home or your business or your data center.
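Those three private ranges can be checked with Python's standard ipaddress module. This is a sketch for illustration: the function name is_private is made up, and the exact prefix lengths (/8, /12 and /16) come from the published private address ranges rather than from anything written on the scratch pad tonight.

```python
import ipaddress

# The three private ranges: 10 dot anything, 172.16 through 172.31, and 192.168.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_private(address: str) -> bool:
    """Return True if the address falls inside one of the three private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in block for block in PRIVATE_BLOCKS)


# A typical home network such as 192.168.1.x: zero and 255 are reserved,
# which leaves 254 usable addresses, the "roughly 250 or so" mentioned above.
home_network = ipaddress.ip_network("192.168.1.0/24")
usable_home_addresses = len(list(home_network.hosts()))

# The 10 dot anything range really does hold millions of addresses.
class_a_size = PRIVATE_BLOCKS[0].num_addresses
```

The "few more constraints" on the 172.16 range show up here as the /12: only 172.16 through 172.31 is private, so 172.32 dot something is an ordinary public address.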
But in short, any IP addresses beginning with these and a few other prefixes are considered private but the problem then is that even if after this class you know HTML and CSS all the better, you know PHP and SQL and JavaScript and you create a website and you run it on your laptop using software we'll introduce you to, a web server called Apache, no one in the world is going to be able to visit your website because your address probably starts with one of these prefixes and your home router or cable modem or DSL modem is not going to let outside random people into your home network to access this IP address because frankly, there are tens of thousands of people who probably have that exact same private IP address so it's just not uniquely identifiable. And because your home router and your cable modem is sometimes a firewall unto itself, this traffic is not going to get into your home. So in short, that won't work but you have at least two options, two alternatives. How can you get your websites out on the internet? Well, if you're still trying to leave it on your own network, you port forward to your own private IP address from your public IP address. You can. Port forwarding. So let's go there, for those unfamiliar. When you use a protocol like HTTP, you're actually using other protocols behind the scenes and in fact you've probably at least heard the buzzword TCP/IP, Transmission Control Protocol/Internet Protocol. It's actually two protocols. Two different standards or languages, so to speak, that govern how data can be transmitted on the internet. And this is a bit of an oversimplification, but for today's purposes assume that IP, the internet protocol, is just a set of conventions that humans came up with years ago that govern how you associate numeric addresses with computers. So IP address derives from this protocol. So IP is just the standard for assigning computers addresses.
However, just assigning someone an address doesn't mean you can get data to that address. For that you need another standard, another protocol and that's typically TCP, Transmission Control Protocol. So TCP is the standard that web browsers and web servers speak in order to actually physically move data or electronically move data from point A to point B using the higher level notion of an IP address to actually uniquely identify points A and point B. So for those who might want to go further in computer science and in networking in particular, there's typically what's called the TCP/IP stack. And so there's topics like there's the transport layer down here. There's the IP or addressing layer here, there's the application layer. In short, much of the internet is the result of smart people having designed things and then designed things on top of things on top of things and so we just typically oversimplify and say TCP/IP. So what's the point there? TCP/IP allows not just the web to work but all sorts of applications. There's the web, there's email, there's instant messaging, there's, I mean what else, there's things like Spotify, there's dedicated applications that are using the internet but aren't necessarily inside of a browser. So a server can actually do multiple things. It can receive email like Gmail can. It can be a website and get HTTP traffic. So a server, because it can do multiple things, somehow needs to be able to uniquely identify the various things that it can do. And so the world introduced this notion of port numbers. And typically for a web server, or rather for HTTP, it uses this protocol TCP and the world decided some years ago the number 80 will arbitrarily but consistently identify this service. So if you have a server and you have a website. And a website uses, as you probably know, HTTP but we'll look at what that means in a bit. It is running, so to speak, on port 80. It is listening, so to speak, on port 80.
And the motivation for that is because you might also have an email server on the same physical box, right? Gmail, it's kind of an oversimplification but they are both a website and an email service. And if you want to be able to send email to Gmail, you can also use TCP but you have to use port 25. In other words, if you go to Gmail.com with a browser, you obviously want a webpage back. So even though you, the human, haven't typed 80, it's automatically inserted for you by your browser behind the scenes. But if you sent an email from Eudora or Apple Mail or Outlook, or whatever you're using, you again probably don't have to care about this detail but that program is going to send data still to Gmail.com but specifically to port 25. So when a computer is on the internet, a server, and it's listening for traffic, all of that traffic comes in on a specific port, a specific pathway into the server so that it knows if it's a webpage or an email, right? Because especially email. Emails can contain HTML now so you need some way of distinguishing the two fundamentally. So when you proposed port forwarding, what does this mean? Well, if your home network has a public IP address and you usually, again, get one from your ISP, and that is some address of the form w.x.y.z, and you, your individual laptop on which you've created your final project that you want to make publicly available, is at one of these IP addresses, doesn't really matter what it is. What you can do is configure your home router, a.k.a. firewall, a.k.a. cable modem, it depends on what make and model you have but that device, you can configure it to say, any internet traffic that comes from the internet to my home on my public IP address destined for port 80 should be "port forwarded" to IP address 192.168 dot something port 80. In other words, you can tell this machine to take incoming data on that port and then route it very specifically to this computer, yours, so that it just works.
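Conceptually, the rule you type into the router is just a row in a lookup table. Here is a toy sketch of that idea in Python. It does not configure any real router: the table names and the forward function are invented, and 192.168.1.10 is a made-up private address standing in for your laptop.

```python
from typing import Optional, Tuple

# The two conventional port numbers mentioned so far.
WELL_KNOWN_PORTS = {"http": 80, "smtp": 25}

# Hypothetical forwarding table inside a home router:
# public port -> (private IP address, private port).
FORWARDING_RULES = {
    80: ("192.168.1.10", 80),   # made-up private address of the laptop running the site
}


def forward(public_port: int) -> Optional[Tuple[str, int]]:
    """Return where traffic arriving on public_port is sent, or None if it is dropped."""
    return FORWARDING_RULES.get(public_port)


web_target = forward(WELL_KNOWN_PORTS["http"])    # forwarded to the laptop
mail_target = forward(WELL_KNOWN_PORTS["smtp"])   # no rule, so nothing gets in
```

Because the table is keyed by public port, each port can point at only one machine inside the house.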
Now there is one gotcha here, especially if you have siblings for instance or other technically minded family members or roommates. If you're doing port forwarding in this way, only one of you can operate a web server behind your cable modem. Because you only have one IP address to uniquely identify your website and if you've already claimed 80 as your own and that's the default for the world's browsers to use, pretty much only your web server can be accessed. Now there is a work around here. If your roommate is really ticked off at you, you can say, "Fine, fine, fine, I will give you port 81." But what does that mean, that means the entire world has to type out a URL like let's say your address was indeed w.x.y.z, this would be your address, your URL. Your roommate's, unfortunately, would be this crazy looking thing. Right? Or any other number really. Now there are some restrictions on the numbers. You just probably can't use 81, but the point is the same. This is not standard and you probably don't want your users having to remember such an esoteric detail as an arbitrary number. However, if on the internet you visit any website with colon 80, odds are you will get to the website with which you're familiar. It's just the browser is again for user convenience inserting the port number automatically for you. And little trivia for HTTPS, the secure version of HTTP, what port number does that use? 443. 443. And you sometimes do see that in URLs. You also see some other ports commonly like 8080. 8080 is just kind of an arbitrary popular port that some companies have used to run certain services. But in short, using anything nonstandard these days especially for commercial production websites where you're trying to make money or trying to stay online 100% of the time, using nonstandard ports is bad. Because there are certain companies, there are certain campuses that will pretty much block any port besides 80 and 443. But thankfully there's a work around.
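As a quick aside, the way a browser fills in the port for you can be sketched like this in Python. It is only an illustration: the function name effective_port is made up, www.example.com is a placeholder host, and the table covers just the two schemes discussed so far, so any other scheme raises a KeyError.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes discussed: HTTP and HTTPS.
DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the port a browser would connect to for this URL."""
    parts = urlsplit(url)
    if parts.port is not None:
        return parts.port              # the user typed :81, :8080, and so on
    return DEFAULT_PORTS[parts.scheme]  # otherwise the scheme's default is inserted


your_port = effective_port("http://www.example.com/")         # nothing typed, so 80
roommate_port = effective_port("http://www.example.com:81/")  # the crazy looking one
```

Typing colon 80 explicitly gives the same answer as typing nothing at all, which is why visiting a site with colon 80 usually just works.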
Even if you want to run some random server like a BitTorrent server or something like that, all you have to do is change the port number to be 80 or 443. So the reality is with firewalling, and we'll have this conversation toward the end of the semester when we talk about security more generally, a lot of security mechanisms are kind of a joke because all you need is a modicum of savvy or, you know, having listened to the past 30 seconds of words that I just said and you can circumvent these kinds of restrictions. Hotels do this a lot, Starbucks does this a lot. But port numbers are really just this very basic mechanism and the world has adopted some standards. All right. So, perfect, we have a solution. All you have to do is somehow figure out how to download the manual for your Linksys router or Apple AirPort and you can configure all this port forwarding stuff and run a website from your home. So not quite, right? Because if you actually have a popular website, Verizon and Comcast might very well notice and just shut you off entirely because that huge disclosure agreement you probably clicked through and never read when you signed up for internet service probably said you may not run a website on your home computer. So plus this was a pain in the neck to do anyway. Plus I unplug my laptop sometimes and so my website is going to go down anytime I go out. So not the best solution even if you have a desktop. So let's at least try to push a little harder and assume that we need to outsource this problem or we at least need to put a computer on the internet itself in a data center, on a campus where it can stay plugged in perpetually, under your desk at work, if the sys admins allow it. And moreover, I don't want my website to live at w.x.y.z or any number for that matter. I want it to live at david.com or some URL that is sort of distinctly my brand or my name. So, that begs the question, how do you go about getting your own domain name?
Has anyone done this before? Yeah, how do you do it? I purchase them. OK. Where do you purchase them? I got mine at Namecheap. OK so Namecheap.com is a very popular place, fairly inexpensive. GoDaddy is another very popular place. This one is kind of riddled with upsell attempts, trying to get you to buy everything and the kitchen sink. But you don't need to do that. There's all sorts of domain name registrars out there these days. A bunch of years ago, Network Solutions was the only one. But then a market was created and so there's lots of places to buy domain names. For the most part it doesn't matter where you buy your domain name from. But you do sometimes get different features. In particular, you get DNS features sometimes, more control over your DNS servers. They might throw in free email accounts, free hosting but for the most part it doesn't matter a huge amount, in particular you don't need to go to someone like Network Solutions and pay $30 a year when you could go to someone like GoDaddy and pay 9.99 a year or Namecheap and pay 4.99 a year. So in short, just paying more for a domain name isn't necessarily giving you anything more in the way of functionality. It depends on what maybe the add-ons are. So, how do we go about doing this? Well, let's go to something like GoDaddy. GoDaddy is kind of a-- Well, let's actually try Namecheap. Let's go to Namecheap and see what they look like. Namecheap. Bunch of my friends have indeed used these websites. All right. So let's see, domain name to search. I'm going to search for david.com, probably taken. Oh, that is a good price. We're already doing better than GoDaddy. All right, so as I expected, it is taken, as are almost all forms of David. They've suggested I name myself David John, David Smith, David Johnson, King David, David photography dot US.
So, one of the hardest things frankly of starting a business these days is finding an available domain name, let alone your own personal vanity domain names for people's names. But if we found something we liked, maybe I do want DavidTV dot-- well, that's atrocious, $6000 for this domain but if it's not yet taken it's probably one of the cheaper ones up above. So let's assume we found something we're happy with. So we add it to our cart and we check out. I now own some domain name, David something dot com. So what now do I do with it? How do I associate it with my web server, and for that matter, how do we get a web server? Let's assume I have a web server and we'll cross that bridge in a moment. But I have a domain name, what do I need to do with it to start using it? Well, I need to tell the world what my IP address is. So I need to somehow tell the world that my server, I don't know who's going to be hosting it but I know it will have an IP address by nature of how the web works. So let's assume I know the IP address is going to be w.x.y.z. I somehow have to inform the whole world that david.com's IP address is w.x.y.z. So one of the things I'll have to do at Namecheap.com or GoDaddy or networksolutions.com is tell the registrar not what my own computer's IP address will be but rather what the IP address of my domain name's DNS servers will be. And the convention is typically that every domain name in the world should have two DNS servers, primary and secondary, so a main one and a backup one. They can be one and the same but the world really pushes people to having at least two for the sake of uptime and redundancy. So I need to know not my own IP address per se but I need to know the IP address of one and then a second DNS server. Now I don't have my own DNS servers and I don't want to go have to configure two more computers in addition to my web server. So this is where web hosting companies come in.
So in addition to buying the domain name, I also want to host my website somewhere and it could very well be the same exact company. It could be GoDaddy, it could be Namecheap depending on the service that they provide. But we need to have a web hosting option. So what's a web host going to give us? A web host is going to give us a hard drive to put my files on, you know, maybe not a hard drive per se, but some illusion of storage space. They are going to have their own connections to the internet, this web hosting company. They are hopefully going to have a pool of IP addresses so that I can have at least one of them. They're also going to have some RAM. They're also going to have technical support staff. In short, they're going to have a server and all of the things necessary to keep a server alive on the internet. And hopefully, they're also going to have at least two of what? DNS servers. So suppose I decide to host my website at, let's say, DreamHost.com. This is a very popular sort of el cheapo kind of web hosting company that I've used myself in the past, it's like 6.95 or 8.95 a month, so that's pretty good, but again, you get what you pay for. I wouldn't necessarily build a big business on it. So for 8.95 a month, I have the ability to upload my HTML and CSS files and soon PHP and JavaScript files to their server. Their server has nearby two DNS servers, each of which has its own IP address. So once I know what DreamHost's IP addresses are for its name servers, I tell Namecheap, or GoDaddy or wherever I bought my domain name, and that's it. The only time I have to talk to my registrar again most likely is in a year when they charge me another 5.99 or $99 for my domain name. Unfortunately, you're not really buying, you're really renting your domain name from these registrars.
Now, there's a whole bunch more involved in setting up the web server and getting my files there, but at least now I've told the world that if you want to know where david.com is, ask these people, these two IP addresses of the name servers, either one. And those DNS servers should hopefully know. Why? Because so long as I keep paying DreamHost or someone else 8.95 per month, they will ensure that both of those DNS servers know what my own website's IP address is. And how will they know? Because what I'm paying for is some storage space and some internet connectivity on one of their servers. One of their servers has an IP address, so they just tell their DNS servers that david.com's IP address is whatever the IP address is of the server they've told me to put my content on. And we'll actually look in a little more detail at what's involved in that. But any questions? So, in answer to the somewhat frequent problem where a website does work at www.something.com but not at something.com, how do you fix something like that? There are usually two pieces to the solution. One, you have to make sure that there's a DNS record for something.com, that is, there's an IP address associated with it in addition to one being associated with www.something.com. And two, you need to configure the web server to accept requests for either something.com or www.something.com. But really, let's focus on just this DNS piece for now. So DNS. Turns out DNS is relatively straightforward, and once you start operating a whole bunch of services on your own website, maybe you have an email server, maybe you want to use hosted services like Google Calendar, maybe Google Docs, you can do things like-- actually for CS-75, for this course, the TFs and I use Gmail essentially to host cs75.net's email. So that's the website, as I'll soon reveal, if you haven't pulled up the website. And we want to be able to have an email list so that each of us can email everyone else very easily. 
So we want email addresses of the form mailing@cs75.net. Now, how do we do this? Well, we could set up a mail server, we could pay someone to do this, but an amazing service out there is Google Apps, with which some of you might be familiar, and for small fish like us, where we only have a few people on staff, you can actually have hosted Gmail, hosted Google Calendar, hosted Google Documents for I think 20 or fewer people for free, and what you do is you configure your own DNS servers to map something like mail.cs75.net to essentially Gmail.com. So that whenever we send an email to something of the form mail.cs75.net, it figures out via DNS to actually go to Google. We could have calendar.cs75.net and you hit enter, you actually end up at Google Calendar, but our copy of Google Calendar. And this is all thanks to DNS. And there are only a few settings with which you need to be familiar, and we already talked about this one. An NS record is a record in a DNS server that tells the world which name servers are responsible for that domain. So, what's inside a DNS server? Frankly, it's a database, and you can think of it as like a database with Excel files, so spreadsheets that just have rows and columns. And those columns essentially represent-- well, in each row rather, you would have for instance a domain name and an IP address. Domain name, IP address, domain name, IP address. That's really all that's underneath the hood in a DNS server, at least so far as we're concerned. But there are different types of rows. So one of those rows can be an official record that says the name server, NS, for this domain is whatever IP address DreamHost gave me for instance. Now, what else can I have? Well, there's an A record. So an A record, a row of type A in the spreadsheet of sorts, is literally domain name to IP address. It's as simple as that. So if I have something.com and its IP address should be w.x.y.z, that's what's called an A record. 
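To make the spreadsheet picture concrete, here is a minimal sketch in Python of a DNS server's table as rows of name, type and value. Everything in it is an assumption for illustration: the host names, the addresses (taken from ranges reserved for documentation) and the `lookup` helper are hypothetical, not anything a real registrar or DNS server exposes. One simplification is worth flagging: in real DNS, the value of an NS row is the name of a name server, whose address is then looked up in turn, whereas the lecture speaks of it loosely as an IP address.

```python
# A toy "spreadsheet" of DNS records: each row is (name, type, value).
# All names and addresses are made up for illustration; 192.0.2.x is an
# address range reserved for documentation.
RECORDS = [
    ("something.com", "NS", "ns1.hosting-company.example"),
    ("something.com", "NS", "ns2.hosting-company.example"),
    ("something.com", "A", "192.0.2.10"),
    ("www.something.com", "A", "192.0.2.10"),
    ("mail.something.com", "A", "192.0.2.25"),
]


def lookup(name, record_type):
    """Return every value stored for this name and record type."""
    return [value for (n, t, value) in RECORDS
            if n == name.lower() and t == record_type]


print(lookup("something.com", "NS"))      # the two name servers
print(lookup("www.something.com", "A"))   # the address of the web server
```

Note that something.com and www.something.com each need their own A row here; a name with no row simply comes back empty.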
And I can also have mail.something.com or calendar.something.com and I can associate each with an IP address. And how do I do this? It totally depends on your registrar or on your DNS provider, whether it's DreamHost or GoDaddy or the like. But these days, it's usually a web interface. Back in the day, it was a command line, you'd edit a text file on a server, but these days it's been made to be more user friendly. But it's essentially a spreadsheet. Now, there are two slightly fancier features. A CNAME or canonical name is an alias. So it turns out with a lot of these web services like Google Apps, where Google is providing the service, you don't necessarily want to have to know what Google's IP address is, right? Because one, you probably don't know anyone who works there, and so you can't really ask them. Now, frankly, you can run a command and figure it out, but if you hardcode into your DNS server the IP address of google.com, the implication is that if they ever need to change their IP address, which happens, not every day, but you know, every few months, a few years, for whatever technical reasons, now your website goes down. It would kind of be better, at least conceptually, if calendar.something.com didn't resolve to Google's IP address, but rather, what if calendar.something.com could instead resolve more generically to calendar.google.com. So don't have your domain name mapped to an IP address, have your domain name mapped to someone else's domain name, and then let their DNS server tell the world what the current IP address is of calendar.google.com. So in other words, if you want this layer of abstraction, where you don't care what the IP address is, you just care that your domain name be a synonym for someone else's domain name, then you use a CNAME record. And what the two columns look like are domain name, domain name instead of domain name, IP address. 
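The layer of abstraction a CNAME gives you can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any real resolver is written: the two dictionaries stand in for two different DNS servers, the address is a made-up one from a documentation range, and `resolve` is a hypothetical helper.

```python
# Hypothetical records; the names and the address are illustrative only.
CNAME = {"calendar.something.com": "calendar.google.com"}   # name -> name
A = {"calendar.google.com": "203.0.113.7"}                  # name -> IP


def resolve(name, max_hops=8):
    """Follow aliases until a name with an address is reached."""
    for _ in range(max_hops):
        if name in A:
            return A[name]
        if name not in CNAME:
            return None          # no alias and no address: unknown name
        name = CNAME[name]       # swap the alias for the name it points to
    return None                  # too many aliases; give up rather than loop


print(resolve("calendar.something.com"))
```

If the provider later changes the address in its own A row, the alias keeps working without any edit on your side, which is the point of mapping a name to a name.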
So it's a wonderfully useful feature, especially these days, if you look into hosted solutions, not just like Google Apps, but companies that have services like customer service forums. If you go to a website, they'll often have an address like support.dell.com or the like, or there are a lot of companies these days that provide customer service websites, but it would look kind of lame if I go to dell.com and I get redirected to customerservice.com. Dell would rather rebrand someone else's service to look like Dell even though someone else implemented it and is hosting it. So via a CNAME, someone like Dell could say support.dell.com should actually resolve to customerservice.com, but the user should never know that because the URL stays support.dell.com. So that's just one of the things you can do with these things called CNAMEs. And lastly, an MX record is a mail exchange record. And a mail exchange record simply states what the IP address is of the server or servers that should handle inbound mail for this domain name. And this is great, because when you use Eudora or Gmail or Outlook and you type in deardavidmalin@harvard.edu and hit enter, similarly there, you have no idea what Harvard's IP address is, but your computer does. But it's not the IP address of harvard.edu per se that your email client needs, it's the IP address of Harvard's mail server. So thanks to DNS, your Mac or PC can ask your ISP's DNS server or dot dot dot, this whole hierarchy we discussed earlier, what is the MX record for harvard.edu? And harvard.edu's name server should be able to say "send all mail to this IP address." 
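Here is a small Python sketch of the question a mail program asks: given an address, which servers take mail for its domain? The table and the `mail_servers` helper are hypothetical. Two details go slightly beyond the simplified description: in real DNS an MX row names a mail server by host name rather than by IP address, and each row carries a preference number, with lower numbers tried first.

```python
# Hypothetical MX rows for one domain: (preference, mail server name).
MX = {
    "something.com": [(20, "backup-mail.something.com"),
                      (10, "mail.something.com")],
}


def mail_servers(address):
    """Return the mail servers for an address's domain, best first."""
    domain = address.rsplit("@", 1)[1].lower()      # text after the last @
    return [host for _, host in sorted(MX.get(domain, []))]


print(mail_servers("someone@something.com"))
```

A sender would try the first server in that list and fall back to the next one only if the first is unreachable.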
And what's nice about MX records is you can have multiple ones with priorities, so for websites, or rather domains, that have very large numbers of users, where you really don't want their mail servers going down, you can have 2 or 3 or 10 different mail servers, and the DNS system will say, try this one, then this one, then this one, then this one, just in case any of those go offline. And it's all thanks to DNS. And while we sort of take all of this for granted, once you start developing your own websites, maybe creating your own companies or contributing back to your own school, having these abilities is wonderfully powerful, and it really boils down to these basics. Any questions? No? All right. So that was kind of long. Why don't we take a three or so minute break here? There are restrooms in the hallway, there are soda machines, I think, around the corner, and we'll rejoin in about three minutes. All right. We are back. So why don't we take a look at the course itself and what you are in for and what the course's expectations are. So, in terms of prerequisites, the official prerequisites are these: multiple years of programming experience as well as comfort with HTML and CSS. So, what does this mean in real terms? So, summer school is very short. It's just about six weeks, and the course has three nontrivially sized projects, and the goal really is to make sure that at the end of the short semester, you feel quite comfortable going off and doing much more on your own in the way of website development, not just HTML and CSS in the form of static websites but truly dynamic websites that are driven by languages like PHP and JavaScript, backed by a database like MySQL. So it's a fairly intense course. 
If you've only taken something like Computer Science S-1, the introductory computer science class, or just one or two courses, I will say from past experience, you'll probably find the course challenging to say the least, and typically we estimate about 30 hours per project. So, there are three projects, so you have roughly nine days for each of them. It's about 30 hours each, but that's on average, so for students for whom programming is a little less familiar, or it's been a bunch of years since you programmed, or you've only taken one or two introductory courses but don't really think of yourself as a programmer, the course is definitely more challenging. So, do be aware diving in. I will say if you're on that fence and not sure if your comfort level and background are there, you can go to CS75.tv, which is the OpenCourseWare site for the course, where we have several previous semesters' worth of lecture videos, handouts, and projects, some of which will be similar to this summer's. So by looking to the past, you can perhaps infer what the summer will be like and get a sense then of whether the PDFs of past years' projects completely overwhelm you or completely excite you. So I would try to use that as an additional input tonight before deciding whether this is the course for you. In terms of expectations, there are these three projects and attending, or watching if distant or unable to make it in person, the lectures. The lectures will be structured as follows. So, tonight our focus is on HTTP and sort of the mechanics, the underlying fundamentals of the internet that for years you've probably taken for granted, but once you really start building your own websites and having to negotiate things like configurations of servers and code and databases, you need them, so tonight we'll start looking at some of those more technical details. On Wednesday and next week, we'll look at PHP itself. 
So, one of PHP's most compelling features these days is, one, it's syntactically very similar to languages with which many of the folks in this room and at home are familiar. Syntactically, it's similar to C and C++ and other procedural languages. It's very much in vogue, pretty popular, it's pretty omnipresent these days in terms of the web hosting companies that are out there, and it's super easy to get set up. Even Mac OS comes with PHP and Apache, the web server, preinstalled, and there are packages for Windows and Linux and other computers that make it super simple to get set up. In terms of related languages, Python and Ruby are probably the two closest contenders in terms of popularity with PHP. And none of these is necessarily better than the others. It could very quickly devolve into religious debates, but one of the nice things about PHP is again the omnipresence of support for it out there, and also I think pedagogically the documentation for PHP is outstanding, and as you'll see, the PHP.net online reference manual for functions and whatnot is rich with examples and intelligent discussions, and so we've just found that it's a very nice way of diving in deeper to web programming. And from a course like this, you should be able to continue on, if you haven't come from that direction already, to the likes of Python and Ruby, JSP for the Java world, ASP for the Windows world. There are a lot of commonalities among them. We'll transition in lecture three to looking at XML. So, when it comes time to actually store data, whether statically or dynamically, you don't need a full-fledged database. 
You don't need MySQL, you don't need Oracle, you don't need anything along those lines, you could just use text files, but it would be nice if it's easy to read and write those text files. So XML is a very popular language of sorts, a metalanguage, with which to write out textual files, and it's representative more generally of a topic we'll come back to in our JavaScript lectures on the document object model, so there'll be some commonalities there. SQL, Structured Query Language, is what's used by many relational databases these days, among them MySQL, Oracle, Postgres and others. Also in vogue these days are NoSQL servers, document storage engines, which we'll look at later in the semester as well. But we'll primarily use MySQL for the course's projects. We'll look in Lectures 6 and 7 at JavaScript and the more general technique of Ajax, the ability to use JavaScript to query a server even after a page is loaded to get back more data. For instance, Google Maps does this to get more squares of mapping information when you click and drag. Facebook does this to push live updates from your News Feed and the like. We'll look toward the end of the semester then at some higher level concepts like security, which we'll interlace throughout the semester, but we'll really focus on it in Lecture 8, looking at common attacks on web servers, on websites, on databases, so as to not necessarily acquaint you with everything that can go wrong but to at least plant the seeds in your mind of things you should be thinking about. Indeed, there is so much code out there that is just vulnerable because people don't think to do things like sanitize user input, that is, they don't check it for dangerous characters. So, we'll talk about things like SQL injection attacks. We'll talk about cross-site scripting attacks and any number of other attacks that are so darn easy to avoid, yet many people just don't realize it or don't know how, even though typically simple little function calls can fix them. 
And then in the last lecture, we'll look at scalability. So, it would be a great problem to have if you've got so much traffic that, despite all of the lessons you learned from lectures 0 through 8, your website starts crumbling under the load. And so we'll conclude the semester by looking at, OK, now you have to build not for a few dozen people, a few hundred people at your school, but several thousand or maybe even several thousand people per second. How do you actually scale from one little web server to a bigger one? But then once you have the biggest and most expensive available web server, what do you do? You start to scale, as they say, horizontally. So, you get multiple servers, maybe even cheaper, slower web servers, but you somehow figure out how to balance, load balance so to speak, traffic across them. How do you do that with databases? How do you do that geographically? How do you do that with cloud computing? A buzzword that's all the rage these days but has some very interesting technologies underlying it. We'll wrap up the semester looking at those bigger picture issues. In addition to lectures, we will have, most weeks, sections and office hours. So the course has four teaching fellows, folks who have either taught or taken the course before, who'll be with us in the form of sections and office hours. Sections will be a slightly more intimate opportunity, on Wednesdays and Mondays typically right after lecture, if you'd like to stick around to dive a little deeper into that week's project. So in addition to the PDF specification you'll get of a project, one of the TFs will walk you through the week's project, so they'll offer some design tips, some helpful direction, and clear up any confusion. If I do something quite poorly in lecture, we can revisit those kinds of topics in sections so that you get another perspective altogether. And then there are office hours, which are meant to simply follow section once section officially wraps after an hour. 
Office hours will be an opportunity for one-on-one Q&A with one or two of the TFs, and this will be an opportunity in particular for questions about the projects, if you're having trouble understanding something or trouble chasing down some bug. In addition to reaching out to us online, we'll have these in-person opportunities for those of you who are local. For those of you who are distant, more on the online opportunities in just a moment. In addition to the course's classes, there are projects, three of them, and they will flow roughly in the order of the topics on the syllabus, where we'll start with PHP, which will be new to some or most people in the room, then we'll introduce mid-semester databases and MySQL, then we'll introduce JavaScript and Ajax, and so that'll essentially be the tripartite approach of the course's projects in terms of the topics. In terms of grades, the projects are graded fairly holistically because you'll be encouraged to make a lot of design decisions on your own. You won't necessarily have to implement precisely what we tell you to; rather, you'll have to meet certain feature and technical requirements, so we'll evaluate the three projects along these axes. So, scope will be an axis, a numeric score that captures how much of the project you actually attempted. Correctness will capture how much of your code works in accordance with the spec. If it's very buggy, that would not be considered very correct. Design is more subjective. Design is, OK, it might work, might work perfectly, but does it look like a mess underneath the hood? Do you have like 10 nested for loops? That is not good design, for instance, and so design will be an opportunity for particularly qualitative feedback from the teaching fellows on your code. And style is more of the aesthetics: are your variables reasonably named? Is your code well commented? Is it nicely indented? 
The sort of easy things that are good habits to get into even if you're not in them, that's what we define as style. And then just for reference, things are weighted roughly by the amount of time that's required to get things right. So, for instance, this is the formula we'll use to compute a total score for each of the projects, where correctness for instance is weighted more heavily than style, and that should capture the fact that, you know, indenting your code probably shouldn't take you all that long but chasing down bugs can certainly take quite a bit of time. So, the formula's meant to capture that. The course's website, which I'll pull up in just a moment, has everything that you will need for the course, including videos of lectures. If you can't make some evening or if it's tricky because of full-time work, it's totally fine to watch the course's lectures online. The handouts similarly will be available there. What we'll be rolling out over e-mail this week is access to a tool that we've actually used in another class of mine called CS50. It's a discussion tool that will allow you to interact with classmates, with myself, with the course's TFs online. So, online discussion forums of sorts, but using some of the same technologies that we'll talk about in the class, including Ajax and similar. So, you will soon be receiving e-mails from us with invitations of sorts to create accounts within the website so that you can start directing questions to classmates or privately to the staff via CS50 Discuss. So, any questions on the structure, expectations, whether you're in the right place? As to the course itself. No. All right. In terms of-- yeah. Is there added credit for attendance? Nope. Attendance is expected and encouraged but it's not factored in. So, see you at the end of the semester, probably, right? But in terms of lecture, typically, we're slated for 3:15 to 6:15. I think we'll rarely go all three hours. 
Typically, the same course during the year is two hours of class, so we'll typically have a little bit of wiggle room, and let me not commit to just two hours per night, but we will typically not go, I think, as many as three hours. That's frankly a lot to take in, twice a week no less. So, we shall see where we end up each night. Any other questions? With regard to sections, the implication of that detail is that sections will not necessarily start at the pre-ordained time. What we'll try to do is have the TFs come a bit early, so if we're wrapping up lecture, we'll take a short break, then dive right into section and office hours so that you don't sit here awkwardly just waiting for an arbitrary time to come around. Yeah. How will we then work with the distance students? For the distance students, sections will be filmed as well, and we will be making ample use of online interactions for students who are primarily distance, and we've also experimented in the past with things like Skype and video conferencing or online chats. We're quite flexible for whatever works pedagogically for folks. [Inaudible] distance students to attend the section via chat or something? Good question. Typically, not for distance students with sections. We do film them, but there is some latency in when we post them. We may experiment with trying to stream some things online, but this room is not equipped for that, so I shouldn't make promises to that just yet. But either way things will be available asynchronously after the fact. Yeah. When are the office hours? The office hours will typically be right after sections on Mondays and Wednesdays, which are right after lectures. The motivation being, especially for folks who commute, we figured we'd try to compact things into Mondays and Wednesdays so you don't have to come to campus yet again. And we're flexible too. 
If for instance, you're really struggling in the class, you've got lots of questions, or your schedule is such that you have a nighttime class right afterward, we're happy to do things by appointment as well. So, in short, we'll meet you halfway as best we can. All right. So, web hosting companies. We talked earlier about DNS and sort of getting traffic to some destination B, but once they get there, what's waiting for the user? Where are your HTML and CSS and, soon, PHP files actually stored? So, this is a little screenshot from this one company, DreamHost, and I don't necessarily recommend them over any others but they're popular and well-known and super cheap. And just to give you a sense of what you get and what you don't really get, here is a screenshot of what you get, apparently, for 8.95 per month. So, you apparently get unlimited terabytes of disk storage space. You get unlimited terabytes of monthly bandwidth. You get an unlimited number of domains hosted, you get an unlimited number of user accounts, e-mail accounts, MySQL databases, and a particular distribution of Linux called Debian here. So, it kind of sounds too good to be true. So, what is the catch here? Like, that's an amazing deal for 8.95 a month. Unlimited everything. So, what are some of the catches, or what are they doing here technologically to make this possible? Well, they're not alone on their own server. They're sharing it with a bunch of other sites. Exactly. So, a lot of these web hosting companies are shared services whereby you might get this but they're also promising the exact same thing to 10 other people, to 100 other people. Now, it turns out that in HTTP, the protocol we discussed earlier, there is a feature these days for what's called virtual hosting. So, back in the day for the web, every website needed a unique IP address, essentially. So that when you typed in something dot com you went to one website, and that website lived on a server, and that server had an IP address. 
And if you wanted a second website, you'd better get a second server or at least give that computer a second IP address. However, in more recent versions of HTTP, as we'll see through some experimentation with actual browsers, browsers send another HTTP header. They don't just send GETs. They also send a reminder to the web server as to what the user typed in to the URL, so that you can now have, these days, multiple websites, foo.com, bar.com, baz.com, all living on the same physical server, at the exact same IP address. And because the browsers remind the server what the user typed in, foo.com or bar.com or baz.com, the server, even though it's receiving requests for three different websites, can figure out from those so-called headers what was requested and then return the appropriate domain's homepage. So, in this case, that's great because it makes this possible. We only have 4 billion IP addresses in the world and they are legitimately running out, and so this is great that we can multiplex servers in this way and put multiple people, multiple websites on the same IP address. But there are a couple of gotchas. What's the implication of this, the fact that multiple customers are on the same machine? Well, if the server crashes, then all of the websites will go down. Good. So, if the machine crashes, now all of you are affected rather than just the one. Contention for resources. Contention for resources, right? So, you're kind of in a bad place if, for instance, one of the other customers on the web server is Facebook.com or something that achieves unexpected popularity all of a sudden, or maybe it's a website that's really ticked someone off and is getting hit with some kind of attack over the internet, like a denial of service attack, because people are going after that website, and just because your website is on the same server, now you are down or otherwise offline as well. 
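The reminder that browsers send is the HTTP `Host` header, and a server's use of it can be sketched roughly as follows in Python. This is a toy under stated assumptions: the site names, the page contents and the `pick_site` helper are hypothetical, and a real web server such as Apache does this through its configuration rather than code like this.

```python
# Hypothetical sites that all share one server and one IP address.
SITES = {
    "foo.com": "<h1>Welcome to foo</h1>",
    "bar.com": "<h1>Welcome to bar</h1>",
}


def pick_site(raw_request):
    """Return the homepage for whichever site the Host header names."""
    for line in raw_request.split("\r\n")[1:]:   # skip the "GET / HTTP/1.1" line
        if line == "":
            break                                # a blank line ends the headers
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]   # drop any :port
            return SITES.get(host)
    return None                                  # no Host header was sent


request = "GET / HTTP/1.1\r\nHost: bar.com\r\n\r\n"
print(pick_site(request))
```

Two requests that arrive at the same IP address and differ only in that one header get two different homepages, which is what lets many customers share one machine.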
Moreover, one of the ways in which these companies offer such discounted prices is because it's not just you and two other websites. It's probably not 10, it could be 100, it could be 1000 other customers on the same server, and so there must be some fine print. And hopefully, there is some fine print somewhere that does say this is subject to something or other, right? They don't have infinite terabytes on their web server. They don't have infinite bandwidth. There's got to be some catch here, otherwise the world would not pay $1000 a month to host real large-scale websites. So you, again, sort of get what you pay for. And this is actually expensive. Years ago, I signed up for some fly-by-night operation for like 2.95 a year to host my website, and it was a website that I did not care much for, and that was good because it went down quite a bit. So, what they're not guaranteeing here is unlimited uptime, for instance. So, there are some gotchas. But frankly, if you're just starting small, you just want to experiment, you need a place for testing a website, or 8.95 is more compelling to you than several hundred dollars or even more, this is certainly quite compelling. All right. But as an aside, for things like e-mail and calendar and whatnot, there are other alternatives. You don't need to get those through your web host when places like Google exist. But suppose you are not so comfortable with that approach, and suppose too that you're not comfortable with the fact that you do not have any control over a DreamHost-like server, because it's being shared by other people and because it's being managed by other people, which is to say, if they are running PHP 5.2, which is a few years old, so are you. Like, you are running PHP 5.2. And if you want to take advantage of new language features that were introduced in PHP 5.3 and more recently PHP 5.4, you're out of luck. Like, you're going to have to either find a new web host or just deal with it. 
You can't just install it yourself, typically. So, similarly, you cannot upgrade to different versions of software. You can't necessarily reconfigure the web server at will. Now, they might give you some form of control, but you'll reach a point perhaps where it's just too frustrating not to have administrative access to the server. So, you can still achieve that. So, virtual private servers, VPSs, are an alternative to a shared web hosting model. In the VPS world, you get a dedicated server to yourself, sort of. You get the illusion of a dedicated server to yourself. So, thanks to a technology generically known as virtualization, these days, you can buy a server with like a bunch of CPUs or a bunch of cores, lots of RAM, lots of disk space, and then you can run virtualization software on it, that is, something known generally as a hypervisor, like VMware or Parallels or VirtualBox. There's a whole bunch of these products, free and commercial alike, out there, and once you install and run them on a server, on top of that software you can then install multiple instances of Windows, multiple instances of Linux, multiple instances, if they allowed it, of Mac OS. So you can create the illusion of multiple distinct computers, each of which has its own user names and passwords, its own administrative or so-called root account. And even though they're sharing the physical hardware, they are not sharing the same software. So, what you would get as the customer is the root login or the administrator login to your machine. Now, there's still the risk of resource contention, because these providers too will typically over-provision. Especially if you're spending, you know, 9.95 a month and not 159.95 a month, you're probably going to be on a server with fewer resources or with more customers. But at least here, you gain something. And if you've been following along, what is it that fundamentally you're gaining from a VPS that you didn't get from a web host? 
You can choose to update things however you wish. Exactly. Control. You can keep things up to date, you can install whatever you want, and also, if someone else's server is compromised, odds are yours might not be, whereas if a shared web hosting server is compromised, everything on that server is potentially vulnerable. So it's still not perfect, because the reality is that even though you are supposedly the only one now with root or administrative access, because it is a dedicated albeit virtual server for you, that too is kind of a white lie. Who else has access? The people there. The people there. Even if they don't know your password, they have physical access to the machine, and as soon as you have physical access to any machine in the world, pretty much, you can compromise it. You know, a Linux computer, for instance, can be booted in what's called single user mode by pretty much hitting the letter S when it's booting up, and that circumvents any request for a password, at which point you can even change it. Even on PCs and computers you can usually reset certain passwords by opening the case up and putting a little metal connector on two pins, and it short circuits out the password and clears it out. So in short, physical access is bad for security, so you're not gaining more security fundamentally, you're just making it less likely that someone else's compromise will affect you. And with some of these systems, the system administrators will have the password or at least access to the root account on your server, so in short, you should just assume that this is for you but probably at least one other person could physically access your contents. 
So what do you get, though, for the money? Well, here, and frankly these numbers are a little more compelling because it's not unlimited, so I'm kind of inclined to believe a bit more about the quality of service we're getting here, but 20 gigabytes of storage, you know, that's fine for a typical website unless your website has a ridiculous amount of traffic and database traffic and logs, which could build up and start taking megs or gigabytes of space. Or if you allow users to upload files or photos, then you might need a lot of space, but for many websites, even if they are dynamic, this is probably plenty. Transfer is an interesting one: 20, 200 gigabytes per month. For most websites that's probably fine, unless your website is a photo website or, worse, a video website. Then you have to start to do the math and figure out exactly how much data will be coming in and out of your server based on users' patterns. And moreover, there are also corner cases, as we'll discuss toward the end of the semester. You've got to worry about the bad guys out there. If someone just doesn't like you or is bored or downloads some free piece of software that bangs the heck out of your computer, they could just eat up your monthly allotment of bandwidth just by sending bogus traffic or downloading the same video again and again and again. So there are very interesting adversarial attacks when you have finances somehow tied to usage, so you need to be aware of that, especially with cloud computing. And let's see, you get some amount of RAM, 512 megabytes here, and so forth. One of the things we'll look at during the semester, as we start playing with Apache, is how you can assess how much RAM your computer is using and how much disk space it's using. I dare say one of the most common platforms for web hosting these days, whether it's a VPS or a shared web host, is Linux in some form, whether it's Debian or Fedora or Ubuntu or Red Hat or CentOS or any number of versions of Linux. 
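As a rough illustration of the transfer math mentioned above, here is a back-of-the-envelope sketch in Python. The numbers are assumptions: a 200 gigabyte monthly allotment, a made-up 50 megabyte video, and 1 gigabyte treated as 1000 megabytes; the `downloads_until_cap` helper is hypothetical.

```python
def downloads_until_cap(cap_gb, file_mb):
    """How many full downloads of one file fit in a monthly transfer cap."""
    return (cap_gb * 1000) // file_mb   # treating 1 GB as 1000 MB


per_month = downloads_until_cap(200, 50)
print(per_month)          # 4000 downloads use up the whole month
print(per_month // 30)    # 133 per day, spread over a 30-day month
```

Under those assumptions, one person re-downloading the same video a few thousand times would exhaust the month's bandwidth, which is why usage-based pricing invites that kind of abuse.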
We'll happen to use Fedora in the class, but it's representative of many similar operating systems. You can use Mac OS, but it's not really used commercially to host websites because it's not really geared toward that. You can use Windows, but there's no technically compelling reason to use a Windows machine to host things like PHP or Python or Ruby, because you're paying money for a Windows license to run free software. So it's not necessarily compelling unless you already have the licenses and the machines. Generally, going with these open source tools is quite common and compelling, because none of the software we use in the course costs any money whatsoever, and it's nonetheless quite popular and robust. All right, so what else? The appliance. We won't introduce it tonight, but we will in the form of the first project, so that you have an experience in the class that is as realistic as possible. What we'll actually have each of you do is run your own web server, run your own database server, and actually run your own copy of Linux itself. For this we'll use a tool that I use in another class of mine called the CS50 Appliance. This will be a downloadable file inside of which is an installation of Linux, Fedora Linux specifically, with Apache, which is web server software, MySQL, which is database software, and PHP installed for you in advance, as well as support for a bunch of other languages and standard tools and the like. The upside of this is that, rather than have you connect to some random Harvard servers on which you'll only have temporary access, this is a virtual machine that you have on your computer for as long as you want to keep it around, and it's very representative of the configuration you would find on a VPS or a commercial web host.
And because you'll have root access on it, and because it will live on your machine, only you have access to it, perfectly securely, unless your laptop or desktop is compromised. You'll be able to configure Apache and PHP and really tinker with things, and best yet, if you screw it up, that's fine: just download a new one and you're back at the beginning, so long as you've saved your code somewhere, which we'll show you how and where to do. More on that in a week or so. How would you connect to this kind of thing? SSH. Does anyone not use SSH here? Are there going to be some hands? OK. So if you haven't used SSH, it stands for Secure Shell, and it is a sort of old-school but now much more secure way of connecting to a remote server and executing commands on it. This is just a free program that comes with Mac OS, run from Terminal, and there are analogs for Windows, such as PuTTY, a free program for Windows that a lot of people like to use. It allows you to open up essentially a black-and-white or white-and-black window, type in a username and password, connect to some remote server, and execute commands on it. Those commands can be to create files, remove files, configure the web server, turn the database on or off, or the like. So what you'll find, once we start using this CS50 Appliance or virtual machine, albeit running on your own computer and not on some server, is that you will be able to connect to that appliance as though it's a remote server, so that you never even need to see the appliance itself. Literally, once you turn it on, you can minimize it and pretend that it's a server somewhere else on the internet, because once you install it, the appliance, a.k.a. virtual machine, is going to have its own IP address. But it's going to be what type of IP address?
So it will be private to your own laptop, and no one else can even access it, but you'll be able to go through precisely the same motions that you would if you were actually paying some third party to host your website or if you owned some server elsewhere on the internet. So SSH will be one of the techniques that we use. SFTP, for those unfamiliar: this is a screenshot of a popular Windows client for transferring files called SecureFX, but others exist, free ones in particular. It just lets you drag and drop files from your computer to a server, but in this case the server is going to be a virtual machine running on your own computer that may or may not be minimized. But again, the experience would be precisely the same. So where does that leave us? It turns out that when you are writing HTML, you have fairly static content, but you do have these mechanisms, and I'm guessing most people in the room have some or a lot of experience with HTML and basic websites and the like. These ultimately are the basic input mechanisms by which we can start making dynamic websites. In other words, we have text fields, password fields, hidden fields, checkboxes, radio buttons, and drop-down menus. These are the mechanisms by which we can start to get input from users, so that when they interact with our website they don't necessarily see the same thing; rather, they might see different things every time they visit that website. So let's do a little example here. Don't download this on your own just yet, because we will be posting a newer version soon, but I'm going to go ahead and open up a program called Fusion, VMware Fusion. This is what's generally called, again, a hypervisor, or virtualization software, and what I've just done is essentially run Linux on my own Mac, and you too can do this.
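As a small aside on what "private" means for the appliance's IP address, Python's standard `ipaddress` module can tell you whether an address falls in one of the reserved private ranges. This is only a sketch; the sample addresses are illustrative and are not the appliance's actual address.

```python
import ipaddress


def is_private(address):
    """Return True if the address is in a range reserved for private networks."""
    return ipaddress.ip_address(address).is_private


if __name__ == "__main__":
    # Illustrative addresses only: two typical home or virtual-network addresses
    # and one public address.
    for sample in ("192.168.1.10", "10.0.0.5", "8.8.8.8"):
        print(sample, is_private(sample))
```

An address for which this returns True is not routable on the public internet, which is why no one outside your own machine or local network can reach a virtual machine that has one.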
I'm going to actually use the Linux desktop just because it's here, but I could similarly minimize it, as I will in just a few minutes, and we'll connect to it as well. So now I'm running Linux on my same computer, and notice I currently have no IP address, but this should change in a few seconds once I get a... there we go. So my Linux virtual machine has just asked the network to give it an IP address, and one came. The protocol that computers use to get IP addresses dynamically, does anyone know? DHCP, Dynamic Host Configuration Protocol, that's what did that. It's also how your own personal computer at home gets an IP address from your Linksys router or AirPort Extreme. So I'm going to go ahead and do this. First let me go back to my Mac. I'm going to open up the simplest of programs, TextEdit, which is what we used earlier, and I'm going to make a very simple web page. First, I'm going to do DOCTYPE html. In the course, we'll use HTML5, which is sort of the latest and greatest version of HTML, and I'm going to say "Hello." Well, let's do this. Let's call this Google, then the body, then close body, and then I'm going to say Google in an h1. OK, I'm going to save this. I'm just going to go ahead and save it on my desktop as Google.html. I'm going to say yes, use .html, even though it's a text file, and now I'm going to go ahead and pull up Google.html. OK. So not exactly Google just yet. I can do a little better, so let's make this a little different. Let me go in here: div style, text-align: center. If any of this looks completely cryptic or new, these are the kinds of things that we will take for granted in the course, that this stuff looks familiar. So let me save and reload. OK. Now it looks a little more like Google, but it's certainly lacking some key features, among them, all right, a search bar, right? So let's go there.
So let me start to make a simple web page that, again, odds are many of you could have done already because you know HTML and forms and whatnot. So let's go ahead and do this. Down here, I'm going to say form, I'm going to close my form, and in here I need an input with type equals text, and I probably need a submit button, so let me do input type equals submit, and I'll give this a value of Google Search, so that we really recreate it as best we can. Save and reload. OK. So now it's getting there. This isn't the prettiest thing, so let me go ahead and add a style with a width of, let's say, 200 pixels. Let's go back here and save, and it's still a little small: 300 pixels. OK, looks more Google-like. And then if we really want to be anal here, we could do this: I'm Feeling Lucky. OK. So now we've got roughly Google, but in black and white. Unfortunately, it takes way more work to implement the backend of this website. Right? So the front end is pretty easy. We're pretty much done, other than some colors and some other features these days. But what about the backend? If I actually wanted to patch in to Google, let's see if we can now revisit that conversation we started earlier. When I type www.google.com and hit enter, what really is happening? Let's take a look underneath the hood. Let's look at the HTTP traffic and think about what it is we're going to start building next week in terms of the actual backend. So let's suspend this mental thread, pull up the actual google.com, and take a look at what is here. First, so that this is enlightening, we have to disable this annoying instant search feature, and they hid the setting. Let's select this. Gear icon, search settings, Google Instant, how do I disable you? Never show instant results. The reason I want to do this is that we're not going to talk yet about JavaScript and AJAX, the technologies that underlie this annoying or beneficial instant search feature. We want to do sort of old-school HTTP searches right now.
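For reference, the page typed up in the demo probably looked roughly like the markup below. This is a reconstruction rather than the exact file from the lecture: the `action`, `method`, and `name` attributes are assumptions (the demo had not yet wired the form to anything), and the markup is wrapped in a short Python script only so that it can be written to disk and opened in a browser.

```python
from pathlib import Path

# Reconstructed markup; the action, method, and name attributes are assumptions.
PAGE = """<!DOCTYPE html>
<html>
  <head>
    <title>Google</title>
  </head>
  <body>
    <div style="text-align: center">
      <h1>Google</h1>
      <form action="/search" method="get">
        <input name="q" type="text" style="width: 300px">
        <br>
        <input type="submit" value="Google Search">
        <input type="submit" value="I'm Feeling Lucky">
      </form>
    </div>
  </body>
</html>
"""


def write_page(path="Google.html"):
    """Write the reconstructed page to disk and return its path."""
    target = Path(path)
    target.write_text(PAGE, encoding="utf-8")
    return target
```

Calling `write_page()` and opening the resulting file in a browser should show a centered heading, a text field, and two buttons, all in black and white.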
So I've disabled that. So now hopefully I can save. OK. Now I'm going to go back to google.com, and this is what it was like five years ago when you wanted to search for something on the internet. So now I'm going to go ahead and type, for instance, Harvard. It's still doing autocomplete, but it's not immediately showing me the search results. Now notice, before, here is the URL I'm at: www.google.com. And now, after I hit enter, notice the URL. This is now hinting at the fundamental functionality of HTTP. We have just issued one of those GET requests. We had two of them, in fact. The first one came when I visited the home page; then I hit enter, and it appears that another GET request has been sent. Why? Because my URL changed. So generally, anytime the keyword GET is involved, it's because the URL is changing, or equivalently, if the URL changes, you most likely just did a GET. So there is a whole lot of distracting stuff up there, but what is relevant and what looks familiar up there in the URL in gray? I have no idea what hl is. Site, I don't know. Source, I don't know. But what looks familiar? OK. Harvard. So let me manually delete all the stuff whose meaning I have no idea about, at least not yet. I don't know what this is. q equals Harvard, oq equals Harvard. oq, I don't know. I'm just going to presumptuously whittle it down to that, and now let's hit enter. So interestingly, it still works, and what's nice is that there is much less distraction, and we can have the same story but with fewer distractions in the tale. So it looks like when hitting enter on the previous page, if I throw away the distractions, I have now visited not slash but slash search, question mark, q equals Harvard. So what is q? q is generally known as an HTTP parameter. It is an input to a web server that generally comes from a form. But as we'll see in a few weeks, it can also come from JavaScript code. It doesn't have to come from a form per se.
Harvard is obviously what I typed in. So what is slash search? Well, it's not obvious here what programming language Google uses, but if we were on Facebook, we would probably see search.php, because Facebook is known for using PHP. They're also known for not hiding their file extensions, which is very easy to do, but they just don't for some reason. Google does hide its file extensions, but a lot of Google's code, at least on the front end, is apparently written in Python or in some other languages. So it's not clear what language is on the server, but slash search is referring to some file or some folder on the server. What does the question mark denote? What's that? Start. The start of the parameters. So anytime you have a question mark in the URL, it separates the path and the preceding part of the URL from all of the parameters, and parameters are key-value pairs, something equals something. And if you have multiple parameters, what separates them, even though I already deleted the others? Yeah? And. The "and," the ampersand symbol. So if I hadn't deleted all of that, recall that we saw something like this just a moment ago: ampersand, oq equals Harvard. I don't know what oq is, but that's how you would separate parameters, with ampersands. So this means we have submitted a key of q and a value of Harvard to the server. So now, let's use a fairly common tool built into Chrome. It's also built into Safari. Firefox has something similar, and we recommend something called Firebug on the course's website with which to do the same kind of thing. But I'm going to go to View, Developer, and Developer Tools. And I will say these days, certainly when using LAMP (Linux, Apache, MySQL, and PHP), which is this course's focus, many people are increasingly using Chrome: one, because it's popular; two, because it's fast; three, because it comes with some developer tools.
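The anatomy of that URL (a path, then a question mark, then key-value pairs joined by ampersands) can be pulled apart with Python's standard `urllib.parse` module. This is just an illustrative sketch of the splitting a server-side framework would do for you; the `describe` helper is a hypothetical name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def describe(url):
    """Split a URL into its path and a dict of its query-string parameters."""
    parts = urlsplit(url)
    # parse_qs maps each key to a list, since a key may appear more than once.
    return parts.path, parse_qs(parts.query)


if __name__ == "__main__":
    path, params = describe("http://www.google.com/search?q=Harvard&oq=Harvard")
    print(path)    # the part before the question mark, after the host
    print(params)  # the key-value pairs that were separated by ampersands
    # Going the other way: build a query string from key-value pairs.
    print("/search?" + urlencode({"q": "Harvard"}))
```

Each value comes back as a list because HTTP allows the same key to appear several times in one query string.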
I would say Firefox is also wonderfully convenient for doing development, and you should certainly test on multiple browsers, as we'll require in the first project's spec. Internet Explorer is getting better about having some integrated development tools. From the course's perspective, we don't care which browser you use, because you'll be using, again, the appliance as a server. You can use whatever browser and whatever operating system on your own computer you're most comfortable with. But if you are coming to this with, I'd say, less familiarity with various tools, Chrome is pretty popular, and Chrome and Firefox tend to be, I think, better for development purposes, even though you should test on all of them. So what am I seeing here? I've just opened the developer tools, and now I have tabs for elements, resources, network, scripts, timeline, profiles, audits, and console. We're not going to use all of these, but a few of them are quite helpful. One, the elements tab shows you the page's HTML, but it pretty-prints it for you and makes it hierarchical, so that with those little triangles you can dive in deeper, even though if we look at view source for the page, it is an utter mess. If I go over here and view page source, this is what came back from the server. And I would argue this is not very readable to a human. Even when we get to the actual HTML, even the HTML is not that readable. Color-coded maybe, still not useful. So what does this do? The developer toolbar actually parses it for you so you can start to navigate. And this is wonderfully compelling whether it's your site or someone else's. If it's someone else's, it's a wonderful way of learning how they did something or how they styled something. If it's your own site, it's a wonderful way of chasing down bugs and also, as you'll see, changing some of the aesthetics on the fly without having to change actual files and then reload or re-upload.
So, we don't care so much about elements right now, but we do care about network. So let me go to the network tab. What this tab will do for us is sniff all of the network traffic between my browser and the server, and it will show each HTTP request, one per line, at the bottom. So I'm going to leave this window open, and I'm going to click reload. And again, this is my URL. And here we go. That's a lot. I only hit reload once, so why in the world did so many rows appear down here? I clicked once, but look how much stuff just happened. Why? Each of those again represents an HTTP request, so a virtual envelope from browser to server and back. Yeah? Well, there's a lot going on behind the scenes. What does that mean, behind the scenes? Well, Google doesn't just have one method for [inaudible]. There's things that it has to call, other things that it has to find and bring up, and so that's all coming up in this-- OK, good. So Google needs to pull up other things. Give me a concrete example of something it has to pull up. Like an image. Good. So inside of the HTML that's initially downloaded, there could be an image tag, a script tag, a link tag to CSS, to JavaScript, to images; there could be Flash files; there could be a whole bunch of other assets, so to speak. And to get those, the browser is designed to sort of recursively go get those assets. So if it sees a script tag or an image tag, it will send another virtual envelope requesting that file specifically. It might do it over the same network connection, the same TCP socket, so to speak, but each of these rows represents a different file that was downloaded. Ironically, it seems that Harvard is not behaving well in terms of the auto previews, but that's fine. Another day we can look at why. But let's look at the first one, because that's the one that'll be the most enlightening for now. When I click on this, there are a few details. So, one, the preview is just what was returned.
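To see why one page load turns into many rows in the network tab, here is a rough sketch of the first step a browser takes: scanning the downloaded HTML for tags that point at other files. It uses Python's standard `html.parser`; the class and function names are hypothetical, and a real browser handles many more tags and attributes than these three.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect the URLs referenced by img, script, and link tags."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag in ("img", "script") and attributes.get("src"):
            self.assets.append(attributes["src"])
        elif tag == "link" and attributes.get("href"):
            self.assets.append(attributes["href"])


def find_assets(html):
    """Return every asset URL that the page would make the browser request."""
    collector = AssetCollector()
    collector.feed(html)
    return collector.assets
```

Each URL returned by `find_assets` would correspond to one additional HTTP request, and so to one additional row in the network tab.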
And here is another big mess of results from the web server. But we don't care so much about that. I care about the headers. So let me zoom in on this. Rather than look at this fairly pretty-printed version of it, I want to look at the raw source. We're diving in deep, sort of intellectually, here, so let me look at view source. Now this is what was literally sent in that virtual envelope that we started tonight's discussion with. So there's the top line, GET /search?q=harvard followed by HTTP/ and a version number, so that was in the envelope. And we did promise there's some other stuff in there. The second line is a reminder to the server as to what the user typed in, namely the host name. Now frankly, Google is most likely not sharing its servers with other companies, so this doesn't really matter there. But for shared web hosting companies, the fact that the server is being reminded what the URL was means it can serve up foo.com or bar.com or baz.com, so thankfully HTTP does that. There's some arcane information here related to caching and connections and efficiency. Let me wave my hand at that for now. User-Agent is interesting, and you might know this already, but if you don't: every website you've ever visited knows what computer you have, what operating system you're running, and what browser you were using. Why is that? Well, browsers by default reveal precisely that information. I have just told Google, behind the scenes, that I have a Mac running Mac OS X 10.7.4, and if I scroll down further, they will be able to infer that I was using Chrome version something or other. So why in the world is that useful? Yeah? [ Inaudible Remark ] Good. So, arguably, this is useful for debugging purposes and useful for demographic purposes, to know who your users are. [ Inaudible Remark ] Good. So there are some features that could be dictated by what type of OS or browser someone is using.
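Since the contents of that virtual envelope are just lines of text, they are easy to take apart by hand. The sketch below parses a request like the one shown in the developer tools into its request line and headers; the sample request is abbreviated and its User-Agent value is a placeholder, not what Chrome actually sent.

```python
# Abbreviated sample request; the User-Agent value is a placeholder.
RAW_REQUEST = (
    "GET /search?q=harvard HTTP/1.1\r\n"
    "Host: www.google.com\r\n"
    "User-Agent: ExampleBrowser/1.0 (Macintosh)\r\n"
    "\r\n"
)


def parse_request(raw):
    """Return (method, target, version, headers) for a raw HTTP request."""
    head = raw.split("\r\n\r\n", 1)[0]  # headers end at the first blank line
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers
```

The first line says what is being asked for, and the Host header is the reminder that lets a shared server decide whose site to return.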
For instance, if you go to a website that lets you download software, it's not necessary that you detect what the user's operating system and browser are, but it's kind of a nicer user experience if the server only shows you the Mac software because you clearly have a Mac, as opposed to my having to figure out which of the links to click for Linux or Windows or Mac OS. Another argument, frankly, is that this is completely unnecessary and we should never have gotten into this habit in the first place, because it's not necessarily used all that much. And indeed, writing websites that require knowing what the user's browser is is actually generally bad practice, because there are certain privacy tools that users can install on their computers that just hide this information altogether, for better or for worse. And if you're relying on certain headers to be sent, your own website could misbehave. It turns out there are other tricks for doing detection, and typically, as we'll see in JavaScript, it's generally better to detect whether a browser has a certain feature than to ask whether it is a specific operating system or a specific version of a browser. However, freely available databases exist that allow you to figure out, based on the so-called user agent strings, what version of browser and operating system someone's using. Because frankly this is a little hard to read, software exists that simplifies this, so you can just check a boolean variable like isMac or isPC. All right, what's below? Some more arcane details that I'll wave my hand at. Cookies we'll come back to in a week or two, when we actually start using them to our advantage, but we'll also talk about their security implications. But in a nutshell, all of these headers, just text, are what was inside of that virtual envelope. And the most important one arguably was the very first one, because that tells Google what to return. But now we see that it's not just slash, it's a full path.
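As a deliberately crude sketch of the "isMac or isPC" idea, the function below guesses a platform from a User-Agent string by looking for a few substrings. The sample string is only representative of what a Chrome browser on Mac OS X 10.7.4 might send, and real user-agent databases are far more thorough than this; the function name is hypothetical.

```python
# Representative sample only; not copied from the lecture's actual request.
SAMPLE_USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_4) AppleWebKit/536.11 "
    "(KHTML, like Gecko) Chrome/20.0.1132.47 Safari/536.11"
)


def guess_platform(user_agent):
    """Guess 'windows', 'mac', 'linux', or 'unknown' from a User-Agent string."""
    text = (user_agent or "").lower()  # the header may be missing entirely
    if "windows" in text:
        return "windows"
    if "macintosh" in text or "mac os x" in text:
        return "mac"
    if "linux" in text:
        return "linux"
    return "unknown"
```

Note the "unknown" branch: because privacy tools can strip or alter this header, code that depends on it should always have a sensible fallback.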
So Google has hopefully parsed that string, so to speak, slash search, question mark, q equals Harvard, and then used the q equals Harvard as input to its database or whatnot to return customized results to me. Now if we scroll down, let's see how Google replied. So this is just a Chrome thing. It is just a kind of dumbed-down display of the query string parameters, so it's just useful. Especially for a developer, you can just see it more easily this way. But let me go ahead and view source now for the response headers. This is what the server responded with. It turns out many of you have seen numbers returned by servers. Who has ever seen the message 404 come back? OK. What does 404 mean? [ Inaudible Remark ] File not found, right. So it's an HTTP status code. It's an arbitrary number the world decided on years ago that means file not found. What are some others you might have seen besides this one? [ Inaudible Remark ] 500, so internal server error of some sort. 503, the service is unavailable. There's 403, rather, which is forbidden. What's that? 301 and 302. 301 and 302 are redirects, which are actually quite useful, and we'll start using those in the next lecture or two. So in short, there are some codes that you've probably seen. 404 is maybe the most popular. 200 you might not have ever seen, but this is the best one of all. 200 is literally OK. It means everything worked out well, so you just don't see it, because it indicates success, as did this little green icon that we saw a moment ago before I expanded this. So this is the server's response: 200 means, found what you're looking for, here it is. Now, what else comes down? We have the date from the server, which might be useful; Expires and Cache-Control, so directives to the browser saying do or don't cache this, even though these are not necessarily reliable. We'll talk about this when we get to PHP; this is interesting.
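For reference, Python's standard library already knows the official phrase that goes with each of these status codes, which makes for a handy lookup. This is a small sketch; the helper names are hypothetical.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a status code together with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def is_redirect(code):
    """Return True for the 3xx family, which includes 301 and 302."""
    return 300 <= code < 400


if __name__ == "__main__":
    for code in (200, 301, 302, 403, 404, 500, 503):
        print(describe_status(code))
```

The first digit gives the family: 2xx means success, 3xx means redirection, 4xx means the client did something wrong, and 5xx means the server did.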
Set-Cookie. Set-Cookie is amazingly powerful, if a little unsettling, especially in the world of advertising and tracking, but we'll talk about that in the context of PHP. Notice that the server is telling us that it supports gzip, which is a compression utility, and this just means, hey, you can compress your data to and from me. The name of the server is gws, probably Google Web Server, and then there are some headers that they use for some of the security things we'll talk about later in the semester. So that's what Google has returned in addition to the content that has come back from the server. So let's see this outside the scope of our browser. I'm going to go on and open a program called Terminal, which comes with Mac OS. For those of you with Windows, PuTTY is another option, and we'll look at that or encourage that in... my music is still playing. We'll recommend that for a future project. I'm going to run a program called telnet. Telnet is like SSH but unencrypted. That's a bit of an oversimplification. I'm going to go ahead and telnet to google.com, and nothing actually happens. But it did figure out Google's IP address, so that's interesting. Telnet by default uses port, it's been so long, 23. Telnet uses TCP port 23, but there's no telnet server there. Telnet used to be used to send messages and connect to email servers and the like. But what if I instead say 80? There's no colon in this program; there is in the browser. But this is going to connect from my laptop to google.com on TCP port 80. So this is interesting. Now I've connected to their server. How do I know that? It's telling me, connected to www.l.google.com. Where did this l come from? They're doing some DNS trickery, probably for load-balancing purposes. They have multiple servers, so I've gotten one of them specifically. But I'm going to pretend to be a browser.
I'm going to say, get me slash using HTTP version 1.1, and then hit enter. And then, if that's the last of my headers, I have to hit enter twice, and voilà, what do I see? Well, the font is kind of big and the HTML and JavaScript are kind of minified, but that's exactly what my browser got back. And if I keep going up and up and up and up, notice I can see exactly what the server's response was. So I see the HTTP headers that came back from the server, Set-Cookie and all those same lines, exactly what I saw in the browser. So I've really just pretended to be a browser. We can do this with any website, and it's more than just a curiosity; it can actually help with debugging or with seeing what's actually coming back from a server. I can do telnet www.harvard.edu 80. GET / HTTP/1.1, enter, enter. So, interesting. Bad request. Now why is that? We see some HTML, because the web server assumes that a browser will typically be doing this. Why might this be a bad request? I'm actually going to guess here. Let's try this: GET / HTTP/1.1, Host: harvard.edu. There we go. So it didn't like the fact that I did not send the Host header, which means Harvard's web server is probably using something called virtual hosting, which is that feature I alluded to earlier whereby a web server can support multiple websites, but for that to work, browsers have to cooperate. And the fact that I did not send that header meant that the server didn't know whose home page to return, so it gave me that 400 response of I don't know what to do. Now let's try one other thing. Let me cancel this and telnet not to www.harvard.edu; let's try this one, just to see what happens. So GET / HTTP/1.1, enter, enter. OK. It didn't like that. So let's fix this again. GET / HTTP/1.1, Host: harvard.edu, enter, enter. Interesting. This is not the home page. What did I get this time? Some message saying it's moved; harvard.edu has moved permanently, no less.
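The telnet session above can be reproduced in a few lines of Python using a plain TCP socket, which makes the point that an HTTP request is nothing more than text sent over a connection. This is a minimal sketch: the function names are hypothetical, and the `Connection: close` header is an addition not typed in the demo, included so that the server closes the connection when it has finished responding.

```python
import socket


def build_request(host, path="/"):
    """Return the bytes a browser-like client would send for a simple GET."""
    return (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # assumption: ask the server to hang up afterwards
        "\r\n"                   # the blank line is the second "enter"
    ).encode("ascii")


def fetch_raw(host, port=80, path="/"):
    """Connect to host:port, send one GET, and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=10) as connection:
        connection.sendall(build_request(host, path))
        chunks = []
        while True:
            data = connection.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling something like `fetch_raw("www.harvard.edu")` requires network access and returns the response headers followed by the body, much like what scrolled by in the terminal; leaving the Host line out of the request is what provoked the 400 response in the demo.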
And if I scroll up, more esoterically in the headers is, one, a different status code, 301, which we mentioned earlier. 301 means permanent redirect. If a browser receives a 301, it should never ask that question again. It should just remember harvard.edu moved, and it moved where? To the value of the Location field, which should also be included in the response headers. How did that happen? Well, some system administrator or a dean at Harvard just decided, arbitrarily but reasonably, that we don't want to standardize on harvard.edu in people's browsers. We want them to be automatically redirected to www.harvard.edu. Why? One, branding. That's one reason, which is perfectly reasonable. Two, more technologically, it can be better for, security is a bit of an overstatement, but for technical reasons: having the www means your cookies can be isolated to www.harvard.edu, whereas if your cookies were instead sent to harvard.edu, that means your cookies could be read by really any website under it, including cs.harvard.edu or summer.harvard.edu. So by saying www, you're also forcing cookies, at least by default, to be more precisely scoped. So there are some technical reasons as well. This problem was fixed only a year or two ago. A few years back, someone new came to Harvard to run the news office, and one of her first acts was to fix a horrific omission: for years, harvard.edu did not exist. www.harvard.edu existed, and they weren't even redirecting. So that is a bug that's now been solved. Any questions then on what just happened there? While I have the terminal window open, let me offer up some other troubleshooting tips. nslookup, name server lookup, is a wonderful way of doing those DNS lookups we talked about before. What I've just done is ask the nearest DNS server, which happens to be this one, because that's how Harvard has configured the campus; that's the DNS server.
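To illustrate how a client decides where a redirect points, here is a small sketch that reads the status code and the Location header out of a raw response. The sample response is hand-written for illustration and is not a capture of what Harvard's server actually returned; the function names are hypothetical.

```python
# Hand-written sample response for illustration only.
SAMPLE_RESPONSE = (
    b"HTTP/1.1 301 Moved Permanently\r\n"
    b"Location: http://www.harvard.edu/\r\n"
    b"\r\n"
    b"<html>moved</html>"
)


def parse_response_head(raw):
    """Return (status_code, headers) parsed from a raw HTTP response."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    lines = head.split("\r\n")
    status_code = int(lines[0].split(" ", 2)[1])  # e.g. "HTTP/1.1 301 Moved Permanently"
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_code, headers


def redirect_target(raw):
    """Return the Location value for a 301 or 302 response, else None."""
    status_code, headers = parse_response_head(raw)
    if status_code in (301, 302):
        return headers.get("location")
    return None
```

A browser that receives a 301 does essentially this, then requests the new URL and remembers the mapping for next time.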
And I've asked what the IP address of harvard.edu is, and it's given me this IP address. So if I want to get curious, let me do this: http:// followed by the IP address. Interesting. Why is this not working? Well, again, virtual hosting. The website is not configured to understand IP addresses by default. However, let's try another one. nslookup cnn.com. Oops, cnn.com. Well, interesting. It turns out that with DNS you can also do what's called round robin. You can return multiple IP addresses for a web server, and those can rotate, literally in the order in which they're returned, to do load balancing, and we'll discuss that topic again toward the end of the semester under scalability. But let me choose one of these. CNN is pretty big. I'm guessing they don't really share with some other websites. So let's just go to their IP address, and indeed, there it works. And now notice my URL hasn't changed. So now, if I really want to get sort of creative, I'm going to do this. On my Mac, and you can do this on a Windows machine as well, there's a file, found on Macs and Linux computers at /etc/hosts, which is a text file that hard-codes IP addresses for domain names. This is generally useful for internal corporate use or development purposes, so we'll be able to do this with projects as well. I'm going to go ahead and authenticate here. It's just a text file, and notice there are some basic entries that come with the system. This is an IPv6, version 6, address written in a crazy form. I'm going to go ahead and paste in not the URL but the IP address of CNN, and I'm going to say this is davidnews.com. So this is like manually overriding the mapping of that IP address to something else, only on my own computer. I'm not running a DNS server. It's just that my operating system, whether Mac OS or Windows, is supposed to look at a file like this before asking a DNS server. So now let's see if this works.
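The hosts file is simple enough that its format can be shown with a few lines of parsing code. The sketch below is not how the operating system itself is implemented; it just mirrors the file's layout (an address, then one or more names, with `#` starting a comment). The sample text uses 203.0.113.10, an address reserved for documentation, as a stand-in for the real IP address pasted in during the demo.

```python
# Sample contents; 203.0.113.10 is a documentation-only stand-in address.
SAMPLE_HOSTS = """\
# Host database
127.0.0.1       localhost
::1             localhost
203.0.113.10    davidnews.com   # hand-added override
"""


def parse_hosts(text):
    """Map each host name to the first address listed for it."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            mapping.setdefault(name, address)  # keep the first entry for a name
    return mapping
```

With an entry like the last one in place, looking up davidnews.com on that one computer yields the pasted-in address without any DNS server being asked.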
It doesn't work with all websites, but let me go to http://davidnews.com. Come on. Oh, come on. There it is. I've just made my own news site. So, frankly, this is kind of stupid of them. I was just joking with some friend the other day that you could kind of have fun with this and make fairly offensive domain names that all lead to CNN somehow. And why is this? Frankly, this is a trivial defect in a web server. A web server could be configured, as you will be able to do with features of Apache before long, to check upon receipt of one of those virtual envelopes what was in the Host field. If the Host field does not match something that we're happy with, redirect the user. How? You respond with what status code? 301. So it is actually trivial to fix this kind of thing. They can't stop davidnews.com from leading to cnn.com, but they can stop the browser from staying there, or at least encourage it with that 301 to redirect elsewhere. And this redirection is super common, not just for harvard.edu but even for the course's own website. If I go to http://cs75.net and hit enter, notice what the URL changes to. A few things happened there. So this is the course's website. What are some of the things that got inserted automatically, it seems? Yeah. [Inaudible] secure version. Indeed. So I didn't just go to the www version. I also went to the secure version. Why? We've just gotten into the habit, and I personally have gotten into the habit, of using it for everything. It's relatively cheap to do. It's relatively trivial to turn on, and it's only getting cheaper as CPUs get faster. And some of you might be familiar with a tool called Firesheep, released about a year and a half ago, which was a wonderfully free proof of concept of something called a session hijacking attack, something we'll talk about in a few weeks' time in the context of security.
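The fix described above, checking the Host field and answering with a 301 when it is not a name the site wants to be known by, can be sketched as a plain function. This is only an illustration of the logic, not Apache configuration, and `www.example.com` is a placeholder for whatever canonical name a site chooses.

```python
CANONICAL_HOST = "www.example.com"  # placeholder for the site's preferred name


def respond(host_header, path="/"):
    """Return (status_code, headers) for a request with the given Host header."""
    # Ignore any ":port" suffix and compare case-insensitively.
    host = (host_header or "").split(":", 1)[0].lower()
    if host != CANONICAL_HOST:
        # Unknown or missing name: send the browser to the canonical one.
        return 301, {"Location": f"http://{CANONICAL_HOST}{path}"}
    return 200, {}
```

With a check like this in place, a hand-made name such as davidnews.com would still reach the server, but the browser's address bar would promptly change to the canonical name.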
Long story short, if you are visiting a website that uses http://, it is fairly trivial for someone in your nearby wireless vicinity, whether in this room, at Starbucks, or even in your home if you have adversarial siblings or roommates, to log in to your Facebook account or your Google account or-- actually not Google. Facebook account or Twitter account or any website that's not using https. And that is because if you're not using https, nothing is encrypted, and you probably knew that. But among the things that aren't encrypted are things called cookies. And cookies, it turns out, as we'll see in a week or two, are the mechanism via which users are remembered as being logged in to websites. If you're just sending that cookie to a website again and again to remind it, I'm logged in, I'm logged in, I'm logged in, that's not encrypted. Anyone in Starbucks can sniff that cookie and, with the right technical savvy, as you will soon have, send it as their own, and now they're logged in to whatever you were. It doesn't mean they know your password, but it does mean they can hijack your current session, so to speak. So I retracted Google because Google, about a year or two ago, thanks to some of the issues in China they had with the hacking and whatnot, transitioned all their services to https. Facebook also finally offers this, at least if you opt into it. But again, we'll come back to this. So, a few more weeks of insecurity in your lives, if you don't mind, but we'll come back to this and talk about how you can have certain defenses up. And then tragically, even websites that redirect from the unencrypted version to the encrypted version might still be vulnerable, because many of those websites will first do a redirect that sends your cookie in the clear, and only then does the site realize, "Oh. This should be secure." But by then it's too late. And banking websites almost always use https.
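As a rough illustration of why a sniffed cookie is enough, here is a toy Python model of a server that remembers logged-in users by cookie alone. Everything in it is hypothetical: the `SESSIONS` table, the cookie name `session`, and the user `alice` are made up for the sketch, and no real website is this simple. It performs no network sniffing; it only shows that the server cannot tell who is presenting the cookie.

```python
import secrets
from typing import Dict, Optional

# Toy in-memory session table: session id -> username.
SESSIONS: Dict[str, str] = {}


def log_in(username: str) -> str:
    """Create a session and return the id the browser will store as a cookie."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = username
    return session_id


def who_is_logged_in(cookie_header: str) -> Optional[str]:
    """Look up the user for a Cookie header such as 'session=abc123'."""
    for part in cookie_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name == "session":
            return SESSIONS.get(value)
    return None


# The victim logs in once and then sends this header with every request.
victim_cookie = f"session={log_in('alice')}"

# Over plain http that header travels unencrypted, so an eavesdropper can
# copy it byte for byte and send it from a different machine.
attacker_cookie = victim_cookie

print(who_is_logged_in(attacker_cookie))  # alice
```

Note that the password never appears anywhere after `log_in`: possession of the cookie value is all the server checks, which is why the session, not the password, is what gets hijacked.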
Still, certain banks have been known to be not so technically savvy, and they are still leaking cookies for reasons that we'll soon reveal. So it inserted the www, the https, and then the stupid main page, which is a MediaWiki thing, which is the tool we use for the course's website. It's free wiki software, and that's not really intellectually interesting, just their own software thing. All right. Any questions on http? We'll delete davidnews.com. And again, that wouldn't work with all websites. And in fact, Harvard's probably would not work for us. All right. So we've teased apart these http headers. They'll become an invaluable resource when it comes time to chase down bugs or features in your own code. Let me take another quick look at something within Chrome's developer toolbar, then we'll do one last thing with regard to Google and see if we can implement a little more of our own version of Google. So, let me go to the Elements tab, and just as a proof of concept here, Google's website is a little complex underneath the hood, but let me go ahead and right click on Harvard University, and I'm going to choose Inspect Element. And Inspect Element is nice because it's going to jump me right to the part of the HTML that relates to that portion of the page, which is wonderfully useful for diving in deeper to a specific place. So that is the a href that got me that Harvard University link. Now suppose I'm actually Google's designer and we're not quite happy with the shade of blue or the font size or font face, and so I want to tinker with the website, but frankly I don't want to have to log in to the server and change the font size, then save the file, then reload the browser and go through all these hoops. I'd kind of like to do it inline in the browser, albeit without saving any changes. So notice here on the right, if you've not used Chrome before, or the developer toolbar.
Notice on the right, you have a summary of all the styles that relate to that specific element in the web page. So from top to bottom, here's the C in CSS, cascading top to bottom. Here are all of the rules that apply to that element. So, here there's apparently somewhere in Google's CSS, apparently in the file called search on line six, an a link, a .w class mentioned, and all of this is where they're specifying the color of the link and the cursor that should be used over it. So let me just try it for kicks. Let me go in here and let me change this to, let's say, something completely random like orange. So notice what I've done up there. I've changed Google's link to orange, of course not permanently and only for the links in the page to which that CSS rule applies. But the point here is that this is just a wonderfully quick and dirty way of experimenting, especially if you're quite eager and you want to get, like, the pixel alignment perfect in something. Being able to just tweak it ever so slightly here, then figure out what the values are, and then write them in the actual file on the server is just wonderfully useful. Also, if you are trying to figure out what font a website uses, I can go to computed style, because this can be a little overwhelming, like, "My God. There are so many rules that apply to this element because of the cascading nature of CSS." Let me just look at the computed styles, which is the summary of the end result of all of the styles that have been applied, and if I look for font-family, indeed, this is Arial followed by sans-serif. So now I know what font Google is using. So that's just a debugging trick if you've not used it. This and the Network tab we'll end up using quite a bit, most likely. So now back to our own version of Google. Unfortunately, if I type in Harvard and hit Google Search, it doesn't go anywhere, sort of-- it kind of did. Where did it end up? The URL is almost the same.
It's a crazy looking URL because it's obviously a file on my hard drive, not on the internet. But what did change in the URL? Yeah? [ Inaudible Remark ] Good. So the question mark got appended but no parameters got sent, so let me go ahead and take a look at my file again. Oh well, no parameters were sent because I didn't give any of them names. So let me go back and fix this. Let me shrink the font so we can see more at once, and let me do input name equals q here, and let me go back over here and reload the page. And now let me go ahead and type Harvard and now click Google Search. So now we have some progress. So this is interesting. Now unfortunately, google.html is not a website, nor is it dynamic. It's literally just a static file, so you can send any parameters you want and it's just going to ignore you every time. But what if I kind of cheat here? I'm kind of in a hurry to implement my own search engine. I-- You know, I already knocked off my own news site, now I want to do my own search engine. Well, I can actually do this: form action should actually go to, let's say, www.google.com/search, and I'm going to say method equals get, in all lower case here, a slight inconsistency with what we've seen before. And now, let me go back to this page, oops, reload, and now let me type Harvard and watch the URL as I hit enter. Ah, now I have implemented my own version of Google, but how? Right. All I did was construct a form. I specified a method of get and an action that happens to point elsewhere, but because of HTTP, and because the browser knows how to handle forms, it compiled all of the key value pairs, in this case just one, q equals something, put them in the URL, and sent them to that action attribute, to that particular URL. So now we have implemented our own version of Google. Now of course it would be nice if these were our search results and we weren't completely cutting corners here, but that's where we'll need something like PHP to do things server side.
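What the browser did with that form can be approximated in Python with the standard library's `urllib.parse`. This is a sketch of the idea, not of any browser's actual code; the helper name `build_get_url` is made up, and the `http://` scheme in front of www.google.com/search is an assumption added so the result is a complete URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_get_url(action: str, fields: dict) -> str:
    """Mimic a GET form submission: append the named fields to the action URL."""
    return f"{action}?{urlencode(fields)}"


# A form whose only named field is q, submitted with the value Harvard.
url = build_get_url("http://www.google.com/search", {"q": "Harvard"})
print(url)  # http://www.google.com/search?q=Harvard

# The receiving side can recover the key value pairs from the query string.
print(parse_qs(urlsplit(url).query))  # {'q': ['Harvard']}

# With no named fields, only the question mark is appended, which matches
# what happened before the input was given a name.
print(build_get_url("google.html", {}))  # google.html?
```

Values that contain spaces or other special characters get encoded along the way; for example, `urlencode({"q": "Harvard University"})` yields `q=Harvard+University`.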
Let me pause for just a moment. Peter, do you want to say hello to the class? Peter is one of our four teaching fellows; the others are at work right now. Do you want to say an actual hello? Yeah. I have to come near you with the microphone though. Yeah. Why don't you come this way so the camera's a little more readily available? Hello all. I look forward to working with all of you and I will see you on Wednesday. OK. Excellent. Thanks very much. Any questions then about Google, fake news, HTTP? Yeah? [ Inaudible Remark ] Good question. Let's see. Had it not been named q, would this be broken? So let me go back into here and let's just call it query, thinking that's a reasonable name to give a parameter. Let me reload my HTML, let me type in Harvard, enter. And interesting, they support query. So let me misspell it. OK. That's probably not supported, but maybe. Let's see. OK. Let me reload, type Harvard, enter. OK. That just didn't work. So they support query or q, for whatever backwards compatibility reasons. Good question. Other questions? Yeah. [ Inaudible Remark ] Yes. On the entirety of Harvard's campus your sessions can be hijacked. So, if you have malicious roommates, or there are more technical people around you, you are vulnerable to this. Session hijacking can happen anywhere encryption is not used, whether between you and the access point, the wireless access point, or between you and the end point web server. And I should say, especially to those of you who are from out of town, realize that Harvard does have fine print and rules about not doing this to other people. Otherwise-- you can solve this problem by expelling people. That's one way, if you can't do it technologically. So, this is one of these things where we're trying to educate you as to how it can be done, but don't go trying this in the dorms on campus. Wait until you go home, on your own home network. Other questions? All right. So where does this leave us?
So we started by talking about Google, and we keep talking about Google just because it's so popular, but the story applies to really any website out there. So, we talked about DNS and the process of not only looking up a web page's URL, or rather IP address, but also getting your own IP address and getting your own web server and your own domain name. We talked a little bit about HTML forms, which you might be familiar with already, and about some of the tools with which you can get a little more comfortable diagnosing things and debugging things, and we've written some HTML that submits a form. All that we haven't done yet is actually implement a dynamic website. For that we've completely outsourced to CNN and Google today, but one of the first things we will do this coming Wednesday is dive in deeper to PHP and things like GET and POST and sessions and shopping carts, and some of the security implications around them. We'll dive into the CS50 Appliance and this virtual machine environment with which you'll get to play in terms of Apache and PHP and MySQL. So tonight, we'll adjourn a bit early. I'll stick around with Peter for any one-on-one questions you might have, but otherwise, we will see you again on Wednesday, and after Wednesday's lecture, we'll flow right into section and office hours if you have questions about content, even though the first project won't be released for a week or so. All right. See you in a couple of days. [ Silence ]