[ Silence ] All right. Welcome back to Computer Science S-75. This is Lecture 5, SQL, Continued. So this worked out so well last time. I thought we could start with the same question. What did we do last time? Yes. 

Posting into internet SQL. 

OK, good, so an internet SQL. So structured query language. Someone from this side now, what is SQL about? Yeah, Jack [assumed spelling]. 

It's a nice way that we can interact with the data that we can store in MySQL database. 

OK. Nice way to interact with a data that you can store in a MySQL database. And in what sense, Axel [assumed spelling], do you interact with the database? 

Well, you can interact with it because it runs on a port on a server-- 

OK. 

-- and essentially just stop it via a terminal window-- 

OK. 

-- type commands but there is a GUI that's called PhpMyAdmin-- 

OK. 

-- that we use together with MySQL. 

OK, good. So we've seen at least a couple of ways to interface with the MySQL database. You can use the command line or the actual MySQL client, which is the sort of retro black and white window, which is wonderfully useful but also limited just in terms of its screen real state. We also looked at a tool called PhpMyAdmin, which is popular. It's not the only tool but it tends to be popular. It's a coincidence for our purposes that it's actually written in PHP. We haven't looked underneath the hood but if you did start poking around, you would see that that entire GUI is a dynamic website implemented in PHP but there are others. If you're a Windows user, there's actually a downloadable Windows client that you can use natively on your own Windows computer and connect to, for instance, a remote MySQL database. Generally, you cannot, though, connect too far away MySQL databases. The reason being that the communication is not, by default, encrypted in any way and indeed MySQL itself does not really have very good built-in functionality to do the equivalent of say HTTPS or SSL. So it's generally best practiced to keep your MySQL database on the same network as the machines that are actually talking to it. So to be more clear, you could not generally sign up for like dreamhost.com, have MySQL database somewhere out there on the internet and then use like this Windows client to connect to it because everything you've been sending, including your password, in the clear. So be mindful of that. All right. So that's how you interface with it. What kinds of queries or statements did we see last time are possible? 

INSERT INTO. 

INSERT INTO. So this is for inserting rows into a table. We call that MySQL databases or relational databases, which means there are rows and columns. What other statements did we see? 

SELECT and then would you want to select a star, a wildcard star or natural-- 

Good. So we saw SELECT queries like select star from the table name. And that gives everything or you can say select X, Y, Z, that will just give you those three fields and then we introduced the notion of a predicate, and a predicate lets you filter that result set, much like XPath lets you filter the node set that comes back. You can say where user ID equals 1 or where email address equals JHarvard at something dot something or the like. So you can filter your queries and you can even Boolean and/or them together. So you can say give me anyone whose gender is male or gender is female. So you can get back to everyone in that way even though star would suffice in a case like that. So in short, we have this ability to filter our results. So that's SELECT. We've talked about INSERT. There's DELETE. We haven't really used this one yet but you can use DELETE to delete rows as you'll find. What's another? Yeah. 

UPDATE. 

UPDATE. So UPDATE is for actually updating existing rows. This might make sense if a user updates their password, their name, their phone number, whatever the case may be. So, for those unfamiliar, realize that this kind of database follows a paradigm that has kind of a silly acronym associated with it but handy to know. This is the-- literally, the CRUD model for database queries. Anyone want to take a stab at guessing what this acronym means? It's really kind of stupid. At least the acronym but the capabilities that it's describing is actually quite common. So C is for-- Yeah. 

It might be Create, Read and then something and then Delete. 

So close. Yes. What's the something? What's the U? 

Update. 

Update. That's all it is. So Create, Read, Update, Delete. These are the incarnation of this acronym in SQL is what we've already said, create is insert, read is select, update is update, and delete is delete. But you'll find that in other context where you have this kind of expressive capabilities, that it follows this so called CRUD model. So just FYI if you come across that. All right. So, let's introduce a couple of new features now of SQL and then we'll try to take a stab at designing something a little more complex than last weeks user's table. So recall that MySQL has a whole bunch of types. And here we have on the top left, string-related ones, then dates and times, then numeric ones, then some floating point values and blobs. Let see if we can pluck off a few these interesting characteristics of each of these. So when might you use a VARCHAR field in a table? How about someone from this side again? VARCHAR, when should you go for a VARCHAR? Yeah, Louis [assumed spelling]. 

Middle initial. 

OK, so a middle initial. Why? 

Because you'll know like the length of it and see the space. 

OK, good. So it's probably just one character, maybe two or maybe zero. And so, if that's the case, you might not necessarily want to waste space unnecessarily. So initials, not necessarily a very compelling case because we're only talking one or two bytes. So what's another scenario that's a little more compelling for a VARCHAR? 

When you use it when you do not know the length of the thing that user inputs. 

Yes. 

Username or name. 

Good. So for something where you really don't know the length and an initial, I mean, most of the time, it's one or zero maybe two, but I'm hard-pressed to even think of a two-letter one. So for something that's a little more variable by nature, email address or name, then a VARCHAR might be reasonable because then you can say anyone's name can be up to 30 characters but frankly most people's names are a few characters, 10 maybe, if that. So, why waste 22 or so additional characters, bytes in your database, storing them unnecessarily if do not need them. So there's a price we pay, though, for this. What's the downside of saying variable length CHARs as opposed to just a CHAR field? Yeah. 

Searching is slower if it's not on the same size. 

Yeah. So searching is most likely slower even if it's imperceptible for certain cases. In large scale scenarios, we have many, many, many rows of data in your database and you actually want to search those rows efficiently. One of the upsides of using a CHAR, which by contrast is a fixed length field, is that you can generally search that field more efficiently because you don't have sort of this variable length to your cells in that table so to speak, rather everything is fixed length. So just like an array allows you random access, similarly might to be able to leverage the idea of random access in your column if all of the lengths are identical, eight bytes, six bytes, sort of the like. The downside of course of a fixed length is that if your middle initial is just one letter or if your email address is fewer than the fixed length you've come up with, well, then you're just wasting space. So it's a tradeoff, we just do not get anything for free. What about text fields? Why did we introduce those? When are they useful? Yeah. 

I think text was for like longer texts. 

OK, so longer text. 

That were much bigger than what you would typically-- say the length of the VARCHAR, some text would be very much bigger than VARCHAR. 

Good. So much bigger than a VARCHAR, typically. So if it's like in essay fields. So if it's like a college application kind of thing where you want people to upload big blob-- actually, college essays are what, like 500 words-- So probably not a good scenario there, but if it's a much larger corpus of text, maybe it's the contents of a web page that you screen scraped and you want to store all of that data in just a big chunk of text, maybe you would use a text field. And one of the upsides of that, as we'll probably encounter in PhpMyAdmin again tonight, is that you can specify that you want to have a full text index on it, which means you can use Google-like queries to search for keywords in that field, and you don't get those same capabilities in for instance a VARCHAR or a CHAR itself. So what does this mean? Well, typically, when you're using a VARCHAR or a CHAR, you can say something like this, SELECT star FROM users WHERE what might be reasonable. How about let's say WHERE citystate-- oops, autocomplete-- WHERE citystate LIKE, and now I do not necessarily know what their state is going be but I want to search for all states, let's say containing the word let's say "New". I do not know why I want to do this but I want things like New York, New London. I want to get back all of the rows in the table where people are from towns whose names contain the word "new". It's completely arbitrary, this particular example, but how can I express this? Well, there's the LIKE operator in SQL whereby you can pattern-match, and the fact that I've used this percent signs happens to be the wildcard character in the context of a predicate. So in this context, it's not the star, it's in fact the percent sign but this allows me to do pattern matching inside of a CHAR field or a VARCHAR field. However, if I instead make something a text field then I can actually use a more powerful matching expression that allows me to say, "Give me all the things that match this keyword and this one," and then your database engine can even rank them in terms of relevance which might be advantageous if you're building a search engine for your site or the like. But to do that it needs to be a text field. This is probably-- typically the more common one when you're just trying to look up queries. And you'll see this perhaps in MySQL if you actually leverage its search tab, which we didn't use last week, but you'll see it among the various options. All right. So, dates and date times and years. Why use something like a date or date time or year field when clearly, you could implement this yourself with just CHAR fields or VARCHARs? What's the point? Yeah? 

It's probably more standardized and may be more efficient. 

Good. So more standardized and arguably could be made more efficient. For instance, to represent a year, we humans, we typically write out four characters but maybe a computer can do that a little more efficiently by storing the number of years from some offset, 1970. Now, that would be dumb in this case because then you can never store other years but you-- stands the reason that the computer could come up a more efficient representation than you plus it is in deed standardized. You also have the ability in MySQL to call functions that are built in to the database engine itself. So, you can manipulate things like dates and times just like you can actually do in PHP and in other languages. But if you ask for a date, you can get it back in-- a date time rather, you get it back in your preferred format, maybe it's 24-hour time, 12-hour time, you get that kind of flexibility. And so it's standardized and it's better typed inside of your database. So, it's not just some free form string that you're trusting yourself and other developers to actually insert reliably. And in term of numbers, TINYINT SMALLINT MEDIUMINT, INT, and BIGINT. So INT is pretty common to use. Facebook, for instance, use INTs early on for its IDs. But what's the downside of using an INT for things like Facebook IDs? Yeah? 

You eventually get a lot of them might run out of the-- what you can store in it. 

Yeah, exactly. So there is a finite number of INTS, which in this case are 32-bit values, and if they're unsigned, you have as many as 4 billion values which frankly would be a good problem to have if you site has more than 4 billion users, and therefore you've overflowed in INT, but you got to think about and in Facebook's case, people are signing up for fake accounts and they're already had like 800 million legitimate users. So, they're getting up there anyway, and so it was definitely a scenario in which they would not have wanted to use 32-bit INTs long-term just because you're fixed in the maximum number values that you can use there. So, frankly, these days, if you're doing anything where you anticipate a huge number of rows, it's not unreasonable to use BIGINT, which would be a 64-bit value and what kinds of scenarios, well, probably not users, you don't need 2 to the 64 possible users in your system most likely. But if you're writing things out like log files or you're recording transactions of some sort where you have no idea how many widgets you're going to sell in your e-commerce site, or you just don't want to have to bound yourself, it's probably worth spending an extra 32 bits just to ensure that you'll have a higher probability of uniqueness long-term, especially when consider the auto_increment feature. And this is kind of one of the little downsides of it. What was the auto_increment feature of a numeric field? Yeah? 

If you add auto_increment, it's automatically going to add into itself, so you're going to have, say, an automatic unique ID-- 

Exactly. 

-- so you don't have to query the database with the largest number, take it back to the server, plus one, and then INSERT. 

Exactly, so with auto_increment, by contrast, you can tell the database, this field here-- let's call it ID as we did with the user's table last week. You can just say, this should be number and it should start at one and it should forever auto_increment for each additional insertion that I do. And that's fine and you can get up to 2 billion with signed numbers, 4 billion with unsigned, or 2 to the 64 with unsigned 64-bit numbers. But, what you don't get is the ability to reuse numbers. So, if you delete, for instance, a user from that table, their number is not going to get reused. So you could end up getting sparsely-populated data, which isn't necessarily a bad thing, but that just means that even though you might not have 4 billion users, you might have used 4 billion IDs, it just so happens that a bunch of those people deleted their accounts. Now, you could go through and change around people's IDs but if-- I mean, if you do use Facebook, just imagine how many thing would break if you started changing people's IDs, any URLs people had copied and pasted around the world containing a user ID would now be invalid if your re-- changing people IDs around. So that's probably a bad idea. And frankly, it's just expensive to go back and figure out where are the holes in your data, so most databases just kind of forge ahead blindly to higher and higher numbers so at least you have more wiggle room with something like a 64-bit value. So those are some of the gotchas. All right, so floating point numbers doubles are what you would expect in most languages. Decimal is an interesting one and it will come up probably in the context for you of project one which will talk about tonight, where you want o represent money somehow, dollars and cents for at least US currency. What's the problem with representing money in general with a floating point value in any language really, yeah? 

If you ever want to display it and it has a cent value that runs like 90 cents or 80 cents, it cuts off the last zero and it just looks like $12.9 or something like that. 

OK. So, sure. So there is this aesthetic issue involved in floating point numbers where if the amount is a dollar 90, as we would see it on a store, well the zero is not strictly necessary for that number's representation. So a computer, by default, might just do dollar sign 1.9 and that's it. So that problem we can fix pretty easily by just formatting it as a correct string with two digits. But there's an underlying problem whenever you use floating point values, whether it's a float or double, 32-bit or 64-bit real number. What's the problem? Why should you not store money as floats or doubles? Yeah, Jack. 

Rounding issues. 

Rounding issues, what do you mean? 

It's like-- from my experience with floats and doubles, occasionally if you try to store money in or any value in, and you want to get down to the exact value back in the end, you're going to end up with 0.999 and something-- 

Good. Exactly. In fact, there is an infinite number of numbers that you cannot represent precisely using a floating point value, whether it's a 32-bit float or a 64-bit double, and you can think of it is follows, if you have a 32-bit value or even a bigger one, a 64-bit value, there's still a finite number of numbers that you can represent with those bits. However, how many real numbers are there in the world, where a real number, in this case, is something with a decimal point and a number before and after it? How many of those are there? 

Infinite. 

So there's an infinite, right? But if you only have a finite number of ways of representing real numbers in a computer, as you do with the float or double, but there's an infinite number of them, you're going to have to round or cut corner somewhere. So the problem of representing something important, like money, using floating point values, is that you might be trying to represent a dollar 90. But you know what, the closest the computer can actually get using its bits is $1.89 cents or $1.89.9999 cents, which is not exactly right. So, how many of you have ever seen, "Office Space" the movie? Just you and me? Just Louis and me? Or "Superman 3", even less likely? Chris [assumed spelling] saw that one as well. OK. So, if you haven't, this is a good excuse to watch a movie. You can say it's for class. Watch the movie, "Office Space", which is the funnier of the two. And there is a scene in this movie-- this is not a spoiler at all, but there is a scene in this-- it's kind of a spoiler, but you-- by now, you should have seen this. It came out like 10 years ago. So, there is a scene in the movie where these guys are trying to steal money from their company because the company's computers are not taking into account this kind of imprecision of values. So, essentially, they write a program that says anytime there is an amount, like a dollar point 89 or rather, let's say a dollar 90.00001, which would normally be shown as just a dollar and 90. There're still some fractions of cents there, right? There's an opportunity to steal those fractions of cents but the company would never know because all they see on their paperwork is 1.90. So, they write a program to steal all of those fractions of pennies and hilarity ensues as a result. So a real world example of floating point imprecision. Excellent movie called "Office Space." All right. So, blobs, we haven't used them, not really going to use them, but what could they be used for? Yeah. 

Images. 

Images, yes. So, binary objects of some sort, binary large objects. But again, I would generally recommend that if you need to store binary data, you store in a folder on the file system, and then store in your database instead what? Yeah. 

Path. 

Just the path. The file name or the path to it, whatever directory it happens to be stored and plus its filename, something like that. All right. So, here are just a couple of functions that are germane to the data types built in to MySQL. If you go to that URL there, you'll see a whole bunch of others that are built in, things like averaging and summation and the like. But here's just two very common ones, date formatting and time formatting. So long as you have stored a date or time in one of those date time-related fields we looked at, literally date, time, date, time, year and so forth, you can call this function in your SQL query to get back a different representation of that. So, how does this work in practice? Well, instead of doing something like SELECT sales date From-- OK, auto complete, just going to keep doing this to us. SELECT salesdate FROM mytable. If instead, you want to not get back something like-- and actually let me change it to this. If you don't want to get something back that's not very user-friendly like 2012-07-01 at 23 hours, 1 minute, and 23 seconds, like this is what will come back by default, that's not a very user friendly. You could get that back and then use something like PHP to massage it into something a little more user-friendly or you can actually say DATE_FORMAT, passing this in, and then passing in something like Y-M-D to just get back the month, or you could something, like M/D, to just get back month/day and so forth. And you can do something similar with-- you can do something similar with times as well. All right. So, just a couple of functions but realize that functionality exists. But now, let's take a look at one other fundamentally different approach from last time. So, here's a snippet of representative code that actually doesn't do all that much besides connecting to a database, but uses different functions than we used last time. What were the functions we used last week to implement our various log-in examples to connect to a database and select the database and so forth? Yeah. 

You do MySQL and then Connect. 

Good, so mysql_connect. Then there is mysql_select_db for selecting a database. Then there is mysql_query for querying the database. And there's a whole bunch of others. In fact, if we look this up in the documentation, let me go ahead and pull up a browser. Let me go ahead and search for MySQL PHP. That will pull up the manual page on php.net that has a listing of all of the various functions here, and the ones we're are about to get back here, albeit a little slowly every time we try to do this. There we go. All right. So, you can see here a list of most of the MySQL functions. The rest are cut off. And there's more here than we probably are going to care to ever use. But mysql_connect is up there at the top. Mysql_fetch_assoc is there. There is another one, mysql_fetch_array, which allows you to get back in numeric array if you would prefer. There's mysql_fetch_object if you're familiar with object-oriented programming. You can actually have the database, or rather the library, return to you a PHP object inside of which are properties that represent the various cells from a row. And a whole bunch of other ones here. There's mysql_num_rows. We use that to figure out how many rows had come back in our result set and so forth. But there is a downside of using this built-in library in PHP. One is the function names are atrocious. The worst of them is recall mysql_real_escape_string. Now, just because it's got awfully named, it doesn't mean it's a bad thing to use. In fact, not using it is a far worse sin than having use that particular name since you'll be vulnerable to SQL injection attacks, which we talked briefly about last time, but we will look again more closely at the end of the semester when we focus on security. But also, you're doing a couple of things that are little short-sighted. One, you have to-- you're tied now to MySQL. If you ever decide or your new boss ever decides or whatever, someone decides to change the database from MySQL to something called PostgreSQL or Oracle or the like, you literally have to go through and rewrite all of your code or at least do a massive find and replace, and even then, some things are likely to break. So, you've just committed yourself from day 1 to using MySQL. And maybe that's fine and maybe 9 times out of 10 that's fine, but for the other scenario is it really worth the aggravation down the line? But more compellingly too, it's so much easier to make mistakes, right? If you-- as soon as you start writing code for project 1, which involves using a database, the very first time you forget to call mysql_real_escape_string, that's all it takes for your site to get compromised. And at that point, it would be so nice if instead someone else could do the scrubbing of data for you so that the burden is not on every one of us in this room to have to call this particular function every time we want to protect ourselves against theses kinds of attacks. Better would be to use a library that does that kind of escaping for us. So enter into the picture something called PDO, portable data objects, which refers to a code, like this, that abstracts away the detail of what kind of database you're using. And by that, I mean, you can use the exact same code to talk to MySQL, to talk to Oracle, Microsoft Access even SQLite, which recall is just a little binary file that you store locally on disk. So you have this layer of abstraction now so that the code you write doesn't have to change at all if you do change your database. The only thing you have to change is a variable or an argument that you pass in up here in the top, for instance. Notice that I've said a variable called DSN, by convention mysql:dbname=lecture. So that's really no different from last time, host=127.0.0.1. And the only part of that line that I would have to change if I move to something, like Oracle, is the one mention of MySQL. So, essentially, that's a unique string that identifies for this library what type of driver to use, MySQL, Oracle of the like, and then what the data base name is and the host actually is. And then the user and password have nothing to do really with the type of database you're using. So, what do I have now? So for those unfamiliar with tries and catches, this simply has to do with exception handling, so this doesn't have anything fundamentally to do to this-- with this idea of using this library, but the way it works is that when you call a new PDO and the jargon here is instantiating a new object of type PDO. You pass in a couple of three arguments, the connection string, user, and password. And then if something goes wrong, rather than returning false as is the case with a lot of PHP functions, rather than triggering an error which generally forces your program to die, it instead does what's called "throws an exception". And if your code has not been designed to catch any such exceptions, your program will quit, right then and there, crashing on the user effectively. So instead, if we say "prepare to catch" a PDO exception, and if one happens, let's call it $e, at least, you can, in a slightly more user-friendly way, handle that kind of error. Now what might the error be, username or password is wrong, MySQL is down, something like that, but this line of code essentially, becomes our equivalent now, of mysql_connect. Now, beyond this, and you'll see in section tonight as well as with this tutorial which we refer to in project 1, pretty much the-- all of the lessons from last time and all of the lessons from tonight about SQL itself still apply. This doesn't change SQL, this doesn't change your selects or your updates or your inserts or deletes, all it changes really is how you initially, connect to the database, and it also just changes what function you calls. Instead of mysql_query, you're instead going to execute a function called EXECUTE or another one called PREPARE and the nice thing about PREPARE is that it creates in database terminology, which called a prepared statement. This is sort of like a compiled SQL string, so if you have query you execute a whole lot, select star from table where foo equals bar. If you're constantly executing that string-- that query, but maybe only one part of it is changing, maybe the value you're searching for is what's changing, so select star from users where ID equals something. If you want to prepare that query but call it again and again and again, and every time you call it, you just want to change what? The ID, that you're searching for, you can call PDO's PREPARE function and this will essentially optimize that query to just have one part of it changed on each iteration. And the other upside is when you plug in an ID to a prepared statement or really plug in any value to a prepared statement, guess what the library does for you for free. Yeah. 

It escapes it. 

It escapes it, so you no longer have to think about any of that. So understanding it still good certainly, but you no longer have to worry about trusting yourself to write a code that's 100% correct when it comes to security. So again, let's see some examples tonight in section. This is what we'll have use in project one, but realize this is just a layer on top of those MySQL functions that come built into PHP. Now there's the PDO library which just solves a couple of problems that you might, otherwise, run into. All right so last time, we had life pretty simple. We only had one table called "Users" and what kinds of fields do we have in our Users' table? Yeah? 

Username and password. 

Username, password and what else? 

And we have user ID. 

We eventually added a user ID. Why did we introduce a user ID, yeah? 

To be the primary key. 

We wanted a primary key-- a primary key, which you uniquely identifies every row in the table, but why not just use username? 

Because username may be duplicated and maybe the unique one. 

OK, well let's assume that usernames, by design, will be unique just like the numeric IDs will be unique but what was better about using an integer for the ID, the unique identifier, as opposed to just trusting that usernames are unique? Yeah? 

The [inaudible] will be sending a lot more data if you send the username. 

Yeah, exactly, it's really, a performance thing. If users can have variable length usernames, three characters, four, maybe even 12 characters long, if you want to look up a user, that means you have to search for 3 or 4 or 12 characters. By contrast, if you also assigned users a unique numeric identifier like an INT or BIGINT, then you are spending exactly 4 bytes of maybe 8 bytes on every user, so you again have a fixed width table effectively and it's just much more efficient in general to search for integers, whether they're 32 or 64-bits, than for arbitrary length strings. So that's the reason we ultimately introduce the ID field but the problem too, is that we started telling the story about normalization, and we started talking about, "Well, what if users have cities and states and zip codes?" There's obviously going to be some redundancy, and so we propose factoring out what, from a user's table, if we also want to store their addresses, cities and states and zips and so forth. Well, how did we solve that inefficiency? Yeah. 

We store the zip codes instead of the actual address. 

Yeah, good. So rather than store Cambridge, mass, O2138. Cambridge, mass, O2138. Cambridge, mass-- so it's already getting boring to say and imagine just storing that again and again and again for every user, we can do better than this. Well, what do we minimally need to identify user's locales, probably just the zip code. So now we could have a zips table that has city, state and zip as its columns, but the only thing we put in the user's table is which of those three fields? 

Zip code. 

The zip code, exactly. So, that's nice but it feels like the price I'm now paying is every time I want to look up a user and figure out their address, now I have to do a select on the user's table and also another on the zip's table. And my God, imagine if we did this with other pieces of data as well, all right? If we have other common pieces of data, would we have to now select from a third table or fourth table? It feels like we're solving one problem but introducing others, namely ones related to performance. But thankfully, relational databases allow you to join tables together. So let's take a look at this as an example. So at top right here there're a couple of tables. This is taken from a site called w3schools.com which has varying degrees of accuracy but this particular example is decent to work with. We have two tables, employees and orders. And in the employee's table we have a field called Employee_ID and then another field called Name. So it looks like Ola has an employee ID of 01, Tove has one of O2, Stephen of O3, and so forth, so nothing too unreasonable there. And now under Orders, we have a product ID, a product, and an employee ID table that apparently is specifying who sold these products. So it's just some kind of log, maybe it's for commission so you know whom-- who to pay and how much to pay of who sold what. Well, let's start finding some faults with this design and see if we can't fix but let's also consider what they did that wasn't bad. So at the top, the employee's table, if you were doing this table, what would you have done differently? Isaac [assumed spelling], that counts as a hand up. 

Well-- 

Find faults with the employee's table at the moment. [ Inaudible Remark ] Sir, that's actually-- I forgot your name. 

Scoot. 

Scott, yes. Find faults with the employee's table. [ Inaudible Remark ] OK, good. So we might as-- we have a database, right? We might as well tease apart, first name and last name because storing it with just the commas is just completely unnecessary. It's as though we've created a fake column by sort of merging together the idea of a CSV for the database table, why not just put last name field and first name field? So we could do that. So that's pretty good. All right, and what else? Well, for some reason, these employee IDs are written as 01, 02 which suggests that they're using like a CHAR or a VARCHAR. And that's just dumb at this point, right? It should just be a number and if it's an actual number, it would suggest that there's not a prefix of zero but now we're just kind of inferring from the example. But that could be a potential fault. So now, what about orders? Well, it looks like here that what they have done which is good, is they have not put the employee's names in the order's table. They've just put the employee's employee IDs. So this is similar conceptually then to not putting city, state, zip in a user's table just putting zip. So that seems like they made a smart decision, what was kind of stupid though, nonetheless? Find fault with the order's table now. Jack, that counts as a hand up. Everyone has to stop scratching their head. Find faults with the order's table. 

I don't really see much fault in the order's table, you know, except for maybe that old one, like the employee IDs or being 01, they have 01-- 

I'm going to decline that one. So no, I can't reuse that idea. There're still faults here though. Yeah. You could have an order ID. Yes, so we have no way of uniquely identifying these orders and maybe there's no use case for it but that does seem inconsistent with the idea of having employee IDs and product IDs. And even beyond that, there's still-- there's pretty bad fault in the order's table. Axel. 

I don't know if this counts as a fault but I really wouldn't order the tables from the employee ID and order from the product ID [inaudible]. 

OK, order the table from product ID. But isn't that what they've done it on the left? 

Yes, but what about to the 35 and 36? 

Oh, to the 30-- 

That's just arbitrarily. 

I see. So let's-- it is an arbitrary example and maybe those were also got deleted because they were canceled even though that's kind of a lot of cancellations. Jack. 

I figured it. They have product ID, and the product stored in the table while-- yes, they have a product ID they can store that somewhere else. 

Exactly. [ Inaudible Remark ] Exactly. So whoever made this table kind of got it half right and they did factor out the employee ID and put only the employee ID here, but then they sort of forgot that lesson learned and seemed to be duplicating product name, product name, product name, product name, and you can't see it here but suppose another person orders another printer. Well, what's going to end up in the next row? Well, it's going to be a product ID of 234 but then the word "printer" in the second column and then whoever the employee ID is that's sold on that printer. So we probably should introduce another table. What should that table be called reasonably? 

Products. 

Products, right? So like a product table. Inside of which is a product ID field, and a name field which then suggests that how should we fix the order's table, what should go? Yeah, Isaac. 

Just product ID. 

So just keep the product IDs and ditch what? 

And take out then product. 

Exactly, ditch the product. Now, if you continue this kind of normalization as it's called, your database starts to get very cleanly designed. And by clean, I mean, there is only one authoritative place to find out what the name is of a product. You check the products' table and there's only one mention of printer or table or chair. It's not duplicated all over the place. And this is good obviously if you just change the name of something, you wouldn't want to have to go through all of your various tables looking for all of the redundantly named things just so you can update them. So that's good, but again, it feels like I'm creating a huge amount of work for myself now because whereas in a simple world, I just select star from one table, now I have like three different tables. And how do I select data simultaneously so that the data I get back is representative of a given moment of time and not from this second followed by another second followed by another second, whereby your queries themselves might be some number of seconds or milliseconds apart. How do I get a snapshot in time? Well, we can actually do this all at once. We can do this with a couple of different syntaxis. So let's try this. Let's assume for the moment that this is good enough even though the faults we already found, if we assume these table structures, though, how can we go about querying data and getting it back all at once? Well, here's some slightly new syntax and I've written on three lines really just for readability. SQL doesn't care. It only matters when-- that the whole expression is syntactically valid. So SELECT Employees.Name,Orders.Product. Now, what does this mean? Well, Employees.Name, as you can probably infer, is referring to the name field of what table? Employee. So that's all. The dot notation does exactly that, and we saw the dot notation actually [inaudible] last week. Orders.Product, same idea, select the product name from the order's table. Where do you want to select those fields from? It's a little-- it feels a little redundant but this is just the way it is. You then specify what tables this query is selecting data from. So you have to say, from employees comma orders. And it doesn't matter of the order in which you say them but you have to say the table names that you want to involve in this query. But now I'm doing this last predicate. And this is definitely more involved than the ones we've looked at briefly thus far where Employees.Employee_ ID=Orders.Employee_ID. In English, what is that predicate doing for us? Yeah. [ Inaudible Remark ] Exactly. It's creating a cross reference of sort between the two tables namely employees and orders and it's explaining to the database how to join those two tables together. Now, I can sort of simulate this with my fingers. Let me pull up the data over here. What is obviously the one common field between the employee's table and the order's table? So it's the Employee_ID, right? That's the only that's in both. So if we again kind of do the little physical depiction I did this with my fingers last week, suppose that here are the employees and supposed that these are the employee IDs, the tips of my fingers here. And now the order's table also has employee ID and it's the tips of my fingers here. Effectively, what that query is doing by saying match Employees.Employee _ID=orders.employee_ID, it's saying to line up the employee's ID. So if this is 001 on the left, it's 01 on the right. If it's 03 on the left, it's 03 on the right. So you kind of join these things together using identical employee IDs and what does this whole query return? What returns a result say which recalls a fancy way of saying a temporary table, that table looks like this where these fields have been conjoined somehow and each row represents the combination of some data from one table with the other table. So, let's actually try to see this in practice. Let's go and recreate then we'll improve upon it. So let me go over to phpMyAdmin in the appliance, and I'm going to go ahead and create a new database. I'm going to go ahead and call this jharvard_Monday just for kicks today. So now we have a clean database. We won't commingle it with last time. Let me choose the Monday database. And now let's create a table. Now, in general I would not do what W3Schools did here with the capitalization, but let's just keep it identical. So we'll create an employee's table, and how many columns did it have? Two. So let's do that and then the first one was called Employee_ID. It could be an INT but let's just do BIGINT just to be different this time. And then it had a name field which should be of what type, probably? Yes. So let's go in the VARCHAR, and now we have to make a design decision, how many characters should the name be? 

Fifty. 

Fifty? OK. So let's-- 50. Something reasonable but be consistent in general probably. And now what else do we want to choose here? Should either ID or employee name be null? Probably not, I want both really in my database. Index, should employee ID be a primary unique index or a full text? 

Primary. 

Primary. Why, because it hopefully uniquely identifies the user. And we're going to kind of clean up 1 to 10. We're not going to have prefixed leading 0s in our employee IDs. Let's actually use a number like a BIGINT. How about the name? Should we make the names unique? [ Inaudible Remark ] Right, there's a lot of like Mike Smiths in the world. It would kind of be unfortunate if we can only hire one at our company so we probably should not impose that. When might I want to define an INDEX on a field? So it's not unique INDEX it's just an INDEX. What was the advantage of defining an INDEX? I didn't really talk about this but just think instinctively, why would you want to index, so to speak, a database field? 

For searchability 

For searchability, for performance, right? If a field is not indexed and you want to search on it, you essentially have to do linear search. You have to search the whole column to find the values you're looking for. But if instead you as the database designer, say index this field, you're essentially providing a hint to MySQL that you or someone will maybe want to search on this field, so please do some effort upfront to optimize those future searches. And it will create some kind of tree structure, it's generally called the B-tree structure that gives us more like logarithmic search time instead of linear search time, which for those unfamiliar, just means it will be faster. How about AI, which is just a shorthand way of saying auto_increments? Should name or ID be auto_increment? Probably ID if we don't want to have to deal with managing those ourselves. In terms of defaults, we probably don't want to give people default names or anything like that, so let's leave everything else alone. Lastly, let me mention this. We'll come back to this. We didn't choose this last time but notice under database engine, there are different engines you can use. And I made the comparison with like NTFS and HFS+ for file systems, if you're familiar with those words. But for now we're going to leave it on the default InnoDB, but later we'll see what the actual implications are of not giving that choice much thought. There will be some tradeoffs. So let's save. All right. So what query was just executed? Again, here is one of the upsides of phpMyAdmin. It kind of teaches you as you go by telling you what query it actually executed to do your bidding just a moment ago. So we'd executed to create table statements. So now, let's go ahead and create this table. And let's go ahead and choose create table, orders. How many columns does this thing have? There's three at the moment. And we'll just deal with the imperfections of it now. In those three columns, where what, product ID? So, Prod_ID. And then we had product. And then we had what, as the last field? 

Employee_ID. 

Good, Employee_ID. All right, so for product ID, what should this guy be? Let's go with BIGINTS. So if we're going to use BIGINTS everywhere, let's just use them for this guy too, product should be what? 

VARCHAR. 

VARCHAR. How long should the products name be? 

Depends on what you're selling. 

Depends on what you're selling. We are selling printers, tables, and chairs. 

Twenty. 

Twenty, all right? The marketing people will be upset with you when they come up with the longer word. But that's OK. Employee_ID should be what? 

BIGINTS. 

BIGINTS for consistency now, right, because this is as we'll call it, a foreign key whereas in the other table it was a primary key. All right, we'll leave blank there. Let's see. Null, should any of these fields be null? Probably not, unless you want no one to be able to solve these things. Index, which should the product ID be? 

Primary case. 

Primary case. So this one is kind of a trick question, so I would argue that primary key is not appropriate here, but why? Here's the-- here are the tables again. Yeah. 

It's not really that one search word, you want to link the product to the employee. 

True, but I would argue I would want to search on product ID. Suppose I want to know how many chairs have we sold, I could do a select star from orders where product ID equals 865. So, searching on is compelling, but primary key, primary key means that uniquely identifies the row. So what would the implication be of making product ID unique in a primary key at that, in this particular table? Yeah. 

If you had more than one chair and your table which you're probably going to have [inaudible]. 

Exactly. We could only ever sell one printer or one table or one chair, or rather in individual employee could only sell one printer or one table, or one chair because again the definition of primary key must uniquely identify a row. So, if you have another-- if you have employe number 1, if Ola sells a second printer, what should the row be? It should be 2, 3, 4 printer or 1, but that would violate the primary key constraint. So really, someone else offered up the solution earlier, what field is really missing from the order's table that should be there? [ Inaudible Remark ] I might be just call it order ID. So something that uniquely identifies the order. And if we did introduce that, I would propose making order ID the primary key because then you can start it one and two and three and for everything you sell, you can increment the order ID. So, what might we want to do then with the poorly designed table that we have here? Well, let's go ahead and-- I'm going to propose indexing it. It's not unique but this way I can search on it, which reasonable to want to search on an ID fields. Now the next field was Name. Should we have any keys on the product name? 

No. 

OK, I heard a no. Anyone who want to argue the opposite, and then well flesh out which is best? So I'll play the contrarian. So I'm going to propose yes, I do want an INDEX on here. Why might I be taking that position no? And again the field in question is the name of the product. Yeah. 

We want to search for all tables that were sold? 

Yeah, what if I want to search for all tables that were sold? So I'm, you know, the salesperson. I don't really remember what the number is for that product but I want to check how many chairs have I sold or my boss wants to check how many chairs have I sold, and maybe we have different types of chairs. Maybe we have products called big chair, small chair, white chair, black chair, you know, any kind of longer product name that has the key word chair but I, like a person just using a computer, I just want to do the equivalent of like a Goggle search. I want to search for chair. Now, what's the SQL keyword that we can use to do those kinds of keyword searches? 

Like. 

Like. So we saw like earlier. So actually, the like expression here would probably be "%chair%", because that would find me any products that have the word chair in them. Now that could fail if there's some weird word out there that has chair as a prefix or as a substring, but it's probably good enough for our purposes here. So why might I want to index then the name if I want to search on it? Now, maybe I do, maybe I don't but if I do I should have that as an index, but not unique because then I could only share-- sell one chair for instance. Now lastly, employee ID, should employee ID be a primary key, unique or index? Full text isn't relevant because it's not a text field. Yeah. Exactly, probably you want to index it because it is an ID and we want to not only search on it here, we also want to do the join thing. And in fact, anytime you're joining one table with another, the field or fields that you want to join on should be defined as having indexes for performance, otherwise, again, it's a very expensive operation. If you remember-- what's your name again? 

Ben. 

Ben, OK, trying to get them all now. All right. So, let's finish this up. Let's make this an INDEX. And should any of these be auto_increment? 

If we had like a sales ID that's-- 

Good. If we had, let's keep calling it an order ID. Then yes, that would be reasonable if order ID uniquely identifies that order. But as it stands now, no, I don't want this auto incrementing because I'm going to deliberately specify who sold what and what that ID actually is. So let me go ahead and click Save. And now let's go ahead and look back at this query. So, when I SELECT Employees.Name, Orders.Product FROM Employees, Orders WHERE Employee_ID in one table equals Employee_ID in the other, I get back this sort of visual effect, and what does the temporary table look like? Well, if we kind of do that query, we get back this table. So, it's a temporary table, it's my result set, it has some number of rows, in this case, three and I have all Ola and Stephen and all of the items that they sold. But this is a little confusing to me. I thought I have colleagues named Tove and Kari also. Why are they not in my table? What's the mistake or problem here, Isaac? [ Inaudible Remark ] Yeah. This isn't a bug, right? It's just that neither of them is selling very much right now because look at the orders table, who sold things, employee ID one and then three and three sold two things. So, Tove and Kari haven't sold anything. So this is indeed correct, it just so happens that only Stephen and Ola have sold something from among the products. So, there's another syntax now we can use to do this join. This is what's generally called an implicit join, why is it implicit? Well, I have nowhere used literally the world join and you're just kind of implicitly saying create a new table that's the result of joining these two tables by a way of the where clause that I had there. But we can be more explicit here and we can do something like this. Notice this is after, before, after, before. So, here's the after version, what's different? I still say SELECT Employees.Name, Orders.Product but this time, I say From Employees JOIN Orders. So, I explicitly say, I want to join employees and orders but how do I want to join them, I have to specify on Employee_ID equals Employee_ID from each of the respective tables. So, the end result is identical and it's really up to you as to which one is more clear. Frankly, I almost always go with the join syntax like this just because it's super explicit as to what I'm joining on what and it's a little more clear to me what's going on. And again, the white space where I fit enter is meaningless. I just did this for formatting reasons on the screen. So that you've heard it now, there's other types of joins. There are left joins, there are right joins and outer and inner and now what the relevance is for us is it won't really affect you with project 1 but sometimes there are corner cases, like what if you don't have-- suppose that you-- let's see, who is number three. Suppose that Stephen was fired, he no longer works at the company. So you go ahead and delete him from the employees table. You probably don't want to lose his entire sales history because if you're trying to balance the books, you want to know what you sold and who sold it even if that person is not there. But the problem, though, is that if Stephen is not in the employees table anymore and you do a join employee ID, those rows are going to disappear. They will not appear in the output and we won't know that Stephen sold both the table and a chair. So, by using left join or right join, you can essentially say which of your two tables that you're joining should carry a little more weight. So, in other words, if in this case, I did a right join, now which is the table on the left and which is the right? It's not quite obvious from the way I've formatted the text here but notice I said FROM Employees JOIN Orders. If you write that out in a sentence, which one's on the left employees and then orders. So, employee join orders, if you write it out, you have a left table and a right table, even though it might happen to be on separate lines. So what's the implication then? If I actually said FROM Employees right JOIN Orders, that means that every relevant row in the orders table should be included in the result set. And if it just so happens that there is no corresponding employee ID in the employees table because Stephen was let go, that's OK, still give me a row for all of the stuff he sold but put no there and no there. You don't know who sold it but you don't want to lose track of the fact that it was sold. Now, by contrast, a left join would not solve this problem. But if something were missing from the orders table then maybe you do a left join. But the idea here is that you bias it toward one table or the other just so that you don't accidentally drop some rows. All right, any questions on joins and we'll try to make this more near term relevant when we discuss later today project 1. All right. So let's go ahead and take a five-minute break but the teaser for now is we're going to come back and talk about milk and going to the store and putting in a refrigerator. So hopefully that will get you to come back all excited. Take a five-minute break. All right. So, consider the following scenario. You have a roommate and you have a refrigerator and you both like milk and you open them-- one of you gets home one day and you open the fridge and you're out of milk. And so, you close the fridge, you head outside, you walk across the street to CVS and you get in line to buy some milk and the lines at CVS these days are always ridiculously long because they have those self-checkout machines. So it's perfect. Because now your roommate comes home while you're still waiting in line at the self-checkout machine at CVS. He or she discovers you're out of milk, he or she closes the refrigerator, walks outside, goes to the supermarket instead of CVS, so you don't actually cross paths. Then some number of minutes later, you both get home and voila, you now have twice as much milk and milk spoils so this was not a very good plan because you like milk but you don't really like milk so it's not like you're going to drink two gallons of milk. So now, you've wasted some money in some milk. All right. It's a horrible problem, right? So how do you solve this in the real world? How do you avoid getting twice as much milk? Yes? 

I mean you could just write a note on the fridge that you've gone out to buy-- 

OK, good. Right, so it's totally reasonable decision. If you are the first one to come home and you realize there's no milk, just leave your roommate a note, right? It's a courteous thing to do. You'll go get the milk. He or she will appreciate it. And so when he or she then comes home, opens the fridge door and realizes, oh, there's no milk, he or she does not take it upon himself to go to the store as well. So this is good. A note is very reasonable thing and we can actually kind of ramp up the metaphor a bit and instead of saying like leaving a note on the fridge, you could be really kind of crazy and just lock the fridge altogether if you're the first person, right? Because the problem here fundamentally is that both of you are independently checking the state of the refrigerator. Now, you start thinking of the refrigerator as a variable that has some value and that value is either zero or one there's milk or maybe it's a number that indicates how many ounces of milk you have, but either way, it's a variable, both of you at the moment in the first version of the story are independently checking its state but the problem is that your operations of checking the variable's state and updating the variable's state, in other words, opening the fridge and looking and then, buying more milk, those operations are not, atomic. They can happen but there can be interruptions in between those two operations. What's the interrupt here? You check the variable's state, you then start to go for milk, your roommate then checks the variable's state and you both have correct views of the world, but you, the first person, have made a decision based on that variable's initial state, by walking across the street to CVS. So when you come back is when the problem ensues. So again, a note will solve this problem assuming your roommate reads it and if he or she is not really paying attention to the post-it notes, you could literally put a lock on the refrigerator. What would that prevent? Well that would physically prevent your roommate from checking the value of that variable. Now, it could be a little awkward because if they're really determined to have milk, you're going to come home and they're still going to be holding on to the door because you've blocked them, so to speak. They're in what's called deadlock mode if we really abuse this analogy to computer stuff. So they're in deadlock because nothing can actually happen there or they're spinning and waiting for the lock. Deadlock usually involves two people. So, we can solve that problem in the real world, but now, let's consider a related problem in the world of databases. So, the same scenario arouse in conversation last week, when we were talking about registering for a website or buying something from a website. For instance in a simple scenario of registration, suppose that your name is Mike Smith and someone else's name is Mike Smith. And just by bad lock, both of you hop on the internet one day and tried to register for the very new facebook.com, the next big thing. And you really want Smith as your username or Mike or M. Smith. Whatever the case may be you both what the same username. And suppose that by bad luck, you're both sitting at your laptops and you both hit enter at the same time, well, what happens? Well, those HTTP requests go to the server, the server has a database now, so there's some code that passes those username requests off to the database and what's going to happen? Well even though, we like to think of computers as doing dozens of things at once because they seem to be. Really, they're typically doing one thing or maybe a finite number of things simultaneously. And in this case let's consider a simple story whereby your two requests might have arrived at the same time at the server for the username Smith, but one of you has got to win, right? One of them will be serviced ever so quickly before the other one. So what's the problem? What if that first, request is, is Smith available? This guy checks, but then the computer again needs to maintain this illusion of parallelism, so as we'd said last week this thread, your request is going to be put to sleep, maybe for a second, probably, just for a split second at which point the other person's thread is going to be a awoken and it's going to check, is Smith available? The answer to that query is also going to be yes if you just use a simple MySQL SELECT statement, right? You might select. Select star from users where username equals Smith. If that returns zero rows, this guy knows it's available. If that query returns zero rows, that means this guy can have that username as well. So suppose you issue your SELECT statement first. But then in your next line of code is when you actually do your insert. How do you create a new user, you insert into that table. But what's going to happen? So the first user has his choice of username, Smith, inserted. The second one, what happens? Now there's a collision, right? And so the MySQL query function or the corresponding PDO is going to return some kind of error. And yet why should it be an error? You told both of these Michaels that Smith is available so this is an incorrect result, quite bad. Now instead, let's assume a more compelling scenario, right? OK, so what, the second Mike Smith didn't get his favorite username, so not a huge deal. But now consider the case of money or ATM. So if you want to go to a cash machine and suppose that you're being really crafty, you're a bad guy, and you've somehow figured out how to duplicate your plastic ATM card, that's not hard, it's just a magnetic strip. And suppose you go up to an ATM and you cover up the cameras so there's no corner cases, and you put in the two ATM cards simultaneously and you log in and then you do something like, I want to do a withdraw of $100 enter simultaneously on these two machines. How is this story similar? Where in lies the same problem? Axel? 

When the ATMs checks the account for balance or whatever, it's going to return the same value to both ATMs. 

Good. 

So essentially it's going to withdraw the same amount from the same-- from the same balance and the end result is going to be the same. 

OK, exactly. So if it's really the same kind of story, it's just the variable you're checking now is your account balance and suppose that you only have $100 in your account, you being the bad guy, but you've executed this query simultaneously. So just as in the case of milk, just as in the case of the username checking, both ATMs say select star from, let's say, accounts where user ID equals 12345, whatever, that is the unique identifier on my ATM card. And what I'm selecting this time is balance and the balance comes back and I say, "Oh, this user has $100" but both ATMs get that same answer. So what does an ATM do then next? Well, if there is $100 in the account, what should it do? It should dispense $100 from both machines. So now you've withdrawn $200 even though you only had $100 and somehow your account now is at negative 100 or worse, zero, right, if the math just kind of works out that way and they're using an unsigned insert because it doesn't really makes sense for a bank to just give you more money than you actually have, unless you actually signed paperwork for a loan. So in the worse case from the bank's perspective, they have just given a guy who owns $100, $200 and they don't even know it because the account balance is now zero. So how could we have solved this problem? How could we have solved the username problem so that the expected outcome is indeed correct? Yeah? 

Lock the fridge. 

OK, so lock the fridge. So what does that mean in the case of the ATM? Well, you just have to ensure that after you ask a question like how much money does this guy have, you then perform the deduction before answering any similar queries for other ATMs for that particular guy. You somehow need to make your two operations of checking balance and withdrawing money atomic, so to speak. Atomic in the sense that it's like a very small particle, it all happens at once together. So how do we do this? We don't really have this way of-- we don't really have a way of doing this in code just yet because in PHP, recall, when we were doing all those log in examples last week, we called MySQL query and then MySQL query in different places of code. You want to somehow be able to execute two queries simultaneously. Now, you saw semicolons in use for the MySQL command line client, the black and white interface. But it's not enough to just separate your queries by semicolons and send them both to the database at once because they can still get interrupted. You have to be more explicit when you want your database operations to be atomic and there's a couple of ways to do that. There is this handy syntax in MySQL where you can do the following. This doesn't quite solve the ATM example but it does solve similar problems and that you can express yourself as follows. INSERT INTO table, whatever it's called, columns a, b, c, what values do you want to insert for a, b, and c? Let's arbitrarily say the value is 1, 2, 3. However, if one of those fields, say, a is defined as a primary key, unique key, and if you are trying to insert a duplicate key, this syntax let's you say, "Mm-mm, don't do that. Instead, just update one of those field's values." So what's the implication here? Let's actually take an example like let's say ordering something-- let's come up with a good one here. Let's actually use stocks, stocks symbols. So suppose you're implementing a website for keeping track of what shares of stock someone owns and you just want a total count. How many shares of Google do I own? How many shares of Microsoft do I own and so forth. So in whatever table you're using to store a user's portfolio, you simply want to have one row for Google maximally and in another column next to Google, you want the total number of shares. And if it's one, that's fine, but if the user buys some Google shares today and then buy some more tomorrow, you want to update that same row. You do not want duplicate Google rows just because there's no need. It's inefficient. You can just update that particular value in this version of the story. So with this kind of query, you could say exactly that. INSERT INTO table and then in parenthesis "Google" or whatever its stock symbol is, comma something, comma 1 where the 1 is the number of shares that you're trying to buy. If, though, there is a duplicate key, well, what's likely to be the unique key here? Goog, G-O-O-G happens to be the symbol for Google. So if you already have a row in this table with a key of Goog, what is this code telling it to do? It's instead saying, "Don't insert a new row with the values a, b, c. Instead, just update the existing row c value by incrementing it by one." So in other words, this is one way of expressing with a new query that we haven't seen before with the ON DUPLICATE KEY syntax essentially checking for the presence of a row or a value and then acting based on that check all atomically, all at once. By contrast, if you had to do this manually, you would need to do a select to first find out if there's already a row for Google, then you would do an update or an insert. This allows you to collapse a select followed by an update or an insert into just one query. So that's one way of doing this but unfortunately that is not enough. So thankfully, databases typically support transaction these days. So InnoDB is now relevant. The storage engine that we chose earlier for my database was InnoDB by default. That just means that's the format in which all of my data is stored. So what does that really mean? Well, that means I have a feature called transactions that I can use on that database engine type but not on others. So how do you use transactions? You can do this. These four lines represent four separate SQL statements. The first, START TRANSACTION and you can also say begin transaction, they're synonyms. That just means here comes a sequence of SQL queries that I want to execute atomically. In other words, execute all of these together or none of them do not interrupt anyone of these steps with someone else's queries or anything like that. Put a lock on these rows. So UPDATE account SET balance = balance - 1000 WHERE number = 2, UPDATE account SET balance = balance + 1000 WHERE number = 1. What's the context here? It's kind of arbitrary. But imagine that I'm just trying to transfer $1000 between two different accounts. One of those accounts, account number is one, the other account number is two. So this is one of those scenarios much like the ATM withdrawal that I want both the deduction from one account and the deposit into the other to happen together or not at all, otherwise I'm going to end up losing $1000 accidentally or gaining $1000 accidentally and not just moving $1000 from one account to the other. COMMIT, as the name suggests, means commit this transaction to the database. In other words, execute. So by saying START TRANSACTION, you're essentially telling the database to buffer up a few SQL statements. Once you do COMMIT, it executes them all without letting anyone else squeeze inside there. What's also nice is that there's a command we can use that's sort of the opposite of COMMIT, it's ROLLBACK. So this is another powerful feature of InnoDB. Whereby suppose we do the following, start transaction, update one account, update the other account and now you do a check, select the balance from the accounts where number equals 2 and then in my-- with my hash sign there, I'm just proposing that we have some PHP code. It's in the comment because I just mean it to be pseudo code. Suppose that I screwed up and after selecting the balance from account where number equals 2, now I realize, shoot, now the user has gone to a negative $1000 or negative $1 even. I noticed a problem and I want to undo every one of the steps that I just executed whether it's three steps or two or ten. I want to undo these changes. It's like hitting command or control Z, Z, Z, Z, Z when you really screw up in an essay you want to just go back, back, back, back, back or in some programming code you're writing. So ROLLBACK undoes everything up until the most recent transaction. So it's wonderfully powerful and allows you to try to achieve some result. But if you can't, you don't have to figure out how to fix all of your various tables and cells based on the changes you made thus far. So wonderfully powerful and it's also quite efficient and it's in contrast to something called locks. So MyISAM, for many years, was the more popular default engine for MySQL. The upside of MyISAM is that it actually tends in certain benchmarks to be faster than InnoDB especially for selects and sometimes writes, but you don't get the same property of having transaction support but you do have locks. In MyISAM tables, which is just another option from that dropdown, essentially the data then is stored in a slightly different format. But if you want to achieve the same idea, you have to literally lock the whole table. So the syntax for that which is less commonly necessary now is LOCK TABLES account WRITE. Now, what does that mean? Account is the table name. And what kind of things do you prevent people from doing? From writing. So you can think of this is as like the padlock on the fridge. You're preventing your roommate from writing to the fridge by inserting more milk into that variable. So this just means don't let-- anyone else can read the table, I don't really care, but I don't want them to write to the table here. So as now, I'm going to select the balance and then I'm going to go ahead and update the balance, this time equal to 1500. And then I'm going to unlock the tables. So in effect, you can do the same thing. You can say execute all of this stuff and don't let anyone get in the middle. But the problem with locks in MyISAM is that you literally lock the entire table which means even if someone wants to check someone else's account balance or someone else wants to make a deposit that has nothing to do with account number 2. The entire table is locked. So this tends to be bad for efficiency if you're essentially telling everyone, every other row especially if you have a million rows, no one else can deposit money, check their account balances or anything because I am blocking this whole table. Transactions in InnoDB by contrast effectively only lock the rows that you care about. So it's much better for performance at least in terms of transactions. So that there is the tradeoff. All right, questions? So let me come back to one thing that relates now to InnoDB as well. So recall that we were re-implementing these couple of tables for the example involving employees and products. And suppose now, what I want to do is refine a little bit this thing here, Employee_ID, recall we defined an index on it. Now, why do we do that? For performance really. It's a unique identifier. I might want to search by employee ID so I made it an index. But notice this, I'm going to go down to relation view in phpMyAdmin which is this link here below the table summary. And now, notice that, oops, did I not actually check that box. I'm going to go back. Let's see, employees, we have. So here's how you look up what indexes you've created in my-- phpMyAdmin. I've clicked the employees table. And here I see a reminder of what the columns are called. If I click Indexes, I'll see that this tables has two indexes at the moment, a primary index on the Employee_ID field and it's unique by definition of primary. And then I remember that I also put an index on Name which was on the Name column and it's BTREE. That's an illusion to the tree structure. That actually does the-- That optimizes the searching. But now, let me go back to Orders because I actually think I did not check the box correctly earlier and look at my Indexes. And indeed-- OK, I completely screwed up earlier. I didn't mean to check, select all of those things. This is a shortcoming of my memory and of phpMyAdmin's interface. What I did here in defining an index, I defined it on all of three fields which was not what I intended. I wanted to have an index on each of the individual field so that I could search on each field individually. This index suggests that I want to select-- I want to search on all three fields simultaneously. As with the where statement like where product ID equals 1 and product equals table and Employee_ID equals 3. So that's not the kind of query I had in mind. So I'm going to fix this. Let me click Drop next to this key rather next to this index. And let me do this manually. So I'm going to go here and add a index. I'm going to go here and add an index and over here and add an index. Unfortunately, phpMyAdmin, when you do it all at once, apparently assumes you want a joint key. So now, I have three separate indices which is the intended behavior. And the link I'm going to click now is relation view. And this is another feature you get with a database like MySQL that you don't get with XML or with CSV or the like. Notice under Relations, so I can now define what are called foreign key constraints. And that's a feature of InnoDB only. In other words, I can now tell the database where these fields also appear. And the one I care about at the moment is Employee_ID. Notice here that under this checkbox, I have a list of all of the other fields I've defined in other tables with indices. And the one I want to do is I want to say that this table, these orders .Employee_ID field has a relationship with the employee's table .Employee_ID field. In other words, I have given this an index in the orders table, Employee_ID. And where else does Employee_ID appear? Obviously in the employee's table where it's a primary key. So if I want to teach the database that Employee_ID in the orders table is a foreign key. And what's a foreign key? Well, it's a field that happens to be a primary key elsewhere. I literally just make that association here. So now, the database, MySQL knows that Employee_ID is a primary key in employees. And it's a foreign key in orders. Well, who cares? Well now, you can set up rules for undelete and unupdate. In other words, by restricting this field in this way, I can now prevent developers, myself included, from deleting this employee from the database. Why? Because if Stephen, employee three, has both a row in the employee's table and he's also sold a couple of things in the orders table, by specifying this relationship between the two tables, I can ensure that I or a colleague doesn't accidentally delete Stephen from the employee's table because I've restricted deletes and updates. In other words, if I try to delete this-- delete Stephen, it's not going to be allowed because I'm going to be violating these constraints. By contrast, I can specify that I don't want to restrict those kinds of operations. I want to do something like CASCADE. And CASCADE is kind of the opposite. What this means is that if I delete Stephen from the employee's table, you know what, go ahead and delete Stephen from the orders table, too. We're done with Stephen. If he's fired, that's it. We're going to remove all evidence of him. So this might not be the desired behavior. Sometimes it is, sometimes it isn't. But if you did want that behavior, where deleting from one table automatically triggers deletes from another, you can do this again all in the database. So what's the upside? Well, you don't have to worry about this in code, whether it's PHP or some other language, your database can do be self-managing in this way. So all you have to worry about is deleting the one thing you care abut, Stephen, from the user's table. And if Stephen's name or ID is anywhere else in your entire database and any of your dozens of tables, these kinds of operations can go through automatically and clean all that up for you. Now again, probably bad for terminating an employee because you don't want to forget what they did actually sell. So we might instead want to do something like restrict in that case. But that's another advantage of InnoDB that you don't get from MyISAM. All right, so let's talk about design with an eye toward project 1. So project 0 still in progress presumably, per the course's website we'll announce tomorrow on the course's homepage how to submit it and the process for doing so. So keep an eye on that. You will need to submit it electronically. And then this second project 1 is due next week so realize there's deliberately this overlap so that even though project 0 is not due until Wednesday, we'll release the next project on a Monday just so that you can manage your own time. And if you get a head start, great. If not, you can start on Wednesday or thereafter. All right, so CS75 Finance is the challenge at hand next. And this is your second project that actually involves MySQL this time around. So an actual database, no more XML. Division for CS75 Finance for the specification, whose PDF is already online is to implement your own etrade.com-like website, a website that allows you to manage the buying and selling of stocks and to allow users to pretend that they're actually buying and selling real stocks. In other words, if I'm a user using your version of CS75 Finance, you're going to have to allow me to register for an account and then log in to your site. And the moment I register, it's a nice little promotion you're going to give me $10,000. And you're going to do that by associating that with me, my user ID, somewhere in your database. And then, you're going to enable me to get stock quotes like how much is Google right now, how much is Yahoo right now or any other company, and then buy and sell stocks, so to speak. And how do you buy a stock? Well, you're going to check how much the stock currently is. And if I want to buy one share, you're going to do the math, one share times stock price is probably a few dollars or a few hundred dollars. Then you're going to check what's my account balance. I have $10,000 out of the gate. And then you're going to do the math and say, "OK now, you have $9,000 something but you have one share of stock in some company. So the building blocks we've been looking at tonight in terms of locks and in terms of automatically updating rows and the like, it's definitely going to come into play here especially since money is involved, where correctness is all the more important because otherwise someone, you or the bank, are going to lose money. So the proposal we have here is this. First, take notice that here is the most recent piece of spam I got over the weekend, advertising some penny stock and this penny stock goes for 3 cents right now. Invariably, we have to change this example every year because these penny stocks keep getting driven out of business. But this is an email I got. Someone encouraging me to buy this penny stock and a penny stock is one that's really cheap, some usually some number of pennies. And this was 3 cents as of yesterday. So, what's the relevance here? This is really just a screenshot of Yahoo Finance. What's nice about Yahoo Finance is that you can look up stock quotes like this one. ROSV is the stock symbol. If you're unfamiliar with buying and selling stocks, realize that this specification points you at a tutorial for them that applies to-- that will give you a sense of how this world works if unfamiliar. But let me go to Yahoo Finance and type in ROSV, which again is the symbol for Rostock or Rostock Ventures Comp, Corp, whatever that is, and get quotes. And I see pretty much this. So it looks like no, it didn't really trade heavily today. In fact, 20,000 shares were actually bought or sold today but the price remains at 3 cents. So this is nice and interesting but what's really need about Yahoo is that down here at the bottom is that they have this little toolbox. Because what we want to do is not want to-- we don't want to resort to something like screen scraping where to get a stock price from Yahoo, we have to grab their web page and then read the HTML and figure out what number in that mess is the stock price. Rather here is a nice link, Download Data. So let's actually click this. I'm going to click Download Data delayed and that's just a reminder that it's actually a few minutes delayed. It's not truly real time. Now notice this downloaded file called quotes.csv. So this is nice because now we can tie it together last week's discussion of CSV and open up this file. It's going to open up a spreadsheet program like numbers or like Excel and what do I have here, well this is it. So it's a very simple spreadsheet, it seems, where the left-hand column contains ROSV, the next column contains 0.03 which is presumably the price, the next column contains today's date and then the time and then I'm not sure about those other fields, draw off the top of my head because it's not actually indicated but in a spec, we actually point out exactly what-- how you can figure out what all the various fields mean. But the real important one for now is just the price. It's 3 cents. So what's the implication of having access to CSVs? It turns out it's really easy in PHP to write code that essentially pretends to be a browser, hits a foreign URL, gets back the result, and if-- it's great if the result is in CSV. What function do we talk about last week that will actually parse the CSV file and hand you back an array? What's that? 

XPath? 

Not XPath. So XPath was only for XML but there was a function that we talked about that can open a CSV file and parse it for you and hand all of the issues involving commas and quotes and the like. Louis? 

Fgetcsv. 

Yeah, so fgetcsv. We didn't use it but now is going to be our chance tonight to actually use this function to fetch the data and then do something with it. So we can pretend to be a browser, effectively downloading the CSV file. Why? Well if we'd look at the Yahoo Finance URL, let's take a look at what it actually is. Let me go ahead and control click and choose copy link address and then paste this into a little text file. Notice that this is the URL that we just visited, download.finance .yahoo.com/d/ quotes.csv? The question mark is the interesting part because that means there's some dynamism. S equals ROSV and f equals a lot of stuff. So take a guess, what does the s HTTP parameter apparently represent here? Yeah? 

Search? 

Search or, yeah, more concretely here, though, what are you searching for? Isaac? 

Stock. 

The stock symbol, so s is presumably the stock symbol. Now I didn't create this URL. Remember I searched for ROSV and hit enter, but it appears to have generated later in that page under the toolbox of link that, you know, I feel like I can fake this, right? We already know that you can generate your own URLs or your own strings in PHP, surely I could copy and paste most of this URL but just change which part of it in my forthcoming project? Just the value of? 

The stock? 

S, yeah, just the value of the stock. Now what does f refer to? That's another parameter and I only know what those fields mean by looking it up in the documentation. We give you URL in line but this is just some arbitrary sort of 1990 style thing from Yahoo where they have sl, I think that is, 1d1t1. Each of those letters and numbers just represents a certain field, like the stock price, the date, the time, all of the columns we saw on our spreadsheet, they are derived from that cryptic looking string there and some random guy in the internet figured out what all of these meant and that's the URL we put in the specs so that you can go see what they all represent. But by default, we get back this. This last thing, &e=.csv is actually a hack. Older browsers sometimes used to choke when you were trying to download something like a CSV if the file extension in the URL was not actually CSV. So even though this is not a file, this is probably a dynamically generated spreadsheet of sort, a CSV file, it's not an actual file on a hard drive somewhere, that's just a silly little old school workaround for various browser issues that have largely disappeared at least in this case. So what's the takeaway here? It feels like I should be able to copy and paste this entire URL and just change this if I want to implement my own stock trading website that allows users to specify stocks via my own interface. So let's take a look at an example here. Let me go into the appliance where I have this simple page waiting, this is one of the snippets of code available on the lecture's page tonight for tonight's lecture. This is index.php. The functionality I've implemented in advance here is this. If I go here and type in GOOG as Google stock symbol, it doesn't matter if it's upper case or lower case for Yahoo, and then click Submit, notice that this is what comes back. So as of tonight, Monday, it's apparently 574.92. Let's try another one. Microsoft, MSFT is their symbol, submit, there is this 29.44. We can do something like Facebook, submit, there is this 28 and so forth. Now, where is this coming? This is not hard coded. This is generated dynamically by the server. So let's take a look now at index.php. Let me go ahead and open up index, and it's a pretty small program and index rather is just my form. So notice this is where we were a moment ago. Where do I send the user when I submit the form just to be clear? Yeah? 

Quote.php. 

Quote.php. I'm using GET just because I don't have to but I chose to, input name="symbol". I didn't call it s but not a big deal. I can call it anything I want. Type as text and then a submit button. So that's it. And now what's the next file obviously of interest? Quote.php, so let me go in there and open up quote.php, and this is how I go about getting the stock price. So notice at the very top of my file I've got some PHP tags, so now I'm in PHP mode. Notice I'm next doing this. I'm URL-encoding s. What does it mean to URL-encode something? Yeah? 

I think it takes all the-- all the spaces and turns it into, like, [inaudible] and all that. 

Yes. Spaces are generally just simplified as pluses but are B percent 20, yup, percent 20. So yes, URL-encode takes a string and ensures that there are no dangerous characters or confusing characters for URL's sake. What are the confusing characters? Well, URLs can not have spaces. So how are they typically represented? With a space character or with descriptive sequence of percent two, zero. So URL-encode does that. Now, is this a big deal? I just typed in GOOG. URL-encode is not going to do anything to that string. It's not going to do anything to Microsoft. It's not going to do anything to FB. But if I typed in some weird punctuation for certain stock symbols or I hit the space bar accidentally, that would handle that kind of scenario. So this is just one of those do it as a matter of principle. Any time you send user input to a browser, you call htmlspecialchars. Any time you send user input to a database, you call mysql_real_escape_string or now the prepare function in PDO. Any time you pass something to a URL, you should call urlencode on it, those three rules of thumb. All right. So $url is my next line in line four and notice all I did literally what I promised. I copied and pasted the URL from Yahoo's website and I put this little placeholder here of $s with the curly braces around it. The curly braces are not strictly necessary. It's just good practice that I've gotten into just in case it's a complex looking variable like a superglobal or an associative array. So that just means put the value of s there and s, recall, was defined one line earlier to be the return value of urlencode. All right, the next line, what am I doing? This is what's great about PHP sometimes. It just does so many things easily, $handle = fopen URL "r" let's go in reverse order. What does the r likely mean? Yeah? 

Read. 

Read. So this doesn't really make sense in the context of URLs because it's not like you can write, change the URL writing code on your server as opposed to someone else's. But fopen is generally used to open a file, file open. But PHP overloads a lot of these file related functions. So that if you instead give it, not the name of a file that's in your current folder, but you instead give it to URL, it will open a connection to that server via TCP/IP, it will do all the requisite DNS stuff, it will get back the results of that URL and hand it to you in the form of a reference, called the file handle. So that's what $handle represents. It represents essentially the return value from that server that represents the file that came back, which is hopefully an actual CSV. Now, here is the function I promised that we could use. Fgetcsv takes a file handle, so it doesn't take a string, doesn't take a file name, it takes a file handle, which is the return the value of fopen. So, you have to do it this way. And it reads in the first row from that file. So, you would have to call fgetcsv again and again and again if the spreadsheet contained multiple stock symbols. But as we saw, there is just one. So, I'm only calling this once. So fgetcsv returns the first. And what's the data type that comes back? Well, it's an array, where it's a numerically indexed array, where $row open bracket zero close bracket will represent the first column from the CSV. Bracket one the next column, bracket two the next column and so forth. So, there is no notion of headers in CSV. Sometimes companies or people will put the first row as having words for human friendliness however fgetcsv is not going to use that at all. So, in fact, you sometimes want to call fgetcsv once to E-up that first line as being useful for humans but not for machines, because it's the next row and beyond that actually has your data. But for us, so simple, because there's only one column or one row to worry about that came back from Yahoo. So I just call fgetcsv. And then for good measure, I call fclose on the handle to close that connection. I don't need anything more. Now, down below, I have some super simple HTML and notice what I'm plugging in, the current price of this. Now, just to be clear, why am I calling htmlspecialchars on the symbol that the user typed in to my little form? Yeah. [ Inaudible Remark ] Exactly. I want to protect against what we've been calling cross-site scripting attacks, so more on that again to come. But for now, we want to make sure that they've not typed any dangerous characters or been tricked into typing in some dangerous characters. And now the interesting part, the current price of that symbol is and then some more PHP code over here. And what am I outputting? Row bracket one. How do I know it's one? I just looked at the CSV, right? We just open the N numbers or an Excel. We saw that column one, not zero, column one is the price, so I want to print that out. Now there is a problem here, like Jack's concern earlier about money, could end up causing us some problem. And if the price happens to be something, like a $1.90, maybe we would get $1.09. You'll have to actually experiment with some real stock quotes and see what kinds of numbers Yahoo is spitting out. But realize there's a function called number_format in PHP, printf in PHP. Both of which allow you to format numbers to a specific number of decimal places, so you can guarantee that your money will be formatted in the right way. And back to our discussion now of databases, what were the data types that allow decimal points? It was float, double and there is one other. Yeah. 

Decimal. 

Yeah. It was called decimal, which doesn't exists in most programming languages, but this is a database field type that actually allows you to specify precisely the number of digits you want before the period and after the period. So for this look-- ultimately look in the MySQL documentation for more. It's very easy to use. But you'll be able to decide you probably want two, maybe three, maybe even four numbers after the decimal point. It really depends on what kind of precision Yahoo is returning, is it just cents or is it fractions of cents. And ask for how many numbers to the left of the symbol-- the left of the period you'll kind of have to make a judgment call. I mean, Google is like $500, so you need at least three digits. Berkshire Hathaway, sometimes $1000, so you might need four digits. But you probably don't need 20 digits because that would be a really expensive stock. So you're going to have a judgment call there. But realize that decimal for money is probably your best bet, because there is no imprecision. If you specify give me two numbers after the decimal place, that's what you'll get. You won't get weird rounding issues as you would more traditionally. So realize decimal will be your friend in the database and something, like printf or number_format, will be your friend when it comes to the aesthetics of showing a human the number. All right. So, that's how we can fetch stock queries, and that's it for integration. What's really nice here is that with CS75 finance you can integrate real time stock prices into your application by just using this machine readable format, CSV, which we all agree is kind of a crappy format. It's not very flexible. You can't sort of tag information very well. It's very flood. It's vulnerable to like commas and quotes being an awkward place. But it's at least machine readable. And it's not something that we have to worry about parsing ourselves. So, when we get back that value, we can just plug it right in. And we can now do it to perform math and bys and the cells and the like, so a bit of a word on strategy then. So, this website has to support these features. And the dot, dot, dot refers to other things that you might want to add by your own choice. So, you're going to need to enable users to log in to your website. Now, you have a bit of a chicken and an egg problem here, because how you're going to implement log in functionality before you actually have users. And again, obviously, it's going to be you pretending to be a user or creating users. So, you'll think through how best to approach this. If you want to implement log in, well, you have the advantage of having seen like nine different log in examples thus far, probably you shouldn't use the earliest of those, with hard coded John Harvard username and password. But the most recent ones, where you actually use the database. So, you actually have code already from class that you can adopt for your own project to implement log ins. Now, how do you implement registration? Well, registration is not going to use select from your user's table. What keywords we're going to use instead? Yeah. 

Insert into. 

Insert into, right? If you want to register for an account, you, the programmer, are probably going to have to call one or more insert statements to add that user to the table once they've told you their preferred username and password. Passwords, that's in the design decision, right? We talked about not storing in clear text, not the best practice. You should at least hash it, but using the password function might not be the best approach. So, again, more on project 1 in sections tonight. But realize even though the functionality is maybe obvious, the implementation is meant to be non-obvious. A lot of interesting decision points to make. What I would propose in general is that if you want to implement log-in first, that's fine and you should. But you can simulate registrations, how? Just use phpMyAdmin, right? Create your user's table. Literally click the insert tab and just manually insert some fake users, so that you at least have someone to play with. Then, go and test your log in code using those users you already inserted manually. Once you have log in working, then you can actually implement register.php or similar to actually do those inserts. So realize, you certainly shouldn't try tackling steps one and two and three and four and five and then test everything, you can absolutely do this step by step by just simulating certain things by using the command line or phpMyAdmin. So, what about get quotes? Well, here, too, you kind of have some code from class, right? How do you go about reading a stock quote from Yahoo? It's like two lines of code, maybe three. So you at least have that module now. But that's a good candidate to write a function for it, so it's not going to be the only thing your file does. You can write a function that you then call, so you could have like a helpers.php file again or something like that if you've gone that route for project 0. So getting the quote is relatively easy. Now selling and buying is less obvious. So, why do I propose selling first? Create your portfolio table, or whatever you're going to call the table that keeps track of who has bought what. And how do you stimulate buying stocks before you have that in functionality? Use phpMyAdmin, right? Go to the insert tab and just manually pretend like you bought Goog and these many shares or you bought Microsoft and these many shares. Then you can implement selling functionality. And do look at the spec, because we do specify what we mean by selling. We, for instance, don't let yourself fractions of shares. We specify that you can-- the user has to sell them all at once, so you don't have to sell just a few of them. But when it comes to buying, you're going to have to do a few things. You're going to have to not only get a quote right then and there because we want it to be current. You're going to have to then may check how much money the user actually has. So, somewhere in this database, you have to keep track of my account balance, which initially we propose as $10,000. After that, it should go up and down based on what the user does. And you're going to have to make sure that if the user has already bought some Google shares, you're going to have to update that row in your database, so that you don't just have many, many, many different Google rows when clearly you could consolidate them and just care about the total. So in short, there's going to be another-- a number of other design decisions. But how many tables do we seem to be talking about already, minimally? 

Minimally two, users and portfolios. 

OK, users and portfolios. And what kinds of field should probably go into user's table, even though there are many ways to do this? Yeah? 

Username, password, probably email and the, perhaps, portfolio ID or something like that, [inaudible] perhaps. 

OK. Good. So, your username, password and maybe email or maybe that could just be the username. The-- And some kind of unique ID, because you're going to want to correlate a user in the user's table with a portfolio owner in the portfolio's table, because what fields might go in the portfolio's table? 

Your stocks? 

Your stocks. So one field is maybe the symbol of the thing you own. One field is the ID of the person who owns that symbol. And then, what's probably at least the third field in the portfolio's table? Yeah? 

How many stocks you have [inaudible]. 

Exactly. Quantity. Do you want to store the price in the portfolio's table? 

No. 

Probably not, right, because the price is going to change presumably. So, it's not like once you buy a stock, that's what it's worth. It's going to change day to day. Yeah? 

Just looking at the stock, you might want to actually store the purchase price because [inaudible] calculate the difference. 

Good. 

It's gone up or down. 

So there is an opportunity. And here in lies the dot, dot, dot. Maybe it would be a good idea to store the purchase price recognizing that the current price will change. But then, you can do some basic arithmetic and say your portfolio is up 50% or it's down 50%. But to do that, you have to know what the starting point or the starting price was. But there, realize that you run into some design issues, like if you're going to start restoring the purchase price, now you might not want to consolidate rows in your portfolio table. Because what if I buy a share of Google at $400, but another share of Google at $500 I want to somehow remember that one of those shares was 400, one of those was 500 or I want to store the average price or something like that, now I have some non-obvious accounting issues to deal with. So realizing the spec, we do allow you to simplify certain things, but those are the kinds of tradeoffs ultimately that you need to make. Any questions? So it's actually-- you'll actually find that it's pretty neat once you have the things up and running and you actually buy and sell these stocks and then check your portfolio the next day because if you're querying Yahoo every time the user wants to look at their portfolio as you'll have to, you'll be able to see whether things are going up or down. And certainly, if you're buying like millions of shares of penny stocks or whatnot, you can actually see things change day to day if there's actually a price movement and it's going from 3 cents to 4 cents. It's actually a nice 33% profit overnight. But, of course, this is just imaginary and you're just doing the math even though the actual volume of shares might not be consistent with reality. Any questions. All right. So, why don't we officially wrap there? I'll stick around for questions. We'll let Chris get set up and dive a bit more into PDO in section, into project 1 and into SQL itself. All right. [ Silence ]