SUNY Geneseo Department of Mathematics

Lab 9—Cell Arrays and Relations

Math 230 01
Fall 2015
Prof. Doug Baldwin

Complete by Friday, December 11
Grade by Tuesday, December 15

Purpose

This lesson develops your understanding of cell arrays in Matlab. In the process, it introduces you to the mathematical notion of a “relation” and how it provides a mathematical foundation for computer databases.

Background

Many programs operate on collections of data. Vectors (and matrices) are familiar ways of forming collections in Matlab. However, all of the elements in a vector or matrix must be the same type (for instance, you can’t have a vector in which some elements are numbers and others are strings—either all must be numbers, or all must be strings—and in fact, strings of the same length). Cell arrays are a Matlab data structure that allow you to create things much like vectors and matrices, but with elements of different types. Sections 15.2 through 15.4 of our textbook discuss cell arrays, and we will talk about them in class on December 4.

One type you will need for this lab that we haven’t talked about in class, although you have used it informally, is the string. We will talk about strings in class on December 7. Our textbook discusses them in chapter 8, and section 15.9 specifically discusses using strings with cell arrays.

Programs that work with large data structures (such as cell arrays) typically also need to save those structures in, and retrieve them from, files. Matlab provides a very easy-to-use way of saving Matlab variables in so-called “.mat” files. Matlab also makes it easy to read variables from “.mat” files. See section 6.5 of our textbook for more information. We will discuss files in class on December 9.

More sophisticated data storage than single files is often provided by “databases.” A database is typically a collection of files that work together to support quick and large-scale searching and updating. For example, most modern online services such as email, e-commerce, etc. are supported by databases that record email messages and their status (read, archived, etc.), items for sale, customer information (name, password, credit card number, etc.), and so forth. Databases are very conveniently modelled via the mathematical concept of a “relation.” For an introduction to relations, see the “Introduction to Relations” video lecture at http://www.geneseo.edu/proofspace/ch5sec1, and for information about relations involving more than 2 values see the “n-Ary Relations” video at https://www.youtube.com/watch?v=3kW6uSee3JE. We talked about this material in class on December 2.

Activity

The following questions introduce you to the relational data model, and some of the operations that are common in relational databases. These operations are technically operations of the so-called “relational algebra.”

Question 1

A relation can very nicely be represented by a two-dimensional cell array in Matlab, with each row representing one tuple. I have provided a small database represented in this way. The database is in a file named “database.mat,” which you can download from our “Exercises” page on myCourses. It consists of four relations: three of them (in variables mathContributions, physicsContributions, and csContributions) describe some famous mathematicians, physicists, and computer scientists, respectively, some of their major contributions to mathematics/physics/computer science, and a year (more or less, some are historically unclear) for that contribution. The fourth relation, in variable institutions, describes institutions with which some of the people in the first relations were associated, along with the years the association began and ended.

Load the database into Matlab, and look at the four relations to get a sense for what is in them and how it is represented.

Figure out the general form of Matlab expression that picks a single attribute out of a single tuple in a relation. Test your ideas on the relations from my database. Is picking an attribute out of a tuple better done with cell indexing or content indexing? (Be sure you understand the difference!) Why? Can you use a similar expression to assign a new value to an attribute within a tuple?

Question 2

A simple but useful operation in relational algebra is projection, which simply amounts to extracting one or more attributes from every tuple in a relation, producing a new relation with just those attributes. For example, the query “in what year(s) did each mathematician make his or her contributions?” could be answered by projecting the year attribute out of the mathContributions relation in the database. You could also project the mathematician’s name attribute at the same time, so that you didn’t end up with years without any indication of who did something in them.

Write a Matlab function that does a projection on a relation represented by a cell array. Your function should take the cell array and a vector of indices of the columns to project as its arguments. Since projection is supposed to produce a new relation, the result of your projection should be another cell array.

Hints: (1) If you’re having trouble understanding what the projection function should do, try coming up with examples to clarify it. In other words, take one of the relations from the database, and work out by hand what a projection of one or a couple of columns from it would be. Then try to figure out how the inputs to what you just did constitute a “cell array” and a “vector of indices of the columns to project.” Finally, write yourself examples of calls to a hypothetical function with that cell array and vector as arguments—now you have one or more concrete examples of calls to the function you need to write; they tell you what to name the function and what arguments it has. From there, you can write the function and test it to see if gets the same results from your example(s) as you got by hand. (2) The projection function can be written with literally one line of code in its body.

Question 3

Another common relational algebra operation is selection, which amounts to extracting one or more tuples from a relation (again, producing a new relation containing just those tuples). Selection typically extracts all tuples that contain a given value in a given attribute. For example, if you wanted to know what Isaac Newton did, you could select the tuples containing “Isaac Newton” in their name attribute from the mathContributions and physicsContributions relations in the database.

Write a Matlab function that performs a selection from a relation represented as a cell array. Your function should take the cell array, a value, and the index of an attribute to compare to that value as its arguments. It should return a new cell array that contains all, but only, the tuples from the input relation that have the specified value in the specified attribute.

Beware that the sizes of the value you are looking for and the atttribute value you are comparing it to might be different. This will make doing the comparison a little bit tricky, but not impossible.

Question 4

Create an entire relation of your own for the database. The relation can represent anything you want, although it might work better with the existing relations if it has something to do with the history of math or science. The facts stored in your new relation can come from any source you wish.

Save your modified database in a file. You can save it in a different “.mat” file from the original if you don’t want to risk changing the original.

Question 5

Since relations are sets, all set operations (union, intersection, etc.) can be performed on them. Union is particularly useful when the relations come from a database. For example, the database I gave you might in many cases be more useful if you could take the union of the mathContributions, physicsContributions, and csContributions relations to get a new relation that contained all the contributions.

Write a function that calculates the union of two relations, and returns the resulting relation. (Recall that the union of two sets is a new set that contains every element that was in either, or both, of the input sets. The only thing that is tricky about doing this in a program is that if the same element—i.e., tuple in this case—appears in both sets, only one copy of it should appear in the result. Matlab has a union function, but I don’t believe it works on cell arrays as they are used in this database.)

Follow-Up

I will grade this exercise in a face-to-face meeting with you. During this meeting I will look at your solution, ask you any questions I have about it, answer questions you have, etc. Please bring a written solution to the exercise to your meeting, as that will speed the process along.

Sign up for a meeting via Google calendar. If you worked in a group on this exercise, the whole group should schedule a single meeting with me. Please make the meeting 15 minutes long, and schedule it to finish before the end of the “Grade By” date above.