Data Model

The most abstract level of a database design is the data model, the conceptual description of a problem space. Data models are expressed in terms of entities, attributes, domains, and relationships.

Entities

It's difficult to provide a precise formal definition of the term entity, but the concept is intuitively quite straightforward: an entity is anything about which the system needs to store information.

When you begin to design your data model, compiling an initial list of entities isn't difficult. When you (or your clients) talk about the problem space, most of the nouns and verbs used will be candidate entities. "Customers buy products. Employees sell products. Suppliers sell us products." The nouns "Customers," "Products," "Employees," and "Suppliers" are all clearly entities.

The events represented by the verbs "buy" and "sell" are also entities, but a couple of traps exist here. First, the verb "sell" is used to represent two distinct events: the sale of a product to a customer and the purchase of a product by the organization. That's fairly obvious in this example, but it's an easy trap to fall into, particularly if you're not familiar with the problem space.

Attributes

Your system will need to keep track of certain facts about each entity. These facts are referred to as the entity's attributes. If your system includes Customer entities, for example, you'll probably want to know the names and addresses of the customers and perhaps the businesses they're in. If you're modeling an event such as a Service Call, you'll probably want to know who the customer was, who made the call, when it was made, and whether the problem was resolved.

Determining the attributes to be included in your model is a semantic process. You must make your decisions based on what the data means and how it will be used. Let's look at one common example: an address. Do you model the address as a single entity (the Address) or as a set of entities (HouseNumber, Street, City, State, ZipCode)? Most designers (myself included) would tend to automatically break the address up into a set of attributes on the general principle that structured data is easier to manipulate, but this is not necessarily correct and certainly not straightforward.

Let's take, for instance, a local amateur musical society. It will want to store the addresses of its members in order to print mailing labels. Since all the members live in the same city, there is no reason to ever look at an address as anything other than a blob: a single, multiline chunk of text that gets spat out on demand.

Domains

Domains are often confused with data types, but this is inaccurate. Data type is a physical concept while domain is a logical one. "Number" is a data type, and "Age" is a domain. To give another example, "StreetName" and "Surname" might both be represented as text fields, but they are obviously different kinds of text fields; they belong to different domains.

Domain is also a broader concept than data type in that a domain definition includes a more specific description of the valid data. Take, for example, the domain DegreeAwarded, which represents the degrees awarded by a university. In the database schema, this attribute might be defined as Text[3], but it's not just any three-character string, it's a member of the set {BA, BS, MA, MS, PhD, LLD, MD}.

Of course, not all domains can be defined by simply listing their values. Age, for example, contains a hundred or so values if we're talking about people, but tens of thousands if we're talking about museum exhibits. In such instances it's useful to define the domain in terms of the rules which can be used to determine the membership of any specific value in the set of all valid values. For example, PersonAge could be defined as "an integer in the range 0 to 120," whereas ExhibitAge might simply be "an integer equal to or greater than 0."

Relationships

In addition to the attributes of each entity, a data model must specify the relationships between entities. At the conceptual level, relationships are simply associations between entities. The statement "Customers buy products" indicates that a relationship exists between the entities Customers and Products. The entities involved in a relationship are called its participants. The number of participants is the degree of the relationship. (The degree of a relationship is similar to, but not the same as, the degree of a relation, which is the number of attributes.)

The vast majority of relationships are binary, like the "Customers buy products" example, but this is not a requirement. Ternary relationships, those with three participants, are also common. Given the binary relationships "Employees sell products" and "Customers buy products," there is an implicit ternary relationship "Employees sell products to customers." However, specifying the two binary relationships does not allow us to identify which employees sold which products to which customers; only a ternary relationship can do that.

A special case of a binary relationship is an entity that participates in a relationship with itself. This is often called the bill of materials relationship and is most often used to represent hierarchical structures. A common example is the relationship between employees and managers: any given employee might both be a manager and have a manager.

The relationship between any two entities can be one-to-one, one-to-many, or many-to-many. One-to-one relationships are rare, most often being used between supertype and subtype entities. To return to our earlier example, the relationship between an employee and that employee's salesperson details is one-to-one.

One-to-many relationships are probably the most common type. An invoice includes many products. A salesperson creates many invoices. These are both examples of one-to-many relationships.

top

End.