Relationships with MongoDB
Even if MongoDB is classified as a document database, the pieces of information stored in the database still have relationships between them. Understanding how to represent relationships, and deciding between embedding and linking relationship information, is crucial.
Having a good model is the single most important thing you can do to ensure you get good performance. The face of identifying and modeling relationships correctly is a step that is not optional in the methodology.
What are relationships in the data model??
If you look at any schema implementation in MongoDB, or any other database, you can observe objects, referred to as entities.
The relationships represent all the entities and the other piece of information are related to each other.
For example, a customer name and its customer ID have a one to one relationship. We often group this type of relationship in a single entity. The relationship between a customer and the invoice sent to them is a one to many relationship, as one customer has many invoice, but each invoice only belong to one customer. And the invoices and their products have a many to many relationship, as the invoice referred to many products, and a product is likely listed in many invoices.
Types and Cardinality of Relationships
Most of the relationships between units of information can be classified as:
one-to-one: A customer has one name, which is associated with only one customer_id, and this customer_id can only be used to identify one customer’s name. The one-to-one relations are often represented by grouping the two pieces of data in the same entity or document.
one-to-many: Invoices associated with the customer are an example of a one-to-many relationship. A customer has many invoices, and each of these invoices is only associated with one customer.
many-to-many: Finally, an invoice may contain many products, and each of these products is likely present in more invoices than just the one that we were looking at. This is called a many-to-many relationship.
However, is this the best and complete way to describe data relationships, especially when dealing with Big Data?
Let’s say we look at the relationship between a mother and the children she gave birth to. Well, she may not have children, have one, have two, ten, however, the maximum is pretty limited. Very often there are two children per family.
A different example is Twitter users. Some people just started their account and may have zero or one follower, while others may have 20, 100, or up to 100 million if they are a celebrity. In this case, many-to-many is a very poor way to characterize a relationship. And this might be true for an increasing number of examples in the world of Big Data.
We could embed the information about the children in the document representing the mother, but it would not make sense to embed 100 million followers into one document.
What we need is a more expressive way to represent the one-to-many relationship so that we know that we are dealing with large numbers and avoid mistakes associated with that distinction.
Looking at earlier examples, we are missing some information. The fact that the relationship can be a large number isn’t reflected clearly with the one-to-many description, the value for the maximum of many is not clear.
The most likely many value for a given one-to-many relationship is also missing.
Let’s introduce this additional symbol for the crow foot notation, and call it zillions. It is based on the many symbol, however, with additional lines. This relationship would read as from one to zero to zillion. Or in short, one-to-zillions. This new symbol addresses the identification of large numbers.
And if we go to the trouble of identifying the maximum number, why not preserve this information in the model? For this we use a tuple of one to three values, with the following meaning:
- Minimum, usually zero or one
- Most likely value, or the median
- Maximum. If you have two values they represent the minimum and the maximum.When a single value is used it means the relationship is fixed to that number.
One-to-Many Relationship
The most interesting type of relationship is the one-to-many relationship. First, because if all our data is only composed of one-to-one relationships, a spreadsheet application like Excel could do the job, at least for a small data set. As for the many-to-many relationships, most of them can be expressed as two one-to-many relationships.
A one-to-many relationship means that an object of a given type is associated with n objects of a second type, while the relationship in the opposite direction means that each of the objects of the second type can only be associated with one object on the one side. Example: - As an example of this relationship, we use a person and their credit cards, or a blog entry and its comments. A person has n credit cards, but each of these credit cards belongs to one and only one person.
Using MongoDB and its document model, give us a few ways to represent this kind of relationship. We can embed each document from the many side into the document on the one side. Or vise versa, we can embed the document from the one side into each document on the many side. Instead of using a single collection and embedding the information in it, we keep the documents in two separate collections and reference documents from one collection in documents of the other collection.
Many-to-Many Relationship
The many-to-many relationship is identified by documents on the first side being associated with many documents on the second side, and documents on the second side being associated with many documents on the first side.
One-to-One Relationship
Commonly, a one-to-one relationship is represented by a single table in a tabular database. In general, the same applies to MongoDB. For example a person’s name, date of birth, and email address would be kept together in the same document. All these fields have a one-to-one relationship with each other. A user in the system has one and only one name, has one and only one date of birth, and is associated with one and only one email address.
When we group information together, that is in two different entities, we refer to this action as embedding. This is in contrast to grouping fields together in a given entity. We refer to those fields as attributes of the entities.
One-to-Zillions Relationship
The one-to-zillions relationship is not that different relationship, but it is a subcase of the one-to-many relationship. If we have a one to many relationship and the many’s identified as 10,000 or more, we call that relationship one to zillion. This means we need to be mindful of this relationship every place we use it in the code. The last thing you want the application to do is to retrieve a document and its zillions associated documents, then process the complete results set.