This website uses cookies to better the user experience of its visitors. Where applicable, this website uses a cookie control system, allowing users to allow or disallow the use of cookies on their computer/device on their first visit to the website. This complies with recent legislative requirements for websites to obtain explicit consent from users before leaving behind or reading files such as cookies on a user’s computer/device. To learn more click Cookie Policy.

Privacy preference center

Cookies are small files saved to a user’s computer/device hard drive that track, save, and store information about the user’s interactions and website use. They allow a website, through its server, to provide users with a tailored experience within the site. Users are advised to take necessary steps within their web browser security settings to block all cookies from this website and its external serving vendors if they wish to deny the use and saving of cookies from this website to their computer’s/device’s hard drive. To learn more click Cookie Policy.

Manage consent preferences

These cookies are necessary for the website to function and cannot be switched off in our systems. They are usually only set in response to actions made by you which amount to a request for services, such as setting your privacy preferences, logging in or filling in forms. You can set your browser to block or alert you about these cookies, but some parts of the site will not then work. These cookies do not store any personally identifiable information.
These cookies allow us to count visits and traffic sources so we can measure and improve the performance of our site. They help us to know which pages are the most and least popular and see how visitors move around the site. If you do not allow these cookies we will not know when you have visited our site, and will not be able to monitor its performance.
Cookies list
Name _rg_session
Provider rubygarage.org
Retention period 2 days
Type First party
Category Necessary
Description The website session cookie is set by the server to maintain the user's session state across different pages of the website. This cookie is essential for functionalities such as login persistence, ensuring a seamless and consistent user experience. The session cookie does not store personal data and is typically deleted when the browser is closed, enhancing privacy and security.
Name m
Provider m.stripe.com
Retention period 1 year 1 month
Type Third party
Category Necessary
Description The m cookie is set by Stripe and is used to help assess the risk associated with attempted transactions on the website. This cookie plays a critical role in fraud detection by identifying and analyzing patterns of behavior to distinguish between legitimate users and potentially fraudulent activity. It enhances the security of online transactions, ensuring that only authorized payments are processed while minimizing the risk of fraud.
Name __cf_bm
Provider .pipedrive.com
Retention period 1 hour
Type Third party
Category Necessary
Description The __cf_bm cookie is set by Cloudflare to support Cloudflare Bot Management. This cookie helps to identify and filter requests from bots, enhancing the security and performance of the website. By distinguishing between legitimate users and automated traffic, it ensures that the site remains protected from malicious bots and potential attacks. This functionality is crucial for maintaining the integrity and reliability of the site's operations.
Name _GRECAPTCHA
Provider .recaptcha.net
Retention period 6 months
Type Third party
Category Necessary
Description The _GRECAPTCHA cookie is set by Google reCAPTCHA to ensure that interactions with the website are from legitimate human users and not automated bots. This cookie helps protect forms, login pages, and other interactive elements from spam and abuse by analyzing user behavior. It is essential for the proper functioning of reCAPTCHA, providing a critical layer of security to maintain the integrity and reliability of the site's interactive features.
Name __cf_bm
Provider .calendly.com
Retention period 30 minutes
Type Third party
Category Necessary
Description The __cf_bm cookie is set by Cloudflare to distinguish between humans and bots. This cookie is beneficial for the website as it helps in making valid reports on the use of the website. By identifying and managing automated traffic, it ensures that analytics and performance metrics accurately reflect human user interactions, thereby enhancing site security and performance.
Name __cfruid
Provider .calendly.com
Retention period During session
Type Third party
Category Necessary
Description The __cfruid cookie is associated with websites using Cloudflare services. This cookie is used to identify trusted web traffic and enhance security. It helps Cloudflare manage and filter legitimate traffic from potentially harmful requests, thereby protecting the website from malicious activities such as DDoS attacks and ensuring reliable performance for genuine users.
Name OptanonConsent
Provider .calendly.com
Retention period 1 year
Type Third party
Category Necessary
Description The OptanonConsent cookie determines whether the visitor has accepted the cookie consent box, ensuring that the consent box will not be presented again upon re-entry to the site. This cookie helps maintain the user's consent preferences and compliance with privacy regulations by storing information about the categories of cookies the user has consented to and preventing unnecessary repetition of consent requests.
Name OptanonAlertBoxClosed
Provider .calendly.com
Retention period 1 year
Type Third party
Category Necessary
Description The OptanonAlertBoxClosed cookie is set after visitors have seen a cookie information notice and, in some cases, only when they actively close the notice. It ensures that the cookie consent message is not shown again to the user, enhancing the user experience by preventing repetitive notifications. This cookie helps manage user preferences and ensures compliance with privacy regulations by recording when the notice has been acknowledged.
Name referrer_user_id
Provider .calendly.com
Retention period 14 days
Type Third party
Category Necessary
Description The referrer_user_id cookie is set by Calendly to support the booking functionality on the website. This cookie helps track the source of referrals to the booking page, enabling Calendly to attribute bookings accurately and enhance the user experience by streamlining the scheduling process. It assists in managing user sessions and preferences during the booking workflow, ensuring efficient and reliable operation.
Name _calendly_session
Provider .calendly.com
Retention period 21 days
Type Third party
Category Necessary
Description The _calendly_session cookie is set by Calendly, a meeting scheduling tool, to enable the meeting scheduler to function within the website. This cookie facilitates the scheduling process by maintaining session information, allowing visitors to book meetings and add events to their calendars seamlessly. It ensures that the scheduling workflow operates smoothly, providing a consistent and reliable user experience.
Name _gat_UA-*
Provider rubygarage.org
Retention period 1 minute
Type First party
Category Analytics
Description The _gat_UA-* cookie is a pattern type cookie set by Google Analytics, where the pattern element in the name contains the unique identity number of the Google Analytics account or website it relates to. This cookie is a variation of the _gat cookie and is used to throttle the request rate, limiting the amount of data collected by Google Analytics on high traffic websites. It helps manage the volume of data recorded, ensuring efficient performance and accurate analytics reporting.
Name _ga
Provider rubygarage.org
Retention period 1 year 1 month 4 days
Type First party
Category Analytics
Description The _ga cookie is set by Google Analytics to calculate visitor, session, and campaign data for the site's analytics reports. It helps track how users interact with the website, providing insights into site usage and performance.
Name _ga_*
Provider rubygarage.org
Retention period 1 year 1 month 4 days
Type First party
Category Analytics
Description The _ga_* cookie is set by Google Analytics to store and count page views on the website. This cookie helps track the number of visits and interactions with the website, providing valuable data for performance and user behavior analysis. It belongs to the analytics category and plays a crucial role in generating detailed usage reports for site optimization.
Name _gid
Provider rubygarage.org
Retention period 1 day
Type First party
Category Analytics
Description The _gid cookie is set by Google Analytics to store information about how visitors use a website and to create an analytics report on the website's performance. This cookie collects data on visitor behavior, including pages visited, duration of the visit, and interactions with the website, helping site owners understand and improve user experience. It is part of the analytics category and typically expires after 24 hours.
Name _dc_gtm_UA-*
Provider rubygarage.org
Retention period 1 minute
Type First party
Category Analytics
Description The _dc_gtm_UA-* cookie is set by Google Analytics to help load the Google Analytics script tag via Google Tag Manager. This cookie facilitates the efficient loading of analytics tools, ensuring that data on user behavior and website performance is accurately collected and reported. It is categorized under analytics and assists in the seamless integration and functioning of Google Analytics on the website.

When and How You Should Denormalize a Relational Database

  • 255743 views
  • 13 min
  • Jan 13, 2020
Gleb B.

Gleb B.

Copywriter

Alex B.

Alex B.

Ruby/JS Developer

Share

Website speed is a priority for businesses in 2020.

Faster websites rank higher on search engines and also provide better user experiences, resulting in higher conversion rates. No wonder website owners demand faster page loading speeds – leaving developers to make it happen.

Database optimization is an essential step to improve website performance. Typically, developers normalize a relational database, meaning they restructure it to reduce data redundancy and enhance data integrity. However, sometimes normalizing a database isn’t enough, so to improve database performance even further developers go the other way around and resort to database denormalization.

In this article, we take a closer look at denormalization to find out when this method is appropriate and how you can do it.

When to denormalize a database

What is database denormalization? Before diving into the subject, let’s emphasize that normalization still remains the starting point, meaning that you should first of all normalize a database’s structure. The essence of normalization is to put each piece of data in its appropriate place; this ensures data integrity and facilitates updating. However, retrieving data from a normalized database can be slower, as queries need to address many different tables where different pieces of data are stored. Updating, to the contrary, gets faster as all pieces of data are stored in a single place.

The majority of modern applications need to be able to retrieve data in the shortest time possible. And that’s when you can consider denormalizing a relational database. As the name suggests, denormalization is the opposite of normalization. When you normalize a database, you organize data to ensure integrity and eliminate redundancies. Database denormalization means you deliberately put the same data in several places, thus increasing redundancy.

“Why denormalize a database at all?” you may ask. The main purpose of denormalization is to significantly speed up data retrieval. However, denormalization isn’t a magic pill. Developers should use this tool only for particular purposes:

# 1 To enhance query performance

Typically, a normalized database requires joining a lot of tables to fetch queries; but the more joins, the slower the query. As a countermeasure, you can add redundancy to a database by copying values between parent and child tables and, therefore, reducing the number of joins required for a query.

#2 To make a database more convenient to manage

A normalized database doesn’t have calculated values that are essential for applications. Calculating these values on-the-fly would require time, slowing down query execution.

You can denormalize a database to provide calculated values. Once they’re generated and added to tables, downstream programmers can easily create their own reports and queries without having in-depth knowledge of the app’s code or API.

#3 To facilitate and accelerate reporting

Often, applications need to provide a lot of analytical and statistical information. Generating reports from live data is time-consuming and can negatively impact overall system performance.

Denormalizing your database can help you meet this challenge. Suppose you need to provide a total sales summary for one or many users; a normalized database would aggregate and calculate all invoice details multiple times. Needless to say, this would be quite time-consuming, so to speed up this process, you could maintain the year-to-date sales summary in a table storing user details.

Database denormalization techniques

Now that you know when you should go for database denormalization, you’re probably wondering how to do it right. There are several denormalization techniques, each appropriate for a particular situation. Let’s explore them in depth:

Storing derivable data

If you need to execute a calculation repeatedly during queries, it’s best to store the results of it. If the calculation contains detail records, you should store the derived calculation in the master table. Whenever you decide to store derivable values, make sure that denormalized values are always recalculated by the system.

Here are situations when storing derivable values is appropriate:

  • When you frequently need derivable values
  • When you don’t alter source values frequently
Advantages Disadvantages
No need to look up source values each time a derivable value is needed Running data manipulation language (DML) statements against the source data requires recalculation of the derivable data
No need to perform a calculation for every query or report Data inconsistencies are possible due to data duplication

Example

Denormalization through Storing Derivable Data

As an example of this denormalization technique, let’s suppose we’re building an email messaging service. Having received a message, a user gets only a pointer to this message; the pointer is stored in the User_messages table. This is done to prevent the messaging system from storing multiple copies of an email message in case it’s sent to many different recipients at a time. But what if a user deletes a message from their account? In this case, only the respective entry in the User_messages table is actually removed. So to completely delete the message, all User_messages records for it must be removed.

Denormalization of data in one of the tables can make this much simpler: we can add a users_received_count to the Messages table to keep a record of User_messages kept for a specific message. When a user deletes this message (read: removes the pointer to the actual message), the users_received_count column is decremented by one. Naturally, when the users_received_count equals zero, the actual message can be deleted completely.

Using pre-joined tables

To pre-join tables, you need to add a non-key column to a table that bears no business value. This way, you can dodge joining tables and therefore speed up queries. Yet you must ensure that the denormalized column gets updated every time the master column value is altered.

This denormalization technique can be used when you have to make lots of queries against many different tables – and as long as stale data is acceptable.

Advantages Disadvantages
No need to use multiple joins DML is required to update the non-denormalized column
You can put off updates as long as stale data is tolerable An extra column requires additional working and disk space

Example

Denormalization through Pre-Joined Tables

Imagine that users of our email messaging service want to access messages by category. Keeping the name of a category right in the User_messages table can save time and reduce the number of necessary joins.

In the denormalized table above, we introduced the category_name column to store information about which category each record in the User_messages table is related to. Thanks to denormalization, only a query on the User_messages table is required to enable a user to select all messages belonging to a specific category. Of course, this denormalization technique has a downside − this extra column may require a lot of storage space.

Using hardcoded values

If there’s a reference table with constant records, you can hardcode them into your application. This way, you don’t need to join tables to fetch the reference values.

However, when using hardcoded values, you should create a check constraint to validate values against reference values. This constraint must be rewritten each time a new value in the A table is required.

This data denormalization technique should be used if values are static throughout the lifecycle of your system and as long as the number of these values is quite small. Now let’s have a look at the pros and cons of this technique:

Advantages Disadvantages
No need to implement a lookup table Recoding and restating are required if look-up values are altered
No joins to a lookup table

Example

Using Hardcoded Values

Suppose we need to find out background information about users of an email messaging service, for example the kind, or type, of user. We’ve created a User_kinds table to store data on the kinds of users we need to recognize.

The values stored in this table aren’t likely to be changed frequently, so we can apply hardcoding. We can add a check constraint to the column or build the check constraint into the field validation for the application where users sign in to our email messaging service.

Keeping details with the master

There can be cases when the number of detail records per master is fixed or when detail records are queried with the master. In these cases, you can denormalize a database by adding detail columns to the master table. This technique proves most useful when there are few records in the detail table.

Advantages Disadvantages
No need to use joins Increased complexity of DML
Saves space

Example

Denormalization through Keeping Details with Master

Imagine that we need to limit the maximum amount of storage space a user can get. To do so, we need to implement restraints in our email messaging service − one for messages and another for files. Since the amount of allowed storage space for each of these restraints is different, we need to track each restraint individually. In a normalized relational database, we could simply introduce two different tables − Storage_types and Storage_restraints − that would store records for each user.

Instead, we can go a different way and add denormalized columns to the Users table:

message_space_allocated

message_space_available

file_space_allocated

file_space_available

In this case, the denormalized Users table stores not only the actual information about a user but the restraints as well, so in terms of functionality the table doesn’t fully correspond to its name.

Repeating a single detail with its master

When you deal with historical data, many queries need a specific single record and rarely require other details. With this database denormalization technique, you can introduce a new foreign key column for storing this record with its master. When using this type of denormalization, don’t forget to add code that will update the denormalized column when a new record is added.

Advantages Disadvantages
No need to create joins for queries that need a single record Data inconsistencies are possible as a record value must be repeated

Example

Repeating a Single Detail with its Master

Often, users send not only messages but attachments too. The majority of messages are sent either without an attachment or with a single attachment, but in some cases users attach several files to a message.

We can avoid a table join by denormalizing the Messages table through adding the first_attachment_name column. Naturally, if a message contains more than one attachment, only the first attachment will be taken from the Messages table while other attachments will be stored in a separate Attachments table and, therefore, will require table joins. In most cases, however, this denormalization technique will be really helpful.

Adding short-circuit keys

If a database has over three levels of master detail and you need to query only records from the lowest and highest levels, you can denormalize your database by creating short-circuit keys that connect the lowest-level grandchild records to higher-level grandparent records. This technique helps you reduce the number of table joins when queries are executed.

Advantages Disadvantages
Fewer tables are joined during queries Need to use more foreign keys
Need extra code to ensure consistency of values

Example

Adding Short-Circuit Keys

Now let’s imagine that an email messaging service has to handle frequent queries that require data from the Users and Messages tables only, without addressing the Categories table. In a normalized database, such queries would need to join the Users and Categories tables.

To improve database performance and avoid such joins, we can add a primary or unique key from the Users table directly to the Messages table. This way we can provide information about users and messages without querying the Categories table, which means we can do without a redundant table join.

Drawbacks of database denormalization

Now you’re probably wondering: to denormalize or not to denormalize?

Though denormalization seems like the best way to increase performance of a database and, consequently, an application in general, you should resort to it only when other methods prove inefficient. For instance, often insufficient database performance can be caused by incorrectly written queries, faulty application code, inconsistent index design, or even improper hardware configuration.

Denormalization sounds tempting and extremely efficient in theory, but it comes with a number of drawbacks that you must be aware of before going with this strategy:

  • Extra storage space

    When you denormalize a database, you have to duplicate a lot of data. Naturally, your database will require more storage space.

  • Additional documentation

    Every single step you take during denormalization must be properly documented. If you change the design of your database sometime later, you’ll need to revise all rules you created before: you may not need some of them or you may need to upgrade particular denormalization rules.

  • Potential data anomalies

    When denormalizing a database, you should understand that you get more data that can be modified. Accordingly, you need to take care of every single case of duplicate data. You should use triggers, stored procedures, and transactions to avoid data anomalies.

  • More code

    When denormalizing a database you modify select queries, and though this brings a lot of benefits it has its price − you need to write extra code. You also need to update values in new attributes that you add to existing records, which means even more code is required.

  • Slower operations

    Database denormalization may speed up data retrievals but at the same time it slows down updates. If your application needs to perform a lot of write operations to the database, it may show slower performance than a similar normalized database. So make sure to implement denormalization without damaging the usability of your application.

Database denormalization tips

As you can see, denormalization is a serious process that requires a lot of effort and skill. If you want to denormalize databases without any issues, follow these useful tips:

  1. Instead of trying to denormalize the whole database right away, focus on particular parts that you want to speed up.
  2. Do your best to learn the logical design of your application really well to understand what parts of your system are likely to be affected by denormalization.
  3. Analyze how often data is changed in your application; if data changes too often, maintaining the integrity of your database after denormalization could become a real problem.
  4. Take a close look at what parts of your application are having performance issues; often, you can speed up your application by fine-tuning queries rather than denormalizing the database.
  5. Learn more about data storage techniques; picking the most relevant can help you do without denormalization.

Final thoughts

You should always start from building a clean and high-performance normalized database. Only if you need your database to perform better at particular tasks (such as reporting) should you opt for denormalization. If you do denormalize, be careful and make sure to document all changes you make to the database.

Before going for denormalization, ask yourself the following questions:

  • Can my system achieve sufficient performance without denormalization?
  • Might the performance of my database become unacceptable after I denormalize it?
  • Will my system become less reliable?

If your answer to any of these question is yes, then you’d better do without denormalization as it’s likely to prove inefficient for your application. If, however, denormalization is your only option, you should first normalize the database correctly, then move on to denormalizing it, carefully and strictly following the techniques we’ve described in this article.

For more insights into the latest trends in software development, subscribe to our blog.

CONTENTS

Authors:

Gleb B.

Gleb B.

Copywriter

Alex B.

Alex B.

Ruby/JS Developer

Rate this article!

Nay
So-so
Not bad
Good
Wow
59 rating, average 4.59 out of 5

Share article with

Comments (0)

There are no comments yet

Leave a comment

Subscribe via email and know it all first!