SQL and Excel: Database Talk

Pedro Prado
3 min readFeb 4, 2021

Why learning about relational databases changed the way I approach Excel

Let’s face it. Everybody tried at least once creating a “database” in Excel. Register your daily activities, create a KPI, graphs or anything else that your company didn’t want to invest time or resources to allocate somebody from IT to help you. So you decided to adventure yourself as an engineer team of one man to create your own database in spreadsheets.

It has columns. It has rows. It stores your data. Right?

The thing is, Excel was not meant for this job.

For simple tasks such as making quick calculations and bar/pizza graphs, you won’t notice any difference. But you will realize that once your spreadsheet becomes overpopulated with more and more functions, VLOOKUPs, pivot tables/charts (and obviously, data) you will experience frequent lags, freezes and an enormous time to load it up. I am not even mentioning crashing which may happen frequently after it has become that behemoth you call “KPI-calculator” spreadsheet (that’s what mine was called, sorry).

Been there, done that.

Take a look at relational databases which commonly use systems well known such as MySQL or Postgre. Relational databases as the name suggests, have your info spread throughout many tables which are (generally) related to each other. You may be wondering: “Why the hell I want to fractionate my data? I want it all bunched up in the same table, so I can look how beautiful it is!”. Well, you may want to do that but space may be an issue in your company if they base their business in a lot of data and calculations around it.

Therefore storing your data into different tables and relating each one of them to another through Primary/Foreign keys (instead of having every possible combination of each column in the same table) will save you space in most cases.

“Alright. What is this primary key you mentioned?”

In a simplified view, imagine that every table you have, have a very unique ID as an attribute (or a column of your table, if you’d like to visualize things better). This is a primary key. This table where the primary key is may communicate to another table where a compatible foreign key is. Primary and foreign keys are the intermediators of this connection between two tables. Think about an embassador of a country as both a primary and a foreign key. He transits between affairs of two countries (our two tables) in the same language (using the same set of keys both in primary and foreign) to make a connection between them.

We need primary and foreign keys here!

So imagine now that those 100s of tables you have in your database connect to each other through this primary-foreign key logic. You will save a lot of space condensing information in small tables instead of having a big as* table with all the possible combinations /redundancies allowed by your data complexity, won’t you?

That’s the primary reason you’d use a relational database instead of storing it in an Excel spreadsheet — which would behave like a NoSQL database.

If you hit the “Microsoft Excel has stopped working” landmark with your spreadsheets, maybe it’s time to move on to something more appropriate.

Some interesting articles about our discussion:

Great article by Dorota on spreadsheets, database and corporate shenanigans related to Excel and Relational Databases: https://academy.vertabelo.com/blog/import-data-excel-database/

A FB engineer telling how they moved from SQL to NoSQL when issues such as loss of developer agility and consistency between cache & DB became unbearable as the company skyrocket in the mid 2000s: https://blog.yugabyte.com/facebooks-user-db-is-it-sql-or-nosql/

A relatively old (but interesting) blog post on the topic of Excel as a NoSQL database: https://gobansaor.wordpress.com/2010/03/02/excel-as-a-document-oriented-nosql-database/

--

--

Pedro Prado

Identity crisis between a Data Scientist and Data Engineer