A database is a database

A database is a database

I see that a lot of developers seems to fail on what a database is and what it was made for.

Questions about: how can I get my last record, how can I order the records in a no normal way, how can I build this query that is totally not logical or how can I make my database to force some data into it, are question that arise every day by a lot of developer, much of them showing a lack of knowledge of what a database is, what a database is made for, and how does it work.

Let’s start from the beggining. A database is a collection of data. No more than that. First databases where only a couple of tables usually stored in separates files inside a folder.
Then it comes some database engines. This make managing data more easily, giving a handfull of tools to manage data, including index (although some of the first databases had files for indexes), store procedures, functions and some other toys to work with.

So, knowing that is the first step to understand what a database is. It’s a collection of tables, with some other functions around, that are powered by the database engine, which in turn do all the tricks around.

How data is store, depends on the database. I will only focus this text on relational database, managed using a language called SQL. (non SQL database doesn’t necessary work in any way telling here). usually data is store in files (or one file) that are managed by the engine,  storing blocks of data (to allow to retrieve them faster), and having pointers everywhere to find the data (like a file system). Well, if you think a little outside of the box, a file system is a database.

The SQL language

SQL means structured query language.
How this language works? well.. it’s use math. The query are executed in a 1 by 1 basic, using as input every row that accomplish a set of rules (the where in the query), taken from a table, or several of them (the from in the query), and showed to the user (or the process that asked for it) in the way the user ask for it (as selected in the select part).

There are other parts of the query, but this is not an sql manual, you can find a lot of them over internet. This is to explain, why a query must be logical to be executed.

There is no concept in a database that says you what row is first or last. The database doesn’t know about that concept. Well.. Some database know, but they are not telling you unless you specifically ask for it. That kind of database has logs with timestamps that can tell you which record comes before or after one. But that concepts is database specific.
So we will asume, because more of the cases are like this, that this is a common database. So, common databases can’t tell you which record comes first, or last, or after or before one. There is no concept behind that.
In fact, most database engine manual, tells you that there is no guarantee that a query without an order clause will return the records in the same order every time you execute it.

So, a simple query like this:

<br />
Select *<br />
From users<br />

can result in the same data, showed in differents ways, like:
User1
User2
User3

or

User2
User1
User3

Why the database do that? because every row fetched usually is fetch in the fastest way it can. That’s why SQL give the developer sentences like Order By, to allow the user to control that.

Indexes

Index are other part of the database that most developer seems to fail using them. An index is no other thing that the same data in the table, ordered in a way that the engine can find it faster. Usually database uses trees for store and complex searching algorithm to find them. But this is totally up to the engine. The index could be stored in a double linked list and searched using a simple divide and conquer search. The index points to the start of the row in the database file. Take in mind, that the index could take up to the same storage space that the original table, and that every index impact every insert in a table (and updates and deletes in some cases) with some overhead time, because the engine needs to update the data on them.

Order

And finally, going back to the order in a table. Order is natural. So a no natural order has to be implemented in a non natural way. Ordering months by name, in a table that has only month names, is no natural, since the database doesn’t know what order that is. The natural order for a database are string alphabetical, and numbers. Other kinds or order should be acchieved with a table that teach the database how to order, usually using an ordered key (a number) and the data in the non natural order.

Final words

So, a database is only a collection of data, powered by an engine that let’s you do a lot of things. The engine gives you flexibility to do a lot of things, but is responsability of the developer to understand how the data is stored and how to retrieve it. Rules are achieved in a combination of database constrains and enforced rules from the program responsable of storing and using that data. Giving some of that rules to the database could lead to some overhead in every process, but this is totally up to the rules behind every system, so there is no final rule about this.

Leave a comment

Your email address will not be published.


*