ActiveRecord: The Great SQLizer Pt. 2

On advanced SQL commands, table creation & ActiveRecord methods

Hassan
12 min readJan 6, 2020

In this article, we’re going to finish off raw SQL with some advanced commands/syntax and table creation, then we’re going to go into some ActiveRecord and talk about some important methods. Although it is not required, you may find it beneficial to check out Pt. 1 of this article series: https://medium.com/@hbmalik88/activerecord-the-great-sqlizer-pt-1-cb826eb17210.

Advanced SQL Commands

Timestamps — Timestamps are important pieces of data that we often need in our applications, we also need to be able to manipulate/extract information from this data. Fortunately, SQL provides us with a command that let us do that: the extract command lets us extract parts of a date. The syntax is SELECT extract(day from payment_date) FROM payments. The from in the extract syntax is conventionally lowercase, because it helps avoid confusion that we’re pulling from a column and not a table. Different SQL engines may use slightly different syntax, so it is best to check online documentation for more on date/time information.

Mathematical Functions — SQL comes with a lot of built in functions for numeric datatypes which you can use to operate on column values. Some of the major ones are below:

  1. ABS(x) — Gives you the absolute value of a number.
  2. MOD(x,y) — Gives you the remainder of y into x.
  3. SIGN(x) — Returns 1 if the number is positive, -1 if it is negative and 0 if it is 0.
  4. FLOOR(x) — Returns the floor of a number.
  5. CEIL(x) — Returns the ceiling of a number.
  6. POWER(x, y) — Returns the value of x to the y power.
  7. ROUND(x) — Rounds off to the nearest whole integer.

String Operators & Functions

SQL provides us with string operators and functions that let us manipulate strings, some of them are below:

  1. || — Used for string concatenation.
  2. len(str) — Returns the length of a string.
  3. upper(str) — Returns the string in uppercase.
  4. lower(str) — Returns the string in lowercase.
  5. Regular expressions — You can also use regular expressions for more complex matches. Please refer to documentation for your specific SQL engine.

Subqueries — Eliminates the need for multiple queries, it lets you run a query on a query. In order to include a subquery, you can put parentheses around the subquery and put it as a condition in the WHERE clause of a SQL statement.

Self Join — You use self join when you want to combine rows in a table with other rows in the same table. To perform a self join you need to use a table alias to be able to differentiate between the left and right tables. Self join doesn’t actually use the JOIN keyword (but you can use JOIN in the syntax; you would change the WHERE to ON and JOIN would come in between the first and second table), instead it uses aliases to pull off the desired functionality. An example is: SELECT e1.employee_name FROM employee AS e1, employee AS e2 WHERE e1.employee_location = e2.employee_location AND e2.employee_name = ‘Joe’. This would be returning the rows that have the same location as Joe.

Generally, it makes sense to rewrite queries that refer to themselves (such as in some subqueries) as self-joins, because self-joins are more efficient and easier to read. Self-joins can be done with left or right outer joins.

Table Creation

Creating your tables is important! It’s the first thing you do when you create your database. Remember the migration class that you create before you run that rails db:migrate command? This section goes into a bit more depth about what is going on behind the scenes with the creation of those tables.

Before I talk about the commands that go into table creation, I’ll talk about an extremely important thing to be aware of is datatypes! The different types offered can be dependent on your SQL engine, so it is best to documentation for all the possible datatypes. Some of the most common datatypes in PostgreSQL are: boolean, character, number and temporal. I’ll go into each of the aforementioned types below.

  1. Boolean — The keyword for a boolean datatype is bool or boolean. Booleans are always true/false and null if unknown. PostgreSQL will convert data into boolean types when inserting data into a boolean column.
  2. Character — Three types of char; single char, fixed-length character string char(n), variable-length character string varchar(n). If you insert into a fixed-length column and the string is less than the length, PostgreSQL will pad the string, if the length is longer PostgreSQL will issue an error. Varchar strings are not padded and can be up to length n.
  3. Numbers — PostgreSQL has two types of numbers: integers and floating point numbers. There are 3 different types of ints: smallint, int and and serial. A small int is defined as 2 byte signed integer with a range of -32768 to 32768. An int is defined as 4 byte signed integer with a range of -214783648 to 214783648. Serial is defined the same as int, except PostgreSQL will populate values into the column automatically (similar to auto_increment in other databases).
  4. Temporal — Temporal types store date/time information. The different types are: Date, Time, Timestamp, Interval and Timestamptz. Date is defined as storing date data. Time is defined as storing time data. Timestamp is defined as storing both date and time, Interval stores the difference between timestamps. Timestamptz is defined as storing both timestamp and timezone data.

Another concept that I’d like to discuss before diving into the table creation syntax is of primary and foreign keys.

Primary Key — A table can have only one primary key, it is good practice to add one for every table. When you add a primary key it adds a unique index for specific rows. We normally add a primary key upon creation of a table, the syntax is: CREATE table_name (column_name data_type PRIMARY KEY, column_name data_type).

Foreign Key — a key that refers to the primary key of another table. The table that contains the foreign key is the referencing table or the child table, whereas the table with the primary key is the reference or parent table. You can also have multiple foreign keys in a table. Foreign keys are often used to create joins between tables.

Finally! We’re going to move onto the actual syntax which I mentioned in the primary key section and will now break down the syntax which is pretty self-explanatory. An example of a table creation SQL line is CREATE account (user_id serial PRIMARY KEY, first_name VARCHAR(50)). This line creates an account table with an auto-increasing user_id column and a first_name column that accepts strings up to 50 characters in length.

CREATE statements also allow you to have something called table constraints and column constraints.

Column Restraints

  1. NOT NULL — value cannot be NULL. An important thing to note is that NULL is missing information, i.e. an unknown email address would be NULL but if there isn’t an email address then then value would be ‘’.
  2. UNIQUE — value must be unique across the whole table, the column can have multiple NULL values because postgreSQL treats each NULL value as unique. However, the standard is only one NULL value is allowed in a UNIQUE column.
  3. PRIMARY KEY — combo of the NOT NULL and UNIQUE constraints. Only one column is allowed for primary key under column constraints, however under table constraints more columns can be assigned to the primary key.
  4. CHECK — enables you to check a condition every time an insert or update happens. For example, the values of a column must be positive.
  5. REFERENCES — foreign key constraint.

Table Restraints

  1. UNIQUE (col_list) — the columns you include in the parentheses have to have unique values.
  2. PRIMARY KEY (col_list) — the columns you include in the parentheses define the PRIMARY KEY.
  3. CHECK — check for a condition on insert/update.
  4. REFERENCES — defines the whole table as foreign key.

You can also name your constraints with the CONSTRAINT keyword followed by the name and then the constraint condition. Naming your constraint is important, because you can use it as a clue when there’s an error because SQL will show the constraint name. Otherwise, a random guid-based name will be generated for the constraint.

The INSERT command lets you insert rows into a table, the syntax is as follows: INSERT INTO table(col1,col2) VALUES(val1,val2). To put in multiple rows you would do VALUES(val1, val2), (val1, val2) and so on. You can also insert data from another table using the following syntax: INSERT INTO table SELECT col1,col2 FROM another_table WHERE condition.

The UPDATE command lets you update certain rows, the syntax is as follows: UPDATE table SET col1 = val1, col2 = val2, … WHERE condition. The SET portion of the syntax specifies which columns should have this value changed to and the WHERE clause determines the condition.

The DELETE syntax is pretty simple: DELETE FROM table WHERE condition.

You can also alter the architecture of your table with the ALTER TABLE syntax: ALTER table action. The different actions available are the following: ADD COLUMN, DROP COLUMN, RENAME COLUMN, ADD CONSTRAINT and RENAME TO.

The syntax for dropping a table is: DROP TABLE IF EXISTS table_name (the IF EXISTS is optional). You can also include the keyword RESTRICT at the end of the DROP TABLE statement to make sure it doesn’t execute if other tables depend on it for data. If you want to drop the table along with tables dependent on it you can use the CASCADE keyword at the end of the DROP TABLE statement.

Another statement to be aware of is the VIEW statement. The VIEW statement returns a database object that is of a stored query. To create a view we use the CREATE VIEW statement, the syntax is: CREATE view_name AS query; where query may be a complex join query. View is a virtual table and it isn’t actually copying data. You can also do ALTER VIEW or DROP VIEW as well.

ActiveRecord Methods

In this section, I’ll be going through the different ActiveRecord methods, their use and some caveats that may exist in their use. Before I jump into the methods I want to address a few things: ActiveRecord Relation objects, scopes and the infamous ‘N+1’ problem.

ActiveRecord relations are of the ActiveRecord::Relations class and are returned for certain methods. They are inherently different from other methods, because these methods don’t necessarily fire a query and store it in memory UNTIL you operate on them. An example is @posts.first.title, this query is only fired off because we called the .title on @posts.first. Methods implemented in the ActiveRecord::FinderMethods will not return relation objects. You can also chain methods for relations, an example would be to order post ascending and then grab the first. If you run into a relation that you want to act as an array, you can use the to_a method. ActiveRecord relations are already a performance boost that Rails gives us to use over raw SQL!

Scopes let you create an abstraction for a long where statement. For example in your code you can have scope :important, -> {where(is_important: true)}. This would let us do a command such as Post.important.all, scopes can be used in place of class methods. However, it is generally recommended to create class methods instead inside your model. The only difference between scopes and regular class methods is that if you have a conditional equaling false, the scope will still return an ActiveRecord Relation, whereas a class method will return nil which will cause a no method error for chained methods.

The N+1 problem is encountered when we execute one query but then end up executing N queries based on that one query. An example would be doing @User.all to query for all the posts and then looping through each of them to access another property such as user.city, where city would be another table. Because we’re trying to access another table, we’re running into this issue of firing off n more queries, hence the name N + 1. In order to work around this we can use something called eager loading. Eager loading lets us side load another table in one query so we aren’t querying every time we loop through each user. The syntax would be as follows: User.includes(:city). You can also have conditionals for your eager loading, but the problem with that is that it will eager load records that don’t have a match for because it uses left outer join. You can use .join instead to only get matches that completely match both tables.

Below, I will go into some ActiveRecord methods that I believe are important to know.

  1. .any? / .many? / .exists? — These methods will return a boolean based on the method, which are self-explanatory based off of their names.
  2. .find_by_sql — This method can be used to execute a raw SQL statement, but is generally not recommended because in essence you would be hardcoding your application.
  3. .select(x) — The select method will return a column from a table just like in SQL. You can do User.select(:id). You can also use aliases but you need to include them in strings not symbols i.e. @user = User.select(“users.id AS user_id”). This creates a new attribute called user_id allowing you to do @user.first.user_id. A thing to watch out for when using .select and only selecting specific columns is that you’re essentially initializing a model object with missing fields, so if you try to access a column that you haven’t selected then you’ll get a ActiveModel::MissingAttributesError. You can use .distinct chained to a .select to get distinct values from a column: i.e. Client.select(:name).distinct.
  4. .group — This method is the equivalent of GROUP BY in raw SQL. An example is Post.joins(:tags).group(“tags.name”).count. This will return an object with the different tag names with their count as the value. Remember, GROUP BY statements are often used with aggregate functions.
  5. aggregate methods — Aggregate functions discussed in pt.1 of this article series can be used as methods such as in the .group example above.
  6. .take(x) — Returns an array of model instances. The SQL equivalent is SELECT * FROM table LIMIT (x).
  7. .first — Returns the first row based on the id column of the table. The method can also take an argument of an integer indicating how many records you want returned. The SQL equivalent is: SELECT * FROM table ORDER BY client.id ASC LIMIT (x). If you are running an .first chained to a .order method, then first would be the first row out of that ordered list.
  8. .last — same as last, except it is in descending order.
  9. .order — Will order your results in ascending order, the SQL equivalent is SELECT col1 FROM table ORDER BY col1 ASC.
  10. .limit(x) & .offset(x)— .limit will limit your records to a certain number and offset will offset by the number passed in as an argument. An example is Client.limit(5).offset(30), this returns records 31–35.
  11. .lock — This method is used in a concept called pessimistic locking. Pessimistic locking as opposed to optimistic locking will let you lock an individual row on a table to prevent it from being updated. The syntax would be something like: Item.lock.first. Optimistic locking involves setting up your table architecture to have a column called lock_version that has an integer datatype. The lock_version increases on every save to a specific record. An example to understand how optimistic locking works is say you have two variables holding the first record in a table: c1 = Client.find(1), c2=Client.find(1). You then try to update the first’s name and save successfully, but when you do try to update the second one it will throw an exception because the second one will have a lower lock_version.
  12. finder methods — For every attribute in your table, ActiveRecord will give you a finder method for that attribute. If you have a first_name attribute, you can do User.find_by_first_name.
  13. .where — lets you set conditions in your query. An example is Client.where(‘order_count = ?’, params[:count]). The same statement can be written as Client.where(‘order_count = #{params[:count]}’), but that is generally not recommended because the count variable is unescaped and that can lead to SQL injection attacks should your attacker be aware of this.
  14. find_each(start, end) & find_in_batches(start, end) — The limit for storing records in memory is around a 1000. If its more than you might be interested in batching your queries with find_each and find_in_batches. Find_each can take a start and end argument or just a start argument, these arguments point to the primary key of a record in a table. Find_in_batches takes the same arguments as find_Each, the only difference between the two is that find_each subjects records individually to a block whereas find_in_batches will subject an array of records to a block. So you would be throwing those array of records into a function that’s expecting that input.
  15. .readonly — Readonly objects are useful, because you can’t alter them, the syntax is client = Client.readonly.first. Trying to alter a readonly record will raise an exception.
  16. .pluck(:column) — pluck lets you ‘pluck’ certain columns from a table and get those values, instead of having to pull the records and map over them to get the ids for example. Client.pluck(:id) would be replacing Client.select(:id).map {c| c.id}. Pluck is essentially creating Ruby objects, an array, without having to construct ActiveRecord objects, this is great on performance for a large or often running query. You can pluck off of joins as well. Pluck can’t be chained with other methods because it’s essentially returning an array and not an AR object.
  17. .explain — will explain the queries behind the AR methods you run on a model.
  18. enums — Enums map an array or hash of values to integers in the database, You can map an :available to 0 and :unavailable to 1, the syntax is as follows: enum availability: [:available, :unavailable]. If a particular record has a column value of 0 then running @product.available on it would return true. Setting up your enums based on an array can become troublesome if you’re adding or removing values from in between because these methods are relying on their position within the array. Enums can be done through a hash with the following syntax: enum availability: {available: 0, unavailable: 1}.

Although this is the end of this article series, you can expect more SQL & ActiveRecord articles in the future as I believe having strong SQL skills and being able to interact with databases efficiently is important to being a great full-stack / back-end developer.

--

--