All About Data
In MySQL (or, for that matter, any other RDBMS), every field in a table is associated with a particular data type, which defines the kind of data that may be entered into that field. This data type is the fundamental characteristic of that field - it states whether the value entered into the field must be numeric, string, binary or timestamp, determines the rules for performing calculations on that value, and helps enforce the consistency of your records. Selecting and attaching appropriate data types to your table fields is thus one of the most important things a database developer must do in the design phase of a project.
Now, MySQL supports a number of different data types, and one of the more versatile and commonly-used ones is the string data type (which itself has many sub-variants, each serving a different purpose). You’ve almost certainly used string values in your interaction with a database before - most databases would be pretty pointless without them - but have you ever taken a look at the MySQL manual and seen the rich variety of string processing functions MySQL gives you?
If you haven’t, you’re in for a pleasant surprise in the next few minutes. You see, MySQL comes with an extremely large collection of built-in functions (over 150 at last count) and many of them are designed specifically to simplify the task of string manipulation. This string processing toolkit is so powerful that many of the tasks developers normally execute at the application level - concatenation, pattern matching, search-replace operations, whitespace trimming - can actually be performed at the database layer itself.
Surprised? Don’t be. Instead, keep reading, because over the next few pages, this article will offer you a broad overview of MySQL’s string manipulation capabilities, serving as both a handy reference and a tool to help you write more efficient SQL code. Regardless of whether you’re new to MySQL or if you’ve been working with it for a while, you should find something interesting in here.
String Theory
In the MySQL world, the word “string” refers to a sequence of characters enclosed in quotes. Consider the following examples, all of which are valid MySQL string values:
"hello"
"Bond 007"
"molly dook!"
Now, here’s the twist: though string values are pretty easy to understand, putting them into a MySQL table is somewhat more involved. This because every table field that is to contain a string must be associated with a string data type, and MySQL supports not one, not two, but rather six different string data types, ranging from simple one-character string types to fields that support large blocks of text or binary data.
The simplest of them is the CHAR data type, primarily used for fixed-length strings. The length of the string that can be stored in a CHAR field ranges from 0 to 255 bytes and it must be specified at the time of field definition. An interesting characteristic of this field is that values smaller than the specified length will be right-padded with spaces, while values larger than the specified size will be automatically truncated.
A close cousin (and more efficient variant) of the CHAR data type is the VARCHAR data type, which is useful for variable-length strings. The major difference between the two types is the manner in which data is stored: the VARCHAR type uses only the number of bytes actually needed to store the string. As a result, shorter values are not padded with spaces when inserted into a field declared as type VARCHAR.
For strings greater than 255 characters in length, MySQL provides the TEXT data type. Typically used to store large blocks of text, this data type does not require the user to specify any size. If the data inserted into the field exceeds the limits set by the system, the data simply gets truncated.
A variant of the TEXT data type is the BLOB data type, which is useful for storing binary data such as images or audio files. In reality, a BLOB data type is simply a TEXT type with the added feature of being compared and sorted in a case-sensitive manner (TEXT types are case-insensitive). Both TEXT and BLOB types also come with sub-variants, such as TINYTEXT and MEDIUMTEXT, which differ primarily in the amount of data they can hold.
Consider the following example, which demonstrates how to create fields of these types, as well as insert some data into them:
mysql> CREATE TABLE strings (charField CHAR(3), varcharField VARCHAR(10), textField TEXT);
Query OK, 0 rows affected (0.00 sec)
mysql> INSERT INTO strings (charField, varcharField, textField) VALUES ('e', 'John Doe', 'It\'s a long way from Sunday today');
Query OK, 1 row affected (0.00 sec)
mysql> INSERT INTO strings (charField, varcharField, textField) VALUES ('abcdef', '[email protected]', 'So I said, "Come on, knave, and I\'ll show you who\'s better!"');
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM strings;
+-----------+--------------+--------------------------------------------------------------+
| charField | varcharField | textField |
+-----------+--------------+--------------------------------------------------------------+
| e | John Doe | It's a long way from Sunday today |
| abc | [email protected] | So I said, "Come on, knave, and I'll show you who's better!" |
+-----------+--------------+--------------------------------------------------------------+
2 rows in set (0.01 sec)
Notice how, when a value entered into a CHAR field is larger than the allocated field size, it gets automatically truncated.
When dealing with a string value in MySQL, it’s important to remember that string values must always be enclosed in single or double quotes, as in the example above. If your string value itself must contain a quote, you can always escape the symbol by preceding it with a backslash (the standard escape character in MySQL). The following example illustrates:
mysql> INSERT INTO strings (textField) VALUES ('So I said, "Come on, knave, and I'll show you who's better!"');
ERROR 1064: You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'll show you who's better!"')' at line 1
mysql> INSERT INTO strings (textField) VALUES ('So I said, "Come on, knave, and I\'ll show you who\'s better!"');
Query OK, 1 row affected (0.00 sec)
So, now you’ve got some strings in your database. How about doing something with them?
The Length Of Your Hands
One of the most basic things you can do with a string is find out how long it is. MySQL comes with a handy LENGTH() function that does just that. Take a look:
mysql> SELECT LENGTH("advertisements");
+--------------------------+
| LENGTH("advertisements") |
+--------------------------+
| 14 |
+--------------------------+
1 row in set (0.00 sec)
String values can easily be concatenated with the CONCAT() function, which accepts a list of the fields to be concatenated. Here’s a simple example:
mysql> SELECT CONCAT("Neo","Anderson");
+--------------------------+
| CONCAT("Neo","Anderson") |
+--------------------------+
| NeoAnderson |
+--------------------------+
1 row in set (0.00 sec)
Want a space between the concatenated values? Fret not, MySQL also has the CONCAT_WS() function, which allows you to specify a separator between the concatenated values. Here’s a quick demo:
mysql> SELECT CONCAT_WS(" ","Neo","""The One""","Anderson");
+-----------------------------------------------+
| CONCAT_WS(" ","Neo","""The One""","Anderson") |
+-----------------------------------------------+
| Neo "The One" Anderson |
+-----------------------------------------------+
1 row in set (0.00 sec)
In this case, the first parameter represents the separator to be used.
How about a more useful real-life example for this function? Consider the following example that fetches the “first_name” and “last_name” fields from the “users” table, and forms a “full_name”, suitable for use in your Web page:
mysql> SELECT CONCAT_WS(" ", first_name, last_name) AS 'full_name' FROM users
WHER username="boba";
+---------------+
| full_name |
+---------------+
| Boba Fett |
+---------------+
1 row in set (0.00 sec)
Neat, huh?
Talking about spaces, programmers often forget to remove leading and trailing spaces around the values to be stored in the database, especially in the world of Web-based applications. In such situations, the spaces are stored in the database, leading to much heartburn when it comes to displaying them. If you’ve ever had this problem, there’s an easy solution - MySQL provides a set of handy “trimming” functions, as shown below:
mysql> SELECT LTRIM(" molly dook"), RTRIM("right handed ");
+----------------------------+----------------------------+
| LTRIM(" molly dook") | RTRIM("right hander ") |
+----------------------------+----------------------------+
| molly dook | right handed |
+----------------------------+----------------------------+
1 row in set (0.00 sec)
As seen above, the LTRIM() and RTRIM() function can be used to remove leading and trailing spaces respectively. Oh yeah - in case you were wondering what on earth a “molly dook” was, it’s Australian for “left-handed person”.
If you happen to be ambidextrous, no sweat - the TRIM() function removes both leading and trailing spaces, as shown below:
mysql> SELECT TRIM(" look ma no hands! ");
+------------------------------------+
| TRIM(" look ma no hands! ") |
+------------------------------------+
| look ma no hands! |
+------------------------------------+
1 row in set (0.00 sec)
But the best part is yet to come. You see, you can also add keywords to the TRIM() function to remove leading and/or trailing occurrences of a pre-defined character (instead of traditional whitespace). Take a look:
mysql> SELECT TRIM(LEADING "_" FROM "_Neo_");
+--------------------------------+
| TRIM(LEADING "_" FROM "_Neo_") |
+--------------------------------+
| Neo_ |
+--------------------------------+
1 row in set (0.02 sec)
mysql> SELECT TRIM(TRAILING "_" FROM "_Neo_");
+---------------------------------+
| TRIM(TRAILING "_" FROM "_Neo_") |
+---------------------------------+
| _Neo |
+---------------------------------+
1 row in set (0.00 sec)
Feeling paranoid? Remove all the underscore characters with the BOTH keyword:
mysql> SELECT TRIM(BOTH "_" FROM "_Neo_");
+-----------------------------+
| TRIM(BOTH "_" FROM "_Neo_") |
+-----------------------------+
| Neo |
+-----------------------------+
1 row in set (0.00 sec)
Search And Destroy
How about finding a particular substring within a longer string? No sweat!
mysql> SELECT LOCATE("he", "Developer Shed");
+--------------------------------+
| LOCATE("he", "Developer Shed") |
+--------------------------------+
| 12 |
+--------------------------------+
1 row in set (0.00 sec)
As you can see above, this LOCATE() function takes two parameters - the substring to look for, and the string in which to look. It returns the location of the first occurrence of the substring in our function.
Another variant of the LOCATE() function takes a third parameter: the position to start looking for the substring in question. Here’s a simple example:
mysql> SELECT LOCATE("friend", "A friend in need is a friend indeed", 10);
+-------------------------------------------------------------+
| LOCATE("friend", "A friend in need is a friend indeed", 10) |
+-------------------------------------------------------------+
| 23 |
+-------------------------------------------------------------+
1 row in set (0.00 sec)
As you can see, the function returned the location of the second occurrence of the substring “friend”, since it only started looking for it from the tenth character of the original string. Omit the third parameter and you will get the following:
mysql> SELECT LOCATE("friend", "A friend in need is a friend indeed");
+---------------------------------------------------------+
| LOCATE("friend", "A friend in need is a friend indeed") |
+---------------------------------------------------------+
| 3 |
+---------------------------------------------------------+
1 row in set (0.00 sec)
An alternative way of accomplishing the same thing is with the INSTR() function, whose purpose is similar to that of the LOCATE() function, except that the positions of the parameters are interchanged. Here’s an example:
mysql> SELECT INSTR("Developer Shed","he");
+------------------------------+
| INSTR("Developer Shed","he") |
+------------------------------+
| 12 |
+------------------------------+
1 row in set (0.00 sec)
In case you were wondering what return value the functions produce if the specified substring cannot be found, the answer’s both simple and logical: 0
Slice And Dice
You can chop strings in order to extract specific substrings from them as well - just use the LEFT() and RIGHT() functions, and tell MySQL how many characters you need, as in the example below:
mysql> SELECT LEFT("Developer Shed", 7);
+---------------------------+
| LEFT("Developer Shed", 7) |
+---------------------------+
| Develop |
+---------------------------+
1 row in set (0.00 sec)
mysql> SELECT RIGHT("Developer Shed", 4);
+----------------------------+
| RIGHT("Developer Shed", 4) |
+----------------------------+
| Shed |
+----------------------------+
1 row in set (0.00 sec)
Want more power? Make way for the undisputed heavyweight in this category, the SUBSTR() function. As the name implies, this function allows you to slice and dice strings into smaller strings. Here’s what it looks like:
SUBSTRING (str, pos)
where “str” is the string in question and “pos” represents the position to begin slicing at (note that in MySQL, the first character of a string is at position 1).
Here’s an example which demonstrates how this works:
mysql> SELECT SUBSTRING("unlucky", 3);
+-------------------------+
| SUBSTRING("unlucky", 3) |
+-------------------------+
| lucky |
+-------------------------+
1 row in set (0.01 sec)
You can also specify the length of the string to be extracted, by adding a third argument to the function call:
mysql> SELECT SUBSTRING("industry", 3, 4);
+-----------------------------+
| SUBSTRING("industry", 3, 4) |
+-----------------------------+
| dust |
+-----------------------------+
1 row in set (0.00 sec)
Cool, huh?
There’s also the FIND_IN_SET() function, which can be used to look for a specific element in a set. Consider the following example, which illustrates:
mysql> SELECT FIND_IN_SET("Spider Man", "Super Man,Hulk, Spider Man, Captain America");
+------------------------------------------------------------------------+
| FIND_IN_SET("Spider Man", "Super Man,Hulk,Spider Man,Captain America") |
+------------------------------------------------------------------------+
| 3 |
+------------------------------------------------------------------------+
1 row in set (0.00 sec)
As you can see, the FIND_IN_SET() function takes two parameters: the first is the string you are looking for and the second is the set of sub-strings to search in, separated by a comma. A point to keep in mind: as documented by the official MySQL manual, “this function will not work properly if the first argument contains a comma”.
Under The Influence
You can also perform search-replace operations within a string, with the REPLACE() function. Here’s an example:
mysql> SELECT REPLACE("yo ho ho and a bottle of rum", "o", "oo");
+----------------------------------------------------+
| REPLACE("yo ho ho and a bottle of rum", "o", "oo") |
+----------------------------------------------------+
| yoo hoo hoo and a boottle oof rum |
+----------------------------------------------------+
1 row in set (0.00 sec)
In case all this is a little too complicated for you in your inebriated state, the second and third arguments specify what to look for and what to replace it with.
And if you want to really test how drunk you are after that “boottle of rum”, try counting the number of times Santa says “ho ho ho” in the string shown below:
mysql> SELECT REPEAT("ho ho ho ", 12);
+--------------------------------------------------------------------------------------------------------------+
| REPEAT("ho ho ho ", 12)
|
+--------------------------------------------------------------------------------------------------------------+
| ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho ho |
+--------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)
As you’ve seen, the REPEAT function allows you to specify the number of times the string should be repeated. It’s unlikely you’ll use this much in the real world, but it’s good to know it’s there!
Finally, the INSERT() function (not to be confused with the INSERT command that does something similar) allows you insert a substring within another string at a specified location. Look at this next example carefully to see what I mean:
mysql> SELECT INSERT("That's not fair",5,6," ain't");
+----------------------------------------+
| INSERT("That's not fair",5,6," ain't") |
+----------------------------------------+
| That ain't fair |
+----------------------------------------+
1 row in set (0.00 sec)
This function take four parameters: the original string value, the location at which to begin the surgical insertion, the number of characters of the original string that are to be replaced, and finally the substring to be inserted. Even though this looks a trifle confusing at first glance, playing around with it will give you a good idea of what it can do.
Just In Case
Finally, if you’re looking to perform a little cosmetic surgery on your strings, a good place to start is the LOWER() or LCASE() function, which converts a string to only use lower-case characters (there’s also a UPPER() or UCASE() function in case you change your mind later).
mysql> SELECT LOWER("NEO");
+--------------+
| LOWER("NEO") |
+--------------+
| neo |
+--------------+
1 row in set (0.00 sec)
and
mysql> SELECT UCASE("neo");
+--------------+
| UCASE("neo") |
+--------------+
| NEO |
+--------------+
1 row in set (0.00 sec)
Want to encode a string without much fuss? Just use the built-in REVERSE() function - try this out, and you have an easy-to-use (and, may I warn you, easy-to-break) encrypted string
mysql> SELECT REVERSE("password");
+---------------------+
| REVERSE("password") |
+---------------------+
| drowssap |
+---------------------+
1 row in set (0.00 sec)
You’ve already seen how to TRIM() a string to remove unwanted characters. Now, how about doing the opposite: padding a string with a special character?
mysql> SELECT LPAD("turkey", 10, "@");
+-------------------------+
| LPAD("turkey", 10, "@") |
+-------------------------+
| @@@@turkey |
+-------------------------+
1 row in set (0.00 sec)
mysql> SELECT RPAD("turkey", 10, "@");
+-------------------------+
| RPAD("turkey", 10, "@") |
+-------------------------+
| turkey@@@@ |
+-------------------------+
1 row in set (0.00 sec)
As you can see above, the LPAD() and RPAD() functions take three parameters -the string to pad, the length to which the string should be padded, and the character to use for padding.
It is important to note here that if the length of the string is greater than the length specified in the second argument to RPAD() or LPAD(), then the string will get truncated.
Finally, if you ever-wanted to know the ASCII representation of a character, but didn’t know whom to ask, the MySQL ASCII() function has the answers.
mysql> SELECT ASCII('a');
+------------+
| ASCII('a') |
+------------+
| 97 |
+------------+
1 row in set (0.00 sec)
Log Out
And that’s about it for the string functions in MySQL. I started with a quick explanation of the different string data types in MySQL: the CHAR, VARCHAR, TEXT and BLOB types. Each of these types has its own pros and cons and it’s important to factor them in at the time of database design, because once you have created your database, it’s a tough task to keep changing the properties of your table columns.
Then, I moved on to the functions that help you manipulate string data in the database. This included the CONCAT() functon and its close cousin, the CONCAT_WS() function, together with the TRIM() function that allows you to remove unwanted whitespace from around string values.
Next came the functions for searching within a string, starting with the plain-vanilla LOCATE() function that allows you to search for a particular substring within a string, and moving to more complex functions like the FIND_IN_SET() function that allow you to carry out more complex searches.
This was followed by the two Rs: the REPLACE() and REVERSE() functions. As their names suggest, these functions are useful for replacing segments within a string or for reversing string values. Finally, I closed things up with descriptions of functions that come in handy when formatting string values for display purposes, such as the ASCII(), LPAD(), RPAD(), UCASE() and LCASE() functions.
And that’s about it. I hope you enjoyed this article, and that it offered you some insight into the string functions at your disposal in MySQL. Till next time…stay healthy!
Note: Examples are illustrative only, and are not meant for a production environment. Melonfire provides no warranties or support for the source code described in this article. YMMV!
This article was first published on 18 Dec 2003.