enouwee
Non ex transverso sed deorsum
- Reaction score
- 240
Let's do another little experiment.
I start out with my previous setup I've been using for 1 million entries.
1) The "words" consists of:
2) Now I create two additional tables, one being fixed-width, the other varchar-based:
3) On disk, we've got:
4) Let's add some data:
Note that I've had to filter out some rows because of duplicate values:
Thankfully the length is the same:
And so are the words inside:
5) How much disk space do the various tables use?
It shouldn't come as a surprise that the varchar(9) table uses the least space. I hope you're not too surprised that the constant-length table wastes quite a lot of bytes because the most common words are shorter than 9 characters. Nevertheless, it's funny to note that the original table with all and untruncated rows occupies less that the constant-length table.
What does this tell you? You should only use char(...) when your data is fixed length, like 14-digit coded dates: 20090120123000.
But I'm sure you'll tell us why the fixed-length strings will perfectly make sense in your case. After all, aren't you aiming at saving maybe 50-100KB on the words table when the additional join indexes you need already take up more than 10MB?
I start out with my previous setup I've been using for 1 million entries.
1) The "words" consists of:
Code:
mysql> desc test_words;
+-------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------------+------+-----+---------+-------+
| id | int(32) | NO | PRI | NULL | |
| word | varchar(255) | NO | UNI | NULL | |
+-------+--------------+------+-----+---------+-------+
mysql> select count(1) from test_words;
+----------+
| count(1) |
+----------+
| 9993 |
+----------+
mysql> select avg(length(word)) from test_words;
+-------------------+
| avg(length(word)) |
+-------------------+
| 6.3832 |
+-------------------+
1 row in set (0.01 sec)
2) Now I create two additional tables, one being fixed-width, the other varchar-based:
Code:
create table test_words_char (
id integer(32) primary key auto_increment,
word char(9) not null,
constraint unique test_words_char_u1(word)
);
create index test_words_char_word on test_words_char(word);
create table test_words_varchar (
id integer(32) primary key auto_increment,
word varchar(9) not null,
constraint unique test_words_varchar_u1(word)
);
create index test_words_varchar_word on test_words_varchar(word);
3) On disk, we've got:
Code:
12 test_words_char.frm
0 test_words_char.MYD
4 test_words_char.MYI
12 test_words.frm
196 test_words.MYD
324 test_words.MYI
12 test_words_varchar.frm
0 test_words_varchar.MYD
4 test_words_varchar.MYI
4) Let's add some data:
Note that I've had to filter out some rows because of duplicate values:
Code:
mysql> insert into test_words_char (word) select distinct left(word, 9) from test_words;
Query OK, 9777 rows affected (0.14 sec)
Records: 9777 Duplicates: 0 Warnings: 0
mysql> insert into test_words_varchar (word) select distinct left(word, 9) from test_words;
Query OK, 9777 rows affected (0.13 sec)
Records: 9777 Duplicates: 0 Warnings: 0
Thankfully the length is the same:
Code:
mysql> select avg(length(word)) from test_words_char;
+-------------------+
| avg(length(word)) |
+-------------------+
| 6.1663 |
+-------------------+
1 row in set (0.00 sec)
mysql> select avg(length(word)) from test_words_varchar;
+-------------------+
| avg(length(word)) |
+-------------------+
| 6.1663 |
+-------------------+
1 row in set (0.01 sec)
And so are the words inside:
Code:
mysql> select count(1) from test_words_char c where not exists (select 1 from test_words_varchar v where v.word = c.word);
+----------+
| count(1) |
+----------+
| 0 |
+----------+
1 row in set (0.09 sec)
5) How much disk space do the various tables use?
Code:
12 test_words_char.frm
320 test_words_char.MYD
256 test_words_char.MYI
[B]588 tota[/B]l
12 test_words_varchar.frm
192 test_words_varchar.MYD
256 test_words_varchar.MYI
[B]460 tota[/B]l
12 test_words.frm
196 test_words.MYD
324 test_words.MYI
[B]532 total[/B]
It shouldn't come as a surprise that the varchar(9) table uses the least space. I hope you're not too surprised that the constant-length table wastes quite a lot of bytes because the most common words are shorter than 9 characters. Nevertheless, it's funny to note that the original table with all and untruncated rows occupies less that the constant-length table.
What does this tell you? You should only use char(...) when your data is fixed length, like 14-digit coded dates: 20090120123000.
But I'm sure you'll tell us why the fixed-length strings will perfectly make sense in your case. After all, aren't you aiming at saving maybe 50-100KB on the words table when the additional join indexes you need already take up more than 10MB?