A Quick INSERT INTO Trick
From time to time you’re going to need to move some data from one table into another, in particular when generalizing or specializing code. I’ve seen amazingly large code blocks written to handle cases as simple as this, so here’s a simple trick you’re going to need some day:
INSERT INTO destination_table
(field_1, field_2)
SELECT
some_field, some_other_field
FROM
source_table;
You can also add conditions, e.g. to retrieve only the rows that were inserted yesterday:
INSERT INTO destination_table
(field_1, field_2)
SELECT
some_field, some_other_field
FROM
source_table
WHERE
condition = 1;
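As a concrete sketch of the “rows inserted yesterday” case, it might look like this (the table and column names are made up for illustration):

-- Copy only the rows created yesterday into a hypothetical archive table.
INSERT INTO orders_archive
(order_id, customer_ref)
SELECT
id, customer_id
FROM
orders
WHERE
created_at >= CURDATE() - INTERVAL 1 DAY
AND created_at < CURDATE();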
The good thing about this is that all data movement is kept within the database server: the data doesn’t have to travel from the server to the client and then back to the server again. It’s blazingly fast compared to other methods. You can also do transformations with regular SQL functions in your SELECT statement, which lets you handle simple conversions on the fly at essentially no extra cost.
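For example (again with made-up names), the SELECT list can transform the data as it moves:

-- Hypothetical transformation: upper-case one column and concatenate
-- two others while copying, all inside the server.
INSERT INTO destination_table
(field_1, field_2)
SELECT
UPPER(some_field),
CONCAT(some_other_field, '-', some_third_field)
FROM
source_table;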
3 thoughts on “A Quick INSERT INTO Trick”
Mats,
This works well in general but may fall down in MySQL if either or both of the tables are quite large. This method is equivalent and is most often faster for big source/destination tables:
SELECT some_field, some_other_field
FROM source_table
INTO OUTFILE '/tmp/mydata.csv';

LOAD DATA INFILE '/tmp/mydata.csv'
INTO TABLE destination_table
(field_1, field_2);
I know, it makes little sense from a programmer’s perspective, yet if you dig into the MySQL docs they admit it can be faster. In my experience it’s a far faster way to dump and load big data than either the code above or using mysqldump!
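A detail worth spelling out, since the file is named .csv: SELECT … INTO OUTFILE writes tab-separated output by default, and the round trip above only works because LOAD DATA INFILE assumes the same defaults. To get an actual CSV, spell the format out on both sides:

SELECT some_field, some_other_field
FROM source_table
INTO OUTFILE '/tmp/mydata.csv'
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n';

LOAD DATA INFILE '/tmp/mydata.csv'
INTO TABLE destination_table
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(field_1, field_2);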
Thanks for the comment! Didn’t know that going through an explicit temporary file could be faster, so thanks for the information. It seems reasonable when you think about it, as it leaves everything to mysqld itself, and possibly has a more efficient code path than the INSERT INTO .. SELECT .. version.
I’d like to learn more precisely why, and whether Drizzle is better at INSERT .. SELECT.
Here’s another great technique for moving data between servers… much faster than mysqldump or the FEDERATED engine for large tables.
mysql -hlocalhost -e "SELECT some_field, some_other_field FROM table1 INTO OUTFILE '/tmp/mydata.csv'"
mysql -hremotehost -e "LOAD DATA LOCAL INFILE '/tmp/mydata.csv' INTO TABLE table2 (field_1, field_2);"
I implemented an interesting use of views + the FEDERATED engine to pull data from multiple DBs into a centralized server for more analysis. It works very well for modest tables but completely breaks down (speed-wise) for large tables, so I use the above to push the big tables to the central server every night.
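For anyone curious what that setup looks like, a FEDERATED table is just a local table definition that points at a table on a remote server. A minimal sketch, with made-up connection details and schema:

-- Local proxy table; reads and writes go to table1 on remotehost.
-- The user, password, host and database names here are made up.
CREATE TABLE central_copy_of_table1 (
  some_field INT,
  some_other_field VARCHAR(255)
) ENGINE=FEDERATED
CONNECTION='mysql://repl_user:secret@remotehost:3306/mydb/table1';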