ideacode has consistently displayed excellent coordination to implement our highly detailed requirements. Their communication on all projects has been outstanding.

919-341-5170

help@ideacode.com

 

Blazing a trail to better facilities management.

MySQL CSV load-fu

Here's the setup.  I have a MySQL table of existing data and a CSV file holding new data that I want to import.  The file and database are not on the same server.  The file has not been cleaned up.  I expect this to be an ad-hoc, one off operation.

What I need is some way to put the data in the table and clean it up.  If this were to be done on a regular basis, I'd write some production code.  But since this is just some throw away code, I want to be as simple and direct as possible.  Fortunately, the MySQL LOAD DATA statement is industrial strength and can get me where I need to be.

CREATE TEMPORARY TABLE ImportTemp LIKE FinalDestination;

   LOAD DATA
LOCAL INFILE '/path/to/file.csv'
  INTO TABLE ImportTemp
      FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY '\\' 
       LINES TERMINATED BY '\n' STARTING BY '' IGNORE 0 LINES
             (number, notebook, @proj, @recv, description)
         SET dateReceived = COALESCE(STR_TO_DATE(@recv, '%c/%e/%Y'), ''), 
             projectID = (SELECT CASE WHEN '' = @proj THEN '' ELSE GROUP_CONCAT(ID) END FROM Projects WHERE number=@proj GROUP BY number)
;

DELETE FROM ImportTemp WHERE '' IN (projectID,dateReceived);

INSERT INTO FinalDestination SELECT * FROM ImportTemp;

DROP TEMPORARY TABLE ImportTemp;

The procedure is straight forward:

  1. Create a temporary table that looks exactly like the final table
  2. Use LOAD DATA to pull in the information, massaging as much as possible
  3. Do any remaining clean-up
  4. Copy imported data into final table
  5. Cleanup

The magic happens in step 2, LOAD DATA.  Pay attention to the four major parts of that statement:

  • Signaling that the file is on the client, the same machine where you're running the command.  That's the "LOCAL" noun.  Without that, MySQL looks for the file on the server.
  • Describing the format of the files, which is everything following "FIELDS" and "LINES".  You say how fields are separated, how they're enclosed and escaped.  You say how lines are separated, what they begin with, and number to skip.  I like to specify these explicitly, so that I remember what my options are when the files have different formats.
  • Mapping the columns in the file to columns in the table.  That's the part inside the parentheses.  Read it left to right.  You can have either a column name or a user variable, preceded with an at-sign.  In this example, I have 3 direct mappings and 2 variable mappings (@proj, @recv).
  • Massaging data.  This is where you can wield serious SQL-fu.  MySQL fills the user variable with the corresponding column from the file.  Then you can use the SET noun to update a column after you have fiddled with the value in that variable.

Understanding this SQL-fu

So in this example, MySQL stuffs two columns into two user variables for every row: @proj and @recv.  I convert @proj into "projectID" by running a lookup with a sub-query.  I'm careful to handle edge cases, like when @proj is empty (that's the "CASE" part) or when there are multiple matches (GROUP BY).  I convert @recv into "dateReceived" by parsing it into a date.  Again, I'm careful to handle edge cases like the date doesn't format.

The subsequent DELETE handles the edge cases the import compensated for.  DELETE is just one possible response.  I could run a COUNT(*) and consider anything more than zero an error and stop.  How you handle it is up to you, but this is your final opportunity to modify the incoming data.

Trackback URL for this post:

http://ideacode.com/trackback/63

Post new comment

The content of this field is kept private and will not be shown publicly.
  • Allowed HTML tags: <a> <em> <strong> <cite> <code> <ul> <ol> <li> <dl> <dt> <dd>
  • Syntax highlight code surrounded by the {syntaxhighlighter SPEC}...{/syntaxhighlighter} tags, where SPEC is a Syntaxhighlighter options string or "class="OPTIONS" title="the title".
  • Lines and paragraphs break automatically.

More information about formatting options

CAPTCHA
This question is for testing whether you are a human visitor and to prevent automated spam submissions.
Image CAPTCHA
Enter the characters shown in the image.