Monthly Archives: April 2010

Howto create “standardized” CSV exports from MySQL

When you start querying/analyzing/and wanting to export data from MySQL (especially anything free-form like SEO/SEM Keyword Data Extracts) you’ll quickly find out you can’t simply concatenate your columns together and join them with a delimiter to be read later.

Sampling some of our keywords illustrates a number of problems:

Quotes: ! 1/4″ inlet muffler

Newlines: xtra vision

headlight h4703xv hours

Control Characters: ^HToyota+starlet+coolant+temp+sensor, 242°/250° duration @ .050″ 601 lift

Commas/Tabs/Semi Colans or other typical delimiters: “+bosch, +part no., +67647″, 2500hd

chevy cat back exhaust

Unicode: ايسوزو 20011

ב.מ.וו 545i, 宝马m3冷却系统

The Original MySQL attempt to export this was:

SELECT
	`medium`,
	`keyword`,
	`landingpage`,
	SUM(`entrances`) as entrances
INTO OUTFILE 'keyword_data.csv'
	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
	LINES TERMINATED BY '\r\n'
FROM
	`daily_medium_keyword_landingpage_visits`
WHERE
	medium = 'organic'
GROUP BY
	`medium`,
	`keyword`,
	`landingpage`;

This produced a really great file format, any crazy characters were escaped properly and it looked great on paper:

"organic","! 1/4\" inlet muffler","/cache.aspx?q=\"1+4\"+\"inlet+muffler\"&d=76309488474199&mkt=en-US&setlang=en-US&w=6d792683,38b10c05",1
"organic","!((^ toyota avalon mechanical lifter repairs","/auto-parts/toyota/engine-mechanical-valve-lifter",1
"organic","!(gl-508 rm) motorcraft","/auto-part/149373-motorcraft-gl508rm-remanufactured-alternator",1
"organic","!947 dodge power wagon","/auto-part/983439-mallory-ignition-wire-set-m11947",1
"organic","!967 ford mustang parts","/auto-parts/ford-mustang",1
"organic","!971 142 volvo specifications","/auto-parts/volvo-142/engine-electrical-distributor",1
"organic","!972 buick skylark heater core","/cars/1972-buick-skylark",1

I loaded up FasterCSV to start parsing the file and quickly figured out how naive I was about the CSV file formats. MySQL’s default use of \ as an escape character isn’t “standard.” It turns out there isn’t an official CSV standard but I did find something close: http://tools.ietf.org/html/rfc4180#section-2

Hmm so effectively none of the control characters should be escaped, but fields should be quoted if they have a delimeter in them, and a quote in the field should be replaced with “” to be a CSV Standard.

The other wrinkle is the standard calls for \r\n to be the line delimeter, but what happens if there are other \r\n in the columns?

First Step – Remove Escape Characters:

	FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY ''

By adding the ESCAPED BY ”, none of the characters will be escaped in the output

Step Two – Replace \r\n with \n:

	REPLACE(`keyword`, "\r\n", "\n")

Step Three – Replace ” with “”

	REPLACE(REPLACE(`keyword`, "\r\n", "\n"), '"', '""')

Step Four – Replace NULL-Byte with a blank

	REPLACE(REPLACE(REPLACE(`keyword`, "\r\n", "\n"), '"', '""'), ""), char(0), "")

Step Five – Replace NULL field with a blank

	IFNULL(REPLACE(REPLACE(REPLACE(`keyword`, "\r\n", "\n"), '"', '""'), ""), char(0), ""), "")

Putting it all together:


SELECT
 IFNULL(REPLACE(REPLACE(REPLACE(`medium`, "\r\n", "\n"), '"', '""'), char(0), ""), "") as medium,
 IFNULL(REPLACE(REPLACE(REPLACE(`keyword`, "\r\n", "\n"), '"', '""'), char(0), ""), "") as keyword,
 IFNULL(REPLACE(REPLACE(REPLACE(`landingpage`, "\r\n", "\n"), '"', '""'), char(0), ""), "") as landingpage,
 SUM(IFNULL(`entrances`, 0)) as entrances
INTO OUTFILE 'keyword_data_new.csv'
 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' ESCAPED BY ''
 LINES TERMINATED BY '\r\n'
FROM `daily_medium_keyword_landingpage_visits`
WHERE medium = 'organic'
GROUP BY
 IFNULL(REPLACE(REPLACE(REPLACE(`medium`, "\r\n", "\n"), '"', '""'), char(0), ""), ""),
 IFNULL(REPLACE(REPLACE(REPLACE(`keyword`, "\r\n", "\n"), '"', '""'), char(0), ""), ""),
 IFNULL(REPLACE(REPLACE(REPLACE(`landingpage`, "\r\n", "\n"), '"', '""'), char(0), ""), "");
Advertisement

Put Google AdSense Into the Corner, Let your page render first using jQuery

First off, if you haven’t installed both Yahoo YSlow! and Google Page Speed then bookmark this page and come back to it later. Chances are you have bigger fish to fry on the rest of your website.

Now if you read my post on AB Testing with Google Website Optimizer, you probably caught on that <script type=”text/javascript”> tags block rendering of content on your pages.

Now that you’re using these tools, your noticing some of these AdSense calls are taking 300ms to render – sometimes multiple SECONDS…wait but when I combine those two facts that means my content below the ads isn’t rendering yet…

WTF Google!

The good news, is, it’s possible to unblock the rendering of your page’s content. The bad news, is, it involves repositioning the ads after the page has been loaded (a little bit of a flicker).

Solution:

The mechanics of the solution involve three pieces:

  1. A placeholder <div> where the ad should be displayed
  2. Shoving the actual ad elements as one of the last items in your HTML document (so everything else gets rendered first)
  3. jQuery code to reposition the ad back into its placeholder slot

Inline Content:

The container for your ads should be pretty straightforward, you know what size ad unit you’re including so block out the section of your page:

   <div>
      <div id="fmad_placeholder_Detail_Page_Bottom" style="height: 90px; line-height: 90px;">&nbsp;</div>
   </div>

Bottom Page Content:

Your moving div that actually calls AdSense/Ad Manager code:

<div id="fmads_Detail_Page_Bottom" style="display: none; text-align:center;">
  <script type="text/javascript">
   //<![CDATA[
     GA_googleFillSlot("Detail_Page_Bottom");
   //]]>
   </script>
</div>

jQuery Magic to reposition your ads back to where they should be:

<script type="text/javascript">
//<![CDATA[
  var fmPositionGoogleAdsInit = false;
  function fmPositionGoogleAd(ad_name) {
    var fun = (function () {
      var placeholder = $('#fmad_placeholder_'+ad_name);
      var ad = $('#fmads_'+ad_name);

      if(placeholder && ad) {
        var pos = placeholder.offset();
        ad.css({
          "display": "block",
          "position": "absolute",
          "left": pos.left+"px",
          "top": pos.top+"px",
          "width": placeholder.width()
        });
      }
    });

    if(!fmPositionGoogleAdsInit) {
      $(window).resize(fun);
      $(document).ready(fun);
    }
    fun();

    fmPositionGoogleAdsInit = true;
  };

  function fmPositionGoogleAds() {
    // List all of your ad slots
    fmPositionGoogleAd('Detail_Page_Bottom');
  };

  fmPositionGoogleAds();
//]]>
</script>

Gotchas:

Now, if you have any other dynamic content on the pages, you may have to manually reposition the ads again.  This is simply done by calling:

fmPositionGoogleAds();

Hope that helps!

Ruby Script to Extract Google Analytics Keywords, Landing Page, Medium Entrances to CSV/TSV format

I’ve had versions of this script hacked together before.  In fact, if you search for “Google Analytics Data Extractor” the #1 result is a google groups posting I made in a former life for a C# program I developed: http://groups.google.com/group/analytics-help-misc/browse_thread/thread/d2ad6ddf3d73e511

There’s now a couple of ruby gems out there for handling the Google Analytics API

One of them I’ve been doing some light-contributing to, including the pagination support (automatically grab all results in 10,000 result chunks)

I pushed a couple of changes up today at: http://github.com/er1c/gattica that hopefully will get merged into the main trunk soon (http://github.com/activenetwork/gattica)

One of the more useful uses of the library is to extract your Organic Search Results Keywords from Google Analytics. We’re using this at Frugal Mechanic to see how people search for Auto Parts.

require 'rubygems'
require 'gattica'
require 'fastercsv'

ga_profile = "" #Enter your Profile Here
start_date = Date.new(2009, 1, 1)
end_date = Date.new(2009, 12, 31)
file_path = "data" # Directory Needs to exist

puts "Google Username: "
u = gets.chomp
raise "bad username" unless !u.empty?
puts "Google Password: "
system "stty -echo"
pw = gets.chomp
system "stty echo"
raise "bad password" unless !pw.empty?

class ExtractKeywords
  def initialize(email,password)
    @gs = Gattica.new({:email => email, :password => password})
  end

  def get_accounts
    results = []
    @gs.accounts.each{|account|
      profile = {}
      profile[:site_title] = account.title
      profile[:profile_id] = account.profile_id ## this is the id required for requests to the API
      profile[:account_name] = account.account_name
      profile[:account_id] = account.account_id
      profile[:web_property_id] = account.web_property_id
      results << profile     }     return results   end   def connect_profile(profile)     @gs.profile_id = profile[:profile_id]   end   def connect_profile_id(profile_id)     @gs.profile_id = profile_id   end   def get_keywords(start_date = nil, end_date = Date.today)     results = []     csv_data = @gs.get({         :start_date => (start_date || (end_date - 365)).to_s,
        :end_date => end_date.to_s,
        :dimensions => ["medium", "keyword", "landingPagePath"],
        :metrics => "entrances",
        :sort => "-entrances",
        :page => true}).to_csv(:long)

    return FasterCSV.parse(csv_data, :headers => true)
  end

end

gs = ExtractKeywords.new(u, pw)
gs.connect_profile_id(ga_profile)

(start_date .. end_date).each { |date|
  file = "#{file_path}/medium_keyword_landingpage_visits_#{date.strftime('%Y-%m-%d')}.csv"
  next if File::exists?( file )

  FasterCSV.open(file, "w") do |csv|
    csv << ["medium", "keyword", "landingPagePath", "entrances"]
    keywords = gs.get_keywords(date, date)
    keywords.each{|row|
      csv << [row["medium"], row["keyword"], row["landingPagePath"], row["entrances"]]
    }

  end
}