Google Docs to clean html, good for WordPress posts, emails

WARNING! It appears Google Docs has changed and this script no longer works. 

Google docs is a great platform to write documents, especially when you compare it with the WordPress editor.  It would be good to have a clean way to export a Google doc to a wordpress post or generate nice looking emails. If you copy and paste a Google doc into a WordPress post, it loses many formatting and produces a bloated html with lots of inline style, CSS classes that do not go well with WordPress. So, here’s a solution that will generate a clean HTML from a Google Doc and email it to you so that you can copy and paste it a WordPress post or send to others via email.

For example, here’s a Google Doc:

GoogleDocScreenshot

Once you run the script, it will produce a nice clean email for you:

GoogleDocsEmail

Here’s how to do it:

  1. Open your Google Doc and go to Tools menu, select Script Editor. You should see a new window open with a nice code editor.
  2. Copy and paste the code from here: GoogleDocs2Html
  3. Then from the “Select Editor” menu, choose ConvertGoogleDocToCleanHtml
  4. Click the play button to run the script.
  5. You will get an email containing the HTML output of the Google Doc with inline images.
  6. You can easily forward that email to anyone or copy and paste in a WordPress post.

Here’s how the code works:

First it will loop through the elements (paragraph, images, lists) in the body:

function ConvertGoogleDocToCleanHtml() {
  var body = DocumentApp.getActiveDocument().getBody();
  var numChildren = body.getNumChildren();
  var output = [];
  var images = [];
  var listCounters = {};

  // Walk through all the child elements of the body.
  for (var i = 0; i < numChildren; i++) {
    var child = body.getChild(i);
    output.push(processItem(child, listCounters, images));
  }

  var html = output.join('\r');
  emailHtml(html, images);
  //createDocumentForHtml(html, images);
}

The processItem function takes care of generating proper html output from a Doc Element. The code for this function is long as it handles Paragraph, Text block, Image, Lists. Best to read through the code to see how it works. When the proper html is generated and the images are discovered, the emailHtml function generates a nice html email, with inline images and sends to your Gmail account:

function emailHtml(html, images) {
  var inlineImages = {};
  for (var j=0; j<images.length; j++) {
    inlineImages[[images[j].name]] = images[j].blob;
  }

  var name = DocumentApp.getActiveDocument().getName()+".html";

  MailApp.sendEmail({
     to: Session.getActiveUser().getEmail(),
     subject: name,
     htmlBody: html,
     inlineImages: inlineImages
   });
}

Enjoy!

Remember images in the email are inline images. If you copy and paste into WordPress, the images won’t get automatically uploaded to WordPress. You will have to manually download and upload each image to WordPress. It’s a pain. But that’s the problem with WordPress editor.

Special thanks to this GitHub project, that gave me many ideas: https://github.com/mangini/gdocs2md