Turn URLs and Webpages Into PDFs In Your Dropbox

I stumble across a lot of interesting webpages on a daily basis. Sometimes it's a video I want to watch later; sometimes it's an article I don't have time to read right away. Other times, I find a webpage that I want to keep around for future reference. For me, there's a difference between articles to read later and reference material: whereas a new item added to Instapaper has a short life span in terms of attention (read, share, archive), a webpage I want to keep around forever needs to be turned into a document I can read anywhere, highlight, annotate, and carry around between platforms and devices. For that, I like PDFs.

I keep a "PDFs" folder in my Dropbox that contains all the documents I consult regularly for work and personal purposes. They can be eBooks, tutorials, or guidelines from Apple that are essential to my writing online. Thanks to the increasing support for cloud services in apps like PDF Expert, GoodReader, and iAnnotate, I can keep a single copy of a PDF in my Dropbox, annotate it with whichever app I prefer, and forget about duplicates thanks to sync. Furthermore, I'm fairly sure that, due to their popularity, PDFs will still be readable and supported 20 years from now, so I don't have to worry about data preservation and file formats.

Lately, I have become obsessed with turning the longer articles I find on the Internet into PDFs as well, for long-term archival. As much as I like Instapaper, I can't be sure that the service will be around in the coming decades, and I don't want my archive of longform and quality content to be lost in the cloud. So I have come up with a way to combine Instapaper with the benefits of PDFs, Dropbox, and automation to generate documents from any link or webpage, from any device, within seconds.

(Disclaimer: what follows is an explanation of a hack I created for personal use. It uses publicly available tools and apps to fill a personal need. You shouldn't create PDFs off websites and redistribute them -- you should support the sites you read instead).

In short, I use the Instapaper Text bookmarklet to fetch a webpage's text and images (while preserving hyperlinks and great typography) and I convert the resulting page to PDF using wkpdf. Created by Christian Plessl, wkpdf is a command line tool that uses WebKit and RubyCocoa for rendering HTML content to PDF. Since wkpdf uses WebKit's HTML rendering, it can generate good-looking PDFs that maintain most CSS2 and CSS3 stylings and properties. I have tried another command line tool for file conversion, Pandoc, but I like wkpdf better for straight HTML to PDF conversion.
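If you want to try wkpdf on its own before wiring it into anything, a minimal sketch might look like the following. It only relies on the --source and --output options used in the scripts later in this article; the to_pdf function and the WKPDF variable are names I made up for illustration (setting WKPDF=echo lets you preview the command on a machine without wkpdf).

```shell
#!/bin/sh
# WKPDF is a hypothetical override, not part of wkpdf itself:
# run with WKPDF=echo to print the command line instead of executing it.
WKPDF=${WKPDF:-wkpdf}

# Convert a URL or a local .html file ($1) into a PDF saved at path $2.
to_pdf() {
    "$WKPDF" --source "$1" --output "$2"
}

# Usage (not executed here; the output folder is only an example):
#   to_pdf "http://www.macstories.net/" "$HOME/Dropbox/PDFs/macstories.pdf"
```

Keeping the call in one small function means every method described below can reuse the same conversion step.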

Basic Stuff

Optional Stuff

I have come up with two separate ways to turn URLs or webpages into PDFs: one relies mostly on built-in apps for iOS and OS X, the other adds some third-party apps for extra convenience. Both require wkpdf; you can follow the (simple) instructions here to get it up and running on your Mac. Similarly, while you could beam a link or .html file from iOS to a Mac using email, I highly recommend that you use Dropbox, if only to save the output file. I created this collection of hacks and tools to generate PDFs from any device (mainly iOS) and let the Mac do the processing in the background (I use my Mac mini server for this), but everything is highly customizable. You can avoid third-party apps, as I'll explain in a bit, and you don't need a Mac server: your personal Mac will do, as long as it's running. This is simply how I did it.

Send A URL to Mail.app, Have It Converted to PDF

If you don't like using third-party apps, you can rely on built-in (and free) tools to do the file conversion for you. The great thing about wkpdf is that it can receive URLs and fetch their web content in the background, then convert it to PDF. In this way, you're able to simply send a URL to your Mac, and receive a PDF in your Dropbox seconds later. But in order to do this, you have to find a way for your Mac to see the URL and run a shell script that passes your link to the wkpdf command for conversion. To do this without third-party apps, you can use Mail rules and AppleScript.

First off, find an article you like and run it through the Instapaper bookmarklet. Instapaper's parser will fetch multiple pages (if any) and keep text and images. You don't have to use the bookmarklet -- you can convert any webpage -- but I use it because it strips out unnecessary elements that I don't need in my reference documents.

You'll then need to email the URL to an account configured in Mail.app on your Mac, and create a rule that runs an AppleScript for the file conversion. As you can see in the screenshot above, I have created a rule that looks for any message from my accounts that contain "viticci" in the name (so I can use the one I like), archives the message, and runs an AppleScript to turn the link in the message into a PDF file in Dropbox.

The AppleScript is as follows:

(*
Federico Viticci, MacStories
August 2012, Version 1.0

Based on michaellindahl's script: http://stackoverflow.com/questions/4642206/applescript-question-copy-email-contents-run-automator-app

and inspired by MacDrifter's append script: http://www.macdrifter.com/2012/07/append-to-dropbox-note-with-drafts-app.html

More information here: http://www.macstories.net/?p=30861

*)

using terms from application "Mail"
	on perform mail action with messages matchmsgs for rule mailrule
		tell application "Mail"
			set msg to item 1 of matchmsgs
			set msgcontent to (content of msg) as Unicode text
		end tell
		set the clipboard to msgcontent
		(do shell script "pbpaste | textutil -convert txt -stdin -stdout -encoding UTF-8 | pbcopy
echo -n >> ~/Dropbox/Apps/InstapaperLinks.txt
pbpaste >> ~/Dropbox/Apps/InstapaperLinks.txt
wkpdf --source `tail -1 ~/Dropbox/Apps/InstapaperLinks.txt` --output ~/Dropbox/Apps/page.pdf")
	end perform mail action with messages
end using terms from

Based on michaellindahl's AppleScript to copy the contents of a message that triggered a rule, the script appends the link to a text file (thanks, Gabe), then an embedded shell script takes care of reading the last line of the text file (through tail) and converts the URL to a PDF.

You can of course change the specified locations to your liking, and select the default output name you prefer (I chose "page.pdf"). I went with the text file trick because it gives me the benefit of keeping an archive of the URLs I converted, and because of the many apps that support appending text these days. Also in the script, textutil takes care of making sure the link is encoded in UTF-8 so I won't end up with a corrupted text file in the future.
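To make the text file trick a bit more concrete, here is a small sketch of the two operations it relies on: appending a link on its own line, and reading back the most recent one. The function names are mine, and the handling of blank lines and missing trailing newlines is an extra precaution I'm assuming is useful, not something the script above does.

```shell
#!/bin/sh
# Append one link per line to the archive file ($1). A newline is added
# first only when the file is non-empty and does not already end with one,
# so two links never end up glued together on the same line.
append_link() {
    file=$1
    link=$2
    if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
        printf '\n' >> "$file"
    fi
    printf '%s\n' "$link" >> "$file"
}

# Print the last non-empty line of the file ($1), with carriage returns
# and surrounding spaces or tabs removed.
last_link() {
    tr -d '\r' < "$1" |
        awk 'NF { sub(/^[ \t]+/, ""); sub(/[ \t]+$/, ""); last = $0 }
             END { if (last != "") print last }'
}

# Usage (not executed here):
#   append_link ~/Dropbox/Apps/InstapaperLinks.txt "http://www.macstories.net/"
#   last_link ~/Dropbox/Apps/InstapaperLinks.txt
```

Compared to a plain tail -1, skipping empty lines should make the lookup more forgiving when an email body ends with blank lines.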

The script has been tested on both Lion and Mountain Lion. In Mountain Lion's Mail.app, you'll have to place the script in Mail's own container inside Application Scripts, whereas Lion can load any script from your filesystem (hello, Sandboxing). If you try to run the shell script portion manually in Mountain Lion's Terminal, you may receive an error similar to this -- but the PDF will still be generated.

Send A URL to Mail.app, Use Keyboard Maestro To Run Shell Script

A variation of the method described above features a simpler AppleScript that makes Keyboard Maestro run the shell script for PDF conversion. Because Keyboard Maestro macros are scriptable, you can put together an AppleScript that doesn't include the shell portion, but leverages an external macro to do the job.

Obviously, Keyboard Maestro Engine will have to be running to execute the shell script.

Append Text, Use Hazel To Generate PDFs

Another way to turn links into PDFs involves skipping the Mail part and using a dedicated iOS app to append text to the text file we're using in Dropbox. For this, I recommend Agile Tortoise's Drafts or Karbon's Scratch: both apps offer an "append to Dropbox" option, but only Drafts has an iPad version.

Once the text is appended, you'll need a utility on your Mac that monitors a folder for changes to files. I recommend Hazel, a fantastic app by Noodlesoft that can look for changes in any folder and perform actions as a result. As you can imagine, we'll let Hazel monitor the text file and convert the last line to PDF using wkpdf.

As you can see, we're simply using a different way to append URLs to the text file and a different app to execute the shell script. The result, however, is the same.
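For reference, a shell script attached to a Hazel rule could look roughly like this sketch. As far as I know, Hazel hands the matched file's path to an embedded script as $1; treat that as an assumption and check it against your version. The URL check, the timestamped output name, and the WKPDF and OUT_DIR variables are my own additions for illustration, not part of the original workflow (which always writes page.pdf).

```shell
#!/bin/sh
# Hypothetical overrides: WKPDF=echo previews the command, OUT_DIR picks
# the folder where PDFs are written.
WKPDF=${WKPDF:-wkpdf}
OUT_DIR=${OUT_DIR:-"$HOME/Dropbox/Apps"}

# Succeed only for values that look like an http(s) URL without whitespace.
looks_like_url() {
    case $1 in
        *[[:space:]]*) return 1 ;;
        http://?* | https://?*) return 0 ;;
        *) return 1 ;;
    esac
}

# Read the last line of the monitored text file ($1) and convert it.
convert_last_link() {
    link=$(tail -n 1 "$1" | tr -d '\r')
    if ! looks_like_url "$link"; then
        echo "Skipping, not a URL: $link" >&2
        return 1
    fi
    # A timestamp in the name keeps earlier PDFs from being overwritten.
    "$WKPDF" --source "$link" --output "$OUT_DIR/page-$(date +%Y%m%d-%H%M%S).pdf"
}

# In the Hazel rule, the call would be:
#   convert_last_link "$1"
```

Rejecting anything that isn't a plain URL avoids launching wkpdf when the last line of the file is a signature or stray text.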

Update: Thanks to John Voorhees on Twitter, here's a nice way to bypass Mail.app to append text to a Dropbox file using the free IFTTT service.

Simply send an email to trigger@ifttt.com from your IFTTT account address, with the URL in the body; then, create a recipe that appends the body to your text file, monitored by Hazel. It works like Drafts and Scratch, only it goes through IFTTT, which in my tests took less than a minute to append text to a file.

You can also set up a quick action in Launch Center Pro to email IFTTT with a link in your clipboard, like I did.

Here's my recipe as an example.

Convert .html Files to PDF, Use iCab and GoodReader

A great perk of using wkpdf is that this command line tool can also convert .html files to PDF -- not just plain URLs. And because on my iOS devices I use both iCab and GoodReader -- which have some neat integration to exchange webpages as .html files -- I have also created a way to upload .html webpages to my Mac and have them turned to PDF automatically. This is a slower process than just sending off a URL, but it's also "safer" in a way -- as there's always a chance some characters in a URL won't get encoded properly in the email. If you can see the .html file correctly, wkpdf will convert it.

In iCab, use the GoodReader module to open a webpage in the GoodReader app. Don't use the Save Page feature, as that will create a .webarchive file, which we don't need. In GoodReader, you'll end up with a "text.html" file, which you can upload to your preferred Dropbox folder for conversion.

Thanks to GoodReader's sync functionality, it's easy to move files to a synced folder and hit a button to upload/download the latest contents (if you know of other apps that can generate .html files off webpages and upload them to Dropbox, feel free to use those -- I just like iCab and GoodReader).

Once again, we'll use Hazel to look for new .html files uploaded to Dropbox. Thanks to MVasilakis' shell script, Hazel can execute the file conversion to PDF with the "Run shell script" action, as pictured above.
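I'm not reproducing MVasilakis' script here, but the idea can be sketched in a few lines: take the .html file Hazel found and write a PDF with the same name next to it. The function names and the WKPDF override are hypothetical, and I'm assuming the script receives the file path as its first argument.

```shell
#!/bin/sh
# Hypothetical override: WKPDF=echo prints the command instead of running it.
WKPDF=${WKPDF:-wkpdf}

# Map /path/to/text.html (or .htm) to /path/to/text.pdf.
pdf_path_for() {
    case $1 in
        *.html) printf '%s.pdf\n' "${1%.html}" ;;
        *.htm)  printf '%s.pdf\n' "${1%.htm}" ;;
        *)      printf '%s.pdf\n' "$1" ;;
    esac
}

# Convert a local .html file ($1) to a PDF in the same folder.
html_to_pdf() {
    "$WKPDF" --source "$1" --output "$(pdf_path_for "$1")"
}

# In the Hazel rule, the call would be:
#   html_to_pdf "$1"
```

Quoting the paths matters here, because files exported from iOS apps often contain spaces in their names.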

DIY PDFs

I am no AppleScript or shell expert by any means -- I just like to tinker and happen to have some basic knowledge to understand what's going on with command line tools. I have found this workflow to be an efficient solution for me, but I'm curious to see what others will come up with.

Feel free to play around with the code and tips above, make modifications, and send suggestions for improvement. You can find me on Twitter or App.net.

Update 10/15

Thanks to Alessandro Di Nardo's suggestion, I have created a modified version of the AppleScript that uses the email's subject as the name of the PDF.

using terms from application "Mail"
	on perform mail action with messages matchmsgs for rule mailrule
		tell application "Mail"
			set msg to item 1 of matchmsgs
			set msgcontent to (content of msg) as Unicode text
			set theTitle to (subject of msg)
		end tell
		set the clipboard to msgcontent
		set myName to theTitle
		(do shell script "pbpaste | textutil -convert txt -stdin -stdout -encoding UTF-8 | pbcopy
echo >> ~/Dropbox/InstapaperLinks/InstapaperLinks.txt
pbpaste >> ~/Dropbox/InstapaperLinks/InstapaperLinks.txt
wkpdf --source `tail -1 ~/Dropbox/InstapaperLinks/InstapaperLinks.txt` --output ~/Dropbox/InstapaperLinks/page.pdf")
		tell application "Finder"
			set folderPath to (path to home folder) & "Dropbox:InstapaperLinks" as text
			set folderPath to folderPath as alias
			set the name of file "page.pdf" in folder folderPath to myName & ".pdf"
		end tell
	end perform mail action with messages
end using terms from
delay 5
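One thing to keep in mind with this version is that an email subject can contain characters that don't work well in file names, such as slashes or colons. If you'd rather build the name on the shell side, a sketch like the one below could help; the safe_pdf_name function and the choice of replacing those characters with underscores are my assumptions, not part of the script above.

```shell
#!/bin/sh
# Turn an email subject ($1) into a PDF file name: slashes and colons become
# underscores, line breaks are dropped, leading dots and surrounding
# whitespace are trimmed, and an empty result falls back to "page".
safe_pdf_name() {
    name=$(printf '%s' "$1" |
        tr -d '\r\n' |
        tr '/:' '__' |
        sed 's/^[[:space:].]*//; s/[[:space:]]*$//')
    [ -n "$name" ] || name=page
    printf '%s.pdf\n' "$name"
}

# Usage (not executed here):
#   wkpdf --source "$link" --output "$HOME/Dropbox/InstapaperLinks/$(safe_pdf_name "$subject")"
```

With this approach the PDF gets its final name directly, so the Finder rename step wouldn't be needed.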