1. Making the web more readable for people with Dyslexia

    My wife and I recently found out that our 9-year-old daughter has dyslexia, so lately I have done some research on solutions that could make her life easier.  That is how I stumbled upon OpenDyslexic, “a new open sourced font created to increase readability for readers with dyslexia”.  There is some science behind all this, which I will spare you the details of, but when I showed it to my daughter, she said it was much easier for her to read, so I decided that all websites should use this font!  Since it’s too much work to call all web designers and demand that they make the web more readable with this font, I came up with something more practical: a bookmarklet that applies the font, and a few other style changes that improve readability, to any website.  It injects some CSS into the current web page, giving me exactly what I need.  The CSS looks something like this (the font URL below stands in for the real one):
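    /* a sketch based on the description below; the font URL
       is a placeholder, not the original */
    @font-face {
      font-family: 'opendyslexic';
      src: url('http://example.com/fonts/OpenDyslexic-Regular.otf');
    }

    * {
      font-family: 'opendyslexic' !important;
    }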

    Pretty simple: it loads the web font from the defined URL into the browser (@font-face) and we then apply this to ALL elements on the page (font-family: opendyslexic !important;). Obviously this screws up some of the layout, so the bookmarklet is best applied on blog-style websites or websites with lots of text, but it will work on any website. So go ahead: drag the bookmarklet to your bookmarks, apply it to your favorite website, and let me know what you think.
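    For the curious, the bookmarklet itself is little more than a javascript: URL that injects a stylesheet link into the page. A sketch (the stylesheet URL is again a placeholder):

    javascript:(function () {
      /* a sketch; the href is a placeholder for the real stylesheet URL */
      var link = document.createElement('link');
      link.rel = 'stylesheet';
      link.href = 'http://example.com/opendyslexic.css';
      document.getElementsByTagName('head')[0].appendChild(link);
    })();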

  2. Build your own Siri Application … in the browser!

    I have immersed myself recently in voice-driven applications and was asked to knock up a quick prototype of something “that looks and acts like Siri”.  That’s a pretty tall order, I thought, but after some research I came up with the following…

    The first problem we have to solve is Speech Recognition, i.e. converting the voice data into text.  The data would have to be streamed to a server which then performs the actual recognition and sends back a string of what it thinks you said.  That’s some complicated stuff right there.  Voice recognition is a science in itself and I also did not want to have to deal with the server setup.  Luckily for me, it turns out that Google has built all of this into their Chrome browser already, courtesy of the HTML5 Speech Input API.  All you have to do is add a special attribute to an <input> and it will allow users to simply

    "click on an icon and then speak into your computer’s microphone. The recorded audio is sent to speech servers for transcription, after which the text is typed out for you."

    Sounds about right to me, first problem solved!
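    For reference, that special attribute in its simplest form looks like this:

    <!-- plain HTML: Chrome renders a microphone icon inside the field -->
    <input type="text" x-webkit-speech />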

    The second challenge is to extract meaningful information from the text, to understand what the user wants you to do.  When the user says “What is the weather forecast for tomorrow”, you have to figure out, from this string, that the user … well … wants to see the weather forecast for tomorrow.  If this is the only case your application has to handle, it’s pretty easy:

    if utterance =~ /weather forecast/i
      return "warm"
    else
      return "I do not know what you mean, try asking again (e.g. what is the weather forecast for tomorrow)"
    end

    But also pretty useless.  

    Clearly you could not write a case statement big enough to handle all possible scenarios, or even a fairly limited scenario: what would happen if the user asks “Show me the forecast of the weather”, not to mention “Is it going to rain tomorrow?”.  You can see that this processing of natural language can get fairly complicated very quickly.  As it turns out, this is another field of science (Natural Language Processing, or NLP) that people much smarter than myself have worked on for decades.  One example of a website that uses NLP to answer questions is WolframAlpha.  And guess who uses WolframAlpha … that’s right: Siri.  So if it is good enough for Apple, it’s certainly good enough for my prototype, so I registered for a developer licence with them and that was it (I suggest you do the same if you want to follow this article).  Now I just needed to hook everything up; I’m going to create a Rails application to do just that.

    It will be a very simple application with 1 page that has 1 form on it.  This form in turn will have 1 field that can be used by the user to “enter” their question.  To support voice entry, I will add the required attribute (“x-webkit-speech”) to this input field.  To further emphasise the fact that this is a voice-driven application, I am going to style the input field:

    [Image: the input field, before and after styling]

    Using CSS along these lines (a sketch; the exact values aren’t reproduced here):
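    /* a sketch: a big, rounded, friendly input box
       (the #query id is an assumption, matching the HAML below) */
    input#query {
      width: 300px;
      padding: 10px 15px;
      font-size: 24px;
      border: 2px solid #ccc;
      border-radius: 20px;
      outline: none;
    }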

    Furthermore, that same page will have an area that displays the data: what the user says and what WolframAlpha returns as the answer.  We call this the stream and represent it as an ordered list, which gives us something like the following (using HAML; the ids and the route name are illustrative):
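    -# the speech-enabled input, and the stream below it
    -# (process_speech_path assumes a matching route)
    = form_tag process_speech_path, :remote => true do
      = text_field_tag :query, nil, :id => 'query', 'x-webkit-speech' => ''
    %ol#stream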

    Incredibly simple!  The user presses the microphone icon and starts talking.  When he stops talking, Google processes the voice data and returns a text representation (actually it returns several, ranked in decreasing order of “correctness”; we just always use the top result).  It inserts the text into the text field on which the voice input was triggered; essentially, Chrome fills in the form for us with the transcribed voice data.  This is all handled by Chrome, we do not have to do anything for this to work.

    When the result comes back from Google, Chrome also raises a JS event that we can listen for.  We will use this to trigger an AJAX call to WolframAlpha, passing in the received text, i.e. we automatically submit the form to process_speech.  process_speech is a controller method that handles the call to WolframAlpha (I am using the Faraday gem).  When we receive an answer from WolframAlpha, we attach it to the stream.  In CoffeeScript, the wiring looks something like this (a sketch, using the ids from the HAML above):
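    # when Chrome fires its speech-input event (webkitspeechchange),
    # show the utterance and submit the remote form to process_speech
    $ ->
      $('#query').bind 'webkitspeechchange', ->
        $('#stream').prepend "<li>#{$(this).val()}</li>"
        $(this).closest('form').submit()

      # when the answer comes back, attach it to the stream
      $('form').bind 'ajax:success', (event, data) ->
        $('#stream').prepend "<li>#{data.answer}</li>"

    On the server side, process_speech could look roughly like this (WOLFRAM_APP_ID stands in for your WolframAlpha developer AppID, and the XML parsing is reduced to the bare minimum):

    # a simplified sketch of the controller method, not the original code
    def process_speech
      conn = Faraday.new(:url => 'http://api.wolframalpha.com')
      response = conn.get '/v2/query', :input  => params[:query],
                                       :appid  => WOLFRAM_APP_ID,
                                       :format => 'plaintext'
      # pod 0 is WolframAlpha's interpretation of the input,
      # pod 1 is (usually) the actual answer
      pods = Nokogiri::XML(response.body).css('plaintext').map(&:text)
      render :json => { :answer => pods.reject(&:empty?)[1] }
    end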

    And that is it really; some more CSS and CoffeeScript to make it look pretty and you are good to go: Siri in the browser in less than 150 lines of code.  I haven’t had a chance to clean up the code yet, so it’s not public on GitHub, but here’s a video showing the end result.

  3. Tampering with Siri

    A few months ago, just after Siri was released to the world, I got the uncontrollable urge to see what Siri could do for me.  I was planning to leverage the Siri APIs and slap some Ruby code on top to scratch my itch.  But my enthusiasm was quickly doused when I realized that Apple didn’t release any public APIs and probably won’t be doing so for the foreseeable future.

    Siri Proxy to the rescue!  

    Siri Proxy is a proxy server for Siri, written in Ruby.  It sits between Apple’s servers and your iPhone, allowing you to intercept all traffic to and from the phone, no jailbreak required.  It also has a handy plugin framework using standard Ruby gems, which means I could use it for my own nefarious purposes:

    [Image: Siri Proxy sitting between the iPhone and Apple’s servers]

    Setting up Siri Proxy is well documented in the README of the git repo, including several YouTube videos, so that is outside the scope of this post.  Instead I will delve deeper into a plugin I wrote for Siri Proxy, and what I learned in the process about Siri and voice-driven user experiences.

    The plan is to control my Logitech Squeezebox Radio from Siri, using just my voice.  Given that the squeezebox server which controls the radio can be connected to over telnet and queried from the command line, this shouldn’t be too hard.  I will write a plugin for Siri Proxy that listens for certain words and triggers calls to the radio.  On with the show.

    I first create an object that represents my radio and allows me to connect and talk to it.  In outline (a sketch; the real code is in the GitHub repo linked at the end):
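    require 'net/telnet'

    # a simplified sketch of the real class
    class Squeezebox
      def initialize(host = 'localhost', port = 9090)
        # 9090 is the squeezebox server's default command-line port
        @connection = Net::Telnet.new('Host' => host, 'Port' => port)
      end

      # any method we call on this object is forwarded to the radio,
      # e.g. radio.power(1) sends "power 1" over the telnet connection
      def method_missing(name, *args)
        @connection.puts(([name] + args).join(' '))
      end
    end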

    The constructor connects to the default server and port of the radio (this can be configured) using the Telnet protocol.  Once instantiated, we can issue any method we want against the object.  These calls get caught by method_missing, which takes the method name and parameters and passes them over Telnet to the radio.  This simple construct allows us to call any known squeezebox server command on the Squeezebox object with very little code.  As long as the squeezebox server understands the command, i.e. as long as the method we call on the object is a known command, this will work.  Now that we have this object, writing the actual plugin is peanuts.

    On initialization of the plugin framework, we initialize our Squeezebox object.  This connects us to the radio.  We then listen for certain commands that come from Siri and trigger the appropriate commands on the radio object, which in turn passes them to the radio itself, which executes them.  The plugin only supports 3 types of commands — radio on, radio off, and playing music from a particular artist — but you get the gist of it.  It could easily be extended with many more commands, like forward, backward, etc.  In outline (a sketch; again, see GitHub for the real code):
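    # a sketch of the plugin; the phrases and regexes are illustrative
    class SiriProxy::Plugin::Squeezebox < SiriProxy::Plugin
      def initialize(config)
        @radio = Squeezebox.new(config['host'], config['port'])
      end

      listen_for /radio on/i do
        @radio.power 1
        say "The radio is now on."
        request_completed
      end

      listen_for /radio off/i do
        @radio.power 0
        say "The radio is now off."
        request_completed
      end

      listen_for /play (?:music by|some) (.+)/i do |artist|
        # "playlist loadtracks ..." is a standard squeezebox CLI command
        @radio.playlist 'loadtracks', "contributor.namesearch=#{artist}"
        say "Playing music by #{artist}."
        request_completed
      end
    end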

    So, what have we learned?  Well, it turns out that it is ridiculously easy to listen in on traffic from Apple’s servers, borderline dangerous I’d say.  If you have control over the DNS server, you can listen in on ALL Siri traffic.  You can also issue commands to the phone (e.g. send an SMS on the user’s behalf) and the user would be completely unaware.  Fun for you, not so much for the user.  In an enterprise setting this would obviously be completely unacceptable, and I can therefore not recommend this approach to anybody trying to build a business around the idea (are you listening, Infor?).  However, for home use, this is a lot of fun and quite useful.

    As for User Experience, I think voice has a bright future, but the example I created exposes the Achilles heel of the whole concept: understanding exactly what the user is trying to do.  “Radio on” and “Radio off” are simple commands, but even those could be expressed by users in an infinite number of ways.  A polite user might ask “Could you please turn the radio off?”, a not so polite user might shout “Shut up!”.  Some defensive coding and clever regex’ing might help here, but you can easily see that as the vocabulary expands, this will become very difficult very quickly indeed.  What makes for a good User Experience is not the conversion of voice into text, but extracting intent from that text.  If you cannot do that, your application will fail, no matter how good it is at converting voice to text.  I will delve a bit deeper into this processing of (natural) language in a future post.

    Here is a video of the plugin in action on my own radio.

    All the code, including installation instructions, is available on github.

  4. Using the jQuery UI Datepicker with Rails

    Adding the jQuery UI datepicker widget to your Rails views is simple enough: add jquery-ui (JS and CSS) to your application, add a text field to your form, and then apply the datepicker method to that text field.  Something like this (assuming an Event model with a start_date column):
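    -# a sketch in HAML; "event" and "start_date" are assumed names
    = form_for @event do |f|
      = f.text_field :start_date

    # and the matching CoffeeScript
    $ ->
      $('#event_start_date').datepicker()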

    However, you will notice that this doesn’t actually save the date you picked to the database.  The problem is that the date format in the text field is incompatible with what Rails is expecting.  There are actually many ways you can solve this issue, both on the client and on the server (e.g. with a virtual attribute), however I use a feature built into the datepicker widget specifically for this reason, called Alternate Field:

    Populate an alternate field with its own date format whenever a date is selected using the altField and altFormat options. This feature could be used to present a human-friendly date for user selection, while passing a more computer-friendly date through for further processing.

    For this to work, you just add an extra, hidden form field to your view and adjust the datepicker call so it populates that hidden field with the correctly formatted date.  Along these lines (field names are again assumptions):
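    -# two inputs that deliberately share one name (see the note below)
    = text_field_tag 'event[start_date]', nil, :id => 'start_date_picker'
    = hidden_field_tag 'event[start_date]', nil, :id => 'start_date_alt'

    # altFormat 'yy-mm-dd' produces the YYYY-MM-DD format Rails expects
    $ ->
      $('#start_date_picker').datepicker
        altField: '#start_date_alt'
        altFormat: 'yy-mm-dd'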

    Note that I used the same name for both input fields.  This works because Rails seems to take the value of the last input (which in this case is the correctly formatted, hidden field) when there are matching names.  However, it would probably be better to use 2 different names and only have the hidden field point to the database column.

    Finally, you will notice that if you try to edit a date that was entered this way, it will show up in your form incorrectly formatted (i.e. as YYYY-MM-DD).  This can be solved by simply adding a format to the value, for example:
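    -# pre-format the stored date for display, assuming the
    -# datepicker's default mm/dd/yy display format
    = text_field_tag 'event[start_date]', @event.start_date.try(:strftime, '%m/%d/%Y'), :id => 'start_date_picker'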

    And there you have it, a simple way to include jquery-ui datepicker in your rails application.

  5. Email issues with EngineYard: Broken Pipe (Errno::EPIPE)

    I recently upgraded our Rails 2.x application (barrelrun.com) to Rails 3.1.  It went pretty well and everything worked smoothly on my local development instance; however, when I deployed to our EngineYard servers I kept having problems with Action Mailer.  Every time an email was being sent from the application (we use Devise and we send confirmation emails on registration) we would see the following error:

    Errno::EPIPE: Broken pipe

    and then a stack dump pointing to some issue in the mailer.  Unfortunately the above error was pretty useless to me; I didn’t even know how to debug it.  It turns out though that this error gets raised because there is an issue with the actual Linux mail command that gets executed (by Rails).  You can actually see this if you try to send the mail from the Rails console:

    irb> User.find(1).send_confirmation_instructions

    This raised the same error as the actual Rails application (which makes sense), except that this time it also produced the following nugget:

    "sendmail: recipients with -t option not supported"

    This finally got me on the right track to debug the issue.  I tried to send email from the command line, not using Rails at all:

    sendmail -t foo.bar@example.com

    sendmail: recipients with -t option not supported

    And presto, the same error message comes up.  As it turns out, on EngineYard sendmail is symlinked to ssmtp, and ssmtp raises an error when you supply an email address together with the -t option.  -t is used to indicate that the recipients are specified in the message itself (prefixed by ‘To:’, ‘Cc:’ or ‘Bcc:’), and therefore there is no point in specifying the recipients on the actual command line.  The real sendmail command will just ignore the command-line recipients in that case, but ssmtp raises an error.

    It seems that Rails IS providing the email address as part of the command line, and therefore ssmtp raises an error and Rails barfs.  The solution is as simple as overriding the default arguments provided by the mail gem (‘-i -t’) so that -t is no longer passed.  You do this in your environment files, e.g. in config/environments/development.rb:

    config.action_mailer.sendmail_settings = {
      :arguments => '-i'
    }

    Sending emails started working after I added this.

    I suspect that this issue appears in Rails 3.x because the mailer has changed.  In Rails 2.3 the -t option was not used; in Rails 3.x it is used by default, hence the error on EngineYard after I upgraded.