A very exciting project: Data Science Toolkit is a collection of open services for the budding data scientist. Currently there are 9 APIs, with tools for locating/localizing and extracting/converting data. The /file2text API, for example, extracts the plain text from an uploaded PDF, Word doc, Excel file, or scanned image.
Best of all, the source for setting up your own copy of this server is hosted on GitHub. Definitely looking forward to seeing what can be done with the help of such a compilation of tools.
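
To give a rough idea of what calling /file2text could look like, here is a minimal Python sketch using only the standard library. The base URL, the `inputfile` form field name, and the helper names `build_file2text_request` and `file2text` are assumptions made for illustration, so check them against the toolkit's own docs (or your self-hosted copy) before relying on them.

```python
import os
import uuid
import urllib.request

# Assumption: base URL of a Data Science Toolkit server (public or your own copy).
DSTK_BASE_URL = "http://www.datasciencetoolkit.org"
# Assumption: name of the multipart form field the server expects for the upload.
UPLOAD_FIELD = "inputfile"


def build_file2text_request(filename, data, base_url=DSTK_BASE_URL):
    """Build (but do not send) a multipart/form-data POST for /file2text."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{UPLOAD_FIELD}"; '
        f'filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        base_url.rstrip("/") + "/file2text",
        data=head + data + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def file2text(path, base_url=DSTK_BASE_URL):
    """Upload a local file and return whatever text the server sends back."""
    with open(path, "rb") as f:
        request = build_file2text_request(os.path.basename(path), f.read(), base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace")
```

With a server running, something like `print(file2text("report.pdf"))` would print the extracted text. Pointing `base_url` at your own copy of the server should be the only change needed for a self-hosted setup, assuming it exposes the same path.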