DEVONthink: the research assistant you’ve been looking for?

I’ve written before about personal information management: why it’s important for everyone—not a subset of ‘power users’—and how to evaluate information management systems.

In short, we encounter too much information every day to keep track of it all. What’s more, we should only have to dig for a given piece of information once; a good information management system should make retrieval easy the second time around.

“Everything buckets”, as one category of information manager is called, are seemingly everywhere. What’s astonishing is that even in 2010, almost none of them performs more than the most rudimentary information retrieval functions. In general, even with these “smart tools”, the onus remains on the user to a) do a thorough job classifying and organizing his or her information, and b) know exactly what terms to search for when seeking that information. Except, that is, for DEVONthink.

On its face, DEVONthink is a versatile database that can store and retrieve just about any type of data available: PDFs, web clippings, emails, MS Office documents, bookmarks, multimedia, RSS feeds, etc. At this level, it’s similar to (though, to my knowledge, more robust than) a number of related products. The real value comes in the content analysis functions that are applied to everything you throw at it.

Demystifying DEVONthink’s AI

It’s the “artificial intelligence” features of DEVONthink that really set it apart from the crowd of personal information managers. (I put “artificial intelligence” in quotes because DEVONthink’s brain owes more intellectual debt to the field of information retrieval than to machine learning.)

While other information managers hold to an archaic notion of binary relevance (either a thing matches your query terms or it doesn’t), DEVONthink incorporates much more nuance into its reckoning.

In fact, it can treat entire documents as search queries, a feature that seems useless until it almost magically reveals documents related to the one you’re looking at, or offers to automatically file it into the right folder. (This function, “automatic class management” in information retrieval-speak, is invaluable in the paperless office: should you choose, DEVONthink files all your bills away with a single keystroke.)
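To make “a document as a search query” concrete, here is a minimal sketch of graded (rather than binary) relevance using cosine similarity over word counts. This is a generic information retrieval technique, not DEVONthink’s actual algorithm, which isn’t described here; the function names and the tiny sample library are made up for illustration.

```python
import math
import re
from collections import Counter


def term_counts(text):
    """Lowercase the text and count its word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def see_also(query_doc, library):
    """Rank every document in `library` (title -> text) against `query_doc`."""
    q = term_counts(query_doc)
    scored = [(cosine(q, term_counts(text)), title)
              for title, text in library.items()]
    return sorted(scored, reverse=True)


# Hypothetical sample data: the whole query document acts as the search query.
library = {
    "potato farming": "potato farmers grow potato crops in the Andes",
    "shark births": "a shark gave birth without a father",
    "gm crops": "genetic modification of crops is changing farming",
}
ranking = see_also("genetically modified potato crops", library)
```

Every document gets a score rather than a yes/no answer, so loosely related items still surface, just lower in the list.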

In short, DEVONthink takes an entire class of advanced tools otherwise restricted to researchers and search engines and unleashes it on your personal data set.

No wonder Steven Berlin Johnson raved about DEVONthink in 2005. No wonder he still uses it today.

Information Capture

As I mentioned, DEVONthink can handle any document type you can give it. If it’s a file, DEVONthink can store it. If the file is searchable with Spotlight, DEVONthink can perform smart analysis on it. Even non-traditional document types (RSS feed items and mail messages, for example) are fair game, and it’s scriptable for those fringe use cases the folks at DEVONtechnologies haven’t thought of (my NetNewsWire-to-DEVONthink script is one).

Information Retrieval

Perhaps the most unusual feature of DEVONthink, the “See Also” bar displays a rank-weighted list of documents related to the current one. By surfacing documents you may not have thought of as relevant, this can facilitate serendipity in research.

As an anecdotal example, for this document on GM potatoes, DEVONthink returns a number of related articles I’ve saved, including one on a Peruvian potato farmer, another on how genetic modification is transforming agriculture in Europe, and one on a certain incident in which Pringles were ruled to be potatoes.

[Image: SeeAlso.png]

Another example: the previously pictured article on a fatherless baby shark is suggested as a candidate for my folder of Slaughterhouse-Five notes. No link was immediately apparent, so I glanced through those notes and found the following quote about the five Tralfamadorian sexes:

There were five sexes on Tralfamadore, each of them performing a step necessary in the creation of a new individual. They looked identical to Billy—because their sex differences were all in the fourth dimension…

While this serendipitous insight may be of limited academic value, I can say with reasonable assurance that I wouldn’t have thought of the Tralfamadorians while investigating virgin births in the local shark population. But I’d be hard pressed to say it’s not relevant, so I’ll chalk it up as useful.

Search

Sometimes you need a precise match for your search query. DEVONthink can also accommodate those needs through advanced search operations:

  • Strict vs fuzzy search (fuzzy search returns near-misspellings, word variants, etc)
  • Regex-style wildcards
  • Boolean operators (e.g., a AND b; a XOR b; NOT b)
  • a NEAR b
  • a BEFORE b
  • etc
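For readers unfamiliar with proximity operators, the following sketch shows one plausible reading of NEAR and BEFORE over a plain string. The exact semantics DEVONthink applies (including the default NEAR distance) are not given in this post, so the `distance=10` default and both function names are assumptions.

```python
import re


def positions(text, term):
    """Return the word positions at which `term` occurs (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, w in enumerate(words) if w == term.lower()]


def near(text, a, b, distance=10):
    """True if some occurrence of `a` is within `distance` words of `b`."""
    return any(abs(i - j) <= distance
               for i in positions(text, a)
               for j in positions(text, b))


def before(text, a, b):
    """True if some occurrence of `a` precedes some occurrence of `b`."""
    pa, pb = positions(text, a), positions(text, b)
    return bool(pa and pb) and min(pa) < max(pb)


text = "Genetic modification is transforming potato farming in Europe"
potato_near_farming = near(text, "potato", "farming", distance=1)
potato_before_europe = before(text, "potato", "Europe")
```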

Conclusion

In 2010, I am amazed at two things: first, how useful DEVONthink’s smart features can be in real-life scenarios; and second, that no competitor has come close to matching DEVONthink’s usefulness even though it’s been on the market since 2002.

If you haven’t used DEVONthink before, take some time to try the free demo. In the worst case, you haven’t lost a thing (unlike Evernote, DEVONthink never holds your data hostage in proprietary databases). But in the (more likely) best case, you’ve gained a really, really smart research assistant.

Congratulations, DEVONthink. You deserve these accolades.


Disclosure: in celebration of its birthday, DEVONthink is offering some incentives to users who contribute to the discourse around its product offering. This served as a motivation for the timing of this post, not the content. I’ve actually been intending to write about DEVONthink since before I published my original thoughts on personal information management (in 2008), and more recently since I entered into a Master’s program in information science.

WorldCat library search bookmarklet

For users of WorldCat or Melvyl (their branded search for the UC Berkeley library), the following bookmarklets should come in handy. They do one thing and one thing only: search WorldCat for the selected text.

To install: drag one of the following bookmarks to your bookmarks bar. (You may want to rename it.)

Search on WorldCat | Search on Melvyl

To use: Select some text on a web page, then activate the bookmark to search. That’s it.
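The bookmarklets themselves are JavaScript, but the core step (turning selected text into a search URL) is easy to sketch in Python. The base URL below is an assumption about WorldCat’s search endpoint, not something taken from the bookmarklets, so substitute whatever your catalog actually uses.

```python
from urllib.parse import quote_plus

# Assumed search endpoint; swap in the URL your catalog actually uses.
WORLDCAT_SEARCH = "https://www.worldcat.org/search?q="


def search_url(selected_text, base=WORLDCAT_SEARCH):
    """Build a catalog search URL for the text a reader has highlighted."""
    return base + quote_plus(selected_text.strip())


url = search_url("card catalog")
```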

Today and Tomorrow (OmniFocus scripts)

11 July 2011: as described here, I’ve switched to a Start-based workflow and updated my scripts to reflect this change. By default, these scripts now set the start dates of selected items, not due dates—though you can still switch to “Due mode”. This post has been updated to reflect these changes.

I’ve added two more scripts to my OmniFocus repertoire: Today and Tomorrow.

As one might expect, Today sets the “Action date” of the selected item(s) to the current date, and Tomorrow sets it to the following day. (By default, the Action date is the Start date, but you can switch to the Due date if you prefer.)

Why might you need this? A few days of ignoring OmniFocus is enough to make any date-sorted view overwhelming. My Defer script is one method to deal with these items: defer them by a day, a week, etc. But sometimes you just need to set these items to today. Or tomorrow.

As with Defer, these scripts work with any number of selected tasks.

If you use the default “Start” mode:

  • The Start date of each selected item is set to the current day
    • If an item has a previously assigned Start date, its original time is maintained. Otherwise, the start time is set to 6am (configurable in the script)

If you use “Due” mode:

  • The Due date of each selected item is set to the current day
    • If an item has a previously assigned Due date, its original due time is maintained. Otherwise, the due time is set to 5pm (configurable in the script)
  • If an item has a Start date, it is moved forward by the same number of days as the due date has to move (in order to respect parameters of repeating actions)
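The rules above can be sketched in Python. This is not the AppleScript itself, only an illustration of the date logic under stated assumptions: tasks are modelled as plain dicts with optional `start` and `due` datetimes, and the names `move_to_day` and `set_today` are invented for the example.

```python
from datetime import date, datetime, time, timedelta

DEFAULT_START = time(6, 0)   # 6am, the Start-mode default described above
DEFAULT_DUE = time(17, 0)    # 5pm, the Due-mode default described above


def move_to_day(current, target_day, default_time):
    """Move a date to `target_day`, keeping its time of day if it has one."""
    keep = current.time() if current else default_time
    return datetime.combine(target_day, keep)


def set_today(item, today, mode="start"):
    """Update a task dict that may hold 'start' and 'due' datetimes."""
    if mode == "start":
        item["start"] = move_to_day(item.get("start"), today, DEFAULT_START)
        return item
    old_due = item.get("due")
    item["due"] = move_to_day(old_due, today, DEFAULT_DUE)
    if old_due and item.get("start"):
        # Shift the start date by the same number of days the due date moved.
        shift = timedelta(days=(item["due"].date() - old_due.date()).days)
        item["start"] += shift
    return item


task = {"start": datetime(2011, 7, 5, 8, 0), "due": datetime(2011, 7, 8, 12, 0)}
set_today(task, date(2011, 7, 11), mode="due")
# task["due"] is now 11 July at 12:00 and task["start"] is 8 July at 08:00
```

A Tomorrow variant would simply pass `today + timedelta(days=1)` as the target day.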

Putting it all together

I’ve set my keyboard shortcuts for Defer, Snooze, Today, Tomorrow, and This Weekend to ctrl-d, ctrl-z, ctrl-t, ctrl-y, and ctrl-w, respectively (using FastScripts), so shuffling tasks couldn’t be easier. Use cases:

Catching up after holiday: Select all overdue tasks, hit ctrl-t to bring them current. Then snooze or defer the ones you won’t get to today.

Planning today’s tasks: Select your tasks and ctrl-t them into the day’s queue. Planning tomorrow? Use ctrl-y instead.

Download them here

Thanks to Seth Landsman for his role in inspiring my Today script. His version is very similar but doesn’t quite match the defer logic I need.

Usage note: some items inherit due dates from their parent task or project, but don’t actually have due dates themselves. This script ignores those items.

The University of South Carolina Celebrates the Card Catalog

The University of South Carolina’s library is launching a “year-long series of events honoring the card catalog, its use in the transformation of knowledge, and the people who created and used it”. Events include:

  • a catalog card boat race
  • a catalog card design contest

Perhaps the coolest bit is the widget celebrating different cards:

More here.

OmniFocus snooze script

Last Updated: 2010-06-15

Here’s an AppleScript that “snoozes” selected OmniFocus items by setting their start date to a future* value. These items will then be unavailable (and out of sight in views showing “available” items) until the snoozed start date.

Usage:

  1. Run the script with one or more items selected in OmniFocus

  2. Choose how long you would like to snooze the items (in # of days)

The script will then set the start date of selected items to the current date + the number of days selected in step 2. For example, snoozing with the default value of 1 day will set the tasks to begin at 12:00 AM tomorrow.
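As a rough illustration of that arithmetic (in Python rather than the script’s AppleScript), the snoozed start date is just midnight on today’s date plus the chosen number of days. The function name `snoozed_start` is made up for this sketch.

```python
from datetime import date, datetime, time, timedelta


def snoozed_start(days=1, today=None):
    """Return midnight on the day `days` after `today` (default: current date)."""
    today = today or date.today()
    return datetime.combine(today + timedelta(days=days), time.min)


tomorrow_midnight = snoozed_start(1, today=date(2010, 6, 15))
```

Because the offset is simply added to today’s date, zero and negative values work as well.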

Finally, if you have Growl installed, the script will display a Growl confirmation.

(I highly recommend initiating the script from a third-party launcher such as FastScripts or Quicksilver. This will prevent delays within the OmniFocus application due to Growl bugs.)

Download it here.


* This doesn’t have to be a future value. Choosing 0 as the snooze value will set the start date to midnight today; choosing -1 will set the start date to midnight yesterday.

NetNewsWire script: Subscribe to full-text feed of current subscription

Update Dec 2010: it looks like FullTextRSS is down. I’ll leave this up, but be forewarned: it probably won’t work. FiveFilters and WizardRSS provide similar services, so you may want to look there.

Here’s a script that attempts to subscribe to a full-text feed of the current subscription in NetNewsWire. It does this using EchoDittoLabs’ excellent FullTextRSS service.

To use, select a headline or subscription title in NetNewsWire and run the script. The full-text feed will appear in your top-level items.

Thanks to harvey.nu for the URL encoding routine. The script worked without the routine but it seemed safer to include it.

[Updated 2/22 with change suggested by Pascal.]

[code lang="AppleScript"]
(* Fulltextrss.scpt v 0.1b

Attempts to subscribe to the current subscription via EchoDittoLabs' FullTextRSS service (echodittolabs.org/fulltextrss)

Contains no error checking; use at your own risk

Dan Byler dbyler@gmail.com *)

property fulltextpre : "http://labs.echoditto.com/projects/fulltextrss/?url="

tell application "NetNewsWire"
    if exists selectedHeadline then
        set this_headline to selectedHeadline
        set stdfeed to RSS URL of subscription of selectedHeadline
    else if exists selectedSubscription then
        set stdfeed to RSS URL of selectedSubscription
    end if
    try
        set theTextEnc to my urlencode(stdfeed)
        set fulltextfeed to fulltextpre & theTextEnc
        set theresult to subscribe to fulltextfeed
    on error
        display dialog "Oops—something went wrong."
        return
    end try
end tell

-- urlencode routine taken from http://harvey.nu/applescripturlencode_routine.html
on urlencode(stdfeed)
    set theTextEnc to ""
    repeat with eachChar in characters of stdfeed
        set useChar to eachChar
        set eachCharNum to ASCII number of eachChar
        if eachCharNum = 32 then
            set useChar to "+"
        else if (eachCharNum ≠ 42) and (eachCharNum ≠ 95) and (eachCharNum < 45 or eachCharNum > 46) and (eachCharNum < 48 or eachCharNum > 57) and (eachCharNum < 65 or eachCharNum > 90) and (eachCharNum < 97 or eachCharNum > 122) then
            set firstDig to round (eachCharNum / 16) rounding down
            set secondDig to eachCharNum mod 16
            if firstDig > 9 then
                set aNum to firstDig + 55
                set firstDig to ASCII character aNum
            end if
            if secondDig > 9 then
                set aNum to secondDig + 55
                set secondDig to ASCII character aNum
            end if
            set numHex to ("%" & (firstDig as string) & (secondDig as string)) as string
            set useChar to numHex
        end if
        set theTextEnc to theTextEnc & useChar as string
    end repeat
    return theTextEnc
end urlencode
[/code]
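For anyone who finds the encoding routine hard to follow, here is a rough Python equivalent of the same idea: leave letters, digits, and `* - . _` alone, turn spaces into `+`, and percent-encode everything else. It is a sketch rather than a drop-in replacement. The names are mine, the sample feed URL is hypothetical, and unlike the AppleScript it encodes non-ASCII characters byte by byte as UTF-8.

```python
FULLTEXT_PREFIX = "http://labs.echoditto.com/projects/fulltextrss/?url="

# Characters the routine leaves alone: letters, digits, and * - . _
UNRESERVED = set("abcdefghijklmnopqrstuvwxyz"
                 "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                 "0123456789*-._")


def urlencode(text):
    """Percent-encode `text`, turning spaces into '+' as the script does."""
    out = []
    for ch in text:
        if ch == " ":
            out.append("+")
        elif ch in UNRESERVED:
            out.append(ch)
        else:
            # Encode each UTF-8 byte as %XX (the original handles ASCII only).
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
    return "".join(out)


def fulltext_feed(feed_url):
    """Build the full-text feed URL for an ordinary feed URL."""
    return FULLTEXT_PREFIX + urlencode(feed_url)


# Hypothetical feed URL, used only to show the encoding.
encoded = urlencode("http://example.com/feed?a=1 b")
```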

Download a copy here.

Give before you ask: lessons from the Twine DM snafu

No matter what service you provide, even the most well-intentioned invitation can be seen as a demand for time, effort, and attention. Given the lack of established protocols for interacting online, I submit a single cardinal rule of thumb for interacting with people, online or off:

Give before you ask.

The following story is not ultimately about Twine, or about Twitter. It’s an illustration of what happens when asking comes first in an increasingly crowded information ecosystem. It’s about relationships, requests, and demands—the subtle dance that characterizes the interplay between “social” and “information”. (Full disclosure: I have a close relative who works at Twine, and I’m an avid Twitter user, so I want both companies to succeed.)

Here’s the 30-second recap:

  1. Twine implemented a feature to let its users connect with their Twitter followers. It did so by inviting them via direct message (DM); however, to Twine users it wasn’t 100% clear that DMs would be the mechanism.

  2. People started receiving impersonal-sounding DMs from loose Twitter acquaintances inviting them to join Twine. This seemed like a spammy auto-DM campaign and incensed some high-profile Twitterers. When Chris Brogan blogged about it, the Twitter ecosystem lit up with largely negative reactions.

  3. Nova Spivack, Twine’s CEO, responded. He engaged Chris Brogan immediately. He was personal, listened, and took responsibility. In my view, his response was spot-on. (Here’s their conversation—click “Show conversation” to see the whole thing.) Net result: the DM feature was disabled. (1)

Although Twine’s fumble was largely technical (the DMs were supposed to point to a more personal landing page, for instance), its approach to Twitter was misguided as well. Why? Because users felt pitched, not informed. Put another way, Twine didn’t give before it asked.

What would it look like for Twine to “give” first? One simple way would be to make it easier for Twine users to share what they’re doing on Twitter. Add a “Tweet” button so users can share what they’re reading with their followers, all without leaving Twine. It doesn’t sound like much, but it would reverse the dynamic between Twine and Twitter. Here’s what would happen:

  • On the surface, Twine tweets would change from friend requests to “actual information”
  • Visitors’ first impressions of Twine would improve. Instead of seeing a sign-up page on arrival, users would see Twine in its real capacity: as a source of information and discussion
  • Some users would stay at Twine longer if they could tweet without breaking their flow
  • Traffic would feel organic, not forced
  • Traffic may grow more slowly, but pageviews would be more valuable
  • Bottom line: users (of both Twine and Twitter) would feel respected, because the tweets they send or receive would be interesting, rather than asking them for a favor

All this from giving before asking.

In addition—but as a secondary option—it would be nice to see which of your Twitter connections are on Twine and connect with them there. (Twitter contacts who are already on Twine are more likely to reciprocate; those who are not members are likely to see the invitation as spam. The former should be encouraged, and the latter treated with care.)

I think Nova nails it with this comment: “Integrating external services with Twitter is a more subtle art than expected.” He’s right: the details are subtle. But the gross concept stands: give before you ask.


(1) Spivack blogged about the experience—you can read his thoughts here. And the Twine team officially responded here.

OmniFocus defer script updated

Updated 6/15/10: minor edit to improve efficiency

The updated Defer script for OmniFocus is ready. Changes include:

  • Bug fixes to make the script more reliable, particularly when deferring multiple items.

    • For most of these I’m indebted to Curt Clifton, who made the most critical bug fixes on the OmniFocus forum. (If you use OmniFocus, his scripts and tools are invaluable; be sure to visit his site.)
  • The default action now defers both start and due dates.

  • Notifications code has been rewritten to make the script friendly for machines without Growl installed.

    • While testing, I discovered that GrowlHelperApp crashes on nearly 10% of notification calls. To work around this, the script now checks to see if GrowlHelperApp is running; if not, the script launches it. If Growl is not installed or can’t launch, the script displays a generic notification of the defer results.
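The fallback order described in that last bullet can be summarised in a few lines. This Python sketch only models the decision, not the real AppleScript calls: `choose_notifier` is an invented name, and `launch` stands in for whatever the script does to start GrowlHelperApp.

```python
def choose_notifier(installed, running, launch):
    """Return 'growl' if Growl can be used, otherwise 'dialog'.

    `launch` is a hypothetical callable that tries to start GrowlHelperApp
    and returns True on success. It is only called when Growl is installed
    but not already running.
    """
    if not installed:
        return "dialog"
    if running or launch():
        return "growl"
    return "dialog"


# Growl is installed but has crashed, and relaunching succeeds.
result = choose_notifier(installed=True, running=False, launch=lambda: True)
```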

If you experience delays with the script, it’s almost certainly an issue with Growl, not OmniFocus. This is much less of an issue if you launch the script via a third-party utility like FastScripts, because any Growl-related delays will be absorbed by the script launcher, not OmniFocus. If you primarily invoke the Defer script from your OmniFocus toolbar, you can always disable alerts to speed things up. To do this, simply open the script in Script Editor and change property showAlert to false.

Download it here.

WMATA vs. the people

WMATA, the D.C.-area public transit authority, has so far declined to provide schedule information for use by third-party providers, including Google Transit. For now, this means WMATA’s own website is the only online source of schedule information.

This is troubling on two counts:

a) WMATA is largely government-funded, so route information should be treated as an open, public good; and

b) It appears WMATA refuses to open the data for fear of lost advertising revenues from their website, not out of any alleged benefit to riders. (source)

From their FAQ:

We believe that if we are to partner with an outside entity that we should look at what the cost-benefit is to that third party…

During the past year Metro has invested significant money to upgrade its Web site… The site includes Google maps in the neighborhoods in which our rail stations are located.

Interpretation: WMATA finds it acceptable to freely use the high-quality, extensible maps provided by Google but is unwilling to reciprocate by sharing data. Why? Because they’re worried Google will make money from WMATA data and they won’t get a piece of it. (Rumor has it that WMATA is holding out for a revenue-sharing agreement.)

The critical fallacy here is treating information as a scarce good in a zero-sum economy. Doing so harms both the content creator (WMATA) and the target audience (Metro riders).

On the contrary, when public data are opened for actual public use, everyone wins. Riders win by gaining easier access to route information. Google wins by gaining yet another data source. WMATA wins by increasing ridership.

I sent an email to WMATA’s chief administrative officer, Emeka Moneme, to tease out this last point. Excerpt:

If a traveler is planning to walk or drive to her destination, she will never even think to visit the WMATA website. However, there is a very strong chance she will use Google Maps. I assure you, there is no easier way to learn of a mass transit option than to pull up a route on maps.google.com and see “Also available: Public Transit”. Try it yourself on Google maps here: http://bit.ly/5khV

Each and every individual who discovers a WMATA route through such means is a potential new rider. He or she may use that Metro route for years to come. The value of a single new rider vastly outweighs that of a few pageviews on wmata.com.

Surely we can agree that there’s no zero-sum game in that. With open access to information, everyone wins.


(p.s. From the post title: I am not conflating Google with “the people”. Taxpayer-funded data should be open to any entity. In this case, however, providing data to Google is clearly in the people’s interest.)