NetNewsWire RSS feed-to-DEVONthink archive script updated

A while ago I posted this script, which selectively archives RSS feeds/items in NetNewsWire to DEVONthink Pro Office.

This long-overdue update includes two changes:

- adds the option to archive all items in a specific date range
- fixes an issue when a news item’s description is not available

Grab it here.

DEVONthink: the research assistant you’ve been looking for?

I’ve written before about personal information management: why it’s important for everyone—not a subset of ‘power users’—and how to evaluate information management systems.

In short, we simply encounter too much information every day to process it all. What’s more, we should only have to dig for a given piece of information once; a good information management system should facilitate easy retrieval the second time around.

“Everything buckets”, as one category of information managers is called, are seemingly everywhere. What’s astonishing is that even in 2010, almost none of these performs more than the most rudimentary information retrieval functions. In general, even with these “smart tools”, the onus remains on the user a) to do a thorough job classifying and organizing his or her information, and b) to know exactly what terms to search for when seeking said information. Except, that is, for DEVONthink.

On its face, DEVONthink is a versatile database that can store and retrieve just about any type of data available: PDFs, web clippings, emails, MS Office documents, bookmarks, multimedia, RSS feeds, etc. At this level, it’s similar to (though, to my knowledge, more robust than) a number of related products. The real value comes in the content analysis functions that are applied to everything you throw at it.

Demystifying DEVONthink’s AI

It’s the “artificial intelligence” features of DEVONthink that really set it apart from the crowd of personal information managers. (I put “artificial intelligence” in quotes because DEVONthink’s brain owes more intellectual debt to the field of information retrieval than to machine learning.)

While other information managers hold to an archaic notion of binary relevance (either a thing matches your query terms or it doesn’t), DEVONthink incorporates much more nuance into its reckoning.

In fact, it can treat entire documents as search queries, a feature that seems useless until it almost magically reveals documents related to the one you’re looking at, or offers to automatically file it into the right folder. (This function, “automatic class management” in information retrieval-speak, is invaluable in the paperless office: should you choose, DEVONthink files all your bills away with a single keystroke.)

In short, DEVONthink takes an entire class of advanced tools otherwise restricted to researchers and search engines and unleashes it on your personal data set.

No wonder Steven Berlin Johnson raved about DEVONthink in 2005. No wonder he still uses it today.

Information Capture

As I mentioned, DEVONthink can handle any document type you can give it. If it’s a file, DEVONthink can store it. If the file is searchable with Spotlight, DEVONthink can perform smart analysis on it. Even non-traditional document types (RSS feed items and mail messages, for example) are fair game, and it’s scriptable for those fringe use cases the folks at DEVONtechnologies haven’t thought of (my NetNewsWire-to-DEVONthink script is one).

Information Retrieval

Perhaps the most unusual feature of DEVONthink, the “See Also” bar displays a rank-weighted list of documents related to the current one. By surfacing documents you may not have thought of as relevant, this can facilitate serendipity in research.
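To make the idea of a rank-weighted list a little more concrete, here is a minimal sketch of one textbook information retrieval approach: weight each word by TF-IDF, then rank documents by cosine similarity to the current one. This is an assumption for illustration only. DEVONthink’s actual algorithm isn’t documented here, and the function names and sample notes below are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Map each document name to a {term: tf-idf weight} dictionary."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for c in counts.values() for term in c)
    return {
        name: {t: tf * math.log(n_docs / df[t]) for t, tf in c.items()}
        for name, c in counts.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def see_also(current, docs):
    """Rank every other document by similarity to the current one."""
    vectors = tfidf_vectors(docs)
    scores = [(cosine(vectors[current], vectors[name]), name)
              for name in docs if name != current]
    return [name for _, name in sorted(scores, reverse=True)]


# Hypothetical sample notes, loosely modeled on the examples in this post.
notes = {
    "gm-potatoes": "genetically modified potato crops approved for farming",
    "peru-farmer": "a potato farmer in peru protects native potato varieties",
    "shark-birth": "a baby shark born without a father at an aquarium",
}
print(see_also("gm-potatoes", notes))
```

With these sample notes, the Peruvian farmer note ranks above the shark note because it shares the word “potato” with the current document. The point is that every document gets a score rather than a yes-or-no match.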

As an anecdotal example, for this document on GM potatoes, DEVONthink returns a number of related articles I’ve saved, including one on a Peruvian potato farmer, another on how genetic modification is transforming agriculture in Europe, and one on a certain incident in which Pringles were ruled to be potatoes.

Another example: the previously pictured article on a fatherless baby shark is suggested as a candidate for my folder of Slaughterhouse-Five notes. No link was immediately apparent, so I glanced through those notes and found the following quote about the five Tralfamadorian sexes:

There were five sexes on Tralfamadore, each of them performing a step necessary in the creation of a new individual. They looked identical to Billy—because their sex differences were all in the fourth dimension…

While this serendipitous insight may be of limited academic value, I can say with reasonable assurance that I wouldn’t have thought of the Tralfamadorians while investigating virgin births in the local shark population. But I’d be hard pressed to say it’s not relevant, so I’ll chalk it up as useful.

Search

Sometimes you need a precise match for your search query. DEVONthink can also accommodate those needs through advanced search operations:

- Strict vs. fuzzy search (fuzzy search returns near-misspellings, word variants, etc.)
- Regex-style wildcards
- Boolean operators (e.g., a AND b; a XOR b; NOT b)
- a NEAR b
- a BEFORE b
- etc.
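For readers unfamiliar with positional operators like NEAR and BEFORE, here is a rough sketch of how they can be evaluated over a tokenized text. The exact semantics in DEVONthink may differ. In particular, the default window of 10 words and the rule that BEFORE matches when any occurrence of a precedes any occurrence of b are my own assumptions.

```python
import re


def tokens_of(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def positions(term, tokens):
    """Return every index at which the term occurs in the token list."""
    term = term.lower()
    return [i for i, tok in enumerate(tokens) if tok == term]


def near(a, b, text, window=10):
    """True if a and b occur within `window` words of each other."""
    tokens = tokens_of(text)
    return any(abs(i - j) <= window
               for i in positions(a, tokens)
               for j in positions(b, tokens))


def before(a, b, text):
    """True if some occurrence of a precedes some occurrence of b."""
    tokens = tokens_of(text)
    pos_a, pos_b = positions(a, tokens), positions(b, tokens)
    return bool(pos_a and pos_b) and min(pos_a) < max(pos_b)


sentence = "Pringles were ruled to be potatoes by a court"
print(near("Pringles", "potatoes", sentence, window=5))
print(before("potatoes", "Pringles", sentence))
```

Unlike the Boolean operators, both of these depend on where the words appear, not just whether they appear.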

Conclusion

In 2010, I am amazed at two things: first, how useful DEVONthink’s smart features can be in real-life scenarios; and second, that no one has begun approaching DEVONthink’s usefulness even though it’s been on the market since 2002.

If you haven’t used DEVONthink before, take some time to try the free demo. In the worst case, you haven’t lost a thing (unlike Evernote, DEVONthink never holds your data hostage in proprietary databases). But in the (more likely) best case, you’ve gained a really, really smart research assistant.

Congratulations, DEVONthink. You deserve these accolades.

Disclosure: in celebration of its birthday, DEVONthink is offering some incentives to users who contribute to the discourse around its product offering. This served as a motivation for the timing of this post, not the content. I’ve actually been intending to write about DEVONthink since before I published my original thoughts on personal information management (in 2008), and more recently since I entered into a Master’s program in information science.

Archive newsfeeds in DEVONthink Pro via NetNewsWire

Newsfeeds provide an invaluable service: direct access to content of specific interest. NetNewsWire has long been my preferred newsreader, and the recent addition of synchronized online access makes it, for me, the clear best-in-class news client. Yet although NetNewsWire does a fine job aggregating news feeds, it’s a poor long-term information management solution.

Enter DEVONthink Pro, a highly reviewed research tool/information manager.

The NetNewsWire > DEVONthink bridge is a logical one, so it’s no surprise that DEVONthink Pro comes with preloaded scripts to archive information directly from NetNewsWire. (Sorry, DEVONthink Personal doesn’t include scripting support.) But the feature I really needed – the ability to archive entire RSS feeds into DEVONthink – is not included in these scripts.

To fill this gap, I wrote a script to archive entire newsfeeds (or subsets thereof) to DEVONthink Pro.

This script saves items from your selected feed (or folder of feeds) to DEVONthink Pro as web archives, with the following options:

From the selected feed, folder, or smart folder, save:
- All items
- Flagged items
- Unread items
- Read items
- Date range (archive all items within a specified date range)
Archive options:
- Feed saves just the content of the news item (typically the best option if the site provides full-text feeds)
- Site saves the content of the item’s target website (this should be the whole article – particularly useful if the newsfeed is truncated)
- Site/Print goes to the item’s target website and looks for a link that includes the string “Print”. If one exists, it saves the print-ready page to DEVONthink. If none exists, it reverts to “Site” behavior.
Perform post-archive actions: mark read, unflag, or do nothing. This action occurs after DEVONthink has archived the article, so you can see archive progress in action and ensure you don’t double-archive an article.
Optional: introduce a random time delay between archiving articles if using “Site” or “Site/Print”. (Useful if the site you’re reading from has countermeasures to prevent content scraping. Read: if they’re trying to keep you from saving their content for later use.)
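The script itself is written for NetNewsWire and DEVONthink Pro, but the selection and “Site/Print” logic described above can be sketched in a few lines of Python. Treat this as an illustration only and not the script’s actual code: the Item class, the function names, the case-insensitive match on link text, and the 30-second delay ceiling are all assumptions of mine.

```python
import random
from dataclasses import dataclass
from datetime import date
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


@dataclass
class Item:
    """Hypothetical stand-in for a news item (not NetNewsWire's object model)."""
    title: str
    published: date
    link: str
    description: Optional[str] = None
    read: bool = False
    flagged: bool = False


def select_items(items, mode, start=None, end=None):
    """Filter items by selection mode; "date range" needs start and end."""
    tests = {
        "all": lambda i: True,
        "flagged": lambda i: i.flagged,
        "unread": lambda i: not i.read,
        "read": lambda i: i.read,
        "date range": lambda i: start <= i.published <= end,
    }
    return [i for i in items if tests[mode](i)]


class PrintLinkFinder(HTMLParser):
    """Record the href of the first link whose text contains "print"."""

    def __init__(self):
        super().__init__()
        self.current_href = None
        self.print_href = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.current_href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag == "a":
            self.current_href = None

    def handle_data(self, data):
        # Assumption: match on the visible link text, ignoring case.
        if (self.print_href is None and self.current_href
                and "print" in data.lower()):
            self.print_href = self.current_href


def page_to_archive(item, page_html, option):
    """Choose the URL for "Site" or "Site/Print"; the latter falls back to "Site"."""
    if option == "Site/Print":
        finder = PrintLinkFinder()
        finder.feed(page_html)
        finder.close()
        if finder.print_href:
            return urljoin(item.link, finder.print_href)
    return item.link


def polite_delay(max_seconds=30):
    """Pick a random pause (in seconds) to wait between site archives."""
    return random.uniform(0, max_seconds)
```

A caller would pass the result of polite_delay() to time.sleep() between articles when using “Site” or “Site/Print”.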

My personal workflows for this script:

Flag items of interest from all newsfeeds. Then run this script periodically to import all flagged items into DEVONthink. Archive target: Flagged items (feed); Post-archive action: Unflag.
Save full-text content of a magazine I subscribe to but don’t want to keep hard copies of. Run this script every week, when the magazine’s RSS feed is updated. Archive target: unread items (Site/Print); Post-archive action: Mark as read.

Get the script here.

Update 2008-08-12: Added an additional Growl-free script to the download for users who don’t have Growl installed. The only difference is that the lines referencing GrowlHelperApp are commented out. Same download link applies.

Update 2010-05-26: Fixed a problem encountered when news items don’t have a description. Added an option to archive items from a specific date range. The download link has been updated.

Managing acquired information in an information age

Success in the information age hinges on managing the explosion of available information in meaningful ways. To even approach this goal requires a successful information management strategy, which revolves around the questions

“How do I find relevant information?”

and its corollary:

“How do I manage the information I’ve found?”

On a personal note, these are two of the questions that drive my own technological explorations. Brainstorming and note-taking methods and tools provide another side to the issue. This post is intended to provide some background and framework for said exploration.

How do I find relevant information?

Online information is typically located through complementary methods of search and discovery.

Traditional search technologies will long remain the first resort for information-seekers. Desktop search clients are also available for advanced data mining and research. Yet the rising semantic web is the true future of the Internet, and will enable users to interact with information in more meaningful and relevant ways.

Relationship-based information discovery is rapidly adding an important layer over traditional search tools. Social microsharing platforms (e.g., Twitter) and more robust social platforms (e.g., Twine, in private beta) allow individuals to build a liminal space of like-minded individuals with similar interests.

Two points are worth noting here:

Social networks are becoming a search sphere in their own right. For me, the Twitter ecosystem has become my trusted first source of user opinions; for many types of information, I search on Twitter before going to Google or DEVONagent.
More and more information is shared and recommended through these relationship-based services. In other words, social networking platforms allow information to be discovered rather than explicitly sought.

Search once, not twice

The key to a useful information management strategy is this: You should only have to find a piece of information once.

Search tools should not be relied upon to find specific pieces of previously located information. If it takes more than fifteen seconds to locate online, it should be in your personal information system, not left to The Google.

If you spend a lot of time looking for information you’ve already encountered, your system is broken and you’re wasting your time. Or your employer’s time. Either way, that time should be spent turning information into knowledge, or putting it to use.

So: what to do with all this acquired information?

Tools of the trade

To be effective, an electronic document management system (EDM) should be:

- Accessible: it’s available when and where you need it (for both archive and retrieval)
- Flexible: able to accept input from any variety of sources
- Scalable: can accept many thousands of documents without becoming unwieldy
- Searchable: the system is worthless if you can’t find what you’re looking for
- Extensible: it can be extended through scripting or other means
- Open: it doesn’t hold your information hostage when you need to change systems

The most rudimentary means of storing information – file systems – fail where it matters most. Because file systems are not designed for this type of data management, they are not truly accessible (saving an excerpt from a website, for instance, is a many-step operation), or quickly searchable (your data are hidden amongst tens of thousands of irrelevant system and program files). In addition, file systems don’t provide end-to-end data functions, so viewing the contents of most file types requires launching another application. Add-on tools like Google Desktop mitigate some of these issues, but they’re no match for a real EDM system.

True EDMs are specifically designed for the task of archiving and retrieving information. They can store images, text clippings, and documents of all types; add content indexing to the mix (allowing users to search by any word contained in their files); and are streamlined to allow quick archiving of information. EDMs can be implemented as software-based solutions (see Yojimbo, EagleFiler, and the like), as well as online (see Google Notebook, for instance).
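Content indexing is less mysterious than it sounds. A common way to do it is an inverted index, which maps every word to the documents that contain it. The sketch below is a generic illustration with hypothetical file names. It makes no claim about how any of the products named here build their indexes.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical files standing in for a small personal archive.
archive = {
    "bill.pdf": "Electric bill for May",
    "note.txt": "Remember to pay the electric bill",
    "recipe.txt": "Potato soup",
}
index = build_index(archive)
print(search(index, "electric bill"))
```

Because the index is built once at archive time, each later search is a handful of set lookups rather than a scan through every file.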

Second-generation information managers like DEVONthink and Twine take content management a step further, adding semantic intelligence and useful content analysis to the user’s database. DEVONthink, a tool that I’ve used for years, analyzes the contents of its articles to identify non-obvious semantic relationships and assist with automatic filing. Twine offers similar functionality in the context of a social network, in theory promising to integrate the most relevant search, discovery, and EDM tools.

Live in the cloud…

As computer usage becomes increasingly network-centric and social, individuals are becoming more and more willing to trade privacy for the convenience and utility of web-based services.

Put another way, we are becoming more willing to keep our information in “the cloud”. (I like the cloud metaphor because, for me, it conjures images of Benjamin Franklin flying his kite in the electric storm. There is energy and power and excitement in the cloud. There’s also risk.)

This trend will spell dramatic shifts in EDM solutions to come. Soon all our data will be accessible from any web-enabled smartphone or computer, anywhere in the world. (And with customs agents able to search the contents of any electronic device with impunity, business travelers may soon be required to keep sensitive data online, not on their machines.)

But online services are not a silver bullet—yet. As a general rule, the current generation of Web 2.0 apps:

- Make it difficult to work offline (technologies like Google Gears may soon obviate this concern)
- Don’t take full advantage of OS-level services, keyboard shortcuts, etc.
- Are not easily automated or scriptable
- Make it difficult to back up files (FUSE applications may change this in the near future)
- Put users at the mercy of others for data integrity (granted, it’s vastly more likely that you’ll lose data from your own hard drive crashing than from Google’s servers going kaput, but either scenario is a possibility; pick your poison)

…with your feet on the ground

Until these concerns can be fully mitigated, the most promising path forward lies in hybrid desktop/web platforms that allow users to maintain local and online control of information.

These may be end-to-end solutions (for example, the NewsGator family of products includes web- and software-based newsreaders that are fully synchronized) or more specific sync services (Plaxo, for instance, synchronizes desktop calendar and address book clients with online equivalents). When implemented correctly, these tools can be phenomenally useful.

I’ve been waiting for this same innovation to make its way to the world of EDM apps, and there are some promising options emerging. A limited example is DEVONthink Pro Office, which has a built-in web server that provides remote access to your database. (First impression: it’s slick, but you’re out of luck if you’re stuck behind a firewall or the database isn’t running.) Evernote is a new EDM tool with full desktop-to-web synchronization tools, as well as limited online editing.

The beginning

Ultimately, any EDM solution is only a tool — but it may be the most important tool in the arsenal of knowledge workers. It is therefore of critical importance that we take our EDM strategies seriously.

You may not yet have an EDM strategy. But creating one may be the most important step you can take in your development as a knowledge worker.

Take a moment to think about how you manage what you know. Start exploring technologies, asking how they can improve your knowledge set.

It may take months to work out a reasonable system of your own… but it’s a beginning, and one well worth making.

bylr.net

"you're at this website. i guarantee it." -dan byler
