“It’s not about displaying documents,” said Aron Pilhofer, editor of interactive news technologies at The New York Times and leader of the team of news journalists and programmers behind those interactive, data-driven displays on NYTimes.com. “It’s about making documents that are already on the Web findable, searchable, and structured in a way that you just can’t do right now.”
Mr. Pilhofer was calling in to The Observer yesterday from MIT in Cambridge, Mass., where the Knight Foundation had just announced the winners of its 2009 Knight News Challenge. DocumentCloud, a data archiving project Mr. Pilhofer created with three other journalists and developers at ProPublica and The New York Times, was one of nine projects that shared $5.1 million in grants to reinvent news and information using crowd-sourcing, mobile technology and other digital journalism tactics.
DocumentCloud received its own two-year grant of $719,500 (the team had originally sought $1 million over three years).
Mr. Pilhofer explained that DocumentCloud plans to collect the papers and PDF documents that reporters, bloggers and civic groups usually stack in their bottom drawers or stow away in random folders on their desktops at the end of an investigation, and to extract the information in them so that it is findable, shareable and searchable on the Web.
His team's original idea stemmed from The Times' DocViewer software, which created a searchable database of the more than 11,000 pages of Hillary Clinton's public schedule from her eight years as first lady.
Think of those huge PDFs listing political donations, legislative votes, sports players' stats or even VIP lists from socialite dinners and parties, all extracted and put online in their rawest form.
“Suddenly, that document becomes a more valuable resource than, say, if it’s sitting on the New York Times Web site or if it’s a link to a downloadable PDF somewhere,” Mr. Pilhofer said.
Using software from DocumentCloud's partners, users will be able to play with the data and create displays. They will also be able to search for documents by date, topic, person or location, and to draw connections between them in all kinds of ways.
For example, “If I want a particular topic, say the Iraq war, or the Vancouver Olympics or something like that, geography can come into play,” Mr. Pilhofer said. “Like give me every document about the Vancouver Olympics within 50 miles from Vancouver.” You got it.
“You’ll be able to make references between documents and within documents that you just weren’t able to do before,” he said.
“We’re the card catalog of primary-source documents—we’re developing the data,” he added.
The project was created by Mr. Pilhofer, who previously worked for the Center for Public Integrity in Washington; Eric Umansky, a senior editor at ProPublica, former editor of MotherJones.com and a onetime writer of Slate's "Today's Papers" column; Scott Klein, ProPublica's editor of online development and an alumnus of The Nation, The New York Times and Condé Nast; and Ben Koski, a software engineer in The New York Times' interactive news technology group.
The Times has partnered on the project, along with ProPublica, Gotham Gazette, Talking Points Memo and the National Security Archive, offering up some of its own software and data to add to the "card catalog."
Back in November, DocumentCloud came under fire from N.Y.U.'s Jay Rosen, who criticized the team's application on Twitter and suggested that The Times should be giving money to the Knight Foundation, not seeking it. Mr. Pilhofer emailed a response to Harvard's Nieman Journalism Lab:
I can understand why some would feel that way, but I think it’s more a misunderstanding of what the project is and who it’s intended for…This is a grant submitted by us, but it’s not for us…The project is to create what we’re calling a consortium, some sort of entity that is not The New York Times, that is not ProPublica. Ideally, this will incorporate all sorts of media organizations and bloggers and watchdog groups and universities…If anything, Professor Rosen has it kind of backwards: We’re contributing to this effort. We’re contributing development resources, we’re contributing our time.
Mr. Pilhofer said his team hopes to launch before the end of the year.