Twenty newspapers, magazines and nonprofit organizations have become new partners with DocumentCloud, a data archiving project created by journalists and developers at ProPublica and The New York Times. The Atlantic, New Yorker, Mother Jones, MSNBC, WNYC and The Washington Post are among the publications that will submit documents, files and other data into the DocumentCloud system, and soon make them available for public search.
“We all had both personal and professional relationships with people in a lot of different newsrooms,” said DocumentCloud co-creator Scott Klein, ProPublica’s editor of online development and a veteran of The Nation, The New York Times and Condé Nast.
Eric Umansky, a senior editor at ProPublica and former editor of MotherJones.com, told The Observer “it was usually a pretty easy sell.”
“The reality was we were basically telling people we’re gonna make your documents easier to find; we’re going to give more attention to your reporting and, in exchange, it doesn’t cost you anything and you don’t have to make any commitment besides good faith in your documents,” he said.
As The Observer reported in June, DocumentCloud was co-created by Aron Pilhofer, editor of interactive news technologies at The New York Times, and Ben Koski, a software engineer in The New York Times’ interactive news technology group (who worked on that Gauging Your Distraction online game for The Times). Their proposal for the high-tech data archive was one of nine projects to receive a total of $5.1 million in grants to come up with new ideas on reinventing news and information organization using crowd-sourcing, mobile technology and other digital journalism tactics.
The team received a two-year grant of $719,500 (it was originally seeking $1 million over three years) and is now in full-swing development mode.
DocumentCloud’s software, once it’s fully built, will take all those papers that reporters, bloggers and civic groups usually stack in their bottom drawers or computer desktop folders at the end of their investigations, and extract all the information so that it’s findable, shareable and searchable on the Web.
Think of those huge PDFs displaying political donations, legislative votes, sports players’ stats or even VIP lists from socialite dinners and parties, all extracted and put online in their rawest forms.
Using software from DocumentCloud’s partners, users will be able to play with the data and create displays. They will also be able to search for documents by date, topic, person or location, and make connections between them in all kinds of ways. All of the information will be free and searchable by the public, according to Mr. Klein and Mr. Umansky.
The New York Times, ProPublica, Talking Points Memo, the National Security Archive and Gotham Gazette have been partners from the beginning of the project.
Other publications that will be contributing their resources and documents include the Chicago Tribune, Dallas Morning News, Arizona Republic, Minnesota Post, St. Petersburg Times and the Voice of San Diego.
Nonprofits and organizations joining in are the ACLU National Security Project, Center for Democracy and Technology, the Centre for Investigative Journalism at City University in London, the Center for Investigative Reporting in California, the Center for Public Integrity, the Investigative Workshop at American University and the Sunlight Foundation.
PBS’ NewsHour with Jim Lehrer is also partnering with DocumentCloud.
Creating these partnerships will “give us tons of documents to train the system, to test the system, put us on the right track for practicalities of the system, and also train our software in terms of semantic data extractions,” said Mr. Klein. “But we want them to be kicking the tires as we go. They should be telling us, ‘This is what is interesting and helpful to us as a newsroom and these are the things that we find confusing.’”
Right now, only these partners will be able to upload documents into the system. But once the beta test of DocumentCloud is up and running (they hope to have something for the public to see by next August) any journalist, blogger or regular Joe will be able to access all of that information.
The team’s code and open-source projects will also be free for other developers to use. They’ll be releasing them throughout the year as development proceeds.
Although the DocumentCloud team is considering pay models to sustain the project once their grant money runs out, “for now, we’re very much thinking of this as something that is going to benefit society and help newsrooms get their messages out,” said Mr. Umansky. “We’re a charitable nonprofit.”