For every reporter, concerned citizen or hard-core anarchist who has stared into the abyss of the WikiLeaks cache and wondered “What do I even search for?” the Associated Press might soon offer relief.
Last month the AP submitted a proposal called Overview, “a tool for exploring large document sets,” for a $475,000 Knight News Challenge prize. Overview is software that will create visualizations of the kinds of large document sets now common thanks to the Freedom of Information Act and organizations like WikiLeaks, which task mere mortal journalists with sifting through inhuman amounts of data. The goal of Overview is to tell journalists “what’s in there.” And it’s more than just word clouds, according to the proposal:
Visualization is important because it allows the reporter to see patterns in the documents. The goal is not pictures, but insight. Techniques like clustering can provide an instant understanding of the main topics of discussion, threaded displays can be used to trace conversations, and entity relationship diagrams show key people, organizations, and places at a glance. Filtering tools will let the reporter zoom in on interesting potential stories. We’re trying to build an interactive system where the computers do the visualization while a human guides the exploration.
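To make the clustering idea concrete, here is a minimal sketch in Python. It is not Overview's method, which the proposal does not spell out. As a stand-in it assumes TF-IDF word weights, cosine similarity, and a greedy pass that puts each document into the most similar existing group. The sample reports, the function names and the 0.05 threshold are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, invented report snippets; not taken from any real document set.
SAMPLE_REPORTS = [
    "IED explosion near checkpoint injured two soldiers",
    "Roadside IED explosion damaged a convoy vehicle",
    "Detainee transferred to police custody after questioning",
    "Police questioning of detainee ended; detainee released",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document."""
    token_lists = [tokenize(doc) for doc in docs]
    doc_count = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))
    vectors = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * math.log(doc_count / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cluster(docs, threshold=0.05):
    """Greedy clustering: each document joins the group whose first
    member it resembles most, or starts a new group if no similarity
    reaches the (assumed) threshold. Returns lists of document indices."""
    vectors = tfidf_vectors(docs)
    clusters = []
    for index, vector in enumerate(vectors):
        best_cluster, best_similarity = None, threshold
        for members in clusters:
            similarity = cosine(vector, vectors[members[0]])
            if similarity >= best_similarity:
                best_cluster, best_similarity = members, similarity
        if best_cluster is None:
            clusters.append([index])
        else:
            best_cluster.append(index)
    return clusters


for group in cluster(SAMPLE_REPORTS):
    print([SAMPLE_REPORTS[i] for i in group])
```

On these four snippets the explosion reports end up in one group and the detainee reports in another. A real tool working on hundreds of thousands of documents would likely need a more robust algorithm and a tuned threshold; the point here is only that shared vocabulary lets a computer propose topics before a reporter has read anything.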
The AP team made a sample of the kind of visualization Overview will provide, using the 2009 Iraq logs released by WikiLeaks. (The proposal is dated before the diplomatic cables even leaked, which makes it rather prescient.)
Best of all, Overview and its training materials will be totally free, assuming the proposal gets funded. Leading the project is Jonathan Stray, the AP's Interactive Technology Editor and formerly a senior computer scientist at Adobe. The full Knight News Challenge proposal is on his blog. The Knight News Challenge is part of the John S. and James L. Knight Media Innovation Initiative and will award $5 million to fund open-source tools and platforms for information distribution and analysis.