This Website Will Scan Your Twitter Feed for Content That Threatens the State

Jennifer Gradecki presents on her research into how intelligence agencies gather and analyze open-source data at Radical Networks in Brooklyn. (Photo: Brady Dale/Observer)

During Dzhokhar Tsarnaev’s trial for the Boston Marathon bombing, some of the evidence against him came from his Twitter timeline. This was important because the prosecution needed to establish the degree to which the attack was premeditated. As The Guardian reported, however, the defense showed that many of the supposed threats in the tweets the prosecution presented were quotes from comedy shows or song lyrics. Once the intelligence machinery had identified certain messages as indicative of dangerous activity, no one double-checked whether some of them might have another explanation.

Jennifer Gradecki and Derek Curry are two artists who have made the Crowd-Sourced Intelligence Agency to help the public understand how law enforcement models scour public sources of information and identify messages as threatening. The team has collected as much publicly available information as it can to construct a machine learning algorithm that models how American law enforcement systems assess messages on social media. “They do work. When I just opened it up on Twitter, you find ISIS pretty quick,” Curry explained, but you also end up seeing a message from a 15-year-old kid in Kansas about launching a distributed denial-of-service (DDoS) attack against someone over a video game. That message ends up on the same list as messages from a terrorist organization, which directs agencies down a dangerous road.

“When you put something in a surveillance system, you kind of frame the judgement of the analyst,” Gradecki added.

Gradecki and Curry are both PhD candidates at the University at Buffalo. Gradecki said that she hopes visitors to the site will “gain a practice-based understanding of how these systems function, how they frame the data.”

“In short, we really want people not to say, ‘I don’t care if they see what I’m doing. I’m not doing anything wrong,’” Curry explained.

Gradecki presented today at Radical Networks in Brooklyn on what she has learned from researching intelligence agencies. In particular, she examined the metaphors the agencies use to describe their work as a way of understanding how state intelligence actually works.

“Analyzing metaphors used by intelligence agencies can reveal how their thoughts and actions are structured,” she said in her talk. “This knowledge can reveal tactics for resistance.”

She focused particularly on the metaphor of the “mosaic.” A mosaic is a work in which an artist takes pieces of different things and assembles them into a new image. She has also found intelligence officials using the metaphor of a puzzle, but a puzzle is very different from a mosaic. In a puzzle, the picture already exists and only has to be assembled. In a mosaic, a picture is created from pieces that were not part of a prior image.

The latter is a better fit for how intelligence tends to work today, Gradecki argued, saying that “analysts tend to assemble a mental model first, then find pieces that fit into it.”

Further, the more pieces of data law enforcement has, the more pictures it can create, which helps to explain why the general practice in signals intelligence today is simply to gather and hold as much data as possible.

The Crowd-Sourced Intelligence Agency helps the public better understand the models used to assess public data. It has a social media dashboard designed to look as much as possible like one an intelligence analyst would see. It lets users create watchlists of their own. It assesses text messages to tell users whether or not the team’s algorithms would identify the message as threatening. It also lets the public tell the system whether or not they agree with its assessment of data scraped from the public internet, such as tweets.

The larger point of the project is to illustrate that machine learning systems are only as good as the data that seed them. If the model identifies a lyric from a pop song as dangerous, as in Tsarnaev’s case, then “every time someone tweeted those same song lyrics, they’d be flagged as threatening,” Gradecki said.
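
To make that point concrete, here is a minimal sketch in Python of how a single mislabeled example can propagate through a text classifier. It is not the Crowd-Sourced Intelligence Agency's code. The tiny word-count model, the function names and every training message, including the "lyric," are invented for illustration. The first model is seeded with the lyric labeled as a threat; the second uses the same data with that one label corrected.

```python
import math
from collections import Counter

# Hypothetical lyric and training messages, invented for this illustration.
LYRIC = "burn it all down tonight"

def tokenize(text):
    return text.lower().split()

def train(examples):
    """Count how often each word appears under each label."""
    counts = {"threat": Counter(), "benign": Counter()}
    for text, label in examples:
        counts[label].update(tokenize(text))
    return counts

def score(counts, text):
    """Sum of per-word log-odds (threat vs. benign) with add-one smoothing."""
    vocab = set(counts["threat"]) | set(counts["benign"])
    total_threat = sum(counts["threat"].values())
    total_benign = sum(counts["benign"].values())
    total = 0.0
    for token in tokenize(text):
        if token not in vocab:
            continue  # words never seen in training carry no evidence
        p_threat = (counts["threat"][token] + 1) / (total_threat + len(vocab))
        p_benign = (counts["benign"][token] + 1) / (total_benign + len(vocab))
        total += math.log(p_threat / p_benign)
    return total

def is_flagged(counts, text):
    return score(counts, text) > 0

base = [
    ("we will attack the stadium", "threat"),
    ("great game last night", "benign"),
    ("coffee with friends this morning", "benign"),
]

# Model seeded with the lyric mislabeled as a threat.
seeded = train(base + [(LYRIC, "threat")])
# Same data, with that one label corrected.
corrected = train(base + [(LYRIC, "benign")])

tweet = "singing burn it all down tonight at karaoke"
print(is_flagged(seeded, tweet))
print(is_flagged(corrected, tweet))
```

In this toy setup, the only difference between the two models is one label in the seed data, yet it decides whether anyone who later quotes the lyric gets flagged.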