The organization that runs Wikipedia has built a new artificial intelligence editing tool, the Objective Revision Evaluation Service (ORES). The tool is meant to spur innovation among Wikipedia’s massive team of volunteer software developers, and it is aimed at two problems at once: first, edits that damage the quality and credibility of the encyclopedia; second, the discouragement of new contributors who make bad edits with good intentions.
“The model that we built is intended to detect all kinds of damage,” Aaron Halfaker, Senior Research Scientist at the Wikimedia Foundation, told the Observer in a phone call. “But we also have models that predict if an edit is made in good faith or not.”
Wikipedia is massive, with half a million edits made on it per day. Its quality is undermined both by trolls and by newcomers who unintentionally break Wikipedia’s best practices because they don’t know them yet. By wrapping automated damage detection in an API, the foundation lets developers build new approaches to quality control that both protect the encyclopedia and do a better job of helping new contributors. That way, more new contributors might get invested in the project and stick around.
The announcement explained how ORES thinks, writing, “The system works by training models against edit- and article-quality assessments made by Wikipedians and generating automated scores for every single edit and article.”
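Requesting those scores can be sketched as a simple web query. The URL shape below follows ORES’s public web service as announced (ores.wikimedia.org), though the exact path has varied across versions; the revision ID and the sample response are illustrative, not real output.

```python
# Sketch of querying ORES for per-edit scores. The endpoint shape follows the
# public web service as announced; the revision ID and sample response below
# are illustrative, not real ORES output.

def ores_url(context, models, revids):
    """Build a score-request URL for the ORES web service."""
    return ("https://ores.wikimedia.org/v3/scores/{}?models={}&revids={}"
            .format(context,
                    "|".join(models),
                    "|".join(str(r) for r in revids)))

url = ores_url("enwiki", ["damaging", "goodfaith"], [1234567])

# An illustrative response: each model returns a prediction plus class
# probabilities for the requested revision.
sample_response = {
    "enwiki": {"scores": {"1234567": {
        "damaging": {"score": {"prediction": False,
                               "probability": {"true": 0.08, "false": 0.92}}},
        "goodfaith": {"score": {"prediction": True,
                                "probability": {"true": 0.95, "false": 0.05}}},
    }}}
}

scores = sample_response["enwiki"]["scores"]["1234567"]
damaging_p = scores["damaging"]["score"]["probability"]["true"]
goodfaith_p = scores["goodfaith"]["score"]["probability"]["true"]
print(damaging_p, goodfaith_p)  # 0.08 0.95
```

A tool would fetch that URL and read the two probabilities out of the JSON, as the last three lines do against the sample.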
One quality control tool, Ra Un, lets curious readers see this live: it lists new edits in real time, running them through ORES. It flagged this edit, which appears to attempt to insert Tupac Shakur into the band Digital Underground’s page on the Portuguese Wikipedia.
That looks like trolling. Trolling is usually easy to spot.
The motivation for Mr. Halfaker and the Wikimedia Foundation wasn’t to smack contributors on the wrist for getting things wrong. “I think we who engineer tools for social communities have a responsibility to the communities we are working with to empower them,” Mr. Halfaker said. After all, Wikipedia already has three AI systems working on the site’s quality control: Huggle, STiki and ClueBot NG.
“I don’t want to build the next quality control tool. What I’d rather do is give people the signal and let them work with it,” Mr. Halfaker said.
The artificial intelligence essentially works on two axes. It gives every edit two scores: first, the likelihood that it is damaging, and second, the odds that it was made in good faith. If contributors make bad edits in good faith, the hope is that someone more experienced in the community will reach out and help them understand the mistake.
“If you have a sequence of bad scores, then you’re probably a vandal,” Mr. Halfaker said. “If you have a sequence of good scores with a couple of bad ones, you’re probably a good faith contributor.”
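The two-axis routing he describes can be sketched in a few lines. The thresholds and action names below are invented for illustration; ORES only supplies the scores, and each tool built on top of it decides what to do with them.

```python
# A minimal sketch of two-axis triage on ORES-style scores. The 0.5
# threshold and the action names are invented for illustration; ORES
# itself only provides the probabilities.

def triage(damaging_p, goodfaith_p, threshold=0.5):
    """Route an edit based on its damaging and good-faith scores."""
    if damaging_p < threshold:
        return "accept"     # probably a fine edit
    if goodfaith_p >= threshold:
        return "mentor"     # damaging but well-intentioned: reach out
    return "revert"         # damaging and likely bad faith

print(triage(0.1, 0.9))   # accept
print(triage(0.8, 0.9))   # mentor
print(triage(0.9, 0.1))   # revert
```

The middle case is the one the foundation cares about most: a damaging edit made in good faith gets a human welcome rather than an automatic slap.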
Here’s an example:
When writing about a living person, the guidelines say that every fact should come from a reliable source, with a reference. A new contributor might know a lot about Jay-Z, for example, but not so much about the best practices on Wikipedia. So, they might add a fact to Jay-Z’s page that they are relatively confident is true, without adding a citation.
Even if it’s something innocuous, it’s important that Wikipedia backs up everything it reports about living people.
Right now, it’s likely that a contributor’s uncited contribution would get reverted, perhaps with a curt message about the need for citations, generated by some sort of automated system. It could be discouraging enough that the new contributor doesn’t contribute anymore. And that’s a problem. Wikipedia is running lower and lower on contributors and editors.
In fact, it was Mr. Halfaker whose study documented the decline and argued that algorithmic quality control contributed to it. Now an employee of the organization, he’s looking to make automated quality control work better for everyone.
“This is really a hypothesis,” Mr. Halfaker said. That is, the hypothesis that quality control will improve when edit evaluation is given an API.
Right now, a lot of quality control happens on a very macro scale. There are volunteers who devote most of their time on Wikipedia to checking the latest edits, in whatever subject comes up. ORES could make it possible, for example, for particular communities to build an edit-checking feed specific to their subject matter. So media contributors could review media edits, and New England public transit contributors could review edits on related transit articles.
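Such a feed could be a simple filter over scored edits. The edit records and the “topic” field below are invented for illustration; ORES itself only scores edits, so a real tool would join its scores with recent-changes data from the wiki.

```python
# Hypothetical sketch of a topic-specific review feed built on ORES scores.
# The edit records and the "topic" field are invented; a real tool would
# join ORES scores with recent-changes data from the wiki.

recent_edits = [
    {"rev_id": 1, "topic": "media",   "damaging_p": 0.91},
    {"rev_id": 2, "topic": "transit", "damaging_p": 0.03},
    {"rev_id": 3, "topic": "transit", "damaging_p": 0.77},
]

def review_feed(edits, topic, min_damaging=0.5):
    """Return the likely-damaging edits in one community's subject area."""
    return [e["rev_id"] for e in edits
            if e["topic"] == topic and e["damaging_p"] >= min_damaging]

print(review_feed(recent_edits, "transit"))  # [3]
```

With the scores exposed, each community can tune its own threshold and scope instead of relying on one site-wide patrol queue.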
ORES is currently live, to some degree, on 12 language editions of Wikipedia, as well as on Wikidata. For now, it’s only fully functional on the English Wikipedia. It will roll out to more languages and Wikimedia projects over time, and it will take on more kinds of tasks as the AI develops.