Inside YouTube’s Secret Algorithm Wars

Google-owned YouTube has struggled for years with how to handle questionable videos. A new book details how the company handled, and mishandled, the burning issue.

Photo by Jakub Porzycki/NurPhoto via Getty Images

This article is excerpted from Mark Bergen’s book Like, Comment, Subscribe

Almost every day, YouTube’s engineers experiment on us without our knowledge. They tweak video recommendations for subsets of users, review the results, and tweak again. Ideally, YouTube wants more “valued watchtime” — its term for time spent on videos most viewers find agreeable, or at least not detestable.

Since the extended backlash against social media starting in 2016, all major internet platforms have pledged to refine their metrics in this manner, moving beyond optimizing for pure engagement. Facebook vowed to track “time well spent.” YouTube promised valued viewing. The hope is to keep regulators and other critics at bay.


Trouble is, valued watchtime often eats into overall watchtime, YouTube’s gold standard, something the company has known since well before 2016. While reporting for my new book on YouTube’s history, I discovered that prior to its recent pledge, YouTube tried several times to rank content qualitatively, but either never figured it out or gave up trying.

It started around a decade ago. Back then, YouTube strove to treat all videos equally. If footage didn’t break copyright or graphic violence rules, YouTube thought it belonged on its site and in its promotional machine. “Look, if it’s not good enough to recommend, it just shouldn’t be on YouTube,” Cristos Goodrow, a senior engineer, once said.

YouTube didn’t love clickbait—videos that lured viewers in under false pretenses or sent them away quickly—so in 2012 it changed the way it recommended videos, moving from a system that favored clicks to one that favored time spent. Clickbait soon went away. And huge new content categories emerged—gaming, beauty, vlogging—while YouTube’s ads business took off.

Some YouTube employees noticed one particular category explode out of nowhere after the algorithmic switch: toy unboxing. The videos were enthralling for young viewers, but didn’t seem that educational. The addictive appeal for sponge-brained preadolescents was obvious. In a news report, one television executive called the YouTube content “toddler crack.” A former YouTube executive recalled watching these videos spread with the distinct feeling they were working at a cigarette company.

Rather than drugs, YouTube considered toy unboxing videos sugary snacks: best in moderation. YouTube worried that gorging on them might make viewers (or their parents) abandon the site. Other videos, like Khan Academy’s math tutorials or Hank and John Green’s SciShow, seemed educational and wholesome.

So later in 2012, YouTube began an initiative internally dubbed “Nutritious and Delicious.”

It considered assigning a “goodness score” to certain videos or channels, giving them more weight in rankings. In meetings and internal correspondence, YouTube referred to the approach as adding “broccoli” to the platform (or sometimes “chocolate-covered broccoli”). Creator teams drew up plans to get thirty percent of watchtime from Nutritious videos. Some drafted broccoli OKRs (objectives and key results), the primary method for setting goals at Google, YouTube’s parent.

Then, in a fateful twist, these discussions petered out.

YouTube was preoccupied with Facebook’s rising threat and Google’s obsession with Google Plus, making it worry more about its survival than your nutrition. Staff also got stuck on certain questions: What exactly is Nutritious? How do we decide? Can we program quality into algorithms? Should we? No company-wide metrics were set. “If you can’t figure out how to measure it,” one executive from that era told me, “you just pretend it doesn’t exist.”

Fast forward a few years, and YouTube would have these conversations all over again. The stakes were much higher — YouTube’s business had become more critical to Google, and more politicians and regulators were paying attention. Except this time, it did something.

Since 2019, YouTube has stopped treating all pixels alike. Its recommendation algorithm now demotes “borderline” videos, those that get uncomfortably close to being harmful. YouTube has also done more to disclose how it decides what’s borderline and how it scores the “authoritativeness” of publishers, key components of its responsibility push.

YouTube says its recommendation system digests some 80 billion signals a day. But the chief input for gauging “valued watchtime” is surveys that appear after videos. These surveys work like Uber ratings: they let viewers vote with one to five stars. Videos graded with three stars or above are tossed in with other signals (comments, likes, etc.) into a metric for satisfied viewing.

Sometimes more valued watchtime eats into overall viewing hours, so the engineering team has a loose rule for bending its algorithms. If a test showed a certain improvement in satisfaction, say one percent, executives okayed the change, so long as it didn’t dent overall watchtime too much (usually no more than 0.2 percent). If it did, then back to tweaking. A spokesperson said the company didn’t have “hard and fast rules” for this process.
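The loose rule described above is essentially a two-threshold gate, which can be sketched in a few lines. The function name and default thresholds are assumptions drawn from the figures in this article; as the spokesperson notes, the real process has no hard and fast rules.

```python
# Illustrative sketch of the decision rule described above.
# Thresholds come from the article's examples (1% satisfaction lift,
# 0.2% watchtime dip); the function itself is hypothetical.

def approve_change(satisfaction_lift: float,
                   watchtime_change: float,
                   min_lift: float = 0.01,
                   max_dip: float = 0.002) -> bool:
    """Okay an algorithm tweak if satisfaction improves enough and
    overall watchtime doesn't fall by more than the allowed dip."""
    return satisfaction_lift >= min_lift and watchtime_change >= -max_dip

# A 1% satisfaction gain costing 0.1% of watchtime passes:
print(approve_change(0.01, -0.001))   # True
# The same gain costing 0.5% of watchtime goes back to tweaking:
print(approve_change(0.01, -0.005))   # False
```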

Despite its recent attempts to optimize for valued watchtime, YouTube’s old habits may prove hard to break. Like Facebook, YouTube has shifted its algorithmic gears to drive eyeballs to its TikTok competitor, YouTube Shorts. These are bite-sized videos, entertaining and absorbing. And while YouTube runs its surveys between Shorts, the videos aren’t made for the likes, comments, and other signals YouTube uses to register value. Shorts are designed for viewers to flip through idly, from one video to the next.