Within the last few weeks, the Federal Trade Commission (FTC) has issued a series of fines against major U.S. companies, including Microsoft and Amazon, over illegal data collection practices. The fines fall in line with a more aggressive stance toward privacy enforcement under FTC chairwoman Lina Khan and under Europe’s five-year-old data law, the General Data Protection Regulation (GDPR). The increased attention on data privacy in recent years has consumers questioning whether they can trust the companies selling to them.
The inherent tension in collecting data is that it allows companies to tailor advertisements to what consumers want, but it can be abused when data is transferred, sold or stolen, explained Kevin Werbach, professor of legal studies and business ethics at the University of Pennsylvania. “Even when the Amazons of the world are careful, the more data that is stored, the more harm there is when inevitably there are cyber attacks and breaches,” Werbach told Observer.
Given its size and spotty track record, Amazon (AMZN) is especially under the microscope. In 2021, a European regulator issued the company a then-record 746 million euro ($888 million) penalty for breaking data privacy laws. That same year, an investigation by Wired also revealed a shocking level of carelessness in Amazon’s handling of consumer data, including pervasive sharing of consumer data within the company. The widespread sharing reportedly led to employees spying on the purchases of celebrities, taking bribes to help sabotage a seller’s business and selling knock-off products. Millions of credit card numbers were stored incorrectly in Amazon’s network, and a Chinese data firm had access to millions of Amazon consumers’ information. The company has consistently defended its data privacy practices, including in response to Wired’s article.
Amazon’s breakneck growth is due in part to its data collection practices. As consumers become increasingly mindful of their data and as regulation creeps up on the e-commerce behemoth, Amazon must convince everyone that it has its users’ best interests in mind.
How much data does Amazon have?
It is difficult to quantify exactly how much data Amazon has. The company’s sales blow other e-commerce companies out of the water.
“Their business model is driven by getting all the data they can get their hands on,” said Werbach.
What data does Amazon collect?
On the most basic level, Amazon collects a user’s name, location, payment information and the products they purchase. It also remembers click history, including what someone looked at, how long they looked at it and whether they bought it. The extent of the company’s data collection depends on how many Amazon products a consumer uses and how much detail they give those products. For example, Amazon collects recordings of what users say to Alexa, the virtual assistant. The company notes what consumers watch on Prime Video, its streaming service. Amazon also pulls data from its slew of subsidiary businesses, including Ring, Whole Foods, Twitch, IMDb and Audible.
Amazon wedding and baby registries allow the company to infer the age of consumers and their families. At the end of 2021, the most recent data available, Amazon controlled 45 percent of the wedding registry market. Bed Bath & Beyond controlled 30 percent, and with its recent bankruptcy and closure announcement, Amazon could snag an even higher market share. A survey from What to Expect showed 80 percent of parents created their baby registry on Amazon. Using the timing of a registry’s creation, Amazon could advertise products to parents according to the assumed age of their children.
The more data an algorithm has access to, the better it becomes, said Kevin Dominik Korte, president of Univention North America, an open-source products supplier. “There’s a strong incentive right now to keep as much data as possible for training purposes.”
The standard differs between countries. Europe has regulations that determine what kinds of data companies are allowed to collect and how they can use it. Regulators then fine companies when they break the rules. Authorities have issued three fines against Amazon, which is more than Microsoft but fewer than Meta. In the U.S., the threat of class action lawsuits exerts similar pressure, Korte said.
The concern is just how accurate a picture these companies can paint of an individual consumer. Amazon can likely predict if a person is pregnant from their shopping history, and it might use that information to sell them more products, said Korte. But that data can be stolen or subpoenaed by the government. Last year, Meta received a government order to hand over private messages from a 17-year-old who allegedly planned and executed a late-term abortion. Meta encountered widespread criticism for complying. Like other Big Tech companies, Amazon is regularly subpoenaed for data.
In addition to collecting data on their own, e-commerce companies can purchase information from third-party data brokers to offer additional insights. “Presumably, Amazon has a lesser need to buy third-party data” because of its many sources of collection, said Werbach, but “there’s always more information that can round out a picture.”
“The broad data ecosystem and number of sharing relationships is bigger than most will realize,” he told Observer. “Even the biggest digital platforms tend to be purchasing third-party data.”
Where does Amazon store your data?
Amazon stores user data in a worldwide network of data centers run by Amazon Web Services (AWS), its cloud computing business. The company also sells its data storage services to other companies, which reportedly include Netflix, Comcast, General Electric and McDonald’s. Amazon discloses the regions its centers are housed in, but the exact locations are unclear. It also doesn’t disclose exactly how many data centers it owns.
Amazon operates data centers on six continents, often near population hubs like London, Sydney and Beijing. In the U.S., Amazon has centers on the east and west coasts, including in Los Angeles, Miami, Seattle and Ashburn, Virginia. AWS owns 15.4 million square feet of property—presumably data centers—according to its annual report. It leases an additional 18 million square feet.
Northern Virginia is a U.S. hub for data centers. Amazon has more than 70 centers in the region, according to Baxtel, a site tracking data center locations. They vary in size from 100,000 square feet to 200,000 square feet, but some buildings are reportedly much larger. Last year, Amazon filed applications to build two 450,000-square-foot buildings in the area, InsideNova reported.
Companies often build data centers, which look like large warehouses, close to where their users are. The farther a center is from its users, the farther information has to travel to reach them, which can lead to delays. By building its centers around the world, Amazon can reach many consumers quickly.
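To give a rough sense of why distance matters, the short sketch below estimates the best-case round-trip delay from distance alone. It assumes signals move through optical fiber at about 200,000 kilometers per second, roughly two-thirds the speed of light in a vacuum. The distances are illustrative, not Amazon’s, and real-world delays are higher because of routing and processing.

```python
# Assumption: light in optical fiber covers roughly 200,000 km per second.
FIBER_SPEED_KM_PER_S = 200_000


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay in milliseconds."""
    return 2 * distance_km / FIBER_SPEED_KM_PER_S * 1000


# Hypothetical distances between a user and a data center.
examples = [("same metro area", 50), ("cross-country", 4_000), ("overseas", 10_000)]

for label, km in examples:
    print(f"{label}: {round_trip_ms(km):.1f} ms")
```

Under these assumptions, a center 50 kilometers away adds about half a millisecond per round trip, while one 10,000 kilometers away adds about 100 milliseconds before any other overhead.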
Amazon chooses its data center locations after weighing environmental and geographic factors. It tries to avoid locations prone to flooding, extreme weather and seismic activity, according to its website. Centers must also be built to withstand natural disasters. For example, a location on the U.S. east coast should be able to manage hurricanes and tropical storms, which requires having multiple sources of power, cooling and heating at all times.
If a data center goes down due to environmental or technical reasons, companies can be subject to lost revenue and angry customers. Users may not be able to complete purchases or access online material stored within that center. In extreme cases, technical issues can cause outages in entire stock markets, which happened last month in Switzerland. Amazon has contingency plans for data center failure. The service automatically shifts traffic to another data center, according to its website.
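The idea of shifting traffic away from a failed center can be sketched in a few lines. This is an illustration only, not Amazon’s implementation: the data center names and the health map are hypothetical, and a real system would also handle health checks, capacity and data replication.

```python
from typing import Optional


def pick_data_center(preferred: list[str], healthy: dict[str, bool]) -> Optional[str]:
    """Return the first healthy data center in order of preference, or None."""
    for name in preferred:
        # A center missing from the health map is treated as unavailable.
        if healthy.get(name, False):
            return name
    return None


# Hypothetical centers, ordered from closest to farthest from the user.
preferred = ["dc-east-1", "dc-east-2", "dc-west-1"]
healthy = {"dc-east-1": False, "dc-east-2": True, "dc-west-1": True}

print(pick_data_center(preferred, healthy))
```

Here the closest center is marked as down, so the request goes to the next one on the list.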
How does Amazon share your data?
The third parties that sell on Amazon, like Apple and small businesses, can access personal information from users who buy their products—but only the information necessary to complete the sale. Amazon also provides a slew of anonymized data to merchants. Sellers can see the demographic data of customers, including age, income, education, gender and marital status. They can also see what products consumers are looking at alongside theirs and what shoppers purchase instead of theirs. The company employs third parties to deliver packages, process payments and perform customer service duties, among other tasks. According to Amazon, these groups only “have access to personal information needed to perform their functions.”
Amazon also complies when law enforcement requests data as part of open investigations. Amazon can challenge these data demands in court if it decides a request is overly broad or inappropriate. In its most recent report on the matter, covering July to December 2022, Amazon fulfilled more than 31,000 requests for information, 15.4 percent of which came from U.S. law enforcement agencies. It is unclear how many requests the company received and what percentage of the total it complied with, because Amazon stopped disclosing that information in 2020, though other Big Tech companies still provide it. In the same time period, Meta received 230,000 requests and produced information for 76.8 percent of them.
Amazon’s most recent six-month total is in line with the number of requests it has complied with every six months in recent years. A shift happened in mid-2020 when law enforcement increased the number of requests to Amazon by 800 percent. In the first half of 2020, Amazon received 3,220 data requests, which is comparable to the number during previous six-month periods, going back years. During the second half of 2020, when Amazon stopped disclosing the number of requests received, the company complied with 27,700 requests, meaning law enforcement likely sent more. This spike isn’t evident in the data demands for Microsoft, Meta or Google.
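The size of that jump can be checked with simple arithmetic. The sketch below uses only the two figures cited above. Because it compares requests received in the first half of 2020 with requests complied with in the second half, the result is a lower bound on the true increase, which fits the roughly 800 percent figure.

```python
def percent_increase(before: float, after: float) -> float:
    """Percentage change from `before` to `after`."""
    return (after - before) / before * 100


received_h1_2020 = 3_220   # requests received, first half of 2020
complied_h2_2020 = 27_700  # requests complied with, second half of 2020

increase = percent_increase(received_h1_2020, complied_h2_2020)
multiple = complied_h2_2020 / received_h1_2020

print(f"At least a {increase:.0f} percent increase ({multiple:.1f} times as many)")
```

The calculation gives an increase of at least 760 percent, or about 8.6 times as many requests as in the prior six months.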
How can you protect your data?
“In general, the extent of data collection and targeting is something consumers should be concerned about, but it’s not something they can do all that much about at an individual level,” said Werbach, the professor. “That’s a reason there should be more comprehensive regulation to identify what practices are legitimate and which ones aren’t.”
Because companies can collect whatever they disclose in their privacy policies, the notices tend to be written vaguely, he said. There is also no ability to negotiate a policy. These practices have changed slightly with the GDPR, but big platforms like Amazon still don’t feel limited, he said.
One of the biggest threats to consumers is data breaches, said Korte. In 2019, a former Amazon engineer orchestrated a massive hack that “did more than $250 million in damage to companies and individuals,” according to U.S. Attorney Nick Brown. The employee downloaded information from more than 100 million Capital One customers, including 120,000 Social Security numbers and 77,000 bank account numbers. Amazon employees have also inappropriately shared users’ contact information with third parties multiple times, which is against company policy. In some cases, employees received bribes to give certain sellers unfair advantages over others. Because leaks are usually driven by people within a company, it is impossible to stop breaches without getting rid of all employees, Korte said. The best way for a company to manage leaks is to quickly tell customers what happened and maintain proactive communication, he told Observer.
Amazon has taken steps to minimize the impact of data leaks. In January, Amazon Web Services began encrypting all new data by default. While this action alone can’t stop breaches, it makes leaks less harmful to consumers, because hackers won’t be able to make sense of the data. Amazon’s products also prompt users to verify their identities before Amazon discloses personal information.
What does the future of data collection look like?
The utopian perspective is that consumers see their data has value and don’t provide it to companies freely, said Korte. “But as a species, we are probably far too lazy to do that. It’s nice to get show recommendations based on the last five you watched,” he said.
A positive and realistic future would combine strong data regulations with a standard under which companies can train algorithms on user data but must delete that data once the training is done, he said. If regulation isn’t possible, companies could benefit from having strong risk management. The threat of losing money through data breaches and fines could drive them to put the proper safeguards in place, he said.
It is likely companies will expand their categories of data collection, with biometrics as a possible new territory, said Werbach. Biometrics include fingerprints, DNA, facial recognition and voice cadence, among other identifying biological data. The prevalence of smart devices has already popularized the use of biometrics as passwords (opening an iPhone via a fingerprint or facial scan, for example), but the concern is in how else companies could use this data.
“We are as close as we’ve ever been to a privacy law in the U.S.,” Werbach told Observer. A bill with bipartisan support that would have established privacy rights and oversight mechanisms died in Congress last year, but many individual states are beginning to enforce similar measures.