2020 U.S. election: Why data can't always be right
First Voice

Editor's note: CGTN's First Voice provides instant commentary on breaking stories. The daily column clarifies emerging issues and better defines the news agenda, offering a Chinese perspective on the latest global events.

The Super Tuesday results of the 2020 U.S. Democratic primaries are in, and they show former U.S. Vice President Joe Biden pulling ahead of Vermont Senator Bernie Sanders.

Earlier, the opinion poll analysis website FiveThirtyEight had correctly predicted that Biden would finish ahead of Sanders by a narrow margin. Data matters. But data can also be wrong. Recall that in 2016 Cambridge Analytica, a political consulting firm, used big data to help Donald Trump win the election, while the New York Times' data-driven forecasts, which pointed to a Hillary Clinton victory, missed the result.

How influential can data be? Which data are important and reliable? Why can data be wrong? Just as each tech corporation has its own algorithm logic, each data company has its own method of analysis.

For example, FiveThirtyEight weights polls by factors including the pollster's historical track record, the sample size and the timing of the poll, and balances them against comparative demographic data. Using this method, it correctly predicted the winner in all 50 states in the 2012 election.
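The weighting idea described above can be sketched in a few lines of code. This is a minimal illustration, not FiveThirtyEight's actual model: the `Poll` fields, the 0-to-1 pollster rating, the square-root treatment of sample size and the 14-day half-life are all assumptions made for the example, and the poll numbers are invented.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Sequence


@dataclass
class Poll:
    """One poll result. All fields are hypothetical, for illustration only."""
    pollster_rating: float  # assumed 0-1 score for historical track record
    sample_size: int        # number of respondents
    days_old: int           # days between the poll and the forecast date
    candidate_share: float  # candidate's support, in percent


def poll_weight(poll: Poll, half_life_days: float = 14.0) -> float:
    """Combine track record, sample size and recency into a single weight."""
    # Assumption: a poll loses half its influence every `half_life_days`.
    recency = 0.5 ** (poll.days_old / half_life_days)
    # Assumption: influence grows with the square root of the sample size.
    return poll.pollster_rating * sqrt(poll.sample_size) * recency


def weighted_average(polls: Sequence[Poll]) -> float:
    """Return the weight-adjusted average support across all polls."""
    weights = [poll_weight(p) for p in polls]
    total = sum(weights)
    if total == 0:
        raise ValueError("no poll carries any weight")
    return sum(w * p.candidate_share for w, p in zip(weights, polls)) / total


# Invented numbers, not real polling data.
example_polls = [
    Poll(pollster_rating=0.9, sample_size=900, days_old=1, candidate_share=31.0),
    Poll(pollster_rating=0.5, sample_size=400, days_old=14, candidate_share=25.0),
]
print(round(weighted_average(example_polls), 1))
```

Under these assumptions, a recent, large poll from a highly rated pollster counts for more than an older, smaller one, so the average leans toward it.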

Screenshot of FiveThirtyEight's forecast for Super Tuesday primaries shows Biden's numbers surging, March 3, 2020.

Demographic data may be accurate because it is based on the census, but it does change as families move from one state to another when the economy shrinks in some states.

American author J.D. Vance described this phenomenon in his bestseller "Hillbilly Elegy": "As the economies of Kentucky and West Virginia lagged behind those of their neighbors, the mountains had only two products that the industrial economies of the North needed: coal and hill people. And Appalachia exported a lot of both."

Opinion data may shift substantially along with demographic data as globalization sweeps through American counties. Former U.S. Secretary of State Hillary Clinton confirmed this in "What Happened," her memoir of the 2016 election, by comparing the demographics at the time of her husband Bill Clinton's election with those of her own. She found that the data had changed a lot between 1992 and 2016.

Data is therefore very important for identifying each candidate's base voters and for working out how to win over swing voters.

Karl Rove, one of former U.S. President George W. Bush's campaign strategists, explained this in detail. He broke down data usage by stage of the election, with each stage requiring the campaign team to work with a different type of data: "baseline polls" at the beginning, "update brushfires" at the mid-stage, and "tracking" before the end of the election.

To be effective, once the team has accurate data, it sets up focus groups so that its advertisements can be targeted precisely at different groups. The hardest data to predict are the opinions of swing voters.

Inexperienced teams passively wait for signals of how these people will vote, while successful ones proactively target their messages at particular groups, defined by race and ethnicity or by age and education.

To influence them, the strategist hires career professionals to speak out on social media using particular campaign pitches, so that uncertain voters feel they know what is really at stake in the election.

This means some tweets on social media platforms are not just simple tweets. They may be deliberately packaged messages that use specific tactics to catch a particular group's attention and steer its members toward a candidate. Predictions based on these data are bought by the campaign team, so only the people who deal directly with this type of data know how to predict it and how to manage it.

However, state-level data may be misread, because some votes depend more on what happens at the county level. To learn which counties care most about which topics, trend data from search engines such as Google can help.

For example, before the Super Tuesday primaries, Google Trends showed that in populous California, the primary concern was healthcare. Bernie Sanders campaigned for a policy under which everyone gets free healthcare irrespective of economic status, and the results showed he gained the largest vote share in the California primary.

Screenshot of Google Trends showing that in populous California, the primary concern among counties is healthcare.

However, some data can lead people astray. During the 2016 election, the New York Times predicted there were 1,024 ways for Hillary Clinton to beat Donald Trump, but in the end all of them proved wrong.

This means machine thinking should not completely replace human thinking at this early stage of artificial intelligence (AI). While big data can use behavioral data to predict how people are likely to act in the future, it also has its limits.

People's choices may change suddenly for a variety of reasons. Predictions are based on past data, which means they cannot keep pace with the present. A voter may be influenced by a campaign rally, but the same voter may change his or her mind after a conversation with family members. That is why every election has many voters who decide at the last minute.

Each year, The Economist magazine publishes a data-based prediction issue on the year ahead under the title "The World in" followed by the coming year. However, while some predictions turn out right, many later prove wrong in the details, with some events delayed and others proving completely wrong as political and economic conditions change dramatically.

In "The World in 2020," it ran two articles predicting the 2020 U.S. election. Both were based on data, but one, written by humans, said Trump would win again, with betting odds putting his chances 24 percent above his opponent's, while the other, generated by an AI, said Trump would lose to whoever his opponent might be.

At this stage, people don't know whether their brains have been outsmarted by AI, but what they do know is that an octopus was once used to predict the winners of World Cup soccer finals, and it did a great job.

An AI product named GPT-2 told human beings to think differently: "The big projects that you think are impossible today are actually possible in the near future." That may be true, because humans need to evolve to make better predictions from data, but we still need to keep our uniquely human way of thinking.

(Script writer: Xiong Tong)

(If you want to contribute and have specific expertise, please contact us at opinions@cgtn.com)