One takeaway from the results of the U.S. Presidential election is that many prediction algorithms were miscalibrated, or the data underlying their models was inaccurate; in some cases, both the data and the models had issues. There are two critical components to building exceptional prediction tools: 1) building a robust and generalizable prediction model, and 2) understanding the accuracy, strengths, and weaknesses of the available data, and generating new data if necessary.
Oftentimes, tailoring a prediction model, or even simply segmenting a market, depends on the question asked. If you ask “How can I get single white males between the ages of 20 and 26 to vote for me?”, you may build a different model than if you ask “How can I identify the single white males who are likely to prefer me over the opposition candidate but unlikely to go out and vote without narrowly targeted advertisements?”. Asking the right questions is critically important. At a minimum, you must be willing to change the question as your understanding of the project evolves.
The data you use is even more important. If it is inaccurate, incomplete, or unavailable, your prediction model can have little to no meaning. The challenge in building a powerful model is asking the right question and having (or generating) the right data to answer it.
This Motherboard article provides a narrative of the psychometric analytical approach claimed to have been used by select campaigns in the 2016 U.S. Presidential election. The idea is that asking typical demographic questions like “What do black men under the age of 40 think about our candidate?” may not be an effective way to target campaign advertisements. Because that demographic subpopulation likely exhibits varied preferences, targeting a single ad to the entire group may be ineffective or even harmful to the goal. Instead of segmenting on demographics (like “black,” “men,” “under age 40”), the psychometric approach segments on personality traits (like “extrovert,” “trusting,” “adventurous”) as observed through publicly available data such as social media activity, consumer purchasing data, club memberships, and news subscriptions. Group and individual profiles can then be built along personality lines. To analyze and predict voter patterns along these personality characteristics (psychometric patterns), a new dataset had to be generated. By combining a slightly different question with a dataset generated to answer it, data science teams were able to predict (and possibly influence) outcomes closer to what actually occurred.
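The contrast between demographic and psychometric segmentation can be sketched with a toy example. Everything below (the voter records, the trait labels) is invented purely for illustration; a real pipeline would derive traits from the kinds of behavioral data described above rather than hand-label them.

```python
from collections import defaultdict

# Hypothetical voter records: each has a demographic bucket and
# (invented) personality traits.
voters = [
    {"id": 1, "demo": "men under 40", "traits": ("extrovert", "adventurous")},
    {"id": 2, "demo": "men under 40", "traits": ("introvert", "cautious")},
    {"id": 3, "demo": "men under 40", "traits": ("extrovert", "trusting")},
    {"id": 4, "demo": "women over 40", "traits": ("extrovert", "adventurous")},
]

def group_by(records, key):
    """Group record ids by an arbitrary key function."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append(r["id"])
    return dict(groups)

# Demographic segmentation lumps dissimilar personalities together...
by_demo = group_by(voters, lambda r: r["demo"])
# ...while psychometric segmentation cuts across demographic lines.
by_traits = group_by(voters, lambda r: r["traits"])

print(by_demo)    # voters 1-3 share a bucket despite differing traits
print(by_traits)  # voters 1 and 4 share a bucket despite differing demographics
```

The point of the sketch is only the change of key function: the same records, grouped by a different question, yield segments that would receive very different targeted messages.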
Obviously, data science alone did not alter the outcome of the U.S. presidential election. However, it is clear that different campaigns were asking different questions and likely using different datasets. The power of the insights available from accurate data is exceptional. A data scientist’s job is to figure out the right questions to ask of the data and, before building a predictive model, to understand the gap between the real world they are trying to model and the data they have or must generate.