Rob Sickorez 2017-04-12 02:21:25
Data scientists tend to be detail-oriented people. They have loads of unstructured data and a wide array of techniques at their disposal. It’s easy to become overwhelmed by detail and lose sight of the big picture, but data scientists should aspire to be big picture people.

Below are my top seven data science quotes for 2017. Hopefully they are witty enough to trick you into reading the comments that follow each one, and to encourage you to reflect on how to use data science to support the big picture. We would like to reinforce our opinion that the goal of data science is to enable decision makers to use data to study real-world problems and make smart decisions. The models and algorithms we use are just a means to that end.

“It is exceedingly difficult to make predictions, particularly about the future.” ~ Niels Bohr, physicist

Supposedly, this quote was a tongue-in-cheek response when Bohr was asked to predict the long-term influence quantum physics would have on the world. Similar quotes have also been attributed to the baseball player Yogi Berra. We like this quote mostly because it makes us laugh. It also motivates a couple of good points relevant to data science.

First, an SEC Rule 156-type warning should be attached to all forecasting models: Past performance may not be indicative of future results. We can only use our current data and understanding of the world to make predictions. We’re often going to be wrong, but this doesn’t invalidate the practice of forecasting. Decisions we make using inadequate forecasts are better than those we make using no forecasts.

Second, since we know that our predictions will be wrong quite often, we should build data tools that help operators decide how to reallocate resources and execute contingency plans when the future unfolds differently than we had predicted.

“The most misleading assumptions are the ones you don’t even know you’re making.” ~ Douglas Adams, author

Douglas Adams wrote about a visit he made to a gorilla sanctuary in Zaire. He began to imagine how the silverback gorilla he was staring at would perceive him and then quickly abandoned that idea. He realized that any meaning he attached to the gorilla’s actions would be based on assumptions derived from his experience as a human, which was sure to be quite foreign to the gorilla.

The connection to the world of data science is clear: Data scientists should seek out subject matter experts early and often. Then, clearly communicate modeling assumptions and give the experts a chance to challenge these assumptions. Data scientists are creative and resourceful, but are bound to see the world differently from the business experts. Let the experts take their shots and embarrass you early on, and then wow them with your ability to extract meaningful insights.

“If you can’t explain it simply, you don’t understand it well enough.” ~ Albert Einstein, physicist

Einstein offers many insights for the data scientist. We like, “Not everything that can be counted counts, and not everything that counts can be counted,” and also, “Everything should be made as simple as possible, but not simpler.” However, we chose this one because it speaks to an essential skill that data scientists bring to organizations: the ability to explain complex relationships in a clear and compelling way to motivate action.

Good data scientists should feel comfortable in their role as data-to-business (and vice versa) translators. Just as we struggle to locate the signal amidst noise in our data sets, we must put in the work to distill our mathematical ideas so they may be consumed by business leaders. It is our ability to clearly communicate ideas that helps decision makers progress toward an ongoing conversation with data.

“Models should be used, not believed.” ~ Henri Theil, econometrician

Every model is an abstraction of reality. Through abstraction, models lose their intrinsic connection to the real world. Remember, models only get to learn from the data we provide. Models are unaware of the undocumented subtleties of the real world. In the simple linear regression case, we know that predictions near the center of our data are reliable, while predictions outside the observed data often are not. In general, models are not very good at predicting responses to unusual variation. We need to be aware of this.

By all means, use a model. The decisions you make, even by considering an inadequate model, will be better than those made with no model at all. However, always keep your understanding of the real problem and the limits of your model in the forefront. Use any model as you would your cell phone’s GPS navigation: the directions are in beta, and common sense still applies.

“If you torture the data long enough, it will confess.” ~ Ronald Coase, economist

If our goal is to use data to make better decisions, there is clearly nothing to gain from manipulating data to reflect anything but the truth. So, let’s avoid misrepresented statistics that one might associate with lies and damn lies. Still, Coase’s quote reminds us to avoid overworking the data when performing data mining or other types of unsupervised learning.

We always need to do two things. First, divide the data set (randomly, of course) into a training set and a test set. Use the training set to identify candidate associations, and the test set to perform statistical tests on these associations. Second, adjust the individual significance thresholds (i.e., the cutoffs applied to each p-value) to account for multiple comparisons. We decide to control either the family-wise error rate (FWER), the probability of making at least one false discovery, or the false discovery rate (FDR), the long-term proportion of discoveries that will be false.
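
The two steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 50/50 split, and the ten p-values are hypothetical, and the Bonferroni and Benjamini-Hochberg procedures are simply common ways to control the FWER and the FDR, respectively, not methods prescribed here.

```python
import random


def train_test_split(rows, test_fraction=0.5, seed=0):
    """Randomly divide rows into a training set and a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def bonferroni(p_values, alpha=0.05):
    """Control the FWER: reject only p-values at or below alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Control the FDR with the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value is at or below (k / m) * alpha.
    largest_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            largest_k = rank
    # Reject the hypotheses with the largest_k smallest p-values.
    reject = [False] * m
    for idx in order[:largest_k]:
        reject[idx] = True
    return reject


# Hypothetical records: candidate associations are found on the training
# half, then tested on the held-out half.
rows = list(range(100))
train, test = train_test_split(rows)

# Hypothetical p-values from testing ten candidate associations on the test set.
p_values = [0.001, 0.008, 0.012, 0.019, 0.041, 0.049, 0.120, 0.350, 0.600, 0.900]

fwer_reject = bonferroni(p_values)
fdr_reject = benjamini_hochberg(p_values)
print("Bonferroni discoveries:", sum(fwer_reject))          # 1
print("Benjamini-Hochberg discoveries:", sum(fdr_reject))   # 4
```

With these made-up numbers, the FWER-controlling rule keeps one association and the FDR-controlling rule keeps four, which shows the trade-off: guarding against even a single false discovery costs more true discoveries than tolerating a small proportion of false ones.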
There are lots of good internet resources that discuss multiple comparisons and how to control the FWER or FDR. Be sure you understand these ideas so you can avoid “data dredging.”

“Most problems have either many answers or no answer. Only a few problems have one answer.” ~ Edmund Berkeley, computer scientist

Modern computing enables us to find precise answers to imprecise problems. Sometimes this precision leads us to forget just how approximate our models are. While algorithms tend to yield a single optimal solution, the difference between the identified solution and the next-best one is probably insignificant in practice. There are usually many moving parts, and a slight change to one of them is not going to drastically alter the result.

The best data science tools enable decision makers to explore the decision space. This gives decision makers both a sense of ownership and an understanding of how different sources of variability affect the value of their decisions. It may also change their opinion on how they should weigh competing objectives. We as consumers appreciate the ability to consider all our options when shopping online, even for modestly priced items. For sizeable investments, it stands to reason that consumers of our data science tools would appreciate the same luxuries.

“Honest criticism is hard to take, particularly from a relative, a friend, an acquaintance or a stranger.” ~ Franklin P. Jones, author

This quip gets at how difficult it is to graciously accept criticism, regardless of your relationship to the critic. Often, the people who deliver constructive criticism effectively do so by never appearing to criticize. They coach. They suggest. They camouflage their criticism as helpful hints. Unfortunately, there don’t seem to be enough of these sorts of critics to go around.
Once people catch on to these critics’ tricks, polite as they are, they may still have a hard time accepting the “suggestions.”

We think this quote could be a great tag line for data science tools. Yes, honest criticism is hard to take from other humans, but what about from computers and other electronic devices? Think about the last time you watched someone use a data science tool to manage their business. You may have noticed that it turned the back-and-forth of receiving and responding to criticism into a sort of game. Good data science tools do this. They don’t criticize. They use facts to tell a story and then provide suggestions to improve the business. Good data science tools, like games, provide a safe space where users learn the rules of the game and master the techniques necessary to advance to the next level.

As the value of data science becomes increasingly clear, it’s important that we keep the big picture in mind. Business leaders are motivated to use data science tools because they tell compelling stories. They extract meaning from data in ways that are both creative and actionable. In that regard, data science tools are almost, but not quite, entirely unlike spreadsheets. In case you’re wondering, that was a reference to another Douglas Adams quote. We hope the above quotes and subsequent comments helped you reflect on the application of data science.

Rob Sickorez leads data science projects for Cyber Group when he’s not working on his dissertation in Statistical Science. He has a Bachelor’s Degree in Mathematical Sciences and Operations Research from the Air Force Academy, and a Master’s Degree in Operations Research from the Naval Postgraduate School. Prior to returning for a Ph.D. in Statistical Science, he spent 10 years in analytical and project management roles in industries including the military, financial services and retail.
He likes to think of himself as a data therapist as he enjoys working with non-technical business leaders to help them overcome their data anxieties. Email: firstname.lastname@example.org
Published by Enterprise Systems Media.
This page can be found at http://ourdigitalmags.com/article/Top+Seven+Data+Science+Quotes+For+2017/2761231/399971/article.html.