Enterprise Executive 2017: Issue 2 : Page 8

“The most misleading assumptions are the ones you don’t even know you’re making.” ~ Douglas Adams, author

Douglas Adams wrote about a visit he made to a gorilla sanctuary in Zaire. He began to imagine how the silverback gorilla he was staring at would perceive him, and then quickly abandoned that idea. He realized that any meaning he attached to the gorilla’s actions would be based on assumptions derived from his experience as a human, which was sure to be quite foreign to the gorilla. The connection to the world of data science is clear: Data scientists should seek out subject matter experts early and often, clearly communicate their modeling assumptions, and give the experts a chance to challenge those assumptions. Data scientists are creative and resourceful, but they are bound to see the world differently from the business experts. Let the experts take their shots and embarrass you early on, and then wow them with your ability to extract meaningful insights.

“If you can’t explain it simply, you don’t understand it well enough.” ~ Albert Einstein, physicist

Einstein offers many insights for the data scientist. We like, “Not everything that can be counted counts, and not everything that counts can be counted,” and also, “Everything should be made as simple as possible, but not simpler.” However, we chose this one because it speaks to an essential skill that data scientists bring to organizations: the ability to explain complex relationships in a clear and compelling way to motivate action. Good data scientists should feel comfortable in their role as data-to-business (and vice versa) translators. Just as we struggle to locate the signal amidst the noise in our data sets, we must put in the work to distill our mathematical ideas so that business leaders can use them. It is our ability to clearly communicate ideas that helps decision makers progress toward an ongoing conversation with data.

“Models should be used, not believed.” ~ Henri Theil, econometrician

Every model is an abstraction of reality. Through abstraction, models lose their intrinsic connection to the real world. Remember, models only get to learn from the data we provide. Models are unaware of the undocumented subtleties of the real world. In the simple linear regression case, we know that predictions near the center of our data are reliable, while predictions outside the observed data often are not. In general, models are not very good at predicting responses to unusual variation. We need to be aware of this. By all means, use a model. The decisions you make, even by considering an inadequate model, will be better than those made with no model at all. However, always keep your understanding of the real problem and the limits of your model in the forefront. Use any model as you would your cell phone’s GPS navigational services: directions are in beta, and common sense must apply.

“If you torture the data long enough, it will confess.” ~ Ronald Coase, economist

If our goal is to use data to make better decisions, there is clearly nothing to gain from manipulating data to reflect anything but the truth. So, let’s avoid misrepresented statistics that one might associate with lies and damned lies. Still, Coase’s quote reminds us to avoid overworking the data when performing data mining or other types of unsupervised learning. We always need to do two things. First, divide the data set (randomly, of course) into a training set and a test set. Use the training set to identify candidate associations, and the test set to perform statistical tests on these associations. Second, adjust the individual significance thresholds (i.e., the cutoffs applied to the p-values) to account for multiple comparisons. We either decide on a family-wise error rate (FWER) ...
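The two steps described under Coase's quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' procedure: it assumes Pearson correlation as the measure of association, a Fisher z approximation for the p-values, and a Bonferroni correction (dividing the chosen FWER by the number of tests) as one common way to control the family-wise error rate. The function names and the screening cutoff `screen_r` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def pearson_r(xs, ys):
    """Sample Pearson correlation; returns 0.0 if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(xs, ys):
    """Two-sided p-value for 'no correlation' (Fisher z approximation, needs n > 3)."""
    r = max(min(pearson_r(xs, ys), 0.999999), -0.999999)  # keep atanh finite
    z = math.atanh(r) * math.sqrt(len(xs) - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def split_rows(rows, test_fraction=0.5, seed=0):
    """Randomly divide the rows into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]


def confirm_associations(rows, target, features, screen_r=0.2, fwer=0.05, seed=0):
    """Screen on the training set, then test on the test set with Bonferroni.

    rows is a list of dicts mapping column names to numbers.
    Returns (candidates, confirmed, per_test_threshold).
    """
    train, test = split_rows(rows, seed=seed)

    def column(data, name):
        return [row[name] for row in data]

    # Step 1: use only the training set to pick candidate associations.
    candidates = [
        f for f in features
        if abs(pearson_r(column(train, f), column(train, target))) >= screen_r
    ]
    if not candidates:
        return [], [], None

    # Step 2: test each candidate on the held-out test set, comparing its
    # p-value against a threshold adjusted for the number of tests performed.
    threshold = fwer / len(candidates)
    confirmed = [
        f for f in candidates
        if correlation_p_value(column(test, f), column(test, target)) < threshold
    ]
    return candidates, confirmed, threshold
```

Calling `confirm_associations(rows, "y", feature_names)` on a list of row dictionaries returns the candidates found in the training half, the subset that survives the corrected test on the held-out half, and the per-test threshold that was applied. Because the candidates are chosen and tested on different rows, an association that appeared only by chance in the training data gets no second chance to "confess."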
