Approaching the practice of data science by scripting your own data pipeline and dashboards

Key Features
David teaches how to build a new data pipeline using Pixiedust
How to get the most out of Jupyter notebooks
Think about the data and their visualisations before worrying about the algorithms

Book Description
Data science has become the one scientific endeavor every business has to contend with today. We need to learn why data algorithms work, but even more importantly, we need to be able to create new insights from our data that we can actually work with. The why is addressed in many publications today, but it is not easy to create insights without the data scientist looking like a mountebank who produces opaque notebook code before getting to the visually compelling bits of data science. The data science process itself has to be transparent, easy to understand, and straightforward to optimise.

David Taieb created Pixiedust in Python to teach non-data scientists to use Jupyter notebooks without having to slog through the considerable amount of Jupyter code required to create simple, and sometimes not-so-simple, insights into data. It is possible to use Pixiedust by writing just a few lines of HTML and CSS, while retaining the ability to drop in or remove algorithms and visualisation options, adjust the data pipeline to the requirements posed by the data, or just get some very quick results.
The case studies represent a carefully graded ladder of progress, ranging all the way from data mined from social media to geo-analytical data helpful in business decision making.

It is also possible to use both Python and Scala to add features to the Pixiedust data pipeline and, ultimately, to bring the power of the Spark big data framework to the data scientist.

What you will learn
Write basic Pixiedust dashboards
Build your own data pipelines without writing connecting pipeline code
Use Jupyter notebooks without the pain
Create compelling data visualisations in Pixiedust
Write applications running on Spark, without writing Spark code

Who This Book Is For
To produce a functioning Pixiedust dashboard, only a modicum of HTML and CSS is required. Fluency in data interpretation and visualisation is also necessary, since this book is addressed to data professionals, e.g. business and general data analysts. The later chapters also have much to offer to the budding data scientist, and to developers on a path to becoming data scientists, since they get to play with Python code running in Jupyter notebooks.

About the Author
David Taieb has been the lead architect for the Watson Core UI & Tooling team based in Littleton, Massachusetts for the last four years. During that time, he led the design and development of a Unified Tooling Platform to support all the Watson Tools, including accuracy analysis, test experiments, corpus ingestion, and training data generation. Before that, he was the lead architect for the Domino Server OSGi team responsible for integrating the eXpeditor J2EE Web Container in Domino and building first-class APIs for the developer community. He started with IBM in 1996, working on various globalization technologies and products, including Domino Global Workbench (used to develop multilingual Notes/Domino NSF applications) and a multilingual Content Management system for the WebSphere Application Server.
David enjoys sharing his experience by speaking at conferences. You’ll find him at various events like the Unicode conference, Eclipsecon, and Lotusphere. He’s also passionate about building tools that help improve developer productivity and overall experience.
Author: David Taieb