Data Wrangling with Python: Simplify your ETL processes with these hands-on data sanitation tips, tricks and best practices

Data is the new oil, but like oil, it comes crude. To do anything meaningful with it - modeling, visualization, or machine learning for predictive analysis - you first need to wrangle it. This book teaches the essential basics of data wrangling using Python.

Key Features
Focuses on the essential basics of wrangling to get you up and running with analysis in no time
Teaches the tricks and know-how for solving data wrangling problems
Covers bonus topics: random data generation and data integrity checks

Book Description
To practice high-quality science with data, you first need to make sure it is properly sourced, cleaned, formatted, and pre-processed. This book teaches you the essential basics of this invaluable component of the data science pipeline: data wrangling.

What you will learn
Manipulate simple and complex data structures using Python and its built-in functions
Use fundamental and advanced features of Pandas DataFrames and numpy.array, and manipulate them at run time
Extract and format data from various textual formats - plain text files, SQL, CSV, Excel, JSON, and XML
Perform web scraping using Python libraries such as BeautifulSoup4 and html5lib
Perform advanced string search and manipulation using Python and RegEx
Handle outliers, apply advanced programming tricks, and perform data imputation using Pandas
Apply basic descriptive statistics and plotting techniques in Python for quick examination of data
Practice data wrangling and modeling using random data generation techniques

Who This Book Is For
Software professionals, web developers, database engineers, and business analysts who want to move towards a career as a full-fledged data scientist or analytics expert, and anyone who wants to use data analytics or machine learning to enrich their current personal or professional projects.
Prior experience with Python is not an absolute requirement; however, knowledge of at least one object-oriented programming language (e.g. C/C++/Java/JavaScript) and high school level math is highly preferred. It is a bonus if you have a rudimentary idea of relational databases and SQL.
Even seasoned Python app/web developers can benefit from this book, as it focuses on data engineering aspects.

About the Author
Dr. Tirthajyoti Sarkar works as a Sr. Principal Engineer in the semiconductor technology domain, where he applies cutting-edge data science/machine learning techniques for design automation and predictive analytics. He writes regularly about Python programming and data science topics. He holds a Ph.D. from the University of Illinois and certifications in Artificial Intelligence and Machine Learning from Stanford and MIT.
Shubhadeep Roychowdhury works as a Sr. Software Engineer at a Paris-based cybersecurity startup, where he applies state-of-the-art computer vision and data engineering algorithms and tools to develop cutting-edge products. He often writes about algorithm implementation in Python and similar topics. He holds a Master's degree in Computer Science from West Bengal University of Technology and certifications in Machine Learning from Stanford. He lives in Paris with his wife and kid.

Author: Tirthajyoti Sarkar
