
Practical Business Python

Taking care of business, one python script at a time

Tue 29 May 2018

Book Review: Machine Learning with Python Cookbook

Posted by Chris Moffitt in articles   


Introduction

This article is a review of Chris Albon’s book, Machine Learning with Python Cookbook. This book is in the tradition of O’Reilly’s other “cookbook” titles in that it contains short “recipes” for dealing with common machine learning scenarios in python. It covers the full spectrum of tasks from simple data wrangling and pre-processing to more complex machine learning model development and deep learning implementations. Since this is such a fast-moving and broad topic, it is nice to get a new book that covers the latest topics and presents them in a compact but very useful format. Bottom line, I enjoyed reading this book and think it will be a useful resource to have on my python bookshelf. Read on for some more details about the book and who will benefit most from reading it.

Where does this book fit?

As data science, machine learning and AI have become more and more popular, there has been a proliferation of books that try to cover these topics in differing manners. Some books go very deep into the math and theory behind the various machine learning algorithms. Others try to cover a lot of content but do not provide a quick reference resource with code examples for solving real world problems. Machine Learning with Python Cookbook fills this code-heavy niche with lots of examples. There are very few paragraphs with math equations or details behind the implementation of machine learning algorithms. Instead, Chris Albon breaks the topics down into bite-sized chunks that each solve a very specific problem. Each of the nearly 200 recipes follows a similar format:

  • Problem definition
  • Solution
  • Discussion (optional)
  • Additional resources (optional)

In most cases, the problem definition is as simple as “You want to multiply two matrices” or “You need to visualize a model created by a decision tree learning algorithm.” This organization makes it easy to scan the table of contents and find the relevant section.

Each solution is fully self-contained and can be copied and pasted into a standalone script or jupyter notebook and executed. In addition, each code sample includes all the necessary imports as well as sample data sets (e.g. Iris, Titanic, MNIST). The samples are all around 12-20 lines of code with comments included so they are easy to dissect and understand.

In some cases, there is further discussion about the approach as well as hints and tips related to the solutions. In many cases, topics like performance for larger and more complex data sets are discussed and options are presented for managing those situations.

Finally, the author also includes links to more details that might be useful when you need to dive into the problem in more depth.

Chapter Overview

The book only has 340 pages of content but it is broken down into 21 chapters. In my opinion, this is a good structure because each chapter provides a concise introduction of a topic and specific code examples that solve common problems.

The chapters start with basic numpy functions, then move to more complex pandas and scikit-learn functions and close out with some keras examples. Here’s a list of each chapter along with its primary focus:

  1. Vectors, Matrices and Arrays [numpy]
  2. Loading Data [scikit-learn, pandas]
  3. Data Wrangling [pandas]
  4. Handling Numerical Data [pandas, scikit-learn]
  5. Handling Categorical Data [pandas, scikit-learn]
  6. Handling Text [NLTK, scikit-learn]
  7. Handling Dates and Times [pandas]
  8. Handling Images [OpenCV, matplotlib]
  9. Dimensionality Reduction Using Feature Extraction [scikit-learn]
  10. Dimensionality Reduction Using Feature Selection [scikit-learn]
  11. Model Evaluation [scikit-learn]
  12. Model Selection [scikit-learn]
  13. Linear Regression [scikit-learn]
  14. Trees and Forests [scikit-learn]
  15. K-Nearest Neighbors [scikit-learn]
  16. Logistic Regression [scikit-learn]
  17. Support Vector Machines [scikit-learn]
  18. Naive Bayes [scikit-learn]
  19. Clustering [scikit-learn]
  20. Neural Networks [keras]
  21. Saving and Loading Trained Models [scikit-learn, keras]

To illustrate how the chapters work, let’s look at chapter 15, which covers K-Nearest Neighbors (KNN). In this case, the introductory recipe (15.0) gives a concise summary of KNN and why it is a popular tool.

Now that we remember what KNN is used for, we’re likely going to want to apply it to our data. First, we will want “to find an observation’s k nearest observations (neighbors).” Recipe 15.1 contains specific code as well as some more detail around the various algorithm parameters we can tweak such as the distance metrics (Euclidean, Manhattan or Minkowski).
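To give a feel for what a recipe of this kind looks like, here is a minimal sketch of a nearest-neighbor lookup with scikit-learn. It is my own illustration rather than the code from the book; the choice of the iris data, the two neighbors and the made-up new observation are all assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler

# Load the iris measurements (150 observations, 4 features)
features = load_iris().data

# Standardize so that no single feature dominates the distance calculation
scaler = StandardScaler()
features_std = scaler.fit_transform(features)

# Index the data; metric could also be "manhattan" or "minkowski"
nn = NearestNeighbors(n_neighbors=2, metric="euclidean").fit(features_std)

# A made-up observation (assumption), scaled the same way as the indexed data
new_observation = scaler.transform([[5.0, 3.4, 1.5, 0.2]])

# Distances to, and row indices of, the two closest observations
distances, indices = nn.kneighbors(new_observation)
print(indices)
print(distances)
```

Swapping the metric argument is all it takes to compare the distance measures mentioned above.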

Next, recipe 15.2 shows how to take some unknown data and predict its class based on neighbors. This recipe uses the iris data set but also includes important caveats about scaling data when using KNN.
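A classification along these lines might look like the sketch below. Again, this is my own example and not the book's listing; the value of k and the two unknown observations are assumptions chosen for illustration. Note that the unknown data is scaled with the same scaler that was fit on the training data, which is the caveat about scaling in practice.

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

# Load features and class labels (0, 1, 2 for the three iris species)
iris = load_iris()
X, y = iris.data, iris.target

# KNN is distance based, so put every feature on the same scale first
scaler = StandardScaler()
X_std = scaler.fit_transform(X)

# Train a classifier that votes among the 5 nearest neighbors (k=5 is an assumption)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_std, y)

# Two made-up observations (assumption), scaled with the already-fit scaler
unknown = scaler.transform([[5.0, 3.4, 1.5, 0.2],
                            [6.7, 3.0, 5.2, 2.3]])

# Predicted class for each observation and the share of neighbor votes per class
predicted = knn.predict(unknown)
probabilities = knn.predict_proba(unknown)
print(predicted)
print(probabilities)
```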

Recipe 15.3 then moves on to cover a common challenge with KNN, specifically how do you select the best value for k? This recipe uses scikit-learn’s Pipeline class and GridSearchCV to conduct a cross-validation of KNN classifiers with different values of k. The code is simple to comprehend and easy to extend to your own data sources.
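The general pattern can be sketched as follows. This is my own minimal version, not the recipe from the book; the step names, the list of candidate k values and the 5-fold setting are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Chain scaling and classification so the scaler is re-fit inside each CV fold
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("knn", KNeighborsClassifier()),
])

# Candidate values of k to compare (an arbitrary list chosen for illustration)
candidate_k = [1, 3, 5, 7, 9, 11]

# "<step name>__<parameter>" addresses a parameter of a pipeline step
search = GridSearchCV(
    pipeline,
    param_grid={"knn__n_neighbors": candidate_k},
    cv=5,
)
search.fit(X, y)

# The k with the highest mean cross-validated accuracy, and that accuracy
best_k = search.best_params_["knn__n_neighbors"]
print(best_k, search.best_score_)
```

Putting the scaler inside the pipeline matters because it keeps the held-out fold from leaking into the scaling step during cross-validation.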

The point is that each chapter can be consumed at the individual recipe level or read more broadly to understand the concept in more detail. I really like this approach because so many topics are covered at a quick pace. If I feel the need to dive into the mathematical rationale for an approach, I can use these recipes as a jumping-off point for further review.

Conclusion

Overall, the Machine Learning with Python Cookbook is an extremely useful book which is aptly described in the tag line as “Practical Solutions From Preprocessing to Deep Learning.” Chris has done a fabulous job of collecting a lot of the most common machine learning problems and summarizing solutions. I definitely encourage those of you using any of the libraries mentioned here to pick up this book. I have added this book to my recommended resources page so please check it out and see if any of the other recommendations might be useful. Also, let me know if you find this review useful.
