Keras Tutorial for Image Classification: A Convolutional Neural Network and its Interpretation

Convolutional neural networks (CNNs) have been successfully applied in many areas of computer vision and natural language processing (NLP). Details of CNNs can be found in Ref. [1], but for the present discussion it suffices to state that a CNN is more efficient than a dense neural network [1] and learns local spatial patterns instead of global patterns [2]. CNNs are explainable because they retain the spatial properties of images/sentences and learn local patterns [2]. This is a significant advantage, since deep learning models are often referred to as “black boxes” and are sometimes avoided when explainability is a key issue.

Many techniques have been proposed for visualizing and interpreting CNNs, among which the most useful ones are [2]:

  1. Visualization of the outputs of intermediate convolutional layers
  2. Visualization of the filters of convolutional layers [2,3]
  3. Visualization of class-activation maps (CAM)

In this IPython notebook, I have discussed the implementation of a CNN in Keras to classify the images of the CIFAR-10 dataset. I have also briefly discussed Grad-CAM, a specific form of CAM, and used it to “explain” the decisions made by my CNN model. A sample image and the interpretation of the CNN using Grad-CAM are shown in Fig. 1. Grad-CAM can also be used in NLP to interpret a CNN model; for instance, it can identify the words used to decide the sentiment of a text.

[Image: cifar_gradCAM]

Figure 1: Image of a car (left), the corresponding gradient-weighted Class Activation Map (centre) and the superimposition of the image and the Grad-CAM (right). The red ovals in the centre image indicate that the top and back regions of the car are used by the CNN to classify the image as an “automobile”. However, the road, encircled by the blue oval, is also incorrectly used by the CNN for classification.
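
To make the pipeline concrete, below is a minimal sketch of a small CIFAR-10 CNN in Keras followed by a Grad-CAM computation with the tf.keras API. It is illustrative rather than the notebook’s exact code: the layer sizes, epoch count and the layer name “last_conv” are my assumptions.

import tensorflow as tf
from tensorflow.keras import layers, models

# Load and scale CIFAR-10
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small CNN; layer sizes and epochs are illustrative
model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D(),
    layers.Conv2D(64, (3, 3), activation='relu', name='last_conv'),
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation='softmax'),
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=10, validation_split=0.1)

# Grad-CAM: weight the last conv layer's feature maps by the gradient of the
# predicted class score, sum over channels, then apply ReLU and normalize
grad_model = tf.keras.Model(model.inputs,
                            [model.get_layer('last_conv').output, model.output])
img = x_test[:1].astype('float32')
with tf.GradientTape() as tape:
    conv_out, preds = grad_model(img)
    class_score = tf.reduce_max(preds, axis=-1)   # score of the predicted class
grads = tape.gradient(class_score, conv_out)
weights = tf.reduce_mean(grads, axis=(1, 2))      # global-average-pool the gradients
cam = tf.nn.relu(tf.reduce_sum(conv_out * weights[:, None, None, :], axis=-1))
cam = cam[0] / (tf.reduce_max(cam[0]) + 1e-8)     # heat map in [0, 1]

Upsampling the normalized cam to 32×32 and overlaying it on the input image gives a heat map like the one in Fig. 1.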

References

  1. Stanford CS231n course
  2. F. Chollet, “Deep Learning with Python”, 2017
  3. Convnet filter visualization

Sentiment Analysis of Cryptocurrencies

There is considerable hype around investing in cryptocurrencies, and investors ranging from students to hedge-fund managers are keen on making profits by riding the wave. Social media is a powerful tool capable of influencing elections, government policies and stock as well as cryptocurrency markets. In this era of social media, a tweet by John McAfee can drive up the prices of cryptocurrencies, and a tweet by Kylie Jenner can wipe $1.3 billion off Snapchat’s market value. Hence, it is critical to assess the sentiment of the public and of some famous personalities when making investment decisions, especially in the highly volatile cryptocurrency markets.

In these IPython notebooks, I have described the process of performing sentiment analysis, from labelling the data to training neural networks to applying those networks to track public sentiment towards a given cryptocurrency (e.g. Bitcoin). In this project, I chose Coindesk over Twitter since Coindesk articles are usually written by experts, whereas anyone has access to Twitter and can tweet rubbish that would not have much effect on the prices of cryptocurrencies.

[Image: SentimentAnalysis_Cryptocurrency]

The first step after scraping the data is to label it for supervised training of a neural network. It is tedious to read thousands of articles and decide whether each has positive, negative or neutral sentiment. In the first IPython notebook, I describe an automated labelling process based on the idea in Ref. [1]; a sketch of this style of labelling is shown below. The labelled data, containing around 2,500 news articles, is used to train a Convolutional Neural Network (CNN) in the second notebook. I also tried a Long Short-Term Memory (LSTM) network, but it took longer to train without any improvement in accuracy. In the third notebook, I apply the CNN model to track public sentiment over time towards a given cryptocurrency (aspect-based sentiment analysis).
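
For illustration, here is a minimal sketch of lexicon-based auto-labelling using NLTK’s VADER analyzer. This is a stand-in for demonstration; the notebook’s actual labelling follows the approach in Ref. [1], which may differ.

import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer

nltk.download('vader_lexicon')  # one-time download of the lexicon
sia = SentimentIntensityAnalyzer()

def label_article(text, threshold=0.05):
    """Map VADER's compound score to positive/negative/neutral labels."""
    score = sia.polarity_scores(text)['compound']
    if score > threshold:
        return 'positive'
    elif score < -threshold:
        return 'negative'
    return 'neutral'

print(label_article("Bitcoin surges to a new all-time high"))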

The accuracy of the CNN model is around 65% on the test data set, so there is ample room for improvement. The accuracy could be improved by using more data or by optimizing the neural network architecture (Ref. [2]). Future work could include studying the correlation between sentiment scores and cryptocurrency prices, identifying authors with biased views so that their articles can be given less weight, training the CNN on numerous news sources, etc.

Source code is available in this folder on my GitHub repository.

References

  1. Francesco Pochetti’s blog
  2. Konukoii blog

 

Data Analysis and Machine Learning to Improve Sales Effectiveness

Natural language processing (NLP) has various applications [1], and people are still discovering new ways to apply NLP to improve their business [2] or to gain an edge over their competitors. Text classification is a subset of NLP and belongs to the category of supervised machine learning, where a given text is analyzed to predict its predefined “class”. For instance, the texts “I am sad” and “It’s a sunny day!” would have predefined labels of negative and positive sentiment, respectively, and a machine learning algorithm should be able to predict the classes/labels of such texts. Common applications of deep learning (a subset of machine learning) in text classification include spam filtering on Gmail, news article classification on Google News, and sentiment analysis of tweets and movie reviews. Text classification can also be used to shortlist the resumes of candidates, to improve sales effectiveness (e.g. contact only promising customers, identified by an algorithm, instead of every person on the call list), for customer relationship management (e.g. sentiment analysis of customer emails to assign priority), to match freelancers and employers based on job descriptions on freelancing websites, and the list goes on.

In this IPython notebook, I have described an approach to improve the effectiveness of the sales team of an event production company using machine learning. A convolutional neural network (CNN) model is built in Keras to predict whether a person will attend an event based on that person’s job title. The sales team could give higher priority to people likely to attend an event and contact them first, thereby increasing their effectiveness.
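
Below is a hedged sketch of this style of model. The example titles, labels and hyperparameters are hypothetical, and the notebook’s actual architecture may differ.

import numpy as np
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras import layers, models

# Hypothetical toy data: job titles and whether the person attended an event
titles = ["software engineer", "event manager", "sales director", "phd student"]
attended = np.array([0, 1, 1, 0])

# Turn each title into a padded sequence of word indices
tokenizer = Tokenizer(num_words=5000)
tokenizer.fit_on_texts(titles)
X = pad_sequences(tokenizer.texts_to_sequences(titles), maxlen=10)

model = models.Sequential([
    layers.Embedding(5000, 32, input_length=10),   # word index -> dense vector
    layers.Conv1D(64, 3, activation='relu'),       # local n-gram patterns
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation='sigmoid'),         # attend vs. not attend
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(X, attended, epochs=5)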

The approach described in the IPython notebook can be applied to other fields/businesses/companies that involve text classification. Feel free to download the notebook and play with it.

References

  1. https://machinelearningmastery.com/applications-of-deep-learning-for-natural-language-processing/
  2. https://medium.com/xeneta/boosting-sales-with-machine-learning-fbcf2e618be3

Keras Tutorial for Beginners: A Simple Neural Network to Identify Numbers (MNIST Data)

The “dense” or “fully-connected” neural network (NN) is the simplest form of neural net, where a neuron in a given layer is connected to all the neurons in the previous and next layers, as shown in the diagram below.

[Image: mnist_2layers]

A Dense Neural Network. Image credits: ml4a.

A dense NN can only take one-dimensional (1D) input, so 2D inputs like images have to be “flattened”, as shown in the diagram, before being fed to the network. The neural net can then be trained by “showing” it an image and “telling” it the number displayed in the image. This training process involves forward propagation followed by loss computation and backward propagation to update the weights of the neural net. Please refer to A. Karpathy’s blog or Andrew Ng’s course for more details. The IPython notebook shared on my GitHub repository shows that implementing a dense neural net in Keras requires less than 10 lines of code (step 2 onwards) and achieves an accuracy of 97% (higher accuracy can be achieved by increasing “epochs” in step 5).
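
For reference, here is a minimal sketch of such a network using the tf.keras API. The layer sizes and epoch count are illustrative assumptions, and the notebook’s exact code may differ.

import tensorflow as tf

# Load and scale the MNIST digits
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),   # 2D image -> 1D vector
    tf.keras.layers.Dense(128, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'), # one output per digit
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=5)
print(model.evaluate(x_test, y_test))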

P.S.

  1. Here is the list of loss functions available in Keras. In general, “binary_crossentropy”, “categorical_crossentropy” and “mean_squared_error” are used for binary classification, multi-class classification and regression problems, respectively.
  2. Here is the list of activation functions. Usually, “relu” works well for hidden neurons. The “sigmoid”, “softmax” and “linear” functions are used for output neurons in binary classification, multi-class classification and regression problems, respectively.
  3. Here is the list of optimizers. “Adam” is found to work well in most cases.
  4. “accuracy” is the common metric for classification and “mean_squared_error” for regression.
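
The sketch below illustrates these typical pairings of output activation, loss and metric. The helper function and layer sizes are my own illustrative choices, not a Keras convention.

from tensorflow.keras import layers, models

def build_model(task, input_dim):
    """Illustrative activation/loss/metric pairings for the three problem types."""
    model = models.Sequential([
        layers.Dense(16, activation='relu', input_shape=(input_dim,)),
    ])
    if task == 'binary':
        model.add(layers.Dense(1, activation='sigmoid'))
        model.compile(optimizer='adam', loss='binary_crossentropy',
                      metrics=['accuracy'])
    elif task == 'multiclass':
        model.add(layers.Dense(10, activation='softmax'))
        model.compile(optimizer='adam', loss='categorical_crossentropy',
                      metrics=['accuracy'])
    else:  # regression
        model.add(layers.Dense(1, activation='linear'))
        model.compile(optimizer='adam', loss='mean_squared_error',
                      metrics=['mean_squared_error'])
    return model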

Top 14 Python Libraries for Machine Learning and Deep Learning

In my previous article, I suggested some simple guidelines for beginners to start their deep learning journey. In this article, I will provide a short list of Python libraries widely used in deep learning. You will be able to train deep neural nets and work on your pet projects after installing these libraries.

There are three main steps in creating a neural net model: data collection, data manipulation and training of the neural net, and visualization.

  1. Data collection: Most of the deep neural nets in application today fall under the category of supervised learning. For example, we need to show numerous images of cats to a neural net and “tell” it that they are cats, or we need to feed it numerous positive-sentiment sentences (e.g. “iPhone X is amazing!”) and tell it that these statements have positive sentiment. In order to train neural nets, you guessed it right, we need lots of data! If you’re lucky, you can get cleaned data from the sources I mentioned in my previous article. Otherwise, you will have to scrape the data from different websites and clean it yourself. The following libraries are helpful for scraping data from websites:
    1. Requests: This should be the first choice for accessing the content of a webpage, since it is faster than browser-automation tools like Selenium.
    2. Selenium: Use Selenium if Requests can’t do the job for you. It can automate manual tasks like scrolling down a page or clicking buttons.
    3. Beautiful Soup 4: Used to extract data from the webpage obtained via Requests or Selenium (see the sketch after this list).
    4. Scrapy: Most of the data scraping can be done by the above three libraries. Use Scrapy only if you need to perform advanced data scraping.
  2. Data manipulation and training of neural network: The collected data is usually cleaned and manipulated before feeding it to the neural networks. The following libraries are used for data manipulation and training of neural networks:
    1. Pandas: Good for data cleaning and manipulation. You can load data from various sources having different formats (txt, excel, json etc.) into different Pandas dataframes. You can then merge these dataframes, remove duplicate entries, handle missing values, visualize data etc.
    2. Numpy: Used to handle arrays and matrices and to perform mathematical operations on them.
    3. Scipy: Used for advanced mathematical operations like integration.
    4. Scikit-learn: Builds on top of Numpy and Scipy to provide machine learning algorithms like regression, classification, clustering etc.
    5. TensorFlow: Open-source library developed by Google to train deep neural networks.
    6. Keras: Intuitive interface to build and train deep neural networks using the TensorFlow backend.
    7. h5py: Used to save Keras models.
  3. Visualization: The final step is to present the results using nice graphs:
    1. Matplotlib: It is the most widely used library for plotting graphs and visualizing data.
    2. Seaborn: Builds on top of Matplotlib and provides advanced visualizations.
    3. Bokeh: Provides interactive visualizations.
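
To see how these pieces fit together, here is a hedged end-to-end sketch combining Requests, Beautiful Soup, Pandas and Matplotlib. The URL and the “h2.headline” selector are hypothetical placeholders, not a real site’s structure.

import requests
from bs4 import BeautifulSoup
import pandas as pd
import matplotlib.pyplot as plt

resp = requests.get("https://example.com/news")   # fetch the page
soup = BeautifulSoup(resp.text, "html.parser")    # parse the HTML

# Extract hypothetical headline elements into a DataFrame
headlines = [h.get_text(strip=True) for h in soup.select("h2.headline")]
df = pd.DataFrame({"headline": headlines})
df["length"] = df["headline"].str.len()

# Visualize a simple statistic of the scraped data
df["length"].plot(kind="hist", title="Headline lengths")
plt.show()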

It is really simple to install Python libraries using pip. In Python 3, it’s just:

pip3 install package-name

You can install the above-mentioned libraries in Python 3 as follows:

pip3 install requests
pip3 install selenium
pip3 install beautifulsoup4
pip3 install Scrapy

pip3 install pandas
pip3 install numpy
pip3 install scipy
pip3 install scikit-learn
pip3 install tensorflow
pip3 install keras
pip3 install h5py

pip3 install matplotlib
pip3 install seaborn
pip3 install bokeh

The libraries listed in this article are just enough to get you started with machine learning and deep learning. You might need libraries specific to a particular task in advanced projects (e.g. OpenCV for computer vision and NLTK/Gensim for NLP). You can get detailed information about data science libraries at [1, 2] and about scraping libraries here.

A Short Beginner’s Guide to Deep Learning

Deep learning (DL) caught my attention at the beginning of 2017, and I wanted to apply it in my PhD project. I wasn’t sure whether someone without a computer science or mathematics degree could survive in this field, but I decided to give it a try anyway. Long story short: it’s neither too easy nor too difficult to grasp deep learning.

DL requires decent programming skills and an affinity for math (you don’t have to be an expert), but the hardest part is finding good guidance and online resources if you prefer studying DL without enrolling in a full-time two-year degree. A novice would be overwhelmed by the numerous “beginner’s guides” providing links to hundreds of courses, blogs and tutorials. 60%* of the beginners who cross this stage will manage to do some basic math and Python courses/tutorials for a few weeks, only to conclude that they are unfit for DL. The remaining 10%* move on to advanced math or DL courses, and 90%* of them regret taking a stab at DL. In a nutshell, one gets a stick without a carrot in sight for a really long time – resulting in a loss of confidence.

That brings me to the motivation of this blog: to provide just enough information to get one started in DL and build confidence. This helps a beginner avoid agonizing over decisions (or mistakes) regarding the courses/tutorials, programming language and libraries to use for deep learning.

I prefer learning new things in a non-linear fashion, as opposed to the conventional method described above. I would recommend the following steps for those who want to embark on the deep learning journey°:

  1. Create a Google Alert with the keywords “data science”, “AI”, “artificial intelligence”, “machine learning” and “deep learning”: reading news every day about the advances and success stories of artificial intelligence (AI) is a good motivator.
  2. Install Python, Jupyter and basic libraries like NumPy, SciPy, Matplotlib and sklearn, and do basic machine learning (ML) tutorials: Python is gaining popularity as a programming language for DL and ML [1, 2]; Jupyter makes your life easy while implementing new models in Python; and Python has numerous libraries, like the ones mentioned above, to simplify your task. The Harvard data science course has free online material to practice Python programming, but it’s not necessary to go through all of it and end up overwhelmed and demotivated. You can do that later, in step 7.4. Try lab2 and lab3 to get acquainted with the basic libraries, and lab6 and homework5 to do linear regression and classification, respectively, using sklearn (a minimal example of such an exercise appears after this list). The cheatsheets for python3, numpy, matplotlib and sklearn [3, 4] might be handy.
  3. Install TensorFlow and Keras and do tutorials: I prefer TensorFlow since it is open-source, backed by Google (little fear of development ceasing) and the most popular choice for deep learning. However, it can be difficult for non-programmers to use, and Keras comes to the rescue. Practice a couple of tutorials to perform linear regression and classification using Keras.
  4. Deep learning basics: Congrats! You can now work on deep learning projects. It would take just a week for you to reach this point, instead of a year on the conventional path. Now that you have a big picture of deep learning and hands-on experience, you can go deeper. I recommend Andrew Ng’s deep learning course to get your basics right. The video lectures are really good, but I can understand if you get bored listening to them for a long time and skip them; just make sure you practice the assignments!
  5. Blogs: Following blogs is a good way to get different perspectives. I recommend the blogs of A. Karpathy, C. Olah, DeepMind and OpenAI.
  6. Twitter: Follow people on Twitter to get the latest DL research updates: Andrew Ng, Fei-Fei Li, Andrej Karpathy, Ian Goodfellow, Francois Chollet, Pieter Abbeel and the list goes on…
  7. Going deeper: If you have reached this stage, you have hands-on experience with Keras, know the basics of deep learning and are aware of the latest deep learning research. Just like we build deeper neural nets after trying shallow ones, it’s time to go deeper in our deep learning journey:
    1. Read the textbook Deep Learning with Python by Francois Chollet: it costs around 40 bucks, but it’s worth it. Chollet explains deep learning concepts in simple words.
    2. More Keras tutorials on github: Apply DL to various cases.
    3. DL pet projects: You have enough knowledge by now to decide which subfield you like in DL. Get data and start working on your own pet projects. You can find a list of datasets on KDnuggets, Wikipedia, Kaggle, crowdflower, CV datasets, deeplearning.net etc.
    4. Harvard data science course assignments and slides: You skipped this in step 2. Time to finish studying basic ML techniques.
    5. Still got some motivation? More material for basic ML techniques
    6. If you have some stamina left, try the Stanford courses on computer vision and natural language processing (NLP) and the UC Berkeley course on reinforcement learning.
    7. Still here? Go find that data science job!
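
As promised in step 2, here is a minimal sketch of the kind of sklearn exercise mentioned there. It is a generic illustration, not taken from any specific course.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A classic toy classification problem: iris species from flower measurements
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=200).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))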

P.S.

Deep learning required expert programming skills before 2015, but the introduction of Keras has relaxed this criterion. I believe it’s going to get a lot simpler to comprehend and program deep neural networks to create new products (translators, image tagging, sentiment analysis, recommendation systems etc.).

Most of the deep learning blogs I came across are long and technical, which might not be good for beginners. I will write short blogs focusing on the applications of deep learning in different fields (computer vision, NLP etc.), followed by a short programming tutorial relevant to that field. Stay tuned!

 

*Disclaimer: The percentage values appearing in this blog are fictitious. Any resemblance to the real survey values, if any, is purely coincidental.
°Disclaimer: The guidance given in this article might not work for all.