In my previous article, I suggested some simple guidelines for beginners to start their deep learning journey. In this article, I provide a short list of the most widely used Python libraries in deep learning. After installing these libraries, you will be able to train deep neural nets and work on your pet projects.
There are three main steps in creating a neural net model: data collection, data manipulation and training of the neural network, and visualization.
- Data collection: Most deep neural nets in use today fall under the category of supervised learning. For example, we show a neural net numerous images of cats and “tell” it that they are cats, or we feed it numerous positive-sentiment sentences (e.g. “iPhone X is amazing!”) and tell it that these statements have positive sentiment. To train neural nets, you guessed it, we need lots of data! If you’re lucky, you can get cleaned data from the sources I mentioned in my previous article. Otherwise, you will have to scrape the data from different websites and clean it yourself. The following libraries are helpful for scraping data from websites:
- Requests: This should be your first choice for fetching the content of a webpage, since it is lightweight and fast compared to driving a browser.
- Selenium: Use Selenium when Requests can’t do the job, e.g. when a page renders its content with JavaScript. It drives a real browser and can automate manual tasks like scrolling down a page or clicking buttons.
- Beautiful Soup 4: Used to parse HTML and extract data from the pages fetched with Requests or Selenium.
- Scrapy: Most data scraping can be done with the three libraries above. Reach for Scrapy only when you need advanced, large-scale crawling.
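As a minimal sketch of how Requests and Beautiful Soup 4 fit together, the snippet below fetches a page and extracts headline text. The URL and the `h2` tag are placeholders; adapt both to the site you are actually scraping:

```python
import requests
from bs4 import BeautifulSoup

def scrape_headlines(url):
    """Fetch a page with Requests and extract <h2> text with Beautiful Soup 4."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on HTTP errors
    soup = BeautifulSoup(response.text, "html.parser")
    return [h.get_text(strip=True) for h in soup.find_all("h2")]

# The parsing step works on any HTML string, so you can try it offline:
html = "<html><body><h2>First headline</h2><h2>Second headline</h2></body></html>"
soup = BeautifulSoup(html, "html.parser")
headlines = [h.get_text(strip=True) for h in soup.find_all("h2")]
print(headlines)  # ['First headline', 'Second headline']
```

If a site builds its page with JavaScript, swap the `requests.get` call for Selenium and hand the rendered page source to Beautiful Soup in the same way.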
- Data manipulation and training of the neural network: The collected data usually has to be cleaned and manipulated before it is fed to the neural network. The following libraries are used for data manipulation and training of neural networks:
- Pandas: Good for data cleaning and manipulation. You can load data from various sources in different formats (txt, Excel, JSON, etc.) into Pandas dataframes, then merge these dataframes, remove duplicate entries, handle missing values, visualize data, etc.
- NumPy: Used to handle arrays and matrices and to perform mathematical operations on them.
- SciPy: Used for advanced mathematical operations like integration.
- Scikit-learn: Builds on top of NumPy and SciPy to provide machine learning algorithms like regression, classification, clustering, etc.
- TensorFlow: Open-source library developed by Google to train deep neural networks.
- Keras: Intuitive interface for building and training deep neural networks with a TensorFlow backend.
- h5py: Used to save Keras models to disk in the HDF5 format.
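To make the cleaning-and-training step concrete, here is a small sketch on a made-up housing table: Pandas removes a duplicate row and fills a missing value, NumPy supplies the arrays, and scikit-learn fits a linear regression. The column names and numbers are invented for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Toy data with one duplicate row and one missing value.
df = pd.DataFrame({
    "area":  [1000, 1500, 1500, 2000, np.nan],
    "price": [ 200,  300,  300,  400,  500],
})

df = df.drop_duplicates()                          # remove duplicate entries
df["area"] = df["area"].fillna(df["area"].mean())  # handle missing values

X = df[["area"]].to_numpy()  # NumPy arrays feed straight into scikit-learn
y = df["price"].to_numpy()

model = LinearRegression().fit(X, y)
pred = model.predict(np.array([[1200]]))
print(pred)  # price estimate for a 1200 sq ft home
```

A deep neural net built with Keras would slot in where `LinearRegression` sits here; the Pandas/NumPy cleaning steps stay the same.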
- Visualization: The final step is to present the results using nice graphs:
- Matplotlib: It is the most widely used library for plotting graphs and visualizing data.
- Seaborn: Builds on top of Matplotlib and provides advanced visualizations.
- Bokeh: Provides interactive visualizations.
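A minimal Matplotlib example, plotting a made-up training-loss curve and saving it to a PNG. The numbers are synthetic; with real training you would plot the loss history your framework records:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend, so this also runs on a headless server
import matplotlib.pyplot as plt
import numpy as np

epochs = np.arange(1, 11)
loss = 1.0 / epochs  # synthetic, decreasing "loss" values

fig, ax = plt.subplots()
ax.plot(epochs, loss, marker="o")
ax.set_xlabel("Epoch")
ax.set_ylabel("Training loss")
ax.set_title("Toy training curve")
fig.savefig("training_loss.png")
```

Seaborn plots are built the same way (its functions draw onto Matplotlib axes), while Bokeh outputs interactive HTML instead of a static image.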
It is really simple to install Python libraries using pip. In Python 3, it’s just:
pip3 install package-name
You can install the above-mentioned libraries in Python 3 as follows:
pip3 install requests
pip3 install selenium
pip3 install beautifulsoup4
pip3 install scrapy
pip3 install pandas
pip3 install numpy
pip3 install scipy
pip3 install scikit-learn
pip3 install tensorflow
pip3 install keras
pip3 install h5py
pip3 install matplotlib
pip3 install seaborn
pip3 install bokeh
The libraries listed in this article are just enough to get you started with machine learning and deep learning. You might need libraries specific to a particular task in advanced projects (e.g. OpenCV for computer vision and NLTK/Gensim for NLP). You can get detailed information about data science libraries at [1, 2] and about scraping libraries here.