Natural Language Processing

There is a lot of hype around investing in cryptocurrencies, and investors ranging from students to hedge-fund managers are keen to profit by riding the wave. Social media is a powerful tool capable of influencing elections, government policies, and both stock and cryptocurrency markets. In this era of social media, a tweet from John McAfee can drive up cryptocurrency prices, and a tweet from Kylie Jenner can cost Snapchat $1.3 billion in market value. Hence, it is critical to assess the sentiment of the public and of influential personalities when making investment decisions, especially in highly volatile cryptocurrency markets.

In these IPython notebooks, I describe the process of performing sentiment analysis, from labelling the data, to training a neural network, to applying that network to track public sentiment towards a given cryptocurrency (e.g. Bitcoin). In this project, I chose Coindesk over Twitter because Coindesk articles are usually written by experts, whereas anyone can tweet, and uninformed tweets are unlikely to have much effect on cryptocurrency prices.

SentimentAnalysis_Cryptocurrency

The first step after scraping the data is to label it for supervised training of a neural network. Reading thousands of articles and deciding whether each has a positive, negative, or neutral sentiment is tedious, so in the first IPython notebook I describe an automated labelling process based on the idea in Ref. [1]. The labelled data, around 2,500 news articles, is used to train a Convolutional Neural Network (CNN) in the second notebook. I also tried a Long Short-Term Memory (LSTM) network, but it took longer to train without any improvement in accuracy. In the third notebook, I apply the CNN model to track public sentiment towards a given cryptocurrency over time (aspect-based sentiment analysis).
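A CNN for text classification commonly stacks an embedding layer, a 1D convolution, global max pooling, and a dense softmax layer. The NumPy sketch below shows the forward pass of such a model so the data flow is explicit. It assumes three classes (negative, neutral, positive); the layer sizes, random weights, and token ids are hypothetical, the model is untrained, and this is not the exact architecture used in the notebook.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB_SIZE = 50    # hypothetical vocabulary size
EMBED_DIM = 8      # hypothetical embedding dimension
NUM_FILTERS = 4    # number of convolution filters
KERNEL_SIZE = 3    # filter width in tokens
NUM_CLASSES = 3    # assumed classes: negative, neutral, positive

# Randomly initialised parameters: an untrained model, for illustration only
embedding = rng.normal(size=(VOCAB_SIZE, EMBED_DIM))
conv_w = rng.normal(size=(NUM_FILTERS, KERNEL_SIZE, EMBED_DIM))
conv_b = np.zeros(NUM_FILTERS)
dense_w = rng.normal(size=(NUM_FILTERS, NUM_CLASSES))
dense_b = np.zeros(NUM_CLASSES)


def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


def cnn_forward(token_ids):
    """Embedding -> 1D convolution + ReLU -> global max pooling -> dense -> softmax.

    Assumes len(token_ids) >= KERNEL_SIZE (real pipelines pad short texts).
    """
    x = embedding[token_ids]                      # (seq_len, EMBED_DIM)
    n_windows = len(token_ids) - KERNEL_SIZE + 1
    feature_maps = np.empty((n_windows, NUM_FILTERS))
    for i in range(n_windows):
        window = x[i:i + KERNEL_SIZE]             # (KERNEL_SIZE, EMBED_DIM)
        # each filter takes a dot product with the whole window of embeddings
        feature_maps[i] = np.tensordot(conv_w, window, axes=([1, 2], [0, 1])) + conv_b
    feature_maps = np.maximum(feature_maps, 0.0)  # ReLU
    pooled = feature_maps.max(axis=0)             # global max pooling -> (NUM_FILTERS,)
    return softmax(pooled @ dense_w + dense_b)    # class probabilities


probs = cnn_forward(np.array([3, 17, 42, 8, 5]))  # hypothetical token ids
```

Global max pooling keeps, for each filter, the strongest match anywhere in the text, which makes the classifier insensitive to where a sentiment-bearing phrase appears.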

The accuracy of the CNN model is around 65% on the test set, so there is considerable room for improvement. Accuracy could be improved by using more data or by optimizing the neural network architecture (Ref. [2]). Future work could include studying the correlation between the sentiment score and cryptocurrency prices, identifying authors with biased views so that their articles can be given less weight, and using multiple news sources to train the CNN.
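Tracking sentiment over time and the future-work ideas above can be sketched as a per-day aggregation. The hypothetical `daily_sentiment` helper below averages article scores per day, optionally down-weighting authors considered biased, and `pearson` then compares a daily sentiment series with a price series. It assumes each article has already been scored in [-1, 1] (for example P(positive) minus P(negative) from the classifier); the author weights and inputs are illustrative assumptions, not results from the notebooks.

```python
from collections import defaultdict
from math import sqrt


def daily_sentiment(articles, author_weights=None):
    """Weighted mean sentiment score per day.

    articles: iterable of (date, author, score) tuples, score assumed in [-1, 1]
    author_weights: optional dict author -> weight (default weight 1.0);
    a weight below 1 down-weights an author assumed to be biased.
    """
    author_weights = author_weights or {}
    totals = defaultdict(float)
    weights = defaultdict(float)
    for date, author, score in articles:
        w = author_weights.get(author, 1.0)
        totals[date] += w * score
        weights[date] += w
    # ISO date strings sort chronologically; skip days whose total weight is zero
    return {d: totals[d] / weights[d] for d in sorted(totals) if weights[d] > 0}


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (undefined if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Toy usage with made-up articles: "bob" is down-weighted to half weight
scores = daily_sentiment(
    [("2018-01-01", "alice", 1.0), ("2018-01-01", "bob", -1.0), ("2018-01-02", "alice", 0.5)],
    author_weights={"bob": 0.5},
)
```

Correlation alone does not show that sentiment drives prices; a lagged comparison (sentiment on day t against price change on day t+1) would be a natural next step.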

Source code is available in this folder of my GitHub repository.

References

Natural language processing (NLP) has various applications [1], and people are still discovering new ways to apply NLP to improve their businesses [2] or to gain an edge over their competitors. Text classification is a subfield of NLP and a form of supervised machine learning in which a given text is analyzed to predict its predefined “class”. For instance, the texts “I am sad” and “It’s a sunny day!” would carry the predefined labels negative and positive sentiment, respectively, and a machine learning algorithm should be able to predict those labels. Common applications of deep learning (a subset of machine learning) in text classification include spam filtering in Gmail, news article classification in Google News, and sentiment analysis of tweets and movie reviews. Text classification can also be used to shortlist candidates’ resumes, to improve sales effectiveness (e.g. contacting only the promising customers identified by an algorithm instead of every person on the call list), for customer relationship management (e.g. sentiment analysis of customer emails to assign priority), to match freelancers with employers based on job descriptions on freelancing websites, and much more.
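The supervised setup described above (texts with predefined labels, and a model that predicts the label of new text) can be illustrated without deep learning. The sketch below swaps in a multinomial Naive Bayes classifier, a classical baseline rather than the CNN used in these notebooks, trained on a few made-up sentences in the spirit of the “I am sad” / “It’s a sunny day!” example. The class name and training sentences are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # lowercase and keep runs of letters (apostrophes allowed, e.g. "it's")
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior of the class
            for word in tokenize(text):
                if word in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)  # smoothed log likelihood
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = NaiveBayesTextClassifier().fit(
    ["I am sad", "I feel terrible", "It's a sunny day!", "What a great day"],
    ["negative", "negative", "positive", "positive"],
)
```

A baseline like this is cheap to train and gives a reference accuracy against which a deep learning model can be judged.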

In this IPython notebook, I describe an approach that uses machine learning to improve the effectiveness of an event production company’s sales team. A convolutional neural network (CNN) model is built in Keras to predict whether a person will attend an event based on that person’s job title. The sales team could then prioritize the people most likely to attend and contact them first, increasing their effectiveness.
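Before job titles can be fed to a CNN, each title has to become a fixed-length sequence of integer ids, and the model’s predictions then drive the call order. The plain-Python sketch below makes both steps explicit; Keras ships its own preprocessing utilities, and the helper names, the maximum length of 5 words, and the `predict_proba` callable standing in for a trained model are all hypothetical.

```python
import re


def build_vocab(titles):
    """Map each word to an integer id; 0 is reserved for padding, 1 for unknown words."""
    vocab = {"<pad>": 0, "<unk>": 1}
    for title in titles:
        for word in re.findall(r"[a-z]+", title.lower()):
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(title, vocab, max_len=5):
    """Convert a job title to a fixed-length list of ids (truncate, then right-pad with 0)."""
    ids = [vocab.get(w, 1) for w in re.findall(r"[a-z]+", title.lower())][:max_len]
    return ids + [0] * (max_len - len(ids))


def prioritize_contacts(contacts, predict_proba):
    """Sort contacts so that people most likely to attend are called first.

    predict_proba: hypothetical callable returning P(attend) for a job title,
    e.g. a wrapper around a trained model.
    """
    return sorted(contacts, key=lambda c: predict_proba(c["job_title"]), reverse=True)


vocab = build_vocab(["Chief Technology Officer", "Data Scientist", "Senior Data Engineer"])
encoded = encode("Lead Data Engineer", vocab)  # "lead" is unseen, so it maps to <unk>
```

Reserving id 1 for unseen words matters here because new job titles keep appearing on call lists; without it, an unfamiliar word would have no valid id.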

The approach described in the IPython notebook can be applied to other fields, businesses, or companies that involve text classification. Feel free to download the notebook and experiment with it.

Keep it Simple, Deep Learning!

Blog about deep learning tutorials and applications

Sentiment Analysis of Cryptocurrencies

Data Analysis and Machine Learning to Improve Sales Effectiveness
