Cryptocurrency investment is surrounded by hype, and investors ranging from students to hedge-fund managers are keen to profit by riding the wave. Social media is a powerful tool capable of influencing elections, government policies, and both stock and cryptocurrency markets. In this era of social media, a tweet from John McAfee can drive up the price of a cryptocurrency, and a tweet from Kylie Jenner can cost Snapchat $1.3 billion in market value. Hence, it is critical to assess the sentiment of the public and of influential personalities when making investment decisions, especially in highly volatile cryptocurrency markets.
In these IPython notebooks, I describe the process of performing sentiment analysis, from labelling the data, to training a neural network, to applying that network to track public sentiment towards a given cryptocurrency (e.g. Bitcoin). For this project, I chose Coindesk over Twitter because Coindesk articles are usually written by experts, whereas anyone can post on Twitter, and tweets from lay users are unlikely to have much effect on cryptocurrency prices.
The first step after scraping the data is to label it for supervised training of a neural network. Reading thousands of articles and deciding whether each has positive, negative or neutral sentiment is tedious. In the first IPython notebook, I describe an automated labelling process based on the idea in Ref. [1]. The labelled data, around 2500 news articles, is used to train a Convolutional Neural Network (CNN) in the second notebook. I also tried a Long Short-Term Memory (LSTM) network, but it took longer to train without improving the accuracy. In the third notebook, I apply the CNN model to track public sentiment towards a given cryptocurrency over time (aspect-based sentiment analysis).
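As a rough illustration of the tracking step, the sketch below averages per-article predictions by month for articles that mention a given cryptocurrency. It is not the notebook's actual code: the sample articles, the label-to-score mapping and the function name `monthly_sentiment` are assumptions made for this example, and the predicted labels stand in for the output of a trained classifier.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Assumed mapping from the three sentiment classes to a numeric score.
LABEL_SCORE = {"positive": 1, "neutral": 0, "negative": -1}

# Hypothetical classifier output: (publication date, headline, predicted label).
articles = [
    (date(2018, 1, 5), "Bitcoin rallies as adoption grows", "positive"),
    (date(2018, 1, 20), "Regulators warn about Bitcoin risks", "negative"),
    (date(2018, 1, 28), "Bitcoin payments gain new merchants", "positive"),
    (date(2018, 2, 3), "Ethereum upgrade goes smoothly", "positive"),
    (date(2018, 2, 14), "Bitcoin trades sideways this week", "neutral"),
]


def monthly_sentiment(articles, keyword):
    """Average score per (year, month) over articles mentioning `keyword`."""
    buckets = defaultdict(list)
    for published, text, label in articles:
        # Simple keyword match as a stand-in for aspect detection.
        if keyword.lower() in text.lower():
            buckets[(published.year, published.month)].append(LABEL_SCORE[label])
    return {month: mean(scores) for month, scores in sorted(buckets.items())}


bitcoin_trend = monthly_sentiment(articles, "bitcoin")
for (year, month), score in bitcoin_trend.items():
    print(f"{year}-{month:02d}: {score:+.2f}")
```

A keyword match is the simplest possible aspect filter; it would miss articles that refer to a coin only by its ticker symbol or nickname.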
The accuracy of the CNN model is around 65% on the test data set, so there is ample room for improvement. The accuracy could be improved by using more data or by optimizing the neural network architecture (Ref. [2]). Future work could include studying the correlation between the sentiment score and cryptocurrency prices, identifying authors with biased views so that their articles can be given less weight, and using more news sources to train the CNN.
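The correlation study mentioned as future work could start from something like the sketch below, which computes the Pearson correlation between a daily sentiment score and the same day's price return. The numbers are made up for illustration, and the names `pearson`, `sentiment` and `prices` are assumptions for this example rather than part of the project.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical daily sentiment scores and closing prices (not real data).
sentiment = [0.2, 0.5, -0.1, 0.4, -0.3]
prices = [100.0, 104.0, 101.0, 105.0, 99.0]

# Daily returns; the first day has no previous price, so it is dropped.
returns = [(today - prev) / prev for prev, today in zip(prices, prices[1:])]

# Compare each day's sentiment with that same day's return.
r = pearson(sentiment[1:], returns)
print(f"sentiment/return correlation: {r:.2f}")
```

Correlating sentiment with returns rather than raw prices avoids a spurious result driven by a shared trend; shifting one series by a day would test whether sentiment leads or lags the price.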
Source code is available in this folder on my GitHub repository.
References