Note: This is an early draft that I wanted to put online so I can discuss it with colleagues. I will refine it over the coming weeks.
Summary: Using the events data from GDELT, the level of violence in Ethiopia can be plotted as a time series. Here I fit a simple ARIMA model to that series, hoping to predict one or two days ahead. You can find a more detailed description of the source data in my other posts tagged gdelt and on the GDELT website.
This method can be applied to any other country.
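The modeling step can be illustrated with a minimal sketch. It is not the code behind this post: it fits the simplest useful case, ARIMA(1,1,0) without a constant, by conditional least squares in plain Python, and the daily counts are made up for illustration. The function names are hypothetical. For real work a library implementation (for example the ARIMA class in statsmodels) would be the more sensible choice.

```python
def fit_arima_110(series):
    """Estimate phi of an ARIMA(1,1,0) model without constant.

    The model is an AR(1) on the first differences:
        d[t] = phi * d[t-1] + noise,  with d[t] = y[t] - y[t-1]
    phi is estimated by conditional least squares.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    if len(diffs) < 2:
        raise ValueError("need at least three observations")
    num = sum(prev * cur for prev, cur in zip(diffs, diffs[1:]))
    den = sum(prev * prev for prev in diffs[:-1])
    return num / den if den else 0.0


def forecast_arima_110(series, phi, steps=2):
    """Forecast `steps` values ahead by iterating the fitted recursion."""
    last_level = series[-1]
    last_diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff   # expected next difference
        last_level = last_level + last_diff
        forecasts.append(last_level)
    return forecasts


# Hypothetical daily counts of violent events (NOT real GDELT data).
daily_events = [12, 15, 14, 18, 21, 19, 24, 26, 25, 29]

phi = fit_arima_110(daily_events)
prediction = forecast_arima_110(daily_events, phi, steps=2)
print("phi:", phi)
print("forecast for the next two days:", prediction)
```

With only one autoregressive term the forecast settles quickly at a constant level, which is one reason to keep the horizon at one or two days.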
To do:
- normalize data before modeling
- more verbose description
- data interpretation
Next Steps: Build and compare the following models:
- SARIMAX model (70% done, 2/3)
- LSTM model (5% done, 3/3)
Requirements:
- standard laptop with Ubuntu
- Python 3.x / Jupyter / libraries as imported / Anaconda
While there are many tools for merging individual PDF files into one large PDF file, most of them fail when you try to merge a very large number of files, e.g. more than 10,000. I therefore wrote a small Python script that takes batches of files and merges each batch. The resulting files can then be merged again in the same way.
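The batching idea can be sketched as follows. This is not the original script: the function names and the batch size are hypothetical, and the PDF step assumes the third-party pypdf package is installed. The merge function is passed in as a parameter so that the batching logic can be tried out without any PDF library.

```python
import os
import tempfile


def merge_pdfs(paths, out_path):
    """Merge the given PDF files into out_path (assumes pypdf is installed)."""
    from pypdf import PdfWriter  # imported lazily; third-party dependency

    writer = PdfWriter()
    for path in paths:
        writer.append(path)
    with open(out_path, "wb") as handle:
        writer.write(handle)
    writer.close()


def merge_in_batches(paths, out_path, batch_size=500, merge_fn=merge_pdfs):
    """Merge many files by merging batches, then merging the batch results.

    The rounds repeat until at most `batch_size` files remain, which are
    then merged into `out_path`. Input order is preserved.
    """
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2")
    paths = list(paths)
    if not paths:
        raise ValueError("no input files given")

    with tempfile.TemporaryDirectory() as workdir:
        round_no = 0
        while len(paths) > batch_size:
            merged = []
            for start in range(0, len(paths), batch_size):
                name = f"round{round_no}_batch{start // batch_size}.pdf"
                target = os.path.join(workdir, name)
                merge_fn(paths[start:start + batch_size], target)
                merged.append(target)
            paths = merged
            round_no += 1
        # Final merge; intermediate files are removed with the temp directory.
        merge_fn(paths, out_path)


# Example call (hypothetical folder name):
# files = sorted(os.path.join("pdfs", f) for f in os.listdir("pdfs"))
# merge_in_batches(files, "all.pdf", batch_size=500)
```

Keeping each batch small limits how many files are open or held in memory at once, at the cost of writing intermediate files to disk.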
Please also read the original paper:
THE GDELT EVENT DATABASE DATA FORMAT CODEBOOK V2.0
From the paper:
“… Mentions table that records all mentions of each event. As an event is mentioned across multiple news reports, each of those mentions is recorded in the Mentions table, along with several key indicators about that mention, including the location within the article where the mention appeared (in the lead paragraph versus being buried at the bottom) and the “confidence” of the algorithms in their identification of the event from that specific news report. …”
For lack of better knowledge, I plan to use GlobalEventID + MentionTimeDate as a composite primary key. Please let me know if there is a better one!
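Whether a candidate key is actually unique can be checked on the data before committing to it. The sketch below counts duplicate key values for a list of rows. The rows are made up for illustration, and the helper name is hypothetical. MentionIdentifier, another column of the Mentions table, is included only as a second candidate to compare against; the sketch says nothing about which combination is unique in the real data.

```python
from collections import Counter


def duplicate_keys(rows, key_columns):
    """Return every key value that occurs more than once, with its count."""
    counts = Counter(tuple(row[col] for col in key_columns) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}


# Hypothetical rows (NOT real GDELT data): two articles mention the same
# event with the same MentionTimeDate.
mentions = [
    {"GlobalEventID": "1001", "MentionTimeDate": "20200101120000",
     "MentionIdentifier": "https://example.org/a"},
    {"GlobalEventID": "1001", "MentionTimeDate": "20200101120000",
     "MentionIdentifier": "https://example.org/b"},
    {"GlobalEventID": "1002", "MentionTimeDate": "20200101121500",
     "MentionIdentifier": "https://example.org/a"},
]

dupes_time = duplicate_keys(mentions, ["GlobalEventID", "MentionTimeDate"])
dupes_ident = duplicate_keys(mentions, ["GlobalEventID", "MentionIdentifier"])
print("duplicates for GlobalEventID + MentionTimeDate:", dupes_time)
print("duplicates for GlobalEventID + MentionIdentifier:", dupes_ident)
```

Running the same check on a real Mentions file (read with the csv module, for instance) would show whether the planned key holds up.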