Predict Violence in Ethiopia (1/3 ARIMA Model)

Note: This is an early draft that I wanted to put online so I can discuss it with colleagues. I will refine it over the coming weeks.

Summary: Based on the events data from GDELT, a graph can be plotted that shows the level of violence in Ethiopia over time. Here I apply a simple ARIMA model to this data, hoping to forecast one or two days ahead. You can find a more detailed description of the source data in my other posts tagged gdelt and on the GDELT website.
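ARIMA(p, d, q) works in two steps. It differences the series d times, then models the differenced series with p autoregressive terms and q moving-average terms. The notebook presumably relies on a library for this. The plain-Python sketch below is not that code. It is a hand-rolled ARIMA(1,1,0) without a constant, fitted by conditional least squares, and it only shows the mechanics of a one- or two-day forecast. The order (1, 1, 0), the missing constant and the function names are assumptions for illustration.

```python
# Minimal ARIMA(1,1,0) sketch in plain Python (no statsmodels).
# Model: the day-to-day change d_t = y_t - y_{t-1} follows d_t = phi * d_{t-1} + e_t.
# phi is estimated by conditional least squares, without a constant term.


def fit_arima_110(series):
    """Estimate phi from a list of daily values (needs at least 3 points)."""
    if len(series) < 3:
        raise ValueError("need at least 3 observations")
    diffs = [b - a for a, b in zip(series, series[1:])]
    x, y = diffs[:-1], diffs[1:]  # lagged and current differences
    denom = sum(v * v for v in x)
    if denom == 0:
        return 0.0  # all differences are zero: nothing to estimate
    return sum(a * b for a, b in zip(x, y)) / denom


def forecast_arima_110(series, phi, steps=2):
    """Forecast `steps` days ahead by iterating the AR(1) on the differences."""
    level = series[-1]
    diff = series[-1] - series[-2]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff      # expected next change
        level = level + diff   # undo the differencing
        forecasts.append(level)
    return forecasts


if __name__ == "__main__":
    daily_counts = [0, 8, 12, 14, 15]  # toy numbers, not GDELT output
    phi = fit_arima_110(daily_counts)
    print(phi, forecast_arima_110(daily_counts, phi, steps=2))
```

With statsmodels, a roughly equivalent call is `ARIMA(series, order=(1, 1, 0)).fit()` from `statsmodels.tsa.arima.model`, followed by `.forecast(steps=2)`. That version estimates by maximum likelihood and also supports MA terms, so its coefficient will differ somewhat from this least-squares sketch.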

This method can be applied to any other country.
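The post does not spell out how the daily violence series is built from the events table. The sketch below is one possible way. It assumes that "violence" means events with QuadClass 4 (Material Conflict) whose action location falls in the country according to ActionGeo_CountryCode, which uses FIPS 10-4 codes, so Ethiopia is ET. It also assumes rows were read from the tab-separated event files as strings keyed by the codebook field names. The filter and the function name are illustrative assumptions, not the notebook's actual definition. Because the country code is a parameter, switching countries only means passing a different code.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical filter: QuadClass "4" (Material Conflict) located in `country_code`.
# Rows are dicts of strings keyed by GDELT event field names.


def daily_violence_counts(events, country_code="ET"):
    """Return [(YYYYMMDD, count), ...] for every day between the first and last match."""
    counts = Counter(
        ev["SQLDATE"]
        for ev in events
        if ev["ActionGeo_CountryCode"] == country_code and ev["QuadClass"] == "4"
    )
    if not counts:
        return []
    day = datetime.strptime(min(counts), "%Y%m%d").date()
    end = datetime.strptime(max(counts), "%Y%m%d").date()
    series = []
    while day <= end:
        key = day.strftime("%Y%m%d")
        series.append((key, counts[key]))  # Counter yields 0 for days without events
        day += timedelta(days=1)
    return series
```

Days without a matching event are filled in with 0. This yields a gap-free daily series, which ARIMA-type models expect.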

ToDos:

  • normalize data before modeling
  • more verbose description
  • data interpretation
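For the first ToDo item, one common option is z-score standardization. The sketch keeps the mean and standard deviation so that forecasts can be mapped back to event counts afterwards. The post does not say which normalization is intended. A ratio is another candidate, for example violent events divided by all events GDELT recorded for the country that day, since the number of processed articles varies from day to day. The function names below are hypothetical.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize to mean 0 and (population) std 1; also return mu and sigma."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(v - mu) / sigma for v in values], mu, sigma


def inverse_zscore(scaled, mu, sigma):
    """Map model output (e.g. forecasts) back to the original count scale."""
    return [s * sigma + mu for s in scaled]
```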

Next Steps: Build and compare the following models:

  • SARIMAX model (70% done, part 2/3)
  • LSTM model (5% done, part 3/3)
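A fair comparison of ARIMA, SARIMAX and LSTM needs the same evaluation procedure for all three. A minimal sketch of one common choice, rolling-origin (walk-forward) evaluation with mean absolute error, is shown below. It uses a naive persistence forecast as the baseline that every model should beat. The function names are hypothetical. The only assumption is that each model is wrapped as a callable `(history, horizon) -> list of forecasts`.

```python
def walk_forward_mae(series, forecaster, min_train=10, horizon=1):
    """Rolling-origin evaluation: for each cut-off t, forecast from series[:t]
    and compare the last forecast step with the true value series[t + horizon - 1]."""
    errors = []
    for t in range(min_train, len(series) - horizon + 1):
        prediction = forecaster(series[:t], horizon)[-1]
        errors.append(abs(prediction - series[t + horizon - 1]))
    if not errors:
        raise ValueError("series too short for the chosen min_train and horizon")
    return sum(errors) / len(errors)


def persistence(history, horizon):
    """Naive baseline: every future day looks like the last observed day."""
    return [history[-1]] * horizon
```

Running `walk_forward_mae` with `horizon=1` and `horizon=2` matches the one- and two-day targets from the summary.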

Technology used:

  • standard laptop with Ubuntu
  • Python 3.x / Jupyter / libraries as imported / Anaconda

Jupyter Notebook:

Posted in coding, data processing, datacoll, gdelt, python

GDELT Mentions

Please also read the original codebook:

THE GDELT EVENT DATABASE DATA FORMAT CODEBOOK V2.0

From the codebook:
“ … Mentions table that records all mentions of each event. As an event is mentioned across multiple news reports, each of those mentions is recorded in the Mentions table, along with several key indicators about that mention, including the location within the article where the mention appeared (in the lead paragraph versus being buried at the bottom) and the “confidence” of the algorithms in their identification of the event from that specific news report. …”
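The two indicators in the quote, position in the article and confidence, are ordinary columns of the Mentions files. The sketch below shows one way to read them. The column order is how I understand the 2.0 codebook and should be checked against it before relying on it. `is_prominent` and its thresholds are a hypothetical example filter, not a recommendation from GDELT. It keeps mentions in the first sentence (SentenceID 1) with at least 50 percent confidence.

```python
# Column order of a GDELT 2.0 Mentions file (tab-separated, no header row),
# as listed in the codebook; verify against the codebook before relying on it.
MENTION_COLUMNS = [
    "GLOBALEVENTID", "EventTimeDate", "MentionTimeDate", "MentionType",
    "MentionSourceName", "MentionIdentifier", "SentenceID",
    "Actor1CharOffset", "Actor2CharOffset", "ActionCharOffset", "InRawText",
    "Confidence", "MentionDocLen", "MentionDocTone",
    "MentionDocTranslationInfo", "Extras",
]


def parse_mention(line):
    """Split one tab-separated line into a dict of strings keyed by column name."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != len(MENTION_COLUMNS):
        raise ValueError(f"expected {len(MENTION_COLUMNS)} fields, got {len(fields)}")
    return dict(zip(MENTION_COLUMNS, fields))


def is_prominent(mention, min_confidence=50, max_sentence=1):
    """Hypothetical filter: confident extraction and an early position in the article."""
    return (int(mention["Confidence"]) >= min_confidence
            and int(mention["SentenceID"]) <= max_sentence)
```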
 
For lack of better knowledge, I plan to use GlobalEventID + MentionTimeDate as the primary key. Please let me know if there is a better one!
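A candidate key can be checked against real data before committing to it. The sketch below counts how often each composite key occurs and returns the ones that repeat. The function name is hypothetical, and rows are assumed to be dicts keyed by Mentions column names. Several articles may mention the same event within one update window, so adding MentionIdentifier (the article URL) is a variant worth testing. This is a suggestion to verify, not a confirmed answer.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return {composite_key: count} for every key that occurs more than once.

    rows: iterable of dicts (e.g. parsed Mentions lines)
    key_fields: tuple of column names forming the candidate primary key
    """
    counts = Counter(tuple(row[f] for f in key_fields) for row in rows)
    return {key: n for key, n in counts.items() if n > 1}
```

An empty result on a full day of files is evidence, though not proof, that the key is unique.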


Posted in datacoll, gdelt