Predict Violence in Ethiopia (2/3 SARIMAX Model)

Note: This is an early draft that I am putting online so that I can discuss it with colleagues. I will refine it over the coming weeks.

Summary: Using the event data from GDELT, I plot a graph of violence in Ethiopia. Here I try to improve on a simple ARIMA model by adding external (exogenous) and seasonal components to the model.

You can find a more detailed description of the source data in my other posts tagged gdelt and on the GDELT website.

This method can be applied to any other country.
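Turning raw GDELT events into the daily series these models need could look like the following sketch. It assumes event records keyed by the GDELT export columns SQLDATE (an integer YYYYMMDD), EventRootCode and ActionGeo_CountryCode (FIPS country codes, "ET" for Ethiopia), and it assumes that "violence" means CAMEO root codes 18 (Assault), 19 (Fight) and 20 (Use unconventional mass violence); the notebook may define it differently. Applying it to another country only means changing the `country` parameter.

```python
from collections import Counter
from datetime import date, timedelta

# Assumption: "violence" = CAMEO root codes 18, 19 and 20. Adjust as needed.
VIOLENT_ROOT_CODES = {"18", "19", "20"}


def daily_violence_counts(events, country="ET"):
    """Count violent events per day for one country, filling missing days with 0.

    `events` is an iterable of dicts keyed by (assumed) GDELT column names:
    SQLDATE (int YYYYMMDD), EventRootCode (str), ActionGeo_CountryCode (FIPS).
    """
    counts = Counter()
    for ev in events:
        if ev["ActionGeo_CountryCode"] != country:
            continue
        if ev["EventRootCode"] not in VIOLENT_ROOT_CODES:
            continue
        d = int(ev["SQLDATE"])
        counts[date(d // 10000, d // 100 % 100, d % 100)] += 1
    if not counts:
        return []
    # Time-series models expect a regular daily index, so days without events get 0.
    series, day, end = [], min(counts), max(counts)
    while day <= end:
        series.append((day, counts[day]))
        day += timedelta(days=1)
    return series


sample = [
    {"SQLDATE": 20210101, "EventRootCode": "19", "ActionGeo_CountryCode": "ET"},
    {"SQLDATE": 20210101, "EventRootCode": "18", "ActionGeo_CountryCode": "ET"},
    {"SQLDATE": 20210101, "EventRootCode": "04", "ActionGeo_CountryCode": "ET"},  # Consult: not counted
    {"SQLDATE": 20210103, "EventRootCode": "20", "ActionGeo_CountryCode": "ET"},
    {"SQLDATE": 20210102, "EventRootCode": "19", "ActionGeo_CountryCode": "SU"},  # Sudan: other country
]
for day, n in daily_violence_counts(sample):
    print(day, n)
```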

ToDos:

  • normalize data before modeling
  • more verbose description of each step
  • extensive data interpretation
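For the first ToDo, a minimal z-score normalization sketch with NumPy. It assumes the mean and standard deviation are estimated on the training part of the series only (so the test days do not leak into the model), and it keeps the inverse transform so that forecasts can be turned back into event counts. The function names are hypothetical.

```python
import numpy as np


def fit_scaler(train):
    """Estimate mean and standard deviation on the training data only."""
    mean = float(np.mean(train))
    std = float(np.std(train))
    if std == 0.0:
        std = 1.0  # avoid division by zero for a constant series
    return mean, std


def transform(x, mean, std):
    """Scale values to zero mean and unit variance (relative to the training data)."""
    return (np.asarray(x, dtype=float) - mean) / std


def inverse_transform(z, mean, std):
    """Map scaled values (e.g. model forecasts) back to event counts."""
    return np.asarray(z, dtype=float) * std + mean


counts = np.array([12, 15, 9, 30, 22, 18, 11, 14], dtype=float)  # synthetic daily counts
train = counts[:6]
mean, std = fit_scaler(train)
z = transform(counts, mean, std)
restored = inverse_transform(z, mean, std)
print(z.round(2))
```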

Next Steps: Build and compare the following models:

  • LSTM Model (15 % done, 3/3)
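An LSTM needs the daily series reshaped into supervised samples first. A minimal sketch, assuming a lookback window of 14 days and a 2-day horizon (both hypothetical choices) and the (samples, timesteps, features) input layout that Keras LSTM layers expect:

```python
import numpy as np


def make_windows(series, lookback=14, horizon=2):
    """Slice a 1-D series into LSTM inputs X and multi-step targets y.

    X has shape (samples, lookback, 1); y has shape (samples, horizon).
    """
    s = np.asarray(series, dtype=float)
    n = len(s) - lookback - horizon + 1
    if n <= 0:
        raise ValueError("series too short for this lookback/horizon")
    X = np.stack([s[i:i + lookback] for i in range(n)])[..., np.newaxis]
    y = np.stack([s[i + lookback:i + lookback + horizon] for i in range(n)])
    return X, y


X, y = make_windows(np.arange(10), lookback=3, horizon=2)
print(X.shape, y.shape)
```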

Technology used:

  • standard Laptop with Ubuntu
  • Python 3.x / Jupyter / libraries as imported / Anaconda

Jupyter Notebook:

(download the data used here as CSV)

Posted in coding, data processing, gdelt, python

Cameo Event Codes

The GDELT database uses CAMEO (Conflict and Mediation Event Observations) event codes to classify the events reported in news articles. Below is the list of event codes in use. There are 310 different event codes, grouped under 20 two-digit root event codes (01 – 20). Because event codes can have a leading “0”, they are stored in the database as strings. The post “Country Dashboard” is based on the root event codes.
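Why the string datatype matters can be shown with a small sketch: the root code is the first two characters of the event code, which only works while the leading zero is kept. The helper name and the validation rules (2 to 4 digits, root between 01 and 20) are assumptions for illustration; the root code names are the CAMEO top-level categories.

```python
CAMEO_ROOTS = {
    "01": "Make public statement", "02": "Appeal",
    "03": "Express intent to cooperate", "04": "Consult",
    "05": "Engage in diplomatic cooperation", "06": "Engage in material cooperation",
    "07": "Provide aid", "08": "Yield", "09": "Investigate", "10": "Demand",
    "11": "Disapprove", "12": "Reject", "13": "Threaten", "14": "Protest",
    "15": "Exhibit force posture", "16": "Reduce relations", "17": "Coerce",
    "18": "Assault", "19": "Fight", "20": "Use unconventional mass violence",
}


def root_code(event_code: str) -> str:
    """Return the two-digit CAMEO root code for an event code kept as a string."""
    if not (isinstance(event_code, str) and event_code.isdigit() and 2 <= len(event_code) <= 4):
        raise ValueError(f"not a CAMEO event code string: {event_code!r}")
    root = event_code[:2]
    if root not in CAMEO_ROOTS:
        raise ValueError(f"unknown root code {root!r}")
    return root


print(root_code("0211"), CAMEO_ROOTS[root_code("0211")])  # 02 Appeal

# Pitfall: converting to int drops the leading zero.
as_int = int("010")        # 10
print(str(as_int)[:2])     # "10" would wrongly map to "Demand"
```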

Original information is from the CAMEO Conflict and Mediation Event Observations Event and Actor Codebook.

Posted in database, gdelt

Predict Violence in Ethiopia (1/3 ARIMA Model)

Note: This is an early draft that I am putting online so that I can discuss it with colleagues. I will refine it over the coming weeks.

Summary: Using the event data from GDELT, I plot a graph of violence in Ethiopia. Here I apply a simple ARIMA model to the data, aiming to predict one or two days ahead. You can find a more detailed description of the source data in my other posts tagged gdelt and on the GDELT website.
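A minimal sketch of the idea in plain NumPy rather than a statistics library: an ARIMA(p,1,0) model (difference the series once, then fit an autoregression with a constant on the differences), estimated by conditional least squares and iterated to forecast one or two days. The lag order p=2 and the function names are assumptions; libraries such as statsmodels also estimate moving-average terms by maximum likelihood, which this sketch omits.

```python
import numpy as np


def fit_arima_p10(y, p=2):
    """Fit ARIMA(p,1,0) with a constant by conditional least squares."""
    d = np.diff(np.asarray(y, dtype=float))
    if len(d) <= p:
        raise ValueError("series too short for this lag order")
    # Row t predicts d[t] from the previous p differences d[t-1], ..., d[t-p].
    lags = np.array([d[t - p:t][::-1] for t in range(p, len(d))])
    X = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(X, d[p:], rcond=None)
    return coef  # [constant, phi_1, ..., phi_p]


def forecast_arima_p10(y, coef, steps=2):
    """Iterate the fitted model forward and undo the differencing."""
    levels = [float(v) for v in y]
    d = list(np.diff(levels))
    p = len(coef) - 1
    out = []
    for _ in range(steps):
        next_d = coef[0] + float(np.dot(coef[1:], d[-p:][::-1]))
        d.append(next_d)
        levels.append(levels[-1] + next_d)
        out.append(levels[-1])
    return np.array(out)


rng = np.random.default_rng(1)
daily = 50 + np.cumsum(rng.normal(size=120))  # synthetic stand-in for daily counts
coef = fit_arima_p10(daily, p=2)
print(forecast_arima_p10(daily, coef, steps=2))
```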

This method can be applied to any other country.

ToDos:

  • normalize data before modeling
  • more verbose description
  • data interpretation

Next Steps: Build and compare the following models:

  • SARIMAX Model (70 % done, 2/3)
  • LSTM Model (5 % done, 3/3)
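Comparing the ARIMA, SARIMAX and LSTM models fairly needs one shared evaluation protocol. A sketch of rolling-origin (walk-forward) evaluation with mean absolute error, assuming each model is wrapped as a hypothetical function `forecaster(history, steps)`; two naive baselines are included because any of the models should beat them.

```python
import numpy as np


def rolling_origin_mae(y, forecaster, min_train=30, horizon=2):
    """Refit/forecast at every origin and average the absolute errors."""
    y = np.asarray(y, dtype=float)
    errors = []
    for end in range(min_train, len(y) - horizon + 1):
        pred = np.asarray(forecaster(y[:end], horizon), dtype=float)
        errors.append(np.abs(pred - y[end:end + horizon]))
    return float(np.mean(errors))


def naive(history, steps):
    """Baseline: tomorrow looks like today."""
    return np.repeat(history[-1], steps)


def seasonal_naive(history, steps, season=7):
    """Baseline: same weekday last week."""
    idx = len(history) - season + (np.arange(steps) % season)
    return history[idx]


rng = np.random.default_rng(2)
y = 20 + 5 * np.tile([0, 1, 2, 3, 2, 1, 0], 20) + rng.normal(size=140)  # synthetic
for name, f in [("naive", naive), ("seasonal naive", seasonal_naive)]:
    print(name, round(rolling_origin_mae(y, f), 2))
```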

Technology used:

  • standard Laptop with Ubuntu
  • Python 3.x / Jupyter / libraries as imported / Anaconda

Jupyter Notebook:

Posted in coding, data processing, datacoll, gdelt, python