I copied parts of the GDELT events dataset into a MySQL database. Below is a summary of the GDELT Event Codebook V2.0; make sure you also read the original codebook. The table is partitioned on MonthYear.
GlobalEventID | unique identifier assigned to each event record (here used as primary key together with MonthYear) | 410411699 | INT |
Day | date the event took place, in YYYYMMDD format | 20140218 | INT |
MonthYear | alternative formatting of the event date (YYYYMM, here used as primary key together with GlobalEventID) | 201402 | INT |
FractionDate | event date as YYYY.FFFF, where FFFF is the fraction of the year completed by that day, computed as (MONTH * 30 + DAY) / 365 | 2014.1315 | floating point |
Actor1Code | complete raw CAMEO code for Actor1 | EDU | string |
Actor1Name | actual name of Actor 1 | STUDENT | string |
Actor1CountryCode | 3 character CAMEO code | UKR | string |
Actor1KnownGroupCode | CAMEO code | ||
Actor1EthnicCode | If the source document specifies the ethnic affiliation of Actor1 and that ethnic group has a CAMEO entry | ||
Actor1Religion1Code | If the source document specifies the religious affiliation of Actor1 and that religious group has a CAMEO entry | ||
Actor1Religion2Code | If multiple religious codes are specified for Actor1 (e.g. Catholic, Christianity) | ||
Actor1Type1Code | 3-character CAMEO code of the CAMEO “type” or “role” of Actor1 (police, government, rebel) | | string |
Actor1Type2Code | | | string |
Actor1Type3Code | | | string |
Actor2 | like actor 1 fields | ||
IsRootEvent | guess: most important event of article? | ||
EventCode | raw CAMEO action code describing the action that Actor1 performed upon Actor2 | 874 | INT |
EventBaseCode | CAMEO event codes are defined in a three-level taxonomy. For events at level three in the taxonomy, this yields its level two leaf root node. For example, code “0251” (“Appeal for easing of administrative sanctions”) would yield an EventBaseCode of “025” (“Appeal to yield”). | 025 | string (leading 0!) |
EventRootCode | similar to EventBaseCode, this defines the root-level category the event code falls under. For example, code “0251” (“Appeal for easing of administrative sanctions”) has a root code of “02” (“Appeal”). | 02 | string (leading 0!) |
QuadClass | CAMEO event taxonomy is ultimately organized under four primary classifications: Verbal Cooperation, Material Cooperation, Verbal Conflict, and Material Conflict | 3 | INT (1 to 4) |
GoldsteinScale | -10 to +10, capturing the theoretical potential impact that type of event will have on the stability of a country. | 3 | floating point |
NumMentions | total number of mentions of this event across all source documents during the 15 minute update in which it was first seen. Multiple references to an event within a single document also contribute to this count. | 22 | INT |
NumSources | total number of information sources containing one or more mentions of this event during the 15 minute update in which it was first seen. | 12 | INT |
NumArticles | total number of source documents containing one or more mentions of this event during the 15 minute update in which it was first seen | 35 | INT |
AvgTone | average “tone” of all documents containing one or more mentions of this event during the 15 minute update in which it was first seen. The score ranges from -100 (extremely negative) to +100 (extremely positive). Common values range between -10 and +10, with 0 indicating neutral. | -2 | numeric (-100 to +100) |
Actor1Geo_Type | geographic resolution: 1=COUNTRY (match was at the country level), 2=USSTATE (match was to a US state), 3=USCITY (match was to a US city or landmark), 4=WORLDCITY (match was to a city or landmark outside the US), 5=WORLDSTATE (match was to an Administrative Division 1 outside the US – roughly equivalent to a US state) | 2 | INT |
Actor1Geo_Fullname | full human-readable name of the matched location | San Diego, California, United States | string |
Actor1Geo_CountryCode | 2-character FIPS10-4 country code | us | string |
Actor1Geo_ADM1Code | 2-character FIPS10-4 country code followed by the 2-character FIPS10-4 administrative division 1 (ADM1) | USCA | string |
Actor1Geo_ADM2Code | for international locations this is the numeric Global Administrative Unit Layers (GAUL) administrative division 2 (ADM2) code assigned to each global location, while for US locations this is the two-character shortform of the state’s name (such as “TX” for Texas) followed by the 3-digit numeric county code (following the INCITS 31:200x standard used in GNIS) | CA073 | string |
Actor1Geo_Lat | centroid latitude of the landmark | 32.7153000000 | floating point |
Actor1Geo_Long | centroid longitude of the landmark | -117.1570000000 | floating point |
Actor1Geo_FeatureID | GNS or GNIS FeatureID | 1661377 or AF | string |
Actor2Geo | like Actor1Geo_ fields | ||
DATEADDED | date the event was added to the master database, in YYYYMMDDHHMMSS format in the UTC timezone. For those needing to access events at 15 minute resolution, this is the field that should be used in queries. | 20160101071500 | INT |
SOURCEURL | URL or citation of the first news report in which this event was found | http://www.thefrontierpost.com/article/36485... | string |
Inserted | date when the record was inserted into my MySQL database | 20191103081401 | INT |
filename | name of the GDELT file the record came from | 20160101041500.export.CSV | string |
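
Several of the fields above can be derived from one another, which is handy for sanity-checking imported rows. The following is a minimal Python sketch, not part of GDELT or of my import script; all function names are hypothetical. One thing to watch: the FractionDate formula as written, (MONTH * 30 + DAY) / 365, gives 2014.2137 for 20140218, while the example value 2014.1315 is reproduced by using (MONTH - 1) instead. The `zero_based_month` switch is an assumption introduced here to show both variants.

```python
from datetime import datetime, timezone
from typing import Tuple


def month_year(day: int) -> int:
    """Derive MonthYear (YYYYMM) from Day (YYYYMMDD)."""
    return day // 100


def fraction_date(day: int, zero_based_month: bool = True) -> float:
    """Approximate FractionDate (YYYY.FFFF) from Day (YYYYMMDD).

    zero_based_month=False follows the formula as written in the table.
    zero_based_month=True uses (MONTH - 1); this is an assumption, chosen
    only because it reproduces the example value in the table.
    """
    year = day // 10000
    month = day // 100 % 100
    day_of_month = day % 100
    months = month - 1 if zero_based_month else month
    return round(year + (months * 30 + day_of_month) / 365, 4)


def event_hierarchy(event_code: str) -> Tuple[str, str]:
    """Return (EventBaseCode, EventRootCode) for a CAMEO EventCode.

    The code must be a string: leading zeros carry meaning.
    """
    return event_code[:3], event_code[:2]


def parse_dateadded(value: int) -> datetime:
    """Convert DATEADDED (YYYYMMDDHHMMSS, UTC) to an aware datetime."""
    parsed = datetime.strptime(str(value), "%Y%m%d%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


if __name__ == "__main__":
    print(month_year(20140218))                             # 201402
    print(fraction_date(20140218))                          # 2014.1315
    print(fraction_date(20140218, zero_based_month=False))  # 2014.2137
    print(event_hierarchy("0251"))                          # ('025', '02')
    print(parse_dateadded(20160101071500).isoformat())
```

Note that `event_hierarchy` only works on the string form of the code. If EventCode is stored as INT, as in the table above, the leading zero is lost and has to be restored before slicing.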