I copy parts of the original Mentions dataset from the GDELT project into a MySQL database for further evaluation. Below is my table definition. The field definitions are of course identical to the original. Please also read the original paper:
THE GDELT EVENT DATABASE DATA FORMAT CODEBOOK V2.0
From the paper:
“ … Mentions table that records all mentions of each event. As an event is mentioned across multiple news reports, each of those mentions is recorded in the Mentions table, along with several key indicators about that mention, including the location within the article where the mention appeared (in the lead paragraph versus being buried at the bottom) and the “confidence” of the algorithms in their identification of the event from that specific news report. …”
For lack of a better idea, I use a hash of GLOBALEVENTID, MentionTimeDate, MentionIdentifier, Actor2CharOffset and ActionCharOffset as the primary key. Please let me know if there is a better one!
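As a rough illustration of how such a hash key could be built, here is a minimal Python sketch. The hash function (SHA-256), the separator character, and the function name `mention_key` are my assumptions for illustration only; they are not necessarily what was used to build the table. The separator matters: without it, the field values ("1", "23") and ("12", "3") would be concatenated to the same string and collide.

```python
import hashlib

# Unit separator: a control character that should not occur in the
# field values (assumption), so field boundaries stay unambiguous.
SEP = "\x1f"


def mention_key(global_event_id, mention_time_date, mention_identifier,
                actor2_char_offset, action_char_offset):
    """Return a hex digest built from the five fields used as the key.

    SHA-256 and the separator are illustrative choices, not the
    original implementation.
    """
    parts = (global_event_id, mention_time_date, mention_identifier,
             actor2_char_offset, action_char_offset)
    joined = SEP.join(str(p) for p in parts)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Example with made-up values (not a real GDELT row):
key = mention_key(123456789, 20200101120000,
                  "https://example.com/some-article", 45, 60)
print(len(key))  # SHA-256 hex digests are always 64 characters long
```

The resulting 64-character string would fit a CHAR(64) column. Whether these five fields are actually unique across all mentions is not guaranteed by the hash itself, so it is worth checking for duplicate keys while loading the data.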