For the first time, comparable migration data is available for almost
every country of the world. Until now, records were incompatible between
nations and, especially when broken down by gender and age, nonexistent.
Emilio Zagheni from the Max Planck Institute for Demographic Research
(MPIDR) in Rostock, Germany, has now built a rich migration database by
compiling the global flow of millions of emails.
“Where
estimates of demographic flows exist, they are often outdated and
largely inconsistent,” says MPIDR researcher Emilio Zagheni. Official
records are difficult to use for various reasons. Emigrants tend not to
register after they move to a new country or do so very late. There is
also no clear agreement between nations on how to actually define a
migrant.
Official migration data is outdated and inconsistent
“Global internet data does not have these drawbacks,” says Zagheni. “You are where you email.”
Together with Ingmar Weber from Yahoo! Research, he traced emails sent
from Yahoo! accounts around the world to infer the country of residence
of their senders. Every device that sends email can be located, at least
at the country level, by an internationally standardized code, the
so-called IP address.
Zagheni and Weber analysed the countries derived from IP addresses for a
set of messages sent by 43 million anonymous Yahoo! account holders
between September 2009 and June 2011.
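As a rough illustration of the lookup described above, the following Python sketch maps an IP address to a country code. The prefix table is hypothetical: it uses address ranges reserved for documentation and made-up country assignments, whereas a real analysis would rely on a full geolocation database. Nothing here reflects the researchers' actual tooling.

```python
import ipaddress

# Hypothetical prefix table. These ranges are reserved for documentation,
# and the country assignments are invented purely for illustration.
PREFIX_TO_COUNTRY = {
    ipaddress.ip_network("192.0.2.0/24"): "DE",
    ipaddress.ip_network("198.51.100.0/24"): "MX",
    ipaddress.ip_network("203.0.113.0/24"): "US",
}


def country_of(ip):
    """Return the country code for an IP address, or None if unknown."""
    addr = ipaddress.ip_address(ip)
    for network, country in PREFIX_TO_COUNTRY.items():
        if addr in network:
            return country
    return None
```

For example, `country_of("198.51.100.7")` returns `"MX"` under this invented table, and an address outside the table returns `None`.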
In addition to the date and geographical origin of each message, they
compiled the self-reported date of birth and gender of the sender. When
a person began sending email permanently from a new location, it was
assumed that he or she had changed residence. This way they were able to
calculate rates of migration from and to almost every country in the
world. Only anonymized data was used, so identifying individuals was
impossible, and no information about the recipients, subject, or
content of a message was accessed. The findings have now been published
in the ACM Web Science Conference Proceedings.
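The residence rule described above can be sketched in Python as follows. This is a minimal illustration, not the researchers' method: the function name `infer_moves` and the `min_messages` threshold are assumptions, since the article does not say how a permanent change of location was defined. Here, a country counts as a residence only if at least `min_messages` consecutive messages were sent from it; shorter runs are treated as trips.

```python
from datetime import date


def infer_moves(messages, min_messages=3):
    """Infer changes of residence from (date, country_code) pairs.

    Returns a list of (origin, destination, first_date_in_destination).
    The threshold is an assumed stand-in for "permanently".
    """
    ordered = sorted(messages)

    # Collapse the timeline into runs of consecutive messages per country.
    runs = []  # each run is [country, first_date, message_count]
    for day, country in ordered:
        if runs and runs[-1][0] == country:
            runs[-1][2] += 1
        else:
            runs.append([country, day, 1])

    # Keep only runs long enough to count as a residence; short runs are
    # treated as temporary trips and ignored.
    stable = [run for run in runs if run[2] >= min_messages]

    # A move is a change of country between successive stable runs.
    moves = []
    for previous, current in zip(stable, stable[1:]):
        if previous[0] != current[0]:
            moves.append((previous[0], current[0], current[1]))
    return moves
```

With this rule, a sender who emails from Mexico, sends a single message from the USA, returns to Mexico, and later sends a long run of messages from the USA is counted as moving once, at the start of the long run.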
The results are not only a proof of concept. They also reveal
international migration characteristics never seen before. For the USA,
Zagheni and Weber were able to produce the first-ever curve of
emigration by age and sex. “In the U.S. many statistics are collected
about people who move into the country, but there is no system that
keeps track of people who move out,” says Zagheni.
The
potential of the email statistics goes far beyond calculating gross
country profiles. For instance, the researchers also looked into
Mexico-US cross-border mobility. The data reveals how strongly both
countries are demographically integrated: most people who moved from
Mexico to the United States either spent time in the USA before
emigrating north, or went back to visit Mexico soon after moving to the
United States. Those in their 30s have the highest rate of mobility
across the Mexico-US border, while the least mobile are those 50 and
older.
Only the tip of the iceberg
The strength of Zagheni and Weber’s migration data comes not only from
the vast number of emails available, but also from a mathematical model
they set up to adjust for a typical shortcoming of email statistics:
those who send email are not representative of the entire population.
Some groups, like the elderly, use email less or not at all and are thus
underrepresented. The researchers calculated adjustment factors for such
groups by gauging their email data against migration numbers from
European countries, where official data is fairly reliable.
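The idea of calibrating against reliable official data can be illustrated with a deliberately simplified Python sketch. It assumes one multiplicative factor per age group, computed as the ratio of the official rate to the email-based rate in a reference country. The article does not describe the researchers' actual model, so the function names and all numbers in the usage note are illustrative assumptions.

```python
def adjustment_factors(email_rates, official_rates):
    """Per-group factor = official rate / email-based rate.

    Both arguments map an age group to a migration rate for a reference
    country with reliable official data. Groups with a zero email rate or
    no official figure are skipped.
    """
    return {
        group: official_rates[group] / rate
        for group, rate in email_rates.items()
        if rate > 0 and group in official_rates
    }


def adjust(email_rates, factors):
    """Apply the factors to email-based rates for another country.

    Groups without a factor are left unchanged (factor 1.0).
    """
    return {
        group: rate * factors.get(group, 1.0)
        for group, rate in email_rates.items()
    }
```

As a made-up example, if the email data suggests a rate of 0.001 for people aged 65 and over in the reference country while official records show 0.004, the factor for that group is about 4, and email-based rates for that group elsewhere would be scaled up accordingly.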
“What
we addressed so far is only the tip of the iceberg,” says Emilio
Zagheni. With further fine-tuning of the adjustment factors and mining
of more digital data such as Twitter messages, more difficult questions
could be tackled. For instance, one could track short- and long-term
mobility patterns before and after a crisis such as the Fukushima
reactor accident in Japan.
Unquestionably, digital records give demographers the chance to gain a
more accurate picture of population dynamics in regions where they have
so far had to guess, says Zagheni. “This research has the most potential
in developing
countries, where the Internet spreads much faster than registration
programs develop.”