Computational Social Science

David Lazer1, Alex Pentland2, Lada Adamic3, Sinan Aral2,4, Albert-László Barabási5, Devon Brewer6, Nicholas Christakis1, Noshir Contractor7, James Fowler8, Myron Gutmann3, Tony Jebara9, Gary King1, Michael Macy10, Deb Roy2, Marshall Van Alstyne2,11

1. Harvard University, Cambridge, MA, USA. E-mail: david_lazer@harvard.edu
2. Massachusetts Institute of Technology, Cambridge, MA, USA.
3. University of Michigan, Ann Arbor, MI, USA.
4. New York University, New York, NY, USA.
5. Northeastern University, Boston, MA, USA.
6. Interdisciplinary Scientific Research, Seattle, WA, USA.
7. Northwestern University, Evanston, IL, USA.
8. University of California-San Diego, La Jolla, CA, USA.
9. Columbia University, New York, NY, USA.
10. Cornell University, Ithaca, NY, USA.
11. Boston University, Boston, MA, USA.

## Question 1: Network Science and Big Data

What are your impressions of the way network science has gone? A lot of it (especially since small worlds) increasingly focuses on the shape of the network rather than the attributes of nodes. Do you think that’s the right way forward? Is there anything big missing from network sociology, or a direction that you think it should be going in? Will “small data” networks be drowned out by big data?

How is network science progressing?

If you look back at my original paper with Strogatz it has “collective dynamics” right there in the title—it was always the relationship between structure and behavior that we thought was interesting, not structure for its own sake.

We also didn’t intend for the “small world” model that we proposed to be interpreted as a realistic model of network structure; rather we were trying to make a conceptual point that even subtle changes in micro-structure could have dramatic effects on macro-structure and hence possibly also macro-behavior.

I’ve also come to believe that modeling exercises that are unconstrained by data have a tendency to gravitate toward phenomena that are mathematically interesting, which is no guarantee of empirical relevance. Fortunately I think that in recent years we’ve seen more emphasis on studying both network structure and collective behavior empirically. (Note: the key point is that mathematical derivation must agree with empirical knowledge.)

What questions truly need big data?

For some questions, such as when we are interested in rare events or estimating tiny effect sizes, it is indeed necessary to have a very large number of observations; in some of our recent work on diffusion, for example, it turns out that a billion observations is not excessive. But for other questions, the scale of the data is much less important than its type or quality. Sometimes it matters that a sample is unbiased or representative; other times it is important to have proper randomization in order to infer causality; and other times still it is important simply that you have instrumented the outcome variable of interest. Regardless, the point is that the data is relevant to the question you’re asking, not how big or small it is.

## Question 2: Why does sociology have a “sociological imagination” but nothing as clear as “economic thinking”?

Economists have been pretty successful at clearly articulating a set of core concepts that have spread out into the broader world and form the basis for economic thinking: supply and demand; markets; externalities; people respond to incentives; perverse incentives; sunk costs; exogenous shocks; etc. Since your early work brought together core sociological concepts (namely social influence and the Matthew effect): i) Do you think sociology should try to reorganize itself around core, “Soc 101” concepts that every introductory class would cover? We often talk about the sociological imagination, but that is much less clear than economic thinking.

Social reality is too complex

This is a tough one. One of the things I’ve always liked about sociology is its embrace of multiple viewpoints, both in terms of theory and methodology. Personally I think social reality is too complex to be adequately accounted for by any single theoretical framework, a point that Merton made very eloquently many years ago in his article on middle-range theories. Unfortunately I don’t think his argument was properly understood at the time (e.g. by rational choice theorists) and I don’t think it is even now.

Perhaps that’s because simple universal frameworks are institutionally powerful even when they’re scientifically questionable. And that’s why it’s a tough question: because I think that one reason why economics is so much more influential than sociology in government, in the media, and in society, is precisely because economists can articulate a fairly coherent worldview that they can all (by and large) get behind, whereas sociologists can’t really agree on anything. Economists are therefore in a much better position to offer answers to questions that people care about, whereas sociologists tend to point out all the ways in which the question is more difficult than the questioner realized. Even if the sociologist’s response better reflects our true understanding of the world, it’s no surprise that most people would prefer to listen to the economist.

Sociologists should try to solve some nontrivial but solvable problems in order to build consensus.

That said, I wouldn’t advocate sociology trying to develop a single set of core concepts just to compete with economics. Rather I would propose that sociologists identify a small set of nontrivial real-world problems that we believe we can actually solve, or at least make some meaningful progress towards solving, and then demonstrating that progress. Identifying nontrivial but solvable social problems isn’t easy, nor do I think that solving problems is the only measure of progress in a discipline. So I certainly wouldn’t advocate that everyone drop what they’re doing to work on these problems, or even try to agree on what they should be. But I do think that being able to point to a set of problems that sociologists have arguably “solved” would greatly enhance our collective reputation and help us to attract more students.

ii) If you could pick a handful of sociological concepts and then have everyone outside of sociology learn them, and they’d be as familiar as the economic examples listed above, what would they be?

A book called Everything is Obvious: Once you Know the Answer

I wrote a book a few years ago called Everything is Obvious: Once you Know the Answer about the failures of commonsense reasoning and how we systematically ignore them. I think the contents of that book are pretty close to the list of concepts I would like everyone to understand, including: the nature of common sense itself; the difference between rational choice and behavioral conceptions of individual decision making; cumulative advantage and intrinsic unpredictability; the fallacy of the representative individual; the perils of ex-post explanations and dangers of “overfitting” to known outcomes; the consequences of overfitting for predictions about the future; and the implications of all of these problems for practical matters of predicting success, rewarding performance, deciding what is fair, and even what is knowable. I wouldn’t claim that these concepts constitute a core of knowledge comparable to core concepts in economics, nor do I think it would help students directly solve real-world problems of the kind I just advocated for, but I do think it would teach students some epistemic modesty and might eventually lead to more intelligent public discourse about these problems. I’m not sure the book has accomplished any of that, but that’s why I wrote it.

## Question 4: Try to compare sociology PhDs with industry demands

You made a transition from academia to the private sector. One of the ways that people have suggested improving the sociology PhD job market is to make work in the private sector a clearer option from the beginning. What do you think about sociology PhDs and the private sector? How do you think sociology PhDs could or should be going about it? What skills should they develop? How should they present themselves to companies?

Big data need theoretical knowledge to build valuable insights.

It’s true that companies are increasingly excited about extracting value from data, which has made data science a very in-demand skill set. I also think that companies are starting to appreciate that truly valuable insight requires more than just good computational and statistical chops—some degree of theoretical knowledge is also required in order to ask the right questions, define the right metrics, and avoid basic errors of sampling bias, causal inference etc. This latter trend is much earlier in its life cycle than the former, but I think as companies learn more about the complexities and compromises associated with “big data” they will increasingly demand data scientists with social scientific training.

The upside and downside of sociology PhDs going into industry.

So on the bright side I think that there is real potential for sociology PhDs to find intellectually rewarding work in industry. The downside is that in order to realize any value from their sociological training they also need a level of technical skill that is well beyond what students can expect to learn in the vast majority of sociology PhD programs. In our postdoc hiring we are starting to see a handful of strong candidates with sociology PhDs (up from zero just a couple of years ago), so that’s encouraging. But I suspect that these students mostly figured it out on their own or took it upon themselves to find the relevant courses in other departments. Which is fine, and if I were a current sociology PhD that’s what I would do, but I think it would be better for the field to provide a more systematic level of training.

## Question 5: The differences between writing for AJS and writing for Nature and Science

You’ve been very successful publishing in journals such as AJS, while targeting broader audiences through high-impact journals such as Nature and Science. How is writing for an AJS audience different from writing for Nature and Science? Where would you send your manuscript if it was rejected by Science? Do you think more sociologists should be looking to publish beyond our traditional journals in order to reach a broader community of scholars?

Different readers need different writing strategies.

Writing for AJS is completely different from writing for Nature and Science in almost every sense: length, style, treatment of related literature, acceptable methodology, conception of theory, presentation of results…everything. It’s also different from writing for computer science conference proceedings and physics journals, and all of those different outlets are also different from one another. I also occasionally write magazine articles, op-eds and trade books, and those are also all different in their own way. Learning to write for different disciplinary outlets and in different styles is time consuming and sometimes frustrating—because different groups of readers care about such different things. But I think it’s an effort that sociologists should make.

Try to speak to computer scientists in their own language

The fact is that with very few exceptions researchers in other disciplines don’t read sociology journal articles, and when they do they find them incredibly long and tedious. For example, all that effort that we devote to situating our work is completely lost on most computer scientists, so when they get to the results section they wonder why it was necessary to write 40 pages in order to explain one table of regression coefficients. Given that CS is a much bigger and more powerful discipline than sociology, if we want to have an impact on them or convince them that we are worth taking seriously, we will have to speak to them in their language and probably in their own publication venues.

A single high impact paper is worth many low impact papers.

On the other hand a single high impact paper is worth many low impact papers, so from a career perspective it’s not necessarily a waste of time to devote a year or two to getting something into a top journal. I do often wish that we could find a more efficient way to publish our research without compromising quality, and in that regard online-only, open-access journals like PLoS One and Sociological Science have some appealing properties. But the reality is that we live in a highly competitive world where attention is scarce; so my fear is that if we stopped using A-journal publications as a differentiator, the likely substitute (relentless self-promotion on social media anyone?) might be even worse.

## Question 6: How to choose between academia and industry

You spent several years at the Columbia Sociology Department. During your time there you mentored several prominent junior scholars including Baldassarri and Salganik. How was your experience being an academic sociologist and why did you decide to leave for industry? Will you consider returning to academia?

Why leave Columbia for Yahoo! Research?

I really loved my time at Columbia but around 2006 it started to dawn on me that, whether it liked it or not, sociology was going to become a computational science, much as biology had become a computational science in the early 1990s. All around us social data were exploding in volume and variety, from email to social networking services to online experiments of the kind I did with Matt (Salganik). It also occurred to me, however, that sociologists weren’t well equipped to handle this transition and that if we were going to make rapid progress we would need the computer scientists to help, and possibly psychologists and economists as well. Columbia is now pretty open to interdisciplinary collaborations of this sort, and their data science institute is a great example of that openness, but at the time it was very hard to see how it would work within the confines of traditional academic departments.

I was also having difficulty recruiting grad students with rigorous mathematical and computational backgrounds (as you noted there were some like Matt and Delia and also Gueorgi Kossinets, but they were really the exceptions), and raising funding to support the whole thing. Towards the end I felt like I was spending all my time writing grant proposals or sitting in meetings and almost no time doing actual research. So when Prabhakar Raghavan called me from Yahoo! to ask if I would come and help them set up a social science research unit it was very tempting. Even then I wasn’t sure I would do it, and certainly didn’t expect to do it for long, but it really worked out wonderfully and now I’ve been at Yahoo! and Microsoft Research for longer than I was at Columbia.

More time for actual research at Microsoft and Yahoo!

Perhaps surprisingly, I think the biggest difference between my experience at Columbia compared with Microsoft (or at Yahoo!) is that I now spend much more time doing and thinking about research. The other big difference is that, in contrast with most university faculty, I am surrounded (literally: we all sit in cubicles in an open plan office) by researchers from different disciplinary backgrounds including psychology, economics, physics, and computer science. One of my colleagues once observed that university departments comprise lots of people with similar training interested in different problems, whereas research labs like ours comprise lots of people with different training interested in the same problems. I think that’s roughly true, and it completely changes the nature of how we work, which is highly collaborative, interdisciplinary, and very problem oriented. That is not to say that we only do “applied” research; we do some of that but we also do a lot of basic science and publish all our work in all the same venues as our colleagues in universities. Rather what it means is that we are more concerned with the relevance of our work to real-world problems and less concerned about what particular disciplinary tradition it fits into.

Would I ever consider returning to academia? I don’t know. I’m very happy at Microsoft right now: I work with fantastic colleagues, we get amazing PhD student interns every summer, and we work on a wide variety of extremely interesting problems. It’s been a great experience and every day I’m grateful to have the job that I have. So although I wouldn’t rule out returning to academia one day I’m not in any hurry to leave.

## Question 7: Common sense and its importance for sociology

You recently wrote an article on common sense and its importance for sociology. What was the intuition for it?

Sociologists conflate causal explanations with explanations that “make sense” of outcomes they have observed

As I mentioned, I recently wrote a book about how people rely on common sense more than they realize, and in so doing end up persuading themselves that they understand much more about the world than they actually do. In the course of writing the book, it occurred to me that sociologists make many of the same mistakes that other people do. Just like other people, that is, sociologists conflate causal explanations with explanations that “make sense” of outcomes they have observed, unconsciously substitute representative individuals for collectives, overfit their explanations to past data, and fail to check their predictions. I didn’t belabor this point in the book because, as I mentioned earlier, I wanted it to be an advertisement for sociological thinking, not a critique of it. Nevertheless I thought the implication was pretty clear, so I was disappointed that some of my colleagues who liked the book’s appeal to non-sociologists seemed to think it had nothing to say to them. I decided that if I wanted them to get the message I would have to sharpen it up a lot, and also make it a bit more constructive; so that’s what I tried to do in that paper.

## Question 8: Changes needed in sociology departments and some tips for current sociology graduate students

If you were in charge of a Sociology department and could implement any change you’d like, what specific changes would you introduce to its graduate training program? Is there something that current sociology graduate students aren’t doing that they should be doing?

Changes needed in sociology departments

As I mentioned earlier, I think that a data science sequence (e.g. data acquisition, cleaning and management; basic concepts and programming languages for parallel computing; advanced statistics, including methods of causal inference; some basic machine learning; design and construction of web-based experiments) would be super useful for sociology graduate students, and would make them both better social scientists and also much more attractive to prospective employers. There are already a handful of courses of this sort being trialed in various places, including Stanford, Columbia, and Princeton, and sociology departments could work with their colleagues in other departments to pull together a reasonable sequence from existing pieces. It would take some effort and probably resources, but I don’t think it’s unfeasible.

Some tips for current sociology graduate students

In the meantime, as I mentioned earlier: if I were a current sociology grad student, I would be busy taking courses in computer science and statistics to augment my sociology training. I would also look around for any groups doing computational social science and ask to join them.

Joining a new interdisciplinary field like computational social science is an adventure.

The downside of new, interdisciplinary fields is that nobody really knows what is involved or what the standards are, so you have to be prepared to take some risks and also to feel out of your depth much of the time. The upside is that it can be incredibly stimulating, and there is the possibility of doing something genuinely new. I think computational social science is in that phase now, so it’s a great time for ambitious and creative students to dive in and see what they can do.

Sociological theory, if it is to advance significantly, must proceed on these interconnected planes: 1. by developing special theories from which to derive hypotheses that can be empirically investigated and 2. by evolving a progressively more general conceptual scheme that is adequate to consolidate groups of special theories.

— Robert K. Merton, Social Theory and Social Structure

…what might be called theories of the middle range: theories intermediate to the minor working hypotheses evolved in abundance during the day-by-day routine of research, and the all-inclusive speculations comprising a master conceptual scheme.

— Robert K. Merton, Social Theory and Social Structure

Our major task today is to develop special theories applicable to limited conceptual ranges — theories, for example, of deviant behavior, the unanticipated consequences of purposive action, social perception, reference groups, social control, the interdependence of social institutions — rather than to seek the total conceptual structure that is adequate to derive these and other theories of the middle range.

— Robert K. Merton

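## Total theoretical systems and middle-range theories

…we are able to form limited theories and to predict general tendencies and prevailing laws of causation; extended to all of humankind these would for the most part probably not hold, but confined to certain nations they have a degree of truth… By narrowing the range of observation, confining ourselves to certain types of communities, and stating the facts faithfully, it becomes possible to enlarge the scope of political theory. With this method we can increase the number of true political axioms drawn from the facts while making those axioms fuller, more vivid, and more solid. Unlike mere empty generalizations, they resemble Bacon’s middle axioms: generalized statements of fact that nevertheless stay close enough to practice to serve as guides in the affairs of life. (George Cornewall Lewis; translated here from a Chinese rendering)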

With the introduction of the middle range theory program, he advocated that sociologists should concentrate on measurable aspects of social reality that can be studied as separate social phenomena, rather than attempting to explain the entire social world. He saw both the middle-range theory approach and middle-range theories themselves as temporary: when they matured, as natural sciences already had, the body of middle range theories would become a system of universal laws; but, until that time, social sciences should avoid trying to create a universal theory.

– Mjøset, Lars. 1999. “Understanding of Theory in the Social Sciences.” ARENA working papers.

Network Diversity and Economic Development

Nathan Eagle, Michael Macy, Rob Claxton

Heterogeneous social ties may generate these opportunities from a range of diverse contacts (1, 2).

1. M. Newman, SIAM Rev. 45, 167 (2003).
2. S. Page, The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies (Princeton Univ. Press, Princeton, NJ, 2007).

• greater job mobility (14, 15)
• higher salaries (9, 16, 17)
• opportunities for entrepreneurship (18, 19)
• increased power in negotiations (20, 21).

Although these studies suggest the possibility that the individual-level benefits of having a diverse social network may scale to the population level, the relation between network structure and community economic development has never been directly tested (22).

As policy-makers struggle to revive ailing economies, understanding this relation between network structure and economic development may provide insights into social alternatives to traditional stimulus policies.

The communication network data were collected during the month of August 2005 in the UK. The data contain more than 90% of the mobile phones and greater than 99% of the residential and business landlines in the country.

The resulting network has $65 \times 10^6$ nodes, $368 \times 10^6$ reciprocated social ties, a mean geodesic distance (minimum number of direct or indirect edges connecting two nodes) of 9.4, an average degree of 10.1 network neighbors, and a giant component (the largest connected subgraph) containing 99.5% of all nodes (23).

Although the nature of this communication data limits causal inference, we were able to test the hypothesized correspondence between social network structure and economic development using the 2004 UK government’s Index of Multiple Deprivation (IMD), a composite measure of relative prosperity of 32,482 communities encompassing the entire country (24), based on income, employment, education, health, crime, housing, and the environmental quality of each region (25). Each residential landline number was associated with the IMD rank of the exchange in which it was located, as shown in Fig. 1.

We developed two new metrics to capture the social and spatial diversity of communication ties within an individual’s social network. We quantify topological diversity as a function of the Shannon entropy.

High diversity scores imply that an individual splits her time more evenly among social ties and between different regions.
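As a rough illustration of an entropy-based diversity score, the sketch below computes the Shannon entropy of how one person's call volume is spread across contacts and divides it by the log of the number of contacts, so the score lies between 0 and 1. The normalisation, the function name `social_diversity`, and the sample volumes are assumptions made for illustration; they are not taken from the notes above.

```python
import math

def social_diversity(call_volumes):
    """Normalised Shannon entropy of call volume across contacts (0 to 1)."""
    volumes = [v for v in call_volumes if v > 0]
    k = len(volumes)
    if k < 2:
        # With fewer than two contacts there is nothing to split time between.
        return 0.0
    total = sum(volumes)
    entropy = -sum((v / total) * math.log(v / total) for v in volumes)
    return entropy / math.log(k)

# Hypothetical call volumes (e.g. minutes per contact in one month).
even = social_diversity([10, 10, 10, 10])   # time split evenly: score near 1
skewed = social_diversity([37, 1, 1, 1])    # one dominant tie: much lower score
```

An even split across contacts gives the maximum score, while a network dominated by one tie scores low, which matches the reading that high diversity means time is divided more evenly.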

Diversity was constructed as a composite of Shannon entropy and Burt’s measure of structural holes, by using principal component analysis (PCA). A fractional polynomial was fit to the data.

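## New strategies for content production and content distribution in the new media environment

• Self-persona. This is the best re-enactment of Lippmann’s theories in the internet era. Everyone who uses WeChat, especially influential people, is inevitably shaping a self-image when using it, for example by sharing content that shows off a sense of fun, or through rants, self-mockery, “anti-chicken-soup” posts (posts that mock feel-good inspirational writing), and so on. From the content producer’s point of view, content should at the very least not undermine the user’s self-persona.
• Emotional release. Content that gets shared (that is, content that spreads well) usually needs to be somewhat controversial. The reason is simple: no controversy, no attention. Content producers should therefore not be afraid of controversy; the key is to stay in control of how far it goes. In many cases they should also learn to exploit controversy, or even create it.
• Saving time and effort. One convenient, or slightly “lazy”, way to produce content is to publish practical, information-dense pieces and roundups. A roundup is still original content, just a cleverer kind. Many people think along these lines: “My friends, especially the ones I care about, don’t share things indiscriminately.” In roundup articles, the number in the headline usually matters a lot. Content of this kind is good as long as it prompts people to act, whether by saving it or by sharing it.
• Tangible benefit. Prizes, offline events, and so on. Content releases can be combined with events large or small, for example large events held periodically and small online activities held routinely.

On September 24, 2016, Huang Zhimin (黄志敏), founder of 数据工场 (Data Workshop), former CTO of Caixin Online, and founder of the Caixin Data Visualization Lab, visited the School of Journalism and Communication at Nanjing University and gave students a lecture titled “From Data Journalism to the Data Workshop” (《从数据新闻到数据工场》). After joining Caixin Media in 2011, Huang was occupied with “rebuilding the R&D team and driving the new media transformation.” He moved into data journalism in June 2013 and won eleven awards, large and small, within three years. One of his representative works is the data journalism piece “Zhou Yongkang’s People and Wealth” (《周永康的人与财》), published by Caixin Media on July 29, 2014; its Chinese and English versions won, among others, an Asian journalism award (亚洲新闻奖), the 2014 Tencent Media Award for “Data Journalism of the Year,” and an Award of Excellence in multimedia design from the Society for News Design (SND).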

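# Segmenting Chinese text with jieba

jieba (Chinese name “结巴”, meaning “stutter”) aims to be “the best” Python component for Chinese word segmentation. It supports three main segmentation modes:

• Full mode scans out every string in the sentence that can form a word; it is very fast but cannot resolve ambiguity.
• Accurate mode tries to cut the sentence as precisely as possible and is suited to text analysis.
• Search engine mode builds on accurate mode by segmenting long words again to improve recall, and is suited to search engine tokenization.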

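# Simple text analysis with snownlp

SnowNLP is a library (package) written in Python that makes it easy to process Chinese text. It was inspired by TextBlob: because most natural language processing libraries target English, SnowNLP was written to handle Chinese conveniently. Unlike TextBlob, it does not use NLTK (an English text processing package); all of its algorithms are implemented from scratch, and it ships with some pre-trained dictionaries.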

Understanding individual human mobility patterns

Marta C. Gonzalez, Cesar A. Hidalgo & Albert-Laszlo Barabasi

# Abstract

We find that, in contrast with the random trajectories predicted by the prevailing Levy flight and random walk models, human trajectories show a high degree of temporal and spatial regularity, each individual being characterized by a time-independent characteristic travel distance and a significant probability to return to a few highly frequented locations. After correcting for differences in travel distances and the inherent anisotropy of each trajectory, the individual travel patterns collapse into a single spatial probability distribution, indicating that, despite the diversity of their travel history, humans follow simple reproducible patterns.

# Background

$$P(\Delta r) \sim \Delta r^{-(1+\beta)}$$

# Data

## D1 Dataset

This dataset was collected by a European mobile phone carrier for billing and operational purposes. It contains the date, time and coordinates of the phone tower routing the communication for each phone call and text message sent or received by 6 million customers. The dataset summarizes 6 months of activity.

Each tower serves an area of approximately $3\,\mathrm{km}^2$. Due to tower coverage limitations driven by geographical constraints and national frontiers, no jumps exceeding 1,000 km can be observed in the dataset.

We removed all jumps that took users outside the continental territory.

## D2 Dataset

Some services provided by the mobile phone carrier, like pollen and traffic forecasts, rely on the approximate knowledge of customer’s location at all times of the day. For customers that signed up for location dependent services, the date, time and the closest tower coordinates are recorded on a regular basis, independent of their phone usage.

We were provided such records for 1,000 users, among which we selected the group of users whose coordinates were recorded every two hours during an entire week, resulting in 206 users for which we have 10,613 recorded positions.

As these users were selected based on their actions (signed up to the service), in principle the sample cannot be considered unbiased, but we have not detected any particular bias for this data set.

# Observation

## The distribution of $\Delta r$

We measured the distance between a user’s positions at consecutive calls, denoted $\Delta r$.

$$P(\Delta r) = (\Delta r + \Delta r_0)^{-\beta}\exp(-\Delta r/\kappa) \tag{1}$$
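To see what the exponential cutoff in equation (1) does, the following sketch evaluates the unnormalised distribution at a few jump sizes and compares the tail with a pure power law. The parameter values (`BETA`, `DR0`, `KAPPA`) are placeholders chosen only for illustration and are not stated in these notes; the function name is also hypothetical.

```python
import math

def truncated_power_law(dr, beta, dr0, kappa):
    """Unnormalised P(dr) = (dr + dr0)^(-beta) * exp(-dr / kappa)."""
    return (dr + dr0) ** (-beta) * math.exp(-dr / kappa)

# Placeholder parameters for illustration only (distances in km).
BETA, DR0, KAPPA = 1.75, 1.5, 400.0

short = truncated_power_law(1.0, BETA, DR0, KAPPA)
medium = truncated_power_law(100.0, BETA, DR0, KAPPA)
far = truncated_power_law(2000.0, BETA, DR0, KAPPA)

# The same jump size under a power law with no cutoff.
pure_power_far = (2000.0 + DR0) ** (-BETA)
# Beyond kappa the exponential factor suppresses long jumps.
suppression = far / pure_power_far
```

For jumps well below `KAPPA` the power-law factor dominates; for jumps several times `KAPPA` the exponential factor makes them far rarer than a pure Levy flight would allow, which is what "truncated" means here.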

Equation (1) suggests that human motion follows a truncated Levy flight

However, the observed shape of $P(\Delta r)$ could be explained by three distinct hypotheses:

• first, each individual follows a Levy trajectory with jump size distribution given by equation (1) (hypothesis A);
• second, the observed distribution captures a population-based heterogeneity, corresponding to the inherent differences between individuals (hypothesis B);
• third, a population-based heterogeneity coexists with individual Levy trajectories (hypothesis C); hence, equation (1) represents a convolution of hypotheses A and B.

## The distribution of $r_g$

To distinguish between hypotheses A, B and C, we calculated the radius of gyration for each user (see Supplementary Information IV)

$$P(r_g) = (r_g + r_g^0)^{-\beta_r}\exp(-r_g/\kappa) \tag{2}$$

Question: how is $r_g$ computed, and how should the radius of gyration be interpreted?
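One standard reading of the radius of gyration is the root-mean-square distance of a user's recorded positions from their centre of mass, $r_g = \sqrt{\frac{1}{n}\sum_{i=1}^{n} |\vec r_i - \vec r_{cm}|^2}$: a small $r_g$ means the user stays close to one area, a large $r_g$ means the recorded positions are spread widely. The sketch below is a minimal version that assumes positions have already been projected to planar (x, y) coordinates in km; the function name and sample points are illustrative, and the paper's exact procedure is the one in its Supplementary Information.

```python
import math

def radius_of_gyration(positions):
    """RMS distance of (x, y) positions from their centre of mass."""
    n = len(positions)
    if n == 0:
        raise ValueError("need at least one position")
    # Centre of mass of all recorded positions.
    cx = sum(x for x, _ in positions) / n
    cy = sum(y for _, y in positions) / n
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in positions) / n
    return math.sqrt(mean_sq)

# Hypothetical user who shuttles between two places 5 km apart.
commuter = [(0.0, 0.0), (3.0, 4.0)] * 10
rg_commuter = radius_of_gyration(commuter)

# Hypothetical user who is always recorded at the same tower.
homebody = [(1.0, 1.0)] * 5
rg_homebody = radius_of_gyration(homebody)
```

Because every position is weighted equally, places visited often pull the centre of mass toward them, so $r_g$ reflects where time is actually spent and not only the farthest point reached.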

## Relationship Between ${r_g}$ and t

The longer we observe a user, the higher the chance that she/he will travel to areas not visited before.

We measured the time dependence of the radius of gyration for users whose gyration radius would be considered

• small ($r_g(T)$ <= 3 km),
• medium (20 < $r_g(T)$ <= 30 km) or
• large ($r_g(T)$ > 100 km)

at the end of our observation period (T = 6 months).
The results indicate that
the time dependence of the average radius of gyration of mobile phone users is better approximated by a logarithmic increase,
not only a manifestly slower dependence than the one predicted by a power law
but also one that may appear similar to a saturation process
(Fig. 2a and Supplementary Fig. 4).

## Relationship Between $P(\Delta r | r_g)$ and $\Delta r$

As the inset of Fig. 2b shows, users with small $r_g$ travel
mostly over small distances, whereas those with large $r_g$ tend to
display a combination of many small and a few larger jump sizes.

Once we rescaled the distributions with $r_g$ (Fig. 2b), we found that the
data collapsed into a single curve, suggesting that a single jump size distribution characterizes all users.

$$P(\Delta r | r_g) \sim r_g^{-\alpha} F(\Delta r / r_g)$$

where $\alpha \approx 1.2 \pm 0.1$ and $F(x)$ is an $r_g$-independent function with asymptotic behaviour, that is,
$F(x) \sim x^{-\alpha}$ for $x < 1$ and $F(x)$ rapidly decreases for $x \gg 1$
(in other words, $F$ follows a power law for $x < 1$ and falls off sharply for $x > 1$).

Therefore, the travel patterns of individual users may be approximated by a Levy flight up to a distance characterized by $r_g$.
Most important, however, is the fact that the individual trajectories are bounded beyond $r_g$;
thus, large displacements, which are the source of the distinct and anomalous nature of Levy flights, are statistically absent.

This indicates that the observed jump size distribution $P(\Delta r)$ is in fact
the convolution between the statistics of individual trajectories $P(\Delta r | r_g)$ and
the population heterogeneity $P(r_g)$, consistent with hypothesis C.
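Written out, and reading "convolution" as a mixture over the population, the statement above can be expressed as follows. The integration limits are an assumption of this sketch.

$$P(\Delta r) = \int_0^{\infty} P(\Delta r \mid r_g)\, P(r_g)\, \mathrm{d}r_g$$

Each user contributes jumps drawn from their own $P(\Delta r \mid r_g)$, weighted by how common that value of $r_g$ is in the population, so a heavy-tailed aggregate distribution can arise even when every individual trajectory is bounded.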

To uncover the mechanism stabilizing $r_g$, we measured the return probability for each individual $F_{pt}(t)$
(first passage time probability)
defined as the probability that a user returns to the position where he/she was first observed after t hours (Fig. 2c).

In contrast with the smooth decay that a random walk would produce, we found that the return probability is characterized by several peaks at 24 h, 48 h and 72 h,
capturing a strong tendency of humans to return to locations they visited before,
describing the recurrence and temporal periodicity inherent to human mobility.
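A minimal sketch of how a first-return time could be measured from an hourly sequence of locations follows. It assumes one location label per hour and counts a return only after the user has left the starting location; the function name and the toy tracks are hypothetical, not the authors' procedure.

```python
from collections import Counter

def first_return_hour(track):
    """First hour t > 0 at which the user is back at track[0] after leaving it.

    Returns None if the user never leaves or never comes back.
    """
    start = track[0]
    has_left = False
    for t, place in enumerate(track[1:], start=1):
        if place != start:
            has_left = True
        elif has_left:
            return t
    return None

# Toy hourly tracks; index 0 is the hour of first observation.
tracks = [
    ["home"] + ["work"] * 23 + ["home"],   # back after one day
    ["home"] + ["trip"] * 47 + ["home"],   # back after two days
    ["home"] + ["work"] * 23 + ["home"],   # back after one day
    ["home"] * 10,                         # never leaves, so no return is counted
]
returns = [first_return_hour(track) for track in tracks]
histogram = Counter(t for t in returns if t is not None)
```

Dividing the histogram counts by the number of users gives an empirical return probability per hour; daily routines show up as mass concentrated at multiples of 24.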

To explore if individuals return to the same location over and over,
we ranked each location on the basis of the number of times an individual was recorded in its vicinity

We find that the probability of finding a user at a location with a given rank L is well approximated by $P(L) \sim 1/L$, independent of the number of locations visited by the user (Fig. 2d).
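The ranking step can be sketched as below: count visits per location, sort by count, and convert counts to visit probabilities by rank. The visit log is made-up data constructed to follow $P(L) \sim 1/L$ exactly, and the function name is illustrative.

```python
from collections import Counter

def rank_frequencies(visits):
    """Return (rank, location, probability) tuples, most visited first."""
    counts = Counter(visits)
    total = sum(counts.values())
    ranked = counts.most_common()
    return [(rank, loc, n / total) for rank, (loc, n) in enumerate(ranked, start=1)]

# Made-up visit log with counts proportional to 1, 1/2, 1/3, 1/4.
visits = ["home"] * 60 + ["work"] * 30 + ["gym"] * 20 + ["cafe"] * 15
table = rank_frequencies(visits)

# Under P(L) ~ 1/L, the product rank * probability is constant across ranks.
products = [rank * p for rank, _, p in table]
```

On real data the products would only be roughly constant, so a log-log plot of probability against rank is the usual way to check the $1/L$ form.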

## Preferential Return

Therefore, people devote most of their time to a few locations,
although spending their remaining time in 5 to 50 places, visited with diminished regularity.

Therefore, the observed logarithmic saturation of $r_g(t)$ is rooted in the high degree of regularity in the daily travel patterns of individuals,
captured by the high return probabilities (Fig. 2b) to a few highly frequented locations (Fig. 2d).

Each user can be assigned to a well defined area, defined by home and workplace,
where she or he can be found most of the time.

Taken together, our results suggest that
the Levy statistics observed in bank note measurements capture a convolution of the population heterogeneity shown in equation (2) and the motion of individual users.
Individuals display significant regularity, because they return to a few highly frequented locations, such as home or work.
This regularity does not apply to the bank notes: a bill always follows the trajectory of its current owner; that is, dollar bills diffuse, but humans do not.

The fact that individual trajectories are characterized by the same $r_g$-independent two-dimensional probability distribution
suggests that key statistical characteristics of individual trajectories are largely indistinguishable after rescaling.
Therefore, our results establish the basic ingredients of realistic agent-based models,
requiring us to place users in numbers proportional to the population density of a given region
and assign each user an $r_g$ taken from the observed $P(r_g)$ distribution.
Using the predicted anisotropic rescaling, combined with the density function, the shape of which is provided as Supplementary Table 1,
we can obtain the likelihood of finding a user in any location.
Given the known correlations between spatial proximity and social links, our results could help quantify the role of space in network development and evolution and improve our understanding of diffusion processes.