This afternoon, Professor Wei Ran of the University of South Carolina visited our school for an exchange. I had the chance to interview Professor Wei and ask him about some of my confusions about doing research and writing papers. I learned a lot and record it here.

Paper writing in China leans toward discursive argument, whereas overseas academic writing follows a strict formula: topic, significance, data, methods, results, and so on.

Overseas academic research emphasizes a kind of public culture. For example, journal editors usually hold no formal appointment or salary; the role is only an honorary title, and professors volunteer their time and effort to review papers. This process is in fact a process of building an academic culture. Every researcher should strive to understand, respect, and take part in it, and should respect the review comments that experts provide on their own time.

Are there tricks to writing papers and getting them published? Of course there are, but what I would rather talk about are the elements of a good study. The first is a strong sense of problem (sensitivity to research questions and to their social significance and importance). The second is a solid theoretical foundation: for a good research question, the researcher must be able to approach it from the perspective of communication theory or other relevant theories, which requires a great deal of accumulated reading. The third is scientifically rigorous methods, whether quantitative or qualitative. The last is good writing habits. A good paper is usually not written only when one suddenly decides to write; it is written a little every day. Researchers must build the habit of writing a few hundred words daily, recording what they read, think, and feel. Only by accumulating steadily can one produce a good paper relatively quickly when a call for papers arrives, instead of rushing one out just before the deadline.

A takeaway for me: upgrade my habit of keeping a diary. Record not only daily trivia and everyday thoughts, but also cultivate a habit of everyday academic writing, putting my thinking about academic questions into academic language. This is also consistent with the writing section of the TOEFL exam I am currently preparing for.

Finally, Professor Wei introduced the Chinese Communication Association.

Among overseas communication research communities, the earliest (in the 1980s) was mainly the Koreans (the Korean Association of Communication). Professor Chin-Chuan Lee (李金铨) of City University of Hong Kong followed the Korean model and founded the CCA (Chinese Communication Association). It began mainly as a social organization (the CCA reception); as it grew, it also began to consider academic services and to play a bridging role, for example holding workshops back in China, building collaboration between domestic and overseas scholars, and providing services, thereby sustaining an active, service-oriented academic community. One major principle of the CCA is not to work only on Chinese research questions: one should not study only China simply because one is Chinese.

While preparing the interview outline beforehand, I read two of Professor Wei Ran's papers:

  • Wei, R. (2014). Texting, tweeting, and talking: Effects of smartphone use on engagement in civic discourse in China. Mobile Media & Communication, 2(1), 3-19.
  • Wei, R., Lo, V. H., Xu, X., Chen, Y. N. K., & Zhang, G. (2014). Predicting mobile news use among college students: The role of press freedom in four Asian cities. new media & society, 16(4), 637-654.

Both papers use quantitative (survey) methods to study mobile phone use. The main finding of the first paper is that, compared with traditional state-controlled media, smartphone use effectively increases political discussion and political participation; talking politics in private, extensive use of the smartphone, and mobile tweeting (using Weibo on the phone) were the three main positive predictors of online political discussion. The second paper finds that reading news on mobile phones and using Weibo-like microblogging tools on phones differ greatly across the four cities studied (Shanghai, Hong Kong, Taipei, and Singapore), and that press freedom is negatively correlated with mobile news use and mobile microblogging.

A brief introduction to Professor Wei Ran:

Wei Ran, a native of Henan, is a tenured full professor and doctoral supervisor at the School of Journalism and Mass Communications of the University of South Carolina, USA, and head of its advertising and public relations program. He graduated from Shanghai International Studies University in 1986, majoring in English and international journalism, and received a master's degree from the University of Wales (UK) in 1990 and a PhD from Indiana University (USA) in 1995. He has served as a reporter at China Central Television, an assistant professor at the School of Journalism and Communication of the Chinese University of Hong Kong, and a senior visiting scholar at the School of Communication and Information of Nanyang Technological University, Singapore. He is currently associate editor of the US journal Mass Communication & Society (an SSCI journal), special guest editor of the Singapore SSCI journal 《亚洲传媒》, and an editorial board member of five communication journals in the United States and Asia. An internationally recognized expert on mobile media research, he has repeatedly won outstanding paper awards in the US field of journalism and mass communication. He is also a guest professor at Communication University of China and Henan University, an overseas academic assessor for City University of Hong Kong, and an external reviewer for the University of Hong Kong.

http://smd.sjtu.edu.cn/teacher/detail/id/23

Ran Wei, PhD, is the Gonzales Brothers Professor of Journalism in the School of Journalism & Mass Communications at the University of South Carolina, USA. A former TV journalist, active media consultant, and incoming Editor-in-Chief of Mass Communication & Society, his research focuses on media effects in society and digital new media, including wireless computing and mobile media.

https://www.sc.edu/study/colleges_schools/cic/faculty-staff/wei_ran.php

Webster, J. G., & Ksiazek, T. B. (2012). The dynamics of audience fragmentation: Public attention in an age of digital media. Journal of Communication, 62(1), 39-56.

Abstract

Audience fragmentation is often taken as evidence of social polarization.

We offer a theoretical framework for understanding fragmentation and advocate for more audience-centric studies.

We find extremely high levels of audience duplication across 236 media outlets, suggesting overlapping patterns of public attention rather than isolated groups of audience loyalists.

Three factors that shape fragmentation

Media Providers

The most obvious cause of fragmentation is a steady growth in the number of media outlets and products competing for public attention.

Media Users

What media users do with all those resources is another matter. Most theorists expect them to choose the media products they prefer. Those preferences might reflect user needs, moods, attitudes, or tastes, but their actions are "rational" in the sense that they serve those psychological predispositions.

Media Measures

Media measures exercise a powerful influence on what users ultimately consume and how providers adapt to and manage those shifting patterns of attendance. Indeed, information regimes can themselves promote or mitigate processes of audience fragmentation.

Three different ways of studying fragmentation

Media-centric fragmentation

An increasingly popular way to represent media-centric data is to show them in the form of a long tail (Anderson, 2006).

Concentration can be summarized with any one of several statistics, including Herfindahl–Hirschman indices (HHIs) and Gini coefficients (see Hindman, 2009; Yim, 2003).

Herfindahl–Hirschman indices (HHIs)

The Herfindahl index (also known as Herfindahl–Hirschman Index, or HHI) is a measure of the size of firms in relation to the industry and an indicator of the amount of competition among them. It is defined as the sum of the squares of the market shares of the firms within the industry (sometimes limited to the 50 largest firms), where the market shares are expressed as fractions. The result is proportional to the average market share, weighted by market share. As such, it can range from 0 to 1.0, moving from a huge number of very small firms to a single monopolistic producer.

$HHI = \sum_{i=1}^{N}(X_i/X)^2 = \sum_{i=1}^{N}S_i^2$

  • $X_i$ - the size of firm $i$
  • $X$ - the total size of the market
  • $S_i$ - the market share of firm $i$ (a short computation sketch follows)
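
As a quick illustration (my own sketch, not from the paper), the index can be computed from a vector of shares in a couple of lines of Python; the share values below are made up:

import numpy as np

shares = np.array([0.4, 0.3, 0.2, 0.1])   # hypothetical market shares, summing to 1
hhi = np.sum(shares ** 2)                 # sum of squared shares
print(hhi)                                # 0.30 here; 1.0 would mean a single monopoly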

Gini Coefficients

The Gini coefficient is a measure of statistical dispersion intended to represent the income or wealth distribution of a nation’s residents, and is the most commonly used measure of inequality. A Gini coefficient of zero expresses perfect equality, where all values are the same (for example, where everyone has the same income). A Gini coefficient of 1 (or 100%) expresses maximal inequality among values (e.g., for a large number of people, where only one person has all the income or consumption, and all others have none, the Gini coefficient will be very nearly one).

If $x_i$ is the wealth or income of person $i$, and there are $n$ persons, then the Gini coefficient $G$ is given by:

$G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}|x_i-x_j|}{2n\sum_{i=1}^{n}x_i}$
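
Again purely as an illustration and not from the paper, the pairwise-difference formula above translates directly into Python; the income vectors are invented:

import numpy as np

def gini(x):
    x = np.asarray(x, dtype=float)
    # sum of |x_i - x_j| over all ordered pairs, normalized by 2 * n * sum(x)
    pairwise = np.abs(x[:, None] - x[None, :]).sum()
    return pairwise / (2.0 * len(x) * x.sum())

print(gini([1, 1, 1, 1]))    # 0.0: perfect equality
print(gini([0, 0, 0, 10]))   # 0.75 for n = 4; approaches 1 as n grows with one holder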

User-centric fragmentation

This approach focuses on each individual’s use of media, which is fragmentation at the microlevel. Most of the literature on selective exposure would suggest that people will become specialized in their patterns of consumption.

It seems that user-centric studies are generally designed to describe typical users or identify types of users. They rarely ‘‘scale-up’’ to the larger issues of how the public allocates its attention across media.

Audience-centric fragmentation

A useful complement to the media- and user-centric approaches described above would be an ‘‘audience-centric’’ approach. This hybrid approach is media-centric in the sense that it describes the audience for particular media outlets.

A network analytic approach to fragmentation

How to Build Network

  • Node - a media outlet
  • Edge - duplication of audience (percent) between a pair of outlets; how is it computed?

The enlarged portion shows the link (i.e., the level of duplication) between a pair of nodes, NBC Affiliates and the Yahoo! brand, where 48.9% of the audience watched NBC and also visited a Yahoo! Web site during March 2009.

Network Metrics

  • expected duplication v.s. observed duplication

Our approach was to compare the observed duplication between two outlets to the "expected duplication" due to chance alone. Expected duplication was determined by multiplying the reach of each outlet. So, for example, if outlet A had a reach of 30% and outlet B a reach of 20%, then 6% of the total audience would be expected to have used each just by chance. If the observed duplication exceeded the expected duplication, a link between two outlets was declared present (1); if not, it was absent (0) (see Ksiazek, 2011, for a detailed treatment of this operationalization).
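
A minimal sketch of this dichotomization rule, reusing the reach figures from the example above; the observed value is invented:

reach_a, reach_b = 0.30, 0.20   # reach of outlets A and B, as in the example above
observed = 0.10                 # observed share of the audience using both (invented)

expected = reach_a * reach_b    # 0.06: duplication expected by chance alone
link = 1 if observed > expected else 0
print(expected, link)           # expected is 0.06, so the link is declared present (1)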

  • degree score: converted into percent

For each outlet, the number of links is totaled to provide a degree score. For ease of interpretation, we converted these totals to percentages. So, for example, if an outlet had links to all the other 235 outlets, its degree score was 100%. If it had links to 188 outlets, its degree score was 80%.

  • network centralization score (degree centrality?)

To provide a summary measure across the entire network of outlets, we computed a network centralization score. This score summarizes the variability or inequality in the degree scores of all nodes in a given network (Monge & Contractor, 2003) and is roughly analogous to the HHI (see Hindman, 2009; Yim, 2003) that measures concentration in media-centric research. Network centralization scores range from 0% to 100%. In this application, a high score indicates that audiences tend to gravitate to a few outlets (concentration), whereas a low score indicates that audiences spread their attention widely across outlets (fragmentation).
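
To make the two summary measures concrete, here is a small sketch on a toy adjacency matrix of duplication links. I use Freeman's degree centralization for the summary score, which is a standard choice but not necessarily the exact formula the authors used:

import numpy as np

def degree_scores(adj):
    # adj: symmetric 0/1 matrix of duplication links between outlets
    n = adj.shape[0]
    degrees = adj.sum(axis=1)               # number of links per outlet
    return degrees / float(n - 1) * 100     # expressed as percentages, as in the paper

def degree_centralization(adj):
    # Freeman's degree centralization: how close the network is to a perfect star
    n = adj.shape[0]
    degrees = adj.sum(axis=1)
    return (degrees.max() - degrees).sum() / float((n - 1) * (n - 2))

# toy 4-outlet network: outlet 0 is linked to everyone, the others only to outlet 0
adj = np.array([[0, 1, 1, 1],
                [1, 0, 0, 0],
                [1, 0, 0, 0],
                [1, 0, 0, 0]])
print(degree_scores(adj))          # [100.0, 33.3, 33.3, 33.3]
print(degree_centralization(adj))  # 1.0: a maximally centralized (star) network

In the paper's terms, a score near 100% would mean that audiences gravitate to a few outlets, while a very low score, like the 0.86% reported in the results below, means attention is spread widely across outlets.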

Result

The distribution shows that almost all 236 outlets have high levels of audience duplication with all other outlets (i.e., degree scores close to 100%). Furthermore, the network centralization score is 0.86%. This suggests a high level of equality in degree scores and thus evidence that the audience of any given outlet, popular or not, will overlap with other outlets at a similar level.

For instance, the Internet brand Spike Digital Entertainment reaches only 0.36% of the population, but its audience overlaps with close to 70% of the other outlets. Although we do not have data on individual media repertoires, these results suggest that repertoires, though quite varied, have many elements in common. The way users move across the media environment does not seem to produce highly polarized audiences.

The graph_tool module provides a Graph class and several algorithms that operate on it. The internals of this class, and of most algorithms, are written in C++ for performance, using the Boost Graph Library.

Python modules are usually very easy to install, typically requiring nothing more than pip install for basically any operating system. For graph-tool, however, the situation is different. This is because, in reality, graph-tool is a C++ library wrapped in Python, and it has many C++ dependencies such as Boost, CGAL and expat, which are not installable via Python-only package management systems such as pip. Because the module lives between the C++ and Python worlds, its installation is done more like a C++ library rather than a typical Python module. This means it inherits some of the complexities common to the C++ world that some Python users do not expect.

The easiest way to get going is to use a package manager, for which the installation is fairly straightforward. This is the case for some GNU/Linux distributions (Arch, Gentoo, Debian & Ubuntu) as well as for MacOS users using either Macports or Homebrew.

Reference

Use brew to install graph-tool on MacOS

brew tap homebrew/science
brew install graph-tool
brew update
# Error: /usr/local is not writable. You should change the ownership
# and permissions of /usr/local back to your user account:
#   sudo chown -R $(whoami) /usr/local
sudo chown -R zhicongchen /usr/local
brew update
# After brew update:
# ==> Migrated HOMEBREW_REPOSITORY to /usr/local/Homebrew!
# Homebrew no longer needs to have ownership of /usr/local. If you wish you can
# return /usr/local to its default ownership with:
#   sudo chown root:wheel /usr/local
sudo chown root:wheel /usr/local

brew installed another Python at /usr/local/Cellar/python/2.7.13/bin/python, so we can run that Python to use graph-tool.

# run the Homebrew-installed Python:
/usr/local/Cellar/python/2.7.13/bin/python
# then, inside that interpreter, import graph-tool:
from graph_tool.all import *
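
If the import succeeds, a minimal smoke test along the following lines (my own sketch, not taken from the graph-tool documentation) confirms that the module works:

from graph_tool.all import Graph

g = Graph(directed=False)        # build a tiny undirected graph
v1 = g.add_vertex()
v2 = g.add_vertex()
g.add_edge(v1, v2)
print g.num_vertices(), g.num_edges()   # expect: 2 1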

It seems that graph-tool cannot yet be installed directly into Anaconda, as noted below:

You cannot really mix homebrew with anaconda without defeating the whole purpose of isolated environments. I would try to find a way to install graph-tools directly into your anaconda environment. It seems that there are packages on anaconda cloud. But I am not sure how easy it is to install those. – cel Dec 24 ‘15 at 7:50

http://stackoverflow.com/questions/34447563/force-home-brew-to-install-graph-tools-to-the-anaconda-python-interpretor

Question

There is an exponentially truncated power-law equation in the article below:

Gonzalez, M. C., Hidalgo, C. A., & Barabasi, A. L. (2008). Understanding individual human mobility patterns. Nature, 453(7196), 779-782.

like this:

$P(r_g) = (r_g + r_g^0)^{-\beta_r} \exp(-r_g/\kappa)$

It is an exponentially truncated power law. There are three parameters to be estimated: rg0, beta, and K. Now we have obtained several users' radius of gyration (rg).

You can get the rg and prg data directly, as follows:

import numpy as np

rg = np.array([ 20.7863444 , 9.40547933, 8.70934714, 8.62690145,
7.16978087, 7.02575052, 6.45280959, 6.44755478,
5.16630287, 5.16092884, 5.15618737, 5.05610068,
4.87023561, 4.66753197, 4.41807645, 4.2635671 ,
3.54454372, 2.7087178 , 2.39016885, 1.9483156 ,
1.78393238, 1.75432688, 1.12789787, 1.02098332,
0.92653501, 0.32586582, 0.1514813 , 0.09722761,
0. , 0. ])
prg = np.array([ 0. , 0.03448276, 0.06896552, 0.10344828, 0.13793103,
0.17241379, 0.20689655, 0.24137931, 0.27586207, 0.31034483,
0.34482759, 0.37931034, 0.4137931 , 0.44827586, 0.48275862,
0.51724138, 0.55172414, 0.5862069 , 0.62068966, 0.65517241,
0.68965517, 0.72413793, 0.75862069, 0.79310345, 0.82758621,
0.86206897, 0.89655172, 0.93103448, 0.96551724, 1. ])

How can I use these rg data to estimate the three parameters above? I hope to solve this using Python.

Answer

Following @Michael's suggestion, we can solve the problem using scipy.optimize.curve_fit:

from scipy import optimize
# uses numpy (np) and the rg, prg arrays defined above

# exponentially truncated power law: P(rg) = (rg + rg0)^(-beta) * exp(-rg / K)
def func(rg, rg0, beta, K):
    return (rg + rg0) ** (-beta) * np.exp(-rg / K)

# fit the three parameters, starting from the initial guess p0
popt, pcov = optimize.curve_fit(func, rg, prg, p0=[1.8, 0.15, 5])
print popt   # estimated rg0, beta, K
print pcov   # covariance matrix of the estimates

The results are given below:

[ 1.04303608e+03 3.02058550e-03 4.85784945e+00]
[[ 1.38243336e+18 -6.14278286e+11 -1.14784675e+11]
[ -6.14278286e+11 2.72951900e+05 5.10040746e+04]
[ -1.14784675e+11 5.10040746e+04 9.53072925e+03]]

Reference

scipy.optimize.curve_fit(f, xdata, ydata, p0=None, sigma=None, absolute_sigma=False, check_finite=True, bounds=(-inf, inf), method=None, jac=None, **kwargs)

Use non-linear least squares to fit a function, f, to data.

Assumes ydata = f(xdata, *params) + eps

Examples
>>> import numpy as np
>>> from scipy.optimize import curve_fit
>>> def func(x, a, b, c):
...     return a * np.exp(-b * x) + c
>>>
>>> xdata = np.linspace(0, 4, 50)
>>> y = func(xdata, 2.5, 1.3, 0.5)
>>> ydata = y + 0.2 * np.random.normal(size=len(xdata))
>>>
>>> popt, pcov = curve_fit(func, xdata, ydata)

To do data-driven social science research, one needs to work meticulously with small data in order to develop intuition and insight about big data.

I should not spend too much time on scaling up the data, whether with Python, shell scripts, Hadoop, or Spark. That is engineering work, and the direction I want to develop in now is research, not engineering. What I need is to produce results, stories, and papers quickly, so I should work on the data carefully, dig out the stories, and at the same time keep reflecting on and scrutinizing big data.

81 days of data, more than 100 GB: an awkward data size. I have spent a lot of time fiddling with this dataset yet keep circling around its periphery, still knowing nothing about the characteristics of the data itself. At this rate, when will I ever truly understand it?
So I will start with a single day's data instead, seeing the large through the small and working bottom-up.

As a social scientist facing "big data," the core competence is still understanding, familiarity with, and sensitivity to the data, together with deep thinking about and insight into social problems.

The dataset is composed of 81 days’ user records of Baidu Reading. Each file has columns as below:

['uid', 'act_id', 'time', 'time_trigger', 'book_id', 'platform']

Now I plan to track how attention flows across the different books people read. What I need to do is extract, for every user in every daily file, a distinct book sequence (ordered by 'time_trigger') along with the duration spent on each book. That is obviously a parallelizable job.

I wrote a function that gets one user's attention flows, defined as follows:

def get_one_attention_df(uid, df):
    # codes ...
    return attention_df
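
The body of the function is elided above ('# codes ...'). Purely for illustration, here is a hypothetical sketch of what such a function might look like, assuming the goal is to collapse consecutive records on the same book_id into one row with a rough duration; the column names follow the schema listed earlier, but nothing here is the original implementation:

import pandas as pd

def get_one_attention_df_sketch(uid, df):
    # hypothetical sketch only, not the original get_one_attention_df
    user = df[df['uid'] == uid].sort_values('time_trigger')
    # label runs of consecutive records that stay on the same book
    run_id = (user['book_id'] != user['book_id'].shift()).cumsum()
    records = []
    for _, run in user.groupby(run_id):
        records.append({
            'uid': uid,
            'book_id': run['book_id'].iloc[0],
            # crude duration: the span of trigger times within the run
            'duration': run['time_trigger'].max() - run['time_trigger'].min(),
        })
    return pd.DataFrame(records)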

I once tried running this function for every user in every file sequentially. It took far too long: after nine hours of computation, only five files had been processed, so I gave up.

A better approach is to do this job in parallel, which takes three steps.

  • The Python code is designed to process only one day's file.
  • A Linux shell script is used to launch and manage each Python process.
  • The logging package is used to record real-time output and status.

The advantage of the logging module is that it writes real-time information to files regardless of whether your SSH session to the server is still open. If you use print to record output, it can no longer be seen once the SSH session ends; and if you write output to files yourself, it does not appear immediately because of file I/O buffering.

Python code

import sys
import numpy as np
import pandas as pd
# get_one_attention_df and logger are defined as shown in the other snippets

def main():
    path = './baidu_yuedu/reading_merge/'
    filename = sys.argv[1]   # e.g. a date string passed in by the shell script below
    df = pd.read_csv(path + filename + '.txt', encoding='gb18030', delimiter='\x01',
                     names=['uid', 'act_id', 'time', 'time_trigger', 'book_id', 'platform'])
    df = df.dropna()
    # drop rows whose ['time'] field is not numeric
    df = df[df['time'].apply(lambda x: type(x) in [int, np.int64, float, np.float64])]
    uids = df['uid'].unique()
    logger.info(filename + ' read !')
    # process the first user, then append the remaining users one by one
    oneday_attention_df = get_one_attention_df(uids[0], df)
    uidcount = 1.0
    for uid in uids[1:]:
        uidcount += 1
        if uidcount % 10000 == 1:
            # log progress (percentage of users processed) every 10,000 users
            logger.info(filename + ' ' + str(uidcount / len(uids) * 100))
        oneday_attention_df = oneday_attention_df.append(get_one_attention_df(uid, df))
    oneday_attention_df.to_csv(filename + '_attention.csv', index=False)
    logger.info(filename + ' done !')

if __name__ == "__main__":
    main()

Linux Shell

# Define main
main(){
    # if the merge directory does not exist:
    # mkdir merge
    # launch one background Python process per daily file (here: files matching 2016101*)
    for file in $(ls ./baidu_yuedu/reading_merge/ | grep 2016101)
    do
        nohup python onedayattentionflow.py ${file:0:8} &
    done
}
# Invoke main
main

Python Control Module

import os
import time
import Queue

# all 81 daily files: October, November, and the first 20 days of December 2016
date_list = []
for i in range(20161001, 20161032):
    date_list.append(i)
for i in range(20161101, 20161131):
    date_list.append(i)
for i in range(20161201, 20161221):
    date_list.append(i)

day_queue = Queue.Queue(maxsize=len(date_list))
for i in date_list:
    day_queue.put(i)

def run_onedayattentionflow():
    # list the currently running "python oneday..." processes
    a = os.popen("ps aux | grep 'python oneday'").readlines()[:-2]
    while(1):
        if len(a) < 20:   # keep at most roughly 20 worker processes alive
            if day_queue.empty():
                break
            else:
                day = day_queue.get()
                os.system("nohup python onedayterminal.py %d &" % day)
        else:
            time.sleep(10)   # wait a bit before polling again, to avoid a busy loop
        a = os.popen("ps aux | grep 'python oneday'").readlines()[:-2]

def main():
    run_onedayattentionflow()

if __name__ == "__main__":
    main()

logging

import logging

# create a logger (filename is the daily file name, as in main() above)
logger = logging.getLogger('attention flow')
logger.setLevel(logging.DEBUG)
# create a file handler
fh = logging.FileHandler(filename + '.log')
fh.setLevel(logging.DEBUG)
# create a console handler
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
# define the format of the handlers
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
fh.setFormatter(formatter)
ch.setFormatter(formatter)
# bind the handlers to the logger
logger.addHandler(fh)
logger.addHandler(ch)
logger.info(filename + ' begin !')

I used watch to show the content of log files every 10 seconds.

I started several Python processes, each handling one day's file, and then all 24 CPUs of my server were at work. That made me very satisfied and excited!

Computational Social Science

David Lazer (1), Alex Pentland (2), Lada Adamic (3), Sinan Aral (2,4), Albert-László Barabási (5), Devon Brewer (6), Nicholas Christakis (1), Noshir Contractor (7), James Fowler (8), Myron Gutmann (3), Tony Jebara (9), Gary King (1), Michael Macy (10), Deb Roy (2), Marshall Van Alstyne (2,11)

  1. Harvard University, Cambridge, MA, USA. E-mail: david_lazer@harvard.edu
  2. Massachusetts Institute of Technology, Cambridge, MA, USA.
  3. University of Michigan, Ann Arbor, MI, USA.
  4. New York University, New York, NY, USA.
  5. Northeastern University, Boston, MA, USA.
  6. Interdisciplinary Scientific Research, Seattle, WA, USA.
  7. Northwestern University, Evanston, IL, USA.
  8. University of California-San Diego, La Jolla, CA, USA.
  9. Columbia University, New York, NY, USA
  10. Cornell University, Ithaca, NY, USA.
  11. Boston University, Boston, MA, USA.

Original post: http://blog.sciencenet.cn/home.php?mod=space&uid=64458&do=blog&id=229840

Translated by: Xu Xiaoke (xiaokeeie@gmail.com)

We live in networks. We regularly check e-mail, make mobile phone calls wherever we are, swipe cards to ride public transit, and use credit cards to make purchases. In public places there may be cameras monitoring our behavior; in hospitals, our medical records are kept in digital form. We may also write blogs for everyone to read and maintain friendships through online social networks. All of these activities leave digital footprints, and when these traces are brought together they form a complex picture of individual and collective behavior, one that may change our understanding of our lives, our organizations, and our societies.

Although the ability to collect and analyze massive amounts of data has already transformed fields such as biology and physics, the emergence of a data-driven "computational social science" has been much slower. The leading journals in economics, sociology, and political science pay little attention to this field, even though computational social science is already being pursued inside Internet companies such as Google and Yahoo and in government agencies such as the US National Security Agency. Computational social science thus risks becoming either the exclusive domain of private companies and government agencies, or a field in which a few privileged researchers publish papers based on proprietary data that others cannot evaluate or replicate. Neither scenario serves the long-term public interest in the accumulation, verification, and dissemination of knowledge.

Based on an open academic environment, what is the value of computational social science? Can it deepen society's understanding of individual and collective behavior? What are the obstacles to its development?

To date, research on human relationships has relied mainly on one-shot, self-reported data. New technologies such as video surveillance (1), e-mail, and "smart" name badges not only capture interactions at different moments over time but also provide information on both the structure and the content of relationships. For example, interactions within a group can be studied with e-mail data, and questions about how people's communication changes over time can be examined: do work groups settle down and change little, or do their relationships shift dramatically over time (2)? What interaction patterns correspond to productive groups and individuals (3)? Face-to-face group interactions can be assessed with "sociometric" methods: electronic devices worn by people can continuously capture physical proximity, location, movement, and many other aspects of individual behavior and collective interaction. Such data can help answer many interesting questions, such as patterns of proximity and communication within an organization, and the information-flow patterns associated with high-performing individuals or groups (4).

We can also learn about society's "macro" social networks (5) and how they evolve over time. Telephone companies hold records of calling patterns among their customers going back years, and e-commerce portals such as Google and Yahoo hold data on the instant-message exchanges of their users. Can such data paint a comprehensive picture of societal communication patterns? Which aspects of these interactions affect economic productivity or public health? In any case, tracking human activity has become easy (6). Mobile phones offer a way to track people's movements and physical proximity at large scale and over long periods (7). Such data may yield useful epidemiological insights, for example into how a pathogen such as an influenza virus spreads through a population via physical contact.

The Internet offers an entirely different avenue for understanding what people are saying and how they connect to one another (8). For example, in the political season that has just passed, one only had to trace the spread of arguments, rumors, political opinions, and other cues through the blogosphere (9), together with individuals' "surfing" behavior on the Internet (10), to see clearly what each voter cared about. Virtual worlds, which naturally keep complete records of every player's behavior, offer even more possibilities for research, including experiments that would be impossible or unacceptable in the real world (11). Similarly, online social networking sites offer a unique way to understand how a person's position in the network affects everything from their feelings to their moods and health (12). Natural language processing keeps improving our ability to organize and analyze the enormous volumes of text from the Internet and other sources (13).

In short, computational social science is leveraging our capacity to collect and analyze data at an unprecedented breadth, depth, and scale. Obstacles that are not easy to overcome, however, stand in the way of this progress. Existing methods cannot handle the moment-to-moment interactions and locations of billions of individual records spanning entire populations. For example, existing social network theory is typically built on data from a single "snapshot" of a few dozen people; what can it tell us about the interrelations among the locations, commercial transactions, and everyday communications of millions of people? The flood of data on human interaction can offer new quantitative insights into collective human behavior, but our current research frameworks cannot yet handle it.

Data from the blogosphere. The figure shows the link structure among political blog communities (from 2004). Red lines represent conservative blogs and blue lines liberal blogs; orange lines run from liberal to conservative blogs, and purple lines from conservative to liberal ones. The size of each blog reflects the number of other blogs that link to it.

There are also institutional obstacles to the advance of computational social science. In terms of approach, the questions explored in physics and biology lend themselves more readily to observation and intervention: in the process of discovery, quarks and cells neither mind when we uncover their secrets nor protest when we alter their environments. In terms of infrastructure, the gap between sociology and computational social science is far wider than the gap between biology and computational biology, mainly because computational social science requires distributed monitoring, permission for tracking, and encoding, for which sociology has almost no resources; and in terms of both physical distance and administrative organization, sociology departments are farther from engineering and computer science departments than most other sciences are.

Perhaps the thorniest challenge is how to keep data accessible while protecting individual privacy. Much of the data is proprietary (mobile phone records, commercial transaction data, and so on). The uproar caused when AOL released the "anonymized" search records of many of its customers highlights the potential risks to individuals and firms when private data are shared by private companies (14). Robust models of collaboration and data sharing between industry and academia are needed to advance research, protect individual privacy, and provide protection for companies. More generally, getting privacy right is fundamental. A recent report of the US National Research Council on geographic information systems notes that researchers may often strip personally identifying characteristics and carefully anonymize data (15). Last year, the US National Institutes of Health and the Wellcome Trust abruptly withdrew online access to a number of genomic databases (16). These data appeared to be anonymized, reporting only the aggregate frequencies of certain genetic markers. Research showed, however, that it is statistically possible to re-identify individuals by using the full set of data on all individuals in the database (17).

Because a single privacy incident could trigger institutions and legislation that strangle the nascent field of computational social science, self-regulating mechanisms involving procedures, technologies, and rules must be established to reduce risk and preserve the potential for research. As the cornerstone of such self-regulation, US Institutional Review Boards (IRBs) must deepen their technical knowledge in order to understand the potential for intrusion and individual harm, because the new possibilities can no longer be judged by their current paradigms of harm. Many IRB staff would find it difficult to assess the likelihood that complex data could be de-anonymized. Moreover, IRBs may need to consider whether dedicated institutions focused on data security should be created. At present, existing data are spread across many organizations whose understanding of and capacity for data security vary widely. Researchers must develop technologies that protect individual privacy while preserving data for research. In turn, such systems may also help industry protect customer privacy and data security (18).

Finally, the development of computational social science, like that of other emerging interdisciplinary fields (such as sustainability science), depends on developing ways to train new scholars. Tenure committees and editorial boards need to understand and reward efforts to publish across disciplines. Initially, computational social science needs sociologists and computer scientists working together. In the long run, the question is whether academia should train computational social scientists, or teams of computationally literate social scientists and socially literate computer scientists. The emergence of cognitive science offers a good model for computational social science: it drew on fields including biology, philosophy, and computer science, attracted substantial investment in creating a common field, and has contributed greatly to the public good over the past generation. We believe computational social science has similar potential and deserves similar investment.

Link: http://computational-communication.com/计算社会科学/what-watts-says/

Question 1: Network Science and Big Data

What are your impressions of the way that network science has gone? A lot of it increasingly (since small worlds especially) focuses on the shape of the network, rather than the attributes of nodes; do you think that’s the right way forward? Is there anything big missing from network sociology, or a direction that you think it should be going in? Will “small data” networks be drowned out by big data?

How is network science going?

If you look back at my original paper with Strogatz it has “collective dynamics” right there in the title—it was always the relationship between structure and behavior that we thought was interesting, not structure for its own sake.

We also didn’t intend for the “small world” model that we proposed to be interpreted as a realistic model of network structure; rather we were trying to make a conceptual point that even subtle changes in micro-structure could have dramatic effects on macro-structure and hence possibly also macro-behavior.

I’ve also come to believe that modeling exercises that are unconstrained by data have a tendency to gravitate toward phenomena that are mathematically interesting, which is no guarantee of empirical relevance. Fortunately I think that in recent years we’ve seen more emphasis on studying both network structure and collective behavior empirically. (The key point: mathematical derivation must stay consistent with empirical knowledge.)

What questions truly need big data?

For some questions, such as when we are interested in rare events or estimating tiny effect sizes, it is indeed necessary to have a very large number of observations; in some of our recent work on diffusion, for example, it turns out that a billion observations is not excessive. But for other questions, the scale of the data is much less important than its type or quality. Sometimes it matters that a sample is unbiased or representative; other times it is important to have proper randomization in order to infer causality; and other times still it is important simply that you have instrumented the outcome variable of interest. Regardless, the point is that the data is relevant to the question you’re asking, not how big or small it is.

Question 2: Why is there a "sociological imagination" but no "economic thinking"?

Economists have been pretty successful at clearly articulating a set of core concepts that have spread out into the broader world and form the basis for economic thinking: supply and demand; markets; externalities; people respond to incentives; perverse incentives; sunk costs; exogenous shocks; etc. Since your early work brought together core sociological concepts (namely social influence and the Matthew effect): i) Do you think sociology should try to reorganize itself around core, “Soc 101” concepts that every introductory class would cover? We often talk about the sociological imagination, but that is much less clear than economic thinking.

The social reality is too complex

This is a tough one. One of things I’ve always liked about sociology is its embrace of multiple viewpoints, both in terms of theory and methodology. Personally I think social reality is too complex to be adequately accounted for by any single theoretical framework—a point that Merton made very eloquently many years ago in his article on middle-range theories. Unfortunately I don’t think his argument was properly understood at the time (e.g. by rational choice theorists) and I don’t think it is still.

The advantage of economics

Perhaps that’s because simple universal frameworks are institutionally powerful even when they’re scientifically questionable. And that’s why it’s a tough question: because I think that one reason why economics is so much more influential than sociology in government, in the media, and in society, is precisely because economists can articulate a fairly coherent worldview that they can all (by and large) get behind, whereas sociologists can’t really agree on anything. Economists are therefore in a much better position to offer answers to questions that people care about, whereas sociologists tend to point out all the ways in which the question is more difficult than the questioner realized. Even if the sociologist’s response better reflects our true understanding of the world, it’s no surprise that most people would prefer to listen to the economist.

Sociologists should try to solve some nontrivial but solvable problems to reach a consensus.

That said, I wouldn’t advocate sociology trying to develop a single set of core concepts just to compete with economics. Rather I would propose that sociologists identify a small set of nontrivial real-world problems that we believe we can actually solve, or at least make some meaningful progress towards solving, and then demonstrating that progress. Identifying nontrivial but solvable social problems isn’t easy, nor do I think that solving problems is the only measure of progress in a discipline. So I certainly wouldn’t advocate that everyone drop what they’re doing to work on these problems, or even try to agree on what they should be. But I do think that being able to point to a set of problems that sociologists have arguably “solved” would greatly enhance our collective reputation and help us to attract more students.

ii) If you could pick a handful of sociological concepts and then have everyone outside of sociology learn them, and they’d be as familiar as the economic examples listed above, what would they be?

A book called Everything is Obvious: Once you Know the Answer

I wrote a book a few years ago called Everything is Obvious: Once you Know the Answer about the failures of commonsense reasoning and how we systematically ignore them. I think the contents of that book is pretty close to the list of concepts I would like everyone to understand, including: the nature of common sense itself; the difference between rational choice and behavioral conceptions of individual decision making; cumulative advantage and intrinsic unpredictability; the fallacy of the representative individual; the perils of ex-post explanations and dangers of “overfitting” to known outcomes; the consequences of overfitting for predictions about the future; and the implications of all of these problems for practical matters of predicting success, rewarding performance, deciding what is fair, and even what is knowable. I wouldn’t claim that these concepts constitute a core of knowledge comparable to core concepts in economics, nor do I think it would help students directly solve real-world problems of the kind I just advocated for, but I do think it would teach students some epistemic modesty and might eventually lead to more intelligent public discourse about these problems. I’m not sure the book has accomplished any of that, but that’s why I wrote it.

Question 4: Sociology PhDs and the demands of industry

You made a transition from academia to the private sector. One of the ways people have suggested improving the sociology PhD job market is to make work in the private sector a clearer option from the beginning. What do you think about sociology PhDs and the private sector? How do you think sociology PhDs could or should go about this? What skills should they develop? How should they present themselves to companies?

Big data needs theoretical knowledge to yield valuable insights.

It’s true that companies are increasingly excited about extracting value from data, which has made data science a very in-demand skill set. I also think that companies are starting to appreciate that truly valuable insight requires more than just good computational and statistical chops—some degree of theoretical knowledge is also required in order to ask the right questions, define the right metrics, and avoid basic errors of sampling bias, causal inference etc. This latter trend is much earlier in its life cycle than the former, but I think as companies learn more about the complexities and compromises associated with “big data” they will increasingly demand data scientists with social scientific training.

The bright side and the downsides of sociology PhDs going into industry.

So on the bright side I think that there is real potential for sociology PhDs to find intellectually rewarding work in industry. The downside is that in order to realize any value from their sociological training they also need a level of technical skill that is well beyond what students can expect to learn in the vast majority of sociology PhD programs. In our postdoc hiring we are starting to see a handful of strong candidates with sociology PhDs—up from zero just a couple of years ago—so that’s encouraging. But I suspect that these students mostly figured it out on their own or took it upon themselves to find the relevant courses in other departments. Which is fine, and if I were a current sociology PhD that’s what I would do, but I think it would be better for the field to provide a more systematic level of training.

Question 5: The differences between writing for AJS and Nature

You’ve been very successful publishing in journals such as AJS, while targeting broader audiences through high-impact journals such as Nature and Science. How is writing for an AJS audience different from writing for Nature and Science? Where would you send your manuscript if it was rejected by Science? Do you think more sociologists should be looking to publish beyond our traditional journals in order to reach a broader community of scholars?

Different readers need different writing strategies.

Writing for AJS is completely different from writing for Nature and Science in almost every sense: length, style, treatment of related literature, acceptable methodology, conception of theory, presentation of results…everything. It’s also different from writing for computer science conference proceedings and physics journals, and all of those different outlets are also different from one another. I also occasionally write magazine articles, op-eds and trade books, and those are also all different in their own way. Learning to write for different disciplinary outlets and in different styles is time consuming and sometimes frustrating—because different groups of readers care about such different things. But I think it’s an effort that sociologists should make.

Try to speak to computer scientists in their own language

The fact is that with very few exceptions researchers in other disciplines don’t read sociology journal articles, and when they do they find them incredibly long and tedious. For example, all that effort that we devote to situating our work is completely lost on most computer scientists, so when they get to the results section they wonder why it was necessary to write 40 pages in order to explain one table of regression coefficients. Given that CS is a much bigger and more powerful discipline than sociology, if we want to have an impact on them or convince them that we are worth taking seriously, we will have to speak to them in their language and probably in their own publication venues.

A single high impact paper is worth many low impact papers.

On the other hand a single high impact paper is worth many low impact papers, so from a career perspective it’s not necessarily a waste of time to devote a year or two to getting something into a top journal. I do often wish that we could find a more efficient way to publish our research without compromising quality, and in that regard online-only, open-access journals like PLoS One and Sociological Science have some appealing properties. But the reality is that we live in a highly competitive world where attention is scarce; so my fear is that if we stopped using A-journal publications as a differentiator, the likely substitute (relentless self-promotion on social media anyone?) might be even worse.

Question 6: How to choose between academia and industry

You spent several years at the Columbia Sociology Department. During your time there you mentored several prominent junior scholars including Baldassari and Salganik. How was your experience being an academic sociologist and why did you decide to leave for industry? Will you consider returning to Academia?

Why leave Columbia for Yahoo! Research?

I really loved my time at Columbia but around 2006 it started to dawn on me that, whether it liked it or not, sociology was going to become a computational science, much as biology had become a computational science in the early 1990s. All around us social data were exploding in volume and variety, from email to social networking services to online experiments of the kind I did with Matt (Salganik). It also occurred to me, however, that sociologists weren’t well equipped to handle this transition and that if we were going to make rapid progress we would need the computer scientists to help, and possibly psychologists and economists as well. Columbia is now pretty open to interdisciplinary collaborations of this sort, and their data science institute is a great example of that openness, but at the time it was very hard to see how it would work within the confines of traditional academic departments.

Frustrations with academia

I was also having difficulty recruiting grad students with rigorous mathematical and computational backgrounds (as you noted there were some like Matt and Delia and also Gueorgi Kossinets, but they were really the exceptions), and raising funding to support the whole thing. Towards the end I felt like I was spending all my time writing grant proposals or sitting in meetings and almost no time doing actual research. So when Prabhakar Raghavan called me from Yahoo! to ask if I would come and help them set up a social science research unit it was very tempting. Even then I wasn’t sure I would do it, and certainly didn’t expect to do it for long, but it really worked out wonderfully and now I’ve been at Yahoo! and Microsoft Research for longer than I was at Columbia.

Doing Research more purely at Microsoft or Yahoo!

Perhaps surprisingly, I think the biggest difference between my experience at Columbia compared with Microsoft (or at Yahoo!) is that I now spend much more time doing and thinking about research. The other big difference is that, in contrast with most university faculty, I am surrounded (literally—we all sit in cubicles in an open plan office) by researchers from different disciplinary backgrounds including psychology, economics, physics, and computer science. One of my colleagues once observed that university departments comprise lots of people with similar training interested in different problems, whereas research labs like ours comprise lots of people with different training interested in the same problems. I think that’s roughly true, and it completely changes the nature of how we work, which is highly collaborative, interdisciplinary, and very problem oriented. That is not to say that we only do “applied” research—we do some of that but we also do a lot of basic science and publish all our work in all the same venues as our colleagues in universities. Rather what it means is that we are more concerned with the relevance of our work to real-world problems and less concerned about what particular disciplinary tradition it fits into.

Would I ever consider returning to academia? I don’t know. I’m very happy at Microsoft right now: I work with fantastic colleagues, we get amazing PhD student interns every summer, and we work on a wide variety of extremely interesting problems. It’s been a great experience and every day I’m grateful to have the job that I have. So although I wouldn’t rule out returning to academia one day I’m not in any hurry to leave.

Question 7: Common sense and its importance for sociology

You recently wrote an article on common sense and its importance for sociology. What was the intuition for it?

Sociologists conflate causal explanations with explanations that “make sense” of outcomes they have observed

As I mentioned, I recently wrote a book about how people rely on common sense more than they realize, and in so doing end up persuading themselves that they understand much more about the world than they actually do. In the course of writing the book, it occurred to me that sociologists make many of the same mistakes that other people do. Just like other people, that is, sociologists conflate causal explanations with explanations that “make sense” of outcomes they have observed, unconsciously substitute representative individuals for collectives, overfit their explanations to past data, and fail to check their predictions. I didn’t belabor this point in the book because, as I mentioned earlier, I wanted it to be an advertisement for sociological thinking not a critique of it. Nevertheless I thought the implication was pretty clear, so I was disappointed that some of my colleagues who liked the book’s appeal to non-sociologists seemed to think it had nothing to say to them. I decided that if I wanted them to get the message I would have to sharpen it up a lot, and also make it a bit more constructive; so that’s what I tried to do in that paper.

Question 8: Changes needed for Sociology department and some tips for current sociology graduates

If you were in charge of a Sociology department and could implement any change you’d like, what specific changes would you introduce to its graduate training program? Is there something that current sociology graduate students aren’t doing that they should be doing?

Changes needed for Sociology department

As I mentioned earlier, I think that a data science sequence (e.g. data acquisition, cleaning and management; basic concepts and programming languages for parallel computing; advanced statistics, including methods of causal inference; some basic machine learning; design and construction of web-based experiments) would be super useful for sociology graduate students, and would make them both better social scientists and also much more attractive to prospective employers. There are already a handful of courses of this sort being trialed in various places, including Stanford, Columbia, and Princeton, and sociology departments could work with their colleagues in other departments to pull together a reasonable sequence from existing pieces. It would take some effort and probably resources, but I don’t think it’s unfeasible.

Some tips for current sociology graduates

In the meantime, as I mentioned earlier: if I were a current sociology grad student, I would be busy taking courses in computer science and statistics to augment my sociology training. I would also look around for any groups doing computational social science and ask to join them.

It is an adventurous thing to join a new interdisciplinary field like computational social science.

The downside of new, interdisciplinary fields is that nobody really knows what is involved or what the standards are, so you have to be prepared to take some risks and also to feel out of your depth much of the time. The upside is that it can be incredibly stimulating, and there is the possibility of doing something genuinely new. I think computational social science is in that phase now, so it’s a great time for ambitious and creative students to dive in and see what they can do.

Sociological theory, if it is to advance significantly, must proceed on these interconnected planes: 1. by developing special theories from which to derive hypotheses that can be empirically investigated and 2. by evolving a progressively more general conceptual scheme that is adequate to consolidate groups of special theories.

— Robert K. Merton, Social Theory and Social Structure

Why are so many classic works of sociology so difficult and obscure to read? Because early sociology grew out of philosophical inquiry and is therefore overly abstract. In response, Merton proposed middle-range theory (Middle-Range Theory).

Middle-range theory is, in principle, used in sociology to guide empirical research. It lies between the general theories of social systems and detailed descriptions of particulars. ... Middle-range theory deals with delimited aspects of social phenomena.

…what might be called theories of the middle range: theories intermediate to the minor working hypotheses evolved in abundance during the day-by-day routine of research, and the all-inclusive speculations comprising a master conceptual scheme.

— Robert K. Merton, Social Theory and Social Structure

Our major task today is to develop special theories applicable to limited conceptual ranges — theories, for example, of deviant behavior, the unanticipated consequences of purposive action, social perception, reference groups, social control, the interdependence of social institutions — rather than to seek the total conceptual structure that is adequate to derive these and other theories of the middle range.

— Robert K. Merton

Hu Yiqing, in his article 《传播实证研究——从中层理论到货币哲学》 (Empirical Communication Research: From Middle-Range Theory to the Philosophy of Money), points out:

I believe that if things were really as Merton envisioned, the empirical research paradigm guided by middle-range theory would certainly have important epistemological significance for communication research. ... But Merton's middle-range theory is not a perfect theoretical proposition either; it rests on two questionable premises. First, Merton treats sociology as a discipline like physics, in which a complete theoretical system can ultimately be built up through the accumulation of theories; second, he assumes that sociological theory will keep progressing in some particular direction. Neither premise is reliable.

The following excerpts are taken from Social Theory and Social Structure, Robert K. Merton, Chapter 2, "On Sociological Theories of the Middle Range."

The total system of sociological theory

Compared with the search for an all-inclusive, unified theory, the search for middle-range theories demands that sociologists commit themselves to something entirely different.

Early sociology grew up in an intellectual climate devoted to creating highly comprehensive scientific systems. ... Almost every pioneer of sociology tried to build his own system, and each system claimed to be the true sociology.

The building of theoretical systems in most of the social sciences should be viewed, from a developmental perspective, as different from the building of doctrines and comparable systems in the natural sciences. In the natural sciences, theories and descriptive systems are refined as scientists' knowledge and experience grow. In the social sciences, systems often spring fully formed from a single mind; if they attract attention they are discussed, but sustained cooperative efforts to keep improving them are rare. – L. J. Henderson (a biochemist and amateur sociologist)

A large part of what is now called sociological theory consists merely of general orientations toward data: it suggests the types of variables that theories must somehow take into account, rather than clearly formulated, verifiable statements of relationships between specified variables.

Einstein, although he devoted himself persistently and in solitude to this pursuit, acknowledged:

Most of the work in physical research is devoted to developing the various branches of physics. The aim of each branch is the theoretical interpretation of a limited field of experience, and in each branch the laws and concepts are kept as closely connected with experience as possible.

Sociologists who hope for a solid, universal sociological system in our time, or soon after, would do well to ponder these remarks. If the natural sciences, with centuries of accumulated theoretical generalization, have not produced an all-encompassing theoretical system, then sociology, a science that is only beginning to accumulate empirically grounded theoretical generalizations of limited scope, would seem all the more well advised to restrain its appetite for such a system.

Utilitarian pressures for a total system of sociology

My emphasis on the gap between the practical problems sociologists face and the knowledge and skills they have accumulated does not, of course, mean that sociologists should not seek to develop increasingly comprehensive theory, or that they should not work on research directly relevant to urgent practical problems; still less does it mean that sociologists should turn to trivial practical problems. All kinds of basic research and theoretical generalization are closely related, or at least potentially related, to particular practical problems. But it is important to restore a proper sense of historical proportion: the urgency or magnitude of a practical social problem does not guarantee its timely solution. At any given moment, scientists can solve only certain problems and are helpless before others.

Total systems of theory and theories of the middle range

Seen in this light, there is reason to believe that sociology will advance insofar as its major (though not exclusive) concern is with developing theories of the middle range, and that it will stagnate if its principal energies are devoted to developing an all-encompassing sociological system.

For sociological theory to advance significantly, it must develop on two interconnected levels: (1) by creating special theories from which hypotheses that can be empirically investigated are derived, and (2) by developing, gradually rather than at a single stroke, a more general conceptual scheme capable of consolidating groups of special theories.

To concentrate entirely on special theories is to risk emerging with specific hypotheses that account for limited aspects of social behavior, organization, and change but that remain mutually inconsistent.

To concentrate entirely on a master conceptual scheme from which all partial theories are to be derived is to risk leading twentieth-century sociology back toward the grand philosophical systems of the past, with all their variety and architectonic splendor but their poverty of insight. The sociological theorist who confines himself to the exploration of highly abstract total systems produces ideas that, like fashionable ornaments, are hollow and tiresome.

If, as in the early days of sociology, every charismatic sociologist tries to create his own total theoretical system, the road to a genuinely comprehensive sociological theory will be blocked. ... The development of sociological theory shows that an emphasis on the middle range is necessary.

A large part of what is now called sociological theory consists merely of general orientations toward data: it suggests the types of variables that theories must take into account, rather than clearly formulated, verifiable statements of relationships between specified variables. We have many concepts but few confirmed theories; many points of view, but few theorems; many "approaches," but few arrivals. Perhaps a further shift of emphasis would be all to the good.

Our discussion of middle-range theories in sociology is intended to clarify a decision problem that confronts all sociological theory: which line of research should claim more of our collective energies and resources, the pursuit of verifiable theories of the middle range or the pursuit of an all-inclusive conceptual scheme? I believe, though beliefs are easily misunderstood, that theories of the middle range hold the greatest promise, provided that those working on them pay general attention to consolidating special theories into sets of less abstract concepts and mutually consistent propositions.

Little systems can also accomplish something; they too have their day and cease to be. – a provisional view from my elders and from Tennyson

More comprehensive theory develops through the consolidation of middle-range theories, rather than emerging suddenly and wholesale from the work of individual theorists.

As I have noted elsewhere, this strategy is neither new nor alien; it has deep roots in history. Bacon emphasized the importance of "middle axioms" in science more than any of his predecessors.

Only the middle axioms are true, solid, and living, and upon them depend the affairs and fortunes of men; above them, last of all, come the most general axioms, which are no longer abstract but are bounded by these middle axioms. Plato put it well in his Theaetetus: "Particulars are infinite, and the higher generalities give no sufficient direction." The pith of all the sciences, which distinguishes the expert from the layman, lies in the middle propositions, which in each particular branch of knowledge are drawn from tradition and experience. – Bacon

... We can form limited theories; we can predict general tendencies and common causal laws. If they were extended to all mankind, they would to a great extent probably cease to be true; but if they are confined to particular nations, they possess a certain truth. ... By narrowing the range of observation, by confining ourselves to certain types of communities, and by stating the facts as they are, it is possible to enlarge the scope of political theory. By adopting this method we can increase the number of true political axioms drawn from facts and at the same time make these axioms fuller, more vivid, and more solid. In contrast with merely empty generalizations, they resemble Bacon's middle axioms: generalized statements of fact, yet so close to practice that they can serve as guides in the affairs of life. – George Cornewall Lewis

Durkheim's monograph Suicide is perhaps the classic example of the use and development of middle-range theory.

With the introduction of the middle range theory program, he advocated that sociologists should concentrate on measurable aspects of social reality that can be studied as separate social phenomena, rather than attempting to explain the entire social world. He saw both the middle-range theory approach and middle-range theories themselves as temporary: when they matured, as natural sciences already had, the body of middle range theories would become a system of universal laws; but, until that time, social sciences should avoid trying to create a universal theory.

– Mjøset, Lars. 1999. “Understanding of Theory in the Social Sciences.” ARENA working papers.

Network Diversity and Economic Development

by Nathan Eagle, Michael Macy, Rob Claxton

The goal is to study the effect of a country's social network structure on social development (the social impact of a national network structure).

Previous research has shown that economic opportunities often arise from social ties that reach beyond one's local circle of friends and acquaintances.

heterogeneous social ties may generate these opportunities from a range of diverse contacts (1, 2)

  1. M. Newman, SIAM Rev. 45, 167 (2003).
  2. S. Page, The Difference: How the Power of Diversity Creates Better Groups, Firms, Schools, and Societies (Princeton Univ. Press, Princeton, NJ, 2007).

Previously, because such data were hard to obtain, the relationship between network diversity and a population's economic well-being could not be studied quantitatively.

Past research at the individual level has shown that social ties bridging different communities benefit individual development. Previous studies have found that individuals benefit from having social ties that bridge between communities. These benefits include:

  • access to jobs and promotions (5–13)
  • greater job mobility (14, 15)
  • higher salaries (9, 16, 17)
  • opportunities for entrepreneurship (18, 19)
  • increased power in negotiations (20, 21).

Although these studies suggest the possibility that the individual-level benefits of having a diverse social network may scale to the population level, the relation between network structure and community economic development has never been directly tested (22).

Significance of the study:

As policy-makers struggle to revive ailing economies, understanding this relation between network structure and economic development may provide insights into social alternatives to traditional stimulus policies.

Data:

The communication network data were collected during the month of August 2005 in the UK. The data contain more than 90% of the mobile phones and greater than 99% of the residential and business landlines in the country.

The resulting network has 65 × 10^6 nodes, 368 × 10^6 reciprocated social ties, a mean geodesic distance (minimum number of direct or indirect edges connecting two nodes) of 9.4, an average degree of 10.1 network neighbors, and a giant component (the largest connected subgraph) containing 99.5% of all nodes (23).

Introducing the IMD

Although the nature of this communication data limits causal inference, we were able to test the hypothesized correspondence between social network structure and economic development using the 2004 UK government’s Index of Multiple Deprivation (IMD), a composite measure of relative prosperity of 32,482 communities encompassing the entire country (24), based on income, employment, education, health, crime, housing, and the environmental quality of each region (25). Each residential landline number was associated with the IMD rank of the exchange in which it was located, as shown in Fig. 1.

Computing topological diversity with Shannon entropy

We developed two new metrics to capture the social and spatial diversity of communication ties within an individual's social network. We quantify topological diversity as a function of the Shannon entropy.

High diversity scores imply that an individual splits her time more evenly among social ties and between different regions.

Diversity was constructed as a composite of Shannon entropy and Burt's measure of structural holes, by using principal component analysis (PCA). A fractional polynomial was fit to the data.
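
As a sketch of the entropy component only (the call volumes are invented, and the combination with Burt's measure via PCA is not reproduced here), a topological diversity score for one individual can be computed from the share of her communication going to each contact:

import numpy as np

def topological_diversity(call_volumes):
    # Shannon entropy of the shares of communication to each contact,
    # normalized by its maximum value, log(number of contacts)
    p = np.asarray(call_volumes, dtype=float)
    p = p / p.sum()
    entropy = -(p * np.log(p)).sum()
    return entropy / np.log(len(p))

print(topological_diversity([10, 10, 10, 10]))  # 1.0: time split evenly across ties
print(topological_diversity([97, 1, 1, 1]))     # about 0.12: one dominant contact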

Reference

Eagle, N., Macy, M., & Claxton, R. (2010). Network diversity and economic development. Science, 328(5981), 1029-1031.