AI & Public Policy, July 22-23, 2017
Tsinghua University & University of Chicago
Public Policy at Tsinghua: wide-ranging research, set up in 2000
Four major transformations in China since 1979
- Economic system: planning->market
- Industrial structure: agriculture + manufacturing -> manufacturing
- Society: rural->urban; closed->open
- Governance system: charisma & authority -> efficiency
- How good is the quality of Chinese publications, given how much their number has grown?
- How to measure it?
- How to explain the rise of Chinese publications in high quality journals?
Use the Excellence in Research for Australia (ERA) journal list as the criterion for selecting journals
- Sociology of Knowledge
- Science of Science - Individual, Citation
- Library/Information Science and Information Management - Journal, Influence, Index
Science as a complex system
- How does science reproduce?
- How does science evolve?
- How does science persist?
- How do fields ignite?
Knowledge Representation: Collocation Hypergraph/Adjacency Tensor
Through representation, everything is close to each other. It’s a small world after all.
Novel(improbable) outcomes: Novelty - 1/P(I|A, S)
Content Novelty & Context Novelty: 0 correlation
- content: people combining knowledge, pooling concepts
- context: the community you draw from
Science thinks like a global Bayesian. Science does not think the way an individual scientist thinks.
What is Science's objective?
- Solving the world’s problems.
- Discovering what it discovers
- Transforming itself
- Generating robust, generalizable knowledge?
Topic models - a mathematically equivalent way to represent a paper
Predictive Signals Behind Success
Using Social theories, combining mathematical methods
Just like the keynote at IC2S2 2017
Q: Can success be measured, modeled and predicted?
The collective nature of success
You are successful because everyone else thinks you are successful
Modeling Citation Dynamics: 3 factors
- Preferential Attachment
- Intrinsic Novelty
Combine the 3 factors to model the probability that a paper is cited; the resulting model can be solved analytically
Rescaled Citation and Rescaled Time
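A minimal sketch of how rescaling can collapse citation curves. It assumes the model meant here is the one published by Wang, Song and Barabási (Science, 2013); the notes themselves do not give the formula. The parameter names (fitness, immediacy, longevity) and the constant m = 30 are taken from that paper as an assumption, not from the talk.

```python
import math


def normal_cdf(x):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cumulative_citations(t, fitness, immediacy, longevity, m=30.0):
    """Assumed closed form: c(t) = m * (exp(fitness * Phi((ln t - mu) / sigma)) - 1)."""
    z = (math.log(t) - immediacy) / longevity
    return m * (math.exp(fitness * normal_cdf(z)) - 1.0)


def rescale(t, citations, fitness, immediacy, longevity, m=30.0):
    """Map (t, c) to rescaled time and rescaled citations."""
    rescaled_time = (math.log(t) - immediacy) / longevity
    rescaled_citations = math.log(1.0 + citations / m) / fitness
    return rescaled_time, rescaled_citations


# Two hypothetical papers with very different parameters.
for params in [(2.0, 1.5, 1.0), (3.5, 2.0, 0.7)]:
    c = cumulative_citations(7.0, *params)
    print(params, rescale(7.0, c, *params))
```

Under this assumed form, every paper's rescaled point lies on the same curve, the standard normal CDF of the rescaled time, regardless of its individual parameters.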
Quantifying the evolution of individual scientific impact
Will a scientist produce higher impact work following a major discovery? Hope.
Is the biggest hit most likely within the first 20 years of a career, with the chance decaying afterwards? Actually its timing is random! The apparent decay arises only because the number of publications declines. Method: break up the timeline and choose a middle position to observe.
What happens after your biggest hit? Winning begets more winnings
Hot hand phenomenon in artistic, cultural and scientific careers. Biggest vs. second biggest hit
What is the diffusion of innovation? It's not about adoption, but about substitution. What does the substitution look like? Exponential growth / logistic growth
- Handset, Impact: number of handsets sold; every handset has its own exponential parameter
- Mobile Apps, Impact: number of downloads
Power law grows much much slower than exponential. What mechanisms are responsible for the observed non-analytic growth?
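A small sketch comparing the three growth shapes mentioned in these notes (exponential, logistic, power law). All parameter values are illustrative assumptions, not numbers from the talk.

```python
import math


def exponential(t, rate=1.0):
    """Exponential growth: exp(rate * t)."""
    return math.exp(rate * t)


def logistic(t, capacity=1000.0, rate=1.0, midpoint=5.0):
    """Logistic growth: saturates at `capacity`, half of it at `midpoint`."""
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))


def power_law(t, exponent=2.5):
    """Power-law growth with a (possibly non-integer) exponent."""
    return t ** exponent


for t in (1, 5, 10, 20):
    print(t, exponential(t), logistic(t), power_law(t))
```

With these illustrative parameters the exponential overtakes the power law well before t = 20, while the logistic curve flattens near its capacity.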
Understanding Patterns of Substitutions
A country-wide mobile phone dataset: 3.25M users, daily records over 10 years, ~9000 handset models
Metric: Substitution Probability, determined by 3 factors (model)
- Preferential Return
3 different systems, determined by the same 3 factors (mechanisms)
- Impact grows as power law with non-integer exponents
- By exploring large-scale datasets, we find three mechanisms governing substitution patterns
- We derive the Minimal Substitution model, allowing us not only to predict the observed growth pattern, but also to collapse impact trajectories into one universal curve
- The Minimal Substitution model predicts an intriguing connection between short term impact and long term impact
Hopefully this work will be finished this summer.
A story about 10%
10% -> 60%, 75 years
Theory by geology, provide innovators more data to uncover the fundamental mechanisms behind it.
- Why these 3 factors?
- Given 3 factors, how did you build the model in that way? How did you evaluate that it works best?
It is the minimal model we can have according to our citations. After all, we can make sure that the 3 mechanisms work.
Only through curve fitting can we find the minimal result.
Small teams create problems and attract attention far into the future; big teams solve them and harvest. Big teams chase the successful work of small teams
Sleeping Beauty Index - PNAS
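A minimal sketch of one way to compute such an index, assuming the note refers to the beauty coefficient B defined by Ke, Ferrara, Radicchi and Flammini (PNAS, 2015). The function name and the example citation histories are hypothetical.

```python
def beauty_coefficient(citations_per_year):
    """Assumed definition: compare the yearly citation curve c_t with the
    straight line joining (0, c_0) to the peak (t_m, c_tm), summing the
    gaps normalized by max(1, c_t) for t = 0..t_m."""
    c = list(citations_per_year)
    # Year of the maximum yearly citation count (first one if tied).
    t_m = max(range(len(c)), key=c.__getitem__)
    if t_m == 0:
        return 0.0
    slope = (c[t_m] - c[0]) / t_m
    return sum((slope * t + c[0] - c[t]) / max(1, c[t]) for t in range(t_m + 1))


# Hypothetical histories: steady growth vs. a long sleep followed by a burst.
print(beauty_coefficient([0, 1, 2, 3]))
print(beauty_coefficient([0, 0, 0, 0, 10]))
```

A paper whose citations grow linearly scores zero, while one that stays uncited for years and then peaks gets a large positive value.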
Funding and Scientific Research: National Science Fund for Distinguished Young Scholars
The Nature of Repeated Failures
Data has to ‘outlive’ individual careers, NIH datasets
alpha - stiffness, use alpha to build up a model
Each failure-success is a cycle
School of Computer Science, Southwest University
Probing Behavior of Scientists
Quantifying patterns of research-interests evolution, Nature Human Behaviour
We are what we repeatedly do. - Aristotle
Big Data -> Activities -> Features -> User Profile
- Heterogeneity: topic tuple usage in an individual’s career follows a power-law distribution
- Recency: An individual is more likely to publish on research subjects studied recently
- Subject Proximity
Model: Scientific research is like a random walk
To what degree could these patterns be captured by a simple statistical model?
Faculty of Education, University of Hong Kong
Age and team of great scientific discoveries in China
On-going work, only some figures presented
Bots improve human coordination in network experiments
Amazon Mechanical Turk: Online Labor Market - Game on Network - Quantitative Data. breadboard.yale.edu
How can bots accelerate the coordination process?
Every player chooses the best color locally, but the problem was not solved
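A minimal sketch of why purely local best responses can leave a coloring problem unsolved. The ring network, the two-color palette and the helper names are hypothetical illustrations, not the actual experimental setup.

```python
def conflicts_at(node, color, colors, neighbors):
    """Number of neighbors of `node` that would share `color` with it."""
    return sum(1 for other in neighbors[node] if colors[other] == color)


def total_conflicts(colors, neighbors):
    """Number of edges whose two endpoints have the same color."""
    per_node = (conflicts_at(n, colors[n], colors, neighbors) for n in neighbors)
    return sum(per_node) // 2  # each conflicting edge is counted twice


def is_locally_stuck(colors, neighbors, palette):
    """True if no single player can strictly reduce its own conflicts."""
    for node in neighbors:
        current = conflicts_at(node, colors[node], colors, neighbors)
        best = min(conflicts_at(node, c, colors, neighbors) for c in palette)
        if best < current:
            return False
    return True


# Hypothetical example: six players on a ring, two colors.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
palette = ["A", "B"]
stuck = dict(enumerate("ABAABA"))   # locally optimal, still two conflicts
solved = dict(enumerate("ABABAB"))  # a conflict-free coloring exists

print(total_conflicts(stuck, ring), is_locally_stuck(stuck, ring, palette))
print(total_conflicts(solved, ring))
```

In the stuck configuration every player is already at a local best response, so escaping it requires someone to make a move that does not improve their own situation.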
Microsoft Research, NYC, @hb123boy
Conducting human subjects experiments in the ‘virtual lab’
Computational Social Science
In the 1950s, subjects were placed in an experimental room to be tested.
Virtual Lab: Bring the lab closer to the real world, using the Internet as a lab
- Complexity Realism
- Duration, Participation
- Size, Scale
TurkServer, built on Meteor web app framework: https://github.com/TurkServer/turkserver-meteor, Crowd Mapper, Andrew Mao, Winter Mason, Siddharth Suri, Duncan Watts
Intertemporal Choice, Kevin Gao, Dan Goldstein
Long-run Cooperation, a very long prisoner’s dilemma experiment, Andrew Mao…Duncan Watts
PhD candidate at Princeton University, collaborated with Jennifer Pan
Identifying protests from social media data, using deep learning techniques
Training datasets: The Lu dataset, collected by Chinese lawyer Yuyu Lu, from blogspot
Hard task: texts are short and their meanings are subtle
- Text: RNN(LSTM)
- Image: 4-layer-CNN
Official mobility in China: which factors influence it?
Data Source: Prof. Zhou Xueguang
Factor-based to agent-based - Causal inference to Sufficient Condition. Fractal Network.
Logic -> Structure: Rich-get-richer and hub repulsion
Not Markov Process, Not Random Walk. Efficient way to fill a space: 3/4 law
RNN - LSTM
- Machine Translation
- Music Composer
No music theory, Representation, loss function, sequence2sequence
Deep learning in autonomous driving, Momenta
Industrial Thinking: Technology must go first.
The only successful approach in industry - supervised learning
- Public: ImageNet
- Blooming of Internet
Software and Infrastructure(data storage)
- Git, AWS, Amazon Mechanical Turk(for labeling)
Faster R-CNN, arxiv.org/abs/1506.01497, Fully Convolutional Networks for Semantic Segmentation
Physicists: Your work is so ugly! There are too many parameters in your model.
Map of Complexity Science, by Brian Castellani
Complexity - AI
Why bother with a neural network?
- It is a good predictor
- It can extract features automatically
Deep Learning fights poverty, Science, remote sensing data
Use a CNN (feature extractor) to train a model that predicts nighttime light intensity (already-labeled data), then use these features (the first several layers of the model) as input to another model that predicts poverty (transfer learning)
Complex network classifier
Use a neural network to classify complex networks (small-world or scale-free); this requires a network representation. Images are the easiest to encode (represent); text is also handled by word2vec.
DeepWalk algorithm - use random walks to generate node sequences
What the CNN learns - 2 filters
How to recognize without links - DeepWalk can encode the link information into coordinates.
Deep learning can be used to solve complex network problems. Can a DNN become an expert in complex networks?
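A minimal sketch of the sequence-generation step: truncated random walks over a graph, which DeepWalk then feeds to a word2vec-style skip-gram model to obtain node coordinates. The function name, parameters and toy graph are hypothetical; the embedding step itself is not shown.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate fixed-length random walks starting from every node.

    `adjacency` maps each node to a list of its neighbors.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        nodes = list(adjacency)
        rng.shuffle(nodes)  # visit start nodes in a random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:  # dead end: stop this walk early
                    break
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Toy undirected graph, purely illustrative.
graph = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
for walk in random_walks(graph, walk_length=5, walks_per_node=1):
    print(" ".join(walk))
```

Each walk plays the role of a sentence and each node the role of a word, which is how link structure ends up reflected in the learned coordinates.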
Hyperbolic Network, Boltzmann Machine and Holographic Duality
Popularity vs. Similarity
~ renormalization - field theory
Boltzmann Machine, Hyperbolic Space
Data-driven urban studies, combining computer science and economics
Former Data Scientist at Baidu, Data Science Company - QuantUrban
What can we do with mobile phone data?
- Mobile Phone Data and Urban Dynamics, Real-time dynamics
- Day and Night Population Distribution
- Mapping Home-Work Connection with Machine Learning, Baidu Maps - Frequently Visited Places, Rule-based, Label
- Commuting Data and Visualization
- Community Detection and City Boundary
- Population Migration During Spring Festival
- Spatial-temporal Behaviors and Economics
- Mobile Internet Coverage and Poverty
Toolkits for social scientists
- Spider - system, dashboard
- Mobile Turk - label data