COMP3121 Social and Collaborative Computing Notes 人群与网络

Lec 1

1.1 无处不在的网络

从差序格局到社会网络

  • 费孝通先生(1947,1985: 23):源与众
    • (中国人的社会关系)好像是把一块石头丢在水面上所发生的一圈圈推出去的波纹。每个人,都是他社会影响所推出去的圈子的中心,被圈子的波纹做推及的,就发生联系。每个人在某一时间某一地点所动用的圈子是不一定相同的
  • 林南先生(2004:37):网络
    • 一个特定的网络可以自然地形成,也可以有对一个特定的共同关注的焦点或关注一种资源利益的社会性建构
  • 只要有“人群”在,就会有“网络”

无处不在的网络

  • There are also networks in nature

    在自然界,网络同样无处不在

    • C. S. Elton (1927), different species: prey and predators also form a network, they form a food chain (大鱼吃小鱼,小鱼吃虾米) 在生态系统中,物种之间的“吃”与“被吃”也是一个网络,食物链是自然生态系统稳定的重要机制

    • A material network: the water system: from surface to stream to river to lake, to evaporation, to rain (precipitation)

      一些自然资源之间的关系,如水的转换关系,也是一个网络

    • ….

  • 甚至可以说,网络是自然世界生发演变的机制之一

形形色色的网络 All kinds of network

Arpanet (1979, 13 nodes) 1979年12月ARPA计算机网络(13节点)

<img src=”images\image-20210913161900145.png” images\image-20210913161900145” style=”zoom:50%;” />

  • Transportation, Postal services, Telecommunication, Internet, WWW

    交通运输网,邮政网,电话通信网,计算机网,互联网,万维网

  • Social networks, Finance networks

    社会关系网,产品供销网,金融借贷网

  • Wireless, Smart Grids, Sensor networks

    智能电网,无线网,传感网,物联网

  • Neural network, food chain

    神经网,生物代谢网,食物链(网)

  • 攻守同盟网,恐怖主义网络

  • Weibo, QQ

不同类型网络的依存 The dependency of networks

  • Internet – physical, technical dependency

    互联网:物理的、技术的

  • WWW – information dependency

    万维网:基于互联网的信息网络

  • Weibo – online social dependency

    基于万维网的在线社会(社交)网络

  • Social network – in old time, the dependency comes from geography. That is, geographically closer people have higher chances to build a social network in ancient times.

    社会关系网,与地域相关(尤其在古代),地域上相邻或相近,见面的机会多,建立关系的可能性就大,随着交通工具的发达,社会关系的地域范围扩大

ICT技术发展的影响 The impact of ICT

  • Information and computing technology makes the network improves substantially

    催化了各种网络的发展

    1. The scale becomes larger, the domain becomes wider 规模变大,范围变广

    2. New forms of networks show up, e.g., network of common interest, one-to-one communication, one-to-many communication, etc.

      新型网络的涌现

    3. There are richer and more complex networks

  • ICT makes it possible to analyze the structure and evolution of large scale network

    ICT技术的发展使分析和理解大规模网络的结构与演变成为可能

    • Paradata – the behavior trace can be (or are) all recorded

      行为(痕迹)数据(para data)与网络的运行伴生

    • They form big data

      海量数据(Big data)分析的能力(计算设施,算法工具)

      • The development of storage and processing capabilities by ICT. In the past, with social sciences, only ~100 nodes can be processed Now we are capable to process 100 million nodes

        以社会网络分析为例,从前的分析能力限制在上百个节点,现在可以分析上亿个节点的网络

郭敬明 Weibo, Jingming Guo

The retransmission of the topic: movie: tiny times/小時代, 100,000 nodes/weibo

小结 A summary

  1. Network is not a phenomenon today. In both nature and in human society, it forms long ago

    ”网络“并不是在今天才出现的现象。自然界本来就有,人类社会自古也有

  2. As long as there are people, there are network 只要有人,就有网络

  3. Network is everywhere and in many forms 网络无处不在,形式多样

  4. ICT creates new meanings and opportunities

    信息通信技术创造了新的意义和机会

  5. Richer and more complex networks更丰富和更复杂的网络

  6. Better processing capability of networks更好的网络处理能力

  7. 计算能力的发展,使得针对网络现象的可计算性越来越强,进而使得对人群的行为及其于网络之间互动的可预测性也越来越有意义

  8. ICT环境的发展,使人群之间的网络现象及其意义更加凸显

1.2 网络与图 Network and Graph

图 Graph

When we think about a network, we think about:

当我们想到“网络”这个词语

Object + Connection/Relationship
“事物” + “联系”

  • Node, also called vertex 节点(vertex,node,point)

  • Link, also called edge 边(连接,链接,关系,联系;edge,link,tie)

图=事物+联系 Graph = object + connection

With nodes and links

  • 1 is coach, 34 founder 1人是教练,34人是创始人

Question

  1. In a class, there are four students, A, B, C, D. A is a friend of B and C. B is a friend of A and a friend of C and D. C and D are also friends. 一个类有四个学生,A, B, C, D, A是B和C的朋友,B是a的朋友,C和D的朋友,C和D也是朋友。

  2. Which of the following graph describes such relationship

同构 Labeling

  • The intrinsic structure of the two graphs are the same, only presented in different drawing

    同构:画法不同,但本质上(结构上)相同

  • In graph theory terminology: Isomorphism
    这两个图的内在结构是相同的,只是用不同的图来表示

小结 Summary

  • Graph is an abstract of network information; it describe the relationship of nodes in the network

    图是网络结构信息的抽象,表达的是网络中各种食物之间的关系。

  • We emphasize on Graph, NOT Image/figure, Nodes (Objects) + Links (Connection)

    这里所说的“图”,不同于日常生活中看到的“图像”,尽管图常常也可以被画出来,呈现出一种图像的形式

  • One graph can have many different ways in drawing

    同一个图,可能有多种不同的画法。也就是说,同一个图可能呈现出不同的图像形式

Lec 2

2.1 路径,连通图,连通分量 Path, Connected Graph, Connected Component

路径、最短路径、距离 Path, Shortest path, Distance

  1. A path is simply a sequence of nodes with the property that each consecutive pair in the sequence is connected by an edge

    路径:在一个图中,从一个节点开始,沿着跟它相关的边,能够到达另外一个节点的可能

  2. The path between F and L F和L之间的路径:

    F-H-J-K-L,

    F-H-L

    F-G-J-K-L,

    F-H-J-G-I-K-L

  3. Shortest path(最短路径)F和L最短路径: F-H-L

  4. Distance: 2 (两个节点之间最短的长度叫做距离)

连通、连通分量 Connected graph, Connected Component

  1. Nodes are connected

    If there is a path between them

  2. Connected component

    Every node in the subset has a path to every other

    节点之间存在路径

    The subset is not part of some larger set with the property that every node can reach every other

    不包含在其他的连通分量中

    • 这个图是不连通的,由三个连通分量构成

小结 A summary

Path is one of the most important concepts in graph theory; we can study the interaction of two non-neighboring nodes

路径是图论中最重要的概念之一,我们可以研究两个非相邻节点的相互作用

Connected component emphasizes on its “maximal” property; this makes it possible for us to more accurately discuss problems

连通分量强调其“最大”性;这使我们能够更准确地讨论问题

2.2 二部图与广度优先搜索

二部图 Bipartite graph

二分图也称二部图,是图论里的一种特殊模型,其最大的特点在于,可以将图里的顶点分为两个小组,且各自小组内的点没有直接关联。

如果某个图为二分图,那么它至少有两个顶点,且其所有回路的长度均为偶数,任何无回路的的图均是二分图。

It does not contain cycle with an odd number of nodes

  • A可以划成B
  • D可以划成E
  • A, B, D, E are Bipartite graph.

一个图是二部图是它没有长度为奇数的圈

怎么判断一个图是否包含长度为奇数的圈呢?How to judge whether there is a cycle of odd number of nodes or not?

图上的广度优先搜索(遍历) Breadth first search (Computer Science)

  1. Starting from any node, enumerate all the nodes one by one following links (algorithm)

    从任意一个节点开始,沿着相连的边,将图的节点一一列举出来的一种过程(算法)

    从F开始

    <img src=”images\image-20211006214541016.png style=”zoom:50%; “ />

  2. Starting from any node, use breadth first search. If you can find two nodes from the same layer having links, there must be a cycle of odd number of nodes

    从任意节点开始,使用广度优先搜索。如果你能找到来自同一层的两个节点有链接,那么一定有一个奇数节点的循环

    <img src=”images\image-20211006214714649 style=”zoom:50%; “ />

​ 上图从F开始,下图从I开始

小结 A summary

  1. We have learned some basics (definitions) of graph

    Graph elements, node, link

    Path, connected graph, connected component

  2. We learned some basic modeling of the real world relationship into graphs (e.g., student relationship)

  3. We also learned some problem solving skills (develop an algorithm, i.e., breadth first search)

    Search from one node to another

  4. 许多社会现象或状态的结构,都呈现出二部图的形式

  5. 是否有长度为奇数的圈,是判断一个图是否为二部图的充分必要条件

  6. 广度优先搜索,是考察一个图是否存在长度为奇数的圈的有效方法(算法)

2.3 三元闭包与聚集系数

一般研究方法 The general study methodologies

  1. We will gradually study more properties of the nodes, links, graphs

    我们将逐步研究节点、链接、图的更多性质

  2. We will gradually study how real world problems can be transformed into graphs (with properties)

    我们将逐步研究如何将现实世界的问题转化为图(带有属性)

  3. We will study how to do inference from the properties (and finally solve problems)

    我们将学习如何从属性中进行推理(并最终解决问题)

从一个现象说起

An angle to look

![image-20210906153217318](images\image-20210906153217318.png”/>

Consider not only a snapshot, but evolution

不仅考虑一个时刻(“快照”)上的状态

还要研究随时间发生的变化(内部原因 vs 外部原因)

三元闭包 Triadic Closure

  • The possible reasoning from sociology: Anatole Rapoport, 1953

    社会网络演化的基本结构性原因

    • Triadic closure: if two people, who originally don’t know each other, have a common friend, they have an increasing chance to get to know each other

      三元闭包:如果两个原本互不相识的人有了一个共同的朋友,则他们俩将来成为朋友的可能性更高

    • Opportunity 机会?
    • Trust of common friends 信任?,e.g.,宋江 / 水滸傳
    • Incentive 动机?
  • 林南: 2004

    • A network can form by natural

      一个特定的网络可以自然地形成

    • Or formed by common interest

      也可以有对一个特定的共同关注的焦点或关注一种资源利益的社会性建构

聚集系数 Clustering Coefficient

![image-20211006215549059](images\image-20211006215549059.png”/>

对节点属性刻画的一种方式 The property of a node

聚集系数

  • Clustering coefficient of A
    • = the probability of any two nodes who are neighbors of A are also neighbors = the friend-pairs of A / total pairs
    • 节点A的聚类系数= 与A的相邻的任意两个朋友之间也是朋友的概率= 与A相邻的朋友对的个数(有连线的)/总对

$$
聚类系数 = \frac{两节点都属于N_u集合的边的个数}{C_{|N_u|}^2}
$$
分子=对于节点u来说,先找出它的邻居节点集合Nu,然后计算两节点都属于Nu集合的边的个数,这就得到分子,节点u的朋友对(有连线的)

分母=计算Nu的基数(Nu元素的个数)。然后计算组合。

  • The interpretation of cluster coefficient, from social science point of view, is how coerce this person can be

    聚集系数,就是三元闭包中,对一个节点属性的测度,表示“凝聚力”的大小

  1. Pairs = {BC, BD, BE, CD, CE, DE}
  2. Friend pairs(Nu) = {CD} 只有CD有连线
  3. The clustering coefficient of A is 1/6
  4. 1/C(2,4)= 1/6

EG:

![image-20211009004434272](images\image-20211009004434272.png”/>

What is A’s clustering coefficient?(Test)

  1. Pairs = {10}

  2. Friend pairs = {CD, ED, CB, BG}

  3. 4/C(2,5)= 4/10 = 2/5

小结 A summary

  1. In a graph, node in certain location can have different properties

    在一个网络结构中,某些位置的点是具有特殊意义的

  2. Clustering coefficient is one way to describe this

    刻画一个网络结构中节点的属性,可以采用聚类系数

  3. The relationship between nodes can change along time

    节点之间的关系可以随着时间的变化而发生变化,在有些之前没有关系的节点之间,也可能出现边

  4. One structural mechanism, i.e., if there are two links among three nodes, it is likely to spin off another link

    是否出现边,其中的一个机制,是结构性机制,即三个节点之间如果有两条边,则没有边的节点之间极有可能发展出边

2.4 三元闭包原理的大数据验证 Triadic Closure Verification using Big Data

需要解决的两个问题 Two problems

  1. Transforming triadic closure from qualitative statement to quantitative statement

    将三元闭包原理最初的定性陈述转变成一种可以定量考察的表达

  2. Find an appropriate data set to verify

    找到一种合适的社会网络数据

三元闭包的两种表达 Two Statements of Triadic Closure

  1. Original statement 最初的表达

    If two people in a social network have a friend in common, then there is an increased likelihood that they will become friends themselves at some point in the future

    如果两个互不相识的人有了一个共同朋友,则他们俩在未来成为朋友的可能性增加

  1. Can be transformed to: 可以转变成

    If there is greater number of common friends of two people, who originally don’t know each other, then there is greater probability that they will become a friend.

    如果两个互不相识的人的共同朋友数量越多,则他们俩在未来成为朋友的可能性越大

Transformed statement

  1. If there is greater number of common friends of two people, who originally don’t know each other, then there is greater probability that they will become a friend.

    三元闭包原理:如果两个互不相识的人的共同朋友数量越多,则他们俩在未来成为朋友的可能性越大

用什么数据验证?How to verify

  • Email network ~ social network

    电子邮件网络 约等于 社会网络

    • The emails of the 20,000 students in a university

      一所大学的两万名学生在一年里的通信关系数据

    • Only cares of who emailed who, do not need to care about the email contents

      只在乎谁给谁何时有过通信,不需要在乎邮件内容

哪一种情形更有可能 Triadic closure: which one is more likely?

可能性怎么衡量 How to measure such probability?

Assume there are 100 node pairs, and there are no links between them. Also assume that each pair of them have 5 common friends.

假设:网络中有100对节点,某一时刻之前之间没有边。但分别都恰好有5个共同的朋友。

If, in a month, 20 node pairs have communications, we say that the probability that two people who do not know each other become friends is 20%.

如果在一个月内,其中有20对节点俩俩之间发生了通信,我们就说两个不相识担忧5个共同朋友的人,在一个月里将成为朋友的概率是20%。

“Empirical Analysis of an evolving social network,” Science 2006

数据验证过程示意

没有边的两点 共同朋友的数量 过一段时间成为朋友
(1,4) 2 *
(1,6) 3
(2,3) 2 *
(2,5) 3 *
(3,4) 0
(3,5) 2
(4,6) 2 *
共同朋友数量 成为朋友的概率
0 1 0 0
1 - - -
2 4 3 0.75
3 2 1 0.5
4 - - -

小结 A summary

  1. We have studied the property of nodes

  2. The social interpretation of the properties of nodes and links

  3. The relationship of nodes changes along time

  4. Triadic closure and its validation through big data
    From social principles -> qualitative description -> quantitative description
    Find appropriate data sets, and a systematic verification process

  • 以三元闭包原理的验证为例,我们看到了一种利用大数据分析,定量考察某些社会科学定型认识的方法
  • 其中有两个关键
    • 将社会科学原理的定性描述,转化为便于定量分析的表述,形成数据指标(与共同朋友数对应的概率)
    • 选择合适的数据,以及从原始数据中提炼出指标数据的方法

2.5 强关系、弱关系、结构洞

回顾三元闭包 Review:

![image-20210906161144615](images\image-20210906161144615.png”/>

  • This considered whether links will be formed between nodes along time.
  • We discuss more on node and also link properties
  • 这里仅仅讨论了两个节点之间是否有变,以及随时间推移而发生的节点之间边的变化,但并没有讨论边的属性,如:友好-敌对,强-弱等等
  • At the time that B, C construct a link<img src=”images\image-20210906162508305.png” images\image-20210906162508305” style=”zoom:50%;float:right” />

  • The closeness of (B, C) should not be the same as (A, B), (A, C)

    以3节点关系为例子:

    • 在A-B,A-C有关系且为朋友关系的条件下
    • B-C之间倾向于建立联系
    • 显然A-B, A-C的关系与B-C建立关系的瞬间,是不一样的
  • How to describe this formally?

    如何表达这样不一样呢

  • Granovetter (1973) propose:

    • Strong – Weak. This is a simple 0-1 description. We can investigate more

      Granovetter (1973) 提出了刻画边的属性的一种测量:强-弱,显然,这是一种简化的测量,强度是可以为连续变量的

被动参与 Passive engagement

  1. How many contacts do you use in you phone book?

  2. Triadic closure indicate that, along with time

    三元闭合表明原理实际上暗含了一个随时间推移的可能

    • People may be passively engaged into a network

      有的人会被动地加入某些网络

    • The study of Twitter by Huberman et al. 2009, even that people have over 500 friends, the active contacts are 10 – 20 and passive contacts are usually less than 50

      Huberman等人2009年对Twitter的研究表明,即使所有朋友的总数超过500,实际联系的总数也在10 - 20人之间,被动联系人的数量也不超过50人

  3. The property of links are thus very important

    因此,节点的属性非常重要

嵌入性 Embeddedness

Concept

  • Karl Polanyi (1944) first proposed in “The Great Transformation”, behavior is embedded in the social structure

    Karl Polanyi (1944) ,《大转变》,行动嵌入制度

  • Granovetter (1985) proposed the relationship between economic behavior and social structure. He proposed that economic behavior is embedded in social structure, and is one kind of social behavior

    Granovetter (1985) 在与经济学家哦的论战中,提出了经济行为与社会结构之间的关系问题,拓展了嵌入型概念。指出,经济行为是嵌入在社会结构之中的,是社会行为的一种

  • Embeddedness was introduced to analyze network structure

    后来,这个概念的应用得到了极大的拓展,甚至被引用到了网络分析

  • 网络分析,恰恰是Granovetter从他老师 Harrision White(1967) 那里得到的衣钵

  • 抽象后的嵌入型
    • The common neighbors of a link
      嵌入型:一条边两端共同的邻里数

    • Common neighbors of link (A, B), are E and F

      看A-B边,有共通的邻里E和F

    • The embeddedness of (A, B) is 2

      则A-B的嵌入性为2

    • From a social point of view, the higher the embeddedness, the higher the trust between the two

    嵌入性越强的边,相互之间的信任就越强
  • The higher the embeddedness, the more the social capital 嵌入性越强的边,社会资本也越多

Structural Holes 结构洞

回顾:聚集系数

Clustering coefficient of A = the probability of any two neighboring nodes of A are friends

节点A的聚集系数 = 与A相邻的任意两个朋友之间也是朋友的概率

对节点属性刻画的另一种方式 Another way to describe the property of a node

  • The node, after removing of which, makes the network become multiple connected components

  • 结构洞

    • 一个节点,移除该节点就会使网络变成多个连通分量的节点

    • B has early access to information originating in multiple, non-interacting parts of the network

      节点B,移除它,就会变成3个连通分量

结构洞的意义

  • 当然是指其在结构位置上所具有的意义

    • B has early access to information originating in multiple, non-interacting parts of the network

      B了解三方面的信息

    • Standing at one end of a “bridging link” can amplify its influence

      处于捷径的一端,对其“长处”有放大影响

    • Social “gate-keeping”, it regulates the access of both C and D

      社会“守门”,它规范了C和D的进入,对与其相邻的节点甚至具有“权利”

    • The less redundancy, the more social capital it has

    • 冗余(凝聚力冗余和结构等位冗余)越小的结构洞,社会资本就越多

Which are structural holes? (Test)

3,6,7,8,9,12

小结 A summary

  • We studied more properties of a graph
  • For a node, we can not only use clustering coefficient, but also structural hole to evaluate it
  • Embeddedness can be one property of a link
  • 边的属性可以用嵌入型来测度,嵌入型越强,社会资本越强
  • 对网络中某个时点结构中的边的属性,可以用强弱进行测度
  • 点的属性除了聚集系数外,还有特殊的社会意义,即结构洞的意义,冗余越小的结构洞,社会资本也越多

2.6 弱关系与捷径

依三元闭包精神 Strong tie and weak tie vs. Triadic Closure

Assume people can name whether its connection with another node is strong or weak

假设人们可以命名它与另一个节点的连接是强还是弱

From triadic closure, if (A, B), (A, C) have strong ties, it is unlikely that B and C has no relationship at all

如果一个节点A有两个强关系邻居B和C,则B和C不应该什么关系都没有

强三元闭包原理 The Strong Triadic Closure Property

  1. If a node A has edges to nodes B and C, then the (B, C) edge is especially likely to form if A’s edges to B and C are both strong ties.

    如果A和B、C之间的关系分别为强关系,则B和C之间形成边的可能性应该很高

  2. Or Granovetter suggested a more formal version 进而,我们可以说

    • We say that a node A violates the Strong Triadic Closure Property if it has strong ties to two other nodes B and C, and there is no edge at all (either a strong or weak tie) between B and C. We say that a node A satisfies the Strong Triadic Closure Property if it does not violate it.

      若A有两个强关系邻居B和C,但B和C之间没有任何关系(S或W),则成节点A违背了强三元闭包原理

      若节点A与没有违背强三元闭包原理,则称节点A符合强三元闭包原理

哪些节点符合/违背强三元闭包? Which node violates the strong triadic closure property?

  • F违背(AG没有边)
  • A违背(EF没有边)

捷径~弱关系?

  • 若节点A符合强三元闭包,且至少有两个强关系邻居,则与A相连的任何捷径必定意味着时弱关系。

    证明:

捷径 ->弱关系:大数据验证 Validation of strong triadic closure property from big data

  • The spirit of strong triadic closure property: 强三元闭包原理的精神

    • No common friend -> weak ties

      没有共同朋友 -> 捷径 -> 弱关系

  • Quantitative description: the more common friends, the higher the strength of the connection

    定量化表述:共同朋友越多,关系强度越高

    • i.e. we expect to see the strength of the connection is proportional to the number of common friends

      也就是,在社交网络中,我们预计看到人们关系的强度与共同朋友数正相关

  • Find appropriate data set

One example in book chapter 3.3

捷径 Short cut

There are links connecting two nodes, without which will lead to a long distance (a distance of at least 3)

有连接两个节点的链路,没有链路会导致较长的距离(至少3个距离)

Question: bridge and short cut

  1. Two important definitions in graph theory on special links
    Bridge: after removal, the number of connected components increases 桥接:拆除后,连接部件的数量增加

    如果一个图中,已知A和B相连,若去掉连接A和B的边导致A和B分属不同的连通分量,则该边称为桥。桥为A、B间唯一路径。

    Short-cut: There are links connecting two nodes, without which will lead to a long distance (a distance of at least 3)

    有连接两个节点的链路,没有链路会导致较长的距离(至少3个距离)

  2. Which links are short cuts? Which links are bridges?

Bridge vs. weak ties

Theorem: If A conforms to strong triadic closure, and has two strong tie neighbors ->

如果A符合强三角闭包,并且有两个强联系邻居

Then any bridge connecting to A must be weak tie.

那么任何连接到A的桥一定是弱连接。

A connection between bridge of Graph Theory from Math with Weak ties from Sociology

数学图论的桥梁与社会学的弱纽带

Strong triadic closure says the B-C edge must exist, but the definition of a local bridge says it cannot

强三元闭包说B-C边一定存在,但局部桥的定义说它不存在

If a node satifies Strong Triadic Closure and is involved in at least two strong ties, then any local bridge it is involved in must be a weak tie. The figure illustrates the reason why: if the A-B edge is a strong tie, then there must also be an edge between B and C, meaning that the A-B edge cannot be a local bridge.

如果一个节点满足强三元闭包且至少参与了两个强连接,则该节点参与的任何局部桥必定是弱连接。从图中可以看出,如果A -B边是一个强连接,那么B和C之间一定也有一条边,也就是说A -B边不能是一个本地桥。

小结 A summary

We studied properties in graph theory, and sociology

We saw different properties can be linked together, i.e., from one property to another property

We studied from social science problems à abstract into graph and à study its properties (with new definitions) à validation from big data

This is a procedure you need to learn and practice

Lec 3

3.1 同质性与社交关系 Homophily

同质性 Homophily

  1. Plato (柏拉圖): similarity begets friendship 相似性带来友谊

  2. Aristotle (亞裏士多德): love those who are like themselves 人们喜欢那些与自己相似的人

  3. Proverbs: birds of a feather flock together

    物以类聚,人以群分

  4. 夫妻相

    物以類聚人以群分乎?人們因為相似,所以他們一起了

    近朱者赤近墨者黑乎?人們因為一起,所以他們變得相似

Lazarsfeld and Merton

  • Berger, et al. 1954. Freedom and Control in Modern Society
    1. Society, people and individuals 社会控制、群体与个体
    2. State and society 国家和社会
  • Lazarsfeld and Merton (1954) distinguished
    1. Status homophily: people with the same social status

      身份同质性: 相同身份的人会彼此相互联系

    2. Value homophily: people with the same value systems

      价值同质性: 相同价值观的人会彼此相互联系

Miller McPherson, et al. 2001

  • “Birds of a feather”, introduces the mechanism of homophily 《物以类聚》提出了同质性的机制

    • The basic ecological processes that link organizations, associations, cultural communities, social movements, and many other social forms;

      生态过程:场所的影响:同一个机构、社区、共同参加活动

    • The impact of multiplex ties on the patterns of homophily;

      关系过程:交叉关系的影响,一个人在不同的场所

    • The dynamics of network change over time through which networks and other social entities co-evolve.

      网络过程:随时间的变化而变化的动态

James Moody 2001

  • Friendship of high school students
    Dark: Black, White: White
    Circle: grade 9, Square: grade 10, Diamond: grade 11, Triangle: grade 12

同质性与社会交往 Homophily and Friendship

社会交往 Friendship

  • For an individual, he has two types of characteristics

    每个人的特质可分为两种

    • Intrinsic: gender, race, mother tongue, etc

      固有特质:性别、种族、母语等,自然属性

    • Changeable: where he lives, expertise, what he likes, etc

      可变特质:居住区,专场,偏好,建构属性

  • Homophily is the external reason for the creation of social networks

    同质性是社会网络结构形成的基本外部原因

    • Common in race, locations, expertise, interests
    • 血缘、地缘、业缘、趣缘等
  • One key question in social sciences 社会学中的一个基本问题

    1. Commonality -> friendship ? (selection)

      共性——>友谊 ? (选择) 是因为“羽毛相似”才交往呢

    2. Friendship -> commonality ? (social influence)

      友谊——>共性 ? (社会影响) 还是因为“同林”之后才变得”羽毛相似“
      呢?

Discussion

  • Facts:
    Hong Kong has a lot of universities 香港有很多大学

    Students are coming from different high schools at different places (maybe around the worlds) 学生来自不同的高中在不同的地方(可能是世界各地)
    Assume a few students from the same high school enter HK PolyU 假设有几名来自同一所高中的学生进入香港理大

  • Discussion:
    At the time these students graduate, what do their friendship (social structure) look like

    当这些学生毕业时,他们的友谊(社会结构)是什么样的

小结

  • 同质性是一个古老的议题,从柏拉图时代就已经开始讨论
  • 在齐美尔(Simmel)的基础上,Lazarsfeld 和 Merton 提出了同质性的两个类型和机制
  • 进一步的研究提出,影响同质性现象的还有外在因素
  • 在社会交往中,同质性与网络之间有着更为复杂的关系

3.2 社交网中同质性的一种测量方法 Measuring Homophily

Homophily in social networks 社交网络中的同质性现象

  • Friend ~ Similar 朋友~相似

  • The definition of similarity can be different in different problems

    “相似”的含义可因考虑的问题不同而不同

    • 固定特征,可变特征

如何定量评估一个社交网络中同质性现象的程度 How to measure the degree of similarity?

  • Given a social network where the nodes have only two properties: red and white

    给定一个社交网络 (只考虑两种不同特质:红,白)

    The information we can have:

    1. The number of nodes (n), the number of links (e) 节点数(n),边数(e)

    2. The ratio of different colors: p, q = 1 – p

      不同颜色节点的占比:p, q = 1 - p

    3. The number of links (s) where the two end nodes have the same color

      两端节点颜色相同的边数(s)

  1. The number of nodes 节点数n = 9

  2. The number of links 边数e = 18

  3. The ratio of red nodes 红色节点占比p = 1/3

  4. The ratio of white nodes 白色节点占比q = 2/3

  5. The number of links where the two end nodes have the same color s = 13

    两端节点具有相同颜色的边数 s=13

Insights 认识

The more links with the same color end-nodes, the higher the homophily

两端节点相同的边越多(占比越高),同质性越明显

![image-20210913140127089](images\image-20210913140127089.png”/>

两端节点相同的边越多(占比越高),同质性越明显

How high is high? Is there a benchmark?(13/18)

多高才算高?有没有一个基准?

  • Use random probability as a benchmark 使用随机情况作为基准

  • Assume, red ratio is p, white ratio q, 给定不同颜色节点的占比:假设红比为p白比为q

  • In random, the probability of a node is red is p, the probability of a node is white is q ->

    在随机情况下,一个节点是红色的概率是p,白色的概率是q

    • The probability that a link has the same-color end-nodes is p^2^ + q^2^

      那么任何一条边的两节点颜色相同的概率就是p^2^ + q^2^,也就是两端节点相同边的占比

How to evaluate the degree of similarity?

  1. The number of nodes n = 9
  2. The number of links e = 18
  3. The ratio of red nodes p = 1/3
  4. The ratio of white nodes q = 2/3
  5. The number of links where the two end nodes have the same color s = 13

13/18 vs. $(1/3)^2$ + $(2/3)^2$

13/18 vs. 1/9+4/9=5/9

13/18 > 10/18

3.3 物以类聚 人以群分

经验观察 Selection

Plato (柏拉圖): similarity begets friendship 相似性产生友谊

物以類聚,人以群分

Aristotle (亞裏士多德): love those who are like themselves 人们喜欢那些与自己相似的人

物以類聚,人以群分
There are exceptions, e.g., 一山不容二虎

理论 Theory

  • Lazarsfeld and Merton (1954) 区分了选择的机制
    • Status homophily: people with the same social status

      身份同质性: 相同身份的人会彼此相互联系

    • Value homophily: people with the same value systems

      价值同质性: 相同价值观的人会彼此相互联系

    • The core is “selection” of similar group of people

      即人们通过自然属性或社会属性的相似性、或价值观的相似性进行选择

如何理解 Interpretation

  • Why social status and value systems may influence the homophily of social networks

    为什么身份和价值观会影响社会网络同质性的同台

    1. Giddens, 1984; 1991, people have the incentive and freedom to join

      个体具有“加入”的主动性和自主性

    2. Example, your friend brings a friend (to you, he is a stranger)

      你的朋友带了一位对你而言是陌生人但却是你朋友的朋友的人来见你

    3. People also have the freedom to leave

      同样,人们也有退出的主动性和自主性

      • Example: Club members, add-drop subjects

        例子:教会的吸收新成员,学生社团吸收新成员

    4. Homophily is a self-selection; it is a dynamic process

      网络的同质性,实际是一个动态的过程,即使是主动选择的。

      从属网络

      Become friend because of common interests 因为共同的兴趣成为朋友

      Anna – karate club – Daniel form a closure

      安娜-空手道俱乐部 - Daniel來自一個閉合

    Real world example

    1. Affiliation network: the memberships of people on corporate boards of directors

      隶属网络:公司董事会成员的身份

    2. A very small portion of this network (as of mid-2009) is shown here

      这里显示的是这个网络的很小一部分(截至2009年年中)

    3. It usually helps to analyze the construction of the board

      分析兼职的结构,以及他们之间的个人关系,对理解公司的行为有帮助

    选择与社会交往 Selection and friendship

    1. Primarily a proactive process, you can create opportunities

      社会网络的建构,无论是加入,还是退出,都可能是一个主动的过程

    2. passive join is possible (embeddedness)

      尽管被动参与(参见嵌入性)也是一种机制

3.4 近朱者赤 近墨者黑

经验观察 Social Influence

  • 传统故事:孟母三迁

  • 俗语:近朱者赤,近墨者黑

  • 先結婚,後戀愛

  • These are phenomena; we need to abstract them into theory

    这些现象; 我们需要把它们抽象成理论

理论 Theory

  • Miller McPherson, et al. 2001 “Birds of a feather”, introduces the mechanism of homophily

    介绍了同质性的机制

    • The basic ecological processes that link organizations, associations, cultural communities, social movements, and many other social forms;

      生态过程:场所的影响:同一个机构、社区、共同参加活动

    • The impact of multiplex ties on the patterns of homophily;

      关系过程:交叉关系的影响,一个人在不同的场所

    • The dynamics of network change over time through which networks and other social entities co-evolve.

      网络过程:随时间的变化而变化的动态

  • Emphasize that affiliations are also important

    强调生态型的重要性

从属关系与社会关系的相互作用随时间发生的变化

Become friend because of common interests
Anna – karate club – Daniel form a closure

  1. Two people don’t know each other, get to know each other in karate club, become friends

    两个人不认识,在空手道俱乐部认识,成为朋友

  2. Daniel even join literacy volunteers

    丹尼尔甚至加入了扫盲志愿者

​ after some time

  • A closure, but not because of a common friend, but because of influence by a friend

    同样是“闭包”,不过,这次是因为人的关系在先,对另一场所的兴趣在后,收到影响之后,有了兴趣,找到了“归属”,即社会影响

<img src=”images\image-20210913143225261.png” images\image-20210913143225261” style=”zoom:67%;” />

Selection -> focal closure 选择->焦点闭合
Social influence -> membership closure 社会影响->成員閉合

qualitative statement 定性 -> quantitive statement 定量

  1. Triadic closure: if two people, who originally don’t know each other, have a common friend, they have an increasing chance to get to know each other

  2. Focal closure: if two people, who originally don’t know each other, have join a common focus, they have an increasing chance to get to know each other

    两个人参加了同一个focus,从而成为朋友的可能性更大,被称为focal closure

(quantitative statement)

If there is more number of common focus of two people, who originally don’t know each other, then there is greater probability that they will become a friend.

  1. Membership closure: 两个人是朋友,其中一个人参加了某个focus,那么另外一个人也倾向于参加该focus

社会归属网:描述从属关系与社会关系 Co-evolution of social and affiliation network

  • 在现实社会中,选择与影响似乎很难明确区分,实际是交替甚至同时发生的现象,同质性是两种机制共同的结果

    选择——社团闭包

    影响——会员闭包

社团闭包的验证 Validation of focal closure

  • Recall how we validate triadic closure

    回忆一下我们如何验证三元闭包

  • Focal closure 社团闭包
    Connection is established when working on the same “focus”

    由于参与同一个事情,两个原本没联系的人之间,建立了联系

  • The more shared foci, the higher the chance of connections (selection), i.e., establishing an edge in the graph

    共享焦点越多(共同参与的事情越多),建立联系的可能性越高,即在图中建立一条边

  • Joint subject selection, the more common subjects, the higher the probability that they are connected (i.e., sending emails) 联合主题选择,越常见的主题,他们连接的概率越高(即发送邮件)

  • Details see the book

会员闭包的验证 Validation of membership closure

  • Membership closure 会员闭包

    1. Because a friend has certain “focus” -> join the focus

    2. The more friends are in a focus, the higher probability one will join this focus (get influenced)

      由于朋友参与某件事情中,原本不在这件事情的另一个也有可能加入这个事情(受到影响)

  • The plot shows the probability of joining a LiveJournal community as a function of the number of friends who are already members

    参与某件事情的朋友越多,其被影响而参与的可能性就越高

  • Details see the book

Discussion

Interplay of the following groups in Hong Kong 香港下列族群的相互影响
triadic closure, membership closure, focal closure 三元闭合,隶属闭合,焦点闭合
Hong Kong resident, visitors from mainland and foreign countries, immigrants from mainland and foreign countries
香港居民、内地和外国游客、内地

外国移民 年龄,关系等等

Age, and affiliations, etc

Lec 4

4.1 “朋友~相似”现象溯源 利用大数据分析同质性 Big data analysis on the origin of homophily

朋友间相似的原因 The reasons that friends are similar

  • The phenomenon: two people are friends -> :两个人是朋友

    they are similar in one way or another 他们在某些方面是相似的

  • We need a longer time to see the evolution, the interplay of selection and social influence

    我们需要更长的时间来观察一群人,看到他们之间关系的演变,以及他们参与的社会活动

如何反映这一点? How to reflect this?

需要一个数据集 We need a data set

  • Which can reflect the social evolution along time

    反映随时间变化的大规模社会归属网

  • Large scale, in terms of number of people, and number of interactions

    大规模,人多,社交聚点多

  • Evolve along time, people vs. people, people vs. foci

    随着时间的发展,人和人之间,人和社交聚点之间

  • Wikipedia: 500K editors, > 3 million articles 维基百科:500K编辑者,> 300万篇文章

Using online data set of Wikipedia

Every editor has a user talk page, other editor can post on it; thus a social network is created

每个编辑都有一个用户讨论页面,其他编辑可以在上面发帖;这样就形成了一个社交网络

  • Two editors become similar: selection vs. social influence

    两个编辑之间相似性的变化与“自然选择”和“社会影响”的关系

  • Before communication, similarity (editing the same article) is because of selection; after arriving at a certain point where they start to social, social influence starts to impact

    没有联系(通信)之前,相似(编辑相同文章)主要是因为选择;达到足够相似度时则容易发生联系,然后社会影响开始对相似性提高其作用

两人相似性的测量 Measuring similarity

Similarity = (number of articles edited by both X and Y)/(number of articles edited by at least one of X or Y)

相似度= (X和Y都编辑的文章数量)/(X或Y中至少一个编辑的文章数量)

相似性:2/4 = 1/2

Question

Given the similarity we just defined,
Assume X edited A, B, C, D, E 5 articles, Y edited C, D, E, F, G, H 6 articles,
What is the similarity between X and Y

鉴于我们刚刚定义的相似性,
假设X编辑A, B, C, D, E, 这5篇文章,Y编辑C, D, E, F, G, H 这6篇文章,
求X和Y之间相似度

X and Y: C, D, E

3/8

相似性、选择与社会影响

4.2 谢林模型 The Schelling Model

从一个现象开始 Starting from a phenomenon

Mobius and Rosenblat, 2001
A study on African American residential areas in Chicago 对芝加哥非裔美国人居住区的研究

Black area means 50% of residence are African America

黑色区域意味着50%的住宅是非裔美国人的

  • Phenomenon: 现象

    More and more black people stay at certain district 越来越多的黑人在某个区域聚集

  • Interpretation: 理解

    The same in race, and select to stay at the same place 同样的种族,并选择停留在相同的地方

    Get to know each other and go to stay at the same place 互相了解,待在同一个地方

This introduces segregation 这引入了隔离

Schelling model 谢林模型示意

  • Schelling 1972, 1978, a study on segregation
    • No one individual explicitly wants a segregated outcome

      隔离的动态模型(1972):隔离不是个人刻意选择的后果

    • Micro incentive vs. macro behavior

      微观动机与宏观行为(1978,2)

  • Agents living in a cell 节点为居住者(单元)

  • Two types of agents, x and o 两类的居住者 (X和O)

  • Constraints: 约束条件

    • every agent needs to have a certain number (t) of same-type neighbors :

      每一个居住者都要与有一定数量(t)的同类为邻居

  • Dynamics: 动态

    • if an agent find that there are fewer same-type neighbors, he will move

      如果一个居住者发现自己的邻居数小于t,他就有兴趣搬家,以满足邻居数

  • Assume t = 3, then satisfy, when ≥ 3

    假设t为3, 当大于等于3 表示满意

  • Use * to represent < 3 小于3,就搬家

  • 一段时间过后如下图

谢林模型的社会意义 Schelling model in dynamics

  • 150 x 150 (街区空间)
  • X, O = 10,000 (X,O两类肤色的人群,各10,000户)
  • t = 3 (也就是自己的邻居有3户或者3户以上一样的肤色就不搬家)
  • Start setup: random (初始搬家户,在有搬家意愿的住户下随机产生)
  • After 2 rounds
  • 150 x 150
  • X, O = 10,000
  • t = 4
  • Start setup: random
  • After 20, 150, 350, 800 rounds

小结 A summary

Use residency segregation as an example, Schelling model emulate the dynamics of homophily

以居住隔离为例,谢林模型模拟了同质性的动态变化

If homophily is a nature phenomenon, promoting it or preventing it under different circumstances, will have an impact to social evolution

如果同质性是一个自然现象,则促进或阻止不同的社会情景下的同质性,将会对社会发展产生重要影响

Discussion, a true story

Some years ago, Yangzhou University established apartment specifically for students from low income families. The residential fee is only 40% of the regular fee. 几年前,扬州大学专门为低收入家庭的学生建造了公寓。住宿费仅是普通住宿费的40%。

However, this brought about a lot of controversy 然而,这引起了很多争议

Discuss the pros and cons of such policy 讨论这种政策的利弊

  • Study the computational thinking of social networks 研究社交网络的计算思维

  • Convert descriptive ideas into models/problems that can be analyzed rigidly 将描述性的想法转化为可以严格分析的模型/问题

    • Mainly using graph 主要使用图
  • Study a specific topic of social network: homophily 研究社会网络的一个特定主题:同质性

  • Big data verification 大数据验证

    • Study the procedure: a real world problem, rigid formulation/modeling, a data set to validate 研究过程:一个真实世界的问题,严格的公式/建模,一个数据集验证

Lec 5

5.1 小世界现象及其惊奇 Small world

An (controversial) experiment (有争议)的实验

  • Stanley Milgram (Harvard) 1967

    • Many would like to prove it or prove it wrong 许多人想要证明它或者证明它是错误的
  • We study some intrinsic properties of social network 我们研究了社会网络的一些内在性质

  • Stanley Milgram (Harvard) 1967

    • 现象:“My it’s a small world”

    • Question: if two strangers want to get to know each other, how many intermediate hops/friends are needed to link them up? 问:如果两个互不相识的人想要互相认识,中间需要经过几个人?

    • Importance: a certain mathematical structure in a society

      意义:社会中的某种数学结构

    • Milgram: feel curious whether there are some structure in the social network that can be mathematically described 对社会网络中是否有某种结构可以用数学来描述感到好奇

    • 假设:

      • 由于每个人都有熟人,熟人之间没有芥蒂,可以交往,所以,不曾有连接的两个人之间,如果要建立连接,中间人的数量应该不多,
      • 每个人确实都有熟人,不过,不同类型或阶级熟人之间,不会有交往。所以,不曾有连接的两个人之间,不可能建立连接。
    • The design of the experiment: 实验的设计

      • Random chosen starters (296), see how many intermediate hops to reach the target person

        选择一个随机起点,观察需要经过多少个中间人

    • Rules:

      • Participants can send a mail to someone he knew on a first-name basis

        参与者只能将信件转发给能直呼其名的熟人,并请他继续转发。如果不认识,则不能将信寄给他

      • Every one try his best to reach the target person

        参与者需要力争让信件能尽快到到目的地

      • e.g., from Wichita Kansas to Harvard in Boston, from Omaha, Nebraska to Boston, etc

        第一次: 从Kansas的Wichita到哈佛大学神学院的某学生的妻子

        第二次:从Nebraska的Omaha到Boston的Shanron的股票经纪人JT

    • 结果:平均中间人数:5

    • Median length 6

    • The six degrees separation

实验带来的惊奇 An astonishing experiment

  • Small world problem

    小世界问题

    • Before this experiment, people do suspect that our world is “small”

      在Milgram的研究之前,人们感觉到世界很小,却没有证据

    • MIT faculty and students have tried ways to verify this

      MIT的师生试图证明只一点,不过没有结果

    • Milgram designed this experiment

      Milgram的实验,用信件传达,得到了一个平均数

  • Small world phenomenon

    小世界现象

    • Milgram study showed: Milgram的研究证明

      1. The world is small (six degrees of separation), i.e., a lot of short paths in social networks:

        **世界是小的(六度分离)**,社交网络中包含丰富的短路径

      2. Short paths can be “discovered” by itself, conscious forwarding can automatically discover these short paths

        “自动寻找”短路径:“有意识的转发”可以“自动地”找到这些短路径

启发?Why?

  • Why social networks have such property? What kind of principles of social networks this property can be derived from?

    为什么社交网络具有这样的性质?它们源于社会网络的哪些基本原理?

  • Can we construct a mathematical network model which reflects such property of social networks?

    能否依据社会网络的某些基本原理,构建出反映出这种性质的的数学网络模型?

小结 A Summary

  • An experiment trying to show that the world is small may reveal some intrinsic mathematical properties exist between people

    一个试图证明 “世界是很小的” 简单实验,提示了或许在人际之间,的确存在着某种数学结构

5.2 小世界现象的普遍性 Generality

小世界现象有多普遍 How general the small world phenomenon is?

  • Milgram’s experiment is not entirely rigid in today’s standard, but it is very creative

    米尔格拉姆的实验在今天的标准中并不完全严谨,但它很有创造性

    • It shows the world is small (in terms of separation between people)这表明世界很小(从人与人之间的距离来看)
    • There are abundant short paths 有大量的捷径
    • Without any sort of global “map” of the network, people are effective at collectively finding these short paths 没有任何类型的网络全球“地图”,人们可以有效地集体找到这些短路径

**Is this generally applicable or an exception case? **这是普遍适用的情况还是例外情况?

后续研究

  • Many follow-up studies, including Milgram himself in 1970

    不少重复研究,包括Milgram自己在1970年的研究

  • Extension to emails 扩展的实验——电子邮件

    • Dodds, Muhamad, and Watts 2003

      • 60,000 accounts

        6W个电子邮件用户

      • Send to 13 country, 18 recipients by forwarding to acquaintance

        通过给熟人转发电子邮件的方式,将邮件发送到13个国家的18个收件人

    • Discovery: 发现

      • number of hops 5 – 7 in average

        中间路径:平均5 - 7步

      • 通过认识的人,不一定有多熟悉

网页之间的“社交”

  • There are billions of webpage, more than the population on earth

    全球互联用有超过几百亿的网页,比人类的的总数还多

  • 问题

    • Is there small world phenomenon between webpage?

      网页之间,有怎么样的关系,是否存在“小世界”现象?

  • Albert, Jeong, Barabási (1999), Barabási (2013)

    • The distance between two web pages is 18.59 on average

      没有关系的两个网页之间的直径是18.59次点击

      The question is: why?

小结 Summary

通过熟人送文件(书信)在不断的检验小世界现象的存在

即使加上了种族因素,小世界依然存在

在电子邮件时代,通过熟人转发电子邮件,仍然有小世界

即使是网页之间的社交,也有短路径

5.3 小世界的基本模型 Small world: The Watts-Strogatz-Kleinberg model

人类社会的小世界现象 Small world phenomenon

  • In social networks, there are a lot of short paths

    社会网络中两节点包含丰富的短路径

    • There is high probability that there are short paths between any two nodes

      任意两节点之间存在短路径的概率很大

  • Decentralized search can find these paths

    短视搜索能够有效地找到这些短路径

    • Decentralized search: every node can only see its neighboring nodes

      短视搜索:在达到目标节点过程中,每一步只能只能看到它的邻居节点

      对于十分稀疏的社会网络来说,这并不是必然的

  • Decentralized search vs. breadth first search

    • Breadth first search: send to every friend 广度优先搜索:发送给每一位朋友

    • Breadth first search can reach the target (as long as the network is a connected component) 广度优先搜索可以达到目标(只要网络是连通组件)

    • It is possible that decentralized search fails to reach the target person; yet it doesn’t

      短视搜索可能找不到目标人;

  • This is not that straightforward

    • There can be tens of millions of people in a social network, yet each person may only have a connection of one hundred neighbors

      一个社交网络可能有数千万人,但每个人可能只跟100个邻居有联系

  • This means that the network is quite sparse 这意味着网络非常稀疏

    • May not have a lot of short path, especially for every pair of starters and targets

      可能没有很多短的路径,特别是对于每一对寻找者和目标

    • The success of decentralized search is not easy to immediately believe

      短视搜索的成功并不容易被立即相信

从现象到问题

  • Question

    • Why social networks have such property? What kind of fundamental social principles can this be derived

      为什么社交网络有这样的性质?他们源于社会网络哪些基本原理?

      • It can be proved that a pure random network does not have such property

        可以证明,完全随机的网络没有这样的性质

      • Social network is also random, any difference?社交网络也是随机的,有什么区别吗?

    • In other words, can we use certain social network principles to prove that this is the case

      换句话说,我们是否能依据社会网络的某些基本原理,说明这种性质的必然性?

形成社会网络的两种基本力量 The fundamental force to create social networks

  • Homophily 同质性

    • Common friends, schoolmates, common interests 共同的朋友,同学,共同的兴趣,邻里关系

    • A circle with a lot of triangles: due to closures

      对应社会网络中有大量的三角形(圈子):由于闭包

  • Weak tie 弱联系

    • Occasional friends, remote friends

      偶然的原因,认识的“远程”的朋友

    • He is not familiar with the circles of these friends 对其所在的圈子并不一定熟悉

What kind of network structure can show these two forces, and also easy for us to analyze the small world phenomenon?

一种什么样的形式化网络,既能表现出这两种力量的作用,又便于我们分析其中是否有小世界现象呢?

Watts-Strogatz 模型 The Watts-Strogatz model (Nature, 1998)

  • Define a graph (network), that can reflect the above two factors

    定义一种图(网络),能体现上述因素

    • A lot of triangles and a few random but remote links

      有许多 “三角形” 和少数随机的”远程边“

      想象大量节点排布成均匀网格状 连接近邻:确定性 连接远程:随机性

  • Two types of distances

    • Grid distance: physical distance

      网格距离:(空间位置上的远近)

    • Network distance: number of hops in separation

      网络距离:由节点之间的边来确定的最短路径的长度

      网格距离近的,网络距离就小。网格距离大的,网络距离不一定就大。

Our goal: network distance is small for every pair of node

我们的目标是:每对节点的网络距离较小

  • Captures homophily and weak ties 抓住同质性和弱关系

  • Can be seen as a reasonable model for a true social network

    体现了同质连接和弱关系连接的概念,于是可以看成是现实社会网络的一个合理近似。

  • It can be proved that in such network, there is high probability to have short paths

    可以证明:在这样的网络中,任意两节点之间存在短路径的概率很大

  • It can also be proved that the Watts-Strogatz model cannot reflect the second requirement: decentralized search cannot easily find the short paths

    也可以证明,Watts-Strogatz模型不能很好地反映第二个要求:

    • 短视搜索的路径太长,尽管短路径存在

    • Though short paths exist

      虽然有捷径

小结 Summary

  • For important social phenomenon, it would be good if we can find a mathematical model to capture it Even though not entirely matching

    对于重要的社会现象,如果可以用一个数学模型来解释,尽管这个模型概括不了现象的所有细节,也是很值得追求的

  • Watts-Strogatz model abstracts the fundamental forces for the creation of a social network, and explain that small world phenomenon is inevitable
    A milestone, which for the first time links the social network properties, homophily and weak ties, with small world phenomenon

    Watts-Strogatz模型,抽象地表达了社会网络成因的基本特征,从理论上说明了小世界现象(一个方面)的必然性

5.4 小世界的精细模型——Watts-Strogatz-Kleinberg 模型

The limitation of Watts-Strogatz model(局限性)

  1. Provide a model where there is a high probability to have short paths between two nodes, i.e., small world

    证明了模型网络中任意两个节点之间存在短路径的概率很高,即小世界

    • Many software package directly support it
  2. But, it cannot explain what was observed by Milgram et. al, decentralized search can find short paths

    但是不能解释Milgram等人实验反应出的小世界现象的另一个层面,在短视搜索情况下可以找到短路径

    • Decentralized search in Watts-Strogatz model usually lead to long paths

      在模型上执行短视搜索,常常导致较长路径

What is decentralized search 体会“短视搜索“概念

  • It is common in our social network

    这在我们的社交网络中很常见

    • What do you do when you want to do something and seek for help, say want to go for a travel in Tibet? 当你想做一些事情并寻求帮助时,比如想去西藏旅游,你会怎么做?

    (1) people know the objective (2) every search has and only has local knowledge (3) one proceeds by comparing with the objective

    (1)人们知道目标(2)每次搜索都有且只有局部知识(3)通过与目标进行比较进行

Decentralized search 短视搜索

  • As compared to breadth first search (without an objective), this is a search with certain objective based on local information

    相对于广度优先搜索(没有目标),这是一种有目标的基于局部信息的搜索,具有如下特点

    • Every node has a property, and two nodes can measure the distance in terms of this property (note the difference between the distance in graph)

      每个节点都有一个特征,任何两个节点之间的特征可以谈差别(距离)(不同于图论中定义的距离!)

    • Every node knows the property of the target node, and the property of neighboring nodes每个节点都知道目标节点的特征,也知道自己和自己邻居节点的特征

    • Search can be seen as a forwarding process: forward the information to a node that has a small distance to the target node (in terms of property)

      搜索过程可以看成是一个信息传递的过程,节点将信息传到离目标节点距离较近(差别较小)的邻居节点

An example of decentralized search 短视搜索例子

  • 16 nodes, 0, 1, …, 9, A, …, F

  • Property: represented by the circle

  • Distance: the location distance between two nodes in the circle

  • Distance between 0 and A is 6

  • Starting from 0, decentralized search to A is 0-C-B-A, why?

  • The shortest path is 0-F-A

  • Decentralized search doesn’t go through the shortest path

    短视搜索不会经过最短路径 (0-F-A will never be found through Decentralized search )

Question

  • Starting from 0, target 9, the path output by decentralized search is:

一种一般的认识论方法

  • There can be certain property for an object (e.g., the structure of a network) looking from a global view, but if we do not have global info, and we pursuit the target from local info, we cannot find that property

    经常,在事物的宏观格局中存在某种性质,但若缺乏宏观视野,仅凭基于微观视野的追求,不一定能发现那种性质

From small world, we see

  • From empirical study, we see that, decentralized search is good in human social networks
    This means that the structure of human social network has certain property that supports decentralized search 通过实证研究,我们发现去中心化搜索在人类社会网络中是很好的
    这意味着人类社会网络结构具有支持去中心化搜索的特性
  • Decentralized search is not good in Watt-Strogatz model
    This means that certain property of human social network is not captured去中心化搜索在Watt-Strogatz模型中并不适用
    这意味着人类社会网络的某些属性没有被捕获

We need a new model

  • Reflect that there exists short paths and also these short paths can be discovered by decentralized search

    即反映任意节点对之间短路径的存在性,也支持在这种信件转发方式下短路径的可实现性

    What kind of property? 网络中需要什么样的结构特征来体现这样的要求?

  • No matter how far the two nodes is, they can approach each other very fast – for short paths to exist

    两个节点无论相距多远,都要有机会很快接近

  • The closer the two nodes is, the higher the probability that they can directly reach each other

    两个节点的距离越近,存在直接连接的机会越大

    • In other words, the further away of two people, the less the probability they form a “weak tie”

Watts-Strogatz-Kleinberg model 模型

不同q值对随机连接长度的影响 The impact of q

  • In Watts-Strogatz model, q = 0, i.e., d has no impact

    • d, the grid distance,
  • what if q is very big, e.g., ∞?

The best q (Kleinberg, Nature 2000) 该模型的最佳工作参数 q

  • In theory, q = 2 is the best for decentralized search

    理论结果,当q=2时,短视搜索达到最佳效果

  • Simulation, q = 1.5 ~ 2. The larger the network, the closer q is to 2

    仿真实验:由几亿个节点组成的网络中,考察不同的q值在短视搜索中的效果

Summary 小结

WS model cannot fully reflect some properties of human social network, this leads to WSK model
WSK model controls the probability of the random links, and shows better matching to real world cases
There is a control parameter q

发现WS模型不能反映现实社会网络的一个重要特征,促进了WSK模型

WSK模型通过适当控制WS模型中的随机性,与实验结果更加吻合

该模型中出现了一个优化参数 q, 当取特定值的时候效果最好,这个参数在现实社会网络中怎么体现的呢?

Review

  • Curiosity: whether people can find a stranger easily, i.e., small world?
    • Milgram experiment
  • Curiosity: is Milgram’s experiment an exception case?
    • Many similar results
    • A summary of the results -> many short paths, decentralized search can easily find target
  • Curiosity: any abstraction to explain why?
    • WS model, link small world with closure and weak tie WS模型,用封闭和弱联系连接小世界
    • WS model is not good enough, in terms of decentralized search -> WSK model with a “q”
  • Curiosity: what is this “q”, and inverse-square?
    • Possible interpretation: the further away two people, the less probability that they form a “weak tie” 可能的解释:两个人距离越远,他们形成“弱联系”的可能性就越小。
    • In other words, the closer they are, the easier they form a “weak tie” 换句话说,关系越亲密,就越容易形成“弱纽带”
  • Curiosity: can WSK and q be verified?

5.5 WSK模型中优化参数的大数据验证 Small world: Verification with Big Data

在WSK网络模型上的最优参数q=2

  • Milgram has demonstrated, in social networks, decentralized search also produces short paths.

    Milgram的实验表明,现实社会网络中,短视搜索的路径也很短,但值得好奇。

  • Social network is naturally constructed

  • During such construction, there is this magic q and q = 2, hmm…

    <img src=”images\image-20210927134717732.png” images\image-20210927134717732” style=”zoom:50%;” />

  • Curiosity: does the probability that people become friends really decreases according to geographic distance and the intensity of decrease is inverse-square distribution?

    难道人们称为朋友的概率真的随空间距离递减,并且递减强度幂次q真的等于2吗?

  • 3/4, 3/10, 4/12

利用在线社会网络进行验证

<img src=”images\image-20210927134816415.png” images\image-20210927134816415” style=”zoom:50%;” />

  • Does a real online social network reflects the property of W-S-K?

    真实大规模在线社会网络是否体现了这个(WSK)网络模型的优化性质?

    • The probability that two people become friends is inverse-square distribution

      两人成为朋友的概率与其空间距离的平方成反比

  • If yes, then a randomly (naturally) formed social network may have certain intrinsic parameter

    如果是,则说明随机形成的社会网络可能具有某种本质的参数!

  • In WSK model, there is grid distance. What is the distance in an online social network? A geographic distance? What does “geographic” mean?

    但是,在线社会网络的节点间如何谈空间距离?

Live Journal (LJ), founded in 1999

Live Journal的实验数据
  • Registered users (500K) must provide their ZIP code

    50万用户,含邮政编码信息(地理位置)

  • So we have both the geographic distance and graph distance (i.e., friends)

  • The geographic distance of LJ users

    但他们是分布不均匀的,不符合模型的假设,需要做一些“适配性”工作

The population density is non-uniform

Compare the distance

  • For a social network, the density of people in a geographic location can be more important than the geographic distance

    对于一个社交网络来说,地理距离范围内的人数比距离具有更实质性的意义

  • In WSK model, the nodes (in terms of grid distance) are uniformly distributed

    在WSK模型中,节点(网格距离)是均匀分布的

社会网络中结合地理距离的节点相对排名 Define rank

  • Rank is proportional to distance when the nodes follow uniform distribution

    可以看成是节点在地理上均匀分布时 区域范围概念的一种推广,“排名”与“距离”有对应关系

  • We now use the rank to represent d^2^

  • We solved non-uniform distribution

    这就能使我们能一般性处理节点在地理上分布不均匀的问题了

Statement translation

  • Originally we want to verify the statement要验证的是
    • (In the situation where people are uniformly distributed) the number of friends of a node A is inverse-squared to their geographic distance (1/d^2^)

      在均匀地理分布情形,一个节点在任一距离上的朋友数量在同等距离节点总数中的占比随距离平方递减(1/d^2^)

  • Now we translate the statement to此时等价于要看
    • The number of friend of a node A (i.e., with connection to this node) is inverse to the rank 一个节点在任一排名上的朋友(即有连接)数量在同等排名节点总数中的占比随排名递减(1/r)

PNAS 2005

  • Empirical data (the parameter) from a real social network matches the parameter of the theory model, i.e., inverse to 1/r (see figure, 1.15~1.2)
  • If divide the data into west/east coast, it is even more perfect
  • 纵轴是概率,横轴是rank。
  • 第一张图r^-1.2^接近于1

  • 第二张图,我们分为东海岸,西海岸,可以看到 一个是1.05, 一个恰好为1

Interpretation

  • Huge naturally constructed social networks of individuals (micro) demonstrated a macro optimization behavior 巨大的自然构建的个体社会网络(微观)展示了宏观优化行为
  • The optimal q (q = 2) is derived from a model and computed by centrally controlled computers最优q (q = 2)是从一个模型中推导出来的,由中央控制的计算机计算
  • Human social network collectively achieved this 人类社会网络共同实现了这一点
  • A case showing social computation

小结 Summary

  • We studied the small world phenomenon in depth

  • We see experiment -> theory -> improve and perfection -> experiment
    Small world -> interpretation by abstraction to WS model -> better interpretation by abstraction to WSK model, and we observe q -> data verification, and we understand more

  • We also see that the potential of big data to assist our study

    我们看到了“实验——理论——完善——实验”研究范式的体现

    • 小世界现象——抽象模型(解释现象)——完善模型(更好的解释)——数据验证(得到对现实更深入的认识)
  • 也看到了大数据分析在推进这类研究中的作用

  • Berger, et al. 1954. Freedom and Control in Modern Society

    1. Society, people and individuals 社会控制、群体与个体
    2. State and society 国家和社会
  • Lazarsfeld and Merton (1954) distinguished

    1. Status homophily: people with the same social status

      身份同质性: 相同身份的人会彼此相互联系

    2. Value homophily: people with the same value systems

      价值同质性: 相同价值观的人会彼此相互联系

Miller McPherson, et al. 2001

  • “Birds of a feather”, introduces the mechanism of homophily 《物以类聚》提出了同质性的机制

    • The basic ecological processes that link organizations, associations, cultural communities, social movements, and many other social forms;

      生态过程:场所的影响:同一个机构、社区、共同参加活动

    • The impact of multiplex ties on the patterns of homophily;

      关系过程:交叉关系的影响,一个人在不同的场所

    • The dynamics of network change over time through which networks and other social entities co-evolve.

      网络过程:随时间的变化而变化的动态

James Moody 2001

  • Friendship of high school students
    Dark: Black, White: White
    Circle: grade 9, Square: grade 10, Diamond: grade 11, Triangle: grade 12

同质性与社会交往 Homophily and Friendship

社会交往 Friendship

  • For an individual, he has two types of characteristics

    每个人的特质可分为两种

    • Intrinsic: gender, race, mother tongue, etc

      固有特质:性别、种族、母语等,自然属性

    • Changeable: where he lives, expertise, what he likes, etc

      可变特质:居住区,专场,偏好,建构属性

  • Homophily is the external reason for the creation of social networks

    同质性是社会网络结构形成的基本外部原因

    • Common in race, locations, expertise, interests
    • 血缘、地缘、业缘、趣缘等
  • One key question in social sciences 社会学中的一个基本问题

    1. Commonality -> friendship ? (selection)

      共性——>友谊 ? (选择) 是因为“羽毛相似”才交往呢

    2. Friendship -> commonality ? (social influence)

      友谊——>共性 ? (社会影响) 还是因为“同林”之后才变得”羽毛相似“
      呢?

Discussion

  • Facts:
    Hong Kong has a lot of universities 香港有很多大学

    Students are coming from different high schools at different places (maybe around the worlds) 学生来自不同的高中在不同的地方(可能是世界各地)
    Assume a few students from the same high school enter HK PolyU 假设有几名来自同一所高中的学生进入香港理大

  • Discussion:
    At the time these students graduate, what do their friendship (social structure) look like

    当这些学生毕业时,他们的友谊(社会结构)是什么样的

小结

  • 同质性是一个古老的议题,从柏拉图时代就已经开始讨论
  • 在齐美尔(Simmel)的基础上,Lazarsfeld 和 Merton 提出了同质性的两个类型和机制
  • 进一步的研究提出,影响同质性现象的还有外在因素
  • 在社会交往中,同质性与网络之间有着更为复杂的关系

Lec 6

Social computation

The example of Captcha

IBM -> Microsoft
Microsoft -> Google
Google -> ?

6.1 核心-外围结构 Core-Periphery Structure

Core and periphery 核心-外围:一种社会网络观

  • People are different in their “importance” in social networks – we do not consider this in small world

    在社交网络中,人们的“重要性”是不同的——我们在小世界中不会考虑这一点

    • Comment: No need to be important in every social network

      没必要在每个社交网络上都很重要

    试图想象 Imagine mail delivery

    送信,通过分散搜索方式

    Send to Yao Ming (姚明) in Shanghai
    Send to Ni Ma Ci Ren (尼瑪次仁) in Tibet

    Which one is more difficult?

    1. 社会阶层 :每个人确实都有熟人,不过,不同类型或阶级熟人之间,不会有交往。所以,不曾有连接的两个人之间,不可能建立连接。
  • This is not reflected in decentralized search 这并没有反映在去中心化搜索中

    In political sciences, Wallerstein developed World-systems theory: core countries, semi-periphery countries, and periphery countries 在政治科学领域,沃勒斯坦提出了世界体系理论:核心国家、半外围国家和外围国家

    Also exist in social network? 社交网络中还存在什么?

核心——边缘结构模型

  • Borgatti and Everett in 1999 discovered that in social networks

    观察到,在社会网络中

    • High social status people are in a core of dense connection

      地位较高的人,被连接在一个密集连接的核心

    • Low social status people are at the periphery

      地位较低的人,都分在网络的外围

  • Core-periphery structure exists

    核心——边缘结构

    • Both in theory and in practice

      不仅是理论上,现实生活中,同样普遍存在

  1. 1处于核心位置,是现实社会赋予的
  2. 1,7,8,9都处于核心位置,9更加核心。

例子:微博的转发

把信送给郭敬明,比把信送给右下角的任何一个博主都容易

例子:China Economy Journals 经济学期刊的引用

经济学期刊核心——边缘结构图

经济研究作为一个整体已经形成了学术声望和口碑。

6.2 理论与现实 Practice and Theory

  • 理论上:

    • Recall cluster coefficient.

      处于网络结构中的节点,不同的节点如果有相同的聚集系数,其被连接到的概率应该是一样的

  • 现实中:

    • Milgram has already indicated in 1967 experiment that looking for nobody is more difficult.

      Milgram的第一次就暗示了,寻找地位较低的人(神学院学生的妻子),会更加困难

    • 人们观察到,媒体寻人较个体寻人有更高的成功率

    • Recall structure hole. The person at structure hole is easier to find than others

    回想结构洞,处于结构洞位置上的人,其被找到的概率,远远大于一般节点上的人

社会意义

  • 如果处于更高的社会地位、且在结构洞位置上呢?
    • 网络结构本身是重要的,尤其是在可计算性上
    • 同样重要的是,网络结构的社会属性
    • 具有相同网络结构,却有着不同社会属性的网络,在现实生活中,具有着不同的“可连通性”(社会含义)
    • 社会地位高的人,具有更多的“关系资源”,更好的连通性

总结 Summary

  • Social status can be reflected by the network structure

    社会地位可以通过网络结构体现出来,网络结构会因为节点上的人的社会属性不同,而又不同

  • There can be more to investigate

    可以有更多的调查

  • 社会地位高的人,倾向于有更多的、更广的关系(连接)

  • 社会属性是影响网络结构及其连通性的重要因素

6.3 有向图 Directed Graph

具有方向性的关系 Relationship with directions

  • Citation among papers

    文章之间的引用关系

  • The relationship of fans in Facebook

    微博账号之间的粉丝关系

  • The relationship of webpages

    网页之间的链接关系

  • 人们之间的认识(了解)关系也可以看成是具有方向性的

出度、入度、有向路径 Out degree, in degree, directed path

​ 图1 图2

  • There is a path from X to Y; it is not necessary that there is a path from Y to X

    从X到Y有一条有向路径,不一定从Y到X也有

  • Even if there is a path from Y to X; it is not necessary the same (in reverse direction) path from X to Y

    即使从X到Y有,从Y到X也有,但从这两条路径经过的节点可能完全不同

强连通性 Strong connectivity

  • 在上面的图1,本身不是强连通性的,因为从C-A,A-D没有边,但如果我们加上一条边倒C-A,图1就是强连通性的图了,A—D也有边了(D-B-C-A)

  • 在上面的图2,本来不是强连通的,因为通过有向路径,没有任何点可以到达G。G全部往外出。但如果我们从D和G加上一条边,图2就是强连通性了,从任何节点都可以到G。而且每两个节点都有有向路径。

  • A directed graph is strongly connected if there is a path from every node to every other node.一个有向图是强连通的,如果从每个节点到每个其他节点都有一条路径。

​ 图1 图2

强连通分量 Strongly connected component

  • We say that a strongly connected component (SCC) in a directed graph is a subset of the nodes such that:

    一个节点子集和他们之间的边,满足

    • every node in the subset has a path to every other;

      如果它有两个或者更多的节点,那么其中任意两个节点之间都存在两个方向上的有向路径;

    • the subset is not part of some larger set with the property that every node can reach every other

      不被包含在一个更大的且满足第一个要求的节点集合之中

强连通分量例子

  • 尽可能大的双向连通节点子集

    • 节点9,e,f这三个节点加上他们之间的边, 即使任意两个节点之间都存在两个方向上的有向路径,但不能够成强连通分量。因为它不满足上面的第二项:不被包含在一个更大的且满足第一个要求的节点集合之中。如下图

    • 由节点1,4,3,8,9,d,e,f,I组成的是强连通分量。

    • 2,6,7,b构成的不是强连通分量,因为他第一条就没满足。(2没办法到b)

    • 节点7时强连通分量

  • 思考:一个有向图中,是否可能有一个节点,属于两个不同的强连通分量?

    不可能。

小结 Summary

  • Directed graph: to capture relationship with directions

    有向图是描述具有方向性的关系的工具

    与(无向)图的几个最基本概念的对应概念

  • 节点——节点

    边—— 有向边

    路径(圈)——有向路径(有向圈)

    连通 ——强连通

    连通分量 ——强连通分量

  • Properties:
    Node, link, path
    Strong Connectivity, strong connected component

6.4 将Web看成一个有向图 Web Structure

  • Web上面的信息是通过有方向的超链联系起来的。于是他们可以看成一个巨大的有向图。其中的节点就是我们常说的网页。边就是在网页中设置的那些个超链。这个有向图可以看成是我们全世界千百万人自发,随机性行为的一个结果。

一组网页之间构成的一个有向图示例

Is there a path from Univ. of X to Contact Us

领结:Web信息结构的一种概貌 Web: A Bowtie Structure

  • Andrei Broder et. al., in 1998 analyzed 200 million webpages; and found that the Web has a super SCC, an IN, an OUT and others

    发现万维网包含一个超大强连通分量SCC

    • Tendrils, Tubes, Disconnected components

      链入,链出,卷须(管道),游离

    • There are other SCCs, e.g., in IN/OUT, however, they are not connected to the super SCC

    • A main characteristics of social network – impossible to have more than one SCC with comparable sizes 社交网络的一个主要特征-不可能有一个以上的SCC具有相同的规模

这是怎么知道的?How do we know this?

如何按照“领结”思路,获得一个有向图的几个组成部分?

  • Given a huge set of webpages, how to obtain the SCC, IN and OUT?

    简化:只关心SSC,IN和OUT这三部分

    • Assume we know one node is in SCC (e.g., yahoo frontpage)

      假设我们知道某一个节点一定在SSC中

      例子1: 怎么得到节点1的强连通分量,以及IN,OUT

      ![image-20211005195026322](images/image-20211005195026322.png”/>

​ 图1 图2

  1. 利用广度优先搜索(从节点一开始,沿着有向边做广度优先搜索):

    1 -> 8 ->13 -> 14 -> 9 -> 3, 4, 15 -> 16, 18, 5 ->10

  2. 前项搜索集合(图一达到的点的集合):

    FS = {1, 8, 3, 4, 9, 14, 13, 15, 18, 5, 10, 16}

  3. 利用广度优先搜索

    1-> 4 -> 9 -> 14 -> 13,15 -> 8, 12, 18 -> 3, 7 - >6, 11

  4. 后项搜索集合(图二到达的点的集合):

    BS = {1, 8, 3, 4, 9, 13, 14, 15, 18, 6, 7, 11, 12}

  5. 包含1节点的强连通分量SCC(FS和BS的交集)

    SCC = FS ∩ BS = {1, 3, 4, 8, 9, 13, 14, 15, 18}

  6. IN = BS – SCC = {6, 7, 11, 12} (BS减去SCC)

  7. OUT = FS – SCC = {5, 10, 16} (FS减去SCC)

<img src=”images/image-20211004151203840.png” “image-20211004151203840” style=”zoom:50%;” />

Proof:
(1) All nodes in SCC are connected, e.g., 13->15
13 first goes to node 1, and then from node 1 to 15
(2) There is no bigger SCC

小结 Summary

Directed graph is one method for information organization

有向图是一种信息组织的有效形式

Web can be seen as a directed graph

将Web看成是一个有向图,人们发现它宏观上像一个”领结“。多次数据实验都验证了这个结论(IN,SCC,OUT)

People found that Web has a bowtie structure
Using breath-first search can obtain the components (SCC/IN/OUT) of the web bowtie structure

广度优先搜索,是具体的到“领结”的各个组成部分的基本手段

6.5 中枢和权威 Hubs and Authorities

搜索引擎关心的基本问题 A basic problem of search engine

  • A webpage can only show 6 – 7 search results, yet a typical search engine can return 1 billion results

    计算机显示屏一次只能显示5-6个结果,典型搜索引擎掌握的网页超过10亿

  • How to show a few “right” results out of a huge number of candidates

    对用户提交的一个查询,如何从这种海量网页集合中将最可能满足用户需求的少数几个结果找出来,展现在计算机显示屏上?

传统信息检索 Information retrieval (IR)

  • IR has been there in 1940’s: e.g., used in library, search for information/books

  • 基于词语之间的相关性(Relevance)

  • Conventional IR: 传统应用背景

    • Information (e.g., books) have regular format

      文档集合:图书,规范的文献

    • Customer: librarians, knowledgeable and corporate

      用户:图书管理人员

    • 查询:主题词,关键词

    • 查询意图:获取与查询词有关的书籍和文章

  • Search is based on keywords, where the content contains the keywords

    “查询目标包含查询词” 是一个合理假设

    • 在形成查询词的时候就有这样的潜意识

Searching for books

The content contains the book title

寻找某些非文献信息呢?

What about searching for The Hong Kong Polytechnic University?

Still content matching?

  • Why Google returns the homepage of The Hong Kong Polytechnic University?

    主页放在最前面,一定不是因为其中包含许多“Polyu”的字样

  • Maybe a lot of links are pointed to PolyU? – hidden knowledge in the linkage information

    很可能是由于许多包含“Polyu”字样的网页指向它——利用链接中隐含的信息

信息检索 Information retrieval

Use the (hidden) knowledge (e.g., relationship and the structure) provided by the linkage information is a big progress (of modern search engine) in information retrieval (as compared to conventional IR)

有效利用链接关系蕴含的信息,是搜索引擎超越传统信息检索系统、技术进步的最重要标志

  • Web page 之间的链接有两层含义:关系,描述

Irrelevant to content! 与内容无关!

餐馆推荐问题 Recommend for restaurants

餐馆投票:

给推荐人评分:

比如15: 因为A和D投票给它,A的质量分为8分,D的质量分为7,所以7+8=15

比如21:因为A,C和D投票给它,A:8,C:6,D:7,所以8+6+7=21

反复改进原理 The principle of repeated improvement

  • Assume that the keyword is newspaper 假设查询词“newspaper”
    • The left is the homepage of newspapers 左边是与“newspaper”字面上相关的网页
    • The right is the # of votes – representing recognition 右边是它们所指向的网页,得到的“票数”表示一定的认可度
  • We can future re-weight the quality of the recommenders

    也可以反过来评估“推荐者”的分量

  • Then we re-weight the webpages given the re-weighting of the recommenders

    然后可以再考虑推荐者分量的情况下,重新评估网站相对于“newspaper”的重要性(相当于加权评分)

  • This process can be repeated continuously 这个过程可以反复进行下去

中枢与权威性 Hubs and Authorities

  • Two properties of a webpage

    万维网中一篇网页的两面属性。观念为:

    • being pointed – high authority, good recognition

      被很多网页指向:权威性高,认可度高

    • Pointed to others – strong hub

      指向很多网页:中枢性强

  • The HITS algorithm (Hyperlink-Induced Topic Search)

    HITS算法

    • Compute the values of the hubs 计算网页的中枢性
    • Compute the values of the authorities 计算网页的权威性

auth(p) and hub(p)

auth:被人关注或者认可的程度

hub:所关心事物的权威的程度

  • Input: a directed graph 输入:一个有向图

  • Initialization: for every p, auth(p)=1, hub(p)=1

    初始化:对于每一个节点p,auth(p)=1, hub(p)=1

  • Authority Update Rule: 利用中枢值更新权威值

    • For each page p, update auth(p) to be the sum of the hub scores of all pages that point to it.

      对于每一个节点p,让auth(p)等于指向p的所有节点q的hub(q)之和

  • Hub Update Rule: 利用权威值更新中枢值

    • For each page p, update hub(p) to be the sum of the authority scores of all pages that it points to.

      对于每一个节点p,让hub(p)等于p指向的所有节点q的auth(q)之和

  • Repeat k times 重复上述两步若干 K 次

![image-20211005200217291](images/image-20211005200217291.png”/>

Example

Compute the authority and hub scores of the following graph, run for 3 rounds

求下图各节点的auth和hub值 (算法运行三轮即可)

<img src=”images/image-20211005200327559.png” “image-20211005200327559” style=”zoom:50%;” />

Compute the authority and hub scores of the following graph, run for 3 rounds

When to stop? 越来越大,什么时候算完?收敛?

归一化于极限 Normalization and Convergence

  • 数值随迭代次数递增

  • The purpose of auth and hub is their relative value

    auth和hub值的意义在于相对大小

  • Normalization: 归一化

    • 在每一轮结束后做归一化: 值/总和

    • Divide each authority score by the sum of all authority scores

      在每一轮结束后做归一化: 权威值/总和

    • Divide each hub score by the sum of all hub scores

      在每一轮结束后做归一化: 中枢值/总和

    • It can be proven that when k goes to infinity, the scores converge to a limit

      归一化结果碎迭代次数趋向于一个极限

      • 相继两次迭代的值不变
      • 极限于初值无关,即存在“均衡”

小结 Summary

  • In a social network with the relationship of “cited by” and “recommend”, each node has two roles: authority and hub.

    在一个由“引用”或者“推荐”关系构成的信息网络中,每个节点有两种自然的作用:“权威”于“中枢”

  • Hub and authority can be materialized by the HITS algorithm

    这样的作用可以通过“HITS”算法得到量化

  • Repeated improvement is the key spirit of HITS

    HITS算法的基本精神是基于信息网络的结果,在两个量之间交叉进行“反复改进”

6.6 网页排名 PageRank

节点重要性的一种测度 PageRank: An importance Metric

  • PageRank has the same spirit with HITS

    PageRank与HITS具有相同的精神

  • Each page divides its current PageRank equally across its out-going links, and passes these equal shares to the pages it points to.

    每一个节点将自己的值均分给出向邻居,每个节点将从邻居收到的值加起来

  • a=c/2

    a的入项邻居是c,c指向了a和b,所以它一半给a,一半给b

  • b=a/2 + c/2

    b有两个入项邻居:a和c,a指向了b和d,所以a一半给b,一半给d

    c指向了a和b,所以它一半给a,一半给b

  • c=d

    c有一个入项邻居:d,且d只指向c,所以它不需要分给任何节点

  • d=a/2 + b

    d有两个入项邻居:a和b,且a指向了b和d,所以a一半给b,一半给d

    b只指向d,所以不需要分给任何其他节点

上图的算例(经过70次)

After 70 iterations: A=0.615, B=0.923, C=D=1.231

经过约70次迭代,最后收敛到:A=0.615, B=0.923, C=D=1.231

基本算法描述 The PageRank algorithm

  • In a network with n nodes, we assign all nodes the same initial PageRank, set to be 1/n.

    输入:一个有n个节点的网络(有向图),设所有节点的PageRank的初始值为1/n

  • We choose a number of steps k. 选择操作的步骤数k

  • Basic PageRank Update Rule: 基本的规则

    • Each page divides its current PageRank equally across its out-going links

      每个节点将自己当前的PageRank值通过出向链接均分传递给所指向的节点

      • Passes these equal shares to the pages it points to. (If a page has no out-going links, it passes all its current PageRank to itself.)

        若没有出向链接,则认为传递给自己(或者说保留给自己)

    • Each page updates its new PageRank to be the sum of the shares it receives.

      每个节点以从入向链接获得的(包括可能自传的)所有值之和和更新他的PageRank

一个计算网页排名的实例

  • 每个节点的初值都是1/8
  • 最后的收敛结果见下右图

<img src=”images/image-20211005202943411.png” “image-20211005202943411” style=”zoom:200%;” />

小结 Summary

  • In a social network with the relationship of “cited by” and “recommend”, the importance of each node can be decided by the number of recommenders, and the importance of the recommenders

    在一个由“引用”或者“推荐”关系构成的信息网络中,每个节点的重要性可以认为取决于由多少人推荐,以及推荐人的重要性

  • Such importance can be materialized by the PageRank algorithm

    这种重要性可通过“PageRank算法”得到量化

  • The key spirit of PageRank is, based on the structure of the graph, repeated improvement

    PageRank

    算法的基本精神是基于信息网络的结构,让每个节点不断把自己的重要性分给出向邻居,同时用入向邻居收到的重要性之和来更新自己

6.7 同比缩减与等量补偿

PageRank基本算法在某些结构上的“病态”

  • F和G两个节点显得很“自私”: 不断吸收别人的价值,但不向外分享

PageRank值很快集中到F和G

PageRank的同比缩减与统一补偿规则

  • 同比缩减
    • 在每次运行基本PageRank更新规则后,将每一节点的PageRank值都乘以一个小于1的比例因子s,0<s<1,经验值在0.8-0.9之间。
  • 统一补偿
    • 在每一节点的PageRank值上统一加上 (1-s)/n

这样,既维持了”ΣPR=1“的性质,也防止了PR值过渡集中到个别节点

随机游走:PageRank的另一种等价理解

  • 想象一个人从一篇随机选择的网页开始,然后随机选择其中的链接浏览到下一篇网页,并不断如此进行,称为“随机游走”
  • 考虑任意一篇网页X,问:经过k步随机游走到达X的概率是多少?
  • 可以证明:到达X的概率等于运行PageRank基本算法k步得到的值
  • 随机游走概念稍加修改也可以和同比缩减统一补偿的PageRank等价。

小结

  • 信息一旦刻画成一种网络,其中的边经常自然地隐含着一种“推荐”或者“引用”关系,人们可以利用这种关系对信息的作用进行评估
    • 影响力,重要性,权威性,新颖性,….
  • 先进的评估方法不仅考虑局部结构,而且会考虑全局结构带来的影响(节点特征性质在网络中的传播)
  • HITS算法,PageRank算法
  • 当理想遇到现实——重要现实情况的处理
    • 数据范围问题,退化网络结构问题

Lec 7

7.1 何为博弈

  • Still emphasize that “scenario -> game -> solution”

    • Math people emphasize more on “game à solution”
  • But also terminologies, concepts, math, etc

中国古典中的“博弈”

博弈论中的“博弈”(基本元素)Basic ingredients of a Game

  • 博弈三要素

    • 参与人(player,玩家)

      • 每个参与人有一个策略集
    • 策略集 (strategy,战略)

      • 策略组:每个参与人出一个策略构成的策略组合
    • 回报(payoff,收益,支付)

      • 对应每个策略组,每个参与人有一个回报
  • A set of participants (at least two), whom we call the players

    一组参与者(至少两个),我们称之为玩家

  • Each player has a set of options for how to behave; we will refer to these as the player’s possible strategies

    每个玩家都有一系列行为选择;我们将把这些作为玩家可能的策略

  • For each choice of strategies, each player receives a payoff 对于每一种策略选择,每个玩家都会获得收益

  • The payoff depends on the strategies adopted by others 收益取决于其他人采用的策略

  • When we can clearly define these three ingredients, we say we have a game 当我们能够清楚地定义这三种元素时,我们便称这是一款游戏

  • Strategy profile (or strategy combination), is a set of strategies for all players which fully specifies all actions in a game. 策略集(或者策略组)是指每个参与人出一个策略构成的策略组合

    • A strategy profile must include one and only one strategy for every player.
  • Notation for payoff: P1(S, T), P2(S, T)

田忌赛马

  • 参与人:齐威王,田忌

  • 策略集(此例中两人一样)

      1. 上中下,2. 上下中, 3.中下上,4.中上下,5.下上中,6.下中上
  • 回报

    • 对于策略组(1,1),田忌三个都输(-3)
    • 对于策略组(5,1),田忌赢二输一(+1);等等
  • 策略组例子:田忌:上中下,齐威王:下上中 简记为(1,5)

博弈论中的博弈不总是讲输赢

例子:商场走失问题

  • 两友人同逛一个大商场,走散了。知商场有南北两个门,也知朋友会在某个门等候,你会去哪一个门?
  • 只有协调,才能共赢

例子:“复习功课与准备报告”博弈

  • 临近期末,明天一门课要考试,同时交一个课程报告

  • 两学生今晚面对选择

    • 复习功课

    • 准备报告

  • 考试成绩可以预期:

    • 如果复习功课,则考试成绩92分

    • 如果没复习,则考试成绩80分

  • 报告的分数是和一个拍档共享的:

    • 若你俩都准备报告,则每人100分
    • 若只有一个准备报告,则每人92分
    • 若两人都没准备报告,则每人84分
  • 参与人

    • 你和你的搭档
  • 策略

    • 复习功课,还是准备报告?

    • You and your partner all prepare presentation -> average grade is (80 + 100) / 2 = 90

      我和我的搭档准备pre,(80+100)/2 = 90

    • You and your partner all study the exam -> average grade is (92 + 84) / 2 = 88

      我和我的搭档都准备考试,(92 + 84) / 2 = 88

    • One studies exam and one prepares presentation ->

      • The one prepares presentation (80 + 92) / 2 = 86

      • The one studies exam (92 + 92) / 2 = 92

  • 回报 -> 收益矩阵

收益矩阵(表达博弈的一种直观方式)Payoff Table/Payoff Matrix

  • 其中第一个数字是“你”的收益,第二个是“搭档”的收益
  • 收益(也称“回报”、“支付”,payoff)

博弈论的关切(不同于博弈参与人的关切)

在“理性人”等基本假设下,博弈(作为一个整体)的结果、走向、发展趋势、哪些策略组(合)会被人们采用

博弈推论的假定(Assumption)

  • 自己的回报是每个参与人关心的唯一因素
  • 参与人都是“理性人”,即只要可能,总是要选择有更好回报的策略
  • 每个参与人都对博弈结构的完全了解

小结

  • 一个博弈,由三个基本要素构成:参与人、策略、回报

  • 博弈论关心的,是博弈的结果,即何种策略组合被参与人(联合)采用

  • 为了严格地推理博弈的结果,需要有“理性人”等一些基本假设

7.2 何为博弈的解

考试-报告博弈中的“稳定”策略组合 Exam-Presentation game: how to choose

  1. (准备考试,准备报告)

  2. (复习考试,准备报告)

  3. (复习考试,复习考试)这是个解,因为谁也不会因为改变策略而变得更好

  4. (准备报告,复习考试)

  • Strictly dominant strategy: When a player has a strategy that is strictly better than all other options regardless of what the other player does, we will refer to it as a strictly dominant strategy
  • By our assumptions, every player will select strictly dominant strategy if there is one.
    Study the exam is the strictly dominant strategy for both

囚徒困境 Prisoner’s dilemma

  • Suppose that two suspects have been apprehended by the police and are being interrogated in separate rooms

    假设有两个疑犯被警察抓住,分开关押

  • The police strongly suspect that these two individuals are responsible for a robbery, but there is not enough evidence to convict either of them of the robbery.

    警方强烈怀疑他们和一个抢劫案有关,但证据不足,然而,他们都拘捕得到事实也是可以判刑的

  • Each of the suspects is told 两个疑犯都被告知以下选择及其后果

    • If you confess, and your partner doesn‘t, then you will be released and your partner will be charged and sent to prison for 10 years.

      如果你坦白,而同伙抵赖,则你马上释放,他将承担全部罪行,会判刑10年

      如果你们都坦白,都判刑4年

      如果都不坦白,拘捕罪判处,都判刑1年

    • If you both confess, then you will both be convicted and sent to prison for 4 years
      Finally, if neither of you confesses, then you will be charged for 1 year of resistance.
      Do you want to confess or not?

囚徒困境的收益矩阵

  • C: confess 坦白, NC: non-confess 抵赖

  • The strictly dominant strategy for both suspect 1 and 2 is confess
    Even though if two NC, they will be sentenced less

    疑犯1和疑犯2的严格占优策略都是“坦白

    尽管如果两人都抵赖都会判的少些

    When there is selfishness, cooperation is difficult

“营销战略”博弈的推理

  • Two firms plan for a new product

  • The product has two versions, low-price and upscale

  • People who prefer a low-priced version account for 60% of the population

  • People who prefer an upscale version account for 40% of the population

  • Assume that the profit is determined by market share

  • If two firms target at different market segments, they each get all the sales in that segment. So the low-priced one gets .60 and the upscale one gets .40.

  • Assume firm 1 is a better brand, and gets 80% share if both compete in the same segment

    Each firm tries to maximize its profits

  • 公司1有严格战略策略(廉价),但公司2没有
  • 不过,对于公司1的廉价策略,公司2有一个严格最佳应对策略(高档)

Performance enhancing drugs

  • Also called arms races game: no good or even harmful for each one internally, but to make sure that each is competitive, have to stay in the competition.

    也叫军备竞赛游戏:对每个人都没有好处甚至是有害的内部,但要确保每个人都是竞争性的,必须留在竞争中。

  • If we don’t develop weapons, the money can go to areas to benefit people

    如果我们不发展武器,这些钱就可以用于造福人民的领域

Best response 最佳应对

S is a strategy for player 1, and T is a strategy for player 2, then in pay-off matrix, there is a cell with (S, T)
P1(S, T): the payoff of player 1 from (S, T)
P2(S, T): the payoff of player 2 from (S, T)
Best response:
a strategy S for Player 1 is a best response to a strategy T for Player 2 if S produces at least as good a payoff as any other strategy paired with T, i.e.,
P1(S, T) ≧ P1(S’, T) for any S’

Strict best response:
a strategy S for Player 1 is a strict best response to a strategy T for Player 2 if S produces at least as good a payoff as any other strategy paired with T, i.e.,
P1(S, T) > P1(S’, T) for any S’

Note: best response targets at a single strategy of the opponent (T), and is among all strategies of himself
For different T, there may be different best response

Is there one (exist 存在) and only one (unique 唯一)?

For a T, there is only one strict best response

The game between two firms

Firm 1 has strictly dominant strategy, but firm 2 doesn’t

Firm 1 will adopt low-price (strict dominant)

Firm 2 will adopt upscale (best response to low-price)

占优策略和严格占优策略 Dominant strategy and strictly dominant strategy

Definition (from best response point of view)
The dominant strategy, S, of Player 1 should be the best response for each strategy of Player 2
The strictly dominant strategy, S, of Player 1 should be the strict best response for each strategy of Player 2

Note: dominant strategy targets at all strategies of the opponent, best response targets at one strategy of the opponent
If one has strict dominant strategy, we can assume that he will take it (follow our basic assumptions on game)

严格与不严格的区别

  • U是参与人1的严格占优策略,R是参与人2的占优策略,但不是严格的
  • L是U的最佳应对,但不是严格的;R是D的严格最佳应对

strategy is weakly dominant if, regardless of what any other players do, the strategy earns a player a payoff at least as high as any other strategy, and, the strategy earns a strictly higher payoff for some profile of other players’ strategies. Hence, a strategy is weakly dominant if it is always at least as good as any other strategy, for any profile of other players’ actions, and is strictly better for some profile of others’ strategies.

如果无论其他参与人怎么做,该策略为参与人带来的收益至少与其他策略相同,那么该策略就是弱优势策略,
这个策略从其他玩家的策略中获得更高的收益。因此,如果一种策略在其他玩家行动的任何情况下与其他策略一样好,并且在其他策略的某些情况下优于其他策略,那么该策略就是弱优势策略。

小结

  • 如果两个人都有严格占优策略,则可以预计他们均会采取严格占优策略
  • 如果只有一个人有严格占优策略,则这个人会采取严格占优策略,而另一方会采取此策略的最佳应对(一定会有!)

Lec 8

7.3 纳什均衡 Nash Equilibrium

Assume player 1 selects strategy S, and player 2 selects strategy T. If S is the best response of T, and T is the best response of S, then we say that strategy group (S, T) is a Nash Equilibrium
In Nash Equilibrium, no one has the incentive to change their strategy

Nash Equilibrium: (best responses to each other) No one can become better by unilaterally change his own strategy; though both can become better if both changes.

纳什均衡:互为最佳应对的策略组

三客户博弈的解 three client game

  • There is no strictly dominant strategy for any firm

    两家公司都没有严格占优策略

  • There exists Nash Equilibrium (A, A)

    策略组(A,A)中的两个策略互为最佳应对

  • Find Nash Equilibrium
    Check every strategy-pair, and see if it is the best response for each other
    Find best response(s), and find the mutual best responses

协调博弈(出现两个或多个纳什均衡) Multiple Equilibria

  • There are two Nash Equilibria (PPT, PPT), (Keynote, Keynote)

    有两个纳什均衡(北门,北门)和(南门,南门)

  • 如何预测协调博弈中参与人的行为

  • Schelling focal point model, Social conventions, etc

Assume you and your partner all likes Apple Software

From Schelling focal point model, (Keynote, Keynote) is more likely with a (2, 2)

What if people’s preferences differ?

Assume you and your partner like different software

It is difficult to predict what will happen purely based on the structure of game

Additional information may be needed

需要借助外部的信息,比如两人事先说好走失 去南门

Stag (鹿) – Hare (兔) Hunt game 鹰鸽博弈的推理

Payoff matrix

The equilibria are (H, D) or (D, H)

两个均衡,不能推断到底哪个均衡会出现

一般来说,纳什均衡概念能有助于缩小预测范围,但它并不一定能给出唯一的预测

Different multi-equilibrium game

习题:

  1. What is the Nash equilibrium for the pure strategy under the following payoff matrix?

We’ll use “A=U” to represent “A chooses U”.

For A:

When B=L, A=U

When B=R, A=U

For B:

When A=U, B=L

When A=D, B=R

Therefore, (U,L) is one Nash equilibrium.

(U,L) is correct.

  1. Select all Nash equilibrium under the following payoff matrix

For pure strategy:

When B chooses L, A will choose D;

When B chooses R, A will choose U.

When A chooses U, B will choose R;

When A chooses D, B will choose L.

For mixed strategy, suppose A chooses U with probability of p, and B chooses L with probability

of q.

For A,

A chooses U will get 1*q+4(1-q)=4-3q;

A chooses D will get 3q+2(1-q)=2+q. 2+q=4-3q, and q=1/2.

For B,

B chooses L will get 1*p+3(1-p)=3-2p,

B chooses R will get 2p+2(1-p)=2. 2=3-2p, and p=1/2.

The Nash equilibrium for mixed strategy is (1/2, 1/2).

Therefore, the Nash equilibria are (D, L), (U, R) and (1/2, 1/2)

  1. Consider the following 2-player game. Select the statement(s) which is/are correct

For pure strategy:

For A:

When B=L, A=U

When B=R, A=L

For B:

When A=U, B=L

When A=D, B=L

Therefore, (U,L) is one Nash equilibrium.

Suppose the payoff changes to y (y>0).

For pure strategy:

For A:

When B=L, A=U

When B=R, A=L

For B:

When A=U, when y>2, B=L; when y=2, B=L or R; when 0<y<2, B=R

When A=D, B=L

Therefore, when y>=2, there exists Nash equilibrium (U, L) for pure strategy, when 0<y<2, thereis no Nash equilibrium for pure strategy.

Now let’s consider where there exists Nash equilibrium for mixed strategy when 0<y<2.

For B:

Choose L: yp+(1-p)=(y-1)p+1

Choose R: 2p

Therefore, we can get 2p=(y-1)p+1, p=1/(3-y).

When 0<y<2, 0<p<1. Therefore, we can get Nash equilibrium for mixed strategy when 0<y<2.

3 is incorrect

Suppose the payoff changes to x (x>0).

For pure strategy:

For A:

When B=L, when x>2, A=U; when x=2, A=U or D; when 0<x<2, A=D

When B=R, A=L

For B:

When A=U, B=L

When A=D, B=L

Therefore, when x>=2, there exists Nash equilibrium (U, L) for pure strategy, when 0<y<=2,

there exists Nash equilibrium (D, L) for pure strategy.

4 is incorrect.

小结

  • 占优策略,严格占优策略

  • 最佳应对,严格最佳应对

  • 互为最佳应对的策略组 - > 纳什均衡

  • 具有多个纳什均衡的博弈

    思考:如果不存在纳什均衡,该怎么办?

一个不存在纳什均衡的博弈

硬币配对——零和博弈(zero sum game)

  • 你和他各持有一枚硬币,分别决定出示手中硬币的某一面。若你们硬币的朝向相同,他将赢得你的硬币,反之,你将赢得他的硬币

    Two people each hold a penny, and simultaneously choose whether to show heads (H) or tails (T) on their penny. Player 1 loses his penny to player 2 if they match, and wins player 2’s penny if they don’t match

    此时,不存在一组互为最佳应对策略

混合策略的引入 Mixed strategy

  • 引入随机性,考虑参与人将以一定的概率在不同策略间进行选择,一个概率对应一个“策略”(称为混合策略)。此时,选择策略就是选择概率,而博弈矩阵中给出的选项成为纯策略。
    • 一般地,混合策略是一个概率分布,双策略情形等价为一个概率
  • 通常,存有两个纯策略H和T的情形,我们说
    • 你的策略是概率p,是指你以概率p执行H;以概率1-p执行T
    • 他的策略是概率q,是指他以概率q执行H;以概率1-q执行T

作为博弈,三要素齐了吗?

此时的策略是在两种固定(纯)策略上选择的概率,每一组纯策略是对应有固定收益的。因为,从概略意义出发,此时的收益应该体现一种在两种纯策略上的“平均”(期望)

  • P1(p, q) is the expected payoff where, with probability p, player 1 choose U and, with probability (1-p), player 1 choose D

  • What is the payoff when player 1 chooses U? ->

  • P1(U, q) is the expected payoff where, with probability q, player 2 chooses L, and with probability (1-q), player 2 chooses R

    P1(p,q) = p P1(U,q) + (1-p) P1(D,q)

    P1(U,q) = q P1(U,L) + (1-q) P1(U,R) ->

    = 0.1 * 4 + 0.9 * 0 = 0.4

    P1(D,q) = q P1(D,L) + (1-q) P1(D,R) ->

    = 0.1*3 + 0.9 *3 = 3

    P1(p,q) = 0.1* 0.4+0.9*3 = 2.74

例子

$你的回报 = 0.60.4+0.4(-0.4)= 0.08$

$你的回报 = 0.6你选H的回报+0.4你选T的回报$

$你选H的回报 = 0.3他选H时你选H的回报+0.7他选T时你选H的回报= 0.3*(-1)+0.7*1 =0.4$

$你选T的回报 = 0.3他选H时你选T的回报+0.7他选T时你选T的回报= 0.3*(1)+0.7*(-1 )=-0.4$

混合策略均衡求解

例子1

  • 0.5,0.5 是这个硬币配对博弈的混合策略纳什均衡,对应的也能算出q=0.5
    • (q)(-1)+(1-q)(+1) = 1-2q
    • (q)(+1)+(1-q)(-1) = 2q-1
  • 这个结果是符合直觉的,假如自己也参加这个博弈,不会倾向于出正面或者反面,因为一旦我倾向出正面,那么出正面的概率高,对方发现后就会出反面,肯定就会赢我。

例子2

  • (p,q)=($\frac{1}{3},\frac{2}{3}$)是互为最佳应对的概率策略

问题

  • 一个博弈没有纯策略意义下的均衡,就一定有混合策略均衡

  • 一个博弈,如果有纯策略意义下的均衡,还可能有混合策略均衡吗?

兼具纯策略和混合策略均衡的博弈

  • 两个纯策略均衡(PPT,PPT)和(Keynote,Keynote)

  • 试求混合策略均衡

    p1+0 * (1-p) = p0 + (1-p) * 2 -> p=2-2p -> p=2/3

    q1 + (1-q) * 0 = q0 + (1-q) * 2 ->q=2-2q -> q=2/3

小结

  • 博弈均衡有两种

    • 纯策略均衡,混合策略均衡
  • 任何博弈都存在均衡

  • 可能是某一种,也可能两种都有

7.4 博弈的解与社会福利

Pareto optimality

A choice of strategies — one by each player — is Pareto-optimal if there is no other choice of strategies in which all players receive payoffs at least as high, and at least one player receives a strictly higher payoff

如果没有其他策略可供选择,使所有参与人获得同等高的收益,且至少有一个参与人获得严格较高的收益,那么每个参与人的一个策略选择就是帕累托最优策略

(presentation, presentation) is Pareto-optimal
(presentation, exam) and (exam, presentation) are also Pareto-optimal: there is no strategy set that everyone is doing at least as well (better)
(exam, exam) is not, as (presentation, presentation) everyone can do better (i.e., group benefit) – there needs a binding agreement however, as player has incentive to change now

博弈导致的社会福利

  • 社会福利:一个策略组对应的回报的总和

​ 上面有四种策略组合,U,L为社会最优

社会最优

  • 社会福利最大的策略组合

    • 均衡是博弈的解(走向、结果),但不一定是社会最优

社会最优和纳什均衡有可能一致

  • 按照这个收益矩阵,(报告,报告)即是社会最优也是均衡
  • 从社会应用的意义讲,均衡与社会最优一致的系统是理想系统

博弈论基本概念总结

  • 博弈三要素
  • 作为博弈推理基础的三个假设
  • 便利博弈推理的几个概念
  • 纳什均衡
  • 简单混合策略博弈的求解(无差异原理)
  • 均衡与社会福利

Lec 9

8.1 交通网络上的博弈 Modeling Network Traffic using Game Theory

网络结构上的博弈 A game on network structure

  1. Transportation network 公共交通网
  2. Holiday: drive or not? which road? 是否出门? 走哪条路线?
  3. This depends on what others are doing 有意无意中,你会想:别人会怎么样?

网络中的博弈:一个简化的例子

4000辆车, 都要从A到B 4000 vehicles want to travel from A to B

  1. 参与人: 4000司机

    Players: 4000 drivers

  2. 策略选择: “走上面”和 “走下面”

    Strategy set: upper path, lower path

  3. 回报:行驶时间(越小越好), 显然也取决他人的策略

    Payoff: travel time (the less the better, but also depends on other’s choices)

  4. 均衡: 没人通过改变选择可以得到更好回报

    Equilibrium: no one has the incentive to change

It is not easy to write down the payoff matrix, but that is fine. We have the three elements of a game very clear.

4000辆车,要从A到B

  1. 均衡:上下路上各2000辆车

    Equilibrium: 2000 A-C-B, 2000 A-D-B

  2. 对每辆车而言,对应回报为65

    Payoff for each driver: 65

  3. 此时, 若某人要改变,则他的行驶时间为2001/100+45>65,因此没人会改变

    If anyone deviates, his payoff will be: 2001/100 + 45 > 65

设想在C和D之间新修了一条快速路

Assume the government want to do something good: let’s build a new road and it is an express road, e.g., travel time on this road is negligible!

  • What will happen?

  1. 均衡:若大家都走:A-C-D-B
    Equilibrium: every one use A-C-D-B

  2. 每人行驶时间为:

    Travel time is
    4000/100+0+4000/100=80

  3. 注意,在没修这条路致歉,均衡中每人行驶时间是65

  4. 如果某人盘算改变A-D-B,则他的行驶时间将会变成45+4000/100=85>80,于是他不会改变!

    If someone tries to deviate,
    Travel time is:
    45+4000/100=85>80

    It is worse than 65 !

  5. No way, no longer equilibrium

    If you are the one on A-C-B

    A-C-D-B will be faster, there is incentive to change

    This is called Braess’s Paradox (布雷斯悖論)

为什么大家不能像以前那样?
布雷斯悖论的普遍性 Generality of Braess’s Paradox’

Starting from 2012 mid-autumn festival, Chinese government announced a policy to benefit people: the highways in China become free in public holidays
The revenue loss will be 20 billion RMB
This didn’t change the transportation network structure, but changed the ticketing policy

Can we consider this as a generality of Braess’s Paradox and model it? Paradox, i.e., 好心辦壞事

小结

  • 通过一个简单的交通网络模型,我们看到“在网络上的博弈”的一种范式,特别是结构对均衡的影响。

    A simple example shows a game on network structure

  • 我们看到了“布雷斯悖论”的出现,它其实可以看成是我们现实社会生活中有时见到的“投入资源反而使情况更糟”情形的一种简单化、但有效的解释。

    We see Braess’s Paradox

  • 这个例子也告诉我们,在现实生活中,参加一个博弈,可能是无形中的。

    Invest more resources may not get a good result

More examples:
  • In 1997, HK Government proposed to build 85,000 apartment annually (八萬五建屋計劃)

  • Reforming the education system may ends up to making the demand-supply of schools worse: e.g., congestions in “good” schools, and no interest in “bad” schools

  • Reforming the medical systems may ends up to making the demand-supply of hospitals worse
    Can these problems be modeled into Braess’s Paradox?

8.2 布雷斯悖论现象的一般性

例子

![image-20211110191711719](images/image-20211110191711719.png”/>

![image-20211110191724540](images/image-20211110191724540.png”/>

![image-20211110191740052](images/image-20211110191740052.png”/>

小结:

![image-20211110191821727](images/image-20211110191821727.png”/>

8.3 拍卖的意义及其形式 Auction

Game and auction

  1. In game and Braess’s Paradox, we see the interaction of rational behavior, equilibrium, and network structure
  2. Can we change the condition and let the players to directly interact (of course, we also need to set up a set of rules for their interaction)?
  3. Auction can be one of such scenario

拍卖的普遍性 Auction is everywhere

  • 拍卖Auction是再典型不过的节点之间的互动了,例

    佳士得和苏富比拍卖艺术品 Auctions by Christie and Sotheby’s

    政府拍卖地块、车牌、债券等 Government auctions the land, license, etc

    毕业了,把自己不要的书和杂碎买了After graduation, sell your books, groceries, etc

  • 竞标Bidding也是一种拍卖Bidding is also an auction

拍卖的意义Auction is important

  1. 拍卖是普遍存在的经济互动形态It is everywhere, with very simple format, but there can be complicated interactions

    • 模式简单,互动却复杂
  2. 也是博弈论应用的典型场景 It is also a game

    • 参与者:参加拍卖的人,买卖双方,相当于两房博弈Participants: sellers and buyers

    • 策略:出价 Strategy: bid

    • 收益:对物品的估值 (支付价格),或为0,即不成交;对卖方而言,就是其得到的支付价格,或者为0

      Payoff: for the buyers: the value of the object, (0 if the auction fails); for the sellers: the paid price, 0 if the auction fails)

    • 均衡:即在该状态下所有参与者的策略互为最佳应对,任何个人都没有理性的动机来改变自己的策略,在拍卖中如何体现?Equilibrium: best response for each other, i.e., no one has the incentive to deviate

拍卖的形式 The format of auctions

  1. 在拍卖中,均衡是拍卖规则下的均衡,因此,拍卖规则或拍卖形式,对均衡的达成具有直接影响
    The equilibrium depends on the format and regulations of the auctions
  2. 最简单的,单品拍卖,至少有以下形式 The format also influence the strategy choices of the buyers and sellers
  1. Ascending-bid auctions/English auctions: the seller gradually raises the price, bidders drop out until finally only one bidder remains;

    • This is useful for auctions of art works, antiques, etc.
  2. Descending-bid auctions/Dutch auctions: the seller gradually lowers the price from some high initial value until the first moment when some bidder accepts and pays the current price;

    • This is useful for auctions of flowers, fresh farm products.
  3. nFirst-price sealed-bid auctions: bidders submit simultaneous “sealed bids” to the seller, who would then open them all together. The highest bidder wins the object and pays the value of her bid.

    • This is used in call for bid auctions (招標)
  4. Second-price sealed-bid auctions/Vickrey auctions: Bidders submit simultaneous sealed bids to the sellers; the highest bidder wins the object and pays the value of the second-highest bid

    • This is used in advertisement auction in Internet websites

拍卖形式的几个要点 A few points in auction formats

  1. Who get the object

    • Usually the highest or lowest bidder
  2. What kind of price to pay

    • First price or second price, this influences the strategy of the bidders
  3. Do the bidders know the price of others

    • Sealed auction, needs to guess about others

    • Unsealed auction, can see other’s bids

A summary

  1. Auction is common in our life
  2. Auction has many formats
  3. Auction different objects may need to choose different auction formats

Gaming in Auction

Sealed-bid auctions

  1. First-price sealed-bid auctions (FPA)
    • Highest bidder wins and pays the value of her bid
  2. Second-price sealed-bid auctions (SPA)
    • Highest bidder wins and pays the value of the second-highest bid.
  3. Formulating as a game
    • Players: bidders
    • Strategy: bid
    • Payoff: true value – payoff, or 0 (if auction fails)
  4. A game-oriented thinking
    • Equilibrium! Best response to each other, no one changes

为什么要支付次价?

  1. 假设有一件物品 An object

    • 不同人,对其价值有不同的认识(即愿意花的最多的钱):v1, v2, …, vk

      Different people may have different values for it, v1, v2, …, vk. These are true value/intrinsic value, i.e., player i will pay at most vi.

    • Players don’t know other’s true values

    • 每个人提出一个竞拍价:b1,Every one has a bid, assume b1 > b2 > … > bk

    • 出价最高者得到购买权,但只需支付 (b中)第二高的出价 By the rule of second-price auction, the payoff of the highest bidder is v1 – b2 and others are 0; here, b1 is the highest bid, so v1 is the true value

      How do you play the game, i.e., how do you bid?

8.4 拍卖中的博弈与占优策略

拍卖作为一种博弈 Formulating as a game

  • 参与人:参加拍卖的人 Players: bidders
  • 策略:出价 Strategy: bid
  • 回报:对物品的估值-支付的价格,或者0(不成交) Payoff: true value – payoff, or 0 (if auction fails)

报价密封拍卖的两种基本形式 Sealed-bid auctions

首价密封拍卖(FPA) First-price sealed-bid auctions (FPA)

  • 最高报价者得到交易权支付最高报价 Highest bidder wins and pays the value of her bid

次价密封拍卖(SPA) Second-price sealed-bid auctions (SPA)

  • 最高报价者得到交易权支付次高报价 Highest bidder wins and pays the value of the second-highest bid.

一般性考虑次价密封拍卖问题 Second-price sealed-bid auctions

  • 一件物品 An object

  • 估值,不同人对它的价值有不同的认识(即底价,最多愿意花的钱):v1, v2, …, vk

    Different people may have different values for it, v1, v2, …, vk. These are true value/intrinsic value, i.e., player i will pay at most vi.

  • 每个人提出一个竞拍价:b1,Every one has a bid, assume b1 > b2 > … > bk

  • 次价拍卖规则:出价最高者得到购买权,但只需支付 (b中)第二高的出价,即v1-b2是出价最高的参与人的收益,其他人收益为0. By the rule of second-price auction, the payoff of the highest bidder is v1 – b2 and others are 0; here, b1 is the highest bid, so v1 is the true value

  • 你如何出价最优?

什么策略“最优”? What strategy is optimal

博弈论视角(占优策略):不可能通过改变其他策略得到更大的回报(估值-付出),无论别人出价策略如何。

结论:按照自己的估值出价最优

次价拍卖按估值出价最优性的论证

  • 假设在一次拍卖活动中你认为拍品最多值100元,你也出价100元现在考虑你能否通过不同的出价获得较大回报。
  • 第一,你获得了交易权。此时,你有正的回报
    • 提高报价不会改善回报;(100-次价>0)
    • 降低报价,若不低于第二个人的,也不会改善回报,若低于第二个人的,则失去了交易权,回报变成0(减少了)
  • 第二,你没有获得交易权(有人出价X>100)。此时,你的回报为0。
  • 隆低报价不会改变现状;
  • 提高报价,若不高于第一个人的报价,也不会改善回报,若高于第一个人的,你赢得交易权,但要支付原来第一个人的报价(高于你的估值),于是回报为负(减少了)(100-0<0)

小结

  • 采用次价密封拍卖的规则,拍卖一件物品,对参拍人来说,按照估值报价是占优策略
    • 在现实中,参拍人自觉应用这个结论的困难在于每个人其实很难知道自己对一件物品“估值”到底是多少
      • “估值” 不等于 “我愿意出的钱数”
        “估值” = “我绝不接受高于这个钱数”
  • 首价密封拍卖没有这个性质

In second-price sealed-bid auction, bid the true value is a dominant strategy (more in Chapter 15)
First-price seal-bid auction does not have such property

It may not be easy to know the “true value”

Lec 10

9.1 匹配问题 Matching

匹配市场问题框架 The Matching Market Framework

Sellers, buyers, valuation, different on different objects

We care whether there is a matching that everybody will be satisfied (i.e., happy)

What kind of mechanism will provide such matching

简单匹配问题

  • 我们看到,二部图是表达这种供需关系的方便自然公的工具
  • 问,是否有一种安排(配置),使每人都满意?

Y表示想要,N表示不想要

这两个图体现的要求能被满足吗?

图A不存在完美匹配,图B有一个完美匹配

  • “不能满足” = 存在“受限组”;S,N(S)
  • “能被满足” = 存在“完美匹配”

基于估值的匹配问题 Matching based on valuations

  • 人们表达偏好不一定就是“要”和“不要”。对不同的物品,可以有多样性的价值判断〔估值)Not only buying or not buying, there can be different valuation on a single object
    • 同一个物品,不同的人,估值可能不同
    • 同一个人,不同的物品,估值可能不同
  • 如何在“人”和“物品”之问进行配置(匹配)?

如何在他们与它们之间进行安排?

  • 比如Xin对room1的看法是12,room2是2,room3是4
  • 这里,不同于简单匹配问题,不存在“能否安排”的问题,而是哪种安排“最好”的问题

哪一种配置较好?

如何评估一种安排的好坏?:社会福利=参与人收益综合

a:12+7+2=21

b:2+6+7=15

还有没有更好安排?社会最优:参与人收益总和最大

小结

  • 匹配问题描述的基本框架
  • 通过简单匹配问题,引入供需关系之间的二部图表达方式
  • 不同供需配置方案的优劣比较指数一社会福利,以及对“社会最优”性质的认识

9.2 匹配市场问题的解 Solving the Matching Market Problems

  • 如果供需方的个数分别为N,则一共有N!个配置方案

例子

21=12+7+2

23=12+6+5

偏好卖家图 Preferred seller graph

a涨价后

ab涨价后

a继续涨价

完美匹配卖家图

12+6+5=23

进一步理解匹配问题

最优配置对应估值矩阵中使求和最大的不同行不同列的那些元素

其他例子

9+6+7+9=31

9.3 市场经济思路

  • 引入价格机制,让需求方选择(博弈),看最后是否已能到到宏观最优的结果。

清仓价格 –> 最大的$\sum$估值

$$
\sum 收益(差价)=\sum 估值-\sum 价格
$$

  • 考虑清仓价格,以及偏好卖家图中的完美匹配对应的配置(M)

10.1 搜索引擎中的广告市场

搜索引擎广告市场的基本问题

  1. 广告主如何向互联网公司支付广告费用?
  2. 互联网公司如何将广告空间分配给广告主?
  3. 每个关键词广告的定价问题?

互联网广告的付费方式

  • 按展示时间计费
  • 按行动计费
    • CPC广告(cost per click):每次点击的费用
    • CPA (cost per action)
  • 按销售计费

点击费用排名

分配问题

  • 市场匹配的基本原则:价格可以使商品的分配分散化,产生最优的社会分配

每个广告位的价值

  • 广告位价值 = 点击率 * 点击价值针对不同广告主

定价问题

  • 匹配市场——前提是知道广告主的价值
  • 竞价拍卖——如果无从得知广告主的价值

小结

  • 网络广告类型及其盈利模式
  • 搜索引擎基于关键词的广告市场问题
    • 收费方式
    • 广告位与广告主分配
    • 广告位定价模式

10.2 多广告主、多广告位的匹配

匹配市场基本要素

  • 商品有不同售价

  • 买家对每个商品有估值

  • 匹配原则:最大化回报

  • 完美匹配——最优分配

  • 市场清仓价——完美匹配价格

构造广告位的市场清仓价格

初始情况

a加价到5

a提高到10

a,b同时提高

  • 按照点击价值的高低配置对应的广告位达到社会最优

小结

  • 运用匹配市场原理构造广告位的市场清仓价
  • 如果不了解估值,采用拍卖机制

10.3 次价拍卖方式的直接推广(GSP)

用哪种方式?

  • 单品首价拍卖
  • 单品次价拍卖
  • GSP:单品次价拍卖机制的一种“自然”推广

GSP实现规则

  • 支付点击价:b2
  • 回报:v1*r1 - b2 * r1
  • v1是广告主1的点击估值

占优策略?

最优分配?

纳什均衡?

真实报价不能构成纳什均衡

  • x为b的支付的点击费用应该是z的出价1
  • 那么他的回报就是:x的点击估值:7* b的点击率:4 - 支付的价格:1 * b的点击率4:
  • 7*4 -1 *4

构建GSP定价机制最优纳什均衡

每个广告主, 计算出他对每一个广告位的估值。

X对三个广告位的估值分别是70,28,0

Y对三个广告位的估值是60,24,0;

Z对三个广告位的估值是10,4,0

那么有了估值以后,我们就可以计算出,三个广告位的市场清仓价格,

我们计算出来,这三个广告位A、B、C,它的市场清仓价分别是40,4,0;

GSP存在多重均衡

  • 上表x=3, y=5, z=1依旧满足纳什均衡
  • GSP多重均衡包括社会最优均衡,非优均衡

小结

  • GSP——真实报价不是占优策略,多重均衡
  • GSP是实际中用得较多的广告位定价机制
    -广告主容易懂
    -性质比较复杂

10.4 次价拍卖方式的优化推广(VCG)

对单品次价拍卖支付价格的一种理解

  • 第一个人支付价V2含义:补偿给其他人(集体)带来的价值损失

从这个思路推广单品次价拍卖

  1. 构建一个最优分配

  2. j为I支付的VCG价格 = $\sum_2-\sum_1$

VCG价格计算例子

x为a应该支付的VCG价格

  • 如果x存在:

  • 如果x不出现时

  • 所以x的VCG价格应该是25-12=13

y为b应该支付的VCG的价格

  • y出现时

  • y不出现时

  • 所以y的VCG价格是 35-32=3

z为c应该支付的VCG的价格

  • 0,因为它的出现与否不影响其他人获得的价值

VCG机制的特性

  • 社会最优
  • 真实报价是占优策略

搜索公司的收入

  • 基于VCG机制

    总收入=40+4=44

  • 基于GSP机制

    总收入=4 * 10+2 * 4 = 48

    另外一组

    总收入=30+4=34

小结

  • VCG是理论上最漂亮的广告位定价机制
  • GSP是实际中用的较多的广告位定价机制

为什么VCG是优化的

最高估值总和$V_B^S=\sum v_{ij}$

VCG定价机制的执行

证明

小结

  • VCG是理论上最漂亮的广告位定价机制
    • 社会最优
    • 鼓励真实报价(占优策略)