Implementing the Prisoner's Dilemma from Scratch in Python

yurika34 · posted on 2022-9-1 22:36

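(All the code is in the attachment.)
1. Introduction to the Prisoner's Dilemma
Two suspected accomplices are caught by the police after a crime and interrogated in separate rooms. The police know both are guilty but lack sufficient evidence. Each suspect is told: if both deny, each gets one year; if both confess, each gets eight years; if one confesses and the other denies, the one who confesses goes free and the one who denies gets ten years. So each prisoner faces two choices: confess or deny. Whatever the accomplice does, however, each prisoner's best choice is to confess. If the accomplice denies, confessing means going free while denying means one year, so confessing is better. If the accomplice confesses, confessing means eight years while denying means ten, so confessing is still better. (Taken from Baidu as background only; skip it if you already know the story.)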
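
The "confessing is always better" argument can be checked mechanically. Below is a minimal sketch that only uses the sentences from the story (1, 8, 0 and 10 years); the names SENTENCES and best_reply are made up for this illustration and are not part of the attached code.

```python
# Sentences in years from the story above, keyed by (my_choice, partner_choice).
# Each value is (my_sentence, partner_sentence).
CONFESS, DENY = "confess", "deny"

SENTENCES = {
    (DENY, DENY): (1, 1),
    (CONFESS, CONFESS): (8, 8),
    (CONFESS, DENY): (0, 10),
    (DENY, CONFESS): (10, 0),
}


def best_reply(partner_choice):
    """Return the choice that minimises my own sentence against partner_choice."""
    return min((CONFESS, DENY), key=lambda mine: SENTENCES[(mine, partner_choice)][0])


for partner in (DENY, CONFESS):
    print(f"partner plays {partner}: my best reply is {best_reply(partner)}")
```

Both lines report "confess", even though mutual denial (1 year each) is better for the pair than mutual confession (8 years each). That gap is what the repeated game tries to close.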
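A prison sentence is a negative payoff, so it is easier to think in terms of scores: (A, B) denotes the scores of A and B under their respective choices.
[Image: 得分.png (score table)]

If both sides cooperate, each gets one point; if both defect, both get zero; and so on.
In a repeated game, how can you avoid the lose-lose outcome, or in other words, score more points for yourself? Here is a Python implementation. YouTube link: https://youtu.be/pMHOqotUiP8
The attachment contains all the .py files used. There are no external libraries; every utility function is hand-written. Besides showing how to implement the game in Python, this small project also shows that you don't have to cram all your code into a single .py file.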

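2. Package overview
After downloading and extracting, run main.py directly. Option 1 tests the score of one fixed strategy against all strategies; option 2 lets you play against a fixed strategy yourself. Take titForTat under option 1 as an example. It first plays 20 rounds of the repeated game against the other strategies and reports the average score. Then, in the spirit of "if everybody was doing it", it plays against itself and reports that average too. Option 1 calls AIsimulation.py, which pits the chosen strategy against the others; option 2 calls AhumanGame.py. Here is the code of main.py: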

# Import the simulation and human-play modules, plus one module per strategy.
import AIsimulation, AhumanGame, alwaysCollude, alwaysDefect, titForTat, randomBasic, randomColluding, randomDefecting, grudger, pavlov, Sanjin, myStrategy, titfor2Tat

# Menu labels shown to the user.
choices = ['1-alwaysCollude', '2-alwaysDefect', '3-titForTat', '4-randomBasic', '5-randomColluding', '6-randomDefecting', '7-grudger', '8-pavlov', '9-Sanjin', '10-myStrategy', '11-titfor2Tat']
# Maps the number the user types to the matching strategy module.
strategies = {1: alwaysCollude, 2: alwaysDefect, 3: titForTat, 4: randomBasic, 5: randomColluding, 6: randomDefecting, 7: grudger, 8: pavlov, 9: Sanjin, 10: myStrategy, 11: titfor2Tat}

print('Here are your game options')
print('press 1 to test your AI strategy against all other AI strategies')
print('press 2 to play against an AI strategy of your choice')
choice = int(input())

if choice == 1:
    print('here are the strategies, choose one')
    print(choices)  # show the available strategies
    num = int(input('choose a strategy via number: '))
    strategy = strategies[num]
    # Hand the chosen strategy to testStrategy in AIsimulation, with 20 rounds.
    AIsimulation.testStrategy(strategy, 20)
elif choice == 2:
    print('who do you want to play against')
    print(choices)
    num = int(input('choose a strategy via number: '))
    strategy = strategies[num]
    rounds = int(input('how many rounds do you want to play: '))
    AhumanGame.play(strategy, rounds)

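3. Utility functions
You might expect the utility functions that describe a strategy to be complicated, but a strategy can be as simple as a single function. Take the classic titForTat (tit for tat) as an example: it records the opponent's moves and repeats the most recent one. If opponentMove == 'start', i.e. the opening round, it returns 1 (collude); after that it returns opponentHistory[-1]. Here is the code for this function: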

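# Opponent's moves in the current match, kept at module level so it survives between calls.
opponentHistory = []

def play(opponentMove):
    # 'start' marks the opening round: clear the history and open with 1 (collude).
    if opponentMove == 'start':
        opponentHistory.clear()
        return 1
    # Record the opponent's latest move and mirror it.
    opponentHistory.append(opponentMove)
    return opponentHistory[-1]

def name():
    return 'titForTat'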
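
To see the mirroring in action, here is a small trace. It is a sketch, not part of the attachment: play below is a condensed version of the titForTat rule from the listing above, the opponent's move sequence is invented, and 0 is assumed to mean "defect".

```python
def play(opponentMove):
    # Condensed titForTat rule: open with 1, then copy the opponent's last move.
    if opponentMove == 'start':
        return 1
    return opponentMove


# Hypothetical opponent moves for rounds 1..5 (1 = collude, 0 = defect, assumed).
opponent_moves = [1, 0, 0, 1, 1]

my_moves = []
previous = 'start'
for move in opponent_moves:
    my_moves.append(play(previous))
    previous = move  # what titForTat will be shown in the next round

print(my_moves)  # [1, 1, 0, 0, 1]
```

The reply always lags the opponent by exactly one round: the defection in round 2 is punished in round 3, and the return to colluding in round 4 is rewarded in round 5.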

It is also worth mentioning that there is a myStrategy.py, where you can write your own strategy and pit it against the others. For example:

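# Opponent's moves in the current match, kept at module level so it survives between calls.
# (A list created inside play() would be reset on every call and only ever hold one move.)
opponentHistory = []

def play(opponentMove):
    # 'start' marks the opening round (assumed to be sent once per match): reset and collude.
    if opponentMove == 'start':
        opponentHistory.clear()
        return 1
    opponentHistory.append(opponentMove)
    # Fraction of rounds so far in which the opponent played 1 (collude).
    average = sum(opponentHistory) / len(opponentHistory)
    # Collude only if the opponent colluded in more than 70% of those rounds.
    if average > 0.7:
        return 1
    else:
        return 0

def name():
    return 'myStrategy'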
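
If you want the history to belong to one player rather than to the whole module (for instance so that two copies of the same strategy could face each other without sharing a list), one option is a closure. This is only a sketch of that idea: make_threshold_strategy is a made-up name, it is not how the attached files are organised, and the move sequence is invented.

```python
def make_threshold_strategy(threshold=0.7):
    """Return a play() function that remembers every opponent move it has seen."""
    history = []  # private to this one player

    def play(opponentMove):
        # 'start' marks the opening round: forget the previous match and collude.
        if opponentMove == 'start':
            history.clear()
            return 1
        history.append(opponentMove)
        # Collude only while the opponent's share of 1s stays above the threshold.
        return 1 if sum(history) / len(history) > threshold else 0

    return play


player = make_threshold_strategy()
replies = [player(move) for move in ['start', 1, 1, 0, 1, 1, 1]]
print(replies)  # [1, 1, 1, 0, 1, 1, 1]
```

After the single 0 the opponent's average drops to 2/3, so the player answers with 0 once; at 3/4 it is back above 0.7 and the player colludes again.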

The code isn't hard to follow. Have fun!

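ct268gh · posted on 2022-9-2 11:16

Both deny: (1 year, 1 year). One confesses: (10 years, 0) or (0, 10 years). Both confess: (8 years, 8 years).
Both denying is the best joint outcome, yet for each individual confessing is still better than denying.
If a lawyer is helping, or if they fear retaliation, both will certainly deny.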

lansemeiying · posted on 2022-9-2 07:06

Pretty good.

逐雅斋 · posted on 2022-9-2 07:11

Nice one, thanks for sharing!

JaqenZe · posted on 2022-9-2 07:48

Solving a basic game-theory model programmatically.

canso123 · posted on 2022-9-2 07:50

Learned something, thanks for sharing.

Lunarcsy · posted on 2022-9-2 07:55

It does work.

sunshinepoch7 · posted on 2022-9-2 08:03

Game theory! Thanks to the OP for sharing, I learned something.

ErXing · posted on 2022-9-2 08:25

Thanks for sharing.

wqipk · posted on 2022-9-2 08:48

Thanks to the OP for sharing.

cheny12120 · posted on 2022-9-2 09:32

This just gave me an idea. I'll borrow from your experience and build one of my own for fun.
