LC's Blog

May 2016 Blog Posts

Installing and Configuring a PPTP VPN Server on CentOS 5.8 (this CentOS 5.8 configuration does not apply to CentOS 6.5)

LC 2016-05-26 Networking

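I. Check whether the server environment supports a PPTP VPN
1. Check whether the kernel supports the MPPE patch
modprobe ppp-compress-18 && echo success
If "success" is printed, the kernel supports MPPE. If it is not, install kernel-devel first:
yum install kernel-devel
2. Check whether TUN/TAP support is enabled
cat /dev/net/tun
The check passes if the command prints:
cat: /dev/net/tun: File descriptor in bad state
3. Check whether ppp support is enabled
cat /dev/ppp
The check passes if the command prints:
cat: /dev/ppp: No such device or address
All three checks must pass; otherwise the PPTP VPN cannot be installed.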
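
The pass criteria above are easy to misread, because an error message from cat is the good outcome. A minimal sketch (Python 3) of the decision logic, assuming you have already captured the output of each command yourself; the function names are mine and not part of any tool:

```python
# Error text that `cat <device>` prints when the feature IS available,
# taken from the checks described above.
EXPECTED_ERRORS = {
    "/dev/net/tun": "File descriptor in bad state",
    "/dev/ppp": "No such device or address",
}


def check_passed(device, cat_output):
    """Return True if `cat <device>` printed the message that signals support."""
    return EXPECTED_ERRORS[device] in cat_output


def all_checks_passed(modprobe_ok, cat_outputs):
    """modprobe_ok: did the MPPE check print success?
    cat_outputs: mapping of device path to the captured cat output."""
    return modprobe_ok and all(
        check_passed(device, cat_outputs.get(device, ""))
        for device in EXPECTED_ERRORS
    )
```

For example, all_checks_passed(True, {...}) only returns True when both device messages match and the modprobe check succeeded.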

II. Check whether the system is 32-bit or 64-bit
Run:
cat /etc/issue
cat /proc/version
uname -a
Then download the matching 32-bit (i386) or 64-bit (x86_64) packages:
1. ppp # required by pptpd
http://poptop.sourceforge.net/yum/stable/packages/ppp-2.4.4-14.1.rhel5.x86_64.rpm
http://poptop.sourceforge.net/yum/stable/packages/ppp-2.4.4-14.1.rhel5.i386.rpm
2. pptpd # the latest version at the time of writing
http://poptop.sourceforge.net/yum/stable/packages/pptpd-1.4.0-1.rhel5.x86_64.rpm
http://poptop.sourceforge.net/yum/stable/packages/pptpd-1.4.0-1.rhel5.i386.rpm
After downloading, upload the packages to /usr/local/src.

III. Install pptpd
cd /usr/local/src
rpm -ivh ppp-2.4.4-14.1.rhel5.x86_64.rpm # install ppp
rpm -ivh pptpd-1.4.0-1.rhel5.x86_64.rpm # install pptpd

IV. Configure pptpd
1. vi /etc/ppp/options.pptpd # edit; add or modify the following parameters
name pptpd
refuse-pap
refuse-chap
refuse-mschap
require-mschap-v2
require-mppe-128
proxyarp
lock
nobsdcomp
novj
novjccomp
nologfd
ms-dns 114.114.114.114 # primary DNS server
ms-dns 8.8.8.8 # secondary DNS server
:wq! # save and quit
2. vi /etc/ppp/chap-secrets # set the PPTP dial-in users and passwords (several users are allowed, one per line)
# client   server secret IPaddresses
test      pptpd   123456       *
lc      pptpd   1234        *
Format: username pptpd password *
The * means the client's IP address is assigned automatically.
:wq! # save and quit
3. vi /etc/pptpd.conf # set the PPTP server IP address and the IP address pool for VPN clients
option /etc/ppp/options.pptpd
logwtmp
localip 10.10.10.1 # virtual IP address of the PPTP server (note: not the server's own IP address)
remoteip 10.10.10.10-100 # dial-in users are dynamically assigned addresses from 10.10.10.10 to 10.10.10.100
:wq! # save and quit
/sbin/service pptpd start # start pptpd
/etc/init.d/pptpd stop # stop
service pptpd restart # restart
chkconfig pptpd on # start at boot
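
The remoteip value uses a shorthand in which only the last octet of the end address is written. A small sketch (Python 3) that expands this shorthand so you can count the pool and confirm that localip is not inside it. expand_remoteip is a hypothetical helper of mine, and it only handles the single "start-lastoctet" form used in this post:

```python
import ipaddress


def expand_remoteip(spec):
    """Expand a range such as '10.10.10.10-100' into a list of addresses."""
    start_text, dash, last_octet = spec.partition("-")
    start = ipaddress.IPv4Address(start_text)
    if not dash:
        return [start]  # a single address, no range given
    prefix = start_text.rsplit(".", 1)[0]
    end = ipaddress.IPv4Address("%s.%s" % (prefix, last_octet))
    if end < start:
        raise ValueError("range end is below range start: %r" % spec)
    return [ipaddress.IPv4Address(n) for n in range(int(start), int(end) + 1)]


pool = expand_remoteip("10.10.10.10-100")
print(len(pool), pool[0], pool[-1])
print(ipaddress.IPv4Address("10.10.10.1") in pool)
```

The pool size is also the upper bound on simultaneous clients that can receive an address from this range.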

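V. Enable routing on the server so that it forwards packets
vi /etc/sysctl.conf # edit
net.ipv4.ip_forward = 1 # set to 1
#net.ipv4.tcp_syncookies = 1 # comment this line out
:wq! # save and quit
/sbin/sysctl -p # apply the settings immediately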

At this point, once a client has dialed in over PPTP, the client shows an additional network interface:
PPP adapter 239:

        Connection-specific DNS Suffix . :
        IP Address. . . . . . . . . . . . : 10.10.10.10
        Subnet Mask . . . . . . . . . . . : 255.255.255.255
        Default Gateway . . . . . . . . . : 10.10.10.10

The server also gains an interface:
ppp0      Link encap:Point-to-Point Protocol
          inet addr:10.10.10.1 P-t-P:10.10.10.10 Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1396 Metric:1
          RX packets:364 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:26216 (25.6 KiB) TX bytes:104 (104.0 b)

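VI. Set up firewall forwarding rules
yum install iptables # install the firewall
service iptables start # start the firewall
iptables -t nat -A POSTROUTING -s 10.10.10.0/255.255.255.0 -j SNAT --to-source 192.168.1.200 # add rule
# Explanation: this sets the server address that VPN client traffic is translated to, i.e. the source IP that sites see when they are visited through the VPN
iptables -A FORWARD -p tcp --syn -s 10.10.10.0/255.255.255.0 -j TCPMSS --set-mss 1356 # add rule
/etc/init.d/iptables save # save the firewall rules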
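
The 1356 in the TCPMSS rule is consistent with the ppp0 MTU of 1396 shown in the ifconfig output: the MSS is the MTU minus a 20-byte IPv4 header and a 20-byte TCP header (assuming no IP or TCP options). A short sketch (Python 3) of that arithmetic, plus the equivalence between the dotted netmask used in the rules and /24 notation; mss_for_mtu is a name I made up:

```python
import ipaddress

IPV4_HEADER_BYTES = 20  # IPv4 header without options
TCP_HEADER_BYTES = 20   # TCP header without options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet on a link with this MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


print(mss_for_mtu(1396))

# The dotted-netmask form used in the iptables rules is the same network as /24.
vpn_net = ipaddress.ip_network("10.10.10.0/255.255.255.0")
print(vpn_net, vpn_net.prefixlen)
```

If your ppp0 interface reports a different MTU, recompute the value instead of copying 1356.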

VII. Open the PPTP service port tcp 1723 and allow the VPN client address pool 10.10.10.0/255.255.255.0 through the firewall
vi /etc/sysconfig/iptables # edit; add the following lines
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 1723 -j ACCEPT
-A RH-Firewall-1-INPUT -s 10.10.10.0/255.255.255.0 -j ACCEPT
:wq! # save and quit
Notes:
#192.168.1.200 is the server's IP address
#10.10.10.0/255.255.255.0 is the PPTP virtual address range configured in step IV
/etc/init.d/iptables restart # restart the firewall
chkconfig iptables on # start at boot

cat /etc/sysconfig/iptables # view the firewall configuration file
# Generated by iptables-save v1.3.5 on Wed Dec 11 20:21:08 2013
*nat
:PREROUTING ACCEPT [60:4680]
:POSTROUTING ACCEPT [4:258]
:OUTPUT ACCEPT [4:258]
-A POSTROUTING -s 192.168.1.0/255.255.255.0 -j SNAT --to-source 192.168.1.200
COMMIT
# Completed on Wed Dec 11 20:21:08 2013
# Generated by iptables-save v1.3.5 on Wed Dec 11 20:21:08 2013
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [94:16159]
:RH-Firewall-1-INPUT - [0:0]
-A INPUT -j RH-Firewall-1-INPUT
-A FORWARD -j RH-Firewall-1-INPUT
-A FORWARD -s 10.10.10.0/255.255.255.0 -p tcp -m tcp --tcp-flags FIN,SYN,RST,ACK SYN -j TCPMSS --set-mss 1356
-A RH-Firewall-1-INPUT -i lo -j ACCEPT
-A RH-Firewall-1-INPUT -p icmp -m icmp --icmp-type any -j ACCEPT
-A RH-Firewall-1-INPUT -p esp -j ACCEPT
-A RH-Firewall-1-INPUT -p ah -j ACCEPT
-A RH-Firewall-1-INPUT -d 192.168.1.200 -p udp -m udp --dport 5353 -j ACCEPT
-A RH-Firewall-1-INPUT -p udp -m udp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 631 -j ACCEPT
-A RH-Firewall-1-INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 22 -j ACCEPT
-A RH-Firewall-1-INPUT -p tcp -m state --state NEW -m tcp --dport 1723 -j ACCEPT
-A RH-Firewall-1-INPUT -s 10.10.10.0/255.255.255.0 -j ACCEPT
-A RH-Firewall-1-INPUT -j REJECT --reject-with icmp-host-prohibited
COMMIT
# Completed on Wed Dec 11 20:21:08 2013

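VIII. Create the ppp device node automatically at boot (the node may be lost after a reboot, which causes error 619 when PPTP clients dial in)
vi /etc/rc.d/rc.local # edit
mknod /dev/ppp c 108 0 # add this line at the end of the file
:wq! # save and quit
The PPTP VPN server on CentOS is now complete. On a Windows client, create a VPN connection, enter the server's public IP address, and connect with the account and password configured above.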

Installing and Configuring a PPPoE Server on CentOS 5.8, with Troubleshooting Notes

LC 2016-05-24 Networking

rp-pppoe is a package that bundles both a PPPoE dial-up client and a PPPoE server.

1. Download

https://www.roaringpenguin.com/products/pppoe

2. Build and install:

Put rp-pppoe-3.11.tar.gz in the /opt directory

cd /opt
tar -zxvf rp-pppoe-3.11.tar.gz
cd /opt/rp-pppoe-3.11/src
./configure

2.1 If this fails with the error "no acceptable C compiler found in $PATH",
run yum install gcc to install the GCC toolchain.

Run ./configure again.
Once it succeeds:
make
make install

3. Next, edit the PPPoE configuration files

vi /etc/ppp/pppoe.conf
Change the following parameters:

ETH=eth1
USER=rp-pppoe
LINUX_PLUGIN=/etc/ppp/plugins/rp-pppoe.so

vi /etc/ppp/pppoe-server-options

# PPP options for the PPPoE server
# LIC: GPL
require-pap
require-chap #added by Liping
login
lcp-echo-interval 10
lcp-echo-failure 2
#following added by Liping
logfile /var/log/pppoe.log
ms-dns 114.114.114.114
defaultroute

vi /etc/ppp/chap-secrets

# Secrets for authentication using CHAP
# client server secret IP addresses
####### redhat-config-network will overwrite this part!!! (begin) ##########
####### redhat-config-network will overwrite this part!!! (end) ############
rp-pppoe * rp-pppoe *
This entry means that the username and the password are both rp-pppoe.

vi /etc/ppp/options

#lock
local

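4. Add firewall rules for NAT
Set the iptables policy:
iptables -A POSTROUTING -t nat -s 10.0.0.0/24 -j MASQUERADE
Note: the network after -s is the network that pppoe-server will hand out in the next step. You can choose any network you like, as long as iptables and pppoe-server agree.

iptables -A FORWARD -p tcp --syn -s 10.0.0.0/24 -j TCPMSS --set-mss 1256

echo 1 > /proc/sys/net/ipv4/ip_forward

sysctl -w net.ipv4.ip_forward=1

The first command adds NAT for traffic coming from the 10.0.0.0/24 network.
The second clamps the TCP MSS to allow for the reduced MTU; adjust the value to your needs.
The third enables IP forwarding by writing to /proc.
The fourth sets the same kernel parameter through sysctl, so either of the last two commands is enough.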

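5. Start the server: pppoe-server -I eth1 -L 10.0.0.1 -R 10.0.0.2 -N 20

Notes:

-I specifies the network interface to listen on. Use ifconfig to see the names of the active interfaces.
-L specifies the PPPoE server's IP address within a PPP connection. My Ethernet network is 10.0.0.0/24, so I use the first address of that network as the server address.
-R specifies the first IP address to hand out when clients connect to the server.
-N specifies the maximum number of clients that may be connected to the server at the same time.
For the remaining options, see man pppoe-server; every option has a default value.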
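
To see which client addresses the command above can hand out, here is a rough sketch (Python 3). It assumes that pppoe-server assigns addresses sequentially starting at the -R value, one per permitted session, which is how I read the man page; pppoe_pool is a name of my own:

```python
import ipaddress


def pppoe_pool(first_remote, max_sessions):
    """Addresses available to clients for given -R and -N values
    (assumes sequential assignment starting at -R)."""
    first = ipaddress.IPv4Address(first_remote)
    return [first + offset for offset in range(max_sessions)]


pool = pppoe_pool("10.0.0.2", 20)
print(pool[0], "to", pool[-1])

# The iptables rule uses 10.0.0.0/24, so every pool address must fall inside it.
nat_net = ipaddress.ip_network("10.0.0.0/24")
print(all(addr in nat_net for addr in pool))
```

If you raise -N, check that the last pool address still falls inside the network named in the MASQUERADE rule.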

Once the steps above succeed, the PPPoE server is set up. The next step is to verify it from Windows.

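6. Verify from Windows

I used Windows XP; Windows 7 or a router should work as well.
The authentication settings need changing, though: under Properties -> Security -> Advanced -> Settings, change "Data encryption" to "Optional encryption", then tick CHAP or PAP.
Enter the username and password; you should then be connected to the PPPoE server on Linux and able to reach the Internet.

After dialing in, the client has a ppp device: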
PPP adapter 123:

Connection-specific DNS Suffix . :
IP Address. . . . . . . . . . . . : 10.0.0.7
Subnet Mask . . . . . . . . . . . : 255.255.255.255
Default Gateway . . . . . . . . . : 10.0.0.7

The server also gains a ppp0 device:
ppp0      Link encap:Point-to-Point Protocol
          inet addr:10.0.0.1 P-t-P:10.0.0.7 Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1480 Metric:1
          RX packets:45 errors:0 dropped:0 overruns:0 frame:0
          TX packets:22 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:3
          RX bytes:4033 (3.9 KiB) TX bytes:1340 (1.3 KiB)

7. The discovery process

The PPPoE protocol goes through the following exchanges:

client       server
     ---PADI--->
     <--PADO----
     ---PADR--->
     <--PADS----

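Explanation

1.PADI
The PPPoE client sends a PPPoE Active Discovery Initiation (PADI) packet. The destination address in the Ethernet header is the broadcast address FF:FF:FF:FF:FF:FF, the CODE in the PPPoE header is 0x09, and SESSION_ID must be 0. The payload must contain exactly one Service-Name TAG indicating the requested service, and may contain other TAGs. The whole PPPoE packet must not exceed 1484 bytes.

2.PADO
When the PPPoE server process sees a PADI packet on the network interface, it answers with a PPPoE Active Discovery Offer (PADO) packet. The destination address in the Ethernet header is the client's MAC address, the CODE in the PPPoE header is 0x07, and SESSION_ID must be 0. The payload must contain one AC-Name TAG giving the name of this access concentrator (AC) and a Service-Name TAG matching the one in the PADI, and may contain further Service-Name TAGs. If the AC does not offer service to this client, it does not reply with a PADO.

3.PADR
After receiving PADO packets, the client picks one (several PPPoE servers may answer; usually the fastest is chosen) and sends a PPPoE Active Discovery Request (PADR) packet. The destination address in the Ethernet header is the source address of the chosen PADO (that is, the PPPoE server's MAC address), the CODE in the PPPoE header is 0x19, and SESSION_ID must be 0. The payload must contain exactly one Service-Name TAG indicating the requested service, and may contain other TAGs. If the host receives no PADO within the specified time, it should resend its PADI and double the waiting period; this is repeated as many times as desired.

4.PADS
The PPPoE server whose MAC address matches receives the PADR and sends a PPPoE Active Discovery Session-confirmation (PADS) packet. It generates a SESSION_ID that identifies this PPP session and sends it to the client in the PADS packet. The destination address in the Ethernet header is the client's MAC address, the CODE in the PPPoE header is 0x65, and SESSION_ID must be the value just generated. The payload must contain exactly one Service-Name TAG indicating the service that the server accepted, and may contain other TAGs. If the server does not accept the Service-Name in the PADR, the PADS contains a Service-Name-Error TAG and SESSION_ID is set to 0. Once the host receives the PADS confirmation, both sides enter the PPP session stage.

5.PADT
PPPoE also has a PADT packet, which may be sent at any time after a session is established to terminate the PPPoE session, that is, to release it. Either the host or the access concentrator may send it. Once a PADT is received, the session may no longer be used to carry PPP traffic. A PADT needs no TAGs; its CODE is 0xa7 and its SESSION_ID is the identifier of the session to terminate. After a PADT is sent or received, even the normal PPP termination packets need not be sent. The PPP peers should normally use PPP itself to end the session, but PADT is available when PPP cannot be used.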
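
The header fields described above can be packed and unpacked with a few lines of Python 3. This is only a sketch for experimenting with the layout, not code from rp-pppoe. The CODE values come from the text above; the 0x11 version/type byte, the 6-byte header layout and the Service-Name tag type 0x0101 are taken from RFC 2516 and are not stated in this post. All function names are my own:

```python
import struct

# CODE values from the discovery description above.
CODES = {"PADI": 0x09, "PADO": 0x07, "PADR": 0x19, "PADS": 0x65, "PADT": 0xA7}
TAG_SERVICE_NAME = 0x0101  # tag type per RFC 2516 (assumption, not in the post)
VER_TYPE = 0x11            # version 1, type 1 per RFC 2516 (assumption)


def build_tag(tag_type, value=b""):
    """One TAG: 2-byte type, 2-byte length, then the value."""
    return struct.pack("!HH", tag_type, len(value)) + value


def build_packet(name, session_id, payload=b""):
    """PPPoE header (ver/type, CODE, SESSION_ID, LENGTH) followed by the payload."""
    header = struct.pack("!BBHH", VER_TYPE, CODES[name], session_id, len(payload))
    return header + payload


def parse_packet(data):
    """Return (code, session_id, [(tag_type, tag_value), ...])."""
    _ver_type, code, session_id, length = struct.unpack("!BBHH", data[:6])
    payload = data[6:6 + length]
    tags, offset = [], 0
    while offset + 4 <= len(payload):
        tag_type, tag_len = struct.unpack("!HH", payload[offset:offset + 4])
        tags.append((tag_type, payload[offset + 4:offset + 4 + tag_len]))
        offset += 4 + tag_len
    return code, session_id, tags


# A PADI with SESSION_ID 0 and one empty Service-Name tag ("any service").
padi = build_packet("PADI", 0, build_tag(TAG_SERVICE_NAME))
print(parse_packet(padi))
```

Comparing such bytes with a Wireshark capture is a quick way to check your reading of the fields.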

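8. PPP session stage:

Once the client and server have completed the discovery stage, they enter the session stage. In the PPP session stage, PPP packets are encapsulated in PPPoE Ethernet frames. The Ethernet destination address is always a unicast address, the EtherType is 0x8864, the CODE in the PPPoE header must be 0, and SESSION_ID must remain the value negotiated during discovery. The PPPoE payload is a complete PPP packet, which begins with the two-byte PPP protocol ID.
During the session stage, either the host or the server may send a PADT (PPPoE Active Discovery Terminate) packet to tell the other side to end the session.

PPPoE authentication takes place in the PPP session stage. One way to look at it: the rp-pppoe package handles discovery and session termination (PADT), while the ppp package handles data transfer during the session.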
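
A session-stage payload can be sketched the same way (Python 3). CODE 0 and the two-byte PPP protocol ID come from the description above; the protocol numbers in the table (LCP, PAP, CHAP, IPv4) are standard PPP assignments that this post does not list, and the function names are hypothetical:

```python
import struct

ETHERTYPE_SESSION = 0x8864  # EtherType of session-stage frames, as stated above

# A few well-known PPP protocol IDs (standard values, not listed in the post).
PPP_PROTOCOLS = {0xC021: "LCP", 0xC023: "PAP", 0xC223: "CHAP", 0x0021: "IPv4"}


def build_session_payload(session_id, ppp_protocol, ppp_data):
    """PPPoE session header (CODE 0) followed by the PPP protocol ID and data."""
    ppp_packet = struct.pack("!H", ppp_protocol) + ppp_data
    header = struct.pack("!BBHH", 0x11, 0x00, session_id, len(ppp_packet))
    return header + ppp_packet


def parse_session_payload(data):
    """Return (session_id, protocol name, PPP data); reject non-session codes."""
    _ver_type, code, session_id, length = struct.unpack("!BBHH", data[:6])
    if code != 0x00:
        raise ValueError("not a session-stage packet: CODE=0x%02x" % code)
    (protocol,) = struct.unpack("!H", data[6:8])
    name = PPP_PROTOCOLS.get(protocol, hex(protocol))
    return session_id, name, data[8:6 + length]


frame = build_session_payload(0x1234, 0xC223, b"\x01")
print(parse_session_payload(frame))
```

Seeing LCP followed by PAP or CHAP in a capture tells you the link reached the authentication step, which is handled by pppd rather than by rp-pppoe.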

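9. Troubleshooting

Connection errors

Windows dial-up reports error 651

A likely cause is that the server was not started correctly. A Wireshark capture shows that Windows reports this error after sending four PADI packets without receiving a PADO reply.

So pppoe-server may have been started on the wrong network interface. If the server runs in a virtual machine, the network mode may also be the cause: using NAT mode instead of bridged mode can produce this problem.

The problem also occurs when the pppoe-server-options file is missing, or when it lacks the auth and require-chap options.

It can also happen when the Windows connection properties name a service that does not match the service name of the PPPoE server running on Linux.

In pppoe-server, the -S option sets the service name that the server offers.

Windows dial-up reports error 734

The error message is:

The PPP link control protocol was terminated
A possible cause is the login option in the pppoe-server-options file. With this option set, the username used to dial in must also exist as a Linux system user; otherwise this error occurs.

Windows dial-up reports error 628

The error message is:

The connection was terminated by the remote computer before it could be completed
A Wireshark capture shows that a PADT packet arrives immediately after the four discovery steps described above complete. The PADT carries this description:

Generic-Error: RP-PPPOE: child pppd process terminated
This description is quite misleading. Some people online even claim that PPPoE must be compiled into the kernel so that the -k option of pppoe-server can be used. I eventually found that it was a configuration problem after all, usually an option that the program does not recognize. This error is tedious to track down: use the logfile configured earlier, and comment out the options you are unsure of one at a time.

Username and password not recognized

Most likely the username or password was typed incorrectly, or the secrets file is wrong. Note that the two asterisks in the secrets entry must not be omitted.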
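
A quick way to catch a malformed secrets entry (for example, missing asterisks) is to split each line into its four fields. A minimal sketch (Python 3); it assumes the simple whitespace-separated form used in this post and does not handle quoted secrets or secrets that contain a # character. parse_secret_line is a hypothetical helper:

```python
def parse_secret_line(line):
    """Parse 'client server secret addresses'; return None for blank or comment lines."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = line.split()
    if len(fields) != 4:
        raise ValueError("expected 4 fields, got %d: %r" % (len(fields), line))
    client, server, secret, addresses = fields
    return {"client": client, "server": server,
            "secret": secret, "addresses": addresses}


print(parse_secret_line("rp-pppoe * rp-pppoe *"))
```

Running every line of /etc/ppp/chap-secrets through such a check before restarting the server makes the "two asterisks" mistake obvious.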

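Errors after connecting

In this class of error, Windows dials in successfully but cannot reach the Internet. Use tcpdump or wireshark on the Linux side to investigate.

Run:

tcpdump -i wlan0 host 10.10.10.100
The capture shows packets sent from host 10.10.10.100, but none sent to 10.10.10.100.

A possible cause is that IP forwarding is not enabled. When replies from the network reach the Linux host, it does not forward them to the Windows host; it drops them because the destination address is not its own.

Another possible cause is a missing NAT rule in the iptables POSTROUTING chain.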
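
Both causes can be checked with a small script before reaching for a packet capture. A rough sketch (Python 3): the /proc path is the one written to earlier in this post, and the rule check is a plain word match on iptables-save style text, so it will miss rules written in a different notation (for example a dotted netmask instead of /24). Both function names are my own:

```python
def ip_forwarding_enabled(path="/proc/sys/net/ipv4/ip_forward"):
    """Return True if the kernel forwarding switch contains 1."""
    with open(path) as handle:
        return handle.read().strip() == "1"


def has_postrouting_nat(rules_text, network):
    """Look for a POSTROUTING rule that NATs traffic from the given network."""
    for line in rules_text.splitlines():
        words = line.split()
        if ("POSTROUTING" in words and network in words
                and ("MASQUERADE" in words or "SNAT" in words)):
            return True
    return False


sample_rules = "*nat\n-A POSTROUTING -s 10.0.0.0/24 -j MASQUERADE\nCOMMIT\n"
print(has_postrouting_nat(sample_rules, "10.0.0.0/24"))
```

On the server, call ip_forwarding_enabled() with no arguments and pass the output of iptables-save to has_postrouting_nat.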

Garbled Chinese Text in Python

LC 2016-05-13 Python

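When using Python we often call the interactive function raw_input, but if the input is Chinese and is not decoded properly, the program sometimes does not produce the result you want.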

# -*- coding: utf-8 -*-
import os

print u"这里是来自LC\'的问候。" # 这里是来自LC'的问候。

print '=' * 10 # ==========
print u'这将直接执行' + os.getcwd() # 这将直接执行C:\Python27

print "直接打印Unicode" # 鐩存帴鎵撳嵃Unicode
 
print u"直接打印Unicode" # 直接打印Unicode
print u"Unicode转换成GB18030".encode('gb18030') # Unicode转换成GB18030
print "UTF-8中文转换到GB18030, 然后再打印".decode("utf-8").encode('gb18030') # UTF-8中文转换到GB18030, 然后再打印

while  True:
 import sys, locale
 message1 = raw_input(u'提问1:'.encode('gb18030')).decode(sys.stdin.encoding or locale.getpreferredencoding(True))
 if message1 == u"你好":
  print message1
 else:
  print u"我不知道你在说什么"

 message2 = raw_input(u'提问2>'.encode('gb18030'))
 print message2
 if message2 == u"你好":
  print message2
 else:
  print u"我不知道你在说什么"
 #import chardet
 #print chardet.detect(message)

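How do you convert the characters read by raw_input to UTF-8?

Python provides the decode and encode methods. First decode the str into Unicode, then encode the Unicode into the byte string you need.

decode: str -> decode('the_coding_of_str') -> unicode

encode: unicode -> encode('the_coding_you_want') -> str

A str consists of the bytes produced by encoding Unicode text. decode needs to know the encoding of its input; if the encoding is wrong, Python raises an error or produces garbled text.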
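
The same decode-then-encode idea, sketched in Python 3 syntax, where the Python 2 str corresponds to bytes and unicode corresponds to str. The script in this post is Python 2; this block is only an illustration, and transcode is a hypothetical helper name:

```python
def transcode(raw, source_encoding, target_encoding):
    """bytes in one encoding -> text -> bytes in another encoding."""
    return raw.decode(source_encoding).encode(target_encoding)


# Bytes as a GB18030 console would deliver them, converted to UTF-8.
gb_bytes = "你好".encode("gb18030")
utf8_bytes = transcode(gb_bytes, "gb18030", "utf-8")
print(utf8_bytes.decode("utf-8"))

# Decoding with the wrong codec does not always raise: UTF-8 bytes read as
# GBK turn into garbled text, which can still be reversed losslessly here.
text = "直接打印Unicode"
garbled = text.encode("utf-8").decode("gbk")
recovered = garbled.encode("gbk").decode("utf-8")
print(garbled)
print(recovered)

# A mismatch that cannot be decoded at all raises UnicodeDecodeError.
try:
    gb_bytes.decode("utf-8")
except UnicodeDecodeError as error:
    print("decode failed:", error.reason)
```

This is why the script decodes the raw_input result with the console's encoding before comparing it with a Unicode literal.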

C:\Python27>python bianmaceshi.py
这里是来自LC'的问候。
==========
这将直接执行C:\Python27
鐩存帴鎵撳嵃Unicode  -- printing the byte string directly gives this garbled text; the three forms that follow print correctly
直接打印Unicode
Unicode转换成GB18030
UTF-8中文转换到GB18030, 然后再打印
提问1:你好
你好
提问2>你好
你好
bianmaceshi.py:26: UnicodeWarning: Unicode equal comparison failed to convert both arguments to Unicode - interpreting them as being unequal
  if message2 == u"浣犲ソ":
我不知道你在说什么  -- this is where it goes wrong: the expected result is 你好; decoding the input as in question 1 (提问1) fixes it
提问1:呃
呃
我不知道你在说什么
提问2>呃
呃
我不知道你在说什么
提问1:

Extracting Data from MDX Files with Python

LC 2016-05-10 Python

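mdx/mdd files are dictionary files loaded by MDict, GoldenDict and similar programs. Sometimes we want to do our own typesetting, which means unpacking the mdx/mdd and extracting its text, images, audio and other data. A Python script can handle this.

I used Python 2.7 together with readmdict.py, ripemd128.py and pureSalsa20.py.

1. Run python readmdict.py
It reports the error: LZO compression support is not available

2. I opened readmdict.py in an editor and commented out the following lines: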

try:
    import lzo
except ImportError:
    lzo = None
    print("LZO compression support is not available")

Then run python readmdict.py again:

C:\Python27\readmdict>python readmdict.py
Try Brutal Force on Encrypted Key Blocks
Traceback (most recent call last):
  File "readmdict.py", line 649, in <module>
    mdx = MDX(args.filename, args.encoding, args.substyle, args.passcode)
  File "readmdict.py", line 503, in __init__
    MDict.__init__(self, fname, encoding, passcode)
  File "readmdict.py", line 105, in __init__
    self._key_list = self._read_keys_brutal()
  File "readmdict.py", line 399, in _read_keys_brutal
    key_list = self._decode_key_block(key_block_compressed, key_block_info_list)
  File "readmdict.py", line 205, in _decode_key_block
    if lzo is None:
NameError: global name 'lzo' is not defined

This fails with a NameError because the rest of the script still refers to lzo, so I restored the file.
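
The try/except block is what keeps the name lzo defined when the module is missing, which is why commenting it out leads to a NameError later on. A minimal sketch of the pattern (written to run on Python 3 as well); decompress_lzo is a hypothetical wrapper of mine, not a function in readmdict.py:

```python
try:
    import lzo  # provided by the python-lzo package; may not be installed
except ImportError:
    lzo = None  # keep the name defined so later checks still work


def decompress_lzo(block):
    """Decompress an LZO block, or fail with a clear message if lzo is missing."""
    if lzo is None:
        raise RuntimeError("LZO compression is not supported")
    return lzo.decompress(block)


print("lzo available:", lzo is not None)
```

With the block in place, a missing module only matters when an LZO-compressed file is actually read.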

3. Try importing lzo in the interpreter:
>>> import lzo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named lzo
>>> exit()
The lzo module is not installed.

4. Installing it with pip does not work either:

C:\Python27\Scripts>pip install lzo
Collecting lzo
c:\python27\lib\site-packages\pip-7.1.2-py2.7.egg\pip\_vendor\requests\packages\urllib3\util\ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', error(10054, ''))': /simple/lzo/
c:\python27\lib\site-packages\pip-7.1.2-py2.7.egg\pip\_vendor\requests\packages\urllib3\util\ssl_.py:90: InsecurePlatformWarning: A true SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Could not find a version that satisfies the requirement lzo (from versions: )
No matching distribution found for lzo
You are using pip version 7.1.2, however version 8.1.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Python27\Scripts>

5. According to Stack Overflow, python-lzo cannot be installed directly with pip on Windows: http://stackoverflow.com/questions/7517075/how-can-i-install-python-lzo-1-08
Following the answer, I downloaded python_lzo-1.11-cp27-none-win32.whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-lzo (because my Python version is 2.7).

Then I ran pip install python_lzo-1.11-cp27-none-win32.whl to install it:

C:\Python27\Scripts>pip install python_lzo-1.11-cp27-none-win32.whl
Processing c:\python27\scripts\python_lzo-1.11-cp27-none-win32.whl
Installing collected packages: python-lzo
Successfully installed python-lzo-1.11

A wheel built for a different Python version is rejected:

C:\Python27\Scripts>pip install python_lzo-1.11-cp35-none-win32.whl
python_lzo-1.11-cp35-none-win32.whl is not a supported wheel on this platform.
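
The cp27 and cp35 parts of the file names are the Python tags that pip compares with the running interpreter. A small sketch (Python 3 syntax) that splits a wheel file name into its parts following the PEP 427 naming scheme; it ignores compound tags such as py2.py3, and both function names are my own rather than pip APIs:

```python
import sys


def parse_wheel_name(filename):
    """Split name-version[-build]-python-abi-platform.whl into its parts."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel name: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1], "python": python_tag,
            "abi": abi_tag, "platform": platform_tag}


def current_cpython_tag():
    """Tag such as 'cp27' for the interpreter that runs this script."""
    return "cp%d%d" % (sys.version_info[0], sys.version_info[1])


wheel = parse_wheel_name("python_lzo-1.11-cp27-none-win32.whl")
print(wheel["python"], "vs interpreter", current_cpython_tag())
```

Under Python 2.7 the two tags match for the cp27 wheel and differ for the cp35 one, which corresponds to the "not a supported wheel" message.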

6. Extract the data from the mdx with Python

C:\Python27\readmdict>python readmdict.py
======== C:/Python27/readmdict/test.mdx ========
  Number of Entries : 34151
  Compact : No
  Compat : No
  GeneratedByEngineVersion : 1.2
  Description : <font color=blue size=5><b>銆婃煰鏋楁柉楂樼礆鑻辫獮瀛哥繏瑭炲吀绗?鐗堛€?/b></font>
<p>杞夋彌閬斾汉锛氬翱鐗?(-_-)鈥欌€?
<br>杞夋彌鏃ユ湡锛?9骞?鏈?0鏃?
<br>瑭炲吀瀛楁暩锛?4151
<p><b><font color=red><瑭炲吀鍏у></b></font >
<br>銆€銆€Collins Cobuild鏄瓧鍏歌垏鑻辫獮瀛哥繏鏇哥殑鐭ュ悕鍝佺墝锛屻€奀ollins Cobuild Advanced Learner&apos;s English Dictionary銆嬩
竴鐩存槸涓栫晫鍚勫湴璁€鑰呭績涓殑鏈€浣冲瓧鍏镐箣涓€锛岀敋鑷宠璀界偤銆岀従浠ｈ嫳瑾炴渶鍏ㄩ潰銆佹瑠濞佺殑鍤皫銆嶃€?澶氬湅鏆㈤姺鏇搞
€婂崈钀垾瀛歌嫳瑾炪€嬩腑涔熺壒鍒ユ帹钖︺€奀ollins Cobuild Advanced Learner&apos;s English Dictionary銆嬨€?
<p><font color=red>鏈经鍏歌綁鑷猯ingos锛屾湰杈吀渚涘缈掕嫳瑾炵殑鏈嬪弸浣跨敤锛?
<p><font color=grape>杞夋彌绱旂偤鑸堣叮锛?/font>
<p><font color=green>鏈鍏哥敱婢抽杸鏈嬪弸鍒朵綔锛?/font>


  RequiredEngineVersion : 1.2
  Format : Html
  Encrypted : No
  Encoding : UTF-16
  StyleSheet :
  Title : Title (No HTML code allowed)
  KeyCaseSensitive : No
  DataSourceFormat : 107

C:\Python27\readmdict>

7. test.txt now appears in the directory that contains the mdx file.
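
The attributes printed in step 6 come from the header at the start of the mdx file. Going by the _read_header method in the listing at the end of this post, the layout is: a 4-byte big-endian length, the header text in UTF-16 ending with two zero bytes, then a 4-byte little-endian Adler-32 checksum. A self-contained sketch (Python 3) of reading that layout; make_header_blob is a hypothetical helper that fabricates test input, and I decode with utf-16-le explicitly, which is my assumption about the byte order:

```python
import struct
import zlib
from io import BytesIO


def read_header(stream):
    """Read and verify the header block, returning the header text."""
    (size,) = struct.unpack(">I", stream.read(4))
    header_bytes = stream.read(size)
    (checksum,) = struct.unpack("<I", stream.read(4))
    if checksum != zlib.adler32(header_bytes) & 0xFFFFFFFF:
        raise ValueError("header checksum mismatch")
    return header_bytes[:-2].decode("utf-16-le")  # drop the trailing \x00\x00


def make_header_blob(text):
    """Hypothetical helper: build bytes laid out like the start of an mdx file."""
    body = text.encode("utf-16-le") + b"\x00\x00"
    checksum = zlib.adler32(body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + body + struct.pack("<I", checksum)


blob = make_header_blob('<Dictionary Encoding="UTF-16" Encrypted="No"/>')
print(read_header(BytesIO(blob)))
```

Passing an open mdx file (mode 'rb') instead of the BytesIO object should print the raw header that readmdict.py parses.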

Appendix:

readmdict.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# readmdict.py
# Octopus MDict Dictionary File (.mdx) and Resource File (.mdd) Analyser
#
# Copyright (C) 2012, 2013, 2015 Xiaoqiang Wang <xiaoqiangwang AT gmail DOT com>
#
# This program is a free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# You can get a copy of GNU General Public License along this program
# But you can always get it from http://www.gnu.org/licenses/gpl.txt
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

from struct import pack, unpack
from io import BytesIO
import re
import sys

from ripemd128 import ripemd128
from pureSalsa20 import Salsa20

# zlib compression is used for engine version >=2.0
import zlib
# LZO compression is used for engine version < 2.0

try:
    import lzo
except ImportError:
    lzo = None
    print("LZO compression support is not available")

# 2x3 compatible
if sys.hexversion >= 0x03000000:
    unicode = str


def _unescape_entities(text):
    """
    unescape offending tags < > " &
    """
    text = text.replace(b'&lt;', b'<')
    text = text.replace(b'&gt;', b'>')
    text = text.replace(b'&quot;', b'"')
    text = text.replace(b'&amp;', b'&')
    return text


def _fast_decrypt(data, key):
    b = bytearray(data)
    key = bytearray(key)
    previous = 0x36
    for i in range(len(b)):
        t = (b[i] >> 4 | b[i] << 4) & 0xff
        t = t ^ previous ^ (i & 0xff) ^ key[i % len(key)]
        previous = b[i]
        b[i] = t
    return bytes(b)


def _mdx_decrypt(comp_block):
    key = ripemd128(comp_block[4:8] + pack(b'<L', 0x3695))
    return comp_block[0:8] + _fast_decrypt(comp_block[8:], key)


def _salsa_decrypt(ciphertext, encrypt_key):
    s20 = Salsa20(key=encrypt_key, IV=b"\x00"*8, rounds=8)
    return s20.encryptBytes(ciphertext)


def _decrypt_regcode_by_deviceid(reg_code, deviceid):
    deviceid_digest = ripemd128(deviceid)
    s20 = Salsa20(key=deviceid_digest, IV=b"\x00"*8, rounds=8)
    encrypt_key = s20.encryptBytes(reg_code)
    return encrypt_key


def _decrypt_regcode_by_email(reg_code, email):
    email_digest = ripemd128(email.decode().encode('utf-16-le'))
    s20 = Salsa20(key=email_digest, IV=b"\x00"*8, rounds=8)
    encrypt_key = s20.encryptBytes(reg_code)
    return encrypt_key


class MDict(object):
    """
    Base class which reads in header and key block.
    It has no public methods and serves only as code sharing base class.
    """
    def __init__(self, fname, encoding='', passcode=None):
        self._fname = fname
        self._encoding = encoding.upper()
        self._passcode = passcode

        self.header = self._read_header()
        try:
            self._key_list = self._read_keys()
        except:
            print("Try Brutal Force on Encrypted Key Blocks")
            self._key_list = self._read_keys_brutal()

    def __len__(self):
        return self._num_entries

    def __iter__(self):
        return self.keys()

    def keys(self):
        """
        Return an iterator over dictionary keys.
        """
        return (key_value for key_id, key_value in self._key_list)

    def _read_number(self, f):
        return unpack(self._number_format, f.read(self._number_width))[0]

    def _parse_header(self, header):
        """
        extract attributes from <Dict attr="value" ... >
        """
        taglist = re.findall(b'(\w+)="(.*?)"', header, re.DOTALL)
        tagdict = {}
        for key, value in taglist:
            tagdict[key] = _unescape_entities(value)
        return tagdict

    def _decode_key_block_info(self, key_block_info_compressed):
        if self._version >= 2:
            # zlib compression
            assert(key_block_info_compressed[:4] == b'\x02\x00\x00\x00')
            # decrypt if needed
            if self._encrypt & 0x02:
                key_block_info_compressed = _mdx_decrypt(key_block_info_compressed)
            # decompress
            key_block_info = zlib.decompress(key_block_info_compressed[8:])
            # adler checksum
            adler32 = unpack('>I', key_block_info_compressed[4:8])[0]
            assert(adler32 == zlib.adler32(key_block_info) & 0xffffffff)
        else:
            # no compression
            key_block_info = key_block_info_compressed
        # decode
        key_block_info_list = []
        num_entries = 0
        i = 0
        if self._version >= 2:
            byte_format = '>H'
            byte_width = 2
            text_term = 1
        else:
            byte_format = '>B'
            byte_width = 1
            text_term = 0

        while i < len(key_block_info):
            # number of entries in current key block
            num_entries += unpack(self._number_format, key_block_info[i:i+self._number_width])[0]
            i += self._number_width
            # text head size
            text_head_size = unpack(byte_format, key_block_info[i:i+byte_width])[0]
            i += byte_width
            # text head
            if self._encoding != 'UTF-16':
                i += text_head_size + text_term
            else:
                i += (text_head_size + text_term) * 2
            # text tail size
            text_tail_size = unpack(byte_format, key_block_info[i:i+byte_width])[0]
            i += byte_width
            # text tail
            if self._encoding != 'UTF-16':
                i += text_tail_size + text_term
            else:
                i += (text_tail_size + text_term) * 2
            # key block compressed size
            key_block_compressed_size = unpack(self._number_format, key_block_info[i:i+self._number_width])[0]
            i += self._number_width
            # key block decompressed size
            key_block_decompressed_size = unpack(self._number_format, key_block_info[i:i+self._number_width])[0]
            i += self._number_width
            key_block_info_list += [(key_block_compressed_size, key_block_decompressed_size)]

        #assert(num_entries == self._num_entries)

        return key_block_info_list

    def _decode_key_block(self, key_block_compressed, key_block_info_list):
        key_list = []
        i = 0
        for compressed_size, decompressed_size in key_block_info_list:
            start = i
            end = i + compressed_size
            # 4 bytes : compression type
            key_block_type = key_block_compressed[start:start+4]
            # 4 bytes : adler checksum of decompressed key block
            adler32 = unpack('>I', key_block_compressed[start+4:start+8])[0]
            if key_block_type == b'\x00\x00\x00\x00':
                key_block = key_block_compressed[start+8:end]
            elif key_block_type == b'\x01\x00\x00\x00':
                if lzo is None:
                    print("LZO compression is not supported")
                    break
                # decompress key block
                header = b'\xf0' + pack('>I', decompressed_size)
                key_block = lzo.decompress(header + key_block_compressed[start+8:end])
            elif key_block_type == b'\x02\x00\x00\x00':
                # decompress key block
                key_block = zlib.decompress(key_block_compressed[start+8:end])
            # extract one single key block into a key list
            key_list += self._split_key_block(key_block)
            # notice that adler32 returns signed value
            assert(adler32 == zlib.adler32(key_block) & 0xffffffff)

            i += compressed_size
        return key_list

    def _split_key_block(self, key_block):
        key_list = []
        key_start_index = 0
        while key_start_index < len(key_block):
            # the corresponding record's offset in record block
            key_id = unpack(self._number_format, key_block[key_start_index:key_start_index+self._number_width])[0]
            # key text ends with '\x00'
            if self._encoding == 'UTF-16':
                delimiter = b'\x00\x00'
                width = 2
            else:
                delimiter = b'\x00'
                width = 1
            i = key_start_index + self._number_width
            while i < len(key_block):
                if key_block[i:i+width] == delimiter:
                    key_end_index = i
                    break
                i += width
            key_text = key_block[key_start_index+self._number_width:key_end_index]\
                .decode(self._encoding, errors='ignore').encode('utf-8').strip()
            key_start_index = key_end_index + width
            key_list += [(key_id, key_text)]
        return key_list

    def _read_header(self):
        f = open(self._fname, 'rb')
        # number of bytes of header text
        header_bytes_size = unpack('>I', f.read(4))[0]
        header_bytes = f.read(header_bytes_size)
        # 4 bytes: adler32 checksum of header, in little endian
        adler32 = unpack('<I', f.read(4))[0]
        assert(adler32 == zlib.adler32(header_bytes) & 0xffffffff)
        # mark down key block offset
        self._key_block_offset = f.tell()
        f.close()

        # header text in utf-16 encoding ending with '\x00\x00'
        header_text = header_bytes[:-2].decode('utf-16').encode('utf-8')
        header_tag = self._parse_header(header_text)
        if not self._encoding:
            encoding = header_tag[b'Encoding']
            if sys.hexversion >= 0x03000000:
                encoding = encoding.decode('utf-8')
            # GB18030 > GBK > GB2312
            if encoding in ['GBK', 'GB2312']:
                encoding = 'GB18030'
            self._encoding = encoding
        # encryption flag
        #   0x00 - no encryption
        #   0x01 - encrypt record block
        #   0x02 - encrypt key info block
        if header_tag[b'Encrypted'] == b'No':
            self._encrypt = 0
        elif header_tag[b'Encrypted'] == b'Yes':
            self._encrypt = 1
        else:
            self._encrypt = int(header_tag[b'Encrypted'])

        # stylesheet attribute if present takes form of:
        #   style_number # 1-255
        #   style_begin  # or ''
        #   style_end    # or ''
        # store stylesheet in dict in the form of
        # {'number' : ('style_begin', 'style_end')}
        self._stylesheet = {}
        if header_tag.get('StyleSheet'):
            lines = header_tag['StyleSheet'].splitlines()
            for i in range(0, len(lines), 3):
                self._stylesheet[lines[i]] = (lines[i+1], lines[i+2])

        # before version 2.0, number is 4 bytes integer
        # version 2.0 and above uses 8 bytes
        self._version = float(header_tag[b'GeneratedByEngineVersion'])
        if self._version < 2.0:
            self._number_width = 4
            self._number_format = '>I'
        else:
            self._number_width = 8
            self._number_format = '>Q'

        return header_tag

    def _read_keys(self):
        f = open(self._fname, 'rb')
        f.seek(self._key_block_offset)

        # the following numbers could be encrypted
        if self._version >= 2.0:
            num_bytes = 8 * 5
        else:
            num_bytes = 4 * 4
        block = f.read(num_bytes)

        if self._encrypt & 1:
            if self._passcode is None:
                raise RuntimeError('user identification is needed to read encrypted file')
            regcode, userid = self._passcode
            if isinstance(userid, unicode):
                userid = userid.encode('utf8')
            if self.header[b'RegisterBy'] == b'EMail':
                encrypted_key = _decrypt_regcode_by_email(regcode, userid)
            else:
                encrypted_key = _decrypt_regcode_by_deviceid(regcode, userid)
            block = _salsa_decrypt(block, encrypted_key)

        # decode this block
        sf = BytesIO(block)
        # number of key blocks
        num_key_blocks = self._read_number(sf)
        # number of entries
        self._num_entries = self._read_number(sf)
        # number of bytes of key block info after decompression
        if self._version >= 2.0:
            key_block_info_decomp_size = self._read_number(sf)
        # number of bytes of key block info
        key_block_info_size = self._read_number(sf)
        # number of bytes of key block
        key_block_size = self._read_number(sf)

        # 4 bytes: adler checksum of previous 5 numbers
        if self._version >= 2.0:
            adler32 = unpack('>I', f.read(4))[0]
            assert adler32 == (zlib.adler32(block) & 0xffffffff)

        # read key block info, which indicates key block's compressed and decompressed size
        key_block_info = f.read(key_block_info_size)
        key_block_info_list = self._decode_key_block_info(key_block_info)
        assert(num_key_blocks == len(key_block_info_list))

        # read key block
        key_block_compressed = f.read(key_block_size)
        # extract key block
        key_list = self._decode_key_block(key_block_compressed, key_block_info_list)

        self._record_block_offset = f.tell()
        f.close()

        return key_list

    def _read_keys_brutal(self):
        f = open(self._fname, 'rb')
        f.seek(self._key_block_offset)

        # the following numbers could be encrypted, disregard them!
        if self._version >= 2.0:
            num_bytes = 8 * 5 + 4
            key_block_type = b'\x02\x00\x00\x00'
        else:
            num_bytes = 4 * 4
            key_block_type = b'\x01\x00\x00\x00'
        block = f.read(num_bytes)

        # key block info
        # 4 bytes '\x02\x00\x00\x00'
        # 4 bytes adler32 checksum
        # unknown number of bytes follows until '\x02\x00\x00\x00' which marks the beginning of key block
        key_block_info = f.read(8)
        if self._version >= 2.0:
            assert key_block_info[:4] == b'\x02\x00\x00\x00'
        while True:
            fpos = f.tell()
            t = f.read(1024)
            index = t.find(key_block_type)
            if index != -1:
                key_block_info += t[:index]
                f.seek(fpos + index)
                break
            else:
                key_block_info += t

        key_block_info_list = self._decode_key_block_info(key_block_info)
        key_block_size = sum(list(zip(*key_block_info_list))[0])

        # read key block
        key_block_compressed = f.read(key_block_size)
        # extract key block
        key_list = self._decode_key_block(key_block_compressed, key_block_info_list)

        self._record_block_offset = f.tell()
        f.close()

        self._num_entries = len(key_list)
        return key_list


class MDD(MDict):
    """
    MDict resource file format (*.MDD) reader.
    >>> mdd = MDD('example.mdd')
    >>> len(mdd)
    208
    >>> for filename,content in mdd.items():
    ...     print filename, content[:10]
    """
    def __init__(self, fname, passcode=None):
        MDict.__init__(self, fname, encoding='UTF-16', passcode=passcode)

    def items(self):
        """Return a generator which in turn produces tuples in the form of (filename, content)
        """
        return self._decode_record_block()

    def _decode_record_block(self):
        f = open(self._fname, 'rb')
        f.seek(self._record_block_offset)

        num_record_blocks = self._read_number(f)
        num_entries = self._read_number(f)
        assert(num_entries == self._num_entries)
        record_block_info_size = self._read_number(f)
        record_block_size = self._read_number(f)

        # record block info section
        record_block_info_list = []
        size_counter = 0
        for i in range(num_record_blocks):
            compressed_size = self._read_number(f)
            decompressed_size = self._read_number(f)
            record_block_info_list += [(compressed_size, decompressed_size)]
            size_counter += self._number_width * 2
        assert(size_counter == record_block_info_size)

        # actual record block
        offset = 0
        i = 0
        size_counter = 0
        for compressed_size, decompressed_size in record_block_info_list:
            record_block_compressed = f.read(compressed_size)
            # 4 bytes: compression type
            record_block_type = record_block_compressed[:4]
            # 4 bytes: adler32 checksum of decompressed record block
            adler32 = unpack('>I', record_block_compressed[4:8])[0]
            if record_block_type == b'\x00\x00\x00\x00':
                record_block = record_block_compressed[8:]
            elif record_block_type == b'\x01\x00\x00\x00':
                if lzo is None:
                    print("LZO compression is not supported")
                    break
                # decompress
                header = b'\xf0' + pack('>I', decompressed_size)
                record_block = lzo.decompress(header + record_block_compressed[8:])
            elif record_block_type == b'\x02\x00\x00\x00':
                # decompress
                record_block = zlib.decompress(record_block_compressed[8:])

            # notice that adler32 return signed value
            assert(adler32 == zlib.adler32(record_block) & 0xffffffff)

            assert(len(record_block) == decompressed_size)
            # split record block according to the offset info from key block
            while i < len(self._key_list):
                record_start, key_text = self._key_list[i]
                # reach the end of current record block
                if record_start - offset >= len(record_block):
                    break
                # record end index
                if i < len(self._key_list)-1:
                    record_end = self._key_list[i+1][0]
                else:
                    record_end = len(record_block) + offset
                i += 1
                data = record_block[record_start-offset:record_end-offset]
                yield key_text, data
            offset += len(record_block)
            size_counter += compressed_size
        assert(size_counter == record_block_size)

        f.close()


class MDX(MDict):
    """
    MDict dictionary file format (*.MDX) reader.
    >>> mdx = MDX('example.mdx')
    >>> len(mdx)
    42481
    >>> for key,value in mdx.items():
    ...     print(key, value[:10])
    """
    def __init__(self, fname, encoding='', substyle=False, passcode=None):
        MDict.__init__(self, fname, encoding, passcode)
        self._substyle = substyle

    def items(self):
        """Return a generator which in turn produces tuples in the form of (key, value)
        """
        return self._decode_record_block()

    def _substitute_stylesheet(self, txt):
        # substitute stylesheet definition
        txt_list = re.split(r'`\d+`', txt)
        txt_tag = re.findall(r'`\d+`', txt)
        txt_styled = txt_list[0]
        for j, p in enumerate(txt_list[1:]):
            style = self._stylesheet[txt_tag[j][1:-1]]
            if p and p[-1] == '\n':
                txt_styled = txt_styled + style[0] + p.rstrip() + style[1] + '\r\n'
            else:
                txt_styled = txt_styled + style[0] + p + style[1]
        return txt_styled

    def _decode_record_block(self):
        f = open(self._fname, 'rb')
        f.seek(self._record_block_offset)

        num_record_blocks = self._read_number(f)
        num_entries = self._read_number(f)
        assert(num_entries == self._num_entries)
        record_block_info_size = self._read_number(f)
        record_block_size = self._read_number(f)

        # record block info section
        record_block_info_list = []
        size_counter = 0
        for i in range(num_record_blocks):
            compressed_size = self._read_number(f)
            decompressed_size = self._read_number(f)
            record_block_info_list += [(compressed_size, decompressed_size)]
            size_counter += self._number_width * 2
        assert(size_counter == record_block_info_size)

        # actual record block data
        offset = 0
        i = 0
        size_counter = 0
        for compressed_size, decompressed_size in record_block_info_list:
            record_block_compressed = f.read(compressed_size)
            # 4 bytes indicates block compression type
            record_block_type = record_block_compressed[:4]
            # 4 bytes adler checksum of uncompressed content
            adler32 = unpack('>I', record_block_compressed[4:8])[0]
            # no compression
            if record_block_type == b'\x00\x00\x00\x00':
                record_block = record_block_compressed[8:]
            # lzo compression
            elif record_block_type == b'\x01\x00\x00\x00':
                if lzo is None:
                    print("LZO compression is not supported")
                    break
                # decompress
                header = b'\xf0' + pack('>I', decompressed_size)
                record_block = lzo.decompress(header + record_block_compressed[8:])
            # zlib compression
            elif record_block_type == b'\x02\x00\x00\x00':
                # decompress
                record_block = zlib.decompress(record_block_compressed[8:])

            # notice that adler32 return signed value
            assert(adler32 == zlib.adler32(record_block) & 0xffffffff)

            assert(len(record_block) == decompressed_size)
            # split record block according to the offset info from key block
            while i < len(self._key_list):
                record_start, key_text = self._key_list[i]
                # reach the end of current record block
                if record_start - offset >= len(record_block):
                    break
                # record end index
                if i < len(self._key_list)-1:
                    record_end = self._key_list[i+1][0]
                else:
                    record_end = len(record_block) + offset
                i += 1
                record = record_block[record_start-offset:record_end-offset]
                # convert to utf-8
                record = record.decode(self._encoding, errors='ignore').strip(u'\x00').encode('utf-8')
                # substitute styles
                if self._substyle and self._stylesheet:
                    record = self._substitute_stylesheet(record)

                yield key_text, record
            offset += len(record_block)
            size_counter += compressed_size
        assert(size_counter == record_block_size)

        f.close()


if __name__ == '__main__':
    import sys
    import os
    import os.path
    import argparse
    import codecs

    def passcode(s):
        try:
            regcode, userid = s.split(',')
        except:
            raise argparse.ArgumentTypeError("Passcode must be regcode,userid")
        try:
            regcode = codecs.decode(regcode, 'hex')
        except:
            raise argparse.ArgumentTypeError("regcode must be a 32 bytes hexadecimal string")
        return regcode, userid

    parser = argparse.ArgumentParser()
    parser.add_argument('-x', '--extract', action="store_true",
                        help='extract mdx to source format and extract files from mdd')
    parser.add_argument('-s', '--substyle', action="store_true",
                        help='substitute style definition if present')
    parser.add_argument('-d', '--datafolder', default="data",
                        help='folder to extract data files from mdd')
    parser.add_argument('-e', '--encoding', default="",
                        help='text encoding of the mdx file')
    parser.add_argument('-p', '--passcode', default=None, type=passcode,
                        help='register_code,email_or_deviceid')
    parser.add_argument("filename", nargs='?', help="mdx file name")
    args = parser.parse_args()

    # use GUI to select file, default to extract
    if not args.filename:
        import Tkinter
        import tkFileDialog
        root = Tkinter.Tk()
        root.withdraw()
        args.filename = tkFileDialog.askopenfilename(parent=root)
        args.extract = True

    if not os.path.exists(args.filename):
        print("Please specify a valid MDX/MDD file")
        # stop here instead of failing later on a missing file
        sys.exit(1)

    base, ext = os.path.splitext(args.filename)

    # read mdx file
    if ext.lower() == os.path.extsep + 'mdx':
        mdx = MDX(args.filename, args.encoding, args.substyle, args.passcode)
        if type(args.filename) is unicode:
            bfname = args.filename.encode('utf-8')
        else:
            bfname = args.filename
        print('======== %s ========' % bfname)
        print('  Number of Entries : %d' % len(mdx))
        for key, value in mdx.header.items():
            print('  %s : %s' % (key, value))
    else:
        mdx = None

    # find companion mdd file
    mdd_filename = ''.join([base, os.path.extsep, 'mdd'])
    if os.path.exists(mdd_filename):
        mdd = MDD(mdd_filename, args.passcode)
        if type(mdd_filename) is unicode:
            bfname = mdd_filename.encode('utf-8')
        else:
            bfname = mdd_filename
        print('======== %s ========' % bfname)
        print('  Number of Entries : %d' % len(mdd))
        for key, value in mdd.header.items():
            print('  %s : %s' % (key, value))
    else:
        mdd = None

    if args.extract:
        # write out glos
        if mdx:
            output_fname = ''.join([base, os.path.extsep, 'txt'])
            tf = open(output_fname, 'wb')
            for key, value in mdx.items():
                tf.write(key)
                tf.write(b'\r\n')
                tf.write(value)
                if not value.endswith(b'\n'):
                    tf.write(b'\r\n')
                tf.write(b'</>\r\n')
            tf.close()
            # write out style
            if mdx.header.get('StyleSheet'):
                style_fname = ''.join([base, '_style', os.path.extsep, 'txt'])
                sf = open(style_fname, 'wb')
                sf.write(b'\r\n'.join(mdx.header['StyleSheet'].splitlines()))
                sf.close()
        # write out optional data files
        if mdd:
            datafolder = os.path.join(os.path.dirname(args.filename), args.datafolder)
            if not os.path.exists(datafolder):
                os.makedirs(datafolder)
            for key, value in mdd.items():
                fname = key.decode('utf-8').replace('\\', os.path.sep)
                dfname = datafolder + fname
                if not os.path.exists(os.path.dirname(dfname)):
                    os.makedirs(os.path.dirname(dfname))
                df = open(dfname, 'wb')
                df.write(value)
                df.close()
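
Both `_decode_record_block` methods above read each record block the same way: 4 bytes of compression type, 4 bytes of big-endian Adler-32 over the decompressed data, then the payload. The sketch below isolates just that framing so it can be tried without a real MDX/MDD file. `encode_block` and `decode_block` are hypothetical helper names, not part of readmdict, and the LZO case (type `\x01`) is left out on purpose.

```python
import struct
import zlib


def encode_block(payload, compress=True):
    """Build a block: 4-byte type, 4-byte big-endian Adler-32, then data."""
    checksum = zlib.adler32(payload) & 0xffffffff
    if compress:
        return b'\x02\x00\x00\x00' + struct.pack('>I', checksum) + zlib.compress(payload)
    return b'\x00\x00\x00\x00' + struct.pack('>I', checksum) + payload


def decode_block(block, decompressed_size):
    """Reverse of encode_block; raises ValueError on any mismatch."""
    block_type = block[:4]
    expected = struct.unpack('>I', block[4:8])[0]
    if block_type == b'\x00\x00\x00\x00':
        payload = block[8:]                     # stored without compression
    elif block_type == b'\x02\x00\x00\x00':
        payload = zlib.decompress(block[8:])    # zlib compression
    else:
        raise ValueError('unsupported block type: %r' % (block_type,))
    # mask so the comparison also holds where adler32 returns a signed value
    if zlib.adler32(payload) & 0xffffffff != expected:
        raise ValueError('Adler-32 checksum mismatch')
    if len(payload) != decompressed_size:
        raise ValueError('unexpected decompressed size')
    return payload


sample = b'hello mdict' * 10
for use_zlib in (False, True):
    assert decode_block(encode_block(sample, use_zlib), len(sample)) == sample
```

Raising `ValueError` rather than using `assert` is a choice made for this sketch only; the listing above uses assertions.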

pureSalsa20.py

#!/usr/bin/env python
# coding: utf-8

"""
    pureSalsa20.py -- a pure Python implementation of the Salsa20 cipher, ported to Python 3

    v4.0: Added Python 3 support, dropped support for Python <= 2.5.
    
    // zhansliu

    Original comments below.

    ====================================================================
    There are comments here by two authors about three pieces of software:
        comments by Larry Bugbee about
            Salsa20, the stream cipher by Daniel J. Bernstein 
                 (including comments about the speed of the C version) and
            pySalsa20, Bugbee's own Python wrapper for salsa20.c
                 (including some references), and
        comments by Steve Witham about
            pureSalsa20, Witham's pure Python 2.5 implementation of Salsa20,
                which follows pySalsa20's API, and is in this file.

    Salsa20: a Fast Streaming Cipher (comments by Larry Bugbee)
    -----------------------------------------------------------

    Salsa20 is a fast stream cipher written by Daniel Bernstein 
    that basically uses a hash function and XOR making for fast 
    encryption.  (Decryption uses the same function.)  Salsa20 
    is simple and quick.  
    
    Some Salsa20 parameter values...
        design strength    128 bits
        key length         128 or 256 bits, exactly
        IV, aka nonce      64 bits, always
        chunk size         must be in multiples of 64 bytes
    
    Salsa20 has two reduced versions, 8 and 12 rounds each.
    
    One benchmark (10 MB):
        1.5GHz PPC G4     102/97/89 MB/sec for 8/12/20 rounds
        AMD Athlon 2500+   77/67/53 MB/sec for 8/12/20 rounds
          (no I/O and before Python GC kicks in)
    
    Salsa20 is a Phase 3 finalist in the EU eSTREAM competition 
    and appears to be one of the fastest ciphers.  It is well 
    documented so I will not attempt any injustice here.  Please 
    see "References" below.
    
    ...and Salsa20 is "free for any use".  
    
    
    pySalsa20: a Python wrapper for Salsa20 (Comments by Larry Bugbee)
    ------------------------------------------------------------------

    pySalsa20.py is a simple ctypes Python wrapper.  Salsa20 is 
    as its name implies, 20 rounds, but there are two reduced 
    versions, 8 and 12 rounds each.  Because the APIs are 
    identical, pySalsa20 is capable of wrapping all three 
    versions (number of rounds hardcoded), including a special 
    version that allows you to set the number of rounds with a 
    set_rounds() function.  Compile the version of your choice 
    as a shared library (not as a Python extension), name and 
    install it as libsalsa20.so.
    
    Sample usage:
        from pySalsa20 import Salsa20
        s20 = Salsa20(key, IV)
        dataout = s20.encryptBytes(datain)   # same for decrypt
    
    This is EXPERIMENTAL software and intended for educational 
    purposes only.  To make experimentation less cumbersome, 
    pySalsa20 is also free for any use.      
    
    THIS PROGRAM IS PROVIDED WITHOUT WARRANTY OR GUARANTEE OF
    ANY KIND.  USE AT YOUR OWN RISK.  
    
    Enjoy,
      
    Larry Bugbee
    bugbee@seanet.com
    April 2007

    
    References:
    -----------
      http://en.wikipedia.org/wiki/Salsa20
      http://en.wikipedia.org/wiki/Daniel_Bernstein
      http://cr.yp.to/djb.html
      http://www.ecrypt.eu.org/stream/salsa20p3.html
      http://www.ecrypt.eu.org/stream/p3ciphers/salsa20/salsa20_p3source.zip

     
    Prerequisites for pySalsa20:
    ----------------------------
      - Python 2.5 (haven't tested in 2.4)


    pureSalsa20: Salsa20 in pure Python 2.5 (comments by Steve Witham)
    ------------------------------------------------------------------

    pureSalsa20 is the stand-alone Python code in this file.
    It implements the underlying Salsa20 core algorithm
    and emulates pySalsa20's Salsa20 class API (minus a bug(*)).

    pureSalsa20 is MUCH slower than libsalsa20.so wrapped with pySalsa20--
    about 1/1000 the speed for Salsa20/20 and 1/500 the speed for Salsa20/8,
    when encrypting 64k-byte blocks on my computer.

    pureSalsa20 is for cases where portability is much more important than
    speed.  I wrote it for use in a "structured" random number generator.

    There are comments about the reasons for this slowness in
          http://www.tiac.net/~sw/2010/02/PureSalsa20

    Sample usage:
        from pureSalsa20 import Salsa20
        s20 = Salsa20(key, IV)
        dataout = s20.encryptBytes(datain)   # same for decrypt

    I took the test code from pySalsa20, added a bunch of tests including
    rough speed tests, and moved them into the file testSalsa20.py.  
    To test both pySalsa20 and pureSalsa20, type
        python testSalsa20.py

    (*)The bug (?) in pySalsa20 is this.  The rounds variable is global to the
    libsalsa20.so library and not switched when switching between instances
    of the Salsa20 class.
        s1 = Salsa20( key, IV, 20 )
        s2 = Salsa20( key, IV, 8 )
    In this example,
        with pySalsa20, both s1 and s2 will do 8 rounds of encryption.
        with pureSalsa20, s1 will do 20 rounds and s2 will do 8 rounds.
    Perhaps giving each instance its own nRounds variable, which
    is passed to the salsa20wordtobyte() function, is insecure.  I'm not a 
    cryptographer.

    pureSalsa20.py and testSalsa20.py are EXPERIMENTAL software and 
    intended for educational purposes only.  To make experimentation less 
    cumbersome, pureSalsa20.py and testSalsa20.py are free for any use.

    Revisions:
    ----------
      p3.2   Fixed bug that initialized the output buffer with plaintext!
             Saner ramping of nreps in speed test.
             Minor changes and print statements.
      p3.1   Took timing variability out of add32() and rot32().
             Made the internals more like pySalsa20/libsalsa .
             Put the semicolons back in the main loop!
             In encryptBytes(), modify a byte array instead of appending.
             Fixed speed calculation bug.
             Used subclasses instead of patches in testSalsa20.py .
             Added 64k-byte messages to speed test to be fair to pySalsa20.
      p3     First version, intended to parallel pySalsa20 version 3.

    More references:
    ----------------
      http://www.seanet.com/~bugbee/crypto/salsa20/          [pySalsa20]
      http://cr.yp.to/snuffle.html        [The original name of Salsa20]
      http://cr.yp.to/snuffle/salsafamily-20071225.pdf [ Salsa20 design]
      http://www.tiac.net/~sw/2010/02/PureSalsa20
    
    THIS PROGRAM IS PROVIDED WITHOUT WARRANTY OR GUARANTEE OF
    ANY KIND.  USE AT YOUR OWN RISK.  

    Cheers,

    Steve Witham sw at remove-this tiac dot net
    February, 2010
"""
import sys
assert(sys.version_info >= (2, 6))

if sys.version_info >= (3,):
	integer_types = (int,)
	python3 = True
else:
	integer_types = (int, long)
	python3 = False

from struct import Struct
little_u64 = Struct( "<Q" )      #    little-endian 64-bit unsigned.
                                 #    Unpacks to a tuple of one element!

little16_i32 = Struct( "<16i" )  # 16 little-endian 32-bit signed ints.
little4_i32 = Struct( "<4i" )    #  4 little-endian 32-bit signed ints.
little2_i32 = Struct( "<2i" )    #  2 little-endian 32-bit signed ints.

_version = 'p4.0'

#----------- Salsa20 class which emulates pySalsa20.Salsa20 ---------------

class Salsa20(object):
    def __init__(self, key=None, IV=None, rounds=20 ):
        self._lastChunk64 = True
        self._IVbitlen = 64             # must be 64 bits
        self.ctx = [ 0 ] * 16
        if key:
            self.setKey(key)
        if IV:
            self.setIV(IV)

        self.setRounds(rounds)


    def setKey(self, key):
        assert type(key) == bytes
        ctx = self.ctx
        if len( key ) == 32:  # recommended
            constants = b"expand 32-byte k"
            ctx[ 1],ctx[ 2],ctx[ 3],ctx[ 4] = little4_i32.unpack(key[0:16])
            ctx[11],ctx[12],ctx[13],ctx[14] = little4_i32.unpack(key[16:32])
        elif len( key ) == 16:
            constants = b"expand 16-byte k"
            ctx[ 1],ctx[ 2],ctx[ 3],ctx[ 4] = little4_i32.unpack(key[0:16])
            ctx[11],ctx[12],ctx[13],ctx[14] = little4_i32.unpack(key[0:16])
        else:
            raise Exception( "key length isn't 32 or 16 bytes." )
        ctx[0],ctx[5],ctx[10],ctx[15] = little4_i32.unpack( constants )

        
    def setIV(self, IV):
        assert type(IV) == bytes
        assert len(IV)*8 == 64, 'nonce (IV) not 64 bits'
        self.IV = IV
        ctx=self.ctx
        ctx[ 6],ctx[ 7] = little2_i32.unpack( IV )
        ctx[ 8],ctx[ 9] = 0, 0  # Reset the block counter.

    setNonce = setIV            # support an alternate name


    def setCounter( self, counter ):
        assert( type(counter) in integer_types )
        assert( 0 <= counter < 1<<64 ), "counter < 0 or >= 2**64"
        ctx = self.ctx
        ctx[ 8],ctx[ 9] = little2_i32.unpack( little_u64.pack( counter ) )

    def getCounter( self ):
        return little_u64.unpack( little2_i32.pack( *self.ctx[ 8:10 ] ) ) [0]


    def setRounds(self, rounds, testing=False ):
        assert testing or rounds in [8, 12, 20], 'rounds must be 8, 12, 20'
        self.rounds = rounds


    def encryptBytes(self, data):
        assert type(data) == bytes, 'data must be byte string'
        assert self._lastChunk64, 'previous chunk not multiple of 64 bytes'
        lendata = len(data)
        munged = bytearray(lendata)
        for i in range( 0, lendata, 64 ):
            h = salsa20_wordtobyte( self.ctx, self.rounds, checkRounds=False )
            self.setCounter( ( self.getCounter() + 1 ) % 2**64 )
            # Stopping at 2^70 bytes per nonce is user's responsibility.
            for j in range( min( 64, lendata - i ) ):
                if python3:
                    munged[ i+j ] = data[ i+j ] ^ h[j]
                else:
                    munged[ i+j ] = ord(data[ i+j ]) ^ ord(h[j])

        self._lastChunk64 = not lendata % 64
        return bytes(munged)
    
    decryptBytes = encryptBytes # encrypt and decrypt use same function

#--------------------------------------------------------------------------

def salsa20_wordtobyte( input, nRounds=20, checkRounds=True ):
    """ Do nRounds Salsa20 rounds on a copy of 
            input: list or tuple of 16 ints treated as little-endian unsigneds.
        Returns a 64-byte string.
        """

    assert( type(input) in ( list, tuple )  and  len(input) == 16 )
    assert( not(checkRounds) or ( nRounds in [ 8, 12, 20 ] ) )

    x = list( input )

    def XOR( a, b ):  return a ^ b
    ROTATE = rot32
    PLUS   = add32

    for i in range( nRounds // 2 ):
        # These ...XOR...ROTATE...PLUS... lines are from ecrypt-linux.c
        # unchanged except for indents and the blank line between rounds:
        x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 0],x[12]), 7));
        x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[ 4],x[ 0]), 9));
        x[12] = XOR(x[12],ROTATE(PLUS(x[ 8],x[ 4]),13));
        x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[12],x[ 8]),18));
        x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 5],x[ 1]), 7));
        x[13] = XOR(x[13],ROTATE(PLUS(x[ 9],x[ 5]), 9));
        x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[13],x[ 9]),13));
        x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 1],x[13]),18));
        x[14] = XOR(x[14],ROTATE(PLUS(x[10],x[ 6]), 7));
        x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[14],x[10]), 9));
        x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 2],x[14]),13));
        x[10] = XOR(x[10],ROTATE(PLUS(x[ 6],x[ 2]),18));
        x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[15],x[11]), 7));
        x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 3],x[15]), 9));
        x[11] = XOR(x[11],ROTATE(PLUS(x[ 7],x[ 3]),13));
        x[15] = XOR(x[15],ROTATE(PLUS(x[11],x[ 7]),18));

        x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[ 0],x[ 3]), 7));
        x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[ 1],x[ 0]), 9));
        x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[ 2],x[ 1]),13));
        x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[ 3],x[ 2]),18));
        x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 5],x[ 4]), 7));
        x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 6],x[ 5]), 9));
        x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 7],x[ 6]),13));
        x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 4],x[ 7]),18));
        x[11] = XOR(x[11],ROTATE(PLUS(x[10],x[ 9]), 7));
        x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[11],x[10]), 9));
        x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 8],x[11]),13));
        x[10] = XOR(x[10],ROTATE(PLUS(x[ 9],x[ 8]),18));
        x[12] = XOR(x[12],ROTATE(PLUS(x[15],x[14]), 7));
        x[13] = XOR(x[13],ROTATE(PLUS(x[12],x[15]), 9));
        x[14] = XOR(x[14],ROTATE(PLUS(x[13],x[12]),13));
        x[15] = XOR(x[15],ROTATE(PLUS(x[14],x[13]),18));

    for i in range( len( input ) ):
        x[i] = PLUS( x[i], input[i] )
    return little16_i32.pack( *x )

#--------------------------- 32-bit ops -------------------------------

def trunc32( w ):
    """ Return the bottom 32 bits of w as a Python int.
        This creates longs temporarily, but returns an int. """
    w = int( ( w & 0x7fffFFFF ) | -( w & 0x80000000 ) )
    assert type(w) == int
    return w


def add32( a, b ):
    """ Add two 32-bit words discarding carry above 32nd bit,
        and without creating a Python long.
        Timing shouldn't vary.
    """
    lo = ( a & 0xFFFF ) + ( b & 0xFFFF )
    hi = ( a >> 16 ) + ( b >> 16 ) + ( lo >> 16 )
    return ( -(hi & 0x8000) | ( hi & 0x7FFF ) ) << 16 | ( lo & 0xFFFF )


def rot32( w, nLeft ):
    """ Rotate 32-bit word left by nLeft or right by -nLeft
        without creating a Python long.
        Timing depends on nLeft but not on w.
    """
    nLeft &= 31  # which makes nLeft >= 0
    if nLeft == 0:
        return w

    # Note: now 1 <= nLeft <= 31.
    #     RRRsLLLLLL   There are nLeft RRR's, (31-nLeft) LLLLLL's,
    # =>  sLLLLLLRRR   and one s which becomes the sign bit.
    RRR = ( ( ( w >> 1 ) & 0x7fffFFFF ) >> ( 31 - nLeft ) )
    sLLLLLL = -( (1<<(31-nLeft)) & w ) | (0x7fffFFFF>>nLeft) & w
    return RRR | ( sLLLLLL << nLeft )


# --------------------------------- end -----------------------------------
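
The XOR/ROTATE/PLUS lines in `salsa20_wordtobyte` come in groups of four, and each group is one Salsa20 quarter-round on four state words (the first group works on `x[0]`, `x[4]`, `x[8]`, `x[12]`). Here is a minimal sketch of a single quarter-round. It uses unsigned masking instead of the signed `add32`/`rot32` helpers from the listing, so `rotl32`, `quarterround` and `to_signed32` are names made up for this illustration. The check values at the end follow from working the four steps by hand.

```python
MASK32 = 0xffffffff


def rotl32(w, n):
    """Rotate the unsigned 32-bit word w left by n bits (0 < n < 32)."""
    return ((w << n) | (w >> (32 - n))) & MASK32


def quarterround(y0, y1, y2, y3):
    """One quarter-round; same order of operations as the first four lines of the loop."""
    y1 ^= rotl32((y0 + y3) & MASK32, 7)
    y2 ^= rotl32((y1 + y0) & MASK32, 9)
    y3 ^= rotl32((y2 + y1) & MASK32, 13)
    y0 ^= rotl32((y3 + y2) & MASK32, 18)
    return y0, y1, y2, y3


def to_signed32(w):
    """Reinterpret an unsigned word as the signed int that struct's 'i' format packs."""
    return w - (1 << 32) if w & 0x80000000 else w


assert quarterround(0, 0, 0, 0) == (0, 0, 0, 0)
assert quarterround(1, 0, 0, 0) == (0x08008145, 0x00000080, 0x00010200, 0x20500000)
```

The listing keeps every word as a signed 32-bit value because it packs the state with `"<16i"`; `to_signed32` shows the conversion that would be needed to feed unsigned results into that format.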

ripemd128.py

""" 
ripemd128.py - A simple ripemd128 library in pure Python.

Supports both Python 2 (versions >= 2.6) and Python 3.

Usage:
    from ripemd128 import ripemd128
    digest = ripemd128(b"The quick brown fox jumps over the lazy dog")
    assert(digest == b"\x3f\xa9\xb5\x7f\x05\x3c\x05\x3f\xbe\x27\x35\xb2\x38\x0d\xb5\x96")
"""
      


import struct


# follows this description: http://homes.esat.kuleuven.be/~bosselae/ripemd/rmd128.txt

def f(j, x, y, z):
	assert(0 <= j and j < 64)
	if j < 16:
		return x ^ y ^ z
	elif j < 32:
		return (x & y) | (z & ~x)
	elif j < 48:
		return (x | (0xffffffff & ~y)) ^ z
	else:
		return (x & z) | (y & ~z)

def K(j):
	assert(0 <= j and j < 64)
	if j < 16:
		return 0x00000000
	elif j < 32:
		return 0x5a827999
	elif j < 48:
		return 0x6ed9eba1
	else:
		return 0x8f1bbcdc

def Kp(j):
	assert(0 <= j and j < 64)
	if j < 16:
		return 0x50a28be6
	elif j < 32:
		return 0x5c4dd124
	elif j < 48:
		return 0x6d703ef3
	else:
		return 0x00000000

def padandsplit(message):
	"""
	returns a two-dimensional array X[i][j] of 32-bit integers, where j ranges
	from 0 to 15.
	First pads the message so that its length in bytes is congruent to 56
	(mod 64): a byte 0x80 is appended, followed by 0x00 bytes until that
	length is reached. Then appends the little-endian 64-bit representation
	of the original length in bits. Finally, splits the result
	up into 64-byte blocks, which are further parsed as 32-bit integers.
	"""
	origlen = len(message)
	padlength = 64 - ((origlen - 56) % 64) #minimum padding is 1!
	message += b"\x80"
	message += b"\x00" * (padlength - 1)
	message += struct.pack("<Q", origlen*8)
	assert(len(message) % 64 == 0)
	return [
	         [
	           struct.unpack("<L", message[i+j:i+j+4])[0]
	           for j in range(0, 64, 4)
	         ]
	         for i in range(0, len(message), 64)
	       ]


def add(*args):
	return sum(args) & 0xffffffff

def rol(s,x):
	assert(s < 32)
	return (x << s | x >> (32-s)) & 0xffffffff

r =  [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,
       7, 4,13, 1,10, 6,15, 3,12, 0, 9, 5, 2,14,11, 8,
       3,10,14, 4, 9,15, 8, 1, 2, 7, 0, 6,13,11, 5,12,
       1, 9,11,10, 0, 8,12, 4,13, 3, 7,15,14, 5, 6, 2]
rp = [ 5,14, 7, 0, 9, 2,11, 4,13, 6,15, 8, 1,10, 3,12,
       6,11, 3, 7, 0,13, 5,10,14,15, 8,12, 4, 9, 1, 2,
      15, 5, 1, 3, 7,14, 6, 9,11, 8,12, 2,10, 0, 4,13,
       8, 6, 4, 1, 3,11,15, 0, 5,12, 2,13, 9, 7,10,14]
s =  [11,14,15,12, 5, 8, 7, 9,11,13,14,15, 6, 7, 9, 8,
       7, 6, 8,13,11, 9, 7,15, 7,12,15, 9,11, 7,13,12,
      11,13, 6, 7,14, 9,13,15,14, 8,13, 6, 5,12, 7, 5,
      11,12,14,15,14,15, 9, 8, 9,14, 5, 6, 8, 6, 5,12]
sp = [ 8, 9, 9,11,13,15,15, 5, 7, 7, 8,11,14,14,12, 6,
       9,13,15, 7,12, 8, 9,11, 7, 7,12, 7, 6,15,13,11,
       9, 7,15,11, 8, 6, 6,14,12,13, 5,14,13,13, 7, 5,
      15, 5, 8,11,14,14, 6,14, 6, 9,12, 9,12, 5,15, 8]


def ripemd128(message):
	h0 = 0x67452301
	h1 = 0xefcdab89
	h2 = 0x98badcfe
	h3 = 0x10325476
	X = padandsplit(message)
	for i in range(len(X)):
		(A,B,C,D) = (h0,h1,h2,h3)
		(Ap,Bp,Cp,Dp) = (h0,h1,h2,h3)
		for j in range(64):
			T = rol(s[j], add(A, f(j,B,C,D), X[i][r[j]], K(j)))
			(A,D,C,B) = (D,C,B,T)
			T = rol(sp[j], add(Ap, f(63-j,Bp,Cp,Dp), X[i][rp[j]], Kp(j)))
			(Ap,Dp,Cp,Bp)=(Dp,Cp,Bp,T)
		T = add(h1,C,Dp)
		h1 = add(h2,D,Ap)
		h2 = add(h3,A,Bp)
		h3 = add(h0,B,Cp)
		h0 = T
	
	
	return struct.pack("<LLLL",h0,h1,h2,h3)

def hexstr(bstr):
	return "".join("{0:02x}".format(b) for b in bstr)
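
The padding arithmetic in `padandsplit` is easy to get wrong by one, so here is a small sketch that reproduces only the padding step and prints the padded length for a few message sizes. `pad_message` is a hypothetical name for this illustration; the formula is the one from the listing.

```python
import struct


def pad_message(message):
    """Append 0x80, zero bytes, then the original length in bits as a 64-bit little-endian int."""
    pad_len = 64 - ((len(message) - 56) % 64)  # always between 1 and 64
    return (message + b"\x80" + b"\x00" * (pad_len - 1)
            + struct.pack("<Q", len(message) * 8))


for n in (0, 55, 56, 64):
    print(n, len(pad_message(b"a" * n)))
# 0 64
# 55 64
# 56 128
# 64 128
```

A 55-byte message is the longest that still fits in one block, because the 0x80 byte and the 8-byte length field need 9 bytes between them.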

Learning Python as a Beginner: Computing the 2015 Year-End Bonus in Python

LC 2016-05-10 Python

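How individual income tax on a year-end bonus is calculated
Everyone is happy to receive a year-end bonus, but like a regular salary it is taxed. Many people do not know how the individual income tax on a year-end bonus is worked out, so the editor, Ju Renyi, shares the calculation method here. For reference only.
The basic rule in Guoshuifa [2005] No. 9 is: "A one-off annual bonus received by a taxpayer is taxed separately, as one month's wage and salary income."
How the tax on a year-end bonus is calculated:
1. When the salary for the month in which the bonus is paid is higher than 3500 yuan, the tax on the bonus is: bonus * tax rate - quick deduction, where the tax rate is the one that applies to a "taxable income" of bonus / 12.
2. When that month's salary is lower than 3500 yuan, the tax on the bonus is: (bonus - (3500 - monthly salary)) * tax rate - quick deduction, where the tax rate is the one that applies to a "taxable income" of (bonus - (3500 - monthly salary)) / 12.
Here is a worked example:
Xiao Wang earned a salary of 6000 yuan in December 2013 and received a 2013 year-end bonus of 20000 yuan in the same month. The individual income tax due for that month is:
1) Tax on the monthly salary = (6000-3500)*10%-105 = 145 yuan
2) Tax on the year-end bonus = 20000*10%-105 = 1895 yuan
Total tax for the month = 145+1895 = 2040 yuan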
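
Rules 1 and 2 can be folded into one function. The following is a minimal sketch, not the script from this post: `BRACKETS`, `THRESHOLD` and `bonus_tax` are names chosen for this illustration, and the bracket values are the ones the script further down uses. The rate is looked up from the taxable bonus divided by 12, while the quick deduction is subtracted only once.

```python
# (upper bound of taxable bonus / 12, rate in percent, quick deduction)
BRACKETS = [
    (1500, 3, 0),
    (4500, 10, 105),
    (9000, 20, 555),
    (35000, 25, 1005),
    (55000, 30, 2755),
    (80000, 35, 5505),
    (float("inf"), 45, 13505),
]
THRESHOLD = 3500  # monthly expense deduction


def bonus_tax(bonus, monthly_salary):
    """Tax on a one-off year-end bonus under rules 1 and 2."""
    # rule 2: a salary below the threshold lets the shortfall reduce the bonus
    shortfall = max(0, THRESHOLD - monthly_salary)
    taxable = max(0, bonus - shortfall)
    per_month = taxable / 12.0
    for upper, rate, deduction in BRACKETS:
        if per_month <= upper:
            return taxable * rate / 100.0 - deduction


print(bonus_tax(20000, 6000))  # 1895.0, Xiao Wang's bonus tax
print(bonus_tax(24000, 2000))  # 2145.0, salary 1500 below the threshold
```

Keeping the brackets in a table means a change in rates only touches data, not the lookup loop.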

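Because employers pay year-end bonuses in different forms, the way the tax is calculated differs as well.
I. The employee's wage and salary for the month exceeds 3500 yuan, and the year-end bonus paid on top of it is taxed separately as one month's wage.
When the one-off annual bonus is taxed separately as one month, it is divided by 12 to find the tax rate, but the quick deduction may be subtracted only once when the tax is computed.
Example 1: Zhao has a salary of 5000 in January 2013 and a year-end bonus of 24000, with no other income.
Tax on Zhao's salary: (5000-3500)*3% = 45 yuan
Tax on Zhao's year-end bonus (24000):
First divide the one-off annual bonus received in the month by 12 months: 24000/12 = 2000 yuan,
then use the quotient to find the applicable tax rate, 10%, with a quick deduction of 105.
Tax on Zhao's 24000 bonus:
24000*10%-105 = 2295 yuan.
Zhao's total individual income tax for January 2013 is 2340 yuan.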
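II. The employee's wage and salary for the month does not exceed 3500 yuan, and the year-end bonus paid on top of it is taxed separately as one month's wage.
In this case the one-off annual bonus, less "the difference between the employee's wage and salary income for the month and the expense deduction", may be used as the taxable income. Here "the employee's wage and salary income for the month" means income after subtracting the tax-exempt items at the prescribed standard (such as statutory social insurance and housing fund contributions).
Example 2: Qian has a salary of 2000 in January 2013 and a year-end bonus of 24000, with no other income.
Qian's salary of 2000 yuan does not exceed the expense deduction of 3500 yuan, so no tax is due on it.
The difference between Qian's wage for January 2013 and the expense deduction is 3500-2000 = 1500 yuan.
From Qian's bonus of 24000 yuan, first subtract that difference (1500 yuan); the remaining 22500 yuan is the taxable income.
Divide 22500 by 12 months: 22500/12 = 1875 yuan,
then use the quotient to find the applicable tax rate, 10%, with a quick deduction of 105.
Tax on Qian's 24000 bonus:
(24000-1500)*10%-105 = 2145 yuan.
Qian's total individual income tax for January 2013 is 2145 yuan.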
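III. The employee worked for two or more employers during the year. Under Guoshuifa [2005] No. 9, the year-end bonus method may be used only once per taxpayer within one tax year; the taxpayer is free to choose the time and the paying employer to which it is applied.
The key points of this provision are:
1. If the bonus paid to an employee in January 2013 used the preferential divide-by-12 method, the method cannot be used again in any other month of 2013.
2. It is once per employee per year: an employee who receives year-end bonuses from two or more sources can still apply it only once.
3. An employee who has worked for less than 12 months can still apply it once.
Example 3: Sun worked at an oil company from January to March 2012, moved to a telecom company for April to August 2012, and has worked at a real estate company since September 2012.
If Sun receives a salary of 5000 and a year-end bonus of 24000 from the real estate company in December 2012, with no other income, then although Sun worked there for only 4 months in 2012, the tax due is the same as Zhao's in Example 1: 45 yuan on the monthly salary, and the bonus is again divided by 12 to find the rate, giving 2295 yuan.
When the real estate company accrues and pays the year-end bonus, the accounting treatment is the same as in Example 1.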
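IV. The employee receives year-end bonuses from two or more employers in the same month. The taxpayer may choose one employer's bonus to be taxed under the preferential method of Guoshuifa [2005] No. 9; the bonus from the other employer is merged into that month's wage and salary for tax purposes.
Guoshuifa [2005] No. 9 provides: "Within one tax year, this method may be used only once for each taxpayer." If the same person receives one-off annual bonuses from two companies in the same month, then when self-reporting the taxpayer may not combine the two bonuses and apply the one-off bonus policy to the total; the individual is entitled to the preferential calculation only once.
Example 4: Li has a salary of 5000 in January 2013, receives a year-end bonus of 24000 yuan from Li's own company and another year-end bonus of 6000 yuan from a part-time employer, with no other income.
Tax on the 24000 bonus from Li's own company: first divide the one-off annual bonus received in the month by 12 months: 24000/12 = 2000 yuan,
then use the quotient to find the applicable tax rate, 10%, with a quick deduction of 105.
Tax on the 24000 bonus from Li's own company:
24000*10%-105 = 2295 yuan.
The 6000 yuan bonus from the part-time employer should be merged into Li's wage and salary for the month. Suppose the part-time employer computed the tax as if it were a year-end bonus and withheld 6000*3% = 180 yuan.
Li receives a salary of 5000 yuan from Li's own company, which withholds (5000-3500)*3% = 45 yuan.
At the year-end reconciliation, the tax due on Li's wage portion is: (5000+6000-3500)*20%-555 = 945 yuan.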
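
The wage figures in the examples (45, 145 and 945 yuan) all come from the same monthly calculation: subtract the 3500 yuan expense deduction, find the bracket, and apply rate and quick deduction. A small sketch of that step, assuming wages use the same bracket table as the bonus lookup; `BRACKETS`, `THRESHOLD` and `salary_tax` are names invented for this illustration.

```python
# (upper bound of monthly taxable income, rate in percent, quick deduction)
BRACKETS = [
    (1500, 3, 0),
    (4500, 10, 105),
    (9000, 20, 555),
    (35000, 25, 1005),
    (55000, 30, 2755),
    (80000, 35, 5505),
    (float("inf"), 45, 13505),
]
THRESHOLD = 3500  # monthly expense deduction


def salary_tax(monthly_income):
    """Tax on one month's wage and salary income."""
    taxable = max(0, monthly_income - THRESHOLD)
    for upper, rate, deduction in BRACKETS:
        if taxable <= upper:
            return taxable * rate / 100.0 - deduction


print(salary_tax(5000))         # 45.0, Zhao's and Li's salary alone
print(salary_tax(6000))         # 145.0, Xiao Wang's salary
print(salary_tax(5000 + 6000))  # 945.0, Li's salary plus the merged second bonus
```

Merging the second bonus into the wage pushes Li from the 3% bracket into the 20% bracket, which is why the reconciled figure is far above the 45 + 180 yuan that was withheld.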

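Based on the explanation above, I implemented the calculation in Python. It only supports cases 1 and 2 described above.

There may still be some errors in it, but I am recording it here anyway: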

print """
****************************************************************
These results only apply if you worked for one company
during the year and received a single annual bonus.
****************************************************************
"""

THRESHOLD = 3500  # monthly expense deduction

def ab(your_salary, your_annual_bonus):
    """Return (bonus after tax, tax on the bonus)."""
    # Case 2: if the salary is below the threshold, the shortfall is
    # subtracted from the bonus before it is taxed.
    taxable = your_annual_bonus - max(THRESHOLD - your_salary, 0)
    if taxable <= 0:
        return (your_annual_bonus, 0.0)

    # The rate is chosen by the monthly average of the taxable bonus;
    # the quick deduction is subtracted only once.
    ab_per_m = taxable / 12
    if ab_per_m <= 1500:
        ab_tax = taxable * 3 / 100 - 0
    elif ab_per_m <= 4500:
        ab_tax = taxable * 10 / 100 - 105
    elif ab_per_m <= 9000:
        ab_tax = taxable * 20 / 100 - 555
    elif ab_per_m <= 35000:
        ab_tax = taxable * 25 / 100 - 1005
    elif ab_per_m <= 55000:
        ab_tax = taxable * 30 / 100 - 2755
    elif ab_per_m <= 80000:
        ab_tax = taxable * 35 / 100 - 5505
    else:
        ab_tax = taxable * 45 / 100 - 13505

    # The tax is taken out of the full bonus, not the reduced base.
    return (your_annual_bonus - ab_tax, ab_tax)

# float() keeps all of the arithmetic above in floating point
your_salary = float(raw_input('Please input your salary when you get your Annual Bonus > '))
your_annual_bonus = float(raw_input('Please input your Annual Bonus > '))

(ab_get, ab_tax) = ab(your_salary, your_annual_bonus)

print "Your salary is %0.2f." % your_salary
print "Your Annual Bonus is %0.2f." % your_annual_bonus
print "Your Annual Bonus after tax is %0.2f." % ab_get
print "Your Annual Bonus tax is %0.2f." % ab_tax

Output:

C:\Python27>python.exe annual_bonus.py

****************************************************************
These results only apply if you worked for one company
during the year and received a single annual bonus.
****************************************************************

Please input your salary when you get your Annual Bonus > 5000
Please input your Annual Bonus > 24000
Your salary is 5000.00.
Your Annual Bonus is 24000.00.
Your Annual Bonus after tax is 21705.00.
Your Annual Bonus tax is 2295.00.

C:\Python27>
C:\Python27>python.exe annual_bonus.py

****************************************************************
These results only apply if you worked for one company
during the year and received a single annual bonus.
****************************************************************

Please input your salary when you get your Annual Bonus > 2000
Please input your Annual Bonus > 24000
Your salary is 2000.00.
Your Annual Bonus is 24000.00.
Your Annual Bonus after tax is 21855.00.
Your Annual Bonus tax is 2145.00.

C:\Python27>

Notes
