使用python提取mdx中的数据

<< 新手学python-2015年终奖计算python实现 | Home | python中文乱码 >>

使用python提取mdx中的数据

mdx/mdd 是供 MDict、GoldenDict 等加载使用的词库，有些时候我们想要自己动手排版，这就需要解压 mdx/mdd ，提取其中文字、图片、音频等数据，这时候就可以利用 Python 脚本来处理。

我使用的是Python 2.7和readmdict.py、 ripemd128.py、pureSalsa20.py

1.运行 python readmdict.py
返回错误 LZO compression support is not available

2.用编辑器打开readmdict.py文件看了一下，将以下几行都注释掉：

try:
    import lzo
except ImportError:
    lzo = None
    print("LZO compression support is not available")

运行 python readmdict.py

C:\Python27\readmdict>python readmdict.py
Try Brutal Force on Encrypted Key Blocks
Traceback (most recent call last):
  File "readmdict.py", line 649, in <module>
    mdx = MDX(args.filename, args.encoding, args.substyle, args.passcode)
  File "readmdict.py", line 503, in __init__
    MDict.__init__(self, fname, encoding, passcode)
  File "readmdict.py", line 105, in __init__
    self._key_list = self._read_keys_brutal()
  File "readmdict.py", line 399, in _read_keys_brutal
    key_list = self._decode_key_block(key_block_compressed, key_block_info_list)
  File "readmdict.py", line 205, in _decode_key_block
    if lzo is None:
NameError: global name 'lzo' is not defined

将文件恢复

3. >>> import lzo
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ImportError: No module named lzo
>>> exit()
发现没有lzo模块

4. pip安装也不成功：

C:\Python27\Scripts>pip install lzo
Collecting lzo
c:\python27\lib\site-packages\pip-7.1.2-py2.7.egg\pip\_vendor\requests\packages\urllib3\util\ssl_.py:90: InsecurePlatformWarning: A tru
e SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to
 fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'ProtocolError('Connection aborted.', er
ror(10054, ''))': /simple/lzo/
c:\python27\lib\site-packages\pip-7.1.2-py2.7.egg\pip\_vendor\requests\packages\urllib3\util\ssl_.py:90: InsecurePlatformWarning: A tru
e SSLContext object is not available. This prevents urllib3 from configuring SSL appropriately and may cause certain SSL connections to
 fail. For more information, see https://urllib3.readthedocs.org/en/latest/security.html#insecureplatformwarning.
  InsecurePlatformWarning
  Could not find a version that satisfies the requirement lzo (from versions: )
No matching distribution found for lzo
You are using pip version 7.1.2, however version 8.1.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

C:\Python27\Scripts>

5. 在stackoverflow上查找到在windows上无法直接安装， http://stackoverflow.com/questions/7517075/how-can-i-install-python-lzo-1-08
按照回答，先到http://www.lfd.uci.edu/~gohlke/pythonlibs/#python-lzo下载了python_lzo-1.11-cp27-none-win32.whl（因为我的版本是Python 2.7）

然后运行pip install python_lzo-1.11-cp27-none-win32.whl进行安装：

C:\Python27\Scripts>pip install python_lzo-1.11-cp27-none-win32.whl
Processing c:\python27\scripts\python_lzo-1.11-cp27-none-win32.whl
Installing collected packages: python-lzo
Successfully installed python-lzo-1.11

如果是不同版本，会提示：

C:\Python27\Scripts>pip install python_lzo-1.11-cp35-none-win32.whl
python_lzo-1.11-cp35-none-win32.whl is not a supported wheel on this platform.

6.使用Python提取mdx中的数据

C:\Python27\readmdict>python readmdict.py
======== C:/Python27/readmdict/test.mdx ========
  Number of Entries : 34151
  Compact : No
  Compat : No
  GeneratedByEngineVersion : 1.2
  Description : <font color=blue size=5><b>銆婃煰鏋楁柉楂樼礆鑻辫獮瀛哥繏瑭炲吀绗?鐗堛€?/b></font>
<p>杞夋彌閬斾汉锛氬翱鐗?(-_-)鈥欌€?
<br>杞夋彌鏃ユ湡锛?9骞?鏈?0鏃?
<br>瑭炲吀瀛楁暩锛?4151
<p><b><font color=red><瑭炲吀鍏у></b></font >
<br>銆€銆€Collins Cobuild鏄瓧鍏歌垏鑻辫獮瀛哥繏鏇哥殑鐭ュ悕鍝佺墝锛屻€奀ollins Cobuild Advanced Learner&apos;s English Dictionary銆嬩
竴鐩存槸涓栫晫鍚勫湴璁€鑰呭績涓殑鏈€浣冲瓧鍏镐箣涓€锛岀敋鑷宠璀界偤銆岀従浠ｈ嫳瑾炴渶鍏ㄩ潰銆佹瑠濞佺殑鍤皫銆嶃€?澶氬湅鏆㈤姺鏇搞
€婂崈钀垾瀛歌嫳瑾炪€嬩腑涔熺壒鍒ユ帹钖︺€奀ollins Cobuild Advanced Learner&apos;s English Dictionary銆嬨€?
<p><font color=red>鏈经鍏歌綁鑷猯ingos锛屾湰杈吀渚涘缈掕嫳瑾炵殑鏈嬪弸浣跨敤锛?
<p><font color=grape>杞夋彌绱旂偤鑸堣叮锛?/font>
<p><font color=green>鏈鍏哥敱婢抽杸鏈嬪弸鍒朵綔锛?/font>


  RequiredEngineVersion : 1.2
  Format : Html
  Encrypted : No
  Encoding : UTF-16
  StyleSheet :
  Title : Title (No HTML code allowed)
  KeyCaseSensitive : No
  DataSourceFormat : 107

C:\Python27\readmdict>

7. 在mdx 所在目录下，出现了 test.txt

附：

readmdict.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# readmdict.py
# Octopus MDict Dictionary File (.mdx) and Resource File (.mdd) Analyser
#
# Copyright (C) 2012, 2013, 2015 Xiaoqiang Wang <xiaoqiangwang AT gmail DOT com>
#
# This program is a free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, version 3 of the License.
#
# You can get a copy of GNU General Public License along this program
# But you can always get it from http://www.gnu.org/licenses/gpl.txt
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

from struct import pack, unpack
from io import BytesIO
import re
import sys

from ripemd128 import ripemd128
from pureSalsa20 import Salsa20

# zlib compression is used for engine version >=2.0
import zlib
# LZO compression is used for engine version < 2.0

try:
    import lzo
except ImportError:
    lzo = None
    print("LZO compression support is not available")

# 2x3 compatible
if sys.hexversion >= 0x03000000:
    unicode = str


def _unescape_entities(text):
    """
    unescape offending tags < > " &
    """
    text = text.replace(b'&lt;', b'<')
    text = text.replace(b'&gt;', b'>')
    text = text.replace(b'&quot;', b'"')
    text = text.replace(b'&amp;', b'&')
    return text


def _fast_decrypt(data, key):
    b = bytearray(data)
    key = bytearray(key)
    previous = 0x36
    for i in range(len(b)):
        t = (b[i] >> 4 | b[i] << 4) & 0xff
        t = t ^ previous ^ (i & 0xff) ^ key[i % len(key)]
        previous = b[i]
        b[i] = t
    return bytes(b)


def _mdx_decrypt(comp_block):
    key = ripemd128(comp_block[4:8] + pack(b'<L', 0x3695))
    return comp_block[0:8] + _fast_decrypt(comp_block[8:], key)


def _salsa_decrypt(ciphertext, encrypt_key):
    s20 = Salsa20(key=encrypt_key, IV=b"\x00"*8, rounds=8)
    return s20.encryptBytes(ciphertext)


def _decrypt_regcode_by_deviceid(reg_code, deviceid):
    deviceid_digest = ripemd128(deviceid)
    s20 = Salsa20(key=deviceid_digest, IV=b"\x00"*8, rounds=8)
    encrypt_key = s20.encryptBytes(reg_code)
    return encrypt_key


def _decrypt_regcode_by_email(reg_code, email):
    email_digest = ripemd128(email.decode().encode('utf-16-le'))
    s20 = Salsa20(key=email_digest, IV=b"\x00"*8, rounds=8)
    encrypt_key = s20.encryptBytes(reg_code)
    return encrypt_key


class MDict(object):
    """
    Base class which reads in header and key block.
    It has no public methods and serves only as code sharing base class.
    """
    def __init__(self, fname, encoding='', passcode=None):
        self._fname = fname
        self._encoding = encoding.upper()
        self._passcode = passcode

        self.header = self._read_header()
        try:
            self._key_list = self._read_keys()
        except:
            print("Try Brutal Force on Encrypted Key Blocks")
            self._key_list = self._read_keys_brutal()

    def __len__(self):
        return self._num_entries

    def __iter__(self):
        return self.keys()

    def keys(self):
        """
        Return an iterator over dictionary keys.
        """
        return (key_value for key_id, key_value in self._key_list)

    def _read_number(self, f):
        return unpack(self._number_format, f.read(self._number_width))[0]

    def _parse_header(self, header):
        """
        extract attributes from <Dict attr="value" ... >
        """
        taglist = re.findall(b'(\w+)="(.*?)"', header, re.DOTALL)
        tagdict = {}
        for key, value in taglist:
            tagdict[key] = _unescape_entities(value)
        return tagdict

    def _decode_key_block_info(self, key_block_info_compressed):
        if self._version >= 2:
            # zlib compression
            assert(key_block_info_compressed[:4] == b'\x02\x00\x00\x00')
            # decrypt if needed
            if self._encrypt & 0x02:
                key_block_info_compressed = _mdx_decrypt(key_block_info_compressed)
            # decompress
            key_block_info = zlib.decompress(key_block_info_compressed[8:])
            # adler checksum
            adler32 = unpack('>I', key_block_info_compressed[4:8])[0]
            assert(adler32 == zlib.adler32(key_block_info) & 0xffffffff)
        else:
            # no compression
            key_block_info = key_block_info_compressed
        # decode
        key_block_info_list = []
        num_entries = 0
        i = 0
        if self._version >= 2:
            byte_format = '>H'
            byte_width = 2
            text_term = 1
        else:
            byte_format = '>B'
            byte_width = 1
            text_term = 0

        while i < len(key_block_info):
            # number of entries in current key block
            num_entries += unpack(self._number_format, key_block_info[i:i+self._number_width])[0]
            i += self._number_width
            # text head size
            text_head_size = unpack(byte_format, key_block_info[i:i+byte_width])[0]
            i += byte_width
            # text head
            if self._encoding != 'UTF-16':
                i += text_head_size + text_term
            else:
                i += (text_head_size + text_term) * 2
            # text tail size
            text_tail_size = unpack(byte_format, key_block_info[i:i+byte_width])[0]
            i += byte_width
            # text tail
            if self._encoding != 'UTF-16':
                i += text_tail_size + text_term
            else:
                i += (text_tail_size + text_term) * 2
            # key block compressed size
            key_block_compressed_size = unpack(self._number_format, key_block_info[i:i+self._number_width])[0]
            i += self._number_width
            # key block decompressed size
            key_block_decompressed_size = unpack(self._number_format, key_block_info[i:i+self._number_width])[0]
            i += self._number_width
            key_block_info_list += [(key_block_compressed_size, key_block_decompressed_size)]

        #assert(num_entries == self._num_entries)

        return key_block_info_list

    def _decode_key_block(self, key_block_compressed, key_block_info_list):
        key_list = []
        i = 0
        for compressed_size, decompressed_size in key_block_info_list:
            start = i
            end = i + compressed_size
            # 4 bytes : compression type
            key_block_type = key_block_compressed[start:start+4]
            # 4 bytes : adler checksum of decompressed key block
            adler32 = unpack('>I', key_block_compressed[start+4:start+8])[0]
            if key_block_type == b'\x00\x00\x00\x00':
                key_block = key_block_compressed[start+8:end]
            elif key_block_type == b'\x01\x00\x00\x00':
                if lzo is None:
                    print("LZO compression is not supported")
                    break
                # decompress key block
                header = b'\xf0' + pack('>I', decompressed_size)
                key_block = lzo.decompress(header + key_block_compressed[start+8:end])
            elif key_block_type == b'\x02\x00\x00\x00':
                # decompress key block
                key_block = zlib.decompress(key_block_compressed[start+8:end])
            # extract one single key block into a key list
            key_list += self._split_key_block(key_block)
            # notice that adler32 returns signed value
            assert(adler32 == zlib.adler32(key_block) & 0xffffffff)

            i += compressed_size
        return key_list

    def _split_key_block(self, key_block):
        key_list = []
        key_start_index = 0
        while key_start_index < len(key_block):
            # the corresponding record's offset in record block
            key_id = unpack(self._number_format, key_block[key_start_index:key_start_index+self._number_width])[0]
            # key text ends with '\x00'
            if self._encoding == 'UTF-16':
                delimiter = b'\x00\x00'
                width = 2
            else:
                delimiter = b'\x00'
                width = 1
            i = key_start_index + self._number_width
            while i < len(key_block):
                if key_block[i:i+width] == delimiter:
                    key_end_index = i
                    break
                i += width
            key_text = key_block[key_start_index+self._number_width:key_end_index]\
                .decode(self._encoding, errors='ignore').encode('utf-8').strip()
            key_start_index = key_end_index + width
            key_list += [(key_id, key_text)]
        return key_list

    def _read_header(self):
        f = open(self._fname, 'rb')
        # number of bytes of header text
        header_bytes_size = unpack('>I', f.read(4))[0]
        header_bytes = f.read(header_bytes_size)
        # 4 bytes: adler32 checksum of header, in little endian
        adler32 = unpack('<I', f.read(4))[0]
        assert(adler32 == zlib.adler32(header_bytes) & 0xffffffff)
        # mark down key block offset
        self._key_block_offset = f.tell()
        f.close()

        # header text in utf-16 encoding ending with '\x00\x00'
        header_text = header_bytes[:-2].decode('utf-16').encode('utf-8')
        header_tag = self._parse_header(header_text)
        if not self._encoding:
            encoding = header_tag[b'Encoding']
            if sys.hexversion >= 0x03000000:
                encoding = encoding.decode('utf-8')
            # GB18030 > GBK > GB2312
            if encoding in ['GBK', 'GB2312']:
                encoding = 'GB18030'
            self._encoding = encoding
        # encryption flag
        #   0x00 - no encryption
        #   0x01 - encrypt record block
        #   0x02 - encrypt key info block
        if header_tag[b'Encrypted'] == b'No':
            self._encrypt = 0
        elif header_tag[b'Encrypted'] == b'Yes':
            self._encrypt = 1
        else:
            self._encrypt = int(header_tag[b'Encrypted'])

        # stylesheet attribute if present takes form of:
        #   style_number # 1-255
        #   style_begin  # or ''
        #   style_end    # or ''
        # store stylesheet in dict in the form of
        # {'number' : ('style_begin', 'style_end')}
        self._stylesheet = {}
        if header_tag.get('StyleSheet'):
            lines = header_tag['StyleSheet'].splitlines()
            for i in range(0, len(lines), 3):
                self._stylesheet[lines[i]] = (lines[i+1], lines[i+2])

        # before version 2.0, number is 4 bytes integer
        # version 2.0 and above uses 8 bytes
        self._version = float(header_tag[b'GeneratedByEngineVersion'])
        if self._version < 2.0:
            self._number_width = 4
            self._number_format = '>I'
        else:
            self._number_width = 8
            self._number_format = '>Q'

        return header_tag

    def _read_keys(self):
        f = open(self._fname, 'rb')
        f.seek(self._key_block_offset)

        # the following numbers could be encrypted
        if self._version >= 2.0:
            num_bytes = 8 * 5
        else:
            num_bytes = 4 * 4
        block = f.read(num_bytes)

        if self._encrypt & 1:
            if self._passcode is None:
                raise RuntimeError('user identification is needed to read encrypted file')
            regcode, userid = self._passcode
            if isinstance(userid, unicode):
                userid = userid.encode('utf8')
            if self.header[b'RegisterBy'] == b'EMail':
                encrypted_key = _decrypt_regcode_by_email(regcode, userid)
            else:
                encrypted_key = _decrypt_regcode_by_deviceid(regcode, userid)
            block = _salsa_decrypt(block, encrypted_key)

        # decode this block
        sf = BytesIO(block)
        # number of key blocks
        num_key_blocks = self._read_number(sf)
        # number of entries
        self._num_entries = self._read_number(sf)
        # number of bytes of key block info after decompression
        if self._version >= 2.0:
            key_block_info_decomp_size = self._read_number(sf)
        # number of bytes of key block info
        key_block_info_size = self._read_number(sf)
        # number of bytes of key block
        key_block_size = self._read_number(sf)

        # 4 bytes: adler checksum of previous 5 numbers
        if self._version >= 2.0:
            adler32 = unpack('>I', f.read(4))[0]
            assert adler32 == (zlib.adler32(block) & 0xffffffff)

        # read key block info, which indicates key block's compressed and decompressed size
        key_block_info = f.read(key_block_info_size)
        key_block_info_list = self._decode_key_block_info(key_block_info)
        assert(num_key_blocks == len(key_block_info_list))

        # read key block
        key_block_compressed = f.read(key_block_size)
        # extract key block
        key_list = self._decode_key_block(key_block_compressed, key_block_info_list)

        self._record_block_offset = f.tell()
        f.close()

        return key_list

    def _read_keys_brutal(self):
        f = open(self._fname, 'rb')
        f.seek(self._key_block_offset)

        # the following numbers could be encrypted, disregard them!
        if self._version >= 2.0:
            num_bytes = 8 * 5 + 4
            key_block_type = b'\x02\x00\x00\x00'
        else:
            num_bytes = 4 * 4
            key_block_type = b'\x01\x00\x00\x00'
        block = f.read(num_bytes)

        # key block info
        # 4 bytes '\x02\x00\x00\x00'
        # 4 bytes adler32 checksum
        # unknown number of bytes follows until '\x02\x00\x00\x00' which marks the beginning of key block
        key_block_info = f.read(8)
        if self._version >= 2.0:
            assert key_block_info[:4] == b'\x02\x00\x00\x00'
        while True:
            fpos = f.tell()
            t = f.read(1024)
            index = t.find(key_block_type)
            if index != -1:
                key_block_info += t[:index]
                f.seek(fpos + index)
                break
            else:
                key_block_info += t

        key_block_info_list = self._decode_key_block_info(key_block_info)
        key_block_size = sum(list(zip(*key_block_info_list))[0])

        # read key block
        key_block_compressed = f.read(key_block_size)
        # extract key block
        key_list = self._decode_key_block(key_block_compressed, key_block_info_list)

        self._record_block_offset = f.tell()
        f.close()

        self._num_entries = len(key_list)
        return key_list


class MDD(MDict):
    """
    MDict resource file format (*.MDD) reader.
    >>> mdd = MDD('example.mdd')
    >>> len(mdd)
    208
    >>> for filename,content in mdd.items():
    ... print filename, content[:10]
    """
    def __init__(self, fname, passcode=None):
        MDict.__init__(self, fname, encoding='UTF-16', passcode=passcode)

    def items(self):
        """Return a generator which in turn produce tuples in the form of (filename, content)
        """
        return self._decode_record_block()

    def _decode_record_block(self):
        f = open(self._fname, 'rb')
        f.seek(self._record_block_offset)

        num_record_blocks = self._read_number(f)
        num_entries = self._read_number(f)
        assert(num_entries == self._num_entries)
        record_block_info_size = self._read_number(f)
        record_block_size = self._read_number(f)

        # record block info section
        record_block_info_list = []
        size_counter = 0
        for i in range(num_record_blocks):
            compressed_size = self._read_number(f)
            decompressed_size = self._read_number(f)
            record_block_info_list += [(compressed_size, decompressed_size)]
            size_counter += self._number_width * 2
        assert(size_counter == record_block_info_size)

        # actual record block
        offset = 0
        i = 0
        size_counter = 0
        for compressed_size, decompressed_size in record_block_info_list:
            record_block_compressed = f.read(compressed_size)
            # 4 bytes: compression type
            record_block_type = record_block_compressed[:4]
            # 4 bytes: adler32 checksum of decompressed record block
            adler32 = unpack('>I', record_block_compressed[4:8])[0]
            if record_block_type == b'\x00\x00\x00\x00':
                record_block = record_block_compressed[8:]
            elif record_block_type == b'\x01\x00\x00\x00':
                if lzo is None:
                    print("LZO compression is not supported")
                    break
                # decompress
                header = '\xf0' + pack('>I', decompressed_size)
                record_block = lzo.decompress(header + record_block_compressed[8:])
            elif record_block_type == b'\x02\x00\x00\x00':
                # decompress
                record_block = zlib.decompress(record_block_compressed[8:])

            # notice that adler32 return signed value
            assert(adler32 == zlib.adler32(record_block) & 0xffffffff)

            assert(len(record_block) == decompressed_size)
            # split record block according to the offset info from key block
            while i < len(self._key_list):
                record_start, key_text = self._key_list[i]
                # reach the end of current record block
                if record_start - offset >= len(record_block):
                    break
                # record end index
                if i < len(self._key_list)-1:
                    record_end = self._key_list[i+1][0]
                else:
                    record_end = len(record_block) + offset
                i += 1
                data = record_block[record_start-offset:record_end-offset]
                yield key_text, data
            offset += len(record_block)
            size_counter += compressed_size
        assert(size_counter == record_block_size)

        f.close()


class MDX(MDict):
    """
    MDict dictionary file format (*.MDD) reader.
    >>> mdx = MDX('example.mdx')
    >>> len(mdx)
    42481
    >>> for key,value in mdx.items():
    ... print key, value[:10]
    """
    def __init__(self, fname, encoding='', substyle=False, passcode=None):
        MDict.__init__(self, fname, encoding, passcode)
        self._substyle = substyle

    def items(self):
        """Return a generator which in turn produce tuples in the form of (key, value)
        """
        return self._decode_record_block()

    def _substitute_stylesheet(self, txt):
        # substitute stylesheet definition
        txt_list = re.split('`\d+`', txt)
        txt_tag = re.findall('`\d+`', txt)
        txt_styled = txt_list[0]
        for j, p in enumerate(txt_list[1:]):
            style = self._stylesheet[txt_tag[j][1:-1]]
            if p and p[-1] == '\n':
                txt_styled = txt_styled + style[0] + p.rstrip() + style[1] + '\r\n'
            else:
                txt_styled = txt_styled + style[0] + p + style[1]
        return txt_styled

    def _decode_record_block(self):
        f = open(self._fname, 'rb')
        f.seek(self._record_block_offset)

        num_record_blocks = self._read_number(f)
        num_entries = self._read_number(f)
        assert(num_entries == self._num_entries)
        record_block_info_size = self._read_number(f)
        record_block_size = self._read_number(f)

        # record block info section
        record_block_info_list = []
        size_counter = 0
        for i in range(num_record_blocks):
            compressed_size = self._read_number(f)
            decompressed_size = self._read_number(f)
            record_block_info_list += [(compressed_size, decompressed_size)]
            size_counter += self._number_width * 2
        assert(size_counter == record_block_info_size)

        # actual record block data
        offset = 0
        i = 0
        size_counter = 0
        for compressed_size, decompressed_size in record_block_info_list:
            record_block_compressed = f.read(compressed_size)
            # 4 bytes indicates block compression type
            record_block_type = record_block_compressed[:4]
            # 4 bytes adler checksum of uncompressed content
            adler32 = unpack('>I', record_block_compressed[4:8])[0]
            # no compression
            if record_block_type == b'\x00\x00\x00\x00':
                record_block = record_block_compressed[8:]
            # lzo compression
            elif record_block_type == b'\x01\x00\x00\x00':
                if lzo is None:
                    print("LZO compression is not supported")
                    break
                # decompress
                header = b'\xf0' + pack('>I', decompressed_size)
                record_block = lzo.decompress(header + record_block_compressed[8:])
            # zlib compression
            elif record_block_type == b'\x02\x00\x00\x00':
                # decompress
                record_block = zlib.decompress(record_block_compressed[8:])

            # notice that adler32 return signed value
            assert(adler32 == zlib.adler32(record_block) & 0xffffffff)

            assert(len(record_block) == decompressed_size)
            # split record block according to the offset info from key block
            while i < len(self._key_list):
                record_start, key_text = self._key_list[i]
                # reach the end of current record block
                if record_start - offset >= len(record_block):
                    break
                # record end index
                if i < len(self._key_list)-1:
                    record_end = self._key_list[i+1][0]
                else:
                    record_end = len(record_block) + offset
                i += 1
                record = record_block[record_start-offset:record_end-offset]
                # convert to utf-8
                record = record.decode(self._encoding, errors='ignore').strip(u'\x00').encode('utf-8')
                # substitute styles
                if self._substyle and self._stylesheet:
                    record = self._substitute_stylesheet(record)

                yield key_text, record
            offset += len(record_block)
            size_counter += compressed_size
        assert(size_counter == record_block_size)

        f.close()


if __name__ == '__main__':
    import sys
    import os
    import os.path
    import argparse
    import codecs

    def passcode(s):
        try:
            regcode, userid = s.split(',')
        except:
            raise argparse.ArgumentTypeError("Passcode must be regcode,userid")
        try:
            regcode = codecs.decode(regcode, 'hex')
        except:
            raise argparse.ArgumentTypeError("regcode must be a 32 bytes hexadecimal string")
        return regcode, userid

    parser = argparse.ArgumentParser()
    parser.add_argument('-x', '--extract', action="store_true",
                        help='extract mdx to source format and extract files from mdd')
    parser.add_argument('-s', '--substyle', action="store_true",
                        help='substitute style definition if present')
    parser.add_argument('-d', '--datafolder', default="data",
                        help='folder to extract data files from mdd')
    parser.add_argument('-e', '--encoding', default="",
                        help='folder to extract data files from mdd')
    parser.add_argument('-p', '--passcode', default=None, type=passcode,
                        help='register_code,email_or_deviceid')
    parser.add_argument("filename", nargs='?', help="mdx file name")
    args = parser.parse_args()

    # use GUI to select file, default to extract
    if not args.filename:
        import Tkinter
        import tkFileDialog
        root = Tkinter.Tk()
        root.withdraw()
        args.filename = tkFileDialog.askopenfilename(parent=root)
        args.extract = True

    if not os.path.exists(args.filename):
        print("Please specify a valid MDX/MDD file")

    base, ext = os.path.splitext(args.filename)

    # read mdx file
    if ext.lower() == os.path.extsep + 'mdx':
        mdx = MDX(args.filename, args.encoding, args.substyle, args.passcode)
        if type(args.filename) is unicode:
            bfname = args.filename.encode('utf-8')
        else:
            bfname = args.filename
        print('======== %s ========' % bfname)
        print('  Number of Entries : %d' % len(mdx))
        for key, value in mdx.header.items():
            print('  %s : %s' % (key, value))
    else:
        mdx = None

    # find companion mdd file
    mdd_filename = ''.join([base, os.path.extsep, 'mdd'])
    if os.path.exists(mdd_filename):
        mdd = MDD(mdd_filename, args.passcode)
        if type(mdd_filename) is unicode:
            bfname = mdd_filename.encode('utf-8')
        else:
            bfname = mdd_filename
        print('======== %s ========' % bfname)
        print('  Number of Entries : %d' % len(mdd))
        for key, value in mdd.header.items():
            print('  %s : %s' % (key, value))
    else:
        mdd = None

    if args.extract:
        # write out glos
        if mdx:
            output_fname = ''.join([base, os.path.extsep, 'txt'])
            tf = open(output_fname, 'wb')
            for key, value in mdx.items():
                tf.write(key)
                tf.write(b'\r\n')
                tf.write(value)
                if not value.endswith(b'\n'):
                    tf.write(b'\r\n')
                tf.write(b'</>\r\n')
            tf.close()
            # write out style
            if mdx.header.get('StyleSheet'):
                style_fname = ''.join([base, '_style', os.path.extsep, 'txt'])
                sf = open(style_fname, 'wb')
                sf.write(b'\r\n'.join(mdx.header['StyleSheet'].splitlines()))
                sf.close()
        # write out optional data files
        if mdd:
            datafolder = os.path.join(os.path.dirname(args.filename), args.datafolder)
            if not os.path.exists(datafolder):
                os.makedirs(datafolder)
            for key, value in mdd.items():
                fname = key.decode('utf-8').replace('\\', os.path.sep)
                dfname = datafolder + fname
                if not os.path.exists(os.path.dirname(dfname)):
                    os.makedirs(os.path.dirname(dfname))
                df = open(dfname, 'wb')
                df.write(value)
                df.close()

pureSalsa20.py

#!/usr/bin/env python
# coding: utf-8

"""
    pureSalsa20.py -- a pure Python implementation of the Salsa20 cipher, ported to Python 3

    v4.0: Added Python 3 support, dropped support for Python <= 2.5.
    
    // zhansliu

    Original comments below.

    ====================================================================
    There are comments here by two authors about three pieces of software:
        comments by Larry Bugbee about
            Salsa20, the stream cipher by Daniel J. Bernstein 
                 (including comments about the speed of the C version) and
            pySalsa20, Bugbee's own Python wrapper for salsa20.c
                 (including some references), and
        comments by Steve Witham about
            pureSalsa20, Witham's pure Python 2.5 implementation of Salsa20,
                which follows pySalsa20's API, and is in this file.

    Salsa20: a Fast Streaming Cipher (comments by Larry Bugbee)
    -----------------------------------------------------------

    Salsa20 is a fast stream cipher written by Daniel Bernstein 
    that basically uses a hash function and XOR making for fast 
    encryption.  (Decryption uses the same function.)  Salsa20 
    is simple and quick.  
    
    Some Salsa20 parameter values...
        design strength    128 bits
        key length         128 or 256 bits, exactly
        IV, aka nonce      64 bits, always
        chunk size         must be in multiples of 64 bytes
    
    Salsa20 has two reduced versions, 8 and 12 rounds each.
    
    One benchmark (10 MB):
        1.5GHz PPC G4     102/97/89 MB/sec for 8/12/20 rounds
        AMD Athlon 2500+   77/67/53 MB/sec for 8/12/20 rounds
          (no I/O and before Python GC kicks in)
    
    Salsa20 is a Phase 3 finalist in the EU eSTREAM competition 
    and appears to be one of the fastest ciphers.  It is well 
    documented so I will not attempt any injustice here.  Please 
    see "References" below.
    
    ...and Salsa20 is "free for any use".  
    
    
    pySalsa20: a Python wrapper for Salsa20 (Comments by Larry Bugbee)
    ------------------------------------------------------------------

    pySalsa20.py is a simple ctypes Python wrapper.  Salsa20 is 
    as it's name implies, 20 rounds, but there are two reduced 
    versions, 8 and 12 rounds each.  Because the APIs are 
    identical, pySalsa20 is capable of wrapping all three 
    versions (number of rounds hardcoded), including a special 
    version that allows you to set the number of rounds with a 
    set_rounds() function.  Compile the version of your choice 
    as a shared library (not as a Python extension), name and 
    install it as libsalsa20.so.
    
    Sample usage:
        from pySalsa20 import Salsa20
        s20 = Salsa20(key, IV)
        dataout = s20.encryptBytes(datain)   # same for decrypt
    
    This is EXPERIMENTAL software and intended for educational 
    purposes only.  To make experimentation less cumbersome, 
    pySalsa20 is also free for any use.      
    
    THIS PROGRAM IS PROVIDED WITHOUT WARRANTY OR GUARANTEE OF
    ANY KIND.  USE AT YOUR OWN RISK.  
    
    Enjoy,
      
    Larry Bugbee
    bugbee@seanet.com
    April 2007

    
    References:
    -----------
      http://en.wikipedia.org/wiki/Salsa20
      http://en.wikipedia.org/wiki/Daniel_Bernstein
      http://cr.yp.to/djb.html
      http://www.ecrypt.eu.org/stream/salsa20p3.html
      http://www.ecrypt.eu.org/stream/p3ciphers/salsa20/salsa20_p3source.zip

     
    Prerequisites for pySalsa20:
    ----------------------------
      - Python 2.5 (haven't tested in 2.4)


    pureSalsa20: Salsa20 in pure Python 2.5 (comments by Steve Witham)
    ------------------------------------------------------------------

    pureSalsa20 is the stand-alone Python code in this file.
    It implements the underlying Salsa20 core algorithm
    and emulates pySalsa20's Salsa20 class API (minus a bug(*)).

    pureSalsa20 is MUCH slower than libsalsa20.so wrapped with pySalsa20--
    about 1/1000 the speed for Salsa20/20 and 1/500 the speed for Salsa20/8,
    when encrypting 64k-byte blocks on my computer.

    pureSalsa20 is for cases where portability is much more important than
    speed.  I wrote it for use in a "structured" random number generator.

    There are comments about the reasons for this slowness in
          http://www.tiac.net/~sw/2010/02/PureSalsa20

    Sample usage:
        from pureSalsa20 import Salsa20
        s20 = Salsa20(key, IV)
        dataout = s20.encryptBytes(datain)   # same for decrypt

    I took the test code from pySalsa20, added a bunch of tests including
    rough speed tests, and moved them into the file testSalsa20.py.  
    To test both pySalsa20 and pureSalsa20, type
        python testSalsa20.py

    (*)The bug (?) in pySalsa20 is this.  The rounds variable is global to the
    libsalsa20.so library and not switched when switching between instances
    of the Salsa20 class.
        s1 = Salsa20( key, IV, 20 )
        s2 = Salsa20( key, IV, 8 )
    In this example,
        with pySalsa20, both s1 and s2 will do 8 rounds of encryption.
        with pureSalsa20, s1 will do 20 rounds and s2 will do 8 rounds.
    Perhaps giving each instance its own nRounds variable, which
    is passed to the salsa20wordtobyte() function, is insecure.  I'm not a 
    cryptographer.

    pureSalsa20.py and testSalsa20.py are EXPERIMENTAL software and 
    intended for educational purposes only.  To make experimentation less 
    cumbersome, pureSalsa20.py and testSalsa20.py are free for any use.

    Revisions:
    ----------
      p3.2   Fixed bug that initialized the output buffer with plaintext!
             Saner ramping of nreps in speed test.
             Minor changes and print statements.
      p3.1   Took timing variability out of add32() and rot32().
             Made the internals more like pySalsa20/libsalsa .
             Put the semicolons back in the main loop!
             In encryptBytes(), modify a byte array instead of appending.
             Fixed speed calculation bug.
             Used subclasses instead of patches in testSalsa20.py .
             Added 64k-byte messages to speed test to be fair to pySalsa20.
      p3     First version, intended to parallel pySalsa20 version 3.

    More references:
    ----------------
      http://www.seanet.com/~bugbee/crypto/salsa20/          [pySalsa20]
      http://cr.yp.to/snuffle.html        [The original name of Salsa20]
      http://cr.yp.to/snuffle/salsafamily-20071225.pdf [ Salsa20 design]
      http://www.tiac.net/~sw/2010/02/PureSalsa20
    
    THIS PROGRAM IS PROVIDED WITHOUT WARRANTY OR GUARANTEE OF
    ANY KIND.  USE AT YOUR OWN RISK.  

    Cheers,

    Steve Witham sw at remove-this tiac dot net
    February, 2010
"""
import sys
assert(sys.version_info >= (2, 6))

if sys.version_info >= (3,):
	integer_types = (int,)
	python3 = True
else:
	integer_types = (int, long)
	python3 = False

from struct import Struct
little_u64 = Struct( "<Q" )      #    little-endian 64-bit unsigned.
                                 #    Unpacks to a tuple of one element!

little16_i32 = Struct( "<16i" )  # 16 little-endian 32-bit signed ints.
little4_i32 = Struct( "<4i" )    #  4 little-endian 32-bit signed ints.
little2_i32 = Struct( "<2i" )    #  2 little-endian 32-bit signed ints.

_version = 'p4.0'

#----------- Salsa20 class which emulates pySalsa20.Salsa20 ---------------

class Salsa20(object):
    def __init__(self, key=None, IV=None, rounds=20 ):
        self._lastChunk64 = True
        self._IVbitlen = 64             # must be 64 bits
        self.ctx = [ 0 ] * 16
        if key:
            self.setKey(key)
        if IV:
            self.setIV(IV)

        self.setRounds(rounds)


    def setKey(self, key):
        assert type(key) == bytes
        ctx = self.ctx
        if len( key ) == 32:  # recommended
            constants = b"expand 32-byte k"
            ctx[ 1],ctx[ 2],ctx[ 3],ctx[ 4] = little4_i32.unpack(key[0:16])
            ctx[11],ctx[12],ctx[13],ctx[14] = little4_i32.unpack(key[16:32])
        elif len( key ) == 16:
            constants = b"expand 16-byte k"
            ctx[ 1],ctx[ 2],ctx[ 3],ctx[ 4] = little4_i32.unpack(key[0:16])
            ctx[11],ctx[12],ctx[13],ctx[14] = little4_i32.unpack(key[0:16])
        else:
            raise Exception( "key length isn't 32 or 16 bytes." )
        ctx[0],ctx[5],ctx[10],ctx[15] = little4_i32.unpack( constants )

        
    def setIV(self, IV):
        assert type(IV) == bytes
        assert len(IV)*8 == 64, 'nonce (IV) not 64 bits'
        self.IV = IV
        ctx=self.ctx
        ctx[ 6],ctx[ 7] = little2_i32.unpack( IV )
        ctx[ 8],ctx[ 9] = 0, 0  # Reset the block counter.

    setNonce = setIV            # support an alternate name


    def setCounter( self, counter ):
        assert( type(counter) in integer_types )
        assert( 0 <= counter < 1<<64 ), "counter < 0 or >= 2**64"
        ctx = self.ctx
        ctx[ 8],ctx[ 9] = little2_i32.unpack( little_u64.pack( counter ) )

    def getCounter( self ):
        return little_u64.unpack( little2_i32.pack( *self.ctx[ 8:10 ] ) ) [0]


    def setRounds(self, rounds, testing=False ):
        assert testing or rounds in [8, 12, 20], 'rounds must be 8, 12, 20'
        self.rounds = rounds


    def encryptBytes(self, data):
        assert type(data) == bytes, 'data must be byte string'
        assert self._lastChunk64, 'previous chunk not multiple of 64 bytes'
        lendata = len(data)
        munged = bytearray(lendata)
        for i in range( 0, lendata, 64 ):
            h = salsa20_wordtobyte( self.ctx, self.rounds, checkRounds=False )
            self.setCounter( ( self.getCounter() + 1 ) % 2**64 )
            # Stopping at 2^70 bytes per nonce is user's responsibility.
            for j in range( min( 64, lendata - i ) ):
                if python3:
                    munged[ i+j ] = data[ i+j ] ^ h[j]
                else:
                    munged[ i+j ] = ord(data[ i+j ]) ^ ord(h[j])

        self._lastChunk64 = not lendata % 64
        return bytes(munged)
    
    decryptBytes = encryptBytes # encrypt and decrypt use same function

#--------------------------------------------------------------------------

def salsa20_wordtobyte( input, nRounds=20, checkRounds=True ):
    """ Do nRounds Salsa20 rounds on a copy of 
            input: list or tuple of 16 ints treated as little-endian unsigneds.
        Returns a 64-byte string.
        """

    assert( type(input) in ( list, tuple )  and  len(input) == 16 )
    assert( not(checkRounds) or ( nRounds in [ 8, 12, 20 ] ) )

    x = list( input )

    def XOR( a, b ):  return a ^ b
    ROTATE = rot32
    PLUS   = add32

    for i in range( nRounds // 2 ):
        # These ...XOR...ROTATE...PLUS... lines are from ecrypt-linux.c
        # unchanged except for indents and the blank line between rounds:
        x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 0],x[12]), 7));
        x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[ 4],x[ 0]), 9));
        x[12] = XOR(x[12],ROTATE(PLUS(x[ 8],x[ 4]),13));
        x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[12],x[ 8]),18));
        x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 5],x[ 1]), 7));
        x[13] = XOR(x[13],ROTATE(PLUS(x[ 9],x[ 5]), 9));
        x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[13],x[ 9]),13));
        x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 1],x[13]),18));
        x[14] = XOR(x[14],ROTATE(PLUS(x[10],x[ 6]), 7));
        x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[14],x[10]), 9));
        x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 2],x[14]),13));
        x[10] = XOR(x[10],ROTATE(PLUS(x[ 6],x[ 2]),18));
        x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[15],x[11]), 7));
        x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 3],x[15]), 9));
        x[11] = XOR(x[11],ROTATE(PLUS(x[ 7],x[ 3]),13));
        x[15] = XOR(x[15],ROTATE(PLUS(x[11],x[ 7]),18));

        x[ 1] = XOR(x[ 1],ROTATE(PLUS(x[ 0],x[ 3]), 7));
        x[ 2] = XOR(x[ 2],ROTATE(PLUS(x[ 1],x[ 0]), 9));
        x[ 3] = XOR(x[ 3],ROTATE(PLUS(x[ 2],x[ 1]),13));
        x[ 0] = XOR(x[ 0],ROTATE(PLUS(x[ 3],x[ 2]),18));
        x[ 6] = XOR(x[ 6],ROTATE(PLUS(x[ 5],x[ 4]), 7));
        x[ 7] = XOR(x[ 7],ROTATE(PLUS(x[ 6],x[ 5]), 9));
        x[ 4] = XOR(x[ 4],ROTATE(PLUS(x[ 7],x[ 6]),13));
        x[ 5] = XOR(x[ 5],ROTATE(PLUS(x[ 4],x[ 7]),18));
        x[11] = XOR(x[11],ROTATE(PLUS(x[10],x[ 9]), 7));
        x[ 8] = XOR(x[ 8],ROTATE(PLUS(x[11],x[10]), 9));
        x[ 9] = XOR(x[ 9],ROTATE(PLUS(x[ 8],x[11]),13));
        x[10] = XOR(x[10],ROTATE(PLUS(x[ 9],x[ 8]),18));
        x[12] = XOR(x[12],ROTATE(PLUS(x[15],x[14]), 7));
        x[13] = XOR(x[13],ROTATE(PLUS(x[12],x[15]), 9));
        x[14] = XOR(x[14],ROTATE(PLUS(x[13],x[12]),13));
        x[15] = XOR(x[15],ROTATE(PLUS(x[14],x[13]),18));

    for i in range( len( input ) ):
        x[i] = PLUS( x[i], input[i] )
    return little16_i32.pack( *x )

#--------------------------- 32-bit ops -------------------------------

def trunc32( w ):
    """ Return the bottom 32 bits of w as a Python int.
        This creates longs temporarily, but returns an int. """
    w = int( ( w & 0x7fffFFFF ) | -( w & 0x80000000 ) )
    assert type(w) == int
    return w


def add32( a, b ):
    """ Add two 32-bit words discarding carry above 32nd bit,
        and without creating a Python long.
        Timing shouldn't vary.
    """
    lo = ( a & 0xFFFF ) + ( b & 0xFFFF )
    hi = ( a >> 16 ) + ( b >> 16 ) + ( lo >> 16 )
    return ( -(hi & 0x8000) | ( hi & 0x7FFF ) ) << 16 | ( lo & 0xFFFF )


def rot32( w, nLeft ):
    """ Rotate 32-bit word left by nLeft or right by -nLeft
        without creating a Python long.
        Timing depends on nLeft but not on w.
    """
    nLeft &= 31  # which makes nLeft >= 0
    if nLeft == 0:
        return w

    # Note: now 1 <= nLeft <= 31.
    #     RRRsLLLLLL   There are nLeft RRR's, (31-nLeft) LLLLLL's,
    # =>  sLLLLLLRRR   and one s which becomes the sign bit.
    RRR = ( ( ( w >> 1 ) & 0x7fffFFFF ) >> ( 31 - nLeft ) )
    sLLLLLL = -( (1<<(31-nLeft)) & w ) | (0x7fffFFFF>>nLeft) & w
    return RRR | ( sLLLLLL << nLeft )


# --------------------------------- end -----------------------------------

ripemd128.py

""" 
ripemd128.py - A simple ripemd128 library in pure Python.

Supports both Python 2 (versions >= 2.6) and Python 3.

Usage:
    from ripemd128 import ripemd128
    digest = ripemd128(b"The quick brown fox jumps over the lazy dog")
    assert(digest == b"\x3f\xa9\xb5\x7f\x05\x3c\x05\x3f\xbe\x27\x35\xb2\x38\x0d\xb5\x96")
"""
      


import struct


# follows this description: http://homes.esat.kuleuven.be/~bosselae/ripemd/rmd128.txt

def f(j, x, y, z):
	assert(0 <= j and j < 64)
	if j < 16:
		return x ^ y ^ z
	elif j < 32:
		return (x & y) | (z & ~x)
	elif j < 48:
		return (x | (0xffffffff & ~y)) ^ z
	else:
		return (x & z) | (y & ~z)

def K(j):
	assert(0 <= j and j < 64)
	if j < 16:
		return 0x00000000
	elif j < 32:
		return 0x5a827999
	elif j < 48:
		return 0x6ed9eba1
	else:
		return 0x8f1bbcdc

def Kp(j):
	assert(0 <= j and j < 64)
	if j < 16:
		return 0x50a28be6
	elif j < 32:
		return 0x5c4dd124
	elif j < 48:
		return 0x6d703ef3
	else:
		return 0x00000000

def padandsplit(message):
	"""
	returns a two-dimensional array X[i][j] of 32-bit integers, where j ranges
	from 0 to 16.
	First pads the message to length in bytes is congruent to 56 (mod 64), 
	by first adding a byte 0x80, and then padding with 0x00 bytes until the
	message length is congruent to 56 (mod 64). Then adds the little-endian
	64-bit representation of the original length. Finally, splits the result
	up into 64-byte blocks, which are further parsed as 32-bit integers.
	"""
	origlen = len(message)
	padlength = 64 - ((origlen - 56) % 64) #minimum padding is 1!
	message += b"\x80"
	message += b"\x00" * (padlength - 1)
	message += struct.pack("<Q", origlen*8)
	assert(len(message) % 64 == 0)
	return [
	         [
	           struct.unpack("<L", message[i+j:i+j+4])[0]
	           for j in range(0, 64, 4)
	         ]
	         for i in range(0, len(message), 64)
	       ]


def add(*args):
	return sum(args) & 0xffffffff

def rol(s,x):
	assert(s < 32)
	return (x << s | x >> (32-s)) & 0xffffffff

r =  [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,10,11,12,13,14,15,
       7, 4,13, 1,10, 6,15, 3,12, 0, 9, 5, 2,14,11, 8,
       3,10,14, 4, 9,15, 8, 1, 2, 7, 0, 6,13,11, 5,12,
       1, 9,11,10, 0, 8,12, 4,13, 3, 7,15,14, 5, 6, 2]
rp = [ 5,14, 7, 0, 9, 2,11, 4,13, 6,15, 8, 1,10, 3,12,
       6,11, 3, 7, 0,13, 5,10,14,15, 8,12, 4, 9, 1, 2,
      15, 5, 1, 3, 7,14, 6, 9,11, 8,12, 2,10, 0, 4,13,
       8, 6, 4, 1, 3,11,15, 0, 5,12, 2,13, 9, 7,10,14]
s =  [11,14,15,12, 5, 8, 7, 9,11,13,14,15, 6, 7, 9, 8,
       7, 6, 8,13,11, 9, 7,15, 7,12,15, 9,11, 7,13,12,
      11,13, 6, 7,14, 9,13,15,14, 8,13, 6, 5,12, 7, 5,
      11,12,14,15,14,15, 9, 8, 9,14, 5, 6, 8, 6, 5,12]
sp = [ 8, 9, 9,11,13,15,15, 5, 7, 7, 8,11,14,14,12, 6,
       9,13,15, 7,12, 8, 9,11, 7, 7,12, 7, 6,15,13,11,
       9, 7,15,11, 8, 6, 6,14,12,13, 5,14,13,13, 7, 5,
      15, 5, 8,11,14,14, 6,14, 6, 9,12, 9,12, 5,15, 8]


def ripemd128(message):
	h0 = 0x67452301
	h1 = 0xefcdab89
	h2 = 0x98badcfe
	h3 = 0x10325476
	X = padandsplit(message)
	for i in range(len(X)):
		(A,B,C,D) = (h0,h1,h2,h3)
		(Ap,Bp,Cp,Dp) = (h0,h1,h2,h3)
		for j in range(64):
			T = rol(s[j], add(A, f(j,B,C,D), X[i][r[j]], K(j)))
			(A,D,C,B) = (D,C,B,T)
			T = rol(sp[j], add(Ap, f(63-j,Bp,Cp,Dp), X[i][rp[j]], Kp(j)))
			(Ap,Dp,Cp,Bp)=(Dp,Cp,Bp,T)
		T = add(h1,C,Dp)
		h1 = add(h2,D,Ap)
		h2 = add(h3,A,Bp)
		h3 = add(h0,B,Cp)
		h0 = T
	
	
	return struct.pack("<LLLL",h0,h1,h2,h3)

def hexstr(bstr):
	return "".join("{0:02x}".format(b) for b in bstr)

Tags:

Python

Tuesday, May 10, 2016 | Python

文章评论

No comments posted yet.

发表评论

标题*: 给个方向吧.
姓名 *: 怎么称呼您？
Email
网站地址
评论内容 *: 写上些您的评论吧.; Remember Me?

Please add 3 and 2 and type the answer here:

Enter the code shown above:

LC's Blog

随记