对waterflow问题的数学描述和函数编程解决方案

原文: http://philipnilsson.github.io/Badness10k/posts/2013-10-11-functional-waterflow.html

waterflow 问题

假设如下图(1),我们在不同的x轴的每一个点都有一个不同的y轴高度。

图1

该图可以用一个数组表示,然后数组里的每一个值表示的Y的高度,而数组的序列号(index)则表示为图中
的X轴。

比如上图可以用数组表示为: [2,5,1,2,3,4,7,7,6]

现在假设从上面(Y轴)向下下雨(如图2),而且雨从来也不用考虑会停,那么上图在不同高度的Y轴之间最
多可以收集多少雨水?

图2

另外,我们假设1升为(X轴的单位)X(Y轴的单位),即1x1为1升。现在如图2中,最左边的因为没有东西
挡住,所以雨水会洒出,最右边的也是一样。

数学描述与答案

请去上面的原文里查看。另外该作者也有另外几篇关于函数式编程的文章,也非常不错,放在附录里了。

附录

最后

最近迷上了函数式编程(functional programming),不知道为什么?!

Dynamic Python Class Generation

作者是Kyle Knapp,是Amazon的软件工程师。

什么是python的动态类生成?

  • Generate classes at runtime
  • data driven

简单示例

def hello_world(obj):
print "Hello world from %s" % obj

class BaseClass(object):
def __init__(self, instance_name):
self.instance_name = instance_name

def __repr__(self):
return self.instance_name

my_class = type("MyClass", (BaseClass,), {"hello_world": hello_world})
my_class_inst = my_class("My Instance")

Why dynamic class generation

  • Improve workflow
  • Improve visibility
  • Reduce physical size
  • Is production-level

Downside

  • traceback (类似于javascript中的匿名函数,难以调试)
  • IDE support

Production level demo: AWS SDK for Python

  • dynamic and data driven
  • efficient workflow

When to conside dynamic class generation

  • existing canonical data
    • web API
    • database

An application demo

  • client interface
  • 1:1 mapping of methods of API
  • model driven
  • validate inputs
  • parse response
  • JSON RPC

Steps

  • 1) make a simple JSON RPC client
  • 2) integrate API models
  • 3) add specific API methods
  • 4) make the client extensible
  • 5) add more APIs
JSON API overview
  • reply on HTTP POST
  • Signle URI
  • JSON in body of request and response
import json
import requests

class Client(object):
def __init__(self, end_point_url):
self._end_point_url = end_point_url

def make_api_call(self, method, params):
headers = {'content-type': 'application/json'}
payload = {
'jsonrpc': '2.0',
'method': 'method',
'params': params,
'id': 0
}

return requests.post(
self._end_point_url, data=json.dumps(payload), headers=headers
).json()
expanded
  • feels very general
  • no input validation
  • return the entire response
integrate model into client
class ModeledClient(Client):
def __init__(self, model):
self._model = model
self._validator = RequestParamsValidator()
self._parser = ResponseParamsParser()
super(ModeledClient, self).__init__(model['endpoint_url'])

def make_modeled_api_call(self, method, *args, **kwargs):
# Validate the parameters provided for the particular method using the
# provided model.
validated_params = self._validate_method_params(
method, *args, **kwargs)
# Make the API call and get the response.
api_response = self.make_api_call(method, validated_params)
# Parse the response for the particular method using the provided
# model.
return self._parse_api_response(method, api_response)

def _validate_method_params(self, method, *args, **kwargs):
operation_input_model = self._get_operation_model(method)['input']
return self._validator.validate(operation_input_model, *args, **kwargs)

def _parse_api_response(self, method, api_response):
operation_output_model = self._get_operation_model(method)['output']
return self._parser.parse(
operation_output_model, api_response)

def _get_operation_model(self, method):
operation_models = self._model['operations']
if method not in operation_models:
raise RuntimeError(
'Unknown operation %s' % method)
return operation_models[method]
More expanded
  • API remains undocumented
Add specific method

create a class factory:

def create_client(model):
cls_name = 'MyClient'
cls_bases = (ModeledClient,)
cls_props = {}

for operation_name, operation_model in model['operations'].items():
method = _get_client_method(operation_name)
cls_props[operation_name] = method

cls = type(cls_name, cls_bases, cls_props)
return cls(model)

def _get_client_method(operation_name):
def _api_call(self, *args, **kwargs):
return self.make_modeled_api_call(
operation_name, *args, **kwargs)
return _api_call
Expand class factory for specific docstring
def create_client(model):
cls_name = 'MyClient'
cls_bases = (ModeledClient,)
cls_props = {}

for operation_name, operation_model in model['operations'].items():
method = _get_client_method(operation_name)
method.__name__ = str(operation_name)
method.__doc__ = _get_docstring(operation_model)
cls_props[operation_name] = method

cls = type(cls_name, cls_bases, cls_props)
return cls(model)
Expand to support custom class name and custom inheritance
def create_client(model, cls_name='MyClient', cls_bases=None):
if not cls_bases:
cls_bases = (ModeledClient,)
cls_props = {}

for operation_name, operation_model in model['operations'].items():
method = _get_client_method(operation_name)
method.__name__ = str(operation_name)
method.__doc__ = _get_docstring(operation_model)
cls_props[operation_name] = method

cls = type(cls_name, cls_bases, cls_props)
return cls(model)
class CachedClient(ModeledClient):
def __init__(self, model):
super(CachedClient, self).__init__(model)
self._operation_cache = {}

@property
def history(self):
return self._operation_cache

def make_modeled_api_call(self, method, *args, **kwargs):
cache_key = '%s(args=%s,kwargs=%s)' % (method, args, kwargs)
if cache_key in self._operation_cache:
print('Retrieving result from history')
return self._operation_cache[cache_key]
else:
print('Retrieving result from server')
result = super(CachedClient, self).make_modeled_api_call(
method, *args, **kwargs)
self._operation_cache[cache_key] = result
return result

Presentation and Source Code at: https://github.com/kyleknap/dynamic-web-api-clients

题外话:类似的codegen技术,还有最近正火的swagger:https://github.com/swagger-api

Python and ByteCode

今天在europython 2016看到的一个好看的印度人的演讲,觉得很有意思,记录下来。

作者:Anjana Valill (vakila.github.io)

问题:Why does python code run faster in a function? (http://stackoverflow.com/questions/11241523/why-does-python-code-run-faster-in-a-function)

1 # outside_fn.py                     # inside_fn.py
for i in range(10**8): def run_loop():
i for i in range(10**8):
i
$ time python3 outside_fn.py $ time python3 inside_fun.py
real 0m9.185s real 0m5.738s
user 0m9.104s user 0m5.634s
sys 0m0.048s sys 0m0.055s

The awesome stuff your program does

source code
|
~
compiler
=> parse tree > abstract syntax tree > control flow graph =>
|
~
byte code
|
~
interpreter
virtual machine performs operations on a stack of objects

What is byte code?

An intermediate representation of your program.

What the interpreter “sees” when it run your program

Machine code for a virtual machine (the interpreter)

A series of instructions for stack operations.

Cached as .pyc file.

How can we read it

dis: bytecode disassembler

>>> def hello():
return 'hello'

>>> import dis
>>> dis.dis(hello)

2 0 LOAD_CONST 1 ('hello')
3 RETURN_VALUE

What does it all mean

                                              argument value  
|
arg index |
| |
operation name | |
| | |
line # | | |
| | | |
| offset | | |
| | | | |
~ ~ ~ ~ ~
2 0 LOAD_CONST 1 ('hello')

Sample Operations

LOAD_CONST(c)           pushes c onto top of stack (TOS)
BINARY_ADD pops & adds top 2 items, result becomes TOS
CALL_FUNCTIONS(a) call function with arguments from stack
a indicates # of positional & keyword args

What can we dis

functions:

In [1]: def add(spam, eggs):
...: return spam + eggs
...:

In [3]: import dis

In [4]: dis.dis(add)
2 0 LOAD_FAST 0 (spam)
3 LOAD_FAST 1 (eggs)
6 BINARY_ADD
7 RETURN_VALUE

classes:

In [5]: class Parrot:
...: def __init__(self):
...: self.kind = "Norwegian Blue"
...: def is_dead(self):
...: return True
...:

In [6]: dis.dis(Parrot)
Disassembly of __init__:
3 0 LOAD_CONST 1 ('Norwegian Blue')
3 LOAD_FAST 0 (self)
6 STORE_ATTR 0 (kind)
9 LOAD_CONST 0 (None)
12 RETURN_VALUE

Disassembly of is_dead:
5 0 LOAD_GLOBAL 0 (True)
3 RETURN_VALUE

code string (3.2+)

>>> import dis
>>> dis.dis("spam, eggs = 'spam', 'eggs'")
1 0 LOAD_CONST 3 (('spam', 'eggs'))
3 UNPACK_SEQUENCE 2
6 STORE_NAME 0 (spam)
9 STORE_NAME 1 (eggs)
12 LOAD_CONST 2 (None)
15 RETURN_VALUE

module

$ python -m dis knights.py
1 0 LOAD_CONST 0 ('print("Ni!")')
3 STORE_NAME 0 (__doc__)
6 LOAD_CONST 1 (None)
9 RETURN_VALUE
# knights.py
print("Ni!")
def is_fresh_wound():
return True

>>> import knights
Ni!
>>> import dis
>>> dis.dis(knights)
Disassembly of is_fresh_wound:
3 0 LOAD_GLOBAL 0 (True)
3 RETURN_VALUE

last traceback

>>> print(spam)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'spam' is not defined
>>> import dis
>>> dis.dis()
1 0 LOAD_NAME 0 (print)
--> 3 LOAD_NAME 1 (spam)
6 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
9 PRINT_EXPR
10 LOAD_CONST 0 (None)
13 RETURN_VALUE

Why do we care

debugging

>>> ham/eggs + ham/spam
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'ham' is not defined
>>> dis.dis()
1 --> 0 LOAD_NAME 0 (ham)
3 LOAD_NAME 1 (eggs)
6 BINARY_TRUE_DIVIDE
7 LOAD_NAME 0 (ham)
10 LOAD_NAME 2 (spam)
13 BINARY_TRUE_DIVIDE
14 BINARY_ADD
15 PRINT_EXPR
16 LOAD_CONST 0 (None)
19 RETURN_VALUE

Solving puzzles

outside_fn.py

>>> outside = open('outside_fn.py').read()
>>> import dis
>>> dis.dis(outside)
1 0 SETUP_LOOP 24 (to 27)
3 LOAD_NAME 0 (range)
6 LOAD_CONST 3 (100000000)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 GET_ITER
>> 13 FOR_ITER 10 (to 26)
16 STORE_NAME 1 (i)

2 19 LOAD_NAME 1 (i)
22 POP_TOP
23 JUMP_ABSOLUTE 13
>> 26 POP_BLOCK
>> 27 LOAD_CONST 2 (None)
30 RETURN_VALUE

inside_fn.py

>>> import dis
>>> inside = open('inside_fn.py').read()
>>> dis.dis(inside)
1 0 LOAD_CONST 0 (<code object run_loop at 0x010F6DE0, file "<dis>", line 1>)
3 LOAD_CONST 1 ('run_loop')
6 MAKE_FUNCTION 0
9 STORE_NAME 0 (run_loop)

4 12 LOAD_NAME 0 (run_loop)
15 CALL_FUNCTION 0 (0 positional, 0 keyword pair)
18 POP_TOP
19 LOAD_CONST 2 (None)
22 RETURN_VALUE

Investigate the bytecode diff

STORE_NAME(namei)

Implement name = TOS. namei is the index of name in the attribute co_names of
the code object.

LOAD_NAME(namei)

Pushes the value associated with co_names[namei] onto the stack.

STORE_FAST(var_num)

Stores TOS into the local co_varnames[var_num]

LOAD_FAST(var_num)

Push a reference to the local co_varnames[var_num] onto the stack

Digg the deep

ceval.c: the heart of the beast

A.Kaptur: “A 1500 (!!) line switch statement powers your Python”
(http:aksptur.com/talks/)

  • LOAD_FAST (#11368) is ~10 lines, invovles fast locals lookup
  • LOAD_NAME (#12353) is ~50 lines, invovles slow dict lookup
  • prediction (#11000) makes FOR_ITER + STORE_FAST even faster

Use Docker to automate testing

缘由:在测试中需要启动一些后台进程,但是后台进程启动的非常慢,严重影响了自动化测试。

问题:需要快速的启动或者关闭一些后台进程,必须redis、memcache等。

方案:使用docker这种轻量级的虚拟机(container)。


安装docker-py

pip install docker-py

定义docker

postgres = dict(image="postgres", 
prots=["5432:5432",],
environment={
'POSTGRES_USER': 'test',
'POSTGRES_PASS': 'test'
}
)

rabbit = dict(image="rabbitmq:3-management",
ports=["15672:15672", "5762:5762"]
)

container = [postgres, rabbit]

定义class level的setUp和tearDown

import docker
class DockerUnitTestExample(unittest.TestCase):
@classmethod
def initContainer(cls):
postgres = dict(image="postgres",
ports=["5432:5432",],
envrionment={
"POSTGRES_USER": "test",
"POSTGRES_PASSWORD": "test"
}
)

container = [postgres,]

for container in containers:
cls.dockClient.pull(container["image"])
cls.containers.append(cls.dockerClient.create_container(**container)
@classmethod
def runContainers(cls):
for container in cls.containers:
cls.dockerClient.start(container["Id"])

@classmethod
def stopContainers(cls):
for container in cls.containers:
cls.dockerClient.stop(container["Id"])

@classmethod
def setUpClass(cls):
cls.containers = []
cls.dockerClient = docker.Client()
cls.dockerClient.login('dockerhubuser', 'dockerhubuser')
cls.initContainers()
cls.runContainers()

@classmethod
def tearDownClass(cls):
cls.stopContainers()

参考: http://www.talperry.com/2015/10/03/python-unittests-with-docker/

Python functional programming aspect

名词解释

  • TCO (Tail Call Optimization): 尾调优化
  • CPS (Continuous Passing Style): 续传样式

TCO 库

  • tail call optimization
  • optimized tail recursion
  • continuation passing style
  • pseudo call-with-current-continuation

tco库的主要动机:

重复的调用function的时候,并不会增加执行栈(execution stack)的大小,并且在函数式编程中可以随时从一个执行栈的某一部分跳转到另外一部分之前的代码,同时并不需要穿越中间的函数调用和return。

第一个实例,二叉搜索树

假定我们有一个二叉树,现在我们需要在这个二叉树上搜索制定的节点(node)。

使用尾递归(tail recursion)

from itertools import groupby
from random import sample

nbrlevels = 16
n = 2 ** nbrlevels - 1
t = sample(range(2 * n), n)
t.sort()

def make_tree(l):
if len(l) == 1:
return [l[0], [], []]

return [l[0], make_tree(l[1:len(l)//2+1]), make_tree(l[len(l)//2+1:])]

tree = make_tree(t)

如果使用tco,则:

  • 递归调用自己,且不重载(overloading stack)
  • 在递归的内部不退出递归的情况下调用其他的函数

待续—

参考:
1) http://baruchel.github.io/python/2015/11/07/explaining-functional-aspects-in-python/
2) https://github.com/baruchel/tco
3) https://en.wikipedia.org/wiki/Continuation-passing_style
4) https://blogs.msdn.microsoft.com/ericlippert/2010/10/21/continuation-passing-style-revisited-part-one/
5) https://www.zhihu.com/question/24453254
6) http://blog.pengyifan.com/articles/functional-programming-for-the-rest-of-us/
7) http://blog.csdn.net/hikaliv/article/details/4548561
8) https://segmentfault.com/n/1330000005012969
9) http://baruchel.github.io/python/2015/07/10/continuation-passing-style-in-python/
10) https://github.com/baruchel/tco

Compare Image Pixel by Pixel by ImageMagick

第一个方案: 把要去掉的部分,写进去一个矩阵里,然后再逐像素比较时,跳过原本要去掉的部分。

http://stackoverflow.com/questions/28881471/comparing-images-in-which-some-parts-are-variable-but-should-not-matter

第二个方案:把要填充的部分,使用单一的颜色借助于ImageMagick的convert功能,生成一个新的
图片,这个图片保证在需要去除的部分,使用实色填充。