lxml infinite loop/memory leak - workaround for Deliverance
The CSSSelector implementation in lxml (v2.0 to v2.2.2) has a bug that can consume all available memory, which can hang your server completely. Here I show a few workarounds that are useful until the bug is fixed.
A few weeks ago we had problems with our development server. After a few minutes of work the server hung completely: every attempt to connect via ssh ended with a timeout. We did a hard reset, but after a few minutes it hung again. After the next restart we booted into debug mode (single user mode in Linux) and started the services one by one. To our surprise, this investigation showed that after a few minutes of work Deliverance consumed all available memory (4 GB of RAM and 6 GB of swap).
Further investigation showed that the culprit was a CSS selector with a typo.
In Deliverance v0.3 (trunk version), rules contain pointers to HTML elements. A pointer consists of two parts, the type of selection (e.g. children or attributes) and the selector (a CSS selector or an XPath expression), separated by a colon:
<replace content="children:#content-wrapper" theme="children:#content" />
Let's look at an example:
attributes(href):/html/body/a
Deliverance will use XPath to select the a tag in the body of the document. But if the developer forgets the colon (:):
attributes(href)/html/body/a
this pointer will be parsed by lxml as a CSS selector. Of course the selector is not valid, but lxml (versions 2.0 to 2.2.2) cannot handle it: it parses it in an infinite loop and consumes more and more memory. The issue has been submitted to the lxml bug tracker.
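Since the whole problem starts with a missing colon, a small check run over the rule pointers before they reach lxml can catch the typo early. The following is only a sketch: the function name looks_like_missing_colon and the KNOWN_TYPES list are assumptions made for illustration (this article only names children and attributes), and it is a heuristic, not part of Deliverance.

```python
import re

# Selection types assumed for illustration; only "children" and
# "attributes" are named in this article.
KNOWN_TYPES = ("elements", "children", "attributes", "tag")

# A type name, optionally followed by arguments such as "(href)".
_TYPE_PREFIX = re.compile(r"(?:%s)(?:\([^)]*\))?" % "|".join(KNOWN_TYPES))


def looks_like_missing_colon(pointer):
    """Return True if the pointer starts with a selection type that is
    not followed by the ':' separator (hypothetical helper)."""
    pointer = pointer.strip()
    match = _TYPE_PREFIX.match(pointer)
    if match is None:
        # No type prefix at all, e.g. a bare selector such as "#content".
        return False
    rest = pointer[match.end():]
    if not rest or rest[0] == ":":
        return False
    # A following letter, digit, "_" or "-" means the type name was just
    # the beginning of a longer word, so do not flag it.
    return not (rest[0].isalnum() or rest[0] in "_-")
```

With the two pointers from the example above, looks_like_missing_colon("attributes(href)/html/body/a") returns True, while the correct attributes(href):/html/body/a is not flagged.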
Workarounds
Deliverance uses no more than 512 MB of memory in normal operation, so the environment should not allow it to use more than that.
1. Memory limit for user
Run deliverance-proxy as a separate user and set system limits for that user. More information is in the manual pages on any Linux system:
man limits.conf
If Deliverance eats too much memory, Python will raise MemoryError.
2. Memory limit for process
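Configure limits only for processes started from the command line. Create a simple bash script:
#!/bin/bash
# Cap virtual memory at 507200 KB for this shell and everything it starts
ulimit -v 507200
ulimit -H -v 507200
# Pass all arguments through unchanged
bin/deliverance-proxy "$@"
With this script Deliverance will also stop with MemoryError.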
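The same limit can be applied from Python instead of bash. The sketch below uses the standard resource module to set RLIMIT_AS (the limit that ulimit -v controls) in the child process just before it starts. The function names are made up for this example, and it assumes a Linux system where RLIMIT_AS is enforced.

```python
import resource
import subprocess


def _cap_address_space(limit_kib):
    """Runs in the child just before exec: cap its virtual memory."""
    limit = limit_kib * 1024  # ulimit -v counts KiB, setrlimit counts bytes
    _soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # An unprivileged process cannot raise an existing hard limit.
        limit = min(limit, hard)
    resource.setrlimit(resource.RLIMIT_AS, (limit, limit))


def run_with_memory_limit(argv, limit_kib=507200, **kwargs):
    """Run argv as a child process with a capped address space and
    return the subprocess.CompletedProcess (hypothetical helper)."""
    return subprocess.run(
        argv,
        preexec_fn=lambda: _cap_address_space(limit_kib),
        **kwargs
    )
```

A wrapper script could then call run_with_memory_limit(["bin/deliverance-proxy"] + sys.argv[1:]). The limit applies only to the child, so the calling process is not affected.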
3. Use memmon from supervisor
Superlance, a plugin package for supervisor, includes memmon, which restarts a process if it uses too much memory. Configuration is simple with buildout:
[buildout]
extensions = gp.vcsdevelop
vcs-extend-develop = svn+http://codespeak.net/svn/z3/deliverance/trunk/#egg=deliverance
develop-dir = src
parts =
    deliverance
    supervisor
eggs =
    deliverance
[deliverance]
recipe = repoze.recipe.egg:scripts
scripts = deliverance-proxy
eggs = ${buildout:eggs}
[supervisor]
recipe = collective.recipe.supervisor
plugins = superlance
port = 8000
user = admin
password = :)
serverurl = http://127.0.0.1:8000
programs =
    10 deliverance ${buildout:bin-directory}/deliverance-proxy [${buildout:directory}/src/theme/rules.xml] true
eventlisteners =
    Memmon TICK_60 ${buildout:bin-directory}/memmon [-p deliverance=500MB]
This builds deliverance-proxy controlled by supervisor and monitored for excessive memory use.
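To make the idea behind the memory monitor concrete, here is a rough sketch of the kind of check it performs on every tick: compare the resident memory of a process with a limit such as 500MB. This is not memmon's actual code. The function names are invented for this example, the size suffixes are assumed to be multiples of 1024, and reading /proc works on Linux only.

```python
# Size suffixes, assumed here to be multiples of 1024.
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_size(text):
    """Convert a string such as '500MB' into a number of bytes."""
    text = text.strip().upper()
    for suffix, factor in _UNITS.items():
        if text.endswith(suffix):
            return int(text[:-len(suffix)]) * factor
    return int(text)  # no suffix: plain bytes


def rss_bytes(pid):
    """Resident set size of a process, read from /proc (Linux only)."""
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            if line.startswith("VmRSS:"):
                return int(line.split()[1]) * 1024  # value is in kB
    return 0


def over_limit(pid, limit="500MB"):
    """True if the process uses more resident memory than the limit."""
    return rss_bytes(pid) > parse_size(limit)
```

In the configuration above this check is supervisor's job: on every TICK_60 event memmon looks at the deliverance process and restarts it when the limit is exceeded.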

