Bug 1: Python and Operating Systems
This week I've run into two bugs of note. The first comes from a classmate of
mine. His Python scraping script terminated with an out-of-memory error. He
determined the error came from a call to the subprocess module, specifically
subprocess.check_output.
From this StackOverflow post we find a comment:
The issue is due to os.fork() and the standard solution is a fork-server (a dedicated process that spawns other processes -- multiprocessing-based instead of sh-based as in your question). -- J.F. Sebastian
My initial reaction is to use vfork() since it is designed to be used
immediately before an exec(). I imperfectly recall that vfork() makes use of
its parent process's memory until it reaches the exec().
But a fork-server is an entirely new concept to me. The reason the script ran
out of memory is that fork() (which is used in the underlying implementation of
subprocess) first creates a copy of the calling process before exec'ing the
command. This means that memory usage, at least as the kernel accounts for it,
temporarily doubles. This isn't always so bad, since the new process requires
some base amount of memory anyway, but all my attempts to quickly find a figure
for that baseline failed. Obviously it depends on the operating system. Anyway,
back to the bug!
Clicking through gives us this:
As a general rule (i.e. in vanilla kernels), fork/clone failures with ENOMEM occur specifically because of either an honest to God out-of-memory condition or because security_vm_enough_memory_mm failed you while enforcing the overcommit policy.
The answer mentions 'overcommit policies'. I had to look it up.
overcommit policy - Overcommit refers to the practice of giving out virtual memory with no guarantee that physical storage for it exists.
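To check which policy a Linux machine is using, the kernel exposes it at /proc/sys/vm/overcommit_memory (0 is heuristic overcommit, 1 always overcommits, 2 is strict accounting). A minimal sketch, assuming a Linux host; the function names are my own and purely illustrative:

```python
from pathlib import Path

# Meaning of the values found in /proc/sys/vm/overcommit_memory on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (kernel refuses obviously excessive requests)",
    1: "always overcommit (never refuse)",
    2: "strict accounting (commit limit enforced)",
}


def describe_overcommit(raw_value):
    """Translate the raw file contents (e.g. '0\n') into a description."""
    mode = int(raw_value.strip())
    return OVERCOMMIT_MODES.get(mode, f"unknown mode {mode}")


def read_overcommit_policy(path="/proc/sys/vm/overcommit_memory"):
    """Return a description of the current policy, or None if unavailable."""
    proc_file = Path(path)
    if not proc_file.exists():
        return None  # not Linux, or /proc is not mounted
    return describe_overcommit(proc_file.read_text())


if __name__ == "__main__":
    print(read_overcommit_policy())
```

If this reports strict accounting, a fork() from a large Python process is much more likely to be refused with ENOMEM, which matches the quote above.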
Here is some reading on overcommit.
Ah, here we see my beloved vfork(), as well as posix_spawn, which was
heretofore unknown to me. And here's the relevant part of the answer:
consider using subprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop parallel to your script; poll the script's output or read it synchronously, possibly from a separate thread if you have other stuff to take care of asynchronously -- do your data crunching in Python but leave the forking to the subordinate process.
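The idea in that answer can be sketched roughly as follows. This is my own illustration, not code from the answer: CommandHelper and the end-of-output marker are hypothetical names, and it assumes a POSIX system with /bin/sh. The helper shell is started once, while the Python process is still small, and every later command is forked by the shell rather than by Python.

```python
import subprocess


class CommandHelper:
    """A long-lived shell, started once, that runs commands on our behalf."""

    # Hypothetical sentinel line used to detect the end of a command's output.
    MARKER = "__END_OF_COMMAND_OUTPUT__"

    def __init__(self):
        # The only fork from Python happens here, ideally at script startup.
        self._shell = subprocess.Popen(
            ["/bin/sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def run(self, command):
        """Send one command to the shell and return its stdout as a string.

        Limitations of this sketch: output must end with a newline, stderr is
        not captured, and the command must not read from stdin.
        """
        self._shell.stdin.write(f"{command}\necho {self.MARKER}\n")
        self._shell.stdin.flush()
        lines = []
        while True:
            line = self._shell.stdout.readline()
            if line == "" or line.rstrip("\n") == self.MARKER:
                break  # EOF (shell died) or end of this command's output
            lines.append(line)
        return "".join(lines)

    def close(self):
        """Shut the helper shell down cleanly."""
        self._shell.stdin.close()
        self._shell.stdout.close()
        self._shell.wait()


if __name__ == "__main__":
    helper = CommandHelper()   # create this before loading large data
    print(helper.run("echo hello from the helper"))
    helper.close()
```

The design choice is simply that the forking process stays tiny, so its temporary duplication costs almost nothing regardless of how much memory the scraper later holds.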
The third StackOverflow post I referenced recommends a similar solution.
More specifically, Eric Angell suggests the rfoo library for RPCs. I've never
used this, and expected there to be an official Python standard library
implementation. Again, something else to research.
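On the posix_spawn front, Python 3.8+ exposes it on Unix as os.posix_spawn and os.posix_spawnp. Here is a rough sketch of capturing a command's output that way, assuming Python 3.9+ (for os.waitstatus_to_exitcode). spawn_and_capture is a made-up helper name, and I haven't measured whether this actually avoids the ENOMEM failure on any particular system.

```python
import os


def spawn_and_capture(argv):
    """Run argv with posix_spawnp and return (exit_code, stdout_bytes)."""
    read_fd, write_fd = os.pipe()
    try:
        # In the child, make the pipe's write end its stdout (fd 1).
        file_actions = [(os.POSIX_SPAWN_DUP2, write_fd, 1)]
        pid = os.posix_spawnp(argv[0], argv, os.environ,
                              file_actions=file_actions)
    except BaseException:
        os.close(read_fd)
        raise
    finally:
        # The parent never writes; closing our copy lets read() reach EOF.
        os.close(write_fd)
    with os.fdopen(read_fd, "rb") as reader:
        output = reader.read()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status), output


if __name__ == "__main__":
    code, out = spawn_and_capture(["echo", "hello"])
    print(code, out)
```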
Bug 2: Angular $http.get Exception
While this bug is less interesting, I write about it here in case any poor soul goes through the same trials as I did. The lesson is this: DO NOT TEST CODE IN YOUR OWN BROWSER. Instead you should fire up a fresh, clean, plugin-free browser when testing JS code. Adblockers and Privacy Badger will intercept cross-site GET requests they deem insecure. Granted, I probably shouldn't have written the code I did, but I was merely testing a feature.
If you get an exception (not an HTTP error status), and the Chrome/Firefox network monitor doesn't show the network request, try disabling these plugins.