Bug 1: Python and Operating Systems

This week I've run into two bugs of note. The first comes from a classmate of mine. His Python scraping script terminated with an out-of-memory error, which he traced to a call into the subprocess module, specifically subprocess.check_output.

A comment on this Stack Overflow post explains:

The issue is due to os.fork() and the standard solution is a fork-server (a dedicated process that spawns other processes -- multiprocessing-based instead of sh-based as in your question). -- J.F. Sebastian

My initial reaction was to reach for vfork(), since it is designed to be called immediately before an exec(). If I recall correctly, the child created by vfork() borrows its parent's memory, with the parent suspended, until the child calls exec() or exits.

But a fork-server is an entirely new concept to me. As for why the error happens in the first place: fork() (which is used in the underlying implementation of subprocess) first creates a copy of the calling process, and only then is that copy replaced by the requested command. For a moment the system has to account for two processes the size of the original, so the memory commitment temporarily doubles, even though copy-on-write means most pages are never physically duplicated. This isn't always so bad, since the new process needs some base amount of memory anyway, but I couldn't quickly find a figure for that baseline. It presumably depends on the operating system. Anyway, back to the bug!
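For reference, the fork-then-exec pattern described above looks roughly like this in Python. This is a simplified sketch for illustration only: the real subprocess implementation lives in C and does a good deal more (pipes, error reporting, signal handling), and run_via_fork_exec is a name made up here. It is Unix-only.

```python
import os


def run_via_fork_exec(argv):
    """Run argv in a child process using a bare fork() followed by exec().

    Hypothetical helper for illustration; not how subprocess is written.
    """
    pid = os.fork()  # the child starts life as a copy of this whole process
    if pid == 0:
        # Child: replace the copied process image with the requested program.
        try:
            os.execvp(argv[0], argv)
        except OSError:
            pass
        # Only reached if exec failed; 127 mirrors the shell's
        # "command not found" convention.
        os._exit(127)
    # Parent: wait for the child and translate its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)
```

The os.fork() call is the step that can fail with ENOMEM when the parent is large, before the small command ever gets a chance to run.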

Clicking through gives us this:

As a general rule (i.e. in vanilla kernels), fork/clone failures with ENOMEM occur specifically because of either an honest to God out-of-memory condition or because security_vm_enough_memory_mm failed you while enforcing the overcommit policy.

The answer mentions 'overcommit policies'. I had to look it up.

overcommit policy - Overcommit refers to the practice of giving out virtual memory with no guarantee that physical storage for it exists.

Here is some reading on overcommit.
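On Linux, the active policy can be read from /proc/sys/vm/overcommit_memory, where 0, 1 and 2 select the heuristic, always-overcommit and strict-accounting modes respectively. Here is a small sketch for checking it from Python; the function names are my own, and on other operating systems the file simply won't exist.

```python
from pathlib import Path

# Meanings of the values in /proc/sys/vm/overcommit_memory on Linux.
OVERCOMMIT_MODES = {
    0: "heuristic overcommit",
    1: "always overcommit",
    2: "strict accounting (no overcommit)",
}


def describe_overcommit(raw_value):
    """Translate the raw file contents (e.g. '0\\n') into a description."""
    return OVERCOMMIT_MODES.get(int(raw_value.strip()), "unknown mode")


def current_overcommit_mode(path="/proc/sys/vm/overcommit_memory"):
    """Return a description of the current policy, or None if unavailable."""
    try:
        return describe_overcommit(Path(path).read_text())
    except OSError:
        return None  # not Linux, or /proc is not readable
```

If this reports strict accounting, a fork() from a large process is much more likely to be refused, which would fit the failure described in the quoted answer.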

Ah, here we see my beloved vfork(), as well as posix_spawn(), which was new to me. Here's the relevant part of the answer:

consider using subprocess.Popen only once, at the beginning of your script (when Python's memory footprint is minimal), to spawn a shell script that then runs free/ps/sleep and whatever else in a loop parallel to your script; poll the script's output or read it synchronously, possibly from a separate thread if you have other stuff to take care of asynchronously -- do your data crunching in Python but leave the forking to the subordinate process.
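A minimal sketch of that idea, assuming a POSIX /bin/sh is available: start one long-lived shell while the Python process is still small, then feed it commands over a pipe so that the shell, not Python, does the forking. CommandHelper and its marker protocol are my own invention for illustration, not something taken from the answer.

```python
import subprocess
import uuid


class CommandHelper:
    """One long-lived shell, started early; it forks the later commands."""

    def __init__(self):
        # The only fork of the Python process happens here, once.
        self._proc = subprocess.Popen(
            ["/bin/sh"],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def run(self, command):
        """Run a shell command and return its standard output as a string."""
        # A unique marker line tells us where this command's output ends.
        marker = "END-" + uuid.uuid4().hex
        self._proc.stdin.write(f"{command}\nprintf '\\n%s\\n' {marker}\n")
        self._proc.stdin.flush()
        lines = []
        while True:
            line = self._proc.stdout.readline()
            if line == "" or line.rstrip("\n") == marker:
                break  # shell exited, or end of this command's output
            lines.append(line)
        output = "".join(lines)
        # Drop the extra newline that printf emitted before the marker.
        return output[:-1] if output.endswith("\n") else output

    def close(self):
        """Shut the shell down by closing its stdin and waiting for it."""
        self._proc.stdin.close()
        self._proc.wait()
        self._proc.stdout.close()
```

The helper would be created at the very top of the script, before any large data is loaded, and something like helper.run("free -m") called later in place of subprocess.check_output. This sketch ignores stderr and exit codes, and a command that reads from stdin would swallow the commands queued behind it, so treat it as a starting point only.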

The third Stack Overflow post I referenced recommends a similar solution. More specifically, Eric Angell suggests the rfoo library for RPCs. I've never used it, and I expected there to be an implementation in the official Python standard library. Again, something else to research.
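Circling back to posix_spawn: Python exposes it as os.posix_spawn and os.posix_spawnp (Python 3.8 and later, Unix-like systems only). Below is a short sketch of launching a command with it; spawn_and_wait is a made-up name. Whether this actually sidesteps the out-of-memory failure depends on how the platform's C library implements posix_spawn, and it has not been tried against the original bug.

```python
import os


def spawn_and_wait(argv):
    """Launch argv via posix_spawnp (searches PATH) and return its exit code."""
    pid = os.posix_spawnp(argv[0], argv, os.environ)
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    # Killed by a signal: report it as a negative number, as subprocess does.
    return -os.WTERMSIG(status)
```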

Bug 2: Angular $http.get Exception

While this bug is less interesting, I write about it here in case some poor soul goes through the same trials I did. The lesson is this: DO NOT TEST CODE IN YOUR EVERYDAY BROWSER. Instead, fire up a fresh, clean, plugin-free browser when testing JS code. Ad blockers and Privacy Badger will intercept cross-site GET requests they deem insecure. Granted, I probably shouldn't have written the code I did, but I was merely testing a feature.

If you get an exception (not an HTTP error status) and the Chrome/Firefox network monitor doesn't show the request at all, try disabling these plugins.