For a long time I’ve wanted to create a module that would shed light on how perl uses memory. This year I decided to do something about it.
My research and development didn’t yield much fruit in time for OSCON in July, where my talk ended up being about my research and plans. (I also tried to explain that RSS isn’t a useful measurement for this, and that malloc buffering means even total process size isn’t a very useful measurement.) I was invited to speak at YAPC::Asia in Tokyo in September and really wanted to have something worthwhile to demonstrate there.
I’m delighted to say that some frantic hacking (aka Conference Driven Development) yielded a working demo just in time and, after a little more polish, I’ve now uploaded Devel::SizeMe to CPAN.
In this post I want to introduce you to Devel::SizeMe, show some screenshots, a screencast of the talk and demo, and outline current issues and plans for future development.
For a while I thought Devel::NYTProf might be a useful framework for building some kind of “memory profiler”. Something that would measure changes in memory use over time between lines and subroutines. Nicholas Clark even created a clever experimental hack to demo the concept. Sadly the data just didn’t seem to be very useful. It turns out that knowing where memory is allocated and freed isn’t nearly as important as knowing where memory is being held.
It was clear that some kind of ‘snapshot’ mechanism was needed. Something that would:
- crawl all the data structures within a perl interpreter
- have some way of naming the path to each data structure
- stream the data out for external storage and processing
- be fast enough that snapshots could be taken frequently
- visualize the vast amount of data
- compare different snapshots
Luckily the hardest part, the first item on that list, was already covered by Devel::Size. Originally written by Dan Sugalski in 2005, then maintained by Tels and BrowserUK, it had been picked up and polished by Nicholas Clark to stay in sync with the many internal optimizations he and others were adding to the perl core. It’s not without problems, and I’ll outline those below, but it was a great base for me.
I added a callback mechanism, so my code and others could “hitch a ride” on the back of Devel::Size as it crawled the data structures, and came up with a very lightweight way to track and output the “name path”.
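To make the crawl-plus-name-path idea concrete, here’s a rough sketch in Python (the real code is C inside Devel::Size; the traversal, the record format, and names like AVelem here are invented for illustration):

```python
import sys

# Hypothetical sketch: crawl a nested structure depth-first, streaming
# (name-path, size) records, loosely like Devel::SizeMe's "name path".
def crawl(node, path, seen, emit):
    if id(node) in seen:                  # don't follow shared data twice
        return
    seen.add(id(node))
    emit(path, sys.getsizeof(node))       # shallow size of this node itself
    if isinstance(node, dict):            # hash-like: name links by key
        for key, value in node.items():
            crawl(value, path + ["HEelem(%s)" % key], seen, emit)
    elif isinstance(node, (list, tuple)): # array-like: name links by index
        for i, value in enumerate(node):
            crawl(value, path + ["AVelem[%d]" % i], seen, emit)

records = []
crawl([1, "hi", []], ["SV(PVAV)"], set(),
      lambda path, size: records.append(("/".join(path), size)))
for name, size in records:
    print(name, size)
```

The callback here is the `emit` argument: the crawler stays a pure traversal, and whatever “hitches a ride” decides how to record or stream each node.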
My initial code just wrote a tree-like textual representation to prove the concept:
$ SIZEME='' perl -MDevel::SizeMe=total_size -e 'total_size([ 1, "hi",  ])'
SV(PVAV) fill=2/2 [#1 @0]
: +24 sv_head =24
: +40 sv_body =64
: +24 av_max =88
: ~note av_len 2
: AVelem-> [#2 @1]
: : SV(RV) [#3 @2]
: : : +24 sv_head =112
: : : RV-> [#4 @3]
: : : : SV(PVAV) fill=-1/-1 [#5 @4]
: : : : : +24 sv_head =136
: : : : : +40 sv_body =176
: : ~note i 2
: AVelem-> [#6 @1]
: : SV(PV) [#7 @2]
: : : +24 sv_head =200
: : : +16 sv_body =216
: : : +16 SvLEN =232
: : ~note i 1
: AVelem-> [#8 @1]
: : SV(IV) [#9 @2]
: : : +24 sv_head =256
: : ~note i 0
There you can see the array (PVAV) ‘node’ with ‘leaf’ sizes for the sv_head (24 bytes), sv_body (40 bytes), and the array of element pointers (av_max, 24 bytes). Below that you can see a ‘link’ called AVelem pointing to a reference (RV) to an array with no elements. The “~note” lines are ‘attributes’ that can be used to provide extra information about nodes. The ‘=NNN‘ gives a running total of the accumulated size.
The terminology here (sv_head, sv_body, av_max etc.) might not be familiar to you unless you’ve spent time delving into perl guts. Hopefully, though, it’s clear that Devel::SizeMe gives access to immense detail.
That detail can quickly become overwhelming for non-trivial data structures. Some kind of visualization was needed. So I added a more compact ‘raw’ output format and a script (sizeme_store.pl) to process it. The script ‘decorates’ the nodes with the leaf and attribute data, gives the links better names, and adds extra details like the total size of the children.
$ SIZEME='|sizeme_store.pl --dot=sizeme.dot' perl -MDevel::SizeMe=total_size -e 'total_size([ 1, "hi",  ])'
The SIZEME env var gives the name of the file to write the raw data to, or in this case the name of a program to pipe the data into. Here I’m asking sizeme_store.pl to write a dot format file which, when rendered by Graphviz, produces a graph like this:
You can see the links have been labeled with the index attribute, and the nodes show how the size is calculated (self+children=total) and the sizes accumulate up the graph.
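The dot file itself is just text, so the decoration step is easy to sketch. Here’s a rough Python illustration of turning a (name, self-size, children) tree into Graphviz dot with “self+children=total” labels; this is not sizeme_store.pl’s actual code or schema, just the shape of the idea:

```python
# Hypothetical sketch: emit Graphviz dot from a size tree, labeling each
# node "self+children=total" as in the rendered graphs.
def to_dot(tree):
    lines = ["digraph sizeme {"]
    counter = [0]                      # simple unique-node-id generator
    def total(node):
        name, self_size, children = node
        return self_size + sum(total(c) for c in children)
    def walk(node, parent):
        name, self_size, children = node
        nid = "n%d" % counter[0]
        counter[0] += 1
        lines.append('%s [label="%s\\n%d+%d=%d"];' % (
            nid, name, self_size, total(node) - self_size, total(node)))
        if parent is not None:
            lines.append("%s -> %s;" % (parent, nid))
        for child in children:
            walk(child, nid)
    walk(tree, None)
    lines.append("}")
    return "\n".join(lines)

# Illustrative sizes only, not real Devel::SizeMe output.
tree = ("SV(PVAV)", 88, [("SV(IV)", 24, []), ("SV(PV)", 56, [])])
print(to_dot(tree))
```

Piping the result through `dot -Tpng` would render a small graph of the same style.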
That’s lovely, and works well for modestly sized data structures. It doesn’t scale well though. You quickly find yourself looking at diagrams like this:
$ SIZEME='|sizeme_store.pl --db=sizeme.db' perl -MDevel::SizeMe=total_size -e 'total_size([ 1, "hi",  ])'
That’s asking sizeme_store.pl to produce a sizeme.db file. Then, to visualize the data you can run sizeme_graph.pl to launch the web app:
$ sizeme_graph.pl --db=sizeme.db daemon
then visit http://127.0.0.1:3000/ to see the result:
The overall grey area, which has a title bar labeled “SV(PVAV)”, represents the total memory used by the structure. The area is divided into three parts for the three elements of the array. The smallest, labeled “-> SV(IV)”, is the integer. The next larger one, labeled “-> SV(PV)”, is the string. The largest area is the array reference. Because the referenced array was empty the logic in sizeme_graph.pl has ‘collapsed’ the array into the parent node to simplify the tree map. This is reflected in the label “-> SV(RV) RV-> SV(AV)”.
The darker box is a tooltip that moves with the pointer and displays extra detail about whatever node the pointer hovers over. In this case it’s showing that the total memory use is 88 bytes (the head and body sizes of the RV and the AV summed). The rest of the content is mostly debugging information. There’ll be more useful info here in future.
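The collapsing itself is easy to sketch. Roughly (and ignoring whatever conditions sizeme_graph.pl actually applies), a node whose size flows entirely through a single child can be merged into it, combining labels and sizes; the sizes below are chosen to match the 88-byte example, not taken from real output:

```python
# Hypothetical sketch of the treemap 'collapse' step: merge single-child
# chains like SV(RV) -> SV(AV) into one box with a combined label.
def collapse(node):
    name, self_size, children = node
    children = [collapse(c) for c in children]
    if len(children) == 1:
        cname, csize, cchildren = children[0]
        return ("%s %s" % (name, cname), self_size + csize, cchildren)
    return (name, self_size, children)

# RV head (24) plus the empty AV's head+body (64) gives the 88 bytes
# shown in the tooltip.
rv = ("-> SV(RV)", 24, [("RV-> SV(AV)", 64, [])])
print(collapse(rv))
```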
The Whole Picture
The total_size($ref) function dumps the contents of a particular data structure. But it’s not enough to get the whole picture. For that I wanted to be able to dump everything in a perl interpreter. Executing total_size(\%main::) gets closer to everything, but it’s still a long way off.
So I added a perl_size() function. That starts by dumping the stashes (\%main::, or in internals speak PL_defstash) but then goes on to dump many more items you might never have realized existed: PL_stashcache, PL_regex_padav, PL_encoding, PL_modglobal, and PL_parser, to name but a few. It then records the amount of unused space in perl’s arenas.
Finally it scans the arenas looking for any values that haven’t been seen yet. Currently this finds quite a lot because the perl_size() code isn’t complete yet. (Many thanks to rafl for helping improve the coverage here.) Once it’s complete, any unseen values found in the arenas will be leaks. So Devel::SizeMe may turn into a useful leak detection tool.
Taking this idea further, there’s also a heap_size() function. The goal here is to try to account for everything in the heap. (See my slides if you’re not familiar with that term.) The one key item here is asking malloc for information about how much memory it’s using and, especially, how much ‘free’ memory it’s holding on to, for mallocs that support that.
See It In Action
This explanation is rather dry. To get a real sense of what Devel::SizeMe can do you need to see it in action with some non-trivial data. Here’s a screencast of my Perl Memory Use talk at YAPC::Asia (also available as a raw mov here and here, m4v here and here, and mp4 here and here). The demonstration starts at 13:00.
Just four steps:
- cpanm Devel::SizeMe # install the module
- perl -d:SizeMe …your.script.here…
- sizeme_graph.pl daemon
- open http://127.0.0.1:3000/
Devel::SizeMe notices that it’s been run as perl -d:SizeMe and arranges to automatically call perl_size() in an END block. Simple.
There are two weaknesses in the current Devel::Size logic that affect Devel::SizeMe.
The first is that it uses a simple depth-first search. That’s fine when just calculating a total, but for Devel::SizeMe it means that chasing references held by one named item, like a subroutine, can lead to all sorts of other items, including entire stashes, appearing to be “within” the item that held the reference. The second is that Devel::Size doesn’t have a well-defined sense of when to stop chasing references, because it doesn’t consider reference counts.
So I plan to add a multi-phase search mechanism. References with a count of 1 will be followed immediately. References with a count greater than one will be queued, along with a count of how many times the reference has been seen so far. In this way all the ‘named’ data reachable from %main:: will be found first and identified with their natural names before the queued items are crawled. This should greatly improve the output.
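The planned two-phase walk can be sketched like this in Python (a simplification of the plan: it defers shared references to a queue but doesn’t track the seen-count; all names and the refcount/children tables are invented for illustration):

```python
from collections import deque

# Hypothetical sketch of the multi-phase search: follow refcount-1
# references immediately, queue multiply-referenced ones, and only
# crawl the queue once every 'named' root has claimed its own data.
def multi_phase_crawl(roots, refcount, children_of):
    named, queued, seen = {}, deque(), set()
    def visit(node, path):
        if node in seen:
            return
        seen.add(node)
        named[node] = path                      # first claimant names it
        for child in children_of.get(node, []):
            if refcount.get(child, 1) > 1:
                queued.append((child, path + [child]))  # defer shared data
            else:
                visit(child, path + [child])
    for root in roots:                          # phase one: named roots
        visit(root, [root])
    while queued:                               # phase two: shared refs
        node, path = queued.popleft()
        visit(node, path)
    return named

# 'shared' is referenced from both roots (refcount 2), so it is only
# named after phase one, under the first root that queued it.
children_of = {"main::a": ["leaf", "shared"], "main::b": ["shared"]}
print(multi_phase_crawl(["main::a", "main::b"], {"shared": 2}, children_of))
```

A plain depth-first search would have buried `shared` inside whichever structure happened to reach it first, even mid-way through some unrelated subroutine’s data; deferring it keeps the natural names.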
More coverage is needed in perl_size() to reduce the number of ‘unseen’ items that show up in the arenas, as seen in the screencast.
A priority is to get my changes to the core of Devel::Size integrated back in. It would be crazy to have two modules duplicating this sometimes complex and perl-version-specific logic. My goal is to have a single C file that’s used by both modules. Each would compile it with different macros to enable the required behavior. That way Devel::Size suffers no performance loss from the extra logic Devel::SizeMe has added.
I’ve already started adding some support for “named” runs. The idea is to enable the size functions to be called multiple times within a single process, and to store the data in separate tables within the database. This is an important step towards being able to compare multiple runs to see how the memory use has changed.
Lots of refactoring is needed to turn my conference-driven-dash-for-the-finish-line hacking into more robust and reusable code. In particular I’d like to get a reasonably stable and useful database schema so other people can write modules to process the data generated by Devel::SizeMe.
Further in the future I can imagine having an option to record the existence of pointers to data that’s already been seen. That information is currently discarded but would add a great deal of detail to the output. Reference loops would be much easier to see for example. It would turn the output ‘tree’ into a directed graph and enable much richer visualizations.
We’re just at the start.