Performance Anxiety

I’ve wanted to write something about our performance problems for a while now. It’s a long-standing issue, and something that is a real showstopper for some users. I’ve written this for two reasons: firstly, to tell our users why we haven’t “just fixed it” yet; secondly, so that other/future developers can get an idea of the WHY of the architecture and not just the HOW.

One of the oldest issues on the RabbitVCS issue tracker is issue number 4 — “Inadequate (sub-par) performance.” Hurtful, hurtful words, but ultimately true. The basic problem is that for small working copies, our extension works great. Once you get above about 50,000 items though, things really…

…Slow

…Down.

When you try to open up a folder with many version controlled children, Nautilus locks up. It is completely unresponsive until almost all the emblems are populated and, as much as I hate to say it, this renders RabbitVCS a little bit useless. Basically, we’re taking the I out of GUI.

Worst of all, it makes our “clever play on the word Tortoise” into a terrible, terrible irony.

What Performance Problems?

One of the first things to understand is that performance is not actually the issue here. This may sound a little strange, but think about it: when our users say they want better performance, what they really seem to mean is that they want Nautilus to stay responsive. To put it another way, given the choice between:

  1. Entering a versioned directory and waiting 1 minute for all the emblems to display correctly BUT have Nautilus lock up for the whole time; and
  2. Entering a versioned directory and waiting 1 minute 20 seconds for all the emblems to display, but still being able to use Nautilus

…most users really seem to want (2), even though technically (1) is the “better performance” option. Of course, if it were 20 minutes instead of one then yes, we’d have a problem. But that’s not what we’re experiencing at all. Status checks take the same amount of time, no matter what we do, no matter what language we use, no matter what the phase of the moon is. The call to vcs_client.status(path) will always, always be our performance floor.

Now this might seem like minor semantic bickering — performance, responsiveness, who cares what you call it? But how you state a problem has a huge influence on how you deal with it (and how patient people are with your efforts…). Calling it a performance problem immediately implies a certain kind of solution, and causes well-meaning users to suggest things like JIT compilers and optimisers (bah), or rewriting in C (go on then). But if you stop obsessing over performance for five seconds, the solution becomes a little clearer. As Bruce and Jason (Field) realised, as soon as a Nautilus extension becomes non-trivial you need to start looking at how to do things asynchronously.

The simple story is:

  1. Nautilus calls all of its Python extensions. They are all called in the same thread, and no other threading is permitted.
  2. The Python extensions do their thing and return.
  3. Nautilus displays the results.

So if step #2 takes minutes to complete, Nautilus is frozen waiting for a function call to return. That’s our problem.

Doing things in the background isn’t all that trivial though. For example, some users have suggested using threads to do things asynchronously. But threading is not permitted in Nautilus Python extensions, and so either the module will hang or just fail to run (depending on the direction of the local magnetic field). The next logical step is to use a separate process for the heavy lifting, so we have something like:

  1. Nautilus calls all of its Python extensions.
  2. The RabbitVCS extension calls a status checker service running in the background.
  3. The status cache service should return immediately with either the real status or some indication that it will calculate the status (those cute little clock emblems).
  4. The status is calculated.
  5. The background service tells our extension when the status is calculated.
  6. The extension tells Nautilus to invalidate the information for that path. This triggers another check, but now we have the status!
  7. Profit.

This is what we’ve achieved so far for doing the emblems. We use DBUS to communicate with a background service that is automatically started on demand. We could have simply used a self-contained subprocess, but a DBUS service means that the other parts of RabbitVCS can access it too.
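The steps above can be sketched as a toy, single-script model. This is only an illustration under loud assumptions: the names StatusService, Extension, slow_status_check and CALCULATING are made up for the example, and a worker thread stands in for the background service purely so the sketch runs on its own. The real service is a separate process reached over DBUS, precisely because threads are not an option inside the Nautilus extension.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

CALCULATING = "calculating"  # placeholder status, i.e. the clock emblem


def slow_status_check(path):
    """Hypothetical stand-in for the slow VCS status call."""
    time.sleep(0.1)
    return "modified" if path.endswith(".py") else "normal"


class StatusService:
    """Toy background status checker (assumption: a thread, not a DBUS process)."""

    def __init__(self, on_ready):
        self._on_ready = on_ready      # called with the path once its status is known
        self._known = {}               # path -> finished status
        self._pending = set()          # paths currently being calculated
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=1)

    def check_status(self, path):
        """Step 3: return at once with the real status or a placeholder."""
        with self._lock:
            if path in self._known:
                return self._known[path]
            if path not in self._pending:
                self._pending.add(path)
                self._pool.submit(self._calculate, path)
        return CALCULATING

    def _calculate(self, path):
        status = slow_status_check(path)   # step 4: the slow part
        with self._lock:
            self._known[path] = status
            self._pending.discard(path)
        self._on_ready(path)               # step 5: tell the extension

    def wait_until_idle(self):
        """Demo helper only: block until all queued checks have finished."""
        self._pool.shutdown(wait=True)


class Extension:
    """Toy stand-in for the extension side of the conversation."""

    def __init__(self):
        self.invalidated = []
        self.service = StatusService(self.invalidate)

    def update_file_info(self, path):
        # Steps 1-2: the file manager asks, we ask the service, nobody blocks.
        return self.service.check_status(path)

    def invalidate(self, path):
        # Step 6: the real extension would ask Nautilus to re-query this path.
        self.invalidated.append(path)


def demo():
    ext = Extension()
    first = ext.update_file_info("/repo/setup.py")    # placeholder comes back
    ext.service.wait_until_idle()
    second = ext.update_file_info("/repo/setup.py")   # now the real status
    return first, second, ext.invalidated
```

Calling demo() walks through one round trip: the first query returns the placeholder straight away, and the query after the invalidation returns the calculated status. The point of the design is that the caller never waits on slow_status_check itself.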

Diversions

The transition to DBUS was not as easy as it sounds. When we first moved things out to a DBUS controlled “subprocess,” we didn’t notice any improvement. This was because the Subversion bindings we use also prevent threading, and we were trying to intersperse status checks with status requests. This led me to implement a hideously complicated system with another layer of subprocesses to solve this. Really, we should have been using DBUS’ built-in asynchronous calling system to simplify things, which I finally got around to figuring out last week.

Another upshot of all this was what I learned about profiling. One of the most frustrating things for me was just how difficult it was to get a decent picture of what was taking so long in the first place. Profiling (in Python or otherwise) is an arcane art — unless you have a single process with a single thread, you’re going to get some mysterious results. The things that cause the most problems are thread synchronisation mechanisms, and if you’re lucky they’ll show up as huuuuuuge blocks of time spent waiting on a locking function. If you’re unlucky, you might as well throw darts at a stack trace. (Protip: print it out first.)

This isn’t a fundamental flaw with Python. But I was frustrated because the Python profiling tools aren’t designed to work on individual sections of the code. The most infuriating comment from #python regarding this was “which part of all this is cProfile.run('foo()') ?” It’s a freaking embedded module inside a C extension API, not a shell script! Eventually I found a way to get a return value from cProfile so that I can profile the functions that Nautilus calls under normal use. (Of course, the good thing about subjecting yourself to #python is that the inhabitants do help you live up to the general Python philosophy of “if it seems harder than it should be, you’re doing it wrong, or possibly you’re an idiot with unsalvageable code; but let’s work on the other thing first.” I should have used DBUS properly from the start, and I should have remembered that it’s very rare for Python/C bindings to allow threading.)
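For the curious, here is one way to profile a single callback and still hand its return value back to the caller. It is a minimal sketch and not necessarily what ended up in RabbitVCS: the profiled decorator and its last_report attribute are invented for this example, and get_background_items here is a dummy function that merely borrows the name. The only real machinery is cProfile.Profile.runcall, which profiles one call and returns whatever the function returned, and pstats for formatting the result.

```python
import cProfile
import io
import pstats
from functools import wraps


def profiled(func):
    """Profile every call to func, but still return func's own result."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        profiler = cProfile.Profile()
        # runcall() profiles exactly this call and passes the result through.
        result = profiler.runcall(func, *args, **kwargs)

        # Keep the ten most expensive entries (by cumulative time) as text.
        report = io.StringIO()
        stats = pstats.Stats(profiler, stream=report)
        stats.sort_stats("cumulative").print_stats(10)
        wrapper.last_report = report.getvalue()
        return result

    wrapper.last_report = ""
    return wrapper


@profiled
def get_background_items(path):
    """Dummy stand-in for a callback that the file manager would invoke."""
    return ["Update", "Commit"] if path else []
```

The caller sees the usual return value, so the decorated function can stay wired into whatever is calling it, and the text in get_background_items.last_report can be logged or written to a file afterwards.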

So Close…

We’re now at the point where status emblems have almost no noticeable impact on Nautilus’ behaviour. Fantastic! So if we open up a test repository (in my case, sections 0-9 and A-D of the official Debian repository), Nautilus should just sit back and relax while waiting for everything to haaaaaappeeeeeeeeeoh the humanity it’s still freaking slow.

Here’s the problem: I forgot about the menus. Whenever Nautilus enters a new directory, it requests the menu items for the directory (“get_background_items”). This requires a status check, which yes indeed has to happen in the extension code. Nuts.

So why don’t we just do the same thing? Load up a menu that says “Please wait, loading…” and then use a callback to populate the menu? It’s a great idea, and the Nautilus extension API even has such a function: “nautilus_menu_provider_emit_items_updated_signal” which does exactly what it says on the box. Alas, the problem is with the Nautilus/Python bindings, which don’t allow us to use this method! The Nautilus/Python bindings have some real quirks in them, and it’s no trivial matter to simply pass Python objects and C interfaces around and expect things to work.

On the bright side, the Nautilus/Python bindings are now in the hands of Adam himself. It’s not always the case that cross-language bindings are in the hands of someone who uses them heavily enough to come up against constraints like this, so we’re at an advantage there. Adam has been hacking away to get the bindings to cover what we need, and while it’s not an easy task, it is certainly doable and should have some pretty good benefits even besides the menu signal. For example, once the proper Nautilus “update_file_info” callbacks are available, we can clean up a lot of hacky code and workarounds in the extension (possibly fixing a mysterious memory leak too).

I don’t want to promise anything too soon, but I do want to emphasise the fact that we are working on it, with an actual plan and everything! Keep your eye on issue #4, and if you’re a developer (or feeling brave), there will be a related branch in the repository soon. When we’ve got the code sufficiently functional, we’ll also release some packages for testing. So if you’re one of our users who has had to drop RabbitVCS because of performance problems, please hang in there — we are working on it, and it won’t be long now before we’ll be asking for all the real-world testing we can get.

28 responses to “Performance Anxiety”

  1. Brett Alton says:

    I was wondering about this as I am only just now starting to use your program, but need it for work, so of course I have “performance anxiety”.

    No one else has this type of functionality in a program for Nautilus however and the performance only seems slow when I open the repository for the very first time, so for being able to develop in Linux, it is very bearable at the moment!

    Keep up the good work.

    • Jason Heeris says:

      Since we don’t have a status cache at the moment, you’d think that every access should take just as long. Bruce wrote an interesting post about how low-level caching makes it difficult to identify performance problems:

      http://blog.rabbitvcs.org/archives/103

      • Brett Alton says:

        I meant when I traverse from www > svn > OAS > admin > reports, it’s slowest when at svn/ (root) and then quicker at OAS/ and admin/. I’ve learned to just type into Nautilus, the folder I want to traverse to and then RabbitVCS doesn’t bog down my workflow.

        • Benjie says:

          That’s exactly what I do too. Control-L /path/to/folder/ :)

          Does the context menu need to do svn status? Can’t you just enable all commands and then display helpful messages if the option is clicked but isn’t available?

  2. Auria says:

    The killer way is to register for filesystem events, and keep a database updated with the status of everything at all times as they change ;)
    I think mercurial does this on at least one platform. But I would understand if you just told me “too complicated” ^^

    • Jason Heeris says:

      Absolutely. That’s the way we want to go, and we’ve made a couple of (aborted) attempts at it. The status checker class that DBUS uses is almost trivial, and so you might wonder why we don’t just do a status check in the DBUS class itself and save a hundred lines of code. It’s actually because it gives us a set of stubs for implementing a status cache later.

      My only concern is that holding several thousand inotify hooks (or whatever we use for it) might have performance or resource problems of its own…

      • Auria says:

        I am not sure about ext file systems; but I know that on HFS (mac OS X) file systems you can register for events on a per-directory basis. If this can be done then it would dramatically reduce the number of hooks and might work

  3. Daniel Trebbien says:

    Jason, this type of post is extremely helpful to potential developers. Thank you for taking the time to write out the details of how this major change is being developed.

  4. […] … at least till the developers of RabbitVCS fix these performance issues. […]

  5. Mark says:

    Did you ever think about a non-recursive status icon view like in TortoiseSVN? They have an external status cache, which in my usage slowed down the system very much by causing too much I/O. I use the status cache “Shell” setting every time I work with TortoiseSVN on Windows, which only shows the status of the current folder, which is the information I need. This would be a quick fix for the performance problems, wouldn’t it?

    • Jason Heeris says:

      I don’t think that a non-recursive status check would be that useful. For example, bringing up the menu on a directory, you’ll always want to know the recursive status otherwise you’d only ever be able to commit individual files.

      • Mark says:

        I just can say that working as a developer for several years and using TortoiseSVN with the non-recursive setting it was useful enough for me. Choosing a menu option which requires recursive information (like commit) just had to do the work on demand.

        • Jason Heeris says:

          But how would we know whether or not to show the “commit” action unless we did a recursive status check? Or would we always show it?

          • Mark says:

            Yes, TortoiseSVN presents the “commit” action on any folder which is under version control. If I commit on a folder without changes, I just get the commit window with an empty list of changed files and the info that no files were changed. In most cases when I do a commit I know where I made the changes and that something under a certain folder needs to be committed.

            • EJ says:

              Totally agree with Mark; also in TortoiseSVN I always deselected the option to recursively enter the folders. I know when I want to commit something; and the only time I really need a recursive search is when I click on ‘commit’!

              It would be so easy to build in, and so much better performance for RabbitVCS!

  6. Manolet says:

    RabbitVCS was the reason I left Ubuntu 6 months ago. I’m a developer and I come from Windows. I installed it as a replacement for TortoiseSVN, but my Nautilus started to hang when I opened it, and that made my work a disaster. So I erased Ubuntu and returned to Windows.

    I tried again with RabbitVCS and it is the same thing, but I want to ask you: if this is all about performance, why don’t you just add an option to DISABLE the recursive status check and keep the RabbitVCS menu? WHO CARES ABOUT THE ICONS!! The most valuable thing for me in Tortoise is that I can easily select which files to commit and which not to commit. I don’t care about the icons. I mean, COME ON, I KNOW WHAT I CHANGED!!!

    Also, why don’t you make a standalone app? I’m currently using RabbitVCS in gedit and have uninstalled the Nautilus plugin.

    Trust me, I’m your user, I don’t want icons. I want the possibility to use your amazing menu.

    And of course, the possibility to keep using my PC (I don’t want hangups).

    • Anonymous says:

      I think the menu issue has been improved a lot. It won’t be available until the next release though.

    • Jason Heeris says:

      The emblems do not cause the performance problems. The menu does. There is no easy way to have asynchronous menus in Nautilus, so we have to hack around it. This means that there are about a thousand stupid little edge cases (such as: if another extension does the same thing, neither will work; memory problems because nautilus doesn’t tell you when a window is closed — these are the big two).

      What would a standalone app have? Wouldn’t it just be another file browser?

  7. borgo says:

    pagavcs project is looking good, much faster in nautilus, didn’t test other things yet. Good luck getting rabbitvcs faster in nautilus.

    http://code.google.com/p/pagavcs/

  8. demch says:

    Thank you for keeping us updated, I will do my best to keep our Performance Anxiety under control.

    It’s a big issue for some of us but I have the confidence that you will resolve it.

    Best regards.

  9. Stephen says:

    Well, I was going to write a snarky comment about putting your checkout on your SSD, but nope, that does not help – at least not with emblems.

    Nautilus locked up for at least a minute, which I am staggered by. Even when cached the RabbitVCS menu takes a couple of seconds to come up.

    Without emblems, it seems reasonable. A lock up for “a little while” (< 30s, probably less than 10) is ok for first load.

    I'm sorry, but if it can't be solved (no visible lockup) with an SSD, there's something majorly wrong (thanks for realising this :))

    Oh, for those who are now rejecting getting an SSD for this, know that it was locking up for 10's of minutes prior.

  10. dobrokot says:

    TortoiseSVN has three options for status check and icons:

    1) only current folder
    2) recursive (unusable for large repositories)
    3) disable icons (do not access svn status at all)

    Why do menu items care about status? When I want to see the history of a folder or commit files from it, I accept locking for a minute.

    I just browse folders or just right click on an item… Why do you need the status of a folder to show “update this folder”?

    Also, setting /usr/bin/meld for rabbitcvs log command.

  11. dobrokot says:

    Ok, I had found how to use side-by-side in the rabbitvcs log:
    sudo vim /usr/share/pyshared/rabbitvcs/ui/log.py
    and replace

    def view_diff_for_path(self, url, revision_number, sidebyside=False):

    to

    def view_diff_for_path(self, url, revision_number, sidebyside=True):

  12. Mauro says:

    Any news about the implementation of the non-recursive check?
    I think that it should be a very useful solution for a lot of users.