The results are in, insights into improving NautilusSvn’s performance.
Posted in Status Update on May 7th, 2009 by Bruce van der Kooij – 8 CommentsBased on the results so far from the performance poll with a sample of 130 users it seems 38% of NautilusSvn users are not satisfied with the current performance of the extension, while about 43% of the users could be considered satisfied. 19% rated performance as acceptable but it’s a thin line both ways so I won’t count them in either camp. There are very few people that flat out refuse to consider NautilusSvn because of this issue, if I had to take a guess I’d say some 11% ;-)
If I had to place a wager I would bet that most people don’t mind if NautilusSVN takes a while to determine the status for entire working copies (to a certain extent), as long as it doesn’t overheat their CPU, doesn’t consume too much memory and most importantly doesn’t hang Nautilus. If anybody disagrees with this please leave a comment.
However, judging by some of the comments it seems that some people just have extremely large working copies or tend to collect a lot of extremely large working copies in a single project directory. Based on some tests with timing the PySVN status method and the command-line Subversion client I would have to say that there’s really not a lot that can be improved with regards to actual performance. I’m sorry to have to say it, but that’s just the way it is.
Allow me to elaborate.
First, note that on average initial status checks take about 10x longer than consecutive ones. For example, the initial check for the entire TortoiseSVN working copy takes 8309.0079 milliseconds, consecutive ones take 865.9279 milliseconds. That’s 8 seconds compared to 0.8 seconds, I think everybody agrees with me that that’s quite a difference. However, as I see it there’s simply no way to speed up that initial status check. So, say you have 15 working copies the size of TortoiseSVN organized in a single directory, upon entering that directory it would take NautilusSvn some 2 minutes to just figure out the statuses.
Also note that if we want to properly keep working copy and directory emblems up-to-date we’ll also have to recursively register watches on each working copy, initially registering watches using inotify in the case of the TortoiseSVN working copy takes quite a few seconds (I didn’t time it).
There’s still room left to make NautilusSvn more efficient and perhaps a bit more snappy here and there. Especially with regards to the status logic there are still improvements that can be made. Also the implementation of a proper cache will certainly help in making consecutive status checks even faster. But none of this will result in substantial improvements in the area of initial status checks.
So if there isn’t a way to substantially improve performance with regards to initial status checks what can we do? What I know we can do is create the illusion of performance or possibly degrade some functionality. Here’s a few things that come to mind:
- Pre-loading working copies. Do the initial status checks when the user isn’t looking. It’s probably a good idea to not do this immediately after booting. We would also have to make sure the computer is not on battery power.
- Allow the user to configure to disable NautilusSvn for certain directories.
- Do some scheduling tricks and progressively check parts of a working copy. This will also help prevent 100% CPU usage issues for a considerable duration of time (leading to overheating).
- Executing the status checks asynchronously
However, let me point out that doing everything asynchronously (i.e. in the background) will only obscure performance issues. Sure, Nautilus wouldn’t hang anymore but NautilusSvn will still be hacking away in the background (possibly causing your CPU temperature to rise to unacceptable levels). Especially when developing I find Nautilus hanging a very useful indicator on whether or not progress is being made.
In the end, irrelevant of the performance issues, what I’m most interested in is having an elegant, flexible, maintainable and robust codebase.
Any thoughts?
P.S.
I hope to be posting more of these type of blog entries, that is if people are interested in hearing me talk about this. :-) Now, back to pointless, incessant barking.