
Speeding up KGet — using Callgrind

September 11, 2010

Old slowness

If you have used KGet you might have noticed that it tends to be very slow, or rather sluggish, once you have many items in your transfer list. It can take very long to start, long to perform an operation you want to do, and again long to remove many transfers.

I have used Callgrind before to track these issues down. It turned out, though, that the Callgrind output was so confusing, even combined with KCachegrind, that it was hard to find the parts that were of interest for the particular task I wanted to look at.

Efficiently using Callgrind

Triggered by Jonathan’s blog post (and the followup) on speeding up KRatingPointer by finding bottlenecks with Callgrind, I did the same again for KGet. But this time I used more sophisticated methods that really make it shine; an easy introduction can be found here.

The idea is that I mark the places that should be profiled in KGet itself via two macros (CALLGRIND_START_INSTRUMENTATION; and CALLGRIND_STOP_INSTRUMENTATION; from #include <valgrind/callgrind.h>). That way only the code I want gets profiled and I don’t collect profiling information for areas I don’t care about. The latter part is really important, since otherwise the result would be completely distorted (e.g. by the KGet startup phase). Now I only need to start Callgrind with valgrind --tool=callgrind --instr-atstart=no kget --nofork and I get some useful dumps.
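To make this concrete, here is a minimal sketch of the pattern; the function and its body are made up for illustration, only the header and the two macros are what is actually used:

    #include <valgrind/callgrind.h>

    // Illustrative only: with --instr-atstart=no, nothing is instrumented
    // until CALLGRIND_START_INSTRUMENTATION is reached, so only the code
    // between the two macros shows up in the dump.
    void loadTransfers()
    {
        CALLGRIND_START_INSTRUMENTATION;

        // ... the code whose cycles we actually care about ...

        CALLGRIND_STOP_INSTRUMENTATION;
    }

    // Run the application with:
    //   valgrind --tool=callgrind --instr-atstart=no kget --nofork
    // and open the resulting callgrind.out.<pid> file in KCachegrind.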

Example

So what does using callgrind look like?

I wanted to know what takes so long when starting KGet, specifically loading all transfers. So I added the above-mentioned macros at the beginning and the end of KGet::load(…). Running Callgrind and then KCachegrind on the resulting data produced something like what you can see in the screenshot (sorry for the German).

As you can see, loading 604 transfers was quite expensive and most of the cycles were spent in TransferTreeModel::addTransfer, where what took so much time was QStandardItem::appendRow, or rather the actions caused by the signals emitted by appendRow. To improve this I added a TransferTreeModel::addTransfers method that adds all the transfers at once. I did it in a slightly hacky way: I call beginInsertRows, then deactivate the model's signal emission (since those signals resulted in constant redrawing of the view connected to the model), call appendRow multiple times, reactivate the signals and finish with endInsertRows. I only took that hacky route because I did not manage to use QStandardItem::insertRows; it appears not to be what I need.
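For illustration, here is a rough sketch of such a batch-add method. It is heavily simplified: the real TransferTreeModel differs, rows are appended to the invisible root item here, and the transfer items are plain QStandardItems.

    #include <QStandardItemModel>

    // Simplified stand-in for KGet's model; only the batching idea is shown.
    class TransferTreeModel : public QStandardItemModel
    {
    public:
        void addTransfers(const QList<QList<QStandardItem*> > &rows)
        {
            if (rows.isEmpty()) {
                return;
            }

            const int first = invisibleRootItem()->rowCount();
            beginInsertRows(QModelIndex(), first, first + rows.count() - 1);

            // Block the per-row signals appendRow would emit; otherwise the
            // attached view redraws after every single row.
            blockSignals(true);
            for (int i = 0; i < rows.count(); ++i) {
                invisibleRootItem()->appendRow(rows.at(i));
            }
            blockSignals(false);

            // A single rowsInserted notification for the whole batch.
            endInsertRows();
        }
    };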

Those measures alone resulted in a huge speedup. After doing some more of this I got something like the following:

As you can see, the total cost decreased drastically and a lot of the time is now spent parsing XML data. Still, there are areas left that can be improved easily. DataSourceFactory::setStatus takes 16% of all cycles, 99% of that because of signals. Taking a closer look one can see that both DataSourceFactory::processedSize and DataSourceFactory::percent are emitted 1208 times while the rest are emitted 604 times. That means that for each transfer those two are emitted twice. Emitting them just once gave a 5% speedup; more would be possible by emitting just one signal, something like changeEvent(Tc_Status | Tc_Speed | Tc_TotalSize | Tc_Downloadedsize), i.e. not having a signal for each event but rather defining in the sent data what changed.
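A minimal sketch of what such a combined signal could look like; the class is a hypothetical stand-in (not KGet's actual DataSourceFactory) and the flag names are merely modeled on the ones mentioned above:

    #include <QObject>

    // Hypothetical illustration of "one signal that carries change flags"
    // instead of a separate signal per changed property.
    class TransferSource : public QObject
    {
        Q_OBJECT
    public:
        enum TransferChange {
            Tc_Status         = 0x1,
            Tc_Speed          = 0x2,
            Tc_TotalSize      = 0x4,
            Tc_DownloadedSize = 0x8
        };
        Q_DECLARE_FLAGS(TransferChanges, TransferChange)

        void updateProgress();

    signals:
        // Receivers inspect the flags to see which values changed.
        void changeEvent(TransferChanges changes);
    };
    Q_DECLARE_OPERATORS_FOR_FLAGS(TransferSource::TransferChanges)

    inline void TransferSource::updateProgress()
    {
        // ... update internal state ...
        // One emission reports several changes at once.
        emit changeEvent(Tc_Speed | Tc_DownloadedSize);
    }

Connected classes would then react only to the bits they care about, so one state update costs a single signal emission instead of several.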

Result

All this led to many speed improvements in KGet. KGet should now (4.5.2) be a lot faster in the startup process and when scrolling, and significantly faster when removing many transfers. I will look into speeding up starting/stopping many transfers too.

Before I go into any details, remember that all of this was tested on my machine, so the results may vary on yours. Also keep in mind that faster here means fewer cycles, which does not translate 1:1 into real time: I/O operations (Callgrind does not help here) can use few cycles yet still take a lot of time, as a hard disk is very slow compared with other memory. An example of this is the SQLite history backend which, looking at the cycles, only improved marginally (~20%), while the time for deleting ~600 downloads (all of those get added to the store) shrank from 51942 msec to 138 msec.

Removing transfers in particular got a lot faster. Removing many unfinished transfers can now be more than 170 times faster than before, and removing finished transfers can be around 85 times faster. In the latter case the speedup depends on whether the removed transfers are neighbours, i.e. there are no transfers in between them; in the worst case it should still be a little faster. At the same time, starting KGet with ~600 downloads became 35 times faster.

In fact there are still a lot of areas for further improvement, though I am quite happy with the result already.

So thank you, Callgrind devs, Jonathan for your blog entry, and especially Kyle Fransham for his great tutorial, which should be put on techbase imo! And in fact all others who helped me deserve a thank you as well. 🙂

PS: No, this is not the large feature I was talking about last time; I haven’t worked on that for a while. This is rather like bug fixing. 😉