I have made only a small contribution to a 529, but here is an interesting analysis of its impact on a student's eligibility for financial aid.
529s and Financial Aid
The executive summary: for federal aid, 529s have much less impact than almost every other form of investment. For institutional aid, it is case by case, so it is better to contact schools early.
Thursday, June 28, 2012
Saturday, June 23, 2012
Future vs. Callback
Many asynchronous transport libraries offer two types of interfaces: one that returns a Future and one that takes a callback parameter.
An effective way to explain the difference between the two is push vs. pull. A Future is pull: when you make the asynchronous call, you get a future handle, and through it you can wait on the result. A callback is more like push: the callback method is passed in and executed on the callee side, which often adds the result to a queue for further processing.
In Java, a Future uses a blocking lock to control its state: while the async call is still pending, get() blocks on the lock, and when the data is returned, the lock is released.
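To make the push/pull distinction concrete, here is a minimal sketch, not tied to any particular library; fetchAsync and Callback are made-up names, showing the same asynchronous operation exposed both ways:

import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class FutureVsCallbackDemo {

    // Callback style (push): the callee invokes this when the result is ready.
    interface Callback<T> {
        void onResult(T result);
    }

    private static final ExecutorService pool = Executors.newFixedThreadPool(2);

    // Future style (pull): the caller gets a handle and decides when to wait.
    static Future<String> fetchAsync(final String key) {
        return pool.submit(new Callable<String>() {
            public String call() { return "value-of-" + key; }
        });
    }

    // Callback style: the result is pushed into the supplied callback.
    static void fetchAsync(final String key, final Callback<String> cb) {
        pool.submit(new Runnable() {
            public void run() { cb.onResult("value-of-" + key); }
        });
    }

    public static void main(String[] args) throws Exception {
        Future<String> f = fetchAsync("k1");
        System.out.println(f.get());          // pull: get() blocks until the result arrives

        fetchAsync("k2", new Callback<String>() {
            public void onResult(String result) {
                System.out.println(result);   // push: runs on the callee's thread
            }
        });
        pool.shutdown();
    }
}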
Here is a link to a well-written article about Future vs. Callback:
http://www.drdobbs.com/go-parallel/article/showArticle.jhtml?articleID=226700179
Thursday, June 14, 2012
Gnuplot
Recently I worked on measuring the performance of an RPC protocol vs. a RESTful HTTP protocol. I checked out gnuplot to chart my results, and it came in pretty handy.
I track the server-side performance in logs and used Python to parse the data into two columns, representing the size of the data vs. response time. Below is a sketch of the gnuplot commands used to generate the trending chart.
set term gif
set output "perf.gif"
set multiplot
set xrange [0:60000]
set yrange [0:200]
set xlabel "Response time in milliseconds"
set ylabel "# of C"
set label 1 at first 30000.0, 120.0 "rpc"
set label 2 at first 30000.0, 10.0 "restful"
set title "performance comparison"
plot "data1.txt" notitle w points pt 5
plot "data2.txt" notitle w points pt 5 linecolor rgb "blue"
unset multiplot
set output
Keep the gnuplot commands in a file and run
gnuplot < command
That makes it easy to regenerate the trending charts quickly.
Monday, June 11, 2012
Large Cluster Client in JVM
I have been working on large-scale distributed infrastructure for a few years now and have reached the point where I believe I should put serious effort into noting down what I have learned before that knowledge vanishes along with my fading memory.
A week or two ago, I was investigating a "memory leak" case involving a service that talks to two clusters of distributed services to retrieve information. The service runs in Java. We took several heap dumps over the course of 3-5 days, using
jmap -dump:live,file=<file_name> <pid>
and loaded them into YourKit to examine the biggest changes across the heaps over time.
The heap analysis pointed to byte[] allocations that were growing rapidly, at roughly 0.3 GB/day, and these bytes came from Netty.
YourKit revealed two key traits. First, the call stack indicated that the byte[] was referenced by a DynamicChannelBuffer, which was in turn referenced by a LengthFieldBasedFrameDecoder. This is the class that "deframes" the incoming bytes prior to deserialization. Second, I noticed that in the bigger heap the byte[] lengths tended to be larger, whereas in the smaller heap their sizes were distributed toward the smaller range.
These two pieces of information helped me find the cause of this "leak". Our cluster client has to maintain hundreds of connections to the service cluster. To reduce the cost of establishing network connections, we reuse them. Each connection has a DynamicChannelBuffer associated with it, which grows by doubling itself when the incoming data is large. Since our data sizes span a wide range and can be unpredictable, over time every channel eventually receives a large dataset and grows accordingly. Because of the size of the clusters, the combined memory used by these DynamicChannelBuffers is non-trivial: a couple of gigabytes in one JVM.
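For context, a Netty 3 client pipeline along these lines gives each channel its own frame decoder, and therefore its own cumulation buffer. This is a rough sketch; the frame-length settings are made up, not our actual protocol's.

import org.jboss.netty.channel.ChannelPipeline;
import org.jboss.netty.channel.ChannelPipelineFactory;
import org.jboss.netty.channel.Channels;
import org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder;

public class ClusterClientPipelineFactory implements ChannelPipelineFactory {
    public ChannelPipeline getPipeline() throws Exception {
        ChannelPipeline pipeline = Channels.pipeline();
        // Every channel gets its own decoder, and the decoder's internal
        // cumulation buffer (a DynamicChannelBuffer) grows by doubling as
        // large frames arrive -- and never shrinks back.
        pipeline.addLast("frameDecoder",
                new LengthFieldBasedFrameDecoder(
                        64 * 1024 * 1024, // max frame length (illustrative value)
                        0, 4,             // length field offset and size
                        0, 4));           // length adjustment, bytes to strip
        // ... deserialization and business handlers would follow here
        return pipeline;
    }
}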
A simple fix in this case would be to close the network channels every now and then, thus discarding the old channels and their associated DynamicChannelBuffers. One thing to watch here is choosing a reasonable rate at which to close the channels. Closing and reopening them too frequently can make the client too chatty, leading to a longer backlog on the Netty server side (the Netty server has a default backlog setting, which may need to be adjusted).
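A rough sketch of such a recycling task, assuming a hypothetical ConnectionPool that tracks the client's channels and re-establishes them on demand; the names and the interval are illustrative, not from our actual client.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import org.jboss.netty.channel.Channel;

public class ChannelRecycler {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    // connectionPool is a stand-in for whatever tracks the client's channels.
    public void start(final ConnectionPool connectionPool) {
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                for (Channel ch : connectionPool.channels()) {
                    // Closing the channel lets its DynamicChannelBuffer be collected;
                    // the pool re-establishes the connection on the next request.
                    ch.close();
                }
            }
        }, 6, 6, TimeUnit.HOURS); // interval is a guess; too frequent makes the client chatty
    }

    // Minimal interface for the hypothetical pool.
    interface ConnectionPool {
        Iterable<Channel> channels();
    }
}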
Another fix would be to periodically detach the FrameDecoder from the channel and recreate a new one, though that implementation might be too tightly tied to a specific version of Netty.
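In Netty 3 that could look roughly like the following, assuming the decoder was registered under the name "frameDecoder" as in the pipeline sketch above; swapping in a fresh decoder lets the old one's grown cumulation buffer become garbage. The handler name and constructor arguments are illustrative only.

import org.jboss.netty.channel.Channel;
import org.jboss.netty.handler.codec.frame.LengthFieldBasedFrameDecoder;

public class DecoderRefresher {
    // Replace the per-channel frame decoder so its oversized buffer can be collected.
    public static void refresh(Channel channel) {
        channel.getPipeline().replace("frameDecoder", "frameDecoder",
                new LengthFieldBasedFrameDecoder(64 * 1024 * 1024, 0, 4, 0, 4));
    }
}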
In general, when you have a client that connects to a large cluster, how the client efficiently uses memory to buffer the received data in the JVM is a common problem. Some choose to keep these buffers short-lived in the young generation; others pool the buffers to reduce pressure on garbage collection. Either way, whenever your system involves a client to a large cluster, this is an area you should remember to examine carefully.
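As a simple illustration of the pooling approach, a client could keep a bounded pool of byte[] buffers and reuse them instead of allocating a fresh one per response. This is a minimal sketch of the general idea, not our implementation; a real pool would also bucket buffers by size.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Minimal sketch of a fixed-size byte[] pool to ease GC pressure.
public class BufferPool {
    private final BlockingQueue<byte[]> pool;
    private final int bufferSize;

    public BufferPool(int poolSize, int bufferSize) {
        this.pool = new ArrayBlockingQueue<byte[]>(poolSize);
        this.bufferSize = bufferSize;
    }

    // Reuse a pooled buffer when available, otherwise allocate a new one.
    public byte[] acquire() {
        byte[] buf = pool.poll();
        return buf != null ? buf : new byte[bufferSize];
    }

    // Return the buffer to the pool; drop it if the pool is already full.
    public void release(byte[] buf) {
        if (buf.length == bufferSize) {
            pool.offer(buf);
        }
    }
}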