Diagnosing and Fixing Cassandra Timeouts
8/19/2016

We recently brought Cassandra into our infrastructure to take advantage of its robust built-in scalability model. After reading all the "marketing" material, watching and attending talks by evangelists, and reading blog posts by experts and enthusiasts, we were excited to embark on our Cassandra journey.
Everything was hunky-dory in the early going. We followed data modeling best practices, provisioned right-sized servers as recommended by DataStax, and saw great results in the test environment. The issues didn't start until we began moving portions of our production traffic over to our Cassandra production servers. That's when we noticed fairly frequent read and write timeouts as traffic volume increased.
My first troubleshooting step was to monitor the servers to see if there were any anomalies or resource constraints that could explain the timeouts. I'm a fan of Brendan Gregg's USE method for monitoring overall system health and identifying errors and bottlenecks. However, at the time we didn't have a comprehensive monitoring solution like Datadog or Icinga2 set up for those servers. So in its place, I wrote a shell script that, every 30 seconds, would log memory and CPU utilization, disk I/O, network I/O, Cassandra thread pool stats (nodetool tpstats), GC stats, compaction stats, and the table stats for my most active tables. To my chagrin, my hacked-together monitor did not reveal any anomalies or resource constraints that could account for the timeouts. On the contrary, it showed that resource utilization (with the exception of CPU, which was a bit high) was well under the hardware limits. I did notice that a query timeout would sometimes coincide with the start of SSTable compaction, which led me down the rabbit hole of compaction tweaks; however, I emerged from that hole with no solution to the Cassandra timeouts.
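For illustration, here is a minimal sketch of that kind of ad-hoc monitor. It is not the original script; the keyspace/table names and log path are placeholders, and on older Cassandra releases tablestats is called cfstats.

```bash
#!/usr/bin/env bash
# Crude health logger: append a snapshot of system and Cassandra stats every 30 seconds.
LOG=/var/log/cassandra-health.log
KEYSPACE=my_keyspace        # placeholder: your keyspace
TABLE=my_hot_table          # placeholder: your busiest table

while true; do
  {
    echo "===== $(date -u +%FT%TZ) ====="
    free -m                                   # memory utilization
    top -bn1 | head -n 5                      # CPU / load average snapshot
    iostat -dx 1 1                            # disk I/O (requires sysstat)
    sar -n DEV 1 1                            # network I/O (requires sysstat)
    nodetool tpstats                          # thread pool stats: pending/blocked tasks
    nodetool gcstats                          # GC stats since the last call
    nodetool compactionstats                  # in-flight compactions
    nodetool tablestats "$KEYSPACE.$TABLE"    # per-table stats (cfstats on older versions)
  } >> "$LOG" 2>&1
  sleep 30
done
```

Even a crude loop like this is enough to confirm or rule out the obvious hardware bottlenecks, which is exactly what it did here.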
Once hardware constraints had been ruled out, I kicked off my next round of troubleshooting by turning on trace probability and looking at the traces for long-running queries. In the traces I noticed high queue times, which is not unexpected during periods of high load, but I also noticed something else. A number of times, a worker node received the request from the coordinator, completed the read operation for the query, and queued up the result to be sent back to the coordinator, but the result was never sent and the coordinator timed out waiting for the response. Bingo! I had found the cause of the timeout! But I still hadn't found the root cause: why were the responses not getting sent back? And I didn't know how to fix it either. I spent a good amount of time looking into tweaking Cassandra thread pools (like the RequestResponseStage pool), reading the Cassandra source code on GitHub, and re-verifying network traffic/congestion between the nodes during high loads. Those efforts yielded no solution to the query timeouts, which led me to a third round of troubleshooting.
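If you want to do something similar, this is roughly how probabilistic tracing gets turned on and inspected; the sampling rate and the session UUID below are placeholders, not values from my setup.

```bash
# Sample a small fraction of queries for tracing (1 in 1000 here; pick your own rate).
nodetool settraceprobability 0.001

# Later, list recent traced sessions from the built-in system_traces keyspace.
cqlsh -e "SELECT session_id, duration, started_at, request
          FROM system_traces.sessions
          LIMIT 20;"

# Drill into one session to see per-node activity, queue times, and elapsed micros.
cqlsh -e "SELECT activity, source, source_elapsed
          FROM system_traces.events
          WHERE session_id = 00000000-0000-0000-0000-000000000000;"  # placeholder UUID
```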
For this round of troubleshooting I turned up the logging level on the org.apache.cassandra logger to DEBUG and live-tailed both the Cassandra system.log and debug.log. That's when I noticed something weird: several times, one of the nodes would detect another as being down for a period of time. When I looked at the system logs and the Linux process activity in htop for the supposedly downed node, it appeared that the node was up and running. This led me to take a fine-grained look at the system.log for the downed node, and in there I noticed a pretty long GC pause. So I flipped over to the gc.log, examined it, and noticed even more long GC pauses. At times there were 35-second pauses that occurred in succession. Those super-long GC pauses caused the node to appear down to its peers. They also explained the cases where a coordinator sent a request and timed out: the worker node had paused execution and didn't return the response in a timely manner. I had found my problem!
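For reference, this is roughly what that log-level bump and GC-log digging looked like. The log paths are the usual package-install locations and may differ on your system, as may the GC log format depending on your JVM flags.

```bash
# Raise the logging level at runtime (no restart needed) and watch both logs.
nodetool setlogginglevel org.apache.cassandra DEBUG
tail -f /var/log/cassandra/system.log /var/log/cassandra/debug.log

# Look for long stop-the-world pauses in the GC log.
grep -i "pause" /var/log/cassandra/gc.log* | tail -n 50

# Gossip-level view of which peers this node currently thinks are down.
nodetool status
```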
So how did this happen? Well, when we were configuring Cassandra, some doofus (aka me!) noticed that:
1. The key cache drastically improved query performance.
2. The key cache hit rate was high and stayed high even as he increased the size of the key cache.
So this doofus (again, me) decided to bump the key cache all the way up to 8 GB so he could cache entire tables' keys, essentially turning reads into in-memory lookups. To accommodate such a large key cache, he bumped the JVM heap up to 20 GB, thinking the new G1GC garbage collector could handle very large heaps. News flash: it can't! At least not when dealing with my workload of high volumes of concurrent reads and writes.
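For context, the offending settings looked roughly like the excerpt below. The key cache and heap sizes come from the story above; the file paths, the HEAP_NEWSIZE value, and the G1 flag placement are the usual locations for a package install and may differ by Cassandra version.

```bash
# conf/cassandra.yaml (excerpt)
#   key_cache_size_in_mb: 8192         # 8 GB key cache, enough to hold entire tables' keys

# conf/cassandra-env.sh (excerpt; newer releases put JVM flags in conf/jvm.options)
MAX_HEAP_SIZE="20G"                    # oversized heap to make room for the key cache
HEAP_NEWSIZE="2G"                      # illustrative; older env scripts require it in pairs with MAX_HEAP_SIZE
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"      # G1 was assumed to cope with the very large heap
```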
The fix was simple: reduce the size of the JVM heap and the key cache accordingly. The recommended heap size is 8 GB. I was able to set my new max heap size slightly above that recommendation because I switched my memtables to storing part of their data off-heap (offheap_buffers), which greatly reduced heap memory pressure. I also started using the jemalloc allocator, which is more efficient than the default glibc allocator.
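A sketch of the corrected configuration, again assuming a package-style layout; the key cache size, exact heap size, and jemalloc library path are illustrative rather than the precise values I used.

```bash
# conf/cassandra.yaml (excerpt)
#   key_cache_size_in_mb: 512                    # back to a modest key cache (illustrative size)
#   memtable_allocation_type: offheap_buffers    # keep memtable cell data off-heap

# conf/cassandra-env.sh (excerpt; newer releases put JVM flags in conf/jvm.options)
MAX_HEAP_SIZE="10G"                              # slightly above the 8 GB recommendation
HEAP_NEWSIZE="2G"                                # illustrative; required in pairs by older env scripts

# One common way to pick up jemalloc if your startup script doesn't detect it automatically.
# The library path is an assumption; adjust for your distribution.
export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libjemalloc.so.1
```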
In conclusion, distributed database systems are complex beasts, and tuning them has to happen on many levels. While Cassandra gives you many knobs to turn to get the right performance out of your system, it takes a lot of study, experience, or both to know just how to tune them for your specific hardware and workload. DataStax has published a useful Cassandra JVM tuning article to help guide users, but my advice would be to try a number of JVM configs and find out what works for your production workload. If I were doing this exercise over, I would start at the hardware level again, but with better monitoring tools, and then I would move to the system/debug/gc logs before looking at the traces. There is also a good performance tuning tutorial by Kiyu Gabriel on DataStax Academy that you might find useful. I hope this article helps someone be more productive than I was.