May 22, 2014

Notes on "How to Collect Information for Troubleshooting Enterprise Servers" #LinuxCon

Notes on "How to Collect Information for Troubleshooting Enterprise Servers".
The slides probably cover this in more detail.

Thursday, May 22, 2:00pm - 2:50pm

How to Collect Information for Troubleshooting Enterprise Servers




  • Preparation for kernel space problems




    • What to do about an unexpected kernel panic?




      • set up kdump in advance so that you get the memory image as of the kernel panic




        • kdump may fail due to hardware problems






        • the "crashkernel=auto" command line parameter may not work








      • set up SysRq functionality in advance if you can trust console users








    • What is OOM killer?




      • a built-in feature which tries to survive an out-of-memory condition by killing some processes






      • do not rely on the OOM killer








    • What to do about an unexpected system reboot?




      • one of the most annoying troubles in Linux because it is difficult to understand...








    • What is serial console?




      • a relatively reliable way to capture kernel messages






      • some hardware supports redirection of the serial console






      • a handy way to capture kernel messages








    • What is netconsole?




      • a utility for saving kernel messages is available




        • there seems to be something on SourceForge






        • difficult to add timestamps




          • because kernel messages sent via UDP are not record oriented.












    • what to do about an unexpected service failover?




      • the failover will happen without any prior warning if the watchdog timeout is shorter than the timeout of the kernel's warning mechanisms.




        • please check how you can access information that you don't have control over








      • antivirus software can cause watchdog applications to trigger






      • programs like shell scripts are vulnerable








    • what tools can we use for recording unexpected events?




      • System call auditing








    • systemtap example




      • obtaining SysRq-t messages might be helpful for analysis




        • lots of information comes out, which looks handy (I couldn't read it properly)








      • show threads terminated by signals




        • find out which thread exited








      • show disk i/o accounting upon thread exit




        • find out which thread is causing heavy disk i/o requests






        • writing a script lets you see things like where the slowdown is.






        • looks like you can see the pid, tid, and so on?








      • detect the dentry cache bouncing bomb




        • see the slides for the script








      • watch out for integer overflow problems (an integer type running out of digits, if I remember right?)




        • x86_64 servers which create many dentries






        • I thought I heard "infinite loop"?!






        • this kind of problem suddenly happens one day




          • scary, scary








        • apparently there is something called the "208.5 days problem"?




          • don't forget when your server was rebooted (if you can't update the kernel for some reason)










      • find the process sending unidentified packets






      • is systemtap good at everything?




        • not only performance measurement functionality but also tracing functionality






        • allows probing almost everywhere






        • SystemTap is not a tool designed for monitoring over years






        • please check whether SystemTap is suitable for solving your problem
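
The 208.5 days note above suggests keeping track of how long a server has been up when the kernel cannot be updated. Here is a minimal sketch of such a check. It assumes the threshold corresponds to a nanosecond value reaching 2^54 (which works out to about 208.5 days; this derivation is an assumption, not something taken from the talk) and that uptime is read from /proc/uptime. The function names are hypothetical.

```python
from pathlib import Path

SECONDS_PER_DAY = 86_400
# Assumption: the problem appears once uptime reaches 2**54 nanoseconds.
LIMIT_DAYS = 2**54 / 1e9 / SECONDS_PER_DAY


def parse_uptime(text):
    """Return uptime in seconds from the contents of /proc/uptime."""
    # The first field of /proc/uptime is the uptime in seconds.
    return float(text.split()[0])


def days_until_limit(uptime_seconds):
    """Days left before the assumed limit; negative once it has passed."""
    return LIMIT_DAYS - uptime_seconds / SECONDS_PER_DAY


def report(path="/proc/uptime"):
    """Build a one-line summary for the local machine (Linux only)."""
    uptime = parse_uptime(Path(path).read_text())
    return (f"up {uptime / SECONDS_PER_DAY:.1f} days, "
            f"{days_until_limit(uptime):.1f} days until the assumed limit")
```

Running `print(report())` from cron and alerting when the remaining days get small would be one way to schedule a reboot before the limit, if the assumption holds for the kernel in question.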












  • preparing for userspace problems




    • what tools can we use for tracking user space problems?




      • where is the location of the "log file"?








    • TOMOYO Linux




      • tracking / restricting various operations from boot






      • the mainline version has been available since the Linux 2.6.30 kernel






      • apparently it records all sorts of things






      • Name based access tracking






      • various things around the syslog daemon, and so on






      • if you are interested in TOMOYO Linux, please visit the TOMOYO Linux page






      • AKARI package enable








    • single-function LSM modules




      • AKARI is an unexpected usage of the LSM interface






      • you can see all sorts of things






      • adding single-function LSM modules to the mainline Linux kernel is difficult




        • due to LSM's exclusiveness






        • consult me if










    • track program execution




      • AKARI requires rebooting the system in order to obtain the complete process history from boot






      • there is some kind of program on the slide, but I can't read it ><








    • a bit longer script




      • if you add probes at file open functions,






      • again there are logs and programs on the slide, but I can't read them at all ><








    • CaitSith




      • a new type of rule-based in-kernel access auditing and restricting tool




        • used as an auditing tool at the security contest 2013 held in Japan










    • conclusion




      • let's look at the slides
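
Going back to the netconsole notes in the kernel-space section: timestamps are hard to add because kernel messages sent via UDP are not record oriented, so one datagram does not necessarily hold exactly one message. Here is a minimal sketch of a receiver that works around this by buffering bytes per sender and only stamping complete lines. The port 6666 and the names `feed`, `stamp` and `serve` are assumptions for illustration, and the stamp is the receive time, which only approximates when the kernel emitted the message.

```python
import socket
import time


def feed(pending, datagram):
    """Append a datagram to the pending bytes and split off complete lines.

    Returns (complete_lines, leftover_bytes); the leftover is an unfinished
    line that must wait for the next datagram.
    """
    data = pending + datagram
    *lines, rest = data.split(b"\n")
    return lines, rest


def stamp(line, now):
    """Prefix one complete line with the local receive time."""
    ts = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    return f"{ts} {line.decode('utf-8', errors='replace')}"


def serve(port=6666):
    """Receive UDP datagrams forever and print timestamped lines."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    pending = {}  # unfinished line per sending host
    while True:
        datagram, (host, _) = sock.recvfrom(65535)
        lines, pending[host] = feed(pending.get(host, b""), datagram)
        for line in lines:
            print(host, stamp(line, time.time()), flush=True)
```

Calling `serve()` on the log-collecting machine would print one stamped line per kernel message; a message split across two datagrams is held back until its newline arrives.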
















posted by Dahlia* at 14:46 | Comment(0) | Diary