[DRBD-user] Found a new disk flush error code

Thomas Reinhold it-beratung at thomasreinhold.de
Wed Jan 7 09:59:21 CET 2009


> Hi Lars,
>
> thanks for your reply.
>
> I have attached the requested information (I could not retrieve the
> output from /proc/sysrq-trigger, because I only have an SSH
> connection).
>
>
>>> The question is if that could be related to DRBD?
>>
>> hard to say yes or no without more information
>> about your setup and the nature of your stress tests.
>> you have to investigate that yourself, I guess.
>>
>> drbd status during such periods?
>>
>> drbd messages or other "interesting" messages?
>>
>> does it help to disconnect (physically, if necessary) drbd?
>>
>> does it also happen with disconnected drbd (StandAlone)?
>
>
> I was running dbench for a duration of 7 hours on the primary node.
> After about half of that time, the described soft lockups occurred. I
> managed to reboot the machine remotely using /proc/sysrq-trigger.
> While I was trying to check/mount the XFS filesystem on the
> secondary, I received similar soft lockup errors that rendered the
> machine unusable and required a hard reboot. There are no unusual
> DRBD messages in the logfiles.
>
> Unfortunately, I have neither physical access to the machines nor
> time for testing, as there is a rather short timeframe for going
> into production. So I updated the kernel to 2.6.24-6 ("Debian Etch and a
> half") and manually compiled a newer version of my RAID controller
> driver (megaraid_sas). Additionally, I updated to DRBD 8.0.14, as
> this version had become available in Debian backports. I have not been
> able to reproduce the error since then.
>
>
>>> I'm getting more and more convinced that this issue is due to the
>>> "certified" SCSI driver not working properly,
>>
>> why is that?
>
> Well, I'm using a slightly altered version of Debian Etch, which has  
> been released for the specific hardware I use (FSC Primergy RX300S4  
> with LSI 1078 RAID controller). However, the megaraid_sas driver for  
> that specific controller is the regular Etch version, which in turn  
> is taken from vanilla 2.6.18-6 and apparently has not been changed  
> since. This version is over two years old and is known to have
> sporadic problems under heavy I/O, causing all kinds of symptoms in
> layers on top of the block device driver (e.g. filesystem errors).
>
> To conclude: I hope to have solved this issue for now, and it  
> appears NOT to be related to DRBD. If I get any contradictory  
> information, I'll get back to you (hopefully not ;-) ).
>
> Thanks again!
>
>   Thomas
>
>
> ------------------------------------------------------------
> ps -eo pid,state,wchan:40,cmd:
>
>>   PID S WCHAN                                    CMD
>>     1 S -                                        init [2]
>>     2 S migration_thread                         [migration/0]
>>     3 S ksoftirqd                                [ksoftirqd/0]
>>     4 S watchdog                                 [watchdog/0]
>>     5 S migration_thread                         [migration/1]
>>     6 S ksoftirqd                                [ksoftirqd/1]
>>     7 S watchdog                                 [watchdog/1]
>>     8 S migration_thread                         [migration/2]
>>     9 S ksoftirqd                                [ksoftirqd/2]
>>    10 S watchdog                                 [watchdog/2]
>>    11 S migration_thread                         [migration/3]
>>    12 S ksoftirqd                                [ksoftirqd/3]
>>    13 S watchdog                                 [watchdog/3]
>>    14 S migration_thread                         [migration/4]
>>    15 S ksoftirqd                                [ksoftirqd/4]
>>    16 S watchdog                                 [watchdog/4]
>>    17 S migration_thread                         [migration/5]
>>    18 R -                                        [ksoftirqd/5]
>>    19 R -                                        [watchdog/5]
>>    20 R -                                        [migration/6]
>>    21 R -                                        [ksoftirqd/6]
>>    22 R -                                        [watchdog/6]
>>    23 S migration_thread                         [migration/7]
>>    24 S ksoftirqd                                [ksoftirqd/7]
>>    25 S watchdog                                 [watchdog/7]
>>    26 S worker_thread                            [events/0]
>>    27 S worker_thread                            [events/1]
>>    28 S worker_thread                            [events/2]
>>    29 S worker_thread                            [events/3]
>>    30 S worker_thread                            [events/4]
>>    31 R -                                        [events/5]
>>    32 R -                                        [events/6]
>>    33 S worker_thread                            [events/7]
>>    34 S worker_thread                            [khelper]
>>    35 S worker_thread                            [kthread]
>>    46 S worker_thread                            [kblockd/0]
>>    47 S worker_thread                            [kblockd/1]
>>    48 S worker_thread                            [kblockd/2]
>>    49 S worker_thread                            [kblockd/3]
>>    50 S worker_thread                            [kblockd/4]
>>    51 S worker_thread                            [kblockd/5]
>>    52 S worker_thread                            [kblockd/6]
>>    53 S worker_thread                            [kblockd/7]
>>    54 S worker_thread                            [kacpid]
>>   210 S hub_thread                               [khubd]
>>   212 S serio_thread                             [kseriod]
>>   292 D -                                        [pdflush]
>>   293 D -                                        [pdflush]
>>   294 S kswapd                                   [kswapd0]
>>   295 S worker_thread                            [aio/0]
>>   296 S worker_thread                            [aio/1]
>>   297 S worker_thread                            [aio/2]
>>   298 S worker_thread                            [aio/3]
>>   299 S worker_thread                            [aio/4]
>>   300 S worker_thread                            [aio/5]
>>   301 S worker_thread                            [aio/6]
>>   302 S worker_thread                            [aio/7]
>>   861 S scsi_error_handler                       [scsi_eh_0]
>>  1147 S worker_thread                            [ata/0]
>>  1148 S worker_thread                            [ata/1]
>>  1149 S worker_thread                            [ata/2]
>>  1150 S worker_thread                            [ata/3]
>>  1151 S worker_thread                            [ata/4]
>>  1152 S worker_thread                            [ata/5]
>>  1153 S worker_thread                            [ata/6]
>>  1154 S worker_thread                            [ata/7]
>>  1155 S worker_thread                            [ata_aux]
>>  1170 D -                                        [scsi_eh_1]
>>  1171 S scsi_error_handler                       [scsi_eh_2]
>>  1465 S kjournald                                [kjournald]
>>  1703 S -                                        udevd --daemon
>>  2143 S worker_thread                            [kpsmoused]
>>  2515 S worker_thread                            [cqueue/0]
>>  2516 D drbd_req_state                           [cqueue/1]
>>  2517 S worker_thread                            [cqueue/2]
>>  2518 S worker_thread                            [cqueue/3]
>>  2519 S worker_thread                            [cqueue/4]
>>  2520 S worker_thread                            [cqueue/5]
>>  2521 S worker_thread                            [cqueue/6]
>>  2522 S worker_thread                            [cqueue/7]
>>  2558 S worker_thread                            [kmirrord]
>>  2583 S worker_thread                            [kcryptd/0]
>>  2584 S worker_thread                            [kcryptd/1]
>>  2585 S worker_thread                            [kcryptd/2]
>>  2586 S worker_thread                            [kcryptd/3]
>>  2587 S worker_thread                            [kcryptd/4]
>>  2588 S worker_thread                            [kcryptd/5]
>>  2589 S worker_thread                            [kcryptd/6]
>>  2590 S worker_thread                            [kcryptd/7]
>>  2625 S kjournald                                [kjournald]
>>  2758 S -                                        /sbin/portmap
>>  2961 S -                                        /sbin/syslogd
>>  2967 S syslog                                   /sbin/klogd -x
>>  3070 S -                                        /usr/sbin/acpid -c /etc/acpi/events -s /var/run/acpid.socket
>>  3215 S -                                        /usr/sbin/inetd
>>  3277 S -                                        /usr/lib/postfix/master
>>  3284 S -                                        qmgr -l -t fifo -u
>>  3287 S -                                        /usr/sbin/snmpd -Lsd -Lf /dev/null -u snmp -p /var/run/snmpd.pid 127.0.0.1
>>  3293 S -                                        /usr/sbin/sshd
>>  3330 S -                                        /sbin/rpc.statd
>>  3475 S stext                                    /opt/SMAW/RAID/amDaemon
>>  3490 S -                                        [drbd0_worker]
>>  3505 R -                                        [drbd1_worker]
>>  3531 R -                                        [drbd1_receiver]
>>  3542 R -                                        [drbd1_asender]
>>  3555 S -                                        /opt/SMAW/RAID/amDaemon
>>  3560 S -                                        /usr/sbin/atd
>>  3568 S -                                        /usr/sbin/cron
>>  4197 S -                                        /sbin/getty 38400 tty2
>>  4198 S -                                        /sbin/getty 38400 tty3
>>  4199 S -                                        /sbin/getty 38400 tty4
>>  4200 S -                                        /sbin/getty 38400 tty5
>>  4201 S -                                        /sbin/getty 38400 tty6
>>  5756 S -                                        /sbin/getty 38400 tty1
>>  7396 S worker_thread                            [xfslogd/0]
>>  7397 S worker_thread                            [xfslogd/1]
>>  7398 S worker_thread                            [xfslogd/2]
>>  7399 S worker_thread                            [xfslogd/3]
>>  7400 S worker_thread                            [xfslogd/4]
>>  7401 R -                                        [xfslogd/5]
>>  7402 S worker_thread                            [xfslogd/6]
>>  7403 S worker_thread                            [xfslogd/7]
>>  7404 S worker_thread                            [xfsdatad/0]
>>  7405 S worker_thread                            [xfsdatad/1]
>>  7406 S worker_thread                            [xfsdatad/2]
>>  7407 S worker_thread                            [xfsdatad/3]
>>  7408 S worker_thread                            [xfsdatad/4]
>>  7409 S worker_thread                            [xfsdatad/5]
>>  7410 S worker_thread                            [xfsdatad/6]
>>  7411 S worker_thread                            [xfsdatad/7]
>>  7454 S -                                        ha_logd: read process
>>  7462 S -                                        ha_logd: write process
>>  7581 S 562640683272                             heartbeat: master control process
>>  7584 S pipe_wait                                heartbeat: FIFO reader
>>  7585 S 279172841735                             heartbeat: write: bcast eth1
>>  7586 S -                                        heartbeat: read: bcast eth1
>>  7587 S 279172841735                             heartbeat: write: mcast eth0
>>  7588 S -                                        heartbeat: read: mcast eth0
>>  7589 S 279172874239                             heartbeat: write: serial /dev/ttyS0
>>  7590 S -                                        heartbeat: read: serial /dev/ttyS0
>>  7609 S -                                        /usr/lib64/heartbeat/dopd
>>  1514 S stext                                    /usr/sbin/eecd
>>  1518 S -                                        /etc/srvmagt/SCS/SVRemoteConnector -ci/etc/srvmagt/SCS/Provider/xml -ssl_servcert=/etc/srvmagt/SCS/ssl/server_key.crt -ssl_capath=/etc/srvmagt/SCS/ssl
>>  1521 S -                                        /usr/sbin/scagt
>>  1523 S stext                                    /usr/sbin/sc2agt
>>  1525 S -                                        /usr/sbin/busagt
>>  1527 D blk_execute_rq                           /usr/sbin/hdagt
>>  1530 S -                                        /usr/sbin/unixagt
>>  1532 S -                                        /usr/sbin/etheragt
>>  1534 S -                                        /usr/sbin/biosagt
>>  1536 S -                                        /usr/sbin/securagt
>>  1538 S stext                                    /usr/sbin/statusagt
>>  1540 S -                                        /usr/sbin/invagt
>>  1542 S stext                                    /usr/sbin/vvagt
>>  1786 S -                                        /usr/sbin/openvpn --writepid /var/run/openvpn.client.pid --daemon ovpn-client --status /var/run/openvpn.client.status 10 --cd /etc/openvpn --config /etc/openvpn/client.conf
>>  7088 S -                                        [xfsbufd]
>>  7094 D drbd_al_begin_io                         [xfsbufd]
>>  7095 D -                                        [xfssyncd]
>>  8333 S -                                        /usr/bin/vmnet-bridge -d /var/run/vmnet-bridge-0.pid -n 0 -i eth0
>>  8598 S 11371755398898876679                     /usr/sbin/vmware-authdlauncher
>>  8608 S wait                                     /bin/sh /usr/bin/vmware-watchdog -s webAccess -u 30 -q 5 /usr/lib/vmware/webAccess/java/jre1.5.0_15/bin/webAccess -client -Xmx64m -XX:MinHeapFreeRatio=30 -XX:MaxHeapFreeRatio=30 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/common/endorsed -classpath /usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/bin/bootstrap.jar:/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/bin/commons-logging-api.jar -Dcatalina.base=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16 -Dcatalina.home=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16 -Djava.io.tmpdir=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/temp org.apache.catalina.startup.Bootstrap start
>>  8617 S stext                                    /usr/lib/vmware/webAccess/java/jre1.5.0_15/bin/webAccess -client -Xmx64m -XX:MinHeapFreeRatio=30 -XX:MaxHeapFreeRatio=30 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.endorsed.dirs=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/common/endorsed -classpath /usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/bin/bootstrap.jar:/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/bin/commons-logging-api.jar -Dcatalina.base=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16 -Dcatalina.home=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16 -Djava.io.tmpdir=/usr/lib/vmware/webAccess/tomcat/apache-tomcat-6.0.16/temp org.apache.catalina.startup.Bootstrap start
>>  8782 Z exit                                     [vmware-vmx] <defunct>
>>  8801 S rtc_read                                 [vmware-rtc]
>> 16026 D -                                        dbench -t 25000 20
>> 16027 D -                                        dbench -t 25000 20
>> 16028 D -                                        dbench -t 25000 20
>> 16029 R -                                        dbench -t 25000 20
>> 16030 D -                                        dbench -t 25000 20
>> 16031 D -                                        dbench -t 25000 20
>> 16032 D -                                        dbench -t 25000 20
>> 16033 D -                                        dbench -t 25000 20
>> 16034 D -                                        dbench -t 25000 20
>> 16035 D -                                        dbench -t 25000 20
>> 16036 D -                                        dbench -t 25000 20
>> 16037 D -                                        dbench -t 25000 20
>> 16038 D -                                        dbench -t 25000 20
>> 16039 D -                                        dbench -t 25000 20
>> 16040 D -                                        dbench -t 25000 20
>> 16041 D -                                        dbench -t 25000 20
>> 16042 D -                                        dbench -t 25000 20
>> 16043 D -                                        dbench -t 25000 20
>> 16044 D -                                        dbench -t 25000 20
>> 16045 D -                                        dbench -t 25000 20
>> 25562 D -                                        ls --color=auto -la
>> 25711 D flush_cpu_workqueue                      vmrun -T server -h https://127.0.0.1:8333/sdk -u "" -p "" stop [standard] testvm2/testvm.vmx soft
>> 25712 D -                                        sshd: root@pts/3
>> 26119 D -                                        sshd: root@pts/4
>> 26372 D -                                        sshd: root@pts/5
>> 26498 D -                                        sshd: root@pts/6
>> 26678 D -                                        sshd: root@pts/7
>> 27455 D -                                        sshd: root@pts/8
>> 28562 D -                                        [bash]
>> 28691 D -                                        shutdown -r 0 w
>> 29107 D -                                        umount /srv/vmware/virtual_machines
>> 29109 D -                                        sshd: root@pts/12
>> 29264 D -                                        sshd: root@pts/13
>> 29580 D -                                        sshd: root@pts/14
>> 30031 D -                                        sshd: root@pts/15
>> 30366 D -                                        sshd: root@pts/16
>> 30555 D -                                        sshd: root@pts/17
>> 30825 D -                                        [drbdsetup]
>> 31001 D -                                        sshd: root@pts/19
>> 31575 D -                                        sshd: root@pts/20
>> 31815 D -                                        sshd: root@pts/21
>> 32168 D -                                        sshd: root@pts/22
>>  2473 S pipe_wait                                /USR/SBIN/CRON
>>  2474 S wait                                     /bin/sh -c test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
>>  2477 S wait                                     /bin/sh -c test -x /usr/sbin/anacron || ( cd / && run-parts --report /etc/cron.daily )
>>  2478 S -                                        run-parts --report /etc/cron.daily
>>  2507 S wait                                     /bin/sh /etc/cron.daily/find
>>  2509 S wait                                     /bin/sh /usr/bin/updatedb
>>  2525 S wait                                     /bin/sh /usr/bin/updatedb
>>  2528 S pipe_wait                                /usr/bin/sort -z -f
>>  2529 S pipe_wait                                /usr/lib/locate/frcode -0
>>  2533 S wait                                     su nobody -s /bin/sh -c /usr/bin/find / -ignore_readdir_race      \( -fstype NFS -o -fstype nfs -o -fstype nfs4 -o -fstype afs -o -fstype binfmt_misc -o -fstype proc -o -fstype smbfs -o -fstype autofs -o -fstype iso9660 -o -fstype ncpfs -o -fstype coda -o -fstype devpts -o -fstype ftpfs -o -fstype devfs -o -fstype mfs -o -fstype shfs -o -fstype sysfs -o -fstype cifs -o -fstype lustre_lite -o -fstype tmpfs -o -fstype usbfs -o -fstype udf -o      -type d -regex '\(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)\|\(^/afs$\)\|\(^/amd$\)\|\(^/alex$\)\|\(^/var/spool$\)\|\(^/sfs$\)\|\(^/media$\)' \) -prune -o -print0
>>  2534 D -                                        /usr/bin/find / -ignore_readdir_race ( -fstype NFS -o -fstype nfs -o -fstype nfs4 -o -fstype afs -o -fstype binfmt_misc -o -fstype proc -o -fstype smbfs -o -fstype autofs -o -fstype iso9660 -o -fstype ncpfs -o -fstype coda -o -fstype devpts -o -fstype ftpfs -o -fstype devfs -o -fstype mfs -o -fstype shfs -o -fstype sysfs -o -fstype cifs -o -fstype lustre_lite -o -fstype tmpfs -o -fstype usbfs -o -fstype udf -o -type d -regex \(^/tmp$\)\|\(^/usr/tmp$\)\|\(^/var/tmp$\)\|\(^/afs$\)\|\(^/amd$\)\|\(^/alex$\)\|\(^/var/spool$\)\|\(^/sfs$\)\|\(^/media$\) ) -prune -o -print0
>>  2712 D -                                        sshd: root@pts/23
>>  2960 D -                                        sshd: root@pts/24
>>  3194 D -                                        sshd: root@pts/25
>>  3252 D -                                        sshd: root@pts/26
>>  4097 D -                                        sshd: root@pts/27
>>  4750 D -                                        sshd: root@pts/28
>>  4965 D -                                        sshd: root@pts/29
>>  5274 D -                                        sshd: root@pts/30
>>  5685 D -                                        sshd: root@pts/31
>>  6150 D -                                        sshd: root@pts/32
>>  6695 D -                                        sshd: root@pts/33
>>  7601 S -                                        pickup -l -t fifo -u -c
>>  8248 S -                                        sshd: root@pts/34
>>  8257 S wait                                     -bash
>>  8300 R -                                        ps -eo pid,state,wchan:40,cmd
>>
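
For anyone wading through a listing like the one above, here is a minimal Python sketch (not part of the original mail) that tallies the process states and shows which wait channels the D-state (uninterruptible sleep) tasks are blocked in. The names parse_ps, summarize and SAMPLE are made up for illustration; the sample rows are copied from the listing, and wrapped command continuation lines are simply skipped because state and wchan always sit on the first line of a row.

```python
import re
from collections import Counter

# One process row: PID, single-letter state, wait channel, command.
ROW = re.compile(r"^\s*(\d+)\s+([A-Z])\s+(\S+)\s+(.*)$")


def parse_ps(text):
    """Parse `ps -eo pid,state,wchan:40,cmd` output into a list of dicts.

    Leading mail quote markers ('>') are stripped. Lines that do not look
    like a process row (header, wrapped command tails) are ignored.
    """
    rows = []
    for line in text.splitlines():
        line = re.sub(r"^[>\s]*", "", line)  # drop quote markers and indent
        m = ROW.match(line)
        if m:
            pid, state, wchan, cmd = m.groups()
            rows.append({"pid": int(pid), "state": state,
                         "wchan": wchan, "cmd": cmd.strip()})
    return rows


def summarize(rows):
    """Return (count per state, count per wait channel among D-state tasks)."""
    states = Counter(r["state"] for r in rows)
    blocked = Counter(r["wchan"] for r in rows if r["state"] == "D")
    return states, blocked


# A few rows copied from the listing above (hypothetical test input).
SAMPLE = """\
>>   PID S WCHAN                                    CMD
>>   292 D -                                        [pdflush]
>>  2516 D drbd_req_state                           [cqueue/1]
>>  1527 D blk_execute_rq                           /usr/sbin/hdagt
>>  7094 D drbd_al_begin_io                         [xfsbufd]
>>  3490 S -                                        [drbd0_worker]
>>  4197 S -                                        /sbin/getty 38400
>> tty2
"""

if __name__ == "__main__":
    states, blocked = summarize(parse_ps(SAMPLE))
    print("states:", dict(states))
    print("D-state wait channels:", dict(blocked))
```

Run against the full listing, a tally like this makes it easier to see which D-state tasks report a DRBD wait channel (drbd_req_state, drbd_al_begin_io), which sit in the block layer (blk_execute_rq), and how many report no wait channel at all.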


