Hello Lars,

[...]

Yesterday evening, one of the machines crashed again during data transfer. I don't know why yet, because the machine is at a different location from mine; maybe I can see something on Tuesday. I rebooted the secondary to prevent it from crashing, too. It was syslogd that crashed. Unfortunately, the log was not written to the messages file, only to the screen.

From the previous days, I was able to see these DRBD errors:

Oct 27 16:03:03 FAGINTSC kernel: drbd11: Epoch set size wrong!!found=1 reported=0
Oct 27 16:03:10 FAGINTSC kernel: drbd11: Epoch set size wrong!!found=364 reported=363
Oct 27 16:04:14 FAGINTSC kernel: drbd11: Epoch set size wrong!!found=372 reported=371
Oct 27 16:04:41 FAGINTSC kernel: drbd11: tl messed up!
Oct 27 16:04:41 FAGINTSC kernel: drbd11: invalid barrier number!!found=4522056, reported=42224
Oct 27 16:04:41 FAGINTSC kernel: drbd11: Epoch set size wrong!!found=197 reported=63
Oct 27 16:04:44 FAGINTSC kernel: drbd11: [bdflush/6] sock_sendmsg timeout count down: ko=4294967295
Oct 27 16:04:50 FAGINTSC kernel: drbd11: [bdflush/6] sock_sendmsg timeout count down: ko=4294967294
Oct 27 16:04:56 FAGINTSC kernel: drbd11: [bdflush/6] sock_sendmsg timeout count down: ko=4294967293
Oct 27 16:05:02 FAGINTSC kernel: drbd11: [bdflush/6] sock_sendmsg timeout count down: ko=4294967292
[...]
Oct 27 16:20:02 FAGINTSC kernel: drbd11: [bdflush/6] sock_sendmsg timeout count down: ko=4294967142
Oct 27 16:20:08 FAGINTSC kernel: drbd11: [bdflush/6] sock_sendmsg timeout count down: ko=4294967141
Oct 27 16:20:14 FAGINTSC kernel: drbd11: [bdflush/6] sock_sendmsg timeout count down: ko=4294967140

Oct 29 19:52:33 fagintsc kernel: drbd16: Epoch set size wrong!!found=1138 reported=1137
Oct 29 19:53:32 fagintsc kernel: drbd16: tl messed up!
Oct 29 19:53:32 fagintsc kernel: drbd16: invalid barrier number!!found=0, reported=2739
Oct 29 19:53:32 fagintsc kernel: drbd16: Epoch set size wrong!!found=1331 reported=241
Oct 29 19:53:32 fagintsc kernel: drbd16: invalid barrier number!!found=0, reported=2740
Oct 29 19:53:32 fagintsc kernel: drbd16: Epoch set size wrong!!found=5 reported=260
Oct 29 19:53:32 fagintsc kernel: drbd16: invalid barrier number!!found=0, reported=2741
Oct 29 19:53:32 fagintsc kernel: drbd16: Epoch set size wrong!!found=5 reported=238
Oct 29 19:53:32 fagintsc kernel: drbd16: invalid barrier number!!found=0, reported=2742
Oct 29 19:53:32 fagintsc kernel: drbd16: Epoch set size wrong!!found=5 reported=239

After each of these sequences, the machine was dead.

Kind regards,
Andreas Hartmann