On Wed, Jul 18, 2007 at 04:19:02PM -0500, alex at crackpot.org wrote:
> My 2 drbd boxen are called 42 and 43.
> drbd version: 0.7.16 (api:77/proto:74)
>
> * Today, 42 was primary.
> * A co-worker noticed that it was not connected to 43. (42 =
>   'st:Primary/Unknown ld:Consistent', 43 = 'st:Secondary/Unknown
>   ld:Consistent')
> * I saw that 43 said 'cs:WFConnection'. Co-worker did 'drbdadm
>   connect' on 42, and it kernel panicked.

what cs: was 42 in, before the "drbdadm connect"?
what is in the kernel logs?
what led to them being disconnected in the first place?
what does the panic/oops look like?
did it panic in drbd or somewhere else?
was it an "intentional" panic?

> * 43 took over as primary as it should. (with out-of-date data)
> * When 42 was rebooted, it entered Secondary status and performed a
>   sync of data from 43. Since the 2 boxes had been disconnected for
>   several days, the data on 43 was old, and the newer data from 42 was
>   overwritten.

yes.

> We're getting backup restores from tape. We've added better
> monitoring to catch when drbd disconnects in the future.
>
> I am writing because up to this point I thought that a 'drbdadm
> connect' was a fairly safe command to issue. Are there circumstances
> under which it should not be done, or which may cause a panic as we
> saw today?

those would be bugs.
some of them might be fixed already,
you are 0.7.16, we are 0.7.24?

> Would doing 'drbdadm disconnect' before 'drbdadm connect'
> have made a difference?

hard to say. maybe. probably not.

> If the 2 boxes disconnect in the future (for network failure or
> whatever other reason), what is the safe way to get them talking again?

-- 
: Lars Ellenberg                            Tel +43-1-8178292-0  :
: LINBIT Information Technologies GmbH      Fax +43-1-8178292-82 :
: Vivenotgasse 48, A-1120 Vienna/Europe    http://www.linbit.com :
__
please use the "List-Reply" function of your email client.
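
A minimal sketch of the kind of disconnect monitoring mentioned above, in Python. It only assumes the 0.7-style status fields quoted in this thread (cs:, st:, ld:) appearing on one line per device in /proc/drbd; the exact line layout, the function names and the exit code are illustrative assumptions, so compare against your own /proc/drbd output before relying on it.

```python
import re

# Assumed layout of a per-device status line, pieced together from the
# fields quoted in this thread, e.g.:
#   " 0: cs:WFConnection st:Secondary/Unknown ld:Consistent"
# This is an assumption, not taken from the drbd documentation.
STATUS_RE = re.compile(
    r"^\s*(?P<minor>\d+):\s+cs:(?P<cs>\S+)"
    r"\s+st:(?P<local>[^/\s]+)/(?P<peer>\S+)"
    r"\s+ld:(?P<ld>\S+)",
    re.MULTILINE,
)


def find_problems(proc_drbd_text):
    """Return one warning string per device that does not look connected."""
    problems = []
    for m in STATUS_RE.finditer(proc_drbd_text):
        # Anything other than cs:Connected is reported. Resync states would
        # be flagged too; whitelist them here if that is too noisy.
        if m.group("cs") != "Connected" or m.group("peer") == "Unknown":
            problems.append(
                "drbd%s: cs:%s st:%s/%s ld:%s"
                % (m.group("minor"), m.group("cs"), m.group("local"),
                   m.group("peer"), m.group("ld"))
            )
    return problems


def main(path="/proc/drbd"):
    """Print warnings and return a non-zero status if any device is off."""
    with open(path) as f:
        problems = find_problems(f.read())
    for line in problems:
        print(line)
    # 2 is a hypothetical choice, matching the usual "critical" exit code
    # of Nagios-style checks.
    return 2 if problems else 0
```

Run from cron or a monitoring agent via sys.exit(main()), this would have flagged both boxes in the state described above (43 in cs:WFConnection, 42 with st:Primary/Unknown) on the first day of the disconnect rather than several days later. It only reports; it deliberately does not try to reconnect anything.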