On Tuesday 09 August 2005 23:51, Dan Cunningham wrote:
> So last night my server started doing the same thing! We have a 1.5 TB
> array (80% utilized) and noticed some users, including the root user,
> were getting permission denied messages when accessing certain
> directories. We have the same reiser error in the message log each time
> someone tries to access one of the corrupt folders. I actually
> disconnected the secondary this weekend, expecting to do some work on
> it. I don't think that's the problem, but what I am wondering is
> whether the corruption will sync over if I reconnect the servers. The
> hardware is a dell scsi storage vault connected to a dell 2650 running
> 2.6.8-2/debian with drbd 7. Any ideas or suggestions??? Extended
> downtime for fsck is my last option :-( Also I checked my partitions
> and the LVM volume has 1 GB more space than the reiserfs on top of it
> (i.e. drbd has 1 GB for meta info)
>

Sure, the filesystem corruption will sync over when the second box becomes connected.

Well, before our server went into production, I had already thought about the problem of how long reiserfsck can sometimes take. Actually drbd is the optimal solution:

1.) Tell the users that from now on everything they save on the failover device will be lost, until you tell them the problem is fixed.

2.) Disconnect the drbd device on the failover node and stop heartbeat, etc. on it.

3.) Do the fsck on the failover node and fix everything there. The main server stays as it is during this time and goes on serving the clients.

4.) When point 3 is finished, put the failover node into drbd primary state and the main server into secondary state, invalidate the data on the main node, and reconnect. The data should now go from the failover node to the main server. You lose everything that was written since you disconnected the failover node.

By doing the fsck on the disconnected failover node, you will also see how much time the fsck takes, so it's up to you to decide whether you prefer the downtime or the data loss from point 4 (all data the users have written in the meantime). I have to admit that this data loss is probably not acceptable for very important data (e.g. databases of online shops, etc.), but it would surely work with our users. I would probably even remount our home directory read-only.

Hope it helps,
Bernd

PS: I hope you noticed the announcement about the possible corruption with some recent kernel versions?

-- 
Bernd Schubert
PCI / Theoretische Chemie
Universität Heidelberg
INF 229
69120 Heidelberg
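
For reference, points 2 to 4 above could be written down as a checklist along the following lines. This is only a rough sketch that prints a plan and runs nothing. The resource name `r0`, the device `/dev/drbd0` and the Debian init script path are assumptions that do not come from the message, and the exact drbdadm behaviour (in particular whether `invalidate` works while disconnected) depends on the drbd version, so check each command against the man pages of your installation. The sketch also promotes the failover node before the fsck rather than after it, on the assumption that the drbd device can only be opened in primary state.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    node: str     # "failover" or "main"
    command: str  # command an admin would run by hand on that node
    note: str     # what the step is for


def build_plan(resource: str = "r0", device: str = "/dev/drbd0") -> List[Step]:
    """Return the offline-fsck procedure as an ordered checklist.

    `resource` and `device` are hypothetical defaults; use the names
    from your own drbd.conf. Nothing is executed here.
    """
    return [
        Step("failover", f"drbdadm disconnect {resource}",
             "point 2: stop replication so the fsck stays local"),
        Step("failover", "/etc/init.d/heartbeat stop",
             "point 2: keep heartbeat from interfering (assumed Debian path)"),
        Step("failover", f"drbdadm primary {resource}",
             "assumption: the device must be primary before it can be opened"),
        Step("failover", f"reiserfsck --check {device}",
             "point 3: time this run; repair as reiserfsck advises"),
        Step("main", f"drbdadm secondary {resource}",
             "point 4: the filesystem must be unmounted on main first"),
        Step("main", f"drbdadm invalidate {resource}",
             "point 4: mark the main node's data as out of date"),
        Step("failover", f"drbdadm connect {resource}",
             "point 4: reconnect; the full sync runs failover -> main"),
    ]


if __name__ == "__main__":
    for number, step in enumerate(build_plan(), start=1):
        print(f"{number}. [{step.node}] {step.command}  # {step.note}")
```

Everything written on the main server between the first and the last step is overwritten by the sync, which is the data loss described in point 4.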