Linux on a Tyan S2466N-4M Dual Processor Athlon

In about November 2002, I decided my 300 MHz Pentium machine was ready for retirement. I was tired of waiting for long package builds and crashes under X, but also of the noise. After considerable surfing, I settled on these specifications:

My search for a quiet machine led me to www.endpcnoise.com, which turned out to be N. W. Custom Computers in the state of Washington. They gave me a reasonable quote, so I placed the order. The machine was delivered in February 2003. They had chosen:


Hardware

case interior (509 KB) This is how the inside of the case looks (with the CPU fans removed). The ribbon cable to the CDROM drive (bottom center to top right) is stretched pretty tight. The case has two USB ports on the front, but they were not connected. The wiring is there, but it terminates in one plug for each wire, so you have to match up the color coding of the wires with the pin order on the motherboard. Part of the system integration job, I would have thought.

Zalman flower heatsinks (135 KB) The Zalman CPU heat sinks are so massive they exceed the maximum weight specification for the Athlon processors. Accordingly, the system was shipped with the heat sinks unmounted.

metal flakes (46 KB) When I unwrapped the heat sinks, I was somewhat annoyed to find some loose aluminum flakes left over from the machining. I only hoped none had fallen onto the motherboard during burn-in.

They included a package of Arctic Silver 3 so I could install the heat sinks myself. They didn't include any instructions, but I found them at the Arctic Silver web site.

CPU heatsinks installed (161 KB) This is how the heatsinks looked when installed. They had trimmed several fins to limit the interference.

video heatsink installed (82 KB) Here's how the Zalman video board heat sink looked when installed. The U-shaped heat pipe wraps around the board, so half of the cooling fins can be put on each side.

I booted the machine and was disappointed to hear how much noise it made. I guess I shouldn't have been surprised, since it has six fans (one for each CPU, two on the back of the case, and two in the power supply)! Well, it should at least stay cool.

Here are some of the reports from /proc:

==> cpuinfo <==
processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) Processor
stepping        : 1
cpu MHz         : 1800.079
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3591.37

processor       : 1
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 8
model name      : AMD Athlon(tm) Processor
stepping        : 1
cpu MHz         : 1800.079
cache size      : 256 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 3591.37


==> dma <==
 4: cascade

==> meminfo <==
	total:    used:    free:  shared: buffers:  cached:
Mem:  1056772096 1023979520 32792576        0 187002880 365514752
Swap: 912994304 128491520 784502784
MemTotal:      1032004 kB
MemFree:         32024 kB
MemShared:           0 kB
Buffers:        182620 kB
Cached:         299100 kB
SwapCached:      57848 kB
Active:         338684 kB
Inactive:       346616 kB
HighTotal:      130496 kB
HighFree:         2044 kB
LowTotal:       901508 kB
LowFree:         29980 kB
SwapTotal:      891596 kB
SwapFree:       766116 kB

==> version <==
Linux version 2.4.21 (jrv@vanzandt) (gcc version 3.3 (Debian)) #4 SMP Sun Jul 6 10:15:42 EDT 2003

==> pci <==
PCI devices found:
  Bus  0, device   0, function  0:
    Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller (rev 17).
      Master Capable.  Latency=64.  
      Prefetchable 32 bit memory at 0xf0000000 [0xf3ffffff].
      Prefetchable 32 bit memory at 0xec500000 [0xec500fff].
      I/O at 0x1410 [0x1413].
  Bus  0, device   1, function  0:
    PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge (rev 0).
      Master Capable.  Latency=99.  Min Gnt=12.
        ...
    

lspci reports these PCI devices (note the rev number of the System Controller differs from what's in /proc/pci):

vanzandt:/proc# /usr/bin/lspci
00:00.0 Host bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] System Controller (rev 11)
00:01.0 PCI bridge: Advanced Micro Devices [AMD] AMD-760 MP [IGD4-2P] AGP Bridge
00:07.0 ISA bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ISA (rev 05)
00:07.1 IDE interface: Advanced Micro Devices [AMD] AMD-768 [Opus] IDE (rev 04)
00:07.3 Bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] ACPI (rev 03)
00:09.0 SCSI storage controller: Adaptec AIC-7892A U160/m (rev 02)
00:10.0 PCI bridge: Advanced Micro Devices [AMD] AMD-768 [Opus] PCI (rev 05)
01:05.0 VGA compatible controller: ATI Technologies Inc Radeon R250 If [Radeon 9000] (rev 01)
01:05.1 Display controller: ATI Technologies Inc Radeon R250 [Radeon 9000] (Secondary) (rev 01)
02:00.0 USB Controller: Advanced Micro Devices [AMD] AMD-768 [Opus] USB (rev 07)
02:05.0 Multimedia audio controller: Creative Labs SB Live! EMU10k1 (rev 07)
02:05.1 Input device controller: Creative Labs SB Live! MIDI/Game Port (rev 07)
02:08.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)
      

Only one of the disk drives has a temperature sensor:

vanzandt:# hddtemp /dev/hda /dev/hdc /dev/sda /dev/sdb
/dev/hda: ST380021A: 36 C
/dev/hdc: SONY CD-RW CRX210E1: S.M.A.R.T. not available
/dev/sda: SEAGATE ST336938LW:  known drive, but it doesn't have a temperature sensor.
/dev/sdb: SEAGATE ST336938LW:  known drive, but it doesn't have a temperature sensor.
    

Here is the IDE identification information for the Seagate Barracuda IDE drive:

# hdparm -I /dev/hda

/dev/hda:

non-removable ATA device, with non-removable media
	Model Number:		ST380021A                               
	Serial Number:		3HV366JT            
	Firmware Revision:	3.19    
Standards:
	Supported: 1 2 3 4 5 
	Likely used: 5
Configuration:
	Logical		max	current
	cylinders	16383	16383
	heads		16	16
	sectors/track	63	63
	bytes/track:	0		(obsolete)
	bytes/sector:	0		(obsolete)
	current sector capacity: 16514064
	LBA user addressable sectors = 156301488
Capabilities:
	LBA, IORDY(can be disabled)
	Buffer size: 2048.0kB	ECC bytes: 4	Queue depth: 1
	Standby timer values: spec'd by standard
	r/w multiple sector transfer: Max = 16	Current = 16
	Advanced power management level: 65278
	DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 *udma5 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	READ BUFFER cmd
	   *	WRITE BUFFER cmd
	   *	Host Protected Area feature set
	   *	look-ahead
	   *	write cache
	   *	Power Management feature set
		Security Mode feature set
	   *	SMART feature set
		SET MAX security extension
	   *	DOWNLOAD MICROCODE cmd
Security: 
	Master password revision code = 65534
		supported
	not	enabled
	not	locked
		frozen
	not	expired: security count
	not	supported: enhanced erase
HW reset results:
	CBLID- above Vih
	Device num = 1
Checksum: correct

Software

I installed Debian Linux 3.0 from CDROMs. I partitioned the disks like this:

vanzandt:/proc# fdisk -l

Disk /dev/sda: 255 heads, 63 sectors, 4492 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sda1             1         5     40131   83  Linux
/dev/sda2             6      3895  31246425   83  Linux
/dev/sda3          3896      4381   3903795   83  Linux
/dev/sda4          4382      4492    891607+  82  Linux swap

Disk /dev/sdb: 255 heads, 63 sectors, 4492 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1       596   4787338+  83  Linux
/dev/sdb2           597      4486  31246425   83  Linux

Disk /dev/sdc: 8 heads, 32 sectors, 484 cylinders
Units = cylinders of 256 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdc1   *         1       483     61808    6  FAT16

Disk /dev/hda: 255 heads, 63 sectors, 9729 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         5     40131   83  Linux
/dev/hda2             6      2437  19535040   83  Linux
/dev/hda3          2438      9729  58572990   83  Linux
    

The partitions are mounted like this:

# mount
/dev/sda3 on / type ext2 (rw,errors=remount-ro)
proc on /proc type proc (rw)
/dev/sdb1 on /usr type ext2 (rw)
/dev/hda1 on /boot type ext2 (rw)
/dev/md0 on /home type ext3 (rw)
/dev/hda2 on /debian type ext3 (rw)
/dev/hda3 on /backup type ext3 (rw)
    

md0 is a software RAID 1 drive consisting of sda2 and sdb2, hopefully giving me better performance than a single drive. The big partitions have ext3 filesystems, to save time running fsck. Usage so far is like this:

vanzandt:/proc/ide# df
Filesystem       1K-blocks      Used Available Use% Mounted on
/dev/sda3          3842408   1229920   2417300  34% /
/dev/sdb1          4712088   3057276   1415448  69% /usr
/dev/hda1            38859     22030     14823  60% /boot
/dev/md0          30755776   9700328  19493132  34% /home
/dev/hda2         19228308  13928264   4323292  77% /debian
/dev/hda3         57653696  45912060   8812988  84% /backup
    

I also created a directory /home/local, and made /usr/local a symbolic link to it.
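
Since /home lives on the md0 mirror, it can be worth checking occasionally that both halves are still active. The following is a minimal sketch, not part of the setup described here. It assumes the usual /proc/mdstat layout, in which a healthy two-disk mirror shows a status field of [UU] and a degraded one contains an underscore (for example [U_]). The function name md_degraded is hypothetical.

```shell
#!/bin/sh
# md_degraded: read /proc/mdstat-style text on stdin and print the name
# of every md array whose status field (e.g. [U_]) contains an underscore.
# The function name is hypothetical; the [UU]/[U_] layout is an assumption.
md_degraded() {
    awk '
        /^md[0-9]+ *:/ { name = $1 }          # remember the current array
        match($0, /\[[U_]+\]/) {              # line carrying the status field
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_") > 0) print name
        }
    '
}

# Typical use on a live system:
#   md_degraded < /proc/mdstat
```

No output means every array reported all of its members as up.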

The 2.4.19 Linux kernel is configured like this. Here are the boot messages. (Update 2006-05-06: I am now running a 2.6.15 kernel with this configuration, generating these boot messages.)

I installed XFree86 4.2.1 and the DRI driver packages xlibmesa3-dri-trunk_2002.11.06-2_i386.deb and xserver-xfree86-dri-trunk_2002.11.06-2_i386.deb. These Debian packages are maintained by Michel Dänzer (daenzer@debian.org) at http://people.debian.org/~daenzer/dri-trunk/. Here is the XF86Config-4, a boot log, and an xdpyinfo report.

The machine connects to the Internet through a router that blocks all incoming connections except ssh. I also set up a simple firewall for which I've made a Debian package.

In /backup, I maintain daily, weekly, and monthly backups in the form of complete copies of most of the rest of the system. They are kept up to date by these entries in /etc/crontab:

55 23  * * 1-6 root /usr/local/bin/backup-local daily
55 23  * * 7   root /usr/local/bin/backup-local weekly
55 22 27 * *   root /usr/local/bin/backup-local monthly
    

which call this simple script:

#!/bin/sh
if [ "$1" = "" ]; then T=daily; else T=$1; fi
mount /backup
for d in bin boot dev etc home lib opt root sbin usr var; do
nice mirrordir --restore-access /$d /backup/$T/$d
done
touch /backup/$T/TIMESTAMP
umount /backup
echo "updated /backup/$T" >/etc/BACKUP-$T
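
Because the script finishes by writing /etc/BACKUP-$T, the modification time of that file records when each kind of backup last completed, even while /backup is unmounted. A staleness check could be built on that. This is only a sketch: the function name backup_stale and the age limit in the usage comment are assumptions, not part of the original setup.

```shell
#!/bin/sh
# backup_stale FILE DAYS: succeed (exit 0) if FILE is missing, or if
# find's "-mtime +DAYS" test matches it. That test counts whole 24-hour
# periods, so "+1" matches files last modified at least two days ago.
# The function name is hypothetical.
backup_stale() {
    [ -e "$1" ] || return 0
    [ -n "$(find "$1" -mtime +"$2" -print)" ]
}

# Possible use, e.g. from a daily cron job:
#   if backup_stale /etc/BACKUP-daily 1; then
#       echo "warning: daily backup looks stale" >&2
#   fi
```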
    

Configuration

Sensors

Here is a working /etc/sensors.conf. The kernel configuration needs to include at least w83781d, hwmon_vid, and i2c_isa (see above for my .config file). The w83781d driver needs some options, so I compiled it as a module and put this in /etc/modprobe.d/w83781d:

options w83781d force_w83782d=0,0x2d force_subclients=0,0x2d,0x48,0x49 force_w83627hf=0,0x2c force_subclients=0,0x2c,0x4a,0x4b init=0
    
(Your distribution may provide another way to supply module options.) I also have a script /etc/init.d/lm_sensors which does this on startup:
	/usr/bin/sensors -s 1> /dev/null 2> /dev/null
    

I decided to record temperatures and fan RPMs periodically, so I added this to root's crontab (actually a new file /etc/cron.d/tempmon):

*/10 * * * *   root    [ -x /usr/local/bin/tempmon ] &&  /usr/local/bin/tempmon | logger -t tempmon
    
Here's the tempmon script:
#!/usr/bin/perl
#	tempmon - monitor temperatures and fans

use strict;

my @tok;
my $dev;
my $str="";
my $temp;
my $fans="";
my $rpm;

open(PROC,"/usr/sbin/hddtemp SATA:/dev/sda SATA:/dev/sdb SATA:/dev/sdc |");
# typical output:
# /dev/sda: ST3300831AS: 34 C
# /dev/sdb: ST3300831AS: 33 C
# /dev/sdc: ST3300831AS: 37 C
while(<PROC>){
    @tok=split;
    $tok[0] =~ m./dev/([a-z]*):.;
    $dev = $1;
    $str .= "  $dev $tok[2]";
}

open(PROC,"/usr/bin/sensors |");

while(<PROC>){
# typical output:
# ...
# VRM2 Temp:   +44 C  (high =    +0 C, hyst =  -128 C)   sensor = transistor 
# CPU1 Temp: +43.0 C  (high =   +80 C, hyst =   +75 C)   sensor = transistor 
# CPU2 Temp: +37.0 C  (high =   +80 C, hyst =   +75 C)   sensor = transistor 
# CPU1 Fan:    0 RPM  (min = 21093 RPM, div = 2)                     

    if (/Temp/){
	@tok = split;
	$dev=$tok[0];
	$temp=$tok[2]+0;
	$str .= "  $dev $temp";
    }
    if (/Fan/){
	@tok = split;
	$dev=$tok[0];
	$rpm=$tok[2];
	$fans .= "  $dev $rpm";
    }
}
printf "Temps:$str  Fans:$fans\n";
    
Here are some of the resulting log entries:
May  6 17:10:03 vanzandt tempmon: Temps:  sda 35  sdb 34  sdc 38  VRM1 39  AGP 40  DDR 41.5  VRM2 41  CPU1 40  CPU2 35.5  Fans:  CPU1 2732  CPU2 0  chs1 0  chs2 0  chs3 0
May  6 17:20:03 vanzandt tempmon: Temps:  sda 35  sdb 34  sdc 38  VRM1 39  AGP 40  DDR 42  VRM2 41  CPU1 40  CPU2 36.5  Fans:  CPU1 2732  CPU2 0  chs1 0  chs2 0  chs3 0
May  6 17:30:02 vanzandt tempmon: Temps:  sda 35  sdb 34  sdc 39  VRM1 40  AGP 40  DDR 42  VRM2 41  CPU1 40  CPU2 36.5  Fans:  CPU1 2743  CPU2 0  chs1 0  chs2 0  chs3 0
May  6 17:40:02 vanzandt tempmon: Temps:  sda 35  sdb 35  sdc 39  VRM1 40  AGP 40.5  DDR 42  VRM2 41  CPU1 40  CPU2 36.5  Fans:  CPU1 2732  CPU2 0  chs1 0  chs2 0  chs3 0
    
I plan to extend the script to send me mail if the CPU fan fails, and to halt the system if any temperatures get too high. Everything is running pretty cool now, though.
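
One way such an extension might look is a small filter that reads tempmon's output line and reports anything out of range. This is a sketch only, not the script actually used here. The function name tempmon_alerts, its output format, and the threshold in the usage comment are all assumptions. The fan to watch is passed by name, because fans with no tachometer wire always report 0 RPM.

```shell
#!/bin/sh
# tempmon_alerts MAXTEMP FAN: read tempmon log lines on stdin and print
#   HOT <name> <value>   for every temperature above MAXTEMP
#   FANFAIL <name>       when the fan called FAN reports 0 RPM
# Function name and output format are hypothetical.
tempmon_alerts() {
    awk -v max="$1" -v fan="$2" '
    {
        section = ""
        i = 1
        while (i <= NF) {
            if ($i == "Temps:") { section = "T"; i++; continue }
            if ($i == "Fans:")  { section = "F"; i++; continue }
            # skip the syslog prefix and any dangling last token
            if (section == "" || i == NF) { i++; continue }
            name = $i
            value = $(i + 1) + 0
            if (section == "T" && value > max) print "HOT", name, value
            if (section == "F" && name == fan && value == 0) print "FANFAIL", name
            i += 2
        }
    }'
}

# Possible use from cron (the 60 C limit is an illustrative assumption):
#   alerts=$(/usr/local/bin/tempmon | tempmon_alerts 60 CPU1)
#   [ -n "$alerts" ] && echo "$alerts" | mail -s "tempmon alert" root
```

Halting on overtemperature could hang off the same output, for example by calling shutdown when any HOT line appears.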

ALSA Sound

I load these modules for ALSA sound:

snd_emu10k1_synth    
snd_emux_synth       
snd_seq_virmidi      
snd_seq_midi_emul    
snd_seq_oss          
snd_seq_midi         
snd_seq_midi_event   
snd_seq              
snd_emu10k1          
snd_rawmidi          
snd_seq_device       
snd_ac97_codec       
snd_pcm_oss          
snd_mixer_oss        
snd_pcm              
snd_timer            
snd_ac97_bus         
snd_page_alloc       
snd_util_mem         
snd_hwdep            
snd                  
    
By default, sound was turned off in four different ways. I had to unmute and set nonzero volume on two different controls. For example:
amixer set PCM unmute 70%
amixer set Master unmute 100%
    
For what it's worth, the "volume" control in xmms is connected to ALSA's PCM control.

Performance

The SCSI and IDE disk drives appear to be about equally fast. Maybe they're limited by the speed of the PCI bus:

# hdparm -t -T /dev/hda /dev/sda /dev/md0

/dev/hda:
 Timing buffer-cache reads:   128 MB in  0.48 seconds =266.67 MB/sec
 Timing buffered disk reads:  64 MB in  1.58 seconds = 40.51 MB/sec

/dev/sda:
 Timing buffer-cache reads:   128 MB in  0.48 seconds =266.67 MB/sec
 Timing buffered disk reads:  64 MB in  1.64 seconds = 39.02 MB/sec

/dev/md0:
 Timing buffer-cache reads:   128 MB in  0.47 seconds =272.34 MB/sec
 Timing buffered disk reads:  64 MB in  1.65 seconds = 38.79 MB/sec
    

I ran the BYTE Unix Benchmark (after applying this patch to update the shell syntax for supplying a default value for shell parameters). Here are the results:

TEST                                     BASELINE     RESULT     INDEX

Arithmetic Test (type = double)            2541.7   717071.8     282.1
Dhrystone 2 without register variables    22366.3  3730471.3     166.8
Execl Throughput Test                        16.5     2798.1     169.6
File Copy  (30 seconds)                     179.0    33001.0     184.4
Pipe-based Context Switching Test          1318.5   153028.0     116.1
Shell scripts (8 concurrent)                  4.0      776.2     194.1
								======
     SUM of  6 items                                            1113.0
     AVERAGE                                                     185.5
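
Judging from the table, each INDEX appears to be RESULT divided by BASELINE, and AVERAGE the arithmetic mean of the six indices. The sketch below recomputes them from the BASELINE and RESULT columns under that assumption; the function name byte_index is hypothetical.

```shell
#!/bin/sh
# byte_index: read "BASELINE RESULT" pairs on stdin, print each
# INDEX = RESULT / BASELINE, then the sum and the arithmetic mean.
# The function name is hypothetical; the formula is inferred from the table.
byte_index() {
    awk '
        NF == 2 { idx = $2 / $1; sum += idx; n++; printf "%.1f\n", idx }
        END     { if (n) printf "SUM %.1f\nAVERAGE %.1f\n", sum, sum / n }
    '
}

# BASELINE and RESULT columns from the table above
byte_index <<'EOF'
2541.7   717071.8
22366.3  3730471.3
16.5     2798.1
179.0    33001.0
1318.5   153028.0
4.0      776.2
EOF
```

The last two lines of output should match the SUM (1113.0) and AVERAGE (185.5) reported by the benchmark.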

To test the display I ran xbench. Here are the detailed results and the xstone summary:

TOTAL    3458373 lineStones
TOTAL     918771 fillStones
TOTAL     773992 blitStones
TOTAL  192233404 arcStones
TOTAL    9926134 textStones
TOTAL    1919869 complexStones
TOTAL    1752747 xStones

Aside from the noise, I'm very happy with the machine. And I have some quiet fans to install that may help.


Stability Issues

Within a month of delivery, the machine started crashing for no good reason. Things got gradually worse until finally it could not run long enough to finish booting.

In March 2003 I started getting unexplained crashes. I had been running the 2.4.19 kernel. I upgraded to 2.4.20 and made some configuration changes (disable APM, enable high memory) that seemed to help.

The first significant failure was in July. I had left the machine running while I was on a trip. When I returned, the machine was off (no fans running) although it was switched on and getting power. The weather had been hot - the first hot weather of the summer. Power cycling didn't help. I looked inside, but saw nothing wrong. After that it booted and ran.

However, since then it never ran for very long - failing with kernel crashes of one sort or another. I thought it tended to run somewhat longer when cool, but nothing definite. I had been running a stable Linux kernel, and had made no significant changes. Nevertheless, to rule out software problems I booted KNOPPIX. That's a Linux system that runs completely off a CDROM. It may have run a little more reliably, but eventually it crashed too. I immediately booted to BIOS setup and recorded:

Chassis Fan1 speed 0 RPM
Chassis Fan2 speed 0 RPM
Chassis Fan3 speed 0 RPM
CPU1 Temperature 54 C
CPU2 Temperature 52 C
CPU1 fan speed 2934 RPM
CPU2 fan speed 0 RPM
    

(Only one of the fans had a three-wire connection to the motherboard. The rest were connected directly to the power supply.) The temperatures were well within the Athlon specs. Also, the CPU temperatures were pretty close. This was on Sun Jul 27, at a room temperature of 85 F (that's 29 C).

At this point I called N. W. Custom Computers and talked to one of the technicians. He suggested wiping the disks clean and reinstalling the operating system. (I decided he had not worked enough with Linux.)

I was considering returning the system on warranty, but didn't want to risk losing all my data. I had most of the files on the SCSI drives with backups on the IDE drive, but no backups elsewhere. I suggested removing the IDE drive before shipping the system, but he was worried that the problem might actually be in the IDE drive.

On Fri Aug 1, at a room temperature of 72 F (22 C), I ran the system for 55 min with no failure, after which the BIOS reported

CPU1 temp 47 C
CPU2 temp 46 C
    

I reinstalled the IDE disk, rebooted, installed sensord, and found a working /etc/sensors.conf. "sensors" reported CPU temperatures of 47.5 C and 51.5 C (note CPU2 was hotter, for the first time). It crashed after a few minutes. On Sat Aug 2, at 72 F, the system booted, but crashed within three minutes. Apparently it's more than just temperature.

The next day I discovered the CPU2 heat sink was installed incorrectly! The CPU socket has three projections on each side, and the heat sink is supposed to be clamped down using the center projection on each side. However, I had managed to get it offset to one side. At this point I decided CPU2 (i.e. the one further from the power supply and closer to the PCI slots) was likely damaged, and removed it. After all, the Tyan board is supposed to work with only one CPU installed.

However, on at least the next ten attempts, the system failed to boot. It didn't start at all - no beeps, no video signal, nothing. Eventually it did start, and finished the boot sequence. I rebooted, but this time it crashed within a minute. On the next try, it failed to start at all.

I then opened the box up and discovered I had neglected to mount the CPU1 fan! I did so. On the next try it crashed part way through the boot. I brought up the BIOS screen, which reported a CPU1 temperature of 48 C. At this point I decided I had two damaged CPU chips, and ordered replacements (still MP chips - I have enough trouble without trying to modify XP chips to run in SMP mode). Unfortunately, they did not help.

I also tried:

With any of the above combinations, the symptoms were similar: the system failed in 30 sec to 3 min from power up (i.e. sometimes during the BIOS boot, and sometimes during the Linux kernel boot sequence while running fsck on the filesystems). Sometimes it generated a register dump. Other times it just halted.

At this point I ran out of options, and asked for a return authorization...and got no reply.

I discovered a mail misconfiguration that kept me from sending any mail, so I sent it again...and still got no reply.

Well, I was never excited about shipping my machine across the country anyway. I called a local place, Microtime Computers in Amherst NH, and had them pick up the machine and diagnose the problem. They found that the CPUs were okay, and the memory was okay (i.e. worked in another machine). However, based on some research on the web, they decided the memory was not compatible with the Tyan motherboard. Their best guess is that the Kingston memory barely met the actual requirements of the board when new, and a few months of aging was enough to break the system completely. They ordered and installed RAM from Crucial: CT6472Y265.18LT4 512MB DIMM PC2100 ECC REG. Since then, the machine has run flawlessly.

Update, 2003-11-30

The computer ran well in this configuration for two weeks.

As mentioned above, this system has six cooling fans: two in the power supply, two on the case (exhaust), and one for each CPU heat sink. The latter were a Zalman PS92252H (90 mm, 3.36 W) and an 80 mm Papst TYP 8412 NGL (with no tachometer). As delivered, the Zalman fan was connected to the CPU1 fan header on the mother board, so sensord could monitor its speed. All the others were connected directly to the power supply. I wanted to be able to monitor both of the CPU fans, so I replaced the Papst fan with a Vantec SF8025L (80 mm, 100 mA) and connected it to the other CPU fan header. The system ran this way for about two days, then crashed and wouldn't even give me a video signal when I reapplied power.

After several attempts with not even a video signal, I started suspecting that my one hardware change might be the cause. The User's Manual for the S2466N-4M does not give maximum current specifications for the CPU fan headers. The S2460 manual specifies a current of 300 mA, which would be enough for my fans (3.36 W / 12 V = 280 mA and 100 mA). However, my web searches turned up several postings that advised not using the CPU fan headers at all. Indeed, when I connected both CPU fans directly to the power supply, it was able to finish POST. However, it crashed within a few minutes.
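
The current estimate above is just I = P / V. As a small worked sketch, here it is checked against the 300 mA figure from the S2460 manual. The helper names fan_ma and header_ok are hypothetical, and whether that limit really applies to the S2466N-4M headers is exactly what the manual leaves open.

```shell
#!/bin/sh
# fan_ma WATTS VOLTS: current in milliamps drawn by a fan, I = P / V.
# Hypothetical helper name.
fan_ma() {
    awk -v w="$1" -v v="$2" 'BEGIN { printf "%.0f\n", w / v * 1000 }'
}

# header_ok MILLIAMPS LIMIT: succeed if the draw does not exceed the limit.
# Hypothetical helper name; both arguments are whole milliamps.
header_ok() {
    [ "$1" -le "$2" ]
}

# Example: the Zalman fan (3.36 W at 12 V) against a 300 mA header.
#   header_ok "$(fan_ma 3.36 12)" 300 && echo "within the S2460 figure"
```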

I have also tried:

I have sent email off to Tyan and asked for an RMA number. I hope that works out. If not, I'll probably buy another mother board (but not a Tyan).

Update, 2003-12-21

Tyan gave me an RMA number and I shipped the motherboard. According to the UPS package tracker, it should be delivered in Richmond CA tomorrow.

Update, 2004-01-11

Tyan received my board on December 22 and shipped a replacement by FedEx on December 24.

The replacement board is different and a bit newer (higher serial number) and appears to be used (the mounting holes show scoring), but I'm not complaining - I'd much rather have a working board now than wait weeks for them to fix mine.

Once I got my system reassembled, the first thing I booted was memtest-86. It found 302 errors on its first pass (about one hour of test time). On the other hand, at least the first 11 were single-bit errors which ECC should handle. (If there's a way to get memtest-86 to display more than that, I didn't find it.)

I then discovered that the BIOS setup was configured with ECC disabled. (I had thought it was memtest-86 that had disabled ECC.) I enabled ECC, re-ran the test, and got no errors at all. I configured for "ECC scrub" and it's been running fine since (over 15 hours so far). (I'm not sure what "ECC scrub" does, but I gather it provides a little more stability with a small cost in performance.)

I was reminded in a private email about the faulty electrolytic capacitors that surfaced last year. I suppose that could have been the problem with my board, though I had not noticed any burst capacitors.

At any rate, I'm happy with Tyan service!

I've replaced the Sony CD/RW drive with a new Memorex CD/DVD writer which reports the following IDE identification information:

# /sbin/hdparm -I /dev/hdc

/dev/hdc:

ATAPI CD-ROM, with removable media
	Model Number:       Memorex DVD+/-RW Dual-X1                
	Serial Number:      CGDC042020WL       
	Firmware Revision:  1.05    
Standards:
	Likely used CD-ROM ATAPI-1
Configuration:
	DRQ response: 50us.
	Packet size: 12 bytes
Capabilities:
	LBA, IORDY(can be disabled)
	Buffer size: 64.0kB
	DMA: mdma0 mdma1 mdma2 udma0 udma1 *udma2 
	     Cycle time: min=120ns recommended=120ns
	PIO: pio0 pio1 pio2 pio3 pio4 
	     Cycle time: no flow control=240ns  IORDY flow control=120ns
Commands/features:
	Enabled	Supported:
	   *	DEVICE RESET cmd
	   *	PACKET command feature set
	   *	Power Management feature set
HW reset results:
	CBLID- above Vih
	Device num = 0 determined by the jumper
    

Update, 2004-02-09

I am happy to record uptime's report of "up 22 days" with only one failure (an unexplained halt while I was away on a trip). The other oddity is that I found ECC disabled in the BIOS setup a second time.

Update, 2004-02-10

Since the hardware seemed stable, I decided to try the 2.6.1 kernel I compiled a while ago. Unfortunately after I reset, it failed to boot, displaying a message something like "LIWrong boot loader, giving up". I guess I had updated the lilo package but neglected to run lilo. The real problem was that I got the same message even when I tried to boot from a CDROM! Eventually I booted from a "custom boot floppy", ran lilo, and recovered. Apparently my nifty new CD/DVD drive isn't recognized as a valid boot device.

Also, there appears to be no battery backup for the CMOS memory - it loses all settings (ECC mode, boot device order, even time and date) when I turn power off.

Update, 2006-05-06

I had a major incident with this machine in the last month - the fan on CPU #1 died, and the CPU failed. I moved CPU #2 over to the #1 socket, and am running with just the one. I installed a splitter cable so I can power the CPU fan directly from the power supply, but run the RPM sense line to the header on the mother board.

While I had the machine open, I noticed that the CMOS battery was installed upside down. Since correcting that, the CMOS memory is working fine.


Send comments, questions, suggestions to: jrvz at comcast dot net.

DISCLAIMER: I have no connection to any of the companies mentioned here, other than as a customer. For official information, see the official sites.



Last modified: 2006-05-13
