Top 10 System Administrator Truths
Overarching principle of making-your-life-easy: if you support more than three systems, treat them as a cluster.
- This means you have a dedicated admin machine that only a few very trustworthy admins have access to, that is very secure (no root logins, firewalled heavily, patched often, etc). I highly recommend running SuSE Enterprise Linux 9 with the IBM EAL4+ Security Configuration. All maintenance activities are run from this management server.
- Use the Parallel Distributed SHell (PDSH) utilities: http://www.llnl.gov/linux/pdsh/pdsh.html. These allow you to run commands on, or copy files to, a single system, a group of systems, or all systems at the same time. Wondering what kernel all your systems are running? Just issue a `pdsh -a uname -a`. Need to copy out the sudoers file? `pdcp -a /home/admin/node_files/sudoers /etc/sudoers`
- Run Ganglia for resource monitoring: http://ganglia.info/
- Run Samhain for filesystem integrity scanning on all servers: http://la-samhna.de/samhain/
- Run host-based firewalls on all servers: http://www.shorewall.net/
- Power supplies have caused more instability in my experience than any other single hardware component. Buy good equipment, and buy systems with dual redundant hot-swappable power supplies for the important machines.
- Good deals can be had from the big vendors. Although we run a lot of whitebox and IBM equipment, Sun currently has a great system for a very cheap price (starts at $745): http://www.sun.com/servers/entry/x2100/.
- NFS sucks, but is the best filesystem glue-layer available. It is very sensitive to high-latency environments, so run it over Infiniband (very low latency and massive bandwidth: 5us, 1.25GB/s) if you need to squeeze out the best performance.
- Every system should have an electronic "system book", which contains the full hardware specs, including where each part gets service from (if bought separately), how long the warranty lasts (give end dates), contact info, etc. If you are managing 50 or fewer systems, keep track of all changes in a central location; otherwise track all changes using a system which scales (even a handwritten script and DB table would be sufficient).
- Good enough is the enemy of the Best, but that is a good thing. Never overengineer a solution; doing so only means that other problems go unsolved.
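
For the "system book" item above, here is a rough idea of what "a handwritten script and DB table" could look like. This is only a minimal sketch, assuming Python with its built-in sqlite3 module. The table names, column names, and function names are all made up for illustration, and a real setup would likely want more fields (per-part service contacts, serial numbers, and so on).

```python
import sqlite3
from datetime import date

# Hypothetical schema: one row per machine, plus a log of changes made to it.
SCHEMA = """
CREATE TABLE IF NOT EXISTS systems (
    hostname       TEXT PRIMARY KEY,
    hardware_specs TEXT NOT NULL,
    vendor_contact TEXT,
    warranty_end   TEXT            -- ISO date string, e.g. 2008-06-30
);
CREATE TABLE IF NOT EXISTS changes (
    id          INTEGER PRIMARY KEY AUTOINCREMENT,
    hostname    TEXT NOT NULL REFERENCES systems(hostname),
    changed_on  TEXT NOT NULL,     -- ISO date string
    admin       TEXT NOT NULL,
    description TEXT NOT NULL
);
"""


def open_book(path=":memory:"):
    """Open (or create) the system book database and ensure the tables exist."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_system(conn, hostname, hardware_specs, vendor_contact, warranty_end):
    """Add a machine to the book, or overwrite its existing entry."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO systems VALUES (?, ?, ?, ?)",
            (hostname, hardware_specs, vendor_contact, warranty_end),
        )


def log_change(conn, hostname, admin, description, changed_on=None):
    """Record a change made to a machine; the date defaults to today."""
    changed_on = changed_on or date.today().isoformat()
    with conn:
        conn.execute(
            "INSERT INTO changes (hostname, changed_on, admin, description) "
            "VALUES (?, ?, ?, ?)",
            (hostname, changed_on, admin, description),
        )


def warranties_ending_before(conn, cutoff):
    """List (hostname, warranty_end) for warranties ending before an ISO date."""
    rows = conn.execute(
        "SELECT hostname, warranty_end FROM systems "
        "WHERE warranty_end < ? ORDER BY warranty_end",
        (cutoff,),
    )
    return rows.fetchall()


def history(conn, hostname):
    """Return the change log for one machine, oldest entry first."""
    rows = conn.execute(
        "SELECT changed_on, admin, description FROM changes "
        "WHERE hostname = ? ORDER BY id",
        (hostname,),
    )
    return rows.fetchall()
```

Storing dates as ISO strings keeps the warranty query a plain string comparison. Pointing `open_book()` at a file on the admin machine instead of `:memory:` would keep the book in the same central place the maintenance work is run from.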