1. Learn the art of persuasion. As my manager likes to say, ‘the most important class I took was my high school English class, learning to write persuasive essays.’ This skill helps when your audience is fairly logical and you have facts to back up your case. Coming in with the best technical solution is not enough; you need to come in with a persuasive argument as to *why* your solution is best and *convince* everyone else of that fact.

2. If you have more than one SA, peer review is helpful. I do mean helpful: peer review should be a tool (much like code review) to improve your processes and designs, not a hindrance that keeps you from getting stuff done. This falls back to #1: everyone needs to be fairly good at communicating what they think and why, and anyone who points out a problem should also try to offer a solution.

3. Test your changes. Editing things in production is generally considered bad form for routine changes. A testing environment should be accessible (if you don’t have one, get one). This is pretty standard practice. The test environment doesn’t have to be elaborate, and testing will never catch all cases. However, it will catch a surprising number of silly mistakes that are simply operator error or misconfiguration. Test instances also give you an opportunity to script things in order to minimize human error, which leads to the next point.

4. Script everything. Do it once by hand, then script it. Automation means less work for you: less typing, fewer mistakes, less cleanup. When you hire a new SA, you just point them at the scripts to get stuff done; if they are curious as to how the scripts work, they can read the source (you put the source in source control, right?).

5. Canary your changes. Even small deployments can benefit from canarying changes. If you have a pair of servers, deploy the change on one, wait a few days for complaints or problems, then deploy on the second machine. Have more than two machines? Try 1%, 10%, 50%, 100% over the course of a week. In either case, changing only a subset of machines at a time means problems can be spotted before the entire service is affected.
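
To make the staged rollout in #5 concrete, here is a minimal sketch in Python of how a host list might be split into canary batches. The function name `canary_batches` and the host names are hypothetical, chosen only for illustration; the percentages are the ones from the list above. It only plans the batches. The actual deploy step, and the wait between batches, depend on your own tooling.

```python
# Cumulative rollout percentages from point 5: 1%, 10%, 50%, 100%.
STAGES = [1, 10, 50, 100]


def canary_batches(hosts, stages=STAGES):
    """Split hosts into successive batches.

    After each batch is deployed, the cumulative share of updated hosts
    reaches that stage's percentage (rounded up to a whole host).
    Stages that would add no new hosts are skipped.
    """
    batches = []
    done = 0  # number of hosts already assigned to earlier batches
    total = len(hosts)
    for percent in stages:
        # Integer ceiling division, so 1% of a small fleet is still one host.
        target = (total * percent + 99) // 100
        if target > done:
            batches.append(hosts[done:target])
            done = target
    return batches


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    fleet = [f"web{i:03d}" for i in range(200)]
    for number, batch in enumerate(canary_batches(fleet), start=1):
        print(f"batch {number}: {len(batch)} hosts, starting with {batch[0]}")
```

With 200 hosts this yields batches of 2, 18, 80, and 100 machines. With only a pair of servers it collapses to the one-then-the-other case, because the intermediate stages add no new hosts and are skipped.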