Sysadmin Skills Mulligan
I kind of hate my sysadmin skills entry. On the one hand, I feel strongly about the points I tried to make, and feel they are relevant and useful. On the other, the entry itself is a rambly mess, and the story about troubleshooting DFS makes me cringe every time I read it. So I am going to try again, with the hope that I can write something I can link to without feeling bad. You might want to skip this entry if you read the other one.
I believe that technically proficient "computer people" possess the following skills:
The ability to model systems and explain how they work. As people with this skill gain knowledge about a domain, they can refine and improve their models. They can be shown some system behaviour and come up with stories about what is going on. They can integrate new information into their models so that as they learn more about these systems, their stories become more correct.
The ability to troubleshoot problems by observing systems they don't understand, coming up with hypotheses about what might be going on, and then devising concrete experiments to see whether their hypotheses are correct. People with these skills can break down complex situations and isolate issues. They have enough judgement to know when they do not understand something, and can get some sense of what they would need to learn in order to improve their understanding.
A solid grasp of domain-specific knowledge about one or more technical areas. These are the details and computer trivia that make computer people sound smart. This knowledge forms a collection of tools people can use to observe systems and devise tests, and it includes the implementation details necessary to get computers working in the real world.
My primary argument is that although all of these are necessary to be an effective systems administrator, modelling and troubleshooting skills are much more important than domain-specific knowledge. As people model systems and troubleshoot them, they fill in gaps of their knowledge, and in so doing they develop domain-specific knowledge. Thus, if you hire somebody who has poor domain-specific knowledge but is strong in modelling and troubleshooting, that person will probably be able to learn on the job.
In contrast, hiring somebody who has a lot of domain-specific knowledge but poor modelling or troubleshooting skills is foolish, because computer technology changes so fast that any domain-specific knowledge will be obsolete within a few years. Domain-specific knowledge has to be continually updated and refreshed.
My secondary argument is that we do a terrible job of hiring technical people for these kinds of jobs, and that certain institutions (namely private colleges, but others as well) do a terrible job of training people for these positions.
The fundamental problem is that domain-specific knowledge is easy to test and evaluate in automated ways, while modelling and troubleshooting skills are not. Therefore we use domain-specific knowledge as a proxy for technical proficiency, reasoning that people with good domain knowledge got that knowledge because they have good modelling and troubleshooting skills. This reasoning is catastrophically incorrect, because it is possible to obtain lots of factoids about a domain through memorization or other shallow learning.
Hiring is Awful
In many cases, organizations hire technical people to fill specific roles. Say that an organization runs an Exchange mailserver, and that the existing sysadmin is moving to China. The organization is under pressure to hire a replacement sysadmin, and because organizations want to be efficient, they are not willing to train somebody from scratch. Thus they write job ads looking for specific skills, and since more experience is better, they inflate the qualifications they are looking for: "The ideal candidate will have 15 years of progressive experience administering Windows Exchange Server 2013." Job ads become lists of technology buzzwords. This makes life easier for human resources departments, which are largely staffed by people who "aren't computer people" and do not have technical proficiency in the fields they are hiring for. So they filter resumes based on the technology buzzwords in the advertisement.
Then come the hiring interviews. It is easy to grade people on domain-specific factoids, so the interview process often consists of a bunch of domain-specific questions. Some companies go further, coming up with interview questions intended to test creative thinking, algorithm design, and problem-solving skills.
All of this is awful:
It creates incentives for applicants to pad their resumes, which means that resumes can no longer be trusted as reliable sources of information about candidates.
It eliminates candidates who have transferable skills from other domains that could be applied to the current opening. Somebody who has real-world experience administering a groupware product on Linux may have highly transferable skills to Exchange administration, but buzzword filtering might eliminate such resumes entirely.
It creates incentives for terrible private colleges and certification mills to focus on as much domain-specific knowledge as possible, so that somebody who has followed an instruction sheet to install Exchange 2013 and create a user can list themselves as "experienced in Exchange 2013" (or worse, "Expert in Exchange 2013").
I feel it is ethically sketchy to expect that you can hire somebody with the exact skills you need to solve your problem. In doing so, you are foisting the responsibility for training your staff onto somebody else.
Most importantly, it eliminates good potential candidates, because those with good modelling and troubleshooting skills can learn on the job.
What about companies that design interview questions intended to exercise creative thinking skills? Companies such as Google and Microsoft do this, but their tests are standardized, and therefore potential applicants can learn the questions and memorize the answers, thus gaming the system. To be effective, these kinds of tests have to be obscure.
How do you fix the hiring problem? I don't know how to do so via greater automation. Rather, all the solutions I know of require time and patience:
Look for people who have demonstrated modelling and troubleshooting skills in some domain, even if it is not the one you are hiring for. Focus on hiring the right people over people trained in the right technologies.
When candidates claim domain proficiency, probe deep into their knowledge by asking them to tell stories about their use of the technology in question. Ask a lot of follow-up questions and get further elaboration on the points they bring up.
Get them to tell you the approaches they have taken to solving problems in their domain of proficiency. What troubleshooting tools did they use? How did they narrow down the possibilities? How did the problem get fixed? What was the underlying cause? People who can answer these kinds of questions are more likely to have good models of their domains.
Get them to solve problems. Look at how they approach the problem. Look at what hypotheses they come up with, and how they test these hypotheses. Look at how they integrate new information into the situation.
If people have portfolios of work, look at those portfolios, and try to determine the ways in which they have contributed.
None of these techniques are foolproof. None of them address problems with resume inflation. In my experience job interviews are deeply misleading and perhaps actively unhelpful; the true character of candidates makes itself clear within days of a new hire starting work.
Although I strongly believe that screening candidates for specific domain skillsets is largely a mistake, I also believe that candidates should demonstrate that they are proficient in some domain. If people have no technical knowledge in any domain, then how do I know they are capable of gaining domain knowledge? If the only domain knowledge people gained was decades ago in some obsolete technology, how do I know that they have kept their modelling and troubleshooting skills sharp?
Education is (Often) Awful
The worst IT educations come from programs that are buzzword-heavy, where the curriculum consists of following cookbooks of instructions about technology after technology. Such curricula can push domain knowledge into people's brains (especially if memorization is involved) but the information is often context-free and unapplied. Furthermore, following cookbook IT recipes does nothing to develop modelling or troubleshooting skills.
There may be some certifications that are worthwhile, but certifications that involve memorizing a bunch of factoids are stupid and counterproductive.
I feel that my undergraduate education in Computer Science did a good job of teaching me modelling and troubleshooting skills, even though the technologies in question were never used in industry. Some of the things that helped develop these skills included:
- Lots of programming assignments that required lots of troubleshooting and debugging.
- Proofs and analysis in things like data structures and algorithms.
- Being put in positions where I would teach material to others.
- Exposure to completely weird computer science concepts that broke my brain (such as functional programming, computability and NP-hardness, data structures and algorithms, database structures and implementation, and even hateful formal methods).
- Mathematics courses that helped me reason more precisely in domains different from computers.
None of the technologies I used during my undergrad are useful to me today. My degree was focused on software development, not systems administration. Nonetheless, so much of the knowledge was transferable.
This is probably part of the reason "an undergraduate degree in computer science" is used as a resume screening credential by employers, but this is a mistake. Firstly, good candidates without undergraduate educations exist. Secondly, university degrees are expensive, stressful, and time-consuming, and not everybody is in a position to obtain one. (As a data point: I almost did not get through my undergrad.) Thirdly, although having an undergrad degree is probably correlated with having good sysadmin skills, it is not a guarantee.
Developing Sysadmin Skills
This raises some uncomfortable questions:
- Can modelling and troubleshooting skills be developed outside university?
- Can these skills even be taught? Is it the case that some people have them and some people don't, and there is not much you can do to develop them?
I desperately want to believe that these skills can be taught, and that they can be taught outside of a university context. I have coworkers and former coworkers who never went to university but have these skills, so I do not think university is strictly necessary. But I am worried about whether these skills can be taught; I am shocked by how scarce they seem to be among those whom we interview.
If you are somebody who feels your modelling and troubleshooting skills are weak, you might try the following:
Learn computer programming. In my opinion, getting computer programs to run correctly is inherently related to troubleshooting skills; people who cannot debug programs cannot write programs, and people who cannot model solutions cannot design good programs.
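This connection between debugging and troubleshooting can be seen in miniature. As a hedged toy illustration (the function and its bug are invented here, not taken from any real codebase), debugging is the same hypothesis-and-experiment loop described above:

```python
def average(values):
    """Toy mean function suspected of misbehaving on some inputs."""
    return sum(values) / len(values)

# Hypothesis: average() blows up on an empty list (division by zero).
# Experiment: feed it an empty list and observe what happens.
try:
    average([])
    print("hypothesis refuted: no crash on empty input")
except ZeroDivisionError:
    print("hypothesis confirmed: guard against empty input")

def average_fixed(values):
    """Repaired version: define the mean of no values as 0.0."""
    return sum(values) / len(values) if values else 0.0
```

The fix only comes after the experiment confirms the hypothesis; guessing at fixes without testing hypotheses is how people stay stuck.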
Related to the above, contribute to an Open Source project with a public bugtracker. Start by creating test cases for bugs that are reported. Then start trying to fix bugs and contribute patches. This will develop all three skillsets and build a portfolio you can show to potential employers.
Find some computer stuff and teach it to others, since teaching is an excellent way to learn.
Volunteer to take care of computers for your friends, family and local organizations. As you do so, ask the kinds of questions and practice the kinds of habits that develop your modelling and troubleshooting skills. Don't be satisfied with getting things to work until you understand why they work. Explicitly make hypotheses about how the system is behaving and then come up with tests to confirm or refute those hunches. Observe your systems to figure out how they work, and build up a toolkit (ping, strace, nmap, wireshark, adsiedit, vmstat, logfiles, Event Viewer, and many, many others) so you can observe what is happening better. Narrow down possible causes and simplify situations when diagnosing issues.

Learn things in other (non-computer) fields that involve modelling and troubleshooting skills. There are lots of them out there, and those skills are transferable.
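The hypothesis-and-test habit can be made concrete in a few lines of code. As a hedged sketch (the target host and the 200 ms threshold are invented for illustration), here is one way to test the hunch "things feel slow because DNS resolution is slow" by timing name resolution on its own rather than guessing:

```python
import socket
import time

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed seconds)."""
    start = time.monotonic()
    result = fn(*args, **kwargs)
    return result, time.monotonic() - start

def verdict(dns_seconds, threshold=0.2):
    """Interpret the measurement against the hypothesis."""
    if dns_seconds > threshold:
        return "slow: evidence supports the DNS hypothesis; check resolvers next"
    return "fast: DNS looks fine; form a new hypothesis"

# Hypothesis: "lookups feel slow because DNS resolution is slow."
# Experiment: time name resolution by itself and compare to a threshold.
# ("localhost" is used here so the sketch runs anywhere; substitute the
# host you actually suspect.)
_, dns_s = timed(socket.getaddrinfo, "localhost", 80)
print(f"DNS lookup took {dns_s * 1000:.1f} ms -> {verdict(dns_s)}")
```

The point is not this particular script but the shape of it: state the hypothesis, isolate the one thing it predicts, measure, and let the measurement tell you whether to fix resolvers or go form a new hypothesis.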
Learn and implement crazy computer things that will break your brain. Functional programming is one place to start, but there are lots of other things.
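As one small, hedged taste of the functional style (in Python rather than a full functional language, purely for accessibility): re-express an ordinary loop as a pipeline of pure functions, with no mutation and no loop variable.

```python
from functools import reduce

# "Sum of the squares of the even numbers" without loops or mutable
# state: filter, then map, then fold, each step a pure function.
def square(n):
    return n * n

def is_even(n):
    return n % 2 == 0

def sum_even_squares(numbers):
    return reduce(lambda acc, n: acc + n,
                  map(square, filter(is_even, numbers)),
                  0)

print(sum_even_squares(range(10)))  # 0 + 4 + 16 + 36 + 64 = 120
```

If composing functions like this feels unnatural at first, that discomfort is exactly the brain-breaking being recommended; languages like Haskell push it much further.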
I don't know whether these techniques will work, because I do not know whether modelling and troubleshooting can be taught. However, I feel that some of these techniques have helped me develop my modelling and troubleshooting skills, so I hope they might be useful to others as well.