We are looking for a hands-on Operations Engineer with proven experience on high-performance teams of system administrators and technical operations staff. The ideal candidate has experience building large systems installations from the ground up, demonstrated technical aptitude for solving low-level problems, and a desire to take on multiple responsibilities on a lean, focused team of operations engineers providing a reliable, secure, and highly available service.
Responsibilities & Duties
- Join and contribute to the technical operations team responsible for monitoring and maintaining the health of all Internet Archive networks and online services. This includes all publicly facing services, the storage and compute cluster, and key internal services related to crawling, search, indexing, and access to archived web content.
- Implement, maintain, and expand monitoring and reporting systems to communicate current and historical activity for multiple publicly facing services.
- Administer several “turn-key” internal applications (CiviCRM instance, WordPress instance, GitLab, Jira, email servers, etc.).
- Analyze and implement effective improvements in the maintenance and operations processes and infrastructure.
- Plan and coordinate the transition of new software systems and service applications from development into production. This includes establishing procedures and policies that ensure sustainable deployment, monitoring, upgrade, and expansion of services.
Skills & Requirements
- Experience working with a large server cluster infrastructure
- Fluency in Linux system administration
- Experience as a principal (or sole) member of a technical operations team
- Ability to document and share critical knowledge with others
- "Customer Service" mentality - advocate for the end user experience of web-delivered services
- Passion for automation, data-driven decision making, and information reporting
- Creative problem solver
- Passion for staying current with industry trends
- Experience using configuration management tools such as Ansible, Chef, or Puppet
- Experience with modern centralized monitoring platforms (e.g. Grafana, InfluxDB, Prometheus)
- Experience in automation and scripting using Python or similar
- Database administration and management experience, with PostgreSQL or a comparable database system
- Excellent oral/written communication and documentation skills
- Flexibility and a sense of humor
Reporting Structure: The Operations Engineer reports to the Director of Engineering and works closely with the Head Librarian and Founder.
Benefits & Perks
The Internet Archive provides a comprehensive benefits package including PTO, paid holidays, medical, dental, vision, FSA, commuter benefits, STD, LTD, 403(b)/Roth accounts, and Friday lunches at IA HQ.
The Internet Archive is an Equal Opportunity Employer M/F/D/V/L/G/B/T and will consider qualified applicants with criminal histories for employment in a manner consistent with the requirements of the Fair Chance Ordinance.
The Internet Archive has over 30 petabytes of unique digital information stored on-site in a cluster of over 600 hosts in two data centers.