Open Data Engineer

The position will work in the Web Archiving Group, supporting development of web archiving and data services for partners ranging from national libraries and governments to universities, researchers, and collaborating technology partners. The role will help build, maintain, and scale new and existing software and services, including at-scale data processing pipelines, access and harvesting technologies, and APIs, and will contribute to improving our systems and services used by libraries and open knowledge organizations around the world. The position will also help synthesize user- and business-driven technical needs into new engineering and development projects and features across our portfolio of paid services. This role contributes to managing data at scale, building new tools and technologies, monitoring and deploying production systems, working closely with a distributed team of program/product staff and engineers, and liaising directly with an international set of collaborators and clients on web and data services.

Responsibilities & Duties

  • Contribute to both new engineering and maintenance/improvement needs for our core data systems supporting production services, data processing, and access tools and APIs.
  • Support existing services, and architect and build new products for libraries and knowledge organizations related to the access, indexing, harvesting, and use of large volumes of born-digital scholarly materials collected from the web and other systems, including R&D related to data extraction, processing, and search/access.
  • Participate in documentation, community outreach, and partner relations, promoting our work and services to the global community via travel, publication, and presentations.
  • Deploy, monitor, and maintain user-facing public services with a focus on process automation and operational resilience.
  • Support our Research Services activities, working with researchers on using large sets of IA data in data-driven scholarship and computational research.
  • Participate in supporting and improving post-acquisition data processing pipelines such as indexing, content mining, and data transformation and derivation.

Preferred Skills & Requirements

  • 3-4 years of experience in Python and Unix/Linux shell
  • Knowledge of building and deploying web applications, databases, web-host services, and knowledge of basic Linux system administration
  • Experience with version control (git), open source practices, and code review
  • Solid experience in Internet protocols, HTML, JavaScript and web technologies in general
  • General experience in frontend/JavaScript coding
  • Cluster computing experience is preferred, especially familiarity with Hadoop, Spark, and related technologies and tools
  • Ability to work in, and enjoy, a loosely structured work environment with a mostly remote team operating from many time zones and continents
  • Bachelor's Degree in Computer Science or a related field, five years of progressively responsible experience in software development, or relevant experience.

Reporting Structure: The Web Data Engineer reports to the Director, Web Archiving & Data Services, and works closely with program staff in the Web Archiving & Data Services team and with the broader IA operations/infrastructure and engineering teams.

Location: San Francisco, CA or remote
References must be made available upon request.

Internet Archive is an Equal Opportunity Employer. Internet Archive complies with the Fair Chance Ordinance. Internet Archive is a 501(c)(3) non-profit library founded in 1996.

To comply with government Equal Employment Opportunity / Affirmative Action reporting regulations, we are requesting (but NOT requiring) that you enter this personal data. This information will not be used in connection with any employment decisions, and will be used solely as permitted by state and federal law. Your voluntary cooperation would be appreciated.

Invitation for Job Applicants to Self-Identify as a U.S. Veteran
  • A “disabled veteran” is one of the following:
    • a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or
    • a person who was discharged or released from active duty because of a service-connected disability.
  • A “recently separated veteran” means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.
  • An “active duty wartime or campaign badge veteran” means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.
  • An “Armed Forces service medal veteran” means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.

Voluntary Self-Identification of Disability
Voluntary Self-Identification of Disability Form CC-305
OMB Control Number 1250-0005
Expires 1/31/2020
Why are you being asked to complete this form?

Because we do business with the government, we must reach out to, hire, and provide equal opportunity to qualified people with disabilities. [i] To help us measure how well we are doing, we are asking you to tell us if you have a disability or if you ever had a disability. Completing this form is voluntary, but we hope that you will choose to fill it out. If you are applying for a job, any answer you give will be kept private and will not be used against you in any way.

If you already work for us, your answer will not be used against you in any way. Because a person may become disabled at any time, we are required to ask all of our employees to update their information every five years. You may voluntarily self-identify as having a disability on this form without fear of any punishment because you did not identify as having a disability earlier.

How do I know if I have a disability?

You are considered to have a disability if you have a physical or mental impairment or medical condition that substantially limits a major life activity, or if you have a history or record of such an impairment or medical condition.

Disabilities include, but are not limited to:

  • Blindness
  • Deafness
  • Cancer
  • Diabetes
  • Epilepsy
  • Autism
  • Cerebral palsy
  • Schizophrenia
  • Muscular dystrophy
  • Bipolar disorder
  • Major depression
  • Multiple sclerosis (MS)
  • Missing limbs or partially missing limbs
  • Post-traumatic stress disorder (PTSD)
  • Obsessive compulsive disorder
  • Impairments requiring the use of a wheelchair
  • Intellectual disability (previously called mental retardation)
Reasonable Accommodation Notice

Federal law requires employers to provide reasonable accommodation to qualified individuals with disabilities. Please tell us if you require a reasonable accommodation to apply for a job or to perform your job. Examples of reasonable accommodation include making a change to the application process or work procedures, providing documents in an alternate format, using a sign language interpreter, or using specialized equipment.

[i] Section 503 of the Rehabilitation Act of 1973, as amended. For more information about this form or the equal employment obligations of Federal contractors, visit the U.S. Department of Labor's Office of Federal Contract Compliance Programs (OFCCP) website at

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.