Site Reliability Engineer (SRE)
Datica helps healthcare get on the cloud. Since 2013, we have worked with hundreds of healthcare companies, from digital health startups to hospitals to Fortune 100 enterprises, bringing them into the future. Our secret is the dirty work: No company is better at focusing on the muddy details that are the true blockers to healthcare utilizing the cloud. Datica removes the risk for digital health in the cloud. We solve the problem of HIPAA compliance in the cloud and enable secure data exchange between digital health and EHRs. Customers and partners across healthcare trust Datica to ensure their clouds are compliant and their data is securely interoperable.
At Datica, we believe the future of value-based patient care will be powered by HIPAA-compliant, scalable, interoperable infrastructure. We exist to help all of healthcare transition to that future by de-risking the challenges that come with it.
We're on a mission, and you could play a critical role.
Datica is a growth-stage startup. In 2019, we plan to double our customer base, double our revenue, and dramatically increase our impact on improving patient care. We are doing this through an aggressive product roadmap, a growing team, and a laser focus on the problems our customers face. The next year will be hard, but the rewards will be worth it.
We are looking for a motivated Site Reliability Engineer with strong security experience to help us scale our Compliant Platform as a Service and our Compliant Kubernetes Service (CKS).
As an SRE at Datica, you will work with diverse technologies to eliminate manual operations, measure service levels, facilitate post-mortems, and enable automated deployments, with a strong focus on security and compliance. You will be responsible for supporting service level objectives, reducing operational burden, and troubleshooting issues with our products.
Work with the product team and engineers to define availability, performance, and capacity objectives, along with the means to monitor, alert on, and automatically respond to events that affect these objectives
Develop and implement automated change management and provisioning programs
Resolve service incidents as part of our on-call engineering team
Facilitate blameless postmortems that focus on continuous improvement to processes, service design, and operational capabilities
2+ years of experience developing/operating infrastructure with AWS, Linux and Docker. Experience managing large-scale services strongly desired.
Demonstrated domain knowledge of systems management practices (e.g. ITIL)
Experience with Kubernetes cluster deployment, operation & administration
Experience in one or more of the following programming languages: Python, Go
Experience with infrastructure configuration management using SaltStack
Experience automating infrastructure monitoring & alerting. We use Prometheus, ELK, Grafana, Statuspage.io, PagerDuty, and Slack
Ability to work remotely, as part of a Kanban team, with strong attention to detail and limited supervision
Experience working in regulated and/or high-security environments and industries (PCI, HIPAA, HITRUST) required
*THIS IS A REMOTE POSITION*