Advisor - Data Architect, Data Foundry
Company: Eli Lilly and Company
Location: San Francisco
Posted on: March 20, 2026
Job Description:
At Lilly, we unite caring with discovery to make life better for
people around the world. We are a global healthcare leader
headquartered in Indianapolis, Indiana. Our employees around the
world work to discover and bring life-changing medicines to those
who need them, improve the understanding and management of disease,
and give back to our communities through philanthropy and
volunteerism. We give our best effort to our work, and we put
people first. We’re looking for people who are determined to make
life better for people around the world. Location: San Diego, CA;
San Francisco, CA; Boston, MA; Louisville, CO; Indianapolis, IN
Reports to: Lead, Data Architecture (R9),
Architecture4Insight Overview Lilly Small Molecule Discovery is
purpose-built to create molecules that make life better for people.
Discovery Technology and Platforms (DTP) accelerates molecule
discovery by building optimized foundational platforms,
streamlining lab operations through advanced technologies and data
connectivity, and investing in novel capabilities. Data Foundry is
a multidisciplinary team within DTP that enables AI-native drug
discovery through four integrated pillars: Architecture4Insight
(data infrastructure and scientific software), Methods4Insight
(analytical and computational methods), Automation & Scale4Insight
(lab automation and agentic workflows), and Preparedness4Insight
(data governance and readiness). These pillars empower every Lilly
scientist to make optimal decisions by providing seamless access to
data, insights, and AI-driven capabilities—serving both human
scientists and autonomous AI agents. Position Summary We are
seeking Data Architects at multiple levels to design and build the
data infrastructure that makes AI-native drug discovery possible.
You will create the schemas, ontologies, data models, knowledge
graphs, and platform architectures that transform raw scientific
data into machine-actionable, FAIR-compliant, insight-ready
assets—serving both discovery scientists and autonomous AI agents.
This role is the foundation of Architecture4Insight. Everything
the software engineering team builds—pipelines, APIs,
prototypes—depends on the data models and platform architecture
this team designs. You will work with deep knowledge of scientific
data (chemical, biological, HTE, automation-generated) to create
custom-fit solutions, then partner with Tech@Lilly to scale and
maintain them. The role spans three focus areas depending on
expertise: data modeling & ontologies, data platform & lakehouse
architecture, and knowledge graph & specialized data systems. You
will independently design schemas, select technologies, and make
build-vs-buy recommendations for your domain. Responsibilities
Data Modeling & Ontologies Design and implement data models,
schemas, and ontologies for chemical, biological, and
automation-generated data that serve discovery workflows across the
portfolio. Define and maintain controlled vocabularies, metadata
standards, and FAIR-compliant data frameworks in partnership with
Preparedness4Insight. Implement semantic data standards (RDF, OWL,
SPARQL) and ontology engineering practices to create interoperable,
machine-readable scientific data. Data Platform & Lakehouse
Architecture Design and implement data lakehouse architecture using
modern platforms (Databricks, Snowflake, or equivalent), including
data storage patterns, partitioning strategies, and query
optimization. Build and optimize ETL/ELT pipelines using Spark,
dbt, or similar tools to transform raw scientific data into
analytical and ML-ready formats. Implement real-time and streaming
data integration (Kafka, Kinesis, event-driven patterns) connecting
LIMS, instruments, and lab automation systems to the data
infrastructure. Knowledge Graph & Specialized Data Systems Design
and implement knowledge graphs (Neo4j, Amazon Neptune, TigerGraph)
that capture molecular, target, pathway, and experimental
relationships across the discovery landscape. Architect specialized
data solutions: array databases (TileDB) for genomics/imaging,
document stores (MongoDB) for experimental records, and vector
databases for embedding-based retrieval supporting ML and RAG
workflows. Build query and traversal patterns that enable
scientists and AI agents to ask relational questions across the
entire data landscape. Cross-Functional Partnership Partner with
scientific software engineers to ensure data architectures are
implementable, performant, and well-documented. Collaborate with
Methods4Insight to design data structures that support analytical
model training, deployment, and evaluation. Work with Tech@Lilly to
define scaling strategies, ensure enterprise compliance, and
transition data architectures to production-grade management.
Contribute to build-versus-buy-versus-adopt decisions by evaluating
commercial and open-source data platforms against Data Foundry
requirements. Basic Requirements M.S. or PhD in Computer Science,
Data Science, Bioinformatics, Computational Biology, Information
Science, or related STEM field. 6–12 years of data architecture,
data engineering, or scientific informatics experience. Deep
expertise in at least one of the focus areas: relational databases,
data modeling and ontology engineering, data platform and lakehouse
architecture (Databricks, Snowflake, Spark), or knowledge graph and
specialized database systems (Neo4j, Neptune, MongoDB, TileDB).
Preferred Qualifications Working familiarity with multiple database
paradigms — relational, graph, document, columnar, key-value — and
strong SQL skills. Understanding of scientific data types and
experimental workflows in life sciences or pharma (chemical,
biological, HTE data). Strong communication skills with ability to
translate data architecture concepts for both technical and
scientific audiences. Familiarity with cloud platforms (AWS, Azure,
or GCP) and modern data integration patterns. Pharmaceutical or
biotech research industry experience, particularly in discovery
data management or research informatics. Experience with semantic
web technologies: RDF, OWL, SPARQL, Protégé, or equivalent ontology
engineering tools. Hands-on experience with graph databases (Neo4j,
Neptune, TigerGraph) and knowledge graph design patterns for
scientific data. Data lakehouse architecture experience: Databricks
(Delta Lake, Unity Catalog), Snowflake, or equivalent; ETL/ELT with
Spark, dbt. Experience with streaming/real-time data platforms
(Kafka, Kinesis, Flink) and event-driven architectures. Familiarity
with LIMS, ELN systems (e.g., Benchling), and laboratory instrument
data integration. Experience with vector databases (Pinecone,
Weaviate, pgvector) and embedding-based retrieval for ML/RAG
applications. Array database experience (TileDB, Zarr) for
genomics, imaging, or high-dimensional scientific data. FAIR data
principles implementation experience and Data Readiness Level
frameworks. Scientific data standards and controlled vocabularies
in chemistry (InChI, SMILES) or biology (Gene Ontology, UniProt).
Experience with C, C++, or Rust for performance-critical data
processing; familiarity with HPC data I/O patterns for large-scale
scientific computations. Lilly is dedicated to helping individuals
with disabilities to actively engage in the workforce, ensuring
equal opportunities when vying for positions. If you require
accommodation to submit a resume for a position at Lilly, please
complete the accommodation request form
(https://careers.lilly.com/us/en/workplace-accommodation) for
further assistance. Please note this is for individuals to request
an accommodation as part of the application process and any other
correspondence will not receive a response. Lilly is proud to be an
EEO Employer and does not discriminate on the basis of age, race,
color, religion, gender identity, sex, gender expression, sexual
orientation, genetic information, ancestry, national origin,
protected veteran status, disability, or any other legally
protected status. Our employee resource groups (ERGs) offer strong
support networks for their members and are open to all employees.
Our current groups include: Africa, Middle East, Central Asia
Network, Black Employees at Lilly, Chinese Culture Network,
Japanese International Leadership Network (JILN), Lilly India
Network, Organization of Latinx at Lilly (OLA), PRIDE (LGBTQ
Allies), Veterans Leadership Network (VLN), Women’s Initiative for
Leading at Lilly (WILL), enAble (for people with disabilities).
Learn more about all of our groups. Actual compensation will depend
on a candidate’s education, experience, skills, and geographic
location. The anticipated wage for this position is $151,500 -
$222,200. Full-time equivalent employees also will be eligible for a
company bonus (depending, in part, on company and individual
performance). In addition, Lilly offers a comprehensive benefit
program to eligible employees, including eligibility to participate
in a company-sponsored 401(k); pension; vacation benefits;
eligibility for medical, dental, vision and prescription drug
benefits; flexible benefits (e.g., healthcare and/or dependent day
care flexible spending accounts); life insurance and death
benefits; certain time off and leave of absence benefits; and
well-being benefits (e.g., employee assistance program, fitness
benefits, and employee clubs and activities). Lilly reserves the
right to amend, modify, or terminate its compensation and benefit
programs in its sole discretion and Lilly’s compensation practices
and guidelines will apply regarding the details of any promotion or
transfer of Lilly employees. #WeAreLilly