Senior Applied AI Data Scientist

At Schwab, you’re empowered to make an impact on your career. Here, innovative thought meets creative problem solving, helping us “challenge the status quo” and transform the finance industry together. Please note: This position is M-F during standard business hours with a hybrid work model (4 days in-office, 1 day working from home). It is only available in the areas listed. Candidate must reside or be willing to relocate on their own to one of the listed areas. Applicants must be currently authorized to work in the United States on a full-time basis without employer sponsorship.

Retail Supervision & Risk Management is building AI-enabled supervision capabilities that help supervisors identify risk, synthesize evidence, and accelerate consistent, well-documented decisions. In this role, you will lead the development of a portfolio of domain-specific supervision models aligned to discrete risk categories (e.g., documentation review, call transcript risk detection, investor profile vs recommendation discrepancies, and representative activity patterns). You will partner closely with a supervision product portfolio owner, supervision SMEs, and model risk stakeholders to ensure solutions are accurate, explainable, auditable, and operationally sustainable, with supervisors firmly in the loop. This role is expected to partner closely with engineering teams to deliver models and controls that are exam-defensible and auditable.

In this role you’ll –

Build a scalable “model factory”

Build and scale a supervision “model factory” by applying strong data science best practices across a portfolio of risk-category models: well-organized, version-controlled code; reproducible data pipelines; repeatable feature engineering; consistent evaluation harnesses; and standardized documentation/templates.
Establish and maintain portfolio standards (dataset curation conventions, feature definitions, labeling guidance, evaluation conventions, documentation structure, and release readiness criteria) to enable consistent delivery at scale.

Lead architecture and delivery of deployable AI systems

Serve as the embedded data scientist within the supervisory organization: lead analytical design and model development, while partnering closely with architects and engineers to enable repeatable, stable deployments into approved production pathways.
Select appropriate approaches (classical ML, NLP, LLM/RAG, hybrid), justify tradeoffs, and establish baselines. Provide clear specifications (features, thresholds, output schemas, and monitoring requirements) that engineering teams can productionize reliably.
Collaborate with engineering/platform partners to ensure model solutions meet operational constraints (latency, cost, throughput, maintainability) without compromising measurement integrity, auditability, or defensibility.

Evaluation, controls, and defensibility

Design evaluation harnesses aligned to supervision outcomes: precision/recall by severity tier, false-negative containment strategies, threshold optimization, calibration, drift detection, and model agreement with human reviewer evaluations.
Perform disciplined error analysis and remediation planning: slice-based performance (by product, channel, rep behaviors, client segments, doc types), root-cause analysis of false positives/false negatives, and concrete corrective actions (data, labels, features, thresholds, reviewer guidance).
Implement evidence & auditability requirements in partnership with stakeholders and engineering teams: reason codes/attribution strategies, input lineage expectations, model/version traceability, reproducible runs, and output logging suitable for exam readiness.
Build guardrails and safe-failure behaviors (conservative defaults, abstention/uncertainty handling, escalation logic, and human-in-the-loop triggers) to ensure supervisors remain firmly in the loop.

Documentation and model risk artifacts

Own model evidence packages (model card/whitepaper): training data description, labeling methodology and quality assessment, evaluation results vs human baselines, known limitations, monitoring plan, change history, and release gates — in partnership with the Supervision PO/SMEs and aligned to model risk expectations.

Data readiness + access patterns

Define required tables/fields, refresh expectations, data quality checks, and lineage requirements; partner with data teams to enable approved access patterns for model development, scoring, and monitoring.
Define scalable labeling/ground-truth strategies with SMEs (taxonomies, sampling plans, adjudication workflows, inter-rater reliability) to ensure labels are fit-for-purpose, consistent, and defensible.
Design reusable, performant analytic datasets and feature definitions in partnership with data/engineering teams so multiple supervision models can reliably leverage common sources over time.

Operational controls and continuous improvement

Implement operational controls required for supervision: traceability (IDs), audit logs, replay-ability, and output schemas suitable for supervisory review and downstream workflows — partnering with engineering teams where needed for implementation.
Establish feedback loops using supervisor labels and outcomes to improve models over time; define drift/stability monitoring, retraining triggers, and periodic recalibration plans that are measurable and governable.
Identify and implement emerging techniques (LLM evaluation, retrieval strategies, calibration) to improve model quality and defensibility while maintaining governance discipline.

Platform execution model

Execute across platforms: develop in environments such as VS Code / Python / Dataiku; work with centralized engineering and architecture teams for productionization; and ensure DS artifacts (features, thresholds, evaluation scripts, monitoring definitions, documentation) are complete so deployments are stable and repeatable.

Team influence

Provide technical leadership on data science practices: code organization, reproducibility, evaluation discipline, and documentation standards across the model portfolio; mentor engineers/analysts and raise the bar on defensibility and operational excellence.
Partner tightly with the Supervision PO/SME to ensure clear communication and progress tracking: the data scientist provides ongoing technical updates and artifacts; the PO/SME leads broader stakeholder communications with DS support as needed.

Required qualifications:

7+ years in applied data science / applied ML with demonstrated ownership of end-to-end model development (problem framing → data → modeling → evaluation → evidence packages).
Hands-on proficiency with Python, SQL, and version control (Git); experience writing well-organized, maintainable, reproducible analysis/modeling code.
Strong understanding of applied ML evaluation tradeoffs (false negatives vs false positives, calibration, thresholding) and the ability to translate supervisory risk into testable acceptance criteria.
Experience building rigorous evaluation and testing approaches (holdouts, error analysis, slice-based performance, stability tests) and defining monitoring/drift indicators.
Ability to produce clear, defensible documentation artifacts (model cards/whitepapers, evaluation reports, monitoring definitions) and explain tradeoffs to non-technical partners.
Bachelor’s degree in a quantitative field (e.g., Statistics, Mathematics, Computer Science, Physics, Engineering, Chemistry, Economics) or equivalent practical experience.
Ownership mindset and ability to deliver independently in ambiguous environments; comfortable partnering with engineering/architecture to ship solutions.

Preferred qualifications:

NLP/LLM familiarity (embeddings, classification, retrieval, prompt/eval patterns); experience designing evaluation and measurement strategies for LLM/RAG outputs in human-in-the-loop workflows.
Experience in regulated environments (financial services preferred), including familiarity with audit logging, defensibility, and governance expectations.
Hands-on experience with platforms/environments such as Dataiku and cloud services (e.g., GCP) used to support production-intent analytics/modeling.
Experience applying traditional NLP methods (tokenization, TF‑IDF, topic modeling, embeddings, clustering/classification) to unstructured text.
Experience partnering with engineering teams to productionize models, including defining monitoring, drift response, and release gates.
Master’s degree in a quantitative field.