The kind who looks at a tangle of federal and private datasets that don’t share schemas, don’t share IDs, and were never meant to talk to each other, and gets a little excited. The kind who knows that the answer is almost never “throw a bigger model at it” and is almost always “understand the data first, then pick the model.” The kind who can sit across from a federal subject matter expert and explain what a Leiden community is without making them feel dumb, and without dumbing it down either.
If you’ve ever quietly fixed someone else’s “production” notebook on a Friday afternoon - the one with hard-coded paths, no random seed, and a function called final_FINAL_v3() - this might be you.
Come join us if you're motivated to learn from others, to learn from mistakes, and to be part of a forward-looking, growth-oriented team.
Let's go Skyward together.
Lead end-to-end data science experiments, from a data readiness assessment, through clustering and topological risk modeling, into unstructured-data enrichment and entity resolution.
Run exploratory data analyses (EDAs) on government-furnished data inside a government-controlled environment: profile completeness, find the schema mismatches, flag the gaps, and document what the data can and cannot support before a single model gets trained.
Apply graph analytics: Leiden community detection, betweenness and eigenvector centrality, motif analysis, temporal cluster detection, and link prediction. Be able to explain in plain English what each one means and what it doesn’t.
Train interpretable classifiers (logistic regression, gradient boosted trees) and Graph Neural Networks (GraphSAGE, GAT) where the data supports them; reach for unsupervised anomaly detection when labels are thin (and they will be).
Run probabilistic entity resolution across biographic, behavioral, and biometric features using tools such as Senzing. Handle name transliteration, DOB variation, and fuzzy address matching like the working scientist you are.
Apply LLM-based Named Entity Recognition and relationship extraction to unstructured field text and quantify whether the extracted edges actually change the graph (rather than just adding noise that looks impressive in a deck).
Wrangle messy data and recommend supplemental, de-identified data sources that would enrich the analysis, and document the case for each recommendation so the customer’s privacy and legal teams can make an informed call.
Document everything so the customer can rerun it after you’re gone. Reproducibility is a hard acceptance criterion here, not a “nice to have.”
Hold the line on rigor: confidence intervals on everything, null findings documented with the same care as positive ones, and zero appetite for dashboard theater.
An active Secret clearance at time of hire.
7+ years of applied data science experience, with at least 2 years that included graph analytics or network analysis as a primary tool, not a side dish.
Strong production-grade Python: pandas, NumPy, scikit-learn, NetworkX (or graph-tool / igraph), and at least one GNN library such as PyTorch Geometric or DGL.
Real-world experience with probabilistic record linkage / entity resolution - Senzing, dedupe, FEBRL, Magellan, or a homegrown Fellegi-Sunter implementation you’d be willing to defend in a code review.
Comfort working without labels. Anomaly detection, positive-unlabeled learning, isolation forests, autoencoders. You know how to evaluate a model when there’s no clean ground truth to compare it to.
Interpretability discipline: SHAP, feature importance, partial dependence plots, and the wisdom to pick the simpler model when it’s the right one.
LLM application experience beyond prompt-and-pray. Entity and relationship extraction on real text, with real evaluation of extraction quality.
Statistical maturity: someone who knows what a power analysis is, why a confidence interval matters, and why p-hacking is bad even when nobody is checking.
Comfort presenting technical findings to non-technical operational stakeholders without losing the nuance.
Reproducibility hygiene: you ship code that someone else can actually rerun, with version control, parameterized pipelines, deterministic seeds, pinned dependencies, and READMEs that work.
Published research, open-source contributions, or conference talks in entity resolution, graph ML, or applied causal inference.
Knowledge of commercial data sources such as LexisNexis, TransUnion, and Babel Street.
A track record of standing in front of federal subject matter experts and walking them through a null result without flinching.
A reputation among teammates as the person who finds the bug in the pipeline before it hits the deliverable.
Even if you don’t meet 100% of the qualifications, we encourage you to apply. At Skyward, we’re focused on hiring individuals with the right skills and passion to grow, not just checking off every box.
Medical, dental, and vision insurance (fully paid for employees)
15 days of paid leave
7 days of sick leave
2 days of bereavement leave
11 paid Federal holidays
Up to 40 hours for jury duty
401(k) with 4% employer contribution (and no vesting period)
Up to 4 weeks of paid paternity and maternity leave
Company-provided laptop
$5,000 per year for professional development
$600 per year for technical supplies and equipment
$2,000 referral bonus
Life and disability insurance
HSA and FSA
LegalShield and IDShield voluntary benefits
Opportunity to work in a collaborative, motivated team focused on modernizing government services with cutting-edge technology and innovative solutions. Who says government work can't be exciting?
At Skyward, we support flexible working hours and remote opportunities to help maintain a healthy work-life balance for all employees.
Offers of employment with Skyward are contingent upon acceptable results of a background investigation.
Applicants must have the ability to obtain and maintain a Public Trust security clearance due to the nature of our work as a government contractor.