Ph.D. in Bioinformatics, UCSD
10+ years in AI/ML algorithm development
5+ years leading data science teams and projects
Specialties: LLMs, Gen AI, MLOps, Classical ML, Deep Learning, AI/ML project management
Ph.D. in Bioinformatics, University of California, San Diego (UCSD)
Bachelor's in Life Sciences, National Taiwan University
Terra.do Fellow, 2023 Elephant Cohort
Juno Diagnostics, 2021-2023
Led the implementation of automated pipelines and continuous algorithm improvements, reducing costs and improving performance after product launch.
Processed and reported 5000+ samples in 8 months with high accuracy and minimal pipeline errors, demonstrating operational efficiency and data reliability.
Enabled data visibility across the organization by architecting a centralized data lake in Snowflake that combines data from multiple platforms, supporting data-driven decision-making and fast strategic responses.
Optimized the ETL pipeline to Snowflake, achieving a 60% cost reduction.
Streamlined the bioinformatics pipeline and AWS S3 storage, reducing AWS costs by 25% while supporting 20% month-over-month sample volume growth.
ResMed, 2020-2021
Applied deep learning NLP algorithms to detect emerging topics and issues in customer reviews, resulting in improved customer insights and proactive response strategies.
Developed a real-time web application with Dash to monitor and track NLP results, enabling data-driven decision-making and quick action on emerging trends.
Designed and implemented a multi-layer machine learning algorithm to optimize HME (Home Medical Equipment) decision-making, leading to increased compliance rates and substantial cost savings.
Contributed actively to shared libraries for machine learning, streamlining the production ML pipeline development process and enhancing feature engineering capabilities.
Spearheaded the development of a production ML pipeline platform using AWS (SageMaker, Lambda, EC2, etc.), Docker, open-source Spark, and MLflow, ensuring scalable and efficient data processing and model deployment.
Thermo Fisher Scientific, 2017-2020
Created best practices for end-to-end productionized machine learning pipelines and an ML model management platform, ensuring streamlined and efficient model deployment and monitoring.
Developed a machine learning model to predict customer buying behaviors based on views and sales history, providing valuable insights for marketing and sales strategies.
Led a 3-person agile team for the Product Interaction Network project, using network/graph analysis and NLP techniques. The project increased product context annotation by 37%, enhancing understanding of interactions among >200,000 active products.
Provided strategic pricing support to multiple business segments within the Life Science Group (LSG), driving data-driven pricing strategies.
Developed novel machine learning algorithms and statistical models for data-driven pricing strategies, including global list price recommendations across the whole LSG portfolio, resulting in an $80 million impact.
Specialized in machine learning models and demonstrated proficiency in multiple machine learning libraries in R and Python.