I'm Helen,
a data scientist
based in San Diego / Los Angeles.
About
I am a recent graduate of UC San Diego with a B.S. in Data Science and a B.A. in Political Science with a specialization in Public Policy. I love exploring the intersection of technology, especially anything data-related, with public policy and civic tech. I hope to bring my fascination with both of these amazing fields together through data analysis!
Expertise
- Data Mining
- Predictive Analytics
- Machine Learning
- Product Management
- Data Preparation
- Data Visualization
Experience
Bureau of Labor Statistics
Civic Digital Fellowship - Data Science
June 2021 - August 2021
Selected for a fellowship with a 6% acceptance rate, I helped expand an auto-classifier algorithm based on Producer Price Index (PPI) categories. I also helped spearhead a Data Lakes pilot program that aimed to automate data collection using AWS and other cloud-based technology.
Center for Peace and Security Studies
Research Assistant
October 2019 - June 2021
Collected and organized data using quantitative methods and analyzed the resulting datasets to study international conflict. Created geocoded visualizations of these conflicts over time using visualization packages in R.
Data Science Student Society
Project Lead
January 2020 - June 2020
Led a group of students in a project to analyze and predict hate crimes across the United States. Using data gathered over twenty years, we applied various data cleaning approaches and exploratory data analysis (EDA) to multiple datasets, then selected the visualizations that best captured these aggregates. We also developed a machine learning model based on NLP and past trends to predict future cases.
Recent Works
Here are some of my favorite recent projects. Feel free to check them out.
Reddit/Twitter Misinformation
An examination of how misinformation content spreads on the social media platforms Twitter and Reddit, analyzing both their differences and similarities. We look at several measures of diffusion, particularly within echo chambers, as well as user polarity and post polarity.
- Reddit/Twitter API
- Pandas, NumPy, Matplotlib, Seaborn, Plotly
- HTML, CSS
MBTI Personality Predictions
Explored the data using histograms and correlation matrices to identify which features matter most in determining an MBTI personality type. Imputed missing data where needed and developed classification pipelines using kNN and SVM models.
- Pandas, NumPy, scikit-learn, Matplotlib
- PyTorch
Scratch
As members of the Curriculum Committee of CS foreach, a student organization at UCSD, we develop fun learning tools that help kids of all ages learn programming languages. We have found Scratch to be one of the best languages for this, helping to spark an interest in STEM early.
- Teaching
Predicting Hotel Bookings
Our team wanted to predict how likely a customer is to cancel a hotel booking they have already reserved. We found a Kaggle dataset containing booking details from a city hotel and a resort hotel, along with a number of additional attributes. From this data, we built a model pipeline that estimates how likely a guest is to cancel their booking.
- Pandas, NumPy, Matplotlib, scikit-learn
- kNN, Random Forest, Logistic Regression, Pipeline, Preprocessing
Get In Touch
I'd love to hear from you. Whether you have a question or just want to chat about data science, civic tech, or anything in between, shoot me a message.