Personal Portfolio - Data Science Professional

About Me

Fall 2023 Graduate Seeking Data Scientist & Data Analyst Roles

I graduated from Georgia Tech in December 2023 with a GPA of 3.85, majoring in Computational Data Analytics. I also completed my undergraduate degree at Georgia Tech, in the Scheller College of Business.

With a strong academic background and work experience in data analysis, I bring skills in data visualization, statistical analysis and modeling, data mining, geospatial data modeling, predictive modeling, machine learning, deep learning, artificial intelligence, natural language processing, and programming. I use these skills and cutting-edge technologies to turn large, complex datasets into valuable insights.

Technical Skills

Python: 90%

SQL: 90%

Machine Learning: 90%

Time Series Analysis: 85%

Natural Language Processing: 85%

A/B Testing: 85%

Work Experience

December 2023 - May 2024

Data Scientist Intern

Prospect 33, New York, NY

Developed multiple variations of random forest proximity algorithms to detect outliers and impute missing values, benchmarked their performance against 14 unsupervised outlier detection models by AUC-ROC score, and identified 2% of misclassified funds for a leading asset firm.

June 2023 - August 2023

Business Intelligence Analyst Intern

UPS - Supply Chain Solutions, Alpharetta, GA

Built a real-time data pipeline monitor with Power BI and wrote T-SQL queries to automate pipeline performance analysis and resolve potential pipeline issues, reducing manual effort by 90%.

June 2022 - August 2022

Advanced Tax Intern

PricewaterhouseCoopers LLP, Atlanta, GA

Assessed U.S. ownership in foreign corporations and intangible property values to propose entity restructuring per requirements.

June 2021 - August 2021

Data Analyst Intern

FirstKey Homes, Marietta, GA

Collected and analyzed data in Python on homes with HVAC issues, identifying potential savings of $3k+ in monthly maintenance fees, and predicted key expense drivers using statistical techniques to support data-driven decision-making.

Portfolio

Therapy Dropout Prediction (Humana Mays Healthcare Analytics Competition)

Conducted EDA on 500k+ Rx/medical claims records, following best practices, to find features predictive of premature therapy discontinuation.
Engineered features in Python to achieve a 0.94 AUC score with XGBoost, implemented logistic regression to reveal primary dropout predictors, and predicted the treatment discontinuation risk for each of 1,232 patients following an adverse drug event.

Service Access Web App (UPS Technology Group Hackathon, Top 5/50)

Designed a Streamlit-based web app with assessment, query features, and employee authentication, powered by a CatBoost model.
Predicted employee access to resources, reduced manager check time by approx. 20%, and raised prediction accuracy by 5% by tuning the learning rate, number of iterations, and depth of the gradient-boosted trees.
Presented to management on how the app helps reduce onboarding wait times and offboarding audits.

Financial Insight Web App

Built a Tableau dashboard integrating a portfolio optimizer and sentiment analysis with risk assessment for informed decision-making, and visualized historical company performance with price and volume charts.
Scraped 4 years of financial news & reports with Beautiful Soup and processed the text with FinBERT and NLP tokenization.
Analyzed & recommended 18 stocks with high returns & minimal risks with Monte Carlo simulation and Sharpe Ratio.
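
As a rough illustration of how Monte Carlo simulation and the Sharpe ratio can work together to rank portfolios, here is a minimal sketch in plain Python. It is not the project's actual code: the function names, the long-only random-weight scheme, and the toy return figures are all assumptions made for illustration.

```python
import random
import statistics

def sharpe_ratio(returns, risk_free_rate=0.0):
    """Mean excess return divided by the sample standard deviation of returns."""
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)

def portfolio_returns(asset_returns, weights):
    """Combine per-asset return series into one weighted portfolio series."""
    n_periods = len(asset_returns[0])
    return [
        sum(w * series[t] for w, series in zip(weights, asset_returns))
        for t in range(n_periods)
    ]

def best_random_portfolio(asset_returns, n_trials=2000, seed=0):
    """Monte Carlo search: sample random long-only weights, keep the best Sharpe ratio."""
    rng = random.Random(seed)
    best_weights, best_sharpe = None, float("-inf")
    for _ in range(n_trials):
        raw = [rng.random() for _ in asset_returns]
        total = sum(raw)
        weights = [x / total for x in raw]  # normalize so weights sum to 1
        score = sharpe_ratio(portfolio_returns(asset_returns, weights))
        if score > best_sharpe:
            best_weights, best_sharpe = weights, score
    return best_weights, best_sharpe

# Made-up per-period returns for three hypothetical assets (illustration only).
toy_returns = [
    [0.010, 0.020, -0.005, 0.015, 0.007],
    [0.030, -0.020, 0.040, -0.010, 0.025],
    [0.005, 0.006, 0.004, 0.007, 0.005],
]
weights, score = best_random_portfolio(toy_returns)
```

Fixing the random seed keeps the search reproducible; a real analysis would use historical return series and would typically annualize the ratio.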

Best Buy Daily Sales Forecasting (MSA Project sponsored by Best Buy)

Used Pandas, NumPy, and Matplotlib for data preprocessing, data transformation, and data visualization of 5-year sales data.
Performed feature preprocessing and selection using PCA, Random Forests, correlation matrix, and one-hot encoding.
Implemented XGBoost and Facebook Prophet models to improve sales forecasting performance, reducing RMSE by 50%.

Movie Recommendation System

Explored ~800k movie ratings and conducted descriptive data analysis using Python Pandas & Seaborn visuals.
Transformed genre into 32 dummy features, identified similarities between movies based on user & movie features using hierarchical clustering with unevenly distributed genre data, and selected the silhouette coefficient to optimize cluster coherence.
Implemented MLP with SGD & Adam optimization, SVD, SVD++, KNN, and Bernoulli & Gaussian Naive Bayes classifiers, fine-tuned model hyperparameters, and achieved 67.7% accuracy in predicting whether a user would like a movie using SVD++.
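
The silhouette coefficient mentioned above can be sketched in a few lines of plain Python. This is a minimal illustration that assumes Euclidean distance and made-up 2-D points, not the project's code or data; in practice a library routine such as scikit-learn's `silhouette_score` would normally be used.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the same cluster
        # (the point's zero distance to itself is included in the sum)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two tight, well-separated groups.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11)]
good = silhouette_score(toy_points, [0, 0, 1, 1])  # matches the true grouping
bad = silhouette_score(toy_points, [0, 1, 0, 1])   # mixes the two groups
```

A score near 1 means points sit much closer to their own cluster than to the nearest other cluster, so comparing scores across candidate cluster counts is one way to choose where to cut a hierarchical clustering.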

Data Science Club - BirdCLEF Team

Led spatial autocorrelation analysis with ArcGIS using data science and data manipulation techniques to identify the populated regions.
Developed a LightGBM model to predict bird species occurrences from spatial and temporal data and reveal migration patterns.
Automated statistical & spatial workflows, enhanced data processing, and visualized geospatial data with GeoPandas and PyTorch.

Cancer Survival Prediction - 2019 Hackathon Hacklytics

Led a team of 3 analysts to build a linear regression model with scikit-learn in Python to analyze the correlation between the number of mutations per gene and the estimated survival rate among cancer patients, with an adjusted R-squared value of 0.76.
Showcased analytical findings, trends & correlations between mutations & percentage of cases affected, and derived personal mutation risk based on 5 demographic parameters using an interactive Tableau dashboard.

Contact

City:

Queens, New York

Email:

peterjinsong99@gmail.com

Phone:

(646)-875-6307

Jinsong Zhen

Peter Zhen