About

Experienced Data Analyst proficient in R, Python, Tableau, JavaScript, and SQL. Skilled in cleaning and analyzing large datasets, merging data frames, and applying machine learning techniques to solve real-world problems. Holds a Bachelor's degree in Statistics with a minor in Mathematics from the University of California, Davis, and recently completed a Data Analytics BootCamp certificate from the University of California, Berkeley. Committed to lifelong learning and professional development, and seeking a challenging opportunity to help companies and clients advance effectively and productively. Known for analytical problem-solving and collaboration across diverse groups, with a focus on delivering detailed and efficient analysis for stakeholders and consumers.

Basic Information
City:
San Francisco, CA
Professional Skills
Python | Pandas | NumPy | Seaborn
Machine Learning | Supervised | Unsupervised
Web Scraping (BeautifulSoup/Selenium) | APIs
Google Analytics | Google Tag Manager
SQL | Postgres | BigQuery
HTML | CSS
Tableau | Looker Studio | Dashboards
JavaScript | D3.js | Plotly | Leaflet
  • Python
  • Postgres
  • HTML
  • CSS
  • JavaScript
Education

2025

Master of Science Degree
M.S. - Statistical Data Science
San Francisco State University

Coursework covers modern statistical and machine learning techniques, classical statistical theory, and practical applications. Students also develop and refine computational skills tailored for diverse datasets, with a particular focus on the large-scale data common in business, technology, and science. The entire curriculum is built on a solid foundation of statistical theory and an understanding of the mathematical principles behind the techniques and algorithms. This holistic approach equips students with a well-rounded skill set, preparing them for data analysis and decision-making across different industries.

2023

Certificate
Data Analytics BootCamp
UC Berkeley Extension

Focused on the practical technical skills needed to analyze and solve data problems. Gained proficiency in a broad array of technologies, including Excel, Python and R programming, JavaScript charting, SQL databases, Tableau, and machine learning.

2022

Bachelor of Science Degree
B.S. - Statistics
University of California, Davis

Acquired quantitative and qualitative research and analytical skills for understanding healthcare, social policy, and many other public datasets. Used that analysis to develop effective strategies and identify possible solutions. My background in statistics and mathematics focused on fundamental linear algebra, data structure modeling, and analysis of algorithms.

Projects

The ultimate goal of this analysis is to predict possible pipe breaks. In this hypothetical scenario, we were given dummy client data that closely follows what we typically see in the real world. The immediate goal is to clean and preprocess the data and to provide some insight into how the data are structured.

Tools used   :   Python, Jupyter Notebook, Pandas, Supervised Machine Learning, Logistic Regression

Category  : Machine Learning, Time Series Forecasting

Year     :   March 2023

The All-Star Game is a game between teams of outstanding players, and many baseball fans are interested in which players will be chosen for the next All-Star Game. We made predictions by applying machine learning techniques to MLB 2021 data, which was scraped from the web. Visualizing the data helped us understand it better and surfaced some meaningful insights that might improve our predictions. Since our target variable is binary, we trained a logistic regression model and obtained, for each player, the probability of being selected for the next All-Star Game.

Tools used   :   Python, Jupyter Notebook, Pandas, Supervised Machine Learning, Logistic Regression, BeautifulSoup

Category  : Machine Learning

Year     :   June 2022
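
As a rough illustration of how logistic regression turns features into a selection probability, here is a minimal sketch in plain Python. It is not the project's code: the two features, the synthetic labels, and the helper names (fit_logistic, predict_proba) are all assumptions made for illustration, and the project itself may well have used a library implementation instead.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    # Batch gradient descent on the mean log-loss (hypothetical helper).
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    # Probability that the binary target equals 1 for feature vector x.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic stand-in data: two made-up standardized player statistics.
random.seed(0)
X, y = [], []
for _ in range(200):
    f1, f2 = random.gauss(0, 1), random.gauss(0, 1)
    selected = 1 if f1 + 0.5 * f2 + random.gauss(0, 0.5) > 1.0 else 0
    X.append([f1, f2])
    y.append(selected)

w, b = fit_logistic(X, y)
strong = predict_proba(w, b, [2.0, 1.0])    # well above average on both features
weak = predict_proba(w, b, [-1.0, -1.0])    # below average on both features
print(f"strong player: {strong:.3f}, weak player: {weak:.3f}")
```

The model outputs a probability rather than a hard label, so players can be ranked by likelihood of selection and a cutoff chosen afterwards.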

A potentially hazardous asteroid is one with an orbit that can make close approaches to the Earth and that is large enough to cause significant regional damage in the event of impact. By identifying potentially hazardous asteroids, we can assess the risk and potentially help prevent a collision between the Earth and an asteroid. To classify whether an asteroid is potentially hazardous or non-hazardous, we trained a Naive Bayes classifier, a support vector machine, and a decision tree on NASA asteroid data. The decision tree model performed best, achieving 99% accuracy.

Tools used   :   Python, Jupyter Notebook, Pandas, Supervised Machine Learning, Decision Tree

Category  : Machine Learning

Year     :   May 2022
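
The three-model comparison described for the asteroid project can be sketched roughly as follows, assuming scikit-learn is available. This is an illustration only: the data is synthetic (generated with make_classification as a stand-in for the NASA asteroid data), the Gaussian variant of Naive Bayes and all hyperparameters are assumptions, and the accuracies it prints say nothing about the 99% figure reported above.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the asteroid features.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named in the project description.
models = {
    "naive_bayes": GaussianNB(),
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best = max(scores, key=scores.get)
for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
print("best model:", best)
```

Scoring every model on the same held-out split keeps the comparison fair, since no model is evaluated on data it was trained on.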

JongWook Choe