Data engineer with three years of experience at a prominent data consulting firm. Skilled in Python, SQL, and data analysis. Specialized in designing, maintaining, and optimizing data infrastructure across the full data lifecycle. Excited to apply my expertise and problem-solving skills in a dynamic, challenging setting to deliver impactful solutions.
• Developed a beer recommendation engine using collaborative filtering techniques, namely k-nearest neighbors (KNN) and singular value decomposition (SVD).
• Achieved low error rates (RMSE: 0.1317, MAE: 0.0547) with the KNN model under 5-fold cross-validation, demonstrating effectiveness in predicting beer preferences.
• Leveraged the combined power of KNN and SVD to provide more relevant recommendations, indicating potential for improved user satisfaction and exploration of diverse beer options.
• Designed a portfolio optimized to maximize the Sharpe ratio, outpacing the S&P 500 by over 10% in cumulative return based on two years of data.
• Utilized web scraping and data preprocessing techniques to gather adjusted close prices of S&P 500 companies for portfolio optimization.
• Implemented bootstrap and Gibbs sampling methods to evaluate portfolio performance, achieving a significant increase in return.
• Used data analysis and predictive modeling to recommend raising the weekend ticket price to $86 and extending the season to 130 days to achieve the desired profit margin.
• Built neural networks from scratch, without a deep learning framework, to analyze bike-sharing data and generate sales predictions.
• Designed and developed neural network models with PyTorch to classify the sentiment of IMDB film reviews.
• Deployed an AWS API Gateway endpoint that receives user data and passes it to AWS Lambda, which processes it and forwards it to a deployed model’s endpoint on Amazon SageMaker. The model classifies a user’s review as positive or negative.
• Orchestrated end-to-end data management, including ingestion into a data lake, ETL processes, and data warehousing using AWS Glue and Athena.
• Leveraged AWS CDK and implemented a robust CI/CD pipeline to ensure seamless operations throughout the data management lifecycle.
• Implemented predictive models using AWS technologies (Kinesis Data Streams, Glue, Redshift, Athena, Step Functions, SageMaker) to forecast machine repair downtime and detect outliers in sensor data in real time.
• Improved processing efficiency of claim data, optimizing the workflow from sales at the manufacturer level to installation by on-site contractors, and submission to utility companies nationwide, leading to streamlined operations and enhanced turnaround times.
• Collected and labeled location ground-truth data for various projects, allowing Google engineers to improve models for Google Maps tools.
• Post-processed GIS and GNSS datasets, including outlier detection, smoothing of noisy data, interpolation to fill missing data, and data validation, to ensure reliable analysis results.
• Contributed to 3x subscriber growth and a 200%+ revenue increase over the first 12 months.
• Designed, developed, launched, and managed numerous apps across 3 company brands.
• Managed the app development, testing, and production deployment pipeline with Heroku and AWS.
• Developed dashboards and ETL workflows to process KPIs such as CPA and CPC, along with A/B testing results, from various data sources.
• Developed an internal tool that analyzed network traffic in order to generate security reports.
• Trained international colleagues to operate the network analyzer and to read and draw insights from the generated reports.
• Developed a genomic data engine with the Matchmaker Exchange API to sync with GA4GH’s databases.
• Converted Matchmaker Exchange JSON objects to MySQL to optimize data storage.
• Configured Seagate’s Kinetic drives using DHCP to make them accessible over a private network.
• Designed an object-oriented database with Seagate’s Kinetic API to organize and store genomic data.
• Taught younger undergraduate students unfamiliar programming concepts in C, Python, and Java.
• Planned lessons to keep tutoring sessions smooth and easy to follow.
• Graded assignments, drawing on a deeper understanding of the material to give advice and comments along with each grade.
• Collaborated with colleagues outside technology services to enhance existing applications using C# with ASP.NET and Microsoft SQL Server Management Studio.
• Documented other workflow applications that outsourced contractors wrote for the bank.
Data Science Career Track