İlkay Şafak Baytar
Statistician
Specializing in data engineering, analytics, and machine learning
About Me
I am a statistics graduate with a strong technical background in data engineering, data analytics, and machine learning, specializing in large-scale data processing, ETL pipelines, data visualization, and modeling. I work with SQL, Python, Apache NiFi, ClickHouse, Kafka, and Podman to build solutions for data management and analytics. I am passionate about improving data-driven decision-making and creating value for businesses, and I continuously build my expertise in data engineering and data science. In the long run, I aim to specialize as a Machine Learning Engineer or Data Scientist, focusing on deep learning, statistical modeling, and big data analytics.
Skills
Education
Hacettepe University
Ankara, Turkey
B.Sc. in Statistics; GPA: 2.83/4.00
Sep 2019 - Jun 2024
- Developed a strong foundation in statistical analysis, probability theory, and data modeling
- Gained practical experience in data analysis using R and Python
- Completed coursework in machine learning, time series analysis, and statistical computing
Work Experience
- Developed and maintained ETL pipelines using Apache NiFi
- Designed and containerized data processing workflows using Podman & Docker
- Optimized ClickHouse queries for large-scale analytics
- Integrated Flask-based APIs with internal data infrastructure
- Conducted data validation, schema enforcement, and transformation
- Collaborated with cross-functional teams for data-driven insights
- Assisted in data preprocessing and ETL pipeline development
- Developed interactive dashboards using Power BI
- Conducted SQL-based data extraction and transformation
- Supported data pipeline automation
- Used Python (pandas, NumPy) and R for data visualization and statistical analysis
Projects
- Conducted exploratory data analysis (EDA) on a lung cancer dataset, identifying key factors and their relationships.
- Developed machine learning models to predict lung cancer risk, using Naive Bayes, clustering, and PCA.
- Created a Streamlit web application for interactive data visualization and risk prediction, improving accessibility for users.
- Conducted a case study on opening a coffee shop, using socio-economic and rental cost data from Istanbul.
- Provided strategic recommendations for coffee shop placement, contributing to business planning and decision-making.
- Analyzed Istanbul solar panel data using seasonal decomposition, regression, and ARIMA models to forecast energy generation.
- Applied data analysis and model development skills to enhance predictive accuracy and computational efficiency.
- Developed a web application for visualizing COVID-19 data, providing an interactive platform for data exploration and analysis.
- Implemented four different ggplot2 plots with variable selectors, along with data tables for different continents.