İlkay Şafak Baytar

Statistician

Specializing in data engineering, analytics, and machine learning

About Me

As a statistics graduate with a strong technical background in data engineering, data analytics, and machine learning, I specialize in large-scale data processing, ETL pipelines, data visualization, and modeling. I work with technologies such as SQL, Python, Apache NiFi, ClickHouse, Kafka, and Podman to build data management and analytics solutions. Passionate about improving data-driven decision-making and creating value for businesses, I continuously develop my expertise in data engineering and data science. In the long term, I aim to grow into a Machine Learning Engineer or Data Scientist role, focusing on deep learning, statistical modeling, and big data analytics.

Skills

SQL
Python
R
Apache NiFi
ClickHouse
Kafka
Podman
ETL
Data Visualization
Machine Learning
Statistical Modeling

Education

Hacettepe University

Ankara, Turkey

B.Sc. in Statistics; GPA: 2.83/4.00

Sep 2019 - Jun 2024

  • Developed a strong foundation in statistical analysis, probability theory, and data modeling
  • Gained practical experience in data analysis using R and Python
  • Completed coursework in machine learning, time series analysis, and statistical computing

Work Experience

Data Engineer Intern
Burgan Bank - Istanbul, Turkey
Sep 2024 - Present
  • Developed and maintained ETL pipelines using Apache NiFi
  • Designed and containerized data processing workflows using Podman and Docker
  • Optimized ClickHouse queries for large-scale analytics
  • Integrated Flask-based APIs with internal data infrastructure
  • Performed data validation, schema enforcement, and data transformation
  • Collaborated with cross-functional teams to deliver data-driven insights
Apache NiFi · ClickHouse · Podman · Docker · Flask · SQL

Data Analytics Intern
Arçelik Global A.Ş. - Istanbul, Turkey
Nov 2023 - Mar 2024
  • Assisted in data preprocessing and ETL pipeline development
  • Developed interactive dashboards using Power BI
  • Conducted SQL-based data extraction and transformation
  • Supported data pipeline automation
  • Utilized Python (Pandas, NumPy) and R for data visualization and statistical analysis
Power BI · SQL · Python · Pandas · NumPy · R

Projects

Lung Cancer EDA and Prediction
Streamlit Web Application
06/2024
  • Conducted Exploratory Data Analysis (EDA) on a lung cancer dataset, identifying key factors and their relationships.
  • Developed machine learning models to predict lung cancer risk, using Naive Bayes, clustering, and PCA.
  • Created a Streamlit web application for interactive data visualization and risk prediction, improving accessibility for users.
Python · Streamlit · Machine Learning · EDA

Data Analytics Challenge
KPMG
04/2023
  • Conducted a case study on opening a coffee shop, using socio-economic and rental cost data for Istanbul.
  • Provided strategic recommendations for coffee shop placement, contributing to business planning and decision-making.
Data Analytics · Geospatial Analysis · Business Strategy

Istanbul Solar Panel Data
Time Series Analysis
01/2023
  • Analyzed Istanbul solar panel data using statistical techniques such as seasonal decomposition, regression, and ARIMA models to forecast energy generation.
  • Refined the models to improve predictive accuracy and computational efficiency.
Time Series Analysis · ARIMA · Regression · Energy Forecasting

Data Visualization for COVID-19 Data
Shiny Web Application
05/2022
  • Developed a web application for visualizing COVID-19 data, providing an interactive platform for data exploration and analysis.
  • Implemented four ggplot2 plots with variable selectors, along with data tables for each continent, making the application more comprehensive and versatile.
R · Shiny · ggplot2 · Data Visualization

Get in Touch

Contact Information