Hello, my name is
Mahmoud Mayaleh
And I'm an AI Engineer

About me

I'm Mahmoud and I'm an AI engineer

AI Engineer with a Master’s in Artificial Intelligence, skilled in deep learning, NLP, and large-scale model training. Background in distributed AI systems and applied ML research.

My Journey

Research Engineer

CNAM - Paris

  • Researched RDMA + LLM traffic to design scheduling strategies, model topologies, and compute overlaps using ns-3 & SimAI.
  • Simulated GPT workloads (up to 175B on 1024 GPUs) with SimAI to optimize NCCL/RDMA, and prototyped MSCCL collectives for scalability.

Master's in Artificial Intelligence for Connected Industries

CNAM, Paris, France

Back End Developer

Ostim Technical University - IT department

  • Built 5+ AI-enhanced Odoo ERP modules, improving workflows and efficiency by 20%.
  • Automated back-end tasks with Python and ML, cutting manual entry by 40% and reducing errors.
  • Used AI-driven query optimization to speed up ERP load times by 25%.

Machine Learning Intern

ArkSigner

  • Built a YOLOv5-based object detection pipeline, achieving 90% accuracy.
  • Deployed AI-driven optimizations that boosted system performance by 30%.
  • Validated on custom datasets for robust deployment.

Bachelor of Engineering: Computer Engineering

Ostim Technical University, Ankara, Türkiye

Skills

Core stack

Applied AI/ML engineer focused on building end-to-end systems—data pipelines, model training/serving, and MLOps. Strong in Python + PyTorch, with hands-on work in NLP and Computer Vision.

Python
PyTorch
TensorFlow / Keras
scikit-learn
NLP (Transformers)
Computer Vision (OpenCV)

Publications

Visualization from the sentiment augmentation study

Enhancing Sentiment Classification on Small Datasets through Data Augmentation and Transfer Learning

This work provides a unified benchmark of Easy Data Augmentation (EDA), back-translation, and contextual token substitution (NLPaug) on low-resource IMDb sentiment data, evaluated with Logistic Regression, Random Forest, and BERT under identical experimental conditions.
Abstract:

Small-scale sentiment classification suffers from data scarcity, which limits model generalization. This study systematically compares three text augmentation strategies under a controlled, reproducible framework.

  • EDA (Easy Data Augmentation) based on token-level synonym replacement, insertion, deletion, and swapping.
  • Back-translation using English↔French MarianMT models to create high-fidelity paraphrases.
  • Contextual token substitution (NLPaug-style) with pre-trained language models for semantics-preserving edits.
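The token-level EDA operations listed above can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the function names, the toy synonym table, and the sample sentence are all hypothetical, and random insertion is left out because it needs a real synonym source such as WordNet.

```python
import random

def random_swap(tokens, n_swaps, rng):
    """Swap two randomly chosen positions, n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, but never return an empty sentence."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(tokens)]

def synonym_replacement(tokens, synonyms, n, rng):
    """Replace up to n tokens that have an entry in the synonym table."""
    tokens = list(tokens)
    candidates = [i for i, t in enumerate(tokens) if t in synonyms]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        tokens[i] = rng.choice(synonyms[tokens[i]])
    return tokens

# Hypothetical inputs, for illustration only.
rng = random.Random(0)  # fixed seed so the augmented text is reproducible
sentence = "the movie was surprisingly good".split()
toy_synonyms = {"good": ["great", "fine"], "movie": ["film"]}

swapped = random_swap(sentence, 1, rng)
deleted = random_deletion(sentence, 0.2, rng)
replaced = synonym_replacement(sentence, toy_synonyms, 1, rng)
```

Passing the random generator in explicitly keeps each operation deterministic under a fixed seed, which matters when augmented data has to be regenerated identically across runs.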

Methods: Experiments use a 5,000-sample IMDb subset with 100% augmentation, 10-fold cross-validation, and fixed seeds, comparing traditional classifiers (Logistic Regression, Random Forest) and a fine-tuned BERT base model on accuracy, F1, AUC, and effect sizes.
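As a rough sketch of how 10-fold cross-validation with a fixed seed produces reproducible splits, the helper below shuffles sample indices once and slices them into folds. The function name and the seed value are assumptions for illustration; the study's own splitting code is not shown on this page.

```python
import random

def k_fold_indices(n_samples, k, seed):
    """Yield (train_idx, test_idx) pairs for k-fold CV using a seeded shuffle."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# 5,000 samples and 10 folds match the setup described above; the seed is arbitrary.
folds = list(k_fold_indices(n_samples=5000, k=10, seed=42))
```

A common precaution with augmented data is to augment only the training portion of each fold, so that no paraphrase of a test sample leaks into training. Whether the study did this is not stated here.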

Results:

  • All augmentation strategies yield significant and statistically robust performance gains over non-augmented baselines.
  • Contextual augmentation delivers the most consistent improvements for BERT, reaching about 97% test accuracy in the augmented setting.
  • EDA and back-translation provide larger relative gains for traditional models, especially Random Forest, while exhibiting different diversity–cost trade-offs.
Published in: Discover Artificial Intelligence (Springer), 2026
LipLingo CNN lip-reading model schematic

LipLingo: CNN Model for Lip Reading

LipLingo is a deep learning lip-reading system built on LipNet with enhanced preprocessing and spatio-temporal modeling. It achieves 95% character-level and 87% sentence-level accuracy, showing strong potential for assistive tech and biometrics.
Abstract:

Lip reading plays a crucial role in applications such as speech recognition, assistive technologies for the hearing-impaired, and biometric authentication. However, performance often degrades when the speaker varies, the environment changes, or the speech sounds are visually similar. To overcome these challenges, we propose LipLingo:

  • Based on the LipNet architecture
  • Enhanced with standardized preprocessing for consistent mouth-region representation
  • Uses rotational validation for better generalization
  • Combines 3D convolutional layers (spatio-temporal features) and bidirectional recurrent layers (sequence modeling)
  • Trained with the Adam optimizer and CTC loss
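A model trained with CTC loss emits one score vector per video frame, including a special blank class. One common way to turn those frames into text is greedy decoding: take the best class per frame, merge consecutive repeats, then drop blanks. The sketch below illustrates that step only. The function name, the tiny alphabet, and the scores are made up, and whether LipLingo decodes greedily or with beam search is not stated here.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank_index=0):
    """Pick the best class per frame, merge consecutive repeats, then drop blanks."""
    best_path = [
        max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores
    ]
    decoded = []
    previous = None
    for index in best_path:
        # Emit a character only when the class changes and is not the blank.
        if index != previous and index != blank_index:
            decoded.append(alphabet[index])
        previous = index
    return "".join(decoded)

# Hypothetical per-frame scores over a toy alphabet; "-" is the CTC blank.
alphabet = ["-", "b", "i", "n"]
frame_scores = [
    [0.10, 0.70, 0.10, 0.10],  # b
    [0.10, 0.70, 0.10, 0.10],  # b again, merged with the previous frame
    [0.80, 0.10, 0.05, 0.05],  # blank
    [0.10, 0.10, 0.70, 0.10],  # i
    [0.10, 0.10, 0.10, 0.70],  # n
    [0.10, 0.10, 0.10, 0.70],  # n again, merged
]
text = ctc_greedy_decode(frame_scores, alphabet)
```

The blank class is what lets CTC represent doubled letters: two identical characters separated by a blank frame are kept as two characters, while adjacent repeats are merged into one.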

Results:

  • 95% character-level accuracy
  • 87% sentence-level accuracy
  • Outperforms baseline LipNet
Published in: 2025

Projects

LipLingo: Deep Learning Model for Lip Reading

LipLingo logo and model schematic
LipLingo is a lip-reading tool that turns silent video of someone’s mouth into text. It learns typical lip-movement patterns, which is useful when audio is missing or noisy.

Highlights

  • Visual lip-reading system that converts silent mouth video into text
  • Data & prep: trained and evaluated on the GRID corpus with face/mouth detection, frame normalization, and basic augmentations.

Tech Stack

Python TensorFlow/Keras NumPy OpenCV

Sentiment Analysis on Social Media Comments

Sentiment analysis
Text-classification pipeline that scores the sentiment of social media comments using classic ML and lightweight NLP.

Highlights

  • Computed per-comment scores from Twitter_Data.csv and wrote labeled outputs.
  • Generated a sentiment-score histogram and calculated the dataset’s average compound sentiment.
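The post-processing described above can be sketched without any external library. The term "compound" suggests scores in the range -1 to 1, as produced by NLTK's VADER analyzer, but the page does not name the scorer, so the snippet starts from already-computed scores. The ±0.05 labeling thresholds follow the usual VADER convention and are an assumption here, as are the function names and the sample values.

```python
from statistics import mean

def label_from_compound(score, threshold=0.05):
    """Map a compound score in [-1, 1] to a coarse sentiment label."""
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

def histogram(scores, n_bins=4, low=-1.0, high=1.0):
    """Count scores per equal-width bin; the top edge falls in the last bin."""
    counts = [0] * n_bins
    width = (high - low) / n_bins
    for s in scores:
        bin_index = min(int((s - low) / width), n_bins - 1)
        counts[bin_index] += 1
    return counts

# Hypothetical compound scores, one per comment.
compound_scores = [0.8, -0.6, 0.0, 0.3, -0.1]

labels = [label_from_compound(s) for s in compound_scores]
average_compound = mean(compound_scores)
bin_counts = histogram(compound_scores)
```

In a full pipeline the labels would be written back next to each comment and the bin counts passed to a plotting library such as Matplotlib.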

Tech Stack

Python scikit-learn NLTK pandas Matplotlib

Data-Driven Sales Trend Predictor

Line and bar chart showing forecasted sales
Sales forecasting with linear regression and clear error metrics for decision-ready insights.

Highlights

  • Sales forecasting tool that predicts future demand per store/item from historical transactions.
  • Pipeline: EDA → cleaning & time-indexing → feature engineering (lags, rolling means, seasonality/holiday flags) → model training → backtesting → error analysis.
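Two of the feature-engineering steps in the pipeline, lags and rolling means, can be illustrated with plain Python lists. This is a sketch under assumptions: the function names and the sales figures are invented, and the project itself presumably builds these features with pandas. The lag-1 series also serves as a naive forecast whose error gives a baseline to compare a regression model against.

```python
def lag_feature(series, lag):
    """Value observed `lag` steps earlier; None where history is missing."""
    return [series[i - lag] if i >= lag else None for i in range(len(series))]

def rolling_mean(series, window):
    """Mean of the `window` values before each step.

    The current value is excluded so the feature never leaks the target.
    """
    result = []
    for i in range(len(series)):
        if i < window:
            result.append(None)
        else:
            result.append(sum(series[i - window:i]) / window)
    return result

def mean_absolute_error(actual, predicted):
    """MAE over the positions where a prediction exists."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if p is not None]
    return sum(abs(a - p) for a, p in pairs) / len(pairs)

# Hypothetical daily sales for one store/item pair.
sales = [10, 12, 11, 15, 14, 18]

lag_1 = lag_feature(sales, 1)
rolling_3 = rolling_mean(sales, 3)
naive_mae = mean_absolute_error(sales, lag_1)  # "tomorrow equals today" baseline
```

Computing every feature strictly from past values is what makes backtesting meaningful: each prediction uses only information that would have been available at that point in time.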

Tech Stack

Python NumPy pandas Matplotlib

Contact me

Name
Mahmoud Mayaleh
Address
Paris, France
Email
mahmoud@mayaleh.com