Hello, my name is
Mahmoud Mayaleh
And I'm an AI Engineer
Reach out

About me

I'm Mahmoud, an AI engineer

AI Engineer with a Master’s in Artificial Intelligence, skilled in deep learning, NLP, and large-scale model training. Background in distributed AI systems and applied ML research.

LinkedIn · View CV

My Journey

Research Engineer

CNAM - Paris

  • Researched RDMA traffic from LLM training to design scheduling strategies, model topologies, and compute/communication overlap using ns-3 and SimAI.
  • Simulated GPT workloads (up to 175B parameters on 1,024 GPUs) with SimAI to optimize NCCL/RDMA, and prototyped MSCCL collectives for scalability.
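The scale figures above lend themselves to a quick sanity check. As a back-of-envelope sketch (not output of the ns-3/SimAI studies), the bandwidth term of a ring all-reduce can be estimated; the 350 GB gradient volume and 400 Gb/s link speed below are illustrative assumptions, not measured values:

```python
def ring_allreduce_time(model_bytes, n_gpus, link_gbps):
    """Estimate the bandwidth-bound time of a ring all-reduce:
    each GPU sends/receives 2*(n-1)/n of the gradient volume
    over its own link."""
    volume = 2 * (n_gpus - 1) / n_gpus * model_bytes
    return volume / (link_gbps * 1e9 / 8)  # bytes / (bytes per second)

# Illustrative: ~350 GB of fp16 gradients for a 175B-parameter model,
# 1024 GPUs, 400 Gb/s RDMA links.
t = ring_allreduce_time(350e9, 1024, 400)
```

Estimates like this give a lower bound that simulators such as SimAI then refine with topology, congestion, and overlap effects.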

Master's in Artificial Intelligence for Connected Industries

CNAM, Paris, France

Back End Developer

Ostim Technical University - IT department

  • Built 5+ AI-enhanced Odoo ERP modules, improving workflows and efficiency by 20%.
  • Automated back-end processes with Python and ML, cutting manual data entry by 40% and reducing errors.
  • Used AI-driven query optimization to speed up ERP load times by 25%.

Machine Learning Intern

ArkSigner

  • Built a YOLOv5-based object detection pipeline, achieving 90% accuracy.
  • Deployed AI-driven optimizations that boosted system performance by 30%.
  • Validated on custom datasets for robust deployment.

Bachelor of Engineering: Computer Engineering

Ostim Technical University, Ankara, Türkiye

Skills

Core stack

Applied AI/ML engineer focused on building end-to-end systems—data pipelines, model training/serving, and MLOps. Strong in Python + PyTorch, with hands-on work in NLP and Computer Vision.

Python
PyTorch
TensorFlow / Keras
scikit-learn
NLP (Transformers)
Computer Vision (OpenCV)

Publications

Visualization from the sentiment augmentation study

Enhancing Sentiment Classification on Small Datasets through Data Augmentation and Transfer Learning

This work provides a unified benchmark of Easy Data Augmentation (EDA), back-translation, and contextual token substitution (NLPaug) on low-resource IMDb sentiment data, evaluated with Logistic Regression, Random Forest, and BERT under identical experimental conditions.
Abstract:

Small-scale sentiment classification suffers from data scarcity, which limits model generalization. This study systematically compares three text augmentation strategies under a controlled, reproducible framework.

  • EDA (Easy Data Augmentation) based on token-level synonym replacement, insertion, deletion, and swapping.
  • Back-translation using English↔French MarianMT models to create high-fidelity paraphrases.
  • Contextual token substitution (NLPaug-style) with pre-trained language models for semantics-preserving edits.
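For a sense of what the first strategy does in practice, here is a minimal synonym-replacement sketch in the spirit of EDA. The tiny `SYNONYMS` table is a stand-in for illustration; the published EDA method draws synonyms from WordNet and also applies insertion, deletion, and swapping:

```python
import random

# Toy synonym table (illustrative); EDA proper uses WordNet.
SYNONYMS = {"movie": ["film"], "great": ["excellent", "superb"], "bad": ["poor"]}

def eda_synonym_replace(text, n=1, seed=0):
    """Replace up to n words that have synonyms with a random synonym,
    using a fixed seed for reproducibility."""
    rng = random.Random(seed)
    words = text.split()
    candidates = [i for i, w in enumerate(words) if w.lower() in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        words[i] = rng.choice(SYNONYMS[words[i].lower()])
    return " ".join(words)

augmented = eda_synonym_replace("a great movie", n=2, seed=0)
```

Back-translation and contextual substitution follow the same augment-then-relabel pattern, but produce whole-sentence paraphrases rather than token-level edits.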

Methods: Experiments use a 5,000-sample IMDb subset with 100% augmentation, 10-fold cross-validation, and fixed seeds, comparing traditional classifiers (Logistic Regression, Random Forest) and a fine-tuned BERT base model on accuracy, F1, AUC, and effect sizes.
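The fixed-seed, 10-fold protocol can be sketched with a deterministic fold split. A real run would use scikit-learn's StratifiedKFold on the actual IMDb subset; the seed value here is illustrative:

```python
import random

def kfold_indices(n, k=10, seed=42):
    """Deterministic k-fold split: shuffle indices with a fixed seed,
    then deal them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

# 5,000 samples, 10 folds, fixed seed -> identical folds on every run.
folds = kfold_indices(5000, k=10, seed=42)
```

Fixing the fold assignment (and all model seeds) is what makes the accuracy, F1, AUC, and effect-size comparisons across classifiers reproducible.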

Results:

  • All augmentation strategies yield statistically significant gains over non-augmented baselines.
  • Contextual augmentation delivers the most consistent improvements for BERT, reaching about 97% test accuracy in the augmented setting.
  • EDA and back-translation provide larger relative gains for traditional models, especially Random Forest, while exhibiting different diversity-cost trade-offs.
Published in: Discover Artificial Intelligence (Springer), 2026
Read Publication
LipLingo CNN lip-reading model schematic

LipLingo: CNN Model for Lip Reading

LipLingo is a deep learning lip-reading system built on LipNet with enhanced preprocessing and spatio-temporal modeling. It achieves 95% character-level and 87% sentence-level accuracy, showing strong potential for assistive tech and biometrics.
Abstract:

Lip reading plays a crucial role in applications such as speech recognition, assistive technologies for the hearing-impaired, and biometric authentication. However, performance often degrades when the speaker varies, the environment changes, or the speech sounds are visually similar. To overcome these challenges, we propose LipLingo:

  • Based on the LipNet architecture
  • Enhanced with standardized preprocessing for consistent mouth-region representation
  • Uses rotational validation for better generalization
  • Combines 3D convolutional layers (spatio-temporal features) and bidirectional recurrent layers (sequence modeling)
  • Optimized with Adam optimizer and CTC loss
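The architecture bullets translate roughly into the following PyTorch sketch. Layer sizes and the 28-symbol vocabulary are illustrative assumptions, not the published configuration:

```python
import torch
import torch.nn as nn

class LipLingoSketch(nn.Module):
    """Minimal sketch: 3D conv front-end for spatio-temporal features,
    bidirectional GRU for sequence modeling, linear head producing
    per-frame logits for CTC. Sizes are illustrative."""
    def __init__(self, vocab_size=28):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=(3, 5, 5), padding=(1, 2, 2)),
            nn.ReLU(),
            nn.MaxPool3d(kernel_size=(1, 2, 2)),  # pool space, keep time
        )
        self.rnn = nn.GRU(input_size=32, hidden_size=64,
                          bidirectional=True, batch_first=True)
        self.head = nn.Linear(128, vocab_size)

    def forward(self, x):            # x: (batch, 3, frames, H, W)
        f = self.conv(x)             # (batch, 32, frames, H/2, W/2)
        f = f.mean(dim=(3, 4))       # pool spatial dims -> (batch, 32, frames)
        f = f.transpose(1, 2)        # (batch, frames, 32)
        out, _ = self.rnn(f)         # (batch, frames, 128)
        return self.head(out)        # (batch, frames, vocab)

logits = LipLingoSketch()(torch.randn(2, 3, 16, 32, 64))
```

Training would pair these per-frame logits with `nn.CTCLoss` and the Adam optimizer, as the bullets above describe.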

Results:

  • 95% character-level accuracy
  • 87% sentence-level accuracy
  • Outperforms baseline LipNet
Published: 2025
Read Publication

Projects

ATLAS: Adaptive Task-aware Federated Learning for LLMs

Diagram of federated learning with edge devices
Publication-ready federated learning framework enabling efficient multi-task LLM fine-tuning on heterogeneous edge devices.
View on GitHub

Highlights

  • Four-phase pipeline: task clustering, heterogeneous LoRA configuration, split federated learning, and Laplacian-based personalization.
  • Supports DistilBERT, BERT, RoBERTa, GPT-2 with real PyTorch + HuggingFace training.
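The "heterogeneous LoRA configuration" phase can be illustrated with a toy rank-assignment rule: weaker edge devices get smaller adapters so every client can participate. The memory thresholds and device names here are hypothetical, not ATLAS's actual policy:

```python
def assign_lora_rank(device_mem_gb, ranks=(4, 8, 16)):
    """Map a device's memory budget to a LoRA adapter rank.
    Thresholds are illustrative placeholders."""
    if device_mem_gb < 4:
        return ranks[0]
    if device_mem_gb < 8:
        return ranks[1]
    return ranks[2]

# Hypothetical fleet of heterogeneous clients.
config = {dev: assign_lora_rank(mem)
          for dev, mem in {"phone": 3, "jetson": 6, "workstation": 24}.items()}
```

In the real pipeline these per-client ranks would feed into HuggingFace PEFT `LoraConfig` objects before federated training begins.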

Tech Stack

Python · PyTorch · HuggingFace · PEFT/LoRA · Federated Learning

Git Commit Multi-Agent

Diagram of multi-agent AI pipeline for git commits
AI-powered multi-agent system that generates high-quality conventional Git commit messages from staged diffs using local LLMs.
View on GitHub

Highlights

  • Three-agent pipeline (DiffAgent, SummaryAgent, CommitWriterAgent) with shared state for robust commit generation.
  • Runs entirely on-device via Ollama + OpenChat 7B, with zero API cost and full privacy.
  • Supports dry-run, verbose mode, and programmatic usage for CI or tooling integration.
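The three-agent, shared-state design can be sketched as a plain pipeline. The LLM step is stubbed out here with a trivial summarizer; the real system prompts OpenChat 7B through Ollama and reads staged diffs via GitPython:

```python
def diff_agent(state):
    # Stage 1: extract changed files from the staged diff
    # (the real agent reads `git diff --cached` via GitPython).
    state["files"] = sorted({line[6:] for line in state["diff"].splitlines()
                             if line.startswith("+++ b/")})
    return state

def summary_agent(state):
    # Stage 2: condense the diff; stubbed here, the real agent
    # prompts a local LLM via Ollama.
    state["summary"] = f"update {', '.join(state['files'])}"
    return state

def commit_writer_agent(state):
    # Stage 3: emit a Conventional Commits message from the summary.
    state["message"] = f"feat: {state['summary']}"
    return state

state = {"diff": "+++ b/app.py\n+print('hi')\n+++ b/README.md\n+docs"}
for agent in (diff_agent, summary_agent, commit_writer_agent):
    state = agent(state)
```

Passing one mutable state dict through the stages is what lets each agent build on the previous one's output while staying independently testable.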

Tech Stack

Python · Ollama · OpenChat 7B · GitPython · CLI Tools

BERT Sentiment Augmentation

Sentiment analysis graph comparing models and augmentation
Published study on data augmentation and transfer learning for sentiment classification in low-resource settings.
View on GitHub

Highlights

  • Unified benchmark of EDA, contextual augmentation, and back-translation on a 5k IMDb subset.
  • Evaluates Logistic Regression, Random Forest, and BERT with rigorous statistics (CIs, t-tests, effect sizes).
  • Reproducible pipeline with fixed seeds, exported metrics, and augmentation quality analysis via sentence embeddings.

Tech Stack

Python · BERT · scikit-learn · NLPaug · Sentence-Transformers

LipLingo: CNN-Based Lip Reading

Frames of a mouth region used for lip reading
Computer vision model that performs lip reading from video frames for silent speech understanding.
View on GitHub

Highlights

  • CNN-based architecture tailored for visual-only speech recognition.
  • Preprocessing pipeline for mouth ROI extraction and frame normalization.
  • Positioned for accessibility and multimodal AI applications.

Tech Stack

Python · CNN · Computer Vision · Deep Learning

Deep RL for Autonomous Robot Navigation

Simulated TurtleBot navigating around obstacles
Deep reinforcement learning for autonomous TurtleBot3 navigation using PPO in ROS 2 and Gazebo.
View on GitHub

Highlights

  • Trains deep reinforcement learning (PPO) agents to navigate complex environments.
  • Uses ROS 2 for realistic robot control and system orchestration.
  • Runs in Gazebo with configurable, obstacle-rich simulation worlds.
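One small building block of any PPO setup is the discounted return from which advantage estimates are computed. This sketch is a generic reinforcement-learning component, not code from the project's ROS 2/Gazebo stack:

```python
def discounted_returns(rewards, gamma=0.99):
    """Compute discounted returns G_t = r_t + gamma * G_{t+1}
    by scanning the episode's rewards backwards."""
    g, out = 0.0, []
    for r in reversed(rewards):
        g = r + gamma * g
        out.append(g)
    return out[::-1]

# Toy 3-step episode with gamma=0.5 for easy hand-checking.
returns = discounted_returns([1.0, 0.0, -1.0], gamma=0.5)
```

PPO then compares these returns against a value-function baseline and clips the policy update, which is what Stable-Baselines-style trainers automate on top of the Gazebo environment.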

Tech Stack

Python · ROS 2 · Gazebo · PPO · Reinforcement Learning

NOVA_5G: Network Slicing Testbed

5G network diagram with core, gNB, and slices
5G Standalone testbed implementing network slicing using Open5GS core and srsRAN gNB with virtual UEs.
View on GitHub

Highlights

  • End-to-end 5G SA setup with Open5GS core and srsRAN gNB + virtual UE.
  • Implements and evaluates network slicing scenarios.
  • Foundation for experiments in edge computing and 6G research.

Tech Stack

Shell · Open5GS · srsRAN · 5G SA · Network Slicing

Contact me

Name
Mahmoud Mayaleh
Address
Paris, France
Email
mahmoud@mayaleh.com
Message me