DATA ENGINEER – RESEARCH and DEVELOPMENT “CLEARSKIES TDIR PLATFORM”

WE ARE ODYSSEY, looking for Cyber Warriors to join our journey!

As pioneers in the cybersecurity arena, our journey parallels that of the legendary Odysseus. Just as he ventured into the unknown with unwavering determination, we too navigate the ever-evolving threat landscape with an innovative and forward-thinking mindset.

Our mission is clear: to make the world a cyber safer place. If you share this determination, join our ranks and become an integral part of this journey, contributing your unique skills and perspectives to tackle impossible challenges as we build cyber-resilient futures for our clients. We firmly believe in the power of many and we promote an environment where your voice matters, learning and growth are encouraged, and innovation is rewarded.

Are you someone who thrives in the face of challenges?

Do you have a collaborative spirit, passion for innovation and a commitment to making the world a cyber safer place for all?

If so, join OUR Odyssey and make it your journey as well, because the beauty and reward lie in the journey and not the destination itself.

ROLE DESCRIPTION

We are looking for an experienced Data Engineer who works with large language models (LLMs) to join the forward-thinking Research and Development team of our award-winning “ClearSkies TDIR Platform”.

This role is crucial for developing and maintaining scalable data pipelines and infrastructure to support the training and deployment of large language models. The ideal candidate will bring a blend of data engineering skills and a deep understanding of the intricacies of managing data for LLMs and other advanced models, from preprocessing to optimization for performance at scale.

MAIN RESPONSIBILITIES
  • Design, build, and maintain scalable and efficient data pipelines specifically tailored for training and deploying large language models.
  • Work closely with data scientists and machine learning engineers to understand data requirements for LLM projects, including data collection, processing, and storage needs.
  • Implement and manage data ingestion routines from a variety of sources, ensuring data quality and accessibility for LLM training.
  • Optimize data infrastructure to support the computational demands of LLMs, including performance tuning and scalability improvements.
  • Develop tools and processes for monitoring and analyzing data pipeline performance and data quality, ensuring the integrity and availability of data.
  • Collaborate with cross-functional teams to ensure seamless integration of LLMs into production environments, including support for model versioning, deployment, and monitoring.
  • Stay abreast of the latest developments in large language models, data engineering practices, and technologies to continually improve pipeline efficiency and model performance.
  • Ensure compliance with data governance and security policies throughout the data lifecycle, from ingestion to model deployment.
KNOWLEDGE, SKILLS AND EXPERIENCE REQUIRED
  • Proven experience as a Data Engineer, with specific experience working on projects involving large language models.
  • Strong expertise in data modeling, ETL processes, and data pipeline tools.
  • Proficient in programming languages commonly used in data engineering and machine learning, such as Python and SQL.
  • Experience with big data technologies (e.g., Hadoop, Spark) and cloud services (AWS, Google Cloud, Azure) tailored for machine learning and data processing workloads.
  • Knowledge of containerization and orchestration technologies (e.g., Docker, Kubernetes) for deploying and managing LLM applications.
  • Familiarity with machine learning operations (MLOps) practices for managing the lifecycle of machine learning models, including large language models.
  • Excellent problem-solving skills, with the ability to work independently and as part of a team in a fast-paced environment.
  • Strong communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.

Additional Skills (Preferred, but not mandatory):

  • Experience with neural network optimization techniques for efficient training and inference.
  • Contributions to open-source projects related to large language models or data engineering.
  • Certifications in cloud technologies, big data, or machine learning.

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Engineering, Information Technology, or a related field.
  • 2+ years of proven experience as a Data Engineer.
WHAT’S IN IT FOR YOU
  • Competitive remuneration package (according to experience and qualifications)
  • Opportunity to work in a highly specialized, progressive and professional setting
  • Hybrid and contemporary working environment, “Best Place to Work” for 3 consecutive years
  • 13th salary
  • Provident Fund
  • Medical and Life Insurance
  • Referral Scheme: recommend top talent to the company and receive a reward
  • Half-day on Fridays
  • Performance-based awards and bonuses
  • Access to the latest technologies
  • Mentoring, training & development opportunities
Code

136

Job Location
Nicosia, Cyprus