{"id":5559,"date":"2026-02-02T14:56:45","date_gmt":"2026-02-02T14:56:45","guid":{"rendered":"https:\/\/cephasconsult.biz\/?post_type=job_listing&#038;p=5559"},"modified":"2026-03-05T00:30:14","modified_gmt":"2026-03-05T00:30:14","slug":"lead-ii-software-engineering-aws-apache-spark-pyspark-scala-apache-kafka-47292","status":"expired","type":"job_listing","link":"https:\/\/cephasconsult.biz\/?post_type=job_listing&p=5559","title":{"rendered":"Lead II &#8211; Software Engineering &#8211; AWS, Apache Spark (PySpark\/Scala), Apache Kafka 47292"},"content":{"rendered":"<p><span class=\"flex-shrink-0\">Positions:<span class=\"font-semibold\">3 <\/span><\/span><span class=\"flex-shrink-0 font-semibold\">Full Time<\/span><\/p>\n<div class=\"font-inter-regular-paragraph2 text-cbrex-light-neutral-800 capitalize grid grid-cols-3 gap-x-1\">\n<div class=\"col-span-1\">Experience<\/div>\n<div class=\"font-inter-semibold-paragraph2  text-cbrex-light-surface-pb col-span-2\">5 &#8211; 8 Years<\/div>\n<\/div>\n<div><\/div>\n<div>\n<div class=\"transform scale-100 opacity-100\">\n<div id=\"headlessui-disclosure-panel-:r13m:\">\n<div class=\"relative flex flex-col gap-y-2 border-b bg-cbrex-light-neutral-50 py-3 px-4 text-sm text-cbrex-light-neutral-700\">\n<div id=\"jobDescriptionViewer\" data-cy=\"jobDescriptionViewerCbrexEditorViewer\">\n<div class=\"toastui-editor-contents\">\n<div data-nodeid=\"340\">\n<h3>Job Summary<\/h3>\n<p>We are seeking a skilled Data Engineer to design, build, and optimize scalable data pipelines and cloud-based data platforms. 
The role involves working with large-scale batch and real-time data processing systems, collaborating with cross-functional teams, and ensuring data reliability, security, and performance across the data lifecycle.<\/p>\n<p>Key Responsibilities<\/p>\n<h3>ETL Pipeline Development &amp; Optimization<\/h3>\n<ul>\n<li>Design, develop, and maintain complex end-to-end ETL pipelines for large-scale data ingestion and processing.<\/li>\n<li>Optimize data pipelines for performance, scalability, fault tolerance, and reliability.<\/li>\n<\/ul>\n<h3>Big Data Processing<\/h3>\n<ul>\n<li>Develop and optimize batch and real-time data processing solutions using <strong>Apache Spark (PySpark\/Scala)<\/strong> and <strong>Apache Kafka<\/strong>.<\/li>\n<li>Ensure fault-tolerant, scalable, and high-performance data processing systems.<\/li>\n<\/ul>\n<h3>Cloud Infrastructure Development<\/h3>\n<ul>\n<li>Build and manage scalable, cloud-native data infrastructure on <strong>AWS<\/strong>.<\/li>\n<li>Design resilient and cost-efficient data pipelines adaptable to varying data volumes and formats.<\/li>\n<\/ul>\n<h3>Real-Time &amp; Batch Data Integration<\/h3>\n<ul>\n<li>Enable seamless ingestion and processing of real-time streaming and batch data sources (e.g., <strong>AWS MSK<\/strong>).<\/li>\n<li>Ensure consistency, data quality, and a unified view across multiple data sources and formats.<\/li>\n<\/ul>\n<h3>Data Analysis &amp; Insights<\/h3>\n<ul>\n<li>Partner with business teams and data scientists to understand data requirements.<\/li>\n<li>Perform in-depth data analysis to identify trends, patterns, and anomalies.<\/li>\n<li>Deliver high-quality datasets and present actionable insights to stakeholders.<\/li>\n<\/ul>\n<h3>CI\/CD &amp; Automation<\/h3>\n<ul>\n<li>Implement and maintain CI\/CD pipelines using <strong>Jenkins<\/strong> or similar tools.<\/li>\n<li>Automate testing, deployment, and monitoring to ensure smooth production releases.<\/li>\n<\/ul>\n<h3>Data Security &amp; 
Compliance<\/h3>\n<ul>\n<li>Collaborate with security teams to ensure compliance with organizational and regulatory standards (e.g., GDPR, HIPAA).<\/li>\n<li>Implement data governance practices ensuring data integrity, security, and traceability.<\/li>\n<\/ul>\n<h3>Troubleshooting &amp; Performance Tuning<\/h3>\n<ul>\n<li>Identify and resolve performance bottlenecks in data pipelines.<\/li>\n<li>Apply best practices for monitoring, tuning, and optimizing data ingestion and storage.<\/li>\n<\/ul>\n<h3>Collaboration &amp; Cross-Functional Work<\/h3>\n<ul>\n<li>Work closely with engineers, data scientists, product managers, and business stakeholders.<\/li>\n<li>Participate in agile ceremonies, sprint planning, and architectural discussions.<\/li>\n<\/ul>\n<p>Skills &amp; Qualifications<\/p>\n<h3>Mandatory (Must-Have) Skills<\/h3>\n<ol>\n<li><strong>AWS Expertise<\/strong>\n<ul>\n<li>Hands-on experience with AWS Big Data services such as <strong>EMR, Managed Apache Airflow, Glue, S3, DMS, MSK, and EC2<\/strong>.<\/li>\n<li>Strong understanding of cloud-native data architectures.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Big Data Technologies<\/strong>\n<ul>\n<li>Proficiency in <strong>PySpark or Scala Spark<\/strong> and <strong>SQL<\/strong> for large-scale data transformation and analysis.<\/li>\n<li>Experience with <strong>Apache Spark<\/strong> and <strong>Apache Kafka<\/strong> in production environments.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Data Frameworks<\/strong>\n<ul>\n<li>Strong knowledge of <strong>Spark DataFrames and Datasets<\/strong>.<\/li>\n<\/ul>\n<\/li>\n<li><strong>ETL Pipeline Development<\/strong>\n<ul>\n<li>Proven experience in building scalable and reliable ETL pipelines for both <strong>batch and real-time<\/strong> data processing.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Database Modeling &amp; Data Warehousing<\/strong>\n<ul>\n<li>Expertise in designing scalable data models for <strong>OLAP and OLTP<\/strong> 
systems.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Data Analysis &amp; Insights<\/strong>\n<ul>\n<li>Ability to perform complex data analysis and extract actionable business insights.<\/li>\n<li>Strong analytical and problem-solving skills with a data-driven mindset.<\/li>\n<\/ul>\n<\/li>\n<li><strong>CI\/CD &amp; Automation<\/strong>\n<ul>\n<li>Basic to intermediate experience with <strong>CI\/CD pipelines<\/strong> using <strong>Jenkins<\/strong> or similar tools.<\/li>\n<li>Familiarity with automated testing and deployment workflows.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<h3>Good-to-Have (Preferred) Skills<\/h3>\n<ul>\n<li>Knowledge of <strong>Java<\/strong> for data processing applications.<\/li>\n<li>Experience with <strong>NoSQL databases<\/strong> (e.g., DynamoDB, Cassandra, MongoDB).<\/li>\n<li>Familiarity with <strong>data governance frameworks<\/strong> and compliance tooling.<\/li>\n<li>Experience with monitoring and observability tools such as <strong>AWS CloudWatch, Splunk, or Dynatrace<\/strong>.<\/li>\n<li>Exposure to cost optimization strategies for large-scale cloud data platforms.<\/li>\n<\/ul>\n<p><strong>Skills:<\/strong> big data, scala spark, apache spark, etl pipeline development<\/p>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n<\/div>\n","protected":false},"author":1,"featured_media":0,"template":"","meta":{"_job_location":"Hyderabad, 
India","_application":"hrm@cephasconsult.biz","_company_name":"","_company_website":"","_company_tagline":"","_company_twitter":"","_company_video":"","_filled":0,"_featured":0,"_remote_position":0,"_job_salary":"","_job_salary_currency":"","_job_salary_unit":""},"job-types":[],"class_list":["post-5559","job_listing","type-job_listing","status-expired","hentry"],"aioseo_notices":[],"_links":{"self":[{"href":"https:\/\/cephasconsult.biz\/index.php\/wp-json\/wp\/v2\/job-listings\/5559","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/cephasconsult.biz\/index.php\/wp-json\/wp\/v2\/job-listings"}],"about":[{"href":"https:\/\/cephasconsult.biz\/index.php\/wp-json\/wp\/v2\/types\/job_listing"}],"author":[{"embeddable":true,"href":"https:\/\/cephasconsult.biz\/index.php\/wp-json\/wp\/v2\/users\/1"}],"wp:attachment":[{"href":"https:\/\/cephasconsult.biz\/index.php\/wp-json\/wp\/v2\/media?parent=5559"}],"wp:term":[{"taxonomy":"job_listing_type","embeddable":true,"href":"https:\/\/cephasconsult.biz\/index.php\/wp-json\/wp\/v2\/job-types?post=5559"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}