Qualifications:
· Experience: Minimum of five (5) years of professional experience in a Data Engineering or similar role building data platforms.
· Python: Expert proficiency in Python, including advanced knowledge of common data processing libraries and engineering best practices.
· Big Data Processing: Demonstrated experience with large-scale data processing using PySpark.
· Database: Extensive, practical experience managing and tuning large-scale ClickHouse clusters for query performance, data retention, and stability.
· Streaming: Proven ability to operate, scale, and maintain production Kafka clusters, including in-depth knowledge of Kafka Connect, Kafka Streams, and performance metrics.
· Data Modeling: Strong knowledge of SQL, data modeling, and techniques for optimizing queries within analytical (OLAP) databases.
· Monitoring: Experience using monitoring tools (e.g., Prometheus, Grafana) to evaluate the health and performance of data infrastructure.
Duties:
· Pipeline Development: Design, build, and maintain robust, scalable, and reliable data pipelines.
· ClickHouse Optimization: Serve as the primary expert for the ClickHouse environment. Responsibilities include system monitoring, performance tuning, data schema design, and scaling.
· Kafka Management: Manage and optimize our high-throughput Kafka cluster. This involves topic management, consumer group balancing, broker configuration, and ensuring reliable data delivery and persistence.
· Data Quality: Implement robust procedures to monitor, validate, and guarantee data quality across all streaming and batch processes.
· Collaboration: Work closely with data scientists and data analysts to define data requirements and develop efficient technical solutions.
· System Operations: Troubleshoot and resolve complex issues within the production environment. This includes identifying and eliminating performance bottlenecks and managing resource utilization across the data platform.