TsFile: A Standard Format for IoT Time Series Data

The TsFile project has reached version 1.0 as committers work toward making it an independent project within the Apache Software Foundation. TsFile is a columnar storage file format designed for time series data, featuring advanced compression to minimize storage, high read and write throughput, and deep integration with processing and analysis tools such as Apache …

Q&A: Cockroach Labs’ Spencer Kimball on Distributing SQL

When the young programming whiz Spencer Kimball joined Google in 2004, the up-and-coming search company — like everyone else in Silicon Valley — sharded its databases to overcome storage limitations and performance latencies. But it was a stopgap solution at best. “They sharded and that sharding was ugly,” Kimball said. After Kimball had spent about …

Apple Comet Brings Fast Vector Processing to Apache Spark

Consumer electronics giant Apple has released into open source a plug-in that would help Apache Spark execute vector searches more efficiently, making the open source data processing platform more appealing for large-scale machine learning data crunching. The Apple engineers behind the Rust-based plug-in, called Apache Spark DataFusion Comet, have submitted it to become an …
