The ORC team is excited to announce the release of ORC v2.1.0.

New Feature

  • ORC-262 [C++] Support async prefetch in ORC reader
  • ORC-1388 [C++] Support schema evolution from decimal to timestamp/string group
  • ORC-1389 [C++] Support schema evolution from string group to numeric/string group
  • ORC-1390 [C++] Support schema evolution from string group to decimal/timestamp
  • ORC-1622 [C++] Support conan packaging
  • ORC-1807 [C++] Native support for vcpkg

Improvement

  • ORC-1264 [C++] Add a writer option to align compression block with row group boundary
  • ORC-1365 [C++] Use BlockBuffer to replace DataBuffer of rawInputBuffer in the CompressionStream
  • ORC-1635 Try downloading orc-format from dlcdn.apache.org before archive.apache.org
  • ORC-1645 Evaluate stripe stats before loading stripe footer
  • ORC-1658 [C++] Unify identifier naming style
  • ORC-1661 [C++] Better handling when TZDB is unavailable
  • ORC-1664 Enable the removeUnusedImports function in spotless-maven-plugin
  • ORC-1665 Enable the importOrder function in spotless-maven-plugin
  • ORC-1667 Add check tool to check the index of the specified column
  • ORC-1669 [C++] Deprecate HDFS support
  • ORC-1672 Modify the package name of TestCheckTool
  • ORC-1675 [C++] Print decimal values as strings
  • ORC-1677 [C++] Remove m prefix of variables
  • ORC-1683 Fix instanceof of BinaryStatisticsImpl merge method
  • ORC-1684 [C++] Find tzdb without TZDIR when in conda-environments
  • ORC-1685 Use Pattern Matching for instanceof in RecordReaderImpl
  • ORC-1686 [C++] Avoid using std::filesystem
  • ORC-1687 [C++] Enforce naming style
  • ORC-1688 [C++] Do not access TZDB if there is no timestamp type
  • ORC-1689 [C++] Generate CMake config file
  • ORC-1690 [C++] Refactor CMake to use imported third-party libraries
  • ORC-1710 Reduce enum array allocation
  • ORC-1711 [C++] Introduce a memory block size parameter for writer option
  • ORC-1720 [C++] Unified compressor/decompressor exception types
  • ORC-1724 JsonFileDump utility should print user metadata
  • ORC-1730 [C++] Add finishEncode support for the encoder
  • ORC-1732 [C++] Can’t detect Protobuf installed by Homebrew on macOS
  • ORC-1733 [C++] [CMake] Fix CMAKE_MODULE_PATH not to use PROJECT_SOURCE_DIR
  • ORC-1751 [C++] Syntax error in ThirdpartyToolchain
  • ORC-1767 [C++] Improve writing performance of encoded string column and support EncodedStringVectorBatch for StringColumnWriter
  • ORC-1796 [C++] Reading an ORC file that lacks statistics may give wrong results
  • ORC-1810 Offline build support

Bug Fix

  • ORC-1654 [C++] Count up EvaluatedRowGroupCount correctly
  • ORC-1657 Fix building Apache ORC with clang-cl on Windows
  • ORC-1706 [C++] Fix build break w/ BUILD_CPP_ENABLE_METRICS=ON
  • ORC-1725 [C++] Statistics for BYTE type are calculated incorrectly on ARM
  • ORC-1738 Wrong Int128 maximum value
  • ORC-1811 Use the recommended closer.lua URL to download ORC format
  • ORC-1813 Incompatibility with ORC files written in version 0.12 due to missing hasNull field in C++ Reader

Task

  • ORC-1573 Setting version to 2.1.0-SNAPSHOT
  • ORC-1594 Add IntelliJ conf in the project root directory to support JIRA/PR autolinks
  • ORC-1649 [C++] [Conan] Add 2.0.0 to conan recipe and update release guide
  • ORC-1655 Add label definition to conan directory
  • ORC-1656 Skip build and test on conan updates
  • ORC-1666 Remove extra newlines at the end of Java files
  • ORC-1758 Use OpenContainers Annotations in docker images
  • ORC-1802 Enable tag protection

Test

  • ORC-1589 Bump spotbugs-maven-plugin to 4.8.3.0
  • ORC-1590 Bump spotless-maven-plugin to 2.42.0
  • ORC-1603 Bump checkstyle to 10.13.0
  • ORC-1606 Upgrade spotless-maven-plugin to 2.43.0
  • ORC-1611 Bump junit to 5.10.2
  • ORC-1651 Bump checkstyle to 10.14.0
  • ORC-1652 Bump extra-enforcer-rules to 1.8.0
  • ORC-1653 Bump maven-assembly-plugin to 3.7.0
  • ORC-1659 Bump guava to 33.1.0-jre
  • ORC-1660 Bump checkstyle to 10.14.2
  • ORC-1673 Remove test packages o.a.o.tools.[count|merge|sizes]
  • ORC-1676 Use Hive 4.0.0 in benchmark
  • ORC-1678 Bump checkstyle to 10.15.0
  • ORC-1680 Bump bcpkix-jdk18on to 1.78
  • ORC-1691 Bump spotbugs-maven-plugin to 4.8.4.0
  • ORC-1694 Upgrade gson to 2.9.0 for Benchmarks Hive
  • ORC-1695 Upgrade gson to 2.10.1
  • ORC-1699 Fix SparkBenchmark in Parquet format according to SPARK-40918
  • ORC-1700 Write parquet decimal type data in Benchmark using FIXED_LEN_BYTE_ARRAY type
  • ORC-1704 Migration to Scala 2.13 of Apache Spark 3.5.1 at SparkBenchmark
  • ORC-1707 Fix sun.util.calendar IllegalAccessException when SparkBenchmark runs on JDK17
  • ORC-1708 Support data/compress options in Hive benchmark
  • ORC-1709 Upgrade GitHub Action setup-java to v4 and use built-in cache feature
  • ORC-1713 Bump spotbugs-maven-plugin to 4.8.5.0
  • ORC-1716 Bump com.puppycrawl.tools:checkstyle to 10.16.0
  • ORC-1719 Bump guava to 33.2.0-jre
  • ORC-1722 Bump checkstyle to 10.17.0
  • ORC-1726 Bump guava to 33.2.1-jre
  • ORC-1727 Bump maven-enforcer-plugin to 3.5.0
  • ORC-1728 Bump maven-shade-plugin to 3.6.0
  • ORC-1729 Bump maven-checkstyle-plugin to 3.4.0
  • ORC-1731 Upgrade maven-dependency-plugin to 3.7.0
  • ORC-1735 Upgrade maven-dependency-plugin to 3.7.1
  • ORC-1736 Bump junit to 5.10.3
  • ORC-1737 Bump spotbugs-maven-plugin to 4.8.6.1
  • ORC-1739 Bump spotbugs-maven-plugin to 4.8.6.2
  • ORC-1745 Remove Ubuntu 20.04 Support
  • ORC-1750 Bump protobuf-java to 3.25.4
  • ORC-1756 Bump snappy-java to 1.1.10.6 in bench module
  • ORC-1760 Upgrade junit to 5.11.0
  • ORC-1761 Upgrade guava to 33.3.0-jre
  • ORC-1763 Upgrade checkstyle to 10.18.0
  • ORC-1764 Upgrade maven-checkstyle-plugin to 3.5.0
  • ORC-1765 Upgrade maven-dependency-plugin to 3.8.0
  • ORC-1771 Upgrade checkstyle to 10.18.1
  • ORC-1772 Bump spotbugs-maven-plugin to 4.8.6.3
  • ORC-1774 Upgrade snappy-java to 1.1.10.7 in bench module
  • ORC-1776 Remove macOS 12 from GitHub Action CI and docs
  • ORC-1778 Upgrade Spark to 4.0.0-preview2
  • ORC-1779 Upgrade extra-enforcer-rules to 1.9.0
  • ORC-1780 Upgrade spotbugs-maven-plugin to 4.8.6.4
  • ORC-1783 Add macOS 15 to GitHub Action macOS CI and docs
  • ORC-1786 Upgrade guava to 33.3.1-jre
  • ORC-1788 Upgrade checkstyle to 10.18.2
  • ORC-1789 Upgrade junit to 5.11.2
  • ORC-1790 Upgrade parquet to 1.14.3 in bench module
  • ORC-1794 Upgrade checkstyle to 10.19.0
  • ORC-1795 Upgrade junit to 5.11.3
  • ORC-1797 Upgrade spotbugs-maven-plugin to 4.8.6.5
  • ORC-1799 Upgrade maven-checkstyle-plugin to 3.6.0
  • ORC-1801 Upgrade checkstyle to 10.20.0
  • ORC-1804 Upgrade parquet to 1.14.4 in bench module
  • ORC-1805 Upgrade checkstyle to 10.20.1
  • ORC-1806 Upgrade spotbugs-maven-plugin to 4.8.6.6
  • ORC-1809 Upgrade checkstyle to 10.20.2
  • ORC-1812 Upgrade parquet to 1.15.0 in bench module
  • ORC-1816 Upgrade checkstyle to 10.21.0
  • ORC-1820 Bump junit.version to 5.11.4
  • ORC-1821 Upgrade guava to 33.4.0-jre
  • ORC-1822 [C++] [CI] Use cpp-linter-action for clang-tidy and clang-format
  • ORC-1823 Upgrade checkstyle to 10.21.1
  • ORC-1826 [C++] Add ASAN to CI

Documentation

  • ORC-642 Update PatchedBase doc with patch ceiling in spec
  • ORC-1634 Fix some outdated descriptions in Building ORC documentation
  • ORC-1668 Add merge command to Java tools documentation
  • ORC-1800 Upgrade bcpkix-jdk18on to 1.79
  • ORC-1814 Use Ubuntu 24.04/Jekyll 4.3/Rouge 4.5 to generate website
  • ORC-1815 Remove broken people.apache.org links
  • ORC-1819 Publish snapshot website through GitHub Pages
  • ORC-1824 Update Python documentation with PyArrow 18.1.0 and Dask 2024.12.1
  • ORC-1830 Fix release table hyperlink to use baseurl