
Reduce the amount of llvm IR instantiated #205

Merged (33 commits, Jan 26, 2021)

Conversation

@Marwes Marwes (Contributor) commented Sep 29, 2020

This improves the generated code in a similar way to #204; it is entirely orthogonal to that PR, but also far more invasive.

The main change here is the introduction of RawTableInner, which contains all the fields that RawTable has but without being parameterized by T. Methods have been moved to this new type, in part or in their entirety, and RawTable forwards to them.
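
To illustrate the shape of the change, here is a minimal sketch of the pattern (field and method names are simplified, not the actual hashbrown definitions):

// Non-generic part: instantiated once and shared by every RawTable<T>.
struct RawTableInner {
    bucket_mask: usize,
    ctrl: *mut u8,
    growth_left: usize,
    items: usize,
}

impl RawTableInner {
    fn len(&self) -> usize {
        self.items
    }
}

// Generic wrapper: only this thin forwarding layer is monomorphized per T.
struct RawTable<T> {
    table: RawTableInner,
    marker: core::marker::PhantomData<T>,
}

impl<T> RawTable<T> {
    fn len(&self) -> usize {
        self.table.len()
    }
}

All the T-independent bookkeeping lives in the non-generic inner type and is compiled once, while the generic wrapper only contains thin forwarding methods and the code that actually needs T.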

For this small test case with 4 different maps, the number of LLVM IR lines generated drops by about 17% (18088 / 21873 ≈ 0.827).

fn main() {
    let mut map1 = hashbrown::HashMap::new();
    map1.insert(1u8, "");
    map1.reserve(1000);
    let mut map2 = hashbrown::HashMap::new();
    map2.insert(1i16, "");
    map2.reserve(1000);
    let mut map3 = hashbrown::HashMap::new();
    map3.insert(3u16, "");
    map3.reserve(1000);
    let mut map4 = hashbrown::HashMap::new();
    map4.insert(3u64, "");
    map4.reserve(1000);
    dbg!((
        map1.iter().next(),
        map2.iter().next(),
        map3.iter().next(),
        map4.iter().next()
    ));
}

The commits are almost entirely independent of each other (except the first, which does the main refactoring to support the rest), so if some are not desired they can be removed. If it helps, this PR could also be split up into multiple PRs.

For most commits I don't expect any performance degradation (unless LLVM stops inlining some function because it is now called from more places), but there are a couple of commits that do slow things down; these should only affect cold parts of the code (for instance, panic handling).

@Marwes Marwes changed the title from "Smaller ir" to "Reduce the amount of llvm IR instantiated" on Sep 29, 2020
@Amanieu Amanieu (Member) left a comment

Could you try running the benchmarks (cargo bench) to see if any of the changes have a performance impact?

src/raw/mod.rs Outdated (resolved)
src/raw/mod.rs Outdated (resolved)
src/raw/mod.rs Outdated
}
}

impl RawTableInner {
All methods here should be #[inline] since they are non-generic.
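
For context, a hedged illustration of the point, continuing the simplified sketch from the PR description above (the method name is made up): these methods are non-generic and compiled once in hashbrown, but they are called from RawTable<T> methods that are monomorphized in downstream crates, so without #[inline] (or LTO) those callers generally cannot inline them.

impl RawTableInner {
    // Non-generic, but invoked from generic RawTable<T> code that is
    // instantiated in other crates; #[inline] makes the body available
    // for cross-crate inlining without requiring LTO.
    #[inline]
    fn has_growth_left(&self) -> bool {
        self.growth_left != 0
    }
}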

@Marwes Marwes (Contributor, Author) commented Oct 2, 2020

I didn't bother to inline the private functions since I thought it would be redundant within the same compilation unit(?), but I guess they are called from generic functions, which causes them to be used from another compilation unit.

That seemed to be enough to restore performance though.

 name                         old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
 clone_from_large             1,272        1,325                  53    4.17%   x 0.96
 clone_from_small             17           17                      0    0.00%   x 1.00
 clone_large                  1,371        1,362                  -9   -0.66%   x 1.01
 clone_small                  32           32                      0    0.00%   x 1.00
 insert_ahash_highbits        8,753        8,957                 204    2.33%   x 0.98
 insert_ahash_random          8,982        8,872                -110   -1.22%   x 1.01
 insert_ahash_serial          8,462        8,141                -321   -3.79%   x 1.04
 insert_erase_ahash_highbits  20,480       20,212               -268   -1.31%   x 1.01
 insert_erase_ahash_random    19,826       19,715               -111   -0.56%   x 1.01
 insert_erase_ahash_serial    19,543       19,168               -375   -1.92%   x 1.02
 insert_erase_std_highbits    38,157       38,102                -55   -0.14%   x 1.00
 insert_erase_std_random      38,685       38,385               -300   -0.78%   x 1.01
 insert_erase_std_serial      31,856       32,867              1,011    3.17%   x 0.97
 insert_std_highbits          17,661       17,811                150    0.85%   x 0.99
 insert_std_random            17,952       17,970                 18    0.10%   x 1.00
 insert_std_serial            15,414       15,504                 90    0.58%   x 0.99
 iter_ahash_highbits          1,153        1,201                  48    4.16%   x 0.96
 iter_ahash_random            1,060        1,203                 143   13.49%   x 0.88
 iter_ahash_serial            1,076        1,141                  65    6.04%   x 0.94
 iter_std_highbits            1,112        1,087                 -25   -2.25%   x 1.02
 iter_std_random              1,239        1,111                -128  -10.33%   x 1.12
 iter_std_serial              1,171        1,117                 -54   -4.61%   x 1.05
 lookup_ahash_highbits        5,998        5,831                -167   -2.78%   x 1.03
 lookup_ahash_random          5,611        5,674                  63    1.12%   x 0.99
 lookup_ahash_serial          5,405        5,393                 -12   -0.22%   x 1.00
 lookup_fail_ahash_highbits   5,407        5,167                -240   -4.44%   x 1.05
 lookup_fail_ahash_random     5,182        5,480                 298    5.75%   x 0.95
 lookup_fail_ahash_serial     5,014        5,475                 461    9.19%   x 0.92
 lookup_fail_std_highbits     14,310       14,504                194    1.36%   x 0.99
 lookup_fail_std_random       14,117       14,548                431    3.05%   x 0.97
 lookup_fail_std_serial       12,181       12,272                 91    0.75%   x 0.99
 lookup_std_highbits          14,178       14,635                457    3.22%   x 0.97
 lookup_std_random            14,449       14,862                413    2.86%   x 0.97
 lookup_std_serial            12,059       12,254                195    1.62%   x 0.98

@Amanieu Amanieu (Member) commented Oct 4, 2020

The changes mostly look good. However, they are quite invasive, so I would feel more comfortable if we could test the impact of this change on compilation performance.

Could you try creating a PR in rust-lang/rust which switches libstd to use your branch of hashbrown so that we can do a perf run?

@bors bors (Collaborator) commented Oct 15, 2020

☔ The latest upstream changes (presumably #208) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

This was referenced Mar 15, 2021
bors pushed a commit to rust-lang-ci/rust that referenced this pull request Mar 16, 2021
Includes rust-lang/hashbrown#204 and rust-lang/hashbrown#205 (not yet merged) which both serve to reduce the amount of IR generated for hashmaps.

Inspired by the llvm-lines data gathered in rust-lang#76680
bors added a commit to rust-lang-ci/rust that referenced this pull request Mar 18, 2021
feat: Update hashbrown to instantiate less llvm IR

Includes rust-lang/hashbrown#204 and rust-lang/hashbrown#205 (not yet merged) which both serve to reduce the amount of IR generated for hashmaps.

Inspired by the llvm-lines data gathered in rust-lang#76680 (cc `@Julian-Wollersberger`)