Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix race condition in LRU list update in get_cached_relsize #8807

Merged
merged 1 commit into from
Aug 22, 2024

Conversation

knizhnik
Copy link
Contributor

@knizhnik knizhnik commented Aug 22, 2024

Problem

See https://neondb.slack.com/archives/C07J14D8GTX/p1724347552023709
Manipulations with LRU list in relation size cache are performed under shared lock

Summary of changes

Take exclusive lock

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@knizhnik knizhnik requested a review from hlinnaka August 22, 2024 18:52
@knizhnik knizhnik requested review from a team as code owners August 22, 2024 18:52
@knizhnik knizhnik requested a review from arpad-m August 22, 2024 18:52
Copy link

3801 tests run: 3695 passed, 0 failed, 106 skipped (full report)


Code coverage* (full report)

  • functions: 32.4% (7241 of 22342 functions)
  • lines: 50.4% (58544 of 116127 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
056174e at 2024-08-22T19:46:55.556Z :recycle:

@knizhnik knizhnik merged commit 7a485b5 into main Aug 22, 2024
69 checks passed
@knizhnik knizhnik deleted the relsize_cache_lru_rc_fix branch August 22, 2024 20:53
hlinnaka added a commit that referenced this pull request Aug 23, 2024
I wrote this as a regression test for the bug fixed in #8807, but seems
like a useful test in general.

Without the fix from PR #8807, this readily fails with an assertion
failure or other errors.

XXX: This is failing on 'main', even with the fix from #8807, just not
as quickly. This uses a small neon.relsize_hash_size=100 setting; when
I bump it up to 1000 or more, it doesn't fail anymore. But this
suggests that it's still possible to "overwhelm" the relsize cache
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants