I was manually testing the thread safety of reloading a Berkeley environment and I found that the three databases I had set up were not in the same Berkeley environment despite all being in the same directory.

With LLDB I discovered why, the strPath member variable had a trailing separator in one case but not in the other:

This happened because the walletdir in my bitcoin.conf had a trailing separator in it, but if you call loadwallet or don’t specify walletdir the strPath will not have a trailing separator. Having two Berkeley database environments open in the same directory can cause data corruption.

I found plausible solutions on Stack Overflow and the question then was where to place the modification. The strPath private member is assigned by BerkeleyEnvironment’s initializer list.

My first thought was to move strPath assignment into the BerkeleyEnvironment constructor body, so as to remove the trailing separator there. However, this approach was flawed because BerkeleyEnvironment objects are only created by the GetWalletEnv function if the env_directory in that function does not already exist as a key in the g_dbenvs map. This means that if the trailing separator removal is in the class constructor, the key in the g_dbenvs global object could still be wrong and we could still have two environments in the same directory.

I iterated on the solution in a xeus-cling Jupyter notebook:

#include <iostream>
#include <string>
#pragma cling load("libboost_filesystem")
#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;
using namespace std;
class BerkeleyEnvironment
{
private:
    std::string strPath;
public:
    BerkeleyEnvironment(const fs::path& dir_path) : strPath(dir_path.string()) {};
    fs::path Directory() const { return strPath; }
}
std::map<std::string, BerkeleyEnvironment> g_dbenvs;
BerkeleyEnvironment* GetWalletEnv(const fs::path& wallet_path, std::string& database_filename)
{
    fs::path env_directory;
    if (fs::is_regular_file(wallet_path)) {
        env_directory = wallet_path.parent_path();
        database_filename = wallet_path.filename().string();
    } else {
        env_directory = wallet_path;
        database_filename = "wallet.dat";
    }
    // The fix 👇
    while ((env_directory.string().back() == '/') || (env_directory.string().back() == '\\'))
        env_directory = env_directory.remove_trailing_separator();
    return &g_dbenvs.emplace(std::piecewise_construct, std::forward_as_tuple(env_directory.string()), std::forward_as_tuple(env_directory)).first->second;
}
std::vector<fs::path> wallet_test_paths;
fs::path expected_path = "test/path";
std::string trailing_separators;
for (int i = 0; i < 4; ++i ) {
    trailing_separators += '/';
    wallet_test_paths.push_back(expected_path / trailing_separators);
}
for (auto& wallet_test_path: wallet_test_paths)
{
    std::string database_filename;
    BerkeleyEnvironment* env = GetWalletEnv(wallet_test_path, database_filename);
    cout << env->Directory().string() << " ← " << wallet_test_path << "\n";
}

test/path ← “test/path/”

test/path ← “test/path//”

test/path ← “test/path///”

test/path ← “test/path////”

g_dbenvs

{ “test/path” => @0x7f996d673708 }

Satisfied with this solution, I implemented it in the Bitcoin codebase, added a Boost unit test, and submitted my pull request for review.

Code Review Round 1

In their comments ken2812221 and promag pointed out that the boost filesystem library has fs::detail::is_directory_separator and fs::path::preferred_separator respectively, which I can use to avoid hard coding the separator characters and, as ken2812221 points out, avoid “an infinity loop on Unix if the wallet filename is wallet\”. I took a look at how they are implemented in the boost filesystem library here and here to make sure there were no surprises or caveats.

In his review promag pointed out that the trailing separator removal would be better in walletutil.cpp’s GetWalletDir function. This seemed like a much better placement than in GetWalletEnv. I verified that all of the GetWalletEnv calls ultimately get their path from the GetWalletDir function and that the -walletdir configuration argument is only retrieved for use by the GetWalletDir function. I moved the logic and rewrote the unit test. It was a good lesson: don’t just look at where you found the bug, trace it back to a root cause and fix that.

I made these changes in my Jupyter notebook and the behavior was the same:

#include <iostream>
#include <string>
#pragma cling load("libboost_filesystem")
#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;
using namespace std;
std::string walletdir_arg = "-walletdir";
class ArgsManager
{
protected:
    std::map<std::string, std::string> m_override_args;
public:
    void ForceSetArg(std::string& strArg, const std::string& strValue)
    {
        m_override_args[strArg] = strValue;
    }
    bool IsArgSet(std::string& strArg)
    {
        std::map<std::string, std::string>::const_iterator it = m_override_args.find(strArg);
        return it!=m_override_args.end();
    }
    std::string GetArg(std::string& strArg)
    {
        return m_override_args[strArg];
    }
}
ArgsManager gArgs;
fs::path GetWalletDir()
{
    fs::path path;
    if (gArgs.IsArgSet(walletdir_arg)) {
        path = gArgs.GetArg(walletdir_arg);
        if (!fs::is_directory(path)) {
            // If the path specified doesn't exist, we return the deliberately
            // invalid empty string.
            path = "";
        }
    }
    // Ensure that the directory does not end with a trailing separator to avoid
    // creating two Berkeley environments in the same directory
    while (fs::detail::is_directory_separator(path.string().back())) {
        path.remove_trailing_separator();
    }
    return path;
}
fs::path expected_wallet_dir = "test/path";
if (fs::create_directories("test")) {
    // This is the first run, create wallets subdirectory too
    fs::create_directories("test/path");
}
std::string trailing_separators;
for (int i = 0; i < 4; ++i) {
    trailing_separators += fs::path::preferred_separator;
    std::string database_filename;
    std::string wallet_dir_trailing = (expected_wallet_dir / trailing_separators).string();
    gArgs.ForceSetArg(walletdir_arg, wallet_dir_trailing);
    cout << GetWalletDir().string() << " ← " << wallet_dir_trailing << "\n";
}

test/path ← test/path/

test/path ← test/path//

test/path ← test/path///

test/path ← test/path////

Code Review Round 2

In his second round of comments promag pointed out that there is a fs::canonical function that clean up the path beyond just trailing separators. After some testing I agreed it was a better solution. I also searched the codebase again to identify a the best spot to insert the modification and found the WalletInit::Verify function. I realized that if I modified the -walletdir argument there then it wouldn’t need to be cleaned up with every call to GetWalletDir. I also added unit tests to make sure that existing Verify behavior was preserved.

#include <iostream>
#include <string>
#pragma cling load("libboost_filesystem")
#include <boost/filesystem.hpp>
namespace fs = boost::filesystem;
using namespace std;
std::string arg_key = "-walletdir";
class ArgsManager
{
protected:
    std::map<std::string, std::string> m_override_args;
public:
    void ForceSetArg(std::string& strArg, const std::string& strValue)
    {
        m_override_args[strArg] = strValue;
    }
    bool IsArgSet(std::string& strArg)
    {
        std::map<std::string, std::string>::const_iterator it = m_override_args.find(strArg);
        return it!=m_override_args.end();
    }
    std::string GetArg(std::string& strArg)
    {
        return m_override_args[strArg];
    }
}
ArgsManager gArgs;
bool Verify()
{
    if (gArgs.IsArgSet(arg_key)) {
        fs::path wallet_dir = gArgs.GetArg(arg_key);
        boost::system::error_code error;
        // The canonical path cleans the path, preventing >1 Berkeley environment instances for the same directory
        fs::path canonical_wallet_dir = fs::canonical(wallet_dir, error);
        if (error || !fs::exists(wallet_dir)) {
            return false;
        } else if (!fs::is_directory(wallet_dir)) {
            return false;
        // The canonical path transforms relative paths into absolute ones, so we check the non-canonical version
        } else if (!wallet_dir.is_absolute()) {
            return false;
        }
        gArgs.ForceSetArg(arg_key, canonical_wallet_dir.string());
    }
    return true;
}
struct TestCase
{
    std::string label;
    fs::path input_path;
    bool expected_result;
}
std::vector<TestCase> test_cases;

std::string sep;
sep += fs::path::preferred_separator;

fs::create_directories("tempdir");
fs::path m_cwd = fs::current_path();
fs::path m_datadir = m_cwd / "tempdir";
fs::current_path(m_datadir);

test_cases.push_back(TestCase {"default", m_datadir / "wallets", true});
test_cases.push_back(TestCase {"custom", m_datadir / "my_wallets", true});
test_cases.push_back(TestCase {"trailing", m_datadir / "wallets" / sep, true});
test_cases.push_back(TestCase {"trailing2", m_datadir / "wallets" / sep / sep, true});

test_cases.push_back(TestCase {"nonexistent", m_datadir / "path_does_not_exist", false});
test_cases.push_back(TestCase {"file", m_datadir / "not_a_directory.dat", false});
test_cases.push_back(TestCase {"relative", "wallets", false});

fs::create_directories(m_datadir / "wallets");
fs::create_directories(m_datadir / "my_wallets");
std::ofstream f((m_datadir / "not_a_directory.dat").BOOST_FILESYSTEM_C_STR);
f.close();
void SetWalletDir(const fs::path& walletdir_path)
{
    gArgs.ForceSetArg(arg_key, walletdir_path.string());
}
for (auto test_case: test_cases)
{
    std::cout << test_case.label << std::endl;
    SetWalletDir(test_case.input_path);
    bool result = Verify();
    if (result) {
        fs::path walletdir = gArgs.GetArg(arg_key);
        fs::path expected_path = fs::canonical(test_case.input_path);
        fs::path wallet_dir = gArgs.GetArg(arg_key);
        cout << wallet_dir.string() << " ← " << test_case.input_path.string() << std::endl << std::endl;
    }
}

default

/Users/pierre/src/cling-notebooks/bitcoin/tempdir/wallets ← /Users/pierre/src/cling-notebooks/bitcoin/tempdir/wallets

custom

/Users/pierre/src/cling-notebooks/bitcoin/tempdir/my_wallets ← /Users/pierre/src/cling-notebooks/bitcoin/tempdir/my_wallets

trailing

/Users/pierre/src/cling-notebooks/bitcoin/tempdir/wallets ← /Users/pierre/src/cling-notebooks/bitcoin/tempdir/wallets/

trailing2

/Users/pierre/src/cling-notebooks/bitcoin/tempdir/wallets ← /Users/pierre/src/cling-notebooks/bitcoin/tempdir/wallets//

nonexistent

file

relative

fs::current_path(m_cwd);    
fs::remove_all(m_datadir);