A trick to save complex data structures into Eigen Vectors in C++

At some occasion, codes are designed for specific purposes. But years later, when we want to add functionalities to our code, we may realize that the initial design choices were not flexible enough. Then you must think of how to accommodate with that design choice in order to be able to incorporate the new capabilities you had in mind.

This is exactly what happened to me recently: I wanted to add the possibility of handling bi-dimensional grids into my MCMC code that would be suitable for a bi-cubic interpolation of a function that is otherwise costly to compute (double integral of spherical harmonic functions). And in a MCMC, it is extremely important to keep the performance as high as possible. Especially that my use-case involves a MCMC sampling in typically 50 – 100 dimensions. You can by the way see the code at https://github.com/OthmanB/TAMCMC-C.

In my code and to briefly summarise, parameters are passed from a function in charge of initiallising the MCMC setup (in function of the user choice and of the analysed data), to another function that computes the model chosen by the user and on which the MCMC will be looping. However, to keep the code efficient, the multiple data interpolator functions (there is multiple interpolation to be made in my model) required by the new model I want to implement should be initiallised outside the loop and then passed to the model function. Unit tests that I performed separately showed that loading the grid in memory and intialising the interpolator within the model function induce a speed decrease of a factor ~50 in my typical case.

I use the GSL library to create the interpolators, and I decided to contain these into a structure along with other constants required for the GSL interpolating function. This to keep things well organised and contained.

However, my code is made of several modules and it is designed to be able to easily switch between models in order to test different scenarios. But this also means that I cannot easily add a new variable to the model function: New parameters have to be contained into the Eigen::VectorXd that host all of the free or fixed parameters of the model. It would require a refactoring of many parts of the code if not doing so. Obviously an Eigen::VectorXd seems not the perfect way to store the pointers to the GSL interpolators and the grids themselves. Or isn’t it? Well, thanks to a short discussion with ChatGPT (actually the plugin CodeGPT in VSCode), I very quickly got a nice solution that fits my purpose. CodeGPT gave me an answer with several bugs, but it was good enough so that I build upon it. GPT 3.5 is used here (OpenAI charges you very little for GPT3.5, but much more for GPT4) and it is far from perfect, but is extremely good to quickly point towards the correct direction: For me, it completely replaced stackoverflow.com. I will give you just a test case, which is not the final way I am coding this into my own MCMC analysis program. But it will give you a sense on how to do it for your application, if you need so.

Here is a way to store as a double within a VectorXd, a pointer to a data structure.

#include <Eigen/Dense>
#include <iostream>
struct MyStruct {
    int num;
    double val;
};


int main() {
    MyStruct st;
    st.num=1;
    st.val=3.14;
    MyStruct *ptr_st = &st;

    Eigen::VectorXd params(4);

    // Store the pointer values as doubles in the VectorXd object
    params[0]=static_cast<double>(reinterpret_cast<std::uintptr_t>(ptr_st));
    params[1]=11;
    params[2]=22;
    params[3]=325.23234234;

    // Cast the pointer values back to pointers and access the values
    MyStruct st_new = *reinterpret_cast<MyStruct *>(static_cast<std::uintptr_t>(params(0)));

    std::cout << params[1] << std::endl;
    std::cout << params[2] << std::endl;
    std::cout << params[3] << std::endl;
    std::cout << " --- " << std::endl;
    std::cout << "By pointer to struc.num =" << st_new.num << std::endl;
    std::cout << "By pointer to struc.val =" << st_new.val << std::endl;
    
    return 0;
}

You can test this by compiling:

g++ -o test main.cpp -I$EIGEN3_INCLUDE_DIR

Where $EIGEN3_INCLUDE_DIR is the path to you eigen main directory.

Edit on 21 April 2023: Note that this solution did not work eventually work in my code. The tests worked, but it effective implementation fails. The reason is unclear, but I suspect that the many data manipulations I make are not suitable for this ‘trick’. So it may work for you, but just be careful to test all of your code rigorously. I eventually picked another solution which consisted on loading the data required for the GSL interpolation into a new structure within the class that handle the models. It is then available to all models, which is in fact a better solution on my case.

Leave a Reply

Your email address will not be published. Required fields are marked *