Ask HN: Are my HPC professors right? Is Python worthless compared to C?

12 points by megaloblasto a day ago

I'm a PhD student implementing a finite element code. It simulates electromagnetic waves passing through heterogeneous material. This code has to run in parallel, and run fast. I've been using old C libraries like PETSc to do this, and honestly, I do not enjoy working with C at all. It's esoteric and difficult to understand, and just overall feels like I'm using a tool from the 70s.

I want to rewrite my simulation in Python. Every single HPC professor I had told me that Python is worthless for HPC and I should use C or C++ (they generally think Rust is interesting but don't recommend it).

I don't understand this way of thinking. My thought is to write it in Python, profile it, and if needed, rewrite the slow parts in C. I can use CuPy to run my code on a GPU, or mpi4py to run it in parallel with MPI. If I get my code working and prove that what I want to do is possible, but still need more performance, then I can write it in C as a last step.
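To make the "write it, profile it, then port the slow parts" plan concrete, here is a minimal sketch using the standard library's cProfile. The functions `assemble_element`, `write_report`, and `run_simulation` are hypothetical stand-ins, not part of any real finite element code; the point is only that the report shows where the time goes before anything is rewritten in C.

```python
import cProfile
import io
import pstats


def assemble_element(n):
    """Hypothetical stand-in for an expensive per-element assembly loop."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += (i * j) % 7
    return total


def write_report(value):
    """Hypothetical stand-in for cheap bookkeeping that is not worth porting."""
    return f"result={value:.1f}"


def run_simulation(n=200):
    """Hypothetical top-level entry point that ties the two pieces together."""
    return write_report(assemble_element(n))


def profile_top_functions(func, limit=5):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    result, report = profile_top_functions(run_simulation)
    print(result)
    print(report)
```

In a report like this, the functions that dominate the cumulative time are the candidates for a C rewrite; everything else can stay in Python.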

What do you think? Should a young PhD student in HPC really be investing all their time in C and not consider Python as a reasonable solution?

warner25 16 hours ago

I'm an old PhD student. I've seen cases where an easy / naive solution written in Python can be orders-of-magnitude slower than a solution written in C, but I suspect that you're right: that a thoughtful / clever use of Python should be perfectly fine. I've also seen that professors don't know everything, and what they do know can be dated.
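As a rough illustration of the gap between a naive and a more thoughtful use of Python, the sketch below computes the same dot product twice: once with an interpreter-level loop and once by handing the work to NumPy's compiled routine. It assumes NumPy is installed; the function names are made up for this example, and the actual timings depend on the machine.

```python
import time

import numpy as np


def dot_naive(a, b):
    """Element-by-element loop executed by the Python interpreter."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def dot_vectorized(a, b):
    """Same reduction, delegated to NumPy's compiled implementation."""
    return float(np.dot(a, b))


def time_call(func, *args):
    """Return (value, elapsed seconds) for a single call."""
    start = time.perf_counter()
    value = func(*args)
    return value, time.perf_counter() - start


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random(1_000_000)
    b = rng.random(1_000_000)
    slow, t_slow = time_call(dot_naive, a, b)
    fast, t_fast = time_call(dot_vectorized, a, b)
    print(f"naive: {t_slow:.4f}s  vectorized: {t_fast:.4f}s")
    print("same answer:", bool(np.isclose(slow, fast)))
```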

More importantly, though, I think a young PhD student should not pick a fight with his advisor and committee members or try to prove them wrong. Generally, do what they suggest and give them credit for it, or at least thank them enthusiastically for the suggestions and don't make a big deal about not following them. You're at their mercy for getting the PhD, and it's subjective, and their opinion of you probably matters at least as much as their opinion of your work. This is one of the things that I learned from ~15 years of professional work under many different bosses before starting my PhD program, and something that I think many young students still need to learn.

bhaney a day ago

> This code has to run in parallel, and run fast

Then you're certainly not going to get away with writing it all in Python, but it's a very common paradigm to write hotter parts of the code in faster languages and then glue everything together in Python. I don't see why that wouldn't work here.

> My thought is to write it in Python, profile it, and if needed, rewrite the slow parts in C

That's a very reasonable and common approach if you aren't already confident in which parts will need the extra performance ahead of time.

> Should a young PhD student in HPC really be investing all their time in C and not consider Python as a reasonable solution?

You should absolutely be using both together, each to their respective strengths. The only thing unreasonable about any of this is the idea of pitting the languages against each other and acting like one needs to win.

GianFabien a day ago

I did my PhD after 20+ years in industry, lots of it programming in C on AIX/HPUX/Solaris systems. I ended up learning Python to complete my research work.

Granted, compared with Python, C is verbose and the edit-compile-debug-run cycle is a drag. However, C as a language is not too bad. It is the libraries and APIs that slow me down. Oftentimes the abstractions are too leaky or a poor fit for what needs to be done.

What works for me is to test and refine algorithms in Python. When it works well, then I use the Python code as pseudo-code and translate to even more optimized C code. It helps to modularize your Python code, so that you only need to port the performance critical portions to C and the rest can remain in Python.
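One way to sketch that kind of modularization: keep the driver in Python and pass the hot kernel in as a function, so a ported version can be swapped in and checked against the readable original. Everything here is illustrative. The Jacobi sweep is just a small example kernel, and `jacobi_sweep_fast` is a hypothetical placeholder for where a C port would plug in (here it is simply a second Python implementation with the same interface).

```python
def jacobi_sweep_reference(u):
    """Readable pure-Python kernel: one Jacobi sweep for the 1D Laplace equation.

    The first and last entries are fixed boundary values.
    """
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = 0.5 * (u[i - 1] + u[i + 1])
    return new


def jacobi_sweep_fast(u):
    """Hypothetical placeholder for a ported kernel (e.g. a C extension).

    Assumes len(u) >= 2. Same inputs and outputs as the reference kernel.
    """
    interior = [0.5 * (left + right) for left, right in zip(u[:-2], u[2:])]
    return [u[0]] + interior + [u[-1]]


def solve(u0, sweeps, kernel=jacobi_sweep_reference):
    """Driver stays in Python; only `kernel` is swapped out."""
    u = list(u0)
    for _ in range(sweeps):
        u = kernel(u)
    return u


def kernels_agree(u0, sweeps, tol=1e-12):
    """Check the replacement kernel against the reference on the same input."""
    ref = solve(u0, sweeps, jacobi_sweep_reference)
    fast = solve(u0, sweeps, jacobi_sweep_fast)
    return all(abs(a - b) <= tol for a, b in zip(ref, fast))


if __name__ == "__main__":
    u0 = [0.0, 0.0, 0.0, 0.0, 1.0]
    print(solve(u0, 500, kernel=jacobi_sweep_fast))
    print("kernels agree:", kernels_agree(u0, 50))
```

Keeping the reference kernel around gives a correctness check for every later port, which is most of the value of treating the Python version as pseudo-code.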

Of course, you still need to learn and gain experience with C. Personally I wouldn't put much faith in Python to C transpilers. For optimal performance, you really need to understand your algorithms and data structuring. These days understanding how caching and locality of code and data impact performance is crucial to writing performant code.

BTW, have you considered using CUDA, etc. for your finite element code? GPGPUs are ideal for that sort of computation. Lots of potential for parallelization.

  • megaloblasto a day ago

    This is a great response. I really appreciate it, thank you. I think this will be my approach going forward. It's so easy to translate what I'm thinking into Python. Once I profile it and figure out what needs to be optimized, I'll write it in C. I realize that gaining experience with C can be helpful, and I am improving. I just really don't enjoy coding in C.

    • badpun 20 hours ago

      The problem with Python is that it can be so slow that you might be waiting for your algorithms to converge for a long time during each run. However, I agree that it is in general useful for sketching.

bjourne 15 hours ago

You're in academia. The goal isn't to write fast code; it is to create (and publish!) knowledge. If it is easier for you to create knowledge in Python than in C, you should use that. In HPC, exact performance on a specific device with a specific runtime is uninteresting. What is interesting is how well the solution scales, and what its bottlenecks and constraints are.

gregjor a day ago

Other commenters gave excellent and actionable answers to your question. I want to quibble about your reaction to C.

> I do not enjoy working with C at all. Its esoteric and difficult to understand, and just overall feels like I'm using a tool from the 70s.

Esoteric means "intended for or likely to be understood by only a small number of people with a specialized knowledge or interest." That does not describe C, a language widely understood and used by a large number of programmers, across application domains and programming interests.

Difficult to understand describes a reaction you have to learning C, not a property of the language. Again a very large number of programmers understand and use C, have for decades, and a huge amount of C code gets written and maintained constantly. The C language includes very few keywords and a simple syntax, and a small standard library compared to Python. People new to C usually trip over memory management, pointers, and the overall philosophy behind C, not learning the language itself.

C does date back to the early 1970s, but so does most hardware and software technology we use today. Newer does not equal better, and C has remained relevant and popular despite its age because it works so well. Toyota introduced the Corolla in the mid-1960s and it remains relevant and widely used today, not to mention influential in the automobile industry. C occupies a similar position, a language that works so well it has staying power and has undergone relatively minor updates over time, unless you count derivative languages that expand on and perhaps improve on C -- C++, Go, Rust, Zig, many others.

Good luck with your project.

codingdave 19 hours ago

Few things are black and white to the point they should be saying a tool is worthless. Nor should we be declaring a binary right/wrong on their take on it. There are nuances and grey areas in all things, so the question is less "Are they right?", and more "What drives them to say such a thing?"

tripplyons a day ago

If you're studying HPC, I would say that it's probably worth learning how to do it the hard way, especially if you will have other projects that require doing so.

However, if you want to use Python, I would consider JAX. It has a nice set of operations based on numpy and scipy and can compile to many backends (GPU, TPU, CPU, etc.) using XLA. The compiler is great at finding ways to optimize routines that you might not think of. Some of the manual parallelism functionality is still considered experimental, but I haven't seen that cause any issues or prevent functionality.

DamonHD a day ago

Python will be much slower than C, but if you follow the path that you suggest and use Python mainly as glue to stick C-based performance critical parts together you could be fine.

I used to edit an HPC trade rag, and I've written a lot of performance-critical code in C and C++, and even in Java, e.g. for high-speed trading.

As a now-old fresh PhD student I think that your profs are probably wrong!

sn9 15 hours ago

Using Python for glue code and compiled native code (whether C or C++ or Rust or whatever) for the performance-critical parts is a classic strategy.

Just profile your code with something like Scalene: https://github.com/plasma-umass/scalene

Alternatively, you can just write it in Julia.

pyb 18 hours ago

"My thought is to write it in Python, profile it, and if needed, rewrite the slow parts in C."

Your thinking is correct, though sometimes, even with the best intentions, people stop at the first step. Fixing code that already works is usually not a high-priority task.

thesuperbigfrog a day ago

There are already Python wrapper libraries for PETSc: https://petsc.org/release/petsc4py/

I am not an HPC user or developer, but I have written Python wrapper libraries for C libraries. It is fairly easy to do, but it looks like some of what you are looking for is already done.
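For a sense of how small such a wrapper can be, here is a minimal sketch using the standard library's ctypes to call `cos()` from the system C math library. It assumes a POSIX system (Linux or macOS) where that library can be located; the wrapper name `c_cos` is made up for this example, and real bindings such as petsc4py are built with heavier tooling than this.

```python
import ctypes
import ctypes.util
import math

# Locate the C math library. Assumption: a POSIX system. If find_library
# returns None, CDLL(None) searches the symbols already loaded in the
# current process, which normally include libm.
_libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double). Without this, ctypes would
# assume int arguments and an int return value.
_libm.cos.argtypes = [ctypes.c_double]
_libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Thin Python wrapper around C's cos() from the math library."""
    return _libm.cos(float(x))


if __name__ == "__main__":
    print(c_cos(1.0), math.cos(1.0))
```

The same pattern (load the shared library, declare argument and return types, wrap the call in a Python function) carries over to your own compiled kernels.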

  • PaulHoule a day ago

    I remember using a Lua front end to FORTRAN programs in the mid 1990s. Python is fine as a scripting language to script components written in a systems language but you don’t want to do billions of FLOPS with it.

  • megaloblasto a day ago

    Thank you for the reply. I do know about these wrappers, but the documentation is lacking, and there are a few things that I want to implement that are not possible in PETSc. I'm not sure I want to fork the repo and add the things that I need.

    • thesuperbigfrog a day ago

      As others have stated in the comments, scientific computing in pure Python is likely to be too slow and possibly difficult to deploy on HPC nodes.

      Most use of Python in scientific computing and research tends to be as a high-level "glue" or scripting language that calls extremely optimized C, C++, or FORTRAN libraries. Most of the libraries involved are decades old, very mature, and widely used.

      Depending on what you are trying to do, it might be possible to use CUDA or other GPU-based parallelization libraries.

      Rust-based libraries might exist, but are likely to be considerably younger, possibly buggy for some problem-set edge cases, and less widely used than their corresponding C, C++, or FORTRAN counterparts.

      Ambition and building new things to solve problems is always good, but your professors know what works. If you are building new things, I would make sure to compare your work with existing solutions to verify that the new stuff is behaving correctly.

      Best of luck in your endeavors!

fragmede a day ago

C++ seems like a good middle ground. It supports more modern features than pure C, but is still compiled and faster than Python. Also, it sounds like your professors support that, vs. not supporting Python.