10 Ways of Analyzing High-Energy Particles with Python

Sabrina Amrouche
3 min read · Apr 8, 2021

In a few lines of code, explore a simulated LHC particle collision. Whether you are an ambitious physics student who wants to dive into data analysis and machine learning, or an experienced data wizard who wants to play with particles moving at nearly the speed of light, this post will equip you with the technical hacks and scientific perspective you have been looking for!

But first, the data

Simulated data points generated when protons collide in the Large Hadron Collider (LHC). Like this visualization? Check out my HEP viz post.

When protons collide at nearly the speed of light, thousands of new particles emerge in the detector. Since every HEP (high-energy physics) detector is designed to trace the passage of particles, each interaction with the detector is recorded as a measurement. These are the measurements you can see in the figure above. There are, in fact, 100K of them. This dataset was released as part of the TrackML challenge.

Information attached to every measurement.

Now this table can be created with the following two lines if you already have the specified file.

import pandas as pd
data=pd.read_csv("lhc_event00001.csv")

A pandas data frame lets us perform powerful operations on the event data. For example, to retrieve the electrons in the event:

data[data.particle_type=="electron"]
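
Conditions can also be combined. The sketch below uses a tiny made-up table (the values are invented for illustration; only the column names particle_id, particle_type and z come from the event file) to show the idea: wrap each condition in parentheses and join them with & (and) or | (or).

```python
import pandas as pd

# Toy stand-in for the event table; all values are invented for illustration.
toy = pd.DataFrame({
    "particle_id": [1, 1, 2, 3, 3],
    "particle_type": ["electron", "electron", "muon", "electron", "electron"],
    "z": [-10.0, 25.0, 5.0, 40.0, 80.0],
})

# Each condition needs its own parentheses; & means "and", | means "or".
forward_electrons = toy[(toy.particle_type == "electron") & (toy.z > 0)]
print(forward_electrons)
```

The parentheses matter because & binds more tightly than == and > in Python.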

The same syntax can be used to condition on any “feature” in the event. Since the particle_id is shared among all measurements produced by the same particle, it can be used to “group” measurements. The following line groups back the measurements into particles:

data.groupby("particle_id")

When looping over the particle groups, we are able to print the measurements of each particle.

for k,v in data.groupby("particle_id"):
    print(k,v)
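
Looping is not always needed. As a rough sketch on invented toy data (only the particle_id and x column names are taken from the event file), the grouped object can directly count the measurements left by each particle, or keep only the particles with enough measurements to be interesting:

```python
import pandas as pd

# Toy measurements; the ids and coordinates are invented for illustration.
toy = pd.DataFrame({
    "particle_id": [7, 7, 7, 9, 9, 12],
    "x": [1.0, 2.0, 3.0, -1.0, -2.0, 0.5],
})

# Number of measurements recorded for each particle
hits_per_particle = toy.groupby("particle_id").size()
print(hits_per_particle)

# Keep only the measurements of particles with at least 3 measurements
long_tracks = toy.groupby("particle_id").filter(lambda g: len(g) >= 3)
print(long_tracks)
```

The threshold of 3 is arbitrary here; pick whatever suits your analysis.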

We can also select a few particles we are interested in and plot their trajectories within the detector:

Trajectories of two muons (red) and two electrons (black)

The quickest way to get an idea of the particle content of the event is with Counter() from the collections module.

from collections import Counter
Counter(data.particle_type)
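
If you prefer to stay within pandas, value_counts() gives the same tally, sorted from most to least frequent. Here is a minimal sketch on an invented toy column (the particle types listed are for illustration only):

```python
from collections import Counter

import pandas as pd

# Toy column of particle types; invented for illustration.
toy = pd.DataFrame(
    {"particle_type": ["pion", "pion", "electron", "muon", "pion"]}
)

# pandas equivalent of Counter, sorted by decreasing count
counts = toy.particle_type.value_counts()
print(counts)

# Both approaches agree on the tally
same_tally = counts.to_dict() == dict(Counter(toy.particle_type))
print(same_tally)
```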

The bar() function in matplotlib lets us turn the above numbers into a chart.

import matplotlib.pyplot as plt
cp=Counter(data.particle_type)
plt.bar(list(cp.keys()),list(cp.values()),width=.4)
plt.yscale("log")
plt.show()
Distribution of particle types in an event

A very interesting quantity in high-energy physics is the transverse momentum of a particle, “PT”: the component of its momentum perpendicular to the beam axis. From the momentum components contained in the dataset, we compute PT as follows:

import numpy as np
pt=np.sqrt(data.px.values**2+data.py.values**2)
#assign it to the data frame
data["pt"]=pt
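
To make the formula concrete, here is a small worked example on invented momenta (the numbers are made up so the result is easy to check by hand). It uses np.hypot, a NumPy function that computes sqrt(a**2 + b**2) and is an equivalent way of writing the line above:

```python
import numpy as np
import pandas as pd

# Invented momentum components, chosen so the answers are easy to verify.
toy = pd.DataFrame({"px": [3.0, 0.0, -0.6], "py": [4.0, 1.5, 0.8]})

# Same quantity as sqrt(px**2 + py**2)
toy["pt"] = np.hypot(toy.px, toy.py)
print(toy)
```

The first row is the classic 3-4-5 triangle, so its PT is 5.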

To visualize the transverse momentum distribution of particles (not measurements) we use:

dict_pt=dict(zip(data.particle_id, data.pt))
plt.hist(list(dict_pt.values()),bins=np.arange(0,10,.1),alpha=.5)
plt.yscale("log")
plt.xlabel("Transverse momentum (GeV)",fontsize=15)
plt.show()
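
The dictionary trick works because every measurement of a particle carries the same PT, so keeping a single value per particle_id is enough. As an alternative sketch on invented toy data, the same reduction can be written with groupby, which some readers may find more explicit:

```python
import pandas as pd

# Toy table: several measurements per particle, same pt within a particle.
# All values are invented for illustration.
toy = pd.DataFrame({
    "particle_id": [1, 1, 1, 2, 2, 3],
    "pt": [0.5, 0.5, 0.5, 2.0, 2.0, 7.5],
})

# Approach used in this post: a dict keeps one value per key
dict_pt = dict(zip(toy["particle_id"], toy["pt"]))

# Alternative: take the first pt of each particle group
pt_per_particle = toy.groupby("particle_id")["pt"].first()
print(pt_per_particle)
```

Either way, six measurements collapse to three particles, which is what we want to histogram.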

The distribution shows a cut at 100 MeV (0.1 GeV), which means that particles with lower transverse momentum are discarded in the simulation.
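
A quick way to confirm such a cut numerically, rather than by eye, is to look at the smallest value and count how many particles fall below the threshold. The sketch below uses invented PT values in GeV; with the real event you would pass the per-particle PT values instead.

```python
import numpy as np

# Invented per-particle transverse momenta in GeV, for illustration only.
pt_gev = np.array([0.12, 0.35, 1.8, 0.101, 4.2])

# 100 MeV expressed in GeV
cut_gev = 100 / 1000.0

# Number of particles below the cut (zero if the cut is really applied)
n_below = int((pt_gev < cut_gev).sum())
print(pt_gev.min(), n_below)
```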

The vertex is an accurate estimate of where a particle comes from. An essential task in high-energy physics is to retrieve the trajectories of particles to do exactly this: vertexing!

Let’s look at the distribution in 3D of all our vertices.

#a 3D axes is needed to plot x, y and z together
ax=plt.figure().add_subplot(projection="3d")
ax.plot(data.x,data.y,data.z,".",alpha=0.1)
ax.plot(data.vx,data.vy,data.vz,"k.")
Measurements collected during a collision and their associated vertices in black.
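
Since every measurement carries the vertex of its particle, the same vertex is repeated many times in the table. A minimal sketch, on invented toy values, of how to list the distinct vertices and how far each one sits from the beam axis in the transverse plane (the column r is a name introduced here for illustration, not one from the dataset):

```python
import numpy as np
import pandas as pd

# Toy table: vertex coordinates repeated on every measurement.
# All values are invented for illustration.
toy = pd.DataFrame({
    "particle_id": [1, 1, 2, 2, 3],
    "vx": [0.0, 0.0, 0.0, 0.0, 3.0],
    "vy": [0.0, 0.0, 0.0, 0.0, 4.0],
    "vz": [1.2, 1.2, 1.2, 1.2, -7.0],
})

# One row per distinct vertex
vertices = toy[["vx", "vy", "vz"]].drop_duplicates().reset_index(drop=True)

# Transverse distance of each vertex from the beam axis (hypothetical column)
vertices["r"] = np.hypot(vertices.vx, vertices.vy)
print(vertices)
```

In this toy example, particles 1 and 2 share a vertex while particle 3 originates from a displaced one.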

The full GitHub repository to generate these cool plots can be found here.

Curious about where machine learning fits in or how to retrieve vertices from tracks? Stay tuned for the next post!
