Pandas is undoubtedly one of the most important Python libraries in data science and machine learning. Many practitioners in the field are using it on a daily basis. But even as an experienced user, you might have been puzzled by this warning: SettingWithCopyWarning. A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead.

It can be tempting to just ignore it, especially when the code does what you wanted. But taking the warning seriously not only helps prevent bugs in your data pipeline, it is also a good way to understand the internals of Pandas better. In this post, I will look at two different data transformations that both raise the warning but work differently.

Let us assume we have the following dataframe:

import numpy as np
import pandas as pd

d = {'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, np.nan]}
df = pd.DataFrame(data=d)
print(df)
#      col1  col2
#   0   1.0   4.0
#   1   2.0   5.0
#   2   3.0   NaN

Causes of the warning

Example 1

The first cause of the warning occurs like this: We might want to drop the missing values and call dropna() on our dataframe.

df_no_na = df.dropna()
print(df_no_na)
#      col1  col2
#   0   1.0   4.0
#   1   2.0   5.0

We got a new dataframe df_no_na. Pandas removed the row with missing values. Good job! Let’s continue our transformation, by creating a third column, which is the sum of column one and two.

df_no_na['new_column'] = df_no_na['col1'] + df_no_na['col2']
# SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.  Try using .loc[row_indexer,col_indexer] = value instead.

Oh no! We got SettingWithCopyWarning. Maybe we should just follow the instructions and use .loc[] indexing?

df_no_na.loc[:,'new_column'] = df_no_na['col1'] + df_no_na['col2']
# SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead.

Sadly, creating a new column for our dataframe df_no_na raised the warning, even though we tried using a different indexing method. This is somewhat puzzling because SettingWithCopyWarning suggests using .loc[row_indexer,col_indexer] = value instead and that is pretty much what we just tried above.

Example 2

Our first attempt to transform our dataframe was unsuccessful, so let's try another approach: instead of dropping all rows with missing values, we now want to replace the missing value in column 'col2' in the row where df.col1 == 3.0 with the value 0.0. We do this by indexing the row and then the column:

df[df.col1 == 3]['col2'] = 0
# SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead.

This is a different kind of data transformation, but we get the same warning again. What is the underlying cause of this?

Indexing in Pandas

The first thing we have to understand is that the expression df[df.col1 == 3]['col2'] is so-called syntactic sugar, making it easier to express an operation, but potentially more difficult to understand the underlying process.

# expression with syntactic sugar
df[df.col1 == 3.0]['col2'] = 0.0

# executed code
df.__getitem__(df.__getitem__('col1') == 3.0).__setitem__('col2', 0.0)

Both versions will produce identical results and both cause the warning. The first one is certainly easier to read, however the second one reveals an important aspect of the operation: What we perform is a __setitem__ operation ON a __getitem__ operation. The inner __getitem__ expression indexes the row and the subsequent __setitem__ changes the value in the column 'col2' of that row. This chaining of a get operation and a set operation is what actually causes SettingWithCopyWarning.

# the 'chaining' cause of the warning in pseudo code
df.__getitem__('some location').__setitem__('some new value')

Pandas implements this warning because, when a __setitem__ is chained to a __getitem__, the new values are applied to whatever the __getitem__ returned, and that may be a copy rather than the original data. The assignment can therefore be lost without any error. It might be that modifying only this subset is what we intended, but Pandas cannot know and warns us that we are potentially setting the new value only on a copy of our dataframe.
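To make the chaining concrete, here is a minimal sketch that splits the chained assignment into its two steps. It assumes the example dataframe from above; the name subset is only illustrative, and the warning is silenced solely so the snippet runs quietly. Which warning is emitted depends on the Pandas version.

```python
import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, np.nan]})

with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the warning for this demonstration only
    # Step 1 (__getitem__): boolean indexing returns a new object, not df itself
    subset = df[df.col1 == 3.0]
    # Step 2 (__setitem__): the assignment lands on that intermediate object
    subset['col2'] = 0.0

print(subset)  # the intermediate object carries the new value
print(df)      # the original still has NaN in col2
```

In the one-line chained form, the intermediate object is never bound to a name, so the change is discarded as soon as the statement finishes.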

Solutions

A recommended solution for Example 2 is to use .loc[] indexing. This is what the warning message proposes, and chained assignment is probably the main case the Pandas developers had in mind. With .loc[] there is no chaining, just a single __setitem__ operation on the original dataframe.

# Our code
df.loc[df.col1 == 3.0, 'col2'] = 0.0

# Code executed
df.loc.__setitem__((df.__getitem__('col1') == 3.0, 'col2'), 0.0)
print(df)
#      col1  col2
#   0   1.0   4.0
#   1   2.0   5.0
#   2   3.0   0.0

If the filtering operation becomes longer and more complex, it is recommended to build a mask:

# mask
filter_mask = ((df.col1 == 3.0) & (df.col2 < 100))
df.loc[filter_mask]

      col1  col2
   2   3.0   0.0
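As a small sketch of how such a mask can be reused for assignment, the snippet below starts again from the original example dataframe, with the missing value still in place. The condition itself is chosen purely for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, np.nan]})

# Build the boolean mask once; each condition gets its own parentheses
filter_mask = (df.col1 >= 2.0) & (df.col2.isna())

# Use the mask for row selection and assignment in a single .loc call
df.loc[filter_mask, 'col2'] = 0.0
print(df)
```

Naming the mask keeps the .loc call short and makes the filter reusable for later selections.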

But what about the first example? The issue is a little different here. If we call dropna() on the dataframe, Pandas returns a new dataframe, but the new dataframe keeps a trace of the old one in its private attribute _is_copy, which is usually None but becomes a weakref in this case:

df_no_na._is_copy
<weakref at 0x1122c80e8; to 'DataFrame' at 0x109586828>

There are three potential solutions that can be applied here:

I. Modifying the original dataframe

# inplace
df.dropna(inplace=True)
print(df)
    col1  col2
0   1.0   4.0
1   2.0   5.0

The inplace argument is available in many Pandas operations and controls whether the operation modifies the object it is called on or returns a new one. The default value for inplace is False. Once set to True, the method returns None and applies the operation to the object it was called on. In this case, the object df is modified and the row with missing values is dropped from the original df.
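A short sketch of what this means in practice, assuming the example dataframe from above (the name result is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, np.nan]})

# With inplace=True nothing is returned; df itself is changed
result = df.dropna(inplace=True)
print(result)   # None
print(len(df))  # 2

# The new column is now added to the original (and only) dataframe
df['new_column'] = df['col1'] + df['col2']
print(df)
```

Because the method returns None, assigning its result back to df (df = df.dropna(inplace=True)) would throw the dataframe away, which is a common slip.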

II. Returning a deep copy of the dataframe

# copy
df_no_na = df.dropna().copy()

This is possibly the easiest but also the brute force approach to the problem. Calling copy on the dataframe returns a deep copy of the original object and no strings are attached to the old one anymore.
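The following sketch, again assuming the example dataframe from above, shows that the copy can be extended freely while the original stays as it was:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1.0, 2.0, 3.0], 'col2': [4.0, 5.0, np.nan]})

# An explicit copy is an independent object with its own data
df_no_na = df.dropna().copy()
df_no_na['new_column'] = df_no_na['col1'] + df_no_na['col2']

print(df_no_na)             # two rows, three columns
print(df.columns.tolist())  # ['col1', 'col2']
```

The price is the extra memory for the duplicated data, which is usually negligible for small frames but worth keeping in mind for large ones.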

III. Resetting the _is_copy property

# resetting the _is_copy property
df_no_na = df.dropna()
df_no_na._is_copy = None

While this achieves the same result as the previous approach, I would personally not recommend it. Intentionally modifying attributes that are designed to be private is not good coding practice. Just because Python allows it does not mean you should do it. But which solution you prefer is up to you. They will all do the job.

Configuring SettingWithCopyWarning

It is also possible to change the severity of SettingWithCopyWarning from 'warn' to 'raise', which turns it into an error and stops your code, or to None, which silences it entirely. It may be useful to adjust this to your particular use of the library: for instance, raising an error is a sensible choice when code is running in production, whereas for prototyping and exploration a warning may suffice.

# set level of severity to 'raise', 'warn' or None
pd.set_option('mode.chained_assignment', 'warn')
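If the stricter setting should only apply to part of the code, the option can also be scoped instead of set globally. The sketch below uses pd.option_context for that and assumes a Pandas version in which the 'mode.chained_assignment' option exists; the variable names are only illustrative.

```python
import pandas as pd

option = 'mode.chained_assignment'
before = pd.get_option(option)

# Inside the block the stricter setting applies ...
with pd.option_context(option, 'raise'):
    inside = pd.get_option(option)

# ... and the previous value is restored when the block ends
after = pd.get_option(option)
print(before, inside, after)
```

This keeps the global configuration untouched, which is handy in shared notebooks or in test suites.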

Concluding remarks

What puzzles me most about SettingWithCopyWarning is that it occurs in seemingly unrelated contexts (just compare the examples above) and requires completely different types of solutions. The warning message itself certainly covers the case of chained assignment and provides a potential solution (.loc[] indexing), but it is rather misleading when it arises in the first example.

However, there is an underlying concept that SettingWithCopyWarning is all about: it warns you that it is unclear whether the operation is applied to a view of the original dataframe or to a copy of it, and it wants you to resolve this ambiguity in your code. In both examples there is a risk of applying an assignment to the wrong object. In this sense, SettingWithCopyWarning is a reminder to write clear and explicit data transformations and, if taken seriously, will help you make your data pipelines more robust.