
What is the shortest way to drop partial duplicates from a list of tuples in Python without using Pandas?

Question {#heading}


I have a list of tuples where each tuple is structured like this: (Name, Age, City). I have, at most, about 30 tuples in my list.

There are no exact duplicates. However, sometimes the Name and Age are duplicated.

Example input would be something like this:

lst = [("Dave", 20, "Dublin"), ("Dave", 20, "Paris"), ("Lisa", 20, "Monaco"), ("Lisa", 20, "London"), ("Frank", 56, "Berlin"), ("Frank", 40, "Berlin")]

I would like to remove partial duplicates, where the subset to compare is Name and Age but not City. Ideally I'd like to keep the first occurrence of each duplicate. I guess an example makes it easier to understand:

Expected output:

expected_lst = [("Dave", 20, "Dublin"), ("Lisa", 20, "Monaco"), ("Frank", 56, "Berlin"), ("Frank", 40, "Berlin")]

Dave's and Lisa's duplicates were removed, but not Frank's, since the Age does not match.

What I have tried so far:

I checked these posts:

But they do not seem to match what I'm asking for, and I didn't manage to understand how to apply the solutions to my case.

I did find a solution that seems to work, which is to convert my list to a pandas DataFrame and then drop duplicates using the drop_duplicates() function and its subset parameter:

import pandas as pd
df = pd.DataFrame(lst, columns=["Name", "Age", "City"]).drop_duplicates(subset=["Name", "Age"])

And then using itertuples to convert it back to a list:

expected_lst = list(df.itertuples(index=False, name=None))

However, I do not need pandas for any of the other steps of my code. Changing the type of my data seems a bit "much".

I was therefore wondering if there is a better way to get my expected output, one that would be either quicker or shorter to write? I'm not an expert, but I assume that converting a list to a pandas DataFrame and then back to a list is not very efficient?

Answer 1 {#1}

Score: 2


You can use a tuple of the "unique" elements, (name, age), as a dict key, with the full tuple as the value. Since dict keys are unique, only one tuple per name and age pair survives.

A later insertion overwrites an earlier one with the same key, so to keep the first entry you need to check whether (name, age) is already in temp before inserting it. Edit: or just reverse the list, as MatBailie said, so that the first occurrence is inserted last. Note that the results then come out in reverse order relative to the input.

data = [("Dave", 20, "Dublin"), ("Dave", 20, "Paris"), ("Lisa", 20, "Monaco"), ("Lisa", 20, "London"), ("Frank", 56, "Berlin"), ("Frank", 40, "Berlin")]
temp = {(name, age): (name, age, city) for name, age, city in reversed(data)}
for unique_item in temp.values():
    print(unique_item)

Output:

('Frank', 40, 'Berlin')
('Frank', 56, 'Berlin')
('Lisa', 20, 'Monaco')
('Dave', 20, 'Dublin')
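The membership check mentioned in this answer could look roughly like the sketch below. It is not code from the original answer: the function name drop_partial_duplicates and the variable first_seen are made up for illustration, and it assumes Python 3.7 or later, where dicts keep insertion order.

```python
data = [
    ("Dave", 20, "Dublin"), ("Dave", 20, "Paris"),
    ("Lisa", 20, "Monaco"), ("Lisa", 20, "London"),
    ("Frank", 56, "Berlin"), ("Frank", 40, "Berlin"),
]

def drop_partial_duplicates(rows):
    """Keep the first tuple seen for each (name, age) pair, in input order.

    Hypothetical helper, not part of the original answer.
    """
    first_seen = {}
    for name, age, city in rows:
        key = (name, age)
        # Only store the tuple if this (name, age) has not been seen yet,
        # so later partial duplicates are skipped instead of overwriting.
        if key not in first_seen:
            first_seen[key] = (name, age, city)
    # Dicts preserve insertion order (Python 3.7+), so values() follows the input order.
    return list(first_seen.values())

result = drop_partial_duplicates(data)
print(result)
```

Because nothing is reversed, the surviving tuples stay in the same relative order as in the input list, which matches the expected_lst from the question.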

Answer 2 {#2}

Score: 0


You could make use of itertools.groupby, using the first two elements of each tuple as the key (you first need to sort the data, since groupby only groups consecutive entries):

from itertools import groupby
filtered_data = [next(g) for k, g in groupby(sorted(data), key=lambda tup: tup[:2])]

filtered_data is then:

[('Dave', 20, 'Dublin'), ('Frank', 40, 'Berlin'), ('Frank', 56, 'Berlin'), ('Lisa', 20, 'London')]

Of course, this only works if the initial order of tuples doesn't matter to you: sorting the full tuples both reorders the list and decides which duplicate survives (here Lisa's London entry rather than Monaco). Otherwise, @KennyOstrom's answer preserves the original order.
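If you would rather keep the first occurrence within each group (Lisa's Monaco entry instead of London), one possible variation is to sort on the (name, age) key only. This is a sketch under assumptions, not part of the original answer, and the helper name name_age is made up for illustration. Python's sort is stable, so tuples with equal keys stay in their input order; the groups themselves still come out in sorted order.

```python
from itertools import groupby

data = [
    ("Dave", 20, "Dublin"), ("Dave", 20, "Paris"),
    ("Lisa", 20, "Monaco"), ("Lisa", 20, "London"),
    ("Frank", 56, "Berlin"), ("Frank", 40, "Berlin"),
]

def name_age(tup):
    """Grouping key: the (name, age) prefix of a tuple (hypothetical helper)."""
    return tup[:2]

# sorted() is stable: tuples sharing a (name, age) key keep their input order,
# so next(group) yields the first occurrence from the original list.
filtered = [
    next(group)
    for _, group in groupby(sorted(data, key=name_age), key=name_age)
]
print(filtered)
```

The result is ordered by name and age rather than by position in the input, so this only helps when you care about which duplicate is kept and not about the overall order.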
