51工具盒子

依楼听风雨
笑看云卷云舒,淡观潮起潮落

Mongo DB / Python – Search DB for string but limit results to 1 of each item based on specified field

英文:

Mongo DB / Python - Search DB for string but limit results to 1 of each item based on specified field

问题 {#heading}

我正在尝试搜索我的MongoDB产品。数据集中有多个相同的产品以记录随时间变化的价格。我想搜索一个短语,然后将结果限制为每个UPC的一个。我的当前代码效果不错,但会返回多个相同UPC。

当前代码,将返回多个相同UPC:

response = self.DB.find({'$text': {'$search': f'/{search}/'}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)

示例数据集:

{
  "_id": {
    "$oid": "64cf05707844ef1a25ee57fa"
  },
  "upc": "032622013625",
  "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  "salePrice": 29.99,
  "timestamp": "2023-08-05 22:29:04 EDT-0400"
}

{
  "_id": {
    "$oid": "64cf057c7844ef1a25ee57fd"
  },
  "upc": "048894970887",
  "name": "Basic Window Fan - Holmes",
  "salePrice": 54.99,
  "available": false,
  "timestamp": "2023-08-05 22:29:16 EDT-0400"
}

{
  "_id": {
    "$oid": "64cf05707844ef1a25ee57fa"
  },
  "upc": "032622013625",
  "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  "salePrice": 29.97,
  "timestamp": "2023-08-04 13:25:09 EDT-0400"
}

不确定是否应该使用distinct还是find? 英文:

I am trying to search my MongoDB of products. The dataset has multiple of each product to record price over time. I would like to search for a phrase then limit the results to 1 of each UPC. My current code works well but will return multiple of the same UPC.

Current Code, will return multiple of the same UPC:

response = self.DB.find({'$text': {'$search': f'/{search}/'}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)

Example Data Set:

{
  "_id": {
    "$oid": "64cf05707844ef1a25ee57fa"
  },
  "upc": "032622013625",
  "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
  "salePrice": 29.99,
  "timestamp": "2023-08-05 22:29:04 EDT-0400",

}
}
{
\"_id\": {
\"$oid\": \"64cf057c7844ef1a25ee57fd\"
},
\"upc\": \"048894970887\",
\"name\": \"Basic Window Fan - Holmes\",
\"salePrice\": 54.99,
\"available\": false,
\"timestamp\": \"2023-08-05 22:29:16 EDT-0400\",


    }




}
}
{
\"_id\": {
\"$oid\": \"64cf05707844ef1a25ee57fa\"
},
\"upc\": \"032622013625\",
\"name\": \"Luigi Bormioli Michelangelo Beverage 20oz Set of 4\",
\"salePrice\": 29.97,
\"timestamp\": \"2023-08-04 13:25:09 EDT-0400\",

`}
}
`

Not sure if I should be using distinct, or find?

答案1 {#1}

得分: 1

你可以在聚合管道中使用"$top""$group"来获取你的结果。如果你只想返回特定的字段,你可以使用"$project"阶段。以下是示例代码:

response = self.DB.aggregate([
  {
    "$match": {'$text': {'$search': f'/{search}/'}}
  },
  {
    "$group": {
      "_id": "$upc",
      "mostRecent": {
        "$top": {
          "sortBy": {
            "timestamp": -1
          },
          "output": "$$ROOT"
        }
      }
    }
  },
  {
    "$replaceWith": "$mostRecent"
  }
])

希望这能帮助你。 英文:

You could use "$top" with "$group" in an aggregation pipeline to get your result. If you only want certain fields returned, you could use a "$project" stage.

response = self.DB.aggregate([
  {
    "$match": {'$text': {'$search': f'/{search}/'}}
  },
  {
    "$group": {
      "_id": "$upc",
      "mostRecent": {
        "$top": {
          "sortBy": {
            "timestamp": -1
          },
          "output": "$$ROOT"
        }
      }
    }
  },
  {
    "$replaceWith": "$mostRecent"
  }
])

答案2 {#2}

得分: 0

我使用存储UPC ID的字典过滤了结果数组,并将其附加到文档列表中,如果UPC ID不存在。

import pymongo

myclient = pymongo.MongoClient("mongodb://mongoadmin:ansible@localhost:27017/")
mydb = myclient["mydatabase"]
mycol = mydb["product"]
mycol.drop()

data = [
    {
        "_id": {
            "oid": "64cf05707844ef1a25ee57fa"
        },
        "upc": "032622013625",
        "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
        "salePrice": 29.99,
        "timestamp": "2023-08-05 22:29:04 EDT-0400"
    },
    {
        "_id": {
            "oid": "64cf057c7844ef1a25ee57fd"
        },
        "upc": "048894970887",
        "name": "Basic Window Fan - Holmes",
        "salePrice": 54.99,
        "available": False,
        "timestamp": "2023-08-05 22:29:16 EDT-0400"
    },
    {
        "_id": {
            "oid": "64cf05707844ef1a25ee57fb"
        },
        "upc": "032622013625",
        "name": "Luigi Bormioli Michelangelo Beverage 20oz Set of 4",
        "salePrice": 29.97,
        "timestamp": "2023-08-04 13:25:09 EDT-0400"
    }
]

for d in data:
    x = mycol.insert_one(d)

resp = mycol.create_index(
    [
        ("upc", "text")
    ]
)
print(resp)

search = "032622013625"
response = mycol.find({"$text": {"$search": f"/{search}/"}}, {'Response': 0, '_id': 0}).sort("timestamp", -1)

list_upc_already_seen = []
list_documents = []

for doc in response:
    upc = doc.get("upc")
    if upc not in list_upc_already_seen:
        list_documents.append(doc)
        list_upc_already_seen.append(upc)

print(list_documents)

[{ 'upc': '032622013625', 'name': 'Luigi Bormioli Michelangelo Beverage 20oz Set of 4', 'salePrice': 29.99, 'timestamp': '2023-08-05 22:29:04 EDT-0400' }] 英文:

I filtered the array of results by using a dictionary storing the upc id, and append into a list of documents if upc id is not existing.

import pymongo

myclient = pymongo.MongoClient(\"mongodb://mongoadmin:ansible@localhost:27017/\")
mydb = myclient\[\"mydatabase\"\]
mycol = mydb\[\"product\"\]
mycol.drop()


data=\[
{
\"_id\": {
\"oid\": \"64cf05707844ef1a25ee57fa\"
},
\"upc\": \"032622013625\",
\"name\": \"Luigi Bormioli Michelangelo Beverage 20oz Set of 4\",
\"salePrice\": 29.99,
\"timestamp\": \"2023-08-05 22:29:04 EDT-0400\"


},
{
\"_id\": {
\"oid\": \"64cf057c7844ef1a25ee57fd\"
},
\"upc\": \"048894970887\",
\"name\": \"Basic Window Fan - Holmes\",
\"salePrice\": 54.99,
\"available\": False,
\"timestamp\": \"2023-08-05 22:29:16 EDT-0400\"


},
{
\"_id\": {
\"oid\": \"64cf05707844ef1a25ee57fb\"
},
\"upc\": \"032622013625\",
\"name\": \"Luigi Bormioli Michelangelo Beverage 20oz Set of 4\",
\"salePrice\": 29.97,
\"timestamp\": \"2023-08-04 13:25:09 EDT-0400\"


}
\]


for d in data:

x = mycol.insert_one(d)


resp=mycol.create_index(
\[
(\"upc\", \"text\")
\]
)
print(resp)


search=\"032622013625\"
response = mycol.find( { \"$text\": { \"$search\": f\"/{search}/\"}}, {\'Response\': 0, \'_id\': 0}).sort(\"timestamp\", -1)


list_upc_already_seen=\[\]
list_documents=\[\]


for doc in response:
upc=doc.get(\"upc\")
if upc not in list_upc_already_seen:
list_documents.append(doc)
list_upc_already_seen.append(upc)

`print(list_documents)
`

[{'upc': '032622013625', 'name': 'Luigi Bormioli Michelangelo Beverage 20oz Set of 4', 'salePrice': 29.99, 'timestamp': '2023-08-05 22:29:04 EDT-0400'}]


赞(4)
未经允许不得转载:工具盒子 » Mongo DB / Python – Search DB for string but limit results to 1 of each item based on specified field