Is there a way to multithread or batch REST API calls in Python?
Question {#heading}
I've got a very long list of keys, and I am calling a REST API with each key to GET some metadata about it.
The API can only accept one key at a time, but I wondered if there was a way I could batch or multi-thread the calls from my side?
Answer 1 {#1}
Score: 0
Yes, there are ways to multithread or batch REST API calls in Python to improve the performance of your program. One way to do this is by using the concurrent.futures module, which provides a high-level interface for asynchronously executing callables using threads or processes.

Here's example code that shows how you can use concurrent.futures to perform multithreaded REST API calls in batches:
import requests
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

API_ENDPOINT = 'https://api.example.com/metadata'

def get_metadata(keys):
    results = []
    # islice consumes from an iterator, so create one explicitly;
    # calling islice on the list itself would return the same first
    # five keys forever and loop endlessly.
    key_iter = iter(keys)
    with ThreadPoolExecutor(max_workers=5) as executor:
        # Pull batches of 5 keys until the iterator is exhausted.
        for batch in iter(lambda: list(islice(key_iter, 5)), []):
            futures = [executor.submit(get_metadata_for_key, key) for key in batch]
            results += [future.result() for future in futures]
    return results

def get_metadata_for_key(key):
    url = f"{API_ENDPOINT}/{key}"
    response = requests.get(url)
    if response.status_code == 200:
        return response.json()
    else:
        return None
In this example, the get_metadata function takes a list of keys and uses a ThreadPoolExecutor to execute the get_metadata_for_key function for each key, in batches of 5. The islice function is used to pull successive batches of 5 keys from an iterator over the input list. The executor.submit function submits a new task to the thread pool for each key in the batch and returns a concurrent.futures.Future object. The future.result() method is then used to retrieve the result of each task and append it to the results list.

You can modify the max_workers parameter to control the number of threads used for executing tasks. In this example, I'm using 5 threads.
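Since ThreadPoolExecutor queues submitted tasks internally, the manual batching above is optional rather than required. As a minimal alternative sketch, executor.map schedules one task per key and yields results in input order; the get_metadata_unbatched name is hypothetical, and it reuses get_metadata_for_key from the example above:

from concurrent.futures import ThreadPoolExecutor

def get_metadata_unbatched(keys):
    # executor.map schedules one task per key; the pool still keeps at
    # most max_workers requests in flight at any given time.
    with ThreadPoolExecutor(max_workers=5) as executor:
        return list(executor.map(get_metadata_for_key, keys))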
Answer 2 {#2}
Score: 0
The other reply to this looks like ChatGPT so it should be ignored.
I did, however, use its code as a base to write a function that does what I want.
import requests
from concurrent.futures import ThreadPoolExecutor

import xmltodict
from tqdm import tqdm

API_ENDPOINT = 'https://api.example.com/metadata'

def get_metadata_for_key(key):
    url = f"{API_ENDPOINT}/{key}"
    response = requests.get(url)
    if response.status_code == 200:
        # Return the raw body so xmltodict can parse it below;
        # response.json() would return a dict, which xmltodict.parse rejects.
        return response.text
    else:
        return None

def get_save_metadata(keys, workers):
    results = {}
    # Split the key list into chunks of `workers` keys each.
    batches = [keys[i : i + workers] for i in range(0, len(keys), workers)]
    with ThreadPoolExecutor(max_workers=workers) as executor:
        for batch in tqdm(batches):  # tqdm shows a progress bar
            futures = {key: executor.submit(get_metadata_for_key, key) for key in batch}
            # A Future itself is never None, so filter on the result
            # instead to drop keys whose request failed.
            batch_results = {k: f.result() for k, f in futures.items()}
            results.update({k: xmltodict.parse(v) for k, v in batch_results.items() if v is not None})
    return results
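A minimal usage sketch, assuming a hypothetical list of keys against the same placeholder endpoint:

keys = ['key1', 'key2', 'key3']  # hypothetical keys
metadata = get_save_metadata(keys, workers=5)
print(f"fetched metadata for {len(metadata)} keys")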