Reducing Cold Start Latency for LLM Inference with NVIDIA Run:ai Model Streamer

Deploying large language models (LLMs) poses a challenge for inference efficiency. In particular, cold start delays, where models take significant time to load into GPU memory, hurt both user experience and scalability. Increasingly complex production environments amplify the need for efficient model loading: models often require tens to hundreds of gigabytes of memory, creating latency and resource challenges when scaling to meet unpredictable demand.

This post introduces the NVIDIA Run:ai Model Streamer, an open source Python SDK designed to mitigate these issues by concurrently reading model weights from storage and streaming them directly into GPU memory. We benchmarked it against the vLLM default Hugging Face (HF) Safetensors Loader and CoreWeave Tensorizer on local SSDs and Amazon S3.

The experiments explained in this post show that the NVIDIA Run:ai Model Streamer significantly reduces model loading times, lowering cold start latency even in cloud environments. It is also compatible with the Safetensors format, avoiding weight conversion. Our findings emphasize the importance of storage choice and concurrent streaming for efficient LLM deployment: to improve inference performance, use the NVIDIA Run:ai Model Streamer to reduce cold-start latency, saturate your storage throughput, and accelerate time-to-inference.

How is a model loaded to a GPU for inference?

To provide some background information, this section explains the two main steps involved in loading a machine learning model into GPU memory for inference: reading weights from storage into CPU memory, and transferring them to the GPU. Understanding this process is key to optimizing inference latency, especially in large-scale or cloud-based deployments.

  • Reading weights from storage to CPU memory: The model’s weights are loaded from storage into CPU memory. Weights can be in various formats such as .pt, .h5, and .safetensors, or in custom formats; storage can be local, cluster-wide, or in the cloud. Note that the .safetensors format is used for the purposes of this post due to its wide adoption. However, other formats may be used elsewhere.
  • Moving the model to GPU: The model’s parameters and relevant tensors are transferred to GPU memory.

Loading models from cloud-based storage such as Amazon S3 often involves an extra step: the weights are first downloaded to local disk before being moved into CPU and then GPU memory. 

Traditionally, these steps occur sequentially, making model loading times one of the most significant bottlenecks when scaling inference. 
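
To make the two-step flow concrete, the snippet below is a minimal sketch of the traditional, sequential load path using the safetensors library and PyTorch; the file name is a hypothetical placeholder.

import torch
from safetensors.torch import load_file

# Step 1: read weights from storage into CPU memory
state_dict = load_file("model.safetensors", device="cpu")  # hypothetical file name

# Step 2: move each tensor to GPU memory (happens only after step 1 completes)
state_dict = {name: tensor.to("cuda") for name, tensor in state_dict.items()}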

How does the Model Streamer work?

Model Streamer is an SDK with a high-performance C++ backend designed to accelerate model loading into GPUs from various storage sources (for example, network file systems, cloud object stores, and local disks). It uses multiple threads to read tensors concurrently from a file in object or file storage into a dedicated buffer in CPU memory. Each tensor has an identifier, enabling simultaneous reading and transfer: while some tensors are read from storage to CPU, others are moved from CPU to GPU.

The tool takes full advantage of the fact that the GPU and CPU have separate subsystems. GPUs access CPU memory directly over PCIe without CPU intervention, allowing real-time overlap of storage reads and memory transfers. Experiments were run on an AWS g5.12xlarge instance with NVIDIA A10G GPUs and 2nd Gen AMD EPYC CPUs, offering a balanced architecture for high-throughput parallel data handling.

Key features of the Model Streamer include:

  • Concurrency: Multiple threads read model weight files in parallel, including support for splitting large tensors.
  • Balanced workload for reading: Work is distributed based on tensor size to saturate storage bandwidth.
  • Support for multiple storage types: Works with SSDs, remote storage, and cloud object stores like S3.
  • No tensor format conversion: Supports Safetensors natively, avoiding conversion overhead.
  • Easy integration: Offers a Python API and an iterator similar to Safetensors but with concurrent background reading. Integrates easily with inference engines like vLLM and TGI.

For more details about setup and usage, see the Model Streamer documentation.
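
As a minimal sketch of the Python API, the following follows the usage pattern shown in the Model Streamer documentation; verify the exact class and method names against the documentation for your installed version, and note that the file path is an illustrative assumption.

import torch
from runai_model_streamer import SafetensorsStreamer  # per the Model Streamer docs

file_path = "/path/to/model.safetensors"  # hypothetical path

with SafetensorsStreamer() as streamer:
    streamer.stream_file(file_path)                  # start concurrent reads into CPU buffers
    for name, cpu_tensor in streamer.get_tensors():  # iterate tensors as they become ready
        gpu_tensor = cpu_tensor.to("cuda")           # GPU transfer overlaps with remaining reads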

How does the HF Safetensors Loader work?

The HF Safetensors Loader is an open source utility that provides a safe and fast format for saving and loading multiple tensors. It uses a memory-mapped file system to minimize data copying. On a CPU, tensors are directly mapped into memory. On a GPU, it creates an empty tensor with PyTorch, then moves the tensor data using cudaMemcpy, facilitating a zero-copy loading process.
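
For comparison, the sketch below shows the public safetensors API this loader builds on: the file is memory-mapped and tensors are materialized on request; the file name is an assumption.

import torch
from safetensors import safe_open

with safe_open("model.safetensors", framework="pt", device="cpu") as f:  # memory-mapped file
    for name in f.keys():
        cpu_tensor = f.get_tensor(name)     # materialized from the mapped file
        gpu_tensor = cpu_tensor.to("cuda")  # copied to GPU memory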

How does the CoreWeave Tensorizer work?

CoreWeave Tensorizer is an open source tool that serializes model weights and their corresponding tensors into a single file. Instead of loading an entire model into RAM before moving it to the GPU, Tensorizer streams the model data tensor by tensor from an HTTP/HTTPS or S3 source. 
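As a hedged sketch of this workflow, the class and method names below are recalled from the Tensorizer README and should be verified against the current documentation; the model name and S3 URI are hypothetical.

import transformers
from tensorizer import TensorSerializer, TensorDeserializer

# One-time serialization of the model weights into the Tensorizer format
model = transformers.AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
serializer = TensorSerializer("s3://my-bucket/llama-3-8b.tensors")  # hypothetical URI
serializer.write_module(model)
serializer.close()

# At load time, stream tensors from S3 into the module tensor by tensor
# (in practice the model would be instantiated with empty weights first)
deserializer = TensorDeserializer("s3://my-bucket/llama-3-8b.tensors", device="cuda")
deserializer.load_into_module(model)
deserializer.close()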

Where loading meets inference engines: Loading weights with vLLM

Model serving is not complete without an inference engine. Many inference engines and servers are available; this post, and the benchmarking study behind it, focuses on vLLM and its model loading capabilities.

The vLLM framework uses the HF Safetensors Loader by default. It also supports CoreWeave Tensorizer for loading models from S3 endpoints; note, however, that the Tensorizer path requires converting weights from the Safetensors format to the Tensorizer format.
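
As a hedged sketch of how a loader is selected in vLLM releases newer than the one benchmarked below (argument names per the vLLM documentation at the time of writing; the model name and concurrency value are illustrative):

from vllm import LLM

# The default path uses the HF Safetensors Loader; selecting the Run:ai Model
# Streamer is a one-line change on vLLM versions that include this load format.
llm = LLM(
    model="meta-llama/Meta-Llama-3-8B",
    load_format="runai_streamer",
    model_loader_extra_config={"concurrency": 16},  # number of concurrent read threads
)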

Comparing model loader performance across three storage types 

We compared the performance of different model loaders (NVIDIA Run:ai Model Streamer, CoreWeave Tensorizer, and HF Safetensors Loader) across three storage types:

  1. Experiment #1: GP3 SSD – Measured model loading times with various loaders. 
  2. Experiment #2: IO2 SSD – Tested the same loaders on IO2 SSD to evaluate the impact of higher IOPS and throughput. 
  3. Experiment #3: Amazon S3 – Compared loaders on cloud object storage; Safetensors Loader was excluded because it does not support S3. 
  4. Experiment #4: vLLM with different loaders – Integrated Model Streamer into vLLM to measure full load and readiness times across storage types, comparing it to default HF Safetensors Loader and Tensorizer. Safetensors Loader excluded from S3 tests. 

All tests ran under cold-start conditions to avoid cache effects. For S3, a minimum two-minute wait between tests ensured accuracy. Tensorizer experiments used models serialized per the Tensorizer recipe, and benchmarking followed their benchmarking recipe, both without optional hashing.

Experiment setup

The experiments were conducted using the setup outlined in Table 1. 

Model: Llama 3 8B, an LLM weighing 15 GB, stored in a single Safetensors file
Hardware: AWS g5.12xlarge instance featuring four NVIDIA A10G GPUs (only one GPU was used for all tests to maintain consistency)
Software stack: CUDA 12.4; vLLM 0.5.5 (Transformers 4.44.2); NVIDIA Run:ai Model Streamer 0.6.0; Tensorizer 2.9.0; Transformers 4.45.0.dev0; Accelerate 0.34.2
Storage types: GP3 SSD (750 GB, 16K IOPS, 1,000 MiB/s); IO2 SSD (500 GB, 100K IOPS, 4,000 MiB/s); Amazon S3 (same AWS region as the instance to minimize latency)
Table 1. Summary of experimental setup

For the experiments involving Tensorizer, the same model was serialized into the Tensorizer proprietary tensor format using the recipe provided by the Tensorizer framework. 

Experiment #1 results: GP3 SSD 

In this initial experiment, we compared the loading performance of different model loaders using GP3 SSD storage. We evaluated the impact of concurrency on the Model Streamer (Figure 1) and examined how the number of workers affected Tensorizer. For Model Streamer, increasing concurrency—the number of concurrent threads reading from storage into CPU memory—led to a notable decrease in model loading time.

At concurrency 1, Model Streamer loaded the model in 47.56 seconds, slightly slower than HF Safetensors Loader at 47.99 seconds. With concurrency 16, loading time dropped to 14.34 seconds, maintaining throughput of ~1 GiB/s, the max for GP3 SSD. Beyond that, storage throughput limited further gains.

Tensorizer showed similar behavior. With one worker, loading time was 50.74 seconds, close to Safetensors Loader. With 16 workers, it achieved 16.11 seconds and 984.4 MiB/s throughput—also nearing GP3 SSD bandwidth.

The storage throughput limit of GP3 SSD became the bottleneck for both Model Streamer and Tensorizer, limiting performance. This motivated testing a higher-throughput storage solution in Experiment #2.

Time to load model to GPU (sec.)
Concurrency   Model Streamer   Safetensors Loader
1             47.56            47.99
4             14.43            –
8             14.42            –
16            14.34            –
Table 2. Experiment #1 GP3 SSD results
Tensorizer
Number of readers   Time to load model to GPU (sec.)
1                   50.74
4                   17.38
8                   16.49
16                  16.11
32                  17.18
64                  16.44
100                 16.81
Table 3. Experiment #1 Tensorizer results
Bar chart showing model load time dropping as concurrency increases.
Figure 1. Higher concurrency significantly reduces model loading time, reaching peak SSD throughput at 16 streams
Bar chart comparing peak model loading speeds of three loaders on AWS GP3 SSD.
Figure 2. Model Streamer and Tensorizer achieve faster model loading than Safetensors on AWS GP3 SSD

Experiment #2: IO2 SSD 

For the second experiment, we used IO2 SSD, which offers significantly higher throughput than GP3 SSD. As before, we analyzed the effect of concurrency on Model Streamer (Figure 3) and the number of workers on Tensorizer.

At concurrency 1, Model Streamer and HF Safetensors Loader showed similar loading times of 43.71 seconds and 47 seconds, respectively. However, as we increased concurrency, Model Streamer showed much more pronounced gains compared to GP3 SSD. With concurrency 8, the model was loaded in just 7.53 seconds, making it around 6x faster than the HF Safetensors Loader, which took 47 seconds.

For Tensorizer, the performance also improved significantly. The optimal result was observed with eight workers, achieving a model loading time of 10.36 seconds (Figure 4). Beyond that, adding more workers did not yield further performance improvements, likely due to storage throughput limitations.

Despite the theoretical maximum throughput of 4 GiB/s for IO2 SSD, our experiments consistently hit a ceiling at around 2 GiB/s with Model Streamer and 1.6 GiB/s with Tensorizer. This suggests practical throughput limitations on the AWS infrastructure, rather than the loaders themselves.

Time to load model to GPU (sec.)
Concurrency   Model Streamer   Safetensors Loader
1             43.71            47
4             11.19            –
8             7.53             –
16            7.61             –
20            7.62             –
Table 4. Experiment #2 IO2 SSD results
Tensorizer
Number of readers   Time to load model to GPU (sec.)
1                   43.85
4                   14.44
8                   10.36
16                  10.61
32                  10.95
Table 5. Experiment #2 Tensorizer results
Bar chart showing model load times decreasing as concurrency increases on IO2 SSD.
Figure 3. Model loading time with Model Streamer drops sharply as concurrency increases on IO2 SSD
Bar chart comparing peak model loading speeds of three loaders on AWS IO2 SSD.
Figure 4. Model Streamer and Tensorizer outperform Safetensors Loader on AWS IO2 SSD at optimal concurrency

Experiment #3: S3 

For cloud storage, Experiment #3 compared the performance of Model Streamer and Tensorizer using Amazon S3 as the storage medium. Since HF Safetensors Loader does not support S3, it was not included in this benchmarking experiment. For the Tensorizer experiments, we used different numbers of workers and chose the best result for Figure 6, which was achieved with 16 workers in this case.

The results showed that Model Streamer outperformed Tensorizer at all tested concurrency levels. At concurrency 4, Model Streamer loaded the model in 28.24 seconds. As concurrency increased, Model Streamer continued to improve, reaching a load time of 4.88 seconds at concurrency 32, compared to 37.36 seconds for Tensorizer’s best result with 16 workers. Model Streamer thus demonstrates superior efficiency when loading from cloud-based storage.

Note that during these experiments, we observed unexpected caching behavior on AWS S3. When experiments were repeated in quick succession, the model load times significantly improved, likely due to some form of S3 caching mechanism. To ensure consistency and avoid benefiting from this “warm cache,” we introduced at least a 3-minute wait between each test run. The results presented here reflect the times recorded after these intervals, ensuring they represent cold-start conditions.

Model Streamer
Concurrency   Time to load model to GPU (sec.)
4             28.24
16            8.45
32            4.88
64            5.01
Table 6. Experiment #3 S3 results
Tensorizer
Number of readers   Time to load model to GPU (sec.)
8                   86.05
16                  37.36
32                  48.67
64                  41.49
80                  41.43
Table 7. Experiment #3 Tensorizer results
Bar chart showing model load times decreasing as concurrency increases on S3 bucket.
Figure 5. Model loading time with Model Streamer decreases sharply as concurrency increases on S3 bucket storage
Bar chart comparing peak model loading times of the Model Streamer and Tensorizer from AWS S3.
Figure 6. Model Streamer outperforms Tensorizer in model loading from AWS S3 at optimal concurrency

‍Experiment #4: vLLM with all loaders

This experiment integrated different model loaders into vLLM to measure the total time from model loading to readiness for inference. Model Streamer, Safetensors Loader, and Tensorizer were tested on local storage (GP3 SSD and IO2 SSD), while Hugging Face Safetensors was excluded from S3 since it doesn’t support S3 loading. Tensorizer was tested with vLLM on S3 and compared to Model Streamer.

For each vLLM plus Model Streamer experiment, we used the optimal concurrency levels determined in the earlier experiments. Specifically:

  • For GP3 SSD, a concurrency level of 16 was used (Figure 1).
  • For IO2 SSD, a concurrency level of 8 was used (Figure 3).
  • For S3 storage, a higher concurrency level of 32 was used (Figure 5).

Similarly, for the Tensorizer plus vLLM integration, we used the optimal number of workers determined in the previous experiments. Specifically:

  • GP3 SSD: 16 workers 
  • IO2 SSD: 8 workers 
  • S3: 16 workers 

Model Streamer reduced total readiness time to 35.08 seconds on GP3 SSD and 28.28 seconds on IO2 SSD, compared to HF Safetensors Loader at 66.13 seconds and 62.69 seconds, respectively. Tensorizer took 36.19 seconds on GP3 and 30.88 seconds on IO2 SSD, similarly cutting times roughly in half versus Safetensors. On S3, Model Streamer achieved 23.18 seconds total readiness, while Tensorizer required 65.18 seconds.

vLLM with different loaders (GP3 SSD)
Loader               Total time until vLLM engine is ready for requests (sec.)
Safetensors Loader   66.13
Model Streamer       35.08
Tensorizer           36.19
Table 8. Experiment #4 vLLM results with GP3 SSD storage
vLLM with different loaders (IO2 SSD)
Loader               Total time until vLLM engine is ready for requests (sec.)
Safetensors Loader   62.69
Model Streamer       28.28
Tensorizer           30.88
Table 9. Experiment #4 vLLM results with IO2 SSD storage
vLLM with different loaders (S3)
Loader               Total time until vLLM engine is ready for requests (sec.)
Model Streamer       23.18
Tensorizer           65.18
Table 10. Experiment #4 vLLM results with S3 storage
Bar chart showing total model load and engine warm-up times for different loaders on GP3 SSD, IO2 SSD, and S3.
Figure 7. Model Streamer and Tensorizer reduce total vLLM readiness time across storage types, especially on local SSDs

Get started with NVIDIA Run:ai Model Streamer

Cold start latency remains a key bottleneck in delivering responsive, scalable LLM inference, especially in dynamic or cloud-native environments. Our benchmarks demonstrate that the NVIDIA Run:ai Model Streamer significantly accelerates model loading times across local and remote storage, outperforming other common loaders. By enabling concurrent weight loading and GPU memory streaming, it offers a practical and high-impact solution for production-scale inference workloads.

If you’re building or scaling inference systems, especially with large models or cloud-based storage, these results offer immediate takeaways: use the Model Streamer to reduce cold-start latency, saturate your storage throughput, and accelerate time-to-inference. With easy integration into frameworks like vLLM and support for high-concurrency, multi-storage environments, it’s a drop-in optimization that can yield measurable gains. Boost your model loading performance with the NVIDIA Run:ai Model Streamer.


Android Developers Blog: Android Gradle Plugin

Accelerating development with monthly releases for Android Studio, shipping twice as often as before

Posted by Xavier Ducrohet, Tech Lead, Android Studio, and Adarsh Fernando, Group Product Manager, Android Studio. Last year, we doubl …


An Introduction to Speculative Decoding for Reducing Latency in AI Inference

Generating text with large language models (LLMs) often involves running into a fundamental bottleneck. GPUs offer massive compute, yet much of that power sits idle because autoregressive generation is inherently sequential: each token requires a full forward pass, reloading weights, and synchronizing memory at every step. This combination of memory access and step-by-step dependency raises latency, underutilizes hardware, and limits system efficiency.

Speculative decoding helps break through this wall. By predicting and verifying multiple tokens simultaneously, this technique shortens the path to results and makes AI inference faster and more responsive, significantly reducing latency while preserving output quality. This post explores how speculative decoding works, when to use it, and how to deploy the advanced EAGLE-3 technique on NVIDIA GPUs.

What is speculative decoding?

Speculative decoding is an inference optimization technique that pairs a target model with a lightweight draft mechanism that quickly proposes several next tokens. The target model verifies those proposals in a single forward pass, accepts the longest prefix that matches its own predictions, and continues from there. Compared with standard autoregressive decoding, which produces one token per pass, this technique lets the system generate multiple tokens at once, cutting latency and boosting throughput without any impact on accuracy.

Though highly capable, LLMs often push the limits of AI hardware, making it challenging to further optimize user experience at scale. Speculative decoding offers an alternative by offloading part of the work to a less resource-intensive model.

Speculative decoding works much like a chief scientist in a laboratory, relying on a less experienced but efficient assistant to handle routine experiments. The assistant rapidly works through the checklist, while the scientist focuses on validation and progress, stepping in to correct or take charge whenever necessary.

With speculative decoding, the lightweight assistant model proposes multiple possible continuations and the larger model verifies them in batches. The ultimate benefit is reducing the number of sequential steps, alleviating memory bandwidth bottlenecks. Critically, this acceleration occurs while preserving output quality, as verification mechanisms will discard any results divergent from what the baseline model itself might generate.

Speculative decoding basics using draft-target and EAGLE-3

This section lays out the core concepts behind speculative decoding, breaking down the mechanics that make it effective. To begin, the transformer forward pass shows how sequences are processed in parallel. Subsequent steps include draft generation, verification, and sampling using a draft-target approach as an example. Together, these fundamentals provide the context needed to understand both the classic draft–target method and advanced techniques like EAGLE-3.

What is the draft-target approach to speculative decoding?

The draft-target approach is the classic implementation of speculative decoding, operating as a two-model system. The primary model is the large, high-quality target model whose output you want to accelerate. Working alongside it is a much smaller, faster draft model, which is often a distilled or simplified version of the target. 

Returning to the lab scientist analogy, think of the target as the meticulous scientist ensuring correctness, while the draft is the quick assistant proposing possibilities that the scientist then verifies. Figure 1 shows this partnership in action, with the draft model quickly producing four draft tokens for the target model, which verifies and keeps two while also generating one additional token itself.

A gif showing an example where the input is “The Quick”. From this input, the draft model proposes “Brown”, “Fox”, “Hopped”, “Over”. The input and draft are ingested by the target model, which verifies “Brown” and “Fox” before rejecting “Hopped” and subsequently everything after. “Jumped” is the target model’s own generation resulting from the forward pass.
Figure 1. The draft-target approach to speculative decoding operates as a two-model system

Speculative decoding using the draft-target approach involves the following steps: 

Draft generation

 A smaller, more efficient mechanism generates a sequence of candidate tokens (typically 3 to 12). This usually takes the form of a separate, smaller model trained on the same data distribution, with the target model’s output serving as the ground truth for the draft model’s training. 

Parallel verification

The target model processes the input sequence and all draft tokens simultaneously in a single forward pass, computing probability distributions for each position. This parallel processing is the key efficiency gain, as it leverages the target model’s full computational capacity rather than leaving it underutilized during sequential generation. Thanks to the KV Cache, where the values for the original prefix have already been calculated and stored, only the new, speculated tokens incur a computational cost during this verification pass. The verified tokens are then selected to form the new prefix for the next generation step.

Rejection sampling

Rejection sampling is the decision-making stage that occurs after the probability distribution from the target model has been generated.

The key aspect of rejection sampling is the acceptance logic. As Figure 2 illustrates, this logic compares the proposed probability of the draft model, P(Draft), against the actual probability of the target model, P(Target). 

For the first two tokens, “Brown” and “Fox,” P(Target) is higher than P(Draft), so they are accepted. However, for “Hopped,” P(Target) is significantly lower than P(Draft), indicating an unreliable prediction.

When a token such as “Hopped” is rejected by the acceptance logic, it and all subsequent tokens in the draft are discarded. The process then reverts to standard autoregressive generation from the last accepted token, “Fox,” to produce a corrected token.

A gif showing the verification phase within the target model. P(Target) and P(Draft) are compared for each token. “Brown” passes because P(Target) ≥ P(Draft). “Hopped” fails because P(Target) ≤ P(Draft). As each following token is affected by previous generations, all draft tokens past “Hopped” are discarded. The final generation is thus the prefix plus “Brown Fox Jumped”, where “Brown” and “Fox” are accepted draft generations and “Jumped” a generation solely from the target model.
Figure 2. The acceptance logic is the key aspect of rejection sampling during parallel verification

A draft token is accepted only when it matches what the target model would have generated. This rigorous, token-by-token validation ensures that the final output is identical to what the target model would have produced, guaranteeing that the speedups come with no loss in accuracy.

The ratio of accepted tokens to total proposed tokens is the acceptance rate. Higher acceptance rates yield larger speedups; at worst, if all draft tokens are rejected, only the single target model token is generated.
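
To make the accept/verify loop concrete, here is a toy sketch of one speculative step using greedy verification (a draft token is accepted only if it matches the target model’s own argmax). It assumes Hugging Face-style models that return logits and a batch size of 1, and it omits the full rejection-sampling math used with stochastic sampling.

import torch

def speculative_step(target, draft, prefix_ids, k=4):
    # Draft model proposes k tokens autoregressively (cheap, sequential).
    ids = prefix_ids
    for _ in range(k):
        next_id = draft(ids).logits[:, -1].argmax(-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=-1)
    proposed = ids[:, prefix_ids.shape[1]:]

    # Target model verifies all k proposals in a single forward pass.
    logits = target(ids).logits
    prefix_len = prefix_ids.shape[1]
    target_pred = logits[:, prefix_len - 1:-1].argmax(-1)  # target's choice at each draft position

    # Accept the longest matching prefix of the draft (assumes batch size 1).
    n_accept = 0
    while n_accept < k and proposed[0, n_accept] == target_pred[0, n_accept]:
        n_accept += 1

    # Append one token from the target itself (a correction or a bonus token).
    bonus = logits[:, prefix_len - 1 + n_accept].argmax(-1, keepdim=True)
    return torch.cat([prefix_ids, proposed[:, :n_accept], bonus], dim=-1)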

What is the EAGLE approach to speculative decoding?

EAGLE, or Extrapolation Algorithm for Greater Language-Model Efficiency, is a speculative decoding method that operates at the feature level, extrapolating from the hidden state just before the target model’s output head. Unlike the draft–target approach, which relies on a separate draft model to propose tokens, EAGLE uses a lightweight autoregressive prediction head ingesting features from the target model’s hidden states. This eliminates the overhead of training and running a second model while still allowing the target model to verify multiple token candidates per forward pass.

EAGLE-3, the third version, builds on this foundation by introducing multi-layer fused feature representations from the target model, feeding low-, middle-, and high-level embeddings directly into its drafting head. It also uses a context-aware, dynamic draft tree (inherited from EAGLE-2) to propose multiple chained hypotheses. These candidate tokens are then verified by the target model using parallel tree attention, effectively pruning invalid branches and improving both acceptance rate and throughput. Figure 3 shows this flow in action.

A gif showing that the lightweight EAGLE head is not a standalone model. It drafts tokens from feature outputs taken from the target model’s layers, generates prediction trees, then feeds this back into the model for verification.
Figure 3. The EAGLE-3 drafting mechanism generates a tree of candidate tokens from the target model 

What is the EAGLE head?

Instead of using a separate, smaller model as in the draft-target approach, EAGLE-3 attaches a lightweight drafting component, the “EAGLE head,” to the internal layers of the target model. The EAGLE head is typically made of a lightweight Transformer decoder layer followed by a final linear layer. It is essentially a miniature, stripped-down version of the building blocks that make up the main model.

This EAGLE head can generate not just a single sequence, but an entire tree of candidate tokens. This process is also instance-adaptive, where the head evaluates its own confidence as it builds the tree and stops drafting if the confidence drops below a threshold. This allows the EAGLE head to explore multiple generation paths efficiently, generating longer branches of predictable text and shorter ones for complex parts, all for the runtime cost of one forward pass of the target model.

What is Multi-Token-Prediction in DeepSeek-R1?

Similar to EAGLE, Multi-Token Prediction (MTP) is a speculation technique used by many iterations of DeepSeek where the model learns to predict several future tokens at once rather than only the immediate next token. MTP uses a multi-head method where each head acts as a token drafter. The first head attached to the model guesses the first draft token, another guesses the one after that, another the third, and so on. The main model then checks those guesses in order and keeps the longest prefix that matches. This method naturally removes the need for a separate drafting model. 

In essence, this technique is similar to EAGLE-style speculative decoding where both propose multiple tokens for verification. However, it differs in how proposals are formed: MTP uses specialized multi-token prediction heads, whereas EAGLE uses a single head that extrapolates internal feature states to construct candidates.

How to implement speculative decoding 

You can use the NVIDIA TensorRT-Model Optimizer API to apply speculative decoding to your own models. Follow the steps described below to convert a model to use EAGLE-3 speculative decoding using the Model Optimizer Speculative Decoding module. 

Step 1: Load the original Hugging Face model.

import transformers

import modelopt.torch.opt as mto
import modelopt.torch.speculative as mtsp
from modelopt.torch.speculative.config import EAGLE3_DEFAULT_CFG

mto.enable_huggingface_checkpointing()

# Load original HF model
base_model = "meta-llama/Llama-3.2-1B"
model = transformers.AutoModelForCausalLM.from_pretrained(
    base_model, torch_dtype="auto", device_map="cuda"
)

Step 2: Import the default config for EAGLE-3 and convert it using mtsp.

# Read Default Config for EAGLE3
config = EAGLE3_DEFAULT_CFG["config"]

# Hidden size and vocab size must match base model
config["eagle_architecture_config"].update(
    {
        "hidden_size": model.config.hidden_size,
        "vocab_size": model.config.vocab_size,
        "draft_vocab_size": model.config.vocab_size,
        "max_position_embeddings": model.config.max_position_embeddings,
    }
)

# Convert Model for eagle speculative decoding
mtsp.convert(model, [("eagle", config)])

Check out the hands-on tutorial that expands this demo into a deployable end-to-end speculative decoding fine‑tuning pipeline in the TensorRT-Model-Optimizer/examples/speculative_decoding GitHub repo.

How does speculative decoding impact inference latency?

The core latency bottleneck in standard autoregressive generation is the fixed, sequential cost of each step. If a single forward pass (loading weights and computing a token) takes 200 milliseconds, generating three tokens will always take 600 ms (three sequential steps multiplied by 200 ms). The user experiences this delay as distinct cumulative waiting periods.

Speculative decoding can collapse these multiple waiting periods into one. By using a fast draft mechanism to speculate two candidate tokens and then verifying them all in a single 250 ms forward pass, the model can generate three tokens (two accepted speculations plus one base model generation) in 250 ms instead of 600 ms. This concept is illustrated in Figure 4.

A gif showing a base model (top) using standard autoregressive generation generating a single token in each 200 ms pass, taking 600 ms to generate three. A model with speculative decode (bottom) took slightly longer on one pass (250 ms), but generated three tokens in a single pass.
Figure 4. Generation with and without speculative decoding

Instead of watching the response appear word by word, the user sees it materialize in much faster, multi-token chunks. This is particularly noticeable in interactive applications like chatbots, where a lower response latency creates a much more fluid and natural conversation. Figure 5 simulates a hypothetical chatbot with speculative decode on and off.

A gif with side-by-side chatbot outputs labeled ‘Speculative Decoding Off’ (left) and ‘Speculative Decoding On’ (right). The chatbot on the right shows how speculative decoding reduces the time it takes to generate each token or batch of tokens, shortening the user's waiting period. This makes the chatbot feel more responsive, fluid, and natural to interact with.
Figure 5. A chatbot with speculative decoding on (right) generates text much faster than with speculative decoding off (left)

Get started with speculative decoding 

Speculative decoding is becoming a fundamental strategy for accelerating LLM inference. From the basics of draft–target generation and parallel verification to advanced methods like EAGLE-3, these approaches address the core challenge of idle compute during sequential token generation.

As workloads scale and demand grows for both faster response times and better system efficiency, techniques like speculative decoding will play an increasingly central role. Pairing these methods with frameworks such as NVIDIA TensorRT-LLM, SGLang, and vLLM ensures that developers can deploy models that are more performant, more practical, and more cost-effective in real-world environments. 

Ready to get started? Check out the Jupyter notebook tutorial in the TensorRT-Model-Optimizer/examples/speculative_decoding GitHub repo to try applying speculative decoding to your own model.

Acknowledgments

Thank you to the NVIDIA engineers who contributed to the development and writing of this post, including Chenhan Yu and Hao Guo.


NVIDIA RAPIDS 25.08 Adds New Profilers for cuML, Updates to the Polars GPU Engine, Additional Algorithm Support, and More

The RAPIDS 25.08 release continues to push toward making accelerated data science more accessible and scalable with the addition of several new features, including:

  • Two new profiling tools for troubleshooting cuml.accel code
  • Support for larger and more complex data in the Polars GPU engine
  • New algorithm support in cuML and cuml.accel
  • CUDA version support updates

Learn more about the new features below.

The 25.08 release adds two new profiling options to cuml.accel. Similar to the profilers previously released for cudf.pandas, these new profiling features help users understand which operations were accelerated by cuML on the GPU, which fell back to the CPU, and how long those operations took. This can be useful for users trying to understand the current performance bottlenecks in their machine learning workflows.

First, we introduce a function-level profiler. This profiler shows users which operations in a given script or cell ran on the GPU versus the CPU. It also shows how much time each function spent on each.

There are two ways to use the function-level profiler. If running a Jupyter or IPython notebook, users can call the %%cuml.accel.profile magic after cuml.accel has been loaded and profile an entire cell:

%%cuml.accel.profile


from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression


X, y = make_regression(n_samples=100)


# Fit and predict on GPU
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge.predict(X)


# Retry, using an unsupported hyperparameter
ridge = Ridge(positive=True)
ridge.fit(X, y)
ridge.predict(X)

The output of this cell contains the profiling results:

cuml.accel profile                                             
┏━━━━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Function      ┃ GPU calls ┃ GPU time ┃ CPU calls ┃ CPU time ┃
┡━━━━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━┩
│ Ridge.fit     │         1 │  141.2ms │         1 │      3ms │
│ Ridge.predict │         1 │   31.5ms │         1 │   97.3µs │
├───────────────┼───────────┼──────────┼───────────┼──────────┤
│ Total         │         2 │  172.7ms │         2 │    3.1ms │
└───────────────┴───────────┴──────────┴───────────┴──────────┘
Not all operations ran on the GPU. The following functions required CPU fallback for the following reasons:
* Ridge.fit
  - `positive=True` is not supported
* Ridge.predict
  - Estimator not fit on GPU

The function-level profiler can also be invoked on a Python script using the --profile flag from the CLI:

python -m cuml.accel --profile script.py

The second profiler is a line-level profiler, showing users where each portion of code executed, line by line. Like the function-level profiler, the line-level profiler can be invoked in a notebook with %%cuml.accel.line_profile.

%%cuml.accel.line_profile


from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression


X, y = make_regression(n_samples=100)


# Fit and predict on GPU
ridge = Ridge(alpha=1.0)
ridge.fit(X, y)
ridge.predict(X)


# Retry, using an unsupported hyperparameter
ridge = Ridge(positive=True)
ridge.fit(X, y)
ridge.predict(X)
cuml.accel line profile                                                    
┏━━━━┳━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃  # ┃ N ┃    Time ┃ GPU % ┃ Source                                       ┃
┡━━━━╇━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│  1 │ 1 │       - │     - │ from sklearn.linear_model import Ridge       │
│  2 │ 1 │       - │     - │ from sklearn.datasets import make_regression │
│  3 │   │         │       │                                              │
│  4 │   │         │       │                                              │
│  5 │ 1 │   1.1ms │     - │ X, y = make_regression(n_samples=100)        │
│  6 │   │         │       │                                              │
│  7 │   │         │       │                                              │
│  8 │   │         │       │ # Fit and predict on GPU                     │
│  9 │ 1 │       - │     - │ ridge = Ridge(alpha=1.0)                     │
│ 10 │ 1 │ 174.2ms │  99.0 │ ridge.fit(X, y)                              │
│ 11 │ 1 │   5.2ms │  99.0 │ ridge.predict(X)                             │
│ 12 │   │         │       │                                              │
│ 13 │   │         │       │                                              │
│ 14 │   │         │       │ # Retry, using an unsupported hyperparameter │
│ 15 │ 1 │       - │     - │ ridge = Ridge(positive=True)                 │
│ 16 │ 1 │   4.5ms │   0.0 │ ridge.fit(X, y)                              │
│ 17 │ 1 │ 172.7µs │   0.0 │ ridge.predict(X)                             │
│ 18 │   │         │       │                                              │
└────┴───┴─────────┴───────┴──────────────────────────────────────────────┘
Ran in 185.6ms, 96.4% on GPU

The line profiler can also be invoked via the --line-profile flag from the command line:

python -m cuml.accel --line-profile script.py

With these new profiling capabilities, cuml.accel provides more tools to make accelerating and debugging machine learning code easier.

Process larger and more complex data with the Polars GPU engine powered by NVIDIA cuDF

Work with datasets larger than GPU memory with the new default streaming executor

The streaming execution mode introduced as an experimental feature in 25.06 is now the default in the Polars GPU engine. This new executor takes advantage of data partitioning to allow datasets much larger than VRAM (GPU memory) to be processed efficiently. The streaming executor can still fall back to in-memory execution for any unsupported operation, but as of the 25.08 release, streaming execution supports nearly all of the operators supported for in-memory GPU execution. This unlocks substantial performance and scalability improvements.

For smaller datasets, using the streaming execution mode on a single GPU introduces very little performance overhead compared to using the in-memory engine. However, as dataset size grows and starts to exceed GPU memory, the streaming executor delivers large speedups over the in-memory engine.

Bar chart comparing the performance of the Polars GPU engine’s in-memory and streaming modes across data sizes. At the 300 GB scale, the bar for the streaming engine is significantly shorter than the in-memory bar, showing it is much faster.
Figure 1. Performance comparison of the Polars GPU engine’s in-memory and streaming execution modes. The streaming engine is nearly 5x faster on a 300 GB larger-than-memory workload.

For more information about the Polars GPU streaming executor, visit our documentation.
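
As a minimal sketch of using the GPU engine from Python (the Parquet path and column names are illustrative; with 25.08 the streaming executor is used by default when a GPUEngine is passed):

import polars as pl

lf = pl.scan_parquet("data/*.parquet")  # hypothetical dataset, possibly larger than GPU memory

result = (
    lf.group_by("key")
      .agg(pl.col("value").sum())
      .collect(engine=pl.GPUEngine(raise_on_fail=True))  # streaming execution by default in 25.08
)
print(result)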

Keep complex data like structs and string operations on the GPU

The Polars GPU engine now supports struct data in columns. Previously, any operation involving structs would fall back to CPU execution, but with the latest release these operations are now GPU-accelerated for improved performance:

>>> import polars as pl
... ratings = pl.LazyFrame(
...     {
...         "Movie": ["Cars", "IT", "ET", "Cars", "Up", "IT", "Cars", "ET", "Up", "ET"],
...         "Theatre": ["NE", "ME", "IL", "ND", "NE", "SD", "NE", "IL", "IL", "SD"],
...         "Avg_Rating": [4.5, 4.4, 4.6, 4.3, 4.8, 4.7, 4.7, 4.9, 4.7, 4.6],
...         "Count": [30, 27, 26, 29, 31, 28, 28, 26, 33, 26],
...     }
... )
... ratings.select(pl.col("Theatre").value_counts()).collect(engine=pl.GPUEngine(raise_on_fail=True))
...
shape: (5, 1)
┌───────────┐
│ Theatre   │
│ ---       │
│ struct[2] │
╞═══════════╡
│ {"NE",3}  │
│ {"ND",1}  │
│ {"ME",1}  │
│ {"SD",2}  │
│ {"IL",3}  │
└───────────┘

In addition, the Polars GPU engine now supports a substantially expanded set of string operators, for example:

>>> ldf = pl.LazyFrame({"foo": [1, None, 2]})
>>> ldf.select(pl.col("foo").str.join("-")).collect(engine=gpu_engine)
shape: (1, 1)
┌─────┐
│ foo  │
│ ---  │
│ str  │
╞═════╡
│ 1-2  │
└─────┘
>>> ldf = pl.LazyFrame({
...     "lines": [
...         "I Like\nThose\nOdds",
...         "This is\nThe Way",
...     ]
... })
... ldf.with_columns(
...     pl.col("lines").str.extract(r"(T\w+)", 1).alias("matches"),
... ).collect(engine=pl.GPUEngine(raise_on_fail=True))
...
shape: (2, 2)
┌─────────┬─────────┐
│ lines   ┆ matches │
│ ---     ┆ ---     │
│ str     ┆ str     │
╞═════════╪═════════╡
│ I Like  ┆ Those   │
│ Those   ┆         │
│ Odds    ┆         │
│ This is ┆ This    │
│ The Way ┆         │
└─────────┴─────────┘

This expanded data type support further strengthens the Polars GPU engine, accelerating the most common end-user functionality.

New algorithms supported in cuML: Spectral Embedding, LinearSVC, LinearSVR, and KernelRidge

With the 25.08 release, cuML has added a Spectral Embedding algorithm for dimensionality reduction and manifold learning. Spectral embedding is an approach that uses the eigenvalues and eigenvectors of a similarity graph to embed high-dimensional data into a lower-dimensional space.

The API for the new Spectral Embedding algorithm in cuML matches the spectral embedding implementation in scikit-learn:

from cuml.manifold import SpectralEmbedding
import cupy as cp
from sklearn.datasets import fetch_openml


# (70000, 784) -> (70000, 2)
mnist = fetch_openml('mnist_784', version=1)
X, y = mnist.data, mnist.target.astype(int)


spectral = SpectralEmbedding(n_components=2, n_neighbors=None, random_state=42)
embedding = spectral.fit_transform(cp.asarray(X, order='C', dtype=cp.float32))

In addition, cuml.accel now accelerates several new algorithms with zero code changes. The LinearSVC and LinearSVR estimators were added in the 25.08 release, which means that all estimators in the support vector machine family are now part of cuml.accel.

KernelRidge was also added to cuml.accel, bringing another popular regression algorithm under the zero code change umbrella. An example follows below.
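
For example, existing scikit-learn code like the sketch below can run unchanged under cuml.accel (launched with python -m cuml.accel, as shown earlier); the dataset and hyperparameters are illustrative.

# Run with: python -m cuml.accel linear_svc_example.py
from sklearn.svm import LinearSVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

clf = LinearSVC(C=1.0)
clf.fit(X, y)          # dispatched to the cuML GPU implementation when supported
print(clf.score(X, y))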

For more information about the algorithms supported today, see our full documentation.

Dropping CUDA 11 support

Starting with the 25.08 release, we are dropping support for CUDA 11, which includes all containers, published packages, and the ability to build from source. Users who want to keep running CUDA 11 may pin to RAPIDS version 25.06.

Visit the RAPIDS documentation to learn more.

Conclusion

The NVIDIA RAPIDS 25.08 release offers a leap forward in accelerating and optimizing data science workflows. With the introduction of the cuml.accel profilers, developers now have powerful tools to diagnose and improve the performance of their machine learning code. Updates to the Polars GPU engine, such as the streaming executor and expanded data type support, enable efficient processing of large datasets, improving scalability and performance. In addition, the inclusion of new algorithms in cuML further streamlines the machine learning ecosystem. These developments collectively contribute to making accelerated data science more accessible and efficient for users. To dive deeper into all the new features and improvements, be sure to visit the RAPIDS documentation.

We welcome your feedback on GitHub. You can also join the 3,500+ members of the RAPIDS Slack community to talk GPU-accelerated data processing.

If you’re new to RAPIDS, check out these resources to get started and take our Accelerated Data Science Workflows with Zero Code Changes course for free. To learn more about accelerated data science, explore our DLI learning path and enroll in a hands-on course, such as best practices in feature engineering for tabular data with GPU acceleration.


RSS Feed Generator: Create RSS Feeds from URLs

RSS feed integrations

Make your RSS feeds work better by integrating with your favorite platforms. Save time by connecting your tools together. No coding required.

Add dynamic news feeds to your website using our customizable widgets. No coding required!
