
PwC

Name
Code Link
Description
Retrieved
Stars
https://github.com/stanford-oval/suql
This paper presents the first conversational agent that supports the full generality of hybrid data access for large knowledge corpora, through a language we developed called SUQL (Structured and Unstructured Query Language). Code: https://github.com/stanford-oval/suql
2024/05/07
126
https://github.com/mlc-ai/web-llm
Finally, we build an end-to-end framework on top of our abstraction to automatically optimize deep learning models for given tensor computation primitives. Code: https://github.com/mlc-ai/web-llm
2024/05/07
9577
https://github.com/efficient-large-model/vila
Visual language models (VLMs) rapidly progressed with the recent success of large language models. Code: https://github.com/efficient-large-model/vila
2024/05/07
478
https://github.com/cpacker/memgpt
Large language models (LLMs) have revolutionized AI, but are constrained by limited context windows, hindering their utility in tasks like extended conversations and document analysis. Code: https://github.com/cpacker/memgpt
2024/05/04
9235
https://github.com/ml-gsai/microdreamer
In this paper, we introduce score-based iterative reconstruction (SIR), an efficient and general algorithm for 3D generation with a multi-view score-based diffusion model. Code: https://github.com/ml-gsai/microdreamer
2024/05/04
46
https://github.com/suzgunmirac/meta-prompting
This collaborative prompting approach empowers a single LM to simultaneously act as a comprehensive orchestrator and a panel of diverse experts, significantly enhancing its performance across a wide array of tasks. Code: https://github.com/suzgunmirac/meta-prompting
2024/05/04
227
https://github.com/jinhualiang/wavcraft
We introduce WavCraft, a collective system that leverages large language models (LLMs) to connect diverse task-specific models for audio content creation and editing. Code: https://github.com/jinhualiang/wavcraft
2024/05/04
189
https://github.com/runyiyang/sundae
However, this comes with high memory consumption, e.g., a well-trained Gaussian field may utilize three million Gaussian primitives and over 700 MB of memory. Code: https://github.com/runyiyang/sundae
2024/05/04
48
https://github.com/nvlabs/radio
A handful of visual foundation models (VFMs) have recently emerged as the backbones for numerous downstream tasks. Code: https://github.com/nvlabs/radio
2024/05/04
176
https://github.com/2471023025/ralm_survey
Large Language Models (LLMs) have catalyzed significant advancements in Natural Language Processing (NLP), yet they encounter challenges such as hallucination and the need for domain-specific knowledge. Code: https://github.com/2471023025/ralm_survey
2024/05/04
76
https://github.com/facebookresearch/lightplane
Contemporary 3D research, particularly in reconstruction and generation, heavily relies on 2D images for inputs or supervision. Code: https://github.com/facebookresearch/lightplane
2024/05/04
126
https://github.com/prometheus-eval/prometheus-eval
Proprietary LMs such as GPT-4 are often employed to assess the quality of responses from various LMs. Code: https://github.com/prometheus-eval/prometheus-eval
2024/05/04
131
https://github.com/hvision-nku/storydiffusion
This module converts the generated sequence of images into videos with smooth transitions and consistent subjects that are significantly more stable than the modules based on latent spaces only, especially in the context of long video generation. Code: https://github.com/hvision-nku/storydiffusion
2024/05/04
394
https://github.com/kindxiaoming/pykan
Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). Code: https://github.com/kindxiaoming/pykan
2024/05/04
4923
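To make the Kolmogorov-Arnold idea concrete: instead of fixed activations on nodes, every edge carries its own learnable univariate function, and each output is a sum of edge functions. A minimal sketch in plain Python, using piecewise-linear edge functions in place of the B-splines pykan actually uses (all names and numbers here are illustrative, not the pykan API):

```python
def pwl(x, knots, vals):
    """Piecewise-linear interpolation through (knots[i], vals[i]).
    Stands in for the learnable spline on each KAN edge."""
    if x <= knots[0]:
        return vals[0]
    if x >= knots[-1]:
        return vals[-1]
    for i in range(len(knots) - 1):
        if knots[i] <= x <= knots[i + 1]:
            t = (x - knots[i]) / (knots[i + 1] - knots[i])
            return vals[i] * (1 - t) + vals[i + 1] * t

def kan_layer(x, knots, edge_vals):
    """One KAN layer: out[j] = sum_i phi_ji(x[i]),
    where edge_vals[j][i] parameterizes the function on edge i -> j."""
    return [sum(pwl(x[i], knots, edge_vals[j][i]) for i in range(len(x)))
            for j in range(len(edge_vals))]

knots = [-1.0, 0.0, 1.0]
ident = [-1.0, 0.0, 1.0]   # values making pwl the identity on [-1, 1]
vee   = [1.0, 0.0, 1.0]    # values making pwl the absolute value
edge_vals = [[ident, ident], [vee, vee]]
print(kan_layer([0.5, -0.5], knots, edge_vals))  # [0.0, 1.0]
```

Training a KAN then means optimizing the per-edge value tables (spline coefficients in pykan) by gradient descent, which is what lets the network learn its activation functions rather than fix them.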
https://github.com/chenyangzhu1/multibooth
MultiBooth addresses these issues by dividing the multi-concept generation process into two phases: a single-concept learning phase and a multi-concept integration phase. Code: https://github.com/chenyangzhu1/multibooth
2024/05/01
90
https://github.com/tothebeginning/pulid
We propose Pure and Lightning ID customization (PuLID), a novel tuning-free ID customization method for text-to-image generation. Code: https://github.com/tothebeginning/pulid
2024/05/01
218
https://github.com/FoundationVision/Groma
We introduce Groma, a Multimodal Large Language Model (MLLM) with grounded and fine-grained visual perception ability. Code: https://github.com/FoundationVision/Groma
2024/05/01
260
https://github.com/snap-stanford/stark
Answering real-world user queries, such as product search, often requires accurate retrieval of information from semi-structured knowledge bases or databases that involve a blend of unstructured (e.g., textual descriptions of products) and structured (e.g., entity relations of products) information. Code: https://github.com/snap-stanford/stark
2024/05/01
151
https://github.com/MangoKiller/MolTC
Molecular Relational Learning (MRL), aiming to understand interactions between molecular pairs, plays a pivotal role in advancing biochemical research. Code: https://github.com/MangoKiller/MolTC
2024/05/01
113
https://github.com/magic-research/PLLaVA
PLLaVA achieves new state-of-the-art performance on modern benchmark datasets for both video question-answer and captioning tasks. Code: https://github.com/magic-research/PLLaVA
2024/05/01
207
https://github.com/zzxslp/som-llava
Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image. Code: https://github.com/zzxslp/som-llava
2024/04/28
41
https://github.com/opengvlab/internvl
Compared to both open-source and proprietary models, InternVL 1.5 shows competitive performance, achieving state-of-the-art results in 8 of 18 benchmarks. Code: https://github.com/opengvlab/internvl
2024/04/28
988
https://github.com/ToruOwO/hato
Two significant challenges exist: the lack of an affordable and accessible teleoperation system suitable for a dual-arm setup with multifingered hands, and the scarcity of multifingered hand hardware equipped with touch sensing. Code: https://github.com/ToruOwO/hato
2024/04/28
37
https://github.com/JackAILab/ConsistentID
ConsistentID comprises two key components: a multimodal facial prompt generator that combines facial features, corresponding facial descriptions and the overall facial context to enhance precision in facial details, and an ID-preservation network optimized through the facial attention localization strategy, aimed at preserving ID consistency in facial regions. Code: https://github.com/JackAILab/ConsistentID
2024/04/28
128
https://github.com/microsoft/FILM
While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. Code: https://github.com/microsoft/FILM
2024/04/28
79
https://github.com/dcharatan/flowmap
This paper introduces FlowMap, an end-to-end differentiable method that solves for precise camera poses, camera intrinsics, and per-frame dense depth of a video sequence. Code: https://github.com/dcharatan/flowmap
2024/04/28
382
https://github.com/ericlbuehler/mistral.rs
Starting with a set of pre-trained LoRA adapters, our gating strategy uses the hidden states to dynamically mix adapted layers, allowing the resulting X-LoRA model to draw upon different capabilities and create never-before-used deep layer-wise combinations to solve tasks. Code: https://github.com/ericlbuehler/mistral.rs
2024/04/28
506
https://github.com/apple/corenet
Contrastive learning has emerged as a transformative method for learning effective visual representations through the alignment of image and text embeddings. Code: https://github.com/apple/corenet
2024/04/28
5136
https://github.com/lean-dojo/leancopilot
In this paper, we explore LLMs as copilots that assist humans in proving theorems. Code: https://github.com/lean-dojo/leancopilot
2024/04/25
777
https://github.com/princeton-vl/multislam_diffpose
The backbone is trained end-to-end using a novel differentiable solver for wide-baseline two-view pose. Code: https://github.com/princeton-vl/multislam_diffpose
2024/04/25
31
https://github.com/holmeswww/agentkit
The chains of nodes can be designed to explicitly enforce a naturally structured "thought process". Code: https://github.com/holmeswww/agentkit
2024/04/25
199
https://github.com/ailab-cvc/seed-x
We hope that our work will inspire future research into what can be achieved by versatile multimodal foundation models in real-world applications. Code: https://github.com/ailab-cvc/seed-x
2024/04/25
92
https://github.com/Jyxarthur/flowsam
The objective of this paper is motion segmentation -- discovering and segmenting the moving objects in a video. Code: https://github.com/Jyxarthur/flowsam
2024/04/25
148
https://github.com/fasterdecoding/snapkv
Specifically, SnapKV achieves a consistent decoding speed with a 3.6x increase in generation speed and an 8.2x enhancement in memory efficiency compared to baseline when processing inputs of 16K tokens. Code: https://github.com/fasterdecoding/snapkv
2024/04/25
59
https://github.com/ez-hwh/autocrawler
We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding. Code: https://github.com/ez-hwh/autocrawler
2024/04/25
147
https://github.com/hiyouga/llama-factory
We propose a new metric to assess personality generation capability based on this evaluation method. Code: https://github.com/hiyouga/llama-factory
2024/04/25
19582
https://github.com/id-animator/id-animator
Based on this pipeline, a random face reference training method is further devised to precisely capture the ID-relevant embeddings from reference images, thus improving the fidelity and generalization capacity of our model for ID-specific video generation. Code: https://github.com/id-animator/id-animator
2024/04/25
73
https://github.com/yisol/IDM-VTON
Finally, we present a customization method using a pair of person-garment images, which significantly improves fidelity and authenticity. Code: https://github.com/yisol/IDM-VTON
2024/04/25
407
https://github.com/apple/corenet
To this end, we release OpenELM, a state-of-the-art open language model. Code: https://github.com/apple/corenet
2024/04/25
1747
https://github.com/Infini-AI-Lab/TriForce
However, key-value (KV) cache, which is stored to avoid re-computation, has emerged as a critical bottleneck by growing linearly in size with the sequence length. Code: https://github.com/Infini-AI-Lab/TriForce
2024/04/20
34
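The linear growth is easy to see from the cache's shape: per layer, the K and V tensors each store one vector per token per KV head. A back-of-the-envelope calculation (the model config below is an illustrative 7B-class setup, not taken from the paper):

```python
def kv_cache_bytes(seq_len, n_layers, n_kv_heads, head_dim, bytes_per_elem=2):
    # Two tensors (K and V), each of shape [seq_len, n_kv_heads, head_dim],
    # kept per layer; fp16 by default (2 bytes per element).
    return 2 * n_layers * seq_len * n_kv_heads * head_dim * bytes_per_elem

# Illustrative config: 32 layers, 32 KV heads, head_dim 128, fp16.
for seq_len in (4_000, 32_000, 128_000):
    gib = kv_cache_bytes(seq_len, 32, 32, 128) / 2**30
    print(f"{seq_len:>7} tokens -> {gib:5.1f} GiB")
```

At 128K tokens the cache under these assumptions reaches roughly 62 GiB, several times the fp16 weights of a 7B model, which is why long-context methods target the KV cache specifically.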
https://github.com/microsoft/recai
This paper introduces RecAI, a practical toolkit designed to augment or even revolutionize recommender systems with the advanced capabilities of Large Language Models (LLMs). Code: https://github.com/microsoft/recai
2024/04/20
254
https://github.com/FaceOnLive/Face-Liveness-Detection-SDK-Linux
SPSC and SDSC augment live samples into simulated attack samples by simulating spoofing clues of physical and digital attacks, respectively, which significantly improve the capability of the model to detect "unseen" attack types. Code: https://github.com/FaceOnLive/Face-Liveness-Detection-SDK-Linux
2024/04/19
184
https://github.com/liming-ai/ControlNet_Plus_Plus
To this end, we propose ControlNet++, a novel approach that improves controllable generation by explicitly optimizing pixel-level cycle consistency between generated images and conditional controls. Code: https://github.com/liming-ai/ControlNet_Plus_Plus
2024/04/19
55
https://github.com/Recognito-Vision/NIST-FRVT-Top-1-Face-Recognition
In the realm of security applications, biometric authentication systems play a crucial role, yet developing one raises challenges concerning privacy and security. Code: https://github.com/Recognito-Vision/NIST-FRVT-Top-1-Face-Recognition
2024/04/19
110
https://github.com/facebookresearch/llm-transparency-tool
We present the LM Transparency Tool (LM-TT), an open-source interactive toolkit for analyzing the internal workings of Transformer-based language models. Code: https://github.com/facebookresearch/llm-transparency-tool
2024/04/19
260
https://github.com/facebookresearch/generative-recommenders
Large-scale recommendation systems are characterized by their reliance on high cardinality, heterogeneous features and the need to handle tens of billions of user actions on a daily basis. Code: https://github.com/facebookresearch/generative-recommenders
2024/04/19
80
https://github.com/Beomi/InfiniTransformer
This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation. Code: https://github.com/Beomi/InfiniTransformer
2024/04/18
87
https://github.com/Recognito-Vision/Face-SDK-Linux-Demos
This enables our method, namely LAndmark-based Facial Self-supervised learning (LAFS), to learn key representations that are more critical for face recognition. Code: https://github.com/Recognito-Vision/Face-SDK-Linux-Demos
2024/04/18
81
https://github.com/siyan-zhao/prepacking
In this work, we highlight the following pitfall of prefilling: for batches containing high-varying prompt lengths, significant computation is wasted by the standard practice of padding sequences to the maximum length. Code: https://github.com/siyan-zhao/prepacking
2024/04/18
27
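The wasted prefill work is simple to quantify: with padding, every prompt in the batch costs as much as the longest one. A small sketch (the batch lengths are made up, and since attention cost grows superlinearly with length, this token count if anything understates the waste):

```python
def padded_tokens(lengths):
    # Standard prefill: every prompt is padded to the batch maximum.
    return len(lengths) * max(lengths)

def packed_tokens(lengths):
    # Prepacking-style: only the real prompt tokens are processed.
    return sum(lengths)

lengths = [32, 64, 100, 1000]  # a batch with high-varying prompt lengths
waste = 1 - packed_tokens(lengths) / padded_tokens(lengths)
print(f"prefill positions wasted on padding: {waste:.0%}")  # ~70%
```

The more the prompt lengths vary within a batch, the larger this gap, which is the regime the paper targets.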